Monday, September 8, 2008

Inherent efficiencies of Cloud Computing, Utility Computing, and PaaS

Ever noticed that the two hottest topics in IT today are Data Center Efficiency and Cloud Computing? Ever wondered if the two might be related? I did. And it’s clear that the media, industry analysts – and most of all IT Ops – have missed this relationship entirely. I now feel obligated to point out how and why we need to make this connection as soon as we can.

Let me cut to the chase: The most theoretically efficient IT compute infrastructure is a Utility Computing architecture – essentially the same architecture that supports PaaS or “cloud computing”. So it helps to understand why this is so, why today's efficiency “point solutions” will never result in maximum compute efficiency, and why the “Green IT” movement needs to embrace Utility Computing architectures as soon as it can.

To illustrate, I’ll remind you of one of my favorite observations: How is Amazon Web Services able to charge $0.10/CPU-hour (about $876/year) when the average IT department or hosting provider has a loaded server cost of somewhere between $2,000 and $4,000/year? What does Amazon know that the rest of the industry doesn’t?
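For anyone who wants to check the arithmetic: treating the quoted $0.10 as a flat price for one CPU running around the clock, and assuming a 365-day year, the annualized figure works out as below. This is only a back-of-the-envelope sketch; `annual_cost` is a name I made up for illustration.

```python
# Annualize an hourly price, assuming the CPU runs 24x7 for a 365-day year.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_cost(price_per_hour: float) -> float:
    """Cost of running one CPU continuously for a year at a flat hourly rate."""
    return price_per_hour * HOURS_PER_YEAR


amazon = annual_cost(0.10)
print(f"${amazon:,.0f}/year")  # $876/year

# Compare against the loaded per-server range quoted above.
for loaded in (2_000, 4_000):
    print(f"${loaded:,}/year is {loaded / amazon:.1f}x the annualized hourly rate")
```

Even at the low end of that range, the in-house server costs more than twice what the hourly rate implies.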

First: A bit of background

Data center efficiency is top-of-mind lately. As I’ve mentioned before, a recent EPA report to U.S. Congress outlined that over 1.5% of U.S. electricity is going to power data centers, and that number may well double by 2011. Plus, according to an Uptime Institute White Paper, the 3-year cost of power to operate a server will now outstrip the original purchase cost of that server. Clearly, the high cost and limited capacity of power are currently hamstringing data center growth, and the industry is trying to find a way to overcome them.

Why point-solutions and traditional approaches will miss the mark

I regularly attend a number of industry organizations and forums on IT energy efficiency, and have spoken with all major industry analysts on the topic. And what strikes me as absurdly odd is that the industry (taken as a whole) is missing the mark on solving this energy problem. Industry bodies – mostly driven by large equipment vendors – are mainly proposing *incremental* improvements to “old” technology models. Ultimately these provide a few percent improvement here, a few percent there. Better power supplies. DC power distribution. Air flow blanking panels. Yawn.

These approaches are oddly similar to Detroit trying to figure out how to make its gas-guzzlers more efficient by using higher-pressure tires, better engine control chips and better spark plugs. They’ll never get to an order-of-magnitude efficiency improvement in transportation.

Plus, industry bodies are focusing on metrics (mostly a good idea) that will never get us to the major improvements we need. Rather, the current metrics are lulling us into a misplaced sense of complacency. To wit: The most oft-quoted data center efficiency metrics are PUE (Power Usage Effectiveness) and its reciprocal, DCiE (Data Center infrastructure Efficiency). These essentially say, “get as much power through your data center and to the compute equipment, with as little siphoned off to overhead as possible.”

While PUE/DCiE are nice metrics to help drive overhead (power distribution, cooling) power use down, they don’t at all address the efficiency with which the compute equipment is applied. For example, you could have a pathetically low level of compute utilization, but still achieve an incredibly wonderful PUE and DCiE number. Sort of akin to Detroit talking about transmission efficiency rather than actual mileage.
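To make that concrete, here is a quick sketch using the standard Green Grid definitions: PUE = total facility power ÷ IT equipment power, and DCiE = IT equipment power ÷ total facility power. The `Facility` class and both facilities’ kW and utilization figures are hypothetical. Note that utilization never enters either formula.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    total_facility_kw: float    # everything the utility meter sees
    it_equipment_kw: float      # power delivered to servers, storage, network
    avg_cpu_utilization: float  # fraction of compute capacity doing useful work (0..1)

    @property
    def pue(self) -> float:
        # Power Usage Effectiveness: total facility power / IT equipment power.
        return self.total_facility_kw / self.it_equipment_kw

    @property
    def dcie(self) -> float:
        # Data Center infrastructure Efficiency: the reciprocal of PUE.
        return self.it_equipment_kw / self.total_facility_kw


# Two hypothetical facilities with identical power overhead but very different utilization.
busy = Facility(total_facility_kw=1500, it_equipment_kw=1000, avg_cpu_utilization=0.70)
idle = Facility(total_facility_kw=1500, it_equipment_kw=1000, avg_cpu_utilization=0.05)

for name, f in (("busy", busy), ("idle", idle)):
    print(f"{name}: PUE={f.pue:.2f} DCiE={f.dcie:.0%} utilization={f.avg_cpu_utilization:.0%}")
# busy: PUE=1.50 DCiE=67% utilization=70%
# idle: PUE=1.50 DCiE=67% utilization=5%
```

The mostly idle facility earns exactly the same score as the busy one.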

These metrics will continue to mislead the IT industry unless it fundamentally looks at how IT resources are applied, utilized and operated. (BTW, I am more optimistic about the Deployed Hardware Utilization Efficiency (DH-UE) metric put forth by the Uptime Institute in an excellent white paper, though the metric is rarely mentioned.)
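As I read the Uptime Institute paper, DH-UE is the ratio of the minimum number of servers needed to handle peak compute load to the number of servers actually deployed. The sketch below assumes that definition and a fleet of identical servers; the function name and the numbers are illustrative only. Unlike PUE, this figure drops as idle servers pile up.

```python
import math


def dh_ue(peak_load: float, capacity_per_server: float, servers_deployed: int) -> float:
    """Deployed Hardware Utilization Efficiency, under the definition assumed above.

    peak_load and capacity_per_server must share the same (arbitrary) unit of work.
    """
    if servers_deployed <= 0 or capacity_per_server <= 0:
        raise ValueError("need at least one deployed server with positive capacity")
    # Smallest whole number of servers that could carry the peak load.
    min_servers_for_peak = math.ceil(peak_load / capacity_per_server)
    return min_servers_for_peak / servers_deployed


# Hypothetical shop: peak demand would fit on 120 servers, but 400 are racked and powered.
print(f"DH-UE = {dh_ue(peak_load=120, capacity_per_server=1, servers_deployed=400):.2f}")
# DH-UE = 0.30
```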

Where we have to begin: Focus on operational efficiency rather than equipment efficiency

So, while Detroit was focused on incremental equipment efficiency like higher tire pressure and better spark plugs to increase mileage, Toyota was looking at fundamental questions like how the car was operated. The Prius didn’t just have a more efficient engine, but it had batteries (for high peak needs), regenerative braking (to re-capture idle “cycles”), and a computer/transmission to “broker” these energy sources. This was an entirely new operational model for a vehicle.

The IT industry now needs a similar operationally-efficient re-engineering.

Yes, we still need more efficient cooling systems and power distribution. But we need to re-think how we operate and allocate resources in an entirely new way. This is the ONLY approach that will result in Amazon-level cost reductions and economies-of-scale. And I am referring to cost reductions WITHIN your own IT infrastructure. Not to outsourcing. A per-CPU cost basis under $1,000/year, including power, cooling, and administration. What IT Operations professional doesn’t desire that?

Punchline: The link between efficiency, cloud computing & utility computing architectures

What the industry has termed “utility computing” or Platform-as-a-Service (a form of “cloud” computing) provides just this ideal form of operational-efficiency and energy-efficiency to IT.

Consider the principles of Utility Computing (the architecture behind “clouds”): Only use compute power when you need it. Re-assign it when-and-where it’s required. Retire it when it’s not needed at all. Dynamically consolidate workloads. And be indifferent with respect to the make, model and type of HW and SW. Now consider the possibilities of using this architecture within your four walls.
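Here is a tiny illustration of “dynamically consolidate workloads” and “retire it when it’s not needed”: every interval, re-pack current demand onto as few servers as possible and power off whatever is left over. I’ve used plain first-fit-decreasing bin packing as a generic stand-in; it is not the policy of any particular product. The workload names and demands are made up, and the sketch assumes identical servers with demand measured in percent of one server.

```python
from typing import Dict, List


def consolidate(demands: Dict[str, int], server_capacity: int = 100) -> List[List[str]]:
    """Pack workloads onto as few servers as possible (first-fit decreasing).

    demands maps a workload name to its current CPU demand, in percent of one
    server. Returns one list of workload names per server that must stay
    powered on; every other server can be retired until demand returns.
    """
    placement: List[List[str]] = []  # workloads assigned to each active server
    headroom: List[int] = []         # spare capacity left on each active server
    # Placing the largest demands first tends to waste less capacity.
    for name, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > server_capacity:
            raise ValueError(f"{name} does not fit on a single server")
        for i, room in enumerate(headroom):
            if need <= room:  # first active server with enough room
                placement[i].append(name)
                headroom[i] -= need
                break
        else:  # nothing fits: power on another server
            placement.append([name])
            headroom.append(server_capacity - need)
    return placement


# Hypothetical demand (percent of one server) at two times of day.
day = {"web": 60, "db": 50, "batch": 30, "mail": 20, "crm": 40}
night = {"web": 10, "db": 20, "batch": 50, "mail": 5, "crm": 5}

for label, snapshot in (("day", day), ("night", night)):
    servers = consolidate(snapshot)
    print(f"{label}: {len(servers)} server(s) on -> {servers}")
```

With these made-up numbers, the daytime snapshot needs two servers and the overnight one needs only one. Re-running the packing as demand shifts is what lets the other machines be switched off rather than left idling.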

Using the design centers above, power (and electricity cost) is *inherently* minimized because capital efficiency is continuously maximized. Always, regardless of the variation from hour-to-hour or month-to-month. And, this approach is still compatible with “overhead” improvements such as to cooling and power distribution. But it always guarantees that the working capital of the data center is continuously optimized. (Think: the Prius engine isn’t running all the time!)
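Here is a rough model of why dynamic allocation wins on power even when the work done is identical. Every input is a hypothetical assumption: each powered-on server draws 200 W idle and 300 W flat-out, power scales linearly with load, and the daily demand curve is invented. Static provisioning keeps enough servers on for the peak all day; dynamic provisioning powers on only what each hour needs.

```python
import math
from typing import Sequence

IDLE_W = 200.0  # hypothetical draw of a powered-on but idle server
BUSY_W = 300.0  # hypothetical draw of a fully loaded server


def hourly_energy_wh(servers_on: int, demand: float) -> float:
    """Energy for one hour, assuming power scales linearly with load.

    demand is measured in servers' worth of work. However it is spread over
    servers_on machines, each pays IDLE_W, and the load adds the dynamic range.
    """
    if demand > servers_on:
        raise ValueError("not enough servers for the demand")
    return servers_on * IDLE_W + demand * (BUSY_W - IDLE_W)


def daily_energy_kwh(demand_by_hour: Sequence[float], dynamic: bool) -> float:
    peak = math.ceil(max(demand_by_hour))
    total_wh = 0.0
    for d in demand_by_hour:
        # Static: size for the peak and leave everything on.
        # Dynamic: power on only the servers this hour's demand needs.
        n = math.ceil(d) if dynamic else peak
        total_wh += hourly_energy_wh(n, d)
    return total_wh / 1000


# Hypothetical day: quiet overnight, busy business hours, moderate evening.
demand = [2] * 8 + [10] * 10 + [5] * 6  # servers' worth of work, per hour

static_kwh = daily_energy_kwh(demand, dynamic=False)
dynamic_kwh = daily_energy_kwh(demand, dynamic=True)
print(f"static {static_kwh:.1f} kWh, dynamic {dynamic_kwh:.1f} kWh, "
      f"saving {1 - dynamic_kwh / static_kwh:.0%}")
# static 62.6 kWh, dynamic 43.8 kWh, saving 30%
```

Under this linear model the load-dependent energy is the same either way; the entire saving comes from not paying idle power on servers that have nothing to do. Cooling and distribution overhead would roughly scale both totals by the same factor, so the two kinds of improvement stack.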

On top of this approach, it would then be appropriate to re-focus on PUE/DCiE!

The industry is slowly coming around. In a recent article, my CEO, Bill Coleman, made a similar observation: Bring the “cloud” inside, operate your existing equipment more efficiently, and save power and $ in the process.

I’m only waiting for the rest of the industry to acknowledge the inherent connection between energy efficiency, operational efficiency, power efficiency, and the architectures behind cloud computing.

Only then will we see a precipitous drop in energy consumed by IT, and the economies-of-scale that technology ought to provide.
