Thursday, December 20, 2007

A day at DOE's Save Energy Now workshop

Earlier this week I had the privilege of attending a workshop sponsored by the Department of Energy's Save Energy Now program, in collaboration with the EPA/EnergyStar group. The purpose was to help DOE define and engineer tools to assess data center energy efficiency. They're reaching out to industry to ensure they're using the right metrics, asking the right questions, and producing a product that will be useful in real life. The tool will (presumably) help firms "score" the efficiency of their data center from 1-100, much the same way there are already tools to score the energy efficiency of buildings, etc. BTW, see my earlier blog on some proposed metrics from The Green Grid and Uptime Institute.

In the room were mostly vendors such as AMD, Emerson Electric, HP, Intel, NetApp and Sun. Plus, folks from DOE, EPA, Lawrence Berkeley National Labs, the Green Grid, and the Uptime Institute attended as supporting resources to DOE. And first off, it was great to see the overlap/interaction between these groups -- especially that DOE, the Green Grid and the Uptime Institute were cooperating.

The tool itself is expected to have four components of assessment: IT (servers, storage, networking, software), Cooling (chillers, AC units, fans), Power Systems (UPS, distribution) and Energy Sources (generation, etc.).

But the real core of the day's conversation was what's meant by "efficiency" - all agreed that, at the highest level, it's the ratio of "useful computing output" to the "energy supplied to the overall data center". That second number is the easy part: it includes all of the power used for lighting, cooling, power distribution, and so on; it's sometimes hard to measure empirically, but it all comes down to kWh. The real issue, it turns out, is what's meant by the first number, "useful computing output".
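
Just to make the arithmetic concrete, here's a minimal sketch of that ratio -- all of the names and numbers below are hypothetical, and the numerator is deliberately left abstract, since defining it was the hard part of the day:

```python
# Purely illustrative sketch of the efficiency ratio discussed above.
# All numbers are made up; "useful_output" is left abstract on purpose.

def data_center_efficiency(useful_output, total_facility_kwh):
    """Ratio of useful computing output to total energy supplied to the data center."""
    return useful_output / total_facility_kwh

# The denominator rolls up everything the facility draws, in kWh:
it_load_kwh    = 500_000   # servers, storage, networking
cooling_kwh    = 350_000   # chillers, AC units, fans
power_dist_kwh = 100_000   # UPS and distribution losses
lighting_kwh   =  50_000

total_kwh = it_load_kwh + cooling_kwh + power_dist_kwh + lighting_kwh
```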

In our afternoon IT breakout group, about 10 of us debated for a good hour or so about just that: how do we define "output"? Is it MIPS? Is it measured in SPEC units? Is it CPU utilization? And what about storage and network "output" as well? In the end, we agreed that we should define it the way any IT professional would: as an application service with an associated SLA. So, for example, the useful output would be a service called "Microsoft Exchange" with SLA traits of X users, Y uptime, and Z latency. Most importantly, this approach makes the output independent of the underlying technology and implementation. Thus, two data centers could implement this service/SLA combination in vastly different ways, with vastly different power requirements and therefore energy efficiencies.
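
Here's one way that service/SLA framing might be modeled -- again, a hypothetical sketch with made-up names and numbers, not anything from the DOE tool itself:

```python
# Hypothetical sketch: "useful output" as a service plus its SLA traits,
# independent of whatever hardware and software deliver it.
from dataclasses import dataclass

@dataclass
class ServiceSLA:
    name: str            # e.g. "Microsoft Exchange"
    users: int           # X users supported
    uptime_pct: float    # Y uptime target
    latency_ms: float    # Z latency target

exchange = ServiceSLA(name="Microsoft Exchange",
                      users=10_000, uptime_pct=99.9, latency_ms=200)

def service_efficiency(sla: ServiceSLA, facility_kwh: float) -> float:
    # One possible proxy: users served per kWh of total facility energy.
    return sla.users / facility_kwh

# Two data centers delivering the identical service/SLA, compared purely
# on the energy each consumed to do it (numbers invented for illustration):
site_a = service_efficiency(exchange, facility_kwh=1_000_000)
site_b = service_efficiency(exchange, facility_kwh=1_400_000)
```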

When the DOE tool (or set of tools) is complete in mid-2008, it will represent a seminal event: Data Center operators will have a standard benchmark against which to self-rate -- and to (confidentially) compare their efficiencies against their peers. It will also begin to put pressure on IT vendors (!) to focus on the "gestalt" efficiency of their systems, not just of their components. (IMHO, this will make me *very* happy)

And, I hope, this benchmark will begin to accelerate (a) the move toward dynamically-managed shared compute resources, and (b) the technical & organizational bridge between IT system management and facilities management.
