Thursday, December 20, 2007

A day at DOE's Save Energy Now workshop

Earlier this week I had the privilege of attending a workshop sponsored by the Department of Energy's Save Energy Now program, in collaboration with the EPA/EnergyStar group. The purpose was to help DOE define and engineer tools for assessing data center efficiency. They're reaching out to industry to ensure they're using the right metrics, asking the right questions, and producing a product that will be useful in real life. The tool will (presumably) help firms "score" the efficiency of their data center from 1-100, much the same way existing tools already score the energy efficiency of buildings. BTW, see my earlier blog on some proposed metrics from The Green Grid and Uptime Institute.

In the room were mostly vendors such as AMD, Emerson Electric, HP, Intel, NetApp and Sun. Folks from DOE, EPA, Lawrence Berkeley National Labs, the Green Grid, and the Uptime Institute also attended as supporting resources to DOE. First off, it was great to see the overlap/interaction between these groups -- especially that DOE, Green Grid and Uptime Institute were cooperating.

The tool itself is expected to have 4 components of assessment: IT (servers, storage, networking, software), Cooling (chillers, AC units, fans), Power Systems (UPS, distribution) and Energy Sources (generation, etc.).

But the real core of the day's conversation was what's meant by "Efficiency" - all agreed that at the highest level it is the ratio of "useful computing output" to the "energy supplied to the overall data center". That second number is sort of the easy part: it includes all of the power used for lighting, cooling, power distribution, etc.; sometimes it's hard to measure empirically, but it all comes down to kWh. The real issue, it turns out, is what's meant by the first number, "useful computing output".

In our afternoon IT breakout group, about 10 of us debated for a good hour or so about just that: how do we define "output"? Is it MIPS? Is it measured in SPEC units? Is it CPU utilization? And what about storage and network "output" as well? In the end, we agreed that we should define it the way any IT professional would: As an application service with an associated SLA. So for example, the useful output would be a service called "Microsoft Exchange" with SLA traits of X users, Y uptime, and Z latency. And most important, this approach makes the output independent of underlying technology and implementation. Thus, two data centers could implement this service/SLA combination in vastly different ways, with vastly different power requirements and therefore energy efficiencies.
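To make the idea concrete, here is a minimal sketch of how two data centers delivering the same service/SLA could be compared. Everything in it is an illustrative assumption, not part of the DOE tool: the `ServiceSLA` fields, the `kwh_per_user` metric, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSLA:
    """Hypothetical description of 'useful output': a service plus its SLA traits."""
    name: str
    users: int             # X users served
    uptime_pct: float      # Y uptime
    max_latency_ms: float  # Z latency


def kwh_per_user(sla: ServiceSLA, facility_kwh: float) -> float:
    """Energy supplied to the whole data center, divided by users served.

    facility_kwh is meant to include IT, cooling, lighting and power
    distribution, i.e. the 'easy' number in the efficiency ratio.
    """
    if sla.users <= 0:
        raise ValueError("SLA must cover at least one user")
    return facility_kwh / sla.users


# Same service and SLA, two different implementations (made-up figures).
exchange = ServiceSLA("Microsoft Exchange", users=5000,
                      uptime_pct=99.9, max_latency_ms=200.0)
dc_a = kwh_per_user(exchange, facility_kwh=40000.0)
dc_b = kwh_per_user(exchange, facility_kwh=25000.0)
print(f"Data center A: {dc_a} kWh/user, data center B: {dc_b} kWh/user")
```

Because the SLA is held constant, the lower figure reflects a more efficient implementation rather than a weaker service.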

When the DOE tool (or set of tools) is complete in mid-2008, it will represent a seminal event: Data Center operators will have a standard benchmark against which to self-rate -- and to (confidentially) compare their efficiencies against their peers. It will also begin to put pressure on IT vendors (!) to focus on the "gestalt" efficiency of their systems, not just of their components. (IMHO, this will make me *very* happy)

And, I hope, this benchmark will begin to accelerate (a) the move toward dynamically-managed shared compute resources, and (b) the technical & organizational bridge between IT system management and facilities management.

Wednesday, December 19, 2007

End-of-the-year holiday shopping for IT

The end of the fiscal year is upon most of us - and for some that means spending any remaining dollars on some quick-hit (read: fast ROI) initiatives.

So in the spirit of the holidays, here are some interesting links:

1. Bridget Botelho at SearchDataCenter writes "Servers get no rest during the holidays" -- that most enterprises (and especially small/medium businesses) leave all servers on during the holidays, even if the company is closed. It's a waste of power/money, even though many solutions abound. (And did you even think about the security risk of keeping your IT on but essentially unsupervised?)

2. And, for those of you with money left in your budget, Rick Vanover from TechRepublic chimed in to suggest 10 good ways to use your remaining IT budget before the end of the year - I particularly like the following:
  • #3: Purchase power management: Many new power management devices are available now that can be a good replacement for your limited power distribution units (PDUs). These PDUs can add management layers to individual power sockets for power consumption, naming, grouping, and power control. The new devices can also add more ports should you need to power more computer systems in your racks.
I would only add that in addition to switched PDUs, you should consider purchasing policy-based software to control them.

3. And, Rick went on to also cite an earlier blog of his, 10 things you should know about advance power management - another topic near-and-dear to my heart, especially:
  • #7: Turn off retired or unused devices: This will reduce your power consumption — and possibly accelerate your removal of the device so as not to overprovision power unnecessarily...
The net-net is the following: If you need a quick-hit, easy-to-implement, pragmatic (and money-saving) solution, pursue intelligent operation of your IT - no need for a technology refresh, no need to re-architect anything.

Monday, December 10, 2007

Server Power Management Myths - and more

An old friend, James Governor, recently got me thinking with his GreenMonk blog. First he pointed out that it's common in Japan to turn servers off at night. Then it wasn't so much his follow-on blog (about turning servers off when you don't need them) as a comment he highlighted from Mike Gunderloy:
  • Has anyone looked at the labor costs of this? I know that even on my tiny little dozen-machine network, I am reluctant to power everything off at night simply because it takes so bloody long waiting for the damn things to boot up in the morning. Seems like actual working fast-boot technologies would go a long way to sell this initiative.
This is exactly the sort of objection or "urban myth" that we're trying to dispel. For example, many believe that application availability might be compromised if servers are shut down. However, there is a solution to this: policy-based control, whereby servers are powered up in advance of their need. That's the sort of work we're doing at Cassatt. And even in a small business, if servers in a closet are turned off nights and weekends, you're still talking about energy savings on the order of 40% or more over the duration of a year!
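A back-of-the-envelope sketch of the nights-and-weekends arithmetic follows. The schedule (12 working hours per weekday plus a 1-hour policy-driven warm-up before they're needed) and the function name are assumptions for illustration; the model also assumes a flat power draw while on and zero draw while off, so it gives an idealized figure and real-world savings would be lower.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_savings_fraction(on_hours_per_weekday: float,
                            warmup_hours: float = 0.0) -> float:
    """Fraction of energy saved versus leaving servers on 24x7.

    Assumes (hypothetically) constant power while on, zero while off,
    and servers fully off on weekends.
    """
    on_hours = 5 * (on_hours_per_weekday + warmup_hours)
    if not 0 <= on_hours <= HOURS_PER_WEEK:
        raise ValueError("on-hours must fit within one week")
    return 1 - on_hours / HOURS_PER_WEEK


# Servers needed 12 h per weekday, powered up 1 h ahead of need.
savings = weekly_savings_fraction(on_hours_per_weekday=12, warmup_hours=1)
print(f"Idealized savings: {savings:.0%}")
```

Even with the warm-up hour included so that nobody waits on a boot in the morning, the servers are off for well over half the week under these assumptions.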

BTW, if you're interested in additional "Urban Myths" about server power control, check out the "Myths and Realities of Power Management" page.

And while you're at it, give us some feedback on how you feel about IT Energy Management and "Green IT": we're hosting a 5-minute survey this week (and, you could win a Wii if you take it).

Saturday, December 8, 2007

The Case for Energy-Proportional Computing

senior vice president of operations at Google and a Google Fellow: "Energy-proportional designs would enable large energy savings in servers, potentially doubling their efficiency in real-life use. Achieving energy proportionality will require significant improvements in the energy usage profile of every system component, particularly the memory and disk subsystems."

The two are making the case not only for server power management, but are calling on vendors to go a step further: to make computers scale their power draw in direct proportion to the compute load they are handling. This would be highly complementary to consolidation efforts currently underway.

In conclusion, the paper says,

  • Servers and desktop computers benefit from much of the energy-efficiency research and development that was initially driven by mobile devices' needs. However, unlike mobile devices, which idle for long periods, servers spend most of their time at moderate utilizations of 10 to 50 percent and exhibit poor efficiency at these levels. Energy-proportional computers would enable large additional energy savings, potentially doubling the efficiency of a typical server. Some CPUs already exhibit reasonably energy-proportional profiles, but most other server components do not.
  • We need significant improvements in memory and disk subsystems, as these components are responsible for an increasing fraction of the system energy usage. Developers should make better energy proportionality a primary design objective for future components and systems. To this end, we urge energy-efficiency benchmark developers to report measurements at nonpeak activity levels for a more complete characterization of a system's energy behavior.
Among a few scholarly pieces from Google, the report also cites two great references: the US EPA's Report to Congress on Data Center Efficiency, and one of many fine works by Jonathan Koomey, "Estimating Total Power Consumption by Servers in the U.S. and the World".
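To see why the 10 to 50 percent utilization range matters, here is a rough sketch using a simple linear power model. The model and the idle fractions are assumptions for illustration, not figures taken from the paper: power is treated as an idle floor plus a part that grows linearly with utilization, both expressed as a fraction of peak power.

```python
def relative_efficiency(utilization: float, idle_fraction: float) -> float:
    """Work done per unit of energy, relative to the same server at 100% load.

    Assumed linear model (hypothetical):
        power = idle_fraction + (1 - idle_fraction) * utilization
    A perfectly energy-proportional server has idle_fraction == 0.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if not 0 <= idle_fraction <= 1:
        raise ValueError("idle_fraction must be in [0, 1]")
    power = idle_fraction + (1 - idle_fraction) * utilization
    return utilization / power


# Compare a server that idles at half of peak power with a proportional one.
for u in (0.1, 0.3, 0.5):
    conventional = relative_efficiency(u, idle_fraction=0.5)
    proportional = relative_efficiency(u, idle_fraction=0.0)
    print(f"utilization {u:.0%}: conventional {conventional:.2f}, "
          f"proportional {proportional:.2f}")
```

Under these assumptions the conventional server is well under half as efficient at 30% load as it is at peak, which is the gap that energy-proportional designs aim to close.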