Tuesday, September 22, 2009

Alternative Recommendation for DCeP "Service Productivity"

Back in February of this year, The Green Grid published a paper listing proposed proxy measures for data center productivity, specifically Data Center Energy Productivity (DCeP).

This paper followed an earlier output from the group in 2007, which helped define the now widely used PUE and DCiE metrics that I wrote about back then. Those metrics were (and are) useful if what you care about is the "basic" efficiency of a data center: simply how much power reaches your servers relative to all of the other power being consumed by infrastructure systems (e.g. lighting, power distribution, cooling, etc.). But their shortcoming is that they don't quantify the "useful output" of a datacenter versus the power input. So, for example, you could have a fantastic PUE... but with a datacenter full of idle servers.
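The two ratios are simple to state. A minimal sketch (the 1,500/1,000 kW figures are made up for illustration):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power (>= 1.0; lower is better)."""
    return total_facility_kw / it_equipment_kw

def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """DCiE = IT equipment power / total facility power, i.e. 1 / PUE (higher is better)."""
    return it_equipment_kw / total_facility_kw

# A facility drawing 1,500 kW overall, with 1,000 kW reaching the IT gear:
print(pue(1500, 1000))   # 1.5
print(dcie(1500, 1000))  # ~0.667, i.e. 66.7%
# Note the blind spot: those servers could be sitting idle, and neither
# ratio would change -- nothing here measures useful output.
```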

Again, enter The Green Grid to take the analysis to the next level. The excellent paper published in February details 8 "proxy" approaches (i.e. not necessarily full metrics) that data center operators could use to begin baselining efficiency based on "useful output". The Green Grid also set up a survey through which they have been soliciting feedback from users regarding the appropriateness, usefulness, etc. of these proxies.

Why 8 approaches? Because not everyone agrees on what "useful work output" of a datacenter really is. Should it be Bits-per-kWh (proxy #4)? Weighted CPU utilization (proxies #5 & #6)? Compute units delivered per second (proxy #7)? Each has its pros and cons. Fortunately, the Green Grid recognized that nothing's perfect. Says the paper: "...The goal is to find a proxy that will substitute for a difficult measurement, but that still gives a good-enough indication of useful work completed."

In addition, the Data Center Knowledge blog pointed out:
The new goal is to develop a simple indicator, or proxy, rather than a full metric. The Green Grid compares the proxy to EPA mileage ratings for new cars, which provide useful data on energy efficiency, with the caveat that “your mileage may vary.” The proposals “do not explicitly address all data center devices and thus fall short of a complete overall measure of data center productivity,” the group says.
To this end, the issue was also recently dealt with eloquently in Steve Chambers' ViewYonder perspective on datacenter efficiency, which has the right idea: why not base efficiency on the service provided (as opposed to the CPUs themselves, or some abstract mathematical quantity)? This approach is very similar to what I proposed a year ago February in Measuring "useful work" of a Datacenter.

In short, the proposal is to compare a data center's Services (and the SLAs attached to them) against the power the overall datacenter consumes.

Why use the "SLA" (Service Level Agreement)? Two reasons. (1) The SLA is already part of the vernacular that datacenter operators use. It's easily understood, and frequently well-documented. (2) The SLA encapsulates many "behind-the-scenes" factors that contribute to energy consumption. Take this example: not all 1,000-seat email services are created equal. One may run in a Tier-I data center with a relatively low response-rate requirement, allowing users only 500MB of storage per mailbox. Another enterprise with the same email application may be operating in a Tier-III datacenter with a rigorously-controlled response rate, a full disaster-recovery requirement, and 2GB of storage per mailbox. These two SLAs are quite different and the services will therefore consume different amounts of power. But wouldn't you now rather compare apples-to-apples, to see whether your particular instantiation of those 1,000 mailboxes was more efficient than another enterprise's with the same SLA?
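To make the apples-to-apples point concrete, here is a small sketch of the idea. The field names (`tier`, `mailbox_gb`, `disaster_recovery`) are hypothetical attributes chosen for illustration; a real SLA would enumerate many more terms:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmailSLA:
    # Hypothetical SLA attributes for illustration only.
    tier: str              # e.g. "Tier-I" or "Tier-III" facility
    mailbox_gb: float      # storage allotment per mailbox
    disaster_recovery: bool

@dataclass
class ServiceReport:
    sla: EmailSLA
    mailboxes: int
    monthly_kwh: float

    def kwh_per_mailbox(self) -> float:
        return self.monthly_kwh / self.mailboxes

def comparable(a: ServiceReport, b: ServiceReport) -> bool:
    # Only services delivered under the same SLA are apples-to-apples.
    return a.sla == b.sla

# The two 1,000-seat email services from the example (kWh figures invented):
basic = ServiceReport(EmailSLA("Tier-I", 0.5, False), 1000, 12_000)
hardened = ServiceReport(EmailSLA("Tier-III", 2.0, True), 1000, 40_000)

print(comparable(basic, hardened))  # False: different SLAs, so no direct comparison
print(basic.kwh_per_mailbox())      # 12.0
```

The hardened service consumes more kWh per mailbox, but that alone says nothing about its efficiency; the comparison is only meaningful against another Tier-III/2GB/DR deployment.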

How would such a proxy/measurement be accomplished? The approach is somewhat analogous to the Green Grid's proxy #1 ("Self-assessment reporting"), coupled with peer-reporting/comparison of data as is done with the DOE's DC-Pro tool.

Thus, data centers would:
1) quantify the number of Services and SLAs for each,
2) measure overall power consumed,
3) upload these numbers to a public (but anonymized) database.
After a while, there would be statistically-significant comparisons to be made -- say a "best practice" energy efficiency range for a given Tier-III email application with 2GB storage and disaster-recovery option.
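The aggregation step could be sketched as follows. The record format (an SLA key string paired with kWh per service unit) and the numbers are assumptions for illustration, not a proposed schema:

```python
from collections import defaultdict
from statistics import median

# Hypothetical anonymized uploads: (sla_key, kWh per service unit per month)
reports = [
    ("email/Tier-III/2GB/DR", 38.0),
    ("email/Tier-III/2GB/DR", 45.5),
    ("email/Tier-III/2GB/DR", 41.2),
    ("email/Tier-I/500MB", 11.0),
]

# Group reports by SLA so only like-for-like services are compared.
by_sla: defaultdict[str, list[float]] = defaultdict(list)
for sla_key, kwh in reports:
    by_sla[sla_key].append(kwh)

# A "best practice" range per SLA: best observed value through the median.
for sla_key, values in by_sla.items():
    print(f"{sla_key}: best {min(values)} kWh, median {median(values)} kWh")
```

With enough submissions per SLA key, the spread between the best observed value and the median becomes the benchmark an operator measures themselves against.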

I'm open to other suggestions on how to pragmatically apply application SLAs vs. Watts to gauge overall datacenter energy efficiency - again, my earlier proposal of this is here. But it seems that the SLA encapsulates all of the "output"-related service metrics while remaining agnostic to the actual implementation. Seems elegant, if you ask me.
