Friday, May 30, 2008

More support for server power management

It was a great week for those endorsing the concept of actively managing the power status of servers to drastically reduce data center power consumption.

First, Andy Lawrence of The 451 Group defined it as:
... a technology that optimizes the energy efficiency of computers and other electronic devices by reducing power consumption to the lowest possible point while still supporting minimal agreed business or functional objectives. It does this by placing equipment or components into reduced power states according to policies and schedules or in response to a drop in machine utilization. It can also restore equipment to a fully functioning mode according to the same principles.
In one of his recent works, "Power management: a three-phased, multispeed market," he argues that there are three waves of power management:

1) Enterprise Desktop power management
2) Datacenter/server power management
3) Integrated datacenter power distribution management
"Although this is a market in its infancy, [server] power management will be an important, if not critical, piece of the technological jigsaw, we believe, and one that needs to be in place to run an eco-efficient datacenter. Cassatt is one of a handful of companies making the early running. Cassatt Active Response 5.1 adds a critical on-demand capability that improves the business case for the technology."
Bridget Botelho also managed to speak with a user of Cassatt's Active Response. The company began with a small environment of about eight devices, mostly servers. Once the use of Active Power Management was validated, the company increased to 40 devices, and now successfully runs Active Power Management on about 120 devices.

The global company has 5,000 racks of servers, most of them in Sunnyvale, Calif., and others located around the world. Prior to using Cassatt, the company had no way to manage system power, but the high cost of power in California and a sense of corporate responsibility prompted it to try Cassatt's software.

I really like the way the user (who shall remain nameless) endorses the concept (at right)

Wednesday, May 21, 2008

IT Automation & Power Management: a Synopsis of Recent Coverage

This week has seen an unusual share of coverage about IT energy issues - from a lot of different angles. However, whether or not "green" was stuck to the heading, everyone seems to agree that energy is a serious operational cost issue, and the bottom line for any energy-related initiative is always cost.

Grids & Power Management: Howard Marks of InformationWeek writes of his observation that in storage, MAID (Massive Array of Idle Disks) regularly shuts down idle drives. And in grid arrays he's now seeing RAIN (Redundant Arrays of Independent Nodes) with a similar model -- essentially shutting down servers when not in use.

IT Energy Survey: Erin Bell of eChannelLine referenced a recent survey done by Cassatt regarding the degree of energy waste in data centers, and operators' willingness to look at power management as one option to curb it.

Automation = Efficiency: Gail Dutton in Virtual Strategy Magazine writes of the myth and realities of running a true "lights-out" data center, how automation plays a supporting role, and how various levels of efficiencies can be recovered by doing so... if management is willing to cede control.

Automation & Utility Computing: John Rath in the Data Center Links blog posted some humorous/insightful comments about the current reality of Utility Computing: "Myth? I don't think so. I think it is the early stages, but definitely not a myth. I don't think utility computing, cloud computing, grid computing -- whatever you want to call it -- is a myth. Far from it. The applications of these technologies will be seen in different places, for different reasons and at different paces over the coming years." I couldn't agree with you more, John.

Curbing the Electric Consumption of Avatars: My buddy Dave Ohara on the Green Data Center Blog reminds us of Nicholas Carr's observation that the power consumption of a Second Life avatar is about that of a real-life Brazilian. Sort of tongue-in-cheek, but getting at the point that these computing environments, numbering thousands of servers, also need to be power-controlled during off-peak periods.

And finally,

It's about Efficient Operation, stupid: Arthur Cole of Data Center Central -- bless his heart -- finally makes the observation we've been pointing out for months: it's great to pursue efficient equipment to achieve IT energy efficiency, but to get the rest of the way, you need to pursue Efficient Operation of the equipment, too.

Monday, May 19, 2008

A big step forward for self-managing data centers

Today there is a modest but hugely meaningful announcement regarding data center operational efficiency coming out of Cassatt here in San Jose. It's about Cassatt Active Response 5.1 software, and it includes the ability to take actions on infrastructure based on application demand. That means that server power control, or even entire server repurposing and network provisioning, can be triggered by the demand (or service level) of a given application.

This means that if you run a development lab, idled machines might be automatically powered down after a period of time (say, after a test run) to save power. It means that a server farm can automatically trigger provisioning additional servers if service levels drop below a pre-determined threshold -- saving time. It means as different application demands ebb and flow, the data center will adapt to demand by re-purposing (or retiring) bare-metal hardware -- making the best use of capital. And all of this whether-or-not virtualization is present, regardless of the underlying platform, and without adding software layers.
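The behavior above boils down to a policy loop over the fleet. Here's a minimal sketch of the idea -- to be clear, this is not Cassatt's actual API; the threshold names, server fields, and action strings are all invented for illustration:

```python
# Hypothetical demand-based policy loop. Thresholds and field names are
# illustrative assumptions, not Cassatt Active Response internals.

IDLE_MINUTES_BEFORE_SHUTDOWN = 30   # assumed policy knob
SERVICE_LEVEL_FLOOR = 0.95          # assumed SLA threshold

def apply_policy(server):
    """Decide one action for one server based on its current demand."""
    if server["idle_minutes"] >= IDLE_MINUTES_BEFORE_SHUTDOWN:
        return "power_down"          # dev/test box idle after a run
    if server["service_level"] < SERVICE_LEVEL_FLOOR:
        return "provision_spare"     # SLA dipped: bring capacity online
    return "no_action"

fleet = [
    {"name": "build01", "idle_minutes": 45, "service_level": 1.0},
    {"name": "web07",   "idle_minutes": 0,  "service_level": 0.91},
]
actions = {s["name"]: apply_policy(s) for s in fleet}
print(actions)  # {'build01': 'power_down', 'web07': 'provision_spare'}
```

The key point is that "idleness" and "service level" are whatever metrics the data center manager chooses; the loop itself stays the same.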

You'll find this concept filed under Gartner's Real-Time Infrastructure (RTI) category, under Forrester's Organic IT concept, or sometimes under Utility Computing. You've even seen big vendors predicting it as a vision. But I'm happy to point out that we're doing some of it already today. Think of this as another step toward a greener data center, because it really optimizes all forms of operational and capital efficiency...

The simplest application of this Demand-Based Policy Management is with Active Power Management of data center servers to curb rampant power waste. (Check out Who's Recommending Power Management.) The concept has long been used on desktops (check out 1E or Verdiem; there are over a half-million desktops under power management today). In IT environments such as dev/test, we've seen opportunities to cut gross power consumption by 30% or more in a few months. All this by simply monitoring server activity and then gracefully shutting down idled hardware (where idleness can be defined any way data center managers prefer). Check out the endorsements of power management from the EPA's Andrew Fanara as well as from Jon Koomey in the Cassatt press release. You can also watch a webcast about Active Power Management.

A more sophisticated use of demand-based policy is to automatically maintain the service level of applications. Take the server farm example... or, for that matter, a SOA service. In either example, demand on the service may be cyclical or unpredictable. Instead of massively over-provisioning hardware, one could use a service-level metric (again, of your determination) to provide control. If service level drops (say, due to increased demand, or perhaps because of equipment failure), the Cassatt system will simply power up or re-purpose another piece of hardware and create a new server to increase compute capacity. And you don't need a virtualization platform to do this. (Or you could. Previously we announced compatibility with VMware ESX as well as with Xen.) Check out the 10-minute webcast on demand-based policy management.

The benefits here are massive: Besides the power saved when not using a piece of equipment, you're maximizing the use of capital because it's dynamically repurposed. Similar control is used for Cloud Computing infrastructure like Amazon Web Services (AWS) and they've achieved a compute price-point unheard-of in the industry: $0.10 per CPU per hour. Try that with your existing infrastructure.

My, my. I've almost overlooked the remainder of the Cassatt announcement:
  • a new interface to interact with external systems - either management systems, or equipment like Load Balancers. Take the example of using an F5 load balancer in an environment with dynamic repurposing. As new servers are provisioned, Active Response can communicate in real-time to the balancer and provide the new VIPs in seconds as the servers are brought online.
  • a new set of platform compatibilities, including power distribution units, used to remotely power-manage servers. This is in addition to a massive list of supported hardware, OSs, applications and VM technologies. This solution will work with what you have today :)
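To make the load-balancer integration concrete: as new servers come online, the management layer notifies the balancer of their addresses. Cassatt's actual external-systems interface isn't documented here, so the sketch below models the idea with a hypothetical callback and an in-memory stand-in for the F5:

```python
# Illustrative only: the F5 object and callback are invented stand-ins
# for whatever real-time interface the management system exposes.

def on_server_provisioned(balancer, vip):
    """When a newly provisioned server comes online, register its VIP
    with the load balancer so it starts receiving traffic."""
    balancer["pool"].append(vip)   # in practice: an API call to the F5

f5 = {"name": "f5-lab", "pool": ["10.0.0.11", "10.0.0.12"]}
on_server_provisioned(f5, "10.0.0.13")
print(f5["pool"])  # ['10.0.0.11', '10.0.0.12', '10.0.0.13']
```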
If you want more info, go to the Cassatt website, or tune into some on-demand webcast presentations of the various Policy-Based controls for IT infrastructure.

Saturday, May 10, 2008

Cassatt scoops PARC?

Kudos to Data Center Knowledge for their report that Xerox PARC is developing predictive data center management software that increases efficiency by 30% -- and that it's based on management software for high-speed printers. (While I get the concept, it's a little like saying NASA has breakthrough space shuttle software based on a toaster...) As reported in GreenTechMedia:
PARC has developed software that can reduce servers’ energy usage by 30 percent (or, more likely, allow data centers to provide 30 percent more service using the same energy)... The software basically predicts demand, allowing data centers to prioritize and manage jobs more efficiently... Similar control software could be used to monitor and control electricity demand on the grid or in buildings, he said, but the first application is likely to be increasing data-center efficiency.

Further reported in the C|Net coverage,
In [the] ongoing project, PARC is trying to take the adaptive control systems that effectively manage the inside operations of printers and apply it to controlling data centers. Instead of slowing down the paper feed, for example, the adaptive system might shut down a bank of servers to cool off part of a data center, according to Nitin Parekh, director of business development in the hardware systems group.
Now, I know of a company that is already doing this in practice, and it happens to be where I work: Cassatt, with its Active Response software. While not predictive, it does apply a continuous optimization algorithm to a huge swath of resources -- multivendor servers, OSs, applications, networking hardware. I like to say that it achieves savings via Efficient Operation of equipment, not through efficient equipment itself. And the savings number PARC quotes -- 30% -- is conservative by our own estimates.

Our explanation: Our system's goal is to maintain the service-level of any and all software applications in a large data center. If those levels should change -- because of a shift in demand, or due to an equipment failure -- the software takes action to correct for it. This action could be to provision a new piece of hardware/software and to re-route a network. Or it could mean shutting-down a server if an application is over-provisioned.

But wait: there's more. To take the PARC example above: one could also assign additional variables (we call them custom attributes) to each server or application. These attributes might have to do with temperature, power, or location. So, if in the course of maintaining an SLA for an application, our software finds that a bank of active servers is in a hot part of the data center, it might instead migrate the application to a server bank in a cooler area.
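A toy version of that attribute-driven placement decision: among server banks that still have capacity, prefer the coolest one. The attribute names and temperatures below are invented for the example; real custom attributes would be whatever the operator defines:

```python
# Hypothetical illustration of "custom attributes" for placement:
# choose a server bank for an application, preferring cooler aisles.

banks = [
    {"bank": "A", "free_servers": 4, "inlet_temp_c": 31.0},
    {"bank": "B", "free_servers": 3, "inlet_temp_c": 22.5},
    {"bank": "C", "free_servers": 0, "inlet_temp_c": 20.0},  # no capacity
]

def pick_bank(banks):
    """Among banks with free capacity, pick the one with the lowest
    inlet temperature."""
    candidates = [b for b in banks if b["free_servers"] > 0]
    return min(candidates, key=lambda b: b["inlet_temp_c"])

print(pick_bank(banks)["bank"])  # B -- coolest bank that has capacity
```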

It's a lot cooler than anything inside a high-speed printer. I like to think of it as an operating system - but for an entire data center.

Monday, May 5, 2008

More IT efficiency metrics: McKinsey & Co. data center study

Last week at the Uptime Institute symposium on IT Energy Efficiency, Will Forest of McKinsey & Co. rolled out a major study they've done, "Revolutionizing Data Center Efficiency". BTW, to my surprise, this report was even picked up by the New York Times.

True-to-form, the report is chock-full of really great data. Also true-to-form, it has lots of words... and it has (ahem) yes, yet another new IT efficiency metric.

Getting a bit of negativity out of the way first, I can't say that there was anything "revolutionary" in the report, at least in terms of revolutionary recommendations. However, the recommendations were much more detailed and actionable than what I've seen come out of other such reports. And, building on the seminal 2007 EPA report to Congress on server and data center energy efficiency, this report definitely has indisputable analysis that data centers are inefficient energy hogs... and of where those inefficiencies lie.

Next, a few high-points from the report:

  • 40% of equipment (on average) is in Development/Test, vs. Production
  • Data Center greenhouse emissions will surpass those of airlines by 2020
  • Server utilization is still low; up to 30% of servers are "dead"; average cooling utilization is only 50%
  • The move from mainframes to client/server & multi-tier has exacerbated the problem by creating multiple silos of poor efficiency & utilization -- and encouraging sloppy design
The sources of the crisis suggested by McKinsey include
  1. poor application design
  2. poor power & cooling design
  3. poor capacity management
  4. poor application of efficient design & technology
  5. lack of Sr. executive oversight of operations & TCO
Now, it wouldn't be a high-end consultant's report if it didn't include a new metric. McKinsey's is called the Corporate Average Datacenter Efficiency (CADE) = (Facility Efficiency) x (Asset Efficiency).
- Facility Efficiency is defined as (Facility Energy Efficiency) * (Facility Utilization)
- Asset Efficiency is defined as (IT Energy Efficiency) * (IT utilization)

McKinsey also suggested differing types of initiatives (facility-based and IT-based) and the level-of-impact they would have. Aside from the hypothetical nature of this, I *really* like this chart, because it can begin to help IT & facilities professionals rank where to start. I also like it because it explicitly recommends power management solutions.

Anyway, this all makes perfect sense, except for two (pragmatic) issues:
(1) The industry can't even agree on what units to use for "efficiency" -- in fact, few can even agree on what "useful work" output of a data center even is.
(2) Just when it looked like there was going to be a simplified efficiency metric (i.e. the PUE, or the Power Usage Effectiveness put forth by The Uptime Institute and Green Grid) now there's another one.

Then, add in the fact that McKinsey wants to instill a 5-level rating system, with 5 CADE "tiers" (ranging from 0-5%, 5-10%, 10-20%, 20-40% and >40% efficient). Nice, if you can just agree on units. Oh -- and that's in addition to a rating system already put forward by the Green Grid, and ranking systems being proposed by the Department of Energy's DC-Pro tool, and the EPA Energy Star program that will (in the future) rank data centers.
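To see how the CADE arithmetic and the tier bands play out, here's a worked example using the definitions above. The input percentages are invented purely for illustration:

```python
# Worked example of McKinsey's CADE metric, per the definitions above:
#   Facility Efficiency = Facility Energy Efficiency * Facility Utilization
#   Asset Efficiency    = IT Energy Efficiency * IT Utilization
#   CADE                = Facility Efficiency * Asset Efficiency
# Input values here are made up for illustration.

def cade(facility_energy_eff, facility_util, it_energy_eff, it_util):
    facility_eff = facility_energy_eff * facility_util
    asset_eff = it_energy_eff * it_util
    return facility_eff * asset_eff

score = cade(0.50, 0.80, 0.60, 0.40)  # 0.40 * 0.24 = 0.096, i.e. 9.6%
print(f"CADE = {score:.1%}")  # CADE = 9.6%

# Mapping onto the five CADE bands (0-5%, 5-10%, 10-20%, 20-40%, >40%):
def tier(score):
    bounds = [(0.40, 5), (0.20, 4), (0.10, 3), (0.05, 2), (0.0, 1)]
    return next(t for lo, t in bounds if score >= lo)

print(tier(score))  # 2 -- a 9.6% data center lands in the 5-10% band
```

Note how quickly the multiplication punishes you: four individually mediocre factors compound into a single-digit CADE score, which is exactly the point McKinsey is making.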

Net-net: I'm not dinging McKinsey's desire to throw something unique into the ring. It has merit (I suppose).

In the end, maybe determining the best metric to follow is to take Dr. Ruth's suggestion: Do whatever works for you.