Friday, June 27, 2008

Sanity check: Data center energy summit

As I mentioned in yesterday's entry, the Silicon Valley Leadership Group's (SVLG) energy summit was fantastic - the first time I've seen actual implementations and data from innovations to help with energy efficiency. But on further reflection, there was something missing and rather alarming.

BTW, all of the data was correct, and the conclusions were dead-on - the findings/predictions from the 2007 EPA report on data center efficiency were validated.

However, check out the list of projects in the Accenture report: 9 of them focused on site infrastructure, while only 3 of them focused on IT equipment.

Why weren't more projects aimed at making the IT equipment itself more efficient?

Now, if you look at where power is used in a data center, you'll find that with a "good" PUE, 30% might go toward infrastructure (with 70% getting to IT equipment), and that with a "bad" PUE, maybe 60% goes toward infrastructure (with the remaining 40% getting to IT equipment). In either case, the IT equipment is chewing-up a great deal of the total energy consumed... and yet only 1/4 of the projects undertaken had to do with curbing that energy.
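To put numbers on those splits: PUE is total facility power divided by IT equipment power, so the share of power reaching IT equipment is simply 1/PUE. Here is a minimal Python sketch of that arithmetic. The function name is my own, and the 70% and 40% figures are the illustrative ones above, not measurements from any site.

```python
def pue_from_it_share(it_share: float) -> float:
    """Return PUE given the fraction of total facility power that reaches IT equipment.

    PUE = total facility power / IT equipment power, so PUE = 1 / it_share.
    """
    if not 0 < it_share <= 1:
        raise ValueError("it_share must be in the range (0, 1]")
    return 1.0 / it_share


# Illustrative splits from the paragraph above (not measured data).
for label, it_share in [("good", 0.70), ("bad", 0.40)]:
    print(f"{label}: IT share {it_share:.0%} -> PUE {pue_from_it_share(it_share):.2f}")
# good: IT share 70% -> PUE 1.43
# bad: IT share 40% -> PUE 2.50
```

Either way, the IT load is the denominator: every watt saved on the IT side also shrinks the infrastructure overhead that scales with it.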

This is like saying that a car engine is the chief energy-consuming component in a car, but that to increase gas mileage, scientists are focusing on drive trains and tire pressure.

My take is that the industry is addressing the things it knows and feels comfortable with: wires, pipes, ducts, water, freon, etc. Indeed, these are the "low-hanging fruit" of opportunities to reduce data center power. But why aren't IT equipment vendors addressing the other side of the problem: Compute equipment and how it's operated?

IT equipment is operated as an "always-on" and statically-allocated resource. Rather, it needs to be viewed as a dynamically-allocated, only-on-when-needed resource. More of a "utility" style resource. This approach will ultimately (and continuously) minimize capital resources, operational resources and (by association) power, while always optimizing efficiency. It is where the industry is going -- what's termed as cloud computing. This observation cuts directly to Bill Coleman's keynote (video here) earlier this week at O'Reilly's Velocity conference. It also alludes to Subodh Bapat's keynote where he outlined a continuously-optimized IT, energy, facilities and power grid system.

I certainly hope that at the SVLG's data center energy summit '09 next year, more projects focus on how IT equipment is operated, rather than on the "plumbing" that surrounds it. I can't wait to see the efficiency numbers that emanate from an "IT Cloud" resource.

Thursday, June 26, 2008

Real data center efficiency project results

I spent the day at the SVLG data center energy summit, where 11 actual case studies were presented, each detailing how companies and their technology partners addressed energy efficiency, while quantifying the specific results. For all of the hubbub with industry associations talking about theory, metrics, standards and efficiency, today marked the first time someone was actually doing something. And, representing Cassatt's efforts, I was happy to provide input and real-life user data as well.

We had about 300 people in the room, including press, analysts, industry and government. It was a full day of presentations and reviews of various types of projects that were undertaken, as well as what the outcomes were. The results given during the day were then collated by Teresa Tung of Accenture Technology Labs, and compared against the results posited by the 2007 EPA report to Congress on data center energy efficiency. The results matched up quite well (even Andrew Fanara, EPA Energy Star's project director, breathed a sigh of relief that his report in fact matched reality).

The complete set of session reports, as well as the Accenture report, are already on the web for all to see and learn from. That others could learn from these pioneers was part of the goal of the day.

The first session of the day featured IT resource optimization. Synopsys outlined how they consolidated data centers and saved power, square feet and more. Cassatt (presented by yours truly) then described how Active Power Management helped a nearby company cut their power bill by over 20% with innovative power management software.

The day also had a number of sessions sponsored by Lawrence Berkeley Labs, Oracle, the USPS, Power Assure and others, focussing on air flow management, cooling, power distribution projects and more.

Over lunch, we were treated to a CIO panel that included Teri Takai, California's CIO -- overseeing more than 120 other CIOs of state agencies (how does it feel to be CIO for the earth's 7th largest economy?) plus PK Agarwal, California's CTO.

I also sat in on a session given by Dave Shroyer of NetApp, Bill Tschudi of Lawrence Berkeley National Labs, and Ray Pfeifer -- covering NetApp's ultra-efficient data centers. Dave pointed out that during certain seasons of the year, NetApp achieves a PUE of 1.19!! That's probably the best I've heard of ever.

Finally, at the end of the day, Accenture presented its report findings, summarizing the results of the 11 projects. Receiving the report were Andrew Fanara of the EPA, Paul Roggensack of the California Energy Commission, Paul Scheihing of the DOE, Bill Tschudi of Lawrence Berkeley Labs, and Kevin Timmons of Yahoo. I suppose you could call this peer-review. But what it signals to the industry is that people are finally taking action, comparing reality against theory, and sharing their pragmatic best-practices. Crazy as this sounds, I feel it was a seminal event, and hope that other industry associations follow suit.

I'm sure you'll see additional coverage of this by other online folks I met there: Debra Grove of Grove Associates, as well as Rich Miller of Data Center Knowledge.

Wednesday, June 25, 2008

Energy, data centers, and the cloud

It's been a pretty information-loaded week regarding energy-management issues within the data center.

The first was the keynote Bill Coleman gave at O'Reilly's Velocity Conference earlier this week. He addressed why the current trends (point-products, resulting complexity) in data centers are unsustainable, and why economies-of-scale are declining. His answer: the cloud. He talks about cloud 1.0 (where we are today), cloud 2.0 (more sophisticated, may even replace the PC), and cloud 3.0, where it really represents the "webtone" of services available on-demand, reliable, and composable. And the more efficient these economies-of-scale become, the more energy efficient they become as well. Data Center Knowledge covered that as well, with a few relevant pointers to boot.

The other nifty reference was in Microsoft's TechNet Magazine. There, Dave Ohara pointed out similar observations from a number of sources -- that turning off unused assets is a path to sustainable computing. He gave a few examples, including:

  • Weill Medical College's HPC clusters leveraging IPMI for node shut-down
  • Citrix' approach to using their PowerSmart utility to power-down idle HP-based presentation servers
  • Microsoft's "Power-Aware Server Provisioning and Load Dispatching for Connection-Intensive Internet Services"
  • Commercially-available solutions, such as Cassatt's
With this mounting set of options, will power management continue to gain in popularity?

Tuesday, June 24, 2008

Postcards from Gartner's IT Infrastructure, Operations & Management Summit 2008 (Day 2)

Today's presentations were almost entirely about virtualization with generous servings of analysis of cloud computing.

Thomas Bittman
Thomas Bittman opened with a keynote on Gartner's predictions on cloud computing, and the likely march the industry will take over the next 5+ years toward this eventuality.

Gartner's predictions are that there will be many varieties of cloud computing, from the AWS-style of raw hardware, up through various types of service providers of platforms, services, and even component services that will be wired-together by other types of providers.

Bittman even went so far as to suggest that service "brokers" could emerge. For example, you've established an SLA with a cloud computing service provider, and for whatever reason, that SLA isn't met (maybe AWS has another glitch). Instantaneously, your broker finds another compatible cloud and "fails-over" instantly to that provider.

Gartner's sense was that there will likely be a few "mega" providers (AWS, Salesforce, Google, MSFT, others) and then hundreds/thousands of smaller mid-market and specialty providers... not unlike the evolution of the hardware market today. And on that note, they also predicted that the hardware providers (like Dell) will probably get into the hardware-as-a-service market shortly as well. That should be interesting to watch.

Cameron Haight
Next, Cameron Haight spoke about emerging "virtualization standards."

He made the very reasonable assumption that users will want to manage multiple VM technologies using a single tool. (And, with a straw-poll, the audience conclusively agreed).

A few of the initiatives already underway include:

* DMTF (Distributed Management Task Force) is already working on draft specifications for interesting standards... not for VMs, but for properties that would aid in management -- such as a virtual system profile (i.e. for re-creating a set of VMs), and a resource allocation capability profile (i.e. for monitoring and managing VM hardware resources like CPU, memory, network ports, storage, etc.).

* Also, an Open Virtualization Format (OVF) is underway. This isn't a standard for VM files. Rather, this would tag VMs with metadata to ID them, say for packaging/distribution. For example, it would help characterize what's "inside" a VM before powering it on. My suspicion is that this could be the foundation for a "common" type of SW container, and a common approach to monitoring/managing such VMs. But I also suspect that the vendors will either (a) fight this tooth-and-nail, or (b) adopt it, but "extend" it to suit their needs...

Cameron also ran a few interesting audience polls during his session. What follows are some notes I took, but I believe he'll probably publish the full results in a forthcoming research note:

Q: What VMware products are you currently using?
71% VirtualCenter
36% Update manager
31% Site recovery manager
23% Lab manager

Q: What do you think is most important for VMware to focus on?
30% optimization of VM performance
18% VM sprawl
16% Maintenance/patching
12% Accounting/chargeback
11% Root-cause analysis

Yep, these pass my sanity check. It should be quite interesting to see what VMware's next moves will be.

Monday, June 23, 2008

Postcards from Gartner's IT Infrastructure, Operations & Management Summit 2008

Here I am in Orlando at the Gartner IT conference. The day's been insightful and validating, if you happen to be in the Real-Time Infrastructure (RTI) business.

Andy Kyte
The opening keynote was from Andy Kyte, Gartner VP and Fellow. He's a dynamic speaker, and focussed mostly on IT Modernization and strategy. He was careful to define strategy/strategic planning -- and accused just about every IT management organization of buying "puppies". That is, most orgs buy products for cute "here-and-now" reasons, without realizing that they're really signing up for a 15-year-long relationship with a dirty, hairy, high-maintenance and expensive pet. His point was validated when he pointed out all of the point-products that organizations have purchased that essentially only add to cost & complexity, rather than reduce it. He posited that more products have to be purchased with a long-term (7+ years) strategic vision, and that short-term economic validation was often to blame for the morass that IT finds itself in.

Donna Scott
Next was Donna Scott, speaking about IT Ops management trends -- and later on in the day, speaking about IT modernization and RTI. She led off with a list of projects that enable business growth ... and argued that projects that don't enable business growth should be canceled.

But most interesting was her coverage of the "cloud" which she (and Thomas Bittman, next) predicted would be where IT is evolving. She suggested that IT ops will evolve into an "insourced hosting" model - where IT departments will be building "internal cloud-computing" style infrastructures to support business owners. We here at Cassatt salute you, since that's what we enable :)

What was also cool about Donna's presentations were her many polls of the audience (probably 1,000 plus). Her first question was "what grade would you give leading IT management providers?" 70% of respondents gave them (CA, BMC, HP and IBM) a "C" or worse. Her conclusion was that they still don't manage complexity (they may monitor it, though), they still support the point-solution mentality, and most focus on single homogeneous platforms.

Finally, and most validating, Donna listed the chief properties/components of a Real-time infrastructure system... which she feels is practically on the market. Her list:
  • IT services provisioning
  • IT services automation (starting & stopping applications as-needed)
  • Process automation & change management
  • Dynamic Virtualization management
  • Services optimization
  • Performance management, capacity management.
Personally, it sounds like a pretty familiar list. She outlined what RTI could enable; the list was also pretty familiar:
  • Service virtualization management
  • J2EE management
  • Oracle RAC management
  • Disaster Recovery - sharing & re-configuring assets
  • Managing a shared test environment
  • "loosely-coupled" HA - replacing failed nodes
  • Dynamic Repurposing nodes
  • Dynamic capacity on demand / capacity expansion

Thomas Bittman
Thomas gave a great talk on "Virtualization changes virtually everything"... and essentially outlined the path the industry will likely take towards cloud computing. He pointed out where "automation" is going wrong today... that "automation" tools are focusing on components, rather than on service levels. Until they focus on service levels, IT will continue down its complexity path.

Then he hit on a concept that will IMHO be the next big thing: The Meta-O/S. Think of it as an O/S for the data center -- the O/S that enables RTI. For example, what if you started with VirtualCenter, made it work with any VM technology (Xen, MSFT, etc.), made it manage physical/native resources as well, and finally abstracted away the rest of your physical infrastructure? Then, what if it could be told to optimize resources for application service levels, and/or to minimize power or capital or some combination at all times?

We're probably closer to this vision than you think - and the more industry is comfortable with sharing resources, and more dissatisfied with vendor point-solutions, the more it will be accepting of this meta-O/S concept.

I sometimes use this analogy:
What if you walked into a data center and were told to manage it -- 10,000 servers, 100 different HW models, 5,000 applications, various O/S flavors and revs, multiple networks, etc. etc. Well, you *wouldn't* tell me that you'd hire 200 sysadmins, buy multiple software management tools and analysis packages, set up complicated CMDBs and change-management boards, and buy a bunch of pagers for after-hours fire-drills. But that's how it's done today.

Rather (given a clean slate) you'd say "I'd get a computer to figure out how and when to run applications, and to govern what software was paired with what hardware when. It would prioritize resources, and continually optimize overall operating costs." That's the rational approach. And that's what the Meta-O/S will do.
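To make "pairing software with hardware" a bit more concrete, here's a toy Python sketch of one possible placement policy: first-fit-decreasing bin packing, which squeezes application loads onto as few servers as it can so the remaining machines could be powered off. This is purely my own illustration, not anything Gartner presented or any vendor's actual algorithm; the function name, the app names and the load numbers are all made up, and a real Meta-O/S would have to weigh service levels, priorities and much more.

```python
def pack_first_fit_decreasing(app_loads: dict[str, int], capacity: int) -> list[dict[str, int]]:
    """Assign apps to servers using first-fit decreasing.

    app_loads maps a (hypothetical) app name to its load, in percent of one server.
    Returns a list of servers, each a dict of the apps placed on it.
    """
    servers: list[dict[str, int]] = []
    # Place the largest loads first; this tends to leave fewer half-empty servers.
    for app, load in sorted(app_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"{app} does not fit on a single server")
        for server in servers:
            if sum(server.values()) + load <= capacity:
                server[app] = load
                break
        else:
            # No existing server has room, so bring another one online.
            servers.append({app: load})
    return servers


# Hypothetical loads, in percent of one server's capacity.
loads = {"web": 50, "db": 70, "batch": 30, "mail": 20, "cache": 30}
placement = pack_first_fit_decreasing(loads, capacity=100)
print(len(placement), "servers needed:", placement)
```

With these made-up numbers, five applications fit on two fully-loaded servers, and anything beyond those two is a candidate for shut-down until demand rises again.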

Thursday, June 19, 2008

A day with the California Energy Commission

Today I spent in Sacramento (yes, it was close to 100 degrees) with the commissioners of the CEC who presided over the Efficiency Committee Load Management Standards Workshop on Enabling Technologies.

The meeting was overtly to talk about "Demand Response" technologies -- that is, technologies that can temporarily reduce commercial, industrial and/or residential power loads during critical periods of the day (especially during the summer). This is important to "level-out" the demand placed on the electrical grid - and avoid having to over-build generation/transmission capabilities.

Why was this so interesting? Three reasons...

One was that it was a great opportunity to hear what officials from the California Public Utilities Commission (CPUC), the California Independent System Operator (California ISO), Lawrence Berkeley Labs, PG&E, Southern California Edison, San Diego Gas & Electric, and Sacramento Muni had to say about technologies they're considering.

Two was the opportunity to see a number of vendors (Cassatt included) present and demonstrate some cool ideas for curbing industrial and residential loads -- smart/programmable thermostats etc., not to mention the various wired, satellite, WiMAX, FM, and other really innovative ways for signaling "events" to these devices over wide areas (think: states).

Three (and most interesting to this author) was that the majority of "new" technologies were focused on the residential market -- smart thermostats and the like. To my surprise, essentially none of the technologies addressed the 60 gigawatt-hours of energy being consumed by data centers, or ways to either permanently curb this number or at least curtail it during "demand response" events.
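What might curtailing a data center during a demand response event look like? Here's a rough Python sketch, strictly hypothetical: when the utility signals an event asking for a given reduction, shed the least important servers first until the target is met. The server names, wattages and priority scheme are invented for illustration, and this is not a description of any product or utility program.

```python
def select_servers_to_shed(servers: list[tuple[str, int, int]], target_watts: int) -> tuple[list[str], int]:
    """Pick servers to power down during a demand response event.

    servers: (name, power_watts, priority) tuples; a LOWER priority number means
             the server is less critical and gets shed first.
    Returns the names chosen and the total watts shed, which may fall short of
    the target if there are not enough servers to shed.
    """
    shed: list[str] = []
    saved = 0
    for name, power_watts, _priority in sorted(servers, key=lambda s: s[2]):
        if saved >= target_watts:
            break
        shed.append(name)
        saved += power_watts
    return shed, saved


# Hypothetical fleet: batch and test machines are the least critical.
fleet = [
    ("db-01", 500, 5),
    ("batch-01", 400, 1),
    ("web-03", 350, 3),
    ("test-07", 300, 1),
]
names, watts = select_servers_to_shed(fleet, target_watts=600)
print(names, watts)
# ['batch-01', 'test-07'] 700
```

The same selection logic could be run every night or weekend rather than only during utility events, which is the "permanently curb" half of the argument.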

I'll be following-up with attendees, including officials from the major California utilities (who drive their Demand Response programs) to see how we can help reduce the energy consumption and wasted power in large data centers.

Monday, June 9, 2008

Rate your data center efficiency ! !

Two very noteworthy announcements in the past few weeks for those concerned with diagnosing how energy efficient their data centers are. And (surprisingly) these announcements and tools are coming from the US government which, in my opinion, is out in front of industry for the first time.

First, the US DOE is now beta-testing its "DC-Pro" (Data Center Profiling) tool as part of its "Save Energy Now" program. The beta is currently available here, with a really complete FAQ available as well. This tool is great in that it asks for tons of information (an education in-and-of itself), yields an anonymous ranking of your data center relative to peers, and also makes suggestions as to where else you can seek additional efficiencies in your environment.

Second, and in conjunction with the DOE, the US EPA is piloting its Energy Star rating for data centers, much the way they have Energy Star ratings for buildings/structures. The EPA is extending sign-up for initial participants in the study to July 1, 2008. Information, expression-of-interest forms, etc. are available on the EPA Enterprise Server and Data Center Efficiency Page. Cassatt, among others, will be part of this study -- which also ranks efficiencies of data centers and awards the Energy Star label to the top quartile of annual participants. Note that this is less a diagnostic tool (as is the DOE DC-Pro tool) and more of a recognition of performance.

My suggestion is to participate in both... If for no other reason than it will *cause* your organization to begin to ask questions, take measurements, and begin to learn tactical/tangible sources of efficiency improvements -- rather than simply discuss abstract concepts.

Friday, June 6, 2008

Real-life data center efficiency projects!

It's about time the industry stopped talking about data center & IT energy efficiency, and started doing something.

The good news is the first wave of actual projects, metrics and results is about to be unveiled. The Silicon Valley Leadership Group (SVLG) is hosting their first Data Center Summit '08 in Santa Clara, CA on June 26. The 1-day event will be co-sponsored by the California Energy Commission, the US DOE, and Lawrence Berkeley National Labs.

The summit will have 11 case studies of actual projects - each co-sponsored by the SVLG, but implemented by one or more Silicon Valley companies & technology vendors. Each project addresses a different aspect of data center energy efficiency - from power management to cooling improvements to server efficiency increases. The point is to show real progress, and to share the data with others.

I believe that California's own CIO will be attending, along with various other officials, and possibly folks from the EPA as well.

Cassatt, I might add, will be co-presenting a real-life energy-saving project with a user of our Active Power Management technology.

Registration is a mere $295 - you can go to the SVLG website and sign up yourself.

Thursday, June 5, 2008

Why IT monitoring is costing the industry money

I was giving a presentation to an analyst today, describing how an optimized IT infrastructure is inherently energy efficient.

And then it occurred to me: The entire IT monitoring and reporting sector (those guys who write software that pages you when something goes wrong in your data center) is perpetuating waste.

The software assumes that there's a problem only when service levels fall below what the service level agreements (SLAs) call for -- but never when they are far above it. This implies that alert storms get triggered when you're under-provisioned. But when you're over-provisioned, it's bad too... too much capital being wasted delivering a service level that's better than needed. This scenario is probably replayed during every off-peak hour a data center operates.

What you don't measure, you can't manage. And therein lies the waste being perpetuated by IT: it's been implicitly assumed that too much infrastructure is OK.

Actually, what we need is a monitoring and control system that maintains an optimal service level -- not too high, not too low. And, when demand changes, it should automatically adjust resources to re-optimize the service level. That adjustment might include re-allocating or de-allocating hardware, or re-provisioning servers on-the-fly.
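The "not too high, not too low" idea boils down to a two-sided check instead of the usual one-sided threshold. Here's a minimal Python sketch of what I mean. The metric (spare capacity in percent), the band values and the alert wording are all assumptions of mine for illustration, not how any particular monitoring product works.

```python
from dataclasses import dataclass


@dataclass
class ServiceLevelBand:
    """Acceptable band for a 'higher is safer' metric, e.g. spare capacity in percent."""
    low: float   # below this, the service is under-provisioned
    high: float  # above this, capacity (and power) is being wasted


def check(band: ServiceLevelBand, headroom_pct: float) -> str:
    """Return an alert if the measurement falls outside the band on EITHER side."""
    if headroom_pct < band.low:
        return "ALERT: under-provisioned, add capacity"
    if headroom_pct > band.high:
        return "ALERT: critically over-provisioned, wasting power"
    return "OK"


# Hypothetical band: keep between 15% and 40% spare capacity.
band = ServiceLevelBand(low=15, high=40)
for headroom in (5, 25, 70):
    print(f"{headroom}% headroom -> {check(band, headroom)}")
```

A conventional monitor only implements the first `if`. The second one is the alarm the industry is missing, and in a closed-loop system it would trigger de-allocating or powering down hardware rather than just paging someone.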

Just once, I'd like one of my IT friends to get an alarm delivered to his pager that reads "system critically over-provisioned: wasting power".