Monday, July 21, 2008

Postcards from the SF Datacenter Dynamics meeting

This was an awesome (and pretty intense) 1-day show in San Francisco this past Friday, covering the current crop of IT operations and energy-efficiency topics. It's one of a number of local and international shows run by the Brits who also publish ZeroDowntime, and it was one of their largest - they claim it drew ~800 folks, almost all of whom are directly involved in operating end-user datacenters. I definitely recommend attending one in your area.

I had great conversations with some of the authorities, and attended a handful of sessions that included the US DOE, a panel on datacenter econometrics, an end-user panel regarding datacenter automation, and some vendor presentations regarding upcoming technologies.

US DOE
This was definitely the most newsworthy session (see my previous blog entry). The DOE has lately been piloting its data center assessment tool, "DC-Pro" - and its primary partner, Lawrence Berkeley National Laboratory (LBNL), gave a walk-through of the tool, plus a roadmap of the overall goals and roll-out plans from now through 2011. I've now spoken with Bill Tschudi of LBNL a number of times; he's optimistic that the DOE will hit its goals for the tool, and that thousands of data center operators will make use of it in the next year or so.

IT Econometrics panel
Early in the morning there was also a decent panel covering "green data center econometrics," essentially diving into a number of cost topics frequently overlooked in analyses. On the panel were Jon Haas of Intel, Mark Honeck of Qimonda (a big DRAM manufacturer), and Winston Bumpus representing the Green Grid. Everyone agreed to "measure first," the same mantra that came out of the Uptime Institute earlier this year... in other words, measure power, temperatures, airflows and economics first, so as to establish a baseline and a quantifiable goal for improvement. The other conclusion I'm happy they reached was to pursue projects that can get done quickly and show real benefits - pursue tactical initiatives first.

Lastly (and by virtue of who was on the panel) came an interesting conclusion having to do with server power consumption: memory is a *huge* power hog, made even worse by the move toward virtualization, which typically requires a large memory upgrade when consolidating servers. One finding was that not all memory is created equal, and not all configurations consume equal power (sixteen 1GB DIMMs can consume roughly four times as much power as two 8GB DIMMs).
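Just to show the shape of that math (the per-module wattages below are hypothetical; they're only meant to illustrate how module count dominates for the same total capacity):

```python
# Back-of-the-envelope only -- the per-DIMM wattages are invented, just to show
# how module count drives the difference for the same 16GB total.
many_small = 16 * 5.0   # sixteen 1GB modules at ~5 W each  -> 80 W
few_large = 2 * 10.0    # two 8GB modules at ~10 W each     -> 20 W
print(many_small / few_large)  # -> 4.0
```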

On Datacenter Automation
This was a fantastic session given jointly by Cisco and OSIsoft. Cisco's primary speaker is in charge of their global lab compute capacity, and is trying to consolidate something like 200 separate labs around the globe. He clearly understood the organizational differences between Facilities & IT operations, and the need to bridge that gap - otherwise no useful efficiencies could be realized. Further, he predicted (if not outright demanded) that IT automation systems (those that govern compute, power and cooling resources) ultimately be integrated with building automation systems. He went a step further still and posited that he'd like to see automation systems that interact globally: that is, he'd like to be able to dynamically push (compute) load to locations where capacity is economical -- a "follow-the-moon" strategy. This inverts the traditional approach whereby one pushes cooling to where the hot spots are; instead, push compute loads to where the cooling (and floorspace) is. This form of automation is right up my alley :) I'm happy to see other industry leaders as proponents.
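Here's a toy sketch of what a "follow-the-moon" placement decision might look like (the sites, costs, and capacity numbers are completely made up):

```python
# Toy "follow-the-moon" placement: send new workload to whichever site
# currently has cheap power AND enough free capacity. All data is invented.
sites = [
    {"name": "San Jose",  "kwh_cost": 0.14, "free_racks": 2},
    {"name": "Amsterdam", "kwh_cost": 0.09, "free_racks": 10},  # off-peak there right now
    {"name": "Bangalore", "kwh_cost": 0.11, "free_racks": 0},
]

def place_workload(racks_needed):
    candidates = [s for s in sites if s["free_racks"] >= racks_needed]
    return min(candidates, key=lambda s: s["kwh_cost"])["name"] if candidates else None

print(place_workload(2))  # -> 'Amsterdam' (cheaper power than San Jose, capacity available)
```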

I also had a chance to speak at length with Paul Marcoux, Cisco's VP of green engineering. (An interesting proposal from him here). He very much believes that the US will face carbon emissions capping/trading in the next few years... after similar schemes are instituted by the EC and others. Ergo, Cisco is taking the lead on comprehensive sustainability initiatives - and if you look at the number of sustainability organizations they're involved in, you have to believe it.

Exhibitors
Aside from the sponsors/exhibitors, there were very few vendors at the show, and lots of time to interact & network with local peers -- something that's invaluable, and something I heard time and again from attendees who've been to past shows. What was also great was that most (but not all) of the sponsor pitches were truly educational, with only a minority being "commercials" for product.

Summary
This is a great show for data center managers to attend; it's only one day out of your schedule, and because it visits 7 US cities, minimal travel is usually involved. They've also got an international perspective because they visit 20 other cities around the globe.

Sunday, July 20, 2008

A look at DOE's new Datacenter Profiling Tool

This past Friday I got a good look at the DOE's new DC-Pro (Datacenter Profiler) tool, while at the San Francisco Datacenter Dynamics conference. The overview and demo was driven by Bill Tschudi and Paul Matthew of Lawrence Berkeley National Laboratory (LBNL), who've been instrumental in helping the DOE construct it as part of its Save Energy Now program (the beta release was back in June). What's really unique is that the DOE, LBNL and the EPA have all collaborated to create this - the first time I've ever seen the government take the lead ahead of industry in creating such useful IP for the private sector. An excellent presentation about this initiative is at the top of this DOE page.

This tool absolutely complements the excellent work being done by the EPA to create an Energy Star rating for datacenters (where your facility will get the label if it falls into the top quartile of facilities for that year), as well as the datacenter metrics work being done by the Uptime Institute and The Green Grid (of which Cassatt is a member).

The DC-Pro tool has (and will have) a number of useful outputs:

  • Ability to help you track DCiE over time
  • Outline of end-use energy & resource breakouts
  • Estimates of energy-reduction potential (and an idealized DCiE)
  • Specific areas identified for improvement (e.g. power sources, HVAC, IT)
What's immediately clear after logging in to the web-based tool is the amount of depth/thought that went into the questions it asks, with context-sensitive pull-down menus, etc. It asks for so much data, in fact, that I absolutely recommend users download the checklist in advance to collect all the data they'll need to gather. Just to illustrate the thought that went into it, the tool even cares about your geography and zip code, because the carbon content of electricity varies across different parts of the country. You'd better be prepared to team up with your IT and Facilities counterparts to complete the data collection.

But what's also great is that the tool generates immediately useful data, such as your DCiE/PUE, and it ranks your measurements relative to those of others who have used the tool so you can compare with peers (all of which is anonymized so that no proprietary data is revealed). In this way you can also track how improvements affect efficiency, both for yourself and against peer groups.
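For anyone new to the metric: DCiE is simply IT power divided by total facility power (PUE is its reciprocal). A quick sketch with invented numbers, just to show the calculation DC-Pro automates for you:

```python
# The kW figures are invented; DC-Pro derives these from the data you enter.
it_load_kw = 700.0
total_facility_kw = 1150.0  # IT load + UPS/transformer losses + cooling + lighting, etc.

dcie = it_load_kw / total_facility_kw
pue = 1 / dcie
print(f"DCiE = {dcie:.2f}, PUE = {pue:.2f}")  # DCiE = 0.61, PUE = 1.64
```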

Bill also outlined the roadmap for the tool (it's not entirely complete yet), which is impressive and aggressive. By 2011, they'd like to see:
  • 3,000 data centers will have completed awareness training through classes or webcasts via DOE partners
  • 1,500 mid-tier and enterprise-class data centers will have applied the Assessment Protocols and Tools to improve data center energy efficiency by 25% (on average);
  • 200 enterprise-class data centers will have improved their energy efficiency by 50% (on average) via aggressive measures such as accelerated virtualization, high-efficiency servers, high-efficiency power systems (e.g., fuel cells), optimized cooling, and combined heat and power systems
  • 200 Qualified Specialists will be certified to assist data centers
Also on the roadmap are individual components that will make recommendations for improvements. Most of these sections - Air Management, HVAC, Power Chain, and IT - are due to be integrated in the September '08 timeframe. DOE and LBNL are reaching out to external industry groups for input on these sections. It's my belief that these components will be quite comprehensive, and will point toward some highly aggressive options & technologies that datacenter operators can leverage.

The mantra I hear from everyone I speak with is "measure, measure, measure" (you can't control what you can't measure). But finally, someone has developed a tool into which to dump your measurements, and with which to compute your absolute and relative progress in becoming more energy-efficient.

Kudos!

Wednesday, July 16, 2008

Is an "internal" cloud an oxymoron?

By definition, "the cloud" lies external to the enterprise data center. And it's got great properties: in an Infrastructure-as-a-Service example, per-CPU operating costs are on the order of $800/year (see Amazon's EC2 price list), whereas CPU operating costs are typically $3k-$5k/year in the average data center.

It seems to me that the industry has become overly fixated on hosted clouds (IaaS, PaaS, SaaS, etc.) that are run by third parties with all of those nice economies of scale.

But what about implementing an "Internal" cloud inside of corporate data centers? John Foley of InformationWeek just raised this question in talking about Elastra.
"[they are] working on a version of Cloud Server for data center VMware environments, or what it refers to as 'private clouds.' That's an oxymoron since cloud computing, by definition, happens outside of the corporate data center, but it's the technology that's important here, not the semantics."
Semantics aside, what properties would an ideal "internal" cloud have? Clearly the same economics as a "traditional" cloud, but with some added benefits to avoid the current pitfalls of external clouds. The improved properties include --
  • Should work with existing physical & virtual resources in the data center (heterogeneous platforms & O/S's)
  • Should let you specify whether your apps are virtualized or not (but either way, provide capacity-on-demand)
  • Wouldn't require that sensitive data be hosted outside the enterprise; it would maintain internal auditability
  • Ought to adhere to internal security and configuration management processes
  • Would not disrupt existing software architectures
  • Would allow you to add additional capacity (compute resources) on-the-fly
  • Could be segregated to support both production & development environments
  • Would provide internal metering & billing for internal users and business units
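To make that last bullet concrete, here's a deliberately simple, hypothetical sketch of internal metering/chargeback (business units, rates and usage are all invented):

```python
# Hypothetical internal chargeback: meter usage per business unit, bill at an internal rate.
RATE_PER_CPU_HOUR = 0.08  # invented internal "price"

usage = [
    {"bu": "Marketing",   "cpu_hours": 1200},
    {"bu": "Engineering", "cpu_hours": 9800},
]

for record in usage:
    bill = record["cpu_hours"] * RATE_PER_CPU_HOUR
    print(f'{record["bu"]}: ${bill:,.2f} this month')
# Marketing: $96.00 this month
# Engineering: $784.00 this month
```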
Maybe an "internal cloud" needs a new name, but what it represents is essentially the basis of Utility Computing. Check out the Cloudy Times blog, that also references the potential of an Internal Cloud.

I'm guessing that, as cloud computing gains steam, IT organizations will want the same properties internally - through implementing an "internal" cloud leveraging utility computing infrastructure.

ProductionScale recently ruminated on this topic (calling it a private cloud, instead of internal):
"What is private cloud computing? To make a non-technical analogy, Private Cloud Computing is a little like owning your own car instead of using a rental car that you share with others others and that someone else owns for your automobile and transportation needs. Rental cars haven't completely replaced personal automobile ownership for many obvious reasons. Public Cloud Services will not likely replace dedicated private servers either and will likely drive adoption of private cloud computing".
Working for Cassatt, I'm biased toward believing that a market for Internal Cloud infrastructure providers will emerge.... and potentially help enterprises dovetail their internal clouds with public clouds. Any other opinions?

Friday, June 27, 2008

Sanity check: Data center energy summit

As I mentioned in yesterday's entry, the Silicon Valley Leadership Group's (SVLG) energy summit was fantastic - the first time I've seen actual implementations and data from innovations to help with energy efficiency. But on further reflection, there was something missing and rather alarming.

BTW, all of the data was correct, and the conclusions were dead-on - the findings/predictions from the 2007 EPA report on data center efficiency were validated.

However, check out the list of projects in the Accenture report: 9 of them focused on site infrastructure, while only 3 of them focused on IT equipment.

Why weren't more projects aimed at making the IT equipment itself more efficient?

Now, if you look at where power is used in a data center, you'll find that with a "good" PUE, 30% might go toward infrastructure (with 70% getting to IT equipment), and that with a "bad" PUE, maybe 60% goes toward infrastructure (with the remaining 40% getting to IT equipment). In either case, the IT equipment is chewing up a great deal of the total energy consumed... and yet only 1/4 of the projects undertaken had to do with curbing that energy.
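Put in PUE terms (PUE being total facility power divided by IT power), those two splits work out like this:

```python
# Converting the "good"/"bad" splits above into PUE figures:
for it_fraction in (0.70, 0.40):
    print(f"IT gets {it_fraction:.0%} of total power -> PUE ~ {1 / it_fraction:.2f}")
# IT gets 70% of total power -> PUE ~ 1.43
# IT gets 40% of total power -> PUE ~ 2.50
```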

This is like saying that a car engine is the chief energy-consuming component in a car, but that to increase gas mileage, scientists are focusing on drive trains and tire pressure.

My take is that the industry is addressing the things it knows and feels comfortable with: wires, pipes, ducts, water, freon, etc. Indeed, these are the "low-hanging fruit" of opportunities to reduce data center power. But why aren't IT equipment vendors addressing the other side of the problem: Compute equipment and how it's operated?

IT equipment is operated as an "always-on," statically-allocated resource. Instead, it needs to be viewed as a dynamically-allocated, only-on-when-needed resource - more of a "utility"-style resource. This approach will ultimately (and continuously) minimize capital resources, operational resources and (by association) power, while always optimizing efficiency. It's where the industry is going -- what's termed cloud computing. This observation ties directly to Bill Coleman's keynote (video here) earlier this week at O'Reilly's Velocity conference. It also alludes to Subodh Bapat's keynote, where he outlined a continuously-optimized system of IT, energy, facilities and the power grid.
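As a conceptual illustration only (this is not Cassatt's actual product logic, just a sketch of the "only-on-when-needed" idea):

```python
from datetime import datetime, timedelta

# Hypothetical inventory: server name -> when it last did useful work.
last_active = {
    "web-07":   datetime.now() - timedelta(minutes=5),
    "batch-12": datetime.now() - timedelta(hours=3),
}

IDLE_THRESHOLD = timedelta(hours=1)

def servers_to_park(now=None):
    """Return servers that have been idle long enough to power down ("park")
    until demand calls them back -- the utility-style policy described above."""
    now = now or datetime.now()
    return [name for name, last in last_active.items() if now - last > IDLE_THRESHOLD]

print(servers_to_park())  # -> ['batch-12']
```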

I certainly hope that at the SVLG's data center energy summit '09 next year, more projects focus on how IT equipment is operated, rather than on the "plumbing" that surrounds it. I can't wait to see the efficiency numbers that emanate from an "IT Cloud" resource.

Thursday, June 26, 2008

Real data center efficiency project results

I spent the day at the SVLG data center energy summit, where 11 actual case studies were presented, each detailing how companies and their technology partners addressed energy efficiency, and quantifying the specific results. For all of the hubbub from industry associations talking about theory, metrics, standards and efficiency, today marked the first time anyone has actually shown what they're doing. And, representing Cassatt's efforts, I was happy to provide input and real-life user data as well.

We had about 300 people in the room, including press, analysts, industry and government. It was a full day of presentations and reviews of the various types of projects that were undertaken, as well as what the outcomes were. The results given during the day were then collated by Teresa Tung of Accenture Technology Labs, and compared against the results posited by the 2007 EPA report to Congress on data center energy efficiency. The results matched up quite well (even Andrew Fanara, EPA Energy Star's project director, breathed a sigh of relief that his report in fact matched reality).

The complete set of session reports, as well as the Accenture report, are already on the web for all to see and learn from. That others could learn from these pioneers was part of the goal of the day.

The first session of the day featured IT resource optimization. Synopsys outlined how they consolidated data centers and saved power, square footage and more. Cassatt (presented by yours truly) then described how Active Power Management helped a nearby company trim over 20% off its power bill with innovative power management software.

The day also had a number of sessions sponsored by Lawrence Berkeley National Laboratory, Oracle, the USPS, Power Assure and others, focusing on air flow management, cooling, power distribution projects and more.

Over lunch, we were treated to a CIO panel that included Teri Takai, California's CIO -- overseeing more than 120 other CIOs of state agencies (how does it feel to be CIO for the earth's 7th largest economy?) plus PK Agarwal, California's CTO.

I also sat in on a session given by Dave Shroyer of NetApp, Bill Tschudi of Lawrence Berkeley National Laboratory, and Ray Pfeifer -- covering NetApp's ultra-efficient data centers. Dave pointed out that during certain seasons of the year, NetApp achieves a PUE of 1.19!! That's probably the best I've ever heard of.

Finally, at the end of the day, Accenture presented its report findings, summarizing the results of the 11 projects. Receiving the report were Andrew Fanara of the EPA, Paul Roggensack of the California Energy Commission, Paul Scheihing of the DOE, Bill Tschudi of Lawrence Berkeley National Laboratory, and Kevin Timmons of Yahoo. I suppose you could call this peer review. But what it finally signals to the industry is that people are taking action, comparing reality against theory, and sharing their pragmatic best practices. Crazy as this sounds, I feel it was a seminal event, and I hope that other industry associations follow suit.

I'm sure you'll see additional coverage of this by other online folks I met there: Debra Grove of Grove Associates, as well as Rich Miller of Data Center Knowledge.

Wednesday, June 25, 2008

Energy, data centers, and the cloud

It's been a pretty information-loaded week regarding energy-management issues within the data center.

The first was the keynote Bill Coleman gave at O'Reilly's Velocity Conference earlier this week. He addressed why the current trends in data centers (point products, and the resulting complexity) are unsustainable, and why economies of scale are declining. His answer: the cloud. He talks about cloud 1.0 (where we are today), cloud 2.0 (more sophisticated; it may even replace the PC), and cloud 3.0, where it really represents the "webtone" of services that are available on-demand, reliable, and composable. And the more these economies of scale improve, the more energy-efficient they become as well. Data Center Knowledge covered this too, with a few relevant pointers to boot.

The other nifty reference was on Microsoft's TechNet Magazine. There, Dave Ohara pointed out similar observations from a number of sources --- that turning off unused assets is a path to sustainable computing. He gave a few examples including

  • Weill Medical College's HPC clusters leveraging IPMI for node shut-down
  • Citrix's approach to using their PowerSmart utility to power down idle HP-based presentation servers
  • Microsoft's "Power-Aware Server Provisioning and Load Dispatching for Connection-Intensive Internet Services"
  • Commercially-available solutions, such as Cassatt's
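For instance, the node shut-down idea in the first bullet boils down to something like this (host names and credentials are placeholders, and the actual scripts weren't published -- this is just an illustrative sketch using ipmitool):

```python
import subprocess

# Hypothetical list of nodes the job scheduler reports as idle.
idle_nodes = ["node017", "node018"]

for node in idle_nodes:
    # Ask each node's BMC for a soft power-off; -I/-H/-U/-P are standard ipmitool options.
    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", f"{node}-bmc.example.com",
         "-U", "admin", "-P", "changeme", "chassis", "power", "soft"],
        check=True,
    )
```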
With this mounting set of options, will power management continue to gain in popularity?

Tuesday, June 24, 2008

Postcards from Gartner's IT Infrastructure, Operations & Management Summit 2008 (Day 2)

Today's presentations were almost entirely about virtualization with generous servings of analysis of cloud computing.

Thomas Bittman
Thomas Bittman opened with a keynote on Gartner's predictions on cloud computing, and the likely march the industry will take over the next 5+ years toward this eventuality.

Gartner predicts there will be many varieties of cloud computing, from the AWS style of raw hardware, up through various types of providers of platforms, services, and even component services that will be wired together by other providers.

Bittman even went so far as to suggest that service "brokers" could emerge. For example, say you've established an SLA with a cloud computing service provider, and for whatever reason, that SLA isn't met (maybe AWS has another glitch). Your broker instantly finds another compatible cloud and "fails over" to that provider.
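A toy sketch of that broker behavior (provider names and the SLA check are invented; a real broker would obviously measure availability against the contracted SLA):

```python
# Fail over to the first compatible provider that's currently meeting its SLA.
providers = ["AWS", "ProviderB", "ProviderC"]

def meets_sla(provider):
    # Placeholder: pretend AWS just had another glitch.
    return provider != "AWS"

def pick_provider():
    for p in providers:
        if meets_sla(p):
            return p
    raise RuntimeError("no provider currently meeting the SLA")

print(pick_provider())  # -> 'ProviderB'
```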

Gartner's sense was that there will likely be a few "mega" providers (AWS, Salesforce, Google, MSFT, others) and then hundreds/thousands of smaller mid-market and specialty providers... not unlike the evolution of the hardware market today. And on that note, they also predicted that the hardware providers (like Dell) will probably get into the hardware-as-a-service market shortly as well. That should be interesting to watch.


Cameron Haight
Next, Cameron Haight spoke about emerging "virtualization standards."

He made the very reasonable assumption that users will want to manage multiple VM technologies using a single tool. (And, with a straw-poll, the audience conclusively agreed).

A few of the initiatives already underway include:

* The DMTF (Distributed Management Task Force) is already working on draft specifications for interesting standards... not for VMs themselves, but for properties that would aid in management -- such as a virtual system profile (i.e. for re-creating a set of VMs), and a resource allocation capability profile (i.e. for monitoring/managing VM hardware resources like CPU, memory, network ports, storage, etc.).

* An Open Virtualization Format (OVF) is also underway. This isn't a standard for VM files. Rather, it would tag VMs with metadata to identify them, say for packaging/distribution. For example, it would help characterize what's "inside" a VM before powering it on. My suspicion is that this could be the foundation for a "common" type of SW container, and a common approach to monitoring/managing such VMs. But I also suspect that the vendors will either (a) fight this tooth-and-nail, or (b) adopt it, but "extend" it to suit their needs...
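Just to make that concrete, here's the flavor of metadata I mean -- entirely hypothetical, and not the actual OVF schema:

```python
# Hypothetical descriptor for a packaged VM -- illustrating the idea, not the real format.
vm_descriptor = {
    "name": "billing-app-01",
    "guest_os": "RHEL 5",
    "virtual_hardware": {"vcpus": 2, "memory_mb": 4096, "disk_gb": 40, "nics": 1},
    "checksum": "sha1:0000000000000000000000000000000000000000",  # integrity check before power-on
    "publisher": "example.com",  # who packaged/distributed the appliance
}
print(vm_descriptor["guest_os"])
```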

Cameron also ran a few interesting audience polls during his session. Following are some notes I took, though I believe he'll probably publish them in a forthcoming research note:

Q: What VMware products are you currently using?
71% VirtualCenter
36% Update manager
31% Site recovery manager
23% Lab manager

Q: What do you think is most important for VMware to focus on?
30% optimization of VM performance
18% VM sprawl
16% Maintenance/patching
12% Accounting/chargeback
11% Root-cause analysis

Yep, these pass my sanity check. It should be quite interesting to see what VMware's next moves will be.