Wednesday, February 27, 2008

Data Center Facilities & IT Ops are Merging

Reading through today's news, I saw that Emerson Electric has just acquired Aperture Technologies (which itself had acquired the Advantage Group last year).

In the IT/Data Center space, Emerson has been the leader in cooling and power distribution (maintaining an edge, I believe, over APC). Aperture, on the other hand, has a number of real-time data center monitoring, instrumentation and management products.

I think what's so interesting about this acquisition is that it signals a merging of two worlds: Facilities, who traditionally worry about cooling, power distribution, power conditioning, etc., and IT Operations, who are obviously involved in running the "data" part of the data center, e.g. servers, applications, network, and storage.

Driving this new angle on the business are two forces. The first (and somewhat less interesting one) is simply the sales process: the real "decision makers" in the data center are IT Ops rather than Facilities, so the traditional facilities players (e.g. Emerson) have needed to reach that family of buyers in order to lock in both sets of customers.

The other force is the trend toward "systemness": That is, to really operate a data center efficiently -- some would say "greenly" -- the individual systems must all be interconnected and related as one larger system. Building management folks have known this for some time; heating, cooling, lighting, electricity, safety, etc. systems have all been interrelated. In the data center space, the move afoot is similar: compute systems, power systems, cooling systems, asset databases, etc. need to act as a whole. With luck, mergers of this type will bring a whole new level of efficiency of operation to data centers around the globe.
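As a trivial illustration of what "acting as a whole" could mean, here's a toy sketch of cooling capacity being driven by measured IT load rather than being set independently. The numbers and the function are entirely hypothetical; a real site would pull these readings from its building-management and IT-monitoring systems.

```python
# Toy illustration of "systemness": size cooling against measured IT load
# rather than running the two systems independently. Hypothetical numbers.

def cooling_target_kw(it_load_kw: float, headroom: float = 1.2) -> float:
    """Size cooling against what the IT gear is actually drawing."""
    return it_load_kw * headroom

# e.g. overnight, measured IT load drops from 400 kW to 250 kW...
for it_kw in (400.0, 250.0):
    print(f"IT load {it_kw:.0f} kW -> cooling target {cooling_target_kw(it_kw):.0f} kW")
# ...and a coordinated facility turns cooling down with it, instead of
# leaving it pinned at the level needed for peak load.
```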

To that end, I'd stay tuned for similar announcements from the other big guys in the space: APC (who already offer some degree of competitive management products), as well as Eaton Electric, who also clearly appreciate the need for "systemness" in the data center.

Monday, February 25, 2008

Where you won't see cloud computing

Almost a year ago I was invited to one of our VC funders, Warburg Pincus, for what was essentially a 2-day CIO roundtable. I sat across from CIOs (yes, the real deal) from leading financial, telco, networking and SaaS providers.

At one point, the guy from a really big networking provider offered up that they're the biggest customer of a particular SaaS vendor - and that it was getting to the point where they were uncomfortable with critical data residing outside of their four walls. There were a variety of legitimate reasons, notably availability and security. So they were in the process of demanding that the SaaS provider actually deliver a physical hardware appliance to run inside their enterprise.

This set me thinking -- this sort of objection is going to hold for a number of potential SaaS and "cloud computing" offerings for quite some time. So, despite the clear economics of cloud computing, the following objections will be hard to completely overcome:

  • SLA & availability: major CIOs will never trust anyone other than their own staffs to maintain critical apps, especially those that are time- and availability-sensitive
  • Compliance: i.e. the need for complete auditability of applications and their data
  • Privacy & legal requirements: consider that data hosted anywhere in the continental US may be subject to the Patriot Act - something which may not appeal to firms based outside the US.
  • Liability: who bears the (sometimes considerable) financial burden when data is lost or unavailable?
  • Responsiveness: there are a few applications - notably in the financial sector - whose needs for application-specific horsepower (such as data crunching in a grid) or speed/latency (such as trading applications) will likely never be met by generic "cloud" technologies.

So I'm guessing that "core" applications will never be placed outside the enterprise and into the cloud (counter to the admittedly hyperbolic statements from a certain CTO). But I would certainly expect that non-core apps - marketing, customer service, websites, and other non-realtime applications - may well end up in the "cloud" in a few short years. BTW, James Urquhart has lots to say about the pros/cons in his own blog.

Of course, there is another option -- creating a "cloud-style" utility computing environment within the enterprise.

Monday, February 18, 2008

Measuring "Useful Work" of a Data Center

I've been seeing this topic arise again and again everywhere that data center efficiency is discussed. Essentially, how do we measure the "useful output" of a data center, and then compare that against the Watts that went in? I'm constantly amazed at the opinions on the topic -- mostly driven by equipment vendors pushing their own particular CPU metric. But more on that later.

First off, there are differing views even on how much energy "goes in": energy into the data center as a whole (which includes cooling), energy that gets to the server, or just the energy that reaches the CPU. For a quick breakdown, there's this excellent paper from Emerson that analyzes the levels.
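To make the "which Watts count" question concrete, here's a rough sketch of how a watt measured at the CPU grows by the time it reaches the utility meter. The overhead factors are my own illustrative assumptions, not figures taken from the Emerson paper.

```python
# Illustrative only: the overhead factors below are assumptions, not
# numbers from the Emerson analysis referenced above.
OVERHEADS = [
    ("server (PSU, fans, memory, disks)", 0.50),      # assumed: server draws ~1.5x CPU power
    ("power distribution (UPS, PDU, wiring)", 0.10),  # assumed 10% distribution loss
    ("cooling and other building load", 0.45),        # assumed 45% added for cooling, lights, etc.
]

def facility_watts(cpu_watts: float) -> float:
    """Roll a CPU-level draw up to an estimated facility-level draw."""
    watts = cpu_watts
    for boundary, overhead in OVERHEADS:
        watts *= (1.0 + overhead)
        print(f"after {boundary}: {watts:.0f} W")
    return watts

total = facility_watts(100.0)   # 100 W measured at the CPU
print(f"every CPU watt costs roughly {total / 100.0:.2f} W at the meter")
```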

More important (to me) is: what's the "work output"? Vendors would have us believe that the way to look at it is CPU output - like the proprietary "M-Value" from Sun, or MIPS, or somewhat more generic benchmarks such as those from SPEC. But these performance benchmarks don't mean much to the average Joe, vary considerably from machine to machine, and are biased toward different types of compute loads. Using such granular metrics is akin to describing the performance of a car as (displacement) x (compression ratio) x (fuel injection factor). It's absurd and just doesn't help, especially when all I care about is acceleration. Ergo, every attempt to define data center efficiency using granular numbers is doomed to fail... or worse, to get bogged down by politics.

A better approach is to treat the server (and, in fact, the entire data center!) as a "black box". Let's not care what's inside, i.e. servers, networking, storage (ahem, what's "under the hood"). There are just too many variables.

Instead, let's describe -- in the language of IT professionals and data center managers -- what the output of that black box is (e.g. the car's acceleration) and what they really care about. That language is the "SLA", or Service-Level Agreement. Generically, an SLA is composed of something like:
- Type of application (e.g. Exchange)
- How many users (e.g. mailboxes)
- How many transactions (e.g. emails)
- How many files (e.g. archives)
- What response rate/time
- What level of availability

Now we have something useful. The above 6 (or so) pieces of data describe everything we need to know - and then we can measure them against Watts.

For example, there might be two different server/network/storage implementations for this Exchange installation, but each signs up to the same SLA. Who cares how many MIPS the servers are capable of? If one approach draws fewer Watts than the other while meeting the identical SLA, it's more efficient. "But what about the network HW?", "But what about storage and backup?" I hear you cry. Well, those issues are covered in the SLA under response rate and availability.
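To make the black-box comparison concrete, here's a minimal sketch of what such an SLA-normalized comparison might look like. The SLA fields mirror the list earlier in this post; the wattage figures and implementation names are made up.

```python
# Minimal sketch of the "black box" comparison: two implementations, one SLA,
# judged purely on Watts. All figures below are invented examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class SLA:
    application: str            # e.g. Exchange
    users: int                  # e.g. mailboxes
    transactions_per_hour: int  # e.g. emails
    files_archived: int
    response_time_ms: int
    availability_pct: float

@dataclass
class Implementation:
    name: str
    avg_facility_watts: float   # total draw while meeting the SLA

def efficiency(impl: Implementation, sla: SLA) -> float:
    """Users served per Watt -- only meaningful between implementations
    that meet the *same* SLA."""
    return sla.users / impl.avg_facility_watts

sla = SLA("Exchange", users=5000, transactions_per_hour=40_000,
          files_archived=1_000_000, response_time_ms=200, availability_pct=99.9)

a = Implementation("scale-up servers + SAN", avg_facility_watts=18_000)
b = Implementation("scale-out servers + DAS", avg_facility_watts=14_500)

for impl in (a, b):
    print(f"{impl.name}: {efficiency(impl, sla):.3f} users/W")
# Whichever meets the identical SLA on fewer Watts is the more efficient box;
# MIPS, SPEC scores, and what's "under the hood" never enter the comparison.
```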

Here's an interesting illustration: "But I demand N+2 redundancy," you cry. Well, that would be covered under the SLA for availability. Now consider that there is more than one approach to get N+2 redundancy inside the "black box". The first method is to have two dedicated machines for each application. The other is to have a pool of shared servers standing by. IMHO both will give you N+2, but the second approach is more energy-efficient while still delivering the SLA. Thus, for every SLA within the "black box", there are approaches with differing efficiencies. And yes, some vendors' hardware may be more efficient at some pieces of work than others... it's all part of the optimization IT management performs. Maybe one day vendor X will sell a server optimized for Exchange, while vendor Y will sell one optimized for SAP. Cool idea.

I continue to see hand-wringing and debate at the Green Grid, DOE and EPA/Energy Star around measuring and comparing useful data center output. Seems to me they should just default to using the language of IT -- the application SLA -- and get out of their myopic conversations about hardware and architecture. They can't see the forest for the trees.

Saturday, February 9, 2008

Green Grid Coverage: The Pros & Cons

Since writing about last week's Green Grid Technical Forum, it's been tough going for the organization; some of the trade rags have received the Forum with major skepticism.

First, Ashlee Vance with The Register (in his usual intelligent but wry style) wrote "Green Grid pollutes environment with more white papers". Then, Matt Sansberry with SearchDataCenter.com posted the straight-up critical "Green Grid tech forum leaves users hungry for something meatier".

Net-net, these writers have valid points, but they're also missing an opportunity.

First, in the writers' corner:
  • Yep, Ashlee and Matt have a point: it's taken a year-plus for the Green Grid to organize and to issue a bunch of white papers that essentially echo what's been out in the industry for some time -- empirical data, metrics, and best practices.
  • Much of the "heavy lifting" -- creation of best-practices and assessment tools -- is already underway from other (affiliated-with-the-Green-Grid) organizations, including:
    (1) the Silicon Valley Leadership Group's (SVLG) Energy Efficient Data Center Project, which will be showcasing real-world actions being taken by companies today
    (2) the US EPA has already performed a monumental study of the state of data center energy use, and is now working on developing an EnergyStar rating for data centers, much the same way it has one for buildings
    (3) the US DOE's Save Energy Now program is already far along in constructing an extremely comprehensive assessment, rating, and comparison tool (to be called "DC-Pro") for data centers
  • Debate about specific metrics (and their reciprocals) should have been settled long ago; there's nothing new about these analysis approaches
  • Indeed, the organization and its management are still highly vendor-centric
However, to the Grid's Defense (and, in full disclosure, Cassatt, my employer, is a General Member of The Green Grid):
  • Any new organization takes time to "level-set" the membership. Getting 150+ members to all agree on metrics and methodologies is no small feat. You've got to start somewhere.
  • Of course vendors would be first to create/join such a group. Certainly there's self-interest, but they also know they have significant power to drive broad awareness in the future. The Green Grid expects that as time progresses, they'll be attracting more and more "end-users" to the group and knows that this is a critical evolutionary goal.
  • Remember: this organization is not a "standards body" per se; rather, its charter is to develop and promote user-centric models, metrics, and technologies for energy efficiency. Thus its affiliation with SVLG, EPA, DOE and others to help promote those outside efforts seems to be working.
Now, for my editorial on the matter: (note that this is in the context of being a member -- but not a member with any "insider" information):
  • Given that the Green Grid is currently weighted toward vendors, and given that significant best-practice, assessment, & metric work is already being done outside the org, it would be logical for the group to focus on what it can lead in -- technology.
  • Thus, I could foresee The Green Grid's opportunity to lead-the-charge in two areas:
    (1) Identifying areas of "non-incremental" energy-efficiency improvement, i.e. not just figuring out how to squeeze a few percent efficiency out of a power supply or a cooling system, but rather identifying radical new ways in which data centers and their technology are designed and operated
    (2) Encouraging their vendor membership (and the free market) to pursue these breakthrough technologies and operational frameworks.
  • Analogy: without President Kennedy's challenge to the country to reach the moon, ordinary evolution of aeronautics would have taken decades or more to do the same. Perhaps the Green Grid can step up to championing a similar energy-efficiency challenge.
In summary, rather than slam a well-intended group of very influential players, critics should instead help re-focus the energies of some 150 companies toward what they do best - innovation.

Wednesday, February 6, 2008

Green Grid Technical Forum: Day #2

Today was Day #2 of the Green Grid forum in San Francisco, which Cassatt recently joined as a member.

The day opened with a session on the differing behavior, goals and relationships of IT operations contrasted with facilities management.

However, what really kicked-off the theme for the day was the discussion of data center metrics, specifically, what's meant by "Data Center Productivity." The chair and co-chairs of the Metrics Work Group delved into a proposed metric DCeP, essentially = (Useful Work)/(Energy Consumed). To be fair, this metric is proposed as a guidepost only -- in other words, the units of "useful work" could vary, based on who's using the formula. Thus, it's not meant to compare one data center to another, but would certainly be useful when tracking improvements in one's own data center. And, naturally, the "families" of DCeP ratios would vary based on whether the data center is a Tier I, II, III or IV facility.
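As a toy example of how an operator might track DCeP against their own facility over time: the unit of useful work and the figures below are invented; a real operator would plug in whatever unit of work makes sense for their workloads.

```python
# Sketch of DCeP = (useful work) / (energy consumed), using transactions as
# the self-chosen unit of useful work. Monthly figures are invented; the
# point is tracking one facility against itself, not comparing facilities.
months = [
    # (label, transactions completed, kWh consumed)
    ("Jan", 120_000_000, 450_000),
    ("Feb", 125_000_000, 430_000),
    ("Mar", 131_000_000, 415_000),
]

for label, work, kwh in months:
    dcep = work / kwh
    print(f"{label}: DCeP = {dcep:,.0f} transactions per kWh")
```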

Later in the day, a session followed where a number of members presented how they're currently measuring actual PUEs of their own data centers. This included members such as Digital Realty Trust, Eaton Electric, HP, Microsoft, Texas Instruments and British Telecom. This was also really insightful -- hearing the hows and whys of physically taking the measurements (sometimes automated, sometimes "a guy with a clipboard") as well as how the metrics were being used. In one case, HP, after measuring something like 18 different data center PUEs, discovered one facility that was *way* out-of-whack (read: poor) compared with the others. I suspect this factored heavily when HP decided which centers to de-commission first as they consolidated.

Also eye-opening was the story of another member who observed that use of efficient equipment alone was not sufficient to score "high" on PUE. Remember, it's the ratio of energy going into the facility to energy actually reaching the IT equipment. So, in our friend's case, his brand-spanking-new data center was operating at a fraction of its total capacity... but had all of its infrastructure (cooling, power distribution, etc.) running as well. Therefore the computed ratio in fact turned out to be worse than most of his older data centers. Thus the observation: maximize your efficiency by always balancing facility "overhead" against the IT load.
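Here's a made-up worked example of that partial-load effect, using the PUE definition above (total facility power over power reaching the IT equipment), with a fixed slice of infrastructure overhead that runs regardless of load. All figures are invented for illustration.

```python
# Worked example (numbers invented) of the partial-load effect:
# PUE = total facility power / power reaching the IT equipment.
def pue(it_kw: float, fixed_overhead_kw: float, variable_overhead: float) -> float:
    """Fixed overhead = chillers, CRACs, UPS losses that run regardless of load;
    variable overhead = the portion that scales with IT load."""
    total = it_kw + fixed_overhead_kw + it_kw * variable_overhead
    return total / it_kw

# Brand-new facility running at a fraction of capacity, but with all of its
# infrastructure spun up:
print(f"new, lightly loaded: PUE = {pue(it_kw=200, fixed_overhead_kw=400, variable_overhead=0.2):.2f}")

# Older facility, less efficient gear, but running near full load:
print(f"old, fully loaded:   PUE = {pue(it_kw=900, fixed_overhead_kw=500, variable_overhead=0.3):.2f}")
```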

Finally, it was incredibly refreshing to see all of these heavy-hitters cooperating for the common good of the industry, setting competitive pressures aside. Picture Intel & AMD in the same room; Dell, Sun, IBM & HP on the same committee. In fact, the only amusing scene was when printouts for directions to the Director's dinner were handed-out to the Microsoft representatives -- printouts generated from Google Maps.

All-in-all, the 2-day meeting was a huge success, sharing information and aligning the body of over 130 companies.

Tuesday, February 5, 2008

Green Grid Technical Forum: Day #1

Since Cassatt is a new General Member to the Green Grid, I was able to attend this 2-day meeting. Today's session was quite well-attended... I estimate ~ 300+ participants in the room here in San Francisco for the opening talks and panels. Today/tomorrow appear to be engineered for members (and the press) as a comprehensive report-out of the excellent machinations of the organization and its workgroups.

The morning opened with some end-users (data center operators) such as Enterprise Rent-a-Car and Allstate Insurance talking about the comprehensive steps they took (and are still taking) to radically shift the efficiency of their operations. (I might interject that one of the strategies Allstate takes is to power down idle equipment...)

But what was really interesting in the morning were a number of panels covering how the organization is reaching out to other orgs and to government; panels featured Andrew Fanara of the EPA's EnergyStar program, Paul Sheihing of the DOE's Save Energy Now program, and Anson Wu, who's helping develop the EU's CoC (Code of Conduct, roughly analogous to EnergyStar). A number of other organizations, such as Lawrence Berkeley National Labs, were also present. It's good to see this degree of cooperation, but what was still left unsaid was specifically how these groups are cooperating. Perhaps one answer lay in the fact that the Green Grid is still spinning up liaison work groups -- but it's certainly on their radar.

Also really neat was the fact that EnergyStar will be creating a rating system for data centers, the same way they have a rating for buildings, and that the DOE will be creating an efficiency diagnostic/assessment tool ("DC-Pro") for data centers, the same way they have tools for, say, chemical plants.

Also of special note were a number of talks having to do with metrics. Every org has them, and even more are being developed. There's alphabet soup like PUE, DCiE, DCP, SI-POM and DH-UE. In fact, nearly every organization above is proposing metrics to help characterize data center power consumption and efficiency. Finally, the question was asked regarding how to sort through who's creating what metric. I think that's one of the pivotal points -- what will be the generally-accepted set of metrics to use when assessing the efficiency of a data center? Hmmm. Maybe we'll find out on Day #2...

Monday, February 4, 2008

Huge Power Inefficiencies in Dev/Test Labs

Last week I was able to meet with the Director of Global Lab Services for a major networking company. We had a wide-ranging conversation, but here is an insightful summary that continues to reinforce my power management thesis:
  • Company divisions are moving to individual P&L's (with chargebacks) and are under pressure to ID and eliminate wasteful costs
  • He oversees thousands of pieces of hardware (servers, switches, routers) used by roughly 500 engineers (many overseas in India). Many of these devices consume over 1 kW each.
  • Lab hardware is constantly being reserved for test-beds, checked out, and then returned to unreserved status. But it's clear that even when a piece of hardware is part of a test-bed, it may sit unused for as much as 50% of the time.
  • In labs alone, the company is paying nearly $100k per month in electricity bills
  • He knows that there are massive power inefficiencies, and is looking for a solution.
Now, this company is not even on the Fortune 500 list (although it is on the F1000 list)... which says to me that there are probably hundreds of other enterprises facing the same problem. So as the industry focuses on "green data centers," virtualization and cooling technologies, *why* don't we simply look at the use rates (some would say the duty cycles) of the equipment itself? Simply using Active Power Management ought to be a quick/easy win right off the bottom line.
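As a back-of-envelope sketch of what Active Power Management could be worth in a lab like this one: the device count, average draw, and utility rate below are assumptions chosen to land near the roughly $100k/month bill mentioned above, while the 50% idle figure comes straight from the conversation.

```python
# Back-of-envelope sketch. Device count, average draw, and utility rate are
# assumptions; the 50% idle fraction comes from the conversation above.
DEVICES = 2000               # "thousands of pieces of hardware"
AVG_KW_PER_DEVICE = 0.7      # assumed average; "many consume over 1 kW each"
IDLE_FRACTION = 0.5          # idle "as much as 50% of the time"
RATE_PER_KWH = 0.10          # assumed utility rate, $/kWh
HOURS_PER_MONTH = 730

monthly_kwh = DEVICES * AVG_KW_PER_DEVICE * HOURS_PER_MONTH
monthly_bill = monthly_kwh * RATE_PER_KWH
potential_savings = monthly_bill * IDLE_FRACTION   # if idle gear were powered down

print(f"estimated lab electricity bill: ${monthly_bill:,.0f}/month")
print(f"potential savings from powering down idle gear: ${potential_savings:,.0f}/month")
```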

And what better place to begin than in Dev/Test?