Thursday, December 20, 2007

A day at DOE's Save Energy Now workshop

Earlier this week I had the privilege of attending a workshop sponsored by the Department of Energy's Save Energy Now program, in collaboration with the EPA/EnergyStar group. The purpose was to help DOE define and engineer tools to help assess data center efficiency performance. They're reaching out to industry to ensure they're using the right metrics, asking the right questions, and producing a product that will be useful in real life. The tool will (presumably) help firms "score" the efficiency of their data center from 1-100, much the same way there are already tools to score the energy-efficiency of buildings, etc. BTW, see my earlier blog on some proposed metrics from The Green Grid and Uptime Institute.

In the room were mostly vendors such as AMD, Emerson Electric, HP, Intel, NetApp and Sun. Plus, folks from DOE, EPA, Lawrence Berkeley National Labs, the Green Grid, and the Uptime Institute attended as supporting resources to DOE. And first-off, it was great to see the overlap/interaction between these groups -- especially that DOE, Green Grid and Uptime Institute were cooperating.

The tool itself is expected to have 4 components of assessment: IT (servers, storage, networking, software), Cooling (chillers, AC units, fans), Power Systems (UPS, distribution) and Energy Sources (generation, etc.).

But the real core of the day's conversation was around what's meant by "Efficiency" - all agreed that at the highest level it was the ratio of "useful computing output" to the "energy supplied to the overall data center". That second number is sort of the easy part: it includes all of the power used for lighting, cooling, power distribution, etc.; sometimes it's hard to measure empirically, but it all comes down to kWhs. The real issue, it turns out, is what's meant by the first number, "useful computing output".

In our afternoon IT breakout group, about 10 of us debated for a good hour or so about just that: how do we define "output"? Is it MIPS? Is it measured in SPEC units? Is it CPU utilization? And what about storage and network "output" as well? In the end, we agreed that we should define it the way any IT professional would: As an application service with an associated SLA. So for example, the useful output would be a service called "Microsoft Exchange" with SLA traits of X users, Y uptime, and Z latency. And most important, this approach makes the output independent of underlying technology and implementation. Thus, two data centers could implement this service/SLA combination in vastly different ways, with vastly different power requirements and therefore energy efficiencies.
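To make the service/SLA idea concrete, here is a minimal sketch (mine, not anything DOE has proposed). The `ServiceSLA` class, the `users_per_kwh` function and every number in it are hypothetical; it simply treats "users served under the SLA" as the useful output and divides by the energy supplied to the whole facility.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSLA:
    """A unit of 'useful output': an application service plus its SLA traits."""
    name: str
    users: int             # X users served
    uptime_pct: float      # Y uptime, e.g. 99.9
    max_latency_ms: float  # Z latency ceiling


def users_per_kwh(sla: ServiceSLA, facility_kwh: float) -> float:
    """Users served under the SLA per kWh supplied to the overall data center."""
    if facility_kwh <= 0:
        raise ValueError("facility_kwh must be positive")
    return sla.users / facility_kwh


# Hypothetical SLA and energy figures, for illustration only.
exchange = ServiceSLA("Microsoft Exchange", users=5000, uptime_pct=99.9, max_latency_ms=200)

# Two data centers delivering the identical service/SLA with different energy use.
dc_a = users_per_kwh(exchange, facility_kwh=1200.0)
dc_b = users_per_kwh(exchange, facility_kwh=800.0)
print(f"A: {dc_a:.2f} users/kWh, B: {dc_b:.2f} users/kWh")
```

Because the numerator is the service and not MIPS or CPU utilization, the comparison holds no matter what hardware or software sits underneath.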

When the DOE tool (or set of tools) is complete in mid-2008, it will represent a seminal event: Data Center operators will have a standard benchmark against which to self-rate -- and to (confidentially) compare their efficiencies against their peers. It will also begin to put pressure on IT vendors (!) to focus on the "gestalt" efficiency of their systems, not just of their components. (IMHO, this will make me *very* happy)

And, I hope, this benchmark will begin to accelerate (a) the move toward dynamically-managed shared compute resources, and (b) the technical & organizational bridge between IT system management and facilities management.

Wednesday, December 19, 2007

End-of-the-year holiday shopping for IT

The end of the fiscal year is upon most of us - and for some that means spending any remaining dollars on some quick-hit (read: fast ROI) initiatives.

So in the spirit of the holidays, here are some interesting links:

1. Bridget Botelho at SearchDataCenter writes "Servers get no rest during the holidays" -- that most enterprises (and especially small/medium businesses) leave all servers on during the holidays, even if the company is closed. It's a waste of power/money, even though many solutions abound. (And did you even think about the security risk of keeping your IT on but essentially unsupervised?)

2. And, for those of you with money left in your budget, Rick Vanover from TechRepublic chimed-in to suggest 10 good ways to use your remaining IT budget before the end of the year - I particularly like the following:
  • #3: Purchase power management: Many new power management devices are available now that can be a good replacement for your limited power distribution units (PDUs). These PDUs can add management layers to individual power sockets for power consumption, naming, grouping, and power control. The new devices can also add more ports should you need to power more computer systems in your racks.
I would only add that in addition to switched PDUs, you should consider purchasing policy-based software to control them.

3. And, Rick went on to also cite an earlier blog of his,
10 things you should know about advance power management - another topic near-and-dear to my heart, especially:
  • #7: Turn off retired or unused devices: This will reduce your power consumption — and possibly accelerate your removal of the device so as not to overprovision power unnecessarily...
The net-net is the following: If you need a quick-hit, easy-to-implement (and money-saving) and pragmatic solution, pursue intelligent operation of your IT - no need for a technology refresh, no need to re-architect anything.


Monday, December 10, 2007

Server Power Management Myths - and more

An old friend, James Governor, in his GreenMonk blog recently got me thinking. First he pointed out that it's common in Japan to turn servers off at night. Then it wasn't so much his follow-on blog (about turning servers off when you don't need them) as a comment he highlighted from Mike Gunderloy:
  • Has anyone looked at the labor costs of this? I know that even on my tiny little dozen-machine network, I am reluctant to power everything off at night simply because it takes so bloody long waiting for the damn things to boot up in the morning. Seems like actual working fast-boot technologies would go a long way to sell this initiative.
This is exactly the sort of objection or "urban myth" that we're trying to dispel. For example, many believe that application availability might be compromised if servers are shut down. However, there is a solution to this: policy-based control, whereby servers might be powered-up in advance of their need. That's the sort of work we're doing at Cassatt. And even in a small business, if servers in a closet are turned off nights and weekends, you're still talking about energy savings on the order of 40% or more over the duration of a year!
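A quick back-of-the-envelope sketch of where a number like that comes from. The schedule below is my own assumption (powered up 14 hours a day, Monday to Friday, which leaves room to boot ahead of the working day), and the function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_off_fraction(on_hours_per_weekday: float, weekdays: int = 5) -> float:
    """Fraction of the week a server is powered off if it only runs for the
    given hours on weekdays and stays off nights and weekends."""
    on_hours = on_hours_per_weekday * weekdays
    if not 0 <= on_hours <= HOURS_PER_WEEK:
        raise ValueError("on-hours must fit within one week")
    return 1 - on_hours / HOURS_PER_WEEK


# Assumed schedule: on 06:00-20:00 (14 h), Monday to Friday.
print(f"{weekly_off_fraction(14):.0%} of the week powered off")
```

With that schedule the servers are off for roughly 58% of the week. The energy saved will be a smaller share than the time saved, since the hours that remain are the busy (higher-draw) ones, which is why "40% or more" is a reasonable ballpark.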

BTW, if you're interested in additional "Urban Myths" about server power control, check out the "Myths and Realities of Power Management" page.

And while you're at it, give us some feedback on how you feel about IT Energy Management and "Green IT": we're hosting a 5-minute survey this week (and, you could win a Wii if you take it).



Saturday, December 8, 2007

The Case for Energy-Proportional Computing

... senior vice president of operations at Google and a Google Fellow: "Energy-proportional designs would enable large energy savings in servers, potentially doubling their efficiency in real-life use. Achieving energy proportionality will require significant improvements in the energy usage profile of every system component, particularly the memory and disk subsystems."

The two are making the case not only for server power management, but are calling on vendors to go a step further: to make computers scale their power consumption directly with the compute load they carry. This would be highly complementary to consolidation efforts currently underway.

In conclusion, the paper says,

  • Servers and desktop computers benefit from much of the energy-efficiency research and development that was initially driven by mobile devices' needs. However, unlike mobile devices, which idle for long periods, servers spend most of their time at moderate utilizations of 10 to 50 percent and exhibit poor efficiency at these levels. Energy-proportional computers would enable large additional energy savings, potentially doubling the efficiency of a typical server. Some CPUs already exhibit reasonably energy-proportional profiles, but most other server components do not.
  • We need significant improvements in memory and disk subsystems, as these components are responsible for an increasing fraction of the system energy usage. Developers should make better energy proportionality a primary design objective for future components and systems. To this end, we urge energy-efficiency benchmark developers to report measurements at nonpeak activity levels for a more complete characterization of a system's energy behavior.
Among a few scholarly pieces from Google, the report also cites two great references; one, the US EPA's Report to Congress on Data Center Efficiency, and the other is one of many fine works by Jonathan Koomey, "Estimating Total Power Consumption by Servers in the U.S. and the World"
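To see why the 10 to 50 percent utilization band matters so much, here is a rough sketch using a simple linear power model of my own choosing (not a model taken from the paper). The `idle_fraction=0.5` value is an assumption: a server that draws half of its peak power while doing nothing.

```python
def relative_power(utilization: float, idle_fraction: float) -> float:
    """Power draw as a fraction of peak, assuming a linear model that rises
    from idle_fraction at 0% utilization to 1.0 at 100% utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_fraction + (1.0 - idle_fraction) * utilization


def relative_efficiency(utilization: float, idle_fraction: float) -> float:
    """Work done per unit of power, normalized so 1.0 is the efficiency at peak."""
    return utilization / relative_power(utilization, idle_fraction)


for u in (0.1, 0.3, 0.5):
    typical = relative_efficiency(u, idle_fraction=0.5)       # assumed: idles at half of peak
    proportional = relative_efficiency(u, idle_fraction=0.0)  # ideal energy-proportional machine
    print(f"utilization {u:.0%}: typical {typical:.2f}, proportional {proportional:.2f}")
```

Under these assumptions the typical server at 30% utilization delivers less than half the work-per-Watt it manages at peak, while the ideal proportional machine stays at 1.0 across the range.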

Tuesday, November 27, 2007

Postcards from the 2007 Gartner Data Center Conference

I attended Day #1 of the Gartner Data Center conference here in Las Vegas today - after making the strategic error of being dropped-off at the MGM Grand lobby, and having to walk what must have been 3/4 mile to the conference center...

Thomas Bittman opened the AM with a keynote on the Future of Data Center Operations. It had a pretty broad coverage of the state of DC Ops today. He had at least one memorable interjection -- what seemed like a warning to equipment vendors who have strangle-holds over customers... strongly urging customers to reject platform-specific IT technologies. He also predicted the emergence of the "meta-O/S" and the "cloud-O/S" which (I think) is a re-packaging of Gartner's Real Time Infrastructure (RTI) story. And that the meta-O/S had to be platform and vendor-neutral. But this was the first time that I've heard Gartner pay specific attention to the emergence & legitimacy of cloud computing (and the "O/S" to run it).

Next, Donna Scott gave an equally broad-ranging talk on IT Operations Management. Again, she conducted her now 5+ year-old survey of IT's biggest pressures. And once again "high rate of change", "cost containment" and "maintaining availability" took top-honors as the largest ulcer-producing pressures facing CIOs. Also true-to-form, she re-iterated that a shared infrastructure (RTI) is inevitable, breaking down the islands of technology in large data centers.


There were also some interesting vendor break-out sessions; take for example, a session on managing power and cooling from Emerson Network Power by Greg Ratcliff. The trend here is also toward intelligent monitoring and infrastructure. He spoke of localized cooling (even within the rack) needed as rack power density increases. There was definitely reference to "adaptive cooling" and "adaptive power" -- again implying that efficiencies in large data centers can only be achieved through better use of technology, rather than throwing raw horsepower at the heat/power problem.


Finally, one last surprising (to me) datapoint: the general audience was asked who was using virtualization in production - and 1/2 to 2/3 of the audience raised their hands. This definitely drove-home the point that VMs are (and will be) everywhere. However, I combine this observation with the earlier point that data centers will need a management layer, an "O/S", which is vendor-neutral. At the moment, I don't see any of the existing large vendors stepping up to fill this virtualization management need any time soon.









Thursday, November 15, 2007

Assessing the New Data Center Metrics

I've been reading-up on work that the Green Grid, Uptime Institute and others have been doing to define metrics around data center efficiency. The work is good but in my mind, misses the mark slightly. All of the metrics I've seen thus far are static; that is, they assume some steady-state aspect of the data center... steady compute loads, steady quantity of servers, etc. But that's not how the world works.

Even Detroit knows that autos get different efficiencies based on how & where they're driven... so the metric called "mileage" is actually measured & documented twice -- one for City, one for Highway. Data Centers need something akin to this as well.

Why? Because IT departments operate at greatly different levels; peak (maybe during the day) as well as off-peak (perhaps nights/weekends). Ideally, the data center should know how to adapt to these conditions: re-purposing "live" machines during peak hours; retiring and temporarily shutting-down idle servers during off-peak; removing power conditioning equipment when not needed; turning off specific CRAC units and chillers when not required (i.e. cold days and/or off-peak hours). We need an efficiency metric that indicates how data centers operate Dynamically.

Anyway, here's a quick survey course in what metrics I did find, and what I'd like to see:

The Green Grid on metrics:
  1. Data Center Infrastructure Efficiency,
    DCiE = (IT equipment power)/(total facility power).

    This is supposed to be a quick ratio showing how much power gets to servers, versus how much else is consumed by power distribution, cooling, lighting, etc. Driving this ratio up means you have less overhead wasting Watts. This wouldn't be too bad a metric if it was used and monitored 24x7, i.e. peak and off-peak.
  2. Power Usage Effectiveness,
    PUE = 1/DCiE (just the boring reciprocal)
  3. Data Center Productivity, (a metric to be adopted in the future)
    DCP = (useful computing work)/(total facility power)

    In theory, this is a great metric: It's like saying "how many MIPS per Watt" can you produce? (BTW, the human brain, the most powerful of all computers, consumes something like 25W). Anyway, DCP is a contentious metric... because each computing vendor wants to define "useful computing work" with their own (preferential) way of computing. Frankly, this is most useful to measure efficiency at the server level.
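The three Green Grid ratios above are simple enough to sketch in a few lines. The function names and the 600 kW / 1,000 kW figures are mine, purely for illustration; "useful work" is left as whatever number the caller chooses, since that definition is exactly what is in dispute.

```python
def dcie(it_power_kw: float, total_facility_power_kw: float) -> float:
    """Data Center Infrastructure Efficiency: IT equipment power / total facility power."""
    return it_power_kw / total_facility_power_kw


def pue(it_power_kw: float, total_facility_power_kw: float) -> float:
    """Power Usage Effectiveness: the reciprocal of DCiE."""
    return total_facility_power_kw / it_power_kw


def dcp(useful_work: float, total_facility_power_kw: float) -> float:
    """Data Center Productivity: useful computing work / total facility power.
    The unit of useful_work is left to the caller."""
    return useful_work / total_facility_power_kw


# Hypothetical facility: 600 kW reaches the IT equipment out of 1,000 kW at the meter.
print(f"DCiE = {dcie(600, 1000):.2f}, PUE = {pue(600, 1000):.2f}")
```

Run against readings taken at both peak and off-peak hours, the same two functions would already give a rough "City/Highway" pair.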
The Uptime Institute

In an excellent paper, the Uptime Institute discusses these in "Four Metrics Define Data Center 'Greenness'":
  1. Site Infrastructure Energy Efficiency Ratio:
    SI-EER which the Institute is currently working to re-cast in more intuitive and technically accurate terms. I suspect this is much like the Green Grid's DCiE, above
  2. Site Infrastructure Power Overhead Multiplier:
    SI-POM = (data center power consumption at the meter)/(total power consumption at the plug for IT equipment), which is essentially the same metric as the Green Grid's PUE, above
  3. Deployed Hardware Utilization Ratio:
    DH-UR = (qty of servers running live applications)/(total number of servers actually deployed)

    This speaks to the real-time utilization of hardware, and IMHO is one of the best metrics for a dynamic data center. It points to how many deployed servers are actually doing work, vs. those that are sitting "comatose". A very promising metric if it's used in conjunction with equipment that constantly optimizes how many servers are "on", and shuts down idled servers, constantly minimizing this metric.
  4. Deployed Hardware Utilization Efficiency
    DH-UE = (minimum qty of servers needed to handle peak load)/(total number of servers deployed)

    This is another great metric - it speaks to the capital efficiency of hardware - how many need to be provisioned and on the floor, relative to how many are being used actively.
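Here is the same kind of sketch for the three Uptime Institute ratios that have a published formula above (SI-EER is omitted since it is being re-cast). Function names and the sample fleet of 1,000 servers are my own assumptions.

```python
def si_pom(meter_power_kw: float, it_plug_power_kw: float) -> float:
    """Site Infrastructure Power Overhead Multiplier:
    power at the meter / power at the plug for IT equipment."""
    return meter_power_kw / it_plug_power_kw


def dh_ur(servers_running_live_apps: int, servers_deployed: int) -> float:
    """Deployed Hardware Utilization Ratio:
    servers running live applications / total servers deployed."""
    return servers_running_live_apps / servers_deployed


def dh_ue(min_servers_for_peak_load: int, servers_deployed: int) -> float:
    """Deployed Hardware Utilization Efficiency:
    minimum servers needed for peak load / total servers deployed."""
    return min_servers_for_peak_load / servers_deployed


# Hypothetical fleet: 1,000 deployed, 700 running live apps, 500 needed at peak.
print(f"SI-POM = {si_pom(1000, 600):.2f}")
print(f"DH-UR  = {dh_ur(700, 1000):.2f}")
print(f"DH-UE  = {dh_ue(500, 1000):.2f}")
```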
In my ideal world, I'd like to see two things to get us to a "City/Highway" style approach:
  • A DH-UR that changes dynamically, constantly being minimized. This implies that only required servers are actually powered-up and active.
  • An SI-POM that was always driven toward a constant ratio, regardless of compute demand. Which implies that, as compute demand falls, servers are retired and other support equipment (power handling, cooling) also shuts down, keeping the efficiency ratio balanced.
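A "City/Highway" report could be as simple as splitting periodic readings into peak and off-peak buckets and computing the ratios for each. What follows is only a sketch under my own assumptions: the `Sample` fields, the 08:00-18:00 peak window and all of the readings are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    hour: int           # hour of day, 0-23
    it_kw: float        # power at the plug for IT equipment
    facility_kw: float  # power at the meter
    live_servers: int   # servers running live applications


def city_highway(samples, deployed_servers, peak_hours=range(8, 18)):
    """Return {'peak': (si_pom, dh_ur), 'off_peak': (si_pom, dh_ur)}."""
    report = {}
    for label, want_peak in (("peak", True), ("off_peak", False)):
        period = [s for s in samples if (s.hour in peak_hours) == want_peak]
        if not period:
            continue  # no readings fell in this period
        si_pom = sum(s.facility_kw for s in period) / sum(s.it_kw for s in period)
        dh_ur = mean(s.live_servers for s in period) / deployed_servers
        report[label] = (si_pom, dh_ur)
    return report


# Hypothetical readings from a data center whose overhead does not scale down at night.
samples = [
    Sample(hour=10, it_kw=600, facility_kw=1000, live_servers=900),
    Sample(hour=14, it_kw=600, facility_kw=1000, live_servers=900),
    Sample(hour=2, it_kw=200, facility_kw=500, live_servers=300),
    Sample(hour=23, it_kw=200, facility_kw=500, live_servers=300),
]

for label, (si_pom, dh_ur) in city_highway(samples, deployed_servers=1000).items():
    print(f"{label}: SI-POM {si_pom:.2f}, DH-UR {dh_ur:.2f}")
```

In this made-up data the SI-POM gets worse off-peak because cooling and power handling keep running while the IT load drops, which is exactly the behavior a single static number hides.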
I look forward to conversations with the Green Grid, Uptime Institute and EPA to consider these tweaks to their already fine work.




Wednesday, November 7, 2007

CIO Dialogue - Notes from the Real World

I felt the need to share the following conversation I recently had with the VP of Enterprise Operations for a major healthcare provider. On one hand, the conversation sounded like every stereotype I've heard in trade rags... except it's true. So read this, but be sure to get to the punch line at the end.

He's been in his job for 18 months, and is just now seeming to get his hands around turning the battleship. Which, I might add, "owns one of every imaginable platform and software type" and has perhaps 3,000+ apps on 12,000-15,000 servers, maybe 30%-40% of which is development. He's got lots of AIX and lots of Sun, but ultimately a mix of other vendors too.

When asked exactly what he owns, he says he doesn't really know... but they're planning a CMDB project soon. Also, they're quickly running out of data center space, and are pushing 95% of maximum UPS power in most locations. He's thrown-down the gauntlet and halted all new server purchases -- in favor of initiating a virtualization project (which, I might add, is getting upwards of 20:1 consolidation, although he knows that high ratio won't last). He's a risk-taker because he has to be.

So I asked him point-blank what he needs to make this work. Without a flinch (or a smile) he said "Process and Automation." Process, a la ITIL, and automation -- both of the Run-Book style, as well as the operational style. "If I could have the automation vision that IBM was hawking a few years ago, I'd be thrilled. But it's still vapor."

The good news is that he's closely teamed with his Facilities manager to help him cope with power, real estate and cooling. The bad news is that the Facilities guy is also at wit's-end.

The punchline: This real-life vignette tells me that the traditional IT model is really broken. How come IT -- with all of its computers -- is actually the least automated and efficient arm of the company? I recently read a report from the Uptime Institute which talked about the Economic Meltdown of Moore's Law -- literally, for every $1 of compute asset, it currently costs $1.80 to operate it; by 2009, electricity alone will cost triple ($3) what the box cost. What's wrong with this picture?

I know that my VP friend is not alone. But when will the treadmill of IT-being-slave-to-the-hardware end? I'd like to think that automation, active asset management, and the drive toward greater environmental efficiency will begin to influence vendors and managers alike.


Saturday, November 3, 2007

What the Green Data Center Can Learn from the Prius

With the announcement of Active Power Management, and now the Cassatt Active Response product line, I hope that data center operations will now nudge a little closer to the 21st century.

Here's analogy #1: You're driving and come to a red light, you stop the car but the engine keeps running. It's wasteful and inefficient, but because it's generally considered too inconvenient to start & stop your engine every time you hit a red light, nobody does it. Enter the Prius: Come to a red light, and the engine automatically stops; hit the accelerator, and it starts again. Simple. Automatic. Efficient.

That's the analogy Cassatt is bringing to servers -- if they sit idle, even for an hour a day, they're automatically shut off and re-started when they're needed. For production environments, this might only apply to a few scale-out architectures that are provisioned for busy times-of-day, but for Development/Test, there are *always* machines that go unused for periods of time. Cassatt's Active Power Management takes care of this automatically. Simple. Automatic. Efficient.
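To illustrate the idea (and only the idea; this is not Cassatt's implementation), an idle-time policy can be sketched as a filter over the server inventory. The `Server` fields, the 60-minute threshold and the sample machines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    idle_minutes: int       # time since the server last did useful work
    powered_on: bool = True


def select_for_shutdown(servers, idle_threshold_minutes=60):
    """Names of powered-on servers that have sat idle for at least the threshold."""
    return [
        s.name
        for s in servers
        if s.powered_on and s.idle_minutes >= idle_threshold_minutes
    ]


# Hypothetical Development/Test pool.
servers = [
    Server("dev-01", idle_minutes=5),
    Server("dev-02", idle_minutes=180),
    Server("test-01", idle_minutes=90, powered_on=False),  # already off
]
print(select_for_shutdown(servers))
```

A real policy would also need the other half of the analogy, the "hit the accelerator" part: re-starting machines ahead of when they're needed.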

Don't just believe me. On Aug. 2, the EPA published a Report to Congress on Server and Data Center Efficiency. The report lists "implement Power Management on 100% of applicable servers" as a core aspect of "improved operation" of US data centers.

Oh - and here's analogy #2: (and believe-it-or-not, it's from Detroit as well as Japan): It's called Cylinder Shutdown. Turns out that when you literally don't need all the engine's horsepower, cylinders within the engine are dynamically shut down. Check out the future Northstar XV12 Caddy engine, as well as the engine in the 2008 Honda Accord.

Turns out, Cassatt technology can do this with IT Servers/blades as well! If you have a farm of servers and a few are sitting idle, they're turned off and kept as "bare metal" until some application needs their horsepower. Then they're dynamically re-purposed for whatever application is needed. That's the ultimate in capital efficiency.

Can this really work? With customers we've spoken to -- some with development environments pushing 4,000 servers -- actively controlling server power & repurposing can save nearly 50% (that's fifty) of operational costs.

Think of all the cars idling at this very moment, and the amount of fuel they're burning. Now, think of all of the servers in your data centers & labs just sitting there waiting to do something. And think of all the Watts they're chewing.


Wednesday, October 17, 2007

Postcards from TelaData's Convergence Conference

I attended and presented at TelaData's Data Center "Convergence" conference yesterday, and had the opportunity to chat with a number of Facilities and IT Operations managers for really big companies. Over 300 folks were at the event, and I bet 2/3 of them were facilities managers looking for an edge to improve the efficiency (and reduce the cost) of building/maintaining large data centers.

Bob Brown, TelaData's CEO offered that the theme of "convergence" applied to 3 areas:
  • Technology Convergence as it applies to voice, video & data all converging to IP-based standards, and what the implications are on data center build-out, power, cabling, etc.
  • Organization Convergence - the need/requirement that IT and Facilities cooperate and drive new efficiencies; without this cooperation, breakthrough efficiencies and cost reductions just aren't possible
  • Automation convergence, e.g. building/facilities automation standards (like BACnet) interoperating with IT automation (power control, distribution)
One of the highlights was an end-of-day panel with Robert Aldrich (Cisco - Green Data Center solutions), Bill Weihl (Google's "Energy Czar"), and Dean Nelson (Sun's Global lab & data center services person). The panel was congenial, but it was clear that the companies approach "greenness" from slightly different perspectives:
  • Google made it clear that all employees are encouraged to look at Total Cost of Ownership for every project they pursue; they encourage tradeoffs from everyone, esp. between Facilities, operations and IT. It's a numbers game, which benefits the company overall.
  • Cisco operates huge data centers as well - they're also numerically driven, and seem focussed on deriving metrics and standards around energy use - before they implement new programs/policies
  • Sun also drove home the need for IT & Facilities to interact (Dean brought his Facilities counterpart along) and really emphasized that one of the massive benefits of efficient IT is to "give back" real estate to the company. Real estate is the 2nd largest cost to a company (next to payroll) and this has made a huge impact on Sun's margin and bottom line.
A number of talks on the "energy efficiency" track (including mine) dealt with power control at the server level. During our presentations, Cassatt, Server Technology, PowerAssure and VMware all alluded to various approaches to actively managing server power status. Server Tech took a clearly Facilities-minded approach to "Load Shedding" in emergency conditions; PowerAssure takes a managed-services approach, and VMware spoke to future product capabilities. Cassatt (from my biased perspective) had the best holistic and cross-platform story for actively managing server power status across an enterprise.

What was clear from conversations with folks who attended was that there was still a rift between facilities managers and IT. For there to be any meaningful progress in data center efficiency, there had to be shared corporate and economic goals for both. And this had to start at the CFO level (for example).





Sunday, October 14, 2007

More Leading Indicators for Cloud Computing

Two more companies I've recently come across that will help accelerate this model: RightScale, which essentially brokers AWS infrastructure while providing an easier-to-use "dashboard", and FlexiScale, a UK-based company that's trying to take-on Amazon's web services at its own game. (I guess you need to have the word "scale" in your name nowadays...)
  • RightScale provides a platform and consulting services that enable companies to create scalable web solutions running on Amazon Web Services (AWS) that are reliable, easy to manage, and cost less. The RightScale dashboard saves time in maintaining, managing and monitoring all AWS activities, while RightGrid coordinates the auto-scaling of servers according to usage load. The RightImage library provides pre-built installation templates for common software stacks, and RightScale DeltaSets make it easy to customize and manage modifications to machine images. Together with Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3) and Amazon Simple Queue Service (SQS), RightScale enables a next-generation platform for deploying highly scalable web applications.
  • FlexiScale's claims:
    • Provisioning & Scalability: Additional servers can be launched and load-balanced in <1
    • Flexibility: OS agnostic - we support MS Server and all common versions of Linux; Clone a server image and re-use for another test or production server; Policy engine based load-balancing between physical servers
    • Self-service via Control Panel or API: Provisioning of Virtual Dedicated Servers; Start, stop and delete Virtual Dedicated Servers; Resize memory and storage
    • Quality of Service: Fully monitored system - network, storage and servers; Fully automated HW recovery; Flexible snapshot based backups; Secure - each customer has their own VLAN and their own virtual disks
    • Pricing: No subscription fees and no minimum term contract; Simple to understand utility pay-as-you-go pricing model with no catches; Billing module that lets customers see transparently what resources they have been using
I'm sure we'll be seeing more competitors like these in the coming months.

Tuesday, October 9, 2007

Is Cloud Computing Mainstream?

Well, not yet. But by now you've probably heard of the deal between IBM and Google to provide an initial 6 universities (University of Washington in Seattle, Carnegie Mellon University, Massachusetts Institute of Technology, Stanford University, University of California at Berkeley and University of Maryland) with hardware to allow students to program to the cloud.

But this just accelerates my prediction that the concept of "cloud computing" will quickly mature. Folks are beta-testing Amazon's (somewhat fault-prone) EC2 and S3, upon-which anything from components (i.e. queuing services) to entire website storage, can be hosted.

So now, students will be educated in this form of programming, and go into industry with a new level of comfort with this paradigm. This will surely turn the hosting market on its ear in 1-2 years.

Monday, September 24, 2007

Obstacles to Greener Computing

According to a post on Greenbiz.com, Emerson Network Power, the Data Center Users Group (an Emerson-sponsored organization), the EPA and Lawrence Berkeley National Labs conducted a survey showing that at least 65 percent of IT managers are using at least some form of energy efficiency practices to reduce costs and lower their environmental impact.

The obstacles they found to achieving such efficiencies include:
  • 40% - lack of encouragement from top management
  • 36% - widespread unawareness of the cost/benefit relationship of energy efficiency
  • 35% - enterprises not wanting to risk reliability
  • 33% - lack of communication between IT and facilities departments

There were some other interesting statistics on energy consumption in the data center:

  • 60% of the data center electrical load is used to power IT equipment:
    - 56% of that being used to power servers,
    - 27% for storage
    - 19% for network equipment
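Those percentages nest (the breakdown is a share of the IT slice, not of the whole facility), so here is a quick sketch converting them into shares of total data center load. The figures are the ones quoted above; note that the three IT shares as reported add up to 102%, presumably from rounding in the survey write-up.

```python
it_share_of_facility = 0.60  # share of total electrical load that powers IT equipment

# Breakdown of the IT slice, as reported above.
it_breakdown = {"servers": 0.56, "storage": 0.27, "network": 0.19}

# Convert each item into a share of the *total* data center load.
share_of_total = {
    name: it_share_of_facility * share for name, share in it_breakdown.items()
}

for name, share in share_of_total.items():
    print(f"{name}: {share:.1%} of total data center load")
```

So servers alone account for roughly a third of everything the data center draws.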

Some other interesting random data points:
  • 41% of survey respondents said their data center electrical usage is not metered separately from the rest of their facilities.
  • 81% of operators believe that by 2012 they will need additional data center capacity, despite the fact that 64 percent have built or upgraded their data center in the last five years.
  • 27% of respondents believe that despite consolidation and the use of virtualization, their server inventory will increase throughout the next five years.
My big takeaways from this: (1) there's still a big awareness & organizational rift between Facilities & IT Operations, (2) efficiencies still stink, and (3) data centers will still grow, as will the energy problem.

Friday, September 21, 2007

A Special PodCast on Active Power Management

Last week I teamed-up with our partner BearingPoint to record a brief overview of Active Power Management: Your On-Ramp to Utility Computing. BearingPoint's Managing Director of Financial Services Infrastructure Solutions joined me to discuss what is meant by Utility Computing. We covered how BearingPoint is organized to help customers implement this type of solution, how it changes the economics and operations of computing, and simple places to start for any medium-to-large enterprise.

One of the best places to begin using automation to optimize IT resource consumption is Active Power Management, i.e. applying policy and software-aware power control, all in a platform-agnostic power optimization scheme. Why ARE your idle machines on if they're not being used?

BTW, there are some other great technology-related BearingPoint Podcasts here too.

Monday, September 3, 2007

The Next Big Thing in data center efficiency: Active Power Management

Tuesday will be a "D'oh" moment for anyone who runs a Data Center and cares about electricity operating costs.

Even the EPA missed this one in their recent Report on Server and Data Center Energy Efficiency.

It's Active Power Management: That is, safely and intelligently powering-down unused and/or idle servers, and re-powering them when needed. It's a huge (and obvious) move when you consider that the average server burns-up more than half of its fully-loaded rated power when it's just sitting doing nothing.

Reams have already been written about energy-efficient servers, DC power distribution, improved cooling systems, hot/cold aisles, and of course server consolidation. But they've all missed-the-boat -- until today. And the solution is surprisingly simple.

Regardless of the type of server and type of software, this technology is non-disruptive to the data center. It's also the moral-equivalent to turning off the lights in a room that you're not using. Think of the Thousands of servers that sit idle most of the time in Development/Test, or in a "warm" failover facility.

What's the secret sauce?
  • It's the ability to gracefully shut-down software prior to turning off the box, and then ensuring re-start when the box is needed again.
  • It's the ability to set policies around application importance and interrelationships, and to be able to communicate directly with the software during a power-cycle.
  • It's the ability to do all of this from a hardware- and software-neutral perspective
  • And, it's the ability to do this in a way that will please both IT and facilities
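The "interrelationships" point deserves a small illustration. One plausible way to respect application dependencies during a power-cycle (my own sketch, not a description of the Cassatt product) is a topological sort: start dependencies first, and shut down in the reverse order. The three-tier application below is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical application tiers: each key depends on the services in its value.
depends_on = {
    "web": {"app"},
    "app": {"database"},
    "database": set(),
}


def startup_order(depends_on):
    """Dependencies first, so each service finds what it needs already running."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Reverse of start-up: dependents stop before the services they rely on."""
    return list(reversed(startup_order(depends_on)))


print("power up:  ", startup_order(depends_on))
print("power down:", shutdown_order(depends_on))
```

Application importance could then be layered on top, for instance by never selecting certain services for shutdown at all.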
The singular superstition about turning-off servers is why data centers have kept every single server in their inventory on all of the time. Until now.
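
To make the "graceful shut-down, then guaranteed re-start" idea concrete, here's a minimal sketch in Python. It is not Cassatt's implementation; the service names, the dependency map and the callback functions are all hypothetical, and it only assumes that application interrelationships can be expressed as "X depends on Y".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical example: each service maps to the services it depends on.
DEPENDS_ON = {
    "web": {"app"},
    "app": {"db"},
    "db": set(),
}

def start_order(depends_on):
    """Return services so that every dependency starts before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())

def shutdown_order(depends_on):
    """Stop dependents first: simply the reverse of the start order."""
    return list(reversed(start_order(depends_on)))

def power_down(depends_on, stop_service, power_off):
    """Gracefully stop software, then cut power (callbacks are placeholders)."""
    for name in shutdown_order(depends_on):
        stop_service(name)
    power_off()

def power_up(depends_on, power_on, start_service):
    """Restore power, then bring services back in dependency order."""
    power_on()
    for name in start_order(depends_on):
        start_service(name)
```

The point of the sketch is only that the ordering comes from policy (the dependency map), not from anything specific to the hardware or the software being cycled.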

All made possible with Cassatt's leading ability to apply sophisticated optimization technology to any problem in the data center. And don't just believe me. Look at what PG&E, Brocade, and IDC are saying. This should help raise-the-bar for energy-efficiency best-practices in the data center.



Friday, August 17, 2007

Watch this webinar: Leveraging Utility Computing - BearingPoint, & Cassatt

I certainly have to highly-recommend this upcoming CIO-magazine sponsored webinar: Utility Computing: How to Leverage Industry Advances to Optimize Data Center Economics. It's scheduled for September 19, 2:00 PM EDT.

Along with BearingPoint, I'll be discussing (yes, live and in-person!) what we mean by utility computing, and how the technologies that make "utility computing" possible are available today.

It's more than virtualization. It's about intelligently pooling all HW resources you own today to radically cut operational and capital costs -- and attain a level of agility that current HW/SW models inherently block.

As I've said before, CIOs are doing this already, you just don't know it. Look at Amazon's EC2. Look at Google's infrastructure. Look at Sun's Grid system. It's possible to do with the IT infrastructure you have sitting in your data center today.

What could you do if your total operational cost basis (fully-loaded, everything) was $0.10 per instance-hour for all of your compute services?



Pursuing Data Center Efficiency: TelaData's Convergence Conference

Heads-up: TelaData - essentially the premier consultants for designing data centers - is holding a Technology Convergence conference on October 16. I'll be giving a talk on Active Power Management, and Bill Coleman our CEO, will be keynoting. There's also an amazing lineup of speakers on infrastructure, power, and convergence topics. More of my thoughts on policy-based power management.

Bob Brown, CEO of TelaData, is the visionary behind this conference. He sees a massive convergence of technologies... technologies within the data center (e.g. the move toward IP-based video & audio) and the convergence of data center design itself (i.e. facilities, cabling, power management, etc.).

The two together have to be taken into consideration when designing new facilities. If you don't, then you risk mis-estimating compute, power, cabling and other layout requirements. And the $100+ million building you construct is obsolete before it's complete.

And these guys are the pros. While it's confidential (I think) they're advising some of the biggest data center users and web 2.0 companies in the business on data center construction.

Monday, August 6, 2007

The CMDB - An anemic answer for a deeper crisis

My first dabble in an occasional series of "A contrarian in the Data Center"....

I know that this is quite a provocative subject, but take a moment to consider where I'm going:

My thesis: CMDBs will be doomed either to (a) a short-lived existence as they sediment into other data center products, or (b) disappearing altogether as the industry finally realizes that utility computing (using generic hardware and standard stacks) obviates the need for an a la carte solution which tracks which-asset-is-where-and-doing-what-for-whom.

My evidence: Do you think that Amazon Web Services' EC2 compute "cloud" went out and purchased a commercial CMDB to manage their infrastructure and billing? Do you think Google maintains a central CMDB to track what department owns what machine? Isn't it odd that an umpteen-volume ITIL process ultimately relies on the existence of a conceptual CMDB? (In fact, doesn't it ring strange that such a "panacea" technology needs so many volumes of paper just to make it work?)

My logic: CMDBs are essentially a "band aid" for a larger (and growing) problem - complexity. They inherently do nothing to reduce the underlying complexity, configuration variances, or hand-crafted maintenance of the underlying infrastructure. In short, they are just another point-solution product that data center managers think will help them drive to a simpler lifestyle -- and they're dead wrong. Instead, they'll be buying another complexity layer - but this time, one that requires them to re-work process as well.

"But wait!" you say; CMDBs are needed because how else do you get your head around infrastructure variances? On what do you base configuration management? What do compliance systems use as a basis? Incident management processes have to "check in" somewhere, don't they?

Well, yes and no. By saying yes to most of the questions above, you're unconsciously complying with the status quo mindset of how data centers are architected and run. With layers of special-purpose tools, each supposedly simplifying the tasks-at-hand. But collectively, they themselves create complexity, redundancy, and the need for more tools like themselves. Every one of these tools maintains the assumption of continued complexity, configuration variances, and hand-crafted maintenance of underlying infrastructure.

So? BREAK THE MODEL!

My conclusion: What if the data center had an "operating system" ? This would automatically pool, re-purpose and re-provision all types of physical servers, virtual machines, networking and storage infrastructure. It would optimize how these resources were applied and combined (even down to selecting the most power- and compute-efficient hardware). It would respond to failures by simply managing around them and re-provisioning alternate resources. It would react to disasters by selecting entirely different physical locations for compute loads. And all completely platform-agnostic.

Now - if this system existed (and, of course, it does), then why would you need a CMDB?
  • The "Data base" and the "configuration-of-record" would have to already be known by the system, therefore present from the start, and constantly updated in real-time
  • Any infrastructure variances would be known in real-time - or eliminated in real-time, as the system re-configured and optimized
  • Configuration management, as we understand it today, would be obviated altogether. The system would be given a set of policies from which it would be allowed to choose only approved configurations (all standard, or not). The approved configurations would be constantly monitored and corrected if needed. There would be no "configuration drift" because there would be no human interactions directly with machines - only policies which consistently delivered upgrades, patches and/or roll-backs.
  • Compliance (per above) would essentially be governed by policy as well. The system's internal database (and historic record) could be polled by any external system which wanted to ensure that compliance was enforced over time.
  • Traditional incident management processes would essentially be a thing of the past, since most would be dealt with automatically. In essence, trouble tickets would be opened, diagnosed, corrected and closed automatically, and in a matter of seconds or minutes. Why then a massive ITIL encyclopedia to govern a non-existent human process?
Net-net: eventually, IT staffs will get off of the treadmill and realize that all of these point-solutions -- CMDB's included -- are merely perpetuating the need for themselves. Look outside the box you live in, break the mold, consider that there's a better way to manage IT on the other side. Say "No" to these point solutions, and start with saying "No" to the notion of CMDBs.
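
For the policy-driven configuration idea in the bullets above, here's a rough Python sketch of what "only approved configurations, constantly monitored and corrected" might look like. The role names, settings and data shapes are invented for illustration; a real system would discover the observed state itself rather than be handed a dictionary.

```python
# Hypothetical policy: the approved configuration for each server role.
APPROVED = {
    "web": {"os_image": "std-linux-a", "patch_level": 12},
    "db": {"os_image": "std-linux-b", "patch_level": 9},
}

def find_corrections(approved, observed):
    """Compare each host to the approved config for its role.

    Returns {host: {setting: approved_value}} for every setting that has
    drifted, i.e. the changes needed to bring the host back into policy.
    """
    corrections = {}
    for host, state in observed.items():
        expected = approved[state["role"]]
        drifted = {
            key: value
            for key, value in expected.items()
            if state["config"].get(key) != value
        }
        if drifted:
            corrections[host] = drifted
    return corrections

# Illustrative observed state (made-up hosts): host2 is one patch behind.
OBSERVED = {
    "host1": {"role": "web", "config": {"os_image": "std-linux-a", "patch_level": 12}},
    "host2": {"role": "web", "config": {"os_image": "std-linux-a", "patch_level": 11}},
}
```

Run in a loop, something like `find_corrections(APPROVED, OBSERVED)` is both the drift detector and the "configuration-of-record": the policy itself is the database.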

Say "Yes" to treating the data center at the system-level scale, not at the atomic scale.


Monday, July 23, 2007

HP Buys OpsWare: An Innovator's Dilemma?

Rumors circulating the valley today proved true: HP offered $1.6 billion to acquire OpsWare, a data center automation firm. HP has been quite savvy in acquiring a number of top players (Mercury Interactive, Peregrine Systems) in the automation space, bolstering its standing in the enterprise software area. But I pose the following question: Will HP be successful in making this marriage work?

Here's why: The combination of Mercury (performance management tools, IT governance and more), Peregrine (IT financial management, and more) and Opsware (IT automated provisioning & process automation) make for a powerful set of disruptive technologies -- that disrupt how data centers are "traditionally" monitored and managed. Those entrenched "traditional" products include IBM Tivoli, CA Unicenter, and HP's own OpenView.

Now, following the Innovator's Dilemma theory, posited by Clay Christensen, large companies will be averse to adopting newer technologies that would naturally erode/cannibalize their existing technologies, ways of doing business, and cash flows.

Yet, here we have an example where OpsWare (at least would have) come gunning at OpenView with a completely new & disruptive approach to managing and monitoring data centers. But now, OpsWare will be part of HP. So, will HP have the balls to manage OpsWare to its natural disruptive technological end (and probably make a boatload of money), or will they find that it erodes the OpenView market and becomes a threat to their existing OpenView base?

Part of me says these guys are smart. I would have combined the same types of companies to create a new IT management platform for the future. (Oh - and prediction: watch this space to see who picks up the next set of disruptive companies like Netuitive and Splunk). However, the other part of me says that HP's data center automation message as of late has been singularly "Blades". This is simplistic, and surely driven by marketing-program-du-jour.

There's one other chink in the execution of this deal: these acquisitions do nothing to advance HP's own hardware business -- and arguably, push hardware down a level towards further commoditization. What effect will this have on HP hardware revenues?

My theory continues to hold that the next real player in the data center management space has to come from an independent, non-equipment vendor. That leaves BMC, EMC, or CA. These guys are the only ones truly incented to create a revolutionary management platform that's truly platform independent -- and therefore valuable.

Tuesday, June 12, 2007

Postcards from Gartner's IT Infrastructure, Ops & Management Summit 2007

Orlando is steaming, and Gartner's newest IT Summit hasn't been far behind. This is Gartner's first summit of its type, with reasonable attendance (~800) and a great line-up of talks and break-out sessions. Monday was also where Cassatt announced its partnership with BladeLogic.

The event opened with keynotes from Mike Chuba and Cameron Haight, followed by a great forward-looking talk on the future of infrastructure and operations from Tom Bittman. He clearly sees the period between 2008-2012 as a shift from "Silos" to shared IT "Pools" - as virtualization itself shifts from consolidation to higher-value to the data center. He further predicted that starting around 2010, true Real Time Infrastructure will become mainstream (see the picture). This will be the enabler of IT-as-a-utility.

But he was careful to define growing distinctions between types of virtualization:
At the Application level, there are containers, zones, LPARs, VPARs; at the O/S level, we're seeing a number of mainstream VM technologies, including SW appliances; and, at the Hardware layer, we are seeing Grid and Infrastructure-as-a-Service (i.e. Amazon EC2).

Perhaps the most entertaining 'guest' keynote was from Peter Cochrane - ex-CIO for BT, and now a highly-regarded consultant. Brilliant, wry and witty, he opened by positing that IT's reality is having to deal with heterogeneity, mobility and increasing availability of bandwidth. With that bandwidth, (which will be exploited via increasing penetration of fiber, frequency-hopping and spatial distribution) the notions of connectivity to the "cloud" will be pervasive. And, the concepts of "connectivity" and "communication" will begin to shift to concepts of "location" and "presence". The other theme he put forward was one of release of control/centralization. He began with the fact that central control of broadcast bandwidth was shifting from a few thousand outlets (broadcast TV, radio, etc.) to billions of sources (phones, pervasive wifi, transmission "hopping", etc.) Release of control was also shifting from creativity in the office to creativity at home... very web 2.0 -- Oh, and he reiterated that the best definition of web 2.0 was put forth by Tim O'Reilly a number of years ago (one of my favorite pieces).

One of the most well-attended break-out sessions on day 1 was run by Ed Holub & Debra Curtis, "Running IT Like a Business" - putting forth that IT has to think of itself not as a cost center, but as a business unit with customer management, product management, marketing, financial controls and... yes... pricing. This of course requires that IT figure out how to identify costs and provide charge-back. And finally, it requires that IT be comfortable with losing business to competitors, i.e. outsourcers. Running IT-as-a-Utility to achieve efficiencies seems more needed than ever here.

Another series of sessions dealt with Data Center Power Management. It was clear that the current way of running data centers was essentially going to run out of electrical capacity in the future - so talk was not only about server efficiency, but cooling efficiency, and prudent facilities design as well. One particularly interesting breakout session, hosted by Will Cappelli, was "The Convergence of Operations and Energy Management". The observations here were huge: companies w/large data centers will come face-to-face with international & domestic carbon emissions regulations; IT and Facilities orgs will be required to work together to increase overall energy efficiency; IT energy & power consumption will have to be managed and intelligently optimized (on this, see my previous blog on turning off idled servers).

Clearly, there was tons of other content regarding IT Operations management, Process (i.e. ITIL), discussion around CMDBs, virtualization and more. Way too much to summarize. But stay-tuned as I may comment on some of these from time-to-time :)

Tuesday, June 5, 2007

"Green Data Centers" and the Silicon Valley Leadership Group

Last month Cassatt Corp. hosted a special meeting of the Silicon Valley Leadership Group's Energy Efficient Data Center Demonstration Project. With about 20 companies represented in the room, the topic (hosted by Ray Pfeifer, SVLG's Chair of the Project) was centered on radical new ways to cut the rapidly-rising power consumption (and cost) of running today's large data centers.

The Question: We posed the following question to the audience -- Why keep servers turned on when they are not being used? This is especially impactful when you consider a recent APC white paper indicating that the average server consumes about 50% of its loaded power, even when sitting idle. The analogy is to lighting in modern office buildings, where motion sensors only turn lights on for occupied rooms... and when a room is deemed idle and unoccupied, lighting is turned off. Why couldn't the same analogy apply to the over-provisioned servers in a data center - during peak times, as well as to nights/weekends? Clearly the problem is not as simplistic as light switches, but why isn't there a solution?

The Demo: So, with 3 video projectors blazing, two live data centers on-line, and the Collage Software in control, we set out to prove to the audience that power management was not only possible, but that it would save money as well. The scenario illustrated that an external trigger (say, a "curtailment event" from a local utility) could cause Collage to apply policies to power-down low-priority servers (according to their power consumption and/or efficiency) and even migrate-away their compute loads to another data center where power was cheaper. Obviously, the same could be done on a scheduled basis as well. Well, the demo was a success, even down to watching real-time power consumption curves dip-and-settle.
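
For the curious, the selection logic behind a scenario like this can be sketched in a few lines of Python. This is not how Collage does it; the `Server` fields, the priority convention (lower number = less important) and the wattage figures are all assumptions made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    priority: int  # assumption: lower number = lower business priority
    watts: float   # assumption: current measured power draw

def select_for_shutdown(servers, watts_to_shed, protected_priority):
    """Choose servers to power down when a curtailment event arrives.

    Servers at or above `protected_priority` are never touched. Among the
    rest, the lowest-priority and highest-draw machines go first, until the
    requested reduction is met (or candidates run out).
    """
    candidates = [s for s in servers if s.priority < protected_priority]
    candidates.sort(key=lambda s: (s.priority, -s.watts))
    chosen, shed = [], 0.0
    for server in candidates:
        if shed >= watts_to_shed:
            break
        chosen.append(server)
        shed += server.watts
    return chosen, shed

# Made-up inventory for illustration only.
FLEET = [
    Server("test-01", priority=1, watts=200.0),
    Server("test-02", priority=1, watts=300.0),
    Server("batch-01", priority=2, watts=250.0),
    Server("prod-db", priority=5, watts=400.0),
]
```

Calling `select_for_shutdown(FLEET, watts_to_shed=450, protected_priority=5)` picks the two test boxes and leaves batch and production alone. Migrating the displaced loads to another data center would be a separate step that this sketch doesn't attempt.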

Not coincidentally, it turns out that power companies (like PG&E here in Sunny CA) offer incentives for shifting power to off-peak times, as well as special demand-response programs and incentives for firms that react to electrical demand during "events" by additional short-term reductions in power use (like turning off lights & HVAC). Our initial conversations with PG&E (who was also in the room!) showed that they were eager to pursue this type of approach, and confirmed that Cassatt was the first to tackle this problem!

The Numbers: Following the demonstration, we also ran some conservative financial numbers. They showed that a company with 500 typical servers could regularly schedule shut-downs of idle equipment -- even during peak periods as well as nights/weekends -- and save 20%+ of their total energy costs! These numbers are very encouraging for the economics of this approach.
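
If you want to play with this kind of estimate yourself, here's a back-of-the-envelope Python sketch. It is not the model we ran. The only input taken from this post is the roughly-50% idle power figure; the idle-time fraction and the share of idle time actually spent powered off are placeholder assumptions, and the function counts server energy only (no cooling or power-distribution overhead).

```python
def energy_savings_fraction(idle_power_ratio, idle_time_fraction, powered_off_share):
    """Estimate the fraction of baseline server energy saved by powering off idle machines.

    idle_power_ratio   -- idle draw as a fraction of fully-loaded draw (e.g. 0.5)
    idle_time_fraction -- fraction of hours a server sits idle (assumption)
    powered_off_share  -- fraction of those idle hours it is actually powered off (assumption)

    Energy is expressed relative to one server running fully loaded all the time.
    """
    busy_energy = (1.0 - idle_time_fraction) * 1.0
    idle_energy = idle_time_fraction * idle_power_ratio
    baseline = busy_energy + idle_energy
    saved = idle_energy * powered_off_share
    return saved / baseline

# Illustrative inputs only: idle 60% of the time, powered off for half of that.
example = energy_savings_fraction(
    idle_power_ratio=0.5, idle_time_fraction=0.6, powered_off_share=0.5
)
```

With those placeholder inputs, `example` works out to 0.15 / 0.70, or roughly 21% of server energy. Plug in your own utilization numbers; the result is very sensitive to how idle your fleet really is.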

The Punchline: So, the takeaway here is that medium-to-large data centers can save a bunch of $, be "green", and do so without having to change any hardware or software! So ask yourself -- as you pursue installation of energy-efficient IT *equipment*, why are you not also pursuing energy-efficient *operation* of that equipment??

My Related Blogs:
4/6/07 - D'oh: Turning Off Idle Servers
1/22/07 - Clothes Driers, Data Centers, and Power Management

Thursday, May 31, 2007

XenSource : Cassatt :: Virtualization : Automation

Here's an update on what could be an interesting partnership: XenSource (the commercial end of Xen, the opensource hypervisor) is teaming-up with Cassatt for a webcast on June 6th. (which, by-the-way, is yours-truly's birthday... a distinction which I share with Tim O'Reilly and an old buddy from Sun, Simon Phipps).

XenSource, you will recall, is the current underdog VM provider in the market - a pure-play high-performance approach to application virtualization. Part of their competitive strategy is (a) how they price (zero if you choose the open-source version, but way lower than the competition if you buy from XenSource), and (b) how they distribute (bundled in with the major Linux distros, not to mention Solaris).

The webcast has two heavy-hitters: Simon Crosby (who's XenSource's CTO), and Rob Gingell (who was a past Sun Fellow and VP -- now Cassatt's CTO). It should be interesting to hear their take on automating virtualization, given the market noise around XenSource - but also given the fact that they themselves don't offer a sophisticated VM management/automation solution... but Cassatt does.

Given I have an 'inside track' on this, I suspect that the conversation will also turn to "what comes next" after virtualization - probably a pretty valuable conversation if you care about your IT career a year from now.

Sunday, May 20, 2007

Postcards from Forrester's IT Forum 2007

Forrester Research last week held its 2007 IT forum in Nashville, which I attended.

If there was any theme during the first two days, it was a conceptual shift from "IT" (info. technology) to "BT" (business technology) - keynoted by their CEO, George Colony. During one of the keynotes, the oft-repeated adage summed it up:
"There are no IT projects anymore, just Business Projects"

To that end, there was a litany of guest-keynoters, notably Jeanne Ross of MIT Sloan, Robert Willett, CEO of Best Buy, and others. Each of their presentations went down a relatively conceptual path of assessing organizational agility and business-readiness, alignment, and somewhat Dilbert-esque abstractions trying to align their talks with the concept of "BT"... I say this only because the audience was less one of MBAs, and more of operations executives looking for tactical trends and pointers.

However, the best talk IMHO came from Robert Beauchamp, CEO of BMC software. He's a very down-to-earth, articulate guy - even in front of 1,000 people. I was most impressed by his Shoemaker's Children analogy... that the IT (alright, BT) organizations in enterprises are arguably the least automated departments around. ERP is automated. Finance is automated. Customer interaction is automated. But IT is still manually glued-together, with operations costs continuing to outpace capital investments. He showed the chart here at right (from IDC!) which hits the point home.

I was also rather impressed with the analysts we spoke 1:1 with. Each is closely tracking the IT automation trend, how virtualization is playing an initiating role in the IT Utility, and how this automation trend is beginning. Also most notably, I bumped into an old friend from Sun, James Staten, who was just brought on to Forrester to follow trends with the Mega Data Centers such as their economics and use of automation as well.

Saturday, April 28, 2007

SmugMug's Don MacAskill and Utility Computing

Surfing has its benefits. I tripped over SmugMug CEO Don MacAskill's Blog today. SmugMug is a rough competitor to Flickr and other (lower-grade) photo archiving sites. They archive about 130,000,000 photos right now. And they use Amazon's S3.

This is a perfect commercial example of server-less IT I spoke about last week. And it's proof that the economics of Utility Computing are compelling. MacAskill estimated that he's saving about $500,000 annually by not buying and managing his own storage (he computes the number in his blog). And he expects that number to increase. Amazon has taken the traditional approach to managing storage (S3) and computing (EC2) and applied a utility automation paradigm -- enabling a completely new cost model. How else could they be offering such pricing to users?

What's this going to enable in the future? Read on: MacAskill's other Blog: "Amazon+2 Guys = The Next YouTube". (aka the server-less web service!)

I gotta keep wondering: When is corporate IT going to catch onto this utility computing approach, and make "compute clouds" out of their own stuff?

Sunday, April 22, 2007

Prediction: Server-Less IT Services

Folks like Greg Papadopoulos at Sun say that a small number of companies will invest in creating a huge infrastructure of computing power. (See my blog of 12 Jan. 2007). And folks like Amazon are already doing so with their Elastic Compute Cloud (EC2), while others like Google, eBay, Yahoo etc. are likely to follow.

To Wit: Carriers like Verizon have announced intentions to do so, and SalesForce.com recently announced its existing ability to host more than just CRM applications. But what will really signal the shift toward "compute cloud" use will be the third-party vendors that make use of these resources.

So here's my prediction: As the infrastructure vendors build-out their compute and storage farms, a new class of computing "brokers" will emerge. These players will adapt the needs of users and IT departments to make seamless use of these compute and storage "clouds". Everything from backing-up your laptop for pennies a GB, to hosting and failover services that don't own a single server.

And here's proof it's happening, with "mashups" of the following just around the corner:
  • JungleDisk: offering a simple windows interface to allow individuals to create a "web drive" onto Amazon's S3 storage
  • Weoceo: offering a product that allows existing servers to "overflow" peak computing needs onto Amazon's EC2 cloud
  • Enomalism: providing services to provision and migrate virtual "elastic" servers, even onto and off-of the Amazon EC2 cloud
  • Elasticlive: which essentially provides virtual hosting services - as predicted - (and works with Enomalism, above). Plus, they charge by the "instance-hour", not by the server type!
  • Geoelastic: a beta group of "global hosting providers" who will be creating a "global elastic computing cloud" and presumably balancing loads between physical centers.
  • Distributed Potential: beginning to deliver pay-per-use grid computing capacity (powered by Elasticlive and Enomalism technologies, above)
  • Distributed Exchange: Also powered by (and presumably founded by) ElasticLive and Enomalism; claiming to "broker" excess compute capacity between providers
  • Dozens of 3rd-parties creating even more applications on S3
The question is, how quickly will small- to medium-sized businesses feel comfortable outsourcing their IT needs to a service that itself may not own any physical servers? What security, compliance and privacy issues might arise? My gut tells me that these are merely details that will be overcome as the new economics of this model crushes the existing economics of owning your own iron.

Lastly, from somewhat of a self-serving perspective, Cassatt essentially creates a "cloud" out of existing resources within corporate IT. At that point, shifting loads between "clouds" (internal or external) becomes a simple policy-based procedure.

Thursday, April 19, 2007

Moving Virtualization Up a Notch

I had the opportunity to speak the other day with Dan Kusnetzky, who interviewed Cassatt for his ZDnet blog which reports on virtualization trends. And boy, he really gets the trend.

Right off, he started with observing that "virtualization" isn't just one thing (Consider: Hypervisors, zones, containers, LPars, network VLANs and virtualized storage). We also quickly observed that virtualization probably isn't an end-game-in-itself for IT. Rather, it represents the most critical enabler that will ignite transformation in the IT industry.

That transformation represents a new way to look at managing IT: Today, we have specialized hardware, software, HA/failover software, monitoring & performance analysis systems, and dozens more. Tomorrow, the transformation will look like managing all of these systems holistically, much the way an Operating System manages components within a server. The automation will be technology agnostic, made possible through virtualization. A number of Dan's earlier interviews all point to this inevitability as well.

He had a bunch of great observations, but the last I liked best: "It's important to take the broadest possible view and avoid point solutions. From this vantage point, a failure of some resource must be handled in the same way as any other condition that causes the configuration to no longer meet service level objectives."

For me, the takeaway from the conversation was something I've said before: take the "long view" on implementing virtualization... it may yield you quick HW savings today, but if it's automated in an "IT-as-utility" context, its future savings will dwarf what the industry is seeing now.

Friday, April 6, 2007

D'oh. Turning off idle servers

A question just dawned on me. Data centers automate service-levels according to policies... like "always make sure application X has priority" or "always keep server utilization at-or-above Y". So, why not treat power consumption the same way?

Here's what I mean: there are times when server use is low - like during weekends or during the evening. There are also "events" (like the power emergencies we tend to get here in California) where you'd like to minimize power use when your electrical utility tells you to.

So I'm thinking - why shouldn't data centers respond to electrical cost/availability/demand the same way they respond to compute availability/demand? When "events" happen, we turn off the office lights, right?

It turns out that power companies (like PG&E here in Sunny CA) have "traditional" programs to encourage energy efficiency (like rebates for efficient light bulbs, and even efficient servers). But they also have special demand-response programs and incentives for firms that react to electrical demand during "events" by additional short-term reductions in power use (like turning off lights & AC).

Couple that with server automation software, and you've got a combination that's pretty neat: Data Centers that can do things like turn-off low-priority servers, or perhaps move critical applications to other data centers during power events. Cassatt's identified a couple of interesting scenarios:

  • Low-priority servers automatically powered-off during power "emergencies"
  • Standby servers that remain powered-off ("cold") until needed
  • "Follow-the-moon" policies where compute loads are moved to geographies with the least-expensive power
  • Policies/direction to use the most power-efficient servers first
  • "Dynamic" consolidation, where virtual machines are constantly moved to achieve a "best-fit" to maintain utilization levels (minimizing powered-up servers)
Our initial conversations with PG&E showed that practically no data center customers employ this technology yet, even though PG&E offers programs to reward (with rebates/incentives) this sort of activity. BTW, there are any number of other utilities that encourage energy efficiency, e.g. the Consortium for Energy Efficiency.
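
As an illustration of the "dynamic" consolidation scenario in the list above, here's a small Python sketch using a best-fit-decreasing packing heuristic. That heuristic is my stand-in for illustration, not a description of what Cassatt's software does; the VM names and load figures are invented, and loads are expressed as whole-number percentages of one host's capacity to keep the arithmetic exact.

```python
def consolidate(vm_loads, host_capacity=100):
    """Pack VMs onto as few hosts as the heuristic finds (best-fit decreasing).

    vm_loads      -- {vm_name: load as a percentage of one host} (assumption)
    host_capacity -- capacity of each identical host, in the same units

    Returns (placement, hosts_needed), where placement maps vm_name -> host index.
    Hosts that end up with no VMs never get created, i.e. they can stay powered off.
    """
    free = []        # remaining capacity per powered-on host
    placement = {}
    # Place the biggest VMs first; they are the hardest to fit.
    for vm, load in sorted(vm_loads.items(), key=lambda item: -item[1]):
        if load > host_capacity:
            raise ValueError(f"{vm} does not fit on any single host")
        fitting = [i for i, room in enumerate(free) if room >= load]
        if fitting:
            # Best fit: the host that would be left with the least spare room.
            host = min(fitting, key=lambda i: free[i] - load)
        else:
            free.append(host_capacity)  # power on one more host
            host = len(free) - 1
        free[host] -= load
        placement[vm] = host
    return placement, len(free)

# Made-up loads for illustration.
VM_LOADS = {"vm-a": 60, "vm-b": 50, "vm-c": 40, "vm-d": 30, "vm-e": 20}
```

With these made-up loads, five VMs fit on two hosts. Re-running the packing as loads change, and live-migrating VMs to match, is the "constantly moved" part, which is where the real engineering effort goes.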

If building operators can automatically turn off non-critical lights and HVAC systems during electrical emergencies, then why don't data centers?

Sunday, February 25, 2007

Blog Tag

Well, I got tagged (not the graffiti type) by Steve Wilson. Now it’s my turn to reveal 5 things-you-don't-know-about-me, and then tag 5 more unsuspecting-yet-interesting folks… But first, I had to do a bit of the pedigree/ancestry tracing to see where this tagging all began:

Steve first got tagged by Rich Green (a former colleague; and prior to that, a former colleague); Sin-Yaw Wang tagged Rich; Hal Stern tagged Sin-Yaw; Mary Cay Kosten tagged Hal. Then the trail went cold... Mary Cay’s original tag was from behind Sun’s Firewall, so this was a dead-end.

So I took another track from someone else I know: Jonathan Schwartz was tagged by James Governor (of RedMonk, a cool analyst group I’ve worked with in the past); James was tagged by Jeff Pulver. Ahh. Jeff points to the "Root" of Blog Tag pedigree, 3 orders-of-magnitude better/bigger than mine, residing at Solo SEO ... clearly indicating that some folks have way too much time.

Anyway, here goes the whole point of the thing:

  1. I once sailed 1,300 miles from Myrtle Beach SC to Tortola, BVI – with no electronics onboard except a UHF radio and a Timex digital watch. There were 3 of us onboard for 10 days (fortunately two of us knew how to use a sextant). It was one of the most memorable times of my life, being at the mercy of the elements, but having ‘science’ in our back pocket. We made landfall within a mile or so of our target… You gotta read the book “Longitude” to appreciate how important this form of navigation was.
  2. I was a product of the Reagan defense-spending era – my first job (just out of engineering school) was working for a defense contractor designing and building hardware-based real-time adaptive optics and wavefront correction systems. Next time you hear about DoD blowing planes out of the sky with lasers, think of me. (I think that’s all I’m allowed to say about that)
  3. I’d rather be renovating a bathroom, laying tile, or for that matter, building a house. There’s something intensely gratifying about building something permanent/durable.
  4. I wrote a program (circa 1979) in Basic on a Commodore PET with 16k of RAM that played Solitaire. Yes, just 16k. When I finished, there wasn’t enough space to actually execute it. This was perhaps my finest (and final) foray into software. Unless you consider writing FORTRAN batch jobs using punch-cards on a Sperry/Univac.
  5. I started building a historically-accurate plank-on-frame scale model of Lord Nelson’s flagship at Trafalgar, HMS Victory, back when I was in high school. I put 18 months into it and got as far as building the hull. Then life intervened. Once the kid(s) are out of school, I burn-out in High Tech, and I get over that “I’ve gotta build a house” thing, the HMS Victory is how I plan to occupy my remaining time in the Rest Home.
And now, the Tag-You're-It list (which isn't so easy when most of your friends don't yet blog): Ashesh Badani, Rich Sands, James Urquhart, David Gee, and Vinay Pai.


Sunday, February 18, 2007

Virtual Infrastructure Diversity... and the Need to Manage it.

I can't take credit for the following observations... They were simply handed to me by one of Cassatt's best account executives, who spends endless days speaking with CIOs and operations personnel who manage huge, diverse data centers - and who frequently miss these important facts. He's noted that "Virtualization" is being inaccurately taken to mean only the software hypervisors purveyed by VMware.

However, the Truth is that virtualization is multi-vendor, multi-technology, and inherently heterogeneous. So: How will these diverse technologies be managed in the future? Here are some observations:
  1. People are confusing ‘Virtualization’ with VMware (but remember, there are more types of virtualization than just hypervisors for software!)
  2. VMware’s scope is limited to X86 Platforms. Most organizations have more than X86.
  3. There are other types of Virtualization within other platforms (Example: Mainframe LPARs, Solaris containers/Zones, HP VPARs).
  4. There are even Virtualization alternatives in X86 (XenSource, Xen/RHEL, Xen/SuSE, and SWSoft Virtuozzo and others).
  5. There are different types of Virtualization (JVM, VLANs in the network domain, SAN & NAS in the storage domain, not to mention Incipient and 3Par).
  6. VMM Virtualization is an OS feature and its price will be commoditized to $0 over time. Evidence:
    - Historically LPARs came with MVS
    - Sun does not charge extra for Containers/Zones
    - IBM AIX & HP-UX don’t charge extra either
    - JVMs are free.
    - VLAN-ing comes built into Switch Firmware
    - You can get NAS for free or pay for a specially-tuned version in Proprietary Hardware (NetApp)
    - Watch out in the SAN-space but it will be interesting to see where that goes with iSCSI and 10Gig Ethernet etc.
  7. A comment on X86 VMM-pricing:
    - Red Hat will deliver Xen for free in RH5
    - SuSE will deliver Xen for free in SuSE 10.x
    - Intel supports it for free in Intel-VT chips
    - AMD supports it for free in their Pacifica chips
    - XenSource costs 25% of VMware and will eventually be acquired by somebody who will give it away for free as part of something else
Thus we see that there are nearly a dozen types of "virtualization" that need to be managed collectively - many of them are already present in data centers, and many will be available for free (or nearly so) in the near future.

This is the next impending management crisis - management, automation, and optimization of virtualized computation, containers, storage, and networking.


Wednesday, February 7, 2007

Postcards from IDC’s Virtualization Forum 2.0

I just attended the IDC analyst conference yesterday in New York. In some sense, the big news was simply how pervasive virtualization is becoming, and how many different technology sources there are. And most interesting, there was lots of talk about *managing* virtualization, not just using it for consolidation.

John Humphreys (Enterprise Computing) opened with some riveting statistics:

  • 62% of VM users are looking for a “unified tool”
  • 45% of servers planned for installation next year will be virtualized
  • 23% is the average savings being reaped from HW, power and facilities
  • 70% of IT costs still reside in operations... not in hardware or software.

In addition, he had the foresight to refer to VMs as “the new atomic unit of management.” Hmmm. Right up Cassatt’s alley.

Finally, under “challenges”, one of the big bullets was “How can you consolidate/manage across the DMZ?”... which I found interesting. True, it’s a growing issue, but frankly, with automated network configuration, Collage already manages virtual (and physical) resources across a number of virtual networks.

We also spoke 1:1 with analysts Matt Eastwood (VP, Enterprise platforms), Michelle Bailey (Datacenter Trends), and Al Gillen (VP, system software). Overall, they confirmed the themes that Virtualization is pervasive, and that the challenge is becoming how to manage this new technology. Topics of note:

  • Managing across networks (as above)
  • Parameterized & “mass-produced” provisioning of VMs
  • Managing a virtual enterprise across geographies
  • Justifying economics beyond hardware savings

Besides the better-known technologies (e.g. VMware and Xen) there were also some interesting virtualization options:

  • Trigence: which has an interesting “encapsulation” technology; they don’t use a hypervisor, per se, but rather encapsulate an application, plus all relevant files/libraries, etc. so it’s completely portable
  • SWsoft: which has a unique virtualization approach that, if you only care about one OS, gives you high performance and a huge degree of consolidation
  • HP and IBM: both hyping their versions of self-managing blade systems.
  • IBM has also announced their Secure Hypervisor (sHype) product, that may be incorporated into 3rd-party hypervisors.

From a purely selfish perspective, Cassatt is pretty well-positioned to help manage/automate an upcoming need: as the Virtualization market matures, more datacenters will need a vendor-neutral way of managing across virtual and physical domains, pooling resources, and guaranteeing service levels.