In the room were mostly vendors such as AMD, Emerson Electric, HP, Intel, NetApp and Sun. Plus, folks from DOE, EPA, Lawrence Berkeley National Labs, the Green Grid, and the Uptime Institute attended as supporting resources to DOE. And first-off, it was great to see the overlap/interaction between these groups -- especially that DOE, Green Grid and Uptime Institute were cooperating.
The tool itself is expected to have 4 components of assessment: IT (servers, storage, networking, software), Cooling (chillers, AC units, fans), Power Systems (UPS, distribution) and Energy Sources (generation, etc.).
But the real core of the day's conversation is around what's meant by "Efficiency" - all agreed that at the highest level it was the ratio of "useful computing output" to the "energy supplied to the overall data center". That second number is sort of the easy part: it includes all of the power used for lighting, cooling, power distribution, etc. etc.; sometimes it's hard to measure empirically, but it all comes down to kWhs. The real issue, it turns out, is what's meant by the first number, "useful computing output".
In our afternoon IT breakout group, about 10 of us debated for a good hour or so about just that: how do we define "output"? Is it MIPS? Is it measured in SPEC units? Is it CPU utilization? And what about storage and network "output" as well? In the end, we agreed that we should define it the way any IT professional would: as an application service with an associated service level.
When the DOE tool (or set of tools) is complete in mid-2008, it will represent a seminal event:
And, I hope, this benchmark will begin to accelerate (a) the move toward dynamically-managed shared compute resources, and (b) the technical & organizational bridge between IT system management and facilities management.
Thursday, December 20, 2007
Wednesday, December 19, 2007
So in the spirit of the holidays, here are some interesting links:
1. Bridget Botelho at SearchDataCenter writes "Servers get no rest during the holidays" -- that most enterprises (and especially small/medium businesses) leave all servers on during the holidays, even if the company is closed. It's a waste of power/money, even though many solutions abound. (And did you even think about the security risk of keeping your IT on but essentially unsupervised?)
2. And, for those of you with money left in your budget, Rick Vanover from TechRepublic chimed-in to suggest 10 good ways to use your remaining IT budget before the end of the year - I particularly like the following:
- #3: Purchase power management: Many new power management devices are available now that can be a good replacement for your limited power distribution units (PDUs). These PDUs can add management layers to individual power sockets for power consumption, naming, grouping, and power control. The new devices can also add more ports should you need to power more computer systems in your racks.
3. And, Rick went on to also cite an earlier blog of his, 10 things you should know about advanced power management - another topic near-and-dear to my heart, especially:
- #7: Turn off retired or unused devices: This will reduce your power consumption — and possibly accelerate your removal of the device so as not to overprovision power unnecessarily...
Monday, December 10, 2007
- Has anyone looked at the labor costs of this? I know that even on my tiny little dozen-machine network, I am reluctant to power everything off at night simply because it takes so bloody long waiting for the damn things to boot up in the morning. Seems like actual working fast-boot technologies would go a long way to sell this initiative.
BTW, if you're interested in additional "Urban Myths" about server power control, check out the "Myths and Realities of Power Management" page.
And while you're at it, give us some feedback on how you feel about IT Energy Management and "Green IT": we're hosting a 5-minute survey this week (and, you could win a Wii if you take it).
Saturday, December 8, 2007
The two are making the case not only for server power management, but are calling on vendors to go a step further: to make computers adapt their power consumption directly to the compute load they carry. This would be highly complementary to consolidation efforts currently underway.
In conclusion, the paper says,
- Servers and desktop computers benefit from much of the energy-efficiency research and development that was initially driven by mobile devices' needs. However, unlike mobile devices, which idle for long periods, servers spend most of their time at moderate utilizations of 10 to 50 percent and exhibit poor efficiency at these levels. Energy-proportional computers would enable large additional energy savings, potentially doubling the efficiency of a typical server. Some CPUs already exhibit reasonably energy-proportional profiles, but most other server components do not.
- We need significant improvements in memory and disk subsystems, as these components are responsible for an increasing fraction of the system energy usage. Developers should make better energy proportionality a primary design objective for future components and systems. To this end, we urge energy-efficiency benchmark developers to report measurements at nonpeak activity levels for a more complete characterization of a system's energy behavior.
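To make the paper's point concrete, here's a toy model (my own illustrative numbers, not the researchers' measurements) comparing a typical server's power curve to an ideal energy-proportional one:

```python
# Toy model of energy proportionality (illustrative numbers, not measurements).
# A typical server draws a large fraction of peak power even when idle;
# an "energy-proportional" server would draw power in proportion to utilization.

def typical_power(util, peak_w=300.0, idle_frac=0.6):
    """Power draw when idle consumption is a fixed fraction of peak."""
    return peak_w * (idle_frac + (1.0 - idle_frac) * util)

def proportional_power(util, peak_w=300.0):
    """Ideal energy-proportional power draw."""
    return peak_w * util

# Servers spend most of their time at 10-50% utilization -- exactly where
# the gap between the two curves (and the potential savings) is largest.
for util in (0.1, 0.3, 0.5, 1.0):
    typ = typical_power(util)
    prop = proportional_power(util)
    print(f"util={util:.0%}: typical={typ:.0f}W, proportional={prop:.0f}W, "
          f"savings={1 - prop/typ:.0%}")
```

Note how the savings shrink to zero at 100% utilization: proportionality only pays off at the moderate loads where real servers actually live.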
Tuesday, November 27, 2007
Thomas Bittman opened the AM with a keynote on the Future of Data Center Operations. It had a pretty broad coverage of the state of DC Ops today. He had at least one memorable interjection -- what seemed a warning to equipment vendors who have strangleholds over customers, strongly urging customers to reject platform-specific IT technologies. He also predicted the emergence of the "meta-O/S" and the "cloud-O/S" which (I think) is a re-packaging of Gartner's Real Time Infrastructure (RTI) story -- and that the meta-O/S had to be platform- and vendor-neutral. But this was the first time that I've heard Gartner pay specific attention to the emergence & legitimacy of cloud computing (and the "O/S" to run it).
Next, Donna Scott gave an equally broad-ranging talk on IT Operations Management. Again, she conducted her now 5+ year-old survey of IT's biggest pressures. And once again "high rate of change", "cost containment" and "maintaining availability" took top-honors as the largest ulcer-producing pressures facing CIOs. Also true-to-form, she re-iterated that a shared infrastructure (RTI) is inevitable, breaking down the islands of technology in large data centers.
There were also some interesting vendor break-out sessions; take, for example, a session on managing power and cooling from Emerson Network Power's Greg Ratcliff. The trend here is also toward intelligent monitoring and infrastructure. He spoke of the localized cooling (even within the rack) needed as rack power density increases. There was definite reference to "adaptive cooling" and "adaptive power" -- again implying that efficiencies in large data centers can only be achieved through better use of technology, rather than throwing raw horsepower at the heat/power problem.
Finally, one last surprising (to me) datapoint: the general audience was asked who was using virtualization in production - and 1/2 to 2/3 of the audience raised their hands. This definitely drove home the point that VMs are (and will be) everywhere. However, I combine this observation with the earlier point that data centers will need a management layer, an "O/S", which is vendor-neutral. At the moment, I don't see any of the existing large vendors stepping up to fill this virtualization management need any time soon.
Thursday, November 15, 2007
Even Detroit knows that autos get different efficiencies based on how & where they're driven... so the metric called "mileage" is actually measured & documented twice -- once for City, once for Highway. Data Centers need something akin to this as well.
Why? Because IT departments operate at greatly different levels; peak (maybe during the day) as well as off-peak (perhaps nights/weekends). Ideally, the data center should know how to adapt to these conditions: re-purposing "live" machines during peak hours; retiring and temporarily shutting-down idle servers during off-peak; removing power conditioning equipment when not needed; turning off specific CRAC units and chillers when not required (i.e. cold days and/or off-peak hours). We need an efficiency metric that indicates how data centers operate Dynamically.
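To make the City/Highway idea concrete, here's a small sketch (with made-up readings of my own) that logs the ratio of IT power to total facility power separately for peak and off-peak hours:

```python
# Sketch: a "city vs highway" efficiency metric -- log the IT-power-to-
# total-facility-power ratio separately for peak and off-peak windows.
# All readings below are hypothetical illustrations.

def efficiency(it_kw, facility_kw):
    """Fraction of total facility power that actually reaches IT gear."""
    return it_kw / facility_kw

# (hour of day, IT load in kW, total facility draw in kW)
readings = [
    (10, 800, 1600),   # daytime peak
    (14, 850, 1650),
    (2,  300, 1100),   # overnight off-peak
    (4,  280, 1080),
]

PEAK_HOURS = range(8, 20)
peak = [efficiency(it, fac) for h, it, fac in readings if h in PEAK_HOURS]
off  = [efficiency(it, fac) for h, it, fac in readings if h not in PEAK_HOURS]

print(f"peak ratio:     {sum(peak)/len(peak):.2f}")
# Off-peak looks worse because cooling/distribution overhead stays fixed
# while the IT load falls -- exactly the dynamic behavior worth measuring.
print(f"off-peak ratio: {sum(off)/len(off):.2f}")
```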
Anyway, here's a quick survey course in what metrics I did find, and what I'd like to see:
The Green Grid on metrics:
- Data Center Infrastructure Efficiency,
DCiE = (IT equipment power)/(total facility power).
This is supposed to be a quick ratio showing how much power gets to servers, versus how much else is consumed by power distribution, cooling, lighting, etc. Driving this ratio up means you have less overhead wasting Watts. This wouldn't be too bad a metric if it were used and monitored 24x7, i.e. peak and off-peak.
- Power Usage Effectiveness,
PUE = 1/DCiE (just the boring reciprocal)
- Data Center Productivity, (a metric to be adopted in the future)
DCP = (useful computing work)/(total facility power)
In theory, this is a great metric: it's like asking "how many MIPS per Watt can you produce?" (BTW, the human brain, the most powerful of all computers, consumes something like 25W). Anyway, DCP is a contentious metric... because each computing vendor wants to define "useful computing work" in its own (preferential) way. Frankly, this is most useful for measuring efficiency at the server level.
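For illustration, here's how the three Green Grid figures relate, using made-up numbers (and treating MIPS as a stand-in for the contentious "useful computing work"):

```python
# Relating the three Green Grid metrics, with hypothetical numbers.

it_power_kw = 500.0        # power actually reaching IT equipment
facility_power_kw = 1000.0 # total draw at the utility meter
useful_work_mips = 2.0e6   # a stand-in for "useful computing work"

dcie = it_power_kw / facility_power_kw      # Data Center infrastructure Efficiency
pue = 1.0 / dcie                            # Power Usage Effectiveness (reciprocal)
dcp = useful_work_mips / facility_power_kw  # Data Center Productivity

print(f"DCiE = {dcie:.2f}")        # half the power reaches the IT gear
print(f"PUE  = {pue:.2f}")         # every IT watt costs two watts at the meter
print(f"DCP  = {dcp:.0f} MIPS/kW") # "useful work" per unit of facility power
```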
In an excellent paper, the Uptime Institute discusses these in "Four Metrics Define Data Center 'Greenness'":
- Site Infrastructure Energy Efficiency Ratio:
SI-EER, which the Institute is currently working to re-cast in more intuitive and technically accurate terms. I suspect this is much like the Green Grid's DCiE, above.
- Site Infrastructure Power Overhead Multiplier, which is essentially the same metric as the Green Grid's PUE, above:
SI-POM = (data center power consumption at the meter)/(total power consumption at the plug for IT equipment)
- Deployed Hardware Utilization Ratio:
DH-UR = (qty of servers running live applications)/(total number of servers actually deployed)
This speaks to the real-time utilization of hardware, and IMHO is one of the best metrics for a dynamic data center. It points to how many deployed servers are actually doing work, vs. those that are sitting "comatose". A very promising metric if it's used in conjunction with equipment that constantly optimizes how many servers are "on", shutting down idled servers and keeping this metric optimized.
- Deployed Hardware Utilization Efficiency
DH-UE = (minimum qty of servers needed to handle peak load)/(total number of servers deployed)
This is another great metric - it speaks to the capital efficiency of hardware - how many need to be provisioned and on the floor, relative to how many are being used actively.
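A quick sketch with made-up server counts and meter readings shows how these Uptime ratios fit together:

```python
# The Uptime Institute ratios, with hypothetical counts and meter readings.

meter_kw = 1200.0          # data center power consumption at the meter
it_plug_kw = 600.0         # power at the plug for IT equipment
servers_live = 700         # servers running live applications
servers_deployed = 1000    # total servers actually on the floor
servers_needed_peak = 750  # minimum servers needed to handle peak load

si_pom = meter_kw / it_plug_kw                  # power overhead multiplier
dh_ur = servers_live / servers_deployed         # hardware utilization ratio
dh_ue = servers_needed_peak / servers_deployed  # hardware utilization efficiency

print(f"SI-POM = {si_pom:.2f}  (2.0: every IT watt costs a watt of overhead)")
print(f"DH-UR  = {dh_ur:.0%}  ({servers_deployed - servers_live} servers sit 'comatose')")
print(f"DH-UE  = {dh_ue:.0%}  (capital efficiency of the deployed fleet)")
```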
- A DH-UR that changes dynamically, constantly being optimized. This implies that only required servers are actually powered-up and active.
- An SI-POM that was always driven toward a constant ratio, regardless of compute demand. Which implies that, as compute demand falls, servers are retired and other support equipment (power handling, cooling) also shuts down, keeping the efficiency ratio balanced.
Wednesday, November 7, 2007
He's been in his job for 18 months, and is just now getting his hands around turning the battleship. His shop, I might add, "owns one of every imaginable platform and software type" and has perhaps 3,000+ apps on 12,000-15,000 servers, of which maybe 30%-40% are development. He's got lots of AIX and lots of Sun, but ultimately a mix of other vendors too.
When asked exactly what he owns, he says he doesn't really know... but they're planning a CMDB project soon. Also, they're quickly running out of data center space, and are pushing 95% of maximum UPS power in most locations. He's thrown-down the gauntlet and halted all new server purchases -- in favor of initiating a virtualization project (which, I might add, is getting upwards of 20:1 consolidation, although he knows that high ratio won't last). He's a risk-taker because he has to be.
So I asked him point-blank: what does he need to make this work? Without a flinch (or a smile) he said "Process and Automation." Process, a la ITIL, and automation -- both of the Run-Book style and the operational style. "If I could have the automation vision that IBM was hawking a few years ago, I'd be thrilled. But it's still vapor."
The good news is that he's closely teamed with his Facilities manager to help him cope with power, real estate and cooling. The bad news is that the Facilities guy is also at wit's-end.
The punchline: this real-life vignette tells me that the traditional IT model is really broken. How come IT -- with all of its computers -- is actually the least automated and efficient arm of the company? I recently read a report from the Uptime Institute which talked about the Economic Meltdown of Moore's Law -- literally, for every $1 of compute asset, it currently costs $1.80 to operate it; by 2009, electricity alone will cost triple ($3) what the box cost. What's wrong with this picture?
I know that my VP friend is not alone. But when will the treadmill of IT-being-slave-to-the-hardware end? I'd like to think that automation, active asset management, and the drive toward greater environmental efficiency will begin to influence vendors and managers alike.
Saturday, November 3, 2007
Here's analogy #1: You're driving and come to a red light, you stop the car but the engine keeps running. It's wasteful and inefficient, but because it's generally considered too inconvenient to start & stop your engine every time you hit a red light, nobody does it. Enter the Prius: Come to a red light, and the engine automatically stops; hit the accelerator, and it starts again. Simple. Automatic. Efficient.
That's the analogy Cassatt is bringing to servers -- if they sit idle, even for an hour a day, they're automatically shut off and re-started when they're needed. For production environments, this might only apply to a few scale-out architectures that are provisioned for busy times-of-day, but for Development/Test, there are *always* machines that go unused for periods of time. Cassatt's Active Power Management takes care of this automatically. Simple. Automatic. Efficient.
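To sketch what such a "red light" policy looks like (a minimal illustration of the idea only -- the Server class and shutdown hook below are hypothetical, not Cassatt's actual implementation or API):

```python
import time

# Minimal sketch of a "red light" power policy: power off servers idle
# longer than a threshold, wake them when work arrives. Everything here
# is a hypothetical illustration of the concept.

IDLE_LIMIT_S = 3600  # an hour of idleness before power-off

class Server:
    def __init__(self, name):
        self.name = name
        self.powered = True
        self.last_busy = time.time()

    def idle_seconds(self):
        return time.time() - self.last_busy

def power_policy(servers):
    """Power off (and report) every server idle past the limit."""
    victims = [s for s in servers if s.powered and s.idle_seconds() > IDLE_LIMIT_S]
    for s in victims:
        s.powered = False  # a real system would gracefully stop apps first
    return [s.name for s in victims]

# Example: one busy box, one that went idle two hours ago
web, dev = Server("web-1"), Server("dev-7")
dev.last_busy -= 7200
print(power_policy([web, dev]))  # -> ['dev-7']
```

The hard part, of course, isn't the loop -- it's the graceful application shutdown and reliable restart around the power cycle.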
Don't just believe me. On Aug. 2, the EPA published a Report to Congress on Server and Data Center Efficiency. The report states that implementing "Power Management on 100% of applicable servers" is a core aspect of the "improved operation" of US data centers.
Oh - and here's analogy #2: (and believe-it-or-not, it's from Detroit as well as Japan): It's called Cylinder Shutdown. Turns out that when you literally don't need all the engine's horsepower, cylinders within the engine are dynamically shut down. Check out the future Northstar XV12 Caddy engine, as well as the engine in the 2008 Honda Accord.
Turns out, Cassatt technology can do this with IT Servers/blades as well! If you have a farm of servers and a few are sitting idle, they're turned off and kept as "bare metal" until some application needs their horsepower. Then they're dynamically re-purposed for whatever application is needed. That's the ultimate in capital efficiency.
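Here's a back-of-envelope sketch of the cylinder-shutdown idea applied to a server farm; the per-server capacity and headroom numbers are assumptions purely for illustration:

```python
import math

# Sketch of "cylinder shutdown" for a server farm: keep powered only the
# servers current demand requires, hold the rest as bare metal ready for
# re-purposing. The capacity model and numbers are illustrative assumptions.

def servers_required(demand_rps, per_server_rps=500, headroom=1.25):
    """Servers needed for current demand, with some safety headroom."""
    return max(1, math.ceil(demand_rps * headroom / per_server_rps))

def right_size(farm_size, demand_rps):
    """Return (servers powered, servers held as bare metal)."""
    needed = min(farm_size, servers_required(demand_rps))
    return needed, farm_size - needed

for demand in (200, 2000, 8000):
    on, off = right_size(20, demand)
    print(f"demand={demand} rps -> {on} powered, {off} bare metal")
```

Like cylinder shutdown, the farm runs on exactly the "cylinders" the load needs, and the rest stop burning fuel.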
Can this really work? With customers we've spoken to -- some with development environments pushing 4,000 servers -- actively controlling server power & repurposing can save nearly 50% (that's fifty) of operational costs.
Think of all the cars idling at this very moment, and the amount of fuel they're burning. Now, think of all of the servers in your data centers & labs just sitting there waiting to do something. And think of all the Watts they're chewing.
Wednesday, October 17, 2007
Bob Brown, TelaData's CEO, offered that the theme of "convergence" applied to 3 areas:
- Technology Convergence, as it applies to voice, video & data all converging on IP-based standards, and what the implications are for data center build-out, power, cabling, etc.
- Organization Convergence - the need/requirement that IT and Facilities cooperate and drive new efficiencies; without this cooperation, breakthrough efficiencies and cost reductions just aren't possible
- Automation convergence, e.g. building/facilities automation standards (like BACnet) interoperating with IT automation (power control, distribution)
- Google made it clear that all employees are encouraged to look at Total Cost of Ownership for every project they pursue; they encourage tradeoffs from everyone, esp. between Facilities, operations and IT. It's a numbers game, which benefits the company overall.
- Cisco operates huge data centers as well - they're also numerically driven, and seem focused on deriving metrics and standards around energy use before they implement new programs/policies
- Sun also drove home the need for IT & Facilities to interact (Dean brought his Facilities counterpart along) and really emphasized that one of the massive benefits of efficient IT is to "give back" real estate to the company. Real estate is the 2nd largest cost to a company (next to payroll) and this has made a huge impact on Sun's margin and bottom line.
What was clear from conversations with folks who attended was that there is still a rift between facilities managers and IT. For there to be any meaningful progress in data center efficiency, there have to be shared corporate and economic goals for both. And that has to start at the CFO level (for example).
Sunday, October 14, 2007
- RightScale provides a platform and consulting services that enable companies to create scalable web solutions running on Amazon Web Services (AWS) that are reliable, easy to manage, and cost less. The RightScale dashboard saves time in maintaining, managing and monitoring all AWS activities, while RightGrid coordinates the auto-scaling of servers according to usage load. The RightImage library provides pre-built installation templates for common software stacks, and RightScale DeltaSets make it easy to customize and manage modifications to machine images. Together with Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3) and Amazon Simple Queue Service (SQS), RightScale enables a next-generation platform for deploying highly scalable web applications.
- FlexiScale's claims:
- Provisioning & Scalability: Additional servers can be launched and load-balanced in <1 minute
- Flexibility: OS agnostic - we support MS Server and all common versions of Linux; clone a server image and re-use it for another test or production server; policy-engine-based load-balancing between physical servers
- Self-service via Control Panel or API: provisioning of Virtual Dedicated Servers; start, stop and delete Virtual Dedicated Servers; resize memory and storage
- Quality of Service: fully monitored system - network, storage and servers; fully automated HW recovery; flexible snapshot-based backups; secure - each customer has their own VLAN and their own virtual disks
- Pricing: no subscription fees and no minimum-term contract; simple-to-understand utility pay-as-you-go pricing model with no catches; billing module that lets customers see transparently what resources they have been using
Tuesday, October 9, 2007
But this just accelerates my prediction that the concept of "cloud computing" will quickly mature. Folks are beta-testing Amazon's (somewhat fault-prone) EC2 and S3, upon which anything from components (e.g. queuing services) to entire website storage can be hosted.
So now, students will be educated in this form of programming, and go into industry with a new level of comfort with this paradigm. This will surely turn the hosting market on its ear in 1-2 years.
Monday, September 24, 2007
The obstacles they found to achieving such efficiencies include:
- 40% - lack of encouragement from top management
- 36% - widespread unawareness of the cost/benefit relationship of energy efficiency
- 35% - enterprises not wanting to risk reliability
- 33% - lack of communication between IT and facilities departments
There were some other interesting statistics on energy consumption in the data center:
60% of the data center electrical load is used to power IT equipment:
- 56% of that being used to power servers,
- 27% for storage
- 19% for network equipment
- 41% of survey respondents said their data center electrical usage is not metered separately from the rest of their facilities.
- 81% of operators believe that by 2012 they will need additional data center capacity, despite the fact that 64 percent have built or upgraded their data center in the last five years.
- 27% of respondents believe that despite consolidation and the use of virtualization, their server inventory will increase throughout the next five years.
Friday, September 21, 2007
One of the best places to begin using automation to optimize IT resource consumption is Active Power Management, i.e. applying policy and software-aware power control, all in a platform-agnostic power optimization scheme. Why ARE your idle machines on if they're not being used?
BTW, there are some other great technology-related BearingPoint Podcasts here too.
Monday, September 3, 2007
Even the EPA missed this one in their recent Report on Server and Data Center Energy Efficiency.
It's Active Power Management: That is, safely and intelligently powering-down unused and/or idle servers, and re-powering them when needed. It's a huge (and obvious) move when you consider that the average server burns-up more than half of its fully-loaded rated power when it's just sitting doing nothing.
Reams have already been written about energy-efficient servers, DC power distribution, improved cooling systems, hot/cold aisles, and of course server consolidation. But they've all missed-the-boat -- until today. And the solution is surprisingly simple.
Regardless of the type of server and type of software, this technology is non-disruptive to the data center. It's also the moral-equivalent to turning off the lights in a room that you're not using. Think of the Thousands of servers that sit idle most of the time in Development/Test, or in a "warm" failover facility.
What's the secret sauce?
- It's the ability to gracefully shut-down software prior to turning off the box, and then ensuring re-start when the box is needed again.
- It's the ability to set policies around application importance and interrelationships, and to be able to communicate directly with the software during a power-cycle.
- It's the ability to do all of this from a hardware- and software-neutral perspective.
- And, it's the ability to do this in a way that will please both IT and facilities.
All made possible with Cassatt's leading ability to apply sophisticated optimization technology to any problem in the data center. And don't just believe me. Look at what PG&E, Brocade, and IDC are saying. This should help raise-the-bar for energy-efficiency best-practices in the data center.
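To picture the policy piece, here's a hypothetical sketch of how application importance and interrelationships might drive a safe shutdown order (the service names and policy shape are my own illustration, not Cassatt's product):

```python
# Sketch of the kind of policy the list above implies: application
# importance and interdependencies drive a safe shutdown order.
# Service names and the policy shape are hypothetical illustrations.

POLICY = {
    "billing-app":  {"importance": "high", "depends_on": ["db-tier"]},
    "db-tier":      {"importance": "high", "depends_on": []},
    "nightly-test": {"importance": "low",  "depends_on": []},
    "dev-sandbox":  {"importance": "low",  "depends_on": ["nightly-test"]},
}

def shutdown_order(policy, importance="low"):
    """Order low-importance services so dependents stop before what they depend on."""
    eligible = {n for n, p in policy.items() if p["importance"] == importance}
    order, seen = [], set()

    def visit(name):
        if name in seen or name not in eligible:
            return
        seen.add(name)
        for other, p in policy.items():  # stop dependents of `name` first
            if name in p["depends_on"]:
                visit(other)
        order.append(name)

    for name in policy:
        visit(name)
    return order

print(shutdown_order(POLICY))  # dev-sandbox stops before nightly-test
```

High-importance services are never touched; low-importance ones come down in dependency-safe order before their boxes are powered off.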
Friday, August 17, 2007
Along with BearingPoint, I'll be discussing (yes, live and in-person!) what we mean by utility computing, and how the technologies that make "utility computing" possible are available today.
It's more than virtualization. It's about intelligently pooling all HW resources you own today to radically cut operational and capital costs -- and attain a level of agility that current HW/SW models inherently block.
As I've said before, CIOs are doing this already, you just don't know it. Look at Amazon's EC2. Look at Google's infrastructure. Look at Sun's Grid system. It's possible to do with the IT infrastructure you have sitting in your data center today.
What could you do if your total operational cost basis (fully-loaded, everything) was $0.10 per instance-hour for all of your compute services?
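For a back-of-envelope sense of what $0.10 per instance-hour implies:

```python
# Back-of-envelope: what a fully-loaded $0.10 instance-hour implies per year.

rate_per_hour = 0.10
hours_per_year = 24 * 365  # 8,760 hours

cost_per_instance_year = rate_per_hour * hours_per_year
print(f"${cost_per_instance_year:,.0f} per instance-year")

# Scaled to a hypothetical 1,000-instance estate:
print(f"${cost_per_instance_year * 1000:,.0f}/year for 1,000 instances")
```

Under a million dollars a year, fully loaded, to run a thousand instances around the clock: that's the bar utility-style infrastructure sets.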
Bob Brown, CEO of TelaData, is the visionary behind this conference. He sees a massive convergence of technologies... technologies within the data center (i.e. the move toward IP-based video & audio) and the convergence of data center design itself (i.e. facilities, cabling, power management, etc.).
The two together have to be taken into consideration when designing new facilities. If you don't, then you risk mis-estimating compute, power, cabling and other layout requirements. And the $100+ million building you construct is obsolete before it's complete.
And these guys are the pros. While it's confidential (I think) they're advising some of the biggest data center users and web 2.0 companies in the business on data center construction.
Monday, August 6, 2007
I know that this is quite a provocative subject, but take a moment to consider where I'm going:
My thesis: CMDBs will be doomed either to (a) a short-lived existence as they sediment into other data center products, or (b) disappearing altogether as the industry finally realizes that utility computing (using generic hardware and standard stacks) obviates the need for an a la carte solution which tracks which-asset-is-where-and-doing-what-for-whom.
My evidence: Do you think that Amazon Web Services' EC2 compute "cloud" went out and purchased a commercial CMDB to manage their infrastructure and billing? Do you think Google maintains a central CMDB to track what department owns what machine? Isn't it odd that an umpteen-volume ITIL process ultimately relies on the existence of a conceptual CMDB? (In fact, doesn't it ring strange that such a "panacea" technology needs so many volumes of paper just to make it work?)
My logic: CMDBs are essentially a "band aid" for a larger (and growing) problem - complexity. They inherently do nothing to reduce the underlying complexity, configuration variances, or hand-crafted maintenance of the underlying infrastructure. In short, they are just another point-solution product that data center managers think will help them drive to a simpler lifestyle -- and they're dead wrong. Instead, they'll be buying another complexity layer - but this time, one that requires them to re-work process as well.
"But wait!" you say; CMDBs are needed because how else do you get your head around infrastructure variances? On what do you base configuration management? What do compliance systems use as a basis? Incident management processes have to "check in" somewhere, don't they?
Well, yes and no. By saying yes to most of the questions above, you're unconsciously complying with the status quo mindset of how data centers are architected and run. With layers of special-purpose tools, each supposedly simplifying the tasks-at-hand. But collectively, they themselves create complexity, redundancy, and the need for more tools like themselves. Every one of these tools maintains the assumption of continued complexity, configuration variances, and hand-crafted maintenance of underlying infrastructure.
So? BREAK THE MODEL!
My conclusion: What if the data center had an "operating system"? This would automatically pool, re-purpose and re-provision all types of physical servers, virtual machines, networking and storage infrastructure. It would optimize how these resources were applied and combined (even down to selecting the most power- and compute-efficient hardware). It would respond to failures by simply managing around them and re-provisioning alternate resources. It would react to disasters by selecting entirely different physical locations for compute loads. And all completely platform-agnostic.
Now - if this system existed (and, of course, it does), then why would you need a CMDB?
- The "database" and the "configuration-of-record" would already be known by the system -- present from the start, and constantly updated in real-time
- Any infrastructure variances would be known in real-time - or eliminated in real-time, as the system re-configured and optimized
- Configuration management, as we understand it today, would be obviated altogether. The system would be given a set of policies from which it would be allowed to choose only approved configurations (all standard, or not). The approved configurations would be constantly monitored and corrected if needed. There would be no "configuration drift" because there would be no human interactions directly with machines - only policies which consistently delivered upgrades, patches and/or roll-backs.
- Compliance (per above) would essentially be governed by policy as well. The system's internal database (and historic record) could be polled by any external system which wanted to ensure that compliance was enforced over time.
- Traditional incident management processes would essentially be a thing of the past, since most would be dealt with automatically. In essence, trouble tickets would be opened, diagnosed, corrected and closed automatically, and in a matter of seconds or minutes. Why then a massive ITIL encyclopedia to govern a non-existent human process?
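The policy-driven configuration idea in the list above can be sketched as a reconciliation loop (the config shape below is a hypothetical illustration):

```python
# Sketch of policy-driven configuration: machines are continuously
# reconciled against an approved configuration, so drift is corrected
# rather than tracked in a CMDB. The config shape is hypothetical.

APPROVED = {"os": "rhel4", "stack": "web-v12", "patch_level": 7}

def reconcile(machine_config, approved=APPROVED):
    """Return the corrections needed to bring a machine back to policy."""
    return {k: v for k, v in approved.items() if machine_config.get(k) != v}

# A machine that has drifted from the approved configuration:
drifted = {"os": "rhel4", "stack": "web-v11", "patch_level": 6}
print(reconcile(drifted))  # -> {'stack': 'web-v12', 'patch_level': 7}

# A compliant machine needs no corrections:
print(reconcile(dict(APPROVED)))  # -> {}
```

Run continuously by the system itself, a loop like this makes the "configuration-of-record" a live fact rather than a database to be maintained by hand.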
Say "Yes" to treating the data center at the system-level scale, not at the atomic scale.
Monday, July 23, 2007
Here's why: The combination of Mercury (performance management tools, IT governance and more), Peregrine (IT financial management, and more) and Opsware (IT automated provisioning & process automation) make for a powerful set of disruptive technologies -- that disrupt how data centers are "traditionally" monitored and managed. Those entrenched "traditional" products include IBM Tivoli, CA Unicenter, and HP's own OpenView.
Now, following the Innovator's Dilemma theory posited by Clay Christensen, large companies will be averse to adopting newer technologies that would naturally erode/cannibalize their existing technologies, ways of doing business, and cash flows.
Yet, here we have an example where Opsware (at least, would have) come gunning at OpenView with a completely new & disruptive approach to managing and monitoring data centers. But now, Opsware will be part of HP. So, will HP have the balls to manage Opsware to its natural disruptive technological end (and probably make a boatload of money), or will they find that it erodes the OpenView market and becomes a threat to their existing OpenView base?
Part of me says these guys are smart. I would have combined the same types of companies to create a new IT management platform for the future. (Oh - and prediction: watch this space to see who picks up the next set of disruptive companies like Netuitive and Splunk). However, the other part of me says that HP's data center automation message as of late has been singularly "Blades". This is simplistic, and surely driven by the marketing-program-du-jour.
There's one other chink in the execution of this deal: these acquisitions do nothing to advance HP's own hardware business -- and arguably, they push hardware down a level toward further commoditization. What effect will this have on HP hardware revenues?
My theory continues to hold that the next real player in the data center management space has to come from an independent, a non-equipment vendor. That leaves BMC, EMC, or CA. These guys are the only ones truly incented to create a revolutionary management platform that's truly platform-independent -- and therefore valuable.
Tuesday, June 12, 2007
The event opened with keynotes from Mike Chuba and Cameron Haight, followed by a great forward-looking talk on the future of infrastructure and operations from Tom Bittman. He clearly sees the period between 2008-2012 as a shift from "Silos" to shared IT "Pools" - as virtualization itself shifts from consolidation to higher value to the data center. He further predicted that starting around 2010, true Real Time Infrastructure will become mainstream (see the picture). This will be the enabler of IT-as-a-utility.
But he was careful to define growing distinctions between types of virtualization: at the application level, there are containers, zones, LPARs and VPARs; at the O/S level, we're seeing a number of mainstream VM technologies, including SW appliances; and at the hardware layer, we are seeing Grid and Infrastructure-as-a-Service (i.e. Amazon EC2).
Perhaps the most entertaining 'guest' keynote was from Peter Cochrane - ex-CIO for BT, and now a highly-regarded consultant. Brilliant, wry and witty, he opened by positing that IT's reality is having to deal with heterogeneity, mobility and the increasing availability of bandwidth. With that bandwidth (which will be exploited via increasing penetration of fiber, frequency-hopping and spatial distribution), the notion of connectivity to the "cloud" will be pervasive. And the concepts of "connectivity" and "communication" will begin to shift to concepts of "location" and "presence". The other theme he put forward was one of the release of control/centralization. He began with the fact that central control of broadcast bandwidth was shifting from a few thousand outlets (broadcast TV, radio, etc.) to billions of sources (phones, pervasive wifi, transmission "hopping", etc.). Release of control was also shifting from creativity in the office to creativity at home... very Web 2.0 -- oh, and he reiterated that the best definition of Web 2.0 was put forth by Tim O'Reilly a number of years ago (one of my favorite pieces).
One of the most well-attended break-out sessions on day 1 was run by Ed Holub & Debra Curtis: "Running IT Like a Business" - putting forth that IT has to think of itself not as a cost center, but as a business unit with customer management, product management, marketing, financial controls and... yes... pricing. This of course requires that IT figure out how to identify costs and provide charge-back. And finally, it requires that IT be comfortable with losing business to competitors, i.e. outsourcers. More than ever, running IT-as-a-utility to achieve such efficiencies seems necessary here.
Another series of sessions dealt with data center power management. It was clear that the current way of running data centers will essentially run out of electrical capacity in the future - so talk was not only about server efficiency, but about cooling efficiency and prudent facilities design as well. One particularly interesting breakout session, hosted by Will Cappelli, was "The Convergence of Operations and Energy Management". The observations here were huge: companies w/large data centers will come face-to-face with international & domestic carbon emissions regulations; IT and facilities orgs will be required to work together to increase overall energy efficiency; and IT energy & power consumption will have to be managed and intelligently optimized (on this, see my previous blog on turning off idled servers).
Clearly, there was tons of other content regarding IT operations management, process (i.e. ITIL), discussion around CMDBs, virtualization and more. Way too much to summarize. But stay tuned, as I may comment on some of these from time to time :)
Tuesday, June 5, 2007
The Question: We posed the following question to the audience -- Why keep servers turned on when they are not being used? This is especially impactful when you consider a recent APC white paper indicating that the average server consumes about 50% of its loaded power, even when sitting idle. The analogy is to lighting in modern office buildings, where motion sensors only turn lights on for occupied rooms... and when a room is deemed idle and unoccupied, lighting is turned off. Why couldn't the same analogy apply to the over-provisioned servers in a data center - during peak times, as well as to nights/weekends? Clearly the problem is not as simplistic as light switches, but why isn't there a solution?
The Demo: So, with 3 video projectors blazing, two live data centers on-line, and the Collage Software in control, we set out to prove to the audience that power management was not only possible, but that it would save money as well. The scenario illustrated that an external trigger (say, a "curtailment event" from a local utility) could cause Collage to apply policies to power-down low-priority servers (according to their power consumption and/or efficiency) and even migrate-away their compute loads to another data center where power was cheaper. Obviously, the same could be done on a scheduled basis as well. Well, the demo was a success, even down to watching real-time power consumption curves dip-and-settle.
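To make the scenario concrete, here is a hypothetical sketch of that kind of curtailment-event policy: power down low-priority servers, least-efficient first, until the requested load reduction is met. The server names, priorities, wattages and the selection logic below are all my own invented illustration - this is not Cassatt's actual Collage API.

```python
# Hypothetical sketch of a curtailment-event policy: on a utility
# "demand response" signal, select low-priority servers to power off,
# shedding the least power-efficient ones first. All names and
# numbers here are invented for illustration.

def handle_curtailment(servers, watts_to_shed):
    """Pick low-priority servers to power off until the target is met."""
    candidates = [s for s in servers if s["priority"] == "low"]
    # Shed the least efficient servers (most watts per unit of work) first.
    candidates.sort(key=lambda s: s["watts"] / s["work_units"], reverse=True)
    shed, chosen = 0, []
    for s in candidates:
        if shed >= watts_to_shed:
            break
        chosen.append(s["name"])
        shed += s["watts"]
    return chosen, shed

servers = [
    {"name": "web-01", "priority": "low",  "watts": 400, "work_units": 10},
    {"name": "web-02", "priority": "low",  "watts": 300, "work_units": 12},
    {"name": "db-01",  "priority": "high", "watts": 500, "work_units": 20},
]

chosen, shed = handle_curtailment(servers, watts_to_shed=350)
print(chosen, shed)  # web-01 is least efficient (40 W/unit), so it goes first
```

A real implementation would then migrate the displaced workloads (or schedule the shutdowns for nights/weekends), but the selection logic is the interesting part.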
Not coincidentally, it turns out that power companies (like PG&E here in Sunny CA) offer incentives for shifting power to off-peak times, as well as special demand-response programs and incentives for firms that react to electrical demand during "events" by additional short-term reductions in power use (like turning off lights & HVAC). Our initial conversations with PG&E (who was also in the room!) showed that they were eager to pursue this type of approach, and confirmed that Cassatt was the first to tackle this problem!
The Numbers: Following the demonstration, we also ran some conservative financial numbers. They showed that a company with 500 typical servers could regularly schedule shut-downs of idle equipment -- during peak periods as well as nights/weekends -- and save 20%+ of its total energy costs! These numbers are significantly encouraging regarding the economics of this approach.
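The back-of-envelope math is easy to reproduce. A rough sketch follows - the per-server wattage, electricity rate, and idle-hours fraction are my own illustrative assumptions, not Cassatt's actual figures; only the "idle draw is ~50% of loaded power" input comes from the APC paper cited above:

```python
# Back-of-envelope: savings from powering off idle servers.
# All inputs are illustrative assumptions, not measured data.

SERVERS = 500
WATTS_PER_SERVER = 400          # assumed loaded draw per server
IDLE_POWER_FRACTION = 0.5       # APC: idle draw ~50% of loaded power
IDLE_HOURS_FRACTION = 0.4       # share of hours a server sits idle (assumed)
RATE_PER_KWH = 0.10             # $/kWh (assumed)
HOURS_PER_YEAR = 24 * 365

# Annual cost if everything stays on at full draw:
baseline = SERVERS * WATTS_PER_SERVER / 1000 * HOURS_PER_YEAR * RATE_PER_KWH

# Energy avoided by powering servers fully off during idle hours
# (they would otherwise burn ~50% of loaded power doing nothing):
saved = (SERVERS * WATTS_PER_SERVER / 1000 * IDLE_POWER_FRACTION
         * HOURS_PER_YEAR * IDLE_HOURS_FRACTION * RATE_PER_KWH)

print(f"baseline ${baseline:,.0f}/yr, saved ${saved:,.0f}/yr "
      f"({saved / baseline:.0%})")
# → baseline $175,200/yr, saved $35,040/yr (20%)
```

With those (deliberately conservative) assumptions, the savings land right at the 20% figure - and the result scales linearly with the idle-hours fraction, so aggressive nights/weekends scheduling only improves it.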
The Punchline: The takeaway here is that medium-to-large data centers can save a bunch of $, be "green", and do so without having to change any hardware or software! So ask yourself -- as you pursue installation of energy-efficient IT *equipment*, why are you not also pursuing energy-efficient *operation* of that equipment?
My Related Blogs:
4/6/07 - D'oh: Turning Off Idle Servers
1/22/07 - Clothes Driers, Data Centers, and Power Management
Thursday, May 31, 2007
XenSource, you will recall, is the current underdog VM provider in the market - a pure-play, high-performance approach to application virtualization. Part of their competitive strategy is (a) how they price (zero if you choose the open-source version, but way lower than the competition if you buy from XenSource), and (b) how they distribute (bundled in with the major Linux distros, not to mention Solaris).
The webcast has two heavy-hitters: Simon Crosby (who's XenSource's CTO), and Rob Gingell (a past Sun Fellow and VP -- now Cassatt's CTO). It should be interesting to hear their take on automating virtualization, given the market noise around XenSource - but also given the fact that XenSource itself doesn't offer a sophisticated VM management/automation solution... but Cassatt does.
Given I have an 'inside track' on this, I suspect that the conversation will also turn to "what comes next" after virtualization - probably a pretty valuable conversation if you care about your IT career a year from now.
Sunday, May 20, 2007
If there was any theme during the first two days, it was a conceptual shift from "IT" (information technology) to "BT" (business technology) - keynoted by their CEO, George Colony. During one of the keynotes, the oft-repeated adage summed it up: "There are no IT projects anymore, just business projects."
To that end, there was a litany of guest keynoters, notably Jeanne Ross of MIT Sloan, Robert Willett, CEO of Best Buy, and others. Each of their presentations went down a relatively conceptual path of assessing organizational agility, business-readiness, and alignment, with somewhat Dilbert-esque abstractions trying to tie their talks to the concept of "BT"... I say this only because the audience was less one of MBAs, and more one of operations executives looking for tactical trends and pointers.
However, the best talk IMHO came from Robert Beauchamp, CEO of BMC Software. He's a very down-to-earth, articulate guy - even in front of 1,000 people. I was most impressed by his shoemaker's-children analogy... that the IT (alright, BT) organizations in enterprises are arguably the least automated departments around. ERP is automated. Finance is automated. Customer interaction is automated. But IT is still manually glued together, with operations costs continuing to outpace capital investments. He showed the chart here at right (from IDC!) which drives the point home.
However, I was rather impressed with the analysts we spoke with 1:1. Each is closely tracking the IT automation trend, and how virtualization is playing an initiating role in the emerging IT utility. Also, most notably, I bumped into an old friend from Sun, James Staten, who was just brought on at Forrester to follow trends among the mega data centers, such as their economics and their use of automation.
Saturday, April 28, 2007
This is a perfect commercial example of the server-less IT I spoke about last week. And it's proof that the economics of utility computing are compelling. MacAskill estimated that he's saving about $500,000 annually by not buying and managing his own storage (he computes the number in his blog). And he expects that number to increase. Amazon has taken the traditional approach to managing storage (S3) and computing (EC2) and applied a utility automation paradigm -- enabling a completely new cost model. How else could they offer such pricing to users?
What's this going to enable in the future? Read on: MacAskill's other blog post: "Amazon + 2 Guys = The Next YouTube" (aka the server-less web service!)
I gotta keep wondering: When is corporate IT going to catch onto this utility computing approach, and make "compute clouds" out of their own stuff?
Sunday, April 22, 2007
To wit: carriers like Verizon have announced intentions to do so, and Salesforce.com recently announced its ability to host more than just CRM applications. But what will really signal the shift toward "compute cloud" use will be the third-party vendors that make use of these resources.
So here's my prediction: as the infrastructure vendors build out their compute and storage farms, a new class of computing "brokers" will emerge. These players will adapt the needs of users and IT departments to make seamless use of these compute and storage "clouds" -- everything from backing up your laptop for pennies a GB, to hosting and failover services that don't own a single server.
And here's proof it's happening, with "mashups" of the following just around the corner:
- JungleDisk: offering a simple windows interface to allow individuals to create a "web drive" onto Amazon's S3 storage
- Weoceo: offering a product that allows existing servers to "overflow" peak computing needs onto Amazon's EC2 cloud
- Enomalism: providing services to provision and migrate virtual "elastic" servers, even onto and off-of the Amazon EC2 cloud
- Elasticlive: which essentially provides virtual hosting services - as predicted - (and works with Enomalism, above). Plus, they charge by the "instance-hour", not by the server type!
- Geoelastic: a beta group of "global hosting providers" who will be creating a "global elastic computing cloud" and presumably balancing loads between physical centers.
- Distributed Potential: beginning to deliver pay-per-use grid computing capacity (powered by Elasticlive and Enomalism technologies, above)
- Distributed Exchange: Also powered by (and presumably founded by) ElasticLive and Enomalism; claiming to "broker" excess compute capacity between providers
- Dozens of 3rd-parties creating even more applications on S3
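As a toy illustration of what such a "broker" would actually do, here is a sketch that picks the cheapest cloud offering meeting a user's capacity and latency constraints. The providers, prices, and constraints are entirely invented - no real vendor's pricing is represented:

```python
# Toy sketch of the "compute/storage broker" idea: given several
# cloud offerings, pick the cheapest one that meets the user's
# requirements. Providers and prices below are invented.

def cheapest_offer(offers, gb_needed, max_latency_ms):
    """Return the viable offer with the lowest total cost, or None."""
    viable = [o for o in offers
              if o["max_gb"] >= gb_needed and o["latency_ms"] <= max_latency_ms]
    if not viable:
        return None
    return min(viable, key=lambda o: o["price_per_gb"] * gb_needed)

offers = [
    {"name": "cloud-a", "price_per_gb": 0.15, "max_gb": 1000, "latency_ms": 120},
    {"name": "cloud-b", "price_per_gb": 0.10, "max_gb": 500,  "latency_ms": 80},
    {"name": "cloud-c", "price_per_gb": 0.05, "max_gb": 200,  "latency_ms": 300},
]

best = cheapest_offer(offers, gb_needed=300, max_latency_ms=150)
print(best["name"])  # cloud-b: cloud-c is cheaper per GB, but too small/slow
```

The interesting part is that the broker, not the user, absorbs the heterogeneity of the underlying "clouds" - which is exactly the value the companies listed above are chasing.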
Lastly, from a somewhat self-serving perspective, Cassatt essentially creates a "cloud" out of existing resources within corporate IT. At that point, shifting loads between "clouds" (internal or external) becomes a simple policy-based procedure.
Thursday, April 19, 2007
Right off, he started by observing that "virtualization" isn't just one thing (consider: hypervisors, zones, containers, LPARs, network VLANs and virtualized storage). We also quickly observed that virtualization probably isn't an end-game in itself for IT. Rather, it represents the most critical enabler that will ignite transformation in the IT industry.
That transformation represents a new way to look at managing IT: Today, we have specialized hardware, software, HA/failover software, monitoring & performance analysis systems, and dozens more. Tomorrow, the transformation will look like managing all of these systems holistically, much the way an Operating System manages components within a server. The automation will be technology agnostic, made possible through virtualization. A number of Dan's earlier interviews all point to this inevitability as well.
He had a bunch of great observations, but the last I liked best: "It's important to take the broadest possible view and avoid point solutions. From this vantage point, a failure of some resource must be handled in the same way as any other condition that causes the configuration to no longer meet service level objectives."
For me, the takeaway from the conversation was something I've said before: take the "long view" on implementing virtualization... it may yield you quick HW savings today, but if it's automated in an "IT-as-utility" context, its future savings will dwarf what the industry is seeing now.
Friday, April 6, 2007
Here's what I mean: there are times when server use is low - like during weekends or during the evening. There are also "events" (like the power emergencies we tend to get here in California) where you'd like to minimize power use when your electrical utility tells you to.
So I'm thinking - why shouldn't data centers respond to electrical cost/availability/demand the same way they respond to compute availability/demand? When "events" happen, we turn off the office lights, right?
It turns out that power companies (like PG&E here in Sunny CA) have "traditional" programs to encourage energy efficiency (like rebates for efficient light bulbs, and even efficient servers). But they also have special demand-response programs and incentives for firms that react to electrical demand during "events" by additional short-term reductions in power use (like turning off lights & AC).
Couple that with server automation software, and you've got a combination that's pretty neat: data centers that can do things like turn off low-priority servers, or perhaps move critical applications to other data centers during power events. Cassatt's identified several interesting scenarios:
- Low-priority servers automatically powered-off during power "emergencies"
- Standby servers that remain powered-off ("cold") until needed
- "Follow-the-moon" policies where compute loads are moved to geographies with the least-expensive power
- Policies/direction to use the most power-efficient servers first
- "Dynamic" consolidation, where virtual machines are constantly moved to achieve a "best-fit" to maintain utilization levels (minimizing powered-up servers)
If building operators can automatically turn off non-critical lights and HVAC systems during electrical emergencies, then why don't data centers?
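The last scenario above - "dynamic" consolidation - is essentially a bin-packing problem: fit the VM loads onto as few servers as possible so the rest can be powered off. A minimal sketch using the classic first-fit-decreasing heuristic follows; the VM loads and server capacity are illustrative assumptions, not any product's actual algorithm:

```python
# Sketch of "dynamic consolidation": pack VM loads onto as few
# servers as possible (first-fit-decreasing heuristic), so the
# remaining servers can be powered off. Numbers are illustrative.

def consolidate(vm_loads, server_capacity):
    """Return per-server load lists; len(result) = servers left on."""
    servers = []
    for load in sorted(vm_loads, reverse=True):
        for s in servers:
            if sum(s) + load <= server_capacity:
                s.append(load)  # fits on an already-powered server
                break
        else:
            servers.append([load])  # must power up one more server
    return servers

# Eight VMs at various CPU loads, onto servers with capacity 100:
placement = consolidate([60, 50, 40, 30, 20, 20, 10, 10], server_capacity=100)
print(len(placement))  # → 3 servers stay on; the rest can be powered off
```

Run continuously as loads change, this is what keeps the minimum number of servers powered up - the software equivalent of the motion-sensing light switch.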
Sunday, February 25, 2007
Well, I got tagged (not the graffiti type) by Steve Wilson. Now it’s my turn to reveal 5 things-you-don't-know-about-me, and then tag 5 more unsuspecting-yet-interesting folks… But first, I had to do a bit of the pedigree/ancestry tracing to see where this tagging all began:
Steve first got tagged by Rich Green (a former colleague); Sin-Yaw Wang tagged Rich; Hal Stern tagged Sin-Yaw; Mary Cay Kosten tagged Hal. Then the trail went cold... Mary Cay’s original tag was from behind Sun’s firewall, so this was a dead-end.
So I took another tack from someone else I know: Jonathan Schwartz was tagged by James Governor (of RedMonk, a cool analyst group I’ve worked with in the past); James was tagged by Jeff Pulver. Ahh. Jeff points to the "root" of blog-tag pedigree, 3 orders-of-magnitude better/bigger than mine, residing at Solo SEO... clearly indicating that some folks have way too much time.
Anyway, here goes the whole point of the thing:
- I once sailed 1,300 miles from Myrtle Beach, SC to Tortola, BVI – with no electronics onboard except a UHF radio and a Timex digital watch. There were 3 of us onboard for 10 days (fortunately, two of us knew how to use a sextant). It was one of the most memorable times of my life, being at the mercy of the elements, but having ‘science’ in our back pocket. We made landfall within a mile or so of our target… You gotta read the book “Longitude” to appreciate how important this form of navigation was.
- I was a product of the Reagan defense-spending era – my first job (just out of engineering school) was working for a defense contractor designing and building hardware-based real-time adaptive optics and wavefront correction systems. Next time you hear about DoD blowing planes out of the sky with lasers, think of me. (I think that’s all I’m allowed to say about that)
- I’d rather be renovating a bathroom, laying tile, or for that matter, building a house. There’s something intensely gratifying about building something permanent/durable.
- I wrote a program (circa 1979) in Basic on a Commodore PET with 16k of RAM that played Solitaire. Yes, just 16k. When I finished, there wasn’t enough space to actually execute it. This was perhaps my finest (and final) foray into software. Unless you consider writing FORTRAN batch jobs using punch-cards on a Sperry/Univac.
- I started building a historically-accurate plank-on-frame scale model of Lord Nelson’s flagship at Trafalgar, HMS Victory, back when I was in high school. I put 18 months into it and got as far as building the hull. Then life intervened. Once the kid(s) are out of school, I burn out of High Tech, and I get over that “I’ve gotta build a house” thing, the HMS Victory is how I plan to occupy my remaining time in the Rest Home.
Sunday, February 18, 2007
However, the Truth is that virtualization is multi-vendor, multi-technology, and inherently heterogeneous. So: How will these diverse technologies be managed in the future? Here are some observations:
- People are confusing ‘Virtualization’ with VMware (but remember, there are more types of virtualization than just hypervisors for software!)
- VMware’s scope is limited to X86 Platforms. Most organizations have more than X86.
- There are other types of Virtualization within other platforms (Example: Mainframe LPARs, Solaris containers/Zones, HP VPARs).
- There are even Virtualization alternatives in X86 (XenSource, Xen/RHEL, Xen/SuSE, and SWSoft Virtuozzo and others).
- There are different types of Virtualization (JVM, VLANs in the network domain, SAN & NAS in the storage domain, not to mention Incipient and 3Par).
- VMM Virtualization is an OS feature and its price will be commoditized to $0 over time. Evidence:
- Historically LPARs came with MVS
- Sun does not charge extra for Containers/Zones
- IBM AIX & HP-UX don’t charge extra either
- JVM’s are free.
- VLAN-ing comes built-in into Switch Firmware
- You can get NAS for free, or pay for a specially-tuned version in proprietary hardware (NetApp)
- Watch the SAN space, too - it will be interesting to see where that goes with iSCSI, 10Gig Ethernet, etc.
- A comment on X86 VMM-pricing:
- Red Hat will deliver Xen for free in RH5
- SuSE will deliver Xen for free in SuSE 10.x
- Intel supports it for free in Intel-VT chips
- AMD supports it for free in their Pacifica chips
- XenSource costs 25% of VMware and will eventually be acquired by somebody who will give it away for free as part of something else
This is the next impending management crisis - management, automation, and optimization of virtualized computation, containers, storage, and networking.
Wednesday, February 7, 2007
John Humphreys (Enterprise Computing) opened with some riveting statistics:
- 62% of VM users are looking for a “unified tool”
- 45% of servers planned for installation next year will be virtualized
- 23% is the average savings being reaped from HW, power and facilities
- 70% of IT costs still reside in operations... not in hardware or software.
In addition, he had the foresight to refer to VMs as “the new atomic unit of management.” Hmmm. Right up Cassatt’s alley.
Finally, under “challenges”, one of the big bullets was “How can you consolidate/manage across the DMZ?”... which I found interesting. True, it’s a growing issue, but frankly, with automated network configuration, Collage already manages virtual (and physical) resources across a number of virtual networks.
We also spoke 1:1 with analysts Matt Eastwood (VP,
- Managing across networks (as above)
- Parameterized & “mass-produced” provisioning of VMs
- Managing a virtual enterprise across geographies
- Justifying economics beyond hardware savings
Besides the better-known technologies (i.e. VMware and Xen) there were also some interesting virtualization options:
- Trigence: which has an interesting “encapsulation” technology; they don’t use a hypervisor, per se, but rather encapsulate an application, plus all relevant files/libraries, etc. so it’s completely portable
- SWsoft: which has a unique virtualization approach which, if you only care about one OS, gives you high performance and a huge degree of consolidation
- HP and IBM: both hyping their versions of self-managing blade systems.
- IBM has also announced its Secure Hypervisor (sHype) technology, which may be incorporated into 3rd-party hypervisors.
From a purely selfish perspective, Cassatt is pretty well-positioned to help manage/automate an upcoming need: as the Virtualization market matures, more datacenters will need a vendor-neutral way of managing across virtual and physical domains, pooling resources, and guaranteeing service levels.