Wednesday, December 24, 2008

Q&A with Egenera's CMO

Jeremy Geelan of Virtualization Journal had a nice pre-holiday interview w/my boss, Christine Crandell of Egenera.

For some time, Egenera's been known in the somewhat rarefied circles of high-performance, mission-critical computing for Wall Street, Government & Service Providers. We've recently unbundled our infrastructure orchestration & management software, and are now shipping it on Dell hardware as well. Because the software is infrastructure, I/O and network centric, it's highly complementary to any form of virtualization:
Customers will begin to demand tools that manage not just virtual machines, but that integrate the management of all forms of virtualization - OS, I/O, network and storage - and do so across multi-vendor technologies. When coupled with network and storage virtualization, virtual machines offer sophisticated forms of high availability, disaster recovery, and service migration across all forms of platforms. When managed together, these new levels of abstraction will enable true utility computing, "cloud" style elastic compute services, and forms of IT services that are fully driven by business priorities rather than technical convenience.
The elegant aspect to this form of management is that it's transparent to the form of virtual or physical applications it supports. Wouldn't it be nice to have a unified infrastructure management system (yes, with HA and DR) across all physical & virtual environments?

Tuesday, December 16, 2008

Five 9's of pooled resources with standard hardware!

Today Egenera & Dell announced the start of shipping their Dell / PAN system. I believe this marks a new strategic direction for the company, and hopefully, a new set of infrastructure management options for mission-critical users of any kind of physical/virtual environment.

Egenera has been best known for combining its high-performance BladeFrame hardware with its PAN Manager (Processing Area Network) software. This duo has historically been used by hundreds of customers to create very high-performance, highly reliable, and instantly reconfigurable compute environments. The system essentially virtualizes and orchestrates pools of servers, networks (including I/O) and SANs to create scalable & flexible assets. But let me be clear: this is an infrastructure play, not a virtualization play. The technology works on servers whether or not virtualization is present. More on that later.

What's nifty about today's announcement (the deal was originally announced back in May) is that it's the first time the PAN software is being delivered (actually, OEM'd) on third-party equipment, specifically Dell PowerEdge servers. That means that if you're building a mission-critical environment, or one that's being repurposed frequently, or one that's mixed physical/virtual, the Dell / Egenera System can support it all on standard Dell hardware.
"The Dell / PAN System by Egenera is a highly available and flexible computing platform that eliminates the need to dedicate servers to applications. Instead, the Dell / PAN System creates a processing area network (PAN) that connects and centrally manages multiple Dell PowerEdge servers together with standard network and storage resources. The Dell / PAN System delivers rapid server provisioning and re-deployment in minutes, plus high availability and site recovery at lower total cost than competitive offerings. The fully-integrated solution enables customers to measurably simplify operations by creating a single resource pool and management tool for both physical and virtual servers, resulting in rapid response to organizational changes and heightened business agility."
Egenera & Dell's "big bets" on the market here are clear:
  • More companies are going to want to buy standard, off-the-shelf servers, especially as the economy slows
  • More IT staffs will find they have mixed physical and virtual environments, and at least two distinct management systems for them. Somehow this will need unification
  • More IT Operations groups will find it inevitable that they will have 2 or more virtualization technologies, and therefore will need a unified approach to HA, DR, network and storage management to support these systems as well.
In general, this form of infrastructure management is highly complementary (and mostly transparent) to users of virtualization. And *I'm* betting that we see more of it in use.

Monday, December 15, 2008

Virtualization’s definition broadens, and so do management technologies

VMBlog's David Marshall has begun a series titled "Prediction 2009: The future of Virtualization" and has been polling industry representatives on their perspectives.

In my contribution to the series, I argue that during 2009 we will see the market for virtualization finally evolve. It will expand from the current myopic perspective of hardware virtualization to include the realizations that:
  • There are many types of hardware and OS virtualization, each appropriate for different uses and environments
  • For true flexibility, IT operations will also need to leverage virtualization of I/O, networks and storage.
Ergo, we'll see many more attempts to orchestrate infrastructure, chasing Egenera's approach where the underlying HW, network, I/O and storage is part of a highly-reliable fabric -- on top of which virtualization (or simply native HW/SW) can be placed.

Wednesday, December 3, 2008

A Google Cloud In Your Data Center?

I love it when idle ruminations possibly come true. Sort of.

Back in October, I blogged about a thought experiment: what if you could have an Amazon EC2 "appliance" behind your corporate firewall? Would it validate the concept/legitimacy of an "internal cloud" architecture?

Well, in an article by John Foley today, that might just be the case, except with Google's App Engine:

One technology company is working on a way to provide "a complete wrapper around App Engine," with the goal of recreating the App Engine environment outside of Google's data center, according to Google product manager Pete Koomen. "It would let you take an App Engine application and run it on your own servers if you needed to," he says. Koomen declined to name the company involved, but my sense is that it's just one of several options that will become available. The subject came up in a discussion of public clouds, private clouds, and hybrid clouds that are a public-private combination.

To me, this helps establish that whatever architecture the "big guys" are building in their hosted environments may well make its way into private data centers in the not-so-distant future.

The way static internal infrastructure is managed today is changing... no doubt in my mind that compute resources are becoming more adaptive, agile, "elastic," etc., and that the economic advantages will follow.

Post-Script -
As I'm about to publish this, I also should highlight a bit of caution: Some would call Google's App Engine a proprietary cloud (PaaS) architecture. If such an App Engine "appliance" really materialized, then this *could* be an attempt by Google to boost adoption. Enterprises that were loath to risk lock-in to Google's cloud could instead run their App Engine jobs internally. You'd still be locked into the App Engine architecture, but not into Google's infrastructure.

Monday, December 1, 2008

Virtual DR - Don't risk tunnel vision

I just read an interesting article today by Bridget Botelho pointing out that automated virtualization disaster recovery is not a silver bullet.

While the article used VMware's Site Recovery Manager (SRM) as an example, it alludes to limitations of all VM-based HA and DR: While these tools provide simplified failover, remember that they only apply to vendor-specific virtualized instances. The article referenced Mike Laverick, a VMware and Citrix Certified Instructor who wrote a book on SRM.

Here's the gotcha: Nearly all environments have both physical and virtual applications -- and short of creating independently-managed DR and HA "silos," there aren't too many ways to unify P & V DR/HA. Said Laverick of this quandary (and I quote): "It is such a royal PITA."

There are a few products on the market that can unify DR replication of HW environments from bare metal, regardless of whether they consist of physical instances or virtual hosts. For example, Egenera's PAN Manager software will replicate an entire HW, network & storage environment (and provision new VM hosts) in a matter of minutes.

But this all raises a few questions:
- Do we really expect production to virtualize 100% of applications?
- What apps are unlikely to be virtualized? And,
- How desirable is mixed P & V HA/DR?

Wednesday, November 12, 2008

Strides toward internal clouds & more efficient data centers

While I was attending a recent Tier-1 conference of hosted service providers, the question arose of how to build a cloud infrastructure like the ones Amazon, Google and other 'big guns' already have. Cloud computing was looking great, and IT managers all wanted a piece of it.

Then, at a recent Cloud Computing conference in Mountain View, a number of CIO panelists (especially one representing the state of California) treated the cloud with caution: What of security, SLA control, vendor lock-in and auditability? Cloud computing was still looking nascent.

The solution is the "great taste, less filling" answer -- IT orgs that already own data centers, that want the economic benefits of clouds, but wouldn't outsource a thing to a cloud, can now build an "internal cloud" or "private cloud." (Whether the words used to describe it are Infrastructure-as-a-Service, Hardware-as-a-Service, or Utility Computing, these are simply infrastructures that have properties of "elasticity" and "self-healing," adapting to user demand to preserve service levels.)

As Dan Kusnetzky recently pointed out, such environments can "continue to scan the environment to manage events based upon time, occurrence of specific events, capacity considerations and ongoing workload demands" and adjust as needed.

Well, Cassatt announced today software that does just that. It's the 5.2 release of Active Response. It's capable of transforming existing heterogeneous infrastructures into ones that act "Amazon EC2-like" to build an "internal compute cloud" behind existing firewalls. Whether the environments are Windows, Sun, Linux or IBM platforms. Whether they contain VMs from VMware, Citrix or Parallels. Regardless of networking gear from Cisco, Extreme, Force 10 and others. And regardless of whether there is a need to manage physical apps, virtual apps, or *both* at the same time (you can even go from P to V and back again on-the-fly).

These details all matter because of a fallacious assumption the industry is making, one being proliferated by leading VM vendors: that all IT problems will be solved IF you virtualize 100% of your infrastructure, and IF you use that vendor's technology. It's not true; rather, IT has to PLAN for managing physical and virtual apps from the same console. IT has to PLAN to manage VMs from differing vendors at the same time.

Scott Lowe observed similar issues in his recent article on the Challenges of cloud computing -

"What about moving resources from one cloud computing environment to another environment? Is it possible to move resources from one cloud to another, like from an internal cloud to an external cloud? What if the clouds are built on different underlying technologies? This doesn't even begin to address the practical and technological concerns around security or privacy that come into play when discussing external clouds interacting with internal ones.

"Given that virtualization typically plays a significant role in cloud computing environments, the interoperability of hypervisors and guest virtual machines (VMs) will be a key factor in the acceptance of widespread cloud computing. Will organizations be able to make a VMware ESX-powered internal cloud work properly with a Xen-powered external cloud, or vice versa?"

The ability to build a utility-computing-style "internal cloud" is now very real. Check out the Cassatt website, or download a new white paper on internal clouds and how they generate efficiency and agility -- without the hobbling effects of using an external cloud. I can attest to its quality :)

There's also an overview video of the product from Steve Oberlin, Cassatt's Chief Scientist.

Finally, consider registering for a joint webcast he's doing with James Staten of Forrester Research on November 20th. They'll also be covering cloud computing, internal cloud technologies, and the overall impact on data center efficiency.

Monday, November 10, 2008

ITIL, ITSM, and the Cloud

There's been tons written by pundits about the cloud recently, but I haven't seen any significant in-depth analysis of how implementing compute clouds integrates with IT Operations. IT Ops is the "guts" of how IT operates: day-to-day processes, configurations, changes, additions and problem resolutions. The most popular reference for these processes is ITSM (IT Service Management), and the most popular guide to managing them is ITIL (the IT Infrastructure Library, v3). Not everyone uses (or even believes in) ITIL, and indeed, it's not required. But it is a convenient way to look at the possible methods/processes IT Ops can bring to bear to manage the size and complexity of today's data centers.

Obviously, if you're outsourcing your IT to a Software-as-a-Service provider, you've already obviated most ITSM issues. Somebody else is managing infrastructure for you.

But if you're running your own software in a cloud (say, Amazon EC2) you'd still probably worry about how to manage & change software configurations, how security is administered, and how new versions are deployed. The only real processes eliminated are those dealing with the hardware -- you still have the software and data management processes to deal with.

Now, if you're operating your own cloud (say you're a hosted services provider, or building an "internal cloud" within your data center) there are still a number of processes to manage -- but also, a number that are conveniently automated or eliminated.

For example, within ITIL's "Service Operation" processes, things like Event Management or Problem Management are conveniently automated (if not eliminated) by the "self-healing" aspects of most cloud computing (really, utility computing) policy & orchestration engines. Similarly, in the "Service Design" processes, capacity management and service level management are likewise automated, and don't require a traditional paper policy.

Consider the types of processes that would be impacted by the use of a truly "elastic" and "self-healing" cloud: "trouble tickets" would be opened and closed automatically, within seconds or minutes. Problem management would essentially take care of itself. Service levels would be automated. Configurations would be machine-tracked and machine-verified. Indeed, most of the complexity that ITIL was designed to help manage would be handled by computer, the way complex systems ought to be.
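A minimal sketch of what that automated event handling could look like -- the thresholds, pool structure, and ticket format here are purely illustrative inventions, not any vendor's API:

```python
# Illustrative policy thresholds and data shapes -- invented for this sketch.
CPU_HIGH = 0.85   # add capacity above this utilization
CPU_LOW = 0.20    # reclaim capacity below it

def handle_event(pool, app, utilization, tickets):
    """Open, resolve, and close a 'trouble ticket' automatically."""
    if utilization > CPU_HIGH and pool["free"] > 0:
        ticket = {"app": app, "issue": "capacity", "state": "open"}
        pool["free"] -= 1                               # take a server from the free pool
        pool["allocated"][app] = pool["allocated"].get(app, 0) + 1
        ticket["state"] = "closed"                      # resolved in seconds, not days
        tickets.append(ticket)
    elif utilization < CPU_LOW and pool["allocated"].get(app, 0) > 1:
        pool["allocated"][app] -= 1                     # reclaim an idle server
        pool["free"] += 1

pool = {"free": 4, "allocated": {"web": 2, "batch": 1}}
tickets = []
handle_event(pool, "web", 0.92, tickets)   # demand spike on 'web'
print(pool["allocated"]["web"], pool["free"], tickets[0]["state"])  # 3 3 closed
```

The entire ticket lifecycle happens inside one function call -- which is the point: the "process" still exists, but no human ever touches it.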

One other quick observation: in a cloud environment, where resources are dynamically and continuously shifted and repurposed, the Configuration Management System (usually a relational database) becomes "real-time" -- that is, it could change minute-by-minute, instead of daily or weekly, as is the case with most current CMDB systems.
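As a sketch of that idea, a "real-time" CMDB is simply one whose records are written by the provisioning engine itself at the moment of each change -- the record schema below is hypothetical:

```python
from datetime import datetime, timezone

# Hypothetical CMDB record store; the provisioning engine writes each
# change the moment it happens, so records are current to the minute.
cmdb = {}

def record_change(ci, attrs):
    """Update a configuration item and stamp it with the change time."""
    record = cmdb.setdefault(ci, {})
    record.update(attrs)
    record["last_verified"] = datetime.now(timezone.utc).isoformat()

record_change("server-042", {"role": "web", "state": "allocated"})
record_change("server-042", {"state": "free"})   # repurposed minutes later
print(cmdb["server-042"]["state"])  # free
```

Contrast this with a weekly discovery scan: here the CMDB is never stale, because it is a side effect of the provisioning action rather than a separate audit.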

At any rate, I'd really like to see more in-depth analysis from the IT Ops and/or analyst community to dissect how ITSM is impacted as more IT staffs turn to, or implement, cloud-style automated infrastructures. This way, we can also get out of being "cloud idealists" and become "cloud pragmatists."

Wednesday, November 5, 2008

The art of powering-down servers

I was pretty happy to see that Ted Samson of InfoWorld wrote a really well-balanced analysis of the advantages (and calculated risks) of using server power management in the data center.

Besides speaking with leading SW companies in the space like Cassatt, he also pinged authorities at HP, IBM and Sun, who all had thoughtful positions on the merits of powering servers on/off based on their usage (and when they were idle). He also spoke with Robert Aldrich of Cisco Systems -- who noted that they are already using power management quite extensively internally, with no ill effects. BTW, Robert pointed out that servers at Cisco draw 40% of their peak power when idle; most analyses I've seen put the number closer to 60%-80%. Good for them.

What's also fascinating about Ted's post are the comments. Most are quite supportive of power management, with a realization that future, denser data centers will need this -- and that some power companies have already figured out that dynamic "load management" is one of the most intelligent operational innovations available today.
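To make the policy concrete, here's a toy version of the kind of rule such tools apply: power down servers idle past a threshold, while keeping a warm reserve for demand spikes. The thresholds and server records are invented for illustration:

```python
# Hypothetical power-management policy (thresholds invented for illustration):
# park servers idle beyond a limit, but keep a warm reserve powered on.
IDLE_LIMIT_MIN = 30   # minutes idle before a server may be powered down
WARM_RESERVE = 2      # idle servers always kept on for sudden demand

def powerdown_candidates(servers):
    """Return the servers the policy would power down right now."""
    idle = [s for s in servers if s["idle_min"] >= IDLE_LIMIT_MIN]
    idle.sort(key=lambda s: s["idle_min"], reverse=True)  # longest-idle first
    return idle[:max(0, len(idle) - WARM_RESERVE)]        # spare the reserve

servers = [
    {"name": "s1", "idle_min": 0},    # busy
    {"name": "s2", "idle_min": 45},
    {"name": "s3", "idle_min": 90},
    {"name": "s4", "idle_min": 60},
]
print([s["name"] for s in powerdown_candidates(servers)])  # ['s3']
```

Given that an idle server still draws a large fraction of its peak power (40% at Cisco, per the article; 60%-80% in other analyses), even a conservative reserve like this recovers real money.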

Tuesday, October 21, 2008

Gartner on Green Data Center Recommendations

Gartner Research just issued a very telling release on taking a holistic view of energy-efficient data centers, rather than a narrow point-technology view. Gartner compared the data center to a "living organism" in terms of how it needs to be treated as a dynamic mechanism. (BTW, I owe a head nod to Dave O's GreenM3 blog for coining the term "the living data center")

Said Rakesh Kumar, a research vice president at Gartner,
“If ‘greening’ the data centre is the goal, power efficiency is the starting point but not sufficient on its own... Green’ requires an end-to-end, integrated view of the data centre, including the building, energy efficiency, waste management, asset management, capacity management, technology architecture, support services, energy sources and operations.”
“Data centre managers need to think differently about their data centres. Tomorrow’s data centre is moving from being static to becoming a living organism, where modelling and measuring tools will become one of the major elements of its management,” said Mr Kumar. “It will be dynamic and address a variety of technical, financial and environmental demands, and modular to respond quickly to demands for floor space. In addition, it will need to have some degree of flexibility, to run workloads where energy is cheapest and above all be highly-available, with 99.999 per cent availability.”
I like this analysis because it implies a dynamic "utility computing" style data center where workloads can be moved, servers can be repurposed, and capacity is always matched to demand. This is the ideal approach to ensuring constant efficiency.

The release also had six recommendations; Here's the one I like the most:
6. Manage the server efficiencies. Move away from the ‘always on’ mentality and look at powering equipment down
To me, it sounds like technologies like Active Power management are finally getting traction; and it seems that power management is being validated -- especially in environments with highly cyclical workloads (most recently endorsed in a 451 Group report, as well as by a host of vendors).

Especially with the economy in a spin, and margins being tightened, look for more ideas for increasing the $ efficiency of data center assets.

Saturday, October 18, 2008

IT Analysts opening their Kimono

There was a time when IT industry analysts would only provide information, opinion or data for a price. But it seems that in today's Web 2.0 world, they are exposing more of their thoughts in the form of blogs and other "free" information. I suspect this is happening due to downward pressure on price (full subscriptions to analysts and reports can run tens of thousands of dollars a year), plus the realization that they need to "market" their expertise and insights to a broader audience.

Here are some of my favorite complimentary data center-related analyst resources you can subscribe to:

Forrester: Great set of analysts and topics here. Check out their entire Blog Listings page to find the right IT industry slice for your taste.

Gartner: Gartner has a "blog network" page where it appears they've asked most of their analysts to do individual blogs, many of which are on various IT topics and technologies. And just so you don't think there isn't any overlap between analyst coverage, they also have a *very* complete blog and video covering Cloud Computing and Cisco's possible intentions, too.

IDC: I just discovered the "IDC Exchange" (which I wrote about earlier last week). They recently did a *very* nice multiple-installment piece on cloud computing you have to check out. They've also recently completed an analysis piece on Cisco and their possible cloud computing intentions.

Redmonk: I've known James Governor since my days at Sun. He runs a multi-topic blog called Monkchips which is insightful with a tinge of wry wit from across the Pond (It's also #3 on the list of Top Analyst Blogs). Michael Cote, another Redmonker, also has a quality blog (People over Process) on IT operations issues and more (BTW, Michael's Blog is #8 on the list)

Saugatuck Technology: I've been following their reports on SOA and related technologies, but they've been branching out. While not a blog, they email out a very nice complimentary summary of each of their extended reports in the form of "Research Alerts"

Thursday, October 16, 2008

Awesome Blog/Report on Cloud Computing by IDC

Those quant guys at IDC have been at it again. This time, they've posted a really fine report/overview on cloud computing on their "IDC Exchange" Blog page, authored by Frank Gens. It was initially posted in September, but it looks like they've been adding report bits (and great graphics) to it for a while.
Over the coming weeks, we’ll roll out a number of posts on cloud services and cloud computing. While these posts can be read standalone, they can also be viewed as parts of a single, coherent IDC overview of this emerging model and market opportunity. We’ll use this post to create a “virtual table of contents”, adding links to these cloud-related posts as they’re published, allowing you to see how different elements of our cloud outlook fit together, and to easily navigate among them.
Here's the Table of Contents:
BTW, if you want a chuckle, click on the "listen now" button to hear a machine-generated voice read the pages for you. Listen closely to the fact that the computer never takes a breath. :)

Tuesday, October 14, 2008

Postcards from the Cloud Summit Executive conference

I'm just now getting back from a full day in Mountain View at the Cloud Summit Executive conference. And if there were themes that summarized the day, they would be "it's all about the business impacts" and "integration of services will be critical."

The conference was sponsored by TechWeb, and hosted/moderated by the experienced M.R. Rangaswami, co-founder of Sand Hill Group. The crowd was around 300 folks, just small enough that you could network w/interesting people during the generous lunch/coffee breaks -- nice schedule design. And while some of the vendor presentations were definitely commercials, some of the content in the general sessions was really worth the price.

The day opened with Tom Hogan, SVP from HP. Although the talk was (of course) vendor-centric, he did a really nice job of summing-up the challenges IT faces (nearly 85% of budgets going toward "keeping the lights on"), and yet identifying the business opportunities that the "cloud" will enable. Most notably these were spawning opportunities for smaller businesses, including enabling certain departmental-scale projects in larger orgs. The cloud, according to Tom, was just another channel for deploying business services -- not a panacea. Use it as part of your portfolio mix.

Next was a panel moderated by Bruce Richardson of AMR Research on "selling the cloud to Wall Street and Main Street." It included Bryan McGrath from Credit Suisse, Robin Vasan from the Mayfield Fund, and Jeff Koser, business author. Again, the tone was decidedly business-focused, with little discussion of technology (in contrast to SDForum, a few weeks earlier). There were some great tidbits within the discussion on how to design a resilient business/model based on using cloud infrastructure -- plus entrepreneurial tips anyone should use (focus on how to attract customers; maintain a super-low cost of sales and a very scalable sales channel; keep the price of the product under $50k; ensure a "sticky" product with recurring revenue; etc.)

Later in the afternoon was a very thoughtful presentation/discussion from Vishal Sikka, CTO from SAP. His opening slide:
Where we are: Power, Infrastructure, Operations
Where we want to be: Integration, Integrity, Elasticity
First, he said, businesses are missing "integration," especially between critical applications; cloud computing could make this worse -- but there are initiatives to improve on this. Problem is, they won't happen overnight. On "integrity," he also pointed out that data integrity today is in fact fragmented -- and again, the cloud could make this worse before it improves things. And finally -- the topic near to my heart -- "elasticity." Vishal said emphatically that elasticity (of compute capacity) HAD to be delivered for all apps of all types, and for DBs as well. And he cautioned: infrastructure will be permanently heterogeneous. Plan for it.

The other Achilles heel for the cloud, he said, was compliance. It's all about transparency and control (distinct from security). Although companies have been outsourcing for years, the "cloud" still needs ways to provide logs, tracking, compliance tools, etc. [sounds to me like a business opportunity...]

Later in the afternoon (after a good networking lunch, complete with Red Bull on ice!) was what I thought was the best panel of all: "Understanding enterprise requirements for the cloud". It was moderated by David Berlind at TechWeb, with Art Wittmann of Information Week -- and featured two diametrically-opposed perspectives on IT: Carolyn Lawson, CIO from the California Public Utilities Commission, and Anthony Hill, CIO from Golden Gate University.

First, Art covered high points of a recent report on Cloud Computing by InformationWeek: 62% of respondents still had no interest (or not enough info) to consider cloud computing at all. Of those considering clouds, their "likes" included meeting user demand and scale, and avoiding huge capital outlays. But their "fears" were expected: Security, Control, Performance, Support, and Vendor Lock-in.

But then the really great part of the panel began. Carolyn Lawson of the CPUC ran a government IT operation. Data was highly sensitive; new capital and employees were hard to come by (literally requiring state approval). She faced stiff legal issues around geography (as it relates to where data lives), data liability, and data security. And she said, literally, that "going to a cloud architecture would be like stepping off of a cliff," given her current constraints.

On the other hand, Anthony Hill of Golden Gate University had a completely different set of constraints and drivers. He's outsourced nearly every application the University uses -- to nearly a dozen different SaaS providers -- and keeps incremental user costs nearly at zero. He has only a small staff, and has avoided huge capital and operational budgets, while supporting the strategic needs of the business to provide an "online university" environment. To be sure, he has challenges too: very high vendor risk (what if they go out of business? what happens to my data?); very high switching costs (how do I migrate my data?); and, the fact that today, he's doing ALL of the inter-application integration himself.

But the big take-away from this interaction was that it is clear that for some businesses, the "cloud" is a godsend -- but for others, it will make almost no inroads in the foreseeable future. Conclusion: Look at the business needs first, before assuming that technology solves all.

A final note: conferences like this are invaluable for their networking opportunities, and TechWeb had a good mix of content (but please try to reduce the "commercials" in the future), small size, break-outs and long breaks. The audience was qualified, too: lots of CEO and "office of the CTO" titles on badges. My buddy and Yoda-of-the-Blog James Urquhart was there too, as was a generous contingent of VCs who traveled a few miles from Sand Hill Road.

Monday, October 13, 2008

Cloud Computing forever changes consolidation and capacity management

This is an intriguing topic - the relationship between the need to forecast compute capacity (part art and part science today), and the "elasticity" guaranteed by what we're calling "the cloud."

So last week, when Michael Coté (an analyst with RedMonk) wrote about "How cloud computing will change capacity management," I thought it would be a good idea to expand on his observations and dissect the issues and trends -- including my prediction that the value of existing capacity management tools will be overtaken by utility computing technologies.

First, terms: When I talk about the "cloud," I'm usually talking about Infrastructure-as-a-Service (a la Amazon EC2) rather than Platform-as-a-Service (e.g. Google App Engine) or Software-as-a-Service. To me, IaaS represents the "raw" compute capacity on which we could provision any arbitrary compute service, grid, etc. (It's also what I consider the underlying architecture of what's been called Utility Computing.)

Michael was clear to define two other terms, Capacity Management, and Capacity Planning. Capacity management is the balancing of compute resources against demand (usually with demand data you have), while capacity planning is trying to estimate future required capacity (usually without the demand data you'd like).
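For a concrete (if simplistic) feel for the distinction, capacity planning often boils down to forecast peak demand, measured per-server capacity, and a headroom factor for forecast error -- all numbers below are made up:

```python
import math

# Back-of-envelope capacity planning (all figures invented for illustration).
peak_rps = 12_000        # forecast peak demand, requests/sec
per_server_rps = 800     # measured capacity of one server
headroom = 0.25          # 25% buffer, because the forecast will be wrong

servers_needed = math.ceil(peak_rps * (1 + headroom) / per_server_rps)
print(servers_needed)    # 19
```

Capacity *management* then spends the rest of the year reconciling that guess against actual demand data -- exactly the part an elastic infrastructure automates away.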

Another related issue that has to be addressed is Consolidation Planning -- essentially "reverse" capacity planning: estimating how to minimize overall in-use capacity while maintaining service and availability levels for virtualized applications.

So how does use of "cloud" (IaaS) impact capacity management/planning, as well as consolidation planning? In my estimation, there are two broad views on this:
  1. If you buy into using the "public" cloud, then all the work you've been doing to estimate capacity and to plan for consolidation doesn't really matter, because your capacity has been outsourced to another provider who will bill you on an as-used basis. The IaaS "cloud" is elastic, and expands/contracts in relationship to demand.
  2. If you instead build an "internal cloud", or essentially architect a utility computing IaaS infrastructure, the story is a little different. You're taking non-infinite resources (your data center) and applying them in a more dynamic fashion. Nonetheless, the way you've been doing capacity management/planning, and even consolidation planning, will change forever.
I'll take #2, above, as an example, because its operation is more transparent. You start with your existing infrastructure (machines, network, storage) and use policy-based provisioning/controls to continuously adjust how it is applied. This approach yields a number of nice properties:
  • Efficiency: You only use the capacity (physical and/or virtual) you need, and only when you need it
  • Continuous consolidation: A corollary to above is that the policy engine can "continuously consolidate" virtualized apps (e.g. it can continually compute and re-adjust consolidated applications for "best-fit" against working resources)
  • Global view: Global available capacity (and global in-use capacity) is always known
  • Prioritization: You can apply policy to prioritize capacity use (e.g. e-commerce apps get priority during the holidays, financial apps get priority at quarter-close)
  • Safety net: You can apply policy to limit specific capacity use (e.g. you're introducing a new application, and you don't know what initial demand will be)
  • Resource use: It enables solutions for "resource contention" (borrowing from Peter to pay Paul); higher-priority applications can temporarily borrow capacity from lower-priority apps.
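The "continuous consolidation" bullet above is easy to sketch. Here's a toy illustration (mine, not any vendor's actual engine), using a simple first-fit-decreasing heuristic to stand in for the policy engine's "best-fit" placement:

```python
# Toy "continuous consolidation": periodically re-pack app demands onto
# the fewest hosts. First-fit-decreasing stands in for the real heuristic.

def best_fit(app_demands, host_capacity):
    """Pack per-app CPU demands onto the fewest hosts (first-fit-decreasing)."""
    hosts = []  # remaining free capacity on each in-use host
    for demand in sorted(app_demands, reverse=True):
        for i, free in enumerate(hosts):
            if demand <= free:
                hosts[i] = free - demand
                break
        else:
            hosts.append(host_capacity - demand)  # draw a host from the free pool
    return len(hosts)

# As business demand shifts, re-running the placement can free up hosts:
daytime = [3.5, 3.0, 2.0, 1.5, 1.0]  # CPU cores needed per app at peak
offpeak = [1.0, 1.0, 0.5, 0.5, 0.5]  # same apps, overnight
print(best_fit(daytime, 4.0))  # 3 hosts in use
print(best_fit(offpeak, 4.0))  # 1 host; the others return to the free pool
```

A real engine weighs far more than CPU (memory, I/O, affinity, priorities), but the point stands: the "plan" is recomputed continuously rather than once.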
The net-net of the properties above is the long-term obviation of capacity planning, capacity management, and consolidation-planning tools. (Now take a deep breath)

Yes. Long-term, I would expect existing capacity management tools like PlateSpin PowerRecon, CiRBA's Data Center Intelligence, and VMware's Capacity Planner to be completely obviated by the appropriate internal IaaS architectures. Why? Well, let's say you do clever consolidation planning for your apps. You virtualize them and cram them into many fewer servers. But a few months pass, and the business demand for a few apps changes... so you have to start planning all over again. Contrast this with an IaaS infrastructure, where you let a computer continuously figure out the "best fit" for your applications. The current concept of "static" resource planning is destined for the history books.

Oh - and there are some nice side-benefits of allowing policy to govern when and where applications are provisioned in an internal IaaS ("internal cloud") architecture:

1) Simplified capacity additions: If capacity is allocated on a global basis, then the need to plan on a per-application basis is much less important. Raw capacity can be added to a "free pool" of servers, and the governing policy engine allocates it as-needed to individual applications. In fact, the more applications you have, the "smoother" capacity can be allocated, and the more statistical (rather than granular) capacity measurement can become.

2) Re-defined "consolidation planning": As I said above, the "static" approach to consolidation planning will give way to continuous resource allocation, essentially "continuous consolidation." Instead, you'll simply find yourself looking at used capacity (whether for physical or virtualized apps) and add raw capacity (as in #1) as-needed. The hard work of figuring out "best fit" for consolidation will take place automatically, and dynamically.

3) Re-defined capacity management: Just like #2 - Rather than using tools to determine "static" capacity needs, you'll get a global perspective on available vs. used raw capacity. You'll simply add raw capacity as-needed, and it will be allocated to physical and/or virtual workloads as-needed.

4) Re-defined capacity planning for new apps: Instead of the "black art" of figuring out how much capacity (and HW purchase) to allocate to new apps, you'll use policy instead. For example, you roll out a new app, and use policy to throttle how much capacity it uses. If you under-forecast, you can "open the throttle" and allow more resources to be used -- and if it's a critical app, maybe even dynamically "borrow" resources from elsewhere until you permanently acquire new capacity.

5) Attention to application "phase": You'll also realize that the best capital efficiency occurs when you have "out-of-phase" resource demands. For example, most demand for app servers happens during the day, while demand for backup servers happens at night -- so these out-of-phase needs could theoretically share hardware. I would expect administrative duties to shift toward "global load balancing," encouraging non-essential tasks to take place during off-hours, much the same way Independent System Operators across the country share electric loads.
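A quick back-of-the-envelope (hourly loads entirely made up) shows why phase matters: hardware shared across out-of-phase workloads only has to be sized for the combined peak, not the sum of the individual peaks.

```python
# Hypothetical hourly loads (in "servers' worth" of work) for two
# out-of-phase workloads: app servers peak by day, backups run at night.
app_load    = [2, 2, 2, 2, 2, 2, 6, 10, 10, 10, 10, 10,
               10, 10, 10, 10, 10, 8, 6, 4, 2, 2, 2, 2]
backup_load = [8, 8, 8, 8, 6, 2, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 2, 6, 8, 8, 8]

separate = max(app_load) + max(backup_load)                 # dedicated HW
shared = max(a + b for a, b in zip(app_load, backup_load))  # pooled HW

print(separate)  # 18 servers if each workload gets its own peak capacity
print(shared)    # 10 servers if the pool is sized for the combined peak
```

The more out-of-phase the workloads, the closer the pooled figure gets to a single workload's peak.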

BTW, if all of this sounds like "vision" and vaporware, it's not. There are firms offering IaaS architectures, internal clouds and utility computing technologies today that work with your existing equipment. I know one of them pretty well :)

Thursday, October 9, 2008

Cassatt's chief scientist explains, simplifies

What's the sign of a really smart guy? The ability to take a complex topic and simplify it so that even your mom will understand it.

Steve Oberlin, Cassatt's Chief Scientist, has done just that. He's taken a look at how data centers operate, the dynamics that drive them, and how existing technology can simplify IT management's life: simpler capacity management, service-level management, and overall energy efficiency. It's the thinking behind utility computing, what will drive Infrastructure-as-a-Service (IaaS), and the basis for the "internal cloud" infrastructures we're all talking about.

Oh. And what's the sign of an extraordinarily smart guy? That he can simplify-down these concepts -and- produce the entire video himself.

Monday, October 6, 2008

Would you buy an Amazon EC2 appliance?

Before you scream "a what?" I'm only posing this as a thought experiment...

But the concept was recently put forth as an illustration at last week's SDForum by an attendee. I thought about it for a few minutes, and realized that the concept isn't as crazy as it first sounds. In fact, it implies major changes for IT are on the way.

First of all, the idea of a SaaS provider or web service provider creating a physical appliance for the enterprise is not new. There's the Google search appliance, and I expect other providers to do the same in the near future. (There are some very large enterprises that want to be 100% sure their critical/sensitive data is resident behind their firewall, and they want to bring the value of their SaaS provider inside.)

So I thought, what would I expect an Amazon EC2/S3 appliance to do? Similar to Google's appliance providing internal search, I'd expect an Amazon appliance to create an elastic, resilient set of compute and storage services inside a company, one that could support any and all applications no matter what the user demand. It would also have cost transparency, i.e. I'd know exactly what it costs to operate each CPU (or virtual CPU) on an hourly basis. Same goes for storage.

This approach would have various advantages (plus a small limitation) over how IT is operated today. The limitation would be that its "elasticity" is bounded by the poolable compute horsepower within the enterprise. But the advantages would be huge -- who wouldn't like a cost basis of ~$0.10/CPU-hour from their existing resources? Who wouldn't like to shrug off traditional capacity planning? And they'd be able to maintain all of their existing compliance and security architectures, since they'd still be using their own behind-the-firewall facilities.

Does it still sound crazy so far?

NOW, what if Amazon were to take one little extra step? Remember that limitation above -- the what-if-I-run-out-of-compute-resources issue? What if Amazon allowed the appliance user to permit reaching out to Amazon's public EC2/S3? Say you hit peak compute demand. Say you had a large power outage or a series of hardware failures. Say you were rolling out a new app and couldn't accurately forecast demand. This feature would be valuable to you because you'd have practically infinite "overflow" -- and it would be valuable to Amazon, since it would drive incremental business to their public infrastructure.
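That overflow behavior is simple to express as policy. A minimal sketch (purely hypothetical -- Amazon offers no such appliance, and the function here is my own invention):

```python
# Hypothetical placement policy: satisfy a capacity request from the
# internal pool first, and "burst" any remainder out to a public cloud.

def place_workload(cpus_needed, internal_free, allow_burst=True):
    """Return (internal_cpus, external_cpus) for one capacity request."""
    internal = min(cpus_needed, internal_free)
    overflow = cpus_needed - internal
    if overflow and not allow_burst:
        raise RuntimeError("internal pool exhausted and bursting disabled")
    return internal, overflow

print(place_workload(40, internal_free=100))   # (40, 0): fits inside
print(place_workload(150, internal_free=100))  # (100, 50): 50 CPUs overflow
```

The `allow_burst` flag is the interesting knob: it's where compliance policy ("this workload never leaves the firewall") would plug in.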

To be honest, I have no idea what Amazon is planning. But I DO know that the concept of commercially-available software/hardware to create internal "clouds" is happening today. And not just in the "special case" of VMware's "VDC-OS", but in a more generalized approach.

Companies like Cassatt can -- today -- take an existing compute environment, and transform its operation so that it acts like an EC2 (an "internal cloud"). It responds to demand changes, it works around failures, and it optimizes how resources are pooled. You don't have to virtualize applications if you don't want to; and if you do, you can use whatever VM technology you prefer. It's all managed as an "elastic" pool for you. And metered, too.

To be sure, others are developing similar approaches to transforming how *internal* IT is managed. But if you're one of those who believes in the value of a "cloud" but wouldn't use it, maybe you should think again.

Sound crazy?

Thursday, October 2, 2008

Decades of experience with Clouds: Telcos

While at yesterday's SDForum meeting on cloud computing, a panelist pointed out that we've been living with (a form of) cloud computing for decades. It's called Telephony.

On reflection, the telcos do give us an interesting model for what PaaS *could* be like, and a metaphor for types of cloud services. To wit:
  • As users, we don't know (or care) where the carrier's gear is, or what platform it's based on so long as our calls are connected and our voicemail is available.
  • There isn't technical "lock-in" as we think of it. Your address (phone number) is now portable between carriers, and the cloud "API" is the dial tone plus those DTMF touch-tones
  • I can "mash-up" applications such as Voicemail from one company, conference calling from another, and FAX transmission from a third.
  • There are even forms of "internal clouds" in this model -- they're called PBXs (private branch exchanges) which are nothing more than "internal" phone switches for your business
This last point interests me the most - that enterprises have economic and operational needs (maybe even security needs!) to manage their own internal phone systems. But inevitably, workers may have to use the public phone system, too.

Similarly, many enterprises will need to retain 100% control of certain computing processes and will never outsource them to a cloud. They'll certainly be attracted to the economics that external computing resources offer, but will eventually build (or acquire) a similar *internal* capability. Just wait.

Wednesday, October 1, 2008

Postcards from SDForum - Cloud Computing and Beyond

I attended most of today's SDForum "Cloud Computing and Beyond: The Web Grows Up (Finally)" in Santa Clara. Somewhere around 200 professionals from Silicon Valley showed up to hear -- and to debate -- the relative maturity and merits of the thing we're calling the cloud.

The day was led off by James Staten, a friend and former colleague now with Forrester Research, who gave a fantastic keynote, "Is cloud computing the next revolution?" Just getting to a definition of terms and mapping the taxonomy of this emerging market is tricky, but he's tracking this fast-maturing market rather closely. Both web-based services and Software-as-a-Service are becoming the norm; but the industry is also calling the lower-level services (PaaS, IaaS) "cloud" too. So be careful with terms when you enter into a cloud debate.

Another morning keynote (which I unfortunately missed most of) was delivered by Lew Tucker, Sun Microsystems' new CTO of Cloud Computing (and also a friend and former colleague). He's quite a visionary, and went so far as to suggest that computing resources of tomorrow will be brokered/arbitraged based on specializations, costs, etc.

One particularly lively panel was hosted by Chris Primesberger of E-Week, with panelists from Intacct, SAP, RingCentral and Google. There was some light discussion about cloud differentiation, interaction, and standard approaches to describing cloud SLAs. Most generally agreed that there would in fact be 3rd-party businesses brokering between providers at some point. The other enlightening discussion focused on capacity planning for the cloud -- what if a user scaled from ten to ten-thousand servers in a few days or weeks? Could services like Amazon handle this? In a consistent -- and impressive -- way, the panelists agreed that these sorts of scale issues were "a drop in the bucket" when you consider the vastness of what these large services provide on a daily basis.

What drew the most spontaneous applause was a question asked of the panel (but probably directed at Rajen Sheth of Google) by a member of the audience: essentially, how could we *not* assume there would be service lock-in, when one provider had one platform model and Google App Engine had another? (a good point elucidated by James Urquhart some time ago). The Google response focused on "providing the best possible service for customers" but was clearly a dodge. (BTW, the author herein suggests that SaaS and PaaS models will follow the same proprietary/fragmentary model as did Linux and Unix.)

In an afternoon panel led by David Brown of AMR research, the main question addressed was whether (or to what degree) cloud computing is disruptive. The panel consisted of hardware, software and services vendors from Elastra, Egenera, Joyent and Nirvanix. The panel agreed that there are different types of disruption, depending on where you sit. From an infrastructure management perspective, internal cloud architectures can be disruptive to IT Ops, since they change how resources are applied and shared, and the fundamentals of capacity planning. Cloud architectures can also be disruptive to traditional forms of hosting and outsourcing, due to their pay-as-you-go approach.

I will say that Jason Hoffman, Founder of Joyent, clearly stood out in the panel as a visionary in this field. Keep an eye on this guy. His take on disruption was that if "cloud" means Infrastructure-as-a-Service, then it's really just another form of hosting, and not very disruptive. But if "clouds" are applied to support business needs using policy (i.e. to dynamically communicate SLAs, geographic compute locations, costs, replication, failover, etc.), then they become very disruptive. IT administration would shift from scripting and fire-fighting to policy development and policy modification.

Finally, I'll point out that many more folks showed up who would use clouds and/or broker cloud services than would actually *make* the clouds (IaaS) in the first place -- again attesting to the point I made earlier this week: it's a lot harder to do, and only really sophisticated vendors will take it on.

Monday, September 29, 2008

Proceed with caution: Taking the long-view on Virtualization

I was recently pointed to Amrit Williams' excellent observations about virtualization -- including the myths and misconceptions about it as well. Amrit's observations are a really great place to start when you're considering a virtualization initiative. Forewarned is forearmed.

An observation we at Cassatt have been professing consistently for some time is that virtualization is not an end in itself. It is an enabler of much higher-level data center management structures. So when virtualization is implemented as if it were the goal, the outcome could easily be more cost and complexity, rather than the reverse. As illustrated by Amrit:
Virtualization reduces complexity (I know what server I am. I’m the server, playing a server, disguised as another server)

It seems counter-intuitive that virtualization would introduce management complexity, but the reality is that all the security and systems management requirements currently facing enterprises today do not disappear simply because an OS is a guest within a virtual environment, in fact they increase. Not only does one need to continue to maintain the integrity of the guest OS (configuration, patch, security, application and user management and provisioning), one also needs to maintain the integrity of the virtual layer as well. Problem is this is done through disparate tools managed by FTE’s (full time employees) with disparate skills sets. Organizations also move from a fairly static environment in the physical world, where it takes time to provision a system and deploy the OS and associated applications, to a very dynamic environment in the virtual world where managing guest systems - VMsprawl - becomes an exercise in whack-a-mole.
There are also a variety of other perspectives on this new tool called virtualization. Take, for example, the implicit assumption that *everything* will be virtualized. The answer is maybe, but perhaps not in our lifetime -- which begs the question, "how do I manage all the other stuff?" The de-facto answer has been that IT uses its existing systems for the physical, and VM management tools for the rest. Now you've bifurcated your data center management, and added to complexity once again. (Thanks to Amrit for the pic.)

My advice is - and has been - to treat virtualization as a *feature* of something larger. Don't implement it as a point solution; treat it as an enabler of your next systems management architecture. Rules of thumb have to be:

  • Assume heterogeneity of VMs & VM management; plan for it
  • Assume you'll always have physical servers somewhere; manage them alongside your virtual servers
  • Assume you'll have more virtual objects to manage than you can keep track of; use an automation tool
  • Never assume that once you've consolidated, things will be stable; plan for constant re-adjustment of scale and capacity (another argument for automation)
Have an end-game in sight if you're introducing VMs into your environment. Take the long view.

Thursday, September 25, 2008

20 Cloud computing startups - analysis

I was pointed earlier this week to John Foley's InformationWeek article, "20 Cloud Computing Startups You Should Know." Aside from the fact that I could only count 19, it was a great survey of the types of companies, ideas and ventures getting on the bandwagon.

The quick-and-dirty chart above is mine; what I found so interesting is that 8 of the players are building solutions on top of other clouds (like Amazon's EC2 and S3) while another 7 are investing in essentially building hosted services.

However, only 4 (ok, maybe 4-1/2) are thinking/trying to bring "cloud" technologies and economics to the enterprise's own internal IT. This certainly attests to the difficulty in reworking IT's entrenched technologies, and building a newer abstracted model of how IT should operate.

Even though Cassatt wasn't mentioned in the survey (maybe we were supposed to be #20) we also play in the "build-an-internal-cloud-with-what-you-have" space.

This model -- that of an "internal cloud" architecture -- will ultimately result in more efficient data centers (these architectures are highly efficient) and ones that will be able to "reach out" for additional resources (if-and-when needed) in an easier manner than today's IT.

I'd look to see more existing enterprises considering building their own cloud architectures (after all, they've already invested lots of $$ in infrastructure) while startups and smaller shops opt for the products that leverage existing (external) cloud resources.

BTW, John also just posted a very nice blog post offering a "reality check" to curb some of the cloud computing hype.

Wednesday, September 17, 2008

Postcards from the Hosting Transformation Summit

Right across the street from VMworld was Tier 1's Hosting Transformation Summit. Roughly 400 folks -- mostly from Managed Service Providers (MSPs) -- attended to get the lowdown on where that industry is going. It's changing fast, given some of the recent "cloudy" offerings from Amazon, Mosso, OpSource and others. And part of the driver was the technology offered from VMware itself.

But First: The industry, and its growth, is compelling. Managed services hosting is growing in the U.S. at about 30%/y, and it will be a $10 billion industry by the end of 2008. While about 20% of that amount is represented by 13 of the largest firms, the remainder of the market is represented by hundreds if not thousands of smaller entities.

Dan Golding of Tier 1 pointed out that the categories called "web hosting" and "managed hosting" are colliding, given that so many apps are being delivered over http. He also pointed out that small/medium businesses are expected to outsource even more of their own IT as operating it themselves becomes more complex and expensive... also good for MSPs. In particular, he noted that CRM, HR, accounting, fileservers, utility storage, email and project management were expected to be the top managed SaaS applications.

Next, John Zanni, Microsoft's GM of worldwide hosting, gave a talk called "cloud computing - is virtualization enough?" Having seen Paul Maritz's VMworld keynote hours before, I couldn't help but compare the two. Zanni's a really smart guy, but vision-wise his talk was a let-down. While he identified essentially the same requirements for the "cloud" (surprisingly in agreement with VMware's), Microsoft's story was elementary in comparison, referencing Microsoft party lines and products. Granted, the audience was not as heavily laden with technologists as the VMware conference, but the vision that was sketched out just didn't seem fully baked. One interesting side-note: John explicitly mentioned Microsoft management tools that would someday manage 3rd-party VMs such as VMware's. Hmmm....

On day 2, Antonio Piraino (also of Tier 1) gave a really great talk on "virtualization and cloud computing" -- the guy really gets it, with respect to the MSP industry. His message to MSPs was pretty clear: Cloud is coming, and you (the MSP) will need to learn about it and get on board. The definition of "cloud" he gave to MSPs was
  • Server-based managed hosting
  • Virtualized offerings
  • Multi-O/S & DB support
  • Automated scalability
  • Easy ordering of services
  • On-demand provisioning
  • Cross-service integration
  • Bill-for-use
  • SLAs that are managed / managed-to
It's clear that the smaller MSPs out there will be jumping on the utility computing, cloud, PaaS and SaaS bandwagon soon. That should begin to give folks like Mosso, Flexiscale, IronScale, OpSource etc. some competition. But I'm sure we're going to see the concept of abstracted-away hardware grow in popularity with frightening velocity.

Postcards from VMworld 2008 (with a twist)

I'm a bit late in reporting-back on day #1 of VMworld in Las Vegas. Word-on-the-floor is that there are over 14,000 attendees here. Definitely indicative of the hunger the industry has for this technology.

Rather than re-hash all of what CEO Paul Maritz had to say, I'd like to point out why VMware's vision is both on-the-mark -and- already available from sources other than VMware... and showcase one such available product.

Paul outlined 3 areas of vision:
  • Virtual Data Center O/S (VDC-OS)
  • vCloud (providing the ability to build internal/external clouds and federation between clouds)
  • vClient (providing end-client independence for services emanating from clouds)
He emphasized, with a demo, how an "internal cloud" could reach out to an off-premises (external) cloud for resources, say during peak demand -- or perhaps as a failover scenario. The demo had 3 points to make: (a) the ability to provide "elastic" capacity, (b) the ability to self-heal by replacing failed capacity, and (c) the fact that it was driven by policies based on SLAs. It was a demo of a product that isn't commercially available, but it drew great applause from the audience.

Whenever the "big guys" show-off a concept/roadmap, you can be sure that there are already smaller guys who are paving the way for them; this is no different. Cassatt, for one, has been showing-off this type of demo (down to a similar GUI) for many months now. With a few key differences:
  • The product is shipping today
  • We don't require that there are "warm" hosts pre-provisioned as standby resources
  • We don't require that VMware is everywhere; in fact, we can already show the same demo using Xen/Citrix (and soon, other VM players)
  • We don't even require that Virtualization is used at all; our approach works with physical HW and O/Ss too (including x86, SPARC, Linux distros, Solaris, and others)

For those attending the keynote, perhaps the GUI above looks familiar -- except it's Cassatt's Active Response 5.1.

In the center is a chart indicating upper- and lower- SLA thresholds (SLAs can be arbitrarily defined and composed). If the upper SLA is breached, Active Response finds bare-metal resources in the "free pool" (again, defined how you like) and then automatically provisions those resources with whatever SW policy determined (read: either a physical server or a virtual server). The application "tier" grows automatically. If/when the lower threshold is breached, an instance on the "tier" is retired. This approach provides real-life SLA management, capacity-on-demand (elastic behavior), failover/availability, and many other nice-to-have properties -- automatically. And Today.
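In spirit, the control loop described above looks something like the following sketch (my own simplification, not Cassatt's actual code; the thresholds and pool sizes are invented):

```python
# Illustrative SLA-driven tier sizing: grow the tier when the upper
# threshold is breached, retire an instance below the lower one.

def adjust_tier(tier_size, load_per_instance, upper=0.80, lower=0.30,
                free_pool=10):
    """Return (new_tier_size, new_free_pool) after one policy evaluation."""
    if load_per_instance > upper and free_pool > 0:
        return tier_size + 1, free_pool - 1  # provision from the free pool
    if load_per_instance < lower and tier_size > 1:
        return tier_size - 1, free_pool + 1  # retire an instance
    return tier_size, free_pool              # within the SLA band: do nothing

print(adjust_tier(4, 0.90))  # (5, 9): upper threshold breached, tier grows
print(adjust_tier(5, 0.20))  # (4, 11): below lower threshold, tier shrinks
```

Run on a schedule against live metrics, a loop like this yields exactly the properties listed: elastic capacity, SLA enforcement, and automatic reuse of retired hardware.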

This set of properties was also discussed across the street today at the Tier 1 Research Hosting Summit at the Mirage. Many MSPs in the audience wanted to know "how do I get some of that?" when discussion came to utility computing and cloud infrastructures. I'll post on that next :)

Monday, September 15, 2008

An early analysis of VMware vCloud and VDC-OS

It's the first day of VMworld, and already the P/R for new technology and "roadmaps" is flying.

The news that caught my eye was VMware's vCloud & Virtual Data Center O/S (VDC-OS) Initiatives... Strategically, it's a great move for them. They've essentially said "hey, enterprises use VMware internally, and service providers use VMware too. So why not link the two?" Cool idea. Just missing the mark by a teeny bit. An excerpt from their P/R:
Today at VMworld 2008, VMware, Inc. (NYSE: VMW), the global leader in virtualization solutions from the desktop to the datacenter, announced a comprehensive roadmap of groundbreaking new products and technologies that expand its flagship suite of virtual infrastructure into a Virtual Datacenter Operating System (VDC-OS). The Virtual Datacenter OS allows businesses to efficiently pool all types of hardware resources - servers, storage and network – into an aggregated on-premise cloud – and, when needed, safely federate workloads to external clouds for additional compute capacity. Datacenters running on the Virtual Datacenter OS are highly elastic, self-managing and self-healing. With the Virtual Datacenter OS from VMware, businesses large and small can benefit from the flexibility and the efficiency of the “lights-out” datacenter.
My reactions to this are mixed. But I'm sharing them to shed light on what this announcement means for data center operators, MSPs, and IT Operations folks. Full-disclosure: I work at Cassatt, who's been developing software for data center management now for over 5 years. So the concepts VMware is talking about are actually not new to me at all; I've been living them for a while.

A few initial reactions, and cautionary advice:
  • First: I'm thrilled that VMware is finally educating the market that "it's not all about consolidation". There's a bigger "there" there!
  • Gartner Research (Tom Bittman), has been touting a "Meta-O/S" for the data center for some time. I'm sure that's where VMware got the idea for VDC-OS. But, their vision was more heterogeneous. More on that later...
  • While VMware has coined the term "On-Premise Cloud," the idea has been in the news for a while. Here at Cassatt we've been talking about "Internal Clouds" for some time. So has our CEO. Even check out our website. I wonder if VMware took notice...
  • The concept of "federating" virtualization management systems (and storage and network) is great. And the fact that VMware has roped-in partners like Savvis, Rackspace, Sungard and more means they're serious. The Gotcha, however, is that the concept works *only* if you buy-into VMware-specific technology. What if you have some other technology like Citrix' (Xen), Microsoft's Hyper-V or Parallels' Virtuozzo? Multiple Virtualization technologies under one roof is gonna happen, folks. Plan for it. (wait for my punchline...)
  • Keep in mind that this is a VMware roadmap. Not everything is in place yet.
  • What about "The Forgotten 40%"? That is, IT Ops will always have systems that are not virtualized (e.g. transaction processors, directory servers, and other high-throughput and/or scale-out architectures). Some analysts believe the number could be as much as 40% of infrastructure. How are you going to manage those systems? You'll still need a second (if not a third and fourth) management system in addition to vCloud.
So, allow me to take this announcement and append a few nuances to shape it into what IT would want it to look like. Apologies for adding bias :)
  1. Demand "equal rights" for physical/native systems: The concept of a "meta O/S" for the data center has to include support for all systems, as well as coverage for systems which are *not* virtualized.
  2. Require VM heterogeneity: Data center operations will have to federate systems (i.e. for failover, capacity extensions, etc.) based on arbitrary technologies. Not VMware only. Fortunately, companies other than VMware are doing this.
  3. Products are available today: you don't have to buy-into a roadmap. Actually, companies like Cassatt are already delivering on multi-vendor, Physical + Virtual, and "federated" styles of failover, disaster recovery and data center migration.
At the core of where the industry is going is utility computing: This doesn't require that you have to use Virtualization at all, or (if you do) that it come from a single vendor. Cassatt's CTO, in fact, was the designer of one of the best O/S's on the market -- so we know what a real "data center operating system" ought to be.

We'll be at VMworld, booth 1440, BTW.

Friday, September 12, 2008

Join Cassatt at VMworld

It's that time of year again when all good IT professionals migrate to Las Vegas. Cassatt will be in booth #1440, with some interesting new developments to share. (And, enter to Win a Wii)

Here's a teaser for inquisitive minds:
  • Can you Swizzle virtual & physical?
    Come by the booth for a complimentary drink mixer and find out...

  • How will you manage "The forgotten 40%" of your infrastructure?
    You'll have to come by and ask!

  • What's an Internal Cloud?
    Nope. Not a smoker's lounge :)

Monday, September 8, 2008

Inherent efficiencies of Cloud Computing, Utility Computing, and PaaS

Ever noticed that the two hottest topics in IT today are Data Center Efficiency and Cloud Computing? Ever wondered if the two might be related? I did. And it’s clear that the media, industry analysts – and most of all IT OPs – have missed this relationship entirely. I now feel obligated to point out how and why we need to make this connection as soon as we can.

Let me cut to the chase: The most theoretically-efficient IT compute infrastructure is a Utility Computing architecture – essentially the same architecture which supports PaaS or “cloud computing”. So it helps to understand why this is so, why today's efficiency “point solutions” will never result in maximum compute efficiency, and why the “Green IT” movement needs to embrace Utility Computing architectures as soon as it can.

To illustrate, I'll remind you of one of my favorite observations: how is Amazon Web Services able to charge $0.10/CPU-hour (equating to ~$870/year), when the average IT department or hosting provider has a loaded server cost of somewhere between $2,000-$4,000/year? What does Amazon know that the rest of the industry doesn't?
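The arithmetic behind that comparison is worth spelling out:

```python
# The post's favorite comparison: a dime per CPU-hour, run around
# the clock, versus a typical loaded internal server cost.
amazon_rate = 0.10                      # $/CPU-hour
annual_amazon = amazon_rate * 24 * 365  # ~$876/year, the "~$870" above
loaded_server = (2000, 4000)            # $/server/year, the post's range

print(round(annual_amazon))              # 876
print(loaded_server[0] / annual_amazon)  # low end: internal costs ~2.3x more
print(loaded_server[1] / annual_amazon)  # high end: ~4.6x more
```

A 2x-5x gap per server-year, before Amazon's margin -- which is exactly why the operational model, not the hardware, is the interesting question.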

First: A bit of background

Data center efficiency is top-of-mind lately. As I’ve mentioned before, a recent EPA report to U.S. Congress outlined that over 1.5% of U.S. electricity is going to power data centers, and that number may well double by 2011. Plus, according to an Uptime Institute White Paper, the 3-year cost of power to operate a server will now outstrip the original purchase cost of that server. Clearly, the issues of high cost and limited capacity for power are currently hamstringing data center growth, and the industry is trying to find a way to overcome it.

Why point-solutions and traditional approaches will miss the mark

I regularly attend a number of industry organizations and forums on IT energy efficiency, and have spoken with all major industry analysts on the topic. And what strikes me as absurdly odd is that the industry (taken as a whole) is missing-the-mark on solving this energy problem. Industry bodies – mostly driven by large equipment vendors – are mainly proposing *incremental* improvements to “old” technology models. Ultimately these provide a few % improvement here, a few % there. Better power supplies. DC power distribution. Air flow blanking panels. Yawn.

These approaches are oddly similar to Detroit trying to figure out how to make its gas-guzzlers more efficient by using higher-pressure tires, better engine control chips and better spark plugs. They’ll never get to an order-of-magnitude efficiency improvement on transportation.

Plus, industry bodies are focusing on metrics (mostly a good idea) that will never get us to the major improvements we need. Rather, the current metrics are lulling us into a misplaced sense of complacency. To wit: the most oft-quoted data center efficiency metrics are PUE (Power Usage Effectiveness) and its reciprocal, DCiE (Data Center infrastructure Efficiency). These essentially say, “get as much power through your data center and to the compute equipment, with as little siphoned-off to overhead as possible.”

While PUE/DCiE are nice metrics for driving overhead (power distribution, cooling) power use down, they don’t at all address the efficiency with which the compute equipment is applied. For example, you could have a pathetically low level of compute utilization, but still achieve an incredibly wonderful PUE and DCiE number. Sort of akin to Detroit talking about transmission efficiency rather than actual mileage.
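To make that blind spot concrete, here's a minimal sketch of the two metrics; the 1,500 kW / 1,000 kW split and the utilization figures are made up purely for illustration:

```python
# PUE = total facility power / IT equipment power; DCiE is its reciprocal.
def pue(total_kw, it_kw):
    """Power Usage Effectiveness: power at the meter vs. power at the IT gear."""
    return total_kw / it_kw

def dcie(total_kw, it_kw):
    """Data Center infrastructure Efficiency, usually quoted as a percentage."""
    return it_kw / total_kw

# Hypothetical facility: 1,500 kW at the meter feeding 1,000 kW of IT gear.
total_kw, it_kw = 1500.0, 1000.0
print(f"PUE  = {pue(total_kw, it_kw):.2f}")    # 1.50
print(f"DCiE = {dcie(total_kw, it_kw):.0%}")   # 67%

# The blind spot: both numbers are identical whether the servers run at
# 10% or 90% utilization -- the metrics never see how the IT watts are used.
for utilization in (0.10, 0.90):
    print(f"at {utilization:.0%} utilization, PUE is still {pue(total_kw, it_kw):.2f}")
```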

These metrics will continue to mislead the IT industry unless it fundamentally looks at how IT resources are applied, utilized and operated. (BTW, I am more optimistic about the Deployed Hardware Utilization Efficiency “DH-UE” metric put forth by the Uptime Institute in an excellent white paper, but it is rarely mentioned.)

Where we have to begin: Focus on operational efficiency rather than equipment efficiency

So, while Detroit was focused on incremental equipment efficiency like higher tire pressure and better spark plugs to increase mileage, Toyota was looking at fundamental questions like how the car was operated. The Prius didn’t just have a more efficient engine, but it had batteries (for high peak needs), regenerative braking (to re-capture idle “cycles”), and a computer/transmission to “broker” these energy sources. This was an entirely new operational model for a vehicle.

The IT industry now needs a similar operationally-efficient re-engineering.

Yes, we still need more efficient cooling systems and power distribution. But we need to re-think how we operate and allocate resources in an entirely new way. This is the ONLY approach that will result in Amazon-level cost reductions and economies-of-scale. And I am referring to cost reductions WITHIN your own IT infrastructure. Not to outsourcing. A per-CPU cost basis under $1,000/year, including power, cooling, and administration. What IT Operations professional doesn’t desire that?

Punchline: The link between efficiency, cloud computing & utility computing architectures

What the industry has termed “utility computing” or Platform-as-a-Service (a form of “cloud” computing) provides just this ideal form of operational-efficiency and energy-efficiency to IT.

Consider the principles of Utility Computing (the architecture behind “clouds”): Only use compute power when you need it. Re-assign it when-and-where it’s required. Retire it when it’s not needed at all. Dynamically consolidate workloads. And be indifferent with respect to the make, model and type of HW and SW. Now consider the possibilities of using this architecture within your four walls.
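To illustrate just the "dynamically consolidate workloads" principle, here's a toy sketch using first-fit-decreasing bin packing. The fleet size and demand figures are invented, and a real utility-computing broker would also weigh SLAs, affinity, and migration cost:

```python
def consolidate(demands, capacity):
    """First-fit-decreasing bin packing: returns per-server load totals."""
    servers = []
    for load in sorted(demands, reverse=True):
        for i, used in enumerate(servers):
            if used + load <= capacity:
                servers[i] += load   # fits on an already-active server
                break
        else:
            servers.append(load)     # needs a fresh server
    return servers

fleet_size = 20
demands = [15, 30, 10, 25, 40, 20, 5, 35]   # CPU demand, % of one server each
active = consolidate(demands, capacity=100)
print(f"Workloads fit on {len(active)} servers; "
      f"{fleet_size - len(active)} of {fleet_size} can be powered off")
```

Re-run the packing as demand shifts hour to hour, and the number of powered-on servers tracks actual load rather than provisioned peak.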

Using the design principles above, power (and electricity cost) is *inherently* minimized because capital efficiency is continuously maximized – always, regardless of the variation from hour to hour or month to month. And this approach is still compatible with “overhead” improvements such as better cooling and power distribution. But it always guarantees that the working capital of the data center is continuously optimized. (Think: the Prius engine isn’t running all the time!)

On top of this approach, it would then be appropriate to re-focus on PUE/DCiE!

The industry is slowly coming around. In a recent article, my CEO, Bill Coleman, pointed out a similar observation: Bring the “cloud” inside, operate your existing equipment more efficiently, and save power and $ in the process.

I’m only waiting for the rest of the industry to acknowledge the inherent connection between energy efficiency, operational efficiency, power efficiency, and the architectures behind cloud computing.

Only then will we see a precipitous drop in energy consumed by IT, and the economies-of-scale that technology ought to provide.

Tuesday, September 2, 2008

Efficient IT and Power Management

I just learned of a brand-new analyst report (and upcoming webinar) regarding a broad survey of desktop and server power management. I expect that it will completely dwarf the very high-level survey of data center power management products I recently made.

The work is being done by The 451 Group and sister company Tier 1 Research. The webinar, titled "Power Management 2008-2012: Managing and Monitoring IT Energy Use From the Desktop to the Datacenter," is on September 4 from 12:00 to 1:00pm EST. Register Here.

The webinar appears to coincide with a forthcoming 76-page report from The 451 titled "Eco-Efficient IT: Power Management – 2008-2012". Clearly, Cassatt is part of their wide-ranging analysis. Their summary:
"IT power consumption is causing significant financial, operational and ethical problems for many organizations – and excessive consumption may in future be a compliance issue (accurate measurement of power is required by some planned laws and incentive schemes). However, most organizations have no technology in place for measuring, aggregating and tracking power use; indeed, such technology has only recently become available. IT suppliers have belatedly realized the importance of this issue, and the race is on to develop products and establish them in the marketplace.

"This report examines this topic with a focus on managing power, managing datacenters and investing in and understanding the eco-efficient IT agenda.

"The 451 Eco-Efficient IT (ECO-IT) service tracks and analyzes key developments, from the Kyoto Protocol to datacenter effectiveness, from electricity prices to telepresence. Ecoefficiency is becoming a key consideration for technology vendors, end users, service providers and investors. The 451 Group provides insight to help organizations negotiate this new challenge and profit from the opportunity."
I'm digging the report, simply because of the TOC – something everyone should consider reading before undertaking *any* form of IT energy efficiency project:
SECTION 4: Datacenter power management
4.1 Datacenter power management – The technology
4.2 Effectiveness and ROI of datacenter power management
4.2.1 Calculating ROI
4.3 Market development and buyer attitudes
4.4 Reasons to be cautious: Market barriers to datacenter power management adoption
4.5 Toward the energy-aware, dynamic datacenter
4.5.1 The power management policy engine
4.6 Visibility into IT assets, dependencies and policies
4.7 Improving visibility into mechanical and electrical equipment
4.8 Virtualization and power management
4.9 Technical issues associated with virtualization
Like virtualization, power management is not an end in and of itself. Rather, it should be considered part of a portfolio of initiatives that IT & Facilities management weigh when evaluating their options.

I'm planning a future post to cover my philosophy of what a truly operationally-efficient data center implies; stay tuned.

Wednesday, August 27, 2008

Power IT Down Day

Today (Aug. 27th) is "Power IT Down Day"... a marketing/awareness-building event by HP, Intel and Citrix to promote installing and activating PC power management software. By doing so, companies can reduce power use during off-hours and weekends.

Little do they know, this is both (a) a great idea, and (b) missing-the-boat on a huge additional opportunity.

Here's why: the event is definitely a great move to build awareness that IT doesn't need to be on 100% of the time... the assumption that it does is just a myth. And shutting down a PC/monitor overnight can save close to 200W. But what the initiative's messaging misses is that this doesn't need to be confined to desktops and monitors. Data center servers, the most power-hungry (and wasteful) equipment of all, can be turned off, too.
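To put a rough number on the PC side (assuming the full 200W is avoided for everything outside a 40-hour work week; your mileage will vary with sleep states and schedules):

```python
# What "close to 200W" adds up to over a year of nights and weekends.
draw_kw = 0.200                        # PC + monitor draw avoided when off
off_hours_per_week = 168 - 40          # everything but a 40-hour work week
annual_kwh = draw_kw * off_hours_per_week * 52
print(f"~{annual_kwh:,.0f} kWh saved per PC per year")
```

That's over a megawatt-hour per desktop per year, which is exactly why the same logic applied to far hungrier servers is so tantalizing.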

At the time of this writing, the official website at HP showed over 2,700 participants, saving an estimated 35,000 kWh. But here's a sobering statistic: At a recent Silicon Valley Leadership Group Energy Summit, Cassatt piloted server power management software. The organization using the software operated a number of its own data centers, and the case study findings showed that if this software were used enterprise-wide, the annual savings could be 9,100KWh for this enterprise alone.

Maybe we'll try to expand the scope of Power IT Down Day in 2009...

Tuesday, August 26, 2008

A day with the California ISO

Earlier today I was able to spend time with folks at the California Independent System Operator (ISO). This organization literally operates California's power grid, ensuring that supply meets demand, and that problems get worked-around seamlessly. They also work with utilities, public policy makers and technology providers to promote solutions that permanently (as well as on-demand) reduce electrical load on the power grid.

I got a tour of their demonstration lab, which is focused on showcasing efficiency technology, specifically "Demand Response" (DR) for commercial and residential installations. DR is designed to reduce end-user demand for electricity during shortages (think hot summer weekday afternoons). That's when electricity is most expensive, and when brown-outs are most likely. By curbing consumption during those peak hours, the ISO (and utilities like PG&E that offer DR incentives & programs) helps reduce the need to build new generating plants that can cost millions. Consider the fact that, during the summer, power consumption roughly doubles from night to day. That's a lot of expensive peak generation that's needed only a fraction of the time. And that's why participating in DR programs can generate cold, hard cash for end-users.

On exhibit in the lab were a number of "smart" residential & commercial technologies including programmable thermostats, appliances and building control systems that curb electrical consumption during DR events. Picture a smart building that dims perimeter lighting and reduces air conditioning when commanded; picture a home thermostat that reduces air conditioning by a few degrees when it receives a signal from its utility; picture a washer/dryer that delays its cycle until electric rates drop. All of these technologies are available today, but many are awaiting legislative approvals for utility programs that incentivize their use.

Now consider this: The "energy density" of an office building is under 10W per square foot. The energy density of a data center is over 100x that. What would the possibilities be of bringing DR to a data center? Even a 5% reduction of a data center's load could outstrip the DR savings of an entire office building. Think this isn't possible? Between dev/test/staging servers and failover servers, almost certainly 5% of any data center's servers are "non-critical" and could be gracefully powered down during brief DR events.
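A rough back-of-the-envelope, with every figure below an illustrative assumption rather than a measurement:

```python
# Comparing DR potential of an office building vs. a data center, same footprint.
OFFICE_W_PER_SQFT = 10                   # "under 10W per square foot"
DC_W_PER_SQFT = OFFICE_W_PER_SQFT * 100  # "over 100x that"

floor_sqft = 10_000                      # assume identical 10,000 sq ft buildings

office_kw = OFFICE_W_PER_SQFT * floor_sqft / 1000    # 100 kW total load
dc_kw = DC_W_PER_SQFT * floor_sqft / 1000            # 10,000 kW total load

office_shed_kw = office_kw * 0.20   # aggressive office DR: ~20% (lights, HVAC)
dc_shed_kw = dc_kw * 0.05           # powering down just 5% of servers

print(f"Office building DR shed: {office_shed_kw:,.0f} kW")
print(f"Data center 5% shed:     {dc_shed_kw:,.0f} kW "
      f"({dc_shed_kw/office_shed_kw:.0f}x the whole office)")
```

Even with these cautious assumptions, trimming a sliver of the data center beats an aggressive DR program across the entire office building many times over.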

What are the possibilities there?