Wednesday, November 12, 2008

Strides toward internal clouds & more efficient data centers

While I was attending a recent Tier-1 conference of hosted service providers, the question arose of how to build a cloud infrastructure like the ones Amazon, Google and other 'big guns' already have. Cloud computing was looking great, and IT managers all wanted a piece of it.

Then, at a recent Cloud Computing conference in Mountain View, a number of CIO panelists (especially one representing the state of California) treated the cloud with caution: What of security, SLA control, vendor lock-in and auditability? Cloud computing was still looking nascent.

The solution is the "great taste, less filling" answer -- IT orgs that already own data centers and want the economic benefits of clouds, but wouldn't outsource a thing to a cloud, can now build an "internal cloud" or "private cloud." (Whether the words used to describe it are Infrastructure-as-a-Service, Hardware-as-a-Service, or Utility Computing, these are simply infrastructures that have properties of "elasticity" and "self-healing," adapting to user demand to preserve service levels.)

As Dan Kusnetzky recently pointed out, such environments can "continue to scan the environment to manage events based upon time, occurrence of specific events, capacity considerations and ongoing workload demands" and adjust as needed.

Well, Cassatt announced today software that does just that. It's the 5.2 release of Active Response. It's capable of transforming existing heterogeneous infrastructures into ones that act "Amazon EC2-like," building an "internal compute cloud" behind existing firewalls. Whether the environments are Windows, Sun, Linux or IBM platforms. Whether they contain VMs from VMware, Citrix or Parallels. Regardless of networking gear from Cisco, Extreme, Force 10 and others. And regardless of whether there is a need to manage physical apps, virtual apps, or *both* at the same time (you can even go from P to V and back again on-the-fly).

These details all matter because of a fallacious assumption the industry is making, one that's being propagated by leading VM vendors: that all IT problems will be solved IF you virtualize 100% of your infrastructure, and IF you use that vendor's technology. It's not true; rather, IT has to PLAN for managing physical and virtual apps from the same console. IT has to PLAN to manage VMs from differing vendors at the same time.

Scott Lowe observed similar issues in his recent article on the Challenges of cloud computing -

"What about moving resources from one cloud computing environment to another environment? Is it possible to move resources from one cloud to another, like from an internal cloud to an external cloud? What if the clouds are built on different underlying technologies? This doesn't even begin to address the practical and technological concerns around security or privacy that come into play when discussing external clouds interacting with internal ones.

"Given that virtualization typically plays a significant role in cloud computing environments, the interoperability of hypervisors and guest virtual machines (VMs) will be a key factor in the acceptance of widespread cloud computing. Will organizations be able to make a VMware ESX-powered internal cloud work properly with a Xen-powered external cloud, or vice versa?"

The ability to build a utility-computing style "internal cloud" is now very real. Check out the Cassatt website, or download a new white paper on internal clouds, and how they generate efficiency and agility-- without the hobbling effects of using an external cloud. I can attest to its quality :)

There's also an overview video of the product from Steve Oberlin, Cassatt's Chief Scientist.

Finally, consider registering for a joint webcast he's doing with James Staten of Forrester Research on November 20th. They'll also be covering cloud computing, internal cloud technologies, and the overall impact on data center efficiency.

Monday, November 10, 2008

ITIL, ITSM, and the Cloud

There's been tons written by pundits about the cloud recently, but I haven't seen any significant in-depth analysis of how implementing compute clouds is integrated with IT Operations. IT Ops is the "guts" of how IT operates day-to-day: processes, configurations, changes, additions and problem resolutions. The most popular reference to these processes is ITSM (IT Service Management), and the most popular guide to managing the processes is ITIL (the IT Infrastructure Library, v3). Not everyone uses (or even believes in) ITIL, and indeed, it's not required. But it is a convenient way to look at the possible methods/processes IT Ops can bring to bear to manage the size and complexity of today's data centers.

Obviously, if you're outsourcing your IT to a Software-as-a-Service provider, you've already obviated most ITSM issues. Somebody else is managing infrastructure for you.

But if you're running your own software in a cloud (say, Amazon EC2) you'd still probably worry about how to manage & change software configurations; how security is administered; and how new versions are deployed. The only real processes eliminated are those dealing with the hardware -- you still have the software and data management processes to deal with.

Now, if you're operating your own cloud (say you're a hosted services provider, or building an "internal cloud" within your data center) there are still a number of processes to manage -- but also, a number that are conveniently automated or eliminated.

For example, if you look at the "Service operation" block above, things like Event Management or Problem Management are conveniently automated (if not eliminated) by the "self-healing" aspects of most cloud computing (really utility computing) policy & orchestration engines. Similarly, in the "Service design" block, things like capacity management and service level management are likewise automated, and don't require a traditional paper policy.

Consider the types of processes that would be impacted with the use of a truly "elastic" and "self-healing" cloud: "trouble tickets" would be opened and closed automatically and within seconds or minutes. Problem management would essentially take care of itself. Service levels would be automated. Configurations would be machine-tracked and machine-verified. Indeed, most of the complexity that ITIL was designed to help manage would be handled by computer, the way complex systems ought to be.
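To make that concrete, here's a minimal sketch of the kind of "self-healing" policy loop described above: detect a failure, remediate automatically, and open/close the corresponding trouble ticket in seconds rather than days. Every name here (check_health, remediate, the ticket schema) is illustrative -- this is not Cassatt's actual API, just the general pattern.

```python
# Hypothetical sketch of a self-healing policy loop: the engine monitors
# services, remediates failures, and manages trouble tickets itself.

def check_health(service):
    # Stand-in health probe; a real engine would poll agents or sensors.
    return service["healthy"]

def remediate(service):
    # Stand-in remediation: e.g., repurpose a spare node to the service.
    service["healthy"] = True

def policy_loop(services, tickets):
    for svc in services:
        if not check_health(svc):
            # Ticket opened automatically at detection time...
            ticket_id = len(tickets) + 1
            tickets[ticket_id] = {"service": svc["name"], "status": "open"}
            remediate(svc)
            if check_health(svc):
                # ...and closed automatically once remediation succeeds.
                tickets[ticket_id]["status"] = "closed"

services = [{"name": "web-tier", "healthy": False}]
tickets = {}
policy_loop(services, tickets)
print(tickets)  # the ticket was opened and closed with no human in the loop
```

The point isn't the toy code; it's that the open/remediate/verify/close cycle that ITIL formalizes as a human workflow collapses into one pass of a machine loop.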

One other quick observation: in a cloud environment, where resources are dynamically and continuously shifted and repurposed, the Configuration Management System (usually a relational database) becomes "real-time" -- that is, it could change minute-by-minute, instead of daily or weekly, as is the case with most current CMDB systems.
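As a sketch of that shift, imagine the cloud engine writing each configuration item (CI) change the moment it repurposes a node, rather than waiting for a daily or weekly discovery scan to catch up. The schema below is invented purely for illustration:

```python
import datetime

# Hypothetical "real-time" CMDB: every repurposing event updates the
# configuration item immediately, timestamped at the moment of change.
cmdb = {}

def record_change(node, role):
    cmdb[node] = {
        "role": role,
        "updated": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record_change("server-017", "web-tier")
record_change("server-017", "batch-tier")  # minutes later, repurposed
print(cmdb["server-017"]["role"])  # always reflects the current state
```

With the engine itself as the writer, the CMDB stops being a stale inventory and becomes a live view of the infrastructure.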

At any rate, I'd really like to see more in-depth analysis from the IT Ops and/or analyst community to dissect how ITSM is impacted as more IT staffs turn to, or implement, cloud-style automated infrastructures. This way, we can also get out of being "cloud idealists" and become "cloud pragmatists."

Wednesday, November 5, 2008

The art of powering-down servers

I was pretty happy to see that Ted Samson of Infoworld wrote a really well-balanced analysis of the advantages (and calculated risks) of using server power management in the data center.

Besides speaking with leading SW companies in the space like Cassatt, he also pinged authorities at HP, IBM and Sun, who all had thoughtful positions on the merits of powering servers on/off based on their usage (and when they were idle). He also spoke with Robert Aldrich of Cisco Systems -- who pointed out that they are already using power management quite extensively internally, with no ill effects. BTW, Robert pointed out that servers at Cisco use 40% of their power when idle. Most analyses I've seen show that the number is closer to 60%-80%. Good for them.

What's also fascinating about Ted's post are the comments. Most are quite supportive of power management, with a realization that future, denser data centers will need this -- and that some power companies have already figured out that dynamic "load management" is one of the most intelligent operational innovations available today.