Monday, October 13, 2008

Cloud Computing forever changes consolidation and capacity management

This is an intriguing topic - the relationship between the need to forecast compute capacity (part art and part science today), and the "elasticity" guaranteed by what we're calling "the cloud."

So last week, when Michael Coté (an analyst with RedMonk) wrote about "How cloud computing will change capacity management," I thought it would be a good idea to expand on his observations and to dissect the issues and trends, including my prediction that the value of existing capacity management tools will be overtaken by utility computing technologies.

First, terms: When I talk about the "cloud", I'm usually talking about Infrastructure-as-a-Service (a la Amazon EC2) rather than Platform-as-a-Service (e.g. Google App Engine) or Software-as-a-Service. To me, IaaS represents the "raw" compute capacity on which we could provision any arbitrary compute service, grid, etc. (It's also what I consider the underlying architecture of what's been called Utility Computing.)

Michael was careful to define two other terms: capacity management and capacity planning. Capacity management is the balancing of compute resources against demand (usually with demand data you have), while capacity planning is trying to estimate future required capacity (usually without the demand data you'd like).

Another related issue that has to be addressed is consolidation planning - essentially "reverse" capacity planning -- estimating how to minimize overall in-use capacity while maintaining service and availability levels for virtualized applications.

So how does use of "cloud" (IaaS) impact capacity management/planning, as well as consolidation planning? In my estimation, there are two broad views on this:
  1. If you buy into using the "public" cloud, then all the work you've been doing to estimate capacity and to plan for consolidation doesn't really matter, because your capacity has been outsourced to a provider who will bill you on an as-used basis. The IaaS "cloud" is elastic, and expands/contracts in response to demand.
  2. If you instead build an "internal cloud", or essentially architect a utility computing IaaS infrastructure, the story is a little different. You're taking non-infinite resources (your data center) and applying them in a more dynamic fashion. Nonetheless, the way you've been doing capacity management/planning, and even consolidation planning, will change forever.
I'll take #2, above, as an example, because its operation is more transparent. You start with your existing infrastructure (machines, network, storage) and use policy-based provisioning/controls to continuously adjust how it is applied. This approach yields a number of nice properties:
  • Efficiency: You only use the capacity (physical and/or virtual) you need, and only when you need it
  • Continuous consolidation: A corollary to the above is that the policy engine can "continuously consolidate" virtualized apps (e.g. it can continually recompute and re-adjust the placement of consolidated applications for "best-fit" against working resources)
  • Global view: Global available capacity (and global in-use capacity) is always known
  • Prioritization: You can apply policy to prioritize capacity use (e.g. e-commerce apps get priority during the holidays, financial apps get priority at quarter-close)
  • Safety net: You can apply policy to limit specific capacity use (e.g. you're introducing a new application, and you don't know what initial demand will be)
  • Resource use: It enables solutions for "resource contention" (borrowing from Peter to pay Paul); higher-priority applications can temporarily borrow capacity from lower-priority apps.
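
To make the prioritization, safety-net, and borrowing properties concrete, here is a minimal sketch of how a policy engine might hand out a fixed pool of capacity. Everything in it is an assumption for illustration: the `App` fields, the "higher number wins" priority convention, and the `allocate` function are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class App:
    name: str
    priority: int               # hypothetical convention: higher = more important
    demand: int                 # capacity units the app wants right now
    cap: Optional[int] = None   # optional policy limit (the "safety net")

def allocate(apps, total_capacity):
    """Grant capacity in priority order, honoring per-app caps.

    Returns (grants, free), where free is the unallocated remainder,
    i.e. the "global view" of available capacity.
    """
    remaining = total_capacity
    grants = {}
    for app in sorted(apps, key=lambda a: a.priority, reverse=True):
        wanted = app.demand if app.cap is None else min(app.demand, app.cap)
        granted = min(wanted, remaining)
        grants[app.name] = granted
        remaining -= granted
    return grants, remaining

# Normal day: everyone fits, and the new app is held to its cap of 10.
normal = [App("ecommerce", 3, 60), App("finance", 2, 40), App("new_app", 1, 50, cap=10)]
print(allocate(normal, 120))

# Holiday spike: e-commerce demand rises and lower-priority apps give up capacity.
holiday = [App("ecommerce", 3, 90), App("finance", 2, 40), App("new_app", 1, 50, cap=10)]
print(allocate(holiday, 120))
```

In the second call the e-commerce app effectively "borrows" from the lower-priority apps: finance is granted 30 of the 40 units it wants, and the new app waits. A real engine would re-run a decision like this continuously as demand changes.
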
The net-net of the properties above is the long-term obviation of capacity planning, capacity management, and consolidation-planning tools. (Now take a deep breath)

Yes. Long-term, I would expect existing capacity management tools like PlateSpin PowerRecon, CiRBA's Data Center Intelligence, and VMware's Capacity Planner to be completely obviated by the appropriate internal IaaS architectures. Why? Well, let's say you do clever consolidation-planning for your apps. You virtualize them and cram them into many fewer servers. But a few months pass, and the business demand for a few apps changes... so you have to start planning all over again. Contrast this against an IaaS infrastructure, where you let a computer continuously figure out the "best fit" for your applications. The current concept of "static" resource planning is destined for the history books.
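
As a rough illustration of what "letting a computer figure out the fit" could look like, the sketch below uses first-fit decreasing, a standard bin-packing heuristic, as a stand-in. That choice is an assumption; the names (`consolidate`, the demand numbers, a single capacity unit per server) are hypothetical, and a real engine would weigh memory, I/O, availability rules, and migration cost as well.

```python
def consolidate(demands, server_capacity):
    """Pack app demands onto servers with a first-fit-decreasing heuristic.

    demands: dict mapping app name -> capacity units needed.
    Returns a list of servers, each a list of the app names placed on it.
    """
    servers = []  # each entry: {"free": remaining units, "apps": [names]}
    for name, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > server_capacity:
            raise ValueError(f"{name} needs {need} units, more than one server holds")
        for server in servers:
            if server["free"] >= need:
                server["free"] -= need
                server["apps"].append(name)
                break
        else:
            # No existing server has room, so bring another one into use.
            servers.append({"free": server_capacity - need, "apps": [name]})
    return [server["apps"] for server in servers]

demands = {"web": 50, "db": 40, "mail": 30, "batch": 20, "wiki": 10}
print(consolidate(demands, 100))   # two servers are enough

# A few months later, demand for one app changes: just re-run the placement.
demands["mail"] = 90
print(consolidate(demands, 100))   # now three servers are needed
```

The point is less the particular heuristic than the workflow: when demand shifts, the placement is recomputed rather than re-planned by hand.
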

Oh - and there are some nice side-benefits of allowing policy to govern when and where applications are provisioned in an internal IaaS ("internal cloud") architecture:

1) Simplified capacity additions: If capacity is allocated on a global basis, then the need to plan on a per-application basis is much less important. Raw capacity can be added to a "free pool" of servers, and the governing policy engine allocates it as-needed to individual applications. In fact, the more applications you have, the "smoother" capacity can be allocated, and the more statistical (rather than granular) capacity measurement can become.
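
The "smoothing" effect can be sketched with a little arithmetic. The example below assumes, purely for illustration, that every app has the same average demand and the same variability, and that app demands are statistically independent (real workloads are often correlated, which weakens the effect). The function name and the safety factor `z` are hypothetical.

```python
import math

def capacity_per_app(n_apps, mean, stddev, z=2.0):
    """Capacity to provision per app when n independent apps share one pool.

    The pool is sized at (total mean demand) + z * (pooled standard deviation).
    For independent demands the pooled standard deviation grows only as
    sqrt(n), so the headroom needed per app shrinks as the pool grows.
    """
    pooled_mean = n_apps * mean
    pooled_std = math.sqrt(n_apps) * stddev
    return (pooled_mean + z * pooled_std) / n_apps

# Each app averages 10 units of demand with a standard deviation of 4.
for n in (1, 4, 16, 100):
    print(n, round(capacity_per_app(n, mean=10, stddev=4), 2))
```

Under these assumptions a single app planned on its own needs 18 units, while each of 100 apps sharing a pool needs about 10.8, which is why planning against the pool matters more than planning per application.
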

2) Re-defined "consolidation planning": As I said above, the "static" approach to consolidation planning will give way to continuous resource allocation, essentially "continuous consolidation." You'll simply find yourself looking at used capacity (whether for physical or virtualized apps) and adding raw capacity (as in #1) as-needed. The hard work of figuring out "best fit" for consolidation will take place automatically, and dynamically.

3) Re-defined capacity management: Just like #2, rather than using tools to determine "static" capacity needs, you'll get a global perspective on available vs. used raw capacity. You'll simply add raw capacity as-needed, and it will be allocated to physical and/or virtual workloads as-needed.

4) Re-defined capacity planning for new apps: Instead of the "black art" of figuring out how much capacity (and HW purchase) to allocate to new apps, you'll use policy. For example, you roll out a new app, and use policy to throttle how much capacity it uses. If you under-forecast, you can "open the throttle" and allow more resources to be used -- and if it's a critical app, maybe even dynamically "borrow" resources from elsewhere until you permanently acquire new capacity.

5) Attention to application "Phase": You'll also realize that the best capital efficiency occurs when you have "out-of-phase" resource demands. For example, most demand for app servers happens during the day, while demand for backup servers happens at night -- so these out-of-phase needs could theoretically share hardware. So I would expect administrative duties to shift towards "global load balancing", and encouraging non-essential tasks to take place during off-hours, much the same way Independent System Operators across the country share electric loads.
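
A quick way to see the benefit of out-of-phase demand is to compare the hardware needed when each workload gets dedicated servers with what a shared pool needs at its busiest hour. The hourly profiles below are made-up numbers for illustration, and `peak_sharing` is a hypothetical helper, not part of any tool.

```python
def peak_sharing(profiles):
    """Compare dedicated vs. shared capacity for hourly demand profiles.

    profiles: dict mapping workload name -> list of hourly demand values
    (all lists the same length). Returns (dedicated, shared):
    dedicated = sum of each workload's own peak,
    shared    = peak of the combined hour-by-hour demand.
    """
    dedicated = sum(max(profile) for profile in profiles.values())
    combined = [sum(hour) for hour in zip(*profiles.values())]
    return dedicated, max(combined)

profiles = {
    # App servers are busy during business hours (08:00-18:00).
    "app_servers": [8 if 8 <= h < 18 else 2 for h in range(24)],
    # Backups run overnight (00:00-06:00).
    "backup": [6 if h < 6 else 1 for h in range(24)],
}

dedicated, shared = peak_sharing(profiles)
print(dedicated, shared)   # 14 servers if dedicated, 9 if shared
```

The further apart the peaks sit, the closer the shared figure gets to the largest single peak, which is the argument for nudging non-essential work into off-hours.
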

BTW, if all of this sounds like "vision" and vaporware, it's not. There are firms offering architectures like IaaS, Internal Clouds and utility computing technologies today, that work with your existing equipment. I know one of them pretty well :)


Douglas Gourlay said...

Just a thought - in IaaS you remove some aspects of traditional capacity planning and management if you assume that the workloads will be dynamically allocated to the most efficient resource to handle the processing requirements at that given time. However, since many IaaS services are provisioned by the number of machines I purchase, I still have to manage capacity at a per-machine level and at an application level.

For example, if I buy 20 servers in an IaaS cloud and allocate them as 12 Web, 4 App, and 4 for database, life may be great. But if I see a significant spike in one of my web apps I may need to add 5 more web servers, 2 mem-cache, and 2 more app servers. Not only do I need to know that I need to add these, I need the application performance management visibility to enable me to know this, and I need the application architecture to have enough scalability/levels of abstraction that it allows for this linear scaling.


Ken Oestreich said...

Thanks Doug - agreed, I am assuming that workload management will automatically provision/allocate. And yes, capacity planning moves from per-application scale to a per-environment scale. But it's interesting that the application performance management tools perhaps now become a critical piece to help the other tools "know" when to adjust.