Thursday, November 19, 2009

Infrastructure Virtualization: The Next Logical Step

2010 will be an interesting year for virtualization - but not from the perspective you're probably thinking. It will be the year of the virtual infrastructure, not of the virtual machine.

Yes, the O/S virtualization market is maturing as it transforms how servers and applications are managed. The major vendors all offer hypervisors and management to accomplish server consolidation, live migration, HA, lifecycle management, lab management, and more. And they're even offering higher-level tools for DR and cloud computing... Read more on VMBlog.com

Tuesday, October 27, 2009

Infrastructure 2.0 – A Virtual Analogy

Is OS virtualization an end in itself? Is it both necessary and sufficient for all things Cloud and IaaS? Is it the panacea IT Operations has been looking for? From where I see it, abstracting the OS is certainly a great start, but it’s actually only 50% of the goal.

To a degree, OS virtualization is the “shiny metal object” du jour in that it’s captivating everyone’s attention. It is of course very valuable, and is causing an important inflection point in datacenter operations and economics. But there is a less-visible, less sexy side to datacenter operations and economics that lies “below” the CPU in the stack...

Read more on the Infrastructure 2.0 Blog

Tuesday, October 6, 2009

Differing Target Uses for IT Automation Types

One of the most oft-repeated themes at this year's VMworld was that of "automation." Everybody claimed they had it, but on closer investigation it had any number of poorly-defined meanings.

A specific angle I want to address here is that of infrastructure automation; that is, the dynamic manipulation of physical resources (virtualized or not) such as I/O, networking, load balancing, and storage connections, sometimes referred to as "Infrastructure 2.0". Why is this important? Although automation of software (such as provisioning & manipulation of VMs/applications) usually captures attention, remember that there is a whole set of physical datacenter infrastructure layers that IT Ops has to deal with as well. When a new server (physical or virtual) is created, much of this infrastructure also has to be provisioned to support it.

There are 2 fundamental approaches to automation I'll compare/contrast: Let's loosely call them "In-Place" Infrastructure Automation, and Virtualized Infrastructure Automation.

Confession: I am a champion of IT automation. The industry has evolved into a morass of technologies and resulting complexity; the way applications (and datacenters) are constructed today is not the way a greenfield thinker would do it. Datacenters are stove-piped, hand-crafted, tightly-controlled and reasonably delicate. Automating how IT operates is the only way out -- hence the excitement over cloud computing, utility infrastructure, and the "everything-as-a-Service" movement. These technology initiatives are clear indications that IT operations desires a way to "escape" having to manage its mess.

At a high level, automation has major advantages: lower steady-state OpEx, greater capital efficiency, and greater energy efficiency. Automation also presents challenges typical of paradigm changes: distrust, organizational upheaval, and financial and business changes. The art/science of introducing automation into an existing organization is to reap the benefits while mitigating the challenges.

As infrastructure automation moves forward, it appears to be bifurcating along two different philosophies. Each is valid, but appropriate for differing types of uses:
  • "In-place" infrastructure automation: (distinct from run-book automation) Seeks to automate existing physical assets, deriving its value from masking operational and physical complexity by orchestrating in-place resources. That is, it takes the physical topology (servers, I/O, ports, addressing, cabling, switches, VMs, etc.) and orchestrates it to optimize a variable such as an SLA, energy consumption, etc.
  • Virtualized Infrastructure automation: Seeks to first virtualize the infrastructure (the assets as above) and then automate their creation, configuration and retirement. That is, I/O is virtualized, networking is frequently converged (i.e. a Fabric), and network switches, load balancers, etc. are virtualized as well.
Each of these two approaches has properties with pros and cons with which I'm familiar -- having worked for companies in each space. I'll try to elucidate a few of the "high points" for each:

"In-Place" Infrastructure Automation:
Examples: Cassatt (now part of CA), Scalent
  • Automates existing assets: Usually, there is no need to acquire new network or server hardware (although not all hardware will be compatible with the automation software). Thus "in-place" assets are generally re-purposed more efficiently than they would be in a manually-controlled scenario. Clearly this is one of the largest value propositions for this approach - automate what you already own.
  • Masking underlying complexity: A double-edged sword, I suppose, is that while "in-place" automation simplifies operation and streamlines efficiency, the datacenter's underlying complexity is still there - e.g. the same redundant (and sometimes sub-optimal) assets to maintain, same cabling, same multi-layer switching, same physical limitations, etc.
  • Alters security hierarchy: Since assets such as switches will now be controlled by machine (i.e. the automation SW automatically manipulates addresses and ports) this architecture will necessarily modify the security hierarchy, single-point-of-failure risks, etc. All assets fall under the command of the automation software controller.
  • Broad, but not complete, flexibility: Because this approach manipulates existing physical assets, certain physical limitations must remain in the datacenter. For example, physical server NICs and HBAs are what they are, and can't be altered. Or, for example, certain network topologies might not be able to be perfectly replicated if physical topologies don't closely match...or, if physical load balancers aren't available, servers/ports won't have access to them. Nonetheless, if properly architected, some of these limitations can be mitigated.
  • Use with OS virtualization: This approach usually takes control of the VMM as well, e.g. takes control of the VM management software, or directly controls the VMs itself. So, for example, you'd allow the automation manager to manipulate VMs, rather than vSphere.
  • Installation: Usually more complex to set up/maintain because all assets, versions, and the physical topology need to be discovered and cataloged. But once running, the system will essentially maintain its own CMDB.

Virtualized Infrastructure Automation:
Examples: Cisco UCS, Egenera, Xsigo
  • Reduction/elimination of IT components: The good news here is that through virtualizing infrastructure, redundant components can be completely eliminated. For example, only a single I/O card with a single cable is needed per server, because they can be virtualized/presented to the CPU as any number of virtual connections and networks. And, a single virtualized switching node can present itself as any number of switches and load balancers for both storage and network data.
  • Complete flexibility in configuration: By abstracting infrastructure assets, they can be built/retired/repurposed on-demand. e.g. networking, load balancing, etc. can be created at-will with essentially arbitrary topologies.
  • Consistent/complementary to OS Virtualization models: If you think about it, virtualized infrastructure control is pretty complementary to OS virtualization. While OS virtualization logically defines servers (which can be consolidated, moved, duplicated, etc.), infrastructure virtualization similarly defines the "plumbing" and allows I/O and network consolidation, as well as movement/duplication of physical server properties to other locations.
  • New networking model: One thing to keep in mind is that with a completely virtualized/converged network, the way the network (and its security) is operationally managed changes. Organizations may have to re-think how (and who) creates and repurposes network assets. (Somewhat similar to coping with "VM Sprawl" in the software virtualization domain)
  • Use with OS virtualization: This approach is usually 'agnostic' to the software payload of the physical server, and is therefore neutral/indifferent to the VMM in place. Frequently the two can be coordinated, however.
  • Installation: Usually relatively simple. Few components per server, few cables, especially in a 'green field' deployment. Installation of software/BIOS on physical servers is probably not what you're used to, though.
Ideal use of these two approaches differs too. "In-Place" Infrastructure Automation is probably best-suited for an existing set of complex datacenter assets, especially in a Dev/Test environment. As you'd expect, a number of existing lab automation products target this market. Virtualized Infrastructure Automation, on the other hand, can certainly be deployed on existing assets, but its real value is for new installations where minimal hardware/cabling/networking can be designed-in from the ground up. Most of these products are designed for production data centers, as well as cloud/utility infrastructures.

My overall sense of the market is that adoption of "in-place" automation will be driven primarily by progressive IT staffs that want a taste of automation and service-level management. Virtualized Infrastructure Automation adoption, on the other hand, will tend to ride the technology wave driven both by networking vendors and OS virtualization vendors.

Stay tuned for additional product analyses in this space...

Tuesday, September 29, 2009

A real-world cloud user shares his findings

I subscribe to a number of mailing lists from my alma mater. A few weeks ago, an alum "John" posted a request for recommendations for a cloud computing vendor for his small investment firm. What follows is his email to the group following responses he received.

This is an incredibly illustrative peek inside the "real world" of cloud computing: what prospective SMB users are looking for and concerned about, as well as what's "Good Enough". I've not edited anything....



I had many requests to share our findings so I figured I would share with the group. I appreciate all of the input I received. It has been really helpful.

~ John


Having looked into cloud computing solutions for our small investment firm over the past few months, we have learned a lot about the growing movement towards remote data storage and accessibility. Our goal has been to find a cost-effective solution for our IT needs that would make it convenient for employees of our company to access our shared network (documents and emails) all over the globe without much hassle, difficulty, or expense. While the cloud computing landscape is still relatively new, what is already available is exciting. Both Google, Microsoft, and other companies have products available such as Google Apps and Microsoft Office Live, but neither has fully come to the point of being able to handle our business needs. We are currently in the process of setting up a Google Apps trial period, through a consultant, to try out business e-mail and calendar via Google’s Gmail and Google Calendar. We will do this test while retaining our current Microsoft Exchange server.

There have been many issues to consider as we have been speaking with various consultants and researching all of the available alternatives. First, since we are an SEC-registered investment adviser with lots of confidential and sensitive information on our hands, issues regarding the security of our electronic files – both in terms of disaster recovery as well the integrity of the company with whom we are entrusting to house our data – are paramount. This also ties in with the issue of record retention, which is equally important to us. In terms of data storage and backup – our current system is not ideal. We need to retain copies of all e-mails and files for at least seven years, if not more, and this information needs to be secure and easily accessible. There seem to be some progress in this area (Google Postini and Amazon S3, for example), but as of yet, there is not yet one system that can do all of these things in the way we’d require.

Second, since we currently are not pleased with our current remote network access - we would like an easy and inexpensive way to access email and our network drive from any computer with Internet access. We have discovered that while web-based, unlimited e-mail and calendar storage are currently available from multiple providers, a solution for mass file storage that would essentially replicate our shared network drive and allow large files for multiple software applications to be stored/backed up in the cloud does not yet exist at an attractive price. In particular, a system where we could modify docs in the cloud without having to download and upload/re-save the file each time it needs to be edited.

One interesting product we discovered during our search is called Dropbox. You download Dropbox to one computer, save any type of file you would like to a “Drop Box drive”, and it syncs up automatically with the Web. Then, when you are at home or traveling, you can access those docs through a web browser... or you can download Drop Box onto another computer anywhere and you can edit the docs directly in Drop Box. The only glitch is that Dropbox does not yet have file storage capacity for a company with over 200GB of data to store and seems to be geared more for individual users. Word on the street is that Google will be coming out with a new product soon that has similar features to Dropbox, but on a much larger scale that would be useful for businesses.

In terms of cost and ease, Google Apps seems to be the best solution for us right now (it comes out around $50/user/year), at least for the e-mail and archiving component. Microsoft’s upcoming 2010 Web Apps platform seems appealing as well, particularly because we might be able to edit complex Excel documents directly in the cloud from anywhere.

Bottom line, what we have learned is that this rapidly-developing option for IT is not yet 100% ready to cover all the bases our business needs, but it will probably get there sometime in the next year or two. For the time being, we are going to see how the e-mail works and go from there.

Tuesday, September 22, 2009

Alternative Recommendation for DCeP "Service Productivity"

Back in February of this year, The Green Grid published a paper listing proposed Proxy measures for data center productivity, specifically Data Center Energy Productivity (DCeP).

This paper followed a much earlier output from the group in 2007, which helped define the now much-used PUE and DCiE metrics which I wrote about back then. Those metrics were (and are) nice if what you care about are "basic" efficiencies of a data center -- simply how much power is getting to your servers relative to all of the other power being consumed by infrastructure systems (e.g. lighting, power distribution, cooling, etc.). Their shortcoming is that they don't quantify the "useful output" of a datacenter vs. power input. So, for example, you could have a fantastic PUE... but with a datacenter full of idle servers.
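To make that caveat concrete, here is a minimal Python sketch of the two metrics as they are commonly defined: PUE is total facility power divided by IT equipment power, and DCiE is the reciprocal expressed as a percentage. The function names and the power figures are mine, chosen purely for illustration; they are not measurements from any real facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center infrastructure Efficiency: IT power as a percent of total."""
    return 100.0 / pue(total_facility_kw, it_equipment_kw)


# Hypothetical facility: 1,500 kW drawn in total, 1,000 kW reaching the servers.
# Neither function takes server utilization as an input, so the result is the
# same whether those servers are busy or sitting idle.
facility_pue = pue(1500.0, 1000.0)
facility_dcie = dcie(1500.0, 1000.0)
print(f"PUE = {facility_pue:.2f}, DCiE = {facility_dcie:.1f}%")
```

Note that nothing in either calculation knows what the servers are doing with the power they receive, which is exactly the gap the "useful output" discussion is trying to close.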

Again, enter The Green Grid to take analysis to the next level. The excellent paper published in February details 8 "proxy" approaches (i.e. not necessarily metrics) that could be used by data center operators to begin to baseline efficiencies based on "useful output". The Green Grid also set up a survey where they have been soliciting feedback from users regarding the appropriateness, usefulness, etc. of these proxies.

Why 8 approaches? Because not everyone agrees on what "useful work output" of a datacenter really is. Should it be Bits-per-kWh (proxy #4)? Weighted CPU utilization (proxies #5 & #6)? Compute units delivered per second (proxy #7)? Each has its pros and cons. Fortunately, the Green Grid recognized that nothing's perfect. Says the paper: "...The goal is to find a proxy that will substitute for a difficult measurement, but that still gives a good-enough indication of useful work completed."

In addition, the Data Center Knowledge blog pointed out:
The new goal is to develop a simple indicator, or proxy, rather than a full metric. The Green Grid compares the proxy to EPA mileage ratings for new cars, which provide useful data on energy efficiency, with the caveat that “your mileage may vary.” The proposals “do not explicitly address all data center devices and thus fall short of a complete overall measure of data center productivity,” the group says.
To this end, the issue was also recently dealt with extremely eloquently in Steve Chambers' ViewYonder perspective on datacenter efficiency, which has the right idea: why not base efficiency on the service provided (as opposed to CPUs themselves, or some abstract mathematical element)? This approach is very similar to what I proposed a year ago February in "Measuring 'useful work' of a Datacenter".

In short, the proposal is to compare the data center Service's SLAs with the power the overall datacenter consumes.

Why use the "SLA" (Service Level Agreement)? Two reasons. (1) The SLA is already part of the vernacular that datacenter operators use. It's easily understood, and frequently well-documented. (2) The SLA encapsulates many "behind-the-scenes" factors that contribute to energy consumption. Take this example: Not all 1,000 seat email services are created equal. One may be within a Tier-I data center with a relatively low response rate requirement and allowing users only 500MB of storage per mailbox. Another enterprise with the same email application may be operating in a Tier-III datacenter environment with a rigorously-controlled response rate, a full disaster-recovery requirement, and 2GB of storage per mailbox. These two SLA examples are quite different and will therefore consume different amounts of power. But wouldn't you now rather compare apples-to-apples to see if your particular instantiation of these 1,000 mailboxes was more efficient than that of another enterprise with the same SLA?

How would such a proxy/measurement be accomplished? The approach is somewhat analogous to the Green Grid's proxy #1 ("Self-assessment reporting"), coupled with peer-reporting/comparison of data as is done with the DOE's DC-Pro tool.

Thus, data centers would
1) quantify the number of Services and SLAs for each,
2) measure overall power consumed,
3) upload these numbers to a public (but anonymized) database.
After a while, there would be statistically-significant comparisons to be made -- say a "best practice" energy efficiency range for a given Tier-III email application with 2GB storage and disaster-recovery option.
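As a rough sketch of what that comparison could look like, here is a small Python example. Everything in it is hypothetical: the `ServiceReport` fields, the choice of watts-per-seat as the comparison figure, and the sample numbers are my own assumptions, and it further assumes each data center can attribute a share of its measured power to a given service.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ServiceReport:
    """One anonymized submission: a service, its SLA terms, and measured power."""
    service: str             # e.g. "email"
    tier: str                # data center tier, e.g. "III"
    mailbox_gb: float        # storage promised per mailbox
    disaster_recovery: bool  # whether the SLA includes full DR
    seats: int
    avg_power_kw: float      # power attributed to this service (assumed known)

    def sla_key(self):
        # Reports are compared only when every SLA term matches.
        return (self.service, self.tier, self.mailbox_gb, self.disaster_recovery)

    def watts_per_seat(self):
        return self.avg_power_kw * 1000.0 / self.seats


def peer_medians(reports):
    """Group reports by identical SLA; return the median W/seat per group."""
    groups = defaultdict(list)
    for report in reports:
        groups[report.sla_key()].append(report.watts_per_seat())
    return {key: median(values) for key, values in groups.items()}


# Made-up submissions: three Tier-III email services with 2GB mailboxes and DR,
# plus one Tier-I service with 500MB mailboxes and no DR.
reports = [
    ServiceReport("email", "III", 2.0, True, 1000, 12.0),
    ServiceReport("email", "III", 2.0, True, 1000, 9.0),
    ServiceReport("email", "III", 2.0, True, 2000, 30.0),
    ServiceReport("email", "I", 0.5, False, 1000, 4.0),
]
medians = peer_medians(reports)
for sla, value in medians.items():
    print(sla, f"{value:.1f} W/seat")
```

The point of keying on the full SLA is that the Tier-I service never gets compared against the Tier-III ones, so a low number there doesn't make the more demanding services look wasteful.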

I'm open to other suggestions of how to pragmatically apply application SLAs vs Watts to gauge overall datacenter energy efficiency - again, my earlier proposal of this is here. But it seems that the SLA encapsulates all of the "output" related service metrics, while being agnostic to the actual implementation. Seems elegant, if you ask me.

Monday, September 14, 2009

An Ideal Datacenter-in-a-Box, Part II

Last week I posted a Blog outlining Dell & Egenera's latest Datacenter-in-a-Box offering. More than one person took note of how I compared its simplicity in contrast to other offerings in the same space, but failed to detail the specifics of Egenera's PAN Manager software and how it mapped to 13 common IT Service Management functions.

The 13 different functions are mapped onto the data center "stack" at right. They span management of both physical and virtual software, servers, I/O, networking, etc. -- as well as higher-level functions such as High-Availability and Disaster Recovery.

The Dell PAN offering unifies 12 of the 13 functions, and provides them from within a single console called PAN Manager. (The 13th function is provided via the Dell Management Console.) This single-console infrastructure management software consists of the base PAN Builder software, as well as two optional modules: PAN Server Portability and PAN Portability.

So, using the diagram from last week, the functionality maps as follows:

PAN Builder:
  • VM server management
  • Physical server management
  • Software (P & V) provisioning
  • I/O virtualization & management
  • IP load balancing
  • Network virtualization & management
  • Storage connection management
  • Infrastructure provisioning
  • Device (e.g. switch & load balancing) failover
PAN Server Portability:
  • Physical N+1 failover (HA)
  • Virtual host N+1 failover (HA)
PAN Portability:
  • Disaster recovery (DR) for entire mixed P & V environments
I hope this helps detail not only the Egenera product, but also illustrates what's possible when the industry combines server management with virtual I/O and virtual networking & switching. It's the perfect complement to O/S virtualization, and massively simplifies traditional IT Operations Management.
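For readers who prefer to see the mapping as data, here is the same list restated as a small Python dictionary. This is purely an illustrative restatement of the bullets above, not an Egenera API or configuration format; the names `PAN_MODULES` and `module_for` are made up for this sketch.

```python
# Module-to-function mapping, copied from the list above.
PAN_MODULES = {
    "PAN Builder": [
        "VM server management",
        "Physical server management",
        "Software (P & V) provisioning",
        "I/O virtualization & management",
        "IP load balancing",
        "Network virtualization & management",
        "Storage connection management",
        "Infrastructure provisioning",
        "Device (e.g. switch & load balancing) failover",
    ],
    "PAN Server Portability": [
        "Physical N+1 failover (HA)",
        "Virtual host N+1 failover (HA)",
    ],
    "PAN Portability": [
        "Disaster recovery (DR) for entire mixed P & V environments",
    ],
}


def module_for(function_name):
    """Return the module that provides a function, or None if it isn't listed."""
    for module, functions in PAN_MODULES.items():
        if function_name in functions:
            return module
    return None


total_functions = sum(len(functions) for functions in PAN_MODULES.values())
print(f"{total_functions} functions across {len(PAN_MODULES)} modules")
print(module_for("IP load balancing"))
```

Counting the entries is a quick sanity check that the three modules together cover the 12 functions handled from the single console.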

Thursday, September 10, 2009

An Ideal Datacenter-in-a-Box

Today's announcement marks a modest but meaningful step in Egenera's relationship with Dell, and in overall Simplification of IT.

Essentially the punchline is this: We've taken the most commonly-purchased hardware configuration and management tools used by mission-critical IT Ops, and integrated them into a single product with a single GUI that you can install and use in ~ 1 day. That's essentially the idea behind the "Datacenter-in-a-Box:"
Most common configuration: Blades + Networking + SAN Storage

Most useful tools to manage VMs + physical servers + network + I/O + SW provisioning + workload automation + high availability
That's what Egenera's done with Dell. It's a "unified computing" environment (to borrow a term) - but has integrated with it all of the most popular higher-level management functions too. That's to say it includes I/O virtualization, a converged network fabric (including virtual switches and load balancing - based on std. ethernet), and then includes tools for software provisioning, VM management, and high-availability to "universally" manage both physical and virtual workloads simultaneously. Pretty cool - and highly simple to use.

Don't believe all this stuff can be so simple? Here's evidence & illustrations why this move will help drive data center management toward greater simplification:
  • (1) Check out how easy it is to provision a complete compute environment with N+1 failover in 6 steps
  • (2) Compare the level of complexity reduction compared to some similar products
  • (3) The Dell PAN Datacenter-in-a-Box (DCIB), together with the Dell Management Console, provides a massively simplified management landscape as compared with alternative solutions. To wit:
The set of "traditional" products you'd need to buy/integrate.
ALL of these functions are already integrated within the Dell PAN DCIB:




Then, the roughly-equivalent solution you'd compose with HP:






And finally, the roughly-equivalent solution you'd compose with Cisco and their partners:




I'd also be remiss not to point out that this product SKU configuration is available directly from our friends at Dell, and was born directly from customer requests for such a building-block. Folks who've already purchased this technology based on the Dell PAN System include
  • Federal users who may replicate an entire mission-critical environment across dozens of aviation-related locations
  • Financial-services users who wanted a consolidated approach to ensuring high-availability across dozens of blades w/different workloads
  • Commercial customers wanting a flexible environment on which to run the company's SAP
  • A Federal hosted services provider wanting five-9's of availability plus being able to re-configure systems/capacity a la an "internal" cloud
  • Overseas users acting as an internal IT service provider seeking 'universal' HA and DR for all workloads
Plus thousands more worldwide locations where you can find the same PAN Manager software.
If you don't believe Dell hardware is ready for the Data Center, then think again.