Tuesday, September 29, 2009

A real-world cloud user shares his findings

I subscribe to a number of mailing lists from my alma mater. A few weeks ago, an alum "John" posted a request for recommendations for a cloud computing vendor for his small investment firm. What follows is the email he sent back to the group after receiving responses.

This is an incredibly illustrative peek inside the "real world" of cloud computing: what prospective SMB users are looking for and concerned about, as well as what's "Good Enough". I've not edited anything.



I had many requests to share our findings so I figured I would share with the group. I appreciate all of the input I received. It has been really helpful.

~ John


Having looked into cloud computing solutions for our small investment firm over the past few months, we have learned a lot about the growing movement towards remote data storage and accessibility. Our goal has been to find a cost-effective solution for our IT needs that would make it convenient for employees of our company to access our shared network (documents and emails) all over the globe without much hassle, difficulty, or expense. While the cloud computing landscape is still relatively new, what is already available is exciting. Google, Microsoft, and other companies have products available such as Google Apps and Microsoft Office Live, but none has fully come to the point of being able to handle our business needs. We are currently in the process of setting up a Google Apps trial period, through a consultant, to try out business e-mail and calendar via Google’s Gmail and Google Calendar. We will do this test while retaining our current Microsoft Exchange server.

There have been many issues to consider as we have been speaking with various consultants and researching all of the available alternatives. First, since we are an SEC-registered investment adviser with lots of confidential and sensitive information on our hands, issues regarding the security of our electronic files – both in terms of disaster recovery as well as the integrity of the company we are entrusting to house our data – are paramount. This also ties in with the issue of record retention, which is equally important to us. In terms of data storage and backup, our current system is not ideal. We need to retain copies of all e-mails and files for at least seven years, if not more, and this information needs to be secure and easily accessible. There seems to be some progress in this area (Google Postini and Amazon S3, for example), but as of yet there is no one system that can do all of these things in the way we’d require.

Second, we are not pleased with our current remote network access; we would like an easy and inexpensive way to access email and our network drive from any computer with Internet access. We have discovered that while web-based, unlimited e-mail and calendar storage are currently available from multiple providers, a solution for mass file storage that would essentially replicate our shared network drive and allow large files for multiple software applications to be stored/backed up in the cloud does not yet exist at an attractive price. In particular, we would like a system where we could modify docs in the cloud without having to download and then upload/re-save the file each time it needs to be edited.

One interesting product we discovered during our search is called Dropbox. You download Dropbox to one computer, save any type of file you would like to a “Dropbox drive”, and it syncs up automatically with the Web. Then, when you are at home or traveling, you can access those docs through a web browser... or you can download Dropbox onto another computer anywhere and edit the docs directly in Dropbox. The only glitch is that Dropbox does not yet have the file storage capacity for a company with over 200GB of data to store, and seems to be geared more toward individual users. Word on the street is that Google will be coming out with a new product soon that has similar features to Dropbox, but on a much larger scale that would be useful for businesses.

In terms of cost and ease, Google Apps seems to be the best solution for us right now (it comes out to around $50/user/year), at least for the e-mail and archiving component. Microsoft’s upcoming 2010 Web Apps platform seems appealing as well, particularly because we might be able to edit complex Excel documents directly in the cloud from anywhere.

Bottom line, what we have learned is that this rapidly-developing option for IT is not yet 100% ready to cover all of the bases for our business, but it will probably get there sometime in the next year or two. For the time being, we are going to see how the e-mail works and go from there.

Tuesday, September 22, 2009

Alternative Recommendation for DCeP "Service Productivity"

Back in February of this year, The Green Grid published a paper listing proposed Proxy measures for data center productivity, specifically Data Center Energy Productivity (DCeP).

This paper followed a much earlier output from the group in 2007 - the one which helped define the now much-used PUE and DCiE metrics, which I wrote about back then. Those metrics were (and are) fine if what you care about is the "basic" efficiency of a data center -- simply how much power is getting to your servers relative to all of the other power being consumed by infrastructure systems (e.g. lighting, power distribution, cooling, etc.). But the shortcoming is that they don't quantify the "useful output" of a datacenter vs. its power input. So, for example, you could have a fantastic PUE... but with a datacenter full of idle servers.
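
(For reference, here's a quick sketch of how those two facility-level metrics relate. The wattages below are made up purely for illustration.)

    # PUE and DCiE relate total facility power to the power that actually
    # reaches the IT equipment. Sample numbers are invented for illustration.
    it_equipment_kw = 500.0      # servers, storage, network gear
    total_facility_kw = 850.0    # IT load plus cooling, power distribution, lighting, etc.

    pue = total_facility_kw / it_equipment_kw           # lower is better; 1.0 is the ideal
    dcie = (it_equipment_kw / total_facility_kw) * 100  # DCiE is the reciprocal, as a percentage

    print("PUE  = %.2f" % pue)     # 1.70
    print("DCiE = %.1f%%" % dcie)  # 58.8%
    # Note: a great PUE says nothing about whether those 500 kW of servers are
    # doing useful work - which is exactly the gap the DCeP proxies try to fill.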

Again, enter The Green Grid to take the analysis to the next level. The excellent paper published in February details 8 "proxy" approaches (i.e. not necessarily metrics) that could be used by data center operators to begin to baseline efficiencies based on "useful output". The Green Grid also set up a survey where they have been soliciting feedback from users regarding the appropriateness, usefulness, etc. of these proxies.

Why 8 approaches? Because not everyone agrees on what "useful work output" of a datacenter really is. Should it be Bits-per-kWh (proxy #4)? Weighted CPU utilization (proxies #5 & #6)? Compute units delivered per second (proxy #7)? Each has its pros and cons. Fortunately, the Green Grid recognized that nothing's perfect. Says the paper: "...The goal is to find a proxy that will substitute for a difficult measurement, but that still gives a good-enough indication of useful work completed."
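
As a rough illustration of what a "useful work per energy" proxy boils down to, here's my own simplified sketch. It is in the spirit of the bits-per-kWh and weighted-task proxies, not The Green Grid's exact formulation, and the task counts, weights, and energy figure are invented:

    # A simplified "useful work per energy" calculation over an assessment window.
    tasks_completed = {
        "web_requests_served": 42000000,   # units of work completed in the window
        "batch_jobs_finished": 1200,
    }
    weights = {
        "web_requests_served": 1.0,        # operator-assigned relative value per task
        "batch_jobs_finished": 5000.0,
    }
    energy_kwh = 12500.0                   # total datacenter energy over the same window

    useful_work = sum(count * weights[name] for name, count in tasks_completed.items())
    print("Useful work per kWh: %.1f" % (useful_work / energy_kwh))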

In addition, the Data Center Knowledge blog pointed out:
The new goal is to develop a simple indicator, or proxy, rather than a full metric. The Green Grid compares the proxy to EPA mileage ratings for new cars, which provide useful data on energy efficiency, with the caveat that “your mileage may vary.” The proposals “do not explicitly address all data center devices and thus fall short of a complete overall measure of data center productivity,” the group says.
To this end, the issue was also dealt with extremely eloquently in Steve Chambers' recent ViewYonder perspective on datacenter efficiency, which has the right idea: why not base efficiency on the service provided (as opposed to the CPUs themselves, or some abstract mathematical element)? This approach is very similar to what I proposed in February of last year: Measuring "useful work" of a Datacenter.

In short, the proposal is to compare the data center Service's SLAs with the power the overall datacenter consumes.

Why use the "SLA" (Service Level Agreement)? Two reasons. (1) The SLA is already part of the vernacular that datacenter operators use. It's easily understood, and frequently well-documented. (2) The SLA encapsulates many "behind-the-scenes" factors that contribute to energy consumption. Take this example: not all 1,000-seat email services are created equal. One may be within a Tier-I data center with a relatively low response-rate requirement, allowing users only 500MB of storage per mailbox. Another enterprise with the same email application may be operating in a Tier-III datacenter environment with a rigorously-controlled response rate, a full disaster-recovery requirement, and 2GB of storage per mailbox. These two SLA examples are quite different and will therefore consume different amounts of power. But wouldn't you rather compare apples-to-apples, to see whether your particular instantiation of these 1,000 mailboxes was more efficient than another enterprise's with the same SLA?

How would such a proxy/measurement be accomplished? The approach is somewhat analogous to the Green Grid's proxy #1 ("Self-assessment reporting"), coupled with peer-reporting/comparison of data as is done with the DOE's DC-Pro tool.

Thus, data centers would:
1) quantify the number of Services and SLAs for each,
2) measure overall power consumed,
3) upload these numbers to a public (but anonymized) database.
After a while, there would be statistically-significant comparisons to be made -- say a "best practice" energy efficiency range for a given Tier-III email application with 2GB storage and disaster-recovery option.
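
To make this concrete, here's a minimal sketch of what the self-assessment and peer comparison could look like, using the two email-service SLAs from the example above. The record fields, grouping key, and power numbers are all my own invention, purely for illustration:

    from collections import defaultdict

    # Each operator submits an anonymized record describing a service, its SLA
    # attributes, and the power attributed to it.
    submissions = [
        {"service": "email", "tier": "III", "mailboxes": 1000,
         "storage_gb": 2.0, "dr": True, "avg_kw": 38.0},
        {"service": "email", "tier": "III", "mailboxes": 1000,
         "storage_gb": 2.0, "dr": True, "avg_kw": 52.0},
        {"service": "email", "tier": "I", "mailboxes": 1000,
         "storage_gb": 0.5, "dr": False, "avg_kw": 21.0},
    ]

    # Group by the SLA "signature" so that comparisons are apples-to-apples.
    groups = defaultdict(list)
    for s in submissions:
        key = (s["service"], s["tier"], s["storage_gb"], s["dr"])
        groups[key].append(s["avg_kw"] / s["mailboxes"])   # kW per mailbox

    for key, kw_per_mailbox in groups.items():
        print(key, "best-practice kW/mailbox: %.3f" % min(kw_per_mailbox))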

I'm open to other suggestions of how to pragmatically apply application SLAs vs Watts to gauge overall datacenter energy efficiency - again, my earlier proposal of this is here. But it seems that the SLA encapsulates all of the "output" related service metrics, while being agnostic to the actual implementation. Seems elegant, if you ask me.

Monday, September 14, 2009

An Ideal Datacenter-in-a-Box, Part II

Last week I posted a Blog outlining Dell & Egenera's latest Datacenter-in-a-Box offering. More than one person took note of how I compared its simplicity to other offerings in the same space, but also that I failed to detail the specifics of Egenera's PAN Manager software and how it maps to 13 common IT Service Management functions.

The 13 different functions are mapped onto the data center "stack" at right. They span management of both physical and virtual software, servers, I/O, networking, etc. -- as well as higher-level functions such as High-Availability and Disaster Recovery.

The Dell PAN offering unifies 12 of the 13 functions and provides them from within a single console called PAN Manager. (The 13th function is provided via the Dell Management Console.) This single-console infrastructure management software consists of the base PAN Builder software, as well as two optional modules: PAN Server Portability and PAN Portability.

So, using the diagram from last week, the functionality maps as follows:

PAN Builder:
  • VM server management
  • Physical server management
  • Software (P & V) provisioning
  • I/O virtualization & management
  • IP load balancing
  • Network virtualization & management
  • Storage connection management
  • Infrastructure provisioning
  • Device (e.g. switch & load balancing) failover
PAN Server Portability:
  • Physical N+1 failover (HA)
  • Virtual host N+1 failover (HA)
PAN Portability:
  • Disaster recovery (DR) for entire mixed P & V environments
I hope this not only helps detail the Egenera product, but also illustrates what's possible when the industry combines server management with virtual I/O and virtual networking & switching. It's the perfect complement to O/S virtualization, and massively simplifies traditional IT Operations Management.
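
For those who prefer to see this as data rather than bullets, the same module-to-function mapping can be jotted down as a simple structure (this is just my shorthand for the list above, not anything taken from the product itself):

    # Shorthand for the module-to-function mapping listed above.
    pan_modules = {
        "PAN Builder": [
            "VM server management", "Physical server management",
            "Software (P & V) provisioning", "I/O virtualization & management",
            "IP load balancing", "Network virtualization & management",
            "Storage connection management", "Infrastructure provisioning",
            "Device (switch & load balancing) failover",
        ],
        "PAN Server Portability": [
            "Physical N+1 failover (HA)", "Virtual host N+1 failover (HA)",
        ],
        "PAN Portability": [
            "Disaster recovery (DR) for entire mixed P & V environments",
        ],
    }
    print(sum(len(f) for f in pan_modules.values()), "of the 13 functions covered")
    # The 13th function is provided via the Dell Management Console.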

Thursday, September 10, 2009

An Ideal Datacenter-in-a-Box

Today's announcement marks a modest but meaningful step in Egenera's relationship with Dell, and in the overall Simplification of IT.

Essentially, the punchline is this: We've taken the most commonly-purchased hardware configuration and management tools used by mission-critical IT Ops, and integrated them into a single product with a single GUI that you can install and use in ~1 day. That's the idea behind the "Datacenter-in-a-Box":
  • Most common configuration: Blades + Networking + SAN Storage
  • Most useful tools: to manage VMs + physical servers + network + I/O + SW provisioning + workload automation + high availability
That's what Egenera's done with Dell. It's a "unified computing" environment (to borrow a term) - but with all of the most popular higher-level management functions integrated as well. That is to say, it includes I/O virtualization and a converged network fabric (including virtual switches and load balancing, based on standard Ethernet), plus tools for software provisioning, VM management, and high-availability that "universally" manage both physical and virtual workloads simultaneously. Pretty cool - and remarkably simple to use.

Don't believe all this stuff can be so simple? Here's evidence, and some illustrations of why this move will help drive data center management toward greater simplification:
  • (1) Check out how easy it is to provision a complete compute environment with N+1 failover in 6 steps
  • (2) Compare the level of complexity reduction compared to some similar products
  • (3) The Dell PAN Datacenter-in-a-Box (DCIB), together with the Dell Management Console, provides a massively simplified management landscape as compared with alternative solutions. To wit:
The set of "traditional" products you'd need to buy/integrate.
ALL of these functions are already integrated within the Dell PAN DCIB:




Then, the roughly-equivalent solution you'd compose with HP:






And finally, the roughly-equivalent solution you'd compose with Cisco and their partners:




I'd also be remiss if I didn't point out that this product SKU configuration is available directly from our friends at Dell - and was born directly from customer requests for such a building block. Folks who've already purchased this technology based on the Dell PAN System include:
  • Federal users who may replicate an entire mission-critical environment across dozens of aviation-related locations
  • Financial-services users who wanted a consolidated approach to ensuring high-availability across dozens of blades w/different workloads
  • Commercial customers wanting a flexible environment on which to run the company's SAP
  • A Federal hosted services provider wanting five-9's of availability plus being able to re-configure systems/capacity a la an "internal" cloud
  • Overseas users acting as an internal IT service provider seeking 'universal' HA and DR for all workloads
Plus, there are thousands more locations worldwide where you can find the same PAN Manager software.
If you don't believe Dell hardware is ready for the Data Center, then think again.

Monday, September 7, 2009

Where the Server Industry Went Amiss

I've been doing an analysis regarding how "complexity" has evolved in the datacenter. Fundamentally, just why is it so hard to configure & provision new (physical) servers? Why is clustering inherently so complex? Why do we have data networks, storage networks, and management networks (all distinct, I might add)? How come we have all of these layered management systems?

OS virtualization has massively reduced complexity at the software level by abstracting away machine-level CPU commands, and has even contributed to simplifying networking between virtual machines. But we're still left with complexity at the physical I/O, networking, and control levels - the other physical piece-parts (KVM ports, NICs, HBAs, etc.).

Ultimately, all of this complexity gradually resulted from incremental server hardware evolution… the motherboards, to be exact. Way back when the computer industry was just getting started, motherboards harbored a simple CPU and rudimentary I/O (e.g. an audio jack to a cassette tape for storage...). But as processors got more sophisticated and datacenter environments grew, CPUs were integrated with more complex I/O (e.g. Network Interface Cards) as well as with storage connectivity (e.g. Host Bus Adaptors). Plus, there was usually a local disk, of course.

This meant that the server retained static data, specifically things like I/O addressing and storage connectivity naming, not to mention data on the local disk – resulting in the server having a static “state”. Usually the local network had state too – the IP and MAC addresses of the motherboard were attached to switches and LANs in a particular way. Add to this the fact that with critical applications, all of these components (plus naming/addressing) were frequently duplicated for redundancy.

This meant that if you had to replace (or clone) a physical server, say because of a failure, you had to re-configure all of these addresses, names, storage connections and networks – and sometimes in duplicate. This resulted in lots of things to administer, and lots of room for error. Frankly, this is probably where fundamental “data center complexity” arose.

To deal with failures and this complexity, vendors developed special-purpose clustering and failover software – necessarily closely-coupled to specific software and hardware – to provide the re-assignment of state to the new hardware and networking. This software often required hand-crafted integration and frequent testing to ensure that all of the addressing, I/O, and connectivity operations worked properly. And many of these special-purpose systems are still in use today.

Similarly, there are equally complicated software packages for scale-out and grid computing that perform similar operations – not to correct failures, but to “clone” hardware in order to scale out systems for parallel computing, databases, etc. These systems are equally complex and usually application-specific, again having to deal with replicating stateful computing resources.

So the industry, in an effort to add “smarts” and sophistication to the server – to enable it to fail-over or to scale – has instead created complexity and inflexibility for itself. Had the industry instead defined I/O, networks and addressing logically, then the way we assign/allocate servers would have been forever simplified and streamlined.
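
To make the "logical definition" idea concrete, here's a minimal sketch of what it implies: a server's identity (addresses, storage naming, network attachment) kept as data that can be re-applied to any spare piece of hardware. The field names and values are my own invention for illustration, not any vendor's actual API:

    # A "logical server" is just a bundle of identity and connectivity state.
    # If that state lives outside the motherboard, failover and cloning become
    # a re-assignment of data rather than a manual re-configuration project.
    logical_server = {
        "name": "trading-db-01",
        "mac_addresses": ["02:00:00:aa:bb:01", "02:00:00:aa:bb:02"],  # redundant NICs
        "fc_wwpns": ["50:01:43:80:11:22:33:44"],                      # storage naming (HBA)
        "boot_lun": "array-7:lun-12",
        "vlans": [101, 204],
        "ip": "10.20.30.41",
    }

    spare_hardware_pool = ["blade-07", "blade-09"]

    def assign(profile, pool):
        """Apply a logical server profile to the next available physical blade."""
        blade = pool.pop(0)
        # In a real system this step would program the I/O and network fabric;
        # here we just record the association.
        return {"blade": blade, "profile": profile}

    # On failure (or for cloning), the same profile simply moves to new hardware.
    placement = assign(logical_server, spare_hardware_pool)
    print("%s now runs on %s" % (placement["profile"]["name"], placement["blade"]))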

Fortunately, some technologies are being applied to somewhat revert/simplify:
  • I/O virtualization appliances which logically consolidate all I/O into one reconfigurable physical card (e.g. Xsigo)
  • Infrastructure virtualization software which logically defines all I/O, networking and switching so that any CPU-I/O-Network config. can be defined to take the place of any other CPU-I/O-Network config. (e.g. Egenera, Cisco UCS and to some degree HP's VirtualConnect)
  • CPU pooling hardware/software which replace traditional I/O to make multiple physical servers act as large multi-core CPUs (e.g. 3leaf)
Unfortunately, the industry's own momentum sustains this level of complexity - most players continue to develop software products to handle/abstract the growing complication. Nor is it in the interest of board designers & silicon manufacturers to _reduce_ the number of chips & cards associated with servers. So we may not see a significant architectural change in stateful processing units until the industry gradually accepts that there is an alternative to all of this madness.