Monday, February 23, 2009

Correcting computing's wrongs - road to recovery?

As brilliant as the first microcomputer architects were, there were some early design principles that, as the law of unintended consequences outlines, have seriously hamstrung enterprise computing for years. But the industry is about to get out from under them in a big way.

We're about to hear lots about "Infrastructure Orchestration", by virtue of Cisco's anticipated entry into the blade market with their "unified computing" strategy. The principle has been known as that of a "computing fabric," first conceived by Vern Brownell, the then-CTO of Goldman Sachs, and later productized by Egenera.

Fundamentally, the concept abstracts-away a server's I/O, disk, storage connectivity, and out-of-band controls, making it a stateless entity. The result is a server with considerably more flexibility (e.g. ability to be re-purposed) and a significant simplification in how groups of these servers are managed.
Just wait till this catches on.

A bit of history: How did we get here?
In the early era of PCs, a number of new technologies arose: in particular, the IP network, which allowed the CPU to talk to others, and external/networked storage, which externalized (or removed) the dedicated hard drive. Both of these technologies instantly resulted in additional hardware on the motherboard: the Network Interface Card (NIC) for connection to Ethernet, etc., and the Host Bus Adapter (HBA) for connection to storage. Later on there was another bit of hardware, the on-board controller, which helped monitor/control "out-of-band" aspects of the CPU like power, temperature, and performance; this also had its own equivalent of a NIC. These pieces of hardware were sometimes incorporated into the motherboard itself, and sometimes were additional plug-ins.

But each new technology came at an (unwitting) price: each became tightly bound to the hardware and software. Each had a software driver, usually tied to the O/S. And each usually had its own form of addressing -- IP and MAC address for the NIC, and usually the World Wide Name (WWN) for the HBA. Often, the NIC and the controller were actually part of the motherboard itself.

The result: Servers, their O/S, and sometimes even applications, were tightly tied to their I/O. Changes to the network or storage meant changing I/O configurations. Changes to the server meant re-defining addresses as well. Every time a physical server had to be configured (or re-configured), the NIC, the HBA and even the controller's IP address had to be configured too. (And, if the server was on a separate network, external switches had to be configured as well.)

This all made for an operations nightmare. The Application owners had to work with the O/S owners, who in turn needed a process to work with Storage and Networking groups. No wonder operational spending is rising.

An alternative model.
Vern Brownell (and others) recognized the source of this complexity and asked whether the compute (CPU, memory, etc.) could be completely dissociated or abstracted from the I/O.

In essence, the compute resource would be a stateless resource -- agnostic to the SW it ran, and agnostic to what I/O it was connected to. The I/O would be "virtualized" into a logical (rather than physical) connection... which meant that addressing and naming could be provisioned/changed in software.
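To make that a little more concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions, not how PAN Manager or any vendor's product is implemented: the names `LogicalIdentity`, `Blade`, `provision` and `deprovision` are hypothetical, and the addresses are made up. The point is simply that the MAC, WWN and IP live in a software record that can be attached to, or detached from, any stateless blade.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogicalIdentity:
    """Hypothetical record: I/O identity kept in software, not in hardware."""
    name: str
    mac: str   # address traditionally tied to a physical NIC
    wwn: str   # address traditionally tied to a physical HBA
    ip: str


@dataclass
class Blade:
    """Hypothetical stateless compute node: CPU and memory, no identity."""
    slot: int
    cores: int
    memory_gb: int
    identity: Optional[LogicalIdentity] = None


def provision(blade: Blade, identity: LogicalIdentity) -> None:
    """Attach a logical identity to a blade that currently has none."""
    if blade.identity is not None:
        raise ValueError(f"blade in slot {blade.slot} is already provisioned")
    blade.identity = identity


def deprovision(blade: Blade) -> LogicalIdentity:
    """Detach and return the identity, leaving the blade stateless again."""
    if blade.identity is None:
        raise ValueError(f"blade in slot {blade.slot} has no identity")
    identity, blade.identity = blade.identity, None
    return identity


# Example: move the "web01" identity from one blade to another in software.
web01 = LogicalIdentity("web01", "02:00:00:00:00:01",
                        "50:00:00:00:00:00:00:01", "10.0.0.11")
blade_a = Blade(slot=1, cores=8, memory_gb=32)
blade_b = Blade(slot=2, cores=8, memory_gb=32)
provision(blade_a, web01)
provision(blade_b, deprovision(blade_a))
```

Nothing about the blades themselves changed in that last step; only the software record moved.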

Further, the physical I/O and network could be collapsed/unified. A single wire could carry all signals, and a set of switches could create custom (or private) connections between servers, or from servers out to an external network and storage. Hence the term "computing fabric" began.

This concept was initially productized around 2001 by Egenera, in the form of their BladeFrame hardware and PAN Manager software (short for Processing Area Network), and recently expanded to Dell hardware as well. The analogy to a SAN was clear: An abstracted, centrally-managed set of CPUs rather than an abstracted set of Disks. In the way that LUNs are mapped to physical drives, logical nodes would be mapped to physical (or virtual) CPUs.

Properties of the "compute fabric" a.k.a. Infrastructure Orchestration (a.k.a. unified computing)
Once a set of servers is part of this compute fabric, a number of very elegant properties arise. Chiefly, any CPU can be re-purposed to handle just about any workload (assuming the CPU is compatible, and memory is sufficient). Issues having to do with I/O, storage connectivity, etc. evaporate.

So, for example, if a server running a native O/S were to fail, another "bare metal" server could be instantly re-assigned all of the properties of the original server, connected to the failed machine's network, and then connected to the failed machine's shared storage. Presto - instant High Availability (HA).
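The failover step can be sketched in a few lines of Python. This is a toy model under my own assumptions (the `Server` class, the `fail_over` function and the profile keys are hypothetical, not a real product API): a server's properties are a plain dictionary, and HA amounts to handing that dictionary to the first healthy bare-metal spare.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Server:
    """Hypothetical server: a slot plus an optional software-defined profile."""
    slot: int
    healthy: bool = True
    profile: Optional[Dict[str, str]] = None  # None means bare metal


def fail_over(failed: Server, pool: List[Server]) -> Server:
    """Move the failed server's profile to the first healthy bare-metal spare."""
    if failed.profile is None:
        raise ValueError("failed server has no profile to move")
    for candidate in pool:
        if candidate is not failed and candidate.healthy and candidate.profile is None:
            candidate.profile = failed.profile  # network + storage identity follow
            failed.profile = None
            return candidate
    raise RuntimeError("no spare bare-metal server available")


# Example: slot 1 dies, slot 3 is the first healthy spare and takes over.
pool = [
    Server(1, profile={"name": "db01", "vlan": "10", "lun": "lun-7"}),
    Server(2, healthy=False),
    Server(3),
]
pool[0].healthy = False
replacement = fail_over(pool[0], pool)
```

In a real fabric the hard part is the switching and storage plumbing underneath; the sketch only shows why the reassignment itself is cheap once the identity is not tied to hardware.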

Next, extend this example to a bunch of servers (and networks and switches). Should they all fail, such as in a disaster, the entire configuration, down to each server's I/O, networks, VLANs, etc., can be re-created in a separate location on "cold" bare (unprovisioned) hardware. Presto - instant Disaster Recovery (DR). All this assumes mirrored SAN storage, of course.
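Here is a similarly hedged Python sketch of the DR case. The `snapshot` and `restore` functions and the configuration layout are assumptions of mine for illustration only: the whole site's logical configuration is saved as JSON, then replayed onto cold slots at a second location. The data itself is assumed to arrive via the mirrored SAN and is not modeled here.

```python
import json
from typing import Dict, List


def snapshot(site: Dict) -> str:
    """Serialize a site's logical configuration (servers, VLANs) to JSON."""
    return json.dumps(site, sort_keys=True)


def restore(snapshot_json: str, cold_slots: List[int]) -> Dict[int, Dict]:
    """Map each saved server profile onto an unprovisioned slot, in order."""
    config = json.loads(snapshot_json)
    servers = config["servers"]
    if len(servers) > len(cold_slots):
        raise RuntimeError("not enough cold hardware at the recovery site")
    return {slot: profile for slot, profile in zip(cold_slots, servers)}


# Example: two servers at the primary site, three cold slots at the DR site.
primary = {
    "vlans": [10, 20],
    "servers": [
        {"name": "db01", "vlan": 10, "lun": "lun-7"},
        {"name": "app01", "vlan": 20, "lun": "lun-9"},
    ],
}
placement = restore(snapshot(primary), cold_slots=[7, 8, 9])
```

Slot 9 stays bare because there were only two profiles to place.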

So what, you might still say? Well, consider if the "native O/S" in the example above was really a VMware ESX server host. That means that an entire host configuration (down to the VMs) could be re-created elsewhere without having to re-provision the hosts themselves onto a physical piece of hardware. Neat, especially if you find yourself having to first duplicate hosts, hardware configurations and networks for your virtual failover sites. Not very "virtual," are they?

Now, finally, consider a mixed environment -- with native O/Ss as well as VM hosts (e.g. an SAP installation where some servers are virtual, but with native DBs as well). Complete HA and DR could be provided to the entire environment. At once. Cool.

Where we're headed

So if you think about it, if the original CPU motherboards and servers *hadn't* been equipped with stateful peripherals like NICs and HBAs, much of the complexity we deal with in data centers would be obviated. Instead, we would take for granted the fact that just about any workload could run just about anywhere, with the assurance that any other hardware could pick up if the original failed. We would have "virtual hardware" the same way we have virtual software.

And there's the point: that fabric computing - infrastructure orchestration, unified computing - is actually the ideal complement to any virtual (or physical) infrastructure.

No wonder we'll see and hear more about this in the near future. Hardware vendors (Egenera, HP, IBM, Dell) are already doing it, and Cisco is about to. And what of VMware or Citrix?

Friday, February 13, 2009

"California" is deja vu all over again

Just in case you want a "preview" of what Cisco is expected to announce on March 16th, you might want to compare their unified computing model to Egenera's Infrastructure Orchestration technology, which it has been shipping for about 7 years. ServerWatch reported today some additional details about Cisco's "Project California." It all sounds pretty similar to what Vern Brownell conceived of back in 2001.

This reminds me of the famous spoof of Bill Gates' announcement of Vista, as well as a more highly-polished roasting by Apple of Vista's well-publicized, but trailing, technology. So, to pay homage to radical new innovation based on things that have been in the market for some time, permit me to highlight some historic factoids from the article by Andy Patrizio:
"According to a source familiar with the products, the blades will be based on Intel's Core i7 processors and come with up to 192GB of memory, well above the maximum capacity of 128GB in today's blades. Intel recently announced it would begin shipping Core i7 Xeon processors, codenamed Nehalem-EP, as part of its Xeon 5000 series."
Truth be told, Egenera's own BladeFrame hardware already supports 128GB of memory, on 6-core, 4-way boards. And our 192GB/Nehalem is coming soon too. A customer of ours has already indicated that, in experiments, they have over 150 VMs running on a single blade in the chassis.
"The blades include a PCI-Express connection, allowing them to connect to Cisco's high-speed Unified Fabric architecture. These connections also give the blades very fast Ethernet access to both the network and storage devices and eliminate the need for a storage-area network (SAN). Instead, the blades would talk directly to the storage servers."
Similarly, Egenera's BladeFrame Fabric architecture inherently eliminated the need for NICs and HBAs, and permitted unified/consolidated I/O to travel between blades at 2.5GB, or out to data center switches and storage. By abstracting away the I/O, it allows our management software to instantly provision any number/type of I/O onto bare metal.
"The blade servers are believed to come with Cisco's Nexus 5000 switches embedded in the chassis, which support the Unified Fabric and is built to be virtualization-ready. The servers will also feature tight integration with and support for VMware software."
As above, a switching fabric is already built into the Egenera system. And for years, Egenera blades have been available with vmBuilder, a module which embeds VMs within the system. In that way, administrators have the option of provisioning a full physical blade, or dicing it up into many virtual blades.
"This would put computing and networking power all in a single box. 'It's more of making the computer part of the network, thus Unified Computing,' said the source... The term 'Unified Computing' was first floated by Cisco CTO Padmasree Warrior in a January blog post, where she described it as 'the advancement toward the next generation data center that links all resources together in a common architecture to reduce the barrier to entry for data center virtualization'. In other words, the compute and storage platform is architecturally 'unified' with the network and the virtualization platform...."
"Computing and networking power in a single box"? Again, that sounds a lot like the Egenera BladeFrame + PAN (Processing Area Network) Manager software, or like the Dell PAN System. Take a look at these demos.

Don't want to buy a high-performance Egenera BladeFrame? Well, you can also consider the Dell PAN System, which takes all of these "unified computing" Infrastructure Orchestration features, and runs them on Dell hardware, too.

Join me at NYC's cloud computing expo

Yours truly will be one of the speakers at the Cloud Computing Conference & Expo in NY, March 30-April 1, 2009.

They have a very awesome agenda, including a keynote from Werner Vogels, Amazon's CTO. I also believe William Fellows (Principal Analyst) of The 451 Group, with whom I've been in contact for some time, will be speaking as well. Also, David Bernstein (VP/GM, Cloud Computing) in Cisco's Office of the CTO has a session, which ought to be very timely and engaging.

Wednesday, February 11, 2009

More about IBM, Cisco, Juniper, and Clouds

This is shaping-up to be an interesting week for some of the big IT players, and their intentions to build-out their cloud strategies. But within these announcements, there are also some fascinating implications that aren't making the headlines... yet.

First-off, IBM and Juniper made an interesting joint announcement, replete with demo. Very nice coverage from ZDNet, TechCrunchIT and InfoWorld - which, BTW, has a great description of their demo.

But note that in all of the IBM/Juniper coverage, there is little-to-no mention of the word "virtualization". That's very telling. It means that much of what will make clouds (and Cloud "overflow" as they put it) work is part of the network, I/O and compute management infrastructure; i.e., it's *not* just about the VM/hypervisor. Which brings me to observation #2:

Last month, Cisco's CTO Padmasree Warrior published a widely viewed blog post about their upcoming product/strategy around "unified computing". Today, Cisco aggressively followed through with an update to that post, in the form of yet another blog/video featuring Ms. Warrior. In it, she elaborated on their "unified computing" vision, including their view on the phases that cloud computing will take. It's very telling re: where Cisco's strategy is likely to be focused:

Phase 1 for instance lays the foundation for data center cost containment through standardization. Core to this foundation is consistently applied network intelligence and virtualization in each area of specialization: local and wide area networking, storage networking and server/application networking.

Phase 2, or ‘Unified Fabric’ – This phase optimizes and extends data center technologies through consolidation of virtualization across the network, storage and servers/applications.

Phase 3, or ‘Unified Computing’ – Unified Computing virtualizes the entire data center through a pre-integrated architecture that brings together network, server and compute virtualization. Moving beyond that…

Phase 4, or ‘Private Clouds’—is a phase that extends the advantages of unified computing into the cloud, bringing enterprise-class security, control and interoperability to today’s stand-alone cloud architectures.

Phase 5, which is the ultimate vision of ‘inter-cloud’ marks our long-term transition with the market, by enabling portable workloads across the cloud. This will drive a new wave of innovation and investment similar to what we last saw with the Internet explosion of the mid-1990s.

Unified computing, as Cisco refers to it, is what we here at Egenera call "infrastructure orchestration" -- essentially it is about abstracting-away the I/O, network, storage and compute elements. In that way, they can become an instantly-configurable "fabric" where resources can be deployed, failed-over, scaled, etc., without having to manage any physical components at all. And all of this is *agnostic* to whether the SW payload is physical or virtual.

All in all, I bet we'll be seeing MUCH more noise in the market about Infrastructure-as-a-Service, Infrastructure Orchestration, and how these foundations will help accelerate creation of "internal" clouds, public clouds, and bridges between these entities.