Wednesday, December 16, 2009

Hosting & Cloud Computing: Numbers Don't Lie

There's lots of chatter in the market today regarding the value of using outside data centers, hosting services and cloud computing. But listening to pundits and analysts try to objectively predict true value left me feeling hollow.

While I'm not an investment professional, I do know that the stock market doesn't lie... so instead, I thought I'd look at a bundle of stocks from publicly-traded companies in the data center space, and compare them against a market benchmark.

I chose companies traded on public markets in both the US and Europe. My criteria were somewhat subjective, but basically the companies had to have a primary business of operating data centers. I also excluded telcos because it is difficult to separate their carrier revenues from their hosting revenues. So, my initial "virtual fund" consists of 12 companies: Digital Realty Trust; DuPont Fabros; Equinix; Internap; Iomart; Macquarie Telecom; Navisite; Rackspace; Savvis; Switch & Data; Telecity; Terremark.


I also took a 5-company subset of these public companies that had significant offerings in the cloud computing space (Equinix; Navisite; Rackspace; Savvis; Terremark). I labeled this "virtual fund" a cloud-only index.

The chart at right is my best attempt to (a) tabulate the historic end-of-month closing price of each stock; (b) calculate month-to-month percentage gains for each; and (c) create "virtual funds" in which $100 is invested equally across each vehicle (initially $8.33 in each of the 12 hosting stocks, and initially $20 in each of the 5 cloud-related stocks). The benchmark I used is the Nasdaq index, also assuming an initial $100 investment.
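For the curious, here's a minimal sketch of the arithmetic behind these "virtual funds". The tickers and prices below are placeholders for illustration, not the actual data behind the chart:

```python
# Sketch of the "virtual fund" arithmetic: $100 split equally across N stocks,
# with each slice growing by that stock's month-over-month price change.
# Tickers and prices are illustrative placeholders, not the actual data.
closes = {
    "STOCK-A": [18.0, 19.5, 21.0],    # end-of-month closes, oldest first
    "STOCK-B": [80.0, 86.0, 92.0],
    "STOCK-C": [14.0, 13.5, 15.2],
}

def virtual_fund(closes, initial=100.0):
    n = len(closes)
    slices = {ticker: initial / n for ticker in closes}   # equal $ in each stock
    months = len(next(iter(closes.values())))
    fund_values = []
    for m in range(1, months):
        for ticker, series in closes.items():
            slices[ticker] *= series[m] / series[m - 1]   # apply monthly gain/loss
        fund_values.append(round(sum(slices.values()), 2))
    return fund_values                                     # fund value after each month

print(virtual_fund(closes))   # compare against the same math applied to the Nasdaq
```

The same calculation applied to the Nasdaq's monthly closes gives the benchmark line.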

Not surprisingly (for me, anyway), both "indexes" are outperforming the Nasdaq -- perhaps proving the thesis that datacenter operation and application outsourcing are indeed a growth market (or at least a speculative growth market?) as compared to the general technology market. What would be equally useful (but not an analysis I've done) is to chart gross revenues for the index companies. That would be a telling barometer of actual business.

I'll continue to update this index at the end of each month. Comments, additions and suggestions welcome!

Tuesday, December 8, 2009

Emergence of Fabric as an IT Management Enabler

Last week I attended Gartner's annual Data Center Conference in Las Vegas. Four days packed with presentations and networking (of the social kind). Lots of talk about cloud computing, IT operations, virtualization and more.

Surprisingly, a number of sessions directly referenced compute fabrics -- including "The Future of Server Platforms" (Andy Butler), "Blade Servers and Fabrics - Evolution or Revolution" (Jeff Hewitt), and "Integrated Infrastructure Strengths and Challenges" (Paquet, Dawson, Haight, Zaffros). All very substantive analyses of what fabrics _are_... but very little discussion of why they're _important_. In fact, compute fabrics might just be the next big thing after OS virtualization.

Think of it this way: Fabric Computing is the componentization and abstraction of infrastructure (such as CPU, Memory, Network and Storage). These components can then be logically re-configured as-needed. This is very much analogous to how OS virtualization componentizes and abstracts OS and application software stacks.

However, the focus of most fabric-related vendors thus far is on the most fundamental level of fabric computing: virtualizing I/O and using a converged network. This is the same initial level of sophistication as when the industry believed that OS virtualization was only about the hypervisor. Rather, we need to take a longer view of fabric computing and think about the higher-level value we can create by manipulating infrastructure much the way we manipulate VMs. A number of heady thinkers supporting the concept of Infrastructure 2.0 are already beginning to crack some of these revolutionary issues.

Enter: Fabric as an Enabler


If we think of "fabric computing" as the abstraction and orchestration of IT components, then there is a logical progression of what gets abstracted, and then of what services can be constructed by logically manipulating the pieces:

1. Virtualizing I/O and converging the transport
This is just the first step, not the destination. Virtualizing I/O means no more stateful NICs and HBAs on the server; rather, the I/O presents itself to the OS as any number of configurable devices/ports, and I/O + data flow over a single physical wire. Transport can be Ethernet, FCoE, Infiniband, or others. In this manner, the network connectivity state of the physical server can be simplified and changed nearly instantaneously.
2. Virtual networking
The next step is to define in software the converged network, its switching, and even network devices such as load balancers. The result is a "wire-once" physical network topology, but with an infinitely reconfigurable logical topology. This permits physically flatter networks. Provisioning of the network, VLANs, IP load balancing, etc. can all be simplified and accomplished via software as well.
3. Unified (or Converged) Computing
Now things get interesting: once we can manipulate the server's I/O state and its network connections, we can couple that with software-based profiles of complete server configurations -- literally defining the server, its I/O, networking, storage connections, and even what software boots on it (that software being either a virtual host or a traditional native OS). Having defined the entire server profile in software, we can even define the entire environment's profile (a minimal sketch of this idea follows the list below).
Defining servers and environments in software allows us to provide (1) High Availability: upon a hardware failure, we can simply re-provision a server configuration to another server in seconds -- whether that server was running a VM host or a native OS. (2) Disaster Recovery: we can re-constitute an environment of server profiles, including all of their networking, ports, addresses, etc., even if that environment hosts a mix of VMs and native OSes.
4. Unified Management
To achieve the ultimate in an agile IT environment, there's one remaining step: to orchestrate the management of infrastructure together with the management of workloads. I think of this as an ideal Infrastructure-as-a-Service -- physical infrastructure that adapts to the needs of workloads, scaling up/out as conditions warrant, and providing workload-agnostic HA and DR. From an IT agility perspective, we would now be able to abstract nearly all components of a modern data center, and logically combine them on-the-fly as business demands require.
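To make step 3 a bit more concrete, here's a minimal, vendor-neutral sketch of what a software-defined server profile (and failover) might look like. The classes and field names are my own illustration, not any particular product's schema:

```python
# Hypothetical, vendor-neutral sketch of a software-defined server profile.
# Field names are illustrative only, not any vendor's actual schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ServerProfile:
    name: str
    virtual_macs: List[str]          # what the OS sees as NICs
    virtual_wwns: List[str]          # what the OS sees as HBAs
    vlans: List[int]                 # logical network connections
    boot_target: str                 # SAN LUN or image this profile boots from
    payload: str = "native-os"       # "native-os" or "hypervisor"

@dataclass
class EnvironmentProfile:
    name: str
    servers: List[ServerProfile] = field(default_factory=list)

def fail_over(profile: ServerProfile, spare_hardware: str) -> str:
    # Because MACs, WWNs, VLANs and boot targets are logical, nothing has to be
    # re-cabled or re-zoned by hand; the profile is simply re-bound to a spare.
    return f"{profile.name} re-provisioned onto {spare_hardware}"

web = ServerProfile("web-01", ["02:00:00:aa:00:01"], ["50:00:00:aa:00:00:00:01"],
                    vlans=[110], boot_target="san-lun-7")
print(fail_over(web, "spare-blade-12"))
```

The point is that the server's identity is just data; the hardware underneath becomes interchangeable.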
Getting back to the Gartner conference, I now realize one very big missing link -- while Gartner has been promoting its Real-Time Infrastructure (RTI) model for some time, it has yet to link that model to the coming revolution that will be enabled by fabric computing. Maybe we'll see some hint of this next year.

Thursday, November 19, 2009

Infrastructure Virtualization: The Next Logical Step

2010 will be an interesting year for virtualization - but not from the perspective you're probably thinking. It will be the year of the virtual infrastructure, not of the virtual machine.
Yes, the O/S virtualization market is maturing as it transforms how servers and applications are managed. The major vendors all offer hypervisors and management to accomplish server consolidation, live migration, HA, lifecycle management, lab management, and more. And they're even offering higher-level tools for DR and cloud computing... Read more on VMBlog.com

Tuesday, October 27, 2009

Infrastructure 2.0 – A Virtual Analogy

Is OS virtualization an end in itself? Is it both necessary and sufficient for all things Cloud and IaaS? Is it the panacea IT Operations has been looking for? From where I see it, abstracting the OS is certainly a great start, but it’s actually only 50% of the goal.

To a degree, OS virtualization is the "shiny metal object" du jour, in that it's captivating everyone's attention. It is of course very valuable, and is causing an important inflection point in datacenter operations and economics. But there is a less-visible, less sexy side to datacenter operations and economics that lies "below" the CPU in the stack...

Read more on the Infrastructure 2.0 Blog

Tuesday, October 6, 2009

Differing Target Uses for IT Automation Types

One of the most oft-repeated themes at this year's VMworld was that of "automation." Everybody claimed they had it, but on closer investigation it had any number of poorly-defined meanings.

A specific angle I want to address here is that of infrastructure automation; that is, the dynamic manipulation of physical resources (virtualized or not) such as I/O, networking, load balancing, and storage connections -- sometimes referred to as "Infrastructure 2.0". Why is this important? Although automation of software (such as provisioning & manipulation of VMs/applications) usually captures attention, remember that there is a whole set of physical datacenter infrastructure layers that IT Ops has to deal with as well. When a new server (physical or virtual) is created, much of this infrastructure also has to be provisioned to support it.

There are 2 fundamental approaches to automation I'll compare/contrast: Let's loosely call them "In-Place" Infrastructure Automation, and Virtualized Infrastructure Automation.

Confession: I am a champion of IT automation. The industry has evolved into a morass of technologies and resulting complexity; the way applications (and datacenters) are constructed today is not the way a greenfield thinker would do it. Datacenters are stove-piped, hand-crafted, tightly-controlled and reasonably delicate. Automating how IT operates is the only way out -- hence the excitement over cloud computing, utility infrastructure, and the "everything-as-a-Service" movement. These technology initiatives are clear indications that IT operations desires a way to "escape" having to manage its mess.

At a high level, automation has major advantages: lower steady-state OpEx, greater capital efficiency, and greater energy efficiency. It also presents challenges typical of paradigm shifts: distrust, organizational upheaval, and financial and business changes. The art/science of introducing automation into an existing organization is to reap the benefits and mitigate the challenges.

As infrastructure automation moves forward, it appears to be bifurcating along two different philosophies. Each is valid, but appropriate for differing types of uses:
  • "In-place" infrastructure automation: (distinct from run-book automation) Seeks to automate existing physical assets, deriving its value from masking the operational and physical complexity via orchestrating in-place resources. That is, it takes the physical topology (servers, I/O, ports, addressing, cabling, switches, VMs etc.) and orchestrate things to optimize a variable such as an SLA, energy consumption, etc.
  • Virtualized Infrastructure automation: Seeks to first virtualize the infrastructure (the assets as above) and then automate their creation, configuration and retirement. That is, I/O is virtualized, networking is frequently converged (i.e. a Fabric), and network switches, load balancers, etc. are virtualized as well.
Each of these two approaches has pros and cons with which I'm familiar -- having worked for companies in each space. I'll try to elucidate a few of the "high points" for each, with a rough illustrative sketch following each list:

"In-Place" Infrastructure Automation:
Examples: Cassatt (now part of CA), Scalent
  • Automates existing assets: Usually, there is no need to acquire new network or server hardware (although not all hardware will be compatible with the automation software). Thus "in-place" assets are generally re-purposed more efficiently than they would be in a manually-controlled scenario. Clearly this is one of the largest value propositions for this approach - automate what you already own.
  • Masking underlying complexity: A double-edged sword, I suppose, is that while "in-place" automation simplifies operation and streamlines efficiency, the datacenter's underlying complexity is still there - e.g. the same redundant (and sometimes sub-optimal) assets to maintain, same cabling, same multi-layer switching, same physical limitations, etc.
  • Alters security hierarchy: Since assets such as switches will now be controlled by machine (i.e. the automation SW automatically manipulates addresses and ports) this architecture will necessarily modify the security hierarchy, single-point-of-failure risks, etc. All assets fall under the command of the automation software controller.
  • Broad, but not complete, flexibility: Because this approach manipulates existing physical assets, certain physical limitations must remain in the datacenter. For example, physical server NICs and HBAs are what they are, and can't be altered. Or, for example, certain network topologies might not be able to be perfectly replicated if physical topologies don't closely match...or, if physical load balancers aren't available, servers/ports won't have access to them. Nonetheless, if properly architected, some of these limitations can be mitigated.
  • Use with OS virtualization: This approach usually takes control of the VMM as well, e.g. takes control of the VM management software, or directly controls the VMs itself. So, for example, you'd allow the automation manager to manipulate VMs, rather than vSphere.
  • Installation: Usually more complex to set up and maintain, because all assets, versions, and physical topology need to be discovered and cataloged. But once running, the system will essentially maintain its own CMDB.
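Here's the promised rough sketch of the "in-place" idea: a controller that watches a service-level metric and repurposes machines it has already discovered and cataloged. The pool names and thresholds are invented purely for illustration:

```python
# Hypothetical control loop for "in-place" automation: nothing new is virtualized;
# the controller simply repurposes servers it has already discovered and cataloged.
import random

inventory = {                     # the discovered physical estate (a mini CMDB)
    "web-pool": ["srv-01"],
    "idle":     ["srv-02", "srv-03"],
}

def measured_response_ms(pool: str) -> float:
    return random.uniform(80, 400)      # stand-in for a real monitoring feed

def enforce_sla(pool: str, sla_ms: float = 250.0) -> None:
    latency = measured_response_ms(pool)
    if latency > sla_ms and inventory["idle"]:
        server = inventory["idle"].pop()
        # In a real product this step re-images/boots the box and re-configures
        # the existing (in-place) switches and load balancers around it.
        inventory[pool].append(server)
        print(f"{pool}: {latency:.0f}ms breaches {sla_ms:.0f}ms SLA; repurposed {server}")
    else:
        print(f"{pool}: {latency:.0f}ms -- within SLA or no idle capacity left")

for _ in range(3):
    enforce_sla("web-pool")
```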

Virtualized Infrastructure Automation:
Examples: Cisco UCS, Egenera, Xsigo
  • Reduction/elimination of IT components: The good news here is that through virtualizing infrastructure, redundant components can be completely eliminated. For example, only a single I/O card with a single cable is needed per server, because they can be virtualized/presented to the CPU as any number of virtual connections and networks. And, a single virtualized switching node can present itself as any number of switches and load balancers for both storage and network data.
  • Complete flexibility in configuration: By abstracting infrastructure assets, they can be built/retired/repurposed on-demand. e.g. networking, load balancing, etc. can be created at-will with essentially arbitrary topologies.
  • Consistent/complementary to OS Virtualization models: If you think about it, virtualized infrastructure control is pretty complementary to OS virtualization. While OS virtualization logically defines servers (which can be consolidated, moved, duplicated, etc.), infrastructure virtualization similarly defines the "plumbing" and allows I/O and network consolidation, as well as movement/duplication of physical server properties to other locations.
  • New networking model: One thing to keep in mind is that with a completely virtualized/converged network, the way the network (and its security) is operationally managed changes. Organizations may have to re-think how (and who) creates and repurposes network assets. (Somewhat similar to coping with "VM Sprawl" in the software virtualization domain)
  • Use with OS virtualization: This approach is usually 'agnostic' to the software payload of the physical server, and is therefore neutral/indifferent to the VMM in place. Frequently the two can be coordinated, however.
  • Installation: Usually relatively simple. Few components per server, few cables, especially in a 'green field' deployment. Installation of software/BIOS on physical servers is probably not what you're used to, though.
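And the corresponding rough sketch for the virtualized approach: the network itself is just data, created and retired over a single converged fabric. Again, the class and names are purely illustrative, not any vendor's API:

```python
# Hypothetical sketch of the "virtualize first, then automate" approach: the
# logical network is just data, created and retired on demand over a single
# converged physical fabric. Class and field names are illustrative only.
from typing import Dict, List

class LogicalFabric:
    def __init__(self) -> None:
        self.vnics: Dict[str, List[str]] = {}      # server -> virtual NICs
        self.vlans: Dict[int, List[str]] = {}      # VLAN id -> member servers
        self.balancers: Dict[str, List[str]] = {}  # virtual IP -> pool members

    def add_server(self, server: str, nic_count: int = 2) -> None:
        # One physical cable, yet the OS still "sees" nic_count devices.
        self.vnics[server] = [f"{server}-vnic{i}" for i in range(nic_count)]

    def join_vlan(self, server: str, vlan: int) -> None:
        self.vlans.setdefault(vlan, []).append(server)

    def add_balancer(self, vip: str, members: List[str]) -> None:
        # A software-defined load balancer instead of another physical appliance.
        self.balancers[vip] = list(members)

fabric = LogicalFabric()
for s in ("web-01", "web-02"):
    fabric.add_server(s)
    fabric.join_vlan(s, 110)
fabric.add_balancer("10.0.0.50", ["web-01", "web-02"])
print(fabric.vlans, fabric.balancers)
```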
Ideal use of these two approaches differs too. Obviously, "In-Place" Infrastructure Automation is probably best-suited to an existing set of complex datacenter assets - especially in a Dev/Test environment. As you'd expect, a number of existing lab automation products target this market. On the other hand, Virtualized Infrastructure Automation can certainly be deployed on existing assets, but its real value is for new installations where minimal hardware/cabling/networking can be designed-in from the ground up. Most of these products are designed for production data centers, as well as cloud/utility infrastructures.

My overall sense of the market is that adoption of "in-place" automation will be driven primarily by progressive IT staffs that want a taste of automation and service-level management. Virtualized Infrastructure Automation adoption, on the other hand, will tend to ride the technology wave driven both by networking vendors and OS virtualization vendors.

Stay tuned for additional product analyses in this space...

Tuesday, September 29, 2009

A real-world cloud user shares his findings

I subscribe to a number of mailing lists from my alma mater. A few weeks ago, an alum "John" posted a request for recommendations for a cloud computing vendor for his small investment firm. What follows is the email he sent back to the group after receiving responses.

This is an incredibly illustrative peek inside the "real world" of cloud computing -- what prospective SMB users are looking for and concerned about, as well as what's "good enough." I've not edited anything...



I had many requests to share our findings so I figured I would share with the group. I appreciate all of the input I received. It has been really helpful.

~ John


Having looked into cloud computing solutions for our small investment firm over the past few months, we have learned a lot about the growing movement towards remote data storage and accessibility. Our goal has been to find a cost-effective solution for our IT needs that would make it convenient for employees of our company to access our shared network (documents and emails) all over the globe without much hassle, difficulty, or expense. While the cloud computing landscape is still relatively new, what is already available is exciting. Both Google, Microsoft, and other companies have products available such as Google Apps and Microsoft Office Live, but neither has fully come to the point of being able to handle our business needs. We are currently in the process of setting up a Google Apps trial period, through a consultant, to try out business e-mail and calendar via Google’s Gmail and Google Calendar. We will do this test while retaining our current Microsoft Exchange server.

There have been many issues to consider as we have been speaking with various consultants and researching all of the available alternatives. First, since we are an SEC-registered investment adviser with lots of confidential and sensitive information on our hands, issues regarding the security of our electronic files – both in terms of disaster recovery as well the integrity of the company with whom we are entrusting to house our data – are paramount. This also ties in with the issue of record retention, which is equally important to us. In terms of data storage and backup – our current system is not ideal. We need to retain copies of all e-mails and files for at least seven years, if not more, and this information needs to be secure and easily accessible. There seem to be some progress in this area (Google Postini and Amazon S3, for example), but as of yet, there is not yet one system that can do all of these things in the way we’d require.

Second, since we currently are not pleased with our current remote network access - we would like an easy and inexpensive way to access email and our network drive from any computer with Internet access. We have discovered that while web-based, unlimited e-mail and calendar storage are currently available from multiple providers, a solution for mass file storage that would essentially replicate our shared network drive and allow large files for multiple software applications to be stored/backed up in the cloud does not yet exist at an attractive price. In particular, a system where we could modify docs in the cloud without having to download and upload/re-save the file each time it needs to be edited.

One interesting product we discovered during our search is called Dropbox. You download Dropbox to one computer, save any type of file you would like to a “Drop Box drive”, and it syncs up automatically with the Web. Then, when you are at home or traveling, you can access those docs through a web browser... or you can download Drop Box onto another computer anywhere and you can edit the docs directly in Drop Box. The only glitch is that Dropbox does not yet have file storage capacity for a company with over 200GB of data to store and seems to be geared more for individual users. Word on the street is that Google will be coming out with a new product soon that has similar features to Dropbox, but on a much larger scale that would be useful for businesses.

In terms of cost and ease, Google Apps seems to be the best solution for us right now (it comes out around $50/user/year), at least for the e-mail and archiving component. Microsoft’s upcoming 2010 Web Apps platform seems appealing as well, particularly because we might be able to edit complex Excel documents directly in the cloud from anywhere.

Bottom line, what we have learned is that this rapidly-developing option for IT is not yet 100% ready to cover all the bases our business needs, but it will probably get there sometime in the next year or two. For the time being, we are going to see how the e-mail works and go from there.

Tuesday, September 22, 2009

Alternative Recommendation for DCeP "Service Productivity"

Back in February of this year, The Green Grid published a paper listing proposed Proxy measures for data center productivity, specifically Data Center Energy Productivity (DCeP).

This paper followed a much earlier output from the group in 2007, which helped define the now much-used PUE and DCiE metrics I wrote about back then. Those metrics were (and are) nice if what you care about are the "basic" efficiencies of a data center -- simply how much power is getting to your servers relative to all of the other power being consumed by infrastructure systems (e.g. lighting, power distribution, cooling, etc.). But the shortcoming is that they don't quantify the "useful output" of a datacenter versus its power input. So, for example, you could have a fantastic PUE... but with a datacenter full of idle servers.
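As a refresher, both metrics boil down to a simple ratio; the quick (made-up) numbers below also show why a great PUE says nothing about idle servers:

```python
# PUE and DCiE in one line each; the figures are made up for illustration.
total_facility_kw = 1500.0   # everything the building draws
it_equipment_kw   = 1000.0   # what actually reaches the IT gear

pue  = total_facility_kw / it_equipment_kw         # 1.5 -- lower is better
dcie = it_equipment_kw / total_facility_kw * 100   # ~66.7% -- higher is better

print(f"PUE = {pue:.2f}, DCiE = {dcie:.1f}%")
# Note what's missing: neither ratio asks whether those 1,000 kW of servers are
# doing useful work or sitting idle -- the gap the DCeP proxies try to fill.
```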

Again, enter The Green Grid to take analysis to the next level. The excellent paper published in February details 8 "proxy" approaches (i.e. not necessarily metrics) that could be used by data center operators to begin to baseline efficiencies based on "useful output". The Green Grid also set up a survey where they have been soliciting feedback from users regarding the appropriateness, usefulness, etc. of these proxies.

Why 8 approaches? Because not everyone agrees on what "useful work output" of a datacenter really is. Should it be Bits-per-kWh (proxy #4)? Weighted CPU utilization (proxies #5 & #6)? Compute units delivered per second (proxy #7)? Each has its pros and cons. Fortunately, the Green Grid recognized that nothing's perfect. Says the paper: "...The goal is to find a proxy that will substitute for a difficult measurement, but that still gives a good-enough indication of useful work completed."

In addition, the Data Center Knowledge blog pointed out:
The new goal is to develop a simple indicator, or proxy, rather than a full metric. The Green Grid compares the proxy to EPA mileage ratings for new cars, which provide useful data on energy efficiency, with the caveat that “your mileage may vary.” The proposals “do not explicitly address all data center devices and thus fall short of a complete overall measure of data center productivity,” the group says.
To this end, the issue was also recently dealt with eloquently in Steve Chambers' ViewYonder perspective on datacenter efficiency -- and he has the right idea: why not base efficiency on the service provided (as opposed to CPUs themselves, or some abstract mathematical element)? This approach is very similar to what I proposed a year ago February in Measuring "useful work" of a Datacenter.

In short, the proposal is to compare the data center Service's SLAs with the power the overall datacenter consumes.

Why use the "SLA" (Service Level Agreement)? Two reasons. (1) The SLA is already part of the vernacular that datacenter operators already use. It's easily understood, and frequently well-documented. (2) The SLA encapsulates many "behind-the-scenes" factors that contribute to energy consumption. Take this example: Not all 1,000 seat email services are created equal. One may be within a Tier-I data center with a relatively low response rate requirement and allowing users only 500MB of storage per mailbox. Another enterprise with the same email application may be operating in a Tier-III datacenter environment with a rigorously-controlled response rate, a full disaster-recovery requirement, and 2GB of storage per mailbox. These two SLA examples are quite different and will therefore consume different power. But wouldn't you now rather compare apples-to-apples to see if your particular instantiation of these 1,000 mailboxes was more efficient to another enterprise with the same SLA?

How would such a proxy/measurement be accomplished? The approach is somewhat analogous to the Green Grid's proxy #1 ("Self-assessment reporting"), coupled with peer-reporting/comparison of data as is done with the DOE's DC-Pro tool.

Thus, data centers would
1) quantify the number of Services and SLAs for each,
2) measure overall power consumed,
3) upload these numbers to a public (but anonymized) database.
After a while, there would be statistically-significant comparisons to be made -- say a "best practice" energy efficiency range for a given Tier-III email application with 2GB storage and disaster-recovery option.
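To illustrate (with a record format and figures I've invented purely for this example), the self-reporting and peer comparison could be as simple as:

```python
# Hypothetical record format for the SLA-vs-power proxy: each site reports its
# service, its SLA attributes and its measured power draw, then compares itself
# against peers in the same SLA "bucket". Field names and figures are invented.
from statistics import mean

reports = [
    {"service": "email-1000-seats", "tier": "III", "dr": True,  "mailbox_gb": 2.0, "kw": 46.0},
    {"service": "email-1000-seats", "tier": "III", "dr": True,  "mailbox_gb": 2.0, "kw": 61.0},
    {"service": "email-1000-seats", "tier": "I",   "dr": False, "mailbox_gb": 0.5, "kw": 18.0},
]

def bucket(r):
    # Apples-to-apples: group by SLA attributes, not by the implementation.
    return (r["service"], r["tier"], r["dr"], r["mailbox_gb"])

def peer_average_kw(reports, mine):
    return mean(r["kw"] for r in reports if bucket(r) == bucket(mine))

mine = reports[0]
print(f"My draw: {mine['kw']} kW; peer average for the same SLA: {peer_average_kw(reports, mine):.1f} kW")
```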

I'm open to other suggestions of how to pragmatically apply application SLAs vs Watts to gauge overall datacenter energy efficiency - again, my earlier proposal of this is here. But it seems that the SLA encapsulates all of the "output" related service metrics, while being agnostic to the actual implementation. Seems elegant, if you ask me.

Monday, September 14, 2009

An Ideal Datacenter-in-a-Box, Part II

Last week I posted a blog entry outlining Dell & Egenera's latest Datacenter-in-a-Box offering. More than one person took note of how I compared its simplicity to other offerings in the same space, but pointed out that I failed to detail the specifics of Egenera's PAN Manager software and how it maps to 13 common IT Service Management functions.

The 13 different functions are mapped onto the data center "stack" at right. They span management of both physical and virtual software, servers, I/O, networking, etc. -- as well as higher-level functions such as High-Availability and Disaster Recovery.

The Dell PAN offering unifies 12 of the 13 functions, and provides them from within a single console called PAN Manager. (The 13th function is provided via the Dell Management Console.) This single-console infrastructure management software consists of the base PAN Builder software, as well as two optional modules, PAN Server Portability and PAN Portability.

So, using the diagram from last week, the functionality maps as follows:

PAN Builder:
  • VM server management
  • Physical server management
  • Software (P & V) provisioning
  • I/O virtualization & management
  • IP load balancing
  • Network virtualization & management
  • Storage connection management
  • Infrastructure provisioning
  • Device (e.g. switch & load balancing) failover
PAN Server Portability:
  • Physical N+1 failover (HA)
  • Virtual host N+1 failover (HA)
PAN Portability:
  • Disaster recovery (DR) for entire mixed P & V environments
I hope this not only details the Egenera product, but also illustrates what's possible when the industry combines server management with virtual I/O and virtual networking & switching. It's the perfect complement to O/S virtualization, and massively simplifies traditional IT Operations Management.

Thursday, September 10, 2009

An Ideal Datacenter-in-a-Box

Today's announcement marks a modest but meaningful step in Egenera's relationship with Dell, and in the overall simplification of IT.

Essentially the punchline is this: We've taken the most commonly-purchased hardware configuration and management tools used by mission-critical IT Ops, and integrated them into a single product with a single GUI that you can install and use in ~ 1 day. That's essentially the idea behind the "Datacenter-in-a-Box:"
Most common configuration: Blades + Networking + SAN Storage

Most useful tools: to manage VMs + physical servers + network + I/O + SW provisioning + workload automation + high availability
That's what Egenera's done with Dell. It's a "unified computing" environment (to borrow a term), but with all of the most popular higher-level management functions integrated as well. That is to say, it includes I/O virtualization and a converged network fabric (including virtual switches and load balancing, based on standard Ethernet), and then adds tools for software provisioning, VM management, and high availability to "universally" manage both physical and virtual workloads simultaneously. Pretty cool - and remarkably simple to use.

Don't believe all this stuff can be so simple? Here's some evidence, and a few illustrations, of why this move will help drive data center management toward greater simplification:
  • (1) Check out how easy it is to provision a complete compute environment with N+1 failover in 6 steps
  • (2) Compare the level of complexity reduction compared to some similar products
  • (3) The Dell PAN Datacenter-in-a-Box (DCIB), together with the Dell Management Console, provides a massively simplified management landscape as compared with alternative solutions. To wit:
The set of "traditional" products you'd need to buy/integrate.
ALL of these functions are already integrated within the Dell PAN DCIB:

Then, the roughly-equivalent solution you'd compose with HP:

And finally, the roughly-equivalent solution you'd compose with Cisco and their partners:

I'd also be remiss not to point out that this product SKU configuration is available directly from our friends at Dell - and was born from customer requests for exactly this kind of building block. Folks who've already purchased this technology based on the Dell PAN System include
  • Federal users who may replicate an entire mission-critical environment across dozens of aviation-related locations
  • Financial-services users who wanted a consolidated approach to ensuring high-availability across dozens of blades w/different workloads
  • Commercial customers wanting a flexible environment on which to run the company's SAP
  • A Federal hosted services provider wanting five-9's of availability plus being able to re-configure systems/capacity a la an "internal" cloud
  • Overseas users acting as an internal IT service provider seeking 'universal' HA and DR for all workloads
Plus there are thousands more locations worldwide where you can find the same PAN Manager software.
If you don't believe Dell hardware is ready for the Data Center, then think again.

Monday, September 7, 2009

Where the Server Industry Went Amiss

I've been doing an analysis regarding how "complexity" has evolved in the datacenter. Fundamentally, just why is it so hard to configure & provision new (physical) servers? Why is clustering inherently so complex? Why do we have data networks, storage networks and management networks (all distinct, I might add)? How come we have all of these layered management systems?

OS virtualization has massively simplified complexity at the software level by abstracting-away the machine-level CPU commands, and has even contributed to simplifying networking between virtual machines. But we're still left with complexity at the physical I/O, networking and control levels - the remaining physical piece-parts (KVM ports, NICs, HBAs, etc.).

Ultimately, all of this complexity gradually resulted from incremental server hardware evolution… the motherboards, to be exact. Way back when the computer industry was just getting started, motherboards harbored a simple CPU and rudimentary I/O (e.g. an audio jack out to a cassette tape for storage...). But as processors got more sophisticated and datacenter environments grew, CPUs were integrated with more complex I/O (e.g. Network Interface Cards) as well as with storage connectivity (e.g. Host Bus Adaptors). Plus, there was usually a local disk, of course.

This meant that the server retained static data, specifically things like I/O addressing and storage connectivity naming, not to mention data on the local disk -- resulting in the server having a static “state". Usually the local network had state too – ensuring that the IP and MAC address of the motherboard were attached to switches and LANs in a particular way. Add to this the fact that with critical applications, all of these components (plus naming/addressing) were frequently duplicated for redundancy.

This meant that if you had to replace (or clone) a physical server, say because of a failure, you had to re-configure all of these addresses, names, storage connections and networks – and sometimes in duplicate. This resulted in lots of things to administer, and lots of room for error. And frankly, this is probably where fundamental "data center complexity" arose.

In response to these failures and this complexity, vendors developed special-purpose clustering and failover software – necessarily closely coupled to specific software and hardware – to handle the re-assignment of state to new hardware and networking. This software often required hand-crafted integration and frequent testing to ensure that all of the addressing, I/O, and connectivity operations worked properly. And many of these special-purpose systems are what are in use today.

Similarly, there are equally complicated software packages for scale-out and grid computing that perform similar operations – not to correct failures, but to "clone" hardware in order to scale out systems for parallel computing, databases, etc. These systems are just as complex and usually application-specific, again having to deal with replicating stateful computing resources.

So the industry, in an effort to add “smarts” and sophistication to the server – to enable it to fail-over or to scale – has instead created complexity and inflexibility for itself. Had the industry instead defined I/O, networks and addressing logically, then the way we assign/allocate servers would have been forever simplified and streamlined.
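To illustrate the contrast (with values I've invented for this example), compare what a hardware failure means when identity is burned into the box versus when it is just a logical record:

```python
# Illustrative contrast (all values invented): when identity lives on the
# motherboard, replacing a failed server means re-keying everything; when
# identity is a logical record, the same failure is a single re-assignment.
stateful_server = {
    "hostname": "erp-db-01",
    "nic_macs": ["00:1a:4b:aa:00:01", "00:1a:4b:aa:00:02"],
    "hba_wwns": ["50:06:0b:00:00:aa:00:01"],
    "san_zone": "zone-erp",
    "vlans": [210, 211],
}

def replace_stateful(old: dict) -> list:
    # Every step is manual (or hand-crafted cluster scripting), every step is a
    # chance for error, and much of it is duplicated for redundancy.
    return [f"re-enter {key} on the new hardware, its switches and its fabric zoning"
            for key in old]

def replace_logical(identity: dict, spare: str) -> str:
    # With logical addressing, the record simply points at different hardware.
    return f"bound profile '{identity['hostname']}' to {spare}"

print(len(replace_stateful(stateful_server)), "manual steps the stateful way")
print(replace_logical(stateful_server, "spare-blade-07"))
```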

Fortunately, some technologies are being applied to reverse and simplify some of this:
  • I/O virtualization appliances which logically consolidate all I/O into one reconfigurable physical card (e.g. Xsigo)
  • Infrastructure virtualization software which logically defines all I/O, networking and switching so that any CPU-I/O-Network config. can be defined to take the place of any other CPU-I/O-Network config. (e.g. Egenera, Cisco UCS and to some degree HP's VirtualConnect)
  • CPU pooling hardware/software which replace traditional I/O to make multiple physical servers act as large multi-core CPUs (e.g. 3leaf)
Unfortunately, the industry's own momentum sustains the level of complexity - most players continue to develop software products to handle/abstract the increasing complication. Nor is it in the interest of board designers & silicon manufacturers to _reduce_ the number of chips & cards associated with servers. So, we may not see a significant architectural change in stateful processing units - until the industry gradually acknowledges that there is an alternative to all of this madness.

Monday, August 24, 2009

Products for Cloud Ops vs. Traditional Ops

Most in IT agree that cloud computing - while not a technology - does impact how technology is used within IT, and also implies a change in how IT operations will manage infrastructure. So it comes (to me) as no surprise that a number of traditional "point-product" IT Service Management (ITSM) products might be obviated by the new cloud computing operational paradigm - while others may morph in how they're used.

I touched on this topic a little over a year ago when looking at ITIL, ITSM and the Cloud as well as assessing how capacity and consolidation planning will likely change.

When I read Gartner's Hype Cycle for IT Operations Management 2009, I was struck by how many technology categories may really need to change (or may simply become unnecessary) in a cloud model. (BTW, I should point out that cloud computing itself is at the peak of Gartner's overall Technology Hype Cycle for 2009.)

I see roughly five dimensions for how ITSM product use might shift within a cloud model:
  • Overall increase in use, due to IT Operational needs created by the automation and dynamics within a cloud infrastructure. With more dynamic/unpredictable resource requirements, some ITSM tools may become more valuable than ever. For example, take Billing/Chargeback. Clearly any provider of a public (or internal) cloud will need this to provide the pay-as-you-go economic model, particularly as individual resource needs shift over time. Same clearly goes for tools such as Dynamic Workload Brokering, etc.

  • Overall decrease/obviation of need, due to the automation/virtualization within a cloud infrastructure. As automation begins to manage resources within the cloud, certain closely-monitored and managed services may simply be obviated. Take for example application-specific Capacity Planning; no longer will this matter to the degree it used to - now that we have "elastic" cloud capacity. Similarly, things like event correlation _might_ no longer be needed -- at least by the end-user -- because automation shields them from needing to know about infrastructure-related issues.

  • Shift in use to the cloud operator - that is, the IT Service Provider will tend to use certain ITSM tools more. For example, Asset Management, Global Capacity Management and QoS tools now mean little to the end-user, but may still be critically important to the SP.

  • Shift in use to the cloud end-user - that is, the cloud end-user may tend to use certain ITSM tools more - chiefly because they no longer directly 'own' or manage infrastructure - just executable images. That is, end-users of IaaS clouds will need to maintain their Application and Service Portfolio tools to manage uploadable images, etc. Conversely, end-users may no longer care about Configuration Auditing tools, since the underlying configuration is now the cloud provider's responsibility.

  • Transition from being app-specific to environment-specific - that is, a shift from tools being used to monitor/control/manage limited-scope application stacks, to being used to do the same across a large shared infrastructure. As above, Capacity and Consolidation Planning tools are no longer of interest to the end-user at a single-application scale. But to the cloud operator, knowing "global" capacity and utilization is critical.
In retrospect, I can probably concoct exceptions to almost every example above. So keep in mind the examples are illustrative only!

The diagram below is also mostly conceptual; I am not an ITSM professional. But while it may be a bit of a 'hack', I'm hoping it provides food-for-thought regarding how certain tools may evolve, and where certain tools may be useful in new/different ways. I've selected a number of ITSM tools from Gartner's IT Ops Management Hype Cycle report to populate it with.


Monday, August 17, 2009

What's your "data center complexity factor"?

After speaking with IT users, analysts and vendors, I've tried to draw up a "map" of some of the most common data center management tools directly related to operations, and how they layer across typical infrastructures. So far, I see about 13 tools in use. (I'm not counting higher-level administrative tools, e.g. compliance management, accounting, problem management, etc. - but watch this space for a future post.)

I'm curious to see (a) if I have it right, and (b) how does *your* infrastructure management 'stack' up? Can you do better than 13? Worse?

Tuesday, July 14, 2009

Quantifying Data Center Simplification

Ever read that marketing fluff that says "blah blah simplifies your data center"? Ever wonder what that means and whether there is any quantifiable measure?

Infrastructure & management simplification is about more than reducing port and cable counts, and more than simply virtualizing/consolidating. (In fact, if done improperly, each of these approaches ultimately adds management complexity.)

To me, true simplification isn't 'masking' complexity with more management and abstraction layers. Rather, it's about taking a systems-style approach to re-thinking the componentry and interaction of items across both software and infrastructure. For example, independently-managed components (and management products) can consist of
  • Server/CPU status, workload repurposing
  • Server console/KVMs
  • Physical app management
  • Virtual app management
  • Physical HA/DR
  • Virtual HA/DR
  • Storage connectivity
  • I/O management
  • Networking & switch management
  • Software systems management
It's not just about reducing cables & ports!

Three observations I've recently made have driven this concept home to me.

1. The rise of true infrastructure management: systems like PAN Manager essentially manage all of the above bullets together as a true "system" (see my earlier post on 6 simple steps to managing IT infrastructure). Nowhere else will you see as many as 6-7 complex IT management functions reduced to a single console.

2. An average case study of a PAN Manager user. For example, take a major online grocer running a storefront website (the environment included BEA WebLogic, Oracle9i RAC, CRM, business intelligence, etc.) for delivery administration and payment processing. Complexity consisted of traditional systems management, plus the addition of clustering and the 1:1 duplication of server, network and SW tools.

With a PAN-managed, systems-style approach to simplification, the number of servers, ports, cables, NICs, HBAs, disks and recovery systems -- and most of all, admin time and OpEx -- fell dramatically. That took componentry from ~1,500 "moving parts" down to under 200. To me, "elegant engineering" equates to simplification.

3. Major equipment vendors are offering infrastructure management products similar to those from Egenera. But the "systems" issue still persists, even if some of them have solved the networking, I/O & switching parts. After a pretty detailed analysis, it's still obvious that multiple "point products" are needed to operate these offerings. To me, at least, they still represent a non-systems approach.

What would you rather manage? A bunch of point products that mask complexity, or a true system that re-thinks how data center infrastructure is run? I'm thinking the PAN Manager-run system :)

Tuesday, July 7, 2009

Why (and How) Low-Cost Servers Will Dominate

Or, why high-end servers will be obviated by software...

I begin this blog with a true story. Two weeks ago I was training a new Account Executive on the virtues of server automation, I/O virtualization, converged networking, etc. To his credit (or mine?), within the first hour he blurted out, "then if a customer uses this stuff, they should be able to get five-9's of availability from run-of-the-mill hardware, right?"

And that's the point: The age of high-end, super-redundant, high-reliability servers is slowly coming to an end. They're being replaced by volume servers and intelligent networks that "self heal". (Head nod to IBM for coining the term, but never following through)

I pointed out to my trainee that folks like Amazon and other mega-providers don't fill their data centers with high-end HP, Sun or IBM gear anymore. Rather, companies like Google use scads of super-inexpensive servers. And if/when the hardware craps out, it is software that automatically swaps in a new resource.

It's like the transformation that began in the late 1700's with Eli Whitney and mass production, where mechanical systems - later epitomized by the now-famous Colt revolver - were made to simple, standard, interchangeable specifications. Similarly today, rather than hand-crafting every system stack (software, VMs, networking - everything), we're moving to a world where simple, standard HW and configurations can do 99% of the job, and it's the software management that simply works around failed components. This trend in IT was noted back in 2006 (probably earlier) as the "Cheap Revolution" in a now-famous Forbes article.

So what's the punch-line here? I believe that the vendors who'll "win" will be those who are effective at producing low-cost, volume servers with standard networking... But most of all, the winners will be effective at wrapping their wares in a system that is designed for automatic interchangeability.

To wit: in a recent IDC study, while overall server sales were forecast to fall in 2010, the Volume segment was the only one expected to experience a gain.

I believe that the days of buying super-high-end, high-reliability servers are numbered (for all but some of the most critical telco-grade apps). The Dells of the world have an opportunity; and the Suns, HPs and IBMs will need to re-think the future of their "high end".

Were I a vendor with a strong volume server play, I would continue to push on hardware pricing, and begin to emphasize a hardware/network self-management strategy.

One other (slightly random) analogy: Shai Agassi's Better Place company *isn't* a car company. It's really a software and networking company. With the right network and infrastructure, the vehicles are efficient and always have a 'filling' station to keep running. Similarly, IT is transforming from being "about the hardware" to being about how the hardware is networked and managed. Think about it.

Monday, June 29, 2009

HPQ & CSCO: Analysis of New Blade Environments

I've been spending some significant time analyzing new entries into the blade computing market, and poking around in the corners where the trade rags and analysts have failed to investigate. And, as the line goes, "some of the answers may surprise you."

The two big recent entrants/announcements were Cisco's Unified Computing System (made this past March) and then HP's BladeSystem Matrix (made in June). Both are implicitly or explicitly taking aim at each other as they chase the enterprise data center market. They're also both teaming with virtualization providers, as well as hoping for success in cloud computing. Each has a differing technology approach to blade repurposing, and each differs in the type (and source) of management control software. But how revolutionary and simplifying are they?

HP's BladeSystem Matrix architecture is based on VirtualConnect infrastructure, and bundled with a suite of mostly existing HP software (Insight Dynamics - VSE, Orchestration, Recovery, Virtual Connect Enterprise Manager) which itself consists of about 21 individual products. Cautioned Paul Venezia in his Computerworld review:
“The setup and initial configuration of the Matrix product is not for the faint of heart. You must know your way around all the products quite well and be able to provide an adequate framework for the Matrix layer to function.”
From a network performance perspective, Matrix includes 2x10Gb ‘fabric’ connections, 16x8Gb SAN uplinks, and 16x10Gb Ethernet uplinks. The only major things missing from the "Starter Kit" suite they offer are VMware - not cheap if you choose to purchase it - and a blade (or two) to serve as controllers of the system.

From Cisco, the UCS System is based on a series of server enclosures interconnected via a converged network fabric (which does a somewhat analogous job of repurposing blades to HP's VirtualConnect). The UCS Manager software bundled with the system provides core functionality (see diagram, right). Note that at the bottom of their "stack", Cisco turns to partners such as BMC for "higher level" value such as high-availability, and to VMware for virtualization management. As sophisticated as it is, this software is essentially "1.0", and full integration with third-party software is probably a bit more nascent than with HP.

As you would expect, the system has pretty fast networking; Cisco’s system includes 2x10Gb fabric interconnects, 8x4Gb SAN uplink ports, and 8x10Gb Ethernet uplink ports. (But as the system scales to 100's of blades, you can't get true 10Gb fabric point-to-point.)

But really, how simple?

What I continue to find surprising is how both vendors boast about simplicity. True, both have made huge strides in the hardware world to allow for blade repurposing, I/O, address, and storage naming portability, etc. However, in the software domain, each still relies on multiple individual products to accomplish tasks such as SW provisioning, HA/availability, VM management, load balancing, etc. So there's still that nasty need to integrate multiple products and to work across multiple GUIs.

A little comparison chart (at right) shows what an IT shop might have to do to accomplish a list of typical functions. Clearly there are still many 3rd-party products to buy, and many GUIs and controls to learn.

Still, these systems are - believe it or not - a major step forward in IT management. As technology progresses, I would assume both vendors will attempt to more closely integrate (and/or acquire?) technologies and products to form more seamless management products for their gear.

Thursday, June 11, 2009

RTI Fabrics... not just a networking play

Pete Manca, Egenera's CTO, posted an excellent blog entry explaining RTI architectures (RTI being a term coined by Gartner some time ago), and does a nice job of taking a pretty objective approach to three types:

"A converged fabric architecture takes a single type of fabric (e.g. Ethernet) and converges various protocols on it in a shared fashion. For example, Cisco’s UCS converges IP and Fiber Channel (FC) packets on the same Ethernet fabric. Egenera’s fabric does the same thing on both Ethernet fabrics (with our Dell PAN System solution) and on an ATM fabric (on our BladeFrame solution)...

"Dynamic Fabrics are not converged, but rather separate fabrics that can be have their configuration modified dynamically. This is the approach that HP uses. Rather than utilize a converged fabric, HP has separate fabrics for FC and Ethernet. These fabrics can be dynamically re-configured to account for server fail-over and migration. HP’s VirtualConnect and Flex10 products are separate switches for Fiber Channel and Ethernet traffic, respectively."

"The 3rd type of fabric is a Managed Fabric. In this architecture there is no convergence at all. Rather, the vendor programs the Ethernet and Fiber Channel switches to allow servers to migrate. This is a bit like the Dynamic Fabric above, however, these typically are not captive switches and there is no convergence whatsoever."

I'll take some liberty here, and emphasize a pretty important point:

Converged/managed fabrics aren't attractive just because they simplify networking; they're attractive because they are a perfectly complementary technology for server repurposing as well. That goes for *both* physical servers and virtual hosts.

It's no wonder why IBM (with their Open Fabric Manager), HP (with their Matrix bundle), Cisco (with UCS) and Dell/Egenera (with the Dell PAN System) are all pushing in this area.

Why? Because once you have control over networking, I/O and storage connectivity, you've greatly simplified the problem of repurposing any given CPU. That means scaling out is easier, failing over is easier, and even recovering entire environments is easier. You don't have to worry about re-creating IPs, MACs, WWNs, etc., because it's all taken care of.

So, if you can combine Fabric control with SLA management and then with server (physical and virtual) provisioning, you've got an elegant, flexible compute environment.

Tuesday, June 2, 2009

CA's Acquisition of Cassatt - Hindsight & Foresight

Today I read the press release, and Gordon Haff's analysis, announcing that Computer Associates has acquired Cassatt -- a former employer of mine.

CA probably appreciates that they have a real gem. But like all things tech, most cool products are not "build it and they will come". However, I can say that Bill Coleman (Cassatt's CEO) and Rob Gingell (Cassatt's CTO and former Sun Fellow) really have a break-the-glass vision. Now let's see if the new lease on life for the vision (and product) will take shape.

Vision vs. speedbumps
Cassatt's vision - led by Rob - is still out in front of the current IT trends... but not by too far. As much as 3 years ago, the company was anticipating "virtualization sprawl", the need for automating VMs, the expectation that IT environments will have both physical and virtual machines, and the fact that "you shouldn't care what machines your software runs on, so long as you meet your SLA". That last bit, BTW, presaged all of our current 'hype' about cloud computing!

The instantiation of these observations was a product that put almost all of the datacenter on "autopilot" -- servers, VMs, switches, load-balancers, even server power controllers and power strips. The controller was managed/triggered by user-definable thresholds, which could build/re-build/scale/virtualize servers on-the-fly, and do just about anything needed to ensure SLAs were being met. And it worked, all the while making the most efficient use of IT resources and power. As Rob would say, "we don't tell you what just happened - like so many management products. We actually take action and tell you what we did." Does it sound like Amazon's recent CloudWatch, Auto-Scaling and Elastic Load Balancing announcement? Yep.

Finally, the coup the company had -- and what the industry still has to appreciate -- is that the product takes a "services-centric" view of the data center. Rather than focusing on *servers* the GUI focuses on *services*. This scales more easily, and gives the user a more intuitive sense of what they really care about -- service availability... not granular stuff like physical servers or how they're connected.

Unfortunately for Cassatt, there is an inherent tension between how ISVs develop products, and how IT customers buy them. ISVs are always looking for the next leap-frog.... while IT customers almost always play the conservative card by purchasing incremental/non-disruptive technology.

So the available market of real leap-frog CIOs is still small... but growing. I would expect the first-movers to adopt this won't be traditional enterprises -- but rather Service Providers, Hosting Providers and perhaps even IT Disaster Recovery operations looking to get into the IaaS and/or Cloud Computing space.

What it could mean to CA
So why would CA buy Cassatt? Unfortunately, it's not to acquire Cassatt's customers. It is much more likely to acquire technology and talent.

Given that CA seems to be a tier-2 player in the data center management space, Cassatt would help them legitimize their strategy, and pull together a cloud-computing play that CA's competitors are already moving down the road on. Cassatt's product ought also to complement CA's "Lean IT" marketing initiative.

The other good news is that CA has a number of Infrastructure Management products that ought to complement Cassatt technology. There is Spectrum (infrastructure monitoring), Workload Automation (more of an RBA solution that might get partially displaced by Cassatt), Services Catalog, and Wily's APM suite. BTW, there's a pretty decent white paper available on CA's website on Automating Virtualization.
Per Donald Ferguson, CA’s Chief Architect: “Cassatt invented an elegant and innovative architecture and algorithms for data center performance optimization. Incorporating Cassatt’s analysis and optimization capabilities into CA’s world-class business-driven automation solution will enable cloud-style computing to reliably drive efficiencies in both on-premises, private data centers and off-premises, utility data centers. We believe the result will be a uniquely comprehensive infrastructure management approach, spanning monitoring, analysis, planning, optimization and execution.”
I could see CA now beginning to target large enterprises as well as xSPs with Cassatt technology, as its engineering teams integrate bridges to other CA suite products. It will also take CA's sales and support organizations some time to digest all of this and then bring it to market through their channels.

In the meantime, Cassatt brings them a bunch of sharp technical and marketing minds. Stay tuned -- CA's a new player now.

Monday, May 18, 2009

New AWS Services Enable "Real" Elastic Clouds

Yesterday Amazon announced a new set of services for their EC2 "elastic compute cloud" and these perhaps represent the real "holy grail" for cloud computing. While not new concepts, they illustrate how "real" cloud computing elasticity works, and challenge a few other virtualization & automation providers to do the same.
  • Amazon CloudWatch: A for-fee ($0.015 per AWS instance monitored) service that:
    provides monitoring for AWS cloud resources... It provides customers with visibility into resource utilization, operational performance, and overall demand patterns—including metrics such as CPU utilization, disk reads and writes, and network traffic...
  • Amazon Auto Scaling: a free service that:
    automatically scales your Amazon EC2 capacity up or down according to conditions you define. With Auto Scaling, you can ensure that the number of Amazon EC2 instances you’re using scales up seamlessly during demand spikes to maintain performance, and scales down automatically during demand lulls to minimize costs. Auto Scaling is particularly well suited for applications that experience hourly, daily, or weekly variability in usage. Auto Scaling is enabled by Amazon CloudWatch...

  • Amazon Elastic Load Balancing: A for-fee service ($0.025/hour per balancer + $0.008/GB transferred) that:
    automatically distributes incoming application traffic across multiple Amazon EC2 instances. It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic. Elastic Load Balancing detects unhealthy instances within a pool and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored...

To date, users of Amazon EC2 have had to do these sorts of things manually, if at all. Now Amazon is building these services into AWS (as well as into Amazon's pricing and business model).
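Conceptually, CloudWatch supplies the measurements and Auto Scaling turns them into a fleet-size decision. Here's a toy sketch of one way such a decision could be computed (my own illustrative math and parameter names -- not Amazon's actual algorithm or API):

```python
def desired_capacity(current: int, avg_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Scale the fleet so average CPU moves toward the target, within fixed bounds."""
    if avg_cpu <= 0:
        return min_size
    wanted = round(current * (avg_cpu / target_cpu))
    return max(min_size, min(max_size, wanted))

# 4 instances averaging 80% CPU against a 50% target -> grow to 6 (never above 10)
print(desired_capacity(current=4, avg_cpu=80.0, target_cpu=50.0, min_size=2, max_size=10))
```

The interesting part of the announcement is less the math than the packaging: the metric collection, the scaling decision and the instance launches/terminations are now wired together inside AWS itself.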

Not an entirely new concept:
Note that what Amazon is doing is not entirely new. For example, if you're considering building these sorts of capabilities into your own "internal cloud" infrastructure, there are a few products that provide similar solutions. e.g. in 2008, Cassatt (RIP?) announced its own capacity-on-demand technology, which created/retired entirely new instances of a service based on user-definable demand/performance criteria. (I should add that Auto Scaling & CloudWatch operate similarly -- you can define a number of performance and SLA parameters to trigger grow/shrink scaling commands).

Similarly, Egenera's PAN Manager approach dynamically load-balances networking traffic between newly-created instances of an app. And products such as 3Tera's also enable users to define components (such as load balancers) in software. All of this adds up to truly "adaptive infrastructure" that responds to loads, failures, SLAs, etc. automatically.
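The load-balancing half of that "adaptive infrastructure" is easy to picture: keep health-checking the pool and only route to members that pass. A purely illustrative sketch (hypothetical names; not how PAN Manager or Elastic Load Balancing is actually implemented):

```python
import random

pool = ["app-01", "app-02", "app-03"]

def is_healthy(instance: str) -> bool:
    """Stub health check; a real balancer would probe an HTTP/TCP endpoint."""
    return instance != "app-02"   # pretend app-02 just failed

def route_request() -> str:
    healthy = [i for i in pool if is_healthy(i)]
    if not healthy:
        raise RuntimeError("no healthy instances in pool")
    return random.choice(healthy)  # traffic automatically avoids the unhealthy member

print(route_request())  # never returns app-02 until it passes its health check again
```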

The challenge to others:
So, if Amazon can instantiate these services in the "public" cloud, then I would expect others -- notably providers such as VMware, Citrix, MSFT etc. -- to provide similar technologies for folks building their own infrastructure.

For example, in VMware's "vCloud" I would expect to see services (some day) that provide similar monitoring, auto-scaling, and load balancing features. If virtualization providers are to take "internal cloud computing" seriously, these are automation-related services they'll be required to provide.

Kudos to Amazon:
And finally, Amazon has done two savvy things in one move - (a) they've once again shown the world what a true "cloud" computing infrastructure ought to do, and (b) they've provided another nice value (and revenue stream!) to complement their per-instance EC2 fees. Remember: the easier they make it to scale your EC2 instances, the more $ they make...

Looking forward to how the industry will respond....

Tuesday, May 12, 2009

Profiling questions nobody's asking re: cloud applications

I find it odd that so much is being written about defining cloud terminology, cloud operation, and cloud construction... But so little attention is being paid to identifying & profiling which applications are best-suited to actually run in an external "cloud."

I'd like to see a comprehensive list (or decision-tree?) of the ideal application properties that predispose apps to being well-suited to run in a cloud. And, for that matter, the qualities that might disqualify apps from running in a cloud as well. (BTW, a great blog by John Willis, Top 10 reasons for not using a cloud, was another initiator of my thought process.)

Customers of mine are attracted to the value prop of a cloud (or a "utility" infrastructure)... but need guidance regarding what apps (or components) should be candidates for these environments. And recent conversations with prominent analysts haven't helped me... yet.

I'm also surprised that consulting/service companies aren't all over this issue... offering advice, analysis and infrastructure "profiling" for enterprises considering using clouds. Or am I missing something?

So, without further ado, I've begun to jot down thoughts toward a comprehensive list of application properties/qualities by which we could "rank" an application for its appropriateness to be "outsourced" to a cloud. I've annotated each factor with a "Y" if the app is appropriate for an external cloud, an "N" if not, and an "M" for maybe.

Dynamics/Cyclicality
  • Y Apps with highly dynamic (hourly/daily/etc.) or unpredictable compute demand.
    A cloud's elasticity ensures that only enough capacity is allotted at any given time, and released when not needed. This avoids having to buy excess capital for "peak" periods.
  • M Apps where compute demand is "flat" and/or constant.
    Not clear to me if it makes sense to outsource an app if its demand is "steady-state" - maybe just keep it in-house and purring along?
  • M Apps where demand is highly counter-cyclical with other applications
    In other words, if an application runs out-of-phase with other apps in-house (say, backup apps that run in the middle of the night when other apps are quiescent), then it might make sense to keep it in-house... it makes better use of existing capital, assuming that capital can be re-purposed.
Size / Temporality
  • Y Apps that are very "big" in terms of compute need, but which "go away" after a period of time
    Such as classic "batch jobs", whether they're daily trade reconciliations, data mining projects, scientific/computational, etc.
  • N Apps that are "small" and constant in nature
    "Edge" apps such as print services, monitoring services, etc.
  • Y Apps that are part of test environments
    Apps - and even entire environments - which are being tested prior to roll-out in production. This approach eliminates costly/redundant staging environments
  • M Apps for dev and test uses
    For example, where environment and regression testing is concerned, and environments are built and torn-down frequently. However, certain environments are inherently bound to hardware and/or tested for performance, and these may need to remain "in-house" on specific hardware (see below)
Inherent application functionality
  • N Apps that are inherently "internal"
    Such as internal back-up software, "edge" applications like printer servers, etc.
  • N Apps that are inherently bound to hardware
    Such as physical instances of software for specific (e.g. high-performance) hardware, or physical instances remaining so for scale-out reasons. Also, physical instances on ultra-high-reliability hardware (e.g. carrier-grade servers).
Responsiveness/Performance
  • N Apps needing high-performance, and/or time-bound requirements
    Such as exchange trading algorithms, where response and delay (even down to microseconds) are critical, and need to be tightly monitored, controlled and optimized.
Security / Auditability / Regulatory / Legal
NB: Also see an excellent Blog by James Urquhart on regulatory issues in this space.
  • M Apps where data must be maintained within (or outside of) specific country borders
    Data within certain borders may be subject to search/access by the government (e.g. Patriot Act). Data may be required to be maintained within sovereign borders... or it may be preferred that the data explicitly be maintained outside of sovereign borders to avoid such searches.
  • M Apps requiring tight compliance/auditability trails
    Ordinarily, I'd give this an "N", but some tools are coming out that help ensure compliance for apps that exist in the cloud. Apparently HIPAA regulations essentially prohibit use of clouds right now.
    Stay tuned here.
  • N Apps manipulating government data, e.g., where laws require direct data oversight
    Many government databases are required to be maintained within government facilities behind government firewalls.
  • N Apps where software licensing prohibits cloud
    e.g. some software licensing may be tied to specific CPUs; some licensing may not permit virtualization (as is found in most clouds); certain licensing may not permit use outside of specific physical domains.
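One way to make a checklist like this actionable is to encode it as a simple scoring rubric. Below is a rough sketch with entirely hypothetical factor names and weights, purely to illustrate the idea of "ranking" an app's cloud suitability:

```python
# Each factor is answered True/False for a given application. Y-type factors add
# to the score, N-type factors subtract; M-type factors are left for human review.
FACTORS = {
    "dynamic_or_unpredictable_demand": +2,   # Y
    "large_but_temporary_batch_work":  +2,   # Y
    "test_or_staging_environment":     +1,   # Y
    "inherently_internal_or_edge_app": -2,   # N
    "bound_to_specific_hardware":      -2,   # N
    "microsecond_latency_requirements": -2,  # N
    "licensing_prohibits_cloud_use":   -3,   # N
}

def cloud_suitability(app_profile: dict) -> int:
    """Sum the weights of the factors that apply; higher = better cloud candidate."""
    return sum(weight for factor, weight in FACTORS.items() if app_profile.get(factor))

nightly_reconciliation = {
    "dynamic_or_unpredictable_demand": True,
    "large_but_temporary_batch_work": True,
    "bound_to_specific_hardware": False,
}
print(cloud_suitability(nightly_reconciliation))   # 4 -> a reasonable cloud candidate
```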
Curious to hear whether these are some of the properties being taken into account... and what other pointers people have. And most of all, curious to hear whether (and if so, which) service providers & consultancies are currently using guidelines such as these.

Wednesday, May 6, 2009

In their own words: Valley CTOs' Blogs & Tweets

I noticed that I subscribe to feeds from a number of CTO-like folks, so I thought I'd publish a few of my favorites:

Sun's CTO,
Greg Papadopoulos - Blog
I love listening to Greg - definitely a visionary, definitely well-connected. He's the guy I referenced back in 2007 when he observed that The World Only Needs 5 Computers. It looks like this concept is really coming true.
Cisco's CTO, Padmasree Warrior - Blog - Twitter
Padmasree's blog started getting major traction when Cisco "leaked" their UCS system. As Cisco's visionary, she has unbelievable insight into where IT is going, with a great sense of "humanity" and realism thrown in.
Amazon's CTO, Werner Vogels - Blog - Twitter
What can I say? As the chief spokesperson for Amazon Web Services, he does a great job championing All Things Cloud. I had the opportunity to see him a few months ago at the Cloud Computing Expo in NY, and his vision is compelling.
Intel CTO, Justin Rattner - Blog
While only an occasional Blogger, he definitely reflects Intel's position on a number of issues and technologies.
BMC Software CTO, Tom Bishop - Blog
Tom has a great way of posing thought questions and industry issues from an Enterprise Management perspective. He doesn't seem to get lots of comments on his Blog, though. Wonder why. :|
Novell's CTO, Jeff Jaffe - Blog
A frequent blogger (and, I might add, so is Novell's CMO). I like this blog b/c I think it gives a real sense of where his (and Novell's) mind is at.
HP's Cloud Services CTO, Russ Daniels - Blog
Russ is HP's visionary in the cloud/SaaS space. Very cool guy. Unfortunately his Blog reads more like Twitter updates. But he's posted a few interesting videos to the web recently.

These are the guys I track... and I'm surprised that more tech companies either don't have official CTOs, or don't tend to condone Blogging/Tweeting.

Which others do you follow?

Tuesday, May 5, 2009

Infrastructure Orchestration in use within SPs & Hosting providers

For the past few months I've held that new technologies are OK... but the litmus test is whether they're actually used and valuable in the real world.

One of those new technologies in the Enterprise Data Center space is what I call Infrastructure Orchestration (others term it fabric computing or unified computing). HP, IBM and now even Cisco have solutions in the space, but I believe Egenera has been doing it the longest, and has the broadest installed base of enterprises using it in the real world and expanding their footprint.

With the explosive growth of virtualization, this segment of technology is hotter than ever. Why? In the same way virtualization abstracts & configures the software world (O/S, applications, etc.), Infrastructure Orchestration abstracts and defines/configures the infrastructure world (I/O, NIC cards, HBA cards, storage connectivity, LANs, switches, etc.). So, not only can you define a virtual server instantly, you can define a *physical* server (maybe a virtual host, or a physical machine) down to its I/O, NICs, storage and network. By doing this, you can reconstruct an entire data center -- giving you a unified approach to HA and/or DR. Cool.
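To make the abstraction concrete, here's a hypothetical "logical server" profile of the kind an orchestration layer might hold. The field names are purely illustrative (not Egenera's, HP's, IBM's or Cisco's actual schema); the point is that because the definition lives in software, it can be re-applied to a different blade for HA, or replayed at a DR site:

```python
# A logical server definition: everything that makes the machine "that machine"
# except the physical blade it happens to run on at the moment.
logical_server = {
    "name": "erp-db-01",
    "cpu_cores": 8,
    "memory_gb": 32,
    "nics": [{"network": "prod-vlan-120", "mac": "assigned-by-fabric"}],
    "hbas": [{"san_fabric": "fabric-A", "wwn": "assigned-by-fabric"}],
    "boot_volume": "san-lun-0042",   # boots from SAN, so no local state is stranded
}

def assign_to_blade(profile: dict, blade_id: str) -> dict:
    """Bind the logical definition to whichever physical blade is available."""
    return {**profile, "blade": blade_id}

# Normal operations, then fail-over / disaster recovery: same definition, new hardware.
primary = assign_to_blade(logical_server, "chassis1-blade07")
recovered = assign_to_blade(logical_server, "dr-chassis3-blade02")
print(primary["blade"], "->", recovered["blade"], recovered["boot_volume"])
```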

I've been pointing out applications for this technology in Healthcare as well as in the Financial sector, and I thought it would also be useful to illustrate value in the Service Provider / Hosting market.

For this segment, the Infrastructure Orchestration approach is essentially used to build Infrastructure-as-a-Service, or IaaS. In the past it's been called "utility computing" but in the era of cloud computing, this seems to be the term in use.

Savvis
In 2004, Savvis set a goal to become the industry’s first totally virtualized utility computing data center, integrating virtualized servers, storage, networks, and security into an end-to-end solution. Today, the service provider houses over 1,425 virtual servers running on 70 industry-standard Egenera servers, 370 terabytes of storage and 1,250 virtualized firewalls.

As a complement to its managed hosting and colocation business, the company has built huge, scalable service platforms that can be leveraged by multiple clients with full security. This utility approach enables Savvis to charge customers for resources more closely tailored to their actual needs. Each year, more revenues and profits are generated from utility hosting contracts with business and government customers ranging from start-up entrepreneurs to the largest enterprises in the world, enabling Savvis to compete and win against traditional hosting providers and outsourcers.

Albridge Solutions:
Albridge Solutions migrated from UNIX servers to industry-standard servers running Linux and Egenera-based Infrastructure Orchestration. Initially, they considered building a virtualized environment by combining virtualization and management point-products. They discovered, however, that the resulting complexity would be overwhelming. Servers from the industry’s largest vendors were also ruled out since their legacy architectures made virtualization and resource sharing impossible. Today, using industry-standard servers and Egenera's software, Albridge can run any application on any server at any time based on demand... regardless of whether those applications are virtual or native.

Panasonic Electric Works Information Systems Co., Ltd.
Panasonic chose Egenera products to consolidate servers and reduce floor-space. Along with enabling server consolidation, the software is delivering superior high availability (HA) and disaster recovery (DR). Applications running in the data center include an order-processing service for the manufacturing industry, a content delivery system and Electronic Data Interchange (EDI). Based on these results, Panasonic has designated Egenera software as its standard infrastructure virtualization management software for mission-critical processing.

Monday, May 4, 2009

Lessons from Glassblowing for High-Tech marketing

This past weekend I spent 6 hours learning the very basics of glassblowing. It's been on my "bucket list" for quite some time, and when a good friend suggested we try it, I jumped on the opportunity. But what I didn't realize was that there were lessons I got in the studio that are metaphors for my "day job" too.

BTW, the lesson was given at San Francisco's Public Glass studios - a fabulous resource near one of San Francisco's largest artists' communities. Cool galleries but even cooler artisans at work. Glassblowing has been around since ancient Egypt, I think. And in many ways it hasn't changed very much. The tools are still very simple, and the raw materials are still the same. And so begin my observations:

It's way harder than it looks: Nothing beats experience and experimentation, and no amount of watching beats actual doing. You notice this the second you take your first blob of glass from the furnace and simply try to keep it symmetrical and from falling to the floor. You have to develop an intuitive feel for temperature, malleability, and a muscle-memory for working with the material.
All the business books in the world only get you so far. You need to get your hands dirty. And frankly, nothing beats learning from a really good failure. Once you see a product cancelled (or for that matter, a company die) you finally gain a real appreciation for what to do, not just what not to do. Ya' can't get that from a book.
Heat is your friend (but be careful): You find that you only have a minute or two of work time before you need to re-heat a piece. But be wary - you're operating at temperatures above 2,000F (and as high as 2,500F sometimes), which means even standing in front of an oven - even 6 feet away - is something you can only tolerate for a few seconds. Going back-and-forth doesn't give you lots of time to cool off and pat your forehead.
Hype, profile and momentum are what you strive for. But they can be fleeting. When you're "hot" you've got clout, but it dies down quickly. Drumming up conversations - or even controversy - in the social network realm is great. It keeps you in play. Just don't overdo it or you'll be toast. :)
From basic materials & tools can arise massively different implementations: Yes, there are a few different types of glass (some w/higher melting points, clarity, etc.) and a few different tools (basic steel pincers, scissors, wooden shaping cups, and yes - even wet newspaper to help shape). But that's it. Then the creativity begins. How you manipulate the glass viscosity, temperature gradients & selective cooling, layers of glass, color etc. is infinitely variable. The sky's the limit.

And even in tech, the basic marketing principles (the four P's, segmentation, etc.) haven't changed in a long time. But using them in clever/innovative ways is the trick. Making sure you stand-out in the crowd, above the noise-level, is still more of an art than a science. Play with the combinations, repeat them, think about re-combining in new ways. Be creative and brainstorm with others.
Keep things moving! Hot glass is essentially fluid, much like really thick molasses. The second you stop spinning the glass, it'll start sagging. Plus, the really good artisans spin smoothly and transition back and forth smoothly too. Never stop.
And never stop experimenting; never let off on the accelerator with PR, AR, or marketing programs; keep the "buzz" going, keep the plates spinning. If you're complacent or don't have an agenda for next month or quarter, start now.
Cool slowly: Too much thermal change is bad. Big pieces experience internal thermal stresses, and will shatter if cooled too fast. Most pieces have to be cooled in a controlled manner over 24 hours.
And too much change is bad for any organization. Plan to make your go-to-market changes slowly, over many quarters. I've seen organizations that want to radically change marketing themes and messages every quarter (or month!). Give the market at least 2-3 quarters to absorb new positioning/messaging. Unless you're in consumer goods, customer buying cycles can be long... and changing too often will confuse customers.
Work as a team: Big pieces need at least two -- and sometimes as many as four -- people to help. Different pieces need to be prepped, warmed, blown, held, etc. It's choreographed in advance. Everyone knows their job. Running into someone holding a piece of glass at 2,000F can really spoil your day.
Ditto. In business, as in art, working as a team is critical -- it's always good to have frequent status meetings and to over-communicate your actions/intentions. Just because a project looks like you can do it alone doesn't always mean you should. Socialize your efforts even as you're doing the project -- and ask for input even if you may not need it -- getting early ownership from others means buy-in from them too.