Sunday, May 29, 2011

It's All Just Data to Me

Now that I’ve been with EMC for a few months, my relationship to storage, computing, and networking has once again shifted. And, in the context of the cloud computing operations model, my relationship to the physical location of data, and to the processing of that data, has shifted too.

My new perspective starts with Computer Science 101: at its heart, computing is simply data and instructions (stored on similar media) that are combined on a device (the CPU) to produce an output.
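To make the “data and instructions on similar media” idea concrete, here is a minimal sketch of a von Neumann-style toy machine in which one list holds both the program and the numbers it operates on. The instruction set (LOAD, ADD, STORE, HALT) and the function name run are hypothetical, invented for illustration; this is not a model of any real CPU.

```python
from typing import List, Union

# One flat "memory": instructions (tuples) and data (ints) share the same storage.
Cell = Union[int, tuple]

def run(memory: List[Cell]) -> List[Cell]:
    """Execute a hypothetical toy program held in `memory`.

    Instructions are tuples such as ("LOAD", addr); data cells are plain ints.
    The list is modified in place and returned once the program halts.
    """
    acc = 0  # accumulator register
    pc = 0   # program counter: address of the next instruction
    while True:
        op, *args = memory[pc]  # fetch and decode
        pc += 1
        if op == "LOAD":        # acc <- memory[addr]
            acc = memory[args[0]]
        elif op == "ADD":       # acc <- acc + memory[addr]
            acc += memory[args[0]]
        elif op == "STORE":     # memory[addr] <- acc
            memory[args[0]] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")

# Addresses 0-3 hold the program; addresses 4-6 hold the data.
program_and_data: List[Cell] = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT",),
    2, 3, 0,
]
result = run(program_and_data)
print(result[6])  # prints 5
```

The point of the sketch is only that nothing distinguishes “code” from “data” except how the CPU (here, the loop) chooses to interpret each cell.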

This model has held since computing began, but as data and instructions grew in size and abstraction, the media changed to the point where instructions (code) and data were stored in physically separate locations.

Until recently, data and instructions were transported over the network to individual physical CPUs, each with its own OS, where they were combined and executed. The resulting data was then generally transported back to where it resided.

Servers are Just Bits

Now enter the virtual machine. At heart, a VM is simply another file (e.g. a VMDK); in other words, just more data.

So in the modern virtualized data center, taken to the extreme, we have a model in which not only the data and instructions are bits, but the servers are bits too. All they require is a physical CPU to execute on.

In the ‘traditional’ model, the data and instructions were brought to where the physical servers and OS were. But today, with pervasive farms of generic physical servers, we have a situation where *either* the data bits can be brought to the server, or the server bits can be brought to the data.

You’ve probably already thought of some of the implications, such as using vMotion to move a VM from one physical server to another, or using a DRS-style control to relocate VMs from failed physical compute resources to healthy ones.

But consider another situation that’s happening with increasing frequency: the need to work with “Big Data”, such as running analytics on unstructured bits at terabyte to petabyte scale. Here is a case where it makes more sense to send Mohamed to the mountain than the other way around: to literally relocate the servers (which are, after all, just data themselves) closer to, or co-located with, the data.
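A back-of-the-envelope calculation shows why moving the server to the data can win. The figures below (a 20 GB VM image, a 50 TB dataset, a 10 Gbit/s link) are illustrative assumptions, not measurements, and the helper transfer_seconds is hypothetical; it models an idealized link and ignores protocol overhead, compression, and contention.

```python
GB = 10**9   # bytes (decimal units)
TB = 10**12  # bytes

def transfer_seconds(size_bytes: int, link_gbps: float) -> float:
    """Idealized time to push size_bytes over a link of link_gbps gigabits/s."""
    bits = size_bytes * 8
    return bits / (link_gbps * 10**9)

# Assumed, illustrative sizes; not taken from any real deployment.
vm_image = 20 * GB
dataset = 50 * TB
link = 10.0  # Gbit/s

move_vm = transfer_seconds(vm_image, link)    # bring the server bits to the data
move_data = transfer_seconds(dataset, link)   # bring the data bits to the server

print(f"move VM to data: {move_vm:.0f} s")
print(f"move data to VM: {move_data / 3600:.1f} h")
print("relocate the server" if move_vm < move_data else "relocate the data")
```

Under these assumptions the VM image moves in about 16 seconds, while the dataset would take roughly 11.1 hours over the same link, which is exactly the “send Mohamed to the mountain” case.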

Or consider a “follow-the-moon” strategy for data center energy efficiency, in which workloads are shifted to wherever the most energy-efficient (and least expensive) physical servers are available, typically data centers where it is currently night and power and cooling cost less. Once again, the data (which includes the virtual server, its data, and its instructions) is simply transported to the optimal set of physical processing resources.
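The selection step behind follow-the-moon placement can be sketched as a simple minimum over candidate sites. Everything here is a hypothetical assumption for illustration: the site names, electricity prices, PUE (power usage effectiveness) values, and the cost formula. A real scheduler would also weigh latency, spare capacity, and the cost of moving the bits themselves.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Site:
    name: str
    price_per_kwh: float  # current electricity price (hypothetical figures)
    pue: float            # power usage effectiveness: facility power / IT power

def energy_cost(site: Site, it_kwh: float) -> float:
    """Cost of delivering it_kwh of IT energy at this site, including facility overhead."""
    return it_kwh * site.pue * site.price_per_kwh

def pick_site(sites: List[Site], it_kwh: float) -> Site:
    """Follow-the-moon style choice: the site where running the workload is cheapest now."""
    return min(sites, key=lambda s: energy_cost(s, it_kwh))

sites = [
    Site("site-a (daytime)", price_per_kwh=0.18, pue=1.6),
    Site("site-b (night)", price_per_kwh=0.07, pue=1.3),
    Site("site-c (evening)", price_per_kwh=0.12, pue=1.4),
]
best = pick_site(sites, it_kwh=500.0)
print(best.name)  # prints "site-b (night)"
```

With these made-up numbers the three sites cost 144, 45.5, and 84 currency units respectively, so the VM (and its data) would be shipped to the night-time site.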

Cloud Infrastructure and Data Management

From where I sit, data storage, data management, and data portability suddenly become paramount. It can reasonably be argued that physical servers are now merely execution platforms for the VM data bits, and that the network is simply becoming flatter and fatter.

So the future data center and cloud model might be thought of as a data management problem: where and how to locate bits, back up bits, scale bits, and operate on bits. True, this is a data-centric view of the world. But it's also a healthy perspective from which to view the renewed importance of data and its dynamics, compared with the other, more static components of the data center.