Monday, May 20, 2013

What Clouds Will Form Around Data's Gravity?

The concept of Data Gravity posits that as data accumulates (whether stored, analyzed or used), it tends to attract even more similar data. And as the data amasses, it becomes less likely that it will be moved or migrated elsewhere. If you're not already familiar with the concept, definitely check out Dave McCrory's excellent blog posts, analyses and presentations on the topic.

I believe this data aggregation concept can also apply to attracting computing. As I mentioned in "It will be a data-centric (cloudy) world," there are examples today where special-purpose compute clouds are already forming around special-use data sets... sometimes intentionally, sometimes organically. One example I frequently point out is the NYSE Capital Markets Community Platform, a special-purpose cloud computing environment formed with a massive market trading data set at its core.

Service providers and enterprises alike increasingly ask me what other businesses and special-purpose clouds might form around data. What new clouds (and associated business models) might we build and monetize? How can we better serve vertical market needs in the Cloud?

Forming Community Clouds - Applied (vs. Theoretical) Data Gravitational Theory

After more thought and conversations with experts on the topic, I want to offer some examples and ideas that I hope will trigger further exploration by cloud and service providers. Perhaps there are (or will be) new businesses based on some of these ideas of attracting data and computing.

Financial Services Community Cloud: As I've mentioned, the NYSE CMCP has at its core a huge database of stock market history. It's a natural attractor for trading firms and hedge funds, which co-locate their compute loads near this data as they test and refine trading algorithms and prediction methods. High-performance processors with low-latency connections to Wall Street don't hurt the model either. Perhaps other forms of gravitational financial data (other markets?) could attract similar compute clouds?

Photography/Imagery Community Cloud: More and more companies (Shutterfly, SmugMug, EverPic, stock photography companies, etc.) are in the business of warehousing photos, mostly for simple monetization. But some innovative photo collections could take advantage of this by providing a co-located compute platform where ISVs offer higher-level photo identification, cataloging, enhancement and even geo-tagging services. Perhaps those compute services could draw on knowledge of the larger database of images that have previously been tagged or otherwise cataloged within the community. [Bonus thought experiment: create a shared medical imagery cloud].

CRM and Customer Insight Community Cloud: Consider the amount of customer data located on and others. Now consider the amount of consumer behavior information collected by systems like Marketo and others. What if one of these giants began to acquire additional firms that house complementary marketing data, and began to build valuable "big data" around customer behavior? The customer data would then attract even more marketing and consumer behavior application workloads, which in turn would attract more data and workloads.

The Retail Community Cloud: Start watching what Walmart Labs and Nielsen are doing in the Big Data and retail analytics space. It would be but a small jump for either to create a retail cloud centered around a huge (but perhaps anonymized) database of consumer purchasing patterns, geographies, pricing and outlets. Monetize it by allowing co-location of analytics workloads from marketing firms seeking insights into better micro-marketing, associative/recommendation sales, and other forms of retail analytics engines. All retail firms, great and small, would want a piece of that action.

The Energy Community Cloud: What would it be worth to amass data about energy consumption -- at the customer level -- across the country? Perhaps associate those users with industry/SIC codes, zip codes, electricity prices and/or electricity source renewability (or carbon footprint)? No single utility has this data, but firms such as EnerNOC monitor consumption data across the country. What if they developed a cloud that encouraged co-location of workloads and businesses that take advantage of this data, such as monitoring which businesses are really "greenest", which vertical industries are growing fastest, or where alternative energy sources would be most attractive? Add information such as energy efficiency programs to the database, or overlay it with data about alternative (wind, solar, geothermal) energy generation. The data at the core could attract compute workloads from other energy, efficiency, and economic monitoring businesses.

And more clouds: As I've mentioned before, I could see this transforming both the cloud service provider ecosystem and entire industry groups. Consider new Cloud Service Provider models: What if NOAA formed the Weather and Atmospherics Community Platform? If healthcare companies created federated Medical Records Community Platforms? If the USGS formed the World Geologic Community Platform? If other brokerages created equivalent capital markets platforms?

Building a Community Cloud with Gravity
The next natural question is how one might go about building a community cloud or "special purpose" data repository and associated compute cloud, be it around a vertical industry or a specialized data type. In my opinion, each such cloud (business) would need a few properties:
  • Data sets that become more valuable as they grow and become more diverse - and of course which generate additional gravity of their own
  • Business models that monetize the data - and perhaps generate additional derivative data. (In some instances the data may need to be anonymized).
  • Workloads that need to be co-located near the large (gravitational) data sets because they access them frequently
  • Privacy, security and regulatory controls specific to the industry and/or data type, provided and enforced globally
  • Industry-specific sales & marketing - presumably each community cloud would have appeal to specific verticals, markets or industry groups. Driving demand / awareness within these markets is of course critical.
If you know of community clouds based on data gravity, please share. In my opinion, we'll see dozens of these special-purpose clouds form around data sets in the coming years.

