Friday, November 14, 2008

EMC and cloud storage

Well, this is more a collection of stuff for me to read later than a real blog post. EMC has announced their Atmos product. Atmos is the software that used to be called Maui. It is used to store data on their "Hulk" product, the high-density, low-cost storage array. So the tubes have been abuzz with info and questions about the product.


For example, Robin Harris has a link and some discussion about the theoretical underpinnings of the product, and the project on which the idea is based. Clearly the EMC product isn't exactly the "OceanStore" product, but it's fairly close.


My favorite storage blogger, Chris M Evans, wrote up some pretty good summaries and links to even more good posts on the topic. He has the same practical questions that I have. How does it actually get done? How is it going to handle failures and do its real work? I'm sure EMC has reasonable methods to handle them, but the idea is different from how we do things today.


My #2 go-to, Beth Pariseau, has an equally compelling look at the Atmos product and a collection of links about it. (Sorry Beth, you get paid to write, so I have to view it with a skeptical 'payola' eye. It's not you, it's me. As such you'll have to settle for my #2 spot, as if it matters to anyone other than me...)


I have the same basic questions a lot of others seem to have:



  • How's it going to work in the real world? How do I, in my point-to-point DS3 world, make this work? If the answer is 'bigger pipes' then they've just eliminated a huge customer base. The large-scale companies would have to buy it and then resell to smaller companies à la Amazon's S3. Well, that hasn't been stellar, and I don't have my data in hand; a company that could be engaging in the next version of credit default swaps does. How do I back this up? Do I need to? How does compliance work?


  • Haven't we heard this song and dance before? Storage as a service has been tried and was pretty much a disaster. People want to own it. They want their hands on it. I think people are willing to tolerate their 'in flight' data being in someone else's hands, but data at rest is another issue altogether. How do we convince CIOs and CTOs that this isn't a remix of a bad B-side single? They've heard SaaS. They've heard Utility Computing. They've heard Grid computing (twice, no less). Now cloud storage is somehow different. I'm not feeling the 'gotta have it' need to run right out and get it.


  • Overhead much? Caching, N levels of replication, rich metadata, single unified namespace... None of this is storage overhead. This is all CPU and network overhead. Of the three components, CPU, network and spindles, they pile on the two that are the hardest to incrementally grow? Where does this caching and metadata 'live'? Who/what maintains consistency without incurring murderous latency?
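To put the DS3 question from the first bullet in rough numbers, here is a back-of-the-envelope sketch. The only hard fact in it is the DS3 line rate of 44.736 Mbit/s; the data sizes and the `efficiency` knob are my own illustrative assumptions, not anything from EMC or a real shop.

```python
# Back-of-the-envelope: how long does it take to push data through a DS3?
DS3_LINE_RATE_BPS = 44.736e6  # DS3 line rate, bits per second


def transfer_hours(size_bytes, link_bps=DS3_LINE_RATE_BPS, efficiency=1.0):
    """Hours to move size_bytes over a link running at the given efficiency.

    efficiency is a made-up knob (0 < efficiency <= 1) standing in for
    framing, protocol overhead and other traffic sharing the pipe.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bps * efficiency) / 3600


if __name__ == "__main__":
    TB = 10**12  # decimal terabyte, in bytes
    for size_tb in (1, 10, 100):  # illustrative sizes only
        hours = transfer_hours(size_tb * TB)
        print(f"{size_tb:>4} TB: {hours:8.1f} hours ({hours / 24:6.1f} days)")
```

Even with the pipe completely to myself, a single terabyte works out to roughly 50 hours, and that is before any replication traffic gets added on top.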


Anyway, I'm sure the EMC partisans will march out with their explanations. As the product actually ships to live customers who are using it for their core unstructured data store (not some stove-piped sub-group within Dell, for example), I'm sure these questions will all get answered. For now, I'm sitting on the sidelines trying to get my mind around what's real and what's marketing fluff.


The other big question I can't seem to answer is: what business problem is this solving? Reliability? Cost? Performance? I have a hard time believing this will help costs or performance, so that leaves reliability? I'm sure I'll have more thoughts as time permits me to read the glut of information flowing out these days.

