Data Longevity, VMware deduplication change over time, NetApp ASIS deterioration and EMC Guarantee

August 18th, 2010
by Christopher Kusek (PKGuild)

Hey guys, the other day I was having a conversation with a friend of mine that went something like this.

How did this all start, you might ask?!? Well, contrary to popular belief, I am a STAUNCH NetApp FUD dispeller.  What that means is, if I hear something said about NetApp by a competitor, peer, partner or customer which I feel is incorrect or just sounds interesting, I take it upon myself to prove or disprove it because, well, frankly… people still hit me up with NetApp questions all the time :) (And I’d like to make sure I’m supplying them with the most accurate and reflective data! – yea that’s it, and it has nothing to do with how much of a geek I am.. :))

Well, in the defense of the video it didn’t go EXACTLY like that.   Here is a little background on how we got to where that video is today :)   I recently overheard someone say the following:

What I hear over and over is that dedupe rates when using VMware deteriorate over time

And my first response was “nuh uh!” Well, maybe not my FIRST response.. but it was quickly followed by “Let me try and get some foundational data”, because you know me… I like to blog about things, and as a result I collect way too much data so that I can validate, understand and accurately say whatever I say :)

The first thing I did was engage several former NetApp folks who are as agnostic and objective as I am to get their thoughts on the matter (we were on the same page!). Data collection time!

For Data Collection… I talked to some good friends of mine regarding how their Dedupe savings have been over time because they were so excited when we first enabled it in the first place (And I was excited for them!)   This is where I learned some… frankly disturbing things (I did talk to numerous guys named Mike interestingly enough, and on the whole all of those who I talked with and their data they shared with me reflected similar findings)

Disturbing things learned!

Yea I’ve heard all the jibber jabber before usually touted as FUD that NetApp systems will deteriorate over time in general (whether it be Performance, whether it be Space Savings) etc etc. 

Well, some of the disturbing things learned, actually coming from the field on real systems protecting real production data, were:

  • Space Savings are GREAT, and will be absolutely amazing in the beginning! 70-90% is common… in the beginning. (Call this the POC and the burn-in period)
    • As that data starts to ‘change’ ever so slightly as you would expect your data to change (not sit static and RO) you’ll see your savings start to decrease, as much as 45% over a year
    • This figure is not NetApp’s fault.  Virtual machines (mainly what we’re discussing here) are not designed to stay uniformly the same in accordance with 4K blocks, so the very fact that they change is absolutely normal; this loss isn’t a catastrophe, it’s a fact of the longevity of data.
  • Virtual Machine data which is optimal for deduplication typically amounts to 1-5% of the total storage in the datacenter.   In fact if we want to lie to ourselves or we have a specific use-case, we can pretend that it’s upwards of 10%, but not much more than that.  And this basically accounts for Operating System, Disk Image, blah blah blah – the normal type of data that you would dedupe in the first place.
    • I found that particularly disturbing because after reviewing the data from these numerous environments… I had the impression VMware data would account for much more!   I saw a 50TB SAN only have ~2TB of data residing in Data stores and of that only 23% of it was deduplicating (I was shocked!)
    • I was further shocked that when reviewing the data that over the course of a year on a 60TB SAN, this customer only found 12TB of data they could justify running the dedupe process against and of that they were seeing less than 3TB of ‘duplicate data’ coming in around 18% space savings over that 12TB.    The interesting bit is that the other 48TB of data just continued on un-affected by dedupe.   (Yes, I asked why don’t they try to dedupe it… and they did in the lab and, well it never made it into production)
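
To put the 60TB example above in whole-array terms, here is a minimal sketch of the arithmetic. The function name is just for illustration, and it assumes the 18% figure is savings measured across the 12TB that was actually deduped (which is how I read the numbers I was given).

```python
def overall_savings(total_tb, deduped_tb, savings_rate):
    """Return (TB reclaimed, fraction of the whole array reclaimed).

    savings_rate is the savings observed on the deduped portion only.
    """
    reclaimed_tb = deduped_tb * savings_rate
    return reclaimed_tb, reclaimed_tb / total_tb


# 60TB SAN, 12TB run through dedupe, roughly 18% savings on that 12TB
reclaimed, fraction = overall_savings(60, 12, 0.18)
print(f"{reclaimed:.2f}TB reclaimed, {fraction:.1%} of the whole array")
```

The point is that a respectable-sounding rate on the deduped volumes shrinks to low single digits once you divide by everything that never gets deduped.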

At this point, I was even more concerned: concerned whether there was some truth to this whole idea that NetApp starts really high in the beginning (Performance/IO way up there, certain datasets with amazing dedupe ratios to start) and then drops off considerably over time, while the EMC equivalent system performs consistently the entire time.

Warning! Warning Will Robinson!

This is usually where klaxons and red lights would go off in my head.    If what my good friends (and customers) are telling me is accurate, then not only will my performance degrade just by merely using the system, but my space efficiency will deteriorate over time as well.    Sure we’ll get some deduplication, no doubt about that!  But the long term benefit isn’t any better than compression (as a friend of mine had commented on this whole ordeal).    After trying to look at this from many angles, I discussed it with my friend Scott, who offered the following analogy and example:

The issue that I’ve seen is this:

Since a VMDK is a container file, the nature of the data is a little different than a standard file like a word doc for example.

Normally, if you take a standard Windows C: drive – like on your laptop – every file is stored as 4K blocks.  However, unless the file size is exactly divisible by 4K (which is rare), the last block has just a little bit of waste in it.  Doesn’t matter if this is a Word doc, a PowerPoint, or a .dll in the \windows\system32 directory, they all have a little bit of waste at the end of that last block.

When converted to a VMDK file, the files are all smashed together because inside the container file, we don’t have to keep that 4K boundary.  Kind of like sliding a bunch of books together on a book shelf eliminating the wasted space.  Now this is one of the cool things about VMware that makes the virtual disk more space efficient than a physical disk – so this is a good thing.

So, when you have a VMDK and you clone it – let’s say create 100 copies and then do a block-based dedupe – you’ll get a 99% dedupe rate across those virtual disks.  That’s great – initially.  NetApp tends to calculate this “savings” into their proposals and tell customers who require 10TB of storage that they can just buy 5TB, dedupe, and have plenty of space.
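
As a rough illustration of the sizing arithmetic in that paragraph, here is a minimal sketch. The 99% and the 10TB/5TB figures come from the text; the lower savings rates in the loop are hypothetical values chosen only to show how quickly the 5TB purchase stops being enough, and both function names are made up for this example.

```python
def clone_savings(copies):
    """Best-case savings when `copies` identical images dedupe down to one."""
    return 1.0 - 1.0 / copies


def physical_needed_tb(logical_tb, savings_rate):
    """Physical capacity needed to hold logical_tb at a given savings rate."""
    return logical_tb * (1.0 - savings_rate)


print(f"100 fresh clones: {clone_savings(100):.0%} savings")

# 0.50 is the rate implied by "buy 5TB for 10TB"; the others are hypothetical
for rate in (0.50, 0.40, 0.30, 0.20):
    need = physical_needed_tb(10, rate)
    print(f"savings {rate:.0%}: 10TB of data needs {need:.1f}TB of disk")
```

Anything below 50% savings means the 5TB box no longer holds the 10TB it was sized for.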

What happens is, that after buying ½ the storage they really needed the dedupe rate starts to break down. Here’s why:

When you start running the VMs and adding things like service packs or patches for example – well that process doesn’t always add files to the end of the vmdk.  It often deletes files from the middle, beginning, end and then  replaces them with other files etc.  What happens then is that the bits shift a little to the left and the right – breaking the block boundaries. Imagine adding and removing books of different sizes from the shelf and making sure there’s no wasted space between them.

If you did a file-by-file scan on the virtual disk (say a Windows C: drive), you might have exactly the same data within the VMDK; however, since the blocks don’t line up, the block-based dedupe, which is fixed at 4K, sees different data and therefore the dedupe rate breaks down.
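
Here is a small sketch of the effect Scott is describing, under loud assumptions: it is a toy model, not ASIS itself, the "disk image" is just random bytes, and it simply assumes the data really does shift by a few bytes (whether a real guest filesystem shifts data this way is exactly the kind of thing worth arguing about in the comments). It fingerprints fixed 4K blocks and compares an identical clone against a clone with five bytes inserted near the start.

```python
import hashlib
import random

BLOCK = 4096  # fixed block size, matching the 4K figure in the text


def block_hashes(data):
    """Fingerprint every fixed-size block of an image."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]


def dedupe_savings(images):
    """Fraction of blocks that need not be stored across all images."""
    hashes = [h for img in images for h in block_hashes(img)]
    return 1 - len(set(hashes)) / len(hashes)


size = BLOCK * 256  # a 1MiB stand-in for a disk image (hypothetical)
base = random.Random(0).getrandbits(8 * size).to_bytes(size, "big")

# Same content, but five extra bytes near the start shift everything after
shifted = base[:100] + b"patch" + base[100:]

fixed_identical = dedupe_savings([base, base])
fixed_shifted = dedupe_savings([base, shifted])
print(f"identical clone: {fixed_identical:.0%} savings")
print(f"shifted clone:   {fixed_shifted:.0%} savings")
```

In this model the shifted clone shares essentially all of its bytes with the original, yet no 4K block lines up any more, so the fixed-block fingerprints stop matching.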

A sliding window technology (like what Avamar does) would solve this problem, but today ASIS is fixed at 4K.
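
For contrast, here is a generic sketch of the sliding-window idea, usually called content-defined chunking. This is NOT Avamar's actual algorithm (nothing here describes its internals); it is a simple gear-style rolling hash, and the `GEAR` table, mask and image sizes are all assumptions for illustration. Chunk boundaries are picked from the content itself, so they move along with the data when bytes are inserted.

```python
import hashlib
import random

# Hypothetical per-byte random values that drive the rolling hash
GEAR = [random.Random(1).getrandbits(32) for _ in range(1)]
_rng = random.Random(1)
GEAR = [_rng.getrandbits(32) for _ in range(256)]
MASK = (1 << 12) - 1  # cut roughly every 4096 bytes on average


def chunks(data):
    """Split data wherever the rolling hash of recent bytes hits the mask."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted upward and out of the masked low bits,
        # so the cut decision depends only on the most recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & MASK == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


def cdc_savings(images):
    """Fraction of chunks that need not be stored across all images."""
    digests = [hashlib.sha256(c).digest()
               for img in images for c in chunks(img)]
    return 1 - len(set(digests)) / len(digests)


size = 512 * 1024  # hypothetical 512KiB stand-in for a disk image
base = random.Random(0).getrandbits(8 * size).to_bytes(size, "big")
shifted = base[:100] + b"patch" + base[100:]

variable_shifted = cdc_savings([base, shifted])
print(f"shifted clone with content-defined chunks: {variable_shifted:.0%}")
```

Only the chunk or two around the inserted bytes differ, so the savings stay close to what an identical clone would give. The trade-off is extra CPU per byte and variable-sized chunks to track.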

Thoughts?

If you have particular thoughts about what Scott shared there, feel free to comment and I’ll make sure he reads this as well; but this raises some interesting questions.   

We’ve covered numerous things in here, and I’ve done everything I can to avoid discussing the guarantees I feel like I’ve talked about to death (linked below) so addressing what we’ve discussed:

    • I’m seeing on average 20% of a customer’s data which merits deduping, and of that I’m seeing anywhere from 10-20% space saved across that 20%
      • Translation: 100TB of data, 20TB is worth deduping, reclaiming 2-4TB of space in total; so even at the top of that range you’d get about 4% space saved overall!
      • Translation: When you have a 20TB data warehouse and you go to dedupe it (You won’t) you’ll see no space gained, with a 100% cost across it.
        • With the EMC Unified Storage Guarantee, that same 20TB data warehouse will be covered by the 20% more efficient guarantee (Well, EVERY data type is covered without caveat)   [It’s almost like it’s a shill, but it really bears repeating because frankly this is earth shattering and worth discussing with your TC or whoever]

For more great information on EMC’s 20% Unified Storage Guarantee – check out these links (and other articles I’ve written on the subject as well!)

EMC Unified Storage is 20% more efficient Guaranteed

I won’t subject you to it, especially because it is over 7 minutes long, but here is a semi funny (my family does NOT find it funny!) video about EMC’s Unified Storage Guarantee, making a comparison to NetApp’s Guarantee.   Various comments are included in the description of the video – Don’t worry if you never watch it… I won’t hold it against you ;)

Be safe out there, the data jungle is a vicious one!   If you need any help driving truth out of your EMC or NetApp folks feel free to reach out and I’ll do what I can :)

SPOILERS!!!

 

You didn’t think I’d leave it that easily!   I definitely encourage conversation and engagement and absolutely want you to join in; some of you are going to read what I said (or completely disregard or gloss over it) and say “I GET AMAZING DEDUPE RESULTS TODAY LOOK AT WHAT I HAVE!” or “MY ENVIRONMENT IS DEDUPING X Y AND Z AT J PERCENT” etc.  No, I get it, I do.   You had BETTER be seeing massive deduplication space savings in your VMware environment.  In fact, I hope you are seeing 10-20% savings in your SQL environments which you’re not compressing by default (Why are you deduping SQL? Yea I know we’re crazy, let’s get past that!) and in your file data you’ll be seeing what.. 35-45% deduped per volume you have allocated or broken down into your CIFS or NFS structures?   I know for the most part exactly what type of space savings you SHOULD see out the gate, and at a fairly decent level what you would typically be seeing over the span of x months or years (depending on rate of change, etc).  Not to mention that I personally love the algorithm which is employed; I think it’s really cool functionally and I understand it far more than most people should.   But this is not the panacea for all things.   And with that in mind I want you to be combative if you are defensive about the savings you’ve gained, ESPECIALLY if you are questioning what that may bode for the future.   I hate nothing more than making a promise like “Look! You’ve saved 80% today, so it can only stay at 80% or above for the future right?” only to find it shattered because there honestly wasn’t enough historical data, let alone usage patterns, to dictate what the future would look like.

As I always have, I encourage you to question, confirm, validate and question again.   I drive a Prius and NetApp is a little like a Prius in a number of ways.   In optimal conditions you can get an expected MPG, and pretty much assume you’ll get it.   Unless you go up hill, or down hill, or it’s too hot out or it’s too cold out; though otherwise you can pretty much expect to get this same MPG! (Even if my MPG deviates by as much as 30% given any number of conditions)     EMC has always been conservative.  If they say you’ll get ‘x’ you have the expectation, nay a guarantee that you will be obtaining ‘x’, that’s something they’ve always been very good at which is why EMC is where information lives. :) (bad tagline reference… ;))

Posted in Avamar, Celerra, CLARiiON, Deduplication, Efficiency, emc, NAS, NetApp, SQL, Storage, Virtualization, vmware, vSphere | Comments (243)

  • But let me set the record straight here…

    I do not do origami…

  • I too have respect for you Vaughn, which is why I hold you to such a high standard! (Though please call me Christopher.)

    You know very well that I absolutely ADORE sharing the specific details of where and when I collected all of my data, citing it very specifically and granularly (perhaps so much so that people get uncomfortable). However, it was for that very reason (the parties involved were not comfortable with me sharing their actual anecdotal data, so as not to get called upon with a “WTF?!”) that I took it upon myself to obfuscate this for their safety.

    Believe me, if I had systems in my own lab (I don’t have any NetApp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would. That is why I took the information I was presented at face value: it accurately reflected Production environment workloads running dedupe over time.

    As to why people aren’t rushing out there to publish blog posts about their deterioration over time: well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.

    I’m going to defer the whole VMDK section to Scott, who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to focus on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here), because it’s not written in my style so to speak :)

    On the point of misalignment, that I know (I consider it a dead horse, so something I don’t even discuss anymore), though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and take it that Scott wasn’t focusing on misalignment of the disks, but on data not being LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.

    So to answer the question: am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.

    I was trying to educate and raise awareness of the following points, which I had not considered when I was at NetApp and which I’m sure often get misrepresented or misunderstood by customers in general.

    1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, the reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?
    2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. data changes so it’s bound to happen right?” And I certainly considered that, but on the scale of 1 or 2%, MAYBE 3%, not on the order of 20-30%+ over the span of say 1 year. This can be rather concerning if I’m delivered a 5TB storage system intended to store up to 10TB of data and find myself needing to go back to the well to grow the system within the first years. It could be said that if I had a 10TB expectation and was sold 10TB to bear it, this wouldn’t be an issue.

    In the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most): sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.

    Back to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you, Vaughn, which is why I hold you to such a high standard! (Though please call me Christopher.)

You know very well that I absolutely ADORE sharing the specific details of where/when/etc. I collected all of my data, and citing it very specifically and granularly (perhaps so much so that people get uncomfortable).
However, that is the very reason the details are missing this time: the parties involved were not comfortable with me sharing their actual anecdotal data, so as not to get called upon with a “WTF?!”. Thus, I took it upon myself to obfuscate it for their safety.

Believe me, if I had systems in my own lab (I don’t have any NetApp filers in here anymore; I returned all of those when I left!) I’d back it up with that. But my “lab” would never reflect a Production environment, and I’d not want to even claim that it would, which is why I took the information I was presented at face value: it accurately reflected Production workloads running dedupe over time.

As to why people aren’t rushing out to publish blog posts about their deterioration over time: well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!”, or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that”, or whatever the case may be. I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.

I’m going to defer the whole VMDK/etc. section to Scott, who wrote that part. I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)), but if you wholly disagree or want to focus on a single point, that’s cool as well. This is one of the reasons I normally don’t bring in outside commentary (as I did here): it’s not written in my style, so to speak :)

To the point of misalignment: that I know (I consider it a dead horse, so something I don’t even discuss anymore), though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue. I’ll consider that a non-issue, and take it that Scott wasn’t focusing on misalignment of the disks, but on the potential for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.

So, to answer the question: am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.

I was trying to educate and raise awareness of the following points, which I had not considered when I was at NetApp and which I’m sure often get misrepresented or misunderstood by customers in general.

1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more of the capacity (which is epic; the savings when realized in a 10TB system will be amazing!), but in a 100TB+ system, the reduction gained through deduplication will be diminished to a small portion of space saved. (Then, when you get into the whole EMC Guarantee of saving 20% blah blah blah, that sheer amount realized starts to really be a wash, and then where are we?)
2. Deduplication savings deteriorate over time. This is something I could certainly consider: “Yea, sure.. data changes so it’s bound to happen, right?” And I certainly considered that, but on the scale of 1 or 2%, MAYBE 3%, not on the order of 20-30%+ over the span of, say, one year. This can be rather concerning if I’m delivered a 5TB storage system intended to store up to 10TB of data and find myself needing to go back to the well to grow the system within its first years. It could be said that if I had a 10TB expectation and was sold 10TB to bear it, this wouldn’t be an issue.

In the future you’ll see a lot fewer competitive-type posts out of me, because I’ll be able to do what I do best (and love most): sharing new technology and advancements across many facets and angles which we can all enjoy and talk about. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environments start to lose savings, then that is the conversation worth having.

Back to the fun of sharing, learning, and collaborating. :)
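
For anyone who wants to play with the numbers behind the two points above, here is a rough Python sketch of the arithmetic. The function names are mine and purely illustrative. I am assuming the 20-30% deterioration means percentage points of savings lost (whether it is absolute or relative is not stated above), and the 100TB/10TB split is a made-up example, not data from anyone's environment.

```python
def physical_tb_needed(logical_tb: float, savings: float) -> float:
    """Physical capacity needed to hold logical_tb at a given dedupe savings ratio.

    savings is a fraction in [0, 1): 0.50 means half the space is saved.
    """
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return logical_tb * (1.0 - savings)


def overall_savings(total_tb: float, vmware_tb: float, vmware_savings: float) -> float:
    """Fraction of the whole system saved when only the VMware portion dedupes.

    Simplification (assumption): non-VMware data is treated as saving nothing.
    """
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    return (vmware_tb * vmware_savings) / total_tb


if __name__ == "__main__":
    # Point 2: 10TB of data on a 5TB system implies 50% savings on day one.
    # Losing 20 or 30 points of savings (assumed interpretation) changes the
    # physical capacity needed for the same 10TB.
    for savings in (0.50, 0.30, 0.20):
        needed = physical_tb_needed(10, savings)
        print(f"savings {savings:.0%}: need {needed:.1f} TB physical")

    # Point 1: hypothetical 100TB system with 10TB of VMware data at 50% savings.
    print(f"system-wide savings: {overall_savings(100, 10, 0.50):.0%}")
```

Swap in your own measured savings before drawing any conclusions; the point is only that a system sized at the day-one ratio has no headroom if that ratio slips.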

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you Vaughn which is why I hold you to such a high standard! (though please call me christopher)nnYou know very clearly that I absolutely ADORE to share the specific details of where/when/etc I collected all of my data to cite it very specifically and granular (perhaps so much so that people get uncomfortable)nHowever, it was for that very reason (The parties involved were not comfortable with me sharing their actual anecdotal data so as not to be getting called upon with a “WTF?!”) Thus, I took it upon myself to obfuscate this for their safety.nnBelieve me, if I had systems in my own lab (I don’t have any netapp filers in here anymore, I returned all of those when I left!) I’d back it up with that, but my “lab” would never reflect a Production environment and I’d not want to even claim that it would, thus why I took the information I was presented at face value which accurately reflected Production environment workloads running dedupe over time.nnAs to why people aren’t rushing out there to publish blog posts about their deterioration over time/etc/etc – Well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. 
I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.nnI’m going to defer the whole VMDK/etc section to Scott who wrote that part – I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well – This is one of the reasons I normally don’t bring in outside commentary (as I did here) because it’s not written to my style so to speak :)nnTo the point of misaligned, that I know (consider that to be a dead horse, so something I don’t even discuss anymore) though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue – I’ll consider that a non-issue, and that Scott wasn’t focusing on Misalignment of the Disks, but on the ability for data to not be LINED up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.nnSo to answer the question of; Am I trying to mislead readers, and/or is any of this misinformation? Not at all on either count.nnI was trying to educate and raise awareness of the following points which I had not considered when I was at NetApp, and that I’m sure oftens gets misrepresented or misunderstood by customers in general.nn1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more capacity (Which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, that level of reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?n2. Deduplicated data deteriorates over time. This is something I could certainly consider “Yea, sure.. 
data changes so it’s bound to happen right?” And I certainly considered that, but to the scale of 1 or 2%, MAYBE 3%, but not on the grade of 20-30%+ over the span of say 1 Year. This can be rather concerning if I’m delivered a 5TB storage system intended to store data of up to 10TB and find myself needing to go back to the well to grow in to the system within the first years of time. It could be said that if I had a 10TB expectation, and sold 10TB to bear this wouldn’t be an issue.nnIn the future you’ll see a lot fewer competitive type posts out of me, because I’ll be able to do what I do best (and love most) sharing new technology and advancements across many facets and angles which we can all enjoy and talk to. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.nnBack to the fun of sharing, learning, and collaborating. :)

  • I too have respect for you, Vaughn, which is why I hold you to such a high standard! (Though please call me Christopher.)

    You know very well that I absolutely ADORE sharing the specific details of where and when I collected my data, and citing it very specifically and granularly (perhaps so much so that people get uncomfortable). That is exactly the issue here: the parties involved were not comfortable with me sharing their actual anecdotal data, because they did not want to be called upon with a “WTF?!” Thus, I took it upon myself to obfuscate it for their safety.

    Believe me, if I had systems in my own lab (I don’t have any NetApp filers in here anymore, I returned all of those when I left!) I’d back it up with that. But my “lab” would never reflect a Production environment and I’d not want to even claim that it would, which is why I took the information I was presented at face value, since it accurately reflected Production environment workloads running dedupe over time.

    As to why people aren’t rushing out there to publish blog posts about their deterioration over time: well, we shall have to see if this stirs up any responses from the community saying “NAY! After x months in Production my environment is holding steady and my space savings are epic!” or if we’ll see “Yea, I’ve noticed a deterioration of ‘x’ percent, but I’ve grown to expect that” or whatever the case may be. I’ll reach out to some of my friendly sharers of the data to see what their thoughts are and whether they’re interested in sharing their results publicly.

    I’m going to defer the whole VMDK/etc section to Scott, who wrote that part. I was mainly using it for the anecdote of the books on a shelf (I thought that was cute ;)) but if you wholly disagree or want to centralize on a single point, that’s cool as well. This is one of the reasons I normally don’t bring in outside commentary (as I did here): it’s not written to my style, so to speak :)

    To the point of misalignment: that I know (I consider it a dead horse, so it’s something I don’t even discuss anymore), though I was one of the folks pushing for alignment recognition looooong before NetApp or EMC even recognized it formally as an issue. I’ll consider that a non-issue, and my understanding is that Scott wasn’t focusing on misalignment of the disks, but on the fact that data does not line up EXACTLY as you scale across VMs. If I am mistaken that’s cool, and I’ll be glad to be corrected.

    So to answer the question: am I trying to mislead readers, and is any of this misinformation? Not at all on either count.

    I was trying to educate and raise awareness of the following points, which I had not considered when I was at NetApp and which I’m sure often get misrepresented or misunderstood by customers in general.

    1. VMware data accounts for a small portion of the environment. Certainly in VERY small deployments it may account for more of the capacity (which is epic, the savings when realized in a 10TB system will be amazing!) but in a 100TB+ system, the reduction gained through deduplication will be diminished to a small portion of space saved. (Then when you get into the whole EMC Guarantee of saving 20% blah blah blah) that sheer amount realized starts to really be a wash, and then where are we?

    2. Deduplicated data deteriorates over time. This is something I could certainly consider: “Yea, sure.. data changes so it’s bound to happen right?” And I certainly considered that, but on the scale of 1 or 2%, MAYBE 3%, not on the order of 20-30%+ over the span of, say, 1 year. This can be rather concerning if I’m delivered a 5TB storage system intended to store up to 10TB of data and find myself needing to go back to the well to grow the system within its first years. It could be said that if I had a 10TB expectation and was sold 10TB to bear it, this wouldn’t be an issue.

    In the future you’ll see a lot fewer competitive-type posts out of me, because I’ll be able to do what I do best (and love most): sharing new technology and advancements across many facets and angles which we can all enjoy and talk about. But like I said earlier in the post, I was asked to produce this by my friends/customers in order to GET the conversation started within the community. If no one other than these folks saw deterioration after a year, that would be the absolute best; but if others are indeed seeing their VM environment start to lose.

    Back to the fun of sharing, learning, and collaborating. :)
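
    To put rough numbers on points 1 and 2, here is a minimal Python sketch. It is an illustration only, not anything from the data I was given: the helper names `logical_capacity_tb` and `overall_savings` are made up for this example, the 10% VMware share and 50% VMware savings are hypothetical, and I am reading "deterioration" as the savings percentage itself dropping (say from 50% down to 30% or 20%).

```python
def logical_capacity_tb(physical_tb: float, savings: float) -> float:
    """Logical data that fits on `physical_tb` when dedupe removes a
    fraction `savings` (0 <= savings < 1) of the logical data."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in [0, 1)")
    return physical_tb / (1.0 - savings)


def overall_savings(vmware_share: float, vmware_savings: float) -> float:
    """Array-wide savings when only the VMware share of the data dedupes.
    Simplifying assumption: everything else sees no savings at all."""
    return vmware_share * vmware_savings


if __name__ == "__main__":
    # Point 2: a 5TB system sized for 10TB of data implies 50% savings.
    # Hypothetical deteriorated savings levels: 30% and 20%.
    for savings in (0.50, 0.30, 0.20):
        fit = logical_capacity_tb(5, savings)
        print(f"savings {savings:.0%}: {fit:.2f} TB of data fits on 5 TB")

    # Point 1: hypothetical array where VMware is 10% of the data
    # and that slice dedupes at 50%.
    print(f"array-wide savings: {overall_savings(0.10, 0.50):.0%}")
```

    The only thing this is meant to show is the shape of the problem: the amount of data that fits is physical capacity divided by (1 - savings), so a system sized tightly against an expected savings rate comes up short quickly if that rate drops.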

