HP StoreOnce D2D – Understanding the challenges associated with REALLY BAD NETAPP FUD

Hey guys! I was sitting here today, minding my own business… when the following tweet showed up in one of my search columns! (Why yes, I do search on NetApp, and on every major vendor in the industry that I know a real lot about; I like to stay topical! Oh, and I RT job opportunities… I know peoples ;))

#HP - Understanding the Challenges Associated with NetApp's Deduplication http://tek-blogs.com/a/sutt9r @TekTipsNetHawk 

So I thought “Well Hey! I’d like to understand the challenges associated with NetApp’s Deduplication! Let’s get down to business!”

I click the little link, which takes me to THIS PAGE, where I fill out a form to receive my “Complimentary White Paper”. Ooh, yay! And let me tell you, other than the abusive form (Oh lovely… who makes people fill out FORMS for content… yea I know, I know..), this thing looked pretty damn sweet! FYI: by sweet, I mean it looks so professional, so nice, like a solid marketing group got their hands on it and prettified it! I mean, look at it!

HP StoreOnce D2D - Understanding the Challenges Associated with NetApp Deduplication - Business White Paper

Tell me that doesn’t look damn professional! Hell, on a first pass with NO background knowledge, I’d even take everything in that document at face value as the truth. I mean c’mon, let’s cover the facts here.

  1. This whitepaper looks SWEET! It’s all logo’d out and everything too!
  2. It’s only 8 pages; that speaks of SOLID content including not only text, but pictures and CITING evidence! Sweet right?!
  3. And you said it: right there on the first page it says “BUSINESS WHITE PAPER”. Tell me that does not spell PRO all over it.

So what I’m thinking is, clearly this has been vetted by a set of experts who have validated the data and ensured that it is correct, at least as of the date in the footer, which claims the document was published in January 2011. So this CLEARLY should be current.

Yea… No. Not Quite.  Quite the opposite? I guess it may be time to explain though! But before I go there, Disclaimer time!

HP’s Disclaimer at the bottom of the document:

© Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

My Disclaimer for what you’re about to read:

I do not work for HP and I have nothing against HP. I do not work for NetApp and have nothing against NetApp. Yea, I work for EMC. Wait, aren’t you the competition?! WHY ARE YOU RAGGING ON HP FOR THEIR POORLY WRITTEN PAPER?! I think it falls in line, because when *I* publish something attacking NetApp’s deduplication, I do the homework and validate it (except for when I quote external third parties… yea, I don’t do that anymore because… you end up with a mess like this document HP has released ;)). OMG Seriously?! Seriously HP!? You’ve spurred me to write this because you upset my competitive nature. With that said, let’s get down to brass tacks.

Secondary Disclaimer: I had forgotten that I originally read this when the post HP Launches an Unprovoked Attack on NetApp Deduplication came out, and you know what? Between seeing it circulate AGAIN and having to fill out a form… yea, Sean, following bad data with bad data is #fail either way.

Tertiary Disclaimer: a lot of the ‘concerns’ and ‘considerations’ addressed in the HP paper, which HP claims StoreOnce, the bee’s knees, can solve, are actually readily solved with industry best-of-breed Avamar and Data Domain, not to mention VNX deduplication and compression, but I won’t go there because that is outside the boundaries of this particular post :)

The paper is broken down into sections along the lines of “Challenge #, blah blah blah, maybe some cited evidence, Takeaway”. I plan to… give you the gist of the paper without quoting it verbatim (that would just be reproducing the paper!) but also without removing the context, sprinkling in commentary and sarcasm as needed ;)

Challenge #1:  Primary deduplication: Understanding the tradeoffs

This section has a lot of blah blah blah in it, but I’ll quote two passages that have CITED references:

While some may find this surprising given the continuing interest in optimization technologies, primary deduplication can impose some potentially significant performance penalties on the network.[1]

Primary data is random in nature. Deduplicating data leads to various data blocks being written to multiple places. NetApp’s WAFL file system exasperates the problem by writing to the free space nearest to the disk head. Reading the data involves recompiling these blocks into a format presentable to the application. This data reassembly overhead mandates a performance impact, commonly 20–50 percent.[2]

I particularly love this section for two reasons. One, it’s VERY solid in its choice of words: “can impose,” not “will impose.” It’s like “maybe?!?” This is not a game of “can I have a cookie” vs. “may I have a cookie”; this is a white paper, right? Give me some facts to work off of, guys. Oh, I said two reasons, didn’t I? Well, here is Reason #2. Here’s the citation! [1 End Users Hesitate on Primary Deduplication; TheInfoPro TIP Insight, October 21, 2010] I’ll chalk it up to the possibility that I am clearly an IDIOT, but I was unable to find the “source” of this data. So… soft language… and no way to validate the point. Sweet!

But wait, let me discuss the second citation for a second, yea, let me do that. I won’t go into WTF they’re saying or how they’re citing it, as this is not an extensive, deep analysis of how WAFL and Data ONTAP operate, but I thought “Whoa, excellent backing data! Let me check out that citation, shall I?!” So I go to the source [2 Evaluator Group, August, 2010] and I find… I can pay $1,999 to get this data! Excellent! The first idea that came to mind: “I should write stupid papers and then sell the data at MASSIVELY high costs… nah, I’ll stick to factual blog posts.” Yea, so I’m 0 for 2 in being able to “validate” whatever these sources happen to be sharing, and I’m sure you’ll be in the same boat. Oh, but the best part? Let’s take a moment and read the Takeaway, shall we?!

Takeaway – Deduplication is often the wrong technology for data reduction of primary storage.

OMG SERIOUSLY? THAT IS SERIOUSLY YOUR TAKEAWAY?! It’s like a cake made of layers of soft language, filled with unverifiable sources. And it’s not even very GOOD FUD, it’s just so… Ahh!!!!!! A number of us (non-NetAppians) got so pissed off when we read this, I mean SERIOUSLY?!?

Relax.. Relax, it can’t get any worse than that right?

Challenge #2: Fixed vs. variable chunking

Wow, this reads like an advertisement for Avamar. But seriously, for the most part this section only discusses the differences between fixed and variable chunking; it’s more educational than anything. Not a whole lot for me to discuss other than noting how similar their message is to that of the industry-leading Avamar.

Takeaway – Using variable chunking allows HP StoreOnce D2D solutions to provide a more intelligent and effective approach for deduplication.

Wow Christopher, you’re getting tame.. you let them slide on that one!
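Since the paper never actually shows why variable chunking matters, here is a minimal Python sketch of the usual illustration (not HP’s, Avamar’s, or anyone’s real implementation): insert one byte near the front of a file and count how many chunks can still be deduplicated. The gear-style rolling hash, the mask, the 256/4,096/8,192-byte sizes, and every function name below are assumptions made up for illustration.

```python
import hashlib
import random

# Hypothetical parameters for illustration only (not any vendor's real values).
FIXED_SIZE = 4096
MIN_SIZE, MAX_SIZE = 256, 8192
MASK = 0x3FF << 20  # 10 mask bits -> a cut roughly every 1 KiB past MIN_SIZE

# "Gear" table: one pseudo-random 32-bit value per byte value (seeded, repeatable).
_rng = random.Random(2011)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def fixed_chunks(data: bytes, size: int = FIXED_SIZE) -> list[bytes]:
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a gear-style rolling hash.

    Bits 20..29 of `h` depend only on the last 30 bytes, so whether a byte
    ends a chunk is decided by local content, and cut points re-synchronise
    shortly after an insertion.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def shared_fraction(before: list[bytes], after: list[bytes]) -> float:
    """Fraction of `after` chunks whose fingerprint already exists in `before`."""
    known = {hashlib.sha256(c).digest() for c in before}
    return sum(hashlib.sha256(c).digest() in known for c in after) / len(after)


original = random.Random(7).randbytes(64 * 1024)  # randbytes needs Python 3.9+
edited = original[:100] + b"X" + original[100:]   # insert one byte near the front

fixed_shared = shared_fraction(fixed_chunks(original), fixed_chunks(edited))
variable_shared = shared_fraction(variable_chunks(original), variable_chunks(edited))
print(f"fixed chunks reused:    {fixed_shared:.0%}")
print(f"variable chunks reused: {variable_shared:.0%}")
```

With fixed 4 KB chunks every boundary after the insertion shifts by one byte, so essentially nothing should match; the content-defined cut points depend only on the last few dozen bytes, so they re-synchronise just past the edit and nearly every later chunk should be reused. Real products differ in hash function, chunk sizes, and boundary rules.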

Challenge #3: Performance issues and high deduplication ratios

NetApp suffers performance issues with high deduplication ratios; something NetApp engineers said on a post to the NetApp technical forum.[3]

NetApp is so concerned about the performance of their deduplication technology that Chris Cummings, senior director of data protection solutions for NetApp told CRN customers must acknowledge the “chance of performance degradation when implementing the technology” should they turn on the technology.[4]

Okay, sweet! Let’s rock this out! Not only do they have CITED sources for this data (you know I love it when I have data to refer to!), but they even provide embedded links so I can click straight through to the data! (WOOHOO!) And like any good detective… I did visit those links. Upon visiting them, two things came back to me: “Hmm, a Chris Cummings quote from 2008. Hmm, a forum conversation from 2009…” Yea, I was still AT NetApp during both of those periods. OMG SERIOUSLY HP, YOU’RE QUOTING DATA FROM 2 TO 3 YEARS AGO?!?! How can you NOT expect me to put that in caps? Let’s take a little journey through almost ANY product or dev company for a moment… I’d like to visit VMware in this particular scenario.

“VMware is great for virtualizing applications, oh, but not mission-critical applications; it’s not stable for that. Do not virtualize mission-critical applications.” Yea. You can almost QUOTE me as having said that. When might I have said that? Maybe when VMware had GSX out (pre-ESX days) and our computers ran on the power of potatoes. Yea, if you have NO dev cycle and do not invest in development [Oh no you didn’t take a swipe at the MSA/EVA! … No I didn’t ;)], if you STOP development, then all the things we’re discussing can absolutely be true! #WeirdAnecdoteOver

So, I firmly agree that in 2008 and 2009 there WERE performance concerns like the ones discussed in those forums. Very valid; deduplication in general was maturing, and I’m sure every product out there had similar problems (Data Domain, which scales based on CPU, probably couldn’t perform as well on 4-year-old CPUs as it can today with our super Nehalems, etc.). You need to realize it is 2011; we’re in an entirely new decade. Please stop quoting “Where’s the beef” or making “hanging chad” references like Ted Mosby in How I Met Your Mother, because while they were true at the time, they’re not so applicable today.

Takeaway – HP typically finds 95 percent duplicate data in backup and deduplicates the data without impacting performance on the primary array.

I almost forgot the takeaway! (Hey! I’m verbose… you should know that by now!) So… what I’m hearing you say is… because HP doesn’t have a native primary storage deduplication solution like NetApp or EMC… there is no performance impact on the primary array! Hooray! Yea… WTF SEAN? I mean, if I wanted to, I could repurpose most of this paper to position Avamar, which seems a LOT more versatile than HP StoreOnce, but okay, let’s move on!

I’m going to lump Challenge #4, #5 and #6 together because they have little to no place in this paper.

Challenge #4: One size fits all
Takeaway – Backup solutions are optimized for sequential data patterns and are purpose built. HP Converged Infrastructure delivers proven solutions. NetApp’s one-size-fits-all approach is ineffective in the backup and deduplication market.
Challenge #5: Backup applications and presentation
Takeaway – NetApp does not provide enough flexibility for today’s complex backup environments.
Challenge #6: Snapshots vs. backup
Takeaway – Snapshots are part of a data protection solution, but are incomplete by themselves. Long-term storage requirements are not addressed effectively by snapshots alone. HP Converged Infrastructure provides industry-leading solutions, including StoreOnce for disk-based deduplication for a complete data protection strategy.

I’m sorry, this is no contest; these points have absolutely no place in a paper educating readers on the merits and challenges of deduplication with NetApp. They definitely have their place in a whole series of OTHER competitive and FUD-based documents, but not here, not today.

In summary…

Sean… (Yes, I know your name!) You wrote this paper for HP, right? As a technologist, and a technology evangelist for that matter, I would absolutely LOVE to learn about the merits, the values, and the benefits that the HP StoreOnce D2D solution brings to market and how it can solve customers’ challenges. But honestly, man, this paper? I COMPETE with NetApp and you pissed me off with your FUD slinging. I know *I* can piss off the competition when I sling (FACTS), so just think about it. We’re a fairly small community; we all know each other for the most part. (If you’re at Interop in a few weeks, I’ll be at EMCWorld; feel free to text me and we can meet up, and I won’t attack you, I promise ;)) Educate, but please do not release this kind of trash into the community… Beautiful, beautiful trash, mind you; I mean everything I said about how amazingly this was presented, honestly BEST WHITE PAPER EVER. But that has got to be some of the worst, most invalid content I’ve encountered in my life. (As applicable to how I stated it :))

I guess I should add a little commercial so someone doesn’t go WTF: I meant what I said above, and not only about the technologies that were discussed. If you think StoreOnce is a great solution, you’ll be floored by Avamar and Data Domain. They’re not best of breed in the industry without good reason.

Feel free to comment as appropriate, it’s possible this has been exhausted in the past but SERIOUSLY I don’t want to see this again. ;)

Step one you say we need to talk, He walks you say sit down it’s just a talk, He smiles politely back at you, You stare politely right on through.

First Industry Cloud Certification: EMC Cloud Architect class and E20-018 EXPOSED!!!

If you’ve ever read any of my EXPOSED series, well… expect a fairly unbiased approach to things. Oh, and hopefully the Education team doesn’t come back screaming at me. ;) And as always, to the best of my ability, I bring you the… Disclaimer!

Disclaimer: The following information is not under NDA, and it is not one person’s opinion but rather views collected from others through interviews, emails, and discussions, in which none of us shared any proprietary data about the class or the exam. I tread the line closely, so read on!

Okay, I normally only Post-Mortem or expose an Exam, or a Class, but not too often do I get the liberty to expose the two together! With that being the case I want to start by educating you a little bit about this designation, certification and beyond so you don’t feel the need to go to multiple sources to learn it!

Cloud Architect (EMCCA) Certification E20-001 and E20-018

Okay this little chart stripped from the Brochure basically tells you:

  • To prepare for the E20-001 exam you should take the Information Storage and Management 5 day course # MR-1CP-STF
  • To prepare for the E20-018 exam you should take the Virtualized Data Center and Cloud Infrastructure 5 day course: MR-1CP-NPVICE

At this point I’d like to give you a little color on these particular courses and the respective requirements around them, etc.     First of all, unlike the VCP or other similar type exams, these courses are NOT required in order to sit the exam.   I wanted to make sure you understand that you CAN sit the exam cold.

(Yes I did sit the E20-001 Exam cold and passed – Industry experience has its advantages)

There is an AMAZING book that covers the content of the E20-001 course and exam: the ISM book. I’m not sure I even have a copy, but I’ve heard from those who have used it that it is an excellent learning and educational aid! So if you’re a self-studier, this is definitely an EXCELLENT tool for you to use.

Because the E20-001 is a prerequisite to sit/pass the E20-018 exam, I wanted to make sure it got a little coverage, which I think is sufficient at this point :) For what it is worth, if you have been in the industry a fair amount of time working with SANs, NAS, and other information storage management stuff, you should do fine, but make sure you are prepared: E20-001 is the cost of ENTRY, and beyond that come the BIG GUNS!!!

Tell us about the Virtualized Data Center and Cloud Infrastructure course

Okay, okay guys, I will. Here goes: the full, in-depth analysis of the VDC and Cloud course. (Education folks, watch out, this isn’t all from me either ;))

I want to start by telling you very clearly and concisely that there is some GREAT content in the books, the material, and the other information provided, both in writing and by the lecturer (your results may vary depending upon the instructor). Irrespective of who your instructor is, though, the content in the book stays the same and is relevant to the class, the cloud, VDCs, and the exam. With that said, I need to differentiate a few things regarding the course pre-reqs:

  • The course materials strongly recommend that you have the following certifications or equivalent knowledge/experience:
    • Cisco Certified Design Associate (CCDA) – I bet CCNA would apply as well but I think that’s focused in the other exam/course
    • VMware Certified Professional (VCP)
    • Certified Information Systems Security Professional (CISSP)
    • The EMCISM is required for the EMCCA Certification – I mentioned that above, so that’s not a surprise.
    • Oh, and ITIL/PMP is NOT mentioned, but those of you who hold them will find yourselves wondering why not ;) (Not for content, but for presentation)

With THESE particular data points expressed, I’d like to break you down into two groups:

Generalist/Novice/Acolyte:

If you fall into this area, maybe you have one or more of the certifications above or work in various cross-disciplines.  The courseware will VERY much apply to you.  You will want to pay attention, take rigorous notes; really get the best out of the networking, the instructors, the homework, read, read again, even do some labs to ensure you not only UNDERSTAND it, but you are fully committed to the material you are learning.   For what that is worth, the information is VERY general to the industry, Best Practices with a ‘little’ emphasis on some specific EMC technologies, but otherwise 70%+ of the material on the exam is of a VERY general nature.  The book should be your best friend and will be the answer to your success when it comes time to sit the exam and in life! :)

Guru/Expert/Ninja/Buddha/#IWroteTheExam:

Hey guys, how are you doing? You know who you are. You hold all of the certifications above or really have the information down solid. Heck, you might have taken those exams 10 or more years ago, or even written some of the exam material back then. You also happen to be the same kind of folks who have helped write and spec the standards that got us where we are today; chances are I know each of you personally. (grin) Yea… you won’t last in the class. I’m sorry. I’m totally supportive of you, completely in fact (you probably wrote your own internal cloud strategy for your business or your consultancy, and it’s in line with the exam), but you’re definitely not going to survive in the classroom. You’ll say “Err, this is just lecture, I can read the book myself… err, I could write this book while I’m at it!” I’m not being negative; I see your kind every day… leaving the class because you’re bored, not learning anything, and at this point just wanting to ensure you have what is REQUIRED to sit and pass the exam. Good luck guys, you will DEFINITELY want to read the Exam section, because that’ll make the difference between success and WTF?!?

Whoa! Wait a minute! Isn’t that a massive generalization? Either you’re a student or a teacher? … No not really.   Seriously.    If you find yourself arguing with the teacher that they’re wrong and you cite evidence often referring to a presentation you’ve given at a conference? Yea… You’ll do fine :)

Now, I’d like to segue for a moment to some thoughts shared directly by an attendee of the course. We’ll call him B (not like B from Gossip Girl!)

B’s take on the Virtualized Data Center and Cloud Infrastructure Course

“B” is a Technical Manager at a mid-size enterprise, experienced in implementing VMware over the years with EMC storage, HP servers, and Cisco networking. He is a longtime member of the CCNA/CCDA club (now expired) and a recent VCP and EMCISM credential holder. Given the stated pre-reqs, B felt the course might be a bit of a stretch beyond his qualifications, but not too much of a concern. (If this sounds like you, you’re in good company!)

  • Class started with going over pre-reqs, with CISSP added to the list; was surprised ITIL wasn’t there as discussed earlier
  • Two classes were merged, so the sections alternated between two instructors. As the course is 95% lecture, B didn’t feel that mattered.
  • The volume of content for the class is a 2” thick stack of slides, which unfortunately restricts the discussion time available over 5 days.
  • Module 1 leads you into an Introduction to Cloud Computing. If your instructor reads this module to you verbatim, STOP THEM!!!
  • Module 2 covers the VCP, ISM and CCDA related material. It is very much a review of the pre-reqs and should be consolidated to focus on goals.
  • Module 3 kicked off VDC Design – This is where the meat of the course is, requires proper time to digest and discuss properly
  • Module 4 focused on Governance, Risk and Compliance (Interesting Chapter) but due to time was rushed as was Managing Virtual Environments
  • Module 5 focused on Cloud Services and Summary modules (Had to leave early so missed it)
  • There is nothing earth-shattering in the course, but there is a lot of GOOD material!
  • The labs are too vague leaving you spending more time trying to figure out what you’re supposed to do instead of discussing solutions

B’s Summary of the course

In summary, it’s a good course to show EMC’s "journey to the cloud". I’d prefer less focus on the pieces (modules 1-2) and more focus on how to put the pieces together (modules 3-6). The labs need to be refined to give more guidance, so we can spend more time applying the knowledge rather than wondering what the designers of the course had in mind. Given that this was the first class (I believe), I’d love to see how it changes over the next few sessions.

Well guys, what do you think of B’s take on the course? I think his assessment was a fairly accurate representation of what was going on, and equally of what you might expect from the class in its early stages. To tell you the truth, it can ONLY get better from this point. I only lightly paraphrased what B had to say, to preserve the original message but also to avoid naming him unless he wants to be named :)

Curious what the course looks like from the other side of the fence? Here is the summary and breakdown from “Jerome”, who’s been doing this for a while!

Jerome’s take on the Virtualized Data Center and Cloud Infrastructure Course

I had the chance to attend the "Virtualized Data Center and Cloud Infrastructure" course put on by EMC this week.  Below are my thoughts.

Certification Track

This course is part of the EMC Cloud Architect track – EMCCA.  This course specifically is designed to prepare for the E20-018 certification test, which is a Specialist level certification.  The Expert level material and test have not yet been released, and are expected later this year.

Focus

The EMC Cloud Architect track is designed to help customers adopt a cloud maturity model. This consists of a move from physical data centers to Virtual Data Centers (VDCs), from VDCs to full operationalization of virtualization, and from there to IT as a Service. This course was specifically focused on the physical-to-VDC phase of the transition.

Material and Presentation

This course is a lecture-only course. There was no hands-on material or lab time; the only labs included in the course were small group discussions. EMC has tried to make this a "generic" cloud course that is "open" to all technologies, but it is heavily slanted towards their view of the world. The course uses the following outline; I have added the EMC translation in parentheses:

  • Virtualized Data Center and Cloud Introduction (Private Clouds and ITaaS model)
  • VDC Architecture (V+C+E products, convergence)
  • Designing for Virtualized and Cloud Environments (Best Practices for Virtualization – VCP stuff)
  • Governance Risk and Compliance (RSA and Archer)
  • Managing Virtualized Environments (IONIX)
  • Cloud Services (Service Provider models)

Exam

The exam is a 60 question test, with 63% required for passing (38 correct answers).  The practice exam on the EMC Education website is decent, and a good barometer of your chance to pass the exam, though the practice was about 20% easier than the real exam.  I would say that the real exam questions were written fairly poorly, and were often difficult to understand.  They would describe a scenario, but then it seemed they would give up half way through and ask only a tangentially related question.  I think that it was a result of attempting to keep the exam mostly generic, rather than focused on EMC technologies.

Recommendation

In general, I found the course to be very much in alignment with our message and focus, and as a result I felt it was a very easy set of material. The only new sections to me were a few of the VDC maturity definitions and the GRC models. Because of that, I felt the instructors moved much too slowly. I also found that the instructors were professional trainers, not SMEs on cloud computing, so they offered little value other than moderating the course. I ended up leaving mid-way through the second day and just reviewing the course material on my own, and was able to pass the test on Thursday, even though the course runs through Friday. If you feel you need a little more preparation, I would recommend the VILT rather than the full course.

Okay, no, his real name is not Jerome. I decided to use that name as a tribute to Jerome from Flight of the Conchords, especially given how constructive Jerome was being with his feedback. So what this gives you is two assessments of the course; FWIW… I agree with both, grin :)

CXI’s take, WTF?!?

Yea, I think I made it fairly clear in the earlier points. But if there are a few things I want you to do and know, it’s this: KNOW the material; if you’re confused, read it again and understand it, deeply. Focus on your weaknesses in the areas defined in the class, and be true and honest with yourself, because while Eminem and Rihanna may love the way you lie… well, the exam will NOT be so nice. . . . Speaking of which!

Tell us about E20-018 Virtualized Infrastructure Specialist Exam for Cloud Architects!

Okay, okay, you begged enough! Firstly, let me tell you that I cannot tell you what is ON the exam, what is IN the exam, or anything ABOUT the exam. We cool? ;) Yea, but even though I cannot provide those specifics (and by now I think you know a few things about me)… here is what I can tell you.

Remember what I said above about PREPARING? KNOWING the content from the class, the books, or the material LISTED as being on the exam? Yea, I wasn’t messing around. Seriously! DO THAT. KNOW THAT. DO IT ALL! But what would all this mean without a few offhand comments from those of us who have taken the exam? I talked to Jerome after he took the exam to see how he felt about it; his take?

Jerome: The test was very hard, but that was only due to the language of the questions and the structure.

Me: Hated that test.

Whoa Whoa Whoa! Christopher! That isn’t very constructive! What about being constructive with your feedback?!?! Yea, hi, I’m still here… I’m still WRITING THIS! ;) I don’t remember if I’m supposed to say this or not, but since the exam is already out and published and I’ve taken it… I’ll go out on a limb and assume I can talk about it. Yea, I’ve seen SOME of the content before the exam came out. I reviewed the questions for validity, truth, honesty, and integrity… the kind of standard I started to see come out so wonderfully from Microsoft (I know the entire Microsoft Learning team, so I know the commitment they have to exam integrity THESE days, as opposed to days gone by when questions were insane). I’d like to say that this exam took to heart the PAGES upon PAGES of comments I’d write on a few-word question when it came time to publish the exam and stand behind the questions. Yea, I’ve thought that about OTHER exams I provided EXTENSIVE, EXTREMELY constructive feedback on. [I’m not shy about telling you what is wrong, why it is wrong, how others will perceive it, and what steps you can take in order to correct it….] Also, sometimes there are release schedules… or my voice isn’t LOUD enough, or I didn’t cover a broad enough set of questions to make an impact outside of the SME area I was initially focused on reviewing. Nonetheless, that brings us back to Jerome’s point about the language and structure of the questions and how poorly things were worded, or, to quote me, “I hated that test.”

I’m VERY good at taking tests (I teach classes on how to take exams ;)). I’ve passed more exams than most people will in their lives, and I’ve probably failed more exams than most people will take in their entire academic career ;) I can wholeheartedly say that you had better STUDY for this exam. KNOW your material and know how to cut through the treacle that is going to be offered up as questions. The answers are right, but the questions are a little confusing, and the ones that are not can be VERY specific. I prepared for the exam using the practice test; I was consistently getting 100% and I felt confident. Yea, once the exam started, that confidence melted away! Definitely study, study, study! Prepare!

Summary on Class and Exam!

For the first industry certification focused on cloud, with both an exam AND a course, this was a major undertaking to start with, and honestly, to tell you the truth, I think EMC did a great job of it. Obviously you may take some of my comments above as extremely critical (hey, I’m extremely critical!), but it’s because I care. They’re definitely taking things to new levels; I’m not even sure what other organization in the industry could take on such an undertaking other than Microsoft (and that would be HEAVILY MS-biased; hey, I love you, but it’s true), and as for most “independent” third parties, well, we all know their exam would come out looking like absolute trash, and they wouldn’t really have the vehicle or mechanism to deliver and drive it successfully. No, considering all of that and how much we expect of ourselves and our industry, EMC has done an absolutely bang-up, amazing job!

Hopefully I haven’t scared you away from taking this course and the exam. If you know your stuff, you’d better prep; if you are new to the whole game, you’re going to learn A LOT OF MATERIAL. In a way, the course takes you through CCNA/CCDA/VCP/CISSP/EMCISM courseware compressed into a few days, and then you need to assimilate it all and go take a test! If nothing else, you should now have an honest picture of what to expect (I highly encourage your feedback if you agree, disagree, or WTF on anything I’ve said). Together we move mountains, so let’s not make mountains out of molehills; that’s how the Cloud works. Together. :) Oh, and good luck. I don’t say this often about exams, but you will NEED it. *love* :)

Data Longevity, VMware deduplication change over time, NetApp ASIS deterioration and EMC Guarantee

Hey guys, the other day I was having a conversation with a friend of mine that went something like this.

How did this all start, you might ask?!? Well, contrary to popular belief, I am a STAUNCH NetApp FUD dispeller. What that means is, if I hear something said about NetApp by a competitor, peer, partner, or customer that I feel is incorrect or that just sounds interesting, I take it upon myself to prove or disprove it, because, well, frankly… people still hit me up with NetApp questions all the time :) (And I’d like to make sure I’m supplying them with the most accurate and reflective data! Yea, that’s it, and it has nothing to do with how much of a geek I am.. :))

Well, in defense of the video, it didn’t go EXACTLY like that. Here is a little background on how we got to where that video is today :) I recently overheard someone say the following:

What I hear over and over is that dedupe rates when using VMware deteriorate over time

And my first response was “nuh uh!” Well, maybe not my FIRST response… but it was quickly followed by “Let me try and get some foundational data,” because you know me: I like to blog about things, and as a result I collect way too much data so I can validate it, understand it, and say whatever I end up saying accurately :)

The first thing I did was engage several former NetApp folks who are as agnostic and objective as I am to get their thoughts on the matter (we were on the same page!). Data collection time!

For data collection, I talked to some good friends of mine about how their dedupe savings have held up over time, since they were so excited when we first enabled it (and I was excited for them!). This is where I learned some… frankly disturbing things. (Interestingly enough, I talked to numerous guys named Mike, and on the whole, everyone I talked with and the data they shared with me reflected similar findings.)

Disturbing things learned!

Yea, I’ve heard all the jibber jabber before, usually written off as FUD, that NetApp systems deteriorate over time in general (whether in performance or in space savings), etc., etc.

Well, some of the disturbing things I learned, coming straight from the field on real systems protecting real production data, were:

  • Space savings are GREAT, and absolutely amazing in the beginning! 70-90% is common… in the beginning. (Call this the POC and burn-in period.)
  • As that data starts to ‘change’ ever so slightly, as you would expect your data to change (rather than sit static and read-only), you’ll see your savings start to decrease, by as much as 45% over a year.
  • This figure is not NetApp’s fault. Virtual machines (mainly what we’re discussing here) are not designed to stay uniformly the same and aligned to 4K blocks no matter what; the very fact that they change is absolutely normal, so this loss isn’t a catastrophe, it’s a fact of the longevity of data.
  • Virtual machine data that is optimal for deduplication typically amounts to 1-5% of the total storage in the datacenter. If we want to lie to ourselves, or we have a specific use case, we can pretend it’s upwards of 10%, but not much more than that. This basically accounts for operating systems, disk images, blah blah blah: the normal type of data you would dedupe in the first place.
    • I found that particularly disturbing because, going into my review of the data from these numerous environments, I had the impression VMware data would account for much more! I saw a 50TB SAN with only ~2TB of data residing in datastores, and of that only 23% was deduplicating (I was shocked!)
    • I was further shocked that, over the course of a year on a 60TB SAN, this customer found only 12TB of data they could justify running the dedupe process against, and of that they were seeing less than 3TB of ‘duplicate data’, coming in around 18% space savings over that 12TB. The interesting bit is that the other 48TB of data just carried on unaffected by dedupe. (Yes, I asked why they didn’t try to dedupe it… and they did, in the lab, and, well, it never made it into production.)
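To put those two field examples in perspective against the whole array, here is a quick back-of-the-envelope sketch in Python. It is my own illustration, not something my friends ran: `reclaimed_tb` is a hypothetical helper, and it treats the 23% from the first example as an upper bound on space saved (whether that figure means 23% savings or 23% of the data participating at all, the reclaimed space can’t exceed 23% of 2TB).

```python
def reclaimed_tb(deduped_tb: float, savings_rate: float) -> float:
    """Capacity reclaimed (TB) when dedupe runs against only part of an array."""
    return deduped_tb * savings_rate


# (label, total array TB, TB in scope for dedupe, savings rate on that TB)
examples = [
    ("50TB SAN", 50.0, 2.0, 0.23),   # assumption: 23% read as an upper bound on savings
    ("60TB SAN", 60.0, 12.0, 0.18),
]

for label, total_tb, deduped_tb, rate in examples:
    saved = reclaimed_tb(deduped_tb, rate)
    print(f"{label}: ~{saved:.2f}TB reclaimed, {saved / total_tb:.1%} of the array")
```

Even on the generous reading, that works out to under 1% of the 50TB array and under 4% of the 60TB array.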

    At this point, I was even more concerned: concerned that there was some truth to the whole “NetApp starts really high in the beginning” story (performance/IO way up there, amazing dedupe ratios on certain datasets at the start) and then drops off considerably over time, while the equivalent EMC system performs consistently the entire time.

    Warning! Warning Will Robinson!

    This is usually where klaxons and red lights go off in my head. If what my good friends (and customers) are telling me is accurate, then not only will my performance degrade just by using the system, but my space efficiency will deteriorate over time as well. Sure, we’ll get some deduplication, no doubt about that! But the long-term benefit isn’t any better than compression (as a friend of mine commented on this whole ordeal). Trying to look at this from as many angles as I could, I discussed it with my friend Scott, who offered the following analogy and example:

    The issue that I’ve seen is this:

    Since a VMDK is a container file, the nature of the data is a little different than a standard file like a word doc for example.

    Normally, if you take a standard Windows C: drive, like the one on your laptop, every file is stored in 4K blocks. However, unless the file size is exactly divisible by 4K (which is rare), the last block has a little bit of waste in it. It doesn’t matter if it’s a Word doc, a PowerPoint, or a .dll in the \windows\system32 directory; they all have a little bit of waste at the end of that last block.

    When converted to a VMDK file, the files are all smashed together because inside the container file, we don’t have to keep that 4K boundary.  Kind of like sliding a bunch of books together on a book shelf eliminating the wasted space.  Now this is one of the cool things about VMware that makes the virtual disk more space efficient than a physical disk – so this is a good thing.

    So, when you have a VMDK and you clone it (let’s say you create 100 copies) and then do a block-based dedupe, you’ll get a 99% dedupe rate across those virtual disks. That’s great, initially. NetApp tends to calculate this “savings” into their proposals and tell customers who require 10TB of storage that they can just buy 5TB and dedupe, and then they’ll have plenty of space.

    What happens is that, after they buy ½ the storage they really needed, the dedupe rate starts to break down. Here’s why:

    When you start running the VMs and adding things like service packs or patches, that process doesn’t always add files to the end of the VMDK. It often deletes files from the middle, beginning, or end and then replaces them with other files, etc. What happens then is that the bits shift a little to the left and the right, breaking the block boundaries. Imagine adding and removing books of different sizes from the shelf while making sure there’s no wasted space between them.

    If you did a file-by-file scan of the virtual disk (say a Windows C: drive), you might find exactly the same data within the VMDK; however, since the blocks don’t line up, block-based dedupe fixed at 4K sees different data, and therefore the dedupe rate breaks down.

    A sliding window technology (like what Avamar does) would solve this problem, but today ASIS is fixed at 4K.

    Thoughts?
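To make Scott’s argument concrete, here is a toy model in Python. It is a sketch under heavy assumptions, not NetApp’s A-SIS or Avamar’s actual algorithm: the 256KB random “golden image”, the 12-byte “service pack” insertion, the gear table, and the chunk-size limits (`CUT_MASK`, `MIN_CHUNK`, `MAX_CHUNK`) are all made up for illustration. `fixed_chunks` slices data at fixed 4K offsets; `cdc_chunks` is a simplified gear-hash content-defined chunker, standing in for the “sliding window” approach Scott mentions.

```python
import hashlib
import random

BLOCK = 4096                        # fixed block size, matching the 4K figure above
CUT_MASK = 0xFFF00000               # cut when the top 12 hash bits are zero (~1 in 4096)
MIN_CHUNK, MAX_CHUNK = 1024, 16384  # hypothetical chunk-size limits

rng = random.Random(2011)
# Hypothetical gear table: one random 32-bit value per possible byte value.
GEAR = [rng.getrandbits(32) for _ in range(256)]


def fixed_chunks(data: bytes) -> list[bytes]:
    """Slice data at fixed 4K offsets, like a fixed-block dedupe engine."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking with a gear rolling hash (simplified sketch).

    The 32-bit hash depends on at most the last 32 bytes, so cut points
    follow the content instead of sitting at fixed offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & CUT_MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def savings(images: list[bytes], chunker) -> float:
    """Fraction of chunks eliminated by storing each unique chunk once."""
    total, unique = 0, set()
    for image in images:
        for chunk in chunker(image):
            total += 1
            unique.add(hashlib.sha256(chunk).digest())
    return 1 - len(unique) / total


def reuse(old: bytes, new: bytes, chunker) -> float:
    """Fraction of the new image's chunks that were already stored for old."""
    stored = {hashlib.sha256(c).digest() for c in chunker(old)}
    new_chunks = chunker(new)
    hits = sum(hashlib.sha256(c).digest() in stored for c in new_chunks)
    return hits / len(new_chunks)


golden = rng.randbytes(64 * BLOCK)  # hypothetical 256KB "golden image"
clones = [golden] * 100             # 100 untouched clones of it

# "Patch" one clone: insert 12 bytes mid-disk, shifting everything after them.
mid = len(golden) // 2
patched = golden[:mid] + b"service pack" + golden[mid:]

print(f"100 clones, fixed 4K:  {savings(clones, fixed_chunks):.0%} saved")
print(f"after patch, fixed 4K: {reuse(golden, patched, fixed_chunks):.0%} of blocks reused")
print(f"after patch, CDC:      {reuse(golden, patched, cdc_chunks):.0%} of chunks reused")
```

With 100 untouched clones, fixed 4K blocks dedupe to 99%, Scott’s number. After the insertion, the fixed-block version can only reuse the 32 blocks in front of the patch (32 of 65, about 49%), because every block after it is shifted by 12 bytes. The content-defined chunker typically loses only the chunk or two around the patch, since its cut points are picked from the bytes themselves and re-align shortly after the insertion. Random bytes are, of course, a big simplification of a real guest file system.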

    If you have particular thoughts about what Scott shared there, feel free to comment and I’ll make sure he reads this as well; but this raises some interesting questions.   

    We’ve covered numerous things in here, and I’ve done everything I can to avoid rehashing the guarantees I feel I’ve talked about to death (linked below), so to sum up what we’ve discussed:

    • On average, I’m seeing about 20% of a customer’s data that merits deduping, and across that 20% I’m seeing anywhere from 10-20% space saved
    • Translation: 100TB of data, 20TB of which is worth deduping, reclaims roughly 2-4TB of space in total; so even at the top of that range you’d get only about 4% space saved!
    • Translation: When you have a 20TB data warehouse and you go to dedupe it (you won’t), you’ll see no space gained while still paying for 100% of that capacity.
    • With the EMC Unified Storage Guarantee, that same 20TB data warehouse will be covered by the 20% more efficient guarantee (Well, EVERY data type is covered without caveat)   [It’s almost like it’s a shill, but it really bears repeating because frankly this is earth shattering and worth discussing with your TC or whoever]
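Worked through with the ranges from the first bullet above, using my own shorthand (not from the original discussion): $C$ for total capacity (100TB), $f$ for the fraction worth deduping (20%), and $r$ for the savings rate on that fraction (10-20%), the overall savings come out as:

```latex
\text{saved} = C \cdot f \cdot r
  = 100\,\text{TB} \times 0.20 \times [0.10,\ 0.20]
  = [2,\ 4]\,\text{TB}
  \quad\Rightarrow\quad
  \frac{\text{saved}}{C} = 2\% \text{ to } 4\%
```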

    For more great information on EMC’s 20% Unified Storage Guarantee – check out these links (and other articles I’ve written on the subject as well!)

    EMC Unified Storage is 20% more efficient Guaranteed

    I won’t subject you to it, especially since it’s over 7 minutes long, but here is a semi-funny (my family does NOT find it funny!) video about EMC’s Unified Storage Guarantee that compares it to NetApp’s guarantee. Various comments are included in the video’s description. Don’t worry if you never watch it… I won’t hold it against you ;)

    Be safe out there, the data jungle is a vicious one!   If you need any help driving truth out of your EMC or NetApp folks feel free to reach out and I’ll do what I can :)

    SPOILERS!!!


    Avamar Support Super Site! The ultimate source of source deduplication mayhem!

    You may remember my rocking consolidated blog posts for Symmetrix FAST and Celerra FAST, but here is something I didn’t even have to create myself! This is a pure, total, rocking consolidation for Avamar! Wowza is the first thing I’d say, and for what it’s worth, I’ve seen the internal version of the same site. Believe me when I say, the customer-facing version you see below is WAAAAAAAAAYYYYYY better! Seriously! Check it out, and if you don’t think this is totally rocking, I’ll one-up it and do it even better! ;)

    The link may require credentials on EMC’s PowerLink – so please keep that in mind when it comes to accessing the site!

    Avamar Support Super Site! 

    So, check it out! This will be your best friend when it comes to working with Avamar and solving your backup problems, now and in the future! :)