The Foundations Of A Performance POC

One of the really cool aspects of Pure Storage is that customers and potential customers appreciate our ability to execute on a POC (Proof of Concept). When I refer to existing customers, I mean those who have other use cases they are exploring, where we work to help “Validate The Value” with an additional POC. Recently, I responded to a post on LinkedIn where a potential customer was looking for advice on how to test an AFA (All Flash Array). I thought I would capture some of these foundational elements and share them with the broader community.

You are likely asking yourself several questions…

  • How do I test an All Flash Array?
  • Can I use the same methodologies as with my traditional/legacy disk-based array?
  • What tool(s) should I use?
  • What metrics are most valuable to capture?

While I do represent Pure Storage, the basic methodology I will discuss is vendor-agnostic. If you are one of our awesome Pure Partners, you may have captured a portion of this methodology from the Partner Training Webcast that Lou Lydiksen (Performance Engineering) and I conducted on Dec 2nd. You can find that here, and we would love your feedback!

Let me start off with a question. When should synthetic “performance corners” testing be run? The survey says… preferably never! The reasoning is quite simple and can be answered with another question. How many synthetic instances of your application do you have running in production? None! In other words, you could take a Formula One race car onto the ice to test its “performance corners”, but I suspect the results would offer little value relative to a typical racing environment. The same applies to synthetic “performance corners” testing on any storage platform. Baseline your testing to be more representative of your production environment(s), which provides a better expectation of the business value derived from the underlying technologies.

What is our recommendation for giving our customers the ability to see the true potential of Pure’s technology?

  1. Consider the no-risk “Love Your Storage Guarantee”.
  2. Move a “copy” of your Dev, Test, and/or Production workloads onto the Pure Storage FlashArray. With non-disruptive, hypervisor-based mobility, this can offer a quick way to validate application benefit, technology efficiency, and performance capabilities.
  3. If synthetic testing is leveraged, conduct a synthetically modeled customer workload scaling test with application load generators approved by application vendors. Think SLOB, HammerDB/Hammerora, Oracle RAT, SQLIO, etc.
  4. If synthetic testing is leveraged, conduct a synthetically modeled customer workload scaling test with generic load generators like Load DynamiX, vdbench, FIO, etc. (a minimal vdbench sketch follows this list).
  5. Consider using the new Pure Storage vdbench kit 2.1 as the next best thing to real-world performance testing. This kit was designed with real-world characteristics that allow customers to assess the performance and data reduction of any AFA. Real-world testing is our preferred and default recommendation, but this test kit provides the next best alternative. Please reach out to your Pure Storage SE and/or Pure Storage Partner for the latest Pure Storage test kit.
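To make item 4 concrete, a minimal vdbench parameter-file skeleton for a scaling test could look like the sketch below. This is a hedged example, not the Pure Storage test kit: the device path (/dev/sdX), block-size mix, read percentage, and thread counts are placeholder assumptions you would replace with values that reflect your own environment.

    # Hypothetical vdbench skeleton; not the Pure Storage vdbench kit
    # Placeholder raw device; point this at a test LUN you can safely overwrite
    sd=sd1,lun=/dev/sdX,openflags=o_direct
    # 70% reads, mixed transfer sizes, 100% random access
    wd=wd_mixed,sd=sd1,rdpct=70,xfersize=(4k,20,8k,50,32k,20,64k,10),seekpct=100
    # Ramp outstanding I/O to sweep the concurrency axis
    rd=ramp,wd=wd_mixed,iorate=max,elapsed=300,interval=5,forthreads=(1,2,4,8,16,32)

The forthreads ramp is what produces the latency-versus-throughput curve discussed later; check the vdbench documentation for your version before relying on any of these parameters.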

What is our recommendation around synthetic “performance corners” testing?

Pure Storage came up with a POC framework called ASAP, which stands for:

Availability, Scalability, Affordability, and Performance

We have listed these in order of testing priority. Given that AFAs continue to service mission-critical workloads, Availability cannot be compromised during planned and unplanned events. More on this framework later.

One of our recommendations for Performance testing is to test with real-world data, as that is the ONLY way to get the most accurate expectation in terms of performance, efficiency (data reduction), and resiliency. If you can avoid synthetic testing altogether and focus on actual real-world datastreams, you can shorten your testing effort while setting more realistic expectations of the technology’s capabilities relative to your environment. We want customers to validate that the AFA meets their requirements in the areas of cost, resiliency, and integration before performance, as performance varies greatly based on hardware configurations. Value in the areas of data reduction (infrastructure efficiency and cost), resiliency, and integration is a constant of an architecture, unlike performance.

With regard to synthetic testing, we have moved away from recommending homogeneous fixed-block tests (1K, 2K, 4K, 8K, …) toward tests with more real-world block size “mixes”.

Likewise, our recommendation is that 100% read and 100% write tests are of little value when compared to mixed read/write tests, which are what you see in real workloads.

We also recommend creating a dataset comprised of a mix of reducible data subsets to run the performance tests against. I think we would all agree that testing a platform outside of its capabilities as a measure of a baseline might set false expectations. That is why the best baseline is to leverage real datasets and datastreams, which have a reducibility and character that are difficult to reproduce through synthetic means. Real-world datasets are data-reducible, and the data reduction ratio is a critical element of an AFA: it drives the economics and enables the transformational impact of an AFA to be applied broadly. If you are not testing the data reduction capability of an array, you are missing a critical element of an AFA, and the evaluation is incomplete.
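If you do have to go synthetic, recent vdbench releases expose general parameters for generating reducible data. The values below are illustrative assumptions only; wherever possible, model them after your own datasets rather than copying these numbers.

    # Hypothetical data-reduction settings (vdbench general parameters,
    # placed before the sd/wd/rd definitions)
    dedupratio=2
    dedupunit=4k
    compratio=2

Deduplication and compression roughly compound, but the combined reduction an array actually reports depends on its own granularity and pipeline, which is exactly why measuring it is part of the evaluation.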

Finally, we also believe that “performance corners” tests should attempt to emulate real-world datastreams that have, at the very least, the following characteristics (a hedged vdbench sketch follows the list):

– A read I/O size mix that is different from the write I/O size mix
– A read range that is different from the write range
– A set of “hotbands” within those ranges that are different for reads and writes
– A dataset and datastreams that use data with both a deduplication component and a compression component, modeled after the datasets and datastreams the customer wishes to serve from the AFA
– We have settled on a vdbench datastream that reduces to about 5:1 on Purity 4.0.x today
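Pulling those characteristics together, a parameter file along the following lines could serve as a starting point. To be clear, this is a sketch under assumptions and not the actual Pure Storage vdbench kit: the device path, size mixes, ranges, skews, and reduction ratios are placeholders, and the “hotbands” are approximated here as narrow, heavily weighted sub-ranges. Skew behavior under iorate=max can vary by vdbench version, so verify against the documentation for the release you run.

    # Hypothetical sketch only; not the Pure Storage vdbench kit
    dedupratio=2
    dedupunit=4k
    compratio=2
    sd=sd1,lun=/dev/sdX,openflags=o_direct
    # Reads: their own size mix and range, with a hot sub-range
    wd=wd_read,sd=sd1,rdpct=100,xfersize=(8k,40,32k,40,64k,20),range=(0,70),skew=35
    wd=wd_read_hot,sd=sd1,rdpct=100,xfersize=(8k,40,32k,40,64k,20),range=(10,20),skew=35
    # Writes: a different size mix and range, with their own hot sub-range
    wd=wd_write,sd=sd1,rdpct=0,xfersize=(4k,30,16k,50,128k,20),range=(30,100),skew=15
    wd=wd_write_hot,sd=sd1,rdpct=0,xfersize=(4k,30,16k,50,128k,20),range=(80,90),skew=15
    # Ramp concurrency to trace the latency/throughput/concurrency relationship
    rd=fingerprint,wd=wd_*,iorate=max,elapsed=600,interval=5,forthreads=(1,2,4,8,16,32,64)

The read/write split here works out to roughly 70/30, with half of each directed at a hot band; adjust the skews, ranges, and mixes to mirror what you actually observe in production.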

We combine all of these into I/O fingerprint ramp tests so that we can show the relationship between latency, throughput, and concurrency for the dataset/datastream combination. Every dataset/datastream pair has its own Little’s Law fingerprint.
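For readers unfamiliar with Little’s Law, the relationship behind these fingerprints is simply that average concurrency equals throughput times response time. The numbers below are purely illustrative, not a benchmark claim:

    N = X × R        (outstanding I/Os = throughput × mean response time)
    e.g. 64,000 IOPS × 0.0005 s (0.5 ms) = 32 outstanding I/Os

A fingerprint ramp sweeps the concurrency (N) axis and records how latency grows as throughput approaches the array’s limit for that particular dataset/datastream pair.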

We will soon publish a whole methodology based upon these principles, and we do have the vdbench scripts for this that we can produce now. Again, this applies only if you are unable to test real working sets, which we strongly urge as the best measure of ANY technology’s capabilities.

We believe that this will provide a much more realistic and valuable synthetic “Performance” testing environment for any storage array, but in particular for non-disruptive, all-flash, data-reduction storage arrays like the Pure Storage FlashArray. It is important to inject all sorts of failures throughout this process to highlight planned and unplanned events.


As much as we would like to say technology never fails, we know that when it does, it does so in a disruptive manner (generally not a graceful shutdown procedure). What better way to baseline performance and resiliency than pulling components while testing performance? You will find Pure Storage is very well differentiated in this area. More importantly, once you put mission-critical business applications on flash, there is a new expectation set (latency and/or velocity), and during planned/unplanned events this expectation should be maintained.

Hope this helps and thanks for evaluating Pure Storage.  We know you will Love Your Storage!
