
VMware boot storm on NetApp

UPDATE: I have posted an update to this article here: More boot storm details

Measuring the benefit of cache deduplication with a real-world workload can be very difficult unless you try it in production. I have written about the theory in the past, and I did a lab test here with highly duplicated synthetic data. The results were revealing about how NetApp deduplication technology affects both read cache and disk. Based on those findings, we decided to run another test. This time the plan was to test NetApp deduplication with a VMware guest boot storm. We also added the NetApp Performance Accelerator Module (PAM) to the testing.

The test infrastructure consisted of 4 dual-socket Intel Nehalem servers with 48GB of RAM each. Each server was connected to a 10GbE switch, and a FAS3170 was connected to the same switch. There were 200 virtual machines: 50 Microsoft Windows 2003, 50 Microsoft Vista, 50 Microsoft Windows 2008, and 50 Linux. Each operating system type was installed in a separate NetApp FlexVol, for a total of 4 volumes. This was not done to maximize the deduplication results. Instead, we did it to allow the VMware systems to use 4 different NFS datastores. Each physical server mounted all 4 NFS datastores, and the guests were split evenly across the 4 physical servers.

The test consisted of booting all 200 guests simultaneously. It was run multiple times with the FAS3170 cache warm and cold, with deduplication and without, and with PAM and without. The table below summarizes the boot timing results. Each time is the interval between starting the boot and the 200th system acquiring an IP address.

Configuration        Cold Cache (MM:SS)   Warm Cache (MM:SS)   % Improvement
Pre-Deduplication
  0 PAM              15:09                13:42                 9.6%
  1 PAM              14:29                12:34                13.2%
  2 PAM              14:05                 8:43                38.1%
Post-Deduplication
  0 PAM               8:37                 7:58                 7.5%
  1 PAM               7:19                 5:12                29.0%
  2 PAM               7:02                 4:27                37.0%
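The % Improvement column can be recomputed from the cold and warm boot times. The following is a minimal Python sketch of that arithmetic. The function names are illustrative and were not part of the original test tooling, and the Pre/Post-Deduplication row labels follow the discussion of the results.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def improvement(cold: str, warm: str) -> float:
    """Percent reduction in boot time going from a cold to a warm cache."""
    cold_s, warm_s = to_seconds(cold), to_seconds(warm)
    return (cold_s - warm_s) / cold_s * 100


# (cold, warm) boot times copied from the results table
results = {
    "Pre-Deduplication, 0 PAM": ("15:09", "13:42"),
    "Pre-Deduplication, 1 PAM": ("14:29", "12:34"),
    "Pre-Deduplication, 2 PAM": ("14:05", "8:43"),
    "Post-Deduplication, 0 PAM": ("8:37", "7:58"),
    "Post-Deduplication, 1 PAM": ("7:19", "5:12"),
    "Post-Deduplication, 2 PAM": ("7:02", "4:27"),
}

for label, (cold, warm) in results.items():
    print(f"{label}: {improvement(cold, warm):.1f}%")
```

For the Pre-Deduplication 1 PAM row (14:29 cold, 12:34 warm) this works out to about 13.2%, the figure quoted in the discussion of the results.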

Let's take a look at the Pre-Deduplication results first. The warm 0 PAM boot was roughly 9.6% faster than the cold cache boot. I suspect the improvement is small because the cache has been blown out by the time the cold cache boot completes. This is the behavior I would expect when the working set is substantially larger than the cache size. The 1 PAM warm boot was 13.2% faster than the cold boot, suggesting that the working set is still larger than the cache footprint. With 2 PAM cards, the warm boot was 38.1% faster than the cold boot. At that point it appears that a significant portion of the working set fits into cache, enabling a much faster warm cache boot.

The Post-Deduplication results show a significant improvement in cold boot time over the Pre-Deduplication results. This is no surprise: once the data is deduplicated, the NetApp will fulfill a read request for any duplicate data block already in the cache from a copy in DRAM and save a disk access. (This article contains a full explanation of how the cache copy mechanism works.) As I have written previously, reducing the physical footprint of data is only one benefit of a good deduplication implementation. Clearly, it can provide a significant performance improvement as well.
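The effect described above can be illustrated with a toy model. The sketch below is not NetApp's implementation. It assumes a simple LRU cache keyed by physical block ID, identical guest images, and guests that read their boot blocks in lockstep. All names and sizes are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache keyed by physical block ID (toy model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, block_id) -> bool:
        """Return True on a cache hit, False when the block comes from disk."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return True
        self.blocks[block_id] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def disk_reads(num_guests: int, blocks_per_guest: int,
               cache_blocks: int, deduplicated: bool) -> int:
    """Count disk reads when every guest reads its whole boot image once."""
    cache = LRUCache(cache_blocks)
    misses = 0
    for block in range(blocks_per_guest):
        for guest in range(num_guests):
            # With deduplication, identical blocks share one physical ID;
            # without it, each guest has its own physical copy.
            physical_id = block if deduplicated else (guest, block)
            if not cache.read(physical_id):
                misses += 1
    return misses


# Hypothetical sizes: 50 identical guests, 1000 boot blocks each, 100-block cache
print(disk_reads(50, 1000, 100, deduplicated=False))
print(disk_reads(50, 1000, 100, deduplicated=True))
```

In this idealized model, every read is a miss without deduplication, because no two guests share a physical block. With deduplication, only the first guest to touch each block goes to disk. Real guests are neither identical nor perfectly synchronized, so the measured benefit is smaller than the model suggests.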

As one would expect, the Post-Deduplication warm boots also show a significant performance improvement over the cold boots. The deduplicated working set appears to be larger than a single 16GB PAM card, since adding a second 16GB card further improved warm boot performance. It is certainly possible that additional PAM capacity would improve the results further.

It is worth noting that NetApp has released a larger PAM II card since we started this testing. The PAM I used in these tests is a 16GB DRAM-based card, and the PAM II is a 512GB flash-based card. In theory, a DRAM-based card should have lower access latency. Since the cards are not directly accessed by a host protocol, it is not clear whether the latency difference will be measurable at the host. Even if the flash card is theoretically slower, I can only assume the 32x size increase will more than make up for that with an improved hit rate.
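The trade-off between cache latency and hit rate can be sketched with the usual weighted-average model of read latency. Every number below is a hypothetical placeholder chosen only to show the shape of the argument. None of them are measurements from these tests or NetApp specifications.

```python
def effective_latency(hit_rate: float, cache_latency_ms: float,
                      disk_latency_ms: float) -> float:
    """Average read latency: hits are served from cache, misses from disk."""
    return hit_rate * cache_latency_ms + (1.0 - hit_rate) * disk_latency_ms


# Hypothetical placeholder values, NOT measured or vendor-published figures
DISK_MS = 5.0    # assumed disk read latency
DRAM_MS = 0.01   # assumed latency of a small DRAM-based cache card
FLASH_MS = 0.1   # assumed latency of a larger flash-based cache card

# Assume the small card holds less of the working set (lower hit rate)
small_dram = effective_latency(0.3, DRAM_MS, DISK_MS)
large_flash = effective_latency(0.9, FLASH_MS, DISK_MS)

print(f"small DRAM cache:  {small_dram:.3f} ms")
print(f"large flash cache: {large_flash:.3f} ms")
```

Under these assumed values the larger, slower card gives the lower average latency, because disk accesses dominate the average whenever the hit rate is low. Whether that holds in practice depends on the real latencies and on how much of the working set each card can hold.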

Thanks to Rick Ross and Joe Gries in the Corporate Technologies Infrastructure Services Group who did all the hard work in the lab to put these results together.

Originally posted at http://ctistrategy.com



