Entries in Virtualization (6)


vSphere Research Project - Security Update

Several readers (ok, more than several) pointed out some security concerns about releasing esxtop data with the hostname and VM names in it. Click through for a simple Perl script that will strip all the names from your data; they are replaced with server1 ... serverN. It is not pretty, but it gets the job done. Run the script with the name of the CSV file from esxtop as its argument. This was a giant oversight on my part, and I should have put this up with the original request.
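The script itself is linked rather than shown; a minimal Python sketch of the same idea (my own illustration, not the actual script — here the names are passed explicitly on the command line rather than auto-detected from the esxtop header) might look like:

```python
import re
import sys

def anonymize(text, names):
    """Replace each hostname / VM name with a generic serverN label."""
    mapping = {name: "server%d" % (i + 1) for i, name in enumerate(names)}
    for name, alias in mapping.items():
        # re.escape makes dots in hostnames match literally, not as wildcards.
        text = re.sub(re.escape(name), alias, text)
    return text, mapping

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: anonymize.py esxtop_output.csv name1 name2 ...
    path, names = sys.argv[1], sys.argv[2:]
    with open(path) as f:
        cleaned, _ = anonymize(f.read(), names)
    sys.stdout.write(cleaned)
```

Each distinct name gets a stable serverN alias, so repeated occurrences of the same host or VM stay correlated after scrubbing.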

This post refers to a paid vSphere research survey here.

Click to read more ...


vSphere Research Project

I did not realize how long it has been since I last posted. Time has been flying by. I took a position with a startup about 6 months ago. My new company is still in stealth mode, so there is not much I can say about the details yet. We are solving some hard IT problems and I think the end result is going to be very exciting.

One of the projects I am working on requires me to understand what "typical" VMware environments look like. After lots of digging, it turns out there is not much published information about the systems running in these environments. So, I am looking to all of you to help me gather data. I have posted a survey online, and anyone who submits esxtop output from a vSphere 4.1 (or later) system will get a $10 Starbucks card.

Click here to take the survey, and feel free to forward it along.

I am looking forward to coming out of stealth and getting to share all of the exciting things we have been working on. Watch here or follow me @JesseStLaurent for updates as we get closer to launch.


Commercial Storage at "cloud scale"

The major storage manufacturers are all chasing the cloud storage market. The private cloud storage market makes a lot of sense to me. Clients adopting private cloud methodologies have additional, often more advanced, storage requirements. This will frequently require a storage rearchitecture and may dictate changing storage platforms to meet the new requirements. The public cloud storage market outlook is much less clear to me.

If public cloud services are as successful as the analysts, media, and vendors are suggesting they will be, then cloud providers will become massive storage buyers at a scale that dwarfs today's corporate consumers. Whether the public cloud storage is part of an overall architecture that includes compute and capacity or a pure storage solution, the issue is the same. This is not about 1 or 2PB. The large cloud providers could easily be orders of magnitude larger than that.

Huge storage consumers are exactly what the storage manufacturers are looking for, right? Let me suggest something that may sound counterintuitive. Enormous success of cloud providers will be terrible news for today's mainstream storage manufacturers.

Click to read more ...


Block alignment is critical

Block alignment is an important storage topic that is often overlooked. I read a blog entry by Robin Harris a couple of months back about the importance of block alignment with the new 4KB-sector drives. I was curious to test the theory on one of the new 4KB drives, but I did not have one on hand. That got me thinking about Solid State Disk (SSD) devices. If filesystem misalignment hurts traditional spinning disk performance, how would it impact SSD performance? In short, it is ugly.

Here is a chart showing the difference between aligned and misaligned random read operations to a Sun F20 card. (I guess it is officially an Oracle F20 card now.)

Oracle F20 - Aligned vs. Misaligned

With only a couple of threads, the flash module can deliver about 50% more random 4KB read operations when aligned. As the thread count increases, the module delivers over 9x the number of operations if properly aligned. It is worth noting that the card delivers those aligned reads at less than 1ms while the misaligned operations average over 7ms of latency. 9x the operations at roughly 85% less latency makes this an issue worth paying attention to. (My test was done on Solaris, and here is an article about how to solve the block alignment issue for Solaris x64 volumes.)

I have seen a significant increase in block alignment issues with clients recently. Some arrays and some operating systems make it easier to align filesystems than others, but a new variable has crept in over the last few years. VMware on block devices means that VMFS adds another layer of abstraction to the process. Now it is important to be sure the virtual machine filesystems are aligned in addition to the root operating system/hypervisor filesystem.

Server virtualization has been the catalyst for many IT organizations to centralize more of their storage. Unfortunately, centralized storage does not come at the same $/GB as the mirrored drives in the server; it is much more expensive. Block misalignment can make the new storage even more expensive by making it less efficient. If the filesystems are misaligned, the array cache becomes far less effective. When misaligned data is read from or written to disk, the drives are forced to do additional operations that would not be required for an aligned operation. It can quickly turn a fast storage array into a very average system. Most of the storage manufacturers can provide a best practices document to help you avoid these issues. Ask them for a whitepaper about block alignment issues with virtual machines.
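To make the mechanics concrete (my own sketch, not from the original post): a partition or filesystem is aligned when its starting byte offset falls on a multiple of the device's physical block size. With 512-byte logical sectors and 4KB physical blocks, the check is simple arithmetic:

```python
SECTOR_BYTES = 512           # logical sector size most partitioning tools report offsets in
PHYSICAL_BLOCK_BYTES = 4096  # physical block size of 4KB-sector drives and many SSDs

def is_aligned(start_sector, block_bytes=PHYSICAL_BLOCK_BYTES):
    """True if a partition starting at start_sector begins on a physical block boundary."""
    return (start_sector * SECTOR_BYTES) % block_bytes == 0
```

The classic DOS partitioning convention of starting the first partition at sector 63 is exactly why so many older guest filesystems end up misaligned inside VMFS; modern tools default to sector 2048 (a 1MiB boundary), which is aligned for any common block size.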

Click to read more ...


VMware boot storm on NetApp - Part 2

I have received a few questions relating to my previous post about NetApp VMware boot storm results and want to answer them here. I have also had a chance to look through the performance data gathered during the tests and have a few interesting data points to share. I also wanted to mention that I now have a pair of second generation Performance Accelerator Modules (PAM 2) in hand and will be publishing updated VMware boot storm results with the larger capacity cards.

What type of disk were the virtual machines stored on?

  • The virtual machines were stored on a SATA RAID-DP aggregate.
What was the rate of data reduction through deduplication?
  • The VMDK files were all fully provisioned at the time of creation. Each operating system type was placed on a different NFS datastore. This resulted in 50 virtual machines on each of 4 shares. The deduplication reduced the physical footprint of the data by 97%.
Here are a few interesting stats gathered during the testing. These numbers are not exact, due to the somewhat imprecise nature of starting and stopping statit in synchronization with the start and end of each test.
  • The CPU utilization moved inversely with the boot time: the shorter the boot time, the higher the CPU utilization. This is not surprising; during the faster boots, the CPUs were not waiting around for disk drives to respond. The more data was served from cache, the busier the CPUs could stay.
  • The total NFS operations required for each test was 2.8 million.
  • The total GB read by the VMware physical servers from the NetApp was roughly 49GB.
  • The total GB read from disk trended down between cold and warm cache boots. This is what I expected and would be somewhat concerned if it was not true.
  • The total GB read from disk trended down with the addition of each PAM. Again, I would be somewhat concerned if this was not the case.
  • The total GB read from disk took a significant drop when the data was deduplicated. This helps to prove out the theory that NetApp is no longer going to disk for every read of a different logical block that points to the same physical block.
How much disk load was eliminated by the combination of dedup and PAM?
  • The cold boots with no dedup and no PAM read about 67GB of data from disk. The cold boot with dedup and no PAM dropped that down to around 16GB. Adding 2 PAM (or 32GB of extended dedup-aware cache) dropped the amount of data read from disk to less than 4GB.
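Working out the combined effect from the figures above (my own arithmetic, using 4GB as the upper bound for the final case):

```python
def reduction_pct(before_gb, after_gb):
    """Percentage of disk reads eliminated between two test runs."""
    return round(100 * (before_gb - after_gb) / before_gb, 1)

cold_no_dedup  = 67  # GB read from disk: no dedup, no PAM
cold_dedup     = 16  # GB read from disk: dedup, no PAM
cold_dedup_pam = 4   # GB read from disk: dedup + 2 PAM ("less than 4GB")

print(reduction_pct(cold_no_dedup, cold_dedup))      # dedup alone
print(reduction_pct(cold_no_dedup, cold_dedup_pam))  # dedup + PAM combined
```

So dedup alone eliminated roughly three quarters of the disk reads, and dedup plus PAM eliminated over 90%.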

Click to read more ...


Benchmarking and 'real FC'

Sometimes I think the only people who read technology blogs are people who write other technology blogs. I have no way to know whether this is true, but it is an interesting topic to ponder. Do IT end users actually read technology blogs? If they are reading, they do not seem to comment very frequently. Much more often, comments come from other bloggers or competing vendors. That said, I am going to talk about an issue that some of the storage bloggers seem to be caught up in at the moment: the issue of 'emulated FC' vs. 'real FC.'

Let me start off by sharing a few recent posts from other blogs. Chuck Hollis at EMC writes about the EMC/Dell relationship and takes the opportunity to compare EMC to NetApp, in this case the EMC NX4 to the NetApp FAS2020. The comment in the post that certainly aggravated NetApp is that EMC does "real deal FC that isn't emulated." The obvious implications are that EMC FC is not emulated, NetApp FC is emulated, and FC emulation is bad. (This is not a new debate between EMC and NetApp. Look back through the blogs at both companies and you will find plenty of back and forth on the topic.) Kostadis Russos at NetApp has a post explaining why he, not surprisingly, completely disagrees with Chuck.

Stephen Foskett, a storage consultant, posts what I think is an excellent overview of the issues. He cuts through the marketing spin and asks the right questions. His coverage of the topic is so complete that I almost decided not to write about it myself, and I will try not to retrace all the issues he covered. I will hit a couple of his high-level points in case you have not had a chance to read his post (I highly recommend it; it is very good). In summary:

  • All enterprise storage arrays “emulate” Fibre Channel drives to one extent or another
  • NetApp is emulating Fibre Channel drives
  • All modern storage arrays emulate SCSI drives
  • Using the wrong tool for the job will always lead to trouble
  • Which is more important to you, integration, performance, or features?
So, why am I writing about it? Because Chuck posted a very good blog entry about benchmarking a few days later that, to me, contradicts the importance he gave to 'real FC' on 12/9. I have never met Chuck or Stephen, but they both seem to be very technically adept from their postings. Without trying to put words in his mouth (text on his blog?), the overall theme of Chuck's post is to make sure you use meaningful tests if you want meaningful results from a storage product benchmark. He is absolutely correct; I could not agree more. How many times have we seen benchmarks performed that were completely irrelevant to the workload the array would see in production?

My question is: if the end result of performance testing with real-world applications produces acceptable results, then who cares what is 'real' and what is 'emulated'? The average driver does not worry about how the computer in her car is controlling the variable valve timing. She worries about whether it reliably gets her to work on time. VMware is selling plenty of virtualization technology that presents devices that are not 'real.' I know it is not storage, but why is that any different?

Less and less is 'real' in storage these days. It is impossible to continue to drive innovation in storage array technology if we are bound by the old ideas of how we configure and manage our storage. With the introduction of technologies that leverage thin provisioning, dependent pointer-based copies, compression, and deduplication, we need to rethink concepts as fundamental as RAID groups, block placement, and LUN configuration. Or, in my opinion, we need to stop thinking about those things. Controlling the location of the bits is not what matters. Features and performance are what matter. Results in the real world matter. Look at the systems available and decide what blend of features fits your organization and workload best.
Full disclosure: My company provides storage consulting on all of the platforms discussed above. We sell NetApp, Sun, and HDS products. We do not sell EMC products.

Click to read more ...