Performance Benchmarking Dell R910 – Part 3 (Final)

Having covered the methodology in Part 1 and the bare-metal results in Part 2, we now get to what was perhaps the more controversial aspect of my testing at work: performance under a virtualisation platform. We primarily use VMware, so the tests were done on it.

Disclaimer: The numbers below are not indicative of the true performance of the server and should not influence any purchasing decisions.

With virtualisation I’m conservative: I generally expect to see some minimal performance degradation, but under 10%. Our first test consisted of creating an empty VM and booting the PTS Live CD in it, as we thought this would rule out any disk I/O. I compared the results with the bare-metal results for each server and was surprised by what I saw:

Server          PTS LiveCD (bare metal)   PTS LiveCD (VM)
Dell R910       17000                     5400
HP DL380 G5     7000                      6105

These results caused great confusion: a performance drop of roughly two-thirds (about 68%) under VMware on the Dell server, versus about 13% on the HP. Something was amiss. The PGBench results weren’t much better:

Server          PGBench (bare metal)   PGBench (VM)
Dell R910       10000*                 2800
HP DL380 G5     4500                   3200
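The size of the gap is easier to judge as a fractional loss measured against the 10% expectation. The following is a minimal Python sketch that recomputes it from the two tables above. The dictionary layout and function names are hypothetical. The scores are taken as published (higher is better, in whatever units PTS reported), and the asterisk on the Dell's bare-metal PGBench figure is ignored.

```python
# Scores copied from the two tables above as (bare_metal, vm); higher is better.
# Units are whatever PTS reported; only the ratios matter here.
RESULTS = {
    "PTS LiveCD": {"Dell R910": (17000, 5400), "HP DL380 G5": (7000, 6105)},
    "PGBench": {"Dell R910": (10000, 2800), "HP DL380 G5": (4500, 3200)},
}

# The "under 10%" degradation I expected from virtualisation.
EXPECTED_MAX_DEGRADATION = 0.10


def degradation(bare_metal: float, vm: float) -> float:
    """Fraction of the bare-metal score lost when running in the VM."""
    if bare_metal <= 0:
        raise ValueError("bare-metal score must be positive")
    return 1.0 - vm / bare_metal


def report(results=RESULTS, limit=EXPECTED_MAX_DEGRADATION):
    """Return (benchmark, server, loss, within_limit) for every result."""
    rows = []
    for bench, servers in results.items():
        for server, (bare, vm) in servers.items():
            loss = degradation(bare, vm)
            rows.append((bench, server, loss, loss <= limit))
    return rows


if __name__ == "__main__":
    for bench, server, loss, ok in report():
        status = "within" if ok else "exceeds"
        print(f"{bench:10} {server:12} {loss:6.1%} ({status} the 10% expectation)")
```

On these figures the Dell loses roughly 68% (PTS LiveCD) and 72% (PGBench), while the HP loses roughly 13% and 29%. Every case exceeds the 10% expectation, but the Dell does so by far more.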

My very first thought was that, being a Live CD, it would not have VMware Tools installed, so some critical paravirtual drivers could be missing. As a Microsoft administrator, I generally don’t trust significant driver installations to work without a reboot, which is not possible with a Live CD. So we built a VM running CentOS, with VMware Tools and PTS installed.

The results remained the same.

From earlier investigation, we knew that the PTS scores were limited by the processing capacity of one particular monitor thread, which could skew the results.

At this point we decided to focus only on the Apache front end rather than on both Apache and PGBench. We also started experimenting with different Linux distributions supported under VMware, which led to some interesting discoveries:

Distribution             Apache benchmark score
Ubuntu 12.04 LTS x64     19000
Scientific Linux x64     5800
CentOS 6.4 x64           5500

Ubuntu Server was far and away the best performer, with scores consistently surpassing even bare metal. Once again, this raised eyebrows.

Conclusions

I would like to tell you all that we found a performance bottleneck and resolved the issue. Unfortunately, that is not the case. Being mainly a Microsoft shop, we had limited internal resources and knowledge to take this benchmarking further, and bringing in a trusted external resource to do benchmarking and tuning would have been costly. However, we did learn some things about how to benchmark going forward:

  • Make sure you understand what your benchmarking tool is doing, particularly if you are relying on workload-based benchmarks rather than raw performance measurements. I later found out that the benchmarks I was running did not necessarily reflect what our actual workload was going to be.
  • Have a well-defined test plan. “Let’s make a change here and see what happens” is not a test plan. Understanding what you are doing and why is important for interpreting the results.
  • Understand the limitations of your benchmarking software, and use purpose-built benchmark platforms where available. PTS is good for bare-metal tests, but I remain unconvinced about its ability to benchmark virtual machines.
  • Do not tune your platform to the benchmarking tool. Your system is what it is. I came very close to tuning the OS to the benchmarking tool, which loses sight of the fact that you need to tune for the workload, not the benchmark score.
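One concrete part of a well-defined test plan is deciding up front how many times each test is repeated and how much run-to-run variation counts as noise, before claiming that one configuration beats another. Here is a minimal Python sketch, assuming you have a list of scores per configuration. The names are hypothetical, the "k times the larger standard deviation" rule is an arbitrary rule of thumb rather than a formal statistical test, and the sample numbers are invented for illustration, not measurements from this series.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Summary:
    mean: float
    stdev: float

    @property
    def cv(self) -> float:
        """Coefficient of variation: run-to-run noise relative to the mean."""
        return self.stdev / self.mean


def summarise(scores: list[float]) -> Summary:
    """Mean and sample standard deviation of repeated runs of one test."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variation")
    return Summary(statistics.mean(scores), statistics.stdev(scores))


def clearly_different(a: list[float], b: list[float], k: float = 2.0) -> bool:
    """Rule of thumb (an assumption, not a formal test): treat two configurations
    as different only if their means are further apart than k times the larger
    of the two standard deviations."""
    sa, sb = summarise(a), summarise(b)
    return abs(sa.mean - sb.mean) > k * max(sa.stdev, sb.stdev)


if __name__ == "__main__":
    # Hypothetical run-to-run scores, NOT results from this benchmarking exercise.
    baseline = [100.0, 104.0, 98.0, 101.0, 97.0]
    candidate = [103.0, 99.0, 105.0, 100.0, 102.0]
    s = summarise(baseline)
    print(f"baseline mean={s.mean:.1f} stdev={s.stdev:.1f} cv={s.cv:.1%}")
    print("clearly different:", clearly_different(baseline, candidate))
```

With these made-up numbers the two configurations differ by less than the run-to-run noise, so the sketch would not count the candidate as an improvement. Writing a rule like this into the plan before testing starts makes it harder to talk yourself into a result afterwards.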

Benchmarking was a steep learning curve for me (and for the colleagues who were also involved). It was the first time such extensive hardware tests had been done in our workplace. I don’t think it was a success, but it has us all thinking a lot harder about meeting baseline workload requirements for future systems.

 
