Caveats of Benchmarking
Benchmarking needs to be approached carefully and objectively: not all benchmarks are equal. The Phoronix Test Suite was useful in that it lets you benchmark specific workloads, and I chose to focus on the Apache and PostgreSQL tests in particular, as these closely represented the workloads whose performance I needed to improve. At this point I decided to use PTS Desktop Live (http://www.phoronix-test-suite.com/?k=pts_desktop_live), as this would help ensure all things were equal, regardless of platform.
Not everything was exactly equal, though – none of the hardware was like-for-like in specification. As I was working towards matching and improving on the DL380 G5's score with the R910, the best I could do was ensure that the benchmarks themselves were consistent. The Live CD achieved this by ensuring that the same Linux build and benchmarking tools were used on every machine. I will discuss additional factors impacting the results as I continue through this series.
Also of note is that the PTS results are, to some degree, influenced by the clock speed and cache of a single core on a multi-core CPU, due to the way the benchmark processes are threaded. In much of the testing, the core running the benchmark's "listener" thread sat at 100%, whereas the worker threads sat anywhere between 5% and 50% depending on the test – so the scores may be somewhat lower as a result of this bottleneck.
Now let's get on to some numbers.
Bare Metal Numbers
Disclaimer: The numbers below are not indicative of the true performance of these servers and should not influence any purchasing decisions.
As I had a 12th-generation Dell R820 at my disposal in pre-production, I took the opportunity to run these tests against it as well. On the R910, Hyper-Threading was disabled and the Max Performance profile enabled at the same time, so there is no benchmark for HT off alone in its case, though it can be inferred from the numbers that there would be a difference. The HP was already set for maximum performance. "Max Performance Profile" refers to the power management profile used.
The following numbers are the result of the Apache benchmark. The test profile measures how many requests per second a given system can sustain when carrying out 500,000 requests, with 100 requests being carried out concurrently.
| | Dell R820 | Dell R910 | HP DL380 G5 |
|---|---|---|---|
| Bare Metal, HT On | 6621 | 9784 | 7000 |
| Bare Metal, HT Off | 15870 | – | – |
| Bare Metal, HT Off & Max Perf. Profile | 22845 | 16780 | – |
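As a rough illustration of the workload, the test profile is broadly equivalent to pointing ApacheBench at the server directly; the URL below is a placeholder, and this sketch assumes the `ab` tool (from apache2-utils) and a running web server:

```shell
# Hypothetical standalone equivalent of the PTS apache test profile:
# 500,000 total requests, 100 carried out concurrently, against a placeholder URL.
ab -n 500000 -c 100 http://localhost/
```

The "Requests per second" line in ab's output corresponds to the scores in the table above.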
The second test used PGBench, which performs a simple TPC-B-like benchmark of PostgreSQL:
| | Dell R820 | Dell R910 | HP DL380 G5 |
|---|---|---|---|
| Bare Metal, HT On | 2062 | 2777 | 4500 |
| Bare Metal, HT Off | 8016 | – | – |
| Bare Metal, HT Off & Max Perf. Profile | 14791 | 10230 | – |
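For reference, a similar workload can be reproduced outside PTS with pgbench itself; the database name, scale factor, and run length here are illustrative, and the sketch assumes a running PostgreSQL instance you can create databases on:

```shell
# Hypothetical standalone TPC-B-like run, roughly mirroring the PTS pgbench test:
createdb pgbench_test               # scratch database (name is illustrative)
pgbench -i -s 100 pgbench_test      # initialise the TPC-B-like tables at scale factor 100
pgbench -c 100 -T 60 pgbench_test   # 100 client connections for 60 seconds; reports TPS
```

The TPS figure pgbench reports at the end of the run is the kind of number shown in the table above.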
To be fair, the Dell R820s are two years newer than the R910s: they have different chipsets, and their CPUs are of a different architecture and clock speed, so it was always expected that they would outperform the R910s. What was interesting in the bare metal tests was how sensitive the servers' performance was to Hyper-Threading and the power management profile.
It was later found that PTS under the Live CD was not Hyper-Threading aware and treated the logical CPUs created by enabling HT as real CPUs rather than as a modified pipeline. The power management profile, on the other hand, did make a significant difference, particularly in the PGBench tests, where a nearly 4-fold increase was seen on the R910 and a 7-fold increase on the R820. This change alone produced the significant gains I would expect from newer hardware.
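As a quick sanity check of those multipliers, the ratios can be computed directly from the PGBench numbers above:

```shell
# Ratio of the "HT Off & Max Perf." score to the "HT On" score, per server
awk 'BEGIN {
  printf "R820: %.1fx\n", 14791 / 2062   # → R820: 7.2x
  printf "R910: %.1fx\n", 10230 / 2777   # → R910: 3.7x
}'
```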
The next instalment of this series will look at benchmarking under a virtual environment, which is where things started to go a little pear-shaped.