Have you ever looked in OpenManage Essentials and seen the above when looking at a device? I recently had this experience when checking on a number of older servers that we were not receiving alerts properly for.
Checking on the iDRAC and server it appeared that the management agents were running and correctly configured so that the OME server could contact the device, but attempts to discover and inventory were still failing. What was going on?
The Dell Troubleshooting Tool is an excellent utility by Dell for interrogating devices using a variety of protocols. Querying the iDRAC using WSMAN soon found the problem:
Many of the older servers had internal SSL certificates installed on them, which had subsequently expired. As most of the servers had been decommissioned, renewing the certificates had been overlooked.
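A quick way to triage a list of certificates is to compare each one's expiry date against today. The following is a minimal sketch, assuming you already have the certificate's `notAfter` value as text in the format OpenSSL prints (for example from a browser or from `openssl x509 -noout -enddate`). The function names are my own and are not part of any Dell tool.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days remaining before a certificate expires.

    not_after is the certificate's notAfter field in OpenSSL's text
    format, e.g. "Jan  5 09:34:43 2018 GMT". A negative result means
    the certificate has already expired.
    """
    # cert_time_to_seconds converts the notAfter string to epoch seconds.
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400.0


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """True if the notAfter date is in the past."""
    return days_until_expiry(not_after, now) < 0
```

Running `is_expired` over the dates collected from each iDRAC gives a short list of the interfaces that need attention before OME can talk to them again.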
Getting rid of the expired certificate is not as straightforward as it should be, with no ability in the iDRAC6 web interface to delete the certificate. This was resolved by accessing the iDRAC via SSH and using the racadm command:
After the iDRAC interface rebooted to apply changes, OME was then able to discover and inventory the iDRAC interface.
The Dell Troubleshooting Tool proved to be a very useful addition to the infrastructure admin’s toolbox for dealing with non-obvious management protocol issues.
Well, I am back in HP space for a little bit, configuring a couple of HP c7000 chassis with some Gen8 blades. Being Gen8, they come equipped with an iLO4 interface, and it has given me the opportunity to use the HP iLO mobile app. For the purposes of this article, this app was being used on an iPad Air with a Bluetooth keyboard.
Having got my basic configuration into the blades, I started adding them to the app, which was a little tedious, having to re-enter the same credentials all the time. Dear HP, I would love a setting to be able to have a default or global credential store.
This can be worked around, however, particularly if you are familiar with QR codes. Assuming that your server room is secure, you can print out QR codes for your devices containing a string of hostname;username;password to put on the servers (or in a paper-based booklet of ‘codes’), and adding a server then becomes a scan of the QR code using the app. The big problem here is that if you do not keep these QR codes secure, anyone with a QR code reading app can obtain login credentials.
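As a rough sketch of building those strings in bulk, the snippet below joins the three fields in the order described above and rejects values that would break the format. The function names and the example hostname are hypothetical. Rendering the string as an actual QR image needs a separate tool or library and is not shown.

```python
from typing import Tuple


def build_qr_payload(hostname: str, username: str, password: str) -> str:
    """Join the fields as hostname;username;password for encoding in a QR code."""
    fields = (hostname, username, password)
    for value in fields:
        if not value:
            raise ValueError("every field must be non-empty")
        # A semicolon inside a field would make the string ambiguous.
        if ";" in value:
            raise ValueError("fields must not contain ';'")
    return ";".join(fields)


def parse_qr_payload(payload: str) -> Tuple[str, str, str]:
    """Split a hostname;username;password string back into its fields."""
    parts = payload.split(";")
    if len(parts) != 3:
        raise ValueError("expected exactly three ';'-separated fields")
    return parts[0], parts[1], parts[2]


if __name__ == "__main__":
    # Hypothetical device and credentials, for illustration only.
    print(build_qr_payload("ilo-blade01.example.com", "admin", "changeme"))
```

Bear in mind that the output contains the password in plain text, so the generated strings and any printed codes need the same care as the credentials themselves.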
As seen in the first image, I have a list of iLO interfaces. A couple of the servers there show detailed information, which is collected once you connect to the device for the first time. There is very limited organisation of the devices: you can have a favourites list, and that's about it. Dear HP, I would really like to see folder organisation in future releases of this app. The list will become unwieldy with lots of devices.
It was supposed to be a routine ESXi upgrade. We had discovered that the OnTap software on one of the NetApps had a memory leak, which was causing a controller to slow down and eventually fail over every 3-6 months. A prerequisite for updating the OnTap software was to upgrade the ESXi hosts to at least version 5.
Having been doing a number of these upgrades recently, my confidence was high as I submitted the change requests, notified stakeholders and completed the upgrade.
Having covered off on the methodology in Part 1 and the Bare Metal Results of Part 2, we now get onto what was perhaps the more controversial aspect of my testing where I worked – performance under a virtualisation platform. We primarily use VMware, and so the tests were done using this.
Disclaimer: The numbers obtained below are not indicative of the true performance of the server and should not influence any purchasing decisions made.
With virtualisation, I’m conservative and have a general expectation that I would see some minimal performance degradation of under 10%. The first tests we did consisted of creating an empty VM, and then running the PTS Live CD, as we thought this would rule out any disk I/O operations. I compared the results to the bare metal for each server and was surprised by the results:
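For reference, the comparison itself is simple arithmetic: the virtualised score expressed as a percentage change from the bare metal score. The helper below is a minimal sketch of that calculation, not the spreadsheet I used. The function name is hypothetical, and the `higher_is_better` flag is an assumption to cover benchmarks that report a time rather than a throughput.

```python
def degradation_pct(bare_metal: float, virtualised: float,
                    higher_is_better: bool = True) -> float:
    """Return the performance lost under virtualisation, as a percentage.

    A positive result means the VM performed worse than bare metal;
    a negative result means it performed better.
    """
    if bare_metal <= 0 or virtualised <= 0:
        raise ValueError("benchmark results must be positive")
    if higher_is_better:
        # Throughput-style score (e.g. requests per second).
        return (bare_metal - virtualised) / bare_metal * 100.0
    # Time-style score (e.g. seconds to complete), where lower is better.
    return (virtualised - bare_metal) / bare_metal * 100.0
```

With this definition, my expectation of "under 10%" means a result between 0 and 10 for every test on every server.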
Benchmarking is something that needs to be considered very carefully and objectively. Not all benchmarks are equal. Phoronix Test Suite was good in the sense that you can benchmark specific workloads, and I chose to focus on the Apache and PostgreSQL tests in the product, as these closely represented the workloads I needed to improve performance on. At this point it was decided to use PTS Desktop Live (http://www.phoronix-test-suite.com/?k=pts_desktop_live), as it was felt that this would ensure all things were equal, regardless of platform.
Not everything was exactly equal though: none of the hardware was like for like in specifications. As I was working towards matching and improving the score of the R910 against the DL380 G5, the best I could achieve was to ensure that the benchmarks were consistent. In this case the Live CD achieved this by ensuring that the same Linux build and benchmarking tools were being used. I will discuss additional factors impacting the results as I continue through this series.
When someone says their Virtual Machine is running slowly, the first thing we do is check out the performance graphs with the vSphere client. Everything looks normal, and so it is easy to dismiss it as an application issue. However, what happens when this issue is the result of a previously undetected problem that appears to have multiple components to it?
This is the problem I am currently working on with a Dell R910 server. The Dell R910 is quite a powerful machine designed for more intensive workloads, so it certainly comes as a surprise that we are seeing performance issues.
This issue is still being worked on, and I do not know what the outcome will be. However, I will be documenting the steps being taken to benchmark performance and how we might improve it.