Have you ever looked in OpenManage Essentials and seen the above when looking at a device? I recently had this experience while checking on a number of older servers for which we were not receiving alerts properly.
Checking the iDRAC and the server, it appeared that the management agents were running and correctly configured so that the OME server could contact the device, yet attempts to discover and inventory it were still failing. What was going on?
The Dell Troubleshooting Tool is an excellent utility from Dell for interrogating devices using a variety of protocols. Querying the iDRAC over WSMAN soon found the problem:
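For reference, a similar WSMAN query can be reproduced from any Linux box with the openwsman CLI. The class name, flags, IP address and credentials below are illustrative placeholders from memory rather than a verified recipe, so check them against your own tooling:

```shell
# Enumerate the system inventory from the iDRAC over WSMAN.
# An expired SSL certificate shows up immediately, as the HTTPS
# session fails unless peer/host verification is disabled (-V -v).
wsman enumerate \
  "http://schemas.dell.com/wbem/wscim/1/cim-schema/2/DCIM_SystemView" \
  -h 192.168.0.120 -P 443 -u root -p calvin \
  -y basic -V -v
```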
Many of the older servers had internal SSL certificates installed on them, which had subsequently expired. As most of the servers had been decommissioned, renewing the certificates had been overlooked.
Getting rid of the expired certificate is not as straightforward as it should be, as the iDRAC6 web interface offers no way to delete the certificate. This was resolved by accessing the iDRAC via SSH and using the racadm command:
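For anyone hitting the same thing, the sequence was along these lines (command names as I recall them from the iDRAC6 racadm reference; double-check against the documentation for your firmware):

```shell
# Run over an SSH session to the iDRAC6 itself.
racadm sslresetcfg    # regenerate the factory default self-signed certificate
racadm racreset soft  # reboot the iDRAC to apply it (the host OS is untouched)
```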
After the iDRAC interface rebooted to apply the changes, OME was able to discover and inventory it.
The Dell Troubleshooting Tool proved a very useful addition to the infrastructure admin’s toolbox for dealing with non-obvious management protocol issues.
Having covered the methodology in Part 1 and the bare metal results in Part 2, we now get to what was perhaps the more controversial aspect of my testing at work: performance under a virtualisation platform. We primarily use VMware, so the tests were run on that.
Disclaimer: The numbers obtained below are not indicative of the true performance of the server and should not influence any purchasing decisions made.
With virtualisation, I’m conservative and generally expect some minimal performance degradation of under 10%. The first tests consisted of creating an empty VM and then running the PTS Live CD, as we thought this would rule out any disk I/O operations. I compared the results to the bare metal runs for each server and was surprised by the results:
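To make “degradation” concrete: for each benchmark I compared the VM score against the bare metal score on the same server. A quick sketch with made-up numbers (not real results, per the disclaimer above):

```shell
# Hypothetical figures only: compute the percentage drop from bare metal to VM.
bare_metal=1000   # e.g. Apache requests/sec on bare metal (made-up figure)
vm=870            # same benchmark inside an empty VM (made-up figure)
awk -v b="$bare_metal" -v v="$vm" \
  'BEGIN { printf "degradation: %.1f%%\n", (b - v) / b * 100 }'
# prints: degradation: 13.0%
```

Anything much beyond that 10% expectation is the point at which I start digging rather than shrugging.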
Benchmarking is something that needs to be considered very carefully and objectively; not all benchmarks are equal. The Phoronix Test Suite was good in the sense that you can benchmark specific workloads, and I chose to focus on its Apache and PostgreSQL tests, as these closely represented the workloads I needed to improve performance on. At this point it was decided to use PTS Desktop Live (http://www.phoronix-test-suite.com/?k=pts_desktop_live), as it was felt this would ensure all things were equal, regardless of platform.
Not everything was exactly equal, though: none of the hardware was like-for-like in specification. As I was working towards matching and improving the score of the R910 against the DL380 G5, the best I could do was ensure the benchmarks were consistent. The Live CD achieved this by ensuring the same Linux build and benchmarking tools were used on every machine. I will discuss additional factors affecting the results as I continue through this series.
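For the targeted runs outside the Live CD, the Phoronix Test Suite invocations looked roughly like this (test profile names as I recall them; `phoronix-test-suite list-available-tests` will confirm the current names):

```shell
# Install and then run the Apache and PostgreSQL test profiles.
phoronix-test-suite install pts/apache pts/pgbench
phoronix-test-suite benchmark pts/apache pts/pgbench
```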
When someone says their virtual machine is running slowly, the first thing we do is check the performance graphs in the vSphere client. Everything looks normal, so it is easy to dismiss it as an application issue. But what happens when the issue is the result of a previously undetected problem that appears to have multiple components to it?
This is the problem I am currently working on with a Dell R910 server. The R910 is quite a powerful machine designed for more intensive workloads, so it certainly comes as a surprise that we are seeing performance issues.
The issue is still being worked on, and I do not know what the outcome will be. However, I will be documenting the steps taken to benchmark performance and how we might improve it.
HP released an advisory last week detailing an issue with upgrading iLO 3 interfaces from a particular firmware version to version 1.5. It appears to affect tools using the hponcfg utility, the command line (CLI) interfaces and the HPSUM update tools.
The good news is that there is a fix, with HP releasing revised toolsets on the iLO 3 site and advising that version 4.1.0 or later of the tools resolves the issue.
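Before flashing, it is worth confirming which version of the tools you actually have installed; on a RHEL-family host that is just a package query (the package name `hponcfg` is an assumption here, so adjust for your distribution):

```shell
# Confirm the installed tool version is 4.1.0 or later before upgrading iLO 3.
rpm -q hponcfg
```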
Where I work we have recently acquired a number of Dell R620 servers for use in remote locations. Personally I think they are a great all-round server for the light to medium workloads seen on the infrastructure we manage.
I’ve been pushing hard to use automated deployment systems that rely on the iDRAC interface, the plan being that people rack and stack a previously unopened server at the remote site, use the front panel to configure the iDRAC network settings, and then come back to work to provision the server remotely. The default username and password for the iDRAC are well known. That is, until recently, when we placed a server on site, confirmed we could remotely contact the iDRAC, and came back only to find the defaults not working.
It would appear, from the batch of servers we have and from anecdotal reports from friends deploying Dell servers, that some servers’ iDRAC interfaces ship with a “faulty” default password. This was confirmed in a follow-up call to Dell support. The current fix is (you guessed it) to set the password yourself before going out on site. Whilst an inconvenience, luckily this was discovered after only one server had gone out, and not all of them.
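If you can still reach the iDRAC with the defaults, the password can be changed with racadm before the box ships. The group/object names below are from the iDRAC7 racadm reference as I recall it, so verify against your firmware; the IP address and passwords are placeholders:

```shell
# Change the password for user index 2 (root) via remote racadm.
racadm -r 192.168.0.120 -u root -p calvin \
  config -g cfgUserAdmin -o cfgUserAdminPassword -i 2 'NewStrongPassword'
```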
So just a heads-up: if you are relying on the iDRAC to remotely provision brand new boxes, you may want to set the username/password yourself rather than assuming the defaults will work.