One of the things I am big on is ensuring servers maintain consistency in their configuration. Troubleshooting becomes a lot harder when every server is set up a different way.
Dell’s OpenManage Essentials (and OpenManage Enterprise) provide a configuration compliance portal that lets you deploy consistent settings, such as BIOS, iDRAC and network configuration, and view any differences from the baseline template.
However, I’ve been pulling my hair out (what’s left of it) over the past couple of weeks with a particular configuration attribute that was just not updating.
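Here is a minimal Python sketch of the idea behind comparing a device against a baseline template. It is not how OME implements compliance checking. The attribute names and values are hypothetical. The sketch reports every baseline attribute whose value on the device differs from the template or is missing.

```python
from typing import Dict, List, Tuple

# (attribute, expected value from the baseline, actual value on the device)
Drift = Tuple[str, object, object]


def compliance_drift(baseline: Dict[str, object], device: Dict[str, object]) -> List[Drift]:
    """Return every baseline attribute whose value on the device differs.

    Attributes missing from the device are reported with actual=None.
    Attributes that exist only on the device are ignored, because the
    baseline defines what is being enforced.
    """
    drift: List[Drift] = []
    for attribute, expected in sorted(baseline.items()):
        actual = device.get(attribute)
        if actual != expected:
            drift.append((attribute, expected, actual))
    return drift


if __name__ == "__main__":
    # Hypothetical attribute names, not real OME/iDRAC/BIOS identifiers.
    baseline = {"bios.boot_mode": "Uefi", "idrac.ntp_enabled": "Enabled", "nic1.speed": "10G"}
    device = {"bios.boot_mode": "Bios", "idrac.ntp_enabled": "Enabled"}
    for attribute, expected, actual in compliance_drift(baseline, device):
        print(f"{attribute}: expected {expected!r}, found {actual!r}")
```

An attribute that refuses to update, like the one I was fighting with, would show up in a report like this on every run, no matter how many times the template is redeployed.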
So here’s a question I want you to try to answer off the top of your head: which certificate is your domain controller using for Kerberos and LDAPS, and what happens when there are multiple certificates in the crypto store?
The answer is obvious once you know it, but I faced this exact question recently and had to do a bit of poking around to answer it.
In my case, having built a new multi-tier PKI in our environment, I have reached the point of migrating services to it, including the auto-enrolling certificate templates used on Domain Controllers.
Have you ever looked at a device in OpenManage Essentials and seen the above? I recently ran into this while checking on a number of older servers that we were not receiving alerts from properly.
When I checked the iDRAC and the server, the management agents appeared to be running and correctly configured so that the OME server could contact the device, yet discovery and inventory attempts were still failing. What was going on?
The Dell Troubleshooting Tool is an excellent utility from Dell for interrogating devices using a variety of protocols. Querying the iDRAC over WSMAN soon revealed the problem:
Many of the older servers had internal SSL certificates installed on them, and these had since expired. As most of the servers had been decommissioned, renewing the certificates had been overlooked.
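Before reaching for racadm, it can help to confirm which iDRACs are actually presenting expired certificates. Python's default verified TLS context refuses an expired certificate, so this sketch assumes you have already pulled each certificate's expiry line with OpenSSL, for example `openssl s_client -connect <idrac-host>:443 </dev/null | openssl x509 -noout -enddate`, where `<idrac-host>` is a placeholder. The sketch only parses that `notAfter=` line with the standard library's `ssl.cert_time_to_seconds` and compares it with the current time. The helper names are my own.

```python
import ssl
import time
from typing import Optional


def parse_enddate(line: str) -> str:
    """Strip the 'notAfter=' prefix that `openssl x509 -noout -enddate` prints."""
    line = line.strip()
    prefix = "notAfter="
    return line[len(prefix):] if line.startswith(prefix) else line


def is_expired(not_after: str, now: Optional[float] = None) -> bool:
    """Return True if a certificate's notAfter time has passed.

    `not_after` must be in the "Mon DD HH:MM:SS YYYY GMT" form,
    e.g. "Jan  5 09:34:43 2018 GMT", which ssl.cert_time_to_seconds parses.
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return expiry <= current


if __name__ == "__main__":
    line = "notAfter=Jan  5 09:34:43 2018 GMT"  # hypothetical openssl output
    print("expired" if is_expired(parse_enddate(line)) else "still valid")
```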
Getting rid of the expired certificate is not as straightforward as it should be, as the iDRAC6 web interface offers no way to delete the certificate. I resolved this by accessing the iDRAC via SSH and using the racadm command:
After the iDRAC interface rebooted to apply the change, OME was able to discover and inventory it.
The Dell Troubleshooting Tool proved to be a very useful addition to the infrastructure admin’s toolbox for dealing with non-obvious management protocol issues.
Well, I am back in HP territory for a little while, configuring a couple of HP c7000 chassis with some Gen8 blades. Being Gen8, they come equipped with an iLO4 interface, which has given me the opportunity to use the HP iLO mobile app. For the purposes of this article, the app was used on an iPad Air with a Bluetooth keyboard.
Once I had my basic configuration on the blades, I started adding them to the app, which was a little tedious because I had to re-enter the same credentials every time. Dear HP, I would love a setting for a default or global credential store.
This can be worked around, however, particularly if you are familiar with QR codes. Assuming your server room is secure, you can print QR codes encoding a string of hostname;username;password and attach them to the servers (or keep them in a paper-based booklet of ‘codes’). Adding a server then becomes a matter of scanning its QR code with the app. The big problem is that if you do not keep these QR codes secure, anyone with a QR code reader app can obtain the login credentials.
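The workaround only requires each QR code to encode a `hostname;username;password` string. Below is a minimal Python sketch for building and checking those strings. It assumes the app expects exactly three semicolon-separated fields with no escaping, which I have not verified. Rendering the string as an image needs a separate QR code library, not shown here. The class and function names are hypothetical.

```python
from typing import NamedTuple

SEPARATOR = ";"


class IloCredential(NamedTuple):
    hostname: str
    username: str
    password: str


def to_qr_payload(cred: IloCredential) -> str:
    """Join the fields into a hostname;username;password string."""
    for field_name, value in cred._asdict().items():
        if not value:
            raise ValueError(f"{field_name} must not be empty")
        if SEPARATOR in value:
            # The plain format has no escaping, so a ';' would shift the fields.
            raise ValueError(f"{field_name} must not contain {SEPARATOR!r}")
    return SEPARATOR.join(cred)


def from_qr_payload(payload: str) -> IloCredential:
    """Split a scanned payload back into its three fields."""
    parts = payload.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    return IloCredential(*parts)


if __name__ == "__main__":
    # Hypothetical host and credentials, for illustration only.
    cred = IloCredential("ilo-bay1.example.local", "admin", "s3cret")
    print(to_qr_payload(cred))
```

Rejecting a semicolon outright is the conservative choice. Because the format has no escaping, a password containing `;` would otherwise split into an extra field when scanned. Whatever tool renders these strings, the printed codes still carry plain-text credentials, so the same physical security caveat applies.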
As seen in the first image, I have a list of iLO interfaces. A couple of the servers show detailed information, which is collected the first time you connect to the device. Organisation of the devices is very limited: you can keep a favourites list, and that's about it. Dear HP, I would really like to see the ability to organise devices into folders in future releases of this app. Without it, the list will become unwieldy with lots of devices.
It was supposed to be a routine ESXi upgrade. We had discovered that the ONTAP software on one of our NetApps had a memory leak that caused a controller to slow down and eventually fail over every 3-6 months, and a prerequisite for updating ONTAP was upgrading the ESXi hosts to at least version 5.
Having done a number of these upgrades recently, I was confident as I submitted the change requests, notified stakeholders and completed the upgrade.
Having covered the methodology in Part 1 and the bare-metal results in Part 2, we now get to what was perhaps the more controversial aspect of my testing at work: performance on a virtualisation platform. We primarily use VMware, so the tests were run on it.
Disclaimer: The numbers obtained below are not indicative of the true performance of the server and should not influence any purchasing decisions made.
With virtualisation, I’m conservative and generally expect some minimal performance degradation, under 10%. The first tests consisted of creating an empty VM and running the PTS Live CD in it, as we thought this would rule out any disk I/O. I compared the results against the bare-metal figures for each server and was surprised by what I found: