Ah, the joys of upgrading the home lab. It’s almost guaranteed that something goes wrong, since I don’t really spend much time maintaining my environment. I wanted to update my vCenter Appliance from 5.5 Update 3d to Update 3e. I normally use the built-in update functionality of the vCenter Appliance VAMI page. That has been one of my favourite and best features, and it has never failed. Well, until now.
The download and update process worked until the final reboot. After that, I noticed that I could not login to vCenter, so I logged back into VAMI. The vCenter Server service was not running. This is the time for a deep breath, because it’s not gonna be pretty. I did try my luck with rebooting the appliance, but of course that didn’t help. In my experience, if the vCenter Server does not start, it’s almost always the database. Log time.
Wheels are turning. As we move on from the IaaS space to offer a more developer friendly PaaS solution, it’s time to learn some Pivotal Cloud Foundry! I wanted to implement PCF on my own to see how it functions under the hood, and also see how it reacts in a, hmm, more challenging infrastructure environment. I’m running a ridiculously small vSphere lab, which is waaaay under the requirements for PCF. Also, I do get frequent power outages because I forget that I’m running servers and flick the power switch carelessly ;).
Well, there’s actually a CLI command to do the steps below. Just run vcac-vami installation-wizard activate, and it does everything for you. Sounds like a clean approach to me.
vRA 7.0 comes with a nice Installation Wizard to ease the process of getting vRA and IaaS Components running. However, if you butter finger the installation process by clicking Cancel and not really reading what vRA is trying to tell you (I did that), you cannot access the installation wizard again. It’s a manual installation after that, and I’m not going to do that anymore. So, let’s fix it.
I’ve been running a home lab a while with a couple of Intel NUCs. They have been absolutely brilliant, but they do have a slight problem with network card drivers. ESXi 5.5 didn’t support the Intel 82579LM Ethernet Controller inside NUCs, so you had to create a custom ISO image with the correct drivers. Today I wanted to upgrade my old ESXi 5.5 image to the latest one (I’m prepping to give vRA 7.0 a go). To my happy surprise, VMware has included the necessary E1000 drivers in the ESXi 5.5 U3 (and newer) package! Oh joy, no more custom images!
Configuring vCenter / vRealize Orchestrator in a cluster mode can be tricky. There are several sources of information how to do that, including the official VMware documentation (vCenter Orchestrator 5.5.2 Documentation), so it’s not that big of a problem. Upgrading the plugins in cluster mode, however, can be challenging. There’s a procedure you have to follow if you don’t want to end up in a situation where plugins keep disappearing for no good reason.
This week we ran into an interesting problem during a Federation Enterprise Hybrid Cloud implementation. We had the solution implemented with VMware vRealize Automation 6.2, and everything was running smoothly. The vRA implementation was done as a distributed install, so after configuration we moved to do some vRA component failover testing. We succeeded in failing over the primary component to secondary component on all of the different VMs (vRA appliance, IaaS Web, IaaS Model Manager + IaaS DEM-O, IaaS DEM-Workers and IaaS DEM-Agents), but failback was not successful. After diving into the component logs, we found a distinctive error on almost all of them:
System.Configuration.ConfigurationErrorsException: Error creating the Web Proxy specified in the 'system.net/defaultProxy' configuration section
O-oh, here we go, the problems start pouring in. After installing nova in my lab, I noticed that a few services (nova-cert, nova-conductor, nova-consoleauth, nova-scheduler) failed to start after reboot. In fact, if you check them immediately after reboot, they are started, but they fail after a while. From the logs I found this line:
Can't connect to MySQL server on 'controller'
So it seems that our DB is not up and running. Let’s check the db state:
service mysql status
Hmm, it’s up! If you restart the services now, everything will work. The problem is that in OpenStack, there is no check if the db is actually alive, it just issues the start commands and moves on. If for some reason the db is not accepting connections when nova services start, they will not function. I did some closer inspection of the logs, and there is a nice 2 second gap between mariadb still trying to get everything up and nova services trying to connect. It would be a lot easier if the db just didn’t start, it would be simple to adjust retries and delays (http://docs.openstack.org/juno/config-reference/content/list-of-compute-config-options.html#config_table_nova_database). Now we need to figure out how to delay the nova services by a couple of seconds or make the operating system try again when the services fail. A script after reboot checking the services and after finding a service not started it would try to restart them. That would work but it’s not neat. After banging my head to the OpenStack wall I couldn’t find an answer how to delay the start of services or do a retry. None of the delay options from the manual work. Well, a simple script will do the job for now.
Another thing. DO NOT MESS UP THE HOSTNAMES! I’ve done that twice now, stupid me, and this is what you get:
Nova still does not understand that the host might change its hostname. If an existing host has a new hostname, it is considered a brand new host with new services. I had to clean my services list with nova service-delete ID. MAC address check, anyone?