Getting ready to upgrade vRA from 6.x to 7.x? If your system contains imported VMs, pay attention.
There’s a known issue with imported VMs when doing an in-place upgrade. VMware has a KB 2150515 on the issue, but you have to do the fix before you upgrade. As it happens, we noticed the KB after the upgrade was done and things were in a bad state. If you can, use the KB. If it is too late, read on.
EDIT: Updated for EHC 4.1.2
Architecting large scale cloud solutions using VMware products have several maximums and limits when it comes to scalability of the different components. People tend to look at only vSphere limits, but the cloud also has several other systems with different kind of limits to consider. In addition to vSphere, we have limits with NSX, vRO, vRA, vROps and the underlying storage. Even some of the management packs for vROps have limitations that can affect large scale clouds. Taking everything into consideration requires quite a lot of tech manual surfing to get all the limitations together. Let’s inspect a maxed out EHC 4.1.2 configuration and see where the limitations are.
EHC 4.1.1 can support up to 4 sites with 4 vCenters across those sites with full EHC capabilities. On top of that, we can connect to 6 more external vCenters without EHC capabilities (called vRA IaaS-Only endpoints). There are many things to consider when designing a multi-site solution, but one aspect is often omitted: latency. If you have two sites near each other, latency is usually not a problem. When it comes to multiple sites across continents, then we need to consider roundtrip times (RTT) between the main EHC Cloud Management Platform and remote sites very carefully. There are many components that connect over the WAN to the main instance of EHC and vice versa, and some of the components are sensitive to high latency. It’s also difficult to find exact information on what kind of latencies are tolerated. Often the manuals just state that “can be deployed in high latency environments” or something similar. Let’s try to find some common factors on how to design multi-site environments. For a quick glance of the latencies involved, scroll down to a summary table at the end of this post. For a bit more explanation, read on!
As part of our Enterprise Hybrid Cloud, we deploy a Log Insight instance to gather the logs from the various components of the solution. Back in the days of EHC 3.5 and older, we used to have a single Log Insight appliance or a cluster, and all the syslog servers were pointed to that. Since EHC 4.0, that design has changed. Now we utilize a separate Log Insight Forwarder instance to collect and forward some of the logs. The reason behind this change is the ability of EHC 4.0 and newer to connect several remote sites (or vCenters) to one main instance of EHC. We want to collect logs from the remote sites as well, but it’s not efficient from networking perspective to collect the logs straight from the components over WAN to the main Log Insight cluster. Log Insight has a nifty built-in feature called Event Forwarding, that can push the local logs to a central location. It’s designed to work over WAN, so it can optimize the network usage and also can encrypt the traffic between sites. Pretty cool! There are plenty of other reasons to use forwarding as well.