This week we ran into an interesting problem during a Federation Enterprise Hybrid Cloud implementation. We had the solution implemented with VMware vRealize Automation 6.2, and everything was running smoothly. The vRA implementation was done as a distributed install, so after configuration we moved on to vRA component failover testing. We succeeded in failing over from the primary to the secondary component on all of the different VMs (vRA appliance, IaaS Web, IaaS Model Manager + IaaS DEM-O, IaaS DEM-Workers and IaaS DEM-Agents), but failback was not successful. After diving into the component logs, we found a distinctive error in almost all of them:
System.Configuration.ConfigurationErrorsException: Error creating the Web Proxy specified in the 'system.net/defaultProxy' configuration section
This error appeared on the IaaS Model Manager, DEM-O and DEM-Agents. The rest of the components failed back just fine. The symptom was that the VMware vCloud Automation Center Service and the DEM-Orchestrator Service would not start on reboot. We also could not restart them manually: they would fail, and the same error would appear in the logs. The error points to a .NET call that sets a default proxy according to the web.config file found on the Windows host (Windows\Microsoft.NET\Framework\v4.0.30319\Config). We had not modified these files, so the error did not make a lot of sense. A web.config file also exists in some of the vRA folders, so the origin of the error was unclear. It was clear, however, that the vRA code was calling a .NET function during service start, and that the call failed with a proxy error. This led us on a wild goose chase with VMware support for a couple of days. It became clear that either the security settings or the Windows image were preventing the services from starting. Since the issue only occurred after rebooting the Windows VMs, GPO seemed the prime suspect. After engaging the customer's Windows/Security SME, we found the root of the problem.
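Because the same kind of config file lives in several places, one quick way to rule the files themselves in or out is to list every .config file that actually declares a system.net/defaultProxy section. The following is only a minimal sketch of that check, not something we ran during the engagement: the function name find_default_proxy_configs is hypothetical, and the directory you point it at (the .NET Config folder, a vRA install folder) is up to you.

```python
import os
import xml.etree.ElementTree as ET


def find_default_proxy_configs(root_dir):
    """Return sorted paths of *.config files under root_dir that contain
    a <defaultProxy> element inside a <system.net> element.

    Hypothetical helper for illustration; it only reads files.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith(".config"):
                continue
            path = os.path.join(dirpath, name)
            try:
                root = ET.parse(path).getroot()
            except (ET.ParseError, OSError):
                # Malformed or unreadable file: skip it rather than abort.
                continue
            # Look for <system.net> anywhere, then a direct <defaultProxy> child.
            for section in root.iter("system.net"):
                if section.find("defaultProxy") is not None:
                    hits.append(path)
                    break
    return sorted(hits)
```

If none of the files returned has changed since the install, as was the case for us, the proxy error is more likely a symptom of something else, such as permissions, than a genuine configuration problem.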
Our customer runs a high-security environment, so their GPO settings are very strict. The vRA manuals say to grant these rights to the IaaS Service User:
"Log on as a batch job" and "Log on as a service"
We verified these settings, and everything was according to the vRA requirements. However, using Process Explorer (https://technet.microsoft.com/en-gb/sysinternals/bb896653.aspx), the customer SME found that the Service User needs an additional local privilege called Bypass Traverse Checking. Process Explorer actually shows the privilege under its internal name, SeChangeNotifyPrivilege, which is the same privilege that grants Bypass Traverse Checking. More info on that here: http://blogs.technet.com/b/markrussinovich/archive/2005/10/19/the-bypass-traverse-checking-or-is-it-the-change-notify-privilege.aspx. After granting the user this additional right, all of the services restarted!
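If you want to check for this up front, the built-in Windows command whoami /priv lists the privileges in the current user's token, including SeChangeNotifyPrivilege (the internal name for Bypass Traverse Checking). Below is a rough sketch of parsing that output. The helper names are made up for illustration, the parser assumes the usual column layout of whoami /priv, and the command has to be run in a session of the IaaS Service User to say anything about that account.

```python
import subprocess

# Internal name of the "Bypass traverse checking" user right.
REQUIRED_PRIVILEGE = "SeChangeNotifyPrivilege"


def parse_privileges(whoami_output):
    """Map privilege name -> state column from `whoami /priv` text output.

    Assumes each privilege row starts with a name like SeXxxPrivilege
    and ends with its state.
    """
    privileges = {}
    for line in whoami_output.splitlines():
        parts = line.split()
        if (len(parts) >= 2
                and parts[0].startswith("Se")
                and parts[0].endswith("Privilege")):
            privileges[parts[0]] = parts[-1]
    return privileges


def has_privilege(whoami_output, name=REQUIRED_PRIVILEGE):
    """True if the privilege is listed at all, whatever its state."""
    return name in parse_privileges(whoami_output)


def current_token_has_privilege(name=REQUIRED_PRIVILEGE):
    """Run `whoami /priv` (Windows only) for the current user and check it."""
    result = subprocess.run(
        ["whoami", "/priv"], capture_output=True, text=True, check=True
    )
    return has_privilege(result.stdout, name)
```

On a Windows host, calling current_token_has_privilege() from a session running as the service account returns False if a GPO has removed the right, which would have pointed us at the cause much earlier.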