Wheels are turning. As we move on from the IaaS space to offer a more developer friendly PaaS solution, it’s time to learn some Pivotal Cloud Foundry! I wanted to implement PCF on my own to see how it functions under the hood, and also see how it reacts in a, hmm, more challenging infrastructure environment. I’m running a ridiculously small vSphere lab, which is waaaay under the requirements for PCF. Also, I do get frequent power outages because I forget that I’m running servers and flick the power switch carelessly ;).
Let’s see where we are. Here are the official minimum requirements for Pivotal Cloud Foundry:
- Disk space: 1TB
- Memory: 120GB
- vCPU cores: 80
- Overall CPU: 28 GHz
This is what I have:
- Disk space: 1.8TB free on NAS (7,200 RPM disks) and an internal 100GB SSD on one server
- Memory: 32GB
- vCPU cores: 8 (mobile i3 and i5 processors)
- Overall CPU: 8.18 GHz
- Single Ethernet port on servers
Oh boy… I don’t really care about the disk space requirement, since I have the capacity and it’s going to be thin provisioned anyway. The memory and vCPUs, however, might turn out to be a real blocker. I don’t even have the full 32GB of memory at my disposal, since my vCenter and DNS servers take 8GB. I’ve done enough PoCs and lab builds to be fairly certain that the actual need is nowhere near the stated requirements, but the question is how much is really needed to make this run. The only way to know is to install, so here we go.
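To put the gap in numbers, here’s a quick back-of-the-envelope sketch using the figures listed above (the 8GB deduction is the vCenter and DNS overhead I mentioned; treat these as rough numbers, not a sizing tool):

```python
# My lab vs. the official PCF minimums, straight from the lists above.
PCF_MINIMUMS = {"disk_tb": 1.0, "memory_gb": 120, "vcpu_cores": 80, "cpu_ghz": 28.0}
MY_LAB = {"disk_tb": 1.8, "memory_gb": 32, "vcpu_cores": 8, "cpu_ghz": 8.18}

# vCenter and the DNS server already eat 8 GB of the 32 GB.
usable = dict(MY_LAB, memory_gb=MY_LAB["memory_gb"] - 8)

for key, required in PCF_MINIMUMS.items():
    have = usable[key]
    status = "OK" if have >= required else f"short by {required - have:.2f}"
    print(f"{key:>11}: need {required}, have {have} -> {status}")
```

Only the disk comes out ahead; everything else is short by a wide margin.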
The three main components of PCF are the Ops Manager, the Ops Manager Director and the Elastic Runtime. The first two are for managing the PCF environment, whereas the Elastic Runtime is the actual Cloud Foundry environment where apps live. The installation outline is quite simple:
- Deploy Ops Manager OVA to the vSphere environment
- Log into Ops Manager UI and configure the Director
- Deploy Director
- Upload Elastic Runtime file to Ops Manager
- Configure Elastic Runtime
- Deploy Elastic Runtime
You can follow the installation docs for detailed steps; I’m not going to go through them one by one, since Pivotal documents them very well. I would suggest creating vSphere Resource Pools to make sure your main components get the necessary resources during congestion. For PCF, these can be specified in the Director configuration under Availability Zones. I created two AZs: one for the ‘Singleton’ jobs (the Director itself and the NFS server) and one for the Elastic Runtime VMs. AZs are defined in the Director config and assigned in both the Director and Elastic Runtime configs.
I did run into some issues during my installation. My analysis is that it all came down to a lack of resources, and more specifically to servers timing out during requests. I made a couple of tweaks to my environment to get the installation through; it failed several times with different errors before I found the right settings. Everything went fine without tweaking until I tried to deploy the Elastic Runtime. During that deployment, about 20 VMs are pushed to vSphere, so this is where my problems started. The first part of the deployment went fine. After the actual push of the VMs, there are some extra steps called Errands. In this stage a few additional features are installed, all of them optional. The first is a Smoke Test, which checks that the Elastic Runtime is OK and apps can be deployed. This failed once for me with a ‘StagingError’. I suspect there was a timeout when uploading the necessary bits for the test application; in fact, I saw the same thing after the installation succeeded and I pushed my first app. The Errands stage also installs Apps Manager, the GUI for Pivotal Cloud Foundry. It’s not strictly needed, but I wanted it as well for the full experience. This part was the toughest to get through; it failed constantly with different errors, for example database and upload errors. This is where I had to start tweaking the system.
During the configuration of Elastic Runtime, you can modify the resource config for the different VMs. There are some minimum requirements, and the installer will tell you if you configure the settings too low. I changed the Diego Cell size to ‘medium.disk’ to reduce the memory footprint. Diego Cells are the VMs where applications and their containers run, and I don’t need that much application capacity just to test PCF, so those settings could be reduced. I also moved my Director VM to the SSD and changed my DRS setting to ‘Partially Automated’. I noticed during the failed installation attempts that the Director started dropping packets at some point, causing random timeout errors, and that DRS started moving VMs during critical installation stages due to resource congestion. CPU cycles seemed to be the bottleneck, and I saw very high cpu_ready values in vSphere. PCF requires DRS to be enabled, but ‘Partially Automated’ is a supported setting; this way I saved enough CPU cycles to get rid of the timeouts. On top of that, I only have a single Ethernet port on my servers, so all management and data traffic goes through the same port, and vMotion traffic can cause even more timeouts.
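For reference, vSphere reports CPU ready in real-time charts as a summation in milliseconds per 20-second sampling interval, and converting it to the percentage most sizing guidance quotes is a one-liner. This is a generic sketch of that conversion, not PCF-specific tooling:

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a vSphere CPU ready summation (ms) into a percentage.

    Real-time charts sample every 20 seconds, so 2000 ms of accumulated
    ready time in one interval means the VM spent 10% of its time
    waiting for a physical CPU.
    """
    return (ready_ms / (interval_s * 1000.0)) * 100.0

# A VM that accumulated 2000 ms of ready time in a 20 s interval:
print(cpu_ready_percent(2000))  # -> 10.0
```

Anything in the double digits, like I was seeing, means the VMs are starved for CPU.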
After these modifications the installation went through without problems! You might need to run it several times, since the timeouts are quite random; this won’t happen if you have the necessary resources, mind you. I also hit a timeout during app push a couple of times, again due to my poor environment. I wouldn’t recommend trying PCF for the first time in a setup like mine; it’s far easier and a better experience to have a go at the PCF public cloud, Pivotal Web Services.
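Since the failures were random timeouts rather than real errors, simply re-running the step usually worked. A small retry-with-backoff wrapper captures the idea; `flaky_push` below is a hypothetical stand-in for whatever step keeps timing out (an Apply Changes run, an app push, and so on):

```python
import time

def retry(operation, attempts=3, base_delay=1.0):
    """Run `operation`, retrying with exponential backoff on timeouts."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TimeoutError as exc:
            if attempt == attempts:
                raise  # out of attempts, give up for real
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Hypothetical stand-in for a flaky deployment step: fails twice, then succeeds.
calls = {"n": 0}
def flaky_push():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upload timed out")
    return "pushed"

print(retry(flaky_push, base_delay=0.1))  # -> pushed (after two retries)
```

In practice I just clicked Apply Changes again, which amounts to the same thing done by hand.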
I yanked the power cable to see what happens. Everything came up disappointingly well, nothing to fix 🙂