TripleO and Golden Images
I spend a good deal of my time these days working upstream on OpenStack TripleO. TripleO has made a lot of great progress recently toward being able to deploy and manage a production-quality OpenStack cloud. Lately, I’ve seen a lot of growth in community activity and interest.
I figured a good way to help provide some background on some of the architectural concepts of TripleO would be to do a series of blog posts going into a little more detail about some of these points and the reasoning behind them.
First off though, I don’t take credit for the ideas I’m going to cover :). A lot of different hard-working folks have helped to shape TripleO into what it is today.
Secondly, there are lots of other great resources out there on TripleO. A YouTube search will pull up a lot of videos and presentations from different folks, including some from community members who bootstrapped the project. These are excellent resources, so if you have further interest, check those videos out!
So, jumping into the first topic I’d like to talk about: Golden Images.
TripleO’s goal is to deploy OpenStack using OpenStack itself wherever possible. Given that the unit of deployment in the typical cloud model is an image (qcow2, raw, ovf, etc.), it’s not surprising that TripleO deploys to physical baremetal hardware nodes using pre-built images. The deployment process itself of course uses Nova, which has two available baremetal drivers, nova-baremetal and ironic. They both work roughly the same way: when launching an image on a baremetal node, the qcow2 image is converted to raw and dd is used to write the bits to the physical disk of the baremetal node over iSCSI.
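To make that last step a bit more concrete, here is a rough Python sketch of the two commands involved. It only builds the command lines and does not run anything. The function name, file names, and block size are my own placeholders, and it assumes the node’s disk is already visible as a local block device; the real drivers handle the iSCSI attachment and error cases themselves, so treat this as an illustration rather than what they literally execute.

```python
from typing import List


def imaging_commands(qcow2_path: str, raw_path: str, target_device: str) -> List[List[str]]:
    """Return the commands that convert a qcow2 image to raw and write it to a disk.

    Hypothetical helper for illustration only; nothing is executed here.
    """
    # qemu-img converts between image formats; -O selects the output format.
    convert = ["qemu-img", "convert", "-O", "raw", qcow2_path, raw_path]
    # dd copies the raw bytes onto the block device (assumed already attached).
    write = ["dd", f"if={raw_path}", f"of={target_device}", "bs=1M", "oflag=direct"]
    return [convert, write]


if __name__ == "__main__":
    # /dev/sdX is a placeholder, not a real device name.
    for cmd in imaging_commands("overcloud.qcow2", "overcloud.raw", "/dev/sdX"):
        print(" ".join(cmd))
```

The point is how little is going on: one format conversion and one byte-for-byte copy.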
The images contain all of the operating system bits, OpenStack software, and initial bootstrap configuration. They are “installed” images, as opposed to installation media themselves. This can be a bit confusing at first because it’s not necessarily typical when deploying to baremetal. You might instead be used to doing baremetal provisioning by actually running an operating system installer such as anaconda with a kickstart file. The TripleO process is more akin to baremetal imaging vs. baremetal provisioning in the traditional sense.
This model makes good sense for cloud. When you boot a VM in a cloud, you’re booting an image that is already installed, i.e., you don’t have to run an installer once the VM is up. So, it makes sense that TripleO would make use of this model and apply it to baremetal deployments as well. Remember, the whole point of TripleO is to prefer and use OpenStack itself.
The deployment process is also much quicker than provisioning via installers. You’re typically only bound by network speed and disk I/O. A rack of baremetal servers with a gigabit switch and all SSDs can be provisioned in just a few minutes.
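For a back-of-the-envelope feel for those numbers, here is a small Python sketch. The 2 GB image size is purely an assumption on my part, and the calculation ignores protocol overhead, shared uplinks, and disk write speed, so it is a lower bound rather than a benchmark.

```python
def transfer_seconds(image_bytes: int, link_bits_per_second: float) -> float:
    """Lower-bound time to move an image over a link, ignoring all overhead."""
    return image_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    image_bytes = 2 * 10**9  # assumed: 2 GB of image data per node
    gigabit = 10**9          # a 1 Gbit/s link
    print(f"{transfer_seconds(image_bytes, gigabit):.0f} s per node at line rate")
```

With those assumptions a single node takes about 16 seconds at line rate, which is why a whole rack in a few minutes is plausible even after real-world overhead.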
To that end, TripleO provides an image building tool, diskimage-builder, which supports building images for many well-known Linux distros. The output from diskimage-builder is a qcow2 image that can be used as a baremetal physical disk. diskimage-builder customizes images by applying what it calls elements during the build process. At their core, elements themselves are just scripts. The script-based nature of the elements provides a practically universal entry point for any customization method you choose. You can write elements to apply puppet modules, install distribution packages, or even run custom scripts.
There’s an existing repository of elements for setting up OpenStack software called tripleo-image-elements. For TripleO’s purposes, this is where all the logic for installing and configuring OpenStack lives.
It’s a valid concern that it could be difficult to build a qcow2 image that will boot on all available hardware out there. But that concern shouldn’t be overstated. Practically every major Linux distribution can produce a live ISO variant (some *only* produce live ISOs, in fact) that will boot on the vast majority of commodity hardware out there, so the same can be accomplished here.
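To show how small an element can be, here is a Python sketch that lays out a minimal one on disk: a directory named after the element containing an install.d hook directory with a single script. The element name, script name, and script contents are all hypothetical, and I’m assuming the usual numbered-script convention for hook directories; check the diskimage-builder documentation for the full list of hooks and their ordering rules.

```python
import os
import stat
import tempfile

# Hypothetical hook script: it would run inside the image during the build.
INSTALL_SCRIPT = """#!/bin/bash
set -eu
echo "customized by example-element" > /etc/example-element-marker
"""


def write_element(elements_dir: str, name: str) -> str:
    """Create a minimal element directory containing one install.d script."""
    hook_dir = os.path.join(elements_dir, name, "install.d")
    os.makedirs(hook_dir, exist_ok=True)
    script_path = os.path.join(hook_dir, "50-example")
    with open(script_path, "w") as handle:
        handle.write(INSTALL_SCRIPT)
    # Mark the script executable, since hook scripts are run as programs.
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return script_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as elements_dir:
        print(write_element(elements_dir, "example-element"))
```

With ELEMENTS_PATH pointing at that directory, a build along the lines of `disk-image-create -o my-image fedora vm example-element` should pick the element up, though the exact invocation may differ between versions.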
That sounds great, but what if you aren’t using commodity hardware or have a heterogeneous environment with some hardware using specialized network cards for instance and some not? That’s a problem that would be solved during the image build process itself. You could write an element that adds driver support for the hardware. If you didn’t want this driver in all your images, you could build a set of images just for the special hardware, and make sure the correct images are used for that hardware in any number of ways (a custom Nova flavor for instance).
This process is really conceptually no different than what might have to be done if using an operating system installer instead for baremetal provisioning. Let’s say you’re installing RHEL throughout your environment and you need to add support for some set of specialty hardware because that support is not available in base RHEL. You likely host a yum repository internally, or use a custom Red Hat Satellite channel to host the 3rd party packages. You then enable that repository and install the packages in your kickstart file.
In either case, you’re enabling support for specialty hardware by installing additional drivers or packages. It doesn’t really matter if you do that at image build time or system installation time.
What about the “Golden” part of the images? The golden implies that the images you plan to deploy in production are “known good” and are the exact same images that you have tested with and passed your CI process. One of the things that makes this attractive is that there is much less room for drift across your environment. If all your systems are deployed from the same set of images, then that should eliminate questions about what package sets are on which systems, what updates have been applied where, etc.
To abuse a software engineering term, in a way deploying via images is more “idempotent” than running installers. Which is more likely to produce the same bit-for-bit result every time? Converting a qcow2 file to raw and dd’ing it to a physical disk, or yum/apt-get installing hundreds of packages, many of which run scripts?
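One simple way to check that the image you are about to deploy really is the one that passed CI is to compare checksums. The Python sketch below is my own illustration, not something TripleO does for you in this form; the function names are hypothetical and the expected digest is assumed to have been recorded by your CI process.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_golden(path: str, tested_digest: str) -> bool:
    """Return True only if the image matches the digest recorded at test time."""
    return sha256_of_file(path) == tested_digest
```

If even one byte of the image differs from what was tested, the digests will not match and the deployment can be stopped before anything is written to disk.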
The idempotent nature of this deployment model is especially important in CI/CD environments where you want to be sure you’re deploying what you’ve actually tested. And, I think that’s what I’ll plan on highlighting in my next post, CI/CD and how it fits in with TripleO. Stay tuned.