Thursday, September 15, 2011

Libvirt And The Virtual Standard

Libvirt is more than just a virtualization library; it strives for standardization among virtual resources. This matters because hypervisors abound, and they all have different capabilities. The Libvirt library provides a single interface for communicating with different hypervisor technologies and treating them as one and the same. But there is more to the Libvirt API than just functions that create and manipulate virtual machines.

More than machines
There is certainly more to operating a virtual infrastructure than provisioning and terminating virtual machines. There are networks, there is storage, and there are the physical hosts themselves that carry out these operations. Any virtualization library hoping to manage these things will need to differentiate between them, creating an abstraction for each.
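In the Python binding these abstractions surface as separate classes (`virDomain`, `virNetwork`, `virStoragePool`, `virStorageVol`), and each kind of entity is described by an XML document with its own root element. The sketch below uses only the standard library to tell entity documents apart by that root element; the `ENTITY_KINDS` table and `entity_kind` function are our own illustration and only a partial mapping, not something Libvirt provides.

```python
import xml.etree.ElementTree as ET

# Hypothetical, partial mapping: XML root element -> libvirt Python class
# that represents that kind of virtual entity.
ENTITY_KINDS = {
    "domain": "virDomain",
    "network": "virNetwork",
    "pool": "virStoragePool",
    "volume": "virStorageVol",
}

def entity_kind(xml_text):
    """Classify an entity description by the name of its root element."""
    root = ET.fromstring(xml_text)
    try:
        return ENTITY_KINDS[root.tag]
    except KeyError:
        raise ValueError(f"unrecognised entity document <{root.tag}>") from None

print(entity_kind("<network><name>default</name></network>"))
```

Keying on the root element keeps the dispatch independent of which hypervisor produced the document.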

Considering the number of hypervisors that Libvirt supports, this is a challenging feat: the goal is for a developer using the library to integrate seamlessly with divergent hypervisors.

These abstractions are the virtual entities that make up a virtual infrastructure. To be interoperable, they need to behave the same; that is, their external interfaces must not differ from one hypervisor connection to the next. If virtual machine control (starting, stopping, destroying, snapshots) were our only concern, this would be no problem. But the fact of the matter is that there are dozens of virtual entity types that need to be acknowledged.

In addition to other virtual entities such as networks and storage, the machine entities themselves have internal parts. Machines are composites made up of several smaller components: CPUs, disk devices, memory. These all fall outside the core virtualization idea of emulating CPU instructions. Effectively managing a large-scale, or even a mid-scale, virtual environment requires abstraction at a higher level.

Virtual documents
Inside code that uses Libvirt (inside the Python binding, at least), these virtual entities are represented as traditional objects. Just like a user object or a file object, virtual entities have methods and attributes used to query the state of the object or to alter it in some fashion. What is unique about how Libvirt deals with its virtual entities is that they're described in an XML format. Any given virtual entity can be reduced to an XML description: a document.
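With a live connection in the Python binding, a domain's document comes from its `XMLDesc()` method, e.g. `conn.lookupByName("web01").XMLDesc(0)`. To stay runnable without a hypervisor, the sketch below skips the connection and parses a trimmed, hypothetical domain document with the standard library's ElementTree; the domain name and values are made up.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical domain description of the kind XMLDesc() returns.
DOMAIN_XML = """<domain type='kvm'>
  <name>web01</name>
  <memory unit='KiB'>524288</memory>
  <vcpu>2</vcpu>
  <os><type arch='x86_64'>hvm</type></os>
</domain>"""

def describe(domain_xml):
    """Reduce a domain document to a few basic facts."""
    root = ET.fromstring(domain_xml)
    return {
        "hypervisor_type": root.get("type"),     # attribute on <domain>
        "name": root.findtext("name"),
        "arch": root.find("os/type").get("arch"),
    }

print(describe(DOMAIN_XML))
```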

What's interesting to me about this approach is that XML has an inherent ability to be decomposed and rearranged. Any programming language can parse and generate XML data, so we can take the XML description of any virtual entity from Libvirt and split it into several smaller documents should we choose to do so.
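As a minimal sketch of that decomposition, assuming a hypothetical domain document, the function below turns each child of `<devices>` into its own standalone XML string. Device-level fragments of this shape are also the kind of document the binding's `attachDeviceFlags()` accepts when hot-plugging a device, though the sketch itself never talks to Libvirt.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain XML; only the <devices> section matters here.
DOMAIN_XML = """<domain type='kvm'>
  <name>web01</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/web01.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>"""

def split_devices(domain_xml):
    """Return each device element as its own standalone XML document."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        return []
    fragments = []
    for dev in devices:
        dev.tail = None  # drop trailing whitespace so each fragment stands alone
        fragments.append(ET.tostring(dev, encoding="unicode"))
    return fragments

for fragment in split_devices(DOMAIN_XML):
    print(fragment)
```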

For example, the XML schema for domains (virtual machines) in Libvirt can be somewhat lengthy. The XML description needs to tell Libvirt about everything the machine needs to operate, including all devices, CPU tuning, network connections, etc. So of course it's going to have some heft to it. Our software that uses Libvirt can take a given domain's XML description and find the interesting sub-elements it needs; it can drill down and organize the domain data in its own way.
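Here is one way such drilling down might look: a sketch, assuming a hypothetical domain document and a `DomainSummary` class of our own invention, that keeps only the vCPU count, memory (normalized to KiB), disk target names, and attached networks. Only libvirt's binary memory units are handled; libvirt also accepts decimal units such as `KB` and `MB`, which this sketch rejects with a `KeyError`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical domain XML; memory is given in MiB to exercise unit handling.
DOMAIN_XML = """<domain type='kvm'>
  <name>web01</name>
  <memory unit='MiB'>512</memory>
  <vcpu>2</vcpu>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/web01.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/web01-data.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <interface type='network'>
      <source network='default'/>
    </interface>
  </devices>
</domain>"""

# Binary units only; other libvirt units are deliberately not handled.
KIB_PER_UNIT = {"k": 1, "KiB": 1, "M": 1024, "MiB": 1024, "G": 1024**2, "GiB": 1024**2}

@dataclass
class DomainSummary:
    name: str
    vcpus: int
    memory_kib: int
    disk_targets: list = field(default_factory=list)
    networks: list = field(default_factory=list)

def summarize(domain_xml):
    """Drill into a domain document and keep only the parts we care about."""
    root = ET.fromstring(domain_xml)
    memory = root.find("memory")
    unit = memory.get("unit", "KiB")  # KiB is libvirt's default unit
    return DomainSummary(
        name=root.findtext("name"),
        vcpus=int(root.findtext("vcpu")),
        memory_kib=int(memory.text) * KIB_PER_UNIT[unit],
        disk_targets=[t.get("dev") for t in root.iterfind("devices/disk/target")],
        networks=[s.get("network")
                  for s in root.iterfind("devices/interface[@type='network']/source")],
    )

print(summarize(DOMAIN_XML))
```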

A side effect of using XML in this way is that it gives the various components of our virtual infrastructure the freedom to devise their own abstractions for virtual entities. Let's say you want a memory abstraction for domains, one to which you can attach additional behavior. The memory abstraction is then constructed based on the parsed domain XML. Additionally, this abstraction could play a role in constructing XML descriptions for domains.
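A minimal sketch of such a memory abstraction follows. `DomainMemory` and its `balloon_to` method are hypothetical names, and the sketch assumes all memory values are in KiB. It reads `<memory>` (the maximum allocation at boot) and the optional `<currentMemory>` (the actual allocation, which defaults to `<memory>` when omitted) from a domain document, adds a bit of behavior, and writes the result back into XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

DOMAIN_XML = """<domain type='kvm'>
  <name>web01</name>
  <memory unit='KiB'>1048576</memory>
  <vcpu>2</vcpu>
</domain>"""

@dataclass
class DomainMemory:
    """Hypothetical memory abstraction; assumes all values are in KiB."""
    max_kib: int      # <memory>: maximum allocation at boot
    current_kib: int  # <currentMemory>: actual allocation, may be lower

    @classmethod
    def from_domain_xml(cls, domain_xml):
        root = ET.fromstring(domain_xml)
        max_kib = int(root.findtext("memory"))
        current = root.findtext("currentMemory")
        # <currentMemory> is optional; when absent it equals <memory>.
        return cls(max_kib, int(current) if current is not None else max_kib)

    def balloon_to(self, kib):
        """Added behavior: change current memory within the configured maximum."""
        if not 0 < kib <= self.max_kib:
            raise ValueError("current memory must be positive and at most the maximum")
        self.current_kib = kib

    def apply_to(self, domain_xml):
        """Return domain_xml with this object's memory settings written back in."""
        root = ET.fromstring(domain_xml)
        for tag, value in (("memory", self.max_kib), ("currentMemory", self.current_kib)):
            elem = root.find(tag)
            if elem is None:
                elem = ET.SubElement(root, tag)
            elem.set("unit", "KiB")
            elem.text = str(value)
        return ET.tostring(root, encoding="unicode")

mem = DomainMemory.from_domain_xml(DOMAIN_XML)
mem.balloon_to(524288)
print(mem.apply_to(DOMAIN_XML))
```

The abstraction never owns the whole document; it only reads and rewrites the elements it cares about, leaving the rest of the domain description untouched.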

Another corollary of using XML documents is that they're documents: they can naturally be treated as shared resources and passed around in a distributed environment.

More than one host
Another, perhaps less obvious, reason for having XML documents describe the virtual entities in our infrastructure is that there are going to be multiple hosts. The host is where the hypervisor manages virtual machines, and any infrastructure of real size needs more than one. Libvirt really shines here because our hypervisor environment may be heterogeneous: many different hypervisors, each on its own physical hardware.

Now imagine that we've got our own tailor-made software for controlling various things, namely the management of the virtual infrastructure. This means that we'll need to pass the XML documents describing virtual entities to the various hosts that control those entities.

For example, let's say we want to define a new domain.  Our software builds the XML document that will tell Libvirt how to make it.  Our first attempt to create and start the domain on host A fails.  So we can try it again on another host in our infrastructure.  In theory, we shouldn't care what hypervisor is running on host B for our second attempt — the same XML document that defines the domain will do.  This is a key benefit of having a standardized XML schema for virtual entities — there is one schema for many hypervisors.
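The retry described above might be sketched as follows. To keep it runnable without Libvirt, the connection factory is injected as a parameter; in real code it would be `libvirt.open`, whose connection's `createXML(xml, 0)` creates and starts a transient domain (a persistent domain would use `defineXML()` followed by the domain's `create()` instead). The function name and the URIs in the usage note are hypothetical.

```python
def create_on_first_available(domain_xml, uris, open_connection):
    """Try the same domain document on each host until one accepts it.

    open_connection is injected so the sketch runs without libvirt; in real
    code it would be libvirt.open.
    """
    failures = {}
    for uri in uris:
        conn = None
        try:
            conn = open_connection(uri)
            # Same XML document for every host, whatever hypervisor it runs.
            return uri, conn.createXML(domain_xml, 0)
        except Exception as exc:  # with libvirt, catch libvirt.libvirtError instead
            failures[uri] = exc
            if conn is not None:
                conn.close()
    raise RuntimeError(f"no host could start the domain: {failures}")
```

A call would look like `create_on_first_available(xml, ["qemu+ssh://host-a/system", "qemu+ssh://host-b/system"], libvirt.open)`. The caller decides the order of hosts; the function only guarantees that one document is tried everywhere.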

Beyond XML standardization, Libvirt offers another kind of virtualization standard: connecting to multiple hosts. One option is to write agent code that uses Libvirt to control our virtual infrastructure, but that code would need to live on each host where a hypervisor runs. Fortunately, we can instead use Libvirt to establish remote connections to the hypervisors, regardless of which hypervisor each host runs. This part of the standard means we can focus on our controlling code and on building what best serves our infrastructure. The physical hosts don't need our application running on them, only Libvirt and the hypervisor itself.
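Remote connections are selected by URI, which follows the pattern `driver[+transport]://[user@][host][:port]/[path][?params]` and is passed to `libvirt.open()` in the Python binding. The sketch below, using only the standard library, splits such a URI into its parts; `describe_uri` is a hypothetical helper and the hostnames are made up.

```python
from urllib.parse import urlsplit

def describe_uri(uri):
    """Split a libvirt connection URI into driver, transport, user, host and path."""
    parts = urlsplit(uri)
    driver, _, transport = parts.scheme.partition("+")
    return {
        "driver": driver,              # e.g. qemu, xen
        "transport": transport or None,
        "user": parts.username,
        "host": parts.hostname,        # None means a local connection
        "path": parts.path,
    }

print(describe_uri("qemu:///system"))
print(describe_uri("qemu+ssh://admin@host-b.example.com/system"))
```

The same controlling code can talk to a local hypervisor or a remote one; only the URI changes.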

The absent stuff
The Libvirt library fills a large gap between disparate hypervisor technologies. Perhaps most importantly, Libvirt unifies and abstracts the virtual entities common to any infrastructure. This is how we're able to use multiple hosts to deploy virtual machines without having to worry about the hypervisor installed on each. Another innovation this field was sorely missing is the XML document used to describe virtual entities. XML isn't new to most code, and it can be quickly and easily adopted by any software.

But despite these achievements, Libvirt isn't exactly a standard, although nothing else better unifies virtualization technology. There is still room for improvement before it fully serves our infrastructures. For example, we know little about the capabilities of the hypervisor's host; this XML schema still hasn't solidified. Another hurdle is that, despite the abstract entities, not every hypervisor supports all of the functionality Libvirt offers. Our code that uses Libvirt can sort out these errors, but one must ask: can there really be a standard when not all hypervisors agree on at least a minimal set of common features?
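What a host does report is available from the binding's `getCapabilities()`, which returns a capabilities XML document. Given that this schema was still in flux, the sketch below reads it defensively, skipping incomplete `<guest>` entries rather than failing, and collects which domain types each (OS type, architecture) pair supports. The sample document is hypothetical and heavily trimmed, and `guest_domain_types` is our own helper, not a Libvirt function.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed capabilities document of the kind getCapabilities() returns.
CAPS_XML = """<capabilities>
  <host><cpu><arch>x86_64</arch></cpu></host>
  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <domain type='qemu'/>
      <domain type='kvm'/>
    </arch>
  </guest>
  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <domain type='qemu'/>
    </arch>
  </guest>
</capabilities>"""

def guest_domain_types(capabilities_xml):
    """Map (os_type, arch) -> set of domain types the host says it can run."""
    root = ET.fromstring(capabilities_xml)
    supported = {}
    for guest in root.iterfind("guest"):
        os_type = guest.findtext("os_type")
        arch = guest.find("arch")
        if os_type is None or arch is None:
            continue  # tolerate incomplete entries rather than failing
        types = {d.get("type") for d in arch.iterfind("domain") if d.get("type")}
        supported.setdefault((os_type, arch.get("name")), set()).update(types)
    return supported

caps = guest_domain_types(CAPS_XML)
print("kvm" in caps.get(("hvm", "x86_64"), set()))
```

Checking a host this way before sending it a domain document turns an unsupported-feature error into a decision the controlling code makes up front.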