Limits of virtualization

It has been over 10 months since we started to use virtualization seriously, running Windows inside virtual machines to ease the pain of installation and configuration. It started as a convenient way to isolate two different development environments (.NET 1.1 based and .NET 2.0 based) and avoid "cross-pollination" in the data analytics project. At that time, I expected the limits of what you can and cannot do in a virtual environment to be mostly around performance, responsiveness and device support (USB especially). As it turned out, all of that worked much better than I had ever expected. With the new versions of Parallels, performance is very good, and for the user (meaning a fellow developer) the experience is barely distinguishable from developing on the host system, assuming that you have a decent dual-core system with 2 GB of RAM, of course. Using Parallels gives you the added benefit of moving the virtual environment between Windows, Mac and Linux hosts, which is very convenient.

We have also started to use virtualization on the server side, using Microsoft Virtual Server 2005 R2, and I am happy to report that it worked very smoothly as well. In the biometric project, we ran UA testing on a circuit of 5 replicated SQL Server instances, each running in its own virtual machine. The circuit was hosted on two quad-core (2x dual-core) servers with 4 GB of RAM each. Using virtual machines allowed us to achieve repeatability and consistency when setting up the environment: we cloned one install and renamed the VMs' hosts.

And here comes the catch: because it is very easy to copy a virtual disk in order to test some new software, plugin or configuration, after some time we ended up with quite a collection of virtual machines and experienced the first limit of virtualization: configuration management. It is pretty hard to keep exact track of what is installed in which VM: what version of which software, what the network settings are, which user accounts and access rights exist. It can easily lead to an administrative nightmare and can require effort comparable to managing an environment of hundreds of computers (it essentially is that environment). In a development shop such as ours you can cheat a bit and standardize on the same usernames and passwords for each VM, but that is not very secure and hardly a recommended approach for production ...
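To make the tracking problem a bit more concrete, here is a minimal sketch of comparing what two VMs claim to have installed. It assumes you have some way of collecting an inventory from each VM into a simple dictionary; the function name, the keys and the sample values are all made up for illustration and do not come from any particular tool.

```python
def diff_inventories(a, b):
    """Return {setting: (value_in_a, value_in_b)} for every setting that differs.

    A setting missing from one inventory shows up with None on that side.
    """
    differences = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            differences[key] = (a.get(key), b.get(key))
    return differences


# Hypothetical inventories of two development VMs (illustrative values only).
vm_old = {"dotnet_framework": "1.1", "sql_client": "installed", "admin_user": "dev"}
vm_new = {"dotnet_framework": "2.0", "admin_user": "dev"}

for setting, (old, new) in diff_inventories(vm_old, vm_new).items():
    print(f"{setting}: {old!r} vs {new!r}")
```

Even something this simple only helps if the inventories are collected regularly, which is exactly the administrative effort described above.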

The second limit we have seen is the Windows update effect. The VMs, which represent "alternative universes", seldom run at the same time. With Windows updates coming almost daily, the first thing that happens when you get back to a VM that was sitting idle for 3 months is the installation of 37 updates, interwoven with 7 reboots. A pretty time-consuming and boring activity. If you are a math geek, you can define a function that computes the number of wasted hours from the number of VMs, their inactivity and the frequency of security updates - and find out how many VMs you should own so that all your working hours are consumed by switching the VMs on and off and waiting for the updates to finish ...
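For the math geeks, a rough sketch of such a function could look like the one below. Everything in it is an assumption made for illustration: the minutes spent per update and per reboot, the 160 working hours per month, and the simplification that updates and reboots pile up linearly while a VM sits idle.

```python
import math


def catch_up_hours(idle_months, updates_per_month, reboots_per_month,
                   minutes_per_update=2.0, minutes_per_reboot=5.0):
    """Hours needed to patch one VM that sat idle for `idle_months`.

    Assumes (hypothetically) that updates and reboots accumulate linearly
    with idle time and that each one costs a fixed number of minutes.
    """
    updates = idle_months * updates_per_month
    reboots = idle_months * reboots_per_month
    return (updates * minutes_per_update + reboots * minutes_per_reboot) / 60.0


def vms_to_fill_working_time(idle_months, updates_per_month, reboots_per_month,
                             working_hours_per_month=160.0, **minutes):
    """Number of VMs at which patching alone eats all working hours.

    Assumes every VM is woken up once per `idle_months` and patched then.
    """
    hours_per_vm_per_month = catch_up_hours(
        idle_months, updates_per_month, reboots_per_month, **minutes
    ) / idle_months
    return math.ceil(working_hours_per_month / hours_per_vm_per_month)


# Roughly the case from the text: 37 updates and 7 reboots after 3 idle months.
print(catch_up_hours(3, 37 / 3, 7 / 3))
print(vms_to_fill_working_time(3, 37 / 3, 7 / 3))
```

The per-update and per-reboot minutes are guesses; plug in your own measurements to get a number that means something for your shop.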

There is no really 100% good solution for this. Running all VMs all the time is not practical, and switching the updates off completely is dangerous. Again, in a development shop you can (and should) batch the updates and update in "waves" - it will still consume time, but at least the "patch level" of the VMs will be consistent and you will save some time by merging some reboots. And it is not only Windows updates that cause problems: keeping e.g. versions of assemblies installed in the GAC (or Ruby gems) in sync across multiple virtual machines can be a challenge too.

The third challenge is licensing and license management. I do not mean the legal side of software licensing related to running software in VMs - just the pure technical implications of doing it. Many software products and subscription-based services track client requests to enforce the allowed number of client installs. For example, an anti-virus product, which must download a new library version almost daily, can use the "get update" request and the current client version as a means to check that only the licensed number of clients are getting updates under the same license id. It can get quite confused when you repeatedly roll back the VM state and return to a starting point two weeks ago - or alternate between running two different snapshots of the same VM. Even if there is never more than a single instance of the VM running at the same time with the licensed version of the software - and only one licensed copy was ever installed - it is very hard to distinguish this from a situation where a second (illegal) copy of the software was installed on a second host, virtual or not. I can imagine this will lead to some quite interesting challenges on both the technology and legal sides ...
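To illustrate why a rollback looks suspicious from the vendor's side, here is a toy sketch of an update server that remembers the newest version it handed out per license id. This is not how any specific anti-virus vendor works; the class, its method and the license id are hypothetical and only model the idea described above.

```python
class UpdateServer:
    """Toy model of a vendor server tracking "get update" requests."""

    def __init__(self):
        # license_id -> newest library version handed out so far
        self.last_version = {}

    def get_update(self, license_id, client_version):
        """Return (new_version, suspicious) for one "get update" request.

        A client reporting an older version than the server already handed
        out for this license looks like a second install, even if it is
        just the same VM rolled back to an earlier snapshot.
        """
        previous = self.last_version.get(license_id, 0)
        suspicious = client_version < previous
        new_version = max(previous, client_version) + 1
        self.last_version[license_id] = new_version
        return new_version, suspicious


server = UpdateServer()
version, _ = server.get_update("LIC-1", 0)           # fresh install
version, _ = server.get_update("LIC-1", version)     # normal daily update
_, flagged = server.get_update("LIC-1", 0)           # VM rolled back to old snapshot
print("rollback flagged as second install:", flagged)
```

The server has no way to tell the rolled-back VM from a genuine second copy: both show up as an old client version under a license that has already moved on.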
