Building a bridge between a virtual machine and the outside world

Posted by m.jackson on 9 February 2015 - 2:00pm

By Mike Jackson, Software Architect.

The Distance project at the University of St. Andrews use Windows XP virtual machines for developing their Distance for Windows software. Their interface code, implemented in Visual Basic, is not held under revision control and institutional security policies mean that their XP virtual machines cannot be connected to the network.

In this blog post, I describe my experiences of using Git and shared folders to address both these problems, as part of our recent open call collaboration.

Distance for Windows

Distance sampling is a widely-used methodology for estimating animal density or abundance. Distance for Windows is software for the design and analysis of distance sampling surveys of wildlife populations. It consists of a Visual Basic 6 interface on top of analysis engines written in FORTRAN, the statistical programming language R, and ESRI MapObjects. It is built from a number of legacy components, many of which are no longer supported by their vendors, including Visual Basic 6 itself.

Distance for Windows development is currently done on Windows XP virtual machines, which run under VMWare Player and are configured with the required legacy components and tools. However, the Visual Basic source code is not held under any form of revision control and, even if it were, institutional security policies prohibit these XP virtual machines from being connected to the network. This prevents the use of networked repositories, whether these be locally-hosted Subversion repositories or Git repositories hosted on GitHub, for example.

I decided to explore the use of Git - which supports both non-networked and networked usage - and VMWare shared folders to address both these problems.

Git

Git is a distributed revision control system. Unlike Subversion or CVS, which rely upon a centralised server storing all the versions of the files, Git repositories store all these versions locally, and every developer has their own copy (or clone) of a repository. A developer can commit changes to their local repository as often as they wish, without the need for a network connection. Developers can then push their changes to, or pull changes made by others from, other clones of the repository, e.g. pushing changes to a clone on GitHub.
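As a minimal sketch of committing without any network connection, the following creates a repository in a scratch directory and records two commits. The directory name, file name, file contents and developer identity are all hypothetical, chosen only for illustration, and a POSIX shell with Git 1.8.5 or later (for the -C option) is assumed.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Example Developer" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Create a brand new repository in a scratch directory (hypothetical name).
repo="$(mktemp -d)/distance-src"
git init --quiet "$repo"

# First commit: add a placeholder Visual Basic module.
printf 'Attribute VB_Name = "Main"\n' > "$repo/Main.bas"
git -C "$repo" add Main.bas
git -C "$repo" commit --quiet -m "Add main module"

# Second commit: change the file and commit again, still entirely offline.
printf "' Entry point goes here\n" >> "$repo/Main.bas"
git -C "$repo" commit --quiet -am "Document entry point"

# Show the history, which lives wholly inside the local repository.
git -C "$repo" log --oneline
```

Nothing here contacts a server: the full history is held in the .git directory inside the repository.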

It is this ability to use Git locally, without a network connection, that makes it a good tool for revision control on a VM that cannot be connected to a network, such as those used by Distance. However, if the VM were corrupted then a developer would lose their entire history of changes. So, how can we get the contents of the local Git repository into a Git repository hosted elsewhere?

Shared folders

To allow files to be shared between the virtual machine and the host machine (the machine on which VMWare Player is running the virtual machine), a shared folder can be set up. A shared folder is a folder on the host machine that is available within the virtual machine and can be used as if it were a network drive. So, for example, via the VMWare Player interface, one can declare that a folder on the host machine (e.g. C:\Users\mike\Local Documents\distance-consultancy) is to be a shared folder. This will then appear within the VM as a network drive (e.g. "Shared Folder on 'vmware-host' (Z:) Network Drive").

Git repositories in shared folders

A shared folder allows Git to push and pull changes from a repository within the non-networked VM to the network, via the host machine:

  • On the host machine, a developer clones the repository from, for example, GitHub into the shared folder.
  • Within the VM, the developer can use this as a local Git repository to commit their changes.
  • When there is a need to push changes to the remote repository, they can, on the networked host machine, invoke a Git push within the shared folder to push the changes to the remote repository.
  • When there is a need to pull changes from the remote repository, they can, on the networked host machine, invoke a Git pull within the shared folder to pull the changes from the remote repository.
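The steps above can be sketched as follows. This is an illustration only: scratch directories stand in for the GitHub repository and the shared folder, a single POSIX shell plays both the host and the VM, and all names and file contents are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Example Developer" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
remote="$work/remote.git"              # stands in for the repository on GitHub
shared="$work/shared-folder/distance"  # stands in for the clone in the shared folder
mkdir -p "$work/shared-folder"
git init --quiet --bare "$remote"

# Host: clone the remote repository into the shared folder.
git clone --quiet "$remote" "$shared"

# VM: work in the shared folder and commit locally.
echo "Distance for Windows" > "$shared/README.txt"
git -C "$shared" add README.txt
git -C "$shared" commit --quiet -m "Add README"
branch=$(git -C "$shared" symbolic-ref --short HEAD)

# Host: push the committed changes to the remote repository.
git -C "$shared" push --quiet origin HEAD

# Host: pull any changes made by others (there are none in this sketch).
git -C "$shared" pull --quiet origin "$branch"
```

Only the clone, push and pull steps need the network, and these all run on the host machine.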

A Visual Basic blocker

The above solution works with Distance's Windows XP VM running on a networked host machine. However, when it was tested with the Distance for Windows Visual Basic code, the Visual Basic projects would not build. An attempt to build a Visual Basic project resulted in a dialog box with the message "Error during load. Refer to '\\vmware-host\Shared Folders\...'".

"\\vmware-host\Shared Folders" is a path in Universal Naming Convention (UNC) syntax, which is used to describe the location of networked resources, such as the shared directory here. Searches for a way to get Visual Basic to load projects located on the shared folder were fruitless (if you know how, please comment below!).

Git saves the day

Git's commands to clone a repository and share changes between repositories (push and pull) use a URL which specifies the repository the changes are to be shared with. Usually these are used with http- or https-based URLs but they can be used with file-based URLs too. For example, to clone a local repository locally one can do:

git clone file:///home/mjj/my-code my-code-cloned

This flexibility provided a workaround to the problem of Visual Basic code being located within a shared folder, since a local clone of the repository in the shared folder could be made, still within the VM but outwith the shared folder:

  • On the host machine, a developer clones the repository from, for example, GitHub into the shared folder.
  • Within the VM, the developer clones a local copy of this repository, outwith the shared folder, from the clone within the shared folder.
  • Within the VM, the developer can use this local copy as a local Git repository to commit their changes.
  • When there is a need to push changes to the remote repository, they can, on the VM, invoke the Git push command within the local repository to push their changes to the shared folder repository. On the networked host machine, within the shared folder repository, they then invoke the Git push command to push their changes to the remote repository.
  • When there is a need to pull changes from the remote repository, they can, on the networked host machine, within the shared folder repository, invoke the Git pull command to pull the changes from the remote repository. On the VM, within the local repository, they then invoke the Git pull command to pull the changes from the shared folder repository.
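The two-hop workflow above can be sketched as follows, again with scratch directories standing in for GitHub, the shared folder and the VM's own disk, and with all names and file contents hypothetical. One assumption needs stating loudly: by default Git refuses a push to the branch that is checked out in a non-bare repository, so this sketch sets receive.denyCurrentBranch to updateInstead (available from Git 2.3) in the shared folder clone. This setting is a choice made for the sketch and may not be how the Distance team configured their repositories.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Example Developer" GIT_AUTHOR_EMAIL="dev@example.org"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
remote="$work/remote.git"              # stands in for the repository on GitHub
colleague="$work/colleague/distance"   # another developer's clone
shared="$work/shared-folder/distance"  # clone inside the shared folder
vm_local="$work/vm-disk/distance"      # clone on the VM, outwith the shared folder
mkdir -p "$work/colleague" "$work/shared-folder" "$work/vm-disk"

# Set up a remote repository that already holds one commit.
git init --quiet --bare "$remote"
git clone --quiet "$remote" "$colleague"
echo "Distance for Windows" > "$colleague/README.txt"
git -C "$colleague" add README.txt
git -C "$colleague" commit --quiet -m "Initial commit"
git -C "$colleague" push --quiet origin HEAD

# Host: clone into the shared folder and allow pushes to its checked-out
# branch (assumption of this sketch, requires Git 2.3 or later).
git clone --quiet "$remote" "$shared"
git -C "$shared" config receive.denyCurrentBranch updateInstead

# VM: clone a local copy from the shared folder clone using a file URL.
git clone --quiet "file://$shared" "$vm_local"

# VM: commit locally, then push to the shared folder repository.
echo "placeholder form" > "$vm_local/Form1.frm"
git -C "$vm_local" add Form1.frm
git -C "$vm_local" commit --quiet -m "Add form"
git -C "$vm_local" push --quiet origin HEAD

# Host: push from the shared folder repository to the remote repository.
git -C "$shared" push --quiet origin HEAD

# Elsewhere: a colleague publishes a change to the remote repository.
git -C "$colleague" pull --quiet origin "$(git -C "$colleague" symbolic-ref --short HEAD)"
echo "release notes" > "$colleague/NOTES.txt"
git -C "$colleague" add NOTES.txt
git -C "$colleague" commit --quiet -m "Add notes"
git -C "$colleague" push --quiet origin HEAD

# Host: pull into the shared folder repository. VM: pull into the local clone.
git -C "$shared" pull --quiet
git -C "$vm_local" pull --quiet
```

Each direction takes two invocations, one per hop, which is the overhead the workaround introduces.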

This approach, while introducing the overhead of additional push and pull invocations, served as a workaround to the problems caused by Visual Basic code residing in a shared folder, while still allowing revision control to be used.

Conclusion

A long-term solution to Distance's reliance on Windows XP VMs is to update the technologies used to build Distance for Windows. The Distance team are already exploring options around this. However, the foregoing provides a workaround to allow them to place their Visual Basic code under version control in the meantime, and to host it on GitHub. This workaround was possible due to the support within VMWare for shared folders - allowing a Git repository to co-exist on the host and virtual machines and be used by both - and Git's ability to be used without a network connection and its support for pushing changes between repositories (whether they be locally or remotely hosted).

For a step-by-step guide on how to use these workarounds, please see our guide on Using Git with shared folders and virtual machines.