2015-09-13 08:18:02

I want to experiment with Apache Hadoop, and to create a cluster of machines I will use KVM with Ubuntu guests.

To create VMs, the first step is to install vm-builder, which lets you create new virtual machines directly from the command line:

sudo apt-get install python-vm-builder
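A quick way to confirm the install worked is to check that the wrapper is on the PATH (package and command names as on Ubuntu 14.04 "trusty"):

```shell
# Sanity check: the python-vm-builder package installs a "vmbuilder"
# command; printing its usage confirms the install worked.
which vmbuilder
vmbuilder --help
```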

Current build script (for creating a raw VM with just Java and OpenSSH):

export MY_GNOME="gnome1"
sudo vmbuilder kvm ubuntu --suite trusty --flavour virtual \
    --destdir "/media/data/kvm/${MY_GNOME}" \
    --rootsize 10000 \
    --domain qatal.de \
    --addpkg acpid --addpkg openssh-server --addpkg linux-image-generic --addpkg openjdk-7-jre-headless \
    --user admin --pass admin \
    --mirror http://gb.archive.ubuntu.com/ubuntu/ --components main,universe,restricted \
    --arch amd64 --hostname "${MY_GNOME}" \
    --libvirt qemu:///system --bridge virbr0 \
    --mem 2048 --cpus 1

This builds a machine with a 10 GByte root file system and 2 GByte of RAM.
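Because the build script passes --libvirt qemu:///system, vmbuilder registers the new guest as a libvirt domain. A sketch of booting it and logging in, assuming the domain is named after the hostname (gnome1):

```shell
# Boot the freshly built guest; the domain name is assumed to match
# the --hostname passed to vmbuilder above.
export MY_GNOME="gnome1"
virsh --connect qemu:///system start "${MY_GNOME}"
virsh --connect qemu:///system list --all

# Once the guest has picked up a DHCP lease on virbr0, log in with
# the credentials from the build script (user admin, password admin):
#   ssh admin@<guest-ip>
```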

My plan was to create a couple of such VMs and then "install and run Hadoop" on them. But "one does not simply install a Hadoop cluster" - I underestimated the complexity of the project, so I am currently back at the recommended beginner step: installing a single standalone node before diving into the cluster setup via Puppet.
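For that standalone step, the smoke test from the Apache Hadoop single-node setup guide is a reasonable start. A sketch, assuming Hadoop 2.7.1 (a current release at the time of writing) and the OpenJDK 7 path matching the package installed by the build script above:

```shell
# Fetch and unpack a Hadoop release (the version is an assumption).
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar xzf hadoop-2.7.1.tar.gz
cd hadoop-2.7.1

# Standalone mode runs everything in a single JVM: no daemons, no HDFS.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# The grep example from the official docs: search the bundled config
# files for a pattern and write the matches to the output directory.
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
    grep input output 'dfs[a-z.]+'
cat output/*
```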