Hadoop 2.x resources

This post contains links to further Hadoop resources.

Dewarim.com download area - contains:

  • Hadoop 2.7.1 compiled for Raspberry Pi 2 B
  • Image of the Hadoop master node for Raspberry.

Learning Hadoop - my GitHub repository with code examples

Other resources:

Raspberry Pi 2 case: a comparison of 6 models

Testing Raspberry Pi 2 B enclosures

So I bought a couple of enclosures for my Raspberry Pi 2 B computers...

Note: the links go to Amazon.de, though those are not affiliate links.

Cases tested (name, price at Amazon, comment):

Orbital Case - Das runde Gehäuse für den Raspberry Pi - 2. Generation (Dark Black)
Price @ Amazon: 17,90 €
Comment: Would not buy again. This is a fully closed enclosure, so you cannot connect anything except via the back ports. Connecting a power supply is difficult, and you cannot connect an HDMI cable at all, so its only use case is as a ... I don't know. What was I thinking?

Exclusives High Quality Designer Metall Gehäuse Case für Raspberry Pi+ PLUS und Pi 2 Model B (die neueste Version 2015)
Price @ Amazon: 14,99 €
Comment: Solid metal enclosure, very expensive but nice. Easy to assemble. The metal case is held together with 4 additional screws, so it won't snap open if it falls to the ground.

Exclusives Gehäuse Case für Raspberry Pi Model B + (B Plus) Case und Pi 2 Model B (die neueste Version 2015) -- Three Part Kunststoffgehäuse (Black Sl
Price @ Amazon: 9,99 €
Comment: Plastic box from Raspberry meets Rydges®. Like the metal case, the plastic parts are fixed with four additional screws. Costs 25% of the price of the Raspberry itself, which is still expensive, but overall well done.

Exclusives Gehäuse Case für Raspberry Pi Model B + (B Plus) Case und Pi 2 Model B (die neueste Version 2015) -- Two Part Kunststoffgehäuse (Black Builder Edition)
Price @ Amazon: 9,99 €
Comment: Another plastic box from Raspberry meets Rydges. This one exposes all connectors to the outside, which makes it more suitable for hardware wizards but less resistant to dust entering the enclosure. The two plastic parts snap together and only loosely hold to each other (that is, you can open this case with one hand).

Aukru® NEU 3-in-1 kit Enthält - Raspberry Pi 2 Model B Gehäuse/Case transparent + Netzteil 5V 2000mA Micro usb + 2x Kupfer Kühlkörper Für Raspberry Pi
Price @ Amazon: 15,99 €
Comment: Like the Black Builder case, with the same openings, but transparent and with a power supply and a screwdriver as well as two copper cooling blocks. The screwdriver was really useful in assembling all the cases :). Even though it looks exactly like the Black Builder one, this one somehow seems more flimsy, and putting the screws in place was more difficult as they tend to get stuck in the plastic.

Eleduino Raspberry Pi 2 Model B and Raspberry Pi Model B+ (B Plus) Metal Gehäuse Case with Cooling Fan Black
Price @ Amazon: 18,90 €
Comment: A black metal case with a cooling fan, but without any instructions on how to assemble it. It looks like you first have to affix the fan, otherwise the Raspberry will block access to the lower pair of screws. But I do not really need the fan, and since I do not know how to connect it to the Raspberry, I skipped installing it. The case is okay, but if you need a metal case, the 14,99 € version seems better.

Assembly of all cases took only a couple of minutes and was nowhere near as difficult as some Amazon reviews suggested. The Raspberry Pi 2 B always fit very well (with the exception of the clear plastic case - it fits well, too, but the screws are not as tight as I would like them to be, so the board may move a little up and down in the case if shaken).

Recommendation: For plastic, use the Black Builder edition of Raspberry meets Rydges. For a metal case, the designer metal case is not bad, but somewhat expensive.

Raspberry Hadoop Cluster - Bill of Materials

For a cluster of 5 Raspberry Pi 2 B instances, I bought:

Everything you need for a 5 Pi cluster

Item Price
TP-Link TL-SG 108: 8 port Gigabit Switch 1 x 25,99 €
Transcend TS-RDF5K Card reader 1 x 7,68 €
Raspberry Pi 2 B 5 x 39,99 = 199,95 €
Anker PowerPort (60W 6 Port USB Charger) 1 x 29,99 €
3 USB Charger cables 2 x 7,49 = 14,98 €
Transcend 32 GByte microSDHC memory cards 5 x 11,89 = 59,45 €
TFPNet 5-pack CAT 6 Ethernet cables 1 x 9,95 €
Total 347,99 €

  • Downloaded Raspbian, the Debian-based OS for Raspberries
  • Copied it to the first card with the USB card reader and did the basic setup
  • Copied the card four times (in hindsight, I should have done more setup first, such as installing Hadoop, before copying the system, as there are always little things that need installing later on, for example Emacs)
  • Set up DHCP in the router for the 5 Ethernet MAC addresses so each Raspberry always gets the same IP (poor man's DNS)
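Once every Raspberry has a fixed IP, the node names still have to resolve on each machine. One common way is a set of /etc/hosts entries, and the sketch below prints such entries. The function name hosts_entries and the address range 192.168.1.101-105 are assumptions for illustration only; use whatever addresses your router actually hands out.

```shell
#!/bin/sh
# Print /etc/hosts-style lines for a small cluster.
# $1: network prefix (assumed here: 192.168.1)
# $2: last octet of the first node
# $3: number of nodes
hosts_entries() {
    prefix=$1
    first=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # node names are numbered from 1: node1, node2, ...
        printf '%s.%d\tnode%d\n' "$prefix" $((first + i)) $((i + 1))
        i=$((i + 1))
    done
}

# Example: five nodes at the hypothetical addresses 192.168.1.101-105
hosts_entries 192.168.1 101 5
```

The output could then be appended to /etc/hosts on each node, after checking that it matches the DHCP reservations.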

And the initial test run with all 5 raspberries connected:


Fix hanging NameNode for Hadoop 2.7.1 on Raspberry Pi 2 B

The problem: In /usr/local/hadoop-2.7.1 I can run ./sbin/start-dfs.sh, and the processes for both the namenode and the datanodes on all nodes are started. jps shows that the DataNode, NameNode and SecondaryNameNode are clearly running. But they are not binding to any port - neither netstat nor "lsof -i" shows any TCP ports being used besides the SSH port.

Running any hdfs command gets me a "connection refused" error.

Apache Hadoop has a nice wiki page about how any networking problems with connection refused are not their problem, but mine. Okaaaay.

I searched for hours - was it my DNS / IP settings, was it one of the configuration files, was it being on Raspbian?

No. In the end, I found a posting on StackOverflow:


The reason for the Hadoop services not binding to any port / refusing all connections was: the process was hanging because of an old version of the Google Guava jar.

So I wrote a fix to download and install a more current version:

# Download Guava 18.0 (Maven Central is now served from repo1.maven.org)
cd /tmp
wget https://repo1.maven.org/maven2/com/google/guava/guava/18.0/guava-18.0.jar
export HADOOP_SHARED=/usr/local/hadoop-2.7.1/share/hadoop

# Remove the old Guava 11.0.2 jar from every Hadoop component
rm "${HADOOP_SHARED}/common/lib/guava-11.0.2.jar"
rm "${HADOOP_SHARED}/hdfs/lib/guava-11.0.2.jar"
rm "${HADOOP_SHARED}/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/guava-11.0.2.jar"
rm "${HADOOP_SHARED}/kms/tomcat/webapps/kms/WEB-INF/lib/guava-11.0.2.jar"
rm "${HADOOP_SHARED}/tools/lib/guava-11.0.2.jar"
rm "${HADOOP_SHARED}/yarn/lib/guava-11.0.2.jar"

# Put the new jar in the same places
cp guava-18.0.jar "${HADOOP_SHARED}/common/lib/guava-18.0.jar"
cp guava-18.0.jar "${HADOOP_SHARED}/hdfs/lib/guava-18.0.jar"
cp guava-18.0.jar "${HADOOP_SHARED}/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/guava-18.0.jar"
cp guava-18.0.jar "${HADOOP_SHARED}/kms/tomcat/webapps/kms/WEB-INF/lib/guava-18.0.jar"
cp guava-18.0.jar "${HADOOP_SHARED}/tools/lib/guava-18.0.jar"
cp guava-18.0.jar "${HADOOP_SHARED}/yarn/lib/guava-18.0.jar"
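The fixed list of paths only covers the places where I found the old jar. A more general variant searches the tree for every copy and swaps each one. This is a minimal sketch, not the script I actually ran: the function name replace_jar is made up for this example, and it assumes the new jar has already been downloaded to /tmp.

```shell
#!/bin/sh
# Replace every copy of an old jar below a directory tree with a new jar.
# $1: root directory to search
# $2: file name of the old jar
# $3: path to the new jar
replace_jar() {
    root=$1
    old_name=$2
    new_jar=$3
    find "$root" -type f -name "$old_name" | while IFS= read -r old; do
        # Copy the new jar next to the old one, then remove the old one
        if cp "$new_jar" "$(dirname "$old")/"; then
            rm "$old"
            echo "replaced $old"
        fi
    done
}

# Usage with the paths from this post (uncomment to run on a node):
# replace_jar /usr/local/hadoop-2.7.1/share/hadoop guava-11.0.2.jar /tmp/guava-18.0.jar
```

Because the old jar is only deleted after the copy succeeded, a failed copy leaves that directory untouched.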

And now Hadoop (at least the HDFS part) starts and listens on the default ports:

hduser@node1 ~ $ sudo lsof -i tcp
sshd 2169 root 3u IPv4 6612 0t0 TCP *:ssh (LISTEN)
sshd 2292 root 3u IPv4 6663 0t0 TCP node1:ssh->turtle.local:55560 (ESTABLISHED)
sshd 2296 hduser 3u IPv4 6663 0t0 TCP node1:ssh->turtle.local:55560 (ESTABLISHED)
java 4395 hduser 196u IPv4 13062 0t0 TCP *:50090 (LISTEN)
java 5029 hduser 190u IPv4 22738 0t0 TCP *:50070 (LISTEN)
java 5029 hduser 202u IPv4 21275 0t0 TCP node1:8020 (LISTEN)
java 5029 hduser 212u IPv4 21877 0t0 TCP node1:8020->node4:36701 (ESTABLISHED)
java 5029 hduser 213u IPv4 21878 0t0 TCP node1:8020->node2:46195 (ESTABLISHED)
java 5029 hduser 214u IPv4 21879 0t0 TCP node1:8020->node5:44667 (ESTABLISHED)
java 5029 hduser 215u IPv4 21880 0t0 TCP node1:8020->node3:46794 (ESTABLISHED)
java 5029 hduser 216u IPv4 21882 0t0 TCP node1:8020->node1:36134 (ESTABLISHED)
java 5130 hduser 192u IPv4 21626 0t0 TCP *:50010 (LISTEN)
java 5130 hduser 196u IPv4 21632 0t0 TCP localhost:57456 (LISTEN)
java 5130 hduser 250u IPv4 18108 0t0 TCP *:50075 (LISTEN)
java 5130 hduser 251u IPv4 21851 0t0 TCP *:50020 (LISTEN)
java 5130 hduser 262u IPv4 23654 0t0 TCP node1:36134->node1:8020 (ESTABLISHED)
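To see at a glance whether the Java processes are bound to anything, the listening ports can be filtered out of output like the above. The following is a small sketch under the assumption that the lsof output has the column layout shown here; the function name java_listen_ports is made up for this example.

```shell
#!/bin/sh
# Read `lsof -i tcp` output on stdin and print the ports
# on which java processes are listening, one per line.
java_listen_ports() {
    awk '$1 == "java" && $NF == "(LISTEN)" {
        # The address column looks like "*:50070" or "node1:8020";
        # the port is the part after the last colon.
        n = split($(NF - 1), parts, ":")
        print parts[n]
    }' | sort -n | uniq
}

# On a node:
# sudo lsof -i tcp | java_listen_ports
```

An empty result while jps still lists the NameNode and DataNode would be the hanging state described above.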


