Friday, December 30, 2011

Running WordCount example in Hadoop installation at cygwin

I couldn't find a spare PC to install Ubuntu. So a cygwin on my Windows PC seems to be the only solution for running Hadoop (I did try to install Ubuntu desktop on the same PC, and then there is a wireless adapter error). I followed most of the steps that one can find out on Google. To me, the most helpful notes come from http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/

My setup: Windows 7 professional (32 bit), Dell Studio. And I have a single node Hadoop installation on the cygwin environment.

I think most of the steps described in http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-windows/ are still quite correct. But when setting up $JAVA_HOME, I would suggest to use the Environment Variables settings in the Windows environment. The setup in conf\hadoop-env.sh did not work for me. And I also set the $PATH variable to point to the Java/jdk1_..../bin folder in the Windows environment.The reason is that the javac and java commands running in cygwin are actually based on the installed Java SDK in windows. 


After the setup, my next issue was to run the WordCount example. There were two parts. Running the WordCount.jar seems to be easy and successful. However, when I tried to use javac to compile the source code of WordCount.java, I met many errors, such as "package cannot be found...". 


After a few hours of struggle, I found out the answer. The issues come from the classpath for Windows environment (when you run the command in cygwin). Here is the final solution for running the javac command in cygwin bash. 


javac -classpath "C:\cygwin\hadoop-0.20.2\hadoop-0.20.2-core.jar;C:\cygwin\hadoop-0.20.2\lib\commons-cli-1.2.jar" -d playground/classes playground/src/WordCount.java

Another thing to remember is to make sure that you do have java 1.6.x installed in Windows (at the time of this note, the hadoop version I used is 0.20.2). Anything in Java1.7 will not work.


Thanks to the many discussion about this issue even though they did not give direct help but just some ideas. Hopefully this note can be helpful for someone else on the same road :)

3 comments:

Swati said...

Hi, I was trying to do a similar thing.I have installed hadoop and cygwin on Windows 7 64-bit.
But, when I try to run WordCount program, I get the below error inspite of setting the HADOOP_HOME.
Can you please help me with the same since you had got it working.
Thanks in advance.

$ bin/hadoop jar /Hadoop/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount
Using JAVA_HOME: C:\software\Java\jdk1.6.0_45
Using HADOOP_HOME: C:\Hadoop\hadoop-2.6.0
15/02/03 17:53:46 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.(Shell.java:363)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
at org.apache.hadoop.util.GenericOptionsParser.(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.(GenericOptionsParser.java:153)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Usage: wordcount [...]

Unknown said...

I have configured in your way and its working. But when I am running code , its completing map 100 but reduce 0%/. Please I am facing this issue since long time please help me

Hadoop course in adyar
Hadoop training in Tambaram
Hadoop course in Tambaram

Unknown said...

The content provided here is vital in increasing one's knowledge regarding hadoop, the way you have presented here is simply awesome. Thanks for sharing this. The uniqueness I see in your content made me to comment on this. Keep sharing article like this. Thanks :)

Hadoop Training in Chennai | Hadoop training institutes in chennai | Big data training in Chennai