I am trying to install Hadoop on Ubuntu 12.04. Following the instructions from
http://michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/, I installed java-6-openjdk
from the Ubuntu Software Center. I have set JAVA_HOME
in .bashrc
and also set JAVA_HOME
in Hadoop's conf/hadoop-env.sh
. While formatting the namenode, I am getting the following error:
usr/lib/jvm/java-6-openjdk/bin/java no such file or directory.
Thank you. But it's a 64-bit OS.
The guides I followed when I had 12.04 were:
I was actually opposed to the MyLearning one because the first thing it recommended was Oracle Java 7 instead of OpenJDK 7, but I had some issues with OpenJDK 7 when trying this out so I had to go with Oracle.
The guide is mostly straightforward, and here it is:
Install Java
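A minimal sketch of this step, assuming the Oracle Java 7 route the answer mentions (the webupd8team PPA and package name are assumptions, not confirmed by the answer):

```
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
```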
Create Hadoop user
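A sketch of the user setup, following the Michael Noll guide this answer draws on:

```
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
```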
Where hduser is the Hadoop user you want to have.
Configuring SSH
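Assuming the same SSH setup as the Michael Noll guide (install the server, then generate a passphrase-less key as hduser):

```
sudo apt-get install openssh-server
su - hduser
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```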
To be sure that the SSH installation went well, you can open a new terminal and try to create an ssh session as hduser with the following command:
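```
# connects to this machine as hduser; accept the host key if prompted
ssh hduser@localhost
```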
Reinstall ssh if localhost does not connect (you may need to add hduser to sudo, as in the step below).
Edit Sudoers
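Open the sudoers file for editing; visudo is the safe way and uses nano on Ubuntu by default:

```
sudo visudo
```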
Add at the end the line to add hduser into sudoers:
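```
hduser ALL=(ALL:ALL) ALL
```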
To save press CTRL+X, type Y and press ENTER
Disable IPv6
Open /etc/sysctl.conf as root in your editor of choice (e.g. with sudo nano or sudo gedit).
Copy the following lines at the end of the file:
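```
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```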
If you face a problem telling you that you don't have permissions, just run the previous command with the root account (in case sudo is not enough; for me it was the case).
Now reboot.
You can also run sudo sysctl -p, but I would rather reboot. After rebooting, check to make sure IPv6 is off:
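```
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```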
It should say 1. If it says 0, you missed something.
Installing Hadoop
There are several ways of doing this. The one the guide suggests is to download from the Apache Hadoop site and decompress the file in your hduser home folder. Rename the extracted folder to hadoop.
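A sketch of that download-and-rename, assuming the Apache archive URL and a 1.x-era version (both assumptions; pick the release you actually want):

```
cd /home/hduser
wget https://archive.apache.org/dist/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
tar -xzf hadoop-1.0.4.tar.gz
mv hadoop-1.0.4 hadoop
sudo chown -R hduser:hadoop hadoop
```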
The other way is to use a PPA that was tested for 12.04:
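The PPA route looked roughly like this (the PPA name is an assumption based on period guides and may no longer be maintained):

```
sudo add-apt-repository ppa:hadoop-ubuntu/stable
sudo apt-get update
sudo apt-get install hadoop
```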
NOTE: The PPA may work for some and for others it will not. The route I tried was the download from the official site, because I did not know about the PPA.
Update $HOME/.bashrc
You will need to update the .bashrc for hduser (and for every user that needs to administer Hadoop). To open the .bashrc file, you will need to open it as root, using sudo with your editor of choice (e.g. nano or gedit).
Then you will add the following configuration at the end of the .bashrc file. If you have OpenJDK 7, it would look something like this:
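A sketch of those exports, following the Michael Noll guide this answer is based on; the install locations are assumptions, so adjust HADOOP_HOME to wherever you extracted Hadoop:

```
# Hadoop install location (assumed)
export HADOOP_HOME=/home/hduser/hadoop
# OpenJDK 7 location on amd64 (verify this folder exists on your system)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# put the Hadoop binaries on the PATH
export PATH=$PATH:$HADOOP_HOME/bin
```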
The thing to watch out for here is the folder where the AMD64 version of Java resides. If the above does not work, you can try looking in that particular folder or setting the Java that will be used with:
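```
sudo update-alternatives --config java
```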
Now for some helpful aliases:
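The aliases in the Michael Noll guide look like this; that this answer used the same ones is an assumption, and they are purely a convenience:

```
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
```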
Configuring Hadoop
The following are the configuration files we can use to do the proper configuration. Some of the files you will be using with Hadoop are (more information on this site):
start-dfs.sh - Starts the Hadoop DFS daemons, the namenode and datanodes. Use this before start-mapred.sh.
stop-dfs.sh - Stops the Hadoop DFS daemons.
start-mapred.sh - Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
stop-mapred.sh - Stops the Hadoop Map/Reduce daemons.
start-all.sh - Starts all Hadoop daemons: the namenode, datanodes, the jobtracker and tasktrackers. Deprecated; use start-dfs.sh then start-mapred.sh.
stop-all.sh - Stops all Hadoop daemons. Deprecated; use stop-mapred.sh then stop-dfs.sh.
But before we start using them, we need to modify several files in the /conf folder.
hadoop-env.sh
Look for the file hadoop-env.sh; we need to only update the JAVA_HOME variable in this file. Open it as root in your editor of choice. With the tarball install it lives under hadoop/conf; with the latest packaged versions it may live under /etc/hadoop instead.
Then change the following line:
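```
# before (commented out; your JDK folder may differ):
# export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# after (uncommented, pointing at your actual JVM folder):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```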
Note: if you get Error: JAVA_HOME is not set while starting the services, you forgot to uncomment the previous line (just remove the #).
core-site.xml
Now we need to create a temp directory for the Hadoop framework. If you need this environment for testing or a quick prototype (e.g. to develop simple Hadoop programs for your personal tests...), I suggest creating this folder under the /home/hduser/ directory; otherwise, you should create it in a shared place (like /usr/local...), but you may face some security issues. To overcome the exceptions that may be caused by security (like java.io.IOException), I have created the tmp folder under hduser's space. To create this folder, type the following command:
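```
sudo mkdir -p /home/hduser/tmp
sudo chown hduser:hadoop /home/hduser/tmp
```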
Please note that if you want to make another admin user (e.g. hduser2 in the hadoop group), you should grant them read and write permission on this folder using the following commands:
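```
# make the folder group-writable so other members of the hadoop group
# can use it (the exact mode is an assumption)
sudo chmod 775 /home/hduser/tmp
```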
Now we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry. We can open core-site.xml as root in a text editor such as nano or gedit.
Then add the following configuration between the <configuration> XML elements:
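A sketch of the entries, as in the Michael Noll guide (the port is that guide's convention):

```
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.</description>
</property>
```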
Now edit mapred-site.xml:
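Again following that guide's convention for the port:

```
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>Host and port of the MapReduce job tracker.</description>
</property>
```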
Now edit hdfs-site.xml:
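```
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication; 1 is enough for a single node.</description>
</property>
```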
Formatting NameNode
Now you can start working on the node. First, format it:
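```
hadoop namenode -format
# or, with the full path if hadoop is not on your PATH yet (install location assumed):
/home/hduser/hadoop/bin/hadoop namenode -format
```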
You should format the NameNode in your HDFS. You should not do this step when the system is running. It is usually done only once, when you first install.
Starting Hadoop Cluster
You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script. If you have a different version from the one shown in the guides (which you will most likely have if doing this with the PPA or a newer version), then try it this way:
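A sketch, with the install location assumed from the earlier steps:

```
# tarball install:
cd /home/hduser/hadoop/bin && ./start-all.sh
# newer releases ship the start scripts in sbin instead:
cd /home/hduser/hadoop/sbin && ./start-all.sh
```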
This will start a Namenode, Datanode, Jobtracker and a Tasktracker on your machine.
Checking if Hadoop is running
There is a nice tool called jps. You can use it to ensure that all the services are up. In your hadoop bin folder type jps; it should show you all Hadoop-related processes.
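Typical output for a healthy 1.x single-node setup looks something like this (PIDs will differ):

```
$ jps
1788 NameNode
1938 DataNode
2085 SecondaryNameNode
2149 JobTracker
2287 TaskTracker
2349 Jps
```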
NOTE: Since this was done around 6 months ago for me, if there is any part not working let me know.
Hadoop Using Juju (A Juju Charm for Hadoop)
Taken from Charming Hadoop
I will assume the following is already set up:
~/.juju/environments.yaml with the information regarding the server you will be using, including the PPA origin.
OK, now follow these steps to get a Hadoop service running (a sketch of the corresponding commands follows the step list):
Bootstrap the environment for Hadoop
Wait until it finishes, then check to see if it is connecting correctly:
Deploy Hadoop (Master and Slave)
Create Relations
Expose Hadoop (since you already deployed and created the relations, the service should be running)
And check the status to see if it is working correctly:
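A minimal sketch of those Juju steps; the charm and relation endpoint names are assumptions based on the Hadoop charm of that era, so check the charm's documentation:

```
juju bootstrap                        # 1. bootstrap the environment
juju status                           #    check that it is connecting correctly
juju deploy hadoop hadoop-master      # 2. deploy Hadoop master...
juju deploy hadoop hadoop-slave       #    ...and slave
juju add-relation hadoop-master:namenode hadoop-slave:datanode   # 3. relate them
juju expose hadoop-master             # 4. expose the service
juju status                           # 5. confirm everything is running
```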
Up to now you have a running Hadoop. There are many more things you can do; see the link provided or the official Juju charm for Hadoop.
For up-to-date Juju charms (setups, step-by-step guides and more) you can visit JuJu Charms, make your own Juju environment, and see how each file is set up and how each service connects.
I successfully installed Hadoop by setting the path of JAVA_HOME as /usr/lib/jvm/java-6-openjdk-amd64.
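For reference, a minimal sketch of that setting (in .bashrc or conf/hadoop-env.sh); note that it points at the JVM root, not at bin/java:

```
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
```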
Derived from @Luis Alvarado's answer, here is my version for Ubuntu 14.04 and Hadoop 2.5.1.
In brief
Install Java
Prepare an executive user for Hadoop: hduser
Switch to hduser from now on
Allow hduser to remote via ssh with pass-phrase-less
Disable IPv6
Download and config Hadoop package
Prepare system path $HADOOP_HOME and $JAVA_HOME
Config Hadoop's services
Start Hadoop service
Done. Good luck!
Detailed steps
Install Java
Download and install
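Presumably Oracle Java 7 via the webupd8team PPA; the package names are an assumption, though the target path below suggests the Oracle packaging:

```
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
```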
Make sure you have Java 7 installed; we should have java pointing to /usr/lib/jvm/java-7-oracle/jre/bin/java:
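```
java -version
readlink -f "$(which java)"   # should print /usr/lib/jvm/java-7-oracle/jre/bin/java
```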
Prepare an executive user for Hadoop: hduser
Create user hduser in group hadoop:
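```
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
```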
Grant hduser the sudo privilege.
Edit sudo:
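```
sudo visudo
```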
Add this line to the end:
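```
hduser ALL=(ALL:ALL) ALL
```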
Switch to hduser from now on:
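```
su - hduser
```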
Allow hduser to remote via ssh with pass-phrase-less.
Install openssh:
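```
sudo apt-get install openssh-server
```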
Generate an RSA public/private key pair for the SSH connection; the passphrase is left empty via the parameter -P "":
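A sketch; the key path and the authorized_keys step are assumptions consistent with the passwordless-login check that follows:

```
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```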
Make sure hduser can ssh locally without a password:
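```
ssh hduser@localhost
```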
Disable IPv6
Edit the configuration file /etc/sysctl.conf.
Copy the following to the end:
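```
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```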
Make sure IPv6 is off, either by a reboot or by calling sudo sysctl -p. Then call:
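```
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```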
It should say 1, which means OK ^^
Download and config Hadoop package
Download Hadoop 2.5.1 packages from the Apache Hadoop site.
The direct URL for this package is this link.
So let's download it to hduser's home folder, extract it, and rename it to hadoop:
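A sketch, assuming the Apache archive URL for the 2.5.1 tarball:

```
cd ~
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.5.1/hadoop-2.5.1.tar.gz
tar -xzf hadoop-2.5.1.tar.gz
mv hadoop-2.5.1 hadoop
```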
Make sure we have Hadoop stored in hduser's home folder.
Prepare system path $HADOOP_HOME and $JAVA_HOME
Edit hduser's .bashrc file and put to the end the values for $HADOOP_HOME and $JAVA_HOME:
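```
export HADOOP_HOME=/home/hduser/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
```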
Add the Hadoop binary folders to the system $PATH:
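```
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```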
Open a new terminal, log in as hduser, and make sure you have $HADOOP_HOME with its commands available; we should see the full path of those commands:
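A quick check, assuming the exports above:

```
echo $HADOOP_HOME
which start-dfs.sh
which start-yarn.sh
```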
Config Hadoop's services
Each component in Hadoop is configured using an XML file.
Common properties go in core-site.xml
HDFS properties go in hdfs-site.xml
MapReduce properties go in mapred-site.xml
These files are all located in the folder $HADOOP_HOME/etc/hadoop.
Define, again, JAVA_HOME in hadoop-env.sh by editing this line:
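```
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
```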
Define the Hadoop temp folder and file system name in core-site.xml:
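A sketch of the entries; fs.defaultFS is the Hadoop 2.x property name, and the port follows the convention of the guide this answer derives from:

```
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:54310</value>
</property>
```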
We need to prepare this temp folder as configured, at /home/hduser/tmp:
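```
mkdir -p /home/hduser/tmp
chmod 755 /home/hduser/tmp   # permissions are an assumption
```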
Define the file system's block replication in hdfs-site.xml:
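```
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```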
Define the map-reduce job in mapred-site.xml:
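A sketch for Hadoop 2.x; that this setup runs MapReduce on YARN is an assumption here:

```
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```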
Format name node:
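```
hdfs namenode -format
```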
Start Hadoop service
Call
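```
start-dfs.sh
start-yarn.sh
```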
These two commands are located at $HADOOP_HOME/sbin, which we have added to the system $PATH before.
Make sure Hadoop services are started properly
We should see something like:
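Typical jps output for Hadoop 2.x with YARN (PIDs will differ):

```
$ jps
1233 NameNode
1456 DataNode
1678 SecondaryNameNode
1890 ResourceManager
2012 NodeManager
2234 Jps
```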
To be able to install sun-java with the apt-get command, you need to add a line to a file called sources.list. This file can be found in /etc/apt/sources.list. Open the file using this command:
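```
sudo nano /etc/apt/sources.list
```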
Then at the very bottom of that file, copy/paste the line:
Now press Ctrl+X to exit, and Y to save.
Now type the command:
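```
# refresh the package lists to pick up the new source
sudo apt-get update
```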
And when that is done, you can successfully run the command:
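Presumably the Sun Java 6 JDK package of that era; the exact package name was not preserved in this answer, so sun-java6-jdk is an assumption:

```
sudo apt-get install sun-java6-jdk
```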
For a more up-to-date tutorial (not sure on the differences), look at the Hadoop screencasts video tutorials. They provide video and the actual install commands underneath. Also, if you email the writer, he is very happy to respond and help you out if you get stuck with anything.
These instructions are largely similar to the ones that @Luis replied with.