Let's start with the installation.
Prerequisites:
1. Java
2. An Ubuntu 18.04 server with a non-root user with sudo privileges
STEP 1: To get started, we’ll update our package list:
sudo apt-get update
STEP 2: Next, install OpenJDK, the default Java Development Kit on Ubuntu 18.04.
sudo apt-get install default-jdk
Once the installation is complete, let’s check the version.
java -version
STEP 3: Installing Hadoop
With Java in place, we’ll visit the Apache Hadoop Releases page to find the most recent stable release.
http://hadoop.apache.org/releases.html
On the releases page, you'll find the table of Apache Hadoop releases. Click the Binary link in the Tarball column for version 2.8.1; this takes you to a mirror page containing the download link for Hadoop.
STEP 4: Go to Terminal again and write the following command to start downloading Hadoop.
wget http://www-us.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
It will take some time depending upon your internet connection speed.
STEP 5: (OPTIONAL) After downloading, you can verify the file's integrity to confirm it was not corrupted or tampered with.
On the Apache Hadoop releases page you will also find a checksum link for 2.8.1, which leads to the .mds checksum file. Download this file by running the following command on the terminal:
wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz.mds
Then compute the tarball's checksum by running the following command on the terminal:
shasum -a 256 hadoop-2.8.1.tar.gz
You should see output similar to the following:
Output
d489df3808244b906eb38f4d081ba49e50c4603db03efd5e594a1e98b09259c2 hadoop-2.8.1.tar.gz
Compare this value with the SHA-256 value in the .mds file. To display it, run the following command on the terminal:
cat hadoop-2.8.1.tar.gz.mds
hadoop-2.8.1.tar.gz: SHA256 = D489DF38 08244B90 6EB38F4D 081BA49E 50C4603D B03EFD5E 594A1E98 B09259C2
The output of the above command should match the value in the file we downloaded from apache.org.
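The comparison can be automated. Note that the .mds file prints the digest in uppercase, spaced groups of eight characters, while shasum prints it lowercase with no spaces, so a plain string comparison fails until both sides are normalized. A minimal sketch, demonstrated on a small dummy file (sample.txt and its digest line are stand-ins, not the Hadoop tarball):

```shell
# Sketch of automated checksum comparison, using a dummy file in place of the
# real tarball. sha256sum is coreutils' equivalent of `shasum -a 256`.
printf 'hello\n' > sample.txt

# Keep only the hash field of the computed digest.
computed=$(sha256sum sample.txt | awk '{print $1}')

# An .mds-style line for the same content (uppercase, grouped), mimicking the
# format Apache uses in its .mds files.
mds_line="sample.txt: SHA256 = 5891B5B5 22D5DF08 6D0FF0B1 10FBD9D2 1BB4FC71 63AF34D0 8286A2E8 46F6BE03"

# Strip the "file: SHA256 =" prefix, delete spaces, and lowercase.
expected=$(printf '%s' "$mds_line" | cut -d= -f2 | tr -d ' ' | tr 'A-Z' 'a-z')

if [ "$computed" = "$expected" ]; then
    echo "checksum OK"
else
    echo "checksum MISMATCH"
fi
```

For the real check, substitute hadoop-2.8.1.tar.gz for sample.txt and the SHA256 line from the downloaded .mds file for mds_line.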
STEP 6: Now that we've verified that the file wasn't corrupted or changed, we'll use the tar command with the -x flag to extract, -z to uncompress, -v for verbose output, and -f to specify that we're extracting from a file.
tar -xzvf hadoop-2.8.1.tar.gz
Finally, we’ll move the extracted files into /usr/local, the appropriate place for locally installed software.
sudo mv hadoop-2.8.1 /usr/local/hadoop
STEP 7: Configuring JAVA HOME
To find the default Java path, run the following command:
readlink -f /usr/bin/java | sed "s:bin/java::"
Output:
/usr/lib/jvm/java-8-openjdk-amd64/jre/
You can copy this output to set Hadoop’s Java home to this specific version, which ensures that if the default Java changes, this value will not.
To begin, open hadoop-env.sh:
sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
In the file, comment out the default JAVA_HOME line and add the new value below it:
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
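As an aside, the sed expression from Step 7 simply strips the trailing bin/java from the resolved path. A small sketch showing the transformation on a sample path (the path string here is just the example output from Step 7):

```shell
# What `sed "s:bin/java::"` does: delete the trailing "bin/java" from the
# resolved java binary path, leaving the directory JAVA_HOME should point to.
java_bin="/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java"   # sample readlink output
java_home=$(printf '%s' "$java_bin" | sed "s:bin/java::")
echo "$java_home"   # /usr/lib/jvm/java-8-openjdk-amd64/jre/

# Alternatively, hadoop-env.sh can derive the value at startup instead of
# hard-coding it (works only where /usr/bin/java exists):
#   export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
```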
STEP 8: Check whether Hadoop is installed correctly
Run the following command to check:
/usr/local/hadoop/bin/hadoop
If you get output like the following, everything is in the right place:
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME run the class named CLASSNAME
…….
STEP 9: CONFIGURING HADOOP
INSTALLING SSH
To install SSH, run the following command:
sudo apt-get install ssh
Installation will take some time. When asked whether to continue, press Y to proceed.
You may encounter the following error:
Could not get lock /var/lib/dpkg/lock - Unable to lock the directory
If no other package manager process is running, the error can be cleared by running the following commands:
sudo rm /var/lib/dpkg/lock
sudo apt-get update
After running these commands, try installing SSH again.
STEP 10: Generate a public/private RSA key pair by running the following command:
ssh-keygen -t rsa -P ""
When asked 'Enter file in which to save the key', simply press ENTER to accept the default.

STEP 11: Making the generated public key authorized
Authorize the generated public key by appending it to the authorized_keys file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
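Two details are worth adding here: sshd ignores authorized_keys files whose permissions are too open, and appending blindly duplicates the key on reruns. The sketch below operates on a scratch directory so it can be tried safely; for real use, substitute $HOME/.ssh for $demo (the key string is a hypothetical placeholder):

```shell
# Sketch: tighten SSH directory permissions and append the key idempotently.
# Demonstrated on a scratch directory; use $HOME/.ssh in practice.
demo=$(mktemp -d)/.ssh
mkdir -p "$demo"
chmod 700 "$demo"                    # sshd ignores keys in world-accessible dirs

printf 'ssh-rsa AAAAB3...fake user@host\n' > "$demo/id_rsa.pub"  # placeholder key

touch "$demo/authorized_keys"
chmod 600 "$demo/authorized_keys"

# Append only if this exact key line is not already present (idempotent).
grep -qxF "$(cat "$demo/id_rsa.pub")" "$demo/authorized_keys" \
    || cat "$demo/id_rsa.pub" >> "$demo/authorized_keys"
```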
STEP 12: Check that SSH works by running the following command:
ssh localhost
When asked whether to continue connecting, simply type 'yes'.
You will see a login message like the following:
Welcome to Ubuntu 18.04 …..
……..
STEP 13: Edit and Setup Configuration Files
To complete the setup of Hadoop, the following files will have to be modified:
~/.bashrc
/usr/local/hadoop/etc/hadoop/hadoop-env.sh
/usr/local/hadoop/etc/hadoop/core-site.xml
/usr/local/hadoop/etc/hadoop/yarn-site.xml
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
/usr/local/hadoop/etc/hadoop/hdfs-site.xml
STEP 14: Editing ~/.bashrc
Open up ~/.bashrc by using nano editor
sudo nano ~/.bashrc
Go to the end of the file and paste/type the following content in it:
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
Files opened and edited using nano can be saved using Ctrl + X. Upon the prompt to save changes, type Y. If you are asked for a filename, just press the enter key.
After saving and closing the .bashrc file, execute the following command so that your system recognizes the newly created environment variables:
source ~/.bashrc
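A quick sanity check confirms the new variables took effect. The sketch below sets the two key variables directly so it is self-contained; after `source ~/.bashrc`, the same checks apply to your shell:

```shell
# Sanity check: the Hadoop variables should be set and PATH should include
# Hadoop's bin directory. Set directly here so the sketch is self-contained.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

[ -n "$HADOOP_INSTALL" ] && echo "HADOOP_INSTALL = $HADOOP_INSTALL"
case ":$PATH:" in
    *":$HADOOP_INSTALL/bin:"*) echo "Hadoop bin directory is on PATH" ;;
    *)                         echo "Hadoop bin directory is NOT on PATH" ;;
esac
```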
STEP 15: Editing /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Already configured in Step 7.
STEP 16: Editing /usr/local/hadoop/etc/hadoop/core-site.xml
The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up. This file can be used to override the default settings that Hadoop starts with.
Open this file with nano using the following command:
sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
Save and close this file.
STEP 17: Editing /usr/local/hadoop/etc/hadoop/yarn-site.xml
The /usr/local/hadoop/etc/hadoop/yarn-site.xml file contains configuration properties that MapReduce uses when starting up. This file can be used to override the default settings that MapReduce starts with.
Open this file with nano using the following command:
sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save and close this file.
STEP 18: Creating and Editing /usr/local/hadoop/etc/hadoop/mapred-site.xml
By default, the /usr/local/hadoop/etc/hadoop/ folder contains a mapred-site.xml.template file, which must be copied to mapred-site.xml. This file specifies which framework is used for MapReduce.
This can be done using the following command:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
(Note: this is a single command; it may appear wrapped across two lines.)
Once this is done, open the newly created file with nano using the following command:
sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.
</description>
</property>
Save and close this file.
STEP 19: Editing /usr/local/hadoop/etc/hadoop/hdfs-site.xml
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml has to be configured for each host in the cluster that is being used. It is used to specify the directories which will be used as the namenode and the datanode on that host.
Before editing this file, we need to create two directories which will contain the namenode and the datanode for this Hadoop installation.
This can be done using the following commands (sudo is needed because /usr/local is owned by root, and chown then gives your user write access so the HDFS daemons can use the directories):
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R $USER:$USER /usr/local/hadoop_store
Once this is done, open the /usr/local/hadoop/etc/hadoop/hdfs-site.xml file with nano using the following command:
sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
In this file, enter the following content in between the <configuration></configuration> tag:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
Save and close this file.
STEP 20: Format the New Hadoop Filesystem
After completing all the configuration outlined in the above steps, the Hadoop filesystem needs to be formatted so that it can start being used.
This is done by executing the following command:
hdfs namenode -format
STEP 21: (FINAL STEP) STARTING HADOOP
All that remains to be done is starting the newly installed single node cluster:
start-dfs.sh
While executing this command, you’ll be prompted twice with a message similar to the following:
Are you sure you want to continue connecting (yes/no)?
Type in yes for both these prompts and press the enter key. Once this is done, execute the following command:
start-yarn.sh
Executing the above two commands will get Hadoop up and running. You can verify this by typing in the following command:
jps
If you see process IDs listed alongside daemon names such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, you now have a functional single-node Hadoop instance running.
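The check can be scripted. The jps output below is a hypothetical sample (the PIDs are made up); in practice, capture the real output with sample_jps_output=$(jps):

```shell
# Sketch: verify the expected daemons appear in jps-style output. The sample
# output (and its PIDs) is hypothetical; use sample_jps_output=$(jps) for real.
sample_jps_output='4868 NameNode
5014 DataNode
5244 SecondaryNameNode
5459 ResourceManager
5613 NodeManager
5901 Jps'

missing=0
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
    if printf '%s\n' "$sample_jps_output" | grep -qw "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: NOT FOUND"
        missing=1
    fi
done
[ "$missing" -eq 0 ] && echo "all expected Hadoop daemons are up"
```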
REFERENCES:
https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10
http://hadoop.apache.org/releases.html