Creating and adding a DataNode with multiple volumes

In this example I am adding a new DataNode with three volumes of 200 GB each.

The DataNode instance and the three volumes are created through the cloud WebUI. Each volume is attached to a device in the following order:

volume01 – /dev/vdb
volume02 – /dev/vdc
volume03 – /dev/vdd

After the new “soon-to-be” DataNode instance has been created and the volumes have been attached, the remaining work is done on the command line:

Use ssh to connect to the new DataNode instance.
```
ssh -i .ssh/key w-datanode04
```

Update and upgrade the system.

sudo apt-get update -y && sudo apt-get upgrade -y

Create the directories where the DataNode will store the data for each volume.
```
sudo mkdir -p /data/vol1 /data/vol2 /data/vol3
```

Create an ext4 file system on each device that a volume is attached to. This erases anything already stored on the device.

sudo mkfs.ext4 /dev/vdb
sudo mkfs.ext4 /dev/vdc
sudo mkfs.ext4 /dev/vdd

Mount each volume on its directory.

sudo mount /dev/vdb /data/vol1
sudo mount /dev/vdc /data/vol2
sudo mount /dev/vdd /data/vol3

Label the volumes for easier future work.

sudo e2label /dev/vdb "vol1"
sudo e2label /dev/vdc "vol2"
sudo e2label /dev/vdd "vol3"
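
The directory, format, mount and label steps above follow the same pattern for every volume, so they can be generated in a loop. The following is only a sketch: `prepare_volume_cmds` is a hypothetical helper name, and it assumes the devices are passed in the same order as the volumes (vol1, vol2, vol3). It prints the commands instead of running them, so nothing is formatted by accident.

```shell
# Hypothetical helper: prints the mkdir/mkfs/mount/e2label commands
# for each device given as an argument. Nothing is executed here.
prepare_volume_cmds() {
    i=1
    for dev in "$@"; do
        dir="/data/vol${i}"
        echo "sudo mkdir -p ${dir}"
        echo "sudo mkfs.ext4 ${dev}"      # destroys existing data on the device
        echo "sudo mount ${dev} ${dir}"
        echo "sudo e2label ${dev} vol${i}"
        i=$((i + 1))
    done
}

# Dry run for the three devices used in this post.
prepare_volume_cmds /dev/vdb /dev/vdc /dev/vdd
```

After reviewing the printed commands, they can be run by piping the output to `sh`.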

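Open /etc/fstab and add one entry per volume.
This keeps the volumes mounted on their directories after the DataNode is restarted. Note that the nobootwait option is specific to older Ubuntu releases that boot with mountall; on systemd-based releases, nofail is the usual replacement.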

LABEL=vol1 /data/vol1 ext4 defaults,nobootwait 0 0
LABEL=vol2 /data/vol2 ext4 defaults,nobootwait 0 0
LABEL=vol3 /data/vol3 ext4 defaults,nobootwait 0 0
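
With more volumes, typing the entries by hand becomes error-prone. As a rough sketch, the lines can be generated instead. `fstab_entries` is a hypothetical helper, and it assumes the labels and directories follow the vol1, vol2, ... naming used here. The mount options are a parameter so they can be adapted to the system.

```shell
# Hypothetical helper: prints one fstab line per volume.
#   $1 = mount options, $2 = number of volumes
fstab_entries() {
    opts="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'LABEL=vol%d /data/vol%d ext4 %s 0 0\n' "$i" "$i" "$opts"
        i=$((i + 1))
    done
}

# Prints the three entries used in this post; review before adding to /etc/fstab.
fstab_entries "defaults,nobootwait" 3
```

Once the lines are in /etc/fstab, running `sudo mount -a` before rebooting is a quick way to confirm that the entries are accepted.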

Check that the volumes are mounted on the correct directories.
```
df -h
```
Something like this should appear:

/dev/vdb 197G 241M 187G 1% /data/vol1
/dev/vdc 197G 299M 187G 1% /data/vol2
/dev/vdd 197G 65M 187G 1% /data/vol3
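
Instead of reading the `df` output by eye, the check can be scripted. This is a minimal sketch, assuming the mount point is the last column of each `df` line. `check_mounts` is a hypothetical helper, and the sample text is the output shown above rather than live data.

```shell
# Hypothetical helper: reads `df` output on stdin and reports every
# expected directory that does not appear as a mount point (last column).
check_mounts() {
    df_out=$(cat)
    missing=0
    for dir in "$@"; do
        if ! printf '%s\n' "$df_out" |
            awk -v d="$dir" '$NF == d { found = 1 } END { exit !found }'; then
            echo "not mounted: $dir"
            missing=1
        fi
    done
    return "$missing"
}

# Sample taken from this post; on a real node use: df -h | check_mounts ...
sample='/dev/vdb 197G 241M 187G 1% /data/vol1
/dev/vdc 197G 299M 187G 1% /data/vol2
/dev/vdd 197G 65M 187G 1% /data/vol3'

printf '%s\n' "$sample" | check_mounts /data/vol1 /data/vol2 /data/vol3 &&
    echo "all volumes mounted"
```
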
For future reference, you can check the size of all mounted directories under /data.
```
sudo du -hs /data/vol*
```
The output should look similar to the example below.
The disk usage in this example was taken after some files had been written to HDFS. Immediately after the DataNode is added to the Hadoop cluster, it holds no file blocks.

181M /data/vol1
240M /data/vol2
5.4M /data/vol3

Now the DataNode with multiple volumes is ready to be added to the cluster.

It is important to change the property dfs.datanode.data.dir in hdfs-site.xml (hdfs-default.xml only holds the shipped default values and should not be edited). If you are using Ambari, go to HDFS -> Configs -> Settings; on the right side, the first property under DataNode is “DataNode directories”.

Note: if you are adding new DataNodes with new DataNode directories, first append the new directories to the existing ones (comma-separated, no spaces), and remove the old directories only after the DataNodes have been added.
If a directory listed in this property does not exist on a DataNode, HDFS will ignore it and will not fail.
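
Because the value must be comma-separated with no spaces, it can help to build it from a list of directories. The sketch below does that and prints the property element as it would appear in the HDFS configuration file. `join_dirs` is a hypothetical helper, and `/hadoop/hdfs/data` is only a placeholder for an existing (old) directory, not a path taken from this setup.

```shell
# Hypothetical helper: joins its arguments with commas and no spaces.
join_dirs() {
    # The subshell keeps the IFS change local to this function.
    ( IFS=,; printf '%s\n' "$*" )
}

# Placeholder old directory first, then the new volume directories.
value=$(join_dirs /hadoop/hdfs/data /data/vol1 /data/vol2 /data/vol3)

# Print the property element for review.
printf '<property>\n  <name>dfs.datanode.data.dir</name>\n  <value>%s</value>\n</property>\n' "$value"
```

When using Ambari, only the joined value is needed; it goes into the “DataNode directories” field.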

How to add a DataNode to a cluster with Ambari is described here.

markobigdata

Big Data documentation in a blog
