In this example I am adding a new DataNode with three volumes, 200 GB each.
The DataNode is created through the WebUI in the cloud, and so are the three volumes. Each volume is attached to a device in the following order:
volume01 – /dev/vdb
volume02 – /dev/vdc
volume03 – /dev/vdd
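Once you are logged in to the instance (the first step below), you can confirm that the volumes really show up under these device names. A quick check, assuming the three 200 GB volumes are the only extra disks attached:
lsblk -o NAME,SIZE,MOUNTPOINT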
After the new “soon-to-be” DataNode instance has been created and the volumes attached, there is some work to be done on the command line:
- Use ssh to connect to the new DataNode instance.
ssh -i .ssh/key w-datanode04
- Update and upgrade the system.
sudo apt-get update -y && sudo apt-get upgrade -y
- Create the directories where the data for each of the DataNode's volumes will be stored.
sudo mkdir -p /data/vol1 /data/vol2 /data/vol3
- Format a file system on every device attached to a volume.
sudo mkfs.ext4 /dev/vdb
sudo mkfs.ext4 /dev/vdc
sudo mkfs.ext4 /dev/vdd
- Mount each volume to its respective directory.
sudo mount /dev/vdb /data/vol1
sudo mount /dev/vdc /data/vol2
sudo mount /dev/vdd /data/vol3
- Label the volumes for easier future work.
sudo e2label /dev/vdb "vol1"
sudo e2label /dev/vdc "vol2"
sudo e2label /dev/vdd "vol3"
- Open and update /etc/fstab.
This is smart to do so that the volumes stay mounted to their directories after the DataNode is restarted. Add the following entries (the labels and entries can be verified without a reboot; see the check after this list):
LABEL=vol1 /data/vol1 ext4 defaults,nobootwait 0 0
LABEL=vol2 /data/vol2 ext4 defaults,nobootwait 0 0
LABEL=vol3 /data/vol3 ext4 defaults,nobootwait 0 0
- Check that the volumes are mounted to the correct directories.
df -h
Something like this should appear:
/dev/vdb 197G 241M 187G 1% /data/vol1
/dev/vdc 197G 299M 187G 1% /data/vol2
/dev/vdd 197G 65M 187G 1% /data/vol3
- For future reference, you can check the size of all mounted folders under the /data directory.
sudo du -hs /data/vol*
Something similar to this should be in the output.
The used-disk figures in the example below show data after some files have already been written to HDFS. Immediately after the DataNode is added to the Hadoop cluster, it holds no file blocks.
181M /data/vol1
240M /data/vol2
5.4M /data/vol3
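As a quick sanity check before moving on, the labels and the /etc/fstab entries from the steps above can be verified without rebooting. A minimal sketch: blkid prints the label written by e2label for each device, and mount -a processes the entries in /etc/fstab, so a typo in an entry shows up as an error now rather than at the next boot.
sudo blkid /dev/vdb /dev/vdc /dev/vdd
sudo mount -a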
Now the DataNode with multiple volumes is ready to be added to the cluster.
It is important to change the property dfs.datanode.data.dir in hdfs-site.xml. Or, if you are using Ambari: HDFS -> Configs -> Settings; on the right side, the first property under DataNode is “DataNode directories”.
Note: if you are adding new DataNodes with new DataNode directories, it is smart to first append the new directories to the existing ones (comma separated, no spaces) and, once the DataNodes have been added, remove the old directories.
If there is a directory in this property that does not exist, HDFS will ignore it and will not fail.
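As an illustration, the resulting property in hdfs-site.xml could look like the sketch below for this example. The path /hadoop/hdfs/data is only a placeholder for whatever the existing DataNode directory on the other nodes happens to be:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/data/vol1,/data/vol2,/data/vol3</value>
</property>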
How to add a DataNode to a cluster with Ambari is described here.
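Once the new node is up, one way to confirm that the NameNode sees the DataNode and its roughly 600 GB of capacity is the dfsadmin report, run as the hdfs user (a sketch, assuming a standard installation where the hdfs superuser exists):
sudo -u hdfs hdfs dfsadmin -report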