Adding new DataNode to the cluster using Ambari

I am going to add one DataNode to my existing cluster. This is going to be done in Ambari. My Hadoop distribution is Hortonworks.

Work on the node

Adding a new node to the cluster affects all the existing nodes: they should know about the new node, and the new node should know about the existing nodes. In this case, I am using /etc/hosts to keep the nodes “acquainted” with each other.

My only source of truth for /etc/hosts is on the Ambari server. From there I run scripts that update the /etc/hosts file on the other nodes.
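A minimal sketch of such an update script, written as a loop over /etc/hosts rather than one line per node. The function names list_nodes and push_hosts are made up for illustration; the key path and the ubuntu user are the ones used in the commands in this post. It assumes that the last name on each line is the one reachable over SSH and that loopback and IPv6 helper entries should be skipped.

```shell
#!/usr/bin/env bash
# Sketch only: push the master /etc/hosts to every node listed in it.
# HOSTS_FILE, SSH_KEY and SSH_USER are illustrative defaults, adjust as needed.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"
SSH_KEY="${SSH_KEY:-/home/ubuntu/.ssh/key}"
SSH_USER="${SSH_USER:-ubuntu}"

# Print the last name of every entry that is not a comment,
# a loopback address or an IPv6 helper entry (fe00::, ff02:: ...).
list_nodes() {
  awk '{ sub(/#.*/, "") }
       NF >= 2 && $1 !~ /^(127\.|::1|fe|ff)/ { print $NF }' "$1"
}

# Copy the local hosts file to each node, skipping the machine we run on.
push_hosts() {
  local node
  for node in $(list_nodes "$HOSTS_FILE"); do
    [ "$node" = "$(hostname -s)" ] && continue
    ssh -i "$SSH_KEY" "$SSH_USER@$node" "sudo sh -c 'cat > /etc/hosts'" < "$HOSTS_FILE" \
      || echo "update failed on $node" >&2
  done
}
```

After sourcing the file on the Ambari server, running push_hosts updates all nodes in one go; list_nodes /etc/hosts can be run first to see which nodes would be touched.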

  1. Open the file on the Ambari server.
    sudo vi /etc/hosts
  2. Add a new line to it and save the file. In Ubuntu, this takes immediate effect.

    10.0.XXX.XX     t-datanode02.domain       t-datanode02

  3. Running the script to update the cluster.
    For now, I have one line per node in the script, as shown below. It is on my to-do list to create a loop that reads the original /etc/hosts and updates the cluster.
    So the following line is added to the existing lines in the script.

    cat /etc/hosts | ssh ubuntu@t-datanode02 -i /home/ubuntu/.ssh/key "sudo sh -c 'cat > /etc/hosts'";
  4. Updating the system on the new node
    I tend to run this from the Ambari server. If multiple nodes are added, I run a script.

    ssh -i /home/ubuntu/.ssh/key ubuntu@t-datanode02 'sudo apt-get update -y && sudo apt-get upgrade -y'
  5. Adjusting the maximum number of open files and processes.
    Since we are adding a DataNode, the number of open files and processes has to be adjusted.
    Open the limits.conf file on the node.

    sudo vi /etc/security/limits.conf
  6. Add the following two lines at the end of the file

    *                -       nofile          32768
    *                -       nproc           65536

  7. Save the file, exit the CLI and log in again.
  8. The changes can be seen by typing the following command.
    ulimit -a

    Output is the following:

    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 257202
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 32768
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 65536
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
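Steps 5 and 6 can also be scripted. The sketch below appends an entry only if it is missing, so running it twice does not duplicate lines. ensure_limit is a hypothetical helper name, not something shipped with Ubuntu or Ambari, and writing to /etc/security/limits.conf requires root.

```shell
#!/usr/bin/env bash
# Sketch only: add a "* - <item> <value>" entry to a limits file if it is missing.
# ensure_limit is an illustrative name; the values are the ones from step 6.
ensure_limit() {
  local file="$1" item="$2" value="$3"
  # An uncommented wildcard entry for this item already exists: nothing to do.
  if grep -Eq "^\*[[:space:]]+-[[:space:]]+${item}[[:space:]]" "$file"; then
    return 0
  fi
  printf '*\t-\t%s\t%s\n' "$item" "$value" >> "$file"
}

# Example (as root on the new node):
#   ensure_limit /etc/security/limits.conf nofile 32768
#   ensure_limit /etc/security/limits.conf nproc  65536
```

As in step 7, a new login is still needed before ulimit -a shows the new values.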

Work from Ambari

  1. Log in to Ambari, click on Hosts and choose Add New Hosts from the Actions menu.
    [Screenshot: ambari-add-new-host]
  2. In the Install Options step, add the node that is soon to become a DataNode.
    Hortonworks warns against using anything other than FQDNs as Target Hosts!

    If multiple nodes are added in this step, they can be written one per line. If there is a numerical pattern in the names of the nodes, Pattern Expressions can be used.
    Example nodes:
    datanode01
    datanode02
    datanode03
    Writing this in one line with Pattern Expressions:
    datanode[01-03]
    Worry not, Ambari will ask you to confirm the host names if you have used Pattern Expressions:
    [Screenshot: ambari-pattern-expression-example]
    (This is a screenshot from one of my earlier cluster installations.)
    A private key has to be provided. The SSH User Account is root by default, but that will not work here. In my case, I am using Ubuntu, so the user is ubuntu.
    [Screenshot: ambari-new-host-install-options]
    Now I can click Register and Confirm.
  3. In the Confirm Hosts step, the Ambari server connects to the new node over SSH, registers the new node with the cluster and installs the Ambari Agent in order to keep control over it.
    Registering phase:
    [Screenshot: ambari-new-host-registering-status]
    New node has been registered successfully:
    [Screenshot: ambari-new-host-success-status]
    If anything other than the success message is shown, click on the link to check the results. The list of performed checks is shown, and everything should be in order before continuing (earlier versions had a problem if ntpd or snappy was not installed/started, for example).
    [Screenshot: ambari-new-host-check-passed]
    All good in the hood here so I can continue with the installation.
  4. In the Assign Slaves and Clients step, I define my node to be a DataNode with a NodeManager installed as well (if you are running Apache Storm, Supervisor is also an option).
    [Screenshot: ambari-new-host-assign-slaves-clients]
    Click Next.
  5. In the Configurations step, there is not much to do unless you operate with more than one Configuration Group.
    [Screenshot: ambari-new-host-configurations]
    Click Next.
  6. In the Review step, double-check that everything is as planned.
    Click Deploy if everything is as it should be.
  7. Install, Start and Test is the last step of the installation. After everything is installed, the new DataNode has joined the cluster. Here is what this should look like:
    [Screenshot: ambari-new-host-install-success]
    Click Next.
  8. The final step, Summary, gives a status update.
    [Screenshot: ambari-new-host-summary]
    Click on Complete and the list of installed Hosts will load.
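To see what a Pattern Expression such as datanode[01-03] stands for, here is a small shell sketch that expands one numeric range into the host names written out by hand in step 2. This is only an illustration, not Ambari's own code; expand_pattern is a made-up name, and it handles a single [first-last] range with the zero padding taken from the first number.

```shell
#!/usr/bin/env bash
# Sketch only: expand "prefix[first-last]suffix" into one host name per line.
# expand_pattern is an illustrative helper, not part of Ambari.
expand_pattern() {
  local pattern="$1" prefix range suffix first last width n
  case "$pattern" in
    *\[*-*\]*) ;;                               # contains a [first-last] range
    *) printf '%s\n' "$pattern"; return 0 ;;    # plain host name, print as is
  esac
  prefix="${pattern%%\[*}"     # text before "["
  range="${pattern#*\[}"       # "first-last]suffix"
  suffix="${range#*\]}"        # text after "]"
  range="${range%%\]*}"        # "first-last"
  first="${range%-*}"
  last="${range#*-}"
  width="${#first}"            # keep the zero padding of the first number
  for n in $(seq "$first" "$last"); do
    printf "%s%0${width}d%s\n" "$prefix" "$n" "$suffix"
  done
}

expand_pattern 'datanode[01-03]'
# prints datanode01, datanode02 and datanode03, one per line
```

Running it against the intended pattern before pasting it into Target Hosts is a quick way to check that the names match what is in /etc/hosts.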

14 thoughts on “Adding new DataNode to the cluster using Ambari”

  1. I'm stuck on step 3. How do I run the script, and what script do I need to run?
    Thank you! 🙂

    1. Hi! The script I mention there is a custom script for updating /etc/hosts on all nodes. You can also update the file manually, without the script, by adding the new line to /etc/hosts on all nodes.

      1. Thank you for your response!

        I'm still stuck… So after adding the line with the IP and the name of my new node, what do I need to do?
        Execute the line cat /etc/hosts | ssh ubuntu@t-datanode02 -i /home/ubuntu/.ssh/key "sudo sh -c 'cat > /etc/hosts'"; ??

  3. Sorry for all the questions, but I'm new to this, and what I thought would be easy is becoming a nightmare.
    Executing the command line ssh -i /home/ubuntu/.ssh/key ubuntu@t-datanode02 'sudo apt-get update -y && sudo apt-get upgrade -y' gives me the error ssh: connect to host sandbox1-hdp.hortonworks.com port 22: connection refused

    😦

      1. I would recommend using the cloud instead. In AWS you can create an HDP cluster for administration with free instances. You will learn so much more. I would also consider infrastructure-as-code for building the HDP cluster. You can find me on LinkedIn for more details; I can share more on these topics there.

      2. You will still use Ambari. The only difference between the sandbox and the cloud is that in the cloud you can create a multinode cluster for a better understanding of distributed systems. The admin tool would still be Ambari. I have some posts on my blog on how to use Ambari to create a multinode cluster.
