Configuring Ranger Plugins in Ambari

In the previous post, I described how to install Ranger in Ambari on HDP.


Ranger can be configured so that both Ranger policies and HDFS permissions are checked for a user request. When the NameNode receives a user request, the Ranger plugin first checks the policies set through Ranger admin. If there are no policies for the request, the plugin checks the permissions set in HDFS.

It is recommended to keep permissions restrictive at the HDFS level and to grant access through policies in Ranger security admin.

The HDFS plugin is configured in two places: the HDFS service and the Ranger service.

HDFS service

Select HDFS service from the Services menu.

Open Advanced ranger-hdfs-plugin-properties and check the Enable Ranger for HDFS checkbox.

Change the following property by replacing NAMENODE_HOSTNAME with the RANGER_HOST.


If you are using an older HDP version, check Audit to DB.

audit to db

Change HDFS umask from 022 to 077.

umask 077
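
To illustrate what the umask change means in practice, here is a minimal sketch. It assumes the usual HDFS behaviour of applying the umask to a base mode of 777 for directories and 666 for files. The function name effective_mode is only an illustration and not part of any Hadoop tool.

```shell
# Effective mode = base mode AND NOT umask (all values are octal).
# Assumption: HDFS starts from 777 for directories and 666 for files.
effective_mode() {
    base=$1
    mask=$2
    printf '%03o\n' $(( 0$base & ~0$mask ))
}

# With umask 022 a new directory is readable by everyone,
# with umask 077 only the owner keeps access.
effective_mode 777 022
effective_mode 777 077
effective_mode 666 077
```

With umask 077, anything a user creates is private by default, so access for other users has to come from Ranger policies.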

Save the properties and restart the service.

The following message appears. Click OK to restart HDFS.

dependent configurations

Ranger service

In Ranger, under the Config tab, switch on the HDFS Ranger Plugin.



Change the audit source type from default solr to db.

audit source type

Save and restart Ranger service.


Coming soon…

Adding and configuring service Ranger in Ambari

Ranger is a framework to enable, monitor and manage data security in a Hadoop cluster. The service comes from Hortonworks and is now part of the Apache family.

This post describes how Ranger 0.5.0 is installed and configured with audit data stored in a database. The default setting is Solr. My cluster does not have Solr, but it has a MySql database.

My Hadoop distribution is Hortonworks and versions mentioned in this post are 2.3.4 and 2.5.


Database preparation

Install MySql

(If not installed yet)

sudo apt-get install mysql-server -y

Set up Ranger database

Note for HDP 2.3.4!
The Ranger database has to be created manually, otherwise the installation will not go through. If you are using HDP 2.5, this is done through the Ambari Add Service Wizard, so move on to “Adding Service in Ambari”.

create database ranger;
CREATE USER 'ranger'@'localhost' IDENTIFIED BY 'ranger';
GRANT ALL PRIVILEGES ON *.* TO 'ranger'@'localhost';
CREATE USER 'ranger'@'%' IDENTIFIED BY 'ranger';
GRANT ALL PRIVILEGES ON *.* TO 'ranger'@'%';

If the MySql database is on a different server than Ranger, check from RANGER_SERVER that you can log in to the database:

mysql -u ranger -pranger -h MYSQL_SERVER

Adding Service in Ambari

Start Add Service Wizard and choose service Ranger

Add service

Some requirements have to be fulfilled.

Ranger Requirements

Check if MySql Java Connector is present on Ambari Server

ls /usr/share/java/mysql-connector-java.jar

Run the following on Ambari Server if the file is present

sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar


Using python  /usr/bin/python
Setup ambari-server
Copying /usr/share/java/mysql-connector-java.jar to /var/lib/ambari-server/resources
If you are updating existing jdbc driver jar for mysql with mysql-connector-java.jar. Please remove the old driver jar, from all hosts. Restarting services that need the driver, will automatically copy the new jar to the hosts.
JDBC driver was successfully initialized.
Ambari Server 'setup' completed successfully.

Assign masters for both Ranger services. In this case, the services are installed on the NameNode.

Assign masters

Choose the DB flavor, type in the Ranger DB host and the Ranger password (same as in the script from the previous chapter).

Wizard - Ranger Admin

Type password for root user and test the connection.

Wizard root password

If the MySql database is on another server, a root user for connections from the Ranger server has to be created and given the necessary grants.


In the Audit tab:
– switch off Audit to Solr
– switch on Audit to HDFS
– switch on Audit to DB and type in password for Ranger Audit user. (HDP 2.3.4)

HDP 2.5: Audit to DB is not an option anymore.

Wizard - audit storage

Ranger is now installed and can be accessed on the RANGER_SERVER:6080.

Note: Is the Ranger web UI not showing up?
Make sure port 6080 is open.

If the URL is an internal IP address, read on:
The External URL has to be changed to the Ranger host. Authentication in this example is UNIX.

Wizard - ranger url only 2-3-4

Continue to the next step.

A review of the installation follows. If everything is OK, continue with Install, Start and Test.

Upgrading Hortonworks Data Platform from 2.3.4 to 2.4.0

This post describes how to do an Express Upgrade of Hortonworks Data Platform (HDP) with Ambari.

Upgrading HDP begins with upgrading Ambari and Ambari Metrics and, not mandatory but recommended, adding Grafana.

When this is in place and all services are up and running, upgrading HDP to 2.4 can begin.


File backup

Creating a backup of all the important files and databases is the first step. The following steps are done on the NameNode.

Create backup directory

mkdir /home/ubuntu/HDP-2.3.4-backup

Run an HDFS filesystem check and save the output to a file in the backup directory

sudo -u hdfs hdfs fsck / -files -blocks -locations > /home/ubuntu/HDP-2.3.4-backup/dfs-old-fsck-1.log

Gather basic filesystem information and statistics in a report

sudo -u hdfs hdfs dfsadmin -report > /home/ubuntu/HDP-2.3.4-backup/dfs-old-report-1.log

List the whole HDFS directory tree, starting at the root, and save the output to a file

sudo -u hdfs hdfs dfs -ls -R / > /home/ubuntu/HDP-2.3.4-backup/dfs-old-lsr-1.log

Enter Safemode, mandatory for next steps

sudo -u hdfs hdfs dfsadmin -safemode enter

Save current namespace and reset edits log

sudo -u hdfs hdfs dfsadmin -saveNamespace

Make a copy of the VERSION file (the path below is HDP’s default directory; the VERSION file should reside in ${}/current)

sudo cp /hadoop/hdfs/namenode/current/VERSION /home/ubuntu/HDP-2.3.4-backup/

Leave Safemode

sudo -u hdfs hdfs dfsadmin -safemode leave

Finalize the upgrade of HDFS (this closes any earlier upgrade that was never finalized)
According to the Apache Hadoop documentation:

“Datanodes delete their previous version working directories, followed by Namenode doing the same. This completes the upgrade process.”

sudo -u hdfs hdfs dfsadmin -finalizeUpgrade

Database backup

My cluster has MySql database that is used by Hive and Ranger. That means I have 3 databases to back up: hive, ranger and ranger_audit (since I am storing audit data in a database).


DAT=`date +%Y%m%d_%H%M%S`
mysqldump -u root -proot hive > /home/ubuntu/HDP-2.3.4-backup/hive_$DAT.sql

This is done beforehand so that you can tick the confirmation checkbox and move on with the upgrade.

Hive upgrade warning


This is done beforehand so that you can tick the confirmation checkbox and move on with the upgrade.

Ranger Admin warning


DAT=`date +%Y%m%d_%H%M%S`
mysqldump -u root -proot ranger > /home/ubuntu/HDP-2.3.4-backup/ranger_$DAT.sql


DAT=`date +%Y%m%d_%H%M%S`
mysqldump -u root -proot ranger_audit > /home/ubuntu/HDP-2.3.4-backup/ranger_audit_$DAT.sql


Content of backup folder

├── dfs-old-fsck-1.log
├── dfs-old-lsr-1.log
├── dfs-old-report-1.log
├── hive_20160804_074811.sql
├── ranger_20160804_074907.sql
├── ranger_audit_20160804_074914.sql
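
The three dumps above differ only in the database name, so they can be generated in a loop that shares one timestamp. The following is a sketch only: print_backup_commands is a hypothetical helper that prints the commands instead of running them, and the root/root credentials simply mirror the ones used above.

```shell
# Print (do not run) one mysqldump command per database.
# All dumps share the same timestamp, which makes them easy to match later.
# Arguments: backup directory, timestamp, then the database names.
print_backup_commands() {
    backup_dir=$1
    stamp=$2
    shift 2
    for db in "$@"; do
        echo "mysqldump -u root -proot $db > $backup_dir/${db}_$stamp.sql"
    done
}

# Example call for the three databases from this post:
# print_backup_commands /home/ubuntu/HDP-2.3.4-backup "$(date +%Y%m%d_%H%M%S)" hive ranger ranger_audit
```

Printing first and running afterwards makes it easy to review the target paths before anything is written.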


Adding Hive, Tez & Pig in Ambari

I have 4 Hadoop environments, all running the Hortonworks distribution in version 2.3.4 or 2.4. I installed HDFS, MapReduce and YARN first, and now I need to add Hive.

When you install Hive, Pig and Tez come along with it, whether you want them or not.

I already have an existing MySql database (because of Ranger) and this post describes how to install Hive and use an existing MySql for metastore. Installing Hive with a new MySql is actually easier.

  1. On Ambari server, from the CLI, run the following
    sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar


    Using python  /usr/bin/python
    Setup ambari-server
    Copying /usr/share/java/mysql-connector-java.jar to /var/lib/ambari-server/resources
    JDBC driver was successfully initialized.
    Ambari Server 'setup' completed successfully.

  2. Log in to Ambari as administrator
  3. From the Actions drop down menu on the left side of the screen, click Add Service
    flume-add service
  4. Choose services
    Check services Tez, Hive and Pig. If you pick only Hive, the installation wizard will remind you that you have to set up Tez and Pig packages as well.
    choose services
  5. Assign masters
    In this case, I am installing Hive on my namenode. This can always be changed – it is possible to move services to other instances (why do you think my namenode is called md-namenode2? ;))
    assign masters
  6. Assign Slaves and Clients
    Tez Client, HCat Client, Hive Client and Pig Client are going to be installed on the selected host(s).
    In this case I am installing it on the same server as Hive server, on “more serious” clusters I install the clients where they belong – the client server.
    assign slaves
  7. Customize Services
    On the MySql Server used for Hive metastore run the following commands as root

    CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
    CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';


  8. Set up connection string to the metastore
    Choose “Existing MySQL Database”
    hive metastore setup

    Note: If there is a problem connecting to the database when testing the connection, also check in my.cnf on the MySql server whether the following property is uncommented:

    bind-address           =

    Comment it out (put # in front of the line), since we are connecting to the server from hosts other than localhost.

  9. Review
    If the installation details are acceptable, proceed with the installation.
  10. When the installation is complete, the installed services are available
    service available
    Do not forget to restart the services if Ambari suggests so!

Error during installation

resource_management.core.exceptions.Fail: Applying Directory['/usr/hdp/'] failed, looped symbolic links found while resolving /etc/tez/conf

The solution is to run the following on the Hive server (md-namenode2 in this example):

unlink /etc/tez/conf

Ambari Upgrade 2: Install Grafana

How Ambari is upgraded is described in Ambari Upgrade 1. The upgrade is not complete at this stage yet; a lot more is offered from the visualization perspective. Grafana is offered from Ambari 2.2.2 as a component of Ambari Metrics.

Grafana is a visualization tool for time series data.

Hortonworks’ documentation on this can be obtained here.

Install Grafana

  1. Add the METRICS_GRAFANA component to Ambari
    curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST http://ambari-server:8080/api/v1/clusters/cluster_name/services/AMBARI_METRICS/components/METRICS_GRAFANA

    If the command was a success the message HTTP/1.1 201 Created should appear.

  2. Add METRICS_GRAFANA to a host
    curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST -d '{"host_components":[{"HostRoles":{"component_name":"METRICS_GRAFANA"}}]}' "http://ambari-server:8080/api/v1/clusters/cluster_name/hosts?Hosts/host_name=grafana-server-fqdn"

    If the command was a success the message HTTP/1.1 201 Created should appear.
    If the message is HTTP/1.1 200 OK, something went wrong. Perhaps the server where Grafana is going to be installed was not properly defined, so make sure to use the FQDN.

  3. In Ambari Metrics, under Configs, under the General tab, the Grafana Admin Password is missing.
    ambari grafana config uname password
    Enter a password and save the changes.
  4. In Ambari, under Services -> Ambari Metrics you will see status on Grafana
    ambari grafana status
  5. In Ambari, under Hosts -> grafana-server (or the name of the host where Grafana resides), you find the Grafana component ready for install. Click on Install Pending
    grafana install pending
  6. And click on Re-Install
    grafana reinstall
  7. Install Status should be a success
    grafana install status success
    The Grafana Web UI should now be visible at the following address: grafana-server:3000. If everything went well, you only need to refresh the page to see the Ambari dashboards.
    Signing in to Grafana gives extra options, among them the Data Sources. AMBARI_METRICS is already added.
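
Steps 1 and 2 rely on reading the status line that curl -i prints. The check can be scripted along these lines. This is a sketch under assumptions: http_status and component_created are hypothetical helper names, and the input is expected to start with the HTTP status line, as curl -i normally prints it.

```shell
# Read the output of `curl -i` on stdin and print the status code
# taken from the first line (for example "HTTP/1.1 201 Created").
http_status() {
    awk 'NR == 1 && $1 ~ /^HTTP\// { print $2 }'
}

# Succeed only for 201, the status that steps 1 and 2 expect.
# A 200 means the request was accepted but nothing was created.
component_created() {
    [ "$(http_status)" = "201" ]
}

# Example (the curl arguments are the ones from step 1):
# curl ... | component_created && echo "component added"
```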


Manually configuring Data Source

Sign in.
Go to the Grafana Web UI and define an Ambari Metrics data source
Data sources -> Add new
Under Url, type the grafana-server url address.
grafana define datasource
Save and Test the connection.

When you go in Grafana again, you should be able to see the list of default Ambari dashboards.



Creating and adding a DataNode with multiple volumes

In this example I am adding a new DataNode with 3 volumes of 200 GB each.

The DataNode is created through the WebUI in the cloud and so are the 3 volumes. Each volume is attached to a device in the following order:

volume01 – /dev/vdb
volume02 – /dev/vdc
volume03 – /dev/vdd

After the new “soon-to-be” DataNode instance has been created and volumes attached there is some work to be done in the command line interface:

  1. Use ssh to connect to the new DataNode instance.
    ssh -i .ssh/key w-datanode04
  2. Update and upgrade the system.
    sudo apt-get update -y && sudo apt-get upgrade -y
  3. Create the directories where the data for each volume for the DataNode will be stored.
    sudo mkdir -p /data/vol1 /data/vol2 /data/vol3
  4. Create a file system on the device of each volume.
    sudo mkfs.ext4 /dev/vdb
    sudo mkfs.ext4 /dev/vdc
    sudo mkfs.ext4 /dev/vdd
  5. Mount the volumes to the respective directory.
    sudo mount /dev/vdb /data/vol1
    sudo mount /dev/vdc /data/vol2
    sudo mount /dev/vdd /data/vol3
  6. Label the volumes for easier future work.
    sudo e2label /dev/vdb "vol1"
    sudo e2label /dev/vdc "vol2"
    sudo e2label /dev/vdd "vol3"
  7. Open and update /etc/fstab.
    This is smart to do to keep the volumes mounted to the directories after the DataNode is restarted.

    LABEL=vol1 /data/vol1 ext4 defaults,nobootwait 0 0
    LABEL=vol2 /data/vol2 ext4 defaults,nobootwait 0 0
    LABEL=vol3 /data/vol3 ext4 defaults,nobootwait 0 0
  8. Check if volumes are mounted to correct directories.
    df -h

    Something like this should appear:

    /dev/vdb      197G      241M      187G      1%      /data/vol1
    /dev/vdc      197G      299M      187G      1%      /data/vol2
    /dev/vdd      197G       65M      187G      1%      /data/vol3

  9. For future reference, you can check the size of all mounted folders under the directory /data.
    sudo du -hs /data/vol*

    Something similar to this should be in the output.
    The disk usage in the example below was taken after some files had been written to HDFS. Immediately after the DataNode is added to the Hadoop cluster, it holds no file blocks.

    181M    /data/vol1
    240M    /data/vol2
    5.4M    /data/vol3

Now the DataNode with multiple volumes is ready to be added to the cluster.
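
The /etc/fstab entries from step 7 follow one pattern per volume, so they can be generated rather than typed. The sketch below assumes the vol1..volN labels and /data/volN mount points used in this post. print_fstab_lines is a hypothetical helper that only prints the lines. The mount options are a parameter because nobootwait is specific to older Ubuntu releases, and newer systemd-based releases typically use nofail instead.

```shell
# Print one fstab line per volume, using the LABEL=volN convention above.
# Arguments: number of volumes, optional mount options.
print_fstab_lines() {
    count=$1
    opts=${2:-defaults,nobootwait}
    n=1
    while [ "$n" -le "$count" ]; do
        echo "LABEL=vol$n /data/vol$n ext4 $opts 0 0"
        n=$((n + 1))
    done
}

# Example: review the lines before appending them to /etc/fstab by hand.
# print_fstab_lines 3
```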

It is important to set the property dfs.datanode.data.dir in hdfs-site.xml (hdfs-default.xml only documents the defaults). If you are using Ambari, go to HDFS -> Configs -> Settings; on the right side, the first property under DataNode is “DataNode directories”.

Note: if you are adding new DataNodes with new DataNode directories, first append the new directories to the existing ones (comma separated, no spaces), and remove the old directories only after the DataNodes have been added.
If there is a directory in this property that does not exist, HDFS will ignore it and will not fail.
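
The value of “DataNode directories” has to be a comma-separated list without spaces, and building it by hand is an easy place for a typo. A minimal sketch, where join_data_dirs is a hypothetical helper name:

```shell
# Join all arguments into one comma-separated string without spaces,
# the form expected by the "DataNode directories" property.
join_data_dirs() {
    result=""
    for dir in "$@"; do
        if [ -z "$result" ]; then
            result=$dir
        else
            result="$result,$dir"
        fi
    done
    echo "$result"
}

# Example with the three mount points from this post:
# join_data_dirs /data/vol1 /data/vol2 /data/vol3
```

When appending new directories to existing ones, pass the existing directories first and the new ones after them.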

How to add a DataNode to a cluster with Ambari is described here.

Removing Datanode(s) from a cluster with Ambari

Datanodes come and go.
Proper removal of a Datanode is important, otherwise you might end up with missing blocks or an inconsistent Ambari database.
In this post, I am removing 2 Datanodes from my cluster. Before doing this, I have to know the replication factor (3 in my case) and the number of Datanodes left after the decommission. This is important because of the following rule:

If replication factor is higher than the number of existing datanodes after the removal, the removal process is not going to succeed!

In my case, I have 5 datanodes in the cluster, so I am going to be left with 3 Datanodes after removing 2.
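
The rule can be written as a simple pre-check. This is only a sketch of the arithmetic; can_remove_datanodes is a hypothetical helper, and the three numbers have to be looked up by hand.

```shell
# Succeed if the cluster still has at least as many Datanodes as the
# replication factor after the removal.
# Arguments: current Datanodes, Datanodes to remove, replication factor.
can_remove_datanodes() {
    total=$1
    to_remove=$2
    replication=$3
    remaining=$((total - to_remove))
    [ "$remaining" -ge "$replication" ]
}

# The case from this post: 5 Datanodes, removing 2, replication factor 3.
if can_remove_datanodes 5 2 3; then
    echo "removal is possible"
else
    echo "too few Datanodes would remain"
fi
```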

The following procedure is needed to remove Datanode(s) properly:

  1. From the Hosts list, check the Datanodes you wish to remove.
  2. Decommission Nodemanagers
    decomission nodemanager
  3. Decommission Datanodes
    decommission datanode
  4. Stop Ambari Metrics on each Datanode
    ambari metrics stop
  5. While on the same page, stop all components
    stop all components
    Repeat this for every Datanode. Click OK on the confirmation window.
    stop all components confirmation
    Background Operations window informs you when the components are stopped.
    status report
  6. Stop the Ambari Agent.
    Log in to each to-be-removed Datanode and stop the Ambari Agent.

    sudo ambari-agent stop
  7. Delete Host
    While on the same page, from the Host Actions menu, choose Delete Host
    delete host
    Repeat this for every Datanode.
  8. Restart HDFS and YARN services from Ambari.
    The soon-to-be-removed Datanode(s) are in decommissioned status and will remain in it until the HDFS service is restarted. Ambari reminds you to restart HDFS and YARN.
    HDFS status

The unwanted Datanodes have now been removed. I have a cluster with 3 Datanodes after removing 2.

Corrupted blocks in HDFS

Ambari on one of my test environments was warning me about under replicated blocks in the cluster:

ambari - under replicated blocks

Opening the Namenode web UI (http://test01.namenode1:50070/dfshealth.html#tab-overview) painted a different picture:

namenode web ui - overview information about blocks

11 blocks are missing which might lead to corrupted files in the cluster.

Running file check on one of the files:

sudo -u hdfs hdfs fsck /tmp/tony/adac.json

The output:

fsck - corrupted file

The output reveals that the file is corrupted, was split in 4 blocks and is replicated only once on the cluster.

Nothing to do but to permanently delete the file:

sudo -u hdfs hdfs dfs -rm -skipTrash /tmp/tony/adac.json

Ambari will now show 4 blocks fewer in the Under Replicated Blocks widget. Refreshing the Namenode web UI will also show that the corrupted blocks are removed.
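
When several files have to be checked, the fsck verdict can be read by a script instead of by eye. The sketch below assumes the usual summary line of fsck, which ends in "is HEALTHY" or "is CORRUPT". fsck_status is a hypothetical helper that reads the fsck output on stdin.

```shell
# Classify fsck output read from stdin as CORRUPT, HEALTHY or UNKNOWN.
# CORRUPT wins if both markers appear anywhere in the output.
fsck_status() {
    awk '
        / is CORRUPT/ { corrupt = 1 }
        / is HEALTHY/ { healthy = 1 }
        END {
            if (corrupt) print "CORRUPT"
            else if (healthy) print "HEALTHY"
            else print "UNKNOWN"
        }
    '
}

# Example, using the file from this post:
# sudo -u hdfs hdfs fsck /tmp/tony/adac.json | fsck_status
```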

Namenode hangs when restarting - can’t leave safemode

I am using Hortonworks distribution and Ambari for Hadoop administration. Sometimes, HDFS has to be restarted and sometimes Namenode hangs in the process giving the following output:

2016-04-21 06:12:47,391 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://t-namenode1:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2016-04-21 06:12:59,595 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://t-namenode1:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2016-04-21 06:13:11,737 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://t-namenode1:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2016-04-21 06:13:23,918 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://t-namenode1:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.
2016-04-21 06:13:36,101 - Retrying after 10 seconds. Reason: Execution of 'hdfs dfsadmin -fs hdfs://t-namenode1:8020 -safemode get | grep 'Safe mode is OFF'' returned 1.

To get out of this loop I run the following command from the command line on the Namenode:

sudo -u hdfs hdfs dfsadmin -safemode leave

The output is the following:

Safe mode is OFF

If you have High Availability in the cluster, something like this shows up:

Safe mode is OFF in t-namenode1/10.x.x.171:8020
Safe mode is OFF in t-namenode2/10.x.x.164:8020

After the command is executed, the Namenode restart process in Ambari continues.
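
The loop in the Ambari log can be reproduced by hand to see whether the Namenode leaves safe mode on its own before forcing it. This is a sketch, not what Ambari runs internally: wait_for_safemode_off is a hypothetical helper, and it expects the hdfs command to be on the PATH of the calling user.

```shell
# Poll the safe mode state until it is OFF or the attempts run out.
# Arguments: number of attempts, seconds to wait between attempts.
# Note: with High Availability the output has one line per Namenode,
# and this check passes as soon as any of them reports OFF.
wait_for_safemode_off() {
    attempts=$1
    delay=$2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if hdfs dfsadmin -safemode get | grep -q 'Safe mode is OFF'; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: give the Namenode one minute before leaving safe mode manually.
# wait_for_safemode_off 6 10 || sudo -u hdfs hdfs dfsadmin -safemode leave
```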

Ambari Upgrade 1: Upgrading Hortonworks Ambari from 2.1 to 2.2


This post explains how to upgrade from Ambari 2.1 to either version or

I'm using an external MySql database as the Ambari database. My operating system is Ubuntu 14.04 Trusty. The Hive service also uses an external MySql database (important information for later).

The cluster does NOT have the following services installed:

  • Ranger
  • Storm
  • Ganglia
  • Nagios

The upgraded cluster does not use LDAP, nor Active Directory.

If you have any of the above mentioned services, check this link to learn how to handle them in the upgrade process.


The following steps are done on the Ambari server, unless explicitly mentioned otherwise.

  1. Create a folder for backup files on all nodes in the cluster.
    mkdir /home/ubuntu/ambari-backup
  2. Backup the Ambari MySql database.
    DAT=`date +%Y%m%d_%H%M%S`
    mysqldump -u root -proot ambari_db > /home/ubuntu/ambari-backup/ambari_db_$DAT.sql
  3. Back up the Ambari server configuration directory.
    sudo cp -r /etc/ambari-server/conf/ /home/ubuntu/ambari-backup


  1. Make sure you have Java 1.7+ on the Ambari server.
  2. Stop Ambari Metrics from the Ambari web UI.
  3. Stop Ambari server
    sudo ambari-server stop
  4. Stop all Ambari agents on all nodes.
    sudo ambari-agent stop
  5. Remove the old repository file ambari.list from all nodes. Other Linux flavours might use a different file name; check here, page 6.
    sudo mv /etc/apt/sources.list.d/ambari.list /home/ubuntu/ambari-backup
  6. Download new repository file on all nodes.
    Ambari 2.2.1 for Ubuntu 14:

    sudo wget -nv -O /etc/apt/sources.list.d/ambari.list

    Ambari 2.2.2 for Ubuntu 14:

    sudo wget -nv -O /etc/apt/sources.list.d/ambari.list
  7. Update Ubuntu packages and check version.
    sudo apt-get clean all
    sudo apt-get update
    sudo apt-cache show ambari-server | grep Version

    If you are installing to 2.2.1, you should see the following output:
    If you are installing to 2.2.2, you should see the following output:

  8. Install Ambari server on the node dedicated for Ambari server.
    sudo apt-get install ambari-server

    Confirm that there is only one ambari-server*.jar file in /usr/lib/ambari-server.
    Jar files related to upgrade 2.2.1:


    Jar files related to upgrade 2.2.2:


  9. Install Ambari agents on all nodes in the cluster.
    sudo apt-get update -y && sudo apt-get install ambari-agent
  10. Upgrade Ambari database.
    sudo ambari-server upgrade
  11. The following question shows up: “Ambari Server configured for MySQL. Confirm you have made a backup of the Ambari Server database [y/n] (y)?”
    Press “y”, since that was done in the backup process. When the upgrade is completed, the following output concludes the process:

    Ambari Server 'upgrade' completed successfully.

  12. Start Ambari server.
    sudo ambari-server start
  13. On all nodes where Ambari agent is installed, start the agent.
    sudo ambari-agent start
  14. Hive in the cluster uses an external MySql database, so this step is mandatory. Register the MySql connector file again
    sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
  15. Log in to the upgraded Ambari (same URL, same port, same username and password)
  16. Restart all services in Ambari
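
The jar check from step 8 can be scripted. The sketch below only counts files matching the ambari-server*.jar pattern named in that step. count_server_jars is a hypothetical helper, and the directory is passed in as an argument.

```shell
# Count the files named ambari-server*.jar directly inside a directory.
count_server_jars() {
    count=0
    for jar in "$1"/ambari-server*.jar; do
        # The pattern stays unexpanded when nothing matches, so test first.
        if [ -e "$jar" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example: anything other than 1 deserves a closer look.
# count_server_jars /usr/lib/ambari-server
```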

Ambari Metrics upgrade

  1. Stop all Ambari Metrics services in Ambari.
  2. On every node in the cluster, where Metrics Monitor is installed, execute the following commands.
    sudo apt-get clean all
    sudo apt-get update
    sudo apt-get install ambari-metrics-assembly
  3. On every node in the cluster, where Metrics Collector is installed, execute the following commands (yes, the command is the same as in previous step).
    sudo apt-get install ambari-metrics-assembly
  4. Start Ambari Metrics services in Ambari.


After the upgrade, it is possible to run into the following message when accessing Ambari Web UI.

Ambari post upgrade message in browser

Ctrl+Shift+R solves the problem. The text in the message is quite descriptive and explains why this message is showing.


Next step is installing Grafana. This is covered in post Ambari Upgrade 2.

Additional links

The link takes you to the Hortonworks Ambari upgrade document.