Pig – markobigdata

The cluster

I am using AWS EC2 instances; the operating system is CentOS 7.

I have an Ambari server, version 2.6.2, and an HDP cluster, version 2.6.5. The steps should work on other versions as well.

My cluster has one NameNode, on which Ambari is installed as well, and one DataNode. The services installed are the bare minimum: HDFS, YARN, MapReduce2, ZooKeeper and Hive.

Adding Pig to the cluster

Variables

export AMBARI_SERVER=PUBLIC_IP
export MASTER_DNS=NAMENODE_PRIVATE_DNS
export SLAVE_DNS=DATANODE_PRIVATE_DNS
export CLUSTER_NAME=mincluster2
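All curl commands in this post hit the same base URL, built from AMBARI_SERVER and CLUSTER_NAME. A minimal sketch of a wrapper that saves some typing is shown here; the function names ambari_url and ambari_api are my own and are not part of Ambari, and admin:admin is simply the login used throughout this post.

```shell
# Hypothetical helpers; ambari_url and ambari_api are made-up names.

# Print the full REST URL for a path below the cluster resource.
ambari_url() {
  if [ -z "${AMBARI_SERVER:-}" ] || [ -z "${CLUSTER_NAME:-}" ]; then
    echo "AMBARI_SERVER and CLUSTER_NAME must be set" >&2
    return 1
  fi
  printf 'http://%s:8080/api/v1/clusters/%s/%s\n' \
    "$AMBARI_SERVER" "$CLUSTER_NAME" "$1"
}

# Call the API: ambari_api METHOD PATH [JSON_BODY]
ambari_api() {
  local method="$1"
  local path="$2"
  local body="${3:-}"
  local url
  url=$(ambari_url "$path") || return 1
  if [ -n "$body" ]; then
    curl -u admin:admin -H "X-Requested-By:ambari" -i -X "$method" -d "$body" "$url"
  else
    curl -u admin:admin -H "X-Requested-By:ambari" -i -X "$method" "$url"
  fi
}
```

With these in place, a call such as `ambari_api GET services/PIG` should be equivalent to writing out the full curl command by hand.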

Create service on the Cluster

curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST -d '{"ServiceInfo":{"service_name":"PIG"}}' 'http://'$AMBARI_SERVER':8080/api/v1/clusters/'$CLUSTER_NAME'/services'

The Pig service now appears in the list of services in Ambari.

Check for service on the cluster

curl -k -u admin:admin -H "X-Requested-By:ambari" -i -X GET 'http://'$AMBARI_SERVER':8080/api/v1/clusters/'$CLUSTER_NAME'/services/PIG'

The service is registered on the cluster.
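Instead of reading the full response, the HTTP status code alone already tells whether the service exists. The sketch below maps a few status codes to their standard HTTP meaning; describe_status is a hypothetical helper name, and the exact codes Ambari returns in each situation may differ between versions.

```shell
# Hypothetical helper: describe the HTTP status code of a REST call
# using the standard HTTP meanings.
describe_status() {
  case "$1" in
    200) echo "OK: resource exists" ;;
    201) echo "Created: resource was added" ;;
    202) echo "Accepted: request queued" ;;
    401|403) echo "Denied: check the credentials" ;;
    404) echo "Not found: resource does not exist" ;;
    409) echo "Conflict: resource already exists" ;;
    *)   echo "Unexpected status: $1" ;;
  esac
}

# Usage (needs a reachable Ambari server, so it is left commented out):
# code=$(curl -s -o /dev/null -w '%{http_code}' -u admin:admin \
#   "http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/services/PIG")
# describe_status "$code"
```

The `-w '%{http_code}'` option makes curl print only the status code, and `-o /dev/null` discards the body.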

Add components to the service

curl -k -u admin:admin -H "X-Requested-By:ambari" -i -X POST -d '{"RequestInfo":{"context":"Install PIG"}, "Body":{"HostRoles":{"state":"INSTALLED"}}}' 'http://'$AMBARI_SERVER':8080/api/v1/clusters/'$CLUSTER_NAME'/services/PIG/components/PIG'

Running the curl command from the previous step again shows whether the component was added.

According to the “components” element in the JSON output, the component has been added. The state of the service is still “UNKNOWN”.
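To pull just the state out of the JSON response, a small text filter is enough for a quick check. This is a sketch that assumes the response contains a "state" field in the usual `"state" : "VALUE"` form; service_state is a hypothetical name, and a real JSON parser such as jq would be more robust.

```shell
# Hypothetical helper: print the value of every "state" field read from stdin.
# The leading quote in the pattern keeps "maintenance_state" from matching.
service_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (needs a reachable Ambari server, so it is left commented out):
# curl -s -u admin:admin \
#   "http://$AMBARI_SERVER:8080/api/v1/clusters/$CLUSTER_NAME/services/PIG" \
#   | service_state
```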

Creating the configuration is covered on the next page.

I have four Hadoop environments, all running the Hortonworks distribution, version 2.3.4 or 2.4. I installed HDFS, MapReduce and YARN first, and now need to add Hive.

When installing Hive, Pig and Tez follow with it whether you want them or not.

I already have an existing MySQL database (because of Ranger), and this post describes how to install Hive and use the existing MySQL for the metastore. Installing Hive with a new MySQL is actually easier.

On the Ambari server, run the following from the CLI:
```
sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
```
Output:

Using python /usr/bin/python
Setup ambari-server
Copying /usr/share/java/mysql-connector-java.jar to /var/lib/ambari-server/resources
JDBC driver was successfully initialized.
Ambari Server 'setup' completed successfully.
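As the output shows, the setup command copies the driver jar from the given path, so the jar has to exist before the command is run. A minimal pre-check could look like this; check_jdbc_driver is a hypothetical name, and the default path is the one used in the command above (on CentOS the jar typically comes from the mysql-connector-java package).

```shell
# Hypothetical pre-check: succeed only if the driver jar is a readable file.
check_jdbc_driver() {
  local jar="${1:-/usr/share/java/mysql-connector-java.jar}"
  if [ -f "$jar" ] && [ -r "$jar" ]; then
    echo "found: $jar"
  else
    echo "missing: $jar" >&2
    return 1
  fi
}
```

Running `check_jdbc_driver` without arguments checks the default path; a different path can be passed as the first argument.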

Log in to Ambari as administrator.
From the Actions drop-down menu on the left side of the screen, click Add Service.

Choose services

Check the services Tez, Hive and Pig. If you pick only Hive, the installation wizard will remind you that you have to set up the Tez and Pig packages as well.

Assign masters

In this case, I am installing Hive on my namenode. This can always be changed: it is possible to move services to other instances (why do you think my namenode is called md-namenode2? ;))

Assign Slaves and Clients

Tez Client, HCat Client, Hive Client and Pig Client are going to be installed on the selected host(s).
In this case I am installing them on the same server as the Hive server. On “more serious” clusters I install the clients where they belong: on the client server.

Customize Services

On the MySQL server used for the Hive metastore, run the following commands as root:

CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;

Choose “Existing MySQL Database” and set up the connection string to the metastore.

Note: If the connection test fails, also check my.cnf on the MySQL server to see whether the following property is active (not commented out):

bind-address = 127.0.0.1

If it is, comment it out (put # in front of the line) and restart MySQL, since we are connecting to the server from hosts other than localhost.
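The edit to my.cnf can also be scripted. The following is a sketch only: comment_bind_address is a hypothetical helper, it keeps a backup with a .bak suffix next to the file, and the location of my.cnf (often /etc/my.cnf on CentOS) is an assumption that should be verified on your server.

```shell
# Hypothetical helper: comment out an active bind-address line in a
# my.cnf-style file. Lines that are already commented are left alone.
# A backup of the original file is written to FILE.bak.
comment_bind_address() {
  local cnf="$1"
  sed -i.bak 's/^[[:space:]]*bind[-_]address/#&/' "$cnf"
}

# Usage (edits a system file, so it is left commented out):
# sudo bash -c "$(declare -f comment_bind_address); comment_bind_address /etc/my.cnf"
```

MySQL has to be restarted afterwards for the change to take effect.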

Review

If the installation details are acceptable, proceed with the installation.
When the installation is complete, the installed services are available in Ambari.

Do not forget to restart the services if Ambari suggests so!

Error during installation

resource_management.core.exceptions.Fail: Applying Directory['/usr/hdp/2.4.0.0-169/tez/conf'] failed, looped symbolic links found while resolving /etc/tez/conf

The solution is to run the following on the Hive server (md-namenode2 in this example):

unlink /etc/tez/conf
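Before removing the link, it can be worth confirming that /etc/tez/conf really is a symbolic link that cannot be resolved. A minimal sketch, with is_unresolvable_link as a hypothetical helper name:

```shell
# Hypothetical check: true if the path is a symlink whose target cannot be
# resolved (dangling target or a loop of links).
is_unresolvable_link() {
  [ -L "$1" ] && [ ! -e "$1" ]
}

# Usage (left commented out because it removes the link):
# if is_unresolvable_link /etc/tez/conf; then
#   unlink /etc/tez/conf
# fi
```

The test works because `-L` looks at the link itself, while `-e` follows it and fails when the chain of links never reaches a real file or directory.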


Adding service to HDP using REST API