Prior to configuring and running Spark History Server, Spark should be installed.
How to install Apache Spark 1.6.0 is described here.
How to install Apache Spark 2.0 is described here.
Spark History Server
Check that $SPARK_HOME/conf/spark-defaults.conf has the following History Server properties set:
spark.eventLog.dir hdfs:///spark-history
spark.eventLog.enabled true
spark.history.fs.logDirectory hdfs:///spark-history
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18080
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
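A quick way to verify the file is to script the check. The following is a minimal sketch, not an official Spark tool: the function name missing_history_props is made up for illustration, and the property list is simply the non-Kerberos settings shown above.

```shell
# Print every History Server property that is NOT set in the given file.
# missing_history_props is a hypothetical helper name, not part of Spark.
missing_history_props() {
  local conf_file="$1"
  local prop
  for prop in \
    spark.eventLog.dir \
    spark.eventLog.enabled \
    spark.history.fs.logDirectory \
    spark.history.provider \
    spark.history.ui.port; do
    # A property counts as set when the first field of a line (after optional
    # leading whitespace) equals its name. Commented-out lines do not match.
    if ! awk -v p="$prop" -F'[ \t=]+' \
        '{ sub(/^[ \t]+/, "") } $1 == p { found = 1 } END { exit !found }' \
        "$conf_file"; then
      echo "$prop"
    fi
  done
}

# Example call:
#   missing_history_props "$SPARK_HOME/conf/spark-defaults.conf"
```

No output means all five properties are present. The check only looks at property names, not at their values.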
Create spark-history directory in HDFS
sudo -u hdfs hadoop fs -mkdir /spark-history
Change the owner of the directory
sudo -u hdfs hadoop fs -chown spark:hdfs /spark-history
Change the permissions (be more restrictive if necessary)
sudo -u hdfs hadoop fs -chmod 777 /spark-history
Add user spark to group hdfs on the instance where Spark History Server is going to run
sudo usermod -a -G hdfs spark
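The HDFS steps above could be collected into one small script. This is only a sketch: run_cmd, setup_history_dir and the DRY_RUN switch are made-up helper names, and mode 1777 (world-writable plus the sticky bit, so users cannot delete each other's event logs) is one possible tighter alternative to 777, not something this setup requires.

```shell
# Run a command, or only print it when DRY_RUN=1.
# run_cmd and DRY_RUN are illustrative names, not part of Hadoop or Spark.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the event log directory in HDFS, set owner and permissions.
setup_history_dir() {
  local dir="${1:-/spark-history}"
  local mode="${2:-1777}"   # assumption: sticky bit instead of plain 777
  run_cmd sudo -u hdfs hadoop fs -mkdir -p "$dir"
  run_cmd sudo -u hdfs hadoop fs -chown spark:hdfs "$dir"
  run_cmd sudo -u hdfs hadoop fs -chmod "$mode" "$dir"
}

# Example calls:
#   DRY_RUN=1 setup_history_dir /spark-history 1777   # print the commands only
#   setup_history_dir /spark-history 777              # same mode as in this post
```

Running it once with DRY_RUN=1 shows exactly which commands would be executed before anything is changed in HDFS.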
To view Spark jobs from other users
If you open the History Server and cannot see the Spark jobs you expect, check the History Server .out file in the Spark log directory. If the error message “Permission denied” is present, the Spark History Server is trying to read the job's log file but has no permission to do so.
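Searching the log directory for that message can be scripted. A minimal sketch, assuming the log directory is /var/log/spark and that the file name contains HistoryServer; the function name find_permission_errors is hypothetical.

```shell
# List History Server .out files that contain "Permission denied".
# Assumption: logs live in /var/log/spark unless another directory is passed.
find_permission_errors() {
  local log_dir="${1:-/var/log/spark}"
  grep -l "Permission denied" "$log_dir"/*HistoryServer*.out 2>/dev/null
}

# Example call:
#   find_permission_errors /var/log/spark
```

The function prints the matching file names and returns a non-zero status when no file contains the message.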
The spark user should be added to the group of the Spark job owner.
For example, user marko belongs to the group employee. If marko starts a Spark job, the log file for this job will have user and group marko:employee. For spark to be able to read the log file, the spark user should be added to the employee group. This is done in the following way:
sudo usermod -a -G employee spark
Check spark’s groups:
groups spark
The output should list employee among spark’s groups.
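The same check can be written as a yes/no test for use in a script. A minimal sketch; user_in_group is a made-up helper name, and it assumes group names do not contain spaces.

```shell
# Succeed when the user (first argument) is a member of the group (second argument).
# user_in_group is a hypothetical helper; it relies on `id -nG`, which prints
# the user's group names separated by spaces.
user_in_group() {
  local user="$1"
  local group="$2"
  local g
  for g in $(id -nG "$user"); do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}

# Example call:
#   user_in_group spark employee || sudo usermod -a -G employee spark
```

Written this way, usermod only runs when the membership is actually missing.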
Start Spark History Server
sudo -u spark $SPARK_HOME/sbin/start-history-server.sh
Output:
starting org.apache.spark.deploy.history.HistoryServer, logging to /var/log/spark/spark-spark-org.apache.spark.deploy.history.HistoryServer-1-t-client01.out
The Spark History Server web UI is available at spark-server:18080. The following screen should load.
A fresh Spark History Server installation has no applications to show (no applications in hdfs:///spark-history).
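The server can also be checked from the command line through its REST API, which serves the application list under /api/v1/applications. A minimal sketch: history_api_url is a made-up helper name, and the host spark-server and port 18080 are the values used in this post.

```shell
# Build the URL of the History Server REST endpoint that lists applications.
# history_api_url is a hypothetical helper, not part of Spark.
history_api_url() {
  local host="${1:-spark-server}"
  local port="${2:-18080}"
  echo "http://${host}:${port}/api/v1/applications"
}

# Example call (requires curl and a running History Server):
#   curl -s "$(history_api_url spark-server 18080)"
```

On a fresh installation the call should return an empty JSON list, matching the empty web UI.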
Spark History Server offers a great monitoring interface for Spark applications!
WARN ServletHandler: /api/v1/applications
If you start the Spark History Server but see neither completed nor incomplete applications in the web UI, check the log files. If you find something like the following:
WARN ServletHandler: /api/v1/applications
java.lang.NullPointerException
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
    at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
    at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:479)
    at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.spark_project.jetty.server.Server.handle(Server.java:499)
    at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
    at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
    at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
Take the jersey-bundle-*.jar file out of the $SPARK_HOME/jars directory. Hortonworks doesn't need it, you don't need it 🙂
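Moving the jar to a backup location instead of deleting it keeps the change easy to undo. A minimal sketch, assuming a backup directory of your choice; move_jersey_bundle is a made-up helper name.

```shell
# Move every jersey-bundle-*.jar out of <spark_home>/jars into a backup directory.
# move_jersey_bundle is a hypothetical helper, not part of Spark.
move_jersey_bundle() {
  local spark_home="$1"
  local backup_dir="$2"
  local jar
  mkdir -p "$backup_dir" || return 1
  for jar in "$spark_home"/jars/jersey-bundle-*.jar; do
    [ -e "$jar" ] || continue   # the pattern matched no file
    mv "$jar" "$backup_dir"/ && echo "moved $(basename "$jar")"
  done
}

# Example call (the backup path is an assumption):
#   move_jersey_bundle "$SPARK_HOME" /tmp/spark-jars-backup
```

After moving the jar, restart the History Server ($SPARK_HOME/sbin/stop-history-server.sh, then start-history-server.sh) so the change takes effect.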
How will it work when the Ranger HDFS plugin is implemented with AD/LDAP users?
Hi Nerraj! Great question and perfect timing. Right now I'm working on a project where the Ranger HDFS plugin is used, the cluster is kerberized and an LDAP catalog is used for handling users. At this point, I'm implementing Kerberos on my server. I can keep you updated once we get to the LDAP/Kerberos point.