I have multiple users on one client machine who are going to use and test ZeppelinR. For every Zeppelin user I create a copy of the built Zeppelin folder in the user’s home directory and dedicate a port to that user (8080 is reserved for my own testing); my first user, for example, got port 8082. The port is set in the user’s $ZEPPELIN_HOME/conf/zeppelin-site.xml.
Example for one user:
<property>
  <name>zeppelin.server.port</name>
  <value>8082</value>
  <description>Server port.</description>
</property>
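If there are many users, the port change can be scripted. The following is only a sketch: set_zeppelin_port is a hypothetical helper name, and it assumes the <name> and <value> elements sit on consecutive lines, as in the property shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite zeppelin.server.port in a zeppelin-site.xml.
# Assumption: <value> is on the line directly after the matching <name>.
set_zeppelin_port() {
    local site_xml="$1" port="$2" tmp
    # Reject anything that is not a plain number.
    case "$port" in
        ''|*[!0-9]*) echo "port must be numeric" >&2; return 2 ;;
    esac
    tmp="$(mktemp)" || return 1
    # Find the <name> line, step to the next line, replace its <value>.
    sed '/<name>zeppelin.server.port<\/name>/{n;s|<value>[0-9]*</value>|<value>'"$port"'</value>|;}' \
        "$site_xml" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$site_xml"
    rm -f "$tmp"
}
```

For the first user in this post it would be called as: set_zeppelin_port "$ZEPPELIN_HOME/conf/zeppelin-site.xml" 8082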
Running Zeppelin, or ZeppelinR, as root is mostly unproblematic. Running it as a normal Linux user can be more of a challenge.
One error message in particular can surprise you when you start a new Spark context from the Zeppelin web UI.
Taken from the Zeppelin log file (zeppelin-user_running_zeppelin-t-client01.log):
ERROR [2016-03-18 08:10:47,401] ({Thread-20} RemoteScheduler.java[getStatus]:270) - Can't get status information
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getStatus(RemoteInterpreterService.java:355)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getStatus(RemoteInterpreterService.java:342)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:256)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.run(RemoteScheduler.java:205)
ERROR [2016-03-18 08:11:47,347] ({pool-1-thread-2} RemoteScheduler.java[getStatus]:270) - Can't get status information
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getStatus(RemoteInterpreterService.java:355)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getStatus(RemoteInterpreterService.java:342)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:256)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:335)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The Zeppelin .out file (zeppelin-user_running_zeppelin-t-client01.out) describes the problem more concretely:
Exception in thread "Thread-80" org.apache.zeppelin.interpreter.InterpreterException: java.lang.RuntimeException: Could not find rzeppelin - it must be in either R/lib or ../R/lib
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.getScheduler(ClassloaderInterpreter.java:146)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getScheduler(LazyOpenInterpreter.java:115)
at org.apache.zeppelin.interpreter.Interpreter.destroy(Interpreter.java:124)
at org.apache.zeppelin.interpreter.InterpreterGroup$2.run(InterpreterGroup.java:115)
Caused by: java.lang.RuntimeException: Could not find rzeppelin - it must be in either R/lib or ../R/lib
at org.apache.zeppelin.rinterpreter.RContext$.apply(RContext.scala:353)
at org.apache.zeppelin.rinterpreter.RInterpreter.rContext$lzycompute(RInterpreter.scala:43)
at org.apache.zeppelin.rinterpreter.RInterpreter.rContext(RInterpreter.scala:43)
at org.apache.zeppelin.rinterpreter.RInterpreter.getScheduler(RInterpreter.scala:80)
at org.apache.zeppelin.rinterpreter.RRepl.getScheduler(RRepl.java:93)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.getScheduler(ClassloaderInterpreter.java:144)
... 3 more
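The two paths in the message, R/lib and ../R/lib, are relative, so whether they are found presumably depends on the directory the service was started from. A quick way to test a candidate directory is sketched below. has_r_lib is a hypothetical helper, and it only checks that one of the two directories exists, which is my reading of the message rather than a confirmed description of what the interpreter does.

```shell
#!/usr/bin/env bash
# Hypothetical check: would R/lib or ../R/lib resolve from this directory?
# Assumption: the interpreter resolves both paths against its working directory.
has_r_lib() {
    local dir="${1:-.}"
    [ -d "$dir/R/lib" ] || [ -d "$dir/../R/lib" ]
}

# Report for the current directory when called without arguments.
report_r_lib() {
    local dir="${1:-.}"
    if has_r_lib "$dir"; then
        echo "R/lib reachable from $dir"
    else
        echo "R/lib NOT reachable from $dir"
    fi
}
```

Running report_r_lib from a home directory and then from $ZEPPELIN_HOME should show the difference, if the assumption above holds.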
I solved it by starting the Zeppelin service from within $ZEPPELIN_HOME, so that the relative R/lib path named in the error can be found. So that users can start the Zeppelin service themselves, I created a script:
export ZEPPELIN_HOME=/home/${USER}/Zeppelin-With-R
cd ${ZEPPELIN_HOME}
/home/${USER}/Zeppelin-With-R/bin/zeppelin-daemon.sh start
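The same idea can be wrapped in a function that also handles stopping and restarting. This is a sketch rather than the script I actually use: zeppelin_ctl is a hypothetical name, the default path is the Zeppelin-With-R folder from above, and an already exported ZEPPELIN_HOME takes precedence.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around zeppelin-daemon.sh that always runs it
# from inside ZEPPELIN_HOME.
zeppelin_ctl() {
    local action="${1:-start}"
    local home="${ZEPPELIN_HOME:-/home/${USER}/Zeppelin-With-R}"
    case "$action" in
        start|stop|restart|status) ;;
        *) echo "usage: zeppelin_ctl {start|stop|restart|status}" >&2; return 2 ;;
    esac
    # Subshell: the cd and export do not leak into the caller's shell.
    (
        export ZEPPELIN_HOME="$home"
        cd "$ZEPPELIN_HOME" || exit 1
        ./bin/zeppelin-daemon.sh "$action"
    )
}
```

Usage would be zeppelin_ctl start or zeppelin_ctl stop. Because the cd happens in a subshell, the user stays in whatever directory they called it from.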
Now I can start and stop the Zeppelin service and start new Spark contexts with no problem.
Here is an example of my YARN applications:
And here are the outputs from Zeppelin when Scala, SparkR and Hive are tested: