This post describes a simple Storm topology – random words are written to HDFS. The topology is submitted to the cluster from the client node. Nimbus runs on the cluster’s NameNode, and a Supervisor is installed on each of the 4 DataNodes. More on how I installed and configured Storm can be found here.
Services used
I am using Hortonworks Data Platform (HDP) 2.4, with Hadoop 2.7.1 and Storm 0.10.0. All services were installed through Ambari.
Preparing development environment
Create a new Maven project. How to install Maven is explained here.
mvn archetype:generate -DgroupId=org.package -DartifactId=storm-project -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
Once the project is created, step into the project directory (in this case storm-project), where the pom.xml file is located.
In the org.package package (./src/main/java/org/package), create a folder named spout. The generated App.java can be deleted.
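If it helps to visualize, the project layout at this point should look roughly like this (test sources omitted; the exact structure comes from the quickstart archetype defaults):

storm-project/
    pom.xml
    src/main/java/org/package/
        App.java      (generated by the archetype, can be deleted)
        spout/        (newly created folder for the spout)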
Three files are important for this topology: pom.xml, the spout file, and the topology file.
Prepare pom.xml
The pom file for this project includes the Storm dependency with scope provided – the Storm jars are not packed into the topology jar, since they are already present on the cluster. It is important that the versions match your installation.
maven-shade-plugin
Add a build node with the plugin:
<build>
  <sourceDirectory>src/</sourceDirectory>
  <resources>
    <resource>
      <directory>${basedir}</directory>
      <includes>
        <include>*</include>
      </includes>
    </resource>
  </resources>
  <outputDirectory>classes/</outputDirectory>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>1.4</version>
      <configuration>
        <createDependencyReducedPom>true</createDependencyReducedPom>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
              <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                <mainClass></mainClass>
              </transformer>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
clojure
Add Clojure to the dependencies node. Be sure to check for a newer version:
<dependency>
  <groupId>org.clojure</groupId>
  <artifactId>clojure</artifactId>
  <version>1.8.0</version>
</dependency>
storm-core
Make sure the version matches your Storm installation:
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>0.10.0</version>
  <!-- keep storm out of the jar-with-dependencies -->
  <scope>provided</scope>
</dependency>
hadoop-client
Hadoop client XML node. Make sure the version matches your Hadoop installation. The slf4j-log4j12 binding is excluded, otherwise messages about multiple versions of the package appear:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.7.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
hadoop-hdfs
Hadoop HDFS XML node. Make sure the version matches your Hadoop installation. The slf4j-log4j12 binding is again excluded:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.7.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
storm-hdfs
This dependency provides the HdfsBolt and related classes used to write tuples to HDFS. Note that its version (0.10.1) differs slightly from storm-core (0.10.0); keep it compatible with your Storm installation:

<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-hdfs</artifactId>
  <version>0.10.1</version>
</dependency>
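To show what this dependency is for, here is a minimal sketch of how an HdfsBolt is typically configured with storm-hdfs. This is not the exact code from this post – the class name HdfsBoltFactory, the delimiter, the sync/rotation values, and the fsUrl/outputPath parameters are illustrative placeholders; the actual spout and topology classes follow on the next pages.

import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

// Hypothetical helper class, not part of the original post
public class HdfsBoltFactory {

    // Builds an HdfsBolt that writes comma-delimited tuples to files under outputPath,
    // e.g. build("hdfs://namenode:8020", "/storm/words/")
    public static HdfsBolt build(String fsUrl, String outputPath) {
        // one tuple per line, fields separated by commas
        RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(",");
        // sync to HDFS after every 100 tuples
        SyncPolicy syncPolicy = new CountSyncPolicy(100);
        // rotate to a new file once the current one reaches 5 MB
        FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
        // where the files end up on HDFS
        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath(outputPath);

        return new HdfsBolt()
                .withFsUrl(fsUrl)
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(rotationPolicy)
                .withSyncPolicy(syncPolicy);
    }
}

In the topology class, a bolt built like this is attached to the word spout through a TopologyBuilder and submitted with StormSubmitter; that part is covered on the following pages.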
Now that the pom.xml is in order, package the project to check that it is valid:
mvn package
BUILD SUCCESS should appear. If it does not, the pom.xml is invalid and needs to be fixed before moving on.
Click on the next page for Spout.