Oozie 4.0 installation

1. Download version 4.0 of Oozie -> wget http://mirror.symnds.com/software/Apache/oozie/4.0.0/oozie-4.0.0.tar.gz
2. Extract the tar.gz file -> tar -xzvf oozie-4.0.0.tar.gz
3. cd oozie-4.0.0
4. Install Maven -> sudo apt-get install maven
5. Replace the Hadoop version in the pom.xml files -> find . -name pom.xml | xargs sed -ri 's/2.2.0-SNAPSHOT/2.2.0-cdh5.0.0-beta-2/'
   Also add the following repository to the pom file, in order to be able to build Oozie against hadoop 2.0.0-mr1-cdh4.4.0:

<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

6. ./bin/mkdistro.sh -DskipTests

7. cd distro/target/oozie-4.0.0-distro/
   cp -r oozie-4.0.0/ ~/oozie

8. export OOZIE_HOME=/home/ubuntu/oozie and cd $OOZIE_HOME
   export HADOOP_HOME="/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39"

10. cp -r /usr/local/hadoop/share/hadoop/common/*.jar libext/
11. cd to the libext folder and download the following file -> wget http://extjs.com/deploy/ext-2.2.zip

Add the jars needed by the Oozie web application to the embedded Tomcat:

cd $OOZIE_HOME/oozie-server/lib
wget http://extjs.com/deploy/ext-2.2.zip
rm -rf ecj-3.7.2.jar
wget http://repo1.maven.org/maven2/tomcat/jasper-compiler/5.5.23/jasper-compiler-5.5.23.jar
wget http://repo1.maven.org/maven2/tomcat/jasper-compiler-jdt/5.5.23/jasper-compiler-jdt-5.5.23.jar
cp $HADOOP_HOME/lib/hadoop-0.20-mapreduce/hadoop-core-2.0.0-mr1-cdh4.4.0.jar .
cp $HADOOP_HOME/lib/hadoop/hadoop-common-2.0.0-cdh4.4.0.jar .

12. cp $HADOOP_HOME/lib/hadoop-hdfs/hadoop-hdfs-2.0.0-cdh4.4.0.jar .
    cp $HADOOP_HOME/lib/hadoop/hadoop-auth-2.0.0-cdh4.4.0.jar .
    cp $HADOOP_HOME/lib/hadoop/lib/*.jar .

13. cp $OOZIE_HOME/distro/target/oozie-4.0.0-distro/oozie-4.0.0/oozie.war $OOZIE_HOME/webapp/src/main/webapp/

14. nano $OOZIE_HOME/conf/oozie-site.xml and make sure the following property is set:

    <property>
        <name>oozie.service.JPAService.create.db.schema</name>
        <value>true</value>
        <description>
            Creates Oozie DB.
            If set to true, it creates the DB schema if it does not exist. If the DB schema exists is a NOP.
            If set to false, it does not create the DB schema. If the DB schema does not exist it fails start up.
        </description>
    </property>

15. Run the following command to create the Oozie DB (setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"):

    $OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run

    Validate DB Connection DONE
    Check DB schema does not exist DONE
    Check OOZIE_SYS table does not exist DONE
    Create SQL schema DONE
    Create OOZIE_SYS table DONE
    Oozie DB has been created for Oozie version '4.0.0'
    The SQL commands have been written to: oozie.sql

The oozie war file requires a few other jar files, like hadoop-core-<version>.jar & commons-configuration-<version>.jar. Also, to enable the web console, we need to install the ext JS library:

    cd $OOZIE_HOME
    ./bin/addtowar.sh -inputwar oozie.war -outputwar oozie1.war -jars ~/oozie-4.0.0/hadooplibs/target/oozie-4.0.0-hadooplibs/oozie-4.0.0/hadooplibs/hadooplib-*/*.jar -extjs $OOZIE_HOME/oozie-server/lib/ext-2.2.zip
    rm -rf oozie.war
    mv oozie1.war oozie.war
    cp oozie.war $OOZIE_HOME/oozie-server/webapps/

16. Upload the Oozie share library (built under ~/oozie-4.0.0/sharelib/target/oozie-sharelib-4.0.0) to HDFS -> sudo -u hdfs hadoop fs -put share share

17. Start Oozie -> oozied.sh start (a quick way to verify the server from Java is sketched below)

If you encounter this issue when running Oozie jobs -> Error: E0501 : E0501: Could not perform authorization operation, User: ubuntu is not allowed to impersonate ubuntu -> to be able to run Oozie jobs I started Oozie and the workflow as the hdfs user.
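Once the server is up, a quick sanity check is to ask Oozie for its system mode from Java, roughly what the oozie admin -status CLI call reports. This is only a minimal sketch: it assumes the default endpoint on localhost:11000 and that the oozie-client jar built above is on the classpath.

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

public class OozieStatusCheck {
    public static void main(String[] args) throws OozieClientException {
        // Assumed endpoint; adjust host/port if Oozie was configured differently.
        OozieClient client = new OozieClient("http://localhost:11000/oozie");
        // Prints NORMAL when the server is healthy and accepting requests.
        System.out.println("Oozie system mode: " + client.getSystemMode());
    }
}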

If the Hadoop jars have not been copied into $OOZIE_HOME/oozie-server/lib (see the steps above), Oozie fails to start and Tomcat logs the following:

Feb 19, 2014 7:23:47 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path:
Feb 19, 2014 7:23:47 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-11000
Feb 19, 2014 7:23:47 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 590 ms
Feb 19, 2014 7:23:47 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Feb 19, 2014 7:23:47 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.36
Feb 19, 2014 7:23:47 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor oozie.xml
ERROR: Oozie could not be started
REASON: java.lang.NoClassDefFoundError: org/apache/hadoop/util/ReflectionUtils

Stacktrace:
-----------------------------------------------------------------
java.lang.NoClassDefFoundError: org/apache/hadoop/util/ReflectionUtils
        at org.apache.oozie.service.Services.setServiceInternal(Services.java)
        at org.apache.oozie.service.Services.<init>(Services.java)
        at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java)
        at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java)
        at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java)
        at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java)
        at org.apache.catalina.core.StandardService.start(StandardService.java)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java)
        at org.apache.catalina.startup.Catalina.start(Catalina.java)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java)
        at java.lang.reflect.Method.invoke(Method.java)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ReflectionUtils
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java)
        at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java)
        ... 27 more

Some background on Oozie: Apache Oozie is an open source project based on Java™ technology that simplifies the process of creating workflows and managing coordination among jobs. In principle, Oozie offers the ability to combine multiple jobs sequentially into one logical unit of work. One advantage of the Oozie framework is that it is fully integrated with the Apache Hadoop stack and supports Hadoop jobs for Apache MapReduce, Pig, Hive, and Sqoop. In addition, it can be used to schedule jobs specific to a system, such as Java programs. Therefore, using Oozie, Hadoop administrators are able to build complex data transformations that can combine the processing of different individual tasks and even sub-workflows. This ability allows for greater control over complex jobs and makes it easier to repeat those jobs at predetermined periods.

In practice, there are different types of Oozie jobs:
- Oozie Workflow jobs: represented as directed acyclical graphs to specify a sequence of actions to be executed.
- Oozie Coordinator jobs: represent Oozie workflow jobs triggered by time and data availability.
- Oozie Bundle: facilitates packaging multiple coordinator and workflow jobs, and makes it easier to manage the life cycle of those jobs.

Useful links:
1. http://www.ibm.com/developerworks/library/bd-ooziehadoop/
2. http://oozie.apache.org/docs/3.2.0-incubating/WorkflowFunctionalSpec.html
3. http://archive.cloudera.com/cdh4/cdh/4/oozie/WebServicesAPI.html#Java_API_Example
4. http://archive.cloudera.com/cdh/3/oozie/DG_Examples.html
5. http://blog.cloudera.com/blog/2013/06/how-to-use-the-apache-oozie-rest-api/
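The web services API linked above can also be exercised without the client jar. As a rough illustration of the REST endpoints (the host, port and paging values below are assumptions, not taken from these notes), a plain HttpURLConnection is enough to list workflow jobs as JSON:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class OozieRestJobsList {
    public static void main(String[] args) throws Exception {
        // GET /oozie/v1/jobs returns the workflow job list; offset/len control paging.
        URL url = new URL("http://localhost:11000/oozie/v1/jobs?offset=1&len=5");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // Print the raw JSON response describing the jobs.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        conn.disconnect();
    }
}

The OozieClient used elsewhere in these notes is itself a wrapper around this same REST API.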

Run Oozie job from java code:

import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class RunOozieJob {

    public static void main(String[] args) {
        OozieClient wc = new OozieClient("http://host:11000/oozie");

        // job properties passed to the workflow
        Properties conf = wc.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, "hdfs://cluster/user/apps/merge-psplogs/merge-wf/workflow.xml");
        conf.setProperty("jobTracker", "jobtracker.bigdata.com:8021");
        conf.setProperty("nameNode", "hdfs://namenode.bigdata.com:8020");
        conf.setProperty("queueName", "default");
        // workflow-specific parameters; the paths below are examples under the workspace directory
        conf.setProperty("appsRoot", "hdfs://namenode.bigdata.com:8020/user/workspace/apps");
        conf.setProperty("appLibLoc", "hdfs://namenode.bigdata.com:8020/user/workspace/lib");
        conf.setProperty("rawlogsLoc", "hdfs://namenode.bigdata.com:8020/user/workspace/rawlogs");
        conf.setProperty("mergedlogsLoc", "hdfs://namenode.bigdata.com:8020/user/workspace/mergedlogs");

        try {
            String jobId = wc.run(conf);
            System.out.println("Workflow job submitted");

            // poll until the workflow leaves the RUNNING state
            while (wc.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                System.out.println("Workflow job running ...");
                Thread.sleep(10 * 1000);
            }
            System.out.println("Workflow job completed ...");
            System.out.println(wc.getJobInfo(jobId));
        } catch (Exception r) {
            System.out.println("Errors");
        }
    }
}

Workflow workspace path used from Hue: /user/hue/oozie/workspaces/_hdfs_-oozie-26-1393496375.26

Spring MVC snippets (controller request parameters and JSON conversion beans):

@RequestParam(value = "offset", required = false) Integer offset,
@RequestParam(value = "limit", required = false) Integer limit,
@RequestParam(value = "filter", required = false) String filter

<bean id="jacksonObjectMapper" class="org.codehaus.jackson.map.ObjectMapper" />

<bean id="mappingJacksonHttpMessageConverter"
      class="org.springframework.http.converter.json.MappingJacksonHttpMessageConverter">
    <property name="objectMapper" ref="jacksonObjectMapper" />
</bean>

<bean id="annotationMethodHandlerExceptionResolver"
      class="org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerExceptionResolver">
    <property name="messageConverters">
        <array>
            <ref bean="mappingJacksonHttpMessageConverter" />
        </array>
    </property>
</bean>

Maven plugins for the web application (servlet-api and jsp-api are excluded from the war because the container already provides them):

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>tomcat-maven-plugin</artifactId>
    <configuration>
        <server>tomcat</server>
        <url>http://localhost:8080/manager/html</url>
    </configuration>
</plugin>
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-war-plugin</artifactId>
    <version>2.1.1</version>
    <configuration>
        <packagingExcludes>WEB-INF/lib/servlet-api*.jar,WEB-INF/lib/jsp-api*.jar</packagingExcludes>
    </configuration>
</plugin>
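Tying the Spring fragments above together: a controller method can accept the offset/limit/filter request parameters and delegate to OozieClient.getJobsInfo, with the Jackson message converter rendering the result as JSON. The sketch below is illustrative only; the class name, request mapping, default paging values and the String type of the filter parameter are assumptions, not taken from the original project.

import java.util.List;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;
import org.apache.oozie.client.WorkflowJob;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class OozieJobsController {

    // Hypothetical endpoint listing Oozie workflow jobs as JSON.
    @RequestMapping("/jobs")
    @ResponseBody
    public List<WorkflowJob> listJobs(
            @RequestParam(value = "offset", required = false) Integer offset,
            @RequestParam(value = "limit", required = false) Integer limit,
            @RequestParam(value = "filter", required = false) String filter)
            throws OozieClientException {
        OozieClient client = new OozieClient("http://host:11000/oozie");
        int start = (offset == null) ? 1 : offset;   // Oozie job offsets are 1-based
        int len = (limit == null) ? 50 : limit;      // assumed default page size
        String jobsFilter = (filter == null) ? "" : filter;
        return client.getJobsInfo(jobsFilter, start, len);
    }
}

With this wiring the mappingJacksonHttpMessageConverter bean serializes the returned job list, and the maven-war-plugin packagingExcludes keeps the container-provided servlet-api and jsp-api jars out of the war.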