
CE 441 Big Data Analytics 18CE004

PRACTICAL-1

Aim: Hadoop configuration and single node cluster setup, and performing file management tasks in
Hadoop (creating a directory, listing the contents of a directory, uploading and downloading files in HDFS).

Theory:
There are two ways to install Hadoop, i.e. Single node and Multi node.
 Single node cluster means only one DataNode running and setting up all the NameNode,
DataNode, ResourceManager and NodeManager on a single machine. This is used for
studying and testing purposes. For example, let us consider a sample data set inside a
healthcare industry. So, for testing whether the Oozie jobs have scheduled all the
processes like collecting, aggregating, storing and processing the data in a proper
sequence, we use a single node cluster. It can easily and efficiently test the sequential
workflow in a smaller environment as compared to large environments which contain
terabytes of data distributed across hundreds of machines.
 While in a Multi node cluster, there are more than one DataNode running and each
DataNode is running on different machines. The multi node cluster is practically used in
organizations for analyzing Big Data. Considering the above example, in real time when
we deal with petabytes of data, it needs to be distributed across hundreds of machines to
be processed. Thus, here we use multi node cluster.

Implementation:
o I will show you how to install Hadoop on a single node cluster.
o Prerequisites
 VIRTUAL BOX: it is used for installing the operating system on it.
 OPERATING SYSTEM: You can install Hadoop on Linux based operating
systems. Ubuntu and CentOS are very commonly used. In this tutorial, we are using
CentOS.
 JAVA: You need to install the Java 8 package on your system.
 HADOOP: You require the Hadoop 2.10.0 package.

o Install Hadoop
o Step 1: Download the Java 8 package and save the file in your home directory.
o Step 2: Extract the Java Tar File.
Command: tar -xvf jdk-8u101-linux-i586.tar.gz

o Step 3: Download the Hadoop 2.10.0 package.


Command: wget https://archive.apache.org/dist/hadoop/core/hadoop-2.10.0/hadoop-2.10.0.tar.gz

o Step 4: Extract the Hadoop tar file.
Command: tar -xvf hadoop-2.10.0.tar.gz
o Step 5: Add the new group and new user to it.
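For example (the group and user names below are only illustrative; they are not part of the original steps), a dedicated Hadoop group and user can be created on CentOS as follows:
Command: sudo groupadd hadoop
sudo useradd -g hadoop hduser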

o Step 6: Export path and check Hadoop version.


Command: export HADOOP_HOME=/home/mypc/Desktop/Hadoop
export PATH=$PATH:$HADOOP_HOME/bin
hadoop version

o Step 7: Edit the Hadoop Configuration files.


Command: cd Desktop/Hadoop/etc/hadoop
gedit hadoop-env.sh

 Add java path at the last
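For example, the line appended to hadoop-env.sh might look like the following (the exact JDK path depends on where the tar file was extracted in Step 2):
export JAVA_HOME=/home/mypc/jdk1.8.0_101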


o Step 8: Open core-site.xml and edit the property mentioned below inside
configuration tag: core-site.xml informs Hadoop daemon where NameNode runs
in the cluster. It contains configuration settings of Hadoop core such as I/O
settings that are common to HDFS & MapReduce.
Command: sudo gedit $HADOOP_HOME/etc/hadoop/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>Parent directory for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.</description>
</property>

o Step 9: Edit hdfs-site.xml and edit the property mentioned below inside
configuration tag: hdfs-site.xml contains configuration settings of HDFS
daemons (i.e. NameNode, DataNode, Secondary NameNode). It also includes
the replication factor and block size of HDFS. Command: sudo gedit
$HADOOP_HOME/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>


<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hduser_/hdfs</value>
</property>
</configuration>
o Step 10: Edit the mapred-site.xml file and edit the property mentioned below
inside configuration tag:
 mapred-site.xml contains configuration settings of the MapReduce application, like the
number of JVMs that can run in parallel, the size of the mapper and the reducer
process, CPU cores available for a process, etc.
 In some cases, mapred-site.xml file is not available. So, we have to create the
mapred-site.xml file using mapred-site.xml template.
 Command: cp mapred-site.xml.template mapred-site.xml
 Command: sudo gedit $HADOOP_HOME/etc/hadoop/mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>


<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>localhost:54311</value>
<description>MapReduce job tracker runs at this host and port.
</description>
</property>
</configuration>

o Step 11: Go to Hadoop home directory and format the NameNode.


Command: $HADOOP_HOME/bin/hdfs namenode -format

o Step 12: Once the NameNode is formatted, start all services of Hadoop.


Command: $HADOOP_HOME/sbin/start-dfs.sh
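Once the daemons are running (they can be listed with the jps command), the file management tasks mentioned in the aim can be carried out with commands such as the following (the directory and file names are only examples):
Command: $HADOOP_HOME/bin/hdfs dfs -mkdir /user
$HADOOP_HOME/bin/hdfs dfs -ls /
$HADOOP_HOME/bin/hdfs dfs -put sample.txt /user
$HADOOP_HOME/bin/hdfs dfs -get /user/sample.txt downloaded.txt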

Conclusion: In this practical, I understood the installation of Hadoop by setting up a single node Hadoop cluster.
PRACTICAL-2

Aim: Copy your data into the Hadoop Distributed File System (HDFS)

Implementation:

Steps (example commands are shown after the list):
 Download a text file of words
 Open a terminal shell
 Copy text file from local file system to HDFS
 Copy a file within HDFS
 Copy a file from HDFS to the local file system
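A minimal set of commands covering these steps might look like the following (the file name words.txt and the HDFS paths are only examples):
Command: hdfs dfs -put words.txt /user/hduser/words.txt
hdfs dfs -cp /user/hduser/words.txt /user/hduser/words_copy.txt
hdfs dfs -get /user/hduser/words_copy.txt words_copy.txt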

Conclusion: In this practical, we have learnt to copy a text file of words to HDFS.
PRACTICAL-3

Aim: To understand the overall programming architecture using the MapReduce API. (word count
using MapReduce, or weather report POC: a MapReduce program to analyse time-temperature
statistics and generate a report with max/min temperature)

Implementation:

1. Open Eclipse> File > New > Java Project >( Name it – MRProgramsDemo) > Finish.
2. Right Click > New > Package ( Name it - PackageDemo) > Finish.
3. Right Click on Package > New > Class (Name it - WordCount).
4. Add Following Reference Libraries:
o Right Click on Project > Build Path > Add External Archives:
• /usr/lib/hadoop-0.20/hadoop-core.jar
• /usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar
5. Type the following code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);
        Job j = new Job(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(",");
            for (String word : words) {
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}
o The above program consists of three classes:
 Driver class (the public static void main method; this is the entry point).
 The Map class which extends the public class
Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT> and implements the Map
function.
 The Reduce class which extends the public class
Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> and implements the
Reduce function.

6. Make a jar file.


• Right Click on Project> Export> Select export destination as
JAR File > next> Finish.
7. Take a text file and move it into HDFS:

8. To move this into Hadoop directly, open the terminal and enter the following
command: hadoop fs -put wordcountFile wordCountFile

9. Run the jar file:


hadoop jar MRProgramsDemo.jar PackageDemo.WordCount wordCountFile
MRDir1
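After the job finishes, the word counts can be viewed from the output directory (part-r-00000 is the usual name of the reducer output file):
hadoop fs -cat MRDir1/part-r-00000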

Output:

Conclusion: In this practical, I understood the basic concepts of the overall programming
architecture using Map Reduce API.

PRACTICAL-4

Aim: Configure Flume/Spark for streaming data analysis. (Analysis using Apache Spark and fetching live tweets from Twitter) OR (Word count / analysis from localhost data streaming)

Sentiment Analysis using Apache Spark and fetching live tweets from Twitter using flume-ng
 Go to the Twitter developer portal and create a new app.

 Create twitter.conf file inside /usr/lib/apache-flume-1.9.0-bin:


sudo gedit twitter.conf
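A typical twitter.conf for this setup looks roughly like the sketch below (the credential values and the HDFS path are placeholders to be replaced with your own; the agent name TwitterAgent matches the -n option used when running Flume):

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer secret>
TwitterAgent.sources.Twitter.accessToken = <access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access token secret>
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:54310/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100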

 To run flume:
bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

Create First WordCount project

 Open Eclipse and do File->New -> project ->Select Maven Project; see below.

 Enter Group id, Artifact id, and click finish.

 Edit pom.xml.

 Write your code or just copy the given WordCount code from
D:\spark\spark-1.6.1-bin-hadoop2.6\examples\src\main\java\org\apache\spark\examples

 Now, add the external jar from the location D:\spark\spark-1.6.1-bin-hadoop2.6\lib and set Java 8 for compilation; see below.

 Build the project (D:\hadoop\spWCexamples): write mvn package on cmd.

 Execute the project: go to the following location on cmd: D:\spark\spark-1.6.1-bin-hadoop2.6\bin

Write the following command


spark-submit --class sparkWCexamples.spWCexamples.WORDCOUNT --master
local /D:/Hadoop/spWCexamples/target/spWCexamples-1.0-SNAPSHOT.jar
D:/Hadoop/spWCexamples/how.txt D:/Hadoop/spWCexamples/answer.txt

 You can also check the progress of the project at http://localhost:4040/jobs/. Finally, get the answers; see below.

Conclusion: From this practical, we studied how to configure Flume for fetching live tweets and how to create a Word Count project in Spark.

PRACTICAL-5

Aim: Implementation of HDFS & MapReduce using Talend.

Implementation:

 Download Talend from below link:


https://www.talend.com/download/thankyou/big-data-windows/

 Add the IP address of the Host from Cloudera. This can be verified using the "Cloudera Manager" in Cloudera.
Location: C:\Windows\System32\drivers\etc\; in that folder, open the "hosts" file.

Note: To open the cloud manager in Cloudera follow some steps:



Click on "Launch Cloudera Express" and you will get a message in the terminal:

 Run the command "sudo /home/cloudera/cloudera-manager --force --express", wait until it finishes, then go to the browser and open Cloudera Manager.
 After Cloudera Manager loads successfully, enter the username and password into the login section and then click on "Hosts" from the menu bar.
 Verify that the IP address in Cloudera Manager matches the one in the hosts file.

 Open “Talend” and start configuration of Hadoop Connection:


o As per the given steps in the document file:
1. Create a new project and give it an appropriate name.
2. Now, inside the Repository tab, expand the Metadata tab, right click
on Hadoop cluster and select create new cluster. Then follow the steps as
per the screenshot:

3. Here, enter the username in the User Name field and click on Check Services to
check whether a successful connection to the NameNode is made or not. Here you
may have to download the required server modules.

Note: Before that you have to start NamenNode Service in Cloudera follow some steps:
Enter below code into terminal:
“for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x start; done”
It takes some time to complete.
After that in browser from Hadoop bookmark click on HDFS NameNode to verify that
NameNode is working.
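Alternatively, a quick check from the terminal is possible with the jps command, which lists the running Hadoop daemons (NameNode should appear in the list):
sudo jps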

 After that, click on Finish, which will create the cluster under Hadoop Cluster.

4. Now we have to create a Job, which is under the Repository tab. Right click
on the Job Design tab, click Create job, give a name to that job and then click
Finish.

5. After that, go to the Palette tab and add "tHDFSConnection" and "tHDFSList" from it.

6. Now right click on "tHDFSConnection", and from the Trigger option select "OnSubjobOk" and connect it with "tHDFSList".

7. Now click on tHDFSConnection_1, go to the Component tab and add the configuration as seen here:

8. Now click on tHDFSList_1, go to the Component tab and add the configuration as seen here:

9. Now click on the browse button; if fetching is successful, you will get this window,
which you can verify against the repository under HDFS in Cloudera:

Now Reading file from HDFS

1. For that, you have to replace tHDFSList with tHDFSInput and add tLogRow.
2. Now connect tHDFSConnection with tHDFSInput using the OnSubjobOk trigger, and
connect tHDFSInput with tLogRow using the Main row.

3. Now configure tHDFSInput and tLogRow as seen in the screenshot and don't
change anything in tHDFSConnection.

4. Now click on Run. But it gives an error. The same kind of error occurred for writing the file.

Conclusion: From this practical, I studied how to implement HDFS and MapReduce operations using Talend.

PRACTICAL-6

Aim: To perform NoSQL database operations using MongoDB and perform CRUD operations on it.

Implementation:

1. db.student.insert({
regNo: "3014",
name: "Test Student",
course: {
courseName: "MCA",
duration: "3 Years" },
address: {
city: "Bangalore",
state: "KA",
country: "India" }
})

2. db.student.find().pretty()

3. db.student.update({"regNo": "3014"},{$set:{"name":"user"}})

4. db.student.remove({"regNo":"3014"})
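To confirm the removal, a count query such as the following (illustrative) should return 0:
db.student.find({"regNo": "3014"}).count()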

Conclusion: In this practical, I learnt the basic CRUD operations in MongoDB.



PRACTICAL-7

Aim: Retrieve various types of documents from MongoDB (file import/export operations)

Implementation:
Importing a JSON file:
 To import a JSON file: mongoimport --db restaurantd --collection restaurantc --file data.json

 To verify whether data has been imported or not:


a. show dbs
b. use restaurantd
c. db.restaurantc.count()

 To display data : db.restaurantc.findOne()



 db.restaurantc.find({name : "Glorious Food"}).forEach(printjson)

Import csv file:

 mongoimport --db OpenFlights --collection Airport --type csv --headerline --ignoreBlanks airports.csv

 To verify whether data has been imported or not:


a. show dbs
b. use OpenFlights
c. show tables

d. db.Airport.count()

 To display particular data : db.Airport.find({iso_region : "US-GA"})
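 To retrieve only selected fields, a projection can be added; for example (the field names here are assumptions based on the CSV columns used later for export):
db.Airport.find({iso_region : "US-GA"}, {name : 1, type : 1}).limit(5)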

Export csv file:

 mongoexport --db=OpenFlights --collection=Airport --type=csv --fields=id,type,name --out=G:\output.csv

Conclusion: In this practical, I learnt how to import data, how to use the imported data and how to export data in MongoDB.

Practical- 8

Aim: Configure Neo4j and create nodes (add/remove properties) & relationships in it.

Implementation:

1. Download Neo4J Community Edition from below link:

https://neo4j.com/download-center/

2. Extract the zip file and keep the extracted folder in the D:\ drive (for example).
3. Open command prompt and navigate to D:\neo4j
4. Run below commands to start neo4j service:

bin\neo4j install-service
bin\neo4j start

5. Open the URL http://localhost:7474/browser/ in your web browser.


6. Initial username: neo4j and password: neo4j.
7. Once submitted, a prompt for changing the password will be shown, wherein you can
change the database password.
8. The homepage will be shown, in which the topmost text box starting with $ is for
writing queries.
9. Create a node "Brad" with label Person as under:
CREATE (n: Person {name:"Brad"}) RETURN n
Similarly, create the rest of the 4 nodes:

CREATE (n: Person {name:"Alice"}) RETURN n
CREATE (n: Person {name:"Mike"}) RETURN n
CREATE (n: Person {name:"Jill"}) RETURN n
CREATE (n: Person {name:"Hazel"}) RETURN n

10. To view all nodes use:
MATCH (n) RETURN n

11. Create a node "Avengers" with label Movie as under:
CREATE (n: Movie {title:"Avengers"}) RETURN n
Similarly, create the rest of the 2 nodes:
CREATE (n: Movie {title:"Skyfall"}) RETURN n
CREATE (n: Movie {title:"Inception"}) RETURN n

12. To view nodes of specific label: MATCH (n: Movie) RETURN n

13. To set other properties:


MATCH (n: Person {name:"Brad"}) SET n.age=34
14. To remove the property:
MATCH (n: Person {name:"Brad"}) REMOVE n.age

15. To add a relationship to nodes:
MATCH (a: Person {name:"Brad"}), (b: Person {name:"Alice"}) MERGE (a)-[r:FRIENDS]->(b)
16. To add properties in relationship:

MATCH (a: Person {name:"Hazel"}), (b: Person {name:"Jill"}) MERGE (a) -[r: FRIENDS
{since:"1998"}]-> (b)

17. Adding a relationship between two different labels:
MATCH (a: Person {name:"Jill"}), (b: Movie {title:"Avengers"}) MERGE (a)-[r:FAVOURITE]->(b)
18. Add some more relationships between nodes.

19. To fetch or query the database (in this case, to fetch the node with name Jill):
MATCH (n: Person) WHERE n.name="Jill" RETURN n

20. To find friends of BRAD:


MATCH (a: Person {name:"Brad"}) - [: FRIENDS] -> (b: Person) RETURN a, b
21. To delete a relationship and a node:
MATCH (a: Person {name:"Jill"})-[r]-(b) DELETE r
MATCH (a: Person {name:"Jill"}) DELETE a


22. To delete the entire database:
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r


Creating node Relationships in Neo4j

Syntax to create a relationship using the CREATE clause:

CREATE (node1)-[:RelationshipType]->(node2)
CREATE (Dhawan:player {name: "Shikar Dhawan", YOB: 1985, POB: "Delhi"})
CREATE (Ind:Country {name: "India"})
CREATE (Dhawan)-[r:BATSMAN_OF]->(Ind)
RETURN Dhawan, Ind

Syntax:

MATCH (a:LabelOfNode1), (b:LabelOfNode2)
WHERE a.name = "nameofnode1" AND b.name = "nameofnode2"
CREATE (a)-[:Relation]->(b) RETURN a, b

MATCH (a:player), (b:Country) WHERE a.name = "Shikar Dhawan" AND b.name = "India"
CREATE (a)-[r: BATSMAN_OF]->(b)

RETURN a,b

MATCH (a:player), (b:Country) WHERE a.name = "Shikar Dhawan" AND b.name = "India"
CREATE (a)-[r:BATSMAN_OF {Matches:5, Avg:90.75}]->(b)

RETURN a,b
Conclusion: After performing this practical, I understood how to configure Neo4j and create nodes and relationships.

PRACTICAL-9

Aim: Import files into Neo4j and build a complete relational graph from them.

Implementation:

 First, we will load the CSV files into Neo4j.

 Start the database and open the Neo4j browser.

1. To read data from a CSV file:

LOAD CSV FROM 'file:///orders.csv' AS row RETURN row;

2. To count rows from the data:

 LOAD CSV FROM 'file:///products.csv' AS row RETURN count(row);

OR

 LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row RETURN count(row);



3. LOAD CSV WITH HEADERS FROM 'file:///order-details.csv' AS row RETURN count(row);

Here, the data is in string format by default. Convert it to the appropriate format using:


 toInteger(): converts a value to an integer.
 toFloat(): converts a value to a float (in this case, for monetary amounts).
 datetime(): converts a value to a datetime.

4. LOAD CSV FROM 'file:///products.csv' AS row
WITH toInteger(row[0]) AS productId, row[1] AS productName, toFloat(row[2]) AS unitCost
RETURN productId, productName, unitCost

OR
LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row WITH toInteger(row.orderID) AS
orderId, datetime(replace(row.orderDate,' ','T')) AS orderDate, row.shipCountry AS country
RETURN orderId, orderDate, country

5. Now create the nodes:

 LOAD CSV FROM 'file:///products.csv' AS row
WITH toInteger(row[0]) AS productId, row[1] AS productName, toFloat(row[2]) AS unitCost
MERGE (p:Product {productId: productId})
SET p.productName = productName, p.unitCost = unitCost
RETURN p LIMIT 20

6. Relationship:
 LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
WITH toInteger(row.orderID) AS orderId, row.shipCountry AS country
MERGE (o:Order {orderId: orderId})
CREATE (a:orderID {id: orderId})
CREATE (b:country {cname: country})
CREATE (a)-[:MADE_IN]->(b)
RETURN a, b

7. To create a unique field:

 CREATE CONSTRAINT ON (c:country) ASSERT c.name IS UNIQUE;

// the above query is used only one time

 LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
// WITH toInteger(row.orderID) AS orderId, row.shipCountry AS country
CREATE (orderId:orderId {id: toInteger(row.orderID)})
MERGE (country:country {name: row.shipCountry})
CREATE (country)-[r:try]->(orderId)
RETURN country, orderId
// CREATE (a:orderID {id: orderId})

Conclusion: In this practical, I learnt how to import files into Neo4j and build a complete relational graph from them.

PRACTICAL-10

Aim: Configure apache spark and perform word count operation on it.

Implementation:

1) Installing Scala
 Download Scala from below link:
https://www.scala-lang.org/download/

 Set environmental variables:


i. User variable:
 Variable: SCALA_HOME;
 Value: C:\Program Files (x86)\scala

ii. System variable:


 Variable: PATH
 Value: C:\Program Files (x86)\scala\bin

2) Installing Java8
 Download Java from below link:
https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

 Set environmental variables:


iii. User variable:
 Variable: JAVA_HOME;
 Value: C:\Program Files\Java\jre1.8.0_251

iv. System variable:


 Variable: PATH
 Value: C:\Program Files\Java\jre1.8.0_251\bin

3) Install Eclipse
 Download Eclipse from below link:
https://www.eclipse.org/downloads/

 Set environmental variables:


v. User variable:
 Variable: ECLIPSE_HOME;
 Value: C:\eclipse

vi. System variable:


 Variable: PATH
 Value: C:\eclipse\bin

4) Install Spark 1.6.1


 Download Spark from below link:
http://spark.apache.org/downloads.html

 Set environmental variables:


vii. User variable:
 Variable: SPARK_HOME;
 Value: D:\Spark\spark-3.0.1-bin-hadoop2.7

viii. System variable:


 Variable: PATH
 Value: D:\Spark\spark-3.0.1-bin-hadoop2.7\bin

5) Download Windows Utilities


 Download from below link:
https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin
And paste it in D:\spark\spark-1.6.1-bin-hadoop2.6\bin

6) Execute Spark
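For example, the installation can be verified by opening cmd in the Spark bin folder and starting the interactive shell (if winutils-related errors appear, setting a HADOOP_HOME variable pointing to the folder that contains bin\winutils.exe is a common workaround, not part of the original steps):
spark-shell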

7) Install Maven 3.3


 Download Apache Maven 3.3.9 from the below link:
https://maven.apache.org/download.cgi
and extract it into the D drive, such as D:\apache-maven-3.3.9

 Set environmental variables:


ix. User variable:
 Variable: MAVEN_HOME;
 Value: D:\apache-maven-3.3.9

x. System variable:
 Variable: PATH
 Value: D:\apache-maven-3.3.9\bin

8) Create First WordCount project


 Open Eclipse and do File->New -> project ->Select Maven Project; see below.

 Enter Group id, Artifact id, and click finish.

 Edit pom.xml.
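A minimal pom.xml for this project typically only needs the Spark core dependency; the artifact and version shown below are assumptions matching Spark 1.6.1 built for Scala 2.10:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>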

 Write your code or just copy the given WordCount code from
D:\spark\spark-1.6.1-bin-hadoop2.6\examples\src\main\java\org\apache\spark\examples

 Now, add the external jar from the location D:\spark\spark-1.6.1-bin-hadoop2.6\lib and set Java 8 for compilation; see below.

 Build the project (D:\hadoop\spWCexamples): write mvn package on cmd.

 Execute the project: go to the following location on cmd: D:\spark\spark-1.6.1-bin-hadoop2.6\bin

Write the following command


spark-submit --class sparkWCexamples.spWCexamples.WORDCOUNT --master
local /D:/Hadoop/spWCexamples/target/spWCexamples-1.0-SNAPSHOT.jar
D:/Hadoop/spWCexamples/how.txt D:/Hadoop/spWCexamples/answer.txt

 You can also check the progress of the project at http://localhost:4040/jobs/. Finally, get the answers; see below.

Conclusion: From this practical, we studied how to install Spark and created a Word Count project.
