S3Lab - Smart Software System Laboratory

Big Data
“Without big data, you are blind and deaf
and in the middle of a freeway.”
– Geoffrey Moore
Hadoop Ecosystem
Apache Flume Tutorial
Introduction to Apache Flume
Data transfer components
Flume - How it works
● Agent nodes are typically installed on the machines that generate the logs and are data’s
initial point of contact with Flume. They forward data to the next tier of collector nodes,
which aggregate the separate data flows and forward them to the final storage tier.
Data transfer components
Flume - Agent architecture
● Sources:
○ HTTP, Syslog, JMS, Kafka, Avro, Twitter (streaming API for downloading tweets), …
● Sinks:
○ HDFS, Hive, HBase, Kafka, Solr, …
● Channels:
○ File, JDBC, Kafka, …
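To make the wiring concrete, here is a minimal sketch of an agent configuration in Flume's properties format. The agent name tier1 matches the one used later in these slides; the netcat source, memory channel, and logger sink are assumptions chosen for illustration:

# name the components of agent "tier1"
tier1.sources = source1
tier1.channels = channel1
tier1.sinks = sink1
# netcat source: listens on a TCP port and turns each received line into an event
tier1.sources.source1.type = netcat
tier1.sources.source1.bind = 127.0.0.1
tier1.sources.source1.port = 9999
tier1.sources.source1.channels = channel1
# in-memory channel buffering events between source and sink
tier1.channels.channel1.type = memory
# logger sink: writes events to the Flume log (useful for testing)
tier1.sinks.sink1.type = logger
tier1.sinks.sink1.channel = channel1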
Start Flume on Cloudera Quickstart VM
● To add Flume to the Cloudera Quickstart VM, first launch Cloudera Manager
● Configure the VM:
○ Allocate 2 CPUs
● Select Flume
● Start Hue
● Start Flume
Use Telnet to test the default Flume implementation
● First, install telnet
● Command: sudo yum install telnet
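Assuming the VM's default Flume configuration uses a netcat-style source listening on port 9999, as sketched earlier (check the Configuration File in Cloudera Manager for the actual port), the test looks like:

Command: telnet localhost 9999
(type any text and press Enter; the netcat source replies OK and the event appears in the Flume log)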
Writing from Flume to HDFS
Create the /flume/events directory
● In the VM web browser, open Hue
● Click File Browser
● In the /user/cloudera directory, click New->Directory
● Create a directory named flume
● If you get an error when creating the new directory: (error screenshot on the original slide)
● In the flume directory, create a directory named events
● Check the box to the left of the events directory, then click the Permissions setting
● Enable Write access for Group and Other users
● Then click Submit
Change the Flume configuration
● Open Cloudera Manager -> click Flume -> click the Configuration tab
● Scroll or search for the Configuration File item
● Append the following lines to the Configuration File setting
● Click Save Changes

tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = hdfs://localhost:8020/user/cloudera/flume/events
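Optionally, the sink's file-rolling behavior can be tuned with standard Flume HDFS-sink properties; the values below are only illustrative:

# roll to a new file every 30 seconds; disable size- and count-based rolling
tier1.sinks.sink1.hdfs.rollInterval = 30
tier1.sinks.sink1.hdfs.rollSize = 0
tier1.sinks.sink1.hdfs.rollCount = 0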
● Restart Flume
● Start YARN
Writing to HDFS
● In a terminal window, launch Telnet with the command telnet localhost 9999
● At the prompt, enter some text
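To verify the result, list and print what Flume wrote (by default the HDFS sink names files with the FlumeData prefix):

Command: hdfs dfs -ls /user/cloudera/flume/events
Command: hdfs dfs -cat /user/cloudera/flume/events/FlumeData.*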
Apache Sqoop Tutorial
Sqoop (SQL-to-Hadoop)
Sqoop - How it works
● Sqoop launches MapReduce jobs that move data between a relational database (over JDBC) and HDFS in parallel.

Flume vs Sqoop
● Flume is designed for continuously collecting streaming event data such as logs; Sqoop is designed for bulk transfer between relational databases and Hadoop.
Sqoop connecting to a Database Server
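Every Sqoop tool takes the same JDBC connection parameters; the general shape, with the Quickstart VM's default MySQL credentials assumed throughout the rest of these slides:

Command: sqoop <tool> --connect jdbc:mysql://<host>/<database> --username <user> --password <password>
Example: sqoop list-tables --connect jdbc:mysql://localhost/mysql --username root --password cloudera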
Sqoop import RDBMS table into HDFS
● Log in to MySQL
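On the Quickstart VM the MySQL root password is cloudera (a VM default; adjust for your setup):

Command: mysql -u root -p
(enter the password cloudera at the prompt)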
● Create a table named student and insert some data into it
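The exact schema is on the original slide's screenshot; below is a minimal sketch consistent with the StudentInfo database and the std_id/std_name columns used in the free-form query later (the sample rows are invented):

mysql> CREATE DATABASE StudentInfo;
mysql> USE StudentInfo;
mysql> CREATE TABLE student (std_id INT PRIMARY KEY, std_name VARCHAR(50));
mysql> INSERT INTO student VALUES (101, 'Alice'), (102, 'Bob'), (103, 'Carol');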
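The import command itself appears only as a screenshot; a command of this shape, assuming the table above and a target directory matching the free-form example later, performs the import:

Command: sqoop import --connect jdbc:mysql://localhost/StudentInfo --username root --password cloudera --table student --target-dir /user/cloudera/studentInfo -m 1
Check the result with: hdfs dfs -cat /user/cloudera/studentInfo/part-m-*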
Sqoop import RDBMS table into HDFS without target directory
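When --target-dir is omitted, Sqoop writes to /user/<current user>/<table name> in HDFS; for example:

Command: sqoop import --connect jdbc:mysql://localhost/StudentInfo --username root --password cloudera --table student -m 1
(the data lands in /user/cloudera/student)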
Sqoop – IMPORT Command with Where Clause
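The --where option pushes a filter into the generated query; the predicate and target directory below are just examples:

Command: sqoop import --connect jdbc:mysql://localhost/StudentInfo --username root --password cloudera --table student --where "std_id > 101" --target-dir /user/cloudera/studentInfo/filtered -m 1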
Free-form Query Imports
● Sqoop can also import the result set of an arbitrary SQL query.
● When importing a free-form query, you must specify --target-dir and --split-by, and the WHERE clause must include the token $CONDITIONS, which Sqoop replaces at runtime with the conditions that define each mapper's data split.
● Command: sqoop import --connect jdbc:mysql://localhost/StudentInfo --username root --password cloudera --query 'select std_name from student where std_id=103 and $CONDITIONS' --split-by std_name --target-dir /user/cloudera/studentInfo/studentName103
Sqoop - list all databases
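A sketch with the Quickstart defaults:

Command: sqoop list-databases --connect jdbc:mysql://localhost --username root --password cloudera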
Sqoop - list all tables
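For example, against the StudentInfo database created earlier:

Command: sqoop list-tables --connect jdbc:mysql://localhost/StudentInfo --username root --password cloudera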
Sqoop Imports All Tables
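import-all-tables imports every table of the database into HDFS (each table needs a primary key unless you force a single mapper); a sketch:

Command: sqoop import-all-tables --connect jdbc:mysql://localhost/StudentInfo --username root --password cloudera -m 1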
Sqoop Export data from HDFS to the RDBMS
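sqoop export reverses the flow: it reads files from an HDFS directory and inserts the rows into a table that must already exist in MySQL. A sketch using the directory imported earlier and a hypothetical target table:

mysql> CREATE TABLE student_export LIKE student;
Command: sqoop export --connect jdbc:mysql://localhost/StudentInfo --username root --password cloudera --table student_export --export-dir /user/cloudera/studentInfo -m 1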
● Show databases
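At the MySQL prompt:

mysql> show databases;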
● Run the import command (shown as a screenshot on the original slide)
● This command may take a while to complete: it launches MapReduce jobs that pull the data from our MySQL database and write it to HDFS, distributed across the cluster in Apache Parquet format. It also creates tables that represent the HDFS files in Impala / Apache Hive with a matching schema.
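The exact command is on the original slide's screenshot. The Cloudera Quickstart tutorial that this slide follows uses a command of roughly this shape (the database name, credentials, and options here are assumptions based on that tutorial, not taken from the slide):

Command: sqoop import-all-tables -m 1 --connect jdbc:mysql://quickstart:3306/retail_db --username retail_dba --password cloudera --compression-codec snappy --as-parquetfile --warehouse-dir /user/hive/warehouse --hive-import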
Assignment 3
Q&A
Writing from Flume to HDFS
● cd /usr/lib/flume-ng (the Flume installation directory on the Quickstart VM)