Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and
recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any
kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://e.huawei.com
Overview
This guide walks trainees through all the experiment tasks required by the HCIA-Big Data
course on the Huawei FusionInsight HD Big Data platform. It aims to help trainees master the
use of the Big Data components of the FusionInsight HD platform.
Content Description
This document contains eight experiments: FusionInsight client installation, HBase database practice,
HDFS file system practice, Loader data import and export practice, Flume data collection practice,
Kafka message subscription practice, Hive data warehouse practice, and cluster comprehensive
experiment.
Precautions
During an experiment, trainees must not delete files arbitrarily.
When naming a directory, topic, or file, a trainee must include the trainee’s account stuxx or userxx,
for example, stu06_data and user01_socker.
The trainer manages and allocates all the user names and passwords for logging in to the
environment. If you have any questions regarding user names or passwords, ask the trainer.
References
FusionInsight HD product documentation
Experiment Environment
Table 1-1 Experimental Hardware and Software
Server: 2288H V5 (excluding optical modules), 2 x 1500W AC power supply, H22H-05

Part No.   Model         Description                                                             Qty
02311HAP   N600S1210W2   HardDisk-600GB-SAS 12Gb/s-10K rpm-128MB-2.5 inch (2.5 inch bracket)     2
02311FMR   N1800S10W2    HardDisk-1800GB-SAS 12Gb/s-10K rpm-128MB-2.5 inch (2.5 inch bracket)    4
02311SMF   BC1M05ESMLB   SR530C-M 1G (LSI3108) SAS/SATA RAID card, RAID 0/1/5/6/10/50/60,
                         1GB cache, supports supercapacitor and out-of-band management           1
02311TWR   BC1M31RISE    RISER1 module, 3*x8 (x16 slot)                                          1
Other hardware: memory and remaining components (specifications not recoverable from the source)

Disk partition plan for each Management/Control/Data node:
/             10G
/tmp          10G
/var          10G
/var/log      ≥200G
/srv/BigData  ≥60G
/opt          ≥300G
Experiment Topology
Three server nodes are used.
1.1 Background
The FusionInsight HD client is the interface for the communication between users and the cluster as
well as the foundation of subsequent experiments. After a client is installed, it requires security
authentication to communicate with the cluster if the cluster is deployed in secure mode.
1.2 Objective
⚫ To understand how to download and install a client.
Copy the FusionInsight HD client to the home directory of userXX (for example, user01). The client
files are saved in the /FusionInsight-Client directory of each cluster node.
> cd /FusionInsight-Client
> cp FusionInsight_Cluster_1_Services_ClientConfig.tar /home/userXX
> cd /home/userXX
> tar -xvf FusionInsight_Cluster_1_Services_ClientConfig.tar
> cd /home/userXX/FusionInsight_Cluster_1_Services_ClientConfig/
> ./install.sh /home/userXX/hadoopclient
> cd /home/userXX/hadoopclient
> source bigdata_env
> kinit stuXX
Note: The initial password is Huawei@123 (or consult the trainer). If the system prompts you to
change the password during the first authentication, change the password to Huawei12#$.
----End
1.4 Summary
This experiment demonstrates how to install a FusionInsight HD client. During the installation, you
need to decompress the client software twice. Note that no file or folder exists in the directory
where the client is installed. Otherwise, the installation fails.
2.1 Background
HDFS is a distributed file system on the Hadoop Big Data platform and provides data storage for
upper-layer applications and other Big Data components, such as Hive, MapReduce, Spark, and HBase.
On the HDFS shell client, you can operate and manage the distributed file system. Using HDFS helps
us better understand and master Big Data.
2.2 Objectives
⚫ To have a good command of common HDFS operations
⚫ To master HDFS file system management operations
> cd /home/userXX/hadoopclient
> source bigdata_env
> kinit stuXX
-get: Downloads a file from HDFS to a local host, which is equivalent to copyToLocal.
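For example, a -get call might look like this (a sketch; the file and local path are assumptions based on this experiment's naming):
> hdfs dfs -get /user/app_stuXX/test01.txt /home/userXX/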
total 2881728
drwxr-xr-x 15 user01 hadoop 4096 Apr 4 10:58 1001_hadoopclient
-rw-r--r-- 1 user01 hadoop 63 Apr 4 16:30 appendtext.txt
drwxr-xr-x 2 user01 hadoop 4096 Apr 4 10:03 bin
-rw-r--r-- 1 user01 hadoop 0 Apr 4 15:28 hdfs
-rwxr-xr-x 1 user01 hadoop 2947983360 Apr 4 10:05 Service_Client.tar
-rw-r--r-- 1 user01 hadoop 38 Apr 4 16:27 stu01.txt
-rw-r--r-- 1 user01 hadoop 38 Apr 4 17:54 test01.txt
> ll
total 2881716
drwxr-xr-x 15 user01 hadoop 4096 Apr 4 10:58 1001_hadoopclient
drwxr-xr-x 2 user01 hadoop 4096 Apr 4 10:03 bin
-rw-r--r-- 1 user01 hadoop 0 Apr 4 15:28 abcd
-rwxr-xr-x 1 user01 hadoop 2947983360 Apr 4 10:05 Service_Client.tar
Execute the moveFromLocal command to move the abcd file to the /user/app_stuXX directory of
the HDFS.
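A typical command would be (a sketch; the paths follow this experiment's conventions):
> hdfs dfs -moveFromLocal /home/userXX/abcd /user/app_stuXX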
After the execution is complete, check that the file does not exist anymore in the home directory of
userXX.
> ll
total 2881716
drwxr-xr-x 15 stu01 hadoop 4096 Apr 4 10:58 1001_hadoopclient
drwxr-xr-x 2 stu01 hadoop 4096 Apr 4 10:03 bin
-rwxr-xr-x 1 stu01 hadoop 2947983360 Apr 4 10:05 Service_Client.tar
Found 3 items
-rw-rw-rw-+ 3 root hadoop 1.3 G 2020-07-13 16:21
/user/app_stu20/FusionInsight_Cluster_1_Services_Client.tar
-rw-rw-rw-+ 3 user01 hadoop 38 2020-07-13 16:45
/user/app_stu01/abcd.txt
-rw-rw-rw-+ 3 user01 hadoop 38 2020-07-13 16:29
/user/app_stu01/test01.txt
01,HDFS
02,Zookeeper
03,HBase
04,Hive
10,Spark
11,Storm
12,Kafka
13,Flink
14,ELK
15,FusionInsight HD
01,HDFS
02,Zookeeper
03,HBase
04,Hive
10,Spark
11,Storm
12,Kafka
13,Flink
14,ELK
15,FusionInsight HD
Found 3 items
-rw-rw-rw-+ 3 root hadoop 1352929792 2020-07-13 16:21
/user/app_stu01/FusionInsight_Cluster_1_Services_Client.tar
-rw-rw-rw-+ 3 user01 hadoop 38 2020-07-13 16:45
/user/app_stu01/abcd.txt
-rw-rw-rw-+ 3 user01 hadoop 101 2020-07-13 16:57
/user/app_stu01/test01.txt
Found 1 items
Found 2 items
-rw-rw-rw-+ 3 user01 hadoop 38 2020-07-13 16:45 /tmp/stu01/abcd.txt
-rw-rw-rw-+ 3 user01 hadoop 101 2020-07-13 17:40
/tmp/stu01/test01.txt
There are two files in the /user/app_stuXX directory, which are file01 and test01.txt.
001 FusionInsight HD
002 FusionInsight Miner
003 FusionInsight LibrA
004 FusionInsight Farmer
005 FusionInsight Manager
01,HDFS
02,Zookeeper
03,HBase
04,Hive
10,Spark
11,Storm
12,Kafka
13,Flink
14,ELK
15,FusionInsight HD
001 FusionInsight HD
002 FusionInsight Miner
003 FusionInsight LibrA
004 FusionInsight Farmer
005 FusionInsight Manager
01,HDFS
02,Zookeeper
03,HBase
04,Hive
10,Spark
11,Storm
12,Kafka
13,Flink
14,ELK
15,FusionInsight HD
213.1 M /user/hive
4.3 K /user/loader
493 /user/mapred
344 in the first column indicates the number of folders in the /user/ directory, and 494 in the second
column indicates the number of files in /user/. 3.2G indicates the disk space occupied by all the files
in /user/ (not counting replicas).
Then, use the -mv parameter to move the file to the specified directory. For details, see description
about -mv in this section.
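For illustration (a sketch; the source and destination paths are assumptions):
> hdfs dfs -mv /user/app_stuXX/test01.txt /tmp/stuXX/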
Step 1 On the FusionInsight Manager interface, click Tenant Management (в новом интерфейсе:
Tenant Resources -> Tenant Resources Management).
In the tenant list on the left, click tenant queue_stuXX whose HDFS storage directory needs to be
modified.
Path: Fill in the directory assigned to the tenant. If the path does not exist, the system automatically
creates it. (/user/app_stuXX/myquota)
creates the path. (/user/app_stuXX/myquota)
Quota: Fill in the upper limit of the total number of stored files and directories. Quota: 3
SpaceQuota: Fill in the storage space quota for creating the directory. SpaceQuota: 1000 MB
Run the following command to check whether the file has been uploaded:
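The exact command is elided in the source; judging from the output below, it would be a listing of the quota directory:
> hdfs dfs -ls /user/app_stuXX/myquota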
Found 1 items
-rw-rw-rw-+ 3 user01 hadoop 38 2020-07-16 20:09
/user/app_stu01/myquota/test01.txt
If the preceding information is displayed, the /myquota directory is created successfully and the
current user has the permission to upload files.
> cd /FusionInsight-Client/Flume
> ll -h
total 451M
-rwxr-xr-x 1 root root 451M Jul 23 13:22
FusionInsight_Cluster_1_Flume_Client.tar
It can be seen that the file fails to be uploaded when SpaceQuota is set to 1000 MB and the file size is
greater than 256 MB.
Run the following command to view the file list in the specified HDFS directory:
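The command is elided in the source; judging from the output below, it would again be:
> hdfs dfs -ls /user/app_stuXX/myquota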
Found 2 items
-rw-rw-rw-+ 3 user01 hadoop 1774 2020-07-23 18:35
/user/app_stu01/myquota/switchuser.py
-rw-rw-rw-+ 3 user01 hadoop 38 2020-07-23 18:31
/user/app_stu01/myquota/test01.txt
If the command output does not contain the install.ini file, the file fails to be uploaded.
----End
Step 2 Upload the data again and then view the file list in the directory:
Found 3 items
-rw-rw-rw-+ 3 user01 hadoop 472237056 2020-07-23 18:42
/user/app_stu01/myquota/FusionInsight_Cluster_1_Flume_Client.tar
-rw-rw-rw-+ 3 user01 hadoop 1774 2020-07-23 18:35
/user/app_stu01/myquota/switchuser.py
-rw-rw-rw-+ 3 user01 hadoop 38 2020-07-23 18:31
/user/app_stu01/myquota/test01.txt
The preceding command output indicates that the large file (about 450 MB) can be uploaded and multiple
(three) files can be uploaded after the configuration is modified.
----End
Step 1 Log in to FusionInsight Manager, choose Tenant > Tenant Management > queue_stuXX >
Resource (в новом интерфейсе Tenant Resources -> Tenant Resources Management ->
queue_stuXX > Resource).
Step 2 Click the cross icon (x) in the Operation column of the specified directory in HDFS Storage area
to delete the storage resource.
Step 3 In the Delete Directory dialog box that is displayed, select the check box and click OK.
----End
Step 1 Choose System > Backup Management (in the new interface: O&M -> Backup and Restoration).
Step 3 Select the check box next to NameNode and configure parameters of the NameNode metadata
backup task, including Task name, Path type, Maximum number of backup copies, and Instance
name, and click OK.
Step 4 Click the start icon in the Operation column to execute the metadata backup task.
Select the task from the list; in the Operation column, choose More and then Back Up Now.
Step 5 When the task progress is 100%, the task is complete and HDFS metadata is backed up
successfully.
----End
Step 1 Choose System > Backup Management (in the new interface: O&M -> Backup and Restoration ->
Backup Management).
Step 2 Click the button for viewing historical operations in the NameNodeBackup task (in the Operation
column, choose More -> View History).
Step 3 Check the data backup log and click View in the Details column.
Step 4 Find the path for saving the backup data file from the log file, as shown in the following figure.
(In the new interface: click View in the Backup Path column.) The file name is displayed,
for example:
/srv/BigData/LocalBackup/1/stu01_NameNodeBackup_20200723185056/NameNode_2
0200723185104/6.5.1_HDFS-hacluster-fsimage_20200723185211.tar.gz
Step 5 Copy the path and click Recovery Management to create a recovery task (O&M -> Backup and
Restoration -> Restoration Management).
Step 6 On the page that is displayed, click Create Recovery Task (the Create button).
Step 7 Configure parameters for the task, including Task name, Path type, Source path, and Instance
name. The source path indicates the file path obtained in step 4. After configuring all the
parameters, click OK.
Step 8 Click the start icon corresponding to the task to start data recovery.
CAUTION: Running the recovery task will fail, because the NameNode must be stopped for the task to
complete. Do NOT stop the NameNode!
2.4 Summary
This experiment describes common HDFS operations and HDFS management. After this experiment,
trainees should know how to perform common operations in HDFS.
3.1 Background
HBase is a highly reliable, high-performance, column-oriented, and scalable distributed storage
system. It is the most commonly used NoSQL database in the industry. The knowledge about how to
use HBase can deepen trainees' understanding of HBase and lay a solid foundation for
comprehensively using Big Data.
3.2 Objective
⚫ To have a good command of common HBase operations, region operations, and filter
usage.
> cd /home/userXX/hadoopclient
> source bigdata_env
> kinit stuXX
Password for stuXX@HADOOP.COM:
> hbase shell
……
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.0.2, rUnknown, Thu May 12 17:02:55 CST 2016
hbase(main):001:0>
The preceding information indicates that you have logged in to the HBase shell client.
Step 2 Run the list command to check the number of common tables in the system.
> list
TABLE
stu01_cga_info
socker
t1
3 row(s) in 0.2300 seconds
=> ["stu01_cga_info", "socker", "t1"]
The command output shows that there are three common tables in the system.
----End
----End
----End
Step 2 Query the information stored in the cell whose Rowkey is 123001 or 123002 and column name
is name.
ROW COLUMN+CELL
123001 column=info:name, timestamp=1523350443121, value=Kobe
123002 column=info:name, timestamp=1523351965188,
value=Victoria
2 row(s) in 0.0500 seconds
In addition to COLUMNS, the HBase scan command also supports LIMIT (limits the number of rows
returned), STARTROW (start rowkey; the scan locates the region based on STARTROW and then scans
forward), STOPROW (end rowkey), TIMERANGE (timestamp range), VERSIONS (number of versions),
and FILTER (filters rows by condition).
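A combined scan using several of these parameters might look like this (a sketch; the rowkeys and limit are illustrative):
> scan 'stuXX_cga_info', {COLUMNS => 'info:name', STARTROW => '123001', STOPROW => '123003', LIMIT => 2, VERSIONS => 1}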
----End
Step 2 Change the age information whose Rowkey is 123001 in the table.
Step 3 Query the age information whose Rowkey is 123001 in the table again.
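The put and get commands for these steps might look like this (a sketch; the new age value 35 is illustrative):
> put 'stuXX_cga_info','123001','info:age','35'
> get 'stuXX_cga_info','123001','info:age'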
Compare the results of step 1 and step 3. It can be seen that the age information has been updated.
----End
Step 2 Run the delete command to delete the data stored in the age column in 123001.
Step 3 Query the information whose Rowkey is 123001 in the table again.
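A sketch of the commands (the table name follows this experiment's conventions):
> delete 'stuXX_cga_info','123001','info:age'
> get 'stuXX_cga_info','123001'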
Compare the results of step 1 and step 3. It can be seen that the age information has been deleted.
----End
Step 1 Run the deleteall command to delete the entire row whose Rowkey is 123001 from the cga_info table.
Step 2 Query the information whose Rowkey is 123001 in the table again.
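A sketch of the commands:
> deleteall 'stuXX_cga_info','123001'
> get 'stuXX_cga_info','123001'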
No information whose Rowkey is 123001 can be found, indicating that the entire row has been deleted.
----End
Step 2 Run disable 'table name' first and then drop 'table name' to delete the table.
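For example (assuming the table created earlier in this section):
> disable 'stuXX_cga_info1'
> drop 'stuXX_cga_info1'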
> list
TABLE
cga_info
socker
t1
3 row(s) in 0.2300 seconds
=> ["cga_info", "socker", "t1"]
The result shows that the stuXX_cga_info1 table has been deleted.
----End
Example 4: Query the address information of all the people in the table and find out the people who
live in London.
A filter selects data based on the column family, column, version, and so on. Only four filtering methods
are demonstrated here. RPC query requests that carry filter criteria are distributed to each
RegionServer and evaluated there, which reduces network transmission pressure.
create 'table name', 'column family name', {NUMREGIONS => 4, SPLITALGO =>
'UniformSplit'}
> create 'stuXX_cga_info2','info',{NUMREGIONS=>4,SPLITALGO=>'UniformSplit'}
0 row(s) in 0.3720 seconds
=> Hbase::Table - cga_info2
Step 6 Query the region division result. The stuXX_cga_info2 table is divided into four regions. Name
contains the table name, StartKey (the first region does not have StartKey), timestamp, and
region ID.
----End
> create 'table name', 'column family name', SPLITS => ['first StartKey',
'second StartKey', 'third StartKey']
Example: Create a table named stuXX_cga_info3 and specify three StartKeys which are 10000,
20000, and 30000 respectively.
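Based on the syntax above, the command would be (a sketch):
> create 'stuXX_cga_info3','info',SPLITS => ['10000','20000','30000']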
The result shows that the stuXX_cga_info3 table is divided into four regions based on Start Keys
10000, 20000, and 30000.
----End
user01@fi01host01:~>
> cd /home/userXX
On the editing interface, press i, then enter 10000, 20000, and 30000, pressing Enter after each value.
Step 5 After entering all the information, press Esc and type :wq to finish editing.
Step 6 Go to the HBase shell again.
> cd /home/userXX/hadoopclient
> source bigdata_env
> kinit stuXX
Password for stuXX@HADOOP.COM:
> hbase shell
Step 7 Create a table named stuXX_cga_info4 and pre-divide it using the splitFile file created earlier.
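A sketch of the command, assuming splitFile.dat was created in the home directory as described above (the HBase shell supports the SPLITS_FILE option):
> create 'stuXX_cga_info4','info',SPLITS_FILE => '/home/userXX/splitFile.dat'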
The result shows that the stuXX_cga_info4 table is divided into four regions based on Start Keys
10000, 20000, and 30000 specified in splitFile.dat.
Note: For a table with regions pre-divided using start keys and end keys, the range of region rowkey
is [start_key, end_key).
----End
The preceding figure shows a serious load imbalance problem: the fi01host02 host is overloaded.
You can manually move hot regions to the fi01host01 host.
----End
Step 2 Check which regions are taken over by the fi01host02 host.
As shown in the preceding figure, the load is unbalanced due to the meta table. However, you are
not advised to move the meta table. In this experiment, move the stuXX_cga_info table.
On the web UI, you can see that the region has been moved to fi01host01.
----End
3.4 Summary
This experiment demonstrates how to create and delete an HBase table, how to add, delete, modify,
and query data, how to pre-divide regions, and how to manually achieve load balancing. Through the
experiment, trainees can master the methods of using HBase and deepen their understanding of
HBase.
4.1 Background
Hive is a data warehouse tool that plays an important role in data mining, data aggregation, and
statistical analysis. In particular, Hive plays an important role in telecom services. It can be used to
collect traffic, call fee, and tariff information of users, and establish users' consumption models to
help carriers better plan package content.
4.2 Objectives
⚫ To have a good command of common Hive operations
⚫ To master how to run HQL on Hue.
Syntax: reverse(string A)
Returned value: string
Note: Returns the reversed form of string A.
Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Returned value: string
Note: Returns the substring of A that starts at position start and has a length of len.
Syntax: trim(string A)
Returned value: string
Note: Removes the spaces on both sides of the string.
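A quick illustration of these three functions in beeline (a sketch; the literals are arbitrary):
> select reverse('abc'), substr('FusionInsight', 1, 6), trim('  hive  ');
The query returns cba, Fusion, and hive respectively.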
Time functions
In the preceding statement, row format delimited fields terminated by ',' indicates that the field
delimiter is a comma (,). If this clause is not set, the default delimiter is used. A Hive HQL statement
ends with a semicolon (;).
View the cga_info1 table.
> create external table cga_info2 (name string,gender string,timest int) row
format delimited fields terminated by ',' stored as textfile;
No rows affected (0.343 seconds)
+------------+
| tab_name |
+------------+
| cga_info2 |
+------------+
1 row selected (0.078 seconds)
> cd /home/userXX
> touch 'cga111.dat'
Step 2 Run the vim command to edit the cga111.dat file. Enter several lines of data in the sequence of
name, gender, and time, separating the fields with commas (,) and pressing Enter to start each
new line. After the input is complete, press Esc and type :wq to save the modification and exit to
the Linux shell.
Xiaozhao,female,20
Xiaoqian,male,21
Xiaosun,male,25
Xiaoli,female,40
Xiaozhou,male,33
> beeline
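The load and query that produce the following output are elided in the source; they would typically be (a sketch, using the load syntax shown later in this section):
> use stuXX_db;
> load data local inpath '/home/userXX/cga111.dat' into table cga_info3;
> select * from cga_info3;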
+-----------------+-------------------+-----------------+
| cga_info3.name | cga_info3.gender | cga_info3.time |
+-----------------+-------------------+-----------------+
| xiaozhao | female | 20 |
| xiaoqian | male | 21 |
| xiaosun | male | 25 |
| xiaoli | female | 40 |
| xiaozhou | male | 33 |
+-----------------+-------------------+-----------------+
5 rows selected (0.287 seconds)
The result shows that the content in the local file cga111.dat has been loaded to the Hive table
cga_info3.
----End
Step 2 Upload the local file cga111.dat in the tmp folder to the /cga/cg directory of the HDFS.
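A sketch of the upload command (the HDFS path matches the location used for cga_info5 below):
> hdfs dfs -put /tmp/cga111.dat /user/app_stuXX/cga/cg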
> beeline
> use stuXX_db;
Note: Slightly different commands are used to load local data and HDFS data.
Loading a local file: load data local inpath 'local_inpath' into table hive_table;
Loading an HDFS file: load data inpath 'HDFS_inpath' into table hive_table.
The result shows that the content of the cga111.dat file in the HDFS has been loaded to the Hive
table cga_info4.
----End
> create external table cga_info5 (name string,gender string,timest int) row
format delimited fields terminated by ',' stored as textfile location
'/user/app_stuXX/cga/cg';
No rows affected (0.317 seconds)
It can be seen that the cga_info5 table has been created successfully with cga111.dat data in the
HDFS loaded.
When data is loaded into an external table, the source files are not deleted. When new files are added
to the table directory, the corresponding records are automatically added to the table.
-----------------
CONCLUSIONS:
1. When data is loaded into a regular (managed) table, the data file is removed from its original HDFS location.
The output shows that the empty table has been copied successfully.
----End
4.3.3 Querying
4.3.3.1 Fuzzy Query of Tables
Query tables whose names start with cga.
| cga_info6 |
+--------------------+
7 rows selected (0.072 seconds)
+-----------------+-------------------+-------------------+
| cga_info3.name | cga_info3.gender | cga_info3.timest |
+-----------------+-------------------+-------------------+
| xiaozhao | female | 20 |
| xiaoqian | male | 21 |
+-----------------+-------------------+-------------------+
2 rows selected (0.295 seconds)
Example 2: Use where to query the information about all women in the cga_info3 table.
Example 3: Use order to query the information about all women in cga_info3 by time in descending
order.
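The queries for Examples 2 and 3 might look like this (a sketch; the column names follow the table definition):
> select * from cga_info3 where gender='female';
> select * from cga_info3 where gender='female' order by timest desc;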
Although the data about xiaozhao was entered first, its record ranks second in the output because the
query result is sorted in descending order of time.
+-----------+--------------+
| name | all_time |
+-----------+--------------+
| xiaoli | 40 |
| xiaozhou | 33 |
+-----------+--------------+
2 rows selected (24.683 seconds)
Example 2: Group the cga_info3 table by gender and find the greatest time value in each group.
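A sketch of the query (the _c1 column in the output is the unaliased max aggregate):
> select gender, max(timest) from cga_info3 group by gender;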
+---------+---------+
| gender | _c1 |
+---------+---------+
| female | 40 |
| male | 33 |
+---------+---------+
2 rows selected (24.35 seconds)
Example 3: Check the numbers of women and men respectively in the cga_info3 table.
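A sketch of the query:
> select gender, count(*) from cga_info3 group by gender;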
Example 4: Insert the women's records from the cga_info7 table into the cga_info3 table.
> cd /home/userXX
> touch cga222.dat
> vim cga222.dat
xiaozhao,female,20
xiaochen,female,28
> beeline
> use stuXX_db;
Step 4 Load women information in the cga_info7 table to the cga_info3 table.
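The insert statement might look like this (a sketch):
> insert into table cga_info3 select * from cga_info7 where gender='female';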
The output shows that the two women's records from the cga_info7 table have been added to the
cga_info3 table.
Example 5: Query the sum of time values of the people in the cga_info3 table based on the name
and gender.
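A sketch of the query (the output column names suggest the sum is aliased as timest):
> select name, gender, sum(timest) as timest from cga_info3 group by name, gender;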
+-----------+---------+----------+
| name | gender | timest |
+-----------+---------+----------+
| xiaochen | female | 28 |
| xiaoli | female | 40 |
| xiaoqian | male | 21 |
| xiaosun | male | 25 |
| xiaozhao | female | 40 |
| xiaozhou | male | 33 |
+-----------+---------+----------+
6 rows selected (23.554 seconds)
The output shows that the two xiaozhao records in the cga_info3 table have been merged, with their time values summed.
Example 6: Sum the time values of the people in the cga_info3 table, then rank the records by time in
descending order within each gender.
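A sketch of the query; the b alias and rank column in the output suggest a window function over a grouped subquery:
> select b.name, b.gender, b.timest, rank() over (partition by b.gender order by b.timest desc) as rank from (select name, gender, sum(timest) as timest from cga_info3 group by name, gender) b;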
+-----------+-----------+----------+--------+
| b.name | b.gender | b.timest | rank |
+-----------+-----------+----------+--------+
| xiaozhao | female | 40 | 1 |
| xiaoli | female | 40 | 2 |
| xiaochen | female | 28 | 3 |
| xiaozhou | male | 33 | 1 |
| xiaosun | male | 25 | 2 |
| xiaoqian | male | 21 | 3 |
+-----------+-----------+----------+--------+
6 rows selected (52.762 seconds)
----End
> beeline
> use stuXX_db;
> create table cga_info8(name string,age int) row format delimited fields
terminated by ',' stored as textfile;
GuoYijun,5
YuanJing,10
Liyuan,20
> beeline
> use stuXX_db;
> load data local inpath '/home/userXX/cga8.dat' into table cga_info8;
> create table cga_info9(name string,gender string) row format delimited fields
terminated by ',' stored as textfile;
YuanJing,male
Liyuan,male
LiuYang,female
Lilei,male
> beeline
> use stuXX_db;
> load data local inpath '/home/userXX/cga9.dat' into table cga_info9;
+-----------------+-------------------+
| cga_info8.name | cga_info8.age |
+-----------------+-------------------+
| GuoYijun | 5 |
| YuanJing | 10 |
| Liyuan | 20 |
+-----------------+-------------------+
3 rows selected (0.212 seconds)
The following statement uses inner join to associate information about the same person in the
cga_info8 and cga_info9 tables.
Use full join to associate information about the same person in the cga_info8 and cga_info9 tables.
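Sketches of the two joins (assuming name as the join key):
> select a.name, a.age, b.gender from cga_info8 a join cga_info9 b on a.name = b.name;
> select a.name, a.age, b.gender from cga_info8 a full outer join cga_info9 b on a.name = b.name;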
Query the sum of time values of people in the cga_info3 table based on the name and gender.
If this query does not work because of insufficient memory or resources, you can run a simple query
without grouping:
> select name, gender, timest from cga_info3;
+-----------+---------+---------+
| name | gender | timest |
+-----------+---------+---------+
| xiaochen | female | 28 |
| xiaoli | female | 40 |
| xiaoqian | male | 21 |
| xiaosun | male | 25 |
| xiaozhao | female | 40 |
| xiaozhou | male | 33 |
+-----------+---------+---------+
6 rows selected (1.213 seconds)
Compared with the result of example 5 in section 4.3.3, Hive on Spark completes the query in about
1 second, which is much faster than Hive on MapReduce. (The time will differ in your environment;
it depends on the YARN settings.)
To switch execution back to MapReduce:
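The engine switch is a Hive session setting (a sketch; hive.execution.engine is the standard parameter):
> set hive.execution.engine=mr;
Correspondingly, set hive.execution.engine=spark; switches back to Hive on Spark.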
Step 5 Create a Hive external table cga_hbase_hive and associate it with the student table.
> beeline
> use stuXX_db;
> create external table cga_hbase_hive (key int,gid map<string,string>)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with
SERDEPROPERTIES ("hbase.columns.mapping" ="info:") TBLPROPERTIES
("hbase.table.name" ="stuXX_student");
The experiment result shows that the Hive table is associated with the HBase table.
----End
Found 2 items
Step 2 On the Hive client, set the parameter of whether to merge Reduce output files to true.
> beeline
> use stuXX_db;
> set hive.merge.mapredfiles= true;
Step 3 Create table cga_info10 and load the content of table cga_info1 to the new table.
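A CTAS statement would exercise the setting (a sketch; the exact statement is elided in the source):
> create table cga_info10 as select * from cga_info1;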
The result shows that the two small files that would otherwise be output in the Reduce phase have been
merged into one, because the parameter was modified in step 2.
----End
The result shows that the names of all the people in the table are encrypted.
----End
Step 3 Move the pointer to Query Editors and choose Hive from the drop-down menu.
Step 5 After compiling the HQL program, select the computing engine and then click Execute.
Enter a query, for example: select * from cga_info1
----End
4.4 Summary
This experiment describes how to add, delete, modify, and query data in Hive data warehouses, Hive
on Spark, and how to operate HBase using Hive. In Hive join operations, multiple join methods are
introduced to enable trainees to have a more intuitive understanding of join types and their
differences. This experiment helps trainees to reinforce their comprehension about Hive. Note that
stored as textfile must be specified during table creation when loading data. Otherwise, data cannot
be loaded.
5.1 Background
Data migration operations are frequently involved in Big Data services, especially data migration
between relational databases and Big Data components, for example, data migration between
MySQL and HDFS/HBase. The graphical operations of Loader make data migration more convenient.
5.2 Objective
⚫ Have a good command of using Loader to perform data migration in service scenarios.
Step 4 Configure the task name and select the type (select Export to export data from HBase to HDFS).
Job Name: stuXX_cg_hbasetohdfs
Connection: stuXX_hdfs_conn
Queue: DEFAULT
Step 5 Click Add, as shown in the preceding figure. Select hdfs-connector and set Name to a unique
value.
Step 7 Configure basic information as shown in the following figure, and then click Next.
Step 8 Select HBASE for Source type. Set Number to the number of Map tasks. Fill in 1 here. Then, click
Next.
Step 9 Click Input on the left, select HBase Input, and drag the HBase Input button to the right area.
Step 10 Click output on the left, select File Output, and drag the File Output button to the right area.
Step 11 Query the content in the cga_info table first in order to configure input and output.
Step 12 Configure the HBase input. Double-click the HBase Input button on the web UI. Enter table
name cga_info, click Add, enter the family name, column name, field name, and type in
sequence, select is rowkey, and click OK.
Step 13 Double-click the File Output button on the web UI to configure the HDFS output. Specify the
output delimiter, click associate, enter serial numbers in the position column, and then click
OK.
Output delimiter: ,
Step 16 Enter the output path, select the file format, and click Save and run.
Output path: /tmp/stuXX/cg_hbasetohdfs
123002,Victoria,female,40,London
123003,Taylor,female,30,Redding
123004,LeBron,male,33,Cleveland
The output shows that the content of the stuXX_cga_info table is successfully moved to the
export_part_1631277441726_0002_0000000 file in the /tmp/stuXX/cg_hbasetohdfs directory.
----End
Step 2 Perform the first three steps of section 5.3.1 to go to the page for configuring Loader and
configure basic information.
Name: stuXX_cg_hdfstohbase
Connection: stuXX_hdfs_conn (this connection was created in the previous task)
Queue: DEFAULT
Step 4 Click Input on the left, select CSV File Input, and drag the CSV File Input button to the right
area.
Step 5 Click output on the left, select HBase Output, and drag the HBase Output button to the right
area.
Step 6 Configure the CSV file input. Double-click the CSV File Input button on the web UI. Enter the
delimiter of the table and click Add. Enter the position serial number, field name, and type in
sequence, and then click OK.
Delimiter: ,
Step 7 Configure HBase output. Double-click the HBase Output button on the web UI and click
associate.
Step 8 Select the check boxes in the Name column and click OK.
Step 9 Enter the table name, select rowkey as the primary key, and click OK.
Table Name: stuXX_cg_hdfstohbase
Step 11 Click Next to configure To. Set Storage type to HBASE_PUTLIST, set Number to 1, and click Save
and run.
1,Tom,male,8
2,Lily,female,24
3,Lucy,female,50
Step 2 Upload local file test_mysql to the /user/app_stuXX/loader_test directory of the HDFS.
Found 1 items
-rw-r--r--+ 3 user01 supergroup 47 2018-04-15
13:09/user/app_stu01/loader_test/test_mysql.txt
mysql> create table cga_mysql(id int(4) not null primary key auto_increment,
name varchar(255) not null, gender varchar(255) not null, time int(4));
Step 8 Copy the MySQL connector JAR package to the specified directory on the active and standby Loader nodes.
> cp /FusionInsight-Client/mysql-connector-java-5.1.21.jar
/opt/huawei/Bigdata/FusionInsight_Porter_6.5.1/install/FusionInsight-Sqoop-
1.99.3/FusionInsight-Sqoop-1.99.3/server/webapps/loader/WEB-INF/ext-lib
(on the active node)
> ll /opt/huawei/Bigdata/FusionInsight_Porter_6.5.1/install/FusionInsight-
Sqoop-1.99.3/FusionInsight-Sqoop-1.99.3/server/webapps/loader/WEB-INF/ext-lib
total 940
-rwxr-xr-x 1 root root 118057 Jan 23 11:36 hive-jdbc-1.3.0.jar
-rwxr-xr-x 1 omm wheel 827942 Feb 8 10:36 mysql-connector-java-5.1.21.jar
-rwxr-xr-x 1 omm wheel 18 Nov 23 2015 readme.properties
The result shows that the MySQL connector JAR package has been copied to the specified directory on the
active and standby Loader nodes.
Step 11 Perform steps 1 to 3 in section 5.3.1 to enter the page for configuring basic information about
Loader.
Name: stuXX_cg_hdfstomysql
Queue: DEFAULT
Step 12 Click Edit to start the MySQL connection configuration. The MySQL password is
Huawei@010203. After filling in the information, click Test. After the test is complete, click OK.
Step 13 Click input on the left, select CSV File Input, and drag the CSV File Input button to the right
area.
Step 14 Click output on the left, select Table Output, and drag the Table Output button to the right
area.
Step 15 Configure the CSV file input. Double-click the CSV File Input button on the web UI. Enter the
delimiter ‘,’ and click Add. Enter the position serial number, field name, and type, and then click
OK.
NOTE: In the position column, specify positions starting from 2, that is, 2, 3, 4 (the source file
contains four attributes per line; the first is an identifier, which is automatically converted into the
primary key in the table).
Step 16 Double-click the Table Output button on the web UI. Click associate, select the check boxes in
the Name column, and click OK.
Step 17 Enter the field name, table column name, and type, and then click OK.
Step 19 Click Next to start output configuration. Enter the table name and click Save and run.
The result shows that the content of the test_mysql file in the HDFS has been loaded to the
cga_mysql table of the MySQL database.
----End
Step 2 Perform steps 1 to 3 in section 5.3.1 to enter the page for configuring basic information about
Loader.
Name: stuXX_cg_mysqltohdfs
Queue: DEFAULT
Connection: stuXX_mysql
Step 4 Click input on the left, select Table Input, and drag the Table Input button to the right area.
Step 5 Click output on the left, select File Output, and drag the File Output button to the right area.
Step 6 Double-click the Table Input button on the web UI. Click Add, enter the position serial number,
field name, and type, and click OK.
Step 7 Double-click the File Output button on the web UI. Configure Output delimiter and click
associate. Then select the check boxes in the Name column and click OK.
Output delimiter: ,
Step 8 Connect Table Input and File Output, and click Next.
1,tom,male,8
2,lily,female,24
3,lucy,female,50
The result shows that MySQL table cga_mysql has been imported to the
/user/app_stuXX/loader_test directory of the HDFS.
----End
Step 3 Perform steps 1 to 3 in section 5.3.1 to enter the page for configuring basic information about
Loader.
Name: stuXX_cg_mysqltohbase
Connection: stuXX_mysql
Queue: DEFAULT
Step 5 Click input on the left, select Table Input, and drag the Table Input button to the right area.
Step 6 Click output on the left, select HBase Output, and drag the HBase Output button to the right
area.
Step 7 Double-click the Table Input button on the web UI. Click Add, enter the position serial number,
field name, and type, and click OK.
Step 8 Configure the HBase output. Double-click the HBase Output button on the web UI. Click
associate, select the check boxes in the Name column, and click OK.
Step 9 Enter the HBase table name, column family name, column name, and type, select id as the
primary rowkey, and click OK.
Step 10 Connect Table Input and HBase Output, and click Next.
Step 11 Set Storage type to HBASE_PUTLIST, HBase instance to HBase, and Number to 1, and then click
Save and run.
The result shows that MySQL table cga_mysql has been successfully loaded to HBase table
stuXX_cg_mysqltohbase.
----End
Step 2 Perform steps 1 to 3 in section 5.3.1 to enter the page for configuring basic information about
Loader.
Name: stuXX_cg_hbasetomysql
Connection: stuXX_mysql
Queue: DEFAULT
Step 3 Click Next to configure From. Set Source type to HBASE and Number to 1.
Step 4 Click input on the left, select HBase Input, and drag the HBase Input button to the right area.
Step 5 Click output on the left, select Table Output, and drag the Table Output button to the right
area.
Step 6 Configure the HBase input. Double-click the HBase Input button on the web UI. Enter the HBase
table name, click Add, enter the family name, column name, field name, and type in sequence,
select id as the rowkey, and click OK.
Step 7 Double-click the Table Output button on the web UI. Click associate, select the check boxes in
the Name column, and click OK.
Step 8 Connect HBase Input and Table Output, and click Next.
Step 9 Configure To. Set Table name to cga_hbasetomysql, and click Save and run.
The result shows that the content of HBase table stuXX_cg_mysqltohbase has been successfully loaded to
MySQL table cga_hbasetomysql.
----End
Step 2 Perform steps 1 to 3 in section 5.3.1 to enter the page for configuring basic information about
Loader.
Name: stuXX_cg_mysqltohive
Connection: stuXX_mysql
Queue: DEFAULT
Step 3 Click Next to configure From. Set the table name to cga_mysql.
Need partition column: false
Step 4 Click Next to start transform configuration. Click input on the left, select Table Input, and drag
the Table Input button to the right area.
Step 5 Click output on the left, select Hive Output, and drag the Hive Output button to the right area.
Step 6 Configure table input. Double-click the Table Input button on the web UI. Click Add, enter the
position serial number, field name, and type, and click OK.
Step 7 Configure Hive output. Double-click the Hive Output button on the web UI. Click associate,
select the check boxes in the Name column, and click OK.
Step 9 Connect Table Input and Hive Output, and click Next.
The result shows that the content of MySQL table cga_mysql has been successfully loaded to Hive
table stuXX_cg_mysqltohive.
----End
5.4 Summary
This experiment describes how to use Loader in various service scenarios. After the experiment,
trainees are expected to be able to solve problems that occur during data migration. Note that you
need to create tables before migrating table data between MySQL, HBase, and Hive. When an
experiment is performed on the MySQL database using Loader, the MySQL table must have a primary
key.
6.1 Background
Flume is an important data collection tool among Big Data components. It is often used to collect data
from various data sources for other components to analyze. In log analysis services, server logs must
be collected to check whether the servers are running properly. In real-time services, data is often
collected into Kafka for analysis and processing by real-time components such as Streaming or Spark.
Flume therefore plays an important role in Big Data services.
6.2 Objective
⚫ Understand how to configure Flume and use it to collect data.
Flume is used here to monitor a file directory. The data is saved to HDFS, and the
channel type is memory.
The path varies depending on the account. Here, user01 is used as an example.
Set hdfs.kerberosPrincipal to a cluster user stuXX created in FusionInsight Manager, for example, stu01.
hdfs.kerberosKeytab is the Linux path where the keytab file is stored, for example,
/home/userXX/flumetest. Set the permission of the file to 755.
----End
> cp /FusionInsight-Client/Flume/FusionInsight_Cluster_1_Flume_Client.tar
/home/userXX/
> ls /home/userXX/FusionInsight_Cluster_1_Flume_ClientConfig/Flume/FlumeClient
Log in to FusionInsight Manager using a FusionInsight Manager account stuXX, for example, stu01,
and choose System > Rights Configuration > User Management.
(In the new interface: System -> User -> stuXX.)
In the Operation column of the corresponding account, click the Download icon to download
krb5.conf and user.keytab files.
Client {
com.sun.security.auth.module.Krb5LoginModule required
storeKey=true
keyTab="/home/userXX/flumetest/user.keytab"
principal="stuXX"
useTicketCache=false
debug=true
useKeyTab=true;
};
Note: principal is the user created on FusionInsight Manager. The keyTab line is an addition implied by
useKeyTab=true and the user.keytab file placed in /home/userXX/flumetest; adjust the path to your setup.
-Djava.security.krb5.conf=/home/userXX/flumetest/krb5.conf
-Djava.security.auth.login.config=/home/userXX/flumetest/jaas.conf
-Dzookeeper.server.principal=zookeeper/hadoop.hadoop.com
-Dzookeeper.request.timeout=120000
> cd /home/userXX/hadoopclient/HDFS/hadoop/etc/hadoop
> cp hdfs-site.xml /home/userXX/flumetest
> cp core-site.xml /home/userXX/flumetest
> cd /home/userXX/hadoopclient/HBase/hbase/conf
> cp hbase-site.xml /home/userXX/flumetest
> cd /home/userXX/flumetest
> ll
-rw------- 1 user01 users 8563 Apr 16 22:49 core-site.xml
-rw------- 1 user01 users 9830 Apr 16 22:50 hbase-site.xml
-rw------- 1 user01 users 15277 Apr 16 22:48 hdfs-site.xml
-rw-r--r-- 1 user01 users 199 Apr 16 22:23 jaas.conf
-rw-r--r-- 1 user01 users 757 Apr 15 20:24 krb5.conf
-rw-r--r-- 1 user01 users 2119 Apr 16 21:12 properties.properties
-rw-r--r-- 1 user01 users 126 Apr 15 20:24 user.keytab
Step 5 Install the client. (If you use a non-root user, it is recommended that the installation directory
not contain too many levels; otherwise, the installation may fail.)
> cd
/home/userXX/FusionInsight_Cluster_1_Flume_ClientConfig/Flume/FlumeClient
Parameter description:
-d: installation path of the Flume client
-f: Service IP addresses of two MonitorServer roles, separated by a comma (,). This parameter is
optional. If this parameter is not set, the Flume client does not send alarm information to the
MonitorServer.
-c: configuration file, which is optional. After the installation, you can configure Flume role client
parameters by modifying /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/properties.properties.
-l: log directory. This parameter is optional. The default value is /var/log/Bigdata. (The user running
the client needs write permission on this directory.)
> ll /home/userXX/spooldir -a
total 408
drwxrwxrwx 3 root root 4096 Feb 9 13:44 .
drwxrwxrwx 81 root root 12288 Apr 16 23:26 ..
drwxrwxrwx 2 omm wheel 4096 Jan 26 23:46 .flumespool
-rwxrwxrwx 1 root root 389592 Jan 26 23:45 zypper.log.COMPLETED
----End
Step 2 Set the source type to avro, set the listening IP address and port number, set channels to ch2,
and click Add Source.
Set authentication information, including the authentication account and the address of the
authentication file. The authentication account and the authentication file can be the same as those
in 6.3.1.
The flume-env.sh file exists in the flume/conf directory of the decompressed file on the Flume
client. Add the following content at the end of JAVA_OPTS:
----End
> cd
/home/userXX/FusionInsight_Cluster_1_Flume_ClientConfig/Flume/FlumeClient
If message [flume-client install]: install flume client successfully is displayed, the client is installed
successfully.
Note: After the installation, you can run the ps -ef | grep flume | grep username command to check
the Flume service status.
> cd /opt
> java -cp flumeavroclient.jar org.myorg.SSLAvroclient
----End
6.4 Summary
This experiment describes how to collect data from the spooldir and avro data sources using the
Flume. Through this experiment, trainees are expected to master how to collect data offline and in
real time as well as have a better understanding of Flume.
7.1 Background
In Big Data services, multiple components are usually built into a service system to meet the
requirements of upper-layer services.
This experiment combines the preceding components to build a Big Data analysis and real-time
query platform.
Loader periodically migrates MySQL database data to Hive first. As Hive data is stored in HDFS,
Loader is used to load data in HDFS to HBase. HBase is used to query data in real time, and the big
data processing capability of Hive is used to analyze related results.
7.2 Objective
⚫ Use Big Data components comprehensively to convert and query data in real time.
Step 3 Create table socker and make time the primary key.
Load data in socker.csv to the socker table using the tool on the MySQL client.
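One way to do this in the MySQL client (a sketch; the column list mirrors the socker2 Hive table created later in this chapter, and the CSV path is an assumption):
mysql> create table socker(time varchar(32) primary key, open float, high float, low float, close float, volume varchar(32), endprice float);
mysql> load data local infile '/home/userXX/socker.csv' into table socker fields terminated by ',';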
----End
If no primary key is set in the table, specify Partition column name, such as 1 or column
name time.
Step 4 Double-click Table Input and enter the attributes associated with MySQL. Field name indicates
the corresponding fields of MySQL.
> beeline
> use stuXX_db;
> create table socker2(timest string,open float,high float,low float,close
float,volume string,endprice float)row format delimited fields terminated by
',' stored as textfile location '/user/app_stuXX/loader_test/socker2';
Step 10 Configure To. Set Storage type to HIVE, Output directory to /stu01/hive/warehouse/socker2,
and Number to 2.
----End
> beeline
> use stuXX_db;
> select socker2.timest, socker2.open, socker2.endprice from socker2 where
socker2.endprice > socker2.open sort by socker2.endprice desc;
+-----------------+---------------+-------------------+
| socker2.timest | socker2.open | socker2.endprice |
+-----------------+---------------+-------------------+
| 1974-05-21 | 87.86 | 87.91 |
| 1978-03-09 | 87.84 | 87.89 |
| 1978-03-08 | 87.36 | 87.84 |
| 1975-12-04 | 87.6 | 87.84 |
| 1975-12-12 | 87.8 | 87.83 |
| 1970-02-19 | 87.44 | 87.76 |
| 1974-06-24 | 87.46 | 87.69 |
| 1978-02-23 | 87.56 | 87.64 |
| 1970-03-18 | 87.29 | 87.54 |
| 1970-12-01 | 87.2 | 87.47 |
| 1978-03-03 | 87.32 | 87.45 |
| 1970-02-18 | 86.37 | 87.44 |
| 1974-05-30 | 86.89 | 87.43 |
| 1978-03-07 | 86.9 | 87.36 |
| 1978-03-02 | 87.19 | 87.32 |
| 1975-12-09 | 87.07 | 87.3 |
……
+-----------------+---------------+-------------------+
5,228 rows selected (30.544 seconds)
+--------------+--------------+------------------+
| socker2.time | socker2.open | socker2.endprice |
+--------------+--------------+------------------+
| 1970-04-09 | 88.49 | 88.53 |
| 1970-04-01 | 89.63 | 90.07 |
| 1970-03-26 | 89.77 | 89.92 |
| 1970-03-25 | 88.11 | 89.77 |
| 1970-03-24 | 86.99 | 87.98 |
| 1970-03-18 | 87.29 | 87.54 |
| 1970-03-17 | 86.91 | 87.29 |
| 1970-03-10 | 88.51 | 88.75 |
| 1970-03-03 | 89.71 | 90.23 |
| 1970-03-02 | 89.5 | 89.71 |
| 1970-02-27 | 88.9 | 89.5 |
| 1970-02-25 | 87.99 | 89.35 |
| 1970-02-20 | 87.76 | 88.03 |
| 1970-02-19 | 87.44 | 87.76 |
| 1970-02-18 | 86.37 | 87.44 |
……
+--------------+--------------+------------------+
Step 4 Create a table to store the data of stocks whose price increased, and load the data into it in Hive.
Creating a table:
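The create statement is elided in the source; a minimal sketch that copies the socker2 schema:
> create table upsocker like socker2;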
> insert into upsocker select * from socker2 where socker2.endprice >
socker2.open sort by socker2.endprice desc;
----End
Step 2 Perform steps 1 to 3 in section 5.3.1 to enter the page for configuring basic information about
Loader.
Loader.
Click Next.
----End
COLUMN CELL
info:close timestamp=1523803747562, value=1052.63
info:endprice timestamp=1523803747562, value=1052.63
info:high timestamp=1523803747562, value=1056.04
info:low timestamp=1523803747562, value=1043.42
info:open timestamp=1523803747562, value=1049.03
info:volume timestamp=1523803747562, value=6185620000
6 row(s) in 0.0420 seconds
Step 2 Query information in the period from August 15, 2009 to September 15, 2009.
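A sketch of the scan, using rowkey (date) range bounds:
> scan 'stuXX_cg_hdfstohbase2', {COLUMNS => 'info:endprice', STARTROW => '2009-08-15', STOPROW => '2009-09-15'}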
ROW COLUMN+CELL
2009-08-17 column=info:endprice, timestamp=1523803747562, value=979.73
2009-08-18 column=info:endprice, timestamp=1523803747562, value=989.67
2009-08-19 column=info:endprice, timestamp=1523803747562, value=996.46
2009-08-20 column=info:endprice, timestamp=1523803747562, value=1007.37
……
2009-09-09 column=info:endprice, timestamp=1523803747562, value=1033.37
2009-09-10 column=info:endprice, timestamp=1523803747562, value=1044.14
2009-09-11 column=info:endprice, timestamp=1523803747562, value=1042.73
2009-09-14 column=info:endprice, timestamp=1523803747562, value=1049.34
20 row(s) in 0.0380 seconds
Step 3 Query all the columns whose values are greater than a specific value. (The system compares
the values as strings.)
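A sketch using ValueFilter, as in the next step (the threshold is illustrative; values compare as strings):
> scan 'stuXX_cg_hdfstohbase2', {FILTER => "ValueFilter(>, 'binary:979')"}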
Step 4 Query all the information that ends with endprice and the string value is greater than 979.
hbase(main):011:0> scan
'stuXX_cg_hdfstohbase2',{FILTER=>"ValueFilter(>,'binary:979') AND
ColumnPrefixFilter('endprice')"}
----End
7.4 Summary
This experiment uses multiple components to build a Big Data analysis and query platform. Through
the experiment, trainees are expected to have a better understanding of theories and
comprehensive applications about big data components.
8 Appendix