
BIG DATA 1 HIVE IN-CLASS ASSIGNMENT

1. Download the following files from the class files to a Windows directory on the VM:
   Occupation.dat
   Movies.dat
   Ratings.dat
   Users.dat
2. Create a folder named “data” in HDFS at /user/maria_dev/data.
3. Load the 4 files into /user/maria_dev/data, and make sure read and write permissions are enabled on the new “data” folder.
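Each EXTERNAL table created below reads from its own subfolder (/user/maria_dev/data/ratings, /user/maria_dev/data/movies, and so on), not from /user/maria_dev/data itself, so a file left directly in the “data” folder will not appear in any table. A minimal sketch of one way to arrange the files, assuming you run it from the Hive CLI (whose dfs command passes its arguments to the HDFS shell) and that the uploaded file names match the list above. Hive View or Beeline may refuse dfs depending on their authorization settings, in which case the same moves can be made in the Ambari Files View:

```sql
-- Sketch only: assumes the Hive CLI and the file names listed in step 1.
-- One subfolder per table, matching the LOCATION clauses of the tables.
dfs -mkdir -p /user/maria_dev/data/ratings;
dfs -mkdir -p /user/maria_dev/data/movies;
dfs -mkdir -p /user/maria_dev/data/users;
dfs -mkdir -p /user/maria_dev/data/occupations;

-- Move each uploaded file into its table's subfolder.
dfs -mv /user/maria_dev/data/Ratings.dat /user/maria_dev/data/ratings/;
dfs -mv /user/maria_dev/data/Movies.dat /user/maria_dev/data/movies/;
dfs -mv /user/maria_dev/data/Users.dat /user/maria_dev/data/users/;
dfs -mv /user/maria_dev/data/Occupation.dat /user/maria_dev/data/occupations/;

-- Read/write for everyone on the whole tree (wide open; sandbox VM only).
dfs -chmod -R 777 /user/maria_dev/data;

-- Confirm each file landed where its table expects it.
dfs -ls -R /user/maria_dev/data;
```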
4. Use the Compose section in Hive 2.0 to create a separate database named Hive_Tutorial.
5. Issue the following commands:
create database Hive_Tutorial;
use Hive_Tutorial;
6. Then create the following EXTERNAL tables to hold the data:

CREATE EXTERNAL TABLE ratings (userid INT, movieid INT, rating INT,tstamp STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data/ratings';

CREATE EXTERNAL TABLE movies (movieid INT,title STRING,genres ARRAY<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data/movies';

CREATE EXTERNAL TABLE users (userid INT,gender STRING,age INT,occupation_id INT,zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data/users';

CREATE EXTERNAL TABLE occupations (id INT,occupation STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '#'
STORED AS TEXTFILE
LOCATION '/user/maria_dev/data/occupations';

7. Check that data has loaded into all four tables:

select * from users limit 2;


OK
1 F 1 10 48067
2 M 56 16 70072

select * from movies limit 2;


OK
1 Toy Story (1995) ["Animation","Children's","Comedy"]
2 Jumanji (1995) ["Adventure","Children's","Fantasy"]
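Because genres is declared as ARRAY<STRING>, with '|' splitting the elements through the COLLECTION ITEMS clause, each genre can be addressed on its own. A small illustrative query, not part of the assignment, using Hive's array indexing, size, array_contains, and LATERAL VIEW explode to flatten the array into one row per genre:

```sql
-- Illustration only: first genre, genre count, and a membership test per movie.
SELECT title,
       genres[0] AS first_genre,
       size(genres) AS num_genres,
       array_contains(genres, 'Comedy') AS is_comedy
FROM movies
LIMIT 2;

-- One output row per (movie, genre) pair.
SELECT m.title, g.genre
FROM movies m
LATERAL VIEW explode(m.genres) g AS genre
LIMIT 10;
```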

select * from ratings limit 2;


OK
1 1193 5 978300760
1 661 3 978302109
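The tstamp column holds a Unix epoch value (seconds since 1970-01-01 UTC) stored as a STRING, as in the sample rows for ratings. If a readable date helps when inspecting the data, Hive's from_unixtime function can convert it after a cast to BIGINT; the result is a 'yyyy-MM-dd HH:mm:ss' string in the Hive server's time zone:

```sql
-- Convert the epoch-seconds string into a readable date-time string.
SELECT userid, movieid, rating,
       from_unixtime(CAST(tstamp AS BIGINT)) AS rated_at
FROM ratings
LIMIT 2;
```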

select * from occupations limit 2;


OK
0 other/not specified
1 academic/educator
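A LIMIT 2 query only shows that the first rows parse; it does not show that every file was read. A quick extra check is to count the rows in each table. A count of 0 usually means the data file is not in that table's LOCATION folder, and INT columns that come back NULL usually point to a field-delimiter mismatch, since Hive returns NULL for values it cannot parse:

```sql
-- Row counts per table; 0 suggests the file is missing from that LOCATION.
SELECT COUNT(*) FROM users;
SELECT COUNT(*) FROM movies;
SELECT COUNT(*) FROM ratings;
SELECT COUNT(*) FROM occupations;

-- Rows whose first field failed to parse as INT (a sign of a wrong delimiter).
SELECT COUNT(*) FROM ratings WHERE userid IS NULL;
```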

Take a screenshot of each answer showing the result of your query, then paste the screenshots into a Word document with your name and student number at the top and the title “Hive In Class Assignment”.
Once completed, submit the assignment.
NOTE: In each case, to maintain readability, I will limit the output to 10 rows.

Use Case 1:
Find the occupation of every user.
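One possible query, joining users to occupations on the occupation code (users.occupation_id matched to occupations.id, as the two CREATE statements suggest). This is a sketch, not the instructor's reference answer:

```sql
-- Each user's id alongside the occupation name looked up by code.
SELECT u.userid, o.occupation
FROM users u
JOIN occupations o ON (u.occupation_id = o.id)
LIMIT 10;
```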

Use Case 2:
Find the number of non-adults who have rated movies.
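A sketch, assuming “non-adult” means an age value below 18. In the sample users rows, user 1 has age 1, which in the MovieLens age coding stands for “under 18”; that these files follow the MovieLens coding is an assumption. COUNT(DISTINCT ...) keeps a user who rated many movies from being counted more than once:

```sql
-- Distinct users under 18 who appear at least once in ratings.
SELECT COUNT(DISTINCT u.userid) AS non_adult_raters
FROM users u
JOIN ratings r ON (u.userid = r.userid)
WHERE u.age < 18;
```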

Use Case 3:
Find the number of users in each occupation who are older than 25, along with the occupation details.
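One way to answer this is to filter users by age, join to occupations, and count per occupation. If age holds MovieLens-style group codes (1, 18, 25, 35, ...) rather than exact ages, which is an assumption about these files, age > 25 selects the groups from 35 upward:

```sql
-- Users older than 25, grouped by occupation, largest groups first.
SELECT o.id, o.occupation, COUNT(*) AS num_users
FROM users u
JOIN occupations o ON (u.occupation_id = o.id)
WHERE u.age > 25
GROUP BY o.id, o.occupation
ORDER BY num_users DESC
LIMIT 10;
```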

Use Case 4:
Find the age of the user who has submitted the most ratings, along with that user's rating count.
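Reading “most rated user” as the user who submitted the most ratings (an interpretation, since users themselves are not rated), one sketch counts ratings per user, keeps the age, and takes the top row. If several users tie for first place, LIMIT 1 keeps an arbitrary one of them:

```sql
-- Ratings per user with that user's age; the top row is the most active rater.
SELECT u.userid, u.age, COUNT(*) AS num_ratings
FROM users u
JOIN ratings r ON (u.userid = r.userid)
GROUP BY u.userid, u.age
ORDER BY num_ratings DESC
LIMIT 1;
```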
