
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Practical Record
On
BIG DATA ANALYTICS LABORATORY
(18PC2CS06)
Submitted to
VNR Vignana Jyothi Institute of
Engineering & Technology
An autonomous Institute – NAAC ‘A++’ and
NBA Accredited
Bachelor of Technology
In
Computer Science & Engineering
(B Tech IV Year I sem)

Submitted By
Student Name: N VENKATA BHARADWAJA
Roll No: 18071A0597
VNR Vignana Jyothi Institute of Engineering & Technology

Bachupally, Nizampet (S.O), Hyderabad-90
BIG DATA ANALYTICS LAB (18071A0597) 2022 BATCH PAGE NO: 1

VNR VIGNANA JYOTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY
Bachupally (V), Hyderabad, Telangana, India

Department of Computer Science & Engineering

CERTIFICATE
Certified that this is the bonafide record of the practical work done during the academic

 year………………………………………………………………………………………………………….by the student

Name………………………………………………………………………………………………………………………………..

Hall Ticket No……………………………………………………class………………………………………………………..

In the laboratory……………………………………………………………………………………………………………….

Department of………………………………………………………………………………………………………………….

Signature of the HOD                                                                       Signature of the Staff Member

Date of Exam……………………..

Signature of the Examiners

Internal examiner                                                                                 External examiner

VNR VIGNANA JYOTHI INSTITUTE OF ENGINEERING AND TECHNOLOGY
Bachupally (V), Hyderabad, Telangana, India

NAME:……………………………………………………………………………………………………………………………………..

DEPARTMENT OF: …................................………………………………………………………………………….

ROLLNO: ………………………………………………………………………………………………………………………………..

LABORATORY: ……………………..…………………………………………........................................................

CLASS: …………………………………………………………. SECTION: ……………………………..………………………

Sno: TITLE OF THE PROGRAM Pg.no Date Signature

WEEK NO:01 DATE:

TASK: HDFS (Storage) Commands


CODE:

1. Print The Version Of Hadoop

2. List All Files And Directories In HDFS

3. List All Directories in HDFS

4. To Copy file from local to HDFS

5. To check copied file in HDFS

6. To See Copied File contents in HDFS

7. To See Copied File contents in HDFS – method2

8. To copy file from local to HDFS using put command

9. Copy Multiple Files From Local To HDFS Root Directory

10. Copy A File From HDFS To LOCAL Root Directory

11. Health Of HDFS

12. Make A Directory

13. Copy any File From HDFS to destination also in HDFS

14. Create an Empty File in HDFS

15. Check size of any File in HDFS

16. Print the contents of a file in HDFS

17. Count the no of directories and files inside a directory in HDFS

18. Delete a File completely in HDFS

19. Delete a Directory completely in HDFS

20. Copy a File / Multiple Files in a Directory Within HDFS

21. Move a File from one Directory to Another Directory within HDFS

22. To check usage for individual commands

23. Find help for any given command

24. Check the memory status

25. Cluster Balancing in HDFS

26. Changing Permission for File to 777

27. Empty Trash in HDFS

28. Display the last Kilo Byte of Particular file

29. Append Contents of file present in local to file in HDFS
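The commands originally recorded for the tasks above are not preserved in this record. As a rough guide, the sketch below assembles command lines from the standard `hadoop`/`hdfs` command-line tools for a subset of the tasks and prints them; it does not execute anything. The class name `HdfsCommandSketch`, the file names, and the `/demo` directory are hypothetical placeholders, and the mapping from task to command is an assumption rather than what was actually run in the lab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HdfsCommandSketch {

    // Builds the argument list for one `hdfs dfs` sub-command.
    static List<String> dfs(String... args) {
        List<String> cmd = new ArrayList<>(Arrays.asList("hdfs", "dfs"));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // A subset of the Week 1 tasks. All paths are hypothetical placeholders,
    // and /demo is assumed to exist before files are copied into it.
    static List<List<String>> weekOneCommands() {
        List<List<String>> cmds = new ArrayList<>();
        cmds.add(Arrays.asList("hadoop", "version"));                  // task 1
        cmds.add(dfs("-ls", "/"));                                     // task 2
        cmds.add(dfs("-copyFromLocal", "sample.txt", "/demo/"));       // task 4
        cmds.add(dfs("-cat", "/demo/sample.txt"));                     // task 6
        cmds.add(dfs("-put", "sample2.txt", "/demo/"));                // task 8
        cmds.add(dfs("-get", "/demo/sample.txt", "."));                // task 10
        cmds.add(Arrays.asList("hdfs", "fsck", "/"));                  // task 11
        cmds.add(dfs("-mkdir", "/demo/sub"));                          // task 12
        cmds.add(dfs("-touchz", "/demo/empty.txt"));                   // task 14
        cmds.add(dfs("-du", "-s", "/demo/sample.txt"));                // task 15
        cmds.add(dfs("-count", "/demo"));                              // task 17
        cmds.add(dfs("-rm", "/demo/empty.txt"));                       // task 18
        cmds.add(dfs("-rm", "-r", "/demo/sub"));                       // task 19
        cmds.add(dfs("-mv", "/demo/sample2.txt", "/demo/moved.txt"));  // task 21
        cmds.add(dfs("-usage", "ls"));                                 // task 22
        cmds.add(dfs("-help", "ls"));                                  // task 23
        cmds.add(Arrays.asList("hdfs", "balancer"));                   // task 25
        cmds.add(dfs("-chmod", "777", "/demo/sample.txt"));            // task 26
        cmds.add(dfs("-expunge"));                                     // task 27
        cmds.add(dfs("-tail", "/demo/sample.txt"));                    // task 28
        cmds.add(dfs("-appendToFile", "local.txt", "/demo/sample.txt")); // task 29
        return cmds;
    }

    public static void main(String[] args) {
        // Print each command line; nothing is run against a cluster.
        for (List<String> cmd : weekOneCommands()) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Each printed line can be typed into a terminal on a machine with Hadoop installed. Whether `-rm` removes a file permanently or moves it to the trash depends on the cluster's trash configuration.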

WEEK NO:02 DATE:

TASK: Map Reduce Programming (Processing data).

CODE:

1. Download jar files


2. Open eclipse
3. Create a project: give the project name as mapreduce and press the Finish button
4. Create a package: give the package name as word, then press Finish
5. Create a class named Classword
6. Copy the given mapreduce program in the created class and save it

package word;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;

public class Classword {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Reuse the input Text object as the output key.
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated Job constructor.
        Job job = Job.getInstance(conf, "My Word Count Program");
        job.setJarByClass(Classword.class); // put your own class name here
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        // Configure the input and output paths for the job.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, outputPath);
        // Delete any existing output directory (recursively) from HDFS
        // so that it does not have to be removed by hand before each run.
        outputPath.getFileSystem(conf).delete(outputPath, true);
        // Exit with status 0 if the job succeeds, 1 otherwise.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

7. Add jar files


8. Create a jar file for a given program
9. Jar file will be created on the desktop
10. Open the terminal, create a text file (count.txt) on the desktop, and move it to HDFS

11. Now, perform Map-Reduce as below:

12. Explore the MapReduce results folder named ‘mapreduce-result-ar’ created by the job.
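To see what the Classword job computes without a cluster, the map, shuffle, and reduce phases can be traced in plain Java on a few in-memory lines. This is only an illustrative sketch: the class `WordCountSketch` and its method names are hypothetical, no Hadoop classes are involved, and the real framework performs the grouping step across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: collect all values that share a key (keys kept sorted).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in order over the given lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input standing in for the contents of count.txt.
        List<String> lines = Arrays.asList("big data big", "data lab");
        System.out.println(wordCount(lines)); // prints {big=2, data=2, lab=1}
    }
}
```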

WEEK NO:04 DATE:

TASK: Data Processing Tool – Pig (Pig Latin scripting language)

CODE:

1. How to enter the Grunt shell

2. Create 2 datasets using gedit command in local

3. Copy files to HDFS

4. How to read the data of your files in Pig


a. For pigfile-ar.txt

b. For pigfile1-ar.txt

5. Specify schema for above 2 tables.


a. For pigfile-ar.txt

b. For pigfile1-ar.txt

6. Check schema of 2 tables

7. Combine the 2 tables

8. Split the dataset c into two relations, d and e: d holds the rows where $0 has the value 1, and e holds the rows where $0 has the value 4.

9. Do filtering on dataset c where $1 is greater than 6

10. Group dataset c by $2

11. Select column 1 and 2 from dataset a

12. Store result s1 into HDFS as pigresult-ar

Check in HDFS
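The split, filter, and group tasks above are relational operations on positional fields ($0, $1, $2). The following plain-Java sketch mimics their effect on a small in-memory dataset so the expected shape of each result is easier to picture. It is not Pig Latin and does not use any Pig API; the class `PigOpsSketch`, its methods, and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PigOpsSketch {

    // Keeps rows whose field at position `field` equals `value`
    // (the condition used for each branch of the split in task 8).
    static List<int[]> whereEquals(List<int[]> rows, int field, int value) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : rows) {
            if (row[field] == value) {
                out.add(row);
            }
        }
        return out;
    }

    // Keeps rows whose field at position `field` is greater than `bound` (task 9).
    static List<int[]> whereGreater(List<int[]> rows, int field, int bound) {
        List<int[]> out = new ArrayList<>();
        for (int[] row : rows) {
            if (row[field] > bound) {
                out.add(row);
            }
        }
        return out;
    }

    // Buckets rows by the value of one field (task 10).
    static Map<Integer, List<int[]>> groupBy(List<int[]> rows, int field) {
        Map<Integer, List<int[]>> groups = new TreeMap<>();
        for (int[] row : rows) {
            groups.computeIfAbsent(row[field], k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical three-field rows standing in for the combined dataset c.
        List<int[]> c = Arrays.asList(
                new int[] {1, 2, 3},
                new int[] {4, 7, 3},
                new int[] {1, 9, 5},
                new int[] {8, 6, 5});
        List<int[]> d = whereEquals(c, 0, 1);      // rows with $0 == 1
        List<int[]> e = whereEquals(c, 0, 4);      // rows with $0 == 4
        List<int[]> filtered = whereGreater(c, 1, 6);
        Map<Integer, List<int[]>> grouped = groupBy(c, 2);
        // prints d=2 e=1 filtered=2 groups=[3, 5]
        System.out.println("d=" + d.size() + " e=" + e.size()
                + " filtered=" + filtered.size() + " groups=" + grouped.keySet());
    }
}
```

Note that the row starting with 8 lands in neither d nor e, which mirrors how a split drops tuples that match none of its conditions.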

WEEK NO:05 DATE: 17/11/2021

PROGRAM TITLE: SQOOP

Problem statement:

1. How to enter the MySQL CLI in Cloudera

2. Create a database

3. Select the database created

4. Create a table inside database

5. Insert records in the table created

6. Check contents in the table

7. Import this table into HDFS

8. Check if data is imported in HDFS

9. Create a file in the local system and move it into HDFS

10. Create another table

11. Export the file abc.txt into the abc_table created using SQOOP

12. Check the table contents

WEEK NO:03 DATE: 30/11/2021


PROGRAM TITLE: HIVE

1. How to enter Hive Shell

2. Create a database

3. How to create Managed Table in HIVE

4. Check where the managed table is created in hive

5. Check Whether Managed Table is created in Hive

6. Describe Emp Table

7. How to see all the tables present in database

8. Select all enames from emp table

9. Get the records where name is ‘A’

10. Count the total no of records in the created table

11. Group the sum of salaries as per the deptno

12. Get the salary of people between 1000 and 2000

13. Select the name of employees where job has exactly 5 characters

14. List the employee names where job has l as the second character

15. Retrieve the total salary for each department

16. Add a column to the table

17. How to rename a table

18. How to drop a table
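Tasks 13 and 14 depend on LIKE wildcards, where `_` matches exactly one character and `%` matches any run of characters, and tasks 11 and 15 depend on summing a column per group. The plain-Java sketch below reproduces both behaviours on in-memory values to make the expected results concrete. It is not HiveQL and calls no Hive API; the class `HiveQuerySketch`, the sample job titles, and the sample salaries are hypothetical, and escape characters in LIKE patterns are deliberately ignored.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class HiveQuerySketch {

    // Translates a LIKE pattern to a regex: '_' becomes '.', '%' becomes '.*',
    // and every other character is matched literally. Escapes are not handled.
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char ch : like.toCharArray()) {
            if (ch == '_') {
                regex.append('.');
            } else if (ch == '%') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(ch)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True when the whole value matches the LIKE pattern (case-sensitive).
    static boolean like(String value, String pattern) {
        return likeToRegex(pattern).matcher(value).matches();
    }

    // Sums the salary per department; each row is {deptno, salary}.
    static Map<Integer, Integer> sumByDept(int[][] rows) {
        Map<Integer, Integer> totals = new TreeMap<>();
        for (int[] row : rows) {
            totals.merge(row[0], row[1], Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(like("clerk", "_____"));   // true: exactly 5 characters
        System.out.println(like("manager", "_____")); // false: 7 characters
        System.out.println(like("clerk", "_l%"));     // true: second character is l
        System.out.println(like("salesman", "_l%"));  // false: second character is a
        int[][] rows = {{10, 1000}, {20, 1500}, {10, 2500}};
        System.out.println(sumByDept(rows));          // {10=3500, 20=1500}
    }
}
```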
