
ANSWER NO 1

To create a database:
CREATE DATABASE dbname;
To list the available databases:
SHOW DATABASES;
To drop a database:
DROP DATABASE dbname;
To insert data into a table:
INSERT INTO table_name (col1, col2, col3, col4, col5) VALUES (val1, val2, val3, val4, val5);
To update data in a table:
UPDATE table_name SET col1 = new_val1, col2 = new_val2 WHERE condition;
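
As a quick worked example, here is how these commands might fit together; the database "school" and the table "marks" are hypothetical, and USE and CREATE TABLE are included for completeness even though they are not listed above:

CREATE DATABASE school;
SHOW DATABASES;    -- 'school' now appears in the list
USE school;
CREATE TABLE marks (reg_no INT, subject VARCHAR(50), score INT);
INSERT INTO marks (reg_no, subject, score) VALUES (1, 'Math', 88);
UPDATE marks SET score = 90 WHERE reg_no = 1;
DROP DATABASE school;    -- removes the database and all of its tables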

ANSWER NO 3
Here are 10 basic commands for working in a Hadoop environment (a short example session follows the list):

1. hadoop fs -ls: This command lists the files and directories at a given path in the Hadoop
Distributed File System (HDFS); run without a path, it lists the user's HDFS home directory.
2. hadoop fs -mkdir: This command creates a new directory in the HDFS.
3. hadoop fs -put: This command copies files from the local file system to the HDFS.
4. hadoop fs -get: This command copies files from the HDFS to the local file system.
5. hadoop fs -cat: This command displays the contents of a file in the HDFS.
6. hadoop fs -rm: This command removes a file from the HDFS; with the -r option it removes a directory and its contents.
7. hadoop fs -du: This command displays the disk usage of a file or a directory in the HDFS.
8. hadoop fs -tail: This command displays the last kilobyte of a file in the HDFS.
9. hadoop fs -chmod: This command changes the permissions of a file or a directory in the HDFS.
10. hadoop jar: This command runs a Hadoop MapReduce job by specifying the path to a JAR file
that contains the MapReduce program.
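
As a quick illustration, here is a sketch of a typical session; the local file sales.csv and the HDFS path /user/demo are hypothetical:

hadoop fs -mkdir -p /user/demo/input          # create a directory in HDFS (-p creates parents)
hadoop fs -put sales.csv /user/demo/input     # copy a local file into HDFS
hadoop fs -ls /user/demo/input                # list the directory to confirm the copy
hadoop fs -cat /user/demo/input/sales.csv     # print the file's contents
hadoop fs -du /user/demo/input                # show the space the directory uses
hadoop fs -rm /user/demo/input/sales.csv      # remove the file again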

ANSWER NO 4
Installing Cloudera and starting to work with Hive involves the following steps:

1. Install Cloudera: To install Cloudera, you can download the Cloudera Manager from the
Cloudera website and follow the instructions provided in the Cloudera Manager installation
guide.
2. Set up a Hadoop cluster: After installing Cloudera Manager, you can use it to set up a Hadoop
cluster. Follow the Cloudera Manager installation guide to configure your Hadoop cluster.
3. Start Hive service: Once your Hadoop cluster is set up, you can start the Hive service using
Cloudera Manager. The Hive service provides a SQL-like interface for querying data in the
Hadoop cluster.
4. Create tables in Hive: You can create tables using HiveQL, Hive's SQL-like query language. The
HiveQL CREATE TABLE statement creates a table, specifying the column names, data types, and
other properties of the table (see the sketch after this list).
5. Load data into Hive tables: You can load data into Hive tables using the LOAD DATA statement in
HiveQL. The LOAD DATA statement loads data from a file or a directory into a Hive table.
6. Query data using Hive: You can query data in Hive tables using HiveQL. HiveQL supports a
wide range of SQL-like syntax, including joins, subqueries, and user-defined functions.
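
As a minimal sketch of steps 4 through 6, assuming a hypothetical comma-separated file /tmp/employees.csv on the machine running Hive:

CREATE TABLE employees (
id INT,
name STRING,
dept STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';    -- store rows as plain CSV text

LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees;

SELECT dept, COUNT(*) FROM employees GROUP BY dept;    -- a simple aggregate query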

Overall, working with Hive involves creating tables, loading data into tables, and querying data using
HiveQL. Cloudera Manager provides a user-friendly interface for managing your Hadoop cluster and
starting Hive services, making it easy to get started with Hive.

ANSWER NO 5
Many tools are used in big data analysis; here are five popular ones:

1. Apache Hadoop: Hadoop is an open-source distributed processing framework used for storing
and processing large datasets. Hadoop's MapReduce engine is geared toward batch processing;
real-time and interactive workloads are usually handled by tools in the Hadoop ecosystem,
which includes Hive, Pig, and Spark.
2. Apache Spark: Spark is an open-source distributed processing engine that is designed for big
data processing. Spark is commonly used for real-time processing and data streaming. Spark
has many components such as Spark SQL, Spark Streaming, and MLlib.
3. Apache Hive: Hive is an open-source data warehouse system that provides a SQL-like interface
for querying large datasets stored in Hadoop. Hive converts SQL-like queries into MapReduce
jobs that can be run on the Hadoop cluster.
4. Apache Pig: Pig is an open-source dataflow language and execution environment that is used
for processing large datasets. Pig provides a high-level language for expressing data
processing workflows, which can be compiled into MapReduce jobs that can be run on the
Hadoop cluster.
5. Tableau: Tableau is a data visualization and business intelligence tool that is used for
visualizing large datasets. Tableau provides a drag-and-drop interface for creating interactive
visualizations, dashboards, and reports. Tableau can connect to many data sources including
Hadoop, Spark, and Hive.

These tools are just a few of the many available for big data analysis. Each tool has its own strengths
and weaknesses, and the choice of tool depends on the specific needs of the project or analysis.

ANSWER NO 6
To create a table named "student" with attributes "reg_no", "name", "father_name", and "mother_name",
statically partitioned on "course", you can use the following HiveQL query:

CREATE TABLE student (
reg_no INT,
name VARCHAR(255),
father_name VARCHAR(255),
mother_name VARCHAR(255)
)
PARTITIONED BY (course VARCHAR(255));

This query creates a table with the specified attributes and declares "course" as a partition column.
In Hive, a partition column is declared in the PARTITIONED BY clause rather than in the regular column
list, and each distinct value of "course" ('Math', 'Science', or 'History' here) becomes its own
partition. "Static" partitioning means that each INSERT or LOAD statement names its target partition
explicitly instead of deriving it from the data.
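
For bulk loads, a whole file can also be placed directly into a specific static partition with LOAD DATA; the file path below is hypothetical:

LOAD DATA LOCAL INPATH '/tmp/math_students.csv' INTO TABLE student PARTITION (course = 'Math');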

To insert at least 15 records into the table, you issue one INSERT per partition, naming the target
partition in the PARTITION clause:

INSERT INTO TABLE student PARTITION (course = 'Math') VALUES
(1, 'John', 'Peter', 'Mary'),
(4, 'Jessica', 'Brian', 'Catherine'),
(7, 'William', 'Richard', 'Samantha'),
(10, 'Oliver', 'Joseph', 'Grace'),
(13, 'Aiden', 'Donald', 'Madison');

INSERT INTO TABLE student PARTITION (course = 'Science') VALUES
(2, 'Kate', 'Robert', 'Lisa'),
(5, 'Daniel', 'Paul', 'Anne'),
(8, 'Sophia', 'Thomas', 'Natalie'),
(11, 'Isabella', 'Christopher', 'Ava'),
(14, 'Mia', 'Anthony', 'Abigail');

INSERT INTO TABLE student PARTITION (course = 'History') VALUES
(3, 'Michael', 'David', 'Julia'),
(6, 'Jennifer', 'James', 'Emily'),
(9, 'Benjamin', 'Charles', 'Olivia'),
(12, 'Ethan', 'Matthew', 'Emma'),
(15, 'Emily', 'George', 'Elizabeth');

These statements insert 15 records into the "student" table, five per partition, each with a unique
"reg_no" and corresponding values for "name", "father_name", and "mother_name". Because the
partitioning is static, the "course" value comes from the PARTITION clause of each INSERT rather than
from the VALUES list, and each record lands in one of the three pre-defined partitions.
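
To confirm that the partitions exist and were populated, standard HiveQL commands such as the following can be used:

SHOW PARTITIONS student;                         -- lists course=Math, course=Science, course=History
SELECT * FROM student WHERE course = 'Math';     -- reads only the Math partition

Filtering on the partition column lets Hive scan only the matching partition's directory, which is the main benefit of partitioning the table this way.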
