
14CSIT211 Big Data & Web Programming Practical

Program Educational Outcomes (PEOs)


PEO1. Graduates will be proficient in mathematics, science, and engineering concepts to solve a wide
range of CST-related problems.
PEO2. Apply current industry computing practices and emerging technologies to analyse,
design, implement, test and verify CST based solutions to real world problems.
PEO3. Engage in lifelong learning to maintain and enhance professional skills in a team environment
or individually.
Program Specific Outcomes (PSOs)
PSO1: Analyse, Design and Develop solutions in the area of Design and Analysis of Algorithms,
Compiler Construction, software engineering, operating systems, big data, and computer networks.
PSO2: Develop competence in advanced machine learning techniques for research and development
to solve real-time problems of society.
PSO3: The ability to think logically and apply standard practices and strategies in software
project development using open-ended programming environments to deliver a quality product.
Program Outcomes (POs)
PO1: Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2: Problem analysis: Identify, formulate, research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
PO3: Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate consideration for
the public health and safety, and the cultural, societal, and environmental considerations.
PO4: Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis, and synthesis of the information to provide valid
conclusions.
PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with an
understanding of the limitations.
PO6: The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the
professional engineering practice.
PO7: Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for
sustainable development.

PO8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms
of the engineering practice.
PO9: Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
PO10: Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write effective
reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11: Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and leader in
a team, to manage projects and in multidisciplinary environments.
PO12: Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

SYLLABUS
B.Tech. IV Year I Semester
14CSIT211 BIG DATA & WEB PROGRAMMING PRACTICALS
L T P C
0 0 3 2
Course Prerequisite
Computer Programming Fundamentals
Course Description
The Big Data and Web Programming practicals make students work with different ecosystems in
Hadoop and analyse different data sets. The web programming part introduces the basics of
publishing content on the World Wide Web (WWW).
Course Objectives
1. Understand Hadoop HDFS commands.
2. Learn basic MapReduce in Hadoop.
3. Understand the basics of R.
4. Learn HTML, XML, JavaScript, and JSP.
List of Experiments: Data Analytics
Week 1 Understanding Map Reduce Paradigm.
Week 2 Write a Map Reduce program to compute frequency of words in the text data set.
Week 3 Run a Map Reduce program to find maximum temperature recorded in each year.
Week 4 Write Pig Latin scripts to sort, group, join, project, and filter your data.
Week 5 Use Hive to create, alter, and drop databases.
Week 6 Data visualization using R.
Week 7 Data Analysis using R.
Week 8 Creation of college website using HTML.
Week 9 Usage of XML, Stylesheets.
Week 10 Write a JavaScript program to validate registration form.
Week 11 Write JSP Program to store student information submitted from a registration page into database
table.
Week 12 Develop a program to validate username & password that are stored in Database table using
JSP.
Week 13 Develop Payroll management system using web technologies.
Week 14 Develop Hospital management system using web technologies.
Week 15 Develop Library management system using web technologies.


Table of Contents

Week   List of Experiments

Big Data Analytics
1.  A. Installation of VMware Workstation on Ubuntu
    B. Installation of Hadoop in Ubuntu & VMware
       I. Perform setting up and installing Hadoop in its three operating modes:
          - Standalone
          - Pseudo-distributed
          - Fully distributed
       II. Configuration of Hadoop in Ubuntu
       III. Use web-based tools to monitor your Hadoop setup
2.  Basic Hadoop Commands
    Implement the following file management tasks in Hadoop:
       i. Adding files and directories
       ii. Retrieving files
       iii. Deleting files
3.  Understanding the MapReduce Paradigm
4.  Write a MapReduce program to compute the frequency of words in a text data set
5.  Run a MapReduce program to find the maximum temperature recorded in each year
6.  Pig
    A. Installation of Pig on Ubuntu
    B. Write Pig Latin scripts to sort, group, join, project, and filter your data
7.  Hive
    A. Installation of Hive on Ubuntu
    B. Use Hive to create, alter, and drop databases
8.  Data Visualization using R
9.  Data Analysis using R

Web Programming
10. Creation of College Website using HTML
11. A. Cascading Style Sheets (CSS)
    B. eXtensible Markup Language (XML)
12. Registration Form - JS Validation
13. Registration Form - JSP & MySQL
14. Login Validation - JSP & MySQL
15. Payroll Management System
16. Hospital Management System
17. Library Management System

Week 1 A
Installation of VMware Workstation on Ubuntu
The objective is to install VMware Workstation PRO on Ubuntu 20.04 Focal Fossa Linux. Then, you
can set up virtual machines with VMware Workstation Pro on a single physical machine and use them
simultaneously with the actual machine.
Prerequisites
• System: Ubuntu 20.04 installed
• Software: VMware Workstation Pro for Linux

Install VMware Workstation on Ubuntu 20.04: Step-by-Step Instructions


Step 1. First, download the VMware Workstation PRO for Linux bundle.

Step 2. Install prerequisites. Open up your terminal and execute: $ sudo apt install build-essential
Step 3. Locate the previously downloaded VMware Workstation PRO for Linux bundle file and begin
the installation. Please note that the file name might be different:
$ sudo bash VMware-Workstation-Full-15.5.1-15018445.x86_64.bundle

Step 4. Be patient. Wait for the installation to finish.

The installer shows the installation progress and reports when the installation of VMware Workstation Pro is complete.


Step 5. Use Activities to start VMware Workstation PRO.

Start VMware Workstation on Ubuntu 20.04


Step 6. Accept Licenses.

Accept VMware Workstation Licenses and follow the post-installation wizard.


Step 7. Select a user who will be able to connect to the Workstation Server.

Enter the username


Step 8. Shared virtual machines directory configuration.

Enter directory path


Step 9. Select server connection port.

Enter port number

Step 10. Choose a trial or enter a license key.

Enter license key


Step 11. Enter your password. The user must be included in the sudo group.


Step 12. VMware Workstation 15 Pro is ready to use.

Post Lab Assignment


1. Install R - https://rstudio.com/products/rstudio/download/
2. Install Pig - https://www.edureka.co/blog/apache-pig-installation
3. Install Hive - https://www.edureka.co/blog/apache-hive-installation-on-ubuntu

Reference
https://linuxconfig.org/how-to-install-vmware-workstation-on-ubuntu-20-04-focal-fossa-linux

Week 1 B
Installation of Hadoop in Ubuntu & VMWare
Prerequisites
• Mac OS / Linux / Cygwin on Windows
• VMWare
• Java Runtime Environment, Java 1.6.x recommended
Notice
• Only Ubuntu running under VMware is supported.
• ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote
Hadoop daemons.

Types of Hadoop Installation

1. Standalone mode (local, single JVM)
2. Pseudo-distributed mode (all daemons on a single machine)
3. Fully distributed mode (cluster of machines)
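As a minimal sketch, assuming Hadoop is unpacked at /usr/local/hadoop and OpenJDK 8 is installed at the path shown (adjust both to your machine), the environment variables can be set in ~/.bashrc and a standalone installation verified as follows:

# Assumed locations; adjust to your environment
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Verify the standalone installation
hadoop version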



References
[1]. The official site: http://hadoop.apache.org
[2]. Course slides & Textbooks: http://www.cs.sjtu.edu.cn/~liwujun/course/mmds.html
[3]. Michael G. Noll's Blog (a good guide): http://www.michael-noll.com/


Week 2
Basic Hadoop Commands
Prerequisites
1) A machine with Ubuntu 14.04 LTS operating system installed.
2) Apache Hadoop 2.6.4 pre-installed (How to install Hadoop on Ubuntu 14.04)
Hadoop is an Apache open-source framework written in JAVA which allows distributed processing
of large datasets across clusters of computers using simple programming models.
• Hadoop Common: the Java libraries and utilities required by the other Hadoop modules, including the scripts and files needed to start Hadoop.
• Hadoop YARN: a framework for job scheduling and cluster resource management.
• Hadoop Distributed File System (HDFS): a Java-based file system that provides scalable and reliable data storage with high-throughput access to application data.
• Hadoop MapReduce: a software framework for easily writing applications that process large amounts of data in parallel on large clusters.
• Apache Hive: a data-warehousing infrastructure for Hadoop.
• Apache Oozie: a Java application responsible for scheduling Hadoop jobs.
• Apache Pig: a data-flow platform responsible for the execution of MapReduce jobs.
• Apache Spark: an open-source framework used for cluster computing.
• Flume: an open-source aggregation service responsible for collecting and transporting data from a source to a destination.
• HBase: a column-oriented database on top of Hadoop that stores big data in a scalable way.
• Sqoop: an interface application used to transfer data between Hadoop and relational databases through commands.
Hadoop and HDFS
Hadoop is a framework designed to handle large volumes of both structured and unstructured data, while HDFS is designed to manage huge volumes of data in a simple and pragmatic way. Securing an HDFS cluster involves the following:
• Authentication
  o Define users
  o Enable Kerberos in Hadoop
  o Set up a Knox gateway to control access and authentication to the HDFS cluster
• Authorization
  o Define groups
  o Define HDFS permissions
  o Define HDFS ACLs
• Audit
  o Enable a process-execution audit trail
• Data protection
  o Enable wire encryption with Hadoop

Post Lab Activity [Ref: #2]

1. Create a directory on HDFS (Hadoop Distributed File System) in your home directory.
   • Create two more directories in a single command in your home directory.
   • List the directories created in HDFS and check in what sort order the contents are listed by default.
2. Create a sample file (e.g., sample.txt) in any of the directories created above.
   • Copy a file from the local file system to one of the directories created on HDFS (copying a file from the local file system to HDFS is called uploading a file to HDFS).
   • Verify the file upload.
   • Copy one more file from the local file system to another directory created on HDFS.
   • Copy a file from HDFS to the local file system (this is called downloading a file from HDFS to the local file system).
   • Look at the contents of the file that was uploaded to HDFS.
   • Copy the file from one directory to another directory in HDFS.
   • Move the file from one directory to another directory in HDFS.
   • Copy a file from/to the local file system to HDFS using the copyFromLocal and copyToLocal commands.
   • Append a file from the local file system to a file on HDFS.
   • Merge two file contents (files present on HDFS) into one file (this file should be present on the local file system).
A sketch of the matching shell commands is given after this list.
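The commands below are one way to carry out the tasks above with the hdfs dfs shell; the directory and file names (dir1, dir2, dir3, sample.txt, local.txt, merged.txt) are placeholders, not part of the assignment.

$ hdfs dfs -mkdir dir1                              # create a directory in your HDFS home directory
$ hdfs dfs -mkdir dir2 dir3                         # create two more directories in a single command
$ hdfs dfs -ls                                      # list them (sorted by name by default)
$ hdfs dfs -put sample.txt dir1                     # upload a local file to HDFS
$ hdfs dfs -ls dir1                                 # verify the upload
$ hdfs dfs -get dir1/sample.txt downloaded.txt      # download a file from HDFS
$ hdfs dfs -cat dir1/sample.txt                     # look at the contents of the uploaded file
$ hdfs dfs -cp dir1/sample.txt dir2                 # copy within HDFS
$ hdfs dfs -mv dir2/sample.txt dir3                 # move within HDFS
$ hdfs dfs -copyFromLocal sample.txt dir2           # copy from the local file system to HDFS
$ hdfs dfs -copyToLocal dir1/sample.txt copy.txt    # copy from HDFS to the local file system
$ hdfs dfs -appendToFile local.txt dir1/sample.txt  # append a local file to a file on HDFS
$ hdfs dfs -getmerge dir1 merged.txt                # merge HDFS files into one local file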
Reference

[1]. https://www.tutorialspoint.com/hadoop/hadoop_introduction.htm
[2]. https://www.edureka.co/community/big-data-hadoop


Week 3
Understanding Map Reduce Paradigm
Pre Requirements
1) A machine with Ubuntu 14.04 LTS operating system installed.
2) Apache Hadoop 2.6.4 pre-installed (How to install Hadoop on Ubuntu 14.04)
3) Understanding Basic Hadoop Commands (Week 2)

Hadoop

 Hadoop is an Apache open source framework written in java that allows distributed processing of
large datasets across clusters of computers using simple programming models.
 The Hadoop framework application works in an environment that provides distributed storage and
computation across clusters of computers. Hadoop is designed to scale up from single server to
thousands of machines, each offering local computation and storage

MapReduce

MapReduce is a programming framework that allows us to perform distributed and parallel processing
on large data sets in a distributed environment.

• A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage.
  o Map stage − The map or mapper's job is to process the input data. Generally the input data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.
  o Reduce stage − This stage is the combination of the shuffle stage and the reduce stage. The Reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in HDFS.
• During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
• The framework manages all the details of data passing, such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes.
• Most of the computing takes place on nodes with data on local disks, which reduces the network traffic.
• After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result, and sends it back to the Hadoop server.

14
14CSIT211 Big Data & Web Programming Practical

Inputs and Outputs (Java Perspective)


• The MapReduce framework operates on <key, value> pairs; that is, the framework views the input to the job as a set of <key, value> pairs and produces a set of <key, value> pairs as the output of the job, conceivably of different types.
• The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework. The input and output types of a MapReduce job are: (Input) <k1, v1> → map → <k2, v2> → reduce → <k3, v3> (Output).
           Input                 Output
Map        <k1, v1>              list(<k2, v2>)
Reduce     <k2, list(v2)>        list(<k3, v3>)

• PayLoad − Applications implement the Map and Reduce functions and form the core of the job.
• Mapper − Maps the input key/value pairs to a set of intermediate key/value pairs.
• NameNode − Node that manages the Hadoop Distributed File System (HDFS).
• DataNode − Node where data is present in advance before any processing takes place.
• MasterNode − Node where the JobTracker runs and which accepts job requests from clients.
• SlaveNode − Node where the Map and Reduce programs run.
• JobTracker − Schedules jobs and tracks the jobs assigned to the TaskTracker.
• TaskTracker − Tracks the task and reports status to the JobTracker.
• Job − An execution of a Mapper and Reducer across a dataset.
• Task − An execution of a Mapper or a Reducer on a slice of data.
• Task Attempt − A particular instance of an attempt to execute a task on a SlaveNode.

15
14CSIT211 Big Data & Web Programming Practical
Example Scenario
Given below is the data regarding the electrical consumption of an organization. It contains the monthly
electrical consumption and the annual average for various years.

If the above data is given as input, we have to write applications to process it and produce results such as
finding the year of maximum usage, the year of minimum usage, and so on. This is straightforward for
programmers when the number of records is small.
Input Data
The above data is saved as sample.txt and given as input.

A. Write the Java Program File Name: ProcessUnits.java
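One possible shape for ProcessUnits.java, written against the older org.apache.hadoop.mapred API also used in Week 5, is sketched below. It assumes each input line holds a year followed by twelve monthly readings and an annual average, separated by whitespace; the class names and field positions are illustrative rather than the official solution.

// Sketch only: assumes whitespace-separated input "year m1 ... m12 avg".
// Add "package hadoop;" at the top if you follow Step 7, which runs the class as hadoop.ProcessUnits.
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class ProcessUnits {
  // Mapper: emits (year, monthly consumption) for every monthly reading on the line
  public static class E_EMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String[] parts = value.toString().trim().split("\\s+");
      String year = parts[0];
      // skip the last column (annual average), keep the monthly values
      for (int i = 1; i < parts.length - 1; i++) {
        output.collect(new Text(year), new IntWritable(Integer.parseInt(parts[i])));
      }
    }
  }

  // Reducer (also used as combiner): keeps the maximum consumption seen for each year
  public static class E_EReduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int max = Integer.MIN_VALUE;
      while (values.hasNext()) {
        max = Math.max(max, values.next().get());
      }
      output.collect(key, new IntWritable(max));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ProcessUnits.class);
    conf.setJobName("max_electricity_units");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(E_EMapper.class);
    conf.setCombinerClass(E_EReduce.class);
    conf.setReducerClass(E_EReduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // output directory in HDFS
    JobClient.runJob(conf);
  }
}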


B. Compilation and Execution of Process Units Program
Step 1. Create a directory to store the compiled Java classes: $ mkdir units
Step 2. Download hadoop-core-1.2.1.jar (from mvnrepository.com), which is used to compile and execute
the MapReduce program. Assume the Hadoop user's home directory is /home/hadoop.
Step 3. Compile ProcessUnits.java and create a jar for the program:
$ javac -classpath hadoop-core-1.2.1.jar -d units ProcessUnits.java
$ jar -cvf units.jar -C units/ .
Step 4. Create an input directory in HDFS $HADOOP_HOME/bin/hadoop fs -mkdir input_dir
Step 5. Copy the input file named sample.txt in the input directory of HDFS
$HADOOP_HOME/bin/hadoop fs -put /home/hadoop/sample.txt input_dir
Step 6. Verify the files in the input directory $HADOOP_HOME/bin/hadoop fs -ls input_dir/
Step 7. Run the Eleunit_max application, taking the input files from the input directory:
$HADOOP_HOME/bin/hadoop jar units.jar hadoop.ProcessUnits input_dir output_dir
Then, the output will contain the number of input splits, the number of Map tasks, the number of
reducer tasks, etc.
Step 8. Verify the resultant files in the output folder.
$HADOOP_HOME/bin/hadoop fs -ls output_dir/
Step 9. See the output in the part-00000 file generated by the MapReduce job.
$HADOOP_HOME/bin/hadoop fs -cat output_dir/part-00000
Expected Output
1981 34
1984 40
1985 45

Step 10. Copy the output folder from HDFS to the local file system for analysing.
Reference

[1]. https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

Week 4

Write a Map Reduce program to compute frequency of words in the text data
set.

Pre Requirements
1) Understanding Basic Hadoop Commands (Week 2)
2) MapReduce Paradigm (Week 3)
Problem Description
Let us understand how MapReduce works by taking an example where we have a text file called input.txt
whose contents are as follows:
Dear, Bear, River, Car, Car, River, Deer, Car and Bear
Now, suppose we have to perform a word count on input.txt using MapReduce. So, we will be
finding the unique words and the number of occurrences of those unique words.

• First, we divide the input into three splits, one per line. This distributes the work among all the map nodes.
• Then, we tokenize the words in each of the mappers and give a hardcoded value (1) to each of the tokens or words. The rationale behind giving a hardcoded value equal to 1 is that every word, in itself, occurs once.
• Now, a list of key-value pairs is created where the key is nothing but the individual words and the value is one. So, for the first line (Dear Bear River) we have 3 key-value pairs: Dear, 1; Bear, 1; River, 1. The mapping process remains the same on all the nodes.
• After the mapper phase, a partition process takes place where sorting and shuffling happen so that all the tuples with the same key are sent to the corresponding reducer.
• So, after the sorting and shuffling phase, each reducer will have a unique key and a list of values corresponding to that very key. For example, Bear, [1,1]; Car, [1,1,1]; etc.
• Now, each reducer counts the values present in its list of values. For example, the reducer gets the list of values [1,1] for the key Bear; it counts the number of ones in the list and gives the final output as Bear, 2.
• Finally, all the output key/value pairs are collected and written to the output file.
The entire MapReduce program can be fundamentally divided into three parts:
a. Mapper Phase b. Reducer Phase c. Driver

Mapper Phase
• We define the data types of the input and output key/value pairs after the class declaration using angle brackets.
• Both the input and the output of the Mapper are key/value pairs.
• Input
  o The key is the offset of each line in the text file: LongWritable
  o The value is each individual line: Text
• Output
  o The key is the tokenized word: Text
  o The value is the hardcoded value, which in our case is 1: IntWritable
  o Example: Dear 1, Bear 1, etc.
• We have written Java code in which we tokenize each word and assign it a hardcoded value equal to 1.
• We have created a class Reduce which extends the class Reducer, like that of the Mapper.
Reduce Phase
• We define the data types of the input and output key/value pairs after the class declaration using angle brackets, as done for the Mapper.
• Both the input and the output of the Reducer are key-value pairs.
• Input
  o The key is nothing but the unique words generated after the sorting and shuffling phase: Text
  o The value is a list of integers corresponding to each key: IntWritable
  o Example: Bear, [1, 1], etc.
• Output
  o The key is each unique word present in the input text file: Text
  o The value is the number of occurrences of that unique word: IntWritable
  o Example: Bear, 2; Car, 3, etc.
• We aggregate the values present in each list corresponding to each key and produce the final answer.
• In general, a single reducer is created for each of the unique words, but you can specify the number of reducers in mapred-site.xml.
Driver Phase
• In the driver class, we set the configuration of our MapReduce job to run in Hadoop.
• We specify the name of the job and the data types of the input/output of the mapper and reducer.
• We also specify the names of the mapper and reducer classes.
• The paths of the input and output folders are also specified.
• The method setInputFormatClass() is used to specify how a Mapper will read the input data, i.e., what the unit of work is. Here, we have chosen TextInputFormat so that a single line is read by the mapper at a time from the input text file.

Step 1. Add all Hadoop jar files to your Java project:

/usr/local/hadoop/share/hadoop/common/*.jar
/usr/local/hadoop/share/hadoop/common/lib/*.jar
/usr/local/hadoop/share/hadoop/mapreduce/*.jar
/usr/local/hadoop/share/hadoop/mapreduce/lib/*.jar
/usr/local/hadoop/share/hadoop/yarn/*.jar
/usr/local/hadoop/share/hadoop/yarn/lib/*.jar

Step 2. Write the Java Program with the File Name : WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(Object key, Text value, Context context) throws IOException,
InterruptedException
{
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}
public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException,
InterruptedException {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
result.set(sum);
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);

job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/user/hduser/input"));
FileOutputFormat.setOutputPath(job, new Path ("hdfs://localhost:9000/user/hduser/output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Step 3. Change the directory to /usr/local/hadoop/sbin
$ cd /usr/local/hadoop/sbin

Step 4. Start all hadoop daemons $ start-all.sh


Step 5. Create input.txt file in /home/hduser/Desktop/hadoop/ directory
Step 6. Add following lines to input.txt file.
Deer Beer River
Car Car River
Deer Car Bear
Step 7. Make a new input directory in HDFS
$ hdfs dfs -mkdir /user/hduser/input

Step 8. Copy the input.txt from local file system to HDFS


$ hdfs dfs -copyFromLocal /home/hduser/Desktop/hadoop/input.txt /user/hduser/input

Step 9. Run your WordCount program by submitting java project jar file to hadoop. Creating jar file is
left to you.
$ hadoop jar /path/wordcount.jar WordCount
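One possible way to compile and package the class from the command line is sketched below; the build/ directory name is arbitrary, and the jar can also be exported from your IDE instead.

$ javac -classpath $(hadoop classpath) -d build WordCount.java
$ jar -cvf wordcount.jar -C build/ .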

Step 10. Now you can see the output files.


$ hdfs dfs -cat /user/hduser/output/part-r-00000

Step 11. Don't forget to stop the Hadoop daemons ($ stop-all.sh).


Expected Output
Bear 1
Beer 1
Car 3
Deer 2
River 2

Reference
[1]. http://hadoop.praveendeshmane.co.in/hadoop/hadoop-wordcount-java-example.jsp

[2]. https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm

Week 5
Run a Map Reduce program to find Maximum Temperature recorded in each year.

Problem Description
The input for our program is weather data files for each
year. This weather data is collected by National Climatic Data
Center (NCDC) from weather sensors at all over the world. You
can find weather data for each year from
ftp://ftp.ncdc.noaa.gov/pub/data/noaa/. All files are zipped by year
and the weather station. For each year, there are multiple files for
different weather stations.

Dataset Description

If we consider the details mentioned in the file, each file has entries that look like this:
0029029070999991905010106004+64333+023450FM-12+000599999V0202301N008219999999N0000001N9-
01391+99999102641ADDGF102991999999999999999999

When we consider the highlighted fields, the first one (029070) is the USAF weather station identifier.
The next one (19050101) represents the observation date. The third highlighted field (-0139) represents
the air temperature in Celsius times ten, so this reading equates to -13.9 degrees Celsius. The
next highlighted item indicates the reading quality code.
Implementation
MapReduce is based on set of key value pairs. So first we have to decide on the types for the key/value
pairs for the input.
Map Phase: The input for Map phase is set of weather data files as shown in snap shot. The types of input
key value pairs are LongWritable and Text and the types of output key value pairs are Text and
IntWritable. Each Map task extracts the temperature data from the given year file. The output of the map
phase is set of key value pairs. Set of keys are the years. Values are the temperature of each year.
Reduce Phase: Reduce phase takes all the values associated with a particular key. That is all the
temperature values belong to a particular year is fed to a same reducer. Then each reducer finds the highest
recorded temperature for each year. The types of output key value pairs in Map phase is same for the types
of input key value pairs in reduce phase (Text and IntWritable). The types of output key value pairs in
reduce phase is too Text and IntWritable.
• HighestMapper.java
• HighestReducer.java
• HighestDriver.java

Step1: Write the following Source Code in Java Editor

HighestMapper.java
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class HighestMapper extends MapReduceBase implements Mapper<LongWritable, Text,
Text, IntWritable>
{
public static final int MISSING = 9999;
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException
{
String line = value.toString();
String year = line.substring(15,19);
int temperature;
if (line.charAt(87)=='+')
temperature = Integer.parseInt(line.substring(88, 92));
else
temperature = Integer.parseInt(line.substring(87, 92));
String quality = line.substring(92, 93);
if(temperature != MISSING && quality.matches("[01459]"))
output.collect(new Text(year),new IntWritable(temperature));
}
}

HighestReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class HighestReducer extends MapReduceBase implements Reducer<Text, IntWritable,
Text, IntWritable>
{
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable>
output, Reporter reporter) throws IOException
{
int max_temp = Integer.MIN_VALUE;
while (values.hasNext())
{
int current=values.next().get();
if ( max_temp < current)
max_temp = current;
}
output.collect(key, new IntWritable(max_temp/10));
}
}

HighestDriver.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class HighestDriver extends Configured implements Tool{
public int run(String[] args) throws Exception
{
JobConf conf = new JobConf(getConf(), HighestDriver.class);
conf.setJobName("HighestDriver");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(HighestMapper.class);
conf.setReducerClass(HighestReducer.class);
Path inp = new Path(args[1]);
Path out = new Path(args[2]);
FileInputFormat.addInputPath(conf, inp);
FileOutputFormat.setOutputPath(conf, out);
JobClient.runJob(conf);
return 0;
}
public static void main(String[] args) throws Exception
{
int res = ToolRunner.run(new Configuration(), new HighestDriver(),args);
System.exit(res);
}
}

Step 2 : Build jar file using any IDE like netbeans or eclipse
Note: add all the jars in the folder /usr/local/hadoop/share/hadoop/common and
/usr/local/hadoop/share/hadoop/mapreduce folders to your project in IDE

Step 3: Copy input files from local system to HDFS


hduser@VSK:/usr/local/hadoop/sbin$ hadoop fs -copyFromLocal /home/mkp/Desktop/1901 /HighestTemp/input
(OR)
hduser@VSK:/usr/local/hadoop/sbin$ hadoop fs -copyFromLocal /home/mkp/Desktop/190* /HighestTemp/input

Step 4: Executing jar file


hduser@VSK:/usr/local/hadoop/sbin$ hadoop jar /home/mkp/Desktop/HighestDriver.jar HighestDriver
/HighestTemp/input/ /HighestTemp/output
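To inspect the result after the job finishes, the output directory can be read back from HDFS (with the older mapred API the result file is typically named part-00000):

hduser@VSK:/usr/local/hadoop/sbin$ hadoop fs -cat /HighestTemp/output/part-00000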
Expected Output
1901 45
1902 40
1903 46

Reference
[1]. https://sites.google.com/site/sraochintalapudi/big-data-analytics/hadoop-mapreduce-programs

Week 6A
Installation of Pig
Prerequisite
• VMware
• Web browser
• 4 GB RAM
• 80 GB hard disk

Description
Apache Pig is a platform for creating and executing MapReduce programs on the large data sets used in
Hadoop, aimed at users who are not familiar with Java. Pig Latin is its high-level, platform-independent
language. Programs written in Pig Latin can be run on any platform, even over the distributed
environment of the Hadoop Distributed File System (HDFS). Like other languages, a Pig script has its
own syntax and semantics.
Steps to be followed for installation of Apache Pig on Linux

Step 1. Download Pig tar file


wget http://www-us.apache.org/dist/pig/pig-0.16.0/pig-0.16.0.tar.gz
Step 2. Extract the tar file using the tar command. In the command below, x means extract an archive
file, z means filter the archive through gzip, and f means the filename of the archive file.
tar -xzf pig-0.16.0.tar.gz
ls
Step 3: Edit the ".bashrc" file to update the environment variables of Apache Pig. We set these so that
Pig can be run from any directory, without changing into the Pig directory to execute Pig commands. Also,
if any other application is looking for Pig, it will find the path of Apache Pig in this file.
sudo gedit .bashrc
Add the following at the end of the file:
# Set PIG_HOME
export PIG_HOME=/home/edureka/pig-0.16.0
export PATH=$PATH:/home/edureka/pig-0.16.0/bin
export PIG_CLASSPATH=$HADOOP_CONF_DIR
Also, make sure that the Hadoop path is set.
Step 4. Run the command below so that the changes take effect in the same terminal: source .bashrc
Step 5: Check the Pig version: pig -version
Step 6: Run Pig to start the Grunt shell, which is used to run Pig Latin scripts: pig

Execution modes in Apache Pig


• MapReduce mode (the default, runs on a Hadoop cluster): pig or pig -x mapreduce
• Local mode (single machine): pig -x local

Reference
[1]. https://www.edureka.co/blog/apache-pig-installation


Week 6B
Basic Pig Latin Script Commands

1. fs: This will list all the files in HDFS.


grunt> fs -ls

2. Clear: This will clear the interactive Grunt shell.


grunt> clear

3. History: This command shows the commands executed so far.


grunt> history

4. Reading Data: Assuming the data resides in HDFS, we need to read the data into Pig.
grunt> college_students = LOAD 'hdfs://localhost:9000/pig_data/college_data.txt' USING PigStorage(',')
as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
5. Storing Data: The STORE operator is used to store the processed/loaded data.
grunt> STORE college_students INTO 'hdfs://localhost:9000/pig_Output/' USING PigStorage(',');
Here, "/pig_Output/" is the directory where the relation needs to be stored.
6. Dump Operator: This command is used to display the results on screen. It usually helps in
debugging.
grunt> Dump college_students;
7. Describe Operator: It helps the programmer to view the schema of the relation.
grunt> describe college_students;
8. Explain: This command helps to review the logical, physical and map-reduce execution plans.
grunt> explain college_students;
9. Illustrate operator: This gives step-by-step execution of statements in Pig Commands.
grunt> illustrate college_students;

Reference
[1]. https://www.educba.com/pig-commands/

Week 6C
Write Pig Latin scripts to sort, group, join, project, and filter your data
1. Group: This command works towards grouping data with the same key.
grunt> group_data = GROUP college_students by firstname;

2. Cogroup: It works similarly to the group operator. The main difference between Group & Cogroup
operator is that group operator usually used with one relation, while cogroup is used with more than one
relation.
3. Join: This is used to combine two or more relations.
• Example: In order to perform a self-join, let's say the relation "customer" is loaded from HDFS into Pig
in two relations, customers1 and customers2.
grunt> customers3 = JOIN customers1 BY id, customers2 BY id;

• A join could be a self-join, an inner join, or an outer join.


4. Cross: This pig command calculates the cross product of two or more relations.
grunt> cross_data = CROSS customers, orders;

5. Union: It merges two relations. The condition for merging is that both the relation’s columns and
domains must be identical.
grunt> student = UNION student1, student2;

6. Filter: This helps in filtering out the tuples out of relation, based on certain conditions.
grunt> filter_data = FILTER college_students BY city == 'Chennai';

7. Distinct: This helps in removal of redundant tuples from the relation.


grunt> distinct_data = DISTINCT college_students;

• This filtering will create a new relation named "distinct_data".


8. Foreach: This helps in generating data transformation based on column data.
grunt> foreach_data = FOREACH student_details GENERATE id,age,city;

• This will get the id, age, and city values of each student from the relation student_details and store
them in another relation named foreach_data.
9. Order by: This command displays the result in a sorted order based on one or more fields.
grunt> order_by_data = ORDER college_students BY age DESC;

• This will sort the relation "college_students" in descending order by age.


10. Limit: This command gets limited no. of tuples from the relation.
grunt> limit_data = LIMIT student_details 4;
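Putting these operators together, the short script below is a sketch of an end-to-end Pig Latin job that loads, filters, groups, projects, and sorts the college_students data introduced in Week 6B; the HDFS path and city value are the ones used there.

grunt> college_students = LOAD 'hdfs://localhost:9000/pig_data/college_data.txt' USING PigStorage(',')
       as ( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
grunt> chennai_students = FILTER college_students BY city == 'Chennai';
grunt> by_city = GROUP college_students BY city;
grunt> city_counts = FOREACH by_city GENERATE group AS city, COUNT(college_students) AS total;
grunt> sorted_counts = ORDER city_counts BY total DESC;
grunt> DUMP sorted_counts;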

Reference
[1]. https://www.educba.com/pig-commands/


Week 7A
Installation of HIVE
Prerequisites
• Java installation - $ java -version
• Hadoop installation - $ hadoop version

Step by Step Instructions of Hive Installation


Step 1: Download the tar file.
http://apachemirror.wuchna.com/hive/stable-2/apache-hive-2.3.6-bin.tar.gz

Step 2: Extract the file.


sudo tar zxvf ~/Downloads/apache-hive-* -C /usr/local

Step 3: Move apache files to /usr/local/hive directory.


sudo mv /usr/local/apache-hive-* /usr/local/hive

Step 4: Set up the Hive environment by appending the following lines to the ~/.bashrc file:
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.
Step 5: Execute the bashrc file.
$ source ~/.bashrc

Step 6: Hive configuration. Create hive-env.sh from its template:

$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh

Step 7: Edit the hive-env.sh file and append the following line:

export HADOOP_HOME=/usr/local/hadoop

Step 8: Verify whether Hive is installed by running hive --version. If the hive command opens the Hive
shell, Hive is installed; an older version may print a warning.

Reference
[1]. https://www.educba.com/hive-installation/

Week 7B
Use Hive to create, alter, and drop databases
Prerequisites
• VMware
• XAMPP Server
• Web browser
Databases in Hive
The Hive concept of a database is essentially just a catalog or namespace of tables. If you don’t specify
a database, the default database is used.
The location for external Hive tables is "/warehouse/tablespace/external/hive/" and the location for
managed tables is "/warehouse/tablespace/managed/hive".
1. Creating a Database
hive> CREATE DATABASE MITS;
hive> CREATE DATABASE IF NOT EXISTS MITS;
hive> SHOW DATABASES;
hive> use MITS;
hive (MITS) > USE default;
2. Alter a Database
hive> ALTER DATABASE MITS SET DBPROPERTIES ('edited-by' = 'Sys_USER');
Creating Tables
We can broadly classify our table requirement in two different ways;
1. Hive internal table
2. Hive external table

CREATE TABLE IF NOT EXISTS MITS.employees ( name STRING COMMENT 'Employee name', salary FLOAT
COMMENT 'Employee salary', subordinates ARRAY<STRING> COMMENT 'Names of subordinates', deductions
MAP<STRING, FLOAT> COMMENT 'Keys are deductions names, values are percentages', address
STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address') COMMENT 'Description
of the table' TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...) LOCATION
'/user/hive/warehouse/MITS.db/employees';
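The statement above creates a managed (internal) table. For comparison, the sketch below shows how an external table can be declared; the table name, columns, and location are illustrative. Dropping an external table removes only its metadata and leaves the files at the LOCATION untouched.

hive> CREATE EXTERNAL TABLE IF NOT EXISTS MITS.employees_ext (
        name STRING,
        salary FLOAT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/user/hive/external/employees';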

hive> USE MITS;


hive> SHOW TABLES;
hive> DESCRIBE EXTENDED MITS.employees;

ALTER TABLE name RENAME TO new_name;


ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...]);
ALTER TABLE name DROP [COLUMN] column_name;
ALTER TABLE name CHANGE column_name new_name new_type;
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...]);
3. Drop Database
hive> DROP DATABASE IF EXISTS MITS;
hive> DROP DATABASE IF EXISTS MITS CASCADE;
Reference
[1]. https://www.educba.com/hive-table

Week 8
Data Visualization using R
R programming provides comprehensive sets of tools such as in-built functions and a wide range of
packages to perform data analysis, represent data and build visualizations.
Data visualization in R can be performed in the following ways:

• Base Graphics
• Grid Graphics
• Lattice Graphics
• ggplot2

Data Visualization in R with ggplot2 package


ggplot2 is one of the most sophisticated packages in R for data visualization, and it helps create the
most elegant and versatile print-quality plots with minimal adjustments. It is very simple to create
single- and multivariable graphs with the help of the ggplot2 package.

The ggplot2 grammar of graphics is composed of the following:


• Data
• Layers
• Scales
• Coordinates
• Faceting
• Themes

The three basic components to build a ggplot are as follows:


• Data: Dataset to be plotted
• Aesthetics: Mapping of data to visualization
• Geometry/Layers: Visual elements used for the data

#To install and load the ggplot2 package
install.packages("ggplot2")
library(ggplot2)
#Use the mtcars dataset from the datasets package in R, which can be loaded as follows:
#To load the datasets package
library("datasets")
#To load the mtcars dataset
data(mtcars)
#To analyze the structure of the dataset
str(mtcars)

1. Scatter Plots
#Since the following columns have a discrete (categorical) set of values, we convert them to factors
for optimal plotting
mtcars$am <- as.factor(mtcars$am)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$gear <- as.factor(mtcars$gear)
#To draw a scatter plot

ggplot(mtcars, aes(x= cyl , y= vs)) + geom_point()

#Here the width argument is used to set the amount of jitter


ggplot(mtcars, aes(x= cyl , y= vs)) + geom_jitter(width = 0.1)

#We use the color aesthetic to introduce a third variable, with a legend on the right side
ggplot(mtcars, aes(x= cyl,y= vs,color = am)) + geom_jitter(width = 0.1, alpha = 0.5)

#To add the labels


ggplot(mtcars, aes(x= cyl , y= vs ,color = am)) + geom_jitter(width = 0.1, alpha = 0.5) +
labs(x = "Cylinders",y = "Engine Type", color = "Transmission(0 = automatic, 1 = manual)")


#To plot with shape = 1 and size = 4


ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) + geom_point(size = 4, shape = 1, alpha = 0.6) +
labs(x = "Weight",y = "Miles per Gallon", color = "Cylinders")

2. Bar Plots
To draw a bar plot of cyl (number of cylinders) according to the transmission type, we use geom_bar()
with the fill aesthetic:
ggplot(mtcars, aes(x = cyl, fill = am)) + geom_bar() + labs(x = "Cylinders", y = "Car count", fill =
"Transmission")

3. Histograms
#To plot a histogram for mpg (miles per gallon), according to cyl (number of cylinders), we use the
geom_histogram() function
ggplot(mtcars, aes(mpg, fill = cyl)) + geom_histogram(binwidth = 1) + theme_bw() + labs(title = "Miles
per Gallon by Cylinders", x = "Miles per Gallon", y = "Count", fill = "Cylinders")


4. Boxplot
#To draw a box plot
ggplot(mtcars, aes(x = cyl,y = mpg)) + geom_boxplot(fill = "cyan", alpha = 0.5) + theme_bw() +
labs(title = "Cylinder count vs Miles per Gallon",x = "Cylinders", y = "Miles per Gallon")
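To keep a copy of any of the plots above, the most recent plot can be written to disk with ggsave(); the file name below is just an example.

#To save the last plot to a file in the working directory
ggsave("mpg_by_cyl.png", width = 6, height = 4, dpi = 300)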

Reference

[1]. https://intellipaat.com/blog/tutorial/r-programming/data-visualization-in-r/

Week 9
Data Analysis using R
Reading different types of data sets (.txt, .csv) from the web and from disk, and writing files to a
specific disk location.

library(utils)
data <- read.csv("D:/input.csv")
data

Output

ID Name DOB
1 Saran 01-01-2001
2 Ramu 11-11-1999
3 Raju 10-10-2010

data<- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

Output

[1] TRUE
[1] 3
[1] 3
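Reading a CSV file directly from the web and writing a data frame back to a specific disk location work the same way; the URL and output path below are placeholders.

# Read a CSV file from a web URL (placeholder address)
web_data <- read.csv("https://example.com/data/input.csv")
# Write a data frame to a specific location on disk
write.csv(web_data, "D:/output/result.csv", row.names = FALSE)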

Reading Excel data sheet in R

install.packages("xlsx")
library("xlsx")
data<- read.xlsx("input.xlsx", sheetIndex = 1)
data

Output

ID Name DOB
1 Saran 01-01-2001
2 Ramu 11-11-1999
3 Raju 10-10-2010

Reading XML dataset in R
install.packages("XML")
library("XML")
library("methods")
result<- xmlParse(file = "input.xml")
result

Output

1
Saran
01-01-2001

2
Ramu
11-11-1999

3
Raju
10-10-2010

Reference
[1]. https://www.iare.ac.in/sites/default/files/lab1/IARE_DS_LABORATORY_LAB_MANUAL_0.pdf

Week 10
Creation of College Website using HTML
Description
1. Home page: the static home page must contain three frames.
2. Top frame: the logo and college name, and links to the home page, login page, registration page,
catalogue page, and cart page.
3. Left frame: at least four links for navigation, which will display the catalogue of the respective links.
4. Right frame: the pages for the links in the left frame must be loaded here; initially it contains the
description of the website.
HOMEPAGE.HTML
<html>
<frameset rows="25%,*">
<frame src="topframe.html" name="f1">
<frameset cols="20%,*">
<frame src="leftframe.html" name="f2">
<frame src="rightframe.html" name="f3">
</frameset>
</frameset>
</html>
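The home page above also loads topframe.html. One possible topframe.html matching the description (logo, college name, and navigation links) is sketched below; the linked page names such as login.html and catalogue.html are placeholders.

TOPFRAME.HTML
<html>
<body bgcolor="lightyellow">
<h1 align="center">Madanapalle Institute of Technology &amp; Science</h1>
<p align="center">
<a href="homepage.html" target="_top">Home</a> |
<a href="login.html" target="f3">Login</a> |
<a href="registration.html" target="f3">Registration</a> |
<a href="catalogue.html" target="f3">Catalogue</a> |
<a href="cart.html" target="f3">Cart</a>
</p>
</body>
</html>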

LEFTFRAME.HTML
<html>
<body>
<a href="C:\Users\user\Desktop\Web\cse.html"
target="f3"><h3>CSE</h3></a><br><br><br><br><br>
<a href=ece.html target="f3"><h3>ECE</h3></a><br><br><br><br><br>
<a href=eee.html target="f3"><h3>EEE</h3></a><br><br><br><br><br>
<a href=mech.html target="f3"><h3>MECH</h3></a><br><br><br><br><br>
<a href=civil.html target="f3"><h3>Civil</h3></a>
</body>
</html>

RIGHTFRAME.HTML
<html>
<body bgcolor="pink">
<p>
<h2 align="center"><font face="times new roman" color="green"
>Madanapalle Institute of Technology & Science</font></h2>
<h3>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<font face="monotype corsiva"
color=blue>
Madanapalle Institute of Technology & Science is established in 1998
in the picturesque and pleasant environs of Madanapalle and is ideally
located on a sprawling 30 acre campus on Madanapalle - Anantapur Highway
(NH-205) near Angallu, about 10km away from Madanapalle.MITS,
originated under the auspices of RatakondaRanga Reddy Educational
Academy under the proactive leadership of Sri. N. Krishna Kumar M.S.

(U.S.A), President and Dr. N. VijayaBhaskarChoudary, Ph.D., Secretary
& Correspondent of the Academy.
</font></h3>
<img src="C:\Users\user\Desktop\web\stanford.jpg">
</p>
</body>
</html>

CSE.HTML
<html>
<body>
<img src="C:\Users\user\Desktop\\Web\cse.jpeg">
<p><b>
The Department of Computer Science & Engineering offers 4 year degree,
which is established in the year 1998. The course is flexible and has
been structured to meet the evolving needs of the IT industry. The
Department is offering M.Tech - (C.S.E) from the academic year 2007 -
2008.
</p></b><br/>
<h2><b>Vision<br/><b></h2>
To excel in technical education and research in area of Computer Science
& Engineering and to provide expert, proficient and knowledgeable
individuals with high enthusiasm to meet the Societal challenges
<h2><b>Mission<br/><b></h2>
M1: To provide an open environment to the students and faculty that
promotes professional and personal growth.
M2: To impart strong theoretical and practical background across the
computer science discipline with an emphasis on software development
and research.
M3: To inculcate the skills necessary to continue their education after
graduation, as well as for the societal needs.</p>Head of the Department
</body>
</html>

CIVIL.HTML
<html>
<body>
<img src="C:\Users\user\Desktop\Web\civil.jpg">
<p><b>The Department of Civil Engineering is started in the year
2014.The Department offers 4 years B.Tech programme. </b><br/>
<h2><b>Vision<br/><b></h2>
To perpetually generate quality human resource in civil engineering who
can contribute constructively to the technological and socio-economic
development of the Nation.
<h2><b>Mission<br/><b></h2>
M1: To impart high quality education to enable students to face the
challenges in the fields of Electronics and Communication Engineering.
M2: To provide facilities, infrastructure, environment to develop the
spirit of innovation, creativity, and research among students and
faculty.
M3: To inculcate ethical, moral values and lifelong learning skills in
students to address the societal needs.
</p>Head of the Department
</body>
</html>

ECE.HTML
<html>
<body>
<img src="C:\Users\user\Desktop\web\ece.jpg">
<p><b>Department of Electronics & Communication started functioning
from the academic year 1998 for B. Tech course. The department has
distinguished faculty, Most of them holding M.Tech degrees. The website
will provide one more channel of communication between the department
and students and will help in faster dissemination of information.
</p>Head of the Department
</body>
</html>

EXPECTED OUTPUT

Week 11A
Cascading Style Sheets (CSS)

Description

Cascading Style Sheets (CSS) is used to format the layout of a webpage.

With CSS, you can control the color, font, and the size of text, the spacing between elements, how
elements are positioned and laid out, what background images or background colors are to be used,
different displays for different devices and screen sizes, and more.

CSS can be added to HTML documents in 3 ways:

• Inline - by using the style attribute inside HTML elements
• Internal - by using a <style> element in the <head> section
• External - by using a <link> element to link to an external CSS file

Procedure
1. Use different font styles
2. Set background image for both the page and single elements on page.
3. Control the repetition of image with background-repeat property
4. Define style for links as a:link, a:active, a:hover, a:visited
5. Add customized cursors for links.
6. Work with layers (items 2-6 are illustrated in the CSS sketch after the example files).
Inline.html
<html>
<head>
<title>Inline Style Sheet</title>
</head>
<body>
<p>This is Simple Text</p>
<p style="font-size:30pt;font-family:Script;color:red">This is Text is
Different</p>
<p style="font-size:40pt;font-family:Arial;color:green">This is Text
is Different</p>
</body>
</html>
Embedded.html
<html>
<head>
<title>Embedded Style Sheet</title>
<style type="text/css">
h1
{
font-family:Arial;
}
h2
{
font-family:Script;
color:red;
left:20px;
}
h3
{
font-family:Monotype Corsiva;
color:blue;
}
</style>
</head>
<body>
<h1>This is Simple Text</h1>
<h2>This is Text is Different</h2>
<h3>This is Text is Different</h3>
</body>
</html>
External.html
<html>
<head>
<title>External Style Sheet</title>
<link rel="stylesheet" type="text/css"
href="C:\Users\SBCEC\Desktop\News MITS\New Output\Ex1.css"/>
</head>
<body>
<h1>This is Simple Text</h1>
<h2>This is Text is Different</h2>
<h3>This is Text is Different</h3>
</body>
</html>
Ex1(styles).css
h1
{
font-family:Arial;
}
h2
{
font-family:Script;
color:red;
left:20px;
}
h3
{
font-family:Monotype Corsiva;
color:blue;
}
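Procedure items 2-6 (backgrounds, link styles, cursors, and layers) are not covered by the three files above; the stylesheet rules below are a small sketch of how they could be added, with bg.jpg as a placeholder image name.

/* Background image for the whole page, shown once (no repetition) */
body
{
background-image: url("bg.jpg");
background-repeat: no-repeat;
}
/* Link states with a customized cursor on hover */
a:link { color: blue; }
a:visited { color: purple; }
a:hover { color: red; cursor: crosshair; }
a:active { color: green; }
/* A simple layer using absolute positioning and z-index */
div.layer1
{
position: absolute;
top: 50px;
left: 50px;
z-index: 2;
}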
EXPECTED OUTPUT

Week 11B
eXtensible Markup Language (XML)
Problem Description
1) Write an XML file which displays the book details that includes the following:
 Title of book
 Author name
 Edition
 Price
2) Write a DTD to validate the above XML file and display the details in a table (to do this use
XSL).
Description
eXtensible Markup Language (XML) is an open standard providing the means to share data and
information between computers and computer programs as unambiguously as possible. Once transmitted,
it is up to the receiving computer program to interpret the data for some useful purpose thus turning the
data into information. Sometimes the data will be rendered as HTML. Other times it might be used to
update and/or query a database. Originally intended as a means for Web publishing, the advantages of
XML have proven useful for things never intended to be rendered as Web pages.

Procedure
1. Just about every browser can open an XML file.
2. In Chrome, just open a new tab and drag the XML file over.
3. Alternatively, right click on the XML file and hover over "Open with" then click "Chrome".
4. When you do, the file will open in a new tab

BOOKSTORE.XSL
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body bgcolor="yellow">
<center>Book Details</center>
<hr width="50%"/>
<table border="15" align="center">
<tr bgcolor="blue">
<th> TITLE </th>
<th> AUTHOR </th>
<th> YEAR </th>
<th> PRICE </th>
</tr>
<xsl:for-each select="bookstore/book">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="author"/></td>
<td><xsl:value-of select="year"/></td>
<td><xsl:value-of select="price"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

BOOKSTORE.XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="C:\Users\user\Desktop\04-10-
2017\Library Information-XML\bookstore.xsl"?>
<bookstore>
<book>
<title>Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book>
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book>
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
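The problem statement also asks for a DTD to validate the XML file. A minimal sketch, saved for example as bookstore.dtd and referenced from the XML with <!DOCTYPE bookstore SYSTEM "bookstore.dtd">, could be:

BOOKSTORE.DTD
<!ELEMENT bookstore (book+)>
<!ELEMENT book (title, author, year, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>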

EXPECTED OUTPUT

Week 12
Registration Form – JS Validation
Problem Description
Write a JavaScript program to validate the following fields in a registration page:
1. Name (should contains alphabets and the length should not be less than 6 characters)
2. Email (should not be less than 6 characters)
3. Zipcode (should not be less than 6 characters)
4. Country (should not be less than 6 characters)
5. E-mail (should not contain invalid addresses)
Description

HTML Form validation is the process of making sure that data supplied by the user using a form,
meets the criteria set for collecting data from the user. For example, if you are using a registration form,
and you want your user to submit name, email id and address, you must use a code (in JavaScript or in
any other language) to check whether the user entered a name containing alphabets only, a valid email
address and a proper address. JavaScript provides facility to validate the form on the client-side so data
processing will be faster than server-side validation.

<html>
<head>
<title>Form Validation</title>
<script type="text/javascript">
<!--
// Form validation code will come here.
//-->
</script>
</head>

<body>
<form action="/cgi-bin/test.cgi" name="myForm" onsubmit="return(validate());">
<table cellspacing="2" cellpadding="2" border="1">

<tr>
<td align="right">Name</td>
<td><input type="text" name="Name" /></td>
</tr>

<tr>
<td align="right">EMail</td>
<td><input type="text" name="EMail" /></td>
</tr>

<tr>
<td align="right">Zip Code</td>
<td><input type="text" name="Zip" /></td>
</tr>

<tr>
<td align="right">Country</td>
<td>
<select name="Country">
<option value="-1" selected>[choose yours]</option>
<option value="1">USA</option>
<option value="2">UK</option>
<option value="3">INDIA</option>
</select>
</td>
</tr>

<tr>
<td align="right"></td>
<td><input type="submit" value="Submit" /></td>
</tr>

</table>
</form>

</body>
</html>
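One possible implementation of the validate() function referenced by the form's onsubmit handler is sketched below; it uses the field names from the form above and the length rules from the problem description, and belongs inside the <script> element in the <head>.

<script type="text/javascript">
// Returns true to allow submission, false to block it
function validate()
{
var form = document.myForm;
if (form.Name.value.length < 6 || !/^[A-Za-z ]+$/.test(form.Name.value))
{
alert("Name must contain only alphabets and be at least 6 characters long.");
form.Name.focus();
return false;
}
if (form.EMail.value.length < 6 || form.EMail.value.indexOf("@") < 1 ||
    form.EMail.value.lastIndexOf(".") < form.EMail.value.indexOf("@"))
{
alert("Please enter a valid e-mail address (at least 6 characters).");
form.EMail.focus();
return false;
}
if (form.Zip.value.length < 6 || isNaN(form.Zip.value))
{
alert("Zip code must be a number with at least 6 digits.");
form.Zip.focus();
return false;
}
if (form.Country.value == "-1")
{
alert("Please choose your country.");
form.Country.focus();
return false;
}
return true;
}
</script>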

EXPECTED OUTPUT

Week 13
Registration - JSP & MySQL
Problem Description
• Write a JSP program to store information submitted from a registration page into a MySQL
database.
Procedure
1. Create a Database ‘test’ and a Table ‘users’ in MYSQL using CREATE query.
2. Write an index.html as a home page for JSP program.
3. Write a process.jsp to store information submitted from registration page into MySQL database
directly or using DAO method.
4. You need a JSP/Servlet container that can serve JSP pages for you. There are a number of JSP/Servlet
containers available, but the most commonly used is Apache Tomcat.
a. To setup Tomcat,
i. simply download it and extract it into any folder.
ii. Next, place your application inside the webapps folder.
iii. Start the server.
iv. Open your browser and browse to http://localhost:8080; it should open Tomcat's
default page.
v. Package your application as a WAR file and place it in Tomcat's webapps folder.
vi. Now access http://localhost:8080/YourApplicationContext/path/to/jspFile and you will see your
JSP file being compiled and served in the browser (a sketch of the expected folder layout is given after this procedure).
b. Double Click process.jsp
5. Insert the values in the form and click submit.
6. Check the users Table in MySQL.
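
For reference, a typical exploded layout of this application inside Tomcat's webapps folder might look as follows. The application folder name (RegistrationApp) and the exact connector jar file name are assumptions, but the MySQL JDBC driver jar must be on the classpath (for example in WEB-INF/lib, or in Tomcat's own lib folder) for process.jsp to load com.mysql.jdbc.Driver.

webapps/
  RegistrationApp/            <-- assumed application (context) name
    index.html                <-- registration form
    process.jsp               <-- JSP that inserts the data
    WEB-INF/
      lib/
        mysql-connector-java-x.x.xx.jar   <-- MySQL JDBC driver (version assumed)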

SQL Query
CREATE TABLE users
(
id int NOT NULL AUTO_INCREMENT,
first_name varchar(50),
last_name varchar(50),
city_name varchar(50),
email varchar(50),
PRIMARY KEY (id)
);
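
The users table above belongs to the database test referred to in the JDBC URL of process.jsp. As a minimal sketch (standard MySQL statements, not part of the original listing), steps 1 and 6 of the procedure could be run in the MySQL client like this:

-- Step 1: create the database used by process.jsp and switch to it
CREATE DATABASE IF NOT EXISTS test;
USE test;

-- (run the CREATE TABLE users statement shown above here)

-- Step 6: after submitting the form, verify the inserted rows
SELECT * FROM users;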

Here we are using two files to insert data into MySQL:

 index.html: for getting the values from the user

 process.jsp: a JSP file that processes the request

index.html
<!DOCTYPE html>
<html>
<body>
<form method="post" action="process.jsp">
First name:<br>
<input type="text" name="first_name">
<br>
Last name:<br>
<input type="text" name="last_name">
<br>
City name:<br>
<input type="text" name="city_name">
<br>
Email Id:<br>
<input type="email" name="email">
<br><br>
<input type="submit" value="submit">
</form>
</body>
</html>

process.jsp
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<%@page import="java.sql.*,java.util.*"%>

<%
String first_name=request.getParameter("first_name");
String last_name=request.getParameter("last_name");
String city_name=request.getParameter("city_name");
String email=request.getParameter("email");

try
{
Class.forName("com.mysql.jdbc.Driver");
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test",
"root", "");
Statement st=conn.createStatement();

int i=st.executeUpdate("insert into users(first_name,last_name,city_name,email) values('"+first_name+"','"+last_name+"','"+city_name+"','"+email+"')");
out.println("Data is successfully inserted!");
}
catch(Exception e)
{
System.out.print(e);
e.printStackTrace();
}
%>
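
Note that the Statement above builds the SQL string by concatenating the raw request parameters, which is open to SQL injection. A safer variant of the same insert, sketched here with a PreparedStatement (this replacement is not part of the original listing and assumes the same variables read at the top of process.jsp), would be:

<%
// Sketch only: same insert as above, but with bound parameters instead of concatenation
try
{
Class.forName("com.mysql.jdbc.Driver");
Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "");
PreparedStatement ps = conn.prepareStatement(
    "insert into users(first_name,last_name,city_name,email) values(?,?,?,?)");
ps.setString(1, first_name);   // values are bound as parameters, never concatenated
ps.setString(2, last_name);
ps.setString(3, city_name);
ps.setString(4, email);
int i = ps.executeUpdate();
out.println("Data is successfully inserted!");
ps.close();
conn.close();
}
catch(Exception e)
{
e.printStackTrace();
}
%>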

Expected Output

Reference
[1]. https://bigdatapath.wordpress.com/2018/03/13/run-your-first-jsp-program-in-apache-tomcat-server/ [Installation & Configuration]

Week 14
Login Validation – JSP & MySQL
Problem Description
 Here we create a simple login form using a MySQL database connection and back-end
validation.
 The reference at the end of this week links to a tutorial that walks through creating it.
 We will be using the Eclipse IDE for compiling and the Tomcat 7 server for deploying the application.
 The record will be fetched from the database and matched with the input values given through the login
form.
Procedure
1. Create a simple login/logout example using JSP.
2. When a user submits the form, the input is validated against the records saved in the database table.
So first create a database table and insert some dummy values into it.
3. We use the JSP implicit object session: the setAttribute() method sets an attribute value in the
session, the getAttribute() method reads the attribute value back from the same session, and the
invalidate() method ends the session.
4. Then the other JSP pages (home.jsp, login.jsp, logout.jsp, welcome.jsp, error.jsp and index.jsp)
will be created.
The database table holds the dummy login records; it is against these values that the program checks the
information entered by the user and returns the result. A sketch of such a table is given below.
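
The original manual shows this table as a screenshot, which is not reproduced here. Based on the database name in the JDBC URL (record) and the columns read in home.jsp and login.jsp (name, password, usertype), a minimal sketch of the table and a couple of dummy rows might be as follows; the column sizes and the sample values are assumptions.

-- Sketch only: database and column names taken from the JSPs, sizes and rows assumed
CREATE DATABASE IF NOT EXISTS record;
USE record;

CREATE TABLE userdetail
(
name varchar(50),
password varchar(50),
usertype varchar(20)
);

INSERT INTO userdetail (name, password, usertype) VALUES
('student1', 'pass123', 'admin'),
('student2', 'pass456', 'user');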

home.jsp

<%@ page import="java.sql.*" %>


<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Home</title>
</head>
<body>

<%
Connection con= null;
PreparedStatement ps = null;
ResultSet rs = null;

String driverName = "com.mysql.jdbc.Driver";
String url = "jdbc:mysql://localhost:3306/record";
String user = "root";
String password = "root";
String sql = "select usertype from userdetail";
try {
Class.forName(driverName);
con = DriverManager.getConnection(url, user, password);
ps = con.prepareStatement(sql);
rs = ps.executeQuery();
%>

<form method="post" action="login.jsp">


<center><h2 style="color:green">JSP Login Example</h2></center>
<table border="1" align="center">
<tr>
<td>Enter Your Name :</td>
<td><input type="text" name="name"/></td>
</tr>

<tr>
<td>Enter Your Password :</td>
<td><input type="password" name="password"/></td>
</tr>

<tr>
<td>Select UserType</td>
<td><select name="usertype">
<option value="select">select</option>
<%
while(rs.next())
{
String usertype = rs.getString("usertype");
%>
<option value=<%=usertype%>><%=usertype%></option>
<%
}
}
catch(SQLException sqe)
{
out.println("home"+sqe);
}

%>

</select>
</td>
</tr>

<tr>
<td></td>
<td><input type="submit" value="submit"/></td>
</tr>
</table>

</form>
</body>
</html>

login.jsp

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Login</title>
</head>
<body>
<%! String userdbName;
String userdbPsw;
String dbUsertype;
%>

<%
Connection con= null;
PreparedStatement ps = null;
ResultSet rs = null;
String driverName = "com.mysql.jdbc.Driver";
String url = "jdbc:mysql://localhost:3306/record";
String user = "root";
String dbpsw = "root";
String sql = "select * from userdetail where name=? and password=? and usertype=?";
String name = request.getParameter("name");
String password = request.getParameter("password");
String usertype = request.getParameter("usertype");

if(name != null && !name.equals("") && password != null && !password.equals("") &&
!usertype.equals("select"))
{
try{
Class.forName(driverName);
con = DriverManager.getConnection(url, user, dbpsw);
ps = con.prepareStatement(sql);
ps.setString(1, name);
ps.setString(2, password);
ps.setString(3, usertype);
rs = ps.executeQuery();

if(rs.next())
{
userdbName = rs.getString("name");
userdbPsw = rs.getString("password");
dbUsertype = rs.getString("usertype");

if(name.equals(userdbName) && password.equals(userdbPsw) && usertype.equals(dbUsertype))
{
session.setAttribute("name",userdbName);
session.setAttribute("usertype", dbUsertype);
response.sendRedirect("welcome.jsp");
}
}
else
response.sendRedirect("error.jsp");
rs.close();
ps.close();
}

catch(SQLException sqe)
{
out.println(sqe);
}
}
else
{
%>
<center><p style="color:red">Error In Login</p></center>
<%
getServletContext().getRequestDispatcher("/home.jsp").include(request,response);
}
%></body></html>

welcome.jsp
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Welcome</title>
</head>
<body>
<p>Welcome, <%=session.getAttribute("name")%></p>
<p><a href="logout.jsp">Logout</a>
</body>
</html>

error.jsp
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Login Error</title>
</head>
<body>
<center><p style="color:red">Sorry, your record is not available.</p></center>
<%
getServletContext().getRequestDispatcher("/home.jsp").include(request,response);
%>
</body>
</html>

logout.jsp
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Logout</title>
</head>
<body>
<% session.invalidate(); %>
<p>You have been successfully logged out</p>
</body>
</html>
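
The procedure above also mentions an index.jsp page, which is not shown in the original listing. A minimal sketch, assuming it simply acts as the entry point and forwards the visitor to the login form on home.jsp, could be:

index.jsp
<%-- Sketch only: entry page that forwards the visitor to the login form (home.jsp) --%>
<jsp:forward page="home.jsp" />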

Expected Output

Reference
https://www.roseindia.net/jsp/jsp-login-form-with-mysql-database-connection-and-back-end-validation.shtml
