on
DATA ANALYTICS VIRTUAL INTERNSHIP
Submitted in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
By
1|Page
KKR & KSR INSTITUTE OF TECHNOLOGY AND
SCIENCES
Department of Electronics and Communication Engineering
BONAFIDE CERTIFICATE
EXTERNAL EXAMINER
KKR & KSR INSTITUTE OF TECHNOLOGY AND SCIENCES
STUDENT DECLARATION
ABSTRACT
Data Analytics is a package containing many tools that are all integral to
performing analytics on data sets. This includes importing data, cleaning and
preparing data, and tools and applications for statistics and data mining.
Data Analytics is a new driver of world economic and societal change. The
world's data collection is reaching a tipping point for major technological changes
that can bring new ways of making decisions and of managing our health, cities,
finance, and education. While data complexity is increasing, including data's
volume, variety, velocity, and veracity, the real impact hinges on our ability to
uncover the value in the data through Data Analytics technologies. Data Analytics
poses a grand challenge in the design of highly scalable algorithms and systems
that integrate the data and uncover large hidden values from datasets that are
diverse, complex, and of massive scale. Potential breakthroughs include new
algorithms, methodologies, systems, and applications in Data Analytics that
discover useful and hidden knowledge from data efficiently and effectively.
CERTIFICATE OF INTERN
ACKNOWLEDGEMENT
I take this opportunity to express my deepest gratitude and appreciation
to all those who made this internship work easier with words of
encouragement, motivation, discipline, and faith, who offered different places
to look to expand my ideas, and who helped me towards the successful
completion of this internship work.
APARNA OSIPILLI
Table of Contents

Each entry shows the module content with its stipulated date and completion date.

Module 1 (01/05/2023 to 09/05/2023): Introduction
    Big Data, Big Data Pipeline, Big Data Tools, Big Data Collection,
    Big Data Storage, Big Data Ingestion, Big Data Processing and
    Analysis, Big Data Visualization
Module 2 (10/05/2023 to 19/05/2023): Lab 1 Introduction
    Store Data in Amazon S3
Module 3 (20/05/2023 to 29/05/2023): Lab 2 Introduction
    Query Data in Amazon Athena
Module 4 (30/05/2023 to 08/06/2023): Lab 3 Introduction
    Query Data in Amazon S3 with Amazon Athena and AWS Glue
Module 5 (09/06/2023 to 15/06/2023): Lab 4 Introduction
    Analyze Data with Amazon Redshift
Module 6 (16/06/2023 to 22/06/2023): Lab 5 Introduction
    Analyze Data with Amazon SageMaker, Jupyter Notebooks, and Bokeh
Module 7 (23/06/2023 to 30/06/2023): Lab 6 Introduction
    Automate Loading Data with the AWS Data Pipeline
Module 8 (01/07/2023 to 07/07/2023): Lab 7 Introduction
    Analyze Streaming Data with Amazon Kinesis Firehose, Amazon
    Elasticsearch, and Kibana
Module 9 (08/07/2023 to 14/07/2023): Lab 8 Introduction
    Analyze IoT Data with AWS IoT Analytics
About AICTE
History
The beginning of formal technical education in India can be dated back to the mid-
19th century. Major policy initiatives in the pre-independence period included the
appointment of the Indian Universities Commission in 1902, issue of the Indian
Education Policy Resolution in 1904, and the Governor General’s policy statement of
1913 stressing the importance of technical education, the establishment of IISc in
Bangalore, Institute for Sugar, Textile & Leather Technology in Kanpur, N.C.E. in
Bengal in 1905, and industrial schools in several provinces.
Initial Set-up
All India Council for Technical Education (AICTE) was set up in November
1945 as a national-level apex advisory body to conduct a survey on the facilities
available for technical education and to promote development in the country in a
coordinated and integrated manner. To ensure this, as stipulated in the
National Policy on Education (1986), AICTE was vested with:
Statutory authority for planning, formulation, and maintenance of norms
& standards
Quality assurance through accreditation
Funding in priority areas, monitoring, and evaluation
Maintaining parity of certification & awards
The management of technical education in the country
AICTE was subsequently given statutory status in the context of the
proliferation of technical institutions, maintenance of standards, and
other related matters.
Organizations are realizing that work these days is more than just a way to
earn one's bread. It is a commitment, an awareness of others' expectations,
and a sense of ownership. To learn how a candidate might perform in various
circumstances, they recruit interns and offer PPOs (Pre-Placement Offers) to
the chosen few who have fulfilled all of their requirements.
To find a quicker and easier way through such situations, many
companies and students have found AICTE to be of great help. Through its
internship portal, AICTE has provided them with the perfect opportunity to
emerge as winners in these trying times. The website provides the perfect
platform for students to put forth their skills and desires and for companies
to place their intern demand. It takes just 15 seconds to create an opportunity,
which is auto-matched and auto-posted to Google, Bing, Glassdoor, LinkedIn,
and similar platforms. The selected interns' profiles and availability are
validated by their respective colleges before they join or acknowledge the
offer. Shortlisting the right resume with respect to skills, experience, and
location takes just seconds. Only authentic and verified companies can appear
on the portal.
Additionally, there are multiple modes of communication to connect with interns.
Both parties report being satisfied in terms of time management, quality,
security against fraud, and genuineness.
All you need to do is register at the portal (https://internship.aicte-india.org/),
fill in all the details, send in your application or demand, and then sit back
and watch your vision take off.
About EduSkills
We want to completely disrupt the teaching methodologies and ICT-based
education system in India. We work closely with all the important stakeholders
in the ecosystem, namely students, faculty, education institutions, and
Central/State Governments, by bringing them together through our skilling
interventions. Our three-pronged engine targets social and business impact by
working holistically on Education, Employment, and Entrepreneurship.
Plan of Internship program
My internship ran from May to July. Each week I worked on a different
module. In the first month, I learned about the introduction to big data,
storing data in Amazon S3, and querying data in Amazon Athena. In the second
month, I worked on creating an AWS Glue crawler, analyzing data with Amazon
Redshift, and analyzing data with Amazon SageMaker, Jupyter Notebooks, and
Bokeh. In the third month, I automated loading data with the AWS Data Pipeline,
analyzed streaming data with Amazon Kinesis Firehose, Amazon Elasticsearch,
and Kibana, and analyzed IoT data with AWS IoT Analytics. Finally, in July,
I completed my internship.
Our professors also assisted us in completing the labs, which made data
analytics much easier. Because of the faculty's supervision, we were able to
finish the second portion of the internship with ease, and it was done by the
month of July.
Training Program
Amazon S3 overview:
Amazon S3 storage classes:
Amazon S3 offers a range of object-level storage classes that are designed for
different use cases:
Amazon S3 Standard
Amazon S3 Intelligent-Tiering
Amazon S3 Standard-Infrequent Access (Amazon S3 Standard-IA)
Amazon S3 One Zone-Infrequent Access (Amazon S3 One Zone-IA)
Amazon S3 Glacier
Amazon S3 Glacier Deep Archive
Amazon S3 is used throughout the course, so you must know how to create
Amazon S3 buckets and load data for the subsequent labs. In this lab, you
use the AWS Management Console to create an Amazon S3 bucket, add an IAM
user to a group that has full access to the Amazon S3 service, upload files
to Amazon S3, and run simple queries on the data in Amazon S3.
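The console steps above can also be sketched programmatically. The following is a minimal Python sketch using boto3 (the AWS SDK for Python); the bucket and file names are hypothetical, and the AWS calls assume configured credentials:

```python
def s3_uri(bucket, key):
    """Build the s3:// URI that later labs use to reference an object."""
    return f"s3://{bucket}/{key}"

def create_bucket_and_upload(bucket, local_path, key):
    """Create an S3 bucket and upload one local file into it."""
    import boto3  # imported lazily; requires configured AWS credentials
    s3 = boto3.client("s3")
    s3.create_bucket(Bucket=bucket)  # bucket names must be globally unique
    s3.upload_file(local_path, bucket, key)
    return s3_uri(bucket, key)
```

For example, create_bucket_and_upload("my-analytics-lab-bucket", "data.csv", "raw/data.csv") would return the URI s3://my-analytics-lab-bucket/raw/data.csv for use in later labs.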
Big Data Collection:
There are various ways of collecting big data, and AWS has its own services
for collecting it. Some of them are:
Amazon EC2: a computing service that is good for hosting web applications.
Agents can be installed on Amazon EC2 instances to send clickstream data, web
server access logs, error logs, and so on.
AWS IoT: a suite of IoT services that provide device software, control, and
data services. These services enable you to connect securely to IoT devices
and transfer data at any scale.
Big Data Storage:
There are many storage options available in AWS.
Big Data Ingestion:
Data can be collected into AWS services in various ways.
Big Data Processing and Analysis:
There are various managed and scalable services available to make the
analysis of data easy.
Data Visualization:
Visualization is a crucial part of big data analysis. AWS provides several
tools for data visualization.
Lab1: Store data in Amazon S3
o Creating Buckets in S3
Task – 3: Uploading an object into S3 bucket
o Uploading Compressed files into S3 bucket
Amazon Athena is an interactive query service that makes it easy to
analyze data in Amazon S3 using standard SQL. Athena is serverless, so there
is no infrastructure to manage, and you pay only for the queries that you run.
Athena is easy to use. Simply point to your data in Amazon S3, define the
schema, and start querying using standard SQL. Most results are delivered
within seconds. With Athena, there’s no need for complex ETL jobs to prepare
your data for analysis. This makes it easy for anyone with SQL skills to
quickly analyze large-scale datasets.
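As a rough illustration of "point, define, query", the sketch below submits a standard-SQL query with boto3; the database, table, and output location are hypothetical, and the Athena call assumes configured AWS credentials:

```python
def simple_select(database, table, limit=10):
    """Compose the kind of standard-SQL query this lab runs in Athena."""
    return f"SELECT * FROM {database}.{table} LIMIT {limit};"

def run_athena_query(query, output_s3):
    """Submit the query; Athena writes the results to the given S3 location."""
    import boto3  # imported lazily; requires configured AWS credentials
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```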
o Amazon Athena Query Editor
The stored data can be optimized by using views or by partitioning the data.
By partitioning the data, we can decrease both the query time and the amount
of data that must be scanned.
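One common way to partition data in Amazon S3 is a Hive-style key layout, which Athena can use to prune partitions. A small sketch (the prefix and file names are hypothetical):

```python
from datetime import date

def partitioned_key(prefix, day, filename):
    """Lay out an object key as year=/month=/day= partitions."""
    return (f"{prefix}/year={day.year}/month={day.month:02d}/"
            f"day={day.day:02d}/{filename}")

# A query that filters on the partition columns, such as
#   SELECT * FROM logs WHERE year = '2023' AND month = '05'
# then scans only the objects under the matching prefixes.
```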
o Creating the views
Lab 3: Creating an AWS Glue crawler:
Lab 3 introduces you to AWS Glue and builds on the previous lab by showing
how to use AWS Glue to infer the schema from the data. This lab includes:
AWS Glue interface
o Creating a Crawler
o Selecting a Query result location
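Creating a crawler through the console, as in the screenshots above, corresponds roughly to the boto3 calls below; the crawler name, role ARN, database, and S3 path are hypothetical, and the AWS calls assume configured credentials:

```python
def crawler_config(name, role_arn, database, s3_path):
    """Assemble the arguments for creating a crawler over one S3 location."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

def create_and_run_crawler(cfg):
    """Create the crawler, then run it once to infer the schema."""
    import boto3  # imported lazily; requires configured AWS credentials
    glue = boto3.client("glue")
    glue.create_crawler(**cfg)
    glue.start_crawler(Name=cfg["Name"])
```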
Lab 4: Analyze Data with Amazon Redshift
Creating and Configuring the Amazon redshift cluster
o Cluster Permissions
o Cluster Dashboard
o Cluster successfully created
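A typical step after the cluster is created is loading data from Amazon S3 with a COPY statement. The helper below only composes the SQL; the table, path, and IAM role are hypothetical, and running it against the cluster would require a database connection or the Redshift Data API:

```python
def copy_from_s3(table, s3_path, iam_role):
    """Compose a Redshift COPY statement for CSV data stored in S3."""
    return (f"COPY {table} FROM '{s3_path}' "
            f"IAM_ROLE '{iam_role}' CSV IGNOREHEADER 1;")
```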
Lab 5: Analyze Data with Amazon SageMaker, Jupyter Notebooks, and Bokeh
Lab 5 introduces you to Amazon SageMaker, Jupyter notebooks, and
the Bokeh Python package. Amazon SageMaker is a fully managed machine
learning service. Though machine learning is not a part of this course, this
lab uses Amazon SageMaker as a way of hosting a Jupyter notebook for the
learners to work with. The main purpose of this lab is to provide you with an
opportunity to visualize data and practice using visualizations to support a
business decision.
This lab shows you how to:
Visualize data with the open-source Bokeh Python package.
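As one possible shape for such a visualization, the sketch below reshapes rows into columns and renders a bar chart; the data and file name are hypothetical, and the plotting function assumes the bokeh package is installed:

```python
def to_columns(rows):
    """Reshape (label, value) rows into the parallel lists Bokeh plots from."""
    labels = [label for label, _ in rows]
    values = [value for _, value in rows]
    return labels, values

def bar_chart(rows, filename="chart.html"):
    """Render a categorical bar chart to a standalone HTML file."""
    from bokeh.plotting import figure, output_file, save  # requires bokeh
    labels, values = to_columns(rows)
    output_file(filename)
    p = figure(x_range=labels, title="Sample visualization")
    p.vbar(x=labels, top=values, width=0.8)
    save(p)
```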
o Obtain the AWS Identity and Access Management (IAM) role for
accessing SageMaker
o Open Jupyter
o Creating data from database
Lab 6 introduces you to the AWS Data Pipeline. The AWS Data
Pipeline is a web service you can use to migrate and transform data. The main
purpose of this lab is to provide learners with an opportunity to automate
moving data and to understand how this service fits into the larger context of
data analysis.
Export data from Amazon Redshift to a Jupyter notebook.
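An AWS Data Pipeline definition is a list of objects, each carrying key/value fields. The fragment below is an illustrative sketch of that shape only; the ids, names, and field values are hypothetical and do not form a complete working definition:

```python
def copy_pipeline_objects(input_node, output_node):
    """Sketch of pipeline objects: a default schedule plus one copy activity."""
    return [
        {
            "id": "Default",
            "name": "Default",
            "fields": [{"key": "scheduleType", "stringValue": "ondemand"}],
        },
        {
            "id": "CopyData",
            "name": "CopyData",
            "fields": [
                {"key": "type", "stringValue": "CopyActivity"},
                {"key": "input", "refValue": input_node},
                {"key": "output", "refValue": output_node},
            ],
        },
    ]
```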
Lab 7: Analyze Streaming Data with Amazon Kinesis Firehose, Amazon
Elasticsearch, and Kibana
Access Amazon Kinesis Data Firehose and Amazon Elasticsearch Service
(Amazon ES) in the AWS Management Console.
Create a Kinesis Data Firehose delivery stream.
Integrate a Kinesis Data Firehose delivery stream with Amazon ES.
Build visualizations with Kibana.
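The delivery-stream step can be sketched as follows; events are serialized as newline-delimited JSON so downstream consumers can split the records. The stream name and event shape are hypothetical, and the put call assumes configured AWS credentials:

```python
import json

def encode_record(event):
    """Firehose delivers raw bytes; newline-delimited JSON keeps records separable."""
    return (json.dumps(event) + "\n").encode("utf-8")

def send_event(stream_name, event):
    """Put one record onto a Kinesis Data Firehose delivery stream."""
    import boto3  # imported lazily; requires configured AWS credentials
    firehose = boto3.client("firehose")
    firehose.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_record(event)},
    )
```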
o Kibana User Interface
Sample Webpage created by AWS
o Visualizing insights of the website using pie chart
Lab 8: Analyze IoT Data with AWS IoT Analytics
Lab 8 introduces you to AWS IoT Analytics and AWS IoT Core. AWS
IoT Analytics automates the steps required for analyzing IoT data. You can
filter, transform, and enrich the data before storing it in a time-series data
store. AWS IoT Core provides connectivity between IoT devices and AWS
services. IoT Core is fully integrated with IoT Analytics.
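The lab's simulated device traffic can be sketched along these lines; the topic, device id, and payload fields are hypothetical, and the publish call assumes configured AWS credentials and an IoT endpoint:

```python
import json
import random
import time

def sensor_payload(device_id):
    """Build one simulated sensor reading of the kind sent to AWS IoT Core."""
    return {
        "device": device_id,
        "temperature": round(random.uniform(18.0, 30.0), 1),
        "timestamp": int(time.time()),
    }

def publish_reading(topic, payload):
    """Publish the reading over the AWS IoT data plane."""
    import boto3  # imported lazily; requires configured AWS credentials
    iot = boto3.client("iot-data")
    iot.publish(topic=topic, qos=0, payload=json.dumps(payload))
```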
o Creating an AWS IoT Analytics channel
o Creating an AWS Core IoT Rule
o Creating a Dataset
o Querying the Dataset
Work Samples
Capstone Project:
Attribution Percentage of the Staff
RESULT
Critical Analysis
Conclusion
As a result, I would like to conclude that the internship played a critical
part in expanding not only my theoretical but also my practical knowledge.
Here, we set up the components of an AWS IoT Analytics implementation and then
used a Python script to simulate loading data into AWS IoT Core. After the
data has been loaded into AWS IoT Analytics, you perform queries to analyze it.
You can create a crawler by starting in the Athena console and then using the
AWS Glue console in an integrated way. When you create the crawler, you
specify a data location in Amazon S3 to crawl.
By pursuing this internship, I was able to get a good understanding of how
data analytics works and how important it is in expanding and improving a
business with the help of AWS services, which focus mainly on problems rather
than on infrastructure or other hardware components. With the Big Data
foundations, I was able to gain a basic knowledge of data analytics. As data
analytics is a popular technology, it is both beneficial and promising for the
future. Because of it, we are predicting many trends and patterns for further
estimation, and this platform is user-friendly and simple to use.
In conclusion, it was a challenging experience, and I sense that the internship
was valuable in developing my Big Data/Data Science skills. I am positive that
the knowledge and experience that I gained from AWS in Data Analytics and my
internship will help me in establishing a successful career ahead.