Dataanalyticsunit-1 (2) 104014
Digital Notes
[Department of Computer Science & Engineering]
Course: B.Tech
Branch: Computer Science
Semester: V
Subject Name: Data Analytics
Subject Code: KCS-051
Lecture No. / Topic: Introduction to Data Analytics
What is Analytics?
Analytics is the discovery, interpretation, and communication of meaningful patterns
in data, and the application of those patterns towards effective decision making.
Analytics is an encompassing and multidimensional field that uses mathematics,
statistics, predictive modeling and machine learning techniques to find meaningful
patterns and knowledge in recorded data.
What is DATA analytics?
➢ Descriptive Analytics,
➢ Predictive Analytics,
➢ Prescriptive Analytics
Descriptive Analytics
• Descriptive analytics uses data aggregation and data mining to provide
insight into the past and answer:
– “What has happened in the business?”
• Descriptive analytics, or descriptive statistics, does exactly what the name
implies: it “describes”, or summarizes, raw data and makes it interpretable by
humans.
• The past refers to any point in time at which an event has occurred, whether it is
one minute ago or one year ago.
• Descriptive analytics is useful because it allows us to learn from past
behaviors and understand how they might influence future outcomes.
• The main objective of descriptive analytics is to find out the reasons behind
previous success or failure.
• The vast majority of the statistics we use fall into this category.
• Common examples of descriptive analytics are reports that provide
historical insights regarding the company’s production, financials,
operations, sales, inventory and customers.
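The aggregation idea behind descriptive analytics can be sketched in a few lines of Python. This is only an illustrative sketch: the `sales` records, the field names and the helper `summarize_by_region` are hypothetical and do not come from any real data set.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records (made-up values, for illustration only).
sales = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 80.0},
    {"region": "North", "amount": 100.0},
    {"region": "South", "amount": 60.0},
]


def summarize_by_region(records):
    """Summarize raw records into count, total and average sales per region."""
    grouped = defaultdict(list)
    for row in records:
        grouped[row["region"]].append(row["amount"])
    return {
        region: {
            "count": len(amounts),
            "total": sum(amounts),
            "average": mean(amounts),
        }
        for region, amounts in grouped.items()
    }


summary = summarize_by_region(sales)
for region, stats in sorted(summary.items()):
    print(region, stats)
```

The summary answers "what has happened?" for each region; it says nothing yet about why it happened or what will happen next.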
Predictive Analytics
• Predictive analytics uses statistical models and forecasting
techniques to understand the future and answer:
– “What could happen?”
• These analytics are about understanding the future.
• Predictive analytics provides estimates about the likelihood of a future
outcome. It is important to remember that no statistical algorithm
can “predict” the future with 100% certainty, because the foundation of
predictive analytics is based on probabilities.
• Companies use these statistics to forecast what might happen in
the future.
• These statistics take the data that you have and fill in the missing data
with best guesses.
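As one possible illustration of a "best guess" forecast, the sketch below fits a straight line to past observations by ordinary least squares and extends it one step ahead. Linear regression is chosen here only as a simple example technique; the notes do not prescribe it, and the data and the names `fit_line`, `months` and `units_sold` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: units sold in months 1 to 4 (made-up values).
months = [1, 2, 3, 4]
units_sold = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, units_sold)

# The forecast is an estimate based on the past trend, not a certainty.
forecast_month_5 = slope * 5 + intercept
print(f"Estimated units for month 5: {forecast_month_5:.1f}")
```

Real data rarely lies exactly on a line, so in practice such a forecast would be reported together with a measure of its uncertainty.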
Predictive analytics can be further categorized as –
One could conclude that Google, with its big data expertise, was able to raise
warnings for flu in the U.S. by analyzing queries having a ‘flu theme’, well before
the conventional public health services.
Business Analytics
Business Analytics involves business planning, deriving business insights, and
arriving at solutions for business problems using the information and statistics
from relevant or associated data sources by applying different tools and
techniques.
• The tools and techniques can be statistical models, machine learning concepts, etc.
The tools and techniques involved support descriptive analytics, predictive analytics,
discovery analytics and/or prescriptive analytics. These analytics generate
the statistics and other information that eventually lead to relevant solutions.
Applications of Business Analytics
Personalized marketing
Many shopping companies use Big Data Analytics for personalized marketing to keep
their customers happy.
Mobile Advertising [6]
The Big Data analytics engine of a shopping company learns the personalized
needs of its customers from their shopping history. When offers come up on products
of interest at a place near the customer, he/she is informed
over his/her mobile phone.
The Big Data source associated with the customer’s geographical position is also used here.
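The combination of shopping interests and geographical position described above could be sketched as follows. This is a toy sketch, not how any particular company's engine works: the customer profile, the offers, the 1 km radius and the function names are all assumptions made for illustration. The distance uses the standard haversine formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def offers_to_notify(customer, offers, radius_km=1.0):
    """Return names of offers that match the customer's interests and are nearby."""
    matches = []
    for offer in offers:
        distance = haversine_km(customer["lat"], customer["lon"], offer["lat"], offer["lon"])
        if distance <= radius_km and offer["category"] in customer["interests"]:
            matches.append(offer["name"])
    return matches


# Hypothetical customer profile (interests assumed to come from shopping history).
customer = {"interests": {"shoes", "books"}, "lat": 28.6139, "lon": 77.2090}

# Hypothetical offers with made-up coordinates.
offers = [
    {"name": "Shoe sale", "category": "shoes", "lat": 28.6145, "lon": 77.2095},
    {"name": "Book fair", "category": "books", "lat": 28.7041, "lon": 77.1025},
    {"name": "Phone discount", "category": "electronics", "lat": 28.6140, "lon": 77.2091},
]

nearby = offers_to_notify(customer, offers)
print(nearby)
```

Only the shoe sale is both of interest and within the radius: the book fair is too far away, and the phone discount is nearby but does not match the customer's interests.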
Data Analytics-Advantages in Manufacturing Industry
Big Data Analytics can improve the functioning of an organization or
business that adopts it.
For example, in a manufacturing unit, Data Analytics can improve the following
processes:
• Procurement: to find efficient and cost-effective suppliers.
The very large volumes of smart data from the sensor networks of smart
projects, viz. smart cities, smart homes etc., are analyzed for pollution
control, security (preventing thefts and homicides), energy
conservation, traffic maintenance, disaster management and many more.
The very large volumes of spatial data from GPS, radar, lidar and aerial sources are
used for identifying, visualizing and analyzing patterns of an area with a specific
condition or characteristic for:
• Tracking movements of vehicles between destinations,
• Public safety,
• Emergency management,
• Climate analysis etc.
E.g.: Economic analysis is done based on the different attributes in the vehicle
movement patterns: taxi id, distance travelled, fare etc.
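A minimal sketch of such an economic analysis is shown below: it computes fare earned per kilometre for each taxi. The trip records, the field names (`taxi_id`, `distance_km`, `fare`) and the helper `revenue_per_km` are hypothetical and chosen only to mirror the attributes named above.

```python
from collections import defaultdict

# Hypothetical trip records (made-up values).
trips = [
    {"taxi_id": "T1", "distance_km": 5.0, "fare": 100.0},
    {"taxi_id": "T1", "distance_km": 3.0, "fare": 60.0},
    {"taxi_id": "T2", "distance_km": 10.0, "fare": 150.0},
]


def revenue_per_km(trip_records):
    """Total fare divided by total distance, per taxi."""
    totals = defaultdict(lambda: {"distance_km": 0.0, "fare": 0.0})
    for trip in trip_records:
        totals[trip["taxi_id"]]["distance_km"] += trip["distance_km"]
        totals[trip["taxi_id"]]["fare"] += trip["fare"]
    return {
        taxi_id: t["fare"] / t["distance_km"]
        for taxi_id, t in totals.items()
    }


per_km = revenue_per_km(trips)
for taxi_id, value in sorted(per_km.items()):
    print(taxi_id, round(value, 2))
```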
Structuring Big Data
• In simple terms, structuring is arranging the available data in a manner such that
it becomes easy to study, analyze, and derive conclusions from.
• Why is structuring required?
In our daily life, you may have come across questions like,
• Sources: File systems such as Web data in the form of cookies, Data exchange
formats....
Big Data Challenges & Characteristics
Examples
• LHC (CERN), all experiments: about 25 GB/s [4]
• Square Kilometre Array: 700 TB/s (in 2018) [5]
• Google: 50k searches per second [6]
• Facebook: 30 billion content pieces shared per month [7]
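To get a feel for these rates, the per-second figures can be converted to daily volumes. The short calculation below is a sketch that assumes the rates are sustained around the clock and uses decimal units (1 PB = 1,000,000 GB); both are simplifying assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# LHC: about 25 GB/s, assumed here to be sustained for a full day.
lhc_gb_per_s = 25
lhc_pb_per_day = lhc_gb_per_s * SECONDS_PER_DAY / 1_000_000  # GB -> PB (decimal)

# Google: about 50k searches per second, assumed to be a constant rate.
searches_per_s = 50_000
searches_per_day = searches_per_s * SECONDS_PER_DAY

print(f"LHC: about {lhc_pb_per_day:.2f} PB per day")
print(f"Google: about {searches_per_day:,} searches per day")
```

Under these assumptions, 25 GB/s corresponds to roughly 2.16 PB per day, and 50k searches per second to about 4.32 billion searches per day.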
Data Sources
Enterprise data
• Serves business objectives, well defined
• Examples: customer information; transactions, e.g. purchases
Social media
• Created by humans
• Examples: messages, posts, blogs, wikis
Veracity: Trustworthiness of Data
There are many visualizations of the processing and value chain [8]
Reporting vs. Analytics
Reporting
▪ Lists
▪ Invoices, Orders
▪ Information from a specific point in time
▪ What is in your Salesforce database right now?
▪ Simple data relationships
▪ Which Opportunities did we win?
▪ Which Customers bought shoes?
Analytics
▪ Crosstabs, pivot tables
▪ Slice and Dice, e.g. “this by that”
▪ Key Performance Indicators
▪ Analysis of trends OVER time
▪ How does your Salesforce data CHANGE over time?
▪ Complex data relationships
▪ How does the application of Chatter affect my win rate on Opportunities?
▪ How many Customers bought specific brands of shoes in each region this year,
compared to last year?
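The difference between the two columns can be made concrete with the shoe example. A report would simply list the customers who bought shoes; an analytic question cross-tabulates "customers by region by brand by year" and compares years. The sketch below uses hypothetical purchase records, and the brand names, regions, years and the function `customers_by_region_brand_year` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical purchase records (made-up values).
purchases = [
    {"customer": "C1", "brand": "BrandA", "region": "East", "year": 2023},
    {"customer": "C2", "brand": "BrandA", "region": "East", "year": 2024},
    {"customer": "C3", "brand": "BrandA", "region": "East", "year": 2024},
    {"customer": "C3", "brand": "BrandA", "region": "East", "year": 2024},  # repeat purchase
    {"customer": "C4", "brand": "BrandB", "region": "West", "year": 2023},
]


def customers_by_region_brand_year(records):
    """Crosstab: number of distinct customers per (region, brand, year)."""
    buyers = defaultdict(set)
    for r in records:
        buyers[(r["region"], r["brand"], r["year"])].add(r["customer"])
    return {key: len(customers) for key, customers in buyers.items()}


table = customers_by_region_brand_year(purchases)

# Year-over-year change for one slice of the crosstab.
this_year = table.get(("East", "BrandA", 2024), 0)
last_year = table.get(("East", "BrandA", 2023), 0)
change = this_year - last_year
print(f"East / BrandA: {last_year} -> {this_year} customers (change {change:+d})")
```

Counting distinct customers rather than purchases keeps a repeat buyer from being counted twice.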
Phases of the Data Analytics Lifecycle
❑ Phase 1: Discovery
Learn the business domain, including relevant history, such as whether the
organization or business unit has attempted similar projects in the past, from
which you can learn.
Assess the resources you will have to support the project, in terms of people,
technology, time, and data.
Frame the business problem as an analytic challenge that can be addressed
in subsequent phases.
Formulate initial hypotheses (IH) to test, and begin learning the data.
❑ Phase 2: Data Preparation
Prepare an analytic sandbox. Perform ELT (extract, load, transform) and ETL
(extract, transform, load) to get data into the sandbox, and begin transforming the
data so you can work with it and analyze it.
Familiarize yourself with the data thoroughly and take steps to condition the data.
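A very small ETL pass into a sandbox might look like the sketch below. It assumes an in-memory SQLite database as the "sandbox" and a tiny inline CSV as the source; the table name `sales`, the column names and the cleaning rules are hypothetical choices for illustration, not a prescribed setup.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with typical problems: stray spaces, a missing value.
RAW_CSV = """customer,amount
alice, 120.50
bob,
carol,80
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: condition the data (trim spaces, drop rows with no amount)."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip records with a missing amount
        cleaned.append((row["customer"].strip().title(), float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the conditioned rows into the sandbox table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


sandbox = sqlite3.connect(":memory:")  # stand-in for the analytic sandbox
load(transform(extract(RAW_CSV)), sandbox)

row_count = sandbox.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = sandbox.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(row_count, total)
```

In an ELT variant, the raw rows would be loaded into the sandbox first and the transformation would run inside it.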
❑ Phase 3: Model Planning
Determine the methods, techniques and workflow you intend to follow for the
model scoring.
Explore the data to learn about the relationships between variables, and
subsequently select key variables and the models you are likely to use.
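One common way to explore relationships between variables is to compute the Pearson correlation of each candidate variable with the target and keep the strongly related ones. The sketch below is only one possible approach; the variables `ad_spend` and `rainfall`, the target values and the 0.5 threshold are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical candidate variables and target (made-up values).
candidates = {
    "ad_spend": [1, 2, 3, 4, 5],
    "rainfall": [2, 1, 3, 1, 2],
}
target_sales = [2, 4, 6, 8, 10]

correlations = {name: pearson(values, target_sales) for name, values in candidates.items()}

# Keep variables whose absolute correlation passes an assumed threshold of 0.5.
key_variables = [name for name, r in correlations.items() if abs(r) >= 0.5]
print(correlations, key_variables)
```

Correlation only captures linear relationships, so it is a starting point for variable selection rather than the final word.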
❑ Phase 4: Model Building
Develop data sets for testing, training, and production purposes.
Get the best environment you can for executing models and workflows, including
fast hardware and parallel processing.
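Developing separate data sets can be sketched as a shuffled split of the available records. The 60/20/20 proportions, the fixed seed and the function name `split_dataset` are assumptions for illustration; here the third part stands in for data reserved for production-style checks.

```python
import random


def split_dataset(records, train_fraction=0.6, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into three disjoint parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_fraction)
    n_test = round(len(shuffled) * test_fraction)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    remainder = shuffled[n_train + n_test:]
    return train, test, remainder


# Hypothetical data set: ten record ids.
records = list(range(10))
train_set, test_set, production_set = split_dataset(records)
print(len(train_set), len(test_set), len(production_set))
```

Keeping the parts disjoint matters: a model evaluated on records it was trained on will look better than it really is.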
❑ Phase 5: Communicate Results
Determine if you succeeded or failed, based on the criteria you developed in the
Discovery phase, in collaboration with your stakeholders.
Identify your key findings, quantify the business value, and develop a narrative to
summarize your findings and convey them to stakeholders.
❑ Phase 6: Operationalize