Introduction to Data Mining: Unit 1
TODAY’S AGENDA
Course management
Brief overview of Data Mining and allied fields
Summary of a few impactful articles and recent trends
8/24/2019
COURSE MANAGEMENT
LEARNING OBJECTIVES
Learn the art of modeling and interpreting large, complicated data sets via predictive and descriptive data mining methods.
Become familiar with several online data repositories and learn how to participate in data analytics competitions held on Kaggle.com and other sites.
Gain advanced-level expertise in data analytics software and languages such as KNIME and Python.
COURSE OVERVIEW
Data Preparation
Classification Techniques
Clustering
Text Analytics
Regression Analysis
Principal Component Analysis
Association Rule Mining
KNIME
Python
Data on Kaggle Website
http://www.kaggle.com/
BOOKS
ACKNOWLEDGEMENT
Although I am not following the two books below closely, their slides are very popular in academia and I will be using them occasionally:
Data Mining: Concepts and Techniques (2011)
Introduction to Data Mining (2018)
MARKS DISTRIBUTION
Midterm 25
Final 40
Project 10
Assignments 15
MEETING HOURS
Office Hours:
Monday/Wednesday: noon – 1PM and ?
or by appointment (by e-mailing me at sahaider@iba.edu.pk).
Traffic Predictions: Google Maps
Online Transportation Networks: Uber/Careem for price prediction
Video Surveillance: crime detection
Fraud Detection: financial institutions
FALL 2019, Sajjad Haider
MACHINE LEARNING
A SIMPLIFIED TAXONOMY
Data Science > Data Analytics > Data Mining > Machine Learning (each field encompasses the ones to its right)
Data analytics also deals with visualization.
Data science also deals with data acquisition and data management.
Besides machine learning, data mining also makes use of statistical models.
DATA MINING
1. Statistical Models
2. Machine learning
Data scientists are the people who understand how to fish out answers
to important business questions from today’s tsunami of unstructured
information.
As companies rush to capitalize on the potential of big data, the largest
constraint many face is the scarcity of this special talent.
Big data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.
The demand for deep analytical positions in a big data world could exceed the supply being produced on current trends by 140,000 to 190,000 positions.
There is a need for 1.5 million additional managers and analysts in the US who can ask the right questions and consume the results of big data analysis effectively.
“Big data refers to data sets whose size is beyond the ability of
typical database software tools to capture, store, manage and analyze.”
- The McKinsey Global Institute, 2011
3 V’S
In the past couple of years, data science courses and specializations have been extremely popular on MOOC websites:
Johns Hopkins University (Coursera)
University of Washington (Coursera)
Google (Udacity)
UC Berkeley (edX)
University of Toronto (Coursera)
And many others
RECENT TRENDS
Despite the success stories, many companies aren’t getting the value they
could from data science.
Four of the top seven “barriers faced at work”:
lack of management/financial support
lack of clear questions to answer
results not used by decision makers and
explaining data science to others
Data science, broadly defined, has been around for a long time. But the failure
rates of big data projects in general and AI projects in particular remain
disturbingly high.
The following were found to be the two most important reasons:
Many data scientists are much more interested in pursuing their craft, namely finding interesting nuggets buried in data, than in solving business problems.
From the company’s perspective, the talent is rare and protecting data scientists from the
chaos of everyday work just makes sense. But doing so increases the distance between
data scientists and the company’s most important problems and opportunities.