
INTRODUCTION TO DATA SCIENCE AND ANALYTICS BS PHA | 1A PH A.Y 2021 – 2022

1 | Introduction to Data Science and Analytics
2ND SEMESTER – 1ST SHIFT ELE IDSA | PARENA

OUTLINE
I. Data Science & Analytics
II. Project Sparta
III. The Roles in Analytics / The Analytics Job Families
IV. Data Science
V. History of Data Science and Analytics
VI. Evolution of Data Science and Analytics
VII. A Brief History of Data Science
VIII. Data Mining

LEGEND
SOURCE | COLOR CODE
Powerpoint | #000000
Lecture | #0b5394
Book | #a64d79
Other Sources | #77695a

Table No 1. Projected DSA Workforce Demand in Select Economies
Economy | Current DSA Workers | Projected DSA Workers Needed | Percent Change
Malaysia²⁴ | 4,000 (2016) | 20,000 (2020) | 400%
The Philippines²⁵ | 147,420 (2016) | 340,880 (2022) | 131%
Singapore²⁶ | 9,300 (2015) | 15,000 (2018) | 61%
Canada²⁷ | 33,600 (2016) | 43,300 (2020) | 33%
United States²⁸ | 2,350,000 (2015) | 2,720,000 (2020) | 16%
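
The Percent Change column of Table No 1 can be read as (projected - current) / current × 100. The sketch below is only an illustration of that arithmetic on a few rows copied from the table; the function name `percent_change` and the dictionary `dsa_workforce` are made up for this example and do not come from the lecture.

```python
def percent_change(current: int, projected: int) -> float:
    """Return the percent change from the current to the projected worker count."""
    return (projected - current) / current * 100


# (current DSA workers, projected DSA workers needed), copied from Table No 1
dsa_workforce = {
    "Malaysia": (4_000, 20_000),
    "The Philippines": (147_420, 340_880),
    "Singapore": (9_300, 15_000),
    "United States": (2_350_000, 2_720_000),
}

for economy, (current, projected) in dsa_workforce.items():
    # Round to a whole percent, the way the table reports it
    print(f"{economy}: {percent_change(current, projected):.0f}%")
```

Reading the table this way shows why the Philippines stands out: it starts from a large base and still more than doubles.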

Data Science & Analytics:

● New techniques to solve problems
● Problems that we have in our world are mostly data-driven
● Alexa, Google Home Device, Apple Home Kit, etc.
○ Listen actively
○ Tailor advertisements to what they have heard, through algorithms
● With your phone with you, even if the microphone is turned off, you'll be surprised at the amount of data being harnessed based on what you're saying, where you went, and your interactions
● If something on the internet is free, you must be the one being sold

According to Harvard Business Review (2012)
● Data Science is one of the sexiest jobs of the 21st century
● Highest paying
○ According to Glassdoor, data scientists earn a base pay of $116,840 (estimated 6 million pesos) a year, on average. - Business Insider

Project Sparta
● A government initiative in 2020 to have (initially) 30,000 scholars
● An online scholarship you can take through coursehera(?), where you can choose different roles in analytics (e.g. data steward, data engineer, data analyst, etc.)
● For free
● Finish it in 2 years
● Problem: Fewer than 1,000 have graduated or finished their entire chosen programs

The Roles in Analytics / The Analytics Job Families

BELTRAN, CO, ESCALONA, LEGADA, MARQUEZ | 1A-PH | BATCH 2025 1



ROLE | DESCRIPTION

Data steward
● Domain expert
● Expert in legal matters in data gathering
● Develops, enforces, and maintains an organization's data governance process, data usage, and data security policy to ensure data assets provide the organization with high quality data
● Expertise: business, industry domain
● Data gatekeeper
● Most knowledgeable in determining how to deal with missing data and how to fix uncleaned data
● Related job titles:
○ Data Privacy Officer
○ Data Security Officer
○ Data Governance Manager
○ Data Curator
○ Data Librarian

Data engineer
● Usually graduates of computer science
● Design, construct, and maintain data infrastructures, including apps that extract, clean, transform, and load data from the data sources to centralized data repositories
● Expertise: information technology, information science, computer science
● Ensure that data from centralized data repositories are in sync with the various data sources
● Responsible for making sure that the data infrastructure is available to stakeholders during agreed times
● Related job titles:
○ ETL Developer
○ Data Architect
○ Data Warehousing Professional
○ Big Data Engineer

Data modelers/Analyst
● Creates mathematical models and algorithms
● Leverages statistical techniques
● Creates analytical models to derive new insight from quantitative and qualitative data
● Expertise: mathematics, statistics
● Have keen eyes to find trends and patterns from current and historical pieces of information
● This allows them to make predictions on what could potentially happen next
● Related job titles:
○ Statistician
○ Statistical Modeler
○ Advanced Analytics Professional

Business/Functional Analyst
● Interpret the model produced by the modelers
● Utilize data and leverage derived insights to help organizations make better decisions on a specific functional domain
● Expertise: business, industry domains
● Domain expertise that will validate the insights derived by the data scientist
● Make final prescriptions to the leadership team, guiding the leaders of the organization to make better decisions
● Related job titles:
○ Research Analyst
○ Human Resource Analyst
○ Marketing Analyst
○ Financial Analyst
○ Operations Analyst

Project Manager/Analytics Manager
● Develop and guide data-driven projects from initiation to planning, execution to performance monitoring
● Expertise: project management
● Brings the analytics team together with knowledge of what to expect from each team member
● Ensures the successful delivery of analytics projects
● Related job titles:
○ Chief Data Officer
○ Project Manager
○ Data Engineering Manager
○ Data Science Manager
○ Analytics Translator

● The analytics association believes the job families identified are differentiated enough, especially in their role in the data value chain and their identified areas of expertise.
● By understanding this distinction, organizations and analytics practitioners can share the same set of expectations to ensure that:
1. Organizations get the most out of their analytics efforts
2. Analytics practitioners are positively engaged by performing tasks that are aligned to their profession and career path.
● Extract-Transform-Load (ETL)
○ A type of data integration which is used for blending data from numerous data sources.
○ Frequently used for building data warehouses.

Professional Maturity Model

● The list of competencies comes with three proficiency levels to identify how proficient analytics professionals are as they mature in their respective roles.

Levels | Other Name/s | DEFINITIONS
Level 1 | Entry | ● Pre-defined tasks and works under guidance
Level 2 | Intermediate | ● Formulate and solve tasks to achieve a wide range of organizational goals. ● Works independently on the solutions, development and operation.
Level 3 | Expert | ● Can identify new approaches and application areas to achieve organizational goals. ● Assess multiple alternative solutions based on a structured analysis and experience, and can propose new approaches if necessary.

● Maximum proficiency level per competency per role.

Competency | Steward | Engineer | Scientist | Analyst | Manager
Domain Knowledge | 3 | 1 | 2 | 3 | 3
Data Governance | 3 | 2 | 2 | 2 | 3
Operational Analytics | 3 | 3 | 3 | 3 | 3
Research Methods | 1 | 1 | 3 | 1 | 3
Computing | 1 | 2 | 3 | 1 | 3
21st Century Skills | 3 | 3 | 3 | 3 | 3

Remaining rows (role columns unclear in the source):
Data Visualization: 2, 2, 3, 3, 1
Data Engineering: 3, 3, 1, -, -
Statistical Techniques: 1, 2, 3, -, -
Methods & Algorithms: 1, 3, 3, -, -

● Business and industry (or organization) competencies
○ Data stewards
○ Functional Analysts
● Technical Competencies
○ Data engineers
○ Data scientists
● Business, industry and technical competencies
○ Data Manager
● All of them need to have proficiency in the 21st Century skills

Data Science

Data Science Skill Set

● Data science, due to its interdisciplinary nature, requires an intersection of abilities:
○ Hacking skills
○ Math and statistics knowledge, and
○ Substantive expertise in a field of science.
● Hacking Skills are necessary for working with massive amounts of electronic data that must be acquired, cleaned, and manipulated.
● Math and Statistics Knowledge allows a data scientist to choose appropriate methods and tools in order to extract insight from data.
● Substantive Expertise in a scientific field is crucial for generating motivating questions and hypotheses and interpreting results.
● Traditional Research lies at the intersection of knowledge of math and statistics with substantive expertise in a scientific field.
● Machine Learning stems from combining hacking skills with math and statistics knowledge, but does not require scientific motivation.
● Danger Zone! Hacking skills combined with substantive scientific expertise without rigorous methods can beget incorrect analyses.

● Inter-disciplinary and must be balanced.
● Hacking skills are necessary for massive amounts of data that can be acquired, cleaned and manipulated.
● Data Science is a combination of substantive expertise, math and statistics knowledge, and hacking skills.
● The combination of math and statistics knowledge and hacking skills would be categorized as machine learning.
● The combination of substantive expertise and math and statistics knowledge would be categorized as traditional research.
● There is a danger zone: incorrect analysis would occur if a person does not have math and statistics knowledge, only substantive expertise and hacking skills.

Data Science vs. Data Analytics

● Data science and data analytics are often used interchangeably.
● Massive data has become a major component in the technical world.
● This comes from the actionable insights and results that businesses can derive from data science.
● It requires understanding and having proper tools to work with the data, to sift through it, and to uncover the right information.
● Used to assess data to know what is useful from what's not.
● They provide different results and pursue different approaches.

DATA SCIENCE vs. DATA ANALYTICS

SCOPE
Data Science: Macro
Data Analytics: Micro

GOAL
Data Science:
● To ask the right questions
● Solution for PREDICTED PROBLEMS
● Multi-disciplinary field focused on finding actionable insights from large sets of raw and structured data.
● Uses techniques to obtain answers incorporating computer science, predictive analytics, statistics and machine learning to establish solutions.
● You have not even thought of the problem yet, and a solution is already being made.
Data Analytics:
● To find actionable data
● Solution for CURRENT PROBLEMS
● Processing and performing statistical analysis of existing data sets.
● Creating methods to capture, process, and organize data to uncover actionable insights for current problems and establish the best way to present the data obtained.
● Solving problems, based on producing results for future improvements.

MAJOR FIELDS
Data Science: Machine Learning, AI, Search Engine Engineering, Corporate Analytics
Data Analytics: Healthcare, gaming, travel, industries with immediate data needs

USING BIG DATA
Data Science: Yes
Data Analytics: Yes

● Their functions can be considered as interconnected.
● Data science and data analytics are unique fields with the scope being their major difference.
● Data science is an umbrella term for the group of fields that mine large data sets.
● Data analytics is more focused and is part of the larger project.
● Question of Exploration
○ Data science does not answer specific queries; instead, it parses through data sets, sometimes in unstructured ways, to expose insights. These are broader insights that concentrate on what questions should be asked.
■ Lays the foundation and parses big data sets to create initial observations, future trends and potential insights.
■ The information itself can be used for modeling, improving machine learning and enhancing artificial intelligence algorithms.
○ Data analysis works better when it is focused, having questions in mind that need answers from existing data. It discovers answers to questions being asked.
■ Actionable insights with practical applications

History of Data Science and Analytics
● Necessity is always the Mother of Invention.
● Because we need something immediately, we need to invent something.
● A great reference is Hidden Figures (movie).

1970s
REPORT WRITING
Goal: Automation

● Report writing aimed to get rid of the rudimentary job of reporting transactions on a daily, weekly, or monthly basis.
● That is where COBOL (Common Business-Oriented Language) started, which is still being used in some banks/banking systems today to handle our accounts.
● For printing statements of account, banks are still using the dot-matrix printer and the noisy one [assumed to be a fax machine].
● Only Problem: Once you reach the 2000s, the year now becomes
00. For interest computations, this would be confusing since the computations would become negative (due to 00).
● The main job of people at that time, before the 2000s, was to update the systems so they would be able to survive the Y2K bug. This was not anticipated back then, since the main target was to save memory, but we want automation (automation to be able to generate needed reports).

1980s
CENTRALIZED SYSTEM
Goal: ERP (Enterprise Resource Planning)
MIS (Management Info System)

● We already have MIS and ERP to accept the reports coming from the departments (especially in business sectors). There would be several departments, not only one, especially in big companies.
● To accept these reports from several departments, you have to place them in a single repository.
● In the 1980s, they needed to physically go to the MIS department and request the report to be made. This made it possible to track who requested the report, what report was requested, and other information.
● Looking at the picture above, they were still using big tape drives that would consume a very big space. If you had an MIS department, it was like a fully air-conditioned warehouse just for this thing.
● Our current flash drives might still have a bigger capacity compared to the devices before.

1990s
BUSINESS INTELLIGENCE
Goal: Apps for Everyone

● Applications for personal use were invented and made to share (not YET to analyze).
● Report generation became the responsibility of the person who wanted to use or present the report. They no longer needed to assign someone to go to the MIS department; instead, they had to make the report themselves.
● This ushered in the start of desktop publishing. This made the computer mouse that we use today in our computers a necessity. So PowerPoint presentations came about during the 1990s; MS Office for Windows was released. That is where it started.
● If we look at it closely, it's all about applications in business (because that is really where data was first used), but mostly for reporting, or for somebody else to analyze just to give out the information.
● BUT generating the insights and trying to make sense of what more the data has to offer was not yet done much at that time; it was just for SHARING.

Desktop Publishing
November 19, 1990
● Microsoft Office for Windows is released, otherwise known as "Office 1.0". Office 1.0 contains Word 1.1, Excel 2.0, and PowerPoint 2.0.
● In the same year as the release, Microsoft became the first personal computer software company to exceed $1 billion in sales in one year.

2000s
INTERNET AND DATA MINING

● Home Internet became available, no longer only for military or educational purposes.
● Slowly, the Internet was made available to the general public. Now, everyone can search and create information online. Smartphones already started to get into the picture (but not like the ones we use today and not as powerful).
● In the US, some students are required to be acquainted with the Texas Instruments calculator. It is a big calculator with a big screen (sometimes colored) that can do graphing and everything.
● But the problem is that your cell phones can do everything that this expensive calculator can do (faster and more portable). Still, students are not allowed to use cell phones for their calculations, even when taking examinations. Only TI calculators may be used.

2010s
BIG DATA

● In 2010, Big Data came into play. What made it possible? Because of improving internet connection and infrastructure (for the creation of a lot of data; billions of
data per second).
● Big data was conceptualized as data science and real-time analysis were now required in industries.
● E.g.,
○ Before, to know the traffic situation, you had to go to the website called Traffic.com to know which roads were congested. Now, we just use Waze.
○ Google Maps has been improved, allowing turn-by-turn navigation. It sometimes has overlays of the actual buildings, or augmented reality.
○ But take note of the difference between Google Maps and Waze:
■ Waze has the real-time traffic situation, and adjusts accordingly depending on which road you are taking, in consideration of the traffic situation.
■ In Google Maps, there is none; it is just point A to point B. If you're going to change direction or course somewhere in between, it will just redirect you to point B again, without considering the traffic situation.
● If you're going to the provinces and expect that there will be no internet but you want a navigation capability, what you can do is download Maps.me (available in the App Store and Google Play).
○ Maps.me - provides offline navigation provided that you download the map; also has maps for other cities

Evolution of Data Science and Analytics

● 1970s - Report Writing
● 1980s - MIS (Management Information System)
● 1990s - Business Intelligence
● 2000s - Data Mining
● 2010s - Analytics, Data Discovery, and Data Science
● With this in mind, as you progress, there are now differences when it comes to the skill sets you must possess, or that graduates would possess, when you enter the workforce.
● Remember, as future pharmacists, you're not just required to be able to understand pharmacy, but you are also expected to be able to handle or use Microsoft Office or other desktop publishing/editing software. But are you skillful enough to do data analysis? You should be able to know how to analyze data.
● Technology and other necessary skills allow industries to meet the demands of the time.

A Brief History of Data Science

● Data Science started with statistics and has evolved to include concepts such as Artificial Intelligence, machine learning, neural networks, and the Internet of Things.
● As more and more data becomes available (first by way of recorded shopping behavior and trends), businesses have been collecting and storing it in ever greater amounts. They buy your data based on your browsing habits, history, and which sites you visit. They're tracking it and selling it to others.
● With the growth of the internet, the IoT, and the exponential growth of data volume available to enterprises, there has been a flood of new information, or Big Data. Once the doors were opened by businesses seeking to increase profits and drive better decision making, the use of big data started being applied to other fields such as medicine, engineering, and social sciences.
● Data Science has become an important part of business and academic research. Technically, this would include machine translation, robotics, speech recognition, and search engines (in fact, you could even search now just using your voice if you have your digital assistant with you).
● In terms of research areas, data science has expanded to include biological sciences, healthcare, medical informatics, humanities, and social sciences. Data Science now influences economics, governments, business, and finance.
● The needs of the industry, as demanded by the fast-moving realities of the present time, also make analytics evolve.
● Before, it was all just reporting and a little analysis using Excel; you have your monitoring because you have your Dashboards and Scoreboards. In 2010, we already used predictive analytics.
● E.g.,
○ According to OCTA Research, there would be 1,000 cases a day by the end of February.
○ How did they do that? Because of predictive analytics.
● Take note that what you are going to do with all that data can only be answered if you know your domain.

What are you going to do with all that data?
● The VALUE in the data "haystack" is guided by your knowledge of the DOMAIN - not the tools or technique.
● Finding that VALUE - the combination of all the skill sets that you need - is ANALYTICS.
● If you're working as a pharmacist, you must also have expertise in pharmacy, to know which part of the data set is useful to you or not.

What is Data Science and Analytics?

● Netflix curates the movies you might watch based on your search history and the movies that you have watched.
● Education - in China, they use a technology to determine if the student is paying attention in class
● Health Care - helped in diagnosing patients
○ Machine Learning in Disease Diagnosis
○ Genetics & Genomics
○ Drug Development
○ Virtual Assistance for Customer Support
● In Taiwan, everyone has a national ID that helps in getting data in pharmacies

Evolution of Analytics
● Where do we use it?
● E.g., Shopee - Will I meet my committed package deliveries to my customers?
- used in trying to select who would be the one to endorse your brand
● Analytics - is a process; an art of bringing sense of the data to bear on decision-making
● Successful use of analytics and data mining requires both an understanding of the business context/field of expertise where value is to be captured and an understanding of exactly what the data mining methods do.
Found in logistics:
● E.g., Lazada, Showbusiness, Mining, Medicine
● It already has a wide scope.
● E.g., the connection between the North Luzon and South Luzon Expressways - there were many changes in the design before the final output, because of the ever-changing data

● Remember: Right now, we don't just need the information, but we need the insights. We have predictive analytics there. But the aim is for us to have prescriptive analytics. How do we optimize? (one of the questions to be answered)
● Information - to answer what happened, where exactly is the problem, what needs attention
● Insights - to answer why is this happening, what's the next best action (predictive modeling), what's the best thing that could happen (optimization)?
● E.g., Traffic.com vs Waze
○ Traffic.com - provides information only; shows which roads are actually heavy with traffic
○ Waze - can show you where to go, providing the shortest route or fastest time to get there (can provide insights); can also give alternative routes

TERM | DEFINITION
Descriptive | What happened?
● Describes historical data
● Helps understand how things are going
Diagnostic | Why did it happen?
● Helps understand unique drivers
● Segmentation, Statistical, and Sensitivity analysis
Predictive | What could happen?
● Forecast future performance, events, and results
Prescriptive | How to make it happen?
● Analysis that suggests a prescribed action
Cognitive | What to do, why, and how?
● Proactive action
● Learn at scale
● Reason with purpose
● Interact naturally

● Analytics is used in several cases. It can be used in stocks, Facebook, and other social media algorithms. Facebook uses analytics for establishing your social bubble. (You mostly see only the people whose posts you view often.) Your timeline is based on your interests

Data Mining

● Finding useful patterns in data
● It is the process of knowledge discovery, machine learning and predictive analytics
● Extracting Meaningful Patterns
● Building Representative Models
● Combination of Statistics, Machine Learning, and Computing Algorithms

Motive of Data Mining

● Recognize valid, probable, advantageous, and understandable patterns in data
● Database technology has developed to the point where huge amounts of data are stored in databases, and the wealth of knowledge hidden in those datasets is collected by business people as a usable tool in business-vital decisions
○ Ex: Shopee - Which products sell well? Which do not? Where are you going to put your money? Which product are you going to order more? How long [will it take] before this product actually fades into oblivion?
● Data mining attracts more attention as it is needed to extract more valuable information
● Used to seek intelligence in databases - the procedure of recognizing and extracting useful information and subsequent knowledge from databases using mathematical, statistical, AI, and machine learning techniques.
● Provides many different algorithms to apply to different tasks
○ All these algorithms attempt to fit the model to the data

Data mining algorithms can be described as consisting of three parts:

TERM | DEFINITION
Model | Objective is to fit the model to the data
Preference | Some identification tests must be used to fit one model over another
Search | Where all algorithms are necessary for processing to find data

Data mining is NOT:
● descriptive statistics
● exploratory visualization
● dimensional slicing
● hypothesis testing
● queries

Types of Learning Models

TERM | DEFINITION
Supervised | ● Directed data mining
● The model generalizes the relationship between the input and output variables
Unsupervised | ● Undirected data mining
● The objective of this class of data mining techniques is to find patterns in data based on the relationship between data points themselves
● Not just between input and output, but among your data points

Types of Learning Models (S = supervised, US = unsupervised)
1. Classification Models (S)
2. Regression Models (S)
3. Clustering Models (S/US)
4. Anomaly Detection (US)
5. Time Series Forecasting (US)
6. Association (US)
7. Text and Sentiment Analysis (US)

Predictive Models:
● Predict values of data using known results coming from various data
● Predictive modeling may be based on the use of different historical data
● Comprised of: classification, regression, time series analysis, and prediction
○ Statistical regression - supervised learning technique that incorporates an [inaudible] of the dependency of a few attribute values on the values of other attributes
■ The other attributes are used for you to come up with a prediction
■ Linear, multi-linear regression
○ Classification - tries to classify which class an item actually belongs to
○ Time Series analysis - based on data that is concerned with time

Data Mining: Steps

1. Business Understanding
○ determining the problem; doesn't mean something is not working
○ How to optimize the process (how can you make it even better?)
2. Data Understanding
○ If we have this particular problem, what data do we need to be able to address that problem?
3. Data Preparation
○ Clean the data
○ 80% of the data process falls on the data preparation
○ This is where it's important to focus, because you need to understand which of your data is useful and which is not
4. Modeling
5. Testing and Evaluation
6. Deployment
*it goes back to the start

CRISP-DM
● Used within the business industry
● Based on current research, this is the most widely used form of data mining model because of its various advantages, which solved the existing problems in the data mining industries
● Drawback: does not perform project management activities

● Sequence between phases is not strict
● Back and forth between different phases is required
● The arrows indicate the most important and frequent dependencies between phases
● Outer circle symbolizes the cyclic nature of data mining itself
○ Data mining process continues after a solution has been deployed
○ Lessons learned during the process can trigger new, often more focused business questions, and a subsequent data mining process will benefit from the experiences of previous ones
○ Not necessarily addressing a problem, more on how to optimize
■ How are we going to create a better solution?
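
As a rough illustration of steps 3 to 5 (Data Preparation, Modeling, Testing and Evaluation), the sketch below cleans a tiny data set, fits a simple linear regression, and checks it on held-out records. Everything in it is an assumption made for the example: the numbers are invented, the variable meanings (advertising spend and units sold) are hypothetical, and plain least squares stands in for whatever model a real project would choose.

```python
from statistics import mean

# Hypothetical raw records: (advertising spend, units sold); None marks a missing value
raw = [(1, 12), (2, 19), (3, None), (4, 41), (None, 30),
       (5, 52), (6, 58), (7, 71), (8, 79)]

# Step 3, Data Preparation: keep only the complete records
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# Hold out the last two clean records for Step 5
train, test = clean[:-2], clean[-2:]


def fit_line(points):
    """Step 4, Modeling: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(train)

# Step 5, Testing and Evaluation: mean absolute error on the held-out records
mae = mean(abs(y - (slope * x + intercept)) for x, y in test)

print(f"kept {len(clean)} of {len(raw)} records")
print(f"model: y = {slope:.2f} * x + {intercept:.2f}")
print(f"mean absolute error on held-out data: {mae:.2f}")
```

If the error were too large, the cycle would go back to an earlier step (more data, better cleaning, a different model), which is the back-and-forth between phases that CRISP-DM describes.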
