
Big Data Analytics

3 credits (SKS)

Gandhi Pawitan

09/24/2023 Big Data Analytics # 1


Course Contract
• Full attendance: minimum 80% (attendance is recorded in the app)
• Final grade (NA): Assignments 25%, Midterm exam (UTS) 35%, and Final exam (UAS) 40%
• Assignments: name/student ID (NPM)/date, essays, independent assignments, in-class activities
• Midterm and final exams: essay and multiple-choice questions
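The final grade is a weighted average: NA = 0.25 × Assignments + 0.35 × Midterm + 0.40 × Final exam. A minimal sketch, assuming all three components are scored on the same 0-100 scale (the contract does not state the scale); the function name `final_grade` is hypothetical:

```python
# Weights from the course contract: assignments 25%, midterm (UTS) 35%, final exam (UAS) 40%.
WEIGHTS = {"assignments": 0.25, "midterm": 0.35, "final_exam": 0.40}


def final_grade(assignments: float, midterm: float, final_exam: float) -> float:
    """Return the weighted final grade (NA); assumes every score is on a 0-100 scale."""
    scores = {"assignments": assignments, "midterm": midterm, "final_exam": final_exam}
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside 0-100")
    # Weighted sum of the three components.
    return sum(WEIGHTS[name] * value for name, value in scores.items())


print(final_grade(80, 70, 90))
```

Because the final exam carries the largest weight, one point on the UAS moves the final grade more than one point on either of the other components.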



Course Topics
Part 1: Introduction
1. Introduction to Big Data Analytics for Business
2. Business motivation for applying big data and adopting big data in business
Part 2: Descriptive analytics
3. Data Mining and Text Mining
4. Classification
5. Clustering
Part 3: Predictive analytics
6. Logistic regression model analysis
7. Time series modeling and forecasting (moving average and exponential smoothing)
Part 4: Prescriptive analytics
8. Linear optimization (chapters 13 and 14)
9. Integer optimization (chapter 15)
10. Decision analysis (chapter 16)
Big Data is a field dedicated to the
• analysis,
• processing, and
• storage
of large collections of data that frequently originate from disparate sources.

Big Data addresses distinct requirements, such as:
• combining multiple unrelated datasets,
• processing large amounts of unstructured data, and
• harvesting hidden information in a time-sensitive manner.

Big Data analysis draws on both:
• traditional analytic approaches based on statistics, and
• newer techniques that leverage computational resources and approaches to execute analytic algorithms.
Big Data analysis is an interdisciplinary endeavor that blends mathematics, statistics, computer science, and subject-matter expertise.

Data within Big Data environments generally accumulates from being amassed within the enterprise via applications, sensors, and external sources.



The results obtained through the processing of Big Data can
lead to a wide range of insights and benefits, such as:



Concepts and Terminology
Datasets
Collections or groups of related data
Data Analysis
Data analysis is the process of examining data to find facts,
relationships, patterns, insights and/or trends.
Data Analytics
Data analytics is a discipline that includes the management of the
complete data lifecycle, which encompasses collecting, cleansing,
organizing, storing, analyzing and governing data.
Data analytics has developed methods that allow data analysis to occur through
the use of highly scalable distributed technologies and frameworks that are
capable of analyzing large volumes of data from different sources.



Big Data analytics lifecycle:
identifying, procuring, preparing and analyzing large amounts of raw,
unstructured data to extract meaningful information that can serve as an
input for identifying patterns, enriching existing enterprise data and
performing large-scale searches.

In business-oriented
environments, data analytics
results can lower operational costs
and facilitate strategic decision
making.



Analytics are distinguished by the results they produce:
• descriptive analytics
• diagnostic analytics
• predictive analytics
• prescriptive analytics



• Data dashboards: Collections of tables,
charts, maps, and summary statistics that are
updated as new data become available
• Uses of dashboards
• To help management monitor specific aspects of
the company’s performance related to their
decision-making responsibilities
• For corporate-level managers, daily data
dashboards might summarize sales by region,
current inventory levels, and other company-wide
metrics
• Front-line managers may view dashboards that
contain metrics related to staffing levels, local
inventory levels, and short-term sales forecasts



• Descriptive analytics: Encompasses the set of techniques that
describe what has happened in the past; examples:
• Data queries
• Reports
• Descriptive statistics
• Data visualization (including data dashboards)
• Data-mining techniques
• Basic what-if spreadsheet models
• Data query: A request for information with certain characteristics
from a database
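A data query can be illustrated with Python's built-in `sqlite3` module and an in-memory database. This is a minimal sketch: the `sales` table, its columns, and its rows are hypothetical, made up only to show a request for records with certain characteristics:

```python
import sqlite3

# Hypothetical sales table, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "A", 120.0),
        ("East", "B", 80.0),
        ("West", "A", 200.0),
        ("West", "B", 50.0),
    ],
)

# Data query: request only the rows with certain characteristics
# (here, West-region sales above 100). Parameters are bound with "?" placeholders.
rows = conn.execute(
    "SELECT region, product, amount FROM sales WHERE region = ? AND amount > ?",
    ("West", 100),
).fetchall()
print(rows)
conn.close()
```

Binding values through placeholders rather than string formatting keeps the query safe from SQL injection.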

Roll-up – groups data across multiple categories to show subtotals and totals.

Drill-down – enables a detailed view of the data of interest by focusing on a data subset from the summarized view.
Example: performing drill-down operations to break down sales by type and location so that it can be determined which locations underperformed for specific types of policies.
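Roll-up and drill-down can be sketched with plain dictionaries. A minimal illustration, assuming hypothetical policy-sales records of the form (location, policy type, amount); the function names `roll_up` and `drill_down` are also hypothetical:

```python
from collections import defaultdict

# Hypothetical policy sales records: (location, policy_type, amount).
SALES = [
    ("North", "health", 500),
    ("North", "auto", 300),
    ("South", "health", 200),
    ("South", "auto", 100),
]


def roll_up(records):
    """Roll-up: aggregate amounts into a subtotal per location plus a grand total."""
    subtotals = defaultdict(int)
    for location, _policy_type, amount in records:
        subtotals[location] += amount
    return dict(subtotals), sum(subtotals.values())


def drill_down(records, location):
    """Drill-down: from the summarized view, break one location's sales down by policy type."""
    detail = defaultdict(int)
    for loc, policy_type, amount in records:
        if loc == location:
            detail[policy_type] += amount
    return dict(detail)


subtotals, total = roll_up(SALES)
print(subtotals, total)
print(drill_down(SALES, "South"))
```

The roll-up shows which location has the lower subtotal; drilling down into that location then shows which policy types account for the shortfall.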
Predictive Analytics
Predictive analytics are carried out in an attempt to determine the outcome of
an event that might occur in the future.
Questions are usually formulated using a what-if rationale, such as the
following:
• What are the chances that a customer will default on a loan if they have
missed a monthly payment?
• What will be the patient survival rate if Drug B is administered instead of
Drug A?
• If a customer has purchased Products A and B, what are the chances
that they will also purchase Product C?
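The third question can be answered, in its simplest form, as a conditional relative frequency: among past baskets that contain A and B, what share also contain C? A minimal sketch with hypothetical baskets; a production predictive model would usually be more elaborate (for example, the logistic regression covered later in the course):

```python
# Hypothetical purchase baskets; each set lists the products in one transaction.
BASKETS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C", "D"},
]


def conditional_purchase_rate(baskets, given, target):
    """Estimate P(target | all items in `given`) as a relative frequency over past baskets."""
    matching = [basket for basket in baskets if given <= basket]  # subset test
    if not matching:
        return None  # no past basket contains the given items, so no estimate
    return sum(target in basket for basket in matching) / len(matching)


print(conditional_purchase_rate(BASKETS, {"A", "B"}, "C"))
```

Here four baskets contain both A and B, and three of those also contain C, so the estimate is 3/4.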



Techniques used in Predictive
Analytics:
• Linear regression
• Time series analysis
• Data mining is used to find
patterns or relationships among
elements of the data in a large
database; often used in predictive
analytics
• Simulation involves the use of
probability and statistics to
construct a computer model to
study the impact of uncertainty
on a decision
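Of the techniques listed, linear regression is the easiest to show in a few lines. A minimal sketch of a simple (one-predictor) least-squares fit, using the standard closed form b1 = Sxy / Sxx and b0 = mean(y) - b1 × mean(x); the data (advertising spend versus sales) and the function name are hypothetical:

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1


# Hypothetical data: advertising spend (x) versus sales (y).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(ad_spend, sales)
print(b0, b1)
print(b0 + b1 * 6)  # predicted sales for a new spend level
```

The fitted line is then used predictively: plugging in a spend level not yet observed yields a forecast of sales.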
Prescriptive Analytics
Prescriptive analytics build upon the results of predictive analytics by
prescribing actions that should be taken.
Sample questions may include:
• Among three drugs, which one provides the best results?
• When is the best time to trade a particular stock?
Techniques:
• Simulation optimization: Combines the use of probability and statistics to model
uncertainty with optimization techniques to find good decisions in highly complex
and highly uncertain settings
• Decision analysis
• Used to develop an optimal strategy when a decision maker is faced with several decision
alternatives and an uncertain set of future events
• Employs utility theory, which assigns values to outcomes based on the decision maker’s
attitude toward risk, loss, and other factors
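Decision analysis can be sketched as choosing the alternative with the highest expected utility over the uncertain future events. In this minimal sketch the payoff table, the event probabilities, and the exponential utility function are all hypothetical; with the default identity utility the rule reduces to plain expected value:

```python
import math

# Hypothetical payoff table: payoff of each alternative under each uncertain future event.
PAYOFFS = {
    "expand": {"strong_demand": 200, "weak_demand": -50},
    "hold": {"strong_demand": 80, "weak_demand": 20},
    "divest": {"strong_demand": 30, "weak_demand": 30},
}
# Hypothetical probabilities of the future events (they must sum to 1).
PROBABILITIES = {"strong_demand": 0.4, "weak_demand": 0.6}


def best_alternative(payoffs, probabilities, utility=lambda x: x):
    """Return (alternative, score) with the highest expected utility.

    With the default identity utility the score is the plain expected value.
    """
    scores = {
        alt: sum(probabilities[event] * utility(value) for event, value in outcomes.items())
        for alt, outcomes in payoffs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def risk_averse(x):
    """Hypothetical concave utility: a sure payoff is preferred to a gamble with the same expected value."""
    return -math.exp(-x / 100)


print(best_alternative(PAYOFFS, PROBABILITIES))
print(best_alternative(PAYOFFS, PROBABILITIES, utility=risk_averse))
```

By expected value, "expand" wins (0.4 × 200 + 0.6 × (-50) = 50), but under the risk-averse utility the safer "hold" is preferred. This is the point of utility theory: the decision maker's attitude toward risk can change the best choice.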



Prescriptive Analytics: Indicates a best course of action to take
• Optimization models: Models that give the best decision subject to the
constraints of the situation
Examples of optimization models (model, field, purpose):
• Portfolio models (Finance): Use historical investment return data to determine the mix of investments that yields the highest expected return while controlling or limiting exposure to risk
• Supply network design models (Operations): Provide the cost-minimizing plant and distribution center locations subject to meeting the customer service requirements
• Price markdown models (Retailing): Use historical data to yield revenue-maximizing discount levels and the timing of discount offers when goods have not sold as planned
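The price markdown idea can be sketched as a brute-force search over a few candidate discount levels, subject to a stock constraint. This stands in for the formal linear and integer optimization models covered later in the course. The list price, the demand estimates per discount, and the stock limit are hypothetical, and the sketch picks only the discount level, not its timing:

```python
# Hypothetical inputs: list price and the units expected to sell at each candidate
# discount (in practice these estimates would come from historical sales data).
LIST_PRICE = 100.0
EXPECTED_UNITS = {0.0: 40, 0.1: 55, 0.2: 70, 0.3: 80, 0.4: 85}
STOCK = 75  # constraint: only this many units are left to sell


def best_markdown(list_price, expected_units, stock):
    """Return (discount, revenue) maximizing revenue, never selling more than the stock."""

    def revenue(discount):
        units_sold = min(expected_units[discount], stock)  # enforce the stock constraint
        return list_price * (1 - discount) * units_sold

    best = max(expected_units, key=revenue)
    return best, revenue(best)


print(best_markdown(LIST_PRICE, EXPECTED_UNITS, STOCK))
```

Deeper discounts sell more units, but once demand exceeds the remaining stock, extra discount only gives revenue away; the search finds that trade-off point.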

Three developments spurred recent explosive growth in the use of analytical
methods in business applications:
• First development: Technological advances
• Scanner technology, data collection through e-commerce, Internet social networks, and data
generated from personal electronic devices produce enormous amounts of data for
businesses

• Second development: Methodological developments


• Advances in computational approaches to effectively handle and explore massive amounts
of data
• Faster algorithms for optimization and simulation
• More effective approaches for visualizing data

• Third development: Explosion in computing power and storage capability
• Better computing hardware, parallel computing, and cloud computing have enabled
businesses to solve big problems faster and more accurately than ever before
Data Analytics in the Business World
• Data mining is focused on better understanding
characteristics and patterns among variables in
large databases using a variety of statistical and
analytical tools.
• Simulation and risk analysis rely on spreadsheet
models and statistical analysis to examine the
impacts of uncertainty in the estimates and their
potential interaction with one another on the
output variable of interest.
• Spreadsheets and formal models allow one to
manipulate data to perform what-if analysis: how
specific combinations of inputs that reflect key
assumptions will affect model outputs.
• Visualizing data and the results of analyses provides a
way of easily communicating data at all levels of a
business and can reveal surprising patterns and
relationships.



Big Data
Volume
Velocity
Variety
Veracity
Value
Big Data
• Walmart handles over 1 million purchase transactions per hour.
Facebook processes more than 250 million picture uploads per day. Six
billion cell phone owners around the world generate vast amounts of
data by calling, texting, tweeting, and browsing the web on a daily basis.
• Because data can now be collected electronically, the available amounts
of it are staggering. The Internet, cell phones, retail checkout scanners,
surveillance video, and sensors on everything from aircraft to cars to
bridges allow us to collect and store vast amounts of data in real time.
• To describe this phenomenon, the term Big Data was coined.



Big data
• Businesses have realized that understanding big data can lead to a
competitive advantage.
• Although big data represents opportunities, it also presents challenges in
terms of data storage and processing, security, and available analytical talent.
• This has led to new technologies like Hadoop—an open-source programming
environment that supports big data processing through distributed storage and
distributed processing on clusters of computers.
• MapReduce is a programming model used within Hadoop that performs the two
major steps for which it is named: the map step and the reduce step. The map step
divides the data into manageable subsets and distributes it to the computers in the
cluster (often termed nodes) for storing and processing. The reduce step collects
answers from the nodes and combines them into an answer to the original problem.
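The map and reduce steps can be imitated in plain Python with the classic word-count example. This is a single-process sketch of the data flow only, not Hadoop's actual API (Hadoop jobs are typically written as Java `Mapper`/`Reducer` classes). The "nodes" are simulated list slices, and the `shuffle` function stands in for the grouping by key that Hadoop performs between the map and reduce steps; all function names here are hypothetical:

```python
from collections import defaultdict
from itertools import chain


def map_step(chunk):
    """Map: runs on one node's subset of the data and emits (word, 1) pairs."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group intermediate pairs by key so each reducer sees all values for one word."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: combine the partial answers for one key into a final count."""
    return key, sum(values)


def map_reduce_word_count(lines, num_nodes=3):
    # Divide the data into manageable subsets, one per simulated node.
    chunks = [lines[i::num_nodes] for i in range(num_nodes)]
    mapped = chain.from_iterable(map_step(chunk) for chunk in chunks)
    return dict(reduce_step(k, v) for k, v in shuffle(mapped).items())


docs = ["big data needs big clusters", "data on many nodes", "big results"]
print(map_reduce_word_count(docs))
```

In a real cluster each `map_step` call would run on a different node in parallel, which is why the map function must work on its own subset without seeing the rest of the data.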



Big Data
Big data: A set of data that cannot be managed, processed, or analyzed
with commonly available software in a reasonable amount of time
• Represents opportunities
• Presents challenges in terms of data storage and processing, security, and
available analytical talent
• More companies are hiring data scientists who know how to process and
analyze massive amounts of data



• Volume
• Because data are collected electronically,
we are able to collect more of it.
• To be useful, these data must be stored,
and this storage has led to vast
quantities of data. Many companies now
store in excess of 100 terabytes of data
(a terabyte is 1,024 gigabytes).

Typical data sources:
• online transactions, such as point-of-sale and banking
• scientific and research experiments, such as the Large Hadron Collider
and Atacama Large Millimeter/Submillimeter Array telescope
• sensors, such as GPS sensors, RFIDs, smart meters and telematics
• social media, such as Facebook and Twitter
Velocity
• Real-time capture and analysis of data
present unique challenges both in how data
are stored and the speed with which those
data can be analyzed for decision making.
• For example, the New York Stock Exchange
collects 1 terabyte of data in a single trading
session, and having current data and real-
time rules for trades and predictive modeling
are important for managing stock portfolios.
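High velocity means processing each record as it arrives instead of storing everything first and analyzing it later. A minimal sketch of that style: a moving average over the most recent trade prices, updated in constant memory per incoming value. The price stream and the `RollingAverage` class are hypothetical:

```python
from collections import deque


class RollingAverage:
    """Keeps a moving average over the most recent `window` values of a stream."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.values = deque(maxlen=window)  # the oldest value drops out automatically
        self.total = 0.0

    def update(self, value):
        """Add one new value and return the current moving average."""
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]  # subtract the value about to be evicted
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)


# Hypothetical stream of trade prices, processed one at a time as they arrive.
avg = RollingAverage(window=3)
for price in [10.0, 11.0, 12.0, 13.0]:
    print(avg.update(price))
```

Keeping a running total makes each update O(1) regardless of window size, which matters when records arrive faster than a full recomputation could keep up.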



• Variety
In addition to the sheer volume and speed with which companies
now collect data, more complicated types of data are now
available and are proving to be of great value to businesses.



• Veracity
• Veracity has to do with how much uncertainty is in the data.
• For example, the data could have many missing values, which makes reliable
analysis a challenge.
• Inconsistencies in units of measure and the lack of reliability of responses in
terms of bias also increase the complexity of the data.
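The two veracity problems named above, missing values and inconsistent units, can be handled in a small cleaning step. A minimal sketch with hypothetical temperature readings; dropping incomplete records is only one option (imputing a value is another), and the function name is hypothetical:

```python
# Hypothetical sensor readings: one value is missing and the units are inconsistent.
READINGS = [
    {"sensor": "s1", "temp": 21.5, "unit": "C"},
    {"sensor": "s2", "temp": None, "unit": "C"},
    {"sensor": "s3", "temp": 71.6, "unit": "F"},
    {"sensor": "s4", "temp": 20.0, "unit": "C"},
]


def clean_temperatures(readings):
    """Drop missing values, convert every reading to Celsius, and count what was dropped."""
    cleaned, missing = [], 0
    for r in readings:
        if r["temp"] is None:
            missing += 1  # record how much data was lost, which affects reliability
            continue
        if r["unit"] == "F":
            celsius = (r["temp"] - 32) * 5 / 9
        elif r["unit"] == "C":
            celsius = r["temp"]
        else:
            raise ValueError(f"unknown unit {r['unit']!r}")
        cleaned.append({"sensor": r["sensor"], "temp_c": round(celsius, 2)})
    return cleaned, missing


cleaned, missing = clean_temperatures(READINGS)
print(cleaned, missing)
```

Without the unit conversion, the 71.6 °F reading would look like an extreme outlier next to the Celsius values, although it is actually 22 °C.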



Value
Value is defined as the usefulness of data for an enterprise.

Data that has high veracity and can be analyzed quickly has more
value to a business.
Apart from veracity and time, value is also impacted by the
following lifecycle-related concerns:
• How well has the data been stored?
• Were valuable attributes of the data removed during data
cleansing?
• Are the right types of questions being asked during data analysis?
• Are the results of the analysis being accurately communicated to
the appropriate decision-makers?



Big Data
• The five Vs have led to new technologies
• Hadoop: An open-source programming environment that supports big data
processing through distributed storage and processing over multiple
computers
• MapReduce: A programming model used within Hadoop that performs two
major steps: the map step and the reduce step
• Data security: The protection of stored data from destructive forces
or unauthorized users



Different Types of Data
Data can be human-generated or machine-generated. The main types are:

• structured data
• unstructured data
• semi-structured data
Structured Data
Structured data conforms to a data model or schema and is often stored in tabular form.

Unstructured Data
Unstructured data does not conform to a data model or data schema. It is estimated to make up 80% of the data
within any given enterprise.

Semi-structured Data
Semi-structured data has a defined level of structure and consistency, but is
not relational in nature.
Instead, semi-structured data is hierarchical or graph-based. This kind of
data is commonly stored in files that contain text.
Examples of common sources of semi-structured data include electronic
data interchange (EDI) files, spreadsheets, RSS feeds and sensor data.
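To show how hierarchical, semi-structured data differs from tabular, structured data, here is a minimal sketch that flattens one nested JSON record (JSON is chosen here as a common semi-structured text format) into rows with a fixed set of columns. The smart-meter record, its field names, and the `flatten` function are hypothetical:

```python
import json

# Hypothetical semi-structured record: hierarchical, with a nested object and a nested list.
RAW = """
{
  "device": "meter-17",
  "location": {"city": "Springfield", "zone": "B"},
  "readings": [
    {"time": "08:00", "kwh": 1.2},
    {"time": "09:00", "kwh": 0.9}
  ]
}
"""


def flatten(record):
    """Turn one hierarchical record into structured rows with a fixed set of columns."""
    location = record.get("location", {})  # tolerate records without a location
    base = {
        "device": record["device"],
        "city": location.get("city"),
        "zone": location.get("zone"),
    }
    # One output row per nested reading, repeating the parent fields.
    return [{**base, "time": r["time"], "kwh": r["kwh"]} for r in record.get("readings", [])]


rows = flatten(json.loads(RAW))
for row in rows:
    print(row)
```

The nested structure has no fixed schema (a record may omit `location` or carry any number of readings), whereas the flattened rows all share the same columns and could be loaded into a relational table.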



Answer the following questions:
1. Describe what kind of entity ETI is.
2. Explain what drove ETI to adopt Big Data technology.
3. Explain which analytics ETI applies and why.
4. Explain the data characteristics and data types defined by ETI.
5. Explain the outcomes ETI achieved by applying Big Data.