1
The Evolution of Analytics
2
The Evolution of Analytics
3
The Evolution of Analytics
4
Big Data Perspectives
5
Big Data Impacts to Core Business Practices
6
Big Data Market Revenue Forecast
7
Size of Hadoop & Big Data Market
8
Big Data Market by Segment
9
Big Data Revenue by Segment
10
The Fastest Growing Category
11
Big Data Initiatives and Success Rate
12
Big Data in Business & Government
13
Big Data in Business & Government
14
Big Data Practices and Initiatives
15
16
Introducing Big Data
• Big Data is a strategic initiative built upon the premise
that internal data does not hold all the answers.
• Big Data has the ability to change the nature of
business.
• Many organizations' sole existence is based upon their
capability to generate insights that only Big Data can
deliver.
• Big Data is not just about technology—it is also
about how these technologies can propel an
organization forward.
17
Big Data as A Field
• Analysis, processing, and storage of large collections of
data from many sources.
• Big Data = Traditional Statistics + Analytic Algorithms
• Increasingly important as
• Datasets continue to become larger, more diverse, more complex, and
more streaming-centric.
• Advances in computational sciences have allowed the
processing of entire datasets, making the sampling done in
traditional statistics unnecessary.
• Interdisciplinary: Mathematics, Statistics, Computer
Science, and Subject Matter Expertise.
18
Data Within Big Data
• Accumulates within the enterprise via
applications, sensors, and external sources.
• Can be processed by a Big Data solution.
• Can be used by enterprise applications directly.
• Can be fed into a data warehouse to enrich the
existing data.
19
Insights and Benefits
• Operational optimization
• Actionable intelligence
• Identification of new markets
• Accurate predictions
• Fault and fraud detection
• More detailed records
• Improved decision-making
• Scientific discoveries
20
Concepts and Terminology
• Datasets
• Data Analysis
• Data Analytics
• Business Intelligence (BI)
• Key Performance Indicators (KPI)
21
Datasets
23
Data Analytics
• A broader term that encompasses data
analysis.
• Includes the management of the
complete data life-cycle:
• Collecting
• Cleansing
• Organizing
• Storing
• Analyzing
• Governing
• Includes the development of analysis methods,
scientific techniques, and automated
tools.
Figure 1.3 The symbol used to represent data analytics.
24
Data Analytics
• Uses highly scalable distributed
technologies and frameworks that are
capable of analyzing large volumes of
data from different sources.
• Involves identifying, procuring, preparing, and
analyzing large amounts of raw,
unstructured data to extract
meaningful information for:
• Identifying patterns
• Enriching existing enterprise data
• Performing large-scale searches
Figure 1.3 The symbol used to represent data analytics.
25
4 Categories of Analytics
27
Descriptive Analytics
29
Diagnostic Analytics
• Provides more value than descriptive analytics.
• Requires a more advanced skillset.
• Requires collecting data from multiple sources.
• Requires storing data in a structure that facilitates
drill-down and roll-up analysis.
• Viewed via interactive visualization tools that
enable users to identify trends and patterns.
• Asks what information is related to the phenomenon.
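As a rough illustration of the drill-down and roll-up analysis mentioned above, the sketch below aggregates a few sales records to the region level (roll-up) and then breaks one region into its cities (drill-down). The records, field order, and function names are hypothetical and exist only for this example; real diagnostic tools typically do this over a dimensional data store rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, amount). Illustrative values only.
sales = [
    ("East", "Boston", 120),
    ("East", "New York", 200),
    ("West", "Seattle", 90),
    ("West", "Portland", 60),
    ("East", "Boston", 30),
]

def roll_up(records):
    """Aggregate amounts to the region level (less detail)."""
    totals = defaultdict(int)
    for region, _city, amount in records:
        totals[region] += amount
    return dict(totals)

def drill_down(records, region):
    """Break one region's total into its cities (more detail)."""
    totals = defaultdict(int)
    for rec_region, city, amount in records:
        if rec_region == region:
            totals[city] += amount
    return dict(totals)

by_region = roll_up(sales)
east_detail = drill_down(sales, "East")
print(by_region)
print(east_detail)
```

Rolling up answers "which region is unusual?", while drilling down helps locate which city within that region explains it.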
30
Diagnostic Analytics
31
Predictive Analytics
• Aims to determine the outcome of an event that might
occur in the future.
• Enhances information with meaning to generate
knowledge (how information is related).
• Generates future predictions based on past events.
• Examples:
• What are the chances that a customer will default on a loan
if they have missed a monthly payment?
• What will the patient survival rate be if Drug B is administered
instead of Drug A?
• If a customer has purchased Products A and B, what are the
chances that they will also purchase Product C?
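The loan-default question above can be sketched as an empirical conditional probability estimated from past events. This is a deliberately minimal stand-in for a real predictive model; the `history` records and the `default_rate` helper are hypothetical and the values are made up for illustration.

```python
# Hypothetical historical loans: (missed_payment, defaulted). Made-up values.
history = [
    (True, True), (True, False), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]

def default_rate(records, missed_payment):
    """Estimate P(default | missed_payment) as a simple historical frequency."""
    outcomes = [defaulted for missed, defaulted in records
                if missed == missed_payment]
    if not outcomes:
        raise ValueError("no historical records for this condition")
    return sum(outcomes) / len(outcomes)

rate_if_missed = default_rate(history, missed_payment=True)
rate_if_paid = default_rate(history, missed_payment=False)
print(f"missed a payment: {rate_if_missed:.0%}, paid on time: {rate_if_paid:.0%}")
```

In practice a predictive model would use many more attributes and a much larger dataset, but the principle is the same: past events are used to estimate the likelihood of a future one.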
32
Predictive Analytics
• Predicts the outcomes of events based on patterns,
trends, and exceptions in historical and current data.
• Identifies both risks and opportunities.
• Involves the use of large datasets comprising
internal and external data and various data analysis
techniques.
• Has greater value and requires a more advanced
skillset than both descriptive and diagnostic
analytics.
33
Predictive Analytics
34
Prescriptive Analytics
• Builds upon the results of predictive analytics.
• Prescribes actions that should be taken.
• Focuses not only on prescribing the best option to
follow, but also on why.
• Results can be reasoned about because they
embed elements of situational understanding.
• Used to gain an advantage or mitigate a risk.
• Examples:
• Among three drugs, which one provides the best
results?
• When is the best time to trade a particular stock?
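A minimal sketch of the drug example above: given predicted outcomes (assumed to come from an earlier predictive step), pick the best option and state why. The success rates and the `prescribe` function are hypothetical; real prescriptive analytics would also weigh business rules, costs, and risks.

```python
# Hypothetical predicted success rates from a prior predictive step.
predicted_success = {"Drug A": 0.62, "Drug B": 0.71, "Drug C": 0.68}

def prescribe(predictions):
    """Return the best option plus a short reason (the 'why')."""
    if len(predictions) < 2:
        raise ValueError("need at least two options to compare")
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (runner_up, runner_score) = ranked[0], ranked[1]
    reason = (f"{best} has the highest predicted success rate "
              f"({best_score:.0%}), ahead of {runner_up} ({runner_score:.0%}).")
    return best, reason

choice, why = prescribe(predicted_success)
print(choice)
print(why)
```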
35
Prescriptive Analytics
• Has more value than any other type of analytics.
• Requires the most advanced skillset, as well as
specialized software and tools.
• Calculates various outcomes.
• Suggests the best course of action for each
outcome.
• Incorporates internal data with external data.
• Internal Data: business rules, historical data,
customer information, product data.
• External Data: social media, weather forecasts,
government-produced demographic data.
36
Prescriptive Analytics
38
Business Intelligence
39
Key Performance Indicators (KPI)
• A metric to gauge success within a particular business context.
• Linked to the enterprise’s overall strategic goals and objectives.
• Identify business performance problems.
• Demonstrate regulatory compliance.
• Act as quantifiable reference points for measuring a specific aspect of
a business’ overall performance.
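To make the idea of a quantifiable reference point concrete, the sketch below computes one hypothetical KPI, an on-time delivery rate, and compares it with a target. The KPI choice, the figures, and the function names are assumptions made for illustration only.

```python
def on_time_delivery_rate(delivered_on_time, total_orders):
    """KPI value: share of orders delivered on time (0.0 to 1.0)."""
    if total_orders <= 0:
        raise ValueError("total_orders must be positive")
    return delivered_on_time / total_orders

def kpi_status(actual, target):
    """Compare a higher-is-better KPI against its target."""
    return "on target" if actual >= target else "below target"

# Hypothetical figures for one reporting period.
rate = on_time_delivery_rate(delivered_on_time=930, total_orders=1000)
status = kpi_status(rate, target=0.95)
print(f"On-time delivery: {rate:.1%} ({status})")
```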
40
Key Performance Indicators (KPI)
41
Big Data Characteristics
42
Volume
• The volume of data is substantial and ever growing.
• High data volumes impose distinct:
• Data storage demands,
• Data processing demands,
• Additional processes for data preparation, curation, and management.
• Data sources:
• Online transactions, such as point-of-sale (POS) and banking.
• Scientific research experiments.
• Sensors (GPS, RFID, Smart meters, and Telematics)
• Social media (Facebook and Twitter)
43
Volume
44
Velocity
• Data can arrive at fast speeds.
• Enormous datasets can accumulate within a very
short time.
• Demands highly elastic and available data processing
solutions and data storage capability.
• Depending on the data source, velocity may not
always be high. Example: MRI scan images vs. Internet
traffic logs.
45
Velocity
46
Variety
• Refers to the multiple formats and types of data that need to be
supported by Big Data solutions.
• Brings challenges for enterprises in terms of data
integration, transformation, processing, and storage.
47
Veracity
• Refers to the quality or fidelity of data.
• Leads to data processing activities to resolve invalid
data and remove noise.
• Data can be part of signal or noise.
• Noise is data that cannot be converted into
information and thus has no value.
• Data with a high signal-to-noise ratio has more
veracity.
• Data that is acquired in a controlled manner usually
contains less noise (e.g., online customer registrations vs.
blog postings).
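The signal-to-noise idea above can be sketched by classifying each record as signal or noise with a validity rule and taking the ratio. The rule used here (a record needs an email containing "@") and both sample datasets are hypothetical; real veracity checks depend on the data and the business context.

```python
def is_signal(record):
    """Hypothetical validity rule: a record needs an email containing '@'."""
    email = record.get("email")
    return isinstance(email, str) and "@" in email

def signal_to_noise(records):
    """Ratio of valid (signal) records to invalid (noise) records."""
    signal = sum(1 for record in records if is_signal(record))
    noise = len(records) - signal
    return float("inf") if noise == 0 else signal / noise

# Controlled acquisition (a registration form) vs. uncontrolled (blog comments).
registrations = [
    {"email": "a@example.com"}, {"email": "b@example.com"},
    {"email": "c@example.com"}, {"email": None},
]
blog_comments = [
    {"email": "d@example.com"}, {"text": "great post"},
    {"text": "first!"}, {},
]

print(signal_to_noise(registrations), signal_to_noise(blog_comments))
```

Under this rule the registration data has the higher ratio, which matches the point that data acquired in a controlled manner usually contains less noise.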
48
Value
• The usefulness of data for an enterprise.
• Related to the veracity characteristic.
• Depends on how long data processing takes, because analytics results
have a shelf-life.
• Value and time are inversely related.
• Stale results inhibit quality and speed of informed decision-making.
49
Value, Veracity and Time
Figure 1.15 Data that has high veracity and can be analyzed quickly has
more value to a business.
50
Value Lifecycle-related Concerns
• How well has the data been stored?
• Were valuable attributes of the data removed during
data cleansing?
• Are the right types of questions being asked during
data analysis?
• Are the results of the analysis being accurately
communicated to the appropriate decision-makers?
51
Data Sources
• Human-generated: the result of human interaction
with systems:
• Online services
• Digital devices
52
Human-generated Data
53
Machine-generated Data
55
Structured Data
• Conforms to a data model or data schema.
• Often stored in tabular form.
• Often stored in a relational database
• Frequently generated by enterprise applications and information
systems such as ERP and CRM systems.
• Rarely requires special consideration in processing or storage.
• Examples:
• Banking transactions
• Invoices
• Customer records
Figure 1.18 The symbol used to represent structured data
stored in a tabular form.
56
Unstructured Data
• Does not conform to a data model or data schema.
• Makes up an estimated 80% of the data within any given enterprise.
• Has a faster growth rate than structured data.
• Either textual or binary (image, audio, video data).
• Non-relational.
Figure 1.19 Video, image and audio files are all types
of unstructured data.
57
Unstructured Data
• Special-purpose logic is usually required to process and store it (e.g.,
the correct codec is needed to play a video file).
• Cannot be directly processed or queried using SQL.
• If stored within a relational database, it is stored in a table as a Binary
Large Object (BLOB).
• A Not-only SQL (NoSQL) database is a non-relational database that can
be used to store unstructured data alongside structured data.
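The BLOB point above can be tried with Python's built-in `sqlite3` module. In this sketch the table name `media`, its columns, and the payload bytes are hypothetical. Note that SQL can report the size of the stored object and return it whole, but it cannot query what the bytes represent.

```python
import sqlite3

# A few arbitrary bytes standing in for a binary file (not a real image).
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0x01, 0x02])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)
# Python bytes objects are bound as BLOB values.
conn.execute(
    "INSERT INTO media (name, content) VALUES (?, ?)", ("logo.png", payload)
)

# SQL sees the BLOB only as an opaque run of bytes: its length is available,
# but the meaning of the content is not queryable.
name, size = conn.execute("SELECT name, length(content) FROM media").fetchone()
(stored,) = conn.execute(
    "SELECT content FROM media WHERE name = ?", ("logo.png",)
).fetchone()
conn.close()

print(name, size, stored == payload)
```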
58
Semi-structured Data
• Has a defined level of structure and consistency, but is not relational in
nature.
• Hierarchical or graph-based.
• More easily processed than unstructured data.
• Requires special pre-processing and storage, especially if the underlying
format is not text-based.
• Examples:
• EDI files
• Spreadsheets
• RSS feeds
• Sensor data
Figure 1.20 XML, JSON and sensor data are semi-structured.
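As a small illustration of why semi-structured data is hierarchical rather than relational, and why it usually needs pre-processing, the sketch below parses a JSON sensor message and flattens it into dotted key paths that could be loaded into a table. The message layout and the `flatten` helper are hypothetical.

```python
import json

# Hypothetical sensor message: nested objects and a list, no fixed table shape.
raw = (
    '{"sensor": {"id": "s-1", "location": {"lat": 1.5, "lon": 2.5}},'
    ' "readings": [20.1, 20.4]}'
)

def flatten(node, prefix=""):
    """Turn a nested dict/list structure into {dotted.path: scalar} pairs."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = node
    return flat

flat = flatten(json.loads(raw))
for path, value in flat.items():
    print(path, "=", value)
```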
59
Metadata
• Provides information about a dataset’s
characteristics and structures.
• Mostly machine-generated.
• Can be appended to data.
• Tracking of metadata is crucial to Big
Data processing, storage, and analysis.
• Examples:
• XML tags about author and creation date.
• Attributes stating the file size and image
resolution of a photo.
Figure 1.21 The symbol used to represent metadata.
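The XML example above can be sketched with the standard library's `xml.etree.ElementTree`. The document layout (a `meta` element holding `author` and `created` tags) is an assumption made for this example, and the byte size is added as one more machine-generated attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose <meta> element carries the metadata tags.
document = """<report>
  <meta>
    <author>Jane Doe</author>
    <created>2016-01-15</created>
  </meta>
  <body>Quarterly sales summary.</body>
</report>"""

def extract_metadata(xml_text):
    """Collect the tags under <meta> plus the size of the document in bytes."""
    root = ET.fromstring(xml_text)
    meta = root.find("meta")
    found = {child.tag: child.text for child in meta} if meta is not None else {}
    found["size_bytes"] = len(xml_text.encode("utf-8"))
    return found

metadata = extract_metadata(document)
print(metadata)
```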
60