
Chapter 8: Big Data Usage

Topic 9B

Hyung-Eun Choi

Assistant Professor of Finance, NEOMA Business School

Big Data for Finance, Spring 2024

8.1 Introduction

Big Data Usage:

The core business task of advanced data usage is to support
business decisions by:
Prediction
In-use analytics
Simulation
Exploration
Visualization
Modeling
Industry 4.0:
The digitization and interlinking of products, production
facilities, and transportation infrastructure as part of the
developing “Internet of Things” generates enormous amounts of data.
Again, what about Finance 2.0?

8.2 Key Insights for Big Data Usage

Predictive Analytics:
A prime example of the application of predictive analytics is
predictive maintenance, which uses sensor and context data to
predict deviations from standard maintenance intervals.
Examples of predictive analysis in the financial industry?
Industry 4.0:
A growing trend in manufacturing is the employment of
cyber-physical systems.
Industry 4.0 stands for the entry of IT into the manufacturing
industry and brings with it a number of challenges for IT
support,
e.g., planning and simulation, monitoring and control, interactive
use of machinery, logistics and enterprise resource planning
(ERP), predictive analysis, and eventually prescriptive analysis.

8.2 Key Insights for Big Data Usage (Cont.)

Smart Data and Service Integration:


Services that solve these tasks enable the application of smart
services to big data usage problems:
on a hardware level - machines, facilities, networks;
on a conceptual level - intelligent devices, services, decisions;
on an infrastructure level - IaaS, PaaS, SaaS.
Interactive Exploration:
Data analysts have a growing need to explore datasets and
analysis results.
This is addressed through visual analytics and new, dynamic
forms of data visualization, but new user interfaces with new
capabilities for the exploration of data are still needed.

8.3 Social and Economic Impact of Big Data Usage
Better understanding of decision-making processes:
The discovery of new relations and dependencies in the data
can lead to new economic opportunities and greater efficiency.
Wherever data is publicly available, social decision-making is
supported; where relevant data is available on an individual
level, personal decision-making is supported.
Transparency through big data usage:
Regulations and agreements on data access, ownership,
protection, and privacy
Demands on data quality
Access to the raw data
Appropriate tools or services for big data usage
e.g., in the financial sector: BEA, FRED, Census, BLS,
EDGAR, FINRA, etc.
These can reduce the gap between large firms and small and
medium-sized companies.
8.4. Big Data Usage State-of-the-Art

8.4.1 Big Data Usage Technology Stacks


Big data usage applications rely on a whole stack of technologies.
The complete big data technology stack is much broader:
the hardware infrastructure, such as storage systems, servers,
and datacentre networking, with the corresponding data
organization and management software, plus
a whole range of services, from consulting and outsourcing
to support and training, on both the business and the
technology side.
Actual user access to data usage through specific tools:
Storage, query and scripting languages: SQL, RDBMS,
Hadoop (MapReduce, Hive, Pig), Scope, AQL/Algebricks, etc.
Analytics tools: SystemT, Matlab, SAS, Vertica, SPSS,
STATA, Tableau, etc.

8.4. Big Data Usage State-of-the-Art (Cont.)
8.4.1.1 Trade-Offs in Big Data Usage Technologies
Trade-offs among volume vs. velocity vs. variety
Practical resolutions are needed for your business solutions.
Example: Google’s YouTube Data Warehouse (YTDW) -
“medium” scale, “low” latency, and “less” expressive power

8.4. Big Data Usage State-of-the-Art (Cont.)
c.f. ETL (Extract, Transform, Load)*
Extract, transform, load (ETL) is the general procedure of
copying data from data sources into a destination system.
Data extraction involves extracting data from homogeneous or
heterogeneous sources;
Data transformation cleans the extracted data and converts it
into a proper storage format/structure for the purposes of
querying and analysis;
Data loading describes the insertion of data into the final target
database such as an operational data store, a data mart, data
lake or a data warehouse.

*source: https://en.wikipedia.org/wiki/Extract,_transform,_load
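The three ETL steps above can be sketched end to end with the Python standard library. This is a minimal illustration, not a production pipeline: the CSV sample, the column names, and the SQLite target stand in for real sources and a real data mart or warehouse.

```python
import csv
import io
import sqlite3

# Extract: read raw records from a CSV source (an in-memory sample
# standing in for a real file or API; column names are hypothetical).
raw = io.StringIO("ticker,price\nAAPL, 189.5\nMSFT,\nGOOG, 141.2\n")
rows = list(csv.DictReader(raw))

# Transform: clean the data - drop rows with missing prices, strip
# whitespace, and cast prices to float so they can be queried.
clean = [
    (r["ticker"].strip(), float(r["price"]))
    for r in rows
    if r["price"].strip()
]

# Load: insert the cleaned rows into the target store (an in-memory
# SQLite database standing in for a data mart or data warehouse).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (ticker TEXT, price REAL)")
con.executemany("INSERT INTO prices VALUES (?, ?)", clean)

# Once loaded, the data supports querying and analysis.
avg = con.execute("SELECT AVG(price) FROM prices").fetchone()[0]
```

The same structure carries over when the source is heterogeneous (files, APIs, message queues) and the target is a warehouse: only the extract and load adapters change, while the cleaning logic stays in the transform step.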


8.4. Big Data Usage State-of-the-Art (Cont.)

8.4.2 Decision Support


The three business goals of decision support systems:
Lookup: On the lowest level of complexity, data is merely
retrieved for various purposes
- fact retrieval and searches for known items.
Learning: On the next level, these functionalities can support
knowledge acquisition and interpretation of data, enabling
comprehension.
- comparison, aggregation, and integration of data
Investigation: On the highest level of decision support systems,
data can be analysed, accreted, and synthesized.
- exclusion, negation, and evaluation.
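The three levels above can be illustrated on a toy dataset (all firm names, figures, and the threshold are hypothetical): lookup retrieves a known item, learning compares and aggregates, and investigation evaluates and excludes.

```python
from statistics import mean

# Hypothetical toy dataset: quarterly revenues (in millions).
revenues = {"FirmA": [10, 12, 11, 15], "FirmB": [8, 9, 14, 13]}

# Lookup: retrieve a known item (fact retrieval).
firm_a_q4 = revenues["FirmA"][3]

# Learning: compare and aggregate data, enabling comprehension.
averages = {firm: mean(q) for firm, q in revenues.items()}

# Investigation: evaluate and exclude - keep only firms whose
# average revenue clears a (hypothetical) threshold.
threshold = 11.5
candidates = {f for f, avg in averages.items() if avg > threshold}
```

Each level builds on the one below it: investigation operates on the aggregates produced at the learning level, which in turn rest on simple retrieval.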

8.4. Big Data Usage State-of-the-Art (Cont.)
8.4.3 Predictive Analysis
Predictive maintenance based on big data usage
Maintenance intervals are typically determined as a balance
between a costly, high frequency of maintenance and an equally
costly danger of failure before maintenance.
Longer maintenance intervals: “unnecessary” interruptions
of production can be avoided when the regular time for
maintenance is reached, as a predictive model allows for an
extension of the maintenance interval based on current
sensor data.
Lower number of failures: the number of failures occurring
earlier than scheduled maintenance can be reduced, with
sensor data and predictive maintenance calling for earlier
maintenance work.
Lower costs of failures: potential failures can be predicted
by predictive maintenance with a certain advance warning time,
allowing maintenance/exchange work to be scheduled and
outage times to be lowered.
Any example for the financial industry?
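The interval trade-off above can be sketched as a simple trend extrapolation: fit a line to recent sensor readings, predict when the trend crosses a failure threshold, and compare that against the scheduled maintenance date. All numbers (readings, threshold, schedule) are hypothetical, and a real system would use a richer model than a least-squares line.

```python
# Minimal predictive-maintenance sketch, assuming a single sensor
# whose reading drifts toward a known failure threshold.

def predict_failure_time(readings, threshold):
    """Fit a least-squares line to (t, reading) pairs and return the
    time at which the fitted trend crosses the failure threshold."""
    n = len(readings)
    ts = range(n)
    t_mean = sum(ts) / n
    r_mean = sum(readings) / n
    slope = sum((t - t_mean) * (r - r_mean)
                for t, r in zip(ts, readings)) \
        / sum((t - t_mean) ** 2 for t in ts)
    intercept = r_mean - slope * t_mean
    return (threshold - intercept) / slope  # predicted crossing time

# Vibration levels rising roughly linearly over 5 observation periods.
readings = [1.0, 1.2, 1.4, 1.6, 1.8]
eta = predict_failure_time(readings, threshold=3.0)

scheduled = 8  # next scheduled maintenance (same time units)
action = "advance maintenance" if eta < scheduled else "defer maintenance"
```

If the predicted crossing falls before the scheduled date, maintenance is brought forward (fewer failures); if it falls after, the interval can be safely extended (fewer unnecessary interruptions), which is exactly the trade-off the bullets describe.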
8.4. Big Data Usage State-of-the-Art (Cont.)

8.4.4 Exploration
Big datasets and the corresponding analytics results
can be distributed across multiple sources and formats:
news portals, travel blogs, social networks, web services, etc.
Users have to issue multiple requests to multiple heterogeneous
sources and media; finally, the results have to be combined
manually.
Support for this human trial-and-error approach can add value:
in a first phase, relevant data is identified; in a second,
learning phase, context is added to the data.
A third, exploration phase allows various operations for deriving
decisions from the data or for transforming and enriching it.

8.4. Big Data Usage State-of-the-Art (Cont.)
8.4.5 Iterative Analysis
Given the high volumes in big data applications, computations
are executed in parallel, distributing, storing, and managing the
state efficiently across multiple machines.
Batch-based systems such as MapReduce and Spark repeat all
computations in every iteration, even when the (partial) results
do not change.
Truly iterative dataflow systems like Stratosphere, or specialized
graph systems like GraphLab and Google Pregel, exploit such
properties and reduce the computational cost of every iteration.
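The property such systems exploit can be seen in a tiny PageRank-style iteration: once the (partial) results stop changing, further rounds add nothing, so the loop terminates on a convergence check instead of running a fixed number of full recomputations. The three-node graph below is hypothetical.

```python
# Iterative analysis sketch: PageRank-style scoring on a tiny graph,
# iterating only until the rank values stop changing.

def pagerank(links, damping=0.85, tol=1e-6, max_iter=100):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = {}
        for n in nodes:
            # Each node collects rank from the nodes linking to it,
            # split evenly over each linker's outgoing edges.
            incoming = sum(rank[m] / len(links[m])
                           for m in nodes if n in links[m])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        # Convergence check: stop as soon as results stop changing.
        if max(abs(new[n] - rank[n]) for n in nodes) < tol:
            return new
        rank = new
    return rank

graph = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = pagerank(graph)
```

A batch engine re-running this as a fixed pipeline would execute all `max_iter` rounds regardless; an iterative dataflow system stops at the convergence check, and graph systems like Pregel go further by updating only the vertices whose inputs changed.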
8.4.6 Visualization
Visualizing the results of an analysis including a presentation of
trends and other predictions by adequate visualization tools is
an important aspect of big data usage.
The selection of relevant parameters, subsets, and features is a
crucial element of data mining and machine learning with many
cycles needed for testing various settings.
e.g., autonomous driving technology, ML visual recognition,
and the Metaverse (AR/VR).
