& Framework
Pretest
• https://www.menti.com
• Join code: 6480 0540
Big Data
Mobile phones, social media, imaging technologies used to determine a medical diagnosis: all these
and more create new data, which must be stored somewhere for some purpose.
Devices and sensors automatically generate diagnostic information that needs to be stored and
processed in real time.
Data Deluge
HOW BIG?
Attributes of Big Data
Huge volume of data: Rather than thousands or millions of rows, Big Data can be billions of
rows and millions of columns.
Complexity of data types and structures: Big Data reflects the variety of new data sources,
formats, and structures, including digital traces being left on the web and other digital
repositories for subsequent analysis.
Speed of new data creation and growth: Big Data can describe high velocity data, with rapid
data ingestion and near real time analysis.
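To put "HOW BIG?" in numbers, the volume attribute above can be turned into a back-of-envelope calculation. This is only a rough sketch: it assumes a dense table with a fixed 8 bytes per value and no compression, and the "traditional" table of one million rows by 50 columns is a made-up comparison point, not a figure from these slides.

```python
def raw_size_bytes(rows: int, columns: int, bytes_per_value: int = 8) -> int:
    """Uncompressed size of a dense table, assuming fixed-width values."""
    return rows * columns * bytes_per_value


def human_readable(n_bytes: int) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    size = float(n_bytes)
    for unit in units:
        if size < 1000 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


# Hypothetical "traditional" table: one million rows, 50 columns.
traditional = raw_size_bytes(rows=1_000_000, columns=50)

# "Billions of rows and millions of columns", taken at the low end.
big = raw_size_bytes(rows=1_000_000_000, columns=1_000_000)

print("traditional:", human_readable(traditional))  # 400.0 MB
print("big data:   ", human_readable(big))          # 8.0 PB
```

Real Big Data is often sparse or compressed, so actual storage can be far smaller than this dense estimate; the point is the gap of several orders of magnitude.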
Distinctive
• Big Data cannot be efficiently analyzed
using only traditional databases or
methods.
• Big Data problems require new tools and
technologies to store, manage, and
realize the business benefit.
• These new tools and technologies enable
creation, manipulation, and management
of large datasets and the storage
environments that house them
• Most of the Big Data is unstructured or
semi-structured in nature, which requires
different techniques and tools to
process and analyze
Big Data Ecosystems
DATA ANALYTICS VS BUSINESS INTELLIGENCE
DATA DRIVEN DECISION MAKING
DATA ANALYTICS SKILL SET
Accounting & Data Analytics
• Auditing
• KPMG reports that:
• Audit must better embrace technology.
• Technology will enhance the quality,
transparency, and accuracy of the audit.
• Financial Reporting
• Financial Accounting relies on many estimates and
valuations; some believe that employing Data
Analytics may substantially improve the quality of
those estimates and valuations.
THE FRAMEWORKS
• Data science projects differ from most traditional Business
Intelligence projects and many data analysis projects in that
data science projects are more exploratory in nature.
• It is critical to have a process to govern them and ensure that
the participants are thorough and rigorous in their approach
Traditional
Theories
Positivism
Empirical
Data Inquiry
Evidence
Analysis
The Frameworks
• The scientific method, in use for centuries, still provides a
solid framework for thinking about and deconstructing
problems into their principal parts. One of the most
valuable ideas of the scientific method relates to
forming hypotheses and finding ways to test ideas.
• CRISP-DM provides useful input on ways to frame
analytics problems and is a popular approach for data
mining.
• Tom Davenport’s DELTA framework: The DELTA
framework offers an approach for data analytics
projects, including the context of the organization’s
skills, datasets, and leadership engagement.
• Doug Hubbard’s Applied Information Economics (AIE)
approach: AIE provides a framework for measuring
intangibles and provides guidance on developing
decision models, calibrating expert estimates, and
deriving the expected value of information.
• “MAD Skills” by Cohen et al. offers input for several of
the techniques that focus on model planning, execution,
and key findings.
• The IMPACT cycle, a Data Analytics model by Isson
and Harriott.
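The "expected value of information" mentioned in the AIE bullet above can be illustrated with a standard decision-analysis calculation, the expected value of perfect information (EVPI). This is a minimal sketch, not Hubbard's full procedure (which also covers calibration of estimates); the actions, states, probabilities, and payoffs are all hypothetical.

```python
# Hypothetical decision model: launch a product or hold.
probabilities = {"high_demand": 0.6, "low_demand": 0.4}
payoffs = {
    "launch": {"high_demand": 500_000, "low_demand": -300_000},
    "hold": {"high_demand": 0, "low_demand": 0},
}


def expected_value(action: str) -> float:
    """Probability-weighted payoff of one action across all states."""
    return sum(p * payoffs[action][state] for state, p in probabilities.items())


# Best we can do today: commit to one action before the state is known.
value_without_info = max(expected_value(action) for action in payoffs)

# With perfect information we would pick the best action for each state.
value_with_info = sum(
    p * max(payoffs[action][state] for action in payoffs)
    for state, p in probabilities.items()
)

# EVPI: the most it could be worth paying to remove the uncertainty.
evpi = value_with_info - value_without_info

print(f"Expected value without information: {value_without_info:,.0f}")
print(f"Expected value with perfect information: {value_with_info:,.0f}")
print(f"EVPI: {evpi:,.0f}")
```

Under these assumed numbers, any measurement costing more than the EVPI would not be worth buying, which is the kind of guidance the AIE bullet refers to.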
CRISP-DM
Stage 1. Business Understanding
• Format?
• Sourcing?
• Size?
• Characteristic
• Types
• Priority
Exploring Data
• Use this phase of CRISP-DM to
explore the data with the tables,
charts, and other visualization tools
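As an illustration of the exploration step above, a first pass over a dataset might tabulate row counts, missing values, basic statistics, and category frequencies. This is a minimal sketch using only the Python standard library; the records and the field names `region` and `amount` are invented for the example.

```python
import statistics
from collections import Counter

# Hypothetical sample records; real data would be loaded from a source system.
records = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 95.5},
    {"region": "North", "amount": 130.0},
    {"region": "East", "amount": None},      # missing value
    {"region": "South", "amount": 4000.0},   # possible outlier
]


def explore(rows, numeric_field, category_field):
    """Summarize one numeric field and one categorical field."""
    values = [r[numeric_field] for r in rows if r[numeric_field] is not None]
    return {
        "row_count": len(rows),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "category_counts": Counter(r[category_field] for r in rows),
    }


report = explore(records, "amount", "region")
for name, value in report.items():
    print(f"{name}: {value}")
```

A large gap between mean and median, as in this made-up sample, is the kind of finding that would be noted in a data exploration report.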
Writing a Data Exploration Report
Selecting Modeling Techniques
• The data types available for mining.
• Your data mining goals.
• Specific modeling requirements

Generating a Test Design
• Describing the criteria for "goodness" of a model
• Defining the data on which these criteria will be tested

Building the Models
• Parameter settings include the notes you take on parameters that produce the best results.
• The actual models produced.
• Descriptions of model results

Assessing the Model
• Adjust the parameters of existing models.
• Choose a different model to address your data mining problem.
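The modeling tasks above (test design, building models with recorded parameter settings, and assessing the result) can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset is synthetic, the "model" is a one-parameter threshold rule, accuracy is the chosen "goodness" criterion, and every third row is held out for testing. None of these choices come from CRISP-DM itself.

```python
def make_dataset(n=200):
    """Synthetic rows (x, label): label is 1 when x > 5, with a few flipped labels as noise."""
    rows = []
    for i in range(n):
        x = i / 20
        label = 1 if x > 5.0 else 0
        if i % 25 == 7:          # deterministic label noise
            label = 1 - label
        rows.append((x, label))
    return rows


def split(rows, every=3):
    """Test design: hold out every third row; build models on the rest."""
    train = [r for i, r in enumerate(rows) if i % every != 0]
    test = [r for i, r in enumerate(rows) if i % every == 0]
    return train, test


def accuracy(rows, threshold):
    """Goodness criterion: share of rows where 'x > threshold' matches the label."""
    correct = sum(1 for x, label in rows if (1 if x > threshold else 0) == label)
    return correct / len(rows)


train, test = split(make_dataset())

# Building the models: try several parameter settings and keep notes on each.
candidates = [3.0, 4.0, 5.0, 6.0, 7.0]
results = {t: accuracy(train, t) for t in candidates}
best_threshold = max(results, key=results.get)

# Assessing the model: score the chosen setting on data it has not seen.
test_accuracy = accuracy(test, best_threshold)

for t, acc in results.items():
    print(f"threshold={t}: training accuracy={acc:.3f}")
print(f"best threshold={best_threshold}, test accuracy={test_accuracy:.3f}")
```

If the test accuracy were unsatisfactory, the next step would be to adjust the parameters or choose a different model.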
Stage 5. Evaluation
• Review Process
• Determining the Next Steps
Stage 6. Deployment