Session #3: Analytics and Big Data
OUTLINE
• Defining Big Data Analytics
• Understanding Text Analytics and Big Data
• Customized Approaches for Analysis of Big Data
Big Data Results
Basic Analytics
Slicing and dicing:
• Slicing and dicing refers to breaking down your data into smaller sets of data that are
easier to explore.
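To make this concrete, here is a minimal sketch of slicing and dicing with pandas; the table, column names, and values are invented purely for illustration.

```python
# A toy slicing-and-dicing sketch with pandas; the data is hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "revenue": [100, 150, 80, 120],
})

# "Slice": restrict the data to one region.
east = sales[sales["region"] == "East"]
print(east)

# "Dice": break revenue down along two dimensions at once.
print(sales.groupby(["region", "product"])["revenue"].sum())
```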
Basic monitoring:
• You might also want to monitor large volumes of data in real time.
Anomaly identification:
• You might want to identify anomalies in your data, such as an event where the actual observation differs from what you expected, because that may clue you in that something is going wrong with your organization, manufacturing process, and so on.
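As one simple illustration, anomalies can be flagged with a z-score test; the readings and threshold below are illustrative assumptions, not a prescription.

```python
# A toy anomaly-identification sketch using z-scores on sensor readings.
import statistics

readings = [20.1, 19.8, 20.3, 20.0, 35.7, 19.9, 20.2]  # hypothetical data
mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

for value in readings:
    z = (value - mean) / stdev
    if abs(z) > 2:  # illustrative threshold; tune for your process
        print(f"Anomaly: {value} (z-score {z:.1f})")
```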
Advanced Analytics

Advanced analytics provides algorithms for complex analysis of either structured or unstructured data. It includes sophisticated statistical models, machine learning, neural networks, text analytics, and other advanced data-mining techniques.

Advanced analytics is becoming more mainstream. With increases in computational power, improved data infrastructure, new algorithm development, and the need to obtain better insight from increasingly vast amounts of data, companies are pushing toward utilizing advanced analytics as part of their decision-making process. Businesses realize that better insights can provide a superior competitive position.
Advanced Analytics Techniques
Predictive modeling:
• Predictive modeling is one of the most popular big data advanced analytics use cases. A predictive model
is a statistical or data-mining solution consisting of algorithms and techniques that can be used on both
structured and unstructured data (together or individually) to determine future outcomes.
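As a concrete (if simplified) illustration, the sketch below trains a predictive model with scikit-learn on synthetic data; in practice the features would come from your historical structured and unstructured sources.

```python
# A minimal predictive-modeling sketch; synthetic data stands in for
# real historical records with known outcomes.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))

# Predicted probabilities of the future outcome for new observations.
print(model.predict_proba(X_test[:3]))
```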
Text analytics:
• Unstructured data is such a big part of big data, so text analytics — the process of analyzing unstructured
text, extracting relevant information, and transforming it into structured information that can then be
leveraged in various ways — has become an important component of the big data ecosystem.
Other statistical and data-mining algorithms:
• These may include advanced forecasting, optimization, cluster analysis for segmentation or even microsegmentation, or affinity analysis.
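For cluster analysis specifically, a short k-means sketch with scikit-learn follows; the two features (spend and visit frequency) and the choice of two clusters are assumptions made for the example.

```python
# A toy customer-segmentation sketch with k-means.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [monthly spend, visits per month]
customers = np.array([
    [500, 2], [520, 3], [480, 2],   # high spend, low frequency
    [50, 20], [60, 25], [55, 22],   # low spend, high frequency
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print(labels)  # one segment label per customer, e.g. [1 1 1 0 0 0]
```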
Operationalized analytics
When you operationalize analytics, you make them part of a business process.
For example, statisticians at an insurance company might build a model that predicts the likelihood of a claim being fraudulent.
The model, along with some decision rules, could be included in the company’s claims-processing system to flag claims with a high probability of
fraud. These claims would be sent to an investigation unit for further review. In other cases, the model itself might not be as apparent to the end
user.
For example, a model could be built to predict customers who are good targets for upselling when they call into a call center.
The call center agent, while on the phone with the customer, would receive a message on specific additional products to sell to this customer.
The agent might not even know that a predictive model was working behind the scenes to make this recommendation.
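A rough sketch of what that operationalized scoring step might look like in code is shown below; the threshold, feature vector, and routing outcomes are hypothetical, and `model` can be any fitted classifier (for example, from scikit-learn) exposing `predict_proba`.

```python
# A hypothetical sketch of operationalized analytics: a trained model
# scores each claim, and a simple decision rule routes high-risk ones.
FRAUD_THRESHOLD = 0.8  # illustrative cutoff chosen by the business

def process_claim(claim_features, model):
    """Score one claim and decide where it goes next."""
    fraud_probability = model.predict_proba([claim_features])[0][1]
    if fraud_probability >= FRAUD_THRESHOLD:
        return "route_to_investigation_unit"
    return "process_normally"
```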
Monetizing analytics
Analytics can be used to optimize your business to create better decisions and drive bottom- and top-line revenue.
However, big data analytics can also be used to derive revenue above and beyond the insights it provides just for your
own department or company. You might be able to assemble a unique data set that is valuable to other companies, as
well.
For example, credit card providers take the data they assemble and offer value-added analytics products; financial institutions do likewise. Telecommunications companies are beginning to sell location-based insights to retailers.
The idea is that various sources of data, such as billing data, location data, text-messaging data, or web-browsing data, can be used together or separately to make inferences about customer behavior patterns that retailers would find useful. As a regulated industry, telecommunications companies must do so in compliance with legislation and privacy policies.
Modifying Business Intelligence Products to Handle Big Data
Traditional business intelligence products weren’t really designed
to handle big data. They were designed to work with highly
structured, well-understood data, often stored in a relational data
repository and displayed on your desktop or laptop computer.
This traditional business intelligence analysis is typically applied to
snapshots of data rather than the entire amount of data available.
Data

It can come from untrusted sources. Big data analysis often involves aggregating data from various sources. These may include both internal and external data sources.
It can be dirty. Dirty data refers to inaccurate, incomplete, or erroneous data. This may include
the misspelling of words; a sensor that is broken, not properly calibrated, or corrupted in some
way; or even duplicated data.
The signal-to-noise ratio can be low. In other words, the signal (usable information) may only be a
tiny percent of the data; the noise is the rest. Being able to extract a tiny signal from noisy data is
part of the benefit of big data analytics, but you need to be aware that the signal may indeed be
small.
It can be real-time. In many cases, you’ll be trying to analyze real-time data streams.
Analytical algorithms

When you’re considering big data analytics, you need to be aware that when you expand beyond the desktop, the algorithms you use often need to be refactored (changing the internal code without affecting its external functioning). The beauty of a big data infrastructure is that you can run a model that used to take hours or days in minutes. This lets you iterate on the model hundreds of times over. However, if you’re running a regression on a billion rows of data across a distributed environment, you need to consider the resource requirements relating to the volume of data and its location in the cluster. Your algorithms need to be data aware.
This approach of running analytics closer to the data sources minimizes the amount of stored data by retaining
only the high-value data. It also enables you to analyze the data sooner, looking for key events, which is critical
for real-time decision making.
Of course, analytics will continue to evolve. For example, you may need real-time visualization capabilities to display real-time data that is continuously changing. How do you practically plot a billion points on a graph? Or, how do you work with predictive algorithms so that they perform analysis fast enough and deep enough to utilize an ever-expanding, complex data set? This is an area of active research.
Infrastructure support

✓ Integrate technologies: The infrastructure needs to integrate new big data technologies with traditional technologies to be able to process all kinds of big data and make it consumable by traditional analytics.
✓ Store large amounts of disparate data: An enterprise-hardened Hadoop
system may be needed that can process/store/manage large amounts of data
at rest, whether it is structured, semi-structured, or unstructured.
✓ Process data in motion: A stream-computing capability may be needed to
process data in motion that is continuously generated by sensors, smart
devices, video, audio, and logs to support real-time decision making.
✓ Warehouse data: You may need a solution optimized for operational or
deep analytical workloads to store and manage the growing amounts of
trusted data.
Big Data Analytics Solutions
Understanding Text Analytics and Big Data
Exploring Unstructured Data
✓ Documents: In return for a loan that I have received, I promise to pay $2,000 (this amount is called principal), plus interest, to the order of the lender. The lender is First Bank. I will make all payments under this note in the form of cash, check, or money order. I understand that the lender may transfer this note. The lender or anyone who takes this note by transfer and who is entitled . . .
✓ E-mails: Hi Sam. How are you coming with the chapter on big data for the For Dummies book? It is due on Friday.
Joanne
✓ Log files: [Link] - - [08/Oct/[Link] -0400] “GET / HTTP/1.1” 200 10801 “[Link]” . . .
✓ Tweets: #Big data is the future of data!
✓ Facebook posts: LOL. What are you doing later? BFF
Understanding Text Analytics
Numerous methods exist for analyzing unstructured data.
Historically, these techniques came out of technical areas such as Natural Language
Processing (NLP), knowledge discovery, data mining, information retrieval, and statistics.
Text analytics is the process of analyzing unstructured text, extracting relevant information,
and transforming it into structured information that can then be leveraged in various ways.
The analysis and extraction processes take advantage of techniques that originated in
computational linguistics, statistics, and other computer science disciplines.
The difference between text analytics and search
Analysis and Extraction Techniques

Lexical/morphological analysis:
• Examines the characteristics of an individual word, including prefixes, suffixes, roots, and parts of speech (noun, verb, adjective, and so on), information that will contribute to understanding what the word means in the context of the text provided. Lexical analysis depends on a dictionary, thesaurus, or any list of words that provides information about those words. In the case of a wireless communication company’s sales promotion, a dictionary might provide the information that promotion is a noun that can mean an advancement in position, an advertising or publicity effort, or an effort to encourage someone’s growth. Lexical analysis would also enable an application to recognize that promotion, promotions, and promoting are all versions of the same word and idea.

Syntactic analysis:
• Uses grammatical structure to dissect the text and put individual words into context. Here you are widening your gaze from a single word to the phrase or the full sentence. This step might diagram the relationship between words (the grammar) or look for sequences of words that form correct sentences, or for sequences of numbers that represent dates or monetary values. For example, the wireless communication company’s call center records included this complaint: “The customer thought it was ridiculous that roll-over minutes were not in the plan.” Syntactic analysis would tag the noun phrases in addition to providing the part-of-speech tags.

Semantic analysis:
• Determines the possible meanings of a sentence. This can include examining word order and sentence structure and disambiguating words by relating the syntax found in the phrases, sentences, and paragraphs.

Discourse-level analysis:
• Attempts to determine the meaning of text beyond the sentence level.
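As a small concrete example, the sketch below uses NLTK for the lexical and syntactic steps; exact resource names can vary across NLTK versions, and the download calls fetch the models on first run.

```python
# A minimal lexical + syntactic analysis sketch with NLTK.
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("wordnet", quiet=True)

sentence = ("The customer thought it was ridiculous that "
            "roll-over minutes were not in the plan.")
tokens = nltk.word_tokenize(sentence)

# Lexical/morphological analysis: reduce word forms to a base form,
# the step that relates "promotion", "promotions", and "promoting".
lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(t.lower()) for t in tokens])

# Syntactic analysis: part-of-speech tags for each token.
print(nltk.pos_tag(tokens))  # e.g. [('The', 'DT'), ('customer', 'NN'), ...]
```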
Understanding the extracted information
✓ Terms: Another name for keywords.
✓ Entities: Often called named entities, these are specific examples of abstractions (tangible or intangible).
✓ Facts: Also called relationships, facts indicate the who/what/where relationships between two entities. John Smith is the CEO of
Company Y and Aspirin reduces fever are examples of facts.
✓ Events: While some experts use the terms fact, relationship, and event interchangeably, others distinguish between events and facts, stating that events usually contain a time dimension and often cause facts to change. Examples include a change in management within a company or the status of a sales process.
✓ Concepts: These are sets of words and phrases that indicate a particular idea or topic with which the user is concerned. This can be
done manually or by using statistical, rule-based, or hybrid approaches to categorization.
✓ Sentiments: Sentiment analysis is used to identify viewpoints or emotions in the underlying text. Some techniques do this by classifying text. Sentiment analysis has become very popular in “voice of the customer” kinds of applications.
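A brief sketch of sentiment scoring using NLTK’s VADER analyzer, one of many possible tools; the two example sentences are invented.

```python
# A toy sentiment-analysis sketch with NLTK's VADER model.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

analyzer = SentimentIntensityAnalyzer()
for text in ["I love this plan!", "No roll-over minutes? Ridiculous."]:
    # polarity_scores returns neg/neu/pos plus a compound score in [-1, 1].
    print(text, analyzer.polarity_scores(text))
```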
Taxonomies
Taxonomies are often critical to text analytics. A taxonomy is a method for
organizing information into hierarchical relationships. It is sometimes referred to
as a way of organizing categories. Because a taxonomy defines the relationships
between the terms a company uses, it makes it easier to find and then analyze
text.
Taxonomies can also use synonyms and alternate expressions, recognizing that
cellphone, cellular phone, and mobile phone are all the same. These taxonomies
can be quite complex and can take a long while to develop.
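To illustrate the idea, here is a toy sketch of applying such synonym mappings before analysis; the taxonomy entries are invented for the example.

```python
# A toy taxonomy lookup that normalizes surface terms to canonical ones.
TAXONOMY = {
    "mobile phone": {"cellphone", "cellular phone", "mobile phone"},
    "service plan": {"plan", "contract", "tariff"},
}

def normalize(term: str) -> str:
    """Map a term to its canonical taxonomy node, or return it unchanged."""
    t = term.lower()
    for canonical, synonyms in TAXONOMY.items():
        if t in synonyms:
            return canonical
    return t

print(normalize("Cellular phone"))  # -> "mobile phone"
```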
Putting Your Results Together with Structured Data
Putting Big Data to Use

Voice of the customer:
✓ What are major areas of complaints by customers, and how are these changing over time?
✓ What is the level of satisfaction of customers with specific services?
✓ What are the most frequent issues that lead to customer churn?
✓ What are some key customer segments that provide higher potential upsell opportunities?

Social media analytics:
✓ What are people saying about my brand?
✓ What do they like about my brand?
✓ What do they dislike about my brand?
✓ How does my brand compare to my competitors’?
✓ How loyal are my customers?
Text Analytics Tools for Big Data
Customized Approaches for Analysis of Big Data
Building New Models and Approaches to Support Big Data
Characteristics of big data analysis:
✓ Decision-oriented
✓ Action-oriented

Additional characteristics of big data analysis:
✓ It can be programmatic
✓ It can be data driven
✓ It can use a lot of attributes
✓ It can be iterative
✓ It can be quick to get the compute cycles you need by leveraging a cloud-based Infrastructure as a Service
Understanding Different Approaches to Big Data Analysis

Custom applications for big data analysis:
• R environment
• Google Prediction API

Semi-custom applications for big data analysis:
✓ Speed to deployment
✓ Stability
✓ Better quality
✓ More flexibility
Characteristics of a Big Data Analysis Framework

✓ Support for multiple data types: Many organizations are incorporating, or expect to incorporate, all types of data as part of their big data deployments, including structured, semi-structured, and unstructured data.
✓ Handle batch processing and/or real-time data streams: Action orientation is a product of analysis on real-time data streams, while decision orientation can be adequately served by batch processing. Some users will require both as they evolve to include varying forms of analysis.
✓ Utilize what already exists in your environment: To get the right context, it may be important to leverage existing data and algorithms in the big data analysis framework.
✓ Support NoSQL and other newer forms of accessing data: While organizations will continue to use SQL, many are also looking at newer forms of data access to support faster response times or faster times to decision.
✓ Overcome low latency: If you’re going to be dealing with high data velocity, you’re going to need a framework that can support the requirements for speed and performance.
✓ Provide cheap storage: Big data means potentially lots of storage, depending on how much data you want to process and/or keep. This means that storage management and the resultant storage costs are important considerations.
✓ Integrate with cloud deployments: The cloud can provide storage and compute capacity on demand. More and more companies are using the cloud as an analysis “sandbox.”