Subtitle

[MUSIC] In the previous videos, you have heard
some of our speakers talk about data sources like social media,
emails, and documents. All things you may not typically
think of as data sources. In this video, we will define big data and
big data analytics. Look at the differences between
structured, unstructured, and semi-structured data. And talk about
the applications of big data. For example, a large bank was looking at improving
their customer satisfaction ratings. The bank traditionally measured CSAT,
or Customer Satisfaction, and NPS, or Net Promoter Score,
as the two primary indicators. Both these were traditionally
obtained from periodic surveys. We worked with the bank to obtain and link a large
variety of
different sources of data. First, we used the surveys and analyzed the natural
language
text in the open comments field. This gave us a good view of the topic
areas of concern to their customers. Second, we recorded all the calls coming
to the customer service call center. Based on the speech analytics,
speech to text, and text analytics, we were able
to identify the key reasons for the calls and the emotional state
of the callers and the responders. Finally, we analyzed the social
media channels to assess what their customers were saying
about the bank and their competitors. This allowed us to build a complete
view of the customer's interactions and sentiments about the bank,
which helped them make better decisions. The amount of information available
is exploding as digitization and the Internet of Things have increased
the number of data sources and the value and complexity of data. Now, we use big
data as a term to
describe a collection of data sets so large and complex that it becomes
difficult to process using basic database management tools or
traditional data processing applications. The large data sets involved can
consist of numerous data formats in either a structured, a semi-structured,
or an unstructured form. Let's take a look at what we mean
by structured, unstructured, and semi-structured data. Think about the list of
names, addresses,
and phone numbers found in a phone book. This is an example of structured data. It
is well-defined data,
like customer names, ages, identifiers, et cetera, that you can collect formally.
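
To make the phone book example concrete, here is a minimal sketch of structured data as a table with fixed, named fields. It assumes Python's built-in sqlite3 module as a stand-in for a relational database; the table name, column names, helper functions, and sample entries are all hypothetical and not from the talk.

```python
import sqlite3


def build_phone_book(rows):
    """Load (name, address, phone) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Every record has the same three named fields: this is what makes it structured.
    conn.execute("CREATE TABLE phone_book (name TEXT, address TEXT, phone TEXT)")
    conn.executemany("INSERT INTO phone_book VALUES (?, ?, ?)", rows)
    return conn


def lookup_phone(conn, name):
    """Return the phone number for an exact name match, or None if absent."""
    row = conn.execute(
        "SELECT phone FROM phone_book WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample entries, for illustration only.
SAMPLE = [
    ("Ada Example", "1 Main St", "555-0100"),
    ("Bo Sample", "2 Oak Ave", "555-0101"),
]

conn = build_phone_book(SAMPLE)
print(lookup_phone(conn, "Bo Sample"))
```

Because each field is defined up front, a question like "what is this person's phone number?" becomes a simple query rather than a text-interpretation problem.
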
The most popular platforms for
structured data include, Oracle, Microsoft SQL Server,
Microsoft Access, and so on. Big data can be associated with structured
data sources, but not exclusively. Now, let's look at unstructured data.
Unstructured data is not broken
down into individual components. The data is a bunch of sentences
that you need to make sense of, like in a Word document. It is a collection of
videos or
audio recordings on YouTube. It is millions of e-mails or
pictures or social media posts. It can be a recorded conversation. The challenge
is,
how do you take this unstructured data and do something meaningful with it? To
understand semi-structured data, take that Word document that represents
unstructured data and add metadata, such as tags or keywords, so
that it is easily searchable. Now you have semi-structured data. Semi-structured
data does not
conform to a structural format like relational or other standard formats. Semi-
structured data includes tags and
other markers to separate data elements. Big data is not just about the data. It is
about the interconnectedness
of the data. Big data sets can be linked together, and insights can be derived
from those linkages. Today, organizations capture and
store an ever-increasing amount of data. Internet availability, interconnectedness,
rapid connection speeds, and mobility contribute to the torrent
of data points being generated daily. Organizations want to
realize the potential value of these extreme size data sets, and
discard less and less information. Whether it is customer data or
internal data. However, the existing means to process and analyze data cannot scale
to
extreme sizes economically. As far back as 2001, industry analyst
Doug Laney, currently with Gartner, articulated a now mainstream
definition of big data as three Vs: volume, velocity, and variety. A fourth V, veracity, has since been widely added. First,
let's look at volume. Volume reflects the size of a dataset. New information is
generated daily,
and in some cases hourly, creating datasets that are measured
in terabytes and petabytes. Many factors contribute to
this increase in data volume. Transaction-based data
stored through the years. Unstructured data streaming
in from social media. Increasing amounts of sensor and
machine-to-machine data being collected. In the past,
excessive data volume was a storage issue. But with decreasing storage costs,
other issues have emerged. Including how to determine relevance
within the large data volumes, and how to use analytics to create
value from the relevant data. The second V we want to
look at is velocity. This reflects the speed at which
data is generated and used. New data is being created every second. In some cases,
it may need to
be analyzed just as quickly. Radio Frequency Identification, or
RFID tags, sensors, and smart metering, are driving the need to deal with
torrents of data in near real time. Reacting rapidly enough to deal with
data velocity is a challenge for most organizations. Variety is the third V, and
it represents the diversity of the data. Data sets will vary by type: social
networking, media, text, and so on. And they will vary in how
well they are structured. Data today comes in all types of formats. Structured,
numeric data
in traditional databases. Information created from line
of business applications. Unstructured data in the form
of text documents, email, video, audio, stock ticker data and
financial transactions. Managing, merging, and
governing different varieties of data is something many organizations
still grapple with. Next, we have veracity. Data veracity refers to the biases,
noise and abnormality in data. Is the data that is being stored and mined
meaningful to the problem being analyzed? Veracity in data analysis
is the biggest challenge when compared to things like volume and
velocity. In scoping out your data and analytic
strategy, you need to have your team and partners work to help
keep your data clean, and create processes to prevent dirty data
from accumulating in your systems. Even more important than the definition
of data is what data promises to achieve. Effectively used, data can be transformed
into insights and intelligence. Delivered where and
when they are needed to make and implement strategic and
operational decisions. There is one more V to take into account
when looking at data and analytics. And that is value. Having access to data
creates value only when you have the right data
to glean strategic insights. Companies can generate
significant value from their data. An online retailer, for example, was planning to
enhance
their recommendation engine. The current software relied
on a static set of rules to determine one of five different
paths through their website. They wanted to modify this to
make recommendations based on the individual profile of the customer,
the amount of time they spend on a page, the keywords they enter, and what other
customers like them have done in the past. So just how big is big data? Think about
these facts. More than half of new data created
is in video and audio formats. And by the year 2020, total global Internet traffic
will
exceed 200 exabytes per month. An exabyte is equal to
1 billion gigabytes. Global mobile traffic will increase
to 30 exabytes per month, growing at a 50%
compound annual growth rate. The total number of users with Internet
access will exceed 3.5 billion. The total number of mobile
devices will exceed 10 billion. Think about that for a minute. What does all that
data mean for
organizations? How will organizations use this data? Big data is a game changer in
making business decisions. Let's look at how organizations
are currently using social media. In traditional use, businesses
use the convening power of social media to boost their image and
better anticipate consumer trends. Few organizations have harnessed
the potential power of social media for applications beyond marketing and
public relations. Let's take an example. A large investment bank is worried
about its compliance risk. Regulators are monitoring customer
complaints made directly to them as well as the social media channels
of the financial institutions. The investment bank built
a social media dashboard that monitored customer complaints
on a regular basis made to their social media site as
well as other public forums. The dashboard captured the rate of change
of the number of messages on a particular topic, as well as the rate of change of
sentiment with respect to the same topic. This allowed them to react and
respond fast whenever there was a change in volume or sentiment related
to themselves or their competitors. Let's recap what we just covered. Big data is
made up of structured,
unstructured, and semi-structured data. The amount of data that is being produced
is growing at an astounding rate. And the key to this data is
the interconnectedness of it all. We covered a lot of information here. You can use
the interactive PDF
to review the big data concepts. In the next video, you will hear
from some of our PwC professionals about how we have used big
data to solve client issues. [MUSIC]
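
The social media dashboard described in the investment bank example tracked two signals per topic: the rate of change in message volume and the rate of change in sentiment. The sketch below shows one way such figures could be computed. It is an illustration under assumptions, not the bank's implementation: the function names, the (period, topic, sentiment) record layout, and the idea that each message already carries a sentiment score between -1 and 1 are all hypothetical.

```python
from collections import defaultdict


def rate_of_change(previous, current):
    """Relative change from previous to current; None when previous is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous


def topic_changes(messages, prev_period, curr_period):
    """Compare two periods per topic.

    messages: iterable of (period, topic, sentiment) tuples, where sentiment
    is an assumed precomputed score in [-1, 1].
    """
    counts = defaultdict(lambda: defaultdict(int))
    sentiment_sums = defaultdict(lambda: defaultdict(float))
    for period, topic, sentiment in messages:
        counts[topic][period] += 1
        sentiment_sums[topic][period] += sentiment

    result = {}
    for topic in counts:
        prev_n = counts[topic].get(prev_period, 0)
        curr_n = counts[topic].get(curr_period, 0)
        # Average sentiment is undefined for a period with no messages.
        prev_avg = sentiment_sums[topic][prev_period] / prev_n if prev_n else None
        curr_avg = sentiment_sums[topic][curr_period] / curr_n if curr_n else None
        shift = None
        if prev_avg is not None and curr_avg is not None:
            shift = curr_avg - prev_avg
        result[topic] = {
            "volume_change": rate_of_change(prev_n, curr_n),
            "sentiment_shift": shift,
        }
    return result


# Hypothetical messages: complaints about fees double and turn more negative.
SAMPLE_MESSAGES = [
    (1, "fees", -0.2), (1, "fees", 0.0),
    (2, "fees", -0.8), (2, "fees", -0.6), (2, "fees", -0.7), (2, "fees", -0.5),
]

changes = topic_changes(SAMPLE_MESSAGES, prev_period=1, curr_period=2)
print(changes["fees"])
```

A dashboard built on figures like these could flag a topic whenever the volume change or the sentiment shift crosses a chosen threshold, which is the kind of fast reaction the example describes.
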
