
Course Transcript

Big Data Interpretation


Course Overview

Interpreting Big Data


1. Big Data and the Data Analysis Process

2. Big Data and Business Intelligence

3. Basic Analytics for Big Data

4. Advanced Analytics for Big Data

5. Data Storage, Management, Cleaning, and Mining Tools

6. Data Analysis, Visualization, and Integration Tools

7. Big Data Analysis Challenges


Course Overview
[Course title: Big Data Interpretation] In the world of big data, you need to know how to identify, gather, and analyze data in order to use it
effectively. In this course you'll learn about the data analysis process and common basic and advanced analytics methods, including data
mining. You'll also discover some of the most common big data tools and their associated uses, and some challenges to keep in mind
when undertaking big data analysis activities in your organization.
Big Data and the Data Analysis Process
Learning Objective
After completing this topic, you should be able to
sequence the five steps of the data analysis process

[Topic title: Big Data and the Data Analysis Process] What do you think "big data" is? Wading through streams of numbers, figures, and
characters that represent infinite amounts of data comes to mind.

But big data isn't just collecting data. It's what you do with that data once it's collected. It's about applying business intelligence to organize, vet, and analyze it so it makes sense. This helps you transform data into valuable insights that you can use to drive cost reductions, smart business decisions, and innovative product development.
We know that sifting through large amounts of data is pretty tough. That's why
you need an effective process in place – not only to simplify
your decisions, but to crank up your data analysis skills as well!

Begin by asking defining questions. Keep questions relevant, measurable, and succinct. When you're planning your questions it's also helpful to think about possible solutions. This pre-planning will help you phrase the questions better so that you can easily distinguish workable solutions from unworkable ones. For instance, will employing fewer people affect the quality of work done?
The next step is to set measurement priorities...deciding what to measure and how to measure it.

For instance, if you're reducing staff, you'll need to know how many staff members you employ, what it costs to employ them, and consider
the effects of a reduction. This takes care of the what.

And the how? You'll need to know what this staff reduction will cost you in the long run, and what unit you'll measure in. Maybe it's the currency you're using to measure the cost savings. And are there any other factors – staff benefits, for example?
Once you've set your measurement priorities, it's time to collect data! Start with data you already have on existing databases
before using
other sources. And if you're working with a team? Use a collaborative file storage system. Be sure to log all your data by date and include
references.
Then it's time
to dig deep…to analyze your data. Analyzing helps you gauge whether you've gathered enough information. Consider using
different ways to picture the data. Pivot tables, for example, can help you find correlations or variations between batches of data.
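To make that pivot-table idea concrete, here's a minimal sketch using Python's pandas library. The column names and figures are invented for illustration.

```python
# A minimal sketch of using a pivot table to compare batches of data.
# The column names (region, month, sales) are hypothetical examples.
import pandas as pd

data = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North", "South"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Mar", "Mar"],
    "sales": [120, 135, 90, 110, 150, 95],
})

# Rows become regions, columns become months, and each cell holds the
# total sales for that combination – variations between batches of
# data stand out at a glance.
pivot = pd.pivot_table(data, values="sales", index="region",
                       columns="month", aggfunc="sum")
print(pivot)
```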
Finally, you interpret your results. Ask whether the data answers your initial question, and how. Ask whether it helps you defend against anticipated objections, and how. Also ask if your conclusion is limited in any way, and whether all the perspectives have been considered.

If your conclusion responds positively to these questions and considerations, you're off to a good start. And once you've interpreted the
results of your analysis, you can use them to decide what action to take.
Using these steps to sort through mountains of data can help you make
better comparative decisions for your business.
Big Data and Business Intelligence
Learning Objective
After completing this topic, you should be able to
identify the four data analysis categories

[Topic title: Big Data and Business Intelligence] Working with big data can sometimes seem like you're navigating a ship, at night, in a
storm, and you don't know which way to go. But if you get it right, big data, and the right analytics, can offer great insights into various data
relationships.

As a sailor there are several navigational tools you can use. In the same way, you'll find that there are many data analysis tools and methods, but they all generally fall into one of four categories.
Possibly the best way to predict the future is by looking at past
behavior. This is the basis for predictive analytics. It uses big data to
identify historical patterns or behaviors to predict the future.
For instance, banks use predictive analytics to measure credit
risks and to gauge the chances of fraudulent activity before it happens. The
retail industry is another good example. Take Amazon's Customers Who Bought This Also Bought feature...it uses a customer's past
purchase history to predict what they or other customers may want to buy in the future.
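As an illustration of that "also bought" idea – not Amazon's actual algorithm, just a toy sketch – you can count how often pairs of products appear together in purchase histories. The baskets and product names below are made up.

```python
# A toy sketch of the "Customers Who Bought This Also Bought" idea:
# count how often pairs of products appear in the same purchase.
from collections import Counter
from itertools import combinations

purchases = [
    {"book", "lamp"},
    {"book", "lamp", "desk"},
    {"book", "desk"},
    {"lamp", "mug"},
]

pair_counts = Counter()
for basket in purchases:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Products most often bought together with "book"
for (a, b), n in pair_counts.most_common():
    if "book" in (a, b):
        other = b if a == "book" else a
        print(f"book is often bought with {other} ({n} times)")
```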

Let's move on to prescriptive analytics. Generally, when you get sick you go to the doctor. And depending on your symptoms, he
prescribes specific medication to "fix" your problem. That's how prescriptive analytics works. In the same way a doctor prescribes
medication, prescriptive analysis suggests possible solutions...or actions based on a specific issue. But although this kind of analysis is
probably the best analysis method, it's hardly ever used by businesses.
You can compare diagnostic analytics to Six Sigma's 5 Whys – it drills down
into the why of a concern. Each time you answer one why –
like "why did this happen?" You ask why again until you get to the root cause. Because it's quite laborious, not many companies use this
method consistently. Usually the results will be compiled into an analytic dashboard.
Finally, we have descriptive analytics. Sometimes referred to as data mining, this kind
of analysis helps with identifying patterns or
behaviors that offer further insight by describing what has happened in the past. Knowing what's happened in the past – in most cases – is
a good indicator of what will happen in the future. It's useful to display historic data for things like a company's sales and financial reports.
Big data analytics, regardless of their category, provide businesses with a better understanding of what's happening now and a good indication of what should happen next.


Basic Analytics for Big Data
Learning Objective
After completing this topic, you should be able to
recognize basic data analysis methods

[Topic title: Basic Analytics for Big Data] When it comes to big data, basic analytic methods are good when you have vast amounts of dissimilar data, or when you have data whose value you want to explore. Most basic analytic methods include simple stats and visualizations.
Imagine being asked to do a search engine optimization review for a blog that doesn't have any specific theme or topic. There are articles

about food, trending clothes, events, and products in no particular order. The blog's owners want to know what topics, articles, or keywords bring in the
most traffic. How would you go about tackling this review?
One method you can use is slicing and dicing. As the name suggests, this method involves breaking down data into smaller segments, which makes it easier to understand and work with. To figure out which topics bring in the most traffic on the blog, you can filter by keywords for each topic, then plot your findings in a graph. You could also use this method to find out which keywords are the most or least used, regardless of the topic. So you'll want to use this method to focus on answering specific questions in certain areas.
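Here's a minimal slice-and-dice sketch in pandas, assuming hypothetical column names for the blog's traffic log.

```python
# Slice and dice: filter blog traffic records by topic, then regroup
# and aggregate page views. Column names and numbers are invented.
import pandas as pd

traffic = pd.DataFrame({
    "keyword": ["recipes", "sneakers", "recipes", "festival", "sneakers"],
    "topic": ["food", "clothes", "food", "events", "clothes"],
    "views": [500, 300, 450, 200, 350],
})

# Slice: keep only one topic's records
food_only = traffic[traffic["topic"] == "food"]
# Dice: regroup all records by topic and total the views
views_by_topic = traffic.groupby("topic")["views"].sum().sort_values(ascending=False)

print(food_only)        # just the food articles
print(views_by_topic)   # which topics bring in the most traffic
```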
Things are going great, and the blog has created a frenzy about a new product launch that they have exclusive rights to cover. On the day
the product launches, you expect the blog traffic to increase and decide to monitor it in real time to check on progress. This is what you'd
call basic monitoring. It creates huge sets of data from various sources across the web – think tweets, Instagram pictures, or even simple
text comments on the blog page. But be careful. Because of the large amounts of different types of data being generated, you won't want
to use this method too often unless it's necessary.
But what if, on the day the product launches, there's hardly any traffic on the blog? Anomaly identification takes care of this. This method involves looking for irregularities in your data to clue you in on what could be wrong, especially when certain things are not as you'd expect them to be. You know that traffic had been increasing steadily as the product launch date drew closer, so what could've happened to the blog traffic on that specific day? You could look at the keywords being used, and perhaps which keywords haven't been – but should have been – used to explain this deviation.
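One simple way to flag that kind of deviation – a basic z-score check, by no means the only anomaly-detection method – looks like this in Python. The visit counts are invented, with the launch-day dip last.

```python
# Flag any day whose traffic falls more than a few standard
# deviations from the mean. The numbers are made up for illustration.
import statistics

daily_visits = [980, 1020, 1100, 1150, 1230, 1310, 240]  # launch day last

mean = statistics.mean(daily_visits)
stdev = statistics.stdev(daily_visits)

for day, visits in enumerate(daily_visits, start=1):
    z = (visits - mean) / stdev
    if abs(z) > 2:  # 2 standard deviations is a common, if arbitrary, cutoff
        print(f"Day {day}: {visits} visits looks anomalous (z = {z:.1f})")
```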
Using these methods to understand big chunks
of data can go a long way when trying to figure out what's valuable in your business.

Advanced Analytics for Big Data


Learning Objective
After completing this topic, you should be able to
recognize advanced data analysis methods

[Topic title: Advanced Analytics for Big Data] In life, any complex issue generally requires a specialized solution. The same goes for
complex big data analysis; it requires sophisticated analytics methods for both structured and unstructured data.
Advanced analytics is best used for finding patterns in complex data, forecasting, and advanced event processing
needs.

If there were a way for you to look into the future, wouldn't you want to know about it? This is the basis for predictive modeling. Algorithms are
applied to both structured and unstructured data – separately or together – to predict future outcomes. For example, a cable company
could use this model to forecast who's most likely to cancel their service.

And the text analytics method? It's mainly used to make sense of unstructured data. It involves examining fragmented text, taking out
useful information, and then arranging it logically. Then you can use this structured information as the input for all sorts of analysis, like
predicting fraud, or customer trends.
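Here's a minimal sketch of that extract-and-structure step in Python. The comment texts, the date pattern, and the negative-word list are all illustrative assumptions.

```python
# Text analytics in miniature: pull a useful fact out of fragmented
# text and arrange it as structured rows for downstream analysis.
import re

comments = [
    "Ordered on 2024-01-15, arrived broken.",
    "Great service, delivered 2024-02-03!",
    "Still waiting since 2024-02-20...",
]

rows = []
for text in comments:
    match = re.search(r"\d{4}-\d{2}-\d{2}", text)  # extract a date if present
    rows.append({
        "date": match.group() if match else None,
        "negative": any(w in text.lower() for w in ("broken", "waiting")),
    })

print(rows)  # structured input for, say, spotting customer trends
```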
Data mining in advanced analytics involves identifying patterns from large data sets. When you use data mining techniques, you want to
know if the data will be used for classification or prediction.

Classification is all about sorting data into sets. For instance, a company might be interested in the characteristics of employees who take all their sick days and those who don't take any sick days.

And prediction? That's self-explanatory. This deals with predicting a nondiscrete or continuous variable value. For instance, a company
might try to predict who will respond to the incentive of earning extra vacation days.
Data mining makes use of various algorithms. One of them is a classification tree. It
uses predictor variables to classify dependent
categorical variables. The end result? A "tree" with interlinked nodes that can be used to form if-then rules. For instance, if a customer has
been with a cable company for more than 10 years, then they're more likely to remain loyal.
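A small classification-tree sketch using scikit-learn, echoing the cable-company example. The features (tenure in years, support calls) and labels are invented.

```python
# A classification tree whose learned splits read as if-then rules,
# e.g. "if tenure > 10 years then loyal". Data is invented.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[12, 1], [15, 0], [2, 5], [1, 7], [11, 2], [3, 6]]  # [tenure, calls]
y = ["loyal", "loyal", "churn", "churn", "loyal", "churn"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

print(export_text(tree, feature_names=["tenure_years", "support_calls"]))
print(tree.predict([[14, 1]]))  # a long-tenure customer -> ['loyal']
```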
Logistic regression also deals with classification. It comes up with a formula that forecasts the likelihood of an instance as a function of the independent variables.
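The same invented data can illustrate logistic regression with scikit-learn: the fitted model is exactly the kind of formula described above, mapping the independent variables to a likelihood.

```python
# Logistic regression: fit a formula that turns the input variables
# into a probability of the outcome. Data is invented.
from sklearn.linear_model import LogisticRegression

X = [[12, 1], [15, 0], [2, 5], [1, 7], [11, 2], [3, 6]]
y = [0, 0, 1, 1, 0, 1]  # 1 = likely to cancel the service

model = LogisticRegression().fit(X, y)

# Probability that a 14-year customer with one support call cancels
print(model.predict_proba([[14, 1]])[0][1])
```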
Another algorithm, which is actually a software algorithm consisting of input and output nodes, as well as hidden layers, is the neural network. Here, each element is weighted, so when data is put in, the algorithm alters and modifies the elements until it reaches a specific stopping criterion. Neural networks can be compared to the workings of animal brains – the algorithm is based on trial and error.
Another technique is K-nearest neighbors, a classification method. When it's implemented, it computes the distance between a new record and historical data points, finds the records most like it, and then allocates the record to the class of its closest neighbors in the data set.
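And a K-nearest-neighbors sketch with scikit-learn, again on invented data.

```python
# KNN: assign a new record the class of the records closest to it.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8]]
y = ["A", "A", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.1, 1.0]]))  # closest neighbors are class "A"
```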
So these are the advanced
analysis techniques you need to know to up your big data game.
Data Storage, Management, Cleaning, and Mining Tools
Learning Objective
After completing this topic, you should be able to
recognize some of the most common big data tools used for data storage,
management, cleaning, and mining activities

[Topic title: Data Storage, Management, Cleaning, and Mining Tools] As the big data evolution continues, more tools become available to
use. Open source or proprietary – the choice is yours. There's no right or wrong toolset to use, but we will focus on some of the most
common big data storage, management, cleaning, and mining tools available today.
When you think about storing and managing big data, you'll need to choose a storage
provider that can handle running all your analytics
tools. They should also be able to offer you a platform to store and query your data.
Hadoop is one of these. It's open-source software that can store sizable datasets that
you can increase or decrease as needed. And it can
manage an almost unlimited amount of tasks simultaneously, due to its raw processing power.
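Hadoop itself is a Java-based framework, but the MapReduce model behind that processing power can be sketched in a few lines of Python: a map step emits key-value pairs, and a reduce step aggregates them per key.

```python
# A conceptual MapReduce word count – the processing pattern Hadoop
# distributes across a cluster, shown here on one machine.
from collections import defaultdict

documents = ["big data is big", "data tools for big data"]

# Map: emit (word, 1) for every word in every document
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle + reduce: group by key and sum the counts
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n

print(dict(counts))  # {'big': 3, 'data': 3, 'is': 1, ...}
```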
And if you need help with building a data hub for your business, you could try Cloudera. Basically, it's Hadoop with wings! It's still open-source, but it's more of an enterprise solution for businesses to handle their Hadoop system – they do the Hadoop grunt work for you.
If you don't want to use relational databases, MongoDB is a great way to store and manage data. It's typically used for applications that support single views spanning numerous systems. It's also good for organizing data that's unstructured or continuously changing.
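A minimal MongoDB sketch using the pymongo driver, assuming a server is running locally on the default port; the database, collection, and field names are made up.

```python
# Storing and querying schemaless documents with pymongo.
# Assumes a MongoDB server at localhost:27017.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["blog"]

# Documents don't need a fixed schema, which suits unstructured or
# continuously changing data – note the differing fields below.
db.posts.insert_one({"title": "Launch day", "tags": ["product", "news"]})
db.posts.insert_one({"title": "Recipe roundup", "views": 450})

for post in db.posts.find({"tags": "product"}):
    print(post["title"])
```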

But data sets are vast and varied, and often "dirty". So the best thing to do is to take them for a rinse and spin before you mine for information.
OpenRefine is user-friendly software that offers data cleaning services to help you whip your data into shape, even if it's very messy,

unstructured data. Even large data sets are easily handled using this tool.
Another useful cleanup tool is DataCleaner. It saves you from doing all the work by converting unstructured data sets into usable and readable data.
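Neither OpenRefine nor DataCleaner requires code – both are point-and-click tools – but the kinds of cleaning steps they automate look roughly like this pandas sketch. The column names and messy values are invented.

```python
# Typical cleaning steps: drop missing rows, normalize casing,
# reconcile value variants, and remove duplicates.
import pandas as pd

raw = pd.DataFrame({
    "name": [" Alice ", "BOB", "alice", None],
    "city": ["ny", "NY", "New York", "boston"],
})

clean = raw.dropna(subset=["name"]).copy()
clean["name"] = clean["name"].str.strip().str.title()        # normalize casing
clean["city"] = clean["city"].str.lower().replace({"ny": "new york"})  # reconcile variants
clean = clean.drop_duplicates()                               # remove repeats
print(clean)
```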

Once you've cleaned your data, you can start the mining bit – filtering your data to look for patterns to help you with forecasting.
For predictive analysis, you can use RapidMiner to help you uncover useful information within your database. And if you already have your own algorithm, you can integrate it through their application programming interface – or API.
The IBM SPSS Modeler is best suited to larger companies. Its data mining solutions include text analytics, decision management and optimization, as well as entity analytics. It can be used on almost any kind of database and can be integrated with other IBM SPSS products.

If you can't quite figure out how to analyze big data, it's kind of useless having an abundance of data points. That's where Teradata comes in. They provide various solutions and services to businesses that want to become data-driven.
This ever-growing list of tools makes it easier to untangle your data and make sense of it so you can put it to good use for your business.

Data Analysis, Visualization, and Integration Tools


Learning Objective
After completing this topic, you should be able to
recognize some of the most common big data tools used for data analysis,
visualization, integration, and collection activities

[Topic title: Data Analysis, Visualization, and Integration Tools] Different jobs require different tools. This is also true when it comes to big
data. So besides the storage, management, cleaning, and mining tools, there are also tools for data analysis, visualization, and
integration.

So, what's data analysis? It involves dissecting the data and patterns you uncovered when mining, and evaluating the significance of patterns you didn't recognize earlier. Data analysis helps you find answers to specific questions in your data.
Qubole is a solution for large businesses. It makes use of several data processing engines, including Presto and Hive.
This program is
very accessible and also very flexible.
You can import your data using BigML's
user-friendly interface, then get forecasts out of it. They'll even let you use their predictive analysis
models.
If you're more of a visual person, then you know how boring just looking at numbers can be. This is where data visualization tools can help. They transform all your complex data into brighter, easy-to-understand visuals, making coding unnecessary.
With Tableau, you create those visual displays of data – like maps and charts. You can even use their web connector to
project your live
data more visually.

If you prefer working with maps, then CartoDB is for you! It allows you to mess around with sample datasets to make it easier to learn how to put your location data into a more suitable visual display.
Then there's Chartio. In just a few steps, you can incorporate
data sources and execute queries online, directly from a browser. Using their
platform, you can create dashboards, and then format them into PDF files to send via e-mail.
Data integration platforms allow you to take data you extract from one platform and share it with
another.
One such platform, Blockspring, allows you to take a simple Google Sheets formula and apply it to third-party programs. For instance, you can check for followers, tweet, or even connect to other tools – amongst other things – from that one sheet.
Pentaho, in contrast, allows you to integrate tools using a basic drag and drop user interface. And although you
won't need any coding,
they allow for big data integration and offer analytics services too.

But in order to do all these wonderful things, you first need data, right? This is where data collection – or extraction – comes into play. It
means taking unstructured information, like a blog or a photo album, and transforming it into something structured, like a spreadsheet.
Then you can analyze and use the data to make important decisions.

When it comes to data extraction, Import.io is definitely up for the challenge! It takes web pages, and after a simple click in their UI,
transforms those pages into user-friendly spreadsheets.
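Import.io's extraction is point-and-click, but the underlying idea – turning a web page into spreadsheet-style rows – can be sketched with Python's requests and BeautifulSoup libraries. The URL and the assumption that article titles sit in h2 tags with a "title" class are hypothetical.

```python
# Extraction in miniature: fetch a page, pull out structured fields,
# and write them as a spreadsheet-friendly CSV.
import csv

import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/blog").text  # hypothetical URL
soup = BeautifulSoup(html, "html.parser")

# Assumes article titles live in <h2 class="title"> elements
rows = [{"title": h.get_text(strip=True)}
        for h in soup.find_all("h2", class_="title")]

with open("articles.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title"])
    writer.writeheader()
    writer.writerows(rows)
```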
With so many tools to choose from – whether you're more visual or not – making sense of the data you create is now a whole lot simpler
than before.
Big Data Analysis Challenges
Learning Objective
After completing this topic, you should be able to
identify the challenges associated with big data analysis

[Topic title: Big Data Analysis Challenges] No matter how great your solution is, there will always be challenges that will factor into it. And
big data analysis is no exception.
Leveraging big data can result in amazing benefits to the organization with regard to cost and time reductions, improved decision-making, and product and service development. But if the right approach and resources aren't in place – or effective analysis and interpretation methods aren't used – good intentions can turn into risk for the organization.
We already know it can be difficult to make sense of big data – how do you know what to use
and how to use it? Data-focused business
cases usually require a different way of thinking – more out of the box. So identifying and using the right data can be difficult, especially if
you're used to more traditional methods of thinking.
And because most companies don't know what deciphering big data entails, they don't have suitable systems to manage it. Inadequate access and connectivity are a huge issue when it comes to storing and managing your data.
But besides not having the infrastructure to manage the data itself, companies don't have the
right people in place to even try. It doesn't
help that even though big data is evolving quite rapidly, it's not a subject you can take at university level. This is why the right people who
can make sense of all the data are hard to find. If you don't know much about big data, how do you know what to look for in a big data
expert? Not having the right people work with your data can be a huge setback.
Compounding this issue is inadequate collaboration systems. Data ownership is
usually segmented by department or by specific function.
But to analyze big data, you need to be able to access all of it regardless of department or function. If you can't, your data could be
skewed.

A lot of companies still make use of standard databases while testing the waters of big data. But testing the waters isn't enough.
Technology is evolving so fast nowadays, isn't it? Your company can fall behind quite quickly if you're not keeping up with current
technology. So you can understand why it's a must to have a system in place – one which can evolve and adapt to keep up as technology
gets more efficient.
So the next question would be exactly how secure is your company's data? If you don't have the right people, system, or tools to work with big data, how would you be able to keep it secure? As a company grows, it acquires more information. This, in effect, means that there's more information that can be compromised.

So instead of rushing blindly into the world of big data analysis, make sure you're aware of these challenges so you don't sabotage your business in the process.

© 2019 Skillsoft Ireland Limited
