
Welcome to Data Analytics for Decision Making: An Introduction to Using Excel

Welcome to the course! We hope you’re excited to get started. On this first step you’ll find some
important information on the learning outcomes, course design and a brief background for our
educators from Bond University, Dr. Adrian Gepp and James Todd.

Learning Outcomes

By the time you finish this course, you should be able to:

• Describe data using statistics and graphical techniques
• Use the concepts of probability and discrete random variables to make business decisions
• Apply modern quantitative tools (Microsoft Excel) to data analysis in a business context
• Gain exposure to the changing landscape of data science in modern business

Course Design

This course is split into two weeks. In the first, you will learn about methods for describing
important characteristics of data through application of graphical techniques and descriptive
statistics. You will also hear about why ethics is an important topic when performing data
analysis and reporting results. In the second week, you will be introduced to the idea of
probability, random variables, how we can describe them, and how we can make decisions even
in the face of randomness. Finally, you’ll hear about the environment in which data analytics is
happening today from Professor Bruce Vanstone, a Professor of Data Science at Bond
University.

Your Educators

Dr. Adrian Gepp, Associate Professor of Data Analytics

Adrian has more than a decade of experience in teaching at a tertiary level that spans
undergraduate, postgraduate and online education in data analytics, economics and finance.
Adrian’s primary research interest is in applying big data and advanced statistical modelling to
reveal unique insights about problems of economic and social importance, including fraud
detection and business failure prediction. His research has attracted more than $500,000 in
external funding and he has more than 40 peer-reviewed research outputs.

James Todd, PhD Candidate

James is a PhD candidate and teacher at Bond University. His previous studies include a
Bachelor’s degree and an Honours degree in Actuarial Science, graduating as Valedictorian and with First
Class Honours respectively. He has taught subjects in financial, statistical, and data analytics
areas at both undergraduate and postgraduate levels. Beyond experience as a research assistant in
various areas, his own research focuses on the practical application of data science tools in
healthcare systems and processes, with three peer-reviewed journal articles published.

Where do graphs fit into analysis?


Graphs? Why Graphs?
Do you think of graphs when you think of data analysis? Maybe you do, but if you don’t then it’s
time to start.

Often people consider effective data analytics to be complicated, but that is definitely not always
the case. The goal of data analytics in business is to learn about the characteristics of the data to
make decisions. The better we understand it, the better we achieve that goal. The goal isn’t to
make sure that no one understands our analysis. Visualising data is often the most effective way
of describing it for human beings, because sight is such an important part of how we interact
with the world. A good graph can make insights obvious and doesn’t require the viewer to
decipher a complicated calculation. Even if it isn’t always the final result of an analysis, it often
provides an important starting point.

In the next activity, Adrian will talk through several useful graphical techniques. He will discuss
what type of data is appropriate for each technique and how to interpret the visualisations with
examples. You will learn about pie charts, bar charts, histograms, line plots, and scatter plots.

While pie and bar charts are useful for visualising data where only a few values are possible
(such as hair colors or product names), we need a different tool for continuous data where many
values are possible.

This is where histograms are useful. In this video, Adrian shows their use with the example of
phone bills, but I’m sure you can think of other examples as well.

Share in the comments below some other examples of histograms you can think of or are familiar
with.

Next, we’ll look at two final graphical techniques for inspecting two variables and the
relationships between them. These are line plots and scatter plots…
Line Plots and Scatter Plots
We now have our last two plots. Line plots are valuable tools for visualising how a variable
changes across time, and they are among the most common plots you will see.

We often care not just about the current state of a variable, such as profit or sales, but also about
how it has evolved. Has there been high growth? Is it stable? These are examples of questions
which line plots help us to answer.

Scatter plots are similar in that they visualise two variables at once that might be related to one
another. They are a useful exploratory tool to assess whether there might be relationships
between variables. For example, when I spend more on advertising do I see higher sales?

Hopefully you are comfortable with the ideas behind these and the other graphical techniques
now. If not, feel free to double back and watch a topic video again or ask your fellow learners
any questions you have in the comments below.

Once you’re ready, in the next step we’ll go further and talk about what makes a
“good” graph.

Good Graphs
What makes a good graph?
It’s easy to spot a bad graph.

I’m sure everyone has seen graphs before that only made the data more confusing. A quick
search will turn up many. Sometimes they are simply hard to decipher, due to a bad choice of
graphical technique or because of poor labelling. Other times, they may be misleading and result
in incorrect interpretation of the data. To effectively communicate our results, we need to focus
on making good graphs.

Constructing a good graph can be a bit of an art form, but there are some basic principles.
Applying these principles puts us on the path to a final graph that is effective in communicating
important facts about the data without being misleading. In the above video, Adrian outlines
some of these principles. While we adhere to these principles in this course, others might not, whether by
accident or to deliberately mislead. When we talk about the role of ethics in analysis
(activity five), we’ll show some examples of really bad graphs.

You’ve now finished with this first activity! Hopefully you have a greater appreciation for
different types of visualisations, when they should be applied, and what we need to consider to
make a ‘good’ graph. You now have the opportunity to check your understanding with a quick
quiz.

From there, we’ll step through these same graphical techniques and examples to demonstrate
how you can also construct them using Excel.

Using Excel to Create Graphs


In the last activity, we addressed the ‘what’ and the ‘why’. You now know what some common
and useful graphical techniques are and you also know why we use them. These are important
for engaging with and critically assessing data analysis, but you also need to know how to create
them.

Now, in this activity, you’ll learn how to construct them yourself. We’ll be revisiting the
examples shown and going through their creation step-by-step. Unfortunately, FutureLearn
doesn’t support the sharing of Excel files within the comments. If you have your own data
available, however, we’d suggest you try experimenting and following along.

You can share your work with other learners and view examples from others on this course using
our Padlet wall here. Please do not share inappropriate materials or any personally identifiable
information; learners who do so will have their posts removed.

Actively doing a task helps to reinforce the lessons and ensures you get the most out of this
activity. If you don’t have access to Microsoft Excel, you could also use the free Google Sheets.

If you’re ready, Mark as Complete and move on to the next step to get started with pie charts and
bar charts.

Pie Charts and Bar Charts with Excel


From watching this video, you can now do more than just interpret and explain pie and bar
charts: you can start to make your own!

Remember to follow along with your own data or use the data in the video and share on our
Padlet wall if you wish to join in.

Next we’ll demonstrate how to construct histograms with Excel.

Histograms with Excel


In the above video, the process of constructing a histogram in Excel is described.

The choice of bins for the histogram has a big impact on the final result. Bins that are too wide
obscure the finer details of the data; bins that are too narrow make the bigger picture hard to see.
You will normally experiment with different widths to see what happens. For a given problem
you’re unlikely to get it right the first time, but you can easily modify the bins in Excel until you find
the widths that work best!
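The course builds histograms in Excel; as a tool-neutral illustration of the same idea, here is a minimal Python sketch that counts values into equal-width bins and tries several widths. The phone-bill amounts and the helper name `bin_counts` are hypothetical, and this sketch places a value that falls exactly on a bin edge into the higher bin, which may not match how Excel assigns edge values.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + i*width, start + (i+1)*width).

    Values below `start` or at/above the last bin edge are ignored in this sketch.
    """
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)  # index of the bin this value falls into
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


# Hypothetical monthly phone bills in dollars (illustration only).
bills = [42, 47, 51, 55, 58, 60, 61, 63, 67, 72, 75, 88, 95, 110]

wide = bin_counts(bills, start=40, width=40, n_bins=2)     # two very wide bins
medium = bin_counts(bills, start=40, width=20, n_bins=4)   # a middle ground
narrow = bin_counts(bills, start=40, width=5, n_bins=15)   # many narrow bins

print(wide)
print(medium)
print(narrow)
```

With 40-dollar bins, 11 of the 14 bills land in a single bar; with 5-dollar bins, most bars hold zero or one bill. The 20-dollar bins sit between those extremes, which is the kind of trade-off you would explore by adjusting the bin width in Excel.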

Line Plots and Scatter Plots with Excel


In this video, Adrian explains how to construct line plots and scatter plots using Excel. Hopefully
you find them useful the next time you want to investigate the relationship between two
variables.

You should now be comfortable with how to create bar charts, pie charts, histograms, line plots,
and scatter plots using Excel. I hope you’re excited to start using them!

Descriptive Statistics
Often, we don’t struggle with a lack of data but with too much of it. With too much
information, it is difficult to identify important features of the data and then make data-driven
decisions. This is where descriptive statistics come in. Descriptive statistics can be used to
describe characteristics of vast quantities of data in just a few numbers, making them a useful
addition to your data analysis toolbox.

So what can descriptive statistics tell us? What insights can they provide? Well, we might care
about ‘typical’ customers, for which we can use measures of central location like the mean,
median and mode. We might care about how much variety there is in our data, for which we can
use measures of spread like variance or standard deviation. We might care about if we tend to
have a few really large or really small values, for which we can use measures of shape like
Skewness. Finally, we might care about how related two different variables might be, for which
we can use measures of association like correlation.
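As a quick illustration of measures of central location, here is a minimal Python sketch using the standard-library `statistics` module. The spending figures are hypothetical; the Excel functions named in the comments (AVERAGE, MEDIAN, MODE.SNGL) are the usual worksheet equivalents.

```python
import statistics

# Hypothetical amounts (in dollars) spent by ten customers (illustration only).
spend = [20, 25, 25, 30, 32, 35, 38, 40, 45, 210]

mean = statistics.mean(spend)      # Excel: AVERAGE
median = statistics.median(spend)  # Excel: MEDIAN (average of the two middle values here)
mode = statistics.mode(spend)      # Excel: MODE.SNGL (most frequent value)

print(mean, median, mode)  # 50 33.5 25
```

The single large purchase of 210 pulls the mean (50) well above the median (33.5), which is why the "typical" customer can look quite different depending on the measure you choose, and a first hint of positive skewness.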

In this activity, you’ll learn about all of these measures. You’ll learn what they are, how they are
calculated, how to interpret them, and how to generate them using Excel.

Let’s discuss

When have you used descriptive statistics or seen them used effectively?
Measures of Spread - Range, Variance and Standard Deviation
Measures of Spread
Variance and standard deviation are the most common measures of how spread out data is.

While the range of the data is also a measure of spread, it depends only on the largest and smallest
values, so it ignores a lot of important information we normally want to capture. From this video,
you should understand the relation between standard deviation and variance as well as how we calculate them.
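To show the calculation, and why the range can hide information, here is a minimal Python sketch of the sample variance and standard deviation. It assumes the data is a sample and so divides by n - 1, as Excel's VAR.S and STDEV.S do (VAR.P and STDEV.P divide by n instead). The two small datasets are hypothetical and were chosen to share the same range.

```python
import math


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (sample version)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_sd(data):
    """Square root of the variance, so it is back in the data's original units."""
    return math.sqrt(sample_variance(data))


# Hypothetical datasets with the same smallest and largest values.
a = [1, 5, 5, 5, 9]
b = [1, 1, 5, 9, 9]

for name, data in (("a", a), ("b", b)):
    data_range = max(data) - min(data)
    print(name, data_range, sample_variance(data), sample_sd(data))
```

Both datasets have a range of 8, but `b` has twice the variance of `a` (16 versus 8) because more of its values sit far from the mean. The range cannot see that difference.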

Discuss with your fellow learners any questions you may still have in the comments below.

In the next step, we’ll talk about how we can use and interpret our measure of spread once we
have calculated it.

Interpreting Standard Deviation


Standard deviation is an important measure of spread because of how it can be interpreted for
data that is approximately bell-shaped. Specifically, it indicates how much of the data lies
within a given distance of the mean. If the data is approximately bell-shaped, then:

• About 68% of the data is within plus or minus one standard deviation around the mean
• About 95% of the data is within plus or minus two standard deviations around the mean
• About 99.7% of the data is within plus or minus three standard deviations around the mean
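The 68%, 95% and 99.7% figures come from the normal distribution, the idealised bell curve. As a sketch, assuming the data follows a normal distribution exactly, Python's `math.erf` recovers those percentages: the share of a normal distribution within k standard deviations of the mean is erf(k / √2).

```python
import math


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

These values round to the percentages above. Real data that is only approximately bell-shaped will match them only approximately.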

This result means that standard deviation is a very descriptive statistic! Now that we have some
tools for measuring spread, let’s look at how we measure shape in the next step.

Measures of Shape
We introduce the idea of ‘Skewness’ in this video.

Skewness tells us about the symmetry of the distribution of data. Perfectly symmetric data has a
skewness of 0: you can draw a vertical line through the centre and the two sides of that line would be
mirror images. Positive and negative skew indicate that the distribution is not symmetric, with a longer
tail to the right (positive skew) or to the left (negative skew).
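For readers who want to see the arithmetic, here is a minimal Python sketch of the adjusted sample skewness formula documented for Excel's SKEW function: n / ((n - 1)(n - 2)) multiplied by the sum of ((x - mean) / s)³, where s is the sample standard deviation. The datasets are hypothetical.

```python
import math


def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3).

    s is the sample standard deviation (dividing by n - 1). Needs at least 3 values.
    """
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


symmetric = [1, 2, 3, 4, 5]        # mirror image around 3
right_tail = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail

print(sample_skewness(symmetric))   # zero, up to floating-point rounding
print(sample_skewness(right_tail))  # positive: longer tail to the right
```

Cubing the standardised deviations keeps their sign, so large values above the mean push the result up and large values below it push the result down.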

In the next step, we demonstrate how to calculate the measures introduced so far using Excel.
Computing Descriptive Statistics in Excel
We’ve now covered various descriptive statistics, and you should also be able to calculate
them yourself using Excel.

Note that while most common statistics can be produced automatically using the Analysis ToolPak
in Excel, those values don’t automatically update. If you calculate the statistics individually
with easy-to-use Excel formulas, they will automatically update when you make changes to
your data. If you think there is a chance you will have to make changes to your data, then it’s
probably best to do it this way.

In the next step we’ll introduce the final descriptive statistic, which measures the association
between variables.

Measures of Association
In this video we have provided a new measure, demonstrated its calculation, and summarized our
statistics.

Our new measure of correlation is useful for determining whether there is a linear relationship
between two variables. One common pitfall encountered by analysts reporting correlation is to
confuse it with causation. It is very important to remember that a strong correlation does not
mean that one variable causes the other. Two variables might be correlated while actually having
very little to do with each other. The example given in the above video was the strong
correlation between the number of films Nicolas Cage has appeared in and the number of deaths by
drowning. Even though the correlation is strong, we don’t believe either of these is causing the other.
Many other examples of strong correlations between unrelated variables can be found here.
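To make the calculation concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the measure Excel's CORREL returns. The advertising and sales figures are hypothetical. The last example shows why correlation measures *linear* relationships specifically: a perfect curved relationship can still have a correlation of zero.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation: co-movement of x and y, scaled to lie between -1 and 1."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical advertising spend and sales (illustration only).
advertising = [1, 2, 3, 4, 5]
sales = [12, 14, 15, 19, 20]
print(pearson_correlation(advertising, sales))   # strongly positive, close to 1

# A perfect straight-line relationship gives a correlation of 1 (up to rounding).
print(pearson_correlation([1, 2, 3, 4], [3, 5, 7, 9]))

# y = x squared: y depends entirely on x, yet the correlation is 0 (not linear).
xs = [-2, -1, 0, 1, 2]
print(pearson_correlation(xs, [x ** 2 for x in xs]))
```

Note that a high value for the advertising example says nothing about whether advertising *causes* the sales; the calculation only measures how closely the points follow a straight line.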

Now you have a collection of useful statistics in your data analyst toolbox. You can describe the
central location, spread, shape, and association of data. In addition to the graphical techniques
from activities one and two, you’re quickly gaining the key tools of data analysis.

Check your knowledge of this activity in the next step’s quiz before you move onto our final
activity for this week, where we discuss the importance of ethics in data analysis.

Ethics in Data Analysis?


Most people don’t think of ethics when they think of data analysis, but it plays an important role.

Ethics should guide both how carefully you conduct your own analysis and how you interpret the
results of others. It is increasingly relevant in today’s data-driven world. Relevant issues which
will be discussed in the following videos will include appropriate treatment of data which may
be sensitive, making sure outputs are faithful representations of the underlying data, and ensuring
conclusions from the analysis are suitably justified.

As budding data analysts, it will be your responsibility to ensure your own analysis is ethical. It
will also be your responsibility to draw attention to unethical, misleading and untrustworthy
analysis.

As economist Ronald Coase stated: “If you torture the data long enough, it will confess to
anything”.

Let’s discuss

Can you think of any real-life examples where data has been misused or manipulated? Share your
thoughts in the comments below.

The Role of Ethics Part 1


In this video, Adrian speaks about why we need to consider ethics in our analysis, provides a list
of principles to consider, and gives examples of unethical analysis.

Why do you think ethics is important in data analysis?

In the next step we discuss how data visualisation can also distort data and lead to incorrect
conclusions.

The Role of Ethics Part 2


Building on the last video, Adrian here continues to highlight the importance of ethics in
analytics, especially in relation to graphical techniques.

Throughout this course we’ve already talked about what we need to think about when making
good graphs, and here we’ve seen some examples of bad graphs. You’ve probably seen some
bad, misleading graphs outside of our examples. An excellent collection of real-life examples can
be found here and I’d encourage everyone to have a look through them.

Week 1 Reflection
Well done on completing your first week of “Data Analytics for Decision Making: An
Introduction to Using Excel”.

We hope you’ve enjoyed it so far and have added some new skills to your analytical toolbox.
This week we’ve covered a lot. You learned about a variety of different graphs, when to use
them, and how to make them yourself using Excel. You learned about the major descriptive
statistics, where they are important, and again how to calculate them using Excel. Finally, you’ve
heard about why ethics are important to consider for any good data analyst.

Now that we are at the halfway point, it would be great to hear your reflections on the first
week.

• What part of the content was most exciting for you?
• Do you have any ideas or data you want to investigate now that you have some new tools?
• Lastly, have you seen the results of an analysis in the past that you now have more questions about?

Week 2: Probability and beyond

Probability in Week 2
Welcome to week 2! Last week you saw graphical techniques, descriptive statistics, and heard
about ethics in analysis. This week will focus on how we can work with probability.

The key idea behind the content for this week is that in business and throughout life, we have to
make decisions under uncertainty. These decisions can range from ones as small as whether to bring an
umbrella in case it rains to ones as large as whether to invest in new technologies. Being able to work with this
uncertainty in a structured way is a vital skill. We will start in the next two steps by introducing
the idea of probability and talking about how we interpret probabilities.

In activity two, we’ll talk about how to calculate the probability of combinations of random
events occurring. In activity three, we’ll introduce the idea of discrete random variables and how
we can describe them with some familiar ideas. In activity four, we discuss how we can take our
understanding of probability and random variables to make decisions even when outcomes are
uncertain.

Activities one, two, three and four all relate to the primary goal for this week of being able to
work with uncertainty. We also believe it’s important to finish the course with a broad discussion
of today’s environment for data analytics. We live in an exciting time, where more and more data
is being stored and there is more and more need for data analytics to help make sense of it. In
activity five we will have the opportunity to hear about today’s environment from Bruce
Vanstone, Professor of Data Science at Bond University.
Introduction to Probability and Terminology
In this video, Adrian discusses the basic building blocks of probability as well as why it’s
important to understand the concept of probability.

Some terminology is also introduced so that we can be clearer when talking about the topic. In
the next video, we’ll finish our introduction by talking about how we assign probabilities to
events and how we interpret them.

If there is anything you’re unclear on, ask your fellow learners in the comments below.

Assigning and Interpreting Probabilities


We should now have a broad understanding of the different ways we can assign probabilities to
events as well as what those probabilities mean.

You’ll likely most often be working from the empirical approach for assigning probabilities,
though in some examples the classical approach may be appropriate. In general, we try to avoid
the subjective approach unless we cannot apply either of the other two, as it’s harder to defend.
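A minimal Python sketch contrasting the classical and empirical approaches; the daily customer counts are hypothetical. The subjective approach is left out because it rests on judgement rather than a calculation.

```python
from fractions import Fraction

# Classical approach: all outcomes equally likely, so P = favourable outcomes / total outcomes.
die_faces = [1, 2, 3, 4, 5, 6]
p_even = Fraction(sum(1 for face in die_faces if face % 2 == 0), len(die_faces))

# Empirical approach: relative frequency of the event in past data.
# Hypothetical customer counts recorded on ten past days.
daily_customers = [95, 120, 88, 130, 101, 99, 110, 140, 90, 105]
busy_days = sum(1 for count in daily_customers if count > 100)
p_busy = Fraction(busy_days, len(daily_customers))

print(p_even, p_busy)  # 1/2 3/5
```

The empirical estimate is only as good as the data behind it: ten days is a small sample, and a longer record would give a more trustworthy relative frequency.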

Now that we have a basic understanding of what probabilities are and how we assign them, we
are ready to jump into using them. In the next activity we’ll demonstrate how we can work
out the probabilities of different combinations of events. See you there!

Event Relations
In the last activity, we introduced the idea of simple events. Now, we’ll turn our attention to
more complex events and look at how we can calculate probabilities for combinations of simple
events. Specifically, we’ll focus on how to answer questions in the form of:

• What is the probability that both of these events occur?
• What is the probability that neither event occurs?
• What is the probability that at least one of these events occurs?

These types of questions deal with relations between events, and we have a bit more terminology
about how we refer to them. You’ll hear about unions, intersections and complements in the next
video. Click through if you’re ready to get started!
Unions, Intersections and Complements
In the last activity we discussed how we assign probabilities to simple events. In this video we
look at how we can take these simple events and work out probabilities of more complicated
events. Together we have now examined the following event relations:

• Unions: where we are interested in whether at least one of several events occurs
• Intersections: where we are interested in whether multiple events all occur
• Complements: where we are interested in whether an event does not occur
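The three relations fit together in a short calculation. Here is a minimal Python sketch using the standard addition rule, P(A or B) = P(A) + P(B) - P(A and B), and the complement rule, P(not A) = 1 - P(A). The coffee-and-muffin probabilities are hypothetical, and P(both) is supplied directly rather than derived.

```python
from fractions import Fraction


def p_union(p_a, p_b, p_both):
    """P(A or B): add both probabilities, then subtract the overlap so it isn't counted twice."""
    return p_a + p_b - p_both


def p_complement(p):
    """P(not A) = 1 - P(A)."""
    return 1 - p


# Hypothetical example: a cafe customer buys a coffee (A) and/or a muffin (B).
p_coffee = Fraction(6, 10)
p_muffin = Fraction(3, 10)
p_both = Fraction(2, 10)      # intersection: buys both

at_least_one = p_union(p_coffee, p_muffin, p_both)  # union
neither = p_complement(at_least_one)                # complement of the union

print(at_least_one, neither)  # 7/10 3/10
```

"Neither" is simply the complement of "at least one", which is often the quickest route to it.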

In the next step, Adrian will take us through a more detailed example.

Fruit Orchard Example


Hopefully this video’s example will help to reinforce where our formulas come from.

Without using a formula, Adrian can calculate the desired probability through common sense.
The intuition displayed in this example is the basis of our formulas. Often it’s helpful to
approach probability problems involving multiple events this way, as our calculations should
always pass our logic checks.

How do you feel about this example? Can you think of your own example to share in the
comments below?

Now that we’ve got a better understanding of probability for both simple and more complicated
events, let’s move on. In the next activity, we’ll introduce discrete random variables and discuss
how we can work with them.

Discrete Random Variables


In this activity, we introduce the idea of discrete random variables.

A random variable is simply a variable where the value is uncertain. For example, we may record
the number of customers entering a store each day. This is a variable. If we are interested in the
number of customers who will enter the store tomorrow, this is a random variable. It’s random
because the value is uncertain. It’s also a discrete random variable because it can only take separate,
countable values (0 customers, 1 customer, 2 customers, and so on), never something in between. We’ll describe the idea of random variables more in the next
video, but why do we need these?

We need to know what they are and how to deal with them because they pop up throughout
everyday life and business. The number of customers entering a store each day is just one
example, but it might be important to consider because it would influence the number of staff needed to
serve them.
So if they are so common, how do we work with them? We’ll explore that question in the next
few videos as well!

An Introduction to Discrete Random Variables
What is a random variable?

It’s just a variable where the outcome or value is uncertain. That isn’t to say that you have no
idea what the value will be, just that it’s not guaranteed. A coin flip is a random variable in that
you don’t know what the outcome will be beforehand, but you do have knowledge of the
‘structure’ of the randomness. Only two outcomes are possible, and each has a 50% chance to
occur (for a fair coin).

Now that you’ve heard from Adrian talking about discrete random variables and we have a good
foundation, you should be ready for the next step where we will hear about how we calculate and
interpret the expectation of a random variable.

Discrete Random Variables - Expectation (with Excel)
Expectation of a Random Variable
In Week 1, we spoke about how we can describe the central location of a variable.

This is also important to measure for random variables. In this video Adrian re-introduces the
mean, now called the expectation, for random variables. The expectation represents the long-run average
of the random variable.
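Here is a minimal Python sketch of the expectation, E[X] = Σ x·P(x), together with a simulation that illustrates the "long-run average" interpretation. The distribution of customers per hour is hypothetical. In a spreadsheet, the same sum is commonly computed with SUMPRODUCT over the value and probability columns.

```python
import random

# Hypothetical distribution: number of customers entering a small store in an hour.
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]


def expectation(xs, ps):
    """E[X] = sum of each possible value multiplied by its probability."""
    if abs(sum(ps) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(xs, ps))


mu = expectation(values, probs)

# Long-run average: simulate many hours and average what we observe.
rng = random.Random(42)  # fixed seed so the run is repeatable
draws = rng.choices(values, weights=probs, k=100_000)
long_run_average = sum(draws) / len(draws)

print(round(mu, 4), round(long_run_average, 2))
```

The expectation works out to 1.7 customers per hour even though no single hour can have 1.7 customers; the simulated average over many hours settles close to that value.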

Up next, we’ll also revisit a measure of spread.

Discrete Random Variables - Variance (with Excel) and Shape
Spread for Discrete Random Variables
Knowing the spread of a random variable is also important, because it tells us how far outcomes are
likely to be from our expectation.
To measure this spread, we re-introduce variance and standard deviation. Our equations for
calculating them have changed a little from before, but the principles are the same. While
skewness might still be important, we can assess it visually, as shown by Adrian.
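A minimal Python sketch of the variance and standard deviation of a discrete random variable, using a hypothetical distribution of customers per hour. Unlike the sample formulas from Week 1, each squared deviation from the expectation is weighted by its probability, and there is no n - 1 divisor.

```python
import math

# Hypothetical distribution: number of customers entering a small store in an hour.
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]


def rv_mean(xs, ps):
    """E[X] = sum of value times probability."""
    return sum(x * p for x, p in zip(xs, ps))


def rv_variance(xs, ps):
    """Var(X) = sum of squared deviations from E[X], each weighted by its probability."""
    mu = rv_mean(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))


var = rv_variance(values, probs)
sd = math.sqrt(var)  # back in the original units (customers)

print(round(var, 4), round(sd, 4))  # 0.81 0.9
```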

We now have tools for calculating expectation and variance for discrete random variables, with a
bit of help from Excel. We can also inspect shape by considering the skewness of our data
visually. Now that we have these, we can move onto our final practical topic for this course,
where we use these measures to make a business decision involving random variables.

Before we move on let’s check in.

Is there anything you’re struggling with, or any terms you don’t understand? Try and help other
learners with their questions if you can.

Decisions with Discrete Random Variables: Part 1 (with Excel)
To Lease or Not To Lease
In this video we started applying what we have learned so far to a realistic business problem.

This is an example, but it isn’t a far-fetched one. You can imagine how similar scenarios of
choosing between two projects with uncertain outcomes might arise in businesses. To assess the
two projects, we are computing the expectation, variance and standard deviation of cash flows.

Let’s discuss

What other scenarios are you contemplating right now that you could use your Excel skills to
help decide? Feel free to share with your fellow learners in the comments!

In the next video, Adrian will discuss how we use our calculated figures to determine which project
is more attractive.

Decisions with Discrete Random Variables: Part 2
To Lease or Not To Lease (Continued)
Using the expectation and variance from the last video, we can now make a decision as to which
project is better.
To do this, we first look at their expected cash flows. We would prefer a project to have a higher
expected cash flow. In the long run, if we always took projects with higher expected cash flows
we would be better off. Next, we look at the variance of the cash flows. We would prefer lower
variance because it means that there is less risk of large variations from our expectation.

To summarize, we prefer higher expected cash flows and lower variance. This example was nice
because one of the options had a higher expectation and lower variance. In the next video,
Adrian will discuss how we make a decision if one project has higher expected cash flows but
also higher variance.

Coefficient of Variation
To Lease or Not To Lease (Continued)
Coefficient of variation is our solution to the scenario when one project has better expected cash
flows but worse variance.

It provides a convenient measure of how much risk is taken on relative to the expected cash
flows. In this example, option X offered a higher expected return relative to its risk (a lower
coefficient of variation). So even though it had a lower expectation, it is the better project
because of its substantially lower risk.
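Here is a minimal Python sketch of the coefficient of variation, CV = standard deviation / expected value, applied to two projects. The cash flows and probabilities are hypothetical (not the figures from the video), and the sketch assumes both expected cash flows are positive, since the CV is hard to interpret otherwise.

```python
import math


def summarise(cash_flows, probs):
    """Return (expected cash flow, standard deviation, coefficient of variation)."""
    mean = sum(c * p for c, p in zip(cash_flows, probs))
    var = sum((c - mean) ** 2 * p for c, p in zip(cash_flows, probs))
    sd = math.sqrt(var)
    return mean, sd, sd / mean  # CV = risk per unit of expected cash flow


# Hypothetical projects: X is steady, Y has a higher expectation but far more risk.
projects = {
    "X": ([80, 100, 120], [0.25, 0.5, 0.25]),
    "Y": ([-50, 100, 300], [0.3, 0.4, 0.3]),
}

for name, (flows, probs) in projects.items():
    mean, sd, cv = summarise(flows, probs)
    print(f"{name}: E = {mean:.1f}, SD = {sd:.1f}, CV = {cv:.3f}")

# Prefer the project with the least risk per unit of expected cash flow.
best = min(projects, key=lambda name: summarise(*projects[name])[2])
print("Preferred:", best)
```

Here Y has the higher expected cash flow (115 versus 100), but its standard deviation is roughly ten times larger, so X carries far less risk per dollar of expected cash flow and is preferred.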

In the next step, we’ll summarise what we have learned so far this week before you start your
Test to check your understanding. If you pass the Test and have marked each step as complete
along the way, you’ll be eligible for our free Certificate of Achievement.

How far we've come


Firstly, well done on making it this far! You’ve now finished the practical steps for this course.

Next we have a short Test to assess your knowledge before you go into the last activity. In
activity five, we’ll hear from Professor Bruce Vanstone about the broader environment of data
analytics today. This is a fascinating topic in today’s rapidly changing world, and it should give
you some idea of how the skills you’ve learned in this course equip you to face some of the
challenges businesses encounter.

Before you go into the Test and last activity, we want to briefly summarise what you’ve learned
in the past week and hear from you what you’ve made of it. This week you encountered the idea
of probability and uncertainty, maybe for the first time. As part of this week’s content you
learned about the following:

• What simple events are and the three ways we can assign probabilities to events
• How to use the probabilities assigned to simple events to work out probabilities for more complicated events involving unions, intersections, and complements
• What discrete random variables are and how we can describe them with measures of central location, spread and skewness
• How to use our descriptive measures of random variables to make real business decisions for projects even when outcomes are uncertain
• How to use Excel to help us perform these tasks and keep the focus on the process rather than complicated formulas

Adding all of this to our lessons from the first week, you’re almost ready to start performing your
own analysis or learn about more advanced analytical tools. All that is left is your Test and the
last activity. But before you start those, it would be great to hear from you once more in the
discussion below.

Let’s discuss

Is there a particular task you want to complete with your new skills? How might you apply what
you’ve learnt when you next perform a data-related task in Excel?

The Environment for Analytics


Welcome to our final activity for the course.

Here, we want to take a big step back away from learning specific methods and instead look at
the environment in which we are working as data analysts or working as a manager with data
analysts. The world is changing very fast and as you build your skills, it is important to
understand how you fit in. We hope that the following video helps with that. As a bit of a sneak
peek, the good news is that businesses have increasingly realised their data are a very valuable
resource. Furthermore, with modern technology, businesses now have the ability to store, and the
potential to analyse, massive amounts of data. The current need is for individuals and teams who
are able to make sense of it all. That puts data analysts, data scientists and data-savvy managers
in high demand.

The Science of Business


This video is also used at the end of a longer, more detailed data analytics course taught in the
MBA at the Bond Business School, Bond University.

It’s actually the subject that this online course is based on! As a result, you’ll notice that
Professor Vanstone refers to the content being topic 10. We felt that it was important information
to know even if you haven’t gone through the longer version of the course, because it still applies
to you. So we’ve included it for you all as a special treat! If you want to learn all the material
in between this introduction to data analytics and the next video, contact the Bond
University team about completing the extended Data Analytics for Decision
Making subject offered by Bond.
