TRANSCRIPT MY CLASS NOTES

Let’s now try and use the Analysis ToolPak, specifically its Rank and Percentile function. The Data Analysis ToolPak has a nice function called Rank and Percentile, and basically what it does is give us the ability to rank a set of data points in our data set as well as understand how they perform from a percentile standpoint.

For example, and that’s the example I’m going to use, let’s say I have a set of test scores in my statistics class. I have 78 students. They all have a student ID, and I know how much each of them scored in a statistics exam out of 100, which is the maximum number of points they could have gotten on the test. Now I want to know the rank of each student and I also want to know their percentile. Right? Who was in the 100th percentile, who was in the 70s, who was at the 50th percentile and so on. This is a really quick way of doing it. And if I then want to use this data to rank students, to give them grades, to decide who failed, or just to understand the distribution of performance in my class on this test, it’s a really quick way for me to do that.

Now that we have this data, let’s use Data Analysis. I’m going to select Rank and Percentile. The input range is just all of the values in the score column of this data set. I’m not using labels because I haven’t included the header in my input range. And I want the output in a new sheet. So I’m going to say OK.

© Jigsaw Academy Education Pvt Ltd

Now, literally at the click of a button, Excel has created a new worksheet with the data that I want. What it has done is rank all the data points in decreasing order. As we see, the maximum score was 96, and it goes all the way down to the minimum score of 24 out of a hundred, which is typically expected on any statistics exam. A lot of people don’t like the subject. It also tells me exactly which data point each one was in the original data set. Each of these data points has been ranked from 1 all the way to 78, and I also know the percentile. So obviously the student who is ranked 1 is at the 100th percentile, because his or her score of 96 out of 100 is the top score in the class, which means it is better than every other score in the class. There is nobody in the class who did better than this student. The next student scored 95, and his or her percentile is 98.7%, and so on.
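
To make the rank and percentile columns concrete, here is a minimal Python sketch of the same idea. It assumes the percentile is the share of the other scores that fall below a given score, which is consistent with the 98.7% shown for the second-ranked student (76 of the other 77 scores are lower, assuming no tie at 95). The function name and the sample scores are made up for illustration; this is not Excel’s own implementation.

```python
def rank_and_percentile(scores):
    """Return (point, score, rank, percent) rows, highest score first.

    point   -- 1-based position of the score in the original list
    rank    -- 1 for the highest score; tied scores share a rank
    percent -- share of the other scores that are lower (assumed formula)
    """
    n = len(scores)
    rows = []
    for point, score in enumerate(scores, start=1):
        higher = sum(1 for s in scores if s > score)
        lower = sum(1 for s in scores if s < score)
        rank = higher + 1
        percent = lower / (n - 1) if n > 1 else 1.0
        rows.append((point, score, rank, percent))
    # Sort by rank so the best score comes first, as in the output sheet
    return sorted(rows, key=lambda row: row[2])


# Made-up scores, not the class data from the walkthrough
sample_scores = [72, 96, 55, 95, 24]
for point, score, rank, percent in rank_and_percentile(sample_scores):
    print(point, score, rank, f"{percent:.1%}")
```

With 78 scores the same calculation gives 76 / 77, about 98.7%, for a student with exactly one score above them.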

A lot of competitive entrance exams, especially for prestigious universities, grade students based on percentile and issue their admissions, their interview calls etc. based on what percentile you fell in out of an entire nation of students that took the exam.

So now this has given me very interesting information about how everyone in my class has performed. One of the things I could do quickly is further understand the distribution of scores by looking at a quick histogram. If you remember the histogram tool from earlier, the Analysis ToolPak has a Histogram function. So let’s quickly define some buckets. Maybe we can have 10, 20, 30 etc., all the way to 100. And now let’s quickly look at the distribution. I’m going to go back to the Histogram function and select the input range; the bin range in this case is in column F. I’m going to select chart output. Actually, I want the output right here and I’d like a chart. Oops, something seems to have messed up. Let’s quickly change the input range to column 1. Excel seems to be acting funny. Let’s just get it in a new workbook, and there we go. Let’s quickly zoom into this.
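
The bucket counting behind the histogram can be sketched in a few lines of Python. This sketch assumes each bucket counts the scores above the previous bucket value and up to and including its own value, with anything above the last bucket going into a separate "more" count. The function name and the sample scores are made up for illustration.

```python
def histogram(values, bins):
    """Count how many values fall into each bin.

    A value lands in the first bin whose upper edge is >= the value.
    Values above the largest edge are counted separately as 'more'.
    """
    edges = sorted(bins)
    counts = {edge: 0 for edge in edges}
    more = 0
    for value in values:
        for edge in edges:
            if value <= edge:
                counts[edge] += 1
                break
        else:
            # No edge was large enough for this value
            more += 1
    return counts, more


# Buckets 10, 20, ..., 100 as in the walkthrough; the scores are made up
buckets = range(10, 101, 10)
counts, more = histogram([24, 55, 58, 60, 61, 96], buckets)
for edge, count in counts.items():
    print(edge, count)
print("more", more)
```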

As you can see, this is the distribution we have received. Out of 78 students, the bin with the highest frequency seems to be the 50 to 60 bin, which means that the largest group of my students scored in this range. It does not necessarily reflect the mean, but more students are in this range than in any other, and it tapers off on either side. The minimum score, 24, is in the 20 to 30 range, and the maximum score, 96, is in the 90 to 100 range.

There seem to be about 4 students in this range. Now, with the combination of this data and the data which tells us the percentile and rank of each student, I understand how my class has performed and how each student has performed. This goes into a student’s specific record if I’m a teacher, and it also helps with distributing grades: based on the rank I can use a cut-off rank, or I can use a curve and give students grades based on how many standard deviations away from the mean they are with respect to this histogram. There are multiple methods that can be used, but fundamentally the combination of the Histogram and the Rank and Percentile tools in the Analysis ToolPak really quickly helps me understand the distribution of scores and the distribution of students in this class from this original data set.
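
As one possible illustration of grading on a curve, the sketch below converts each score into the number of standard deviations it sits from the class mean and maps that to a letter grade. The cut-offs are hypothetical and chosen only for the example, the sample standard deviation is an assumption, and the scores are made up.

```python
from statistics import mean, stdev


def z_scores(scores):
    """Return how many sample standard deviations each score is from the mean."""
    average = mean(scores)
    spread = stdev(scores)
    return [(score - average) / spread for score in scores]


def grade(z):
    """Map a z-score to a letter grade using hypothetical cut-offs."""
    if z >= 1.0:
        return "A"
    if z >= 0.0:
        return "B"
    if z >= -1.0:
        return "C"
    return "D"


# Made-up scores, not the class data from the walkthrough
sample_scores = [40, 50, 60, 70, 80]
for score, z in zip(sample_scores, z_scores(sample_scores)):
    print(score, round(z, 2), grade(z))
```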

So the combination of these 2 tools can actually be used in any scenario, right? Any time you have a distribution and you are trying to understand the distribution itself, as opposed to quickly getting summary measures like the mean or median, and you just want to know how the numbers are spread in a given data set. It could be sales data, it could be employee performance data, it could be any data pertaining to your business. These two tools in the Analysis ToolPak will help you quickly analyse the data and get results.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

The Analysis ToolPak in Excel also has an interesting module called Descriptive Statistics, which gives you a whole bunch of descriptive stats on any data set that you give it. It’s a nice way to get some really quick summary numbers on a data set before you dive into it for deeper analysis, or before you split that data and combine it with other data sets, depending on the type of analytics technique you choose to employ. It’s a really quick way to get some simple information about your data set and have it at your fingertips so that you can then make those decisions.

Now let’s apply it to the data set that we have in front of us, which is actually an interesting set of data. What we have here is unemployment data for every single county in the United States in 2009. So just after the great economic crash of 2008-2009, it’s interesting to see how states performed in terms of their unemployment rate, both at a state level and at a county level. This data set has data for 3218 counties, 3219 entries in our database, and what we have here is a specific code both for the state and for the county itself.

FIPS basically stands for Federal Information Processing Standards, so it’s the standard used across the US. Then we have the county name, the year, which is 2009, the population, the number of unemployed people and the unemployment rate. So really quickly, without diving into this data set, without creating additional columns, without doing anything, I just want to know how this data is spread. What I’m going to do real fast is go to Data Analysis, Descriptive Statistics. The input range I want to use is the unemployment rate itself, so the data in column H. No labels, I’d like summary statistics, and I want it in a new worksheet. So that’s it. As you can see, that took us not more than 20 seconds to do, right? And now what I can understand quickly is that the mean unemployment rate was 8.99%, about 9%; the standard deviation is 3.64, the median is 8.5 and the mode is 7.8. The minimum unemployment rate across all the 3200-odd counties was 1.2%, while the maximum was about 30%.
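
For readers who want to see what goes into those summary numbers, here is a minimal Python sketch using only the standard library. It assumes the standard deviation and variance are the sample versions (dividing by n - 1); the function name and the rates are made up for illustration and are not the county data.

```python
from statistics import mean, median, mode, stdev, variance


def describe(values):
    """Return a small set of descriptive statistics for a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "standard deviation": stdev(values),   # sample standard deviation
        "sample variance": variance(values),   # sample variance
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "count": len(values),
    }


# Made-up unemployment rates in percent
sample_rates = [5.0, 7.5, 7.5, 9.0, 12.0]
for name, value in describe(sample_rates).items():
    print(f"{name}: {value}")
```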

Now that I have this data at my fingertips, I can take the summary statistics on their own if I need to report them somewhere, or if someone needs a quick glance at how every single county across the US is doing. I can tell them the mean, the median, the maximum, the minimum, the entire range and also some information on the standard deviation and variance. But what I can also do is dig into this more, if I want to know which county had the maximum unemployment, which had the minimum etc.

What I can also do is maybe repeat this exercise at a state level, which might also be interesting. So one of the things that can be done is to summarize this data at the state level and then use Descriptive Statistics on top of that data. To re-compute it at a state level, what we first have to do is extract the state code. If we look at column D, the last 2 letters of each county name basically indicate the state code. So we have AL, which is Alabama, AK, which is Alaska, AZ, Arizona, and so on. What I am going to do now is basically say let’s get the 2 rightmost characters from column D and extend that all the way to the end of the data set.
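
Taking the 2 rightmost characters is a simple string operation. The Python sketch below shows the idea; the county name format ("Autauga County, AL") is an assumption about how column D looks, and the function name is made up for illustration.

```python
def state_code(county_name):
    """Return the last two characters of a county name as the state code."""
    # strip() guards against stray trailing spaces in the cell
    return county_name.strip()[-2:]


# Assumed format of the county name column
sample_names = ["Autauga County, AL", "Apache County, AZ"]
for name in sample_names:
    print(name, "->", state_code(name))
```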

As we can see, we now get every single state code from the county name. Now what I want to do is first aggregate columns F and G at the state level and then compute the unemployment rate for each state, as opposed to each county as in the current data set. So what I’m going to do now is select the entire data set and insert a pivot table, say OK, and it opens up a new worksheet. Here I want my rows to indicate the state code, and I want the sum of the population as well as the sum of all the unemployed people in that state. Let’s get rid of this. Now let’s quickly extract this data out and store it, and get rid of the pivot table so it makes life easier. Here we go. I want to know the unemployment rate as a percentage at the state level, so I’m going to say equals B4 divided by C4 times 100, that is, the number of unemployed people divided by the population. Let’s extend this formula all the way down and set it to display only 1 decimal place. Now that I have the data at a state level, I can repeat what we did at the county level with the Descriptive Statistics module from the Analysis ToolPak, but for the unemployment rate at the state level. So I’m going to go to Data, Data Analysis, Descriptive Statistics, and my input range in this case is going to be all the values in column D. Everything else here is the same: I want summary statistics, I want it in a new worksheet, there are no labels in the first row. So I’m going to say OK, and now what I want to do is copy this data and paste it right next to what we had for the county-level data. That is for the state level.
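
The pivot table step amounts to summing population and unemployed people per state and then dividing. Here is a minimal Python sketch of that aggregation; the function name, the row layout and the numbers are all made up for illustration.

```python
from collections import defaultdict


def state_unemployment_rates(rows):
    """Aggregate county rows to state level and return rates in percent.

    rows -- iterable of (state_code, population, unemployed) tuples
    """
    totals = defaultdict(lambda: [0, 0])  # state -> [population, unemployed]
    for state, population, unemployed in rows:
        totals[state][0] += population
        totals[state][1] += unemployed
    # Rate = unemployed / population * 100, rounded to 1 decimal place
    return {
        state: round(unemployed / population * 100, 1)
        for state, (population, unemployed) in sorted(totals.items())
    }


# Made-up county rows, not the 2009 data
sample_rows = [
    ("AL", 1000, 100),
    ("AL", 3000, 200),
    ("AZ", 2000, 100),
]
print(state_unemployment_rates(sample_rows))
```

Note that summing first and dividing afterwards weights each county by its population, so a state rate computed this way will generally differ from a plain average of that state’s county rates.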

Now we can do a side-by-side comparison of how the descriptive statistics look when we compute them at a county level versus at a state level. So let’s see what these two data tables side by side are telling us, and again I want to reflect on how easily we were able to generate them using the Descriptive Statistics module.

So the mean unemployment rate across every single county in the United States is 8.99%, which means on average counties in the US see an unemployment rate of almost 9%. At the state level, on the other hand, the average state in the US sees an unemployment rate of 8.64%. Similarly, the median at a county level is 8.5%, whereas the median at a state level is 8.22%. The minimum unemployment rate at a county level is 1.2%, whereas the minimum at a state level is 3.4%. The maximum is 30.1% at the county level, whereas at a state level it is 16.42%.

It’s a nice way to quickly compare this data both at a county level and at a state level. Looks like the unemployment rate was pretty bad in 2009. Of course, now we are 7 or 8 years past 2009, and the unemployment rate has definitely gotten better over Barack Obama’s term as president. But this is a really nice way to study and learn how to use the Descriptive Statistics module, and like I mentioned for other modules, it can be applied to any domain you are working in. You might be from sales, from marketing or from an ops standpoint, and we might be looking at different data sets, but like we just saw, in under 20 to 30 seconds you can get some really fast numbers and descriptive stats on the data set you are using, either to help you make the right decision about the next step in your analysis or just to report to other people in your organization. It’s a really powerful module and helps you achieve this really fast.
