
ANALYZING

DATA IN EXCEL




By Andrei Okhlopkov





Published: December 2019



For the most recent version of this booklet,
accompanying Excel file (“Data analysis.xlsx”)
and other publications please visit:


www.eloquens.com/channel/andrei-okhlopkov








Materials in this booklet are distributed freely and are not intended for sale. You are
authorized and encouraged to forward this booklet and accompanying Excel file to
everyone who might be interested.
Analyzing Data in Excel


Also in this series:

• Guiding Principles of Financial Modeling
• Dates and Timelines in Financial Models
• Unlocking Full Potential of Excel Data Tables (parts 1 and 2)
• Sensitivity Analysis with Indifference Curves
• Charts and Dashboards in Excel (parts 1 and 2)
• Descriptive Statistics for Grouped (Weighted) Data
• Monte Carlo Analysis without Macros
• Portfolio Analysis and Sales Forecasting
• Form Controls and Hyperlinks in Financial Models
• VBA Interactive Collection

The above list and the publications are constantly updated. To be notified of updates,
please follow me on LinkedIn:

https://www.linkedin.com/in/andrei-okhlopkov-92a3191/





I appreciate your interest in my publications on financial
modeling, Excel and VBA. I enjoy sharing them, more will keep
coming, and I hope you will find them useful.

Most of my publications are free, but you can support my project
through PayPal (ID: andrei.okhlopkov@hotmail.com).

And certainly feel free to let me know if you need any further
help in financial modeling.







Andrei Okhlopkov

andrei.okhlopkov@hotmail.com


TABLE OF CONTENTS
INTRODUCTION
LOOKING UP DATA IN LISTS
REGULAR LOOKUPS
LENIENT LOOKUPS
MATCHING DATA IN LISTS AND CELLS
MATCHING ENTIRE CELL CONTENT
MATCHING PARTS OF CELLS: COUNTING
MATCHING PARTS OF CELLS: SUMMING
EXTRACTING DESIRED PARTS OF TEXT FROM LISTS AND CELLS
PRACTICE EXAMPLES
DUE DILIGENCE ANALYSIS
ENHANCING ACCOUNTING SYSTEM
SPLITTING TEXT IN CELLS BY COLUMNS
SINGLE-COMPONENT
MULTI-COMPONENT
CONDITIONAL STATISTICS FOR DELIMITED LISTS
ONE CONDITION, SINGLE CRITERIA
SEVERAL CONDITIONS, SINGLE CRITERIA
MULTIPLE CRITERIA
STATISTICS FOR DATA WITH CRITERIA AND RANKS
CRITERIA, THEN RANKS
RANKS, THEN CRITERIA
SORTING DATA UNDER MULTIPLE CRITERIA
CONDITIONAL STATISTICS FOR NON-DELIMITED LISTS
ONE CONDITION, SINGLE CRITERIA
SEVERAL CONDITIONS, SINGLE CRITERIA
MULTIPLE CRITERIA
“FLEXIBLE” SELECTION
ANALYZING DATA CONTAINING DATES
GROUPING BY YEARS
GROUPING BY YEARS AND ADDITIONAL CRITERIA
GROUPING BY YEARS AND QUARTERS
GROUPING BY YEARS AND MONTHS
GROUPING BY WEEKDAYS
TRANSFORMING TWO-DIMENSIONAL TABLES
SINGLE CRITERIA
MULTIPLE CRITERIA, ONE DIMENSION
MULTIPLE CRITERIA, TWO DIMENSIONS
CREATING AUTOMATIC LISTS BASED ON CRITERIA
SINGLE CRITERIA
MULTIPLE CRITERIA (AND AND OR LOGIC)
WHAT’S NEXT?
DESCRIPTIVE STATISTICS FOR GROUPED (WEIGHTED) DATA
PORTFOLIO ANALYSIS AND SALES FORECASTING


Introduction

In this brochure I have collected a handful of techniques for working with data in Excel:
looking up values in tables, matching text or numbers in lists and extracting
corresponding values from neighboring columns, querying specific data, aggregating
amounts based on a single condition or several criteria, building lists and
summaries, performing statistical analysis based on these criteria, etc.

From a financial modeling standpoint, these techniques will help you collect and analyze data
at the due diligence stage (I give a practical example of how to make an extract
from the somewhat messy accounting system of a fictitious investment target). If you are
an in-house accountant or analyst, the methods I am going to describe will help you
make even your existing system more informative and analytical with no additional
resources (I give an example of this too).

I have tied my examples to the automotive industry (in which I have spent quite some
years). But these examples are very generic and flexible and can certainly be adjusted
to any other industry or specific analytical requirements.


Looking up Data in Lists



Refers to sheet: "Lookups"

Formulae on this sheet explain how to look up values in lists based on certain criteria.

As an example I have given a table with the USD/EUR weekly exchange rates (range of
cells D4:E18). Dates are sorted in ascending order.

Regular lookups

To find the exchange rate valid on a given date (entered into cell B22) we use the
simple LOOKUP function. It can be used with two syntaxes:

– the short variant (cell D22) does not use the third argument (result vector) and
returns the date on which the rate was set. In our example, we are trying
to find the rate on 25-Jun-18. There is no such date in the table, so LOOKUP returns
the latest date preceding our given date. This is 22-Jun-18.

– the full variant (cell E22) returns the rate (1.1552) which was set on 22-Jun-18 and
which was valid on the date we are looking at (25-Jun-18).
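Outside Excel, the approximate-match LOOKUP on ascending data behaves like a binary search for the last key that does not exceed the lookup value. The Python sketch below illustrates that behavior and is not the workbook's implementation. The weekly dates (Fridays from 1-Jun-18 to 31-Aug-18) are inferred from the arrays shown later in this section. The function name `lookup_sorted` is my own. Every rate except 1.1552 is a 0.0 placeholder, because the text does not list them.

```python
from bisect import bisect_right
from datetime import date, timedelta

# Ascending weekly dates: 1-Jun-2018 to 31-Aug-2018 (14 Fridays, as in D5:D18).
dates = [date(2018, 6, 1) + timedelta(weeks=i) for i in range(14)]
# Placeholder rates (0.0); only the 22-Jun-2018 rate (1.1552) is given in the text.
rates = [0.0] * 14
rates[3] = 1.1552


def lookup_sorted(lookup_value, lookup_vector, result_vector=None):
    """Rough analogue of LOOKUP(lookup_value, lookup_vector, [result_vector]).

    Assumes lookup_vector is sorted ascending. Returns the last key that is
    <= lookup_value (short form) or the value in the same position of
    result_vector (full form).
    """
    count = bisect_right(lookup_vector, lookup_value)  # number of keys <= lookup_value
    if count == 0:
        # Excel would return #N/A here.
        raise LookupError("lookup value is below the first key")
    source = lookup_vector if result_vector is None else result_vector
    return source[count - 1]


print(lookup_sorted(date(2018, 6, 25), dates))         # 2018-06-22 (like cell D22)
print(lookup_sorted(date(2018, 6, 25), dates, rates))  # 1.1552 (like cell E22)
```

As with LOOKUP itself, the result is only meaningful if the keys are sorted. On shuffled data `bisect_right` silently returns a wrong position.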

Again, for these formulae to work properly, dates must be sorted in ascending order. If,
for whatever reason, your data is not sorted, the section on the right explains how to
get the same results anyway. The dates and rates in cells G4:H18 are exactly the same but are
shuffled.

Let's look at the formula in cell G22, which finds the date on which the relevant exchange
rate was set. Take its first component (=G5:G18<=B22), enter it into any
cell, then put the cursor inside the formula and press F9. This is what you will see:

={FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE}

As you can see, the formula checks for every date whether it is less than or equal to our given date, and
returns an array of TRUE and FALSE values. For example, 24-Aug-18 is greater than 25-Jun-18,
so the first position is FALSE. The same applies to the second and third positions (20-Jul-18
and 13-Jul-18 respectively). In the fourth position we have 01-Jun-18, which is less
than our 25-Jun-18, so the value is TRUE for the first time, and so on.

We then multiply this array by the dates (=G5:G18*(G5:G18<=B22)), and if we press
F9 on this formula again we get:

={0;0;0;43252;43273;0;0;0;0;0;43266;0;0;43259}

The first three positions are zeroes, reflecting the fact that the first three values in the
first array are FALSE, and multiplying anything by FALSE is equivalent to multiplying
by zero. The other, seemingly strange numbers are the dates as Excel stores them (Excel
treats dates as serial numbers counting days from 1 Jan 1900, which is day 1).

As the result of this operation, we have zeroed out all the dates that are greater than
our target date. Thus, the effective date we are looking for will be the maximum of the
remaining dates.

We do an interim operation, putting this array under the INDEX function:
=INDEX(G5:G18 * (G5:G18<=B22), 0). This does not change the array in any way, but it
allows regular functions, in this case MAX, to use it natively (that is, without
array-entering the formula with Ctrl+Shift+Enter):

=MAX(INDEX(G5:G18 * (G5:G18<=B22), 0))

This is our final formula for the effective date.

Next, we need to determine the exchange rate which was set on this date. In cell
H22 we start with =MAX(G5:G18 * (G5:G18<=B22)), which is the same formula we
arrived at earlier but without the INDEX component (we can omit it here because
we will later use the SUMPRODUCT function, which can handle such arrays itself).

We then test which date in the list is equal to the date we have just found:
=G5:G18=MAX(G5:G18 * (G5:G18<=B22)). This returns the following array:

={FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE}

There is only one TRUE, in the fifth position, showing that this is the position
which holds 22-Jun-2018. By multiplying this array by the exchange rates, we zero out all the
rates except the one we need:

=(G5:G18=MAX(G5:G18 * (G5:G18<=B22))) * H5:H18

which produces an array:

={0;0;0;0;1.1552;0;0;0;0;0;0;0;0;0}

If we add up all these numbers, we will get 1.1552 and our mission is completed. We do
this with the SUMPRODUCT function:

=SUMPRODUCT((G5:G18=MAX(G5:G18 * (G5:G18<=B22))) * H5:H18)
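The same two steps translate almost literally into Python list operations, which may help if you want to check the logic outside Excel. This sketch rests on the following assumptions:

- Dates are Excel serial numbers.
- The shuffled order reproduces the positions visible in the arrays above: 24-Aug, 20-Jul and 13-Jul first, 1-Jun and 22-Jun fourth and fifth, 15-Jun eleventh and 8-Jun last.
- The order of the remaining dates is hypothetical.
- Rates other than 1.1552 are 0.0 placeholders.

```python
# Shuffled dates as Excel serial numbers (G5:G18). Positions 6-10, 12 and 13
# hold the remaining dates in a hypothetical order.
dates = [43336, 43301, 43294, 43252, 43273, 43280, 43287,
         43308, 43315, 43322, 43266, 43329, 43343, 43259]
# Only the 22-Jun-2018 rate (fifth position) is known; the rest are placeholders.
rates = [0.0] * 14
rates[4] = 1.1552


def effective_date(dates, target):
    # G5:G18*(G5:G18<=B22): dates after the target are multiplied by False (0).
    masked = [d * (d <= target) for d in dates]
    # MAX(...): the latest date on or before the target (0 if there is none).
    return max(masked)


def rate_on(dates, rates, target):
    eff = effective_date(dates, target)
    # SUMPRODUCT((G5:G18=eff)*H5:H18): every row except the matching one becomes 0.
    return sum((d == eff) * r for d, r in zip(dates, rates))


target = 43276  # 25-Jun-2018
print(effective_date(dates, target))  # 43273, i.e. 22-Jun-2018
print(rate_on(dates, rates, target))  # 1.1552
```

Like the SUMPRODUCT formula, this sketch would add up the rates if the same date appeared twice in the list, so it assumes the dates are unique.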

Lenient lookups

Lenient lookups provide more flexibility in looking up data. Consider the example in cell
D25, which returns the closest date in the list on or before the lookup date (this is
essentially what the formula in cell D22 does, but I want to give you a different
perspective). First we test which dates in our array are less than or equal to the lookup
date (25-Jun-2018):

=D5:D18<=B25

This operation returns the following array:

={TRUE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE}

Then we divide the dates array by those logical results:

=D5:D18 / (D5:D18<=B25)

Dividing a number by TRUE or FALSE logical value is equivalent to dividing by 1 or 0
respectively, so the number either remains as it was or transforms to a #DIV/0! error:

={43252;43259;43266;43273;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!}

Numbers in this array are the dates presented in general number format. By doing this
operation we have eliminated all the dates greater than the lookup date (they have turned into errors).

We use the AGGREGATE function (I use it very extensively in this file and will
discuss it at length) to find the maximum date which remains after we have eliminated
all the dates greater than the lookup date:

=AGGREGATE(14, 6, D5:D18 / (D5:D18<=B25), 1)

Argument 14 means the function works as an equivalent of the LARGE function,
argument 6 means the function ignores errors, and the final argument 1
means the function returns the 1st largest value in the array. This returns 22-Jun-2018,
as it is the latest (largest) date less than or equal to the lookup date (25-Jun-2018).

The formula in cell D26 works the opposite way: it keeps only those dates which are
greater than or equal to the lookup date and finds the smallest of them (the first
argument, 15, corresponds to the SMALL function).

As a side note, a combination of INDEX and MATCH with a match type of -1 produces
the same effect but dates must be sorted in descending order. With the above formula
dates can be in any order.
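Here is a rough Python analogue of the AGGREGATE approach. Dividing by the TRUE/FALSE test turns unwanted dates into errors. Options 14 and 15, combined with option 6, then pick the largest or smallest survivor while skipping errors. In this sketch None stands in for #DIV/0!, and the helper names are my own. The dates are the sorted serial numbers from D5:D18 (1-Jun-18 to 31-Aug-18).

```python
def divide_by_test(values, tests):
    # values/(test): TRUE acts as 1 (value kept), FALSE as 0 (#DIV/0!, here None).
    return [v / t if t else None for v, t in zip(values, tests)]


def aggregate_large(array, k=1):
    # AGGREGATE(14, 6, array, k): k-th largest value, ignoring errors (None).
    numbers = sorted((x for x in array if x is not None), reverse=True)
    return numbers[k - 1]  # IndexError where Excel would return #NUM!


def aggregate_small(array, k=1):
    # AGGREGATE(15, 6, array, k): k-th smallest value, ignoring errors (None).
    numbers = sorted(x for x in array if x is not None)
    return numbers[k - 1]


dates = [43252 + 7 * i for i in range(14)]  # 1-Jun-2018 ... 31-Aug-2018
lookup = 43276                              # 25-Jun-2018 (cell B25)

on_or_before = aggregate_large(divide_by_test(dates, [d <= lookup for d in dates]))
on_or_after = aggregate_small(divide_by_test(dates, [d >= lookup for d in dates]))
print(on_or_before, on_or_after)  # 43273.0 43280.0 (22-Jun-2018 and 29-Jun-2018)
```

Because nothing in the sketch depends on position, it gives the same answers for shuffled dates. This matches the point about the AGGREGATE formula above.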

Lastly, the formula in cell D27 simply finds the closest date, regardless of direction. It
first subtracts the lookup date from the array of dates:

=D5:D18-B27

the result of which is:


={-24;-17;-10;-3;4;11;18;25;32;39;46;53;60;67}

Each number in this array is the number of days between the lookup date and the
corresponding date in the array. We take the absolute values of those differences:

=ABS(D5:D18-B27)
={24;17;10;3;4;11;18;25;32;39;46;53;60;67}

We put this through the INDEX function (again, this is a technical operation that allows
other functions to handle this array natively):

=INDEX(ABS(D5:D18-B27),0)

and then find out the minimum of it:

=MIN(INDEX(ABS(D5:D18-B27),0))

the result of which is 3. We then use the MATCH function to determine the position of
value 3 in our array:

=MATCH(MIN(INDEX(ABS(D5:D18-B27),0)), INDEX(ABS(D5:D18-B27),0), 0)

which returns 4.

Finally, we index the array of dates to determine which date is in the 4th cell
from the top:

=INDEX(D5:D18, MATCH(MIN(INDEX(ABS(D5:D18-B27),0)), INDEX(ABS(D5:D18-B27),0), 0))

This is our final formula in cell D27, and the result is 22-Jun-2018 (this date is just 3
days away from the lookup date 25-Jun-2018 and is the closest to it).
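The ABS/MIN/MATCH/INDEX chain can be sketched in Python for checking purposes. `list.index` returns the first position of the minimum, which mirrors MATCH with match type 0. On a tie, the date that comes first in the list therefore wins. The function name is hypothetical, and the dates are the D5:D18 serial numbers.

```python
def closest_date(dates, lookup):
    distances = [abs(d - lookup) for d in dates]  # ABS(D5:D18-B27)
    position = distances.index(min(distances))    # MATCH(MIN(...), ..., 0), 0-based here
    return dates[position]                         # INDEX(D5:D18, position)


dates = [43252 + 7 * i for i in range(14)]  # 1-Jun-2018 ... 31-Aug-2018
print([d - 43276 for d in dates][:5])  # [-24, -17, -10, -3, 4]
print(closest_date(dates, 43276))      # 43273, i.e. 22-Jun-2018
```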

Formulae in cells D29 and D30 are slight modifications of the formulae in cells D25 and
D26 respectively. They use strict inequalities when comparing the lookup dates with the
dates in the array. As a result, even if we use a lookup date which exists in the array (22-Jun-2018),
the formulae will return 15-Jun-2018 and 29-Jun-2018, the dates
immediately preceding and following the lookup date in the array.
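With strict inequalities the lookup date itself is excluded, so a date that exists in the list returns its neighbors. Here is a minimal sketch using Python `date` objects; the function names are hypothetical. Like the worksheet formulae, it does not depend on the order of the dates. It raises ValueError in cases where the AGGREGATE formula would return an error because no date qualifies.

```python
from datetime import date, timedelta


def strictly_before(dates, lookup):
    # Like D29: the latest date strictly earlier than the lookup date.
    return max(d for d in dates if d < lookup)


def strictly_after(dates, lookup):
    # Like D30: the earliest date strictly later than the lookup date.
    return min(d for d in dates if d > lookup)


dates = [date(2018, 6, 1) + timedelta(weeks=i) for i in range(14)]
lookup = date(2018, 6, 22)  # a date that exists in the list
print(strictly_before(dates, lookup), strictly_after(dates, lookup))
# 2018-06-15 2018-06-29
```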

Formulae in cells E25:E30 use the simple LOOKUP function to find the rate on the
dates we have found. As those dates actually exist in the list, the LOOKUP
function returns an exact match.

With shuffled data, formulae to determine the dates (cells G25:G30) are the same as
those for sorted data. Formulae to find the rates use the combination of INDEX and
MATCH functions to find the exact matches (discussed in detail later), as the LOOKUP
function is not suited for unsorted data.


Matching Data in Lists and Cells



Refers to sheet: “Matching”

We use matching when we need to find an exact match to a value from a list as the basis
for further analysis.

We are looking at the car business, and any car is characterized by a standard set of
features, which include:

– the overall level of standard equipment (from the simplest Base series to the
Comfort series and further up to the richest Luxury series)
– bodystyle (Sedan, Hatchback, Wagon)
– engine size (1.6 and 2.0 liters in my example)
– transmission (Manual or Auto)

There is also optional equipment (e.g. alloy wheels or a DVD player) which is not
part of the standard specification and which customers have to order separately
and pay extra for.

Let's imagine a dealer who has ordered seven vehicles listed in the 'Order sample'
section on this sheet.

The order description consists of two parts: standard specifications (range F4:G10) and
optional equipment (range F13:G19). Optional equipment is listed as abbreviations
which are explained in the price lists for vehicles and optional equipment on the left
side of the sheet. Both ranges relate to the same orders, but the lists are shuffled against
each other, so we need to match the car specs against the optional equipment based on
the order number.

Matching entire cell content



First we will determine the price for every vehicle based on the price list.

To find the price of a vehicle in the list, we first need to match it to one of the list
items. We do it with the MATCH function, and the first part of the formula in cell I4,
=MATCH(G4,B4:B14,0), returns 10, because the contents of cell G4 (Luxury sedan 2.0
auto) match the 10th cell in the range B4:B14.

The next part of the formula is based on the INDEX function, which works in a somewhat
opposite way: it returns the cell reference (and consequently its value) from a
given range using the given cell number (a.k.a. index). So if we combine both functions
into the formula =INDEX(D4:D14,MATCH(G4,B4:B14,0)), the MATCH part gets us
10, and the INDEX part returns the 10th cell in the range D4:D14,
which is cell D13 with a value of 15,500. This is the price corresponding to the
type of car in cell G4.


We just need to add absolute references and copy the formula down to cell I10, and our
vehicles are now priced.

This combination of MATCH and INDEX is one of the most powerful and popular lookup
and reference tools; I recommend memorizing it and using it in your analysis. This
combination has many advantages over the VLOOKUP and HLOOKUP functions,
which are frequently used for the same purposes.

We again use the combination of INDEX and MATCH to tie the vehicle specs to their
optional equipment using the order numbers (the formulae are in the range J4:J10).
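For readers who also work outside Excel, here is a rough Python equivalent of INDEX/MATCH with match type 0. MATCH with match type 0 ignores case when comparing text, so the sketch compares casefolded strings. Only "Luxury sedan 2.0 auto" at 15,500 comes from the text. The other two models and prices are hypothetical placeholders, and the function names are my own.

```python
def match_exact(lookup_value, lookup_array):
    """MATCH(lookup_value, lookup_array, 0): 1-based position, case-insensitive."""
    target = lookup_value.casefold()
    for position, item in enumerate(lookup_array, start=1):
        if item.casefold() == target:
            return position
    raise LookupError("no exact match (#N/A in Excel)")


def index_match(lookup_value, lookup_array, return_array):
    # INDEX(return_array, MATCH(lookup_value, lookup_array, 0))
    return return_array[match_exact(lookup_value, lookup_array) - 1]


# Hypothetical price list; only the last row is taken from the text.
models = ["Base hatchback 1.6 manual", "Comfort wagon 2.0 auto", "Luxury sedan 2.0 auto"]
prices = [11000, 14000, 15500]

print(index_match("Luxury sedan 2.0 auto", models, prices))  # 15500
```

Keeping the lookup column and the return column as separate arguments is what gives INDEX/MATCH its flexibility over VLOOKUP. The return column can sit anywhere, even to the left of the lookup column.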

Matching Parts of Cells: Counting



Turning to the optional part of our order: in many real-life situations we have several
items listed in one cell (as in the range J4:J10 we have just obtained). We
need to identify which codes are contained (along with other codes) in each cell and
calculate the total price of those codes. I have put forward formulae both to count
the number of optional items in every car (cells K4:K10) and to calculate their total
price (cells L4:L10).

We take the first component of the formula in cell K4: =SEARCH(C17:C23,J4). This
formula takes every cell in the list of abbreviations and tries to find it in cell J4. If the
search is successful, the formula returns the character position in cell J4 at
which the abbreviation is found; otherwise it returns an error. Going into the cell
and pressing F9 gives:

={1;#VALUE!;5;#VALUE!;#VALUE!;#VALUE!;#VALUE!}

This means that the formula has found the text "AC" (the first one in the option list)
at character position 1, and the text "LT" (the third one in the option list)
at character position 5 in this cell.

We then use the ISNUMBER function to evaluate if the result of the previous calculation
is a number or not (an error in this case). So we modify our formula to
=ISNUMBER(SEARCH(C17:C23,J4)) and receive as the result:

={TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE}

This means that positions one and three of the array contain numbers, i.e. the first and
third abbreviations in the list are found in the contents of cell J4.

Logical values TRUE and FALSE are converted into 1 and 0 by any mathematical
operation on them. This operation could be multiplying by 1 or adding 0, but the most
elegant way is to put a double minus sign in front: =--ISNUMBER(SEARCH(C17:C23,J4)),
which gives us:

={1;0;1;0;0;0;0}


If we add this array up, we get 2, which means that 2 optional items from the
list are found in cell J4. We do this addition with the SUMPRODUCT function, then make
references absolute as needed and copy the formula down to cell K10:

=SUMPRODUCT(--ISNUMBER(SEARCH($C$17:$C$23,J4)))
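Here is a Python sketch of the same counting logic. SEARCH is case-insensitive and returns a 1-based position or #VALUE!, which the helper mimics with None. Unlike SEARCH, the helper does not interpret wildcard characters. Only the codes "AC" and "LT" come from the text. The other five codes and the cell content "AC, LT" are hypothetical, chosen so that the positions match those shown above.

```python
def search(find_text, within_text):
    # SEARCH(find_text, within_text): case-insensitive, 1-based; None for #VALUE!
    index = within_text.casefold().find(find_text.casefold())
    return index + 1 if index >= 0 else None


def count_codes(codes, cell):
    # SUMPRODUCT(--ISNUMBER(SEARCH(codes, cell))): True counts as 1, False as 0.
    return sum(search(code, cell) is not None for code in codes)


codes = ["AC", "OPT2", "LT", "OPT4", "OPT5", "OPT6", "OPT7"]  # hypothetical except AC, LT
cell = "AC, LT"  # hypothetical content of J4

print([search(code, cell) for code in codes])  # [1, None, 5, None, None, None, None]
print(count_codes(codes, cell))                # 2
```

Because this is plain substring matching, a short code can match inside a longer one: a code "C" would be found in "AC". The same is true of the SEARCH formula, so codes should be chosen not to overlap.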

Matching Parts of Cells: Summing



Now, making a formula for the total price is not much more difficult. We take the
=ISNUMBER(SEARCH(C17:C23,J4)) part again and multiply it by the prices in the range
D17:D23:

=D17:D23*ISNUMBER(SEARCH(C17:C23,J4))

We don't need the double minus anymore, because multiplying one array by another is
itself a mathematical operation. The result is:

={500;800;1100;400;1000;850;1200}*{TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE}

which ultimately becomes:

={500;0;1100;0;0;0;0}

Adding these numbers together using the SUMPRODUCT function gives us 1,600, which
is the total price of an air conditioner and leather trim:

=SUMPRODUCT($D$17:$D$23*ISNUMBER(SEARCH($C$17:$C$23,J4))) copied down to
cell L10.
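The summing version only adds a multiplication by the price column. In this sketch the seven prices are those from D17:D23 shown above. As before, only "AC" and "LT" are real codes, the rest are hypothetical, and the function name is my own.

```python
def total_option_price(codes, prices, cell):
    # SUMPRODUCT(prices*ISNUMBER(SEARCH(codes, cell))): True counts as 1, False as 0.
    text = cell.casefold()
    return sum(price * (code.casefold() in text) for code, price in zip(codes, prices))


codes = ["AC", "OPT2", "LT", "OPT4", "OPT5", "OPT6", "OPT7"]  # hypothetical except AC, LT
prices = [500, 800, 1100, 400, 1000, 850, 1200]               # D17:D23
print(total_option_price(codes, prices, "AC, LT"))            # 1600
```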


Extracting Desired Parts of Text from Lists and Cells



Refers to sheet: “Fetching”

In many cases it is easier to analyze the data if its characteristics are broken down into
separate cells.

Our car models have common descriptive features (level, bodystyle, engine and
transmission). For example, the level can be Base, Comfort or Luxury. The
table on this sheet lists all car models, and column C determines and fetches
the level of each car (from the list given further down in this column,
in cells C14:C16).

The formula (let's take the first cell, C5) starts with =SEARCH(C14:C16,B5). It checks
whether cell B5 contains any of the values contained in cells C14:C16, and returns the
following array in our example:

={#VALUE!;#VALUE!;1}

The #VALUE! error means that it did not find in cell B5 any text contained in the first
two cells (C14 and C15, which contain "Base" and "Comfort" respectively). It found in
cell B5 the text contained in the third cell (C16, "Luxury"), and "1" means
that it found it at character position 1. As a vehicle description contains only one of the
three levels, there will always be two #VALUE! errors
and one number in the array returned by this formula.

Next, we modify our formula to =ISNUMBER(SEARCH(C14:C16,B5)). This makes it
return an array of Boolean values (TRUE and FALSE), where TRUE stands for a number
and FALSE for anything else (including errors):

={FALSE;FALSE;TRUE}

We then put this formula under the INDEX function with row number 0, which allows
regular worksheet functions to use this array. Then we use the MATCH function to
determine the relative position of the TRUE in the array:

=MATCH(TRUE,INDEX(ISNUMBER(SEARCH(C14:C16,B5)),0),0)

This formula returns 3, which means the level type of this car ("Luxury") is in the third
cell of an array we have given to it (C16).

Final step: we use the INDEX function to obtain the text from this cell:

=INDEX(C14:C16,MATCH(TRUE,INDEX(ISNUMBER(SEARCH(C14:C16,B5)),0),0))

and this formula returns "Luxury".


All formulae on this sheet use exactly the same principle - you just need to populate the
lists in the range of cells C14:F16, modify references in the formula to absolute or semi-
absolute and copy it down and across. The data is now ready for further analysis.
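The INDEX/MATCH(TRUE, ISNUMBER(SEARCH(...))) pattern amounts to "return the first list item that appears anywhere in the text". Here is a Python sketch of it. The levels and bodystyles come from the text, the description is the one used on the "Matching" sheet, and the function name is hypothetical.

```python
def fetch_category(categories, text):
    # ISNUMBER(SEARCH(categories, text)): does each category occur in the text?
    found = [category.casefold() in text.casefold() for category in categories]
    # MATCH(TRUE, ..., 0) finds the first TRUE; INDEX returns that category.
    return categories[found.index(True)]  # ValueError where Excel gives #N/A


description = "Luxury sedan 2.0 auto"
print(fetch_category(["Base", "Comfort", "Luxury"], description))    # Luxury
print(fetch_category(["Sedan", "Hatchback", "Wagon"], description))  # Sedan
```

Because the first match wins, list order matters if one category name is contained in another. The same caveat applies to the worksheet formula.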


Practice Examples

Refers to sheet: “Pr Example”

This sheet contains two practical, real-life examples based on what we have learnt so far.

Due Diligence Analysis



The first example illustrates a messy extract from an accounting system which you
need to tidy up and analyze.

Enhancing Accounting System



The second example shows how you can add additional analytics to the existing
accounting system in your organization just by adding coded abbreviations to entry
descriptions.


Splitting Text in Cells by Columns



Refers to sheet: “Delimiting”

Formulae on this sheet produce results similar to those on the "Fetching" sheet but
work differently.

In many situations we have text strings with different pieces of data separated by some
character (a delimiter).

Single-component

In our first example (cells B4:B6), each cell contains a vehicle description and its price,
separated by a tilde ("∼"). We need to separate the descriptions from the prices and put them
into different columns. The challenge is that the descriptions are of different lengths, so
the formula needs to account for that.

We first need to find out where the delimiter is positioned in each cell, and we do it
with the formula =SEARCH("∼",B4) in cell C4, copied down to cell C6. It returns the
character position at which the tilde was found in each cell. I have
put this formula separately from the rest of the formulae for illustration
purposes, but in your real-life work you can nest it inside the other formulae.

The formula =LEFT(B4,C4-1) in cell I4 extracts the description from the text by taking
the number of characters from the left equal to the tilde position less one.

The formula =VALUE(RIGHT(B4,LEN(B4)-C4)) extracts the price by taking the number
of characters from the right equal to the length of the string less the tilde position. The
outcome of RIGHT is text, so to convert it into a number I am using
the VALUE function.
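Here is a minimal Python sketch of the LEFT/RIGHT/SEARCH split. It assumes each cell looks like "description~price", with a plain tilde as the delimiter. The sample string is hypothetical and the function name is my own.

```python
def split_description_price(cell, delimiter="~"):
    position = cell.find(delimiter) + 1  # SEARCH(delimiter, cell), 1-based
    if position == 0:
        raise ValueError("delimiter not found (#VALUE! in Excel)")
    description = cell[:position - 1]    # LEFT(cell, position-1)
    price = float(cell[position:])       # VALUE(RIGHT(cell, LEN(cell)-position))
    return description, price


print(split_description_price("Luxury sedan 2.0 auto~15500"))
# ('Luxury sedan 2.0 auto', 15500.0)
```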

Multi-component

The next example is a bit more complicated, as it has several delimiters (spaces) in each
text string (located in cells B10:B16). Column C contains zeroes which are used later for
support purposes (to keep the formulae consistent). Starting from column D, I determine the
position of every space character.

To do this, I first use the SUBSTITUTE function,
which replaces a given character (in this case a space " ") with another character (I
have used the pipe character ("|") as it is not used anywhere else in the descriptions; in
your examples you can use any other character or string of characters, e.g.
"abc987xyz"). A useful feature of the SUBSTITUTE function is that you can
specify the instance number of the character you want to replace, so that only that
occurrence is replaced. These instance numbers are in the header (cells D9:G9) and feed the
formulae. Once a space is substituted with a pipe, I use the SEARCH function
to find the position of this new character. Remember, in every column this
substituted character relates to a particular occurrence of the original space character.
Therefore, the formula returns the position of a particular space in the text.

The last column here (H) calculates the total length of the strings, which is also used
later.
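The SUBSTITUTE-then-SEARCH trick for locating the n-th space can be sketched in Python as follows. The helper emulates SUBSTITUTE's instance-number argument by splitting on the delimiter. The sample description is hypothetical, and the function name is my own. As in the worksheet, the marker must not already occur in the text.

```python
def nth_position(text, char, n, marker="|"):
    """1-based position of the n-th occurrence of char, via a SUBSTITUTE-style marker."""
    parts = text.split(char)
    if n < 1 or n >= len(parts):
        raise ValueError("text has fewer than n occurrences (#VALUE! in Excel)")
    # SUBSTITUTE(text, char, marker, n): replace only the n-th occurrence.
    substituted = char.join(parts[:n]) + marker + char.join(parts[n:])
    # SEARCH(marker, substituted)
    return substituted.find(marker) + 1


description = "Luxury sedan 2.0 auto 15500"  # hypothetical cell in B10:B16
print([nth_position(description, " ", n) for n in range(1, 5)])  # [7, 13, 17, 22]
print(len(description))                                         # 27 (column H)
```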

Formulae in the range J10:M16 use the MID function, which returns a certain number of
characters from the middle of a text string, starting from a given character. For the starting
character and the number of characters, it uses the numbers we have calculated
previously in cells C10:G16. Formulae in the last column (N10:N16) differ very slightly
in their references, and they use the VALUE function again to convert text to
numbers.
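Continuing the same hypothetical example, here is a Python sketch of the MID step. To keep one formula for every column, it appends the string length plus one as the final boundary. That is my own simplification; the workbook handles the last column with slightly different references.

```python
def mid(text, start_num, num_chars):
    # MID(text, start_num, num_chars) with a 1-based start position.
    return text[start_num - 1:start_num - 1 + num_chars]


def split_by_positions(text, space_positions):
    # Boundaries: 0 (column C), the space positions (D:G), then LEN+1 (my simplification).
    bounds = [0] + list(space_positions) + [len(text) + 1]
    return [mid(text, bounds[i] + 1, bounds[i + 1] - bounds[i] - 1)
            for i in range(len(bounds) - 1)]


description = "Luxury sedan 2.0 auto 15500"  # hypothetical cell in B10:B16
parts = split_by_positions(description, [7, 13, 17, 22])
print(parts)             # ['Luxury', 'sedan', '2.0', 'auto', '15500']
print(float(parts[-1]))  # 15500.0 (the VALUE step in column N)
```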

As a final remark, you can do this transformation using the 'Text to Columns'
functionality in Excel. One of the reasons I have given this alternative approach here is
to show you some text functions.


Conditional Statistics for Delimited Lists



Refers to sheet: “List criteria”

On this sheet I will explain how to count, sum, average and perform other basic statistics
on data based on a single criterion or (in most cases) a list of criteria.
For this exercise we will use a larger list of cars contained in the range B3:F38 (under
the 'Source data' heading).

One condition, single criteria



The first example is easy: from the list of vehicles we take only those which have a
certain bodystyle (sedan in our case) and perform the basic analysis of these selected
vehicles. We count, sum and make an average of them using the simple COUNTIF,
SUMIF and AVERAGEIF functions.
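To make the single-condition statistics concrete, here is a Python sketch. The 14 sedan prices are those visible in the #DIV/0! array further down in this section. The two non-sedan rows are hypothetical placeholders, and the comparison ignores case as COUNTIF does.

```python
# (bodystyle, price) rows: the 14 sedans come from the text, the last two rows are hypothetical.
rows = [("Sedan", p) for p in (17950, 16450, 15500, 14300, 12100, 18500, 16450,
                               18500, 13400, 18750, 15350, 12600, 19450, 14850)]
rows += [("Wagon", 16900), ("Hatchback", 13200)]

criterion = "sedan"  # COUNTIF-style comparisons ignore case
selected = [price for style, price in rows if style.casefold() == criterion.casefold()]

count = len(selected)    # COUNTIF(C5:C38, I5)
total = sum(selected)    # SUMIF(C5:C38, I5, F5:F38)
average = total / count  # AVERAGEIF(C5:C38, I5, F5:F38)
print(count, total, round(average, 2))  # 14 224150 16010.71
```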

The median is a bit more tricky. We start with the formula =C5:C38=I5, which tests which
bodystyles in the range C5:C38 are equal to the bodystyle given in cell I5 (Sedan),
and this formula returns:

={FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE}.

Then we divide the array of prices by the array of these values:
=F5:F38/(C5:C38=I5). Dividing a number by the logical value TRUE returns the
same number, while dividing it by FALSE is equivalent to dividing by zero and produces a
#DIV/0! error. So the resulting array is:

={#DIV/0!;17950;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;16450;#DIV/0!;15500;14300;12100;#DIV/0!;18500;16450;18500;13400;#DIV/0!;18750;#DIV/0!;#DIV/0!;#DIV/0!;15350;#DIV/0!;12600;19450;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;14850}

In this array the numbers are the prices of sedans, and the rest of the car prices have
been transformed into errors.

Then we use the AGGREGATE function which can work as various other functions
depending on the first argument entered. In this case we enter 16, which corresponds
to the PERCENTILE.INC function. The next argument (6) tells the function to ignore errors.
Then goes our array, in which the errors will be ignored and the remaining numbers
correspond to the prices of sedans which we are analyzing. The last argument (0.5) tells
the function it should take the percentile at the middle of the array - which is the
median.
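Here is a Python sketch of the AGGREGATE(16, 6, ..., 0.5) step. None plays the role of #DIV/0!, errors are skipped, and PERCENTILE.INC interpolates linearly at the zero-based rank k*(n-1). The array is a shortened version of the one above. It keeps all 14 sedan prices, and None marks a few hypothetical non-sedan rows.

```python
import statistics


def percentile_inc(values, k):
    # PERCENTILE.INC: linear interpolation at the 0-based rank k*(n-1).
    xs = sorted(values)
    rank = k * (len(xs) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (rank - lower)


# Shortened F5:F38/(C5:C38=I5): sedan prices kept, None for (hypothetical) other rows.
array = [None, 17950, None, 16450, 15500, 14300, 12100, None, 18500, 16450,
         18500, 13400, 18750, None, 15350, 12600, 19450, 14850]
prices = [x for x in array if x is not None]  # option 6: ignore errors

median = percentile_inc(prices, 0.5)  # AGGREGATE(16, 6, ..., 0.5)
print(median)                     # 15975.0
print(statistics.median(prices))  # 15975.0
```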

The minimum and maximum are also calculated with the AGGREGATE function
(similarly to the median), but the parameters we use this time are 15
(the SMALL function) and 14 (the LARGE function). The last argument (1 in both cases)
means the functions need to find the first smallest and largest numbers in the array
(if you need the second, third etc. smallest or largest number under the
given criteria, just replace 1 with the corresponding number). The errors, again, are
ignored.
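Changing the last AGGREGATE argument picks the k-th value instead of the first. In Python the same idea can be sketched with `heapq.nsmallest` and `heapq.nlargest`. Ties count separately, as they do with SMALL and LARGE. The prices are the 14 sedan prices from the array above, and the helper names are my own.

```python
import heapq

sedan_prices = [17950, 16450, 15500, 14300, 12100, 18500, 16450,
                18500, 13400, 18750, 15350, 12600, 19450, 14850]


def kth_smallest(values, k):
    # AGGREGATE(15, 6, ..., k) applied to the error-free values
    if not 1 <= k <= len(values):
        raise ValueError("k out of range (#NUM! in Excel)")
    return heapq.nsmallest(k, values)[-1]


def kth_largest(values, k):
    # AGGREGATE(14, 6, ..., k) applied to the error-free values
    if not 1 <= k <= len(values):
        raise ValueError("k out of range (#NUM! in Excel)")
    return heapq.nlargest(k, values)[-1]


print(kth_smallest(sedan_prices, 1), kth_largest(sedan_prices, 1))  # 12100 19450
print(kth_largest(sedan_prices, 3), kth_largest(sedan_prices, 4))   # 18500 18500
```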

The population standard deviation is calculated by again testing which cars have the sedan
bodystyle (=C5:C38=I5). Another part of the formula (=(F5:F38-M7)^2/M5) is taken
from the definition of the standard deviation: we take every member's price, subtract the
average price (cell M7) from it, and square every difference. Then we add these squares up and
divide by the number of cars. As we need to do the calculation only for the "qualifying"
cars, we divide by the number of sedans found in the first calculation (in cell M5). We
also multiply every square by the first array, which tested which of the cars are sedans,
so as to zero out those which are not. We finally add them up (using the SUMPRODUCT
function) and take the square root of the total. This is the standard deviation.
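Here is the same SUMPRODUCT construction in Python, with a cross-check against `statistics.pstdev` computed on the sedan prices alone. The two non-sedan rows are hypothetical and are only there to show that the mask zeroes them out. The function name is my own.

```python
import math
import statistics

styles = ["Sedan"] * 14 + ["Wagon", "Hatchback"]
prices = [17950, 16450, 15500, 14300, 12100, 18500, 16450, 18500, 13400,
          18750, 15350, 12600, 19450, 14850, 16900, 13200]  # last two hypothetical


def conditional_pstdev(styles, prices, criterion):
    mask = [style == criterion for style in styles]           # C5:C38=I5
    count = sum(mask)                                          # M5
    mean = sum(p for p, m in zip(prices, mask) if m) / count   # M7
    # SUMPRODUCT((C5:C38=I5)*(F5:F38-M7)^2/M5): False zeroes out non-matching rows.
    variance = sum(m * (p - mean) ** 2 / count for p, m in zip(prices, mask))
    return math.sqrt(variance)


result = conditional_pstdev(styles, prices, "Sedan")
print(math.isclose(result, statistics.pstdev(prices[:14])))  # True
```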

Other statistical indicators (sample standard deviation, skewness, kurtosis) are
calculated below using the same techniques and the formulae definitions of the
corresponding functions.

Several conditions, single criteria

In the next example we start using several criteria, and this allows us to use either the
AND logic, or the OR logic. With the AND logic, the selection must contain all the
features we select. With the OR logic, the selection may contain any of the criteria
chosen. If we want to analyze the cars with the base level and sedan bodystyle, we can
take cars which contain both of those features or the cars which contain at least one of
those features.

Analyzing under the AND logic is relatively simple: we use the SUMIFS and COUNTIFS
functions and the AGGREGATE function in a similar way as in the previous example, but
we are doing two tests this time (one checking whether the level is Base, and
the other checking whether the bodystyle is Sedan), and then we multiply the
resulting arrays:

=(C5:C38=I19)*(B5:B38=I18)
=({FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE})*({TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE})
={0;0;0;0;0;0;0;0;0;0;1;0;1;1;0;0;0;1;0;0;0;0;0;1;1;0;0;0;0;0;0;0;0;0}

You can see these arrays if you enter the formula, then click in the formula and press
the F9 key. To see the second line above (the results of both tests separately, before
they are multiplied), select the contents of the first parenthesis and
press F9, then do the same for the second parenthesis.
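The AND test can be reproduced in Python from the two arrays above. The strings encode them, with T for TRUE and F for FALSE. This encoding is my own shorthand to keep the sketch short, not something the workbook uses.

```python
# The two TRUE/FALSE arrays shown above, encoded as T/F strings for brevity.
is_sedan = [c == "T" for c in "FTFFFFTFTTTFTTTTFTFFFTFTTFFFFFFFFT"]  # C5:C38=I19
is_base = [c == "T" for c in "TFFFTFFFFFTFTTFFFTTFFFFTTFFTFTTTTF"]   # B5:B38=I18

# (C5:C38=I19)*(B5:B38=I18): multiplying booleans gives 1 only where both are TRUE.
both = [s * b for s, b in zip(is_sedan, is_base)]
print(sum(both))                                       # 6
print([i + 1 for i, flag in enumerate(both) if flag])  # [11, 13, 14, 18, 24, 25]
```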



The OR logic is slightly more complicated - we need to check that at least one test result
in the pair of corresponding test results is TRUE, so we add both arrays and then test
which results are greater than zero (just to remind you, the TRUE and FALSE values
become 1 and 0 if we do some mathematical operation on them, even if we add or
multiply one of them by the other):

=(C5:C38=I19)+(B5:B38=I18)>0

={FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TR
UE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FAL
SE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE})+({TRUE;FALSE;FALSE;FA
LSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;
FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;TRUE;FALS
E;TRUE;TRUE;TRUE;TRUE;FALSE})>0

={1;1;0;0;1;0;1;0;1;1;2;0;2;2;1;1;0;2;1;0;0;1;0;2;2;0;0;1;0;1;1;1;1;1}>0

={TRUE;TRUE;FALSE;FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE}

In the last step we have obtained an array of TRUE and FALSE values which tells us
which vehicles have at least one of the features. We use this array in the formulae
directly, as in the AND logic examples, except for the first formula, which counts
them: because the SUMPRODUCT function cannot add up TRUE and FALSE values
directly, we have to convert them into ones and zeros by putting a double minus before
them (this is also a mathematical operation).
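Here is the OR variant of the sketch, using the same T/F shorthand (my own encoding) for the two arrays shown above. In Python, as in Excel arithmetic, adding booleans yields integers.

```python
is_sedan = [c == "T" for c in "FTFFFFTFTTTFTTTTFTFFFTFTTFFFFFFFFT"]  # C5:C38=I19
is_base = [c == "T" for c in "TFFFTFFFFFTFTTFFFTTFFFFTTFFTFTTTTF"]   # B5:B38=I18

# (C5:C38=I19)+(B5:B38=I18): 0, 1 or 2 matching features per vehicle.
matches = [s + b for s, b in zip(is_sedan, is_base)]
# ...>0: TRUE where at least one feature matches.
either = [m > 0 for m in matches]
# SUMPRODUCT(--(...)): the double minus turns TRUE/FALSE into 1/0 before adding.
print(sum(int(flag) for flag in either))  # 22
```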

Multiple criteria

The last three examples (on the right) allow for multiple criteria for each feature (i.e.
the level can be comfort or luxury, and the bodystyle can be sedan or wagon, or any
other combination).

These formulae rely on a somewhat non-standard use of the MATCH function: instead of a single lookup value, we pass an array of elements (B5:B38) as the lookup value argument. Each element is then matched against the short list of criteria (P5:P7), which serves as the lookup array. Looking at cell T5:

=MATCH(B5:B38,P5:P7,0)
={1;2;2;#N/A;1;2;2;2;2;2;1;#N/A;1;1;2;2;#N/A;1;1;2;#N/A;2;#N/A;1;1;2;#N/A;1;2;1;1;1;1;2}

A number in this array means that the corresponding cell in the range B5:B38 matches one of the values in the range P5:P7 (the number is the position of that value). A #N/A error means that no match was found. The ISNUMBER function converts these results into TRUE and FALSE logical values:

=ISNUMBER(MATCH(B5:B38,P5:P7,0))

={TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE}

This array is then processed in the same way as described above.

In the very last example on this sheet ('Several conditions, multiple criteria - 2') I have introduced a criterion that must NOT be met: we are selecting the vehicles whose bodystyle is NOT sedan, i.e. hatchbacks and wagons. In this case we use the ISERROR function instead of ISNUMBER. The MATCH function returns #N/A errors for the values it could not match, and ISERROR converts these errors into TRUE, while the numbers (where MATCH did find a match) become FALSE. The rest of the formulae are unchanged.
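The MATCH-based membership test can be sketched the same way. In the Python illustration below, `match_exact` is a hypothetical helper that only mimics MATCH with 0 as the last argument. It returns a 1-based position, or None in place of #N/A, and compares text without regard to case. The bodystyles and criteria are made-up sample values.

```python
def match_exact(lookup_value, lookup_array):
    """Mimic MATCH(lookup_value, lookup_array, 0): 1-based position, or None for #N/A."""
    for position, item in enumerate(lookup_array, start=1):
        if str(item).lower() == str(lookup_value).lower():  # exact, case-insensitive
            return position
    return None

bodystyles = ["Sedan", "Wagon", "Hatchback", "Sedan", "Wagon"]  # hypothetical column values

# Several criteria: like MATCH(B5:B38, P5:P7, 0) applied element by element
positions = [match_exact(b, ["Sedan", "Wagon"]) for b in bodystyles]
is_selected = [p is not None for p in positions]         # ISNUMBER(MATCH(...))

# Exclusive criterion ("NOT sedan"): keep the rows where MATCH fails
not_sedan = [match_exact(b, ["Sedan"]) is None for b in bodystyles]  # ISERROR(MATCH(...))

print(positions)    # [1, 2, None, 1, 2]
print(is_selected)  # [True, True, False, True, True]
print(not_sedan)    # [False, True, True, False, True]
```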

Once you understand this toolset, you will be able to do most other kinds of analysis you need in this format.

Statistics for Data with Criteria and Ranks



Refers to sheet: “Rankings”

The formulae on this sheet show how to select data for analysis based on its rankings.

Criteria, then ranks



In the first example we will analyze the 10 cheapest sedans and wagons. We first need to separate out the sedans and wagons (we have already done this on the "List criteria" sheet) and then take the 10 cheapest of them. We start by identifying the prices of the cars that are sedans or wagons and turning the rest into errors (this is the part of the formula in cell I11 that applies this criterion):

=F3:F36/ISNUMBER(MATCH(C3:C36,I3:I5,0))

={#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;12100;12600;18500;16450;18750;19450;18500;15700;17950;17550;#DIV/0!;#DIV/0!;#DIV/0!;14300;13400;15350;14850;17950;16450;15500;18500;15300;15850;14400;#DIV/0!;#DIV/0!;16100;15250;18850;17750}

We are already familiar with the AGGREGATE function, which we can use to find the k-th smallest element of an array. Using 15 as the first argument makes it work as the SMALL function, and 6 as the second argument makes it ignore errors. As we need 10 numbers rather than a single k-th smallest number, the last argument must be an array of the numbers from 1 to 10, which we build as:

=ROW(INDIRECT("1:"&I6))

The value of cell I6 is 10, so we get an array:

={1;2;3;4;5;6;7;8;9;10}

Combining both formulae, we get an array of the 10 smallest prices:

=AGGREGATE(15,6,F3:F36/ISNUMBER(MATCH(C3:C36,I3:I5,0)),ROW(INDIRECT("1:"&I6)))

={12100;12600;13400;14300;14400;14850;15250;15300;15350;15500}

We add them up using the SUMPRODUCT function and here we go:


=SUMPRODUCT(AGGREGATE(15,6,F3:F36/ISNUMBER(MATCH(C3:C36,I3:I5,0)),ROW(INDIRECT("1:"&I6))))

returning 143,050
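To check the arithmetic outside Excel, here is a rough Python equivalent of AGGREGATE(15,6,...,{1;...;10}) applied to the filtered price array shown above. This is only a sketch: `aggregate_small` is a hypothetical helper, and the #DIV/0! errors are represented by None.

```python
def aggregate_small(values, ks):
    """Rough equivalent of AGGREGATE(15, 6, values, ks): k-th smallest value, errors skipped.

    Errors such as #DIV/0! are represented by None. Unlike Excel (which would return #NUM!),
    this sketch raises IndexError if k exceeds the number of valid values.
    """
    clean = sorted(v for v in values if v is not None)
    return [clean[k - 1] for k in ks]

ERR = None  # stands in for #DIV/0!
# The array of filtered prices shown above (sedans and wagons only)
filtered_prices = [ERR, ERR, ERR, ERR, 12100, 12600, 18500, 16450, 18750, 19450, 18500,
                   15700, 17950, 17550, ERR, ERR, ERR, 14300, 13400, 15350, 14850, 17950,
                   16450, 15500, 18500, 15300, 15850, 14400, ERR, ERR, 16100, 15250, 18850, 17750]

k = 10                                                         # the value of cell I6
smallest = aggregate_small(filtered_prices, range(1, k + 1))   # like ROW(INDIRECT("1:"&I6))
print(smallest)       # [12100, 12600, 13400, 14300, 14400, 14850, 15250, 15300, 15350, 15500]
print(sum(smallest))  # 143050
```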

We calculate the other statistics in a similar way. The arrays returned by the AGGREGATE function can be processed further by other Excel functions if we wrap them in the INDEX function with zero as the second argument. In this section the sample of vehicles is large enough, so I have also added skewness and kurtosis, calculated with their native Excel functions.

Ranks, then criteria



In the second exercise we will first take the 7 most expensive Comfort series vehicles and then, from that selection, keep only the sedans for further analysis. We start by calculating the number of cars that qualify (in cell L10). The first step is to determine the 7 most expensive Comforts in the same way as in the first exercise:

=AGGREGATE(14,6,F3:F36/(B3:B36=I21),INDEX(ROW(INDIRECT("1:"&I22)),0))

={19000;18500;17950;16450;15850;15500;15350}

Notice that we are using the INDEX function again so that other functions can process the resulting array, but this time it wraps the ROW(INDIRECT()) construction.

Now we have the prices of the cars we need to be looking at. We do our next test based
on the following criteria:

- the price of the car belongs to the array we have just identified
- the car is a Comfort
- the car is a sedan

We test the first criterion with the MATCH function, as we have done earlier:

=ISNUMBER(MATCH(F3:F36,AGGREGATE(14,6,F3:F36/(B3:B36=I21),INDEX(ROW(INDIRECT("1:"&I22)),0)),0))

={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE}

As you can see, TRUE appears 11 times here. This means that there are 11 cars whose prices are found in our array of the 7 most expensive Comforts. This happens because some prices are repeated, and cars of other series can have the same prices. This is why we have to do the second test (checking again whether the car is a Comfort) to eliminate such occurrences. Here is the formula that combines all three tests:

=ISNUMBER(MATCH(F3:F36,AGGREGATE(14,6,F3:F36/(B3:B36=I21),INDEX(ROW(INDIRECT("1:"&I22)),0)),0))*(B3:B36=I21)*(C3:C36=I24)

which results in:

={0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;1;1;1;1;0;0;0;0;0;0;0;0;0}

In this list every "1" corresponds to a car that satisfies all three criteria. We add them up with the SUMPRODUCT function and get 5.

All other statistical calculations rely on this criteria selection algorithm and are done in the same way as in the first example.
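The "ranks, then criteria" logic, including the duplicate-price effect described above, can be reproduced in a short Python sketch. The sample below is hypothetical (it is not the booklet's data), and `aggregate_large` is an illustrative helper that mimics AGGREGATE(14,6,...).

```python
def aggregate_large(values, ks):
    """Rough equivalent of AGGREGATE(14, 6, values, ks): k-th largest value, None skipped."""
    clean = sorted((v for v in values if v is not None), reverse=True)
    return [clean[k - 1] for k in ks]

# Hypothetical sample
series = ["Comfort", "Comfort", "Luxury", "Comfort", "Base", "Comfort"]
bodystyle = ["Sedan", "Wagon", "Sedan", "Sedan", "Sedan", "Hatchback"]
prices = [19000, 18500, 18500, 17950, 17950, 16000]

# Step 1: keep only Comfort prices (others become "errors"), like F3:F36/(B3:B36=I21)
comfort_prices = [p if s == "Comfort" else None for p, s in zip(prices, series)]
top_prices = aggregate_large(comfort_prices, range(1, 4))   # the 3 most expensive Comforts

# Step 2: the price test alone, like ISNUMBER(MATCH(F3:F36, top_prices, 0))
price_hit = [p in top_prices for p in prices]

# Step 3: repeat the series test (drops other series with equal prices) and test bodystyle
selected = [int(hit and s == "Comfort" and b == "Sedan")
            for hit, s, b in zip(price_hit, series, bodystyle)]

print(top_prices)      # [19000, 18500, 17950]
print(sum(price_hit))  # 5, inflated by the Luxury and Base cars with equal prices
print(sum(selected))   # 2
```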

Sorting Data under Multiple Criteria



Refers to sheet: "Sorting"

In some situations it is convenient to have several helper columns next to the data showing whether each record satisfies a given criterion. These columns can then be combined, and the list filtered or sorted on the result. In this example we are looking for the cars which are either Base series or have a manual transmission, but are in no case sedans.

Formulae in column C check the first condition (Base series). We do not have a separate column with the series name here: the series is only part of the full vehicle description in column B. So we will use the SEARCH function to determine whether the word "Base" can be found in the description. When this function finds the text it is looking for, it returns the position of the character at which this text starts; if the text is not found, it returns an error (#VALUE!). Therefore it is enough to check whether the function returns a number, which we do with the ISNUMBER function:

=ISNUMBER(SEARCH($C$3,B6))

This formula is in cell C6 and is copied down to the last row.

We use the same approach in column D to find out whether the description contains the
"Manual" transmission. As for column E, since our goal is to find the descriptions not
containing the "Sedan" bodystyle, we are interested here in the instances when the
SEARCH function returns errors. So our formula will be:

=ISERROR(SEARCH($E$3,B6))

Column F combines all three criteria. We need either the first or the second condition to be satisfied, and the third condition to be satisfied as well, so we enter:

=AND(OR(C6,D6),E6)

in cell F6 and copy this formula down again.

Now if we click the filter arrow in cell F5 and filter the column by TRUE, we get a list of 20 vehicles that satisfy our criteria.
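The helper-column approach translates directly into code. The Python sketch below is illustrative only. The descriptions are hypothetical, and `contains` is a made-up helper standing in for ISNUMBER(SEARCH(...)). It lower-cases both strings because SEARCH is not case-sensitive, and it ignores SEARCH's wildcard support.

```python
descriptions = [  # hypothetical descriptions in the style of column B
    "Base sedan 1.6 manual",
    "Base wagon 1.6 automatic",
    "Comfort hatchback 1.6 manual",
    "Comfort sedan 2.0 automatic",
    "Luxury wagon 2.0 automatic",
]

def contains(text, keyword):
    """Stand-in for ISNUMBER(SEARCH(keyword, text)); case-insensitive like SEARCH."""
    return keyword.lower() in text.lower()

rows = []
for d in descriptions:
    col_c = contains(d, "Base")            # column C: Base series?
    col_d = contains(d, "Manual")          # column D: manual transmission?
    col_e = not contains(d, "Sedan")       # column E: like ISERROR(SEARCH("Sedan", ...))
    col_f = (col_c or col_d) and col_e     # column F: =AND(OR(C6,D6),E6)
    rows.append((d, col_c, col_d, col_e, col_f))

# Filtering column F by TRUE
selected = [row[0] for row in rows if row[4]]
print(selected)  # ['Base wagon 1.6 automatic', 'Comfort hatchback 1.6 manual']
```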

Conditional Statistics for Non-Delimited Lists



Refers to sheet: “In-cell”

On this sheet I explain how to do the analysis without splitting the description text: the formulae check for the features directly in the cells containing the descriptions.

I am using the same list of cars as on the "List criteria" sheet and doing the same analysis. The list of cars, with the full description in one cell, is in the range B5:B38.

One condition, single criteria



In the first example we will analyze only the cars with a sedan bodystyle.

We will use the SEARCH function and will apply it to an array of text cells, which will
return an array of results (cell J5):

=SEARCH(F5,B5:B38)

={#VALUE!;9;#VALUE!;#VALUE!;#VALUE!;#VALUE!;9;#VALUE!;9;9;6;#VALUE!;6;6;9;9;#VALUE!;6;#VALUE!;#VALUE!;#VALUE!;9;#VALUE!;6;6;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;9}

Again, we use the ISNUMBER function to separate numbers from errors and to see more clearly which cells contain the text we are looking for:

=ISNUMBER(SEARCH(F5,B5:B38))

={FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE}

With this array we can perform a statistical analysis of the prices for the chosen sample. For many statistics, though, we will not be able to use the native Excel functions. Instead we will need either the AGGREGATE function (for the median, minimum and maximum) or to "model" the statistic from its definition (for the standard deviation, skewness and kurtosis).
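Building a statistic from its definition with a TRUE/FALSE mask can be sketched in Python as follows. The prices and the mask are a small hypothetical sample. The standard deviation is the sample version, which is the definition Excel's STDEV.S uses. Python's `statistics` module is used only to cross-check the result.

```python
import math
import statistics

prices = [13900, 12050, 10400, 10800, 12100, 12600]    # hypothetical sample
is_sedan = [False, True, False, True, True, False]    # like ISNUMBER(SEARCH("Sedan", ...))

w = [int(m) for m in is_sedan]                        # TRUE/FALSE -> 1/0, like --
n = sum(w)                                            # number of selected cars
mean = sum(p * wi for p, wi in zip(prices, w)) / n    # SUMPRODUCT(prices*mask)/n

# Sample standard deviation from its definition, counting only the masked values
variance = sum(wi * (p - mean) ** 2 for p, wi in zip(prices, w)) / (n - 1)
stdev = math.sqrt(variance)

# Cross-check against the library function applied to the selected values only
selected = [p for p, m in zip(prices, is_sedan) if m]
print(n, mean)                                          # 3 11650.0
print(math.isclose(stdev, statistics.stdev(selected)))  # True
print(statistics.median(selected))                      # 12050
```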

Several conditions, single criteria



I am using this approach in the second and third examples below, combining these expressions according to the required logic in the same way as on the 'List criteria' sheet.

Multiple criteria

The three examples to the right show how to use a range of criteria for the analysis. If
you look at cell Q5, the formula relies on the SEARCH function again but this time we
will create a matrix of results consisting of 3 columns (the number of criteria in the
range M5:M7) and 34 rows (corresponding to the number of models in the range
B5:B38). As both ranges are vertical arrays, we will need to transpose one into a
horizontal one to construct such a matrix. As you can guess, we will use the
TRANSPOSE function for that.

If we take a reference =M5:M7 and F9 it ("to F9" became a verb on its own in modern
English) we will see:

={"Base";"Comfort";0}

We are going to use this array with the SEARCH function shortly, so we need to deal with the empty cell M7. It currently contains no criterion, but we want to keep it for flexibility in the future. If we passed this blank cell to the SEARCH function, it would be "found" in every cell we test, so we use a small trick:

=IF(ISBLANK(M5:M7),"|||",M5:M7)

which returns:

={"Base";"Comfort";"|||"}

We have just replaced the empty cell with three pipe symbols ("|||"). I chose this combination because it is very unlikely to appear in any description; you can use any other combination of symbols you wish.

Notice that the values are separated by semicolons, which separate rows in an array; so our array has three rows. If we now apply the TRANSPOSE function, we get:

=TRANSPOSE(IF(ISBLANK(M5:M7),"|||",M5:M7))

={"Base","Comfort","|||"}

Instead of semicolons we now have values separated by commas, which separate
columns in matrices; so this is now an array of one row and three columns. We will use
this array with the array of car descriptions and the SEARCH function:

=SEARCH(TRANSPOSE(IF(ISBLANK(M5:M7),"|||",M5:M7)),B5:B38)

This results in:

={1,#VALUE!,#VALUE!;#VALUE!,1,#VALUE!;#VALUE!,1,#VALUE!;#VALUE!,#VALUE!,#V
ALUE!;1,#VALUE!,#VALUE!;#VALUE!,1,#VALUE!;#VALUE!,1,#VALUE!;#VALUE!,1,#VAL
UE!;#VALUE!,1,#VALUE!;#VALUE!,1,#VALUE!;1,#VALUE!,#VALUE!;#VALUE!,#VALUE!,
#VALUE!;1,#VALUE!,#VALUE!;1,#VALUE!,#VALUE!;#VALUE!,1,#VALUE!;#VALUE!,1,#V
ALUE!;#VALUE!,#VALUE!,#VALUE!;1,#VALUE!,#VALUE!;1,#VALUE!,#VALUE!;#VALUE!
,1,#VALUE!;#VALUE!,#VALUE!,#VALUE!;#VALUE!,1,#VALUE!;#VALUE!,#VALUE!,#VAL

Page 26
Analyzing Data in Excel

UE!;1,#VALUE!,#VALUE!;1,#VALUE!,#VALUE!;#VALUE!,1,#VALUE!;#VALUE!,#VALUE!,
#VALUE!;1,#VALUE!,#VALUE!;#VALUE!,1,#VALUE!;1,#VALUE!,#VALUE!;1,#VALUE!,#V
ALUE!;1,#VALUE!,#VALUE!;1,#VALUE!,#VALUE!;#VALUE!,1,#VALUE!}

It is a long array, but you can see that every three members are separated by semicolons, and there are 34 such groups; this means we have an array of 3 columns and 34 rows. Applying our standard procedure to separate numbers from errors:

=ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(M5:M7),"|||",M5:M7)),B5:B38))

={TRUE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,TRUE,FALSE;FALSE,FALSE,FALSE;TR
UE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,TRUE,FALSE;FALSE,TRUE,FALSE;FALSE,T
RUE,FALSE;FALSE,TRUE,FALSE;TRUE,FALSE,FALSE;FALSE,FALSE,FALSE;TRUE,FALSE,
FALSE;TRUE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,TRUE,FALSE;FALSE,FALSE,FALS
E;TRUE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,FALSE,FALSE;FA
LSE,TRUE,FALSE;FALSE,FALSE,FALSE;TRUE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,T
RUE,FALSE;FALSE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,TRUE,FALSE;TRUE,FALSE,
FALSE;TRUE,FALSE,FALSE;TRUE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,TRUE,FALSE
}

and putting a double negative in front:

=--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(M5:M7),"|||",M5:M7)),B5:B38))

={1,0,0;0,1,0;0,1,0;0,0,0;1,0,0;0,1,0;0,1,0;0,1,0;0,1,0;0,1,0;1,0,0;0,0,0;1,0,0;1,0,0;0,1,0;0,1,0;0,0,0;1,0,0;1,0,0;0,1,0;0,0,0;0,1,0;0,0,0;1,0,0;1,0,0;0,1,0;0,0,0;1,0,0;0,1,0;1,0,0;1,0,0;1,0,0;1,0,0;0,1,0}

It is still the same array of 3 columns and 34 rows, but this time we have reduced it to ones and zeros. Every 1 represents an instance where one of the criteria (in the range M5:M7) was found in the corresponding description in B5:B38. The criteria are mutually exclusive for each description (a car cannot be both Base and Comfort level), so adding up these numbers gives us the total number of cars satisfying the criteria:

=SUM(--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(M5:M7),"|||",M5:M7)),B5:B38)))

This is the formula in cell Q5, which returns 28. It must be array-entered (Ctrl+Shift+Enter). This criterion is used in all other formulae in this example.
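The matrix built by SEARCH and TRANSPOSE can be mimicked with a list of lists, with one row per description and one column per criterion. The Python sketch below uses hypothetical descriptions. The "|||" placeholder plays the same role as in the formula, since in Python too an empty search string is found in every text.

```python
descriptions = [  # hypothetical stand-ins for B5:B38
    "Base sedan 1.6 manual",
    "Comfort wagon 1.6 automatic",
    "Luxury sedan 2.0 automatic",
    "Base hatchback 1.6 manual",
]
criteria = ["Base", "Comfort", ""]   # like M5:M7, with M7 left blank

# Without the trick, the blank criterion would be "found" in every description
naive_count = sum(int(c.lower() in d.lower()) for d in descriptions for c in criteria)

# IF(ISBLANK(...),"|||",...): replace blanks with a string that never occurs
safe = [c if c != "" else "|||" for c in criteria]

# Rows = descriptions, columns = criteria; 1 means the criterion was found
matrix = [[int(c.lower() in d.lower()) for c in safe] for d in descriptions]

# SUM(--ISNUMBER(...)) is a valid count only because the criteria are mutually exclusive
count = sum(sum(row) for row in matrix)

print(matrix)       # [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
print(naive_count)  # 7
print(count)        # 3
```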

Moving on to the next examples (with several conditions), we will need to deal with several matrices in one formula. The matrices might have different dimensions and cannot be combined directly, so we will transform them back from two-dimensional into one-dimensional arrays using the MMULT function. In the example (developed further in cell P18) we start with =--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(M18:M20),"|||",M18:M20)),B5:B38)), which results in a 34x3 matrix (34 rows, 3 columns):

={1,0,0;0,0,0;0,0,0;0,0,0;1,0,0;0,0,0;0,0,0;0,0,0;0,0,0;0,0,0;1,0,0;0,0,0;1,0,0;1,0,0;0,0,0;0,0,0;0,0,0;1,0,0;1,0,0;0,0,0;0,0,0;0,0,0;0,0,0;1,0,0;1,0,0;0,0,0;0,0,0;1,0,0;0,0,0;1,0,0;1,0,0;1,0,0;1,0,0;0,0,0}

We have done this in the previous examples. This time we go one step further:

=MMULT(--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(M18:M20),"|||",M18:M20)),B5:B38)),{1;1;1})

Using the MMULT function to multiply a 34x3 array (34 rows, 3 columns) by a 3x1 array results in a 34x1 array (a single column):

={1;0;0;0;1;0;0;0;0;0;1;0;1;1;0;0;0;1;1;0;0;0;0;1;1;0;0;1;0;1;1;1;1;0}

As our second multiplier (the small matrix) consists entirely of 1s, each value in the resulting array is the sum of the corresponding row (three members) of the larger array. Since the criteria here are mutually exclusive, this sum is 1 if one member of the row was 1, and 0 otherwise (you can read more about the MMULT function in the Excel Help).

Finally, to quickly generate an array of 1s of a given size, I first use the ROW function, which returns the row numbers of a reference:

=ROW(M18:M20)

={18;19;20}

Raising these numbers to the power 0 converts them (as it would any non-zero number) into 1s:

=ROW(M18:M20)^0

={1;1;1}

The final formula in cell P18 multiplies both arrays (to comply with the AND logic) and
adds up the products:

=SUM(MMULT(--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(M18:M20),"|||",M18:M20)),B5:B38)),ROW(M18:M20)^0)*MMULT(--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(M21:M23),"|||",M21:M23)),B5:B38)),ROW(M21:M23)^0))

This approach is used in all other formulae throughout this example.
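The MMULT step can be illustrated with a plain matrix product. In the Python sketch below, `mmult` and `match_matrix` are hypothetical helpers written for this illustration, and the descriptions and criteria are made up. Multiplying by a column of 1s collapses each row into its sum. The two resulting columns are then multiplied row by row (AND logic).

```python
def mmult(a, b):
    """Plain matrix product, like MMULT: an (m x n) matrix times an (n x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def match_matrix(descriptions, criteria):
    """Rows = descriptions, columns = criteria; 1 where a (non-blank) criterion is found."""
    safe = [c if c else "|||" for c in criteria]
    return [[int(c.lower() in d.lower()) for c in safe] for d in descriptions]

descriptions = [  # hypothetical
    "Base sedan 1.6 manual",
    "Comfort wagon 1.6 automatic",
    "Luxury sedan 2.0 automatic",
    "Base hatchback 1.6 manual",
]
levels = ["Base", "Comfort", ""]   # like M18:M20
styles = ["Sedan", "Wagon", ""]    # like M21:M23

ones = [[1] for _ in range(3)]     # ROW(M18:M20)^0 -> {1;1;1}, a 3x1 column
level_hit = mmult(match_matrix(descriptions, levels), ones)   # 4x3 times 3x1 -> 4x1
style_hit = mmult(match_matrix(descriptions, styles), ones)

# Multiply the two columns row by row (AND logic) and add up, like SUM(MMULT(...)*MMULT(...))
count = sum(lv[0] * sv[0] for lv, sv in zip(level_hit, style_hit))

print(level_hit)  # [[1], [1], [0], [1]]
print(style_hit)  # [[1], [1], [1], [0]]
print(count)      # 2
```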

In the next example ('Several conditions, multiple criteria - 2') we use the same approach. However, the second condition is exclusive (NOT sedan), so we use the ISERROR function instead of ISNUMBER. Note that this time we do nothing with the unused empty cells (cells M35 and M36 in our example), so that they are counted first and then excluded by the ISERROR function.

“Flexible” selection

The last example on this sheet ("Flexible selections") allows you to choose any features, or keywords, and performs the analysis based on either the AND or the OR logic. The OR logic formulae follow the same principle as the other formulae; the AND logic formulae make use of a particular property of matrix multiplication. Considering our example (in cell W5), the =--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(T5:T7),"|||",T5:T7)),B5:B38)) part of it returns the following matrix. I have added a line break after each semicolon for legibility, as it is important to understand how it all works:

={1,0,0;
0,1,0;
0,0,0;
0,0,0;
1,0,0;
0,0,0;
0,1,0;
0,0,0;
0,1,0;
0,1,0;
1,1,0;
0,0,0;
1,1,0;
1,1,0;
0,1,0;
0,1,0;
0,0,0;
1,1,0;
1,0,0;
0,0,0;
0,0,0;
0,1,0;
0,0,0;
1,1,0;
1,1,0;
0,0,0;
0,0,0;
1,0,0;
0,0,0;
1,0,0;
1,0,0;
1,0,0;
1,0,0;
0,1,0}

Every line has three elements corresponding to the three cells in the range T5:T7. A 1 means that the content of that cell is found in the corresponding cell of the range B5:B38. If we multiply this array by ={1;1;1} with the MMULT function (the ROW(T5:T7)^0 part of the formula), we get a one-dimensional array:

=MMULT(--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(T5:T7),"|||",T5:T7)),B5:B38)),ROW(T5:T7)^0)

={1;1;0;0;1;0;1;0;1;1;2;0;2;2;1;1;0;2;1;0;0;1;0;2;2;0;0;1;0;1;1;1;1;1}

This is a vertical array, but this time I have not added line breaks. You may notice that this operation essentially adds up the values within each line of the original array and makes that sum a value of the new array. For instance, the first two values being 1 means that only one of the conditions is satisfied in the first two cells, and so on. We are interested in the instances where all conditions are satisfied. As we have two conditions, these are the values equal to 2. So we modify our formula further (COUNTA returns the number of non-empty cells in T5:T7, here 2):

=(MMULT(--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(T5:T7),"|||",T5:T7)),B5:B38)),ROW(T5:T7)^0)=COUNTA(T5:T7))

={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE}

Then we translate logical values into 1s and 0s by adding a double negative sign in front
of the array and summing those values up with the SUM function (the formula must be
array-entered - Ctrl+Shift+Enter):

=SUM(--(MMULT(--ISNUMBER(SEARCH(TRANSPOSE(IF(ISBLANK(T5:T7),"|||",T5:T7)),B5:B38)),ROW(T5:T7)^0)=COUNTA(T5:T7)))

and the total count is 6. The other formulae on this sheet are based on the same criteria selection.
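The AND/OR switch of the "Flexible selection" example comes down to comparing each row sum with the number of keywords entered. A minimal Python sketch, with hypothetical descriptions and keywords:

```python
descriptions = [  # hypothetical
    "Base sedan 1.6 manual",
    "Base wagon 1.6 automatic",
    "Comfort sedan 1.6 manual",
    "Luxury sedan 2.0 automatic",
]
keywords = ["Sedan", "Manual", ""]    # like T5:T7, with the last cell blank

safe = [k if k else "|||" for k in keywords]
matrix = [[int(k.lower() in d.lower()) for k in safe] for d in descriptions]

row_sums = [sum(row) for row in matrix]       # MMULT(matrix, ROW(T5:T7)^0)
n_keywords = sum(1 for k in keywords if k)    # COUNTA(T5:T7): blank cells are not counted

and_count = sum(int(s == n_keywords) for s in row_sums)   # all keywords found
or_count = sum(int(s > 0) for s in row_sums)              # at least one keyword found

print(row_sums)             # [2, 0, 2, 1]
print(and_count, or_count)  # 2 3
```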

Once again, all formulae in these last four examples must be array-entered (Ctrl+Shift+Enter). Also make sure that, except in the last example, no criteria range contains criteria that can both match the same description. For example, the range M21:M23 must contain only bodystyles; do not mix them with other features, as this will likely lead to errors.

Analyzing Data Containing Dates



Refers to sheet: “Dates”

On this sheet I explain the basics of analyzing data by dates. I have taken the list of cars from the previous sheet and added a date of sale for every car in column A.

Grouping by years

In the first example we calculate, by year, the total revenue (as the sum of the prices) and the number of cars sold. The formula starts with the YEAR function, which extracts the year from a date. We apply this function to the array of dates and get the following array of numbers representing years:

=YEAR(A5:A38)

={2017;2017;2018;2016;2017;2018;2018;2016;2018;2017;2018;2017;2017;2018;2016;2016;2018;2018;2017;2017;2016;2017;2017;2018;2018;2018;2016;2017;2018;2017;2018;2016;2016;2016}

Once again, to see the second line you need to enter the formula, then click on the
formula bar and press F9.

Then we check which of those numbers are equal to a given year:

=YEAR(A5:A38)=I4

={FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE}

In the next steps we multiply this array by an array with prices (which zeroes out
prices for those cars sold in other years) and add up the remaining prices using the
SUMPRODUCT function:

=SUMPRODUCT(F5:F38*(YEAR(A5:A38)=I4))

=SUMPRODUCT({13900;12050;10400;10800;12100;12600;18500;16650;18750;19450;18900;15700;17900;17550;19000;14300;12750;14300;13400;15350;14850;17950;16200;15500;18300;15300;15850;14400;14600;17450;16100;15250;18850;17750}*{FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE})

=SUMPRODUCT({0;0;0;10800;0;0;0;16650;0;0;0;0;0;0;19000;14300;0;0;0;0;14850;0;0;0;0;0;15850;0;0;0;0;15250;18850;17750})

The total is 143,300.

To count the vehicles sold in a particular year, we just skip adding the prices; instead,
we put a double minus before the test to convert the results into 1s and zeroes and add
them up again:

=SUMPRODUCT(--(YEAR(A5:A38)=I4))

=SUMPRODUCT(--({FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE}))

=SUMPRODUCT({0;0;0;1;0;0;0;1;0;0;0;0;0;0;1;1;0;0;0;0;1;0;0;0;0;0;1;0;0;0;0;1;1;1})

which returns 9.
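The year test and the two SUMPRODUCT formulae can be verified with the arrays shown above. In this Python sketch the target year of 2016 is an inference: it is the year that produces the TRUE/FALSE array shown, not a value stated in the text.

```python
# Prices (F5:F38) and years (YEAR(A5:A38)) as shown in the text
prices = [13900, 12050, 10400, 10800, 12100, 12600, 18500, 16650, 18750, 19450, 18900,
          15700, 17900, 17550, 19000, 14300, 12750, 14300, 13400, 15350, 14850, 17950,
          16200, 15500, 18300, 15300, 15850, 14400, 14600, 17450, 16100, 15250, 18850, 17750]
years = [2017, 2017, 2018, 2016, 2017, 2018, 2018, 2016, 2018, 2017, 2018, 2017, 2017, 2018,
         2016, 2016, 2018, 2018, 2017, 2017, 2016, 2017, 2017, 2018, 2018, 2018, 2016, 2017,
         2018, 2017, 2018, 2016, 2016, 2016]
target_year = 2016   # inferred value of cell I4

is_year = [y == target_year for y in years]             # YEAR(A5:A38)=I4
revenue = sum(p * t for p, t in zip(prices, is_year))   # SUMPRODUCT(F5:F38*(...))
units = sum(--t for t in is_year)                       # SUMPRODUCT(--(...))

print(revenue)  # 143300
print(units)    # 9
```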

Grouping by years and additional criteria

In the second exercise we also calculate sales by year, but in addition we break down the cars by equipment level (Base, Comfort, Luxury). The formula is not much different: we just add one more criterion testing the level in each line, so the prices of the cars of other levels are zeroed out:

=SUMPRODUCT(F5:F38*(YEAR(A5:A38)=I4)*(B5:B38=H11))

In the formula which counts the cars we no longer need the double minus. Its purpose was to perform a mathematical operation on the TRUEs and FALSEs, converting them into 1s and 0s respectively. Now we are multiplying two arrays of logical values, which does the same.

Grouping by years and quarters



In the third example we show the sales broken down by year and quarter. Because quarters can be written with Arabic (1, 2, 3, 4) or Roman (I, II, III, IV) numerals, we will look at both options.

To determine a quarter I normally use the LOOKUP(month,{1,4,7,10;1,2,3,4}) construction, although there are other options, such as ROUNDUP(month/3,0). To get the month we use the MONTH function, so the starting point of our formula is:

=LOOKUP(MONTH(A5:A38),{1,4,7,10;1,2,3,4})

which returns an array of numbers corresponding to the quarters of our dates:

={1;4;4;3;3;4;3;4;3;2;4;1;1;3;1;4;2;3;1;4;3;2;2;4;1;4;4;2;4;1;3;2;4;4}
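LOOKUP with a sorted first row returns the result paired with the largest breakpoint that does not exceed the lookup value. Below is a rough Python equivalent using the standard `bisect` module and the month numbers from this example (`quarter_lookup` is a hypothetical helper name).

```python
import bisect

def quarter_lookup(month):
    """Mimic LOOKUP(month, {1,4,7,10;1,2,3,4}): result for the largest breakpoint <= month."""
    breakpoints = [1, 4, 7, 10]
    results = [1, 2, 3, 4]
    return results[bisect.bisect_right(breakpoints, month) - 1]

# MONTH(A5:A38) from the example
months = [2, 11, 12, 7, 8, 11, 8, 11, 9, 4, 10, 2, 2, 7, 3, 11, 5, 8, 2, 10, 8, 5, 4, 12,
          1, 12, 11, 4, 12, 3, 8, 4, 12, 11]
quarters = [quarter_lookup(m) for m in months]
print(quarters)

# Same result as rounding month/3 up to a whole number
assert all(quarter_lookup(m) == (m + 2) // 3 for m in range(1, 13))
```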

The next steps are no different from what we have done before: we test whether the quarter number is equal to the given quarter, combine this with the year test, multiply by the prices and add up the results:

=SUMPRODUCT(F5:F38*(YEAR(A5:A38)=I22)*(LOOKUP(MONTH(A5:A38),{1,4,7,10;1,2,3,4})=H23))

=SUMPRODUCT({13900;12050;10400;10800;12100;12600;18500;16650;18750;19450;18900;15700;17900;17550;19000;14300;12750;14300;13400;15350;14850;17950;16200;15500;18300;15300;15850;14400;14600;17450;16100;15250;18850;17750}*({FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE})*({TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE}))

=SUMPRODUCT({0;0;0;0;0;0;0;0;0;0;0;0;0;0;19000;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0
})

From the last step we can see that only one car was sold in the first quarter of 2016, for 19,000, so the result of the formula is 19,000.

For the third and fourth quarters I have used Roman numerals. To check the quarters we have obtained against those Roman numerals, we use the ROMAN function, which converts Arabic numerals into Roman ones (cell I25):

=ROMAN(LOOKUP(MONTH(A5:A38),{1,4,7,10;1,2,3,4}))

={"I";"IV";"IV";"III";"III";"IV";"III";"IV";"III";"II";"IV";"I";"I";"III";"I";"IV";"II";"III";"I";"IV";"III";"II";"II";"IV";"I";"IV";"IV";"II";"IV";"I";"III";"II";"IV";"IV"}

The rest of the formula has not changed. Starting from Excel 2013, the ARABIC function is also available. It does the opposite, converting Roman numerals into Arabic ones. So instead of converting the whole array from Arabic into Roman numerals, you can convert just the row header from Roman into Arabic.

Grouping by years and months



The fourth example shows how to calculate revenues by year and month. Again, months can be written in different ways (as a number, a three-letter abbreviation or the full name), and I show how to handle each of them.

When months are given as numbers, we have to determine the month number from each date, which the MONTH function does. In the formula in cell I30, the first component extracts the month numbers from the array of dates:

=MONTH(A5:A38)

={2;11;12;7;8;11;8;11;9;4;10;2;2;7;3;11;5;8;2;10;8;5;4;12;1;12;11;4;12;3;8;4;12;11}

We test the years in the same way as before. Finally, we multiply the two tests by each other and by the price array, and add up the products to get the total sales for a particular month of a particular year:

=SUMPRODUCT((MONTH(A5:A38)=H31)*(YEAR(A5:A38)=I30)*F5:F38)

If the months are expressed as three-letter abbreviations we will use a different
approach. We will format our dates to show only the months using the TEXT function.
See how this is done in e.g. cell I35:

=TEXT(A5:A38,"mmm")

={"Feb";"Nov";"Dec";"Jul";"Aug";"Nov";"Aug";"Nov";"Sep";"Apr";"Oct";"Feb";"Feb";"Jul";"Mar";"Nov";"May";"Aug";"Feb";"Oct";"Aug";"May";"Apr";"Dec";"Jan";"Dec";"Nov";"Apr";"Dec";"Mar";"Aug";"Apr";"Dec";"Nov"}

The format "mmm" means showing month names reduced to three letters, as you can
see in the second line showing formula evaluation. In the last section of this table
(which deals with full month names) we are using the format "mmmm" (in e.g. cell I39):

=TEXT(A5:A38,"mmmm")

={"February";"November";"December";"July";"August";"November";"August";"November";"September";"April";"October";"February";"February";"July";"March";"November";"May";"August";"February";"October";"August";"May";"April";"December";"January";"December";"November";"April";"December";"March";"August";"April";"December";"November"}

The rest of the formulae in both sections are the same as in the first section: we test the year, multiply both tests by each other and by the price array, and add up the products.

Grouping by weekdays

In the last example I group sales by weekday, and the approach is very similar. If a weekday is expressed as a number, I use the WEEKDAY function with the European convention, under which Monday is the first day of the week (for example, WEEKDAY(date,2) returns 1 for Monday and 7 for Sunday). If weekdays are expressed as three-letter abbreviations or full names, I use the TEXT function with "ddd" or "dddd", respectively, as the second argument.

As a last remark, if you have all the features combined in one cell (as we had in the "In-
cell" sheet), you can use the ISNUMBER(SEARCH(...)) construction again to do this
analysis by dates.
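The month and weekday tests can be sketched with Python's `datetime` module. The sales below are hypothetical. The `%b`, `%B` and `%a` codes play the role of the "mmm", "mmmm" and "ddd" format codes, and `isoweekday()` numbers the days from Monday = 1, like WEEKDAY with 2 as its second argument. Month and day names from `strftime` depend on the locale; the sketch assumes Python's default C locale, which gives English names.

```python
from datetime import date

# Hypothetical sales: (date of sale, price)
sales = [
    (date(2017, 2, 6), 13900),    # a Monday
    (date(2017, 2, 20), 12050),   # a Monday
    (date(2018, 2, 5), 10400),    # a Monday, but in 2018
    (date(2017, 11, 14), 10800),  # a Tuesday
]

def total(year, test):
    """Like SUMPRODUCT((test on dates)*(YEAR(dates)=year)*prices)."""
    return sum(price for d, price in sales if d.year == year and test(d))

by_month_number = total(2017, lambda d: d.month == 2)                  # MONTH(...)=2
by_month_abbr = total(2017, lambda d: d.strftime("%b") == "Feb")       # TEXT(...,"mmm")
by_month_name = total(2017, lambda d: d.strftime("%B") == "February")  # TEXT(...,"mmmm")
by_weekday_number = total(2017, lambda d: d.isoweekday() == 1)         # Monday = 1
by_weekday_abbr = total(2017, lambda d: d.strftime("%a") == "Tue")     # TEXT(...,"ddd")

print(by_month_number, by_month_abbr, by_month_name)  # 25950 25950 25950
print(by_weekday_number, by_weekday_abbr)             # 25950 10800
```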

Transforming Two-Dimensional Tables



Refers to sheet: “2D”

Sometimes we have a two-dimensional table as source data and need to present this data in a different format. An example of such a table is given on this sheet (under the "Source data" heading). The table looks like a pivot table, and rearranging the data from such a table into another layout is called unpivoting.

Single criteria

We will start with the simplest, Summary 1, which aggregates the numbers by level. Yet again we use the SUMPRODUCT function, and this time its first argument will be the 2D range (E6:G10) holding the revenue information. Excel treats this range as a 5x3 array (5 rows, 3 columns). If you enter =E6:G10 into a cell and press F9 while editing the formula (in the cell or in the formula bar), you will see this:

={47150,24700,70050;0,73350,0;46050,57900,45550;0,67950,0;32050,0,67950}

Notice that the items within a row are separated by commas (",") and the rows by semicolons (";"). For convenience, from now on I will add a line break after each semicolon:

={47150,24700,70050;
0,73350,0;
46050,57900,45550;
0,67950,0;
32050,0,67950}


We will take this array and multiply it by a vertical array consisting of five rows
representing the checks of whether the cars belong to the base series:

=E6:G10*(B6:B10=B15)

This will return an array as follows:

={47150,24700,70050;
0,73350,0;
0,0,0;
0,0,0;
0,0,0}

In the original table the Base series occupies the first and second rows, so the formula has kept the values in those rows (they have been multiplied by the TRUE values from the array of checks, which is equivalent to multiplying by 1). It has zeroed out the other rows (multiplying them by the FALSE values is the same as multiplying by 0).

Lastly, we put this array under the SUMPRODUCT function which adds up all the values
and returns 215,250.

Multiple criteria, one dimension

In Summary 2 we add another column with a split by bodystyle, and the formula needs an extra check for that. In the source data table the bodystyles are located in the header, which means they form a horizontal array. So now we will be multiplying our number array by both a vertical and a horizontal array of logical values, and the formula =E6:G10*(E5:G5=F15)*(B6:B10=E15) will result in:

={47150,0,0;
0,0,0;
0,0,0;
0,0,0;
0,0,0}

The operation has kept only one number (at the intersection of the row and the column that satisfy both conditions), so the SUMPRODUCT of this array returns 47,150.

Multiple criteria, two dimensions



We keep building up our formulae in Summary 3 and add one more criterion: transmission type. In the summary we put it into the header, while in the source table it is located in a column. So this time we multiply the following four arrays (a 2D array, two vertical arrays and one horizontal array) in cell K16:

=E6:G10*(B6:B10=I16)*(D6:D10=K15)*(E5:G5=J16)

={47150,24700,70050;
0,73350,0;
46050,57900,45550;
0,67950,0;
32050,0,67950}

={TRUE;
TRUE;
FALSE;
FALSE;
FALSE}

={TRUE;
FALSE;
TRUE;
FALSE;
TRUE}

={FALSE,TRUE,FALSE}

The resulting array is:

={0,24700,0;
0,0,0;
0,0,0;
0,0,0;
0,0,0}

rendering a total of 24,700 (base sedans with manual transmissions). This is the only number located at the intersection of all the TRUEs.
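The three summaries can be reproduced with the source table shown above. In the Python sketch below, the row and column masks are taken from the arrays shown in the text where available; the Summary 2 column mask is inferred from its result. `masked_sum` is a hypothetical helper, not an Excel function.

```python
# E6:G10 from the "Source data" table
table = [
    [47150, 24700, 70050],
    [0, 73350, 0],
    [46050, 57900, 45550],
    [0, 67950, 0],
    [32050, 0, 67950],
]
is_base = [True, True, False, False, False]     # vertical: level test (rows 1-2 are Base)
is_manual = [True, False, True, False, True]    # vertical: D6:D10=K15, as shown above
first_col = [True, False, False]                # horizontal: inferred E5:G5=F15 (Summary 2)
second_col = [False, True, False]               # horizontal: E5:G5=J16, as shown above

def masked_sum(table, row_masks, col_masks):
    """SUMPRODUCT of a 2D range multiplied by vertical and horizontal TRUE/FALSE arrays."""
    total = 0
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            keep = all(m[i] for m in row_masks) and all(m[j] for m in col_masks)
            total += value * keep   # True -> 1, False -> 0, as in Excel
    return total

print(masked_sum(table, [is_base], []))                       # 215250 (Summary 1)
print(masked_sum(table, [is_base], [first_col]))              # 47150  (Summary 2)
print(masked_sum(table, [is_base, is_manual], [second_col]))  # 24700  (Summary 3)
```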

Creating Automatic Lists Based on Criteria



Refers to sheet: “Auto”

On this sheet I will show you how to create automatic lists based on specified criteria
using formulae.

Single criteria

The first example (in columns E and F) makes a selection based on a single criterion (in cell E4). We start with the familiar part of the formula, =ISNUMBER(SEARCH(E4,B10:B43)), which checks which cells in the source range contain the value of the criteria cell:

={FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;
FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FA
LSE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;FALSE}
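In Python terms, the ISNUMBER(SEARCH(...)) test is a case-insensitive "does the cell contain this text" check. That is why "Hatchback" matches "Comfort hatchback 1.6 manual". This is a minimal sketch: the function name and the other car names are hypothetical. Excel's SEARCH also treats ? and * as wildcards, which this version does not.

```python
def contains(criterion, cells):
    """Rough equivalent of ISNUMBER(SEARCH(criterion, cells)).

    SEARCH is case-insensitive, so both sides are lower-cased before the
    substring test. Unlike SEARCH, this sketch treats ? and * literally
    instead of as wildcards.
    """
    needle = criterion.lower()
    return [needle in str(cell).lower() for cell in cells]


# Hypothetical entries; only "Comfort hatchback 1.6 manual" is quoted in the text
cars = [
    "Base sedan 1.4 manual",
    "Comfort hatchback 1.6 manual",
    "Base wagon 1.6 automatic",
]
print(contains("Hatchback", cars))  # [False, True, False]
```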

Then we take the range of ordinal numbers in column A, which correspond to the source
range entries, and divide these values by the array above. Dividing a number by TRUE
is equivalent to dividing it by 1 and leaves it unchanged; dividing it by FALSE is the
same as dividing it by zero and returns a #DIV/0! error. So our modified formula
=A10:A43/ISNUMBER(SEARCH(E4,B10:B43)) will produce:

={#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;8;#DIV/0!;#DIV/0!;
#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;17;#DIV/0!;#DIV/0!;#DIV/0!;
21;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;26;#DIV/0!;28;29;30;
31;32;#DIV/0!;#DIV/0!}
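You can reproduce the division trick in Python. Python raises ZeroDivisionError where Excel returns #DIV/0!, so this minimal sketch catches the exception and stores a placeholder string instead. The TRUE positions are the ones in the SEARCH array above, and the names are hypothetical. Integer division (//) is used only so that the surviving numbers stay whole numbers.

```python
DIV0 = "#DIV/0!"  # stands in for Excel's error value


def divide_by_tests(ordinals, tests):
    """Sketch of A10:A43/ISNUMBER(...): n/TRUE keeps n, n/FALSE is an error."""
    result = []
    for n, ok in zip(ordinals, tests):
        try:
            result.append(n // int(ok))  # TRUE -> 1, so n is unchanged
        except ZeroDivisionError:
            result.append(DIV0)          # FALSE -> 0, Excel returns #DIV/0!
    return result


ordinals = list(range(1, 35))  # A10:A43 holds the consecutive numbers 1..34
hits = {8, 17, 21, 26, 28, 29, 30, 31, 32}  # TRUE positions in the SEARCH array
tests = [n in hits for n in ordinals]

array = divide_by_tests(ordinals, tests)
print([x for x in array if x != DIV0])  # [8, 17, 21, 26, 28, 29, 30, 31, 32]
```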

As you can see, it has kept only the position numbers of the entries that contain
"Hatchback", according to our criterion. We need to extract those numbers one by one,
in ascending order, using the AGGREGATE function with 15 as the first argument (which
makes it work as the SMALL function) and 6 as the second argument (which makes it
ignore error values). The last argument (the rank k of the value to return) is taken from
column A which, as we recall, contains consecutive numbers:

=AGGREGATE(15,6,A10:A43/ISNUMBER(SEARCH(E4,B10:B43)),A10)

This returns 8, the smallest number in the above array. With the help of the INDEX
function we then retrieve the 8th entry of the source data:

=INDEX(B10:B43,AGGREGATE(15,6,A10:A43/ISNUMBER(SEARCH(E4,B10:B43)),A10))

and the answer is "Comfort hatchback 1.6 manual".
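A rough Python picture of the AGGREGATE(15,6,...) and INDEX steps follows. It is a minimal sketch: None plays the role of the #DIV/0! errors, and a ValueError stands in for the error Excel returns when k exceeds the number of surviving values. The function names are hypothetical, and apart from the 8th entry quoted above, the source list is made of placeholders.

```python
def aggregate_small(values, k):
    """Sketch of AGGREGATE(15, 6, values, k): k-th smallest, errors ignored.

    Error values are represented by None. Where Excel would return an error
    (k larger than the number of surviving values), this raises ValueError.
    """
    numbers = sorted(v for v in values if v is not None)
    if not 1 <= k <= len(numbers):
        raise ValueError("#NUM!")
    return numbers[k - 1]


def index(cells, n):
    """Sketch of INDEX(range, n) on a single column: 1-based position."""
    return cells[n - 1]


hits = {8, 17, 21, 26, 28, 29, 30, 31, 32}
array = [n if n in hits else None for n in range(1, 35)]  # None = #DIV/0!

source = [f"entry {i}" for i in range(1, 35)]  # hypothetical placeholders for B10:B43
source[7] = "Comfort hatchback 1.6 manual"     # the 8th entry, as quoted in the text

first = aggregate_small(array, 1)
print(first)                 # 8
print(index(source, first))  # Comfort hatchback 1.6 manual
```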

As a final step, we wrap this expression in the IFERROR function (once the formula has
used up all the numbers that survived the division, AGGREGATE starts returning errors),
make the references absolute as needed and copy the formula down to cell E43:

=IFERROR(INDEX($B$10:$B$43,AGGREGATE(15,6,$A$10:$A$43/ISNUMBER(SEARCH($E$4,$B$10:$B$43)),A10)),0)

The formulae in the neighboring column F are the same, except that INDEX refers to
column C as its first argument, so they return the car prices.
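Put together, the copied-down formula behaves like the Python function below. Row k of the output holds the k-th matching entry, and once the matches run out the IFERROR fallback (0) fills the remaining rows. This is a minimal sketch under stated assumptions: the function and its parameters are hypothetical, and the car names and prices are made up. Passing a second column as return_col mirrors the column F formulae that return prices.

```python
def auto_list(search_col, criterion, return_col=None, fill=0):
    """Sketch of the formula copied down column E (or F):

    =IFERROR(INDEX(return_col, AGGREGATE(15,6,ordinals/ISNUMBER(SEARCH(criterion,search_col)),k)), fill)

    evaluated for k = 1, 2, ..., len(search_col).
    """
    if return_col is None:
        return_col = search_col  # column E returns the names themselves
    needle = criterion.lower()
    # 1-based positions that survive the division; enumerate yields them already sorted
    positions = [i for i, cell in enumerate(search_col, start=1)
                 if needle in str(cell).lower()]
    result = []
    for k in range(1, len(search_col) + 1):
        if k <= len(positions):
            result.append(return_col[positions[k - 1] - 1])  # INDEX of the k-th smallest
        else:
            result.append(fill)  # AGGREGATE errors out, IFERROR substitutes fill
    return result


# Hypothetical data standing in for columns B and C
cars = ["Base sedan 1.4 manual", "Comfort hatchback 1.6 manual",
        "Base hatchback 1.4 automatic", "Comfort wagon 1.6 manual"]
prices = [10000, 12000, 11000, 13000]

print(auto_list(cars, "Hatchback"))          # ['Comfort hatchback 1.6 manual', 'Base hatchback 1.4 automatic', 0, 0]
print(auto_list(cars, "Hatchback", prices))  # [12000, 11000, 0, 0]
```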

Multiple criteria (AND and OR logic)



The formulae in the second example (columns H and I) and the third example (columns K
and L) are more complex, but they use techniques we have covered before (on the
"In-cell" sheet). We use the MMULT function to determine which cells contain the listed
criteria and to check whether each vehicle qualifies under the AND or the OR logic. The
formulae in these two examples must be array-entered (Ctrl+Shift+Enter).
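The text does not spell out the H:I and K:L formulae, so this Python sketch only illustrates the general idea under an assumption. A 0/1 matrix records which of the listed criteria each entry contains. MMULT with a column of ones then counts the matches per entry, and the count is compared with the number of criteria (AND) or with zero (OR). The function names and data are hypothetical.

```python
def mmult(a, b):
    """Matrix product of two lists of rows, the same operation as Excel's MMULT."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def qualifies(cells, criteria, logic="AND"):
    """Flag cells that contain all (AND) or any (OR) of the criteria.

    Assumed approach: build an n x m matrix of 1/0 "cell i contains criterion j"
    hits, MMULT it by an m x 1 column of ones to count hits per cell, then compare.
    """
    if not criteria:
        raise ValueError("at least one criterion is required")
    hits = [[int(c.lower() in str(cell).lower()) for c in criteria] for cell in cells]
    ones = [[1] for _ in criteria]                  # m x 1 column of ones
    counts = [row[0] for row in mmult(hits, ones)]  # criteria matched per cell
    if logic == "AND":
        return [n == len(criteria) for n in counts]
    return [n > 0 for n in counts]                  # OR logic


cars = ["Comfort hatchback 1.6 manual", "Base sedan 1.4 manual",
        "Comfort sedan 1.6 automatic"]
print(qualifies(cars, ["Comfort", "manual"], "AND"))  # [True, False, False]
print(qualifies(cars, ["Comfort", "manual"], "OR"))   # [True, True, True]
```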


What’s Next?
Descriptive Statistics for Grouped (Weighted) Data

Excel has a powerful set of tools to perform statistical analysis, but they apply only to
ungrouped data. For example, the AVERAGE function calculates a simple average of
ungrouped numbers, but calculating a weighted average requires a different approach.

This review compiles the formulas to perform statistical calculations for grouped data.
They cover the same (and even greater) scope as Excel’s native statistical functions.

The file includes calculations for:

– single-array data (average, median, variance, standard deviation, percentiles,
skewness, kurtosis)
– dual-array data (covariance, correlation, standard error, linear trend).

I have also given an overview of the basics of statistics and how it applies to financial
analysis and included a method to assign weights to your numbers, depending on their
relevance to your benchmarks.


https://www.eloquens.com/tool/QklGiG1a/engineering/statistics-methods/descriptive-statistics-for-grouped-weighted-data


Portfolio Analysis and Sales Forecasting



This publication includes a very practical set of tools providing a mathematical basis for
a number of real-business situations:

1) Building a full picture of past sales for the whole portfolio and by feature:

– calculating all types of statistics for the historical performance
– analyzing variances between the periods, explaining deviations between budget
and actual (price, mix and volume effects)
– drawing seasonality patterns on a cycle plot

2) Making educated predictions of future sales and portfolio performance, including:

– How to build trends and why commonly used CAGR rarely gives reliable future
estimates.
– How to analyze and forecast seasonalities based on monthly, quarterly or
weekly data.
– How to measure historic volatilities and translate them into model scenarios
with a desired level of confidence.
– How to tie volatilities to a timeline, measure standard error and set up scenarios
based on that data.

https://www.eloquens.com/tool/dXAVfKnZ/finance/financial-modeling-courses-tutorials/portfolio-analysis-and-sales-forecasting
