FinQuiz - Curriculum Note, Study Session 1, Reading 2
1. INTRODUCTION
FinQuiz Notes – 2022
Data is the key input for security analysis and investment management. The rapid growth in technology has created a data-rich environment featuring large volume, high velocity, and a wide variety of data, which has led investors to embrace big data in their investment strategies.

Organizing, cleaning, and analyzing data is highly important and is a foundation of a successful investment strategy. The data is then examined to detect important relationships, valuable insights, underlying structures, and outliers within the dataset.
2. DATA TYPES
• Simplest format to organize information.
• Suitable for compiling data with a single variable. For example, time-series data such as the closing price of TSLA for the first 10 trading days in January 2021.
• Time series format facilitates:
o future data updates to the current dataset.
o observing trends or patterns in the data over time.

One of the most popular forms of organizing data for computers or humans.

Data tables are similar to an Excel spreadsheet, where columns hold multiple variables and rows hold multiple observations, typically organized in a time-ordered sequence.

Practice: Example 3,
Curriculum Volume 1, Reading 2.
Example: Suppose,
Max. value = 3.5%
Reading 2 Organizing, Visualizing, and Describing Data
Step 4: Determine bin width
Bin width = Range / k = 14% / 4 = 3.5%

Step 5: Determine the end points of the bins
Compute the endpoint of the first bin by adding the bin width to the minimum value. Then compute the 2nd bin's endpoint by adding the bin width to the endpoint of the first bin.
-2.2%
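The bin-width and endpoint steps can be sketched in a few lines of Python. This is a minimal illustration rather than the curriculum's own procedure. The function name `bin_endpoints` is hypothetical, and the minimum of -10.5% is an assumed value chosen only so that the range works out to a round number.

```python
def bin_endpoints(min_value, max_value, k):
    """Return (bin_width, endpoints) for k equal-width bins.

    bin_width = range / k, and each endpoint is the previous
    endpoint plus the bin width, starting from the minimum value.
    """
    bin_width = (max_value - min_value) / k
    endpoints = [min_value + bin_width * i for i in range(1, k + 1)]
    return bin_width, endpoints


# Assumed inputs for illustration: min = -10.5%, max = 3.5%, k = 4 bins.
width, ends = bin_endpoints(-10.5, 3.5, 4)
print(width)  # 3.5
print(ends)   # [-7.0, -3.5, 0.0, 3.5]
```

The last endpoint equals the maximum value, which is a quick check that the bins cover the whole range.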
Cumulative Relative Frequency is computed by adding up the relative frequencies. It reflects the percentage of observations that are less than the upper limit of each interval.

Practice: Example 4,
Volume 1, Reading 2.
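As a rough sketch of how cumulative relative frequencies are built up, the following Python snippet converts absolute frequencies to relative frequencies and then accumulates them. The function name and the bin counts are hypothetical and serve only as an illustration.

```python
from itertools import accumulate


def cumulative_relative_frequency(absolute_freqs):
    """Return (relative, cumulative) frequency lists for the given bin counts."""
    total = sum(absolute_freqs)
    relative = [f / total for f in absolute_freqs]
    # Cumulative relative frequency: running sum of the relative frequencies.
    cumulative = list(accumulate(relative))
    return relative, cumulative


# Hypothetical absolute frequencies for four bins (16 observations in total).
rel, cum = cumulative_relative_frequency([2, 4, 8, 2])
print(rel)  # [0.125, 0.25, 0.5, 0.125]
print(cum)  # [0.125, 0.375, 0.875, 1.0]
```

The final cumulative value is 1.0, since every observation lies below the upper limit of the last interval.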
Uses of Contingency tables

Contingency tables can be used to examine the potential association between two variables. One method used to test for the potential association between variables is the Chi-square test of independence.

Refer to the paragraph above Example 5.

better insights into the portions where the model is creating errors and where the model is correct.
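The chi-square statistic itself is not spelled out in this note. A minimal sketch, assuming the standard form (the sum over all cells of (observed - expected)² / expected, where expected = row total × column total / overall total), might look like the following. The function name and the 2×2 table are hypothetical.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts. The expected count in
    each cell is (row total * column total) / overall total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2x2 table of observed counts.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
print(round(stat, 4), dof)  # 6.6667 1
```

The statistic would then be compared with a chi-square critical value for the given degrees of freedom. That lookup is left out of this sketch.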
6. DATA VISUALIZATION
Histogram
A histogram is the graphical representation of
the frequency distribution (absolute frequency
or relative frequency) of numerical data.
• Bar charts are similar to a histogram, with the difference that bar charts represent the frequency distribution of categorical data.
• Each bar indicates a distinct category arranged with no logical ordering.

Note: Bar charts are also used when categorical data are associated with numerical data.
Practice: Example 6,
Volume 1, Reading 2.

6.6 Scatter Plot
A measure of central tendency indicates the center of the data. The most used measures of central tendency are:

• Arithmetic mean
• Median
• Mode
• Weighted mean
• Geometric mean
• Harmonic mean

*The difference between each outcome and the mean is called a deviation.

Property 2:
The arithmetic mean is sensitive to extreme values, i.e., it can be biased upward or downward by extremely large or small observations, respectively.

Advantages of Arithmetic Mean:
Winsorized mean: In a winsorized mean, a stated % of the lowest values is assigned a specified low value and a stated % of the highest values is assigned a specified high value, and then a mean is computed from the restated data.

Median is the middle value of a sorted (ascending or descending) list of items.

Steps to compute the Median:
1. Arrange all observations in ascending order, i.e., from the smallest to the largest.
2. When the number of observations (n) is odd, the median is the center observation in the ordered list, i.e.
Median will be located at the (n + 1) / 2 position

The mode is the most frequently occurring value in a distribution.

Unimodal Distribution: A distribution that has only one mode is called a unimodal distribution.

Modal Interval: Data with a continuous distribution (e.g., stock returns) may not have a modal outcome. In such cases, a modal interval is found, i.e., an interval with the largest number of observations (highest frequency). The modal interval always has the highest bar in the histogram.

Important to note: The mode is the only measure of central tendency that can be used with nominal data.
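The winsorized mean, median, and mode can be sketched in Python as follows. This is an illustration under stated assumptions, not the curriculum's procedure. The function names are hypothetical. The "specified" low and high values are assumed to be the nearest observations that are not restated. The even-n median rule (the mean of the two middle observations) is the usual convention and is not shown in the steps above.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # 1-based position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2


def winsorized_mean(values, pct):
    """Mean after restating the lowest and highest pct% of observations.

    Assumption: restated values take the nearest non-restated observation.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * pct / 100)  # observations restated in each tail
    if 2 * k >= n:
        raise ValueError("pct is too large for this sample size")
    low, high = ordered[k], ordered[n - k - 1]
    restated = [min(max(x, low), high) for x in ordered]
    return sum(restated) / n


def modes(values):
    """All values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


data = [1, 2, 3, 4, 100]          # hypothetical sample with one extreme value
print(sum(data) / len(data))      # 22.0 (arithmetic mean, pulled up by 100)
print(median(data))               # 3
print(winsorized_mean(data, 20))  # 3.0
print(modes([1, 2, 2, 3]))        # [2]
```

The example also shows the sensitivity of the arithmetic mean to extreme values: one large observation moves the mean to 22.0 while the median and the winsorized mean stay at 3.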
7.4.2) The Geometric Mean

Geometric mean (GM): The geometric mean can be used to compute the mean value over time to compute the growth rate of a variable.

$G = \sqrt[n]{X_1 X_2 X_3 \ldots X_n}$
with Xi ≥ 0 for i = 1, 2, …, n.

Or

$\ln G = \dfrac{1}{n} \ln(X_1 X_2 X_3 \ldots X_n)$

or as

$\ln G = \dfrac{\sum_{i=1}^{n} \ln X_i}{n}$

$G = e^{\ln G}$

• It should be noted that the geometric mean can be computed only when the product under the radical sign is non-negative.

The geometric mean return over the time period can be computed as:

$R_G = [(1 + R_1)(1 + R_2) \ldots (1 + R_T)]^{1/T} - 1$

• Geometric mean returns are also known as compound returns.

In addition, the geometric mean ranks the two funds differently from that of an arithmetic mean.

Practice: Example 10,
Volume 1, Reading 2.

7.4.3) The Harmonic Mean

$\text{Harmonic Mean } \bar{X}_H = \dfrac{n}{\sum_{i=1}^{n} (1/X_i)}$
with Xi > 0 for i = 1, 2, …, n.

• It is a special case of the weighted mean in which each observation's weight is inversely proportional to its magnitude.

Cost Averaging is an investment strategy involving periodic investments of a fixed amount of money. Harmonic mean is appropriate when averaging the ratios, and the ratios are repeatedly applied to a fixed quantity to yield a variable number of units.

In cost averaging, the ratios to be averaged are prices per share at the date of the purchase, and those prices are applied to a constant amount of money to yield a variable number of shares.
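A short Python sketch may help make the geometric mean return and the cost-averaging use of the harmonic mean concrete. The function names, returns, and prices below are hypothetical and used only for illustration.

```python
import math


def geometric_mean_return(returns):
    """Compound (geometric mean) return: [(1+R1)(1+R2)...(1+RT)]^(1/T) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


def geometric_mean(values):
    """Geometric mean via logs: G = exp(mean of ln Xi). Requires Xi > 0."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


def harmonic_mean(values):
    """Harmonic mean: n / sum(1/Xi). Requires Xi > 0."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical returns: +100% then -50% leaves wealth unchanged.
returns = [1.0, -0.5]
print(sum(returns) / len(returns))     # 0.25 (arithmetic mean return)
print(geometric_mean_return(returns))  # 0.0  (compound return)

# Cost averaging: a hypothetical 1,000 invested at a price of 10, then at 20.
prices = [10, 20]
shares = sum(1000 / p for p in prices)  # 100 + 50 = 150 shares
print(2000 / shares)                    # average cost per share
print(harmonic_mean(prices))            # harmonic mean of the two prices
```

The last two printed values agree (about 13.33), which illustrates why the harmonic mean is the appropriate average price when a fixed amount is invested each period.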
8. QUANTILES
No.  Company                           Dividend Yield (%)
22   Nestle                            2.55
23   Royal Bank of Scotland Group      2.60
24   ABN-AMRO Holding                  2.65
25   BNP Paribas                       2.65
26   UBS                               2.65
27   Tesco                             2.95
28   Total                             3.11
29   GlaxoSmithKline                   3.31
30   BT Group                          3.34
31   Unilever                          3.53
32   BASF                              3.59
33   Santander Central Hispano         3.66
34   Banco Bilbao Vizcaya Argentaria   3.67
35   Diageo                            3.68

Thus,
P10 = X5 + (5.1 – 5) (X6 – X5) = 0.26 + 0.1 (1.09 – 0.26) = 0.34%

Calculating 90th percentile (P90):
L90 = (50 + 1) × (90 / 100) = 45.9

• It implies that the 90th percentile lies between the 45th observation (X45 = 5.15) and the 46th observation (X46 = 5.66).

Thus,
P90 = X45 + (45.9 – 45) (X46 – X45) = 5.15 + 0.90 (5.66 – 5.15) = 5.61%

Calculating 1st Quartile (i.e. P25):
L25 = (50 + 1) × (25 / 100) = 12.75

• It implies that the 25th percentile lies between the 12th observation (X12 = 1.51) and the 13th observation (X13 = 1.75).

Thus,
P25 = Q1 = X12 + (12.75 – 12) (X13 – X12) = 1.51 + 0.75 (1.75 – 1.51) = 1.69%
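The percentile calculations above follow one pattern: locate the position L = (n + 1) × y / 100, then interpolate linearly between the two neighbouring observations. A minimal Python sketch of that pattern is given below. The function name and the sample data are hypothetical, and clamping to the first or last observation when the location falls outside 1..n is an assumption of this sketch.

```python
def percentile(sorted_values, y):
    """y-th percentile using location L = (n + 1) * y / 100 with linear interpolation.

    `sorted_values` must already be in ascending order.
    """
    n = len(sorted_values)
    location = (n + 1) * y / 100
    # Assumption: clamp when the location falls outside the data.
    if location <= 1:
        return sorted_values[0]
    if location >= n:
        return sorted_values[-1]
    lower = int(location)            # 1-based position of the lower observation
    fraction = location - lower      # distance past the lower observation
    x_low = sorted_values[lower - 1]
    x_high = sorted_values[lower]
    return x_low + fraction * (x_high - x_low)


# Hypothetical sorted sample of 9 observations.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]
print(percentile(data, 25))  # 5.0  (L = 2.5, halfway between 4 and 6)
print(percentile(data, 50))  # 10   (L = 5, the 5th observation)
```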
9. MEASURES OF DISPERSION

Standard deviation (S.D.): Standard deviation is the positive square root of the variance. It is easier to interpret than variance because standard deviation is expressed in the same unit of measurement as the observations.

It is computed as:

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

where,
s = sample S.D.
$\bar{X}$ = sample mean.

Refer to Curriculum, Reading 2, Exhibit 46 for Steps to Calculate Sample Standard Deviation and Variance.

Practice: Example 18 and 19,
CFA Program Curriculum
Volume 1, Reading 2.

10.1 Coefficient of Variation

• CV is a scale-free measure (i.e., has no units of measurement); therefore, it can be used to directly compare dispersion across different data sets.
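The sample standard deviation and the coefficient of variation can be sketched in Python as follows. The CV formula is not shown in this note, so the sketch assumes the standard definition CV = s / sample mean. The function names and the data are hypothetical.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Sample standard deviation: positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """CV = s / sample mean (assumed standard definition); unit-free."""
    return sample_std(values) / (sum(values) / len(values))


data = [1, 2, 3, 4, 5]  # hypothetical observations
print(sample_variance(data))                     # 2.5
print(round(sample_std(data), 4))                # 1.5811
print(round(coefficient_of_variation(data), 4))  # 0.527
```

Because the CV divides by the mean, rescaling every observation by the same positive factor leaves it unchanged, which is what makes it comparable across data sets.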
Symmetrical return distribution or Normal distribution:
It is a return distribution that is symmetrical about its mean, i.e. equal loss and gain intervals have the same frequencies. The normal distribution is the best-known example.

• A symmetrical distribution has skewness = 0.

Characteristics of the normal distribution:
1) In a normal distribution, mean = median.
2) A normal distribution is completely described by two parameters, i.e. its mean and variance.

Skewed distribution: A distribution that is not symmetrical around the mean is called skewed.

a) Positively skewed or right-skewed Distribution: It is a return distribution that reflects frequent small losses and a few extreme gains, i.e. limited but frequent downside.

• It has a long tail on its right side.
• It has skewness > 0.
• In a positively skewed unimodal distribution: mode < median < mean.
• Generally, investors prefer positive skewness (all else equal).

b) Negatively skewed or left-skewed Distribution: It is a return distribution that reflects frequent small gains and a few extreme losses, i.e. limited but frequent upside.

• It has a long tail on its left side.
• It has skewness < 0.
• In a negatively skewed unimodal distribution: mean < median < mode.
$S_K \approx \left(\dfrac{1}{n}\right) \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$
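The approximate skewness formula above can be sketched in Python as follows, using the sample standard deviation s (with the n - 1 divisor) in the denominator. The function name and the data sets are hypothetical.

```python
import math


def sample_skewness_approx(values):
    """Approximate sample skewness: (1/n) * sum((Xi - mean)^3) / s^3."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)


symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_skewed = [1, 1, 1, 1, 10]   # one extreme high value (long right tail)
left_skewed = [-10, -1, -1, -1, -1]

print(sample_skewness_approx(symmetric))         # 0.0
print(sample_skewness_approx(right_skewed) > 0)  # True
print(sample_skewness_approx(left_skewed) < 0)   # True
```

Cubing the deviations preserves their sign, so a few large positive deviations produce positive skewness and a few large negative deviations produce negative skewness.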
• It has more frequent extremely large deviations from the mean than a normal distribution.
• Ignoring fatter tails in analysis results in underestimation of the probability of extreme outcomes.
• The more leptokurtic the distribution is, the higher the risk.

Platykurtic: It is a distribution that is less peaked than normal.

It is always a positive number because the deviations are raised to the 4th power.

Excess kurtosis = Kurtosis – 3

• A normal or mesokurtic distribution has excess kurtosis = 0.
• A leptokurtic distribution has excess kurtosis > 0.
• A platykurtic distribution has excess kurtosis < 0.
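The kurtosis formula is not reproduced in this note. As a rough sketch, the following Python snippet assumes the large-sample approximation kurtosis ≈ (1/n) × Σ(Xi − mean)⁴ / s⁴, where s is the sample standard deviation, and then subtracts 3 to obtain excess kurtosis. The function name and the data are hypothetical.

```python
import math


def excess_kurtosis_approx(values):
    """Approximate excess kurtosis: (1/n) * sum((Xi - mean)^4) / s^4 - 3.

    Assumption: large-sample approximation; s uses the n - 1 divisor.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    kurtosis = sum((x - mean) ** 4 for x in values) / (n * s ** 4)
    return kurtosis - 3


flat = [1, 2, 3, 4, 5]  # hypothetical evenly spread data
print(round(excess_kurtosis_approx(flat), 3))  # -1.912
```

The evenly spread sample has negative excess kurtosis, i.e. it is platykurtic under this approximation. Kurtosis before subtracting 3 is positive because every deviation is raised to the 4th power.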
Negative Covariance: When two variables tend to move in opposite directions, they are referred to as negatively correlated and have negative covariance.

Scatter plots are a useful tool for a sensible interpretation of a correlation coefficient, as they demonstrate the relationship graphically.
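The covariance and correlation formulas are not shown in this note. A minimal Python sketch, assuming the standard sample definitions (covariance = Σ(Xi − X̄)(Yi − Ȳ) / (n − 1), correlation = covariance / (sX × sY)), is given below. The function names and the two series are hypothetical.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Correlation: covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(sample_covariance(xs, xs))
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)


# Hypothetical series that move in opposite directions.
x = [1, 2, 3]
y = [6, 4, 2]
print(sample_covariance(x, y))   # -2.0
print(sample_correlation(x, y))  # -1.0
```

The covariance is negative because above-average values of x coincide with below-average values of y, and the correlation of -1 reflects that the relationship here is exactly linear.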