CFA Level 1 - Quantitative Methods

2.1 - Introduction

The Quantitative Methods section of the CFA curriculum has traditionally been placed second in the sequence of study topics, following the Ethics and Professional Standards review. It's an interesting progression: the ethics discussions and case studies will invariably make most candidates feel very positive, very high-minded about the road on which they have embarked. Then, without warning, they are smacked with a smorgasbord of formulas, graphs, Greek letters and challenging terminology. We know – it's easy to become overwhelmed. At the same time, the topics covered in this section – time value of money, performance measurement, statistics and probability basics, sampling and hypothesis testing, and correlation and linear regression analysis – provide the candidate with essential analytical tools and are a crucial prerequisite for the subsequent material on fixed income, equities and portfolio management. In short, mastering the material in this section will make the entire CFA Body of Knowledge that much easier to handle.

The list of topics within Quantitative Methods may appear intimidating at first, but rest assured that one does not need a PhD in mathematics or exceptional numerical aptitude to understand the quantitative approaches at CFA Level 1. Still, some people absorb quantitative material better than others do. What we've tried to do in this study guide is present the full list of topics in a manner that summarizes and tones down the degree of technical detail characteristic of academic textbooks, while remaining sufficiently deep that the guide can be used effectively as a candidate's primary study resource.
For those who have already purchased and read the textbook, and for those who already clearly understand the material, this guide should allow for a relatively speedy refresher in those hectic days and weeks prior to exam day. Along the way, we'll provide tips (primarily drawn from personal experience) on how to approach the CFA Level 1 exam and help give you the best chance of earning a passing grade.

2.2 - What Is The Time Value Of Money?

The principle of time value of money – the notion that a given sum of money is more valuable the sooner it is received, due to its capacity to earn interest – is the foundation for numerous applications in investment finance. Central to the time value principle is the concept of interest rates. A borrower who receives money today for consumption must pay back the principal plus an interest rate that compensates the lender. Interest rates are set in the marketplace by forces of supply and demand, and they allow equivalent relationships across time to be determined. In other words, in an environment where the market-determined rate is 10%, we would say that borrowing (or lending) $1,000 today is equivalent to paying back (or receiving) $1,100 a year from now. Stated another way: enough borrowers are out there who demand $1,000 today and are willing to pay back $1,100 in a year, and enough investors are willing to supply $1,000 now in return for $1,100 in a year, that the market reaches equivalence on rates.

Exam Tips and Tricks
The CFA exam question authors frequently test knowledge of FV, PV and annuity cash flow streams within questions on mortgage loans or planning for college tuition or retirement savings. Problems with uneven cash flows rule out use of the annuity factor formula; instead, the present value of each cash flow must be calculated individually and the resulting values added together.

2.3 - The Five Components Of Interest Rates

CFA Institute's LOS 5.a requires an understanding of the components of interest rates from an economic (i.e. nonquantitative) perspective. In this exercise, think of the total interest rate as a sum of five smaller parts, with each part determined by its own set of factors.

1. Real Risk-Free Rate - This assumes no risk or uncertainty, simply reflecting differences in timing: the preference to spend now/pay back later versus lend now/collect later.

2. Expected Inflation - The market expects aggregate prices to rise, and the currency's purchasing power is reduced by a rate known as the inflation rate. Inflation makes real dollars less valuable in the future and is factored into determining the nominal interest rate (from the economics material: nominal rate = real rate + inflation rate).

3. Default-Risk Premium - What is the chance that the borrower won't make payments on time, or will be unable to pay what is owed? This component will be high or low depending on the creditworthiness of the person or entity involved.

4. Liquidity Premium - Some investments are highly liquid, meaning they are easily exchanged for cash (U.S. Treasury debt, for example). Other securities are less liquid, and there may be a certain loss expected if it's an issue that trades infrequently. Holding other factors equal, a less liquid security must compensate the holder by offering a higher interest rate.

5. Maturity Premium - All else being equal, a bond obligation will be more sensitive to interest rate fluctuations the longer its maturity.

2.4 - Time Value Of Money Calculations

Here we will discuss the effective annual rate, time value of money problems, the PV of a perpetuity, an ordinary annuity, an annuity due, a single cash flow and a series of uneven cash flows. For each, you should know how to both interpret and solve the problems on your approved calculator. These concepts cover LOS 5.b and 5.c.

The Effective Annual Rate
CFA Institute's LOS 5.b is explained within this section. We'll start by defining the terms, and then present the formula.

The stated annual rate, or quoted rate, is the interest rate on an investment if an institution were to pay interest only once a year. In practice, institutions compound interest more frequently: quarterly, monthly, daily or even continuously. However, stating a rate for those small periods would involve quoting in small fractions and wouldn't be meaningful or allow easy comparisons to other investment vehicles; as a result, there is a need for a standard convention for quoting rates on an annual basis. The effective annual rate (EAR) represents the actual rate of return, reflecting all of the compounding periods during the year. It can be computed given the stated rate and the frequency of compounding. We'll discuss how to make this computation next.

Formula 2.1
Effective annual rate (EAR) = (1 + Periodic interest rate)^m - 1

Where: m = number of compounding periods in one year, and periodic interest rate = (stated interest rate) / m

Example: Effective Annual Rate
Suppose we are given a stated interest rate of 9%, compounded monthly. Here is what we get for EAR:

EAR = (1 + (0.09/12))^12 - 1 = (1.0075)^12 - 1 = 1.093807 - 1 = 0.093807, or 9.38%

Keep in mind that the effective annual rate will always be higher than the stated rate if there is more than one compounding period (m > 1 in our formula), and the more frequent the compounding, the higher the EAR.

Solving Time Value of Money Problems
Approach these problems by first converting both the rate r and the time period N to the same units as the compounding frequency. In other words, if the problem specifies quarterly compounding (i.e. four compounding periods in a year), with time given in years and the interest rate given as an annual figure, start by dividing the rate by 4 and multiplying the time N by 4. Then use the resulting r and N in the standard PV and FV formulas.

Example: Compounding Periods
Assume we want the future value of $10,000 five years from now at 8%, but with quarterly compounding. We have quarterly r = 8%/4 = 0.02, and periods N = 4*5 = 20 quarters.

FV = PV * (1 + r)^N = ($10,000)*(1.02)^20 = ($10,000)*(1.485947) = $14,859.47

Assuming monthly compounding, r = 8%/12 = 0.0066667 and N = 12*5 = 60.

FV = PV * (1 + r)^N = ($10,000)*(1.0066667)^60 = ($10,000)*(1.489846) = $14,898.46
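These calculations are easy to verify with a short script; a minimal sketch (function names are illustrative, not part of the curriculum):

```python
def effective_annual_rate(stated_rate: float, m: int) -> float:
    """EAR = (1 + stated_rate/m)**m - 1, where m = compounding periods per year."""
    return (1 + stated_rate / m) ** m - 1

def future_value(pv: float, annual_rate: float, years: float, m: int = 1) -> float:
    """Convert r and N to per-period units before compounding."""
    r = annual_rate / m      # periodic rate
    n = years * m            # number of periods
    return pv * (1 + r) ** n

# 9% stated, compounded monthly -> about 9.38% effective
ear = effective_annual_rate(0.09, 12)

# $10,000 for five years at 8%: quarterly vs. monthly compounding
fv_quarterly = future_value(10_000, 0.08, 5, m=4)     # about $14,859.47
fv_monthly = future_value(10_000, 0.08, 5, m=12)      # about $14,898.46
```

Note how the unit conversion (divide the rate by m, multiply the years by m) happens inside the function, mirroring the rule stated above.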

Compare these results to the figure calculated with annual compounding ($14,693.28, in the single-sum example later in this section) to see the benefit of additional compounding periods.

Exam Tips and Tricks
On PV and FV problems, switching the time units - either by calling for quarterly or monthly compounding or by expressing time in months and the interest rate in years - is an often-used tactic to trip up test takers who are trying to go too fast. Remember to make sure the units agree for r and N, and are consistent with the frequency of compounding, prior to solving.

Present Value of a Perpetuity
A perpetuity starts as an ordinary annuity (first cash flow is one period from today) but has no end: it continues indefinitely with level, sequential payments. Perpetuities are more a product of the CFA world than the real world - what entity would obligate itself to making payments that will never end? However, some securities (such as preferred stocks) do come close to satisfying the assumptions of a perpetuity, and the formula for the PV of a perpetuity is used as a starting point to value these types of securities. The formula is derived from the PV of an ordinary annuity, which at N = infinity, and assuming interest rates are positive, simplifies to:

Formula 2.2
PV of a perpetuity = A / r

Where: A = annuity payment and r = interest rate

Therefore, a perpetuity paying $1,000 annually at an interest rate of 8% would be worth:

PV = A/r = ($1,000)/0.08 = $12,500

FV and PV of a Single Sum of Money
If we assume annual compounding of interest, these problems can be solved with the following formulas:

Formula 2.3
(1) FV = PV * (1 + r)^N
(2) PV = FV * (1/(1 + r)^N)

Where: FV = future value of a single sum of money, PV = present value of a single sum of money, r = annual interest rate, and N = number of years
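Formulas 2.2 and 2.3 translate directly into code; a minimal sketch (helper names are illustrative):

```python
def fv_single(pv: float, r: float, n: float) -> float:
    """Formula 2.3 (1): compound a single sum forward N periods."""
    return pv * (1 + r) ** n

def pv_single(fv: float, r: float, n: float) -> float:
    """Formula 2.3 (2): discount a single sum back N periods."""
    return fv / (1 + r) ** n

def pv_perpetuity(payment: float, r: float) -> float:
    """Formula 2.2: level payments forever, first payment one period away."""
    return payment / r

# The perpetuity example above: $1,000 a year at 8%
perp = pv_perpetuity(1_000, 0.08)     # $12,500

# A single sum: $10,000 at 8% for five years, in both directions
fv = fv_single(10_000, 0.08, 5)       # about $14,693.28
pv = pv_single(fv, 0.08, 5)           # discounting back recovers $10,000
```

The round trip in the last two lines illustrates that the FV and PV formulas are inverses of each other.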

Example: Future Value and Present Value of a Single Sum
At an interest rate of 8%, the value of $10,000 five years from now will be:

FV = PV * (1 + r)^N = ($10,000)*(1.08)^5 = ($10,000)*(1.469328) = $14,693.28

At an interest rate of 8%, today's value of $10,000 to be received in five years is:

PV = FV * (1/(1 + r)^N) = ($10,000)*(1/(1.08)^5) = ($10,000)*(1/1.469328) = ($10,000)*(0.680583) = $6,805.83

Example: Solving for Present Value
An investor wants to have $1 million when she retires in 20 years. If she can earn a 10% annual return, compounded annually, on her investments, the lump-sum amount she would need to invest today to reach her goal is closest to:

A. $100,000
B. $117,459
C. $148,644
D. $161,506

Answer: The problem asks for a value today (PV). It provides the future sum of money (FV) = $1,000,000; an interest rate (r) = 10% or 0.1; yearly time periods (N) = 20; and it indicates annual compounding. Using the PV formula listed above, we get the following:

PV = FV * (1/(1 + r)^N) = ($1,000,000)*(1/(1.10)^20) = $1,000,000 * (1/6.7275) = $1,000,000 * 0.148644 = $148,644

The correct answer is "C". Using a calculator with financial functions can save time when solving PV and FV problems. At the same time, the CFA exam is written so that financial calculators aren't required. Typical PV and FV problems will test the ability to recognize and apply concepts and avoid tricks, not the ability to use a financial calculator. The experience gained by working through more examples and problems will increase your efficiency much more than a calculator.

FV and PV of an Ordinary Annuity and an Annuity Due
To solve annuity problems, you must know the formulas for the future value annuity factor and the present value annuity factor.

Formula 2.4
Future Value Annuity Factor = ((1 + r)^N - 1) / r

Formula 2.5
Present Value Annuity Factor = (1 - 1/(1 + r)^N) / r

Where: r = interest rate and N = number of payments
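Formulas 2.4 and 2.5 translate directly into code; a minimal sketch (function names are illustrative), including the one-period shift that handles an annuity due:

```python
def fv_annuity_factor(r: float, n: int) -> float:
    """Formula 2.4: future value of a series of $1 payments (ordinary annuity)."""
    return ((1 + r) ** n - 1) / r

def pv_annuity_factor(r: float, n: int) -> float:
    """Formula 2.5: present value of a series of $1 payments (ordinary annuity)."""
    return (1 - 1 / (1 + r) ** n) / r

# Ordinary annuity: multiply the factor by the payment A.
# e.g. $10,000 a year for 10 years at 9%, valued at the date of the last payment:
A = 10_000
fv_ordinary = A * fv_annuity_factor(0.09, 10)

# Annuity due (first payment today): same stream shifted one period earlier,
# so compound the ordinary-annuity value forward one period.
fv_due = fv_ordinary * 1.09
```

The `fv_due` adjustment is exactly the "keep track of the timing" principle described next: treat the stream as an ordinary annuity, then shift by one period.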

FV Annuity Factor
The FV annuity factor formula gives the future total dollar amount of a series of $1 payments, but in problems there will likely be a periodic cash flow amount given (sometimes called the annuity amount and denoted by A). Simply multiply A by the FV annuity factor to find the future value of the annuity. Likewise for the PV of an annuity: the formula listed above shows today's value of a series of $1 payments to be received in the future. To calculate the PV of an annuity, multiply the annuity amount A by the present value annuity factor.

The FV and PV annuity factor formulas work with an ordinary annuity, one that assumes the first cash flow is one period from now, or t = 1 if drawing a timeline. The annuity due is distinguished by a first cash flow starting immediately, or t = 0 on a timeline. Since the annuity due is basically an ordinary annuity plus a lump sum (today's cash flow), and since it can be fit to the definition of an ordinary annuity starting one period ago, we can use the ordinary annuity formulas as long as we keep track of the timing of cash flows. The guiding principle: make sure, before using the formula, that the annuity fits the definition of an ordinary annuity with the first cash flow one period away.

Example: FV and PV of an ordinary annuity and an annuity due
An individual deposits $10,000 at the beginning of each of the next 10 years, starting today, into an account paying 9% interest compounded annually. The amount of money in the account at the end of 10 years will be closest to:

A. $109,000
B. $143,200
C. $151,900
D. $165,600

Answer: The problem gives the annuity amount A = $10,000, the interest rate r = 0.09, and time periods N = 10. Time units are all annual (compounded annually), so there is no need to convert the units on either r or N. However, "starting today" introduces a wrinkle.
The annuity being described is an annuity due, not an ordinary annuity, so to use the FV annuity factor, we will need to change our perspective to fit the definition of an ordinary annuity. Drawing a timeline should help visualize what needs to be done:

Figure 2.1: Cashflow Timeline

The definition of an ordinary annuity is a cash flow stream beginning in one period, so the annuity being described in the problem is an ordinary annuity starting last year, with 10 cash flows from t = 0 to t = 9. Using the FV annuity factor formula, we have the following:

FV annuity factor = ((1 + r)^N - 1)/r = ((1.09)^10 - 1)/0.09 = (1.3673636)/0.09 = 15.19293

Multiplying this factor by the annuity amount of $10,000, we have the future value at time period 9: FV = ($10,000)*(15.19293) = $151,929. To finish the problem, we need the value at t = 10. To calculate it, we use the future value of a lump sum, FV = PV*(1 + r)^N, with N = 1, PV = the annuity value after 9 periods, and r = 0.09:

FV = PV*(1 + r)^N = ($151,929)*(1.09) = $165,603. The correct answer is "D".

Notice that choice "C" in the problem ($151,900) agrees with the preliminary result of the value of the annuity at t = 9. It's also the result if we were to forget the distinction between ordinary annuity and annuity due, and solve the problem with the ordinary annuity formula and the given parameters. On the CFA exam, problems like this one will get plenty of takers for choice "C" - mostly the people trying to go too fast!

PV and FV of Uneven Cash Flows
The FV and PV annuity formulas assume level and sequential cash flows, but if a problem breaks this assumption, the annuity formulas no longer apply. To solve problems with uneven cash flows, each cash flow must be discounted back to the present (for PV problems) or compounded to a future date (for FV problems); then the sum of the present (or future) values of all cash flows is taken. In practice, particularly when there are many cash flows, this exercise is usually completed using a spreadsheet. On the CFA exam, the ability to handle this concept may be tested with just a few future cash flows, given the time constraints.
It helps to set up this problem as if it were on a spreadsheet, to keep track of the cash flows and to make sure that the proper inputs are used to either discount or compound each cash flow. For example, assume that we are to receive a sequence of uneven cash flows from an annuity, and we're asked for the present value of the annuity at a discount rate of 8%. Scratch out a table similar to the one below, with periods in the first column, cash flows in the second, formulas in the third and computations in the fourth.

Time Period   Cash Flow   Present Value Formula   Result of Computation
1             $1,000      ($1,000)/(1.08)^1       $925.93
2             $1,500      ($1,500)/(1.08)^2       $1,286.01
3             $2,000      ($2,000)/(1.08)^3       $1,587.66
4             $500        ($500)/(1.08)^4         $367.51
5             $3,000      ($3,000)/(1.08)^5       $2,041.75

Taking the sum of the results in column 4, we have PV = $6,208.86.

Suppose we are required to find the future value of this same sequence of cash flows after period 5. Here's the same approach using a table with future value formulas rather than present value formulas:

Time Period   Cash Flow   Future Value Formula    Result of Computation
1             $1,000      ($1,000)*(1.08)^4       $1,360.49
2             $1,500      ($1,500)*(1.08)^3       $1,889.57
3             $2,000      ($2,000)*(1.08)^2       $2,332.80
4             $500        ($500)*(1.08)^1         $540.00
5             $3,000      ($3,000)*(1.08)^0       $3,000.00

Taking the sum of the results in column 4, we have FV (period 5) = $9,122.86. As a check, the present value of $9,122.86, discounted at the 8% rate for five years, is PV = ($9,122.86)/(1.08)^5 = $6,208.86. In other words, the principle of equivalence applies even in examples where the cash flows are unequal.

2.5 - Time Value Of Money Applications

I. MORTGAGES
Most of the problems from the time value material are likely to ask for either PV or FV and will provide the other variables. However, on a test with hundreds of problems, the CFA exam will look for unique and creative methods to test command of the material. A problem might provide both FV and PV and then ask you to solve for an unknown variable, either the interest rate (r), the number of periods (N) or the amount of the annuity (A). In most of these cases, a quick use of freshman-level algebra is all that's required. We'll cover two real-world applications - each was the subject of an example in the resource textbook, so either one has a reasonable chance of ending up in an exam problem.

Annualized Growth Rates
The first application is annualized growth rates. Taking the formula for the FV of a single sum of money and solving for r produces a formula that can also be viewed as the growth rate, or the rate at which that sum of money grew from PV to FV in N periods.

Formula 2.6
Growth rate (g) = (FV/PV)^(1/N) - 1

For example, if a company's earnings were $100 million five years ago, and are $200 million today, the annualized five-year growth rate is:

growth rate (g) = (FV/PV)^(1/N) - 1 = (200,000,000/100,000,000)^(1/5) - 1 = (2)^(1/5) - 1 = (1.1486984) - 1 = 14.87%

Monthly Mortgage Payments
The second application involves calculating monthly mortgage payments. Periodic mortgage payments fit the definition of an annuity payment (A), where the PV of the annuity is equal to the amount borrowed. (Note that if the loan is needed for a $300,000 home and the problem says the down payment is $50,000, make sure to reduce the amount borrowed, or PV, to $250,000! Plenty of folks will just grab the $300,000 number and plug it into the financial calculator.) Because mortgage payments are typically made monthly with interest compounded monthly, expect to adjust the annual interest rate (r) by dividing by 12, and to multiply the time periods by 12 if the mortgage loan period is expressed in years. Since the PV of an annuity = (annuity payment)*(PV annuity factor), we solve for the annuity payment (A), which will be the monthly payment:

Formula 2.7
Monthly mortgage payment = (Amount of the loan)/(PV annuity factor)

Example: Monthly Mortgage Payments
Assume a 30-year loan with monthly compounding (so N = 30*12 = 360 months) and a rate of 6% (so r = 0.06/12 = 0.005). We first calculate the PV annuity factor:

PV annuity factor = (1 - 1/(1 + r)^N)/r = (1 - 1/(1.005)^360)/0.005 = 166.7916

With a loan of $250,000, the monthly payment in this example would be $250,000/166.7916, or $1,498.88 a month.

Exam Tips and Tricks
Higher-level math functions usually don't end up on the test, partly because they would give an unfair advantage to those with higher-function calculators and because questions must be solved in an average of one to two minutes each at Level 1. Don't get bogged down with understanding natural logs or transcendental numbers.
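Both applications above can be sketched in a few lines (function names are illustrative):

```python
def growth_rate(pv: float, fv: float, n: float) -> float:
    """Formula 2.6: annualized growth rate from PV to FV over N periods."""
    return (fv / pv) ** (1 / n) - 1

def monthly_payment(loan: float, annual_rate: float, years: int) -> float:
    """Formula 2.7: loan amount divided by the PV annuity factor,
    with the rate and term converted to monthly units first."""
    r = annual_rate / 12
    n = years * 12
    pv_annuity_factor = (1 - 1 / (1 + r) ** n) / r
    return loan / pv_annuity_factor

# $100M earnings five years ago, $200M today -> about 14.87% annualized
g = growth_rate(100_000_000, 200_000_000, 5)

# $250,000 borrowed for 30 years at 6% -> about $1,498.88 a month
pmt = monthly_payment(250_000, 0.06, 30)
```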

II. RETIREMENT SAVINGS
Savings and retirement planning problems are sometimes more complicated, as there are various life-cycle stages that result in assumptions of uneven cash inflows and outflows. Problems of this nature often involve more than one computation of the basic time value formulas; thus the emphasis on drawing a timeline is sound advice, and a worthwhile habit to adopt even when solving problems that appear to be relatively simple.

Example: Retirement Savings
To illustrate, we take a hypothetical example of a client, 35 years old, who would like to retire at age 65 (30 years from today). Her goal is to have enough in her retirement account to provide an income of $75,000 a year, starting one year after retirement (year 31), for 25 years thereafter. She had a late start on saving for retirement, with a current balance of $10,000. To catch up, she is now committed to saving $5,000 a year, with the first contribution a year from now. A single parent with two children, both of whom will be attending college starting in five years, she won't be able to increase the annual $5,000 commitment until after the kids have graduated. Once the children are finished with college, she will have extra disposable income, but she is worried about just how much of an increase it will take to meet her ultimate retirement goal. To help her meet this goal, estimate how much she will need to save every year, starting 10 years from now, when the kids are out of college. Assume an average annual 8% return in the retirement account.

Answer:

To organize and summarize this information, we will need her three cash inflows to be the equivalent of her one cash outflow.

1. The money already in the account is the first inflow.
2. The money to be saved during the next 10 years is the second inflow.
3. The money to be saved between years 11 and 30 is the third inflow.
4. The money to be taken as income in years 31 to 55 is the one outflow.

All amounts are given to calculate inflows 1 and 2 and the outflow. The third inflow has an unknown annuity amount that will need to be determined using the other amounts. We start by drawing a timeline and specifying that all amounts be indexed at t = 30, her retirement day.
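The four legs above can be sketched programmatically as a cross-check on the hand calculation that follows (helper names are illustrative; everything is indexed to t = 30 at the assumed 8% return):

```python
r = 0.08

def fv_af(n: int) -> float:
    """FV annuity factor (Formula 2.4)."""
    return ((1 + r) ** n - 1) / r

def pv_af(n: int) -> float:
    """PV annuity factor (Formula 2.5)."""
    return (1 - 1 / (1 + r) ** n) / r

inflow1 = 10_000 * (1 + r) ** 30             # current balance, compounded to t = 30
inflow2 = 5_000 * fv_af(10) * (1 + r) ** 20  # years 1-10 savings, then compounded to t = 30
outflow = 75_000 * pv_af(25)                 # 25 years of income, valued at t = 30

needed = outflow - inflow1 - inflow2         # third inflow required at t = 30
annual_saving = needed / fv_af(20)           # level payment, years 11-30
```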

Next, calculate the three amounts for which we have all the necessary information, and index each to t = 30.

(inflow 1) FV (single sum) = PV*(1 + r)^N = ($10,000)*(1.08)^30 = $100,627

(inflow 2) FV annuity factor = ((1 + r)^N - 1)/r = ((1.08)^10 - 1)/0.08 = 14.48656. With a $5,000 payment, FV (annuity) = ($5,000)*(14.48656) = $72,433. This amount is what is accumulated at t = 10; we need to index it to t = 30: FV (single sum) = PV*(1 + r)^N = ($72,433)*(1.08)^20 = $337,606

(cash outflow) PV annuity factor = (1 - 1/(1 + r)^N)/r = (1 - 1/(1.08)^25)/0.08 = 10.674776. With a payment of $75,000, PV (annuity) = ($75,000)*(10.674776) = $800,608

Since the three cash inflows must equal the cash outflow, we have ($100,627) + ($337,606) + X = $800,608, or X = $362,375 at t = 30. In other words, the money she saves from years 11 through 30 will need to grow to $362,375 in order for her to meet her retirement goals.

FV annuity factor = ((1 + r)^N - 1)/r = ((1.08)^20 - 1)/0.08 = 45.76196
A = FV/FV annuity factor = ($362,375)/45.76196 = $7,919

We find that by increasing the annual savings from $5,000 to $7,919 starting in year 11 and continuing to year 30, she will be successful in accumulating enough for retirement.

How are Present Values, Future Values and Cash Flows connected?
The cash flow additivity principle allows us to add amounts of money together, provided they are indexed to the same period. The last example on retirement savings illustrates cash flow additivity: we were planning to accumulate a sum of money from three separate sources, and we needed to determine what the total amount would be so that the accumulated sum could be compared with the client's retirement cash outflow requirement. Our example involved cash flows from two separate annuity streams and one single lump sum that has

already accumulated. Comparing these inputs requires each amount to be indexed first, prior to adding them together. In the last example, the annuity we were planning to accumulate in years 11 to 30 was projected to reach $362,375 by year 30. The current savings initiative of $5,000 a year projects to $72,433 by year 10. Right now, at time 0, we have $10,000. In other words, we have three amounts at three different points in time. According to the cash flow additivity principle, these amounts cannot be added together until they have been either discounted back to a common date or compounded ahead to a common date. We chose t = 30 in the example because it made the calculations the simplest, but any point in time could have been chosen. The most common date chosen to apply cash flow additivity is t = 0 (i.e. discount all expected inflows and outflows to the present time). This principle is frequently tested on the CFA exam, which is why the technique of drawing timelines and choosing an appropriate time to index has been emphasized here.

2.6 - Net Present Value and the Internal Rate of Return

This section applies the techniques and formulas first presented in the time value of money material to real-world situations faced by financial analysts. Three topics are emphasized: (1) capital budgeting decisions, (2) performance measurement and (3) U.S. Treasury-bill yields.

Net Present Value
NPV and IRR are two methods for making capital-budgeting decisions, or choosing between alternate projects and investments, when the goal is to increase the value of the enterprise and maximize shareholder wealth. Defining the NPV method is simple: it is the present value of cash inflows minus the present value of cash outflows, which arrives at a dollar amount that is the net benefit to the organization. To compute NPV and apply the NPV rule, the authors of the reference textbook define a five-step process to be used in solving problems:

1. Identify all cash inflows and cash outflows.
2. Determine an appropriate discount rate (r).
3. Use the discount rate to find the present value of all cash inflows and outflows.
4. Add together all present values. (From the section on cash flow additivity, we know that this action is appropriate, since the cash flows have been indexed to t = 0.)
5. Make a decision on the project or investment using the NPV rule: say yes to a project if the NPV is positive; say no if the NPV is negative. As a tool for choosing among alternatives, the NPV rule would prefer the investment with the higher positive NPV.

Companies often use the weighted average cost of capital (WACC) as the appropriate discount rate for capital projects. The WACC is a function of a firm's capital structure (common and preferred stock and long-term debt) and the required rates of return for these securities. CFA exam problems will either give the discount rate, or they may give a WACC.

Example:
To illustrate, assume we are asked to use the NPV approach to choose between two projects, and our company's weighted average cost of capital is 8%. Project A costs $7 million in upfront costs and will generate $3 million in annual income starting three years from now and continuing for a five-year period (i.e. years 3 to 7). Project B costs $2.5 million upfront and $2 million in each of the next three years (years 1 to 3). It generates no annual income but will be sold six years from now for a sales price of $16 million. For each project, find NPV = (PV inflows) - (PV outflows).

Project A: The present value of the outflows is equal to the current cost of $7 million. The inflows can be viewed as an annuity with the first payment in three years: an ordinary annuity indexed at t = 2, since ordinary annuities always start the first cash flow one period away.

PV annuity factor for r = 0.08, N = 5: (1 - 1/(1 + r)^N)/r = (1 - 1/(1.08)^5)/0.08 = (1 - 1/1.469328)/0.08 = (0.319417)/0.08 = 3.99271

Multiplying by the annuity payment of $3 million, the value of the inflows at t = 2 is ($3 million)*(3.99271) = $11.978 million. Discounting back two periods, PV inflows = ($11.978 million)/(1.08)^2 = $10.269 million.

NPV (Project A) = ($10.269 million) - ($7 million) = $3.269 million

Project B: The inflow is the present value of a lump sum, the sales price in six years discounted to the present: $16 million/(1.08)^6 = $10.083 million. The cash outflow is the sum of the upfront cost and the discounted costs from years 1 to 3.
We first solve for the costs in years 1 to 3, which fit the definition of an annuity.

PV annuity factor for r = 0.08, N = 3: (1 - 1/(1.08)^3)/0.08 = (1 - 1/1.259712)/0.08 = (0.206168)/0.08 = 2.577097

PV of the annuity = ($2 million)*(2.577097) = $5.154 million. PV of outflows = ($2.5 million) + ($5.154 million) = $7.654 million.

NPV of Project B = ($10.083 million) - ($7.654 million) = $2.429 million

Applying the NPV rule, we choose Project A, which has the larger NPV: $3.269 million versus $2.429 million.

Exam Tips and Tricks
Problems on the CFA exam are frequently set up so that it is tempting to pick the choice that seems intuitively better (i.e. by people who are guessing), even though it is wrong by NPV rules. In the case above, Project B had lower costs upfront ($2.5 million versus $7 million) with a payoff of $16 million, which is more than the combined $15 million payoff of Project A. Don't rely on what feels better; use the process to make the decision!
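As a cross-check on the two projects above, a minimal NPV sketch (cash flows keyed by year; the helper name is illustrative):

```python
def npv(rate: float, cash_flows: dict) -> float:
    """Sum of all cash flows discounted to t = 0; outflows negative, inflows positive."""
    return sum(cf / (1 + rate) ** t for t, cf in cash_flows.items())

# Project A: $7M today, $3M income in years 3 through 7
project_a = {0: -7e6, **{t: 3e6 for t in range(3, 8)}}

# Project B: $2.5M today, $2M costs in years 1-3, $16M sale in year 6
project_b = {0: -2.5e6, 1: -2e6, 2: -2e6, 3: -2e6, 6: 16e6}

npv_a = npv(0.08, project_a)    # about $3.27 million
npv_b = npv(0.08, project_b)    # about $2.43 million
```

Discounting every cash flow individually gives the same answers as the annuity-factor shortcuts used in the worked example.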

The Internal Rate of Return
The IRR, or internal rate of return, is defined as the discount rate that makes NPV = 0. Like the NPV process, it starts by identifying all cash inflows and outflows. However, instead of relying on external data (i.e. a discount rate), the IRR is purely a function of the inflows and outflows of that project. The IRR rule states that projects or investments are accepted when the project's IRR exceeds a hurdle rate. Depending on the application, the hurdle rate may be defined as the weighted average cost of capital.

Example: Suppose a project costs $10 million today and will provide a $15 million payoff three years from now. We use the FV of a single-sum formula and solve for r to compute the IRR:

IRR = (FV/PV)^(1/N) - 1 = (15 million/10 million)^(1/3) - 1 = (1.5)^(1/3) - 1 = (1.1447) - 1 = 0.1447, or 14.47%

In this case, as long as our hurdle rate is less than 14.47%, we green-light the project.

NPV vs. IRR
Each of the two rules used for making capital-budgeting decisions has its strengths and weaknesses. The NPV rule chooses a project in terms of net dollars or net financial impact on the company, so it can be easier to use when allocating capital. However, it requires an assumed discount rate, and it also assumes that this rate will be stable over the life of the project and that cash inflows can be reinvested at the same discount rate. In the real world, those assumptions can break down, particularly in periods when interest rates are fluctuating. The appeal of the IRR rule is that a discount rate need not be assumed: the worthiness of the investment is purely a function of the internal inflows and outflows of that particular investment. However, IRR does not assess the financial impact on a firm; it only requires meeting a minimum return rate.

The NPV and IRR methods can rank two projects differently, depending on the size of the investment.
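For a single future payoff, the IRR calculation above has a closed form; a minimal sketch (the function name is illustrative):

```python
def irr_single_sum(pv: float, fv: float, n: float) -> float:
    """IRR for one outflow today and one inflow at period n: (FV/PV)**(1/n) - 1."""
    return (fv / pv) ** (1 / n) - 1

# $10 million today, $15 million in three years -> about 14.47%
irr = irr_single_sum(10e6, 15e6, 3)

# IRR rule: accept when the IRR exceeds the hurdle rate (e.g. a 10% WACC)
accept = irr > 0.10
```

With multiple cash flows there is no closed form and the IRR must be found numerically, as in the money-weighted return example later in this chapter.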
Consider the case presented below, with cash flows discounted at 6%:

Project                  A            B
Initial outflow          $250,000     $50,000
Payoff after one year    $280,000     $60,000
IRR                      12%          20%
NPV                      +$14,151     +$6,604
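The ranking conflict in this table can be reproduced in a few lines. This is a sketch with our own helper names:

```python
def npv(rate, cash_flows):
    # cash_flows[t] = net flow at the end of period t (index 0 = today)
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def one_period_irr(outflow, payoff):
    # with a single payoff one period out, IRR = payoff/outflow - 1
    return payoff / outflow - 1

project_a = [-250_000, 280_000]
project_b = [-50_000, 60_000]
print(f"IRR A: {one_period_irr(250_000, 280_000):.0%}, "
      f"IRR B: {one_period_irr(50_000, 60_000):.0%}")    # IRR A: 12%, IRR B: 20%
print(f"NPV A: {npv(0.06, project_a):,.0f}, "
      f"NPV B: {npv(0.06, project_b):,.0f}")             # NPV A: 14,151, NPV B: 6,604
```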

By the NPV rule we choose Project A; by the IRR rule we prefer B. How do we resolve the conflict if we must choose one or the other? The convention is to use the NPV rule when the two methods are inconsistent, as it better reflects our primary goal: to grow the financial wealth of the company. Consequences of the IRR Method In the previous section we demonstrated how smaller projects can have higher IRRs but less financial impact. The timing of cash flows also affects the IRR method. Consider the example below, in which the initial investments are identical. Project A has a smaller payout and less financial impact (a lower NPV), but since the payout is received sooner, it has a higher IRR. When inconsistencies arise, NPV is the preferred method; assessing the financial impact is a more meaningful indicator for a capital-budgeting decision.

Project    Investment (t1)    Income t2    Income t3    Income t4    Income t5    Income t6    IRR      NPV
A          $100k              $125k        $0           $0           $0           $0           25.0%    $17,925
B          $100k              $0           $0           $0           $0           $200k        14.9%    $49,452

2.7 - Money Vs. Time-Weighted Return Money-weighted and time-weighted rates of return are two methods of measuring performance, or the rate of return on an investment portfolio. Each approach has particular instances in which it is the preferred method. Given the priority placed on performance returns in today's environment (particularly when comparing and evaluating money managers), the CFA exam is certain to test whether a candidate understands each methodology. Money-Weighted Rate of Return A money-weighted rate of return is identical in concept to an internal rate of return: it is the discount rate at which NPV = 0, or at which the present value of inflows equals the present value of outflows. Recall that for the IRR method, we start by identifying all cash inflows and outflows. When applied to an investment portfolio: Outflows: 1. The cost of any investment purchased. 2. Reinvested dividends or interest. 3. Withdrawals. Inflows: 1. The proceeds from any investment sold. 2. Dividends or interest received. 3. Contributions. Example: Each inflow or outflow must be discounted back to the present using a rate (r) that makes PV (inflows) = PV (outflows). For example, take a case where we buy one share of a stock for $50, collect an annual $2 dividend, and sell the share after two years for $65. Our money-weighted rate of return will be the rate that satisfies the following equation: PV outflows = PV inflows: $2/(1 + r) + $2/(1 + r)^2 + $65/(1 + r)^2 = $50. Solving for r using a spreadsheet or financial calculator, we have a money-weighted rate of return of 17.78%. Exam Tips and Tricks Note that the exam will test knowledge of the concept of money-weighted return, but any computations should not require the use of a financial calculator. It's important to understand the main limitation of the money-weighted return as a tool for evaluating managers.
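Before turning to that limitation, here is a sketch of how the 17.78% in the example above can be found numerically. The bisection solver below is our own illustrative implementation (a spreadsheet or financial calculator IRR function does the same job):

```python
def money_weighted_return(cash_flows, lo=-0.99, hi=10.0, tol=1e-10):
    """Find r with zero NPV by bisection; cash_flows[t] is the net flow
    at time t (outflows negative). Assumes NPV changes sign on [lo, hi]."""
    def npv(r):
        return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid        # NPV still positive: the rate must be higher
        else:
            hi = mid
    return (lo + hi) / 2

# Buy at $50; receive $2 dividends in years 1 and 2; sell for $65 in year 2
r = money_weighted_return([-50, 2, 2 + 65])
print(f"{r:.2%}")   # 17.78%
```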
As defined earlier, the money-weighted rate of return factors in all cash flows, including contributions and withdrawals. When a money-weighted return is calculated over many periods, the formula tends to place a greater weight on performance in the periods when the account size is largest (hence the label money-weighted). In practice, if a manager's best years occur when an account is small, and then (after the client deposits more funds) market conditions become more unfavorable, the money-weighted measure doesn't treat the manager fairly. Put another way: say the account makes annual withdrawals to provide a retiree with income, and the manager does relatively poorly in the early years (when the account is larger) but improves in later periods, after distributions have reduced the account's size. Should the manager be penalized for something beyond his or her control? Deposits and withdrawals are usually outside a manager's control; thus, a better performance measurement tool is needed to judge a manager more fairly and allow for comparisons with peers – one that isolates the manager's investment actions and does not penalize for deposit/withdrawal activity. Time-Weighted Rate of Return The time-weighted rate of return is the preferred industry standard, as it is not sensitive to contributions or withdrawals. It is defined as the compounded growth rate of $1 over the period being measured. The time-weighted formula is essentially a geometric mean of a number of holding-period returns that are linked together, or compounded, over time (thus, time-weighted). The holding-period return, or HPR (the rate of return for one period), is computed using this formula: Formula 2.8 HPR = (MV1 – MV0 + D1 – CF1)/MV0 Where: MV0 = beginning market value, MV1 = ending market value, D1 = dividend/interest inflows, CF1 = cash flow received at period end (deposits subtracted, withdrawals added back). For time-weighted performance measurement, the total period to be measured is broken into many sub-periods, with a sub-period ending (and the portfolio priced) on any day with significant contribution or withdrawal activity, or at the end of the month or quarter.
Sub-periods can cover any length of time chosen by the manager and need not be uniform. A holding-period return is computed, using the above formula, for all sub-periods. Linking (or compounding) the HPRs is done by (a) adding 1 to each sub-period HPR, (b) multiplying all the (1 + HPR) terms together and (c) subtracting 1 from the product: Compounded time-weighted rate of return, for N holding periods = [(1 + HPR1)*(1 + HPR2)*(1 + HPR3) … *(1 + HPRN)] – 1. The annualized rate of return takes the compounded time-weighted rate and standardizes it by computing a geometric average of the linked holding-period returns: Formula 2.9 Annualized rate of return = (1 + compounded rate)^(1/Y) – 1 Where: Y = total time in years.
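The linking and annualizing recipe can be sketched as code. The helper names are ours; the five HPRs below are the ones computed in the worked example that follows:

```python
from math import prod

def time_weighted_return(hprs):
    # (a) add 1 to each HPR, (b) multiply all terms, (c) subtract 1
    return prod(1 + r for r in hprs) - 1

def annualize(compounded, years):
    # Formula 2.9: geometric average of the linked return
    return (1 + compounded) ** (1 / years) - 1

hprs = [-0.0075, 0.0178, 0.0100, 0.0946, 0.0288]
twr = time_weighted_return(hprs)
print(f"{twr:.2%}")    # 14.89%
# chaining a 20% prior-year return in front and annualizing over two years
print(f"{annualize(1.20 * (1 + twr) - 1, 2):.2%}")    # 17.42%
```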

Example: Time-Weighted Portfolio Return Consider the following example. A portfolio was priced at the following values on the quarter-end dates indicated:

Date             Market Value
Dec. 31, 2003    $200,000
March 31, 2004   $196,500
June 30, 2004    $200,000
Sept. 30, 2004   $243,000
Dec. 31, 2004    $250,000

During the first quarter of 2004, the annual account fee of $2,000 was deducted. On July 30, 2004, the annual contribution of $20,000 was received, which boosted the account value to $222,000 on July 30. How would we calculate a time-weighted rate of return for 2004? Answer: For this example, the year is broken into holding-period returns calculated for each quarter. Also, since a significant contribution of $20,000 was received intra-period, we will need to calculate two holding-period returns for the third quarter: June 30, 2004, to July 30, 2004, and July 30, 2004, to Sept. 30, 2004. In total, there are five HPRs that must be computed using the formula HPR = (MV1 – MV0 + D1 – CF1)/MV0. Note that since D1, or dividend payments, are already factored into the ending-period value, this term is not needed for the computation. On a test problem, if dividends or interest are shown separately, simply add them to the ending-period value. The calculations are done below (dollar amounts in thousands): Period 1 (Dec. 31, 2003, to Mar. 31, 2004): HPR = ($196.5 – $200 – (–$2))/$200 = (–1.5)/200 = –0.75%. Period 2 (Mar. 31, 2004, to June 30, 2004): HPR = ($200 – $196.5)/$196.5 = 3.5/196.5 = +1.78%. Period 3 (June 30, 2004, to July 30, 2004): HPR = ($222 – $200 – $20)/$200 = 2/200 = +1.00%. Period 4 (July 30, 2004, to Sept. 30, 2004): HPR = ($243 – $222)/$222 = 21/222 = +9.46%. Period 5 (Sept. 30, 2004, to Dec. 31, 2004): HPR = ($250 – $243)/$243 = 7/243 = +2.88%. Now we link the five periods together – adding 1 to each HPR, multiplying all the terms and subtracting 1 from the product – to find the compounded time-weighted rate of return: 2004 return = ((1 + (–0.0075))*(1 + 0.0178)*(1 + 0.01)*(1 + 0.0946)*(1 + 0.0288)) – 1 = ((0.9925)*(1.0178)*(1.01)*(1.0946)*(1.0288)) – 1 = 1.148949 – 1 = 0.148949, or 14.89% (rounding to the

nearest 1/100 of a percent). Annualizing: Because our compounded calculation covered exactly one year, the annualized figure is the same, +14.89%. If the same portfolio had a 2003 return of 20%, the two-year compounded number would be ((1 + 0.20)*(1 + 0.1489)) – 1, or 37.87%. Annualize by adding 1, raising to the 1/Y power and subtracting 1: (1.3787)^(1/2) – 1 = 17.42%. Note: the annualized number is the same as a geometric average, a concept covered in the statistics section. Example: Money-Weighted Returns Calculating money-weighted returns will usually require the use of a financial calculator when there are cash flows more than one period in the future. Earlier we presented a case where a money-weighted return over two periods was equal to the IRR, the rate at which NPV = 0. Answer: For money-weighted returns covering a single period, we know PV (inflows) – PV (outflows) = 0. If we pay $100 for a stock today, sell it one year later for $105, and collect a $2 dividend, the money-weighted return, or IRR, satisfies ($105)/(1 + r) + ($2)/(1 + r) – $100 = $0, so r = ($105 + $2)/$100 – 1, or 7%. The money-weighted return equals the time-weighted return for a single period in which the cash flow is received at the end. If the period is any time frame other than one year, add 1 to the result, raise it to the 1/Y power and subtract 1 to find the annualized return. 2.8 - Calculating Yield Calculating Yield for a U.S. Treasury Bill A U.S. Treasury bill is the classic example of a pure discount instrument: the interest the government pays is the difference between the amount it promises to pay back at maturity (the face value) and the amount it borrowed when issuing the T-bill (the discounted price). T-bills are short-term debt instruments (by definition, they have less than one year to maturity), and they carry essentially zero default risk, given the U.S. government guarantee. After being issued, T-bills are widely traded in the secondary market and are quoted on the basis of the bank discount yield (i.e. the approximate annualized return the buyer should expect if holding until maturity). The bank discount yield (rBD) can be computed as follows: Formula 2.10 rBD = D/F * 360/t Where: D = dollar discount from face value, F = face value, t = days until maturity, 360 = days in a year. By bank convention, years are 360 days long, not 365. If you recall the joke about banker's hours being shorter than regular business hours, you should remember that banker's years are also shorter. For example, if a T-bill has a face value of $50,000, a current market price of $49,700 and a maturity in 100 days, we have:

rBD = D/F * 360/t = ($50,000 – $49,700)/$50,000 * 360/100 = 300/50,000 * 3.6 = 2.16%. On the exam, you may be asked to compute the market price given a quoted yield, which can be accomplished by using the same formula and solving for D: Formula 2.11 D = rBD * F * t/360 Example: Using the previous example, if we have a bank discount yield of 2.16%, a face value of $50,000 and 100 days to maturity, then we calculate D as follows: D = (0.0216)*(50,000)*(100/360) = 300. Market price = F – D = 50,000 – 300 = $49,700. Holding-Period Yield (HPY) HPY refers to the un-annualized rate of return one receives for holding a debt instrument until maturity. The formula is essentially the same as the holding-period return used to compute time-weighted performance. The HPY computation provides for one cash distribution or interest payment to be made at the time of maturity, a term that can be omitted for U.S. T-bills: Formula 2.12 HPY = (P1 – P0 + D1)/P0 Where: P0 = purchase price, P1 = price at maturity and D1 = cash distribution at maturity. Example: Taking the data from the previous example, we illustrate the calculation of HPY: HPY = (P1 – P0 + D1)/P0 = (50,000 – 49,700 + 0)/49,700 = 300/49,700 = 0.006036, or 0.6036%. Effective Annual Yield (EAY) EAY takes the HPY and annualizes the number to facilitate comparability with other investments. It uses the same logic presented earlier in describing how to annualize a compounded return: (1) add 1 to the HPY, (2) compound forward to one year by raising to the 365/t power, where t is days to maturity, and (3) subtract 1. Expressed as a formula: Formula 2.13 EAY = (1 + HPY)^(365/t) – 1 Example: Continuing with our example T-bill, we have:

EAY = (1 + HPY)^(365/t) – 1 = (1 + 0.006036)^(365/100) – 1 = 2.22%. Remember that EAY > bank discount yield, for three reasons: (a) the yield is based on purchase price, not face value, (b) it is annualized with compound interest (interest on interest), not simple interest, and (c) it is based on a 365-day year rather than 360 days. Be prepared to compare these two measures of yield and to use these three reasons to explain why EAY is preferable. The third measure of yield is the money market yield, also known as the CD equivalent yield, denoted rMM. This yield measure can be calculated in two ways: 1. When the HPY is given, rMM is the annualized yield based on a 360-day year:

Formula 2.14 rMM = (HPY)*(360/t) Where: t = days to maturity. For our example, we computed HPY = 0.6036%, so the money market yield is: rMM = (HPY)*(360/t) = (0.6036%)*(360/100) = 2.173%. 2. When the bond price is unknown, the bank discount yield can be used to compute the money market yield, using this expression: Formula 2.15 rMM = (360*rBD)/(360 – (t*rBD)) Using our case: rMM = (360*0.0216)/(360 – (100*0.0216)) = 2.173%, which is identical to the result we arrived at using HPY. Interpreting Yield This involves essentially nothing more than algebra: solve for the unknown and plug in the known quantities. You must be able to use these formulas to find a yield expressed one way when the provided yield number is expressed another way. Since HPY is common to the other two measures (EAY and the money market yield), know how to solve for HPY to answer a question:

Measure                   From HPY                       Solving for HPY
Effective Annual Yield    EAY = (1 + HPY)^(365/t) – 1    HPY = (1 + EAY)^(t/365) – 1
Money Market Yield        rMM = (HPY)*(360/t)            HPY = rMM*(t/360)
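The yield measures for the example T-bill can be reproduced with a few one-line helpers. The function names are our own, not curriculum notation:

```python
def bank_discount_yield(face, price, days):
    return (face - price) / face * 360 / days      # Formula 2.10

def effective_annual_yield(hpy, days):
    return (1 + hpy) ** (365 / days) - 1           # Formula 2.13

def money_market_yield(hpy, days):
    return hpy * 360 / days                        # Formula 2.14

face, price, days = 50_000, 49_700, 100
hpy = (face - price) / price                       # Formula 2.12, no coupon
print(f"{bank_discount_yield(face, price, days):.2%}")   # 2.16%
print(f"{hpy:.4%}")                                      # 0.6036%
print(f"{effective_annual_yield(hpy, days):.2%}")        # 2.22%
print(f"{money_market_yield(hpy, days):.3%}")            # 2.173%
```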

The bond equivalent yield is simply the yield stated on a semiannual basis multiplied by 2. Thus, if you are given a semiannual yield of 3% and asked for the bond equivalent yield, the answer is 6%. 2.9 - Statistical Concepts And Market Returns The term statistics is very broad. In some contexts it is used to refer to specific data. Statistics is also a branch of mathematics, a field of study – essentially the analysis tools and methods that are applied to data. Data, by itself, is nothing more than quantities and numbers. With statistics, data can be transformed into useful information and can be the basis for understanding and making intelligent comparisons and decisions. Basics

Descriptive Statistics - Descriptive statistics are tools used to summarize and consolidate large masses of numbers and data so that analysts can get their hands around them, understand them and use them. The learning outcomes in this section of the guide (i.e. the statistics section) are focused on descriptive statistics. Inferential Statistics - Inferential statistics are tools used to draw larger generalizations from observing a smaller portion of the data. In basic terms, descriptive statistics intend to describe; inferential statistics intend to draw inferences. We will use inferential statistics in section D, Probability Concepts, later in this chapter.

Population Vs. Sample A population refers to every member of a group, while a sample is a smaller subset of the population. Sampling is used when the task of observing the entire population is either impossible or impractical. Drawing a sample is intended to produce a smaller group with the same or similar characteristics as the population, which can then be used to learn more about the whole population. Parameters and Sample Statistics A parameter is a descriptive measure of an entire population. Mean, range and variance are all commonly used parameters that summarize and describe the population. Determining the precise value of any parameter requires observing every single member of the population. Since this exercise can be impossible or impractical, we use sampling techniques, which draw a sample that (the analyst hopes) represents the population. Quantities taken from a sample to describe its characteristics (e.g. mean, range and variance) are termed sample statistics. In short: a population is described by parameters; a sample is described by sample statistics.

Measurement Scales Data is measured and assigned to specific points based on a chosen scale. A measurement scale can fall into one of four categories:

1. Nominal - This is the weakest level, as the only purpose is to categorize data but not rank it in any way. For example, in a database of mutual funds, we can use a nominal scale to assign a number identifying fund style (e.g. 1 for large-cap value, 2 for large-cap growth, 3 for foreign blend, etc.). Nominal scales don't lend themselves to descriptive tools – in the mutual fund example, we would not report the average fund style as 5.6 with a standard deviation of 3.2. Such descriptions are meaningless for nominal scales.

2. Ordinal - This category is considered stronger than nominal, as the data is categorized according to a rank that describes relative standing. Examples of ordinal scales include the mutual fund star rankings (Morningstar 1 through 5 stars), or assigning a fund a rating between 1 and 10 based on its five-year performance within its category (e.g. 1 for the top 10%, 2 for funds between 10% and 20%, and so forth). An ordinal scale doesn't fully describe relative differences – in the 1-to-10 performance ranking, there may be a wide performance gap between 1 and 2, but virtually none between 6, 7 and 8.

3. Interval - This is a step stronger than the ordinal scale, as the intervals between data points are equal, and data points can be added and subtracted. Temperature is measured on interval scales (Celsius and Fahrenheit): the difference between 25 and 30 degrees is the same as the difference between 85 and 90 degrees. However, interval scales have no true zero point – zero degrees Celsius doesn't indicate an absence of temperature; it's simply the point at which water freezes. Without a true zero point, ratios are meaningless – for example, nine degrees is not three times as hot as three degrees.

4. Ratio - This category represents the strongest level of measurement, with all the features of interval scales plus a true zero point, which gives meaning to ratios on the scale. Most scales used by financial analysts are ratio scales, including time (e.g. days to maturity for bonds), money (e.g. earnings per share for a set of companies) and rates of return expressed as a percentage.
Frequency Distribution A frequency distribution seeks to describe large data sets by doing four things: (1) establishing a series of intervals as categories, (2) assigning every data point in the population to one of the categories, (3) counting the number of observations within each category and (4) presenting the data with each assigned category and the frequency of observations in each category. Frequency distribution is one of the simplest methods used to describe populations of data and can be used for all four measurement scales – indeed, it is often the best and only way to describe data measured on a nominal, ordinal or interval scale. Frequency distributions are sometimes used for equity index returns over a long history – e.g. the S&P 500 annual or quarterly returns grouped into a series of return intervals. 2.10 - Basic Statistical Calculations Holding Period Return The holding-period return formula was introduced previously in the discussion of time-weighted return measurement. The same formula applies to frequency distributions (with the descriptions changed slightly): Formula 2.16 Rt = (Pt – Pt-1 + Dt)/Pt-1 Where: Rt = holding-period return for time period t, Pt = price of asset at end of time period t, Pt-1 = price of asset at end of time period (t – 1), Dt = cash distributions received during time period t. Relative and Cumulative Frequencies Relative frequency is calculated by dividing the absolute frequency of a particular interval by the total number of observations. Cumulative relative frequency adds relative frequencies together to show the percentage of observations that fall at or below a certain point. For an illustration of calculating relative frequency and cumulative relative frequency, refer to the following frequency distribution of quarterly returns over the last 10 years for a mutual fund:

Quarterly return interval   Absolute frequency   Relative frequency   Cumulative absolute frequency   Cumulative relative frequency
–15% to –10%                2                    5.0%                 2                               5.0%
–10% to –5%                 1                    2.5%                 3                               7.5%
–5% to 0%                   5                    12.5%                8                               20.0%
0% to +5%                   17                   42.5%                25                              62.5%
+5% to +10%                 10                   25.0%                35                              87.5%
+10% to +15%                2                    5.0%                 37                              92.5%
+15% to +20%                3                    7.5%                 40                              100.0%

There are 40 observations in this distribution (the last 10 years, four quarters per year), and the relative frequency is found by dividing the number in the second column by 40. The cumulative absolute frequency (fourth column) is constructed by adding the frequencies of all observations at or below that point. So for the fifth interval, +5% to +10%, we find the cumulative absolute frequency by adding the absolute frequency in the fifth interval and all previous intervals: 2 + 1 + 5 + 17 + 10 = 35. The last column, cumulative relative frequency, takes the number in the fourth column and divides it by 40, the total number of observations. Histograms and Frequency Polygons A histogram is a frequency distribution presented as a bar chart, with the number of observations on the Y axis and the intervals on the X axis. The frequency distribution above is presented as a histogram in figure 2.2 below:

Figure 2.2: Histogram

A frequency polygon presents the same data as a line chart rather than a bar chart. Here is the data from the frequency distribution above, presented as a frequency polygon:

Figure 2.3: Frequency Polygon
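The relative and cumulative frequency columns in the table above can be regenerated with a short loop. This is an illustrative sketch (the variable names are ours):

```python
from itertools import accumulate

intervals = ["-15% to -10%", "-10% to -5%", "-5% to 0%", "0% to +5%",
             "+5% to +10%", "+10% to +15%", "+15% to +20%"]
counts = [2, 1, 5, 17, 10, 2, 3]        # absolute frequencies
total = sum(counts)                     # 40 quarterly observations
cum_abs = list(accumulate(counts))      # cumulative absolute frequencies

for label, n, c in zip(intervals, counts, cum_abs):
    print(f"{label:>13}  rel={n / total:.1%}  "
          f"cum_abs={c:2d}  cum_rel={c / total:.1%}")
```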

Look Out! You may be asked to describe the data presented in a histogram or frequency polygon. Most likely this would involve evaluating risk – for example, noting that there are two instances of the most negative outcomes (i.e. quarters below –10%, the first interval). You may also be asked how normally distributed the graph appears. Normal distributions are detailed later in this study guide. Central Tendency The term "measures of central tendency" refers to the various methods used to describe where large groups of data are centered in a population or a sample. Put another way: if we were to pull one value or observation from a population or sample, what would we typically expect that value to be? Various methods are used to calculate central tendency. The most frequently used is the arithmetic mean: the sum of observations divided by the number of observations. Example: Arithmetic Mean Suppose we have 20 quarters of return data: –1.5%, –2.5%, +5.6%, +10.7%, +0.8%, –7.7%, –10.1%, +2.2%, +12.0%, +10.9%, –2.6%, +0.2%, –1.9%, –6.2%, +17.1%, +4.8%, +9.1%, +3.0%, –0.2%, +1.8%. We find the arithmetic mean by adding the 20 observations together, then dividing by 20: ((–1.5%) + (–2.5%) + 5.6% + 10.7% + 0.8% + (–7.7%) + (–10.1%) + 2.2% + 12.0% + 10.9% + (–2.6%) + 0.2% + (–1.9%) + (–6.2%) + 17.1% + 4.8% + 9.1% + 3.0% + (–0.2%) + 1.8%) = 45.5%. Arithmetic mean = 45.5%/20 = 2.275%. The mean is usually interpreted as answering the question of what outcome is most likely, or what represents the data most fairly. The arithmetic mean formula is used to compute the population mean (often denoted by the Greek symbol μ), which is the arithmetic mean of the entire population. The population mean is an example of a parameter, and by definition it must be unique: a given population can have only one mean. The sample mean (denoted by X-bar) is the arithmetic mean of a sample. It is an example of a sample statistic, and it will be unique to a particular sample: five samples drawn from the same population may produce five different sample means. While the arithmetic mean is the most frequently used measure of central tendency, it has shortcomings that can make it misleading in describing a population or sample. In particular, the arithmetic mean is sensitive to extreme values. Example: Say we have the following five observations: –9000, 1.4, 1.6, 2.4 and 3.7. The arithmetic mean is –1798.2 [(–9000 + 1.4 + 1.6 + 2.4 + 3.7)/5], yet –1798.2 has little meaning in describing our data set. The outlier (–9000) drags down the overall mean. Statisticians use a variety of methods to compensate for outliers, such as eliminating the highest and lowest values before calculating the mean.
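The two mean calculations above can be checked in a few lines; the trimming step (dropping the highest and lowest values) is sketched with Python's `sorted` slicing:

```python
returns = [-1.5, -2.5, 5.6, 10.7, 0.8, -7.7, -10.1, 2.2, 12.0, 10.9,
           -2.6, 0.2, -1.9, -6.2, 17.1, 4.8, 9.1, 3.0, -0.2, 1.8]
mean = sum(returns) / len(returns)
print(round(mean, 3))                         # 2.275 (percent)

# Outlier sensitivity: the five-observation example, then a trimmed
# mean that drops the highest and lowest values first
data = [-9000, 1.4, 1.6, 2.4, 3.7]
trimmed = sorted(data)[1:-1]
print(round(sum(data) / len(data), 1))        # -1798.2
print(round(sum(trimmed) / len(trimmed), 1))  # 1.8
```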
For example, by dropping –9000 and 3.7, the three remaining observations have a mean of 1.8, a more meaningful description of the data. Another approach is to use either the median or the mode, or both. Weighted Average or Mean The weighted average, or weighted mean, when applied to a portfolio, takes the mean return of each asset class and weights it by the allocation to each class. Say a portfolio manager has the following allocations, with the mean annual performance return achieved for each class:

Asset Class                          Portfolio weight   Mean annual return
U.S. Large Cap                       30%                9.6%
U.S. Mid Cap                         15%                11.2%
U.S. Small Cap                       10%                7.4%
Foreign (Developed Mkts.)            15%                8.8%
Emerging Markets                     8%                 14.1%
Fixed Income (short/intermediate)    12%                4.1%
Fixed Income (long maturities)       7%                 6.6%
Cash/Money Market                    3%                 2.1%

The weighted mean is calculated by weighting the return on each class and summing: Portfolio return = (0.30)*(0.096) + (0.15)*(0.112) + (0.10)*(0.074) + (0.15)*(0.088) + (0.08)*(0.141) + (0.12)*(0.041) + (0.07)*(0.066) + (0.03)*(0.021) = 8.765%. Median The median is defined as the middle value in a series sorted in either ascending or descending order. In the example above with five observations, the median, or middle value, is 1.6 (i.e. two values fall below 1.6, and two values above it). In this case, the median is a much fairer indication of the data than the mean of –1798.2. Mode The mode is defined as the value that is most frequently observed. In some applications, the mode is the most meaningful description. Take a case with a portfolio of ten mutual funds and their respective ratings: 5, 4, 4, 4, 4, 4, 4, 3, 2 and 1. The arithmetic mean rating is 3.5 stars, but in this example the modal rating of 4 describes the majority of observations and might be seen as a fairer description. Weighted Mean The weighted mean is frequently seen in portfolio problems in which various asset classes are weighted within the portfolio – for example, if stocks comprise 60% of a portfolio, then 0.6 is the weight. A weighted mean is computed by multiplying each mean by its weight and then summing the products. Take an example where stocks are weighted 60%, bonds 30% and cash 10%. Assume that the stock portion returned 10%, bonds returned 6% and cash returned 2%. The portfolio's weighted mean return is: Stocks (wtd) + Bonds (wtd) + Cash (wtd) = (0.6)*(0.1) + (0.3)*(0.06) + (0.1)*(0.02) = (0.06) + (0.018) + (0.002) = 0.08, or 8%. Geometric Mean We introduced the geometric mean earlier in the computations for time-weighted performance. It is usually applied to data in percentages: rates of return over time, or growth rates. With a series of n observations of statistic X, the geometric mean (G) is: Formula 2.17

G = (X1*X2*X3*X4 … *Xn)^(1/n). So if we have a four-year period in which a company's sales grew by 4%, 5%, –3% and 10%, the geometric mean growth rate is: G = ((1.04)*(1.05)*(0.97)*(1.1))^(1/4) – 1 = 3.9%. It's important to gain experience using the geometric mean on percentages, which involves linking the data together: (1) add 1 to each percentage, (2) multiply all terms together, (3) carry the product to the 1/n power and (4) subtract 1 from the result. The harmonic mean is computed by the following steps: 1. Take the reciprocal of each observation, or 1/X. 2. Add these terms together. 3. Average the sum by dividing by n, the total number of observations. 4. Take the reciprocal of the result. The harmonic mean is most associated with questions about dollar cost averaging, but its use is limited. The arithmetic mean, weighted mean and geometric mean are the most frequently used measures and should be the main emphasis of study. Quartiles, Quintiles, Deciles and Percentiles These terms are most associated with cases where the point of central tendency is not the main goal of the research study. For example, in a distribution of five-year performance returns for money managers, we may not be interested in the median performer (i.e. the manager at the 50% level), but rather in those in the top 10% or top 20% of the distribution. Recall that the median essentially divides a distribution in half. By the same process, quartiles are the result of a distribution being divided into four parts; quintiles refer to five parts; deciles, 10 parts; and percentiles, 100 parts. A manager in the second quintile would be better than 60% of managers (the bottom three quintiles) and below 20% (the top quintile) – i.e. somewhere between 20% and 40% in percentile terms. A manager at the 21st percentile has 20 percentiles above and 79 percentiles below.
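The weighted, geometric and harmonic means described above can be sketched as small functions. The names are ours, and `geometric_mean_return` implements the four-step linking recipe for percentage rates (not the bare Formula 2.17 product):

```python
def weighted_mean(weights, values):
    return sum(w * v for w, v in zip(weights, values))

def geometric_mean_return(rates):
    # link: add 1 to each rate, multiply, take the n-th root, subtract 1
    linked = 1.0
    for r in rates:
        linked *= 1 + r
    return linked ** (1 / len(rates)) - 1

def harmonic_mean(values):
    # reciprocal of the average reciprocal
    return len(values) / sum(1 / v for v in values)

w = [0.30, 0.15, 0.10, 0.15, 0.08, 0.12, 0.07, 0.03]
r = [0.096, 0.112, 0.074, 0.088, 0.141, 0.041, 0.066, 0.021]
print(f"{weighted_mean(w, r):.3%}")                               # 8.765%
print(f"{geometric_mean_return([0.04, 0.05, -0.03, 0.10]):.1%}")  # 3.9%
print(harmonic_mean([2, 4, 4]))                                   # 3.0
```

Python's standard library also offers `statistics.harmonic_mean` and `statistics.geometric_mean` as ready-made alternatives.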
2.11 - Standard Deviation And Variance Range and Mean Absolute Deviation The range is the simplest measure of dispersion, the extent to which the data varies from its measure of central tendency. Dispersion or variability is a concept covered extensively in the CFA curriculum, as it emphasizes risk, or the chances that an investment will not achieve its expected outcome. If any investment has two dimensions – one describing risk, one describing reward – then we must measure and present both dimensions to gain an idea of the true nature of the investment. Mean return describes the expected reward, while the measures of dispersion describe the risk.

Range Range is simply the highest observation minus the lowest observation. For data that is sorted, it should be easy to locate the maximum and minimum values and compute the range. The appeal of range is that it is simple to interpret and easy to calculate; the drawback is that, by using just two values, it can be misleading if there are extreme values that turn out to be very rare, and it may not fairly represent the entire distribution (all of the outcomes). Mean Absolute Deviation (MAD) MAD improves upon range as an indicator of dispersion by using all of the data. It is calculated by: 1. Taking the difference between each observed value and the mean, which is the deviation. 2. Taking the absolute value of each deviation and adding all the deviations together. 3. Dividing by n, the number of observations. Example: To illustrate, take six mid-cap mutual funds with five-year annual returns of +10.1%, +7.7%, +5.0%, +12.3%, +12.2% and +10.9%. Answer: Range = Maximum – Minimum = (+12.3%) – (+5.0%) = 7.3%. The mean absolute deviation starts by finding the mean: (10.1% + 7.7% + 5.0% + 12.3% + 12.2% + 10.9%)/6 = 9.7%. Each of the six observations deviates from 9.7%; the absolute deviation ignores the +/– sign: 1st: 10.1 – 9.7 = 0.4; 2nd: 7.7 – 9.7 = 2.0; 3rd: 5.0 – 9.7 = 4.7; 4th: 12.3 – 9.7 = 2.6; 5th: 12.2 – 9.7 = 2.5; 6th: 10.9 – 9.7 = 1.2.

Next, the absolute deviations are summed and divided by 6: (0.4 + 2.0 + 4.7 + 2.6 + 2.5 + 1.2)/6 = 13.4/6 = 2.233333, or, rounded, 2.2%. Variance Variance (σ2) is a measure of dispersion that in practice can be easier to apply than the mean absolute deviation because it removes the +/– signs by squaring the deviations. Returning to the example of mid-cap mutual funds, we had six deviations. To compute variance, we take the square of each deviation, add the terms together and divide by the number of observations:

Observation   Value     Deviation from 9.7%   Square of Deviation
1             +10.1%    0.4                   0.16
2             +7.7%     2.0                   4.00
3             +5.0%     4.7                   22.09
4             +12.3%    2.6                   6.76
5             +12.2%    2.5                   6.25
6             +10.9%    1.2                   1.44

Variance = (0.16 + 4.0 + 22.09 + 6.76 + 6.25 + 1.44)/6 = 6.7833. Variance is not in the same units as the underlying data. In this case, it's expressed as 6.7833% squared – difficult to interpret unless you are a

mathematical expert (percent squared?).

Standard Deviation
Standard deviation (σ) is the square root of the variance: (6.7833)^(1/2) = 2.60%. Standard deviation is expressed in the same units as the data, which makes it easier to interpret; it is the most frequently used measure of dispersion. Our calculations above were done for a population of six mutual funds. In practice, an entire population is either impossible or impractical to observe, so we use sampling techniques to estimate the population variance and standard deviation. The sample variance formula is very similar to the population variance, with one exception: instead of dividing by n observations (where n = population size), we divide by (n – 1) degrees of freedom (where n = sample size). So in our mutual fund example, if the problem were described as a sample drawn from a larger database of mid-cap funds, we would compute variance using (n – 1) degrees of freedom.

Sample variance (s²) = (0.16 + 4.0 + 22.09 + 6.76 + 6.25 + 1.44)/(6 – 1) = 8.14

Sample Standard Deviation (s)
Sample standard deviation is the square root of the sample variance: (8.14)^(1/2) = 2.85%. Standard deviation is so widely used because, unlike variance, it is expressed in the same units as the original data, making it easy to interpret and usable on distribution graphs (e.g. the normal distribution).

Semivariance and Target Semivariance
Semivariance is a risk measure that focuses on downside risk, defined as the average squared deviation below the mean. Computing a semivariance starts by using only those observations below the mean; any observations at or above the mean are ignored. From there, the process is similar to computing variance. If a return distribution is symmetric, the semivariance is exactly half of the variance; if the distribution is negatively skewed, semivariance can be higher. The idea behind semivariance is to focus on negative outcomes.
Target semivariance is a variation of this concept, considering only those squared deviations below a certain target. For example, if a mutual fund has a mean quarterly return of +3.6%, we may wish to focus only on quarters where the outcome is –5% or lower; target semivariance eliminates all quarters above –5%. From there, computing target semivariance follows the same procedure as the other variance measures.

Chebyshev's Inequality
Chebyshev's inequality states that the proportion of observations within k standard deviations of the arithmetic mean is at least 1 – 1/k², for all k > 1.

# of Standard Deviations from Mean (k)   Chebyshev's Inequality           % of Observations (minimum)
2                                        1 – 1/(2)² = 1 – 1/4 = 3/4       75 (0.75)
3                                        1 – 1/(3)² = 1 – 1/9 = 8/9       89 (0.8889)
4                                        1 – 1/(4)² = 1 – 1/16 = 15/16    94 (0.9375)
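Chebyshev's bound is a one-line function; a minimal sketch:

```python
def chebyshev_bound(k: float) -> float:
    # Minimum proportion of observations within k standard deviations (k > 1)
    if k <= 1:
        raise ValueError("Chebyshev's inequality requires k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, round(chebyshev_bound(k), 4))   # 0.75, 0.8889, 0.9375
```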

Given that at least 75% of observations fall within two standard deviations, if a distribution has an annual mean return of 10% and a standard deviation of 5%, we can state that in at least 75% of the years the return will be anywhere from 0% to 20%; in at most 25% of the years it will be either below 0% or above 20%. Since at least 89% of observations fall within three standard deviations, in at least 89% of the years the return will be within a range of –5% to +25%. Later we will learn that for so-called normal distributions, we expect about 95% of the observations to fall within two standard deviations. Chebyshev's inequality is more general and does not assume a normal distribution; it applies to a distribution of any shape.

Coefficient of Variation
The coefficient of variation (CV) helps the analyst interpret relative dispersion. In other words, a calculated standard deviation value is just a number: does this number indicate high or low dispersion? The coefficient of variation describes the standard deviation as a proportion of its mean:

Formula 2.18
CV = s / X̄

Where: s = sample standard deviation, X̄ = sample mean

Sharpe Ratio
The Sharpe ratio is a measure of the risk-reward tradeoff of an investment security or portfolio. It starts by defining excess return, the percentage rate of return of a security above the risk-free rate. In this view, the risk-free rate is a minimum rate that any security should earn; higher rates are available provided one assumes higher risk. The Sharpe ratio is the ratio of excess return to the standard deviation of return.

Formula 2.19
Sharpe ratio = [(mean return) – (risk-free return)] / standard deviation of return
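Both ratios are one-line computations. As a sketch, the Sharpe inputs below match the emerging-markets example that follows, and the CV inputs reuse the sample statistics (s = 2.85, mean = 9.7) from the dispersion example:

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    # Formula 2.18: CV = s / mean, dispersion per unit of mean
    return std_dev / mean

def sharpe_ratio(mean_return: float, risk_free: float, std_dev: float) -> float:
    # Formula 2.19: excess return over the risk-free rate, per unit of total risk
    return (mean_return - risk_free) / std_dev

print(round(coefficient_of_variation(2.85, 9.7), 3))   # 0.294
print(round(sharpe_ratio(18.2, 2.3, 12.1), 2))         # 1.31
```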

Example: Sharpe Ratio
If an emerging-markets fund has a historic mean return of 18.2% and a standard deviation of 12.1%, and the return on three-month T-bills (our proxy for a risk-free rate) was 2.3%, the Sharpe ratio = (18.2 – 2.3)/12.1 = 1.31. In other words, for every 1% of additional risk (standard deviation) we accept by investing in this emerging-markets fund, we are rewarded with an extra 1.31% of excess return. Part of the reason the Sharpe ratio has become popular is that it is an easy-to-understand, appealing concept for practitioners and investors alike.

2.12 - Skew And Kurtosis

Skew
Skew, or skewness, can be mathematically defined as the average cubed deviation from the mean divided by the standard deviation cubed. If the result of the computation is greater than zero, the distribution is positively skewed; if it is less than zero, the distribution is negatively skewed; a result of zero means the distribution is symmetric. A nonsymmetrical, or skewed, distribution occurs when one side of the distribution does not mirror the other. Applied to investment returns, nonsymmetrical distributions are generally described as either positively skewed (frequent small losses and a few extreme gains) or negatively skewed (frequent small gains and a few extreme losses). For interpretation and analysis, focus on downside risk: negatively skewed distributions have what statisticians call a long left tail (see Figure 2.4), which for investors can mean a greater chance of extremely negative outcomes. Positive skew means frequent small negative outcomes, with extremely bad scenarios less likely.

Figure 2.4: Positive Skew

Negative Skew

For positively skewed distributions, the mode (point at the top of the curve) is less than the median (the point where 50% are above/50% below), which is less than the arithmetic mean (sum of observations/number of observations). The opposite rules apply to negatively skewed distribution: mode is greater than median, which is greater than arithmetic mean. Positive: Mean > Median > Mode Negative: Mean < Median < Mode
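Sample skew, and the excess kurtosis covered next, can be computed directly from the moment definitions. The sketch below uses the simple population-moment forms (the CFA sample formulas add small n-based adjustment factors) and reuses the six fund returns from the dispersion examples:

```python
def skewness(data):
    # Average cubed deviation divided by the standard deviation cubed
    # (population-moment form; the CFA sample formula adds n-based factors)
    n = len(data)
    mean = sum(data) / n
    std = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum((x - mean) ** 3 for x in data) / n / std ** 3

def excess_kurtosis(data):
    # Average fourth-power deviation over std dev to the fourth, minus 3
    n = len(data)
    mean = sum(data) / n
    std = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum((x - mean) ** 4 for x in data) / n / std ** 4 - 3

returns = [10.1, 7.7, 5.0, 12.3, 12.2, 10.9]   # fund returns from the earlier example
print(round(skewness(returns), 2))             # negative here: a longer left tail
print(round(excess_kurtosis(returns), 2))      # negative here: thinner tails than normal
```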

Notice that, listed alphabetically, the order is mean → median → mode. For positive skew they are separated by greater-than signs; for negative skew, by less-than signs.

Kurtosis
Kurtosis refers to the degree of peakedness in a distribution. More peaked than normal (leptokurtic) means that a distribution also has fatter tails, so there is a greater chance of extreme outcomes compared with a normal distribution. The kurtosis formula measures this degree of peakedness. Kurtosis equals three for a normal distribution; excess kurtosis expresses kurtosis above or below 3. In Figure 2.5 below, the solid line is the normal distribution and the dashed line is a leptokurtic distribution.

Figure 2.5: Kurtosis

Sample Skew and Kurtosis
For a calculated skew number (average cubed deviation divided by the cubed standard deviation), look at the sign to evaluate whether a return distribution is positively skewed (skew > 0), negatively skewed (skew < 0) or symmetric (skew = 0). A kurtosis number (average deviation to the fourth power divided by the standard deviation to the fourth power) is evaluated in relation to the normal distribution, for which kurtosis = 3. Since excess kurtosis = kurtosis – 3, any positive excess kurtosis means the distribution is leptokurtic (fatter tails and greater risk of extreme outcomes).

2.13 - Basic Probability Concepts
To help make logical and consistent investment decisions, and to manage expectations in an environment of risk, an analyst uses the concepts and tools of probability theory. A probability is the percentage chance that something will happen, on a scale from 0 (impossible) to 1 (certain to occur), with higher values meaning more likely. Probability concepts help define risk by quantifying the prospects for unintended and negative outcomes; thus probability concepts are a major focus of the CFA curriculum.

I. Basics
Random Variable
A random variable is any quantity with uncertain future values. For example, time is not a random variable, since we know that tomorrow will have 24 hours, the month of January will have 31 days and so on. However, the expected rate of return on a mutual fund and the expected standard deviation of those returns are random variables. We attempt to forecast these random variables based on past history and on our forecasts for the economy and interest rates, but we cannot say for certain what the variables will be in the future – all we have are forecasts or expectations.

Outcome
An outcome is any possible value that a random variable can take.
For expected rate of return, the range of outcomes naturally depends on the particular investment or proposition. Lottery players have a near-certain probability of losing all of their investment (–100% return), with a very small chance of becoming a multimillionaire (+1,000,000% return – or higher!). Thus for a lottery ticket, there are usually just two extreme outcomes. Mutual funds that invest primarily in blue chip stocks will involve a much narrower series of

outcomes and a distribution of possibilities around a specific mean expectation.

Event
When a particular outcome or a series of outcomes is defined, it is referred to as an event. If our goal for the blue chip mutual fund is to produce a minimum 8% return every year on average, and we want to assess the chances that our goal will not be met, our event is defined as average annual returns below 8%. We use probability concepts to ask what the chances are that our event will take place. If a list of events is mutually exclusive, only one of them can possibly take place. Events are exhaustive if they incorporate all potential outcomes. For return expectations, if we define our two events as annual returns equal to or greater than 8% and annual returns equal to or less than 8%, these two events are not mutually exclusive, since a return of exactly 8% falls into both categories. If our two defined events were annual returns less than 8% and annual returns greater than 8%, we would have covered all outcomes except the possibility of a return of exactly 8%; thus those events are not exhaustive.

The Defining Properties of Probability
Probability has two defining properties:
1. The probability of any event E is a number between 0 and 1: 0 ≤ P(E) ≤ 1. A P followed by parentheses denotes the probability of the event occurring. Probabilities fall on a scale between 0, or 0% (impossible), and 1, or 100% (certain). There is no such thing as a negative probability (less than impossible?) or a probability greater than 1 (more certain than certain?).
2. The sum of the probabilities of all events equals 1, provided the events are both mutually exclusive and exhaustive. If the events are not mutually exclusive, the probabilities would add up to more than 1; if they are not exhaustive, the sum of the probabilities would be less than 1.
Thus, there is a need to qualify this second property to ensure the events are properly defined (mutually exclusive, exhaustive). On an exam question, if the probabilities in a research study add up to a number other than 1, you might question whether this principle has been met.

An analyst's probability estimates may be empirical, subjective or a priori. These terms refer to the particular approach used to define the events and to predict the likelihood of each event occurring. How exactly does the analyst arrive at these probabilities? What exactly are the numbers based upon?

Empirical Probabilities
Empirical probabilities are objectively drawn from historical data. If we assembled a return distribution based on the past 20 years of data, and then used that same distribution to make forecasts, we have used an empirical approach. Of course, past performance does not guarantee future results, so a purely empirical approach has its drawbacks.

Subjective Probabilities
Relationships must be stable for empirical probabilities to be accurate, and for investments and the economy, relationships change. Thus, subjective probabilities draw upon experience and judgment to make forecasts or to modify the probabilities indicated by a purely empirical approach. Of course, subjective probabilities are unique to the person making them and depend on his or her talents – the investment world is filled with people making incorrect subjective judgments.

A Priori Probabilities
A priori probabilities are objective and based on deduction and reasoning about a particular case. For example, if we forecast that a company is 70% likely to win a bid on a contract (based on either an empirical or a subjective approach), and we know this firm has just one business competitor, then we can deduce, a priori, that the competitor has a 30% chance of winning the bid. A conditional probability, written P(A | B), is the probability of event A occurring given that event B has occurred. Exam questions frequently

test the ability to calculate joint probabilities. Such computations require use of the multiplication rule, which states that the joint probability of A and B is the product of the conditional probability of A given B, times the probability of B. In probability notation:

Formula 2.20
Multiplication rule: P(AB) = P(A | B) * P(B)

Given a conditional probability P(A | B) = 40% and a probability of B = 60%, the joint probability is P(AB) = 0.4 * 0.6 = 0.24, or 24%, found by applying the multiplication rule.

The Addition Rule
The addition rule is used in situations where we need the probability that at least one of two events – A and B – occurs. This probability is equal to the probability of A, plus the probability of B, minus the joint probability of A and B.

Formula 2.21
Addition rule: P(A or B) = P(A) + P(B) – P(AB)

For example, if the probability of A = 0.4, and the probability of B = 0.45, and the joint probability of both is 0.2, then the probability of either A or B = 0.4 + 0.45 – 0.2 = 0.65. Remembering to subtract the joint probability P(AB) is often the difficult part of applying this rule. Indeed, if the addition rule is required to solve a probability problem on the exam, you can be sure that the wrong answers will include P(A) + P(B), and P(A)*P(B). Just remember that the addition rule is asking for either A or B, so you don't want to double count. Thus, the probability of both A and B, P(AB), is an intersection and needs to be subtracted to arrive at the correct probability. Dependent and Independent Events Two events are independent when the occurrence of one has no effect on the probability that the other will occur. Earlier we established the definition of a conditional probability, or the probability of A given B, P(A | B). If A is completely independent of B, then this conditional probability is the same as the unconditional probability of A. Thus the definition of independent events states that two events - A and B - are independent of each other, if, and only if, P(A | B) = P(A). By the same logic, B would be independent of A if, and only if, P(B | A), which is the probability of B given that A has occurred, is equal to P(B). Two events are not independent when the conditional probability of A given B is higher or lower than the unconditional probability of A. In this case, A is dependent on B. Likewise, if P(B | A) is greater or less than P(B), we know that B depends on A. Calculating the Joint Probability of Two or More Independent Events Recall that for calculating joint probabilities, we use the multiplication rule, stated in probability notation as

P(AB) = P(A | B) * P(B). For independent events, we have established that P(A | B) = P(A), so substituting P(A) for P(A | B) shows that, for independent events, the multiplication rule is simply the product of the individual probabilities.

Formula 2.22
Multiplication rule, independent events: P(AB) = P(A) * P(B)

Moreover, the rule generalizes to more than two events provided they are all independent of one another, so the joint probability of three independent events is P(ABC) = P(A) * P(B) * P(C).

The Total Probability Rule
The total probability rule explains an unconditional probability of an event in terms of that event's conditional probabilities across a series of mutually exclusive, exhaustive scenarios. In the simplest example, there are two scenarios, S and its complement SC, with P(S) + P(SC) = 1, given the properties of being mutually exclusive and exhaustive. How do these two scenarios affect event A? P(A | S) and P(A | SC) are the conditional probabilities that event A will occur in scenario S and in scenario SC, respectively. If we know the conditional probabilities, and we know the probability of the two scenarios, we can use the total probability rule formula to find the probability of event A.

Formula 2.23
Total probability rule (two scenarios): P(A) = P(A | S)*P(S) + P(A | SC)*P(SC)

This rule is easiest to remember if you compare the formula to the weighted-mean calculation used to compute the rate of return on a portfolio. In that exercise, each asset class had an individual rate of return, weighted by its allocation, to compute the overall return. With the total probability rule, each scenario has a conditional probability (i.e. the likelihood of event A given that scenario), and each conditional probability is weighted by the probability of that scenario occurring.
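A minimal sketch of the three rules, using illustrative values from the examples in this section:

```python
def joint_prob(p_a_given_b, p_b):
    # Multiplication rule (Formula 2.20): P(AB) = P(A|B) * P(B)
    return p_a_given_b * p_b

def either_prob(p_a, p_b, p_ab):
    # Addition rule (Formula 2.21): P(A or B) = P(A) + P(B) - P(AB)
    return p_a + p_b - p_ab

def total_prob(cond_probs, scenario_probs):
    # Total probability rule (Formula 2.23): conditional probabilities
    # weighted by mutually exclusive, exhaustive scenario probabilities
    return sum(c * s for c, s in zip(cond_probs, scenario_probs))

print(round(joint_prob(0.4, 0.6), 2))                 # 0.24
print(round(either_prob(0.4, 0.45, 0.2), 2))          # 0.65
print(round(total_prob([0.4, 0.25], [0.8, 0.2]), 2))  # 0.37
```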
Example: Total Probability So if we define conditional probabilities of P(A | S) = 0.4, and P(A | SC) = 0.25, and the scenarios P(S) and P(SC) are 0.8 and 0.2 respectively, the probability of event A is: P(A) = P(A | S)*P(S) + P(A | SC)*P(SC) = (0.4)*(0.8) + (0.25)*(0.2) = 0.37. The total probability rule applies to three or more scenarios provided they are mutually exclusive and exhaustive. The formula is the sum of all weighted conditional probabilities (weighted by the probability of each scenario occurring). Using Probability and Conditional Expectations in Making Investment Decisions Investment decisions involve making future predictions based upon all information that we believe is relevant to our forecast. However, these forecasts are dynamic; they are always subject to change based on new information being made public. In many cases this new information causes us to modify our forecasts and either raise or lower our opinion on an investment. In other words, our expected values are conditional on changing real-world events, and thus can never be perceived as unconditional probabilities. In fact, a random variable's expected value is the weighted average of conditional probabilities, weighted by the probability of each scenario (where scenarios are mutually exclusive and exhaustive). The total probability rule applies to determining

expected values.

Expected Value Methodology
An expected value of a random variable is calculated by assigning a probability to each possible outcome and then taking a probability-weighted average of the outcomes.

Example: Expected Value
Assume that an analyst writes a report on a company and, based on the research, assigns the following probabilities to next year's sales:

Scenario   Probability   Sales ($ millions)
1          0.10          $16
2          0.30          $15
3          0.30          $14
4          0.30          $13

Answer: The analyst's expected value for next year's sales is (0.1)*(16.0) + (0.3)*(15.0) + (0.3)*(14.0) + (0.3)*(13.0) = \$14.2 million. The total probability rule for finding the expected value of variable X is given by E(X) = E(X | S)*P(S) + E(X | SC)*P(SC) for the simplest case: two scenarios, S and SC, that are mutually exclusive and exhaustive. If we refer to them as Scenario 1 and Scenario 2, then E(X | S) is the expected value of X in Scenario 1, and E(X | SC) is the expected value of X in Scenario 2. Tree Diagram The total probability rule can be easier to visualize if the information is presented in a tree diagram. Take a case where we have forecasted company sales to be anywhere in a range from \$13 to \$16 million, based on conditional probabilities. This company is dependent on the overall economy and on Wal-Mart's same-store sales growth, leading to the conditional probability scenarios demonstrated in figure 2.7 below:

In a good economy, our expected sales would be 25% likely to be \$16 million, and 75% likely to be \$15 million, depending on Wal-Mart's growth number. In a bad economy, we would be equally likely to generate \$13 million if Wal-Mart sales drop more than 2% or \$14 million (if the growth number falls between –2% and +1.9%). Expected sales (good economy) = (0.25)*(16) + (0.75)*(15) = 15.25 million.

Expected sales (bad economy) = (0.5)*(13) + (0.5)*(14) = 13.5 million. We predict that a good economy is 40% likely and a bad economy 60% likely, leading to our expected value for sales: (0.4)*(15.25) + (0.6)*(13.5) = 14.2 million.

2.15 - Advanced Probability Concepts
Covariance
Covariance is a measure of the relationship between two random variables, designed to show the degree of comovement between them. Covariance is calculated as the probability-weighted average of the cross-products of each random variable's deviation from its own expected value. A positive number indicates comovement (the variables tend to move in the same direction); a value of 0 indicates no relationship; a negative covariance shows that the variables move in opposite directions. The process of computing covariance values is complicated and time-consuming, and it is not likely to be covered in depth on a CFA exam question. The detailed formulas and worked computations appear in the reference text, but for most candidates, spending too much valuable study time on that detail risks getting bogged down in material that is unlikely to be tested.

Correlation
Correlation is a concept related to covariance: it also indicates the degree to which two random variables are related, and (like covariance) its sign shows the direction of the relationship (positive (+) means the variables move together; negative (–) means they are inversely related). A correlation of 0 means there is no linear relationship one way or the other, and the two variables are said to be unrelated. A correlation number is much easier to interpret than covariance because a correlation value will always be between –1 and +1:
• –1 indicates a perfectly inverse relationship (a unit change in one means that the other will have a unit change in the opposite direction).
• +1 means a perfectly positive linear relationship (unit changes in one always bring the same unit changes in the other).

Moreover, there is a uniform scale from –1 to +1, so that as correlation values move away from 0 (toward either –1 or +1), the two variables are more closely related. By contrast, a covariance value between two variables could be very large and indicate little actual relationship, or look very small when there is actually a strong linear correlation. Correlation is defined as the ratio of the covariance between two random variables to the product of their standard deviations:

Formula 2.24
Correlation (A, B) = Covariance (A, B) / [Standard Deviation (A) * Standard Deviation (B)]

As a result: Covariance (A, B) = Correlation (A, B) * Standard Deviation (A) * Standard Deviation (B)

Either correlation or covariance is likely to be required in a calculation in which the other terms are provided. Such an exercise simply requires remembering the relationship and substituting the terms provided. For example, if a covariance between two variables of 30 is given, and the standard deviations are 5 and 15, the correlation is 30/[(5)*(15)] = 0.40. If you are given a correlation of 0.40 and standard deviations of 5 and 15, the covariance is (0.4)*(5)*(15), or 30.

Expected Return, Variance and Standard Deviation of a Portfolio
Expected return is calculated as the weighted average of the expected returns of the assets in the portfolio, each weighted by its portfolio weight (allocation). For a simple portfolio of two mutual funds, one investing in stocks and the other in bonds, if we expect the stock fund to return 10% and the bond fund to return 6%, and our allocation is 50% to each asset class, we have:

Expected return (portfolio) = (0.10)*(0.5) + (0.06)*(0.5) = 0.08, or 8%

Variance (σ²) is computed by finding the probability-weighted average of squared deviations from the expected value.

Example: Variance
In our previous example on making a sales forecast, we found that the expected value was $14.2 million. Calculating variance starts by computing the deviations from $14.2 million, then squaring:

Scenario   Probability   Deviation from Expected Value   Squared
1          0.10          (16.0 – 14.2) = 1.8             3.24
2          0.30          (15.0 – 14.2) = 0.8             0.64
3          0.30          (14.0 – 14.2) = –0.2            0.04
4          0.30          (13.0 – 14.2) = –1.2            1.44
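The expected value and variance of this sales forecast can be verified with a short sketch:

```python
probs = [0.10, 0.30, 0.30, 0.30]   # scenario probabilities
sales = [16.0, 15.0, 14.0, 13.0]   # sales forecasts ($ millions)

# Probability-weighted average of outcomes
expected = sum(p * s for p, s in zip(probs, sales))
# Probability-weighted average of squared deviations from the expected value
variance = sum(p * (s - expected) ** 2 for p, s in zip(probs, sales))
std_dev = variance ** 0.5

print(round(expected, 1), round(variance, 2), round(std_dev, 2))  # 14.2 0.96 0.98
```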

Answer: Variance weights each squared deviation by its probability: (0.1)*(3.24) + (0.3)*(0.64) + (0.3)*(0.04) + (0.3)*(1.44) = 0.96 The variance of return is a function of the variance of the component assets as well as the covariance between each of them. In modern portfolio theory, a low or negative correlation between asset classes will reduce overall portfolio variance. The formula for portfolio variance in the simple case of a two–asset portfolio is given by:

Formula 2.25
Portfolio variance = wA²*σ²(RA) + wB²*σ²(RB) + 2*(wA)*(wB)*Cov(RA, RB)

Where: wA and wB are the portfolio weights, σ²(RA) and σ²(RB) are the variances, and Cov(RA, RB) is the covariance.

Example: Portfolio Variance
Data on both variance and covariance may be displayed in a covariance matrix. Assume the following covariance matrix for our two-asset case:

        Stock   Bond
Stock   350     80
Bond    80      150
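Using the matrix entries (stock variance 350, bond variance 150, stock-bond covariance 80) and the 0.5/0.5 weights from this example, the two-asset portfolio variance can be sketched as:

```python
w_a, w_b = 0.5, 0.5            # portfolio weights: stocks, bonds
var_a, var_b = 350.0, 150.0    # variances from the covariance matrix
cov_ab = 80.0                  # stock-bond covariance

# Formula 2.25: two-asset portfolio variance
port_var = w_a ** 2 * var_a + w_b ** 2 * var_b + 2 * w_a * w_b * cov_ab
port_std = port_var ** 0.5

# Formula 2.24, rearranged: correlation from covariance and standard deviations
corr_ab = cov_ab / (var_a ** 0.5 * var_b ** 0.5)

print(port_var, round(port_std, 2))   # 165.0 12.85
```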

From this matrix, we know that the variance of stocks is 350 (the covariance of any asset with itself equals its variance), the variance of bonds is 150 and the covariance between stocks and bonds is 80. Given our portfolio weights of 0.5 for both stocks and bonds, we have all the terms needed to solve for portfolio variance.

Answer:
Portfolio variance = wA²*σ²(RA) + wB²*σ²(RB) + 2*(wA)*(wB)*Cov(RA, RB) = (0.5)²*(350) + (0.5)²*(150) + 2*(0.5)*(0.5)*(80) = 87.5 + 37.5 + 40 = 165.

Standard deviation (σ), as defined earlier in the statistics material, is the positive square root of the variance. In our sales-forecast example, σ = (0.96)^(1/2), or about $0.98 million. For the portfolio, standard deviation is the square root of the portfolio variance: (165)^(1/2) = 12.85%. A two-asset portfolio was used to illustrate this principle; most portfolios contain far more than two assets, and the formula for variance becomes more complicated for multi-asset portfolios (all terms in the covariance matrix need to be added to the calculation).

Joint Probability Functions and Covariance
Let's now apply the joint probability function to calculating covariance.

Example: Covariance from a Joint Probability Function
To illustrate this calculation, take an example where we have estimated the year-over-year sales growth for GM and Ford in three industry environments: strong (30% probability), average (40%) and weak (30%). Our estimates are indicated in the following joint-probability function:

            Strong (0.3)   Avg. (0.4)   Weak (0.3)
GM sales    +10%           +4%          –4%
F sales     +6%            +3%          –1%

Answer: To calculate covariance, we start by finding the probability-weighted sales estimate (expected value):

GM = (0.3)*(10) + (0.4)*(4) + (0.3)*(–4) = 3 + 1.6 – 1.2 = 3.4%
Ford = (0.3)*(6) + (0.4)*(3) + (0.3)*(–1) = 1.8 + 1.2 – 0.3 = 2.7%

In the following table, we compute covariance by taking the deviations from each expected value in each market environment, multiplying the deviations together (the cross-products) and then weighting each cross-product by the probability of that environment:

Environment   GM deviation      F deviation       Cross-product           Prob.   Prob-wtd.
Strong        10 – 3.4 = 6.6    6 – 2.7 = 3.3     6.6*3.3 = 21.78         0.3     6.534
Average       4 – 3.4 = 0.6     3 – 2.7 = 0.3     0.6*0.3 = 0.18          0.4     0.072
Weak          –4 – 3.4 = –7.4   –1 – 2.7 = –3.7   (–7.4)*(–3.7) = 27.38   0.3     8.214
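The same probability-weighted cross-product computation as a sketch, with the sales-growth figures from the joint-probability function:

```python
probs = [0.3, 0.4, 0.3]    # strong, average, weak
gm = [10.0, 4.0, -4.0]     # GM sales growth (%)
ford = [6.0, 3.0, -1.0]    # Ford sales growth (%)

# Probability-weighted expected values
e_gm = sum(p * g for p, g in zip(probs, gm))       # 3.4
e_ford = sum(p * f for p, f in zip(probs, ford))   # 2.7

# Covariance: probability-weighted cross-products of the deviations
cov = sum(p * (g - e_gm) * (f - e_ford)
          for p, g, f in zip(probs, gm, ford))

print(round(e_gm, 1), round(e_ford, 1), round(cov, 2))   # 3.4 2.7 14.82
```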

The last column (prob-wtd.) was found by multiplying each cross-product by the probability of that scenario. The covariance is found by adding the values in the last column: 6.534 + 0.072 + 8.214 = 14.82.

Bayes' Formula
We all know intuitively the principle that we learn from experience. For an analyst, learning from experience takes the form of adjusting expectations (and probability estimates) based on new information. Bayes' formula applies this principle to the probability concepts we have already covered by showing how to calculate an updated probability of an event, given new information:

Formula 2.26
P(E | I) = [P(I | E) / P(I)] * P(E)

Where: E = the event, I = the new information. In words: the updated probability of the event equals the conditional probability of the new information given the event, times the prior probability of the event, divided by the unconditional probability of the new information.

The Multiplication Rule of Counting
The multiplication rule of counting states that if k tasks must be performed, and n1, n2, n3, … nk are the number of ways each of these tasks can be done, then the total number of ways to perform all k tasks is found by multiplying n1, n2, n3, … nk together. Take a process with four steps:

Step   Number of ways this step can be done
1      6
2      3
3      1
4      5

This process can be done a total of (6)*(3)*(1)*(5) = 90 ways.

Factorial Notation
n! = n*(n – 1)*(n – 2)* … *1. In other words, 5!, or 5 factorial, is equal to (5)*(4)*(3)*(2)*(1) = 120. In counting problems, factorial is used when a group of size n must be assigned to n slots; the number of ways these assignments can be made is n!. If we were managing five employees and had five job functions, the number of possible assignments is 5! = 120.

Combination Notation
Combination notation gives the number of ways to choose r objects from a total of n objects when the order in which the r objects are listed does not matter. In shorthand notation:

Formula 2.27
nCr = n! / [(n – r)! * r!]

Thus if we had our five employees and needed to choose three of them to team up on a new project as equal members (i.e. the order in which we choose them isn't important), the formula tells us that there are 5!/[(5 – 3)!*3!] = 120/[(2)*(6)] = 120/12, or 10 possible combinations.

Permutation Notation
Permutation notation takes the same case (choosing r objects from a group of n) but assumes that the order in which the r objects are listed matters. It is given by this notation:

Formula 2.28
nPr = n! / (n – r)!

Returning to our example, if we not only wanted to choose three employees for our project but also wanted to establish a hierarchy (leader, second-in-command, subordinate), the permutation formula gives 5!/(5 – 3)! = 120/2 = 60 possible ways. Now, let's consider how to calculate the number of ways to choose r objects from a total of n objects, both when the order in which the r objects are listed matters and when it does not:

• The combination formula is used if the order of r does not matter. For choosing three objects from a total of five, we found 5!/[(5 – 3)!*3!], or 10 ways.
• The permutation formula is used if the order of r does matter. For choosing three objects from a total of five, we found 5!/(5 – 3)!, or 60 ways.
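Python's standard-library math module provides these counting tools directly (math.comb and math.perm require Python 3.8+); a sketch using the examples above:

```python
import math

# Multiplication rule of counting: a four-step process with 6, 3, 1, 5 ways per step
total_ways = 6 * 3 * 1 * 5       # 90

# Factorial: assigning 5 employees to 5 job functions
assignments = math.factorial(5)  # 120

# Combination: choosing 3 of 5 employees when order does not matter
teams = math.comb(5, 3)          # 10

# Permutation: choosing 3 of 5 when order (the hierarchy) matters
hierarchies = math.perm(5, 3)    # 60

print(total_ways, assignments, teams, hierarchies)   # 90 120 10 60
```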

Method        When appropriate?
Factorial     Assigning a group of size n to n slots
Combination   Choosing r objects (in any order) from a group of n
Permutation   Choosing r objects (in a particular order) from a group of n

2.16 - Common Probability Distributions
The topics in this section provide a number of the quantitative building blocks useful in analyzing and predicting random variables such as future sales and earnings, growth rates, market index returns and returns on individual asset classes and specific securities. All of these variables have uncertain outcomes; thus there is risk that any downside uncertainty can have a surprising and material impact. By understanding the mechanics of probability distributions, such risks can be understood and analyzed, and measures taken to hedge or reduce their impact.

Probability Distribution
A probability distribution gathers together all possible outcomes of a random variable (i.e. any quantity for which more than one value is possible), and summarizes these outcomes by indicating the probability of each of them. While a probability distribution is often associated with the bell-shaped curve, recognize that such a curve is only indicative of one specific type of distribution, the so-called normal probability distribution. The CFA curriculum focuses on normal distributions since they frequently apply to financial and investment variables and are used in hypothesis testing. However, in real life, a probability distribution can take any shape, size and form.

Example: Probability Distribution
Say we wanted to choose a day at random in the future to schedule an event, and we wanted to know the probability that this day would fall on a Sunday, since we need to avoid scheduling it on a Sunday. With seven days in a week, the probability that a random day would happen to be a Sunday is one-seventh, or about 14.29%. Of course, the same 14.29% probability holds for any of the other six days.
In this case, we would have a uniform probability distribution: the chances that our random day would fall on any particular day are the same, and the graph of our probability distribution would be a flat, horizontal line.

Figure 2.8: Probability Distribution Probability distributions can be simple to understand as in this example, or they can be very complex and require sophisticated techniques (e.g., option pricing models, Monte Carlo simulations) to help describe all possible outcomes.

Discrete Random Variables
Discrete random variables can take on a finite or countable number of possible outcomes. The previous example asking for a day of the week is an example of a discrete variable, since it can take only seven possible values. Monetary variables expressed in dollars and cents are always discrete, since money is rounded to the nearest $0.01. In other words, a formula may suggest that a stock worth $15.75 today will be worth $17.1675 after it grows 9%, but you can't give or receive three-quarters of a penny, so the outcome of 9% growth would be rounded to $17.17.

Continuous Random Variables
A continuous random variable has infinite possible outcomes. A rate of return (e.g. a growth rate) is continuous:
• A stock can grow by 9% next year or by 10%, and in between this range it could grow by 9.3%, 9.4% or 9.5%.
• In between 9.3% and 9.4%, the rate could be 9.31%, 9.32% or 9.33%, and in between 9.32% and 9.33% it could grow by 9.32478941%.
• Clearly there is no end to how precisely the outcomes could be broken down; thus the rate of return is described as a continuous variable.

Outcomes in Discrete vs. Continuous Variables
The rule of thumb is that a discrete variable can have all possibilities listed out, while a continuous variable must be expressed in terms of its upper and lower limits, and greater-than or less-than indicators. Of course, listing out a large set of possible outcomes (which is usually the case for money variables) is usually impractical – thus money variables will usually have outcomes expressed as if they were continuous.

Rates of return can theoretically range from –100% to positive infinity. Time is bounded on the lower side by 0. The market price of a security will also have a lower limit of $0, while its upper limit will depend on the security – stocks have no upper limit (thus a stock price's outcome is simply > $0), but bond prices are more complicated, bounded by factors such as time-to-maturity and embedded call options. If the face value of a bond is $1,000, there's an upper limit (somewhere above $1,000) above which the price of the bond will not go, but pinpointing that upper value is imprecise.

Probability Function
A probability function gives the probabilities that a random variable will take on a given list of specific values. For a discrete variable, if (x1, x2, x3, x4 …) is the complete set of possible outcomes, p(x) indicates the chance that X will be equal to x. Each x in the list for a discrete variable has a p(x). For a continuous variable, a probability function is expressed as f(x). The two key properties of a probability function, p(x) (or f(x) for continuous), are the following:

1. 0 ≤ p(x) ≤ 1, since a probability must always lie between 0 and 1.
2. Adding up the probabilities of all distinct possible outcomes of the random variable must give a sum equal to 1.

Determining whether a function satisfies the first property should be easy, since probabilities always lie between 0 and 1; in other words, p(x) could never be 1.4 or –0.2.
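The two properties can be turned into a quick validity check. This is a hypothetical helper (not from the curriculum), with a small tolerance for floating-point rounding:

```python
def is_valid_pmf(probabilities):
    """Valid if every p(x) lies in [0, 1] and the probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = abs(sum(probabilities) - 1) < 1e-9  # tolerate float error
    return in_range and sums_to_one

print(is_valid_pmf([0.31, 0.43, 0.26]))  # True: sums to 1
print(is_valid_pmf([0.32, 0.40, 0.23]))  # False: sums to 0.95
print(is_valid_pmf([1.4, -0.4]))         # False: values outside [0, 1]
```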
To illustrate the second property, say we are given a set of three possibilities for X: (1, 2, 3) and a set of three for Y: (6, 7, 8), and given the probability functions f(x) and g(y).

x    f(x)        y    g(y)
1    0.31        6    0.32
2    0.43        7    0.40
3    0.26        8    0.23

For all possibilities of f(x), the sum is 0.31 + 0.43 + 0.26 = 1, so we know it is a valid probability function. For all possibilities of g(y), the sum is 0.32 + 0.40 + 0.23 = 0.95, which violates our second property. Either the given probabilities for g(y) are wrong, or there is a fourth possibility for y where g(y) = 0.05. Either way, the probabilities need to sum to 1.

Probability Density Function
A probability density function (or pdf) describes a probability function in the case of a continuous random variable. Also known as simply the "density", a probability density function is denoted by "f(x)". Since a pdf refers to a continuous random variable, its probabilities are expressed as ranges of values rather than probabilities assigned to individual values, as is done for a discrete variable. For example, if a stock has a 20% chance of a negative return, the pdf in its simplest terms could be expressed as:

x      f(x)
< 0    0.2
> 0    0.8

2.17 - Common Probability Distribution Calculations
Cumulative Distribution Functions
A cumulative distribution function, or CDF, expresses a probability function in terms of lowest to highest value, by giving the probability that a random variable X is less than or equal to a particular value x. Expressed in shorthand, the cumulative distribution function is P(X ≤ x). A cumulative distribution function is constructed by summing up, or cumulating, all values in the probability function that are less than or equal to x. The concept is similar to the cumulative relative frequency covered earlier in this study guide, which computed values below a certain point in a frequency distribution.

Example: Cumulative Distribution Function
The following probability distribution includes the cumulative function.

X = x        P(X = x)    P(X ≤ x), or cdf
< –12        0.15        0.15
–12 to –3    0.15        0.30
–3 to 4      0.25        0.55
4 to 10      0.25        0.80
> 10         0.20        1.00

From the table, we find that the probability that x is less than or equal to 4 is 0.55: the summed probabilities of the first three P(X = x) terms, or the number found in the cdf column for the third row, where x runs from –3 to 4. Sometimes a

question might ask for the probability of x being greater than 4, which for this problem is 1 – P(X ≤ 4) = 1 – 0.55 = 0.45. This is a question most people should get right – but one that will still have too many people answering 0.55 because they weren't paying attention to the "greater than".

Discrete Uniform Random Variable
A discrete uniform random variable is one that fulfills the definition of "discrete", where there is a finite and countable number of outcomes, along with the definition of "uniform", where the random variable X is equally likely to take any of its possible values x. If there are n possible values for a discrete uniform random variable, the probability of a specific outcome is 1/n.

Example: Discrete Uniform Random Variable
Earlier we provided an example of a discrete uniform random variable: a random day is one-seventh likely to fall on a Sunday. To illustrate how probabilities are calculated, take the following discrete uniform distribution with n = 5.

X = x       2     4     6     8     10
P(X = x)    0.2   0.2   0.2   0.2   0.2
P(X ≤ x)    0.2   0.4   0.6   0.8   1.0
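The cumulative column and the interval probabilities for this distribution can be reproduced in a few lines of Python (a sketch of the arithmetic, not exam material):

```python
from itertools import accumulate

outcomes = [2, 4, 6, 8, 10]
pmf = [1 / len(outcomes)] * len(outcomes)  # uniform: each outcome gets 0.2

# Cumulative distribution: running totals of the pmf -> 0.2, 0.4, 0.6, 0.8, 1.0
cdf = list(accumulate(pmf))

# P(4 <= X <= 8): endpoints included
p_incl = sum(p for x, p in zip(outcomes, pmf) if 4 <= x <= 8)

# P(4 < X < 8): strictly between, so only x = 6 qualifies
p_excl = sum(p for x, p in zip(outcomes, pmf) if 4 < x < 8)

print(round(p_incl, 2), round(p_excl, 2))  # 0.6 0.2
```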

According to the distribution above, the probability of x = 8 is 0.2. The probability of x = 2 is the same, 0.2. Suppose that the question called for P(4 ≤ X ≤ 8). The answer would be the sum P(4) + P(6) + P(8) = 0.2 + 0.2 + 0.2 = 0.6. Suppose instead that the question called for P(4 < X < 8). In this case, the answer would omit P(4) and P(8), since it's less than, NOT less than or equal to, and the correct answer would be P(6) = 0.2. The CFA exam writers love to test whether you are paying attention to details and will try to trick you – the probability of such tactics is pretty much a 1.0!

Binomial Random Variable
Binomial probability distributions are used when the context calls for assessing two outcomes, such as "success/failure" or "price moved up/price moved down". In such situations, where the possible outcomes are binary, we can develop an estimate of a binomial random variable by holding a number of repeated trials (also known as "Bernoulli trials"). In a Bernoulli trial, p is the probability of success and (1 – p) is the probability of failure. Suppose that a number of Bernoulli trials are held, with the number denoted by n. A binomial random variable X is defined as the number of successes in n Bernoulli trials, given two simplifying assumptions: (1) the probability p of success is the same for all trials and (2) the trials are independent of each other. Thus, a binomial random variable is described by two parameters: p (the probability of success of one trial) and

n (the number of trials). A binomial probability distribution with p = 0.50 (equal chance of success or failure) and n = 4 would appear as:

x (# of successes)    p(x)      cdf, P(X ≤ x)
0                     0.0625    0.0625
1                     0.2500    0.3125
2                     0.3750    0.6875
3                     0.2500    0.9375
4                     0.0625    1.0000

The reference text demonstrates how to construct a binomial probability distribution using the formula p(x) = (n!/((n – x)!*x!)) * p^x * (1 – p)^(n – x). We used this formula to assemble the above data, though the exam would probably not expect you to create each p(x); it would more likely provide the table and ask for an interpretation. For this table, the probability of exactly one success is 0.25; the probability of three or fewer successes is 0.9375 (the cdf value in the row where x = 3); and the probability of at least one success is 1 – P(0) = 1 – 0.0625 = 0.9375.

Calculations
The expected value of a binomial random variable is given by the formula n*p. In the example above, with n = 4 and p = 0.5, the expected value is 4*0.5, or 2. The variance of a binomial random variable is calculated by the formula n*p*(1 – p). Using the same example, we have a variance of 4*0.5*0.5 = 1. If our binomial random variable still had n = 4 but with greater predictability in each trial, say p = 0.9, our variance would fall to 4*0.9*0.1 = 0.36. As n increases, both mean and variance increase, but the standard deviation grows more slowly than the mean (in proportion to √n rather than n), so for larger n the outcomes cluster more tightly around the expected proportion of successes.

Creating a Binomial Tree
The binomial tree is essentially a diagram showing that the future value of a stock is the product of a series of up or down movements leading to a growing number of possible outcomes. Each possible value is called a node.
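The binomial table and its two summary statistics can be reproduced directly from the formula. A short sketch (math.comb handles the n!/((n – x)!x!) term):

```python
import math

def binom_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success prob p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
print([round(v, 4) for v in pmf])  # [0.0625, 0.25, 0.375, 0.25, 0.0625]

mean = n * p                # expected value: n*p
variance = n * p * (1 - p)  # variance: n*p*(1 - p)
print(mean, variance)       # 2.0 1.0
```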

Figure 2.9: Binomial Tree

Continuous Uniform Distribution
A continuous uniform distribution describes a range of outcomes, usually bounded by an upper and lower limit, where any point in the range is a possibility. Since it is a range, there are infinite possibilities within it, and all outcomes are equally likely (i.e. they are spread uniformly throughout the range). To calculate probabilities, find the area under a pdf curve such as the one graphed here. In this example, what is the probability that the random variable will be between 1 and 3? The area would be a rectangle with a width of 2 (the distance between 1 and 3) and a height of 0.2: 2*0.2 = 0.4. What is the probability that x is less than 3? The rectangle would have a width of 3 and the same height of 0.2: 3*0.2 = 0.6.

2.18 - Common Probability Distribution Properties
Normal Distribution
The normal distribution is a continuous probability distribution that, when graphed as a probability density, takes the form of the so-called bell-shaped curve. The bell shape results from the fact that, while the range of possible outcomes is infinite (negative infinity to positive infinity), most of the potential outcomes tend to be clustered relatively close to the distribution's mean value. Just how closely they are clustered is given by the standard deviation. In other words, a normal distribution is described completely by two parameters: its mean (μ) and its standard deviation (σ). Here are other defining characteristics of the normal distribution: it is symmetric, meaning the mean value divides the distribution in half and one side is the exact mirror image of the other – that is, skewness = 0. Symmetry also requires that mean = median = mode. Its kurtosis (a measure of peakedness) is 3 and its excess kurtosis (kurtosis – 3) equals 0. Also, any linear combination of two or more normally distributed random variables is itself normally distributed.
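The rectangle areas in the continuous uniform example above can be computed generically. This sketch assumes the pdf in that example runs from 0 to 5 (the text gives only the height of 0.2, which implies a range of width 1/0.2 = 5):

```python
# Continuous uniform on an assumed range [0, 5]; density height = 1/(b - a)
a, b = 0.0, 5.0
height = 1 / (b - a)  # 0.2

def uniform_prob(lo, hi):
    """P(lo <= X <= hi): width of the overlap with [a, b] times the height."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) * height

print(round(uniform_prob(1, 3), 2))  # 0.4
print(round(uniform_prob(a, 3), 2))  # 0.6
```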
While any normal distribution will share these defining characteristics, the mean and standard deviation will be unique to the random variable, and these differences will affect the shape of the distribution. On the following page are two normal distributions, each with the same mean, but the distribution with the dotted line has a higher standard deviation.

Univariate vs. Multivariate Distributions A univariate distribution specifies probabilities for a single random variable, while a multivariate distribution combines the outcomes of a group of random variables and summarizes probabilities for the group. For example, a stock will have a distribution of possible return outcomes; those outcomes, when summarized, form a univariate distribution. A portfolio of 20 stocks could have its return outcomes described in terms of 20 separate univariate distributions, or as one multivariate distribution.

Earlier we indicated that a normal distribution is completely described by two parameters: its mean and standard deviation. This statement is true of a univariate distribution. For models of multivariate returns, the mean and standard deviation of each variable do not completely describe the multivariate set. A third set of parameters is required: the correlation, or co-movement, between each pair of variables in the set. For example, if a multivariate return distribution were being assembled for a portfolio of stocks, and a number of pairs were found to be inversely related (i.e. one increases at the same time the other decreases), then we must consider the overall effect on portfolio variance. For a group of assets that are not completely positively related, there is the opportunity to reduce overall risk (variance) as a result of the interrelationships. For a portfolio distribution with n stocks, the multivariate distribution is completely described by the n mean returns, the n standard deviations and the n*(n – 1)/2 correlations. For a 20-stock portfolio, that's 20 mean returns, 20 standard deviations and 20*19/2, or 190, correlations. 2.19 - Confidence Intervals While a normally distributed random variable can have many potential outcomes, the shape of its distribution gives us confidence that the vast majority of these outcomes will fall relatively close to its mean. In fact, we can quantify just how confident we are. By using confidence intervals – ranges that are a function of the properties of a normal bell-shaped curve – we can define ranges of probabilities. The diagram below has a number of percentages – these numbers (which are approximations and rounded off) indicate the probability that a random outcome will fall into that particular section below the curve.

In other words, by assuming normal distribution, we are 68% confident that a variable will fall within one standard deviation. Within two standard deviation intervals, our confidence grows to 95%. Within three standard deviations, 99%. Take an example of a distribution of returns of a security with a mean of 10% and a standard deviation of 5%:
• 68% of the returns will be between 5% and 15% (within 1 standard deviation: 10 ± 5).
• 95% of the returns will be between 0% and 20% (within 2 standard deviations: 10 ± 2*5).
• 99% of the returns will be between –5% and 25% (within 3 standard deviations: 10 ± 3*5).
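The interval arithmetic above, plus the exact one-sigma figure behind the rounded 68%, can be checked with the standard library (math.erf gives the standard normal cdf; the function name phi is our own):

```python
import math

mu, sigma = 10.0, 5.0  # mean 10%, standard deviation 5%

# The three rounded confidence bands from the diagram
for k, label in [(1, "68%"), (2, "95%"), (3, "99%")]:
    print(f"{label}: between {mu - k * sigma}% and {mu + k * sigma}%")

def phi(z):
    """Standard normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Exact probability of falling within one standard deviation
print(round(phi(1) - phi(-1), 4))  # 0.6827
```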

Standard Normal Distribution
The standard normal distribution is defined as a normal distribution with mean = 0 and standard deviation = 1. Probability numbers derived from the standard normal distribution are used to standardize a random variable – i.e. to express that number in terms of how many standard deviations it is away from its mean. Standardizing a random variable X is done by subtracting the mean value (μ) from X, and then dividing the result by the standard deviation (σ). The result is a standard normal random variable, denoted by the letter Z.

Formula 2.31
Z = (X – μ)/σ

Example 1: If a distribution has a mean of 10 and a standard deviation of 5, and a random observation X is –2, we standardize our random variable with the equation for Z:

Z = (X – μ)/σ = (–2 – 10)/5 = –12/5 = –2.4

The standard normal random variable Z tells us how many standard deviations the observation is from the mean. In this case, the observation of –2 lies 2.4 standard deviations below the mean of 10.

Example 2: You are considering an investment portfolio with an expected return of 10% and a standard deviation of 8%. The portfolio's returns are normally distributed. What is the probability of earning a return less than 2%? Again, we start by standardizing the random variable X, which in this case is 2%:

Z = (X – μ)/σ = (2 – 10)/8 = –8/8 = –1.0

Next, one would normally consult a Z-table of cumulative probabilities for a standard normal distribution to determine the probability. In this case, for Z = –1, P(Z ≤ –1) = 0.158655, or about 16%. Therefore, there is a 16% probability of earning a return of less than 2%. Keep in mind that your upcoming exam will not provide Z-tables, so how would you solve this problem on test day? The answer is that you need to remember that 68% of observations fall within ±1 standard deviation on a normal curve, which means that 32% do not.
This question essentially asked for the probability of being more than one standard deviation below the mean, or 32%/2 = 16%. Study the earlier diagram that shows specific percentages for certain standard deviation intervals on a normal curve – in particular, remember 68% for ±1 standard deviation and 95% for ±2.

Shortfall Risk
Shortfall risk is essentially a refinement of mean-variance analysis – the idea that one must focus on both risk and return, as opposed to return alone. Risk is typically measured by standard deviation, which counts all deviations – i.e. both positive and negative. In other words, positive

deviations are treated as if they were equal to negative deviations. In the real world, of course, negative surprises are far more important to quantify and predict with clarity if one is to accurately define risk. Two mutual funds could have the same risk as measured by standard deviation, but if one of those funds tends to have more extreme negative outcomes, while the other has a high standard deviation due to a preponderance of extreme positive surprises, then the actual risk profiles of those funds would be quite different. Shortfall risk defines a minimum acceptable level, and then focuses on whether a portfolio will fall below that level over a given time period.

Roy's Safety-First Ratio
An optimal portfolio is one that minimizes the probability that the portfolio's return will fall below a threshold level. In probability notation, if RP is the return on the portfolio and RL is the threshold (the minimum acceptable return), then the portfolio for which P(RP < RL) is minimized will be the optimal portfolio according to Roy's safety-first criterion. The safety-first ratio helps identify this portfolio by giving the number of standard deviations between the expected return and the minimum acceptable level, with a higher number considered safer.

Formula 2.32
SFRatio = (E(RP) – RL)/σP

Example: Roy's Safety-First Ratio
Let's say our minimum threshold is –2%, and we have the following expectations for portfolios A and B:

                         Portfolio A    Portfolio B
Expected Annual Return   8%             12%
Standard Deviation       10%            16%

Answer: The SFRatio for portfolio A is (8 – (–2))/10 = 1.0. The SFRatio for portfolio B is (12 – (–2))/16 = 0.875. In other words, the minimum threshold is one full standard deviation below the expected return in Portfolio A, but just 0.875 standard deviations below in Portfolio B, so by safety-first rules we opt for Portfolio A.

Lognormal Distributions
A lognormal distribution has two distinct properties: it is always positive (bounded on the left by zero), and it is skewed to the right. Prices for stocks and many other financial assets (anything which by definition can never be negative) are often found to be lognormally distributed. Also, the lognormal and normal distributions are related: if a random variable X is lognormally distributed, then its natural log, ln(X), is normally distributed. (Thus the term "lognormal" – the log is normal.) Figure 2.11 below demonstrates a typical lognormal distribution.
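Roy's safety-first comparison is simple enough to script. A minimal sketch of the example above:

```python
def sf_ratio(expected_return, threshold, stdev):
    """Roy's safety-first ratio: number of standard deviations between
    the expected return and the minimum acceptable return."""
    return (expected_return - threshold) / stdev

threshold = -2.0  # minimum acceptable return, in percent

ratio_a = sf_ratio(8.0, threshold, 10.0)
ratio_b = sf_ratio(12.0, threshold, 16.0)
print(ratio_a, ratio_b)  # 1.0 0.875
print("choose A" if ratio_a > ratio_b else "choose B")  # higher ratio is safer
```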

2.20 - Discrete and Continuous Compounding
In discrete compounding, time moves forward in increments, with each increment having a holding-period return equal to (ending price / beginning price) – 1. The more frequent the compounding, the higher the effective rate of return. Take a security that is expected to return 12% annually:

• With annual holding periods, 12% compounded once = (1.12)^1 – 1 = 12%.
• With quarterly holding periods, 3% compounded 4 times = (1.03)^4 – 1 = 12.55%.
• With monthly holding periods, 1% compounded 12 times = (1.01)^12 – 1 = 12.68%.
• With daily holding periods, (12%/365) compounded 365 times = 12.7475%.
• With hourly holding periods, (12%/(365*24)) compounded (365*24) times = 12.7496%.
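Each bullet above can be verified in a few lines (the printed rates differ from the bullets only in rounding):

```python
annual_rate = 0.12

# Effective annual rate for annual, quarterly, monthly, daily, hourly compounding
for periods in [1, 4, 12, 365, 365 * 24]:
    effective = (1 + annual_rate / periods) ** periods - 1
    print(f"{periods:>5} periods: {effective:.4%}")
```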

With greater frequency of compounding (i.e. as holding periods become smaller and smaller), the effective rate gradually increases, but in smaller and smaller increments. Extending this further, we can slice holding periods smaller and smaller so that they approach zero, at which point we have the continuously compounded rate of return. Discrete compounding relates to measurable holding periods and a finite number of compounding periods. Continuous compounding relates to holding periods so small they cannot be measured, with the frequency of compounding going to infinity. The continuous rate associated with a holding period is found by taking the natural log of (1 + holding-period return). Say the holding period is one year and the holding-period return is 12%: ln(1.12) = 11.33% (approx.). In other words, if 11.33% were continuously compounded, its effective rate of return would be about 12%. Earlier we found that 12% compounded hourly comes to about 12.7496%. In fact, e (the transcendental number) raised to the 0.12 power yields approximately 1.127497, i.e. an effective rate of about 12.7497% – the continuous-compounding limit. As we've stated previously, actual calculations of natural logs are not likely to be required, as they would give an unfair advantage to those with higher-function calculators. At the same time, an exam problem can test knowledge of a relationship without requiring the calculation. For example, a question could ask:

Q. A portfolio returned 5% over one year. If continuously compounded, this is equivalent to ____?

A. ln 5
B. ln 1.05
C. e^5

D. e^1.05

The answer would be B, based on the definition of continuous compounding. A financial calculator or spreadsheet could yield the actual percentage of 4.879%, but that wouldn't be necessary to answer the question correctly on the exam.

Monte Carlo Simulation
A Monte Carlo simulation refers to a computer-generated series of trials in which the probabilities for both risk and reward are tested repeatedly in an effort to help define these parameters. These simulations are characterized by large numbers of trials – typically hundreds or even thousands of iterations – which is why they are typically described as "computer generated". Also know that Monte Carlo simulations rely on random numbers to generate a series of samples. Monte Carlo simulations are used in a number of applications, often as a complement to other risk-assessment techniques, in an effort to further define potential risk. For example, a pension-benefit administrator in charge of managing assets and liabilities for a large plan may use computer software with Monte Carlo simulation to help understand any potential downside risk over time, and how changes in investment policy (e.g. higher or lower allocations to certain asset classes, or the introduction of a new manager) may affect the plan. While traditional analysis focuses on returns, variances and correlations between assets, a Monte Carlo simulation can help introduce other pertinent economic variables (e.g. interest rates, GDP growth and foreign exchange rates) into the simulation. Monte Carlo simulations are also important in pricing derivative securities for which there are no existing analytical methods. Asian-style options and other path-dependent derivatives are often priced with Monte Carlo methods, as are certain mortgage-backed securities for which the embedded options (e.g. prepayment assumptions) are very complex.
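The continuous-compounding relationships discussed above are quick to confirm with Python's math module (a check of the arithmetic, not something the exam would require):

```python
import math

# Continuous rate equivalent to a 12% one-year holding-period return
print(round(math.log(1.12), 4))      # 0.1133, i.e. about 11.33%

# The reverse direction: continuously compounding 12% for one year
print(round(math.exp(0.12) - 1, 6))  # 0.127497, i.e. about 12.7497%

# The quiz: a 5% one-year return, continuously compounded, is ln 1.05
print(round(math.log(1.05), 5))      # 0.04879, i.e. about 4.879%
```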
A general outline for developing a Monte Carlo simulation involves the following steps (note that we are oversimplifying a process that is often highly technical):

1. Identify the variables of interest, the time horizon of the analysis and the distribution of the risk factors associated with each variable.
2. Draw K random numbers using a random-number generator (e.g. in a spreadsheet). Each random variable is then standardized so we have Z1, Z2, Z3 … ZK.
3. Simulate the possible values of each random variable by calculating its observed value with Z1, Z2, Z3 … ZK.
4. After a large number of iterations, estimate each variable and quantity of interest to complete one trial. Go back and complete additional trials to develop more accurate estimates.

Historical Simulation
Historical simulation, or back simulation, follows a similar process for large numbers of iterations, drawing instead from the previous record of the variable (e.g. past returns for a mutual fund). While both of these methods are very useful in developing a more meaningful and in-depth analysis of a complex system, it's important to recognize that they are basically statistical estimates; that is, they are not as analytical as (for example) the use of a correlation matrix to understand portfolio returns. Such simulations tend to work best when the input risk parameters are well defined.

2.21 - Sampling and Estimation
A data sample, or subset of a larger population, is used to help understand the behavior and characteristics of the entire population. In the investing world, for example, all of the familiar stock market averages are samples

designed to represent the broader stock market and indicate its performance. For the domestic publicly traded stock market, populated with 10,000 or more companies, the Dow Jones Industrial Average (DJIA) has just 30 representatives; the S&P 500 has 500. Yet these samples are taken as valid indicators of the broader population. It's important to understand the mechanics of sampling and estimating, particularly as they apply to financial variables, and to have the insight to critique the quality of research derived from sampling efforts.

BASICS
Simple Random Sampling
To begin the process of drawing samples from a larger population, an analyst must craft a sampling plan, which indicates exactly how the sample is selected. With a large population, different samples will yield different results, and the idea is to create a consistent and unbiased approach. Simple random sampling is the most basic approach: it draws a representative sample on the principle that every member of the population must have an equal chance of being selected. The key to simple random sampling is ensuring randomness when drawing the sample. This requirement is achieved in a number of ways, most rigorously by first coding every member of the population with a number and then using a random number generator to choose a subset. Sometimes it is impractical or impossible to label every single member of an entire population, in which case systematic sampling methods are used. For example, take a case where we wanted to research whether the S&P 500 companies were adding or laying off employees, but we didn't have the time or resources to contact all 500 human resources departments. We do have the time and resources for an in-depth study of a 25-company sample. A systematic sampling approach would be to take an alphabetical list of the S&P 500 and contact every 20th company on the list, i.e. companies #20, #40, #60, etc., up until #500.
This way we end up with 25 companies and it was done under a system that's approximately random and didn't favor a particular company or industry. Sampling Error Suppose we polled our 25 companies and came away with a conclusion that the typical S&P 500 firm will be adding approximately 5% to their work force this fiscal year, and, as a result, we are optimistic about the health of the economy. However, the daily news continues to indicate a fair number of layoffs at some companies and hiring freezes at other firms, and we wonder whether this research has actually done its job. In other words, we suspect sampling error: the difference between the statistic from our sample (5% job growth) and the population parameter we were estimating (actual job growth). Sampling Distribution A sampling distribution is analogous to a population distribution: it describes the range of all possible values that the sampling statistic can take. In the assessment of the quality of a sample, the approach usually involves comparing the sampling distribution to the population distribution. We expect the sampling distribution to be a pattern similar to the population distribution – that is, if a population is normally distributed, the sample should also be normally distributed. If the sample is skewed when we were expecting a normal pattern with most of the observations centered around the mean, it indicates potential problems with the sample and/or the methodology. Stratified Random Sampling. In a stratified random approach, a population is first divided into subpopulations or strata, based upon one or more classification criteria. Within each stratum, a simple random sample is taken from those members (the members of the subpopulation). 
The number to be sampled from each stratum depends on its size relative to the population – that is, if a classification system results in three subgroups or strata, and Group A has 50% of the population while Group B and Group C have 25% each, the sample we draw must conform to the same relative sizes (half of the sample from A, a quarter each from B and C). The samples taken from each stratum are then

pooled together to form the overall sample. The table below illustrates a stratified approach to improving our economic research on current hiring expectations. In our earlier approach that randomly drew from all 500 companies, we may have accidentally drawn too heavily from a sector doing well, and under-represented other areas. In stratified random sampling, each of the 500 companies in the S&P 500 index is assigned to one of 12 sectors. Thus we have 12 strata, and our sample of 25 companies is based on drawing from each of the 12 strata, in proportions relative to the industry weights within the index. The S&P weightings are designed to replicate the domestic economy, which is why financial services and health care (which are relatively more important sectors in today's economy) are more heavily weighted than utilities. Within each sector, a random approach is used – for example, if there are 120 financial services companies and we need five financial companies for our research study, those five would be selected via a random draw, or by a systematic approach (i.e. every 24th company on an alphabetical list of the subgroup).

Sector            Percent of S&P 500    Companies to sample
Business Svcs     3.8%                  1
Consumer Goods    9.4%                  2
Consumer Svcs     8.2%                  2
Energy            8.5%                  2
Financial Svcs    20.1%                 5
Hardware          9.4%                  2
Health Care       13.6%                 4
Idstrl Mtls.      12.7%                 3
Media             3.7%                  1
Software          3.9%                  1
Telecomm          3.2%                  1
Utilities         3.4%                  1
Total                                   25
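To make the stratified approach concrete, here is a minimal Python sketch of proportional allocation across strata. The sector weights are taken from the table above; the largest-remainder rounding rule and the helper names are our own assumptions for illustration, since the guide does not specify how it rounded the per-sector counts so that they sum to exactly 25.

```python
import random

# Sector weights from the table above (percent of S&P 500, as fractions)
SECTOR_WEIGHTS = {
    "Financial Svcs": 0.201, "Health Care": 0.136, "Idstrl Mtls": 0.127,
    "Consumer Goods": 0.094, "Hardware": 0.094, "Energy": 0.085,
    "Consumer Svcs": 0.082, "Software": 0.039, "Business Svcs": 0.038,
    "Media": 0.037, "Utilities": 0.034, "Telecomm": 0.032,
}

def allocate(weights, total):
    """Largest-remainder rounding: floor each proportional share, then give
    the leftover slots to the strata with the biggest fractional remainders."""
    shares = {s: w * total for s, w in weights.items()}
    counts = {s: int(x) for s, x in shares.items()}
    leftover = total - sum(counts.values())
    for s in sorted(shares, key=lambda k: shares[k] - counts[k], reverse=True)[:leftover]:
        counts[s] += 1
    return counts

allocation = allocate(SECTOR_WEIGHTS, 25)   # e.g. 5 financial, 4 health care

def draw_sample(companies_by_sector, counts, seed=0):
    """Within each stratum, take a simple random sample of the allotted size."""
    rng = random.Random(seed)
    return {s: rng.sample(companies_by_sector[s], k) for s, k in counts.items()}
```

With these weights, the largest-remainder rule reproduces the counts in the table (5 financial services firms, 4 health care, and so on, totaling 25).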

Time-Series Data
Time-series data refers to one variable taken over discrete, equally spaced periods of time. The distinguishing feature of a time series is that it draws on history to show how one variable has changed. Common examples include historical quarterly returns on a stock or mutual fund for the last five years, earnings per share on a stock each quarter for the last ten years, or fluctuations in the market-to-book ratio on a stock over a 20-year period. In every case, past time periods are examined.

Cross-Sectional Data
Cross-sectional data typically focuses on one period of time and measures a particular variable across several companies or industries. A cross-sectional study could focus on quarterly returns for all large-cap value mutual funds in the first quarter of 2005, or this quarter's earnings-per-share estimates for all pharmaceutical firms, or differences in the current market-to-book ratio for the largest 100 firms traded on the NYSE. We can see that the actual variables being examined may be similar to a time-series analysis, with the difference being that a

single time period is the focus, and several companies, funds, etc. are involved in the study. The earlier example of analyzing hiring plans at S&P 500 companies is a good example of cross-sectional research.

The Central Limit Theorem
The central limit theorem states that, for a population distribution with mean = μ and a finite variance σ2, the sampling distribution of the sample mean will take on three important characteristics as the sample size becomes large:

1. The sample mean will be approximately normally distributed.
2. The mean of the sampling distribution will be equal to the population mean (μ).
3. The variance of the sampling distribution will be equal to the population variance (σ2) divided by the size of the sample (n).

The first property - that the distribution of the sample mean will be normal - holds regardless of the distribution of the underlying population. Thus the central limit theorem can help make probability estimates for a sample of a non-normal population (e.g. skewed, lognormal), based on the fact that the sample mean for large sample sizes will be normally distributed. This tendency toward normally distributed sample means for large samples gives the central limit theorem its most powerful attribute. The assumption of normality enables samples to be used in constructing confidence intervals and in testing hypotheses, as we will find when covering those subjects. Exactly how large is large in terms of creating a large sample? Remember the number 30. According to the reference text, that's the minimum size a sample must be before we can assume it is normally distributed. Don't be surprised if a question asks how large a sample should be – should it be 20, 30, 40, or 50? It's an easy way to test whether you've read the textbook, and if you remember 30, you score an easy correct answer.

Standard Error
The standard error is the standard deviation of the sample statistic. Earlier, we indicated that the variance of the sample mean is the population variance divided by n (sample size).
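The central limit theorem is easy to verify by simulation. The sketch below (standard-library Python; the exponential population is our own illustrative choice, not from the curriculum) draws many samples of size 30 from a clearly non-normal population with mean 1 and variance 1, and checks that the sample means cluster around μ with a spread close to σ/√n:

```python
import random
import statistics

random.seed(42)

n, trials = 30, 20000   # sample size and number of repeated samples

# The exponential distribution (mean 1, variance 1) is heavily right-skewed,
# yet the distribution of its sample means is approximately normal.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

grand_mean = statistics.fmean(sample_means)   # close to the population mean, 1.0
spread = statistics.stdev(sample_means)       # close to sigma/sqrt(n) = 0.1826...
```

A histogram of `sample_means` would look bell-shaped despite the skewed population, which is exactly the theorem's claim.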
The formula for standard error is obtained by taking the positive square root of this variance. If the population standard deviation is given, the standard error is calculated by this ratio: population standard deviation / square root of sample size, or σ/√n. If the population standard deviation is unknown, the sample standard deviation (s) is used to estimate it, and standard error = s/√n. Note that the "n" in the denominator means that the standard error becomes smaller as the sample size becomes larger, an important property to remember.

Point Estimate vs. Confidence Interval for Population Parameters
A point estimate is one particular value that is used to estimate the underlying population parameter. For example, the sample mean is essentially a point estimate of a population mean. However, because of the presence of sampling error, sometimes it is more useful to start with this point estimate, and then establish a range of values both above and below it. Next, by using the probability numbers characteristic of normally distributed variables, we can state the level of confidence we have that the actual population mean will fall somewhere in our range. This process is known as "constructing a confidence interval". The level of confidence we want to establish is given by the number α, or alpha, which is the probability that the actual parameter will fall outside the confidence range. The lower the alpha, the more confident we want to be – e.g. an alpha of 5% indicates we want to be 95% confident; a 1% alpha indicates 99% confidence.

Properties of an Estimator
The three desirable properties of an estimator are that it is unbiased, efficient and consistent:

1. Unbiased - The expected value (mean) of the estimate's sampling distribution is equal to the underlying population parameter; that is, there is no upward or downward bias.
2. Efficiency - While there are many unbiased estimators of the same parameter, the most efficient has a sampling distribution with the smallest variance.
3. Consistency - Larger sample sizes tend to produce more accurate estimates; that is, the sample parameter converges on the population parameter.

Constructing Confidence Intervals
The general structure for a (1 – α) confidence interval is given by:

Formula 2.33
Confidence interval = point estimate ± (reliability factor) × (standard error)

Where: the reliability factor increases as a function of an increasing confidence level. In other words, if we want to be 99% confident that a parameter will fall within a range, we need to make that interval wider than we would if we wanted to be only 90% confident. The actual reliability factors used are derived from the standard normal distribution, or Z value, at probabilities of alpha/2 since the interval is two-tailed, or above and below a point.

Degrees of Freedom
Degrees of freedom are used for determining the reliability-factor portion of the confidence interval with the t-distribution. In finding sample variance, for any sample size n, degrees of freedom = n – 1. Thus for a sample size of 8, degrees of freedom are 7. For a sample size of 58, degrees of freedom are 57. The concept of degrees of freedom is taken from the fact that a sample variance is based on a series of observations, not all of which can be independently selected if we are to arrive at the true parameter. One observation essentially depends on all the other observations. In other words, if the sample size is 58, think of that sample of 58 in two parts: (a) 57 independent observations and (b) one dependent observation, whose value is essentially a residual number based on the other observations. Taken together, we have our estimates for mean and variance. If degrees of freedom are 57, it means that we would be "free" to choose any 57 observations (i.e. sample size – 1), since there is always that 58th value that will result in a particular sample mean for the entire group. Characteristic of the t-distribution is that additional degrees of freedom reduce the range of the confidence interval and produce a more reliable estimate. Increasing degrees of freedom is done by increasing sample size. For larger sample sizes, use of the z-statistic is an acceptable alternative to the t-distribution – this is true since the z-statistic is based on the standard normal distribution, and the t-distribution moves closer to the standard normal at higher degrees of freedom.

Student's t-distribution
Student's t-distribution is a series of symmetrical distributions, each defined by its degrees of freedom. All of the t-distributions appear similar in shape to a standard normal distribution, except that, compared to a standard normal curve, the t-distributions are less peaked and have fatter tails.
With each increase in degrees of freedom, two properties change: (1) the distribution's peak increases (i.e. the probability that the

estimate will be closer to the mean increases), and (2) the tails (in other words, the parts of the curve far away from the mean estimate) approach zero more quickly – i.e. there is a reduced probability of extreme values as we increase degrees of freedom. As degrees of freedom become very large – as they approach infinity – the t-distribution approximates the standard normal distribution. Figure 2.12: Student's t-distribution
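The convergence of the t-distribution toward the standard normal can be seen directly in a standard t-table. The sketch below hardcodes a few two-tailed 95% critical values (standard published figures) and confirms that they shrink toward the z-value of 1.96 as degrees of freedom rise:

```python
# Two-tailed 95% reliability factors from a standard t-table, keyed by
# degrees of freedom. The limiting value (infinite df) is the z-value 1.96.
T_CRIT_95 = {5: 2.571, 15: 2.131, 30: 2.042, 120: 1.980}
Z_CRIT_95 = 1.960

crit_values = [T_CRIT_95[df] for df in sorted(T_CRIT_95)]

# The critical value shrinks monotonically toward 1.96 as df increases,
# which is why t-based confidence intervals are wider for small samples.
assert crit_values == sorted(crit_values, reverse=True)
assert all(v > Z_CRIT_95 for v in crit_values)
```

This is the numeric counterpart of Figure 2.12: fatter tails at low degrees of freedom mean a larger reliability factor is needed to capture 95% of the distribution.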

2.22 - Sampling Considerations
Sample Size
Increasing sample size benefits a research study by increasing the reliability of the confidence interval and, as a result, the precision with which the population parameter can be estimated. Other choices affect how wide or how narrow a confidence interval will be: the choice of statistic (with t producing wider/more conservative intervals than z), as well as the degree of confidence (with higher degrees such as 99% resulting in wider/more conservative intervals than 90%). An increase in sample size tends to have an even more meaningful effect, due to the formula for standard error (i.e. the ratio of sample standard deviation / √(sample size)), which means that the standard error varies inversely with sample size. As a result, more observations in the sample (all other factors equal) improve the quality of a research study. At the same time, two other factors tend to make larger sample sizes less desirable. The first consideration, which primarily affects time-series data, is that population parameters have a tendency to change over time. For example, suppose we are studying a mutual fund and using five years of quarterly returns in our analysis (i.e. a sample size of 20: 5 years x 4 quarters a year). The resulting confidence interval appears too wide, so in an effort to increase precision, we use 20 years of data (80 observations). However, when we reach back into the 1980s to study this fund, it had a different fund manager, plus it was buying more small-cap value companies, whereas today it is a blend of growth and value, with mid to large market caps. In addition, the factors affecting today's stock market (and mutual fund returns) are much different compared to back in the 1980s. In short, the population parameters have changed over time, and data from 20 years ago shouldn't be mixed with data from the most recent five years. The other consideration is that increasing sample size can involve additional expenses.
Take the example of researching hiring plans at S&P 500 firms (cross-sectional research). A sample size of 25 was suggested, which would involve contacting the human resources departments of 25 firms. By increasing the sample size to 100, or 200, or higher, we do achieve stronger precision in making our conclusions, but at what cost? In many cross-sectional studies, particularly in the real world, where each sample takes time and costs money, it's sufficient to leave sample size at a certain lower level, as the additional precision isn't worth the additional cost.

Data Mining Bias

Data mining is the practice of searching through historical data in an effort to find significant patterns, with which researchers can build a model and make conclusions on how this population will behave in the future. For example, the so-called January effect, where stock market returns tend to be stronger in the month of January, is a product of data mining: monthly returns on indexes going back 50 to 70 years were sorted and compared against one another, and the patterns for the month of January were noted. Another well-known conclusion from data mining is the 'Dogs of the Dow' strategy: each January, among the 30 companies in the Dow industrials, buy the 10 with the highest dividend yields. Such a strategy, the mined data suggest, outperforms the market over the long run. Bookshelves are filled with hundreds of such models that "guarantee" a winning investment strategy. Of course, to borrow a common industry phrase, "past performance does not guarantee future results". Data-mining bias refers to the errors that result from relying too heavily on data-mining practices. In other words, while some patterns discovered in data mining are potentially useful, many others might just be coincidental and are not likely to be repeated in the future - particularly in an "efficient" market. For example, we may not be able to continue to profit from the January effect going forward, given that this phenomenon is so widely recognized. As a result, stocks are bid higher in November and December by market participants anticipating the January effect, so that by the start of January, the effect is priced into stocks and one can no longer take advantage of the model. Intergenerational data mining refers to the continued use of information already put forth in prior financial research as a guide for testing the same patterns and restating the same conclusions.
Distinguishing between valid models and valid conclusions, and those ideas that are purely coincidental and the product of data mining, presents a significant challenge, as data mining is often not easy to discover. A good way to investigate for its presence is to conduct an out-of-sample test - in other words, researching whether the model actually works for periods that do not overlap the time frame of the original study. A valid model should continue to be statistically significant even when out-of-sample tests are conducted. For research that is the product of data mining, a test outside of the model's time frame can often reveal its true nature. Other warning signs involve the number of patterns or variables examined in the research - that is, did this study simply search enough variables until something (anything) was finally discovered? Most academic research won't disclose the number of variables or patterns tested in the study, but oftentimes there are verbal hints that can reveal the presence of excessive data mining. Above all, it helps when there is an economic rationale to explain why a pattern exists, as opposed to simply pointing out that a pattern is there. For example, years ago a research study discovered that the market tended to have positive returns in years when the NFC wins the Super Bowl, yet it would perform relatively poorly when the AFC representative triumphs. However, there's no economic rationale for explaining why this pattern exists - do people spend more, or companies build more, or investors invest more, based on the winner of a football game? Yet the story is out there every Super Bowl week. Patterns discovered as a result of data mining may make for interesting reading, but in the process of making decisions, care must be taken to ensure that mined patterns are not blindly overused.

Sample Selection Bias
Many additional biases can adversely affect the quality and the usefulness of financial research.
Sample-selection bias refers to the tendency to exclude a certain part of a population simply because the data is not available. As a result, we cannot state that the sample we've drawn is completely random - it is random only within the subset on which historic data could be obtained.

Survivorship Bias
A common form of sample-selection bias in financial databases is survivorship bias, or the tendency for financial and accounting databases to exclude information on companies, mutual funds, etc. that are no longer in existence. As a result, certain conclusions may in fact be overstated relative to what we would find were we to remove this bias and include all members of the population. For example, many studies have pointed out the tendency of companies with low price-to-book-value ratios to outperform those firms with higher P/BVs. However, these

studies most likely aren't going to include those firms that have failed; thus data is not available and there is sample-selection bias. In the case of low and high P/BV, it stands to reason that companies in the midst of declining and failing will probably be relatively low on the P/BV scale, yet, based on the research, we would be guided to buy these very same firms due to the historical pattern. It's likely that the gap between returns on low-priced (value) stocks and high-priced (growth) stocks has been systematically overestimated as a result of survivorship bias. Indeed, the investment industry has developed a number of growth and value indexes. However, in terms of defining for certain which strategy (growth or value) is superior, the actual evidence is mixed. Sample-selection bias extends to newer asset classes such as hedge funds, a heterogeneous group that is somewhat more removed from regulation, and where public disclosure of performance is much more discretionary compared to that of mutual funds or registered advisors of separately managed accounts. One suspects that hedge funds will disclose only the data that makes the fund look good (self-selection bias), compared to the more developed industry of mutual funds, where the underperformers are still bound by certain disclosure requirements.

Look-Ahead Bias
Research is guilty of look-ahead bias if it makes use of information that was not actually available on a particular day, yet the researchers assume it was. Let's return to the example of buying low price-to-book-value companies; the research may assume that we buy our low P/BV portfolio on Jan 1 of a given year, and then (compared to a high P/BV portfolio) hold it throughout the year. Unfortunately, while a firm's current stock price is immediately available, the book value of the firm is generally not available until months after the start of the year, when the firm files its official 10-K.
To overcome this bias, one could construct P/BV ratios using current price divided by the previous year's book value, or (as is done by Russell's indexes) wait until midyear to rebalance after data is reported.

Time-Period Bias
This type of bias refers to an investment study that may appear to work over a specific time frame but may not last in future time periods. For example, any research done in 1999 or 2000 that covered a trailing five-year period may have touted the outperformance of high-risk growth strategies, while pointing to the mediocre results of more conservative approaches. When these same studies are conducted today for a trailing 10-year period, the conclusions might be quite different. Certain anomalies can persist for a period of several quarters or even years, but research should ideally be tested in a number of different business cycles and market environments in order to ensure that the conclusions aren't specific to one unique period or environment.

2.23 - Calculating Confidence Intervals
When the population variance (σ2) is known, the z-statistic can be used to calculate the reliability factor. Relative to the t-distribution, it will result in tighter confidence intervals and more reliable estimates of the mean and standard deviation. Z-values are based on the standard normal distribution. For establishing confidence intervals when the population variance is known, the interval is constructed with this formula:

Formula 2.34
Confidence interval = x̄ ± zα/2 × (σ/√n)

where x̄ is the sample mean, σ is the known population standard deviation, and n is the sample size.

For an alpha of 5% (i.e. a 95% confidence interval), the reliability factor (zα/2) is 1.96, but for a CFA exam problem it is usually sufficient to round to an even 2 to solve the problem. (Remember that the z-value at 95% confidence is about 2, as tables for z-values are sometimes not provided!) Given a sample size of 16, a sample mean of 20 and a population standard deviation of 25, a 95% confidence interval would be 20 ± 2*(25/√16) = 20 ± 2*(25/4) = 20 ± 12.5. In short, for this sample size and for these sample statistics, we would be 95% confident

that the actual population mean would fall in a range from 7.5 to 32.5. Suppose that this 7.5-to-32.5 range was deemed too broad for our purposes. Narrowing the confidence interval is accomplished in two ways: (1) increasing sample size, and (2) decreasing our allowable level of confidence.

1. Increasing sample size from 16 to 100 - Our 95% confidence interval is now equal to 20 ± 2*(25/√100) = 20 ± 2*(25/10) = 20 ± 5. In other words, increasing the sample size to 100 narrows the 95% confidence range: min 15 to max 25.
2. Using 90% confidence (keeping the larger sample of 100) - Our interval is now equal to 20 ± 1.65*(25/√100) = 20 ± 1.65*(25/10) = 20 ± 4.125. In other words, decreasing the percentage confidence to 90% reduces the range: min 15.875 to max 24.125.

When the population variance is unknown, we will need to use the t-distribution to establish confidence intervals. The t-statistic is more conservative; that is, it results in broader intervals. Assume the following sample statistics: sample size = 16, sample mean = 20, sample standard deviation = 25.

To use the t-distribution, we must first calculate degrees of freedom, which for sample size 16 is equal to n – 1 = 15. Using an alpha of 5% (a 95% confidence interval), our confidence interval is 20 ± (2.131) * (25/√16), which gives a range minimum of 6.68 and a range maximum of 33.32. As before, we can narrow this range by (1) using larger samples and/or (2) reducing the allowable degree of confidence:

1. Increase sample size from 16 to 100 - The range is now equal to 20 ± 2 * (25/10), i.e. minimum 15 and maximum 25 (for large sample sizes the t-distribution is sufficiently close to the z-value that it becomes an acceptable alternative).
2. Reduce confidence from 95% to 90% - The range is now equal to 20 ± 1.65 * (25/10), i.e. minimum 15.875 and maximum 24.125.

Large Sample Size
In our earlier discussion on the central limit theorem, we stated that large samples will tend to be normally distributed even when the underlying population is non-normal. Moreover, at sufficiently large samples, where there are enough degrees of freedom, the z and t statistics will provide approximately the same reliability factor, so we can default to the standard normal distribution and the z-statistic. The structure for the confidence interval is similar to our previous examples.
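The worked examples above can be reproduced with a small helper that implements "point estimate ± reliability factor × standard error". Python is our choice of language here; the reliability factors 2 and 2.131 are the rounded z-value and the 15-degrees-of-freedom t-value used in the text.

```python
import math

def confidence_interval(point_estimate, stdev, n, reliability):
    """Return (low, high) for: point estimate +/- reliability * stdev / sqrt(n)."""
    half_width = reliability * stdev / math.sqrt(n)
    return point_estimate - half_width, point_estimate + half_width

# z-based interval (population sigma known), using the rounded z of 2:
z_low, z_high = confidence_interval(20, 25, 16, 2.0)     # (7.5, 32.5)

# t-based interval (sigma unknown), t = 2.131 at 15 degrees of freedom:
t_low, t_high = confidence_interval(20, 25, 16, 2.131)   # approx. (6.68, 33.32)
```

Note how the same inputs produce a wider range under the t-statistic, which is exactly the "more conservative" behavior described above.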

For a 95% confidence interval, if sample size = 100, sample standard deviation = 10 and our point estimate is 15, the confidence interval is 15 ± 2*(10/√100), or 15 ± 2. We are 95% confident that the population mean will fall between 13 and 17. Suppose we wanted to construct a 99% confidence interval. The reliability factor now becomes 2.58 and we have 15 ± 2.58*(10/√100), or 15 ± 2.58 – a minimum of 12.42 and a maximum of 17.58. The table below summarizes the statistics used in constructing confidence intervals, given various situations:

Distribution   Population Variance   Sample Size   Appropriate Statistic
Normal         Known                 Small         z
Normal         Known                 Large         z
Normal         Unknown               Small         t
Normal         Unknown               Large         t or z
Non-Normal     Known                 Small         unavailable
Non-Normal     Known                 Large         z
Non-Normal     Unknown               Small         unavailable
Non-Normal     Unknown               Large         t or z

Exam Tips and Tricks

While these calculations don't seem difficult, it's true that this material seems at times to run together, particularly if a CFA candidate has never used it or hasn't studied it in some time. While not likely to be a major point of emphasis, expect at least a few questions on confidence intervals and, in particular, a case study that will test basic knowledge of definitions, or that will compare/contrast the two statistics presented (t-distribution and z-value) to make sure you know which is useful in a given application. More than anything, the idea is to introduce confidence intervals and how they are constructed as a prerequisite for hypothesis testing.

2.24 - Hypothesis Testing
Hypothesis testing provides a basis for taking ideas or theories that someone initially develops about the economy or investing or markets, and then deciding whether these ideas are true or false. More precisely, hypothesis testing helps decide whether the tested ideas are probably true or probably false, as the conclusions made with the hypothesis-testing process are never made with 100% confidence – as we found in the sampling and estimating process: we have degrees of confidence - e.g. 95% or 99% - but not absolute certainty. Hypothesis testing is often associated with the procedure for acquiring and developing knowledge known as the scientific method. As such, it relates the fields of investment and economic research (i.e., business topics) to other traditional branches of science (mathematics, physics, medicine, etc.). Hypothesis testing is similar in some respects to the estimation processes presented in the previous section. Indeed, the field of statistical inference, where conclusions on a population are drawn from observing subsets of the larger group, is generally divided into two groups: estimation and hypothesis testing. With estimation, the focus was on answering (with a degree of confidence) the value of a parameter, or else a range

within which the parameter most likely falls. Think of estimating as working from general to specific. With hypothesis testing, the focus is shifted: we start by making a statement about the parameter's value, and then the question becomes whether the statement is true or not true. In other words, it starts with a specific value and works the other way to make a general statement.

What is a Hypothesis?
A hypothesis is a statement made about a population parameter. These are typical hypotheses: "the mean annual return of this mutual fund is greater than 12%", and "the mean return is greater than the average return for the category". Stating the hypothesis is the initial step in a defined seven-step process for hypothesis testing – a process developed based on the scientific method. We indicate each step below. In the remainder of this section of the study guide, we develop a detailed explanation for how to answer each step's question. Hypothesis testing seeks to answer seven questions:

1. What are the null hypothesis and the alternative hypothesis?
2. Which test statistic is appropriate, and what is the probability distribution?
3. What is the required level of significance?
4. What is the decision rule?
5. Based on the sample data, what is the value of the test statistic?
6. Do we reject or fail to reject the null hypothesis?
7. Based on our rejection or inability to reject, what is our investment or economic decision?

Null Hypothesis Step #1 in our process involves stating the null and alternate hypothesis. The null hypothesis is the statement that will be tested. The null hypothesis is usually denoted with "H0". For investment and economic research applications, and as it relates to the CFA exam, the null hypothesis will be a statement on the value of a population parameter, usually the mean value if a question relates to return, or the standard deviation if it relates to risk. It can also refer to the value of any random variable (e.g. sales at company XYZ are at least \$10 million this quarter). In hypothesis testing, the null hypothesis is initially regarded to be true, until (based on our process) we gather enough proof to either reject the null hypothesis, or fail to reject the null hypothesis. Alternative Hypothesis The alternative hypothesis is a statement that will be accepted as a result of the null hypothesis being rejected. The alternative hypothesis is usually denoted "Ha". In hypothesis testing, we do not directly test the worthiness of the alternate hypothesis, as our testing focus is on the null. Think of the alternative hypothesis as the residual of the null – for example, if the null hypothesis states that sales at company XYZ are at least \$10 million this quarter, the alternative hypothesis to this null is that sales will fail to reach the \$10 million mark. Between the null and the alternative, it is necessary to account for all possible values of a parameter. In other words, if we gather evidence to reject this null hypothesis, then we must necessarily accept the alternative. If we fail to reject the null, then we are rejecting the alternative. One-Tailed Test The labels "one-tailed" and "two-tailed" refer to the standard normal distribution (as well as all of the tdistributions). The key words for identifying a one-tailed test are "greater than or less than". 
For example, if our hypothesis is that the annual return on this mutual fund will be greater than 8%, it's a one-tailed test, and the null will be rejected only on finding observations in the right tail (returns sufficiently above 8%). Figure 2.13 below illustrates a one-tailed test for "greater than" (rejection in the right tail). (A one-tailed test for "less than" would look similar, with the rejection region in the left tail rather than the right.)

Two-Tailed test Characterized by the words "equal to or not equal to". For example, if our hypothesis were that the return on a mutual fund is equal to 8%, we could reject it based on observations in either tail (sufficiently higher than 8% or sufficiently lower than 8%).

Choosing the null and the alternative hypothesis: If θ (theta) is the actual value of a population parameter (e.g. mean or standard deviation), and θ0 (theta sub-zero) is the value of theta according to our hypothesis, the null and alternative hypothesis can be formed in three different ways:

1. H0: θ = θ0 versus Ha: θ ≠ θ0 (a two-tailed test)
2. H0: θ ≤ θ0 versus Ha: θ > θ0 (one-tailed)
3. H0: θ ≥ θ0 versus Ha: θ < θ0 (one-tailed)

Choosing what will be the null and what will be the alternative depends on the case and what it is we wish to prove. We usually have two different approaches to what we could make the null and alternative, but in most cases, it's preferable to make the null what we believe we can reject, and then attempt to reject it. For example, in our case of a one-tailed test with the return hypothesized to be greater than 8%, we could make the greater-than case the null (with less-than the alternative), or we could make the greater-than case the alternative (with less-than the null). Which should we choose? A hypothesis test is typically designed to look for evidence that may possibly reject the null. So in this case, we would make the null hypothesis "the return is less than or equal to 8%", which means we reject the null when we find observations sufficiently far into the right tail. If we reject the null, then the alternative is true, and we conclude the fund is likely to return more than 8%.

Test Statistic
Step #2 in our seven-step process involves identifying an appropriate test statistic. In hypothesis testing, a test statistic is defined as a quantity taken from a sample that is used as the basis for testing the null hypothesis (rejecting or failing to reject the null).
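The one-tailed rejection logic above can be sketched in a few lines of standard-library Python. The 8% framing is the text's example; the helper name and the 5% significance level are our own illustrative assumptions.

```python
from statistics import NormalDist

# H0: mean return <= 8%;  Ha: mean return > 8% (one-tailed, right tail)
alpha = 0.05
critical_z = NormalDist().inv_cdf(1 - alpha)   # about 1.645

def reject_null(z_statistic, critical=critical_z):
    """Reject H0 only if the standardized sample mean lands beyond the
    right-tail critical value."""
    return z_statistic > critical
```

A sample mean that standardizes to z = 2.0 would reject this null, while z = 1.0 would not; for a "less than" hypothesis, the comparison would flip to the left tail.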

Calculating a test statistic will vary based upon the case and our choice of probability distribution (for example, t-test, z-value). The general format of the calculation is:

Formula 2.36
Test statistic = (sample statistic – value of parameter according to null) / (standard error of sample statistic)

Type I and Type II Errors
Step #3 in hypothesis testing involves specifying the significance level of our hypothesis test. The significance level is similar in concept to the confidence level associated with estimating a parameter – both involve choosing the probability of making an error (denoted by α, or alpha), with lower alphas reducing the percentage probability of error. In the case of estimators, the tradeoff of reducing this error was to accept a wider (less precise) confidence interval. In the case of hypothesis testing, choosing lower alphas also involves a tradeoff – in this case, increasing a second type of error. Errors in hypothesis testing come in two forms: Type I and Type II. A type I error is defined as rejecting the null hypothesis when it is true. A type II error is defined as not rejecting the null hypothesis when it is false. As the table below indicates, these errors represent two of the four possible outcomes of a hypothesis test:
                          Null hypothesis is true    Null hypothesis is false
Fail to reject the null   Correct decision           Type II error
Reject the null           Type I error               Correct decision
The reason for separating Type I and Type II errors is that, depending on the case, a Type I error can have serious consequences, while in other cases a Type II error is the one to avoid; it is important to understand which type is more costly in a given application. Significance Level Denoted by α, or alpha, the significance level is the probability of making a Type I error – that is, the probability that we will reject the null hypothesis when it is true. So if we choose a significance level of 0.05, there is a 5% chance of making a Type I error; a 0.01 significance level means there is just a 1% chance. As a rule, the significance level is specified prior to calculating the test statistic; otherwise, the analyst conducting the research could use the result of the calculation to influence the choice of significance level (prompting a change to a higher or lower alpha), which would take away from the objectivity of the test. While any level of alpha is permissible, in practice one of three significance levels is typically used: 0.10 (semi-strong evidence for rejecting the null hypothesis), 0.05 (strong evidence), and 0.01 (very strong evidence). Why wouldn't we always opt for 0.01, or even lower probabilities of a Type I error – isn't the idea to reduce and eliminate errors? In hypothesis testing, we must control two types of errors, with a tradeoff: when one type is reduced, the other is increased. In other words, by lowering the chance of a Type I error, we reject the null less frequently – including when it is false (a Type II error). Quantifying this tradeoff precisely is impossible, because the probability of a Type II error (denoted by β, or beta) is not easy to pin down (it changes for each possible value of the true parameter). Only by increasing sample size can we reduce the probability of both types of error.
Decision Rule Step #4 in the hypothesis-testing process requires stating a decision rule. This rule is crafted by comparing two values: (1) the calculated value of the test statistic, which we will complete in step #5, and (2) a rejection point, or critical value (or values), which is a function of our significance level and the probability distribution being used in the test. If the calculated test statistic is as extreme as (or more extreme than) the rejection point, we reject the null hypothesis and state that the result is statistically significant. Otherwise, if the test statistic does not reach the rejection point, we cannot reject the null hypothesis, and we state that the result is not statistically significant. A rejection point depends on the probability distribution, on the chosen alpha, and on whether the test is one-tailed or two-tailed. For example, if in our case we are able to use the standard normal distribution (the z-value), we choose an alpha of 0.05, and we have a two-tailed test (i.e. we reject the null hypothesis when the test statistic is either too far above or too far below the hypothesized value), the two rejection points are taken from the z-values for the standard normal distribution: below −1.96 and above +1.96. Thus if the calculated test statistic falls in either of these rejection ranges, the decision is to reject the null hypothesis; otherwise, we fail to reject the null hypothesis.
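As a minimal sketch (not from the curriculum), Formula 2.36 and the two-tailed decision rule can be combined, here using the numbers from the z-test example later in this guide (sample mean 10.6, hypothesized mean 12, population standard deviation 8, n = 25):

```python
import math

def z_statistic(sample_mean, hypothesized_mean, pop_std_dev, n):
    # Formula 2.36: (sample statistic - hypothesized parameter) / standard error
    std_error = pop_std_dev / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / std_error

def decide(z, critical=1.96):
    # Two-tailed rule at alpha = 0.05: reject when the statistic is as extreme
    # as (or more extreme than) either rejection point, -1.96 or +1.96
    return "reject null" if abs(z) >= critical else "fail to reject null"

z = z_statistic(10.6, 12.0, 8.0, 25)   # standard error = 8/5 = 1.6
print(round(z, 3))                      # -0.875
print(decide(z))                        # fail to reject null
```

The critical value 1.96 is hardcoded from the standard normal table rather than computed; a different alpha or a one-tailed test would change it.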

Look Out! Traditionally, it was said that we "accepted" the null hypothesis; however, the curriculum's authors discourage use of the word "accept" in this context, as it implies a greater degree of conviction about the null than is warranted. Having made the effort to draw this distinction, do not be surprised if this subtle change (which seems inconsequential on the surface) finds its way onto the CFA exam: answer "accept the null hypothesis" and you get the question wrong; answer "fail to reject the null hypothesis" and you score the points. Power of a Test The power of a hypothesis test refers to the probability of correctly rejecting the null hypothesis when it is false. There are two possible outcomes when the null hypothesis is false: either we (1) reject it (as we correctly should), or (2) fail to reject it – and make a Type II error. Thus the power of a test is equal to 1 minus beta (β), the probability of a Type II error. Since beta is not easily quantified, neither is the power of a test; for hypothesis tests, it is sufficient to specify the significance level, or alpha. However, given a choice between test statistics (for example, z-test vs. t-test), we always choose the test that increases the test's power, all other factors being equal. Confidence Intervals vs. Hypothesis Tests Confidence intervals, as a basis for estimating population parameters, were constructed as a function of the "number of standard errors away from the mean". For example, for 95% confidence that our interval will include the population mean (μ), when we use the standard normal distribution (z-statistic), the interval is (sample mean) ± 1.96*(standard error), or, equivalently, (sample mean) − 1.96*(standard error) < μ < (sample mean) + 1.96*(standard error). Hypothesis tests, as a basis for testing the value of population parameters, are also set up to reject or not reject based on the "number of standard errors away from the mean".
The basic structure for testing the null hypothesis at the 5% significance level, again using the standard normal, is −1.96 < [(sample mean − hypothesized population mean) / standard error] < +1.96, or, equivalently, −1.96*(std. error) < (sample mean) − (hypothesized population mean) < +1.96*(std. error). In hypothesis testing, we essentially create an interval within which the null will not be rejected, and we are 95% confident in this interval (i.e. there is a 5% chance of a Type I error). By slightly rearranging terms, the structure for a confidence interval and the structure for rejecting/not rejecting a null hypothesis appear very similar – an indication of the relationship between the two concepts. Making a Statistical Decision

Step #6 in hypothesis testing involves making the statistical decision: comparing the calculated test statistic to the rejection point, i.e. carrying out the decision rule created in step #4. For example, with a significance level of 0.05, using the standard normal distribution on a two-tailed test (null is "equal to"; alternative is "not equal to"), we have rejection points below −1.96 and above +1.96. If our calculated test statistic [(sample mean − hypothesized mean) / standard error] = 0.6, then we cannot reject the null hypothesis. If the calculated value is 3.6, we reject the null hypothesis and accept the alternative. The final step, step #7, involves making the investment or economic decision (i.e. the real-world decision), in which the statistical decision is but one of many considerations. For example, take a case where we created a hypothesis test to determine whether a mutual fund outperformed its peers in a statistically significant manner. For this test, the null hypothesis was that the fund's mean annual return was less than or equal to a category average; the alternative was that it was greater than the average. Assume that at a significance level of 0.05 we were able to establish statistical significance and reject the null hypothesis, thus accepting the alternative. In other words, our statistical decision was that this fund is likely to outperform its peers – but what is the investment decision? The investment decision would likely also take into account (for example) the risk tolerance of the client and the volatility (risk) measures of the fund, and it would assess whether transaction costs and tax implications make the investment worth making. In other words, rejecting or not rejecting a null hypothesis does not automatically require that a decision be carried out; hence the need to treat the statistical decision and the economic or investment decision as two separate steps.
2.25 - Interpreting Statistical Results Results Where Data Is Normally Distributed and Variance Is Known or Unknown 1. Whenever the variance of a population (σ²) is known, the z-test is the preferred choice for testing a hypothesis about the population mean (μ). To compute the test statistic, the standard error equals the population standard deviation divided by the square root of the sample size. For example, with a population variance of 64 and a sample size of 25, standard error = √64/√25 = 8/5 = 1.6. Example: Test Statistic Suppose that in this same case we have constructed a hypothesis test that the mean annual return is equal to 12%; that is, a two-tailed test where the null hypothesis is that the population mean = 12 and the alternative is that it is not equal to 12. Using a 0.05 significance level (0.025 in each tail), our rule is to reject the null when the test statistic is either below −1.96 or above +1.96 (at p = 0.025, z = 1.96). Suppose the sample mean = 10.6. Answer: Test statistic = (10.6 − 12)/1.6 = −1.4/1.6 = −0.875. This value does not fall below the rejection point of −1.96, so we cannot reject the null hypothesis. 2. When we are testing hypotheses about a population mean, it is relatively likely that the population variance will be unknown. In these cases, we use the sample standard deviation when computing the standard error, and the t-statistic for the decision rule (i.e. as the source of our rejection point). Compared to the z, or standard normal, the t-statistic is more conservative (i.e. it has higher rejection points for rejecting the null hypothesis). For large sample sizes (at least 30), the z-statistic may be substituted. Example: Take a case where the sample size is 16. Here the t-statistic is the only appropriate choice. For the t-distribution, degrees of freedom = sample size − 1, so df = 15 in this example. Assume we are testing a hypothesis that a population mean is greater than 8, making this a one-tailed test (right tail): the null hypothesis is μ ≤ 8, and the alternative is μ > 8. Our required significance level is 0.05. Using the table for Student's t-distribution with df = 15 and p = 0.05, the critical value (rejection point) is 1.753. In other words, if our calculated test statistic is greater than 1.753, we reject the null hypothesis. Answer: Moving to step 5 of the hypothesis-testing process, we take a sample whose mean is 8.3 and whose standard deviation is 6.1. For this sample, standard error = s/√n = 6.1/√16 = 6.1/4 = 1.525. The test statistic is (8.3 − 8.0)/1.525 = 0.3/1.525, or about 0.197. Comparing 0.197 to our rejection point of 1.753, we are unable to reject the null hypothesis. Note that in this case our sample mean of 8.3 was actually greater than 8; however, the hypothesis test requires statistical significance, not simply a comparison of the sample mean to the hypothesized value. In other words, the decisions made in hypothesis testing are also a function of the sample size (which at 16 is low), the standard deviation, the required level of significance, and the t-distribution. Our interpretation in this example is that the sample mean of 8.3, while nominally higher than 8, simply isn't significantly higher than 8 – at least not to the point where we can draw a definitive conclusion that the population mean is greater than 8.
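The one-tailed t-test just described can be sketched as follows; the critical value 1.753 is the t-table entry for df = 15 at 0.05 in one tail, hardcoded here rather than computed:

```python
import math

n, sample_mean, s = 16, 8.3, 6.1
hypothesized_mean = 8.0
critical_t = 1.753                    # t-table: df = 15, one tail, alpha = 0.05

std_error = s / math.sqrt(n)          # 6.1 / 4 = 1.525
t_stat = (sample_mean - hypothesized_mean) / std_error
print(round(t_stat, 3))               # 0.197
print(t_stat > critical_t)            # False -> cannot reject the null
```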
Equality of the Means of Two Normally Distributed Populations, Based on Independent Random Samples, with Variances Assumed Equal or Unequal For the case where the population variances of the two groups can be assumed to be equal, an estimate of the common population variance (s²) is pooled from the sample data by the following formula (which assumes two independent random samples):

Formula 2.37: s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Where: n1, n2 are the sample sizes, and s1², s2² are the sample variances. Degrees of freedom = n1 + n2 − 2. For testing the equality of two population means (i.e. μ1 = μ2), the test statistic is the difference in sample means (X̄1 − X̄2) divided by the standard error: the square root of (s²/n1 + s²/n2), where s² is the pooled variance. Example: Population Means Assume that the pooled estimate of variance (s²) is 40 and the sample size of each group is 20. Standard error = √(40/20 + 40/20) = √4 = 2. Answer: If the sample means are 8.6 and 8.9, then t = (8.6 − 8.9)/2 = −0.3/2 = −0.15. Tests of equality/inequality are two-sided tests. With df = 38 (sum of sample sizes − 2) and 0.05 significance (p = 0.025 in each tail), the rejection points are t < −2.024 and t > +2.024. Since our computed test statistic is −0.15, we cannot reject the null hypothesis that these population means are equal.
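A sketch of the pooled procedure, assuming the standard pooled-variance estimator (Formula 2.37) and using the example's numbers; the 2.024 rejection point is the t-table value for df = 38, two-tailed, alpha = 0.05:

```python
import math

def pooled_variance(s1_sq, s2_sq, n1, n2):
    # Pooled estimator: weighted average of sample variances, df = n1 + n2 - 2
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# The example assumes the pooled variance is already 40, with n = 20 per group
sp_sq, n1, n2 = 40.0, 20, 20
std_error = math.sqrt(sp_sq / n1 + sp_sq / n2)   # sqrt(4) = 2.0
t_stat = (8.6 - 8.9) / std_error
print(round(t_stat, 2), abs(t_stat) > 2.024)     # -0.15 False -> cannot reject
```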

For hypothesis tests of equal population means where the variances cannot be assumed to be equal, the appropriate test statistic is still the t-statistic, but we can no longer pool an estimate of the variance, and the standard error becomes the square root of [(s1²/n1) + (s2²/n2)]. The null hypothesis remains μ1 = μ2, and the test statistic is calculated similarly to the previous example (i.e. difference in sample means / standard error). Degrees of freedom must be approximated by a separate formula.

Look Out! Note: Don't spend time memorizing the degrees-of-freedom approximation formula; it won't be required for the exam. Focus instead on the steps of hypothesis testing and on interpreting results. The Paired-Comparisons Test The previous example tested the equality or inequality of two population means under the key assumption that the two populations were independent of each other. In a paired-comparisons test, the two populations have some degree of correlation or co-movement, and the calculation of the test statistic takes account of this correlation. Take a case where we are comparing two mutual funds that are both classified as large-cap growth, and we are testing whether the returns of one are above the other's in a statistically significant way. The paired-comparisons test is appropriate since we assume some degree of correlation, as the returns of each will depend on the market. To calculate the t-statistic, we first find the sample mean difference, denoted by d̄: d̄ = (1/n)(d1 + d2 + d3 + … + dn), where n is the number of paired observations (in our example, the number of quarters for which we have quarterly returns), and each di is the difference between the two observations in the ith pair. Next, the sample variance – the sum of squared deviations from d̄, divided by (n − 1) – is calculated, with the standard deviation (sd) being the positive square root of the variance. Standard error = sd/√n. For our mutual fund example, if our returns cover 10 years (40 quarters of data), with a sample mean difference of 2.58 and a sample standard deviation of 5.32, our test statistic is (2.58)/(5.32/√40), or 3.067. At 39 degrees of freedom (n − 1) with a 0.05 significance level, the rejection point is approximately 2.02. Thus we reject the null hypothesis and state that there is a statistically significant difference in returns between these funds. Hypothesis Tests on the Variance of a Normally Distributed Population Hypothesis tests concerning the value of a variance (σ²) start by formulating the null and alternative hypotheses.
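Before moving on to variances, the paired-comparisons calculation above can be sketched as follows, using the guide's summary numbers (mean difference 2.58, standard deviation 5.32, n = 40 quarters):

```python
import math

def paired_t(differences):
    # t = mean difference / (sd of differences / sqrt(n)), df = n - 1
    n = len(differences)
    d_bar = sum(differences) / n
    sd = math.sqrt(sum((d - d_bar) ** 2 for d in differences) / (n - 1))
    return d_bar / (sd / math.sqrt(n))

# With the summary statistics already in hand, the statistic reduces to:
t_stat = 2.58 / (5.32 / math.sqrt(40))
print(round(t_stat, 3))   # 3.067
```

The `paired_t` helper shows the full calculation from a list of per-period differences, for cases where the raw paired data is available.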

In hypothesis tests of the variance of a single normally distributed population, the appropriate test statistic is known as the chi-square, denoted by χ². Unlike the distributions we have used previously, the chi-square distribution is asymmetrical, as it is bounded on the left by zero. (This must be true, since a variance can never be negative.) The chi-square is actually a family of distributions, similar to the t-distributions, with different degrees of freedom resulting in different chi-square distributions.

Formula 2.38: χ² = (n − 1)s² / σ0²

Where: n = sample size, s² = sample variance, and σ0² = the hypothesized population variance. The sample variance s² is the sum of squared deviations between the observed values and the sample mean, divided by the degrees of freedom, n − 1. Example: Hypothesis Testing with the Chi-Square Statistic To illustrate a hypothesis test using the chi-square statistic, take the example of a fund that we believe has been very volatile relative to the market, where we wish to show that its level of risk (as measured by the standard deviation of quarterly returns) is greater than the market's average. For our test, we assume the market's quarterly standard deviation is 10%. Our test examines quarterly returns over the past five years, so n = 20 and degrees of freedom = 19. This is a greater-than test with a null hypothesis of σ² ≤ (10)², or 100, and an alternative hypothesis of σ² > 100. Using a 0.05 level of significance, our rejection point, from the chi-square tables with df = 19 and p = 0.05 in the right tail, is 30.144. Thus if our calculated test statistic is greater than 30.144, we reject the null hypothesis at the 5% level of significance. Answer: Examining the quarterly returns for this period, we find that our sample variance (s²) is 135. With n = 20 and σ0² = 100, we have all the data required to calculate the test statistic: χ² = (n − 1)s²/σ0² = (20 − 1)*135/100 = 2565/100 = 25.65. Since 25.65 is less than our critical value of 30.144, we do not have enough evidence to reject the null hypothesis. While this fund may indeed be quite volatile, its volatility is not statistically greater than the market average for the period.
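The chi-square test above can be sketched in a few lines; the critical value 30.144 is taken from a chi-square table (df = 19, 0.05 in the right tail) rather than computed:

```python
def chi_square_stat(n, sample_variance, hypothesized_variance):
    # Formula 2.38: chi-square = (n - 1) * s^2 / sigma0^2
    return (n - 1) * sample_variance / hypothesized_variance

stat = chi_square_stat(20, 135.0, 100.0)
critical = 30.144                 # chi-square table: df = 19, 0.05 right tail
print(stat, stat > critical)      # 25.65 False -> cannot reject the null
```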
Hypothesis Tests Relating to the Equality of the Variances of Two Normally Distributed Populations, Where Both Samples Are Random and Independent For hypothesis tests concerning the relative values of the variances of two populations – whether σ1² (the variance of the first population) and σ2² (the variance of the second) are equal / not equal / greater than / less than – we can construct hypotheses in one of three ways.

When a hypothesis test compares variances from two populations and we can assume that the random samples from the populations are independent (uncorrelated), the appropriate test is the F-test, which is the ratio of the sample variances. As with the chi-square, the F-distribution is a family of asymmetrical distributions (bounded on the left by zero). The F-family of distributions is defined by two degrees-of-freedom values: the numerator (df1) and the denominator (df2). Each is taken from the corresponding sample size (sample size − 1).

The F-statistic taken from the sample data could be either s1²/s2² or s2²/s1² – the convention is to use whichever ratio produces the larger number. This way, the F-test need only be concerned with values greater than 1, since one of the two ratios will always be at least 1. Example: Hypothesis Testing with a Ratio of Sample Variances To illustrate, take the case of two mutual funds. Fund A has enjoyed greater performance returns than Fund B (which we've owned, unfortunately). Our hypothesis is that the level of risk of the two funds is actually quite similar, meaning that Fund A has superior risk-adjusted results. We test the hypothesis using the past five years of quarterly data (df = 19 for both numerator and denominator). Using 0.05 significance, our critical value from the F-tables is 2.51. Assume from the five-year sample that quarterly standard deviations have been 8.5 for Fund A and 6.3 for Fund B. Answer: Our F-statistic is (8.5)²/(6.3)² = 72.25/39.69 = 1.82. Since 1.82 does not reach the rejection level of 2.51, we cannot reject the null hypothesis, and we state that the risk levels of these funds are not significantly different. Concepts from the hypothesis-testing section are unlikely to be tested with rigorous number-crunching exercises, but rather by asking you to identify the unique attributes of a given statistic. For example, a typical question might ask, "In hypothesis testing, which test statistic is defined by two degrees of freedom, the numerator and the denominator?", giving you these choices: A. t-test, B. z-test, C. chi-square, or D. F-test. The answer would be D. Another question might ask, "Which distribution is NOT symmetrical?", with choices: A. t, B. z, C. chi-square, D. normal. Here the answer would be C. Focus on the defining characteristics, as they are the most likely source of exam questions.
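The F-test example for the two funds can be sketched as follows; the critical value 2.51 is the F-table entry for df1 = df2 = 19 at alpha = 0.05, hardcoded rather than computed:

```python
def f_statistic(s1, s2):
    # Convention: larger sample variance in the numerator, so F >= 1
    v1, v2 = s1 ** 2, s2 ** 2
    return max(v1, v2) / min(v1, v2)

f = f_statistic(8.5, 6.3)          # 72.25 / 39.69
critical = 2.51                    # F-table: df1 = df2 = 19, alpha = 0.05
print(round(f, 2), f > critical)   # 1.82 False -> risk not significantly different
```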
Parametric and Nonparametric Tests All of the hypothesis tests described thus far have been designed, in one way or another, to test the predicted value of one or more parameters – unknown quantities, such as the mean and variance, that characterize a population whose observed values are assumed to be distributed in a certain way. These assumptions are mandatory and very important: most commonly applied tests are built on the assumption that the underlying population is normally distributed, and if that is not true, the conclusions reached are invalidated. The less normal the population (i.e. the more skewed the data), the less appropriate these parametric tests and procedures become. Nonparametric hypothesis tests are designed for cases where either (a) fewer or different assumptions about the population data are appropriate, or (b) the hypothesis test is not concerned with a population parameter. In many cases, we are curious about a set of data but believe that the required assumptions (for example, normally distributed data) do not apply, or that the sample size is too small to comfortably make such an assumption. A number of nonparametric alternatives have been developed for use in such cases. The table below indicates a few examples that are analogous to common parametric tests.

Concern of hypothesis        Parametric test                  Nonparametric test
Single mean                  t-test, z-test                   Wilcoxon signed-rank test
Differences between means    t-test (or approximate t-test)   Mann-Whitney U-test
Paired comparisons           t-test                           Sign test, or Wilcoxon signed-rank test

Source: DeFusco, McLeavey, Pinto, Runkle, Quantitative Methods for Investment Analysis, 2nd edition, Chapter 7, p. 357. A number of these tests are constructed by first converting the data into ranks (first, second, third, etc.) and then fitting the ranks into the test. One such test, applied to testing correlation (the degree to which two variables are related to each other), is the Spearman rank correlation coefficient. The Spearman test is useful in cases where a normal distribution cannot be assumed – usually when a variable is bounded by zero (always positive), or where the range of values is limited. For the Spearman test, each observation of the two variables is ranked from largest to smallest, and the differences between the ranks are measured. The data is then used to find the test statistic rs = 1 − [6*(sum of squared rank differences)] / [n*(n² − 1)]. This result is compared to a rejection point (based on the Spearman rank correlation distribution) to determine whether to reject or not reject the null hypothesis. Another situation requiring a nonparametric approach is answering a question about something other than a parameter. For example, analysts often wish to know whether a sample is truly random or whether the data shows a pattern indicating that it is not random (tested with the so-called "runs test"). Tests such as the Kolmogorov-Smirnov test determine whether a sample comes from a population that is distributed in a certain way. Most of these nonparametric examples are specialized and unlikely to be tested in any detail on the CFA Level 1 exam. 2.26 - Correlation and Regression Financial variables are often analyzed for their correlation with other variables and/or market averages. The relative degree of co-movement can serve as a powerful predictor of the future behavior of that variable.
A sample covariance and a correlation coefficient are tools used to indicate the relationship between variables, while linear regression is a technique designed both to quantify that relationship and to test whether one variable helps explain another. When you are analyzing a security, if returns are found to depend significantly on a market index or some other independent source, then both return and risk can be better explained and understood. Scatter Plots A scatter plot is designed to show the relationship between two variables by graphing a series of observations on a two-dimensional graph – one variable on the X-axis, the other on the Y-axis. Figure 2.15: Scatter Plot

Sample Covariance To quantify a linear relationship between two variables, we start by finding the covariance of a sample of paired observations. A sample covariance between two random variables X and Y is the average value of the cross-product of all observed deviations from each respective sample mean. A cross-product, for the ith observation in a sample, is found by this calculation: (ith observation of X – sample mean of X) * (ith observation of Y – sample mean of Y). The covariance is the sum of all cross-products, divided by (n – 1). To illustrate, take a sample of five paired observations of annual returns for two mutual funds, which we will label X and Y:

Year      X return   Y return   Cross-product: (Xi − X̄)*(Yi − Ȳ)
1st       +15.5      +9.6       (15.5 − 6.6)*(9.6 − 7.3) = 20.47
2nd       +10.2      +4.5       (10.2 − 6.6)*(4.5 − 7.3) = −10.08
3rd       −5.2       +0.2       (−5.2 − 6.6)*(0.2 − 7.3) = 83.78
4th       −6.3       −1.1       (−6.3 − 6.6)*(−1.1 − 7.3) = 108.36
5th       +18.7      +23.5      (18.7 − 6.6)*(23.5 − 7.3) = 196.02
Sum       32.9       36.7       398.55
Average   6.6        7.3        398.55/(n − 1) = 99.64 = Cov(X,Y)

The average X and Y returns were found by dividing each sum by n, or 5, while the average of the cross-products is computed by dividing their sum by n − 1, or 4. Statisticians use n − 1 rather than n for the sample covariance to ensure an unbiased estimate. Interpreting a covariance number is difficult for those who are not statistical experts. The 99.64 computed for this example is expressed in units of "returns squared", since the inputs were percentage returns, and a squared return is not an intuitive concept. The fact that Cov(X,Y) = 99.64 is greater than 0 does indicate a positive linear relationship between X and Y. Had the covariance been a negative number, it would have implied an inverse relationship, while 0 means no linear relationship. Thus 99.64 indicates that the returns have positive co-movement (when one moves higher, so does the other), but it doesn't offer any information on the extent of the co-movement. Sample Correlation Coefficient By calculating a correlation coefficient, we essentially convert a raw covariance number into a standard format that can be more easily interpreted to determine the extent of the relationship between the two variables. The formula for calculating a sample correlation coefficient (r) between two random variables X and Y is the following: Formula 2.39

r = (covariance between X, Y) / [(sample standard deviation of X) * (sample standard deviation of Y)]

Example: Correlation Coefficient Return to our example from the previous section, where covariance was found to be 99.64. To find the correlation coefficient, we must compute the sample variances, a process illustrated in the table below.

Year      X return   Y return   Squared X deviations        Squared Y deviations
1st       +15.5      +9.6       (15.5 − 6.6)² = 79.21       (9.6 − 7.3)² = 5.29
2nd       +10.2      +4.5       (10.2 − 6.6)² = 12.96       (4.5 − 7.3)² = 7.84
3rd       −5.2       +0.2       (−5.2 − 6.6)² = 139.24      (0.2 − 7.3)² = 50.41
4th       −6.3       −1.1       (−6.3 − 6.6)² = 166.41      (−1.1 − 7.3)² = 70.56
5th       +18.7      +23.5      (18.7 − 6.6)² = 146.41      (23.5 − 7.3)² = 262.44
Sum       32.9       36.7       544.23                      396.54
Average   6.6        7.3        136.06 = X variance         99.14 = Y variance

Answer: As with the sample covariance, we use n − 1 as the denominator in calculating the sample variance (with the sum of squared deviations as the numerator) – thus in the table above, each sum was divided by 4 to find the variance. Standard deviation is the positive square root of variance: in this example, the sample standard deviation of X is √136.06, or 11.66, and the sample standard deviation of Y is √99.14, or 9.96. Therefore, the correlation coefficient is 99.64/(11.66*9.96) = 0.858. A correlation coefficient is a value between −1 (perfect inverse relationship) and +1 (perfect positive linear relationship) – the closer it is to ±1, the stronger the relationship. The 0.858 computed here suggests a strong positive linear relationship. Hypothesis Testing: Determining Whether a Positive or Inverse Relationship Exists Between Two Random Variables A hypothesis-testing procedure can be used to determine whether there is a significant relationship between two random variables, following each step of the procedure outlined earlier in this study guide. For this particular test, the null hypothesis, H0, is that the correlation in the population is equal to 0; the alternative hypothesis, Ha, is that the correlation is different from 0. The t-test is the appropriate test statistic. Given a sample correlation coefficient r and sample size n, the formula for the test statistic is t = r*√(n − 2)/√(1 − r²), with degrees of freedom = n − 2.

Testing whether a correlation coefficient is equal or not equal to 0 is a two-tailed test. In our earlier example with a sample of 5, degrees of freedom = 5 − 2 = 3, and our rejection point from the t-distribution, at a significance level of 0.05, would be 3.182 (p = 0.025 in each tail). Using our computed sample r of 0.858, t = r*√(n − 2)/√(1 − r²) = 0.858*√3/√(1 − 0.858²) = 1.486/0.514 = 2.89. Comparing 2.89 to our rejection point of 3.182, we do not have enough evidence to reject the null hypothesis that the population correlation coefficient is 0. In this case, while there does appear to be a strong linear relationship between our two variables (and thus we may well be risking a Type II error), the result of the hypothesis test shows the effect of a small sample size: with just three degrees of freedom, a high rejection level is required in order to reject the null hypothesis. Had there been one more observation in our sample (i.e. degrees of freedom = 4), the rejection point would have been 2.776, and we would have rejected the null and concluded that the population correlation is likely to differ significantly from 0. The level of significance also plays a role: in this particular example, we would reject the null hypothesis at a 0.10 level of significance, where the rejection point is any test statistic above 2.353. Of course, the significance level and other assumptions are fixed before the test statistic is calculated, so in this case the null simply could not be rejected. In short, the hypothesis-testing exercise gives us a tool for establishing the significance of a sample correlation coefficient while taking sample size into account.
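The whole chain – sample covariance, standard deviations, correlation coefficient, and the significance test – can be sketched end to end. One assumption to flag: the fifth X return is taken as +18.7, the value consistent with the column sum of 32.9 and cross-product of 196.02 in the worked tables; 3.182 is the t-table value for df = 3, two-tailed, alpha = 0.05:

```python
import math

def sample_covariance(x, y):
    # Average cross-product of deviations from the sample means, using n - 1
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def sample_std_dev(v):
    # Positive square root of the sample variance (n - 1 denominator)
    n = len(v)
    m = sum(v) / n
    return math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))

def corr_t_stat(r, n):
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

x = [15.5, 10.2, -5.2, -6.3, 18.7]   # fund X annual returns (%)
y = [9.6, 4.5, 0.2, -1.1, 23.5]      # fund Y annual returns (%)

r = sample_covariance(x, y) / (sample_std_dev(x) * sample_std_dev(y))
t_stat = corr_t_stat(r, len(x))
print(round(r, 3), round(t_stat, 2), abs(t_stat) > 3.182)   # 0.858 2.89 False
```

Working from the raw data (rather than the rounded means 6.6 and 7.3) reproduces the same r of 0.858 and the same fail-to-reject conclusion.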
Thus, even though 0.858 feels close to 1, it's also not close enough to make conclusions about the correlation of the underlying populations – with the small sample size probably a factor in the test.

2.27 - Regression Analysis Linear Regression A linear regression is constructed by fitting a line through a scatter plot of paired observations between two variables. The sketch below illustrates an example of a linear regression line drawn through a series of (X, Y) observations: Figure 2.16: Linear Regression

A linear regression line is usually determined quantitatively by a best-fit procedure such as least squares (i.e. the sum of squared distances between the regression line and the observations is minimized). In linear regression, one variable is plotted on the X-axis and the other on the Y-axis. The X variable is said to be the independent variable, and the Y variable the dependent variable. When analyzing two random variables, you must choose which variable is independent and which is dependent; the choice follows from the hypothesis, and for many examples the distinction is intuitive. The most popular use of regression analysis is on investment returns, where the market index is the independent variable and the individual security or mutual fund is dependent on the market. In essence, regression analysis formulates the hypothesis that movements in one variable (Y) depend on movements in the other (X). Regression Equation The regression equation describes the relationship between the two variables and is given in the general format:

Formula 2.40

Y = a + bX + ε

Where: Y = dependent variable; X = independent variable; a = intercept of regression line; b = slope of regression line; ε = error term

In this format, given that Y is dependent on X, the slope b indicates the unit change in Y for every unit change in X. If b = 0.66, it means that every time X increases (or decreases) by a certain amount, Y increases (or decreases) by 0.66 times that amount. The intercept a indicates the value of Y at the point where X = 0. Thus if X indicated market returns, the intercept would show how the dependent variable performs when the market has a flat quarter with returns of 0. In investment parlance, a manager is said to have a positive alpha when a linear regression between the manager's performance and the performance of the market produces an intercept a greater than 0.

Linear Regression - Assumptions
Drawing conclusions about the dependent variable requires that we make six assumptions, the classic

assumptions in relation to the linear regression model:

1. The relationship between the dependent variable Y and the independent variable X is linear in the slope and intercept parameters a and b. This requirement means that neither regression parameter can be multiplied or divided by another regression parameter (e.g. a/b), and that both parameters are raised to the first power only. In other words, we can't construct a linear model where the equation is Y = a + b²X + ε, as unit changes in X would then have a b² effect on Y, and the relation would be nonlinear.
2. The independent variable X is not random.
3. The expected value of the error term ε is 0. Assumptions #2 and #3 allow the linear regression model to produce estimates for slope b and intercept a.
4. The variance of the error term is constant for all observations. Assumption #4 is known as the "homoskedasticity assumption". When a linear regression is heteroskedastic, the variance of its error terms differs across observations, and the model may be less useful in predicting values of the dependent variable.
5. The error term ε is uncorrelated across observations; in other words, the covariance between the error term of one observation and that of any other is assumed to be 0. This assumption is necessary to estimate the variances of the parameters.
6. The distribution of the error terms is normal. Assumption #6 allows hypothesis-testing methods to be applied to linear-regression models.

Standard Error of Estimate
Abbreviated SEE, this measure gives an indication of how well a linear regression model is working. It compares actual values of the dependent variable Y to the predicted values that would have resulted had Y followed exactly from the linear regression. For example, take a case where a company's financial analyst has developed a regression model relating annual GDP growth to company sales growth by the equation Y = 1.4 + 0.8X.
Assume the following experience over a five-year period; predicted data is a function of the model and GDP, and "actual" data indicates what happened at the company:

Year   GDP growth (Xi)   Predicted co. growth (Ŷi)   Actual co. growth (Yi)   Residual (Yi – Ŷi)   Squared residual
1      5.1               5.5                         5.2                      -0.3                 0.09
2      2.1               3.1                         2.7                      -0.4                 0.16
3      -0.9              0.7                         1.5                      0.8                  0.64
4      0.2               1.6                         3.1                      1.5                  2.25
5      6.4               6.5                         6.3                      -0.2                 0.04

To find the standard error of the estimate, we take the sum of all squared residual terms, divide by (n – 2), and then take the square root of the result. In this case, the sum of the squared residuals is 0.09 + 0.16 + 0.64 + 2.25 + 0.04 = 3.18. With five observations, n – 2 = 3, and SEE = √(3.18/3) = 1.03%.
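The SEE arithmetic can be checked in a few lines; this sketch uses the residuals from the five-year table above:

```python
import math

residuals = [-0.3, -0.4, 0.8, 1.5, -0.2]     # actual minus predicted, from the table
sse = sum(e ** 2 for e in residuals)         # sum of squared residuals = 3.18
see = math.sqrt(sse / (len(residuals) - 2))  # divide by n - 2, then take the square root
print(round(see, 2))                         # 1.03
```

Note the n – 2 divisor: two degrees of freedom are used up estimating the slope and the intercept.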

The computation for standard error is relatively similar to that of standard deviation for a sample (n – 2 is used instead of n – 1). It gives some indication of the predictive quality of a regression model, with lower SEE numbers indicating that more accurate predictions are possible. However, the standard-error measure doesn't indicate the extent to which the independent variable explains variations in the dependent variable.

Coefficient of Determination
Like the standard error, this statistic gives an indication of how well a linear-regression model serves as an estimator of values for the dependent variable. It works by measuring the fraction of total variation in the dependent variable that can be explained by variation in the independent variable. In this context, total variation is made up of two fractions:

1 = (explained variation / total variation) + (unexplained variation / total variation)

The coefficient of determination, or explained variation as a percentage of total variation, is the first of these two terms. It is sometimes expressed as 1 – (unexplained variation / total variation). For a simple linear regression with one independent variable, the simple method for computing the coefficient of determination is squaring the correlation coefficient between the dependent and independent variables. Since the correlation coefficient is given by r, the coefficient of determination is popularly known as "R², or R-squared". For example, if the correlation coefficient is 0.76, the R-squared is (0.76)² = 0.578. R-squared terms are usually expressed as percentages; thus 0.578 would be 57.8%. A second method of computing this number would be to find the total variation in the dependent variable Y as the sum of the squared deviations from the sample mean. Next, calculate the unexplained variation (the sum of squared residuals) following the process outlined in the previous section. The coefficient of determination is then computed as (total variation in Y – unexplained variation in Y) / total variation in Y. This second method is necessary for multiple regressions, where there is more than one independent variable, but for our context we will be provided the correlation coefficient r and can simply square it to find R-squared. What R² tells us is the portion of changes in the dependent variable Y that is explained by changes in the independent variable X. An R² of 57.8% tells us that 57.8% of the changes in Y result from X; it also means that 1 – 57.8%, or 42.2%, of the changes in Y are unexplained by X and are the result of other factors. So the higher the R-squared, the better the predictive quality of the linear-regression model.

Regression Coefficients
For either regression coefficient (intercept a, or slope b), a confidence interval can be determined with the following information:

1. An estimated parameter value from a sample
2. The standard error of the estimated coefficient
3. The significance level for the t-distribution
4. Degrees of freedom (sample size – 2)
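The R-squared computation described above is a single line of arithmetic; this sketch uses the r = 0.76 example from the text:

```python
r = 0.76                    # correlation coefficient between X and Y
r_squared = r ** 2          # coefficient of determination: square the correlation
print(round(r_squared, 3))  # 0.578, i.e. 57.8% of variation in Y explained by X
```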

For a slope coefficient, the formula for the confidence interval is given by b ± tc*sb, where tc is the critical t value at our chosen significance level and sb is the standard error of the coefficient. To illustrate, take a linear regression with a mutual fund's returns as the dependent variable and the S&P 500 index as the independent variable. For five years of quarterly returns, the slope coefficient b is found to be 1.18, with a standard error of 0.147. Student's t-distribution for 18 degrees of freedom (20 quarters –

2) at a 0.05 significance level is 2.101. This data gives us a confidence interval of 1.18 ± (0.147)*(2.101), or a range of 0.87 to 1.49. Our interpretation is that there is only a 5% chance that the slope of the population is either less than 0.87 or greater than 1.49 – we are 95% confident that this fund is at least 87% as volatile as the S&P 500, but no more than 149% as volatile, based on our five-year sample.

Hypothesis Testing and Regression Coefficients
Regression coefficients are frequently tested using the hypothesis-testing procedure. Depending on what the analyst intends to prove, we can test a slope coefficient to determine whether it explains changes in the dependent variable, and the extent to which it explains those changes. Betas (slope coefficients) can be tested to determine whether they are above or below 1 (more volatile or less volatile than the market). Alphas (the intercept coefficient) can be tested on a regression between a mutual fund and the relevant market index to determine whether there is evidence of a sufficiently positive alpha (suggesting value added by the fund manager). The mechanics of hypothesis testing are similar to the examples we have used previously. A null hypothesis is chosen based on a not-equal-to, greater-than or less-than case, with the alternative satisfying all values not covered in the null case. Suppose in our previous example, where we regressed a mutual fund's returns on the S&P 500 for 20 quarters, our hypothesis is that this mutual fund is more volatile than the market. A fund equal in volatility to the market will have slope b of 1.0, so for this hypothesis test we state the null hypothesis as the case where the slope is less than or equal to 1.0 (i.e. H0: b ≤ 1.0). The alternative hypothesis Ha has b > 1.0. We know that this is a greater-than case (i.e. one-tailed) – if we assume a 0.05 significance level, the critical t is 1.734 at degrees of freedom = n – 2 = 18.
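Both the confidence interval and the one-tailed test statistic from this fund example can be reproduced in a few lines. This is a sketch: the t values 2.101 and 1.734 come from a t-table for df = 18, as in the text.

```python
b_hat, se_b = 1.18, 0.147                 # slope estimate and its standard error

# 95% confidence interval: b ± t_c * standard error
t_c = 2.101                               # two-tailed critical t, 0.05 level, df = 18
lower, upper = b_hat - t_c * se_b, b_hat + t_c * se_b
print(round(lower, 2), round(upper, 2))   # 0.87 1.49

# One-tailed test of H0: b <= 1.0 against Ha: b > 1.0
t_stat = (b_hat - 1.0) / se_b             # about 1.224
print(t_stat > 1.734)                     # False: cannot reject H0
```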
Example: Interpreting a Hypothesis Test
From our sample, we had an estimated b of 1.18 and a standard error of 0.147. Our test statistic is computed with this formula: t = (estimated coefficient – hypothesized coefficient) / standard error = (1.18 – 1.0)/0.147 = 0.18/0.147, or t = 1.224. For this example, our calculated test statistic is below the rejection level of 1.734, so we are not able to reject the null hypothesis that the fund is no more volatile than the market. Interpretation: proving that b > 1 for this fund would probably require more observations (greater degrees of freedom) to establish statistical significance. Also, with 1.18 only slightly above 1.0, it is quite possible that this fund is actually not more volatile than the market, and we were correct not to reject the null hypothesis.

Example: Interpreting a Regression Coefficient
The CFA exam is likely to give the summary statistics of a linear regression and ask for interpretation. To illustrate, assume the following statistics for a regression between a small-cap growth fund and the Russell 2000 index:

Correlation coefficient: 0.864
Intercept: -0.417
Slope: 1.317

What does each of these numbers tell us?

1. About 75% of the variation in the fund is explained by changes in the Russell 2000 index. This is true because the square of the correlation coefficient, (0.864)² = 0.746, gives us the coefficient of determination, or R-squared.

2. The fund will slightly underperform the index when index returns are flat. This follows from the value of the intercept, –0.417: when X = 0 in the regression equation, the dependent variable equals the intercept.
3. The fund will on average be more volatile than the index. This follows from the slope of the regression line, 1.317 (i.e. for every 1% change in the index, we expect the fund's return to change by 1.317%).
4. The fund will outperform in strong market periods and underperform in weak markets. This follows from the regression: additional risk is compensated with additional reward, with the reverse being true in down markets. Predicted values of the fund's return, given a return for the market, can be found by solving Y = -0.417 + 1.317X (X = Russell 2000 return).

Analysis of Variance (ANOVA)
Analysis of variance, or ANOVA, is a procedure in which the total variability of a random variable is subdivided into components so that it can be better understood, or attributed to each of the various sources that cause the number to vary. Applied to regression parameters, ANOVA techniques are used to determine the usefulness of a regression model, and the degree to which changes in an independent variable X can be used to explain changes in a dependent variable Y. For example, we can conduct a hypothesis-testing procedure to determine whether the slope coefficient is equal to zero (i.e. the variables are unrelated), or whether there is statistical meaning to the relationship (i.e. the slope b is different from zero). An F-test can be used for this process.

F-Test
The formula for the F-statistic in a regression with one independent variable is given by the following:

Formula 2.41
F = mean regression sum of squares / mean squared error = (RSS/1) / [SSE/(n – 2)]

The two abbreviations to understand are RSS and SSE:

1. RSS, or the regression sum of squares, is the amount of total variation in the dependent variable Y that is explained by the regression equation. The RSS is calculated by computing each deviation between a predicted Y value and the mean Y value, squaring the deviation and adding up all terms. If an independent variable explains none of the variation in a dependent variable, then the predicted values of Y are equal to the average value, and RSS = 0.
2. SSE, or the sum of squared errors (residuals), is calculated by finding the deviation between a predicted Y and an actual Y, squaring the result and adding up all terms.

TSS, or total variation, is the sum of RSS and SSE. In other words, this ANOVA process breaks variance into two parts: one that is explained by the model and one that is not. Essentially, for a regression equation to have high predictive quality, we need to see a high RSS and a low SSE, which will make the ratio (RSS/1)/[SSE/(n – 2)] high and (based on a comparison with a critical F-value) statistically meaningful. The critical value is taken

from the F-distribution and is based on the degrees of freedom in the numerator and denominator. For example, with 20 observations, the degrees of freedom would be 1 and n – 2 = 18, resulting in a critical value (from the F-table, at a 0.05 significance level) of 4.41. If RSS were 2.5 and SSE were 1.8, then the computed test statistic would be F = (2.5/1)/(1.8/18) = 25, which is well above the critical value, indicating that the regression equation has predictive quality (b is different from 0).

Estimating Economic Statistics with Regression Models
Regression models are frequently used to estimate economic statistics such as inflation and GDP growth. Assume the following regression is made between estimated annual inflation (X, or independent variable) and the actual number (Y, or dependent variable): Y = 0.154 + 0.917X. Using this model, the predicted inflation number would be calculated for the following inflation scenarios:

Inflation estimate     Inflation based on model
-1.1%                  -0.85%
+1.4%                  +1.43%
+4.7%                  +4.46%
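The model's predictions in the table can be reproduced directly from the fitted equation; a minimal sketch (the +1.43% entry reflects the guide's rounding of 1.4378):

```python
def model_inflation(estimate):
    # Fitted regression from the example: actual = 0.154 + 0.917 * estimate
    return 0.154 + 0.917 * estimate

for est in (-1.1, 1.4, 4.7):
    print(round(model_inflation(est), 2))
```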

The predictions based on this model seem to work best for typical inflation estimates, and they suggest that extreme estimates tend to overstate inflation – e.g. a predicted actual inflation of just 4.46% when the estimate was 4.7%. Overall, the model suggests that estimates are highly predictive, though to better evaluate it we would need to see the standard error and the number of observations on which it is based. If we knew the true values of the regression parameters (slope and intercept), the variance of any predicted Y value would be equal to the square of the standard error. In practice, we must estimate the regression parameters, so our predicted value for Y is an estimate based on an estimated model. How confident can we be in such a process?

To determine a prediction interval, employ the following steps:

1. Predict the value of the dependent variable Y based on the independent observation X.
2. Compute the variance of the prediction error, using the following equation:

Formula 2.42

sf² = s²[1 + 1/n + (X – X̄)²/((n – 1)sx²)]

Where: s² is the squared standard error of the estimate, n is the number of observations, X is the value of the

independent variable used to make the prediction, X̄ is the mean of the independent variable, and sx² is the variance of X.

3. Choose a significance level α for the confidence interval.
4. Construct an interval at (1 – α) percent confidence, using the structure Y ± tc*sf.

Here's another case where the material becomes much more technical than necessary, and one can get bogged down in preparation when, in reality, the formula for the variance of a prediction error isn't likely to be covered. Prioritize – don't squander precious study hours memorizing it. If the concept is tested at all, you'll likely be given the answer to Part 2. Simply know how to use the structure in Part 4 to answer a question. For example, if the predicted X observation is 2 for the regression Y = 1.5 + 2.5X, we would have a predicted Y of 1.5 + 2.5*(2), or 6.5. Our confidence interval is 6.5 ± tc*sf. The t-stat is based on the chosen confidence level and degrees of freedom, while sf is the square root of the equation above (for the variance of the prediction error). If these numbers are tc = 2.10 for 95% confidence and sf = 0.443, the interval is 6.5 ± (2.1)*(0.443), or 5.57 to 7.43.

Limitations of Regression Analysis
Focus on three main limitations:

1. Parameter Instability - This is the tendency for relationships between variables to change over time, due to changes in the economy or the markets, among other uncertainties. If a mutual fund produced a return history in a market where technology was a leadership sector, the model may not work when foreign and small-cap markets are leaders.
2. Public Dissemination of the Relationship - In an efficient market, this can limit the effectiveness of that relationship in future periods. For example, the discovery that low price-to-book-value stocks outperform high price-to-book-value stocks means that these stocks can be bid higher, and value-based investment approaches will not retain the same relationship as in the past.
3.
Violation of Regression Relationships - Earlier we summarized the six classic assumptions of a linear regression. In the real world, these assumptions are often unrealistic – e.g. the assumption that the independent variable X is not random.
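Finally, the Part 4 prediction-interval structure (Y ± tc*sf) from the section above is easy to apply. This sketch uses the guide's own numbers, with tc and sf given, as they would be on the exam:

```python
a, b, x = 1.5, 2.5, 2.0
y_hat = a + b * x                         # predicted Y = 1.5 + 2.5 * 2 = 6.5

t_c, s_f = 2.10, 0.443                    # critical t (95%) and prediction-error std. dev., both given
lower, upper = y_hat - t_c * s_f, y_hat + t_c * s_f
print(round(lower, 2), round(upper, 2))   # 5.57 7.43
```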