Gerald Pollio, Ph.D.
Preliminary – Not to be quoted without the author’s permission
1. Introduction
The purpose of this Guide is to acquaint you with the purpose, sources and structure of your finance dissertation. Most graduate business schools require students to produce a dissertation in partial fulfilment of the requirements for obtaining an MBA degree. Many students resent this requirement and would, if they could, take additional courses or, better still, produce a Business Project instead. This is the practice of some business schools, though the vast majority still require production of what might best be described as an ‘academic’ dissertation. We place the word academic in inverted commas to emphasise that dissertations are not strictly speaking formal academic studies. Business education is after all an applied subject and students, accordingly, will expect that the topic of their dissertation should emphasise practical relevance. The two of course are not mutually exclusive: most business school dissertations combine the former with the latter, in that students are expected to produce output that meets or exceeds established academic norms but within the context of addressing a topic that will advance understanding of a narrowly defined business issue. Student hostility towards the dissertation requirement is understandable but, as we hope to show, misguided. Throughout their course of study students have to face assessments of various sorts, some oral, some written, some as part of a group exercise, others as individual assignments. What these assessments have in common is that they were all set by the student’s lecturers, with the choice, if any, confined to the limited range of topics on offer. A dissertation is the only assessment the choice of which is determined more or less uniquely by the student. Students, of course, have the benefit of their Supervisor’s advice, designed to improve their proposal and ensure that it can be completed within the time required.
Only on very rare occasions will a Supervisor reject the student’s topic and then only because it is too broad and thus unlikely to be completed within the time allotted. Supervisors seldom reject out of hand a dissertation topic, since we all recognise that a topic of the student’s own choice is the best motivator for getting on with the work. A logical place to begin our discussion is with the concept of research. Most students find the task awesome, especially international students arriving from countries where the prevailing approach to education differs, in some cases quite radically, from that of the United Kingdom. Yet the process is far less daunting than you might imagine, not the least because, perhaps without even being aware of it, most students have already produced some fairly sophisticated research results of their own. Consider the following: the university at which you are studying was not chosen randomly; you will have reviewed the websites of a number of different business schools
that were of interest to you. You will have narrowed the focus by concentrating on those where you meet all of the requirements, whether in respect of prior academic accomplishment or linguistic proficiency. You will have determined whether tuition costs are reasonable, and whether you can afford major ancillary expenses such as housing, food and transportation. You will also have investigated whether you are able to work, and if so, how many hours are both permissible and consistent with successfully completing your course of study. The answers to all of these issues will have come from a careful and detailed assessment of the material from whatever source or sources you had access to, and quite possibly from discussions with one or more students who attended the university you are considering attending. These people will also be a source of valuable information concerning the additional costs you will incur as part of acquiring your degree, what type of work is available locally and what rates of pay are likely to be. Armed with this information, you select from among the many post-graduate institutions you investigated the one that best meets all of your requirements. As will be seen the very same process applies when drafting your dissertation.
2. Research
What is research? The definition favoured by the UK’s Quality Assurance Agency is also one of the most comprehensive:
Research … is to be understood as original investigation undertaken in order to gain knowledge and understanding. It includes work of direct relevance to the needs of commerce and industry, as well as to the public and voluntary sectors; scholarship, the invention and generation of ideas, images, performances and artefacts including design, where these lead to new or substantially improved insights; and the use of existing knowledge in experimental development to produce new or substantially improved materials, devices, products and processes, including design and construction. It excludes routine testing, and analysis of materials, components and processes, for example, for the maintenance of national standards, as distinct from the development of new analytical techniques. It also excludes the development of teaching materials that do not embody original research.
The quote actually applies to the production of doctoral dissertations though, with the necessary changes, it applies equally to MBA dissertations. Knowledge is not produced in isolation; it builds upon the scholarship and research efforts of others, hence the centrality of the Literature Survey that is mandatory for all dissertations. It provides the platform upon which the dissertation rests, pointing to the best way to approach a given topic; it also summarises the results of recent scholarship against which the findings of your dissertation can and will be measured. The QAA definition highlights the critical importance of originality, by which they mean either a new or unique contribution to existing knowledge or the generation of results that extend, revise or supplant existing scholarship. A somewhat weaker standard is applied to the production of a Master’s dissertation. You will not be expected to produce original research in the sense just described. Originality here means you will select and address a topic of interest or importance applying principles and techniques learned as part of your post-graduate education. In effect, the dissertation demonstrates the depth of your understanding of the analytical materials acquired during your course of study and your ability to apply the associated tools constructively and effectively to a topic of direct interest or relevance to you. It also means something else of equal importance, which applies in respect of whatever material you are required to prepare and submit for evaluation, namely, that all such work must have been conceived and produced by you. We are, of course, referring here to plagiarism. This does not mean that you cannot refer to work produced by others, only that whenever you do, you must acknowledge its source. This applies as much to direct quotations, which you will be required to indicate with the use of inverted commas, as to paraphrases and summaries of other people’s ideas.
The failure to do so is not merely a breach of research ethics, it amounts to intellectual theft that can result in immediate failure of the work presented and possible termination of your studies here.
There are various aspects of research that you must understand as a prelude to writing your dissertation; many of these terms are used casually, or carelessly, even among those who should know better. It seems desirable, therefore, to provide rigorous definitions for these terms to prevent you from falling into the same trap. We begin with the term concept. Concepts are abstractions – products of the mind – that identify some aspects of reality as forming a class made up of things that are similar – or at least alike enough – for purposes of theorising about them. A concrete example should help here. Mammals are warm blooded animals that incubate their offspring internally and give birth to living young. There are of course many different types of mammals, from mice to hippopotami, some of which are carnivores, others herbivores, while yet others eat both meat and vegetation. These variations within the broader category are, however, irrelevant to many biological theories; for many theoretical purposes, all mammals are alike. Although concepts are abstract, and not always directly measurable, all scientific concepts have empirical (observable) counterparts. We may not be able to see, touch or smell the concept of mammal, but we can see, touch, and smell animals that are classified as mammals. An indicator is any observable measure of a concept. While scientific research is undertaken to test a theory or thesis, actual research necessarily is based entirely on examination of indicators. There is, for example, a strong presumption in the management literature that training improves productivity which in turn benefits the firm’s bottom line. What we observe are the results (indicators) of training on profitability, and if profitability does improve we may reasonably conclude that training contributed to improved financial performance. Theories and theses produce specific predictions about relationships that do (or do not) exist among a given set of indicators. 
A prediction about the relationship that exists among indicators is known as a hypothesis; it is the hypothesis that serves as the specific research focus. A theory is meant to have general relevance, applicable across time and space; a thesis, by contrast, lacks generality and typically applies to a specific situation occurring in a specific time period. There is a long running debate about the function of theory, and how best to test its accuracy. For some, theory serves three broad purposes equally: explanation, prediction and control. To fulfil these requirements, the assumptions upon which the theory depends should be realistic, facilitating determination whether the theory has broad, limited or no explanatory power at all. If it does we should be able to use the theory to forecast future outcomes: for example, a reduction in taxes or an increase in public expenditure is expected to stimulate economic activity by some multiple of the initial fiscal injection. If the prediction is valid, then the information can be used to influence or control subsequent economic activity. When the economy is heading towards recession, governments should pursue expansionary fiscal policies to limit the decline in total output.
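The fiscal example above can be made concrete with a short sketch. It is an illustration only and rests on assumptions that are not part of this Guide: it uses the simplest textbook multiplier, 1 / (1 - MPC), for a closed economy with no taxes or imports, and the marginal propensity to consume (0.8) and the size of the injection (10) are invented figures. The function names are our own.

```python
def fiscal_multiplier(mpc):
    """Simple textbook multiplier 1 / (1 - MPC); an assumed, highly stylised model."""
    if not 0.0 <= mpc < 1.0:
        raise ValueError("mpc must lie in [0, 1)")
    return 1.0 / (1.0 - mpc)


def predicted_output_change(injection, mpc):
    """Predicted change in total output: the injection times the multiplier."""
    return injection * fiscal_multiplier(mpc)


def spending_rounds(injection, mpc, rounds):
    """Add up successive rounds of re-spending to show where the multiple comes from."""
    total = 0.0
    spend = injection
    for _ in range(rounds):
        total += spend   # this round's spending becomes someone's income
        spend *= mpc     # a fraction of that income is spent again
    return total


# Hypothetical figures: an injection of 10 with an MPC of 0.8.
change = predicted_output_change(10.0, 0.8)
approx = spending_rounds(10.0, 0.8, 200)
print(round(change, 2), round(approx, 2))
```

Under these assumptions the predicted change in output is five times the initial injection. Whether actual output responds in this way is precisely the kind of prediction that a hypothesis test would examine.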
For others, the true test of a theory is prediction alone; whether the assumptions used to generate the theory are realistic or not is immaterial. Consider the following example, often used to support this assertion. A billiard player is about to launch the eight ball into the corner pocket. Will she succeed? Other than guessing the outcome, we could formulate a simple theory about the likelihood of success. One such theory would be to assume that the player knows trigonometry and uses it to determine the correct angle with which the white cue ball should strike the black eight ball to ensure it falls into the designated pocket. As a description of how (most) people approach the game of billiards it is probably hopelessly inaccurate; in other words, the assumptions underpinning the theory, taken at face value, are totally unrealistic, but as a theory of whether the eight ball will drop into the corner pocket, there is none better. Critics of this position point out that realistic assumptions are the foundation of sound research. Suppose one of the conclusions of the model is shown to be false; it then logically follows that at least one of the assumptions must be false, and if the assumptions are not realistic (plausible), it is hard to learn from the failure of the conclusion. Note, significantly, that ‘realistic’ or ‘plausible’ are not synonymous with ‘true.’ It is thus unclear exactly what additional information is imparted by the use of ‘realistic’ assumptions. Suppose we substitute ‘more’ for ‘less’ realistic assumptions; in what way will we gain in understanding? If our objective is to determine how best to place a billiard ball in a given pocket, with which plausible assumption(s) should we replace the model’s unrealistic assumptions, and will that significantly improve the model’s predictive accuracy?
A second, more focused, example may help clarify the point while pulling together broader issues of how to go about testing hypotheses which, after all, is the primary objective of any dissertation. In Principles of Finance courses students are taught the Efficient Markets Hypothesis (EMH). The theory presupposes that market participants make use of all publicly available information to value assets such as real estate or shares of stock. Indeed, the EMH goes so far as to assert that it doesn’t matter whether all traders act rationally; so long as some investors act rationally, then the market as a whole will exhibit rationality. One of the main predictions of the EMH is that share prices will incorporate all known information about the firm so that only the arrival of new information will cause prices to change. Since new information (‘news’) is unpredictable, so too will be the behaviour of share prices, thus providing an explanation for the apparently random character of their movements. If so, the best prediction of tomorrow’s share price is today’s price. An equally important implication is that active investment management is pointless, since it will not be able to produce extraordinary returns; investors would do better to invest in indexed funds that track the movements of a broad market index. How can we be sure that departures from rationality will be corrected by a small number of rational, knowledgeable traders? The mechanism that ensures that shares are correctly valued is known as arbitrage; rational investors who spot pricing errors will buy (sell) the under- (over-) priced asset until the correct valuation is achieved.
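The prediction that the best forecast of tomorrow’s share price is today’s price can be illustrated with a small simulation. The sketch below is not a test of the EMH on real data: the prices are generated artificially so that every daily change is pure, unforecastable ‘news’, and the starting price, volatility, number of days and function names are all assumptions made for the example. It compares a naive forecast (tomorrow equals today) with a forecast that extrapolates yesterday’s price change.

```python
import random


def simulate_random_walk(start_price, n_days, daily_vol=0.01, seed=0):
    """Simulate prices whose daily percentage changes are zero-mean random shocks."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(n_days):
        shock = rng.gauss(0.0, daily_vol)      # the day's unpredictable 'news'
        prices.append(prices[-1] * (1.0 + shock))
    return prices


def mean_abs_error(forecasts, actuals):
    """Average absolute gap between forecast and realised prices."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


prices = simulate_random_walk(100.0, 2000)

actual_tomorrow = prices[2:]
# Naive forecast: tomorrow's price equals today's price.
naive = prices[1:-1]
# Trend forecast: today's price plus the change observed since yesterday.
trend = [2 * prices[i] - prices[i - 1] for i in range(1, len(prices) - 1)]

naive_mae = mean_abs_error(naive, actual_tomorrow)
trend_mae = mean_abs_error(trend, actual_tomorrow)
print(naive_mae < trend_mae)
```

In this artificial setting, extrapolating past price changes only adds noise, so the naive forecast has the smaller average error. With real share prices the outcome is an empirical question, which is what makes the EMH a testable hypothesis.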
But how do we know this well-informed individual or group of individuals exists, and if they do, how did they come by their extraordinary knowledge of asset valuation? The simple fact is that the existence of well-informed arbitrageurs, and how they acquire their omniscience, is nothing more than a theoretical construct: these are assumptions that are not directly verifiable, but which help to explain why financial markets exhibit such a high degree of efficiency. We could, of course, dispense with these assumptions, but then we would have to find another explanation for why markets are efficient.
A final important consideration concerns causation. In principle, the concept is straightforward: two variables are causally related if and only if it can be shown that the behaviour of one variable (the dependent variable) is directly influenced by a second variable known as the independent variable. It is much more difficult to establish causation empirically; the fact of placing one variable on the right hand side of the equation and calling it the independent variable is too facile. Consider one of the examples given above, that budget deficits can be used to stimulate economic activity during recessionary periods. Since the size of the budget deficit affects, as well as being affected by, the state of the economy, we could never be sure which impact predominates, that is, whether fiscal policy is having the predicted impact. Numerous causality tests exist, most of which are too advanced to be used for dissertations here. But the point to remember is that it is wrong to assert the existence of causal relationships without any empirical or external sanction for the claim.
3. Getting Ready: The Proposal
To get you started the following exhibit highlights the main points you need to know to develop your Proposal. The full UWIC report from which this exhibit is extracted is available on your student portal; it would be to your advantage to read this document before embarking on your Proposal, as it answers the most frequently asked student questions.
Exhibit 1
Summary of Proposal Guidelines
Red Book UWIC Revised Edition 2007, with explanatory notes

Word count: The proposal is ‘2500 words of which the literature review should represent around 1500 words.’

I Provisional Title
‘This should include an initial sentence that clearly encompasses the purpose and aims of the dissertation’. You should be able to establish in which organisation/s the primary data is to be collected and what the research aims to do.

II Brief review of literature
‘The aim is to have a focused and critical conclusion.’ ‘The literature review is an essential guide to other stages’. Plan your review into themes that relate directly to the research question and can direct the following research aims and objectives.

III Aims and objectives
‘These should flow directly from the focus of the literature review.’ ‘Clearly specify your research objectives.’

IV Statement of the design and methodology
‘Overall design, validity and reliability. The justification will need to also refer to your title, aims, objectives together with issues of access [primary data] and time’.

V Sources and acquisition of data
‘It must be shown how these relate to the research title and design. They must provide evidence that access has been negotiated.’ [Email or letter from the organisation/s where the primary research is to be gathered]. ‘Ensure that ethical issues have been considered’ [never collect data, record interviews or meetings without prior and clear permission being granted]. ‘The method must be consistent with the research design’. ‘Field work methods must be specified e.g. participative observation, interviews and questionnaires’

VI Method of data analysis
‘A clear and reasoned distinction needs to be made between deductive quantitative techniques and inductive qualitative techniques. The selection needs to be consistent with the research design and field work methods.’

VII Form of presentation
‘Usually in written form. Additionally, indicate if graphs, charts or tables are to be also included.’

VIII Timetable
• ‘ensures that the work has been planned
• guide to the amount of work to be done
• guide to how much time should be spent on each section – time limitation can be a reason for selecting certain methods.’

References
The Harvard System: A reference is cited in the main body of the text by inclusion of the author’s name and date of publication, e.g. (Greaves, 2006) wrote about the size of………. If you are using a direct quotation you must include quotation marks and the page number of the reference in the main body of your text, e.g. (Rooney and Owen, 2006 p. 5) maintain that teamwork: “supports the values of the organisation”. If there are more than two authors you should name the first author only, e.g. (Kotler et al, 2006). References are then listed alphabetically on a references page at the end of your dissertation.
Examples:
Single author: Keegan, W. J. (1989) Global Marketing Management London: Prentice Hall
Two authors: Gerard, S. & Harman, D. (2005) Winning the League Liverpool: Penguin
Source: Michael L Nieto, LSC (2009).
You will be assisted with the production of your dissertation by your Supervisor, a knowledgeable member of faculty with a background in your chosen subject area. Before being assigned a Supervisor, you are required to take a course in which the purpose, scope and nature of research and research methods are discussed and clarified. These new tools are designed to supplement the knowledge you acquired in classroom lectures; together they provide the essential preparation to undertake production of your dissertation. The first step in this process will be topic selection. By the time you have reached this stage, you will already have decided in which area of business you intend to specialise,
and will similarly have considered those topics within your subject area that are of greatest interest. Your Research Methods course will help to sharpen your thoughts, after which you will be required to produce a formal research proposal, that is, a brief description setting out the primary and secondary, if any, objectives of your dissertation and how you intend to proceed with your study. The Proposal amounts to a statement of intent; it is possible to alter or amend your original submission, usually after preliminary consultation with your Supervisor. The title of your proposed dissertation provides a clear statement as to what you intend to do. Best practice is to put the title in the form of a question; if you intend to answer the question posed in the title with reference to a specific company or industry, you may add an indication to that effect, for example, ‘Does Pay for Performance Affect Profitability? The Walt Disney Company as a Case Study.’ In preparation for submitting your Proposal you will have done some preliminary research to determine whether there are sufficient resources to produce your dissertation within the time allotted. You will provide a general overview of how you intend to develop your analysis (methodology) and discuss the main objectives of the study. Since you have neither completed your research nor developed the data to support your analysis, it will be difficult at this stage to set out a definitive statement of either. Only as your literature review and research proceed, will it be possible to firm up the details of your dissertation. Practically, this means that you will have the opportunity to revise aspects of your study as may be necessary, though ideally the changes will be modest and fully in keeping with the originally stated intentions. If more radical changes are required, it would be best to discuss the proposed modifications with your Supervisor. 
Finally, you are required to provide a time line setting out a work schedule. The time you allocate to the different sections of your study should reflect their relative importance, an indication of which is shown in the following exhibit (2). The Proposal should be comprehensive enough to provide a clear statement of your research intentions, amounting to no more than 1,500-2,000 words, but not so detailed as to deprive you of flexibility to modify your approach as your research proceeds. The Proposal will then be graded, and comments provided to assist you in transforming your initial ideas into a workable dissertation that can be produced within the required time period. If, as sometimes happens, your Proposal is failed, it is nothing to be too concerned about as your Supervisor will have provided detailed comments which, if followed, should lead to a passing grade upon resubmission. An MBA dissertation is a challenging task for most post graduate students, but it can also be extremely rewarding. Most obviously, it is a piece of research that you have produced largely or entirely on your own. And second, it is something that can be shown to,
among others, prospective employers to demonstrate the depth of your understanding of an important financial topic; you will have few other examples of your post-graduate work that can serve this purpose.
Exhibit 2
LSC/STM MBA Dissertation Assessment Form
1. Purpose: a clear statement of the purpose of the dissertation, e.g. reasons for the investigation; statement of problems; purpose of the study
2. Literature review: critical review of the literature, e.g. use of relevant literature; evidence of understanding the ideas expressed; development of extent of application
3. Methods: appropriate use of methods, e.g. stated reasons for using type of methods; description of methods; appropriateness and extent of application
4. Data: presentation and analysis, e.g. description and setting of the study; presentation of the results; analysis of the findings
5. Interpretation and conclusion: e.g. analysis of findings with reference to purpose of study; issues from the literature review; practical application and areas for further research
6. Presentation: e.g. structure, language, visuals, logic and coherence
There are several general points you should bear in mind when developing or outlining a dissertation topic. You must use information and knowledge learned during your studies in order to be able to form suitable research titles. The most important thing is to develop dissertation topics that you feel comfortable with and that you are confident can be completed within the time allotted and to the required standard. Tutors and/or Supervisors can provide you with helpful information. The following pointers provide you with useful information as to how to proceed to develop your dissertation topic or title.
• It may be belabouring the point, but dissertation topics should be based in the real world of your field of study.
• Topics should be based on your area(s) of interest. There is no point in asking your Supervisor to choose a topic for you, as they will refuse to do so. Because you will have to carry out the research, it is essential you feel comfortable with the topic.
• You should be knowledgeable about your topic, as you are more likely to complete successfully a dissertation when you have a strong interest in or direct knowledge of its subject matter. You might consider in this connection researching a topic that would be helpful to your prospective career.
• You must ensure your chosen dissertation topic is up to date; new knowledge is constantly being produced and must not be ignored, otherwise your dissertation will appear outdated, with an attendant negative impact on your final grade. A good dissertation topic, accordingly, must be up to date and reflect current practices; MBA dissertations are not historical exercises, though you may refer to or summarise relevant background information if that would help to better understand or clarify points being made in your dissertation.
• You should rule out dissertation topics that are too difficult for you to research. Many students select dissertation topics in areas that are considered to be interesting or trendy, but are not actually of interest to them. Students imagine that more difficult topics will impress their Supervisors and ultimately earn a better grade than they would have received with a topic of more direct interest or relevance to them. They do not, not least because it is difficult to get good marks in a subject you lack the competence to write about.
• One of the best ways to develop a sound and interesting dissertation topic is to think about issues that you have discussed and learned during your course of study. Consider particular content areas or subjects that you studied within your modules and especially those that stimulate ideas that might help throw light on the research questions and topics that you are attempting to formulate.
• You might consider rereading one or two previously assigned text books or articles. Revising previously acquired knowledge and ideas can often help formulate an interesting dissertation topic; it can also provide you with an indication whether yours is a good topic or not. Students will be able to gauge for themselves whether they can easily obtain relevant information to support their research if they decide to pursue that particular topic.
It might be useful at this point to illustrate, using an actual case study, the process by which initial, tentative ideas are transformed into an acceptable proposal and ultimately a satisfactory dissertation. Against the backdrop of the large number of recent and past corporate failures − beginning with Enron and WorldCom and culminating in the financial difficulties recently experienced by many large multinational banks − several students thought it might be worthwhile to investigate the importance of auditing failures as a possible contributing factor. Students expressing an interest in pursuing various aspects of this topic either were keen to focus on accounting and auditing issues or were professional auditors wanting to know more about the practices of the ‘Big 5’ accounting firms.¹ Many indicated the topic connected with their professional development and thought (correctly) it would enhance their career prospects, a perfectly valid reason for choosing a particular dissertation topic. Their next task was to narrow the issue to a relevant topic that could be completed within the required time period. Out of these preliminary discussions there emerged a number of interesting and important topics that served both the students’ immediate academic needs and longer term professional interests. One of the first of these proposals took as its point of departure Enron’s failure and sought to explain why Arthur Andersen, the firm’s auditor, had failed to detect the company’s growing financial difficulties. This suggested two hypotheses: (1) Enron deliberately withheld pertinent information from its auditors, concealing questionable financial transactions in so-called Special Purpose Vehicles (SPVs), legal entities that benefited the company by reducing transparency; or (2) perhaps Andersen’s practice concentrated on companies making similar extensive use of SPVs, and thus faced similar financial issues.
Andersen may also have advised as to how best to structure and manage such vehicles, which raises the spectre of a potential conflict of interest. Both hypotheses were perfectly reasonable, the more immediate question being how to go about implementing them. The first hypothesis could be attacked from the standpoint of numerous official investigations and research reports that have addressed all aspects of Enron’s failure, including the importance of auditing failures.
¹ The Big 5 auditing firms are: Arthur Andersen (now defunct), PricewaterhouseCoopers, Ernst and Young, Deloitte and Touche, and KPMG.
The second involved developing data on the companies audited by Andersen to determine whether its client list differed fundamentally from that of its competitors. This in turn suggested two possible approaches:
• Establish whether Andersen specialised in companies operating in the same industries as Enron. The student was encouraged to pursue this line of inquiry because preliminary research indicated that Andersen did indeed have a larger concentration of its clients in the oil and gas industries than did other major accounting firms, meaning they could have been as vulnerable financially as was Enron.
• Determine whether Andersen’s clients exhibited a risk profile different from the firms audited by the other Big 5 accounting firms. Auditors serve two main shareholder functions − assurance and insurance − the former confirming the accuracy of the client’s financial statements, and the latter ensuring the availability of financial resources needed to cover any damages arising from auditing failures. The largest accounting firms would seem to offer investors the strongest guarantees on both scores, in which case they deserve the premiums they are reported to earn over lower tier accounting firms.
There is of course no reason a priori to suppose their client lists differed significantly from each other. Indeed, it is fairly common to assume that the Big 5 accounting firms were more or less homogeneous, and thus pretty much interchangeable; the results of the analysis would disclose not only whether there were specific reasons for Andersen’s failures but also whether clients do in fact view their auditors comparably. Two additional issues arose in connection with pursuing this proposal:
• How to go about collecting the necessary data for the dissertation, and
• Once the data had been gathered, how to be sure that any differences detected were significant.
The first issue was resolved by choosing a relatively large sample of companies in a number of different industries, then sorting each by its auditing firm; the second required use of specific statistical techniques capable of differentiating whether any observed differences could be interpreted as significant or whether they were more likely to reflect chance, a by-product of the size of the sample chosen. This highlights yet another important trade off that must be faced as you develop your proposal and ultimately your dissertation. The larger the sample, the greater confidence you may have that the observed data are indicative of real differences between the two groups you are studying. On the other hand, unless you have access to large economic, financial and corporate data bases, it will take considerable time to develop your data − conceivably more than you have for producing your dissertation − meaning that you may
have to make do with a smaller sample, thus reducing the confidence with which you can present your findings and conclusions. This is where statistical procedures come into the picture and we shall have more to say about them later in this Guide. Before leaving this subject, we should mention a number of other hypotheses that students have pursued in this specific area of research.
• Around the time of Enron’s growing financial difficulties, did Andersen’s other clients experience a more negative share price impact compared with firms audited by the other Big 5 accounting firms?
• Did Andersen’s growing legal difficulties affect the clients of the other large accounting firms, and if so which were most directly and strongly affected?
• Why were the financial data of so many companies restated during this period?
• Did clients that reported dismissing Andersen experience positive stock price reactions around the time of the dismissal?
• Did clients that remained with Andersen experience negative stock price reactions in response to announcements by other firms that they were dismissing Andersen?
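Questions about share price reactions of the kind listed above are commonly approached through an event study, a technique this Guide has not itself described. The following is a minimal sketch only, assuming a simple market model and entirely hypothetical daily returns; the function names, window choices and figures are illustrative assumptions, not data or methods taken from any of the dissertations mentioned.

```python
from statistics import mean

def ols_market_model(stock, market):
    """Estimate alpha and beta of stock returns on market returns by least squares."""
    mx, my = mean(market), mean(stock)
    sxx = sum((x - mx) ** 2 for x in market)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market, stock))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

def cumulative_abnormal_return(stock, market, est_end, ev_start, ev_end):
    """Fit the market model on days [0, est_end) and sum the abnormal
    returns (actual minus predicted) over days ev_start..ev_end inclusive."""
    alpha, beta = ols_market_model(stock[:est_end], market[:est_end])
    abnormal = [stock[t] - (alpha + beta * market[t])
                for t in range(ev_start, ev_end + 1)]
    return sum(abnormal)

# Hypothetical daily returns: the stock moves at twice the market during the
# estimation window, then falls a further 3% and 2% on the two event days.
market = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, 0.005, -0.005, 0.01, 0.0]
stock = [2 * m for m in market[:8]] + [2 * market[8] - 0.03, 2 * market[9] - 0.02]

car = cumulative_abnormal_return(stock, market, est_end=8, ev_start=8, ev_end=9)
print(round(car, 4))  # approximately -0.05 by construction
```

In a real study the estimation window would be far longer, many firms would be pooled, and the abnormal returns would themselves be tested for statistical significance before any conclusion was drawn.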
Some idea of the full range of suitable MBA dissertation topics is given below; the topics were chosen from those approved by the University of Wales, one of the institutions that will be awarding your degree. Note, in particular, not only the wide range of topics represented but also the geographic scope of the dissertations.
• Competition between the Hong Kong and Shanghai IPO Markets
• Determinants of Capital Structure: Cross-Sectional and Panel Analysis for UK non-Financial Firms
• Inter-cultural Differences in Internet Marketing Communications
• Reform of the Foreign Exchange Rate Regime and Exchange Rate Misalignment in China
• Capital Structure and Financial Crisis in Malaysia
• The Impact of the Asian Financial Crisis on the Relationship Between Financial Development and Economic Growth
• The New Basel Accord: Implications for the UK Residential Mortgage Market
• Determinants of Japanese Commercial Banks’ Profitability, 1995-2003
• Financial Derivatives and the Exposure of US Banks
• Culture, Economic Development and the Financial Sector: an essay
• Economic Crises and the Financial Sector: Empirical Evidence from Turkey
• Bank Efficiency in the Nigerian Commercial Banking Sector
• Bank Off-Balance Sheet Business and Risk Exposure
• Taxes Effects of Fixed Assets Revaluation on Stock Returns: Evidence from Greece
• An exploratory study of supply chain integration over the Internet
• Recipes for Western Fast-food success in China – a value chain perspective
• Enterprising environment for Entrepreneurs
• E-commerce facilities within Greek retailers
• How managers can increase Employees Motivation within the retail services without increasing financial costs
• The relevance of Strategic Statements to the delivery of shareholder value
• Virtually working: an examination of the experience of home tele-workers in North East Wales
• Are solicitors client focused?
• International expansion by franchising
• Marketing effectiveness: the case of the Brunei Islamic Trust Fund
• The training function of the International Joint Venture
Finally, the following exhibit illustrates a number of important things about your dissertation; it will repay careful study as it answers most of the basic questions students have concerning the structure and content of their dissertation.

Exhibit 3 MBA Dissertation Pointers

Structure / Word Count Guide / Key Elements

Opening Section
• Title page: the title should be written as a question and should be indicative of the subject, e.g. How can an organisation develop an international team? A focus on…
• Declaration and statement of own work / Supervisor sign off
• Acknowledgements
• Abstract: This cannot be written until the end. It should be a short outline that summarises all sections. It needs to focus on the study question, methods used and the key findings in particular.
• Table of contents

What is the area or problem you are investigating and why is it important to the research community, any company and you? (purpose)
Requirements:
• Background / context of the study. Use authoritative sources and facts and figures to provide evidence of trends and importance in this area
• A clear focus on a research issue or statement of any problem area and possible causes that will be investigated if it is a company investigation
• An introduction to any company. This should be a crafted overview relating to the topic area to some degree and not just dumped from the company website or an article.
• A clear statement of Aims and Objectives, e.g.
Aim (example): To explore strategies to develop international teams
Objectives are the key elements that underpin the aim. Therefore:
Objectives (example):
To explore IHRM models of the universalism school, cultural school and key international cultural models
To examine the importance of recruitment and selection in the context of building international teams
To explore the characteristics of a successful international team
To examine culturally aligned training for these teams
• The results expected / aims of the research
Introduce the theoretical / conceptual framework that you will be exploring (Plain English). It identifies the main theoretical points and themes that are RELEVANT to the dissertation and informs the reader how you intend to fulfil your research aims by use of research methods, e.g. secondary data or primary data analysis.
Brief outline of the subsequent chapters
This is a theoretical exploration of the existing knowledge that is RELEVANT to the area you are investigating
• What research exists?
• How does this impact on your research problem?
• What relevant theories and frameworks does it supply to your problem?
• The literature review focuses on similar and contrasting perspectives that researchers or academics have used to approach this, or similar research areas. As such, you have to identify the strengths and weaknesses of such approaches.
Use only relevant studies; focus on their main findings and conclusions.
You should be able to consider the most appropriate areas, justify these, and use them to inform / create your research methodology.
*The literature review is not a brief summary of all the books and articles you have read. This is called an annotated bibliography.
An in-depth discussion on how the study will be undertaken and how it fits with your research question
• What methodology will be used?
• How will it be done?
• Why was the method selected? (justification for the final selection)
• Why were other approaches discounted?
• Strengths and weaknesses of the approach
• Design of research instrument
Design of research instrument
• Clear indication of the design features and any questions to be asked in a survey
• Describe the data analysis techniques to be employed, e.g. sampling techniques, frameworks, size and type of an appropriate survey
• Description of subjects involved, settings for the study and any variables that may be encountered
If you are using a particular concept / model, then it should be critically discussed and incorporated here.
For in-company investigative dissertations where you have access to the operation: Current system described
• Description of the operation and its processes
• A perspective on the problem
• Factors that may be the cause of problems
• Identification of internal documents, operation procedures and policies
Note: These two sections will add to the word count. So, adjust the Literature Review and RM sections to compensate. The word count and structure should be adapted to suit the dissertation and its focus.
For in-company investigative dissertations where you have access to the operation: Current system analysed
Investigations into the company, the operation and its systems This may involve application of a model or method that you discussed in the literature review and research methods. E.G. Observing an operator carrying out job task, asking questions after the operation and comparing this to required standards of performance necessary. This might show that there is a skill gap in operatives that is causing quality problems.
Data Analysis and findings
What did you find? One or possibly 2 chapters on data analysis and findings. What have you discovered, and what reasons do you propose for why this may be?
Presenting the findings:
1) Restate the actual research question for the reader and show frequency tables and graphed data. Interpret the data for each data set. Make selected links back to the LR
2) You may be able to cross-tabulate; make correlations from one data set to another IF you have set up the personal information and demographics of the population first. From this you may be able to identify patterns and trends.
3) Discuss the results in plain English
4) Build on and qualify the overall conclusions linking back to the LR and key facts from the analysis. Discuss whether your results successfully met the conditions to test your research question or hypothesis, e.g. in a postal survey, how many people returned the questionnaire? Those who didn’t, are they different in some way?

Discuss in narrative form your conclusions linking back to your research question and literature review so that a clear argument and thesis can be identified throughout the work. Use clear statements on the conclusions reached. Was your hypothesis or theory supported in your findings? Answer the questions you raised in the introduction.
• How do the results compare with theory and good practice discussed in the literature?
• How do your results compare to those of other research / academic studies?
• What has surprised you about the results?
• What limitations may there be to your study?
• How does it add to knowledge in the field?
• What further research would you recommend to develop your work in the area of knowledge and research?

Recommendations
Recommendations should be concise statements and include time scales as necessary.

Bibliography
All sources in the dissertation need to be referenced using the Harvard method. The order of referencing is as follows:
1) Books
2) Journals and articles
3) Web sites with full URL address and dates accessed. Use sparingly!

Appendices are to be used for relevant information that would spoil the flow of the report, e.g. a good example of use is the inclusion of any survey questionnaire. Number these using Roman numerals.
Source: David Greenshields (LSC, 2009)
4. The Introduction and Literature Review
The first section of your dissertation is the Introduction, in which you state the basic objectives of the study. These include both primary and secondary objectives: the former relate to the main topic(s) of your dissertation, while the latter are important side issues that flow out of the analysis of the main area of interest. The Introduction also explains how you intend to pursue your chosen topic by discussing the methodology you will be using to implement the study. Students sometimes find it easier to write the Introduction last, a decision that is less convoluted than it appears. After all, it is much easier to draft the introduction when you know how things turned out than when they are still pretty much up in the air. This is an important point to bear in mind since the introduction frames the remainder of the dissertation. The reader is more likely to be drawn into the study if the approach, objectives and method are clearly, carefully and confidently stated than if the introduction is carelessly or ambiguously worded.
The second broad area is the Literature Review or Survey. This is one of the most critical sections of the dissertation as it provides the foundation upon which your study ultimately rests. Despite its importance there is considerable ambiguity as to how best to develop the Survey. In some instances students compile what amounts to little more than an annotated bibliography, that is, they list numerous publications that relate to their topic and provide brief summaries of each. This, of course, misses the point and value of such a Survey. For one thing, if pursued correctly it will open your eyes to valuable sources of information that can be used to inform your study.
It will point you in the direction of what has been written in your area of research and will provide alternative perspectives on many of the issues you will be covering in your dissertation, some of which will confirm your conclusions, while others will not. Part of your job will be to separate out the more from the less relevant studies, the more from the less reliable findings. Relevant and reliable do not refer to whether previous research agrees with the findings of your study, but rather to the quality of the findings, that is, how well they have held up to subsequent research. Findings that have not been (or cannot be) replicated should be considered suspect, while those that have been validated repeatedly are worthy of more serious consideration.
Before considering how best to evaluate the evidence you are developing in connection with your dissertation, it is worth considering the different types of information resources that are available and, above all, their advantages and limitations. There are five principal literature resources: popular press (including electronic documents); practitioner books and compendia; practitioner journals; academic books and compendia; and academic journals. We now consider each in turn.
Popular press: The popular press consists of widely read business publications and magazines. It includes publications such as the Financial Times and Wall Street Journal, both international newspapers, and magazines such as The Economist, Euromoney, Business Week, Institutional Investor and Forbes. The main value of this literature is timeliness and accessibility: global business and financial newspapers provide real time information concerning all aspects of national and international business and global financial market trends. The information provided is generally factual, practical and intended for the use of decision makers. Business magazines, because they publish only weekly or monthly, take a slightly longer (more thoughtful) view of international political, business and financial developments and often provide informed commentary, frequently by guest contributors, designed to provide perspective on recent or prospective developments. The main weakness of the popular press is the same as its strength, namely, its currency. Currency means that much of the information provided is incomplete or uneven. Some issues are covered in considerable depth, including commentary provided either by staff or guest contributors or in editorials, while others rate only one column inch of space. By definition, news equates to new information, which perforce may be fragmentary, inaccurately reported or simply wrong. Leading newspapers and magazines often publish apologies for inaccuracies in the original stories or corrections that revise or update information previously provided. There are other difficulties that also affect the value of such information for use in dissertations. Because the main readership of most business newspapers and magazines consists of business men and women, they often colour information to maximise the appeal to their target audience; their editorials equally tend to adopt pro-business stances.
It is absurd to imagine that such publications are (or can be) value-free; the very best one can hope for is that their journalists offer a wide range of alternative or competitive perspectives, from which readers can draw their own conclusions about the value or relevance of the points of view being expressed. Most such publications make every attempt to ensure that the information they provide is complete if not accurate. Not being scientific, there is no reason to expect scientific rigour in most of the information they provide. Only the most mundane information – standardised measures of business, financial or economic performance, for example – is likely to be presented reasonably accurately; less familiar or more technical information is not always well covered or explained, nor is any attempt made to distinguish between the relevance of different concepts. Trendy, rather than serious, ideas tend to be emphasised partly because that is one of the main functions of such publications: to publicise the new and the controversial, not to concentrate on concepts that have longer term significance, promote sound business practices or add to the firm’s bottom line. The Internet is a source of information on the widest imaginable range of subjects. Business and financial information is easily accessible thanks to search engines that do most of the work. There are numerous sources of information so that the competitive ideal noted
above is met well beyond what is possible within a given news organisation. On the other hand, the available information lacks even the most elementary safeguards with respect to accuracy or completeness. The classic example here is Wikipedia. In its original form, Wikipedia entries could be edited online by anyone, qualified or not, regardless of whether the ‘corrected’ information was more or less accurate than the information it replaced. Wikipedia recognised this difficulty and now imposes a time delay for redaction of individual entries. Of course this does nothing to correct the basic flaw in its design. Information is useful only if it is accurate. In the Wikipedia world ideology or personal views are considered to be of equal value to research driven and supported by ‘facts’; the lack of intellectual controls means that no information so supplied can be regarded as authoritative, even when it is. It is best to avoid such information; most Supervisors will do more than raise an eyebrow should information from this source be listed in footnotes or the bibliography.
Practitioner Books and Compendia: This category comprises such an amazing diversity of publications that it is difficult to know exactly how to describe and evaluate them. Many of the books falling into this category are of the popular variety, written by authors keen to share their experience or point(s) of view with the reader. They vary widely in terms of quality and value, so the best way to approach them would be with a high degree of scepticism. Unless the authors have an established track record in the area or areas in which they are writing, it would be best to avoid them altogether for the simple reason that no matter how deeply felt the author’s concern for the topic being addressed there is simply no way the reader can assess the validity of the results or how widely they can be applied.
This is not to say the genre is worthless; there are examples of such books that have changed thinking or influenced organisational practices, but these are the exceptions.
Practitioner Journals: These avoid some, but not all, of the pitfalls of practitioner books, as most such journals subject submitted articles to peer review. Their main advantage is that they are highly readable, and do not presume any specialised knowledge on the part of the reader; the Harvard Business Review is a prime example of this genre. They also deal mainly with ‘live’ business issues and as they tend to be authored either by current practitioners or academics, the arguments tend to be presented with greater objectivity than in the popular press. The weaknesses of this literature derive mainly from a predisposition to include articles dealing with issues such as philosophy or other equally arcane areas that lack immediacy or that drift into such generalities as to be practically worthless in terms of applicability or relevance. A second shortcoming is the overriding importance editors of such journals attach to clarity and uniformity of style. This emphasis produces a number of unfortunate consequences.
For one thing editors are assigned to see an article through from submission to publication. These people are typically better informed about the editorial requirements of the journal than the content of the articles being submitted. In the course of rewriting articles they can, and often do, sacrifice relevant material or lose the main point(s) being made by the author for the sake of clarity. Many authors put up with this interference because of the journal’s prestige and wide readership, the latter in particular being of considerable importance to aspiring academics or consultants looking to add to their reputations and hence client lists. Even so, it would be wrong to dismiss such journals out of hand. Many articles are written by acknowledged experts in their field, with the material presented in a form that is far more accessible than where the research underpinning the article was originally published. The Journal of Corporate Finance is another good case in point: like the Harvard Business Review, it invites distinguished academics, lawyers, and business practitioners to contribute articles or to participate in roundtable discussions invariably on finance or related topics. The presentations may include technical material but presented in such a way as to ensure that it can and will be easily understood; this is especially useful for business students who may lack the qualifications to understand the more technical analytical or statistical issues characteristic of articles appearing in learned journals. Used wisely, such journals can be a source of valuable and useful information; however, they are best used as a complement to and not a substitute for the more scholarly articles that should form the core of the dissertation’s Literature Review. Academic Books and Compendia: Academia places considerable importance on the production of research articles and less to the production of specialised monographs or books. 
Many authors write books to summarise a relatively large body of their previous research or to produce textbooks. Much of the material contained in specialised monographs will have been published previously; in many instances the results of prior research are incorporated in their entirety in the text, the main difference being that dated information will have been replaced with more recent data. A new introduction and conclusion are often added to tie the disparate research reports together and, where required, some of the material will have been updated to take account of recent developments or research in the study’s main field. Such publications provide little information beyond what could have been obtained from reading the original articles. There is nothing wrong with this practice, and from the research consumer’s point of view it may actually be beneficial in that most of the relevant material will be available in a single source. We might also include in this category official publications, such as those issued by central banks (the Federal Reserve Bank, and its regional banks [Federal Reserve Bank of New York, Federal Reserve Bank of Boston, and so forth]; the Bank of England, Financial Services Agency, and other OECD central banks); government agencies (US Department of Commerce or the UK Department of Trade and Industry) and publications
of international organisations such as the International Monetary Fund, the World Bank, regional development banks and the Bank for International Settlements. Reports and publications of local Chambers of Commerce and other trade associations also fall within this category. Central bank and other official publications provide both statistical information, current and historical, covering a wide range of data; they also publish general reports intended for popular consumption as well as more specialised publications, such as financial or policy reviews, that tend to focus on matters of current interest, or working papers prepared by professional staff members. Most central banks and international financial institutions produce similar policy and research documents, mostly written in English, but frequently translated into other European languages, mainly French. Publications of the European Union are available in the languages of member countries. Academic Journals: The principal advantages of academic journal articles are that they are written by professionals with (usually) considerable knowledge of their subject, and are peer reviewed meaning that other specialists will have had the opportunity to assess whether their research findings are worthy of publication. This process ensures higher standards of objectivity than apply to the other literature categories we have reviewed. Authors accordingly are more likely to produce higher quality articles that tend to focus more on narrower academic concerns than on current issues or fads that tend to dominate the popular press. This is not true of all academic journals; there are several excellent publications that combine the highest academic standards with a focus on contemporary issues. Two of the best are published by the Brookings Institution in Washington, D.C., Brookings Papers on Economic Activity, one devoted to macroeconomic, the other to microeconomic, issues. 
Prominent academics are invited to analyse topical issues at forums held twice-yearly. Discussants are nominated to comment on individual reports, and their remarks are included alongside the main article. Audience members, too, contribute to the discussion, and the final publication contains summaries of their comments as well. The weaknesses of the academic literature are the obverse of its strengths. To ensure objectivity, the editorial process is typically very long, the time from submission to publication can be as long as two years, to the detriment of accessibility. To remedy this shortcoming many authors publish so-called working papers, which can be accessed at leading academic websites such as the Social Science Research Network (SSRN). This, however, poses the same problems that we noted above: neither the fact of having been included on the SSRN’s website, nor the academic affiliation of its author can guarantee the quality of the publication; indeed, some authors do not even bother to indicate their affiliation. This poses few if any problems for academics, equipped as they are (or should be) to separate the wheat from chaff. For consumers of academic research, the inability to properly assess whether the article has any merit at all suggests that all such studies should be viewed cautiously.
One technique widely favoured by consumers as a quality measure is the number of times a given article has been accessed, with higher downloading frequencies taken as an indicator of intrinsic merit. The flaw here is the equation of interest with importance. It is interesting to note that the SSRN does list the ten most popular articles measured by the number of times they were accessed in a given time period. In each case, the most popular articles were also written by some of the country’s best known academics. It is the combination of frequency and author, not frequency alone, that gives this measure any degree of credibility. Against this backdrop, peer review in leading journals is still the soundest guide to the worth of any individual publication. Finally, we might note that there are a huge number of publications that could be included in this category of extremely variable quality. The fact of peer review here is no guarantee of the value of the final publication. True, editors go to great lengths to ensure that reviewers are qualified and competent, and often provide a comprehensive list of the members of their editorial boards. There should, however, be no presumption that any of these will have seen, let alone commented, on a particular article; in some instances the editor has the final say as to whether or not to publish a given article, a far cry from the highest standards of the best academic literature.
5. Evaluation of Published Research Resources
What conclusions can we draw about the value of the different research resources used by students in the preparation of their dissertations? With respect to business literature, there is a general consensus that the closer the methodology approaches that favoured by the ‘softer’ social sciences (communications, decision making, motivation, leadership, and so forth) the weaker is its claim to scientific validity; this conclusion holds true equally for several other areas of business research. The principal exception is economic or financial research, where the favoured method of analysis more closely approximates to that of the natural or physical sciences. Financial economists typically use published information; most other disciplines have to generate their own data, usually based on surveys, for which the design and data collection are often more time consuming than the analysis. Statistical procedures are employed, though not always of the same degree of sophistication or applied with the same competence as by economic modellers; nor is the full set of results always published, making it difficult for other researchers to evaluate the claim(s) being made. Against this backdrop, you will need help to determine how best to interpret the value of the information being generated; how, in other words, you can best establish the relevance and reliability of the information being developed. Abelson² suggests the acronym MAGIC as the best way to judge the worth of information.
M = Magnitude: How large are the effects being reported? How reliable and broadly based are they? In each case, the more impressive the findings, the more reliable they may be taken to be.
A = Articulation: How well is the research story being told? Does it consider both sides of the issue fairly, and does it do so in a sound and reasoned form?
G = Generality: Do the findings have wide implications or are they specific to a particular point in time or to a specific set of circumstances? How well are the claims made supported?
I = Interestingness (or perhaps, better still, importance): the ability of the research findings to influence how other people view the topic, even the potential to alter or change their beliefs concerning the phenomenon under investigation.
C = Credibility: Is the argument being put forward theoretically and methodologically sound? Have alternative perspectives been confronted? Are the data too good to be true: do the results depend upon statistical procedures correctly applied that rule out the possibility of the observed outcome occurring by chance, or do they depend upon personal observation or experience only?
² R. P. Abelson, Statistics as Principled Argument (Erlbaum: 1995).
These arguments provide a sound way forward for assessing the merits of individual publications. A better approach would be to look at the specific findings within the context of a broader body of research, the ultimate purpose of the Literature Review. It is of course possible that your topic will have generated only a very small number of publications in which case the ‘competitive test’ will obviously fail. More likely, you will have access to a very large body of research, as tends to be true in virtually all areas of finance. In which case, you will have to exercise great care and skill in weeding out those that are worth serious consideration from those that are not or, more generally, knowing when it is appropriate to use one particular source over another potential source. Specialised newspapers or magazines will not be as reliable as articles found in scholarly journals as the latter are reviewed by a panel of individuals who are experts in their field, while the former will have been approved by an editor who may or may not have specialised knowledge of the subject matter to which the article refers. Our point is not to argue in favour of the superiority of one source over the other: they both are relevant so long as you understand their limitations. Newspapers and magazines are the main source for current information, of fast breaking developments occurring across the business world. It will be months, even years, before academics scrutinise these issues by applying high powered research techniques, statistical analysis, for example. That is the fundamental difference between journalism and research, and this distinction should be remembered when using information deriving from these different sources. One final point is worth stressing here: the distinction between advocacy and research; it is not unusual in the world of business to find the former masquerading as the latter. 
This is another important characteristic that differentiates academic from non-academic sources. As we have seen, the rules of the academic game are fundamentally different from those applying in the non-academic world. Academics are meant to generate research results to better explain or clarify aspects of relevant business or financial topics. They present their findings fairly and objectively. Editors of scholarly journals will weed out from research submissions material that fails to meet these requirements; if authors refuse to alter or amend their articles, the editor has the right to refuse publication. Errors will inevitably occur, but where we are talking about serious research, they are more likely to be unintentional, the result of an oversight on the author’s, reviewer’s or editor’s part, than to a deliberate attempt at deception. This can and does happen, but the risks are much lower in the scholarly than in the more popular literature. Why? Because no comparable standards apply elsewhere, where authors are free to express their views, no matter how controversial they may be. Indeed, controversy is likely to stimulate newspaper or magazine sales and thus revenues, which after all is the purpose of for-profit news media. This does not rule out use of advocacy-based financial or business research; many of the best known management or financial theories have their origins in research that went against the grain of prevailing theories, but not always. It is best to regard advocacy literature with suspicion, because the intent is not necessarily
to enlighten but rather to convince, often of a point of view that does not command broad support. Academic research, though widely (and correctly) conceded to be the most objective source of information is not without problems of its own. These limitations apply especially in respect of business research, and other soft sciences. Even within business subjects, there is considerable dispersion not only in terms of methodology, but also in terms of interpretation. The empirical literature tends to follow a common structure. A theory of the phenomenon to be investigated is adumbrated and tested typically using statistical analysis. This approach is most common in economics and finance research, but increasingly is finding its way into management and marketing research as well. A key difference is that in the latter disciplines there are few theories as well developed as those in economics or finance. Moreover, while economic and financial research makes use of numerous large data bases from which to test hypotheses, among management and marketing disciplines the data tends to be developed ad hoc, that is, primary data based upon surveys or interviews of varying size, or experiments, with the data thus generated tested for the ‘associations’ or ‘differences’ the research is designed to identify. We shall have much more to say later about surveys and interviews; here we are concerned primarily with the limitations of these approaches and how these limitations affect any generalisations that can be drawn from the test results. The first question that we need to answer is: How do we know we have found anything of significance? Significance can be defined either in its everyday sense of ‘importance’ or in its narrower, more technical statistical sense, that is, whether the observed results could have arisen by chance. Here we need to draw a distinction between business studies research on the one hand, and financial and economic research on the other. 
Analysts argue that the former (unlike the latter) tends to suffer from three main shortcomings: (1) the lack of research replication; (2) the inability to cumulate or generalise the research results; and (3) the faulty interpretation of statistical significance.3 The first point is important because it stands in marked contrast to the way research in the physical or natural sciences is conducted, where research findings are expected to be replicated to establish the robustness of the original findings. If the original results cannot be verified by subsequent research, then the validity of the initial findings is called into question. Nor does the growing use of high-powered statistical techniques change things all that much: statistical significance is not always proof of support for the hypothesis being tested. Statistical results can be, and often are, misused, either because the researcher fails to understand the limitations of the methodology, or because crucial assumptions underpinning the statistical model have not been met, or through some combination of the two. Replication permits detection and correction of such errors; with the current
The following discussion is adapted from John Kmetz, The Skeptic’s Handbook: Consumer Guidelines and a Critical Assessment of Business and Management Research (2002).
research ethos discouraging such an approach, much of the management research literature should be regarded with caution. That economic or financial research avoids all of the shortcomings of research methods favoured in other business studies areas is only partially true. Much of the received wisdom, based upon ‘strong’ assumptions concerning human behaviour, has been questioned, while the application of new, less restrictive paradigms has challenged many of the disciplines’ most basic conclusions. Two assumptions in particular lie at the heart of much economic and financial research: (1) individuals behave rationally, that is, when confronted with choices they select the one that best serves their interest (self-interest); and (2) individuals have access to identical information. The first assumption minimises the possibility that psychological forces can and do influence human economic behaviour; once this assumption is relaxed, so-called Behaviouralist models can begin to unravel phenomena that appear to defy traditional economic analysis, speculative ‘Bubbles’, for example. The second perspective undermines the traditional market model since it assumes that buyers and sellers do not have access to identical information, in which case, the latter can exploit the former with an attendant negative impact on economic efficiency. These challenges have forced researchers to question their respective approaches, and by having drawn attention to existing weaknesses have caused practitioners to confront them. In economics and finance, asymmetric information and the research it has generated has proved to be of immense value, with many of the insights having been incorporated into the traditional curriculum. The other criticisms have fared less well, but that hasn’t deterred researchers from pursuing their research programme even though the balance of evidence makes clear there is much less there than meets the eye. 
We raise these issues so as to provide a more balanced view of the quality of informational resources available to business students. Research methods and results in all disciplines, even in the physical sciences, have and continue to be questioned. But they don’t change the fact that much of traditional theory still has considerable merit and accordingly is still taught. Indeed, exceptions and anomalies are the rule not the exception in all areas of science, and that is how things should be. Karl Popper, the eminent philosopher, pointed out long ago that the struggle to accommodate anomalous evidence is an important aspect of the accumulation of scientific knowledge. This struggle, Popper points out, rarely leads to the complete abandonment of the existing conceptual framework. Rather, it typically results in modifications to the current paradigm capable of accommodating observed anomalies, while retaining those features of the intellectual framework that still fit. Business students can rest assured that the research upon which their dissertations are based is still valid. By highlighting the limitations of different informational resources we do not mean to suggest that all such resources lack any value at all; as we have seen, they also have their advantages. The point is we need to take the good with the bad to
arrive at a balanced view of what the literature has to offer. Business students are fortunate in that the available literature resources are vast. The aim is to ensure you make best use of what is available, hence the emphasis we place on research methods to which this Guide is a part.
6. Methodology If you are preparing a dissertation in any subject area other than finance, you will be required to generate primary data. Primary data are produced through surveys or interviews or some combination of the two. Some finance dissertations also involve the production of primary data; here, however, the survey results are intended to supplement the basic objective of the dissertation. For example, one recent dissertation sought to determine what impact adoption of EU accounting rules would have on the reported profitability of leading Turkish manufacturing companies. The author’s main hypothesis was that the shift would result in lower earnings than would have been reported under traditional accounting standards. The results were mixed so he decided to supplement his financial analysis by interviewing analysts employed in leading local brokerage and accounting firms. He was under no obligation to do so, but was convinced that the additional information he would develop would illuminate the ambiguous results of the financial analysis. It did. As you will have noticed from Exhibit 2, there are two principal dimensions to the Methodology section. (1) It should provide an in-depth discussion of how the study will be undertaken and how it fits in with the issue your research is intended to answer. There are several different approaches that could be used to develop the information in your analysis. You will need to provide justification for your particular approach: why it was chosen in preference to alternative ways of going about collecting data, that is, why you believe it provides the best fit for addressing your chosen topic. And (2) if you are using one provide a detailed consideration of how you designed your survey. 6.1 Surveys A survey is nothing more than a systematic method of collecting information from a selected group of people who are asked a series of questions.4 (a) When Should I Use a Survey? 
You employ a survey when it is faster, easier, or less expensive to use than other methods. Sometimes other data collection methods are preferable. For example, to determine the number of people using a clinic, you could simply count the number of signatures on the sign-in sheet, or examine the daily records; no survey is required to obtain such information. Nor is there a need to undertake a survey when the information exists in some other form that can easily be accessed, for example, in archives, records, or databases. Using such data can save you time, money, and effort.
Much of the following discussion, examples and some charts and tables are adapted from: Houston, Survey Handbook (Organizational Systems Division, Total Quality Leadership Office of the Under Secretary of the Navy).
(b) Survey Preparation What is the purpose of the survey? Surveys are used for many purposes, and these influence their form and content. Some of these include:
• Obtaining information that can be used to verify the main predictions of the theoretical literature
• Identifying organizational strengths and weaknesses
• Targeting areas in need of improvement
• Assessing the effectiveness of new or existing policies or programs
What specific information is needed? To meet the purpose of the survey, identify the topics or issues of interest and the forms of information needed. If, for example, you are interested in determining the importance of maintaining current dividends, you might ask questions about how often and under what circumstances dividends are increased. You might also ask people to compare the value of dividend payout with alternative methods of returning cash, and so forth. If the objective is to determine future actions, you might ask respondents to identify what factors influence future payout decisions. Who will be surveyed? Identify the types of people who can provide the information you are interested in developing. Do they belong to a particular group (students, managers), a single category within that group (post-graduate business students or middle managers) or do they come from a variety of categories (under- and post-graduate students, all managers of Barclay’s Southwark branch)? How will the survey be administered? There are three main ways to conduct surveys: face-to-face interviews, telephone interviews, and written surveys, the last conducted by post, by email or in group sessions. The method chosen should provide sufficient information as quickly, efficiently and economically as possible. Interviews only make sense when you need to collect detailed information from a relatively small group of people. Interviews can be used to explore issues and options to a greater extent than written surveys.
What resources will be needed? You will be responsible for designing and implementing the survey. On the other hand, people who respond to the survey should be considered as a resource in terms of number, time invested, and information provided. What survey items will be used? Occasionally it is possible to use existing surveys, but you must consider ways in which they can be updated to reflect any relevant changes that will have taken place between the two surveys. For post graduate business students, this is practical only where previous research surveys are deposited in the School’s library. It may be possible to access completed UWIC dissertations, but unless they are online you will have to visit the library in Cardiff, which may not always be practicable. It is best, therefore, to think in terms of an original survey, though the specific questions you intend to ask in the survey may be gleaned from prior studies uncovered in your Literature Review. How will survey information be analyzed and reported? Once developed, you must consider the best way the data can be organized and interpreted. There are numerous possibilities: some relatively simple and straightforward – tables, frequency distributions, line graphs, bar charts, pie charts, or histograms – while others, only slightly more complex, may involve the calculation of averages (means) and their associated standard deviations, medians and modes. The analysis dictates the best way to present your data. If, for example, you intend to apply formal statistical procedures, then you may want to present summary measures only; if the tabulated information constitutes the main data then you may want to use more expansive formats. All survey results are intended to measure differences or associations. What impact does training have on worker productivity? Does additional training lead to even better performance? 
In the first instance we are interested in determining how big the effect, if any, is of requiring employees to attend training sessions. One way to do this would be to calculate average worker productivity, usually measured as output per hour worked, before and after attending training sessions. Does it matter whether the training occurs during work hours, after hours or on the weekend? Are the results sensitive to whether employees attending after-work sessions are or are not paid? In each case, the answer depends upon the size of the effect. If after completing the training session, for example, the improvement amounts to only 1.75 per cent, is that large enough to confirm the value of the training sessions? Assume further that paying staff attending out-of-hours training sessions improves productivity by 4.75 per cent, compared with 1.5 per cent if employees are not paid. In the first example, the effect appears too small to confirm the claim that training matters; on the other hand, the difference in productivity between employees attending training sessions who are and are not paid appears big enough to support the contention that payment matters. Survey
results could be supplemented with interviews asking employees what difference payment had on their attitudes towards the value of training. A word of caution: even large differences can sometimes be misleading, especially where sample sizes are relatively small. It may just be that the observed difference of paying employees to attend training sessions after hours may have arisen as a result of chance. Later on various statistical procedures will be described that can be used to determine whether such results could have arisen by chance. How many people need to respond to the survey? What is of interest is not the sample results but rather whether we can draw any meaningful inferences concerning the larger population from which the sample was drawn. That really is the point of a survey: in most cases it is physically impossible or prohibitively expensive (or both) to interview each person in the group you are interested in. Samples of inappropriate size can lead to misleading results, inaccurate interpretations, and ineffective actions. (c) Constructing Survey Items Survey questions should be: Clearly written. Statements should be short, to the point and easy to read. Jargon, technical terms or unfamiliar acronyms should be avoided. Concise. Get to the point as quickly as possible: wordy questions are distracting and could easily defeat the purpose of the survey. Specific. Focus on one idea at a time. Each item should collect information on a single behavior, attitude, opinion, event or subject. Explicit. Do not force people to guess about what is being asked. Be sure they understand what information you want by explicitly stating so. If necessary, highlight or underline what is needed by way of an answer. Selecting Response Formats Along with the statements and questions, you need to provide methods for people to give their answers. 
Typically, survey items are used to ask people how much they agree with some statement, how important something is, or how often something happens. Rating Scales. Surveys often ask that products or services be rated according to some scale. Some survey items present statements and ask people to rate how much they agree or disagree with the statements.
For example, on a scale of one to five tell me whether your supervisor encourages subordinates to participate in important decisions. 1 = Strongly disagree; 2 = Disagree; 3 = Slightly disagree; 4 = Agree; 5 = Strongly agree. When creating rating scales, ensure that the end points (or anchors) are equal and opposite in meaning. Failure to do so runs the risk of biasing the survey responses. Note that when the survey results are tabulated, you will obtain a mean rating of, say, 4.25; in terms of the way the above rating system is defined, that would indicate that, on average, respondents took the view that their supervisors did encourage their participation in making important decisions. You may also calculate the associated standard deviation, which indicates the extent to which individual responses deviate from the mean value. The higher the standard deviation, the greater is the dispersion of responses, and vice versa. Note, finally, that the mean and standard deviation are measured in the same units, hence the preference for this statistic over the variance.5
Ranking Items. Another common response choice is to ask respondents to rank-order a list of options in terms of some factor, importance, for example. These data help to prioritise what is most important to respondents. Thus, if speed of service and cost of a restaurant meal are ranked higher than quality of food and variety on the menu, efforts can be focused on those aspects most important to respondents. Consider the following example: Please rank the following five objectives in terms of importance by marking a 1 next to the most important objective, a 2 next to the second most important objective, and so forth:
-------- Achieving a quick success
-------- Increasing the amount of output
-------- Reducing the price charged to customers
-------- Reducing the work backlog
-------- Reducing the number of defects
Selecting Options.
This response format presents a list of statements or options from which respondents are asked to circle one or more items that apply to them. This format is similar to the ranking question, but does not require survey respondents to put things in any particular order; this kind of question is easier to respond to than one that requires rank-ordering, for in many cases respondents are unable to prioritise when they feel that everything is of equal importance.
The standard deviation, usually denoted by the lower case Greek letter sigma (σ), is the square root of the variance; hence the variance is the square of the standard deviation and is thus measured in the square of the data unit. For example, if the data are measured in per cent, then so too will the standard deviation be, while the variance will be in per cent squared, a unit of measure that is not easily grasped or understood.
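The relationship between the mean rating, the standard deviation and the variance can be illustrated with a short calculation. What follows is a minimal sketch only: the eight ratings are hypothetical, chosen so that the mean matches the 4.25 used in the text, and the sample (rather than population) formulas are assumed because survey responses are normally a sample.

```python
import statistics

# Hypothetical responses to one item on the 1-5 agreement scale
# (illustrative data only, not taken from any actual survey).
ratings = [5, 4, 4, 5, 3, 4, 5, 4]

# Mean rating: the average level of agreement.
mean_rating = statistics.mean(ratings)

# Sample variance: measured in squared rating units.
variance = statistics.variance(ratings)

# Sample standard deviation: the square root of the variance,
# measured in the same units as the ratings themselves.
std_dev = statistics.stdev(ratings)

print(f"mean = {mean_rating:.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Because the standard deviation is expressed in rating points, it can be read directly against the mean, whereas the variance cannot.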
Comments and Open-ended Questions. The fourth question/response type allows respondents to provide additional comments or other information in response to general questions. These questions usually leave blank space where respondents can write whatever is important in their own words and format. Examples are listed below.
Do you have any suggestions on how we can improve classroom lectures?
Are there any products or services you need that we do not currently provide?
Is there anything else you would like us to know?
Demographic Questions. Demographic information is used mainly to segment respondents into narrower groups based on specific characteristics such as age, level of education, marital status or salary level. Segmentation is important if one of the purposes of the survey is to determine whether significant differences in responses exist between groups.
Which one of the following age categories do you currently fall into?
-------- Under 20
-------- 20-29
-------- 30-39
-------- 40-49
-------- 50-59
-------- Over 59
What is the highest educational level you have attained?
----- Less than secondary school diploma
----- Secondary school diploma
----- Associate Degree
----- Bachelor’s Degree
----- Master’s Degree
----- Doctoral Degree
What is your marital status?
----- Single (never married)
----- Married
----- Divorced/Separated
----- Widowed
What is your current position and salary?
Demographic items can be included that ask people to identify themselves, that is, give their names, where they live or personal information relevant to the survey. The more
detail you require, the more intrusive people perceive the survey to be. People who feel uncomfortable providing detailed personal information are unlikely to answer questions honestly or may decline to answer them at all. To assuage privacy concerns, you should include a description of how the survey answers will be used and a promise to keep individual responses anonymous. ‘The information you are providing me with will be used exclusively in an MBA dissertation designed to test the importance customers attach to the quality of service they receive at this supermarket, and for no other purpose.’ (d) Reviewing Items After developing a set of potential survey questions and response scales, review them to make sure that they are: Relevant to the purpose of the survey. Items that stray from the purpose will not provide the information needed. You must have a specific reason why an item is being asked in a survey. Always focus on the purpose of the survey and the type of information needed to support that purpose. Carefully match the items to your survey purpose to ensure that they address the issues that have been identified. Appropriate for the individuals being surveyed. Do not include items that people do not have the knowledge to answer. For example, store customers could probably answer questions about a store’s layout; but they would not be able to answer questions about a store’s compliance with health or safety regulations. Capable of providing the appropriate type of results. Anticipate how the information being developed will be summarised. Summaries should provide the types and level of information required by the survey users. Will the results be presented in simple bar charts or subjected to further analysis? How much detail will be required to meet the information needs of those using the survey? If, for example, survey users are interested in general impressions only, it is a waste of time to calculate averages to the seventh decimal point. 
If, on the other hand, precise distinctions among quality features are required, then just providing a list of verbatim comments is unlikely to be helpful either. Check items to ensure that they are not: Ambiguous. Avoid words or phrases that can be easily misinterpreted. Overlapping. Avoid presenting response choices that overlap. Overlapping choices can lead to confusion on the part of the respondent and difficulty in interpreting information. Circle the number that best represents the number of hours per week you spend on preparing your assignments:
1. None at all
2. Less than one hour per week
3. One to two hours per week
4. Two to three hours per week
5. Three to four hours per week
6. Four or more hours per week
Double-barreled. Avoid having respondents address two different issues in the same item. Did your teacher listen carefully to your question and did s/he answer it promptly?
Leading. Avoid giving clues that point to the desired answer, or limiting the answers to those desired. Do you think you are being forced to spend too much time on your accounting lectures? Redundant. Avoid duplication, that is, asking the same question more than once. (e) Administering the Survey Written surveys are typically administered by mail or in person. Mail-Out Surveys. The most common method of administration is to mail surveys to the customer sample with a stamped reply envelope. When using the mail-out process, allow time for the survey to get to its destination, time for the respondent to complete it, and time for the survey to be returned. Surveys mailed to distant (i.e., overseas) addresses will obviously require considerably more time for their return than those posted locally.
In-Person Surveys. In-person (or face-to-face) surveys can be done in different ways. One way is to ask respondents to complete an on-the-spot survey; use only very short surveys in this situation. A second way is to ask respondents to come to a particular location to complete a survey. A third is to visit respondents at their homes or work sites and ask them to complete the survey. (f) Analyzing the Survey Results After surveys have been administered, you need to summarize, analyze, and interpret the results. This requires sorting and consolidating individual responses to survey items so that they can be more easily displayed and understood. Frequency distributions. Frequency distributions are a very simple method of displaying the variation in responses to survey items. These distributions can be developed by counting and recording answers according to the response scales used in the survey. Frequency distributions are typically presented as tables or bar graphs for ease of interpretation.
Exhibit 4 Example of a Table Showing a Frequency Distribution of Responses to Survey Items
Exhibit 5 Example of a Frequency Distribution Presented as a Graph
Percentages. One of the simplest ways to summarize survey information is with percentages. Percentages are calculated by dividing the total of a specific response choice by the total number of responses and multiplying by 100. Percentages can be displayed using tables, bar graphs, or pie charts. Exhibit 6 Example of Percentages Presented as Pie Chart: Areas Where Possible Synergies in Mergers and Acquisitions Exist
Source: KPMG Mergers and Acquisitions Report
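The counting and percentage calculations described above are easily automated. The following is a minimal sketch, assuming a single survey item answered on a one-to-five scale; the ten responses are hypothetical and the variable names are our own.

```python
from collections import Counter

# Hypothetical answers to one survey item on a 1-5 scale
# (illustrative data only).
responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]

# Frequency distribution: how many times each choice was selected.
counts = Counter(responses)
total = len(responses)

# Percentage = (count for a choice / total responses) * 100.
# Choices nobody selected are kept in the table with a count of zero.
frequency_table = {
    choice: (counts[choice], 100 * counts[choice] / total)
    for choice in range(1, 6)
}

# Print a simple table with a text bar for each response choice.
for choice, (count, pct) in frequency_table.items():
    print(f"{choice}: {count:2d} responses ({pct:5.1f}%) " + "#" * count)
```

The same table can then be passed to a spreadsheet or charting tool to produce a bar graph or pie chart of the kind shown in the exhibits.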
Line Graph: A line graph typically presents data organised against time. Shown below is Tesco’s share price from 15 June 2004 to 16 January 2009 and measured in pence per share. Exhibit 7 Example of Data Presented as a Graph Tesco’s Share Price
Source: Yahoo Finance
Sampling Since surveys can be expensive in terms of printing, mailing, and data entry costs, it is common to select a subset or small group of people from which to gather data. This subset or small group is known as a sample. The people targeted to fill out a survey are chosen from a specific population. A population consists of all members of an organization or group of people who possess the desired traits, knowledge, experience, or characteristics of interest to the survey project. A convenient rule of thumb for selecting the size of a sample when the size of the population is known (employees that are to be selected for corporate training sessions) is to randomly select 10-20 per cent of the members from the population being investigated. Random sampling means that each member of the population has an equal chance of being surveyed. Random sampling improves the probability that information obtained through the survey will represent the responses the entire population would give. One way to simplify the situation is to think of respondents in terms of segmentation. Segmentation means to sort respondents into groups based on similar characteristics, such as relationship to the organization (internal, end-user, supplier), position, or type(s) of products or services used. Sorting respondents into groups is a commonly used method to identify and distinguish the experiences and perceptions of distinct groups. Demographic items are used in surveys to identify respondents so that their data can be sorted and analyzed as needed. We conclude this section with an example of a recent survey undertaken to better understand the factors that influence how Chief Financial Officers (CFOs) at leading US companies view dividend policy and share buybacks.6 The authors of the survey sent out questionnaires to leading CFOs who are members of Financial Executives International, an association that includes both publicly traded and privately owned companies. 
A number of different procedures were used to deliver the questionnaire, and incentives were offered to elicit a high response rate. The authors report a response rate of 16 per cent, which is more or less typical for these types of surveys. (It would appear that the incentives had little or no impact on whether executives were prepared to complete and return the survey.) Follow-up interviews were conducted by the authors, mainly via telephone, though several were conducted face-to-face. Interviews lasted between 40 minutes and more than two hours. The authors report that CFOs were ‘remarkably candid and straightforward’. Interviewees were not chosen randomly; they were selected so that the researchers could capture cross-sectional differences in firm characteristics and payout policies. Because dividend cuts are rare (as financial theory predicts), companies that reduced, or contemplated reducing, their dividends were over-sampled.
Brav, Graham, Harvey and Michaely (2005): “Payout Policy in the 21st Century,” Journal of Financial Economics, 77, and Brav, Graham, Harvey and Michaely (2008): “The Effect of the May 2003 Dividend Tax Cut on Corporate Dividend Policy: Empirical and Survey Evidence,” National Tax Journal, LXI.
These additional results were then integrated with the survey evidence to ‘reinforce and clarify the survey responses but occasionally to provide a counterpoint.’ The results, presented below, are pretty much self-explanatory. In some places they confirm previous theories as to dividend policy – firms regard dividend policy as being extremely important and are reluctant to reduce or suspend dividends unless forced by circumstances to do so – while offering interesting insights into factors that influence share buybacks, the principal alternative to dividend payout as a way of returning cash to shareholders. The information provided by such surveys adds texture to the more theoretical and quantitative studies that are characteristic of financial research. It is difficult to assess the merits of this particular survey. True, great care has gone into the design and delivery of the survey; questions were pre-tested so as to note how long the survey took to complete and to provide feedback to ensure the researchers were on the right track. This process resulted in the rewording of several questions and the deletion of one quarter of the original content. The authors also tested whether the order in which the questions were asked mattered (it didn’t), and whether too many sub-parts might result in ‘burn-out’, that is, a decline in the quality of the responses (they didn’t). Despite the care that went into its preparation, and the substantial costs incurred, we cannot conclude it is the last word on the subject. Previous surveys, equally carefully crafted, provide conflicting evidence on a number of other important financial issues (capital budgeting techniques, for example), though whether these differences are meaningful is another question. Even large differences in the reported results do not constitute evidence that the indicated effect could not have arisen by chance.
The survey is thus a source of good news, in that many venerable propositions concerning dividend payout policy were vindicated, and of bad news, in that many important conclusions found in the relevant literature are not supported by CFO views. Again, additional research will be needed before we can confirm (or reject) the significance of these findings.
Exhibit 8 Financial Executive Views Concerning Payout Policy
Historical Level
  Dividends: Very important. Do not cut dividends except in extreme cases.
  Repurchases: Historical level is not very important.
Flexibility
  Dividends: Sticky, inflexible, smooth through time.
  Repurchases: Very flexible. Smoothing not needed.
Consequence if Increased
  Dividends: Little reward for increasing.
  Repurchases: Stock price increase when repurchase plan is announced.
Consequence if Reduced
  Dividends: Big market penalty for reducing or omitting.
  Repurchases: Little consequence from one year to the next, though firms try to complete plans.
Target
  Dividends: Most common target is the level of dividends followed by payout ratio and growth of dividends. Target is viewed as rather flexible.
  Repurchases: Most common target is dollar amount of repurchase, a very flexible target.
Relation to External Funds
  Dividends: External funds would be raised before cutting dividends.
  Repurchases: Repurchases would be reduced before raising external funds.
Relation to Investment
  Dividends: First maintain historic dividend level, then make incremental investment decisions.
  Repurchases: First investment decision, then make repurchase decisions.
Earnings Quality
  Dividends: Dividend increases tied to permanent, stable earnings.
  Repurchases: Repurchases increase with permanent earnings but also with temporary earnings.
Substitutes?
  Dividends: At the margin, do not reduce repurchases in order to increase dividends.
  Repurchases: At the margin, reduce dividend increases (not level) in order to increase repurchases.
Taxes
  Dividends: Tax disadvantage of dividends of secondary importance.
  Repurchases: Tax advantage of repurchases of secondary importance.
Convey Information?
  Dividends: Dividends convey information.
  Repurchases: Repurchases convey information.
Signal?
  Dividends: Dividends are not a self-imposed cost to signal firm quality or separate from competitors.
  Repurchases: Repurchases are not used as a self-imposed cost to signal firm quality or separate from competitors.
Retail Investors
  Dividends: Retail investors like dividends if tax disadvantaged. Retail investors like dividends about the same as institutions like dividends.
  Repurchases: Retail investors like repurchases less than they like dividends.
Institutional Investors
  Dividends: Institutions generally like dividends but institutions are not sought out to monitor firm.
  Repurchases: Institutions generally like repurchases about the same as they like dividends.
Stock Price
  Dividends: Not important.
  Repurchases: Repurchase shares when stock undervalued by market.
EPS
  Dividends: Not important.
  Repurchases: Repurchasing in an attempt to increase EPS is very important.
Stock Options
  Dividends: Not important.
  Repurchases: Repurchasing to offset dilution is important.
Cash on Balance Sheet
  Dividends: Not important.
  Repurchases: Use to reduce cash holdings when cash is sufficiently high.
Float or Liquidity
  Dividends: Not important.
  Repurchases: Do not repurchase if float is not sufficient.
Mergers and Acquisitions
  Dividends: Not important.
  Repurchases: Important.
Takeovers
  Dividends: Not important.
  Repurchases: Important.
Cash Cows
  Dividends: Expected to pay dividends.
  Repurchases: Expected to return capital, including repurchasing shares.
If we were starting over…
  Dividends: … we would keep dividend commitment minimised
  Repurchases: We would rely heavily on repurchases to return cash to shareholders.
Non-payers will initiate when…
  Dividends: … when earnings become positive and stable. … institutions demand dividends. … they have fewer investment opportunities available.
  Repurchases: …the market is undervaluing their stock. …they have extra cash on the balance sheet. …institutions demand repurchases. …they have fewer profitable investments. …they think that repurchases can increase EPS or offset stock option dilution.
Source: (Brav, et al.:2004).
6.2 Statistical Inference
When you can measure what you are speaking about and express it in numbers you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind. Lord Kelvin⁷
Statistical methods are at the heart of economic and financial analysis. They provide the basis for some of the discipline’s best known theories and the analytical means to differentiate sound from flawed explanations. Since this is not meant to be a quantitative Guide, our discussion will, accordingly, be limited to those topics that are central to a proper understanding of the way statistical procedures can best be used in MBA dissertations.
One of the key abstractions used by financial economists is the assumption of certainty, that is, that only one outcome is possible and that outcome is widely known. Given certainty, it is easy to predict how a rational, self-interested individual will behave. Things become considerably more complicated once uncertainty (the possibility of more than one outcome) is introduced into the analysis. Uncertainty compels us to act ‘upon opinion rather than knowledge’, as the eminent University of Chicago economist Frank Knight once observed. If uncertainty is the driving force behind the quest for knowledge, then statistics provides the means for accessing that knowledge, and it does this by accounting for uncertainty in a formal, mathematical way. Seen this way, knowledge may be defined as a competition between different descriptions of how the world works, with the competition decided by data. Statistical analysis allows for logical and rigorous management of this competition, by enabling researchers to draw conclusions about a large number of events or the properties of a population from a sample of those events or from the population itself.
Classical statistics is based on the use of observed frequencies of different events to make inferences about the population or to test hypotheses. An essential tool of classical statistics is the null hypothesis. The importance of the null hypothesis to statistical inference derives from the fact that it is frequently impossible to prove that something occurred. To obviate this difficulty, a null hypothesis is constructed that is the complement of the hypothesis of interest. We use available data to assess how plausible the null hypothesis is. As the probability of observing data like ours under the null hypothesis decreases, the null hypothesis becomes a less likely description of how the world works. At some point, the null hypothesis is considered to be disproved and is rejected, leaving the original hypothesis. By scholarly convention, the null hypothesis is rejected when the chance of observing the data, given that the null hypothesis were true, is less than 5 per cent. The 5 per cent criterion has been attacked as being arbitrary; some natural and social scientists propose even more rigorous cut-off values, say, 1 per cent, though this standard, too, can be dismissed as being equally arbitrary. The simple fact is that the 5 per cent value remains the most widely used criterion; this consensus is actually quite beneficial since it means that 5 per cent has become the more or less universal standard for evaluating statistical evidence.
⁷ PLA, vol. 1, "Electrical Units of Measurement", 3 May 1883.
A simple example will help to illustrate the way the null hypothesis is used. Suppose you are given a coin and asked to determine whether it is fair. If the coin is fair, then when it is tossed we would expect “heads” to come up as often as “tails”. Since the coin has two sides, for each toss the likelihood of one or the other side coming up should be the same. One toss cannot settle the matter; nor would we reject the idea of it being a fair coin if two tosses both turned up heads. How would things change if after ten tosses six heads came up, or after 100 tosses 60 came up as heads? Can we still consider the coin to be fair? And how do we interpret this information? From the perspective of classical statistics, the first thing we do is to construct the null hypothesis, which says that the coin is fair, and then go on to determine the probability of observing 60 per cent or more heads as a function of the number of tosses. After ten tosses there is about a 38 per cent chance of observing six or more heads; after 100 tosses, the chance of heads appearing at least 60 per cent of the time declines to around 3 per cent. Knowing this, we would reject the null hypothesis that the coin is fair after 100 tosses, though not after ten. We have not proved that the coin is biased, only that, if the coin were fair, the chance of observing the data would be less than 5 per cent.
Choosing the right test to compare measurements is not quite as straightforward as standard statistics texts suggest, as you must choose between two broad families of tests.
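The tail probabilities in the coin example can be computed exactly from the binomial distribution. The following is a minimal sketch using only the Python standard library; the function name `upper_tail` and the variable names are illustrative assumptions, not part of the Guide.

```python
from math import comb

def upper_tail(n_tosses, min_heads, p_heads=0.5):
    """Return P(X >= min_heads) when X ~ Binomial(n_tosses, p_heads)."""
    return sum(
        comb(n_tosses, k) * p_heads**k * (1 - p_heads) ** (n_tosses - k)
        for k in range(min_heads, n_tosses + 1)
    )

# Null hypothesis: the coin is fair (p_heads = 0.5).
p_ten = upper_tail(10, 6)        # six or more heads in ten tosses
p_hundred = upper_tail(100, 60)  # sixty or more heads in 100 tosses

print(f"10 tosses:  P(>= 6 heads)  = {p_ten:.3f}")      # 0.377
print(f"100 tosses: P(>= 60 heads) = {p_hundred:.3f}")  # 0.028
print("Below the 5 per cent cut-off after 100 tosses?", p_hundred < 0.05)
```

The same proportion of heads (60 per cent) is unremarkable in ten tosses but falls below the 5 per cent criterion in 100 tosses, which is the point the example is making about sample size.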
Many statistical tests are based upon the assumption that the data are sampled from a Gaussian (or Normal) distribution. Such tests are referred to as parametric tests; commonly used parametric tests are shown in the first column of the table below and include the t-test and analysis of variance. Tests that make no assumptions about the population distribution are referred to as nonparametric tests. All commonly used nonparametric tests rank the outcome variable from low to high and then analyze the ranks. These tests are listed in the second column of the table; they are also called distribution-free tests.
How does one choose between parametric and nonparametric tests? Sometimes it is very easy to do so, but not always. You would choose a parametric test if you were sure the data you collected are sampled from a population that follows a Gaussian distribution (at least approximately).⁸ Nonparametric tests should be selected in the following three situations:
(1) If the outcome is a rank or a score, then the population is clearly not Gaussian;
(2) If some values are ‘off the scale,’ that is, too high or too low to measure. Even if the population is Gaussian, it is impossible to analyze such data with a parametric test since you don't know all of the values. Using a nonparametric test with these data is simple. Assign values too low to measure an arbitrary very low value and assign values too high to measure an arbitrary very high value. Then perform a nonparametric test. Since the nonparametric test only knows about the relative ranks of the values, it won't matter that you didn't know all the values exactly; and
(3) When the data are measurements, and you are sure that the population is not normally distributed. If the data are not sampled from a Gaussian distribution, consider whether you can transform the values to make the distribution become Gaussian. You might, for example, take the logarithm or reciprocal of all values. Financial researchers long ago concluded that stock returns are not normally distributed, but when converted to logarithms they correspond more closely to a Gaussian distribution.
⁸ There are several tests that can be used to determine whether the data you are studying follow a normal (Gaussian) distribution. The best known of these are the Kolmogorov-Smirnov and Shapiro-Wilk tests.
It is not always easy to decide whether a sample comes from a Gaussian population. If you collect many data points (more than 100), look at the distribution of data and it will be fairly obvious whether the distribution is approximately bell shaped (Gaussian). With a smaller number of data points, it will be difficult to tell by inspection alone whether the data are Gaussian; even formal tests cannot always discriminate between Gaussian and non-Gaussian distributions. You should look at previous data as well. It bears repeating that what matters most is the distribution of the overall population, and not the distribution of your sample. In deciding whether a population is Gaussian, look at all available data, not just data in the current experiment. Consider the source of scatter.
When the scatter comes from the sum of numerous sources (with no one source contributing most of the scatter), you would expect to find a roughly Gaussian distribution. When in doubt, some people choose a parametric test, because they aren't sure the Gaussian assumption is violated, while others choose a nonparametric test for precisely the opposite reason. Does it matter whether you choose a parametric or nonparametric test? The answer depends on the sample size. Four cases are worth considering in this connection:
• Large samples. What happens when you use a parametric test with data from a non-Gaussian population? The Central Limit Theorem ensures that parametric tests work well with large samples even if the population is non-Gaussian. In other words, parametric tests are robust to deviations from Gaussian distributions, so long as the samples are large. The snag is that it is impossible to say how large is large enough, as it depends on the nature of the particular non-Gaussian distribution. Unless the population distribution is really weird, you are probably safe choosing a parametric test when there are at least two dozen data points in each group.
• Large samples. What happens when you use a nonparametric test with data from a Gaussian population? Nonparametric tests work well with large samples from Gaussian populations. The p values tend to be a bit too large, but the discrepancy is small. In other words, nonparametric tests are only slightly less powerful than parametric tests with large samples.
• Small samples. What happens when you use a parametric test with data from non-Gaussian populations? You can't rely on the Central Limit Theorem, so the p value may be inaccurate.
• Small samples. When you use a nonparametric test with data from a Gaussian population, the p values tend to be too high. The nonparametric tests lack statistical power with small samples.
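The role of the Central Limit Theorem in the first case can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than a formal demonstration: it draws repeated samples from a strongly skewed (exponential) population, an arbitrary choice made here, and measures how skewed the resulting sample means are for a small and a larger sample size. The function names, seed and replication count are all hypothetical.

```python
import random
from statistics import fmean

def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

def skew_of_sample_means(sample_size, replications=4000, seed=1):
    """Skewness of the means of many samples drawn from an exponential population."""
    rng = random.Random(seed)
    means = [
        fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(replications)
    ]
    return skewness(means)

skew_small = skew_of_sample_means(5)   # small samples: means remain visibly skewed
skew_large = skew_of_sample_means(50)  # larger samples: means are closer to symmetric

print(f"skewness of sample means, n = 5:  {skew_small:.2f}")
print(f"skewness of sample means, n = 50: {skew_large:.2f}")
```

A Gaussian distribution has zero skewness, so the fall in skewness as the sample size rises is one sign that the sample means are moving towards a Gaussian shape even though the underlying population is not Gaussian.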
Large data sets, in short, present no problems. It is usually easy to tell if the data come from a Gaussian population, but it doesn't really matter because the nonparametric tests are so powerful and the parametric tests are so robust. Small data sets, on the other hand, create a dilemma. It is difficult to tell if the data come from a Gaussian population, and it matters: nonparametric tests are not powerful and parametric tests are not robust.
With many tests, you must choose whether you wish to calculate a one- or two-sided p value (also known as a one- or two-tailed p value). Let's review the difference in the context of a t-test (see below). The p value is calculated for the null hypothesis that the two population means are equal, and any discrepancy between the sample means is due to chance. If the null hypothesis is true, the one-sided p value is the probability that two sample means would differ as much as was observed (or further) in the direction specified by the hypothesis just by chance, even though the means of the overall populations are actually equal. The two-sided p value also includes the probability that the sample means would differ that much in the opposite direction (that is, the other group has the larger mean). The two-sided p value is, not unexpectedly, twice the one-sided p value.
A one-sided p value is appropriate when you can state with certainty (and before collecting any data) that there either will be no difference between the means or that the difference will go in a direction you can specify in advance (that is, you have specified which group will have the larger mean). If you cannot specify the direction of any difference before collecting data, then a two-sided p value is more appropriate. When in doubt, select a two-sided p value. If you select a one-sided test, you should do so before collecting any data and you need to state the direction of your experimental hypothesis. If the data go the other way, you must be willing to attribute that difference (or association or correlation) to chance, no matter how striking the data.
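The difference between the two kinds of p value can be seen with a small calculation. The passage above frames the distinction in terms of a t-test; purely for convenience, the sketch below applies it instead to the earlier coin example (60 heads in 100 tosses), where exact probabilities are easy to obtain. The helper name `binomial_pmf` and the variable names are assumptions made for this illustration.

```python
from math import comb

def binomial_pmf(n, k, p=0.5):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tosses, heads = 100, 60

# One-sided: the alternative, fixed in advance, is "the coin favours heads".
p_one_sided = sum(binomial_pmf(n_tosses, k) for k in range(heads, n_tosses + 1))

# Two-sided: a departure of the same size in either direction counts,
# so 40 or fewer heads is as extreme as 60 or more.
mirror = n_tosses - heads
p_two_sided = p_one_sided + sum(binomial_pmf(n_tosses, k) for k in range(0, mirror + 1))

print(f"one-sided p value: {p_one_sided:.4f}")  # 0.0284
print(f"two-sided p value: {p_two_sided:.4f}")  # 0.0569
```

In this instance the one-sided p value falls below 5 per cent while the two-sided value does not, which shows why the choice has to be made, and justified, before the data are collected.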
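Exhibit 9 Selecting a Statistical Test
Goal: Describe one group
  Measurement (from Gaussian Population): Mean, SD
  Rank, Score, or Measurement (from Non-Gaussian Population): Median, interquartile range
  Binomial (Two Possible Outcomes): Proportion
  Survival Time: Kaplan Meier survival curve
Goal: Compare one group to a hypothetical value
  Measurement (from Gaussian Population): One-sample t test
  Rank, Score, or Measurement (from Non-Gaussian Population): Wilcoxon test
  Binomial (Two Possible Outcomes): Chi-square or Binomial test **
Goal: Compare two unpaired groups
  Measurement (from Gaussian Population): Unpaired t test
  Binomial (Two Possible Outcomes): Fisher's test (chi-square for large samples)
  Survival Time: Log-rank test or Mantel-Haenszel*
Goal: Compare two paired groups
  Measurement (from Gaussian Population): Paired t test
  Binomial (Two Possible Outcomes): McNemar's test
  Survival Time: Conditional proportional hazards regression*
Goal: Compare three or more unmatched groups
  Measurement (from Gaussian Population): One-way ANOVA
  Survival Time: Cox proportional hazard regression**
Goal: Compare three or more matched groups
  Survival Time: Conditional proportional hazards regression**
Goal: Quantify association between two variables
  Measurement (from Gaussian Population): Pearson correlation
  Rank, Score, or Measurement (from Non-Gaussian Population): Spearman correlation
  Binomial (Two Possible Outcomes): Contingency coefficients**
Goal: Predict value from another measured variable
  Measurement (from Gaussian Population): Simple linear regression or Nonlinear regression
  Rank, Score, or Measurement (from Non-Gaussian Population): Nonparametric regression**
  Binomial (Two Possible Outcomes): Simple logistic regression*
  Survival Time: Cox proportional hazard regression*
Goal: Predict value from several measured or binomial variables
  Measurement (from Gaussian Population): Multiple linear regression* or Multiple nonlinear regression**
  Binomial (Two Possible Outcomes): Multiple logistic regression*
  Survival Time: Cox proportional hazard regression*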
6.2.a Chi-Square
One important set of statistical tests allows us to test for deviations of observed frequencies from expected frequencies. To introduce these tests, we will start with a simple example: we want to determine if a coin is fair. In other words, are the odds of tossing the coin heads-up the same as tails-up? We conduct an experiment using the coin by flipping it 200 times. The coin landed heads-up 108 times and tails-up 92 times. At first glance, we might suspect that the coin is biased because heads turned up more often than tails. However, to determine whether the observed differences are significant or could have arisen by chance we utilise a chi-squared test.
To perform a chi-square test (or, for that matter, any other statistical test) we must first formulate the null hypothesis. In the example to hand, our null hypothesis is that on each toss the coin is equally likely to land heads-up or tails-up. The null hypothesis allows us to state expected frequencies: for 200 tosses, we would expect 100 heads and 100 tails. The following table summarises the results of our experiment:
         Observed   Expected   Total
Heads       108        100       208
Tails        92        100       192
Total       200        200       400
The observed values are those we gather ourselves; the expected values are those frequencies expected based on our null hypothesis. We sum the rows and columns as shown in the table. It is always a good idea to make sure that the row totals equal the column totals (both total to 400 in this example). Statisticians have devised the chi-square test as a way to determine if a frequency distribution differs from the expected distribution. Chi-squared is calculated according to the following formula:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
We have two classes to consider in this example, namely, heads and tails:
$$\chi^2 = \frac{(108 - 100)^2}{100} + \frac{(92 - 100)^2}{100} = \frac{8^2}{100} + \frac{(-8)^2}{100} = 0.64 + 0.64 = 1.28$$
We next consult a table of critical values of the chi-squared distribution; a portion of the table is presented below.
df/prob.   0.99      0.95     0.90    0.80    0.70   0.50   0.30   0.20   0.10   0.05
1          0.00016   0.0039   0.016   0.064   0.15   0.46   1.07   1.64   2.71   3.84
2          0.02      0.10     0.21    0.45    0.71   1.39   2.41   3.22   4.60   5.99
3          0.12      0.35     0.58    1.00    1.42   2.37   3.66   4.64   6.25   7.82
4          0.3       0.71     1.06    1.65    2.20   3.36   4.88   5.99   7.78   9.49
5          0.55      1.14     1.61    2.34    3.00   4.35   6.06   7.29   9.24   11.07
The left-most column lists the degrees of freedom (df), which are determined by subtracting the number one from the number of classes. In the example to hand, we have two classes (heads and tails), so our degrees of freedom is 1, and our chi-squared value is 1.28.
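The chi-squared statistic for the coin example, and the tail probability that a table lookup approximates, can also be computed directly. The sketch below uses only the Python standard library; the variable names are illustrative, and the shortcut used for the p value holds only for one degree of freedom (where chi-squared is the square of a standard normal variable), so it should not be reused for larger tables.

```python
from math import erfc, sqrt

observed = {"heads": 108, "tails": 92}
expected = {"heads": 100, "tails": 100}

# Sum of (observed - expected)^2 / expected over the classes.
chi_squared = sum(
    (observed[side] - expected[side]) ** 2 / expected[side] for side in observed
)
degrees_of_freedom = len(observed) - 1

# Valid for 1 df only: P(chi-squared > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_squared / 2))

print(f"chi-squared = {chi_squared:.2f} with {degrees_of_freedom} df")  # 1.28 with 1 df
print(f"p value     = {p_value:.3f}")                                   # 0.258
```

The computed probability lies between the 0.30 and 0.20 columns of the table, as the critical values 1.07 and 1.64 for 1 df imply.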
Now, move across the row for 1 df until we find the critical values that bound our value: in this case, 1.07 (corresponding to a probability of 0.30) and 1.64 (corresponding to a probability of 0.20). We can interpolate our value of 1.28 to estimate a probability of about 0.27. In other words, with a fair coin the probability of a departure from the expected 100 heads at least as large as the one we observed (108 heads in 200 tosses) is about 27 per cent. Because this probability is greater than 0.05, we do not reject the null hypothesis and conclude that the data are consistent with a fair coin.
6.2.b Testing the Difference Between Two Means: The t-Test
To introduce this technique, let’s revert to the simple example given above, that training affects worker productivity and, by extension, firm profitability. After collecting the data we find that before receiving training the average hourly level of productivity amounted to 3.882 units, with a variance of 2.743; after training the corresponding figures were 5.352 and 2.985. We would like to know whether the observed differences in output could have arisen by chance or are reflective of the impact of training on productivity.⁹ We noted above that the magnitude of the effect by itself should not be taken as evidence that the differences are significant. Of course, the larger the sample, the greater is the likelihood of the difference being significant. In the case to hand, let us assume we are using a fairly small sample of, say, 17 workers. If we have followed correct procedure, then our sample should be representative of the population of the firm’s workers who received on-the-job training. Taking the data at face value, the benefits of training appear quite considerable, with the productivity of workers receiving training rising by about 38 per cent compared to the productivity of untrained workers.
We next need to establish whether we can rule out the possibility that this improvement could have arisen by chance, which would, of course, signify that training has no impact at all on productivity. To do so we make use of three critical assumptions: (1) the two populations from which the samples were drawn have the same variance (homogeneity of variance); (2) the underlying populations follow a Gaussian distribution; and (3) each value is sampled independently from all other values (random sampling). As a general matter small violations of the first two assumptions do not matter all that much; that the third assumption is satisfied is much more important if the results are to have validity. To determine whether the observed differences are significant we calculate the following statistic: t = (estimated value - hypothesised value)/estimated standard error of the statistic. The null hypothesis is that training does not affect the average level of worker productivity, in which case the hypothesised value is zero. The first step in this process is calculation of the difference between the two means: M1 - M2 = 5.3523 - 3.8824 = 1.470. Since the hypothesised value is zero, there is no need to subtract it from the statistic. The next step is calculation of the standard error of the statistic, $S_{M_1 - M_2}$. Any statistical text will show that the formula for the standard error of the difference in means in the population, for two groups of equal size n drawn from populations with a common variance σ2, is given by:
$$\sigma_{M_1 - M_2} = \sqrt{\frac{2\sigma^2}{n}}$$
To derive this quantity, we estimate σ2 and use that estimate in place of the corresponding population value in the formula. Since by assumption the population variance is the same in both groups, we estimate this quantity by averaging the two sample variances: MSE = (2.743 + 2.985)/2 = 2.864, where MSE is our estimate of σ2. Since n is 17, we need to adjust MSE to reflect the sample size in each group according to:
$$S_{M_1 - M_2} = \sqrt{\frac{2\,MSE}{n}} = \sqrt{\frac{2(2.864)}{17}} = 0.5805$$
⁹ More formally, we could have used either the z-test or the t-test. If the underlying population follows a Gaussian distribution and the variance is known, we should apply the z-test; if, by contrast, the variance is unknown and the population distribution is Gaussian or the sample is very large, then the t-test is the correct procedure to use.
The next step is to compute the value of t by inserting these values into the formula given above: t = 1.47/0.5805 = 2.532. Lastly, we compute the probability of getting a t as large as or larger than 2.53 (and, for a two-tailed test, as small as or smaller than -2.53). To do this, we first need to know the number of degrees of freedom, the number of independent estimates of the variance upon which the MSE is based. This value is determined as (n1 - 1) + (n2 - 1), where n1 is the sample size for the first group and n2 is the sample size of the second group; since n1 = n2 = 17, the number of degrees of freedom amounts to 32. We can now use the t-distribution to find the probability we are looking to establish. As noted above, we use a one-tailed test where we believe there either will be no difference between the means or that the difference will go in a direction specified in advance (in other words, you have specified which group will have the larger mean); with t = 2.532 and 32 degrees of freedom, fewer than one time out of 100 could the observed difference have arisen by chance. Therefore we can reject the null hypothesis that the difference in means is zero; the sample data support the view that training does have a favourable impact on productivity.
To complete our discussion, we illustrate in the exhibit below how t-tests have been used in financial research, in the case to hand an investigation into the differences in several variables widely regarded as differentiating failed from viable banks; the data cover a sample of Jamaican banks sorted by whether they did or did not fail between 1992 and 1993.
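Before turning to the bank data, the arithmetic of the training example can be retraced in a few lines. This is a minimal sketch using only the Python standard library: the means, variances and group size are those quoted in the text, while the function names and the use of Simpson's rule to integrate the t density are assumptions made here because the standard library has no built-in t distribution. The degrees of freedom follow the formula (n1 - 1) + (n2 - 1) with 17 workers in each group.

```python
from math import gamma, pi, sqrt

n = 17                                     # workers in each group
mean_before, var_before = 3.8824, 2.743
mean_after, var_after = 5.3523, 2.985

difference = mean_after - mean_before      # M1 - M2
mse = (var_before + var_after) / 2         # pooled estimate of the common variance
std_error = sqrt(2 * mse / n)              # standard error of the difference
t_stat = difference / std_error
df = (n - 1) + (n - 1)

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    coeff = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

def upper_tail_t(t, nu, steps=2000):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

p_one_tailed = upper_tail_t(t_stat, df)

print(f"MSE = {mse:.3f}, standard error = {std_error:.4f}")
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print(f"one-tailed p value = {p_one_tailed:.4f}")
```

The one-tailed probability comes out below 1 per cent, and doubling it gives the corresponding two-tailed value.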
Exhibit 10 Mean of Selected Variables, Non-Failed and Failed Jamaican Banks, 1992-1993
Variables                                         Non-Failed   Failed    t-statistic
Gross Capital/Risk Assets                           18.69      -18.46       1.8
Loan Loss Reserve/Gross Loans                        6.78       17.78      -3.1b
Total Operating Expense/Total Operating Revenue     96.87      152.65      -2.9a
Return on Assets                                    -0.03       -8.28       2.0b
Liquid Assets/Total Assets                          40.34       45.20      -0.8
Log Total Assets                                     0.53        0.43       0.2
Change in Loans/GDP                                  0.02       -0.16       2.7a
Source: Daley, Matthews and Whitfield, “Too-Big-To-Fail: Bank Failure and Banking Policy in Jamaica,” Cardiff Business School Working Paper, E2006/4 (January).
With one exception the data relate to financial variables specific to each bank in the sample. The results are pretty much as would be expected: failed banks have much less capital relative to risk assets, a lower return on assets, much higher loan loss provision and operating expense ratios, and are generally smaller than viable banks. Once these differences are subject to formal statistical analysis, we observe that neither size, liquidity nor capital adequacy matters, that is, the differences noted in the table appear to have arisen by chance; on the other hand, the efficiency and return measures all appear to reflect genuine differences between the two categories of banks.
To summarise:
• We use a one-tailed test when the research hypothesis predicts a significant difference between two groups and the direction of the difference. Example: hours spent studying improve finance exam results.
• We use a two-tailed test when the research hypothesis predicts a significant difference between two groups, but not the direction of the difference. Example: male and female students differ in the number of hours they devote to revising for their finance exam.
The critical value for the rejection of the null hypothesis is calculated differently depending upon whether the hypothesis is one- or two-tailed.
6.2.c Testing the Difference of More than Two Means: Analysis of Variance
Where we are interested in determining whether two means differ significantly from each other, the appropriate statistical procedure is to use the t-test. Where, on the other hand, we are interested in establishing whether significant differences exist among three or more means, then the appropriate procedure is to use analysis of variance, or ANOVA as it is commonly called. Why? To answer this question we need to investigate more closely the meaning of a p-value. When interpreting a p-value, we may conclude there is a significant difference between groups if the p-value is small enough, with 5 per cent typically used as the cut-off value. In this case, 5 per cent is the significance level, or the probability of a type I error: the chance of incorrectly rejecting the null hypothesis, that is, incorrectly concluding that an observed difference did not occur just by chance or, more simply, the chance of concluding that there is a difference between two groups when in fact there is no such difference. If multiple t-tests are carried out then the type I error rate will increase with the number of comparisons made. To illustrate the point, consider a study with four groups, for which there are six possible pair-wise comparisons, the number of comparisons being given by 4C2 = 4!/[2!2!] = 6, where 4! = 4x3x2x1. If the chance of a type I error in one such comparison is 0.05, then the chance of not committing a type I error is 1.0 - 0.05 = 0.95. If the six comparisons are assumed independent, then the chance of not committing a type I error in any one of them is 0.95⁶ = 0.74. Hence the chance of committing a type I error in at least one of the comparisons is 1 - 0.74 = 0.26, which is the overall type I error rate for the analysis. Therefore there is a 26 per cent overall type I error rate even though for each individual test the type I error rate is 5 per cent. ANOVA is used to avoid this error.¹⁰
To understand how ANOVA is used in practice consider the following example. Suppose we are interested in establishing whether the application of three different fertilisers significantly affects farm yields.
This could be done, for example, by a field experiment in which each fertiliser is applied to 10 plots; the 30 plots are later harvested, with the crop yield being calculated for each plot. We now have three groups of ten yield estimates, as shown in Exhibit 11. Inspection of Exhibit 11 suggests that different fertilisers do appear to have an impact on yields; from our previous discussion, it should be clear that these differences could signify either that the choice of fertiliser matters or that the observed differences arose by chance. The only way to judge between these possibilities is to subject the data to formal statistical analysis.
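The multiple-comparison arithmetic given earlier (six pair-wise t-tests among four groups, each carried out at the 5 per cent level) can be reproduced in a few lines. The variable names below are illustrative, and the calculation rests on the same assumption as the text, namely that the comparisons are independent.

```python
from math import comb

groups = 4
alpha = 0.05                                # type I error rate for a single comparison

comparisons = comb(groups, 2)               # number of pair-wise comparisons
p_no_error = (1 - alpha) ** comparisons     # no type I error in any comparison
overall_error_rate = 1 - p_no_error         # at least one type I error

print(f"pair-wise comparisons:     {comparisons}")              # 6
print(f"P(no type I error at all): {p_no_error:.2f}")           # 0.74
print(f"overall type I error rate: {overall_error_rate:.2f}")   # 0.26
```

Changing `groups` shows how quickly the overall error rate grows as more means are compared, which is the motivation for using a single ANOVA in place of a series of t-tests.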
¹⁰ Bewick, Cheek and Ball, “Statistics Review 9: One Way Analysis of Variance,” Critical Care, 8 (130-136).
Exhibit 11 Yield per Plot for Thirty Plots Treated with Fertiliser
The variability in a set of data quantifies the scatter of the data points around the mean. To calculate a variance, we first calculate the mean, then the deviation of each point from the mean. Deviations will be both positive and negative, though their sum will be zero. (This follows directly from how the mean was calculated in the first place.) This will be true regardless of the size of the data set, or the amount of variability within a dataset; accordingly, the ‘raw’ deviations do not provide a useful measure of variability. If instead the deviations are squared before summation then this sum is a useful measure of variability, which will increase the greater is the scatter of the data points around the mean. This quantity is referred to as a sum of squares (SS). To illustrate how this is done, consider the following chart (Exhibit 12), which presents the basic data as well as the mean and deviation of each point from the mean. Note that the fertiliser applied to each plot is not indicated. The SS cannot, however, be used as a comparative measure between groups, because clearly it will be influenced by the number of data points in the group; the more data points, the greater the SS. Instead, this quantity is converted to a variance by dividing by n − 1, where n equals the number of data points in the group:
$$\text{variance} = \frac{SS}{n - 1}$$
A variance is therefore a measure of variability, taking account of the size of the dataset. You might ask, why use n – 1 rather than n? If we wish to calculate the average squared deviation from the mean (i.e., the variance) why not simply divide by n? The reason is that we do not actually have n independent pieces of information about the variance. The first step was to calculate a mean (from the n independent pieces of data collected). The second step is to calculate a variance with reference to that mean. If n − 1 deviations are calculated, it is known what the final deviation must be, for they must all add up to zero by definition. So we have only n − 1 independent pieces of information on the variability
about the mean. Consequently, it makes more sense to divide the SS by n − 1 than by n to obtain an average squared deviation around the mean. The number of independent pieces of information contributing to a statistic is referred to as the degrees of freedom.
Exhibit 12 Yield per Plot by Plot Number
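The reasoning about sums of squares and the n − 1 divisor can be checked with a few lines of Python. This is a minimal sketch: the five yields are hypothetical values chosen for easy arithmetic, not the data from the exhibit, and the function names are illustrative.

```python
def sum_of_squares(values):
    """Return the sum of squared deviations from the mean (SS)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sample_variance(values):
    """Return SS divided by its degrees of freedom, n - 1."""
    return sum_of_squares(values) / (len(values) - 1)


# Hypothetical yields for five plots (illustrative only).
yields = [4.0, 5.0, 6.0, 7.0, 8.0]
mean_yield = sum(yields) / len(yields)
deviations = [y - mean_yield for y in yields]

print("sum of raw deviations:", sum(deviations))   # always zero
print("sum of squares (SS):", sum_of_squares(yields))
print("variance, SS/(n - 1):", sample_variance(yields))
```

Because the raw deviations must sum to zero, knowing any four of the five deviations fixes the fifth, which is why only n − 1 = 4 degrees of freedom remain.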
In an ANOVA, it is useful to express the measure of variability in terms of its two components: a sum of squares, and the degrees of freedom associated with that sum of squares. Returning to the original question: what is causing the variation in yield between the 30 plots of the experiment? Numerous factors are likely to be involved, such as differences in soil nutrients between the plots, differences in moisture content, other biotic and abiotic factors, as well as the fertiliser applied to the plot. However, it is only the last of these that is of interest to us, so we will divide the variability between plots into two parts: (1) the portion due to applying different fertilisers, and (2) the remainder, which is due to all of the other factors. To illustrate the principle behind partitioning the variability, first consider two extreme datasets. If there were almost no variation between the plots due to any of the other factors, so that nearly all variation was due to the application of the three fertilisers, then the data would follow the pattern shown below. The first step would be to calculate a grand mean, that is, the mean value of all the data points; there is considerable variation around this mean. The second step is to calculate the three group means we wish to compare: that is, the means for the plots given fertilisers A, B and C.
Once these means are fitted, little variation is left around the group means; in other words, fitting the group means has removed or explained nearly all the variability in the data. This has happened because the three means are distinct. Now consider the other extreme, in which the three fertilisers are, in fact, identical. Once again, the first step is to fit a grand mean and calculate the sum of squares. Second, three group means are fitted, only to find that there is almost as much variability as before. Little variability has been explained. This has happened because the three means are relatively close to each other (compared to the scatter of the data). The amount of variability that has been explained can be quantified directly by measuring the scatter of the treatment means around the grand mean. In the first of the two examples, the deviations of the group means around the grand mean are considerable, whereas in the second example these deviations are relatively small.
Exhibit 13 Variability Around the Grand Mean
Having explained the principles behind an analysis of variance, let us analyse whether the observed response differences reflect the impact on yields of applying the different fertilisers or are due entirely to chance. This analysis requires two inputs from the researcher, but we must first convert the data into two broad categories: Yields and Fertilisers, with the first column sub-divided by the three different fertilisers and the yields obtained from each.
Exhibit 14 Variability Around Three Treatment Means
The fertiliser variable (FERTIL) is categorical, and in this sense the values 1, 2 and 3 are arbitrary. YIELD, by contrast, is continuous, the values representing true measurements. Data variables are usually continuous, while explanatory variables may be continuous or categorical or both.
(a) The question
The question to be answered is: ‘Does fertiliser affect yield?’ This question focuses on two variables: YIELD, the data we wish to explain, and FERTIL, the variable we hypothesise might do the explaining. YIELD therefore is the response (or dependent) variable, and FERTIL the explanatory (or independent) variable. It is important that the data variable is on the left hand side of the formula, and the explanatory variable on the right hand side. It is the right hand side of the equation that will become more complicated as we seek progressively more sophisticated explanations of our data.
Exhibit 15 Analysis of Variance with One Explanatory Variable
(b) Output
The primary piece of output is the ANOVA table, in which the partitioning of SS and df has taken place. This will either be displayed directly, or can be constructed by you from the output given. The total SS have been partitioned between treatment (FERTIL) and error, with a parallel partitioning of degrees of freedom. Each of the columns ends with the total of the preceding terms. The calculation of the SS is displayed in Table 1.3. Columns M, F and Y give the grand mean, the fertiliser mean and the plot yield for each plot in turn. Column MY represents the deviations from the grand mean for each plot. If these values are squared and summed, then the result is the total SS of 36.44. FY then represents the deviations from the group mean for each plot; these values when squared and summed give the error SS. Finally, MF represents the deviations of the fertiliser means from the grand mean; squaring and summing these gives the treatment SS. Dividing each SS by the corresponding df gives the mean square. Comparison of the two mean squares (treatment mean square divided by error mean square) gives the F-ratio of 5.70. The probability of getting an F-ratio as large as 5.70 or larger, if the null hypothesis is true, is the p-value of 0.009. That is sufficiently small to conclude that these fertilisers probably do differ in efficacy.[11]
Exhibit 16 The F-distribution of 2 and 27 Degrees of Freedom
(The area to the right of 5.7 represents the probability the F-ratio is at least 5.7, and is 0.009 of the total area under the curve.)
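The partitioning described above can be sketched in Python. The function below is an illustrative implementation of a one-way analysis of variance, and the three small groups are hypothetical numbers chosen so the arithmetic is easy to follow; they are not the yields from the experiment.

```python
def one_way_anova(groups):
    """Partition the total SS into treatment and error components.

    `groups` is a list of lists, one inner list of observations per treatment.
    Returns a dict holding the SS, df, mean squares and the F-ratio.
    """
    all_values = [y for group in groups for y in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Total SS: deviations of every observation from the grand mean.
    total_ss = sum((y - grand_mean) ** 2 for y in all_values)

    error_ss = 0.0       # deviations of observations from their group mean
    treatment_ss = 0.0   # deviations of group means from the grand mean
    for group in groups:
        group_mean = sum(group) / len(group)
        error_ss += sum((y - group_mean) ** 2 for y in group)
        treatment_ss += len(group) * (group_mean - grand_mean) ** 2

    df_treatment = len(groups) - 1
    df_error = n_total - len(groups)
    ms_treatment = treatment_ss / df_treatment
    ms_error = error_ss / df_error
    return {
        "total_ss": total_ss,
        "treatment_ss": treatment_ss,
        "error_ss": error_ss,
        "df_treatment": df_treatment,
        "df_error": df_error,
        "ms_treatment": ms_treatment,
        "ms_error": ms_error,
        "f_ratio": ms_treatment / ms_error,
    }


# Hypothetical yields for three treatments (illustrative only).
table = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
for name, value in table.items():
    print(name, value)
```

The treatment and error SS add up to the total SS, and the degrees of freedom partition in parallel. Converting the F-ratio into a p-value requires the F-distribution, which is not part of the Python standard library; a statistics package or printed tables would be needed for that step.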
To construct a confidence interval, both the parameter estimate and the variability in that estimate are required. In this case, the parameters estimated are means: we wish to know the true mean yield to be expected when we apply fertiliser 1, 2 or 3, which we will denote μA, μB and μC, respectively. These represent true population means, and as such we cannot know their exact values, but our three treatment means represent estimates of these three parameters. These estimates are not exact because of the unexplained variation in the experiment, as quantified by the error variance, which we previously met as the error mean square and will refer to as s². The 95% confidence interval for a population mean is:
sample mean ± t × s/√n
where n is the number of observations in the group and t is the critical value of the t-distribution for the degrees of freedom associated with s.
[11] If none of the fertilisers influenced yield, then the variation between plots treated with the same fertiliser would be much the same as the variation between plots given different fertilisers. This can be expressed in terms of mean squares: the mean square for fertiliser (FMS) would be the same as the mean square for error (EMS), namely, FMS/EMS = 1. The ratio of the two mean squares is the F-ratio, and is the end result of the ANOVA. Even if the fertilisers are identical, the ratio is unlikely to equal exactly one; it could by chance take on a whole range of values. The F-distribution represents the range and likelihood of all possible F-ratios under the null hypothesis that the fertilisers are identical.
Exhibit 17 Calculating the SS and DF
The key point is where our value for s comes from. If we had only the one fertiliser, then all information on population variance would come from that one group, and s would be the standard deviation for that group. In this instance, however, there are three groups, and the unexplained variation has been partitioned as the error mean square. This uses all information from all three groups to provide an estimate of unexplained variation, and the degrees of freedom associated with this estimate are 27, much greater than the 9 which would be associated with the standard deviation of any one treatment. So the value of s used is s = √EMS = √0.949 = 0.974. This is also called the pooled standard deviation. Hence the 95% confidence intervals are as shown in Exhibit 17. These intervals, combined with the group means, are an informative way of presenting the results of this analysis, because they give an indication of how accurate the estimates are likely to be.
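The arithmetic behind these intervals can be sketched as follows. The error mean square (0.949), its 27 degrees of freedom and the 10 plots per fertiliser come from the text; the critical value 2.052 is the standard tabulated two-tailed 5% value of t with 27 degrees of freedom; the usual form of the interval, mean ± t × s/√n, is assumed; and the three group means are hypothetical placeholders, since the actual means appear only in the exhibit.

```python
import math

ERROR_MEAN_SQUARE = 0.949    # EMS reported in the text
PLOTS_PER_FERTILISER = 10    # 30 plots shared equally among 3 fertilisers
T_CRITICAL_95 = 2.052        # tabulated t value, 27 df, two-tailed 5%

pooled_sd = math.sqrt(ERROR_MEAN_SQUARE)
standard_error = pooled_sd / math.sqrt(PLOTS_PER_FERTILISER)
half_width = T_CRITICAL_95 * standard_error


def confidence_interval(group_mean):
    """Return the (lower, upper) 95% limits around one treatment mean."""
    return (group_mean - half_width, group_mean + half_width)


# Hypothetical treatment means, for illustration only.
hypothetical_means = {"A": 4.6, "B": 5.3, "C": 6.0}

print("pooled s:", round(pooled_sd, 3))
print("half-width:", round(half_width, 3))
for name, mean in hypothetical_means.items():
    lower, upper = confidence_interval(mean)
    print(name, round(lower, 2), round(upper, 2))
```

Because the same pooled s and the same group size apply to all three fertilisers, every interval has the same width; only its centre changes from one treatment to the next.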
Exhibit 18 Constructing Confidence Intervals
6.2.d Regression Analysis
In many finance applications we make use of correlation, the extent to which two variables move together, positively or negatively, or are independent of each other. Recall that in portfolio theory we use the degree of correlation to determine the impact that adding an additional asset will have on the portfolio variance: the lower or, better yet, the more negative the correlation, the greater will be the diversification potential associated with the new asset. In reality, correlation signifies nothing more than the existence of a gross association between two variables; it emphatically says nothing about the direction of causation, that is, whether changes in one variable lead to changes in the second variable. Financial theory is, however, very much concerned with both the strength of the relationship and the direction of causation. One of the statistical procedures favoured by financial analysts is regression analysis, because it ostensibly addresses both concerns simultaneously. Regression analysis can be conceptualised as an extension of the covariance/correlation concept. The key difference is that regression analysis presupposes that one variable, the independent variable (X), “causes” changes in the dependent variable (Y). It is important to stress that this causal relationship is conjectural only. For example, economic theory suggests that a change in the budget balance (resulting, say, from an increase in public expenditure) will cause national income to rise. It is, of course, possible that changes in nominal income caused the increase in the budget deficit: owing to the existence of the so-called automatic stabilisers, any decline in economic activity will cause unemployment payments to rise while simultaneously reducing the level of tax receipts. A regression of GDP growth on the budget deficit presupposes that the latter causes the former, when in reality the reverse may be true.
Of course, we could regress the current change in GDP on lagged changes in the budget deficit. Here, there is no question as to the direction of causation: the current state of the economy cannot cause past changes in the budget balance. If there is any relationship between the two variables then, given this specification, the direction of causation must run from budget deficits to changes in national income.
The simplest way to comprehend what regression analysis is all about is to plot two series against each other on a graph, with the values of the dependent variable (Y) shown on the ordinate (Y-axis) and the values of the independent variable (X) shown on the abscissa (X-axis). What regression analysis does is to fit a line through the data that minimises the sum of the squared deviations of a given point from the regression line; hence the alternative descriptor ordinary least squares or OLS. The standard output of regression analysis consists of two parameters, a measure of the variability of the two parameter estimates, a goodness-of-fit statistic and a measure of how tightly individual data points cluster around the regression line. Regression analysis is used to quantify the relationship between two variables that are of interest to the researcher. We might, for example, hypothesise that earnings are an increasing function of education: the more years of schooling you have, the higher on average we expect your income to be. Now if we plot the data on the two series collected for a large sample of people randomly chosen, the resulting scatter of points should rise from the origin to the northeast corner of the graph. This positive association could be rationalised in two ways: (1) higher productivity (and thus higher income) goes hand in hand with higher education, and (2) people require compensation for investing time in studying, given they have the alternative of earning income or spending time with their families or friends (leisure). Having plotted the data for a large sample of individuals, we would now like to estimate precisely how much additional income one might expect to earn by increasing their number of years of schooling. 
The relationship will not be perfect, as other factors that could affect earnings will have been omitted from our bi-variate analysis; these omitted influences are typically described as “noise.” Thus earnings are a function of years of schooling (education) and noise. Regression analysis produces a line that describes the relationship between the two series by minimising the sum of squared errors, that is, the sum of the squared differences between the actual and predicted values of the dependent variable (earnings in our example). This criterion has several important advantages over other possible “fit” criteria. For one thing, it is easy to employ computationally: “When one expresses the sum of squares mathematically and employs calculus techniques to ascertain the value of (the regression coefficients) a and b that minimise it, one obtains expressions for a and b that are easy to evaluate with a computer using only the observed values of education and earnings in the data sample.” And for another, “it also has attractive statistical properties under plausible assumptions about the error term”[12], namely, the resulting estimators will be unbiased (meaning that, on average across repeated samples, the estimates equal the ‘true’ values for the population from which the sample was drawn), consistent (the estimator tends to converge to the true population parameter as the number of observations increases) and efficient (the estimator has the lowest variance).
[12] Sykes, “An Introduction to Regression Analysis,” Inaugural Coase Lecture (published as Chicago Working Papers in Law & Economics, No. 20: 1993).
The first regression output, the intercept, is the point at which the estimated regression line cuts the Y-axis; the second is the slope of the regression line. The intercept can be interpreted in one of two ways. It is the value of the dependent variable when the value of X is zero. Alternatively, and following directly from the way the parameter is defined, it is the difference between the average value of Y and the slope-adjusted expected value of X. The slope measures both the direction and the magnitude of the relation. If the two series are positively correlated, the slope coefficient will have a positive sign; if the series are inversely related, the slope coefficient will be negative. The magnitude of the slope coefficient can be interpreted in the following way: a unit change in the independent (X) variable causes the dependent (Y) variable to change by the amount b. As the formula given in Exhibit 2.18 indicates, the slope of the regression and the degree of correlation between the two variables are closely linked. Each of these coefficient estimates has an associated standard error. If we make the assumption that the estimated intercept and slope coefficients are normally distributed, the parameter estimates can be combined with the associated standard errors to obtain a t-statistic. The t-statistic measures whether the relationship is statistically significant in the sense defined above; that is, the likelihood that the coefficient of interest differs reliably from zero (or from some other value). While there are tables that provide critical values of t (thresholds that must be met or surpassed to reject the null hypothesis), broadly speaking, for regressions estimated using 120 or more observations, t > 1.66 (2.36) indicates that the coefficient differs from zero at the 5 (1) per cent level on a one-tailed test; that is, if the true coefficient were zero, an estimate this far above zero would arise by chance only about 5 (1) times out of 100.
For smaller samples, the t-statistic has to be larger to indicate statistical significance. The goodness of fit measure is known as the coefficient of determination, and is designated as R². The coefficient of determination measures the proportion of the variation in the Y variable that is explained by the X variable. R² is a function of the correlation between the two variables, but unlike the correlation coefficient, R² is bounded by zero and one. If the X variable explains none of the variation in the Y variable (the two series are unrelated), R² will equal zero; the closer the R² is to one, the greater is the strength of the relationship between the two variables, regardless of whether it is positive or negative. Most economic or financial relationships are dependent upon more than one explanatory variable. For example, we might expect the debt/total capital ratio to be a function of the volatility of the firm’s cash flow, the level of the firm’s outstanding debt and the company’s tax rate. In which case, we will want to know both the separate and combined impact of each of these variables on the debt ratio.
Exhibit 19 Regression Outputs and Their Interpretation
Measure | Calculation | Interpretation
OLS Regression | Y = a + bX | A linear relationship between the independent variable X and the dependent variable Y.
Slope of the Regression (b) | σYX/σX² | Measures the change in Y given a unit change in X.
Regression Intercept (a) | μY – b(μX) | The value of Y when X equals zero.
R² of the Regression | ρXY² = b²σX²/σY² | Measures the proportion of the variation in Y that is explained by X.
Standard Error of the Intercept (SEa) | | A measure of the spread around the intercept term. Used to assess the probability that the estimate does not equal zero [t = a/SEa > t(0.95)].
Standard Error of the Slope (SEb) | | A measure of the spread around the estimated slope coefficient. Used to assess the probability that the estimated coefficient does not equal zero [t = b/SEb > t(0.95)].
Multiple Regression | Y = a + b1X1 + b2X2 + b3X3 + b4X4 | Allows for a relationship between the dependent variable and more than one independent variable.
Adjusted R² | R² – [(k – 1)/(n – k)](1 – R²) | Corrects for the tendency for the R² to rise as the number of independent variables increases.
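The measures in the exhibit can be computed directly from the data. The sketch below is illustrative rather than definitive: the schooling and earnings figures are hypothetical, the standard-error formulas are the usual textbook ones for a two-variable regression (the exhibit does not show them), and the adjusted R² function assumes that k counts the estimated coefficients including the intercept.

```python
import math


def simple_ols(x, y):
    """Fit Y = a + bX by ordinary least squares and return the key outputs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sxy / sxx                       # slope: covariance over variance of X
    a = mean_y - b * mean_x             # intercept: mean of Y minus b * mean of X
    r_squared = sxy ** 2 / (sxx * syy)  # squared correlation of X and Y

    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    se_b = math.sqrt(residual_variance / sxx)
    se_a = math.sqrt(residual_variance * (1.0 / n + mean_x ** 2 / sxx))

    return {"a": a, "b": b, "r_squared": r_squared,
            "se_a": se_a, "se_b": se_b,
            "t_a": a / se_a, "t_b": b / se_b}


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared, assuming k counts coefficients including the intercept."""
    return r_squared - (k - 1) / (n - k) * (1 - r_squared)


# Hypothetical sample: years of schooling and earnings (in thousands).
schooling = [8, 10, 12, 14, 16]
earnings = [20, 24, 31, 33, 42]

results = simple_ols(schooling, earnings)
for name, value in results.items():
    print(name, round(value, 4))
print("adjusted R2:", round(adjusted_r_squared(results["r_squared"], 5, 2), 4))
```

With these made-up figures the slope is 2.65, meaning each additional year of schooling is associated with 2.65 thousand of extra earnings, and the intercept is −1.8. With only five observations, the t-statistics would need to be compared with small-sample critical values rather than the large-sample thresholds.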
The coefficient of multiple determination measures goodness-of-fit,[13] the estimated regression coefficients provide a measure of the direction and strength of the relationship between each of the independent variables and (holding the influence of the other independent variables constant) the dependent variable, and each regression coefficient can be tested using the t-statistic to determine whether it differs significantly from zero. In this form, the analysis is known as a multiple regression.
To summarise: there are five key steps in the successful implementation and use of regression analysis:
1. Specify the variables in the model and the exact form of the relationship between them.
• In most instances a linear approximation will do; in others, more complex relationships may have to be employed. Inspection of a plot of the dependent variable should clarify the appropriate functional form.
[13] There is a tendency for the R² to increase as the number of independent variables included in the analysis increases. To counter this tendency, the R² is corrected using the following ratio: (k – 1)/(n – k), where k is the number of independent variables and n is the number of observations included in the analysis. The result is known as the adjusted R², and is usually denoted with a bar above the statistic.
2. Collect data on the variables in the analysis.
• Data are derived from multiple sources ranging from government statistics to corporate annual reports. There are numerous economic and financial databases available that now greatly simplify data collection.
3. Estimate the parameters of the model.
• Various regression packages are available; the one favoured by students is part of the EXCEL package.
4. Test statistically the utility of the model developed, and verify whether the assumptions of the linear regression model have been satisfied.
• The diagnostic tests used to establish whether the assumptions of the regression model have been met are typically a standard feature of the output; where the assumptions are violated, some regression packages apply the appropriate corrections automatically.
5. Use the model for prediction.
• Predictive accuracy is tested either (1) in-sample, that is, estimating the values of the dependent variable given the values of the independent variables used in the analysis; or (2) out of sample, where the model is used to predict the values of a withheld sample of data. In either case, the values of the independent variables are set at their actual levels. This is not a very stringent test of the model’s forecasting ability, as in most instances the input variables are estimates, not known quantities. In reality, it is the combination of the model and the forecaster’s skill that determines the usefulness of the final model for predictive purposes.
6.2.e Assumptions of the Regression Model
To ensure that the results of the analysis are correctly interpreted, it is essential that all of the assumptions underpinning the model are verified; otherwise, the results obtained may not stand up under closer scrutiny, that is, inferences may not be valid and generalisable. One of the most important assumptions concerns the residuals εi: the εi represent the difference between the regression predictions and the actual data.
Where the assumptions concerning the residuals are verified, ordinary least squares provides the best estimates of population coefficients. But what happens when these assumptions are violated? More to the point, we need to know how to recognise when they are violated, assess the implications of the violation, and understand the techniques that can be used to correct violations of these assumptions.
Assumptions of the Multiple Linear Regression Model
The "ideal" conditions for estimation and inference in regression analysis are:
1. The expected value of the disturbances is zero: E(εi) = 0. The regression line passes through the conditional means of Y given X values. This also implies that Y has a linear relationship with the explanatory variables.
2. The disturbances, εi, have a constant variance equal to σ²ε.
3. The disturbances, εi, are normally distributed.
4. The disturbances, εi, are independent.
5. The explanatory variables are not highly correlated with each other.
The Regression Residuals
As noted above, we can describe a sample regression as: ^Yi = ^β0 + ^β1X1i + ^β2X2i + … + ^βKXKi, where ^Yi is the regression's predicted value of the ith observation of the dependent variable. If we denote Yi to be the actual ith observation of the dependent variable, then the term Yi – ^Yi represents the difference between the actual and predicted values of the dependent variable. This is called the residual for observation i and is written as ^εi = Yi – ^Yi. These are sample disturbances which can be used to approximate the population disturbances εi. These residuals can be used to examine assumptions concerning the population disturbances. After any regression estimation, an analysis should be undertaken to assess whether the model assumptions are verified or not. The residuals can be assessed graphically through what is known as a residual plot, which is nothing more than a scatter plot of the residuals. Graphical techniques often involve personal judgement about whether violations are occurring; when you look at a residual plot, a "good" regression will generate the following three properties:
(a) The average of the residuals will equal zero, which is simply a result of the least squares solution, which forces this to be true.
(b) If conditions 1 through 3, above, hold, then the residuals should be randomly distributed about their mean (zero); in other words, there should be no systematic pattern evident in the residuals.
(c) If conditions 1 through 4 noted above hold, then the residuals should look like random numbers drawn from a normal distribution.
Remember: the residuals represent the errors of your regression, and the errors should be totally random. When using residual plots, the following should be used:
• Plot the residuals against each explanatory variable.
• Plot the residuals against the predicted values.
• If the data are a time series, the residuals should also be plotted against time.
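The quantities needed for the residual plots listed above can be assembled with a short Python sketch. The size and value figures are hypothetical, and the helper names are illustrative; the resulting rows can be passed to any charting tool to draw residuals against the explanatory variable or against the predicted values.

```python
def fit_line(x, y):
    """Return the least-squares intercept and slope for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_table(x, y):
    """Return one (x, predicted value, residual) row per observation."""
    intercept, slope = fit_line(x, y)
    rows = []
    for xi, yi in zip(x, y):
        predicted = intercept + slope * xi
        rows.append((xi, predicted, yi - predicted))
    return rows


# Hypothetical data (illustrative only).
size = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
value = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

rows = residual_table(size, value)
residuals = [row[2] for row in rows]
for xi, predicted, residual in rows:
    print(xi, round(predicted, 3), round(residual, 3))
print("mean residual:", round(sum(residuals) / len(residuals), 10))
```

The mean residual is zero up to rounding error, which illustrates property (a): it is forced by the least-squares solution and so cannot be used as evidence that the model is well specified. The informative checks are the patterns in the plots.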
Each of these plots can be used to find violations in different assumptions. The following chart depicts a residual plot of the residuals of a regression of size versus value. The figure plots the residuals against the explanatory variable size.
The residuals appear to be fairly random and also fairly normal. If anything, there appears to be a slight tendency for the residuals to decrease as size increases. In reality, there is no trend in the residuals (the vertical line shown in the chart). This is an example of a residual plot where no assumptions appear to be violated.
Consider, by contrast, the following chart where a violation is probably occurring.
Notice the hump-shaped pattern in the residuals; the residuals are no longer random. In the previous two figures we used the actual residuals in our residual plots. If you are using SPSS, the software allows you to save the standardised residuals: these are simply the residuals divided through by their standard deviation. The use of standardised residuals is most helpful when assessing whether the residuals are normally distributed. The following figure plots the standardised residuals for the real estate regression.
We know that if something is distributed normally, then nearly all of the values (about 99.7 per cent) should lie within three standard deviations of the mean and about 95 per cent of the values should lie within two standard deviations of the mean. In the above chart only 2 residuals are beyond three standard deviations from the mean and only three are outside two standard deviations from the mean. These residuals appear to be normally distributed.
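The same check can be done by counting rather than by eye. The sketch below standardises a set of residuals and reports the share lying within one, two and three standard deviations. The residuals are simulated from a normal distribution with a fixed seed purely for illustration; with real regression output, the saved residuals would be used instead.

```python
import random
import statistics


def standardise(residuals):
    """Centre the residuals and divide by their sample standard deviation.

    OLS residuals already average zero, so centring changes nothing there;
    it is included because the simulated values are only roughly centred.
    """
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    return [(e - mean) / sd for e in residuals]


def share_within(z_values, k):
    """Fraction of standardised residuals within k standard deviations of zero."""
    return sum(1 for z in z_values if abs(z) <= k) / len(z_values)


# Simulated residuals (an assumption for illustration, not real output).
rng = random.Random(42)
simulated_residuals = [rng.gauss(0.0, 3.0) for _ in range(1000)]

z_values = standardise(simulated_residuals)
for k in (1, 2, 3):
    print("within", k, "sd:", round(share_within(z_values, k), 3))
```

For normally distributed residuals the three shares should be close to 0.68, 0.95 and 0.997. Shares well below these figures, particularly at two and three standard deviations, suggest heavy tails or outliers.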
Is the Relationship Linear?
Using Plots to Assess the Linearity Assumption
The first assumption of a regression model is that there is a linear relationship between the dependent variable and the explanatory variables. We can assess this assumption by looking at a residual plot. The residual plot shown below alerts us to a violation of this assumption, as the residuals have a clear quadratic pattern. This systematic pattern in the residuals suggests that the relationship between the explanatory variable and the dependent variable is not linear, but is instead possibly curvilinear.
Corrections for Violations of the Linearity Assumption
Fixing this type of violation is not always obvious. A violation simply means that there is a curvilinear relationship between x and y. To correct this violation we need to perform a curvilinear transformation. The most common technique is to try a transformation and look at the residual plot again. If the violation has been corrected, proceed no further; otherwise, continue trying other curvilinear relationships. In the example noted above, there appears to be a quadratic pattern to the residuals, so we should try a polynomial transformation of degree 2. After performing this transformation, the residuals look like the following (residual plots against x and x²).
We see no systematic pattern between the residuals and x or x². It appears we have corrected the violation of the linearity assumption.
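The effect of adding a squared term can be reproduced with a small sketch. The data are hypothetical and deliberately constructed to follow an exact quadratic, so the contrast is stark; the solver is a plain Gaussian elimination on the normal equations, written out so that no external library is needed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (aug[i][n] - known) / aug[i][i]
    return beta


def least_squares(columns, y):
    """OLS coefficients; `columns` must include a column of ones for the intercept."""
    k = len(columns)
    xtx = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * yi for a, yi in zip(columns[i], y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def residuals_of(columns, y, beta):
    """Actual minus predicted value for every observation."""
    return [yi - sum(b * col[i] for b, col in zip(beta, columns))
            for i, yi in enumerate(y)]


# Hypothetical data following y = 2 + 0.5 * x^2 exactly (illustrative only).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 + 0.5 * xi ** 2 for xi in x]
ones = [1.0] * len(x)
x_squared = [xi ** 2 for xi in x]

linear_beta = least_squares([ones, x], y)
linear_residuals = residuals_of([ones, x], y, linear_beta)
quadratic_beta = least_squares([ones, x, x_squared], y)
quadratic_residuals = residuals_of([ones, x, x_squared], y, quadratic_beta)

print("linear residuals:   ", [round(e, 3) for e in linear_residuals])
print("quadratic residuals:", [round(e, 3) for e in quadratic_residuals])
```

The straight-line fit leaves residuals that are positive at both ends and negative in the middle, which is the systematic curved pattern a residual plot would reveal. Once x² is included as an additional explanatory variable, the residuals vanish. Real data would leave random scatter rather than exact zeros.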
Is the Variance Around the Regression Line Constant?
Using Plots to Assess the Assumption of Constant Variance
The second assumption of the regression model is that the residuals have constant variance equal to σ²ε. In a plot of the residuals against any explanatory variable, the residuals should appear random with constant variance. If there appears to be a change in the variance, then the assumption of constant variance in the residuals may be violated. In a residual plot, non-constant variance (commonly called heteroskedasticity) is indicated by a "cone-shaped" pattern in the residuals as shown in the chart below.
Note that as the explanatory variable gets larger, the variance in the residuals also gets larger. This residual plot shows non-constant variance in the residuals. When we have non-constant variance, the use of regression suffers two major drawbacks.
• The estimates of the regression coefficients are no longer minimum variance. The standard errors of the regression coefficients are larger than they should be.
• The estimates of the standard errors of the coefficients are biased. Therefore, hypothesis testing about population coefficients may lead to misleading results.
There are several tests for non-constant variance. One particular test has proven to be very powerful in detecting non-constant variance. Here is the structure for this hypothesis test:
Null Hypothesis: Variance of the residuals is constant.
Alternative Hypothesis: Variance of the residuals is not constant.
The test statistic for this test is
Q = [6n/(n² – 1)]^(1/2) × [h – (n + 1)/2]
where n is the sample size, and h is
h = ∑ i·^εi² / ∑ ^εi²
where ^εi is the residual from the ith observation in the regression equation. The decision rule for this test is
Reject if: Q > zα
Accept if: Q ≤ zα
where α is the level of significance for the test and zα is chosen from the standard normal distribution with an upper tail area equal to α. The test assumes that all of the observations in the regression have been ordered in increasing variance. Typically, it is assumed that the variance in the residuals increases as the value of one of the explanatory variables increases. Thus the data need to be arranged in increasing order with regard to this explanatory variable.
Corrections for Non-Constant Variance
There are several ways to correct non-constant variance. All of these corrections involve transforming the dependent variable, y. This will sometimes make regression results difficult to interpret. Here are three of the most common correction strategies:
1. In place of the dependent variable, y, use the variable ln(y). The natural log is less variable and may correct the problem. Remember the natural log only works on positive values. If your dependent variable has negative values you cannot perform this transformation.
2. In place of the dependent variable, y, use the variable √y. The square root is less variable and may correct the problem. Remember the square root only works on non-negative values. If your dependent variable has negative values you cannot perform this transformation. This works best when your dependent variable is a count variable.
3. If the variance in the residuals is believed to be proportional to some function of one of the explanatory variables, then that explanatory variable may be used to stabilize the regression. Suppose we believe that σi² = σ²xi², that is, the variance of the residuals is related to xi. Then we simply divide the entire regression through by xi. This gives us the following regression:
Yi/xi = β0(1/xi) + β1 + εi’
where εi’ = εi/xi are disturbances with constant variance. Note that the roles of β0 and β1 have been reversed in this model: β1 is now the intercept and β0 is the coefficient on 1/xi. This transformation does not work if xi = 0.
Remember: when you transform your dependent variable, you are actually estimating a different variable. To get the original forecasts for y back, you are going to have to use the estimates from your transformed model and then back out the corresponding y. Consider the following model: Y = β0 + β1x. Suppose this model results in non-constant variance in the residuals. If we choose option 1 to correct this violation, the model that will be run is ln(Y) = β0 + β1x. Suppose you get the following results: ln(Y) = 8.74 + 0.00537x. What is the estimated value of y when x = 300? We first use our transformed model to get:
ln(Y) = 8.74 + 0.00537(300) = 10.351 Then we have to exponentiate this to get back Y: Y = e10.351 = 31.288 To make sure you have fixed the non-constant variance problem you should always get a residual plot of the transformed model and make sure that the "cone" shape has been removed. Are the Disturbances Normally Distributed? Using Plots to Assess the Assumption of Normality The residual plot of the standardized residuals versus the predicted values can be use to assess if the residuals are normally distributed. Remember, to normally distributed about 68 per cent of the values should be within one standard deviation of the mean, and 95 per cent of the values should be within two standard deviations of the mean. If you find many standardized residuals that are more than 2 standard deviations from the mean, then the residuals may not be normal. The easiest way to check the normality of the residuals, if you are using SPSS, is to save the standardized residuals from your regression, and construct a histogram of the residuals. Under the histogram option in SPSS, check the box "Display Normal Curve," then click OK. The histogram will show the distribution of the residuals with a superimposed normal curve: if the residuals are normally distributed, the histogram should follow the normal curve. The following histogram shows residuals which are not normally distributed.
[Figure: histogram of the standardised residuals with a superimposed normal curve. Std. Dev. = 0.99, Mean = 0.00, N = 35.]

We see that the residuals do not follow the normal curve very well.
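The 68 per cent and 95 per cent benchmarks can also be checked numerically. The following is a minimal sketch only: the standardised residuals are hypothetical values invented for illustration, and the helper name share_within is an assumption, not an SPSS function.

```python
def share_within(z_scores, k):
    """Fraction of standardised residuals whose absolute value is at most k."""
    return sum(1 for z in z_scores if abs(z) <= k) / len(z_scores)

# Hypothetical standardised residuals saved from a regression (illustrative only).
z = [-1.8, -1.2, -0.9, -0.7, -0.5, -0.4, -0.2, -0.1, 0.0, 0.1,
     0.2, 0.3, 0.4, 0.6, 0.7, 0.9, 1.1, 1.3, 1.6, 3.4]

within_one = share_within(z, 1)   # compare with roughly 0.68 under normality
within_two = share_within(z, 2)   # compare with roughly 0.95 under normality
beyond_two = [v for v in z if abs(v) > 2]   # candidates for closer inspection

print(within_one, within_two, beyond_two)
```

Shares far from the benchmarks, or several values beyond two standard deviations, would suggest looking more closely at the histogram before drawing inferences from a small sample.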
Corrections for Normality

The assumption of normally distributed residuals is not always necessary to estimate a regression. It is necessary when making inferences from small samples. In large samples the normality assumption is not important, because the Central Limit Theorem implies that the sampling distribution of the estimators is approximately normal.

When the normality assumption is violated and we have a small sample, we need ways to fix the violation. Some of these corrections involve transformations of the dependent variable called Box-Cox transformations. These are complex corrections and are not covered in this class (if interested you should sign up for Econometrics). In truth this is typically the last violation we worry about with the residuals. Sometimes other violations make the residuals appear non-normal when they really are normal. In the above histogram, there is clearly an outlier which needs to be dealt with. Once this outlier has been corrected, the residuals of the regression may in fact be normal.

The most obvious way to avoid this violation is to have a larger sample. Once the sample is large enough, the assumption of normality in the residuals does not really matter.

Extreme Values?

The objective of the least-squares method is to minimise the error sum of squares, SSE. In doing so, the method seeks to avoid large distances between the data yi and the regression prediction ŷi. If any large distance does exist, the regression line can be pulled substantially toward that influential point in order to reduce the distance. From this we can see one downfall of the least-squares approach: extreme points (outliers) are given proportionally the most weight in the construction of the regression equation.

When a sample data point has a value which is much different from the other values of the data, it is called an outlier. Outliers can be both good and bad. Outliers do provide some information. They identify the total possible variability in the data. 
This is a good thing. On the other hand, the presence of outliers can cause misleading and confusing regression results. Either way, it is important to recognise the presence of an outlier. The following charts show how an outlier can affect a regression. The top figure displays a regression plot from some sample data that has no outliers. The bottom figure displays a regression plot of the same data in which one observation has been changed to an outlier.
Note that the one outlier has an effect on the regression line. The regression equation has been displayed for both samples. In the bottom figure, the outlier has shifted the entire regression upward, the intercept term having risen from 1.18 to 1.70. On the other hand, the outlier has had no effect on the slope of the regression. Why? Because the outlier is in the middle of the sample data. The outlier also causes a big change in the quality of the regression, with the R² dropping from 0.97 to 0.25. Instead of explaining 97 per cent of the variance in Y, the regression now explains only about one quarter.

In this example, the outlier had no effect on the slope of the regression line, though this will not always be the case. In short, the position of the outlier in the data matters. Consider the following situation. Suppose that instead of having the outlier at the middle data value, it is associated with a high X value.
This time the outlier has produced a change in both the intercept and the slope of the regression line. Outliers which cause this are called leverage points (think of a lever twisting the line).

Identifying Outliers

Now that we have seen the effects of outliers, we need a method for identifying them. The preferred method makes use of standardised residuals. Remember, standardised residuals, εis, are simply the residuals of a regression, εi, divided by the standard deviation of the residuals, σε. The variance of the standardised residuals is 1. Why? If the residuals are normal with mean zero, dividing each residual by its standard deviation makes the standardised residuals distributed as a standard normal.

To identify an outlier, we simply look for standardised residuals which are large in absolute value. Given that these standardised residuals are distributed as a standard normal, we should find that only about 5 per cent of them are larger than two in absolute value: any observation that has a standardised residual larger than two in absolute value can be classified as an outlier.

Another measure often used to identify outliers is the studentised residual. To compute the studentised residuals, each residual εi is again standardised, but with a different standard deviation: the standard deviation σεi for the ith observation is calculated with the ith observation removed. If the ith observation is unusual, this will be reflected in the residual and not in σεi. Thus, any unusual observations should be easy to locate. Studentised residuals follow a t-distribution with n − K − 1 degrees of freedom. A simple t-test can be performed to determine whether a specific observation is an outlier.

What to do with Unusual Observations?

Now that we know how to identify outliers, we need to learn what to do with these observations. Remember, not every unusual observation is an outlier. 
Sometimes other violations, such as non-constant variance or non-linearity, may make an observation look like an outlier when it really isn't. Also, even if an observation is an outlier, you will not always want simply to delete it. The information gained through an outlier may be important. Here are some basic rules for dealing with outliers.

• If the data value is simply incorrect (a typo), then this observation should be dropped from the analysis. If possible, the correct data value should be found and included in the data set.

• If the data value is correct and is just unusual, then the choice of what to do with this observation is uncertain. It depends on why the value is unusual and how important this value is to the analysis.

• If the outlier is in a range of the data that is beyond the main focus of the analysis, then dropping the observation is the appropriate thing to do. However, this is purely a judgement call, and should be avoided if possible.
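The effect of an outlier in the middle of the data, and the two-standard-deviation rule for flagging it, can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the nine data points are hypothetical, the function names are invented for illustration, and the residuals are divided by a single residual standard deviation (n − 2 degrees of freedom) rather than by the observation-specific values some packages report.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standardised_residuals(x, y):
    """Residuals divided by their estimated standard deviation (n - 2 d.f.)."""
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (len(x) - 2))
    return [e / s for e in resid]

# Hypothetical data: nine points lying exactly on y = 1 + 2x.
x = list(range(1, 10))
y_clean = [1 + 2 * xi for xi in x]

# Same data with the middle observation (x = 5) turned into an outlier.
y_outlier = list(y_clean)
y_outlier[4] += 20

b0_clean, b1_clean = fit_line(x, y_clean)
b0_out, b1_out = fit_line(x, y_outlier)

# Flag observations whose standardised residual exceeds two in absolute value.
z = standardised_residuals(x, y_outlier)
flagged = [xi for xi, zi in zip(x, z) if abs(zi) > 2]

print(b0_clean, b1_clean, b0_out, b1_out, flagged)
```

Because the distorted point sits at the mean of x, the fitted slope is unchanged and only the intercept moves; placing the same distortion at x = 9 instead would change both.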
Are the Residuals Independent?

Autocorrelation

One violation that frequently occurs in time-series analysis is a lack of independence in the residuals. Disturbances in adjacent time periods are often correlated. If your regression has a negative residual in time period t, there is a good chance that you will see another negative residual in time period t + 1. The relationship between residuals is represented by:

εi = ρεi−1 + ui

where εi is the residual in period i, εi−1 is the residual in period i − 1, ρ is the serial correlation coefficient or autocorrelation coefficient, and ui represents a disturbance that is independent over time. Our regression model is written as

yi = β0 + β1xi + εi, where εi = ρεi−1 + ui.

When residuals display autocorrelation, the estimated standard errors of the coefficients (the βs) are biased (with positive autocorrelation they are typically understated). As a result, the confidence interval estimates and hypothesis tests do not generate the expected results.

The autocorrelation coefficient, ρ, determines the strength of the relationship between residuals over time periods. Like any other correlation coefficient, the autocorrelation coefficient can take on any value between −1 and 1. Values close to −1 or 1 indicate a strong relationship over time. Values close to 0 indicate independence in the residuals. To test for autocorrelation, we perform a Durbin-Watson test.

A Test for First-Order Autocorrelation

The Durbin-Watson test is a widely used test for autocorrelation. In most applications, if autocorrelation is present, it will be positive autocorrelation (ρ > 0). For this reason, the Durbin-Watson test is set up as a hypothesis test for positive autocorrelation:

Null Hypothesis: ρ = 0
Alternative Hypothesis: ρ > 0

where ρ is the autocorrelation coefficient. If the null hypothesis is accepted, then autocorrelation is deemed not to be a problem. If the null is rejected, then we have evidence that autocorrelation exists and we may need to correct it.
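To make the idea of correlated residuals concrete, the sketch below estimates ρ from a short residual series and computes the Durbin-Watson statistic d, taken here as the sum of squared differences between successive residuals divided by the sum of squared residuals. The residual values are hypothetical and the function names are assumptions made for illustration; in practice the statistic would be read from SPSS output.

```python
def lag1_autocorrelation(resid):
    """Estimate rho as sum(e_i * e_{i-1}) / sum(e_i^2) for zero-mean residuals."""
    num = sum(resid[i] * resid[i - 1] for i in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical residuals showing long runs of the same sign.
resid = [1.0, 0.8, 0.5, 0.1, -0.4, -0.9, -1.1, -0.8, -0.3, 0.2, 0.4, 0.5]

r = lag1_autocorrelation(resid)
d = durbin_watson(resid)
print(round(r, 3), round(d, 3))
```

A value of d well below 2, together with a clearly positive r, is the pattern associated with positive autocorrelation; whether it is statistically significant still depends on the tabulated critical values.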
The Durbin-Watson test statistic is computed as
$$d = \frac{\sum_{i=2}^{n}\left(\hat{\varepsilon}_i - \hat{\varepsilon}_{i-1}\right)^{2}}{\sum_{i=1}^{n}\hat{\varepsilon}_i^{2}}$$

where ε̂i is the residual for observation i and ε̂i−1 is the residual for observation i − 1. When the residuals are independent, d is approximately equal to 2. When the residuals are positively correlated, d < 2. The decision rule for this test takes the form:

Reject Null if: d < dL(α; n; K)
Accept Null if: d > dU(α; n; K)

where dL(α; n; K) and dU(α; n; K) are selected from Table B.7 in the back of your textbook. These values depend on the level of significance, α; the size of the sample, n; and the number of explanatory variables in the regression, K.

In the Durbin-Watson test there is a range of values for d where the test is inconclusive. This range is:

dL(α; n; K) ≤ d ≤ dU(α; n; K)

If the test statistic falls in this range, we are unsure whether autocorrelation is a problem, and the best approach is to go ahead and correct for autocorrelation. If the results of the regression change, then the correction should have been performed and autocorrelation was a problem. If the results of the regression do not change very much, then the correction did not need to be performed and autocorrelation was not a problem. SPSS can calculate the Durbin-Watson statistic; it is located under the Statistics option in the regression dialog box.

Correction for First-Order Autocorrelation

When the residuals are autocorrelated, this typically means that some important variable has been omitted from the regression. One way to fix autocorrelation is to find this variable and include it in the regression.

Another way to correct first-order autocorrelation (correlation across adjacent time periods) is to transform the original time-series variables so that the regression uses independent disturbances. Let's look at this transformation. We start with our original regression,

Yi = β0 + β1Xi + εi

where the residual, εi, has first-order autocorrelation
εi = ρεi−1 + ui

To remove the autocorrelation, the following transformations are applied to both the dependent and explanatory variables:

Y*i = Yi − ρYi−1
X*i = Xi − ρXi−1

for observations 2 through n. For the first observation we use the following transformations:

Y*1 = √(1 − ρ²) Y1
X*1 = √(1 − ρ²) X1

We then use the new regression Y*i = β0* + β1X*i + ui, where β0* = β0(1 − ρ) is the intercept of the transformed equation for observations 2 through n, and the disturbances ui are independent.

The third option to correct autocorrelation is to add a lagged value of the dependent variable as an explanatory variable. In this case your regression would take the form

Yi = β0 + β1Xi + β2Yi−1 + εi

This option works well when we have large samples.

Test for Autocorrelation

When we use lagged values of the dependent variable as an explanatory variable, the Durbin-Watson test for autocorrelation is no longer valid. To test for autocorrelation we have to switch to Durbin's h-test. The hypotheses for this test are:

Null Hypothesis: ρ = 0
Alternative Hypothesis: ρ > 0

where ρ is the autocorrelation coefficient. If the null hypothesis is accepted, then autocorrelation is deemed not to be a problem. If the null is rejected, then we have evidence that autocorrelation exists and we may need to correct it. The test statistic is computed as
$$h = r\sqrt{\frac{n}{1 - n\,s_{b}^{2}}}$$

where n is the sample size, r is an estimate of the autocorrelation coefficient ρ, and s_b² is the estimated variance of the regression coefficient of the lagged dependent variable (Yi−1). The decision rules for this test are:

Reject Null if: h > zα
Accept Null if: h ≤ zα

where zα is taken from the standard normal distribution. A quick estimate of ρ is calculated as r = 1 − d/2, where d is the Durbin-Watson statistic.

Is Multicollinearity a Problem?

Consequences of Multicollinearity

For a regression with K explanatory variables, it is hoped that the explanatory variables are highly correlated with the dependent variable. At the same time, however, it is also hoped that the explanatory variables are NOT highly correlated with each other. When the explanatory variables are correlated with each other we have a problem of multicollinearity. The seriousness of this problem depends on the degree of correlation between the explanatory variables. High correlations may result in highly unstable least-squares estimates of the regression coefficients. The presence of severe multicollinearity brings about the following problems:

• The standard errors of the regression coefficients become unusually large. As a result, individual t-tests report values that are too small. We may conclude that coefficients are zero when that in fact is not the case.

• The regression coefficients become unstable; the signs of the regression coefficients may even get switched around. Dropping one explanatory variable may lead to large changes in the coefficients for all the other variables in the analysis.
Detecting Multicollinearity

There are several ways to detect multicollinearity.

1. Compute the correlations between the explanatory variables. Because multicollinearity exists when the explanatory variables are highly correlated, calculating these correlations should identify the problem. The cutoff for high correlation is generally thought to be somewhere around 0.50 in absolute value. This is a simple rule of thumb. To calculate correlations in SPSS, simply choose Correlate: Bivariate in the ‘Analyse’ menu.
There is a serious limitation in this approach: you can only see whether the explanatory variables are directly correlated. If you have three explanatory variables X1, X2, X3, correlations establish the relationship between any two of these explanatory variables at a time. What cannot be captured is the relationship between one explanatory variable and a combination of the other variables. If X1 is highly correlated with X2 + X3, calculating correlations will not capture this effect even though multicollinearity exists.

2. The second method involves close inspection of the regression output. Multicollinearity can also be indicated by a large F statistic accompanied by small t statistics. In this case the F test would say the overall regression is significant, but each individual βi would fail its individual t-test, owing to the inflated standard errors that remove the power from the t-tests.

It should be noted that neither of these methods is foolproof when it comes to establishing the existence of multicollinearity. Consider the case where only some of the t statistics are small but the F statistic is large. It is unclear whether this is an indication that multicollinearity is a problem.

Correction for Multicollinearity

One easy method for correcting multicollinearity is to remove those explanatory variables that are highly correlated with the other explanatory variables. There is, however, an obvious downside to this approach: you are removing all of the information in the dropped variables, which can lead to substantial changes in your regression estimates. There are other, more sophisticated ways to correct multicollinearity, though they are beyond the scope of this Guide; any standard econometrics text can point you in the right direction.

One final point: you do not have to correct multicollinearity if you are using regression analysis for forecasting purposes only. 
Multicollinearity does not limit a regression’s ability to predict, nor does it affect a regression's ability to obtain a good fit (high R²). Correcting multicollinearity matters only if you are using regression for inference and coefficient estimation.

6.2.e Logistic (Logit) Regression

Before leaving the topic, we should note a special form of regression that is widely used in financial analysis, namely, logistic (or logit) regression. In most regression applications both the dependent and independent variables are continuous, that is, they assume a wide range of values. In others, the data are categorical, that is, take on only one of two values (alive/dead, win/lose, bankrupt/viable, defaulted/current, union/non-union, Tory/Labour and so forth). In each case we are interested not so much in predicting the actual value as in establishing the likelihood (probability) that an unclassified observation will fall into one or the other category. In finance, such
research goes back to the late 1960s, when Edward Altman of the Stern School of Business (NYC) first developed his famous Z-Score model of corporate bankruptcy.14 Altman’s model uses financial ratios to sort companies into financially distressed or financially viable categories with a reasonably high degree of accuracy. His model is still widely used, even though critics note that its exclusive dependence on corporate data has two significant drawbacks: (1) corporate data are frequently published with a lag, sometimes as long as a year, which means that the classification prediction is based upon data that may no longer reflect the financial reality of the company, thus biasing the analysis; and (2) it ignores non-corporate data, which may have a significant bearing on the correct classification.15

In his original formulation Altman used a technique known as Discriminant Analysis; in more recent years analysts have come to favour logistic regression over discriminant analysis for a number of reasons, even though both approaches start from a common premise, namely, that the categories of outcome in the dependent variable must be mutually exclusive.

• Logistic regression is much more relaxed and flexible in its assumptions than is discriminant analysis. Unlike the latter, logistic regression does not require the independent variables to be normally distributed, linearly related or of equal variance within each group. The greater flexibility of logistic regression argues strongly in its favour, though it has been noted that ‘when (the) assumptions regarding the distribution of predictors are met, discriminant analysis may be a more powerful and efficient analytic strategy.’16 Even though logistic regression does not have many assumptions, and thus is usable in more instances, it does require a larger sample size: at least fifty cases per independent variable might be required to ensure accurate hypothesis testing. 
A good rule of thumb is to use logistic regression when the dependent variable is categorical and the sample to be used in the analysis is large.
More formally, logistic regression treats the distribution of outcomes in a probabilistic manner, that is, the occurrence of the study phenomenon is evaluated in terms of a probability, which takes on values between zero (no chance of the event occurring) and one (the occurrence is certain). In other words, the outcome of the analysis is the
14 “Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy,” Journal of Finance, 4.
15 These limitations led some analysts to develop Contingent Claims (option) models that make use of all available information; the main problem with this approach is that it produces an estimate of the probability of default but without any clear understanding of the factors that contributed to the estimate. Other financial analysts favour use of a hybrid model that combines observable variables with the unobservable inputs of Contingent Claims models, though it is arguable whether the addition of the latter (subsumed, in principle, in the former) adds anything of real value.
16 Tabachnick and Fidell, Using Multivariate Statistics (HarperCollins, 1996).
likelihood of the event of interest occurring, say, there is a 67 per cent chance that company X will fail within the next year. If the probability of the phenomenon occurring (say, default) is PA and PB is the probability of the absence of the phenomenon, then PA + PB = 1 (that is, PB = 1 − PA), and

PA = exp(ZA)/(1 + exp(ZA)), and
ZA = β0 + β1X1 + β2X2 + … + βnXn + ε,

where the variable ZA is a measure of the total contribution of all the risk factors used in the model and is known as the logit. β0 is called the intercept, and the βi are called the regression coefficients of the risk factors Xi. The intercept is the value of ZA when the risk factors are zero. Each regression coefficient describes the size of the contribution of its risk factor: a positive sign signifies that the risk factor increases the probability of the outcome, while a negative sign indicates that the risk factor decreases the probability of that outcome. A large (small) coefficient means that the risk factor has a strong (little) influence on the probability of the outcome. The greater the value of ZA, the greater the probability that the event will occur; as ZA approaches infinity, PA approaches one, indicating a high probability of the event happening. When, by contrast, ZA approaches negative infinity, PA approaches 0. When ZA = 0, PA = 0.5, meaning there is a 50:50 chance of the event occurring.

One of the standard outputs of logistic regression is the Odds Ratio (OR) associated with each control (independent) variable. The ‘odds’ of an event are defined as the probability of the event occurring divided by the probability of the event not occurring. For example, let the probability of an event occurring be 0.8 (p = 0.8), so that the probability of failure is q = 1 − p = 0.2. The odds of ‘success’ are defined as odds(success) = p/q = 0.8/0.2 = 4, that is, the odds of success are 4 to 1. 
The odds of failure would be q/p = 0.2/0.8 = 0.25; while the result may look odd, all that is being said is that the odds of failure are 1 to 4. In other words, the odds of success and the odds of failure are reciprocals of each other: 1/4 = 0.25 and 1/0.25 = 4.

We need to add one more variable to the equation to be able to compute the Odds Ratio. This is best accomplished by use of another example. Suppose that seven out of ten males are admitted to a business school compared with three out of ten females. The probabilities for a male student are p = 7/10 = 0.7 and q = 1 − p = 1 − 0.7 = 0.3. The same probabilities for females are p = 3/10 = 0.3 and q = 1 − 0.3 = 0.7. We can now use these probabilities to calculate the admission odds for both male and female students: odds(male) = 0.7/0.3 = 2.3333; odds(female) = 0.3/0.7 = 0.4286. The odds ratio for admission is then OR = 2.3333/0.4286 = 5.44, which says that the odds of being admitted if the applicant is male are 5.44 times the odds if the applicant is female.
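The odds calculations above are easy to reproduce. The following is a minimal sketch using the admission probabilities from the example; the function name odds is an assumption made for illustration.

```python
import math

def odds(p):
    """Odds of an event: probability it occurs divided by probability it does not."""
    return p / (1 - p)

# Odds of success and failure are reciprocals (p = 0.8 and p = 0.2).
odds_success = odds(0.8)
odds_failure = odds(0.2)

# Admission example: 7 of 10 males and 3 of 10 females admitted.
odds_male = odds(0.7)
odds_female = odds(0.3)
odds_ratio = odds_male / odds_female

# Natural logarithms of the female odds and of the odds ratio.
log_odds_female = math.log(odds_female)
log_odds_ratio = math.log(odds_ratio)

print(round(odds_male, 4), round(odds_female, 4), round(odds_ratio, 3))
print(round(log_odds_female, 3), round(log_odds_ratio, 4))
```

The last two values are the natural logs of the female odds and of the odds ratio; working on the log scale is what links odds ratios to the coefficients reported by a logistic regression.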
In some statistical packages the logistic command produces results in terms of odds ratios, while the logit command produces results in terms of coefficients. A logit is defined as the natural log of the odds: logit(p) = log(odds) = log(p/q). In fact, there is a direct relationship between the coefficients produced by logit and the odds ratios produced by logistic. Logistic regression is in reality ordinary regression analysis using the logit as the response (dependent) variable: logit(p) = β0 + β1X, or log(p/q) = β0 + β1X. This means that the coefficients in the logistic regression (βi) are in terms of log odds, that is, the coefficient β1 implies that a one-unit change in the independent variable (X) results in a β1-unit change in the log of the odds. The equation log(p/q) = β0 + β1X can be expressed in terms of odds by getting rid of the log; this is done by raising e to the power of both sides of the equation: p/q = e^(β0 + β1X). The end result of these mathematical manipulations is that the odds ratio can be computed by raising e to the power of the logistic coefficient: OR = e^β1.

To make the point clear, let’s revert to the business school example mentioned above, where we sorted students by whether they gained entrance to the business school or not. Assume a logistic equation was estimated and the following result obtained: log(p/(1 − p)) = −0.847 + 1.694596X, where X stands for gender (coded 1 for male and 0 for female). 
According to our previous discussion, OR = e^β1 = e^1.694596 = 5.444, which (as we saw above) indicates that the odds of a male student being admitted to our hypothetical business school are nearly 5.5 times those of a female student.17

To better understand how logistic regression can be used and interpreted, we review the results of a recent article that explored why consumers choose non-traditional finance companies (finance or loan companies, but excluding loans with auto manufacturing companies or mortgage brokers) in preference to traditional lenders (commercial banks, savings banks or savings and loan associations) to obtain credit.18 The data are drawn from the US Survey of Consumer Finances for the year 2004. The classification was based upon whether the individual had borrowed from a financial institution that did (traditional) or did not (non-traditional) accept deposits; the sample consisted of more than 4,500 respondents. The estimated regression is as follows:19

ZA = −1.43 + 0.73X1 + 0.28X2 + 0.10X3 − 0.02X4,
17 The above example was derived from the University of California’s Academic Technology Service.
18 Jeffrey Dew, “Credit Crunched? The Relationship Between Credit Denials and the Use of Alternative Financial Institutions,” Consumer Interests Annual, 54 (122-126).
19 Shown here are only those variables significant at the 0.05 level or better.
where all variables are as defined below. Taken at face value, the single most important determinant of whether a respondent sought financing at a non-traditional financial institution was whether s/he had been denied credit within the past five years. The following table shows how the odds vary with each of the variables included in the analysis. The odds are to be understood as signifying the likelihood, for example, of the respondent having been denied credit compared with those who were not: an individual who had been denied credit had roughly twice the odds of having an account with a non-traditional financial institution as someone whose credit application was not rejected. As expected, the estimated regression coefficients and the odds ratios tell exactly the same story in terms of the importance of the variables.

Variable | Odds ratio
X1 = Denied credit within past five years | 2.08
X2 = 1 if credit is ‘good’, zero if ‘bad’ | 1.19
X3 = Number of individuals in household | 1.11
X4 = Age of respondent | 0.98
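The fitted equation can be turned into a probability for any respondent profile by computing ZA and applying PA = exp(ZA)/(1 + exp(ZA)). The sketch below assumes the rounded coefficients printed in the estimated equation; the respondent profile (aged 40, not denied credit, good credit record, household of three) is hypothetical, and the function and variable names are invented for illustration.

```python
import math

# Coefficients as printed in the estimated equation (rounded values).
COEF = {
    "intercept": -1.43,
    "denied_credit": 0.73,    # X1: 1 if denied credit in past five years
    "good_credit": 0.28,      # X2: 1 if credit is 'good', 0 if 'bad'
    "household_size": 0.10,   # X3: number of individuals in household
    "age": -0.02,             # X4: age of respondent
}

def logit_score(denied_credit, good_credit, household_size, age):
    """Linear predictor Z_A for one respondent."""
    return (COEF["intercept"]
            + COEF["denied_credit"] * denied_credit
            + COEF["good_credit"] * good_credit
            + COEF["household_size"] * household_size
            + COEF["age"] * age)

def probability(z):
    """Logistic transformation P_A = exp(Z_A) / (1 + exp(Z_A))."""
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical respondent: aged 40, not denied credit, good credit, household of 3.
z = logit_score(denied_credit=0, good_credit=1, household_size=3, age=40)
p = probability(z)

# Exponentiating a coefficient gives the corresponding odds ratio.
odds_ratio_denied = math.exp(COEF["denied_credit"])

print(round(z, 2), round(p, 2), round(odds_ratio_denied, 2))
```

Keeping the coefficients in one dictionary makes it easy to see how changing a single characteristic, such as switching denied_credit from 0 to 1, moves the score and hence the probability.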
If we now wished to use the model to determine the likelihood that an individual 23 years old, denied credit in the past five years, with a good credit record and living on his own, has an account with a non-traditional financial institution, we would simply insert these values into the equation:

ZA = −1.43 + 0.73(1) + 0.28(1) + 0.10(0) − 0.02(23) = −0.88,

so that PA = exp(−0.88)/(1 + exp(−0.88)) = 0.29, that is, there is a 29 per cent chance that this individual will have an account with a non-traditional financial institution. Note the role of the credit record: because the coefficient on good credit is positive, had this individual had a poor credit history the probability would have fallen to 24 per cent [PA = exp(−1.16)/(1 + exp(−1.16))].

We can explore further some of the properties of logistic regression by reverting to the example given above concerning the characteristics of failed and viable Jamaican banks. In our previous discussion individual variables were tested pair-wise, that is, the difference in means between two groups was compared and the t-test applied to determine whether the observed difference could have arisen by chance. The main problem with the pair-wise approach is that it neglects possible interactions that may exist among the variables; only after such effects are controlled for will we be able to establish the contribution of individual variables towards explaining the phenomenon under investigation. We noted above that ordinary least squares is ideal where the dependent variable is measured continuously, and logistic regression where the dependent variable takes on either a binomial or multinomial form, as in the case to hand. The logistic analysis takes the following form:

Z = β0 + β1X + β2Y + β3Q,
where X is a vector of bank financial characteristics, Y a vector of other bank-specific characteristics and Q a vector of macroeconomic variables. Included in X would be capital adequacy, asset quality, liquidity and earnings variables; included in Y would be variables that measure the efficiency (inefficiency) of bank management, size, bank risk, audit status and ownership (foreign vs. local); and included in Q would be variables such as real GDP growth. The final version of the model is shown below and includes only those variables that were significant at the 5 per cent level or better; the accompanying table defines the variables and indicates the level of significance for each of the coefficients shown.

Z = 1.01 + 0.1X1 + 0.05Y1 + 1.36Y2 − 10.15Q1.
Variable | Definition | t-statistic
Intercept | | (0.20)
X1: Change in the gross capital/risk assets, lagged two periods | (Long-term debt + equity)/(loans + leases) | (2.1)*
Y1: Management inefficiency, lagged one period | Total operating expense/(net interest revenue + other operating income) | (2.7)*
Y2: Size, lagged two periods | Log of total assets | (2.7)**
Q1: Real GDP growth, lagged three periods | GDP growth in constant 1986 prices | (−2.9)**

Level of significance: * = 0.05; ** = 0.01.
From the fairly long list of potential variables, the only ones shown to bear a statistically significant relationship to bank failure are lagged changes in the capital adequacy and management inefficiency variables, bank size and the growth of real GDP. Real GDP growth is shown to reduce the probability of failure; as the economy improves, so too does the performance of the banking sector as a whole. All other variables, including bank size, are shown to increase the risk of failure. If bank size is taken to indicate an increase in loans and investments, then the expansion appears to generate a subsequent decline in the quality of asset portfolios, contributing to bank failure. The positive association between declining capital adequacy and poor management performance hardly requires comment.

Of equal (or perhaps greater) interest is the extent to which the model was able to classify failed banks correctly. The results indicate a high degree of accuracy, with a correct classification rate of 96 per cent, though the model appears to have a better record identifying viable (97.8 per cent) than non-viable (81.2 per cent) banks. While the model’s ability to classify viable banks accurately remains high, there is a marked decline in its ability to identify failures in advance, with the correct percentage declining to roughly 57 per cent one, two and three years before bankruptcy. In other words, the model is almost as likely to miss as it is to identify correctly a potentially bankrupt bank.

One of the main uses of regression analysis is prediction, using the estimated relationship to forecast future values of the dependent variable. In our discussion of regression analysis we assumed the dependent variable pertained to an economic or financial magnitude, but as we have seen there is no reason to restrict its application solely to those
types of variables. Logistic analysis (and its regression variants) generates estimates of the probability of an outcome falling into one or more mutually exclusive categories, and thus can be used to address a wider set of issues. Most such analyses claim to be quite good in terms of classification accuracy, the reported proportion as we have seen often exceeding 90 per cent. But a stronger test of the predictive power of such models is how well they classify observations that were not part of the data set used to generate the original results. For this reason, many financial economists typically withhold part of the sample, testing model accuracy by solving the equation and comparing forecast with actual outcomes. Of course, in most forecasting situations the actual values of the input variables will not be known; that is, the test assumes that the forecaster knows with certainty the correct values of the model’s independent variables – an extremely unlikely possibility which biases the test procedure in favour of correctly predicting the variable of interest. Bluntly put, this means that the best forecasts will more likely than not combine professional judgement with quantitative rigour, as opposed to relying solely on model output. One of the most interesting demonstrations of this conclusion appeared in an article written a few years ago in the Wall Street Journal Europe to predict which film was likely to win the Oscar for best picture for 2005.20 To that end the editors of the paper invited a statistician to design and build a model that could be used to predict which motion picture had the highest probability of being voted the Best Film of the year. The model’s forecast was then compared with the predictions of two respected Hollywood pundits – one the paper’s own film columnist, the other an independent film reviewer. 
The Journal’s statistical expert estimated and used a model similar to the one described above, whose output likewise assigns a value between zero and one to each of the five films nominated in a given year; a value of one implies that a given film is certain to win an Oscar, while zero implies there is no chance at all of winning. All of the predictions are conditional on a set of factors known or thought to affect the probability of winning. Three variables appear in the basic version of the model: (1) the total number of Oscar nominations racked up by a given film, (2) the number of Golden Globe awards won, and (3) whether or not a given film is a comedy. The logic of each is pretty straightforward.
• Past best films typically garnered a large number of overall Oscar nominations; each additional nomination increases the odds that a given film will win.
• The number of Golden Globes won, another Hollywood award ceremony that precedes the Oscars, provides an alternative, independent measure of a given film’s popularity that in the past has been a good lead indicator of winning.
• And finally, since not a single comedy nominated for an Academy Award over the past twenty years has won, classification as a comedy, everything else held constant, drastically reduces a film’s chance of success.
How good is the model? And are there any modifications that might improve forecast accuracy? According to the modeller, the equation is 90 per cent accurate; that is, it correctly predicted the best film in nine out of every ten of the past twenty years. Of course, statistical models usually perform extremely well within the estimated sample period, so that a more stringent test is how good the model’s predictions are outside the period over which it was estimated. To test the robustness of the model, the three-variable version was tweaked to take into account two plot devices, namely, whether the film involved a hero riding on a horse or included a leading character with a disability. These modifications did improve overall forecast accuracy somewhat; the expanded model correctly predicted 19 of the previous 20 best films – one more than the basic model. The main difficulty with these additions is that they are too ad hoc: they are meant to correct specific past errors and accordingly are unlikely to outperform the basic model over time. For example, in its original version, the model incorrectly pegged Born on the Fourth of July as the best film for 1989; when the leading character’s disability is factored into the model, it generates the correct forecast for that year: Driving Miss Daisy. How well did the model predict the Best Film for 2005?
20 “A Winning Formula?” The Wall Street Journal Europe (February 25, 2005).
The results of the basic three variable model predicted that The Aviator would be 2005’s Best Film, assessing the probability of winning at 84.6 per cent – a virtual certainty. Million Dollar Baby, the winner predicted by the Journal’s experts, was deemed to have only a negligible (13.5 per cent) chance of winning. According to the article from which these forecasts are extracted: “should a picture other than The Aviator walk away with Best Picture, it would be the biggest upset of the past twenty years.” And the winner was: Million Dollar Baby!
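A minimal sketch of how a logistic model of this kind turns the three variables into a value between zero and one. The coefficient values and the function name are hypothetical, chosen only to reproduce the signs described above (more nominations and Golden Globes raise the probability, classification as a comedy lowers it); the article's estimated coefficients are not reported here, and the plain logistic form is itself an assumption about the Journal's model.

```python
import math

def win_probability(nominations, golden_globes, is_comedy, coef):
    """Logistic probability (between 0 and 1) from the three explanatory variables."""
    z = (coef["constant"]
         + coef["nominations"] * nominations
         + coef["golden_globes"] * golden_globes
         + coef["comedy"] * (1 if is_comedy else 0))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: illustrative signs only, not estimates from the article.
HYPOTHETICAL_COEF = {
    "constant": -6.0,
    "nominations": 0.5,
    "golden_globes": 0.8,
    "comedy": -3.0,
}

# Two otherwise identical films that differ only in the comedy indicator.
drama = win_probability(11, 3, False, HYPOTHETICAL_COEF)
comedy = win_probability(11, 3, True, HYPOTHETICAL_COEF)
print(f"drama: {drama:.3f}, comedy: {comedy:.3f}")
```

Whatever coefficients are used, the output is only as good as the assumption that past relationships will persist, which is the point the 2005 result illustrates.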
7. Analysis This section of the dissertation describes the data used in your study and provides a detailed discussion of the results obtained. If you are relying on secondary data, they should be described in detail: what each variable measures (UK Gross Domestic Product), the units in which they are expressed (constant 1990 UK pounds sterling), whether the data have been transformed and, if so, in what way (to logarithms or differences, and whether the analysis uses the original or the transformed data), and their source(s). If your dissertation relies upon survey data or interviews you will be expected to provide detailed information on the size, structure and response rate to your survey. Surveys, while used in finance dissertations, are not all that common. Interviews, by contrast, are more common, as they can help to clarify or elucidate unresolved issues; an example of the use of interviews for this purpose was described above in connection with the impact on reported earnings associated with the shift from traditional to EU accounting standards in Turkey. This is an important distinction: unlike samples, which are intended to describe one or more characteristics of the larger population from which the sample was drawn, and thus must satisfy fundamental statistical criteria, interviews need not. In fact, the number of interviews can be relatively small, so long as the interviewees are carefully chosen; by virtue of their position, experience or background they should be able to pronounce on important issues with authority. There is no presumption here that the views expressed can be generalised to the profession as a whole, even though such opinions may indeed command widespread agreement among practitioners. Much more common is the use of statistical procedures, typically regression analysis.
Many financial or economic issues lend themselves to quantitative analysis, ranging from whether particular markets can be shown to be informationally efficient (and at what level of efficiency) to whether asset returns are better described using the Capital Asset Pricing Model (CAPM) or the Arbitrage Pricing Theory (APT). In the latter case, the key distinction between the two relates mainly to whether beta (which measures market or macroeconomic risk) is a comprehensive enough indicator of risk to provide a satisfactory explanation for a given observed pattern of returns or whether additional, more specific macro variables are required. Regression analysis is one of the best methods that can be used to discriminate between these alternative models. For one thing, it provides a test of whether the estimated beta is significantly different from zero. And for another, the results indicate how much of the variance in returns can be explained by that single risk measure in comparison to the proportion explained using the macroeconomic variables suggested by the APT. The model that accounts for the greater proportion of the return variance could be considered superior, though it should be pointed out that the quantitative analysis explains actual not expected returns, which is what the model was devised to explain; on the other hand, there are reasons to believe that over the longer term the two should coincide.
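The single-factor side of this comparison can be sketched as follows. This is an illustrative calculation under stated assumptions, not the procedure used in any particular dissertation: the function name and the return series are hypothetical, and the APT side would need a multiple regression, which is not shown. The sketch estimates beta by ordinary least squares, tests whether it differs from zero, and reports the share of the return variance explained.

```python
import math

def simple_regression(x, y):
    """OLS of y on x with an intercept: alpha, beta, t statistic of beta, R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Sum of squared residuals and total sum of squares.
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    # Standard error of beta uses n - 2 degrees of freedom.
    se_beta = math.sqrt(sse / (n - 2) / sxx)
    return {
        "alpha": alpha,
        "beta": beta,
        "t_beta": beta / se_beta,
        "r_squared": 1.0 - sse / sst,
    }

# Hypothetical periodic returns, for illustration only.
market_returns = [0.020, -0.010, 0.030, 0.015, -0.020, 0.010, 0.025, -0.005]
bank_returns = [0.031, -0.012, 0.044, 0.020, -0.035, 0.018, 0.036, -0.004]

capm = simple_regression(market_returns, bank_returns)
print(capm)
```

The R-squared from this regression is the figure that would be set against the R-squared of the multi-factor regression when judging which model explains more of the variance in returns.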
Let us now demonstrate, by way of an illustration, how regression analysis can be used in the production of MBA dissertations; we will look at the results presented in a recently submitted dissertation that compares the performance of the two asset valuation models. The following exhibits summarise the regression results used to assess the dissertation’s main objective; the author’s data were derived for a sample of three Thai banks, the country’s largest and most important financial institutions. Separate equations were estimated for each bank and for each valuation model, six regressions in all. The principal criterion used to evaluate the author’s hypothesis was to establish which model explained a greater proportion of the variance in historic returns, and whether it did so consistently. Her results for both models were extremely robust. The three macro variables used in her APT analysis were GDP, the baht/dollar exchange rate, and local short-term interest rates. Other things being equal, we might expect stronger economic growth to be associated with higher returns, owing to higher lending activity and accordingly bigger margins. The relationship between the other financial variables and bank returns is less clear. Interest rates are both a cost and a source of revenue. The first consideration argues in favour of a negative relationship (higher funding costs depress lending margins), the second in favour of a positive relationship: since higher interest rates are indicative of stronger economic growth, Thai banks may be able to increase lending margins to the benefit of their bottom lines. As to exchange rates, a weaker baht-dollar rate could be consistent with lower returns, by raising external funding costs and so inducing a decline in earnings; on the other hand, a weaker baht would positively affect export industries, stimulating borrowing to finance the expansion of production capacity to accommodate anticipated higher foreign demand.
In each regression, the positive impact of rising economic activity on bank returns is affirmed: the estimated regression coefficients are of roughly the same order of magnitude and all are highly significant statistically. The results indicate further that higher interest rates are consistent with higher bank returns, while a decline in the exchange rate (more baht required to purchase a US dollar) is shown to negatively affect bank returns, suggesting that the higher funding costs associated with devaluation outweigh the expanded lending opportunities. The models are estimated using a relatively large number of observations, while the results shown below indicate that the estimated equations explain 70-80 per cent of the variance in returns. Each estimated regression coefficient provides an estimate of the impact of a change in one independent variable on the dependent variable (bank returns), holding all other variables constant (mathematically, the coefficients are partial derivatives). Note that all of the explanatory variables are statistically significant; generally speaking, a t statistic greater than 2.00 in absolute value indicates significance at the 5 per cent level or better, meaning there is less than a 5 per cent chance that a relationship this strong could have arisen by chance.
Exhibit 21 Regression Results

1. Results for CAPM

Bank                          Linear Equation         P-Value     T-stat    β         R2 (%)
Siam Commercial Bank (SCB)    y = 1.6112x + 0.0061    2.96E-32    16.469    1.6112    69.86
Bangkok Bank (BBL)            y = 1.2462x + 0.0023    1.54E-32    16.600    0.0271    70.20
Kasikorn Bank (KBANK)         y = 1.2889x + 0.0036    9.83E-33    16.690    0.0241    70.42

2. Results for APT

Siam Commercial Bank
                  Constant      GDP          Δ Exchange Rate Baht    Interest Rate
Coefficients      72.24829      0.00013      -3.75759                4.46881
Standard Error    44.10934      0.00003      0.68122                 1.05956
t Stat            1.63794       4.42104      -5.51599                4.21760
P-value           0.110148      8.68E-05     3.08E-06                0.000159

Bangkok Bank
                  Constant      GDP          Δ ERB         Interest rate
Coefficients      22.7984904    0.0002463    -4.3289166    5.6616889
Standard Error    49.5515341    0.0000321    0.7652648     1.1902902
t Stat            0.4600966     7.6695970    -5.6567564    4.7565619
P-value           0.6482139     0.0000000    0.0000020     0.0000315

Kasikorn Bank
                  Constant      GDP          Δ ERB         Interest rate
Coefficients      54.3365136    0.0001110    -2.9116377    4.8544467
Standard Error    33.2736488    0.0000216    0.5138721     0.7992749
t Stat            1.6330194     5.1471101    -5.6660743    6.0735634
P-value           0.1111822     0.0000096    0.0000019     0.0000006
Exhibit 22 Summary Statistics for Three APT Regressions

Statistic                               Siam Commercial Bank    Bangkok Bank    Kasikorn Bank
R2                                      0.7332                  0.8002          0.7932
R2 (adjusted for degrees of freedom)    0.7109                  0.7835          0.7759
Standard Error                          15.5176                 17.432          11.706
Number of Observations                  40                      40              40

Source: Supanapasot (2009).
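As a quick check on Exhibit 22, the adjusted R2 can be recomputed from the reported R2 with the standard degrees-of-freedom formula, 1 - (1 - R2)(n - 1)/(n - k - 1). The sketch below assumes n = 40 observations and k = 3 explanatory variables (GDP, the exchange rate and the interest rate); the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjust R-squared for degrees of freedom (regression includes an intercept)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)

# (reported R2, reported adjusted R2) taken from Exhibit 22.
exhibit_22 = {
    "Siam Commercial Bank": (0.7332, 0.7109),
    "Bangkok Bank": (0.8002, 0.7835),
    "Kasikorn Bank": (0.7932, 0.7759),
}

for bank, (r2, reported_adjusted) in exhibit_22.items():
    recomputed = adjusted_r_squared(r2, n_obs=40, n_regressors=3)
    print(f"{bank}: recomputed {recomputed:.4f}, reported {reported_adjusted:.4f}")
```

The recomputed values agree with the reported ones to within rounding, which is the kind of consistency check a reader can apply to any set of published regression results.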
Of course, if there were reasons to do so, we could have tested whether the coefficients differed significantly from any other value. In that case, we would have calculated the following statistic: t = (regression coefficient - assumed value)/standard error. For example, suppose we wanted to establish whether the effect of the change in the exchange rate (Δ ERB) on Kasikorn’s return differed significantly from one. We would subtract one from the estimated regression coefficient and divide the difference by the standard error: (-2.912 - 1.000)/0.5139 = -7.61. Again, there is less than a 5 per cent chance that a difference of this size would arise by chance if the true coefficient were one.
Turning to the CAPM results, we note that for each bank the betas are all highly statistically significant, though the estimated beta for Siam Commercial Bank is considerably higher than those for the other two banks in the sample, which are extremely small; in other words, Bangkok Bank and Kasikorn Bank exhibit little sensitivity to changes in the overall market, while for each 10 per cent increase (decrease) in the market return SCB’s return increases (decreases) by 16 per cent. In terms of the way the author set out to test her hypothesis, namely, that the APT outperforms the CAPM in terms of explaining historic returns, the differences as measured by R2 are comparatively small. True, the APT regressions have uniformly higher R2 values than the CAPM regressions, but the differences are not big enough to settle the issue, a conclusion the author also reached.
The use of statistical analysis is not mandatory for finance dissertations but it does, as we have seen, provide a firmer foundation upon which to base the results of your study. This conclusion applies only where the appropriate statistical technique was chosen and correctly employed.
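The test statistic described above can be worked through with the Kasikorn Bank exchange-rate figures reported in Exhibit 21 (coefficient -2.9116377, standard error 0.5138721). This is a small illustrative sketch; the function name is not taken from any statistical package.

```python
def t_statistic(coefficient, standard_error, assumed_value=0.0):
    """t = (regression coefficient - assumed value) / standard error."""
    return (coefficient - assumed_value) / standard_error

# Kasikorn Bank, change in the exchange rate (Exhibit 21, APT results).
coefficient = -2.9116377
standard_error = 0.5138721

# Against zero: reproduces the t Stat reported in the exhibit (about -5.67).
t_against_zero = t_statistic(coefficient, standard_error)

# Against one: the alternative hypothesised value discussed in the text (about -7.61).
t_against_one = t_statistic(coefficient, standard_error, assumed_value=1.0)

print(round(t_against_zero, 4), round(t_against_one, 4))
```

In both cases the statistic is well beyond 2 in absolute value, so the coefficient differs significantly from the assumed value at the 5 per cent level.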
In the case to hand, we might note some of the things that were omitted from the Thai bank analysis, and thus highlight the risks associated with using an approach with which you are not thoroughly familiar. The regression model rests upon a number of specific assumptions that all too often are not spelled out. Practically, this will diminish the confidence the reader will have in the accuracy of the claims being made. For example, a key assumption of the regression model is that the residuals (the differences between the actual and predicted values) must be serially uncorrelated and of constant variance (homoskedastic).21 If the first assumption is violated, it could signify that an important variable was omitted from the analysis; the practical consequence of violating the second (heteroskedasticity) is that it biases the standard errors of the regression coefficients, typically downward, possibly compromising the validity of the t-test used to establish statistical significance. These and other issues were discussed at length above and there is no need to repeat that discussion. What does bear repeating is that the diagnostic tests should, as a matter of course, be reported alongside the regression results. In this way, the reader can decide whether you have made your case convincingly. The key point here is that use of one or more statistical procedures in a dissertation is not enough; you must demonstrate a clear understanding of the method, its assumptions and what to do if these assumptions are violated. Failure to do so can lead to a loss of points, cancelling out any benefits you may have gained from having tackled your topic in a formal and rigorous way. Finance dissertations depend upon the use of data to test and support the hypothesis you have chosen to investigate. The points noted above apply equally whether the data are subject to formal statistical analysis or not. For example, many dissertations rely upon ratio analysis. Financial ratios are normally calculated from information derived either from databases or company annual reports, and are then examined to determine whether any consistent patterns can be identified over time. All too often the ratios, neatly calculated and summarised in tabular form, are left to speak for themselves; alternatively, the text merely repeats what is obvious from the table itself.
‘The profit ratio rose in 2003, fell in 2004 and 2005, and then rose again in 2006.’ The obvious question left unanswered is: Why? It is not enough to present ratios without first stating clearly what each ratio is intended to measure, or without explaining the reason(s) behind the observed patterns. For example, in a recent dissertation analysing the performance of the IBB bank since its inception, the author noted and explained the trends indicated for a number of key financial variables: deposit growth, loan growth, expenses, and so forth. Although the author noted that the bank had yet to report a positive return on equity, this failure was neither explained nor its significance assessed. No matter how favourably some financial variables behave, an unprofitable bank cannot survive for very long. Ironically, had the author connected the data shown for expenses and loan losses with operating income, the answer would have been obvious: both the bank’s expense and loan loss ratios were way out of line with those of other British banks.
21 Two other problems we might mention are multicollinearity and non-linearity. The first signifies that the independent variables are highly correlated with one another, producing too much ‘noise’ to detect their separate effects. The problem can be solved either by combining the variables or by dropping one. The other problem is easily detected and corrected: plot the dependent variable against each independent variable and observe the shape of the scatter.
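Two of the checks mentioned above can be sketched in a few lines. The choice of the Durbin-Watson statistic as the test for serial correlation is an assumption made for this illustration (values near 2 suggest no first-order serial correlation, values well below 2 suggest positive correlation); the pairwise correlation coefficient is a simple first look at multicollinearity. The residuals and regressor series are hypothetical.

```python
import math

def durbin_watson(residuals):
    """Sum of squared successive differences of residuals over their sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def correlation(x, y):
    """Pearson correlation between two series of equal length."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical regression residuals and two hypothetical explanatory variables.
residuals = [0.4, 0.6, 0.3, -0.2, -0.5, -0.4, 0.1, 0.3]
gdp = [100.0, 103.0, 105.0, 108.0, 112.0, 115.0, 119.0, 124.0]
interest_rate = [3.0, 3.2, 3.1, 3.5, 3.8, 3.9, 4.3, 4.6]

print("Durbin-Watson:", round(durbin_watson(residuals), 3))
print("Correlation of regressors:", round(correlation(gdp, interest_rate), 3))
```

Reporting figures like these next to the regression results lets the reader judge whether the assumptions behind the t-tests are likely to hold.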
8. Conclusions, Limitations, Recommendations and References The purpose of this section of your dissertation is to pull together, in narrative form, the main findings of your study together with a statement of its limitations and hence directions, if any, for future research; it may also contain a list of recommendations that follow from your findings. All of the main elements of this section are clearly and carefully spelled out in Exhibit 2, though it may be useful to provide brief commentary on some of the points noted there. Your conclusion should be expressed clearly and concisely; this section is not intended to repeat other parts of your dissertation but rather to make clear what you have actually accomplished. The most obvious point(s) worth making is whether the hypothesis you set out to test was, or was not, supported by your analysis. Negative findings can be as important as positive results, providing the analytical procedures employed are up to the task. You should also indicate the extent to which your results correspond to those noted in your Literature Review. If they do, you should say so. If they provide only partial support, then the reasons need to be spelled out and even more so if they contradict previous research findings. At this point it is useful to describe the study’s limitations, if any. One of the main shortcomings of a dissertation is that the results typically apply to a particular company or to several companies in a given sector or industry. Until your findings are replicated for other firms in the same industry or for the sector as a whole or for allied sectors, it will not be possible to generalise them nor should you attempt to do so. Indeed, this is one of the reasons why you are encouraged to make recommendations or suggest extensions to the existing body of research, which now includes your dissertation. 
The final component of your dissertation is the bibliography, which lists all the works you consulted in the preparation of the study and those that were specifically referenced in the body of your text. You should familiarise yourself with the correct procedure for citing references. When referring to books, indicate which edition you used if there is more than one; that edition, together with the location and name of the publisher, should be included (e.g., London and New York: Macmillan). It is common to give the date of publication in the brief reference used in the text and to organise the references in your bibliography in the same way, so that the date does not have to be repeated. The format used for articles is somewhat more complex, but essential if the source is to be properly identified. If the article appears in a book, the name of the article is set within quotes and the editor(s) of the book named [Jones and Smith (eds.)], followed by the italicised title of the book, with the publisher information provided as described above.
For articles appearing in journals, you should again set the title of the article in quotation marks, followed by the name of the journal (in italics), the edition and volume number of the journal where the article can be found, as well as its page numbers (386-412). If you cite information published in newspapers or magazines and the article has an author, follow the procedure used above; newspapers do not always have an edition or volume number, in which case substitute in its place the date when the article appeared (Sunday, 5 April 2009). Where no author is indicated, as is not uncommon, refer to the source and its date of publication and list that in your bibliography (Times (2009): “The G20: Failure or Success?” Sunday, 5 April 2009). Some dissertations contain appendixes, the value of which is open to question. If your dissertation must have one or more appendixes, though there is no compelling reason why it should, then keep to the essentials. In most instances authors provide a record of the data used, or the survey questionnaire and related information (how or where it was administered), the number of respondents, the response rate, the number of follow-up interviews conducted, with whom, how long they took, and so forth. It should be clear that all of this information could just as easily have been included in the appropriate sections of the dissertation. In other instances, the appendix provides summary statistics for the data used in the analysis, though again it is unclear why such information was not presented in the main body of the text. On the other hand, where you have investigated alternative models, or performed failed experiments, it is probably best to include these in an appendix so as not to clutter the text with too much information. It is appropriate to refer to these additional tests or experiments, note the findings and present the results so that anyone interested in seeing what you have done will be able to do so.
In most instances footnotes referring to these additional results are more than adequate. In short, you should consider carefully the need to include the supplementary material that is normally confined to appendixes. The information contained in this Guide is intended to provide you with an overview of what to expect as you approach the dissertation stage and to eliminate as much of the mystery as possible. It aims to assist you with topic selection, explain the role of your supervisor, highlight potential sources of information (including an assessment of their strengths and limitations), show how to evaluate the merits of the alternative sources of information you are developing, and describe approaches to research, including a discussion of statistical inference and the various statistical procedures that have been used in dissertations. It also offers much practical information that (hopefully) will address many student FAQs. Forewarned is forearmed.