
Research Methodology

COURSE OVERVIEW

Management research is an increasingly important tool in all areas of management activity, and it supports the business decision-making process. This course aims to equip students comprehensively in all areas related to designing, analysing and solving a management research problem. Specifically, it will enable students to select and define a research problem, develop an appropriate research plan, write a research proposal, collect data, carry out analysis and submit a report of the main findings. The course provides exposure to different analytical techniques, including quantitative multivariate techniques of data analysis as well as qualitative techniques, and covers the foundations of research, sampling, data collection, data analysis and the presentation of main findings. On completion of the course, students will have developed the following skills and competencies:

A. Writing a research proposal
B. Preparing a research design
C. Techniques of data collection and sampling
D. Data analysis using the software package SPSS
E. Report writing

11.556

© Copy Right: Rai University

i

RESEARCH METHODOLOGY

SYLLABUS

Unit: Research Methodology
Unit value: 4 Credits
Unit level: S2
Unit code: MBA-208

Description of unit

The aim of this unit is to introduce students to the practical aspects of management research. The focus of the unit is to integrate techniques and concepts learnt at a theoretical level in other modules with their practical application to management research problems. Research is an increasingly important tool in all areas of management activity, and this course aims to equip students comprehensively in all areas related to the design, analysis and presentation of management research. Specifically, it will enable students to select and define a research problem, develop an appropriate research plan, write a research proposal, collect data, carry out analysis and submit a report of the main findings. The course provides exposure to different analytical techniques, including quantitative multivariate techniques of data analysis as well as qualitative techniques, and covers the foundations of research, sampling, data collection, data analysis and the presentation of main findings.

Summary of outcomes

To achieve this unit a student must:

• Investigate the process of research design for business decision-making

• Determine appropriate techniques of data collection and sampling for managerial research

• Investigate application of methods of data analysis and multivariate techniques for research

• Undertake a piece of research making effective use of research methods for business and management

Content

1. Fundamentals of research process
Role of research in business decision making: types of research in business decision making; quantitative and qualitative techniques in research; limitations of quantitative techniques in business research.
Steps in the research process: selection of problem, literature survey, formulation of hypotheses, research design, analysis and report writing.
Designing the research: defining the research objective; research design: exploratory, descriptive, causal.
Write-up: writing a research proposal; report formatting: title page, abstract, body (introduction, methods, sample, measures, design, procedures, results, conclusions); references: reference citations in the text of your paper, reference list in the reference section; tables, figures and appendices; presentation of results.

2. Techniques of data collection and sampling
Sources of marketing data: retail audits, consumer panels, scanner services and single-source systems, diary method; the internet as a source of primary and secondary data.
Qualitative research techniques: depth interviews, focus groups, projective techniques, limitations of qualitative methods, observation, case study.
Survey design: data sources: primary and secondary; questionnaire layout: structured, unstructured; types of information wanted; sequencing; types of questions: paired comparison, semantic differential; bias in questions; pilot testing; types of questionnaires: personal interviews, telephone interviews, mail surveys.
Measurement of attitudes: scales of measurement: nominal, ordinal, interval and ratio; attitude scales, e.g. Thurstone scale, Likert scale, semantic differential.
Sampling issues in research: definition of the universe, determining sampling units, determining the sampling frame, determining sample size; errors in sampling: sampling error, non-sampling error, the problem of non-response; selecting samples: probability vs non-probability methods.
Probability sampling: simple random sampling, use of random number tables, stratified sampling, cluster sampling, systematic sampling.
Non-probability methods: convenience sampling, judgement sampling, quota sampling, snowball sampling.

3. Data analysis
Tabulation: coding, simple tabulation, cross tabulation, weighting.
Applications of hypothesis testing to research situations: factors influencing choice of statistical technique; tests of hypotheses and significance for small samples, paired-sample t tests; tests of hypotheses and significance for large samples, tests of proportions; choice of level of significance, power of a test, interpretation of p-values in computer output; non-parametric tests of hypotheses; bivariate analysis: cross tabulations and the chi-square test; analysis of variance: one-way ANOVA, the F statistic, Latin square design.

4. Multivariate analysis
Correlation and multiple regression: correlation coefficient; multiple regression: when to use, scatter plots, fitting the least-squares model, interpretation of parameter estimates, mean square error (MSE), testing significance of independent variables, goodness of fit, coefficient of determination, the multiple regression model, adjusted R2, stepwise multiple regression, problems when using regression analysis, interpretation of computer output for multiple regression.
Factor analysis: application areas, methods, factor interpretation, interpretation of computer output.
Multidimensional scaling: application areas, methods, interpretation of computer output.
Discriminant analysis: application areas, methods, interpretation of computer output.
Cluster analysis: application areas, methods, interpretation of computer output.
Conjoint analysis: application areas, methods, interpretation of successful results.
Computer skills: use of statistical packages such as SPSS for carrying out data analysis.
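The sampling designs named above can be illustrated briefly. The sketch below uses Python rather than SPSS (the package the course itself uses), and the sampling frame of 100 numbered units is hypothetical; it shows simple random, systematic and stratified selection in a few lines.

```python
import random

population = list(range(1, 101))   # hypothetical sampling frame of 100 unit IDs
random.seed(42)                    # fixed seed so the sketch is reproducible

# Simple random sampling: every unit has an equal chance of selection.
srs = random.sample(population, 10)

# Systematic sampling: pick every k-th unit after a random starting point.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: divide the frame into strata (here two equal
# strata of 50 units) and draw a proportionate sample from each.
strata = [population[:50], population[50:]]
stratified = [unit for stratum in strata for unit in random.sample(stratum, 5)]
```

In practice the strata would be defined by a characteristic such as income group or region, not by position in the frame.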

Outcomes and assessment criteria

To achieve each outcome a student must demonstrate the ability to:

1. Investigate the process of research design for business decision-making
• Describe the importance of management research in business
• Understand the steps in the process of design and implementation of a research study
• Demonstrate knowledge and skill in proposal and report writing

2. Determine appropriate techniques of data collection and sampling for managerial research
• Assess different techniques of data collection, including quantitative and qualitative data
• Prepare a sample questionnaire to collect data
• Determine survey sample size and composition based on principles of sampling

3. Investigate application of methods of data analysis and multivariate techniques of analysis for research
• Apply techniques of hypothesis testing to data collected
• Propose use of multivariate techniques of analysis for sample data

4. Undertake a piece of research making effective use of research methods for business and management
• Specify research objectives and write a research proposal
• Collect and analyse data using the different techniques of data analysis studied in this module
• Prepare a report and presentation of the main findings
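As a small taste of the hypothesis-testing skills listed in the assessment criteria above, here is a paired-sample t test sketched in plain Python; the before/after sales figures are invented for illustration, and in the module itself this analysis would be run in SPSS.

```python
import math
from statistics import mean, stdev

# Hypothetical before/after monthly sales (units) for 8 stores in a trial.
before = [52, 48, 60, 55, 49, 58, 61, 50]
after = [56, 50, 65, 54, 53, 62, 64, 55]

# Paired-sample t statistic: t = mean(d) / (stdev(d) / sqrt(n)),
# where d holds the per-store differences.
d = [a - b for a, b in zip(after, before)]
t = mean(d) / (stdev(d) / math.sqrt(len(d)))

# With n - 1 = 7 degrees of freedom, the two-tailed 5% critical value
# is about 2.365, so |t| above that rejects "no change in sales".
```

Here t works out to roughly 4.6, so for these made-up data the observed increase would be judged significant at the 5% level.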

Guidance

Generating evidence

Assessment component weighting:
• In-module coursework: 40%
• End-of-module examination: 60%

The in-module component will require the application of research methodology to a management problem and will be completed in small groups of four or five students. Students will be required to define the research task and objectives; identify the target population and sampling frame; describe the research methodology to be used; conduct the research survey; and use an appropriate computer package to analyse the data. Student progress will be monitored through a series of five meetings alongside student logs, with marks awarded to individuals for each section completed (10% of the module assessment). Each group will present its findings in a group report and presentation, with an equal mark allocated to all members (20% of the module assessment). In addition, each student will individually submit a report on a topic of a strategic nature reviewing the available literature; this carries the remaining 10% of the marks. The end-of-module examination will consist of a 3-hour practical in which students will be required to define research issues, apply statistical and quantitative techniques, analyse a data set and prepare a short report on their findings.
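For the data-analysis stage of the coursework described above, a first step is often a simple least-squares fit. The sketch below uses plain Python (the module itself expects a package such as SPSS or Minitab), and the advertising-versus-sales figures are invented for illustration.

```python
from statistics import mean

# Hypothetical coursework data: advertising spend (Rs '000) vs monthly sales.
x = [10, 12, 15, 18, 20, 24, 27, 30]
y = [120, 130, 148, 160, 165, 190, 200, 215]

# Least-squares slope b and intercept a for the line y = a + b*x.
xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sxy / sxx
a = ybar - b * xbar

# Coefficient of determination R^2: share of variation in y explained by x.
ss_tot = sum((yi - ybar) ** 2 for yi in y)
ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_res / ss_tot
```

For these made-up data the slope is about 4.7 units of sales per Rs 1,000 of advertising, with R2 above 0.99.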


Links

The unit is intended to give a good understanding of issues affecting business organisations. It is part of the MBA management pathway and links with the finance, mathematics and statistics, operations, HRM, marketing, strategy, accounting and business modules.

Resources

The business press can be a significant source of information. Companies such as Video Arts produce a variety of videos which may be useful in covering related topics. However, one of the best sources of information is the World Wide Web, whose sites can provide information and case studies (e.g. http://www.bized.ac.uk/, which provides business case studies appropriate for educational purposes). Others are http://www.businesscases.org/, www.businesscase.com/, http://www.icongrouponline.com, www.3.ibm.com/e-business and www.5.ibm.com/e-business/uk/case_studies. The library should be specially targeted for secondary research and CD-ROM databases; some are listed for ready reference: www.faust.information.com, www.un.org/Pubs/whatsnew, www.un.org/Pub/whatsnew/electron.htm and www.businessmonitor.com, while the Global CD-ROM Directories give business listings on CD-ROM. CD-ROM software is available at www.sba.gov/bi/bics/biccdrom.html. Important journals available for consultation are: British Journal of Management, European Finance Review, Global Finance Journal, Journal of International Business Studies, International Journal of Human Resource, International Journal of Intercultural Relations, International Trade Journal, Journal of World Business, Journal of International Economic Law, Journal of International Economics, Thunderbird International Business Review, and Law and Policy in International Business. Other relevant material is found in the financial press: the Financial Times, Investors' Chronicle, the financial pages of quality newspapers, and the annual reports of organisations.

Delivery

A mixture of lectures covering basic theory and case studies for class discussion will be used to reinforce key concepts. Much of the module is statistical; students will be expected to understand the underlying concepts, but emphasis will be placed on the ability to use statistical software on a PC (e.g. Minitab or SPSS) to analyse data. Throughout the module, creativity and communication skills will be emphasised.

Suggested Reading

There are a large number of textbooks covering the areas contained within the unit. Examples are:

Kothari C R – Quantitative Techniques (Vikas Publishing House, 3rd ed.)
Levin R I & Rubin D S – Statistics for Management (Prentice Hall of India, 2002)
Aaker D A, Kumar V & Day G S – Marketing Research (John Wiley & Sons Inc, 6th ed.)
Nargundkar R – Marketing Research: Text and Cases (Tata McGraw-Hill, 2002)
Bell J – Doing Your Research Project (OU Press, 1993)
Diamantopoulos A and Schlegelmilch B – Taking the Fear out of Data Analysis (Dryden Press, 1997)
Easterby-Smith M et al – Management Research: An Introduction (Sage Publications, 1991)
Miller D C – Handbook of Research Design and Social Measurement (Sage Publications, 1991)
Trochim W M K – Research Methods (Atomic Dog, 2003)


RBS

RESEARCH METHODOLOGY (MBA)


CONTENT

Lesson No. – Topic – Page No.

Lesson Plan – vii
Lesson 1 – Role of research in business decision making – 1
Lesson 2 – Steps in Research Process – 10
Lesson 3 – Research Proposal – 20
Lesson 4 – Tutorials – 27
Lesson 5 – Research Design and Experimental Designs – 28
Lesson 6 – Tutorials – 36
Lesson 7 – Writing the Research – 37
Lesson 8 – Techniques of Data Collection – 48
Lesson 9 – Tutorial – 64
Lesson 10 – Questionnaire Design – 65
Lesson 11 – Issues in Questionnaire – 69
Lesson 12 – Measurement and Scaling – 74
Lesson 13 – Sampling Issues in Research – 79
Lesson 14 – Designing Sample – 84
Lesson 15 – Applications of Market Research – 90
Lesson 16 – Data Coding and Analysis – 98
Lesson 17 – Principles of Statistical Inference and Confidence Intervals – 101
Lesson 18 – Statistical Inferences and Sampling Distribution – 105
Lesson 19 – Model Building and Decision Making – 108
Lesson 20 – Principle of Hypothesis Testing – 114
Lesson 21 – Testing of Hypothesis – Large Samples – 120
Lesson 22 – Tutorial – 128
Lesson 23 – Tests of Hypotheses – Small Samples – 129
Lesson 24 – Non-Parametric Tests – 134
Lesson 25 – Chi-square Test – 140
Lesson 26 – Analysis of Variance (ANOVA) – 152
Lesson 27 – Applications of ANOVA – 158
Lesson 28 – Application of Correlation Technique in Research Methodology – 162


Lesson 29 – Multicollinearity in Multiple Regression – 166
Lesson 30 – Multiple Regression – 169
Lesson 31 – Making Inferences about Population Parameters – 173
Lesson 32 – Multicollinearity in Multiple Regression – 175
Lesson 33 – Applications of Regression Analysis in Research – 178
Lesson 34 – Regression Analysis using SPSS Package – 181
Lesson 35 – Factor Analysis – 190
Lesson 36 – Principal Component Analysis – 195
Lesson 37 – Multidimensional Scaling – 199
Lesson 38 – Further Applications and Theory of Multidimensional Scaling using Statistical Software – 204
Lesson 39 – Conjoint Analysis – 207
Lesson 40 – Discriminant Analysis – 213
Lesson 41 – Cluster Analysis – 218
Lesson 42 – Interpolation and Extrapolation – 224
Lesson 43 – Case Study – 231
Lesson 44 – Case Study – 247

RAI UNIVERSITY – RAI BUSINESS SCHOOL
RESEARCH METHODOLOGY – LESSON PLAN

Program: MBA
Session: Year 2, Semester 3
Subject Title: Research Methodology

Daily Lesson Schedule (topics as given in the RU syllabus)

1. UNIT I: Fundamentals of research process
Lesson 1 – Role of research in business decision making
Lesson 2 – Steps in Research Process
Lesson 3 – Research Proposal
Lesson 4 – Tutorial
Lesson 5 – Research Design & Experimental Designs
Lesson 6 – Tutorial
Lesson 7 – Writing the Research

2. UNIT II: Techniques of data collection and sampling
Lesson 8 – Techniques of Data Collection
Lesson 9 – Tutorial
Lesson 10 – Questionnaire Design
Lesson 11 – Issues in Questionnaire
Lesson 12 – Measurement & Scaling
Lesson 13 – Sampling Issues in Research
Lesson 14 – Designing Sample
Lesson 15 – Application to Market Research

3. UNIT III: Data analysis
Lesson 16 – Data Coding & Analysis

Lesson 17 – Principles of Statistical Inference and Confidence Intervals
Lesson 18 – Statistical Inferences & Sampling Distribution
Lesson 19 – Modeling & Decision Making
Lesson 20 – Principles of Hypotheses Testing
Lesson 21 – Testing of Hypotheses – Large Samples
Lesson 22 – Tutorial
Lesson 23 – Testing of Hypotheses – Small Samples
Lesson 24 – Non-Parametric Tests
Lesson 25 – Chi-square Test
Lesson 26 – Analysis of Variance (ANOVA)
Lesson 27 – Applications of ANOVA

4. UNIT IV: Multivariate Analysis
Lesson 28 – Application of Correlation in Research Methodology
Lesson 29 – Introduction to Regression Analysis
Lesson 30 – Multiple Regression
Lesson 31 – Making Inferences about Population Parameters
Lesson 32 – Multicollinearity in Multiple Regression
Lesson 33 – Applications of Regression Analysis in Research
Lesson 34 – Regression Analysis using SPSS Package
Lesson 35 – Factor Analysis
Lesson 36 – Principal Component Analysis
Lesson 37 – Multidimensional Scaling
Lesson 38 – Application of Multidimensional Scaling using Statistical Software
Lesson 39 – Conjoint Analysis
Lesson 40 – Discriminant Analysis
Lesson 41 – Cluster Analysis
Lesson 42 – Interpolation & Extrapolation
Lesson 43 – Case Studies

According to Clifford Woody “Research comprises of defining and redefining problems. • Communicates the findings and their implications for collecting information. it helps to improve management decision making by providing relevant. Every organization operates under some degree of uncertainty. Some people consider research as a movement. Intelligent use of Research Tools is the key to business achievement 11. Definition: Some of the definitions of Research are: 1. The organization should be Consumer oriented and should try to understand consumer’s requirements & satisfy them quickly and efficiently. and • Improve understanding of marketing as a process. promotion & distribution of ideas.556 © Copy Right: Rai University 1 . • Formulating a hypothesis. and at last carefully testing the conclusions to determine whether they fit the formulating hypothesis”. • Analyzing the facts and • Reaching certain conclusions either in the form of solutions towards the concerned problem or in certain generals for some theoretical formulation. although its can be minimized with the help of research methodology. • Manages and implements the data collection process. To choose the best line of action (in the light of growing competition and increasing uncertainty). Redman and Mory define research as a “systematized effort to gain new knowledge”. In this lecture we will be discussing the role of research in management and its key ingredients. pricing. Every decision poses unique needs for information gathered through marketing research. and services to create exchange that satisfy individual & organizational objectives. UNIT I FUNDAMENTALS OF RESEARCH PROCESS RESEARCH METHODOLOGY • Collecting the fact or data. and • The public to the marketer through information Information used to identify and define marketing opportunities and problems. Marketing is the process of Planning & Executing the concepts. But first let us understand the meaning of research. 
This means that any organization should try to obtain information on consumer needs and gather market intelligence to help satisfy these needs efficiently. Thus. a movement from known to unknown. accurate. • Customer. formulating hypothesis or suggested solutions.LESSON 1: ROLE OF RESEARCH IN BUSINESS DECISION MAKING When research is used for decision-making. making deductions and reaching conclusions. Research in common context refers to a search for knowledge. in ways that are beneficial to both the consumer & the organization. In the nut-shell we see that Marketing research specifies • Τhe information required to address these issues. we can say that marketing research is the function that links the • Consumer. In management research is extensively used in various areas. For example. • Monitor marketing performance. 3. It will be clear after going through some important definitions of research. This uncertainty cannot be eliminated completely. goods. It is actually a voyage to discovery. This can only be done only only by research. 2. it means we are using the methods of science to the art of management. Thus. On evaluating these definitions we can conclude that-Research refers to the systematic method consisting of • Enunciating the problem. We all know that. Marketing research is a critical part of such a Market intelligence system. • Designs the method • Analyses. • Refine and evaluate marketing actions. & timely information. Market Research has become an important part in management decision-making. It can also be defined as a scientific and systematic search for gaining information and knowledge on a specific topic or phenomena. the Marketing Concept requires Customer Satisfaction rather than Profit Maximization to be the goal of an organization. we can say that. Research in particularly important in the decision making process of various business organizations. • Generate.

This might involve formulating a hypothesis or a focus question. the researcher might hypothesize that a particular method of computer instruction in math will improve the ability of elementary school students in a specific district. 000/-” Marketing Research Manager forwarded this recommendation to Marketing Vice-President. Relevancy . A planned and organized research saves your time and money.It furnishes three important tasks: • It avoids collection of irrelevant information and saves time and money • It compares the information to be collected with researcher’s criteria for action Once the basic data is collected. the researcher could be interested in how to use computers to improve the performance of students in mathematics. Research is a systematic approach to gather information required for sound management decisions. The researcher has to narrow the question down to one that can reasonably be studied in a research project. and from it deduces approximately the same results. Characteristics of Research a.15000/=. What should be the selling price of model X-240? Explicit Answer The explicit answer is Rs. should be specified. analysis. b. the initial problem that the researcher wishes to study. which an equally competent researcher could duplicate. d. For instance.10. 2 © Copy Right: Rai University 11. At the narrowest point of the research hourglass. since our findings may reflect the effect of education and age rather than income.making. Research provides a base for your business sound decision . Objectivity . The research process usually starts with a broad area of interest. All the factors that we think may affect the study have to be controlled and accounted for. collection etc. which one is investigating but some other extraneous factors also. c.It implies that True Research should attempt to find an unbiased answer to the decision-making problem. The difference lies in the methods and procedures adopted to reach a conclusion. 
Implicit Question the study demands • All those variables beyond the control should be recorded Structure of Research Most research projects share the same general structure. For instance. and interpretation of the information leading from the question to the answer of Rs.15000/.A reproducible research procedure is one. without controlling for education and age. Illustration Consider the statement: “We recommend that Model X-240 of music system be priced at Rs. Control Must Consider • All the factors. Implicit question posed 2. which are under control. But this initial interest is far too broad to study in any single research project (it might not even be addressable in a lifetime of research). the researcher is engaged in direct measurement or observation of the question of interest. You might think of this structure as following the shape of an hourglass. must be varied as per I hope the meaning of research is clear. usually by analyzing it in a variety of ways. Research is not synonymous to common sense.• It enables to see whether the research is proceeding in the RESEARCH METHODOLOGY right direction e. It is impossible to control all the factors. Collection. Precise information regarding samples– methods. Reproducible . There are three parts involved in any of your systematic finding: 1..556 . For example Suppose we are studying the relationship between incomes and shopping behaviour. the researcher begins to try to understand it. Systematic Approach -Each step must of your investigation be so planned that it leads to the next step. and interpretation of the information leading from the question to answer. The third part deals with the collection.Research is not only affected by the factors. Planning and organization are part of this approach. Control . it will be a height of folly. Explicit answer proposed 3. analysis.

entrepreneur has to make sound decisions. Market analysis has become an integral tool of business policy. production and sales.. Many Researchers define marketing research as gathering. one cannot depend upon casual contacts and personal impressions. which guard against the manufacturer’s subjective bias. The manager’s increased need for more and better information. promotion and price can be utilized so as to obtain maximum results in the context of the factors outside the control of the organization viz. thus replacing subjective business decisions with more logical and scientific decisions. Marketing Marketing research is undertaken to assist the marketing function. competitor and laws of land. Three factors stimulate the interest in a scientific research to decision making.556 I hope you have the clear picture of the functions of the manager in an organization and the role of research in decisionmaking. Research with regard to demand and market factors has great utility in business. If an 11. viz. the researcher begins to formulate some initial conclusions about what happened as a result of the computerized math program. the researcher often will attempt to address the original broad question of interest by generalizing from the results of this specific study to other related situations. For instance.Even for a single hypothesis there are a number of analyses a researcher might typically conduct. Research methodology has been developed as the tool by which business executives keep in touch with their customers. more and more firms/ executives have turned to research methodology as a medium of communication between the customer and the company. RESEARCH METHODOLOGY Importance of Research in Management Decision The role of research has greatly increased in the field of business and economy as a whole. which improves his information base for making sound decisions affecting future operations of the enterprise. 
recording and analyzing of the facts about business problems with a view to investigate the structure and development of a market for the purpose of formulating efficient policies for purchasing. an executive can quickly get a synopsis of the current scenario. On the basis of the functions we can state some of the general objectives of Managerial Research: • Decision-making objectives • Economic and business objectives • Policy objectives • Product development • Profit objectives • Human Resource Development objectives • Market objectives: i. Particularly when business is too big and operations are too far-flung. economic environment. It is necessary for developing the marketing strategy where in factors under the control of the organization. The following are the major areas in which research plays a key role in making effective decisions. The study of research methods provides you with the knowledge and skills you need to solve the problems and meet the challenges of today’s modern pace of development. At this point. To a certain extent he relies on his salesmen and his dealers to supply him with market information but in recent years. ii. the researcher might conclude that other school districts similar to the one in the study might expect similar results. Market research involves the process of • Systematic collection • Compilation © Copy Right: Rai University 3 . Research methodology is an essential prerequisite for consumeroriented marketing. iii. the Master Production Schedule (MPS) and Material Requirement Planning (MRP) can be efficiently done within the limits of the projected capacity based on the MPS Budgetary control can be made more efficient. Once sales forecasting is done. Marketing research is the link between the manufacturer and the consumer and the means of providing consumer-orientation in all aspects of the marketing function. It is the instrument of obtaining the knowledge about the market and consumer through objective methods. 
Modern industry with its large-scale operations tends to create a gulf between the customer and the manufacturer. Finally.. on the basis of strong results indicating that the math program had a positive effect on student performance. recording and analyzing of all facts about problems relating to the transfer and sale of goods and services from producer to consumer. The availability of improved techniques and tools to meet this need. Innovation objectives ii. The resulting information overload The usefulness and contribution of research in assisting marketing decisions is so crucial that it has given rise to the opening of a new field altogether called ‘marketing research’. Market research is basically the systematic gathering. he must know who has customers are and what they want. i. product distribution system. advertising. Customer satisfaction objectives • Promotional objectives • Corporate change objectives Role Of Research in Important Areas Through research. Marketing research stimulates the flow of marketing data from the consumer and his environment to marketing information system of the enterprise.

consumer attitudes. Advertising research. incentive schemes. iv.. Product positioning 7. e. Product potential Marketing Research i. New product launching and Product Positioning. research of special consumer groups. On the basis of this data the executive develop plans and programmers. These techniques help in replacing intuitive business decisions by more logical and scientific decisions. shopping habits of consumers. buying motive. employee turnover rates. Media selection for advertising 5. operations research and motivational research. and performance appraisal. units in which product is purchased. it can help in examining the consequences of each alternative and help in bringing out the effect on economic conditions. Some of the areas you can apply research are: • Product development • Cost reduction • Work simplification • Profitability improvement • Inventory control Materials The materials department uses research to frame suitable policies regarding: • Where to buy • How much to buy • When to buy • At what prices to buy. Solving Various Operational and Planning Problems of Business and Industry vi. brand loyalty. cost of living. Consumer buying behaviour 3. Various examples can be quoted such as’ problems of big and small industries due to various factors–up gradation of technology © Copy Right: Rai University 4 11. e.. advertising and promotion viii. Research tools are applied effectively for studies involving: 1.• Analysis • Interpretation of relevant data for marketing decisions Production Research helps you in an enterprise to decide in the field of production on: • What to produce • How much to produce • When to produce • For whom to produce RESEARCH METHODOLOGY This information goes to the executive in the form of data. Human Resource Development You must be aware that the Human Resource Development department uses research to study wage rates. analysis of consumption rates.g. concentration of sales and advertising efforts. 
Government and Economic System Research helps a decision maker in a number of ways. help in solving various complex problems of business and industry in a number of ways. employment trends. Measuring advertising effectiveness 4. iii. territorial sales quota.g. Test marketing 6. etc. distribution costs. Market Characteristics Research (Qualitative): Who uses the product? Relationship between buyer and user. Competitive position and Trends Research Sales Research: Analysis of sales records. how a product is used. Demand forecasting 2.556 . appraisal of efficiency. Distribution Research: Channels of distribution. quota for individuals. performance evaluation research. basic economic analysis of the consumer market. survey of local markets. packaging research. Advertising and Promotion Research: Testing and evaluating. sales analysis.. It also uses research effectively for its most important activity namely manpower planning. total sales quota. etc. Product Research: Assessment of suitability of goods with respect to design and price. Size of Market (Quantitative): Market potential. ii. Various types of researches. customs and habits affecting the use of a product. market research. vii. may also be considered in management research. distribution channel. when combined together. etc. v.

and its impact on labour and supervisory deployment, the effect of the government's liberal policy, WTO and its new guidelines, ISO 9000/14000 standards and their impact on our exports, allocation of national resources on a national-priority basis, etc.

Research lays the foundation for all government policies in our economic system. For systematic collection of information on the economic and social structure of the country you need research; such information indicates what is happening to the national economy and what changes are taking place. The government also uses research for economic planning and optimum utilization of resources for the development of the country. We are all aware that research is applied in bringing out the union finance budget and the railway budget every year.

Social Relationships
Research in social sciences is concerned with both knowledge for its own sake and knowledge for help in solving immediate problems of human relations. In general:
• It helps philosophers and thinkers with their new thinking and ideas.
• It helps in developing new styles for creative work.
• It may help researchers to generalize new theories.
• It helps professionals to earn their livelihood.
• It helps students to know how to write and report various findings; it is a sort of formal training which helps an individual in a better way.

Now let us do an activity.
Activity: List out the uses of research in the field of:
Hospital Management
Railway

Temple Management
Traffic Control

possibilities simultaneously and in a sense it is not quite sure of its objective. RESEARCH METHODOLOGY Exploratory Research Many times a decision maker is grappling with broad and poorly defined problems. This artificiality is the essence of the experimental method. which show the effects. Experimental Research Experimentation will refer to that process of research in which one or more variables are manipulated under conditions. • Research is not synonymous with common sense. researchers are looking for some fresh possible divergent views. which you will have to tested for drawing definite conclusions. then undoubtedly experiments are much more effective than descriptive technique Thus. the degree to which product use varies with income. A covering of widely divergent views is better. reproducible.for example. On the basis of market feedback the company may hypothesise that teenage children do not eat its cereal for breakfast. It allows both implicit and explicit hypotheses to be tested depending on the research problem. as the word implies. just “explore”. Your objective and understanding should be clear and specific. the characteristics of users of a given product. The literature search is fast. relevance and control. but again. and c. or the number who saw a specific television commercial. since it gives you more control over the factors you are studying. For example: A cereal company may find its sales declining. which permit the collection of data. The analysis of specific examples is a sort of case study approach. The experience survey. Conclusive research can further be classified as: • Descriptive • Experimental. Research lays the structure for decision-making. we can classify research into two types: Exploratory research gives rise to several hypotheses. A descriptive study can then be designed to test this hypothesis. • Research is characterized by-systematic. 
Experiments will create situation so that you as a researcher can obtain the particular data needed and can measure the data accurately. the ability to set up a situation for the express purpose of observing and recording accurately the effect on one factor when another is deliberately changed permits you to accept or reject hypothesis beyond reasonable doubt. idle curiosity approach. Conclusive research 11. Descriptive studies vary in the degree to which a specific hypothesis is the guide. differing from it only in that the investigator thinks there may be a payoff in the application some where in the forest of questions. If you attempt to secure better definitions by analytic thinking. The literature survey. To be of maximum benefit. which are present in a given situation. data for a definite purpose. If the objective is to validate in a resounding manner the cause and effect relationship among variables. to familiarize and.556 Thus.Types of Research On the basis of the fundamental objectives of the research. we can conclude that: • Research methodology minimizes the degree of uncertainty involved in management decisions. Exploratory research is designed to provide a background. a descriptive study must only collect. Exploratory research uses a less formal approach. The experience survey concentrates on persons who are particularly knowledgeable in the particular area. sex or other characteristics. b. These conclusions when tested for validity lay the structure for your decision-making. © Copy Right: Rai University 7 . A part of exploratory research is the investigation of relationships among variables without knowing why they are studied. economical way to develop a better understanding of a problem area in which you are investigating and have limited experience and knowledge. age. the general subject. Researchers are not looking for conclusions. 
it may be the wrong approach and may even be counter productive–counter productive in the sense that this approach may lead to a definitive answer to the wrong question. you can obtain more conclusive evidence of cause and effect relationships between any two of them. It also familiarizes you with past research results. data sources and the type of data available. Experiments are artificial in the sense that the situations are usually created for testing purposes. objective. It pursues several. In this representative samples are not desired. Conclusive research is used for this purpose of testing the hypotheses generated by exploratory research. It borders on an. The analysis of “insight-stimulating” examples Descriptive Research Descriptive research as the name suggests is designed to describe something. If you can control the factors. Three typical approaches in exploratory research are: a. they are looking for ideas.

Summary
• Research process involves five important steps: problem definition, research design, data collection, data analysis, and interpretation of results. All these steps have been explained in detail with their key elements.
• We have dichotomized the types of research into exploratory and conclusive.
• Conclusive research could be further divided into descriptive and experimental.
• While exploratory research enables the researcher to generate hypotheses, they are tested for validity by conclusive research.
• While the descriptive procedures merely test the hypotheses, experimental research establishes the cause-and-effect relationships among variables in a more effective manner.
• The role of research in the important areas of management has been briefly covered. The areas include marketing, production, banking, materials, human resource development and government.

Activity
You are Product Manager for brand EXCELLENCE of Vanaspati, a nationally distributed brand. For the last four consecutive months, brand EXCELLENCE has shown a declining trend in sales. You ask the research department to do a study to determine why sales have declined. Is this exploratory or conclusive research? Explain your reasons.

Activity
Research can be classified into:

Self-Assessment
1. "Creative management, whether in public administration or private industry, depends on methods of inquiry that maintain objectivity, clarity, accuracy and consistency." Discuss this statement and examine the significance of research.
2. Analyse, criticize, and explain: Research "control" of the environment introduces artificial conditions. Objective research is best achieved by recording what happens without "disturbing" the environment.
3. It is often said that there is not a proper link between some of the activities under way in the world of academics and in most businesses in our country. Account for this state of affairs and give suggestions for improvement.
4. Briefly explain the meaning and importance of each of the following in research:
• Systematic
• Objectivity
• Relevance
• Reproducible
5. Discuss with examples "Exploratory Research", "Descriptive Research" and "Experimental Research".

Notes:

if you are doing descriptive research you should define questions. Exploratory research can be performed using a literature search. Now we will discuss these categories in detail: Exploratory Research Exploratory research has the goal of formulating problems more precisely. Exploratory research 2. which tracks an aggregate of individuals who experience the same event within the same time interval over time. where. what. Causal research These classifications are made according to your objective of the research. gaining insight. gathering explanations. Refer diagram below to understand each steps clearly 10 © Copy Right: Rai University 11. In other words. but in other cases different phases of the same research project will fall into different categories. and forming hypotheses. Case studies can include contrasting situations or benchmarking against an organization known for its excellence. Research methodology minimizes the degree of uncertainty involved in management decisions. and how aspects of the research should be defined. or predict future demand for a product or describes the happening of a certain phenomenon. but it does not seek to test them. When you will be surveying people.556 . You can use Cohort analyses for long. In some cases the research will fall into one of these categories.forecasting of product demand. Research lays the structure for decisionmaking Let us recapitulate what we have studied in the last lecture. Descriptive research 3. why. in our introduction to the subject lecture we had conquered the areas where research is used as a tool for decision-making. thus allowing you to monitor behaviour such as brand switching. and the method of analysis prior to beginning data collection. clarifying concepts. Such preparation allows you the opportunity to make any required changes before the costly process of data collection has begun. but rather. It accomplishes this goal through laboratory and field experiments. eliminating impractical ideas. 
However. Exploratory research may develop hypotheses. Cross-sectional Studies Cross-sectional studies sample the population to make measurements at a specific point in time. the who. Also we know that all business operates in the world of uncertainty. Causal Research Casual Research seeks to find cause and affect relationships between variables. A special type of cross-sectional analysis is a cohort analysis. There are two basic types of descriptive research: Longitudinal Studies Longitudinal studies are time series analyses that make repeated measurements of the same individuals. surveying certain people about their experiences. We saw that research plays a dominant role in the field of • Marketing • Production • Banking • Materials • Human Resource Development • Government Descriptive Research Descriptive research is more rigid than exploratory research and seeks to describe users of a product. Exploratory research is characterized by its flexibility.RESEARCH METHODOLOGY LESSON 2: STEPS IN RESEARCH PROCESS Students. As opposed to exploratory research. when. seek to interview those who are knowledgeable and who might be able to provide you the insight concerning the relationship among variables. focus groups. exploratory research studies would not try to acquire a representative sample. longitudinal studies are not necessarily representative since many people may refuse to participate because of the commitment required. Research process involves important steps• Problem definition • Research proposal • Research Design • Data Collection • Data Analysis & interpretation • Report writing • Interpretation of Research You can classify Research in one of three categories: 1. people surveyed. determine the proportion of the population that uses a product. and case studies.
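A longitudinal panel of the kind described above can be summarised as a brand-switching (transition) table, which is exactly the behaviour a cohort or panel design lets you monitor. The panel data and brand names below are invented purely for this sketch:

```python
from collections import Counter

# Hypothetical panel: the same five respondents report the brand they
# bought in period 1 and in period 2 (repeated, longitudinal measurement).
panel = [
    ("A", "A"), ("A", "B"), ("B", "B"), ("B", "B"), ("C", "A"),
]

# Count transitions: switching[(old, new)] = number of respondents
# who moved from brand `old` to brand `new` between the two periods.
switching = Counter(panel)

# Share of respondents who stayed loyal to their period-1 brand.
loyal = sum(n for (old, new), n in switching.items() if old == new)
loyalty_rate = loyal / len(panel)

print(switching)     # transition counts, e.g. ('B', 'B') occurs twice
print(loyalty_rate)  # 0.6
```

A cross-sectional study, by contrast, would only give the market shares at one point in time; the off-diagonal cells of this table (the switchers) are what the longitudinal design adds.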

The Research Process
1. Discover the management dilemma
2. Define the management question
3. Define the research question(s), with exploration and redefinition of the research questions as needed
4. Research proposal
5. Research design: design strategy (type, purpose, time frame, scope, environment) and data-collection design (sampling design, question and instrument design)
6. Instrument testing and revision
7. Data collection and preparation
8. Data analysis and interpretation
9. Research reporting
10. Management decision

Steps 1 to 5 constitute research planning; data gathering follows; and the process closes with analysis, interpretation, and reporting.

The specification of units to be studied b. • Behaviour Primary data can be obtained by Communication . primary and secondary. such as attitudes. While useful. the researcher might study 500 consumers in a certain geographical area. statement of intent.for example. The specification of the kind of information to be sought. it may be the internal proposal or it may be the external proposal. We have identified the alternative choices but not completely specified the problem. The quality of data will greatly affect the conclusions and hence. why. Observation is less versatile than communication since some attributes of a person may not be readily observable. Let us suppose that we want to know which of the two methods can be employed Method I or Method II. For data to be useful.a person’s motives are more stable than his/her behavior. There are three aspects of research problem a. This method is versatile. What would you like to know if information is free and without error? A complete answer to this question defines the initial research problem.. It can be redefined later if some difficulty arises. which is collected by the investigator himself for the purpose of a specific inquiry or study. The identification of the particular units within the scope of study c. prospectus. etc. 12 © Copy Right: Rai University 11. brand awareness • Intentions . knowledge.Involves the recording of actions and is performed by either a person or some mechanical or electronic device. If you are working in a company. The researcher should be clear with the alternative choices he has. Observation . outline. how. race. and to whom it will be done. We will discuss the research proposal in detail in lesson 3. It must be ascertained that the group contains people representing variables such as income level. Remember your problem statement should be specific. But research proposal is not required in case of research studies for P. 3. • Motivation . 
Research Design Data Collection-Types and Sources RESEARCH METHODOLOGY Once the researcher has decided the ‘Research Design’. purchase intentions. Research Proposal Research proposal are necessary for all business research. the next job is of data collection. while gathering and collecting data. so that all relevant groups are represented in the data. the problem is assign by the top management. Should other Methods be also considered? Let us assume that the alternative choices are clearly specified: either II or I be employed. Or if you are doing some research project then you have to identify your problem statement of your own. statistical data can be classified into two categories.Involves questioning respondents either verbally or in writing. Then with the broad concept from the top management you define the specific problem statement. To discuss the research efforts of others who have worked on related management questions. Primary Data Primary data is one.for example. Communication usually is quicker and cheaper than observation. the response may not be accurate. Depending upon the sources utilized. To present the management question to be researched and its importance 2. To determine the potential market for a new product.556 . Observation typically is more accurate than communication. treated. Such data is original in character and is generated by surveys conducted by individuals or research institutions. The proposal of research is – 1. A proposal is known as a work plan. To suggest the data necessary for solving the management question and how the data will be gathered. for example. however. usually you get broad ideas regarding the Problem. our observations need to be organized so that we can get some patterns and come to logical conclusions. The complete problem is concerned with the criterion that will determine the superiority of the two methods. and interpreted. The criterion could be: • Cost • Efficiency of materials • Availability of resources. 
education and neighborhood. though observation using scanner data might be quicker and more cost effective. or draft plan. intentions. Observation also might take longer since observers may have to wait for appropriate events to occur. utmost importance must be given to this process and every possible precaution should be taken to ensure accuracy. where. since you need only to ask for the information.Problem Definition First of all one should be clear as what exactly the problem is. awareness. intentions are not a reliable indication of actual future behavior. whether the data has come from actual observations or from records that are kept for normal purposes. so motive is a better predictor of future behavior than is past behavior. Some common types of primary data are: • Demographic and socioeconomic characteristics • Psychological and lifestyle characteristics • Attitudes and opinions • Awareness and knowledge . Statistical investigation requires systematic collection of data. The proposal tells us what. and motivation. hd. or paper presentation as concerned.

Concealed tape recorder with the investigator helps to determine typical sales arguments and find out sales enthusiasm shown by various salesmen. To evaluate the effectiveness of display of the Dunlop cushions in a departmental store. in a personal interview the respondent’s perception of the interviewer may affect the responses Questionnaire . categories used. Competitor’s actions. • Objective of the original data collection. What is the best method for training salesmen? ii. What is the effectiveness of a point-of-purchase display? v. Their income and education is also not known. an observer notes: • How many pass by? • How many stopped to look at the display? • How many decide to buy? RESEARCH METHODOLOGY iii. sales incentive plan. What package design should be used? vi.The questionnaire is an important tool for gathering primary data. Data Collection Procedure for Primary Data Planning the Study Since the quality of results gained from statistical data depends upon the quality of information collected. customer response. • Then. so significant effort should be put into the Questionnaire Secondary Data When an investigator uses the data. and questionnaire design. etc. For example. • Specifications and methodologies used. A Control group is a group equivalent to the experimental group and differing only in not receiving any treatment. go to a service station and observe ii. sales territories. display. There are several criteria that you should use to evaluate secondary data. the experimental units may be consumers. Experimental method may be used in the following situations. • Nature of the data. in cooperative dealers. he can get the required information or data from the records of the meteorology department. of interest. This data is primary data for the agency that collects it and becomes secondary data for someone else who uses this data for his own purposes. The result/ response of a marketing experiment will be in the form of sales. 
Service Stations: Pose as a customer. if a researcher desires to analyze the weather conditions of different regions. flavor. their buying motives. What is the best shelf arrangement for displaying a product? iv. including definition of variables. it must be edited so that errors can be corrected or Observation Process Observing the process at work collects information. color shape. etc. iv. and relationships examined. The secondary data can be obtained from journals.556 omitted. Poorly constructed questions can result in large errors and invalidate the research data. weather changes. are environmental factors. • How current the data is and whether it applies to time period i. sample size and sampling technique. By this method. government publications. observation or by surveying the opinions of customers or experts. raw data must be transformed into the right format. This requires a high degree of skill and also certain precautionary measures may have to be taken. The method can be used to study sales techniques. attitudes or behaviour. What is the best remuneration plan for salesmen? iii. It also takes time for the investigator to wait for particular sections to take place. their images are not revealed. Modes of Data Collection Following are widely used methods for collection of primary data: • Observation • Experimentation • Questionnaire • Interviewing • Case Study Method Data Analysis–Preliminary Steps • Before the analysis can be performed. publication of professional and research organization and so on.Which version of a product would consumers like best? In a marketing experiment. i. such data is called secondary data. 11. For example. What media are the most effective? viii. response bias is eliminated. packaging. including data collection method. customer movements. etc. Super Market: What is the best location in the shelf? Hidden cameras are used.\Factors or marketing variables under the control of the researcher which can be studied are price. 
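The preliminary data-preparation sequence described above — edit the raw responses so errors are corrected or omitted, code them, then tabulate — can be illustrated in a few lines. The response categories and numeric codes here are invented for the sketch:

```python
from collections import Counter

# Hypothetical codebook: maps each edited verbatim response to a numeric code.
codebook = {"yes": 1, "no": 2, "don't know": 9}

raw = ["Yes", "no", "YES", "dont know", "no"]

def edit(response):
    """Editing step: normalise case and spelling so responses match the codebook."""
    return response.strip().lower().replace("dont", "don't")

coded = [codebook[edit(r)] for r in raw]  # coding step
tabulation = Counter(coded)               # simple tabulation: counts per category

print(coded)       # [1, 2, 1, 9, 2]
print(tabulation)  # Counter({1: 2, 2: 2, 9: 1})
```

In a real study the codebook would also record the question wording and the meaning of each code, so the coded file can be interpreted later without the raw forms.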
Experimental Method Many of the important decisions facing the marketing executives cannot be settled by secondary research. response rate. reports. Or sometimes a control group is set up. To study the effect of the marketing variables in the presence of environmental factors. a sufficiently large sample should be used. • Errors and accuracy-whether the data is dependable and can be verified. Which copy is the most effective? vii. quality and analysis of the data. However. • Presence of bias in the data. which has already been collected by others. © Copy Right: Rai University 13 . The following are a few examples.Personal interviews: have an interviewer bias that mail-in questionnaires do not have. it is important that a sound investigative process be established to ensure that the data is highly representative and unbiased. units of measure. • Whether the data is useful in the research study. stores. the customer’s/ consumer’s state of mind.

The edited raw data must then be coded: coding converts the edited raw data into numbers or symbols, and a codebook is created to document how the data was coded. Finally, the data is tabulated to count the number of samples falling into the various categories. Simple tabulations count the occurrences of each variable independently of the other variables. Cross tabulations, also known as contingency tables or cross tabs, treat two or more variables simultaneously, since the variables are arranged in a two-dimensional table. Cross tabulation is the most commonly utilized data analysis method in marketing research, and many studies take the analysis no further. It can be performed for nominal and ordinal variables. The technique divides the sample into sub-groups to show how the dependent variable varies from one sub-group to another, and a third variable can be introduced to uncover a relationship that initially was not evident. Cross tabbing more than two variables is difficult to visualize, since more than two dimensions would be required.

Conjoint Analysis
The conjoint analysis is a powerful technique for determining consumer preferences for product attributes.

Tests of Statistical Significance
In order to analyze whether research results are statistically significant or simply due to chance, a test of statistical significance can be run.

Hypothesis Testing
A basic fact about testing hypotheses is that a hypothesis may be rejected, but it can never be unconditionally accepted until all possible evidence is evaluated. In the case of sampled data, the information set cannot be complete; so if a test using such data does not reject a hypothesis, the conclusion is not necessarily that the hypothesis should be accepted. The null hypothesis, expressed as H0, is assumed to be true unless proven otherwise. In an experiment, the null hypothesis is that the independent variable has no effect on the dependent variable, and the alternative is that it does have an effect. The alternative — also known as the research or experimental hypothesis, and expressed as H1 — states that the relationship observed between the variables cannot be explained by chance alone.

Hypothesis testing involves the following steps:
• Formulate the null and alternative hypotheses.
• Choose the appropriate test.
• Choose a level of significance (alpha) and determine the rejection region.
• Gather the data and calculate the test statistic.
• Determine the probability of the observed value of the test statistic under the null hypothesis, given the sampling distribution that applies to the chosen test.
• Compare the value of the test statistic to the rejection threshold.
• Based on the comparison, reject or do not reject the null hypothesis.
• Make the marketing research conclusion.

There are two types of errors in evaluating hypotheses:
• Type I error: occurs when one rejects the null hypothesis and accepts the alternative, when in fact the null hypothesis is true.
• Type II error: occurs when one accepts the null hypothesis when in fact it is false.
Because their names are not very descriptive, these types of errors sometimes are confused; some people jokingly define a Type III error as occurring when one confuses Type I and Type II. To illustrate the difference, consider a trial by jury in which the null hypothesis is that the defendant is innocent. If the jury convicts a truly innocent defendant, a Type I error has occurred. If, on the other hand, the jury declares a truly guilty defendant to be innocent, a Type II error has occurred.

Chi-Square Goodness-of-Fit Test
The chi-square goodness-of-fit test is used to determine whether a set of proportions have specified numerical values, and it often is used to analyze bivariate cross-tabulated data. The test is performed by defining k categories and observing the number of cases falling into each category. Knowing the expected number of cases in each category, one can define chi-squared as:

χ² = Σ (Oi − Ei)² / Ei

where Oi = the number of observed cases in category i, Ei = the number of expected cases in category i, k = the number of categories, and the summation runs from i = 1 to i = k.

Some examples of situations that are well suited for this test are:
• A manufacturer of packaged products test markets a new product and wants to know if sales of the new product will be in the same relative proportion of package sizes as sales of existing products.
• A company's sales revenue comes from Product A (50%), Product B (30%), and Product C (20%). The firm wants to know whether recent fluctuations in these proportions are random or whether they represent a real shift in sales.
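The Product A/B/C situation can be worked through directly from the chi-square formula. The observed counts below are invented for illustration — a hypothetical sample of 1,000 customers against the historical 50/30/20 split:

```python
# Chi-square goodness-of-fit, computed from the formula
# chi2 = sum over i of (O_i - E_i)^2 / E_i, for k categories.
observed = {"A": 460, "B": 340, "C": 200}          # hypothetical sample counts
expected_share = {"A": 0.50, "B": 0.30, "C": 0.20} # historical proportions

n = sum(observed.values())
expected = {k: n * p for k, p in expected_share.items()}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1  # k - 1 degrees of freedom for a goodness-of-fit test

print(round(chi2, 2))   # 8.53
# With df = 2, the 0.05 critical value from a chi-square table is 5.99,
# so the null hypothesis (sales still split 50/30/20) is rejected
# at the 5% level: the shift does not look like random fluctuation.
```

In practice the same test is available as a library call (for example `scipy.stats.chisquare`), which also returns the p-value instead of requiring a table look-up.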

Before calculating the chi-square value, one needs to determine the expected frequency for each cell; in the simplest case this is done by dividing the number of samples by the number of cells in the table. To use the output of the chi-square function, one uses a chi-square table, for which one needs to know the number of degrees of freedom (df). For chi-square applied to cross-tabulated data, the number of degrees of freedom is equal to (number of columns − 1)(number of rows − 1); for the goodness-of-fit test it is equal to the number of categories minus one. If the calculated output value from the function is greater than the chi-square look-up table value, the null hypothesis is rejected. The conventional critical level of 0.05 normally is used.

ANOVA
Another test of significance is the Analysis of Variance (ANOVA) test. The primary purpose of ANOVA is to test for differences between multiple means. Whereas the t-test can be used to compare two means, ANOVA is needed to compare three or more means (although it also can be used to compare two). If multiple t-tests were applied instead, the probability of a Type I error (rejecting a true null hypothesis) would increase as the number of comparisons increases. One-way ANOVA examines whether multiple means differ; two-way ANOVA allows for a second independent variable and addresses interaction.

To run a one-way ANOVA, use the following steps:
1. Identify the independent and dependent variables.
2. Describe the variation by breaking it into three parts: the total variation, the portion that is within groups, and the portion that is between groups (or among groups for more than two groups). The total variation (SStotal) is the sum of the squares of the differences between each value and the grand mean of all the values in all the groups. The in-group variation (SSwithin) is the sum of the squares of the differences between each element's value and its group mean. The variation between group means (SSbetween) is the total variation minus the in-group variation (SStotal − SSwithin).
3. Measure the difference between each group's mean and the grand mean.
4. Perform a significance test on the differences.

ANOVA calculates the ratio of the variation between groups to the variation within groups (the F ratio), and the test is called an F-test. The F-test assumes that the group variances are approximately equal and that the observations are independent. It also assumes normally distributed data; however, since this is a test on means, the Central Limit Theorem holds as long as the sample size is not too small. ANOVA is efficient for analyzing data using relatively few observations and can be used with categorical variables. Note that regression can perform a similar analysis to that of ANOVA.

Discriminant Analysis
Analysis of the difference in means between groups provides information about individual variables, but it is not useful for determining their individual impacts when the variables are used in combination. Since some variables will not be independent from one another, one needs a test that can consider them simultaneously in order to take their interrelationship into account. One such test is to construct a linear combination, essentially a weighted sum of the variables. To determine which variables discriminate between two or more naturally occurring groups, discriminant analysis is used; it can determine which variables are the best predictors of group membership. Discriminant analysis analyzes the dependency relationship, whereas factor analysis and cluster analysis address the interdependency among variables.
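The ANOVA variance decomposition described above (SStotal = SSbetween + SSwithin) can be verified numerically. The three small groups below are invented to keep the arithmetic visible:

```python
# One-way ANOVA by hand, following the SStotal / SSwithin / SSbetween
# decomposition described in the text.
groups = [
    [4.0, 5.0, 6.0],  # hypothetical observations, group 1
    [7.0, 8.0, 9.0],  # group 2
    [1.0, 2.0, 3.0],  # group 3
]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# SStotal: squared differences of every value from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# SSwithin: squared differences of each value from its own group's mean.
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# SSbetween: the remainder (SStotal - SSwithin).
ss_between = ss_total - ss_within

# F ratio: between-group to within-group variation, each divided by its
# degrees of freedom (k - 1 between, N - k within).
k, n = len(groups), len(all_values)
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))

print(ss_total, ss_within, ss_between, f_ratio)  # 60.0 6.0 54.0 27.0
```

A large F ratio (here 27, far above the 0.05 critical value of about 5.1 for 2 and 6 degrees of freedom) indicates that the group means differ by more than the within-group scatter can explain.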
Discriminant analysis determines which groups differ with respect to the mean of a variable, and then uses that variable to predict new cases of group membership. In this sense the discriminant function problem is a one-way ANOVA problem, in that one can determine whether multiple groups are significantly different from one another with respect to the mean of a particular variable.

A discriminant analysis consists of the following steps:
1. Formulate the problem.
2. Determine the discriminant function coefficients that result in the highest ratio of between-group variation to within-group variation.
3. Test the significance of the discriminant function.
4. Interpret the results.
5. Determine the validity of the analysis.

Factor Analysis
Factor analysis is a very popular technique to analyze interdependence. It studies the entire set of interrelationships without defining variables to be dependent or independent. Factor analysis combines variables to create a smaller set of factors, reducing the number of variables to a more manageable set; it groups variables according to their correlation and so identifies underlying structure among the variables. Mathematically, a factor is a linear combination of variables. A factor is not directly observable; it is inferred from the variables. The factor loading can be defined as the correlation between a factor and its underlying variable, and the factor-loading matrix is a key output of the factor analysis. An example matrix is shown below.

                           Factor 1   Factor 2   Factor 3
  Variable 1
  Variable 2
  Variable 3
  Column's sum of squares:

Each cell in the matrix represents the correlation between the variable and the factor associated with that cell. The square of this correlation represents the proportion of the variation in the variable explained by the factor.
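One common way to extract such a loading matrix is the principal-component method: take the eigen-decomposition of the correlation matrix and scale each eigenvector by the square root of its eigenvalue. The two perfectly correlated variables below are invented to keep the arithmetic obvious; this is a sketch of the extraction idea only, not a full factor-analysis routine (no rotation, no communality iteration):

```python
import numpy as np

# Hypothetical standardised data: two variables that move together exactly.
x = np.array([-1.5, -0.5, 0.5, 1.5])
data = np.column_stack([x, 2 * x])      # variable 2 = 2 * variable 1

corr = np.corrcoef(data, rowvar=False)  # 2 x 2 correlation matrix

# Principal-component extraction: eigen-decomposition of the correlation
# matrix; loading = eigenvector * sqrt(eigenvalue).
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]       # put the largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))

# With perfectly correlated variables, the first factor explains all the
# variance: its eigenvalue (the column sum of squared loadings) is 2,
# and each variable's squared loading on factor 1 is 1.
print(np.round(eigvals, 6))             # eigenvalues, largest first (close to [2, 0])
print(np.round(loadings[:, 0] ** 2, 6)) # squared loadings on factor 1
```

This also illustrates the eigenvalue-greater-than-one rule discussed in the text: with two variables, a factor with eigenvalue 2 explains as much variance as both variables combined, while a factor with eigenvalue near 0 would be dropped.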

Cluster Analysis

Market segmentation usually is based not on one factor but on multiple factors. The challenge is to find a way to combine variables so that relatively homogenous clusters can be formed; cluster analysis is one way to accomplish this goal. Rather than being a statistical test, it is more a collection of algorithms for grouping objects, or, in the case of marketing research, grouping people. Cluster analysis is useful in the exploratory phase of research when there are no a-priori hypotheses. The clusters formed should be internally homogenous and externally heterogeneous: they should be well separated and, ideally, distinct enough to be given descriptive names such as professionals, buffs, etc.

Cluster Analysis Steps
1. Formulate the problem, collecting data and choosing the variables to analyze.
2. Choose a distance measure. The most common is the Euclidean distance; other possibilities include the squared Euclidean distance, the Chebychev distance, the city-block (Manhattan) distance, and percent disagreement.
3. Choose a clustering procedure (linkage, nodal, or factor procedures).
4. Determine the number of clusters. Initially, each variable represents its own cluster.
5. Profile the clusters.
6. Assess the validity of the clustering.
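The distance measures listed in step 2 can be illustrated for two hypothetical respondents scored on three segmentation variables (all numbers invented):

```python
# Two hypothetical respondents measured on three segmentation variables
a = [2.0, 4.0, 6.0]
b = [5.0, 8.0, 6.0]

diffs = [x - y for x, y in zip(a, b)]

# Euclidean distance: square root of the summed squared differences
euclidean = sum(d ** 2 for d in diffs) ** 0.5

# Squared Euclidean distance: the same sum without the square root
squared_euclidean = sum(d ** 2 for d in diffs)

# City-block (Manhattan) distance: summed absolute differences
city_block = sum(abs(d) for d in diffs)

# Chebychev distance: the largest single-variable difference
chebychev = max(abs(d) for d in diffs)

print(euclidean, squared_euclidean, city_block, chebychev)  # 5.0 25.0 7.0 4.0
```

Which measure is appropriate depends on the scales of the variables; respondents judged close under one measure need not be close under another.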
Factor Rotation

To facilitate interpretation, the axis can be rotated. Rotation of the axis is equivalent to forming linear combinations of the factors. A commonly used rotation strategy is the varimax rotation, which attempts to force the column entries to be either close to zero or close to one.

Choosing the Number of Factors

The sum of the squares of the factor loadings in each column is called an eigen value; an eigen value represents the amount of variance in the original variables that is associated with that factor. The communality is the amount of a variable's variance explained by the common factors. A rule of thumb for deciding on the number of factors is that each included factor must explain at least as much variance as does an average variable; in other words, only factors for which the eigen value is greater than one are used. Other criteria for determining the number of factors include the Scree-plot criterion and the percentage-of-variance criterion.

Data Analysis and Interpretation of Results

Once collected, data are processed in order to summarise the results. Analysis seeks to determine how the units covered in the research project respond to the items under investigation; it could be univariate, bivariate, or multivariate. Interpretation is the "so what?" of research: research is wasted if it is not used in decision making or in influencing actions. Not only should the results be interpreted into action recommendations, but the recommendations must also be communicated in an understandable manner, and results should be presented in as simple a manner as possible.

The format of the marketing research report varies with the needs of the organization. The report often contains the following sections:
• Authorization letter for the research
• Table of contents
• List of illustrations
• Executive summary
• Research objectives
• Methodology
• Results
• Limitations
• Conclusions and recommendations
• Appendices containing copies of the questionnaires, etc.

Basic Requirements

1. The researcher should understand the kinds of questions research can handle and the type of structure required to make a problem "researchable".
2. The researcher must understand the proper interpretation of research results and the assumptions embodied in them.
3. The researcher must be capable of appraising the feasibility of research proposals.

Special Case: Marketing Research

In management, the bulk of research is done in the field of marketing. We conceptualize the marketing research process as consisting of six steps; we will take up each step in detail later, but first let us discuss them briefly.

Step 1: Problem Definition
The first step in any marketing research project is to define the problem. In defining the problem, you should take into account:
• The purpose of the study
• The relevant background information
• The information needed
• How it will be used in decision-making

Problem definition involves discussion with the decision makers, interviews with industry experts, analysis of secondary data and, perhaps, some qualitative research, such as focus groups. Only once the problem of investigation has been precisely defined can the research be designed and conducted properly.

Step 2: Development of an Approach to the Problem
Development of an approach to the problem includes:
• Formulating an objective or theoretical framework
• Analytical models
• Research questions
• Hypotheses and identifying the information needed
This process is guided by discussions with management and industry experts, analysis of secondary data, qualitative research, and pragmatic considerations.

Step 3: Research Design Formulation
A research design is a framework or blueprint for conducting the marketing research project. It details the procedures necessary for obtaining the required information, and its purpose is to design a study that will test the hypotheses of interest, determine possible answers to the research questions, and provide the information needed for decision making. Conducting exploratory research, precisely defining the variables, and designing appropriate scales to measure them are also part of the research design. The issue of how the data should be obtained from the respondents (for example, by conducting a survey or an experiment) must be addressed, and it is also necessary to design a questionnaire and a sampling plan to select respondents for the study. More formally, formulating the research design involves the following steps:
1. Definition of the information needed
2. Secondary data analysis
3. Qualitative research
4. Methods of collecting quantitative data (survey, observation, and experimentation)
5. Measurement and scaling procedures
6. Questionnaire design
7. Sampling process and sample size
8. Plan of data analysis

Step 4: Fieldwork or Data Collection
Data collection involves a field force or staff that operates either in the field, as in the case of personal interviewing (in-home, mall intercept, or computer-assisted personal interviewing), from an office by telephone (telephone or computer-assisted telephone interviewing), through mail (traditional mail and mail panel surveys with pre-recruited households), or electronically (e-mail or the Internet). Proper selection, training, supervision, and evaluation of the field force help minimize data-collection errors.

Step 5: Data Preparation and Analysis
Data preparation includes the editing, coding, transcription, and verification of data. Each questionnaire or observation form is inspected or edited and, if necessary, corrected. Number or letter codes are assigned to represent each response to each question in the questionnaire. The data from the questionnaires are transcribed or key-punched onto magnetic tape or disks, or input directly into the computer. The data are analyzed to derive information related to the components of the marketing research problem and thus provide input into the management decision problem.

Step 6: Report Preparation and Presentation
The entire project should be documented in a written report that addresses the specific research questions identified; describes the approach, the research design, data collection, and data analysis procedures adopted; and presents the results and the major findings. The findings should be presented in a comprehensible format so that management can readily use them in the decision-making process. In addition, an oral presentation should be made to management using tables, figures, and graphs to enhance clarity and impact.

References and Further Readings
1. Boyd, Westfall and Stasch, "Marketing Research Text and Cases", All India Traveller Bookseller, New Delhi.
2. Brown, F.E., "Marketing Research: A Structure for Decision Making", Addison-Wesley Publishing Company.
3. Kothari, C.R., "Research Methodology: Methods and Techniques", Wiley Eastern Ltd., New Delhi.
4. Stockton and Clark, "Introduction to Business and Economic Statistics", D.B. Taraporevala Sons and Co. Private Limited, Bombay.
5. Green, Paul E. and Donald S. Tull, "Research for Marketing Decisions", Prentice Hall of India.
6. Dunn, Olive Jean and Virginia A. Clark, "Applied Statistics", John Wiley and Sons.

Activity 1
What does the definition of the problem comprise?

Activity 2
Explain the importance of the interpretation of results.

Activity 3
Name and briefly discuss the five steps of the research process.

Activity 4
Explain, with the help of a suitable example, the need for introducing two types of environmental conditions in a research problem.
Clue: The environmental conditions specified in the research problem are of two types: (i) those beyond the firm's control, and (ii) those within the firm's control.

Activity 5
A local supermarket has experienced a decline in unit sales and little change in rupee-value sales; profits have almost vanished. The chief executive, searching for ways to revitalize the operation, was advised to increase the number of hours the market is open for business. He comes to you for advice in structuring a research problem that will provide relevant information for decision-making. Define the problem, taking care to:
a. State the relevant question.
b. Enumerate the alternative answers.
c. Clearly define the units of analysis and the characteristics of interest.
What are the relevant "states of nature" that would lead to the selection of each alternative answer?

Activity 6
A shampoo manufacturing company wishes to test two types of shampoo in order to determine which is the best one.
a. Propose and defend a precise definition of "best".
b. What is the set of hypotheses that should be tested?
c. What action would be associated with each hypothesis?

Activity 7
Define and state the research problem for the following case: "Why is productivity in Japan so much higher than in India?" Think about the problem in a broader sense, and then narrow it down to a research problem.

LESSON 3: RESEARCH PROPOSAL

Dear friends, after completion of this lesson you will be able to:
• Prepare an internal research proposal
• Prepare an external research proposal

The management question starts the research task. Privately and publicly held firms are concerned with how to solve a particular problem, make a decision, or improve an aspect of their business; seldom do businesses begin research studies for other reasons. In general, business proposals can be divided between those generated internally and those generated externally. An internal proposal is done for the corporation by staff specialists or by the research department of the firm. External proposals are either solicited or unsolicited.

Depending on the type of project and the sponsoring individual or institution, different levels of complexity are required for a proposal to be judged complete. There are three general levels. The exploratory study is the first and most simple business proposal: an exploratory study done within a manager's department may need merely a one- to three-page memo outlining the objectives, approach, and time allotted to the project. More complex, and common in business, is the small-scale study, either an internal study or an external contract research project. At the other extreme, government agencies demand the most complex proposals for their funding analyses. In general, the larger the project, the more complex is the proposal; as we move toward government-sponsored research, the complexity is generally greater than in a comparable private-sector proposal, and in public-sector work particular attention must be paid to each specification in the RFP. Now let us discuss the difference between internal and external proposals.

Internal Proposals
Internal proposals are a memo from the researcher to management outlining the problem statement, study objectives, research design, schedule, and budget. Since small projects are sponsored by managers familiar with the problem, the literature review and bibliography are not stressed and can often be stated briefly in the research design; managers will typically leave this detail for others. Schedules and budgets are necessary for funds to be committed. For smaller-scale projects, descriptions are not required for facilities and special resources, nor is there a need for a glossary; the associated jargon, requirements, and definitions should be included directly in the text. In the small-scale proposal, the measuring instrument and project management modules are not required: a problem statement, objectives, design, schedule, and budget are enough to start an exploratory study.

External Proposals
An external proposal is either solicited or unsolicited. A solicited proposal is often in response to a request for proposal (RFP), and it is likely competing against several others for a contract or grant. An unsolicited proposal has the advantage of not competing against others but the disadvantage of having to speculate on the ramifications of a problem facing the firm's management; even more difficult, the writer of an unsolicited proposal must decide to whom the document should be sent. Sponsors can be university grant committees, government agencies, government contractors, corporations, and so forth. In contract research, the results and objectives sections are the standards against which the completed project is measured. The most important sections of the external proposal are the objectives, design, qualifications, schedule, and budget. As the complexity of the project increases, more information is required about project management and the facilities and special resources.

Contents of a Research Proposal

1. Executive Summary
The executive summary allows a busy manager or sponsor to understand quickly the thrust of the proposal. It is essentially an informative abstract, giving executives the chance to grasp the essentials of the proposal without having to read the details. The goal of the summary is to secure a positive evaluation by the executive who will pass the proposal on to the staff for a full evaluation. As such, the executive summary should include brief statements of the management dilemma and management question, the research objectives/research question(s), and the benefits of your approach. If the proposal is unsolicited, a brief description of your qualifications is also appropriate. The executive summary of an external proposal may be included within the letter of transmittal. Since management insists on brevity, an executive summary is mandatory for all but the most simple of proposals (projects that can be proposed in a two-page memo do not need an executive summary).

2. Problem Statement
This section needs to convince the sponsor to continue reading the proposal. You should capture the reader's attention by stating the management dilemma, its background, its consequences, and the resulting management question. The importance of researching the management question should be emphasized here if a separate module on the importance/benefits of the study is not included later in the proposal. In addition, this section should include any restrictions or areas of the management question that will not be addressed.

Problem statements too broadly defined cannot be addressed adequately in one study. It is important that the management question be distinct from related problems and that the sponsor see the delimitations clearly. Be sure your problem statement is clear without the use of idioms or cliches. Return to the analysis of the problem and ensure, through additional discussions with your sponsor or your research team, or by a reexamination of the literature, that you have captured the essence of the problem.

3. Research Objectives
This module addresses the purpose of the investigation; it is here that you lay out exactly what is being planned by the proposed research. In a descriptive study, the objectives can be stated as the research question; recall that the research question can be further broken down into investigative questions. If the proposal is for a causal study, then the objectives can be restated as a hypothesis. The objectives module flows naturally from the problem statement, giving the sponsor specific, concrete, and achievable goals. It is best to list the objectives either in order of importance or in general terms first, moving to specific terms (i.e., the research question followed by the underlying investigative questions). The research questions (or hypotheses, if appropriate) should be set off from the flow of the text so they can be found easily. The research objectives section is the basis for judging the remainder of the proposal and, ultimately, the final report. Verify the consistency of the proposal by checking to see that each objective is discussed in the research design, data analysis, and results sections.

4. Literature Review
The literature review section examines recent (or historically significant) research studies, company data, or industry reports that act as a basis for the proposed study. Begin your discussion of the related literature and relevant secondary data from a comprehensive perspective, moving to more specific studies that are associated with your problem. If the problem has a historical background, begin with the earliest references. Avoid the extraneous details of the literature; do a brief review of the information, not a comprehensive report. Always refer to the original source: if you find something of interest in a quotation, find the original publication and ensure you understand it, so that you avoid any errors of interpretation or transcription. Emphasize the important results and conclusions of other studies, the relevant data and trends from previous research, and particular methods or designs that could be duplicated or should be avoided. Discuss how the literature applies to the study you are proposing; show the weaknesses or faults in the design, discussing how you would avoid similar problems. If your proposal deals solely with secondary data, discuss the relevance of the data and the bias or lack of bias inherent in it. The literature review may also explain the need for the proposed work to appraise the shortcomings and informational gaps in secondary data sources. This analysis may go beyond scrutinizing the availability or conclusions of past studies and their data to examining the accuracy of secondary sources, the credibility of these sources, and the appropriateness of earlier studies.

5. Importance/Benefits of the Study
This section allows you to describe the explicit benefits that will accrue from your study. The importance of "doing the study now" should be emphasized. Usually, this section is not more than a few paragraphs; if you find it difficult to write, then you have probably not understood the problem adequately. This section also requires you to understand what is most troubling to your sponsor. If it is a potential union activity, you cannot promise that an employee survey will prevent unionization; you can, however, show the importance of this information and its implications, since this benefit may allow management to respond to employee concerns and forge a linkage between those concerns and unionization. The importance/benefits section is particularly important to the unsolicited external proposal.

6. Research Design
Up to now, you have told the sponsor what the problem is, what your study goals are, and why it is important for you to do the study. The design module describes what you are going to do in technical terms, and it should include as many subsections as needed to show the phases of the project. Provide information on your proposed design for tasks such as sample selection and size, data collection method, instrumentation, procedures, and ethical requirements. When more than one way exists to approach the design, discuss the methods you rejected and why your selected approach is superior.

7. Data Analysis
A brief section on the methods used for analyzing the data is appropriate for large-scale contract research projects and doctoral theses; with smaller projects, the proposed data analysis would be included within the research design section. Describe your proposed treatment of the data and the theoretical basis for using the selected techniques. The object of this section is to assure the sponsor that you are following correct assumptions and using theoretically sound data analysis procedures. One should also specify the types of data to be obtained and the interpretations that will be made in the analysis. By the use of sample charts and dummy tables, you can make it easier to understand your data analysis. The data analysis section is important enough to contract research that you should contact an expert to review the latest techniques available for your use; if there is no statistical or analytical expertise within your company, be prepared to hire a professional to help with this activity. If the data are to be turned over to the sponsor for proprietary reasons, make sure this is reflected here.

8. Nature and Form of Results
Upon finishing this section, the sponsor should be able to go back to the problem statement and research objectives and discover that each goal of the study has been covered.

This section also contains the contractual statement telling the sponsor exactly what types of information will be received. Statistical conclusions, applied findings, recommendations, action plans, models, strategic plans, and so forth are examples of the forms of results. If the report will go to more than one sponsor, that should be noted.

9. Qualifications of Researchers
This section should begin with the principal investigator. It is customary to begin qualifications with the highest academic degree held. Experience in carrying out previous research is important, especially in the corporate marketplace, so a concise description of similar projects should be included. Also important to business sponsors is experience as an executive or employee of an organization involved in a related field; often, businesses are reluctant to hire individuals to solve operational problems if they do not have practical experience. The entire curriculum vitae of each researcher should not be included unless required by the RFP. Instead, refer to the relevant areas of experience and expertise that make the researchers the best selection for the task. In addition, relevant business and technical societies to which the researcher belongs can be included where this information is particularly relevant to the research project.

10. Budget
The budget should be presented in the form the sponsor requests. For example, some organizations require secretarial assistance to be individually budgeted, whereas others insist it be included in the research director's fees or the overhead of the operation. Typically, the budget should be no more than one to two pages. Additional information, backup details, quotes from vendors, and hourly time and payment calculations should be put into an appendix if required, or kept in the researcher's file for future reference. The budget statement in an internal research proposal is based on employee and overhead costs. The budget presented by an external research organization is not just the wages or salaries of its employees but the person-hour price that the contracting firm charges; the budget section of an external agency's proposal states the total fee payable for the assignment. When it is accompanied by a proposed schedule of payment, this is frequently detailed in a purchase order. Sometimes a retainer is scheduled for the beginning of the contract, then a percentage at an intermediate stage, and the balance on completion of the project; alternatively, research payments can be divided and paid at stages of completion. Unlike most product sale environments, one reason why external research agencies avoid giving detailed budgets is the possibility that disclosure of their costing practices will make their calculations public knowledge, reducing their negotiating flexibility; since budget statements embody a financial work strategy that could be used by the recipient of the bid to develop an independent work plan, vendors are often doubly careful.

It is extremely important that you retain all information you use to generate your budget. If you estimate time for interviews, keep explicit notes on how you made the estimate; if you use quotes from external contractors, get the quotation in writing for your file. When the time comes to do the work, you should know exactly how much money is budgeted for each particular task. Some costs are more elusive than others: publication and delivery of final reports can be a last-minute expense that can easily be overlooked in preliminary budgets, and do not forget to build the cost of proposal writing into your fee. The detail presented may vary depending on both the sponsor's requirements and the contracting research company's policy. The diagram below shows a format that can be used for small contract research projects.

For example. Critical Path: S –1 –3-4-7-8-9-E Time to complete: 40 days 12 Facilities and Special Resources Often. It may be helpful to you and your sponsor if you chart your schedule. Many of these sources also make suggestions for successful proposal writing. clerical help. Kate L. Theses. Payment frequency and timing are also covered in the master plan. because any delay in an activity along that path will delay the end of the entire project. 13 Project Management The purpose of the project management section is to show the sponsor that the research team is organized in a way to do the project efficiently. (4) field interviews. If none is specified. Turabian. procedures for information processing. In addition. or the Publication Manual of the American Psychological Associ-ation) will provide the details necessary to prepare the bibliography. right’s to the data. You can use a Gantt chart. The plan includes • The research team’s organization • Management procedures and controls for executing the Examples of management and technical reports • Research team relationship with the sponsor • Financial and legal responsibility • Management competence Tables and charts are most helpful in presenting the master plan. Finally. their timetables. The sponsor’s limits on control during the process should be delineated. For example. record control. projects will require special facilities or resources that should be described in detail. Use the bibliographic format required by the sponsor.RESEARCH METHODOLOGY D (1) 1 A (6) C (10) F (3) Start B (5) E (3) 2 5 I (3) 6 3 4 H (8) 7 K (8) 9 M (3) G (3) J (2) End 8 L (4) 11. In a CPM chart the nodes represent major milestones.556 © Copy Right: Rai University . Also. a standard style manual (e. (5) editing and coding.. define any acronyms that you use. Schedule Your schedule should include the major phases of the project. This is a simple section consisting of terms and definitions. Achtert. 
12. Facilities and Special Resources
Often, projects will require special facilities or resources that should be described in detail. For example, a contract exploratory study may need specialized facilities for focus group sessions, or computer-assisted telephone or other interviewing facilities may be required. Details such as printing facilities, clerical help, or information-processing capabilities that are to be provided by the sponsor are also discussed here.

13. Project Management
The purpose of the project management section is to show the sponsor that the research team is organized in a way to do the project efficiently. A master plan is required for complex projects to show how all the phases will be brought together. The plan includes:
• The research team's organization
• Management procedures and controls for executing the research plan
• Examples of management and technical reports
• The research team's relationship with the sponsor
• Financial and legal responsibility
• Management competence

14. Bibliography
For all projects that require a literature review, a bibliography is necessary. Use the bibliographic format required by the sponsor. If none is specified, a standard style manual (e.g., Kate L. Turabian, A Manual for Writers of Term Papers, Theses, and Dissertations; Joseph Gibaldi and Walter S. Achtert, MLA Handbook for Writers of Research Papers; or the Publication Manual of the American Psychological Association) will provide the details necessary to prepare the bibliography. Many of these sources also make suggestions for successful proposal writing.

15. Appendices
Glossary. A glossary of terms should be included whenever there are many words unique to the research topic and not understood by the general management community. This is a simple section consisting of terms and definitions. Also, define any acronyms that you use, even if they are defined within the text.
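The critical-path rule described in the Schedule section, that the project takes as long as the longest start-to-end chain of tasks, can be sketched with a toy task network; the task names, durations, and dependencies below are invented and are not the chart from the text:

```python
from functools import lru_cache

# Hypothetical task network: name -> (duration in days, prerequisite tasks)
tasks = {
    "A": (6, []),
    "B": (5, []),
    "C": (10, ["A"]),
    "D": (1, ["A"]),
    "E": (3, ["B"]),
    "F": (3, ["C", "D"]),
    "G": (3, ["E"]),
    "H": (8, ["F", "G"]),
}

@lru_cache(maxsize=None)
def earliest_finish(name):
    # A task can finish no sooner than its own duration after the latest
    # earliest-finish among its prerequisites (a longest-path recursion).
    duration, preds = tasks[name]
    return duration + max((earliest_finish(p) for p in preds), default=0)

# Project length = the largest earliest-finish time; the chain of tasks
# that produces it is the critical path.
project_length = max(earliest_finish(t) for t in tasks)
print(project_length)  # 27
```

Here the longest chain is A, C, F, H (6 + 10 + 3 + 8 = 27 days); delaying any task on that chain delays the whole project, while B, D, E, and G have slack.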
Tables and charts are most helpful in presenting the master plan. The relationships between researchers and assistants need to be shown when several researchers are part of the team, and sponsors must know that the director is an individual capable of leading the team and being a useful liaison to the sponsor. Procedures for information processing, record control, and expense control are critical to large operations and should be shown as part of the management procedures. The type and frequency of progress reports should be recorded, so the sponsor can expect to be kept up-to-date and the researchers can expect to be left alone to do research. The sponsor's limits on control during the process should be delineated, and details such as rights to the data and the authority to speak for the researcher and for the sponsor are included. Payment frequency and timing are also covered in the master plan. Finally, proof of financial responsibility and overall management competence are provided.

Other. Any other detail that reinforces the body of the proposal can be included in an appendix: researcher vitae, budget details, and lengthy descriptions of special facilities or resources.

16. Measurement Instrument
For large projects, it is appropriate to include samples of the measurement instruments if they are available when you assemble the proposal. This allows the sponsor to discuss particular changes in one or more of the instruments. If exploratory work precedes the selection of the measurement instruments, you will not use this appendix section.

Evaluating the Research Proposal
Proposals are subjected to formal and informal reviews. Several people, each of whom is assigned to a particular section, typically review long and complex proposals. The formal method is most likely to be used for competitive government, university, or public-sector grants and also for large-scale contracts. The formal method has some variations, but its essence is as follows. Before the proposal is received, criteria are established and each is given weights or points; the proposal is then evaluated with this checklist of criteria in hand. Points are recorded for each category, reflecting the sponsor's assessment of how well the proposal meets the category's established criteria. After the review, the category scores are added to provide a cumulative total. The proposal with the highest number of points will win the contract.

In contrast to the formal method, with informal evaluation a system of points is not used and the criteria are not ranked; the process is more qualitative and impressionistic. Small-scale contracts are more prone to informal evaluation. In practice, the project needs, and thus the criteria, are well understood but not usually well documented.

Beyond the required modules, many items contribute to a proposal's acceptance and funding; there are factors that can quickly eliminate a proposal from consideration and factors that improve the sponsor's reception of the proposal. Primarily, the content discussed above must be included to the level of detail required by the sponsor, and the proposal must meet specific guidelines set by the sponsoring company or agency, including budgetary restrictions and schedule deadlines. The problem statement must be easily understood, the research design should be clearly outlined and the methodology explained, and the importance/benefits of the study must allow the sponsor to see why the research should be funded. A fourth important aspect is the technical writing style of the proposal: the proposal must be neatly presented, with its major topics easily found and logically organized so that the reviewer can page through to any section of interest. A poorly presented, unclear, or disorganized proposal will not get serious attention from the reviewing sponsors, although a proposal produced on a word processor and bound with an expensive cover will not overcome design or analysis deficiencies.
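The formal point-scoring method described above can be sketched as follows; the criteria, weights, and category scores are invented purely for illustration:

```python
# Hypothetical evaluation criteria and weights reflecting the sponsor's
# priorities (all numbers are illustrative, not from the text)
weights = {"problem statement": 0.2, "research design": 0.4,
           "qualifications": 0.2, "budget/schedule": 0.2}

# Category scores (0-10) awarded by reviewers to two competing proposals
proposals = {
    "Proposal A": {"problem statement": 8, "research design": 9,
                   "qualifications": 6, "budget/schedule": 7},
    "Proposal B": {"problem statement": 9, "research design": 6,
                   "qualifications": 8, "budget/schedule": 8},
}

# Cumulative total = sum of weight * score over all categories
totals = {
    name: sum(weights[c] * score for c, score in scores.items())
    for name, scores in proposals.items()
}

# The proposal with the highest cumulative total wins the contract
winner = max(totals, key=totals.get)
print(totals, winner)
```

Note how the weights drive the outcome: Proposal B scores higher on three of the four categories, but Proposal A wins because the sponsor weighted research design most heavily.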
RESEARCH METHODOLOGY Proposal types Proposal Modules Executive summary Problem Statement Research objectives Literature review Importance/b enefits of study Research Design Data Analysis Nature and form of results Qualification of researchers Budget Schedule Facilities & special resources Project management Bibliography Appendixes/ glossary of terms Measurement instrument Expl orat ory stud y Management Internal Small scale study √ √ √ √ √ Large scale study √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Explo ratory contra cts √ √ √ External Small scale contrac ts √ √ √ Large scale contract s √ √ √ √ √ √ Govern ment Large scale contract s √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ Student Term paper Master ’s thesis Doctor al thesis √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ 24 © Copy Right: Rai University 11. The formal method is most likely to be used for competitive government. it is appropriate to include samples of the measurement instruments if they are available when you assemble the proposal. but its essence is described as follows. each of whom is assigned to a particular sec-tion. The process is more qualitative and impressionistic. the project needs. are well understood but not usually well documented. and lengthy descriptions of special facilities or resources. If exploratory work precedes the selection of the measurement instruments you will not use this appendix section. The research design should be clearly outlined and the methodology explained. A fourth important aspect is the technical writing style of the proposal. This allows the sponsor to discuss particular changes in one or more of the instru-ments. The proposal also must meet specific guidelines set by1he sponsoring company or agency. The proposal is evaluated with a checklist of criteria in hand. Small-scale contracts are more prone to informal evaluation. the proposal must be neatly presented.
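The formal review method described above - weighted criteria, per-category points, highest cumulative total wins - can be sketched in a few lines. The criteria names, weights, and ratings below are invented for illustration, not taken from any real sponsor's checklist:

```python
# Sketch of the formal review method: weighted criteria, per-category points,
# and the highest cumulative total wins. All names, weights, and ratings are
# invented for illustration.
def score_proposal(weights, ratings):
    """Weighted sum of the category ratings for one proposal."""
    return sum(weights[c] * ratings[c] for c in weights)

weights = {"problem statement": 0.25, "research design": 0.30,
           "qualifications": 0.20, "budget/schedule": 0.25}

proposals = {
    "Proposal A": {"problem statement": 8, "research design": 7,
                   "qualifications": 9, "budget/schedule": 6},
    "Proposal B": {"problem statement": 9, "research design": 8,
                   "qualifications": 6, "budget/schedule": 8},
}

totals = {name: score_proposal(weights, ratings)
          for name, ratings in proposals.items()}
winner = max(totals, key=totals.get)   # highest cumulative total wins
```

Here Proposal B wins with 7.85 points to Proposal A's 7.40; in an informal review no such point system would be applied.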

Case Study: Research Proposal
Repair Process Satisfaction Proposal - ABC Corporation 'CompleteCare' Program

Problem Statement
ABC Corporation has recently created a service and repair program, 'CompleteCare', for its portable/laptop/notebook computers. This program promises to provide a rapid response to customers' service problems. Recent phone logs at the call center show complaints about 'CompleteCare'; it is unknown how representative these complaints are and what implications they may have for satisfaction with ABC products. ABC is currently experiencing a shortage of trained technical operators in its telephone center. The package courier, contracted to pick up and deliver customers' machines to 'CompleteCare', has provided irregular execution. ABC has also experienced parts availability problems for some machine types. Management desires information on the program's effectiveness and its impact on customer satisfaction to determine what should be done to improve the 'CompleteCare' program for ABC product repair and servicing.

Research Objectives
The purpose of this research is to discover the level of satisfaction with the 'CompleteCare' service program. Specifically, we intend to identify the component and overall levels of satisfaction with 'CompleteCare'. Components of the repair process are important targets for investigation because they reveal:
1. Which process components should be immediately improved to elevate the overall satisfaction of those ABC customers experiencing product failures, and
2. How customer tolerance levels for repair performance affect overall satisfaction.
We will also discover the importance of types of product failure on customer satisfaction levels.

Importance/Benefits
High levels of user satisfaction translate into positive word-of-mouth product endorsements. These endorsements influence the purchase outcomes for (1) friends and relatives and (2) business associates. Critical incidents, such as product failures, have the potential to either undermine existing satisfaction levels or preserve and even increase the resulting levels of product satisfaction. The outcome of the episode depends on the quality of the manufacturer's response. An extraordinary response by the manufacturer to such incidents will preserve and enhance user satisfaction levels to the point that direct and indirect benefits derived from such programs will justify their costs. This research has the potential for connecting to ongoing ABC customer satisfaction programs and measuring the long-term effects of 'CompleteCare' (and product failure incidents) on customer satisfaction.

Research Design
Exploration - Qualitative: We will augment our knowledge of 'CompleteCare' by interviewing the service manager, the call center manager, and the independent package company's account executive. Based on a thorough inventory of CompleteCare's internal and external processes, we propose to develop a mail survey.

Questionnaire Design - A self-administered questionnaire (postcard size) offers the most cost-effective method for securing feedback on the effectiveness of 'CompleteCare'. We anticipate a maximum of 10 questions. Some questions for this instrument will be based on the investigative questions we presented to you previously, and others will be drawn from the executive interviews. A new five-point expectation scale, compatible with your existing customer satisfaction scales, is being designed. Although we are not convinced that open-ended questions are appropriate for postcard questionnaires, we understand that you and Mr. Malraison like them; a comments/suggestions question will be included. In addition, we will work out a code block that captures the call center's reference number, model, and item(s) serviced. The introduction on the postcard will be a variation of ABC's current advertising campaign.

Pilot Test - We will test the questionnaire with a small sample of customers using your tech-line operators. We will then revise the questions and forward them to our graphics designer for layout. The instrument will then be submitted to you for final approval.

Logistics - The postal arrangements are: box rental, permit, and "business reply" privileges, to be arranged in a few days. The approval for a reduced postage rate will take one to two weeks. Call center records will be used for establishing the sampling frame. ABC employees will package the questionnaire with the returned merchandise.

Evaluation of Non-response Bias - A random sample of 100 names will be secured from the list of customers who do not return the questionnaire. Non-responders will be interviewed on the telephone and their responses compared statistically to those of the responders.

Data Analysis
We will review the postcards returned and send you a weekly report listing customers who are dissatisfied (score a "1" or "2") with any item of the questionnaire or who submit a negative comment. This will improve your timeliness in resolving customer complaints. Each month, we will provide you with a report consisting of frequencies and category percentages for each question. The open-ended questions will be summarized and reported by model code; if you wish, we also can provide content analysis for these questions. We propose to include at least one question dealing with overall satisfaction (with CompleteCare and/or MindWriter). This overall question would be regressed on the individual items to determine each item's importance. A performance grid will identify items needing improvement with an evaluation of priority. Visual displays of the data will be in bar chart/histogram form. Other analyses can be prepared on a time and materials basis.

Results: Deliverables
1. Development and production of a postcard survey.
2. Weekly exception reports (transmitted electronically) listing customers who met the dissatisfied customer criteria.
3. Monthly reports as described in the data analysis section.
4. An ASCII diskette with each month's data shipped to Austin by the fifth working day of each month.

Budget
Card Layout and Printing - Based on your card estimate, our designer will lay out and print 2,000 cards in the first run ($500). This allows us to print four cards per page. The two-sided cards measure 4 1/4 by 5 1/2. The specifications are as follows: 7-point Williamsburg offset hi-bulk with one-over-one black ink. A gray-scale layer with an ABC or CompleteCare logo can be positioned under the printed material at a nominal charge. The opposite side will have the business reply logo, postage-paid symbol, and address.

Cost Summary
  Interviews                       $  1,550.00
  Travel costs                        2,500.00
  Questionnaire development           1,850.00
  Equipment/supplies                  1,325.00
  Graphics design                       800.00
  Permit fee (annual)                    75.00
  Business reply fee (annual)           185.00
  Box rental (annual)                    35.00
  Printing costs                      1,850.00
  Total start-up costs             $ 11,150.00
  Data entry (monthly)                  430.00
  Monthly data files (each)              50.00
  Monthly reports (each)                500.00
  Monthly run costs                $  1,030.00

Note - An additional fee of $0.21 per card will be assessed by the post office for business reply mail. At approximately a 30 percent return rate, we estimate the monthly cost to be less than $50.

Points to Ponder
• A proposal is an offer to produce a product or render a service to the potential buyer or sponsor.
• Proposals are valuable to both the research sponsor and the researcher. The sponsor uses the proposal to evaluate a research idea. In addition, the proposal is a useful tool to ensure the sponsor and investigator agree on the research question. The completed proposal provides a logical guide for the investigation.
• The research proposal presents a problem, discusses related research efforts, outlines the data needed for solving the problem, and shows the design used to gather and analyze the data.
• Two types of proposals: internal and external. The staff of a company generates internal proposals; external proposals are prepared by an outside firm to obtain contract research. Internal and external proposals have a problem-solving orientation.
• External proposals emphasize the qualifications of the researcher, special facilities and resources, and project management aspects such as budgets and schedules. The budget section itemizes these costs.
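The data-analysis plan's regression of the overall-satisfaction question on the individual item scores can be sketched as follows. This is a minimal stdlib-only illustration: the item names and customer scores are fabricated, not ABC survey data:

```python
# Least-squares sketch of the proposal's item-importance analysis: regress an
# overall-satisfaction score on individual item scores. Solves the normal
# equations (X'X)b = X'y by Gaussian elimination; all data are fabricated.
def ols(X, y):
    """Return least-squares coefficients for predictor rows X and responses y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):                      # forward elimination with pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [a - f * c for a, c in zip(A[r], A[i])]
            b[r] -= f * b[i]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):          # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j]
                              for j in range(i + 1, k))) / A[i][i]
    return coef

# Columns: intercept, a "courier pickup" item, a "repair quality" item
# (hypothetical names); each row is one returned postcard.
X = [[1, 3, 4], [1, 5, 5], [1, 2, 3], [1, 4, 4], [1, 1, 2], [1, 5, 4]]
y = [1 + 0.5 * r[1] + 1.0 * r[2] for r in X]   # overall built exactly from items
coef = ols(X, y)
```

Because the fabricated overall score is an exact function of the items, the recovered coefficients are 1.0, 0.5 and 1.0; on real survey data they would estimate each item's relative importance, as the proposal intends.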

LESSON 4: TUTORIALS

1. What, if any, are the differences between solicited and unsolicited proposals?
2. Consider the new trends in desktop publishing, multimedia computer authoring and display capabilities, and inexpensive video taping and playback possibilities. How might these be used to enhance research proposals? Give several examples of appropriate use.
3. What modules would you suggest be included in a proposal for each of the following cases?
a. The president of your company has asked for a study of the company's health benefits plan and for a comparison of it to other firms' plans.
b. You are competing for a university-sponsored student research grant, awarded to seniors and graduate students.
c. You are the new manager of market intelligence in a rapidly expanding software firm. Many product managers and corporate officers have requested market surveys from you on various products.
d. You are interested in starting a new research service, providing monthly information about the use of recyclable items in your state. The proposal will go to several city and county planning agencies, independent waste service providers, and independent and government landfill providers. They contacted you for a proposal.
e. A bank is interested in understanding the population trends by location so that it can plan its new branch locations for the next five years.
4. You are the manager of a research department in a large department store chain. Design a form for a research proposal that can be completed easily by your research staff and the sponsoring manager. Discuss how your form improves communication of the research objectives between the manager and the researcher.
5. Develop a list of criteria for evaluating the types of research activities listed below. Include a point scale and weighting algorithm.
a. Market research
b. Advertising effectiveness
c. Employee opinion surveys
d. Credit card operations
e. Computer service effectiveness at the individual store level
6. Select a research report from a management journal. Outline a proposal for the research as if it had not yet been performed. Make estimates of time and costs.

LESSON 5: RESEARCH DESIGN AND EXPERIMENTAL DESIGNS

Objective
In this lesson, you will learn how to design your research project. After completion of this lesson you will be able to prepare:
1. A design plan for the collection of data
2. A design plan for measurement
3. A design plan for the analysis of data

Meaning of Research Design
Research design is an important step in any research project. The decisions regarding what, where, when, how much, and by what means concerning a research project constitute a research design. "A research design is the arrangement of conditions for collection and analysis of data in a manner that aims to combine relevance to the research purpose with economy in procedure." In fact, the research design is the conceptual structure within which research is conducted; it constitutes the blueprint for the collection, measurement and analysis of data. As such the design includes an outline of what the researcher will do from writing the hypothesis and its operational implications to the final analysis of data. More explicitly, the design decisions happen to be in respect of:
• What is the study about?
• Why is the study being made?
• Where will the study be carried out?
• What type of data is required?
• Where can the required data be found?
• What periods of time will the study include?
• What will be the sample design?
• What techniques of data collection will be used?
• How will the data be analysed?
• In what style will the report be prepared?

Keeping in view the above stated design decisions, we may split the overall research design into the following parts:
• Sampling design - which deals with the method of selecting the items to be observed for the given study.
• Observational design - which relates to the conditions under which the observations are to be made.
• Statistical design - which concerns the question of how many items are to be observed and how the information and data gathered are to be analysed.
• Operational design - which deals with the techniques by which the procedures specified in the sampling, statistical and observational designs can be carried out.

Essentials of Research Designs
We can state the important features of a research design as under:
• The design is an activity- and time-based plan
• The design is always based on the research question
• The design guides the selection of sources and types of information
• The design is a framework for specifying the relationships among the study's variables
• The design outlines procedures for every research activity

Need for Research Design (Why is Research Design Required?)
Research design is needed because it facilitates the smooth sailing of the various research operations, thereby making research as efficient as possible, yielding maximal information with minimal expenditure of effort, time and money. Just as for the economical and attractive construction of a house we need a blueprint (or what is commonly called the map of the house) well thought out and prepared by an expert architect, similarly we need a research design or a plan in advance of data collection and analysis for our research project. Research design stands for advance planning of the methods to be adopted for collecting the relevant data and the techniques to be used in their analysis. It ensures the systematic and timely completion of your project.

Different Research Designs
Different research designs can be conveniently described if we categorize them as:
1. Research design in case of exploratory research studies;
2. Research design in case of descriptive and diagnostic research studies; and
3. Research design in case of hypothesis-testing research studies.
We take up each category separately.

1. Research Design in Case of Exploratory Research Studies
As you know from previous lessons, exploratory research studies are also termed formulative research studies. The main purpose of such studies is that of formulating a problem for more precise investigation or of developing the working hypotheses from an operational point of view. The major emphasis in such studies is on the discovery of ideas and insights. The research design appropriate for such studies must be flexible enough to provide opportunity for considering different aspects of a problem under study. Inbuilt flexibility in research design is needed because the research problem, broadly defined initially, is transformed into one with more precise meaning in exploratory studies, which fact may necessitate changes in the research procedure for gathering relevant data. Generally, the following three methods in the context of research design for such studies are talked about: (a) the survey of concerning literature; (b) the experience survey; and (c) the analysis of 'insight-stimulating' examples. Let us discuss each of these methods.

a. Survey of concerning literature - This happens to be the most simple and fruitful method of formulating precisely the research problem or developing hypotheses. Hypotheses stated by earlier workers may be reviewed and their usefulness evaluated as a basis for further research. It may also be considered whether the already stated hypotheses suggest new hypotheses. In this way the researcher should review and build upon the work already done by others; but in cases where hypotheses have not yet been formulated, his task is to review the available material for deriving the relevant hypotheses from it. Besides, the bibliographical survey of studies already made in one's area of interest may as well be made by the researcher for precisely formulating the problem. He should also make an attempt to apply concepts and theories developed in different research contexts to the area in which he is himself working. Sometimes the works of creative writers also provide a fertile ground for hypothesis-formulation and as such may be looked into by the researcher.

b. Experience survey - This means the survey of people who have had practical experience with the problem to be studied. The object of such a survey is to obtain insight into the relationships between variables and new ideas relating to the research problem. For such a survey, people who are competent and can contribute new ideas may be carefully selected as respondents to ensure a representation of different types of experience. The investigator may then interview the respondents so selected. The researcher must prepare an interview schedule for the systematic questioning of informants, but the interview must ensure flexibility in the sense that the respondents should be allowed to raise issues and questions that the investigator has not previously considered. Generally, the experience-collecting interview is likely to be long and may last for a few hours; hence, it is often considered desirable to send a copy of the questions to be discussed to the respondents well in advance. Thus, an experience survey may enable the researcher to define the problem more concisely and help in the formulation of the research hypothesis. This survey may as well provide information about the practical possibilities for doing different types of research.

c. Analysis of 'insight-stimulating' examples - This is also a fruitful method for suggesting hypotheses for research. It is particularly suitable in areas where there is little experience to serve as a guide. This method consists of the intensive study of selected instances of the phenomenon in which one is interested. For this purpose the existing records, if any, may be examined, the unstructured interviewing may take place, or some other approach may be adopted. The attitude of the investigator, the intensity of the study and the ability of the researcher to draw together diverse information into a unified interpretation are the main features which make this method an appropriate procedure for evoking insights. Now, what sort of examples are to be selected and studied? There is no clear-cut answer to this. Experience indicates that for particular problems certain types of instances are more appropriate than others. One can mention a few examples of 'insight-stimulating' cases, such as the reactions of strangers, the reactions of marginal individuals, the study of individuals who are in transition from one stage to another, and the reactions of individuals from different social strata.

Thus, in an exploratory or formulative research study which merely leads to insights or hypotheses, whatever method or research design outlined above is adopted, the only thing essential is that it must continue to remain flexible so that many different facets of a problem may be considered as and when they arise and come to the notice of the researcher.

2. Research Design in Case of Descriptive and Diagnostic Research Studies
Descriptive research studies are those studies which are concerned with describing the characteristics of a particular individual or of a group, whereas diagnostic research studies determine the frequency with which something occurs or its association with something else. The studies concerning whether certain variables are associated are examples of diagnostic research studies. As against this, studies concerned with specific predictions, or with narration of facts and characteristics concerning individuals, groups or situations, are all examples of descriptive research studies. Most social research comes under this category. From the point of view of research design, the descriptive as well as the diagnostic studies share common requirements, and as such we may group together these two types of research studies. In descriptive as well as in diagnostic studies, the researcher must be able to define clearly what he wants to measure and must find adequate methods for measuring it, along with a clear-cut definition of the population he wants to study. Since the aim is to obtain complete and accurate information in the said studies, the procedure to be used must be carefully planned. The research design must make enough provision for protection against bias and must maximize reliability, with due concern for the economical completion of the research study. The design in such studies must be rigid and not flexible, and must focus attention on the following:
a. Formulating the objective of the study
b. Designing the methods of data collection
c. Selecting the sample (how much material will be needed?)
d. Collecting the data (where can the required data be found and with what time period should the data be related?)
e. Processing and analysing the data
f. Reporting the findings.
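The core computation of a diagnostic study - the frequency with which something occurs and its association with something else - can be sketched as a simple cross-tabulation. The survey records and segment names below are fabricated for illustration:

```python
# Cross-tabulation sketch for a diagnostic study: the frequency with which an
# attribute occurs and its association with another attribute. The survey
# records and segment names are fabricated for illustration.
from collections import Counter

# (customer segment, purchased the product?) -- hypothetical survey records
records = [("urban", True), ("urban", True), ("urban", False),
           ("rural", False), ("rural", False), ("rural", True),
           ("urban", True), ("rural", False)]

table = Counter(records)          # joint frequencies of (segment, purchased)
segments = {seg for seg, _ in records}
rates = {
    seg: table[(seg, True)]
         / sum(n for (s, _), n in table.items() if s == seg)
    for seg in segments
}
# Purchase rate is 0.75 for urban vs 0.25 for rural customers here, i.e.
# purchase appears associated with segment -- the kind of association a
# diagnostic study would then test formally.
```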

3. Research Design in Case of Hypothesis-testing Research Studies
Hypothesis-testing research studies (generally known as experimental studies) are those where the researcher tests the hypotheses of causal relationships between variables. Such studies require procedures that will not only reduce bias and increase reliability, but will permit drawing inferences about causality. Usually experiments meet this requirement. Hence, when we talk of research design in such studies, we often mean the design of experiments.

Professor R. A. Fisher's name is associated with experimental designs. The study of experimental designs has its origin in agricultural research. Professor Fisher found that by dividing agricultural fields or plots into different blocks and then by conducting experiments in each of these blocks, whatever information is collected and inferences drawn from it happens to be more reliable. This fact inspired him to develop certain experimental designs for testing hypotheses concerning scientific investigations. Today, experimental designs are being used in research relating to phenomena of several disciplines.

Basic Principles of Experimental Designs
There are three principles of experimental designs:
1. Principle of Replication;
2. Principle of Randomization; and
3. Principle of Local Control.
Let us discuss each of these principles.

Principle of Replication - Under this principle, the experiment should be repeated more than once. Thus, each treatment is applied in many experimental units instead of one. By doing so the statistical accuracy of the experiment is increased. For example, suppose we are to examine the effect of two varieties of rice. For this purpose we may divide the field into two parts, grow one variety in one part and the other variety in the other part, and then compare the yield of the two parts and draw conclusions on that basis. But if we are to apply the principle of replication to this experiment, then we first divide the field into several parts, grow one variety in half of these parts and the other variety in the remaining parts. We can then collect the yield data of the two varieties and draw conclusions by comparing them. The result so obtained will be more reliable in comparison to the conclusion we draw without applying the principle of replication. The entire experiment can even be repeated several times for better results. Conceptually, replication does not present any difficulty, but computationally it does. For example, if an experiment requiring a two-way analysis of variance is replicated, it will then require a three-way analysis of variance, since replication itself may be a source of variation in the data. However, it should be remembered that replication is introduced in order to increase the precision of a study; that is to say, to increase the accuracy with which the main effects and interactions can be estimated.
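The gain from replication can be illustrated with a small simulation; the yields below are randomly generated, not real field data:

```python
# Toy simulation of the principle of replication (all yields are randomly
# generated, not real field data). Averaging a variety's yield over many
# replicated plots estimates its true mean yield far more precisely than
# observing a single plot.
import random
import statistics

random.seed(1)
TRUE_MEAN_YIELD = 50.0      # assumed true mean yield of the variety
PLOT_NOISE = 5.0            # plot-to-plot variability

def plot_yield():
    """Observed yield of one plot: true mean plus random plot noise."""
    return random.gauss(TRUE_MEAN_YIELD, PLOT_NOISE)

# Error of estimating the true yield from one plot vs. from 16 replicates.
single_plot_errors = [abs(plot_yield() - TRUE_MEAN_YIELD) for _ in range(200)]
replicated_errors = [
    abs(statistics.mean(plot_yield() for _ in range(16)) - TRUE_MEAN_YIELD)
    for _ in range(200)
]
# With 16 replicates the typical estimation error shrinks by roughly a factor
# of 4 (1/sqrt(16)) -- the sense in which replication increases accuracy.
```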

Principle of Randomization - This principle indicates that we should design or plan the experiment in such a way that the variations caused by extraneous factors can all be combined under the general heading of "chance." For example, if we grow one variety of rice in the first half of the parts of a field and the other variety in the other half, then it is just possible that the soil fertility may be different in the first half in comparison to the other half. If this is so, our results would not be realistic. In such a situation, we may assign the variety of rice to be grown in different parts of the field on the basis of some random sampling technique, i.e., we may apply the randomization principle and protect ourselves against the effects of the extraneous factors (soil fertility differences in the given case).

Principle of Local Control - This is another important principle of experimental designs. Under it the extraneous factor, the known source of variability, is made to vary deliberately over as wide a range as necessary, and this needs to be done in such a way that the variability it causes can be measured and hence eliminated from the experimental error. This means that we should plan the experiment in a manner that allows us to perform a two-way analysis of variance, in which the total variability of the data is divided into three components attributed to treatments (varieties of rice in our case), the extraneous factor (soil fertility in our case) and experimental error. In other words, according to the principle of local control, we first divide the field into several homogeneous parts, known as blocks, and then each such block is divided into parts equal to the number of treatments. The treatments are then randomly assigned to these parts of a block.

Important Experimental Designs
Experimental design refers to the framework or structure of an experiment, and as such there are several experimental designs. We can classify experimental designs into two broad categories, viz., informal experimental designs and formal experimental designs. Informal experimental designs are designs that normally use a less sophisticated form of analysis based on differences in magnitudes, whereas formal experimental designs offer relatively more control and use precise statistical procedures for analysis. Important experimental designs are as follows:

a. Informal experimental designs:
I. Before-and-after without control design.
II. After-only with control design.
III. Before-and-after with control design.

b. Formal experimental designs:
I. Completely randomized design (C.R. design)
II. Randomized block design (R.B. design)
III. Latin square design (L.S. design)
IV. Factorial designs.

We may briefly discuss each of the above stated informal as well as formal experimental designs.

1. Before-and-after without control design - In such a design a single test group or area is selected and the dependent variable

11.556

30

© Copy Right: Rai University

is measured before the introduction of the treatment The treatment is then introduced and the dependent variable is measured again after the treatment has been introduced. The effect of the treatment would be equal to the level - of the phenomenon after the treatment minus the level of the phenomenon before the treatment. The design can be represented thus:

Test Area: Level of phenomenon Treatment Before treatment (X) Level of phenomenon

This design is superior to the above two designs for the simple reason that it avoids extraneous variation resulting both from the passage of time and from non-comparability of the test and control areas. But at times, due to lack of historical data, time or a comparable control area, we should prefer to select one of the first two informal designs stated above. 4. Completely randomized design (C.R. design) – It involves only two principles viz., the principle of replication and the principle of randomization of experimental designs. It is the simplest possible design and its procedure of analysis is also easier. The essential characteristic of this design is that subjects are randomly assigned to experimental treatments (or vice-versa). For Example - If we have 10 subjects and if we wish to test 5 under treatment A and 5 under treatment B, the randomization process gives every possible group of 5 subjects selected from a set of 10 an equal opportunity of being assigned to treatment A and treatment B. One-way analysis of variance (or one-way ANOVA) is used to analyse such a design. Such a design is generally used when experimental areas happen to be homogeneous. Technically, when all the variations due to uncontrolled extraneous factors are included under the heading of chance variation, we refer to the design of experiment as C. R. design. We can present a brief description of the two forms of such a design is given below.

RESEARCH METHODOLOGY

introduced After treatment (Y)

Treatment effect = (Y) - (X)

The main difficulty of such a design is that with the passage of time considerable extraneous variations may be there in its treatment effect. 2. After-only with control design - In this design two groups or areas (test area and control area) are selected and the treatment is introduced into the test area only. The dependent variable is then measured in both the areas at the same time. Treatment impact is assessed by subtracting the value of the dependent variable in the control area from its value in the test area. This can be exhibited in the following form:

Test Area: phenomenon Control Area: phenomenon

Treatment introduced

Level of After treatment (Y) Level of Without treatment

sidered. If this assumption is not true, there is the possibility of extraneous variation entering into the treatment effect. However, data can be collected in such a design without the introduction of problems with the passage of time. In this respect this design is superior to before –and- after without control design. 3. Before-and-after with control design - In this design two areas are selected and the dependent- variable is measured in both the areas for an identical time-period before the treatment. The treatment is then introduced into the test area only, and the dependent variable is measured in both for an identical timeperiod after the introduction of the treatment The treatment effect is determined by subtracting the change in the dependent variable in the control area from the change in the dependent variable in test area. This design can be shown in this way:

                 Time Period I                     Time Period II
Test Area:       Level of phenomenon               Level of phenomenon
                 before treatment (X)              after treatment (Y)
                 (treatment introduced between the two periods)
Control Area:    Level of phenomenon               Level of phenomenon
                 without treatment (A)             without treatment (Z)

Treatment effect = (Y - X) - (Z - A)

Two-group simple randomized design: In a two-group simple randomized design, first of all the population is defined, and then a sample is selected randomly from that population. A further requirement of this design is that the items, after being selected randomly from the population, be randomly assigned to the experimental and control groups (such random assignment of items to two groups is technically described as the principle of randomization). Thus, this design yields two groups as representatives of the population. Since in the simple randomized design the elements constituting the sample are randomly drawn from the same population and randomly assigned to the experimental and control groups, it becomes possible to draw conclusions, on the basis of the samples, about the different treatments of the independent variable. This design of experiment is quite common in research studies concerning the behavioural sciences.

The merit of such a design is that it is simple and randomizes the differences among the sample items. Its limitation is that the individual differences among those conducting the treatments are not eliminated, i.e., it does not control this extraneous variable, and as such the result of the experiment may not depict a correct picture. This can be illustrated with an example.

Example: Suppose the researcher wants to compare two groups of students who have been randomly selected and randomly assigned. Two different treatments, viz., the usual training and specialised training, are given to the two groups. The researcher hypothesises greater gains for the group receiving the specialised training. To determine this, he tests each group before and after the training, and then compares the
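The two steps of the design, random selection from a defined population followed by random assignment to the two groups, can be sketched in a few lines of Python. The population items are hypothetical placeholders.

```python
import random

random.seed(7)
# A defined population of 100 hypothetical items
population = [f"item_{i:03d}" for i in range(100)]

# Step 1: draw a random sample from the population
sample = random.sample(population, 20)

# Step 2: the principle of randomization - randomly assign the sampled
# items to the experimental and control groups
random.shuffle(sample)
experimental, control = sample[:10], sample[10:]

print(len(experimental), len(control))  # 10 10
```

Because both steps use chance alone, the two groups can each be regarded as representative of the population.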


Population → randomly selected sample → randomly assigned to:
    Experimental group: Treatment A (independent variable)
    Control group:      Treatment B (independent variable)

amount of gain for the two groups to accept or reject his hypothesis.

Random replication design: The limitation of the two-group randomized design is usually eliminated in the random replication design. In the example just discussed, the teacher differences on the dependent variable were ignored, i.e., this extraneous variable was not controlled. In a random replications design, however, the effects of such differences are minimised (or reduced) by providing a number of repetitions for each treatment. Each repetition is technically called a 'replication'. The random replication design serves two purposes: first, it provides controls for the differential effects of the extraneous independent variables, and secondly, it randomizes any individual differences among those conducting the treatments. Diagrammatically we can illustrate the random replications design thus (diagram given here). From the diagram it is clear that there are two populations in the replication design. The sample is taken randomly from the population available for study and is randomly assigned to, say, four experimental and four control groups. Similarly, a sample is taken randomly


from the population available to conduct the experiments (since there are eight groups, eight such individuals are selected), and the eight individuals so selected should be randomly assigned to the eight groups. Generally, an equal number of items is put in each group so that the size of the group is not likely to affect the results of the study. Variables relating to both population characteristics are assumed to be randomly distributed among the groups. Thus, this random replication design is, in fact, an extension of the two-group simple randomized design.

5. Randomized block design (R.B. design): It is an improvement over the C.R. design. In the R.B. design the principle of local control can be applied along with the other two principles of experimental designs. In the R.B. design, subjects are first divided into groups, known as blocks, such that within each group the subjects are relatively homogeneous with respect to some selected variable. The variable selected for grouping the subjects is one that is believed to be related to the measures to be obtained in respect of the dependent variable. The number of subjects in a given block would be equal to the number of treatments, and one subject in each block would be randomly assigned to each treatment. The R.B. design is analysed by the two-way analysis of variance (two-way ANOVA) technique. Let us understand the R.B. design with the help of an example. Suppose four different forms of a standardized test in statistics were given to each of five students (selected one from each of five I.Q. blocks), with scores as shown in the table below.
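The blocking-and-assignment rule described above, one subject per block randomly paired with each treatment, can be sketched as follows. The block and subject names are invented for illustration.

```python
import random

random.seed(3)
treatments = ["Form 1", "Form 2", "Form 3", "Form 4"]

# Invented blocks: within each I.Q. block the subjects are relatively
# homogeneous, and each block has as many subjects as there are treatments.
blocks = {
    "very low": ["A1", "A2", "A3", "A4"],
    "low":      ["B1", "B2", "B3", "B4"],
    "average":  ["C1", "C2", "C3", "C4"],
}

# Local control: within every block, one subject is randomly
# assigned to each treatment.
assignment = {}
for name, members in blocks.items():
    shuffled = members[:]
    random.shuffle(shuffled)
    assignment[name] = dict(zip(treatments, shuffled))

print(assignment["low"])
```

Every treatment is observed exactly once in every block, which is what lets the block-to-block differences be separated out in the two-way ANOVA.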

fertilizers, but it may also be the effect of the fertility of the soil. Similarly, there may be an impact of the varying seeds on the yield. To overcome such difficulties, the L.S. design is used when there are two major extraneous factors, such as the varying soil fertility and the varying seeds. The Latin-square design is one wherein each fertilizer, in our example, appears five times but is used only once in each row and in each column of the design. In other words, the treatments in an L.S. design are so allocated among the plots that no treatment occurs more than once in any one row or any one column. The two blocking factors may be represented through the rows and columns (one through the rows and the other through the columns). The following is a diagrammatic form of such a design in respect of, say, five types of fertilizers, viz., A, B, C, D and E, and the two blocking factors, viz., the varying soil fertility and the varying seeds:

                  Soil fertility levels
                 I     II    III   IV    V
Seed        X1   A     B     C     D     E
differences X2   B     C     D     E     A
            X3   C     D     E     A     B
            X4   D     E     A     B     C
            X5   E     A     B     C     D
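The cyclic pattern visible in the diagram can be generated and checked programmatically. This is a small sketch, not tied to any statistics package:

```python
def latin_square(treatments):
    """Cyclic construction: shift the treatment list by one position per row."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

square = latin_square(["A", "B", "C", "D", "E"])
for row in square:
    print(" ".join(row))

# Defining property of an L.S. design: each treatment occurs exactly
# once in every row and exactly once in every column.
n = len(square)
assert all(len(set(row)) == n for row in square)
assert all(len({square[r][c] for r in range(n)}) == n for c in range(n))
```

In practice the rows and columns of a randomly chosen Latin square would be shuffled as well, so that the allocation is not always the same cyclic arrangement.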

The above diagram clearly shows that in an L.S. design the field is divided into as many blocks as there are varieties of fertilizers, and each block is again divided into as many parts as there are varieties of fertilizers, in such a way that each fertilizer variety is used in each block (whether column-wise or row-wise) only once. The analysis of the L.S. design is very similar to the two-way ANOVA technique.

The merit of this experimental design is that it enables differences in fertility gradients in the field to be eliminated in comparison to the effects of different varieties of fertilizers on the yield of the crop. But this design suffers from one limitation: although each row and each column represents all fertilizer varieties equally, there may be considerable difference in the row and column means both up and across the field. This, in other words, means that in an L.S. design we must assume that there is no interaction between treatments and blocking factors. This defect can, however, be removed by adjusting the results so that the means of rows and columns equal the field mean. Another limitation of this design is that it requires the numbers of rows, columns and treatments to be equal, which reduces the utility of the design. In the case of a (2 x 2) L.S. design, there are no degrees of freedom available for the mean square error, and hence the design cannot be used. If there are 10 or more treatments, then each row and each column will be so large that the rows and columns may not be homogeneous, which may make the application of the principle of local control ineffective. Therefore, L.S. designs of orders (5 x 5) to (9 x 9) are generally used.

7. Factorial designs: Factorial designs are used in experiments where the effects of varying more than one factor are to be determined. They are especially important in several economic and social phenomena where usually a large number of factors affect a particular problem. Factorial designs can be of two types:


         Very low   Low        Average    High       Very high
         I.Q.       I.Q.       I.Q.       I.Q.       I.Q.
         Student A  Student B  Student C  Student D  Student E
Form 1      82         67         57         71         73
Form 2      90         68         54         70         81
Form 3      86         73         51         69         84
Form 4      93         75         60         68         75

If each student separately randomized the order in which he or she took the four tests (by using random numbers or some similar device), we refer to the design of this experiment as an R.B. design. The purpose of this randomization is to take care of possible extraneous factors such as fatigue, or the experience gained from repeatedly taking the test.

6. Latin square design (L.S. design): It is an experimental design very frequently used in agricultural research. The conditions under which agricultural investigations are carried out are different from those of other studies, for nature plays an important role in agriculture. For example, suppose an experiment has to be made through which the effects of five different varieties of fertilizers on the yield of a certain crop, say wheat, are to be judged. In such a case the varying fertility of the soil in the different blocks in which the experiment is performed must be taken into consideration; otherwise the results obtained may not be very dependable, because the output happens to be the effect not only of


i. Simple factorial designs, and
ii. Complex factorial designs.
We take them up separately.

i. Simple factorial designs: In the case of simple factorial designs, we consider the effects of varying two factors on the dependent variable; when an experiment is done with more than two factors, we use complex factorial designs. A simple factorial design is also termed a 'two-factor factorial design', whereas a complex factorial design is known as a 'multi-factor factorial design'. A simple factorial design may be a 2 x 2 simple factorial design, or it may be, say, of the 3 x 4 or 5 x 3 type. Let us illustrate with an example. Example (2 x 2 simple factorial design): A 2 x 2 simple factorial design can be depicted graphically as follows:

In such a design there are two treatments of the experimental variable and two levels of the control variable. As such there are four cells into which the sample is divided. Each of the four combinations provides one treatment or experimental condition. Subjects are assigned at random to each treatment in the same manner as in a randomized group design. The means for the different cells may be obtained along with the means for the different rows and columns. The means of the different cells represent the mean scores for the dependent variable; the column means in the given design are termed the main effects for treatments, without taking into account any differential effect that is due to the level of the control variable. Similarly, the row means in the said design are termed the main effects for levels, without regard to treatment. Thus, through this design we can study the main effects of treatments as well as the main effects of levels.


                         Experimental Variable
                      Treatment A     Treatment B
Control    Level 1      Cell 1          Cell 3
Variable   Level 2      Cell 2          Cell 4
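The main effects for treatments and for levels are simply the column and row means of the cell means. The following sketch uses invented cell means for the four cells of the 2 x 2 layout:

```python
# Invented cell means for the 2 x 2 layout:
# rows = levels of the control variable, columns = treatments A and B
cells = [
    [15.0, 19.0],  # Level 1: Cell 1 (Treatment A), Cell 3 (Treatment B)
    [17.0, 23.0],  # Level 2: Cell 2 (Treatment A), Cell 4 (Treatment B)
]

# Column means: main effects for treatments, ignoring levels
treatment_means = [sum(col) / len(col) for col in zip(*cells)]
# Row means: main effects for levels, ignoring treatments
level_means = [sum(row) / len(row) for row in cells]

print("Main effects for treatments:", treatment_means)  # [16.0, 21.0]
print("Main effects for levels:", level_means)          # [17.0, 20.0]
```

Comparing the two treatment means (16 vs. 21 here) gives the treatment effect averaged over levels; comparing the two level means does the same for the control variable.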

ii. Complex factorial designs: Experiments with more than two factors at a time involve the use of complex factorial designs. A design which considers three or more independent variables simultaneously is called a complex factorial design. In the case of three factors, with one experimental variable having two treatments and two control variables each having two levels, the design used is termed a 2 x 2 x 2 complex factorial design, which contains a total of eight cells, as shown below.

                                Experimental Variable
                      Treatment A                    Treatment B
                Control Var. 2   Control Var. 2   Control Var. 2   Control Var. 2
                Level I          Level II         Level I          Level II
Control Var. 1
    Level I       Cell 1           Cell 2           Cell 5           Cell 6
    Level II      Cell 3           Cell 4           Cell 7           Cell 8

2 x 2 x 2 Complex Factorial Design. You can understand this design better using a 3-D representation (figure not reproduced here).
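The eight cells of the 2 x 2 x 2 design are just the Cartesian product of the three factors, which can be enumerated directly; the factor names below mirror the table and are illustrative only.

```python
from itertools import product

# One experimental variable with two treatments, and two control
# variables with two levels each
factors = {
    "experimental": ["Treatment A", "Treatment B"],
    "control_1":    ["Level I", "Level II"],
    "control_2":    ["Level I", "Level II"],
}

cells = list(product(*factors.values()))
print(len(cells))  # 8 cells: 2 x 2 x 2
for i, cell in enumerate(cells, start=1):
    print(f"Cell {i}:", cell)
```

The same enumeration generalises to any factorial design: a 3 x 4 design yields 12 cells, a 2 x 2 x 3 design yields 12, and so on.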


From this design it is possible to determine the main effects for three variables, i.e., one experimental and two control variables. The researcher can also determine the interactions between each possible pair of variables (such interactions are called 'first order interactions') and the interaction between the variables taken as a triplet (such interactions are called 'second order interactions'). In the case of a 2 x 2 x 2 design, three first order interactions are possible, along with one second order interaction.

Points to Ponder
• There are several research designs, and the researcher must decide, in advance of the collection and analysis of data, which design would prove to be most appropriate for his research project. He must give due weight to various points such as the type of universe and its nature, the objective of his study, the source list or sampling frame, and the desired standard of accuracy.
• Consideration of the following activities is essential for the execution of a well-planned experiment:
  • Select relevant variables for testing
  • Specify the levels of treatment
  • Control the environment and extraneous factors
  • Choose an experimental design suited to the hypothesis
  • Select and assign subjects to groups
  • Pilot-test, revise, and conduct the final test
  • Analyze the data
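The counts of first and second order interactions quoted above follow from simple combinatorics (pairs and triplets of the three variables). A small sketch, with placeholder variable names:

```python
from itertools import combinations

# One experimental variable (E) and two control variables (C1, C2)
variables = ["E", "C1", "C2"]

first_order = list(combinations(variables, 2))   # interactions of pairs
second_order = list(combinations(variables, 3))  # interaction of the triplet

print("First order interactions:", first_order)
print("Second order interactions:", second_order)
```

With three variables there are exactly three pairs and one triplet, matching the 2 x 2 x 2 case discussed in the text.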

LESSON 6: TUTORIAL

Q1. Compare the advantages of experiments with the advantages of survey and observational methods.

Q2. A lighting company seeks to study the percentages of defective glass shells being manufactured. Theoretically, the percentages of defectives are dependent on temperature, humidity, and the level of artisan expertise. Complete historical data are available for the following variables on a daily basis for a year:
A. Temperature (high, normal, low)
B. Humidity (high, normal, low)
C. Artisan expertise level (expert, average, mediocre)
Some experts feel that defectives also depend on the production supervisors. However, data on the supervisors in charge are available for only 242 of the 365 days. How should this study be conducted?

Q3. A pharmaceuticals manufacturer is testing a drug developed to treat cancer. During the final stages of development the drug's effectiveness is being tested on individuals for different (1) dosage conditions and (2) age groups. One of the problems is patient mortality during experimentation.
a. Recommend the appropriate design for the experiment.
b. Explain the use of control groups, blinds, and double blinds if you recommend them.

Q4. Describe how you would operationalize variables for experimental testing in the following research question: what are the performance differences between 10 microcomputers connected in a local area network (LAN) and one minicomputer with 10 terminals?

Q5. What type of experimental design would you recommend in each of the following cases? Justify your design recommendations through a comparison of alternatives and in terms of external and internal validity.
A. A test of three methods of compensation of factory workers. The methods are hourly wage, incentive pay, and weekly salary. The dependent variable is direct labor cost per unit of output.
B. A study of the effects of various levels of advertising effort and price reduction on the sale of specific branded grocery products by a retail grocery chain.
C. A study to determine whether it is true that the use of fast-paced music played over a store's public address system will speed up shopping.

LESSON 7: WRITING THE RESEARCH REPORT

Students, before we start our topic for the day, I would like to give you a brief recap from our last class. We had centered our discussion on the various steps involved in a research process. These were identified as:
• Problem Definition: Before we actually initiate the investigation, we should be clear about the problem we are facing. Your research results must be consistent with the decisions that you have to make; if you carry out a research or an investigation which is not used in influencing any action anywhere, then it is a sheer waste of time and resources.
• Research Design: As I had also highlighted in the last class, this provides the blueprint of the investigation. It gives you a broad idea about how to proceed further in getting information regarding the relevant variables from the units under consideration.
• Data Collection: Once your design is developed, you, as a researcher, would be required to start collecting information from the units under study. Here you would try to investigate how the various units respond to the variables or characteristics under study.
• Data Analysis: Your next step would be to process the data. Such data analysis that you may carry out could be uni-variate, bi-variate or multi-variate.
• Interpretation: Literally speaking, interpretation is the 'so what' of a research process. It is equally important that you should be able to communicate your findings and recommendations in an understandable and concise manner to the decision makers.

From this we derive the essence of our discussion today: 'Writing the Research Report'.

Being asked to write a report can fill people with horror! However, writing reports correctly is an essential skill that you will need not only today as a student, but also tomorrow as a budding manager. Report writing is common to both academic and managerial situations. In academics, you would be required to prepare reports to facilitate comprehensive and application-oriented learning; such reports of yours would be called term papers, project reports, theses and dissertations, depending upon the nature of the report, the time and effort expected of you as a student, and your curriculum design. If you were a researcher, you would put out your initial findings in a research report, paper or monograph, which would later be condensed into an article or expanded into a series of articles or a book. When you join the corporate world tomorrow, you would realize that report writing there forms the basis for decision-making: you may be reporting to the management committee or the consulting group that has been given the terms of reference for fact finding or decision making, and such reports would be expected to be brief but comprehensive and clearly reflect your thinking as the manager. We will start our lesson today with a brief classification of the various types of reports.

Categories of Reports
Can any of you think of the various forms a report might take? No? Never mind, let me explain. Broadly, any report would fall into one of the following three major categories:
1. Information Oriented
2. Decision Oriented
3. Research Oriented
As these names suggest, it is the substance and focus of the content that determines the category. However, a report that you make may contain characteristics of more than one category.

Information Reports
They are the first step to understanding the existing situation (for instance, the business, economic, technological, labour market or research scenario) or what has been discussed or decided (the minutes of a meeting). They, you should remember, form the foundation of subsequent decision reports and research reports. In describing any person, object, situation or concept, bear in mind that none of the variables should be over- or under-stated; the following seven questions will help you to convey a comprehensive picture.

Subject/Object:  Who? or Whom?
Action:          What? When? Where? How?
Reason:          Why?

Therefore, you can check the comprehensiveness of an information or descriptive report by iteratively asking: WHO does WHAT to WHOM? WHEN, WHERE, HOW and WHY?

Decision Reports
As you would well be able to make out from the name itself, decision reports adopt the problem-solving approach. Such reports have to follow the steps mentioned below.
• Identifying the problem: Managers thrive on optimism in getting things done. Yet the problem is the beginning and the end of decision-making: if you start with a wrong problem, a wrong hypothesis or a wrong assumption, you will only end up solving a non-existing problem or might even create a new one. Therefore you should carefully define the problem, keeping in mind each of the following elements: What is the situation, and what should it be? What are the symptoms and what are the causes? What is the central issue and what are the subordinate issues? What are the decision areas: short, medium and long term?
• Constructing the criteria: In order to achieve your end objective of bringing the existing situation to what it should be, you would require yardsticks to evaluate options. Criteria link the 'problem definition' with 'option generation and evaluation'. In constructing the criteria, you should not, all this while, lose track of the main objective of what the situation should be.
• Generating and evaluating the options: In generating options it is your creativity that stands the test. Sometimes the options may be obvious, but you should look beyond the obvious; your knowledge of SWOT analysis could be very useful here. Once a set of options has been generated, you should shortlist them and rank them by priority or their probability of meeting your end objectives. You should then evaluate them against the criteria and the possible implications in implementation. Your next job is to present the evaluation; make sure that it is structured by criteria or by options, depending upon which structure is easier to understand.
• Making a decision: Your recommendations would, naturally, flow out of the evaluation of the options. Make sure that the decision is an adequate response to the problem.
• Drawing up an action plan: Action steps and their consequences should be visualized to avoid being caught unaware. Be clear about WHO does WHAT, WHEN, WHERE and HOW, for even the best analysis can go to waste if attention is not paid to the action plan.
• Working out a contingency plan: If something can go wrong, it is likely to go wrong. You should therefore be ready with parachutes to bail you out. There is a need to think of how to achieve the second-best objective if the first one is not feasible. Your contingency plan must emerge from the action plan you have already prepared.
• Conclusion: A good decision report should not only be structured sequentially but also reflect comprehensively your iterative thinking process as the decision maker, provided that your thinking process so far has been logical.

Research Reports
As you would all know, research reports contribute to the growth of subject literature. They pave the way for new information, significant hypotheses, and innovative and rigorous methods of research and measurement. While preparing them, you should broadly follow the following pattern:
• Undertake a literature survey to find gaps in knowledge
• Next, state the significance and utility of the study and the hypothesis to be tested
• Set out the methodology for collecting data; conducting the experiment and analyzing the data is what should follow
• Lay out the description and analysis of the experiment
• Try to identify your findings after that
• Come to a conclusion
• Draw up your recommendations
• Plug in suggestions for further research
• End your report with back-up evidence and data

Steps of Report Writing

Preparing the Draft
Preparation of reports is time-consuming and expensive. Therefore, reports have to be very sharply focused in purpose, content and readership. To control the final outcome of your product, whether it is a research report, a committee/consulting/administrative report or a student report, I advise that you precede it with a proposal/draft and its acceptance or modification, and with periodic interim reports and their acceptance or modification by your sponsor.

Therefore, please list and arrange the elements and the actors of a situation to understand its dynamics.

We can split the writing process into stages: getting in the mood, writing the first draft, revising, revising, revising, and finishing.

Your proposal should provide information on the following items:
• Descriptive title of your study
• Your name as the author and your background
• Nature of your study: the problem to be examined, the need for the study, background information available, the scope of the study, and to whom it will be useful
• Hypothesis to be tested
• Data: sources, collection procedure, methodology for analysis
• Equipment and facilities required
• Schedule: target dates for completing the library research, primary research, data analysis, outline of the report, first draft and final draft
• Likely product or tentative outline
• Bibliography

Reviewing the Draft
To err is human. Therefore, after you have prepared your draft report, it should be thoroughly reviewed and edited before the final report is submitted. Let us now try to make a checklist that will help you in reviewing the draft:
• Your purpose as the author?
• Reader's profile?
• Content?
• Language and tone?
• Length?
• Appearance?

Author's Purpose
The lack of clarity and explicitness in the communication process leads to two major problems:
• Confusion in determining the mix of content, language and tone
• Misinterpretation of the message
If the purpose is not clear, recheck the focus to see whether you need to make any changes in the foundation.

Reader's Profile
Readership may consist of one or more person(s) or group(s). You would therefore need to check whether all of them have the same wavelength. If not, common interest areas will need to be segregated from the special interest areas; then you will need to decide on the types and parts of the report that can satisfy the various reader groups. The major discriminating features of the reader's profile are culture, religion, age, ideologies, education and economic background.

Content
Please pay attention to the content's focus, its organization, and the accuracy of its facts and the logic of its arguments.
• You should clarify the focus right in the first few paragraphs to attract the reader's attention and hold it.
• Keep in mind that you may lose credibility if you fail to check the accuracy of the facts, for a reader can easily test the internal consistency of the report by comparing information across pages and sections.
• Not all the data required to make the report may be available; sometimes you may need to make assumptions to fill the gaps.
• What is good in one situation may not hold for another.

Language and Tone
Since the purpose of communication is to make the reader understand the message, use vocabulary and sentence structure which the reader understands. Abstract phrases are difficult to comprehend, while concrete phrases are easy to understand. The tone of the language also matters: it can make the reader receive, ignore or reject the message. Therefore, try to use a simple, easy-to-read style and presentation that will help your reader understand the content easily.

Length
This is a matter that needs to be judged by you as the author, keeping in mind the purpose, the subject and the reader's interest. Usually, the shorter the content, the more attractive it is to the reader. However, it should not be so brief as to miss the essential points and linkages in the flow of arguments and force the reader to ask for more information. Let us now try to work on a few tips to save words.

• Cut out repetitions, unless they are needed to sharpen the message
• Take out redundancies
• Eliminate weighty expressions
• Use active voice
• Use shorter and direct verbs
• Use concrete adjectives
• Use abbreviations which are more familiar than their expanded form

Appearance
Looks matter! Don't you all agree with this? This therefore also holds true for your report. Presentation attracts readers and content holds their attention. The novelty of presentation is as important as the originality of ideas; both are products of creativity. Hence pay complete attention to both the product and its packaging. The elements of appearance include illustration, structure, style and language [Peterson, 1987]. Style is the way you communicate the content to the audience.

Format of a Report
No matter which category your report falls into, when you make one, make sure that it contains each of the following parts:
• A cover and title page
• Introductory pages: foreword, preface, acknowledgement, table of contents, list of tables and illustrations, summary
• Text: headings, quotations, footnotes, exhibits
• Reference section: appendices, bibliography, glossary (if required)
We will now discuss each of these at length.

Cover and the Title Page
I am sure you would all know what details this page needs to contain. Can you also give me some examples? Hey! That's nice; you've covered most of the tips. I'll just add a few more to complete the list. You should include:
• Title of the subject or project
• Presented to whom
• On what date
• For what purpose
• Written by whom
If there is any restriction on the circulation of the report that you have made, you should indicate it on the top right corner of the cover and title page.

Sample:
    For Official Use Only
    Working Capital Requirements of XYZ Private Limited
    Presented to the Managing Director, XYZ Private Limited
    On November 26, 2003
    By Ms. ABC and Ms. DEF
    Rai Business School, New Delhi Campus

Proof Reading
If you, or the other person proofreading your report, are good, all the mistakes should be pinpointed accurately.
• Make sure that you indicate correction marks at two places: within the line where the correction is to be carried out, and in the margin against the corresponding line, giving the instruction
• Please never give instructions at the place of correction
• You should mark the proof, preferably with a red ball point
• To catch as many errors as possible, read it over and over again
• One last point: always remember that proofs are meant to be corrected, not edited

Final Printing
Phew! At last your job is almost over. Upon printing, you should:
• Return it to the printer according to the agreed schedule
• Also return the manuscript along with it
Once you have thoroughly proofread your report, your final document is ready for reference.

Introductory Pages
Every time you open any book, the introductory pages are the first thing that you will come across. While writing such pages for your report, number them in lower case Roman numerals (i, ii, iii…); use Arabic numerals (1, 2, 3…) from the first page of the introduction. Make sure that your introductory pages contain:

Foreword
This is not numbered but counted among the introductory pages. It would be written by someone other than you, usually an authority on the subject or the sponsor of the research or the book. If it is short, I suggest that you treat it as a part of the preface; if not, you may put it in a separate section. At the end of the foreword the writer's name appears on the right side; on the left come the address, place of writing and date.

Preface
It has to be written by you to indicate how the subject was chosen, its importance and need, and the focus of the book's/research paper's content. Your name as the writer will appear at the end of the preface on the right side; on the left would be your address, place of writing and date.

Acknowledgement
As a courtesy, you should give due credit to anyone else whose efforts were instrumental in your writing the report. Such recognition will form the acknowledgement. At the end of the acknowledgement, obviously, only your name would appear, on the right side and in italics.

Table of Contents
The contents sheet of your report would act as both a summary and a guide to the various segments of your report. It should list out the sections/chapters/main heads and give their corresponding page numbers. Have a look at the sample that I have prepared below for better understanding:

SAMPLE CONTENTS
Foreword                          v
Preface                           vii
Acknowledgement                   ix
SECTION A                         1
1. Chapter Title                  3
   A. Center Head                 10
      i. Center Side Head         17
SECTION B                         25
SECTION C                         30
Summary and Conclusions           32
APPENDICES                        37
   a. Questionnaire               39
   b. Interview                   45
BIBLIOGRAPHY                      51
GLOSSARY                          55

List of Tables and Illustrations
After your table of contents, you should give a list that mentions the details and page numbers of the various tables and illustrations that you may have used to support your report. Each list should start on a separate page. You should number the tables and illustrations continuously in a serial order throughout the book/report.

Summary
The executive summary that you would write in the initial pages is usually of great help to a busy reader. You should ensure that it covers all the essential parts of the book/report and yet is brief enough to be clear and attractive. The summary should highlight the following essential information:
• What is the study about?
• What is the extent and limitation of the coverage?
• What is the significance and need for the study?
• What is the kind of data used?
• What research methodology has been used?
• What are the findings and conclusions?
• What are the incidental findings, if any?
• How can the conclusions be used and by whom?
• What are the recommendations and the suggested action plan?

Text
The subject matter of the text of your report should be divided under the following heads.

Headings
This I am sure is very simple for you to understand; you all would have been using this classification right from your secondary school days. Just as a refresher, I am mentioning the classifications once again:
• Center head
• Center sub-head
• Side head
• Paragraph head
Which combination of headings you would use would depend on the number of classifications or divisions that the chapters of your report have. Usually keep them in Arabic numerals or decimal form.

Quotations
There may be times when you feel that you need to reproduce a portion of the work of another author to add value to your own report. This is what I mean by a quotation. You may allow quotations up to three typewritten lines to run into the text; direct quotations over this limit have to be set in indented paragraphs, which are put in italics. Quotation marks must necessarily be used for:
• A directly quoted passage or word
• A word or phrase to be emphasized
• Titles of articles
While quoting, be very careful that all quotations correspond exactly to the original in word, spelling and punctuation.

Footnotes
When you insert quotations, it is important that you indicate the source of the reference. This is what you may do using footnotes. Footnotes help the reader to check the accuracy of your interpretation of the source by going to the source if they want to. They are also a form of your acknowledgement of indebtedness to the source, and they help the reader distinguish between your contribution as the author of the report and the work of others. There may also be times when you want to provide an explanation that is not important enough to be included in the text; again, the footnote would be of use to you for this. Please ensure that explanatory footnotes are put at the bottom of the page and are linked to the text with a footnote number.

Exhibits
Writing just theory about any subject matter would never be sufficient. You will need to supplement it with exhibits for better and faster understanding by the reader. An exhibit is meant only to expand, clarify or give visual explanation, rather than stand by itself. Exhibits may take the form of either a table or an illustration, and I am sure you would all agree that such pictorial representations also help in ensuring a longer retention period.

• Tables: Before you introduce a table, make sure that it is referred to in the text. The text should highlight the table's focus and conclusions.

SAMPLE
Table 10: Mean Information Test Scores of Employees Receiving Communication through Different Media (from Dahle, p. 245)

Medium                       No. of Employees    Mean Test Score*
Combined Oral and Written    102                 7.70
Oral Only                    94                  6.17
Written Only                 109                 4.91
Bulletin Board               115                 3.72
Grapevine Only               108                 3.56

* All differences are significant at the 5% level or better except that between the last two means in the column.

• Illustrations: They cover charts, graphs, diagrams and maps. Most of the instructions that I have listed out for tables hold good for illustrations.

Reference Section
This section will follow the text. First write out the appendices section, then the bibliography and finally the glossary. Please ensure that a divider page, on which only the word APPENDICES, BIBLIOGRAPHY or GLOSSARY appears in all capital letters, separates each section.

Appendices
They will help you, as the author of the report, to authenticate the thesis and help your reader to check the data. Let us now try to list out the material that you would usually put in the appendices:
• Original data
• Long tables
• Long quotations
• Supportive legal decisions, laws and documents
• Illustrative material
• Extensive computations
• Questionnaires and letters
• Schedules or forms that you might have used in collecting data
• Case studies
• Transcripts of interviews

Bibliography
It would follow the appendices; make sure that it is listed as a major section in your table of contents. It should contain the source of every reference cited in the footnotes and any other relevant work that you had consulted. This would give the reader an idea of the literature available on the subject that has influenced or aided your study. You must incorporate source references within the text and supplement them with a bibliographical note at the end of the chapter, book or report. If you try to look up the bibliographical section of any book or report, you would see that the following information is given for each reference:
• Name of the author
• Title of the work
• Place of publication
• Name of the publisher
• Date of publication
• Number of pages

Glossary
Finally we come to a short dictionary giving definitions and examples of terms and phrases which are technical, used by you in a special connotation, unfamiliar to the reader, or foreign to the language in which the book is written. I hope you know that even this is listed as a major section in the table of contents.

I hope you enjoyed today's session. It was something very general and away from the usual theory. However, it was necessary to formally list down the steps of report writing because, as we mentioned, these reports are very critical in decision-making, whether in academics (for performance review), research (as a base for further reference) or an organization (to decide the future course of action). Before we call it a day, let us just look back to recapitulate all that we covered in class today. In this lesson we discussed the steps involved in the preparation of a proposal for a report; the steps involved in writing reports were also highlighted. I explained to you three categories of reports, namely information reports, decision reports and research reports. I am summarizing the same with the following flow chart.

I hope you all have understood each of these heads. We further divided a report into various parts: title page, introductory pages, text and reference section. We concluded the unit by explaining that before you submit your final report, it should be thoroughly reviewed and edited.

Flow chart, Glimpsing the process: strategic thinking (what? why? who?) and an action plan (how? when? why?) lead into planning the report, gathering information and analysing information. Writing the report: put something on paper, review, teacher's feedback, final draft, proofreading and submission. Eureka!

2. Write-up
So now that you've completed the research project, what do you do? This final stage, writing up your research, may be one of the most difficult. In fact, in many research projects you will need to write multiple reports that present the results at different levels of detail for different audiences. There are several general considerations to keep in mind when generating a report.

The Audience
Who is going to read the report? Reports will differ considerably depending on whether the audience will want or require technical detail, whether they are looking for a summary of results, or whether they are about to examine your research in a Ph.D. exam.

The Story
I believe that every research project has at least one major "story" in it. Sometimes the story centers on a specific research finding; sometimes it is based on a methodological problem or challenge. Even in very formal journal articles, where you will be required to be concise and detailed at the same time, a good "storyline" can help make an otherwise very dull report interesting to the reader. The hardest part of telling the story in your research is finding the story in the first place. Usually when you come to writing up your research you have been steeped in the details for weeks or months (and sometimes even for years). You've been worrying about sampling and response, struggling with operationalizing your measures, dealing with the details of design, and wrestling with the data analysis. You're a bit like the ostrich that has its head in the sand. To find the story in your research, you have to pull your head out of the sand and look at the big picture. You have to try to view your research from your audience's perspective. You may have to let go of some of the details that you obsessed so much about and leave them out of the write-up, or bury them in technical appendices or tables.

Formatting Considerations
Are you writing a research report that you will submit for publication in a journal? If so, you should be aware that every journal requires that articles follow specific formatting guidelines. Thinking of writing a book? Again, every publisher will require specific formatting. Writing a term paper? Most faculty will require that you follow specific guidelines. Doing your thesis or dissertation? Every university I know of has very strict policies about formatting and style. There are legendary stories that circulate among graduate students about the dissertation that was rejected because the page margins were a quarter inch off or the figures weren't labeled correctly. When you write your report, you need to check the specific formatting guidelines for the report you are writing; the ones presented here are likely to differ in some ways from any other guidelines that may be required in other contexts.

To illustrate what a set of research report specifications might include, I present in this section general guidelines for the formatting of a research write-up for a class term paper. These guidelines are very similar to the types of specifications you might be required to follow for a journal article. I've also included a sample research paper write-up that illustrates these guidelines. This sample paper is for a "make-believe" research project, but it illustrates how a final research report might look using the guidelines given here.

Key Elements

Introduction
• Statement of the problem: The general problem area is stated clearly and unambiguously. The importance and significance of the problem area is discussed.
• Statement of causal relationship: The cause-effect relationship to be studied is stated clearly and is sensibly related to the problem area.
• Statement of constructs: Each key construct in the research/evaluation project is explained (minimally, both the cause and the effect). The explanations are readily understandable (i.e., jargon-free) to an intelligent reader.
• Literature citations and review: The literature cited is from reputable and appropriate sources (e.g., professional journals and books, and not Time, Newsweek, etc.) and you have a minimum of five references. The literature is condensed in an intelligent fashion, with only the most relevant information included. Citations are in the correct format (see APA format sheets).
• Statement of hypothesis: The hypothesis (or hypotheses) is clearly stated and is specific about what is predicted. The relationship of the hypothesis to both the problem statement and the literature review is readily understood from reading the text.

Methods

Sample section
• Sampling procedure specifications: The procedure for selecting units (e.g., subjects, records) for the study is described and is appropriate. The author states which sampling method is used and why. The population and sampling frame are described. In an evaluation, the program participants are frequently self-selected (i.e., volunteers) and, if so, should be described as such.
• Sample description: The sample is described accurately and is appropriate. Problems in contacting and measuring the sample are anticipated.
• External validity considerations: Generalizability from the sample to the sampling frame and population is considered.

Measurement section
• Measures: Each outcome measurement construct is described briefly (a minimum of two outcome constructs is required). For each construct, the measure or measures are described briefly and an appropriate citation and reference is included (unless you created the measure). You describe briefly the measure you constructed and provide the entire measure in an Appendix. The measures which are used are relevant to the hypotheses of the study and are included in those hypotheses.
• Construction of measures: For questionnaires, tests and interviews: questions are clearly worded, specific, appropriate for the population, and follow in a logical fashion. The standards for good questions are followed. For scales, you must describe briefly which scaling procedure you used and how you implemented it. For qualitative measures, the procedures for collecting the measures are described in detail. For archival data, original data collection procedures are adequately described and indices (i.e., combinations of individual measures) are constructed correctly.
• Reliability and validity: You must address both the reliability and validity of all of your measures. For reliability, you must specify what estimation procedure(s) you used. For validity, you must explain how you assessed construct validity; you should minimally address both convergent and discriminant validity. Wherever possible, multiple measures of the same construct are used. The procedures which are used to examine reliability and validity are appropriate for the measures.

Design and procedures section
• Design: The design is clearly presented in both notational and text form. The design is appropriate for the problem and addresses the hypothesis.
• Internal validity: Threats to internal validity and how they are addressed by the design are discussed. Any threats to internal validity which are not well controlled are also considered.
• Description of procedures: An overview of how the study will be conducted is included. The sequence of events is described and is appropriate to the design. Sufficient information is included so that a reader could replicate the essential features of the study.

Results
• Statement of results: The results are stated concisely and are plausible for the research described.
• Tables: The table(s) is correctly formatted and accurately and concisely presents part of the analysis.
• Figures: The figure(s) is clearly designed and accurately describes a relevant aspect of the results.

Conclusions, Abstract and Reference Sections
• Implications of the study: Assuming the expected results are obtained, the implications of these results are discussed. The author mentions briefly any remaining problems which are anticipated in the study.
• Abstract: The Abstract is 125 words or less and presents a concise picture of the proposed research. Major constructs and hypotheses are included. The Abstract is the first section of the paper. See the format sheet for more details.
• References: All citations are included in the correct format and are appropriate for the study described.

Stylistic Elements
• Professional writing: First person and sex-stereotyped forms are avoided. Material is presented in an unbiased and unemotional (e.g., no "feelings" about things), but not necessarily uninteresting, fashion.
• Parallel construction: Tense is kept parallel within and between sentences (as appropriate).
• Sentence structure: Sentence structure and punctuation are correct. Incomplete and run-on sentences are avoided.
• Spelling and word usage: Spelling and use of words are appropriate. Words are capitalized and abbreviated correctly.
• General style: The document is neatly produced and reads well. The format for the document has been correctly followed.
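The reliability item in the checklist above asks you to name the estimation procedure you used. As a hedged illustration only (this is not part of the course text, which carries out analysis in SPSS), the sketch below computes one widely used internal-consistency estimate, Cronbach's alpha, from a small respondent-by-item score matrix; the data and function name are invented for the example.

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))
# Each row is one respondent; each column is one questionnaire item.
from statistics import pvariance

def cronbach_alpha(rows):
    k = len(rows[0])                                  # number of items
    items = list(zip(*rows))                          # column-wise item scores
    item_var = sum(pvariance(col) for col in items)   # sum of item variances
    total_var = pvariance([sum(r) for r in rows])     # variance of total scores
    return (k / (k - 1)) * (1 - item_var / total_var)

# Example: 4 respondents answering 3 Likert-type items.
scores = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
print(round(cronbach_alpha(scores), 2))   # → 0.97
```

Values near 1 indicate that the items move together (high internal consistency); the same matrix fed to SPSS's reliability procedure should give the same alpha.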

Why do a Pilot-test?
The pilot-test is useful for demonstrating instrument reliability, the practicality of procedures, participants' capabilities or the investigator's skills, and the variability of observed events as a basis for power tests. Even a modest pilot test conducted informally can reveal flaws in the research design or methodology beforehand. Any surveys that have not been used in the past, or have been modified in any way, should always be pilot-tested. Any procedures that require complex instructions should be pilot-tested. Any methodology requiring time estimates should be pilot-tested. The pilot test is also a good way to determine the necessary sample size needed for experimental designs: from the findings of the pilot test, the researcher can estimate the expected group mean differences as well as the error variance.

Pilot testing allows you to answer the following questions:
• Is each of the questions valid?
• Are all the words understood?
• Do all respondents interpret questions similarly?
• Does each closed-response question have an answer that applies to each respondent?
• Are questions answered correctly?
• Does any aspect of the questionnaire suggest bias on the part of the researcher?
• Does the questionnaire create a positive impression, one that motivates people to answer it?

Selecting the Pilot Test Sample
The sample for the pilot test should be as close as possible to the actual sample that will be drawn for the main project. When this is not possible, then you should try to get a sample with similar characteristics. Depending upon the availability of people and of volunteers, you may need to save as many participants for the main survey as you can, in which case you do not want to include them in a pilot test. Some researchers will instead do a pilot test on a subset of their sample and then include them as part of the main sample; but if you make any change whatsoever to your study as a consequence of the pilot-test, then the participants in the pilot-test will have experienced something different from those in the main study.

Information to be Collected
The pilot test should be run exactly as if it were the actual study. The exception here is that you will be collecting data on how long procedures take, what actions facilitate or inhibit the operation of the study, whether instructions are understood, and whether the data you obtain is in the form expected. Additionally, one of the purposes of doing a pilot test is to debrief the participants after the study by asking questions about the methods, instruments and procedures.

Formatting

Booklet Format and Printing Procedures
Print the survey booklet on 8½ x 11 paper. The survey pages should be printed using a high-quality laser printer on white or off-white paper. Place no questions on the front or back pages, and make questions fit each page.

Ordering the Questions
Questions are ordered according to social usefulness or importance: those which people are most likely to see as useful come first and those least useful come last. The first question is the most important. It should be clearly related to the survey topic, easy to answer, clearly applicable and interesting to everyone, and it should convey a sense of neutrality. Questions in any topic area that are most likely to be objectionable to respondents should be positioned after the less objectionable ones. Group questions that are similar in content. Demographic questions are usually placed at the beginning or at the end. Use transitions for continuity, for example when a new line of questioning starts, when a new page starts, or to break up the monotony of a long series of questions on a single topic. It is also useful to distinguish between major and minor transitions, and transitions must fit the situation.

Formatting the Pages
Use lower case letters for questions and upper case for answers. Identify answer categories on the left with numbers; this allows precoding of responses. Use the multiple-column technique to conserve space. Use words for answer choices and show a connection between items and answers. For items in a series, repeat the scale for each item. Ask one question at a time; the respondent should only be asked to do one thing at a time. The problem with asking two questions at once is that each request interferes with the other: that is like mixing apples and oranges. Directions for answering are always distinguished from the questions by putting them in parentheses, and the need to provide clear directions is extremely important. Show how to skip screening questions. Establish a flow of responding from one question to the next, and establish a vertical flow. The purpose of vertical flow is to prevent inadvertent omissions, something that occurs often when respondents are required to move back and forth across a page with their answers; it also prevents the common error of checking the space on the wrong side of the answers when answer categories are placed beside one another, and it enhances feelings of accomplishment. Use graphic illustrations, and use the same marking procedure throughout the survey. Subtitles are often useful.

Designing the Covers
The front cover receives the greatest attention and contains:
• A study title (the title should sound interesting)
• A graphic illustration
• Any needed directions
• The name and address of the study sponsor
The return address does not include the name of the researcher; the goal is to have the respondent view the researcher as an intermediary between the respondent and the accomplishment of the study. The back cover should consist of an invitation to make additional comments, a thank you, and plenty of white space.
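The passage above notes that a pilot test yields estimates of the expected group mean difference and the error variance, which are exactly the inputs a sample-size (power) calculation needs. The following sketch is my own illustration, not part of the course material: it uses the standard normal approximation for a two-sided, two-group comparison, with the pilot figures invented for the example.

```python
# Approximate n per group: n = 2 * sd^2 * (z_{1-alpha/2} + z_{power})^2 / delta^2
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Sample size per group for detecting a mean difference `delta`
    given error standard deviation `sd` (normal approximation)."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    z_b = z(power)             # about 0.84 for 80% power
    return ceil(2 * sd ** 2 * (z_a + z_b) ** 2 / delta ** 2)

# Pilot estimates: group means differ by 2 points, error SD about 5.
print(n_per_group(delta=2, sd=5))   # → 99
```

Because the approximation ignores the t-distribution correction, treat the result as a lower-bound planning figure; a small pilot also estimates `sd` imprecisely, so it is prudent to plan above this number.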

Participant Debriefing
If your study involves questionnaires or interviews with people, you should have a debriefing session at the completion of the pilot test. Ask the participants if they understood all of the instructions, if they had any particular problem with any of the questions asked, if they understood the intent of the study, and if they had any recommendations on how to improve the study. In the case of instructional materials or methods, you would do a formative evaluation of the materials and methods: unlike a pilot test, where the researcher may not interact with participants, you would be asking questions of the participants as they read, watch or listen to the instruction, and when they are quizzed on what they have learned. It may be necessary to have more than one pilot test, especially in situations where instructional materials or methods have been developed.

Self Assessment Exercises
1. Take a report of any organization and check whether the problem-solving approach or the descriptive approach has been used. If you were to rewrite the report, what would be your contents outline, and what steps would you follow to improve the report?
2. Describe an incident that has recently occurred and check whether your description answers all the conditions indicated under descriptive reporting.
3. Prepare a sample title/cover page.
4. Pick up a report that you have recently prepared. Examine whether the introductory pages contain all the sections indicated in this unit. If not, put in these sections if they are necessary for the report.
5. Examine the appendices to any report. Are all of them essential for understanding the theme of the report? Can they be pruned?
6. Edit a report using the copy reading and proof reading symbols.

Activities

Role Playing
1. You are the research director for a major bank. You are to recruit a junior analyst who would be responsible for collecting and analyzing secondary data (data already collected by other agencies that are relevant to your operations). With a fellow student playing the role of an applicant for this position, conduct the interview. Does this applicant have the necessary background and skills? Reverse the roles and repeat the exercise.
2. You are a project director working for a major research supplier. You have just received a telephone call from an irate respondent who believes that an interviewer has violated her privacy by calling at an inconvenient time. The respondent expresses several ethical concerns. Ask a fellow student to play the role of this respondent. Address the respondent's concerns and pacify her.

Group Discussion
As a small group of four or five, discuss the following issues:
1. What type of institutional structure is best for a marketing research department in a large business firm?
2. What is the ideal educational background for someone seeking a career in marketing research? Is it possible to acquire such a background?
3. Can ethical standards be enforced in marketing research? If so, how?

Presentations
You have recently read a book and your friends want you to make a brief presentation about it. What will be your outline? How would you go about preparing and handling the audio-visual materials?

Fieldwork
1. Interview someone who works in the marketing research department of a major corporation. What is this person's opinion about career opportunities available in marketing research? Write a report of your interview.
2. Interview someone who works for a marketing research supplier. What is this person's opinion about career opportunities in marketing research? Write a report of your interview.
3. Using your local newspaper and national newspapers such as USA Today, the Wall Street Journal, or the New York Times, compile a list of career opportunities in marketing research.

References and Further Readings
• Abrams, Mark, Social Surveys and Social Action, London: William Heinemann Ltd., 1951.
• Anderson and Zelditch, Morris Jr., A Basic Course in Statistics with Sociological Application, New York: Holt, Rinehart and Winston Inc.
• Best, John, Research in Education, New Delhi: Prentice Hall of India Pvt. Ltd.
• Gallagher, William J., Report Writing for Management, Addison-Wesley.
• Golen, Stevan, Report Writing for Business and Industry.
• Kepner, Charles H. and Tregoe, Benjamin B., The Rational Manager, McGraw-Hill Book Company.
• Sharma, R. C. and Krisna Mohan, Business Correspondence and Report Writing, Tata McGraw-Hill Book Company, 1975.
• Wright, Report Writing, Witherby & Co., England.
• Course Design MS 95, Unit IV, "Report Writing and Presentation", IGNOU.

• Blalock, Hubert M. Jr. and Blalock, Ann B., Methodology in Social Research, New York: McGraw-Hill Book Company, 1968.
• Borg, Walter R., Educational Research: An Introduction, New York: David McKay Company, 1963.

Notes

LESSON 8: TECHNIQUES OF DATA COLLECTION
UNIT II: TECHNIQUES OF DATA COLLECTION AND SAMPLING

Students, so far we have studied the various research processes and the writing of the report; the write-up part we have done in detail. After the research problem is framed, along with the hypothesis, the next step is the collection of the required information and data. Now we will be discussing in detail each and every step of this collection. In this class we will be focusing specially on the collection of qualitative data for market research. The list of techniques and sources of data is:

Sources of Market Data: retail audit; consumer panel (diary method); TV meters; the Internet as a source of data.
Sources of Secondary Data: RBI; Economic Survey; CSO; investment data; foreign trade survey data.

Now let us discuss these types of survey techniques in detail.

Retail Audit
Retail audit is a common term in marketing research. During the 1990s it became increasingly important to develop a strong brand image: it is not just the product that needs to be sold, but also the brand, charged with values such as ethics, quality, feelings and identity that put over a positive message to consumers. Today every organisation has a social responsibility, and the role of the company extends beyond just financial issues. Consumer and pressure groups are increasingly concerned about the social conditions to which workers in developed and developing countries are subjected, since many companies are moving their production from their home countries to nations where manufacturing costs are considerably lower. Forced labor, child labor, low pay, poor conditions and dangerous working environments are all areas of serious concern to the reputable retailer or brand owner. Consumers expect companies to accept their responsibilities and to conduct their activities in accordance with the ethical and moral values accepted in the country in which their product is sold. An audit process of this kind includes an opening meeting, a factory tour, document review, interviews with employees and a closing meeting.

The design of a retail audit is critical to the success of the project, and it must be noted that there are no readily available retail universe data. The key parameters that we look at when carrying out retail audits are:
• In-store availability of product/brand
• Types of outlets (by owner, location, specialty)
• Pricing of product/brand, cross-tabbed with type/location of outlet
• Sales volume, cross-tabbed with type and location of outlet
• Display value
• Customer demand
• Resulting market share and rank/position of product/brand

The data obtained from the retail audit is useful for carrying out:
• Identification of market opportunities
• Trend analyses and forecasting
• Studying market structure
• Prioritisation of markets
• Conducting analyses of competitors
• Product portfolio analysis
• Understanding changes in distribution
• Pricing trend analyses

Product Categories Covered
This audit covers more than 100 product categories, including:
• Baby products (oil, soap, powder, diapers, weaning food, milk food, etc.)
• Beverages (coffee, tea, concentrated drinks, squash and juice, syrup, etc.)
• Contraceptives
• Cosmetics (colognes, perfume, deodorant, lipstick, nail polish, talcum powder, etc.)
• Environmental hygiene (air freshener, floor cleaner, floor polish, etc.)
• Fabric care (fabric bleach, detergent, washing powder, liquid, whitener, etc.)
• Food products (butter, margarine, salt, soup mix, packaged food, etc.)
• General toiletries (mouthwash, toothpaste, toothbrush, toilet soap, sanitary napkins, etc.)

cold cream. In this case no type of experimental treatment is involved.• Hair care (conditioner. Consumer panels can also be an overly costly. “White area”: from official stats sources to fully legal retail.) • Skin care (cream. • • • • • • • • • • • Market size in terms of units sold. and sample accordingly. and perceptions. and presented as text. vodka. Fine-tuning and approval of research approach. termed as. ad-hoc open air markets. 5. blades. etc. A static consumer panel of families with young children might be set up to monitor the acceptance of new line of toys. gin. rum. face-wash. Consumer panels are a unique tool that can enable a clever researcher to examine dynamic longitudinal changes in behaviors. termed as discontinuous access panels. a sample of families might be interviewed initially to gather information on their purchases of soft drinks. analyzed. confectionery. Following steps can be followed We never assume that our clients will mean the same thing under “retail audit”. possibly over several weeks to obtain a good idea of their “steady state” purchasing patterns. etc. condensed milk. paint. In this way. Mystery shopping. digestive. c.556 © Copy Right: Rai University 49 . Launch and management of field research As a rule. etc.) 7. • • • • Scope and goals. Optimal sample size. The effect of a special offer can be measured through a before-and-after design using a panel approach. Usually the sources can be broken down into three basic groups: a. shampoo. Contacting the panel either too frequently or too infrequently may lead to reduced cooperation. 2. bar code data (scanning d-bases). liquor. volume and value Market share by volume and value Numeric distribution Weighted distribution Share among handlers Out-of-stock retailers Per dealer off-take Purchases by retailers Stock levels with retailers Stock turnover ratio Trends for market. lubricants. we use the following field research methods: • • • Observation.) 
• Health products and OTC (analgesic. Draft the research plans and schedule. Design and production of customized research tools. The Benefits of Continuous Consumer Panels 1.) • Semi-durable products (batteries. structured. wine. Rather. medicated dressing. tube lights. Face-to-face POS interviews. and kiosks (partial reporting) original. We always strive to define exactly the specific knowledge needs. say. company. 4. Cheese. Thus. audit code levels. continuous panels. Panel studies can involve data collection at widely different intervals varying anywhere from a day to several years between waves of interviews. b.for size and shares Consumer Panel There’s nothing (consumer) panel data can tell us that we don’t already know from scanner data. customized databases. but locally unauthorized product. etc. “Grey area”: includes medium and small wholesale.) • Snack foods and soft drinks (biscuits. and the purchases of the same sample are monitored for perhaps every week for three months. etc. razors. Our experience has taught us that there can be no long-term representative samples. excessive generator of unused data What are Consumer Panels? There are two basic kinds of consumer panels. Panel operators are continuously faced with the decision about how often panel members should be contacted and asked to report. the data are punched in (software and formats to be determined based on client needs). Deadlines. Structure and format of reports. lotion. Each new project requires a revision of the existing sample size and structure in order to achieve credible results. brand and SKU . or a combination of these. attitudes.) • Shaving products (after-shaves. etc. dye. graphics. etc. They are mostly non-existent.) Measures “Black area”: private entrepreneurs operating without a license. babushkas. etc. sampling variation is minimized and both short-term and long-term effects of the deal are obtained. whisky. 
The second kind of panel consists of samples of pre-screened respondents who report over time on a broad range of different topics. every month on the toy 3. 8. RESEARCH METHODOLOGY • Liquor (beer. respondents report essentially the same information repeatedly over some period of time. brandy. milk powder. Analysis and report writing After verification. van sales. chocolates. Note: do not expect data on opening stock/deliveries/closing stock. A special deal for a particular brand is then introduced. Data collection 11. methods of collecting quantitative & qualitative data. and design the approach.) • Milk products (milk. indicating. In the first kind.. 6. Both kinds of panels come in all different forms. oil. The chief examples of these kinds of panels are the syndicated purchase panels using store and home. methodology. information is obtained. 1. bulbs.
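The before-and-after panel design described above can be sketched in a few lines. This is a minimal illustration with hypothetical weekly purchase figures: because the same families are measured in both periods, the deal's effect is simply the mean of the paired differences, which avoids the sampling variation of comparing two independent samples.

```python
# Hypothetical sketch of a before-and-after panel analysis: the same
# families are measured before and after a special offer, so the
# effect is the mean of the paired (after - before) differences.
from math import sqrt
from statistics import mean, stdev

# Weekly soft-drink purchases (litres) per panel family (made-up data).
before = [4.0, 2.5, 6.0, 3.5, 5.0, 1.5]
after = [5.5, 3.0, 6.5, 4.5, 6.0, 2.0]

diffs = [a - b for a, b in zip(after, before)]
effect = mean(diffs)                   # average uplift per family
se = stdev(diffs) / sqrt(len(diffs))   # standard error of the paired mean

print(f"mean uplift: {effect:.2f} L/week (SE {se:.2f})")
```

Because each family serves as its own control, the standard error here reflects only the variability of the *changes*, not of the purchase levels themselves.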

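Returning briefly to the retail-audit measures listed earlier, the distribution measures have simple arithmetic definitions that are worth making concrete. The sketch below uses hypothetical per-outlet records (the field layout is illustrative, not from any real audit package): numeric distribution is the share of outlets stocking the brand, while weighted distribution weights each handler by its category turnover.

```python
# Illustrative sketch (hypothetical data) of three retail-audit
# measures: numeric distribution, weighted distribution, and market
# share by volume, computed from per-outlet audit records.

# Each record: (outlet_id, outlet category volume, brand volume sold).
audit = [
    ("A", 1200, 300),  # outlet stocks the brand
    ("B", 800, 0),     # handles the category but not the brand
    ("C", 2500, 900),
    ("D", 500, 50),
]

handlers = [r for r in audit if r[2] > 0]
category_volume = sum(r[1] for r in audit)

# Numeric distribution: share of audited outlets stocking the brand.
numeric_dist = len(handlers) / len(audit)

# Weighted distribution: category turnover of the handlers as a share
# of category turnover across all audited outlets.
weighted_dist = sum(r[1] for r in handlers) / category_volume

# Market share by volume among the audited outlets.
market_share = sum(r[2] for r in audit) / category_volume

print(f"numeric distribution:  {numeric_dist:.0%}")
print(f"weighted distribution: {weighted_dist:.0%}")
print(f"volume share:          {market_share:.1%}")
```

The gap between numeric and weighted distribution shows whether a brand is carried mainly by small or by high-turnover outlets.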
3. A continuous consumer panel may be used to obtain more detailed and reliable information on different types of behavior. It has been demonstrated that data on consumer financial holdings are obtained much more reliably if this information is sought over a period of time, thus allowing the respondent to build up confidence in the validity and trustworthiness of the study. Similarly, information on medical care events is obtained much more accurately from panels than from one-time surveys.
4. A continuous consumer panel is the only means of obtaining information on a series of events extended through time. For example, reactions to the weekly episodes of a television program are best obtained by monitoring the viewing of the same family and at the same time getting their reactions to the different programs. In this way it becomes possible to measure changes in program acceptance and to relate attitudes and behavior at one time to viewing and attitudes toward earlier episodes.
5. Only through continuous consumer panels is it possible to monitor changes in the behavior of particular cohorts. For example, the purchase habits of teenagers might be monitored over a number of years to ascertain how these purchase habits change as the subjects move into a different stage of life. By monitoring the behavior of peers at the same time, it becomes possible to distinguish effects due to history (i.e., changes in economic and social conditions) from effects due to the aging process.
6. A dynamic consumer panel might be used to keep track of the purchases of frozen foods of one brand in relation to other brands. By obtaining such data every week for several years, very detailed information can be obtained on what sorts of families are purchasing each major brand and on the change in market shares of the different brands over time among different groups of consumers. Estimates can also be derived of the extent to which purchasers remain loyal to different brands, yielding measures such as trial and repeat and brand switching. It is obvious that similar information could be obtained from one-time surveys, but with greater difficulty and at greater expense.

Uses of Discontinuous Consumer Access Panels
Although these kinds of panels are used in a wide variety of ways, three uses are especially common: (1) screening for special populations (especially rare populations); (2) evaluation of new product concepts and formulations; and (3) marketing and advertising experimentation. The following examples illustrate some of these uses:
1. A manufacturer of tennis racquets is considering alternative shapes for a new racquet that would make it easier to handle. Sheets with pictures and a description of the new racquets might be sent by mail or e-mail to pre-screened samples of respondents who play tennis. Any one respondent would receive only one of the alternatives, but the manufacturer could determine which racquet was preferred from the different samples. Alternatively, respondents might receive pictures of two racquets, with the order of the pictures randomized, and be asked for their preference between the two. At a later stage, samples of each of the alternatives would be sent to relevant panel members for their evaluation; respondents might even receive the actual racquets for use testing.
2. Similarly, instead of a new product, a marketer might be considering a new advertising campaign for an existing product, and might wish to choose between several alternatives that had been proposed by the advertising agency. Panel members might be asked to evaluate a single advertisement or to choose from among multiple advertisements. The testing could also be done by the advertising agency before the recommendation was made to the manufacturer. Similarly, panel members could be asked to evaluate different designs or layouts for a web page or a brochure. In all cases, the objective is to screen different ideas or executions inexpensively by having a panel evaluate them singly or side-by-side.
3. It is possible to use a series of demographically identical discontinuous access panels for the purposes of continuous tracking. This can be done by selecting demographically identical samples containing different panelists at predetermined intervals across time; these different groups of panelists can then be used separately in the separate waves of the panel. What is gained is a means of obtaining such insights at a lower cost than maintaining a continuous, full-time panel; what is lost is the traditional static-panel analytics.

The Benefits of Discontinuous Consumer Access Panels
The benefits of discontinuous consumer access panels are primarily related to reductions in the cost and time required to obtain market research information. Two further reasons for using discontinuous panels are that they can provide greater relevance and better quality:
1. They are more relevant because respondents can be easily screened on the basis of prior questions (e.g., users of denture cream, pet owners, recent car purchasers).
2. They are often better quality because respondents are experienced and can be pre-qualified as panel members on the basis of the quality of their previous survey responses.
In the next section we discuss problems with panels that sometimes make one-time surveys the better alternative.

Challenges of Panels
Essentially, a continuous consumer panel operation poses four problems: (1) gaining and maintaining cooperation; (2) information validity and reliability; (3) panel conditioning; and (4) record maintenance. For discontinuous panels, these problems are largely identical to those of conducting one-time surveys.

1. Gaining and Maintaining Cooperation
Even when panels are recruited by personal methods, the initial rate of cooperation can be as low as 50%, and it may be much lower if recruiting is done by mail, the Internet, or the World Wide Web. Those who cooperate are more likely to be more educated, in professional or clerical occupations, at middle or upper-middle income levels, and in the younger and middle age brackets.
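The two screening designs in the racquet example, monadic exposure (each respondent sees one concept) and paired comparison with randomized presentation order, can be sketched as follows. All data here are simulated stand-ins; the concept names and rating scale are hypothetical.

```python
# Sketch (simulated data) of two concept-screening designs:
# monadic exposure vs. paired comparison with randomized order.
import random

random.seed(7)
concepts = ["A", "B"]
panel = [f"resp{i}" for i in range(200)]

# Monadic: each respondent is randomly assigned ONE concept and rates it.
monadic = {c: [] for c in concepts}
for _ in panel:
    concept = random.choice(concepts)
    rating = random.randint(1, 10)      # stand-in for a real rating
    monadic[concept].append(rating)

# Paired comparison: both concepts shown, picture order randomized,
# and the respondent's choice is recorded.
votes = {c: 0 for c in concepts}
for _ in panel:
    shown = random.sample(concepts, k=2)  # randomized presentation order
    choice = random.choice(shown)         # stand-in for the actual pick
    votes[choice] += 1

means = {c: round(sum(v) / len(v), 2) for c, v in monadic.items() if v}
print("monadic mean ratings:", means)
print("paired-comparison votes:", votes)
```

Monadic designs avoid direct-comparison bias but need larger samples per concept; paired comparison gives a sharper preference read from fewer respondents.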

Almost all such mail and online panels are recruited with initial cooperation rates usually below 5%. It is often claimed by operators of such panels that the response rate to an individual survey is 70 percent or higher, but this refers to respondents who had already previously agreed to participate. There are also very high initial dropout rates: about half the people who even consent to participate may drop out after the first two or three rounds. Attrition can be substantial, so a panel can become increasingly unrepresentative of the population from which it originally came; indeed, a panel at the beginning of the operation may not be representative of the population from which it was selected. Panel operators take steps to reduce this non-representativeness through a variety of methods, such as selective recruiting and weighting the panel. Operators of discontinuous panels make initial efforts to balance their samples for major demographic characteristics by selectively recruiting respondents from groups least likely to cooperate and by dropping from their panels respondents from groups that are over-represented. In addition, the data from individual surveys are weighted to further control for major demographic biases.

Sample representativeness is also a concern when recruiting online panels. Critics argue that because online panels require computer literacy and the means to access the Internet, they are biased against low-income groups and technological laggards. Surfers who inadvertently stumble onto a site, or who are attracted by the lure of a lottery, may be even less representative than those recruited through more deliberate or personal means. These concerns may be exacerbated depending on how the panel is recruited; but just as efforts are made to make off-line panels more representative, many of these same efforts are being used to make online research more representative.

2. Information Validity and Reliability
Independent of the population being sampled is the reliability of the information obtained from panel members. Many panels still rely on diaries. Although diaries significantly reduce reporting error as compared to recall, reporting errors do occur if panel members forget to make their entries, or attempt to recall and record earlier behavior at the end of the recording period instead of at the time it occurred. This is especially a problem for behaviors that are infrequent and of low salience to respondents, and for panel members who are asked to keep extensive written records. Certainly the introduction of portable household scanner equipment and electronic meters for television viewing has increased validity, although even such equipment does not prevent errors caused by respondents forgetting to use it. For continuous panels that measure behavior, the only way of ultimately verifying the quality of responses is to observe marketplace results: shipment data, for example, can validate results, particularly at an aggregate level. The extensive and increasing use of discontinuous panels suggests that the responses obtained from them are sufficiently accurate for making marketing decisions. It is important to realize that "purpose defines precision": the representativeness and precision required to determine which of six package designs is most appealing is different (perhaps less demanding) than that needed to estimate the impact of a price change on market share.

3. Panel Conditioning
The third major problem, which affects continuous panels but probably not discontinuous ones, is the danger of conditioning effects: the possibility that the behavior or attitudes of panel members will be influenced or contaminated by their participation in the panel. Since many of the uses of a panel relate to attitudes and buying intentions, this is a serious concern. For example, a family asked month after month about ownership of savings accounts may decide to open a savings account, even though they originally had no such intention. Similarly, respondents who keep diaries about visits to restaurants may become aware of the large amount of money they are spending on restaurant meals and either reduce the frequency of their visits or switch to lower-cost restaurants. Panel conditioning effects are both erratic and pervasive: they exist in some sorts of studies but do not seem to exist in others. Fortunately, methods exist for detecting and correcting such effects.

4. Record Maintenance
The fourth major problem of a continuous panel study is not so much methodological as of the researcher's own making: the need for some systematic means of keeping easily accessible records on the activities of panel members and on changes in the characteristics of these panel members over time. Since members of a panel are usually households or families rather than single individuals, there is the problem in a long-term panel study of keeping track of changes in the composition of these households and changes in their characteristics. A family member may leave the household, another may be born or move into it, a household may be dissolved, or a new household may be formed; the employment status and other characteristics of the individual members will also change over time. All of these changes have to be recorded so that the data can be used for analytical purposes when required, something that could be a problem even for a static panel.

Some idea of the magnitude of the problem can be obtained from the fact that in many panel studies a single round of data collection may provide information on 500 to 1,000 variables. If a panel has 10,000 families and information is obtained every month for, say, five years, the number of pieces of information could be as high as 600,000,000. The attitudes and behavior of the panel members also have to be recorded in such a way that the data are readily accessible, particularly so that analyses can be made either on a cross-sectional or a longitudinal basis. Fortunately, the capacity of computers has expanded so rapidly that the computers themselves are no longer the problem. The major problem was, and remains, the designing of computer systems that make storing and accessing the data straightforward.
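The demographic weighting mentioned above can be illustrated with the simplest scheme, cell weighting: each respondent receives a weight equal to the population share of their demographic cell divided by its sample share, so the weighted sample reproduces known population targets. The age bands and shares below are hypothetical; real panel operators typically use more cells and iterative (raking) methods.

```python
# Minimal sketch (hypothetical targets) of demographic cell weighting:
# weight = population share of cell / sample share of cell.
from collections import Counter

respondents = ["18-34"] * 20 + ["35-54"] * 50 + ["55+"] * 30
population_share = {"18-34": 0.35, "35-54": 0.40, "55+": 0.25}

n = len(respondents)
sample_share = {cell: c / n for cell, c in Counter(respondents).items()}
weights = {cell: population_share[cell] / sample_share[cell]
           for cell in population_share}

for cell in sorted(weights):
    print(f"{cell}: sample {sample_share[cell]:.0%}, "
          f"target {population_share[cell]:.0%}, weight {weights[cell]:.2f}")

# After weighting, cell shares match the population targets exactly.
weighted_share = {c: sample_share[c] * weights[c] for c in weights}
```

Note that weighting corrects composition, not cooperation bias: if the under-represented group that did respond differs from those who refused, the weighted estimates inherit that bias, which is the point made above about judging panels on cooperation as well as composition.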

It is especially important for syndicated services to design systems that make client access to data fast and easy.

Worldwide Major Panel Operators (country; firm; year started)
• USA: AC Nielsen (1933); NFO Worldwide (1946); Opinion Research Corporation Intl. (1938); Harris Interactive (1956); NPD Group (1953); Market Facts (1946); Roper Starch Worldwide (1923); Burke (1931); Maritz Marketing Research (1973); MORPACE International (1940); Taylor Nelson Sofres Intersearch (1960); Lieberman Research Worldwide (1973); Creative and Response Research Services (1960); Ziment (1976)
• UK: BMRB International (1933); The Gallup Organization (1937); ORC International (1938); INFRATEST BURKE GROUP (1946); Research International (1962); Harris Research (1965); MVA (1968); MORI (Market & Opinion Research Intl.) (1969); Martin Hamblin (1969); Millward Brown UK (1973); BJM Research and Consultancy (1973); Isis Research plc (1973); Ipsos-RSL (1974); The Research Business International (1981); Research Resources (1986); GfK Marketing Services (1992); Information Resources (1992)
• France: IFOP (1938); Research International (1952); SECODIP / Groupe SOFRES (1969); B.V.A. (1970); Ipsos France (1975); CSA (CSA TMO Group) (1983); MEDIAMETRIE (1985); IRI-SECODIP (1993)
• Japan: Marketing Intelligence Corporation (MiC) (1960); Video Research (1962); NIKKEI RESEARCH (1970)
• Canada: CF Group / Canadian Facts (1932); Angus Reid Group (1979)
• Brazil: IBOPE Group / IBOPE Ad Hoc (1942); MARPLAN BRASIL Pesquisas (1958); Instituto de Pesquisas Datafolha (1983); INDICATOR Pesquisa de Mercado (1987)
• India: MBL Research & Consultancy Group (1961); Indian Market Research Bureau (IMRB) (1970); ORG-MARG Research; Indica Research (1994); GfK Research Services (1997)
• Korea: Hyundai Research Institute (1986)
• Indonesia: PT AMI Indonesia (1996)
• Turkey: Procon

TV Meters

It's easy to get lost in the details of audience research, and the details can quickly overwhelm you. So I'd like to begin with a very simple model for television research quality assessment. I believe that an acceptable research service has to be thought of as having four quality components. Good research always requires:
• A sample of the right people...
• Who provide accurate data about their behavior...
• In sufficient quantity for your purpose...
• And collected through sound, openly conducted methodology.

Take away any one of those and you don't have usable research. The methodology issues of Who, How Many, and What Data can only be considered in the applied context of how the work is actually being executed. If you're trying to decide whether to use a new source of research, or if you're trying to influence the priorities of an existing service, you've got to be able to address these four issues. I would argue that you need knowledge, and an opinion, on each of those four dimensions, together with a good knowledge of the population we're measuring, in order to make a good decision.

Who
The "who" questions, getting a sample of the right people to participate in the research, often get the lion's share of our attention. Under the heading of "getting the right people", television measurement especially is challenged in at least two ways, one obvious and one not so obvious.

Response rate. It would be easy to simply decry the fact that television ratings have response rates that are painfully low. And of course they are low, whether measurement is by meters or by diaries; it doesn't get much better than that, and it's often much less. It would also be easy to make excuses: there do appear to be major factors beyond the direct control of individual suppliers, causing an accelerating overall trend toward lower cooperation. But that's too easy, on us. There are fixes to the declining response rate problem, but most of them cost money; cash incentives, for example, are still the most potent way to affect response rates. Response rates, the golden yardsticks of survey research, also directly affect the unit costs of a research supplier, so for a supplier to invest in response rates, they have to believe that it's a priority with customers. Here's the one thing I wish were easier to measure: we have to insist on open disclosure of cooperation data for all major segments of the population, and even getting appropriate data about cooperation can sometimes be a challenge. If cooperation figures were easier to come by, maybe we'd pay them more attention.

Panels and surveys are frequently judged more on composition than cooperation. Some people think that if your sample appears to mirror the population demographically, then your panel is "representative." There are many shortcuts to a balanced panel. Having stable data isn't very helpful if all you've done is make a bias more consistent. The real issue is reducing the total error in our surveys, and we have to consider not just whether we have "enough" respondents, but whether the ones we have are representative of their population segments.

Sample size. The importance of sample size doesn't need much additional stress from me. As for overall sample size: frankly, there's never enough sample, and we'd always like to have more stable data. But I'd like to remind you quickly that the overall reliability of surveys has multiple components, of which sampling error is only one. So be wary of simplistic assumptions about sampling error. Sometimes I think we go overboard with meter panels, sacrificing our knowledge of the other quality issues while putting samples under the microscope. But there's no question that sampling issues are critical; in my mind, sample size is at least equal in importance to the other three quality factors.

How
Which leads nicely into the "how" issues. I place a number of distinct issues under this heading. First, we have the respondent's ability and willingness to provide accurate data:
• Can they answer the question, and does the question make sense?
• Will they push their buttons, and what do they mean when they do so?
• And just how much of this task will they tolerate, anyway?
Beyond the capabilities of the respondent lie the technical capabilities of any equipment used in the survey or panel.

Let me segue from that to a few observations specific to meter measurement. There are tremendous variations in meter equipment around the world, each with its own strengths and weaknesses, and more variations are coming. For example, BBM in Canada has rolled out an intriguing new people meter design for use nationally and in local markets; BBM has licensed a newly designed people meter from Taylor Nelson AGB, which uses picture matching as an alternative means of identifying the channels tuned. One of the strengths of the BBM system was their open approach to testing this new meter. This is the way to pursue changes in technology: with an unusually rigorous test design, with full disclosure of defined, objective policies and procedures, with an industry steering committee representing all parts of the industry, and with provision for two independent audits. That's why it's critical to have a transparent, verifiable system in place; without one, research quality and usability can be very difficult to assess, even for a supplier with high-quality process controls. Those are some of my core beliefs about media research.
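One way to see why "simplistic assumptions about sampling error" mislead is to compute the margin of error on a projected rating directly. The sketch below uses hypothetical numbers (a 5,000-home sample, an assumed 100-million-home universe, and an assumed design effect of 1.5 standing in for clustering and weighting losses): the standard error of a proportion shrinks only with the square root of sample size, and any design effect widens the interval further.

```python
# Back-of-envelope sketch (hypothetical figures) of sampling error on a
# television rating, with and without an assumed design effect.
from math import sqrt

sample_homes = 5000
tuned = 450                    # metered homes tuned to the program
universe = 100_000_000         # assumed TV-household universe estimate

rating = tuned / sample_homes  # 0.09, i.e. a "9 rating"
projected = rating * universe  # projected audience in homes

def se(p, n, deff=1.0):
    """Standard error of a proportion, inflated by a design effect."""
    return sqrt(deff * p * (1 - p) / n)

for deff in (1.0, 1.5):
    moe = 1.96 * se(rating, sample_homes, deff) * universe
    print(f"projected {projected:,.0f} homes +/- {moe:,.0f} (deff={deff})")
```

Even under simple-random-sample assumptions the 9-million-home estimate carries a margin of hundreds of thousands of homes; with a realistic design effect it is wider still, which is why shares for small channels are far less stable than headline ratings suggest.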

Ability to Estimate the Universe
Another sampling challenge concerns our ability to estimate the universe that we project our numbers to. Television measurement is dependent on many non-government measures of the population: homes, TVs, and people are measured in a variety of ways. One of the greatest concerns is the extent to which the population can change rapidly without us knowing about it. To at least some users, those population estimates are of growing concern.

What
Under the heading of "what", of trying to collect better-quality data from our samples, let me focus just on meter measurement of set tuning. Internationally, there's a lot of effort directed at better measurement of set tuning with meters. New media technologies pose significant near-term threats to the current generation of television set meters; cable and satellite penetration and multi-set distribution are just two of them. The next challenge is direct-to-home satellite, and digital television in general. That's not measured today, and we're honestly not sure how much that affects us.

Statistical Sampling
To find out who is watching TV and what they are watching, meters installed in a selected sample of homes track when TV sets are on and what channels they are tuned to. The national TV ratings largely rely on these meters, and the approach rests on statistical sampling, the same technique that pollsters use to predict the outcome of elections. A "sample audience" is created, and the researcher counts how many in that audience view each program, then extrapolates from the sample to estimate the number of viewers in the entire population watching the show. The research company gets around 5,000 households to agree to be part of the representative sample for the national ratings estimates. To ensure reasonably accurate results, the company uses audits and quality checks and regularly compares the ratings it gets from different samples and measurement methods. This research is very costly.

To find out what people are watching, the researcher relies mainly on information collected from the TV set meters that it installs, and combines this information with huge databases of the programs that appear on each TV station and cable channel. By monitoring what is on TV at any given time, the company is able to keep track of how many people watch which program. Small boxes placed near the TV sets in the national sample measure who is watching, by giving each member of the household a button to turn on and off to show when he or she begins and ends viewing. A "black box", which is just a computer and modem, gathers and sends all this information to the company's central computer every night. That's a simple way of explaining what is a complicated, extensive process.

Advertisers pay to air their commercials on TV programs using rates that are based on these data. Programmers also use these data to decide which shows to keep and which to cancel. A show that has several million viewers may seem popular to us, but a network may need millions more watching that program to make it a financial success. That's why some shows with a loyal following still get canceled.

Using Diaries in Social Research
Biographers, historians and literary scholars have long considered diary documents to be of major importance for telling history. More recently, sociologists have taken seriously the idea of using personal documents to construct pictures of social reality from the actors' perspective (see Plummer's 1983 book Documents of Life). In contrast to these "journal" types of accounts, diaries are used as research instruments to collect detailed information about behaviour, events and other aspects of individuals' daily lives.

Self-completion diaries have a number of advantages over other data collection methods. First, diaries can provide a reliable alternative to the traditional interview method for events that are difficult to recall accurately or that are easily forgotten. Second, like other self-completion methods, diaries can help to overcome the problems associated with collecting sensitive information by personal interview. Finally, they can be used to supplement interview data, providing a rich source of information on respondents' behaviour and experiences on a daily basis.

The Subject Matter of Diary Surveys
A popular topic of investigation for economists, market researchers, and more recently sociologists, has been the way in which people spend their time. Accounts of time use can tell us much about quality of life, social and economic well-being, and patterns of leisure and work. The "time-budget" involves respondents keeping a detailed log of how they allocate their time during the day. One of the most fruitful time-budget endeavors, initiated in the mid-60s, has been the Multinational Time Budget Time Use Project; its aim was to provide a set of procedures and guidance on how to collect and analyse time-use data so that valid cross-national comparisons could be made. More qualitative studies have used a "standard day" diary, which focuses on a typical day in the life of an individual from a particular group or community. Other topics covered using diary methods are social networks; health, illness and associated behaviour; diet and nutrition; social work and other areas of social policy; clinical psychology and family therapy; crime behaviour; alcohol consumption and drug usage; and sexual behaviour. Diaries are also increasingly being used in market research. Two other major areas where diaries are often used are consumer expenditure and transport planning research.

Using Diaries in Surveys
Diary surveys often use a personal interview to collect additional background information about the household, and sometimes about behaviour or events of interest that the diary will not capture (such as large items of expenditure for consumer expenditure surveys). The "diary interview method", where the diary-keeping period is followed by an interview asking detailed questions about the diary entries, is considered to be one of the most reliable methods of obtaining information. A placing interview is important for explaining the diary-keeping procedures to the respondent.
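The basic time-budget tabulation described above, summing diary minutes by activity category and expressing each as a share of the accounted day, can be sketched in a few lines. The categories and entries here are hypothetical; real time-use studies use standardized activity coding frames.

```python
# Sketch (hypothetical one-day diary) of a basic time-budget
# tabulation: minutes summed by activity category and expressed as a
# share of the accounted day.
from collections import defaultdict

diary = [  # (activity category, minutes)
    ("sleep", 450), ("paid work", 480), ("travel", 60),
    ("housework", 90), ("leisure", 240), ("eating", 120),
]

totals = defaultdict(int)
for category, minutes in diary:
    totals[category] += minutes

accounted = sum(totals.values())
for category, minutes in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{category:10s} {minutes:4d} min  {minutes / accounted:5.1%}")
print(f"accounted: {accounted} of 1440 minutes")
```

Comparing the accounted total against the 1,440 minutes in a day is a simple completeness check of the kind editing staff apply to returned diaries.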

Diary Design and Format
Diaries may be open format, allowing respondents to record activities and events in their own words, or they can be highly structured, where all activities are pre-categorized. An obvious advantage of the free format is that it allows greater opportunity to recode and analyse the data; however, the labour-intensive work required to prepare and make sense of the data may render it unrealistic for projects lacking time and resources. For a structured time-budget diary, an exhaustive list of all possible relevant activities should be provided together with the appropriate codes. Where more than one type of activity is to be entered, guidance should be given on how to deal with "competing" or multiple activities, that is, primary and secondary (or background) activities. Time-budget diaries without fixed time blocks should include columns for start and finish times for activities.

Although the design of a diary will depend on the detailed requirements of the topic under study, there are certain design aspects which are common to most. Below are sets of guidelines recommended for anyone thinking about designing a diary:

1. An A4 booklet of about 5 to 20 pages is desirable, depending on the nature of the diary. Disappointing as it might seem, most respondents do not carry their diaries around with them; they often write down their entries at the end of a day, and only a small minority are diligent (and perhaps obsessive!) diary keepers who carry their diary with them at all times.
2. The inside cover page should contain a clear set of instructions on how to complete the diary. This should stress the importance of recording events as soon as possible after they occur, and how the respondent should try not to let the diary keeping influence their behaviour.
3. A model example of a correctly completed diary should feature on the second page.
4. Depending on how long a period the diary will cover, each page should denote either a week, a day of the week, or a 24-hour period or less. Pages should be clearly ruled up as a calendar with prominent headings and enough space to enter all the desired information (such as what the respondent was doing, at what time, where, who with and how they felt at the time, and so on).
5. Checklists of the items, events or behaviour to help jog the diary keeper's memory should be printed somewhere fairly prominent. Very long lists should be avoided, since they may be off-putting and confusing to respondents.
6. There should be an explanation of what is meant by the unit of observation, such as a "session", an "event" or a "fixed time block". Where respondents are given more freedom in naming their activities and the activities are to be coded later, it is important to give strict guidelines on what type of behaviour to include, what definitely to exclude, and the level of detail required.
7. Appropriate terminology or lists of activities should be designed to meet the needs of the sample under study, and if necessary, different versions of the diary should be used for different groups.
8. Following the diary pages it is useful to include a simple set of questions for the respondent to complete, asking, among other things, whether the diary-keeping period was atypical in any way compared to usual daily life. It is also good practice to include a page at the end asking for the respondents' own comments and clarifications of any peculiarities relating to their entries. Even if these remarks will not be systematically analysed, they may prove helpful at the editing or coding stage.

Finally, the amount of piloting required to perfect the diary format should not be under-estimated.

Diary Keeping Period
The period over which a diary is to be kept needs to be long enough to capture the behaviour or events of interest without jeopardizing successful completion by imposing an overly burdensome task. For collecting time-use data, anything from one- to three-day diaries may be used, depending on the nature of the behaviour. Household expenditure surveys usually place diaries on specific days to ensure an even coverage across the week, and distribute their fieldwork over the year to ensure seasonal variation in earnings and spending is captured.

Data Quality and Response Rates
In addition to the types of errors encountered in all survey methods, diaries are especially prone to errors arising from respondent conditioning, incomplete recording of information and under-reporting, inadequate recall, insufficient cooperation and sample selection bias.

Literacy
All methods that involve self-completion of information demand that the respondent has a reasonable standard of literacy. Thus the diary sample, and the data, may be biased towards the population of competent diary keepers.

Reporting Errors
In household expenditure surveys it is routinely found that the first day and first week of diary keeping show higher reporting of expenditure than the following days. This is also observed for other types of behaviour, and the effects are generally termed "first day effects". They may be due to respondents changing their behaviour as a result of keeping the diary (conditioning), or becoming less conscientious than when they started the diary. Recall errors may also extend to "tomorrow" diaries, since most respondents complete their entries at the end of the day. Often retrospective estimates of the behaviour occurring over the diary period are collected at the final interview.

Participation
The best response rates for diary surveys are achieved when diary keepers are recruited on a face-to-face basis, rather than by post. A personal interview at placement can be used to explain the diary-keeping procedures to the respondent, and a concluding interview may be used to check on the completeness of the recorded entries. Expenditure surveys find that an intermediate visit from an interviewer during the diary-keeping period helps preserve "good" diary keeping to the end of the period. Personal collection of diaries also allows any problems in the completed diary to be sorted out on the spot.
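To make the structure of a time-budget diary concrete, the short sketch below tallies minutes per activity from entries recorded with the start and finish columns recommended above. The entries, field layout and category names are invented for illustration; real surveys would use their own pre-coded activity lists.

```python
from datetime import datetime

# Hypothetical diary entries: (start, finish, pre-coded activity).
# The start/finish columns mirror those recommended for time-budget
# diaries that do not use fixed time blocks.
entries = [
    ("07:30", "08:00", "eating"),
    ("08:00", "08:45", "travel"),
    ("08:45", "12:30", "paid work"),
    ("12:30", "13:00", "eating"),
]

def minutes(start: str, finish: str) -> int:
    """Elapsed minutes between two HH:MM clock times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

def tally(diary):
    """Total minutes recorded against each activity category."""
    totals = {}
    for start, finish, activity in diary:
        totals[activity] = totals.get(activity, 0) + minutes(start, finish)
    return totals

print(tally(entries))  # {'eating': 60, 'travel': 45, 'paid work': 225}
```

A tally of this kind is the simplest form of the time-use analysis that structured diaries make possible; once activities are pre-coded, totals can be fed directly into standard statistical packages.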

The interviewers usually make at least two visits and are often expected to spend time checking the diary with the respondent. Appealing to respondents' altruistic nature, reassuring them of confidentiality and offering incentives are all thought to influence co-operation in diary surveys. One research company gives a 10-pound postal order for completion of its fourteen-day diary, and other surveys offer lottery tickets or small promotional items. Success may also depend on the quality of the interviewing staff, who should be highly motivated, competent and well briefed.

Coding, Editing and Processing
The amount of work required to process a diary depends largely on how structured it is. For many large-scale diary surveys, the interviewer, while still in the field, does part of the editing and coding. Following this is an intensive editing procedure, which includes checking entries against information collected in the personal interview. Clearly, a well-designed diary with a coherent pre-coding system should cut down on the degree of editing and coding. If the diary is unstructured, the processing can be very labour intensive, involving the coding of verbatim entries. Using highly trained coders and a rigorous, unambiguous coding scheme is very important, particularly where there is no clear demarcation of events or behaviour in the diary entries. For textual diaries, qualitative software packages such as The ETHNOGRAPH can be used to code them in much the same way as interview transcripts. More recently, methods of analysis based on algorithms for searching for patterns of behaviour in diary data are being used. Time-budget researchers are probably the most advanced group of users of machine-readable diary data, and the structure of these data allows them to use traditional statistical packages for analysis.

Relative Cost of Diary Surveys
The diary method is generally more expensive than the personal interview, and personal placement and pick-up visits are more costly than postal administration. Where the diary is unstructured, or where the sample is large, intensive editing and coding will push up the costs further. The ratio of costs for diaries compared with recall time-budget interviews is of the order of three or four to one. However, these costs must be balanced against the superiority of the diary method in obtaining more accurate data, particularly where the recall method gives poor results.

Computer Software for Processing and Analysis
Probably the least developed area relating to the diary method is the computer storage and analysis of diary data. One of the problems of developing software for processing and manipulating diary data is the complexity and bulk of the information collected. There are few packages, and most of them are custom built to suit the specifics of a particular project. Although computer-assisted methods may help to reduce the amount of manual preparatory work, this is perhaps not surprising given that the budget for many diary surveys does not extend to systematic processing of the data. Software development is certainly an area which merits future attention.

Archiving Diary Data
In spite of the abundance of data derived from diary surveys across a wide range of disciplines, little is available to other researchers for secondary analysis (further analysis of data already collected). Many diary surveys are small-scale investigative studies that have been carried out with very specific aims in mind. For these less structured diaries, for which a common coding scheme is neither feasible nor possibly desirable, an answer to public access is to deposit the original survey documents in an archive. This kind of data bank gives researchers access to original diary documents, allowing them to make use of the data in ways that suit their own research strategies. However, the ethics of making personal documents public (even if in the limited academic sense) have to be considered.

Internet as a Source of Data
The expansion of the Internet over the past decade has provided the researcher with a range of new opportunities for finding information, networking, conducting research, and disseminating research results. Through the use of tools such as online focus groups, electronic mail and online questionnaires, the Internet opens up new possibilities for conducting research, for example:
1) Shorter timeframes for collecting and recording data: e-mail messages can be saved and analyzed in qualitative data packages, while online surveys can be captured directly into a database.
2) The possibility of conducting interviews and focus groups by e-mail, with related savings in costs and time.
3) New "communities" to serve as the object of social scientific enquiry.
4) Opportunities for including mixed multiple media in questionnaires.

On the other hand, these opportunities also raise new challenges for the researcher, such as:
• Problems of sampling
• The ethics of conducting research into online communities
• Physical access and skills required to use the technologies involved
• Accuracy and reliability of information obtained from online sources
• The changed chronology of interaction resulting from asynchronous communication

The Internet is a useful medium for obtaining valuable information and the results of various surveys, for example those related to the marketplace, and access to computer-held data can help in solving many complex research problems. The 10 "C"s outlined here provide criteria to be considered while evaluating Internet resources:

1. Content
What is the intent of the content? Are the title and author identified? Is the content "juried"? Is the content "popular" or "scholarly", satiric or serious? What is the date of the document or article? Is the "edition" current? Do you have the latest version? (Is this important?) How do you know?
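As a minimal sketch of the kind of algorithmic pattern search in verbatim diary entries mentioned above: the code frame, keywords and sample entry below are invented for illustration, and real coding schemes (such as those applied with packages like The ETHNOGRAPH) are far richer than a keyword lookup.

```python
# A hypothetical code frame mapping codes to trigger keywords.
# This only illustrates the mechanics of assigning codes to text;
# it is not how any particular qualitative package works.
CODE_FRAME = {
    "SHOPPING": ["bought", "shop", "supermarket"],
    "TRAVEL": ["bus", "train", "drove", "walked"],
    "LEISURE": ["tv", "television", "read", "pub"],
}

def code_entry(text: str) -> list:
    """Return every code whose keywords appear in a verbatim diary entry."""
    lowered = text.lower()
    return sorted(code for code, keywords in CODE_FRAME.items()
                  if any(k in lowered for k in keywords))

entry = "Walked to the supermarket, bought milk, then watched TV"
print(code_entry(entry))  # ['LEISURE', 'SHOPPING', 'TRAVEL']
```

Even a crude keyword pass like this shows why trained coders remain essential: an automatic match cannot distinguish primary from secondary activities, or resolve ambiguous wording, without a rigorous coding scheme behind it.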

2. Credibility
Is the author identifiable and reliable? Is the content credible? Authoritative? Should it be? What is the purpose of the information, that is, is it serious, satiric, humorous? Is the URL extension .com, .gov or .org? What does this tell you about the "publisher"?

3. Critical Thinking
How can you apply critical thinking skills, including previous knowledge and experience, to evaluate Internet resources? Can you identify the author, publisher, edition, etc., as you would with a "traditionally" published resource? What criteria do you use to evaluate Internet resources?

4. Copyright
Even if the copyright notice does not appear prominently, someone wrote, or is responsible for, the creation of a document, graphic, sound or image, and the material falls under copyright conventions. "Fair use" applies to short, cited excerpts, usually as an example for commentary or research. Materials are in the "public domain" if this is explicitly stated. Internet users, as users of print media, must respect copyright.

5. Citation
Internet resources should be cited to identify the sources used, both to give credit to the author and to provide the reader with avenues for further research. Standard style manuals (print and online) provide some examples of how to cite Internet documents, although these standards are not uniform.

6. Continuity
Will the Internet site be maintained and updated? Is it now, and will it continue to be, free? Can you rely on this source over time to provide up-to-date information? Some good .edu sites have moved to .com, with possible cost implications. Other sites offer partial use for free and charge fees for continued or in-depth use.

7. Censorship
Is your discussion list "moderated"? What does this mean? Does your search engine or index look for all words, or are some words excluded? Is this censorship? Does your institution, based on its mission, parent organization or space limitations, apply some restrictions to Internet use? Consider censorship and privacy issues when using the Internet.

8. Connectivity
If more than one user will need to access a site, consider each user's access and "functionality". How do users connect to the Internet, and what kind of connection does the assigned resource require? Does access to the resource require a graphical user interface? If it is a popular (busy) resource, will it be accessible in the time frame needed? Is it accessible by more than one Internet tool? Do users have access to the same Internet tools and applications? Are users familiar with the tools and applications? Is the site "viewable" by all Web browsers?

9. Comparability
Does the Internet resource have an identified comparable print or CD-ROM data set or source? Does the Internet site contain comparable and complete information? (For example, some newspapers have partial but not full text information on the Internet.) Do you need to compare data or statistics over time? Can you identify sources for comparable earlier or later data? Comparability of data may or may not be important, depending on your project.

10. Context
What is the context for your research? Can you find "anything" on your topic, that is, commentary, opinion, narrative, statistics, and your quest will be satisfied? Are you looking for current or historical information? Definitions? Research studies or articles? How does Internet information fit in the overall information context of your subject? Before you start searching, define the research context and research needs, and decide what sources might be best to use to successfully fill information needs without data overload.

Why use the networks?
International computer networks provide a very cheap and effective vehicle for collaboration and communication. Long distances and time zones do not disrupt the process, and your colleagues in other institutions or other parts of the world can work at the times that suit them. It is easy to share resources of all kinds amongst a group of researchers and to make results available to as many or as few people as you desire. Once you get used to map reading on the Internet you will find that you have access to a huge variety of resources, including the expertise of the many other network users, the power of many computers and programs, and the stored information from millions of documents.

Etiquette - a sense of timing
Most services on the Internet are made available through volunteer effort. Many services, particularly in the USA, are very busy during their working day. Please try to use US-based services in the morning, before their day starts. You will find that access times are much improved and that you get fewer "system busy" messages.

Getting connected
If you are working in a higher education institution you should have easy access to a network connection of some sort, though to start with you may have to buy an add-on such as an Ethernet card for your machine. Since the institution's connection and use of it are paid for en bloc, you will not incur any charges for using the network. If you are a researcher connected with, or working for, a higher education institution but you do the majority of your work at home, then you should talk to your computer support staff about connecting to the institution via a modem, using your home telephone. Most institutions will allow this, and once you have made the initial connection you should be able to get "out" onto the Internet without incurring any charges other than the local phone call between your home and the institution. If you are an independent researcher with no links to an academic institution, then you must use a commercial service to access the Internet. There are various levels of service available from such providers; a list of such commercial services appears below. Make sure that the service you get is the service you need: if you want to be able to use telnet and FTP (described below), then don't subscribe to an e-mail-only service.
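The "publisher" question in the Credibility criterion can be illustrated mechanically. The sketch below maps a URL's domain ending to the kind of organisation it conventionally suggests; the mapping is a rule of thumb only (domain endings guarantee nothing about a publisher), and the hint strings are my own.

```python
from urllib.parse import urlparse

# Conventional (not guaranteed) readings of common domain endings.
TLD_HINTS = {
    "edu": "US higher education",
    "gov": "US government",
    "org": "non-profit organisation (traditionally)",
    "com": "commercial",
    "ac.uk": "UK higher education",
}

def publisher_hint(url: str) -> str:
    """Suggest what a URL's domain ending implies about its publisher.
    Longer suffixes are checked first so 'ac.uk' wins over a bare 'uk'."""
    host = urlparse(url).netloc
    for suffix, hint in sorted(TLD_HINTS.items(), key=lambda kv: -len(kv[0])):
        if host.endswith(suffix):
            return hint
    return "unknown"

print(publisher_hint("http://www.essex.ac.uk/"))  # UK higher education
print(publisher_hint("http://example.com/page"))  # commercial
```

The point of the exercise is the one the checklist makes: the ending is a first clue about who stands behind a resource, to be confirmed by the other criteria, never a verdict on its own.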

Interactive Access (telnet and PAD>)
Sometimes known as terminal access, this is the process in which you make a connection between the machine on your desk and another computer (a "remote host"). The information stored on the remote computer appears on your screen. In this way you can read news items and bulletin boards, search through library catalogues, data archives and datasets, browse through articles and sometimes books, and, if you find them useful, ask the remote computer to e-mail them to you. The program which allows you to interactively access other computers on the Internet is called telnet. You may be able to run telnet directly from the machine on your desk, or alternatively you may be able to run it from an account you have on a larger local computer, perhaps the one that handles your e-mail.

If you have never used interactive access before, try accessing the NISS Gateway. The NISS Gateway is an example of a centrally funded, publicly accessible service: a sort of information supermarket from whose menus you can choose a variety of other services. Many sites are not yet able to use telnet directly; if you are in this position you will probably access the UK JANET via a PAD> prompt. At your PAD> prompt type: call uk.ac.niss, or make a telnet call to: niss.ac.uk. The NISS Gateway at Bath also runs a Guest Telnet Service to allow you to use telnet to access the rest of the world via the Internet. From a PAD> prompt type: call uk.ac.niss, then from the menus choose: General Services, and then: NSF-Relay Guest Telnet Service. You will then be prompted for the name or the number of the service you wish to call using telnet. If you don't know how to make either of these calls, please ask your computer services.

This article concentrates on using freely accessible resources on the Internet with no charges attached. Other services only permit access once you or your institution has paid a subscription, and there are also commercial services which require subscription and/or make access charges. Probably the best known such service in the UK is the BIDS ISI service run from the University of Bath. BIDS is a service provided for the academic community, which allows you to search the Social Science Citation Index looking for citations (references) quoted in articles from several thousand journals, published from 1981 onwards. If your academic institution subscribes to BIDS, you can obtain a username and password from your library or computing service.

All services on the Internet have a unique number, which is called the IP address (Internet Protocol address). Most also have a name (the Domain Name), which is a lot easier to remember. For example, the NISS Gateway's domain name is: niss.ac.uk, and its IP address is: 196.76.63.1. When you are using telnet you can usually use either of these; they will get you through to the same place. But sometimes, if you are away from your home site, or if the machine which works out ("resolves") which number matches which name has gone wrong, then you will need to know the number. If you are relying on reading your e-mail when you are away from home, it is always a good idea to have a note of the IP address (the number) as well as the name.
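The difference between a numeric IP address and a domain name can be checked programmatically. The sketch below applies a deliberately simplified test (four dot-separated integers in the range 0-255, IPv4 only, ignoring IPv6 entirely), using the two forms of the NISS Gateway address quoted above:

```python
def is_dotted_quad(address: str) -> bool:
    """True if the string looks like a numeric IPv4 address
    (e.g. 196.76.63.1); False if it is more likely a domain name
    (e.g. niss.ac.uk). Simplified: four dot-separated 0-255 integers."""
    parts = address.split(".")
    return (len(parts) == 4 and
            all(p.isdigit() and 0 <= int(p) <= 255 for p in parts))

print(is_dotted_quad("196.76.63.1"))  # True
print(is_dotted_quad("niss.ac.uk"))   # False
```

This is the same distinction the resolver makes for you: when name resolution is unavailable, software falls back on the numeric form, which is why keeping a note of the IP address is good practice.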
E-mail and more
Many academics have discovered the advantages of using e-mail. E-mail allows quick question-and-answer sessions, rapid revisions and corrections to documents, and is an ideal medium for collaboration and supervision. Your correspondent doesn't have to be available to take your call: she may be in a meeting or, if she's on the other side of the world, still asleep, yet your message will be waiting for her as soon as she switches on her machine. Where e-mail really scores is in group communication. A group of geographically disparate researchers can continually consult and keep in contact using the discussion list facilities available over e-mail. Your one message can be circulated to the whole group, and any replies or comments are also seen by the whole group in a matter of minutes. Messages are archived for future reference, and some systems also allow important files to be stored and retrieved by list members. This is reassuring when collaborating at a distance. There is a general discussion list for UK sociologists; to join this list, send an e-mail message to: socbb-request@soc.surrey.ac.uk. There are also many other academic discussion lists on the UK service Mailbase, and many thousands of lists on other systems worldwide. A few lists that may be of interest to sociologists are mentioned at the end of this article.

File transfer
Anything that can be stored as a file on a computer can be transferred over the networks from one computer to another. This includes word-processed documents, software and graphics. File transfer over the Internet is known as FTP (file transfer protocol). Many sites on the Internet have set up repositories of files which are freely available for you to transfer; these repositories are known as FTP archives. The process of transfer is known as anonymous FTP, because you don't need to identify yourself with a password in order to use such systems. If you can use telnet from your machine, then you can probably use ftp too.

Example of a file transfer session
When the location of a file available for anonymous ftp is quoted, it will probably look something like this:

Filename: README
Site: ftp.bris.ac.uk
Directory: pub/info/networks/generic

To transfer this file to your local machine you would follow this procedure. At the system prompt type: ftp ftp.bris.ac.uk. This will connect you to the ftp archive at the University of Bristol. You will be prompted for your username: type anonymous. You will then be asked for your password; it is polite to enter your e-mail address, e.g.: nicky.ferguson@bristol.ac.uk. Now type: dir. This will give you a directory listing of the contents of the root or base directory. Now type: cd pub/info/networks/generic. You have now changed into the appropriate directory; use dir again to look at the contents. You will see that one of the files is called README (note the capital letters: case is important in such systems). It will explain to you the rest of the contents of the directory pub/info/networks/generic. Now type: hash. This asks the system to put hash symbols (#) on your screen while it is working. Now type: get README. While the process is working, the hash symbols will appear on your screen; sometimes the process takes several minutes. When the transfer is complete you will get a message saying so. Leave the system by typing: bye, and look for the file on your home system. The directory contains a set of practical exercises designed for social scientists to explore the networks; you may want to print them out and investigate them at your leisure.
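The manual session above can also be scripted. The sketch below uses Python's standard ftplib module to perform the same anonymous transfer; the site and directory are the ones quoted in the text (whether that archive is still reachable today is another matter), and the e-mail address used as the courtesy password is a placeholder.

```python
from ftplib import FTP

def ftp_url(site: str, directory: str, filename: str) -> str:
    """Spell out the conventional ftp:// location for a quoted file."""
    return f"ftp://{site}/{directory}/{filename}"

def fetch_readme(site: str, directory: str, filename: str) -> None:
    """Anonymous FTP transfer: the scripted equivalent of the manual
    ftp / anonymous / cd / get / bye session described above."""
    with FTP(site) as ftp:
        # "anonymous" login; it is polite to give your e-mail as password.
        ftp.login(user="anonymous", passwd="your.name@example.ac.uk")
        ftp.cwd(directory)
        with open(filename, "wb") as out:
            ftp.retrbinary(f"RETR {filename}", out.write)

# The location quoted in the text, written as a single ftp:// URL:
print(ftp_url("ftp.bris.ac.uk", "pub/info/networks/generic", "README"))
```

Each ftplib call maps onto a step of the interactive session: login replaces typing anonymous and the password, cwd replaces cd, and retrbinary replaces hash plus get.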
Resources for Sociologists

BIRON - Bibliographic Information Retrieval Online * Online bibliographic details of the ESRC catalogue and subject index. Contains over 3000 datasets, including the General Household Survey and the Census of Great Britain. Access * PAD> call uk.ac.essex.solb1 Login: biron Password: norib, or telnet dasun.essex.ac.uk Login: biron Password: norib

PenPages * An American information server concerning all aspects of rural life, provided by the Dept of Agricultural Economics and Rural Sociology at Pennsylvania University, containing research briefs and several databases including MAPP, the national Co-operative Extension family database. The server also hosts the Senior Series Database and the 4-H Youth Development Database. Some material is of a local nature, but there is also general interest material available. Access * telnet psupen.psu.edu Login: world

RAPID - Research Activities and Publications Information Database * Information on the ESRC research awards and the publications resulting from these awards (journals, books, etc). Access * PAD> call uk.ac.ercvax Username: rapid Password: rapid, or telnet ercvax.ed.ac.uk Username: rapid Password: rapid

CoombsQuest - Social Science Research Data Bank * Collection of 21 databases on material specific to the Pacific region and South and North East Asia. Includes papers, bibliographies, directories and abstracts of theses, census data, reference materials, software, etc. The server also points to other social science information sites around the world. Access * telnet info.anu.edu.au login: info. Many files from the Coombs archives are also available for file transfer, using ftp: sitename: unix.hensa.ac.uk directory: pub/uunet/doc/papers/coombspapers

Discussion Lists
When you join an e-mail discussion list it is often called subscribing (but you don't have to pay). To join a list you send a specific message to the machine (or sometimes the person) that runs the list, NOT to the list itself. Messages sent to the list itself are distributed to ALL the list members. Once subscribed, you will receive all the messages that are sent to that list. The lists below each appear with their subscription address and the correct text of the e-mail message to send; in each case replace firstname lastname with your own names.

Social-Theory * This list aims to provide a forum for the discussion of social theory within the social sciences. Particular emphasis is placed upon the relationship between psychology and sociology, with special reference to the individual and social processes. Subscription address: mailbase@mailbase.ac.uk Message text: subscribe social-theory firstname lastname

POR - Public Opinion Research * Unmoderated discussion list for academics and professionals interested in public opinion research, useful to researchers currently conducting survey research projects. Subscription address: listserv@unc.edu Message text: subscribe por firstname lastname

PSN - International discussion list for sociologists concerned with progressive issues and values such as civil rights struggles, women's rights, community development, etc.
Subscription address: listserv@csf.colorado.edu
Message text: subscribe psn firstname lastname

SOCORG-K - Social Organisation of Knowledge Discussion. Moderated discussion list for sociologists.
Subscription address: listserv@vm.utcc.utoronto.ca
Message text: subscribe socorg-k firstname lastname

SOS-DATA - Social science data discussion list. Provides a forum for discussion of any topic related to social science data. The list is used most frequently to ask for references to sources of data on some particular subject, but it also includes announcements of conferences and new data resources. Membership is mostly from the US, Canada and Western Europe.
Subscription address: listserv@unc.edu
Message text: subscribe sos-data firstname lastname

Electronic Journal
Psycoloquy - Refereed electronic journal intended to implement peer review over the network. The journal is primarily for psychologists, but it is interdisciplinary in the topics covered and includes articles, book reviews, queries and announcements, etc.
Subscription address: listserv@pucc.princeton.edu
Message text: subscribe psych firstname lastname
Many files from the Psycoloquy journal are available for file transfer using ftp: sitename: ftp.princeton.edu (128.112.128.1), directory: /pub/harnad

Secondary Analysis
Secondary analysis involves the use of existing data, collected for the purposes of a prior study, in order to pursue a research interest which is distinct from that of the original work. The approach may be employed either by researchers re-using their own data or by independent analysts using previously established qualitative data sets. In this respect, secondary analysis differs from systematic reviews and meta-analyses of qualitative studies, which aim instead to compile and assess the evidence relating to a common concern or area of practice. It should also be noted that use of the approach does not necessarily preclude the possibility of collecting primary data: the analyst may, at some stage, be required to obtain additional data or to pursue in a more controlled way the findings emerging from the initial analysis.

Why do Secondary Analysis?
It has been contended that the approach can be used to generate new knowledge, new hypotheses, or support for existing theories; that it reduces the burden placed on respondents by negating the need to recruit further subjects; and that it allows wider use of data from rare or inaccessible respondents. In addition, it has been suggested that secondary analysis is a more convenient approach for particular researchers, notably students. Despite the interest in and arguments for developing secondary analysis of qualitative data, however, the approach has not been widely adopted to date. This raises questions about the desirability and feasibility of particular strategies for secondary analysis of qualitative data, discussed below.

Methodological and Ethical Considerations
Before highlighting some of the key practical and ethical issues, there are two fundamental methodological issues to be considered.

Tenable
The first is whether secondary analysis of qualitative studies is tenable, given that it is often thought to involve an intersubjective relationship between the researcher and the researched. In response, it may be argued that even where primary data is gathered via interviews or observation in qualitative studies, there may be more than one researcher involved; hence, within the research team, the data still has to be contextualised and interpreted by those who were not present. A more radical response is to argue that the design, conduct and analysis of both qualitative and quantitative research are always contingent upon the contextualisation and interpretation of subjects' situation and responses, all of which depend on the researcher's ability to form critical insights based on inter-subjective understanding. Thus, secondary analysis is no more problematic than other forms of empirical inquiry.

Origin
The second issue concerns the problem of where primary analysis stops and secondary analysis starts. For primary researchers re-using their own data, it may be difficult to determine whether the research is part of the original enquiry or sufficiently new and distinct from it to qualify as secondary analysis; qualitative research is an iterative process, and grounded theory in particular requires that questions undergo a process of formulation and refinement over time. For independent analysts re-using other researchers' data, there are also related professional issues about the degree of overlap between their respective works. There is no easy solution to these problems except to say that greater awareness of secondary analysis might enable researchers to more appropriately recognise and define their work as such.

Compatibility of the data with secondary analysis
Are the data amenable to secondary analysis? This will depend on the 'fit' between the purpose of the analysis and the nature and quality of the original data. Scope for additional in-depth analysis will vary depending on the nature of the data: for example, where semi-structured interviews involved the discretionary use of probes, designs using semi-structured schedules may produce more rich and varied data, while tightly structured interviews tend to limit the range of responses. The quality of the original data will also need to be assessed, and a check for the extent of missing data relevant to the secondary analysis but irrelevant to the original study may also be required. There may also be a need to consult the primary researchers in order to investigate the circumstances of the original data generation and processing.
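The missing-data check described above can be partly automated. A minimal sketch in Python (the field names and records here are hypothetical, not from any actual archive) that reports, for each field the secondary analysis depends on, what fraction of archived records lacks a value:

```python
# Sketch: screen an archived data set for missing values in the fields
# a planned secondary analysis needs. The original study may not have
# needed these fields, so gaps are common. Data below is hypothetical.

def missing_rates(records, fields):
    """Return the fraction of records missing each field of interest."""
    n = len(records)
    return {
        field: sum(1 for r in records if not r.get(field)) / n
        for field in fields
    }

archive = [
    {"id": 1, "age": 34, "occupation": "nurse", "interview_notes": "..."},
    {"id": 2, "age": None, "occupation": "teacher", "interview_notes": "..."},
    {"id": 3, "age": 51, "occupation": "", "interview_notes": "..."},
    {"id": 4, "age": 29, "occupation": "clerk", "interview_notes": "..."},
]

# The secondary analysis needs 'age' and 'occupation'.
print(missing_rates(archive, ["age", "occupation"]))  # {'age': 0.25, 'occupation': 0.25}
```

A high missing-data rate on a key field would suggest the archived data set is a poor 'fit' for the proposed secondary analysis.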

Position of the secondary analyst
Was the analyst part of the original research team? This will influence the decision over whether to undertake secondary analysis and, if so, the procedures to be followed. Secondary analysts require access to the original data, including tapes and field notes, in order to re-examine the data with the new focus in mind. This is likely to be easier if they were part of the original research team; if not, then ideally they should be able to consult with the primary researchers in order to assess the quality of the original work and to contextualise the material (rather than rely on field notes alone). Further consultation may also be helpful in terms of cross-checking the results of the secondary analysis. Some form of contractual agreement between the secondary analyst and the primary researchers, data archive managers, and colleagues involved in the primary research but not in the secondary analysis may have to be negotiated.

Ethical Issues
How was consent obtained in the original study? Given that it is usually not feasible to seek additional consent, a professional judgment may have to be made about whether re-use of the data violates the contract made between subjects and the primary researchers; where sensitive data is involved, informed consent cannot be presumed. Growing interest in re-using data makes it imperative that researchers in general now consider obtaining consent which covers the possibility of secondary analysis as well as the research in hand; this is consistent with professional guidelines on ethical practice.

Reporting of Original and Secondary Data Analysis
Such is the complexity of secondary analysis that it is particularly important that the study design, methods and issues involved are reported in full. Ideally this should include an outline of the original study and data collection procedures, together with a description of the processes involved in categorising and summarising the data for the secondary analysis, as well as an account of how methodological and ethical considerations were addressed.

Developing the Approach
To see if the potential of secondary analysis can be realised in practice, developmental work still needs to be undertaken:
• First, there should be a more comprehensive review of the literature on secondary analysis and of studies which have explicitly (and perhaps implicitly) used this approach. This could include examination of the methods used, as well as the quality, value and impact of this work.
• Secondly, there should be greater consideration of the issues involved in the secondary analysis of single, multiple and mixed data sets.
• Thirdly, further work on the protocols for conducting secondary analysis of qualitative data, particularly with regard to the re-use of other researchers' data, should be carried out.
• Finally, some more specific guidelines are needed for researchers about the ethical issues to be considered when undertaking qualitative work that may be re-used in the future, whether conducting secondary analysis in an independent capacity or not.

Conclusion
Despite growing interest in the re-use of qualitative data, secondary analysis remains an under-developed and ill-defined approach, and various methodological and ethical considerations pose a challenge for the would-be secondary analyst. Further work to develop the approach is required to see if the potential benefits can actually be realised in practice.

Survey
This is an "information society." Our major problems and tasks no longer mainly center on the production of the goods and services necessary for survival and comfort. Our society, thus, requires a prompt and accurate flow of information on preferences, needs, and behavior. It is in response to this critical need for information on the part of government, business, and social institutions that so much reliance is placed on surveys.

What is a Survey?
Today the word "survey" is used most often to describe a method of gathering information from a sample of individuals. This "sample" is usually just a fraction of the population being studied. Unlike a census, where all members of the population are studied, surveys gather information from only a portion of a population of interest, the size of the sample depending on the purpose of the study. In a bona fide survey, the sample is not selected haphazardly or only from persons who volunteer to participate. It is scientifically chosen so that each person in the population will have a measurable chance of selection. This way, the results can be reliably projected from the sample to the larger population. Information is collected by means of standardized procedures, so that every individual is asked the same questions in more or less the same way. The survey's intent is not to describe the particular individuals who, by chance, are part of the sample, but to obtain a composite profile of the population. The industry standard for all reputable survey organizations is that individual respondents should never be identified in reporting survey findings. All of the survey's results should be presented in completely anonymous summaries, such as statistical tables and charts.

The wide variety of uses to which surveys are put is illustrated by the following examples:
• A sample of voters is questioned in advance of an election to determine how the public perceives the candidates and the issues.
• A manufacturer does a survey of the potential market before introducing a new product.
• A government entity commissions a survey to gather the factual information it needs to evaluate existing legislation or to draft proposed new legislation.

Not only do surveys have a wide variety of purposes, they also can be conducted in many ways — including over the telephone, by mail, or in person. Nonetheless, all surveys do have certain characteristics in common.
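The defining feature of a bona fide survey — every member of the population having a known, measurable chance of selection — can be illustrated with a simple random sample. This is a minimal sketch, not any particular survey organization's procedure; the population size, sample size, and ID scheme are hypothetical:

```python
import random

# Sketch: simple random sampling. Every member of the sampling frame has
# the same known chance of selection (n / N), so results computed on the
# sample can be projected to the whole population.

random.seed(42)  # fixed seed so the example is reproducible

population = list(range(100_000))   # hypothetical frame of member IDs
n = 1_000                           # sample size

sample = random.sample(population, n)        # selection without replacement
selection_probability = n / len(population)

print(f"each member's chance of selection: {selection_probability:.3f}")  # 0.010
assert len(set(sample)) == n                 # no member chosen twice
```

Contrast this with a volunteer or "haphazard" sample, where selection probabilities are unknown and the projection from sample to population breaks down.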

• Major TV networks rely on surveys to tell them how many and what types of people are watching their programs.
• Statistics Canada conducts continuing panel surveys of children (and their families) to study educational and other needs.
• Auto manufacturers use surveys to find out how satisfied people are with their cars.
• The U.S. Bureau of the Census conducts a survey each month to obtain information on employment and unemployment in the nation.
• The U.S. Agency for Health Care Policy and Research sponsors a periodic survey to determine how much money people are spending for different types of medical care.
• Local transportation authorities conduct surveys to acquire information on commuting and travel habits.
• Magazines and trade journals use surveys to find out what their subscribers are reading.
• Surveys are conducted to ascertain who uses our national parks and other recreation facilities.

Surveys provide an important source of basic scientific knowledge. Economists, psychologists, health professionals, political scientists, and sociologists conduct surveys to study such matters as income and expenditure patterns among households, the roots of ethnic or racial prejudice, the implications of health problems on people's lives, comparative voting behavior, and the effects on family life of women working outside the home.

Many surveys study all persons living in a defined area, but others might focus on special population groups — children, physicians, community leaders, the unemployed, or users of a particular product or service. Surveys may also be conducted with national, state, or local samples. Surveys can also be used to study non-human populations (e.g., animate or inanimate objects — animals, soils, housing, etc.). While many of the principles are the same for all surveys, the focus here will be on methods for surveying individuals.

Who Conducts Surveys?
We all know about the public opinion surveys or "polls" that are reported by the press and broadcast media. They cover national public opinion on a wide range of current issues. The major broadcasting networks and national news magazines also conduct polls and report their findings, and state polls and metropolitan area polls, often supported by a local newspaper or TV station, are reported regularly in many localities. The great majority of surveys, though, are not public opinion polls. Most are directed to a specific administrative, commercial, or scientific purpose.

What are Some Common Survey Methods?
Surveys can be classified in many ways. One dimension is the size and type of sample; another is the method of data collection. Mail, telephone interview, and in-person interview surveys are the most common. Mail surveys can be relatively low in cost and can be most effective when directed at particular groups, such as subscribers to a specialized magazine or members of a professional association; however, problems exist in their use when insufficient attention is given to getting high levels of cooperation. Telephone interviews are an efficient method of collecting some types of data and are being increasingly used; they lend themselves particularly well to situations where timeliness is a factor and the length of the survey is limited. In-person interviews in a respondent's home or office are much more expensive than mail or telephone surveys; they may be necessary, however, especially when complex information is to be collected. Some surveys combine various methods. For instance, a survey worker may use the telephone to "screen" or locate eligible respondents (e.g., to locate older individuals eligible for Medicare) and then make appointments for an in-person interview.

In newer methods of data collection, information is entered directly into computers either by a trained interviewer or, increasingly, by the respondent. One well-known example is the measurement of TV audiences, carried out by devices attached to a sample of TV sets that automatically record the channels being watched. Extracting data from samples of medical and other records is also frequently done.

What Survey Questions Do You Ask?
You can further classify surveys by their content. Surveys are concerned with:
• Opinions and attitudes (such as a pre-election survey of voters)
• Factual characteristics or behaviors (such as people's health, housing, consumer spending, or transportation habits).
Surveys provide a speedy and economical means of determining facts about our economy and about people's knowledge, attitudes, beliefs, expectations, and behaviors.

How Large Must the Sample Size Be?
The sample size required for a survey partly depends on the statistical quality needed for the survey findings; this, in turn, relates to how the results will be used. Even so, there is no simple rule for sample size that can be used for all surveys. Much depends on the professional and financial resources available. Analysts, though, often find that a moderate sample size is sufficient statistically and operationally. For example, the well-known national polls frequently use samples of about 1,000 persons to get reasonable information about national attitudes and opinions. When it is realized that a properly selected sample of only 1,000 individuals can reflect various characteristics of the total population, it is easy to appreciate the value of using surveys to make informed decisions in a complex society such as ours.
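Why is a sample of about 1,000 enough? For a simple random sample, the standard 95% margin of error for an estimated proportion p is approximately 1.96 * sqrt(p * (1 - p) / n), which is largest at p = 0.5 and, notably, depends on the sample size n rather than on the population size. A short sketch making the arithmetic concrete (the sample sizes tried are illustrative):

```python
import math

# Sketch: approximate 95% margin of error for an estimated proportion
# from a simple random sample. Worst case is p = 0.5.

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p estimated from n responses."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000, 2500):
    print(f"n = {n:5d}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

At n = 1,000 the worst-case margin is about plus or minus 3 percentage points, which is adequate for most national attitude and opinion polling; quadrupling the sample only halves the margin, which is why very large samples are rarely cost-effective.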

Many surveys combine questions of both types. Respondents may be asked if they have heard or read about an issue... what they know about it... their opinion... how strongly they feel and why... their interest in the issue... past experience with it... and certain factual information that will help the survey analyst classify their responses (such as age, gender, marital status, occupation, and place of residence). Questions may be open-ended ("Why do you feel that way?") or closed ("Do you approve or disapprove?"). Survey takers may ask respondents to rate a political candidate or a product on some type of scale, or they may ask for a ranking of various alternatives. The questionnaire may be very brief — a few questions, taking five minutes or less — or it can be quite long, requiring an hour or more of the respondent's time.

Since it is inefficient to identify and approach a large national sample for only a few items of information, there are "omnibus" surveys that combine the interests of several clients into a single interview. In these surveys, respondents will be asked a dozen questions on one subject, a half dozen more on another subject, and so on. Because changes in attitudes or behavior cannot be reliably ascertained from a single interview, some surveys employ a "panel design," in which the same respondents are interviewed on two or more occasions. Such surveys are often used during an election campaign or to chart a family's health or purchasing pattern over a period of time.

Who Works on Surveys?
The survey worker best known to the public is the interviewer who calls on the telephone, appears at the door, or stops people at a shopping mall. Less visible, but equally important, are the in-house research staffs, who — among other things — plan the survey, choose the sample, develop the questionnaire, supervise the interviews, process the data collected, analyze the data, and report the survey's findings. In most survey research organizations, the senior staff will have taken courses in survey methods at the graduate level and will hold advanced degrees in sociology, statistics, marketing, or psychology, or they will have the equivalent in experience. Middle-level supervisors and research associates frequently have similar academic backgrounds to the senior staff, or they have advanced out of the ranks of clerks, interviewers, or coders on the basis of their competence and experience.

Experience is not usually required for an interviewing job, although basic computer skills have become increasingly important for applicants. Most research organizations provide their own training for the interview task. The main requirements for interviewing are an ability to approach strangers (in person or on the phone), to persuade them to participate in the survey, and to collect the data needed in exact accordance with instructions. Traditionally, survey interviewing, although occasionally requiring long days in the field, was mainly part-time work and, thus, well suited for individuals not wanting full-time employment or just wishing to supplement their regular income. Changes in the labor market and in the level of survey automation have begun to alter this pattern, with more and more survey takers seeking to work full time.

What About Confidentiality and Integrity?
The confidentiality of the data supplied by respondents is of prime concern to all reputable survey organizations. Several professional organizations dealing with survey methods have codes of ethics that prescribe rules for keeping survey responses confidential. The recommended policy for survey organizations to safeguard such confidentiality includes:
• Using only number codes to link the respondent to a questionnaire, and storing the name-to-code linkage information separately from the questionnaires
• Refusing to give the names and addresses of survey respondents to anyone outside the survey organization, including clients
• Destroying questionnaires and identifying information about respondents after the responses have been entered into the computer
• Omitting the names and addresses of survey respondents from computer files used for analysis
• Presenting statistical tabulations by broad enough categories so that individual respondents cannot be singled out.

What are Other Potential Concerns?
The quality of a survey is largely determined by its purpose and the way it is conducted. Most call-in TV inquiries (e.g., 900 "polls") or magazine write-in "polls" are highly suspect. These and other "self-selected opinion polls (SLOPs)" may be misleading since participants have not been scientifically selected; in SLOPs, persons with strong opinions (often negative) are more likely to respond. Surveys should be carried out solely to develop statistical information about a subject. They should not be designed to produce predetermined results, or used as a ruse for marketing and similar activities. Another important violation of integrity occurs when what appears to be a survey is actually a vehicle for stimulating donations to a cause or for creating a mailing list to do direct marketing.

The manner in which a question is asked can greatly affect the results of a survey. For example, a recent NBC/Wall Street Journal poll asked two very similar questions and obtained very different results: (1) Do you favor cutting programs such as social security, medicare, medicaid, and farm subsidies to reduce the budget deficit? The results: 23% favor, 66% oppose, 11% no opinion. (2) Do you favor cutting government entitlements to reduce the budget deficit? The results: 61% favor, 25% oppose, 14% no opinion. Anyone asked to respond to a public opinion poll, or concerned about the results, should first decide whether the questions are fair.
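The first confidentiality safeguard above — number codes with the name-to-code linkage stored separately — can be sketched in a few lines. This is a minimal illustration, not any organization's actual system; the names, answers, and field layout are hypothetical:

```python
import itertools

# Sketch of the number-code safeguard: respondents' names are replaced by
# codes, and the name-to-code linkage is kept in a separate store, never
# released with the analysis file. Data below is hypothetical.

def anonymize(completed_questionnaires):
    codes = itertools.count(1)
    linkage = {}      # code -> name; stored separately under lock
    responses = []    # the file analysts (and clients) may see
    for q in completed_questionnaires:
        code = next(codes)
        linkage[code] = q["name"]
        responses.append({"code": code, "answers": q["answers"]})
    return linkage, responses

raw = [
    {"name": "A. Respondent", "answers": {"q1": "yes", "q2": 4}},
    {"name": "B. Respondent", "answers": {"q1": "no", "q2": 2}},
]
linkage, responses = anonymize(raw)

# No name appears anywhere in the analysis file.
assert all("name" not in r for r in responses)
```

Destroying the linkage store after data entry, as the policy recommends, makes the de-identification irreversible.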

UNIT I
FUNDAMENTALS OF RESEARCH PROCESS

LESSON 9: TUTORIAL

Q1] Assume that you are a manufacturer of modular office systems and furniture as well as office organization elements (desktop and wall organizers, etc.). Your company has been asked to propose an observational study to examine the use of office space by white-collar and managerial workers for a large insurance company. This study will be part of a project to improve office efficiency and paperwork flow, and it is expected to involve the redesign of office space and the purchase of new office furniture and organization elements.
A: What are the varieties of information that might be observed?
B: Select a limited number of content areas for study, and operationally define the observation acts that should be measured.

Q2] You wish to analyze the pedestrian traffic that passes a given store in a major shopping center. You are interested in determining how many shoppers pass by this store, and you would like to classify these shoppers on various relevant dimensions. Any information you can secure should be obtainable from observation alone.
A: What other information might you find useful to observe?
B: How would you decide what information to collect?
C: Devise the operational definitions you would need.
D: What would you say in your instructions to the observers you plan to use?
E: How might you sample this shopper traffic?

Q3] In a class project, students developed a brief self-administered questionnaire by which they might quickly evaluate a professor. One student submitted the following instrument. Evaluate the questions asked and the format of the instrument.

Professor Evaluation Form
1: Overall, how would you rate this professor? Good_________ Fair_______ Poor____
2: Does this professor
   a: Have good class delivery?______________________
   b: Know the subject?__________________________
   c: Have a positive attitude toward the subject?________
   d: Grade fairly?_______________________________
   e: Have a sense of humor?______________________
   f: Use audiovisual, case examples, or other classroom aids?______________________
   g: Return exams promptly?______________________
3: What is the professor's strongest point?________________
4: What is the professor's weakest point?__________________
5: What kind of class does the professor teach?___________
6: Is this course required?___________________________

LESSON 10: QUESTIONNAIRE DESIGN

Students, today we shall be studying a very important part of data collection, i.e., the questionnaire — its advantages and disadvantages, the criteria of a good questionnaire design, types of questions, bias in questions, nonresponse, etc. We will also be discussing various situations and issues relating to the questionnaire mode of collecting data. As you are going to become the managers of the future, and you will be facing problems relating to decision making and planning, the art of preparing the questionnaire will help you in generating the desired information.

We know that the final step in preparing the survey is developing the data collection instrument. The most common means of collecting data are the interview and the self- or group-administered questionnaire. In the past, the interview was the most popular data-collecting instrument; recently, the questionnaire has surpassed the interview in popularity.

The Questionnaire — Pros and Cons
First of all, it is important for you to understand the advantages and disadvantages of the questionnaire as opposed to the personal interview. This knowledge will allow you to maximize the strengths of the questionnaire while minimizing its weaknesses.

The primary advantages of the questionnaire are:
(i) it is economical in terms of money and time
(ii) it gives samples which are more representative of the population
(iii) it generates standardized information
(iv) it provides the respondent the desired privacy.
We will discuss each of these advantages of the questionnaire technique of collecting primary data.

1. Economical in Money and Time
The questionnaire will save you time and money. The questions reach the respondents very efficiently: questionnaires can be sent to a large group and collected simultaneously, whereas in a personal interview the interviewer has to go to each and every individual separately. The questionnaire thereby reduces the time of operation and is economical; the cost of postage should be less than that of travel or telephone expenses.

2. Better Samples
Many surveys are constrained by a limited budget. Since a typical questionnaire usually has a lower cost per respondent, you can send it to more people within a given budget (or time) limit. This will provide you with more representative samples.

3. Standardization
The questionnaire provides you with a standardized data-gathering procedure.
• The use of a questionnaire also eliminates any bias introduced by the feelings of the respondents towards the interviewer (or vice versa).
• There is no need to train interviewers.
• The effects of potential human errors (for example, calling at inconvenient times, or biasing by "explaining") can be minimized by using a well-constructed questionnaire.
• Recent developments in the science of surveying have led to incorporating computers into the interview process, yielding what is commonly known as computer automated telephone interview (or CATI) surveys, in which one can alter the pattern of question asking. Advances in using this technique have dramatically reshaped our traditional views on the time-intensive nature and inherent unreliability of the interview technique.

4. Respondent Privacy
Although the point is debatable, most surveyors believe the respondent will answer a questionnaire more frankly than he would answer an interviewer, because of a greater feeling of anonymity. The respondent has no one to impress with his or her answers and need have no fear of anyone hearing them. To maximize this feeling of privacy, it is important to guard the respondent's privacy throughout the survey.

The primary disadvantages of the questionnaire are:
(i) non-returns
(ii) misinterpretation
(iii) validity.
We will discuss them in detail.

1. Non-returns
Non-returns are questionnaires, or individual questions, that are not answered by the people to whom they were sent. Non-returns cannot be overcome entirely; what we can do is try to minimize them (techniques to accomplish this we will be studying later on). For example, you may be surveying to determine the attitude of a group about a new policy. Some of those opposed to it might be afraid to speak out, and they might comprise the majority of the non-returns. This would introduce nonrandom (or systematic) bias into your survey results, especially if you found only a small number of the returns were in favour of the policy.

2. Misinterpretation
Misinterpretation occurs when the respondent does not understand either the survey instructions or the survey questions. If respondents become confused, they will either give up on the survey (becoming a non-return) or answer the questions in terms of the way they understand them — but not necessarily the way you meant. Your questionnaire's instructions and questions must be able to stand on their own, and you must use terms that have commonly understood meanings throughout the population under study. If you are using novel terms, be sure to define them so all respondents understand your meaning. If the respondents do not understand the mechanical procedures necessary to respond to the questions, their answers will be meaningless.

3. Validity
The third disadvantage of using a questionnaire is the inability to check on the validity of the answers. Without observing the respondents' reactions while they complete the questionnaire (as would be the case with an interview), you have no way of knowing the true answers to the following questions:
• Did the person you wanted to survey give the questionnaire to a friend, or complete it personally?
• Did the respondent deliberately choose answers to mislead the surveyor?
• Did the individual respond indiscriminately?
Invalid responses can turn out to be more serious than non-returns.

Criteria of a Good Questionnaire
What is the secret of getting all the strengths of the questionnaire while minimizing its weaknesses? The secret to taking advantage of the strengths of questionnaires (lower costs, more representative samples, standardization, privacy) while minimizing the number of non-returns, misinterpretations, and validity problems lies in the construction of the questionnaire itself. A poorly developed questionnaire contains the seeds of its own destruction. Each of the three portions of the questionnaire — the cover letter, the instructions, and the questions — must work together to have a positive impact on the success of the survey.

Cover letter
The cover letter should explain to the respondent the purpose of your survey, and it should motivate him to reply truthfully and quickly. For example, it should explain why the survey is important to him, how he was chosen to participate, and who is sponsoring the survey (the higher the level of sponsorship, the better). You should also strongly stress the confidentiality of the results. A well-written cover letter will help in minimizing both non-return and validity problems.

Instructions
The cover letter should be followed by a clear set of instructions explaining how to complete the survey and where to return it. If you do not want respondents to provide their names, say so explicitly in the instructions, and tell them to leave the NAME column blank.

Set of questions
The third and final part of the questionnaire is the set of questions.
• Since the questions are the means by which you are going to collect your data, they should be consistent with your survey plan; likewise, they should be consistent with your data analysis plan.
• They should not be ambiguous or encourage feelings of frustration or anger that will lead to non-returns or validity problems.

Types of Questions
Before investigating the art of question writing, it will be useful to examine the various types of questions; here we need to concentrate primarily on factors relating to their application. Cantelou (1964, p. 57) identifies four types of questions used in surveying. According to him:
• The background question is used to obtain demographic characteristics of the group being studied, such as age, sex, grade, or level of assignment. This information is used when you are categorizing your results by various subdivisions, such as age or grade.
• The second and most common type of question is the multiple choice or closed-end question. It is used to determine feelings or opinions on certain issues by allowing the respondent to choose an answer from a list you have provided. Multiple-choice questions are the most frequently used type of question in surveying today.
• The intensity question, a special form of the multiple-choice question, is used to measure the intensity of the respondent's feelings on a subject. These questions provide answers that cover a range of feelings.
• The final type of question is the free response or open-end question. This type requires respondents to answer the question in their own words. It can be used to gather opinions or to measure the intensity of feelings.

Questionnaire Construction
The complex art of question writing has been investigated by many researchers, and from their experiences they offer valuable advice. An appropriate corollary to Murphy's Law in this case would be: "If someone can misunderstand something, they will." Below are some helpful hints typical of those that appear most often in texts on question construction.
• Keep the language simple. Avoid the use of technical terms. Analyze your audience and write on their level.
• Keep the questions short. Long questions tend to become ambiguous and confusing. A respondent, in trying to comprehend a long question, may leave out a clause and thus change the meaning of the question.
• Keep the number of questions to a minimum. Apply the "So what?" and "Who cares?" tests to each question. Ask only questions that will contribute to your survey; "nice-to-know" questions only add to the size of the questionnaire. Having said this, keep in mind that you should not leave out questions that would yield necessary data simply because doing so will shorten your survey. There is no commonly agreed maximum number of questions, but research suggests higher return rates correlate highly with shorter surveys.
• Limit each question to one idea or concept. A question consisting of more than one idea may confuse the respondent and lead to a meaningless answer. Consider this question: "Are you in favour of raising pay and lowering benefits?" What would a yes (or no) answer mean?
• Do not ask leading questions. These questions are worded in a manner that suggests an answer. Some respondents may give the answer you are looking for whether or not they think it is right. A properly worded question gives no clue as to which answer you may believe to be the correct one.
• Avoid emotional or morally charged questions. Such questions can alienate the respondent and may open your questionnaire to criticism. If the information is necessary, ask the question; if you must use personal or emotional questions, place them at the end of the questionnaire. The respondent may feel your survey is getting a bit too personal!
• Understand the should-would question. Respondents answer "should" questions from a social or moral point of view, while answering "would" questions in terms of personal preference.
• Use subjective terms such as good, fair, and bad sparingly. These terms mean different things to different people. One person's "fair" may be another person's "bad." How much is "often," and how little is "seldom"?
• Allow for all possible answers. Wording the question to reduce the number of possible answers is the first step. Respondents who cannot find their answer among your list will be forced to give an invalid reply or, possibly, become frustrated and refuse to complete the survey. Avoid dichotomous (two-answer) questions, except for obvious demographic questions such as gender; where appropriate, add a third option such as no opinion, don't know, or other. These may not get the answers you need, but they will minimize the number of invalid responses. A great number of "don't know" answers to a question in a fact-finding survey can be a useful piece of information, but a majority of such answers may mean you have a poor question.
• Formulate your questions and answers to obtain exact information and to minimize confusion. For example, does "How old are you?" mean on your last or your nearest birthday? By including instructions like "Answer all questions as of (a certain date)", you can alleviate many such conflicts.
• Organize the pattern of the questions. Have your opening questions arouse interest. Ask easier questions first. Group similar questions together. To minimize conditioning, have general questions precede specific ones. Place demographic questions at the end of the questionnaire.
• Include a few questions that can serve as checks on the accuracy and consistency of the answers as a whole. Have some questions that are worded differently, but are soliciting the same information, in different parts of the questionnaire. These questions should be designed to identify the respondents who are just marking answers randomly or who are trying to game the survey (giving answers they think you want to hear). If you find a respondent who answers these questions differently, you have reason to doubt the validity of their entire set of responses, and should be cautious when analyzing the results; you may decide to exclude their response sheet(s) from the analysis.

Pretest (Pilot Test) the Questionnaire
This is the most important step in preparing your questionnaire.

The purpose of the pretest is to see just how well your cover letter motivates your respondents and how clear your instructions, questions, and answers are.

• You should choose a small group of people (from three to ten should be sufficient) you feel are representative of the group you plan to survey.
• After explaining the purpose of the pretest, let them read and answer the questions without interruption.
• When they are through, ask them to critique the cover letter, the instructions, and each of the questions and answers. Don't be satisfied with learning only what confused or alienated them.
• Question them to make sure that what they thought something meant was really what you intended it to mean.
• Use the above hints as a checklist, and go through them with your pilot test group to get their reactions on how well the questionnaire satisfies these points.
• Finally, redo any parts of the questionnaire that are weak.

Make your survey interesting. Have your questionnaire neatly produced on quality paper; a professional looking product will increase your return rate. A poorly designed survey that contains poorly written questions will yield useless data regardless of how "pretty" it looks.

Let us now summarise what we have studied today.
• The questionnaire is the means for collecting your survey data.
• It should be designed with your data collection plan in mind.
• Each of its three parts, the cover letter, the instructions, and the questions and answers, should take advantage of the strengths of questionnaires while minimizing their weaknesses.
• Each of the different kinds of questions is useful for eliciting different types of data, but each should be constructed carefully with well-developed construction guidelines in mind.
• Properly constructed questions and well-followed survey procedures will allow you to obtain the data needed to check your hypothesis and, at the same time, minimize the chance that one of the many types of bias will invalidate your survey results.
• The types of bias you will encounter when you prepare and execute a questionnaire will be studied in the next lecture.

LESSON 11: ISSUES IN QUESTIONNAIRE

Students, today we will be continuing with questionnaire design. We studied in the last lecture that the questionnaire is the means for collecting your survey data; that it should be designed with your data collection plan in mind; that each of its three parts should take advantage of the strengths of questionnaires while minimizing their weaknesses; that each of the different kinds of questions is useful for eliciting different types of data, but each should be constructed carefully with well-developed construction guidelines in mind; and that properly constructed questions and well-followed survey procedures will allow you to obtain the data needed to check your hypothesis and, at the same time, minimize the chance that one of the many types of bias will invalidate your survey results.

Today we will be studying intensity questions, bias in questions, bias in volunteer samples, levels of measurement, and the reliability and validity of findings. Before studying the types of bias, we will first deal with the intensity question.

Intensity Questions and the Likert Scale

As I have told you before, the intensity question is used to measure the strength of a respondent's feeling or attitude on a particular topic. Such questions allow you to obtain more quantitative information about the survey subject. Instead of a finding that 80 percent of the respondents favour a particular proposal or issue, you can obtain results that show 5 percent of them are strongly in favour whereas 75 percent are mildly in favour. These findings are similar, but the second type of response supplies more useful information.

The most common and easily used intensity (or scaled) question involves the use of the Likert-type answer scale, which allows the respondent to choose one of several (usually five) degrees of feeling about a statement, from strong approval to strong disapproval. The "questions" are in the form of statements that seem either definitely favourable or definitely unfavourable toward the matter under consideration. The answers are given scores (or weights) ranging from one to the number of available answers, with the highest weight going to the answer showing the most favourable attitude toward the subject of the survey. Whether "agree" or "disagree" gets the higher weight actually makes no difference, but the weighting scheme should remain consistent throughout the survey. The weights are not shown on the actual questionnaire and, therefore, are not seen by the respondents. The respondent's total score is the sum of the scores on all questions; the stronger the feeling, the higher (or lower) the score. In this way the Likert scale, with its scaled answers and average scores, can supply quantitative information about your respondents' attitudes toward the subject of your survey.

Illustration
The following questions are designed to measure the amount of "anti-law" feeling (the weights, shown here in parentheses, would not appear on the questionnaire):

1. Almost anything can be fixed up in the courts if you have enough money.
   Strongly Disagree (1)  Disagree (2)  Undecided (3)  Agree (4)  Strongly Agree (5)
2. On the whole, judges are honest.
   Strongly Disagree (1)  Disagree (2)  Undecided (3)  Agree (4)  Strongly Agree (5)

A person who feels that laws are unjust would score lower than one who feels that they are just.

One procedure for constructing Likert-type questions is as follows (adapted from Selltiz et al., 1963, pp 367-368):
1. You, being the investigator, collect a large number of definitive statements relevant to the attitude being investigated.
2. Assign the weights so that the scoring is consistent with the attitude being measured: the most favourable response to the attitude gets the highest score for each question.
3. Randomly select some questions and flip-flop the Strongly Agree to Strongly Disagree scale to prevent the respondent from getting into a pattern of answering (often called a response set).
4. Conduct and score a pretest of your survey.

If you are investigating more than one attitude on your survey, intermix the questions for each attitude. In this manner, the respondent will be less able to guess what you are doing and thus more likely to answer honestly.

A number of studies have been conducted over the years attempting to determine the limits of a person's ability to discriminate between words typically found on rating or intensity scales. The results of this research can be of considerable value when trying to decide on the right set of phrases to use in your rating or intensity scale. When selecting phrases for a 4-, 5-, 7-, or 9-point scale, you should choose phrases that are far enough apart from one another to be easily discriminated, while keeping them close enough that you don't lose potential information.

You should also try to gauge whether the phrases you are using are commonly understood so that different respondents will interpret the meaning of the phrases in the same way. An obvious example is shown with the following 3 phrases: Strongly Agree, Neutral, Strongly Disagree. • These are easily discriminated, but the gap between each choice is very large. • How would a person respond on this three-point scale if they only agreed with the question being asked? • There is no middle ground between Strongly Agree and Neutral. • The same thing is true for someone who wants to respond with a mere disagree. • Your scales must have enough choices to allow respondents to express a reasonable range of attitudes on the topic in question, but there must not be so many choices that most respondents will be unable to consistently discriminate between them.
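The Likert scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration with invented item identifiers and answers; items whose printed scale was "flip-flopped" are reverse-keyed before summing so that a high total always points in the same attitude direction:

```python
# A minimal sketch of Likert scoring (hypothetical item ids and answers).
# Responses are coded 1 = Strongly Disagree ... 5 = Strongly Agree.

SCALE_MAX = 5

def score_respondent(answers, reversed_items):
    """answers: dict of item id -> raw response (1..5).
    reversed_items: set of item ids printed with a flipped scale."""
    total = 0
    for item, raw in answers.items():
        if not 1 <= raw <= SCALE_MAX:
            raise ValueError(f"invalid response {raw!r} for item {item}")
        # Reverse-keyed items: 1 becomes 5, 2 becomes 4, and so on.
        total += (SCALE_MAX + 1 - raw) if item in reversed_items else raw
    return total

answers = {"q1": 4, "q2": 2, "q3": 5}      # q2 was printed with a flipped scale
print(score_respondent(answers, {"q2"}))   # 4 + (6 - 2) + 5 = 13
```

The same reverse-keying step is what makes the respondent's total score comparable across questionnaires that flip-flop different items.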

Bias and How to Combat It

Like any scientist or experimenter, surveyors must be aware of the ways their surveys might become biased and of the available means for combatting bias. The main sources of bias in a questionnaire are:
• a non-representative sample, including non-returns
• leading questions
• question misinterpretation
• mistrustful (untruthful) answers

Now we will discuss these in detail.

Non-representative Sample

Surveyors can expose themselves to possible non-representative sample bias in two ways.
1. The first is to actually choose a non-representative sample. This bias can be eliminated by careful choice of the sample.
2. The second way is to have a large number of non-returns. The non-return bias (also called non-respondent bias) can affect both the sample survey and the complete survey. The bias stems from the fact that the returned questionnaires are not necessarily evenly distributed throughout the sample. The opinions or attitudes expressed by those who returned the survey may or may not represent the attitudes or opinions of those who did not return it, and it is impossible to determine which is true, since the non-respondents remain an unknown quantity.

Illustration
A survey shows that 60 percent of those returning questionnaires favour a certain policy. If the survey had a 70 percent response rate (a fairly high rate as voluntary surveys go), then the favourable replies are actually only 42 percent of those questioned (60 percent of the 70 percent who replied), which is less than 50 percent!

Since little can be done to estimate the feelings of the non-returnees, especially in a confidential survey, the only solution is to minimize the number of non-returns. Non-returns can be minimized in the following ways:

• Use follow-up letters. These letters are sent to the non-respondents after a period of a couple of weeks, asking them again to fill out and return the questionnaire. The content of this letter is similar to that of the cover letter.

• Use high-level sponsorship. People tend to reply to surveys sponsored by organizations they know or respect. If possible, use the letterhead of the sponsoring organization on the cover letter.

• Make your questionnaire attractive, simple to fill out, and easy to read. A professional product usually gets professional results.

• Keep the questionnaire as short as possible. You are asking for a person's time, so make your request as small as possible.

• Use your cover letter to motivate the person to return the questionnaire. One form of motivation is to have the letter signed by an individual known to be respected by the target audience for your questionnaire. In addition, make sure the individual will be perceived by the audience as having a vested interest in the information needed.

• Use inducements to encourage a reply. These can range from a small amount of money attached to the survey to an enclosed stamped envelope. A promise to report the results to each respondent can be helpful. If you do promise a report, be sure to send it.

Proper use of these techniques can lower the non-return rate to acceptable levels.

Misinterpretation

The second source of bias is misinterpretation of questions.
• Misinterpretation of questions can be limited by clear instructions, well-constructed questions, and judicious pilot testing of the survey.
• Biased (leading) questions can also be eliminated by constructing the questions properly and by using a pilot test.
• Finally, bias introduced by untruthful answers can be controlled by internal checks and a good motivational cover letter.
• Although bias cannot be eliminated totally, proper construction of the questionnaire, a well-chosen sample, follow-up letters, and inducements can help control it.
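The non-return arithmetic illustrated above generalizes to any survey. Here is a minimal sketch using the numbers from the example:

```python
def favourable_share_of_sample(favourable_among_returns, response_rate):
    """Share of the WHOLE sample known to be favourable: only the returned
    questionnaires can be counted; non-respondents are an unknown quantity."""
    return favourable_among_returns * response_rate

# 60% of returners favour the policy, but only 70% of the sample replied:
share = favourable_share_of_sample(0.60, 0.70)
print(round(share, 2))  # 0.42 -> only 42 percent of those questioned
```

The lower the response rate, the larger the gap between what the returned questionnaires say and what can safely be claimed about the whole sample.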

Bias in Volunteer Samples

We will now illustrate the many diverse, and sometimes powerful, factors that influence survey findings as a result of using volunteers in a survey.


The exclusive use of volunteers in survey research represents another major source of bias to the surveyor, especially the new surveyor. Although it may not be immediately evident, it is nonetheless empirically true that volunteers, as a group, possess characteristics quite different from those who do not generally volunteer. Unless the surveyor takes these differences into consideration before choosing to use an exclusively volunteer sample, the bias introduced into the data may be so great that the surveyor can no longer confidently generalize the survey's findings to the population at large, which is usually the goal of the survey.

Conclusions Warranting Maximum Confidence
1. Volunteers tend to be better educated than non-volunteers, especially when personal contact between investigator and respondent is not required.
2. Volunteers tend to have higher social-class status than non-volunteers, especially when social class is defined by respondents' own status rather than by parental status.
3. Volunteers tend to be more intelligent than non-volunteers when volunteering is for research in general, but not when volunteering is for somewhat less typical types of research such as hypnosis, sensory isolation, sex research, small-group and personality research.
4. Volunteers tend to be higher in need for social approval than non-volunteers.

Conclusions Warranting Considerable Confidence
1. Volunteers tend to be more arousal-seeking than non-volunteers, especially when volunteering is for studies of stress, sensory isolation, and hypnosis.
2. Volunteers tend to be more unconventional than non-volunteers, especially when volunteering is for studies of sex behavior.
3. Females are more likely than males to volunteer for research in general, and more likely than males to volunteer for physically and emotionally stressful research (e.g., electric shock, high temperature, sensory deprivation, interviews about sex behavior).
4. Volunteers tend to be less authoritarian than non-volunteers.
5. Jews are more likely to volunteer than Protestants, and Protestants are more likely to volunteer than Roman Catholics.
6. Volunteers tend to be less conforming than non-volunteers when volunteering is for research in general, but not when subjects are female and the task is relatively "clinical" (e.g., hypnosis, sleep, or counseling research).

Conclusions Warranting Some Confidence
1. Volunteers tend to be from smaller towns than non-volunteers, especially when volunteering is for questionnaire studies.
2. Volunteers tend to be more interested in religion than non-volunteers, especially when volunteering is for questionnaire studies.
3. Volunteers tend to be more altruistic than non-volunteers.
4. Volunteers tend to be more self-disclosing than non-volunteers.
5. Volunteers tend to be more maladjusted than non-volunteers, especially when volunteering is for potentially unusual situations (e.g., drugs, hypnosis, high temperature, or vaguely described experiments) or for medical research employing clinical rather than psychometric definitions of psychopathology.
6. Volunteers tend to be younger than non-volunteers, especially when volunteering is for laboratory research and especially if they are female.

Conclusions Warranting Minimum Confidence
1. Volunteers tend to be higher in need for achievement than non-volunteers, especially among American samples.
2. Volunteers are more likely to be married than non-volunteers, especially when volunteering is for studies requiring no personal contact between investigator and respondent.
3. Firstborns are more likely than later-borns to volunteer, especially when recruitment is personal and when the research requires group interaction and a low level of stress.
4. Volunteers tend to be more anxious than non-volunteers, especially when volunteering is for standard, unstressful tasks and especially if they are college students.
5. Volunteers tend to be more extraverted than non-volunteers when interaction with others is required by the nature of the research.

Borg and Gall (1979) have suggested how surveyors might use this listing to combat the effects of bias in survey research. For example, they suggest that the degree to which these characteristics of volunteer samples affect research results depends on the specific nature of the investigation. A study of the level of intelligence of successful workers in different occupations would probably yield spuriously high results if volunteer subjects were studied, since volunteers tend to be more intelligent than non-volunteers. On the other hand, in a study concerned with the cooperative behavior of adults in work-group situations, the tendency for volunteers to be more intelligent may have no effect on the results, but the tendency for volunteers to be more sociable could have a significant effect. It is apparent that the use of volunteers in research greatly complicates the interpretation of research results and their generalizability to the target population, which includes many individuals who would not volunteer.


APPENDIX – A

SAMPLE QUESTIONNAIRES (Market Research)

In each of these cases, the business owners gain valuable information to help them make major decisions about their businesses. Remember that if the results of the survey aren't very positive, you need to find out WHY. The questionnaire is used as a guide; it doesn't mean you can't go into business.

A. The first questionnaire is for a select group, the customers of Speedy Photos. The owner conducted the survey during a one-week period, reaching both weekday and weekend customers.

Speedy Photo Survey

In order for us to serve our customers better, we would like to find out what you think of us. Please take a few minutes to answer the following questions while your photographs are being printed. Your honest opinions, comments and suggestions are extremely important to us. Thank you, Speedy Photo

1. Do you live/work in the area? (circle one or both)
   • Close to home
   • Close to work
2. Why did you choose Speedy Photo? (circle all that apply)
   • Convenient
   • Good service
   • Quality
   • Full-service photography shop
   • Other
3. How did you learn about us? (circle one)
   • newspaper
   • flyer/coupon
   • passing by
   • recommended by someone
   • other
4. How frequently do you have film printed? (please estimate)
   • ______ times per month
   • ______ other
5. Which aspect of our photography shop do you think needs improvement?
6. Our operating hours are from 8 am to 5:30 pm weekdays and Saturdays from 9:30 am to 6 pm. We are closed on Sundays and legal holidays. What changes in our operating hours would be better for you?
7. Your age (circle one)
   • under 25
   • 26-39
   • 40-59
   • over 60
8. Other comments:

Created by Women's Enterprise Society of BC

B. This survey was done by a businessman interested in opening public storage buildings. Before he committed any time and money to the project, he sent a questionnaire to consumers within a 15 mile radius of the proposed site.

Public Storage Questionnaire

1. Are you presently renting any public storage space? Yes _____ No _____
   If no, then go to question 2. If yes, then continue with 1a.
   1a. Where are you currently renting storage space? (name and address)
   1b. How many times a month do you visit your storage space? _______
   1c. Is your storage space heated? Yes ______ No ______
   1d. Approximately how much space are you renting? _________ square feet
   1e. Do you think you'll need additional space in the future? Yes ______ No ______
   1f. Are there any changes or improvements you would like to see in your present storage space arrangement? If yes, what would you like to see?
2. Are you planning on using any public storage space? Yes ______ No ______
   2a. If you are planning to rent public storage space or may rent such space, how far a distance are you willing to travel to use your space? ______ miles
   2b. Approximately what size storage space would you need? ______ square feet
   2c. How much monthly rent would you be willing to pay? $ ______ per square foot/month
   2d. Would you require heat for your space?
Name:
Title:
Address:
Thank you very much for your co-operation.

Created by Women's Enterprise Society of BC

C. This questionnaire was developed by a woman who was interested in selling southwestern jewelry made by Native Indians.

Southwestern Jewelry Questionnaire

1. Have you ever purchased or received southwestern jewelry? Yes ______ No ______
2. Have you ever purchased or received southwestern jewelry made by Native Indians? Yes ______ No ______
   If yes, what type of jewelry? Necklace ____ Ring _____ Bracelet _____ Earrings _____ Other _____
3. Would you be interested in purchasing the above mentioned jewelry made by Native Indians? Yes ______ No ______
4. Do you know where to shop for such jewelry? Yes ______ No ______
5. When buying jewelry, what do you value the most? On a scale of 1 through 5, list in order according to your preference. One represents your most valued choice.
   Craftsmanship _____ Cost _____ Uniqueness _____ Other _____

D. The last questionnaire was developed by a woman who wanted to open a fitness center and offer one-on-one training.

Fitness Center Questionnaire

1. Do you exercise? Yes ______ No ______
   If no, please answer the questions in Part A. If yes, please answer the questions in Part B.
   A. Please check your reasons for not exercising:
      ____ Lack of time ____ Lack of motivation ____ Cost ____ No convenient fitness centers ____ Medical reasons
   B. Check the type of exercise you do:
      ____ Aerobic ____ Nautilus ____ Free weights ____ Running ____ Swimming ____ Other, please specify
   C. Check your age group:
      _____ under 25 _____ 26-35 _____ over 35
   D. Where do you normally exercise?
      _____ at home _____ fitness center
   E. How far do you live from (town of proposed center)?
      _____ in town _____ 10-15 miles _____ out of town
   F. Do you think your town needs a fitness center? Yes _____ No _____
   G. Would you be interested in one-on-one training? Yes _____ No _____
   H. Please note any other suggestions or comments you might have.

Created by Women's Enterprise Society of BC

Examples of Good Survey Questions

1. How do you rate the convenience of our location? (ranking)
   _____ poor _____ good _____ very good _____ excellent
2. Please rank the following factors in order of importance to you when making a buying decision for this service (1 being most important, 5 being least important). (multiple choice & ranking)
   ____ price ____ referral ____ location ____ availability ____ guarantee ____ other
3. Are there any other services you would like to see offered? (open-ended)
4. Do you believe that our competitors' prices are too high? (two-choice)
   _____ Yes _____ No
5. What price would you be willing to pay for this product/service? (two-choice)
   Note: This is an important question to ask because the answer will affect one's sales revenue projections.
   ____ $10 - 20 ____ $20 - 30
6. Which of the following services would you like to see offered? Choose one. (multiple choice)
   ____ loans program ____ mentoring ____ counselling ____ research ____ other

Examples of Poor Survey Questions

Do you like this hotel?
(This does not give any valuable information, but it could be reworded: "What do you like about this hotel? What don't you like about this hotel?")

How do you rate the service received?
____ poor ____ fair ____ good ____ very good ____ excellent
(This should have an even number of choices.)

Which of these services would you be interested in?
____ loans ____ mentoring ____ business counselling ____ information referral
(This question should have an "other" category.)

What beverages do you drink?
____ Milk ____ Coke ____ non-cola drink ____ coffee ____ tea ____ juice
(This question is too broad. Most of us will have drunk some of these at some time. Is the respondent to check a number of boxes or only one?)

Source: www.Wesbc01companyELibraryMarketResearchSampleQuestionnaires.pdf



LESSON 12: MEASUREMENT AND SCALING

Levels of Measurement

We know that the level of measurement is the scale by which a variable is measured. For 50 years, with few detractors, science has used the Stevens (1951) typology of measurement levels (scales). There are three things you need to remember about this typology: anything that can be measured falls into one of four types; the higher the level of measurement, the more precision in measurement; and every level up contains all the properties of the previous level. The four levels of measurement, from lowest to highest, are as follows:

• Nominal
• Ordinal
• Interval
• Ratio

Now let us discuss these in detail.

Nominal level of measurement
The nominal level of measurement describes variables that are categorical in nature; the characteristics of the data you're collecting fall into distinct categories. Nominal variables include demographic characteristics like sex, race, and religion.
• If there are a limited number of distinct categories (usually only two), then you're dealing with a dichotomous variable.
• If there are an unlimited or infinite number of distinct categories, then you're dealing with a continuous variable.

Ordinal level of measurement
• The ordinal level of measurement describes variables that can be ordered or ranked in some order of importance.
• It describes most judgments about things, such as big or little, strong or weak.
• Most opinion and attitude scales or indexes in the social sciences are ordinal in nature.

Interval level of measurement
The interval level of measurement describes variables that have more or less equal intervals, or meaningful distances, between their ranks. For example, if you were to ask somebody whether they were a first, second, or third generation immigrant, the assumption is that the distance, or number of years, between each generation is the same.

Ratio level of measurement
The ratio level of measurement describes variables that have equal intervals and a fixed zero (or reference) point. It is possible to have zero income, zero education, and no involvement in crime, but we rarely see ratio level variables in social science, since it's almost impossible to have zero attitudes on things, although "not at all", "often", and "twice as often" might qualify as ratio level measurement.

Types of measurement scales
Ordinal and nominal data are always discrete; continuous data must be at either the interval or ratio level of measurement.

Advanced statistics require at least interval level measurement, so the researcher always strives for this level, accepting ordinal level (which is the most common) only when necessary. Variables should be conceptually and operationally defined with levels of measurement in mind, since the level is going to affect the analysis of the data later.

Figure: Levels of measurement
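One practical consequence of the levels is which summary statistics are admissible. By standard convention (not spelled out in this text): the mode for nominal data, the median for ordinal data, and the mean for interval or ratio data. A minimal Python sketch:

```python
# Which central-tendency measure is admissible depends on the level of measurement.
from statistics import mean, median, mode

def central_tendency(values, level):
    if level == "nominal":
        return mode(values)    # only counting category frequencies is meaningful
    if level == "ordinal":
        return median(values)  # ranks can be ordered, but not meaningfully averaged
    if level in ("interval", "ratio"):
        return mean(values)    # equal intervals make the mean meaningful
    raise ValueError(f"unknown level: {level}")

print(central_tendency(["male", "female", "female"], "nominal"))  # female
print(central_tendency([1, 2, 2, 5], "ordinal"))                  # 2.0
print(central_tendency([10, 20, 30], "ratio"))                    # 20
```

Since every level up contains all the properties of the previous level, any statistic admissible at a lower level is also admissible at a higher one.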


Reliability and Validity
For a research study to be accurate, its findings must be both reliable and valid.
Reliability means that the findings would be consistently the same if the study were done over again.
Validity: a valid measure is one that provides the information that it was intended to provide. The purpose of a thermometer, for example, is to provide information on the temperature; if it works correctly, it is a valid thermometer. A study can be reliable but not valid, and it cannot be valid without first being reliable.
(Figure: three targets illustrating "Reliable but not Valid", "Not Reliable (so not valid either)", and "Reliable AND Valid".)
There are many different threats to validity as well as reliability, but an important early consideration is to ensure you have internal validity. This means that you are using the most appropriate research design for what you are studying (experimental, quasi-experimental, survey, qualitative, or historical), and it also means that you have screened out spurious variables and thought out the possible contamination of other variables creeping into your study.
It is also important to consider, as soon as possible, the time frame that is appropriate for what you are studying. Some social and psychological phenomena (most notably those involving behaviour or action) lend themselves to a snapshot in time. In such a case your research need only be carried out for a short period of time, and the time frame is referred to as cross-sectional. Cross-sectional research is, however, criticized as being unable to determine cause and effect. A longer time frame, called longitudinal, is called for when cross-sectional data fails to depict the cause-effect relationship. There are many different types of longitudinal research, such as those that involve time-series (for example, tracking a developing nation's economic growth over four years or so); it is commonly found in criminology. The general rule is: the greater the number of variables operating in your study, and the more confident you want to be about cause and effect, the stronger the case for longitudinal research, even though it may add years onto carrying out your research.
Methods of Measuring Reliability
Now, the question arises: how will you measure the reliability of a particular measure? There are four good methods of measuring reliability:
• Test-retest
• Split-half
• Multiple forms
• Inter-rater
Test-retest: The test-retest technique is to administer your test, instrument, survey, or measure to the same group of people at different points in time, usually a few weeks or a couple of months apart. Most researchers administer what is called a pretest for this, and use it to troubleshoot bugs at the same time. To calculate this kind of reliability, all you do is calculate the correlation coefficient between the two sets of scores and report it as your reliability coefficient.
Split-half: Taking half of your test, instrument, or survey, and analyzing that half as if it were the whole thing, estimates split-half reliability. You then compare the results of this analysis with your overall analysis.
Multiple forms: The multiple forms technique has other names, such as parallel forms and disguised test-retest, but it is simply the scrambling or mixing up of questions on your survey. It is a more rigorous test of reliability.
Inter-rater: Inter-rater reliability is most appropriate when you use assistants to do interviewing or content analysis for you. To calculate it, all you do is report the percentage of agreement on the same subject between your raters. Anything you do to standardize or clarify your measurement instrument to reduce user error will add to your reliability.
Methods of Measuring Validity
Once you find that your measurement of the variable under study is reliable, you will want to measure its validity. Validity overall is not as easily quantified as reliability: reliability estimates are usually in the form of a correlation coefficient, whereas validity must be argued for. There are four good methods of estimating validity:
• Face
• Content
• Criterion
• Construct
Face validity: Face validity is the least statistical estimate, as it is simply an assertion on the researcher's part claiming that they have reasonably measured what they intended to measure. It is essentially a "take my word for it" kind of validity. Usually, a researcher asks a colleague or expert in the field to vouch for the items measuring what they were intended to measure.
Content validity: Content validity goes back to the ideas of conceptualization and operationalization: it is making sure you have covered all the conceptual space. If the researcher has focused in too closely on only one type or narrow dimension of a construct or concept, then it is conceivable that other indicators were overlooked. In such a case, the study lacks content validity.
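Two of the reliability methods above reduce to simple arithmetic. The following is a minimal sketch (not from the course text, using made-up score vectors) of the test-retest correlation coefficient and the inter-rater percentage of agreement:

```python
# Sketch: two reliability estimates from the list above, on illustrative data.
from statistics import mean

def pearson_r(x, y):
    """Correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    varx = sum((a - mx) ** 2 for a in x)
    vary = sum((b - my) ** 2 for b in y)
    return cov / (varx * vary) ** 0.5

# Test-retest: the same group of six people scored at two points in time.
time1 = [12, 15, 11, 18, 14, 16]
time2 = [13, 14, 10, 19, 15, 17]
reliability = pearson_r(time1, time2)   # report this as the reliability coefficient

# Inter-rater: percentage of agreement between two raters on the same subjects.
rater_a = ["yes", "no", "yes", "yes", "no"]
rater_b = ["yes", "no", "no", "yes", "no"]
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)  # 0.8
```

A high correlation between the two administrations (here close to 1) indicates a stable, reliable instrument; agreement well below 1 between raters signals that the coding instructions need clarifying.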

Criterion validity: Criterion validity is using some standard or benchmark that is known to be a good indicator. There are different forms of criterion validity:
• Concurrent validity is how well something estimates actual day-by-day behavior.
• Predictive validity is how well something estimates some future event or manifestation that has not happened yet.
Construct validity: Construct validity is the extent to which your items are tapping into the underlying theory or model of behavior. It is how well the items hang together (convergent validity) or distinguish different people on certain traits or behaviors (discriminant validity). It is the most difficult validity to achieve: you have to either do years and years of research or find a group of people to test who have the exact opposite traits or behaviors you are interested in measuring. There are different ways to estimate it. One of the most common is a reliability approach, where you correlate scores on one domain or dimension of a concept on your pretest with scores on that domain or dimension on the actual test. Another way is to simply look over your inter-item correlations.
Attitude Measurement
Many of the questions in a marketing research survey are designed to measure attitudes. Attitudes are a person's general evaluation of something. Customer attitude is an important factor for the following reasons:
• Attitude helps to explain how ready one is to do something.
• Attitudes do not change much over time.
• Attitudes produce consistency in behavior.
• Attitudes can be related to preferences.
Attitudes can be measured using the following procedures:
• Self-reporting — subjects are asked directly about their attitudes. Self-reporting is the most common technique used to measure attitude.
• Observation of behaviour — assuming that one's behaviour is a result of one's attitudes, attitudes can be inferred by observing behaviour. For example, one's attitude about an issue can be inferred from whether he or she signs a petition related to it.
• Indirect techniques — use unstructured stimuli such as word-association tests.
• Performance of objective tasks — assumes that one's performance depends on attitude. For example, the subject can be asked to memorize the arguments of both sides of an issue; he or she is likely to do a better job on the arguments that favor his or her stance.
• Physiological reactions — the subject's response to a stimulus is measured using electronic or mechanical means. While the intensity of the reaction can be measured, it is difficult to know whether the attitude is positive or negative.
• Multiple measures — a mixture of techniques can be used to validate the findings; this is especially worthwhile when self-reporting is used.
There are several types of attitude rating scales:
Equal-appearing interval scaling — A set of statements is assembled. These statements are selected according to their position on an interval scale of favorableness, choosing statements that have a small degree of dispersion. Respondents are then asked to indicate the statements with which they agree.
Likert method of summated ratings — A statement is made and the respondents indicate their degree of agreement or disagreement on a five-point scale (Strongly Disagree, Disagree, Neither Agree Nor Disagree, Agree, Strongly Agree). In fact, Likert scaling extends beyond the simple ordinal choices of "strongly agree", "agree", "disagree", and "strongly disagree": weights are initially assigned through a process that calculates the average index score for each item in an index and then ranks the items in order of intensity (recall the process for constructing Thurstone scales). Once ordinality has been assigned, the assumption is that a respondent choosing a response weighted with, say, 15 out of 20 on an increasing scale of intensity is placed at that level of the index.
Example of a Likert scale: How would you rate the following aspects of your food store?

              Extremely important                Extremely unimportant
Service          1    2    3    4    5    6    7
Check-outs       1    2    3    4    5    6    7
Bakery           1    2    3    4    5    6    7
Deli             1    2    3    4    5    6    7

Semantic differential scale — A semantic differential scale is constructed using phrases describing attributes of the product to anchor each end. For example, the left end may state "Hours are inconvenient" and the right end may state "Hours are convenient". The respondent then marks one of the seven blanks between the statements to indicate his or her opinion about the attribute. The semantic differential employs a similar approach to Likert scaling in that it seeks a range of responses between extreme polarities, but it places the ordinal range of responses between two keywords expressing opposite "ideas" or concepts.
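The "summated ratings" part of the Likert method can be sketched in a few lines of Python (the responses below are made-up illustrative data, not from the text):

```python
# Sketch of Likert summated ratings: each respondent rates several statements
# on the 1-5 agreement scale and the weights are summed into one attitude score.
WEIGHTS = {"Strongly Disagree": 1, "Disagree": 2,
           "Neither Agree Nor Disagree": 3, "Agree": 4, "Strongly Agree": 5}

def summated_score(responses):
    """Total score across all statements for one respondent."""
    return sum(WEIGHTS[r] for r in responses)

# One hypothetical respondent answering four statements:
respondent = ["Agree", "Strongly Agree", "Neither Agree Nor Disagree", "Agree"]
score = summated_score(respondent)   # 4 + 5 + 3 + 4 = 16, on a range of 4-20
```

With four items the possible scores run from 4 (all "Strongly Disagree") to 20 (all "Strongly Agree"), which is what lets a single summated number stand in for a respondent's overall attitude.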

Example of a semantic differential: How would you describe Kmart, Target, and Wal-Mart on the following scale?

Clean         ___ ___ ___ ___ ___ ___ ___   Dirty
Bright        ___ ___ ___ ___ ___ ___ ___   Dark
Low quality   ___ ___ ___ ___ ___ ___ ___   High quality
Conservative  ___ ___ ___ ___ ___ ___ ___   Innovative

Semantic Differential: Feelings about Musical Selections

              Very much   Somewhat   Neither   Somewhat   Very much
Enjoyable                                                 Unenjoyable
Simple                                                    Complex
Discordant                                                Harmonic
Traditional                                               Modern

One of the first things that strikes you about this example is its highly interpretative nature. Choices such as "enjoyable" and "unenjoyable" simply reflect preference, but the other choices are sufficiently ambiguous as to invite imprecise understanding. If you are seeking nothing more than attitudinal information about an abstract social artifact, such as a piece of music, the semantic differential may be usable; otherwise, its ambiguity in application remains problematic.
Stapel scale — The Stapel scale is similar to the semantic differential scale except that numbers identify the points on the scale, only one statement is used (if the respondent disagrees, a negative number is marked), and there are 10 positions instead of seven. This scale does not require that bipolar adjectives be developed, and it can be administered by telephone.
Guttman Scaling and the Coefficient of Reproducibility
Guttman scaling seeks to place indicators into an ordinal progression from "weak" indicators to "strong" ones (that, after all, is the difference between a scale and an index in the first place). As with the Likert, Bogardus, and Thurstone scales, the assumption is that a respondent indicating a given level of preference, attitude, or belief will also demonstrate all "weaker" indicators of the same thing. The premise of the Guttman scale extends even further, in that it examines all of the responses to the survey and separates out the number of responses that do not exactly reflect the scalar pattern — that is, the response sets that violate the assumption that a respondent choosing one level of response would give the same type of response at all inferior levels. The number of response sets that violate the scalar pattern is compared to the number that do reflect the pattern, yielding what is referred to as the coefficient of reproducibility. Babbie's illustration provides a very clear understanding of the concept:

Response pattern    Number of cases   Index score   Scale score   Scale errors
Scale types:
+ + +                    612               3             3             0
+ + –                    448               2             2             0
+ – –                     92               1             1             0
– – –                     79               0             0             0
Mixed types: four non-scalar patterns with 15, 5, 2 and 5 cases (index scores 1, 2, 1, 2; scale scores 2, 3, 0, 3), contributing 15 + 5 + 2 + 5 = 27 scale errors in total.

Coefficient of Reproducibility = 1 − (Number of Errors / Number of Guesses)
In the example = 1 − 27 / (1,258 × 3) = 1 − 27 / 3,774 = .993, or 99.3%.

The entire exercise is really just a way of indicating that the degree to which a set of responses accurately reflects the scalar assumptions is an indication of the degree to which the entire set could be recreated from the scale itself. What the illustration shows is that if we were to project an imaginary "sample" from the coefficient of reproducibility of 99.3%, then the projection would reflect the real sample to that degree. Note, however, that you only know the coefficient of reproducibility after you have run the survey and crunched the numbers, so it is not a predictive tool; rather, it is a proof of the strength of the scale as a measure. Guttman scaling thus shows that a well-constructed scale can very accurately reproduce the profile of a response set.
A brief word on typologies is in order. Remember what we have noted about making sure that your indices and scales are comprised of single-dimension indicators. While "religion" can have a strong correlation with "attitudes on abortion", that does not mean that a question on religion belongs in an index or scale of questions on attitudes on abortion. So far we have limited ourselves to an examination of unidirectional variables — attitudes for or against abortion, and so on. Often, however, relationships are better explained as the function of the intersection of several variables. If you wish to examine the intersection of two variables, you can construct a typology, effectively showing, for example, that "Catholics" may be "conservative" on "abortion" but remain "liberal" on "other human rights". Babbie warns us that typologies are useful as independent variables ("religion" may be a good causal factor in "attitudes on abortion") but can be problematic as dependent variables, because explaining the "why" is not always clear: Catholics may be more anti-abortion because the church has forbidden it, but what of other groups? You can get onto some very shaky ground using typologies as the "effect", or dependent, variable.
Q-sort technique — In the Q-sort technique the respondent is forced to construct a normal distribution by placing a specified number of cards into one of 11 stacks according to how desirable he or she finds the characteristics written on the cards. This technique is faster and less tedious for subjects than paired-comparison measures.
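The arithmetic behind the coefficient of reproducibility is simple enough to sketch in Python. This is an illustrative sketch (the case counts are the ones from the Guttman-scaling example above, where each respondent answered three items):

```python
# Coefficient of reproducibility = 1 - (number of errors / number of "guesses"),
# where guesses = number of respondents x number of items per respondent.
scale_type_cases = [612, 448, 92, 79]   # response sets that fit the scalar pattern
mixed_type_cases = [15, 5, 2, 5]        # non-scalar sets: one scale error each here

total_cases = sum(scale_type_cases) + sum(mixed_type_cases)   # 1258 respondents
items_per_case = 3
guesses = total_cases * items_per_case                        # 3774
errors = sum(mixed_type_cases)                                # 27

reproducibility = 1 - errors / guesses
print(round(reproducibility, 3))                              # 0.993
```

Because the errors only become countable once all the responses are in, the code makes the text's point concrete: the coefficient is a post-hoc quality check on the scale, not a predictive tool.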

The Q-sort also forces the subject to conform to quotas at each point of the scale so as to yield a normal or quasi-normal distribution. Thus we can say that the objective of the Q-technique is the intensive study of individuals.
For rating specific attributes, the semantic differential scale is very appropriate. Overall, the semantic differential is simple in concept, and the results obtained are comparable with those of more complex, one-dimensional methods; hence it is widely used. The Likert scale, for its part, is used for item analysis.
Selection of an appropriate attitude measurement scale: We have examined a number of different techniques that are available for the measurement of attitudes. Each method has certain strengths and weaknesses, and not all the techniques are suitable for all purposes, although almost all of them can be used to measure any component of attitudes. The selection depends upon the stage and size of the research. Generally, the Q-sort and the semantic differential scale are preferred in the preliminary stages.
Limitations of attitude measurement scales: The main limitation of these tools is their emphasis on describing attitudes rather than predicting behaviour. This is primarily because of a lack of models that describe how attitudes translate into behaviour.
Tutorial
Prepare a questionnaire on any one of the following objectives:
1. To know the corporate productivity
2. Job analysis / needs and satisfaction level of employees / motivation level of employees / job involvement, etc.
3. Product testing / feedback on after-sales services

LESSON 13: SAMPLING ISSUES IN RESEARCH
Students, today we shall be discussing various issues in sampling. To understand the topic better, it is necessary that we first define certain related terms.
Population
When we carry out an investigation, the interest lies in assessing the general magnitude of, and studying the variation in, one or more characteristics relating to the individuals belonging to a group. This group of individuals is called the population or universe. Thus we can define a population as any entire collection of people, animals, plants or things from which we may collect data — the entire group of interest, which we wish to describe or about which we wish to draw conclusions. For example, if we want to have an idea of the average monthly income of the people residing in India, we will have to enumerate all the earning individuals in the country, which is a very difficult task.
It is impractical for an investigator to completely enumerate the whole population for any statistical investigation. When the population is large (or infinite), or if units are destroyed during the investigation, it is not possible to enumerate or investigate the whole population. But even if the population is finite, 100% inspection is often not possible because of factors such as time, money, and administrative convenience.
Sampling
Sampling is the selection of a part of an aggregate or totality, known as the population, on the basis of which a decision concerning the population is made. Thus we can say that a finite subset of statistical individuals in a population is called a sample, and the number of individuals in a sample is called the sample size.
Sampling unit: A unit is a person, animal, plant or thing which is actually studied by a researcher — the basic object upon which the study or experiment is executed. Examples: a person, a sample of soil, a pot of seedlings, a zip code area, a doctor's practice.
Activity
Define the population and the sampling unit in each of the following problems:
1. Popularity of family planning among families having more than two children
____________________________________________
2. Study of the child mortality rate in a district
____________________________________________
3. Measurement of the volume of timber available in a forest
____________________________________________
4. Annual yield of apple fruit in a hilly district
____________________________________________
5. Election for a political office with adult franchise
____________________________________________
Variables
Any characteristic or phenomenon which may take different values — which can vary in successive observations either in quantity or in quality — is called a "variable". Variables are classified accordingly as quantitative or qualitative. A quantitative variable, such as weight, varies in magnitude in successive observations; its values are called "variates". A qualitative variable, such as gender, does not vary in magnitude in successive observations; its values are called "attributes".
Parameter and Statistic
A parameter is an unknown, fixed value used to represent a certain population characteristic. It does not vary, and therefore it has to be estimated. For example, the population mean µ is a parameter that is often used to indicate the average value of a quantity.
A statistic is a quantity that is calculated from a sample of data. It is a function of an observable random sample, and is therefore itself an observable random variable. It is used to give information about unknown values in the corresponding population: for example, the mean of the data in a sample is used to give information about the overall mean in the population from which that sample was drawn. Each sample drawn from the population has its own value of any statistic that is used to estimate a parameter. Statistics are often assigned Roman letters (e.g. x̄ and s), whereas the equivalent unknown values in the population (the parameters) are assigned Greek letters (e.g. µ and σ).
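The parameter/statistic distinction can be made concrete with a small simulation. This is a sketch with made-up numbers (a synthetic population of 10,000 hypothetical income values), not from the course text:

```python
# Sketch: a parameter is a fixed property of the whole population; a statistic
# is computed from a sample and varies from one sample to the next.
import random
from statistics import mean

random.seed(1)
population = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical incomes
mu = mean(population)            # parameter: fixed, but normally unknown to us

sample = random.sample(population, 100)
x_bar = mean(sample)             # statistic: our observable estimate of mu

# A second sample of the same size yields a different value of the statistic:
another_x_bar = mean(random.sample(population, 100))
```

In practice only `x_bar` is observable; the whole machinery of inference is about how far such a statistic is likely to sit from the fixed parameter `mu`.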

Randomness
Randomness means unpredictability. The fascinating fact about inferential statistics is that, although each random observation may not be predictable when taken alone, collectively random observations follow a predictable pattern, called a distribution function. For example, it is a fact that the distribution of a sample average follows a normal distribution for sample sizes over 30. Also, an extreme value of the sample mean is less likely than an extreme value among a few raw data points.
Consider an archer shooting at a target. The archer wants to be accurate, but also wants the arrows to cluster as closely to the centre of the target as possible. A good sample statistic behaves the same way: it should be on target and should vary as little as possible from sample to sample.
Desirable Characteristics of Sample Statistics
1. Unbiased — the arithmetic mean of the statistic, calculated over all possible samples of a given size n, exactly equals its population parameter.
2. Consistent — the larger the sample size, the closer the statistic should be to its parameter value.
3. Efficient — the more the statistic's values for various samples cluster around the true parameter value, the lower the sampling error and the greater the efficiency.
4. Sufficient — the statistic summarizes all relevant information about the parent population contained in the sample, while ignoring any sample-specific information.
Sampling Distribution
The sampling distribution is a hypothetical device that figuratively represents the distribution of a statistic (some number you have obtained from your sample) across an infinite number of samples. You have to remember that your sample is just one of a potentially infinite number of samples that could have been drawn, and inference is all about making generalizations from statistics (sample) to parameters (population). While it is very likely that any statistic you generate from your sample will be near the center of the sampling distribution, the researcher normally wants to find out exactly where that center is, because the center of the sampling distribution represents the best estimate of the population average. The average of the sampling distribution is the population parameter.
You never actually see the sampling distribution; all you have to work with is the standard deviation of your sample. Every statistic in a sample might have a different sampling distribution. Keep these two terms separate in your mind:
• Standard deviation — the spread of scores around the average in a single sample.
• Standard error — the spread of averages around the average of averages in a hypothetical sampling distribution.
In statistics, the standard deviation of a sampling distribution is referred to as the standard error (to keep it separate in our minds from the standard deviation of a single sample). The standard error (the term was first used by Yule, in 1897) is the standard deviation of a mean and is computed as

Standard Error = (s²/n)^½

where s² is the sample variance and n is the sample size. In sampling, the standard error is referred to as the sampling error. The greater your standard deviation, the greater the standard error (and your sampling error).
Example: Creation of a Sampling Distribution
Six rocks were extracted, and each was weighed, labeled, and put in a bag. This forms the population from which samples can be drawn. Suppose I want to construct the sampling distribution of the mean weight of 3 rocks drawn from this population of 6. To do this, I must enumerate all samples of size 3 which can be drawn from a population of size 6 (there are 20 in total) and compute the mean of each. The frequency distribution created from these 20 sample means is the sampling distribution I want.
(Table: rock IDs 1-6 with their weights in grams; the 20 samples of size 3, each indicated by 0/1 membership of the six rocks; and the mean weight of each sample.)
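The enumeration step in the rock example above can be sketched in Python. The six weights below are illustrative stand-ins, not the values from the original table; the point is the mechanics, which do not depend on the particular numbers:

```python
# Sketch: build the sampling distribution of the mean of samples of size 3
# drawn from a population of 6 rocks (weights are assumed, in grams).
from itertools import combinations
from statistics import mean

weights = [11.5, 13.2, 15.5, 16.3, 18.9, 20.4]   # hypothetical rock weights

# Enumerate every possible sample of size 3 and compute its mean.
sample_means = [mean(s) for s in combinations(weights, 3)]
assert len(sample_means) == 20        # C(6,3) = 20 possible samples

# Unbiasedness in miniature: the average of all 20 sample means
# equals the population mean exactly.
assert abs(mean(sample_means) - mean(weights)) < 1e-9
```

The final assertion is the "unbiased" property from the list above made tangible: averaged over every possible sample, the sample mean hits the population parameter exactly.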

The Sampling Distribution (frequency distribution of the 20 sample means)

Bin (g)        Frequency
>11, <=13          2
>13, <=15          4
>15, <=17          6
>17, <=19          6
>19, <=21          2

Relation between Standard Error and Sample Size
The standard error is also related to sample size: the larger your sample, the smaller the standard error. You are not reducing bias by increasing the sample size, only coming closer to the total number in the population. Note that you can estimate population parameters from even small samples.
Principles of Sample Survey
The theory of sampling is based on the following important principles:
1. Principle of statistical regularity
2. Principle of validity
3. Principle of optimization
1. The principle of statistical regularity stresses the desirability and importance of selecting a sample at random, so that each and every unit in the population has an equal chance of being selected in the sample. Samples obtained by the technique of probability sampling satisfy this principle. An immediate derivation from this principle is the principle of inertia of large numbers, which states that, other things being equal, as the sample size increases the results tend to be more reliable and accurate. For example, in a coin-tossing experiment the results will be approximately 50% heads and 50% tails, provided we perform the experiment a fairly large number of times.
2. The principle of validity means that the sample design should enable us to obtain valid tests and estimates about the parameters of the population.
3. The principle of optimization impresses upon us the need to obtain optimum results, in terms of the efficiency and cost of the design, with the resources at our disposal. The reciprocal of the sampling variance of an estimate provides a measure of its efficiency, while a measure of the cost of the design is provided by the total expenses incurred in terms of money and man-hours. The principle of optimization consists in
a. achieving a given level of efficiency at minimum cost, and
b. obtaining the maximum possible efficiency at a given level of cost.
Sampling and Non-sampling Errors
We can broadly classify the errors involved in the research process under two heads: sampling errors and non-sampling errors.
Sampling errors: These have their origin in sampling, and arise out of the fact that only a part of the population is used to estimate the population parameters and draw inferences about the population. Sampling errors are therefore absent in a complete enumeration. They arise basically for the following reasons:
a. Faulty selection of the sample. If you use a defective technique for selecting a sample — e.g. purposive or judgement sampling, in which the investigator deliberately chooses the sample in order to deduce the desired results — this leads to bias. This bias can be overcome by adhering to simple random sampling.
b. Substitution. If you substitute one unit for another when some difficulty arises in studying the unit originally selected, bias is introduced. This is because the characteristics possessed by the substituted unit will usually be different from those possessed by the unit originally included in the sample.
c. Faulty demarcation of sampling units. This is significant particularly in area surveys, such as agricultural experiments in the field or crop-cutting surveys.
d. Constant error due to an improper choice of statistics for estimating the population parameters. For example, when estimating the population variance, dividing the sum of squared deviations by n instead of n − 1 yields a biased estimate; dividing by n − 1 gives an unbiased estimate.
Non-sampling errors: The non-sampling errors primarily arise at the stages of
• observation,
• ascertainment, and
• processing of data.
They are therefore present in both complete enumeration and sample surveys. Non-sampling errors can occur at every stage of the planning or execution of a census or sample survey, and it is very difficult to prepare an exhaustive list of their sources. Some of the more important ones, however, arise from the following factors:
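The inverse relationship between standard error and sample size follows directly from the formula SE = (s²/n)^½. A minimal sketch, using an assumed sample variance of 25 (i.e. s = 5):

```python
# Sketch: Standard Error = (s^2 / n) ** 0.5. Quadrupling the sample size
# halves the standard error; the variance s^2 = 25 is an assumed value.
def standard_error(sample_variance, n):
    return (sample_variance / n) ** 0.5

s2 = 25.0
se_25 = standard_error(s2, 25)     # 5 / sqrt(25)  = 1.0
se_100 = standard_error(s2, 100)   # 5 / sqrt(100) = 0.5
```

Because n sits under a square root, returns diminish: going from n = 25 to n = 100 (four times the data) only halves the standard error, which is why sample-size decisions are framed as a cost-versus-precision trade-off under the principle of optimization.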

1. Faulty planning or definition. As we all know, the foremost step in research is to state the objectives of the study explicitly and concisely. These objectives are then translated into a set of definitions of the characteristics for which data are to be collected, and into a set of specifications for collection, processing and publishing. Non-sampling errors may arise here due to:
a. the data specification being inadequate or inconsistent with the objectives of the study;
b. errors in locating the units and in the actual measurement of the characteristics;
c. a lack of trained and qualified investigators, ill-designed questionnaires, and errors in recording the measurements.
2. Response errors. These arise out of the responses furnished by the respondents, for the following reasons:
• The response error may be accidental — the respondent may misunderstand a particular question and accordingly furnish improper information unintentionally.
• Prestige bias
• Self-interest
• Bias due to the interviewer
• Failure of the respondent's memory
3. Non-response bias. Non-response biases occur if you do not obtain full information from all the sampling units.
4. Errors in coverage. If the objectives are not stated concisely and clearly, it may happen that:
• certain units which should not be included get included, and
• certain units which must be included get excluded.
5. Compiling errors. The various operations of data processing — editing and coding of the responses, punching of cards, tabulation, and summarizing the original observations made in the study — are all potential sources of error. Compilation errors can be controlled through verification and consistency checks.
6. Publication errors. The errors committed during the presentation and printing of tabulated results are basically due to two sources:
• the mechanics of publication — proofing errors and the like; and
• failure of the survey organization to point out the limitations of the statistics.
Advantages of Sampling over Complete Enumeration
The following are the advantages of, and/or necessities for, sampling in statistical decision-making:
1. Cost: Cost is one of the main arguments in favour of sampling, because often a sample can furnish data of sufficient accuracy at much lower cost than a census.
2. Timeliness: Another advantage of a sample over a census is that the sample produces information faster, because it is a smaller-scale undertaking and takes less time. This is important for timely decision making.
3. Amount of information: More detailed information can be obtained from a sample survey than from a census.
4. Destructive tests: When a test involves the destruction of the item under study, sampling must be used. Statistical sample-size determination can then be used to find the optimal sample size within an acceptable cost.
5. Accuracy: Much better control over data-collection errors is possible with sampling than with a census, and a sample allows us to take more care at the data-processing stage.
Limitations of Sampling
The advantages of sampling over complete enumeration can be derived only if
• the sampling units are drawn in a scientific manner,
• the appropriate sampling technique is used, and
• the sample size is adequate.
Sampling theory has its own limitations and problems, which may be briefly outlined as follows:
1. You have to take proper care in the planning and execution of the sample survey; otherwise the results obtained may be inaccurate and misleading.
2. Sampling requires trained and efficient personnel and sophisticated equipment for its planning, execution and analysis. In the absence of these, the results of sampling are not trustworthy.
3. If you want to have information on each and every unit of the population, you will have to go for complete enumeration; sampling is not an appropriate method in that case.
Types of Sampling
The type of enquiry and the nature of the data fundamentally determine the technique or method of selecting a sample. The procedures for selecting a sample may be broadly classified under the following three heads:
• Non-probability sampling methods: subjective or judgement sampling
• Probability sampling
• Mixed sampling
We will be studying these in detail in the next lecture.

Now, briefly, tell me: what concepts have you studied today? Yes — we studied population, sampling, variables (qualitative and quantitative), randomness, parameter and statistic, the sampling distribution, standard error, the desirable characteristics of sample statistics, the principles of sample survey, sampling and non-sampling errors, and the merits and limitations of sampling.
Notes -

LESSON 14: DESIGNING SAMPLE

Hello, students. Today we shall be continuing our discussion of issues in sampling. We will start with non-probability sampling and then move on to probability sampling. Before proceeding further, let us recapitulate what we studied in the last lecture.

We have learned that a sample is a part or aggregate selected with a view to obtaining information about the whole group, also known as the population. The population is composed of a number of units, and the total number of units in the population and in the sample are known as the population size and the sample size. Any characteristic of the population is called a parameter, and that of the sample a statistic. We also came to know that sampling is the scientific technique of drawing a sample, and we studied the various sources of sampling and non-sampling error along with the principles of sampling. Recall, too, that the standard deviation of the sampling distribution is called the standard error, and that the larger the sample size, the lower the standard error.

How do we know if the characteristics of a sample we take match the characteristics of the population we are sampling? The short answer is: we don't. For the process of statistical inference to be valid, however, we must ensure that we take a representative sample of our population, and we can take steps that make it as likely as possible that the sample will be representative. Two simple and effective methods of doing this are making sure the sample size is large and making sure the sample is randomly selected. A large sample is more likely to be representative of a population than a small one. Think of extreme cases: if we want to know the average height of the population and we select just one person and measure their height, the result is unlikely to be close to the population average. If we took 1,000 people, measured their heights and took the average, this figure would be likely to be close to the population average. Whatever method of sample selection we use, it is vital that the method is described.

Types of Sampling
The type of enquiry you want to conduct and the nature of the data you want to collect fundamentally determine the technique of selecting a sample. The procedure of selecting a sample may be broadly classified under the following three heads:
• Non-probability (subjective or judgement) sampling
• Probability sampling
• Mixed sampling
Now let us discuss these in detail.

Non-Probability Sampling Methods
The common feature of non-probability sampling methods is that subjective judgements are used to determine which units of the population are contained in the sample. We classify non-probability sampling into four groups:
A. Convenience samples
B. Judgement samples
C. Quota samples
D. Snowball samples

A. Convenience Samples
• These samples are used primarily for reasons of convenience.
• They are used for exploratory research and in speedy situations.
• They are often used for new product formulations or to provide gross sensory evaluations, by using employees, students, peers, etc.
Convenience sampling is extensively used in marketing studies, as the following examples show.
1. Suppose a marketing research study aims at estimating the proportion of Pan (betel leaf) shops in Delhi which stock a particular drink, Maaza, and it is decided to take a sample of size 150. What the investigator does is visit 150 Pan shops near his office, as this is very convenient for him, and observe whether each shop stocks Maaza or not. This is definitely not a representative sample: only those Pan shops near the investigator's office had a chance of being selected, and most Pan shops in Delhi had no chance of being selected.
2. A ball pen manufacturing company is interested in knowing users' opinions about the ball pen it presently manufactures (smooth flow of ink, resistance to breakage of the cover, etc.) with a view to modifying it to suit customers' needs. The job is given to a marketing researcher, who visits a college near his place of residence and asks a few students (a convenient sample) their opinion about the ball pen in question.
3. The other setting where convenience sampling is often used is test marketing. There might be some cities whose demographic make-up is approximately the same as the national average. While conducting marketing tests for new products, the researcher may take samples of consumers from such cities and obtain consumer evaluations of these products, as these are supposed to represent "national" tastes.
4. As another example, a researcher might visit a few shops to observe what brand of vegetable oil people are buying, so as to make an inference about the share of a particular brand he is interested in.
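The height example can be made concrete with a small simulation. The population of heights below is synthetic, invented purely for illustration; the point is only that the error of the sample mean shrinks as the sample grows, which is the standard-error effect mentioned in the recap.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of 10,000 heights in cm (invented for illustration).
population = [random.gauss(165, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

def mean_estimate_error(sample_size, trials=200):
    """Average absolute error of the sample mean over repeated random samples."""
    errors = []
    for _ in range(trials):
        sample = random.sample(population, sample_size)
        errors.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(errors)

err_one = mean_estimate_error(1)          # measuring just one person
err_thousand = mean_estimate_error(1000)  # averaging 1,000 people
print(err_one, err_thousand)
```

On a typical run the one-person "estimate" is off by several centimetres, while the 1,000-person average lands within a fraction of a centimetre of the population mean.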

Convenience samples have some points in their favour:
• They are acceptable when the level of accuracy of the research results is not of utmost importance.
• They often produce samples quite similar to the population of interest when conducted properly.
• They are used for most test markets and many product tests conducted in shopping malls.

B. Judgement Samples
A judgement sample is one in which the selection criteria are based upon your (the researcher's) personal judgement that the members of the sample are representative of the population under study. If personal biases are avoided, the relevant experience and the acquaintance of the investigator with the population may help in choosing a relatively representative sample. Judgement sampling is used in a number of cases. For example, the method could be used in a study involving the performance of salesmen: the salesmen could be grouped into top-grade and low-grade performers according to certain specified qualities, and the sales manager may indicate who, in his opinion, would fall into which category. Needless to say, this is a biased method; however, in the absence of any objective data, one might have to resort to this type of sampling.

C. Quota Samples
Here selection is done by non-probability means, based upon the researcher's judgement of the appropriate demographics. The sample is selected on the basis of certain basic parameters, such as age, sex, income and occupation, that describe the nature of the population, so as to make the sample representative of it. The investigators or field workers are instructed to choose respondents who conform to these parameters: they are assigned quotas of the number of units satisfying the required characteristics, and before collecting data they are supposed to verify that the units qualify on these characteristics. This is a very commonly used sampling method in marketing research studies. For example, suppose we are conducting a survey to study the buying behaviour of a product, and it is believed that buying behaviour is greatly influenced by the income level of the consumers. We assume that it is possible to divide our population into three income strata: a high-income group, a middle-income group and a low-income group. Further, it is known that 20% of the population is in the high-income group, 35% in the middle-income group and 45% in the low-income group. Suppose it is decided to select a sample of size 200 from the population. The various field workers are then assigned quotas to select the sample from each group in such a way that a total sample of 200 is selected in the same proportions as mentioned above: samples of size 40, 70 and 90 should come from the high-income, middle-income and low-income groups respectively.

D. Snowball Samples
• A snowball sample is one in which the selection of additional respondents (after the first small group of respondents is selected) is based upon referrals from the initial set of respondents.
• It is used to sample low-incidence or rare populations.
• It is done for the efficiency of finding the additional, hard-to-find members of the sample.
For example, suppose we have a panel of experts to decide about the launching of a new product in the next year. If, for some reason or other, a member drops out of the panel, the chairman may suggest the name of another person whom he thinks has the same expertise and experience to be a member of the said panel. This new member was chosen deliberately, a case of judgement sampling.

Advantages of Non-probability Samples
• They are much cheaper than probability samples.
• Less research time is required than for probability samples.

Disadvantages of Non-probability Samples
• You cannot calculate the sampling error: it is not possible to estimate it, as we cannot determine how precise our sample estimates are.
• You do not know the degree to which the sample is representative of the population from which it was drawn.
• The research results cannot be projected (generalized) to the total population of interest with any degree of confidence.
• The minimum required sample size cannot be calculated, which means that you (the researcher) may sample too few or too many members of the population of interest.

Probability Sampling
Probability sampling is the scientific method of selecting samples according to some law of chance, in which each unit in the population has a definite, pre-assigned probability of being selected in the sample. The different types of probability sampling are those in which:
1. each unit has an equal chance of being selected;
2. sampling units have different probabilities of being selected;
3. the probability of selection of a unit is proportional to the sample size.

1. Simple Random Sampling
Simple random sampling is the technique of drawing a sample in such a way that each unit of the population has an equal and independent chance of being included in the sample.
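The quota arithmetic in the income-group example is simple proportional allocation. A minimal sketch (the function name and group labels are ours, not from the text):

```python
def quota_allocation(total_sample, proportions):
    """Split a total sample across groups in the stated proportions."""
    quotas = {group: round(total_sample * share) for group, share in proportions.items()}
    # Sanity check: the rounded quotas should add back up to the total.
    assert sum(quotas.values()) == total_sample
    return quotas

# 20% / 35% / 45% of a sample of 200, as in the text.
quotas = quota_allocation(200, {"high": 0.20, "middle": 0.35, "low": 0.45})
print(quotas)  # {'high': 40, 'middle': 70, 'low': 90}
```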

Thus, in a simple random sample from a population of size N, the probability of drawing any unit at the first draw is 1/N, the probability of drawing a second unit at the second draw is 1/(N-1), and so on. The probability of selecting a specified unit of the population at any given draw is equal to the probability of its being selected at the first draw, and this implies an equal probability of selection in the subsequent draws.

Selection of a Simple Random Sample
The method of selecting a sample must be independent of the properties of the sampled population, and proper precautions should be taken to ensure that the selected sample is random. Random selection is best for two reasons: it eliminates bias, and statistical theory is based on the idea of random sampling. Note also that one procedure may be good and simple for a small sample but not for a large population; a random sample depends not only upon the selection of units but also on the size and nature of the population. Although human bias is inherent in any sampling scheme administered by human beings, the common methods of drawing a simple random sample are:
• the lottery method (slips, cards or sealed envelopes);
• mechanical randomisation, using tables of random numbers;
• a computerized random number generator.

1. Lottery Method
This is the simplest method of selecting a random sample. We will illustrate it by means of an example. Suppose we want to select r candidates out of n. We assign the numbers 1 to n, one exclusive number to each candidate. These numbers are written on n slips, which are made as homogeneous as possible in shape, size, colour, etc. The slips are put in a bag and thoroughly shuffled, and then r slips are drawn one by one. The r candidates corresponding to the numbers on the drawn slips constitute the random sample. Generally, cards are used in place of slips: we make one card for each unit of the population by writing on it the number assigned to that unit, so that the pack of cards is a miniature of the population for sampling purposes. The cards are shuffled a number of times and a card is then drawn at random. This is one of the most reliable methods of selecting a random sample, and the selection is independent of the properties of the population.

2. Mechanical Randomisation (Random Numbers Method)
The lottery method just explained becomes very time-consuming and cumbersome if the population is large. The most practical and inexpensive method of selecting a random sample is then to use a table of random numbers, constructed so that each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 appears with approximately the same frequency and independently of the others. Since this is true of single digits, it also holds for each of the pairs 00 to 99, the triplets 000 to 999, the quadruplets 0000 to 9999, and so on. Drawing a random sample from such a table consists of the following steps:
• Identify the N units in the population with the numbers 1 to N.
• Select at random any page of the random number tables and pick up numbers in any row, column or diagonal at random. If N ≤ 99, combine the digits two by two to give pairs from 00 to 99; if N ≤ 999 or N ≤ 9999, combine them three by three or four by four to get numbers from 000 to 999 or from 0000 to 9999.
• The population units corresponding to the numbers so selected comprise the random sample.
The numbers in these tables have been subjected to various statistical tests for the randomness of a series, and their randomness has been well established for all practical purposes. I will tell you about the different sets of random numbers commonly used in practice:
1. Tippett's (1927) random number tables (Tracts for Computers No. 15, Cambridge University Press) consist of 10,400 four-digit numbers, giving in all 10,400 x 4 = 41,600 digits selected at random from the British Census Report.
2. Fisher and Yates's (1938) tables (in Statistical Tables for Biological, Agricultural and Medical Research) comprise 15,000 digits arranged in twos, obtained by drawing numbers at random from the 10th to 19th digits of Thomson's 20-figure logarithmic tables.
3. Kendall and Babington Smith's (1939) random tables consist of 1,00,000 digits grouped into 25,000 sets of 4-digit random numbers (Tracts for Computers No. 24, Cambridge University Press).
4. The Rand Corporation (1955) tables (Free Press, Illinois) consist of one million random digits arranged in sets of 5 digits each.
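The table-reading procedure can be mimicked in code: read the digits in pairs (for N ≤ 99), keep distinct numbers that fall in 1..N, and skip the rest. The digit string below is an arbitrary run of digits used only for illustration:

```python
def sample_from_digit_string(digits, population_size, sample_size):
    """Select unit numbers from a string of random digits, two digits at a time.

    Pairs outside 1..population_size, and repeats, are skipped, exactly as
    when reading a printed random number table.
    """
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Illustrative digit string: pairs 65, 24, 63, 56, 85, 42, 82, 02.
units = sample_from_digit_string("6524635685428202", 70, 4)
print(units)  # [65, 24, 63, 56]
```

In modern practice `random.sample(range(1, N + 1), n)` replaces the printed table, but the logic of restricting numbers to 1..N and discarding repeats is the same.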

Merits and Limitations of Simple Random Sampling

Merits
1. Since the sample units are selected at random, giving each unit of the population an equal chance, the element of subjectivity or personal bias is completely eliminated. A simple random sample is therefore more representative of the population than a purposive or judgement sample.
2. You can ascertain the efficiency of the estimates of the parameters by considering the sampling distribution of the statistic. For example, the sample mean is an unbiased estimate of the population mean, and it becomes a more efficient estimate as the sample size increases; thus, as the sample size increases, precision increases.

Limitations
1. The selection of a simple random sample requires an up-to-date frame of the population from which the sample is to be drawn. It is often impossible to have knowledge of each and every unit if the population is very large, and this restricts the use of simple random sampling.
2. A simple random sample may result in the selection of sampling units that are widely spread geographically, in which case the administrative cost of collecting the data may be high in terms of time and money.
3. Sometimes a simple random sample gives results that look most non-random, such as long runs of the same number; some randomly drawn samples prove very "non-random". This will be clear from an example. If I were conducting a study comparing two treatments, A and B, one way I could allocate patients to treatment groups would be by using a table of random numbers. The following set of random numbers came from a popular statistics table (most statistics textbooks have them): 65246356854282020026. I could allocate a patient to treatment A if the digit were odd and to B if it were even. This would result in successive patients being allocated in the sequence BABBBAABBABBBBBBBBBB. Randomly selected numbers often seem to have patterns in them. This is not a problem if we are conducting a large study, since everything evens out over time; but if the study had stopped after recruiting 20 patients, we would have had four patients on treatment A and sixteen on B, which would not be a very good basis for comparing the two treatments.
4. For a given precision, a simple random sample usually requires a larger sample size than stratified random sampling, which we will study next.

TI-82: Generating Random Numbers
You can generate random numbers on the TI-82 calculator using the expression int(N*rand+S), where N is the number of different values and S is the minimum value. Since the calculator remembers the last formula put in and evaluates it when you hit ENTER, just hit ENTER again to generate more random numbers. If you have two values A and B between which you need random numbers (inclusive), use N = B - A + 1 and int(N*rand+A). Notice that it is B - A + 1, not B - A: everyone agrees there are 10 numbers between 1 and 10 inclusive, but if you take 10 - 1 you get 9, not 10.

Stratified Random Sampling
We know that precision is defined as the reciprocal of the sampling variance, and that the variance of the sample estimate of the population mean is (a) directly proportional to the variability of the sampling units in the population and (b) inversely proportional to the sample size. Apart from increasing the sample size or the sampling fraction n/N, the only way of increasing the precision of the sample mean is to devise a sampling technique that effectively reduces the variance, that is, the population heterogeneity. One such technique is stratified sampling, which also eliminates the type of allocation problem seen above. I will explain it with the help of an illustration.

Stratification means division into layers. Suppose we have a population consisting of N sampling units; it is divided into k relatively homogeneous, mutually disjoint (non-overlapping) sub-groups, termed strata, of sizes N1, N2, ..., Nk, such that N = N1 + N2 + ... + Nk. You then draw a simple random sample of size ni (i = 1, 2, ..., k) from each stratum. This technique is called stratified random sampling, and the resulting sample is called a stratified random sample. There are two points which you have to keep in mind while drawing a stratified random sample:
• proper classification of the population into the various strata, and
• a suitable sample size from each stratum.
Both points are important, because faulty stratification cannot be compensated for by taking large samples. Past data, or other information related to the character under study, may be used to divide the population into groups such that (i) units within each group are as homogeneous as possible and (ii) the group means are as widely different as possible.
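Both the TI-82 expression int(N*rand+S) and the odd/even allocation rule translate directly into code. A sketch (the digit string is the one quoted earlier in this lesson):

```python
import random

def ti82_int_rand(n_values, minimum):
    """Python analog of the TI-82 expression int(N*rand+S)."""
    return int(n_values * random.random() + minimum)

# Random integers between A and B inclusive: note N = B - A + 1, not B - A.
A, B = 1, 10
draws = [ti82_int_rand(B - A + 1, A) for _ in range(1000)]

# Allocating patients to treatment A (odd digit) or B (even digit).
digits = "65246356854282020026"
allocation = "".join("A" if int(d) % 2 == 1 else "B" for d in digits)
print(allocation)  # BABBBAABBABBBBBBBBBB
```

Note the imbalance: only four of the twenty digits are odd, so only four patients land on treatment A.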

Note: You can allocate the sample sizes for the different strata in two ways:
1. Proportional allocation. The allocation is called proportional if the sampling fraction is constant for each stratum, i.e.
n1/N1 = n2/N2 = ... = nk/Nk,
so that each stratum contributes a sample in proportion to its size.
2. Optimum allocation. Another guiding principle in the determination of the ni is to choose them so as to:
• minimise the variance of the estimate for (i) a fixed total sample size or (ii) a fixed cost; or
• minimise the total cost for a fixed desired precision.

Principal Advantages of Stratified Random Sampling
1. More representative. In a non-stratified random sample, some strata may be over-represented, others may be under-represented, and some may be excluded altogether. Stratified sampling ensures any desired representation in the sample of the various strata in the population, and over-rules the possibility of any essential group of the population being completely excluded from the sample. It thus provides a more representative cross-section of the population and is frequently regarded as the most efficient system of sampling.
2. Greater accuracy. Stratified sampling provides estimates with increased precision and is more efficient than simple random sampling; moreover, it enables us to obtain results of known precision for each stratum.
3. Administrative convenience. As compared with a simple random sample, stratified random samples are more concentrated geographically, so the time and money involved in collecting the data and interviewing the individuals may be considerably reduced, and the field work can be supervised with greater ease and convenience.
4. Sometimes sampling problems differ markedly in different parts of the population, for example between literates and illiterates, or between people living in ordinary homes and people living in institutions (students' hostels, hospitals, etc.). In such cases we deal with the problem through stratified sampling, by regarding the different parts of the population as strata and tackling the survey within each stratum independently.

Systematic Random Sampling
If a complete and up-to-date list of sampling units is available, you can also employ the common technique of selection known as systematic sampling. Let the population size be N. We number all the sampling units from 1 to N in some order, and a sample of size n is drawn such that N = nk, i.e. k = N/n, where the integer k is usually called the sampling interval. In systematic random sampling we draw a number at random, say i ≤ k, select the unit corresponding to this number, and then select every kth unit subsequently. Thus the systematic sample of size n consists of the units i, i+k, i+2k, ..., i+(n-1)k. The random number i is called the random start, and its value determines the whole sample: you select the first unit at random, the rest being automatically selected according to a predetermined pattern of regularly spaced units.

Merits
I. Systematic sampling is operationally more convenient than simple random or stratified random sampling, and it saves time and work.
II. It is more representative, provided the frame (the list from which you have drawn the sample units) is arranged wholly at random.

Demerits
I. The main disadvantage is that systematic samples are not in general random samples, since the requirement in merit II is rarely fulfilled.
II. If N is not a multiple of n, then the actual sample size may differ from that required, and the sample mean is not an unbiased estimate of the population mean.
III. Systematic sampling may yield highly biased estimates if there are periodic features associated with the sampling interval, i.e. if the frame has a periodic feature and k is equal to, or a multiple of, the period.
IV. It does not provide an estimate of the sampling error.

Cluster Sampling
In this type of sampling you divide the total population, depending upon the problem under study, into recognizable sub-divisions termed clusters, and a simple random sample of n clusters is drawn; the individuals in the selected clusters constitute the sample. Notation:
N = total number of clusters
n = number of sampled clusters
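Proportional allocation and systematic selection are both mechanical enough to sketch in a few lines (the stratum sizes and the frame size below are invented for illustration):

```python
def proportional_allocation(n, stratum_sizes):
    """n_i proportional to N_i: a constant sampling fraction n/N in every stratum."""
    N = sum(stratum_sizes)
    return [round(n * Ni / N) for Ni in stratum_sizes]

def systematic_sample(N, n, random_start):
    """Units i, i+k, i+2k, ..., i+(n-1)k, with sampling interval k = N // n."""
    k = N // n
    assert 1 <= random_start <= k, "the random start i must lie in 1..k"
    return [random_start + j * k for j in range(n)]

# Hypothetical strata of 200, 350 and 450 units; total sample n = 100.
strata = proportional_allocation(100, [200, 350, 450])
print(strata)  # [20, 35, 45]

# N = 20 units, n = 5, so k = 4; a random start of i = 3 selects:
chosen = systematic_sample(20, 5, 3)
print(chosen)  # [3, 7, 11, 15, 19]
```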

Mi = number of sampling units in the ith cluster (i = 1, 2, ..., N)
M = M1 + M2 + ... + MN = total number of units in the population
Yij = jth observation in the ith cluster (j = 1, 2, ..., Mi; i = 1, 2, ..., N)
yij = jth observation in the ith sampled cluster (j = 1, 2, ..., Mi; i = 1, 2, ..., n)

Notes:
• Clusters should be as small as possible, consistent with the cost and limitations of the survey.
• The number of sampling units in each cluster should be approximately the same.

Merits and Limitations
I. Cluster sampling is simple to carry out and results in administrative convenience, since it permits the field work to be concentrated while still covering a large area.
II. It is generally less efficient than a suitable single-stage sampling of the same size.
III. Thus cluster sampling is not to be recommended if we are sampling areas in cities containing private residential houses, apartment buildings, and business and industrial complexes with widely varying numbers of persons or households.

Multistage Sampling
A better way of selecting a sample in such cases is to resort to sub-sampling within the clusters, the clusters being termed primary units and the units within the clusters being termed secondary units; this technique is called two-stage sampling. The technique can be generalized to multistage sampling: we regard the population as a number of primary units, each of which is composed of second-stage units, and so on, until we ultimately reach the desired sampling units. Each stage reduces the sample size.
Merits:
I. Multistage sampling is more flexible than the other methods.
II. It saves a lot of operational cost, as we need the second-stage frame only for those units which are selected in the first-stage sample.
III. It remains usable in situations where the Mi, and even M, are not known.

In a nutshell, then: non-probability methods such as convenience, judgement and quota sampling are sometimes used, although the representativeness of such samples cannot be ensured, whereas probability sampling gives each unit of the population a known chance of being included in the sample, and in this sense yields a representative sample of the population. This brings an end to today's discussion of sampling techniques.

Points to Ponder
• Sampling is based on two premises. One is that there is enough similarity among the elements in a population that a few of those elements will adequately represent the whole.
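The two-stage procedure can be sketched as follows; the frame of blocks and households below is wholly invented for illustration:

```python
import random

def two_stage_sample(frame, n_clusters, n_units):
    """Stage 1: a simple random sample of clusters.
    Stage 2: a simple random sample of units within each selected cluster."""
    chosen_clusters = random.sample(sorted(frame), n_clusters)
    return {name: random.sample(frame[name], n_units) for name in chosen_clusters}

random.seed(7)
# Hypothetical frame: 6 city blocks (clusters), each listing 8 households.
frame = {f"block{i}": [f"hh{i}-{j}" for j in range(8)] for i in range(6)}
picked = two_stage_sample(frame, n_clusters=2, n_units=3)
print(picked)
```

Note that the household lists (the second-stage frame) are needed only for the two selected blocks, which is exactly the cost saving claimed for multistage sampling.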

LESSON 15: APPLICATIONS OF MARKET RESEARCH

The main areas in which marketing research is applied are:
1. Product research
2. Price research
3. Distribution research
4. Promotion research
We cover the various information requirements and the techniques used for each purpose.

Product Research
The main product decisions that need to be considered are the physical design of the product and its demand potential. Following are some examples of the techniques used.

a) Perceptual Maps
Perceptual mapping is a graphics technique used by marketers to visually display the perceptions of customers or potential customers. Typically the position of a product, product line, brand, or company is displayed relative to the competition, positioned along the dimensions by which users perceive and evaluate the products. Perceptual maps can have any number of dimensions, but the most common is two: any more is a challenge to draw and confusing to interpret.

Perceptual Map of Competing Products
The first perceptual map shows consumer perceptions of various automobiles on the two dimensions of sportiness/conservative and classy/affordable. Cars positioned close to each other are seen by consumers as similar on the relevant dimensions: consumers see Buick, Chrysler, and Oldsmobile as similar, so they are close competitors and form a competitive grouping. This sample of consumers felt Porsche was the sportiest and classiest of the cars in the study (top right corner) and Plymouth the most practical and conservative (bottom left corner). A company considering the introduction of a new model will look for an area of the map free from competitors. Some perceptual maps use circles of different sizes to indicate the sales volume or market share of the various competing products.

Perceptual Map of Ideal Points and Clusters
Displaying consumers' perceptions of related products is only half the story; many perceptual maps also display consumers' ideal points, which reflect ideal combinations of the two dimensions as seen by a consumer. The next diagram shows a study of consumers' ideal points in the alcohol/spirits product space, where each dot represents one respondent's ideal combination of the two dimensions. Areas with a cluster of ideal points (such as A) indicate a market segment, while areas without ideal points are sometimes referred to as demand voids. A company considering introducing a new product will look for areas with a high density of ideal points, and such maps can suggest gaps into which new products might fit.

New Product Research
New product development is critical to the life of most organizations, and there are uncertainties associated with it. Many companies spend millions of rupees on R&D in order to come up with a new product that will satisfy consumer needs; for them, the purpose of marketing research is to reduce the uncertainties associated with new products. In this way failures can be reduced, time-to-market can be shortened, and products can be improved to increase customer satisfaction. A managerial decision to use a pretest market analysis is justified if sufficiently accurate predictions can be achieved, the timing of the analysis comes before large investment commitments are necessary, and the cost of the analysis is reasonable.

Four stages of new product development can be seen:
• Generating new-product concepts
• Evaluating and developing those concepts
• Evaluating and developing the actual products
• Testing in a marketing programme

Concept Generation
There are two types of concept generation research: need identification research and concept identification. The emphasis in need research is on identifying unfilled needs in the market.
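Reading a competitive grouping off a perceptual map amounts to comparing distances between product coordinates. The coordinates below are invented purely for illustration; they are not data from the study described above:

```python
import math

# Hypothetical (sportiness, classiness) scores on a 1-10 scale.
positions = {
    "Porsche":    (9.0, 9.0),
    "Buick":      (4.0, 6.0),
    "Chrysler":   (4.2, 5.8),
    "Oldsmobile": (4.1, 6.2),
    "Plymouth":   (2.0, 2.5),
}

def map_distance(a, b):
    """Euclidean distance between two products on the map."""
    return math.dist(positions[a], positions[b])

# Products close together on the map are perceived as close competitors.
nearest = sorted(positions, key=lambda car: map_distance("Buick", car))[:3]
print(nearest)  # ['Buick', 'Oldsmobile', 'Chrysler']
```

With these made-up coordinates, Buick, Oldsmobile and Chrysler fall into one competitive grouping, while Porsche and Plymouth sit far away from it.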

b) Social and environmental trends can be analyzed.

c) An approach termed benefit structure analysis has product users identify the benefits desired and the extent to which the product delivers those benefits. The result is an identification of benefits sought that current products do not deliver.

d) Product users might be asked to keep a diary of a relevant portion of their activities. Analysing such diaries can provide an understanding of unsolved problems associated with a particular task.

e) In focus-group interviews, product users might discuss problems associated with product-use situations.

Ideal Vectors
Some maps plot ideal vectors instead of ideal points. The map below, a perceptual map of competing products with ideal vectors, displays various aspirin products as seen on the dimensions of effectiveness and gentleness. It also shows two ideal vectors; the slope of each vector indicates the ratio of the two dimensions preferred by the consumers within that segment. This study indicates that there is one segment more concerned with effectiveness than with gentleness, and another segment more interested in gentleness than in strength. A company can also use such a map to look for areas without competitive rivals. Placing both the ideal points and the competing products on the same map does this best.

There is an assortment of statistical procedures that can be used to convert the raw data collected in a survey into a perceptual map: some techniques are constructed from perceived differences between products, others from perceived similarities, and still others from cross-price elasticity of demand data gathered by electronic scanners. Factor analysis, discriminant analysis, cluster analysis, multidimensional scaling, and logit analysis can all be used; multidimensional scaling will produce either ideal points or competitor positions, while preference regression will produce ideal vectors. Perceptual maps need not come from a detailed study, however. There are also intuitive maps (also called judgmental maps or consensus maps) created by marketers based on their understanding of their industry, using management's best judgement. It is questionable how valuable this type of map is; often such maps just give the appearance of credibility to management's preconceptions. Methodological problems can arise when detailed marketing research studies are done, but at least the information then comes directly from the consumer.

Focus Groups
A focus group is a form of qualitative research in which a group of people are asked about their attitude towards a product, service, concept, advertisement, idea, or packaging. Questions are asked in an interactive group setting where participants are free to talk with other group members. In the world of marketing, focus groups are an important tool for acquiring feedback regarding new products: they allow companies wishing to develop, package, name, or test-market a new product to discuss, view, and test the product before it is made available to the public. This can provide invaluable information about the potential market acceptance of the product.

In traditional focus groups, a pre-screened (pre-qualified) group of respondents gathers in the same room. They are pre-screened to ensure that group members are part of the relevant target market and that the group is a representative subgroup of this market segment. There are usually 8 to 12 members in the group, and a session usually lasts for 1 to 2 hours. A moderator guides the group through a discussion that probes attitudes about a client's proposed products or services. The discussion is unstructured (or loosely structured), and the moderator encourages the free flow of ideas; although the moderator is seldom given specific questions to ask, he or she is often given a list of objectives or an anticipated outline. Client representatives observe the discussion from behind a one-way mirror: participants cannot see out, but the researchers and their clients can see in. Usually a video camera records the meeting so that it can be seen by others who were not able to travel to the focus group site, and transcripts are created from the video tape. Researchers examine more than the spoken words; they also try to interpret facial expressions, body language, and group dynamics. Respondents often feel group pressure to conform, and this can contaminate the results; on the other hand, group dynamics are useful in developing new streams of thought and covering an issue thoroughly.

Types of Focus Groups
• Two-way focus group: one focus group watches another focus group and discusses the observed interactions and conclusions.
• Dual moderator focus group: one moderator ensures the session progresses smoothly, while another ensures that all the topics are covered.
• Dueling moderator focus group: two moderators deliberately take opposite sides on the issue under discussion.
• Respondent moderator focus group: one or more of the respondents are asked to act as the moderator temporarily.
• Client participant focus group: one or more client representatives participate in the discussion, either covertly or overtly.
• Mini focus group: the group comprises only 4 or 5 members.

Telesession (or teleconference) focus groups - a telephone network is used
On-line focus groups - computers and an internet network are used

Online Focus Groups

Traditional focus groups can provide accurate information, and are less expensive than other forms of traditional marketing research. There can be significant costs, however: if a product is to be marketed on a nation-wide basis, it would be critical to gather respondents from various locales throughout the country, since attitudes about a new product may vary due to geographical considerations. This would require a considerable expenditure in travel and lodging expenses. Additionally, the site of a traditional focus group may or may not be in a locale convenient to a specific client, so client representatives may have to incur travel and lodging expenses as well.

With the advent of large-scale computer networks such as the Internet, it is now possible to link respondents electronically. NFO Research, a large market research company, has a system of on-line focus groups which allows respondents from all over the country to gather, electronically, while avoiding countless logistical headaches. Online groups are usually limited to 6 or 8 participants. Respondents share images, data, and their responses on their computer screens.

While such a system does eliminate some of the logistical headaches and travel expenses associated with conducting focus groups, it still requires one or more representatives from a client to be physically located with the moderator conducting the focus group: only in this way can questions be added in real time to further probe a particular response. Thus, even the online system incurs some travel expenses, since a client representative will need to travel to a research site or vice versa. Accordingly, there is a need for a system and method of conducting focus groups using remotely located participants, including one or more moderators, one or more clients and one or more respondents, who are all physically remote from each other. Such a system must allow for at least two separate chat discussions to be conducted simultaneously between the three classes of focus group participants, to provide an electronic analog to the one-way mirror segregating clients from respondents. In addition, such a system must allow and prohibit participation in the different chat discussions based on the class of the participant. The biggest problem with online focus groups is ensuring that the respondents are representative of the broader population (including computer non-users).

f) In lead user analysis, instead of just asking users what they have done, their solutions are collected more formally. Eric von Hippel introduced the concept of 'lead users' in the mid-1980s. He defined lead users as those users who display the following two characteristics:

• They face needs that will be general in the market place, but face them months or years before the bulk of that marketplace encounters them
• They are positioned to benefit significantly by obtaining a solution to those needs

Lead users are an extremely valuable cluster of customers and potential customers who can contribute to the identification of future opportunities and the evaluation of emerging concepts. By targeting these clusters, it is possible to identify opportunities for future products and evaluate emerging concepts. Understanding these users can provide a richness of information relatively efficiently. There are few industries or product types where there are no lead users who have requirements or demands ahead of the rest.

Von Hippel suggests that a key element in identifying lead users is to first identify the underlying trends; the lead users are those at the leading edge of these trends, which results in these users or customers having a leading position. Where a company has experience within a market place, it should be relatively straightforward to identify those customers who demand special solutions, push existing solutions to the limit, or have customized standard products to satisfy their own desires. However, lead users should not necessarily be sought from within the usual customer base: where possible, it can be useful to look beyond existing customers, perhaps to users of complementary or substitute goods or in analogous markets. The lead users may only have an interest in improvements or changes to specific elements or attributes of a product.

Once a lead user is identified, and if the lead users are sufficiently interested, they can be considered as part of the extended product design team. They may even be prepared to share the burden of investment in order to find a suitable solution. Note, however, that if today's lead users do not find appropriate solutions from existing suppliers, then they could well turn into tomorrow's competitors.

Concept Identification

During a new-product development process there is usually a point where a concept is formed but there is no tangible usable product that can be tested. There may be simply a verbal description, or there may be a rough idea for a name, a package, or an advertising approach. The concept should be defined well enough so that it is communicable. The concepts that the company or person generates are then tested. The aim is to determine if the concept warrants further development and to provide guidance on how it might be improved and refined. Conjoint analysis typically is used to obtain an ideal combination of the concept's various features. Thus, research questions might include:

• Are there any major flaws in the concept?
• What consumer segments might be attracted to it?
• Is there enough interest to warrant developing it further?
• How might it be altered or developed further?
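The mapping techniques discussed earlier (multidimensional scaling converting brand dissimilarities into a two-dimensional perceptual map) can be sketched with classical MDS. This is a minimal illustrative sketch: the four "brands" and their dissimilarity scores below are hypothetical, not survey data.

```python
import numpy as np

# Toy dissimilarity matrix among four hypothetical brands
# (0 = identical, larger = more different). Illustrative numbers only.
D = np.array([
    [0.0, 2.0, 6.0, 7.0],
    [2.0, 0.0, 5.0, 6.5],
    [6.0, 5.0, 0.0, 2.5],
    [7.0, 6.5, 2.5, 0.0],
])

def classical_mds(D, dims=2):
    """Classical (Torgerson) multidimensional scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]   # keep the largest ones
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

coords = classical_mds(D)
print(coords.shape)  # one 2-D map position per brand -> (4, 2)
```

Plotting the resulting coordinates (together with ideal points or vectors estimated separately) gives the kind of perceptual map described above.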

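The conjoint analysis mentioned above can be sketched as a dummy-coded regression: respondents rate concepts that vary systematically on a few attributes, and least-squares coefficients estimate each attribute's part-worth. The two attributes, the design, and the ratings below are hypothetical.

```python
import numpy as np

# Hypothetical full-factorial design: 8 concept ratings over two binary
# attributes. Columns: intercept, premium packaging (0/1), low price (0/1).
X = np.array([
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
])
y = np.array([3.0, 5.0, 4.0, 6.5, 2.5, 5.5, 4.5, 6.0])  # preference ratings

# Each least-squares coefficient estimates a part-worth: how much that
# attribute level adds to overall preference.
partworths, *_ = np.linalg.lstsq(X, y, rcond=None)
print(partworths.round(2))
```

The largest part-worth indicates the attribute contributing most to preference, and summing part-worths over any attribute combination scores that concept; this is the sense in which conjoint analysis finds an "ideal combination" of features.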
Most concept testing involves exposing people to the concept and getting their reactions. In exposing people to the concept, the market researcher needs to address a series of questions:

• How are the concepts exposed?
• To whom are the concepts exposed?
• To what are they compared?
• What questions are asked?

The study should aim to understand why users respond in the way that they do to the concept, including:

• What do the users think about using the concept?
• Does the basic functionality have value to the user?
• Is the user interface appropriate and operable?
• How does the user feel about the concept?
• Are our assumptions about customer requirements correct?
• Have we misunderstood any requirements?

This type of early analysis of concepts is potentially the most critical of all types of prototyping and evaluation, for if the development is based on faulty assumptions or misunderstanding about the needs of the users, then problems are almost inevitable later on.

It is important to make a distinction between the different types of testing applied at different stages of the development process. Different testing methods will have different objectives, approaches and types of modeling. Four general types of testing are described in more detail:

• Exploratory tests
• Assessment tests
• Validation tests
• Comparison tests

ISO 9000 tests are also briefly summarised.

Exploratory Tests

These are carried out early in the development process, during the fuzzy front end, when the problem is still being defined and potential solutions are being considered. The objective of the exploratory test is to examine and explore the potential of preliminary design concepts and answer some basic questions, such as:

• Is the concept usable?
• Does the concept satisfy all user needs?
• How does the user use the product, and could it be more effective?
• How will it be assembled and tested, and could this be achieved in a better way?

The evaluation process is likely to be relatively informal. Data will typically be qualitative and based on observation, interview and discussion with the target audience.

Assessment Tests

While the exploratory test aims to explore the appropriateness of a number of potentially competing solutions, the assessment test digs into more detail with a preferred solution at a slightly later stage of development, preferably once the development team has a good understanding of the user profile and customer needs. Assuming that the right concept has been chosen, the assessment test aims to ensure that it has been implemented effectively and to answer more detailed questions, such as: can the user complete all tasks as intended? The main aim of an assessment test is to ensure that assumptions remain relevant and that more detailed and specific design choices are appropriate.

Assessment testing typically requires more complex or detailed models than the exploratory test. A combination of analytical models, simulations and working mock-ups (not necessarily with final appearance or full tooling) will be used. The assessment test will tend to focus on the usability or level of functionality offered and, in some cases, may be appropriate for evaluating early levels of performance. Ideally, the customer should be asked to use the product without training or prompting, to assess the intuitiveness of controls and instructions. Data collection will tend to be qualitative, based on observation, discussion and structured interview. Some quantitative measures may be appropriate, such as time to perform tasks and number of failures or errors.

Validation Tests

The validation test is normally conducted late in the development process to ensure that all of the product design goals have been met. This may include usability, performance, reliability, maintainability, safety, documentation and production processes. Compared to an assessment test, there is a much greater emphasis on experimental rigour and consistency. Validation tests normally aim to evaluate actual functionality and performance, as is expected in the production version, and so activities should be performed in full and not simply walked through. The product should be as near to representing the final item as possible, including packaging. It is probable that the validation test is the first opportunity to evaluate all of the component elements of the product together, although elements may have been tested individually already.

Data from a validation test is likely to be quantitative, based on measurement of performance. Normally this is carried out against some benchmark of expected performance, with any failures to comply with expected performance logged and appropriate corrective action determined. Data should also be formally recorded. Also included within validation tests will be any formal evaluation required for certification, safety or legislative purposes. It may be preferable for evaluation to be carried out independently from the design team, but with team input on developing standards and measurement criteria.

Comparison Tests

A comparison test may be performed at any stage of the design process, to compare a concept, product or product element against some alternative. This alternative could be an existing solution, a competitive offering or an alternative design solution. The comparison test is used to establish a preference, determine superiority, or understand the advantages and disadvantages of different designs. Comparison testing could include the capturing of both performance and preference data for each solution. Usability issues may be scored in terms of speed, accuracy or rate of use, but should always be quantified. Issues such as desirability may be measured in terms of preference or user ranking.

Product Evaluations and Development

The aim is to predict market response, to determine whether or not the product should be carried forward.

Use Testing: This gives the users a reasonable time to use the product, and then inquires into their reactions and their intentions to buy it. Researchers can contact respondents in shopping centers, by personal visits to their homes or offices, or on the telephone. Limitations:

• Respondents may inflate their intention to buy: consumers may say that they will buy the product but end up not doing so, even when repurchase opportunities are made available; such decisions may be quite different when they are made in a more realistic store situation.
• The fact that they were given a free sample and are participating in a test may distort their impressions.
• Due to unclear instructions, a misunderstanding, or lack of cooperation, the respondents may not use the product correctly and may therefore report a negative opinion.
• The users may not accept the product over a long period of time.

Blind-use Testing: Even though a product may be proved superior in the laboratory, the consumer may not perceive it to be superior. For example, Amul sweets were introduced in the market and were regarded by the company as superior by all standards. The launch was timed to be a hit during Diwali, and advertisements were released to prop up sales. Unfortunately, consumers perceived the product as a premium product and did not substitute it for their purchases from the local halwai.

Pretest Marketing: Two approaches are used to predict the new brand's market share:

Preference Judgments: Here the preference data are used to predict the proportion of purchases of the new brand that respondents will make, given that the new brand is in their response set. These estimates for the respondents in the study are coupled with an estimate of the proportion of all people who will have the new brand in their response set, plus an estimate of the product's distribution, to provide an estimate of market share.

Trial and Repeat Purchase Levels: This is based on the respondents' purchase decisions and intentions-to-buy judgments. A respondent is exposed to the new product promotion and allowed to shop in a simulated store or in an actual store in which the product is placed. The respondents then have an opportunity to make a "trial" or first purchase of the product. A trial estimate is based on the percentage of respondents who purchase the product in the laboratory. The repeat-purchase rate is based on the proportion of respondents who make a mail-order repurchase of the new brand, and on the buying-intentions judgments of those who elected not to make a mail-order repurchase. The product of the trial estimate and the repeat-purchase estimate becomes a second estimate of market share. This is also used to analyse the concomitant market-share losses of other brands; if the firm has other brands in the market, such information can be critical.

Predicting Trial Purchase: To predict trial levels of new, frequently purchased consumer products, the ESP (Estimating Sales Potential) model has been developed. Trial levels are predicted on the basis of three variables:

• Product class penetration (PCP) - the percentage of households purchasing at least one item in the product class within one year.
• Promotional expenditures - total consumer-directed promotional expenditures on the product.
• Distribution of the product - the percentage of stores stocking the product (weighted by each store's total sales volume).

The researcher simply estimates the percentage of households using the product class, the total expenditures planned for the new product, the expected distribution level, the advertising (which will create product awareness), and the number of free samples to be given away. The model will then estimate the trial level that will be obtained. Once the model is estimated, it can be applied to other new products. Trial can also be estimated directly using a controlled shopping experience.

Test Marketing: Test marketing allows the researcher to test the impact of the total marketing program, with all its interdependence, in a market context as opposed to the artificial context associated with the concept and product tests that have been discussed. Functions:

• To gain information and experience with the marketing program before making a total commitment to it.
• To predict the program's outcome when it is applied to the total market.

ISO 9000 Tests

ISO 9000 defines a number of test activities:

Design review: A design review is a set of activities whose purpose is to evaluate how well the results of a design will meet all quality requirements. During the course of this review, problems must be identified and necessary actions proposed.

Design verification: Design verification is a process whose purpose is to examine design and development outputs and to use objective evidence to confirm that the outputs meet design and development input requirements.

Design validation: Design validation is a process whose purpose is to examine resulting products and to use objective evidence to confirm that these products meet user needs.
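Returning to the pretest-marketing arithmetic above: the second market-share estimate is simply the trial rate multiplied by the repeat-purchase rate. Scaling that product by expected awareness and distribution is a common simplification (an assumption here, not a formula from the text), and all figures below are illustrative.

```python
def estimated_share(trial_rate, repeat_rate, awareness=1.0, distribution=1.0):
    """Pretest-marketing share estimate: the product of the trial and
    repeat-purchase estimates, discounted by the share of consumers who
    will be aware of the brand and able to find it on the shelf."""
    return trial_rate * repeat_rate * awareness * distribution

# Illustrative figures only: 30% lab trial, 40% mail-order repeat,
# 60% expected awareness, 70% expected distribution.
share = estimated_share(0.30, 0.40, awareness=0.60, distribution=0.70)
print(round(share, 4))  # -> 0.0504
```

Even optimistic lab trial and repeat rates translate into a small national share once awareness and distribution are accounted for, which is why those two inputs matter so much in the ESP-style models described above.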

Types of Test Market:

• The sell-in test markets are cities in which the product is sold just as it would be in a national launch; the product has to gain distribution space.
• The controlled-distribution scanner markets (CDSM) are cities for which distribution is pre-arranged and the purchases of a panel of customers are monitored using scanner data.

Certain parameters have to be looked into while deciding on a sell-in test market:

• Representativeness: Ideally, the city should be fairly representative of the country in terms of characteristics that will affect the test outcome.
• Product flow: It may be desirable to use cities that do not have much "product spillage" outside the area.
• Media isolation and costs: "Spill-in" media from nearby cities can contaminate a test; conversely, using media that "spill out" into nearby cities is wasteful and increases costs. Media cost is another consideration.
• Number: A single city can lead to unreliable results because of the variations across cities of both brand sales and consumer response to marketing programs.
• Data availability: Information from store audits is helpful in evaluating the test, so the selected cities should contain retailers who will cooperate with store audits.
• Implementing and controlling: The test should be controlled in such a manner that the marketing program implemented in the test area reflects the national program.
• Measurement: The basic measure is sales, based on shipments or warehouse withdrawals. Store audit data provide actual sales figures and are not sensitive to inventory fluctuations; they also provide information on distribution, shelf-facings, and in-store promotional activity. Measures such as brand awareness, attitude, trial purchase, and repeat purchase are obtained directly from the consumer, along with information on product usage, attitudes, and demographics. The most useful information obtained from consumers is whether they bought the product at least once, whether they were satisfied with it, and whether they repurchased it or plan to. This information helps evaluate the marketing program and can help interpret the sales data.
• Timing: Normally, a test market should be in existence for one year, so that all important seasonal and cultural factors can be observed and estimated.
• Costs: Quantifiable costs include the development and implementation of the marketing program, the preparation of test products, and the administration of the test and collection of the associated data. The costs and risks that may delay the launch of a new product are more difficult to quantify: competitors may react by deliberately flooding the test areas with free samples or in-store promotions, may retaliate in other ways, or may simply monitor the results themselves; and if a new product launch is delayed, an opportunity to gain a substantial market position might be lost. The test itself may also tend to encourage those involved to enhance the effectiveness of the marketing program: salespersons may be more aggressive, and retailers more cooperative, than they normally would be outside such a test.

Pricing Research

Research may be used to evaluate alternative price approaches for new products before launch, or for proposed changes to products already in the market. Often, the optimal price is the one that results in the greatest positive difference between total revenues and total costs. This implies that the researcher's major tasks are to forecast the costs and the revenues over the relevant range of alternative prices.

Pricing research for the two different strategies differs substantially in terms of the information sought:

• Gabor-Granger method (price skimming strategy): Different prices for a product are presented to respondents, who are then asked if they would buy. A "buy-response" curve of the different prices, with the corresponding number of affirmative purchase intentions, is produced. This technique represents a form of simulation of the point of sale.
• Multibrand-choice method (share penetration strategy): Respondents are shown different sets of brands in the same product category, at different prices, and are asked which they would buy. This allows the respondents to take competitors' brands into account.

The following questions are generally asked with regard to pricing research:

• At what price would you consider the product to be so expensive that you would not consider buying it? (Too expensive)
• At what price would you consider the product to be priced so low that you would feel the quality couldn't be very good? (Too cheap)
• At what price would you consider the product starting to get expensive, so that it is not out of the question, but you would have to give some thought to buying it? (Expensive)
• At what price would you consider the product to be a bargain - a great buy for the money? (Cheap)

Research for Skimming Pricing: This is based on the concept of pricing the product at the point at which profits will be the greatest, until market conditions change or supply costs dictate a price change. The objective is to generate as much profit as possible in the present period.

Research for Penetration Pricing: This is based on the concept that average unit production costs continue to go down as cumulative output increases. The objective is to capture an increasingly larger market share by offering a lower price. Potential profits in the early stages of the product life cycle are sacrificed in the expectation that higher volumes in later periods will generate sufficiently greater profits to result in an overall profit for the product over its life.
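The Gabor-Granger buy-response curve described above has a simple revenue read-out: multiply each candidate price by the share of affirmative purchase intentions at that price and pick the maximum. The prices and intention shares below are hypothetical.

```python
# Hypothetical Gabor-Granger data: at each candidate price, the share of
# respondents giving an affirmative purchase intention. Illustrative only.
buy_response = {49: 0.80, 59: 0.65, 69: 0.45, 79: 0.25, 89: 0.10}

# Expected revenue per respondent at each price point; the revenue-
# maximising price is a common read-out of the buy-response curve.
revenue = {p: p * share for p, share in buy_response.items()}
best_price = max(revenue, key=revenue.get)
print(best_price, round(revenue[best_price], 2))  # -> 49 39.2
```

Note that this read-out ignores costs; as the section says, the researcher's real task is to forecast both costs and revenues over the relevant price range before declaring any price optimal.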

The following pricing pattern is adopted to increase market share:

a) Offer a lower price (even below cost) when entering the market.
b) Hold that price constant until unit costs produce the desired percentage markup.
c) Reduce the price as costs fall, so as to maintain the markup at the same desired percentage.

The traditional concept test can be effectively used in pricing research when the product features are already determined, the level of price awareness is high, and the competitive context is such that evaluating a single product is not too limiting. However, researchers commonly encounter four limitations when using this approach for pricing research:

I. It relies on aggregate-level analysis
II. It is inefficient for evaluating various product specifications
III. It provides no competitive information
IV. It relies on price awareness

Each limitation is discussed below.

First, a concept test relies on aggregate, or at most subgroup-level, analysis. Most researchers would suggest that each respondent evaluate only one concept, but this approach makes respondent heterogeneity difficult to detect and measure.

Second, a concept test is inefficient when evaluating numerous product specifications. Often, a researcher would like to evaluate a small number of specific product variations at the same time price is being evaluated: there might be an interest in the market's willingness to pay for a specific feature, or in how the inclusion or exclusion of a product characteristic influences purchase likelihood. The concept test can be used to evaluate these various specifications, but the total sample size must grow. To illustrate: if we are testing three prices (three cells) and wish 200 observations per cell, we would require 600 respondents; if we have three alternative product variations, each at three prices, we now have nine cells and would require 1800 respondents.

Third, a concept test asks respondents to evaluate how likely they would be to purchase a specific product without any information about other products that might be available in the market, so it provides no competitive information. When shopping, consumers generally have the chance to see a set of competing products and pick one from the set; when presented with such a set, consumers can make trade-offs between features and price to determine their preferred product.

Fourth, the concept test relies on price awareness. The respondent compares the price presented in the concept to an internal reference price to determine whether the price is fair, and this determination is based on the respondent's awareness of the current pricing in the category. In the absence of this comparative task, respondents may have difficulty answering reliably.

Distribution Research

Traditionally, the distribution decisions in marketing strategy involve:

• The number and location of salespersons,
• Retail outlets,
• Warehouses, and
• The size of discount to be offered.

The discount to be offered to the members of the channel of distribution usually is determined by what existing or similar products offer, and also by whether the firm wants to follow a "push" or a "pull" strategy.

Number and Location of Sales Representatives

How many sales representatives should there be in a given territory? Three approaches are used:

• Sales effort approach - used when the product line is first introduced and there is no operating history to provide sales data. This is done by:
i. Estimating the number of sales calls required to sell to, and to service, prospective customers in an area for a year. This will be the sum of the number of visits required per year to each prospect (customer) in the territory.
ii. Estimating the average number of sales calls per representative that can be made in that territory.
iii. Dividing the estimate in step (i) by the estimate in step (ii) to obtain the number of sales representatives required.

• Statistical analysis approach - used after a sales history is available from each territory. An analysis of actual sales versus market potential for each sales representative can be made, to determine if the appropriate number of sales representatives is being used in each territory. The following inferences can be made:
i. Territories where the average market potential per sales representative is lower have too many sales representatives.
ii. Territories where the market potential is greater but which have too few sales representatives need more.

• Field experiment approach - also applicable only after the sales program has begun. Experiments are done with the calls made, in order to see the effect on overall sales.

Warehouse and Retail Location Research

Location decisions address the question: "What costs and delivery time would result if we choose one location over another?" The approximate optimal location, which will minimize the distance to customers, has to be determined. Chain shops with multiple outlets and franchise operations must decide on the physical location of their outlets. Data about the surrounding residential neighbourhood, income levels, and competitive stores help in choosing the optimal location.

The field experiment approach can be conducted in two ways:
i. Making more frequent calls on some prospects and less frequent calls on others, keeping the number of sales representatives unchanged.
ii. Increasing the number of representatives in some territories and decreasing them in others, to determine the effect on sales.

Promotion Research

Here the focus is on the decisions that are commonly made when designing a promotion strategy. The decisions for the promotion part of a market strategy can be divided into:

• Advertising decisions, which have long-term effects.
• Sales promotion decisions, which affect the company in the short term.

Companies spend more time and resources on advertising research than on sales promotion research because of the greater risk and uncertainty in advertising.

Advertising Research: Advertising decisions are costly and risky, and marketing research helps to determine how effective an advertisement will be. The effectiveness of an advertisement depends upon the brand involved and its advertising objectives. Most often, advertising research decisions are about advertising copy; research on media decisions is treated separately from advertising research. Advertising research involves generating information for making decisions in the:

• Awareness stage
• Recognition stage
• Preference stage
• Purchasing stage

Four categories are used in advertising research:

• Advertisement recognition
• Recall of the commercial and its contents
• The measure of commercial persuasion
• The impact on purchase behavior

Advertising Recognition: Respondents are tested on whether they can recognize the advertisement as one they have seen before.

UNIT III: DATA ANALYSIS

LESSON 16: DATA CODING AND ANALYSIS

Students, today we shall take up the most crucial step in the research process: data coding and data analysis. This stage comes after the collection of the desired information. The step seems very simple, although it is not so. Coding converts raw data into a meaningful pattern for statistical analysis; once the information is tabulated, it is easy to perform various statistical tests of its validity.

The problem most decision makers must resolve is how to deal with the uncertainty that is inherent in almost all aspects of their jobs. Raw data provide little, if any, information to decision makers, so they need a means of converting the raw data into useful information. Gathered information should be presented in such a manner that even a layman understands the what, why, when and how of the information.

A series of options needs to be considered when you enter the information you have gathered. You will first have to decide on a file format and then devise a code for analysis.

A. Decision on File Format
This comprises decisions regarding:
• The way the data will be organized in a file
• The order of the information collected
• How each subject is referenced
• How individual records are constructed
• The history of the 80-column format
• The application to statistics programs

B. Devise a Code for Analysis
The main points to remember while devising the code for analysis are:
• It is a set of rules that translates answers into discrete values
• Codes may be alphabetical or numerical, depending on the measurement scale
• Preserve the level of measurement for each item
• General considerations (closed questions): the response format
• Proper importance should be given to verification
• The code should help in facilitating data interpretation
First of all, you should try to make the coding translation simple:
• Coding should be done minimizing effort and the risk of coding errors
• At the item level, leave numbers as numbers (numbers can be nominal)
• At the test level, code questions in order of appearance
• Be consistent in assigning values to similar responses
• Identify the question groups within the test
• Perform reverse coding and unfold complex response formats

C. How Missing Data are Treated
You should distinguish three cases:
i. Non-ascertained information: information not obtained because of interviewer or respondent performance, whether through
• failure to ask the question,
• failure to obtain an appropriate response, or
• refusal to answer the question (coded separately).
ii. Inapplicable information: information that does not apply to a particular respondent.
iii. Unknown information: for example, a respondent's claim regarding awareness (how to treat the "Don't know" option).

D. Data Entry
Data entry is the process of taking completed questionnaires/surveys and putting them into a form that can readily be analyzed. You should fix the number of translation steps between a subject's response and the readable data file:
• Computer-assisted techniques: 1
• Digital answer format (Scantron): 3
• Entry by hand: 4
The number of steps impacts your ability to check the quality of data entry (accuracy, reliability).

E. Clean the Data File
Examine each data file to ensure each record is complete and in order:
• Remove non-legal codes
• Replace them with information from the original questionnaire
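The coding conventions above (discrete codes, separate codes for each kind of missing data) can be sketched in a few lines. This is a minimal illustration using a hypothetical two-question survey; the question names and the specific code values are assumptions, not taken from the text.

```python
# Minimal sketch of a coding scheme for a hypothetical survey.
# Distinct numeric codes are reserved for the missing-data cases in C above.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "aware_of_brand": {"yes": 1, "no": 0, "don't know": 8},  # unknown information
}
MISSING_NOT_ASCERTAINED = 9   # question not asked / answer not recorded
MISSING_INAPPLICABLE = 7      # question does not apply to this respondent

def code_answer(question, raw):
    """Translate one raw answer into its discrete code via the codebook.
    Blank or unrecognized answers are coded as 'not ascertained'."""
    if raw is None or raw.strip() == "":
        return MISSING_NOT_ASCERTAINED
    return CODEBOOK[question].get(raw.strip().lower(), MISSING_NOT_ASCERTAINED)

# One respondent's record: answered "Male", left the awareness question blank.
record = [code_answer("gender", "Male"), code_answer("aware_of_brand", "")]
```

Applying the same rules to every questionnaire keeps the coding consistent, which is the point of the "set of rules" requirement in section B.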

Now we will concentrate on some of the frequently used methods of presenting and organizing data.

Frequency Distribution
The easiest method of organizing data is a frequency distribution. A class is a group (category) of interest. The steps in constructing a frequency distribution are:
1. Specify the number of class intervals. No totally accepted rule tells us how many intervals are to be used; between 5 and 15 class intervals are generally recommended. Note that the classes must be both mutually exclusive and all-inclusive. Mutually exclusive means that classes must be selected such that an item cannot fall into two classes; all-inclusive classes are classes that together contain all the data.
2. When all intervals are to be the same width, the following rule may be used to find the required class interval width:
W = (L - S) / K
where W = class width, L = the largest data value, S = the smallest data value, and K = the number of classes.

Example: Suppose the ages of a sample of 10 students span the range from 18.1 to 25.3 years. We select K = 4, so W = (25.3 - 18.1)/4 = 1.8, which is rounded up to 2. The frequency table is as follows:

Class Interval    Class Frequency    Relative Frequency
18-U-20           3                  30%
20-U-22           2                  20%
22-U-24           4                  40%
24-U-26           1                  10%

Note that the sum of all the relative frequencies must always be equal to 1.00, or 100%. Relative frequency may be determined for both quantitative and qualitative data and is a convenient basis for the comparison of similar groups of different size. Frequency distribution is also the basis for probability theory, as we will see in a later lecture.

What a Frequency Distribution Tells Us
1. It shows how the observations cluster around a central value.
2. It shows the degree of difference between observations.
In the above example, we know that no student is younger than 18, and the most common age is between 22 and 24. The students in the sample are generally older than usual, since from general information we know that students who enter college right after high school graduate at about age 22. It is possible that the population is made up of night students who work on their degrees on a part-time basis while holding full-time jobs. This descriptive analysis provides us with an image of the student sample which is not available from the raw data.

Stated and True Class Limits
True classes are those classes such that the upper true (or real) limit of a class is the same as the lower true limit of the next class. For example, for monetary data rounded to the nearest dollar, the stated and true class limits are:

Stated Limits    True Limits
$600 - $799      $599.50 up to but not including $799.50
$800 - $999      $799.50 up to but not including $999.50

Because the data were rounded to the nearest dollar, the $600-$799 class actually includes all data from $599.50 up to but not including $799.50: any amount over $799 but under $799.50 was rounded down to $799 and included in the first class, while $799.50 was rounded up to $800 and tallied in the second class.

Cumulative Frequency Distribution
When the observations are numerical, cumulative frequency is used. It shows the total number of observations which lie above or below certain key values. The cumulative frequency for a class = the frequency of that class interval + the frequencies of all preceding intervals. Thus the cumulative frequencies for the above problem are 3, 5, 9 and 10.

Presenting Data
Graphs, curves and charts are used to present data.
• Histograms are used to graph absolute, relative and cumulative frequencies.
• Bar charts are used to graph qualitative data. The bars do not touch, indicating that the attributes are qualitative categories and the variables are discrete, not continuous.
• A Pareto chart is a special case of the bar chart, often used in quality control. Each bar in the chart shows the degree of quality problem for each variable measured; the purpose of this chart is to show the key causes of unacceptable quality.
• A pie chart is often used in newspapers and magazines to depict budgets and other economic information. A complete circle (the pie) represents the total number of measurements, and the size of a slice is proportional to the relative frequency of a particular category. Since a complete circle is equal to 360 degrees, if the relative frequency for a category is 0.40 the slice assigned to that category is 40% of 360, or (0.40)(360) = 144 degrees.
• A time series graph is a graph in which the X axis shows time periods and the Y axis shows the values related to these time periods.
• An ogive is used to graph cumulative frequency. It is constructed by placing a point corresponding to the upper end of each class at a height equal to the cumulative frequency of the class; these points are then connected. An ogive also shows the relative cumulative frequency distribution on the right-hand axis. A less-than ogive shows how many items in the distribution have a value less than the upper limit of each class; a more-than ogive shows how many items have a value greater than or equal to the lower limit of each class. Correspondingly, a less-than cumulative frequency polygon is constructed by using the upper true limits and the cumulative frequencies, while a more-than cumulative frequency polygon is constructed by using the lower true limits and the cumulative frequencies.
• Stem-and-leaf plots offer another method of organizing raw data into groups. These plots are similar to the histogram except that the actual data are displayed instead of bars. The stem contains the higher-valued digits and the leaf contains the lower-valued digits; for example, the number 78 can be represented by a stem of 7 and a leaf of 8. The stem-and-leaf is developed by first determining the stem and then adding the leaves. For example, numbers such as 34 and 36, 52, 54 and 55, and 63 and 68 can be grouped under the stems 3, 5 and 6, with their final digits as the leaves.
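The frequency-distribution steps above can be sketched in a few lines. The ten ages used here are hypothetical illustrative values spanning the 18.1-25.3 range of the text's example; the class width rule W = (L - S)/K and the 18-U-20 through 24-U-26 classes follow the text.

```python
import math

# Hypothetical ages consistent with the text's range (18.1 to 25.3 years)
ages = [18.1, 18.5, 19.4, 20.9, 21.3, 22.0, 22.5, 23.1, 23.9, 25.3]

K = 4                                    # chosen number of classes
W = (max(ages) - min(ages)) / K          # W = (L - S) / K = 1.8
width = math.ceil(W)                     # rounded up to 2

# Build classes 18-U-20, 20-U-22, ... (lower limit inclusive, upper exclusive,
# so the classes are mutually exclusive and all-inclusive)
freq = {}
for i in range(K):
    lo, hi = 18 + i * width, 18 + (i + 1) * width
    freq[f"{lo}-U-{hi}"] = sum(lo <= a < hi for a in ages)

rel = {c: f / len(ages) for c, f in freq.items()}   # relative frequencies
cum, running = {}, 0
for c, f in freq.items():                           # cumulative frequencies
    running += f
    cum[c] = running
```

With these values the code reproduces the table in the text: frequencies 3, 2, 4, 1 and cumulative frequencies 3, 5, 9, 10.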

Steps to Construct a Stem-and-Leaf Plot
1. Define the stem and leaf that you will use. Choose the units for the stem so that the number of stems in the display is between 5 and 20.
2. Write the stems in a column, arranged with the smallest stem at the top and the largest stem at the bottom. Include all stems in the range of the data, even if there are some stems with no corresponding leaves.
3. If the leaves consist of more than one digit, drop the digits after the first; omit the decimals. You may round the numbers to be more precise, but this is not necessary for the graphical description to be useful.
4. Record the leaf for each measurement in the row corresponding to its stem, and include a key that defines the units of the leaf.
See the following figures.
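The four steps above can be sketched as follows, using the small two-digit data set mentioned earlier (34, 36, 52, 54, 55, 63, 68), with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

data = [34, 36, 52, 54, 55, 63, 68]

plot = defaultdict(list)
for x in sorted(data):
    stem, leaf = divmod(x, 10)      # step 1: tens digit = stem, units = leaf
    plot[stem].append(leaf)         # step 4: record leaf in its stem's row

# Steps 2-3: print every stem in the range, smallest at the top,
# even stems with no corresponding leaves (here, stem 4)
for stem in range(min(plot), max(plot) + 1):
    print(stem, "|", " ".join(str(l) for l in plot.get(stem, [])))
# Key: leaf unit = 1, so the row "5 | 2 4 5" represents 52, 54 and 55
```

Unlike a histogram bar, each row still shows the individual data values, which is the point of the plot.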

LESSON 17: PRINCIPLES OF STATISTICAL INFERENCE AND CONFIDENCE INTERVALS

In this chapter we shall learn to apply techniques of statistical inference to business situations. Our focus will be more on understanding key concepts intuitively and less on formulas and calculations, which can be done easily on computers. I will first introduce the basic ideas underlying statistical inference and hypothesis testing.

By the end of this unit you should be able to:
• Understand the nature of statistical inference
• Understand the types of statistical inference
• Understand the theory behind statistical inference
• Apply sampling theory concepts to confidence intervals and estimation

The Nature of Statistical Inference
What is statistical inference? We have seen that all managerial and business situations involve decision making with incomplete information. A business manager in a typical managerial situation needs to determine whether results based on samples can be generalized to a population: when a particular finding emerges from data analysis, the manager asks whether the empirical finding represents the true picture or has occurred as a result of a sampling accident. Statistical inference is the process by which we generalize from sample results to the population from which the sample has been drawn. To solve problems such as this we have to learn how to use the characteristics of samples to test an assumption about the population from which the sample comes. Different management situations require different statistical techniques to carry out tests regarding the applicability of sample statistics to a population.

Where Do We Use Statistical Inference?
Let us think of a typical managerial situation. Imagine you are a purchase manager. Your basic problem is to ensure that a consignment of aluminium sheets supplied to you corresponds to the required specification of 0.04 inch thickness. How do you go about ensuring this? One way would be to accept blindly whatever your supplier claims. Another option would be to audit each and every item; clearly this would be both very time consuming and expensive, and would result in an unacceptably low level of productivity. A third option is for the manager to choose a random sample of 100 aluminium sheets, measure their thickness, and on the basis of this data decide whether to accept or reject the consignment of 10,000 sheets. Suppose he finds that the sheets in the sample have an average thickness of 0.048 inches, while on the basis of past experience with the supplier he believes that the sheets come from a population with a standard deviation of 0.004 inches.

The issues facing the manager in this example are:
• Could a sample of 100 aluminium sheets with an average thickness of 0.048 inches have come from a population with an average thickness of 0.04 inches?
• Does the sample estimate of thickness differ from the assumed population value due to sampling error, or because our fundamental assumption about the mean thickness of the underlying population is not correct?
• Suppose he believes it to be the former and accepts the consignment: what risk does he run that the consignment is flawed and does not conform to the quality standard of 0.04 inches?

This is just one example of a very typical managerial situation where the principles of statistical inference can be put to use to solve the manager's dilemma. In effect, statistical inference is the process by which we extend knowledge obtained from a random sample, which is only a small part of the population, to the whole population.

Types of Statistical Inference
Broadly, statistical inference falls into two major categories: estimation and hypothesis testing. Both are actually two sides of the same coin and can be regarded as representing different aspects of the same technique. Below I briefly explain each of them.

1. Estimation
Estimation is concerned with how we use sample statistics to estimate population parameters. The sample mean, for example, can be used as an estimate of the population mean; similarly, the percentage occurrence of an attribute or event in a sample can be used to estimate the population proportion. All managers make quick estimates based on incomplete information, gut feel and intuition, and it is not necessary that an estimate be based on statistical data. Thus an estimate of sales for the next quarter can be based on gut feel or on an analysis of past sales data for the quarter; both represent estimates. The difference between an estimate based on intuition and one based on a random sample is that for the latter the principles of probability allow us to calculate the percentage of error or variation in the estimate that is attributable to sampling variation.

To explain the concept a little more clearly, we can look at a few examples of estimation:
• University departments make estimates of next year's enrollments on the basis of last year's enrollments in the same courses.
• Credit managers make estimates about whether a purchaser will pay his bills on the basis of the past behaviour of customers with similar characteristics, or of his past repayment record.
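The purchase manager's aluminium-sheet problem described earlier can be checked numerically by asking how many standard errors the sample mean lies from the hypothesized population mean. A sketch, using the figures from the text:

```python
import math

# Figures from the text's example: hypothesized mean thickness 0.04 in,
# population standard deviation 0.004 in, sample of n = 100 sheets,
# observed sample mean 0.048 in.
mu, sigma, n, x_bar = 0.04, 0.004, 100, 0.048

se = sigma / math.sqrt(n)   # standard error of the mean: 0.004/10 = 0.0004
z = (x_bar - mu) / se       # how many standard errors the sample mean lies
                            # from the hypothesized population mean (about 20)

# A z of about 20 is far outside +/-3 standard errors, so it is extremely
# unlikely that this sample came from a population with mean 0.04 inches.
print(round(z, 2))
```

This is the quantitative content of the manager's dilemma: the decision reduces to judging whether a gap of this many standard errors can plausibly be sampling error.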

2. Hypothesis Testing
Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population parameter. We then collect sample data and calculate sample statistics, such as the mean and standard deviation, to decide how likely it is that our hypothesized population parameter is correct. Essentially the process involves judging whether the difference between a sample value and the assumed population value is significant or not. The smaller the difference, the greater the chance that our hypothesized value for the mean is correct.

If we find a difference between two samples, we would like to know whether it is a "real" difference (i.e., present in the population) or just a "chance" difference (i.e., one that could be the result of random sampling error). Some examples of real-world situations where we might want to test hypotheses:
• A random sample of 100 South Indian families finds that they consume more of a particular brand of Assam tea per family than a random sample of 100 North Indian families. It could be that the observed difference was caused by a sampling accident and that there is actually no difference between the two populations. However, if the results are not caused by pure sampling fluctuation, then the firm has a case for further marketing action based on the sample finding.
• Colgate Palmolive have decided that a new TV ad campaign can only be justified if more than 55% of viewers see the ads. The company requests a marketing research agency to carry out a survey to assess viewership, and the agency comes back with an ad penetration of 50% for a random sample of 1,000 viewers. It is now the company's problem to assess whether the sample viewing proportion is representative of the hypothesized level of viewership that the company desires, i.e. 55%. Can the difference between the two proportions be attributed to sampling error, or is the ad's true viewership actually lower?

The Theory Behind Statistical Inference
We now look at the underlying theoretical basis of statistical inference, which is the theory of sampling distributions. The basis of inference remains the same irrespective of whether the managerial objective is to obtain a point or interval estimate of a population parameter or to test whether a particular hypothesis is supported by sample data. Some of these concepts have been dealt with in more detail in the earlier chapter on sampling; here we briefly review them.

What is a sample? A sample is a representative subset of the underlying population. For each sample taken from a population we can calculate various sample statistics, such as the mean and variance, and we can take many such samples and calculate the mean and standard deviation of each. Given the existence of sampling variation, there is going to be some variability among these different estimates.

This can best be explained with the help of an example. Suppose there is a store which sells CDs, with a regular customer base. A random sample of 100 customers is taken, and we find that the sample mean age of customers is 42 years, with a standard deviation of 5 years. However, this is only one of many possible samples that could have been taken: a second, different sample might have yielded a mean of 45 years and a standard deviation of 6 years. To change a sample we need only change one of the customers. If we take repeated samples such that all possible samples are taken, we obtain a sampling distribution of means.

What Does this Distribution Look Like?
Logically we can conceive that there is only one sample which will contain the 100 youngest possible customers, and its mean will be the lowest sample mean. A few other samples will contain the youngest 99 customers plus one other, and these samples will have means which are slightly higher than the lowest mean; a somewhat larger number of samples will contain the youngest 98 customers, and so on. The majority of the samples will have a cross-section of all age groups, and therefore there will be a clustering of sample means around what is likely to be the true population mean. We would thus expect samples taken from a population to generate similar, if not identical, sample means. The distribution of sample means will look like the normal distribution shown in Figure 1.

Figure 1: Sampling distribution of sample mean values

This result follows from the Central Limit Theorem: if we take random samples of size n from a population, the distribution of sample means will approach that of a normal probability distribution, and this approximation is closer the larger n is. We do not actually know what form our population distribution takes; it could be normal or it could be skewed. However, it does not matter, as the sampling distribution will approximate a normal distribution as long as sufficiently large samples are taken.

Activities
1. What is statistical inference? What are the different types of inference?
2. Why do decision makers measure samples rather than entire populations? What are the disadvantages of sampling?
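The Central Limit Theorem claim above can be illustrated by simulation: draw many samples from a deliberately skewed, non-normal population and look at the distribution of their means. A sketch (the population and sample sizes here are arbitrary choices, not from the text):

```python
import random
import statistics

# A skewed (exponential) population: clearly not normal, mean about 1
random.seed(1)
population = [random.expovariate(1.0) for _ in range(100_000)]

# Draw 2,000 samples of size n = 50 and record each sample mean
n = 50
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

# The sample means cluster tightly around the population mean, and their
# spread is roughly sigma / sqrt(n), even though the population is skewed
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Plotting a histogram of `sample_means` would reproduce the bell shape of Figure 1 despite the skewed parent population.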

Logically we can see that the mean of the sampling distribution should equal the mean of the population, µ. The standard deviation of the sampling distribution is given by σ/√n, where σ is the population standard deviation and n is the sample size. Thus the normal sampling distribution of the mean can be summarized by its two statistics:
• Mean: µ
• Standard deviation: σ/√n

However, we should be clear that we are talking about three different distributions:

Distribution                        Mean    Standard deviation
Sample                              x̄       s
Population                          µ       σ
Sampling distribution of the mean   µ       σ/√n

The population distribution and the sampling distribution of the mean are illustrated in Figure 2. As can be seen, the sampling distribution of the sample mean is far more concentrated than the population distribution; however, both distributions have the same mean µ.

The Normal Distribution
We now look briefly at some of the key characteristics of the normal distribution. Statistical tables provide the areas under the normal curve that are contained within any number of standard deviations (plus/minus) from the mean; the distance from the mean is defined in terms of the number of standard deviations away from it. The key results are summarized below:
• Approximately 68% of all values in a normally distributed population lie within ±1 standard deviation of the mean; approximately 16% of the area lies outside this range on either side of the mean (Figure 3).
• Approximately 95.5% of all values in a normally distributed population lie within ±2 standard deviations of the mean; approximately 2.25% of the area on either side of the mean lies outside this range (Figure 4).
• Approximately 99.7% of all values in a normally distributed population lie within ±3 standard deviations of the mean; only 0.15% of the area on either side of the mean lies outside this range (Figure 5).

Application of Sampling Theory Concepts to Confidence Intervals
Once we have calculated our sample mean, we need to know where it lies in the sampling distribution of the mean in relation to the true mean of that distribution, i.e. the population mean. It might be higher than the population mean, it might be lower, or it might be identical with it. While we cannot know for certain where the sample mean lies in relation to the population mean, we can use probability to assess its likely position.

We do this by constructing the standard normal distribution. From our earlier classes we know that, irrespective of the values of µ and σ, the total area under the normal curve is 1.00, and specific portions of the curve lie between plus/minus any given number of standard deviations from the mean. Therefore all intervals containing the same number of standard deviations from the mean contain the same proportion of the total area under the curve, and hence we can make use of a single standard normal distribution. All normal distributions with mean µ and standard deviation σ can be transformed into a standard normal distribution with mean 0 and standard deviation 1. This transformation is done using the z statistic:
z = (x - µ) / σ
The standardized variable z actually represents a transformation, or change of measurement scale, on the horizontal axis; its distribution is the standard normal distribution with mean 0 and standard deviation 1. When we work with the sampling distribution of the mean, the relevant standard deviation is the standard error, so for a sample mean
z = (x̄ - µ) / (σ/√n)
With the normal table we can then determine the probability that the sample mean lies within any given distance of the population mean. This follows from the result that, irrespective of the particular normal curve, the area under the curve within one, two, three, or any given number of standard deviations of the mean is the same across all curves.

Using the Standard Normal Probability Distribution
The raw scale and the standard normal transformation are shown in Figures 6 and 7: in the raw scale µ = 50, and in the standard normal scale this value is transformed to 0. The standard normal probability table is organized in terms of z values. It gives the z values for only half the area under the curve; because the distribution is symmetric, values which hold for one half of the distribution are true for the other.

So far we have tried to understand the theory of sampling underlying confidence intervals. We now turn to defining what exactly a confidence interval is.
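The 68% / 95.5% / 99.7% areas quoted above can be verified directly, since for a normal distribution the probability of falling within k standard deviations of the mean equals erf(k/√2). A quick sketch:

```python
from math import erf, sqrt

def area_within(k):
    """Probability that a normal variable lies within k standard
    deviations of its mean: erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

print(round(area_within(1), 3))   # 0.683 -> approximately 68%
print(round(area_within(2), 3))   # 0.954 -> approximately 95.5%
print(round(area_within(3), 3))   # 0.997 -> approximately 99.7%
```

In practice these values are exactly what the statistical tables mentioned above tabulate, one z value at a time.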

LESSON 18: STATISTICAL INFERENCE AND SAMPLING DISTRIBUTIONS

In the last lesson we focused on understanding the theoretical foundations of statistical inference and sampling distributions, and we examined the concept of confidence intervals. In this lesson we will look at confidence interval estimation in more detail and attempt to understand it intuitively, as well as its applications to management situations. These concepts are not just used in statistics but have relevance to our day-to-day life, where we frequently set up confidence intervals and try to establish the confidence level associated with them. Managers are always making estimates, and confidence limits help define the limits of those estimates; the concept applies, as we shall see, even when we make a non-statistical estimate. We shall also spend some time practicing applications of the principles we have learnt.

By the end of this chapter you should be able to:
• Apply the concepts of interval estimates and confidence levels to business situations
• Interpret a confidence interval
• Determine the factors influencing the choice of technique
• Examine issues relating to the determination of the level of significance

Understanding Confidence Levels and Confidence Intervals

What is a Confidence Level?
In statistics, the probability associated with an interval estimate is called the confidence level. This probability indicates how confident we are that the interval estimate will include the population parameter; a higher probability means more confidence. The most commonly used confidence levels are 90%, 95% and 99%, but these are not the only ones: we can, for example, use a 95.5% or an 80% confidence level.

What is a Confidence Interval?
This is the range of the estimate we are making. For example, we can say that the mean income of the population will lie between Rs 8,000 and Rs 24,000. Confidence intervals can also be expressed in terms of standard errors rather than numerical values. A confidence interval around a sample mean is defined as
x̄ ± z·SE
where SE is the standard error of the mean. The confidence interval thus provides an interval estimate around x̄ which essentially represents the extent of sampling error. For example, at the 90% level:
Mean + 1.64 SE = upper limit of the 90% confidence interval
Mean - 1.64 SE = lower limit of the 90% confidence interval

In the earlier sections we saw that the same proportion of the area under the normal curve lies within plus or minus any given number of standard deviations from the mean, and that these proportions can be related to specific probabilities. We can use this property to make a statement about the probability that a particular interval, defined in terms of the number of standard errors from the mean, contains the population mean.

What is the Relationship Between Confidence Level and Confidence Interval?
Somehow we have a perception that a high confidence level such as 99% implies a high degree of accuracy. In fact it can mean the very opposite, since it produces a larger confidence interval: larger confidence intervals mean that estimates are less precise, and there is an element of fuzziness about them. An example in the next section best illustrates the relation between the two.

Examples of Application of Confidence Interval Estimation
1. From a population known to have a standard deviation of 1.4, a sample of 60 is taken. The mean is found to be 6.2.
i. Find the SE of the mean.
ii. Establish an interval estimate around the mean, using one standard error of the mean.
Ans: i. SE = 1.4/√60 = 0.181. ii. 6.2 ± 0.181 = (6.019, 6.381).
2. From a population with a variance of 185, a sample of 64 individuals yields 217 as an estimate of the mean.
i. Find the SE of the mean.
ii. Establish an interval estimate that should include the population mean 68.3% of the time.
Ans: i. SE = σ/√n = 13.60/√64 = 1.70. ii. x̄ ± SE = 217 ± 1.70 = (215.30, 218.70).
3. The University of Delhi is conducting a study on the average weight of the bricks that make up the university's paths. Workers were sent to dig up and weigh a sample of 421 bricks; the sample mean weight was 14.2 kg. It is a well-known fact that the standard deviation of brick weight is 0.8 kg.
i. Find the SE of the mean.
ii. What is the interval around the sample mean that will include the population mean 95.5% of the time?
Ans: i. SE = 0.8/√421 = 0.0390 kg. ii. 14.2 ± 2(0.0390) = (14.122 kg, 14.278 kg).
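Worked example 3 above (the University of Delhi bricks) can be reproduced in a few lines; this sketch simply restates the arithmetic of the SE and the ±2 SE (95.5%) interval.

```python
import math

# Figures from worked example 3: n = 421 bricks, sample mean 14.2 kg,
# known population standard deviation 0.8 kg
n, x_bar, sigma = 421, 14.2, 0.8

se = sigma / math.sqrt(n)                       # standard error of the mean
lower, upper = x_bar - 2 * se, x_bar + 2 * se   # 95.5% level -> +/- 2 SE

print(round(se, 4))                        # about 0.039 kg
print(round(lower, 3), round(upper, 3))    # 14.122 14.278
```

Swapping in the figures from examples 1 and 2 (with 1 SE instead of 2) reproduces their answers the same way.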

The normal tables given in the appendix convert any desired level of confidence into standard errors from the mean of a standard normal distribution. Approximately 68.3% of all sample means lie within ±1 standard error of the population mean, and approximately 95.5% lie within ±2 standard errors. Equivalently, there is a 95.5% probability that µ lies within ±2 standard errors of a given sample mean. This is indicated by the shaded region in a diagram of the normal distribution (see Fig. 1). If the population standard deviation is not known, we can use the sample standard deviation (s) to estimate it.

Examples

1. A company estimates, on the basis of a sample of 100, that the life of the car batteries it manufactures is 36 months. With a standard error of 0.707 months, we can make the following statements:
• We are 68.3% confident that the true mean life lies in the interval 35.293 to 36.707 months (36 ± 1 standard error).
• We are 95.5% confident that the true mean life lies in the interval 34.586 to 37.414 months (36 ± 2 standard errors).
• We are 99.7% confident that the true mean life lies in the interval 33.879 to 38.121 months (36 ± 3 standard errors).

Interpreting a Confidence Interval

How do we interpret a confidence interval? The process of sampling is such that we use one sample to estimate a population parameter. So when we say we are 95.5% confident that the mean life of a battery lies within 34.586 and 37.414 months, what exactly do we mean? It does not mean that there is a 95.5% probability that the mean life of all batteries falls within the interval established from this one sample. Instead it means that if we select many random samples of the same size and calculate a confidence interval around the mean of each of these samples, then in about 95.5% of these cases the population mean will lie within that interval.

Figure 1 illustrates this graphically: it shows the ±2 standard error interval for each of five different sample means. As we can see, only the interval around x̄4 does not contain the population mean. In other words, we expect that for approximately 95.5% of samples the population mean would be located in an interval ±2 standard errors from the sample mean.

[Figure 1: ±2 standard error intervals around the means of five samples; only the interval around x̄4 fails to contain the population mean µ.]

2. A customer is interested in getting his washing machine repaired. He discusses the possible time frame for getting his repair done with the maintenance manager of the service center. As the customer demands a tighter and tighter time frame (a narrower interval), the manager hedges by agreeing to a lower and lower level of confidence:

Customer's demand ("Will my machine be repaired...") — Manager's hedge
• ...within a year? — I am absolutely certain (99%)
• ...within a month? — I am almost positive (95%)
• ...within a week? — I am pretty sure of it (80%)
• ...by tomorrow? — I am not certain that I can manage it (40%)
• ...immediately? — There is little chance of that (1%)

As the customer sets tighter and tighter intervals, the manager sets progressively lower confidence levels. When the confidence interval is very narrow ("Will I get it immediately?"), the estimate is associated with so low a level of confidence (1%) as to be useless. Ultimately there are no free lunches when it comes to dealing with confidence intervals and confidence levels: the manager has to weigh the benefit of the high confidence or certainty associated with a decision against the cost of having to accept a lower level of accuracy (a wider interval). The confidence level for an estimate is ultimately based on the expected results if the sampling process were repeated many times.

Solving Problems Based on Confidence Interval Estimation

It is easy to solve problems on confidence intervals once we understand that a confidence level is defined by how many standard errors there are on either side of the mean. Since we have information regarding one standard error, we can calculate the end points or limits as the sample mean plus or minus the SE multiplied by the appropriate z statistic. For example, given the following confidence levels, the lower and upper limits expressed in terms of the sample mean x̄ and SE are:
• 54%: x̄ ± 0.74 SE
• 75%: x̄ ± 1.15 SE
• 95%: x̄ ± 1.96 SE
• 98%: x̄ ± 2.33 SE
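The battery example above can be reproduced with Python's standard library; NormalDist converts a z multiplier into the confidence level it covers (the 0.707-month standard error is taken from the example; note the tables round 95.45% up to 95.5%):

```python
from statistics import NormalDist

xbar, se = 36.0, 0.707   # battery example: sample mean 36 months, SE 0.707 months

def level_for_z(z):
    """Confidence level covered by the interval xbar ± z standard errors."""
    nd = NormalDist()
    return nd.cdf(z) - nd.cdf(-z)

for z in (1, 2, 3):
    lo, hi = xbar - z * se, xbar + z * se
    print(f"±{z} SE: {lo:.3f} to {hi:.3f} months ({level_for_z(z):.1%})")
# ±1 SE: 35.293 to 36.707 months (68.3%)
# ±2 SE: 34.586 to 37.414 months (95.4%)
# ±3 SE: 33.879 to 38.121 months (99.7%)
```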

Exercises

1. Define the confidence level for an interval estimate.

2. In what way may an estimate be less meaningful because of
   a) a high confidence level?
   b) a narrow confidence interval?

3. For a population with a known variance of 185, a sample of 64 individuals leads to 217 as an estimate of the mean.
   i. Find the standard error of the mean.
   ii. Establish an interval estimate that should include the population mean 68.3% of the time.
   Solution: σ = √185 = 13.6, so SE = σ/√n = 13.6/8 = 1.7. The 68.3% interval (z = 1) is 217 ± 1.7.

4. Suppose you wish to use a confidence level of 80%. Express the lower and upper limits of the confidence interval in terms of x̄ and SE. Do the same for confidence levels of a) 60 percent, b) 70 percent, c) 92 percent and d) 96 percent.

5. Suppose a sample of 50 is taken from a population with known standard deviation.
   a) Establish an interval estimate for the population mean that is 95.5 percent certain to include the true population mean.
   b) Why might estimate (a) be preferred to estimate (b)? Why might (b) be preferred to (a)?

6. Ena is a frugal undergraduate at a university. She is interested in purchasing a used stereo. She randomly selects 125 newspaper ads and finds that the average price of a used stereo in the sample is Rs 3250. She knows the standard deviation of used stereo prices is Rs 615.
   i. Calculate the estimated standard error of the mean.
   ii. Establish an interval estimate of stereo prices so that Ena can be 95.5% certain that the population mean price lies within this interval.
   iii. Establish an interval estimate so that Ena can be 68.3% certain that the population mean price lies within this interval.
   Solution: We first write down the data: n = 125, x̄ = 3250, σ = 615.
   i. SE = σ/√n = 615/√125 = 55.0
   ii. 95.5% interval (z = 2): 3250 ± 110.0, i.e. Rs 3140 to Rs 3360
   iii. 68.3% interval (z = 1): 3250 ± 55.0, i.e. Rs 3195 to Rs 3305

7. Jon Jackobsen, an overzealous graduate student, has just completed a first draft of his 700-page dissertation. Jon has typed his paper himself and is interested in knowing the average number of typographical errors per page, but does not want to read the whole paper. He selected 40 pages at random to read and found that the average number of typos per page was 4.3 and the sample standard deviation was 1.2. Construct for Jon a 90 percent confidence interval for the true average number of typos per page in his paper.

8. Steve Kipper runs a reputed barber's shop. As each customer enters, Steve yells out the number of minutes that the customer can expect to wait before getting his cut. The only statistician in town, frustrated with what he feels are highly inaccurate point estimates of Steve's, determines that the actual waiting time for any customer is normally distributed, with mean equal to Steve's estimate in minutes and standard deviation equal to 5 minutes divided by the customer's position in the waiting line. Help Steve's customers develop 95 percent probability intervals for the following situations:
   a) The customer is second in line and Steve's estimate is 25 minutes.
   b) The customer is third in line and Steve's estimate is 15 minutes.
   c) The customer is fifth in line and Steve's estimate is 38 minutes.
   d) The customer is first in line and Steve's estimate is 20 minutes.
   e) How are these intervals different from confidence intervals?
   Solution: To solve this problem we reduce it to sampling concepts. The mean is Steve's estimate of the waiting time, and the standard deviation of the estimate is 5 minutes divided by the customer's position in the queue.
   a) Standard deviation = 5/2 = 2.5, so the 95% interval is 25 ± 1.96 × 2.5 = 25 ± 4.9 minutes
   b) 15 ± 3.267 minutes
   c) 38 ± 1.96 minutes
   d) 20 ± 9.8 minutes
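The standard-error and confidence-interval arithmetic in the exercises above can be cross-checked with a short Python sketch (standard library only; the data values are taken from Ena's and Jon's exercises):

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, z):
    """Interval mean ± z standard errors, where SE = sd / sqrt(n)."""
    se = sd / sqrt(n)
    return mean - z * se, mean + z * se

# Ena's stereo prices: n=125, xbar=3250, sigma=615; z=2 gives the 95.5% interval.
lo, hi = confidence_interval(3250, 615, 125, z=2.0)
print(round(lo), round(hi))        # 3140 and 3360 (rupees)

# Jon's typos: n=40, xbar=4.3, s=1.2; z for a two-sided 90% level.
z90 = NormalDist().inv_cdf(0.95)   # upper 5% tail -> z ≈ 1.645
lo, hi = confidence_interval(4.3, 1.2, 40, z=z90)
print(round(lo, 2), round(hi, 2))  # 3.99 and 4.61 typos per page
```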
e) These are prediction intervals for the next observation, rather than confidence intervals for the population mean based on a sample that has already been taken.
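The prediction-interval logic for Steve's customers can be sketched directly (a minimal illustration; 1.96 is the standard two-sided 95% z value):

```python
def prediction_interval(estimate, position, z=1.96):
    """95% prediction interval for one customer's waiting time:
    estimate ± z * (5 minutes / position in the queue)."""
    sd = 5.0 / position
    return estimate - z * sd, estimate + z * sd

print(prediction_interval(25, 2))  # ≈ (20.1, 29.9): second in line, estimate 25 min
print(prediction_interval(20, 1))  # ≈ (10.2, 29.8): first in line, estimate 20 min
```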

LESSON 19: MODEL BUILDING AND DECISION MAKING

Friends, in this lesson we will concentrate on model building and decision-making. After reading this lesson, you should be able to:
• Explain the concepts of model building and decision-making;
• Discuss the need for model building in managerial research;
• Relate the different types of models to different decision-making situations;
• Describe the principles of designing models for different types of managerial research/decision-making situations.

Introduction

Dear friends, you are the future managers, and a manager, whichever type of organisation he or she works in, very often faces situations where he has to decide or choose among two or more alternative courses of action. These are called decision-making situations. An example of such a situation would be the point of time when you took the decision to join this management programme: you had a number of alternative management education programmes to choose from. Or maybe you had admission in this programme only; even in that extreme type of situation you had a choice — whether to join the programme or not. You have, depending upon your own decision-making process, made the decision.

The different types of managerial decisions have been categorised in the following manner:

1) Routine/repetitive/programmable vs. non-routine/non-programmable decisions. The routine/repetitive/programmable decisions are those which can be taken care of by the manager by resorting to standard operating procedures (also called "SOPs" in managerial parlance). Such decisions the manager has to take fairly often, and he/she knows the information required to facilitate them. Usually the decision maker has knowledge in the form of "this is what you do" or "this is how you process" for such decision-making situations. Examples of these decisions could be processing a loan application in a financial institution and supplier selection by a materials manager in a manufacturing organisation. The non-repetitive/non-programmable/strategic decisions are those which have a fairly long-term effect in an organisation. Their characteristics are such that no routine methods, in terms of standard operating procedures, can be developed for taking care of them. The element of subjectivity/judgment in such decision-making is fairly high, and a wrong decision can possibly land the organisation in dire straits.

2) Operating vs. strategic decisions.

The decision-making process followed may consist, broadly, of some or all of the steps given below:
1) Problem definition;
2) Identifying objectives, criteria and goals;
3) Generation/enumeration of alternative courses of action;
4) Evaluation of alternatives;
5) Selection/choosing of the "best" alternative;
6) Implementation of the selected alternative.

All the above steps are critical in decision-making situations, and models play a fairly important role, especially in the fourth and fifth steps — evaluation and selection. Since the type of problem faced by the decision maker may vary considerably from one situation to another, the information needs and the processing required to arrive at the decision may also be quite different. In the case of large, complex and untried problem situations the manager is wary about taking decisions based on intuition alone. Here modelling comes in handy: it is possible for the manager to model the decision-making situation and try out the alternatives on it, so as to select the "best" one.

Model: The dictionary meaning of the word model is "a representation of a thing". A model is designed to represent some real system or process, in whole or in part. It is also stated as the specification of a set of variables and their interrelationships, or as the body of information about a system gathered for the purpose of studying the system.

Modelling: Models can be understood in terms of their structure and purpose. The term 'structure' in models refers to the relationships of the different components of the model. The purpose of modelling for managers is to help them in decision-making.

Activity 1
Suppose you have recently bought a Sony Colour TV for your house. Describe briefly the decision process you went through before making this choice.

Presentation of Models

There are different forms through which models can be presented:
1) Verbal or prose models;
2) Graphical/conceptual models;
3) Mathematical models;
4) Logical flow models.

Let us discuss each model one by one.

Verbal Models: The verbal models use everyday English as the language of representation. An example of such a model from the area of materials management would be as follows: "The price of materials is related to the quantum of purchases for many items. As the quantum of purchases increases, the unit procurement price exhibits a decrease in a step-wise fashion. However, beyond a particular price level no further discounts are available."

Graphical Models: The graphical models are more specific than verbal models. They depict the interrelationships between the different variables or parts of the model in diagrammatic or picture form. They improve exposition, facilitate discussions and guide analysis. The development of mathematical models usually follows graphical models.

Mathematical Models: The mathematical models describe the relationships between the variables in terms of mathematical equations or inequalities. Most of these include clearly the objectives, the uncertainties and the variables. These models have the following advantages: 1) they can be used for a wide variety of analyses, and 2) they can be translated into computer programs. An example of a mathematical model that is very often used by materials managers is the Economic Order Quantity (EOQ) model. It gives the optimal order quantity (Q) for a product in terms of its annual demand (A), the ordering cost per order (Co), the inventory carrying cost rate (Ci) and the purchase cost per unit (Cp):

Q = √( 2 · A · Co / (Ci · Cp) )

Logical Flow Models: The logical flow models are a special class of diagrammatic models of the kind usually used in computer programming and software development. Here the model is expressed in the form of symbols. These models, once one is familiar with the symbols used, are fairly easy to follow, and they are very useful for situations which require multiple decision points and alternative paths. An example of such a model, for a materials procurement situation with quantity discounts allowed, is given in Figure 1: if the procurement quantity exceeds 1000, the procurement price is 10 and total cost = 10 × Q; otherwise the procurement price is 15 and total cost = 15 × Q.

[Figure 1: A logical flow model of material procurement decisions with quantity discounts allowed.]

Activity 2
Mention below a mathematical model which has been used for sales forecasting by your organisation, or any organisation you know of.

Activity 3
Think of a production decision situation and present it diagrammatically using a logical flow model.

Role of Modelling in Research in Managerial Decision-making

In the previous sections of this lesson we have tried to explore the topics of model building and decision-making. However, we confined ourselves to bits and pieces of each concept; their illustration in a comprehensive decision-making situation has not been attempted. In this section we will look at a managerial decision-making situation in totality and try to understand the type of modelling which may prove of use to the decision maker. The example we will consider here is the case of a co-operative state-level milk and milk products marketing federation. The federation has a number of district-level dairies affiliated to it, each having the capacity to process raw milk and convert it into a number of milk products like cheese, butter, ghee, milk powders, shrikhand, etc. The type of decisions which have to be made in such a set-up can be viewed as a combination of short/intermediate-term and long-term ones.
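The two quantitative models above — the Figure 1 pricing rule and the EOQ formula — can be sketched as small Python functions (the numeric inputs in the usage lines are invented purely for illustration):

```python
from math import sqrt

def total_cost(q):
    """Logical-flow model of Figure 1: quantity discount at 1000 units.
    Unit price is 10 if the procurement quantity exceeds 1000, else 15."""
    price = 10 if q > 1000 else 15
    return price * q

def eoq(annual_demand, order_cost, carrying_rate, unit_price):
    """Economic Order Quantity: Q* = sqrt(2*A*Co / (Ci*Cp))."""
    return sqrt(2 * annual_demand * order_cost / (carrying_rate * unit_price))

print(total_cost(1500))                # 15000 (discounted price applies)
print(total_cost(500))                 # 7500
print(round(eoq(1200, 100, 0.2, 50)))  # 155 units for these illustrative inputs
```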

The short-term decisions are typically product-mix decisions, like deciding (1) where to produce which product and (2) when to produce it. The long-term decisions relate to (1) capacity creation — which type of new capacity to create, when, and at which location(s) — and (2) which new products to go in for. The demand for the products and the milk procurement in this situation are uncertain. The profitability of the organisation depends to a great extent on the ability of the management to make these decisions optimally. Obviously, this is a rather complex decision-making situation, and intuitive or experience-based decisions may be way off from the optimal ones. Modelling of the decision-making process and the interrelationships here can prove very useful. In the absence of a large integrated model, a researcher could attempt to model the different subsystems in this set-up. For instance, time-series forecasting based models could prove useful for taking care of the milk procurement subsystem; for product demand forecasting one could take recourse to time-series or regression based models; and for product-mix decisions one could develop linear programming based models. Similar models could be built for other decision-making situations. We have in this section seen a real-life, complex managerial decision-making situation and looked at the possible models the researcher could propose to improve the decision-making.

Types of Models

Models in managerial system studies have been classified in many ways. The dimensions used in describing the models are:
a) Physical vs. mathematical;
b) Macro vs. micro;
c) Static vs. dynamic;
d) Analytical vs. numerical;
e) Deterministic vs. stochastic.

Now let us understand each type and its utility.

Physical Models: In physical models a scaled-down replica of the actual system is very often created. Engineers and scientists usually use these models. In managerial research one finds the utilisation of physical models in the realm of marketing, in the testing of alternative packaging concepts. This can be compared to non-destructive testing in the case of manufacturing organisations.

Mathematical Models: The mathematical models use symbolic notation and equations to represent a decision-making situation. The system attributes are represented by variables and the activities by mathematical functions that interrelate the variables. We have earlier seen the economic order quantity model as an illustration of such a model.

Macro vs. Micro Models: The terms macro and micro in modelling are also referred to as aggregative and disaggregative respectively. The macro models present a holistic picture of a decision-making situation in terms of aggregates. The micro models include explicit representations of the individual components of the system.

Static vs. Dynamic Models: These differ in the consideration of time as an element in the model. Static models assume the system to be in a balanced state and show the values and relations for that state only. Dynamic models, however, follow the changes over time that result from the system activities. Needless to say, the dynamic models are more complex and more difficult to build than the static models; at the same time, they are more powerful and more useful for most real-life situations.

Analytical vs. Numerical Models: The analytical and the numerical models refer to the procedures used to solve mathematical models. Mathematical models that use analytical techniques (meaning deductive reasoning) can be classified as analytical-type models. Those which require a numerical computational technique can be called numerical-type mathematical models.

Deterministic vs. Stochastic Models: The final way of classifying models is into the deterministic and the probabilistic/stochastic ones. The stochastic models explicitly take into consideration the uncertainty that is present in the decision-making process being modelled. When we explicitly build these uncertainties into our milk federation model, it gets transformed from a deterministic into a stochastic/probabilistic type of model.

Activity 4
Give an example each of the following models used for decision-making:
a) Macro model
b) Micro model
c) Deterministic model
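The deterministic/stochastic distinction above can be illustrated with a toy demand model (all figures are hypothetical, and a normal demand distribution is assumed purely for illustration):

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the illustration is reproducible

# Deterministic model: demand is treated as a single fixed figure.
deterministic_demand = 100

# Stochastic model: demand is a random variable, here Normal(100, 10);
# the model works with the whole distribution, not one number.
stochastic_demand = [random.gauss(100, 10) for _ in range(1000)]

print(deterministic_demand)            # always 100
print(round(mean(stochastic_demand)))  # near 100, but individual draws vary
```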

3) Helping the decision maker/manager decide what to do. The steps are: 1) Identifying and formulating the decision problem. The objective of modelling. results in improved decision-making. which could emerge for the rather different identifications of the problem. in many situations. in this situation. The understanding. a fair amount of effort. interactive systems or processes. Example: A materials manager may like to order the materials for his organisation in such a manner that the total annual inventory related costs are minimum. However. O f course. In one case it may be used for explanation purposes whereas in another it may be used to arrive at the optimum course of action. One can easily see here the radically different solutions/models. The different purposes for which modelling is attempted can be categorised as follows: 1) Description o f the system functioning.556 © Copy Right: Rai University . price or any other variable. 2) Identifying the objective(s) of the decision maker(s).. 2) Prediction of the future. improved marketing strategies. Very often. one may run into situations of multiple conflicting objectives. An example of this can be quoted from consumer behaviour problems in the realm o f marketing. sales. Utilising these models the manager can -understand the differences in buying pattern of household groups. The last major objective of modelling is to provide the manager inputs on what he should do in a decision-making situation. and the working capital never exceeds a limit specified by the top management or a bank. Precise problem formulation can lead one to the right type of solution methodology. This can help him in designing hopefully. 4) Determining the relevance of different aspects of the system. Discuss how the model may be used for the following: Explanation Purposes i) Prediction of the future value of the dependent variable. 
Sometimes the models developed for the description/ explanation can be utilised for prediction purposes also. 111 Activity 5 You. Determination of the dominant objective. The second objective of modelling is to predict future events. 5) Choosing and evaluating a model form 6) Model calibration 7) Implementation The decision problem for which the researcher intends develop a model needs to be identified and formulated properly. the assumption made here is that the past behaviour is an important indicator of the future. especially in case of complex problems. Example: A manager stating that the cause of bad performance of his company was the costing system being followed. The objective of modelling here is to optimize the decision of the manager subject to the constraints within which he is operating. i. The predictive models provide valuable inputs for managerial decision-making. trading-off between the objectives. would be to arrive at the optimal material ordering policies. iii) Helping the decision maker decide what to do to achieve a given object. Improper identification of the problem can lead to solutions for problems. A careful analysis of the situation by a consultant indicated that the actual problem lay elsewhere. which either do not exist or are not important enough. Model Building/Model Development The approach used for model building or model development for managerial decision making will vary from one situation to another. It is vary likely that you may come across a regression model for estimating. 3) System elements identification and block building. The first purpose is to describe or explain a system and the processes therein. 11. Such models help the researcher or the manager in understanding complex.e.d) Stochastic Model RESEARCH METHODOLOGY Objectives of Modelling The objectives or purposes which underlie the construction of models may vary from one decision-making situation to another. 
The problem identification is accompanied by understanding the decision process and the objective(s) of the decision maker(s). advertisement expenditure. This process can require. may go through various issues o f any management journal(s). the improper product-mix being produced by the company. we can enumerate a number of generalized steps which can be considered as being common to most modelling efforts.

The next major step in model building is the description of the system in terms of blocks. Each of the blocks is a part of the system which has a few input variables and a few output variables. The decision-making system as a whole can be described in terms of the interconnections between blocks, and can be represented pictorially as a simple block diagram. For instance, we can represent a typical marketing system in the form of a block diagram (please refer to Figure 4).

[Figure 4: Block diagram of a typical marketing system — finished goods warehouse, marketing manager, regional depot officials, wholesalers, retailers, dealers and consumers, with competitors alongside; the arrows distinguish product flow from information flow.]

While building the blocks, one should continuously question the relevance of the different blocks vis-à-vis the problem definition and the objectives. Inclusion of not-so-relevant segments in the model increases the model complexity and solution effort.

The next step is choosing and evaluating a model form. A number of alternative modelling forms or structures may be possible — for instance, one may ask whether the model justifies assumptions of linearity, non-linearity (but linearisable), and so on. Depending upon the modelling situation, one may recommend the appropriate modelling form, considering its appropriateness for the situation. One could evaluate it on criteria like theoretical soundness, goodness of fit to the historical data, and the possibility of producing decisions which are consistent with managerial experience and intuition.

The final steps in the model development process are related to model calibration and implementation. Calibration involves assigning values to the parameters in the model. When sample data is available we can use statistical techniques for calibration; in situations where little or no data is available, one has to take recourse to subjective procedures. Model implementation involves, among other things, training the support personnel and the management on system use procedures. Documentation of the model and procedures for continuous review and modification are also important here. This improves the likelihood of the model actually being used.

Model Validation

When a model of a decision-making situation is ready, a final question about its validity should always be raised: the modeller should check whether the model represents the real-life situation and is of use to the decision maker. A number of criteria have been proposed for model validation. The ones considered important for managerial applications are face validity, statistical validity and use validity. In face validity we are concerned about the validity of the model structure: one attempts to find whether the model does things which are acceptable in the given context. In statistical validity we try to evaluate the quality of the relationships being used in the model. The use-validity criteria may vary with the intended use of the model; for descriptive models, for instance, one would place emphasis on face validity and goodness of fit.

Simulation Models

Simulation models are a distinct class of quantitative models, usually computer-based, which are found to be of use for complex decision problems. The term 'simulation' is used to describe a procedure of establishing a model and deriving a solution numerically. A series of trial-and-error experiments are conducted on the model to predict the behaviour of the system over a period of time; in this way the operation of the real system can be replicated. This is also a technique used for decision-making under conditions of uncertainty, and it is generally recommended for modelling in conditions where mathematical formulation and solution of a model are not feasible. This methodology has been used in numerous types of decision problems, ranging from queuing and inventory management to energy policy modelling.

Activity 6
You are the personnel manager of a construction company. If you were asked to build a model to forecast the manpower requirements of both skilled and non-skilled workers for the next five years, list out the steps you may consider for building the model.
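A minimal trial-and-error simulation of the kind described — here a toy inventory system with random daily demand — might look like this (all parameters are invented for illustration, and delivery is assumed to be immediate):

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def simulate(days=365, start_stock=60, reorder_qty=50, reorder_point=20):
    """Replicate the system day by day and count days on which demand goes unmet."""
    stock, stockouts = start_stock, 0
    for _ in range(days):
        demand = random.randint(0, 10)  # uncertain daily demand
        if demand > stock:
            stockouts += 1
            stock = 0
        else:
            stock -= demand
        if stock <= reorder_point:
            stock += reorder_qty        # replenish (immediate delivery assumed)
    return stockouts

# Trial-and-error experiments: compare two reorder policies on the model.
print(simulate(reorder_point=20))  # 0 -- this generous policy never runs out
print(simulate(reorder_point=0))   # a leaner policy risks some stockout days
```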

Tutorial

1) What do you understand by the term decision-making? What is the role of models in managerial decision-making? Explain.
2) Briefly review the different types of models along with their characteristics.
3) Take a complex decision-making situation in your organisation. Try to prepare a graphical model of this decision-making situation, and try to identify the blocks in it along with their interrelationships.
4) What are the purposes of modelling? Discuss.

RESEARCH METHODOLOGY

LESSON 20: PRINCIPLES OF HYPOTHESIS TESTING

So far we have talked about estimating a confidence interval, together with the probability (the confidence level) that the true population statistic lies within that interval under repeated sampling. We now apply the principles of statistical inference to hypothesis testing.

By the End of this Chapter you Should be Able to
• Understand what hypothesis testing is
• Examine issues relating to the determination of the level of significance
• Apply tests of hypotheses to management situations
• Use the SPSS package to carry out hypothesis tests and interpret the computer output, including p-values

What is Hypothesis Testing?
What is a hypothesis? A hypothesis is an assumption that we make about a population parameter. It can be any assumption about a population parameter, and need not be based on statistical data; it may, for example, be based on the gut feel of a manager. In fact, managers propose and test hypotheses all the time. For example:
• If a manager says "if we drop the price of this car model by Rs 15,000 we'll increase sales by 25,000 units", that is a hypothesis. To test it in reality he would have to wait until the end of the year and count sales; the market place would then decide whether his intuition was correct.
• A manager's estimate that sales per territory will grow on average by 30% in the next quarter is also an assumption, or hypothesis.

How would the manager go about testing this second assumption? Suppose he has 70 territories under him.
• One option is to audit the results of all 70 territories and determine whether the average growth is greater than or less than 30%. This is a time-consuming and expensive procedure.
• Another way is to take a sample of territories and audit sales results for them. Because the figure comes from a sample, it is likely to differ somewhat from the assumed rate; we may, for example, get a sample growth rate of 27%. The manager is then faced with the problem of determining whether his hypothesized rate of growth is correct, or whether the sample rate is more representative.

We cannot accept or reject a hypothesis about a parameter simply on intuition; instead we need objective criteria, based on sampling theory, to accept or reject it.

Hypothesis Testing: The Theory
Hypothesis testing is the process of making inferences about a population based on a sample.

Null Hypothesis
In testing our hypotheses we must state the assumed or hypothesized value of the population parameter before we begin sampling. The assumption we wish to test is called the null hypothesis, symbolized by Ho. For example, if we want to test the hypothesis that the population mean is 500, we write:

Ho: µ = 500

When we use the hypothesized value of a population mean in a problem, we represent it symbolically as µHo. The term "null hypothesis" has its origins in pharmaceutical testing, where the null hypothesis is that the drug has no effect, i.e. that there is no difference between samples treated with the drug and untreated samples.

Alternative Hypothesis
If our sample results fail to support the null hypothesis, we must conclude that something else is true. Whenever we reject the null hypothesis, the alternative hypothesis is the one we have to accept. It is symbolized by Ha. There are three possible alternative hypotheses for any Ho:

Ha: µ ≠ 500 (the alternative hypothesis is that the mean is not equal to 500)
Ha: µ > 500 (the alternative hypothesis is that the mean is greater than 500)
Ha: µ < 500 (the alternative hypothesis is that the mean is less than 500)

To test the validity of our assumption about the population, we collect sample data and determine the sample value of the statistic; we then determine whether the sample data support our assumption, here the hypothesized average sales growth. If the difference between the hypothesized value and the sample value is small, it is more likely that our hypothesized value of the mean is correct; the larger the difference, the smaller the probability that the hypothesized value is correct. The key question in hypothesis testing is therefore: how likely is it that a population such as the one we have hypothesized would produce a sample such as the one we are looking at?

In practice, however, the difference between the sample mean and the hypothesized population value is rarely so large or so small that we can accept or reject the hypothesis prima facie.
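The territory example can be put in these terms as soon as a sample is drawn: how many standard errors does the sample growth rate of 27% lie from the hypothesized 30%? The sketch below uses Python purely for illustration (the course itself uses SPSS), and the sample size of 35 territories and population standard deviation of 8 percentage points are assumed numbers, since the lesson gives only the two growth rates.

```python
import math

def z_statistic(sample_mean, hyp_mean, pop_sd, n):
    """Standardised distance of the sample mean from the hypothesised mean."""
    std_error = pop_sd / math.sqrt(n)
    return (sample_mean - hyp_mean) / std_error

# Hypothetical inputs: mu0 = 30% growth (the manager's hypothesis), sample
# growth 27%; n = 35 territories and sigma = 8 points are assumed values.
z = z_statistic(sample_mean=27, hyp_mean=30, pop_sd=8, n=35)
print(round(z, 2))
```

A large negative z would suggest the 30% hypothesis is doubtful; how large is "large" is exactly the question the level of significance, introduced below, answers.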

The Process of Hypothesis Testing
Once we have stated our null and alternative hypotheses, we have to decide on a criterion for accepting or rejecting Ho. Our problem reduces to determining the probability that a sample statistic such as the one we have obtained could have arisen from a population with the hypothesized mean µHo. It is this probability that tells us how likely it is that a given sample mean could be obtained from a population with the hypothesized mean.

How do we use sampling to accept or reject a hypothesis? We go back to the normal sampling distribution, and use the result that a fixed probability is associated with intervals around the mean defined in terms of numbers of standard deviations from the mean. Instead of measuring the variable in its original units, we calculate a standardized z variable for a standard normal distribution with mean µ = 0 and standard deviation 1:

z = (x̄ − µ)/σx̄

The z statistic tells us how many standard errors above or below the hypothesized mean our sample mean lies. This observed value of z can then be compared with the critical value of z from the normal tables. From the standard normal tables we can calculate the probability of the sample mean differing from the true population mean by any specified number of standard deviations; for example, we can find the probability that the sample mean differs from the population mean by two or more standard deviations. If that probability is low, say less than 5%, it can reasonably be concluded that the difference between the sample mean and the hypothesized population mean is too large, and that the chance that the hypothesized population would produce such a random sample is too low. Hence we reject the null hypothesis.

In a hypothesis test we therefore need two numbers to decide whether to accept or reject the null hypothesis:
• an observed value of z computed from the sample, and
• a critical value of z defining the boundary between the acceptance and rejection regions.

An example will help clarify the issues involved. Aluminum sheets must have an average thickness of .04 inches or they are useless. A contractor takes a sample of 100 sheets and finds the mean sample thickness to be .0408 inches. On the basis of past experience he knows that the population standard deviation for these sheets is .004 inches. The issue the contractor faces is whether, on the basis of the sample evidence, he should accept or reject a batch of 10,000 aluminum sheets.

In terms of hypothesis testing the issue is: if the true mean is .04 inches, what are the chances of getting a sample mean that differs from this population mean by .0008 inches or more? To answer this we need the probability that a random sample with mean .0408 would be drawn from a population with µ = .04 and σ = .004. If this probability is too low, we must conclude that the aluminum company's statement is false and that the mean thickness of the consignment supplied is not .04 inches.

Suppose the manager allows a 5% level of significance. Our sample data are:

n = 100, x̄ = .0408, σ = .004, α = .05

We now write our hypotheses systematically:

Ho: µ = .04
Ha: µ ≠ .04

To test the hypothesis we calculate the standard error of the mean from the population standard deviation:

σx̄ = σ/√n = .004/√100 = .0004

Next we calculate the z statistic to determine how many standard errors away from the hypothesized mean our sample mean lies:

z = (x̄ − µ)/σx̄ = (.0408 − .04)/.0004 = 2

This is demonstrated in figure 1 below.

Figure 1

Understanding the Level of Significance
The purpose of testing a hypothesis is not to question the computed value of the sample statistic, but to make a judgment about the difference between the sample statistic and the hypothesized population parameter. What probability constitutes an acceptably low level is a judgment for decision makers to make. Certain situations demand that decision makers be very sure about the characteristics of the items being tested, and even a 2% probability that the population would produce such a sample is too high. In other situations there is greater latitude, and a decision maker may be willing to accept a hypothesis with a 5% probability of chance variation. In each situation, what needs to be determined are the costs resulting from an incorrect decision and the exact level of risk we are willing to assume.

In statistical terms this criterion is called the level of significance, denoted by α; in our example α = .05. The level of significance represents the criterion used by the decision maker to accept or reject a hypothesis, and it indicates the permissible extent of sampling variation we are willing to allow while still accepting the null hypothesis. Testing at a 5% level of significance means that we reject the null hypothesis when the observed difference between the sample mean and the population mean is such that it, or a larger difference, would occur only 5 or fewer times in every 100 samples when the hypothesized value of the population parameter is correct. The level of significance is also the risk we run of rejecting a hypothesis that is in fact true.
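The aluminum-sheet arithmetic can be checked in a few lines. This is an illustrative Python sketch of the same calculation (the course's own tool is SPSS):

```python
import math

# Worked numbers from the aluminium-sheet example in the text.
n, sample_mean = 100, 0.0408
hyp_mean, pop_sd = 0.04, 0.004

std_error = pop_sd / math.sqrt(n)          # 0.004 / 10 = 0.0004
z = (sample_mean - hyp_mean) / std_error   # 0.0008 / 0.0004 = 2
print(round(std_error, 4), round(z, 1))
```

The sample mean lies 2 standard errors above the hypothesized mean, which is the value compared against the critical z in the next step.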

We have now determined that our calculated value of z indicates that the sample mean lies two standard errors (SE) to the right of the hypothesized population mean on the standard normal scale. Our level of significance is 5%. We now determine the critical value of z at the .05 level of significance. Since this is a two tailed test, we need the z value that leaves .025 in each tail; from the normal tables this value is 1.96.

We now compare the observed z with the z permissible at our given level of significance: observed z = 2, critical z = 1.96. Since the observed value of z is greater than the critical value, we infer that the difference between the sample mean and the hypothesized population mean is too large, at the 5% level of significance, to be attributed to sampling variation. Hence we reject the null hypothesis: the manager would reject the consignment of aluminum sheets as not meeting the required specification.

Example 2
How many standard errors around the hypothesized value should we use in order to be 99.44% certain that we accept the hypothesis when it is true? This requires that we leave a probability of 1 − .9944 = .0056 outside the acceptance region. Since it is a two tailed test, we halve this probability to get .0056/2 = .0028 in each tail. Looking up the normal tables for positive values of z, an area of .5 − .0028 = .4972 between the mean and z is associated with z = 2.77. This is illustrated in figure 2 below.

Figure 2

Interpreting the Level of Significance
The level of significance is demonstrated diagrammatically in figure 3, which shows three levels of significance: .01, .05 and .50. At the 5% level, .95 of the area under the curve is the region where we would accept the null hypothesis; the two coloured parts under the curve, together representing 5% of the area, are the regions where we would reject it.

A word of caution regarding areas of acceptance and rejection: even if our sample statistic falls in the acceptance (non-shaded) region, this does not prove that Ho is true. The only way a hypothesis can be accepted or rejected with certainty is for us to know the true population parameter; the sample results merely fail to provide statistical evidence to reject the hypothesis. We therefore say that the sample data cause us not to reject the null hypothesis. This is illustrated in figure 4, which shows the location of the sample statistic in each of three distributions: in figures 4a and 4b we would accept Ho, since the sample mean does not differ significantly from the population mean, while in figure 4c we would reject Ho.

Figure 3 and Figure 4 (a, b, c)

Selecting a Level of Significance
There is no standard or externally given level of significance for testing hypotheses. We can, however, note a few points on this issue:
1. The level of significance at which we test a hypothesis is set by the manager, based on his evaluation of the costs and benefits associated with acceptance or rejection of the null hypothesis.
2. The higher the level of significance, the greater the probability of rejecting Ho when it is true. With reference to figure 3, a level of significance of .50 is so high that we would rarely accept Ho when it is true and would frequently reject it when it is true.

Activities
1. What do we mean when we reject a hypothesis on the basis of a sample?
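Critical values such as the 1.96 and 2.77 used above come from the inverse of the standard normal cumulative distribution. As an illustration (not part of the course's SPSS workflow), Python's standard library exposes this directly:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Two-tailed critical value at the 5% level: leave 0.025 in each tail.
z_crit_5pct = std_normal.inv_cdf(1 - 0.05 / 2)
print(round(z_crit_5pct, 2))    # 1.96

# Example 2: to be 99.44% certain, leave 0.0056 in the two tails combined,
# i.e. 0.0028 in each tail.
z_crit_9944 = std_normal.inv_cdf(1 - 0.0056 / 2)
print(round(z_crit_9944, 2))    # 2.77
```

`inv_cdf` plays the role of reading the normal tables backwards: given a cumulative probability, it returns the corresponding number of standard errors.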

Type 1 and Type 2 Errors
Type 1 error: the probability of rejecting Ho when it is true. This is the same as the probability given by the level of significance, and it is symbolized by α.

Type 2 error: the probability of accepting Ho when it is false. It is symbolized by β.

There is a tradeoff between these two errors: to get a low α we have to accept a higher β. We can reduce the probability of making a type 1 error only if we are willing to increase the probability of making a type 2 error. To deal with this tradeoff, a manager has to weigh the costs and benefits associated with each type of error before setting the level of significance.

Example of a Tradeoff
Suppose a type 1 error involves the time and trouble of reworking a batch of chemicals that should have been accepted, while a type 2 error means that an entire group of users may be poisoned. In this situation the company would prefer to minimize the type 2 error, and will set a very high level of significance (possibly α > 50%) to get a low β. The price paid for this higher level of certainty is that the company will frequently reject Ho when it is true; on the other hand, it will rarely accept an Ho that is not true.

Consider another scenario:
• Type 1 error (rejecting Ho when it is true): the cost is disassembling an entire engine at the factory, i.e. high production costs.
• Type 2 error (accepting Ho when it is false): products with defective specifications are accepted, so some customers receive a somewhat defective product. The costs may involve recalling the defective parts or giving a special warranty to repair the defect, and offering a repair warranty is the relatively less costly option.

In this case the manufacturer will set a lower level of significance, to minimize type 1 errors.

A further example: in a criminal trial, the null hypothesis is that the individual is innocent of the crime. Would the legal system prefer to commit a type 1 or a type 2 error with this hypothesis?

Activities
1. Define type 1 and type 2 errors.
2. Explain why there is no single level of probability used to accept or reject a hypothesis.
3. If we reject a hypothesis because it differs from a sample statistic by more than 1.75 SE, what is the probability that we have rejected a hypothesis that is in fact true?
4. If our goal is to accept a null hypothesis that µ = 36.5 with 96% certainty when it is true, and our sample size is 50, diagram the acceptance and rejection regions for the following alternative hypotheses:
a. µ ≠ 36.5
b. µ > 36.5
c. µ < 36.5
5. Your null hypothesis is that the battery for a heart pacemaker has an average life of 300 days, with the alternative hypothesis that the battery life is more than 300 days. You are the quality control engineer for the battery manufacturer.
a. Would you rather make a type 1 or a type 2 error?
b. Based on your answer to (a), would you choose a high or a low level of significance?

One Tailed Tests
So far we have discussed two tailed tests, in which the sample statistic can differ from the hypothesized mean in either direction: it can be either more than or less than the hypothesized mean. Thus a 5% overall level of significance implies that we test with 2.5% probability in the tail above the hypothesized mean and 2.5% in the tail below it. However, there are many cases in which we have prior information that makes it appropriate to test whether the sample statistic is significantly more than, or significantly less than, the population value. In such cases we use a one tail test. The null hypothesis continues to be that of no difference, but with Ho: µ = 500 the alternative hypothesis is one of:

Ha: µ > 500 (an upper tailed test), or
Ha: µ < 500 (a lower tailed test).
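The meaning of α as the type 1 error rate can also be seen by simulation: if Ho is actually true and we test repeatedly at the 5% level, we should wrongly reject about 5% of the time. The following sketch is purely illustrative, with arbitrary assumed population parameters:

```python
import random
from statistics import NormalDist, mean

random.seed(42)
z_crit = NormalDist().inv_cdf(0.975)   # two-tailed critical value, alpha = 0.05

MU0, SIGMA, N, TRIALS = 100, 15, 30, 20_000
rejections = 0
for _ in range(TRIALS):
    # Draw a sample from a population in which Ho is true (mean really is MU0).
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU0) / (SIGMA / N ** 0.5)
    if abs(z) > z_crit:
        rejections += 1                # a type 1 error

print(rejections / TRIALS)             # close to 0.05
```

Raising the critical value (lowering α) makes such false rejections rarer, but, as the text explains, only at the cost of accepting false null hypotheses more often (a higher β).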

The procedure for testing the hypothesis remains the same as in the two tailed case. The only difference is in the value of the critical z, which is now determined by placing the entire level of significance in one tail of the normal distribution. If we wish to test a hypothesis at the 5% level, we determine the z critical such that the whole 5% lies on the right side (an upper tailed test) or on the left side (a lower tailed test). This is illustrated by the coloured regions in figures 5a and 5b.

Figure 5a and Figure 5b

One or Two Tail Test?
Typically, if a problem requires you to test whether the sample statistic is
• more than a given population value, or
• less than a given population value,
a one tailed test is appropriate. If the problem requires us to assess whether the sample statistic is simply not equal to a population value, we use a two tailed test.

Applications of One Tailed Tests
Many managerial situations call for a one tailed test. An example will clarify the situation. A hospital uses large quantities of packaged doses of a particular drug, which it has purchased from the same manufacturer for many years. Each dose should contain 100cc: insufficient doses do not produce the desired medical treatment, while excessive doses pass harmlessly out of the system. The hospital inspects 50 doses randomly and finds the mean dose to be 99.75cc. The population standard deviation of doses is 2cc.

The hospital faces a problem only if doses are significantly less than 100cc, since patients' treatment would then be affected; if the dose is more than 100cc there is no major problem. The null hypothesis remains one of no difference, but we are only interested in testing, as the alternative hypothesis, whether the sample mean is significantly below 100cc:

Ho: µ = 100cc
Ha: µ < 100cc

Therefore it is a one tail test, specifically a left tailed (lower tailed) test. Testing at the .10 level of significance, z critical = −1.28. We calculate the standard error of the sample mean dose:

SE = σ/√n = 2/√50 = .283

We can now calculate the standardized z statistic:

z = (x̄ − µ)/SE = (99.75 − 100)/.283 = −.88

Since −.88 > −1.28, the sample mean lies within the acceptance region, and the hospital can accept the null hypothesis: the observed mean of the sample is not significantly different from the hypothesized mean dose.

Further Examples
1. A highway safety engineer decides to test the load bearing capacity of a bridge that is 20 years old. Considerable data are available from similar tests on the same type of bridge. If the minimum load bearing capacity of this bridge must be 10 tons, what are the null and alternative hypotheses, and is a one or two tail test appropriate?

The engineer is interested in whether a bridge of this age can withstand the minimum load bearing capacity necessary for safety purposes, so she wants its capacity to be above a certain minimum level. A one tailed (upper tailed) test is therefore appropriate:

Ho: µ = 10 tons
Ha: µ > 10 tons

2. The standard deviation of a press's life is 2,100 hours. Hinton Press hypothesizes that the average life of its presses is 14,500 hours. From a sample of 25 presses, the company finds a sample mean life of 13,000 hours. At a .01 significance level, should the company conclude that the average press life is less than the hypothesized 14,500 hours?

Our problem requires us to assess whether average press life is significantly less than the hypothesized life, so a one tailed (lower tailed) test is appropriate:

Ho: µ = 14,500
Ha: µ < 14,500
n = 25, σ = 2,100, α = .01

SE = 2100/√25 = 420
z = (13000 − 14500)/420 = −3.57

The z critical for a one tail test at the .01 level is −2.33. Since −3.57 < −2.33, we reject Ho: the average life is significantly less than the hypothesized life.

Activities
1. Aaj Ka Films is a film distribution company. It knows that a hit movie runs for an average of 84 days in a city, with a standard deviation of 10 days. The manager of a South Eastern district is interested in comparing the movie's popularity in his region with the all India average. He randomly selects 75 theatres in his region and finds that they ran the movie for an average of 81.5 days. The manager wants to know whether the mean running time in the South East is below the national average. Test the appropriate hypothesis at the .10 level of significance.
2. Atlas Sporting Goods has implemented a special trade promotion for its stoves and thinks the promotion should result in a significant price change for customers. Before the promotion began, the average retail price of a stove was $44.95 with a standard deviation of $5.75. After the promotion, Atlas samples 25 retailers and finds the mean price to be $42.95. At a .02 level of significance, does Atlas have reason to believe that the average retail price to consumers has decreased?
3. The statistics department installed energy efficient lights, heaters and ACs, and now wants to determine whether average monthly energy usage has decreased. Should it perform a one or a two tail test? If its previous average monthly usage was 3124 KW hours, what are the null and alternative hypotheses?
4. Under what conditions is it appropriate to use a one tailed test? A two tailed test?
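The Hinton Press calculation is a compact template for any lower tailed mean test. As an illustrative Python sketch of the numbers worked above:

```python
import math
from statistics import NormalDist

# Hinton Press example: Ho mu = 14500 hrs, Ha mu < 14500 hrs.
n, sample_mean, hyp_mean = 25, 13000, 14500
pop_sd, alpha = 2100, 0.01

std_error = pop_sd / math.sqrt(n)              # 2100 / 5 = 420
z = (sample_mean - hyp_mean) / std_error       # -1500 / 420 = -3.57
z_crit = NormalDist().inv_cdf(alpha)           # lower-tailed critical value

print(round(z, 2), round(z_crit, 2))
print("reject Ho" if z < z_crit else "accept Ho")
```

Note that the whole of α sits in the lower tail, so the critical value is `inv_cdf(alpha)` rather than `inv_cdf(alpha / 2)` as in a two tailed test.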
LESSON 21: TESTING OF HYPOTHESIS – LARGE SAMPLES

In the last lecture we learnt the general principles of hypothesis testing and how to test a sample statistic against a hypothesized population value. We now extend these principles to testing hypotheses about sample proportions vis a vis a hypothesized population proportion, and then to testing for differences between two samples, i.e. whether two separate samples differ significantly from each other.

Hypothesis Testing of Proportions
So far we have compared a sample mean with a hypothesized or population mean. We can apply the same principles to testing whether the proportion of occurrences of an event in a sample differs from a hypothesized level of occurrence. Theoretically the binomial distribution is the correct distribution to use when dealing with proportions. However, as the sample size increases, the binomial distribution approaches the normal distribution in its characteristics, and we can use the normal distribution to approximate the sampling distribution. To do this we need to satisfy the conditions np > 5 and nq > 5, where p is the proportion of successes and q the proportion of failures.

When testing hypotheses for proportions we begin with a procedure similar to the earlier case. First we define the various proportions:

pHo: the hypothesized population proportion of successes
qHo: the hypothesized population proportion of failures
p̄: the sample proportion of successes
q̄: the sample proportion of failures

Again we calculate a z statistic:

z = (p̄ − pHo)/σp

where σp is the standard error of the proportion, calculated using the hypothesized population proportion:

σp = √(pHo qHo/n)

We then compare the calculated z with the critical z for the appropriate level of significance to decide whether to accept or reject Ho. An example will make the process clearer.

Example
A ketchup manufacturer is deciding whether to produce a new extra spicy ketchup. The company's marketing research department used a national telephone survey of 6000 households and found that 335 would purchase extra spicy ketchup. A much more extensive study made two years ago showed that 5% of households would purchase the brand then. At a two percent level of significance, should the company conclude that there is an increased interest in the extra spicy flavour?

n = 6000
p̄ = 335/6000 = .05583
Ho: p = .05
Ha: p > .05
α = .02

σp = √(pHo qHo/n) = √(.05 × .95/6000) = .00281

z = (p̄ − pHo)/σp = (.05583 − .05)/.00281 = 2.07

The z critical for a one tailed test at the .02 level is 2.05. Since the observed z is greater than z critical, we reject Ho: the current level of interest is significantly greater than the level of interest two years ago.

Activities
1. Macroswift estimated last year that 35% of potential software buyers were planning to wait to purchase the new operating system, Window Panes, until an upgrade had been released. After an advertising campaign to reassure the public, Macroswift surveyed 3000 people and found 950 who were still skeptical. At the 5% level of significance, can the company conclude that the proportion of skeptical people has decreased?
2. Steve Cutter sells lawn mowers and is interested in comparing the reliability of his mowers with that of an international brand. He knows that only 15% of the international brand's mowers require repairs. A sample of 120 of Steve's customers shows that 22 of them required repairs. At the .02 level of significance, is there evidence that Steve's mowers differ from the international brand?
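The ketchup example follows directly from the formulas above. An illustrative Python version of the same upper tailed proportion test (SPSS would be the course's tool in practice):

```python
import math
from statistics import NormalDist

# Extra-spicy ketchup example: Ho p = 0.05, Ha p > 0.05, alpha = 0.02.
n, successes, p0 = 6000, 335, 0.05

p_hat = successes / n                          # sample proportion, about 0.0558
se = math.sqrt(p0 * (1 - p0) / n)              # standard error under Ho, about 0.0028
z = (p_hat - p0) / se
z_crit = NormalDist().inv_cdf(1 - 0.02)        # upper-tailed critical value

print(round(z, 2), round(z_crit, 2))
print("reject Ho" if z > z_crit else "accept Ho")
```

Note the standard error uses the hypothesized proportion p0, not the sample proportion, because the test asks how the sample would behave if Ho were true.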

Hypothesis Tests of Differences Between Means
So far we have examined cases where we test the results of a sample against a hypothesized population value. We now turn to the case where we wish to compare the parameters of two different populations and determine whether they differ from each other. Here we are not interested in the specific values of the individual parameters so much as in the relation between them, i.e. whether the two samples could have come from populations with the same mean. Examples of hypotheses of this type are:
• whether female employees earn less than male employees for the same work;
• a drug manufacturer comparing the reactions of a group of animals administered a drug with those of a control group;
• a company testing whether the proportion of promotable employees at one installation differs from that at another.

The Theoretical Basis
Figure 1 shows three different but related distributions. Figure 1a shows the population distributions of two populations 1 and 2, with means µ1 and µ2 and standard deviations σ1 and σ2. Figure 1b shows the corresponding sampling distributions of the sample means x̄1 and x̄2, with means µ1 and µ2 and standard errors σx̄1 and σx̄2.

How do we derive the third distribution, the sampling distribution of the difference between the sample means? Suppose we take a random sample from population 1 and another from population 2 and subtract the two sample means to get x̄1 − x̄2. This difference is positive if x̄1 is larger than x̄2, and negative otherwise. By constructing the distribution of all possible sample differences x̄1 − x̄2, we end up with the sampling distribution of the difference between sample means, shown in figure 1c.

Figure 1a, 1b and 1c

The mean of this distribution is:

µx̄1−x̄2 = µ1 − µ2

The standard deviation of this distribution, called the standard error of the difference between two means, is:

σx̄1−x̄2 = √(σ1²/n1 + σ2²/n2)

The testing procedure for a hypothesis is similar to the earlier cases. We compute the z statistic:

z = [(x̄1 − x̄2) − (µ1 − µ2)Ho] / σx̄1−x̄2

Since we will usually be testing for equality of the two population means, the null hypothesis is Ho: µ1 = µ2, so that (µ1 − µ2)Ho = 0. If the sample difference is not significant, we can hypothesize that the two samples come from populations with the same mean.

An example will make the process clearer. A manpower statistician is asked to determine whether hourly wages of semi skilled labour are the same in two cities. The result of the survey is given in the table below. The statistician wants to test, at the .05 level of significance, the hypothesis that there is no significant difference between hourly wage rates across the two cities.

City   Mean hourly wage   Standard deviation of sample   Size of sample
Apex       $8.95                 $.40                        200
Eden       $9.10                 $.60                        175

Ho: µ1 = µ2
Ha: µ1 ≠ µ2
α = .05

Since the standard deviations of the two populations are not known, we estimate σ1 and σ2 using the sample standard deviations:

σ̂1 = s1 = .40
σ̂2 = s2 = .60

The estimated standard error of the difference between the two means is:

σ̂x̄1−x̄2 = √(σ̂1²/n1 + σ̂2²/n2) = √(.40²/200 + .60²/175) = .053
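The two-city wage comparison reduces to the standard error formula just given. An illustrative Python version of the complete test:

```python
import math
from statistics import NormalDist

# Semi-skilled wage example: are mean hourly wages in the two cities equal?
n1, x1, s1 = 200, 8.95, 0.40   # Apex
n2, x2, s2 = 175, 9.10, 0.60   # Eden

se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)   # estimated standard error
z = (x1 - x2 - 0) / se_diff                    # (mu1 - mu2)Ho = 0 under Ho
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)    # two-tailed, alpha = 0.05

print(round(se_diff, 3), round(z, 2))
print("reject Ho" if abs(z) > z_crit else "accept Ho")
```

Because the hypothesized difference under Ho is zero, the numerator is simply the observed difference between the two sample means.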

The z statistic is:

z = [(x̄1 − x̄2) − (µ1 − µ2)Ho] / σ̂x̄1−x̄2 = (8.95 − 9.10 − 0)/.053 = −2.81

The limits of the acceptance region for a two tailed test at α = .05 are ±1.96. Marking the standardized difference on a sketch of the sampling distribution, the calculated z lies outside the acceptance region. We therefore reject the null hypothesis: it is reasonable to conclude that the two samples come from populations with different means.

Activities
1. Two independent samples of observations were collected. For the first sample of 60 elements, the mean was 86 and the standard deviation 6. The second sample of 75 elements had a mean of 82 and a standard deviation of 9.
(a) Compute the estimated standard error of the difference between the two means.
(b) Using α = 0.01, test whether the two samples can reasonably be considered to have come from populations with the same mean.
2. A sample of 32 money market funds was chosen on Jan 1, 1996, and the average annual rate of return over the past 30 days was found to be 3.23%, with a sample standard deviation of .51%. A year earlier, a sample of 38 money market funds showed an average rate of return of 4.36%, with a sample standard deviation of .84%. Is it reasonable to conclude (at α = .05) that money market interest rates declined during 1995?
3. Despite the Equal Pay Act, it still appeared that in 1993 men earned more than women in similar jobs. A random sample of 38 male machine tool operators found a mean hourly wage of $11.42, with a sample standard deviation of $1.38. A random sample of 45 female operators found their mean wage to be $8.84, with a sample standard deviation of $1.31. On the basis of these samples, is it reasonable to conclude (at α = .01) that the male operators are earning over $2.00 more per hour than the female operators?
4. In 1993, the Financial Accounting Standards Board (FASB) was considering a proposal to require companies to report the potential effect of employees' stock options on earnings per share (EPS). A random sample of 41 high-technology firms revealed that the new proposal would reduce EPS by an average of 13.8 percent, with a standard deviation of 18.9 percent. A random sample of 35 producers of consumer goods showed that the proposal would reduce EPS by 9.1 percent on average, with a standard deviation of 8.7 percent. On the basis of these samples, is it reasonable to conclude (at α = .10) that the FASB proposal will cause a greater reduction in EPS for high technology firms than for producers of consumer goods?
Tests for Differences Between Proportions for Large Samples
The analysis for testing a difference between the proportions of two samples is broadly similar to the earlier case. Suppose we have two sample proportions, p̄1 and p̄2, which measure the probability of occurrence of an event or characteristic in two samples:

p̄1: sample proportion of successes in sample 1
q̄1: sample proportion of failures in sample 1
p̄2: sample proportion of successes in sample 2
q̄2: sample proportion of failures in sample 2
n1: size of sample 1
n2: size of sample 2

As long as both samples are greater than 30, we can use the normal approximation to the binomial. We wish to test whether the probability of occurrence differs significantly across the two samples, and we hypothesize that there is no difference between the two proportions. Since we do not know the population proportions, we must estimate them from the sample statistics. Under the null hypothesis of no difference, our best estimate of the overall population proportion of successes is the combined proportion of successes in both samples:

p̂ = (n1 p̄1 + n2 p̄2)/(n1 + n2),  q̂ = 1 − p̂

The estimated standard error of the difference between two proportions is then:

σ̂p̄1−p̄2 = √(p̂ q̂/n1 + p̂ q̂/n2)

and the standardized z statistic is:

z = [(p̄1 − p̄2) − (p1 − p2)Ho] / σ̂p̄1−p̄2

This is best illustrated with an example. A pharmaceutical company tests two new compounds intended to reduce blood pressure. The compounds are administered to two different groups of lab animals. In group 1, 71 of 100 animals respond to drug 1 with lower blood pressure levels; in group 2, 58 of 90 animals show lower blood pressure levels with drug 2. The company wants to test, at the .05 level of significance, whether there is a difference between the two drugs.

p̄1 = .71 (sample proportion of successes in sample 1), q̄1 = .29, n1 = 100
p̄2 = .644 (sample proportion of successes in sample 2), q̄2 = .356, n2 = 90

Ho: p1 = p2
Ha: p1 ≠ p2
α = .05

The overall hypothesized proportion of successes, assuming no difference between the two population proportions, is estimated as the combined proportion of successes in both samples:

p̂ = [100(.71) + 90(.644)]/190 = .6789

The estimated standard error of the difference between the two sample proportions is:

σ̂p̄1−p̄2 = √(p̂ q̂/n1 + p̂ q̂/n2) = .0678

z = (.71 − .644 − 0)/.0678 = .97

Since the observed z (.97) is less than the critical z at the .05 level (1.96), we accept Ho: the two drugs do not differ significantly in their effect.
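The pooling step is the one new ingredient in this test, and the blood-pressure example can be checked in a few lines. An illustrative Python version:

```python
import math
from statistics import NormalDist

# Blood-pressure drug example: do the two compounds differ? alpha = 0.05.
n1, x1 = 100, 71    # group 1: 71 of 100 animals respond
n2, x2 = 90, 58     # group 2: 58 of 90 animals respond

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                 # pooled estimate under Ho
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
z_crit = NormalDist().inv_cdf(0.975)           # two-tailed, alpha = 0.05

print(round(p_pool, 4), round(se, 4), round(z, 2))
print("reject Ho" if abs(z) > z_crit else "accept Ho")
```

Pooling is justified precisely because Ho asserts the two populations share one success proportion; the pooled value is the best single estimate of it.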

Worked Example: Two-Tailed Test of Two Proportions

Returning to the drug-testing example, the data are as follows:

p̄1 = .71 (sample proportion of successes in sample 1), q̄1 = .29, n1 = 100
p̄2 = .644 (sample proportion of successes in sample 2), q̄2 = .356, n2 = 90

Ho: p1 = p2   Ha: p1 ≠ p2   α = .05

The hypothesized value assumes that there is no difference between the two population proportions, so we estimate the overall proportion of successes as the combined proportion in both samples:

p̂ = [100(.71) + 90(.644)] / 190 = .6789

The estimated standard error of the difference between the two sample proportions is .0678, so

z = (.71 − .644 − 0) / .0678 = .973

Since the observed z is less than the critical z (±1.96 at the .05 level of significance), we accept Ho: there is no significant difference between the two proportions. This is shown in Figure 3.

One-Tailed Tests of Differences Between Proportions

Hint: if a test asks whether one proportion is significantly different from another, use a two-tailed test. If the test asks whether one proportion is significantly higher or lower than the other, a one-tailed test is appropriate. The procedure is the same as for carrying out a one-tailed test comparing sample means.

Example 2: For tax purposes a city government allows two methods of listing property. One requires the property owner to appear in person before a tax lister; the second allows the form to be mailed. The manager thinks the personal-appearance method leads to fewer mistakes, and authorizes an examination of 50 personal appearances and 75 mail forms. The results show that 10% of the personal-appearance listings have errors, whereas 13.3% of the mail forms have errors. The manager wants to test, at the .15 level of significance, the hypothesis that the personal-appearance method produces fewer errors. Specifically, this is a left-tailed test: the manager wishes to show that method 1, personal appearance, results in significantly fewer errors.

Since it is a one-tailed test, we do not divide the level of significance between both tails of the normal curve. A .15 level of significance implies that we determine the critical z for the area under one side of the curve, i.e. .5 − .15 = .35, which gives z critical = 1.04. Since the calculated z is smaller in absolute value than the critical z, we accept the null hypothesis. This is shown by the marked region in Figure 4.

Activities
1. On Friday, 11 stocks in a sample of 40 of the 2,500 stocks traded on the BSE advanced; in a sample of 60 BSE stocks on Thursday, 24 had advanced. At α = .05, can you conclude that a smaller proportion of BSE stocks advanced on Friday than did on Thursday?
2. A coal-fired power plant is considering two different systems for reducing pollution. The first system has reduced the emission of pollutants to acceptable levels 68% of the time, as determined from 200 air samples. The second, more expensive system has reduced the emission of pollutants to acceptable levels 76% of the time, as determined from 250 air samples. If the expensive system is significantly more effective than the inexpensive system, the management of the power plant will install it; otherwise it will install the inexpensive system. Which system will be installed if management uses a significance level of .02 in making its decision?
3. A large hotel chain is trying to decide whether to convert more of its rooms to nonsmoking rooms. In a random sample of 400 guests last year, 166 had requested nonsmoking rooms. This year, 205 guests in a sample of 380 preferred the nonsmoking rooms. Would you recommend that the hotel chain convert more rooms to nonsmoking? Support your recommendation by testing the appropriate hypotheses at the .01 level of significance.
Ans: z calculated = −3.48 < −z critical = −2.33; therefore reject Ho.
4. Two different areas of a large Eastern city are being considered as sites for day-care centers. Of 200 households surveyed in one section, the proportion in which the mothers worked full-time was .52. In another section of the city, 40 percent of the 150 households surveyed had mothers working at full-time jobs. At the .04 level of significance, is there a significant difference in the proportions of working mothers in the two areas of the city?
Ans: z calculated = 2.23 > z critical = 2.05; therefore reject Ho.
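The left-tailed property-listing test can be checked numerically. The sketch below uses Python's `statistics.NormalDist` for the critical value; the error rates (10% of 50 personal appearances versus 13.3% of 75 mail forms) are taken from the example above.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.10, 50    # personal appearances: 10% error rate
p2, n2 = 0.133, 75   # mail forms: 13.3% error rate

p_hat = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion of errors
se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se                         # roughly -0.56

z_crit = NormalDist().inv_cdf(0.15)        # left-tail cutoff, roughly -1.04
# z is above z_crit, so we cannot reject H0 at the .15 level
```

Even at a generous .15 significance level, the observed difference in error rates is not large enough (relative to its standard error) to conclude that personal appearances produce fewer errors.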

In this chapter we wrap up our analysis of hypothesis testing for large samples. By now you should have a good idea of how to apply the principles of hypothesis testing to different types of managerial problems. We first look at probability values and compare them with conventional tests of significance. We then look at what is considered a good hypothesis test, which is done by measuring the power of a test. Finally, because any management problem today generates such a large volume of data that it is virtually impossible to analyze and test hypotheses manually, we consider how these tests are run on the computer. The basic theory and principles of statistical analysis do not change when a test is carried out on a computer, but it becomes important to understand and interpret the output.

Probability Values

So far we have tested hypotheses at a given level of significance: before we take the sample, we specify how unlikely the observed result will have to be in order for us to reject Ho. For example, we test the hypothesis that the chance of observing a sample mean this far away from the true population mean is less than 5%. In this approach we prespecify a level of probability (α) and compare the observed probability of getting the sample statistic with that prespecified level.

There is another way to approach the decision of whether to accept or reject Ho which does not require us to prespecify the level of significance before taking a sample. We take the sample, compute the mean, and ask: supposing Ho were true, what is the probability of getting a sample mean this far away from μHo? This probability is called the probability value, or p-value, of the statistic. The p-value tells us the largest significance level at which we would have accepted Ho. Once the p-value is determined, the decision maker can weigh all relevant factors and decide whether to accept or reject Ho without being bound by a prespecified level of significance. The simple rule of thumb is: as long as α > p, reject Ho. The smaller the p-value, the greater the significance of the finding. The two methods are equivalent and essentially represent two sides of the same coin. The concept will be made clearer with the help of an example.

Example

A machine is used to cut Swiss cheese into blocks of specified weight. The machine is currently set to cut blocks of 12 gm, and on the basis of experience the weight of a block has a standard deviation of 0.5 gm. A sample of 25 blocks is found to have an average weight of 12.25 gm. Should we conclude the machine needs to be recalibrated?

Our hypotheses can be stated as Ho: μ = 12 and Ha: μ ≠ 12. Since this is a two-tailed test, we need to determine the probability of observing a value of x̄ at least as far away from 12 as 12.25, i.e. P(x̄ ≥ 12.25 or x̄ ≤ 11.75) if Ho is true. The standard error of the mean is σ/√n = 0.5/√25 = 0.1, so we can convert x̄ to a standard z score:

z = (12.25 − 12)/0.1 = 2.5

From the normal tables, the area between the mean and z = 2.5 is .4938, so the probability of a z greater than 2.5 is .5 − .4938 = .0062. Since this is a two-tailed test, the p-value is 2(.0062) = .0124. Thus at any level of significance above .0124 we would reject Ho. The p-value of .0124 and the associated z values (±2.5) are shown in Figure 1, where the .05 level of significance is also illustrated.

Given this information, the cheese packer can now decide whether or not to recalibrate the machine. As we can see, the p-value is very low, so he probably will go in for recalibration. Had he carried out a conventional hypothesis test at the 5% level, he would likewise have rejected Ho, since the observed z exceeds the critical value of 1.96; at a significance level of .01, however, he would have accepted the hypothesis, as the critical z value would have been 2.58.

Uses of P-Values

The use of p-values saves the tedium of looking up tables, and given the widespread availability of computers and statistical packages, we can easily run such tests on the computer; computer outputs usually present the prob value or p-value. A p-value can also be more informative: if we simply reject Ho at α = .05, we only know that the observed value was at least 1.96 SE away from the mean, whereas the p-value tells us the exact probability of getting a sample mean this far away from μHo.
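The cheese-packer's p-value can be reproduced with the standard normal CDF from Python's statistics module. This is a sketch of the calculation only; the population standard deviation of 0.5 gm is the value consistent with the z of 2.5 computed in the text.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar = 12.0, 12.25   # hypothesized mean and observed sample mean (gm)
sigma, n = 0.5, 25        # population standard deviation and sample size

se = sigma / sqrt(n)                       # standard error = 0.1
z = (xbar - mu0) / se                      # 2.5
p_value = 2 * (1 - NormalDist().cdf(z))    # two-tailed p-value, about .0124

# reject H0 at any alpha above the p-value: at .05 yes, at .01 no
```

Note that the code returns the exact tail probability rather than the rounded .4938 read from a printed table, which is exactly the convenience p-values offer.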

Using the Computer to Test Hypotheses

These days, in actual managerial situations, hypothesis tests are rarely done manually: the volume of data is usually far too large. Instead, standard statistical analysis packages are used, of which the most popular are SPSS and Minitab. Broadly, all programs follow the same principles. One difference in presentation is that instead of comparing the calculated z or t value with a predetermined level of significance, most packages display the prob value (p-value). To accept or reject a hypothesis we compare the level of significance α with the p-value: if α > p, we reject Ho at that level of significance, and vice versa. For example, if the p-value is .0124 and α = .01, we accept Ho, since .01 < .0124. It is therefore important that students can interpret the computer output generated for hypothesis tests by the standard packages. An example will help show how computer output reports the results of a hypothesis test.

Example

While designing a test it was expected that the average grade would be 75%, i.e. 56.25 out of 75. This hypothesis was tested against actual test results for a sample of 199 students. The Minitab output is shown in Table 1.

Table 1: T test for a mean
Test of μ = 56.25 vs μ ≠ 56.25
Variable: Result   N: 199   Mean: 45.281   StDev: 10.014   SEMean: 0.710   T: −15.45   P-value: 0.0000

The observed t value for this test was −15.45, with an associated (two-tailed) prob value of 0.0000. Because this prob value is less than our significance level of α = 0.05, we must reject Ho and conclude that the test did not achieve the desired level of difficulty.

Example 2

The Coffee Institute has claimed that more than 40% of American adults regularly have a cup of coffee with breakfast. A random sample of 450 individuals showed that 200 of them were regular coffee drinkers at breakfast, so n = 450 and p̄ = 200/450 = .4444. The hypotheses are Ho: p = .4 and Ha: p > .4. What is the prob value of a test of hypotheses seeking to show that the Coffee Institute's claim is correct? The prob value is the probability that p̄ ≥ .4444 given a true proportion of .4: here z = (.4444 − .4)/√(.4 × .6/450) = 1.92, giving a prob value of about .027. Comparing this with our standard for accepting or rejecting Ho, α = .05, we have .05 > .027, so we reject Ho: the sample supports the claim.

T Test for the Difference Between Two Sample Means

Here we test hypotheses about the equality of two means. A university had been receiving many complaints about the caliber of teaching being done by the graduate-student teaching assistants. As a result, it decided to test whether students in sections taught by the graduate TAs really did worse in the exam than students in sections taught by the faculty. If we let the TAs' sections be sample 1 and the faculty's sections be sample 2, then the appropriate hypotheses for testing this concern are

Ho: μ1 = μ2    Ha: μ1 < μ2

The test results are reported assuming that the two population variances are equal. The Minitab output for doing this test is given below.

Activities
1. Kelly's machine shop uses a machine-controlled metal saw to cut sections of tubing used in pressure-measuring devices. The length of the sections is normally distributed with a standard deviation of 0.06". Twenty-five pieces have been cut with the machine set to cut sections 5.00" long; when these pieces were measured, their mean length was found to be 4.97". Use prob values to determine whether the machine should be recalibrated because the mean length is significantly different from 5.00".
2. A car retailer thinks that a 40,000-mile claim for tyre life by the manufacturer is too high. She carefully records the mileage obtained from a sample of 64 such tyres; the mean turns out to be 38,600 miles. The standard deviation of the life of all tyres of this type has previously been calculated by the manufacturer to be 7,500 miles. Assuming that the mileage is normally distributed, determine the largest significance level at which we would accept the manufacturer's mileage claim, that is, at which we would not conclude the mileage is significantly less than 40,000 miles.
3. The North Carolina Department of Transportation has claimed that at most 18 percent of passenger cars exceed 70 mph on Interstate 40 between Raleigh and Durham. A random sample of 300 cars found 48 cars exceeding 70 mph. What is the prob value for a test of hypotheses seeking to show the NCDOT's claim is correct?
4. SAT Services advertises that 80 percent of the time its preparatory course will increase an individual's score on the College Board exams by at least 50 points on the combined verbal and quantitative total score. Lisle Johns, SAT's marketing director, wants to see whether this is a reasonable claim. Lisle has reviewed the records of 125 students who took the course and found that 94 of them did, indeed, increase their scores by at least 50 points. Use prob values to determine whether SAT's ads should be changed because the percentage of students whose scores increase by 50 or more points is significantly different from 80 percent.
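The pooled two-sample t statistic that such output reports can be computed from summary statistics alone. The numbers below are illustrative stand-ins for two groups of exam scores (they should not be read as the exact figures from the Minitab output); the point is the pooled-variance formula.

```python
from math import sqrt

def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """Two-sample t statistic assuming equal population variances."""
    df = n1 + n2 - 2
    # pooled estimate of the common variance
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# illustrative summary statistics for two groups of exam results
t, df = pooled_t(89, 44.93, 9.76, 110, 45.6, 10.2)
# t is small in magnitude (about -0.47), so the group means barely differ
```

A package would also report the one-tailed prob value for this t; with a t this close to zero, that prob value is far above any conventional α, so Ho would be accepted.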

This is shown in Table 2. If we can assume that the two variances are equal, then the test reported by Minitab is the test using a pooled estimate for σ².

Table 2: T test for difference between two sample means
Two-sample T test
Instrnum   N     Mean    StDev   SE Mean
1          89    44.93   9.76    1.0
2          110   45.6    10.2    0.98
T test of μ1 = μ2 vs μ1 < μ2: T = −0.44, P = 0.33, DF = 197
Both use pooled StDev = 10.0

What does the data tell us regarding the efficacy of the TAs? The prob value is quite high, so if we compare it with a level of significance of .05 (α = 0.05), we accept the null hypothesis that there is no difference in results between the TAs' sections and the faculty's sections.

Measuring the Power of a Test

What should a good hypothesis test do? Ideally, α and β (the probabilities of Type I and Type II errors) should both be small. A Type I error occurs when we reject a null hypothesis that is true; α, the significance level of the test, is the probability of making a Type I error. Once we decide on the significance level, there is nothing else we can do about α. A Type II error occurs when we accept a null hypothesis that is false, and its probability is β. Unfortunately, hypothesis tests cannot be foolproof: sometimes when the null hypothesis is false, a test does not reject it, and a Type II error is made.

Suppose the null hypothesis is false, that is, μ (the true population mean) does not equal μHo (the hypothesized population mean). Then managers would like the hypothesis test to reject it all the time. Unfortunately, for each possible value of μ for which the alternative hypothesis is true, there is a different probability β of incorrectly accepting the null hypothesis. We would like this β (the probability of accepting a null hypothesis when it is false) to be as small as possible, or equivalently we would like 1 − β (the probability of rejecting a null hypothesis when it is false) to be as large as possible. The value of 1 − β is a measure of how well the test is working and is known as the power of the test. A high value of 1 − β (something near 1.0) means the test is working quite well (it is rejecting the null hypothesis when it is false); a low value of 1 − β (something near 0) means that the test is working very poorly (not rejecting the null hypothesis when it is false). If we plot the values of 1 − β for each value of μ for which the alternative hypothesis is true, the resulting curve is known as a power curve.

This can be explained better with the help of an example. We were deciding whether to accept a drug shipment. Our test indicated that we should reject the null hypothesis if the standardized sample mean is less than −1.28, i.e. if the sample mean dosage is less than 99.64 cc. In Figure 2a we show this left-tailed test. In Figure 2b we show the power curve, which is plotted by computing the values of 1 − β for each value of μ for which the alternative hypothesis is true.

Point C on the power curve in Figure 2b corresponds to a true population mean dosage of 99.42 cc. Given that the population mean is 99.42 cc, we must compute the probability that the mean of a random sample of 50 doses from this population will be less than 99.64 cc (the point below which we decided to reject the null hypothesis) and thus cause the test to reject the null hypothesis. This is shown in Figure 2c.

We had computed the standard error of the mean to be 0.2829 cc. Thus 99.64 cc is (99.64 − 99.42)/0.2829 = 0.78 standard errors above the true population mean when it takes the value μ = 99.42 cc. The probability of observing a sample mean less than 99.64 cc, and thus rejecting the null hypothesis, is 0.7823, the colored area in Figure 2c. Thus the power of the test (1 − β) at μ = 99.42 is 0.7823. This simply means that if μ = 99.42, the probability that this test will reject the null hypothesis when it is false is 0.7823.

Point D in Figure 2b shows that if the population mean dosage is 99.61 cc, then 99.64 is (99.64 − 99.61)/0.2829 = 0.11 standard errors above the true population mean. Using the same procedure, the probability of observing a sample mean less than 99.64 cc, and thus rejecting the null hypothesis, is 0.5438, the colored area in Figure 2d. Thus the power of the test (1 − β) at μ = 99.61 cc is 0.5438.

Using the same procedure at point E, we find that the power of the test at μ = 99.80 cc is 0.2843; this is illustrated as the colored area in Figure 2e. The values of 1 − β continue to decrease to the right of point E. This is because as the population mean gets closer and closer to 100.00 cc, the power of the test (1 − β) gets closer and closer to the probability of rejecting the null hypothesis when the population mean is exactly 100.00 cc. That probability is nothing but the significance level of the test, which in this case is 0.10. The curve terminates at point F, which lies at a height of 0.10 directly over the population mean.

What Does the Power Curve in Figure 2b Tell Us?

As the shipment becomes less satisfactory (as the doses in the shipment become smaller), our test is more powerful: it has a greater probability of recognizing that the shipment is unsatisfactory. It also shows us, however, that because of sampling error, when the dosage is only slightly less than 100.00 cc, the power of the test to recognize this situation is quite low. Thus, if having any dosage below 100.00 cc were completely unsatisfactory, the test we have been discussing would not be appropriate.

Example
Before the 1973 oil embargo and subsequent increase in oil prices, petrol usage in the US had grown at a seasonally adjusted rate of .57% per month. In 15 randomly chosen months between 1975 and 1985, petrol usage grew at an average rate of only .33% per month, with a standard deviation of .10% per month. At a .01 level of significance, can you conclude that the growth in the use of petrol had decreased as a result of the embargo? Compute the power of the test for μ = .50, .45, and .40% per month.

Points to Ponder
• Hypothesis testing can be viewed as a six-step procedure.
• Establish a null hypothesis as well as an alternative hypothesis. It is a one-tailed test of significance if the alternative hypothesis states the direction of difference; if no direction of difference is given, it is a two-tailed test.
• Choose the statistical test on the basis of the assumptions about the population distribution and the measurement level. The form of the data can also be a factor. In light of these considerations, one typically chooses the test that has the greatest power efficiency, i.e. ability to reduce decision error.
• Select the desired level of significance. While α = 0.05 is the most frequently used level, many others are also used. The α is the significance level that we desire and is set in advance of the study.
• Compute the actual test value of the data.
• Obtain the critical test value, usually by referring to a table for the appropriate type of distribution.
• Interpret the result by comparing the actual test value with the critical test value.
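The power figures on the curve can be recomputed directly: the power at a true mean μ is the probability that the sample mean falls below the 99.64 cc rejection point, i.e. Φ((99.64 − μ)/0.2829). A quick Python check (using the cutoff and standard error from the shipment example; the results agree with the text's 0.78, 0.54 and 0.28 to within the rounding of z):

```python
from statistics import NormalDist

REJECT_BELOW = 99.64   # sample-mean cutoff for rejecting H0
SE = 0.2829            # standard error of the mean for n = 50

def power(mu):
    """Probability of rejecting H0 when the true mean is mu."""
    z = (REJECT_BELOW - mu) / SE
    return NormalDist().cdf(z)

powers = {mu: round(power(mu), 4) for mu in (99.42, 99.61, 99.80)}
# power falls as mu approaches the hypothesized 100 cc
```

Tabulating `power(mu)` over a grid of μ values below 100 traces out exactly the power curve of Figure 2b.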

LESSON 22: TUTORIAL

1. A manufacturer of petite women's sportswear has hypothesized that the average weight of the women buying its clothing is 110 pounds. The company takes two samples of its customers and finds that one sample's estimate of the population mean is 98 pounds, while the other sample produces a mean weight of 122 pounds. In the test of the company's hypothesis that the population mean is 110 pounds versus the hypothesis that the mean does not equal 110 pounds, is one of these sample values more likely to lead us to accept the null hypothesis? Why or why not?

2. On an average day, about 5 percent of the stocks on the New York Stock Exchange set a new high for the year. On Friday, Sept 18, 1992, the Dow Jones closed at 3,282 on a robust volume of over 136 million shares traded. A random sample of 120 stocks showed that 16 had set new annual highs that day. Using a significance level of 0.01, should we conclude that more stocks than usual set new highs on that day?

3. A manufacturer of vitamins for infants inserts a coupon for a free sample of its product in a package that is distributed at hospitals to new parents. Historically, about 18% of the coupons have been redeemed. Given current trends for having fewer children and starting families later, the firm suspects that today's parents are better educated on average and, as a result, more likely to use vitamin supplements for their infants. A sample of 1,500 new parents redeemed 295 coupons. Does this support, at a significance level of 2 percent, the firm's belief about today's new parents?

4. From 10,200 loans made by a state employees' credit union in the most recent five-year period, 350 were sampled to determine what proportion was made to women. This sample showed that 39% of the loans were made to women employees. A complete census of loans 5 years ago showed that 41% were women borrowers. At a significance level of .02, can you conclude that the proportion of loans made to women has changed significantly in the last five years?

5. A finance researcher developed a theory that predicted that closed-end equity funds should sell at a premium of about 5% on average. Assuming that the discount/premium population is approximately normally distributed, does the sample information support his theory? Test at the .05 level of significance.

6. A company recently criticized for not paying women as much as men claims that its average salary paid to all employees is $23,500. From a random sample of 29 women in the company, the average salary was calculated to be $23,000. If the population standard deviation is known to be $1,250 for these jobs, determine whether we could reasonably (within 2 standard errors) expect to find $23,000 as the sample mean if, in fact, the company's claim is true.

LESSON 23: TESTS OF HYPOTHESES – SMALL SAMPLES

In this and the next lesson we look at tests of statistical inference for small samples. Broadly, the main theoretical issues underlying these tests are similar to those for large samples. Since the previous few lessons have analyzed those issues at length, we shall not spend too much time on the theory in this chapter. In this lesson we will briefly review the main theoretical properties of the t distribution and then determine principles of statistical inference under various situations.

By the end of this chapter you should be able to:
1. Review the theoretical aspects of the t distribution.
2. Carry out hypothesis testing using the t distribution for small samples.
3. Apply the principles of hypothesis testing of differences between means for small sample sizes.
4. Carry out tests of differences between means for dependent samples.

Theoretical Aspects of the T Distribution

Theoretical work on the t distribution was done by W. S. Gosset in the 1900s. The Student's t distribution is used under two circumstances:
1. Where the sample size is less than 30.
2. Where the population standard deviation is not known and has to be estimated by the sample standard deviation. In this case t tests may be used even if the sample size is greater than 30.

Characteristics of the T Distribution

The relationship between the t distribution and the normal distribution is as follows:
1. Both distributions are symmetrical. However, as can be seen in Figure 1, the t distribution is flatter than the normal distribution: it is higher in the tails and has proportionately less area around the mean. This implies that we have to go further out from the mean of a t distribution to include the same area under the curve, so interval widths are much wider for a t distribution.
2. There is a different t distribution for every possible sample size. For a sample of size n we define a t distribution with n − 1 degrees of freedom. As the sample size increases, the t distribution loses its flatness and becomes approximately equal to the normal distribution; in fact, for sample sizes greater than 30 the t distribution is close enough to the normal distribution that we can use the normal distribution itself.

We also assume that the population underlying a t distribution is normal or approximately normal.

Figure 1

Degrees of Freedom

What is a degree of freedom? It is defined as the number of values we can choose freely. The concept is best illustrated with the help of an example. Consider the case

(a + b)/2 = 18

Given that the mean of these two numbers has to equal 18, how do we determine values for a and b? Basically, we can slot in any two values such that they add up to 36: if a = 10, then b has to equal 26, given the constraint. Thus in a sample of two where the value of the mean is specified (i.e. a constraint), we are only free to specify one variable; we have only one degree of freedom.

Another example:

(a + b + c + d + e + f + g)/7 = 16

Now we have 7 variables. Given the mean, we are free to specify only 6 of them; the value of the 7th variable is then determined automatically.

Using the T Distribution Tables

The t table differs in construction from the normal table in that it is more compact: it shows areas under the curve and t values for only a limited number of levels of significance (usually .10, .05 and .01). A second difference is that we must specify the degrees of freedom with which we are dealing. t values are therefore defined by a level of significance and degrees of freedom.

The normal tables focus on the chance that the sample statistic lies within a given number of standard deviations on either side of the population mean. The t distribution tables, on the other hand, measure the chance that the observed sample statistic will lie outside our confidence interval, defined by a given number of standard deviations on either side of the mean.

Suppose we are making an estimate for n = 14 at the 90% level of confidence. We go down the table vertically to the appropriate degrees of freedom (14 − 1 = 13) and then read off the t value for a level of significance of .10: t = 1.771. A t value of 1.771 shows that if we mark off plus and minus 1.771 σ̂x̄ on either side of the mean, we enclose 90% of the area under the curve.
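The degrees-of-freedom idea, that fixing the mean removes one free choice, can be shown in a couple of lines of code:

```python
# With the mean of 7 values fixed at 16, only 6 values are free to vary:
free_values = [10, 12, 15, 18, 20, 22]   # six values chosen arbitrarily
g = 7 * 16 - sum(free_values)            # the 7th value is forced by the constraint
assert (sum(free_values) + g) / 7 == 16  # the mean constraint is satisfied
```

However the six free values are chosen, the seventh is fully determined, which is why a sample of size n with a fixed mean carries n − 1 degrees of freedom.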

Reading the T Table

A sample excerpt from the t table is presented below in Table 1. We can use it to read off t values for different levels of significance and degrees of freedom. The t tables usually give the area in both tails combined at a specific level of significance. Thus if we are making an estimate at the 90% confidence limit, we would look in the t tables under the .10 column (1.0 − .9 = .1). This is actually α, the probability of error: the .10 column represents .10 of the area contained under both tails combined, so the area outside these limits, that of chance error, will be 10%.

Table 1

T Values for One-Tailed Tests

The procedure for using t tests for a one-tailed test is conceptually the same as for a one-tailed normal test, except that we need to determine the area located in only one tail. For example, to find the appropriate t value for a one-tailed test at a level of significance of .05 with 12 degrees of freedom, we look in the table under the .10 column opposite the 12-degrees-of-freedom row, because the .10 column represents .05 of the area contained in each tail separately. The t value is 1.782. This is shown in Figure 2.

Figure 2

Exercise
Find the one-tail t value for n = 13, α = .05.
Ans: degrees of freedom = 12; we look up the value under the .10 column: t = 1.782.

Example
For the following sample sizes and significance levels, find the appropriate t values:
1. n = 28, α = .05 → degrees of freedom = 28 − 1 = 27, t = ±2.052
2. n = 10, α = .01 → degrees of freedom = 9, t = ±3.250

Exercises
1. Given the following sample sizes and t values, find the corresponding confidence levels: (a) n = 27, t = ±2.056 (95%); (b) n = 5, t = ±2.132 (90%); (c) n = 18, t = ±2.898 (99%).
2. Find one-tail t values for the following: (a) n = 10, α = .01; (b) n = 15, α = .05.

Hypothesis Testing of Means

The t test is the appropriate test to use when:
1. the sample size is less than 30, or
2. the population standard deviation is not known and has to be estimated by the sample standard deviation.

The Formula for the T Statistic

t = (x̄ − μ) / σ̂x̄

where σ̂x̄ is the estimated standard error of the sample mean:

σ̂x̄ = σ̂/√n, with σ̂ = s, the sample standard deviation.

This represents the basic t test; variants of this formula are developed to meet the requirements of different testing situations. When a population is finite and the sample accounts for more than 5% of the population, we use the finite population multiplier, and the formula for the standard error is modified to

σ̂x̄ = (σ̂/√n) √((N − n)/(N − 1))
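The two standard-error formulas can be written out directly. The sample figures below (s = 11, n = 100, drawn from a hypothetical population of N = 1,000) are assumptions chosen purely for illustration:

```python
from math import sqrt

def se_mean(s, n, N=None):
    """Estimated standard error of the sample mean; applies the
    finite population multiplier when a population size N is given."""
    se = s / sqrt(n)
    if N is not None:   # sample is a non-negligible share of the population
        se *= sqrt((N - n) / (N - 1))
    return se

se_infinite = se_mean(11, 100)           # 1.1
se_finite = se_mean(11, 100, N=1_000)    # about 1.044
```

Note that the finite population multiplier always shrinks the standard error: sampling a tenth of a finite population tells us more than the same n drawn from an effectively infinite one.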

An example will make the process clearer. A personnel specialist in a corporation is recruiting a large number of employees for an overseas assignment. She believes the aptitude scores of these employees are likely to be 90. A management review finds the mean score for 20 test results to be 84, with a standard deviation of 11. Management wish to test, at the .10 level of significance, the hypothesis that the average aptitude score is 90.

Our data are as follows:

Ho: µ = 90, Ha: µ ≠ 90, α = .10, n = 20

Degrees of freedom = 19. To find t critical we look in the t table under the .10 column, which gives the t value leaving .05 of the area under each tail of the t curve: t critical = ±1.729.

As the population standard deviation is not known, we estimate it as σ̂ = s = 11, where s is the sample standard deviation. The standard error of the sampling mean is then

σ̂x̄ = σ̂/√n = 11/√20 = 2.46

The calculated t value is compared with the table t value: if |t calculated| > t critical we reject the null hypothesis at the given level of significance; otherwise we accept the null hypothesis that there is no significant difference between the sample mean and the hypothesized population mean. Here

t = (x̄ − µ)/σ̂x̄ = (84 − 90)/2.46 = −2.44

Since −2.44 < −1.729, we reject the personnel specialist's hypothesis that the true mean score of the employees being tested is 90. This is also illustrated diagrammatically in Figure 3.

When a population is finite and the sample accounts for more than 5% of the population, we use the finite population multiplier, and the formula for the standard error is modified to

σ̂x̄ = (σ̂/√n) · √((N − n)/(N − 1))

Two-Tailed Test

The specification of the null and alternative hypotheses is similar to that for the normal distribution:

Ho: µ = µo, Ha: µ ≠ µo

and the test is carried out at a prespecified level of significance α.

Exercises

1. Given a sample mean of 83, a sample standard deviation of 12.5 and a sample size of 22, test the hypothesis that the value of the population mean is 70 against the alternative hypothesis that it is more than 70. Use the 0.025 significance level.
2. If a sample of 25 observations reveals a sample mean of 52 and a sample variance of 4.2, test the hypothesis that the population mean is 50 against the alternative hypothesis that it is some other value. Use the 0.01 level of significance.
3. Picosoft, Ltd., a supplier of operating-system software for personal computers, was planning the initial public offering of its stock in order to raise sufficient working capital to finance the development of a new seventh-generation integrated system. Picosoft and its underwriters were contemplating an offering price of $21, or about 13 times earnings, current earnings being $1.61 a share. In order to check the appropriateness of this price, they randomly chose seven publicly traded software firms and found that their average price/earnings ratio was 11.6 and the sample standard deviation was 1.3. At α = .02, can Picosoft conclude that the stocks of publicly traded software firms have an average P/E ratio significantly different from 13?
4. The data-processing department at a large life insurance company has installed new color video display terminals to replace the monochrome units it previously used. The 95 operators trained to use the new machines averaged 7.1 hours on the machines before their performances were satisfactory, and their sample variance was 16.2 squared hours. Long experience with operators on the old monochrome terminals showed that they averaged 8.2 hours before achieving a satisfactory level of performance. At the 0.01 level of significance, should the supervisor of the department conclude that the new terminals are easier to learn to operate?

Tests for Differences Between Means – Small Samples

Broadly, the procedure for testing whether the sample means from two different samples are significantly different from each other is the same as for the large-sample case. The differences lie, first, in the calculation of the standard error and, secondly, in the calculation of the degrees of freedom.
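The one-sample worked example above can be checked from its summary statistics alone. A hedged sketch, assuming Python and SciPy for illustration:

```python
# Personnel-specialist example: one-sample t test from summary statistics.
from math import sqrt
from scipy import stats

x_bar, mu0, s, n = 84, 90, 11, 20
se = s / sqrt(n)                       # estimated standard error, about 2.46
t_stat = (x_bar - mu0) / se            # about -2.44
t_crit = stats.t.ppf(0.95, df=n - 1)   # 1.729 (.10 level, .05 in each tail)

reject = abs(t_stat) > t_crit
print(round(t_stat, 2), round(t_crit, 3), reject)
```

The computed values match the text: t = −2.44 against a critical value of ±1.729, so the null hypothesis is rejected.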

RESEARCH METHODOLOGY

Degrees of Freedom

In the earlier case, where we tested a sample against a hypothesized population value, we used a t distribution with n − 1 degrees of freedom. In this case we have n1 − 1 degrees of freedom for sample 1 and n2 − 1 for sample 2. When we combine the samples to estimate the pooled variance we have n1 + n2 − 2 degrees of freedom. Thus, for example, if n1 = 10 and n2 = 12, the combined degrees of freedom = 20.

Ho: µ1 = µ2    Ha: µ1 > µ2

The next step is to calculate the pooled estimate of the population variance:

s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) = [(12 − 1)(15)² + (15 − 1)(19)²] / (12 + 15 − 2) = 301.16

so that s_p = √301.16 = 17.35.

**Estimation of the Sample Standard Error of the Difference Between Two Means**

For large samples we estimated the unknown population variances separately, using s1² and s2² in place of σ1² and σ2². This is not appropriate for small samples. Instead we assume the underlying population variances are equal, σ1² = σ2², and estimate this common population variance as a weighted average of s1² and s2², where the weights are the numbers of degrees of freedom in each sample.

σ̂x̄1−x̄2 = s_p √(1/n1 + 1/n2) = 17.35 √(1/12 + 1/15) = 6.72

We then calculate the t statistic for the difference between two means:

t = (x̄1 − x̄2) / σ̂x̄1−x̄2 = (92 − 84) / 6.72 = 1.19

In general, the pooled estimate of the population variance is

s_p² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Since it is a one-tailed test at the .05 level of significance, we look in the .10 column against 25 degrees of freedom: t critical at the .05 level of significance = 1.708. Since calculated t (1.19) < t critical (1.708), we accept the null hypothesis: the evidence does not show the first (informal) method to be significantly more effective than the second.
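The pooled two-sample calculation above can be sketched end to end from the summary statistics (Python and SciPy assumed for illustration):

```python
# Sensitivity-programme example: pooled two-sample t test from summary stats.
from math import sqrt
from scipy import stats

n1, x1, s1 = 12, 92, 15   # informal programme
n2, x2, s2 = 15, 84, 19   # formal classroom programme

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)                       # about 6.72
t_stat = (x1 - x2) / se                                      # about 1.19
t_crit = stats.t.ppf(0.95, df=n1 + n2 - 2)                   # 1.708, one tail

print(round(sqrt(sp2), 2), round(t_stat, 2), t_stat < t_crit)
```

Note that the pooled *variance* is 301.16; the 17.35 quoted in the text is its square root, the pooled standard deviation.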

Once we have our estimate of the population variance, we can use it to determine the standard error of the difference between two sample means, i.e. we obtain an equation for the estimated standard error of x̄1 − x̄2.

Exercises

1. A consumer research organization routinely selects several car models each year and evaluates their fuel efficiency. In this year's study of two small cars, the average mileage for 12 cars of brand A was 27.2 km per litre with a standard deviation of 3.8 km per litre. Nine brand B cars were tested; they averaged 32.1 km per litre with a standard deviation of 4.3 km per litre. At α = .01, should the survey conclude that brand A cars have lower mileage than brand B cars?
2. Connie Rodrigues, the Dean of Students at Mid State College, is wondering about grade distributions at the school. She has heard grumbling that the GPAs in the Business School are about 0.25 lower than those in the College of Arts and Sciences. A quick random sampling produced the following GPAs:
Business: 2.86 2.77 3.18 2.80 3.14 2.87 3.19 3.24 2.91 3.00
Arts & Sciences: 2.83 3.35 3.32 3.36 3.63 3.41 3.37 3.45 3.43 3.44 3.17 3.26 3.18
Do these data indicate that there is a factual basis for the grumbling? State and test appropriate hypotheses at α = 0.02.
3. A credit-insurance organization has developed a new high-tech method of training new sales personnel. The company sampled 16 employees who were trained the original way and found average daily sales to be $688, with a sample standard deviation of $32.63. They also sampled 11 employees who were trained using the new method and found average daily sales to be $706, with a sample standard deviation of $24. At α = 0.05, can the company conclude that average daily sales have increased under the new plan?
4. To celebrate their first anniversary, Randy Nelson decided to buy diamond earrings for his wife Debbie. He was shown nine pairs with marquise gems weighing approximately 2 carats per pair. Because of differences in the colors and

σ̂x̄1−x̄2 = s_p √(1/n1 + 1/n2)

The null hypothesis in this case is Ho: µ1 = µ2.

t = (x̄1 − x̄2) / σ̂x̄1−x̄2

An example will help make this clearer: A company investigates two programmes for improving the sensitivity of its managers. The first was a more informal one, whereas the second involved more formal classroom instruction. The informal programme is more expensive, and the president wants to know, at the .05 level of significance, whether this extra expenditure has resulted in greater sensitivity. 12 managers were observed under the first method and 15 under the second. The sample data are as follows:

Programme   No. of managers observed   Mean sensitivity index   Estimated standard deviation of sensitivity
1           12                         92%                      15%
2           15                         84%                      19%


qualities of the stones, the prices varied from set to set. The average price was $2,990, and the sample standard deviation was $370. He also looked at six pairs with pear-shaped stones of the same approximate 2-carat weight. These earrings had an average price of $3,065, and the sample standard deviation was $805. On the basis of this evidence, can Randy conclude (at a significance level of 0.05) that pear-shaped diamonds cost more, on average, than marquise diamonds?


References

Levin and Rubin, Statistics for Management



**LESSON 24: NON-PARAMETRIC TESTS**

So far we have discussed a variety of tests that make inferences about a population parameter such as the mean or the population proportion. These are termed parametric tests, and they use parametric statistics from samples drawn from the population being tested. To use these tests we must make several restrictive assumptions about the populations from which we draw our samples; for example, we assumed that the underlying population is normal. However, underlying populations are not always normal, and in those situations we need tests that do not make restrictive assumptions about the shape of the population distribution. These are known as non-parametric or distribution-free tests. There are many such tests; we shall learn about some of the more popular ones. One disadvantage of these tests is that they are not as sharp or efficient as parametric tests: the estimate of an interval using a non-parametric test may be twice as large as for the parametric case. When we use non-parametric tests we trade off sharpness in estimation for the ability to make do with less information and to calculate faster. What happens when we use the wrong test in the wrong situation? Generally, parametric tests are more powerful than non-parametric tests: when the parametric assumptions hold, a non-parametric method has a greater probability of committing a Type II error (accepting a false null hypothesis).

Exercise

1. What is the difference between the kinds of questions answered by parametric tests and those by non-parametric tests?

**Non-Parametric Tests**

What are non-parametric tests? Statistical tests that do not require an estimate of the population variance or mean, and do not state hypotheses about parameters, are considered non-parametric tests.

When do you use non-parametric tests? Non-parametric tests are appropriately used when one or more of the assumptions underlying a particular parametric test has been violated (generally normality or homoscedasticity). Generally, however, the t-test is fairly robust to all but the severest deviations from its assumptions.

How do you know if the data are normally distributed? Several techniques are generally used to find out whether a population has an underlying normal distribution. These include: the goodness-of-fit test (low power), graphical assessment (not quantitative), Shapiro-Wilk's W (n < 50), and the D'Agostino-Pearson K² test (preferred).

As noted, non-parametric tests do not assume normality of the population distribution, and the hypotheses of a non-parametric test are concerned with something other than the value of a population parameter. The main advantage of non-parametric methods is that they do not require the underlying population to have a normal, or any other particular, distribution. Their main disadvantage is that they ignore a certain amount of information. For example, we can convert numerical data to non-parametric form by replacing values such as 113.45, 189.42, 76.5 and 101.79 with their ascending (or descending) ranks 1, 2, 3 and 4. If we represent 189.42, the largest value, by the rank 4, we lose some of the information contained in the value itself: the rank 4 could equally well represent 1189.42, as that would also be the largest value. The use of ranked data therefore leads to some loss of information.
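The ranking step described above can be illustrated with SciPy's `rankdata` (an illustrative library choice, not part of the course material):

```python
# Converting numerical data to ranks, and the information this discards.
from scipy.stats import rankdata

values = [113.45, 189.42, 76.5, 101.79]
ranks = rankdata(values)   # ascending ranks: smallest value gets rank 1
print(list(ranks))         # [3.0, 4.0, 1.0, 2.0]

# Rank 4 only says that 189.42 is the largest; 1189.42 would receive the
# same rank, so converting to ranks discards the magnitude information.
```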

**Important Types of Non-Parametric Tests**

Since the theory behind these tests is beyond the scope of our course we shall look at relevant applications and methodologies for carrying out some of the more important non parametric tests. Non-parametric tests are frequently used to test hypotheses with dependent samples.

**The Dependent-Sample Tests are**

1. The sign test for paired data, where positive and negative signs are substituted for quantitative values
2. The McNemar test
3. The Cochran Q test
4. The Wilcoxon test

**Non-Parametric Tests for Independent Samples are**

1. The chi-square test
2. The Kolmogorov-Smirnov one-sample test

Of these tests we shall cover the McNemar test, the Mann-Whitney U test, the Kolmogorov-Smirnov test and the Wilcoxon test.

**Table of Equivalent Parametric and Non-Parametric Tests**

PARAMETRIC           NON-PARAMETRIC
Independent t-test   Mann-Whitney U test
Paired t-test        Wilcoxon signed-rank test
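Assuming Python and SciPy purely for illustration, the pairings in the table map directly onto library calls; the two small samples below are hypothetical numbers invented for the sketch:

```python
# Parametric tests and their non-parametric equivalents, side by side.
from scipy import stats

a = [12, 15, 11, 18, 14]   # hypothetical sample 1
b = [22, 19, 25, 17, 21]   # hypothetical sample 2

# Independent t-test  <->  Mann-Whitney U test
t_ind = stats.ttest_ind(a, b)
u_res = stats.mannwhitneyu(a, b, alternative="two-sided")

# Paired t-test       <->  Wilcoxon signed-rank test
t_rel = stats.ttest_rel(a, b)
w_res = stats.wilcoxon(a, b)

print(u_res.statistic, w_res.statistic)
```

The non-parametric calls use only the ranks of the observations, which is exactly the information-for-robustness trade-off described above.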

**When Do We Use Non-Parametric Tests?**

We use non-parametric tests in at least one of the following five types of situations:


1. The data entering the analysis are enumerative; that is, counted data represent the number of observations in each category or cross-category.
2. The data are measured and/or analyzed using a nominal scale of measurement.
3. The data are measured and/or analyzed using an ordinal scale of measurement.
4. The inference does not concern a parameter in the population distribution; for example, the hypothesis that a time-ordered set of observations exhibits a random pattern.
5. The probability distribution of the statistic upon which the analysis is based is not dependent upon specific information or conditions (i.e., assumptions) about the population(s) from which the sample(s) are drawn, but only upon general assumptions, such as a continuous and/or symmetric population distribution.

According to these criteria, the distinction of non-parametric is accorded because of:
1. The level of measurement used or required for the analysis, as in types 1, 2 and 3: we use counted, nominal-scale or ordinal-scale data.
2. The type of inference, as in type 4: we do not make inferences about population parameters such as the mean.
3. The generality of the assumptions made about the population distribution, as in type 5: we do not know, or do not assume, a specific form for the underlying population distribution.

We now look at a few of the non-parametric tests in more detail, including their applications.


McNemar Test

This test is used for analyzing research designs of the before-and-after format where the data are measured nominally; the samples therefore become dependent or related samples. The use of the test is limited to cases where a 2×2 contingency table is involved, and it is most popularly used to test the response of the same group in a pre- and post-treatment situation. We can illustrate its use with an example: A survey of 260 consumers was taken to test the effectiveness of mailed coupons, i.e. their effect on individuals' purchase rate for the product. The researcher took a random sample of consumers before the release of the coupons to assess their purchase rate and, on the basis of their responses, divided them into groups by purchase rate (low, high). After the campaign they were again asked to complete the forms and were again classified by purchase rate. Table 1 shows the results from our sample.

Table 1

Before campaign       After campaign: High purchase rate   Low purchase rate
Low purchase rate     70 (A)                               180 (B)
High purchase rate    80 (C)                               30 (D)

Cases that showed a change between the before and after measurements of purchase response were placed in cells A and D. This was done as follows:

• An individual is placed in cell A if he or she changed from a low purchase rate to a high purchase rate.
• Similarly, he is placed in cell D if he changed from a high to a low rate.
• If no change is observed in his rate, he is placed in cell B or C.

**Non-Parametric vs. Distribution-Free Tests**

As we have seen, non-parametric tests are those used when some specific condition for an ordinary test is violated. Distribution-free tests are those whose procedure is valid for all shapes of the population distribution. For example, the chi-square test concerning the variance of a given population is parametric, since this test requires that the population distribution be normal. On the other hand, the chi-square test of independence does not assume normality, or even that the data are numerical. The Kolmogorov-Smirnov test is a distribution-free test applicable to comparing two populations with any distribution of a continuous random variable.

**Standard Uses of Non-Parametric Tests**

Mann-Whitney: to be used with two independent groups (analogous to the independent-groups t-test). The Mann-Whitney rank test serves as a non-parametric alternative to Student's t-test when the data are not normally distributed.
Wilcoxon: to be used with two related (i.e., matched or repeated) groups (analogous to the dependent-samples t-test).
Kruskal-Wallis: to be used with two or more independent groups (analogous to the single-factor between-subjects ANOVA).
Friedman: to be used with two or more related groups (analogous to the single-factor within-subjects ANOVA).

The researcher wishes to determine whether the mail coupon campaign was a success. We shall now briefly outline the steps involved in applying the McNemar test.

Step 1

We state the null hypothesis. This essentially states that there is no perceptible or significant change in the purchase behavior of individuals: among individuals who change their purchase rate, the probability of changing from high to low equals that of changing from low to high, and each is equal to .5.

Ho: P(A) = P(D)
Ha: P(A) ≠ P(D)

To test the null hypothesis we examine the cases of change, i.e. cells A and D.


Step 2

The level of significance is chosen, for example α = .05.

Step 3

We now decide on the appropriate test. The McNemar test is appropriate because this is a before-and-after study, the data are nominal, and two related samples are involved. The McNemar test involves calculating a chi-square value, given by the formula

χ² = (|A − D| − 1)² / (A + D)

That is, we take the absolute difference between the A and D cells, reduce it by 1 (a continuity correction), square it, and divide by A + D.

For the Kolmogorov-Smirnov example discussed below: we refer the cumulative distribution of choices to its sampling distribution to judge whether the divergence between the observed and theoretical distributions is likely to be due to chance or reflects a genuine preference. Suppose a sample of 200 homeowners gave the following shade-preference distribution: very light 80, light 60, bright 40, dark 20. The manufacturer asks whether these results indicate a preference. The data are shown in Table 2 below.

Step 4

The decision rule: for α = .05, the critical value of χ² with 1 degree of freedom is 3.84. We therefore reject the null hypothesis if the calculated χ² exceeds this critical value from the tables.

Table 2

Rank of shade chosen                                    Very light   Light   Bright   Dark
F = no. of homeowners choosing that rank                80           60      40       20
Fo(X) = theoretical cumulative distribution under Ho    .25          .50     .75      1.00
Sn(X) = cumulative distribution of observed choices     .40          .70     .90      1.00
|Fo(X) − Sn(X)|                                         .15          .20     .15      .00
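The D statistic for Table 2 can be computed directly from the frequencies. A plain-Python sketch; nothing beyond the arithmetic in the text is assumed:

```python
# Kolmogorov-Smirnov one-sample test for the homeowner shade preferences.
from math import sqrt

observed = [80, 60, 40, 20]            # very light, light, bright, dark
n = sum(observed)                      # 200 homeowners
fo = [0.25, 0.50, 0.75, 1.00]          # theoretical cumulative under Ho

sn, cum = [], 0
for f in observed:                     # observed cumulative: .40 .70 .90 1.00
    cum += f
    sn.append(cum / n)

d = max(abs(a - b) for a, b in zip(fo, sn))   # largest divergence, 0.20
d_crit = 1.36 / sqrt(n)                        # large-sample critical value
print(round(d, 2), round(d_crit, 3), d > d_crit)
```

This reproduces the values used in the test steps below: D = .20 against a critical value of about .096.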

Step 5

We now actually calculate the test statistic:

χ² = (|A − D| − 1)² / (A + D) = (|70 − 30| − 1)² / 100 = 15.21

Step 6

We draw a statistical conclusion. Since the calculated χ² (15.21) exceeds the critical value (3.84), we reject the null hypothesis and can infer that the mail coupon campaign was successful in increasing the purchase rate of the product under study.

When an analysis involves more than two related samples we use the Cochran Q test, which applies to situations involving repeated observations where the dependent variable can take on only two values, 0 or 1.
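The McNemar calculation in Steps 5 and 6 is simple enough to sketch in a few lines of plain Python; the 3.84 critical value is the table value quoted in Step 4:

```python
# McNemar chi-square with the continuity correction used in the text.
A, D = 70, 30            # cells that changed: low->high and high->low

chi2 = (abs(A - D) - 1) ** 2 / (A + D)   # (|70-30|-1)^2 / 100
chi2_crit = 3.84                          # chi-square, 1 df, alpha = .05

print(round(chi2, 2), chi2 > chi2_crit)   # reject Ho if chi2 > 3.84
```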

**Tests for Ordinal Data**

So far the tests we have discussed are applicable only to nominal data. We now look at a test specifically designed for ordinal data.

**Kolmogorov-Smirnov One-Sample Test**

This test is similar to the chi-square test of goodness of fit, in that it looks at the degree of agreement between the distribution of observed values and some specified theoretical distribution (expected frequencies). The Kolmogorov-Smirnov test is used when we want to compare distributions on an ordinal scale. We can look at an example to see how the test works: A paint manufacturer is interested in testing preferences among four different shades of a colour: very light, light, bright and dark. Each respondent is shown four prints of the shades and asked to indicate his preference. If colour shade is unimportant, the four shades should be chosen equally often, except for random differences. If colour shade is important, respondents should prefer one of the extreme shades. Since shade represents a natural ordering, the Kolmogorov-Smirnov test is applied to test the preference hypothesis. The test involves specifying the cumulative frequency distribution and referring it to the appropriate sampling distribution.

We would carry out the test as follows:

1. Specification of the null hypothesis. Ho: there is no difference in preference among the shades. Ha: there is a difference in preference among the shades of the new colour.
2. The level of significance: the test is conducted at the 5% level, α = .05.
3. Decision regarding the statistical test. The Kolmogorov-Smirnov test is appropriate because the data measured are ordinal and we are interested in comparing the observed frequency distribution with a theoretical distribution. The test focuses on the largest of the deviations between the observed and theoretical cumulative proportions:
D = max |Fo(X) − Sn(X)|
where Fo(X) is the specified cumulative frequency distribution under Ho for any value of X, i.e. the proportion of cases expected to have scores equal to or less than X, and Sn(X) is the observed cumulative frequency distribution of a random sample of N observations.
4. The decision rule. For large samples the critical value of D at α = .05 is given by 1.36/√n, where n is the sample size. Here the critical value is 1.36/√200 = .096, so the null hypothesis will be rejected if the computed D > .096.
5. Calculation of the test statistic. The theoretical cumulative distribution is calculated by taking each class frequency as if it were under the null hypothesis: 50/200 = .25, 100/200 = .50, 150/200 = .75, 200/200 = 1.00. The observed cumulative frequencies are 80/200 = .40, 140/200 = .70, 180/200 = .90 and 200/200 = 1.00. The calculated D value is the point of greatest divergence between the observed and theoretical cumulative distributions; in our example this is .20.
6. Drawing a statistical conclusion. Since the calculated D value (.20) exceeds the critical value (.096), the null hypothesis of no difference among the shades is rejected.

Mann-Whitney U Test

This test is used for data which are ordinal and can be ranked, where the samples are independent. It makes use of the actual ranks of the observations as a means of testing hypotheses about the identity of two population distributions. The formulas for the Mann-Whitney U values are:

U1 = n1·n2 + n1(n1 + 1)/2 − R1
U2 = n1·n2 − U1

where n1 and n2 are the two sample sizes and R1 and R2 are the sums of the ranks for each group.

An example will illustrate this test. We examine regular and commercial account satisfaction data from Table 3, which contains the attitude scores obtained from 30 customers (15 regular accounts and 15 commercial accounts). The scores from the combined samples were ranked in terms of magnitude, the highest score receiving rank 1, the next highest rank 2, and so on.

Table 3

Regular accounts: score (rank)    Commercial accounts: score (rank)
19 (27)                           48 (21)
70 (12)                           52 (19)
77 (7)                            69 (13)
14 (29)                           13 (30)
83 (3)                            73 (9)
87 (2)                            15 (28)
68 (14)                          50 (20)
72 (10)                           61 (17)
76 (8)                            21 (26)
90 (1)                            47 (22)
26 (25)                           80 (5)
66 (15)                           36 (24)
60 (18)                           78 (6)
81 (4)                            71 (11)
46 (23)                           65 (16)

1. The null hypothesis Ho is that there is no difference in the attitudes of the two groups of accounts towards bank services; Ha is that there is a significant difference in attitudes between the two groups.
2. The level of significance is α = .05. Since no direction of difference is predicted, a two-tailed test is used.
3. The Mann-Whitney test is chosen because the data are ordinal, have been converted into ranks, and the samples are independent. We are testing the probability of obtaining a U value as small as the smaller of the two if the two groups are indeed similar in their attitudes. Inspection of the formulas shows that the more similar the two groups are, the closer the two U values will be to each other; the larger the difference between the underlying populations, the smaller the smaller U value will be.
4. Calculation of the test statistic. Letting regular accounts be sample 1 and commercial accounts be sample 2: n1 = 15, n2 = 15, R1 = 198, R2 = 267.
U1 = (15)(15) + 15(15 + 1)/2 − 198 = 225 + 120 − 198 = 147
U2 = (15)(15) − 147 = 78
The Mann-Whitney U is the smaller of the two values calculated, U = 78.
5. The decision rule. The critical value U* is found in Appendix I; for n1 = 15, n2 = 15 and a two-tailed test at α = .05, U* = 64. For this test the computed value must be 64 or less to reject the null hypothesis. This decision rule is just the opposite of the decision-making procedure we followed for most of the other tests of significance.
6. Drawing a statistical conclusion. Since the computed U (78) is larger than the critical U* (64), it does not fall in the rejection region and the null hypothesis is not rejected: the evidence does not support a difference in the attitudes of the two groups.

The Kruskal-Wallis test is an extension of the Mann-Whitney U test to situations where more than two independent samples are being compared; for example, it could be used if the customers were divided into three or more groups based on some criterion, such as regular, commercial and charge accounts.

Signed Rank or Wilcoxon Test

This test is the complement of the Mann-Whitney U test and is used when ordinal data on two samples are involved and the two samples are related or dependent; it is therefore suited to a pre-test and post-test situation. The test can also be used for interval or ratio data when the assumptions underlying the parametric z or t test cannot be met. The procedure is very simple. The signed difference between each pair of observations is found; the differences are then rank-ordered without regard to their algebraic sign; finally, the sign of each difference is attached to the rank for that difference. The test statistic, T, is the smaller of the two sums of ranks (positive or negative). If the null hypothesis is true, the sums of the positive and negative ranks should be approximately equal; the larger the difference between the underlying populations, the smaller T will be.

An example illustrates the procedure. Suppose that our bank in the previous examples wanted to test the effectiveness of an ad campaign intended to enhance awareness of the bank's service features. The bank administered a questionnaire, designed to measure awareness of the services offered, before the ad campaign; after the campaign, it administered the same questionnaire to the same group of people. The bank wished to determine whether there was any change in awareness of services offered due to the ad campaign. Both the before- and after-campaign scores are presented in Table 4.

[Table 4, Consumer Awareness of Bank Services Offered: before- and after-campaign scores for the 10 consumers, with their differences and signed ranks; only fragments of the table are recoverable from the source.]

1. The null hypothesis to be tested is that there is no difference in awareness of services offered after the ad campaign. The alternative hypothesis is that awareness was greater after the ad campaign. Since the direction of the difference is predicted, a one-tailed test is appropriate.
2. The level of significance: it was decided that α = 0.05.
3. The statistical test. The Wilcoxon test is appropriate because the study is of related samples, the data measured are ordinal, and the differences can be ranked in magnitude.
4. Calculation of the test statistic. The test statistic T is the smaller of the two sums of ranks; here T = 6.5, since the smaller sum is associated with the negative differences.
5. The decision rule. The critical value of the Wilcoxon T is found in Appendix J; for n = 10 at the 0.05 level of significance and a one-tailed test it is 10. A computed T value of 10 or less rejects the null hypothesis; as with the Mann-Whitney U, the computed value must be small to reject.
6. We draw a statistical conclusion. Since the computed T value of 6.5 is less than the critical T value of 10, the null hypothesis, which states that there is no difference in awareness of bank services, is rejected.

Exercises

Small Grocers Association

The Small Grocers Association is a group of independent grocers who have banded together so that they may compete with the larger supermarket chains. By making large purchases as a group, the association is able to pay less than an individual store purchasing on its own would. As part of its operating policy, the association wants to advertise that its prices are really no higher in the neighborhood stores that are members of the association than in supermarkets. It has chosen one product, a 5-pound ham, with which to make its point in the advertisement. At its recent monthly meeting, the executive board brought the advertising campaign to the members for approval. One member stated that "we had better make sure of our claim" and asked the association's marketing director to check this out before the copy was submitted to the newspaper. The board of directors agreed, and the marketing director obtained a random sample of prices from six neighborhood association member stores and nine supermarkets.

Discussion Questions
1. What analytical technique would be appropriate for making a formal analysis?
2. What are the null and the alternative hypotheses to be tested?
3. Run an analysis of the data using the 0.05 level of significance. What conclusions can you draw?

National Motors, Inc.

National Motors is a manufacturer of motor scooters. As part of their operating policy, the executives wished to determine whether there was a difference between the dealers' customers and the company's dealers in terms of satisfaction with the company's warranty policy. National Motors' marketing research department developed a questionnaire utilizing Likert-type statements that encompassed a full range of service and warranty questions. The researchers believed the data obtained from the questionnaires were ordinal. The questionnaires were mailed to a random sample of customers who had returned the National Motors warranty card, and a second mailing was sent to a sample of dealers. The scores were as follows:

Customers   Dealers
74          81
35          59
90          33
82          68
56          46
92          42
54          59
83          30
34          54
39          65
43          23
88          55
67          53
85          70
30          75
49          32
52          27
81          68
51          65
25          46

Discussion Questions
1. What analytical technique(s) would be appropriate for making a formal analysis?
2. What is the null hypothesis to be tested?
3. Run an analysis of the data using the 0.05 level of significance. What conclusions can you draw?

References

1. Levin and Rubin, Statistics for Management
2. Luck and Rubin, Marketing Research
3. Dr Ashram's web site

RESEARCH METHODOLOGY

LESSON 25: CHI-SQUARE TEST

Student, by the end of this lesson you should be aware of the chi-square test for goodness of fit and for independence of attributes.

Key Definitions
• Chi-square distribution: A distribution obtained from multiplying the ratio of sample variance to population variance by the degrees of freedom when random samples are selected from a normally distributed population.
• Contingency table: Data arranged in table form for the chi-square independence test.
• Expected frequency: The frequencies obtained by calculation.
• Observed frequency: The frequencies obtained by observation. These are the sample frequencies.
• Independence test: A test to see if the row and column variables are independent.
• Goodness-of-fit test: A test to see if a sample comes from a population with the given distribution.

The Chi-Square Distribution
The chi-square (χ2) distribution is obtained from the values of the ratio of the sample variance and the population variance multiplied by the degrees of freedom. This occurs when the population is normally distributed with population variance σ2. Symbolically, chi-square is defined as χ2 = (n - 1)s2/σ2 (with the usual notations).

The chi-square distribution is a mathematical distribution that is used directly or indirectly in many tests of significance. The most common use of the chi-square distribution is to test differences between proportions. Although this test is by no means the only test based on the chi-square distribution, it has come to be known as the chi-square test. The chi-square distribution has one parameter, its degrees of freedom (df).

Properties of the Chi-Square
• Chi-square is non-negative. Since it is the ratio of two non-negative values, it must be non-negative itself.
• Chi-square is non-symmetric: it has a positive skew, and the skew is less with more degrees of freedom.
• For each degree of freedom, we have one chi-square distribution.
• The degrees of freedom when working with a single population variance is n - 1.
• The mean of a chi-square distribution is its df. The mode is df - 2 and the median is approximately df - 0.7.

Chi-Square Probabilities
Since the chi-square distribution isn't symmetric, the method for looking up left-tail values is different from the method for looking up right-tail values. The table requires the area to the right of the critical value.
• Area to the right: just use the area given.
• Area to the left: subtract the given area from one and look this area up in the table.
• Area in both tails: divide the area by two. Look up this area for the right critical value and one minus this area for the left critical value.

Degrees of Freedom Which Aren't in the Table
When the degrees of freedom aren't listed in the table, there are a couple of choices that you have.
• You can interpolate. Interpolation involves estimating the critical value by figuring how far the given degrees of freedom are between the two df in the table and going that far between the critical values in the table. This is probably the more accurate way. (Most people born in the 70s didn't have to learn interpolation in high school because they had calculators which would do logarithms; in the "good old" days we had to use tables.)
• You can go with the critical value which is less likely to cause you to reject in error (a type I error): for a right-tail test this is the larger critical value, for a left-tail test the smaller one, and for a two-tail test the value further to the left and the value further to the right.
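As a quick sketch of the interpolation choice described above (the function name is illustrative, not from any library), linear interpolation between the tabulated 5% right-tail critical values for 30 df (43.773) and 40 df (55.758) estimates the critical value for 35 df:

```python
# Linear interpolation between two tabulated chi-square critical values.
# Table values assumed: right-tail area 0.05, df = 30 and df = 40.
def interpolate_critical(df, df_lo, crit_lo, df_hi, crit_hi):
    """Estimate the critical value for a df that falls between two table rows."""
    fraction = (df - df_lo) / (df_hi - df_lo)   # how far df lies between the rows
    return crit_lo + fraction * (crit_hi - crit_lo)

# df = 35 is halfway between 30 and 40, so the estimate is the midpoint.
estimate = interpolate_critical(35, 30, 43.773, 40, 55.758)
print(round(estimate, 4))   # 49.7655 (the exact table value is 49.802)
```

The small gap between 49.7655 and the exact 49.802 illustrates why the text calls interpolation "probably the more accurate way" rather than exact.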

Goodness-of-Fit Test
The idea behind the chi-square goodness-of-fit test is to see if the sample comes from the population with the claimed distribution. Another way of looking at that is to ask if the frequency distribution fits a specific pattern.

Two values are involved: an observed value, which is the frequency of a category from a sample, and the expected frequency, which is calculated based upon the claimed distribution. The square of the deviation between them is divided by the expected frequency in order to weight the frequencies: a difference of 10 may be very significant if 12 was the expected frequency, but a difference of 10 isn't very significant at all if the expected frequency was 1200. If the sum of these weighted squared deviations is small, the observed frequencies are close to the expected frequencies and there would be no reason to reject the claim that the sample came from that distribution. Only when the sum is large is there reason to question the distribution. Therefore, the chi-square goodness-of-fit test is always a right-tail test.

The following are properties of the goodness-of-fit test:
• The data are the observed frequencies. This means that there is only one data value for each category.
• The degrees of freedom are one less than the number of categories, not one less than the sample size.
• It is always a right-tail test.
• It has a chi-square distribution.
• The value of the test statistic doesn't change if the order of the categories is switched.

The test statistic has a chi-square distribution when the following assumptions are met:
• The data are obtained from a random sample.
• The expected frequency of each category must be at least 5. This goes back to the requirement that the data be normally distributed: you're simulating a multinomial experiment (a discrete distribution) with the goodness-of-fit test (a continuous distribution), and if each expected frequency is at least five then you can use the normal distribution to approximate, much like the binomial.

Test for Independence
In the test for independence, the claim is that the row and column variables are independent of each other; this is the null hypothesis. The multiplication rule said that if two events were independent, then the probability of both occurring was the product of the probabilities of each occurring. This is key to working the test for independence. Remember, all hypothesis testing is done under the assumption that the null hypothesis is true. If you end up rejecting the null hypothesis, then the assumption must have been wrong and the row and column variables are dependent.

The test statistic used is the same as for the chi-square goodness-of-fit test, and the principle behind the two tests is the same. In fact, you can think of the test for independence as a goodness-of-fit test where the data are arranged into table form. This table is called a contingency table. The test for independence is always a right-tail test.

The following are properties of the test for independence:
• The data are the observed frequencies, arranged into a contingency table.
• The degrees of freedom are the degrees of freedom for the row variable times the degrees of freedom for the column variable. It is not one less than the sample size.
• The expected value for a cell is computed by taking the row total times the column total and dividing by the grand total.
• The value of the test statistic doesn't change if the order of the rows or columns is switched, or if the rows and columns are interchanged (transpose of the matrix).
• It is always a right-tail test.
• It has a chi-square distribution.

The same assumptions apply as for the goodness-of-fit test: the data are obtained from a random sample, and the expected frequency of each cell must be at least 5.

Applications of the Chi-Square Test
Goodness of Fit: Habitat Use. If you are looking for habitat selection or avoidance in a species (e.g., black bear), you can use a goodness-of-fit test to see if the animals are using habitat in proportion to its availability. If you have collected 100 radio-locations on a bear or bears and the species were not selecting or avoiding habitat, you would find the radio-locations spread across the habitat types depending on their general availability (if 90% of the area were lowland conifer, you would expect 90% of the locations to occur in that habitat type).
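The weighted sum of squared deviations described above is easy to compute directly. As a sketch (the helper name is illustrative), here it is applied to the die-rolling exercise that appears later in this lesson, where 24 rolls of a fair die give an expected frequency of 4 per face:

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 24 rolls of a die: observed counts for faces 1..6; a fair die expects 4 each.
observed = [8, 4, 1, 8, 3, 0]
expected = [4] * 6
stat = chi_square_gof(observed, expected)
df = len(observed) - 1          # one less than the number of categories
print(stat, df)                 # 14.5 with 5 degrees of freedom
```

Since 14.5 exceeds the 5% critical value of 11.07 for 5 degrees of freedom, the fairness hypothesis would be rejected.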

Imagine you generate the following data from your spring bear study:
Ho: The bears are using habitat in proportion to its availability.
1) Fill in the expected number of radio-locations in each cover type.
2) Calculate the chi-square value for these bears.
3) Knowing that the critical value for 5 degrees of freedom (alpha = 0.05) is 11.07, would you accept or reject the null hypothesis? What does that mean in layman's terms?

Survival. Suppose you are investigating the population dynamics of the deer herd at Sandhill and hypothesize that a constant death rate of 50% per year has existed over the past several years with a stable herd. You sample the population at random and classify deer into age groups as indicated. The expected value (X) of the first age group is obtained from the formula

X + (d + d^2 + ... + d^(n-1))X = T

where d is the death rate and T is the total sample. With d = 0.5 and T = 253:

X + (0.5 + 0.25 + 0.125 + 0.0625 + 0.03125)X = 253
X + (0.96875)X = 253
1.96875X = 253
X = 129

Subsequent expected values are computed by applying the expected 50% death rate (d) for each succeeding year.
1) State the null hypothesis.
2) Calculate the chi-square value.
3) Knowing that the critical value for chi-square (at alpha = 0.05) is anything greater than 9.49, what do you conclude about the "fit" between the observed and hypothesized death rates? Are there significant differences?

Model Selection. Many programs develop predictive equations for data sets, and a common test for the "fit" of a model to the data is the chi-square goodness-of-fit test. For example, program DISTANCE, which develops curves to estimate probabilities of detection, uses discrete distance categories (e.g., 0-5m, 5-10m, etc.) to see how well the model predicts the number of objects that should be seen in each distance category (Expected) versus what the data actually show (Observed).

Contingency Tables and Tests for Independence of Factors: Survival. Is the probability of being male or female independent of being alive or dead? Let's use data on ruffed grouse. You collect mortality data from 100 birds you radio-collared and test the following hypothesis:
Ho: The 2 sets of attributes (death and sex of bird) are unrelated (independent).
Expected values for each cell can be calculated by multiplying the row total by the column total and dividing by the grand total. EX: Expected Value = 70 * 67 / 100 = 46.9.
1) Calculate the chi-square value.
2) Given the degrees of freedom (r-1)*(c-1) and the corresponding critical value, what can you conclude about the independence of these 2 factors?

Chi-Square Test for Independence (2-Way Chi-Square) (SPSS Output). A large-scale national randomized experiment was conducted in the 1980s to see if daily aspirin consumption (as compared to an identical but inert placebo) would reduce the rate of heart attacks. This study (the Physicians' Health Study) was described in one of the episodes of the statistics video series "Against All Odds". Here are the actual results from the study, using 22,071 doctors who were followed for 5 years:

            Heart Attack
            Absent    Present    Total
Aspirin     10,933    104        11,037
Placebo     10,845    189        11,034
Total       21,778    293        22,071 (grand total)

SPSS: Data > Weight Cases > Weight Cases by > click over Count
Analyze > Descriptive Statistics > Crosstabs
Row: aspirin; Column: heartatt
Statistics > Chi-square; Cells > row percentages
chi-square = 25.01; critical chi-square (1 df, alpha = 0.05) = 3.84. Statistical decision: Reject H0.
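The age-structure calculation in the deer survival example is just a geometric series, so it can be sketched in a few lines (the helper name is illustrative; 6 age classes and d = 0.5 are assumed, matching the worked figures):

```python
def expected_age_structure(total, death_rate, n_classes):
    """Expected count per age class for a stable herd with a constant death rate.

    Solves X * (1 + d + d^2 + ... + d^(n-1)) = total for the youngest class X,
    then applies the death rate once per succeeding year.
    """
    series = sum(death_rate ** i for i in range(n_classes))   # 1.96875 for d=0.5, n=6
    first = total / series                                     # youngest age class
    return [first * death_rate ** i for i in range(n_classes)]

classes = expected_age_structure(253, 0.5, 6)
print(round(classes[0]))   # 129, matching the worked example
```

The expected counts halve with each age class and, by construction, sum back to the sampled total of 253.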

Conclusion of the aspirin example: A chi-square analysis indicated that there was a significant relationship between aspirin condition and incidence of heart attacks, χ2(1, N = 22,071) = 25.01, p < .001. A greater percentage of heart attacks occurred for participants taking the placebo (1.7%) than for those taking aspirin (0.9%).

Part II: Application of the Chi-Square Test
Source: http://www.itl.nist.gov/div898/handbook/prc/section4/prc45.htm

Product and Process Comparisons: comparisons based on data from more than two processes. How can we compare the results of classifying according to several categories?

Contingency Table Approach
When items are classified according to two or more criteria, it is often of interest to decide whether these criteria act independently of one another. For example, suppose we wish to classify defects found in wafers produced in a manufacturing plant, first according to the type of defect and, second, according to the production shift during which the wafers were produced. If the proportions of the various types of defects are constant from shift to shift, then classification by defects is independent of the classification by production shift. On the other hand, if the proportions of the various defects vary from shift to shift, then the classification by defects depends upon, or is contingent upon, the shift classification and the classifications are dependent.

In the process of investigating whether one method of classification is contingent upon another, it is customary to display the data using a cross-classification in an array consisting of r rows and c columns, called a contingency table. A contingency table consists of r x c cells representing the r x c possible outcomes in the classification process.

Let Us Construct an Industrial Case
A total of 309 wafer defects were recorded and the defects were classified as being one of four types, A, B, C, or D. At the same time, each wafer was identified according to the production shift in which it was manufactured. These counts are presented in the following contingency table, classifying defects according to type and production shift (the numbers in parentheses are the expected cell frequencies):

                       Type of Defect
Shift    A           B           C           D           Total
1        15 (22.51)  21 (20.99)  45 (38.94)  13 (11.56)  94
2        26 (22.99)  31 (21.44)  34 (39.77)  5 (11.81)   96
3        33 (28.50)  17 (26.57)  49 (49.29)  20 (14.63)  119
Total    74          69          128         38          309

Column Probabilities
Let pA be the probability that a defect will be of type A. Likewise, define pB, pC, and pD as the probabilities of observing the other three types of defects. These probabilities, which are called the column probabilities, will satisfy the requirement pA + pB + pC + pD = 1.

Row Probabilities
By the same token, let pi (i = 1, 2, or 3) be the row probability that a defect will have occurred during shift i, where p1 + p2 + p3 = 1.

Multiplicative Law of Probability
If the two classifications are independent of each other, a cell probability will equal the product of its respective row and column probabilities, in accordance with the Multiplicative Law of Probability. For example, the probability that a particular defect will occur in shift 1 and is of type A is (p1)(pA). This condition implies independence of the two classifications. Therefore, we state the null hypothesis as H0: the two classifications are independent, while the alternative hypothesis is Ha: the classifications are dependent. While the numerical values of the cell probabilities are unspecified, the null hypothesis states that each cell probability will equal the product of its respective row and column probabilities; the alternative hypothesis is that this equality does not hold for at least one cell.

Estimated Expected Cell Frequency When H0 Is True
Denote the observed frequency of the cell in row i and column j of the contingency table by nij. The row probabilities p1, p2, and p3 are estimated by dividing the row totals r1, r2, and r3 by the grand total n. Similarly, the observed column probabilities are obtained by dividing the column totals cj by the grand total. In other words, when the row and column classifications are independent, the estimated expected value of the observed cell frequency nij in an r x c contingency table equals its respective row and column totals multiplied together and divided by the total frequency:

E(nij) = (ri x cj) / n

The estimated cell frequencies computed this way are shown in parentheses in the contingency table above.
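The rule E(nij) = ri x cj / n can be applied mechanically to the whole table. The following sketch (an illustrative helper, not code from the NIST handbook) computes the expected frequencies for the wafer counts above:

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Wafer defect counts by shift (rows) and defect type A-D (columns).
observed = [[15, 21, 45, 13],
            [26, 31, 34, 5],
            [33, 17, 49, 20]]
expected = expected_frequencies(observed)
print(round(expected[0][0], 2))   # 22.51, i.e. 94 * 74 / 309
```

Because each expected cell is ri x cj / n, the expected table preserves the row and column totals of the observed one.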

Exercises
1. Consider the following data for the risk of Cot Death / Sudden Infant Death Syndrome (SIDS). The data are obtained by selecting all cases of Cot Death in one year, going back to the birth records at the hospital to locate the sex of the next child born at that hospital, and using these children as controls.

Sex of Case         Sex of Control Child
(Cot Death) Child   Male    Female
Male                68      59
Female              37      53

a. What is the Odds Ratio (OR) of Cot Death for boys versus girls?
b. Calculate the Odds Ratio (OR) of Cot Death for girls versus boys.
c. Can you think of another factor which may explain this table?

2. Consider the following data regarding Sudden Infant Death Syndrome (SIDS):

                      Status at Age 2
State of Residence    Cot Death    Still Alive
Tasmania              37           6,214
Queensland            88           21,601

(a) Calculate the probability (risk) of Cot Death for each state.
(b) Calculate the Relative Risk (RR) of Cot Death for Tasmanian infants vs QLD infants.

3. Consider the following data for the risk of having a stroke, obtained by selecting male patients with stroke and determining their smoking status, then finding a group of male controls with the same age distribution and determining the smoking status of those controls.

Smoking Status    Stroke YES (cases)    NO (controls)
Smokers           44                    29
Non-Smokers       30                    45

a. Calculate the Odds Ratio (OR) of stroke for smokers vs non-smokers.
b. What can be said about the absolute risk (probability) of stroke for the two groups (smokers and non-smokers)?

4. What additional information is needed about the controls to ensure that these data are correct?

5. A market researcher interested in the business publication reading habits of purchasing agents has assembled the following data:

Business Publication    Frequency of First Choice
A                       35
B                       30
C                       45
D                       55

• Test the null hypothesis (alpha = 0.05) that there is no difference among the frequencies of first choice of the tested publications.
• If the choice of A and C and that of B and D are aggregated, test the null hypothesis at alpha = 0.05 that there are no differences.

6. Consider the following data:

Glasses Worn?    Never Married    Currently Married    Previously Married
Spectacles       157              135                  102
Contacts         67               22                   81
None             74               132                  85

Use a χ2 procedure to determine if there is any association between marital status and what eyewear is used.

7. When should the correction for continuity be used?

8. When can you use either a Chi-Square or a z test and reach the same conclusion?

9. Ten subjects are each given two flavors of ice-cream to taste and say whether they like them. One of the 10 liked the first flavor and seven out of 10 liked the second flavor. Can the Chi-Square test be used to test whether this difference in proportions (.10 versus .70) is significant? Why or why not?

10. A die is suspected of being biased. It is rolled 24 times with the following result:

Outcome      1    2    3    4    5    6
Frequency    8    4    1    8    3    0

Conduct a significance test to see if the die is biased.

11. A recent experiment investigated the relationship between smoking and urinary incontinence. Of 322 subjects in the study who were incontinent, 113 were smokers, 51 were former smokers, and 158 had never smoked. Of 284 control subjects who were not incontinent, 68 were smokers, 23 were former smokers, and 193 had never smoked. Do a significance test to see if there is a relationship between smoking and incontinence.
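Several of the exercises above ask for odds ratios and relative risks. As a sketch (the helper names are illustrative), for the stroke table the odds ratio compares the odds of smoking among cases with the odds among controls, and for the SIDS-by-state data each state's risk is cot deaths divided by total infants:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as exposed/unexposed rows (a, b / c, d)
    against case/control columns: (a * d) / (b * c)."""
    return (a * d) / (b * c)

def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Relative risk: the ratio of the two absolute risks (meaningful for cohort data)."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# Stroke exercise: smokers 44 cases / 29 controls, non-smokers 30 / 45.
print(round(odds_ratio(44, 29, 30, 45), 2))   # 2.28

# SIDS-by-state exercise, treating each state's total infants as deaths + survivors.
print(round(relative_risk(37, 37 + 6214, 88, 88 + 21601), 2))   # 1.46
```

Note that for the case-control designs in exercises 1 and 3 only the odds ratio is meaningful; the absolute risks (and hence the relative risk) cannot be estimated because the investigators fixed the number of cases and controls.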

Appendix
Source: http://davidmlane.com/hyperstat/chi_square.html

Glimpses into Application of Chi-Square Tests in Marketing
By P. K. Viswanathan, Adjunct Professor and Management Consultant, Chennai, India

Preamble
In this article, an attempt is made to bring into sharp focus the use of χ2 in the marketing function. The aim is to make the reader appreciate the conceptual framework of Chi-Square analysis through problem illustrations in marketing. By no means is the coverage exhaustive. The ideas presented in this article can certainly be extended to many decision situations in marketing that can fruitfully employ chi-square tests.

1. Introduction
Consider the following decision situations:
1) Are all package designs equally preferred?
2) Are all brands equally preferred?
3) Is there any association between income level and brand preference?
4) Is there any association between family size and size of washing machine bought?
5) Are the attributes educational background and type of job chosen independent?

Answering these questions requires the help of Chi-Square (χ2) analysis. The first two questions can be unfolded using the Chi-Square test of goodness of fit for a single variable, while the solution to questions 3, 4, and 5 needs the Chi-Square test of independence in a contingency table. Please note that the variables involved in Chi-Square analysis are nominally scaled. Nominal data are also known by two names: categorical data and attribute data.

The symbol χ2 used here denotes the chi-square distribution, whose value depends upon the number of degrees of freedom (d.f.). As we know, the chi-square distribution is a skewed distribution, particularly with smaller d.f. As the sample size, and therefore the d.f., increases and becomes large, the χ2 distribution approaches normality.

Please note that all parametric tests make the assumption that the samples are drawn from a specified or assumed distribution such as the normal distribution. χ2 tests are nonparametric or distribution-free in nature: no assumption needs to be made about the form of the original population distribution from which the samples are drawn. For a meaningful appreciation of the conditions/assumptions involved in using chi-square analysis, please go through the contents of HyperStat on the chi-square test meticulously.

2. Chi-Square Test of Goodness of Fit
A number of marketing problems involve decision situations in which it is important for a marketing manager to know whether the pattern of frequencies that are observed fits well with the expected ones. The appropriate test is the χ2 test of goodness of fit. The illustration given below will clarify the role of χ2 in a situation in which only one categorical variable is involved.

Problem: In consumer marketing, a common problem that any marketing manager faces is the selection of appropriate colors for package design. Assume that a marketing manager wishes to compare five different colors of package design. He is interested in knowing which of the five is the most preferred one so that it can be introduced in the market. A random sample of 400 consumers reveals the following:

Package Color    Preference by Consumers
Red              70
Blue             106
Green            80
Pink             70
Orange           74
Total            400

Do the consumer preferences for package colors show any significant difference?

Solution: If you look at the data, you may be tempted to infer that Blue is the most preferred color. Statistically, you have to find out whether this preference could have arisen due to chance. The appropriate test statistic is the χ2 test of goodness of fit.

Null Hypothesis: All colors are equally preferred.
Alternative Hypothesis: They are not equally preferred.

Please note that under the null hypothesis of equal preference for all colors being true, the expected frequency for each color will be equal to 80.

Package Color    Observed (O)    Expected (E)    (O - E)^2    (O - E)^2 / E
Red              70              80              100          1.250
Blue             106             80              676          8.450
Green            80              80              0            0.000
Pink             70              80              100          1.250
Orange           74              80              36           0.450
Total            400             400                          11.400

The last column computes the terms of the formula χ2 = Σ (O - E)^2 / E.
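The package-color computation above can be verified in a few lines; `chi_square_decision` is an illustrative helper, with 9.488 taken as the 5% critical value for 4 degrees of freedom:

```python
def chi_square_decision(observed, expected, critical):
    """Return the chi-square statistic and whether to reject H0 at the critical value."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, stat > critical

# Package-color data: 400 consumers, equal preference implies E = 80 per color.
stat, reject = chi_square_decision([70, 106, 80, 70, 74], [80] * 5, critical=9.488)
print(round(stat, 3), reject)   # 11.4 True -> reject equal preference
```

The Blue cell alone contributes 8.45 of the 11.4 total, which is what drives the rejection.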

Applying the formula, we get the computed value of chi-square χ2 = 11.400. The critical value of χ2 at the 5% level of significance for 4 degrees of freedom is 9.488. Since the computed value exceeds the critical value, the null hypothesis is rejected. The inference is that all colors are not equally preferred by the consumers; in particular, Blue is the most preferred one. The marketing manager can introduce the blue color package in the market.

3. Chi-Square Test of Independence
The goodness-of-fit test discussed above is appropriate for situations that involve one categorical variable. If there are two categorical variables, and our interest is to examine whether these two variables are associated with each other, the chi-square (χ2) test of independence is the correct tool to use. This test is very popular for analyzing cross-tabulations, in which an investigator is keen to find out whether the two attributes of interest have any relationship with each other. A cross-tabulation is popularly called a "contingency table": it contains frequency data that correspond to the categorical variables in the rows and columns, and the marginal totals of the rows and columns are used to calculate the expected frequencies that are part of the computation of the χ2 statistic. As in the case of the goodness-of-fit test, for calculations of expected frequencies, refer to HyperStat on the χ2 test.

Problem: A marketing firm producing detergents is interested in studying consumer behavior in the context of the purchase decision for detergents in a specific market. The company is a major player in the detergent market, which is characterized by intense competition. It would like to know in particular whether the income level of consumers influences their choice of brand. Currently there are four brands in the market: Brand 1 and Brand 2 are the premium brands, while Brand 3 and Brand 4 are the economy brands.

A representative stratified random sampling procedure was adopted covering the entire market, using income as the basis of selection. The categories used in classifying income level are: Lower, Middle, Upper Middle, and Upper. A sample of 600 consumers participated in the study, and the following data emerged. Analyze the cross-tabulation data below using the chi-square test of independence and draw your conclusions.

Cross Tabulation of Income versus Brand Chosen (figures in the cells represent numbers of consumers):

Income          Brand 1    Brand 2    Brand 3    Brand 4    Total
Lower           25         15         55         65         160
Middle          30         25         35         30         120
Upper Middle    50         55         20         22         147
Upper           60         80         15         18         173
Total           165        175        125        135        600

Solution:
Null Hypothesis: There is no association between brand preference and income level (these two attributes are independent).
Alternative Hypothesis: There is association between brand preference and income level (these two attributes are dependent).
Let us take a level of significance of 5%.

In order to calculate the χ2 value, you need to work out the expected frequency in each cell of the contingency table. In our example there are 4 rows and 4 columns, amounting to 16 elements, so there will be 16 expected frequencies. Each expected frequency is calculated on the assumption of the null hypothesis being true, that is, that income level and brand preference are independent:

Expected Frequencies
Income          Brand 1    Brand 2    Brand 3    Brand 4    Total
Lower           44.000     46.667     33.333     36.000     160.000
Middle          33.000     35.000     25.000     27.000     120.000
Upper Middle    40.425     42.875     30.625     33.075     147.000
Upper           47.575     50.458     36.042     38.925     173.000
Total           165.000    175.000    125.000    135.000    600.000

Note: The fractional expected frequencies are retained for the purpose of accuracy; do not round them.

Compute χ2 = Σ (O - E)^2 / E. There are 16 observed frequencies (O) and 16 expected frequencies (E).
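The entire independence test for this table can be sketched end to end (the helper name is illustrative); carried to full precision the statistic comes out as 131.77, agreeing with the article's reported 131.76 up to rounding:

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a contingency table of observed counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum((obs - r * c / n) ** 2 / (r * c / n)
               for r, row in zip(rows, table)
               for c, obs in zip(cols, row))
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df

# Income (rows) by brand chosen (columns), 600 consumers in all.
observed = [[25, 15, 55, 65],
            [30, 25, 35, 30],
            [50, 55, 20, 22],
            [60, 80, 15, 18]]
stat, df = chi_square_independence(observed)
print(round(stat, 2), df)   # 131.77 with 9 degrees of freedom
```

Computing the expected frequencies inline like this avoids the rounding the article warns against.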

Each cell in the table below shows (O - E)^2 / E:

Income          Brand 1    Brand 2    Brand 3    Brand 4
Lower           8.205      21.488     14.083     23.361
Middle          0.273      2.857      4.000      0.333
Upper Middle    2.268      3.429      3.686      3.708
Upper           3.245      17.297     12.284     11.249

Adding all these 16 values, we get the computed χ2 = 131.76.

The critical value of χ2 depends on the degrees of freedom. The degrees of freedom = (number of rows - 1) multiplied by (number of columns - 1) in any contingency table. In our case there are 4 rows and 4 columns, so the degrees of freedom = (4 - 1)(4 - 1) = 9. At the 5% level of significance, the critical χ2 for 9 d.f. is 16.92. The computed χ2 = 131.76 far exceeds this; therefore, reject the null hypothesis and accept the alternative hypothesis.

The inference is that brand preference is highly associated with income level; the choice of brand depends on the income stratum, and consumers in different income strata prefer different brands. Specifically, consumers in the upper middle and upper income groups prefer premium brands, while consumers in the lower and middle income categories prefer economy brands. The company should develop suitable strategies to position its detergent products in the marketplace: in particular, it should position the economy brands to the lower and middle income categories and the premium brands to the upper middle and upper income categories.

Chi-Square Table
Upper critical values of the chi-square distribution (abridged; the probability at the head of each column is the area to the right of the critical value):

df     0.10      0.05      0.01
1      2.706     3.841     6.635
2      4.605     5.991     9.210
3      6.251     7.815     11.345
4      7.779     9.488     13.277
5      9.236     11.070    15.086
6      10.645    12.592    16.812
7      12.017    14.067    18.475
8      13.362    15.507    20.090
9      14.684    16.919    21.666
10     15.987    18.307    23.209
15     22.307    24.996    30.578
20     28.412    31.410    37.566
25     34.382    37.652    44.314
30     40.256    43.773    50.892

The source reproduces the full table of upper and lower critical values for 1 to 100 degrees of freedom.

Appendix
Source: Internet. Marketing Research (MGT 461-1), Professor Novak.

Crosstabulation

How To: From the SPSS menus (see pages 221-225) choose:
• Statistics -> Summarize -> Crosstabs
• Row(s): V76 (marital status)
• Column(s): V3 (Country Club)
• From the Statistics submenu, click on Chi-square and Phi and Cramer's V
• From the Cells submenu, click on Observed, Expected, Row, Column, Unstandardized Residuals, Standardized Residuals
• Select OK to run.

First Crosstab: V76 marital status by V3 club membership

The rows of V76 are married, single, widowed, and divorced; the columns are the four country clubs. The married row accounts for 230 of the 251 respondents (counts of 55, 57, 60, and 58 across the clubs); the single, widowed, and divorced categories together account for only 21 respondents. Column totals are 63, 63, 63, and 62 (N = 251; number of missing observations: 1).

Note:
• Chi-square is statistically significant (p = .01823 < .05), with df = 9.
• However, 12 of 16 cells (75.0%) have an expected frequency less than 5 (minimum expected frequency .988), well above the 20% value we use as a cut-off. Thus, the significance test is not valid.
• We must combine categories and re-run the crosstabulation. We combine categories 2, 3, and 4 of V76 to form a new variable, MARSTAT (married/non-married), which has categories 1 and 2. We then rerun the original analysis using MARSTAT, with Row(s): MARSTAT (recoded marital status) and Column(s): V3 (Country Club), keeping the same Statistics and Cells options.

Second and Third Crosstabs: MARSTAT recoded marital status by V3 club membership

                    Club 1   Club 2   Club 3   Club 4   Row Total
married   Count     55       57       60       58       230
          Exp Val   57.7     57.7     57.7     56.8
          Residual  -2.7     -0.7     2.3      1.2
          Std Res   -.4      -.1      .3       .2
not       Count     8        2        6        5        21
married   Exp Val   5.3      5.3      5.3      5.2
          Residual  2.7      -3.3     0.7      -0.2
          Std Res   1.2      -1.4     .3       -.1
Column Total        63       63       63       62       251

Chi-square statistics for the recoded table:

Statistic                                   Value     DF   Significance
Pearson                                     3.80446   3    .28337 *1
Likelihood Ratio                            4.20774   3    .23989
Mantel-Haenszel test for linear association 1.18949   1

Minimum Expected Frequency: 5.187 (no cells below 5, so the test is now valid).

Statistic     Value     Approximate Significance
Phi           .12311    .28337 *1
Cramer's V    .12311    .28337 *1

*1 Pearson chi-square probability. Number of Missing Observations: 1.

Note:
• Chi-square is not statistically significant (p = .28337 > .05). If the chi-square statistic had been significant, we would conclude that there was a significant association between country club membership and marital status.
• This table prints observed counts O[ij]; expected counts E[ij], estimated under the null hypothesis (i.e., that there is no association between row and column variables), E[ij] = N[i+]N[+j]/N[++], where N[i+] = sum of observed counts in row i, N[+j] = sum of observed counts in column j, and N[++] = sum of all observed counts. Example: E[11] = (230)(63)/(251) = 57.7.
• Residual = R[ij] = O[ij] - E[ij], the difference between the observed counts and the counts expected under the null hypothesis.
• Standardized Residuals = R[ij]/sqrt(E[ij]). Standardized residuals are components of chi-square: the sum of squared standardized residuals equals the chi-square statistic. It is often useful to inspect the pattern of standardized residuals to determine the nature of a significant association between two variables.
• Effect size: the phi-coefficient (.12311 here) is used to measure the magnitude (size) of the association between row and column variables. It is calculated as the square root of chi-square divided by the sample size, sqrt(3.80446/251), and is independent of sample size. The phi-coefficient is interpreted as follows: .1 = small, .3 = moderate, .5 = large.
• We would then use row probabilities to compare the marital status groups (i.e., rows), and column probabilities to compare the country clubs (i.e., columns).

LESSON 26: ANALYSIS OF VARIANCE (ANOVA)

Students, the tests we have learned up to this point allow us to test hypotheses that examine the difference between only two means. Analysis of Variance, or ANOVA, will allow us to test the difference between two or more means.

ANOVA does this by examining the ratio of variability between conditions to variability within each condition. Recall that we measure variability as the sum of squared differences of each score from the mean. A t-test would compare the likelihood of observing the difference in the mean number of words recalled for each group. An ANOVA test, on the other hand, would compare the variability that we observe between the conditions to the variability observed within each condition. When the variability that we predict (between the groups) is much greater than the variability we don't predict (within each group), we will conclude that our treatments produce different results. When we actually calculate an ANOVA we will use a short-cut formula.

Notation:
• Xij = the ith observation in the jth group
• nj = the number of observations in group j
• n = the total number of observations in all groups combined
• c = the number of groups or levels
• Xbar_j = mean for group j; Xbar = grand mean

The following formulas are used:

SST = sum over j, i of (Xij - Xbar)^2        (total sum of squares)
SSA = sum over j of nj(Xbar_j - Xbar)^2      (among-groups sum of squares)
SSW = sum over j, i of (Xij - Xbar_j)^2      (within-groups sum of squares)

ANOVA Table:

Source of Variation   Degrees of Freedom   Sums of Squares   Mean Squares        F
Among Groups          c - 1                SSA               MSA = SSA/(c - 1)   F = MSA/MSW
Within Groups         n - c                SSW               MSW = SSW/(n - c)
Total                 n - 1                SST

Hypothesis test format:
1. Null hypothesis H0: µ1 = µ2 = ... = µc. Alternative hypothesis H1 (HA): not all µj are equal, i.e., at least two of the means are not equal.
2. Significance level α = 0.05.
3. Test statistic: F = MSA/MSW.
4. If Fcalc > F(c-1, n-c; α), reject H0; if Fcalc < F(c-1, n-c; α), do not reject H0.
5. Accept the null hypothesis or reject the null hypothesis.
6. Conclude either that there are no significant differences among the c means, or that there is at least one inequality among the c means.

An Illustrative Numerical Example for ANOVA

Let us introduce ANOVA in its simplest form by numerical illustration. Consider the following (small, integer, indeed chosen for illustration while saving space) random samples from three different populations:

        Sample 1   Sample 2   Sample 3
        2          3          5
        3          4          5
        1          3          5
        3          5          3
        1          0          2
SUM     10         15         20
Mean    2          3          4

Hypothesis: H0: µ1 = µ2 = µ3; HA: at least two of the means are not equal. For α = 0.05, the critical value from the F-table is F(0.05; 2, 12) = 3.89.

**Demonstrate that SST = SSB + SSW**

Computation of sample SST

With the grand mean = 3, first take the difference between each observation and the grand mean, and then square it for each data point.

        Sample 1   Sample 2   Sample 3
        1          0          4
        0          1          4
        4          0          4
        0          4          0
        4          9          1
SUM     9          14         13

Now, construct the ANOVA table for this numerical example by plugging the results of your computation in the ANOVA Table.


The ANOVA Table

Sources of Variation   Sum of Squares   Degrees of Freedom   Mean Squares   F-Statistic
Between Samples        10               2                    5              2.30
Within Samples         26               12                   2.17
Total                  36               14

Conclusion: Since Fcalc = 2.30 is less than the critical value 3.89, there is not enough evidence to reject the null hypothesis H0.

Logic Behind ANOVA: First, let us try to explain the logic and then illustrate it with a simple example. In performing the ANOVA test, we are trying to determine whether a certain number of population means are equal. To do that, we measure the difference between the sample means and compare it to the variability within the sample observations. That is why the test statistic is the ratio of the between-sample variation (MST) to the within-sample variation (MSE). If this ratio is close to 1, there is evidence that the population means are equal.

Therefore SST = 36, with d.f. = 15 - 1 = 14.

Computation of sample SSB

Second, let all the data in each sample have the same value as the mean in that sample. This removes any variation within the samples. Compute the SS differences from the grand mean.

        Sample 1   Sample 2   Sample 3
        1          0          1
        1          0          1
        1          0          1
        1          0          1
        1          0          1
SUM     5          0          5

**Here's a Hypothetical Example**

"Many people believe that men get paid more in the business world than women, simply because they are male. To justify or reject such a claim, you could look at the variation within each group (one group being women's salaries and the other being men's salaries) and compare that to the variation between the means of randomly selected samples of each population. If the variation within the women's group is much larger than the variation between the men's and women's mean salaries, one could say that, because the variation is so large within the women's group, this may not be a gender-related problem."

Now, getting back to our numerical example, we notice that, given the test conclusion and the ANOVA test's conditions, we may conclude that these three populations are in fact the same population. Therefore, the ANOVA technique could be used as a measuring tool and statistical routine for quality control, as described below using our numerical example.

Therefore SSB = 10, with d.f = 3-1 = 2 Computation of sample SSW Third, compute the SS difference within each sample using their own sample means. This provides SS deviation WITHIN all samples.

        Sample 1   Sample 2   Sample 3
        0          0          1
        1          1          1
        1          0          1
        1          4          1
        1          9          4
SUM     4          14         8

**Construction of the Control Chart for the Sample Means**

Under the null hypothesis, the ANOVA concludes that µ1 = µ2 = µ3; that is, we have a "hypothetical parent population." The question is, what is its variance? The estimated variance is 36/14 = 2.57. Thus, the estimated standard deviation is 1.60, and the estimated standard deviation for the sample means is 1.6/√5 = 0.71. Under the conditions of ANOVA, we can construct a control chart for the sample means with the warning limits = 3 ± 2(0.71) and the action limits = 3 ± 3(0.71). The following figure depicts the control chart.
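The control-chart limits follow directly from the pooled variance estimate. A small sketch in plain Python (the ±2σ warning and ±3σ action limits are as described in the text; printed values like 0.71 reflect intermediate rounding):

```python
import math

sst, df_total = 36, 14            # total sum of squares and its degrees of freedom
grand_mean, n_per_group = 3, 5

variance = sst / df_total                 # about 2.57, the "hypothetical parent" variance
sd = math.sqrt(variance)                  # about 1.60
se_mean = sd / math.sqrt(n_per_group)     # standard deviation of the group means

warning_limits = (grand_mean - 2 * se_mean, grand_mean + 2 * se_mean)
action_limits = (grand_mean - 3 * se_mean, grand_mean + 3 * se_mean)
```

All three sample means (2, 3, and 4) fall inside the warning limits, consistent with not rejecting H0.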

SSW = 26, with d.f. = 3(5 - 1) = 12.

Results: SST = SSB + SSW (36 = 10 + 26), and d.f.(SST) = d.f.(SSB) + d.f.(SSW) (14 = 2 + 12), as expected.
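The whole decomposition can be verified with a few lines of Python. This is a sketch using the three random samples from this lesson's illustration; note that F = 30/13 ≈ 2.31, which the ANOVA table shows as 2.30 after rounding:

```python
samples = [[2, 3, 1, 3, 1],   # Sample 1, mean 2
           [3, 4, 3, 5, 0],   # Sample 2, mean 3
           [5, 5, 5, 3, 2]]   # Sample 3, mean 4

all_obs = [x for s in samples for x in s]
grand_mean = sum(all_obs) / len(all_obs)                   # 3.0

sst = sum((x - grand_mean) ** 2 for x in all_obs)          # total SS
ssb = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
ssw = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)

c, n = len(samples), len(all_obs)                          # 3 groups, 15 observations
f_stat = (ssb / (c - 1)) / (ssw / (n - c))                 # MSA / MSW
```

Running this gives sst = 36 = ssb + ssw (10 + 26), and F ≈ 2.31 < 3.89, so H0 is not rejected.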


SPSS syntax (these lines complete the SPSS ANOVA program listed below):

Manova Y By Gp(1,5)/Print=Homogeneity(Bartlett)/
Npar Tests K-W Y By Gp(1,5)/
Finish

Homogeneity of Variance: Checking the Equality of Variances

ANOVA, like the two-population t-test, can go wrong when the equality-of-variances condition is not met.

General Rule of 2: For 3 or more populations, there is a practical rule known as the "Rule of 2." According to this rule, one divides the highest variance of a sample by the lowest variance of the other samples. Given that the sample sizes are almost the same, if the value of this ratio is less than 2, then the variations of the populations are almost the same.

Example: Consider the following three random samples from three populations, P1, P2, P3:

P1: 25 25 20 18 13 6 5 22 25 10
P2: 17 21 17 25 19 21 15 16 24 23
P3: 8 10 14 16 12 14 6 16 13 6
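The Rule of 2 check for the P1, P2, P3 samples can be scripted directly. A plain-Python sketch (sample variances use the n - 1 divisor):

```python
p1 = [25, 25, 20, 18, 13, 6, 5, 22, 25, 10]
p2 = [17, 21, 17, 25, 19, 21, 15, 16, 24, 23]
p3 = [8, 10, 14, 16, 12, 14, 6, 16, 13, 6]

def sample_variance(xs):
    """Unbiased sample variance (n - 1 divisor)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

variances = [sample_variance(p) for p in (p1, p2, p3)]
ratio = max(variances) / min(variances)      # Rule of 2: compare to 2
equal_variances_plausible = ratio < 2
```

Here the ratio is about 4.99 (P1's variance, 7.87 squared, against P2's, 3.52 squared), far above 2, which is why the significant ANOVA result below should not be taken at face value.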


The summary statistics and the ANOVA table are computed to be:

Variable   N    Mean    St.Dev   SE Mean
P1         10   16.90   7.87     2.49
P2         10   19.80   3.52     1.11
P3         10   11.50   3.81     1.20

SPSS program for ANOVA (More Than Two Independent Means):

$Spss/Output=4-1.out1
Title 'Analysis Of Variance - 1st Iteration'
Data List Free File='a.in'/Gp Y
Oneway Y By Gp(1,5)/Ranges=Duncan
  /Statistics Descriptives Homogeneity
Statistics 1

Analysis of Variance

Source   DF   SS       MS      F      p-value
Factor   2    79.40    39.70   4.38   0.023
Error    27   244.90   9.07
Total    29   324.30

With F = 4.38 and a p-value of 0.023, we reject the null hypothesis at α = 0.05. This is not good news, since ANOVA, like the two-sample t-test, can go wrong when the equality-of-variances condition is not met.


Self Assessment

In order to compare the effectiveness of three tax-preparation methods, ten tax preparers were randomly assigned to one of the methods. They were all given a hypothetical return contrived for the purpose of the experiment. The number of minutes each person required to complete the return was recorded. Is there a significant difference in the average time to prepare the returns among the three methods? Use α = .05.

Method
I:   15 20 19 14
II:  10 15 11
III: 18 19 23

Solution of the self-assessment exercise by computer (MINITAB):

**One-way Analysis of Variance**

Analysis of Variance for time

Source   DF   SS       MS      F      P
method   2    98.40    49.20   6.38   0.026
Error    7    54.00    7.71
Total    9    152.40

Level   N   Mean     StDev
1       4   17.000   2.944
2       3   12.000   2.646
3       3   20.000   2.646

Pooled StDev = 2.777. (MINITAB also plots individual 95% CIs for each mean, based on the pooled StDev, on a scale from 10.0 to 25.0.)

**Interpreting the Output**

1. Examine the p-value for the Bartlett's test.
   p-value < α => reject Ho => variances unequal => STOP
   p-value ≥ α => DNR Ho => variances equal => continue
2. Examine the p-value for the Kolmogorov-Smirnov test.
   p-value < α => reject Ho => not normal => STOP
   p-value ≥ α => DNR Ho => normally distributed => continue
3. Examine the p-value for the ANOVA test.
   p-value < α => reject Ho => at least one significant difference among means => continue
   p-value ≥ α => DNR Ho => no significant differences among means => STOP
4. Examine the confidence intervals for each combination of means.
   interval contains 0 (signs unlike) => no significant difference
   interval doesn't contain 0 (signs same) => significant difference
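The four-step reading of the output can be captured as a small helper. This is a sketch; the function name and return strings are ours, not MINITAB's:

```python
def interpret_anova(p_bartlett, p_ks, p_anova, alpha=0.05):
    """Walk the textbook's decision sequence for a one-way ANOVA printout."""
    if p_bartlett < alpha:
        return "stop: variances unequal"        # Bartlett's test rejects Ho
    if p_ks < alpha:
        return "stop: residuals not normal"     # Kolmogorov-Smirnov rejects Ho
    if p_anova < alpha:
        return "at least one significant difference among means"
    return "no significant differences among means"

# Values from the tax-preparation example: Bartlett p = 0.984,
# K-S p > 0.15 (0.15 used as a conservative stand-in), ANOVA p = 0.026.
decision = interpret_anova(0.984, 0.15, 0.026)
```

With the example's p-values, the helper continues past both assumption checks and reports a significant ANOVA, which is exactly the path followed in the write-up below.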


**Tukey's Pairwise Comparisons**

Family error rate = 0.0500
Individual error rate = 0.0214
Critical value = 4.17

Intervals for (column level mean) - (row level mean):

      1                   2
2     (-1.255, 11.255)
3     (-9.255, 3.255)     (-14.687, -1.313)
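The printed intervals can be reproduced from the ANOVA table alone. A sketch in Python, assuming the Tukey-Kramer half-width (q/√2)·s_p·√(1/n_i + 1/n_j), with the critical value q = 4.17 taken from the output and the pooled standard deviation s_p = √(54/7) from the MS error:

```python
import math

q = 4.17                      # studentized-range critical value from the output
sp = math.sqrt(54 / 7)        # pooled StDev = sqrt(MS error), about 2.777
means = {1: 17.0, 2: 12.0, 3: 20.0}
ns = {1: 4, 2: 3, 3: 3}

def tukey_interval(i, j):
    """Interval for (mean i) - (mean j), allowing unequal group sizes."""
    half = (q / math.sqrt(2)) * sp * math.sqrt(1 / ns[i] + 1 / ns[j])
    diff = means[i] - means[j]
    return (round(diff - half, 3), round(diff + half, 3))
```

`tukey_interval(2, 3)` gives (-14.687, -1.313), the only interval that excludes 0, so only methods 2 and 3 differ significantly.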

Example of How an ANOVA Should Be Written Up

Check of the Assumption of Equal Variances
H0: The variances are equal.
H1: The variances are not equal.

Homogeneity of Variance
Bartlett's Test (normal distribution)
Test Statistic: 0.033
P-Value: 0.984


Since the p-value = 0.984 > 0.05, we DNR Ho. Therefore, there are no significant differences among the variances. The assumption of equal variances is met.

Family error rate = 0.0500
Individual error rate = 0.0214
Critical value = 4.17

Intervals for (column level mean) - (row level mean):

      1                   2
2     (-1.255, 11.255)
3     (-9.255, 3.255)     (-14.687, -1.313)


**Check the Assumption of Normality**

H0: The residuals fit a normal distribution. H1: The residuals do not fit a normal distribution.

[Normal probability plot of the residuals (RESI1): Average = 0, StDev = 2.44949, N = 10. Kolmogorov-Smirnov normality test: D+ = 0.208, D- = 0.193, D = 0.208; approximate p-value > 0.15.]

Methods 2 and 3 are significantly different from each other. Method 1 is not significantly different from either method 2 or method 3. By examining the means displayed with the ANOVA analysis, I would recommend Method 2 as the method tax preparers should use, since the group using this method requires significantly less time on average to prepare returns than the group using method 3.

Self Assessment

Alison, a fellow psychology major, decides to replicate Gregory's honors thesis study (see the example for the independent-samples t-test) of fragrance and memory, but adds an additional condition. There are three groups in Alison's study: (a) read passage on scented paper and test recall using paper scented with the same fragrance, (b) read passage and test recall using unscented paper, and (c) read passage on scented paper but test recall using unscented paper. She records the same dependent variable (number of facts correctly recalled) and statistically compares the three groups of scores. Here are the data that were recorded:

Scented paper    Unscented paper   Scented/
both times       both times        unscented paper
32               23                22
29               20                17
26               14                15
mean = 29.00     mean = 19.00      mean = 18.00
s = 3.00         s = 4.58          s = 3.61

Since the p-value > 0.15 > 0.05, we DNR Ho. Therefore, the normality assumption is met.

Check For Differences Among The Means

1. H0: µ1 = µ2 = µ3; H1: at least one µj not equal
2. ANOVA
3. α = 0.05
4. If the p-value < α, reject H0. If the p-value ≥ α, do not reject H0.
5. One-way Analysis of Variance

Analysis of Variance for time

Source   DF   SS       MS      F      P
code     2    98.40    49.20   6.38   0.026
Error    7    54.00    7.71
Total    9    152.40

Level   N   Mean     StDev
1       4   17.000   2.944
2       3   12.000   2.646
3       3   20.000   2.646

Pooled StDev = 2.777 (individual 95% CIs for the means, based on the pooled StDev, are plotted from 10.0 to 25.0).

6. Since the p-value = 0.026 < 0.05, we reject Ho.
7. Therefore, there is at least one significant difference in the average time it takes to complete the tax return among the three methods.

Tukey's pairwise comparisons

For the fragrance self-assessment, enter the data using two columns. Name the first column "group" and use 1, 2, or 3 (1 = both scented, 2 = both unscented, 3 = scented/unscented) to designate the group for each score. Name the second column "recall," and type in the recall score. Your data entry will look like this:

Group   Recall
1       32
1       29
1       26
2       23
2       20
2       14
3       22
3       17
3       15


Null hypothesis H0: µ1 = µ2 = µ3. Alternative hypothesis HA: at least two of the means are not equal (not all µj are equal).

Analyze > Compare Means > One-Way ANOVA
Dependent list: recall
Factor: group
Post-hoc > Tukey
Options > Descriptive

Source    SS       df   MS       F      p
Between   222.00   2    111.00   7.74   .022
Within    86.00    6    14.33

F = 7.74, critical F = 5.14. Decision: Reject H0 (statistically significant).
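The SS, MS and F values in the SPSS output follow directly from the three group means. A quick check in plain Python, using the group data from the table above:

```python
groups = {"scented/scented": [32, 29, 26],
          "unscented/unscented": [23, 20, 14],
          "scented/unscented": [22, 17, 15]}

scores = [x for g in groups.values() for x in g]
grand = sum(scores) / len(scores)                        # 22.0

ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1                             # 2
df_within = len(scores) - len(groups)                    # 6
f_stat = (ss_between / df_between) / (ss_within / df_within)
```

This gives ss_between = 222, ss_within = 86, and F = 111/14.33 ≈ 7.74, matching the printout.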


Summary of Tukey results (see SPSS output):
Group 1 compared to group 2: statistically significant
Group 1 compared to group 3: statistically significant
Group 2 compared to group 3: not statistically significant

Means from the descriptives section of the output:
Group 1: 29.00
Group 2: 19.00
Group 3: 18.00

Conclusion

A one-way ANOVA indicated that there were significant differences in recall across the three fragrance conditions, F(2, 6) = 7.74, p = .022. Post-hoc Tukey comparisons indicated that recall was significantly better using scented paper during both reading and recall (M = 29.00) than for either scented paper during reading only (M = 18.00) or no scented paper at all (M = 19.00). If the results had NOT been statistically significant, the Tukey tests would not have been performed, and the conclusion would read: A one-way ANOVA indicated that recall did not differ significantly across the three fragrance conditions.


LESSON 27: APPLICATIONS OF ANOVA

In the class you are expected to discuss the use of this advanced technique, with special focus on the Latin Square Design, and you will be doing more exercises on ANOVA. To understand the advanced theory and its application, the article below will prove very useful. This article includes a general introduction to ANOVA and a discussion of the general topics in analysis of variance techniques, including repeated measures designs, main effects, interaction effects, unbalanced and incomplete designs, contrast effects, post-hoc comparisons, and assumptions. It has been taken from the website www.Quikmba.com. It covers:

• The Partitioning of Sums of Squares
• Multi-Factor ANOVA
• Interaction Effects

Basic Ideas

The Purpose of Analysis of Variance
In general, the purpose of analysis of variance (ANOVA) is to test for significant differences between means. Elementary Concepts provides a brief introduction into the basics of statistical significance testing. If we are only comparing two means, then ANOVA will give the same results as the t test for independent samples (if we are comparing two different groups of cases or observations), or the t test for dependent samples (if we are comparing two variables in one set of cases or observations). If you are not familiar with those tests, you may at this point want to "brush up" on your knowledge about them by reading Basic Statistics and Tables.

Why the Name Analysis of Variance?
It may seem odd to you that a procedure that compares means is called analysis of variance. However, this name is derived from the fact that, in order to test for statistical significance between means, we are actually comparing (i.e., analyzing) variances. (The term "variance" was first used by Edgeworth, 1885.)

The Partitioning of Sums of Squares
At the heart of ANOVA is the fact that variances can be divided up, that is, partitioned. Remember that the variance is computed as the sum of squared deviations from the overall mean, divided by n - 1 (sample size minus one). Put another way, given a certain n, the variance is a function of the sums of (deviation) squares, or SS for short. Partitioning of variance works as follows. Consider the following data set:

                       Group 1   Group 2
Observation 1          2         6
Observation 2          3         7
Observation 3          1         5
Mean                   2         6
Sums of Squares (SS)   2         2
Overall Mean           4
Total Sums of Squares  28

The means for the two groups are quite different (2 and 6, respectively). The sums of squares within each group are equal to 2. Adding them together, we get 4. If we now repeat these computations ignoring group membership, that is, if we compute the total SS based on the overall mean, we get the number 28. In other words, computing the variance (sums of squares) based on the within-group variability yields a much smaller estimate of variance than computing it based on the total variability (the overall mean). The reason for this in the above example is of course that there is a large difference between means, and it is this difference that accounts for the difference in the SS.

SS Error and SS Effect. The within-group variability (SS) is usually referred to as Error variance. This term denotes the fact that we cannot readily explain or account for it in the current design. However, the SS Effect we can explain. Namely, it is due to the differences in means between the groups: group membership explains this variability because we know that it is due to the differences in means. Thus, in the above table the total SS (28) was partitioned into the SS due to within-group variability (2+2=4) and the variability due to differences between means (28-(2+2)=24).

Significance Testing
The basic idea of statistical significance testing is discussed in Elementary Concepts, which also explains why very many statistical tests represent ratios of explained to unexplained variability. ANOVA is a good example of this. Namely, we base this test on a comparison of the variance due to the between-groups variability (called Mean Square Effect, or MSeffect) with the within-group variability (called Mean Square Error, or MSerror). If we were to perform an ANOVA on the above data, we would get the following result:

MAIN EFFECT
          SS     df   MS     F      p
Effect    24.0   1    24.0   24.0   .008
Error     4.0    4    1.0
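The partition claimed above (total SS 28 = within-group 4 + between-group 24, giving F = 24) can be checked in a few lines of Python:

```python
group1 = [2, 3, 1]
group2 = [6, 7, 5]
both = group1 + group2

grand = sum(both) / len(both)                                   # overall mean = 4.0

ss_total = sum((x - grand) ** 2 for x in both)                  # 28
ss_within = sum((x - sum(g) / len(g)) ** 2
                for g in (group1, group2) for x in g)           # 2 + 2 = 4
ss_effect = ss_total - ss_within                                # 24

# F = MS(effect) / MS(error), with 1 and 4 degrees of freedom
f_stat = (ss_effect / 1) / (ss_within / 4)
```

The resulting F of 24 on (1, 4) degrees of freedom corresponds to the MAIN EFFECT table's p = .008.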

Therefore. thus the combined SS-within is equal to 2+2+2+2=8). that test is highly significant. (2) variability due to experimental group membership.Achievementoriented avoiders Challenging Test 10 5 Easy Test 5 10 Summary of the Basic Logic of ANOVA To summarize the discussion up to this point. Interaction Effects There is another advantage of ANOVA over simple t-tests: ANOVA allows us to detect interaction effects between variables.g. within-gender means to compute those SS. And. We measure how hard the students work on the test. Imagine that we have a sample of highly achievement-oriented students and another of achievement “avoiders. The variables that are measured (e. For example. indeed. etc. This example demonstrates another principal of ANOVA that makes it preferable over simple two-group t test studies: In ANOVA we can test each factor while controlling for all others. temperature. and give one half of each sample a challenging test.) Main effects. it appears that we can partition the total variance into at least 3 sources: (1) error (within-group) variability. the variance estimated based on within-group variability should be about the same as the variance due to between-groups variability. for example. lighting. two-way interaction. we would get the identical result if we were to compare the two groups using this test. many factors are taken into account.e. This difference is due to the fact that the means for males are systematically lower than those for females. 1926. However. and. and instances when a single variable completely explains a phenomenon are rare. within. and we would in fact conclude that the means for the two groups are significantly different from each other.g. In our example above.. we would still expect some minor random fluctuation in the means for the two groups when taking small samples (as in our example). (Note that there is an additional source — interaction — that we will discuss shortly. 
ANOVA is a much more flexible and powerful technique that can be applied to much more complex research issues. Let us expand on this statement. which tests whether the ratio of the two variance estimates is significantly greater than 1. and this difference in means adds variability if we ignore this factor. and. The means of this (fictitious) study are as follows Achievement. Suppose that in the above two-group example we introduce another grouping factor. by partitioning the total variance into the component that is due to true random error (i. Let us consider another example to illustrate this point.. the purpose of analysis of variance is to test differences in means (for groups or variables) for statistical significance. We could summarize this design in a 2 by 2 table RESEARCH METHODOLOGY Experimental Experimental Group 1 Group 2 Males 2 6 3 7 1 5 Mean 2 6 Females 4 8 5 9 3 7 Mean 4 8 Before performing any computations.e. Dependent and independent variables. soil conditions. we need fewer observations to find a significant effect) than the simple t test.. if significant. We can compare those two estimates of variance via the F test (see also F Distribution). (The term interaction was first used by Fisher.556 © Copy Right: Rai University 159 . when trying to explore how to grow a bigger tomato.. a test score) are called dependent variables. 11. Multiple Factors The world is complex and multivariate in nature. Imagine that in each group we have 3 males and 3 females.Under the null hypothesis (that there are no mean differences between groups in the population). Controlling for factors. and accept the alternative hypothesis that the means (in the population) are different from each other. we reject the null hypothesis of no differences between means. we would need to consider factors that have to do with the plants’ genetic makeup. These latter variance components are then tested for statistical significance. The variables that are manipulated or controlled (e. 
This is accomplished by analyzing the variance. in a typical experiment. a teaching method or some other criterion used to divide observations into groups that are compared) are called factors or independent variables Multi-Factor ANOVA In the simple example above. they will be equal to 2 in each group. the other an easy test.” We now create two random halves in each sample. therefore. under the null hypothesis. and with fewer observations we can gain more information. the result is SS=10+10=20). you will see that the resulting within-group SS is larger than it is when we include gender (use the withingroup. this is actually the reason why ANOVA is more statistically powerful (i. Controlling for error variance increases the sensitivity (power) of a test.) What would have happened had we not included gender as a factor in the study but rather computed a simple t test? If you compute the SS ignoring the gender factor (use the within-group means ignoring or collapsing across gender. and (3) variability due to gender. it may have occurred to you that we could have simply computed a t test for independent samples to arrive at the same conclusion. Thus. Gender. One important reason for using ANOVA methods rather than multiple two-group studies analyzed via t tests is that the former method is more efficient. to test more complex hypotheses about reality. that is.group SS) and the components that are due to differences between means.

                    Achievement-    Achievement-
                    oriented        avoiders
Challenging Test       10              5
Easy Test               5             10

How can we summarise these results? Is it appropriate to conclude that (1) challenging tests make students work harder, or (2) achievement-oriented students work harder than achievement-avoiders? Neither statement captures the essence of this clearly systematic pattern of means. The appropriate way to summarise the result is to say that challenging tests make only achievement-oriented students work harder, while easy tests make only achievement-avoiders work harder. In other words, the type of achievement orientation and test difficulty interact in their effect on effort: this is an example of a two-way interaction between achievement orientation and test difficulty. Note that statements 1 and 2 above describe so-called main effects.

Higher Order Interactions

While the previous two-way interaction can be put into words relatively easily, higher order interactions are increasingly difficult to verbalise. Imagine that we had included factor Gender in the achievement study above, and we had obtained the following pattern of means:

Females             Achievement-    Achievement-
                    oriented        avoiders
Challenging Test       10              5
Easy Test               5             10

Males               Achievement-    Achievement-
                    oriented        avoiders
Challenging Test        1              6
Easy Test               6              1

How could we now summarise the results of our study? Graphs of means for all effects greatly facilitate the interpretation of complex effects. We may summarise this pattern by saying that for females there is a two-way interaction between achievement-orientation type and test difficulty: achievement-oriented females work harder on challenging tests than on easy tests, while achievement-avoiding females work harder on easy tests than on difficult tests. For males, this interaction is reversed. The pattern shown in the tables above thus represents a three-way interaction between factors: the main effect for test difficulty is modified by achievement orientation, and that two-way interaction is in turn modified by gender. As you can see, the description of the interaction has become much more involved.

A General Way to Express Interactions

A general way to express all interactions is to say that an effect is modified (qualified) by another effect. For the three-way interaction in the previous paragraph, we may say that the two-way interaction between test difficulty and achievement orientation is modified (qualified) by gender. If we have a four-way interaction, we may say that the three-way interaction is modified by the fourth variable, that is, that there are different types of interactions in the different levels of the fourth variable. As it turns out, in many areas of research five-way or higher interactions are not that uncommon.

Appendix: Repeated-Measures Analysis of Variance (SPSS)

Does background music affect thinking, specifically semantic processing? To investigate this, 10 participants solved anagrams while listening to three different types of background music. The music was chosen to represent Classical, Easy Listening and Country styles: Beethoven, the BeeGees and Garth Brooks, respectively. To eliminate verbal interference, instrumental "muzak" medleys of each music type were used, all 10 minutes in length. Anagrams of equal difficulty were chosen and randomly paired with the different types of music. A within-subjects design was used, so all participants were tested in every condition; the order of conditions was counterbalanced across participants. The number of anagrams solved in each condition was recorded for analysis. Here are the data recorded for the participants:

Beethoven   BeeGees   Brooks
   14         14        16
   16         10        12
   11          8         9
   17         15        15
   13         10        12
   14          8        11
   15         10         8
   12         12        12
   10         13        14
   13         11        10

Use 3 columns to enter the data just as they appear above; call them "bthoven", "beegees" and "brooks". SPSS does not perform post-hoc comparisons for repeated-measures analyses, so you will have to use the formula for the Tukey test and calculate the critical difference by hand for this problem. The general formula is

Tukey CD = q * sqrt(MSwg / n)

where MSwg = MS within groups (from the ANOVA table output). This gives you a critical difference (CD); any two means which differ by this amount (the CD) or more are significantly different from each other.

To run the analysis: Analyze > General Linear Model > Repeated Measures; Within-Subjects factor name: music; Number of levels: 3; click Add > Define; click over the variables corresponding to levels 1, 2 and 3; Options > Descriptive Statistics (optional, if you want); Plots > Horizontal axis: music.

H0: mu1 = mu2 = mu3
H1: one or more mu values are unequal

Source      SS       df     MS       F       p
Between    29.87      2    14.93    4.42    .027
Subjects   91.50      9
Within     60.80     18     3.38

F = 4.42, critical F(2,18) = 3.55. Statistical decision: reject H0.

Summary of Tukey Results

Means from the descriptives section of the output:
Group 1 (Beethoven): 13.50
Group 2 (BeeGees):   11.10
Group 3 (Brooks):    11.90

With n = number of scores per condition and q = the studentized range statistic (table in the back of the textbook):

Tukey CD = 3.61 * sqrt(3.38/10) = 2.098, or about 2.10

Any two groups differing by 2.10 or more are significantly different with the Tukey test at alpha = .05.
1 compared to 2: difference of 2.40 (greater than our CD, so this is statistically significant)
2 compared to 3: difference of 0.80 (less than our CD, so this is not statistically significant)
1 compared to 3: difference of 1.60 (less than our CD, so this is not statistically significant)

Conclusion

A one-way repeated-measures ANOVA indicated that there were significant differences in the number of anagrams solved across the three background instrumental (muzak) conditions, F(2,18) = 4.42, p < .05. Post-hoc Tukey comparisons indicated that more anagrams were solved while listening to Beethoven (M = 13.5) than while listening to the BeeGees (M = 11.1).

Table 4.1 ANOVA: More Than Two Independent Means: SPSS program

$Spss/Output=A.out
Title 'Analysis Of Variance - 1st Iteration'
Data List Free File='a.in'/Gp Y
Oneway Y By Gp(1,5)/Ranges=Duncan
Statistics 1
Manova Y By Gp(1,5)/Print=Homogeneity(Bartlett)/
Npar Tests K-W Y By Gp(1,5)/
Finish

Table 4.2 Chi Square Test: Dependency: SPSS program

$Spss/Output=A.out
Title 'Problem 4.18'
Data List Free File='a.in'/Freq Sample Nom
Weight By Freq
Variable Labels Sample 'Sample 1 To 4' Nom 'Less Or More Than 8'
Value Labels Sample 1 'Sample1' 2 'Sample2' 3 'Sample3' 4 'Sample4'/ Nom 1 'Less Than 8' 2 'Gt/Eq To 8'/
Crosstabs Tables=Nom By Sample/
Statistic 1
Finish
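The by-hand partition and the Tukey comparison above can be checked with a short script. This sketch is plain Python (not SPSS) applied to the anagram data; the value q = 3.61 is an assumption taken from a studentized-range table for 3 treatments and 18 error df:

```python
import math

# Anagrams solved per participant under each background-music condition
beethoven = [14, 16, 11, 17, 13, 14, 15, 12, 10, 13]
beegees   = [14, 10,  8, 15, 10,  8, 10, 12, 13, 11]
brooks    = [16, 12,  9, 15, 12, 11,  8, 12, 14, 10]
conditions = [beethoven, beegees, brooks]
n, k = len(beethoven), len(conditions)        # 10 subjects, 3 conditions

grand_mean = sum(map(sum, conditions)) / (n * k)
ss_total = sum((x - grand_mean) ** 2 for cond in conditions for x in cond)
ss_between = n * sum((sum(c) / n - grand_mean) ** 2 for c in conditions)
subject_means = [sum(c[i] for c in conditions) / k for i in range(n)]
ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
ss_error = ss_total - ss_between - ss_subjects   # residual (within) SS

ms_between = ss_between / (k - 1)
ms_error = ss_error / ((k - 1) * (n - 1))        # 18 df
F = ms_between / ms_error                        # cf. critical F = 3.55

q = 3.61                             # studentized range, alpha = .05 (assumed)
cd = q * math.sqrt(ms_error / n)     # Tukey critical difference
print(round(F, 2), round(cd, 2))     # 4.42 2.1
```

The printed values reproduce the ANOVA table and the hand-calculated critical difference above.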

UNIT IV: MULTIVARIATE ANALYSIS

LESSON 28: APPLICATION OF CORRELATION TECHNIQUE IN RESEARCH METHODOLOGY

Students, we all know that research methods provide a sound background for decision-making. Managers in day-to-day life make decisions, personal and professional, based on predictions of future events, which can be either intuitive or calculative. Now the question arises: how will you predict the future? What forms the base for forecasts? Decision-makers basically rely on relationships: if you know how the known is related to future events, it will prove very useful in decision-making. Today we will be emphasising the determination of relationships between variables. Regression and correlation analysis help us in determining the nature and strength of the relationship between two variables.

Correlation basically measures the degree of concurrence of two variables; it helps us in determining the degree to which two variables are related to each other. The (Pearson) correlation coefficient, r, is a measure of linear association between two continuous variables; it is a measure of how well the data fit a straight line. When you start using the correlation technique you ask the question: "Is there a linear relationship between the two variables?" If the answer is yes, we use Pearson's correlation coefficient. We can also perform a hypothesis test of the significance of the correlation coefficient; the null hypothesis in this case would be that there is no correlation between the two variables (i.e., the population correlation coefficient is zero).

You should be aware of how the correlation coefficient is interpreted, what the basic problems associated with its use are, and when you should be using it. You should also remember that:
• The correlation coefficient is unaffected by units of measurement
• Correlations of less than 0.7 should be interpreted cautiously
• It is harder to spot a correlation close to zero
• Correlation does not imply causation
• Overall, correlation on its own is not a very useful technique; I do not find it so

Correlation should not be used when:
• There is a non-linear relationship between the variables
• There are outliers
• There are distinct sub-groups, e.g. healthy controls mixed with diseased cases
• The values of one of the variables are determined in advance, e.g. picking the doses of a drug in an experiment measuring its effect

Because correlation is nothing but the concurrence of two sets of data, you may sometimes get spurious correlations. Spurious correlations crop up all the time, for example:
• The price of petrol shows a positive correlation with the divorce rate over time
• The number of deaths from heart attacks in a population rises with the incidence of long-sightedness over time
• Maximum daily air temperature and the number of deaths of cattle were positively correlated during December 2003

In each of these examples the correlation coefficient quoted is spurious.

The range of the correlation coefficient is 1 >= r >= -1:
• If r > 0 we have a positive correlation
• If r < 0 we have a negative correlation
• If r = 0 we have no correlation

[Diagram: scatter plots showing data sets with different correlations.]

Note:
1. It is the custom that we measure the independent variable on the X-axis and the dependent variable on the Y-axis.
2. These are not the only possible patterns, but they are the ones we come across most often.

Why are variables correlated? If two variables, A and B, are correlated then there are four possibilities:
1. The result occurred by chance, i.e. the relationship may be coincidental.
2. A influences ('causes') B, or B influences A, i.e. there is a direct cause-and-effect relationship or a reverse cause-and-effect relationship.
3. A and B are influenced by some other variable(s), i.e. the relationship may be caused by a third variable. This can happen in two ways:
   • C may 'cause' both A and B. For example, increased consumption of sugar increases the number of caries a person has and also increases their weight; does more weight cause more caries? Similarly, a child's height and ability to read are correlated because both increase with age. If we repeatedly measure two variables on the same individual over a period of time, we will tend to see a correlation.
   • A may lead to an increase in C which 'causes' B. For example, low income may increase the chance of smoking, which in turn increases the chance of death from lung cancer; does low income cause lung cancer? Not directly.
4. The relationship may be caused by complex interactions of several variables.

Calculation of Pearson's Correlation Coefficient

You have all studied the notation SS, the sum of squares. The correlation coefficient "r" is defined as the sum of squares of XY divided by the square root of the sum of squares of X multiplied by the sum of squares of Y:

r = SS(xy) / sqrt[ SS(x) * SS(y) ]

Each of these values has been seen before in the sum-of-squares notation section; for example, SS(x) could be written as

SS(x) = Σx² - (Σx)²/n

This is the formula we would use for calculating the linear correlation coefficient by hand. It is obtained from the definitional formula through some simple algebra and substitutions using the SS notation discussed earlier: if you divide the numerator and denominator of the definitional formula by n, you get something which should start to look familiar. This notation makes the calculations very easy. Since we are using SPSS, however, we will not be calculating correlation by hand.

Coefficient of Determination
• The square of the coefficient of correlation, r², is called the coefficient of determination.
• It gives the proportion of the variation in the actual values which can be predicted by changes in the values of the independent variable.
• r² ranges from 0 to 1 (while r ranges from -1 to +1).
• Expressed as a percentage, it represents the proportion of variation that can be predicted by the regression line.
• The value 1 - r² is therefore the proportion contributed by other factors.

I hope, students, that the calculation and concepts relating to correlation are now clear; we can proceed to the use of correlation in decision-making.

Solved Example: Pearson's Correlation Coefficient
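The SS shortcut formulas above can be wrapped in a few lines of code. A minimal sketch in plain Python (the data values are invented purely for illustration):

```python
import math

def pearson_r(x, y):
    """Linear correlation via the sum-of-squares shortcut formulas."""
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]          # illustrative data, not from the lesson
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))   # 0.7746
```

Note that the function only reports linear association; the cautions above (non-linearity, outliers, sub-groups) still apply to whatever data are fed in.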

      Accounting   Statistics
          X            Y          X²        Y²        XY
 1        74           81        5476      6561      5994
 2        93           86        8649      7396      7998
 3        55           67        3025      4489      3685
 4        41           35        1681      1225      1435
 5        23           30         529       900       690
 6        92          100        8464     10000      9200
 7        64           55        4096      3025      3520
 8        40           52        1600      2704      2080
 9        71           87        5041      7569      6177
10        33           24        1089       576       792
11        30           48         900      2304      1440
12        71           76        5041      5776      5396
Sum      687          741       45591     52525     48407
Mean      57.25        61.75

Interpretation/Conclusion

There is a linear relation between the results of Accounting and Statistics, as shown by the scatter diagram in Figure 1. This indicates that the two variables are positively correlated (Y increases as X increases). The Coefficient of Correlation (r) has a value of 0.9194; this shows that the two variables are correlated. The Coefficient of Determination is 0.8453, so nearly 85% of the variation in Y is explained by the regression line. It can be said that the results of Statistics are correlated to those of Accounting, or vice versa; in this example, the choice of dependent and independent variables is arbitrary.

A linear regression analysis was done using the least-squares method. Figure 2 shows the regression line; the resultant regression line is ŷ = 7.01 + 0.956X, in which X represents the results of Accounting and Y those of Statistics.

Points to Ponder
• Correlation is a useful tool to determine the relationship between two variables.
• Methods to determine correlation:
  • Scatter diagrams
  • Karl Pearson's coefficient of correlation:
    r = [ nΣxy - ΣxΣy ] / [ sqrt(nΣx² - (Σx)²) * sqrt(nΣy² - (Σy)²) ]
  • Spearman's coefficient of rank correlation:
    R = 1 - 6Σd² / [ n(n² - 1) ],  where d = difference between ranks
• The value of r always satisfies -1 <= r <= 1.

Exercise

Note: this exercise is rather lengthy! The relationship between cigarette smoking and lung function was investigated by gathering 16 college boys who had been regular smokers for the past 2 years and were of similar height and age. For each boy two measurements were taken: one was the average number of cigarettes smoked per day (CIGS), and the other was the forced expiratory volume (FEV, in litres/minute), a measure of lung function. For these data, the following were calculated:

[Table: the 16 (FEV, CIGS) observations.]

ΣCIGS = 270 (i.e. ΣX)
ΣCIGS² = 5650 (i.e. ΣX²)
ΣFEV = 43.30 (i.e. ΣY)
ΣFEV² = 121.91 (i.e. ΣY²)
ΣFEV×CIGS = 687.5 (i.e. ΣXY)
N = 16

1. Calculate the correlation between FEV and CIGS.
2. Determine the proportion of variation in FEV explained by CIGS.
3. Calculate the regression line to predict FEV from CIGS.
4. For a boy who smokes 40 cigarettes a day, what is the predicted value of FEV?
5. What caution(s) might you add to the estimate in question 4?
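When only summary totals like those above are available, both r and the least-squares line follow directly from them. The sketch below (plain Python) uses small invented totals, deliberately not the exercise's figures, so the exercise itself is left to the reader:

```python
import math

def r_and_line_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r and least-squares line y-hat = a + b*x from summary totals."""
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    r = ss_xy / math.sqrt(ss_x * ss_y)
    b = ss_xy / ss_x                       # slope
    a = sum_y / n - b * sum_x / n          # intercept
    return r, a, b

# Invented totals, corresponding to x = [1, 2, 3], y = [6, 5, 1]
r, a, b = r_and_line_from_totals(n=3, sum_x=6, sum_y=12,
                                 sum_x2=14, sum_y2=62, sum_xy=19)
print(round(r, 4), round(a, 2), round(b, 2))   # -0.9449 9.0 -2.5
```

Substituting the exercise's own totals into the same function answers questions 1 and 3; question 4 is then a matter of evaluating a + b*x at x = 40.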

LESSON 29: MULTICOLLINEARITY IN MULTIPLE REGRESSION

Definition and Effect of Multicollinearity

One of the key assumptions we make when carrying out a regression analysis is that the independent variables are uncorrelated with one another. If there is a high level of correlation between some of the independent variables, we have a problem that statisticians call multicollinearity. In multiple-regression analysis, the regression coefficients often become less reliable as the degree of correlation between the independent variables increases: adding a second variable which is highly correlated with the first distorts the values of the regression coefficients. The slope coefficients become distorted and are associated with high standard errors, so their t values may be small (and the prob values at which these t ratios are significant may be higher), even when the regression as a whole is significant. In such a case we could often have done as well with only one variable rather than two.

Why would multicollinearity occur? For example, if we wish to estimate a firm's sales revenue and we use both the number of salesmen employed and their total salary bill as explanatory variables, these two would naturally be highly correlated with each other.

What effect does multicollinearity have on a regression? Essentially, when we run a regression where the independent variables are highly correlated, the overall predictive power of the regression may not be affected: the regression may continue to have a high R², and if the explanatory variables are relevant and explain a significant proportion of the variation in y, we can still make reasonably accurate predictions. But the values of the individual b coefficients may not be significant: they may have large standard errors and small t values. What we can't do is tell with much precision how the dependent variable changes in response to changes in any one of the correlated variables.

An example will make the issue clearer. For the past 12 months, the manager of Pizza Shack has been running a series of advertisements in the local newspaper; the ads are scheduled and paid for in the month before they appear. Each of the ads contains a two-for-one coupon, which entitles the bearer to receive two Pizza Shack pizzas while paying for only the more expensive of the two. The manager has collected the data in Table 4 and would like to use them to predict pizza sales.

Table 4 Pizza Shack Sales and Advertising Data: twelve monthly observations, May through April, of X1 = number of ads appearing, X2 = cost of ads appearing (in $000s) and Y = total pizza sales (in $000s).

In Tables 5 and 6 we have given Minitab outputs for the regressions of total sales on the number of ads and on the cost of ads, respectively.

Table 5 Minitab regression of sales on number of ads

Regression Analysis
The regression equation is
SALES = 16.9 + 2.08 ADS

Predictor      Coef     Stdev    t-ratio      p
Constant     16.937     4.982      3.40    0.007
ADS          2.0832    0.5271      3.95    0.003

s = 4.206    R-sq = 61.0%

Analysis of Variance
SOURCE       DF        SS        MS        F       p
Regression    1    276.31    276.31    15.62   0.003
Error        10    176.88     17.69
Total        11    453.19

Table 6 Minitab regression of sales on the cost of ads

Regression Analysis
The regression equation is
SALES = 4.17 + 2.87 COST

Predictor      Coef     Stdev    t-ratio      p
Constant      4.173     7.109      0.59    0.570
COST         2.8725    0.6330      4.54    0.001

s = 3.849    R-sq = 67.3%

Analysis of Variance
SOURCE       DF        SS        MS        F       p
Regression    1    305.04    305.04    20.59   0.001
Error        10    148.15     14.81
Total        11    453.19

What has happened here? In the simple regression on the number of ads, the observed t value for ADS is 3.95; with 10 degrees of freedom and a significance level of a = 0.01, the critical t value is found to be 3.169. Because t_o > t_c, we conclude that the number of ads is a highly significant explanatory variable for total sales. Note also that r² = 61.0%, so the number of ads explains about 61 percent of the variation in pizza sales. For the regression on the cost of ads, the observed t value is 4.54, so the cost of ads is an even more significant explanatory variable for total sales than was the number of ads (for which the observed t value was only 3.95); here r² = 67.3%, so about 67 percent of the variation in pizza sales is explained by the cost of ads.

Using Both Explanatory Variables in a Multiple Regression

Because both explanatory variables are highly significant by themselves, we try to use both of them in a multiple regression. The output is in Table 7.

Table 7 Minitab regression of sales on the number and cost of ads

Regression Analysis
The regression equation is
SALES = 6.58 + 0.62 ADS + 2.14 COST

Predictor      Coef     Stdev    t-ratio      p
Constant      6.584     8.542      0.77    0.461
ADS           0.625     1.120      0.56    0.591
COST          2.139     1.470      1.45    0.180

s = 3.989    R-sq = 68.4%

Analysis of Variance
SOURCE       DF        SS        MS       F       p
Regression    2    309.99    154.99    9.74   0.006
Error         9    143.20     15.91
Total        11    453.19

Loss of Individual Significance

The multiple regression is highly significant as a whole, because the ANOVA p is 0.006. However, if we look at the p values for the individual variables in the multiple regression, we see that even at a = 0.1 neither variable is a significant explanatory variable.

Correlation Between Two Explanatory Variables

This contradiction is explained once we notice that the number of ads is highly correlated with the cost of ads. In fact the correlation between these two variables is r = 0.8949, so we have a problem with multicollinearity in our data. You might wonder why the two variables are not perfectly correlated. This is because the cost of an ad varies slightly, depending on where it appears in the newspaper; for example, in the Sunday paper, ads in the TV section cost more than ads in the news section, and the manager of Pizza Shack has placed Sunday ads in each of these sections on different occasions.

Both Variables Explain the Same Thing

At this point it is fair to ask: which variable is really explaining the variation in total sales in the multiple regression? The answer is that both are: collectively the two variables are very significant, but individually neither is. Because X1 and X2 are closely related to each other, in effect they each explain the same part of the variation in Y; that is why their coefficients in the multiple regression have high standard errors, relatively small computed t values, and relatively large prob > |t| values. Note that r² goes from 61.0 percent in the first simple regression and 67.3 percent in the second simple regression to only 68.4 percent in the multiple regression: adding the number of ads as a second explanatory variable to the cost of ads explains only about 1 percent more of the variation in total sales. We cannot separate out their individual contributions because they are so highly correlated with each other.

How Does This Multicollinearity Affect Us?

We are still able to make relatively precise predictions when it is present. Note that for the multiple regression, the standard error of estimate (which determines the widths of confidence intervals for predictions) is s_e = 3.989, while for the simple regression with the cost of ads as the explanatory variable it is s_e = 3.849 — not much different. What we can't do is tell with much precision how sales will change if we increase the number of ads by one. The multiple regression says b1 = 0.625 (that is, each ad increases total pizza sales by about $625), but the standard error of this coefficient is 1.120 (that is, about $1,120).

Multicollinearity is a problem you have to deal with in multiple regressions, and developing a common-sense understanding of it is necessary. Remember that you can still make fairly precise predictions when it is present, but you can't tell with much precision how much the dependent variable will change if you "jiggle" one of the independent variables. So our aim should be to minimise multicollinearity.

Hint: the best multiple regression is one that explains the relationship among the data by accounting for the largest proportion of the variation in the dependent variable, with the fewest number of independent variables.
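A quick practical guard against this problem is simply to correlate each pair of candidate predictors before fitting. The sketch below is plain Python; the 0.7 cut-off echoes the interpretation rule of thumb from Lesson 28, and the sample columns are invented (the nearly proportional salesmen/salary-bill pair mirrors the example given earlier), not the Pizza Shack data:

```python
import math
from itertools import combinations

def pearson_r(x, y):
    n = len(x)
    ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / math.sqrt(ss_x * ss_y)

def collinear_pairs(predictors, cutoff=0.7):
    """Return predictor-name pairs whose |r| exceeds the cutoff."""
    return [(a, b) for a, b in combinations(predictors, 2)
            if abs(pearson_r(predictors[a], predictors[b])) > cutoff]

# Invented example: salesmen employed vs. their total salary bill
preds = {
    "salesmen": [4, 5, 6, 8, 9, 11],
    "salary_bill": [41, 52, 60, 79, 92, 110],   # nearly proportional
    "region_index": [3, 1, 4, 1, 5, 2],         # unrelated
}
print(collinear_pairs(preds))   # [('salesmen', 'salary_bill')]
```

Flagged pairs are candidates for dropping one member, which is the simplest way of following the hint above.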

Warning: throwing in too many independent variables just because you have a computer is not a great idea.

Exercises

Q1. Edith Pratt is a busy executive in a nationwide trucking company. Edith is late for a meeting because she has been unable to locate the multiple regression output that an associate produced for her. If the total regression was significant at the 0.05 level, then she wanted to use the computer output as evidence to support some of her ideas at the meeting. The subordinate, however, is sick today, and Edith has been unable to locate his work. As a matter of fact, all the information she possesses concerning the multiple regression is a piece of scrap paper with the following on it:

Regression for E. Pratt
SSR    872.4
SSE            with 17 df
SST   1023.6   with 24 df

Because the scrap paper does not even have a complete set of numbers on it, Edith has concluded that it must be useless. You, however, should know better. Should Edith go directly to the meeting, or continue looking for the computer output?

Q2. A New England based commuter airline has taken a survey of its 15 terminals and has obtained the following data for the month of February, where

SALES  = total revenue based on number of tickets sold
PROMOT = amount spent on promoting the airline in the area (in thousands of dollars)
COMP   = number of competing airlines at that terminal
FREE   = the percentage of passengers who flew free (for various reasons)

[Table: SALES, PROMOT, COMP and FREE for each of the 15 terminals.]

a. Use the following Minitab output to determine the best-fitting regression equation for the airline:

The regression equation is
SALES = 172 + 25.9 PROMOT - 13.2 COMP - 3.04 FREE

Predictor      Coef      Stdev    t-ratio      p
Constant     172.34     51.877      3.32     0.006
PROMOT       25.950        …          …      0.000
COMP        -13.238      3.686     -3.59     0.004
FREE         -3.041      2.342     -1.30     0.221

b. Do the passengers who fly free cause sales to decrease significantly? State and test appropriate hypotheses; use a = 0.05.
c. Does an increase in promotions by $1,000 change sales by $28,000, or is the change significantly different from $28,000? State and test appropriate hypotheses; use a = 0.10.
d. Give a 90 percent confidence interval for the slope coefficient of COMP.

LESSON 30: MULTIPLE REGRESSION

So far we have talked of regression analysis using only one independent explanatory variable. However, it is very rare that we have just one explanatory variable, and the explanatory power of the estimated equation can often be substantially improved by the addition of more independent variables. For example, in our earlier example of household consumption we can probably improve the explanatory power of the equation by adding variables such as household size, the age distribution of the household, etc. Thus, while simple linear regression estimates a regression line between two variables, in multiple regression with two explanatory variables there is a regression plane among y, x1 and x2; this is shown in Figure 1 below. The regression plane is determined in the same way as the regression line: by minimising the sum of squared deviations of the data points from the plane. We can expect the estimation to be more accurate as the degree of dispersion around this regression plane is less.

Multiple Regression Equation

The general form of the multiple regression equation is

ŷ = a + b1x1 + b2x2 + ... + bk xk

The three-variable case, for example, is

ŷ = a + b1x1 + b2x2 + b3x3 + e

For the two-variable case,

ŷ = a + b1x1 + b2x2 + e

we can find the multiple regression equation from the normal equations:

Σy   = na     + b1Σx1   + b2Σx2
Σx1y = aΣx1   + b1Σx1²  + b2Σx1x2
Σx2y = aΣx2   + b1Σx1x2 + b2Σx2²

These can be solved to obtain the values of the parameters a, b1 and b2. So far we have referred to a as the y-intercept and to b1, b2 as the slopes of the multiple regression; they are the estimated regression coefficients. The coefficients b1, b2 describe how changes in x1 and x2 affect the value of ŷ: b1 measures the effect on ŷ of changes in x1 holding x2 constant, and similarly b2 measures the effect on ŷ of changes in x2 holding x1 constant. The constant a is the value of ŷ when both x1 and x2 are zero. Each independent variable accounts for some of the variation in the dependent variable.

The Computer and Multiple Regression

A manager in any managerial situation deals with complex problems requiring large samples and several independent variables. With a single explanatory variable the regression can be estimated manually, but when we use two or more independent variables the process becomes that much more complex, and it is not feasible to solve for the parameters of the equation by hand. Typically, multiple regression analysis is carried out by computers, which enable us to carry out complex calculations on large volumes of data easily. Therefore our stress when discussing multiple regression will be on understanding and interpreting computer output. We now look at how a statistical package such as SPSS or Minitab handles the data.

An example will help make the process clearer. Suppose the IRS in the US wishes to model the discovery of unpaid taxes. The independent variables are:
1. Number of field-audit labour hours (in hundreds)
2. Number of computer hours (in hundreds)
3. Rewards to informants (in $000s)

The dependent variable is the actual unpaid taxes discovered (in $ millions). The data are shown in Table 1.

Table 1
Month   Field-audit hours   Computer hours   Rewards to informants   Actual unpaid taxes
Jan           45                 16                  71                    29
Feb           42                 14                  70                    24
March         44                 15                  72                    27
Apr           43                 13                  71                    25
May           46                 13                  75                    26
Jun           44                 14                  74                    28
July          45                 16                  76                    30
Aug           44                 16                  69                    28
Sept          43                 15                  74                    28
Oct           42                 15                  73                    27

The regression equation that we estimate is

ŷ = a + b1x1 + b2x2 + b3x3 + e
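The three normal equations above form a 3-by-3 linear system in a, b1 and b2, so for small problems they can be solved directly. A sketch in plain Python using Cramer's rule; the data are contrived so that y = 1 + 2x1 + 3x2 exactly, making the expected answer obvious:

```python
def fit_two_predictors(x1, x2, y):
    """Solve the normal equations for y-hat = a + b1*x1 + b2*x2."""
    n = len(y)
    s = lambda u, v: sum(a * b for a, b in zip(u, v))
    # Coefficient matrix and right-hand side of the normal equations
    A = [[n,        sum(x1),    sum(x2)],
         [sum(x1),  s(x1, x1),  s(x1, x2)],
         [sum(x2),  s(x1, x2),  s(x2, x2)]]
    rhs = [sum(y), s(x1, y), s(x2, y)]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det3(A)
    solution = []
    for j in range(3):                     # Cramer's rule, column by column
        m = [row[:] for row in A]
        for i in range(3):
            m[i][j] = rhs[i]
        solution.append(det3(m) / d)
    return solution                        # [a, b1, b2]

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y  = [9, 8, 19, 18, 26]                    # exactly 1 + 2*x1 + 3*x2
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))   # 1.0 2.0 3.0
```

Cramer's rule is fine for this two-predictor case; with more predictors (or nearly collinear ones) a statistical package's least-squares routine is the sensible route, which is exactly the point made next.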

The generalised multiple regression model is specified for k variables with n data points: for each of the k independent variables we have n data points, and thus k + 1 parameters to estimate from the sample data. A regression was now run on Minitab for the IRS data, and the sample output is given in Table 2. We now have to interpret this output.

1. The estimating equation. The regression equation is of the form ŷ = a + b1x1 + b2x2 + b3x3. From the numbers given in the coefficient column of the output we can read the estimating equation:

ŷ = -45.8 + 0.597 AUDIT + 1.18 COMP + 0.405 REWARDS

How do we interpret this equation? The interpretation is similar to that of the one-variable simple linear regression case:
• If we hold the number of field-audit labour hours and the number of computer hours constant and change the rewards to informants by one unit, ŷ changes by 0.405: recoveries increase by an additional $405,000 for each additional $1,000 paid to informants.
• Similarly, holding x2 and x3 constant, an additional 100 hours of field-audit time increases recoveries by $597,000.
• Likewise, holding x1 and x3 constant, an additional 100 hours of computer time increases recoveries by $1,177,000.

We can also use this equation to solve problems such as the following. Suppose that in November the IRS plans to leave field hours and computer hours at their October levels but to increase rewards to $75,000; how much in recoveries can they expect in November? We get a forecast by substituting x1 = 43 (4,300 hours), x2 = 15 (1,500 hours) and x3 = 75 ($75,000) into the equation (using the unrounded coefficients from the output):

ŷ = -45.8 + 0.597(43) + 1.18(15) + 0.405(75) ≈ 27.905

that is, about $27,905,000, or approximately $28 million.

2. Standard error of the regression. Now that we have our equation, we need some measure of the dispersion of the actual observations around the estimated regression plane; smaller values indicate a better regression. We measure this dispersion by the standard error of the estimate,

se = sqrt[ Σ(Y - Ŷ)² / (n - k - 1) ]

where Y denotes the sample values of the dependent variable, Ŷ the corresponding estimated values from the regression equation, n the number of data points in the sample, and k the number of independent variables (3 in our example). The denominator of this equation shows that in a regression with k independent variables the standard error has n - k - 1 degrees of freedom: one further degree of freedom is lost through the estimation of the intercept term a. The standard error of the regression is also called the root mean square error; its square is the mean square error (MSE). If the addition of another variable reduces se, we say that the inclusion of that variable improves the fit of the regression. In our sample output the standard error is indicated by s; here s = 0.286, i.e. $286,000.

3. Confidence intervals. We can use the standard error of estimate (or the MSE) and the t distribution to form an approximate confidence interval around an estimated value. For example, to construct a 95% confidence interval around the November estimate of $27,905,000, we take t with n - k - 1 = 6 degrees of freedom, i.e. t = 2.447:

$27,905,000 ± 2.447 × $286,000

giving a lower limit of approximately $27,205,000 and an upper limit of approximately $28,605,000.

The Coefficient of Multiple Determination

In a multiple regression we measure the strength of the relationship among the independent variables and the dependent variable by the coefficient of multiple determination, R². This is defined as the proportion of the total variation in y that is explained by the regression plane. In our example R² = 98.3%: 98.3% of the variation in unpaid taxes is explained by the three independent variables. As we add more variables to a regression, the explanatory power of the equation improves if R² increases.

Example. Pam Schneider owns and operates an accounting firm in Ithaca, New York. Pam feels that it would be useful to be able to predict in advance the number of rush income-tax returns during the busy March 1 to April 15 period, so that she can better plan her personnel needs during this time. She has hypothesised that several factors may be useful in her prediction, where X1 = an economic index, X2 = the population within 1 mile of her office, X3 = the average income in Ithaca, and Y = the number of rush returns filed from March 1 to April 15. Data for these factors and the number of rush returns for past years are as follows:
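The two fit statistics just discussed, se and R², take only a few lines to compute once the fitted values are in hand. A sketch in plain Python; the four actual/fitted pairs are invented, with k = 1 predictor:

```python
import math

def fit_statistics(y, y_hat, k):
    """Standard error of estimate and R-squared for a fitted regression."""
    n = len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))   # residual SS
    mean_y = sum(y) / n
    sst = sum((a - mean_y) ** 2 for a in y)             # total SS
    se = math.sqrt(sse / (n - k - 1))                   # n - k - 1 df
    r_squared = 1 - sse / sst
    return se, r_squared

y     = [3, 5, 7, 10]            # invented observations
y_hat = [3.5, 4.5, 7.5, 9.5]     # invented fitted values from one predictor
se, r2 = fit_statistics(y, y_hat, k=1)
print(round(se, 4), round(r2, 4))   # 0.7071 0.9626
```

Note how k enters only through the degrees of freedom: adding a variable always shrinks the residual SS, but se improves only if the shrinkage outruns the lost degree of freedom, which is the criterion for "improving the fit" given above.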
Data for these factors and the number of rush returns for past years are as follows, where
X1 = economic index
X2 = population within 1 mile of the office
X3 = average income in Ithaca
Y = number of rush returns, March 1 to April 15
X1    X2      X3      Y (March 1 to April 15)
 99   10188   21465   2306
106    8566   22228   1266
100   10557   27665   1422
129   10219   25200   1721
179    9662   26300   2544
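Minitab or SPSS will produce the fitted plane directly; when no package is at hand, the same least-squares fit can be sketched with numpy (an assumption of this example — the book itself uses Minitab):

```python
import numpy as np

# Pam Schneider's data: X1 economic index, X2 population within 1 mile,
# X3 average income in Ithaca, Y rush returns (five past years).
x1 = np.array([99., 106., 100., 129., 179.])
x2 = np.array([10188., 8566., 10557., 10219., 9662.])
x3 = np.array([21465., 22228., 27665., 25200., 26300.])
y  = np.array([2306., 1266., 1422., 1721., 2544.])

# Design matrix with a leading column of 1s so the intercept a is
# estimated alongside b1, b2, b3.
A = np.column_stack([np.ones(5), x1, x2, x3])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Forecast for this year: index 169, population 10,212, income $26,925.
forecast = coef @ np.array([1.0, 169.0, 10212.0, 26925.0])
```

With only five observations and four parameters there is just one degree of freedom, so this fit is fragile; the point of the sketch is the mechanics, not the numbers.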

Results

Use the following Minitab output to determine the best-fitting regression equation for these data:

The regression equation is
Y = -1275 + 17.1 X1 + 0.541 X2 - 0.174 X3

Predictor   Coef      Stdev
Constant    -1275     2699
X1           17.059    6.245
X2            0.5406   0.3144
X3           -0.1743   0.1005

S = 396.1   R-sq = 87.2%

a. The best-fitting equation is Ŷ = -1275 + 17.059 X1 + 0.5406 X2 - 0.1743 X3.
b. R² = 87.2%: 87.2% of the total variation in Y (the number of rush returns) is explained by the model.
c. For this year, the economic index is 169, the population within 1 mile of the office is 10,212, and the average income in Ithaca is $26,925. Pam should therefore expect

Ŷ = -1275 + 17.059(169) + 0.5406(10,212) - 0.1743(26,925) ≈ 2436 rush returns between March 1 and April 15.

Exercises

Q1 Given the following set of data, use whatever computer package is available to find the best-fitting regression equation and answer the following:
a. What is the regression equation?
b. What is the standard error of estimate?
c. What is R² for this regression?
d. What is the predicted value for Y when X1 = 5.8, X2 = 4.2, X3 = 5.1?

(Data table: observations on Y, X1, X2 and X3.)

Q2 Given the following set of data, use whatever computer package is available to find the best-fitting regression equation and answer the following:
a. What is the regression equation?
b. What is the standard error of estimate?
c. What is R² for this regression?
d. Give an approximate 95 percent confidence interval for the value of Y when the values of X1, X2, X3 and X4 are 52.4, 41.6, 35.8 and 3.1, respectively.

(Data table: six observations on Y, X1, X2, X3 and X4; the X4 column reads -2, 5, 2, -4, 8, 1.)
Q3 We are trying to predict the annual demand for widgets (DEMAND) using the following independent variables:

PRICE = price of widgets (in $)
INCOME = consumer income (in $)
SUB = price of a substitute commodity (in $)

(NOTE: A substitute commodity is one that can be substituted for another commodity. For example, margarine is a substitute commodity for butter.)

Year   Demand   Price ($)   Income   Sub ($)
1982     40        9          400       10
1983     45        8          500       14
1984     50        9          600       12
1985     55        8          700       13
1986     60        7          800       11
1987     70        6          900       15
1988     65        6         1000       26
1989     65        8         1100       27
1990     75        5         1200       22
1991     75        5         1300       19
1992     80        5         1400       20
1993    100        3         1500       23
1994     90        4         1600       18
1995     95        3         1700       24
1996     85        4         1800       21

a. Using whatever computer package is available, determine the best-fitting regression equation for these data.
b. State and interpret the coefficient of multiple determination for this problem.
c. State and interpret the standard error of estimate for this problem.
d. Are the signs (+ or -) of the regression coefficients of the independent variables as one would expect? Explain briefly.
e. Using the equation, what would you predict for DEMAND if the price of widgets was $6, consumer income was $1,200, and the price of the substitute commodity was $17?
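One way to start on the exercise, assuming numpy as the "computer package available" (parts b–d still require interpreting the output yourself):

```python
import numpy as np

# Widget data from the exercise, 1982-1996.
price  = np.array([9, 8, 9, 8, 7, 6, 6, 8, 5, 5, 5, 3, 4, 3, 4], dtype=float)
income = np.array([400, 500, 600, 700, 800, 900, 1000, 1100, 1200,
                   1300, 1400, 1500, 1600, 1700, 1800], dtype=float)
sub    = np.array([10, 14, 12, 13, 11, 15, 26, 27, 22, 19, 20, 23, 18, 24, 21], dtype=float)
demand = np.array([40, 45, 50, 55, 60, 70, 65, 65, 75, 75, 80, 100, 90, 95, 85], dtype=float)

# Fit DEMAND = a + b1*PRICE + b2*INCOME + b3*SUB by least squares.
A = np.column_stack([np.ones(15), price, income, sub])
coef, *_ = np.linalg.lstsq(A, demand, rcond=None)

fitted = A @ coef
r2 = 1 - ((demand - fitted) ** 2).sum() / ((demand - demand.mean()) ** 2).sum()

# Part e: prediction at price $6, income $1,200, substitute price $17.
pred = coef @ np.array([1.0, 6.0, 1200.0, 17.0])
```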

RESEARCH METHODOLOGY

LESSON 31: MAKING INFERENCES ABOUT POPULATION PARAMETERS

Essentially, when we run a regression we are estimating parameters on the basis of a sample of observations. Thus ŷ = a + bx, for example, is a sample regression line, in much the same way that x̄ is a sample estimate of the population parameter μ; the population regression line, or true relationship in the data, is Y = A + Bx.

Our Sample Regression Equation
The sample regression plane

ŷ = a + b1x1 + b2x2 + ... + bkxk

estimates the unknown population regression plane. The true form of the unknown equation for the k-variable case is

y = A + B1x1 + B2x2 + ... + Bkxk + e

This is the population regression plane plus a random disturbance term e, which equals zero on the average. Even in the case of the population regression plane, not all data points will lie on it; instead the individual data points satisfy the plane plus the disturbance term. The standard deviation of this term is σe, and the standard error of the regression se, which we discussed in the earlier section, is an estimate of σe.

Why do points scatter about the plane? Consider our IRS problem:
• Not all payments to informants will be equally effective.
• Some of the computer hours may be used for organizing data rather than analyzing accounts.
For these and other reasons, some of the data points will lie above the regression plane and some below it.

As we can see, the estimation of a regression plane can also be thought of as a problem of statistical inference, where we make inferences regarding an unknown population relationship on the basis of an estimated relationship from sample data. We can make inferences about the slopes of the true regression equation (B1, B2, ..., Bk) on the basis of the slope coefficients of the estimated equation (b1, b2, ..., bk); and, much in the same way as for hypothesis testing for a mean, we can also set up confidence intervals for the parameters of the estimated equation.

Tests of Inference for an Individual Slope Parameter Bi
We can use the value of the individual bi — the estimated slope for the ith variable — to test hypotheses about Bi, the true population value of the slope for that variable. The independent variable xi is a significant explanatory variable if bi is significantly different from zero. This test of significance of an explanatory variable is always a two-tailed test; the process of hypothesis testing is the same as that delineated for testing a mean, and significance requires that the t ratio be a large positive or negative value. In terms of the reported p-value (figure 2):
• If p < α, xi is a significant explanatory variable.
• If p > α, xi is not a significant explanatory variable.
In our IRS example, p is less than 0.01 for each of the three explanatory variables; therefore we conclude that each one is a significant explanatory variable.

Test of Significance of the Regression as a Whole
It is quite possible to get a high value of R² by pure chance — after all, if we threw darts at a board to get a scatter plot, we could generate a regression which might conceivably have a high R². Therefore we need to ask the question: does a high value of R² necessarily mean that the independent variables explain a large proportion of the variation in y, or could this be a freak of chance? In the last section we looked at whether the individual xi were significant.
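The decision rule for an individual slope can be sketched as follows. The slope b below is the field-audit coefficient from the IRS output, but its standard error is a hypothetical value, since the text does not print the slope standard errors:

```python
# Significance test for an individual slope:
# H0: B_i = 0  vs  Ha: B_i != 0, using t = b_i / s_bi (two-tailed).
b_i = 0.597    # estimated slope (field-audit hours, from the text)
s_bi = 0.081   # hypothetical standard error of this slope
t_obs = b_i / s_bi

t_crit = 2.447  # two-tailed critical value, 6 df, alpha = 0.05
significant = abs(t_obs) > t_crit
```

With these numbers t_obs is about 7.4, far beyond the critical value, so x_i would be judged a significant explanatory variable.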

Now we ask whether collectively all the xi (i = 1, ..., k) together significantly explain the variability in y. In statistical terms: is the regression as a whole significant? Our hypotheses are:

Ho: B1 = B2 = ... = Bk = 0 (null hypothesis: y does not depend on the xi)
Ha: at least one Bi ≠ 0 (alternative hypothesis: at least one Bi is not zero)

(Insert diagram, LR p. 743)

The total variation in y can be broken into two parts, the explained and the unexplained: SST = SSR + SSE, where

SST = total sum of squares = Σ(y - ȳ)²
SSR = regression sum of squares (explained variation) = Σ(ŷ - ȳ)²
SSE = error sum of squares (unexplained variation) = Σ(y - ŷ)²

This is shown in figure 3 for the one-variable case for simplicity; for the multiple-variable case the same applies conceptually.

Each of these has an associated number of degrees of freedom. SST has n - 1 degrees of freedom. SSR has k degrees of freedom, because there are k independent variables. SSE has n - k - 1 degrees of freedom, because we used n observations to estimate k + 1 parameters, a, b1, ..., bk.

If the null hypothesis is true, the ratio

F = (SSR / k) / (SSE / (n - k - 1))

has an F distribution with k numerator degrees of freedom and n - k - 1 denominator degrees of freedom. If the null hypothesis is false — i.e., the explanatory variables have a significant effect on y — then the F ratio tends to be higher than when the null hypothesis is true. So if the F ratio is large, we reject the null hypothesis and conclude that the explanatory variables collectively affect the variation of y. This test is at times called the ANOVA for the regression, because we break up the analysis of the variation in Y into explained variance (the between-column variance, explained by the regression) and unexplained variance (the within-column variance).

A typical regression output includes the computed F ratio for the regression. Going back to our IRS example, the ANOVA portion of the computer output is shown in table 3.

Table 3 Analysis of Variance

Source        DF    SS        MS       F       p
Regression     3    29.1088   9.7029   118.5   0.000
Error          6     0.4912   0.0819
Total          9    29.6000

The MS column is the sum of squares divided by its number of degrees of freedom, and F = 9.7029 / 0.0819 ≈ 118.5. The output also gives us the p-value, which is 0.000. Because p < α = 0.01, we can conclude that the regression as a whole is highly significant.

Exercises
Q1 Bill Buxton, a statistics professor in a leading business school, has a keen interest in factors affecting students' performance on exams. The midterm exam for the past semester had a wide distribution of grades, but Bill feels certain that several factors explain the distribution: he allowed his students to study from as many different books as they liked, their IQs vary, they are of different ages, and they study varying amounts of time for exams. To develop a predicting formula for exam grades, Bill asked each student to answer, at the end of the exam, questions regarding study time and number of books used. Bill's teaching record already contained the IQs and ages for the students, so he compiled the data for the class and ran a multiple regression with Minitab. The output from Bill's computer run was as follows:

Predictor   Coef       Stdev     t-ratio   p
Constant   -49.948    41.55     -1.20      0.268
Hours        1.06931   0.98163   1.09      0.312
Iq           1.36460   0.37627   3.63      0.008
Books        2.03982   1.50799   1.35      0.218
Age         -1.78990   0.67332  -2.66      0.032

s = 11.657   R-sq = 76.7%

a What is the best-fitting regression equation for these data?
b What percentage of the variation in grades is explained by this equation?
c What grade would you expect for a 21-year-old student with an IQ of 113 who studied 5 hours and used three different books?
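The F ratio in Table 3 can be recomputed from the sums of squares — a check on the arithmetic rather than new output:

```python
# Overall F test for the IRS regression (n = 10 points, k = 3 variables),
# from the sums of squares quoted in the ANOVA table.
ssr, sse = 29.1088, 0.4912
k, n = 3, 10

msr = ssr / k            # mean square, regression (≈ 9.70)
mse = sse / (n - k - 1)  # mean square, error (≈ 0.082)
f_ratio = msr / mse      # ≈ 118.5, compared with F(k, n-k-1)
```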

RESEARCH METHODOLOGY

LESSON 32: MULTICOLLINEARITY IN MULTIPLE REGRESSION

Definition and Effect of Multicollinearity
One of the key assumptions we make when carrying out a regression analysis is that the explanatory variables are independent and uncorrelated with one another. If there is a high level of correlation between some of the independent variables, we have a problem that statisticians call multicollinearity: the problem that arises when two or more of the independent variables are correlated. In multiple regression, the coefficients become unreliable if there is a high level of correlation between the independent variables.

What Effect Does Multicollinearity Have on a Regression?
Essentially, when we run a regression in which the independent variables are highly correlated, we find that the overall predictive power of the regression may not be affected — the regression may continue to have a high R² — but the regression coefficients often become less reliable as the degree of correlation between the independent variables increases. The slope coefficients become distorted and are associated with high standard errors: the individual b coefficients may not be significant, having large standard errors and small t values, and the p-values at which these t ratios are significant may also be higher.

Why Would Multicollinearity Occur?
For example, suppose we wish to estimate a firm's sales revenue and we use both the number of salesmen employed and their total salary bill as explanatory variables. These two would naturally be highly correlated with each other, and adding a second variable which is highly correlated with the first distorts the values of the regression coefficients. In such a case we could effectively have done with only one variable rather than two.

How Does Multicollinearity Affect Us?
1. If the explanatory variables are relevant and explain a significant proportion of the variation in y, we can still make reasonably accurate predictions, even when multicollinearity is present: we can often predict y well.
2. What we can't do is tell with much precision how the dependent variable changes in response to changes in any one of the correlated variables.

An Example Will Make the Issue Clearer
Let's look at an example in which multicollinearity is present to see how it affects the regression. For the past 12 months, the manager of Pizza Shack has been running a series of advertisements in the local newspaper; the ads are scheduled and paid for in the month before they appear. Each of the ads contains a two-for-one coupon, which entitles the bearer to receive two Pizza Shack pizzas while paying for only one, the more expensive of the two. The manager has collected the data in table 4 and would like to use it to predict pizza sales.

Table 4 Pizza Shack Sales and Advertising Data

Month    X1 Number of ads appearing   X2 Cost of ads appearing   Y Total pizza sales (000s of dollars)
May          12                           13.9                       43.6
June         11                           12.0                       38.0
July          9                            9.3                       30.1
Aug.          7                            9.7                       35.3
Sept.        12                           12.3                       46.4
Oct.          8                           11.4                       34.2
Nov.          6                            9.3                       30.2
Dec.         13                           14.3                       40.7
Jan.          8                           10.2                       38.5
Feb.          6                            8.4                       22.6
March         8                           11.2                       37.6
April        10                           11.1                       35.2

In tables 5 and 6 we have given Minitab outputs for the regressions of total sales on the number of ads and on the cost of ads, respectively; table 7 gives the multiple regression using both explanatory variables.

Table 5 Minitab regression of sales on number of ads
Regression Analysis
The regression equation is SALES = 16.9 + 2.08 ADS

Predictor   Coef      Stdev    t-ratio   p
Constant   16.937    4.982    3.40      0.007
ADS         2.0832   0.5271   3.95      0.003

s = 4.206   R-sq = 61.0%

Analysis of Variance
SOURCE       DF    SS       MS       F       p
Regression    1    276.31   276.31   15.62   0.003
Error        10    176.88    17.69
Total        11    453.19

Table 6 Minitab regression of sales on the cost of ads
Regression Analysis
The regression equation is SALES = 4.17 + 2.87 COST

Predictor   Coef      Stdev    t-ratio   p
Constant    4.173    7.109    0.59      0.570
COST        2.8725   0.6330   4.54      0.001

s = 3.849   R-sq = 67.3%

Analysis of Variance
SOURCE       DF    SS       MS       F       p
Regression    1    305.04   305.04   20.59   0.001
Error        10    148.15    14.81
Total        11    453.19

Table 7 Minitab regression of sales on the number and cost of ads
Regression Analysis
The regression equation is SALES = 6.58 + 0.62 ADS + 2.14 COST

Predictor   Coef     Stdev    t-ratio   p
Constant    6.584   8.542    0.77      0.461
ADS         0.625   1.120    0.56      0.584
COST        2.139   1.470    1.45      0.180

s = 3.989   R-sq = 68.4%

Analysis of Variance
SOURCE       DF    SS       MS       F      p
Regression    2    309.99   154.99   9.74   0.006
Error         9    143.20    15.91
Total        11    453.19
but an R2 of only 68. so the cost of ads is even core significant as an explanatory variable for total sales than was the number of ads (for which the observed t value was only 3. because the ANOVA p is 0.19 RESEARCH METHODOLOGY 0. © Copy Right: Rai University 176 11. Note also that r2 = 61. adding the number of ads as a second explanatory variable to the cost of ads explains only about 1 percent more of the variation in total sales.54 R –Sq =67.58 + 0.Predictor Constant ADS Coef 4. and in the multiple regression.470 1. This is because the cost of an ad varies slightly.8725 S = 3.006 P SS 309. the standard error of estimate. relatively small computed t values.0 percent.91 Total DF F 2 9.04 148.74 9 11 0.99 143. In fact the correlation between these two variables is r = 0.169. For the regression on number of ads.56 COST 1. but we cannot separate out their individual contributions because they are so highly correlated with each other.04 14.849.4 percent in the multiple regression .20 453. is 3.8949.95).989 while for the simple regression with the cost of ads as the explanatory variable (output in Table 5 ). depending on where it appears in the newspaper. Table 7 Minitab regression of sales on the number and cost of ads Regression Analysis The regression equation is SALES = 6.1 neither variables is a significant explanatory variable. in the Sunday paper.989 Analysis of Variance 6. which determines the widths of confidence intervals for predictions.556 . Using both explanatory variables in a multiple regression: - Correlation Between Two Explanatory Variables This contradiction is explained once we notice that the number of ads is highly correlated with the cost of ads.625 0. they are collectively vary significant.19 MS 305. we see that even at a = 0. “which variable is really explaining the variation in total sales in the multiple regression?” the answer is that both are.109 t=ratio p 0.59 P 0. 
Loss of Individual Significance
The multiple regression (table 7) is highly significant as a whole, because the ANOVA p is 0.006. However, if we look at the p-values for the individual variables in the multiple regression, we see that even at α = 0.1 neither variable is a significant explanatory variable. What has happened here? In the simple regression on the number of ads, the observed t value was 3.95; with 10 degrees of freedom and a significance level of α = 0.01 the critical t value is 3.169, and because to > tc we conclude that the number of ads is a highly significant explanatory variable for total sales. For the simple regression on the cost of ads, the observed t value is 4.54, so the cost of ads is an even more significant explanatory variable for total sales than the number of ads. Yet in the multiple regression, both lose their individual significance.

Correlation Between Two Explanatory Variables
This contradiction is explained once we notice that the number of ads is highly correlated with the cost of ads. In fact, the correlation between these two variables is r = 0.8949, so we have a problem with multicollinearity in our data. You might wonder why the two variables are not perfectly correlated. This is because the cost of an ad varies slightly, depending on where it appears in the newspaper: for instance, ads in the TV section of the Sunday paper cost more than ads in the news section, and the manager of Pizza Shack has placed Sunday ads in each of these sections on different occasions.

Nevertheless, Both Variables Explain the Same Thing
Because X1 and X2 are closely related to each other, in effect they each explain the same part of the variation in Y. That is why we get r² = 61.0 percent in the first simple regression and r² = 67.3 percent in the second simple regression, but an r² of only 68.4 percent in the multiple regression: adding the number of ads as a second explanatory variable to the cost of ads explains only about 1 percent more of the variation in total sales. Because both explanatory variables are highly significant by themselves, they are collectively very significant in the multiple regression, but individually not significant there. As a result, their coefficients in the multiple regression have high standard errors, relatively small computed t values, and relatively large prob > |t| values.

Individual Contributions Can't Be Separated Out
At this point, it is fair to ask, "which variable is really explaining the variation in total sales in the multiple regression?" The answer is that both are, but we cannot separate out their individual contributions because they are so highly correlated with each other. For instance, the multiple regression says b1 = 0.625 (that is, each ad increases total pizza sales by about $625), but the standard error of this coefficient is 1.120. What we can't do is tell with much precision how sales will change if we increase the number of ads by one.

How Does This Multicollinearity Affect Us?
We are still able to make relatively precise predictions when it is present: note that for the multiple regression the standard error of estimate, which determines the widths of confidence intervals for predictions, is 3.989, while for the simple regression with the cost of ads as the explanatory variable it is 3.849.
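The whole Pizza Shack story — high correlation between the two predictors, and a multiple R² barely above the better simple R² — can be reproduced from the Table 4 data. A numpy sketch (the book's own runs are in Minitab):

```python
import numpy as np

# Pizza Shack data from Table 4 (sales in $000s).
ads   = np.array([12, 11, 9, 7, 12, 8, 6, 13, 8, 6, 8, 10], dtype=float)
cost  = np.array([13.9, 12.0, 9.3, 9.7, 12.3, 11.4, 9.3, 14.3, 10.2, 8.4, 11.2, 11.1])
sales = np.array([43.6, 38.0, 30.1, 35.3, 46.4, 34.2, 30.2, 40.7, 38.5, 22.6, 37.6, 35.2])

def r_squared(X, y):
    """Fit y = a + Xb by least squares and return R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_ads  = r_squared(ads.reshape(-1, 1), sales)          # ~0.61
r2_cost = r_squared(cost.reshape(-1, 1), sales)         # ~0.67
r2_both = r_squared(np.column_stack([ads, cost]), sales)  # ~0.68

corr = np.corrcoef(ads, cost)[0, 1]  # correlation between the predictors
```

Adding the second predictor barely raises R² precisely because the two predictors carry nearly the same information.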

So our aim should be to minimize multicollinearity. Warning: throwing in too many independent variables just because you have a computer is not a great idea. Multicollinearity is a problem you have to deal with in multiple regression, and developing a common-sense understanding of it is necessary. Remember that you can still make fairly precise predictions when it is present, but that you can't tell with much precision how much the dependent variable will change if you "jiggle" one of the independent variables. (Hint: the best multiple regression is one that explains the relationship among the data by accounting for the largest proportion of the variation in the dependent variable, with the fewest number of independent variables.)

RESEARCH METHODOLOGY

Exercises
Q1 Edith Pratt is a busy executive in a nationwide trucking company. Edith is late for a meeting because she has been unable to locate the multiple regression output that an associate produced for her. If the total regression was significant at the 0.05 level, then she wanted to use the computer output as evidence to support some of her ideas at the meeting. The subordinate, however, is sick today, and Edith has been unable to locate his work. As a matter of fact, all the information she possesses concerning the multiple regression is a piece of scrap paper with the following on it:

Regression for E. Pratt
SSR    872.4
SSE            with 17 df
SST   1023.6   with 24 df

Because the scrap paper doesn't even have a complete set of numbers on it, Edith has concluded that it must be useless. You, however, should know better. Should Edith go directly to the meeting, or continue looking for the computer output?

Q2 A New England–based commuter airline has taken a survey of its 15 terminals and has obtained the following data for the month of February, where
SALES = total revenue based on the number of tickets sold (in thousands of dollars)
PROMOT = amount spent on promoting the airline in the area (in thousands of dollars)
COMP = number of competing airlines at that terminal
FREE = the percentage of passengers who flew free (for various reasons)

(Data table: 15 observations on Sales($), Promot($), Comp and Free; the Comp column reads 10, 8, 12, 7, 8, 12, 12, 5, 8, 5, 11, 12, 6, 10, 10 and the Free column reads 3, 6, 9, 16, 15, 9, 8, 10, 4, 16, 7, 6, 10, 4, 4.)

a Use the following Minitab output to determine the best-fitting regression equation for the airline:

The regression equation is
SALES = 172 + 25.9 PROMOT - 13.2 COMP - 3.04 FREE

Predictor   Coef       Stdev    T-ratio   p
Constant    172.34     51.4     3.35      0.006
PROMOT       25.950     4.877   5.32      0.000
COMP        -13.238     3.686  -3.59      0.004
FREE         -3.041     2.342  -1.30      0.221

b Do the passengers who fly free cause sales to decrease significantly? State and test appropriate hypotheses; use α = 0.05.
c Does an increase in promotions by $1,000 change sales by $28,000, or is the change significantly different from $28,000? State and test appropriate hypotheses; use α = 0.10.
d Give a 90 percent confidence interval for the slope coefficient of COMP.

RESEARCH METHODOLOGY

LESSON 33: APPLICATIONS OF REGRESSION ANALYSIS IN RESEARCH

Q3 (continuing the exercises on Bill Buxton's regression) The following additional output was provided by Minitab when Bill ran the multiple regression:

Analysis of Variance
SOURCE       DF    SS        MS
Regression    4    3134.42   783.60
Error         7     951.25   135.89
Total        11    4085.67

a What is the observed value of F?
b At a significance level of 0.05, what is the appropriate critical value of F to use in determining whether the regression as a whole is significant?
c Based on your answers to (a) and (b), is the regression significant as a whole?

Mark Lowtown publishes the Mosquito Junction Enquirer and is having difficulty predicting the amount of newsprint needed each day. He has randomly selected 27 days over the past year and recorded the following information:

POUNDS = pounds of newsprint for that day's newspaper
CLASSIFIED = number of classified advertisements
DISPLAY = number of display advertisements
FULLPAGE = number of full-page advertisements

Using Minitab to regress POUNDS on the other three variables, Mark got the output that follows:

Predictor    Coef      Stdev    t-ratio   p
Constant    1072.95   872.43   1.23      0.232
CLASSIFIED     0.251    0.126  1.99      0.060
DISPLAY        1.250    0.884  1.41      0.172
FULLPAGE     250.66    67.92   3.69      0.001

a Mark had always felt that each display advertisement used at least 3 pounds of newsprint. Does this regression give him significant reason to doubt this belief at the 5 percent level?
b Similarly, Mark had always felt that each classified advertisement used roughly half a pound of newsprint. Does he now have significant reason to doubt this belief at the 5 percent level?
c Mark sells full-page advertising space to the local merchants for $30 per page. Should he consider adjusting his rates if newsprint costs him 9 cents per pound? Assume other costs are negligible. State explicit hypotheses and an explicit conclusion. (Hint: breakeven is at $30/$0.09 = 333.333 pounds. Holding all else constant, each additional full-page ad uses 250.66 pounds of paper × $0.09 per pound = $22.56 in cost. Thus, if the slope coefficient for FULLPAGE is significantly above 333.333, Mark is not making a profit and his rates should be changed. Why?)

Modelling Techniques
Given a dependent variable, we usually have a group of potential explanatory variables, and from them we can form different combinations of explanatory variables — different regression equations. Each regression equation is called a model. Modelling techniques are the process by which we include different explanatory variables and check the appropriateness of the resulting regression model. There are many different techniques; two commonly used ones are the dummy-variable technique and the analysis of residuals.

Dummy Variable Technique (Qualitative Data)
So far we have used only numerical data, but frequently we have a variable which is categorical or qualitative — countries, regions, or gender (male vs female), for example. Suppose we want to test whether a lower salary is being paid to saleswomen than to salesmen, even with the same length of employment. Our hypothesized population relationship is

Y = A + B1X1 + B2X2

The most important thing for any regression is to look at the residuals. If the regression includes all relevant variables, these residuals should be random; if the residuals show any non-random patterns, this indicates that there is a systematic influence on the dependent variable which should actually have been included in the regression equation. When base salary is regressed on months employed alone, we see that the first five residuals — the points representing the salesmen — are positive, i.e., y - ŷ > 0; for the last points, representing the saleswomen, the residuals are negative, i.e., the regression line falls above these points. This suggests that there is a gender factor at work in our example.

(Insert figure 4: scan of residuals)

How do we incorporate gender into our regression model? We do this by a device called a dummy variable. For the five points representing male salesmen, this variable has the value 0; for the female salespersons, it equals 1. Thus we now fit a regression equation of the form

ŷ = a + b1x1 + b2x2

where x1 = 0 for men and 1 for women, and x2 is months employed. This effectively amounts to estimating two different equations, one for men and one for women:

Salesmen: ŷ = a + b1(0) + b2x2 = a + b2x2
Saleswomen: ŷ = a + b1(1) + b2x2 = (a + b1) + b2x2

For salesmen and saleswomen with the same length of employment, we predict a base salary difference of b1 dollars. If there is discrimination, B1 should be negative. We now test for the significance of this coefficient: is there actually a lower salary being paid to women, even with the same length of service?

Figure 13-10 gives the output from a regression of base salary on months employed. The regression equation is

SALARY = 5.81 + 0.233 MONTHS

Predictor    Coef       Stdev      t-ratio    p
Constant     5.8093     (illegible in this reproduction)
MONTHS       0.23320    0.02492    9.36       0.000

s = 0.5494    R-sq = 92.6%

Analysis of Variance
SOURCE        DF    SS        MS
Regression     1    26.443    26.443
Error          7     2.113     0.302
Total          8    28.556

R-sq = 92.6 percent, indicating that months employed explains about 93 percent of the variation in base salary, and the regression as a whole is significant at the 0.000 level of p. The scatter diagram clearly shows that base salary increases with length of service, but if you try to "eyeball" the regression line, you will note that the black points (the salesmen) tend to be above it and the colored circles (the saleswomen) tend to be below it.

Perhaps the most important part of analyzing a regression output is looking at the residuals. For each data point, the residual is just Y - Ŷ, which we recognize as the error in the fit of the regression line at that point. Figure 13-11 contains part of the output that we haven't seen before: a table of residuals, in which FITS1 are the fitted values and RESI1 are the residuals.

[Figure 13-11: SALARY, FITS1, and RESI1 for the nine salespeople; the individual entries are illegible in this reproduction.]

If the regression includes all the relevant explanatory factors, these residuals ought to be random. If the residuals show any non-random patterns, this indicates that there is something systematic going on that we have failed to take into account. So we look for patterns in the residuals, or, to put it somewhat more picturesquely, we "squeeze the residuals until they talk."

As we look at the residuals in Figure 13-11, we note that the first five residuals are positive. So for the salesmen we have Y - Ŷ > 0; that is, the regression line falls below these five data points. Three of the last four residuals are negative, and thus for the saleswomen we have Y - Ŷ < 0; the regression line lies above three of the four data points. This confirms the observation we made when we looked at the scatter diagram.

Now let's review how we handled the qualitative variable in this problem. We set up a dummy variable, X2, which we gave the value 0 for the men and the value 1 for the women. If there is discrimination against the women, the coefficient B2 should be negative. We therefore carry out the following test of significance:

H0: B2 = 0
Ha: B2 < 0

In the output for this regression, the coefficient for gender has a t-ratio of -3.31. Since the p-value is smaller than α, we reject, at the 1 percent level of significance, the hypothesis that there is no discrimination. Further, we note that the inclusion of the gender variable makes months employed an even more significant explanatory variable. We can now look at the fresh output of the residuals, and we see that they essentially have a random pattern.

Suppose we had set the dummy variable to 0 for the women and 1 for the men. Then its coefficient would be the difference between a man's base salary and the base salary for a woman. Can you guess what the regression would have been in this case? It shouldn't surprise you to learn that it would have been

Y = 5.4595 + 0.22707X1 + 0.7890X2

The choice of which category is given the value 0 and which the value 1 is totally arbitrary and affects only the sign, not the numerical value, of the coefficient of the dummy variable.

Our example had only one qualitative variable (gender), and that variable had only two possible categories (male and female). Although we won't pursue the details here, dummy-variable techniques can also be used in problems containing several qualitative variables, and those variables can have more than two possible categories.

Exercises

SC 13-4 Edith Pratt is a busy executive in a nationwide trucking company. Edith is late for a meeting because she has been unable to locate the multiple regression output that an associate produced for her. If the total regression is significant at the 0.05 level, then she wanted to use the computer

output as evidence to support some of her ideas at the meeting. The subordinate, however, is sick today, and Edith has been unable to locate his work. As a matter of fact, all the information she possesses concerning the multiple regression is a piece of scrap paper with the following on it:

Regression for E. Pratt
SSR  872.4
SSE  151.2, with 17 df
SST  1023.6, with 24 df

Because the scrap paper does not even have a complete set of numbers on it, Edith has concluded that it must be useless. You, however, know better. Should she go directly to the meeting or continue looking for the computer output?

SC 13-5 A New England based commuter airline has taken a survey of its 15 terminals and has obtained the following data for the month of February, where

SALES = total revenue based on the number of tickets sold
PROMOT = amount spent on promoting the airline in the area (in thousands of dollars)
COMP = number of competing airlines at that terminal
FREE = the percentage of passengers who flew free (for various reasons)

Comp: 10, 8, 12, 7, 8, 12, 12, 5, 8, 5, 11, 12, 6, 10, 10
Free: 3, 6, 9, 16, 15, 9, 8, 10, 4, 16, 7, 6, 10, 4, 4
[The Sales and Promot columns of the data table are illegible in this reproduction.]

a Use the following Minitab output to determine the best-fitting regression equation for the airline:

The regression equation is
SALES = 172 + 25.9 PROMOT - 13.2 COMP - 3.04 FREE

[The accompanying table of coefficients, standard errors, and t-ratios is illegible in this reproduction.]

b Do the passengers who fly free cause sales to decrease significantly? Use α = 0.10.
c Does an increase in promotions by $1,000 change sales by $28,000, or is the change significantly different from $28,000? State and test the appropriate hypotheses, using α = 0.05.
d Give a 90 percent confidence interval for the slope coefficient of COMP.

LESSON 34: REGRESSION ANALYSIS USING SPSS PACKAGE

13-22 Mark Lowtown publishes the Mosquito Junction Enquirer and is having difficulty predicting the amount of newsprint needed each day. He has randomly selected 27 days over the past year and recorded the following information:

POUNDS = pounds of newsprint for that day's newspaper
CLASFIED = number of classified advertisements
DISPLAY = number of display advertisements
FULLPAGE = number of full-page advertisements

Using Minitab to regress POUNDS on the other three variables, Mark got the output that follows:

Predictor    Coef
Constant     1072.
CLASFIED     0.251
DISPLAY      1.250
FULLPAGE     250.66

[The Stdev, t-ratio, and p columns of the output are illegible in this reproduction.]

a Mark had always felt that each classified advertisement used roughly half a pound of newsprint. Does the regression give him significant reason to doubt this belief at the 5 percent level?
b Similarly, Mark had always felt that each display advertisement used at least 3 pounds of newsprint. Does he now have significant reason to doubt this belief at the 5 percent level?
c Mark sells full-page advertising space to the local merchants for $30 per page. Should he consider adjusting his rates if newsprint costs him $0.09 per pound? Assume other costs are negligible. State explicit hypotheses and an explicit conclusion. (Hint: Breakeven is at 333.333 pounds, because 333.333 pounds of paper × $0.09 per pound = $30. Thus, if the slope coefficient for FULLPAGE is significantly above 333.333, Mark is not making a profit and his rates should be changed. Holding all else constant, each additional full-page ad uses about 250.66 pounds of paper × $0.09 per pound = $22.56 of cost.)

13-23 Refer to Exercise 13-18. The following additional output was provided by Minitab when Bill ran the multiple regression:

Analysis of Variance
SOURCE        DF    SS         MS        F       p
Regression     4    2861495    715374    102.4   0.000
Error         18     125761      6986.7
Total         22    2987256

At the 0.001 level of significance, is the regression significant as a whole?

13-24 Refer to Exercise 13-19. At a significance level of 0.01, is DISTANCE a significant explanatory variable for SALES?

13-25 Refer to Exercise 13-19. The following additional output was provided by Minitab when the multiple regression was run:

Analysis of Variance
SOURCE        DF    SS         MS
Regression     4    3134.42    783.60
Error          7     951.25    135.89
Total         11    4085.67

a What is the observed value of F?
b At a significance level of 0.05, what is the appropriate critical value of F to use in determining whether the regression as a whole is significant?
c Based on your answers to (a) and (b), is the regression significant as a whole?

13-27 Henry Lander is director of production for the Alecos Corporation of Caracas, Venezuela. Henry has asked you to help him determine a formula for predicting absenteeism in a meat-packing facility. He hypothesizes that percentage absenteeism can be explained by average daily temperature. Data are gathered for several months, you run the simple regression, and you find that temperature explains 66 percent of the variation in absenteeism. But Henry is not convinced that this is a satisfactory predictor. He suggests that daily rainfall may also have something to do with absenteeism. So you gather data and run a regression of absenteeism on rainfall.

We Also Include Some Material from an Internet Web Site on Regression Models

Introductory Statistics: Concepts, Models, and Applications
David W. Stockburger

Regression Models
Regression models are used to predict one variable from one or more other variables. Regression models provide the scientist with a powerful tool, allowing predictions about past, present, or future events to be made with information about past or present events. The scientist employs these models either because it is less expensive in terms of time and/or money to collect the information needed to make the predictions than to collect the information about the event itself, or, more likely, because the event to be predicted will occur in some future time. Before describing the details of the modeling process, some examples of the use of regression models will be presented.

Example Uses of Regression Models

Selecting Colleges
A high school student discusses plans to attend college with a guidance counselor. The student has a 2.04 grade point average out of a 4.00 maximum and mediocre to poor scores on the ACT. He asks about attending Harvard. The counselor tells him he would probably not do well at that institution, predicting he would have a grade point average of 0.64 at the end of four years at Harvard. The student inquires about the necessary grade

point average to graduate and, when told that it is 2.0, decides that maybe another institution might be more appropriate, in case he becomes involved in some "heavy duty partying." When asked about the large state university, the counselor predicts a grade point average of 1.23; the chances for success are not great. A regional institution is then proposed, with a predicted grade point average of 1.54. Deciding that this is still not high enough to graduate, the student decides to attend a local community college. He later graduates with an associate's degree and makes a fortune selling real estate.

It may be that this particular student was completely bored in high school, didn't take the standardized tests seriously, and would have become challenged in college and succeeded at Harvard. The selection committee at Harvard, however, when faced with a choice between this student and one with a far higher predicted grade point average, would most likely make the rational decision and select the most promising student. If the counselor was using a regression model to make the predictions, he or she would know that this particular student would not necessarily make a grade point average of exactly 0.64 at Harvard, 1.23 at the state university, and 1.54 at the regional university. These values are just "best guesses," as there is always error in the regression procedure. The concept of error in prediction will become an important part of the discussion of regression models. It is also worth pointing out that regression models do not make decisions for people; they are a source of information about the world. In order to use them wisely, it is important to understand how they work.

Pregnancy
A woman in the first trimester of pregnancy has a great deal of concern about the environmental factors surrounding her pregnancy and asks her doctor what impact they might have on her unborn child. The doctor makes a "point estimate," based on a regression model, that the child will have an IQ of 75. It is highly unlikely, however, that her child will have an IQ of exactly 75, as there is always error in the regression procedure. Error may be incorporated into the information given the woman in the form of an "interval estimate." For example, it would make a great deal of difference if the doctor were to say that the child had a ninety-five percent chance of having an IQ between 70 and 80, in contrast to a ninety-five percent chance of an IQ between 50 and 100.

Selection and Placement During the World Wars
Technology helped the United States and her allies to win the first and second world wars. One usually thinks of the atomic bomb, radar, bombsights, and better-designed aircraft. Less well known were the contributions of psychologists and associated scientists to the development of the tests and prediction models used for the selection and placement of men and women in the armed forces. During these wars, the United States had thousands of men and women enlisting or being drafted into the military. These individuals differed in their ability to perform physical and intellectual tasks. The problem was one of both selection (who is drafted and who is rejected) and placement (who will cook and who will fight). The army that takes its best and brightest men and women and places them in the front lines digging trenches is less likely to win the war than the army that places these men and women in positions of leadership. It costs a great deal of money and time to train a person to fly an airplane, and every time one crashes, the air force has lost not only a plane but also the time and effort it took to train the pilot, not to mention the life of a person. For this reason it was, and still is, vital that the best possible selection and prediction tools be used for personnel decisions.

Manufacturing Widgets
A new plant to manufacture widgets is being located in a nearby community. The plant personnel officer advertises the employment opportunity, and the next morning has 10,000 people waiting to apply for the 1,000 available jobs. It is important to select the 1,000 people who will make the best employees, because training takes time and money, and firing is difficult and bad for community relations. In order to provide information to help make the correct decisions, the personnel officer employs a regression model. For example, the personnel officer might give all applicants a test and predict the number of widgets made per hour on the basis of the test score. In order to create such a prediction model, the personnel officer would first have to give the test to a sample of applicants and hire all of them, regardless of their test scores. Later, when the number of widgets made per hour had stabilized, the personnel officer could create a prediction model to predict the widget production of future applicants. All future applicants would then be given the test, and hiring decisions would be based on test performance.

Procedure for Construction of a Regression Model
In order to construct a regression model, both the information which is going to be used to make the prediction and the information which is to be predicted must be obtained from a sample of objects or individuals. The relationship between the two pieces of information is then modeled with a linear transformation. Then, in the future, only the first piece of information is necessary, and the regression model is used to transform it into the prediction. None of what follows will make much sense if the procedure for constructing a regression model is not understood, so the procedure will now be discussed. A notational scheme is first necessary to describe it:

Xi is the variable used to predict, and is sometimes called the independent variable. In the example, it would be the test score.
Yi is the observed value of the predicted variable, and is sometimes called the dependent variable. In the example, it would be the number of widgets produced per hour by that individual.

Y'i is the predicted value of the dependent variable. In the example, it would be the predicted number of widgets made per hour by that individual.

The goal in the regression procedure is to create a model where the predicted and observed values of the variable to be predicted are as similar as possible. For example, in the widget manufacturing situation, it is desired that the predicted number of widgets made per hour be as similar as possible to the observed values. The more similar these two values, the better the model. The next section presents a method of measuring the similarity of the predicted and observed values of the predicted variable.

The Least-Squares Criterion for Goodness-of-Fit
In order to develop a measure of how well a model predicts the data, it is valuable to present an analogy of how to evaluate predictions. Suppose there were two interviewers, Mr. A and Ms. B, who separately interviewed each applicant for the widget manufacturing job for ten minutes. At the end of the interview, each interviewer had to make a prediction of how many widgets that applicant would produce two months later. All of the applicants interviewed were hired, regardless of the predictions, and at the end of the two months' trial period one interviewer, the better predictor, was to be retained and promoted; the other was to be fired. A procedure is desired which will provide a measure, or single number, of how well each interviewer performed.

The notational scheme for the table is as follows:
Yi is the observed or actual number of widgets made per hour
Y'i is the predicted number of widgets

Suppose the data for the five applicants were as follows:

             Observed    Mr. A    Ms. B
Applicant    Yi          Y'i      Y'i
1            23          38       21
2            18          34       15
3            35          16       32
4            10          10        8
5            27          14       23

Obviously neither interviewer was impressed with the fourth applicant. A casual comparison of the two columns of predictions with the observed values leads one to believe that interviewer B made the better predictions.

The first step is to find how much each interviewer missed the observed value for each applicant. This is done by finding the difference between the observed and predicted values for each applicant for each interviewer. If the column of differences is simply summed, the procedure yields:

             Mr. A       Ms. B
Applicant    Yi - Y'i    Yi - Y'i
1            -15          2
2            -16          3
3             19          3
4              0          2
5             13          4
Sum            1         14

By this measure it would appear that interviewer A, with a sum of 1, is better at prediction than interviewer B, with a sum of 14. This goes against common sense, and for good reason: large positive deviations cancel out large negative deviations, leaving what appears to be an almost perfect prediction record for interviewer A. In order to avoid this problem, it would be possible to ignore the signs of the differences and then sum, that is, to take the sum of the absolute values. This would work, but for mathematical reasons the sign is instead eliminated by squaring the differences. These differences are called residuals. Summing the squared differences yields the desired measure of goodness-of-fit:

             Mr. A          Ms. B
Applicant    (Yi - Y'i)²    (Yi - Y'i)²
1            225              4
2            256              9
3            361              9
4              0              4
5            169             16
Sum         1011             42

This is expressed in the following mathematical equation:

Σ(Yi - Y'i)²

In this case the smaller the number, the closer the predicted values are to the observed values, that is, the better the prediction. The prediction which minimizes this sum is said to meet the least-squares criterion. Interviewer B meets this criterion in a comparison between the two interviewers, with values of 42 and 1011, respectively. Interviewer A would receive a pink slip; Ms. B, because she had the smaller sum of squared deviations, would be promoted.
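The two ways of scoring the interviewers can be verified directly from the tables above; a minimal Python sketch:

```python
# The interviewer comparison, computed directly. Observed widget rates
# and each interviewer's predictions are taken from the tables above.
observed = [23, 18, 35, 10, 27]
pred_a   = [38, 34, 16, 10, 14]   # Mr. A
pred_b   = [21, 15, 32,  8, 23]   # Ms. B

def raw_sum(obs, pred):
    """Sum of signed differences; positive and negative errors cancel."""
    return sum(y - yp for y, yp in zip(obs, pred))

def squared_sum(obs, pred):
    """Sum of squared differences, the least-squares measure."""
    return sum((y - yp) ** 2 for y, yp in zip(obs, pred))

print(raw_sum(observed, pred_a), raw_sum(observed, pred_b))          # 1 14
print(squared_sum(observed, pred_a), squared_sum(observed, pred_b))  # 1011 42
```

The raw sums (1 versus 14) misleadingly favor Mr. A because his large errors cancel; the squared sums (1011 versus 42) correctly favor Ms. B.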

The Regression Model
The situation using the regression model is analogous to that of the interviewers, except that instead of using interviewers, predictions are made by performing a linear transformation of the predictor variable. Rather than being interviewed, each applicant took a form-board test. A form-board is a board with holes cut out in various shapes: square, round, triangular, etc. The goal is to put the right pegs in the right holes as fast as possible, and the score is the number of seconds it takes to put all the pegs in the right holes. (The saying "square peg in a round hole" came from this test, as the test has been around for a long time.)

The prediction takes the form

Y'i = a + bXi

where a and b are parameters in the regression model. Because the two parameters, a and b, can take on any real value, there are an infinite number of possible models, analogous to having an infinite number of possible interviewers. The goal of regression is to select the parameters of the model so that the least-squares criterion is met, in other words, to minimize the sum of the squared deviations. The procedure discussed in the last chapter, that of transforming the scale of X to the scale of Y such that both have the same mean and standard deviation, will not work in this case, because of the prediction goal.

A number of possible models will now be examined, where:
Xi is the number of seconds to complete the form-board task
Yi is the number of widgets made per hour two months later
Y'i is the predicted number of widgets

For the first model, let a = 10 and b = 1. In this case the regression model becomes Y'i = 10 + 1·Xi. The first score (X1 = 13) would be transformed into a predicted score of Y1' = 10 + (1 × 13) = 23; the second, X2 = 20, into Y2' = 10 + (1 × 20) = 30; and so on:

Form-Board    Widgets/hr
Xi            Yi            Y'i = 10 + Xi    (Yi - Y'i)    (Yi - Y'i)²
13            23            23                  0               0
20            18            30                -12             144
10            35            20                 15             225
33            10            43                -33            1089
15            27            25                  2               4
                                               Sum            1462

It can be seen that the model does a good job of prediction for the first and last applicants, but the middle applicants are poorly predicted. Because it is desired that the model work for all applicants, some other values for the parameters must be tried.

The selection of the parameters for the second model is based on the observation that the longer it takes to put the form-board together, the fewer the number of widgets made. When the tendency is for one variable to increase while the other decreases, the relationship between the variables is said to be inverse. The mathematician knows that in order to model an inverse relationship, a negative value of b must be used in the regression model. In this case the parameters a = 36 and b = -1 will be used:

Xi    Yi    Y'i = 36 - Xi    (Yi - Y'i)    (Yi - Y'i)²
13    23        23               0               0
20    18        16               2               4
10    35        26               9              81
33    10         3               7              49
15    27        21               6              36
                                Sum             170

This model fits the data much better than did the first model, as measured by the sum of squared deviations (170 versus 1462). A fairly large deviation is still noted for the third applicant, which might be reduced by increasing the value of the additive component of the transformation. Thus a model with a = 41 and b = -1 will now be tried:

Xi    Yi    Y'i = 41 - Xi    (Yi - Y'i)    (Yi - Y'i)²
13    23        28              -5              25
20    18        21              -3               9
10    35        31               4              16
33    10         8               2               4
15    27        26               1               1
                                Sum              55

This makes the predicted values closer to the observed values on the whole.
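The trial-and-error search above can be reproduced with a few lines of Python, using the form-board scores and widget counts from the tables:

```python
# Sum of squared deviations for each trial model Y' = a + b*X, using
# the form-board scores (X) and widget rates (Y) from the example.
X = [13, 20, 10, 33, 15]
Y = [23, 18, 35, 10, 27]

def sse(a, b):
    """Sum of squared deviations of Y from the predictions a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

print(sse(10, 1))    # first model  -> 1462
print(sse(36, -1))   # second model -> 170
print(sse(41, -1))   # third model  -> 55
```

Each call evaluates one candidate model against the least-squares criterion, exactly as the running tables do.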

Perhaps a decrease in the magnitude of b would make the predictions better. Hence a model with a = 32 and b = -0.5 will be tried:

Xi    Yi    Y'i = 32 - 0.5Xi    (Yi - Y'i)    (Yi - Y'i)²
13    23        25.5               -2.5           6.25
20    18        22.0               -4.0          16.00
10    35        27.0                8.0          64.00
33    10        15.5               -5.5          30.25
15    27        24.5                2.5           6.25
                                   Sum          122.75

Since the attempt increased the sum of the squared deviations, it obviously was not a good idea. If the same search procedure were going to be continued, perhaps the value of a could be adjusted when b = -2 or b = -1.5, and so forth. The following table summarizes what is known about the problem thus far:

  a      b      Σ(Yi - Y'i)²
 10      1         1462
 36     -1          170
 41     -1           55
 32    -0.5        122.75

With four attempts at selecting parameters for a model, the best-fitting model (smallest sum of squared deviations) found to this point in time is the one with a = 41 and b = -1. The point is soon reached when the question, "When do we know when to stop?" must be asked. Unless the sum of squared deviations is equal to zero, which is seldom possible in the real world, the answer must necessarily be "never," because it is always possible to change the values of the two parameters slightly and obtain a better estimate, one which makes the sum of squared deviations smaller. Rather than throwing their hands up in despair, applied statisticians approached the mathematician with the problem and asked whether a mathematical solution could be found. This is the topic of the next section.

Solving for Parameter Values Which Satisfy the Least-Squares Criterion
The problem is presented to the mathematician as follows: "The values of a and b in the linear model Y'i = a + bXi are to be found which minimize the algebraic expression

Σ(Yi - Y'i)² = Σ(Yi - (a + bXi))²"

Now comes the hard part that requires knowledge of calculus. At this point even the mathematically sophisticated student will be asked to "believe." If the student is simply willing to believe, this section may be skimmed without any great loss of the ability to "do" a linear regression problem. What the mathematician does is take the first-order partial derivative of the last form of the preceding expression with respect to b, set it equal to zero, and solve for the value of b. This is the method that mathematicians use to solve for minimum and maximum values. Solving for the a parameter is somewhat easier. The results are:

b = (N·ΣXiYi - ΣXi·ΣYi) / (N·ΣXi² - (ΣXi)²)

a = (ΣYi - b·ΣXi) / N

The "optimal" values for a and b can be found by doing the appropriate summations, plugging them into the equations, and solving for the results. The appropriate summations are presented below:

 Xi    Yi    Xi²     XiYi
 13    23     169     299
 20    18     400     360
 10    35     100     350
 33    10    1089     330
 15    27     225     405
Sum:  ΣXi = 91, ΣYi = 113, ΣXi² = 1983, ΣXiYi = 1744

The result of these calculations is a regression model of the form:

Y'i = 40.01 - 0.9566Xi
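Assuming the standard least-squares formulas just given, the "optimal" parameters can be computed directly from the summations (a Python sketch; small differences from the rounded textbook figures are rounding error):

```python
# Least-squares parameter estimates from the summations:
#   b = (N*ΣXY - ΣX*ΣY) / (N*ΣX² - (ΣX)²),   a = (ΣY - b*ΣX) / N
X = [13, 20, 10, 33, 15]
Y = [23, 18, 35, 10, 27]
N = len(X)

sum_x  = sum(X)                             # 91
sum_y  = sum(Y)                             # 113
sum_x2 = sum(x * x for x in X)              # 1983
sum_xy = sum(x * y for x, y in zip(X, Y))   # 1744

b = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / N
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

print(f"a = {a:.2f}, b = {b:.4f}, SSE = {sse:.2f}")
```

The fitted parameters round to a = 40.01 and b = -0.9566, and the resulting sum of squared deviations is just below the best hand-picked value of 55 found earlier.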

Demonstration of "Optimal" Parameter Estimates
Applying procedures identical to those used on the earlier "non-optimal" regression models, the residuals (deviations of observed from predicted values) are found, squared, and summed to find the sum of squared deviations:

Xi    Yi    Y'i = a + bXi    (Yi - Y'i)    (Yi - Y'i)²
13    23       27.57           -4.57          20.88
20    18       20.88           -2.88           8.28
10    35       30.44            4.56          20.76
33    10        8.44            1.56           2.44
15    27       25.66            1.34           1.80
                               Sum            54.14

Note that the sum of squared deviations (Σ(Yi - Y'i)² = 54.14) is smaller than the previous low of 55, but not by much. The mathematician is willing to guarantee that this is the smallest sum of squared deviations that can be obtained by using any possible values for a and b; that is, no other values of a and b will yield a smaller sum of squared deviations. The mathematician is willing to bet the family farm on this result. This procedure results in an "optimal" model, given the data and the form of the model. The bottom line is that the equation will be used to predict the number of widgets per hour that a potential employee will make, given the score that he or she has made on the form-board test. The prediction will not be perfect, but it will be the best available.

In the example problem, both the number of pairs of numbers (five) and the integer nature of the numbers made the problem "easy," and even this "easy" problem resulted in considerable computational effort. Imagine what a "difficult" problem with hundreds of pairs of decimal numbers would be like. That is why a bivariate statistics mode is available on many calculators.

Using Statistical Calculators to Solve for Regression Parameters
Most statistical calculators require a number of steps to solve regression problems. The specific keystrokes required for the steps vary for the different makes and models of calculators; please consult the calculator manual for details. In any case:

Step 1: Put the calculator in "bivariate statistics mode." This step is not necessary on some calculators.
Step 2: Clear the statistical registers.
Step 3: Enter the pairs of numbers. Some calculators verify the number of pairs entered at any point in time on the display.
Step 4: Find the values of various statistics, including:
• The mean and standard deviation of both X and Y
• The correlation coefficient (r)
• The parameter estimates of the regression model: the slope (b) and the intercept (a)

The discussion of the correlation coefficient is left for the next chapter. All that is important at the present time is the ability to calculate its value in the process of performing a regression analysis; the value of the correlation coefficient will be used in a later formula in this chapter.

Scatter Plots and the Regression Line
The preceding has been an algebraic presentation of the logic underlying the regression procedure. Since there is a one-to-one correspondence between algebra and geometry, and since some students have an easier time understanding a visual presentation of an algebraic procedure, a visual presentation will now be attempted. The data will be represented as points on a scatter plot, while the regression equation will be represented by a straight line, called the regression line.

A scatter plot or scattergram is a visual representation of the relationship between the X and Y variables. First, the X and Y axes are drawn with equally spaced markings to include all values of that variable that occur in the sample. In the example problem, the values of X, the seconds to put the form-board together, range from 10 to 33, the lowest and highest values that occur in the sample. A similar axis for the Y variable, the number of widgets made per hour, would have to range from 10 to 35. If the axes do not start at zero, as in the present case where they both start at 10, a small space is left before the line markings to indicate this fact.

The paired or bivariate (two variable, X and Y) data are represented as points on this graph. Each point is plotted by finding the intersection of the X and Y scores for that pair of values. For example, the first point would be located at the intersection of X = 13 and Y = 23.

It is symbolized as s Y. The following illustrates how to draw the regression line.556 © Copy Right: Rai University 187 . X. Note that all the points fall on a straight line. 20. then a straight line would be formed. In this case only one degree of freedom is lost because only one parameter is estimated for the regression model. read as s sub Y dot X. One degree of freedom is lost for each of the parameters estimated. The computation is easier because the statistical calculator computed the correlation coefficient when finding a regression line. The next figure presents the five X and Y’ values that were found on the regression table of observed and predicted values. the points corresponding to the smallest and largest X and connect these points with a straightedge. within rounding error. The point is plotted by finding the intersection of the X and Y scores for that pair of values. a and b. the first point would be located at the intersection of and X=13 and Y=23. For example. called the computational formula for the standard error of estimate. but it does not require the computation of the entire table of differences between observed and predicted Y scores. The standard error of estimate is defined by the formula The calculation of the standard error of estimate is simplified by the following formula. Note that the numerator is the same as in the least squares criterion. RESEARCH METHODOLOGY As such it may be thought of as the average deviation of the predicted from the observed values of Y. The similarity of the two measures may be resolved if the standard deviation of Y is conceptualized as the error around a predicted Y of Y’i = a. the calculation for the example data is relatively easy. The notation is used to mean the standard deviation of Y given the value of X is known. Second. Note that the first point would be plotted as (13. however. because the axes do not begin at zero. 
The equation Y' = a + bX defines a straight line in a two-dimensional space. The a value is sometimes called the intercept and defines where the line crosses the Y-axis.

The standard error of estimate may be calculated from the definitional formula given above, but the computation is difficult because the entire table of differences and squared differences between observed and predicted Y scores must be calculated. The calculation is simplified by the computational formula for the standard error of estimate, which will always give the same result, within rounding error, as the definitional formula, but does not require computing that table. The computational formula may look more complicated, but the computation is easier because a statistical calculator has already computed the correlation coefficient when finding the regression line. The computational formula is as follows:

s_Y.X = s_Y * sqrt( (1 - r²)(N - 1) / (N - 2) )
Any two points would work, but the two extreme points give a line with the least drawing error. The paired or bivariate (two-variable: X, Y) data will be represented as vectors or points on this graph.
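The least-squares fit and the two-extreme-points method of drawing the line can be sketched in Python. The (X, Y) pairs below are hypothetical stand-ins, not the chapter's actual example data:

```python
import numpy as np

# Hypothetical (X, Y) pairs; the chapter's actual example data are not reproduced here.
X = np.array([13.0, 20.0, 10.0, 33.0, 15.0])
Y = np.array([23.0, 18.0, 35.0, 10.0, 27.0])

# Least-squares slope and intercept: b = cov(X, Y) / var(X), a = mean(Y) - b * mean(X).
b = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
a = Y.mean() - b * X.mean()

def predict(x):
    """Predicted score Y' = a + bX."""
    return a + b * x

# Only two points are needed to draw the line: those at the smallest and largest X.
line_points = [(X.min(), predict(X.min())), (X.max(), predict(X.max()))]
```

A handy check on any least-squares fit is that the line always passes through the point of means (mean of X, mean of Y).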

The larger the value of the standard error of estimate, the less well the regression model fits the data, and the worse the prediction. Using the computational formula to calculate the standard error of estimate with the example data produces the same result, within rounding error, as the application of the definitional formula.

Interval Estimates

The error in prediction may be incorporated into the information given to the client by using interval estimates rather than point estimates. A point estimate is the predicted value of Y, Y', because it is the best guess of Y when X is a given value. While the point estimate gives the best possible prediction, the prediction is not perfect. An interval estimate presents two values, low and high, between which some percentage of the observed scores is likely to fall. Any percentage could be used, but the standard value is a 95 percent confidence interval.

Conditional Distributions

A conditional distribution is a distribution of a variable given a particular value of another variable. The relationship between X and Y in this case is often symbolized Y | X, read "Y given X." For example, suppose that an infinite number of applicants for a position manufacturing widgets had all made the same score of X = 18 on the form board test, and everyone was hired. Three months later, not everyone would make the same number of widgets per hour; the distribution of scores which results would be called the conditional distribution of Y (widgets) given X (form board). Conceptually, a conditional distribution of number of widgets made exists for each possible value of number of seconds to put the form board together.

It is somewhat difficult to visualize all possible conditional distributions in only two dimensions, although the following illustration attempts the relatively impossible: if a hill can be visualized with its ridge running along the regression line, the vision would be essentially correct.

It is possible to model the conditional distribution with the normal curve. In order to create a normal-curve model, it is necessary to estimate the values of the parameters of the model. If the conditional distribution for a value of X is known, then finding an interval estimate reduces to a problem that has already been solved in an earlier chapter: what two scores on a normal distribution with those parameters cut off some middle percent of the distribution? The conditional distribution is important in this text mainly for the role it plays in computing an interval estimate. The use of the Normal Curve Area program to find the middle area is illustrated below.
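The middle-area computation that the Normal Curve Area program performs can also be done with SciPy's normal distribution. Here is a sketch using the example's parameter estimates (mean 22.78, standard deviation 4.25) for the conditional distribution at X = 18:

```python
from scipy.stats import norm

# Parameter estimates for the conditional distribution of widgets given X = 18.
mu, sigma = 22.78, 4.25

# Scores cutting off the middle 95% of the distribution (2.5% in each tail).
low = norm.ppf(0.025, loc=mu, scale=sigma)
high = norm.ppf(0.975, loc=mu, scale=sigma)
```

The two cutoffs agree with the hand calculation 22.78 ± 1.96 × 4.25.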
The standard error of estimate is often used as an estimate of σ_Y|X for all the conditional distributions. This assumes that all conditional distributions have the same value for this parameter, that is, that the spread of scores around the regression line is the same for every value of X. The best estimate of μ_Y|X is the predicted value of Y, Y', found by entering the appropriate value of X in the regression equation; this value is also called a point estimate.

In the example, the parameter estimates of the conditional distribution of Y given X = 18 are μ_Y|X = 22.78 and σ_Y|X = 4.25. The conditional distribution which results when X = 18 is presented below. The two scores which cut off the middle 95% of that distribution are 14.45 and 31.11. Thus, for a person applying for a position manufacturing widgets who made a score of X = 18 on the form board test, a point estimate of 22.78 widgets per hour would result from the application of the regression model, while an interval estimate would be from 14.45 to 31.11. It could be said that 95 times out of 100, the number of widgets made per hour by an applicant making a score of 18 on the form board test would be between 14.45 and 31.11. The model of the conditional distribution is critical to understanding the assumptions made when calculating such an interval estimate.
When subscripts indicating a conditional distribution are employed, the conditional distribution of Y given that X was 18 is symbolized as Y | X = 18, its mean as μ_Y|X, and its standard deviation as σ_Y|X. The point estimate is found by entering the value of X in the regression equation: for X = 18, Y' = 40.01 - .957 × 18 = 22.78. Because Y' is the best guess of Y when X is a given value, μ_Y|X is correctly estimated by Y', and σ_Y|X by s_Y.X, the standard error of estimate, which again assumes that all conditional distributions have the same standard deviation.

The computational formula for the standard error of estimate is most easily and accurately computed by temporarily storing the values for s_Y² and r² in the calculator's memory and recalling them when needed.

Because μ_Y|X is estimated by Y' and σ_Y|X is estimated by s_Y.X, the computational form for computing the 95% confidence interval becomes

Y' ± z · s_Y.X

where z is 1.96 for a 95% confidence interval. Other sizes of confidence intervals can be computed by changing the value of z. In the example, with Y' = 22.78 and s_Y.X = 4.25, computation of the confidence interval becomes 22.78 ± 1.96 × 4.25.

Interpretation of the confidence interval for a given score of X necessitates several assumptions. First, the relationship between X and Y can be adequately modeled by a straight line, Y' = a + bX, that is, the least-squares criterion is met. Second, for each value of X the conditional distribution is a normal distribution. Third, σ_Y|X is correctly estimated by s_Y.X, which means assuming that all conditional distributions have the same standard deviation.

Regression Analysis Using SPSS

The REGRESSION command is called in SPSS as shown below. The output includes the correlation coefficient, the standard error of estimate, and the regression coefficients. The optional Save command generates two new variables in the data file: the predicted values and the residuals.

Point to Ponder

• Regression models are powerful tools for predicting a score based on some other score.
• They involve a linear transformation of the predictor variable into the predicted variable.
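The full chain, standard error of estimate by the computational formula s_Y.X = s_Y · sqrt((1 - r²)(N - 1)/(N - 2)), then the 95% confidence interval around a prediction, can be sketched as follows. The data are hypothetical; the identity between the computational and definitional formulas holds exactly:

```python
import numpy as np

# Hypothetical paired scores standing in for the example data.
X = np.array([13.0, 20.0, 10.0, 33.0, 15.0, 28.0])
Y = np.array([23.0, 18.0, 35.0, 10.0, 27.0, 14.0])
N = len(X)

r = np.corrcoef(X, Y)[0, 1]
sY = np.std(Y, ddof=1)

# Computational formula for the standard error of estimate.
see = sY * np.sqrt((1 - r**2) * (N - 1) / (N - 2))

# Regression coefficients, then the interval Y' +/- z * s_Y.X.
b = r * sY / np.std(X, ddof=1)
a = Y.mean() - b * X.mean()

def confidence_interval(x, z=1.96):
    """95% confidence interval (default z = 1.96) around the point estimate."""
    y_pred = a + b * x
    return y_pred - z * see, y_pred + z * see
```

Because (1 - r²) times the sum of squares of Y equals the residual sum of squares, the computational formula reproduces the definitional formula sqrt(Σ(Y - Y')²/(N - 2)) exactly.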
• The parameters of the linear transformation are selected such that the least-squares criterion is met, resulting in an "optimal" model. The model can then be used in the future to predict either exact scores, called point estimates, or intervals of scores, called interval estimates.

Selecting the following options will command the program to do a simple linear regression and create two new variables in the data editor: one with the predicted values of Y and the other with the residuals.

LESSON 35: FACTOR ANALYSIS

So far we have looked at techniques of multiple regression, where we essentially examine the association between a dependent variable and several independent variables. We now look at a technique which also measures association but looks at relations of interdependence: we now investigate situations where no one variable is dependent on another, and all variables are treated as independent variables. In this and the next lesson we shall be introduced to factor analysis.

By the end of this lesson you should be able to:
• Understand the analytical and intuitive concepts of factor analysis
• Determine the types of applications for which we can use factor analysis
• Analyze and interpret computer output generated for a factor analysis

What is Factor Analysis?

Factor analysis is a popular multivariate technique which measures association between variables. Its main objective is to summarize a large number of variables into a smaller number of factors which represent the basic dimensions underlying the data. It reduces the attribute space from a larger number of variables to a smaller number of factors and as such is a "non-dependent" procedure (that is, it does not assume a dependent variable is specified). Factor analysis is used to uncover the latent structure (dimensions) of a set of variables.

We can best explain factor analysis with a non-technical analogy. A mother sees various bumps and shapes under a blanket at the bottom of a bed. When one shape moves toward the top of the bed, all the other bumps and shapes move toward the top also, so the mother concludes that what is under the blanket is a single thing, most likely her child. Similarly, in factor analysis the researcher assumes that there is a "child" out there in the form of an underlying factor, and takes simultaneous movement (correlation) of variables as evidence of its existence.

The two major uses of factor analysis are:
1. To simplify a set of data by reducing a large number of measures (which may be interrelated and causing multicollinearity) for a set of respondents to a smaller, more manageable set which are not interrelated and still retain most of the original information.
2. To identify the underlying structure of the data, in which a very large number of variables may really be measuring a small number of basic characteristics or constructs of our sample.

Other uses include:
• To select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal component factors
• To create a set of factors to be treated as uncorrelated variables as one approach to handling multicollinearity in such procedures as multiple regression
• To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor
• To determine network groups by determining which sets of people cluster together (using Q-mode factor analysis, discussed below)
• To identify clusters of cases and/or outliers
• To establish that multiple tests measure the same factor, thereby giving justification for administering fewer tests

If a correlation is spurious for some reason, inference will be mistaken, so it is important when conducting factor analysis that variables which might otherwise introduce spuriousness, such as anteceding causes, be included in the analysis.
The technique is highly complex and makes use of sophisticated statistical methods which are beyond the scope of our course. Therefore our presentation of this technique will focus on its intuitive rationale and applications. Since most multivariate techniques are easily run by standard statistical packages, our emphasis will be on giving the student exposure to relevant applications, on interpreting computer output, and on running a factor analysis on the computer.

Typical Problems Studied Using Factor Analysis

Factor analysis is used to study a complex product or service in order to identify the major characteristics considered important by consumers. It identifies latent or underlying factors from an array of seemingly important variables. A primary use is to reduce a large number of variables to a smaller number of factors for modeling purposes, where the large number of variables precludes modeling all the measures individually; however, factor analysis can be, and often is, used on a standalone basis for similar purposes.
Analogously, factor analysis takes as input a number of measures and tests, which correspond to the bumps and shapes under the blanket: those that move together are considered a single thing and are labeled a factor, and the researcher takes simultaneous movement (correlation) as evidence of a factor's existence. For example, a survey may throw up between 15 and 20 attributes which a consumer considers when buying a product; there is then a need to find out which of these are the key drivers, and factor analysis helps reduce them to a few underlying factors. In more advanced applications, factor analysis is integrated into structural equation modeling (SEM), helping create the latent variables modeled by SEM.

Basic Principles of Factor Analysis

Factor analysis is part of the multiple general linear hypothesis (MGLH) family of procedures and makes many of the same assumptions as multiple regression:
• Linear relationships
• Interval or near-interval data (factor analysis can only be applied to continuous or intervally scaled variables)
• Proper specification (relevant variables included, extraneous ones excluded)
• Lack of high multicollinearity
• The assumption of multivariate normality, for purposes of significance testing

There are several different types of factor analysis, with the most common being principal components analysis (PCA); principal axis factoring (PAF) is preferred for purposes of confirmatory factor analysis. Confirmatory factor analysis can be used to test whether the variables in a data set come from a specified number of factors.

How it Works

Factor analysis applies an advanced form of correlation analysis to a number of statements or attributes. What the analysis does statistically is to group together those variables whose responses are highly correlated: if several statements are highly correlated, it is thought that they measure some factor common to all of them. From each group of statements an overall factor is then chosen which appears to represent what all the statements in the group mean. A typical study will throw up several such factors, and for each one the researchers have to use their judgment to determine what the factor represents.

Example

A two-wheeler manufacturer is interested in determining which variables his customers think of as important when they consider his product. The respondents were asked to indicate on a 7-point scale (1: completely agree, 7: completely disagree) their agreement with a set of 10 statements relating to their perceptions of, and attitudes about, two-wheelers. The answers given by 20 respondents are input into the computer. The statements are:
1. I use a two-wheeler because it is affordable.
2. It gives me a sense of freedom to own a two-wheeler.
3. Low maintenance costs make it very economical in the long run.
4. My vehicle gives me a comfortable ride.
5. I feel very powerful when I am on my two-wheeler.
6. Some of my friends who don't have one are jealous of me.
7. I feel good whenever I see ads for my two-wheeler on TV or in magazines.
8. A two-wheeler is essentially a man's vehicle.
9. I think two-wheelers are a safe way to travel.
10. Three people should be allowed to travel on a two-wheeler.
Factor analysis would then aim to reduce these 10 statements to a few core factors.
Applications

The main applications of factor analysis are in marketing research. Some of the applications are as follows:
1. Testing hypotheses about the structure of a data set.
2. Condensing or simplifying data. For example, in a study of consumer involvement across a number of product categories, 19 items were reduced to four factors: (1) perceived product importance / perceived importance of the negative consequences of a mispurchase; (2) subjective probability of a mispurchase; (3) pleasure of owning/using the product.
3. Determining the underlying dimensions of the data. Factor analysis is often used to determine the dimensions or criteria by which consumers evaluate brands, and how each brand is seen on each dimension. For instance, a factor analysis of data on TV viewing indicated seven different types of programmes, independent of the network offering them as perceived by viewers, such as movies, adventure plots, westerns, family entertainment, and adult entertainment.
4. Identifying market segments. A factor analysis of data on the benefits desired on the last vacation taken by 1,750 respondents revealed six benefit segments for vacationers, including those who vacation to visit friends and relatives plus sightseeing, those who visit friends and relatives and not sightseeing, outdoor vacationers, resort vacationers, foreign vacationers, and sightseers.
5. Developing perceptual maps and positioning products.

Factor Analysis: the Process

We now take the case of a marketing research study, where factor analysis is most popularly used. We begin by administering a questionnaire to consumers, as in the two-wheeler example above. Factor analysis is a statistical technique which works on the basis of consumer responses to identify similarities or associations across variables: it identifies two or more questions that result in responses that are highly correlated.
The fourth involvement factor was the value of the product as a cue to the type of person who owns it. Each of these factors was independent of the others, so there was no multicollinearity among them.

In general, the analysis begins by observing the correlations between the statements and determining whether there are significant correlations among them. If several of the statements are highly correlated, it is thought that these statements measure some factor common to all of them, and those variables whose responses are highly correlated are grouped together statistically. Then, from each group of correlated statements, we choose an overall factor which appears to represent what all the statements in the group appear to mean.

Interpretation of Computer Output

Factor analysis identifies factors, or groups of attributes, which are strongly correlated with each other and uncorrelated with the other groups. Generally the analytical procedure follows a series of steps to arrive at a solution.

Meaning of Key Terms Used in Factor Analysis

To understand and interpret the computer output of a factor analysis we need to understand the meaning of certain terms which represent critical stages in the analysis.

1. Standardized scores of an individual's response. Standardized scores are used because responses to different questions can use different scales (e.g., 5-point, 7-point, etc.). To allow for comparability, all responses are standardized by calculating an individual's standard score on a statement or attribute:

Standardized score = (actual response to the statement - mean response of all respondents to the statement) / standard deviation of all responses to the statement

Thus each person's score is a measure of how many standard deviations his response lies from the mean response calculated across all respondents.

2. Correlation coefficients. We calculate the correlation coefficients associated with the standardized scores of responses to each pair of statements. The starting point of the analysis is thus a correlation matrix of the original data set, in which responses to each variable or statement are correlated with the others. We then construct new variables (factors) on the basis of attributes which are highly correlated with each other.

We give a simple example below for five statements. For simplicity we assume the correlation between two statements is either one (perfect) or zero. The correlation matrix is:

St   1  2  3  4  5
1    1  1  1  0  0
2       1  1  0  0
3          1  0  0
4             1  1
5                1

As can be seen, statements 1, 2 and 3 are correlated with each other and unrelated to statements 4 and 5, which are in turn correlated with each other. This suggests that the underlying variables can be grouped into two core factors which are unrelated to each other.

3. Variance. A factor analysis is like a regression analysis: it tries to best fit factors to a scatter diagram of responses in such a way that the factors explain the variance associated with responses to each statement. We aim to obtain factors which explain as much of the variance associated with each statement as possible.

The Process of Identification is Complex and is Broadly as Follows

Factor analysis selects one factor at a time, each chosen to explain the maximum variance in the standardized scores not accounted for by the preceding factors. Each additional factor selected is likely to explain less of the variance than the first factor. Additional factors may be selected until all the variance is accounted for, but usually the factor extraction process is stopped once the unexplained variance falls below a specified level.

Computer Output

We now turn to interpreting the computer output. Table 1 presents a sample output of a factor analysis in which five variables have been reduced to three possible factors. On the vertical column we have the different variables (statements 1 to 5); on the horizontal row we have the factors F1, F2 and F3, with the factor loadings in the body of the table, the communalities in the final column, and the eigenvalues in the final row. The loadings of the five statements on factor F1, for example, are .86, .84, .68, .06 and .07.
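The standardization described under key term 1 above is the familiar z-score. A minimal sketch, with hypothetical 7-point responses from 20 respondents:

```python
import numpy as np

# Hypothetical 7-point responses of 20 respondents to one statement.
responses = np.array([1, 2, 2, 3, 1, 4, 5, 2, 3, 6,
                      7, 2, 1, 3, 4, 2, 5, 3, 2, 4], dtype=float)

# Standardized score: (response - mean of all responses) / standard deviation.
z = (responses - responses.mean()) / responses.std(ddof=1)
```

By construction the standardized scores have mean 0 and standard deviation 1, which is what makes responses on different scales comparable.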
What is a Factor?

Each factor is a linear combination of the original variables. Factors are constructed by finding the best linear combination of variables that accounts for the maximum possible variation in the data; each subsequent factor is defined as the best linear combination of variables in terms of explaining the variance not accounted for by the preceding factors. Principal components analysis is the most frequently used approach: by this method a set of variables is transformed into a new set of factors that are uncorrelated with each other. Conversely, the factor model expresses each variable as a linear combination of the set of factors: if x1, ..., x5 are our original variables and we have three factors, then factor analysis expresses each variable as a linear combination of the three factors.
We now explain some of the terms in the output and what they mean.

The factor model is similar to the regression model: a few independent variables, termed factors, help explain the variation in each original variable x. If x1, ..., x5 are the original variables and F1, F2, F3 the factors, each variable is expressed as

x1 = l11·F1 + l12·F2 + l13·F3 + e1
x2 = l21·F1 + l22·F2 + l23·F3 + e2
(and so on for x3, x4 and x5)

The parameters l11, l12, etc., are the factor loadings, and e1 to e5 are the error terms; each error term consists of the variation in a variable which is not explained by the factors. The loadings are derived using the principle of least squares, and are then placed in a correlation matrix between the variables and the factors: the factor loadings are the correlations between the factors and the variables.

Eigenvalues

Eigenvalues indicate how well any given factor fits the data from all the respondents on all the statements. The eigenvalue of a factor is defined as the sum of the squared factor loadings for that factor. There is an eigenvalue for every factor, and the higher the eigenvalue, the greater the amount of variance explained by the factor. In the example, the eigenvalue for factor 1 is found by

.86² + .84² + .68² + .06² + .07² = 1.91

An alternative to the eigenvalue is the percentage of variation in the original variables accounted for by the jth factor:

percentage of variation accounted for by factor j = eigenvalue of factor j / number of variables

All computer programmes provide the eigenvalues or the percentage of variance explained, and most give both, along with the cumulative percentage of variance explained by the factors taken together.

How Many Factors?

Since factor analysis is designed to reduce the number of original variables, a key question is how many factors should be retained. It is possible to keep generating factors until they equal the number of original variables, in which case the factors would be useless. We usually rely on some rules of thumb:
1. The most common approach is to examine the eigenvalues. Before extraction, each of the original standardized variables has an eigenvalue of 1; if there are five variables, each accounts for 20% of the variation in the data. Any factor retained should explain at least as much variation as an average variable, and we would expect any factor which is a useful combination of the original variables to have an eigenvalue greater than one. Therefore we usually retain only factors with an eigenvalue greater than 1.
2. A related rule of thumb is to look for a large drop in variance explained between two successive factors in the principal components solution.
3. Another rule of thumb requires that we retain sufficient factors to explain a satisfactory percentage of the total variance, usually over 70%.

In our example, factor 1 explains 55% of the variation in the data, and the cumulative percentage of variation explained by the three retained factors is 80.3%; factors 4 and 5, which explain less than 10% of the variation, would probably be dropped, and factors 1, 2 and 3 retained. We are thus able to economize the information contained in the original variables into 3 factors while losing only about 20% of the original information.
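The extraction step, computing the correlation matrix of the responses, taking its eigenvalues, and applying the eigenvalue > 1 rule, can be seen in a small simulation. All numbers below are simulated (20 respondents, 5 statements, with statements 1-3 driven by one underlying factor and statements 4-5 by another), not the survey's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated responses: two underlying factors drive five observed statements.
f1 = rng.normal(size=20)
f2 = rng.normal(size=20)
data = np.column_stack([
    f1 + 0.3 * rng.normal(size=20),
    f1 + 0.3 * rng.normal(size=20),
    f1 + 0.3 * rng.normal(size=20),
    f2 + 0.3 * rng.normal(size=20),
    f2 + 0.3 * rng.normal(size=20),
])

R = np.corrcoef(data, rowvar=False)                 # 5 x 5 correlation matrix
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first
retained = int((eigenvalues > 1).sum())             # eigenvalue > 1 rule of thumb
pct = eigenvalues / eigenvalues.sum()               # proportion of variance per factor
```

Note that the eigenvalues of a correlation matrix always sum to the number of variables, which is why dividing an eigenvalue by the number of variables gives the proportion of variance that factor explains.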
Naming Factors

After deciding upon the extracted factors, the researcher has to interpret and name them. This is done by identifying which factor is associated with which of the original variables, i.e., by looking at the factor loadings. A factor is identified by those items that have a relatively high factor loading on that factor and relatively low factor loadings on the other factors. If factor 1 has high loadings with variables 1, 2 and 3, it is assumed to be a linear combination of these variables and is given a suitable name representing them; the researcher looks at the basic dimension being measured by these variables and clubs them together as representing an overall factor. In the example data, F1 is a good fit on the data from statements 1, 2 and 3 but a poor fit for statements 4 and 5: factor 1 is highly correlated with statements 1 and 2 and least with statement 4.
Therefore we should also look at the percentage of variation explained by a factor, i.e., the proportion of variance explained. Most outputs give both the proportion of total variation in the data accounted for by each individual factor and the cumulative percentage of variation accounted for by the factors taken together.

Communalities

How well do the factors fit the data from all respondents for any given statement? Communalities measure the percentage of the total variation in any variable or statement which is explained by all the factors together, and thus provide information on how well the factors fit the data. The communality of a variable can be found by squaring its factor loadings across all factors and then summing; it can also be thought of as an inverse measure of the uniqueness of a variable. In the example, .89 of the variance in responses to statement 5, but only .54 of the variance in responses to statement 3, is explained by the three factors. Since the three factors account for most of the variance associated with each of the statements, we can say the three factors fit the data quite well.
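The two squared-loading sums defined above can be checked directly: the eigenvalue of a factor is the column sum of squared loadings, and the communality of a statement is the row sum. A sketch with a loading matrix patterned after the sample output (the F1 column matches the text's figures; the F2 and F3 columns are hypothetical):

```python
import numpy as np

# Loading matrix: 5 statements (rows) x 3 factors (columns).
# F1 column from the text's example; F2 and F3 columns are hypothetical.
L = np.array([
    [0.86, 0.12, 0.10],
    [0.84, 0.18, 0.05],
    [0.68, 0.24, 0.08],
    [0.06, 0.92, 0.20],
    [0.07, 0.89, 0.30],
])

eigenvalues = (L ** 2).sum(axis=0)     # column sums of squared loadings
communalities = (L ** 2).sum(axis=1)   # row sums of squared loadings
```

A useful consistency check: the eigenvalues and the communalities sum to the same total, since both are just different groupings of the same squared loadings.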
A low communality figure indicates that the variable is largely independent of the factors and cannot usefully be combined with the other variables. The original (unrotated) factor matrix is used for this purpose.

Criteria for the Number of Factors

1. Eigenvalue criterion: An eigenvalue represents the amount of variance in the original variables associated with a factor, that is, the sum of the squared factor loadings of the variables on that factor. Only factors with eigenvalues greater than 1 are retained; a factor with an eigenvalue of less than 1 is worse than a single variable and should be dropped.
2. Scree plot criterion: This is a plot of the eigenvalues against the number of factors, in order of extraction. The plot usually has a distinct break between the steep slope of the factors with large eigenvalues and the gradual trailing off associated with the rest of the factors; this trailing off is referred to as the scree. Experience has shown that the point at which the scree begins denotes the true number of factors.
3. Percentage of variance criterion: The number of factors extracted is determined so that the cumulative percentage of variance extracted reaches a satisfactory level, usually at least 70%. For example, if the variances explained by five factors (before rotation) are 40%, 30%, 20%, 6% and 4%, the drop in variance at the fourth factor signals a relatively unimportant factor; we would choose three factors.
4. Significance test criterion: We can also determine the statistical significance of the separate eigenvalues and retain only those factors which are statistically significant. The problem with this criterion is that in large samples many factors may be statistically significant even though they account for only a small proportion of the total variance.

Factor Rotation

Factor analysis can generate several solutions for any data set; each solution is termed a rotation. Rotation involves moving the factor axes to improve the fit of the data. Each time the factors are rotated, the factor loadings change, and so does the interpretation of the factors; rotation is continued until the factors stabilize and there is relatively little change. Rotation will not change the total variation explained by the retained factors, but it will shift the relative percentage explained by each factor. There are many rotation programs, such as Varimax and Promax; most packages automatically provide the Varimax scheme.

Point to Ponder

• Factor analysis is used to identify underlying dimensions in the data by reducing the number of variables.
• The input to factor analysis is a set of variables for each object in the sample. A key assumption of the analysis is that the variables are completely represented by the underlying factors; this means the list of variables should be complete.
• Output: the most important outputs are the factor loadings, factor scores, the percentages of variance explained, and the eigenvalues. The percentage of variance explained and the eigenvalues help determine the number of factors to include.

Limitations

• Factor analysis tends to be a highly subjective process: the determination of the number of factors, their interpretation, and the rotation all involve considerable skill and judgement on the part of the analyst.
• Factor analysis does not lend itself to statistical testing; it is therefore difficult to know whether the results are merely accidental or actually reflect something meaningful.
A factor analysis program usually starts by calculating a variable-by-variable correlation matrix. In Varimax rotation the factors are then rotated so that each factor ends up with high loadings on a small number of variables and low loadings on the rest, which makes the factors easier to interpret. The rotated loading matrix therefore provides an indicator of the extent to which the original variables are correlated with each factor and the extent of that correlation. The factor loadings are shown in Table 2.

We can now examine how a factor analysis is conducted using an example. Respondents rated their agreement with statements such as:
• I want to be known personally at my bank and to be treated with special courtesy.

Factor Interpretation

How is a factor interpreted?
Interpretations are based on factor loadings which are correlations between the factors and the original variables.and so does the interpretation of the factors. A factor analysis program usually starts by calculating variable-by-variable correlation matrix. Because these variables stress the personal aspects of bank transactions. However as the third has a very low Eigen value we would drop it. Each solution is termed a rotation. Percentage of variance criteria: the number of factors extracted is determined so that the cumulative percentage of variance extracted reaches a satisfactory level – usually at least 70%. with special courtesy. continued till the factors stabilize and there is relatively little change . 20% 6% and 4% there is a drop in variance for the fourth factor which might signal a relatively unimportant factor. Most computers automatically provide Varimax scheme. I would never patronize that organization again We assume that a pilot study was conducted using 15 respondents.3. Factor rotation is 194 © Copy Right: Rai University 11.
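The arithmetic behind the eigenvalue and percentage-of-variance criteria can be sketched in a few lines of Python. The loading matrix below is hypothetical and invented for illustration; it is not the bank pilot data from table 21.1.

```python
# Hypothetical 5-variable x 2-factor loading matrix
# (rows = variables, columns = factors); invented values.
loadings = [
    [0.29, 0.81],
    [0.18, 0.77],
    [0.85, 0.10],
    [0.79, 0.15],
    [0.74, 0.20],
]

n_vars = len(loadings)

# Eigenvalue of a factor = sum of squared loadings down its column.
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(2)]

# Percentage of total variance explained by each factor
# (total variance of standardized variables = number of variables).
pct_variance = [100 * ev / n_vars for ev in eigenvalues]

# Eigenvalue criterion: retain only factors with eigenvalue > 1.
retained = [j + 1 for j, ev in enumerate(eigenvalues) if ev > 1]

print(eigenvalues, pct_variance, retained)
```

Here both factors have eigenvalues above 1 and would be retained; with real data, the same column sums drive the retention decision.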

LESSON 36: PRINCIPAL COMPONENT ANALYSIS

A (hypothetical) study was conducted by a bank to determine if special marketing programs should be developed for several key segments. One of the study's research questions concerned attitudes toward banking. The respondents were asked their opinion, on a 0-to-9 agree-disagree scale, on the following statements:
1. Small banks charge less than large banks.
2. Large banks are more likely to make mistakes than small banks.
3. Tellers do not need to be extremely courteous and friendly; it's enough for them simply to be civil.
4. I want to be known personally at my bank and to be treated with special courtesy.
5. If a financial institution treated me in an impersonal or uncaring way, I would never patronize that organization again.

Recap
• Outputs: the most important outputs are the factor loadings (the correlations between factors and variables), which are used to interpret the factors.
• The percentage-of-variance-explained criterion helps determine the number of factors to include.
• Rotation: this second stage is optional. Factor analysis can generate several solutions, each one being termed a rotation. There are many rotation programs, e.g. Varimax (orthogonal rotation). Each time there is a rotation, the factor loadings change, as does the interpretation of the factors.

The following exercises are to be done over the two practical sessions. Also included are some practical application exercises from the internet. You should be familiar with some of the early procedures; for the EFA procedures, please refer to the notes earlier in this handout.

The Data File
You can find 'driving01.sav' in Psycho\courses\psy2005\spss\ . The variables in the file are as follows: gender; age; area the respondent lives in; length of time the respondent has held a driving licence (in years and months); annual mileage; preferred speed on a variety of different roads at day and at night (motorways, dual carriageways, A-roads, country lanes, residential roads and busy high street); and finally, a series of scores relating to items on a personality trait inventory.

Saving a Copy of the Data File
Before you go any further, you should save a copy of the file 'driving01.sav' into your file space:
1. Open SPSS in the usual way, select Open existing file and More files.
2. In the Open file window, select the driving01.sav file and click on OK.
3. Once the file is open, click on Save as and put it in 'my documents' in 'PC files on Singer'. Whenever you need the file again, you now have a copy from which to work.

Exploring the Data Set
Before you start any kind of analysis of a new data set, you should explore the data so that you know what the variables are and what each number actually means. You can use the Descriptives and Frequencies commands to investigate the data, but they cannot tell you everything. If you move the cursor to the grey cell at the top of a column, a label will appear, telling you what the variable is.

Cross-tabulation
If we wanted to find out how many women there are in the dataset who live in rural areas, we must use a Crosstabs (cross-tabulation) command:
1. Click on Analyze in the top menu, then select Descriptive Statistics, and click on Crosstabs.
2. Select the two variables that you want to compare (in this case gender and area); put one in the Row box and one in the Column box.
3. Click on Statistics, and check the Chi-Square box.
4. Click on Continue.
5. Click on OK.
The output tells us how many men and women in the data set come from each type of area, and the chi-square option tells us whether there are significantly different numbers in each cell. However, it is not clear where these differences lie.
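The same cross-tabulation logic (observed counts, expected counts, and the chi-square statistic) can be reproduced outside SPSS in plain Python. The mini data set below is invented for illustration; the real driving01.sav file is much larger.

```python
from collections import Counter

# Toy (gender, area) records; invented for illustration only.
records = [
    ("female", "rural"), ("female", "urban"), ("male", "urban"),
    ("male", "rural"), ("female", "rural"), ("male", "urban"),
    ("female", "urban"), ("male", "urban"),
]

# Observed count for each (row, column) cell.
observed = Counter(records)

# Marginal totals, as shown in the SPSS Crosstabs output.
row_totals = Counter(g for g, _ in records)
col_totals = Counter(a for _, a in records)
n = len(records)

# Expected count for a cell = row total * column total / grand total.
expected = {
    (g, a): row_totals[g] * col_totals[a] / n
    for g in row_totals for a in col_totals
}

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (observed[cell] - e) ** 2 / e for cell, e in expected.items()
)

print(observed[("female", "rural")], expected[("female", "rural")])
print(round(chi_square, 4))
```

Comparing each observed cell count with its expected count is exactly what the 'Expected' option in the Crosstabs Cells dialog shows you.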

To see where the significant differences lie, run the analysis again with cell detail. Click on Analyze in the top menu, then select Descriptive Statistics, and click on Crosstabs (so long as you haven't done anything else since the first Crosstabs analysis above, the gender and area variables should still be in the correct boxes; if not, move them into the Row and Column boxes, then click on Statistics and check the Chi-Square box, and click on Continue). Click on Cells and check the 'Expected' counts box (also try selecting the 'Row', 'Column' and 'Total' percentage boxes). Click on Continue, and then OK. Comparing the expected count with the observed count will tell you whether or not there is a higher observed frequency than expected in that particular cell. This will then tell you where the significant differences lie.

Using the Crosstabs procedure: how many female respondents live in rural areas, and what percentage of the total sample do they make up? (10.6%). How many male respondents are between the ages of 36 and 40, and what percentage of the total sample do they constitute? (35.1%).

Creating a Scale
You might want to sum people's scores on several items to create a kind of index of their attitude. For example, we know that some of the personality inventory items in the data set relate to the Thrill-Sedate Driver Scale (Meadows, 1994). These items are numbered 7 to 13 in the questionnaire (Appendix A) and var07 to var13 in the data set. How do we create a single scale score?

First of all, some of the items may have been counterbalanced, so a high score may indicate a strong positive tendency on some of the items, whereas the opposite is true of other items. We need to ensure that scores of '5' represent the same tendencies throughout the scale (in this case, a high 'Thrill' driving style), so that the item scores may be added together to create a meaningful overall score. If strong agreement with an item statement indicates a positive tendency (because strong agreement implies a high 'Thrill' driving style), then that item is okay to include in the scale without recoding; if disagreement with the statement indicates a positive tendency, that item's scores must be recoded.

Recoding Variable Scores
It is usually fairly clear which items need to be recoded. Looking at the actual questions in the questionnaire, it is clear that items var07, var08, var09 and var10 should all be recoded (var11, var12 and var13 are okay). We have to reverse the scoring on these variables before we add them together to give a single scale score, so you finish up with 1 > 5, 2 > 4, 3 > 3, 4 > 2, and 5 > 1.

Missing Values
Make sure, before computing a new variable like this, that you have already defined missing values; otherwise these will be included in the scale score. For example, you would not want the values '99' for 'no response' included in your scales. Defining these as missing will mean that that particular respondent will not be included in the analysis.
12. Double-click on the grey cell at the top of the relevant column of data.
13. Click on Missing Values.
14. Select Discrete Missing Values and type 99 into one of the boxes.

Follow these steps to recode each item and then compute a scale composed of all item variables:
15. Go to Transform… …Recode… …Into different variables and select the first item variable (var07) that requires recoding.
16. Give a name for the new recoded variable, such as var07r, and label it as 'reversed var07'.
17. Set the new values by clicking Old and new values and entering the old and new values in the appropriate boxes (adding each transformation as you go along).
18. Click Continue and then Change, and check that the transformation has worked by getting a frequency table for the old and new variables, var07 and var07r. Have the values reversed properly? If not, then you may need to do it again! Follow the same procedure for the other items in the scale that need to be reversed.

Scale Calculation
Once you have successfully reversed the counterbalanced item variables, you can compute your scale.
19. Click on Transform… …Compute and type a name for the scale (e.g. "Thrill") in the Target variable box, and type the following in the 'numeric expression' box: var07r + var08r + var09r + var10r + var11 + var12 + var13
20. Click on OK. Now take a look at your new variable (it will have appeared in a column on the far right of your data sheet): get a 'descriptives' analysis on it. You should find that the maximum and minimum values make sense in terms of the original values. The seven 'Thrill-Sedate' items are scored between 1 and 5, so there should be no scores lower than 7 (i.e. 1 x 7) and none higher than 35 (i.e. 5 x 7). If there are scores outside these limits, perhaps you forgot to exclude missing values.

Checking the Scale's Internal Reliability
Checking the internal reliability of a scale is vital. It assesses how much each item score is correlated with the overall scale score (a simplified version of the correlation matrix that I talked about in the lecture). To check scale reliability:
21. Click on Analyze… …Scale… …Reliability Analysis.
22. Select the items that you want to include in the scale (those listed in the previous 'scale calculation' step: all the recoded items, plus all the items between var07 and var13 that didn't require recoding) and move them into the Items box.
23. Click on Statistics.
24. Select 'Scale if item deleted' and Inter-item 'Correlations'.
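The reverse-scoring, scale-summing and reliability steps above can be mimicked in plain Python to see what SPSS is doing. The seven-item responses below are invented, and Cronbach's alpha is computed from its standard formula, alpha = (k/(k-1)) * (1 - sum of item variances / variance of the total score).

```python
# Invented responses: each inner list is one respondent's answers to the
# seven Thrill-Sedate items (var07..var13 in the handout), scored 1-5.
raw = [
    [1, 2, 1, 2, 4, 5, 4],
    [2, 1, 2, 1, 5, 4, 5],
    [4, 5, 4, 5, 2, 1, 2],
    [5, 4, 5, 4, 1, 2, 1],
    [3, 3, 2, 3, 3, 4, 3],
]

REVERSED_ITEMS = {0, 1, 2, 3}  # var07..var10 are counterbalanced

def recode(score):
    # On a 1-5 scale, 1>5, 2>4, 3>3, 4>2, 5>1 is simply 6 - score.
    return 6 - score

# Reverse-score the counterbalanced items, then sum for the scale score.
coded = [
    [recode(s) if i in REVERSED_ITEMS else s for i, s in enumerate(resp)]
    for resp in raw
]
scale_scores = [sum(resp) for resp in coded]  # must lie between 7 and 35

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def cronbach_alpha(items_by_respondent):
    k = len(items_by_respondent[0])
    item_cols = list(zip(*items_by_respondent))
    item_var_sum = sum(variance(col) for col in item_cols)
    total_var = variance([sum(r) for r in items_by_respondent])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

print(scale_scores)
print(round(cronbach_alpha(coded), 4))
```

With these made-up (and deliberately consistent) answers, alpha comes out well above the 0.7 rule of thumb, so no item would need removing.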

25. Click on Continue… …OK.
26. In the output, you can first see the correlations between the items in the proposed scale; this is just like the correlation matrix referred to in the lectures. Secondly, you will see a list of the items in the scale with a certain amount of information about the items and the overall scale.

The statistic that SPSS uses to check reliability is Cronbach's Alpha, which takes values between zero and 1. The closer to 1 the value, the better, with acceptable reliability if Alpha exceeds about 0.7. For this example, you should get a value for Alpha of 0.7793. The column on the far right will tell us if there are any items currently in the scale that don't correlate with the rest. If any of the values in that column exceed the value for 'Alpha' at the bottom of the table, then the scale would be better without that item: it should be removed from the scale and the Reliability Analysis run again. In this example, the solution fits quite well, with none of the seven items requiring removal from the scale.

Factor Analysis of Driving01.sav

1. Orthogonal (Varimax) Rotation (Uncorrelated Factors)
An orthogonal (varimax) analysis will identify factors that are entirely independent of each other. Using the data in Driving01.sav we will run a factor analysis on the personality trait items (var01 to var20). Use the following procedure to carry out the analysis:
27. Analyze… …Data reduction… …Factor.
28. Select all the items from var01 to var20 and move them into the Variables box.
29. Click on Extraction.
30. Click on the button next to the Method box and select Principal Axis Factoring from the drop-down list.
31. Make sure there is a tick in the Scree Plot option.
32. Click on Continue.
33. Click on Rotation and select Varimax (make sure the circle is checked). Click on Continue.
34. Click on Options, select Sort by size and Suppress absolute values less than 0.1. Click on Continue… …OK.

Output
First, you have the Communalities, which can be ignored for now. The next table displays the eigenvalues for each potential factor; you will have as many factors as there were variables to begin with. In column 2 you have the amount of variance 'explained' by each factor, and in the next column the cumulative variance explained by each successive factor. The first four factors have eigenvalues greater than 1.0, so SPSS will extract these factors by default (SPSS automatically extracts all factors with eigenvalues greater than 1, unless you tell it to do otherwise). In this example, although four factors have been extracted (using the SPSS default criteria; see later), the cumulative variance explained by the first four factors is 52.4%.

The scree plot is displayed next. The scree plot shows that a 3-factor solution might be better: the big difference in the slope of the line comes after three factors have been extracted. The discontinuity between the first three factors and the remaining set is clear; they have a far 'steeper' slope than the later factors. You can see this more clearly if you place a ruler along the slope in the scree plot. Perhaps three factors may be better than four? See section [iii] below for further discussion of this issue.

Next comes the factor matrix, showing the loadings for each of the variables on each of the four factors. Remember that we have asked SPSS to 'suppress' or ignore any values below 0.1, so these will be represented by blank spaces. Remember also that this is for unrotated factors, so move on to look at the rotated factor matrix below it. You should concentrate on those values greater than 0.3, as any lower than this can be ignored. Each factor has a number of variables which have higher loadings, and the rest have lower ones. To make things easier, you could go back and ask SPSS to suppress values less than 0.3 (instead of 0.1); that will clean up the rotated factor matrix and make it easier to interpret.

Finally comes the factor transformation matrix, which can also be ignored (it simply specifies the rotation that has been applied to the factors).

2. Correlated Factors: Oblique (Oblimin) Rotation
You may have noticed that some of the questions in the questionnaire seem to measure similar things (for example, 'the law' is mentioned in variable items that do not appear to load heavily on the same factor). Two or more of the factors identified in the last exercise may well correlate with one another, as personality variables have a habit of doing, and so an orthogonal analysis may not be the most logical procedure to carry out. Using the data in Driving01.sav we will run an oblique factor analysis on the personality trait items (var01 to var20), which will identify factors that may be correlated to some degree.

Use the procedure described above, but when you click on the Rotation button, check the Direct Oblimin option instead of the Varimax option. The first few sections of the output will look the same, because both analyses use the same process to extract the factors. The difference comes once the initial solution has been identified and SPSS rotates it, in order to clarify the solution (by redistributing the variance across the factors). Instead of a rotated factor matrix, you will have 'pattern' and 'structure' matrices. Look at the factors and the loadings in the pattern matrix (concentrate on loadings greater than +/-0.1). Compare the output from this analysis with the output from the varimax analysis. Do the factors look the same as the varimax solution? One thing that has changed is that, although the factors look similar, the loadings will have changed a bit, and not all variables load in the same way as before. For example, var02 ("These days a person doesn't really know quite who he can count on") no longer loads on factor 3, but only on factor 4. How does this change the interpretation of factors 3 and 4?

Finally there is the factor correlation matrix. In a varimax solution, you can ignore the plus or minus signs in front of the factor loadings, because the factors are entirely independent of one another (in other words, uncorrelated), and so this is not an issue. In oblique (oblimin) analyses, however, the relationship between correlated factors must inherently take into account the sign of the loadings: because the factors correlate with one another to some extent, we need to know whether there is a positive or a negative relationship (correlation) between the factors. In this example, the negative correlations are so small as to be unimportant (correlations of less than 0.1 are usually non-significant). It may seem confusing at first, but working out the logic behind the relationships between factors makes sense when you look at the variable items that represent the factors (the relevant questionnaire statements).

[iii] Extracting a Specific Number of Factors
Up to now, you have been letting SPSS decide how many factors to extract, and it has been using the default criterion (called the 'Kaiser' criterion) of extracting factors with eigenvalues greater than 1. Look at the second table in your output: four factors have eigenvalues greater than 1, so SPSS extracts and rotates four factors. However, this criterion doesn't always guarantee the optimal solution. We may have an idea of how many factors we should extract; the scree plot can give some heavy hints (as mentioned earlier). The scree plot is not exact, and there is a degree of judgement involved in drawing these lines and judging where the major change in slope comes, but with larger samples it is usually pretty reliable. In this example, I reckon that three factors would lead to a more accurate solution than four, so try running the analysis again, but this time specify that you want a 3-factor solution by setting the Number of factors to extract to 3 in the Extraction options window.

In the output, we can see that there is quite a neat 3-factor solution, with all variable items loading quite high on only one factor, thus revealing a good simple structure (that is, each variable loading on only one of the factors). First try to interpret the factors, based on the questionnaire items (variables) that each factor loads on.

Where To Go From Here? Further Exercises (not included on original handout)
The following exercises don't introduce any new ideas or concepts, but should enable you to practise some of the techniques that are covered earlier in this series of exercises. They should also help you to see how the techniques from the three sections of the PSY2005 Multivariate Statistics module fit together with one another.

1. Exploring the Thrill-Sedate scale scores
Some of you were asking what to do with the Thrill-Sedate Driver scale once you had calculated it. You could try comparing men vs women in terms of the scale score. We might expect that men would record higher scores than women, and this is the case; however, an independent t-test shows us that this difference is not significant. Why?

Previous research has found that younger people record higher Thrill scores than older people. Does this pattern appear in this sample? Run an ANOVA using the Thrill scale score as DV and Age-group as IV.

The first step in answering the earlier question is to produce a crosstabs table for gender vs age-group. Compare the observed values ('count') and expected values for each cell. You'll see that there are more older men (i.e. fewer younger men) and more younger women (i.e. fewer older women) than would be expected in the sample. If you run a two-way ANOVA with Thrill score as DV and with Age and Gender as IVs, you may find an interaction between them. What does the interaction mean? Can you now see why, for this particular sample, there is no significant difference between men and women in terms of Thrill scores? The male sample is made up of older men, while the female sample is made up of younger women, so the scores will be similar. This obviously emphasises how important it is to ensure that your sample is representative. Remember, this is 'Real World' data, and you should be aware that this may not always be the case.

2. Further factor analysis
Once the EFA procedures carried out during the factor analysis of the data have identified the variables that load on each factor (exercise 3), you could construct scales for the other two factors from the items in the questionnaire that load on each factor. Use the scale-building and reliability procedures described earlier in these exercises to produce internally-reliable scales, which we may then use to describe differences between people.

3. You could also factor analyse the preferred speed data, to see if there are any patterns in the way that people respond to the different items. How could you interpret the resulting factors? You could put all three personality trait scores into a Regression analysis and see how well they predict preferred speed on different types of road. The possibilities are endless: you could come up with your own hypotheses, based on your own ideas about how people drive.

Cris Burgess (2001)
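SPSS's 'Sort by size' and 'Suppress absolute values less than…' options are purely presentational, and imitating them in Python makes it clear what they do to a rotated loading matrix. The matrix below is invented; it is not the actual driving01.sav output.

```python
# Invented rotated loading matrix: rows = variables, columns = 3 factors.
loadings = {
    "var01": [0.72, 0.15, 0.05],
    "var02": [0.08, 0.11, 0.64],
    "var03": [0.68, 0.22, 0.12],
    "var04": [0.10, 0.59, 0.28],
    "var05": [0.05, 0.61, 0.02],
}

CUTOFF = 0.3  # mimic "Suppress absolute values less than 0.3"

def dominant_factor(row):
    # Index of the factor on which this variable loads most heavily.
    return max(range(len(row)), key=lambda j: abs(row[j]))

# Sort variables by dominant factor, then by loading size (descending),
# and blank out any loading whose absolute value is below the cutoff.
ordered = sorted(
    loadings.items(),
    key=lambda kv: (dominant_factor(kv[1]), -max(map(abs, kv[1]))),
)
for name, row in ordered:
    shown = [f"{x:5.2f}" if abs(x) >= CUTOFF else "     " for x in row]
    print(name, " ".join(shown))
```

The cleaned-up printout groups the variables belonging to each factor together, which is exactly why the rotated factor matrix becomes easier to interpret with these options switched on.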

LESSON 37: MULTIDIMENSIONAL SCALING

We now turn to a very popular technique, multidimensional scaling, which is used to position a product in perceptual space. In a competitive market much of marketing is concerned with the question of positioning: how do we fare vis-a-vis competitors in the consumer's mind space? So far we have discussed measuring consumer attitudes using measurement scales such as the Likert scale. These measure perceptions or preferences in terms of a single dimension. However, it is likely that a consumer's perceptions of a product are multidimensional. For example, a soft drink may be perceived along a "colaness" dimension as well as a "dietness" dimension, and a car can be seen to be both luxurious and sporty. Multidimensional scaling (MDS) is a set of techniques that attempts to represent such perceptions and preferences as points in a geometric space. The output of MDS is the location of the products or brands on the various dimensions, and this is called a perceptual map. Irrespective of the technique used, the aim of MDS is to show market clusters or segments and their sizes; such analysis is useful for determining marketing strategies.

The actual statistical aspects of this technique are highly complex and beyond the scope of our course. We shall focus on understanding the technique intuitively, and on its applications, particularly the interpretation of computer output.

MDS basically involves two problems:
1. The dimensions on which consumers evaluate products need to be identified.
2. The products or brands need to be positioned with respect to these dimensions.

Techniques
There are several scaling techniques which can be used to obtain an MDS solution. They differ with regard to the assumptions they make and even the input data they use.

Attribute-Based Approaches
An important assumption of attribute-based approaches is that we can identify attributes on which individuals' perceptions of objects are based. For example, if we are interested in the object beer, the attributes studied could be body, flavour, alcohol content, cost, etc. Two techniques, factor analysis and discriminant analysis, are usually used to reduce the attributes to a small number of dimensions. A second approach considers similarity or preference between the objects, not taking individual attributes into account. The basic procedure of this second approach is as follows:
• We obtain each respondent's opinion of where each product or brand stands in product space.
• We then try to locate each consumer's ideal point in the product space for each product.

Let us start with a simple example. Suppose that our goal is to develop a perceptual map for the non-alcoholic beverage market. Exploratory research has identified 14 beverages that seem relevant, and nine attributes that people use to describe and evaluate these beverages have also been identified. We now ask a group of respondents to rate each of the beverages on these nine attributes, on a 7-point scale. An average rating of the respondent group on each of the nine attributes can be computed, one for each beverage, and each beverage is then positioned on the attributes. However, it would be more useful if these nine attributes could be combined into two or three dimensions or factors. As we noted, consumer perceptions regarding a brand are likely to be multidimensional, and the MDS technique would then combine these attributes into dimensions such as price-quality. We shall now briefly look at how this is done using factor analysis.

Factor Analysis
Since each respondent rates 14 beverages on nine attributes, he or she will have 14 factor scores on each of the emerging factors. All variables are treated as independent variables. The position of each beverage in the perceptual space will be the average factor score for that beverage. In the perceptual map shown in Figure 1 below, three factors account for 77 percent of the variance in the nine attributes. Since three factors or dimensions are involved, two maps are required to portray the solution: the first involves the first two factors, while the second includes the first and third.

For convenience, the original attributes are also shown on the maps as lines or vectors. The vectors are obtained from the correlations the original attributes possess with the factor scores. The direction of a vector indicates the factor with which the attribute is associated, and the length of the vector indicates the strength of the association. Thus, on the left map the "filling" attribute has little association with any factor, whereas on the right map the "filling" attribute is strongly associated with the "refreshing" factor.

Discriminant Analysis
So far we have covered the use of factor analysis in generating a perceptual map. We can also use discriminant analysis.
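The first computational step of an attribute-based map, averaging respondent scores per beverage on each derived dimension, is simple aggregation. A tiny made-up example with three beverages and two dimensions (in a real analysis these would be factor scores, and the names here are invented):

```python
# Made-up respondent scores for each beverage on two derived dimensions,
# e.g. (price-quality, refreshing). Three respondents per beverage.
ratings = {
    "cola":  [(6, 2), (7, 3), (5, 2)],
    "juice": [(3, 6), (2, 7), (4, 6)],
    "water": [(1, 1), (2, 2), (1, 2)],
}

def mean(xs):
    return sum(xs) / len(xs)

# A beverage's position on the perceptual map is its average score
# per dimension across respondents.
positions = {
    name: (mean([r[0] for r in rs]), mean([r[1] for r in rs]))
    for name, rs in ratings.items()
}

for name, (x, y) in positions.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Plotting these (x, y) pairs gives the kind of two-dimensional map shown in Figure 1, with one point per beverage.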

The goal of discriminant analysis is to generate dimensions (termed discriminant functions) that discriminate or separate the objects as much as possible; its objective is therefore to select attributes that discriminate between objects. The extent to which an attribute contributes to a dimension depends on the extent to which there is a perceived difference among the objects on that attribute. If all objects are perceived to be similar with respect to an attribute (such as an airline's safety), that attribute should not affect preference (such as the choice of an airline), and it will tend not to emerge in the solution. A second useful characteristic of discriminant analysis is that it provides a test of statistical significance: the null hypothesis is that two objects are perceived identically, and the test determines the probability that the between-object distance is due simply to a statistical accident. A third quality of discriminant analysis is that it identifies a perceptual dimension even if it is represented by a single attribute.

Comparing Factor and Discriminant Analysis
Each of the approaches has advantages and disadvantages.

Factor analysis:
1. The goal of factor analysis is to generate dimensions that maximize interpretability and explain variance.
2. Factor analysis groups attributes that are similar; each dimension is based on a combination of the underlying attributes.
3. Factor analysis is based on both perceived differences between objects and differences between people's perceptions of objects. All perceptual dimensions are included, whether they discriminate between objects or not. It therefore tends to provide a richer solution, as it uses more of the underlying attributes, and results in more dimensions.

Discriminant analysis:
1. The goal of discriminant analysis is to generate dimensions that separate the objects as much as possible.
2. Discriminant analysis identifies clusters of attributes on which objects differ.
3. Only attributes on which there is a perceived difference among objects contribute to the dimensions.

A study conducted by Hauser and Koppelman of shopping centers compared several approaches to multidimensional scaling. They found that the factor analysis dimensions provided more interpretive value than those of discriminant analysis.

Correspondence Analysis
In both factor analysis and discriminant analysis, the variables are assumed to be intervally scaled, continuous variables; a 7-point Likert (agree-disagree) scale is usually used. If the number of attributes and objects is large, however, the task of scaling each object on each attribute may be excessive and unrealistic. It is often more convenient to collect binary (zero-or-one) data: simply checking which attributes (or use occasions) apply to a given object may be easier and more efficient for the respondent. The technique in which we use data reflecting the association of an attribute or other variable with a brand or other object is termed correspondence analysis. It is a technique of MDS, and it generates as output a perceptual map in which the elements of both attributes and brands are positioned.

When Do We Use Binary Data?
Binary judgments are used in several contexts. Binary data may be useful if we want to generate a very comprehensive list of attributes or use occasions. For example: what snacks would you consider for a party given to watch the SuperBowl? In this case we would prefer to generate binary data and use correspondence analysis. Ways of collecting such data are:
1. We could ask respondents to list all the attributes they can think of for a certain brand, or to list all the objects or brands that would apply to a certain use occasion. For example, respondents might be asked to identify from an attribute list which attributes describe a brand.
2. Another possibility is that the respondent could be asked to pick the three (or k) attributes that are most associated with a brand.
3. A third way is to ask the respondent to pick the two (or k) use occasions that are most suitable for a brand.
The result in all these cases would be a row of zeros and ones for each respondent and for each brand.

Basic Concepts of MDS
We now try to understand, intuitively, how exactly an MDS analysis is carried out. Basically, MDS uses proximities among different objects as input. A proximity is a value that denotes how similar or different two objects are, or are perceived to be. Attribute-based data, such as an objects-by-attributes profile matrix, or non-attribute-based data, such as similarity and preference data, can be used to obtain proximities. MDS then uses these proximities to produce a geometric configuration of points (which represent the objects) as output, preferably in a two-dimensional space.

A key concept of MDS is that the derived distances (output) between the objects should correspond to the proximities (input). The Euclidean distances (derived) between objects in the two-dimensional space are computed and compared with the proximities data:
• If we make the rank order of the derived distances between objects/brands correspond to the rank order of the proximities data, the process is known as nonmetric MDS. Nonmetric MDS assumes that the proximities data are ordinal.
• If the derived distances are linear (or other specified) functions of the proximities, it is known as metric MDS. Metric MDS assumes that the proximities are metric.
In both cases the output (the derived distances) is metric.
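The "pick any" data described above reduce to rows of zeros and ones per respondent per brand, which are then tallied into the brand-by-attribute frequency table that correspondence analysis takes as input. A sketch with invented brand and attribute names:

```python
# Each respondent "picks" the attributes they associate with each brand.
# All names and picks are invented for illustration.
attributes = ["cheap", "reliable", "stylish"]
picks = [
    {"BrandA": {"cheap"}, "BrandB": {"reliable", "stylish"}},
    {"BrandA": {"cheap", "reliable"}, "BrandB": {"stylish"}},
    {"BrandA": {"cheap"}, "BrandB": {"reliable"}},
]

# One row of zeros and ones per respondent per brand...
rows = [
    [1 if attr in chosen else 0 for attr in attributes]
    for respondent in picks
    for brand, chosen in sorted(respondent.items())
]

# ...and the brand-by-attribute frequency table (total picks per cell)
# that correspondence analysis would take as input.
freq = {brand: [0] * len(attributes) for brand in ["BrandA", "BrandB"]}
for respondent in picks:
    for brand, chosen in respondent.items():
        for j, attr in enumerate(attributes):
            freq[brand][j] += attr in chosen

print(rows)
print(freq)
```

Notice that the respondent's task here is only a series of yes/no checks, which is why binary collection scales to long attribute lists where full 7-point ratings would not.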

Basic Concepts of MDS
We now try to understand how exactly an MDS analysis is carried out. The statistical calculations are highly complex; for our purposes we need only an intuitive understanding of the basic idea of what is done.

Ordinal or nonmetric information is often preferred, for two reasons:
1. The solution usually is not affected by replacing intervally scaled or "metric" data with ordinal or "nonmetric" data; the rank orders actually contain about the same amount of information. For example, the knowledge that objects A and C in Figure 4 have an average similarity of 1.7 is replaced by the fact that objects A and C are the most similar pair. Figure 4 shows the conversion to rank-order information.
2. The nonmetric data often are thought to be more reliable.
Instead of giving similarity ratings, the respondents can be asked simply to rank the pairs from most to least similar. An average rank-order position would then replace the average similarity rating matrix. It should be noted, however, that rank ordering can be difficult if 10 or more objects are involved.

Evaluating the MDS Solution
How good is the MDS solution? The fit between the derived distances and the proximities is evaluated through a measure called stress. Higher dimensions are associated with lower stress values and vice versa. The appropriate number of dimensions can be obtained by plotting the stress values against the number of dimensions, similar to the factor analysis scree plot, and looking for where a sudden jump in stress starts to occur as dimensions are removed. Since visual inspection is possible only with two, or possibly three, dimensions, we generally prefer lower dimensions; sometimes we directly seek a two-dimensional representation because it is easier to interpret.

Determining the Number of Dimensions
Figure 2 plots the stress values against the number of dimensions. As expected, the stress value increases when we decrease the number of dimensions. The plot indicates that two dimensions are probably acceptable, since there is a large increase in the stress values from two dimensions to one. In principle, however, the objects can be projected onto two, three, four or even higher dimensions.

Attribute Based MDS
MDS can also be used on attribute data to produce perceptual maps; when it is used in this way it is termed attribute based MDS. A 7-point Likert scale (agree-disagree) is usually used to obtain the proximities data. Attribute based MDS has the advantage that attributes can have diagnostic value, and studies have generally found that attribute based data are easier for respondents to use and that dimensions based on attribute data predict preference better. Attribute based data has certain disadvantages, however:
1. If the list of attributes is not complete, the study will suffer. It may be difficult to generate a comprehensive attribute list, given that people's perceptions differ.
2. An object may be perceived and evaluated as a whole by respondents rather than being broken up into attributes.
3. Attribute based solutions may also require more dimensions to represent them.
Because of these problems, non-attribute data (similarity or preference data) may frequently be preferred.
Application of MDS with Nonattribute Data
Similarity Data
Similarity data capture the perceived similarity of two objects in the eyes of the respondent. The respondent is not told what criteria to use to judge similarity and is not given an attribute list. In our example, students have judged Harvard and Stanford to be quite similar. First, the pairwise similarity judgments are summarized in a matrix; the numbers in the matrix represent the average similarity judgment for a sample of, say, 50 respondents. A perceptual map could then be obtained from the average similarity ratings.

The number of pairs to be judged for degree of similarity can be as many as n(n-1)/2, where n is the total number of objects. With 10 brands there would be 45 pairs of brands to judge (although fewer could be used). Although in practice at least seven or eight objects should be judged, the approach is easier to illustrate if only four objects are considered, as shown in Figure 4.
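The pair count and the conversion from average similarity ratings to rank-order (nonmetric) form can be sketched directly. The similarity values below are illustrative, chosen only so that the A-C pair has the 1.7 average mentioned in the text:

```python
# A minimal sketch: the number of pairwise judgments grows as n(n-1)/2,
# and metric similarity ratings can be replaced by their rank order.
# The ratings below are hypothetical.

def n_pairs(n):
    return n * (n - 1) // 2

similarity = {("A", "B"): 1.2, ("A", "C"): 1.7, ("A", "D"): 0.4,
              ("B", "C"): 0.9, ("B", "D"): 1.1, ("C", "D"): 0.6}

# Rank the pairs from most to least similar; nonmetric MDS keeps only this.
ranked = sorted(similarity, key=similarity.get, reverse=True)

print(n_pairs(10))  # 45 pairs for 10 brands
print(ranked[0])    # ('A', 'C'): the most similar pair, as in the text
```

Replacing the raw ratings by the list position of each pair in `ranked` gives the average rank-order matrix described above.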
Labeling the Dimensions
To label the dimensions, one can correlate the objects' attribute ratings with each dimension to determine which attributes correlate highly with which dimension; the dimensions can then be named accordingly. Figure 22-1 is an example of naming the dimensions for a perceptual map that gives the location of various shopping areas in Chicago. The first dimension, for example, is labeled variety because attributes such as variety of merchandise, store availability, specials, etc. are associated with it. Similarly, the second dimension may be given a label such as value (quality vs. price).

Interpreting the Map
As the map shows, the different shopping locations are perceived quite differently from one another. The Chicago Loop location is the only one that offers both good value (quality/price) and variety. Korvette City, on the other hand, offers low value and less variety; this location should therefore try to reposition itself by moving towards the origin, increasing both value and variety. Once the locations of the brands or objects are known, they can be evaluated and a clear strategy implemented to reposition a brand or to maintain its position.
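The correlation step used for labeling can be illustrated with a short computation. All coordinates and ratings below are made up for illustration; they are not values from the Chicago study:

```python
# Sketch (hypothetical numbers): correlate each attribute's average ratings
# with the objects' coordinates on a dimension; attributes with high
# correlations suggest a label for that dimension.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Coordinates of four shopping areas on dimension 1, and their average
# ratings on two candidate attributes (all values illustrative).
dim1 = [1.5, 0.8, -0.6, -1.7]
variety_rating = [6.1, 5.4, 3.9, 2.8]   # tracks dim1 closely
parking_rating = [3.0, 5.9, 4.1, 4.6]   # does not

print(round(pearson(dim1, variety_rating), 2))  # near +1: label "variety"
print(round(pearson(dim1, parking_rating), 2))  # weak: not a good label
```

In practice this would be repeated for every attribute and every dimension, keeping the highest correlations as candidate labels.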

The Soft Drink Example
A sample of 64 undergraduates provided similarity judgments for all 45 pairs of 10 soft drinks, including Coke, Diet Coke, 7-Up, Diet 7-Up, Slice, Schweppes, and Calistoga Natural Orange. They were asked to rate the similarity of each pair, such as Slice-Diet Coke, on a nine-point scale. (Averaging these judgments across respondents assumes that people have broadly similar perceptions, which may or may not be reasonable.) A computer program was then employed to convert the similarity judgments into distances in a map with a small number of dimensions, so that similar objects lie close together and dissimilar objects lie far apart. The two-dimensional solution is shown in Figure 6. We can see that Slice is considered closer to Diet 7-Up than to 7-Up, while Schweppes and Calistoga are separated even though they are very similar. The vertical axis seems to represent a nondiet dimension, because in both the cola group and the noncola group the nondiet drinks tend to be higher than the diet drinks. The locations of the objects on the horizontal axis indicate a cola versus noncola dimension, and the fruit punch versus hot coffee locations suggest a maturity dimension.

Interpreting the resulting dimensions takes place "outside" the technique: additional information must be introduced to decide why objects are located in their relative positions. Sometimes, as in Figure 1, the location of the objects themselves suggests dimensional interpretations even without attribute information.

Locating the Points
The power of this technique is in its ability to find the smallest number of dimensions for which there is a reasonably good fit between the input similarity rankings and the rankings of the distances between objects in the resulting space. The computer locates the objects in a space of two, three, or more dimensions, starting with two dimensions and, if the fit is not satisfactory, continuing to add dimensions until an acceptable fit is achieved. The determination of "acceptable" is a matter of judgment, although most analysts will trade off some degree of fit to stay with a two- or three-dimensional map because of the advantages of visual interpretation.

Consider four objects whose similarity rankings imply that the shortest distance should be between one pair, the next shortest between another, and so on. One possible solution that satisfies these constraints in two dimensions is shown in the figure. You might be able to relocate the points differently and still satisfy the constraints, so that the rankings of the distances in the map correspond to the rankings of the pairwise similarity judgments; but because there are only a few points to move in the space and only six constraints to satisfy, it is unlikely that there will be a significantly different solution that still satisfies the similarities matrix. With 10 objects and 45 constraints, the task of locating the points in a two-dimensional space is more difficult, and the calculations require a computer. Once a solution is found (the points are located in the space), we can argue that the intervally scaled nature of the distances between points really was hidden in the rank-order input data all the time.

There are situations where more dimensions are necessary. This happened in a study of nine different types of sauces (mustard, catsup, steak sauce, relish, dressing, and so on): most respondents perceived too many differences, in terms of either the types of foods with which the sauces would be used or the physical characteristics of each sauce, to be captured with two or three dimensions.

Preference Data
An ideal object is one that the customer would prefer over all others, including objects that can be conceptualized in the space but do not exist. The concept of an ideal object is an important one in MDS because it allows the analyst to relate object positioning to customer likes and dislikes. It also provides a means for segmenting customers according to their preferences for product attributes: since preferences are nearly always heterogeneous, individuals' ideal objects will differ, and one reason to locate ideal objects is to identify segments of customers who have similar ideals.

There are two approaches to obtaining ideal-object locations.
1. The first is simply to ask respondents to consider an ideal object as one of the objects to be rated or compared. The problem with this approach is that conceptualizing an ideal object may not be natural for a respondent, and the result may therefore be ambiguous and unreliable.
2. A second approach is indirect. For each individual, a rank-order preference among the objects is sought. Then, given a perceptual map, a program locates the individual's ideal object such that the distances from it to the objects have the same rank order (or as close to it as possible) as the rank-order preference: the preferred object should be closest to the ideal, the second most preferred farther from the ideal than the first but closer than the third, and so on. Often it is not possible to determine a location that satisfies this requirement perfectly and still obtain a small number of dimensions with which an analyst would like to work; in that case, compromises are made and the computer program does as well as possible by maximizing some measure of "goodness-of-fit."

There Are Two Types of Ideal Objects
1. The first lies within the perceptual map, as an ideal point. For instance, if a new cookie were rated on attribute scales such as "Very sweet ... Not at all sweet" or "Large ... Small, dainty," a respondent might well prefer a middle position on the scale. The ideal point is a combination of all the customer's preferred attribute levels.
2. The second type is illustrated by a different example. Suppose attributes of a proposed new car included "Inexpensive to buy ... Expensive to buy," "Inexpensive to operate ... Expensive to operate," and "Good handling ... Bad handling." Respondents would very likely prefer an end point on each scale: the car should be as inexpensive as possible to buy and operate. In that case, the ideal object is represented by an ideal vector, or direction, rather than an ideal point in the space. The direction depends on the relative desirability of the various attributes.
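The indirect, ideal-point logic can be sketched in a few lines: once an ideal point is located, the predicted preference order is simply the objects sorted by their distance from it. The coordinates below are invented for illustration:

```python
# Sketch (made-up coordinates): under the ideal-point model, the object
# closest to a respondent's ideal point should be the most preferred,
# the next closest the second most preferred, and so on.
from math import dist  # Euclidean distance, Python 3.8+

ideal = (0.5, 1.0)
objects = {"Brand A": (0.6, 0.9),
           "Brand B": (-1.2, 0.4),
           "Brand C": (2.0, -1.5)}

# Predicted preference order: rank objects by distance from the ideal point.
preference = sorted(objects, key=lambda name: dist(ideal, objects[name]))
print(preference)  # ['Brand A', 'Brand B', 'Brand C'], most preferred first
```

A fitting program works in the opposite direction: it searches for the `ideal` location whose implied ordering best matches the respondent's stated rank-order preference.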

Issues in MDS
Perceptual maps are good vehicles through which to summarize the position of brands and people in attribute space and, more generally, to portray the relationship among any variables or constructs. They are particularly useful for portraying the positioning of existing or new brands and the relationship of those positions to the relevant segments. However, there are several problems in working with MDS:
1. The interpretation of dimensions can be difficult. In attribute based MDS, attribute vectors may be included to help interpret the dimensions.
2. When more than two or three dimensions are needed, the usefulness of the map is reduced.
3. Perceptual maps are static snapshots at a point in time. It is difficult to know from the model how they might be affected by market events.
4. Perceptual mapping has not been shown to be reliable across different methods, and users rarely take the trouble to apply multiple approaches to a context to ensure that a map is not method-specific.

Assumptions
1. The attribute list should be complete.
2. Respondents use an appropriate context.
3. The underlying data represent valid measures.

Summary of MDS
Application: MDS is used to identify the dimensions
• by which objects are perceived
• to position objects with respect to those dimensions
• to make positioning decisions for new and old products.
Inputs
• Attribute based data
• Similarity based data
• Preference data
Outputs
• The location of each object on a limited number of dimensions. The number of dimensions is selected on the basis of a goodness-of-fit measure.

LESSON 38: FURTHER APPLICATIONS AND THEORY
MULTIDIMENSIONAL SCALING USING STATISTICAL SOFTWARE

General Purpose
Multidimensional scaling (MDS) can be considered an alternative to factor analysis (see Factor Analysis). In general, the goal of the analysis is to detect meaningful underlying dimensions that allow the researcher to explain observed similarities or dissimilarities (distances) between the investigated objects. In factor analysis, the similarities between objects (e.g., variables) are expressed in the correlation matrix. With MDS one may analyze any kind of similarity or dissimilarity matrix, in addition to correlation matrices.

Logic of MDS
The following simple example may demonstrate the logic of an MDS analysis. Suppose we take a matrix of distances between major US cities from a map. We then analyze this matrix, specifying that we want to reproduce the distances based on two dimensions. As a result of the MDS analysis, we would most likely obtain a two-dimensional representation of the locations of the cities; that is, we would basically obtain a two-dimensional map. In general, MDS attempts to arrange "objects" (major cities in this example) in a space with a particular number of dimensions (two in this example) so as to reproduce the observed distances. As a result, we can "explain" the distances in terms of underlying dimensions; in our example, we could explain them in terms of the two geographical dimensions north/south and east/west.

Our goal is to reduce the observed complexity of nature, that is, to explain the distance matrix in terms of fewer underlying dimensions. To return to our example, once we have a two-dimensional map it is much easier to visualize the location of, and navigate between, cities, as compared to relying on the distance matrix only.

Computational Approach
MDS is not so much an exact procedure as a way to "rearrange" objects in an efficient manner, so as to arrive at a configuration that best approximates the observed distances. It actually moves objects around in the space defined by the requested number of dimensions and checks how well the distances between objects can be reproduced by the new configuration. In more technical terms, it uses a function minimization algorithm that evaluates different configurations with the goal of maximizing the goodness-of-fit (or minimizing the "lack of fit"). Rather than reproducing the exact distances, it will attempt to reproduce the general rank ordering of distances between the objects in the analysis.

Orientation of Axes
As in factor analysis, the actual orientation of axes in the final solution is arbitrary. To return to our example, we could rotate the map in any way we want; the distances between cities remain the same. Thus, the final orientation of axes in the plane or space is mostly the result of a subjective decision by the researcher, who will choose an orientation that can be most easily explained. In our example, we could have chosen an orientation other than north/south and east/west; that orientation is most convenient simply because it "makes the most sense" (i.e., it is easily interpretable).

Measures of Goodness-of-Fit: Stress
The most common measure used to evaluate how well (or poorly) a particular configuration reproduces the observed distance matrix is the stress measure. The raw stress value Phi of a configuration is defined by:

Phi = Σij [dij - f(δij)]²

In this formula, dij stands for the reproduced distances, given the respective number of dimensions, and δij (delta ij) stands for the input data (i.e., the observed distances). The expression f(δij) indicates a nonmetric, monotone transformation of the observed input data. There are several similar related measures in common use; however, most amount to the computation of the sum of squared deviations of observed distances (or some monotone transformation of those distances) from the reproduced distances. In general, the smaller the stress value, the better the fit of the reproduced distance matrix to the observed distance matrix.

Shepard Diagram
One can plot the reproduced distances for a particular number of dimensions against the observed input data (distances). This scatterplot is referred to as a Shepard diagram. It shows the reproduced distances on the vertical (Y) axis versus the original similarities on the horizontal (X) axis (hence the generally negative slope). The plot also shows a step function; this line represents the so-called D-hat values, that is, the result of the monotone transformation f() of the input data. If all reproduced distances fall onto the step line, then the rank ordering of distances (or similarities) is perfectly reproduced by the respective solution (dimensional model). Deviations from the step line indicate lack of fit.

How Many Dimensions to Specify?
If you are familiar with factor analysis, you will be quite aware of this issue. (If you are not, you may want to read the Factor Analysis section in the manual; however, this is not necessary in order to understand the following discussion.) In general, the more dimensions we use to reproduce the distance matrix, the better the fit of the reproduced matrix to the observed matrix (i.e., the smaller the stress). In fact, if we use as many dimensions as there are variables, we can perfectly reproduce the observed distance matrix. Our goal, of course, is to explain the distance matrix in terms of fewer underlying dimensions.
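The raw stress idea can be made concrete with a tiny computation. For simplicity the sketch below takes f() to be the identity (as in metric MDS); nonmetric MDS would substitute a monotone transformation. The observed distances and the trial configuration are made up:

```python
# Sketch of raw stress: the sum of squared deviations between the
# reproduced distances d_ij of a trial configuration and the input data.
# Here f() is the identity for simplicity; all numbers are illustrative.
from math import dist

observed = {("A", "B"): 2.0, ("A", "C"): 2.0, ("B", "C"): 2.0}

# A trial configuration: three points forming a near-equilateral triangle.
config = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 1.7)}

def raw_stress(config, observed):
    return sum((dist(config[i], config[j]) - delta) ** 2
               for (i, j), delta in observed.items())

print(round(raw_stress(config, observed), 3))  # small value: good fit
```

An MDS program repeatedly nudges the points in `config` to drive this quantity down; that is the function minimization described above.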

Why a Distance Matrix Implies a Particular Number of Dimensions
Let us consider for a moment why fewer dimensions may produce a worse representation of a distance matrix than would more dimensions. Imagine three cities D, E, and F, whose distances from each other are shown below:

      D    E    F
D     0
E    90    0
F   180   90    0

We can arrange these three cities on one dimension (a line):

D --90 miles-- E --90 miles-- F

D is 90 miles away from E, E is 90 miles away from F, and D is 90 + 90 = 180 miles away from F. Now imagine three cities A, B, and C, all exactly 90 miles apart from each other:

      A    B    C
A     0
B    90    0
C    90   90    0

If you try to do the same thing with cities A, B, and C, you will see that there is no way to arrange the three cities on one line so that the distances can be reproduced. However, we can arrange those cities in two dimensions, in the shape of a triangle, and then perfectly reproduce the distances between them. Thus, this small example illustrates how a particular distance matrix implies a particular number of dimensions.

Interpretability of Configuration
A second criterion for deciding how many dimensions to interpret is the clarity of the final configuration. Sometimes, as in our example of distances between cities, the resultant dimensions are easily interpreted; at other times the points in the plot form a sort of "random cloud," and there is no straightforward and easy way to interpret the dimensions.

Interpreting the Dimensions
The interpretation of dimensions usually represents the final step of the analysis. As mentioned earlier, the actual orientations of the axes from the MDS analysis are arbitrary and can be rotated in any direction. A first step is to produce scatterplots of the objects in the different two-dimensional planes. Three-dimensional solutions can also be illustrated graphically, although their interpretation is somewhat more complex. In addition to "meaningful dimensions," one should also look for clusters of points or particular patterns and configurations (such as circles, manifolds, etc.). For a detailed discussion of how to interpret final configurations, see Borg and Lingoes (1987), Borg and Shye (in press), or Guttman (1968). Kruskal and Wish (1978, pp. 53-60) discuss the application of the Shepard diagram to MDS.

Use of Multiple Regression Techniques
An analytical way of interpreting dimensions (described in Kruskal & Wish, 1978) is to use multiple regression techniques to regress some meaningful variables on the coordinates for the different dimensions. Note that this can easily be done via Multiple Regression.
Scree Test
A common way to decide how many dimensions to use is to plot the stress value against different numbers of dimensions. This test was first proposed by Cattell (1966) in the context of the number-of-factors problem in factor analysis (see Factor Analysis). Cattell suggests finding the place where the smooth decrease of stress values (eigenvalues in factor analysis) appears to level off to the right of the plot. To the right of this point one finds, presumably, only "factorial scree" ("scree" is the geological term referring to the debris that collects on the lower part of a rocky slope). If the data points in the configuration do not follow any pattern, and if the stress plot does not show any clear "elbow," the data are most likely random "noise." In the latter case one should try to include more or fewer dimensions and examine the resultant final configurations; often, more interpretable solutions emerge. Of course, "real" data are never perfectly "clean": they contain noise, that is, random variability that contributes to the differences between the reproduced and observed matrices.

Point to Ponder
• Multidimensional scaling (MDS) is often used in conjunction with cluster analysis or conjoint analysis.
• It allows a respondent's perception of a product, or of any other object of attitude, to be described in a spatial manner.
• MDS helps the business researcher to understand difficult-to-measure constructs, such as product quality or desirability, which are perceived and cognitively mapped in different ways by different individuals.
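The D/E/F versus A/B/C contrast can be checked mechanically: for three objects, the distances fit on a line exactly when the largest pairwise distance equals the sum of the other two. This is a small sketch of that test:

```python
# Sketch: can a three-object distance matrix be embedded in one dimension?
# On a line, the largest of the three distances must equal the sum of the
# other two (as with D, E, F above); otherwise two dimensions are needed.

def fits_one_dimension(distances):
    """distances maps the three unordered pairs to their distances."""
    values = sorted(distances.values())
    return abs(values[2] - (values[0] + values[1])) < 1e-9

def_ef = {("D", "E"): 90, ("E", "F"): 90, ("D", "F"): 180}
abc = {("A", "B"): 90, ("B", "C"): 90, ("A", "C"): 90}

print(fits_one_dimension(def_ef))  # True: D--E--F lie on a line
print(fits_one_dimension(abc))     # False: needs a triangle, two dimensions
```

This is exactly the sense in which a distance matrix "implies" a number of dimensions: some matrices simply cannot be reproduced in too few dimensions.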

• Items judged to be similar will fall close together in multidimensional space; these relationships are revealed both numerically and geometrically by the spatial map.

Notes:

LESSON 39: CONJOINT ANALYSIS

We will now learn about an important technique that helps a marketing person decide on the optimal combination of product features to offer. This is a highly advanced technique, and we shall only be able to touch on it intuitively; a more detailed explanation is beyond the scope of our course. By the end of this lesson you should have an intuitive understanding of what conjoint analysis is and of its applications in marketing and other types of research problems.

What is Conjoint Analysis?
Conjoint analysis is a technique used to identify the most desirable combination of features to be offered in a new product or service. It addresses the problem of how the customer will value the various tangible and intangible features offered by a particular firm's product, and it helps determine the best, or optimal, combination of product features or attributes. For example, for a new product the features may be:
• Colour (different shades)
• Size (large vs. medium vs. small)
• Shape (square vs. cylindrical)
• Price (different price levels)
Thus conjoint analysis is done to determine the utility a consumer attaches to attributes such as:
• Price (high, low, etc.)
• After-sales service (frequent service, guarantee)
• Product features

Conjoint measurement is the statistical technique typically used to identify the most desirable combination of attributes or features for the particular product or service under investigation. Conjoint measurement also tells us the extent to which respondents are willing to give up (trade off) some features and attributes to retain others. It is similar to factor analysis in that it tries to identify interdependencies between a number of variables, where the variables are the different features or characteristics of products; it differs from factor analysis in that it is applied only to categorical variables.

Conjoint Analysis: How It Works
A consumer is asked to compare different combinations of product attributes and rank them: the combination most preferred, the second most preferred, and so on. These preference rankings help reveal how desirable a particular feature is to the respondent. Conjoint analysis uses the preference rankings to calculate a set of utilities for each respondent, one utility for each level of each attribute or feature. Features that respondents are unwilling to give up from one preference ranking to the next are given a higher utility. The calculation of utilities is such that the sum of the utilities for a particular combination shows a good correspondence with that combination's position in the individual's original preference rankings. These utilities are calculated across all respondents, for all attributes and for the different levels of each attribute; they show the importance of each level of each attribute to respondents. We can also identify the more important attributes by looking at the range of utilities across the levels of each attribute: a high range implies that the respondent is more sensitive to changes in the level of that attribute.

We can best understand conjoint analysis with the help of an example.

Example 1
Suppose we have to design a public transport system. The company aims to provide a service, and we wish to test the relative desirability of three attributes:
• Fare (three levels: Rs 10, Rs 15, Rs 20)
• Frequency of service (10 minutes, 15 minutes, 20 minutes)
• Add-on features (AC & music, AC only, music only, nothing)
A sample of 500 respondents is selected and asked to rank their preferences for all possible combinations of levels. We can present the trade-off information in the form of a table (Table 1). For the frequency and add-on feature attributes, the combinations are shown below along with one respondent's sample rankings:

Frequency     AC   AC & music   Music   Nothing
10 minutes     1        2          3        4
15 minutes     5        6          7        8
20 minutes     9       10         11       12

Interpreting the results:
• Frequency of service has utilities ranging from 1.6 down to .56; the range is therefore equal to 1.04. A high range implies that the respondent is more sensitive to changes in the level of this attribute. Thus, in the above example, the respondent gives a high weightage to frequency of service, followed by AC.
• The offer of music is clearly not very important to this respondent, as he ranks it below AC, and he is not willing to trade off frequency of service for either AC or music.
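The range-based importance comparison described above can be sketched numerically. The utilities below are hypothetical values consistent with the example (frequency levels ranging from 1.6 down to .56, giving the 1.04 range):

```python
# Sketch (hypothetical utilities): an attribute's importance is judged by
# the range of its level utilities. Here "frequency" has the largest range,
# 1.6 - 0.56 = 1.04, so this respondent is most sensitive to it.
utilities = {
    "frequency": {"10 min": 1.6, "15 min": 1.0, "20 min": 0.56},
    "features":  {"AC & music": 0.9, "AC": 0.8, "music": 0.3, "nothing": 0.0},
    "fare":      {"Rs 10": 0.7, "Rs 15": 0.45, "Rs 20": 0.2},
}

def attribute_ranges(utilities):
    return {attr: max(levels.values()) - min(levels.values())
            for attr, levels in utilities.items()}

ranges = attribute_ranges(utilities)
total = sum(ranges.values())
# Normalized relative importance of each attribute, in percent.
importance = {attr: round(100 * r / total, 1) for attr, r in ranges.items()}

print(round(ranges["frequency"], 2))  # 1.04
print(importance)
```

Dividing each range by the sum of all ranges, as done for `importance`, is the usual way relative attribute importances are reported.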

Uses of Conjoint Analysis
• It is used in industrial marketing, where a product can have many combinations of features and not all features are important to all consumers. In industrial marketing the analysis can be done at the individual level, as each individual customer is important.
• In the case of consumer goods, the analysis should be done segment-wise.
At the end of the analysis, the 3-4 most popular combinations would be identified, for which the relative costs and benefits can be worked out.

Problems
It is important that the attributes be selected carefully, since the analysis assumes the attributes are important to consumers. To avoid unnecessarily long questionnaires, a preliminary factor analysis should be run to select only testable attributes, and the number of attributes should be restricted.

The Conjoint Model
Conjoint analysis helps us understand why consumers prefer certain products:

U[product] = U[attribute 1(i)] + U[attribute 2(j)] + ... + U[attribute J(k)]

where
U[product] = the overall utility, or "worth," of the product
U[attribute X(y)] = the "part-worth" of the yth level of the Xth attribute
J = the number of attributes

Like the multi-attribute model, conjoint analysis helps explain why consumers prefer certain products, and it works with a series of additive terms. There is, however, an important difference between conjoint analysis and the multi-attribute model:
• The multi-attribute model is compositional: it builds up an inferred overall attitude as the sum of measured subcomponents.
• The conjoint model is decompositional: it measures overall preference and decomposes this into inferred subcomponents.

Example of the Conjoint Analysis Technique: Packaged Soups
Four attributes with the following levels:
a. Flavor: onion, chicken noodle, country vegetable
b. Calories: 80, 100, 140
c. Salt-free: yes, no
d. Price: $1.19, $1.49
Altogether there are 3 x 3 x 2 x 2 = 36 possible combinations. A consumer could, in theory, rate each of the 36 combinations on a 9-point preference scale. The results for one subject are recorded as a table listing each flavor/calories/salt-free/price combination together with that subject's rating.

We now present some applications of conjoint analysis from the internet.
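The additive model U[product] = sum of part-worths can be sketched directly. The part-worth values below follow the structure of the soup example but are made up for illustration, not taken from the study:

```python
# Sketch of the additive conjoint model: a product profile's overall worth
# is the sum of the part-worths of its attribute levels. Values are
# hypothetical, chosen only to match the soup example's structure.
part_worths = {
    "flavor":    {"vegetable": 1.0, "onion": 0.4, "chicken": 0.0},
    "calories":  {"80": 0.9, "100": 0.6, "140": 0.0},
    "salt_free": {"yes": 0.3, "no": 0.0},
    "price":     {"$1.19": 0.5, "$1.49": 0.0},
}

def utility(product):
    """Overall worth of a product profile under the additive model."""
    return sum(part_worths[attr][level] for attr, level in product.items())

best = {"flavor": "vegetable", "calories": "80",
        "salt_free": "yes", "price": "$1.19"}
worst = {"flavor": "chicken", "calories": "140",
         "salt_free": "no", "price": "$1.49"}

print(round(utility(best), 2))  # 2.7
print(utility(worst))           # 0.0
```

Scoring all 36 combinations this way and sorting by utility is how the "most desirable combination" is identified once part-worths have been estimated.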

Conjoint Example: Packaged Soups, Results for One Subject
The part-worths calculated for one subject (normalized relative values) give the relative importance of each attribute, approximately as follows:

Attribute   Levels                       Relative importance
Flavor      vegetable, onion, chicken    43%
Calories    80, 100, 140                 26%
Salt-free   yes, no                      23%
Price       $1.19, $1.49                 8%

Orthogonal Arrays
In actual applications, it becomes impossible to present all possible combinations of attributes to a consumer. Consider carpet cleaners:
• 3 package designs
• 3 brand names
• 3 price points
• Good Housekeeping Seal: yes/no
• Money-back guarantee: yes/no
There are 3 x 3 x 3 x 2 x 2 = 108 possible combinations.

Q) What is the fewest number of combinations we can get by with?
A) The sum of the degrees of freedom for the main effects of each attribute:
(3-1) + (3-1) + (3-1) + (2-1) + (2-1) = 8
We need to find an "orthogonal array" of 8 profiles, which allows us to estimate all additive main effects in the conjoint model. Typically, we at least double the minimum number for greater stability.

Issues in Designing a Conjoint Study
a. Selecting attributes
b. Selecting attribute levels
c. Collecting preference data
d. Typical sample sizes
e. Computerized (adaptive) approaches

Airline Example: Stages
a. Develop a relevant set of attributes and select appropriate levels
b. Use a fractional factorial design to create an orthogonal array of stimuli
c. Rate or rank the stimuli
d. Estimate part-worths for attribute levels
e. Estimate the relative importance of attributes
f. Interpretation

Airline Example: Discussion
a. Interpretation of results
b. Conjoint as an aid to decision making
c. Applications to market segmentation
d. Applications to product development

Desirable problem situations for conjoint analysis
a. The product is realistically decomposable
b. The purchase is a reasoned, high-stake decision
c. Product/service alternatives can be realistically described
d. All combinations presented to respondents are reasonable
e. New product alternatives can be synthesized from basic attributes

Example: Conjoint analysis for a faculty chair candidate. The attributes used were:
a. Research orientation (star, active, inactive)
b. Teaching orientation (star, good, average, below average)
c. Area of specialization (quantitative, consumer behavior, strategy, management, international)
d. Current position (chair, full, associate)
e. Role in department (work with junior faculty, work with faculty at other schools, work with the business community)
• Discuss part-worths and relative importances for eight faculty members
• Discuss the use of non-metric data and monotone transformation of faculty rank orders, which optimized the fit of the conjoint model

Sawtooth Software Research Paper Series
Understanding Conjoint Analysis in 15 Minutes
Joseph Curry, Sawtooth Software, Inc.
530 W. Fir St., Sequim, WA 98382, (360) 681-2300, www.sawtoothsoftware.com
(Originally published in Quirk's Marketing Research Review. © Copyright 1996, Sawtooth Software, Inc. / Sawtooth Technologies, Inc.)
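The two counts in the carpet-cleaner example (108 possible profiles and a minimum of 8) follow directly from the attribute levels, as this small sketch confirms:

```python
from math import prod

# Carpet-cleaner attributes from the example: number of levels per attribute.
levels = {"package_design": 3, "brand_name": 3, "price": 3,
          "gh_seal": 2, "money_back": 2}

# Full factorial: every possible profile.
full_factorial = prod(levels.values())

# Fewest profiles we can get by with: the degrees of freedom for the
# main effect of each attribute, (levels - 1), summed over attributes.
min_profiles = sum(n - 1 for n in levels.values())

print(full_factorial)  # 3*3*3*2*2 = 108
print(min_profiles)    # (3-1)+(3-1)+(3-1)+(2-1)+(2-1) = 8
```

In practice, as the text notes, one would use at least double the minimum number of profiles for greater stability.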

let’s figure out a set of values for driving distance and a second set for ball life for Buyer 1so that when we add these values together for each ball they reproduce Buyer 1’s rank orders. Figure 3 shows one possible scheme. Next. If you understand this. Suppose we want to market a new golf ball. Conjoint analysis became popular because it was a far less expensive and more flexible way to address these issues thanconcept testing. Figure 1 Rank Average Driving Distance Rank Average Ball Life 1 275 yards 1 54 holes 2 250 yards 2 36 holes 3 225 yards 3 18 holes This type of information doesn’t tell us anything that we didn’t already know about which ball toproduce. Buyer 1 tends to trade-off ball life for distance. We know from experience and from talking withgolfers that there are three important product features: ! Average Driving Distance ! Average Ball Life ! Price We further know that there is a range of feasible alternatives for each of these features. for instance: Average Driving Distance Average Ball Life Price 275 yards 54 holes $1. A traditional research project might start by considering the rankings for distance and ball life in Figure 1. A simple example is all that’s required. The knowledge we gain in going from Figure 1 to Figures 2a and 2b is the essence of conjointanalysis. Now consider the same two features taken conjointly. Here’s the basic marketing issue: We’d lose our shirts selling the first ball and the market wouldn’t buy the second. you understand the power behind this technique. But as we can see from their other choices. Figures 2a and 2b show the rankings of the 9 possible products for two buyers assuming price is the same for all combinations. The most viable product is somewhere in between. whereas Buyer 2 makes the oppositetrade-off.556 . Figure 3 Buyer 1 Average Ball Life 54 holes 50 36 holes 25 18 holes 0 275 yards 100 (1) 150 (2) 125 (4) 100 250 yards 60 (3) 110 RESEARCH METHODOLOGY 210 © Copy Right: Rai University 11. 
Figure 2a Buyer 1 Average Ball Life 54 holes 36 holes 18 holes 275 yards 1 2 4 250 yards 3 5 7 Average Driving Distance 225 yards 6 8 9 Figure 2b Buyer 2 Average Ball Life 54 holes 36 holes 18 holes 275 yards 1 3 6 250 yards 2 5 8 Average Driving Distance 225 yards 4 7 9 Both buyers agree on the most and least preferred ball. The basics of conjoint analysis are not hard to understand.25 250 yards 36 holes $1. I’ll attempt to acquaint you withthese basics in the next 15 minutes so that you can appreciate what conjoint analysis has to offer.Conjoint analysis is a popular marketing research technique that marketers use to determine what features a new product should have and how it should be priced.75assuming that it costs less to produce a ball that travels a shorter distance and has a shorter life. but where? Conjoint analysis lets us find out where. the market’s “ideal” ball would be: Average Driving Distance Average Ball Life Price 275 yards 54 holes $1.50 225 yards 18 holes $1.75 Obviously.25 and the “ideal” ball from a cost of manufacturing perspective would be: Average Driving Distance Average Ball Life Price 225 yards 18 holes $1.

Figure 5 275 yards 100 54 holes 50 $1. estimating buyer value systems. so thereis some arbitrariness in the magnitudes of these numbers even though their relationships to eachother are fixed. Figure 4b shows a set of valuesfor price that when added to those ball life reproduce the rankings for Buyer 1 in Figure 4a. These three steps—collecting trade-offs.75 3 6 9 Figure 4b Buyer 1 Average Ball Life 54 holes 50 36 holes 25 18 holes 0 $1. we get the results in Figure 7.75 The values for Buyer 1 in Figure 5 when added together give us an estimate of his preferences.25 20 250 yards 60 36 holes 25 $1.50 5 11. Starting with the values we just derived for ball life.50 2 5 8 Price $1.50 5 225 yards 0 18 holes 0 $1.75 0 Average Driving Distance Average Ball Life Price Let’s see how we would use this information to determine which ball to produce. Suppose we were considering one of two golf balls shown in Figure 6.75 0 Total Utility 105 110 Distance Ball Long-Life Ball We’d expect buyer 1 to prefer the long-life ball over the distance ball since it has the larger total value. Applying these to the two golf balls we’re considering. It’s easy to see how this can be generalized to several different balls and to a representative sample of buyers.25 1 4 7 $1. Next suppose that Figure 4a represents the trade-offs Buyer 1 is willing to make between ball life and price. Figure 6 Distance Ball Long-Life Ball Distance 275 250 Life 18 54 Price $1.25 20 (1) 70 (4) 45 (7) 20 $1.50 5 $1.556 (2) 55 (5) 30 (8) 5 Price $1. and making choice predictions— form the basics of © Copy Right: Rai University RESEARCH METHODOLOGY 211 . Figure 7 Buyer 1 Distance 275 100 250 60 Life 18 0 54 50 Price $1.75 0 (3) 50 (6) 25 (9) 0 We now have in Figure 5 a complete set of values (referred to as “utilities” or “part-worths”) thatcapture Buyer 1’s trade-offs.50 $1. 
Figure 4a Buyer 1 Average Ball Life 54 holes 36 holes 18 holes $1.(5) 85 (6) 60 Average Driving Distance 225 yards 0 (7) 50 (8) 25 (9) 0 Notice that we could have picked many other sets of numbers that would have worked.

conjoint analysis. Although trade-off matrices are useful for explaining conjoint analysis, as in this example, not many researchers use them nowadays. It is easier to collect conjoint data by having respondents rank or rate concept statements, or by using PC-based interviewing software that decides which questions to ask each respondent based on his previous answers. As you may expect, there is more to applying conjoint analysis than is presented here. But if you understand this example, you understand what conjoint analysis is and what it can do for you as a marketer.

Point to Ponder
• Conjoint analysis is a technique that typically handles non-metric independent variables.
• Conjoint analysis allows the researcher to determine the importance of product or service attributes and the levels of features that are most desirable.
• Respondents provide preference data by ranking or rating cards that describe products.
• These data become utility weights of product characteristics by means of optimal scaling and loglinear algorithms.
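The part-worth arithmetic of the golf-ball example (Figures 5 through 7) can be verified in a few lines, using Buyer 1's part-worths exactly as given in the text:

```python
# Buyer 1's part-worths from Figure 5 of the golf-ball example.
utility = {
    "distance": {275: 100, 250: 60, 225: 0},
    "life":     {54: 50, 36: 25, 18: 0},
    "price":    {1.25: 20, 1.50: 5, 1.75: 0},
}

def ball_utility(distance, life, price):
    """Total utility of a ball = sum of its three part-worths."""
    return (utility["distance"][distance]
            + utility["life"][life]
            + utility["price"][price])

distance_ball = ball_utility(275, 18, 1.50)   # 100 + 0 + 5  = 105
long_life_ball = ball_utility(250, 54, 1.75)  # 60 + 50 + 0  = 110
print(distance_ball, long_life_ball)
```

As in Figure 7, the long-life ball has the larger total (110 versus 105), so we would predict Buyer 1 chooses it.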

LESSON 40: DISCRIMINANT ANALYSIS

In the last few lessons we learnt to do multiple regression analysis. There are other multivariate techniques, which measure the association between variables; frequently, however, we as researchers are asked to classify people or objects into one or more groups. The problem faced by the researcher is to find some function with which he can predict, based on certain characteristics or features, the category into which an individual will fall. Discriminant analysis is one such technique.

Example: We may have data on various economic and social variables for the different states of our country. Is it possible to categorize the states as developed or less developed from the common set of observed variables? Or, if the states have already been categorized on a hypothetical basis, how far does our data itself serve to discriminate between these hypothesized categories?

Discriminant function analysis is a technique that has received a lot of theoretical attention from statisticians and somewhat less attention from users. Again, our exposition focuses on the intuitive foundations and applications of the technique, as the statistics is beyond the scope of our course.

What is Discriminant Function Analysis?
Discriminant analysis is used to analyze relationships between a non-metric dependent variable and metric or dichotomous independent variables. It attempts to use the independent variables to distinguish among the groups or categories of the dependent variable. The grouping variable must be nominal, which might also be a reclassification of a continuous variable into groups.

Statistically, the analysis is very similar to multiple regression. In least-squares regression, a linear combination of variables (the regression variate) is built to maximize the regression relationship with a continuous dependent variable. In discriminant analysis, the dependent variable consists of discrete groups, and the computations find the coefficients for the independent variables that maximize the measure of distance between the groups defined by the dependent variable; the aim is to come up with a function that has strong discriminatory power among the groups.

Discriminant analysis works by creating a new variable, called the discriminant function score, which is used to predict to which group a case belongs. The discriminant function is similar to a regression equation in which the independent variables are multiplied by coefficients and summed to produce a score:

Y' = X1 W1 + X2 W2 + X3 W3 + ... + Xn Wn + Constant

This is essentially identical in form to a multiple regression, but the way in which the two techniques compute the coefficients is quite different. Conceptually, we can think of the discriminant function or equation as defining the boundary between groups. Discriminant scores are standardized, so that if the score falls on one side of the boundary (standard score less than zero) the case is predicted to be a member of one group, and if the score falls on the other side (positive standard score) it is predicted to be a member of the other group. Discriminant function scores are computed similarly to factor scores. If the dependent variable defines two groups, one statistically significant discriminant function is required to distinguish the groups; if the dependent variable defines three groups, two statistically significant discriminant functions are required to distinguish among the three groups. The result of discriminant analysis is thus an equation which allows us to predict which group a new observation will belong to, and the usefulness of a discriminant model is based upon its accuracy rate, or ability to predict the known group memberships in the categories of the dependent variable.

What Does it Do?
DFA can be used in both an exploratory and a predictive mode. In particular, it can:
• Determine whether there are differences among the average scores of a series of variables for two or more groups (note the similarity to MANOVA)
• Determine which variables account for these differences
• Be used as a statistically based classification procedure
• Help us understand the dimensions of discrimination and how groups are discriminated
Note that these proceed from a hypothesis-testing mode to an exploration mode. Many other multivariate techniques use discriminant functions as part of the technique. Let's look at each one.

Differences Among Average Scores
At first glance, this appears to be what MANOVA does, and the book makes a big thing of the similarities, but in reality the two techniques are quite different. In MANOVA, we have multiple dependent variables, combined into a variate, and we examine whether those variates differ among discrete classes of one or more independent variables. In DFA, the roles are reversed in terms of whether a variable is considered dependent or independent, and the way in which the two techniques compute the functions is quite different. Although logit regression does somewhat the same thing when you have a binary (two-group) variable, it too is really quite different.
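The scoring-and-boundary logic just described can be sketched as follows. The weights, constant, and case values below are hypothetical, not estimates from any data; only the mechanics (weighted sum plus constant, then classify by the sign of the score) follow the text.

```python
# Sketch of applying a fitted two-group discriminant function:
# Y' = X1*W1 + X2*W2 + ... + Xn*Wn + Constant
# All weights and the constant here are hypothetical placeholders.
weights = [0.8, -0.5, 1.2]
constant = -0.3

def discriminant_score(x):
    """Weighted sum of the predictors plus the constant."""
    return sum(xi * wi for xi, wi in zip(x, weights)) + constant

def classify(x):
    # Scores below zero fall on one side of the boundary (group 1),
    # positive scores on the other side (group 2).
    return "group 1" if discriminant_score(x) < 0 else "group 2"

case = [0.2, 1.0, 0.1]
print(discriminant_score(case), classify(case))
```

In a real analysis the weights would be estimated so as to maximize the separation between the groups; here they are simply assumed so the classification rule is visible.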

Which Variables Account for These Differences?
In this case, we are looking at the loadings of the variables on the discriminant function, i.e. we want to develop a function that will maximize the distance among the groups. Just like all the other techniques that deal with coefficients, we will have the coefficients in both a raw and a standardized form. If you have lots of coefficients over 1.0, there may be a problem with multicollinearity, i.e. interaction among the independent variables. Can anyone tell me how you could get the standardized form if the program didn't do it for you?

Statistically Based Classification
The form of the discriminant function looks so similar to a regression that it is hard to understand how it differs. The method, and what it tries to accomplish, is what differs: the outcome of applying the function is a continuous score, but the outcome of the analysis is a predicted group membership. One of the things you can do with DFA is to apply the classification phase and use chi-square analysis to see how well the function separates the groups, as a further test of its significance. A "good" function should be able to do a good job in this phase.

Understand the Dimensions
DFA will give you multiple functions (one less than the number of groups), and you can use these functions to look at different aspects of the data, in a sense an ordination. Each successive function will explain less variance than the previous, and be less significant.

Assumptions
The two primary assumptions of DFA are multivariate normality and equality of covariances. There are some suggestions that logistic regression is less sensitive to departures from normality, but you can find just as many papers saying that DFA is really quite a robust technique.

An Example
We look at the classification problem faced by an ecologist attempting to classify the different types and distributions of a family of land snails. In Colorado, the family has 3 genera, with 2 species each. All the early work on their descriptions was based on shell morphology, but later it was shown that this really was not a good basis for the differences, and that the reproductive anatomy was the key. What used to be more species were collapsed down to the 6 once that was discovered. There were hundreds of collections of these things in museums, mostly just the shells; more recent collections were stored in alcohol, with the entire body present. My job was to do a thorough description of these species and to write a monograph on them. That meant dissecting hundreds of specimens, and I also took 9 measurements of the shell. My question was: are these 6 species differentiated on the basis of shell morphology, and which variables do that differentiation? As an aside, when you do work like that there's loads of time for the mind to wander, so I began imagining that I could really see differences in the shell morphology; I started testing myself, and found I was a little better than random.

So what was the DFA? I used the 9 shell measures to develop a function that I could apply to the species, and I used the functions to go back and put a maximum-likelihood classification onto the other specimens in the CU museum. I also used MANOVA to test for differences among the 6 species based on these characteristics. I can't find a copy of the thesis, so I don't recall the details of the outcome, but basically I found that there were some differences, suggesting that the species were indeed differentiated. That shows you the relationship between the two techniques: one is almost the inverse of the other.

Interpreting Output
DFA gives you a lot of output, and what you get will depend upon which options you requested. Before the functions are computed, collinearity is assessed: if you enter all variables, those with a very low tolerance (extreme multicollinearity) will be excluded. Like regression, you can use a stepwise procedure to come up with a "least number of variables" analysis; if that is your goal, it is the preferred technique.

The first output you get is an assessment of the statistical significance of the functions, using eigenvalues. The first function explains the largest share of the differences, and then the successive functions are computed. For each function a Wilks' Lambda is computed, which expresses the difference among the group means; this is an expression of whether the differences among the group means are significant. There is also a canonical correlation, which is the correlation between the function and the original variables: a high canonical correlation indicates that a large amount of the variance in those variables is expressed by the function.

Following the significance tests are the standardized coefficients, and once again those can be interpreted as weightings. SPSS arranges these by function, so that you can see the variables that load highly on the first function, the second, and so on. The next set of output is the correlations between the discriminating variables and the functions, the cross-loadings in a sense. Next you get the function values for the group means, called the centroids; those are primarily helpful if you need to plot the relative positions of the groups.

Finally, you get into the classification phase. The outcome is compared against the group distributions, and each case can then be placed into the most likely group, the second most likely, and so on. The classification summary will tell you the number of cases in each group as predicted by the analysis and as found in your data. The classification is statistical because it gives you a likelihood of group membership. You can save the discriminant scores as new variables, and you can save the group membership probabilities as new variables. I've found that the most straightforward thing to do is to request a classification summary only, and then to save the probabilities of individual group memberships as new variables, so that you can refer to them if necessary.
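The classification-summary step can be sketched as a small cross-tabulation of actual versus predicted groups, from which the hit (accuracy) rate follows. The group labels and cases below are toy data invented for illustration, not output from any real analysis:

```python
# Toy data: actual group membership vs. membership predicted by a
# discriminant function, for eight cases.
actual    = ["A", "A", "B", "B", "A", "B", "B", "A"]
predicted = ["A", "B", "B", "B", "A", "B", "A", "A"]

# Cross-tabulate actual vs. predicted groups (a small confusion matrix).
table = {}
for a, p in zip(actual, predicted):
    table[(a, p)] = table.get((a, p), 0) + 1

# Cases on the diagonal (actual == predicted) are correct classifications.
hits = sum(n for (a, p), n in table.items() if a == p)
hit_rate = hits / len(actual)
print(table)
print(hit_rate)  # 6 of 8 cases correctly classified
```

The usefulness of a discriminant model is judged by exactly this kind of hit rate against the known group memberships.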

When to Use It, and When to Use Logistic Regression
I just want to talk very briefly about logistic regression. Logit analysis is the regression of a series of independent variables on a binary (two-group) dependent variable. The outcome is not a prediction of Y but a probability of belonging to a particular state of Y: the probability of an event (i.e., the probability of "membership") is expressed by a logistic equation, with 1 (never actually reached) being a perfect probability. It is called "logistic regression" because it is based on a logistic curve. The slope of this curve is low at the upper and lower extremes of the independent variable(s) and highest in the middle, i.e. it is most sensitive there. Logit regression has not had that much usage in ecological and biological work because of the paucity of situations involving binary variables; it suits data where most cases load toward the ends and away from the middle, i.e. where most cases would be classified as 1 or 0 (or probabilities approaching those). If you have more than 2 groups, DFA is the only choice. So it's really up to you.

Some of you may have read about classification trees; classification trees and regression trees are quite similar, and recently a technique has come up that uses logit regression in this way: regression tree analysis. It is a hierarchical dichotomous technique based on intensive successive computations. Essentially, it tests all possible dichotomous divisions of a data set, looking for a binary variable (or a binary division of a continuous variable) that produces the "best possible" dichotomous division in the data, i.e. the most homogeneous subsets. It then takes those subsets and redivides them, and so on. It is computationally very intensive, because it can look not only at each variable but at all possible splits of that variable (presuming it's continuous). There are forms that use logit regression and forms that use discriminant functions. This is still fairly rare, because few people have the tools and skills to work with it, but basically what it does is to help understand structure in complex data sets. One of the beauties of this approach is that it potentially solves the problem of secondary variation that I presented a few days ago: in most data sets, the secondary variation is different along different parts of the primary gradient, and a tree analysis will treat each of those parts separately, so that you don't lose interesting secondary structure in one part of the gradient just because it was subsumed elsewhere. A key limitation is that you are predicting only one variable, which makes it not very useful for community classification. Most of the applications seem to be in data sets where you have a large number of potentially predictive variables and you're trying to make sense of them; there are lots of uses in remote sensing and landscape ecology.

A large part of the material above has been taken from Dr. Marilyn D. Walker's web site on multivariate techniques. Further theory and applications from the internet follow.

© Copyright StatSoft, Inc., 1984-2003
Discriminant Function Analysis
• General Purpose
• Computational Approach
• Stepwise Discriminant Analysis
• Interpreting a Two-Group Discriminant Function
• Discriminant Functions for Multiple Groups
• Assumptions
• Classification

General Purpose
Discriminant function analysis is used to determine which variables discriminate between two or more naturally occurring groups. For example, an educational researcher may want to investigate which variables discriminate between high-school graduates who decide (1) to go to college, (2) to attend a trade or professional school, or (3) to seek no further training or education. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the three categories. Discriminant analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice.

A medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3). A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.

Computational Approach
Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA). Let us consider a simple example. Suppose we measure height in a random sample of 50 males and 50 females. Females are, on the average, not as tall as males, and this difference will be reflected in the difference in means for the variable Height. Therefore, variable Height allows us to discriminate between males and females with a better than chance probability: if a person is tall, then he is likely to be a male; if a person is short, then she is likely to be a female. The logic of discriminant analysis is first to determine the variable that best separates the group means,
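The logistic curve that gives logit regression its name can be written down directly. This is a generic sketch of the curve itself, not output from any fitted model; note that the probability is exactly 0.5 at the midpoint of the linear predictor and flattens toward the extremes, which is why the technique is most sensitive in the middle.

```python
import math

def logistic(z):
    """Probability of membership as a function of the linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

print(logistic(0.0))   # 0.5 at the midpoint, where the slope is steepest
print(logistic(4.0))   # approaches (but never reaches) 1
print(logistic(-4.0))  # approaches (but never reaches) 0
```

The bounds illustrate the point in the text that a probability of 1 is never actually reached.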

and then to use that variable to predict group membership (e. of new cases). the final significance test of whether or not a variable discriminates between groups is the F test.556 .g. Another way to determine which variables “mark” or define a particular discriminant function is to look at the factor structure. As in MANOVA. To learn more about how one can test for the statistical significance of differences between means in different groups you may want to read the Overview section to ANOVA/MANOVA. one would first test the different functions for statistical significance. If one wants to assign substantive “meaningful” labels to the discriminant functions (akin to the interpretation of factors in factor analysis). which arise from analyses with more than two groups and more than one variable.. as evident in observed mean differences. Factor structure matrix. when interpreting multiple discriminant functions. the principal reasoning still applies. Only those found to be statistically significant should be used for interpretation.” For example. if statistically significant. it should be clear that. In that case. One can test the number of roots that add significantly to the discrimination between group. In the case of a single variable. Analysis of Variance. unless the n is fairly large (e. In the vertical direction (Root 2). Usually. if the means for a variable are significantly different in different groups. 1975) has shown that the discriminant function coefficients and the structure coefficients are about equally unstable. namely.RESEARCH METHODOLOGY We can generalize this reasoning to groups and variables that are less “trivial. and (2) they allow for the interpretation of factors (discriminant functions) in the manner that is analogous to factor analysis. a slight trend of Versicol points to fall below the center line (0) is apparent. while the structure coefficients denote the simple correlations between the variables and the function(s). 
The factor structure coefficients are the correlations between the variables in the model and the discriminant functions. The reasons given by those authors are that (1) supposedly the structure coefficients are more stable. likewise. We can compare those two matrices via multivariate F tests in order to determined whether or not there are any significant differences (with regard to all variables) between groups. that we are looking for variables that discriminate between groups.g. we 216 © Copy Right: Rai University 11. Thus. use the discriminant function coefficients (weights). and. However. F is essentially computed as the ratio of the between-groups variance in the data over the pooled (average) within-group variance. Specifically. However. Multiple Variables. and only consider the significant functions for further examination. Next. If the between-group variance is significantly larger then there must be significant differences between means. if there are 20 times more cases than there are variables). if one wants to learn what is each variable’s unique contribution to the discriminant function. The most important thing to remember is that the discriminant function coefficients denote the unique (partial) contribution of each variable to the discriminant function(s). we have a matrix of pooled within-group variances and covariances. 1975. one includes several variables in a study in order to see which one(s) contribute to the discrimination between groups. To summarize the discussion so far. If the means for the two groups (those who actually went to college and those who did not) are different. Summary. Some authors have argued that these structure coefficients should be used when interpreting the substantive “meaning” of discriminant functions.. This procedure is identical to multivariate analysis of variance or MANOVA. non-significant functions (roots) should be ignored. In this example. 
We could have measured students’ stated intention to continue on to college one year prior to graduation. one can ask whether or not two or more groups are significantly different from each other with respect to the mean of a particular variable. then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counselors to provide the appropriate guidance to the respective students). we have a matrix of total variances and covariances. the basic idea underlying discriminant function analysis is to determine whether groups differ with regard to the mean of a variable. subsequent Monte Carlo research (Barcikowski & Stevens. suppose we have two groups of high school graduates: Those who choose to attend college after graduation and those who do not. As described in Elementary Concepts and ANOVA / MANOVA. To summarize. then we can say that this variable discriminates between the groups. Stated in this manner. proceed to see which of the variables have significantly different means across the groups. if you are familiar with factor analysis (see Factor Analysis) you may think of these correlations as factor loadings of the variables on each discriminant function. the discriminant function problem can be rephrased as a one-way analysis of variance (ANOVA) problem. even though the computations with multiple variables are more complex. then the structure coefficients should be used (interpreted). Root (function) 1 seems to discriminate mostly between groups Setosa. Significance of discriminant functions. one could first perform the multivariate test. and Virginic and Versicol combined. Huberty.


LESSON 41: CLUSTER ANALYSIS

So far we have learnt how to classify objects into different groups using discriminant analysis. We now turn to another technique which groups objects: cluster analysis. Like discriminant analysis, it divides objects into two or more groups, but the basis on which it does so is different: the number and characteristics of the groups are not known prior to the analysis, and are instead derived from relationships within the data. The result of discriminant analysis is an equation which can predict which class or group a new object will fall into; cluster analysis, by contrast, divides data into classes without a preconceived notion of what those classes are.

Cluster analysis is a technique of grouping individuals or objects into previously unknown groups. The basis of grouping could involve a variety of socio-economic and psychographic characteristics. The concept has extensive applicability in marketing, where we may want to group potential customers into homogeneous segments so that a specific marketing mix can be developed to reach each particular group. One goal of a marketing manager is to identify consumer segments so that the marketing programme can be developed and tailored to each segment, since each segment would have different product needs and may respond differently to advertising. This can be done by clustering consumers — for example, on the basis of the product benefits they seek from a college, or by lifestyles; the result for students might be one group which likes outdoor activities, another that enjoys entertainment, and so on. Alternatively, we might want to cluster brands to determine which brands are regarded as similar. If a test-marketing programme is planned, we could cluster different cities so that different marketing programmes can be tried out in different cities.

Cluster analysis is neither a single technique nor, strictly speaking, a statistical technique. There are many different ways to do the grouping, and some of them use statistical quantities such as sums of squares at various points; but the techniques themselves are not really statistical, as they give you no means of assessing likelihood, and they have their own challenges. Although the computations involve complex high-level statistics, our exposition will focus on obtaining an intuitive understanding of the technique and its applications.

Hierarchical vs. nonhierarchical approaches
Nonhierarchical approaches divide up your data into a set of classes, and each case is assigned to a particular class. Hierarchical approaches instead build a nested structure of clusters, and there are two ways of doing a hierarchical analysis: agglomeratively or divisively. Agglomerative methods begin with each sample representing its own cluster, join the two most similar, and then repeatedly join new clusters together until everything is joined. Divisive methods are similar to what we talked about with regression tree analysis: they start with the whole data set, subdivide it, and continually subdivide the subsets until some predetermined threshold is reached. Generally, the hierarchical methods are more widely used because they give greater insight into the overall structure.
That threshold might be determined by group size or by relationships among group members.

Cluster Analysis: What It Is and What It's Not
Frequently marketers are interested in putting people or objects into groups on the basis of similarities among the objects on a common set of measures. Results of hierarchical methods are usually presented as a dendrogram, a tree diagram in which each case begins as its own branch and branches join, from left to right, at the rescaled distance at which the corresponding clusters are combined.

(Figure: SPSS dendrogram, "Rescaled Distance Cluster Combine", showing the agglomeration of 30 cases; the character drawing does not reproduce in this text.)

Activity: Once you have come up with some classes, what are two techniques that we have learned that you could use to explore the validity of your classification?
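Since the printed dendrogram does not reproduce here, a minimal sketch of the agglomerative procedure may help — assuming SciPy is available; the data and variable names are invented for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical respondents scored on two lifestyle measures
# ("outdoor activity" and "entertainment" preference, 1-10 scales)
X = np.array([
    [8.0, 2.0], [7.5, 2.5], [8.2, 1.8],   # outdoor-oriented
    [2.0, 8.5], [2.5, 8.0], [1.8, 9.0],   # entertainment-oriented
])

# Agglomerative clustering: start with each case as its own cluster
# and repeatedly join the two closest clusters (Ward's method here).
Z = linkage(X, method="ward")

# Cut the tree into two clusters (the "threshold" decision in the text)
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix Z is exactly what a dendrogram plots: each row records which two clusters merged and at what distance.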

SPSS offers several ways to do this: it gives you 8 choices of distance measure if your data are interval or greater, 2 choices for count data, and 7 choices for binary data. Both agglomerative and divisive techniques can be used to produce a dendrogram. The key decisions are whether to look at the individual cases and their similarity — choosing the most similar cases — or at some other means of joining groups of points, and how to measure multivariate distance in the first place.

Distance Measures
In order to do the agglomeration or division, there has to be a means of measuring multivariate distance. The common distance measures are the Euclidean distance, the squared Euclidean distance, the "City-Block" distance (a simple distance that doesn't adjust for geometry, and not usually recommended), and others. You can usually get around this decision quite easily by just using the Euclidean distance: generally, the Euclidean distance in multivariate space will be the preferred method. The choice is not so important at the very first step, where you are simply joining two samples, but it gets very important later on.

Assumptions and Scaling Problems
Cluster analysis has no "assumptions" per se, because you aren't testing any hypotheses, but there are some considerations about scaling, and some "problems" that can occur if your variables have a certain structure and you use techniques that don't consider that. If the variables that you are clustering have different scales, then those with a stronger "structure" and greater overall variance will control the clusters. If you wish to avoid this, you should rescale everything to a common system: z-scores, scaling from 0 to 1, scaling from -1 to +1, and plenty of others. Even when your data are all on the same "scale", you may still have problems; for example, if you are using a technique that bases distance on linear correlation, then you are assuming that linear correlation is a reasonable approximation of distance within your data. We'll talk a lot more about scaling when we get to ordination, but for now we'll just think about it in terms of scaling different variables to a common system.
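The rescaling advice above can be illustrated with a small sketch; the income and age figures are hypothetical, chosen only to show how different the raw scales can be:

```python
import numpy as np

# Variables on very different scales: income (rupees) and age (years).
# Without rescaling, income would dominate any Euclidean distance.
data = np.array([
    [55_000.0, 34.0],
    [72_000.0, 51.0],
    [48_000.0, 29.0],
    [95_000.0, 45.0],
])

# z-scores: each column rescaled to mean 0 and standard deviation 1,
# so both variables contribute comparably to the distance measure
z = (data - data.mean(axis=0)) / data.std(axis=0)
print(z.mean(axis=0), z.std(axis=0))
```

After this transformation, a Euclidean distance between two rows weighs a one-standard-deviation difference in age the same as one in income.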
With species data, there is another common "distance" measure that is often used: the percentage similarity. Percentage similarity is calculated between pairs of samples. It is the sum of the minimum value for each species (i.e., whichever sample has the least of a species, even if that number is zero), multiplied by 2 and divided by the sum of the species abundances for the two samples. One minus the percentage similarity is the percentage dissimilarity. This is a nice number because it varies between 0 and 1, and it is particularly useful if you're working with something like species data, where you may have literally hundreds of variables — too many variables! Species data are really a somewhat special case, but a case that many of us might wish to work with. Most clustering of vegetation samples will start with a matrix of dissimilarities and work directly from that.

I should also say that we will work with cluster analysis on "raw" data here, but often what happens in practice is that you first reduce the data using an ordination technique and then do the clustering on the reduced data, which often gives a more stable solution than working with the original data.

Interpretation and Validation
The interpretation and validation stages of cluster analysis are big problems, because there is no way to really effectively assess the outcome on statistical grounds alone. The basic question to ask of any clustering is: is it useful? Answer that question. When you actually work with the data, you often want to determine a point at which you will cut the tree and work with the clusters — somewhere between each sample being its own cluster and all samples being one giant cluster.
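The percentage-similarity formula just described translates directly into code; this is a sketch with made-up abundance counts, and `percentage_similarity` is simply an illustrative helper name:

```python
# Percentage similarity between two samples of species abundances:
# twice the sum of the per-species minima, divided by the total
# abundance of both samples; dissimilarity is its complement.
def percentage_similarity(a, b):
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 2.0 * shared / (sum(a) + sum(b))

sample1 = [10, 0, 5, 3]   # abundances of four species in sample 1
sample2 = [6, 2, 5, 0]    # abundances of the same species in sample 2
ps = percentage_similarity(sample1, sample2)
pd = 1.0 - ps             # percentage dissimilarity, a distance in [0, 1]
print(ps, pd)
```

Because the result always lies between 0 and 1, the dissimilarities from every pair of samples can be assembled directly into the distance matrix that the clustering starts from.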
(Reading the dendrogram: the further to the right two branches join, the more dissimilar the clusters are.)

A key thing to understand about hierarchical techniques is that each sample has "membership" in multiple groups, because the groups are themselves clustered hierarchically. Two common problems in cluster analysis are chaining and reversals. Chaining is where single samples join a larger cluster one at a time — often what you get is a series of clusters that each add one new member. Reversals are caused when an entity joins another cluster at a higher level of similarity than an earlier merge, making the hierarchical structure go "backwards" in some cases. Generally, these problems can largely be avoided by using a complete linkage method.

The between-groups method also uses sums of squares and is closely related to Ward's method; these two techniques both tend to favor "spherical" clusters with equal variance and sample size, whereas other linkage choices can produce "globular-shaped" clusters that have unequal variances and sample sizes.
Agglomeration Methods
Another decision point comes in how the agglomeration is done — that is, how the distance between two clusters (rather than two individual points) is measured. There are several common techniques:
• Single linkage, or nearest neighbour, joins the two clusters that contain the two most similar individual points.
• Complete linkage, or furthest neighbour, joins the two clusters whose most distant points are the most similar.
• The centroid method joins the two groups with the nearest centroids, the centroid being the multivariate average of the cluster.
• Ward's method is based on minimizing the within-cluster sum of squares; between-groups linkage is closely related to it.
In certain circumstances you might want the strict nested structure these methods impose, but when there is no true hierarchical structure in the data — which is often what we expect in ecology — the hierarchy is not a terrifically useful new structure in itself, and instead of a cluster analysis you may really want an ordination. In practice, I've found complete linkage the most practical and useful method in terms of being able to straightforwardly interpret the output: the clusters tend to be very clean and well defined, and you don't get a lot of chaining or reversals.
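A small sketch (assuming SciPy) of how the linkage choice changes the merge heights — the one-dimensional data are contrived so that single and complete linkage disagree about an outlying point:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Four close points and one straggler (20.0) that tends to
# "chain" onto the big cluster under single linkage.
X = np.array([[1.0], [2.0], [3.0], [4.0], [20.0]])
d = pdist(X)  # condensed Euclidean distance matrix

single = linkage(d, method="single")      # nearest neighbour
complete = linkage(d, method="complete")  # furthest neighbour

# Final merge distance: single linkage joins the outlier at the
# gap distance (16), complete linkage at the full span (19).
print(single[-1, 2], complete[-1, 2])
```

The larger final merge height under complete linkage is exactly why it resists chaining: a cluster only absorbs a new member when even its most distant point is reasonably close.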

Once you have clusters, you can statistically evaluate how well differentiated they are using MANOVA, and you can use DFA to look at which variables are contributing most strongly to the clustering. Using these other techniques to iteratively improve and validate your clustering is recommended, and will go a long way toward a defensible classification.

A lot of the contents and material for this chapter have been taken from 1998 notes of Dr. Marilyn D. Walker; further application and theory are drawn from various web sites.



Points to Ponder
Five steps are basic to the application of most cluster studies:
• Selection of the sample to be clustered (e.g., buyers, medical patients, inventory, products, employees).
• Definition of the variables on which to measure the objects, events, or people (e.g., market segment characteristics, product competition definitions, financial status, political affiliation, symptom classes, productivity attributes).
• Computation of similarities among the entries through correlation, Euclidean distances, and other techniques.
• Selection of mutually exclusive clusters (maximization of within-cluster similarity and between-cluster differences) or hierarchically arranged clusters.
• Cluster comparison and validation.

LESSON 42: INTERPOLATION AND EXTRAPOLATION

Objective
Dear friends, after completion of this lesson you will be able to:
• Find missing values from a given data series with a regular interval between input data values
• Find missing values from a given data series with an irregular interval between input data values
• Predict the data values for future points

Introduction
Dear friends, many times in practical work we come across situations where we have to estimate a value which is not available in the given data series, or where we want to predict a future value. For example, we may be studying the figures of sales for a company from 1980 to 1998 and find that for a particular year, say 1992, the data are not available: the records may either be missing or destroyed. Also, when you conduct a survey, some of the values may be missing or you may be unable to get some of the data, yet these data may be crucial for the results of your study. What do you do in such a situation, when unfortunately you do not know the relationship between the input data and the output data? One way out is to make pure guesswork — and that may be highly deceptive. What is desirable is to obtain the required estimates by analysing the available data, and the techniques of interpolation and extrapolation are extremely helpful in estimating missing values and projecting future values.

It often happens that a particular type of information is being collected at regular intervals, such as the census data. The census of population in India takes place every 10 years, so we have the census figures for 1931, 1941, 1951, 1961, 1971, 1981 and 1991. Now suppose we need the population figure for, say, 1988 or 1999: to talk of conducting a census for such years is impracticable. The only alternative is to make use of the techniques of interpolation and extrapolation.

"Interpolation" thus refers to the insertion of an intermediate value in a series of items, whereas "extrapolation" refers to projecting a value for the future. There is no basic difference between interpolation and extrapolation as far as the methods are concerned; in distinguishing past from future we simply give them two different names. Interpolation supplies us with the missing link, whereas extrapolation helps in forecasting: interpolation relates to the past, whereas extrapolation gives us the forecast.

If we are given two variables X and Y, X being the independent variable and Y the dependent variable, we say that Y is a function of X, expressed as Y = f(X) = Yx, for values X0, X1, X2, ..., Xn of the X variable and Y0, Y1, Y2, ..., Yn of the Y variable. If we want to estimate the value of Y for any value of X between the limits X0 and Xn, the technique of interpolation can be used.

The accuracy of interpolation depends upon:
a. Knowledge of the possible fluctuations of the figures, obtained by a general inspection of the fluctuations at the dates for which they are given.
b. Knowledge of the course of events with which the figures are connected.
If by looking at the data we find that the fluctuations in the series are regular, we can expect quite accurate estimates.

Despite the great significance of interpolation and extrapolation, it should be noted that they give us only the most likely estimates under certain assumptions; they are not substitutes for actual values, and the reader should not form the impression that the figures obtained by this technique will be 100 per cent correct in practice. For example, on the basis of the census figures of 1931 to 1991 for India we might get a population estimate for 2001 of 1000 million, yet the actual population obtained by the census may be different. The variations between actual and estimated values are quite natural.

Activity: Think and write some of the applications of interpolation and extrapolation in real-life situations.
Application of Interpolation and Extrapolation
The tools of interpolation and extrapolation are of great practical utility; their usefulness shall be clear from the following applications.
In many situations during your research work there is a possibility that some of the data points are missing or destroyed, or that you are unable to obtain some of the values when you conduct a survey, and yet those data may be crucial for the results of your study. Using the techniques of interpolation and extrapolation you can estimate these missing values or predict some of the future values.

Methods of Interpolation
Now let us see how we can interpolate the values. The various methods of interpolation can be divided under two heads: the graphic method and the algebraic methods. The following are some of the important and more popular algebraic methods:
1. Binomial expansion method
2. Newton's method
3. Lagrange's method
Each one of these methods is suitable in a certain set of circumstances.

Graphic Method: This is the simplest of all the methods of interpolation. The given data are plotted on a graph paper and the plotted points are joined: we take one variable on the X-axis (say, the year) and the values of the other variable (say, sales) on the Y-axis. When there are only two values we shall get a straight line; otherwise a curve shall be obtained. For the period for which the value is to be interpolated, a perpendicular is drawn from that point on the X-axis to the line (or curve); from the point where it meets the line or curve, another perpendicular is drawn to the Y-axis. The corresponding value of the variable read off the Y-axis is the required value.

Binomial Expansion Method
This method of interpolation is simple to understand and requires very little calculation. However, it is applicable only in those situations where the following two conditions are satisfied:
1. The x-variable advances by equal intervals, e.g. 5, 10, 15, 20, 25 or 8, 16, 24, etc.
2. The value of X for which Y is to be interpolated is one of the class limits of the X series. For example, observe the following data:
X: 5 10 15 20 25
Y: 30 32 ? 38 40
We can determine the value of Y corresponding to X = 15, but not corresponding to X = 12 or 18. The same is true for extrapolation: considering the assumption above, we can extrapolate the value for X = 30 and not X = 28.

In this method, if we are given n known values, we assume that the nth leading differences are zero, expand the binomial (y - 1)^n and equate it to zero:

(y - 1)^n = y^n - n*y^(n-1) + [n(n-1)/2!]*y^(n-2) - [n(n-1)(n-2)/3!]*y^(n-3) + ... = 0

where n is the number of observations of y. The powers in the expanded formula are used as subscripts. For the different values of n we have the following results:

No. of known values | Equation for determining the unknown value
2 (Δ² = 0)  | Y2 - 2Y1 + Y0 = 0
3 (Δ³ = 0)  | Y3 - 3Y2 + 3Y1 - Y0 = 0
4 (Δ⁴ = 0)  | Y4 - 4Y3 + 6Y2 - 4Y1 + Y0 = 0
5 (Δ⁵ = 0)  | Y5 - 5Y4 + 10Y3 - 10Y2 + 5Y1 - Y0 = 0
6 (Δ⁶ = 0)  | Y6 - 6Y5 + 15Y4 - 20Y3 + 15Y2 - 6Y1 + Y0 = 0
7 (Δ⁷ = 0)  | Y7 - 7Y6 + 21Y5 - 35Y4 + 35Y3 - 21Y2 + 7Y1 - Y0 = 0
8 (Δ⁸ = 0)  | Y8 - 8Y7 + 28Y6 - 56Y5 + 70Y4 - 56Y3 + 28Y2 - 8Y1 + Y0 = 0
9 (Δ⁹ = 0)  | Y9 - 9Y8 + 36Y7 - 84Y6 + 126Y5 - 126Y4 + 84Y3 - 36Y2 + 9Y1 - Y0 = 0

For the coefficients of each term, use Pascal's triangle given below.
The following example shall illustrate the graphic method.

Example: From the Indian census data for 1921 to 1991 (population in crores), determine the population for the year 1986.

Solution: Plotting the census years on the X-axis and the population on the Y-axis, joining the plotted points, and reading off the curve at 1986 gives the interpolated population figure (roughly 77 crores in the original worked example).
Assumptions of Interpolation and Extrapolation
The following assumptions are made while making use of the techniques of interpolation and extrapolation:
1. There are no sudden jumps in the series from one period to another. While interpolating a value, we always presume that there are no sudden ups and downs in the data; in other words, the data depict some sort of continuity. While extrapolating a value the same assumption applies. Thus, in the illustration above, our assumption would be that throughout the period 1941-91 there have been no violent changes in population. This assumption may not be good in practice in many cases, and then our estimate may be faulty.
2. The rate of change of the figures from one period to another is uniform. Thus, in the above illustration, our assumption would be that from 1941 to 1991 the growth rate of population has been uniform. Again, in many cases this assumption may not hold good, and if the increase is not uniform the method is not applicable.

Pascal's triangle (each entry is the sum of the two entries above it, e.g. 3 = 1 + 2 and 6 = 3 + 3):
n
1:  1 1
2:  1 2 1
3:  1 3 3 1
4:  1 4 6 4 1
5:  1 5 10 10 5 1

One missing value. With five known values, for instance, we assume the fifth leading differences to be zero:

(y - 1)^5 = y5 - 5y4 + 10y3 - 10y2 + 5y1 - y0 = 0

Let us solve an example.

Example: Estimate the production for the year 1985 with the help of the following table.
Year:                1960 1965 1970 1975 1980 1985 1990
Production (tonnes): 20   22   26   30   35   ?    43

Solution: Since the known figures are six, the sixth leading differences will be zero, i.e. (y - 1)^6 = 0, with y0 = 20, y1 = 22, y2 = 26, y3 = 30, y4 = 35, y5 = x, y6 = 43:

y6 - 6y5 + 15y4 - 20y3 + 15y2 - 6y1 + y0 = 0
43 - 6x + 15(35) - 20(30) + 15(26) - 6(22) + 20 = 0
43 - 6x + 525 - 600 + 390 - 132 + 20 = 0
246 - 6x = 0, so x = 41

Hence the estimated production for the year 1985 is 41 tonnes.

Two or more missing values. When two values are missing in a series, the binomial expansion gives two equations in the two unknown quantities. The following example shows how to find two missing values.

Example: Estimate the production for the years 1994 and 1996 with the help of the following table.
Year:                     1991 1992 1993 1994 1995 1996 1997
Production ('000 tonnes): 200  220  260  ?    350  ?    430

Solution: Since the known values are five, we assume the fifth leading differences to be zero. With y0 = 200, y1 = 220, y2 = 260, y3 = ?, y4 = 350, y5 = ?, y6 = 430, applying Δ⁵ = 0 starting at y0 and at y1 gives:

y5 - 5y4 + 10y3 - 10y2 + 5y1 - y0 = 0
y6 - 5y5 + 10y4 - 10y3 + 5y2 - y1 = 0

Substituting the known values:

y5 - 5(350) + 10y3 - 10(260) + 5(220) - 200 = 0, i.e. y5 + 10y3 = 3450 ... (i)
430 - 5y5 + 10(350) - 10y3 + 5(260) - 220 = 0, i.e. 5y5 + 10y3 = 5010 ... (ii)

Subtracting (i) from (ii): 4y5 = 1560, or y5 = 390. Substituting the value of y5 in equation (i): 390 + 10y3 = 3450, or 10y3 = 3060.

The same method applies to other series with one missing value. Example: Using an appropriate formula for interpolation, estimate the average number of children born per mother aged 30-34, given the average number of children born per mother for the age groups 15-19, 20-24, 25-29, 35-39 and 40-44. Solution: with five known values we assume Δ⁵ = 0, substitute the known values with Y3 = x, and solve the resulting linear equation for x to obtain the expected average number of children born per mother aged 30-34.
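The binomial-expansion procedure worked through above can be sketched as a short program; `fill_missing` is a hypothetical helper name, and the alternating coefficients come from math.comb:

```python
from math import comb

# Binomial-expansion interpolation: with n known values, assume the
# n-th leading differences vanish, i.e. sum_k (-1)^k C(n,k) y_{n-k} = 0,
# and solve that linear equation for the single missing value (None).
def fill_missing(y):
    n = len(y) - 1
    coeffs = [(-1) ** k * comb(n, k) for k in range(n + 1)]
    missing = y.index(None)
    total = sum(c * y[n - k] for k, c in enumerate(coeffs) if y[n - k] is not None)
    # the missing value's coefficient is coeffs[n - missing]
    return -total / coeffs[n - missing]

# Production example from the text: estimate 1985 (expected answer: 41)
production = [20, 22, 26, 30, 35, None, 43]
print(fill_missing(production))  # → 41.0
```

The same one-line equation reproduces the hand computation 246 - 6x = 0; with two missing values one would instead set up the two equations shown in the worked example.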

Thus y3 = 306, and the missing values corresponding to 1994 and 1996 are 306 and 390 thousand tonnes respectively.

Newton's Advancing Difference Method
This method is applicable in those cases where the independent variable x increases by equal intervals, like 10, 20, 30, 40, etc. However, unlike the binomial expansion method, it is not necessary here that the value of x for which y is to be interpolated is one of the class limits of the x series. The formula for interpolation is:

yx = y0 + x*Δ¹0 + [x(x-1)/2!]*Δ²0 + [x(x-1)(x-2)/3!]*Δ³0 + [x(x-1)(x-2)(x-3)/4!]*Δ⁴0 + ...

where y0 represents the value of y at the origin, yx represents the figure to be interpolated, and the Δ's are the differences: the first differences are indicated by Δ¹, the second differences by Δ², the third differences by Δ³, and so on. The first difference in each column is called the leading difference (Δ¹0, Δ²0, ...).

Table Showing Finite or Advancing Differences
X    Y     First diff. Δ¹     Second diff. Δ²     Third diff. Δ³     Fourth diff. Δ⁴
X0   y0    y1 - y0 = Δ¹0
X1   y1    y2 - y1 = Δ¹1      Δ¹1 - Δ¹0 = Δ²0
X2   y2    y3 - y2 = Δ¹2      Δ¹2 - Δ¹1 = Δ²1     Δ²1 - Δ²0 = Δ³0
X3   y3    y4 - y3 = Δ¹3      Δ¹3 - Δ¹2 = Δ²2     Δ²2 - Δ²1 = Δ³1    Δ³1 - Δ³0 = Δ⁴0
X4   y4
For example, if the given data are:
X: 10 20 30 40 50
Y: 100 120 130 140 150   (y0, y1, y2, y3, y4)

the value of x to use in the formula is obtained as:

x = (value to be interpolated - value at the origin) / (difference between two adjoining values of X)

If we are to compute the value of y for X = 25, the value of x shall be obtained as x = (25 - 10)/(20 - 10) = 15/10 = 1.5. In case we are given years and the values of the y variable, then x = (year of interpolation - year of origin) / (time difference between two adjoining years).

Newton's Method
A number of formulae were given by Newton for different situations. Some of these formulae are:
1. Newton's Advancing Difference Method
2. Newton's Divided Difference Method
When applying the advancing difference method, the differences between the various values of Y are to be calculated; the first differences in each column are called the leading differences.

Example: Given the following pairs of corresponding values of X and Y:
X: 20 25 30 35 40
Y: 73 198 573 1198 1450
Estimate the value of Y for X = 22.

Solution: Applying Newton's advancing difference method, the leading differences are:

Δ¹: 125, 375, 625, 252
Δ²: 250, 250, -373
Δ³: 0, -623
Δ⁴: -623

With x = (22 - 20)/5 = 0.4:

y22 = 73 + (0.4)(125) + [0.4(0.4 - 1)/2!](250) + [0.4(0.4 - 1)(0.4 - 2)/3!](0) + [0.4(0.4 - 1)(0.4 - 2)(0.4 - 3)/4!](-623)
    = 73 + 50 - 30 + 0 + 25.92
    = 118.92

Newton's Divided Difference Method
This method is to be used when the value of the independent variable X advances by unequal intervals. The formula is:

yx = y0 + (x - x0)Δ¹0 + (x - x0)(x - x1)Δ²0 + (x - x0)(x - x1)(x - x2)Δ³0 + ...

where Δ¹0, Δ²0, Δ³0 are the first, second and third leading divided differences respectively.
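The advancing-difference calculation above can be checked with a short sketch; `newton_forward` is a hypothetical helper that builds the difference table and accumulates the series term by term:

```python
from math import factorial

# Newton's advancing-difference interpolation on equally spaced x values;
# x is expressed in units of the common interval from the origin.
def newton_forward(ys, x):
    diffs = [list(ys)]
    while len(diffs[-1]) > 1:                 # build the difference table
        prev = diffs[-1]
        diffs.append([b - a for a, b in zip(prev, prev[1:])])
    result, term = 0.0, 1.0
    for k in range(len(ys)):                  # y0 + x*Δ¹0 + x(x-1)/2!*Δ²0 + ...
        result += term * diffs[k][0] / factorial(k)
        term *= (x - k)
    return result

# Example from the text: Y = 73, 198, 573, 1198, 1450 at X = 20, 25, ..., 40;
# estimate Y at X = 22, i.e. x = (22 - 20)/5 = 0.4 (expected ~118.92)
y22 = newton_forward([73, 198, 573, 1198, 1450], 0.4)
print(round(y22, 2))  # → 118.92
```

Each `diffs[k][0]` is exactly the leading difference Δᵏ0 of the hand-built table, so the code and the worked example agree term by term.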

Steps:
• Prepare a table of divided differences.
• Denote the value to be interpolated by x.
• Apply the above formula.
(In general, each divided difference is the difference of the two entries diagonally above it, divided by the spread of the x values they span.)

Example: The observed values of a function are respectively 168, 120, 72 and 63 at the four positions 3, 7, 9 and 10 of the independent variable. What best estimate can you give for the value of the function at the position 6 of the independent variable?

Solution: Since the independent variable advances by unequal intervals, we use Newton's divided difference method. The table of divided differences is:

x    y     First Δ¹                    Second Δ²                  Third Δ³
3    168   (120-168)/(7-3) = -12
7    120   (72-120)/(9-7) = -24        (-24-(-12))/(9-3) = -2
9    72    (63-72)/(10-9) = -9         (-9-(-24))/(10-7) = 5      (5-(-2))/(10-3) = 1
10   63
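The divided-difference table just shown can be generated programmatically; this sketch (with the hypothetical name `divided_difference`) reproduces the example:

```python
# Newton's divided-difference interpolation for unequally spaced x values.
def divided_difference(xs, ys, x):
    n = len(xs)
    table = list(ys)
    coeffs = [table[0]]                    # leading divided differences
    for order in range(1, n):
        table = [(table[i + 1] - table[i]) / (xs[i + order] - xs[i])
                 for i in range(n - order)]
        coeffs.append(table[0])
    # Evaluate y0 + (x-x0)Δ¹0 + (x-x0)(x-x1)Δ²0 + ...
    result, term = 0.0, 1.0
    for c, x0 in zip(coeffs, xs):
        result += c * term
        term *= (x - x0)
    return result

# Example from the text: f(3)=168, f(7)=120, f(9)=72, f(10)=63;
# best estimate at x = 6 (expected answer: 147)
print(divided_difference([3, 7, 9, 10], [168, 120, 72, 63], 6))  # → 147.0
```

The list `coeffs` comes out as [168, -12, -2, 1] — the leading entries of each column in the hand-built table.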

Interpolating the value of Y for X = 6 by the divided difference method:

y6 = 168 + (6 - 3)(-12) + (6 - 3)(6 - 7)(-2) + (6 - 3)(6 - 7)(6 - 9)(1)
   = 168 - 36 + 6 + 9
   = 147

Lagrange's Method
This method is applicable for any data: whether our series advances by regular or irregular intervals, and whether the value to be interpolated is in the beginning or in the end. The formula is:

yx = y0 * [(x - x1)(x - x2)(x - x3)...(x - xn)] / [(x0 - x1)(x0 - x2)(x0 - x3)...(x0 - xn)]
   + y1 * [(x - x0)(x - x2)(x - x3)...(x - xn)] / [(x1 - x0)(x1 - x2)(x1 - x3)...(x1 - xn)]
   + y2 * [(x - x0)(x - x1)(x - x3)...(x - xn)] / [(x2 - x0)(x2 - x1)(x2 - x3)...(x2 - xn)]
   + ... + yn * [(x - x0)(x - x1)...(x - xn-1)] / [(xn - x0)(xn - x1)...(xn - xn-1)]

where x0, x1, x2, ... etc. are the given values of the x variable and y0, y1, y2, ... etc. are the corresponding values of the y variable.

Example: You are given the following information:
X: 5 6 9 11
Y: 12 13 14 16
Find the value of y when x = 8.

Solution: With x0 = 5, y0 = 12; x1 = 6, y1 = 13; x2 = 9, y2 = 14; x3 = 11, y3 = 16; and x = 8:

yx = 12*[(8-6)(8-9)(8-11)]/[(5-6)(5-9)(5-11)] + 13*[(8-5)(8-9)(8-11)]/[(6-5)(6-9)(6-11)]
   + 14*[(8-5)(8-6)(8-11)]/[(9-5)(9-6)(9-11)] + 16*[(8-5)(8-6)(8-9)]/[(11-5)(11-6)(11-9)]

Extrapolation
Extrapolation refers to estimating a value for a future period. In order to extrapolate a particular value, the various methods discussed above for interpolation can be adopted.

Example: Extrapolate the business in 2004 from the following data:
Year:     1999 2000 2001 2002 2003
Business: 150  235  365  525  780

TUTORIAL
Q1. The annual sales of a company are given below. Estimate the sales for the year 1980.
Year:  1970 1975 1980 1985 1990
Sales: 125  163  ?    238  282
Q2. Find by interpolation the number for 1996 from the following table of index numbers of production of a certain article in India:
Year:         1994 1995 1996 1997 1998
Index number: 100  107  ?    157  212
Q4. The observed values of a function are respectively 168, 120, 72 and 63 at the four positions 3, 7, 9 and 10 of the independent variable. What is the best estimate you can give for the value of the function at the position 6 of the independent variable?
Q5. Estimate the annual premium payable at the age of 28 years from the following data:
Age (years):          20 25 30 35
Annual premium (Rs.): 36 39 43 47
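Lagrange's formula translates almost directly into code; this sketch uses the worked example's data, with `lagrange` as an illustrative helper name:

```python
# Lagrange interpolation: valid for any spacing of the x values.
def lagrange(xs, ys, x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:                      # product over all other points
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

# Example from the text: x = 5, 6, 9, 11 with y = 12, 13, 14, 16; find y at x = 8
print(lagrange([5, 6, 9, 11], [12, 13, 14, 16], 8))
```

Each `weight` is one of the bracketed ratios in the formula above, so the hand substitution and the code compute the same four terms.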
estimate the price for the year using the following data:
Year:  1980 1985 1990 1995 2000
Price: 12   15   20   24   31

Q4. The observed values of a function are respectively 168, 120, 72 and 63 at the four positions 3, 7, 9 and 10 of the independent variable. What is the best estimate you can give for the value of the function at the position 6 of the independent variable?

Q5. Estimate the annual premium payable at the age of 28 years from the following data:
Age (years):          20 25 30 35
Annual premium (Rs.): 36 39 43 47

Extrapolation

Extrapolation refers to estimating a value for a future period. In order to extrapolate a particular value, the various methods discussed above for interpolation can be adopted.

Example: Extrapolate the business in 2004 from the following:
Year:     1999 2000 2001 2002 2003
Business: 150  235  365  525  780

Solution: We can extrapolate the business in 2004 by the binomial expansion method. Since the known values are five, the fifth leading difference will be zero:

(y − 1)⁵ = 0, i.e. Δ⁵ = y5 − 5y4 + 10y3 − 10y2 + 5y1 − y0 = 0

with y0 = 150 (1999), y1 = 235 (2000), y2 = 365 (2001), y3 = 525 (2002), y4 = 780 (2003) and y5 = ? (2004). Hence

y5 − 5(780) + 10(525) − 10(365) + 5(235) − 150 = 0
y5 − 3900 + 5250 − 3650 + 1175 − 150 = 0
y5 − 1275 = 0
y5 = 1275
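The binomial expansion method above amounts to assuming that, with n known values, the nth leading difference vanishes. A minimal sketch (the function name is ours, not the book's):

```python
from math import comb

def extrapolate_next(ys):
    """Extrapolate the next value by assuming the nth finite difference is zero,
    i.e. (y - 1)^n = 0 expanded binomially, as in the example above."""
    n = len(ys)
    # D^n = sum of (-1)^k * C(n,k) * y_{n-k} = 0, solved for the unknown y_n:
    return sum((-1) ** (k + 1) * comb(n, k) * ys[n - k] for k in range(1, n + 1))

# Business for 1999-2003; extrapolate 2004:
print(extrapolate_next([150, 235, 365, 525, 780]))   # 1275
```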

LESSON 43: CASE STUDY

Friends, now we will discuss some real-life case studies and understand how the techniques of research methodology are useful in practice.

Exploratory Case Study No. 1: After Shave Lotion

1. Marketing Brief
Use of after shave lotion (ASL) in India is a relatively new phenomenon. Traditionally, the use of ASL was mainly confined to the upper class, most of whom relied on imported brands, and even today quite a few men use alum as an antiseptic after shaving or use nothing at all. Many imported brands, viz. English Leather, Givenchy and Yardley, are still in use among the upper segment of the ASL market. Since these ASLs are expensive, all cannot afford to buy them. Thus a need for a cheaper and indigenous brand was felt. The seventies saw the introduction of two indigenous brands made in collaboration with foreign companies. These were Old Spice, manufactured in India by Colfax Laboratories Pvt. Ltd. in collaboration with Shulton of New York, and Monarch, manufactured in India by J.K. Helen Curtis in collaboration with Helen Curtis of USA. Old Spice came in a big way to grab a major share of the ASL market in India and continues to be the market leader. The eighties saw the introduction of some more new brands: Savage by Wiltech India Ltd., Park Avenue by J.K. Helen Curtis, and Old Spice Musk by Colfax Laboratories. With rapid changes in lifestyle and values, more and more men are today using this product. The ASL market is thus gradually becoming competitive.

Marketing Issues
It is necessary to analyze the impact of the recent changes mentioned above and gauge the ASL market. To have a closer look at the ASL market, some of the issues which need to be studied are:
1. Market share of various brands in the Indian market
2. Perception of consumers about the domestic vis-a-vis imported brands
3. Characteristics sought after in an ASL, and reasons for consistency or change in usage of a particular brand
4. Purchase behaviour of the consumers and advertising effectiveness
The present study attempts to look at some of these issues through marketing research.

2. Marketing Research Objectives
The objectives of this research study were:
1. To find out consumer awareness about various ASL brands in the market.
2. To study the perception of the consumers about Indian ASL brands vis-a-vis foreign brands.
3. To study the buying behaviour of consumers, especially the choice criteria adopted, the brand usage pattern and the reasons for consistency/change.

3. Research Design
Type of study: Exploratory
3.1 Sources of Data: As secondary data about the after shave lotion market is almost nonexistent, all the information is obtained from primary sources.
3.2 Data Collection Mode: The data collection instrument used for obtaining the desired information is a questionnaire (a copy enclosed).
3.3 Sampling Plan
1. Target Population: Men from middle and upper income groups residing in Calcutta in the age group of 20-50 years.
2. Sampling Unit: Household
3. Sample Size: 150
4. Sampling Method: Purposive sampling

4. Data Analysis and Findings
34% of the sample of 150 respondents said that they did not use any after shave lotion. Hence the following findings are based on an effective response from 101 (64%) respondents. See the graphics in the appendix for the various findings.

Awareness - Top of Mind: 28% of the respondents had Old Spice at their top of mind, whereas Brut had a top-of-mind awareness of 16%; Musk (Jovan) and others had a top-of-mind awareness of 8%.

Second Level: No particular brand was prominent at the second level of awareness or recall. Givenchy and Yardley were mentioned by 16% of the respondents; Patricks and Avon came second in terms of awareness (10%); Savage, Park Avenue and English Leather were mentioned by 12% each, whereas Denim and Brut had 6% (see Exhibit-1).

Present, Previous and Future Brands: On the basis of the present brand being used, Old Spice emerges as the leader: 33% of the respondents said they are using Old Spice at present. Other brands being used were Savage, Brut, Williams, Park Avenue and Yardley. The largest number of respondents (44%) said they used Old Spice on the previous occasion, while 20% mentioned that they used Brut on previous occasions.

Jovan (Musk) and English Leather were the other brands previously used. Regarding the future choice of brand, Old Spice and Park Avenue appeared as the most likely choices (see Exhibit-2).

Reasons for Use of ASL: 50% of the respondents said that the antiseptic property is the predominant reason for using an after shave lotion. Among the other reasons, 30% said they use ASL because of the freshness it gives, 18% use it for its perfume, and 12% of the respondents like the sting of ASL. (See Exhibit-3)

Usage Time: 56% of the respondents use ASL immediately after shaving, 44% of them use ASL after taking a bath, while 33% of them use ASL before going to a party. (See Exhibit-4)

Reasons for Change of Brand/Consistency: Regarding switching of brands, 34% of the respondents changed their brands "just for a change", while 12% changed because the brand they preferred was not easily available. 56% of the respondents continue using the same brand because they have become habituated to it; 22% said that their present brand provides them value for money; and 12% of the respondents were of the opinion that they use a brand because they don't like other brands. (See Exhibit-5)

Preference for Indian vs Imported Brands: 42% of the respondents were found to prefer imported brands to Indian brands. The reasons mentioned for the use of imported brands are better quality, brand image and status symbol. For Indian brands, 50% of the respondents felt that Indian brands were preferable because of their lower price, easy availability was the reason cited by 66% of the respondents, and brand image was given as a reason by 33% of the respondents. (Exhibits-6A and 6B)

Purchase Decision: 52% of the respondents said that they bought ASL themselves, 10% said ASL is bought by their family members, and 38% of them got it as a gift. (Exhibit-7)

Purchase Factors: Brand name was found to be the most important factor influencing the purchase decision. Perfume and type of bottle were considered the next most important factors, while price and antiseptic property of the brand appeared to be less important factors in the brand choice decision. (Exhibit-8)

Conclusions
Among the sample respondents, Old Spice turned out to be the most popular brand in the after shave lotion market. Park Avenue has carved a niche for itself in the upper segment of the market. Most consumers consistently use a particular brand because they get used to it; to inspire a change, manufacturers can stress the unique or exciting benefits a particular brand offers. Surprisingly, at present most of the after shave lotions which are gifted are imported ones. Manufacturers can think of launching brands with attractive packing as gift items, because a sizeable number of sample respondents said that they get an after shave lotion as a gift. Imported brands are still considered to be of superior quality by a sizeable number of consumers; Indian manufacturers must introduce better quality products and use advertising to improve the poor image of their brands.

Appendix: Questionnaire

Dear Respondent,
We are conducting a survey of the after shave lotion market. We would be grateful if you could fill up the following questionnaire in this regard.

1. Do you use an after shave lotion? ( ) Yes ( ) No
   If you do not use an after shave lotion, then go to Question 12.
2. Please name a few after shave lotions you have heard of. (a)……. (b)……. (c)………
3. Which of the following brands have you heard of? (Tick)
   a Park Avenue  b Old Spice  c Savage  d English Leather  e Patricks  f Williams  g Aramis  h Givenchy  i Brut  j Yardley
4. Why do you use an after shave lotion? (Tick)
   a For its antiseptic properties  b As a perfume  c To feel fresh  d Girlfriend loves it  e To get the sting  f Any other reason, please specify.
5. Which after shave lotion are you using at present? ………
6. Can you recall the name of the previous brand of after shave lotion you used? Please mention. ………
7. Can you give reasons for consistency/change in your after shave lotion?
   Consistency: a Habitual  b Value for money  c Don't like others  d Any other, please specify.
   Change: a Like to try other brands  b For a change  c All brands are the same  d Any other, please specify.
8. If you were to select an after shave brand now, which brand would you choose? ………
9. When do you use an after shave lotion?
   a Immediately after shaving  b After a bath  c Anytime of the day  d Before going to a party  e Any other, please mention.
10. Who buys the after shave lotion for you?
   a Self  b Family members  c Normally get it as a gift  d Any other, please specify.
11. Given easy availability of Indian and foreign brands of after shave lotion, which brand do you prefer? ( ) Indian ( ) Imported
   Why? (Tick) a Perfume is better  b Quality is better  c Brand image  d Price is lower  e Status  f Easy availability  g Any other, please specify.
   Here we have mentioned a set of factors that you may consider while buying an after shave lotion. Give your response on a seven-point scale ranging from (1) most important to (7) least important for each of them:
   a Price  b Brand name  c Perfume  d Antiseptic property  e Type of bottle (with/without atomizer)
12. Personal Information:
   Age: ( ) less than 18 years ( ) 18-25 years ( ) 25-35 years ( ) above 35 years
   Family Income: ( ) less than Rs. 36000 p.a. ( ) Rs. 36000 to Rs. 72000 p.a. ( ) above Rs. 72000 p.a.
   Profession: Govt. Service / Private Service / Student / Business / Any Other

Thanks a lot.


Case Study No. 2: Decorative Paints

1. Marketing Brief
Paint is used by two broad sectors, decorative and industrial, the latter covering purposes such as protective, automotive, refinish, coach painting and signboard painting. This case study will examine some issues pertaining to the marketing of decorative paints.

A typical decorative paint is either water based or solvent based. The water based paints are emulsions (acrylic) or distempers (dry or oil-bound); the solvent based paints are known as enamel paints. There are currently three grades of synthetic enamel paints: premium, medium and economy. In 1988 the break-up of consumption of the different grades of decorative paints was as follows:

Enamel: Premium Enamel 44%, Medium Enamel 12%, Economy Enamel 44%
Emulsion: Premium Emulsion 3%, Economy Emulsion 13%
Flat Oil Paints: 5%
Oil Bound/Dry Distemper: 79%

Competitive Structure
The Indian paint market today has 22 large units and 1600 small units; the big and small scale units contribute equal amounts to the total supply. In 1988 the decorative paint market size was estimated to be 128 million litres, valued at Rs. 48.13 million. The average annual growth of the market during the last five years was roughly 5%. In the organised sector the leading paint companies are: Asian Paints, Berger Paints, Garware Paints, ICI, Goodlass Nerolac, Jenson & Nicholson and Shalimar Paints.

Market Share and Brand Names
In 1988 the top five companies' market share in the decorative paint market was as follows:
[Table: volume (million litres) and market share (%) of Asian Paints, Goodlass Nerolac, Berger Paints, Jenson-Nicholson and ICI, with totals; figures as tabulated in the original.]

Exhibit 1 in Appendix-I presents the different brand names of major paint manufacturers in India, and Exhibit 2 shows the major companies' market share in different classes of decorative paints. These two evidently indicate that the decorative paint market in India is a highly competitive business.

While buying any decorative paint, the product is treated as a consumer durable. The paint purchase process is also thought to be a joint decision, in which the end user and a number of intermediaries (i.e. the painter, contractor, dealer/retailer) seem to interact; but the type of role played by these intermediaries is often not clearly known. In addition, consumers are found to consult their friends and their own family members in the choice decision. Paint advertisers also seem to convey a lot of messages about their brands and the specific features available in their offerings; the idea is to create awareness about brands and company names among end users so that the brand falls within the acceptable list in the consumers' minds.

2. Marketing Research Objectives
Given such a scenario of the decorative paint market in India, this study endeavoured to examine the following issues:
i. To determine how consumers decide on the choice of shade, pack size, company name and brand in decorative paints, and especially to study how self/spouse, friends, contractor/painter, dealer/retailer and advertisements influence the decision-making.
ii. To assess the relative importance attached to the various factors, namely durability, washability, finish, instant drying, availability and price, in the purchase of any decorative paint.
iii. To study consumers' awareness of brand and manufacturers' names.

3. Research Design
1. Type: Exploratory study
2. Source of Information: Consumer survey with a structured questionnaire (copy enclosed in Appendix-II)
3. Sampling Decisions
a. Target Respondent: People who have got their house/furniture/some domestic appliance painted during the last year.
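The systematic selection procedure that such a sampling plan relies on can be sketched as follows. This is an illustration under stated assumptions (the frame names and sizes are hypothetical, not the study's actual sampling frame):

```python
import random

def systematic_sample(frame, k):
    """Draw a systematic sample of size k from a sampling frame:
    choose a random start within the first interval, then take every
    (len(frame) // k)-th member of the frame."""
    step = len(frame) // k
    start = random.randrange(step)
    return [frame[start + i * step] for i in range(k)]

# Hypothetical frame of households compiled, e.g., from dealers' sales records:
households = [f"household-{i}" for i in range(1000)]
sample = systematic_sample(households, 100)
print(len(sample))   # 100
```

Dividing the frame area-wise before drawing every k-th member is what gives each locality proportional representation in the sample.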

b. Sampling Procedure: To get a representative and unbiased sample it was decided to choose certain localities in Calcutta. Initially a sampling frame was prepared with the help of the dealers/retailers of these localities from their sales records. The list was divided area-wise and the study decided to draw samples by a systematic procedure, to provide representation to the different localities. People who refused to cooperate or were not available when approached were replaced by other members in the list. To allow recency in the data collected, 50% of the samples were selected by intercepting customers at the shop.
c. Sample Size: One hundred (the number fixed on convenience). Keeping in view that it is an exploratory study, this many respondents were assumed to be adequate.

4. Data Analysis
(a) To analyse the data regarding the possible influence of different members in decision-making, it was initially hypothesised that the following two forms of classification are independent:

Type of person consulted: Self/Spouse, Friend, Contractor/Painter, Dealer/Retailer, Advertisement
Decision regarding: choice of shade, size, brand/company

This was tested by the Chi-square test.

(b) Simultaneously, our intuition suggested that some of the members among the above mentioned people, such as the contractor/painter, would influence the decision more than others. So the study tested whether a majority of the consumers expressed that they consulted some particular type of person in a certain decision. The null hypothesis of this type can be expressed as:

Ho: Pi >= 0.50 vs. H1: Pi < 0.50

where Pi = proportion of consumers who seem to consult a particular type of person in a certain decision, and the suffix 'i' denotes the particular type of person. This hypothesis was tested by the Normality test.

(c) Data regarding consumers' relative importance of the given set of attributes in the choice decision were analysed by Thurstone's Case V model. In Appendix-III we have shown the derivation of the interval scale of the various attributes by Thurstone's model.

5. Findings
1. The table below shows the number of people who consulted a particular type of person (in column 1) in different decisions:

Source Consulted      Shade   Size   Brand/Company
Self/Spouse            96      30     36
Friend                 20       4     20
Contractor/Painter     35      80     84
Dealer/Retailer         8      42     48
Advertisement          15       6     30

Applying the chi-square test on this data, we found that decisions and sources are not independent. Hence, some of the sources are distinctly more operative in certain decisions. From the table itself it is clear that self/spouse play an important role in deciding on the shade of paint bought, whereas the contractor/painter is the major influencer regarding the size and brand of paint purchased. Dealers and advertisements were not found to be a key source of influence in any decision.

2. The Thurstone analysis shows that durability and finish are the most sought-after qualities in a decorative paint; company name and price came somewhere in the middle of the scale. This possibly indicates that the consumers surveyed attach more importance to the quality of paint than to its price.

3. Surprisingly, the data also showed that the majority of the consumers were not aware of the exact brand they had used. They were, however, able to name the manufacturer. This observation would possibly give insight to paint manufacturers on the issue of whether to have brand or corporate advertising.
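The chi-square computation can be sketched as follows. This is an illustration, not the study's original worksheet: the assignment of counts to rows and columns follows the reconstruction of the garbled table above, and the 5% critical value for 8 degrees of freedom is 15.507.

```python
# Observed counts: rows are sources consulted, columns are the decisions
# (shade, size, brand/company), as reconstructed from the table above.
rows = [
    [96, 30, 36],   # Self/Spouse
    [20,  4, 20],   # Friend
    [35, 80, 84],   # Contractor/Painter
    [ 8, 42, 48],   # Dealer/Retailer
    [15,  6, 30],   # Advertisement
]

row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
grand = sum(row_totals)

# Chi-square statistic: sum of (O - E)^2 / E over all cells,
# with expected count E = row total * column total / grand total.
chi2 = sum((obs - rt * ct / grand) ** 2 / (rt * ct / grand)
           for r, rt in zip(rows, row_totals)
           for obs, ct in zip(r, col_totals))

df = (len(rows) - 1) * (len(rows[0]) - 1)    # (5 - 1) * (3 - 1) = 8
print(f"chi-square = {chi2:.1f} on {df} df")
# The statistic far exceeds the 5% critical value of 15.507, so sources
# and decisions are not independent -- matching the study's conclusion.
```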

APPENDIX-I

EXHIBIT 1: Decorative Paints - Brand Names of Major Manufacturers

[Table: for each class of paint (premium, medium and economy synthetic enamels; premium and economy emulsions; flat oil paints; oil-bound and dry distempers), the brand names offered by British Paints, Asian Paints, Jenson & Nicholson, ICI, Shalimar Paints, Goodlass Nerolac and Garware — e.g. Luxol Hi Gloss, Apcolite, Robbialac, Dulux, Superlac, Nerolac and the other brands listed in the original table.]

EXHIBIT 2: Decorative Paints - Major Paint Companies' Market Share (in %)

[Table: market share of Asian Paints, Berger Paints, Goodlass Nerolac, Jenson & Nicholson, ICI and others in each class of paint (premium, medium and economy enamel; premium emulsion matt and silk; economy emulsion; oil bound distemper); figures as tabulated in the original.]

Appendix-II: Questionnaire

1. When did you get your house/furniture/some appliance painted last?
2. Which type of paint did you use?
3. Can you recall the make (brand/company name) of the paints you used?
   Brand Name _________   Company Name _________
4. What were the reasons for choosing this particular paint?
5. Did you seek the advice of the below mentioned persons while deciding on the shade, pack size and make (brand/company) of the paint? State your views by ticking in the respective columns below.

   Source Consulted       Shade   Pack size   Brand/Company
   Self/Spouse            ( )     ( )         ( )
   Friend                 ( )     ( )         ( )
   Contractor/Painter     ( )     ( )         ( )
   Dealer/Retailer        ( )     ( )         ( )
   Advertisement          ( )     ( )         ( )

6. While buying decorative paints, consumers seem to consider factors such as price, instant drying, availability, range of pack size, company name, finish, durability and washability. Here you are asked to judge all possible pairs of such factors and indicate, for any pair of factors, which factor is more important to you in the following table. Put the number 1 in a row whenever the row factor is more important to you than the column factor. For example, suppose you are asked to compare price and durability: if you consider price more important than durability, then put the number 1 in the price row, durability column. Perhaps it will be easier for you to fill this table if you start from the first row factor and compare it with all the column factors. Please note that there is no right or wrong answer here.

   Factor                 (a)  (b)  (c)  (d)  (e)  (f)  (g)  (h)
   (a) Price               x   ( )  ( )  ( )  ( )  ( )  ( )  ( )
   (b) Instant drying     ( )   x   ( )  ( )  ( )  ( )  ( )  ( )
   (c) Availability       ( )  ( )   x   ( )  ( )  ( )  ( )  ( )
   (d) Range of pack size ( )  ( )  ( )   x   ( )  ( )  ( )  ( )
   (e) Company name       ( )  ( )  ( )  ( )   x   ( )  ( )  ( )
   (f) Finish             ( )  ( )  ( )  ( )  ( )   x   ( )  ( )
   (g) Durability         ( )  ( )  ( )  ( )  ( )  ( )   x   ( )
   (h) Washability        ( )  ( )  ( )  ( )  ( )  ( )  ( )   x

Appendix-III

Exhibit-3A: Factor Preference Pattern of Consumers
The table below shows the proportion of consumers (in %) who expressed that the row factor is more important than the column factor.

[Table: 8 x 8 matrix of percentages over the factors (a) price, (b) instant drying, (c) availability, (d) range of pack size, (e) company name, (f) finish, (g) durability and (h) washability; figures as tabulated in the original.]
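Thurstone's Case V scaling of such a paired-comparison proportion matrix can be sketched as follows. This is an illustration with hypothetical proportions for three factors, not the study's data; the procedure (normal deviates, row averages, origin shift) is the one described in this appendix.

```python
from statistics import NormalDist

def thurstone_case_v(props):
    """props[i][j] = proportion saying factor i is more important than factor j.
    Returns interval-scale values with the lowest factor shifted to zero."""
    nd = NormalDist()
    n = len(props)
    # Convert each off-diagonal proportion to a unit-normal deviate (z-score);
    # clamp extreme proportions so the deviate stays finite.
    z = [[nd.inv_cdf(min(max(props[i][j], 0.01), 0.99))
          for j in range(n)] for i in range(n)]
    means = [sum(z[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]
    origin = min(means)
    return [m - origin for m in means]

# Hypothetical proportions for three factors, e.g. price, finish, durability:
p = [
    [0.50, 0.20, 0.10],
    [0.80, 0.50, 0.40],
    [0.90, 0.60, 0.50],
]
print(thurstone_case_v(p))   # interval-scale values; the weakest factor is 0
```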

Exhibit-3B: Applying Thurstone's formula to obtain the interval scale values under the normality assumption, each proportion in Exhibit-3A is converted into a unit-normal deviate.
[Table: 8 x 8 matrix of normal deviates for factors (a) price to (h) washability; figures as tabulated in the original.]
Computing the average of each row and shifting the origin, we get the scale value for each factor.
[Thurstone scale values for factors (a) to (h), as tabulated in the original.]

Case Study No. 3: Portable Generator

1. Marketing Brief
The new economic liberalisation policy of 1985 has led to increased foreign industrial collaborations in India. There has been a spurt in industrial tie-ups and consequently industrial output has gone up. The portable generator is one such industrial item where many new units were set up during 1984-85. The Sriram group collaborated with Honda of Japan and set up a unit with a capacity of 500 portable generators a day. The Birla group, in collaboration with Yamaha of Japan, entered the portable generator market with a like capacity. The Kirloskar group introduced a 1.5 KVA portable generator, Greaves Cotton tied up to produce its brand called 'Lombardini', Enfield India followed with the Gee generator, and so on. It is estimated that there are about 50-60 units operating in the local sector with capacities in the range of 100 a day. Thus by 1986 the total output of the portable generator industry was in the range of 2.5 lakhs a month. This demand was however short-lived, and by 1987 many units had closed down: the Kirloskar group has withdrawn its 1.5 KVA machine, Lombardini has disappeared from the market, and so on (see Exhibit-1 to have an idea of the shift in market share).

Two major competitors, Sriram Honda and Birla Yamaha, are locked in fierce competition in the market, indulging in a price war (see Table 1 for the price changes undertaken by these two companies within a year).

TABLE 1: Price Fluctuation of the Two Leading Companies (Unit: Rs. thousand)

Month:        Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Sriram Honda:  90  89  86  84  83  82  81  80  79  78  77   …
Birla Yamaha:  89  87  85  83  82  81  80  79  78  77  77  76

Market Segmentation: The portable generator has been marketed for domestic and commercial use. Typically, on the basis of user capacity requirements (i.e. wattage demanded), the following two segments seem to form:
• Domestic use: 600 watt
• Commercial use: 1000-1500 watt
As such, till recently portable generators used to be marketed on factors like low noise level, fuel efficiency, reliability of the machine, etc. But now the market requirements have changed. The emerging market segment seems to be the rural market, which requires generators mainly to run the pump sets in the farms. This segment has so far been served by local brands and totally ignored by the two market leaders, but today the market leaders have realised the importance of the rural market. While it is true that these two companies manufacture a better quality generator and are more expensive, it is also true that the features looked for in a portable generating set seem to vary from the urban to the rural market. Hence there is a need to conduct marketing research.

2. Marketing Research Objectives
Broadly speaking, the research objectives are:
1. To compare the utility of a portable generator in the rural and urban markets.
2. To examine the perception of the price of a portable generator by rural vis-a-vis urban consumers.
3. To study how the rural and urban markets attach importance to the below mentioned qualities of a portable generator:
a. Noise level
b. Hours of continuous running
c. Ruggedness
d. Dry weight

3. Research Hypotheses
The issues mentioned above are tested through the formulation of some hypotheses on data collected from the rural and urban markets, to see the difference in the requirement patterns of the different classes of users. Testing such hypotheses would enable the company to identify the optimal feature mix consistent with technological feasibility and consumer preferences. For example, if the research reveals that hours of continuous operation is the most important factor in the rural setting, it would imply that the product should have a large fuel tank, as the efficiency of the machine can be increased only up to a point; this in turn implies that the dry weight would be higher. The different hypotheses and the specific measurements used are explained below.

Hypothesis-1
Ho: The portable generator's utility is equal for both rural and urban based consumers.
H1: The portable generator is perceived as a lower utility item in the urban market than in the rural sector.

Hypothesis-1 was tested using the following question:

Q: Please list your order of preference for the purchase of the following items:
a. Colour television  b. Music system  c. Air conditioner  d. Portable generator  e. VCR  f. Camera  g. Refrigerator

The mean rank obtained by the portable generator in the rural and urban segments was analysed.

Hypothesis-2
Ho: The portable generator is perceived as equally expensive in both rural and urban segments.
H1: The portable generator is perceived as less expensive by the rural consumers.

Hypothesis-3
Ho: Noise level during operation is an equally important factor for both segments.
H1: Noise level is a less important factor for the rural segment.

Hypothesis-4
Ho: Hours of continuous operation is equally important for both segments.
H1: Hours of continuous operation is more important for the rural segment.

Hypothesis-5
Ho: Ruggedness is an equally important factor for both segments.
H1: Ruggedness is more important for the rural segment.

Hypothesis-6
Ho: Dry weight is an equally important factor for both segments.
H1: Dry weight is more important for the urban segment.

The hypotheses on product features (3 to 6) were measured by the question: "Please rank the following features of a portable generator in your order of preference."

4. Sample Size
The study selected an equal sample of fifty consumers each from the urban and rural markets.

5. Test Statistic
All the above mentioned hypotheses were tested by applying the t-test, whose general structure is:

t = (X1 − X2) / S.E.(X1 − X2)

where X1 and X2 are the sample mean scores obtained from the rural and urban sectors respectively, and S.E. stands for the standard error of the difference between the sample means (see the chapter on the t-test for its computational formula). Depending on the structure of the alternative hypothesis, the test chose the appropriate acceptance/rejection rule for the null hypothesis.
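The two-sample t statistic of this form can be sketched with a pooled standard error as follows. The rating data below are hypothetical, invented for illustration only, not the study's actual responses.

```python
from statistics import mean, variance

def two_sample_t(a, b):
    """Two-sample t statistic t = (mean(a) - mean(b)) / S.E. of the difference,
    using the pooled-variance standard error (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se

# Hypothetical 5-point agreement scores on "the generator is expensive":
rural = [2, 1, 2, 3, 2, 1, 2, 2]
urban = [4, 5, 4, 3, 4, 5, 4, 4]
t = two_sample_t(rural, urban)
print(round(t, 2))   # a large negative t: rural respondents agree far less
```

With a one-sided alternative such as H1 of Hypothesis-2, the null hypothesis is rejected when t falls below the lower critical value of the t distribution with na + nb − 2 degrees of freedom.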
Hypothesis-2 was tested on the basis of the response to the following question:

Q: The portable generator is expensive for you to afford.
( ) Strongly disagree ( ) Disagree ( ) Neither agree nor disagree ( ) Agree ( ) Strongly agree

The hypotheses on product features were analysed from the respondents' rankings of the following features:
a. Noise level during operation  b. Hours of continuous operation  c. Ruggedness of the machine  d. Dry weight of the machine

Data analysis by the t-test on the sample responses showed the following results.

6. Findings
Hypothesis-1: The data indicated that the portable generator had a greater utility for the rural segment than for the urban segment.
Hypothesis-2: The rural segment perceives the portable generator to be less expensive compared to the urban segment, maybe in view of their pressing requirement.
Hypothesis-3: The test shows that noise level is a less important factor for the rural segment.
Hypothesis-4: Hours of continuous operation is a very important factor for the rural segment. This is understandable, since a farmer has to procure fuel from a far distance and cannot afford to do so very often. An implication of this result is that the rural segment would prefer higher tank capacities.
Hypothesis-5: Ruggedness of the machine is a feature which the farmers look for more, as the machine would be left in the open field.
Hypothesis-6: Dry weight is a more important consideration of the urban segment. A farmer does not require frequent mobility of a generator; on the contrary, urban consumers perhaps put it to multiple uses and hence look for lighter weight.

7. Recommendation
The research results reveal some important lacunae in the existing marketing strategy. While the utility for the rural sector is maximum, this very segment has been somewhat ignored. Further, the rural sector does not perceive the price of the portable generator to be expensive. Also, a review is needed with regard to the different feature mixes of a portable generator, and accordingly the portable generator may be positioned in the rural and urban markets.

Case Study No. 4: Typewriter

1. Marketing Brief

Case Study No. 4: Typewriter

1. Marketing Brief
A typewriter machine is an indispensable item in any organisation and is also possessed by many professionals. In India the number of producers of manual typewriters has remained four for many years, and the portable machine is manufactured by Remington Rand only. The core products of most of the companies are similar; as such, the unique selling proposition of all the companies is invariably after-sales service and low price.
Moreover, in recent times the electronic typewriter has invaded the market in a big way. In the beginning the electronic typewriter was positioned as a status symbol, but gradually it has been projected as having superior features to any manual typewriter machine. So there is a growing need to assess the impressions of purchase managers and users about electronic typewriters.
The typewriter is a typical product where the people involved in the choice decision (of which model or make of typewriter to buy) are often different from the users of the machine. In this instance two categories of personnel, typists and purchase managers, are usually involved in the decision-making process. These two sets of people have different preferences about the various existing brands of typewriters, and their roles in the choice decision vary; users, in particular, develop perceptions based on their experience or otherwise. Separate responses from these two groups of people in each organisation were therefore obtained. Also, there is a prevailing notion that the buying behaviour for typewriters in government (public) and private sector companies is different, so it is desirable to examine the possible behavioural differences in public vis-a-vis private sector companies. Particularly in such a situation, any marketer would need to assess how the decision-makers and the users judge its offering. Given this background, the marketing research objectives of this study are stated below.

2. Marketing Research Objectives
1. To study the typewriter market in terms of competitive structure and demand trends.
2. To analyse the perceived strengths and weaknesses of the companies operating in the typewriter business in the eyes of the manufacturers.
3. To examine the buying behaviour regarding the purchase of manual typewriters in the public vis-a-vis the private sector.
4. To assess the perceptions of the user-organisations (decision-makers) and the typists (users) about existing brands of manual typewriters.
5. To ascertain decision-makers' and users' impressions about the electronic typewriter.

3. Information Required
1. Companies operating in the typewriter business, their production levels during 1977-82 and demand trends.
2. Each manufacturer's perception of the strengths and weaknesses of its competitors.
3. Buying behaviour and brand perceptions of the user organisations. (See the enclosed questionnaires for the detailed information sought.)

4. Source of Information
A. Secondary: Scanning of the annual reports of typewriter companies and discussions with marketing personnel of the typewriter companies located at Calcutta.
B. Primary: (i) Consumer surveys with the help of two questionnaires, one meant for typists and the other for purchase/administrative managers. (ii) Information about the typewriter industry obtained from the manufacturers through in-depth interviews of key marketing personnel.

5. Research Design
This study used an exploratory design to analyse the market size and competitive scenario, and a descriptive design to examine buying behaviour and opinions about the electronic typewriter.

6. Data Collection
Location of the study: Calcutta.
(i) Schedule of information collection: Two sets of questionnaires were developed to secure data on the lines suggested under the section titled 'Information Required' (copy enclosed in Annexure-1). In each organisation the study asked:
(a) Purchase/administrative managers to state the degree of importance attached to price, after-sales service, manufacturer's reputation, guarantee, discount offered, terms of payment, typist's opinion and past performance (see enclosed Questionnaire-1 in the appendix).
(b) Each purchase/administrative manager to state the merits and demerits of the electronic typewriter.
(c) Each purchase/administrative manager to indicate the possibility of switching over to the electronic typewriter.
(d) Five typists to judge the different brands of typewriters they have used with regard to clarity of print, lightness of touch, availability of after-sales service, speed and durability.
(e) Similarly, these typists to state their impressions about the merits and demerits of the electronic typewriter.
(ii) Sampling decisions for the consumer behaviour survey consisted of the following elements:
1. Population: All public and private sector organisations, including educational institutions, located in Calcutta.
2. Sampling unit: Five typists and one purchase/administrative manager in each of the selected organisations.

3. Sampling procedure:
(a) The study had a priori decided to include an equal representation of public and private sector organisations in the sample. The organisations were selected on a random basis after preparing a master list of the population by pooling the customer lists of the manufacturing companies. The exact number chosen was fifty of each type, decided as per the sample size determination rule (see Annexure-II for details).
(b) In each organisation five typists were selected by following a systematic sampling procedure on the master list of secretarial staff maintained by each organisation.

7. Data Analysis
1. The research objectives regarding the assessment of market size, competitive structure, and strengths and weaknesses of the different companies were analysed from the data gathered through in-depth interviews with the marketing personnel of the companies.
2. As per the convention of a descriptive study, it was essential to formulate a few hypotheses pertaining to:
a. Purchase behaviour of organisations,
b. Users' perceptions about the important features in a manual typewriter,
c. Users' opinions on the superiority of model(s) on specific features, and
d. The possible association between the typewriter model on which typists learnt and the models which are preferred now.
3. As stated in the marketing research objectives, the study tried to assess how purchase managers and users look at an electronic typewriter. The opinions of the two sets of respondents, purchase managers and users, were treated separately to examine their contrasting views.
4. The data obtained through open-ended questions about the merits and demerits of the electronic typewriter were subjectively analysed. Likewise, opinions expressed on the given list of merits/demerits of the electronic typewriter were summarised by pooling the data.

FINDINGS
1. Market Size and Product Types
The market size (in units) for the three types of typewriters produced in India is approximately as follows:
Portable : 7,000
Standard manual : 1,00,000
Electronic : 18,000
a. The portable typewriter is manufactured by Remington Rand only, and it comes in three languages: English, Hindi and Gujarati. It is a light machine but suffers from low durability and uncertain servicing facilities. The major users are professionals, journalists, students/research scholars and small business owners. Interestingly, the government does not buy this type of typewriter machine. Smuggled portable typewriters also compete with the indigenous variety.
b. Standard manual typewriters are widely used in offices. These are available in different Indian languages and in various ranges of carriage sizes. Four major companies supply this type of machine under different brand names: Remington Rand of India Ltd. (Remington), Godrej & Boyce Pvt. Ltd. (Godrej), Rayala Corporation Pvt. Ltd. (Halda) and Facit Asia Ltd. (Facit). All available models do not perform equally well with regard to lightness of touch, which is the most sought-after quality in a manual typewriter.
c. Electronic typewriters are, by and large, assembled in India from CKD and SKD (complete/semi-knocked down) kits. This type of typewriter is classified as a luxury item and pays 50% duties. It still commands a very high price (Rs. 10,000-15,000) and a high operating cost (a minimum of Re. 1 per page, compared to 2 to 3 paise per page for manual typewriters). Remington and Facit are among the main producers of electronic typewriters in India; the brands sold include Network, PCL, Godrej and Facit.

2. Demand Trends
(a) Market trends for manual typewriters (number of units), 1977 to 1982:

            1977    1978    1979    1980    1981    1982
Remington  31330   33100   32700   29175   17539   26334
Godrej     21250   26771   26210   27362   32261   37250
Halda       8982   11349   14743   13453   13920   14190
Facit       2196   13244   18864   24700   26100   31170
Total      63758   84464   92517   94690   89820  116036

Between 1983 and 1987 the market demand remained between 1,00,000 and 1,20,000 units.
(b) Growth rates for manual typewriters: 10 years ago, about 30%; 5 years ago, stagnant; now, no growth over the 1986-87 figure.
(c) Regional demand pattern (manual): West 30%, North 30%, South 20%, East 20%.
On buying behaviour, the data further indicated that the private sector companies attach higher importance to the typist's opinion than the public sector units, while price and terms of payment receive more importance in the purchase decisions of the public sector than of the private sector (detailed under the findings on buying behaviour).

3. Competitive Structure

Market Share
Manual typewriters:
             1982 (%)   1987 (%)
Godrej          29         40
Remington       32         30
Facit           27         25
Halda           12          5

Electronic typewriters, 1987 market share (%): Network 33, PCL 25, Godrej 13, Facit 7, remaining companies 22. In the case of electronic machines, 90% of the market lies in the metro cities.
About 40% of the demand for manual typewriters originates from the metro cities, with more or less the same regional pattern of demand over the years. Facit is comparatively more successful in the South and is found to be more popular among private sector units.

Strengths and Weaknesses of the Different Companies
Remington
Strengths: 1. Ruggedness and reliability. 2. Durability (lasts for 20 years or so). 3. Monopoly in the typing school segment of the manual typewriter market. 4. Automatic pull in the market. 5. Good image as a supplier of different office equipment, furniture etc.
Weaknesses: 1. Obsolete model. 2. Hard touch; users of Remington machines initially find them difficult to operate. 3. Lacks innovative skill in marketing.
Godrej
Strengths: 1. Efficient after-sales service. 2. Easy to operate and light in touch. 3. Efficient sales policy (timely visits for repair and maintenance). 4. A somewhat liberal credit (terms of payment) and discount policy, with a price about 10% lower than others. This is probably the reason why it is the market leader in the public sector segment, where it enjoys the highest share.
Weaknesses: Godrej's AB model is perceived as being of inferior quality, and the machine requires a hard touch.
Halda
Strength: The first feather-touch machine in India.
Weaknesses: 1. Poor after-sales service. 2. Could not withstand the competitive pressure. 3. A lockout in 1982-83 led to some loss of market. 4. Unable to generate a large number of copies at a time.
Facit
Strengths: 1. Captured a fairly large market share despite being a somewhat new entrant. 2. Negligible service and maintenance requirements. 3. Light touch.
Weaknesses: Fared poorly on after-sales service; the PB model, though better, is yet to establish itself in the market.

4. Buying Behaviour in the Public vs Private Sector
(See Annexure-III for the detailed results of the tests of hypotheses.)
a. Price and terms of payment are given a somewhat higher degree of importance in the public sector than in the private sector.
b. Reputation of the manufacturer and after-sales service are given equal importance in the public and private sectors.

5. Consumer Perception About Existing Brands
a. Lightness of touch is the most sought-after quality in a typewriter. (See Annexure-V for details about the method of testing this hypothesis.)
b. Typists, by and large, felt that FACIT is the best available typewriter machine on the lightness-of-touch quality. Halda was also perceived as a light-touch machine, while Remington and Godrej were rated as "harder" typewriter machines.
c. Analysis of the open-ended question "Why is a particular brand preferred?" reveals that about 60% mentioned that they were used to that machine previously (i.e. familiarity).
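The study's hypothesis about an association between the typewriter model on which a typist learnt and the model preferred now was examined with a Chi-square test. A sketch of such a test follows; the contingency table below is invented for illustration (the study's actual counts are in its annexures and are not reproduced here).

```python
# Hypothetical contingency table: rows = brand learnt on,
# columns = brand preferred now. Counts are illustrative only.
import numpy as np
from scipy.stats import chi2_contingency

brands = ["Remington", "Godrej", "Halda", "Facit"]
observed = np.array([
    [18, 4, 2, 3],   # learnt on Remington
    [5, 15, 1, 4],   # learnt on Godrej
    [2, 3, 8, 2],    # learnt on Halda
    [3, 2, 1, 12],   # learnt on Facit
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
# A small p-value is evidence of association between the brand learnt
# on and the brand preferred today, as the study concluded.
```

With a diagonal-heavy table like this one, the test rejects independence, which is the pattern behind the "familiarity" finding.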

d. An association between the brand on which a typist learnt and the brand preferred today was examined. The Chi-square homogeneity test showed that there is some positive relationship (see Annexure-VI for details).

6. Impressions About Electronic Typewriters
a. There is, by and large, still much hesitation to adopt the electronic typewriter in both the public and private sectors. High price is probably the biggest barrier to its immediate acceptance. Managers found it to be a prestige-value item and hence chose to install it in the office of the Chief Executive/Director only. Typists, by and large, expressed their unfamiliarity with this new, innovative machine and possibly apprehended poor reliability of its electronic parts, which was a major reason for avoiding it.
b. Purchase/administrative managers were, by and large, of the opinion that an electronic typewriter has the following merits: 1. Quality of print-outs, 2. Speed, and 3. Memory. Similarly, the typists were of the opinion that its merits are: i. Automatic correction facility, ii. Speed, and iii. Display.
c. Commenting on the demerits of the electronic typewriter, the two groups expressed the following opinions:
Managers' opinions (decision-makers): high cost of operation; inability to use it during power failure; poor reliability of electronic parts.
Typists' opinions: poor reliability of electronic parts; inability to use it during power failure; special skill required.
d. Buyer profiles of the three leading brands:
Facit buyers: attach more importance to the opinion of the typist, and little importance to price and manufacturer's reputation. These consumers come mainly from private sector companies.
Godrej buyers: highly satisfied with after-sales service.
Remington buyers: perceive their typewriter to be high priced, but a quality product with an established market image.

Annexure-I
Questionnaire-I
Target respondent: Purchase Manager/Administrative Officer
Dear Sir/Madam,
We are conducting a marketing research study on typewriters. Here we shall put before you a few questions. We shall appreciate it if you please answer our questions.
1. Name of organisation:
2. Type of organisation: ( ) Public Sector ( ) Private Sector ( ) Educational Institute
3. How many typewriters has your organisation purchased during the last three years?
Type / Year :  1987-88   1986-87   1985-86
Manual       (  )       (  )       (  )
Electronic   (  )       (  )       (  )
4. Who takes the decisions regarding the purchase of typewriters in your organisation? (Name all the persons and their designations.)
(Instruction to the interviewer: If the answer to Q.4 does not include the person whom you have so far interviewed, terminate the interview and seek this person's help to meet any of the persons who are involved in the purchase decision.)
5. Do you have a particular preference for any particular make of typewriter? ( ) Yes ( ) No
6. Here we have mentioned a few factors related to the purchase of typewriters. Will you please assign your degree of importance to each, on a scale ranging from:
5. Very important  4. Important  3. Neither important nor unimportant  2. Unimportant  1. Very unimportant  0. Can't comment
a Price ( )  b Past performance ( )  c Typist's opinion ( )  d Manufacturer's reputation ( )  e After-sales service ( )  f Guarantee ( )  g Discount offered ( )  h Terms of payment ( )

7. When do you decide to replace/dispose of an old typewriter?
8. Have you ever used an electronic typewriter? ( ) Yes ( ) No
If yes, state the special reason(s) ................
9. (a) Given that the price of the manual typewriter is Rs. ..., will you be willing to install an electronic typewriter? ( ) Yes ( ) No ( ) Can't say
(b) What maximum price are you willing to pay for this electronic typewriter? Maximum price to be paid ................
10. From our experience we state below a few possible merits and demerits of an electronic typewriter. State your possible degree of agreement/disagreement with each on a scale ranging from:
a Strongly agree  b Agree  c Neither agree nor disagree  d Disagree  e Strongly disagree  f Can't comment
Possible merits: (i) Speed ( )  (ii) Production capability ( )  (iii) Memory ( )  (iv) Editing feature ( )  (v) Automatic correction ( )  (vi) Display ( )  (vii) Quality of print-outs ( )
Possible demerits: (i) Cost of operation ( )  (ii) Special skill required ( )  (iii) Poor reliability of electronic parts ( )  (iv) Likelihood of obsolescence ( )  (v) Inability to use due to power failure ( )  (vi) Inability to use due to non-availability of spares ( )
Thank you very much for your support.

Questionnaire-II
Target respondent: Typists/any secretarial staff
1. What was the typewriter on which you learnt typing?
2. Which typewriter are you using now?
3. (a) Which typewriter do you prefer most to work on?
(b) What are the probable reason(s)?
4. Here we have selected a few distinct features of a manual typewriter, namely:
a. Clarity of print  b. Lightness of touch  c. After-sales service  d. Speed  e. Durability
Using a rating scale such as (1) Very poor (2) Poor (3) Average (4) Good (5) Very good (6) Cannot comment, indicate your opinions about the various brands of typewriters you have used on each of the above-mentioned features.
Brand/Model | Clarity of print | Lightness of touch | After-sales service | Speed | Durability
1.
2.
3.
4.
5. Have you ever used an electronic typewriter? ( ) Yes ( ) No
If yes: (Q) What do you feel are the distinct merits/demerits of an electronic typewriter?
i Merits ................  ii Demerits ................
If no, terminate this question.
6. From our experience we have found the below-mentioned merits/demerits in an electronic typewriter. On a scale labelled (a) strongly agree (b) agree (c) neither agree nor disagree (d) disagree (e) strongly disagree (f) cannot comment, indicate your opinion on each:
i Possible merits: Speed ( )  Memory ( )  Editing features ( )  Reproduction capability ( )  Display ( )  Automatic correction ( )  Quality of print-outs ( )
ii Possible demerits: High cost of operation ( )

Therefore. The confidence with which it is desired to achieve the level of positive chosen. questionnaire we have asked whether the organisation has an electronic typewriter or not. Moreover. For example. users’ views on different purchase related dimensions are recorded on a five point scale. the optimal sample size will be the maximum of the sim -ple size calculation for different response types 246 © Copy Right: Rai University 11. () () () () () RESEARCH METHODOLOGY Annexure-II Sample Size Determination The exact sample size decision in a survey of this nature depends upon i.556 . The average estimate of the response to a question asked ii. in the. similarly. in this study the response to each of the questions asked are either binary or five point multiple choice.Special skills required Poor reliability of electronic parts Likelihood of obsolescence Inability to use due to power failure Inability of use due to non-availability of spares Thank you very much for your cooperation. The precision with which it is desired to estimate the parameter iii.

LESSON 44: CASE STUDY
Friends, now let us discuss some real-life case studies and see how the techniques of research methodology help to solve them.

Case Study No. 5: Antiseptic Cream
1. Marketing Brief
1.1 Background
Dettol Antiseptic Cream (DAC) was launched in the Indian market by Reckitt & Colman of (India) Ltd. DAC was introduced as part of a line extension strategy: it was to be a complementary product to Dettol liquid, which was then the market leader with almost a 90% share of the antiseptic liquid market.
1.2 Intended Positioning
"DAC offers Dettol protection in cream form. It is effective against minor cuts, wounds, burns, insect bites, shaving nicks, boils and rashes."
1.3 Competition
Boroline has the lion's share of the antiseptic cream market, especially in Eastern India. The other well-known brands in the market are Savlon, Boro-Plus and Boro-Calendula. In addition to these, there are a few products/brands with very specific usage areas:
Burns: Burnol
Dry skin/chapped lips: various cold creams
Pimples: various cosmetic creams like Clearasil and Fair & Lovely
Shaving nicks: various after-shave lotions
One aspect of the competition which was not anticipated earlier was that DAC might face competition from Dettol liquid itself. In the case of cuts and wounds, people may prefer to use established antiseptic liquids, like Dettol or Savlon, which they might already have at home and are currently using.
1.4 Recognising a Problem Area
Over the last few years the performance of DAC has been a matter of concern, as sales did not reach the expected level with time. This raises the basic question: "Why are sales not picking up and what should be done to rectify the position?" Let us examine where the problem could lie.
1. Potential market: There are a number of brands in the antiseptic cream market and some of them, specially Boroline, are doing very well. This is indicative enough that there is a market for antiseptic cream. On the other hand, given the way DAC is currently perceived by the people, it would appear that there is not enough market potential for a cream like DAC: it would seem that DAC is perceived as an antiseptic cream to be used specifically for cuts and wounds. On taking a closer look, it seems that Boroline is perceived as a "general purpose cream" which can be used for cosmetic purposes, like dry skin and chapped lips, as well as for medicinal uses like cuts and wounds, insect bites, burns, pimples etc. Boroline would thus be a handy all-purpose cream to have at home. The objective would therefore be to find out what people see in Boroline cream which they find lacking in DAC. This would mean that the intended positioning of DAC might not have been achieved, maybe because of the Dettol brand name.
2. Distribution: This does not seem to be the problem area, since the extensive distribution network for the other products of Reckitt is being used for DAC.
3. Media support: Media support for DAC has been restricted to insertions in newspapers and magazines, hoardings and point-of-purchase displays. These media have also not been extensively used. Thus the problem could be that the media support is insufficient, which would mean that:
• People are not aware of DAC, or
• People may be aware of DAC but may not be convinced enough to buy it.
On the other hand, the media support may be sufficient, but the message might not have got across to the consumer. A detailed study of the media support is beyond the scope of this report; however, as a spin-off benefit the study proposes to find out the level of awareness of DAC's "advertisement" among the people. This research study proposes to verify these hunches.

2. Marketing Research Objectives
1. To study the incidence of skin problems and the brands used in those situations.
2. To ascertain the elements (attributes) consumers look for in an antiseptic cream.
3. To examine how satisfied the consumers are with DAC and Boroline on the different elements that they look for.
4. To assess consumers' awareness and recall of DAC's advertisement.

3. Research Design
2.1 Research Hypotheses
The research study tested the following hypotheses.

11. 2. Question No. Content This question gives us an indication of the most frequently occurring skin problems. 2. The perception of Boroline is obtained from this question. Why should I use DAC?” . This question indicates whether the respondent has heard of the concerned brand (i. What do people want in an antiseptic cream in terms of various attributes and benefits derived from the product? 4. 10. . 6. It was decided to take a sample of 100 respondents. For example. Boroline which is a general purpose cream rather than DAC which is not a general purpose cream”. I’d rather use Dettol liquid or any other antiseptic. end and ‘good’ at the other.5 Questionnaire Design At the outset a fairly exhaustive list of usage occasions and qualities of an antiseptic cream was arrived at and a pilot 3.Hypothesis 1 “If I have minor cuts or wounds. Data were represented graphically using ‘Lotus’ package in the personal computer. RESEARCH METHODOLOGY Hypothesis 2 “If I were to use an antiseptic cream I’d use. 2. 2.5 scale with bad at one 248 3. 2. In the final questionnaire this was changed to ‘odour’ varying from ‘medicinal ‘odour’ to ‘perfumed ‘odour’. For example. Scaling: The ordidnal scale data on ranking of usage occasion frequency (Q. questions were designed in the sequence to collect the following data. the query as to what people use for various usage occasions was made open ended as it was observed that the close ended question in the pretested questionnaire made the respondent biased. survey was conducted to narrow this list down. 8.) was done on a sample of fifteen respondents.4 Data Collection Mode The data collection instrument used for obtaining the desired information is questionnaire. This open ended question would give an indication of what people currently do for the various skin problems mentioned. 
Hypothesis 1: "If I have minor cuts or wounds, I'd rather use Dettol liquid or any other antiseptic. Why should I use DAC?"
Hypothesis 2: "If I were to use an antiseptic cream, I'd use Boroline, which is a general purpose cream, rather than DAC, which is not a general purpose cream."

2.2 Information Required
To achieve the research objectives, the following information is required:
1. What are the most frequently occurring skin problems?
2. What do people do when they have these skin problems?
3. What do people want in an antiseptic cream, in terms of the various attributes and benefits derived from the product?
4. How is DAC viewed in terms of the above attributes/benefits?
5. How is Boroline viewed in terms of the above attributes/benefits?
6. What is DAC's brand awareness in the market?
7. What is the extent of DAC's advertisement recall?
8. What is the message retained from DAC's advertisement?

2.3 Sources of Data
All the above information is collected from primary sources.

2.4 Data Collection Mode
The data collection instrument used for obtaining the desired information is a questionnaire.

2.5 Questionnaire Design
At the outset a fairly exhaustive list of usage occasions and qualities of an antiseptic cream was arrived at, and a pilot survey was conducted to narrow this list down.

2.5A Pretesting
Pretesting of the questionnaire (i.e. the process of administering the questionnaire to a conveniently selected group of people to test its clarity, ease of response etc.) was done on a sample of fifteen respondents. Depending on the difficulties encountered by them in answering the questionnaire, its initial format was suitably modified to finally arrive at the one given in this report. Terms like 'value for money' and 'after use visibility' did not seem to make much sense to the respondents, so these two terms were omitted from the list of attributes in the final questionnaire. For example, in the pretest respondents were asked to rate brands for smell on a 1-5 scale with 'bad' at one end and 'good' at the other; in the final questionnaire this was changed to 'odour', varying from 'medicinal odour' to 'perfumed odour'. The query as to what people use on various usage occasions was made open-ended, as it was observed that the close-ended question in the pretested questionnaire biased the respondent. A few changes were also incorporated in the questions pertaining to the rating of attributes for the different brands, so as to make them unbiased.

2.5B Final Questionnaire (a copy is enclosed in Appendix-I)
In consonance with the information requirements, the questions were designed in sequence to collect the following data. The logic of the questionnaire development is highlighted below, question by question:
1. This question gives us an indication of the most frequently occurring skin problems.
2. This open-ended question gives an indication of what people currently do for the various skin problems mentioned.
3. Here the respondent is asked to rank the chosen attributes in order of importance on a seven-point scale.
4. The perception of an ideal antiseptic cream is sought from the respondents by asking what "magnitude" of each of the mentioned attributes they would desire in an antiseptic cream.
5. This question is used to obtain the level of unaided recall of the various brands of antiseptic creams among respondents.
6. This question indicates whether the respondent has heard of the concerned brand (i.e. DAC).
7. This question is used to find out if the respondent has used DAC and Boroline.
8. The consumer's perception of DAC is obtained from this question.
9. The perception of Boroline is obtained from this question.
10. This question determines the level of DAC's advertisement recall and message recall.
Finally, some basic information about the person responding is collected.

2.6 Sampling
1. Respondent: The target respondents of the study consisted of people from different income groups residing in Calcutta.
2. Sampling unit: Household.
3. Sample design and sample size: The study purposely chose a convenience sampling procedure. It was decided to take a sample of 100 respondents.

3. Data Analysis
The data obtained from the respondents were first edited, and the valid (87) responses were retained for the purpose of analysis.
1. Scaling: The ordinal-scale data on the ranking of usage occasion frequency (Q.1) and importance of attributes (Q.4) were converted to an interval scale using the Thurstone Case V scaling technique.
2. Data were represented graphically using the 'Lotus' package on a personal computer.
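The Thurstone Case V conversion mentioned above can be sketched as follows. The rankings here are invented for illustration (the study's real responses are not reproduced); each row is one respondent's ranking, with 1 the most frequent/important item.

```python
# A minimal Thurstone Case V sketch for ranking data such as Q.1 and Q.4.
# Rankings and item names below are hypothetical.
import numpy as np
from statistics import NormalDist

rankings = np.array([
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
    [1, 2, 4, 3],
    [2, 1, 4, 3],
])
items = ["cuts/scratches", "boils/pimples", "dry skin", "rashes"]
n_resp, n_items = rankings.shape
nd = NormalDist()

# P[i, j] = proportion of respondents ranking item i above item j.
P = np.full((n_items, n_items), 0.5)
for i in range(n_items):
    for j in range(n_items):
        if i != j:
            p = np.mean(rankings[:, i] < rankings[:, j])
            P[i, j] = np.clip(p, 0.01, 0.99)  # avoid infinite z-scores

# Case V scale value = mean of the z-transformed pairwise proportions.
Z = np.vectorize(nd.inv_cdf)(P)
scale = Z.mean(axis=1)
scale -= scale.min()          # anchor the lowest item at zero
for item, s in sorted(zip(items, scale), key=lambda t: -t[1]):
    print(f"{item:15s} {s:.2f}")
```

The resulting scale values give the interval-scale "picture" the study reports in its Figure 1 and Figure 2; clipping the proportions is a common practical step when an item dominates another unanimously.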

2. Presentation of Data

This is a summary of the results obtained from the respondents.

1. Usage occasions: The rankings of the frequency of occurrence of skin problems were converted to an interval scale using the Thurstone Case V method (see Appendix IIA for details).

Figure 1: Thurstone Case V scale for frequency of occurrence of skin problems (items scaled: boils/pimples, dry skin/chapped lips, shaving cuts, minor burns, insect bites, cuts/scratches, blisters/skin peels, rashes)

2. What people do when a skin problem arises is shown in Table 1.

Table 1: Percentage break-up of brand used for each skin problem. Rows: boils/pimples, dry skin/chapped lips, shaving cuts, minor burns, insect bites, cuts/scratches, blisters/skin peels, rashes. Columns: Do Nothing, Boroline, DAC, Dettol Liquid, Other Brands.

"Other brands" comprised mainly of cold creams for dry skin/chapped lips, after-shave lotions for shaving cuts, "doctor's advice" for rashes, and cosmetic creams like Clearasil.

The percentage break-up of usage occasions for Boroline, DAC and Dettol is shown in Table 2 (for pie charts, see Exhibits B1 and B2 in Appendix III; for the various usage occasions of Boroline, see Exhibit A in Appendix III).

Table 2: Percentage break-up of usage occasions for Boroline, DAC and Dettol.

From these figures a frequency of use index was calculated for each brand:

Frequency of use index for brand a = Σi Wi · pai

where pai = percentage of people who use brand a for usage occasion i, and Wi = weightage for usage occasion i. The percentage of people who use Boroline for a particular occasion was multiplied by the weightage for that usage occasion, and this was added over all the usage occasions; a similar index was calculated for DAC (see Appendix II for details).

3. The average scores of the respondents on their agreement or disagreement with the seven attribute statements are given in Table 3 for an ideal antiseptic cream, for Boroline and for DAC. The scores range from -2 to +2, as follows:

-2: strongly disagree
-1: disagree
 0: neither agree nor disagree
 1: agree
 2: strongly agree

Table 3: Average scores on product attributes for the ideal antiseptic cream, Boroline and DAC. Attributes: medicinal rather than perfumed odour; staining; greasy; stings on application; general purpose cream; easily available; high antiseptic qualities.

It is observed that DAC is perceived as having high antiseptic qualities and a medicinal odour, and as not being a general purpose cream. Boroline is not seen as having high antiseptic qualities; it is a general purpose cream with a perfumed odour, is non-staining, is not greasy and is easily available. Ideally, the consumers would like a general purpose cream with high antiseptic qualities that is non-staining, does not sting on application, is non-greasy and is easily available. They are indifferent to the odour but would not want the cream to be staining.

3.1 Dissatisfaction Score for Each Brand

The "dissatisfaction" score for a particular attribute of a particular brand is defined as the difference between the score on that attribute for the ideal antiseptic cream and the score for the brand. The average dissatisfaction score for each attribute was calculated for DAC and Boroline. Weightages were then assigned to the various attributes, using the interval scale derived for the "importance of attributes". The average dissatisfaction score for the brand as a whole was then calculated for both DAC and Boroline, using the formula:

Average dissatisfaction score = Σ (i = 1 to 7) di · Wi

where di = average dissatisfaction score for attribute i and Wi = weightage for attribute i; i ranges from 1 to 7. The dissatisfaction scores for Boroline and DAC were compared.
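Both indices defined in this section are simple weighted sums, so they can be sketched in a few lines of code. All figures below are illustrative placeholders, not the survey's actual weightages or percentages:

```python
# Both indices are weighted sums; every number here is a made-up placeholder.

# Frequency-of-use index for one brand:
#   sum over usage occasions i of  W_i * p_ai
occasion_weights = {"cuts/scratches": 2.0, "boils/pimples": 1.5, "rashes": 0.5}   # hypothetical W_i
pct_using_brand  = {"cuts/scratches": 23.0, "boils/pimples": 8.0, "rashes": 3.0}  # hypothetical p_ai (%)

freq_index = sum(occasion_weights[o] * pct_using_brand[o] for o in occasion_weights)

# Average dissatisfaction score for one brand:
#   sum over attributes i of  W_i * (ideal score - brand score)
attr_weights = {"antiseptic qualities": 0.20, "availability": 0.17, "odour": 0.10}  # hypothetical W_i
ideal_score  = {"antiseptic qualities": 1.8, "availability": 1.6, "odour": 0.0}     # hypothetical
brand_score  = {"antiseptic qualities": 0.4, "availability": 1.5, "odour": -0.5}    # hypothetical

dissatisfaction = sum(attr_weights[a] * (ideal_score[a] - brand_score[a]) for a in attr_weights)

print(round(freq_index, 2), round(dissatisfaction, 3))
```

A brand that is used heavily on the heavily weighted occasions gets a high frequency-of-use index; a brand that sits far from the ideal on the heavily weighted attributes gets a high dissatisfaction score.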

4. Importance of attributes: The rankings of the importance of the various product attributes, when converted to an interval scale using the Thurstone Case V scaling technique, are represented in Figure 2.

Figure 2: Thurstone Case V scale for importance of attributes (attributes scaled: odour, non-greasiness, non-staining characteristics, non-stinging characteristics, availability, antiseptic qualities, general purpose usability)

5. Dissatisfaction scores: The mean dissatisfaction score for each attribute-brand combination is given in Table 4. Some ideal-cream scores were assumed to be 2, since the consumer would obviously want these attributes in an ideal antiseptic cream. (See the tables in Appendix II for details, and Exhibit C in Appendix III for a graphical exposition.)

Table 4: Mean dissatisfaction scores for each attribute for DAC and Boroline. Attributes: odour, staining characteristic, greasiness, sting on application, availability, antiseptic qualities.

6. Weighted frequency of use index: Table 5 gives the weighted frequency of use index for Boroline and DAC, built up from the weightage and the percentage of users for each usage occasion. See the brand usage summary in the graphics enclosed in Appendix III.

Table 5: Weighted frequency of use index for Boroline and DAC, by usage occasion (boils/pimples, dry skin/chapped lips, shaving cuts, minor burns, insect bites, cuts/scratches, blisters/skin peels, rashes).

7. Antiseptic creams respondents were aware of (unaided recall): Boroline 62%, DAC 45%.

8. Number of respondents who had heard of Boroline and DAC: Boroline, heard 86, not heard 1; DAC, heard 78, not heard 9.

9. Number of respondents who have ever used Boroline and DAC: Boroline 89.2%, DAC 44.8%.

10. Ad recall: 48 respondents (55.2%) had seen the DAC advertisement before; 39 had not. Of the 48 respondents who had seen the advertisement, 15, or 31.3%, could correctly recall the product's message.

11. Demographic data (percentage of respondents):
Age: less than 35 years, 42; greater than 35 years, 58.
Monthly income: less than Rs. 2000, 25; greater than Rs. 2000, 75.
Sex: male, 63; female, 37.

Testing of Hypotheses

Hypothesis 1
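The Thurstone Case V conversion behind Figures 1 and 2 turns pairwise "ranked above" proportions into interval-scale values by averaging the corresponding standard normal z-scores. A minimal sketch with an invented proportion matrix for three items (not the study's data):

```python
# Thurstone Case V scaling sketch; the proportion matrix below is invented.
from statistics import NormalDist

items = ["cuts/scratches", "boils/pimples", "rashes"]

# p[i][j] = proportion of respondents ranking item j above item i
# (diagonal entries are 0.5 by convention)
p = [
    [0.50, 0.35, 0.10],
    [0.65, 0.50, 0.20],
    [0.90, 0.80, 0.50],
]

z = NormalDist().inv_cdf  # standard normal quantile function

# Case V: scale value of item j = column mean of z(p[i][j])
scale = [sum(z(p[i][j]) for i in range(len(items))) / len(items)
         for j in range(len(items))]

for name, value in sorted(zip(items, scale), key=lambda t: -t[1]):
    print(f"{name}: {value:+.2f}")
```

The resulting values form an interval scale: differences between items are meaningful, but the zero point is arbitrary.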

Ho: "If I have cuts or wounds, I'd use Dettol liquid or some other antiseptic rather than DAC" v/s H1: negation of Ho.

In other words, Ho: πL = πDAC v/s H1: πL > πDAC, where πL is the population proportion of people using any branded antiseptic other than DAC and πDAC is the population proportion using Dettol Antiseptic Cream.

Test statistic:

Z = (PL - PDAC) / S.E.(PL - PDAC), with S.E.(PL - PDAC) = √(PQ(1/n1 + 1/n2))

where PL is the sample proportion of consumers using any branded antiseptic other than Dettol Antiseptic Cream, PDAC is the sample proportion of consumers using Dettol Antiseptic Cream, P = (n1·PL + n2·PDAC)/(n1 + n2) is a pooled estimate of usage, and Q = 1 - P.

The sample data show PL = 22/87 and PDAC = 11/87, with n1 = n2 = 87, so P ≈ 0.19. Since the calculated value of the test statistic (Z ≈ 2.13) exceeds the tabulated value of Z at the 5% level of significance (1.64), the null hypothesis is rejected: the sampled consumers use Dettol liquid and other antiseptics more than DAC.

Hypothesis 2

Ho: "If I require to use an antiseptic cream, I would use Boroline, which is a general purpose cream, rather than DAC" v/s H1: negation of Ho.

To test this hypothesis, two sub-hypotheses were formulated on the mean score µ of consumer perception on the five-point (-2 to +2) scale, tested with the statistic T = Ȳ / (s/√n), Ho being rejected if T exceeds the tabulated value of t with (n - 1) degrees of freedom.

(i) Is Boroline perceived as a general purpose cream? Ho: µ ≤ 0 v/s H1: µ > 0. The data show Ȳ = 1.034 with n = 87; T exceeds the tabulated t (1.65), so Ho is rejected: Boroline is perceived as a general purpose cream.

(ii) Is DAC perceived as a general purpose cream? Testing similarly to (i), T falls below the tabulated t, so Ho is accepted: the data showed DAC is not perceived as a general purpose cream but as having a specific medicinal application.

The usage of DAC and Boroline for the four most frequently occurring skin problems is compared below (percentage who use each brand):

Usage occasion: DAC / Boroline
Dry skin/chapped lips: 0 / 22
Cuts/scratches: 11 / 23
Insect bites: 8 / 6
Minor burns: 2 / 9

This table confirms that for the most frequently occurring skin problems, Boroline is used more often than DAC. Further, the weighted frequency of use index has a value of 14.5 for Boroline and 2.6 for DAC, which implies that Boroline has a greater chance of being used than DAC. Combining these results, it can be inferred that Boroline is used more because of its general purpose usability; Hypothesis 2 is therefore also inferred to be true.

Hypothesis 3

Advertising effect on DAC usage. Ho: There is no relationship between DAC use and exposure to the various DAC advertisements v/s H1: there is some relationship.

Dettol Antiseptic Cream: users / non-users
Seen advertisement of DAC: 30 / 18
Not seen advertisement of DAC: 29 / 10
(Totals: 59 users, 28 non-users.)

Since X² calculated (= 1.386) < X² tabulated (= 3.84), the null hypothesis is accepted: use of DAC is not related to exposure to DAC's advertisements.

Conclusions

To obtain an answer to the question "Why are sales of DAC not picking up?", which was the major thrust of this study, we may recapitulate the results of the different hypotheses.

Hypothesis 1: "If I have cuts or wounds, I'd use Dettol liquid or some other antiseptic liquid rather than DAC" has been accepted. Thus, for cuts and wounds, which are a frequent skin problem, DAC does not find significant application.

Hypothesis 2: "If I require to use an antiseptic cream, I'd use Boroline, which is a general purpose cream, rather than DAC" has also been inferred to be true: Boroline is perceived as a general purpose cream, while DAC is not.

Hypothesis 3: Use of DAC is not related to the consumer's exposure to the various DAC advertisements.

These hypotheses together indicate why there is not sufficient market demand for DAC: the way DAC is currently perceived, there is not enough market potential for it. When a person wants to buy an antiseptic cream, he examines the total bundle of benefits that the cream offers; people do not buy an antiseptic cream for its antiseptic qualities alone.
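The test statistics above can be re-derived from the reported counts with the standard library alone. The sketch below assumes the pooled form of the two-proportion Z test for Hypothesis 1; its chi-square value reproduces the reported 1.386:

```python
import math

n = 87  # respondents

# Hypothesis 1: two-proportion Z test (pooled), H1: pi_L > pi_DAC
p_L, p_DAC = 22 / n, 11 / n        # other branded antiseptics vs. DAC
p_pool = (22 + 11) / (2 * n)       # pooled estimate of usage
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_L - p_DAC) / se
print("Z =", round(z, 2), "| reject H0:", z > 1.64)

# Hypothesis 3: chi-square test of independence, DAC use vs. ad exposure
observed = [[30, 18],   # seen the ad:     users / non-users
            [29, 10]]   # not seen the ad: users / non-users
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
total = sum(row)
chi2 = sum((observed[i][j] - row[i] * col[j] / total) ** 2 / (row[i] * col[j] / total)
           for i in range(2) for j in range(2))
print("chi2 =", round(chi2, 3), "| reject H0:", chi2 > 3.84)
```

Z comes out around 2.13, above the 1.64 cut-off, while the chi-square statistic of 1.386 stays below 3.84, matching the decisions reported in the text.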

This leads to the question whether DAC should be re-positioned as a 'general purpose cream' or not. It is true that there is a lot of potential in the antiseptic cream market: Boroline enjoys a high sales volume, and there are other brands like Boro-Plus and Boro-Calendula in the market, all positioned as 'general purpose creams'. So it can be concluded that the potential lies in 'general purpose' antiseptic creams. This view is further confirmed by the weighted frequency of use index, which has a value of 14.5 for Boroline and 2.6 for DAC; Boroline has a greater chance of being used than DAC.

However, it would be difficult for DAC to achieve such a position in the consumer's mind. The brand name "Dettol" has a medicinal connotation, and it would be a hard task to convince consumers that DAC is a 'general purpose cream'. Consumers would rather use Dettol liquid, which has a specific medicinal usage, viz. cuts and scratches. If at all they are to buy a cream, it would be a general purpose cream.

Brand Awareness

Awareness of DAC among the sample was quite high: 55% of the respondents had seen the DAC advertisement before, and 31% of the respondents could correctly recall how the product was advertised. However, advertisement recall and use of DAC do not seem to be related.

Recommendation

It seems unlikely that sales of DAC will pick up if the current state of perception prevails.

Appendix I: Questionnaire

Dear Respondent,

We are conducting a survey about antiseptic creams. We would be grateful if you express your opinions on the following list of questions.

1. The following is a list of eight common skin problems. Please rank them from 1 to 8 in order of how frequently they occur in your family. Give rank '1' to the problem which occurs most frequently and rank '8' to the least frequently occurring skin problem.

Boils/Pimples
Dry Skin/Chapped Lips
Shaving Cuts/After-shave
Minor Burns
Insect Bites (Ants, Mosquitoes etc.)
Cuts/Scratches
Blisters/Skin Peels
Rashes

2. When any of the above problems arises, what do you generally do? Indicate your response by writing against each problem whether you use any antiseptic; in case you simply ignore a particular skin problem, write nothing.

Boils/Pimples
Dry Skin/Chapped Lips
Shaving Cuts/After-shave
Minor Burns
Insect Bites
Cuts/Scratches
Blisters/Skin Peels
Rashes

3. When you buy an antiseptic cream, which of the following is the most important to you? Rank them from 1 to 7, where '1' indicates the most important and '7' the least important.

Odour
Non-greasiness
Non-staining characteristics
Non-stinging characteristics
Availability
Antiseptic qualities
General purpose usability

4. Listed below are some statements about an IDEAL ANTISEPTIC CREAM. Indicate your response to each statement by putting a tick-mark against the response you prefer most. The responses for each statement are: strongly disagree / disagree / neither agree nor disagree / agree / strongly agree.

1. An ideal antiseptic cream should have a medicinal odour rather than a perfumed odour.
2. An ideal antiseptic cream should be staining.
3. An ideal antiseptic cream should be greasy.
4. An ideal antiseptic cream should sting on application.
5. An ideal antiseptic cream should be a general purpose cream.
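For analysis, tick-marks on the agreement statements are coded on the -2 to +2 scale used earlier in the report. A small sketch of that coding step, with a hypothetical response list:

```python
# Code Likert tick-marks to the -2..+2 scale and average them.
SCALE = {
    "strongly disagree": -2,
    "disagree": -1,
    "neither agree nor disagree": 0,
    "agree": 1,
    "strongly agree": 2,
}

# Hypothetical responses of four respondents to one statement
responses = ["agree", "strongly agree", "neither agree nor disagree", "agree"]

scores = [SCALE[r] for r in responses]
mean_score = sum(scores) / len(scores)
print(mean_score)  # average agreement with the statement
```

Averaging these coded scores per statement produces the attribute scores reported in Table 3.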