RSB

Unit 1 Introduction to Research

1.1 Introduction
Managers are mostly involved in studying and analyzing issues that lead to decision making. They are involved in some form of research for making an appropriate decision. Decision making today is complicated and complex. A vast flow of information, enabled by data mining and data warehousing, provides a vital input for decision making. The success or failure of a business decision depends on the data associated with the decision. Decisions can be made in an objective or a subjective manner. Objective decision making is rational and scientific. To arrive at objective decisions, business managers often involve themselves in some form of research.
Research is simply the process of finding solutions to a problem after a thorough study and analysis of the situational and other related factors. Business research is a systematic and organized effort to investigate a specific problem encountered in the work setting that needs a solution. It comprises a series of steps designed and executed with the goal of finding answers to the issues that are of concern to the manager.
This unit provides a basic understanding of research, the process involved and the steps involved in the development and testing of hypotheses. Further, the need for and the major types of research design are dealt with in detail.

1.2 Learning Objectives
After reading this unit you should be able to:
Define research and understand the advantages of the knowledge of research
Highlight the distinctive characteristic features of research
Describe the building blocks of science in research
Understand the steps in the research process
Develop a research design
Understand the need for and basic features of the theoretical framework
Describe the steps in hypothesis development and testing
Explain the need for and the major types of research design

1.3 Definition of research
Research refers to a search for knowledge. It is an art of scientific investigation.
Redman and Mory define research as a systematized effort to gain new knowledge. Research is an original contribution to the existing stock of knowledge. D. Slesinger and M. Stephenson, in the Encyclopedia of Social Sciences, define research as the manipulation of things, concepts or symbols for the purpose of generalizing to extend, correct or verify knowledge, whether that knowledge aids in construction of theory or in the practice of an art. According to Clifford Woody, research comprises defining and redefining problems; formulating hypotheses or suggested solutions; collecting, organizing and evaluating data; making deductions and reaching conclusions; and at last carefully testing the conclusions to determine whether they fit the formulated hypotheses.
Business research is an organized, systematic, data-based, critical, objective, scientific inquiry or investigation into a specific problem, undertaken with the purpose of finding solutions to it. Research provides the information that guides managers to make informed decisions and deal successfully with problems.

1.4 Importance of knowledge of Research in business settings
The knowledge of research is important on account of the following reasons:
The business world today is more complicated and complex. In this context research enables the manager to face the competitive global market with greater confidence.
Research enables managers to consider the available information in sophisticated and creative ways.
Research enables managers to identify critical issues, gather relevant information, analyse the data and implement the right course of action.
Managers need to understand, predict and control events that are dysfunctional to the organization. Research enables them to understand, predict and control the environment.
Research enables managers to sense, spot and deal with problems before they go out of hand.
An organization may not be able to solve all the problems it encounters in house, and consultants may be engaged for expert advice. The manager needs knowledge of research to interact effectively with research consultants and to get the maximum benefit out of them.
Not all published research findings can be accepted as such. The soundness of findings should be evaluated before decisions are made on their basis. Managers need to know about research so as to evaluate and discriminate among research findings on the basis of the soundness of the methodology and similar criteria.
The knowledge of research and research methods sensitizes managers to the various variables operating in a situation and reminds them of the multi-causality and multi-finality of situations, thereby avoiding inappropriate, simplistic notions of one variable causing another.
It enables managers to understand the research reports prepared by professionals so as to take intelligent, calculated risks with known probabilities attached to the success or failure of their decisions.
Knowledge about scientific investigation enables managers to avoid making decisions in a subjective or biased manner.
Knowledge about research helps the manager to understand the need for pertinent information and to share it with the research consultants.

1.5 Hallmarks of scientific research
Successful managerial decisions are seldom made on hunches or by trial and error. Sound and effective decisions are made on the basis of scientific research. Scientific research focuses on solving problems in a step-by-step, logical, organized and rigorous manner in each stage of research, viz., identifying the problem, gathering data, analyzing it and arriving at a valid conclusion. Organizations may not always be involved in scientific research, for reasons such as simple problems that can be solved with previous experience, time constraints, lack of knowledge and resource constraints. However, scientific research performed in a rigorous and systematic way leads to repeatable and comparable research findings. It also enables researchers to arrive at accurate, dependable and objective findings. The hallmarks, or distinguishing characteristic features, of scientific research are as follows:

1.5.1 Purposiveness
Research is conducted with a purpose. It has a focus. The purpose of the research should be stated in an understandable and unambiguous manner. The statement of the decision problem should include its scope, its limitations and the precise meaning of all words and terms significant to the research. Failure to state the purpose clearly will raise doubts in the minds of the stakeholders of the research as to whether the researcher has sufficient understanding of the problem.

1.5.2 Rigor
Rigor means carefulness, scrupulousness and the degree of exactness in research investigation. In order to make a meaningful and worthwhile contribution to the field of knowledge, research must be carried out rigorously. Conducting rigorous research requires good theoretical knowledge and a clearly laid out methodology. This will eliminate bias and facilitate proper data collection and analysis, which in turn lead to sound and reliable research findings.

1.5.3 Testability
Research should be based on testable assumptions or hypotheses developed after a careful study of the problems involved. Scientific research should enable the testing of logically developed hypotheses to see whether or not the data collected support them.

1.5.4 Replicability
Research findings command more faith and credence if the same results emerge from a different set of data. The results of the hypothesis test should be supported again and again when the same type of research is repeated in other similar circumstances. This ensures the scientific nature of the research conducted, and more confidence can be placed in the research findings. It also eliminates the doubt that the hypotheses were supported by chance and ensures that the findings reflect the true state of affairs.

1.5.5 Precision and Confidence
In management research the findings are seldom definitive, because they are based on a sample drawn from the universe of items, events or population rather than on the universe itself. There is a probability that the sample may not reflect the universe. Measurement errors and other problems are bound to introduce an element of error in the findings. However, the research design should ensure that the findings are as close to reality as possible so that one can have confidence in them.
Precision refers to the closeness of the findings, based on a sample, to reality. It reflects how accurately the results obtained from the sample match what actually exists in the universe. What statistics calls the confidence interval is referred to here as precision.
Confidence refers to the probability that the estimates made in the research findings are correct. It is not enough for the results to be precise; it is also important to be able to claim, for example, that 95% of the time the results would be true and that there is only a 5% chance of their being wrong. This is known as the confidence level.
The higher the precision and confidence level of the research findings, the more scientific and useful the findings of the study will be. Precision and confidence can be attained through an appropriate scientific sampling design.

1.5.6 Objectivity
Research findings should be factual, data-based and free from bias. The conclusions drawn should be based on facts derived from the actual data and not on subjective or emotional values. Business organizations will suffer considerable damage if non-data-based or misleading conclusions drawn from the research are implemented. A scientific approach ensures the objectivity of research.

1.5.7 Generalizability
It refers to the scope of applying the research findings of one organizational setting to other settings of almost similar nature.
The research will be more useful if the solutions are applicable to a wider range of settings. The more generalizable the research, the greater its usefulness and value. However, it is not always possible to generalize research findings to all other settings, situations or organizations. To achieve generalizability, the sampling design has to be logically developed and the data collection method needs to be very sound. This may increase the cost of conducting the research. In most cases, even though the research findings are based on scientific methods, they are applicable only to a particular organization, setting or situation.

1.5.8 Parsimony
Research needs to be conducted in a parsimonious, i.e., simple and economical, manner. Simplicity in explaining the problems and in generalizing solutions to them is preferred to a complex research framework. Economy in research models can be achieved by considering a smaller number of variables that explain more of the variance, rather than a larger number of variables that explain less. A clear understanding of the problem and of the factors influencing it leads to parsimony in research activities. This understanding can be achieved through structured and unstructured interviews with the people concerned and through a study of the related literature in the problem area.

Scientific research in the management area cannot fulfill all the hallmarks discussed above to the fullest extent. In management research it is not always possible to conduct investigations that are 100% scientific, as in the physical sciences, because it is difficult to collect and measure data regarding feelings, emotions, attitudes and perceptions. It is also difficult to obtain a representative sample. These aspects restrict the generalizability of the findings.
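
The notions of precision and confidence described under 1.5.5 can be made concrete with a small numerical sketch. The Python example below is only an illustration under stated assumptions: the universe of 10,000 customer spend figures is simulated (hypothetical data, not drawn from any study), the function name mean_confidence_interval is introduced here for illustration, and the interval uses a normal approximation, which is reasonable for samples of this size.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, margin) for the sample mean, using a normal approximation.

    The margin is the half-width of the confidence interval: a smaller
    margin means higher precision at the chosen confidence level.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # z-value that leaves (1 - confidence) / 2 in each tail
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err


random.seed(42)
# Hypothetical universe: monthly spend of 10,000 customers (simulated).
universe = [random.gauss(500, 80) for _ in range(10_000)]
# The researcher only observes a sample drawn from the universe.
sample = random.sample(universe, 200)

mean, margin = mean_confidence_interval(sample, confidence=0.95)
print(f"estimate = {mean:.1f} +/- {margin:.1f} at the 95% confidence level")
```

Asking for a higher confidence level (say 99%) on the same sample widens the margin, which is the trade-off between precision and confidence that a sampling design has to balance.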
Though it is not possible to meet all the above said characteristics of scientific research, research activities should be pursued in a scientific manner to the extent possible.

1.6 The Building Blocks of Science in Research
The essential tenets of scientific research are: direct observation of phenomena; clearly defined variables, methods and procedures; empirically testable hypotheses; the ability to rule out rival hypotheses; statistical justification of conclusions; and a self-correcting process. One of the primary methods of scientific investigation is the hypothetico-deductive method. The method of starting with a theoretical framework, formulating hypotheses and logically deducing from the results of the study is known as the hypothetico-deductive method. Deduction and induction are two important aspects of scientific research through which the answers to a research question can be arrived at. Further details on deduction and induction are given below.

Deduction
Deduction is a process by which the researcher arrives at a reasoned conclusion by logical generalization of a known fact. Deduction leads to conclusions that must necessarily follow from the reasons given. The reasons are said to imply the conclusions and represent a proof. The bond between the reasons and conclusions is much stronger than in the case of induction. To be correct, a deduction should be both valid and true. True means that the reasons given for the conclusions must agree with the real world. Valid means that the conclusion must necessarily follow from the reasons. Researchers often use deduction to reason out the implications of various acts and conditions. For example, in a survey a researcher may reason as follows:
Surveying households in an urban area is difficult and expensive (Reason 1)
The study involves interviews with households in an urban area (Reason 2)
The interviews in this survey will be difficult and expensive (Conclusion)

Induction
Induction is a process in which certain phenomena are observed and conclusions are arrived at on the basis of those observations. The conclusions are drawn from one or more facts or pieces of evidence. The conclusions in induction result in hypotheses. Induction leads to establishing a general proposition based on observed facts. For example, the researcher understands that production processes are the prime feature of factories. It is therefore concluded that factories exist for production purposes.
Research is based on both deduction and induction. It helps us to understand, explain and predict business phenomena.

The building blocks of scientific inquiry include the following sequence:
1. Observing a phenomenon
2. Identifying a problem
3. Constructing a theory
4. Developing hypotheses
5. Developing a research design
6. Collecting data
7. Analyzing data
8. Interpreting results
Observation of a phenomenon may be casual or purposeful. A casual scanning of the environment may lead us to the knowledge of interesting facts. This observation may lead to identifying the problem in the area concerned. Problem identification needs the gathering of primary data from the customers, employees or management concerned with the particular problem. Further insights may be obtained to refine the problem in a more specific manner. The next step is to build a conceptual model or theoretical framework, taking into consideration all the factors contributing to the problem. The framework makes it possible to integrate all the information collected in a meaningful manner. From this theoretical framework several hypotheses can be generated and tested to support the concept. A research design provides the blueprint of the mechanism, or insight, regarding the methods of collecting data, analyzing the same and interpreting them in order to solve the problem.
The building blocks of science discussed above provide the genesis for the hypothetico-deductive method of scientific inquiry. The steps are discussed below:

1. Observation
Observation is the first stage in scientific investigation. In this process, the researcher takes into account the changes that are occurring in the environment. To warrant further investigation, the changes observed in the environment should have important consequences. The changes may take the form of a sudden drop in sales, an increase in employee turnover, a decrease in the number of customers and the like.

2. Preliminary information gathering
This involves seeking in-depth information regarding the facts being observed. The information may be gathered through formal questionnaires, interview schedules or informal, casual talk with the people concerned. Desk research may also be conducted to enrich the information gathered. The next step is to make sense of the factors identified in the information gathering stage by assembling them in a meaningful manner.

3. Formulation of theory
Theory formulation makes it possible to integrate all the information in a logical manner so as to conceptualize and test the factors responsible for the problem. The critical variables contributing to the problem are examined. The association or relationship among the variables contributing to the problem is studied in order to formulate the theory.

4. Developing hypotheses
The next logical step is the framing of testable hypotheses. Hypothesis testing is called deductive research. Sometimes hypotheses that were not originally formulated get generated through the process of induction. After the collection of data an insight may occur, on the basis of which new hypotheses can be formulated. Thus hypothesis testing through deductive research and hypothesis generation through induction are both common.

5. Scientific data collection
After the hypotheses are developed, data with respect to each variable in the hypotheses need to be obtained in a scientific manner so as to test the hypotheses. Both primary and secondary sources can be explored in order to collect the data. Data on every variable in the theoretical framework from which the hypotheses are generated should be collected.

6. Data analysis
The data gathered are to be statistically analyzed to validate the hypotheses postulated. Both qualitative and quantitative data need to be analyzed. Qualitative data refer to information gathered through interviews and observations. Through scaling techniques the qualitative data can be converted into quantifiable form and subjected to analysis. Appropriate statistical tools should be used to analyze the data.

7. Deduction
Deduction is the process of arriving at conclusions by interpreting the meaning of the results of the data analysis. Based on the deduction, recommendations can be made to solve the problem encountered.

1.7 Research Process: An Overview
The research process involves the execution of a series of phases towards the accomplishment of the objectives of research. The phases need not be carried out in strict sequence; some of them can be carried out simultaneously. However, the idea of sequence is useful for developing and carrying out a research study in a systematic manner. The research process consists of the following distinctive, interrelated phases: (1) Defining the research problem (2) Establishing research objectives (3) Developing the research design (4) Preparing a research proposal (5) Data collection (6) Data analysis and interpretation and (7) Research reporting.

1.7.1 Defining the Research Problem
A problem need not necessarily mean that something is wrong in the current situation which needs to be rectified immediately. It simply indicates an issue for which finding a solution could help to improve an existing situation. A problem can be defined as any situation where a gap exists between the actual and the desired states. The problem statement or problem definition is a clear, precise and succinct statement of the question or issue that is to be investigated with the goal of finding an answer or solution.

Components of research problem
The components of a research problem, as suggested by R.L. Ackoff in The Design of Social Research, are elaborated below:
There must be an individual or a group which has some difficulty or problem.
There must be some objective(s) to be attained.
There must be alternative means or courses of action for attaining the objectives.
There must be some doubt in the mind of the researcher with regard to the selection of alternatives.
There must be some environment to which the difficulty pertains.

Criteria for selecting the research problem
The following criteria can be kept in mind by the researcher in selecting the research problem:
Subjects on which ample research has already been carried out should not normally be chosen, as there will be little that is new to reveal.
Problems that are too narrow or too vague should be avoided.
The researcher should be familiar with the subject chosen for research.
The researcher should have enough knowledge, qualification and training in the selected problem area.
The resources needed to solve the problem in terms of time, money, effort and manpower should be taken into account before embarking on a problem.
The subject of research should be familiar and feasible so that related research material or sources of research can be obtained easily.
The selection of a problem must be preceded by a preliminary study.
Research problems trigger the research process. Defining the research problem is a critical activity. A thorough understanding of the research problem is a must for achieving success in the research endeavor. Defining the research problem begins with identifying the basic dilemma that prompts the research. It can be further developed by progressively breaking down the original dilemma into more specific and focused objectives. Five steps can be envisaged: (1) Identifying the broad problem area (2) Literature review (3) Identifying the research question (4) Refining the research question (5) Developing investigative questions. They are discussed below.

1.7.1.1 Identifying the broad problem area
The process begins with specifying the problem at the most general level, e.g., declining sales, increased cost, increased employee turnover etc. From this general specification of the problem the next step is to move towards the question. The question restates the general problem, e.g., What is the reason for declining sales?
The questions that can be raised can be grouped into three categories: (1) Choice of purposes or objectives, where the question focuses on what objectives the researcher wishes to achieve by conducting the research (2) Generation and evaluation of solutions, where the question focuses on the alternatives available to solve the problem in hand (3) Troubleshooting or control situations, where the query focuses on monitoring and diagnosing why an organization is not achieving its established goals.
The researcher can identify the problem through the following sources:
Own experience, as well as observation of others' experience and situations, may give rise to a researchable problem.
Detailed discussion with various authorities concerned with the problem.
Focus group interviews.
Scrutinizing the published data.
Review of literature, which makes it possible to identify problems that have been researched and questioned in other studies. The same can be simulated by the researcher.
The above techniques enable the researcher to understand the problem better and also to outline the possible variables that might exert an influence. The nature of information needed by the researcher can be broadly classified under three headings:
1. Background information on the organization for which the research is conducted, viz., the origin and history of the company, its assets, number of employees, location etc. The information can be obtained from company records, published data, the Census of Business and Industry and the web.
2. Information regarding managerial philosophy, company policies and other structural aspects, which can be collected by asking direct questions of the management.
3. Information regarding the perceptions, attitudes and behavioural aspects of employees, which can be obtained by way of observation, interviews and questionnaires.

1.7.1.2 Literature survey
Literature survey is the review of published and unpublished work from secondary sources in the area of interest to the researcher.
The purposes of conducting a literature survey at this stage are:
To document the studies relevant to the problem identified for research
To ensure that no variable that has been taken up in past related studies is ignored
To avoid conducting a similar type of study and thereby stop the researcher from investing time and effort in a research venture that has already been solved
To provide a good framework and a solid foundation to proceed further in the investigation
To have a comprehensive theoretical framework from which hypotheses can be developed for testing
To enable the problem statement to be developed in a precise and clear manner
To enhance the testability and replicability of the findings of the current research
To understand the research gap
To stimulate the researcher to carry out the work
To confirm the appropriateness of the procedure by referring to similar studies conducted in the past
To trace inconsistencies, contradictions and consistencies
To clarify conceptualization
To become familiar with methodology, research tools and statistical analysis
The literature review needs to be performed on the variables identified through the interview process. It comprises three steps, viz., (i) Identifying the sources (ii) Gathering relevant information (iii) Writing up the literature review.

i. Identifying the sources
The data can be obtained from a library by going through books, journals, newspapers, magazines, conference proceedings, doctoral dissertations, theses, government publications and other reports. The development of information technology has led to many online databases such as Prowess and EBSCO, and the interlinking of libraries has put a vast amount of information in the hands of the researcher at the click of a mouse. Computerized databases include bibliographies, abstracts and full text of articles. Bibliographic databases display only the bibliographic citations, i.e., the name of the author, the title of the article or journal, source of publication, year, volume and page numbers. Abstract databases, in addition to the above said information, provide an abstract or summary of the articles.
Full-text databases, as the name suggests, make it possible to download the full text of the article.

ii. Gathering relevant information
The articles gathered from books, journals or online sources act as a reservoir of information. These sources can lead to further information through the citations and references they use. The list of journals and references cited in the articles can lead us further to the source of information. Also, during the course of reading the articles, the researcher can get insight into new variables or new avenues hitherto unexplored.

iii. Presenting the literature review
The literature should be presented in a clear and logical manner, citing the author, year of study, objectives of the research, major findings and implications. The researcher should present the literature in chronological order and in a coherent manner. There are several methods of citing references in the literature. The Publication Manual of the American Psychological Association (2001) offers detailed information regarding citations, quotations, references and so on. The Chicago Manual of Style also prescribes a format.

1.7.1.3 Identifying the research problem/question
The next step is converting the broad problem into a research question. The research question is fact-oriented and requires the gathering of information. A research question states the objective of the research study. It is a more specific question that must be answered. There can be a single research question or more than one.

1.7.1.4 Refining the research question
The refined research question will have better focus and will make it possible to conduct the research with more clarity than the initially formulated question. In addition to fine-tuning the original question, other activities related to the research question should be addressed in this phase to enhance the quality of the research work, viz.,
1. Examine the concepts and constructs used in the study.
2. Review the research questions and break them down into second and third level questions.
3. Check whether the hypotheses are postulated in a proper and standard manner.
4. Identify what is not included in the scope of the research questions.
If the research questions are well defined, the sub-questions can be easily arrived at. However, if the research question is poorly defined, the researcher will need further exploration and question revision to refine the original question and generate the material for constructing the investigative questions.

1.7.1.5 Developing investigative questions
Investigative questions are questions that the researcher must answer satisfactorily to arrive at a conclusion about the research question. To formulate them, the researcher should break down the research question into more specific questions for which data are to be gathered. This fractioning process can be continued down to several levels with increasing specificity. The investigative questions guide the development of a suitable research design. They are the foundation for creating the research data collection instrument. In developing the investigative questions, performance considerations, attitudinal issues and behavioral issues can be included, depending on the research problem.

The problems in defining research questions
Some problems that may arise in defining the research questions are discussed below:
The researcher may recast the management question so that it is amenable to the researcher's favorite methodology.
The existence of a pool of information or a database may distract the researcher, seemingly reducing the need for other research.
Not all management questions are researchable. To be researchable, a question must be one for which observation or other data collection can provide the answer.
Some problems are complex, value-laden and bound by constraints. These ill-defined questions have characteristics that are virtually the opposite of those of well-defined problems. They require a thorough exploratory study before proceeding.

1.7.2 Establishing Research Objectives
The research objectives should be set once the research problem is finalized.
Research objectives provide the guidelines for determining the other steps to be undertaken in the research process. If the objectives are achieved, the decision maker will have the information needed to solve the problem. Research objectives justify the need for undertaking the research work. They provide a purpose and direction for the research.

1.7.3 Developing the research design
A research design is the specification of methods and procedures for acquiring the information needed to structure or to solve problems. It is the overall operational pattern or framework of the project that stipulates the information to be collected, the sources from which it can be collected and the procedures for collecting it. In other words, the researcher should consider (1) the design technique, (2) the type of data, (3) the sampling methodology and procedures, and (4) the schedule and the budget. A good research design ensures that the information obtained is relevant to the research problem and that it is obtained in an objective and economical manner. The research design can be described as a master plan, model or blueprint for the conduct of the investigation.

1.7.3.1 The type of research design
Most research objectives can be met by using one of three types of research design: exploratory, descriptive and causal.
Exploratory research focuses on collecting data using an unstructured format or informal procedures to capture data and to interpret them. It is often used to classify the problems or opportunities, and it is not intended to provide conclusive information from which a particular course of action can be determined.
Descriptive research uses a set of scientific methods and procedures to collect raw data and create data structures that describe the existing characteristics of a defined target population, e.g., the profile of the consumers, the pattern of purchase behaviour etc.
In a descriptive research design the researcher looks for answers to the how, who, what, when and where questions concerning the different components of a market structure. The data and information generated through descriptive designs can provide decision makers with evidence that can lead to a course of action.
Causal research design deals with collecting raw data and creating data structures and information that allow the decision maker or researcher to model cause-effect relationships between two or more market variables. Causal research designs make it possible to identify, determine and explain the critical factors that affect decision making. However, the research process is more complex, expensive and time-consuming.

1.7.3.2 The type of data
Data can be grouped into two broad categories: primary and secondary. Primary data are first-hand raw data collected specifically for the current research problem. They are raw, unprocessed and yet to receive any meaningful interpretation. Primary data are typically the output of conducting some type of exploratory, descriptive or causal research. Secondary data are historical data previously collected and assembled for some other research problem. Secondary data can usually be gathered faster and more economically than primary data; however, they may not fit the researcher's information need. Secondary data can be obtained from libraries, websites, and published as well as unpublished documents.

1.7.3.3 Sampling methodology and procedure
Sampling refers to selecting a subgroup of people or objects from the overall membership pool of a defined target population. Sampling plans can be broadly classified into probability and non-probability sampling. In a probability sampling plan, each member of the defined target population has a known chance of being drawn into the sample (an equal chance in the case of simple random sampling). Probability sampling gives the researcher the opportunity to assess the sampling error. In non-probability sampling the research findings cannot be generalized and the sampling error cannot be assessed; the findings are limited to the sample that provided the original raw data.
However, non-probability sampling may be the only choice where the population cannot be ascertained. (Sampling is discussed in more detail in Unit 3.)

1.7.3.4 The time schedule and the budget
The time schedule for completing the research, along with the break-up of time required for each task, has to be worked out. Scheduling helps the project to be completed on time. A budget displays the sources and application of funds for the research. The budget may require less attention in the case of an in-house project or one funded by the researcher. However, a budget prepared for a financial grant needs to be drawn up systematically and supported with proper documentation. The budget may be prepared on various bases. For example, in rule-of-thumb budgeting a fixed percentage of some criterion, such as sales or the previous year's research budget, is allotted. Task budgeting selects specific research projects to support on an ad hoc basis.

1.7.4 Preparing a research proposal
The research proposal is an oral or written activity that incorporates the decisions made regarding the research work. It includes the choices the researcher made in the preliminary steps. A written proposal is often made when the study is suggested. It sets out the project's purpose, methodology, time and budget. The length and complexity of the proposal vary according to the needs and desires of the researcher. Irrespective of its length, the proposal should have two basic sections: the statement of the research problem and the research methodology.

1.7.5 Data collection
The data gathering phase begins with pilot testing. It is done to detect weaknesses in the research design and in the questionnaire or interview schedule, and it provides proxy data for the selection of a probability sample. The pilot test should simulate the procedures and protocols designed for data collection. If the study is to be conducted by email, then the pilot questionnaire should be emailed.
The size of the pilot group normally ranges from 25 to 100 respondents, who need not be statistically selected. There are a number of variations of pilot testing, and some of them may be restricted to data collection only. One form is pretesting, where responses are collected from colleagues, respondent surrogates or actual respondents for the main purpose of refining the questionnaire. Based on the pilot test the questionnaire may be redesigned, rephrased and improved. Pretesting may be repeated many times to refine questions or procedures.

Data are the facts presented to the researcher from the study environment. Data can be gathered from a single location or from all over the world, depending on the research objectives and the resource allocation. Data collection methods range from observation, questionnaires and laboratory notes to other modern instruments and devices. Data can
be characterized by their abstractness, verifiability, elusiveness and closeness to the phenomenon. As abstractions, data are more metaphorical than real. When sensory experiences consistently produce the same result, the data are said to be trustworthy because they can be verified. Capturing data is elusive, complicated by the speed at which events occur and the time-bound nature of observation. The truthfulness of data is reflected in their degree of closeness to the phenomenon. Secondary data have at least one level of interpretation inserted between the event and its recording, whereas primary data are closer to the truth.

The data collected need to be edited to ensure consistency and to locate omissions. In the survey method, editing reduces errors in the recording, improves legibility and clarifies unclear and inappropriate responses. Edited data are then converted into an analyzable form. Computers can be used to find missing data, validate data, and edit and code them so that further analyses can be carried out in a valid manner.

1.7.6 Data Analysis and Interpretation
Research is conducted for the purpose of acquiring information. Raw data as such do not provide information; further analysis is needed to extract information from the data. Data analysis involves applying statistical techniques to reduce the accumulated data to a manageable size, leading to summaries. Responses obtained through questionnaires should be analysed to ascertain the behaviour of each variable, the relationships between variables and so on. The analysis should focus on finding answers to the research questions or testing the hypotheses. Various statistical software packages are available to make data analysis easier and more rigorous. However, the interpretation needs to be made with expertise, as the recommendations are based on it.
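As a minimal sketch of the editing and reduction steps described above, the snippet below drops questionnaire records that contain omissions, summarises one variable and measures the relationship between two variables with a Pearson correlation. The field names `age` and `monthly_spend` and all the values are hypothetical, invented only for illustration, and plain Python is used here in place of a dedicated statistical package.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical raw responses; None marks an omitted answer.
raw_responses = [
    {"age": 25, "monthly_spend": 120.0},
    {"age": 32, "monthly_spend": 150.0},
    {"age": None, "monthly_spend": 90.0},   # omission located during editing
    {"age": 41, "monthly_spend": 210.0},
    {"age": 38, "monthly_spend": None},     # omission located during editing
    {"age": 50, "monthly_spend": 260.0},
]

def edit_responses(rows):
    """Keep only complete records (one simple form of data editing)."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

clean = edit_responses(raw_responses)
ages = [r["age"] for r in clean]
spend = [r["monthly_spend"] for r in clean]

# Reduce the accumulated data to a small summary.
summary = {
    "n": len(clean),
    "mean_spend": mean(spend),
    "sd_spend": stdev(spend),
    "r_age_spend": pearson_r(ages, spend),
}
print(summary)
```

The summary only describes the sample; deciding whether the observed relationship is meaningful still calls for the interpretation and hypothesis testing discussed in this unit.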
1.7.7 Research Report
It is only through reports that the researcher communicates the research work, findings and recommendations to the outside world. The report has to be prepared in a style that will be understood by the target audience. Reports may be communicated as written documents, orally, through letters, through telephone calls or through a combination of these. The type of report varies with the type of research, the length of the report and its purpose. The researcher should ensure that the report addresses all the objectives of the research in a lucid manner. The report should be adapted to the needs of the target audience, and care must be taken to use appropriate words in presenting the interpretation, recommendations and conclusion. A report should contain an executive summary consisting of a synopsis of the problem, findings and recommendations. It should cover the background of the study, the statement of the problem, the literature summary, methods and procedures, findings, recommendations and conclusion. A detailed discussion of report writing follows in Unit 5.

1.8 Theoretical Framework
A theoretical framework is a conceptual model of how one theorizes or makes logical sense of the relationships among the several factors that have been identified as important to the research problem. Put simply, a theoretical framework involves identifying the network of relationships among the variables considered important to the study. It provides the conceptual foundation for proceeding further with the research. The theory is developed from the documentation of previous research studies undertaken in the relevant study area or on similar problems. Understanding the conceptual framework makes it possible to postulate hypotheses and test the relationships. A testable hypothesis can be developed to examine whether the theory formulated is valid or not. The hypothesized relationship can be tested by means of suitable statistical techniques.
In the case of applied research, testable hypotheses need not be derived from the theoretical framework, but the framework is still important because it provides a background for understanding the problem researched. Thus the entire research process rests on the soundness of the theoretical framework. Background knowledge of variables is necessary to understand the relationships and so to formulate testable hypotheses. A variable, as the name suggests, takes varied values. The values may differ at various times for the same object or person, or at the same time for different objects or persons. For example, age is a variable: it differs between consumers, and for a single consumer it changes over time.

1.8.1 Types of variables
There are many types of variables, such as dependent, independent, moderating, intervening, discrete, continuous and extraneous variables.

1.8.1.1 Dependent variable
As the name suggests, the value of a dependent variable is influenced by other variables. It is the main variable of interest to the researcher. Understanding the variables that influence the dependent variable will lead to finding solutions to the problem. For this purpose the researcher will be interested in quantifying and measuring the dependent variable as well as the other variables that influence it. For example, the sales of an organization are a dependent variable: the sales value depends on demand, the price fixed, environmental factors and so on, and it also varies from time to time. There can be more than one dependent variable in a study. In this case the researcher may be interested in the factors that influence all the dependent variables and in the difference in the degree of variance among the different dependent variables.

1.8.1.2 Independent variables
An independent variable influences the value of the dependent variable in either a positive or a negative way. The variance in the dependent variable is accounted for by the independent variable. With each unit of increase in the independent variable, the dependent variable may increase or decrease. To establish a causal relationship, the independent variable is manipulated. For example, the age of a customer may influence the choice of a product. Here age is the independent variable and the choice of product is the dependent variable.

[Figure: Age (independent variable) -> Choice of product (dependent variable)]

1.8.1.3 Moderating variable
A variable that moderates the relationship between the dependent and independent variables is called a moderating variable. It has a strong contingent effect on the relationship between the independent and the dependent variable: the presence of this third variable modifies the original relationship. In the example discussed above, the price of the product is a moderating variable. Although age influences the choice of the product, the price may moderate that influence.

[Figure: Age -> Choice of product, with Price as the moderating variable acting on the relationship]

1.8.1.4 Intervening variables
An intervening variable is one that surfaces between the time the independent variable starts operating to influence the dependent variable and the time its impact is felt on it. The intervening variable surfaces as a function of the independent variables operating in any situation and helps to conceptualize and explain the influence of the independent variables on the dependent variable.

[Figure: Age (independent variable) -> Attitude (intervening variable) -> Choice of product (dependent variable)]
1.8.2 Theoretical Framework: The need and features
The theoretical framework is the foundation on which the entire research is carried out. It is a logically developed, described and elaborated network of associations among the variables deemed to be relevant to the
problem situation and identified through such processes as interviews, observations and literature survey. Experience and intuition can also be drawn on in developing the theoretical framework. To arrive at good solutions to the problem, correct identification of the problem and of the variables contributing to it is a must. After identifying the variables, the next step is to elaborate the network of associations among them. This enables the formulation of hypotheses that can subsequently be tested.

The literature survey provides a solid foundation for developing the theoretical framework. Through it, the important variables are identified from previous research findings. This forms the basis for a theoretical model. The theoretical framework elaborates the relationships among the variables, explains the theory underlying these relationships and describes their nature and direction. The theoretical foundations provide the basis for developing testable hypotheses.

The following are the basic features of a theoretical framework:
- The variables influencing the research problem should be clearly identified, defined and discussed.
- The discussion should highlight the relationships between the variables so identified.
- The type of relationship, for example positive or negative, should be highlighted.
- The reason for assuming the type of relationship should be stated, drawing on the previous research studies identified through the literature review.
- A model showing the relationships among the variables can be given so that the concepts can be visualized and understood clearly by the reader.

1.9 Hypothesis: Types and Testing procedure
A hypothesis can be defined as a logically conjectured relationship between two or more variables expressed in the form of a testable statement. Relationships are assumed on the basis of the network of associations established in the theoretical framework.
Formulating such testable statements is called hypothesis development. Hypotheses can be grouped on the following bases:

1. Statement of hypotheses
Hypotheses can be expressed either as propositions or in the form of if-then statements. Examples:
- Aged customers will be inclined to take an insurance policy.
- If customers are aged, then they will be inclined to take an insurance policy.

2. Directional and nondirectional hypotheses
A hypothesis that indicates the type or direction of the relationship between variables is called a directional hypothesis. In specifying the relationship, terms such as positive, negative, more than, less than and the like are used. Example: High-income consumers spend more on consumer durables.
Nondirectional hypotheses postulate a relationship but offer no indication of its direction. Example: There is a relationship between the education of the respondent and the importance given to the information source.
Nondirectional hypotheses are formulated where previous studies have not explored the direction of the relationship or there is no evidence on which to assume a direction. Conflicting findings in previous research studies can also be a reason for formulating a nondirectional hypothesis.

3. Null and alternative hypotheses
A null hypothesis states that there is no significant relationship between the variables. A null hypothesis may also state that there is no difference between the population characteristics and those of the sample being studied. The null hypothesis implies that any difference between two sample groups, or any relationship between two variables found in the sample, is simply due to random sampling fluctuations and not to any true difference between the two population groups. The alternative hypothesis is the opposite statement, namely that a relationship or difference does exist. The null hypotheses so formulated are tested for possible rejection.
A null hypothesis may state that the population correlation between two variables is equal to zero or that the difference in the means of two groups in the population is equal to zero.
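In symbols, these two forms of null hypothesis can be written as follows, each paired with a nondirectional alternative. The notation is an assumption added for illustration: $\rho$ denotes the population correlation, and $\mu_1$ and $\mu_2$ denote the two population means.

```latex
\begin{align*}
H_0 &: \rho = 0            & H_A &: \rho \neq 0 \\
H_0 &: \mu_1 - \mu_2 = 0   & H_A &: \mu_1 - \mu_2 \neq 0
\end{align*}
```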
Hypothesis generation and testing can be done through both induction and deduction. In deduction, the theoretical model is developed first, testable hypotheses are formulated on the basis of the theoretical framework, data are collected and then the hypotheses are tested. In the inductive process, new hypotheses are formulated from facts already collected and are then subjected to testing. The findings add to knowledge and help to build a theoretical framework.

1.9.1 Hypothesis testing: Meaning and Approaches
The purpose of hypothesis testing is to determine the accuracy of the hypotheses, given that the data are collected from a sample and not from the entire population. The accuracy of a hypothesis is evaluated by determining the statistical likelihood that the data reveal true differences and not random sampling error. There are two approaches to hypothesis testing: the classical or sampling-theory approach and the Bayesian approach.

The classical approach is the one most used in research applications. It represents an objective view of probability, and the decision is based entirely on an analysis of the available sample data. A hypothesis is accepted or rejected on the basis of the sample data collected. A sample is likely to differ from the population at least to a small extent, and hence it is necessary to know whether the differences are statistically significant or insignificant. A difference is statistically significant if there is good reason to believe that it does not represent random sampling fluctuations only.

Bayesian statistics also uses sample data for decision making but goes beyond them and considers all other available information. The additional information consists of subjective probability estimates stated in terms of degrees of belief. The subjective estimates are based on general experience rather than on specific data collected.
They are expressed as a prior distribution that can be revised after sample information is gathered. The revised estimate, known as the posterior distribution, can be further revised by additional information, and so on. Various decision rules are established, cost and other estimates can be introduced, and the expected outcomes of combinations of these elements are used to judge the decision alternatives.

1.9.2 Statistical testing procedure
The sequence for testing a hypothesis is as follows:

1. State the null hypothesis. The null hypothesis is mostly used for statistical purposes. It is used where the researcher is interested in testing a hypothesis of change or difference.

2. Choose the statistical test. An appropriate statistical test must be chosen to test the hypothesis. The criteria for selecting the test may include the type of sample used, the nature of the population, the type of measurement scale used and so on.

3. Select the desired level of significance. The level of significance must be decided before the data are collected. The most commonly used level is .05, although .01 is also widely used. Other significance levels are .10, .025 and .001. The level of significance is determined by the extent of risk the researcher is willing to accept of rejecting a null hypothesis that is in fact true. The larger the level of significance, the higher this risk.

4. Compute the calculated difference value. Once the data are collected, the formula of the selected statistical test is used to obtain the calculated value.

5. Obtain the critical test value. After obtaining the calculated value, the critical value is obtained from the appropriate statistical table. The critical value is the criterion that separates the region of rejection from the region of acceptance of the null hypothesis.

6. Interpret the test. If the calculated value is larger than the critical value, the null hypothesis is rejected and the alternative hypothesis is accepted. If the critical value is larger, the null hypothesis is accepted, that is, it cannot be rejected.

1.9.3 Parametric and Nonparametric test
There are two general classes of significance tests: parametric and nonparametric.
Parametric tests are used when the data are derived from interval and ratio measurements. Nonparametric tests are used to test hypotheses with nominal and ordinal data. The following assumptions are made for parametric tests:
- The observations must be independent, i.e. the selection of one case should not affect the chances of any other case being included in the sample.
- The observations should be drawn from normally distributed populations.
- The populations should have equal variances.
- The measurement scales should be at least interval so that arithmetic operations can be used with them.

Nonparametric tests have fewer assumptions. They are easy to understand and simple to use. They do not specify normally distributed populations or homogeneity of variance. Some tests require independence of cases; others are designed for related cases. Nonparametric tests are the only ones usable with nominal data, and they are the only technically correct tests to use with ordinal data. Nonparametric tests can also be used with interval and ratio data, although some of the available information is then wasted. Nonparametric tests can be almost as efficient as parametric tests: a nonparametric test with a sample of 100 will provide the same statistical testing power as a parametric test with a sample of 95.

1.9.4 Types of test
Five different types of test can be applied to test hypotheses: one-sample tests, two independent samples tests, two related samples tests, k independent samples tests and k related samples tests. The test to be used has to be selected on the basis of the following criteria:
- The number of samples involved, i.e. one sample, two samples or k samples.
- Where two or more samples are involved, whether the individual cases are independent or related.
- The type of scale used, i.e. nominal, ordinal, interval or ratio.
The following section explores the type of test to be used on the basis of these three criteria.

I. One-sample Tests
One-sample tests are used when a single sample is taken and the test is undertaken to determine whether the sample comes from a specified population.

Parametric tests
The parametric Z or t-test can be used to determine the statistical significance of the difference between a sample distribution mean and a parameter. When sample sizes exceed 120, the t and Z distributions are virtually identical.

Non-parametric tests
Different non-parametric tests may be used in the one-sample case, depending on the measurement scale used and other conditions. If the measurement scale is nominal, the binomial or the chi-square test can be used. The binomial test is appropriate when the population is viewed as having only two classes, such as male and female or buyers and non-buyers, and all observations fall into one of these categories. It is useful when the sample is so small that the chi-square test cannot be used.

The chi-square test is the most widely used non-parametric test of significance. It is particularly useful in tests involving nominal data but can also be used for higher scales. Using this technique, the significance of the differences between the observed distribution of data among categories and the distribution expected under the null hypothesis is tested. The test can be used with one sample, two independent samples or k independent samples. It must be calculated with actual counts rather than percentages. The formula for the chi-square ($\chi^2$) test is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where
$O_i$ = observed number of cases categorized in the ith category
$E_i$ = expected number of cases in the ith category under $H_0$
$k$ = the number of categories

II. Two-independent samples tests
The need for two independent samples tests is often encountered in research. For example, one can compare the purchasing predisposition of samples of subscribers from two magazines to discover whether they come from the same population.

Parametric tests
The Z and t-tests are the parametric tests most frequently used for independent samples, although the F test can also be used. The Z test is used with sample sizes exceeding 30 for both independent samples, or with smaller samples when the data are normally distributed and the population variances are known. The formula for the Z test is

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
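The testing sequence of section 1.9.2 can be illustrated with this Z statistic. The sketch below is a minimal example under stated assumptions: the summary figures for the two subscriber samples are hypothetical, the hypothesized difference between the population means is zero, a two-tailed test at the .05 level is used, and the critical value is taken from Python's standard `statistics.NormalDist` rather than from a printed table. The helper name `two_sample_z` is introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """z statistic for the difference between two independent sample means."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Hypothetical summary data; both samples exceed 30 cases.
mean1, var1, n1 = 52.0, 100.0, 50   # subscribers of the first magazine
mean2, var2, n2 = 47.0, 144.0, 72   # subscribers of the second magazine

# Steps 1-3: H0 is "no difference in means"; the Z test is chosen; alpha is fixed in advance.
alpha = 0.05

# Step 4: compute the calculated value.
z_calculated = two_sample_z(mean1, mean2, var1, var2, n1, n2)

# Step 5: obtain the two-tailed critical value from the standard normal distribution.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Step 6: interpret the test.
reject_null = abs(z_calculated) > z_critical
print(f"z = {z_calculated:.3f}, critical = {z_critical:.3f}, reject H0: {reject_null}")
```

With these invented figures the calculated value exceeds the critical value, so the null hypothesis of no difference would be rejected at the .05 level.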
In the case of small sample sizes, normally distributed populations and assumed equal population variances, the t-test is appropriate:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where $(\mu_1 - \mu_2)_0$ is the hypothesized difference between the two population means and $S_p^2$ is the pooled variance estimate:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

Non-parametric tests
The chi-square test is appropriate for situations in which a test for differences between samples is required. It is especially valuable for nominal data, but it can also be used with ordinal measurements. The formula differs slightly from the earlier one:

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where
$O_{ij}$ = observed number of cases categorized in the ijth cell
$E_{ij}$ = expected number of cases under $H_0$ to be categorized in the ijth cell

III. Two Related Samples Tests
Two related samples tests are used in situations in which persons, objects or events are closely matched or the phenomena are measured twice. For example, the efficiency of workers can be measured before and after training.

Parametric tests
The t-test for independent samples is inappropriate here because one of its assumptions is that the observations are independent. This problem is solved by a formula in which the difference is found between each matched pair of observations, thereby reducing the two samples to the equivalent of a one-sample case. In the following formula, the average difference $\bar{D}$ corresponds to the normal distribution when the standard deviation of the differences is known and the sample size is sufficient. The statistic t with (n - 1) degrees of freedom is defined as

$$t = \frac{\bar{D}}{S_D / \sqrt{n}}$$

where

$$\bar{D} = \frac{\sum D}{n}, \qquad S_D = \sqrt{\frac{\sum D^2 - \dfrac{(\sum D)^2}{n}}{n - 1}}$$
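A minimal sketch of this paired computation follows. The before-and-after efficiency scores are hypothetical, and the helper name `paired_t` is an assumption introduced for illustration. The resulting statistic would then be compared with the critical t value for n - 1 degrees of freedom.

```python
from math import sqrt

def paired_t(before, after):
    """Return (t, degrees of freedom) for matched pairs of observations."""
    # D: difference within each matched pair.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    # S_D = sqrt((sum(D^2) - (sum(D))^2 / n) / (n - 1))
    s_d = sqrt((sum(d * d for d in diffs) - sum(diffs) ** 2 / n) / (n - 1))
    return d_bar / (s_d / sqrt(n)), n - 1

# Hypothetical efficiency scores of five workers before and after training.
before = [60, 65, 58, 70, 62]
after = [66, 68, 63, 74, 69]

t_value, df = paired_t(before, after)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

Working with the differences is what reduces the two related samples to a single sample, as the text describes.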
Nonparametric tests
The McNemar test may be used with either nominal or ordinal data and is especially useful with before-after measurement of the same subjects.

IV. K Independent Samples Tests
K independent samples tests are commonly used in management and economic research when three or more samples are involved. The test is concerned with whether the samples might come from the same or identical populations. When the data are measured on an interval or ratio scale and the necessary assumptions are met, analysis of variance and the F test are used. If the assumptions cannot be met, or if the data are measured on an ordinal or nominal scale, a nonparametric test should be selected. The samples are assumed to be independent.

Parametric tests
Analysis of variance (ANOVA) is a statistical method for testing the null hypothesis that the means of several populations are equal. To use ANOVA, certain conditions must be met:
- The samples must be randomly selected from normal populations.
- The populations should have equal variances.
- The distance from one value to its group's mean should be independent of the distance of other values to that mean.

ANOVA breaks down, or partitions, total variability into component parts. It uses squared deviations so that the distances of the individual data points from their own mean or from the grand mean can be summed. In the ANOVA model each group has its own mean and values that deviate from that mean. Similarly, all the data points from all of the groups produce an overall grand mean. The total deviation is the sum of the squared differences between each data point and the overall grand mean. The total deviation of any particular data point may be partitioned into between-groups variance and within-groups variance. The between-groups variance represents the effect of the treatment or factor. Differences between group means imply that each group was treated differently, and the treatment will appear as deviations of the sample means from the grand mean. The within-groups variance describes the deviations of the data points within each group from the sample mean.
This results from variability among subjects and from random variation, and it is often called error. When the variability attributable to the treatment exceeds the variability arising from error and random fluctuations, the viability of the null hypothesis begins to diminish. The test statistic for ANOVA is the F ratio, which compares the variance from the two sources:

$$F = \frac{\text{Between-groups variance}}{\text{Within-groups variance}} = \frac{\text{Mean square between}}{\text{Mean square within}}$$

where

$$\text{Mean square between} = \frac{\text{Sum of squares between}}{\text{Degrees of freedom between}}, \qquad \text{Mean square within} = \frac{\text{Sum of squares within}}{\text{Degrees of freedom within}}$$

V. K Related Sample Case
Parametric test
A k related samples test is required in the following situations:
- The grouping factor has more than two levels.
- The observations or subjects are matched, or the same subject is measured more than once.
- The data are at least interval.
In this method it is often necessary to measure subjects several times. These repeated measurements are called trials. The repeated-measures ANOVA is a special type of n-way analysis of variance.

Nonparametric test
When k related samples are measured on a nominal scale, the Cochran Q test is appropriate. The test extends the McNemar test. It tests the hypothesis that the proportion of cases in a category is equal for several related categories. When the data are ordinal, the Friedman two-way analysis of variance is appropriate. It tests matched samples, ranking each case and calculating the mean rank for each variable across all cases. It uses ranks to compute a test statistic.

The basic aspects of a research design
The need and the major types of research design

1.10 Research Design
The research design is a blueprint for action. It involves a series of rational decision-making choices regarding the purpose of the study, its scope, its location, the type of investigation, the extent to which it is controlled and manipulated by the researcher, the time aspects, and the collection, measurement and analysis of data. It is a plan and structure for obtaining answers to the research questions. It helps the researcher to allocate resources in a well-defined manner. The more sophisticated and rigorous the research design, the better the outcome of the research.

The essentials of research design:
- It is an activity- and time-based plan.
- The design is based on the research questions.
- The design guides the selection of sources and types of information.
- It is a framework for specifying the relationships among the study's variables.
- It outlines the procedures for every research activity.

The overall research design can be split into the following parts:
- The sampling design, which deals with the method of selecting the samples for the study.
- The observational design, which deals with the conditions under which the observations are made.
- The statistical design, which is concerned with the number of samples to be observed and how the data gathered are to be analyzed.
- The operational design, which relates to the techniques by which the procedures specified in the sampling, statistical and observational designs can be carried out.

In a nutshell, the research design should contain the following:
- A clear statement of the research problem
- The procedures and techniques proposed for gathering information
- The population involved in the study
- The methods to be used in processing and analyzing the data

1.10.1 Need for research design
The research design has to be prepared for the following reasons. Research design is the blueprint of the proposed research to be conducted.
It enables the researcher to plan the various activities and provides insight into the difficulties that may arise, so that the researcher can be prepared to tackle them.
Since the research design is the plan for the sampling procedure, the data collection method and the other activities to be performed in the proposed research, it can be discussed with others; based on their critical comments, flaws and inadequacies can be corrected, leading to an effective research design.
It gives an idea of the resources required in terms of money, manpower, time and effort
It enables the smooth and efficient conduct of the various research operations
The research design affects the reliability of the research findings and as such constitutes the foundation of the entire research work

1.10.2 Classification of research designs
Every researcher is faced with the task of identifying a suitable research design to carry out the study. The following section explores the types of research design, based on select criteria.

Criteria                              Types
Method of data collection             Monitoring; Interrogation/communication
Researcher's control of variables     Experimental; Ex post facto
Purpose of study                      Descriptive; Causal
Time dimension                        Cross-sectional; Longitudinal
Scope                                 Case; Statistical study
Research environment                  Field setting; Laboratory research; Simulation
Participants' perception              Actual routine; Modified routine
Type of investigation                 Causal; Correlational
Unit of analysis                      Single; Dyad; Group; Organization/Nation
Extent of crystallization             Formal study; Exploratory study
1. Method of data collection
The research study may take the form of monitoring or interrogation. Monitoring covers studies in which the researcher inspects the activities of a subject without attempting to elicit a response from anyone, for example observing the behaviour of consumers in a departmental store. In an interrogation/communication study the researcher questions the subjects and collects their responses. The data can be collected through questionnaires, interviews or experimental methods.

2. Researcher's control of variables
On the basis of the researcher's ability to control or manipulate the variables, two types of research design can be distinguished: experimental and ex post facto designs. In an experimental design, the researcher attempts to control or manipulate the variables in the study. The variables may be held constant or changed in order to observe the effects. An experimental design is appropriate when one wishes to understand and explore the effects of certain variables on others. In an ex post facto design, investigators have no control over the variables. The variables cannot be manipulated. The researcher can only report what has happened or what is happening.

3. Purpose of the study
On the basis of the purpose of the study, two categories of research design can be distinguished: descriptive and causal studies.

Descriptive studies
Research concerned with finding out who, what, where, when or how much is a descriptive study. Descriptive studies are more formalized and structured, with clearly stated hypotheses or investigative questions. They can serve a variety of objectives, namely:
Description of phenomena or characteristics
Estimates of the proportion of a population that has a certain set of characteristics
Discovery of association among different variables.
This is commonly called a correlational study.
Descriptive studies present data in a meaningful form and thus help in understanding the characteristics of a group in a given situation. They enable systematic thinking about the aspects of a given situation, offer ideas for further research and help in making simple decisions. A descriptive study may be simple or complex and can be done in many settings. The simplest descriptive research may study size, form, distribution and the like.

Causal studies
If the researcher is concerned with analyzing how one variable produces changes in another, the study is called a causal study. A causal study attempts to explain the relationship among variables. The concern in causal analysis is how one variable affects, or is responsible for, changes in another variable. Three kinds of relationship can occur between variables:
1. Symmetrical
2. Reciprocal
3. Asymmetrical
In a symmetrical relationship two variables fluctuate together, but it is assumed that changes in neither variable are due to changes in the other. Symmetrical conditions are usually found when two variables are alternate indicators of another cause or independent variable. A reciprocal relationship exists where two variables mutually influence or reinforce each other. An asymmetrical relationship exists where changes in one variable (the independent variable) are responsible for changes in another variable (the dependent variable). The dependent and independent variables are identified on the basis of:
The degree to which each variable may be altered. The variable that is relatively unalterable is called the independent variable.
The time order between the variables. The independent variable precedes the dependent variable.
Four types of asymmetrical relationship can exist:
1. Stimulus-response, where an event or change results in a response from some object. A stimulus is an event or force. A response is a decision or reaction, e.g. a decrease in price might result in an increase in the number of units sold.
2. Property-disposition, where an existing property causes a disposition. A property is an enduring characteristic of a subject that does not depend on circumstances for its activation. A disposition is a tendency to respond in a certain way under certain circumstances, e.g. gender and attitude towards genocide.
3. Disposition-behaviour, where a disposition causes a specific behaviour. A behaviour is an action, e.g.
opinion about the store's image and purchase.
4. Property-behaviour, where an existing property causes a specific behaviour, e.g. family size and purchase of a car.

Testing causal hypotheses
To test causal hypotheses, three types of evidence are sought:
1. Covariation between the variables
2. Time order of events moving in the hypothesized direction
3. No other possible causes for change in the dependent variable

Causation and experimental design
In an experimental design, apart from the above three conditions, two other requirements must be met:
All factors except the independent variable must be held constant and not confounded with another variable that is not part of the study. This requirement is called control and is usually met by using a control group.
Each person in the study must have an equal chance of exposure to each level of the independent variable. This is called random assignment of subjects to groups.

Causation and ex post facto design
Where a research study cannot be carried out experimentally by manipulating variables, subjects who have been exposed to the experimental variable are compared with those who have not.

4. The time dimension
On the basis of the time involved in conducting the study, two classifications are possible: cross-sectional and longitudinal studies. Cross-sectional studies are carried out once and represent a snapshot of the situation at one point in time. Longitudinal studies are repeated over an extended period, so that changes over time can be tracked. In longitudinal studies where a panel is used, the same panel members can be used for the entire period of study. Longitudinal studies require more time and a larger budget. Hence an attempt can be made to use
adroit questions involving past, present and future expectations in the cross-sectional study itself. However, care must be taken in interpreting the findings.

5. Scope
The topic of the research may involve a particular case study or it may be a general (statistical) study. The general study attempts to capture a population's characteristics by drawing inferences from sample characteristics. Hypotheses are formulated and tested using quantitative data, and the findings of the sample study are generalized to the population. These studies have breadth rather than depth. Case studies place more importance on a holistic analysis of fewer events or conditions and their interrelations. They rely to a greater extent on qualitative data and provide valuable insight for problem solving, evaluation and strategy. The details of the problem in hand are collected from multiple sources of information. Generalizations cannot be made from a case study, as the findings are specific to the particular problem in hand. However, a single, well-designed case study can pose a major challenge to a theory and also provide a source of new hypotheses.

6. The research environment
Research may be conducted in actual, manipulated or simulated conditions. Research conducted in actual environmental conditions is called a field study. Studies conducted under staged or manipulated conditions are termed laboratory studies. Simulated studies involve replicating the essence of a system or process. Simulations are used widely in operations research, where the major characteristics of various conditions and relationships in actual situations are often represented in mathematical models. Simulations can also take the form of role playing and other behavioural activities.

7. Participants' perceptions
The usefulness of a design may be reduced when people in a disguised study perceive that the research is being conducted.
Participants' perceptions influence the outcomes of the research. When participants perceive that something out of the ordinary is happening, they may behave less naturally. In this context three situations are likely to arise:
Participants perceive no deviations from the routine.
Participants perceive deviations, but as unrelated to the researcher.
Participants perceive deviations as researcher-induced.

8. Type of investigation
A research study can take the form of a causal or a correlational investigation. A causal study is conducted to establish a definitive cause-and-effect relationship. The intention of the researcher conducting a causal study is to be able to state that variable X causes variable Y. Thus a study in which the researcher wants to delineate the cause of one or more problems is called a causal study. If the researcher wants to identify the important factors associated with the problem, a correlational study is suitable. The type of questions asked and the way in which the problem is defined determine whether a study is causal or correlational.

9. Units of analysis
The research question decides the unit of analysis. If the study collects information from the individual units involved, it is an individual study. If the research involves studying the interaction between two individuals, then several two-person groups, also known as dyads, become the unit of analysis. A group study involves studying a group; for example, a comparison of the motivation level among workers in different departments is a group study. Likewise the unit of study may be the organization or the nation.

10. Extent of crystallization of the research question
The research study can be classified as formal or exploratory.
The classification is made on the basis of the degree of structure and the objective of the study.
(a) Formal study
A formal study begins where an exploratory study ends. It begins with a hypothesis or research question and involves precise procedures and data source specifications. The goal of a formal research design is to test the hypothesis or answer the research questions.
(b) Exploratory study
An exploratory study has a loose structure, with the objective of discovering further research tasks. It is undertaken when the existing knowledge base on the selected problem is very limited or not available. In such cases preliminary research work needs to be done to gain familiarity with the situation and to comprehend the nature of the problem, since very few studies have been conducted in that area. The data are mostly collected through interviews or observation. When the data reveal some pattern regarding the problem at hand, theories are developed and hypotheses are postulated for subsequent testing. The immediate purpose is to develop hypotheses or questions for further research. Exploratory studies are also conducted when some facts are known and more information is needed for developing a viable theoretical framework. The exploratory study is finished when the researcher has achieved the following:
Established the major dimensions of the research task
Defined a set of subsidiary investigative questions that can be used as a guideline for a detailed research design
Developed hypotheses about possible causes of the problem
Learned the boundaries and scope of the proposed research study
Decided that additional effort or further research is not feasible
The objectives of exploration may be accomplished with different techniques. Exploratory study depends to a greater extent on qualitative techniques; however, quantitative techniques may also be used.
Several techniques are available for conducting exploratory investigations:
In-depth interviewing
Participant observation
Films, photographs and videotapes
Projective techniques and psychological testing
Case studies
Street ethnography
Document analysis
Proxemics and kinesics
Combining the approaches listed above, four techniques can be derived:
Secondary data analysis
Experience survey
Focus groups
Two-stage design

i. Secondary data analysis
Data collected by others for the purpose of their own research are called secondary data. If the present study can use the same data, time and money can be saved by not collecting them again. The researcher can explore the organization's archives for data. Reports of prior research studies reveal the successful and unsuccessful methods adopted earlier. Browsing through earlier studies also reveals the less explored problem areas that can be addressed in the present research. The researcher can also look into documents published by outside organizations in the form of books and journals, which can be a rich source of hypotheses. Electronic sources and the library will provide the needed information. A search of secondary sources provides background information about the research to be conducted and a fair idea of the areas to be examined.

ii. Experience survey
An experience survey involves collecting information from people who are experienced or knowledgeable in the particular area of study. The data are drawn from their memories and experiences. The ideas on important
issues and the subject matter can be explored. The investigative format is flexible. The outcome of the interviews may be a new hypothesis, the discarding of an old one, or information on how to conduct the study in a better manner.

iii. Focus groups
A focus group is a panel of people who meet for about 90 minutes to 2 hours and discuss the subject matter, led by a trained moderator. The moderator uses group dynamics to focus or guide the group in the exchange of ideas, feelings and experiences on a specific topic. The focus group is made up of 6 to 10 respondents. Too small or too large a group may not be effective in meeting the objective. The outcome of the focus group is a list of ideas and behavioural observations, with recommendations made by the moderator. The qualitative data produced by the focus group can be used to enrich knowledge of the problem. Depending on the topic, separate focus groups can be run for different subsets of the population. A homogeneous focus group tends to be more effective. Focus groups can be conducted face to face, by telephone, over the internet (e-groups) or through videoconferencing.

iv. Two-stage design
In the exploratory stage the researcher does not know much about the problem in hand but needs to know more before committing further time and resources. A two-stage design is useful in this situation. With this approach exploration becomes a separate first stage with limited objectives: (1) clearly defining the research question and (2) developing the research design. A limited exploration at a lower cost carries little risk for the researcher and can uncover information that reduces the total research cost.

SUMMARY
This unit has examined some of the basic aspects of research. The importance of knowledge of research in business settings was emphasized.
The hallmarks of scientific research, namely purposiveness, rigor, testability, replicability, precision and confidence, objectivity, generalizability and parsimony, were described. The steps involved in hypothetico-deductive research were discussed. The various steps involved in undertaking research were dealt with in detail. The issues involved in the development of hypotheses and the parametric and nonparametric tests were examined. With this background on research, the next unit deals with issues concerning research design and its types.

Have you understood?
What is research? Explain the need for the same.
Discuss the hallmarks of scientific research.
Explain the processes of deduction and induction with an example.
Discuss the building blocks of scientific research.
What are the steps in hypothetico-deductive research? Explain them using an example.
Describe the research process in detail.
Discuss the steps involved in problem identification.
Why should a literature survey be conducted?
Discuss the need for a theoretical framework and highlight its features.
What is a hypothesis? Discuss the types.
Explain the steps involved in formulating and testing a hypothesis.
Discuss the various methods of testing a hypothesis.
Explain the meaning and significance of research design.
What are the basic research design issues? Discuss them in detail.
Is a single research design suitable for all research studies? If not, why?
Discuss the exploratory research design in detail.
Discuss the different types of research design. Cite a situation to which each design is applicable.

Unit 2
Experimental Design

2.1 Introduction
Experimental design enables a researcher to alter systematically the variables involved in a study. It involves intervention by the researcher, who manipulates the variables in a setting and observes the effect on the subjects studied. Under an experimental design the independent variables are manipulated and their effects on the dependent variables are observed. This unit discusses the activities involved in conducting an experiment, the factors affecting validity in experimentation and the various types of experimental designs. Measurement of variables is necessary for testing hypotheses. The nominal, ordinal, interval and ratio scales are dealt with in detail. The process involved in the selection and construction of measurement scales is also discussed in detail.

2.2 Learning objectives
After reading this unit you should be able to:
Define experimental design and explain its benefits and drawbacks
Gain insight into internal and external validity in experimental designs
Distinguish the different types of experimental designs
Understand the concept of measurement, scales and the major sources of measurement error
Understand the meaning of scaling and the six critical decisions involved in selecting an appropriate measurement scale
Know the different forms of rating and ranking scales
Be conversant with the five ways of constructing measurement scales

2.3 The benefits and drawbacks
The benefits of an experimental design are listed below:
The researcher can manipulate the independent variable and thereby understand its effect on the dependent variable. This helps in understanding the existence and potency of the manipulation.
The effect of extraneous variables can be eliminated by means of effective control, which leads to authentic findings.
The experimental design offers convenience and is economical compared to other methods, as the scheduling of data collection and the control of variables are decided by the researcher.
The experiment can be replicated with different subject groups and conditions, which helps in understanding the effect of the independent variables across people, situations, conditions and time.
In some situations the researcher can use field experiments to reduce the subjects' perception of the researcher as a source of intervention or deviation in their everyday lives.
The experimental design suffers from the following limitations:
The research is undertaken in artificial settings, and hence the subjects may not behave as they do under normal circumstances.
Generalization from non-probability samples makes it difficult to extend the findings.
Cost overruns often arise in experimental designs.
Experimental studies are conducted to solve present or future problems. Past events cannot be dealt with in experimental designs.
Management research is mostly concerned with people, and hence the manipulation and control of human elements are subject to ethical considerations.

2.4 Activities involved in conducting an experiment
The following activities are involved in experimental research:

2.4.1 Selecting relevant variables
In the course of the study the researcher develops hypotheses to meet the objectives of the research. A hypothesis describes the relationship between two or more variables. The researcher should select the variables that best represent the concepts to be tested, determine the number of variables to be tested, and select or design appropriate measures for them. The number of variables selected in a research study depends on the budget allocated, the time frame available, the number of subjects being tested and the like.

2.4.2 Specifying the levels of treatment
The treatment levels of the independent variable are the distinctions the researcher makes between different aspects of the treatment condition. For example, if attitude is hypothesized to influence purchase behaviour, attitude may be grouped into three levels: positive, negative and neutral.

2.4.3 Controlling the experimental environment
Control may be exercised over the experimenter/researcher, the subject or the environment. Environmental control is concerned with holding constant the physical environment in which the experiment is conducted. The subjects may not know that the experiment is being conducted; this is referred to as blinding the subject. When the experimenter is also unaware of which subjects are receiving the treatment, the experiment is called double-blind. Control refers to avoiding the effect of extraneous variables on the research study.

2.4.4 Choosing the experimental design
Choosing an apt experimental design improves the probability that the observed change in the dependent variable is caused by the manipulation of the independent variable and not by another variable. It also strengthens the generalizability of the results.

2.4.5 Selecting and assigning the subjects
The question of selecting and assigning subjects does not arise where the entire population is considered for the study.
In most cases, however, the researcher depends on a sample to conduct the study. In order to validate and generalize the findings, the sample selected should be representative of the population. The sample may be selected by random selection or systematic sampling. Random assignment of subjects to groups should be followed. When it is not possible to assign subjects randomly, matching may be used. Matching is based on a non-probability quota sampling approach. The object of matching is to ensure that each experimental and control subject matches on every characteristic used in the research.

2.4.6 Pilot-testing, revising and testing
Pilot testing reveals errors in the design and improper control of extraneous or environmental conditions. Pretesting the instruments enables them to be refined before the final test. It allows the researcher to revise scripts, take stock of control problems with laboratory conditions and scan the environment for factors that might confound the results.

2.4.7 Analyzing the data
Pretesting and proper planning give order and structure to the experimental data collected. The data are conveniently arranged by level of treatment condition, by pretest and post-test, and by group structure. This allows statistical techniques to be applied in a simplified manner.

2.5 Validity in Experimentation
The findings of an experimental design are judged by their internal and external validity. Validity is the extent to which a measure accomplishes its claims. Two kinds of validity exist: internal and external. Internal validity is concerned with whether the conclusions drawn from a demonstrated experimental relationship truly imply cause. External validity is concerned with the generalizability of an observed causal relationship across persons, settings and times.
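The random assignment of subjects to groups described in section 2.4.5 can be illustrated with a short sketch. This is only one possible way to do it, not a prescribed procedure: the function name randomly_assign, the subject labels and the seed value are assumptions made for illustration. Subjects are shuffled and then dealt in turn to the groups, so that every subject has the same chance of landing in either group.

```python
import random


def randomly_assign(subjects, groups=("experimental", "control"), seed=None):
    """Shuffle the subjects and deal them round-robin into the groups.

    Hypothetical helper for illustration; the seed only makes the
    shuffle repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for position, subject in enumerate(shuffled):
        assignment[groups[position % len(groups)]].append(subject)
    return assignment


# Twenty hypothetical subjects labelled S01 ... S20
subjects = [f"S{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(subjects, seed=42)

for group, members in assignment.items():
    print(group, len(members), members)
```

Dealing the shuffled list in turn keeps the group sizes equal, which is a design choice of this sketch; an independent coin toss per subject would also be random assignment but could produce unequal groups.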
2.5.1 Factors affecting internal validity
Internal validity refers to the confidence that one can place in the cause-effect relationship. In other words, it addresses the question: to what extent does the research design permit us to say that the independent variable A causes the change in the dependent variable B? Factors affecting internal validity cause confusion as to whether the observed differences are due to the experimental treatment or to extraneous factors. An experiment has high internal validity if the researcher is confident that the experimental treatment has been the source of change in the dependent variable. The factors listed below affect internal validity:
History
Maturation
Testing
Instrumentation
Selection
Statistical regression
Experimental mortality
Diffusion or imitation of treatment
Compensatory equalization
Compensatory rivalry
Resentful demoralization of the disadvantaged
Local history

1. History
In experimental designs a control measurement (O1) of the dependent variable is taken before introducing the manipulation (X). After the manipulation an after-measurement (O2) of the dependent variable is taken. The difference between O1 and O2 is then attributed to the manipulation. However, events may occur during the course of the experimental study that affect the relationship between the variables under study.

2. Maturation
The subjects considered for experimentation may change with the passage of time, and the change may not be due to the occurrence of any specific event. This happens particularly when the study covers a long period of time.

3. Testing
The process of taking a test can affect the scores on later tests. The first test creates some awareness and learning experience, which influences the results of subsequent tests.

4. Instrumentation
A threat to validity may arise from the observer or the instrument. Using different observers or interviewers affects the validity of the study. If the same observer is used for a long period of time, validity may be affected by the observer's experience, boredom, fatigue and anticipation of results. Differences in the questions used for each measurement also affect validity.

5. Selection
Differential selection of subjects for the experimental and control groups affects validity. Validity considerations require the groups to be equivalent in every respect. The problem can be overcome by randomly assigning the subjects to experimental and control groups. In addition, matching can be done. Matching is a control procedure to ensure that the experimental and control groups are equated on one or more variables before the experiment.
Matching the members of the groups on key factors also enhances the equivalence of the groups.

6. Statistical regression
This factor operates especially when the members chosen for the experimental group have extreme scores on the dependent variable. For example, if a manager wants to test whether a training program can improve the salesmanship of the sales personnel, he should not choose those with extremely low or extremely high abilities for the experiment. Those with very low scores, i.e. those with low current sales abilities, have a greater probability of showing improvement and scoring closer to the mean on the test after being exposed to the treatment. This phenomenon of low scorers tending to score closer to the mean is known as regressing towards the mean. Likewise, those with very high abilities also have a greater tendency to regress towards the mean; they will tend to score lower on the posttest than on the pretest. Thus those at either end of the continuum with respect to the variable would not truly reflect the cause-and-effect relationship. This phenomenon of statistical regression is a threat to internal validity.

7. Experimental mortality
This factor arises from changes in the composition of the study groups during the test. There may be dropouts from the study group, leading to changes in its membership. This problem is less likely to arise in the control group, whose members are not affected by the testing situation and are less likely to withdraw.
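The statistical regression effect described under factor 6 can be illustrated with a small simulation. This is a sketch under stated assumptions rather than data from any study: each subject is given a stable ability plus independent random noise on each test, no treatment is applied at all, and the numbers chosen (mean 50, standard deviation 10, 10,000 subjects, the seed) are arbitrary. Even so, the lowest pretest scorers improve on the posttest and the highest scorers decline.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 10_000

# Assumption: observed score = stable ability + independent test noise.
ability = [rng.gauss(50, 10) for _ in range(N)]
pretest = [a + rng.gauss(0, 10) for a in ability]
posttest = [a + rng.gauss(0, 10) for a in ability]  # no treatment given

# Pick the extreme groups on the pretest: bottom and top 10 percent.
order = sorted(range(N), key=lambda i: pretest[i])
tenth = N // 10
low_scorers = order[:tenth]
high_scorers = order[-tenth:]


def mean_change(indices):
    """Average posttest-minus-pretest change for the given subjects."""
    return statistics.mean(posttest[i] - pretest[i] for i in indices)


low_change = mean_change(low_scorers)
high_change = mean_change(high_scorers)

print(f"Lowest pretest scorers, mean change:  {low_change:+.2f}")
print(f"Highest pretest scorers, mean change: {high_change:+.2f}")
```

Because the change appears without any treatment, a training program tested only on extreme scorers could be credited (or blamed) for movement that is purely regression towards the mean.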
All the above threats can be controlled to a certain extent by random assignment. However, the following factors affecting internal validity cannot be controlled by randomization. Both the control group and the experimental group are affected by the first three of these factors.

8. Diffusion or imitation of treatment
Interaction between the experimental and the control group may lead the control group to learn about the experiment, eliminating the difference between the groups.

9. Compensatory equalization
If the experimental treatment leads to a desirable and beneficial outcome, there may be an administrative reluctance to deprive the control group members. Compensatory action directed at the control group may confound the experiment.

10. Compensatory rivalry
Compensatory rivalry arises when the members of the control group know that they are in the control group. This generates competitive pressure, causing them to try harder, which affects their normal behaviour.

11. Resentful demoralization of the disadvantaged
When the treatment is desirable and the experiment is obtrusive, control group members may become resentful of their deprivation and lower their cooperation and output.

12. Local history
When all experimental persons are assigned to one group or session and all control persons to another, there is a chance that some idiosyncratic event will confound the results. This problem can be handled by administering treatments to individuals or small groups that are randomly assigned to the experimental or control sessions.

2.5.2 Factors affecting external validity
External validity is concerned with the interaction of the experimental treatment with other factors and the resulting impact on the ability to generalize the findings across times, settings or persons. External validity is high when the results of an experiment are applicable to a larger population. The following are the threats to external validity:
1.
The reactivity of testing on the experimental treatment
When a pretest is conducted in an experimental design, the subjects are sensitized and react to the experimental stimulus in a different manner. This before-measurement effect can be particularly significant in experiments where the independent variable is concerned with a change in attitude.

2. Interaction of selection and the experimental treatment
This threat is concerned with the process by which the test subjects are selected for the experiment. The population from which the subjects are selected may not be the same as the population to which the results are extended. This limits the generalizability of the findings.

3. Other reactive factors
The experimental setting may have a biasing effect on the subjects' response to the experimental treatment. An artificial setting may produce results that are not representative of the larger population. If subjects know that they are participating in an experiment, they may not behave in a normal way; this affects the validity of the experimental treatment. Another reactive effect is the possible interaction between X and the subject characteristics.

2.6 Experimental research designs
Many experimental designs are available, and they vary widely in their power to control contamination of the relationship between the independent and dependent variables. On the basis of the characteristics of control, experimental designs can be grouped under the following three heads:
1. Pre-experiments
2. True experiments/Lab experiments
3. Field experiments
A set of commonly used symbols is:
X = The exposure of an independent variable to a group of test subjects for which the effects are to be determined
O = The process of observation or measurement of the dependent variable (effect outcome) on the test subjects
R = The random assignment of test subjects to separate treatment groups

2.6.1. Pre-experimental designs
Pre-experimental design represents the crudest form of experimentation and is undertaken only when nothing stronger is possible. These designs are characterized by an absence of randomization of test subjects. They are weak in their scientific measurement power because they fail to control adequately the various threats to internal validity. As a result they fail to meet the internal validity criteria. Three pre-experimental designs are detailed below:
1. One-Shot Case Study
2. One-Group Pretest-Post-Test Design
3. Static Group Comparison
1. One-Shot Case Study
A single group of test subjects is exposed to the independent variable treatment X, and then a single measurement of the dependent variable is taken (O1). The one-shot case study does not use a pretest or a control group. As a result this design is inadequate for establishing causality. An example is a study of an employee education campaign about the automation of office activities without a prior measurement of employee knowledge. The result would reveal only how much the employees know after the education campaign, but there is no way to judge the effectiveness of the campaign. This may be represented as shown below:
X                            O
Treatment or manipulation    Observation or measurement
of independent variable      of dependent variable
2. One-Group Pretest-Post-Test Design
First a pretreatment measure of the dependent variable is taken (O1), then the test subjects are exposed to the independent treatment X, and thereafter a post-treatment measure of the dependent variable is taken (O2). This design meets the threats to internal validity better than the one-shot case study. Continuing the example given above, it measures the awareness of the employees before the campaign and after the campaign.
This can be represented in the following manner:
O           X               O
Pre-test    Manipulation    Post-test
However, other aspects of internal validity such as history, maturation and the testing effect are not taken into account. Hence it is still a weak design.
3. Static Group Comparison
This design uses two groups; one receives the experimental stimulus and the other serves as a control group and is not given the treatment. The dependent variable is measured in both groups after the treatment. For example, in a field setting an experiment is designed to study the effect of a natural disaster (experimental treatment) on psychological trauma (measured outcome). A pretest before the natural disaster, say a tsunami, is possible but not on a large scale. Moreover the timing of the pretest would be problematic. The control group, receiving the post-test, would consist of subjects whose property is safe. The design can be represented in the following manner:
X    O1
-----------
     O2
The addition of a comparison group increases the validity over the previous two designs. However, there is no way to be certain that the two groups are equivalent.

2.6.2 True experiments/Lab experiments
Lab experiments are conducted in order to ascertain the cause and effect relationship between independent and dependent variables. To understand the cause-effect relationship, all other variables that might contaminate the relationship should be controlled so that the actual causal effect of the investigated independent variable on the dependent variable can be determined. Experiments performed in such an artificial or contrived environment are known as lab experiments. In lab experiments the researcher has complete control over all aspects. The researcher has control over the who, what, when, where and how of the experiment. The researcher can assign
subjects to conditions randomly. Random assignment is an unbiased assignment process that gives each subject an equal and independent chance of being placed in every condition. Random assignment is preferable because it allows one to conclude that any other variable could be confounded with the independent variable only by chance. Control over the what, where, when and how of the experiment means that the experimenter has complete control over the way the experiment is conducted.
Terms used: The following terms are normally used in experimental design:
Factors: The independent variables of an experiment are often called the factors of the experiment. Active factors are those the experimenter can manipulate by causing a subject to receive one level or another. A blocking factor is one where the experimenter can only identify and classify the subject on an existing level.
Level: A level is a particular value of an independent variable.
Condition: The term condition is used to discuss the independent variables. It refers to a particular way in which subjects are treated.
Treatment: This is another word for condition. It also refers to the statistical test of the effect of the various conditions of the experiment.
Test unit: The experimental subjects are referred to as test units. The test units may be people, organizations, machine types, materials and other entities.
The basic principles of experimental research design are:
1. The existence of a control group or a control condition, and
2. The random allocation of the subjects to groups
Internal validity is higher in the case of lab experiments. Internal validity, as already explained, is the degree of confidence in the causal effects. In lab experiments the cause-and-effect relationships are substantiated and hence the internal validity is higher.
However, the external validity, i.e. the extent of generalizability of the study, is lower because the research is executed in a contrived environment and the real-world situation may not be the same. Some of the true experimental designs are discussed below:
1. Pretest-Post-test Control Group Design
Test subjects are assigned by a random procedure (R) to either the experimental or the control group. Each group receives a pretreatment measure of the dependent variable. Then the experimental group is exposed to the independent treatment, after which both groups receive a post-treatment measure of the dependent variable. The design is represented in the following manner:
R    O1    X    O2
R    O3         O4
The effect can be measured by
E = (O2 - O1) - (O4 - O3)
The internal validity problems discussed earlier are addressed to a greater extent in this design. Local history may occur in one group and not in the other. Maturation, testing and regression are handled well as they would be felt equally in the experimental and control groups. Selection is adequately dealt with by random assignment. Mortality can be a problem if there are different drop-out rates in the study groups. The external validity of the design is, however, questionable.
2. Post-Test-Only Control Group Design
Test subjects are randomly assigned to either the experimental or the control group. The experimental group is then exposed to the independent treatment, after which both groups receive a post-treatment measure of the dependent variable. In this design, the pretest measurements are omitted. The design is
R    X    O1
R         O2
The experimental effect is measured by the difference between O1 and O2. The design is simpler and more attractive. Internal validity threats from history, maturation, selection and statistical regression are adequately controlled by random assignment. Since the subjects are measured only once, the threats of testing and instrumentation are handled.
Different mortality rates between the experimental and control groups continue to be a problem. The design reduces the external validity problem of the testing interaction effect.
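The effect measures for the two control group designs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each observation O is summarized by its group mean, the function names are hypothetical, and the awareness scores are invented for the example rather than taken from any study.

```python
from statistics import mean


def pretest_posttest_effect(o1, o2, o3, o4):
    """Pretest-post-test control group design.

    o1, o2: pretest and post-test scores of the experimental group.
    o3, o4: pretest and post-test scores of the control group.
    Returns the change in the experimental group minus the change
    in the control group, using group means (an assumption).
    """
    return (mean(o2) - mean(o1)) - (mean(o4) - mean(o3))


def posttest_only_effect(o1, o2):
    """Post-test-only control group design.

    o1: post-test scores of the experimental group.
    o2: post-test scores of the control group.
    Returns the difference between the two group means.
    """
    return mean(o1) - mean(o2)


# Invented awareness scores, for illustration only.
exp_pre, exp_post = [40, 45, 50], [60, 65, 70]
ctl_pre, ctl_post = [42, 44, 49], [47, 49, 54]

effect_with_pretest = pretest_posttest_effect(exp_pre, exp_post, ctl_pre, ctl_post)
effect_posttest_only = posttest_only_effect(exp_post, ctl_post)
print(effect_with_pretest, effect_posttest_only)
```

Subtracting the control group's change removes influences such as maturation or testing that both groups experience equally, which is why the first design handles those threats.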
3. Extensions of True Experimental Design
The researcher normally uses an operational extension of the basic design. These extensions differ from the classical design in terms of
the number of different experimental stimuli that are considered simultaneously by the experimenter, and
the extent to which assignment procedures are used to increase precision.
a. Completely Randomized Design
This design involves two principles, viz. the principle of replication and the principle of randomization. The essential characteristic of the design is that the subjects are randomly assigned to experimental treatments. It is generally used when the experimental areas happen to be homogeneous. In this design all the variation due to uncontrolled extraneous factors is included under the heading of chance variation. Two forms of this design are:
1. Two-group simple randomized design: In this design the population is first defined and then a sample is selected randomly from the population. The selected sample is then randomly assigned to experimental and control groups. Thus the design yields two groups as representatives of the population, viz. the experimental and the control group. The two groups are given different treatments of the independent variable. This design is quite common in research studies in the behavioural sciences.
2. Random replication design: The effect of extraneous variables is not taken into consideration in the two-group simple randomized design. In the random replication design, the effect of extraneous variables is minimized by providing a number of repetitions for each treatment. Each repetition is called a replication. The random replication design provides controls for the differential effects of the extraneous independent variables and also randomizes any individual differences among those conducting the treatment.
b. Randomized block design
In the randomized block design the subjects are first divided into groups, known as blocks, such that within each group the subjects are relatively homogeneous with respect to some selected variable. The number of subjects in a given block would be equal to the number of treatments, and one subject in each block would be randomly assigned to each treatment. This design is used when there is a single major extraneous variable. Random assignment is the basic way to produce equivalence among treatment groups. Blocking is done to learn whether treatments bring different results among various groups of subjects. In this design two effects can be studied:
1. The main effect is the average direct influence that a particular treatment has, independent of other factors.
2. The interaction effect is the influence of one factor on the effect of another.
The precision of this experimental design depends on how successfully the design minimizes the variance within blocks and maximizes the variance between blocks. If the response patterns are about the same in each block, there is little value in the more complex design, and blocking in that case may be counterproductive. The randomized block design is analysed by the two-way ANOVA technique.
c. Latin Square Design
The randomized block design is used to minimize the effects of one extraneous variable, whereas the Latin square design is used when two blocking factors are to be controlled. Treatments are randomly assigned so that each treatment appears exactly once in each row and exactly once in each column, that is, only once for each level of each blocking factor. For this reason the Latin square must have the same number of rows, columns and treatments. For example, in order to find the effect of offering a discount of 5%, 10% or 15%, two blocking factors can be identified, viz. customer income and size of the store.
Three levels can be identified for the blocking factor customer income, viz. high, medium and low. Three levels can likewise be identified for store size, viz. large, medium and small. On the basis of both blocking factors, customer income and size of the store, nine groups can be identified. Each group is given one treatment, and no treatment is repeated within a row or a column. This is illustrated below:
                              Store size
                     Large    Medium    Small
Customer   High      X3       X2        X1
income     Medium    X1       X3        X2
           Low       X2       X1        X3
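The defining property of a Latin square, each treatment appearing exactly once in every row and every column, can be checked mechanically. The sketch below is a minimal illustration, not a prescribed procedure: the function names are hypothetical, and Python's pseudo-random generator is used here as an assumed stand-in for whatever randomization device the researcher prefers.

```python
import random


def make_latin_square(treatments, rng=None):
    """Build a randomized Latin square for the given treatments.

    Starts from a cyclic square, then shuffles rows and columns;
    both operations keep every treatment once per row and column.
    """
    rng = rng or random.Random()
    n = len(treatments)
    square = [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]
    rng.shuffle(square)                      # random row order
    cols = list(range(n))
    rng.shuffle(cols)                        # random column order
    return [[row[c] for c in cols] for row in square]


def is_latin_square(square):
    """Return True if every symbol occurs once in each row and column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    return all({square[r][c] for r in range(n)} == symbols for c in range(n))


# Discount levels from the example; rows are income levels,
# columns are store sizes (labels are for display only).
discounts = ["5%", "10%", "15%"]
square = make_latin_square(discounts, random.Random(0))
for income, row in zip(["High", "Medium", "Low"], square):
    print(f"{income:<8}", *row)
```

A layout that repeats a treatment within any row or column fails `is_latin_square`, which is a quick way to catch an assignment error before the experiment is run.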
Treatments are assigned based on random number tables. From the above, the effects of price reduction can be ascertained. The major limitation of the Latin square is the assumption that there is no interaction between treatments and the blocking factors.
d. Factorial design
In a factorial design a researcher can deal with more than one factor simultaneously. This design is especially important for economic and social phenomena, where usually a large number of factors affect a particular problem. Factorial designs can be of two types:
(1) Simple factorial designs
(2) Complex factorial designs
(1) Simple factorial designs: When the effect of varying two factors on the dependent variable is studied, the design is called a simple factorial design. This design is also known as a two-factor factorial design. A simple factorial design may be a 2 X 2, 3 X 4, 5 X 3 or similar type of design. A 2 X 2 simple factorial design can be depicted as below:
                          Control variable
Experimental variable     Level I    Level II
Treatment A               Cell 1     Cell 2
Treatment B               Cell 3     Cell 4
In this design the extraneous variable to be controlled by homogeneity is called the control variable, and the independent variable which is manipulated is called the experimental variable.
(2) Complex factorial designs: A design which considers three or more independent variables simultaneously is called a complex factorial design. This is also known as a multi-factor factorial design.

2.6.3. Field Experiments: Quasi or Semi Experiments
A field experiment is done in a natural environment, but treatments are given to one or more groups. It is not possible to control all extraneous variables in field conditions because the stimulus condition occurs in a natural environment. This situation warrants a field experiment. In a quasi experiment it is not possible to know when or to whom to expose the experimental treatment; however, when and whom to measure can be determined. A quasi experiment is inferior to a true experimental design but is usually superior to pre-experimental designs. The results are more generalizable, and so the external validity is higher in the case of field experiments. However, the internal validity is lower because the extent to which variable X alone causes variable Y cannot be ascertained. A few quasi experiments are listed below:
1. Nonequivalent control group design
This is a strong and widely used quasi-experimental design. The test and control groups here are not assigned randomly. The design is diagrammed as below:
O1    X    O2
-------------
O3         O4
Two variants are possible, viz. the intact equivalent design and the self-selected experimental group design. In the intact equivalent design the members of the experimental and control groups are naturally assembled. This design is useful when any type of individual selection process would be reactive.
In the self-selected experimental group design, volunteers are recruited to form the experimental group, while non-volunteer subjects are used for control. Comparison of the pretest results (O1 - O3) is one indicator of the degree of equivalence between the test and control groups. If the pretest results are significantly different, there is a real question about comparability. On the other hand, if the pretest observations are similar between groups, there is more reason to believe that the internal validity of the experiment is good.
2. Separate Pretest-Post-test Design
This design is most applicable in situations where the researcher cannot know when and to whom to introduce the treatment but can decide when and whom to measure. The basic design is
R    O1    (X)
R           X    O2
The bracketed treatment (X) indicates that the researcher cannot control the treatment. This is not a strong design because several threats to internal validity are not handled adequately. History can confound the results, but this can be overcome by repeated experiments. This design is considered to be superior to true experiments in external validity. Its strength is that samples are drawn from the general population, which extends the generalizability of the study.
3. Group Time Series Design
This design introduces repeated observations before and after the treatment and allows subjects to act as their own controls. The single treatment group design has before and after measurements as the only controls. There is also a multiple design with two or more comparison groups as well as repeated measurements in each treatment group. This format is useful where regularly kept records are a natural part of the environment and are unlikely to be reactive. It is also a good way to study unplanned events in an ex post facto manner.

2.7 Measurement
In normal parlance measurement refers to an attempt to fix quantitatively the form or other features of a physical object.
In research, measurement refers to assigning numbers to empirical events in compliance with a set of rules. This definition brings out the three steps involved in the process of measurement:
1. Selecting the observable empirical events.
2. Developing a set of mapping rules, i.e. a scheme for assigning numbers or symbols to represent aspects of the event being measured.
3. Applying the mapping rules to each observation of that event.
The goal of measurement is to provide the highest quality, lowest error data for testing the hypotheses identified and for other related analysis and interpretation. Variables dealt with in research studies can be classified as objects and properties. Objects include the things of ordinary experience, such as a laptop, chair or car. They also include things which are not concrete, such as attitudes, peer group pressures and perceptions. Properties are the characteristics of the objects. They include level of motivation, leadership skills and so on. Strictly speaking, researchers do not measure objects or properties; rather they measure indicants of the properties, or indicants of the properties of the objects.

2.7.1 Mapping rules
Measurement involves developing mapping rules and applying them to record what happens. The assumptions regarding the mapping rules also affect the choice of data. The mapping rules have the following four characteristics:
1. Classification: Numbers are used to group or sort responses. Order does not exist.
2. Order: Numbers are arranged in such a way that one number is greater than, smaller than or equal to another number.
3. Distance: Differences between numbers are ordered. The difference between any pair of numbers is greater than, less than or equal to the difference between any other pair of numbers.
4. Origin: The number series has a unique origin indicated by the number zero.
Combinations of these four characteristics give rise to the data types.

2.7.2 The scales
Based on the characteristics of the mapping rules, i.e. classification, order, distance and origin, four classifications of measurement scales can be arrived at: nominal, ordinal, interval and ratio. A detailed discussion follows.
1. Nominal scale
A nominal scale allows the researcher to assign subjects to certain categories or groups. For example, the respondents can be grouped as male and female. The two groups can be assigned the numbers 1 and 2 for the purpose of coding and further analysis. These numbers are simple and convenient labels and have no intrinsic value. The scale only assigns subjects to one of the two mutually exclusive categories. In other words, a nominal scale allows the researcher to collect information on a variable that naturally or by design can be grouped into two or more categories that are mutually exclusive and collectively exhaustive. The nominal scale provides only basic, categorical, gross information. Counting the members in each group and calculating frequencies or percentages is possible when a nominal scale is employed. The researcher is restricted to the mode as the measure of central tendency. One can conclude which category has more members. The chi-square test can be used to measure statistical significance, and for measures of association phi, lambda or other measures may be appropriate. Nominal scales are weak but they are still useful for classifying data. They are valuable in exploratory work where the objective is to uncover relationships rather than to secure precise measurements. The nominal data type is also widely used in survey and ex post facto research when data are classified by major subgroups of the population.
2. Ordinal scale
An ordinal scale indicates order. It also includes the characteristics of the nominal scale. Thus an ordinal scale not only categorizes the variables but also rank-orders the categories in some meaningful way.
The use of an ordinal scale implies a statement of greater than, less than or equal to, without stating how much greater or less. Other descriptors may also be used, viz. superior to, happier than, poorer than, above. It is also possible to rank more than one property at a time. For example, a researcher can ask the respondent to rank various airlines on the basis of certain properties. In ordinal scaling the differences in the ranking of the objects, persons or events investigated are clearly known. However, ordinal data do not give any indication of the magnitude of the differences among the ranks.
3. Interval scale
Interval data have the power of nominal and ordinal data and in addition incorporate the concept of equality of interval. The interval scale allows the researcher to measure the distance between any two points on the scale. It not only groups individuals according to certain categories and taps the order of the groups; it also measures the magnitude of the differences in preferences among the individuals. The interval scale is more powerful than the nominal and ordinal scales. The arithmetic mean is applicable as the measure of central tendency. Its measures of dispersion are the range, the standard deviation and the variance.
4. Ratio scale
Ratio data have the power of the nominal, ordinal and interval scales and in addition have the provision for an absolute zero or origin. The ratio scale overcomes the disadvantage of the arbitrary origin point of the interval scale, i.e. it has an absolute zero point. The ratio scale not only measures the magnitude of the differences between points on the scale but also the proportions in the differences. Multiplication or division preserves the ratios. It is the most powerful of the four scales because it has a unique zero origin and subsumes all the properties of the other three scales.
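The way the four mapping-rule characteristics (classification, order, distance, origin) accumulate across the four scales can be summarized in a small lookup. This is a sketch for illustration only; the table and function names are hypothetical, and combinations that skip a characteristic (for example distance without order) are treated as invalid on the assumption that the characteristics are cumulative.

```python
# Key: (classification, order, distance, origin) -> scale name.
SCALE_BY_CHARACTERISTICS = {
    (True, False, False, False): "nominal",
    (True, True, False, False): "ordinal",
    (True, True, True, False): "interval",
    (True, True, True, True): "ratio",
}


def scale_type(classification, order, distance, origin):
    """Return the measurement scale implied by the mapping rules."""
    key = (classification, order, distance, origin)
    if key not in SCALE_BY_CHARACTERISTICS:
        raise ValueError(
            "characteristics are cumulative; this combination "
            "does not correspond to one of the four scales"
        )
    return SCALE_BY_CHARACTERISTICS[key]


# Examples drawn from the text: coding respondents as male/female only
# classifies; ranking airlines adds order; actual age has a true zero.
print("gender coded 1/2:", scale_type(True, False, False, False))
print("airline ranking:", scale_type(True, True, False, False))
print("equal-interval rating:", scale_type(True, True, True, False))
print("age in years:", scale_type(True, True, True, True))
```

Each scale keeps every characteristic of the one before it and adds one more, which is why a statistic that is valid for a weaker scale remains usable on a stronger one.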
The measure of central tendency for the ratio scale can be either the arithmetic or the geometric mean, and the measure of dispersion can be the standard deviation, the variance or the coefficient of variation. Some examples of ratio scales are those pertaining to actual age, income and work experience in organizations.

2.7.3 Sources of measurement differences
Normally any variation of scores among the respondents would reflect true differences in their opinions about the object or issue. However, four major sources may contaminate the results: the respondent, the situation, the measurer and the data collection instrument.
1. The Respondent
Opinion differences that affect measurement come from relatively stable characteristics of the respondent. The skilled researcher will anticipate many of these dimensions, adjusting the design to eliminate, neutralize or otherwise deal with them. However, the respondents may still suffer from temporary factors like fatigue, boredom, anxiety or other distractions which may limit their ability to respond accurately and fully. Likewise, variations in mood due to hunger, impatience and the like may also have an impact.
2. The Situational factors
Situational factors include any condition that places a strain on the interview or measurement session and may have a serious effect on the interviewer-respondent rapport. If another person is present during the interview, or if the respondents believe that anonymity is not ensured, they may be reluctant to express their true feelings.
3. The Measurer
The interviewer can distort responses by rewording, paraphrasing or reordering questions. The tone used, body language, smiles, nods and so forth may encourage or discourage replies. Likewise, at the data analysis stage, incorrect coding, careless tabulation and faulty statistical calculation may introduce errors.
4. The Instrument
A defective instrument can cause distortion in two major ways. First, the use of complex words and syntax beyond the respondents' comprehension may cause confusion and ambiguity. Leading questions, ambiguous meanings, mechanical defects and multiple questions suggest the range of problems.

2.7.4 The characteristics of sound measurement
The instrument developed to measure the concept should be an accurate indicator of the aspects that are being measured. The scale developed should not be imperfect or prone to errors. The use of better instruments will ensure accuracy in results and will enhance the scientific quality of the research. The instrument should also be easy and efficient to use.
The goodness of the measures developed should be assessed on the basis of three major criteria, viz. validity, reliability and practicality.
I. Validity
Validity refers to the extent to which a test measures what we actually wish to measure. It refers to the extent to which differences found with a measuring tool reflect true differences among the respondents being tested. Validity can be classified into three major forms, viz. content validity, criterion-related validity and construct validity.
1. Content validity
Content validity refers to the extent to which a measuring instrument provides adequate coverage of the investigative questions guiding the study. Content validity is good if the instrument contains a representative sample of the universe of subject matter of interest. Determination of content validity is judgmental and can be approached in several ways. Generally, content validity is considered higher the more fully the scale items represent the domain or universe of the concept being measured. The researcher may determine content validity through a careful definition of the topic of concern, the items to be scaled and the scales to be used. Another way is to use a panel of persons to judge whether the instrument meets the standards. Face validity is considered a basic and very minimum index of content validity. It indicates that, on the face of it, the items look as if they measure the intended concept.
2. Criterion-related validity
Criterion-related validity reflects the success of measures used for prediction or estimation. Predictive validity refers to the extent to which an outcome can be predicted, and concurrent validity refers to the extent to which an estimate of current behaviour or condition can be made. The researcher must ensure that the validity criterion used is itself valid. This can be judged in terms of four qualities, viz. relevance, freedom from bias, reliability and availability.
3. Construct validity
This is the most complex and abstract form of validity. Construct validity testifies that the results obtained from the use of the measure fit the theories around which the test is designed. In other words, a measure has construct validity to the degree that it conforms to the predicted correlations with other theoretical propositions. The researcher may wish to measure or infer the presence of abstract characteristics for which no empirical validation seems possible. Attitude, aptitude and personality scales generally fall into this category. Although it is difficult, assurance is still needed that the measurement has an acceptable degree of validity. This is assessed through convergent and discriminant validity.
Convergent validity is established when the scores obtained with two different instruments measuring the same concept are highly correlated. Discriminant validity is established when, based on theory, two variables are predicted to be uncorrelated and this is also found empirically. Validity can be established through the use of correlational analysis, factor analysis and similar techniques.
II. Reliability
Reliability refers to consistency, i.e. a measure is reliable to the degree that it supplies consistent results. Reliability is concerned with estimates of the degree to which a measurement is free of random or unstable error. Reliable instruments can be used with confidence that transient and situational factors are not interfering. Reliable instruments are robust and work well at different times under different conditions. The reliability of an instrument is assessed on the basis of stability, equivalence and internal consistency.
1. Stability
Stability is securing consistent results with repeated measurements of the same person with the same instrument. An observation is said to be stable if it gives the same reading on a particular person when repeated one or more times. Stability measurement in survey situations is more difficult than in observational studies. Observation can be done repeatedly, but a resurvey can be conducted only once. Two tests of stability are test-retest reliability and parallel-form reliability.
(a) Test-retest reliability
The conduct of a resurvey is called a test-retest arrangement, which involves comparisons between the two tests to learn about reliability. The reliability coefficient obtained with a repetition of the same measure on a second occasion is called test-retest reliability.
When a questionnaire containing items that are supposed to measure a concept is administered to a set of respondents, and the same questionnaire is administered again after some time, the correlation between the scores obtained at the two different times from the same set of respondents is called the test-retest coefficient. The higher the coefficient, the better the reliability and stability. The following difficulties can occur in the test-retest methodology:
A time delay between measurements may give rise to changes in situational factors.
Insufficient time between measurements will enable the respondents to remember their previous answers and repeat them, resulting in bias.
The respondents' discernment of a disguised purpose may introduce bias if the respondents hold opinions related to the purpose but not assessed by the current measurement questions.
Topic sensitivity occurs when the respondents seek to learn more about the topic or form new and different opinions before the retest.
The introduction of extraneous moderating variables between measurements may result in a change in the respondents' opinions from factors unrelated to the research.
(b) Parallel-form reliability
Parallel-form reliability occurs when two comparable sets of measures tapping the same construct are highly correlated. The forms have similar items and the same response format. Only the wording and the order or sequence of the questions are changed. This is done in order to establish the error variability arising from changes in the wording or ordering of questions. A high correlation between the two forms ensures that the measures are reasonably reliable, with minimal error variance caused by wording, ordering or other factors.
2. Equivalence
Equivalence is concerned with how much error may be introduced by different investigators or by different samples of the items being studied. Equivalence is concerned with variations at one point in time among observers and samples of items.
One way to test for the equivalence of measurements by different observers is to compare their scoring of the same event. One test for item sample equivalence is to use alternative or parallel forms of the same test administered to the same person simultaneously. The results of the two tests are then correlated. The major interest with equivalence is typically not how respondents differ from item to item but how well a given set of items will categorize individuals. There may be differences in response between two samples of items, but if a person is classified the same way by each test, then the tests have good equivalence.
3. Internal consistency
Internal consistency indicates the homogeneity of the items used to measure a construct. The items should hang together as a set, and each should also be capable of independently measuring the same concept, so that the respondents attach the same overall meaning to each of the items. This can be checked by examining whether the items and the subsets of items in the measuring instrument are highly correlated. Internal consistency can be measured using the split-half reliability test and the inter-item consistency reliability test.
(a) Split-half reliability
Split-half reliability reflects the correlation between two halves of an instrument. This technique can be used when the measuring tool has many similar questions or statements. The instrument is administered and the results are separated by item into even-numbered and odd-numbered halves, or into randomly selected halves. If the correlation between the two halves is high, the instrument is said to have high reliability as regards internal consistency.
(b) Inter-item consistency reliability
This is a test of the consistency of respondents' answers to all the items in a measure. If the items are independent measures of the same concept, they will be correlated with one another. The most popular test of inter-item consistency reliability is Cronbach's coefficient alpha, which is used for multipoint-scaled items. For dichotomous items the Kuder-Richardson formulas are used. Higher coefficients indicate a better measuring instrument.
III Practicality
The operational requirements of the project require the measurement to be practical. Practicality is defined in terms of economy, convenience and interpretability. Economy is concerned with minimizing the cost of conducting the research project. The method of data collection, the length of the instrument and similar choices have implications for the research budget. Convenience refers to ease in administering the questionnaire.
This can be achieved by giving clear and complete instructions and by paying proper attention to design and layout. The interpretability issue arises when persons other than the test designers must interpret the results. To enable interpretation, the designer of the data collection instrument should provide enough information regarding the scoring keys, norms, guidelines for test use and so on.
2.8 Measurement scales
Scaling is a procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question. The numbers are assigned to indicants of the properties of objects. For example, in measuring the attitude of respondents towards a new product introduced in the market, 1 may be assigned to a positive attitude, 2 to a neutral attitude and 3 to a negative attitude. Measurement can be performed using standardized scales or custom-designed scales. Standardized scales may be chosen when measuring concrete objects. A customized scale has to be developed when the researcher wants to measure a more abstract and complex construct, such as customer attitudes towards a new product, for which standardized scales may not exist.
2.8.1 Selection of measurement scale
Selection or construction of a measurement scale requires decisions in the following six key areas:
1. Study objective: Researchers may have two general study objectives, viz., to measure the characteristics of the respondents, or to use respondents as judges of the objects or indicants presented to them.
2. Response form: Three types of measuring scales, viz., rating, ranking and categorization, can be used. A rating scale is used when respondents score an object or indicant without making a direct comparison with another object or attitude.
Ranking scales enable comparison among two or more indicants or objects. Categorization puts the subjects involved into groups or categories.
3. Degree of preference: Measurement scales may involve preference measurement or non-preference evaluation. In preference measurement, respondents are asked to choose the object they prefer. In non-preference evaluation, respondents are asked to make a judgment without any personal preference towards objects or solutions.
4. Data properties: The data properties should also be considered when deciding on a measurement scale. The data can be classified as nominal, ordinal, interval or ratio. The statistical techniques that can be applied depend on the assumptions underlying each data type.
5. Number of dimensions: Measurement scales can be unidimensional or multidimensional. In a unidimensional scale only one attribute of the respondent is measured. Multidimensional scaling recognizes objects as consisting of n dimensions.
6. Scale construction: Five construction approaches are available, viz., arbitrary, consensus, item analysis, cumulative and factoring.
The researcher should take into consideration both the type of measurement and the scale construction approach when selecting an appropriate scale.
2.8.2 Methods of scaling
The method of assigning numbers or symbols to the attitudinal responses of the respondents towards objects, events or persons is an important aspect of research. There are two main categories of attitudinal scales: rating scales and ranking scales. Rating scales have several response categories and are used to elicit responses with regard to the object, event or person studied. Ranking scales are used to make comparisons between or among objects, events or persons and to elicit the preferred choices and the ranking among them.
1. Rating scales
Rating scales are used to judge properties of objects without reference to other similar objects. In rating scales an object is judged in absolute terms against certain specified criteria. The scale can be used to elicit responses with regard to an object, event or person studied. The number of scale points may range from three to five or ten. Researchers believe that more points on a rating scale provide an opportunity for a more accurate measurement of variance.
Some of the rating scales often used by researchers are explained below.
The dichotomous scale offers two mutually exclusive response choices. It may be used to elicit a Yes or No answer, an Agree or Disagree answer, and so on. It is useful for demographic questions or wherever a dichotomous response is adequate.
E.g., Do you have a credit card? Yes / No
The category scale uses multiple items to elicit a single response. This multiple-choice, single-response scale is appropriate when there are multiple options but only one answer is sought.
E.g., Age
o 20 years or less
o 21 to 40 years
o 41 to 50 years
o Above 50 years
The checklist, or multiple-choice, multiple-response scale, allows the respondent to select one or several alternatives. For example, when asked about the source through which information about a new product was obtained, a respondent may select one, several or all of the choices given below:
Source of information
o Advertisement
o Sales person
o Sales materials
o Showrooms
o Friends / relatives / neighbours
o Other sources
The Likert scale is designed to examine how strongly the respondents agree or disagree with statements relating to the attitude or object on a 5-point scale. The scores on the individual items are summed to produce a total score for the respondent, and hence it is also called a summated scale. A Likert scale usually contains two parts, the item part and the evaluative part. The item part usually contains a statement about a product, event or attitude. The evaluative part is a list of response categories ranging from strongly agree to strongly disagree. The item and evaluative parts are shown below:

                                           Strongly                                  Strongly
                                           Disagree   Disagree   Neutral   Agree     Agree
                                              1          2          3        4         5
I am satisfied with working environment
I am happy with the work assigned

The responses over a number of items or statements tapping a particular concept or variable are summated for every respondent. It is assumed that all the statements measure some aspect of a single common factor. The scale is treated as an interval scale, on the assumption that the difference between any two adjacent points on the scale remains the same.
The semantic differential scale is used widely to describe the set of beliefs a person holds. Several bipolar attributes are identified at the extremes of the scale, and respondents are asked to indicate their attitudes in semantic space towards a particular individual, object or event on each of the attributes. The semantic space may consist of five-point or seven-point rating scales bounded at each end by polar adjectives or phrases. There may be as many as 15 to 25 semantic differential scales for each attitude or object. The procedure is also insightful for comparing the images of competing brands, stores or services. The semantic differential may also be analyzed as a summated rating scale: each scale position is assigned a value from -3 to +3 or from 1 to 7, and the scores across all adjective pairs are summed for each respondent. Individuals can then be compared on the basis of their total scores. An example of a semantic differential scale is given below:

Responsive   _ _ _ _ _ _ _   Unresponsive
Beautiful    _ _ _ _ _ _ _   Ugly
Courageous   _ _ _ _ _ _ _   Timid

The semantic differential has several advantages. It produces interval data. It is an efficient and easy way to elicit responses from a large sample. Attitudes can be measured in terms of both direction and intensity. The total set of responses provides a comprehensive picture of the meaning of an object. It is a standardized technique which can be easily repeated and which escapes many problems of response distortion.
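The summated scoring described for the semantic differential can be sketched as follows. This is a minimal illustration, assuming a 1 to 7 coding in which 7 marks the left-hand (favourable) adjective; the function name `summated_score`, the respondent labels and the ratings are hypothetical.

```python
def summated_score(ratings, low=1, high=7):
    """Sum one respondent's ratings across all adjective pairs."""
    for pair, value in ratings.items():
        if not low <= value <= high:
            raise ValueError(f"rating for {pair!r} is outside {low}..{high}")
    return sum(ratings.values())


# Hypothetical ratings of one object; 7 = left-hand (favourable) adjective.
responses = {
    "Respondent A": {"Responsive/Unresponsive": 6, "Beautiful/Ugly": 5, "Courageous/Timid": 7},
    "Respondent B": {"Responsive/Unresponsive": 3, "Beautiful/Ugly": 4, "Courageous/Timid": 2},
}

totals = {name: summated_score(r) for name, r in responses.items()}
# Re-centring on the scale midpoint (4) gives the equivalent -3..+3 scoring.
centred = {name: total - 4 * len(responses[name]) for name, total in totals.items()}
print(totals)
print(centred)
```

The two codings rank respondents identically; the centred version only makes the direction of the attitude (positive or negative total) easier to read.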
The numerical scale is similar to the semantic differential scale, with the difference that numbers on a 5-point or 7-point scale are provided, with bipolar adjectives at both ends. This is also an interval scale. The scale provides both an absolute measure of importance and a relative measure of the various items rated. The scale's linearity, simplicity and production of ordinal or interval data make it very popular. An example:

Extremely pleased   7   6   5   4   3   2   1   Extremely displeased

The itemized rating scale is a 5-point or 7-point scale with anchors provided for each point. The respondent writes the appropriate number beside each item or circles the relevant number against each item. The responses to the items are then summated. This uses an interval scale. An example is shown below.

Indicate your response number on the line for each item.

      1               2               3                4             5
Very unlikely     Unlikely     Neither unlikely      Likely      Very likely
                                  nor likely
-----------------------------------------------------------------------------
1. I like to take more responsibility                                    ____
2. If additional responsibility is not provided I will be dissatisfied   ____
3. I am interested in a job which provides me more salary                ____

The itemized rating scale provides the flexibility to use as many points on the scale as considered necessary, such as 4, 5, 7 or 9. It is also possible to use different anchors. When a neutral point is provided, it is a balanced rating scale; when a neutral point is missing, it is an unbalanced rating scale. The itemized rating scale is frequently used in business research because it adapts to the number of points desired and because the nomenclature of the anchors can be adjusted to suit the needs of the researcher.
In the fixed or constant sum scale the respondents are asked to distribute a given number of points across various items. It enables the researcher to discover the proportions and is more in the nature of an ordinal scale.
A minimum of two categories and a maximum of ten can be presented to the respondents. Presenting too many stimuli hinders precision and tests the patience of the respondents. A respondent's ability to add is also taxed if too many stimuli are provided. For example, in selecting a particular brand of computer a respondent may be asked to distribute 100 points across the following aspects:

Hardware configuration    ____
Freebies given            ____
Brand image               ____
                          ----
Total points               100

Stapel scales are a simplified version of semantic differential scales. They are used when it is difficult to find bipolar adjectives that match the investigative questions. A Stapel scale uses only one pole rather than two. Respondents are asked to describe the object by selecting a numerical response category. The higher the positive score, the better the adjective describes the object. Similarly, the less accurate the description, the larger the negative number chosen. Ratings may range from +3 to -3, or from +5 to -5, that is, from very accurate to very inaccurate. The scale produces interval data. It is easy to construct and administer, as there is no need to provide adjectives that assure bipolarity. For example, the respondents may be asked to rate their job using Stapel scales as follows:

     +3              +3                +3
     +2              +2                +2
     +1              +1                +1
Challenging    Suits my skill     Satisfactory
     -3              -3                -3
     -2              -2                -2
     -1              -1                -1

The graphic rating scale is simple and commonly used in practice. In this scale various points are marked along a line to form a continuum. The respondent indicates a rating by simply making a mark at the appropriate point on a line that runs from one extreme to the other. A brief description of the scale points is given as a guide to locating the rating. Faces scales, depicting faces ranging from smiling to sad, can be used to obtain responses regarding people's feelings with respect to some aspect. A major limitation of the graphic rating scale is that the respondent may select almost any position on the line, which makes analysis difficult.
The consensus scale, as the name suggests, is developed by consensus among a panel of judges. The judges select certain items which make it possible to measure a concept. The items are selected on the basis of their pertinence or relevance to the concept.
The items are also tested for validity and reliability. For example, the Thurstone Equal Appearing Interval Scale is a consensus scale. A panel of judges selects the statements which describe the concept under study, and the scale is developed on the basis of their consensus. Developing this scale takes time, and as such it is rarely used in the organizational context.
Errors in rating scales
The respondents' ratings should be evaluated taking into consideration the following three types of errors: leniency, central tendency and halo effect.
(1) The error of leniency occurs when the respondent is either an easy rater or a hard rater. Respondents are inclined to give a higher score to people they know well. The opposite is also possible, where a lower score may be given.
(2) The error of central tendency arises from the respondents' reluctance to give extreme judgments. This happens when the respondent does not know the object or property being rated.
(3) The halo effect arises from carrying over a generalized impression of the subject from one rating to another. Halo is a pervasive error. It is difficult to avoid when the property being studied is not clearly defined, not easily observed, not frequently discussed, involves reactions with others, or is a trait of high moral importance.
2. Ranking scales
Ranking scales are used to tap the preferences of respondents among two or more objects and to make choices among them. The respondents are usually asked to select the best or most preferred object. This approach is satisfactory when only two choices are involved. When more than two choices are present, the results can be ambiguous. For example, if 40% choose product A, 35% choose product B and 25% choose product C, it cannot be concluded that A is the most preferred product, because 60% did not prefer A. Three techniques can be used to avoid this ambiguity, viz., paired comparison, forced ranking and the comparative scale. A brief discussion follows.
Paired comparison scale
The paired comparison scale is used when the respondents are expected to express attitudes or a choice between two objects at a time. It helps to assess preferences. In the previous example, knowing the preference for product A over B and over C will enable a better decision. However, as the number of objects to be compared increases, the number of paired comparisons also increases. The number of paired choices for n objects is n(n-1)/2. When three products are presented to the respondents, the number of paired comparisons is three, (3)(3-1)/2. If the number of products is four, the number of paired comparisons is six, (4)(4-1)/2. The more objects there are, the more paired comparisons are presented to the respondents. Paired comparison is a good method if the number of objects to be compared is small. If too many comparisons are to be made, the respondents may become tired and provide wrong answers or refuse to continue. It is suggested that 5 or 6 stimuli are not unreasonable if other questions accompany the comparisons. If only paired comparisons are to be dealt with, then up to 10 stimuli can be accommodated.
Forced ranking scale
Forced ranking is easier and faster compared to the paired comparison method.
It requires the respondents to rank a list of attributes. It enables respondents to rank objects relative to one another among the alternatives provided. It is more suitable where the number of alternatives to be ranked is limited. For example: rank the following newspapers in order of preference.
o The Hindu
o Business Line
o Indian Express
If the number of stimuli to be ranked is 5 or less, the task is comparatively easy. Respondents tend to become careless if the items exceed 10.
Comparative scale
The comparative scale involves a standard against which comparison is done. It provides a point of reference against which the current object under study is compared, and thus enables benchmarking. However, it can be used only when the respondents have knowledge of the standard against which the comparison is made. Researchers can treat the data produced by comparative scales as interval data, since the scores reveal the interval between the standard and the actual. The data can also be treated as ordinal, as the rank or position of the items is dealt with.
2.8.3 Construction of measurement scales
Five techniques are available to construct measurement scales, viz., the arbitrary approach, consensus, item analysis, cumulative methods and factoring. They are explained below.
1. Arbitrary scaling
Arbitrary scales are developed on an ad hoc basis, largely through the researcher's own subjective selection of items. Several items which are appropriate and unambiguous with respect to the theme of the study may be selected. Each item is scored from 1 to 5 depending on the responses obtained, and the results are then totaled. Arbitrary scales are easy to develop, inexpensive and highly specific to the theme of the study. However, the major limitation is that the design approach is subjective. There is no assurance, other than the researcher's insight, that the items chosen are representative of the universe of content.
2. Consensus scaling
In a consensus scale the items are selected by a panel of judges after evaluation on the basis of criteria such as relevance to the topic area, the risk of ambiguity and the level of attitude represented by the items. This approach is widely known as the Thurstone Equal Appearing Interval Scale. The procedure followed in constructing the scale is described below:
(i) The researcher collects a large number of items or statements, usually more than twenty, expressing different degrees of favourableness towards an object relating to the subject of the study.
(ii) A panel of judges evaluates the statements. Each statement is written on a separate card. The judges sort each card into one of 11 piles representing the degree of favourableness the statement expresses.
(iii) The sorting yields a composite position for each of the items. In case of disagreement between the judges, the item is discarded.
(iv) Each retained item is assigned a median scale value between one and eleven.
(v) A final selection of statements is made on the basis of the median score.
Of the 11 piles, 3 are identified by the judges as favourable, unfavourable and neutral. The eight intermediate piles are unlabelled. The Thurstone method is widely used for developing differential scales to measure attitudes. The scale is more reliable for measuring a single attitude. This method of construction demands cost, time and people, and hence it is often impractical. In addition, the values are assigned to the items by judges, which is subjective.
3. Item analysis scaling
In item analysis scaling, an item is evaluated on the basis of how well it discriminates between those persons whose total score is high and those whose total score is low. It involves calculating the mean score for each scale item among the low scorers and the high scorers. The difference between the item means of the high-score group and the low-score group is then tested for significance by calculating t values. Finally, the items that have the greatest t values are selected for inclusion in the final scale. Summated scales, or Likert scales, are developed by the item analysis approach. Summated scales consist of a number of statements which express either a favourable or an unfavourable attitude towards an object, and to which the respondent is required to react.
The respondents indicate their agreement or disagreement with each of the statements. Each response is given a numerical score, and the total is obtained to measure the respondent's attitude. The procedure for developing a Likert-type scale is described below:
(i) A large number of statements relevant to the object being studied are collected. The statements express definite favourableness or unfavourableness towards the subject.
(ii) A trial test is conducted with a small group of respondents who form part of the final study. Agreement or disagreement with each statement is obtained on a five-point scale.
(iii) The responses are scored in such a way that the response indicating the most favourable attitude is given the highest score of 5 and the most unfavourable attitude is given the lowest score of 1.
(iv) The total score of each respondent is obtained by adding the scores for the individual statements.
(v) The next step is to array the total scores and find those statements which have a high discriminatory power. For this purpose the researcher may select some part of the highest and the lowest total scores, for example the top 25 percent and the bottom 25 percent. These two extreme groups are interpreted to represent the most favourable and the least favourable attitudes and are used as criterion groups by which to evaluate individual statements. Thus the statements which consistently correlate with low favourability and with high favourability are identified.
(vi) The statements which correlate with the total test are retained in the final instrument and all others are discarded.
The advantages of the Likert scale are that it is relatively easy to construct, is considered to be more reliable and is less time consuming. One of the major limitations is that the scale only examines whether respondents are more or less favourable towards the subject under study; it cannot reveal how much more or less they are.
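The item analysis step of the procedure above (criterion groups and t values) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `item_t_values` is hypothetical, the pilot data are made up, the criterion groups are the top and bottom 25 percent by total score, and the t value is computed with the unequal-variance (Welch) form, which the text does not specify.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def item_t_values(responses, fraction=0.25):
    """t value per item comparing high and low total scorers.

    responses: one list of item scores (1-5) per respondent.
    """
    ranked = sorted(responses, key=sum)          # ascending total score
    k = max(2, int(len(ranked) * fraction))      # size of each criterion group
    low_group, high_group = ranked[:k], ranked[-k:]
    t_values = []
    for item in range(len(responses[0])):
        low = [r[item] for r in low_group]
        high = [r[item] for r in high_group]
        diff = mean(high) - mean(low)
        std_error = sqrt(variance(high) / k + variance(low) / k)
        if std_error == 0:
            # No spread in either group: t is undefined unless diff is zero.
            t_values.append(0.0 if diff == 0 else copysign(inf, diff))
        else:
            t_values.append(diff / std_error)
    return t_values


# Hypothetical pilot data: 8 respondents, 3 statements scored 1 to 5.
pilot = [
    [5, 5, 3], [5, 4, 2], [4, 4, 3], [4, 3, 3],
    [3, 3, 2], [2, 2, 3], [2, 1, 3], [1, 1, 2],
]
t_values = item_t_values(pilot)
print([round(t, 2) for t in t_values])
```

In this made-up data the third statement barely separates the two criterion groups, so it would be the first candidate for removal from the final instrument.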
There is also no basis for believing that the five positions indicated on the Likert scale are equally spaced.
4. Cumulative scales
Cumulative scales consist of a series of statements to which a respondent expresses agreement or disagreement. The special feature of this scale is that the statements form a cumulative series. The statements are related to one another in such a way that an individual who replies favourably to item no. 3 also replies favourably to items no. 2 and 1. An individual whose attitude is at a certain point on a cumulative scale will answer favourably all the items on one side of this point and unfavourably all the items on the other side of it. The individual's score is arrived at by counting the number of statements answered favourably. If the total score is known, it is easy to estimate the respondent's answers to the individual statements constituting the cumulative scale. A major scale of this type is Guttman's scalogram. Scalogram analysis refers to the procedure for determining whether a set of items forms a unidimensional scale. A scale is unidimensional if the responses fall into a pattern in which endorsement of the item reflecting the extreme position also results in endorsing all items which are less extreme. Under this technique, the respondents are asked to indicate, in respect of each item, whether they agree or disagree with it. If the items form a unidimensional scale, the response pattern will be as follows:

Item number        Respondent
3    2    1        score
X    X    X          3
_    X    X          2
_    _    X          1
_    _    _          0

A score of 3 means that the respondent agrees with all the statements, which expresses a favourable attitude. A score of 2 reveals that the respondent does not agree with the third statement but agrees with all the other statements. The remaining scores can be interpreted in the same way. The procedure for developing a scalogram is described below:
(i) The issue, concept or subject under study must be clearly defined.
(ii) A number of items relating to the subject under study are developed. Care should be taken to eliminate items that are ambiguous or irrelevant.
(iii) The next step is to pretest the items to determine their scalability. The pretest should include a minimum of 12 items and should be administered to at least 20 respondents. In the pretest the respondents' opinions are obtained on a 5-point Likert scale. The most favourable response is scored as 5 and the least favourable response is scored as 1. If there are 10 items, the total score can range between 10 and 50. For the purpose of analysis and evaluation, the respondents' opinions are arrayed according to the total score.
If the responses to an item form a cumulative scale, the response category scores should decrease in an orderly fashion as indicated below. Failure to follow the decreasing pattern reveals an overlap and shows that the item is not a good cumulative scale item.

Scale type      Item                    Errors      Number of
                9    6    2    5    7   per case    errors
4 (perfect)     X    X    X    X    X      0           0
3 (perfect)     -    X    X    X    X      0           0
nonscale        -    X    -    X    X      1           1
nonscale        -    X    X    -    X      1           2
3 (perfect)     -    -    X    X    X      0           0
2 (perfect)     -    -    -    X    X      0           0
1 (perfect)     -    -    -    -    X      0           0
nonscale        -    -    X    -           2           2
0 (perfect)     -    -    -    -    -      0           0

(iv) The total scores of the various opinions are obtained. The order is then shifted such that it results in a reduced number of items. The above example shows that five items (9, 6, 2, 5 and 7) are selected for the final scale. Perfect scale types are those in which the respondent's answers fit the pattern that would be reproduced by using the person's total score. Non-scale types are those in which the category pattern differs from that expected from the respondent's total score, i.e., non-scale types have deviations from unidimensionality, or errors. The selection of an item for the final unidimensional scale is made on the basis of the coefficient of reproducibility. Guttman has set 0.9 as the minimum level of reproducibility for a scale to be accepted. The following formula is used for measuring reproducibility:

$$ \text{Guttman's coefficient of reproducibility} = 1 - \frac{e}{n \times N} $$

where
e = number of errors
n = number of items
N = number of cases

5. Factor scales
Factor scales include a variety of techniques that have been developed to address two issues, viz., the problem of dealing with a universe of content that is multidimensional and the problem of uncovering underlying dimensions that have not been identified by exploratory research. Factor scales are developed through factor analysis, or on the basis of intercorrelations of items which indicate the common factor responsible for the relationships between items. The techniques are designed to intercorrelate items so that the degree of interdependence may be detected. Two important factor scales are the semantic differential scale and multidimensional scaling. They are discussed below.
(a) Semantic differential scale
The semantic differential (SD) scale was developed by Osgood and his associates to measure the psychological meaning of an object to an individual. The scale rests on the presumption that the object under study can have different dimensions of connotative meaning, which can be located in a multidimensional property space, called the semantic space in the context of the SD scale. The scaling consists of a set of bipolar rating scales, usually of 7 points, on which the respondents rate each concept. An example of the scale being used by a panel of corporate leaders to rate candidates for a leadership position is shown below. Three factors, viz., evaluation (E), potency (P) and activity (A), are considered.
(E) Sociable     (7) ____ ____ ____ ____ ____ ____ ____ (1) Unsociable
(P) Strong       (7) ____ ____ ____ ____ ____ ____ ____ (1) Weak
(A) Active       (7) ____ ____ ____ ____ ____ ____ ____ (1) Passive
(E) Progressive  (7) ____ ____ ____ ____ ____ ____ ____ (1) Regressive
(P) Tenacious    (7) ____ ____ ____ ____ ____ ____ ____ (1) Yielding
(A) Fast         (7) ____ ____ ____ ____ ____ ____ ____ (1) Slow

The nature of the problem determines the selection of dimensions and bipolar pairs. The SD scale is adapted to each research problem. The construction of an SD scale involves the following steps:
(1) The concept to be studied is selected on the basis of personal judgment. It should reflect the nature of the problem.
(2) The next step is to select the scales. In selecting a scale, its relevance to the concepts being judged and its factor composition should be kept in mind. At least three bipolar pairs should be used for each factor to measure evaluation, potency and activity. The scale should also be stable across subjects and concepts.
(3) A panel of judges rates the various stimuli on the selected scales, and the responses of all the judges are combined to determine the composite scaling.
The semantic differential scale is an efficient and easy way to secure attitudes from a large sample. The scale measures both direction and intensity. The test can be easily repeated, as it is a standardized technique.
(b) Multidimensional scaling (MDS)
Multidimensional scaling is a relatively more complicated scaling device which can be used to scale objects, individuals or both with a minimum of information. It provides a visual impression of the relationships between variables. MDS is used when all the variables, both metric and non-metric, are to be analyzed in a study simultaneously and all such variables happen to be independent. The data handling characteristics of MDS provide several options: ordinal input, fully metric and non-metric modes.
The various techniques use proximities as input data. A proximity is an index of perceived similarity or dissimilarity between objects. The respondents are asked to judge, in pairs, all the possible combinations of objects with regard to their similarity. Through a computer program, the ranked or rated relationships are represented as points on a map in multidimensional space. For example, if respondents are asked to identify similar products among a group of products, and products X and Y are judged the most similar, the MDS technique will position X and Y in such a way that the distance between them in multidimensional space is shorter than that between any other two objects.
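The product example above can be sketched with a very small metric MDS routine. This is an illustrative sketch only: it assumes the dissimilarities are interval-scaled, places the objects in two dimensions by plain gradient descent on the squared difference between map distance and judged dissimilarity, and uses made-up dissimilarity values. The names `mds_2d` and `map_distance` are hypothetical, and dedicated MDS software uses more refined algorithms.

```python
import random
from math import hypot


def mds_2d(labels, dissimilarity, steps=1000, rate=0.05, seed=0):
    """Place labels in 2-D so map distances approximate the dissimilarities."""
    rng = random.Random(seed)
    points = {name: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for name in labels}
    for _ in range(steps):
        for a in labels:
            grad_x = grad_y = 0.0
            for b in labels:
                if a == b:
                    continue
                dx = points[a][0] - points[b][0]
                dy = points[a][1] - points[b][1]
                dist = max(hypot(dx, dy), 1e-9)
                target = dissimilarity[frozenset((a, b))]
                # Gradient of (dist - target)^2 with respect to point a.
                grad_x += 2 * (dist - target) * dx / dist
                grad_y += 2 * (dist - target) * dy / dist
            points[a][0] -= rate * grad_x
            points[a][1] -= rate * grad_y
    return points


def map_distance(points, a, b):
    """Euclidean distance between two objects on the fitted map."""
    return hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])


# Hypothetical judged dissimilarities among four products (assumed data);
# X and Y are judged the most similar pair.
labels = ["X", "Y", "Z", "W"]
dissimilarity = {
    frozenset(("X", "Y")): 1.0,
    frozenset(("X", "Z")): 4.0,
    frozenset(("Y", "Z")): 4.1,
    frozenset(("X", "W")): 6.4,
    frozenset(("Y", "W")): 5.7,
    frozenset(("Z", "W")): 5.0,
}
points = mds_2d(labels, dissimilarity)
closest = min(dissimilarity, key=lambda pair: map_distance(points, *sorted(pair)))
print(sorted(closest))
```

The pair reported as closest on the fitted map should be the pair that respondents judged most similar, which is the property the text describes.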
Two approaches viz., the metric approach and the non-metric approach are available in the context of MDS. The metric approach to MDS treats the input data as interval scale data. The non-metric approach first gathers the non-metric similarities by asking respondents to rank order all possible pairs that can be obtained form a set of objects. Such non-metric data is then transformed into some arbitrary metric space and then the solution is obtained by reducing the dimensionality. The MDS enables the researcher to study the perceptual structure of a set of stimuli and the cognitive process underlying the development of this structure. It enables perceptual mapping in a multidimensional space. However MDS is not widely used because of the computational complications involved. SUMMARY: This unit examined the meaning and the various types of experimental design. The various activities involved in conducting the experiment were discussed. The various factors affecting the internal and the external validity were dealt. The pre-experiments, true/lab experiment and the field experiments were covered. The various rating and ranking scales were discussed. Construction of arbitrary, cumulative, consensus, item analysis and factor scales were examined. Equipped with the knowledge on the research design and measurement scales, the next unit presents the various data collection methods, sampling techniques and the parametric and non parametric test available to test the hypothesis. Have you understood? What is experimental design? Elucidate the benefits and drawbacks What essential characteristics distinguish a true experiment from other research design? Distinguish between the following: o Internal validity and external validity o Pre-experimental and quasi-experimental design o History and maturation o Random sampling, randomization and matching o Active factors and blocking factors What is internal validity and what factors affect the same? 
- What factors contribute to external validity?
- Discuss the different experimental designs, with illustrations.
- A retail grocery chain wants to study the effects of various levels of advertising effort and price reduction on the sale of specific branded grocery products. What type of experimental design would you recommend? Suggest the design for the study in detail.
- What are the essential differences among nominal, ordinal, interval and ratio scales? How do these differences affect the application of statistical techniques?
- What are the four sources of measurement error? Illustrate by example how each of these might affect the measurement results in a face-to-face interview.
- How is the interval scale more sophisticated than the nominal and ordinal scales?
- Why is the ratio scale considered the most powerful of the four scales?
- How do you ensure the soundness of a measure?
- Describe the difference between rating scales and ranking scales and indicate the application areas where they can be used.
- Discuss the relative merits and problems of:
  o Rating and ranking scales
  o Likert and differential scales
  o Unidimensional and multidimensional scales
- Discuss the techniques of constructing measurement scales, with examples.
Unit 3 Data collection Methods

3.1 Introduction
Once the problem is defined and the research design is finalized, the various sources of data and the ways in which data can be collected for analysis, testing of hypotheses and answering research questions should be explored. Data can be collected from primary, secondary and tertiary sources, which are dealt with in detail. This unit also highlights the methods of collecting data. Data cannot always be collected from the entire population, for reasons such as difficulty in estimating the population, cost constraints and time. The researcher therefore has to adopt a sampling technique for collecting data. This unit provides a detailed account of probability and non-probability sampling techniques. The issues in determining sample size are also presented.

3.2 Learning objectives:
After completing this unit you will be able to:
- Know the difference between primary and secondary data and their sources
- Understand the various data collection methods and the advantages and disadvantages of each method
- Know the criteria to be considered in deciding on the method of data collection
- Understand the need for sampling
- Understand the different types of sampling designs
- Identify the appropriate sample design for different research purposes
- Describe the issues to be considered in determining the sample size
3.3 Sources of data
Data sources can be broadly categorized into three types: primary, secondary and tertiary.

3.3.1 Primary data sources
Primary data refers to information gathered firsthand by the researcher for the specific purpose of the study. It is raw data without interpretation and represents a personal or official opinion or position. Primary sources are the most authoritative since the information has not been filtered or tampered with. Some examples of sources of primary data are individuals, focus groups, panels of respondents and the internet. Data can be collected from individuals through interviews, observation and similar methods.

Focus groups
A focus group is a formalized process of bringing a small group of people together for an interactive and spontaneous discussion on one particular topic. A focus group generally consists of 6 to 12 participants, with a moderator leading unstructured discussions that generally last between 90 minutes and two hours. By facilitating the discussion, the moderator elicits as many ideas, attitudes, feelings and experiences as possible regarding the issue concerned. Participants are generally chosen on the basis of their expertise in the topic on which information is sought. The goal of conducting a focus group is to give researchers access to as much information as possible regarding the product, service concept or organization. The focus group is not restricted to asking and answering questions. Its success relies to a great extent on the group dynamics, the willingness of the participants to engage in interactive dialogue and the ability of the moderator to keep the discussion on track. The fundamental idea behind the focus group is that one participant's remark or response may prompt comments and discussion from the other participants, thus generating spontaneous and free interplay among all the participants.
Focus groups are relatively inexpensive and can provide access to dependable data within a short period of time.

Focus group objectives
The objectives of forming focus groups are listed below:
- Focus groups provide data for defining and refining problems. In situations where it is difficult for the researcher to pinpoint the specific problems, the focus group helps to differentiate between symptoms and root-cause problems.
- In certain situations, researchers may not be sure about the specific types of data or information that should be investigated. In these situations focus groups reveal unexpected components of the problem and thus help researchers to determine the specific data that are to be collected.
- Quantitative research investigations sometimes lead to results that are not understandable or explainable. In such situations the focus group provides data for a better understanding of the results derived from the quantitative studies.
- Focus group interviews give researchers excellent opportunities to gain insight into the respondents' hidden needs, wants, attitudes, feelings and behaviours. They open the door to realism.
- The general interactions and discussions among the focus group members help to generate new ideas, products or services, or innovative ways of solving problems, which were hitherto unexplored.
- The focus group plays a critical role in developing new constructs and creating reliable and valid measurement scales. In the exploratory stage, the focus group reveals additional insights into the underlying dimensions that may or may not make up the construct. This insight can help the researcher to develop scales that can later be tested and refined through larger survey research designs.

Conducting focus group interviews:
The process of conducting focus group interviews can be divided into three logical phases: planning, conducting the discussion, and analyzing and reporting the results.

I. Planning the focus group study
The planning phase is critical for the conduct of a successful focus group interview. The researcher should have a clear idea of the purpose of the study, the problem definition and the specific data requirements. In the planning phase, decisions should be made regarding the type of participants in the focus group, the process of selecting and recruiting them, the size of the focus group and the location of the focus group session. These aspects are discussed below.

Focus group participants
The decisions regarding the type of participants must be made in tune with the purpose of the study. The focus group should be as homogeneous as possible, but it should also leave room for variation so as to encourage contrasting opinions. Important factors to be considered in the selection process are the potential group dynamics and the willingness of members to engage in dialogue. The participants' level of knowledge of the topic to be discussed should also be considered.

Selection and recruitment of participants
To select the participants for a focus group, the researcher should develop a screening form. The screening form should assess the characteristics that a respondent must possess in order to qualify as a prospective participant. The researcher should also choose a method of reaching the prospective participants. Options include existing participant lists, on-location interviews, snowball sampling, random telephone screening, and advertisements in newspapers and on bulletin boards. The issue of sampling needs to be addressed carefully while planning for the focus group. Random sampling may eliminate bias and produce dependable conclusions. However, it may not be possible or necessary in certain situations. For example, in qualitative research a more flexible research design can be followed.

Recruitment is the process of securing the willingness of the participants to be part of the focus group. It is not an easy task; however, professional recruiters with good interpersonal communication and social skills can be engaged to perform it successfully. The recruiters should clearly communicate the importance of the topic and specify the date, starting and ending times, location, incentives for participating and the method of contacting the recruiter.
To reinforce the participants' commitment to take part in the focus group, the researcher should send out a formal communication or invitation letter containing all the important information about the meeting. The recruiter should also remind the participants of the focus group meeting on the day before it is scheduled.

Size of the focus group
The optimal number of participants in a focus group interview may vary from 6 to 12. With fewer participants, creating the right type of group dynamics may be difficult and a few people may dominate the others. Too many participants will limit each person's opportunity to contribute to the discussion. The researcher should also allow for the fact that some of the participants who have agreed may not turn up for the discussion. Incentives may be provided to motivate participants.

Focus group locations
Focus group sessions can last between 90 minutes and two hours. The location assumes importance because of the length of the discussions. The location should be comfortable, spacious, uncrowded and conducive to spontaneous, unrestricted dialogue among all the group members. Depending on resource constraints, the focus group may be conducted in locations such as conference rooms, offices or hotel rooms. However, the ideal location is a specially designed room with facilities such as a large table, comfortable chairs, a relaxed atmosphere, built-in audio equipment and a one-way mirror, so that the researcher can view and hear the discussion without interfering or being seen. Video equipment should also be available to capture nonverbal communication and behaviour during the discussion.

II. Conducting the focus group discussions
The focus group is conducted by a moderator. The moderator's communication, interpersonal, probing, interactive and observation skills play a major role in the successful conduct of the focus group discussions.
The moderator should be able to stimulate and control the focus group discussions over the predetermined topics in a skillful manner. He should be able to draw the best and most innovative ideas from the participants regarding the topic or problem under discussion. The moderator is responsible for creating positive group dynamics and a comfort zone between himself and each group member as well as among the group members. The moderator should have enough background knowledge of the topic of discussion. Apart from the skill set discussed above, moderating the session requires objectivity, self-discipline, concentration and careful listening. The moderator should be completely prepared with the questioning route, yet should allow flexibility depending on the situation. The actual conduct of the focus group discussion can be arranged in three phases: opening the session, the main session and closing the session. They are dealt with below.
Opening of the session
The moderator should warmly receive the participants and make them feel comfortable. The participants should be asked to write their names on the name cards. A few minutes should be allowed for socializing before seating the participants so that a warm, friendly and congenial environment can be set. The moderator can use the socializing period to observe the participants and place them in groups. The moderator should explain the ground rules for the session, for example that only one person should talk at a time and that each person should be given a chance, and should brief the participants on the purpose of the session. The moderator can begin the discussion with an open question designed to engage all participants. This breaks the ice and helps to build positive group dynamics and a comfort zone.

The main session
The topic area is introduced in the main session. As the discussion starts, the moderator steers its direction, using probing techniques to get as many details as possible. As there are no hard and fast rules on how long a discussion can be carried on, the moderator should use his judgment in deciding when to close one topic and move to the next. The critical questions should be given more time so that ideas, feelings and thoughts can be elicited to the maximum.

Closing the session
After all the topics for which the focus group was formed have been covered, the session can be wound up. At this point the moderator can summarize the conclusions arrived at in the discussion and invite closing comments from the participants regarding further contributions or disagreement over certain ideas. If nothing else arises, the moderator can close the discussion after thanking the participants and distributing the promised incentives.

III. Analyzing and reporting the results
Two techniques are available for analyzing what happened in the focus group session: debriefing analysis and content analysis.
Debriefing analysis is an interactive procedure in which the researcher and the moderator discuss the subjects' responses to the topics that outlined the focus group session. Insights and perceptions can be expressed concerning the major ideas, suggestions, thoughts and feelings from the session. Ideas for improving the session can be uncovered and applied to further focus group sessions.

Content analysis is a systematic procedure of collecting individual responses and grouping them into larger theme categories or patterns. It is the formalized procedure most widely used by researchers to create data structures from focus group discussions. The responses from the group discussion session are recorded and transcribed. The researcher reviews the raw responses and creates data structures according to the common patterns. The process involves several analytical and interpretive factors.

To create the report, the researcher should first understand the audience, the purpose of the report and the expected format. The report should be clear and understandable and should provide data to support its findings. The researcher should include quotations and examples wherever needed. The report should be presented in a logical sequence. The fact that the report will be a historical record should also be given due consideration.

Advantages and disadvantages of focus group interviews
The following are some of the advantages of conducting focus group interviews:
- The spontaneous and unrestricted interaction among the participants gives rise to new ideas, thoughts and feelings which cannot be elicited in a one-to-one interview. The respondents will provide creative and honest opinions. The conducive environment enhances creativity.
- The underlying reasons for attitudes, feelings, emotions, behaviour and so on can be explored in focus group discussions.
- The researcher obtains firsthand information and has the opportunity to be involved in the overall process, right from starting the focus group until closing it. This gives in-depth insight into the various dimensions of the problem which were hitherto unexplored.
- A focus group interview can cover a number of topics. The discussion can be successfully directed over a number of issues.
- It brings together participants from diverse segments, which would otherwise be difficult.

The disadvantages of the focus group interview are dealt with below:
- Identifying the participants and gathering them together in one location is difficult.
- The data structures developed from focus group interviews cannot be applied as such to the target population. The generalizability of the research findings is questionable.
- The researcher has only limited ways to substantiate the reliability of the data. In addition, the data collected from the participants may not be structured or amenable to further statistical inference.
- The data collected from the focus group can be interpreted subjectively by the researcher according to the researcher's preconceived views. Such bias reduces the credibility and trustworthiness of the data and of the information derived from it.
- The cost per participant of identifying, recruiting and compensating participants is relatively high.
Online focus group
One of the major problems in collecting data through focus group interviews is gathering the participants in a particular location. This issue can be solved by administering online focus group interviews. Advances in the Internet, telecommunications and computer technology have led to the online focus group. In online focus groups the researcher or moderator and the participants meet and conduct the interview across the Internet on a real-time basis. The participants need not be physically gathered in a common location for the session. Email, websites and Internet chat rooms facilitate online focus group discussions. However, the cost of collecting data through online focus groups is high.

Panels
A panel is a sample of individuals, households or firms from whom information may be collected in successive time periods. Like focus groups, panels enable us to collect primary data. However, whereas a focus group meets for a one-time group session, a panel meets more than once. Panel studies are useful for tracking the effect of changes over a period of time. The members of the panel are randomly chosen. They may be exposed to an advertisement, or their attitude towards a particular brand may be recorded. After a few days or months the panel may be exposed to a different set of advertisements, or their attitude may be measured again, to identify changes in the behavioural pattern. Thus the continuing set of members forms the sample base or platform for assessing the effects of change. Such a set of members is called a panel and the research that uses it is called a panel study. Panels can be static or dynamic. In a static panel the same members form part of the panel over an extended period of time. In a dynamic panel the members change from time to time as the study progresses through successive phases. The static panel offers a good and sensitive measure of changes.
However, because of the continuous interviews, the panel members are overexposed to the issues on hand and may no longer reflect the views of the population. The members may also not continue to be part of the panel for a long period of time; there may be dropouts. The major drawback of the dynamic panel is that it deals with different people, which may give rise to different opinions, so the changes cannot be tracked in an objective manner.

3.3.2 Secondary data sources
Secondary data refers to information gathered from already existing sources. Secondary data may be either published or unpublished. Published data are available in the following forms:
- Publications of the central, state and local governments
- Publications of foreign governments, international bodies and their subsidiary organizations
- Technical and trade journals
- Books, magazines and newspapers
- Reports and publications of various business and industrial associations, stock exchanges, banks and other financial institutions
- Reports prepared by research scholars, universities and economists in different fields
- Public records and statistics, historical documents and other sources of published information
- Online and real-time databases

The unpublished sources include company records or archives, diaries, letters, biographies and autobiographies, and the records of other public or private organizations. Collecting secondary data involves less time and cost. However, a researcher should not depend solely on secondary data, for two reasons: the data can become obsolete and may not provide current, updated information, and the data will have been collected for some other purpose and hence may not meet the specific requirements of the researcher. Before using secondary data the researcher should ensure the following:
- Reliability: the reliability of the data should be established by finding out the type of people involved in the data collection, the sources from which the data were collected, the methods used to collect the data, the time of data collection and the associated level of accuracy.
- Suitability: the secondary data will have been collected for a problem other than the one the researcher is presently attempting to solve. Hence the researcher should ensure that the data are suitable for the purpose of the present study.
- Adequacy: the secondary data should be adequate for the conduct of the study. They should be related to the area and should be neither narrower nor wider than the problem addressed by the researcher.

3.3.3 Tertiary sources
A tertiary source is an interpretation of a secondary source. Tertiary sources are generally represented by indexes, bibliographies, dictionaries, encyclopedias, handbooks, directories and other finding aids such as internet search engines.

3.4 Methods of data collection
The data collection method is an integral part of the research design. There are various methods of data collection, and each has its own advantages and disadvantages. Selecting an appropriate method may enhance the value of the research; at the same time, a wrong choice may lead to questionable research findings. Data collection methods include interviews, self-administered questionnaires, observation and other methods.
The choice of a method depends on the following factors:
- Nature, scope and objectives of the research
- Availability of resources
- Degree of accuracy required
- Expertise of the researcher
- Time span of the study
- Cost involved and the like

The data collection methods are discussed below in detail.

3.4.1 Interviews
In this method the respondents are interviewed to obtain information on the issues pertaining to the research. The interview may be either unstructured or structured, and it can be a personal interview or one conducted through telephone, mail, the internet or a combination of these.

Unstructured interviews
In an unstructured interview, the interviewer does not conduct the interview with a planned sequence of questions. The aim of this interview is to bring out the preliminary issues so that the researcher can determine the variables which need further in-depth investigation. The researcher resorts to unstructured interviews when the problem is not clearly formulated or when a clear understanding of the variables involved is lacking. In attempting to obtain information, the researcher may adopt different styles and sequences of questions for different respondents. Some respondents may provide information in response to open-ended questions, whereas others may require more direction. Some respondents may be defensive and may not be willing to share information. Some may even be reluctant to undergo the interview and may refuse to respond. The researcher has to employ various questioning techniques so as to bring the respondents' defenses down and make them more amenable to revealing information. The researcher should also know when to retreat or terminate the interview if the respondents cannot be convinced to participate or impart the information. The unstructured interview directs the researcher towards the variables which need greater focus, on the basis of which a structured interview can be planned.

Structured interviews
Structured interviews are conducted when the interviewer knows the type of questions to be asked of the respondents, or when the information needs are clearly known. The questions may focus on the issues that were highlighted during the unstructured interviews and are considered relevant to the problem identified. The interview may be conducted by the researcher himself or by a team of interviewers. The researcher or interviewer should be very clear about the purpose of each question, particularly when a team of interviewers conducts the survey. The same questions are posed in the same sequence and manner to all the respondents, and the responses are noted down. Depending on the situation and the respondent's willingness and knowledge, the researcher can also ask other relevant questions which are not on the list so as to gain more insight into the identified problem. The researcher may also use visual aids, drawings, pictures and other materials in conducting the interviews. Visual aids are particularly useful in situations where ideas cannot be clearly articulated in words alone.

1. Personal interviews
A personal interview, or face-to-face communication, is a two-way conversation initiated by the interviewer to obtain information from the participants. The interviewer and the participants may be strangers. The interviewer controls the topic and pattern of discussion. The participants or respondents may not gain anything from their participation in the interview. The success of the personal interview depends, among other things, on the respondent's ability to provide the information needed and to understand the importance of the information he provides. The researcher should take the necessary steps to motivate the respondents to cooperate so as to ensure the successful conduct of the interview.
Increasing participation
The researcher can enhance the respondents' participation by explaining the kind of answer sought, the terms in which it should be expressed, the depth and clarity of information needed and so on. Coaching can be provided to the participants, but care should be taken to avoid introducing bias. The interviewer can make the session an interesting and enjoyable experience by using adequate motivation techniques. Some of the techniques for successfully interviewing participants are listed below:
- The interviewer should introduce himself by name and state the organization to which he is affiliated. The interviewer can identify himself with introductory letters or other information that confirms the legitimacy of the work. Enough details regarding the work to be done should be given; more information may be provided wherever it is demanded.
- The interviewer should be able to kindle the interest of the respondent. If the participant is busy, the interviewer should try to stimulate interest so as to arrange for an interview at another time.
- The successful conduct of an interview requires good rapport and understanding between the interviewer and the participant. The interviewer should earn the confidence of the respondent so as to elicit responses without censure, coercion or pressure.
- In the process of gathering data, the interviewer should ensure that the objective of each question is achieved and the needed response is obtained. The interviewer can resort to probing, but steps should be taken to avoid bias.
- The interviewer should record the answers of the participant in an efficient manner. The interviewer should record responses as they occur; recording the responses later will lead to loss of information. Shorthand techniques such as recording only the keywords can be used when time is short.
Interviewers should have good communication skills, should be able to adapt to flexible schedules, should be willing to work intermittent hours and should be mobile. If the interview is conducted by the researcher himself, there is no need for much training. Otherwise, proper training should be provided so that the interviewer understands the objective of the study, the purpose of each question, the possible responses, an outline of the research work being conducted, its importance and so on. Written instructions can be provided wherever needed. The interviewer should follow sound questioning techniques. A funneling approach can be practiced, i.e. at the beginning of an unstructured interview open-ended questions can be asked to get a broad idea and form an impression of the situation. Care should be taken to see that the questions are unbiased. The interviewer should restate or rephrase important information so as to ensure that the issues are recorded as the respondent intends to represent them. The researcher can also help the respondent to verbalize perceptions.

Problems in conducting personal interviews
The two problems in conducting personal interviews are increased cost and biased results. Biased results arise out of three types of error: sampling error, response error and non-response error.

i. Sampling error
One of the major criteria of a good sample design is the precision of the estimates made from the sample. The sample respondents selected for the interview may not fully represent the population in all aspects. The numerical descriptors that describe the sample may differ from those that describe the population because of the random fluctuations inherent in the sampling process. This is called sampling error. Sampling error reflects the influence of chance in drawing the sample members. It is what is left after accounting for all known sources of systematic variance.

ii. Non-response error
Non-response error occurs when the responses of participants differ in some systematic way from the responses that non-participants would have given. The error arises when a selected sample respondent cannot be located or accessed, or is not willing to participate in the interview. This problem arises particularly when samples are selected through probability sampling methods. It can be tackled by attempting to contact the respondent again. Another approach is to treat all the remaining non-participants as a new subpopulation after a few callbacks. A random sample is drawn from the non-participant group and an attempt is made to contact and complete this sample at a hundred percent success rate. The findings from this non-participant sample can then be weighted into the total population estimate.
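The weighting step just described can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the estimate of interest is a simple mean, every member of the follow-up sample of non-participants responds, and each group is weighted by its share of the originally selected sample. The function name and all figures are hypothetical.

```python
def weighted_population_estimate(respondent_mean, n_respondents,
                                 followup_mean, n_nonrespondents):
    """Combine the mean from the original respondents with the mean from a
    follow-up sample of non-participants, weighting each group by its share
    of the originally selected sample."""
    total = n_respondents + n_nonrespondents
    if total <= 0:
        raise ValueError("the selected sample must contain at least one unit")
    weight_respondents = n_respondents / total
    weight_nonrespondents = n_nonrespondents / total
    return (weight_respondents * respondent_mean
            + weight_nonrespondents * followup_mean)

# Hypothetical figures: 1,000 people selected, 700 interviewed, 300 not reached.
# A follow-up random sample drawn from the 300 gives a mean score of 30.0,
# while the 700 original respondents gave a mean score of 40.0.
estimate = weighted_population_estimate(
    respondent_mean=40.0, n_respondents=700,
    followup_mean=30.0, n_nonrespondents=300,
)
```

With these figures the weighted estimate is roughly 37, compared with 40 if the non-participants were simply ignored, which shows how non-response can bias an unadjusted result.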
The researcher can also try to substitute the missing participant, but care should be taken to see that the substitute possesses the significant characteristics of the participant being replaced. For example, the substitute should belong to the same occupation, educational status, income level and so on.

iii. Response error
Response error occurs when the data reported differ from the actual data. The error can be caused by the respondent or the interviewer, or can arise during the preparation of data for analysis. Participant-initiated error occurs when the participant fails to answer accurately, either by choice or through lack of knowledge. Interviewer error arises from the inability to conduct the interview in a controlled manner. It may take many forms, such as failure to secure cooperation, lack of consistent interview procedures, inability to establish an appropriate interview environment, bias due to physical presence and failure to record answers correctly. These errors affect the quality of the data collected.

iv. Cost
To conduct personal interviews, the respondents have to be met individually. They may be scattered geographically, and the time and cost of the administrative and travel tasks involved are high. Sometimes the respondents are not available and repeated contacts have to be made, which adds to the cost. In addition, the researcher may employ interviewers who have to be paid. To reduce the cost, telephone interviews and self-administered surveys can be attempted.

Advantages and drawbacks
The major advantage of personal interviewing is the ability to secure in-depth and detailed information. More information can be obtained through personal interviewing than through telephone, mail or internet surveys. The researcher can adapt the questioning technique to the respondent's ability to understand. Clarification can be given immediately by repeating or rephrasing the questions concerned.
The researcher can also get information from the nonverbal cues exhibited through the body language of the respondent. However the personal interviewing involves cost in terms of both money and time. Costs may escalate in case, where the study covers a wide geographic area or has a large sample to be covered. The chance of the outcome being affected by the interviewers bias is more in the case of personal interviews. The respondents may feel uneasy about the secrecy of their responses in case of the face to face interaction. 2 Telephone interviews Interviewing through telephones enables to gain the following advantages;
Conducting interviews by telephone reduces the cost. The reduction comes from lower travel expenses and lower administrative expenses for training and supervision. Fewer interviewers need to be trained, since each interviewer can cover more respondents by telephone than face to face.

Telephone interviews make it possible to screen and cover a large population spread over a wide geographical area. This allows a much more representative sample.

Computer assisted telephone interviewing allows the data collected in the interview to be entered directly into the computer, by means of terminals or through voice data entry. This helps to reduce cost and time further. Computer administered telephone surveys can also be conducted, where the computer replaces the interviewer. A computer calls the phone number, conducts the interview and places the data in a file for later tabulation.

The interviewer bias caused by physical appearance, body language and actions is reduced by using the telephone.

The respondent may feel more relaxed, comfortable and unhesitant to reveal information, as there is no face to face contact.

Unlike the face to face interview, where the respondent may avoid contact with the researcher, the contact rate is higher in telephone interviews, as the respondent has to pick up the ringing phone. However, the use of caller identification may reduce the contact rate.

The following drawbacks arise out of telephone interviews:

Though the penetration rate of telephones is increasing in India, there is still a vast population without a telephone. The number of users with only a cell phone connection is also increasing. Their numbers are not listed, and reaching them is difficult. Members of a random sample drawn from telephone directories may sometimes not be reachable at the number given, or the line may be malfunctioning.

The duration of a telephone interview is limited. A ten minute interview is considered ideal, although an interview may sometimes extend to more than an hour.

It is difficult or impossible to use maps, illustrations, visual aids or measurement scale techniques in a telephone interview. The researcher cannot rely on visualization techniques.

The respondent can terminate the interview as easily as the contact was made. The level of interest and rapport in a telephone interview is also much lower than in a face to face interview.

A challenging and distracting physical environment, at home or at the office, may affect the quality of the data collected and may also result in refusal to participate in the interview.
3.4.2 Observation

Observation is the most commonly used data collection method in many studies relating to the behavioral sciences. It allows data to be collected without asking questions of the respondents. The respondents can be observed in their natural work environment or in a lab setting, and the activities and behaviors of interest can be recorded. In research, casual examination without a purpose cannot be called observation. Observation becomes a scientific tool for data collection if it is conducted specifically to answer a research question. It should be systematically planned and executed using proper controls, and should provide a reliable and valid account of what has happened.

Types of observation

Observation can be grouped under the following categories.

1. Type of activity under observation
Observation includes monitoring both behavioral and nonbehavioral activities and conditions. Behavioral observation includes nonverbal analysis, linguistic analysis, extralinguistic analysis and spatial analysis. Nonverbal analysis covers body movement, motor expression and exchanged glances. Body movement indicates interest, boredom, anger or pleasure. Motor expression includes facial movements, blinking of the eyes and exchanged glances.
Linguistic behaviour includes the number of repeated words used by persons in a conversation. It also includes the type of interaction process that occurs between two persons or in small groups. There are four dimensions to extralinguistic behaviour, viz., (i) vocal, which includes pitch, loudness and timbre; (ii) temporal, which includes the rate of speaking, duration of utterance and rhythm; (iii) interaction, which includes the tendencies to interrupt, dominate or inhibit; and (iv) verbal stylistic, which includes vocabulary and pronunciation peculiarities, dialect and characteristic expressions. Spatial relationship refers to how a person relates physically to others. For example, proxemics is the study of how people organize the territory about them and how they maintain discrete distances between themselves and others.

Nonbehavioral analysis includes record analysis, physical condition analysis and physical process analysis. The records may be historical or current, public or private, and written or printed. Physical condition analysis includes conducting store audits, studies relating to plant safety compliance, analysis of financial statements etc. A process is a series of steps taken to complete an activity. Physical process analysis includes time/motion studies of the manufacturing process, analysis of traffic flow in a distribution system, financial flow in an organization etc.

2. Directness of the observation
Observation can be grouped as direct or indirect. Direct observation happens when the observer is physically present and monitors the event while it is taking place. This is highly flexible, as the observer can decide what to observe, how much time to spend on observing an aspect, when to shift focus etc. However, the observer may feel bored or frustrated by constantly being on the watch and may tend to lose focus. This may reduce the accuracy and completeness of the observation.
Another weakness is that the observer may be overloaded when events take place too quickly to be tracked or recorded.

Observation carried out using mechanical, photographic or electronic means is grouped under indirect observation. For example, the use of video cameras, pupillometric devices etc. to capture the behaviour of consumers is indirect observation. Indirect observation can be carried out in an unbiased manner. Further, loss of information due to boredom, fatigue, overloading etc. is avoided. However, indirect observation is less flexible, as the devices may have to be programmed in advance.

3. Concealment
This categorization is based on whether the participant is aware of the observer's presence. The presence of the observer may cause the participant to behave differently, which may defeat the very purpose of the observation. If the activity in which the participants are involved is highly absorbing, there is a good chance that they will remain unaffected by the presence of the observer. However, the potential bias due to the presence of the observer cannot be totally ruled out. To rule out this bias in behaviour, the observers may conceal themselves from the participants being observed using some mechanical means, for example a one way mirror, camera or microphone. However, this has to be carefully evaluated on ethical grounds.

Partial concealment is where the presence of the observer is not concealed but the observer's objectives or interests are not revealed. To evaluate the performance of a salesperson, a sales manager may be present when the salesperson is dealing with a customer. However, the purpose of the sales manager's presence may be concealed, and the manager may pretend to be involved in some other task.

4. Participation
Participant observation is where the observer is present and involved in the research setting, playing the role of observer as well as participant. The participants may or may not know that they are being observed. The observer has to be more efficient, having to play a dual role. Non-participant observation occurs when the observer collects the data without becoming an integral part of the research setting. The observer merely observes the activities, records them and tabulates them in a systematic manner. This type of observation requires the observer to be physically present in the research setting for an extended period of time, which makes it a time consuming task.
5. Definiteness of structure
Observation can be grouped as structured or unstructured. Structured observation is characterized by a clear definition of the various aspects of observation, viz., the units to be observed, the method of recording, the extent of accuracy needed, the conditions of observation and the selection of pertinent data. Structured observation is appropriate for descriptive studies. If the observation is conducted without these characteristics being defined in advance, it is termed unstructured observation. This method is usually followed in exploratory studies.

6. Extent of control
Observation can be carried out in controlled or uncontrolled settings. Uncontrolled observation is carried out in a natural setting. No attempt is made to use precision instruments. The main aim of this method is to get a spontaneous picture of reality. It provides naturalness and completeness to observation. However, it may lead to subjective interpretation and to overconfidence, with the observer believing that he knows more about the observed phenomena than he actually does. It is usually used in exploratory research.

Controlled observation takes place according to a definite predetermined plan. It involves experimental procedure and the use of precision instruments to record the observation. The observation is usually carried out in a standardized and accurate manner, allowing a certain assured degree of generalization. It is usually carried out in the form of experiments in a laboratory or under controlled conditions.

Decisions involved in conducting the observational study

Observational studies involve decisions regarding the type of study, the content to be observed, the training requirements of the observer/researcher and the data collection.

1. Type of the study
Observation in various forms is practiced in different types of studies. In exploratory studies, data collection is done through simple observation, which may not be carried out in a structured manner. In studies other than exploratory ones, systematic observation employing standardized scientific procedure is followed.

2. Content specification
In observational studies, the variables of interest and the other variables that may affect them should be specified. From the specified variables, those that are to be observed should be selected. The variables should be operationally defined so as to avoid confusion in the minds of the observers.

3. Training the observers
The validity and reliability of the findings from observation depend on the observer. If the observer is not properly trained, the data collected may not lead to valid results. The observer is prone to fatigue, halo effects and observer drift, which affect the dependability of the data collected. Hence certain guidelines should be followed in selecting observers. The observer should be able to function amidst many distractions, remember details of the activity observed, blend with the setting being observed and extract the most from the observational study. The observer should be given clear instructions regarding the outcome sought and the precise content to be observed.

4. Data collection
The data collection plan answers the questions who, what, when, how and where: the qualification of a participant to be observed, the characteristics of the observation, the time of observation, the method by which the observers record data and the place where the observation is to be conducted.

3.4.3 Questionnaires

Most research studies carried out to solve business problems require the researcher to depend on primary data. The researcher should collect data through questionnaires/interview schedules and process them so as to provide a solution to the identified problem.
A questionnaire is a formalized framework consisting of a set of questions and scales designed to generate primary raw data. It is a preformulated written set of questions to which the respondents record their answers. The answers are mostly chosen by the respondent from within closely defined alternatives. Questionnaires can be administered personally, mailed to the respondents or distributed electronically.

A. Personally administered questionnaire
If the study is confined to a local area, the data can be collected by personally administering the questionnaires. The main advantage is that the researcher can collect all the completed responses within a short period of time. The researcher has an opportunity to introduce the research topic and motivate the respondents to offer frank answers. Any doubts that the respondents have about any question can be clarified on the spot. Administering the questionnaire to a large number of respondents at a time saves time and expense and ensures quick collection of data, as against personal interviewing. Hence, wherever possible, group administration of the questionnaire should be opted for, depending on the sampling frame. The major drawback is the reluctance of organizations to give time for conducting a survey among a group of employees.

B. Mail questionnaire
Where the respondents are scattered over a wide geographical area, the researcher has to resort to mail questionnaires. The questionnaires are mailed to the respondents, who can complete them at their convenience, in their homes and at their own pace. The main advantage is that the anonymity of the respondents is maintained, which leads to a free and frank disclosure of information. Respondents spread over a wide geographical area can be reached, and they can take more time to fill in the questionnaire at their convenience. The questionnaire can also be administered electronically. However, the return rates of mail questionnaires are typically low. Doubts about the questionnaire cannot be cleared as easily as with a personally administered questionnaire. The representativeness of the sample is questionable because of the low return rates.
The respondents can be motivated by sending follow-up letters, enclosing small monetary amounts as incentives, providing self-addressed, stamped return envelopes and keeping the questionnaire as brief as possible.

Developing a questionnaire requires both creativity and a scientific approach. It involves creativity because the researcher has to choose words creatively in communicating with the respondents. Writing questions alone does not make a questionnaire. It should also be scientific, integrating the established rules of logic, objectivity, discriminatory power and systematic procedure.

Guidelines for questionnaire design

A good questionnaire accomplishes the research objectives. The logical sequence of steps involved in developing a good questionnaire is discussed below.

I. Deciding the information to be collected
II. Formulating the questions needed to obtain the information
III. Deciding on the wording of the questions and the layout of the questionnaire
IV. Pretesting the questionnaire and correcting the problems

I. Deciding the information to be collected

The researcher should have a clear idea of exactly what information is to be collected from each respondent. Lack of clarity leads to the collection of irrelevant and incomplete information which does not contribute to the research purpose and diminishes the value of the study. Clarity can be facilitated by:

1. Clear research objectives, which provide an insight into the kind of information needed, the hypotheses and the scope of the research
2. Exploratory research, which reveals the variables to be explored and helps the researcher to understand the point of view of the respondents
3. Experience with similar studies
4. Pretesting the preliminary version of the questionnaire

In deciding the content of the questionnaire, the following guiding factors should be considered.

A question may seek information regarding objective variables, subjective variables or both. For objective variables like age, gender, income etc., a single direct question can be asked. However, if the question concerns a subjective variable, for example attitude, feeling, satisfaction etc., then the questions should tap the dimensions and elements of the concept concerned.

The researcher should challenge each question in terms of its contribution towards answering the objectives. Questions which merely contribute interesting information, and not towards the fulfillment of the objectives, should be avoided. The researcher should learn the art of getting more information with fewer questions.
The question should have a proper scope and should cover the issue. The questions asked should reveal all that needs to be known. Questions are ineffective if they do not provide the information that is needed.

The question should ask precisely what is needed. For example, if the researcher needs to know the family income of the respondent but the question asks only about income, the respondent may take it to mean the respondent's own income and not the family income. Unambiguous words should be used to ensure clarity.

A question may contribute to the theme and may be precise, and yet the respondent may not be able to answer it adequately. The respondent may require time to think before answering certain questions. Sometimes the respondent may not be able to give an accurate answer due to an inability to recall things from memory.
II. Formulating the questions

Before formulating the questions, the researcher has to decide on the degree of freedom to be given to the respondents in answering them. The various types of question that can be included in a questionnaire are discussed below.

1. Open ended versus closed questions
Unstructured or open ended questions allow respondents to reply in their own words. The respondent can answer in any way he chooses. Predetermined responses are not given to aid the respondent. An example is a question asking the respondent to list five factors which made him choose a particular investment proposal. This type of question requires more thinking and effort on the part of the respondent. In most cases an interviewer is required to prompt the response by asking probing questions. If correctly administered, the open ended question can provide the researcher with a rich array of information.

A structured or closed question, in contrast, provides a set of predetermined responses, and the respondent is required to choose among them. This type of question reduces the amount of thinking and effort required of the respondent. Instead of asking the respondent to list five factors, the questionnaire may provide a set of 10 to 15 factors and ask the respondent to rank the top five from the list in order of preference. All items in a questionnaire that use a nominal, ordinal, Likert or ratio scale are considered closed. Closed questions enable the researcher to code the responses easily for subsequent analysis. Care should be exercised to make the alternatives provided mutually exclusive and collectively exhaustive. Even well delineated categories in a closed question may make the respondent feel confined, and he may wish to provide additional comments.
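For closed questions with numeric response ranges, such as age or income bands, the requirement that alternatives be mutually exclusive and collectively exhaustive can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the categories are inclusive whole-number ranges, and the function name `check_categories` and the age bands are hypothetical illustrations only.

```python
def check_categories(categories, lowest, highest):
    """Report overlaps (not mutually exclusive) and gaps (not collectively exhaustive).

    categories: list of (label, low, high) tuples, inclusive whole-number ranges
    lowest, highest: the full range of values a respondent could report
    """
    problems = []
    expected = lowest  # next value that still needs to be covered
    for label, low, high in sorted(categories, key=lambda c: c[1]):
        if low < expected:
            problems.append(f"overlap before '{label}'")
        elif low > expected:
            problems.append(f"gap before '{label}'")
        expected = max(expected, high + 1)
    if expected <= highest:
        problems.append("gap at the top of the range")
    return problems


# Hypothetical age bands: 25 appears twice and 36 to 39 is not covered.
bands = [("18-25", 18, 25), ("25-35", 25, 35), ("40-60", 40, 60)]
for problem in check_categories(bands, lowest=18, highest=60):
    print(problem)
```

Passing a check like this does not remove the limitation noted above: even well delineated categories may leave a respondent feeling confined and wanting to add comments.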
The researcher can tackle this issue by supplementing the closed questionnaire with a final open ended question.

2. Dichotomous questions
Dichotomous questions offer two alternatives. The choices presented should be mutually exclusive, i.e. the respondent can choose only one of the two answers. At the same time, the choices should be collectively exhaustive.

3. Multiple choice questions
Multiple choice questions offer more than one alternative answer, from which the respondent makes a single choice. The list of answers provided should be collectively exhaustive. The alternatives should represent different aspects of the same conceptual dimension. The multiple choice question usually generates nominal data. When the choices are numbers, the response structure will produce at least interval and sometimes ratio data.

4. Checklist questions
Checklist questions are used when the researcher wants the respondent to give multiple responses to a single question, for example the factors leading to the choice of a particular brand of laptop. The same information can be obtained using a series of dichotomous selection questions, one for each factor. However, that would consume time and space. Checklists are more efficient.

5. Ranking questions
A ranking question is used when the relative order of the alternatives is important. For example, the checklist question regarding the factors leading to the choice of laptop will only provide the factors considered, not their order of importance. A ranking question will lead the respondent to rank the most important factor as 1, the next most important as 2 and so on.

6. Positively and negatively worded questions
The questionnaire should include both positively and negatively worded questions. If all the questions are positively worded, the respondent will tend to mechanically circle points toward one end of the scale. A respondent who wants to finish the questionnaire quickly will tend to mark all the questions at one end. The researcher can keep the respondent more alert by including both positively and negatively worded questions. Double negatives and excessive use of words such as not, only etc. should be avoided in negatively worded questions, as they tend to confuse the respondents.

7. Double-barreled questions
A question that leads to different possible responses to its subparts is called a double-barreled question. Such questions should be avoided by breaking them into two or more parts. An example is the question "Do you like the flavour and the taste of the soft drink?". The question may lead to an ambiguous reply. It should be broken into two questions addressing flavour and taste separately so as to obtain unambiguous responses.

The types of question dealt with below should be carefully avoided or used with caution by the researcher.

8. Ambiguous questions
A question may not be double-barreled and may still lead to ambiguity. For example, if a researcher studying job satisfaction asks the respondent to rate the level of satisfaction, the respondent may be confused as to whether the question addresses satisfaction with the work environment, salary, team spirit or overall satisfaction. The question should not give rise to ambiguous responses and bias.

9.
Memory related questions
If the questions require respondents to recall experiences from a distant past that are very hazy in their memory, the answers to such questions may be biased.

10. Leading / loaded questions
Questions should not be asked in such a way that the respondents are forced or directed to respond in a manner in which they would not have responded under normal circumstances, where all possible alternatives are given. Questions should not prompt the respondents to answer in the way the researcher wants. An example is "Don't you think that salary is the main reason for software employees to quit the job?". Questions which are emotionally charged are called loaded questions. Such questions lead to bias in the responses and should be avoided.

11. Bad questions
Any question that prevents or disturbs the fundamental communication between the researcher and the respondent is considered a bad question. Examples of bad questions are incomprehensible questions, unanswerable questions, leading or loaded questions, double-barreled questions etc.

III. Deciding on the wording of the questions and the layout of the questionnaire

The basic components of a questionnaire are its words. The researcher should be careful in choosing the words used to create the questions and scales for collecting raw data from respondents. The words used can influence the respondents' reactions to a question. Even a small change in wording can affect the respondents' answers, but it is difficult to know in advance whether or not a change in wording will have an effect. The wording and the language used in the questionnaire should be appropriate and understandable to the respondents.
Certain guidelines for deciding the wording of the questionnaire are given below.

The vocabulary should be simple, direct and familiar to all respondents. If the wording, the jargon or the language is not understood by the respondent, it may lead to wrong or biased answers. The wording and language should be selected keeping in mind the educational level of the respondents, the terms used in their culture and their frames of reference.

The words used should not give rise to ambiguity or vagueness. This problem arises when the respondent is not given an adequate frame of reference, in time and space, for interpreting the question. Words such as often and usually lack an appropriate time referent, leading the respondents to choose their own, which makes the answers not comparable. Similarly, the appropriate space or location is often not specified. For example, does the question "Mention your place of origin" call for the district, the state or the country?

Double-barreled questions should be avoided. The respondent may agree with one part of the question but not the other. An example is "Are you satisfied with the salary and increments given?". The question should be broken up, or else it will lead to confusion and incorrect answers.

The instructions provided for answering the question should not confuse the respondent.

The questions should be directed towards measuring the respondents' knowledge of or interest in the subject.

The questions asked should be applicable to all the respondents. Otherwise respondents may be made to answer a question though they do not qualify to do so or lack an opinion. An example is "Which other airways have you traveled with before?". This situation can be avoided by asking a qualifying or filter question and limiting further questioning to those who qualify.

Simple, short questions should be asked instead of long ones. The researcher should see that each question or statement in the questionnaire is worded as briefly as possible.

Questions should not be asked in a manner that elicits a socially desirable response. An example is "Do you think that physically challenged people should be given more weightage in employment opportunities?". Irrespective of the respondent's true feelings, a socially desirable answer would be provided.
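The qualifying or filter question mentioned in the guidelines above can be pictured as a simple routing rule. The sketch below is a hypothetical Python illustration: the filter question wording, the answer key `has_flown` and the function name `questions_to_ask` are assumptions made for the example and are not part of any survey tool.

```python
FILTER_QUESTION = "Have you traveled by air before? (yes/no)"
FOLLOW_UP_QUESTION = "Which other airways have you traveled with before?"


def questions_to_ask(answers):
    """Return the questions a respondent should see, given answers so far.

    answers: dict of recorded answers; 'has_flown' holds the reply to the filter question.
    """
    asked = [FILTER_QUESTION]
    # Only respondents who qualify are taken on to the follow-up question.
    if answers.get("has_flown") == "yes":
        asked.append(FOLLOW_UP_QUESTION)
    return asked


print(questions_to_ask({"has_flown": "no"}))   # filter question only
print(questions_to_ask({"has_flown": "yes"}))  # filter plus follow-up
```

Respondents who have never flown are not shown the airline question at all, so they are not pushed into answering something they do not qualify to answer.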
Sequencing and layout decisions

The order in which the questions are presented can encourage or discourage commitment and promote or hinder the development of rapport between researcher and respondent. The sequence of questions should lead the respondents from questions of a general nature to questions of a specific nature. The questionnaire should start with relatively easy questions which do not involve much thinking and progress to difficult ones. This allows the respondents to move easily and smoothly through the various items in the questionnaire.

Care should be taken to see that positively and negatively worded questions addressing the same issue or concept are not placed contiguously. For example:

I am satisfied with the working environment
I am not satisfied with the working environment

If the above questions appear one after the other, they will appear meaningless to the respondent. The two questions should be placed in different parts of the questionnaire.

The way in which questions are sequenced may introduce bias in the responses, which is frequently referred to as the ordering effect. Placing the questions randomly in the questionnaire would reduce this bias; however, it is not attempted, as it would make it difficult to categorize, code and analyze the responses.

Layout of the questionnaire

The appearance of the questionnaire is as important as its content. A neat, properly aligned and attractive questionnaire with a good introduction, instructions and well sequenced questions and response alternatives makes it easier for the respondents to answer. These aspects are explained below.

In the introduction section, the researcher can disclose his identity and communicate the purpose of the research. The introduction is also used to motivate the respondents to answer the questions, by conveying the importance of the research work and of the respondent's contribution.
The researcher should also assure the respondent of the confidentiality of the information provided. The introduction section should end with a courteous note thanking the respondent for the time devoted to the survey.

The questions should be organized in a logical manner and numbered sequentially under appropriate sections. Proper instructions should be provided so that the questions can be completed in an unambiguous manner. The questions should be neatly aligned so that the respondent can read and answer them without difficulty. The questionnaire should be designed so that the respondent spends only the minimum time and effort in completing it.

Questions relating to the personal profile of the respondents, viz., name, gender, age, education, income, marital status etc., can appear at the beginning or at the end of the questionnaire. These questions should provide a range of response options rather than seek an exact figure. Personal profile questions asked at the end may have a greater chance of response, because the respondent will have gone through the other questions and been convinced of the legitimacy and genuineness of the questionnaire. This makes respondents more amenable to revealing personal information. Some researchers feel that asking for personal data at the beginning enables the respondents to identify themselves psychologically with the questionnaire and enhances their commitment to respond.

It is better to avoid asking the name of the respondent, as this ensures anonymity and enhances the probability of response. A questionnaire can be identified with a particular respondent by assigning a number instead of asking for the name. A separate private document can be maintained connecting the names with the numbers assigned.

The open ended questions should be put at the end, so that the respondent finds it easy to comment on the various aspects.

The questionnaire should end with an expression of sincere thanks to the respondents for their valuable time and effort. The researcher can also include a courteous note reminding the respondents to check that all the items have been completed properly.

IV. Pretesting the questionnaire

The purpose of a pretest is to ensure that the questionnaire meets the researcher's expectations in terms of the information to be obtained. The objective of the pretest is to identify and correct the deficiencies in the questionnaire. It may lead to questions being revised many times. It involves the use of a small number of respondents to test the appropriateness of the questions. Fifteen respondents are sufficient for a short and straightforward questionnaire, whereas 25 may be needed for a long and complex questionnaire with many branches and multiple options.
Feedback is obtained from the respondents involved in the pretest on their general reaction to the questionnaire and on the effort involved in completing it. Any difficulty or ambiguity can be identified and rectified before the questionnaire is administered to a large number of respondents. This helps to rectify mistakes in time and to reduce biases. Various types of pretesting can be carried out, ranging from informal reviews by colleagues to creating conditions similar to the final study. Some types are discussed below:
Researcher pretesting is conducted in the initial stages so as to build more structure into the test. Fellow researchers can be involved. Many suggestions and discussions may take place, leading to a refined questionnaire.
Participant pretesting involves testing the questionnaire in the field with participants or participant surrogates. Surrogates are individuals with characteristics and backgrounds similar to those of the desired participants.
A collaborative pretest is one in which the researcher informs or alerts the participants of their involvement in a preliminary test of the questionnaire. This makes the participants collaborators in the process of refining the questionnaire. A detailed probing of the parts of each question, including the words and phrases, is carried out.
A non-collaborative pretest is one in which the researcher does not inform the participants that the activity is a pretest. However, the probing of the questionnaire is still done.
The pretest is conducted for the following reasons:
The most important purpose of pretesting is to know whether the meaning of the questions is interpreted in the manner intended. A problem may arise because the respondent is not familiar with a word, which distorts the meaning of the question.
The respondent is likely to modify a difficult question in a way that makes it easier for him to respond.
The flow of the questionnaire should be tested to know whether the transition from one topic to another is natural and logical and ensures a coherent flow.
Many questionnaires have instructions on which question to skip, depending on the answer to a previous question. The skip pattern must be clearly laid out. In this context a questionnaire is like a road map with signs. Researchers who have been involved with the questionnaire design may not spot inconsistencies or ambiguities, as they are highly involved in the task. Pretesting will ensure the correct layout of the questionnaire.
The length of the questionnaire is pretested, as a lengthy questionnaire will often lead to fatigue among the respondents, interview break-off, and refusal if the respondents know the expected length in advance.
Task difficulty should also be identified through the pretest. The respondent may be confused if a question requires him to make connections or put together information in unfamiliar ways. An example is a question on annual income, which involves calculation by the respondent. Instead, the researcher can ask for the monthly income and calculate the annual income himself.
The ability to capture and maintain the interest of the respondent throughout the entire questionnaire is a major challenge. The extent to which this is successful should be pretested.
Testing the items for an acceptable level of variation in the target population is one of the common goals of pretesting. The researcher should look out for items showing greater variability. Very skewed distributions from a pretest can serve as a warning signal that the question is not tapping the intended construct.
The flaws identified in the questionnaire should be corrected. Finally, the pretest analysis should return to the first step in the design process. Each question should be reviewed again regarding its contribution to the objectives of the study, leading on to the other steps. The last step in the process may be another pretest, if major changes are needed again.
Interview Schedules Vs Questionnaire
The use of interview schedules to collect data is much the same as the questionnaire method, except that a schedule is filled in by the researcher himself or by enumerators appointed for this purpose. The schedule is a proforma containing a set of questions and the space to record the answers.
The enumerators, along with the interview schedules, meet the respondents, put the questions to them from the proforma in the order in which they are listed, and record the replies in the space provided. In certain situations, schedules may be handed over to the respondents and the researcher may help them in recording their answers to the various questions. The researcher can explain the aims and objectives of the investigation and can also clear up doubts and difficulties which the respondents have in understanding the implications of a particular question.
The success of this method depends on the selection of enumerators for filling in schedules or assisting respondents to fill them in. The enumerators should be trained to perform their job well, and the nature and scope of the investigation should be clearly explained to them. They should be informed of the purpose of each question and the type of response expected. Enumerators should possess patience in dealing with the respondents and should be intelligent enough to cross-examine and find the truth. They should be sincere and hardworking and should have perseverance. Collection of data using interview schedules and enumerators leads to fairly reliable results and allows extensive inquiry. However, it is expensive and takes time.
Difference between questionnaire and interview schedule
The questionnaire and the interview schedule are both used for data collection and resemble each other. However, the important points of difference are highlighted below:
i. The questionnaire can be sent through the mail with a covering letter and does not require further assistance. The schedule is filled out by the researcher, who interprets the questions whenever needed.
ii. Collecting data through the questionnaire involves less expense, as it is filled in by the respondent himself. In the case of schedules, enumerators must be appointed. This involves additional expenses in terms of payments made to them and training provided.
iii.
The rate of non-response is usually higher in the case of a mailed questionnaire. In the case of schedules the non-response rate is lower, as the enumerator is personally present and fills in the schedule himself. However, the danger of bias and cheating prevails.
iv. The identity of the respondent is not clear in the case of the questionnaire, but in the case of schedules the identity is known.
v. The questionnaire method of data collection takes time, as it requires several reminders, in spite of which the questionnaire may not be returned. In the case of schedules, direct personal contact is established and responses are elicited soon.
vi. The questionnaire method can be used only with educated or literate respondents, but interview schedules can be administered even to illiterate persons.
vii. A wider and more representative population can be covered by the questionnaire method of data collection, but this remains a difficulty in the case of schedules, particularly when the respondents are distributed over a wide geographical area.
viii. The risk of collecting incomplete and wrong information is greater in the questionnaire method, but in the case of schedules the enumerators are present to see that the questions are properly filled in. As a result, the information collected through schedules is more accurate than that obtained through the questionnaire.
ix. The success of the questionnaire method depends to a great extent on the quality of the questionnaire, but in the case of interview schedules it depends on the honesty, sincerity and perseverance of the enumerators.
x. The physical appearance of the questionnaire is very important for attracting and retaining the respondent's attention; it is less important in the case of the interview schedule.
xi. Additional data, apart from what is asked in the schedule, can be obtained by the enumerator through personal observation. This is not possible in the case of the mailed questionnaire.
Electronic questionnaire design and surveys
Electronic questionnaires or online questionnaires combine questionnaire-based survey functionality with that of a web page or website. They also include the mailing of a data disk to the respondents, who may use their own personal computers to respond to the questions. However, web surveys are rapidly gaining popularity as they have major speed, cost, and flexibility advantages. Factors such as layout or organization, formatting, structure of the questions and technical requirements should be taken into consideration while designing and implementing an electronic questionnaire. Decisions regarding these aspects should be made keeping in mind the profile of the target audience identified for the survey. The guidelines relating to the factors to be considered are discussed below;
1. General organization
The overall structure and organization of an electronic questionnaire can follow the pattern depicted in the figure below. A welcome page should be used to motivate the respondent to participate in the survey. Where entry is restricted, a login facility can be provided to allow the user to enter the password. A short introductory page should provide general information about the survey, including any specific directions for filling in the questionnaire. If a screening test is required for the survey, it is delivered before proceeding to the main section of the questionnaire. Every online questionnaire should conclude with an acknowledgement to the respondent for the time and effort spent in completing the questionnaire. These aspects are detailed below;
[Figure: general organization of an electronic questionnaire. Components shown: Welcome; Registration/Login; Introduction; Screening test; Questionnaire questions; Additional information links; Thank you]
i. Welcome: The site or domain name that brings the respondents to the survey page should be easy to remember and should reflect the purpose of the questionnaire. Several domain names could be used to attract the respondents. The
welcome page should be designed in such a way that it loads quickly. The page should provide information regarding the organization on whose behalf the questionnaire is administered. It should motivate the respondent to take part in the survey and emphasize the ease of responding. The procedure to start should also be made evident. For a questionnaire with password restriction, this fact should be mentioned clearly on the welcome screen so that the respondent does not waste time. Excessive animations and gimmicks should be avoided, as they take more time to download and may also distract the respondent's attention.
ii. Registration/login The registration or login screen is needed if access to the questionnaire is restricted to specific people. Password access should be provided to the appropriate respondents so as to enable them to participate in the survey. While processing the PIN and password, it is better to accept dashes and hyphens as part of a string of numbers. In order to alleviate the respondent's frustration, once data has been entered in the required fields, all correct data should be accepted and only the fields that have been erroneously omitted or completed incorrectly should be highlighted for re-entry, with a proper explanation of the required data. Sufficient time should be provided to read and complete the registration forms before an automatic time-out.
iii. Introduction This section should provide a brief description of the survey, its purpose and the importance of the responses received. It should also outline all the security and privacy practices associated with the survey so as to reassure the respondents. Alternatively, this information can be included in the registration/login page.
iv. Screening test If the screening test is very simple it can be located within the introduction page.
If it is more extensive, it should be dealt with on a separate page that is linked to the preceding and succeeding pages. A respondent who fails a screening test should still not be denied the chance to participate in the survey, as denial will offend the respondent. However, the contribution can be discarded from the study.
v. Questions The questions should follow all the basic guidelines of the offline questionnaire. In addition, the following should be considered in designing the electronic questions;
1. The total number of questions should be kept to a minimum.
2. Initial questions should be routine and easy to answer so as to put the respondent at ease. The first question should be engaging and should attract the attention and interest of the respondent.
3. Difficult, sensitive and most important questions should appear after the respondent has completed at least 1/3 of the questionnaire, at a point when the respondent will have settled down.
4. In order to check the consistency of responses, repeated questions worded in a different manner often form part of the questionnaire. Such questions should be placed apart from each other.
5. Open-ended questions should appear before closed-ended questions on the same topic so as to avoid influencing respondents with the fixed option choices of the closed-ended questions.
vi. Additional information/links: In order to keep the main questionnaire simple and not cluttered with unnecessary information, additional information can be included on a separate page which is linked to the main page. The respondent should be able to return to the main questionnaire easily at any point of time.
vii. Note of gratitude: The questionnaire should end with a note of gratitude to the respondent for the time and effort spent. This should be expressed in a friendly and gentle tone. This page should also include a facility for respondents to e-mail feedback or comments to the questionnaire administrators.
2.
Layout
Online questionnaires should create a distinctive, positive visual impression so as to evoke a feeling of trust and assist the respondents in the process of completing the questionnaire. Some guidelines are given below;
1. The question and answering process should be attractive. It should be presented in an uncluttered manner that makes completion easy. Too much information should not be squeezed into one page. Information should be aligned horizontally and vertically to enable easy reading.
2. A question should never be separated from its response set. The question and all elements of the response should appear on the same page/screen.
3. The questions related to a given topic should be presented together and clearly sectioned off from questions related to other topics. Section headings and subheadings should be used to clearly differentiate sections. Too many sections should also be avoided, as they lead to confusion and lack of focus.
4. A respondent should not be compelled to answer a question in order to move to the next question, except in the case of a screening question. Respondents should be given the freedom to move back and forth while completing the questionnaire.
5. A questionnaire can be designed within a single scrolling page or dispersed across several linked, non-scrolling pages. It should be noted that a lengthy scrolling questionnaire may frustrate a respondent and give the impression that the questionnaire will take a long time to complete. Only a few questions should appear on a page, and clear and easy links to the preceding and succeeding pages should be provided.
6. Frames can make pages difficult to read and print, increase load time and cause other problems. So the use of frames should be minimized or avoided.
7. Forms and fields are commonly used for data entry. The field labels should be placed close to the associated fields. The submit button should be located adjacent to the last field. The tab order for key navigation around the fields in the questionnaire should be logical and reflect the visual appearance as far as possible.
Fields should be stacked in a vertical column, and any instruction pertaining to a given field should appear before and not after the field.
3. Navigation
To enable easy navigation within the website presenting the questionnaire, online questionnaires usually include buttons, links, site maps and scrolling. All mechanisms for navigation should be clearly defined and placed in such a manner that they can be easily identified and accessed by the respondents. The navigational aids are typically located at the top right-hand corner of web pages and should appear consistently in the same place on each page of a questionnaire's website. The guidelines regarding the navigational elements are listed below;
1. Buttons enable a respondent to exit a questionnaire or return to the previous/next section of a questionnaire. They should be placed consistently in the same place on all the pages and should be designed in an easily identifiable manner. Graphical presentation can be used to name a button.
2. Links are commonly used in web pages. They should be designed in a simple manner and used sparingly. They should be placed in a clearly identifiable manner. Bold, coloured, underlined text can be used. A link that has been visited by the respondent should be indicated by a change in colour. Text-based links should be used rather than image-based links. A clear distinction should be made between links to locations within the same page and links to a different page.
3. Site maps provide an overview of the entire website at a single glance. They help users navigate through the website and save time and frustration. The pathway is usually linear, and therefore the orientation should not be overly complex. Site maps should be scaleable and consistently placed. They should be downloadable in minimum time and used only when a questionnaire has a large number of pages.
4.
Scroll bars should be avoided, as some respondents find scrolling hard to use and the scroll bar can also be overlooked. The welcome page should fit into a single screen and not require scrolling. If scrolling cannot be avoided, the respondents should be informed of the need to scroll. Scrolling can be avoided by using jump buttons which take the respondents to the next screenful of information or questions.
4. Formatting
Formatting of a questionnaire includes several aspects, i.e. text, colour, graphics, flash, frames and tables, feedback and other miscellaneous factors. Guidelines pertaining to each of these aspects are discussed below:
A. Text
i. The font used should be readable and the text should be presented in standard sentence format. Capital letters should be used only for emphasizing titles, captions etc.
ii. Sentences should be kept short, with few words and a limited number of characters per line. Paragraphs should also be kept short.
iii. Technical instructions should be written in such a way that non-technical people can understand them.
iv. Questions should be easily distinguishable, in terms of formatting, from instructions and answers.
v. The relative position of questions and answers should be consistent throughout the questionnaire.
vi. Where different types of questions are included in the same questionnaire, each question type should have a unique visual appearance.
vii. A minimum font size of 12 pt should be used. The font colour should contrast significantly with the background colour. The text should be left justified and the use of italics should be avoided.
B. Colour
Colour has a great impact on the respondents and their responses, so it is important to use colours wisely. Consistent colour coding should be used throughout the questionnaire to reinforce meaning or information in an unambiguous fashion. A neutral background colour without patterns should be used to make text easy to read. When two colours are used, colours of high contrast should be chosen to ensure maximum discernibility. The following combinations of colours should be avoided, since visual vibrations and after-images can occur: red and green, yellow and blue, blue and red, blue and green. While using colours, the standard cultural colour associations should be kept in mind.
C. Graphics
In order to minimize the download time, graphics should be kept to a minimum. Some guidelines to be followed are listed below;
i. Cluttering the questionnaire with graphics should be avoided, as it repels the participants.
ii. Small graphics that can be downloaded quickly should be used. Individual images should not exceed 5 KB in size, and a single web page should not have graphics exceeding a total of 20 KB.
iii. It is essential to provide text along with the graphics. Progressive rendering, which allows the text to download before the graphics, should be used so as to reduce the access time.
iv. The number of colours used in a graphic should be kept to a minimum, and it should be ensured that the graphics do not resemble other items on a typical website.
v. Overlapping of menus, blurring of pictures, graying out of areas etc. should be avoided. Crisp and clear images should be used.
vi. Multimedia and audio clips should not be associated with the graphics if plug-ins have to be downloaded to play them. Users should be allowed the freedom to skip the multimedia content.
D. Flash
Online questionnaires should make minimum use of flash, including blinking text, since it requires certain browser versions or plug-ins. Animation or flash is also difficult for users to download and takes considerable time to load. If flash must be used, the site should offer the user a choice between a flash and a non-flash format. It should provide static navigation, i.e. it should not use a disappear-and-reappear format, and should always include a way to navigate back to the location from which the user encountered the flash. A close-window option should be provided in all windows that open.
E. Tables and Frames
Tables and frames are used in website design for alignment and other aesthetic purposes. If tables are used to convey structured information, they should be kept short and simple. All information within tables should be in straight text and in a standard format.
F. Feedback
It is important to get feedback on the online questionnaire so as to understand whether a respondent will abandon completion or persevere with it. With each new section/page, respondents should be given real-time feedback on their degree of progress through the questionnaire. This may take the form of "30% completed" in a progress bar. Respondents' answers to the questions should be made immediately visible to them in a clear and concise manner to reinforce the effect of their actions.
G. Miscellaneous
The following guidelines relate to formatting that does not fall into any of the previously discussed categories. The total website content should remain below 60 KB of text and graphics. A version of the questionnaire, as well as all referenced articles or documentation, should be provided in an alternative format that can be printed fully. All introductory pages in the survey website should include a date-last-modified notification as well as a copyright notice, if applicable.
5. Response Formats
Electronic equivalents to the various paper-based response styles have to be selected to best meet the needs of the questionnaire and the target audience. Some guidelines in this respect are discussed below;
A. Matrix questions: If a question involves many response options, matrix formats can be used to condense and simplify questions. They should be used sparingly, as they require a lot of work to be done in a single screen. It is also hard to predict how such questions will appear in the respondents' web browsers, and the size and format of such questions demand a significant amount of screen space, which cannot be guaranteed on smaller-scale technology.
B. Drop-down boxes: A drop-down box appears in a one-line text format when collapsed. When expanded, it displays a list of response options from which a respondent can select one or more. Drop-down boxes are fast to download and can be used when very long lists of response options are required.
Drop-down boxes should be used sparingly, as they require very accurate mouse clicks, and should be avoided when it would be faster to simply type the response. It is important that the first option in the drop-down list is not visible by default, as this can lead respondents to select it.
C. Radio buttons: Radio buttons are small circles that are placed next to the response options of a closed-ended question. By default, only one radio button within any given group can be selected at a time. They can be used in the case of mutually exclusive options. They closely resemble paper-based questionnaire answer formats. They demand a relatively high degree of mouse precision, and users with limited computer exposure may find it frustrating to click or change the options.
D. Check boxes: Check boxes are typically small squares that contain a tick mark when checked and allow multiple options rather than exclusive options. They also require a high degree of mouse precision. The advantage of using check boxes and radio buttons within the same questionnaire is that their appearance is visibly different, so respondents are given visual cues as to how to answer any question using either of the two response formats.
6. General technical guidelines
In addition to the above technicalities of online questionnaire design, the following details should also be taken into consideration:
i. Privacy and protection: It is important to ensure that the respondents' privacy and perception of privacy are protected. The survey data should be encrypted and the anonymity of the respondent should be assured.
ii. Computer literacy: The questionnaire should be designed keeping in mind the less knowledgeable, low-end computer user. Specific instructions should be provided without offending the inexperienced respondents. Prior knowledge or preconceptions in terms of technological know-how should not be assumed. The need for double-clicking should be eliminated, since it can be difficult for an inexperienced user.
iii. Automation: Many aspects of the online questionnaire can be automated, unlike the paper-based questionnaire. An example is the skip question. Skip questions are used to determine, on the basis of the individual respondent's answer, which of the following questions the respondent should jump (or skip) to when the question path is response directed. When automation is used, it should be carefully designed in order to avoid disorienting or confusing the respondents.
iv. Platforms and browsers: Before the questionnaire is launched online, it should be ensured that it operates effectively across all platforms and within all browsers. As a general rule, only a portion of the capacity of the most advanced browsers should be used, in order to maximize the chance that all recipients of the questionnaire have an equal likelihood of responding.
v. Devices: Technology is highly portable and information access is highly mobile. Hence it is important to design all elements of the online questionnaire in a scaleable form. It is unlikely that many respondents would choose to complete a lengthy questionnaire on a handheld device, but this cannot be ruled out, nor can the range of sizes to which respondents might resize their desktop browsers be anticipated. A well designed and tested questionnaire should take this aspect into consideration.
Advantages
Web page surveys are extremely fast. A questionnaire posted on a popular Web site can gather several thousand responses within a few hours. Many people will respond to an email invitation on the first day, and most will do so within a few days.
There is practically no cost involved once the set-up has been completed. Large samples do not cost more than smaller ones.
The researcher can use audio-visuals in the collection of the data. Some Web survey software can also show video and play sound.
Web page questionnaires can use complex question skipping logic, randomizations and other features not possible with paper questionnaires or most email surveys. These features can ensure better data.
Web page questionnaires can use colors, fonts and other formatting options not possible in most email surveys.
A significant number of people will give more honest answers to questions about sensitive topics, such as drug use, when giving their answers to a computer instead of to a person or on paper.
On average, people give longer answers to open-ended questions on Web page questionnaires than they do on other kinds of self-administered surveys.
Some Web survey software can combine the survey answers with pre-existing information on the individuals taking a survey.
It is possible to link the online questionnaire to a database, so that the information received is updated immediately without the manual data entry needed for a paper-based questionnaire.
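The skipping logic and randomization mentioned among the advantages above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the three questions, their identifiers (q1 to q3) and the helper names question_path and shuffled_options are hypothetical and do not come from any particular survey package.

```python
import random

# Hypothetical questionnaire. "next" is either a fixed question id, None (end),
# or a dict mapping an answer to the question the respondent should jump to.
QUESTIONS = {
    "q1": {"text": "Do you own a car?", "options": ["yes", "no"],
           "next": {"yes": "q2", "no": "q3"}},
    "q2": {"text": "Which brand is your car?", "options": None, "next": "q3"},
    "q3": {"text": "How do you usually commute?",
           "options": ["bus", "car", "walk"], "next": None},
}


def next_question(current_id, answer):
    """Return the id of the question to show next, or None when finished."""
    rule = QUESTIONS[current_id]["next"]
    if isinstance(rule, dict):
        # Response-directed skip: the path depends on the answer given.
        return rule.get(answer)
    return rule


def question_path(answers, start="q1"):
    """List the questions a respondent actually sees for the given answers."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        current = next_question(current, answers.get(current))
    return path


def shuffled_options(question_id, seed=None):
    """Return the response options in random order to reduce order bias."""
    options = QUESTIONS[question_id]["options"] or []
    shuffled = list(options)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

A respondent answering "no" to q1 would see only q1 and q3, while one answering "yes" would also see q2. Keeping the skip rules in a single table makes the skip pattern easy to review during a pretest.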
Disadvantages
Coverage error is prevalent in online surveys, i.e. not all members of the population have an equal chance of being included in the survey. This is particularly so in countries where internet access is very low and computer illiteracy is high.
Non-response error is higher because respondents may opt not to fill in the online questionnaire. Respondents can also easily quit in the middle of a questionnaire. They are not as likely to complete a long questionnaire on the Web as they would be if talking with a good interviewer.
A survey on a web page cannot exercise control over who replies; anyone from anywhere who is surfing may answer. The demographic pattern of the respondents cannot be restricted.
There is often no control over people responding multiple times to bias the results.
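The multiple-response problem noted above is commonly reduced by giving each invited respondent a one-time access code. The following is a minimal sketch of that idea, not a description of any specific survey software: the class name SurveyAccess, its methods and the in-memory storage are all assumptions made for illustration.

```python
import secrets


class SurveyAccess:
    """Hypothetical one-time access codes: each code admits one submission."""

    def __init__(self):
        self._unused = set()  # codes issued but not yet used
        self._used = set()    # codes that have already submitted

    def issue_token(self):
        """Create a random code to send to one invited respondent."""
        token = secrets.token_urlsafe(16)
        self._unused.add(token)
        return token

    def submit(self, token, answers, store):
        """Accept the answers only if the code is valid and unused."""
        if token not in self._unused:
            return False  # unknown code, or a repeat submission
        self._unused.remove(token)
        self._used.add(token)
        # The answers are stored without the code, so a stored response
        # cannot be traced back to the invitation that produced it.
        store.append(answers)
        return True
```

For example, a second call to submit with the same code returns False and stores nothing. Because the codes are sent only to invited respondents, the same mechanism also limits who can reply.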
In the present context, online surveys should be used mainly when the target population consists entirely or almost entirely of Internet users. Business-to-business research and employee attitude surveys can often meet this requirement. Surveys of the general population usually will not. Web page surveys can be used when the researcher uses audio or video, or both sound and graphics. A Web page survey may be the only practical way to have many people view and react to a video. The researcher should make sure that the software used for conducting online surveys prevents people from completing more than one questionnaire. Access can also be restricted by requiring a password or by putting the survey on a page that can only be accessed directly, i.e. there are no links to it from other pages.
3.4.4 Other methods
In addition to the methods of data collection discussed above, the following methods can be used:
1. Warranty cards
These are postcard-sized cards which are used by dealers of consumer durables to collect information regarding the product. The information sought is printed in the form of questions on the warranty card, which is placed inside the package along with the product, with a request to the consumer to fill in the card and post it to the dealer.
2. Store audits
Store audits are performed by distributors as well as manufacturers through their salesmen at regular intervals. The information is used to estimate market size, market share, seasonal purchasing patterns etc. The data is obtained mostly by the observational method. Store audits are invariably panel operations, because the derivation of sales estimates and the compilation of sales trends form the basis of the calculation. They provide an efficient way to evaluate the effect of various in-store promotions on sales.
3. Pantry audits
This is used to estimate consumption of the basket of goods at the consumer level.
The investigator collects an inventory of the types, quantities and prices of the commodities consumed. In pantry audits the data are recorded from an examination of the consumer's pantry. The objective of a pantry audit is to identify the type of consumers who buy certain products and certain brands. The basic assumption is that the contents of the pantry accurately portray the consumer's preferences. Pantry audits are usually supplemented by direct questioning about the reasons for preferring a product.

4. Consumer panels
A consumer panel consists of a group of consumers who are interviewed on a regular basis over a period of time. A consumer panel may be transitory or continuing. A transitory panel is set up to measure the effect of a particular phenomenon and is conducted on a before-and-after basis: one interview is conducted before the phenomenon takes place and another after it has occurred, so as to measure the changes in the attitude and behaviour of the consumers. A continuing consumer panel is set up for an indefinite period with a view to collecting data on a particular aspect of consumer behaviour over time.

5. Mechanical devices
The use of mechanical devices enables data to be recorded accurately. The eye camera, pupilometric camera, psychogalvanometer, motion picture camera and audiometer are some of the devices used for data collection. Eye cameras are designed to record the focus of a respondent's eyes on a specific portion of a sketch, diagram, product package etc. Pupilometric cameras record dilation of the pupil as a result of a visual stimulus; the extent of dilation shows the degree of interest aroused by the stimulus. The psychogalvanometer is used to measure the extent of body excitement as a result of a visual stimulus. Motion picture cameras are used to record the movements of the buyer while deciding to buy a consumer good.
Audiometers are used with television sets to find out the types of programmes as well as the channels preferred by viewers. A device is fitted in the television set itself to record these changes, which can be used to ascertain the market share of competing channels.

6. Projective techniques
Certain ideas and thoughts cannot easily be verbalized because they remain at the unconscious level in the minds of respondents. Trained professionals can bring these deep-rooted ideas and thoughts to the surface by applying different probing techniques. Some techniques are explained below:

i. Word association test
The test is used to extract information regarding words which have maximum association. Respondents are asked to quickly associate a word, say "happy", with the first thing that comes to mind. This is often used to get at the true attitudes and feelings of the respondent. The same idea is used in marketing research to find out the quality that is most associated with a brand of product. The technique is quick and easy to use and yields reliable results when applied to words that are widely known and possess essentially one meaning.

ii. Sentence completion tests
This is an extension of the word association test. The respondent is provided with several half-completed statements regarding a subject. Analysis of the replies reveals the respondent's attitude towards the subject. The technique permits the testing not only of words but of ideas too. It is quick and easy to use; however, it leads to analytical problems because the responses are multidimensional.

iii. Story completion test
This test goes a step further: the researcher contrives stories instead of sentences and asks the respondent to complete them. The respondent is given just enough of a story to focus attention on a given subject and is asked to provide a conclusion to the story.

iv. Verbal projection tests
The respondent is asked to comment on or explain what other people do, for example: why do people own a particular product? The answers may reveal the respondent's own motivations.

v. Pictorial techniques
Several pictorial techniques are available. They are discussed below:

a. The Thematic Apperception Test (T.A.T.) requires the respondent to weave a story around a picture that is shown. Several need patterns and personality characteristics can be traced through these tests.

b. The Rosenzweig test uses a cartoon format in which a series of cartoons with words inserted is given to the respondent. The respondent is asked to put his own words in the empty space provided for the purpose in the picture. From the response the attitude of the respondent can be inferred.

c. The Rorschach test consists of ten cards having prints of ink-blots. The designs are symmetrical but meaningless. The respondents are asked to describe what they perceive, and the responses are interpreted on the basis of a predetermined psychological framework.

d. The Holtzman Inkblot Test (HIT) contains 45 inkblot cards which are based on colour, movement, shading and other factors involved in inkblot perception. Only one response per card is obtained from the respondent, and it is interpreted at three levels, i.e. accuracy (F) or inaccuracy (F-) of the respondent's percepts; shading and colour for ascertaining the affectional and emotional needs; and movement responses for assessing the dynamic aspects of life.

vi. Play techniques
Under play techniques subjects are asked to act out a situation in which various roles are assigned to them. The researcher may observe such traits as hostility, dominance, sympathy and prejudice, or the absence of such traits.

vii. Quizzes, tests and examinations
This technique is used for indirectly extracting information regarding specific abilities of candidates. The procedure uses both long and short questions to test the memorising and analytical skills of the respondents.

viii. Sociometry
This is a technique for describing the social relationships among individuals in a group. It attempts to describe attractions or repulsions between individuals by asking them to indicate whom they would choose or reject in various situations. It makes it possible to study the underlying motives of the respondents.

Almost all the methods of data collection discussed above have some bias associated with them. Hence collecting data through multiple methods and from multiple sources lends rigor to research. However, it would be costly and time-consuming.
3.5 The Basics of Sampling

Sampling is an important concept which is practiced in every activity. Sampling involves selecting a relatively small number of elements from a large defined group of elements and expecting that the information gathered from the small group will allow judgments to be made about the large group. The basic idea of sampling is that by selecting some of the elements in a population, conclusions can be drawn about the entire population. Sampling is used when conducting a census is impossible or unreasonable. In the census method a researcher collects primary data from every member of a defined target population. It is not always possible or necessary to collect data from every unit of the population; the researcher can instead resort to a sample survey to find answers to the research questions. However, a sample survey can do more harm than good if the data are not collected from the people, events or objects that can provide correct answers to the problem. The process of selecting the right individuals, objects or events for the purpose of the study is known as sampling, and it is dealt with in detail in this chapter. The basic terminologies used in sampling are discussed below:

Population
A population is an identifiable total group or aggregation of elements that are of interest to the researcher and pertinent to the specified problem. In other words it refers to the defined target population. A defined target population consists of the complete group of elements (people or objects) that are specifically identified for investigation according to the objectives of the research project. A precise definition of the target population is usually given in terms of elements, sampling units and time frames.

Element
An element is a single member of the population. It is a person or object from which the data or information is sought. Elements must be unique and countable and, when added together, must make up the whole of the target population.
If 250 workers in a concern happen to be the population of interest to the researcher, each worker therein is an element.

Population frame
The population frame is a listing of all the elements in the population from which the sample is drawn. The nominal roll of a class could be the population frame for a study of the students in that class.

Sampling units
Sampling units are the target population elements available for selection during the sampling process. In a simple, single-stage sample, the sampling units and the population elements may be the same.

Sampling frame
After defining the target population, the researcher must assemble a list of all eligible sampling units, referred to as a sampling frame. A common source of sampling frames for a study about customers is the customer lists from credit card companies.

Sample
A sample is a subset or subgroup of the population. It comprises some members selected from it. Only some, and not all, elements of the population form the sample. If 200 members are drawn from a population of 500 workers, these 200 members form the sample for the study. From the study of these 200 members, the researcher would draw conclusions about the entire population.

Subject
A subject is a single member of the sample, just as an element is a single member of the population. If 200 members from the total population of 500 workers form the sample for the study, then each worker in the sample is a subject.

3.5.1 Why sampling?

There are several reasons for sampling. They are explained below:

Lower cost: The cost of conducting a study based on a sample is much lower than the cost of conducting a census study.

Greater accuracy of results: It is generally argued that the quality of a study is often better with sample data than with a census. Research findings also substantiate this opinion.
Greater speed of data collection: Data collection is faster with a sample. Sampling also reduces the time between the recognition of a need for information and the availability of that information.

Availability of population elements: Some situations require sampling. When the breaking strength of materials is to be tested, the materials have to be destroyed. The census method cannot be resorted to because it would mean complete destruction of all the materials. Sampling is also the only process possible if the population is infinite.
3.5.2 Steps in Developing a Sampling Plan

A number of concepts, procedures and decisions must be considered by a researcher in order to successfully gather raw data from a relatively small group of people, which in turn can be used to generalize or make predictions about all the elements in a larger target population. The following are the logical steps involved in the sample execution:

Define the target population
Select the data collection method
Identify the sampling frame needed
Select the appropriate sampling method
Determine necessary sample size and overall contact rates
Create an operating plan for selecting sampling units
Execute the operational plan

Define the target population
The first task of a researcher is to determine and identify the complete group of people or objects that should be included in the study. With the statement of the problem and the objectives of the study acting as guidelines, the target population should be identified on the basis of descriptors that represent the characteristic features of the elements that make up the target population frame. These elements become the prospective sampling units from which a sample will be drawn. A clear understanding of the target population will enable the researcher to successfully draw a representative sample.

Select the data collection method
Based on the problem definition, the data requirements and the research objectives, the researcher should select a method for collecting the required data from the target population elements. The method of data collection guides the researcher in identifying and securing the necessary sampling frame for conducting the research.

Identify the sampling frame needed
The researcher should identify and assemble a list of eligible sampling units. The list should contain enough information about each prospective sampling unit to enable the researcher to contact it. Using an incomplete frame decreases the likelihood of drawing a representative sample.

Select the appropriate sampling method
The researcher can choose between probability and nonprobability sampling methods. A probability sampling method generally yields better and more accurate information about the target population's parameters than nonprobability sampling methods. Seven factors should be considered in deciding the appropriateness of the sampling method, viz. research objectives, degree of desired accuracy, availability of resources, time frame, advance knowledge of the target population, scope of the research and perceived statistical analysis needs.

Determine necessary sample size and overall contact rates
The sample size is decided on the basis of the precision required from the sample estimates and the time and money available to collect the required data. While determining the sample size, due consideration should be given to the variability of the population characteristic under investigation, the level of confidence desired in the estimates and the degree of precision desired in estimating the population characteristic. The number of prospective units that must be contacted to ensure that the estimated sample size is obtained, and the additional cost involved, should also be considered. The researcher should calculate the reachable rate, overall incidence rate and expected completion rate associated with the sampling situation.
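The sample size step can be illustrated with a minimal sketch. The text names the inputs (variability, confidence level, precision and the three contact rates) but gives no formula, so the two formulas below are assumptions: the common textbook sample size for estimating a proportion, n = z^2 * p * (1 - p) / e^2, and the number of contacts obtained by dividing the sample size by the product of the three rates. All numbers and function names are hypothetical.

```python
import math


def sample_size_for_proportion(p, margin_of_error, z=1.96):
    """Assumed textbook formula: n = z^2 * p * (1 - p) / e^2, rounded up.

    p               -- expected population proportion (variability)
    margin_of_error -- desired precision e, as a proportion
    z               -- z-value for the desired confidence level (1.96 ~ 95%)
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


def contacts_needed(sample_size, reachable_rate, incidence_rate, completion_rate):
    """Assumed rule: contacts = n / (reachable * incidence * completion)."""
    combined_rate = reachable_rate * incidence_rate * completion_rate
    return math.ceil(sample_size / combined_rate)


# Hypothetical situation: maximum variability (p = 0.5), +/-5% precision,
# 95% confidence; 90% reachable, 80% incidence, 60% completion.
n = sample_size_for_proportion(p=0.5, margin_of_error=0.05)
contacts = contacts_needed(n, 0.9, 0.8, 0.6)
print(n, contacts)
```

Under these assumptions a tighter margin of error or lower contact rates both increase the number of prospective units that must be approached, which is the additional cost the step asks the researcher to consider.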
Create an operating plan for selecting sampling units
The actual procedure to be used in contacting each of the prospective respondents selected to form the sample should be clearly laid out. The instructions should be clearly written so that interviewers know exactly what should be done and what procedure to follow if problems are encountered in contacting the prospective respondents.

Execute the operational plan
The sample respondents are met and the actual data collection activities are carried out at this stage. Consistency and control should be maintained throughout.

3.5.3 Characteristic features of a good sample

The ultimate test of a good sample is how well it represents the characteristics of the population from which it is drawn. In terms of measurement the sample should be valid. The validity of a sample depends on two considerations, viz. accuracy and precision.

Accuracy
Accuracy is determined by the extent to which bias is eliminated from the sample. When the sample elements are drawn properly, some sample elements underestimate the population values being studied and others overestimate them. Variations in these values offset each other, and this counteraction results in a sample value that is generally close to the population value. An accurate, i.e. unbiased, sample is one in which the underestimators and the overestimators are balanced among the members of the sample. There is no systematic variance in an accurate sample. Systematic variance has been defined as the variation in measures due to some unknown influences that cause the scores to lean in one direction more than another. Even a large sample cannot counteract systematic bias.

Precision
A second criterion of a good sample design is precision of estimate. No sample will fully represent its population in all aspects.
The numerical descriptors that describe samples may be expected to differ from those that describe the population because of random fluctuations inherent in the sampling process. This is called sampling error. Sampling error is what is left after all known sources of systematic variance have been accounted for. In theory, sampling error consists of random fluctuations only, although some unknown systematic variance may be included when too many or too few sample elements possess a particular characteristic. Precision is measured by the standard error of estimate, a type of standard deviation measurement; the smaller the standard error of estimate, the higher the precision of the sample. The ideal sample design produces a small standard error of estimate.
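As a rough illustration of precision, the sketch below computes the standard error of a sample mean. The formula used, sample standard deviation divided by the square root of the sample size, is the usual textbook one and is an assumption here, since the text does not state it. The data values are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard error = sample standard deviation / sqrt(sample size)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical measurements from a small sample of 5 subjects.
small_sample = [4, 8, 6, 5, 7]

# A larger hypothetical sample of 20 subjects with the same spread of values.
large_sample = small_sample * 4

se_small = standard_error_of_mean(small_sample)
se_large = standard_error_of_mean(large_sample)
print(se_small, se_large)
```

With the same spread of values, the larger sample gives the smaller standard error, i.e. the more precise estimate.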
3.6 Types of sampling design

Sampling designs can be broadly grouped on two bases, viz. representation and element selection. Representation refers to whether the members are selected on a probability basis or by other means. Element selection refers to the manner in which the elements are selected individually and directly from the population. If each element is drawn individually from the population at large, it is an unrestricted sample. Restricted sampling is where additional controls are imposed; in other words, it covers all other forms of sampling. The classification of sampling designs on the basis of representation and element selection is shown below:

Element selection    Representation basis
                     Probability          Nonprobability
Unrestricted         Simple random        Convenience
Restricted           Complex random       Purposive
                       Systematic           Judgement
                       Stratified           Quota
                       Cluster              Snowball
                       Double
3.6.1 Probability Sampling

Probability sampling is where each sampling unit in the defined target population has a known nonzero probability of being selected in the sample. The actual probability of selection for each sampling unit may or may not be equal, depending on the type of probability sampling design used. Specific rules for selecting members from the operational population are made to ensure unbiased selection of the sampling units and proper representation of the defined target population. The results obtained by using probability sampling designs can be generalized to the target population within a specified margin of error. The different types of probability sampling designs are discussed below:

A. Unrestricted or Simple Random Sampling
In the unrestricted probability sampling design every element in the population has a known, equal, nonzero chance of being selected as a subject. For example, if 10 employees (n = 10) are to be selected from 30 employees (N = 30), the researcher can write the name of each employee on a piece of paper and draw the names at random. Each employee will have an equal, known probability of selection for the sample. This is expressed by the following formula:

Probability of selection = Size of sample / Size of population

Each employee would have a 10/30 or .333 chance of being randomly selected in a drawn sample. When the defined target population consists of a larger number of sampling units, a more sophisticated method can be used to draw the necessary sample at random. A table of random numbers, which contains a list of randomly generated numbers, can be used for this purpose. The numbers can also be generated randomly by computer programs. The sample is then selected using the random numbers.
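The employee example can be sketched in a few lines of Python, using the standard library's random number generator in place of a table of random numbers. The employee labels and the seed are hypothetical; this is one possible sketch, not a prescribed procedure.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every element has an equal chance."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population frame: 30 employees (N = 30).
employees = [f"employee_{i:02d}" for i in range(1, 31)]

# Select 10 of them (n = 10).
sample = simple_random_sample(employees, 10, seed=42)

# Probability of selection = size of sample / size of population.
selection_probability = len(sample) / len(employees)
print(sample)
print(round(selection_probability, 3))
```

Because the draw is made without replacement, no employee can appear twice in the sample, which mirrors drawing slips of paper without putting them back.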
Advantages and disadvantages
The simple random sampling technique is easily understood, and the survey results can be generalized to the defined target population with a prespecified margin of error. It also enables the researcher to gain unbiased estimates of the population's characteristics. The method guarantees that every sampling unit of the population has a known and equal chance of being selected, irrespective of the actual size of the sample, resulting in a valid representation of the defined target population. The major drawback of simple random sampling is the difficulty of obtaining a complete, current and accurate listing of the target population elements. Simple random sampling requires all sampling units to be identified, which would be cumbersome and expensive in the case of a large population. Hence this method is most suitable for a small population.

B. Restricted or Complex Probability Sampling
As an alternative to the simple random sampling design, several complex probability sampling designs can be used which are more viable and effective. Efficiency is improved because more information can be obtained for a given sample size using some of the complex probability sampling procedures than with the simple random sampling design. The five most common complex probability sampling designs, viz. systematic sampling, stratified random sampling, cluster sampling, area sampling and double sampling, are discussed below:

i. Systematic random sampling
The systematic random sampling design is similar to simple random sampling but requires that the defined target population be ordered in some way. It involves drawing every kth element in the population, starting with a randomly chosen element between 1 and k. In other words, individual sampling units are selected according to their position using a skip interval k. The skip interval is determined by dividing the population size by the sample size. For example, if the researcher wants a sample of 100 to be drawn from a defined target population of 1000, the skip interval would be 10 (1000/100). Once the skip interval is calculated, the researcher randomly selects a starting point and takes every 10th unit until the entire target population list has been worked through. The steps to be followed in the systematic sampling method are enumerated below:

The total number of elements in the population should be identified
The sampling ratio is to be calculated (k = total population size divided by size of the desired sample)
The random start should be identified
The sample is drawn by choosing every kth entry

Two important considerations in using systematic random sampling are:

The natural order of the defined target population list must be unrelated to the characteristic being studied.
The skip interval should not correspond to any systematic change in the target population.
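The steps above can be sketched as follows, using the example of a sample of 100 from a population of 1000. The symbol k for the skip interval and the numbered population list are assumptions made for illustration.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Take every k-th element after a random start within the first interval."""
    k = len(population) // sample_size  # skip interval = population / sample
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based position of the random start
    # Slice from the random start in steps of k; trim in case the population
    # size is not an exact multiple of the sample size.
    return population[start::k][:sample_size]


# Hypothetical ordered population list of 1000 units, numbered 1..1000.
population = list(range(1, 1001))
sample = systematic_sample(population, 100, seed=7)
print(sample[:5])
```

If the list has a periodic pattern whose cycle matches the skip interval, every selected unit falls at the same point in the cycle, which is the hidden-pattern risk noted in the second consideration.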
Advantages and disadvantages
The major advantage of systematic sampling is its simplicity and flexibility. There is no need to number the entries in a large personnel file before drawing a sample. The availability of lists and the shorter time required to draw a sample, compared with simple random sampling, make systematic sampling an attractive, economical method for researchers. The greatest weakness of systematic random sampling is the potential for hidden patterns in the data that go unnoticed by the researcher. This could result in a sample that is not truly representative of the target population. Another difficulty is that the researcher must know exactly how many sampling units make up the defined target population. In situations where the target population is extremely large or unknown, identifying the true number of units is difficult and the estimates may not be accurate.

ii. Stratified random sampling
Stratified random sampling requires the separation of the defined target population into different groups called strata and the selection of a sample from each stratum. Stratified random sampling is very useful when the divisions of the target population are skewed or when extremes are present in the probability distribution of the target population elements of interest. The goal in stratification is to minimize the variability within each stratum and maximize the difference between strata. The ideal stratification would be based on the primary variable under study. Researchers often have several important variables about which they want to draw conclusions. A reasonable approach is to identify some basis for stratification that correlates well with the other major variables. It might be a single variable such as age or income, or a compound variable such as income and gender together. Stratification leads to segmenting the population into smaller, more homogeneous sets of elements.
In order to ensure that the sample maintains the required precision in representing the total population, representative samples must be drawn from each of the smaller population groups. There are three reasons why a researcher chooses a stratified random sample:

To increase the sample's statistical efficiency
To provide adequate data for analyzing the various subpopulations
To enable different research methods and procedures to be used in different strata

Drawing a stratified random sample involves the following steps:
1. Determine the variables to use for stratification
2. Select proportionate or disproportionate stratification
3. Divide the target population into homogeneous subgroups or strata
4. Select random samples from each stratum
5. Combine the samples from each stratum into a single sample of the target population

There are two common methods for deriving samples from the strata, viz. proportionate and disproportionate. In proportionate stratified sampling, each stratum is properly represented, so the sample drawn from it is proportionate to the stratum's share of the total population. The larger strata are sampled more heavily because they make up a larger percentage of the target population. This approach is more popular than other stratified sampling procedures for the following reasons:

It has higher statistical efficiency than the simple random sample
It is much easier to carry out than other stratifying methods
It provides a self-weighting sample, i.e. the population mean or proportion can be estimated simply by calculating the mean or proportion of all sample cases

In disproportionate stratified sampling, the sample size selected from each stratum is independent of that stratum's proportion of the total defined target population. This approach is used when stratification of the target population produces sample sizes that contradict their relative importance to the study. An alternative form of the disproportionate stratified method is optimal allocation. In this method, consideration is given to the relative size of the stratum as well as the variability within the stratum to determine the necessary sample size for each stratum. The logic underlying optimal allocation is that the greater the homogeneity of the prospective sampling units within a particular stratum, the fewer the units that have to be selected to estimate the true population parameter accurately for that subgroup.
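The two allocation methods can be compared with a small sketch. Proportionate allocation follows directly from the description above. For optimal allocation the text states only that stratum size and within-stratum variability are both considered; weighting each stratum by size multiplied by standard deviation (often called Neyman allocation) is an assumption made here. The strata, sizes and standard deviations are hypothetical.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Sample size per stratum in proportion to its share of the population."""
    population = sum(stratum_sizes.values())
    # Rounding may make the parts differ slightly from total_sample.
    return {name: round(total_sample * size / population)
            for name, size in stratum_sizes.items()}


def optimal_allocation(stratum_sizes, stratum_stdevs, total_sample):
    """Assumed rule: weight each stratum by size * standard deviation."""
    weights = {name: stratum_sizes[name] * stratum_stdevs[name]
               for name in stratum_sizes}
    total_weight = sum(weights.values())
    return {name: round(total_sample * weight / total_weight)
            for name, weight in weights.items()}


# Hypothetical target population of 1000 employees in three strata.
sizes = {"managers": 50, "supervisors": 150, "workers": 800}
# Hypothetical within-stratum standard deviations of the study variable.
stdevs = {"managers": 20, "supervisors": 10, "workers": 5}

proportionate = proportionate_allocation(sizes, 100)
optimal = optimal_allocation(sizes, stdevs, 100)
print(proportionate)
print(optimal)
```

In this sketch the small but highly variable stratum receives more units under the assumed optimal rule than under proportionate allocation, while the large homogeneous stratum receives fewer.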
Disproportionate sampling is also opted for in situations where it is easier, simpler and less expensive to collect data from one or more strata than from others.

Advantages and disadvantages
Stratified random sampling provides several advantages, viz. the assurance of representativeness in the sample, the opportunity to study each stratum and make relative comparisons between strata, and the ability to make estimates for the target population with the expectation of greater precision or less error.

iii. Cluster sampling
Cluster sampling is a probability sampling method in which the sampling units are divided into mutually exclusive and collectively exhaustive subpopulations called clusters. Each cluster is assumed to be representative of the heterogeneity of the target population. Groups of elements that have heterogeneity among the members within each group are chosen for study in cluster sampling, so that there are several groups with intragroup heterogeneity and intergroup homogeneity. A random sample of the clusters or groups is drawn and information is gathered from each of the members in the randomly chosen clusters. Cluster sampling thus offers more heterogeneity within groups and more homogeneity among groups.

Single stage and multistage cluster sampling
In single stage cluster sampling, the population is divided into convenient clusters and the required number of clusters is randomly chosen as sample subjects. Each element in each of the randomly chosen clusters is investigated in the study. Cluster sampling can also be done in several stages, which is known as multistage cluster sampling. For example, to study the banking behaviour of customers in a national survey, cluster sampling can be used to select the urban, semi-urban and rural geographical locations of the study. At the next stage, particular areas in each of the locations would be chosen. At the third stage, the banks within each area would be chosen.
Thus multistage sampling involves a probability sampling of the primary sampling units; from each of the primary units, a probability sample of the secondary sampling units is drawn; a third level of probability sampling is done from each of these secondary units, and so on until the final stage of breakdown for the sample units is arrived at, where every member of the unit will be in the sample.

Area sampling
Area sampling is a form of cluster sampling in which the clusters are formed by geographic designations, for example state, district, city or town. Any geographic unit with identifiable boundaries can be used. Area sampling is less expensive than most other probability designs and is not dependent on a population frame. A city map showing the blocks of the city would be adequate information to allow a researcher to take a sample of the blocks and obtain data from the residents therein.

Advantages and disadvantages of cluster sampling
The cluster sampling method is widely used due to its overall cost-effectiveness and feasibility of implementation. In many situations the only reliable sampling frame available to researchers that is representative of the defined target population is one that describes and lists clusters. A list of geographical regions, telephone exchanges or blocks of residential dwellings can normally be compiled more easily than a list of all the individual sampling units making up the target population. The clustering method is a cost-efficient way of sampling and collecting raw data from a defined target population. One major drawback of the clustering method is the tendency of clusters to be homogeneous. The greater the homogeneity of the cluster, the less precise will be the sample estimate in representing the target population parameters. The conditions of intracluster heterogeneity and intercluster homogeneity are often not met. For these reasons the method is not often used where precise estimates are needed.

Stratified random sampling vs cluster sampling
Cluster sampling differs from stratified sampling in the following ways:

In stratified sampling the population is divided into a few subgroups, each with many elements in it, and the subgroups are selected according to some criterion that is related to the variables under study. In cluster sampling the population is divided into many subgroups, each with a few elements in it, and the subgroups are selected according to some criterion of ease or availability in data collection.

Stratified sampling should secure homogeneity within the subgroups and heterogeneity between subgroups.
Cluster sampling tries to secure heterogeneity within subgroups and homogeneity between subgroups.

In stratified sampling the elements are chosen randomly within each subgroup. In cluster sampling the subgroups are randomly chosen and each and every element of the chosen subgroup is studied in depth.

iv. Double sampling
This is also called sequential or multiphase sampling. Double sampling is opted for when further information is needed from a subset of the group from which some information has already been collected for the same study. It is called double sampling because initially a sample is used in the study to collect some preliminary information of interest, and later a subsample of this primary sample is used to examine the matter in more detail. The process includes collecting data from a sample using a previously defined technique. Based on this information, a subsample is selected for further study. It is more convenient and economical to collect some information by sampling and then use this information as the basis for selecting a subsample for further study.

3.6.2 Nonprobability Sampling

In nonprobability sampling the elements in the population do not have any known probabilities attached to their being chosen as sample subjects. This means that the findings of the study cannot be confidently generalized to the population. However, at times the researcher may be less concerned about generalizability, and the purpose may be just to obtain some preliminary information in a quick and inexpensive way. Sometimes, when the population size is unknown, nonprobability sampling is the only way to obtain data. Some nonprobability sampling techniques may be more dependable than others and can often lead to important information about the population. The nonprobability sampling designs are discussed below:

A. Convenience sampling
Nonprobability samples that are unrestricted are called convenience samples.
Convenience sampling refers to the collection of information from members of the population who are conveniently available to provide it. Researchers or field workers have the freedom to choose as samples whomever they find, hence the name convenience sampling. It is mostly used during the exploratory phase of a research project, where it is perhaps the best way of getting some basic information quickly and efficiently. The assumption is that the target population is homogeneous and that the individuals selected as samples are similar to the overall defined target population with regard to the characteristics being studied. In reality, however, there is no way to accurately assess the representativeness of the sample. Because of the self-selection and voluntary nature of participation in the data collection process, the researcher should give due consideration to nonresponse error.
Advantages and disadvantages
Convenience sampling allows a large number of respondents to be interviewed in a relatively short time. This is one of the main reasons for using convenience sampling in the early stages of research. The major drawback, however, is that the use of convenience samples in the development phases of constructs and scale measurements can have a serious negative impact on the overall reliability and validity of the measures and instruments used to collect raw data. Another major drawback is that the raw data and results are not generalizable to the defined target population with any measure of precision. It is not possible to measure the representativeness of the sample, because sampling error estimates cannot be accurately determined.

B. Purposive sampling
A nonprobability sample that conforms to certain criteria is called a purposive sample. There are two major types of purposive sampling, viz., judgment sampling and quota sampling.

i. Judgment sampling
Judgment sampling is a nonprobability sampling method in which participants are selected according to an experienced individual's belief that they will meet the requirements of the study. The researcher selects sample members who conform to some criterion. It is appropriate in the early stages of an exploratory study and involves the choice of subjects who are most advantageously placed, or in the best position, to provide the information required. It is used when only a limited number or category of people have the information being sought. The underlying assumption is the researcher's belief that the opinions of a group of perceived experts on the topic of interest are representative of the entire target population.

Advantages and disadvantages
If the judgment of the researcher or expert is correct, the sample generated by judgment sampling will be much better than one generated by convenience sampling.
However, as in the case of all nonprobability sampling methods, the representativeness of the sample cannot be measured. The raw data and information collected through judgment sampling provide only a preliminary insight.

ii. Quota sampling
The quota sampling method involves the selection of prospective participants according to prespecified quotas regarding demographic characteristics (gender, age, education, income, occupation, etc.), specific attitudes (satisfied, neutral, dissatisfied) or specific behaviours (regular, occasional or rare user of a product). The purpose of quota sampling is to provide an assurance that prespecified subgroups of the defined target population are represented on pertinent sampling factors that are determined by the researcher. It ensures, through the assignment of quotas, that certain groups are adequately represented in the study.

Advantages and disadvantages
The greatest advantage of quota sampling is that the sample generated contains specific subgroups in the proportions desired by the researchers. In research projects that require interviews, the use of quotas ensures that the appropriate subgroups are identified and included in the survey. The quota sampling method may eliminate or reduce selection bias. An inherent limitation of quota sampling is that the success of the study depends on subjective decisions made by the researchers. As a nonprobability method, it is incapable of measuring the true representativeness of the sample or the accuracy of the estimate obtained. Attempts to generalize the results beyond the respondents who were sampled and interviewed therefore become very questionable and may misrepresent the given target population.

iii. Snowball sampling
Snowball sampling is a nonprobability sampling method in which a set of respondents is chosen who then help the researcher to identify additional respondents to be included in the study.
This method of sampling is also called referral sampling because one respondent refers other potential respondents. Snowball sampling is typically used in research situations where the defined target population is very small and unique and compiling a complete list of sampling units is nearly impossible. While the traditional probability and other nonprobability sampling methods would normally require an extreme search effort to qualify a sufficient number of prospective respondents, the snowball method yields better results at a much lower cost. The researcher has to identify and interview one qualified respondent and then solicit that respondent's help in identifying other respondents with similar characteristics.
Advantages and disadvantages
Snowball sampling makes it possible to identify and select prospective respondents from small, hard-to-reach and uniquely defined target populations. It is most useful in qualitative research practices. Reduced sample size and costs are the primary advantages of this sampling method. The major drawback is that the chance of bias is higher. If there is a significant difference between the people who are identified through snowball sampling and those who are not, it may give rise to problems. The results cannot be generalized to members of the larger defined target population.

3.7 Determination of Appropriate Sampling Design
Determining an appropriate sampling design is a challenging issue and has great implications for the application of the research findings. Apart from the theoretical components, the sampling issues, and the advantages and drawbacks of the different sampling techniques, the decision should take the following factors into consideration:

1. Research objectives
A clear understanding of the statement of the problem and the objectives provides the initial guidelines for determining the appropriate sampling design. If the research objectives include the need to generalize the findings of the study, a probability sampling method should be chosen rather than a nonprobability sampling method. In addition, the type of research, viz., exploratory or descriptive, will also influence the type of sampling design.

2. Scope of the research
Whether the scope of the research project is local, regional, national or international has implications for the choice of the sampling method. The geographical proximity of the defined target population elements will influence not only the researcher's ability to compile the needed list of sampling units, but also the selection design. When the target population is evenly distributed geographically, a cluster sampling method may become more attractive than the other available methods.
If the geographical area to be covered is more extensive, a complex sampling method should be adopted to ensure proper representation of the target population.

3. Availability of resources
The researcher's command over financial and human resources should be considered in deciding the sampling method. If financial and human resources are limited, some of the more time-consuming, complex probability sampling methods cannot be selected for the study.

4. Time frame
A researcher who has to meet a short deadline is more likely to select a simple, less time-consuming sampling method than a more complex and accurate one.

5. Advanced knowledge of the target population
If complete lists of the population elements are not available to the researcher, probability sampling is ruled out. This may dictate that a preliminary study be conducted to generate the information needed to build a sampling frame for the study. The researcher must gain a strong understanding of the key descriptor factors that make up the true members of any target population.

6. Degree of accuracy
The degree of accuracy required, or the level of tolerance for error, may vary from one study to another. If the researcher wants to make predictions or inferences about the true position of all members of the defined target population, some type of probability sampling method should be selected. If the researcher aims solely to identify and obtain preliminary insights into the defined target population, nonprobability methods may prove more appropriate.

7. Perceived statistical analysis needs
The need for statistical projections or estimates based on the sample results is to be considered. Only probability sampling techniques allow the researcher to adequately use statistical analysis for estimates beyond the sample respondents.
Though statistical methods can be applied to nonprobability samples of people and objects, generalizing the results and findings to the larger defined target population is technically inappropriate and questionable. The researcher should also decide on the appropriateness of the sample size, as it has a direct impact on data quality, statistical precision and the generalizability of findings.

3.8 Sampling decisions: Some Issues
Sampling design and sample size are both important in establishing the representativeness of the sample for generalizability. Even a large sample cannot yield generalizable research findings if an appropriate sampling design is not used. Similarly, unless the sample size is adequate to ensure the desired precision and confidence, the sampling design, however justifiable and sophisticated, may not be useful to the researcher. Hence a sampling decision should give due consideration to both sample size and design.

If the sample size is too large, there is a risk of accepting findings that ought to be rejected. With a very large sample even weak relationships may reach the significance level, and the researcher may be inclined to believe that these significant relationships found in the sample can be extended to the population, which may not be true. Likewise, if the sample size is too small, it may lead to generalization issues. Even if the sample size is appropriate, it must be considered whether a result that is statistically significant is also relevant. For example, there may be a statistically significant relationship between two variables, but if it explains only a very small percentage of the variation it may have no practical utility.

The following rules of thumb proposed by Roscoe (1975) can be considered in determining an appropriate sample size:
1. Sample sizes larger than 30 and less than 500 are appropriate for most research.
2. If the sample is to be broken into subsamples or groups, a minimum sample size of 30 in each category should be fixed.
3. In multivariate research the sample size should be at least ten times as large as the number of variables in the study.
4. In simple experimental research a sample as small as 10 to 20 in size can yield good results.
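The rules of thumb above, and the point that a large sample can make a weak relationship statistically significant, can be illustrated with a short Python sketch. The function names and the example figures (r = 0.1, n = 1000) are hypothetical and chosen only for illustration; the t statistic for a correlation coefficient used here is the standard textbook formula, which is not given in this unit.

```python
import math


def roscoe_check(n, n_variables=1, subgroup_sizes=()):
    """Return a list of rule-of-thumb warnings; an empty list means none apply."""
    warnings = []
    # Rule 1: larger than 30 and less than 500 suits most research.
    if not 30 < n < 500:
        warnings.append("overall sample size is outside the 30 to 500 range")
    # Rule 2: each subsample or group should have at least 30 elements.
    if any(size < 30 for size in subgroup_sizes):
        warnings.append("at least one subgroup has fewer than 30 elements")
    # Rule 3: multivariate research needs n of at least 10 x number of variables.
    if n_variables > 1 and n < 10 * n_variables:
        warnings.append("sample is smaller than ten times the number of variables")
    return warnings


def correlation_t(r, n):
    """t statistic for testing whether a sample correlation r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    print(roscoe_check(40, n_variables=6, subgroup_sizes=(25, 15)))
    # A weak correlation in a large sample: significant, yet little variance explained.
    r, n = 0.1, 1000
    print("t =", round(correlation_t(r, n), 3), "variance explained =", r * r)
```

In the hypothetical case shown, the t statistic exceeds the two-tailed 5% critical value of 1.96 even though the relationship explains only about 1% of the variation, which is the situation the text warns about.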
3.8.1 Precision and Confidence in sample size estimation
Since the sample data are used for drawing inferences about the population, the inferences should be as accurate as possible and it should also be possible to estimate the error. An interval estimate should therefore be made to ensure a relatively accurate estimation of the population parameter. For this purpose a statistic that has the same distribution as the sampling distribution of the mean, usually a Z or t statistic, is used.

For example, suppose the problem at hand is to estimate the mean value of purchases made by a customer from department stores. A sample of 64 customers is identified through systematic sampling, and it is found that the sample mean $\bar{X} = 105$ and the sample standard deviation $S = 10$. $\bar{X}$, the sample mean, is a point estimate of $\mu$, the population mean. A confidence interval can be constructed around $\bar{X}$ to estimate the range within which $\mu$ would fall. The standard error $S_{\bar{X}}$ and the level of confidence required determine the width of the interval, which is given by the formula

$$\mu = \bar{X} \pm K S_{\bar{X}}$$

where

$$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{10}{\sqrt{64}} = 1.25$$

and K is the critical value of t:
For 90% confidence level the K value is 1.645
For 95% confidence level the K value is 1.96
For 99% confidence level the K value is 2.576

If a 90% confidence level is desired, then

$$\mu = 105 \pm 1.645(1.25)$$

and $\mu$ would fall between 102.944 and 107.056. This indicates that, using a sample size of 64, it can be stated with 90% confidence that the true population mean value of purchases of all customers would fall between Rs. 102.944 and Rs. 107.056. If the confidence level is to be increased to 99% without increasing the sample size, precision has to be sacrificed, as can be seen from the following calculation:

$$\mu = 105 \pm 2.576(1.25)$$

$\mu$ would fall between 101.78 and 108.22. The width of the interval has increased, so the estimate is less precise although the confidence level of the estimate has increased. A larger sample size is required if both the precision and the confidence level are to be increased. The sample size, n, is a function of:
The variability in the population
The precision or accuracy needed
The confidence level desired
The type of sampling plan used

If the sample size cannot be increased, the only way to maintain the same level of precision is to give up some confidence in the estimation; the confidence level or certainty of the estimate will be reduced. Researchers must consider four aspects while making decisions regarding the sample size:
The precision level needed in estimating the population characteristics, i.e., the allowable margin of error.
The level of confidence required, i.e., the percentage chance the researcher is willing to take of committing an error in the estimation of the population parameters.
The extent of variability in the population on the characteristics investigated.
The cost-benefit analysis of increasing the sample size.

3.8.2 Sample data and hypothesis testing
In addition to estimating the population parameters, the sample data can also be used to test hypotheses about population values. For example, to determine whether customers spend the same average amount on purchases at Store A as at Store B, a null hypothesis can be formed. The null hypothesis proposes that there is no significant difference in the amount spent by customers at the two stores.
This would be expressed as:

$$H_0: \mu_A - \mu_B = 0$$

The alternate hypothesis can be stated as follows:

$$H_A: \mu_A - \mu_B \neq 0$$

If a sample of 20 customers is taken from each of the two stores and it is found that the mean value of purchases of customers in Store A is 105 with a standard deviation of 10, and the corresponding figures for Store B are 100 and 15 respectively, it can be seen that

$$\bar{X}_A - \bar{X}_B = 105 - 100 = 5$$

The null hypothesis states that there is no significant difference. The probability of the two group means differing by 5 if the null hypothesis is true should therefore be determined. This can be done by converting the difference in the sample means to a t statistic and identifying the probability of finding a t of that value. The t distribution has known probabilities attached to it. The critical value of t for two samples of 20 each, with $(n_1 + n_2) - 2 = 38$ degrees of freedom, is 2.021 at the 95% confidence level. A two-tailed test is used because the difference between Store A and Store B could be either positive or negative. The t statistic for testing the hypothesis is calculated as follows:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{\bar{X}_1 - \bar{X}_2}}$$

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{(20 \times 10^2) + (20 \times 15^2)}{20 + 20 - 2}\left(\frac{1}{20} + \frac{1}{20}\right)} = 4.136$$

$$t = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{4.136}$$

It is known that $\bar{X}_A - \bar{X}_B = 5$ (the difference in the means of the two stores) and $\mu_A - \mu_B = 0$ (the null hypothesis), so

$$t = \frac{5 - 0}{4.136} = 1.209$$

The t value of 1.209 is well below the critical value of 2.021 at the 95% confidence level. Even the 90% level requires a value of 1.684. Thus the difference of 5 found between the two stores is not significant. The conclusion is that there is no significant difference between the spending patterns of the customers in Store A and in Store B. The null hypothesis is therefore not rejected, and the alternate hypothesis is not supported.

3.8.3 Determining the Sample size
Sampling is done to reduce the cost of data collection and for the purpose of convenience. However, some useful information about the population is likely to be missed if the sample is inadequate. While deciding the sample size, care should be taken that the sample is neither so small that it increases the risk of sampling error nor so large that it increases the cost of the study. It is necessary to make a trade-off between (i) increasing the sample size, which reduces the sampling error but increases the cost, and (ii) decreasing the sample size, which may increase the sampling error while decreasing the cost.

Several factors should be considered before deciding the sample size. The first and foremost is the size of the error that would be tolerable for the purpose of the decision-making. The second is the degree of confidence required in the results of the study. If 100 percent confidence in the result is needed, the entire population must be studied, which is impractical and costly. Normally the confidence level is set at 99%, 95% or 90%. The confidence and precision aspects are discussed in detail in Section 3.8.1, Precision and Confidence in sample size estimation. For determining the sample size the following relationship is used:

$$\sigma_{\bar{x}} = \text{standard error of the estimate} = \frac{\sigma}{\sqrt{n}}$$
$\sigma_{\bar{x}}$ can be calculated if the upper and lower confidence limits are known. If these limits are assumed to be $\pm Y$, then $Z \sigma_{\bar{x}} = Y$, where Z is the value of the normal variate for the given confidence level.

The procedure for determining the sample size can be illustrated through an example. A management consultancy is conducting a survey to determine the annual salary of the 3000 managers in the textile concerns within a district. The sample size needed to estimate the mean annual earnings within plus or minus Rs. 1000 at the 95 percent confidence level is to be ascertained. The standard deviation of the annual earnings of the entire population is known to be Rs. 3000. The desired upper and lower limit is Rs. 1000, i.e., the annual earnings are to be estimated within plus or minus Rs. 1000:

$$Z \sigma_{\bar{x}} = 1000$$

At the 95% level of confidence the Z value is 1.96, so

$$1.96\,\sigma_{\bar{x}} = 1000$$

$$\sigma_{\bar{x}} = \frac{1000}{1.96} = 510.20$$

The standard error $\sigma_{\bar{x}}$ is given by $\sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation:

$$\frac{\sigma}{\sqrt{n}} = 510.20$$

i.e.,

$$\frac{3000}{\sqrt{n}} = 510.20, \qquad \sqrt{n} = \frac{3000}{510.20} = 5.88$$

i.e., n = 34.57
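The hand calculations in Sections 3.8.1 to 3.8.3 all rest on the standard error of the mean, so they can be checked together with a minimal Python sketch. The function names are hypothetical, the critical values (1.645, 2.576, 1.96) are simply those quoted in the text, and the pooled standard error follows the formula as written in Section 3.8.2 (with n multiplying each variance), which is an assumption about the intended estimator rather than the only possible one.

```python
import math


def confidence_interval(mean, sd, n, k):
    """Interval mean +/- k * (sd / sqrt(n)); k is the critical value for the level chosen."""
    se = sd / math.sqrt(n)
    return mean - k * se, mean + k * se


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic under H0: mu1 - mu2 = 0, using the pooled standard error of Section 3.8.2."""
    pooled_var = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff


def required_sample_size(sigma, margin, z):
    """Solve z * sigma / sqrt(n) = margin for n (not yet rounded up)."""
    se = margin / z          # standard error that yields the desired margin
    return (sigma / se) ** 2


if __name__ == "__main__":
    # Section 3.8.1: 64 customers, mean 105, standard deviation 10.
    print(confidence_interval(105, 10, 64, 1.645))   # 90% level
    print(confidence_interval(105, 10, 64, 2.576))   # 99% level
    # Section 3.8.2: Store A (105, 10) against Store B (100, 15), 20 customers each.
    print(pooled_t(105, 10, 20, 100, 15, 20))
    # Section 3.8.3: sigma = 3000, margin = 1000, Z = 1.96.
    n = required_sample_size(3000, 1000, 1.96)
    print(n, math.ceil(n))
```

Under these assumptions the sketch should agree, up to rounding, with the figures worked out by hand; the last line rounds the computed n up to the next whole respondent.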
Therefore the desired sample size is approximately 35.

Have you understood?
Discuss the different data sources, explaining their usefulness and disadvantages.
Discuss the types of error and the steps to avoid them.
Discuss the important issues to be considered in designing a questionnaire.
What are projective techniques? Where can they be used?
How has the advancement in technology helped data gathering?
Elucidate the differences between a questionnaire and an interview schedule.
What is an electronic survey? Discuss the issues to be considered in designing an electronic questionnaire.
When should a researcher opt for sampling and why?
Discuss the steps involved in a sampling plan.
Discuss the various probability and nonprobability sampling techniques.
Nonprobability sampling designs ought to be preferred to probability sampling designs in some cases. Explain with an example.
Discuss the issues concerned with precision and confidence in sampling design.
SUMMARY
This chapter dealt in detail with the various sources of data and the data collection methods. The primary data sources, viz., focus groups and panels, were discussed in detail. The data collection methods, viz., the interview, questionnaire, observation and other methods, were examined. Sampling design is an important element of research as it affects the validity and reliability of the research findings. The various probability and nonprobability sampling techniques were discussed in detail. The method of determining the sample size and the precision and confidence desired in estimating the population parameters were explained. With this background, the next unit provides a detailed discussion of the various multivariate techniques used to analyze the data collected.
Unit 4 A Refresher on Some Multivariate Statistical Techniques

4.1 Introduction
Business problems today are more complex. The various functional areas of management are confronted by multiple independent and/or dependent variables. This requires the application of multivariate techniques to gain an insight into the problems or to make decisions regarding the choices involved. The availability of computers with fast processing speeds and versatile software has enhanced the application of these techniques, which involve complex mathematical calculations.

Multivariate analysis can be defined as those statistical techniques which focus upon, and bring out in bold relief, the structure of simultaneous relationships among three or more phenomena. Thus multivariate analysis refers to a group of statistical techniques used when there are two or more measurements on each element and the variables are analyzed simultaneously. Multivariate techniques are largely empirical and deal with reality. The basic objective underlying the use of multivariate techniques is to represent a massive collection of data in a simplified manner. In other words, multivariate techniques transform a mass of observations into a smaller number of composite scores in such a way that they reflect as much as possible of the information contained in the raw data obtained in a research study. This unit explains some of the multivariate techniques and the application of a statistical package to carry them out.
4.2 Learning objectives:
After reading this unit, you will be able to:
Classify and select appropriate multivariate techniques
Understand where, why and how to use factor analysis
Know the use of cluster analysis techniques for grouping similar objects or people
Classify people or objects into groups using discriminant analysis
Predict a metric dependent variable from a set of metric independent variables using multiple regression and correlation
Understand the need for and application of canonical correlation
Use a statistical package to apply multivariate techniques

4.3 Multivariate Techniques
Selecting an appropriate technique requires an understanding of the distinction between dependence and interdependence techniques. In a dependence method, multivariate techniques are used to explain or predict a dependent variable on the basis of two or more independent variables. A dependence method can be defined as one in which a variable is identified as the dependent variable to be predicted or explained by other, independent variables. Dependence techniques include multiple regression analysis, discriminant analysis, MANOVA and conjoint analysis.

In an interdependence method, no single variable or group of variables is defined as being independent or dependent. The multivariate procedure here involves the analysis of all the variables in the data set simultaneously. The goal of an interdependence method is to group respondents or objects together; no single variable is explained or predicted by the others. Cluster analysis, factor analysis and multidimensional scaling are the most frequently used interdependence techniques.

In selecting a multivariate technique, two aspects should be considered, viz.,
i. Whether the variables can be divided into dependent and independent sets, i.e., whether a dependence or an interdependence technique is called for.
ii. Whether the data are metric or nonmetric.

The nature of the measurement scales helps to determine the appropriate multivariate technique for analyzing the data. The selection of a technique requires consideration of the types of measures used for both the dependent and the independent sets of variables. If the dependent variable is measured nonmetrically, the appropriate methods are discriminant and conjoint analysis. If the dependent variable is measured metrically, techniques such as multiple regression, ANOVA, MANOVA and conjoint analysis can be used. Multiple regression and discriminant analysis require metric independent variables, but they can use nonmetric dummy variables.
ANOVA, MANOVA and conjoint analysis are appropriate with nonmetric independent variables. The interdependence techniques of factor analysis and cluster analysis are most frequently used with metrically measured variables, but nonmetric adaptations are possible. The multivariate procedures are explained in the following figure:

Variables in multivariate analysis
Several kinds of variables are used in the context of multivariate analysis. They are classified into different categories and explained below.

1. Explanatory variable and criterion variable
If X is considered to be the cause of Y, then X is described as the explanatory variable. It is also called the causal or independent variable. Y is described as the criterion variable and is also called the resultant or dependent variable. In some situations both the explanatory variable and the criterion variable may consist of a set of many variables, in which case the set X1, X2, X3, ..., Xn is called the set of explanatory variables and the set Y1, Y2, Y3, ..., Yn is called the set of criterion variables, if the variation in the former is supposed to cause the variation in the latter as a whole. In economics the explanatory variables are called external or exogenous variables and the criterion variables are called endogenous variables. The terms external criterion for the explanatory variable and internal criterion for the criterion variable are also used.

2. Observable variables and latent variables
If the explanatory variables described above can be observed directly, they are termed observable variables. However, there are some unobservable variables which may have an influence on the criterion variables. They are termed unobservable or latent variables.

3. Discrete variables and continuous variables
Discrete variables are those variables which can take integer values only. Continuous variables can assume any real value, including values with decimal points.

4. Dummy variable
This is also called a pseudo variable. The term is used in a technical sense and is useful in algebraic manipulations in the context of multivariate analysis. Xi (i = 1, ..., m) is called a dummy variable if only one of the Xi is 1 and the others are all zero.

4.4 Factor Analysis
Factor analysis is the most often used multivariate technique in research studies, especially in studies pertaining to the social and behavioural sciences. It is a class of procedures primarily used for data reduction and summarization. Researchers can use factor analysis for two primary functions in data analysis, i.e., to identify the underlying constructs in the data and to reduce the number of variables to a more manageable set. In factor analysis there is no distinction between dependent and independent variables; all variables under investigation are analyzed together to identify the underlying factors. It is used to summarize the information contained in a large number of variables into a smaller number of subsets or factors.

The purpose of factor analysis is to simplify the data. In reducing the number of variables, factor analysis procedures attempt to retain as much of the information as possible and to make the remaining variables meaningful and easy to work with. Factor analysis resolves a large set of measured variables into relatively few factors, and the factors so derived are treated as new variables. The value of each new variable is derived by summing the original values which have been grouped into the factor. The meaning and name of the new variable are subjectively determined by the researcher. A factor is a linear combination of the data, and hence the coordinates of each observation or variable are measured to obtain the factor loadings. The factor loadings represent the correlation between a particular variable and the factor, and are usually placed in a matrix of correlations between the variables and the factors.
The mathematical basis of factor analysis concerns a data matrix, also termed the score matrix and symbolized as S. The matrix contains the scores of N persons on K measures:

                      Measures (variables)
                      a     b     c    ...   k
Persons         1     a1    b1    c1   ...   k1
(objects)       2     a2    b2    c2   ...   k2
                3     a3    b3    c3   ...   k3
                .     .     .     .          .
                N     aN    bN    cN   ...   kN

Here a1 is the score of person 1 on measure a, a2 is the score of person 2 on measure a, and kN is the score of person N on measure k. It is assumed that the scores on each measure are standardized, i.e., the sum of the scores in any column of the matrix S is zero and the variance of the scores in any column is 1. A factor is any linear combination of the variables in a data matrix and can be stated in a general manner as:

$$A = W_a a + W_b b + \dots + W_k k$$

The weights are obtained and the factor loadings, i.e., the factor-variable correlations, are calculated. Then the communality (symbolized as $h^2$), the eigen values and the total sum of squares are obtained and the results are interpreted. Rotation is carried out in order to obtain realistic results; rotation reveals different structures in the data. Finally the factor scores are obtained, which help to explain the factors. After the factor scores are obtained, several other multivariate analyses such as cluster analysis, multiple regression and discriminant analysis can be performed.

4.4.1 Statistics and terms associated with factor analysis
The statistics and some of the basic terms used in factor analysis are explained below:
i. Factor: A factor is an underlying dimension that accounts for several observed variables. There can be more than one factor, depending upon the nature of the study and the number of variables involved in it.
ii. Factor loadings: Factor loadings are simple correlations between the variables and the factors. They show how closely the variables are related to each of the factors discovered. They are also known as factor-variable correlations and are the key to understanding the meaning of a factor. The absolute size of a loading, rather than its plus or minus sign, is important in the interpretation of a factor.
iii. Communality ($h^2$): Communality, symbolised by $h^2$, shows how much of each variable is accounted for by the underlying factors taken together.
A high value of communality means that not much of the variable is left over after whatever the factors represent is taken into consideration. It is worked out for each variable as under:
$h_i^2 = (\text{loading of variable } i \text{ on factor A})^2 + (\text{loading of variable } i \text{ on factor B})^2 + \dots$
iv. Eigen value (latent root): It represents the total variance explained by each factor. The eigen value is the sum of the squared factor loadings relating to a factor. It indicates the relative importance of each factor in accounting for the particular set of variables analysed.
v. Total sum of squares: When the eigen values of all factors are totaled, the resulting value is termed the total sum of squares. This value, when divided by the number of variables involved in the study, results in an index that shows how well the particular solution accounts for what all the variables taken together represent.
vi. Rotation: Rotation reveals different structures in the data. Different rotations give results that appear to be entirely different, but from the statistical point of view all results are taken as equal. However, the right rotation should be selected to make sense of the results. If the factors are independent an orthogonal rotation is done, and if the factors are correlated an oblique rotation is made. The communality of each variable remains undisturbed regardless of rotation, but the eigen values change as a result of rotation.
vii. Factor scores: Factor scores are composite scores estimated for each respondent on the derived factors. With the factor scores several other multivariate analyses can be performed.
viii. Bartlett's test of sphericity: It is a test statistic used to examine the hypothesis that the variables are uncorrelated in the population.
ix. Correlation matrix: It shows the simple correlation, r, between all possible pairs of variables included in the analysis. The diagonal elements, which are all 1, are usually omitted.
x. Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy: The KMO measure of sampling adequacy is an index used to examine the appropriateness of factor analysis. High values, between 0.5 and 1.0, indicate that factor analysis is appropriate. Values below 0.5 imply that factor analysis may not be appropriate.
xi. Scree plot: A scree plot is a plot of the eigen values against the number of factors in order of extraction.
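The definitions of communality, eigen value and the total-sum-of-squares index can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming the unrotated two-factor loadings of variables A to F from the worked table in section 4.4.2; the function and variable names are illustrative only and do not come from any statistical package.

```python
# Unrotated loadings of six variables on factors I and II,
# copied from the worked table in section 4.4.2.
loadings = {
    "A": (0.70, -0.40),
    "B": (0.60, -0.50),
    "C": (0.60, -0.35),
    "D": (0.50, 0.50),
    "E": (0.60, 0.50),
    "F": (0.60, 0.60),
}


def communalities(table):
    """Communality (h2) of each variable: sum of its squared loadings."""
    return {var: sum(x * x for x in row) for var, row in table.items()}


def eigenvalues(table):
    """Eigen value of each factor: sum of squared loadings in its column."""
    n_factors = len(next(iter(table.values())))
    return [sum(row[j] ** 2 for row in table.values()) for j in range(n_factors)]


h2 = communalities(loadings)
eig = eigenvalues(loadings)

# Share of total variance per factor: eigen value / number of variables.
pct_variance = [100 * e / len(loadings) for e in eig]

# Index from definition v: total of the eigen values / number of variables.
total_index = sum(eig) / len(loadings)

for var, value in h2.items():
    print(var, round(value, 2))
print([round(e, 2) for e in eig], [round(p, 1) for p in pct_variance])
print(round(total_index, 3))
```

Rounded to two decimals, the values agree with the h2, eigen value and percentage-of-variance figures in that table.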
4.4.2 Steps in conducting factor analysis
The first step in conducting factor analysis is to define the problem and identify the variables involved. A correlation matrix is then constructed and a method of factor analysis is selected. The number of factors to be extracted is decided, and the factors are rotated and interpreted. Depending upon the objective, factor scores are calculated or surrogate variables are selected to represent the factors in subsequent multivariate analysis. Finally the fit of the factor analysis model is determined. The steps are explained below:
1. Formulate the problem
Problem formulation includes several tasks. The objectives of factor analysis should be identified, and the variables to be included in the factor analysis should be specified based on past research, theory and the judgement of the researcher. The variables should be appropriately measured on an interval or ratio scale. An appropriate sample size should be identified. The sample size should be at least four or five times the number of variables identified for the study. For example, if the study includes 20 variables, the sample size should be a minimum of 80 to 100. If the sample size is small and the ratio is not maintained, the results should be interpreted cautiously.
2. Construct the correlation matrix
The correlation matrix provides valuable insight and is the basis for the further analytical process. The variables identified for the study must be correlated for factor analysis to be conducted. If the correlations between the variables are small, factor analysis may not be appropriate. It can also be expected that variables that are highly correlated with each other will also correlate highly with the same factor or factors. Formal statistics are available for testing the appropriateness of the factor model. Bartlett's test of sphericity can be used to test the null hypothesis that the variables are uncorrelated in the population. A large value of the test statistic favours rejection of the null hypothesis; if the null hypothesis cannot be rejected, the appropriateness of factor analysis should be questioned. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy can also be used. This index compares the magnitudes of the observed correlation coefficients with the magnitudes of the partial correlation coefficients. Small values of the KMO statistic indicate that the correlations between pairs of variables cannot be explained by other variables and that factor analysis may not be appropriate. Generally a value greater than 0.5 is desirable.
3. Identify the method of factor analysis
After determining the appropriateness of factor analysis for analyzing the data, a suitable method should be selected. The approach used to derive the weights or factor score coefficients differentiates the various methods of factor analysis. The two most commonly employed factor analytic procedures are principal component analysis and common factor analysis. The procedure is chosen according to the researcher's objective. Principal component analysis is used when the objective is to summarize the information in a larger set of variables into fewer factors. It is recommended if the primary concern is to determine the minimum number of factors that will account for maximum variance in the data for use in subsequent multivariate analysis. The factors are called principal components. If the researcher is attempting to uncover the underlying dimensions surrounding the original variables, common factor analysis is used. Principal component analysis is based on the total information in each variable, whereas common factor analysis is concerned only with the variance shared among all the variables.
Principal component analysis
Factor analysis begins with the construction of a new set of variables based on the relationships in the correlation matrix. The principal component analysis method transforms a set of variables into a new set of composite variables, or principal components, that are not correlated with each other. These linear combinations of variables account for the variance in the data as a whole. The best combination makes up the first principal component and is the first factor. The second principal component is defined as the best linear combination of variables for explaining the variance not accounted for by the first factor. Likewise there may be a third, fourth and kth component, each being the best linear combination of variables for explaining the variance not accounted for by the previous factors. This process continues till all the variance is accounted for. Usually, however, it is stopped after a small number of factors has been extracted. The output of the principal component analysis might look like the data given below:

Extracted components    % of variance accounted for    Cumulative variance
Component no. 1         74%                            74%
Component no. 2         15%                            89%
Component no. 3         11%                            100%
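The cumulative variance column is simply a running total of the percentage accounted for by each component. A minimal sketch in Python, assuming the three percentages from the illustrative output above; the helper `components_needed` and the 85% threshold are hypothetical and only show how a cut-off might be applied.

```python
from itertools import accumulate

# Percentage of variance accounted for by components 1, 2 and 3.
pct_variance = [74, 15, 11]

# Running total of the percentages gives the cumulative variance.
cumulative = list(accumulate(pct_variance))  # [74, 89, 100]


def components_needed(percentages, threshold):
    """Smallest number of leading components whose cumulative
    percentage of variance reaches the given threshold."""
    for count, total in enumerate(accumulate(percentages), start=1):
        if total >= threshold:
            return count
    return len(percentages)


print(cumulative)
print(components_needed(pct_variance, 85))  # hypothetical 85% cut-off
```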
Numerical results from a factor analysis will be presented like the following table. The values in the table are correlation coefficients between the factors and the variables.

                    A. Unrotated factors           B. Rotated factors
Variable            I        II       h2           I        II
A                   .70      -.40     .65          .79      .15
B                   .60      -.50     .61          .75      .03
C                   .60      -.35     .48          .68      .4
D                   .50      .50      .50          .06      .70
E                   .60      .50      .61          .13      .77
F                   .60      .60      .72          .07      .85
Eigenvalue          2.18     1.39
% of variance       36.30    23.20
Cumulative %        36.30    59.50
In the above table .70 is the correlation coefficient between variable A and factor I. These correlation coefficients are called loadings. The eigen value of a factor is the sum of the squared loadings on that factor. For factor I the eigen value is $.70^2 + .60^2 + .60^2 + .50^2 + .60^2 + .60^2 = 2.18$. The eigen value 2.18 divided by the number of variables, i.e. 6, yields an estimate of the amount of total variance explained by the factor. In the example given above factor I accounts for 36% of the total variance. The column headed h2 gives the communalities, i.e. the estimates of the variance in each variable that is explained by the two factors. From the above table it can be seen that in the case of variable A the communality is $.70^2 + (-.40)^2 = .65$, which indicates that 65 percent of the variance in variable A is statistically explained in terms of factors I and II. In the unrotated solution the loadings do not provide much information: it is not possible to tell which variables load highly on factor I and which on factor II. Rotation makes it possible to identify the variables associated with each factor.
4. Determine the number of factors
It is possible to compute as many principal components as there are variables, but doing so does not serve the purpose of conducting a factor analysis. In order to summarize the information contained in the original variables, a smaller number of factors should be extracted. The question of how many factors to extract then arises. Several procedures for determining the number of factors are discussed below.
A priori determination: Due to prior knowledge the researcher knows how many factors to expect and can specify the number of factors to be extracted beforehand. The extraction of factors is completed as soon as the desired number of factors is extracted.
Determination based on eigenvalues: In this approach only factors with eigen values greater than 1.0 are retained; the other factors are not included in the model. An eigen value represents the amount of variance associated with the factor. Hence, only factors with variance greater than 1.0 are included. If the number of variables is less than 20, this approach will result in a conservative number of factors.
Determination based on scree plot: A scree plot is a plot of the eigen values against the number of factors in order of extraction. The shape of the plot is used to determine the number of factors. The plot typically has a distinct break between the steep slope of factors with large eigenvalues and a gradual trailing off associated with the rest of the factors. The gradual trailing off is referred to as the scree. The point at which the scree begins denotes the true number of factors.
Determination based on the percentage of variance: In this approach the number of factors to be extracted is determined in such a way that the cumulative percentage of variance extracted by the factors reaches a satisfactory level. The satisfactory level depends upon the problem at hand. However, it is recommended that the factors extracted should account for at least 60 percent of the variance.
Determination based on split-half reliability: The sample is split in half and factor analysis is performed on each half. Only factors with high correspondence of factor loadings across the two subsamples are retained.
Determination based on significance tests: It is possible to determine the statistical significance of the separate eigen values and retain only those factors that are statistically significant. A drawback is that with large samples (sizes greater than 200) many factors are likely to be statistically significant, although many of these may account for only a small proportion of the total variance.
5. Rotate the factors
The initial or unrotated factor matrix indicates the relationship between the factors and the individual variables. However, it is difficult to associate the variables with a factor, or to interpret the factors, from the unrotated matrix. Through rotation the factor matrix is transformed into a simpler one that is easier to interpret. After rotation each factor should have nonzero or significant loadings for only some of the variables. Likewise each variable should have nonzero or significant loadings on only a few factors, or on only one factor. If several factors have high loadings on the same variable, it is difficult to interpret them. Rotation does not affect the communalities or the percentage of total variance explained. However, the percentage of variance accounted for by each factor changes. It can be seen from the table of unrotated and rotated factors shown earlier that the variance explained by the individual factors is redistributed by rotation. Hence, different methods of rotation result in the identification of different factors.
The two basic types of rotation are orthogonal rotation and oblique rotation. If the axes are maintained at right angles the rotation is called orthogonal rotation. VARIMAX rotation is the most commonly used rotation. Its goal is to minimize the complexity of the components by making the large loadings larger and the small loadings smaller within each component. There are other rotational methods. QUARTIMAX rotation makes large loadings larger and small loadings smaller within each variable. EQUAMAX rotation is a compromise that attempts to simplify both components and variables. These are all orthogonal rotations, that is, the axes remain perpendicular, so the components are not correlated with one another. When the axes are not maintained at right angles and the factors are correlated, the rotation is called oblique rotation. Oblique rotation should be used when the factors in the population are likely to be strongly correlated.
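The statement that an orthogonal rotation leaves the communalities unchanged while redistributing the variance between the factors can be illustrated numerically. The sketch below, in Python, assumes the unrotated two-factor loadings from the worked table and applies a plain planar rotation through an arbitrary angle of 40 degrees. It is not an implementation of VARIMAX or of any other named criterion, and the function names are illustrative only.

```python
import math

# Unrotated loadings on factors I and II for variables A to F.
unrotated = [
    (0.70, -0.40),
    (0.60, -0.50),
    (0.60, -0.35),
    (0.50, 0.50),
    (0.60, 0.50),
    (0.60, 0.60),
]


def rotate(loadings, degrees):
    """Rotate a two-factor loading matrix through the given angle,
    keeping the two factor axes at right angles."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c - b * s, a * s + b * c) for a, b in loadings]


def communalities(loadings):
    """Sum of squared loadings for each variable (row)."""
    return [a * a + b * b for a, b in loadings]


def factor_variances(loadings):
    """Sum of squared loadings for each factor (column)."""
    return [sum(a * a for a, _ in loadings), sum(b * b for _, b in loadings)]


rotated = rotate(unrotated, 40)  # 40 degrees is an arbitrary choice

print([round(h, 2) for h in communalities(unrotated)])
print([round(h, 2) for h in communalities(rotated)])    # same as the line above
print([round(v, 2) for v in factor_variances(unrotated)])
print([round(v, 2) for v in factor_variances(rotated)])  # different split, same total
```

The total of the two factor variances is the same before and after rotation; only its division between the factors changes.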
In the table given above, it can be seen that the factors are more interpretable in the rotated matrix than in the unrotated matrix. The rotated factor matrix forms the basis for interpretation of the factors.
6. Interpret factors
Interpretation is facilitated by identifying the variables that have large loadings on the same factor. The factor can be interpreted in terms of the variables that load high on it. In the above table it can be seen that variables A, B and C load high on factor I, and hence factor I is interpreted in terms of variables A, B and C. Likewise factor II is interpreted in terms of variables D, E and F. Another useful method in the interpretation of factors is to plot the variables using the factor loadings as coordinates. Variables at the end of an axis are those that have high loadings on only that factor and hence describe the factor.
7. Calculate factor scores
If the goal of factor analysis is to reduce the original set of variables to a smaller set of composite variables, i.e. factors, for use in subsequent multivariate analysis, it is useful to compute factor scores for each respondent. A factor is simply a linear combination of the original variables. The factor scores for the ith factor may be estimated as follows:
$F_i = W_{i1}X_1 + W_{i2}X_2 + W_{i3}X_3 + \dots + W_{ik}X_k$
The weights or factor score coefficients are obtained from the factor score coefficient matrix. Only in principal component analysis is it possible to compute exact factor scores, and these scores are uncorrelated. In common factor analysis, estimates of these scores are obtained, and there is no guarantee that the factors will be uncorrelated with each other. The factor scores can be used instead of the original variables in the subsequent multivariate analysis.
8. Select the surrogate variables
Instead of computing factor scores, the researcher may select surrogate or substitute variables. Selection of surrogate variables involves identifying some of the original variables for use in the subsequent analysis. This allows the researcher to conduct further analysis and interpret the results in terms of the original variables rather than the factor scores. The variables can be selected by examining the factor matrix and selecting for each factor the variable with the highest loading on that factor. These variables can then be used as surrogate variables for the associated factors. This process works well if one factor loading for a variable is clearly higher than all other factor loadings. However, if two or more variables have similar loadings the choice will be difficult. In such cases the choice of variables should be made on the basis of theoretical and measurement considerations. For example, theory may suggest that a variable with a slightly lower loading is more important than one with a slightly higher loading. Likewise, if a variable with a slightly lower loading has been measured more precisely, it should be selected as the surrogate variable.
9. Determine the model fit
Determining the fit of the model is the final step in factor analysis. A basic assumption underlying factor analysis is that the observed correlation between variables can be attributed to common factors. The correlations between variables can be reproduced from the estimated correlations between the variables and the factors. The differences between the observed correlations, as given in the input correlation matrix, and the reproduced correlations, as estimated from the factor matrix, can be examined to determine the fit of the model. These differences are called residuals. If there are many large residuals, the factor model does not provide a good fit to the data and the model should be reconsidered.
R-type and Q-type factor analyses
Factor analysis may be R-type or Q-type. In R-type factor analysis, high correlations occur when respondents who score high on variable 1 also score high on variable 2 and respondents who score low on variable 1 also score low on variable 2. Factors emerge when there are high correlations within groups of variables. In Q-type factor analysis, the correlations are computed between pairs of respondents instead of pairs of variables. High correlations occur when respondent A's pattern of responses on all the variables is much like respondent B's pattern of responses. Factors emerge when there are high correlations within groups of people. Q-type analysis is useful when the object is to sort people into groups based on their simultaneous responses to all the variables.
4.4.3 Uses and limitations of factor analysis
The benefits of using factor analysis are dealt with below:
i. Interdependency and pattern delineation. If a researcher has a table of data concerning attitude, lifestyle, personality characteristics, or answers to a questionnaire, and suspects that these data are interrelated in a complex fashion, then factor analysis may be used to untangle the linear relationships into their separate patterns. Each pattern will appear as a factor delineating a distinct cluster of interrelated data.
ii. Parsimony or data reduction. Factor analysis can be useful for reducing a mass of information to an economical description. For example, data on fifty characteristics for 500 respondents are unwieldy to handle, descriptively or analytically. The management, analysis and clear understanding of such data are facilitated by reducing them to their common factor patterns. These factors concentrate and index the dispersed information in the original data and can therefore replace the fifty characteristics without much loss of information.
iii. Structure. Factor analysis may be employed to discover the basic structure of a domain.
Data collected on a large sample of groups and factor analyzed can help disclose this structure. iv. Classification or description. Factor analysis is a tool for developing an empirical typology. It can be used to classify respondents profiles into types with similar characteristics or behavior. v. Scaling. A researcher often wishes to develop a scale on which individuals, groups, or nations can be rated and compared. A problem in developing a scale is to weight the characteristics being combined. Factor analysis offers a solution by dividing the characteristics into independent sources of variation (factors). Each factor then represents a scale based on the empirical relationships among the characteristics. As additional findings, the factor analysis will give the weights to employ for each characteristic when combining them into the scales. The factor score results are actually such scales. vi. Hypothesis testing. Hypotheses are generally framed regarding dimensions of attitude, personality, group, social behavior, and conflict. Factor analysis may be used to test for their empirical existence. Which characteristics or behavior should, by theory, be related to which dimensions can be postulated in advance and statistical tests of significance can be applied to the factor analysis results. vii. Data transformation. Factor analysis can be used to transform data to meet the assumptions of other techniques. For example, application of the multiple regression technique assumes independent variables are statistically unrelated . If the predictor / independent variables are correlated in violation of the assumption, factor analysis can be employed to reduce them to a smaller set of uncorrelated factor scores. The scores may be used in the regression analysis in place of the original variables, with the knowledge that the meaningful variation in the original data has not been lost. 
Likewise, a large number of dependent variables can also be reduced through factor analysis.
viii. Exploration. An unknown domain may be explored through factor analysis. It can reduce complex interrelationships to a relatively simple linear expression and it can uncover unsuspected, startling relationships. Unlike researchers in the pure sciences, social science researchers are usually unable to manipulate variables in a laboratory and must deal with the manifold complexity of behaviors in their social setting. Factor analysis thus fulfills some functions of the laboratory and enables the researcher to untangle
interrelationships, to separate different sources of variation, and to partial out or control for undesirable influences on the variables of concern.
ix. Mapping. Besides facilitating exploration, factor analysis also enables a researcher to map the social terrain by a systematic attempt to chart major empirical concepts and sources of variation. These concepts may then be used to describe a domain or to serve as inputs to further research.
x. Theory. The analytic framework of social theories or models can be built from the geometric or algebraic structure of factor analysis.
Application of factor analysis: Some areas in which factor analysis can be used are:
i. It can be used in market segmentation for identifying the underlying variables on which to group the customers.
ii. Factor analysis can be used in product research to determine the brand attributes that influence the consumer's choice.
iii. In advertising studies, factor analysis can be used to understand the media consumption habits of consumers.
iv. In pricing studies, it can be used to identify the characteristics of price-sensitive consumers.
Limitations:
i. Factor analysis involves laborious computations at a heavy cost. With computers and statistical packages factor analysis has become relatively faster and easier; however, large factor analyses are still bound to be quite expensive.
ii. The results of a single factor analysis are generally considered less reliable and dependable, as factor analysis mostly starts with a set of imperfect data. The factor analysis should be done at least twice. If similar results are obtained, confidence in the results will increase.
iii. Factor analysis is a complicated decision tool that can be used only by someone with thorough knowledge of and enough experience in handling it.
4.4.4 Application of statistical package: Factor analysis
A marketing concern would like to predict the sales of cars from a set of variables.
However, many of the variables are correlated, and this might lead to a wrong prediction. The variables are vehicle type, price, engine size, fuel capacity, fuel efficiency, wheel base, horsepower, width and length. Factor analysis with principal component extraction can be used to identify a manageable subset of predictors. The steps to be followed in performing the factor analysis, and the interpretation of its output, are discussed below.
From the Data Editor window:
Click on Analyze
Click on Data Reduction
Click on Factor...
The following Factor Analysis dialog box will appear.
Select the variables you want to enter into the factor analysis by double clicking on them, or use the shift or control keys to select them and click the right arrow key to move the selected variables to the Variables list on the right. Click Extraction Extracting factors and factor rotation: There is no hard and fast rule to determine the number of factors. A commonly used convention is to use the number of factors with eigen values greater than 1. The statistical package will select this number by default. The scree plot may also be used to determine the number of factors
Click on Rotation. The Factor Analysis: Rotation dialog box appears.
Under Method, select Varimax by clicking on it.
Under Display, select Rotated solution by clicking on it.
Click on Continue, which brings you back to the Factor Analysis dialog box.
Click on OK to run the analysis.
Interpretation of the output The output obtained through performing the steps discussed is detailed in the following pages. Communalities Communalities indicate the amount of variance in each variable that is accounted for. Initial communalities are estimates of the variance in each variable accounted for by all components or factors. For principal components extraction, this is always equal to 1.0 for correlation analyses. The communalities in this table are all high, which indicates that the extracted components represent the variables well. Total variance explained The variance explained by the initial solution, extracted components, and rotated components is displayed. This first section of the table shows the Initial Eigenvalues. The Total column gives the eigenvalue, or amount of variance in the original variables accounted for by each component. The % of Variance column gives the ratio of the variance accounted for by each component to the total variance in all of the variables. The Cumulative % column gives the percentage of variance accounted for by the first n components. For example, the cumulative percentage for the second component is the sum of the percentage of variance for the first and second components. For the initial solution, there are as many components as variables.
The second section of the table shows the extracted components. They explain nearly 88% of the variability in the original ten variables, so the complexity of the data set can be considerably reduced by using these components, with only a 12% loss of information.
The rotation maintains the cumulative percentage of variation explained by the extracted components, but that variation is now spread more evenly over the components. The large changes in the individual total suggest that the rotated component matrix will be easier to interpret than the unrotated matrix.
The scree plot enables to determine the optimal number of components. The eigen value of each component in the initial solution is plotted. Generally, the components on the steep slope are extracted. The components on the shallow slope contribute little to the solution. The last big drop occurs between the third and fourth components, so the first three components are selected.
The rotated component matrix helps to determine what each component represents. The first component is most highly correlated with Price in thousands and Horsepower. Price in thousands is a better representative, however, because it is less correlated with the other two components. The second component is most highly correlated with Length. The third component is most highly correlated with Vehicle type. This suggests that further analyses can focus on Price in thousands, Length, and Vehicle type.
4.5 Cluster analysis
Like factor analysis, cluster analysis examines an entire set of interdependent relationships. Cluster analysis, also called classification analysis or numerical taxonomy, is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. The classification is based on the set of variables considered: objects in each cluster tend to be similar to each other in terms of these variables and dissimilar to objects in other clusters. Cluster analysis does not distinguish between dependent and independent variables. The interdependent relationships between the whole set of variables are examined. The following illustration shows an ideal clustering situation in which the clusters are distinctly separated on two variables. Each object is assigned to only one cluster and there are no overlapping areas.
The following illustration provides a clustering situation that is practically encountered. The boundaries for the clusters are not clear cut and the classification of consumers is not obvious, as many of them could be grouped into one cluster or another.
Cluster and discriminant analysis are both concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership of each object included in order to develop the classification rule. In cluster analysis there is no a priori information about the group or cluster membership of any of the objects. Groups are suggested by the data, not defined a priori.
4.5.1 Statistics and terms associated with cluster analysis
Clustering methods are based on relatively simple procedures that are not supported by an extensive body of statistical reasoning. The methods are based on algorithms and hence differ from factor analysis, discriminant analysis, regression and ANOVA, which are based on extensive statistical reasoning. The terms associated with cluster analysis are discussed below:
i. Agglomeration schedule: The schedule gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
ii. Cluster centroid: The cluster centroid is the mean values of the variables for all the objects in a particular cluster.
iii. Cluster centers: The cluster centers are the initial starting points in nonhierarchical clustering. Clusters are built around these centers or seeds.
iv. Cluster membership: Cluster membership indicates the cluster to which each object or case belongs.
v. Dendrogram: A dendrogram, or tree graph, is a graphical device for displaying clustering results. Vertical lines represent clusters that are joined together. The position of the line on the scale indicates the distance at which the clusters were joined. The dendrogram is read from left to right.
vi. Distance between cluster centers: The distance indicates how separated the individual pairs of clusters are. Clusters that are widely separated are distinct and desirable.
vii. Icicle diagram: An icicle diagram is a graphical display of clustering results. It resembles a row of icicles hanging from the roof of a house. The columns correspond to the objects being clustered and the rows correspond to the number of clusters.
viii. Similarity/distance coefficient matrix: It is a matrix containing pairwise distances between objects or cases.
4.5.2 Steps in conducting cluster analysis
The first step in cluster analysis is to formulate the clustering problem by defining the variables on which the clustering will be based. Then an appropriate distance measure must be selected. The distance measure determines how similar or dissimilar the objects being clustered are. Several clustering procedures are available, from which the researcher should select the one suitable to the problem. The researcher should decide on the number of clusters, and the derived clusters should be interpreted in terms of the variables used to cluster them. Finally the validity of the clustering process should be assessed. The steps in the clustering procedure are explained below:
1. Formulate the problem
The most important aspect of formulating the problem is selecting the variables on the basis of which the clusters are to be formed. Including irrelevant variables will affect the clustering solution. The variables selected should describe the similarity between objects in terms relevant to the problem. The variables should be selected based on past research, theory or the hypotheses being tested. In exploratory research, the researcher should act on judgment and intuition.
2. Select a distance measure
The objective of clustering is to group similar objects together. For this purpose some measure should be adopted to assess how similar or different the objects are. The most common approach is to measure similarity in terms of the distance between pairs of objects. Objects with smaller distances between them are more similar to each other than those at larger distances. The following are some of the methods available to measure the distance between objects:
i. The Euclidean distance is the most commonly used measure. It is the square root of the sum of the squared differences in values for each variable.
ii. The city-block or Manhattan distance measures the distance between two objects as the sum of the absolute differences in values for each variable.
iii. The Chebychev distance between two objects is the maximum absolute difference in values for any variable.
The variables involved in the study may be measured in different units, for example on a Likert scale, as frequencies or as percentages. In such cases, before clustering the respondents, the data must be standardized by rescaling each variable to have a mean of zero and a standard deviation of unity. Outliers, or cases with nonconforming values, should also be eliminated.
3. Select a clustering procedure
Clustering procedures may be broadly categorized as hierarchical or nonhierarchical. Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be of two types, viz. divisive or agglomerative. Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided until each object is in a separate cluster. Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster. Agglomerative clustering consists of (i) linkage methods, (ii) variance methods and (iii) centroid methods.
(i). Linkage methods
Linkage methods include single linkage, complete linkage and average linkage. The single linkage method is based on the minimum distance or nearest neighbour rule. The first two objects clustered are those that have the smallest distance between them. The next shortest distance is identified, and either the third object is clustered with the first two, or a new two-object cluster is formed.
At every stage the distance between two clusters is the distance between their two closest points as illustrated below;
The single linkage method does not work well when the clusters are poorly defined. The complete linkage method is similar to single linkage except that it is based on the maximum distance or furthest neighbour approach. The distance between two clusters is calculated as the distance between their two furthest points. In the average linkage method the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters. This method uses information on all pairs of distances, not merely the minimum or maximum distances. Hence it is generally preferred to the single and complete linkage methods.
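The distance measures from step 2 and the three linkage rules described above can be illustrated with a short Python sketch. The function names and the sample points are hypothetical and are not taken from any statistical package; the standardization uses the sample standard deviation, which is an assumption about how the rescaling is done.

```python
import math
import statistics

def euclidean(a, b):
    # Square root of the sum of squared differences on each variable
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute differences on each variable
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Maximum absolute difference on any variable
    return max(abs(x - y) for x, y in zip(a, b))

def standardize(rows):
    # Rescale every variable (column) to mean 0 and standard deviation 1.
    # Assumption: the sample standard deviation (n - 1) is used.
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def linkage_distance(cluster_a, cluster_b, method="single", dist=euclidean):
    # Distances between all pairs with one member from each cluster
    pairs = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # nearest neighbour
        return min(pairs)
    if method == "complete":    # furthest neighbour
        return max(pairs)
    if method == "average":     # mean of all pairwise distances
        return sum(pairs) / len(pairs)
    raise ValueError("unknown linkage method: " + method)

# Hypothetical example data
print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7
print(chebyshev((0, 0), (3, 4)))   # 4

c1 = [(0, 0), (0, 1)]
c2 = [(3, 4), (6, 8)]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(c1, c2, method))
```

For the same pair of clusters, the single linkage distance is never larger than the average linkage distance, which in turn is never larger than the complete linkage distance.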
(ii). Variance methods The variance method attempts to minimize the within-cluster variance. Ward's procedure is a commonly used variance method. For each cluster, the means of all the variables are computed. Then, for each object, the squared Euclidean distance to the cluster means is calculated. These distances are summed over all the objects. At each stage, the two clusters whose merger produces the smallest increase in the overall within-cluster sum of squared distances are combined. This is illustrated below;
(iii). Centroid methods In the centroid methods, the distance between two clusters is the distance between their centroids, i.e. the means of all the variables. Every time objects are grouped, a new centroid is computed. The average linkage method and Ward's method perform better than the other procedures.
Nonhierarchical clustering The nonhierarchical clustering method is also known as k-means clustering. This method includes sequential threshold, parallel threshold and optimizing partitioning.
i. In the sequential threshold method, a cluster center is selected and all objects within a prespecified threshold value from the center are grouped together. Next a new cluster center or seed is selected and the process is repeated for the unclustered points. Once an object is clustered with a seed, it is no longer considered for clustering with subsequent seeds.
ii. The parallel threshold method operates similarly, except that several cluster centers are selected simultaneously and objects within the threshold are grouped with the nearest center.
iii. The optimizing partitioning method differs from the two threshold methods in that objects can later be reassigned to clusters to optimize an overall criterion, such as the average within-cluster distance for a given number of clusters.
Nonhierarchical clustering is faster than the hierarchical methods and is preferable when the number of objects or observations is large. The major drawback of nonhierarchical procedures is that the number of clusters must be prespecified and the selection of cluster centers is arbitrary. The clustering results depend on how the centers are selected. The hierarchical and nonhierarchical methods can be used together.
An initial clustering solution can be obtained using a hierarchical procedure, and the number of clusters and the cluster centroids so obtained are used as inputs to the optimizing partitioning method.
4. Decide on the number of clusters
Some guidelines for deciding on the number of clusters are:
Theoretical, conceptual or practical considerations may suggest the number of clusters.
In hierarchical clustering, the distances at which clusters are combined can be used as a criterion. This information can be obtained from the agglomeration schedule or from the dendrogram.
In nonhierarchical clustering, the ratio of total within-group variance to between-group variance can be plotted against the number of clusters. The point at which a sharp bend occurs indicates an appropriate number of clusters. Increasing the number of clusters beyond this point will not be useful.
The relative sizes of the clusters should be meaningful, with each cluster containing several elements. A cluster with only one element is not useful.
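The optimizing partitioning idea, together with the within-group to between-group ratio used to choose the number of clusters, can be sketched as follows. This is a minimal k-means style illustration in plain Python, not the routine of any particular package; the data points and the starting centers are hypothetical, and the starting centers stand in for centroids that might come from an initial hierarchical solution.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    # Centroid: mean of each variable over the points
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def assign(points, centers):
    # Put every point in the cluster of its nearest center
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, centers, max_iter=100):
    # Reassign points and recompute centroids until the centers stop moving
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        new_centers = [mean_point(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def within_ss(centers, clusters):
    # Sum of squared distances of the points to their own cluster centroid
    return sum(sq_dist(p, c) for c, pts in zip(centers, clusters) for p in pts)

def within_to_between(points, centers, clusters):
    # Ratio of within-group to between-group sum of squares
    total = sum(sq_dist(p, mean_point(points)) for p in points)
    within = within_ss(centers, clusters)
    return within / (total - within)

# Hypothetical data: two visibly separate groups of three objects
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(points, [(0, 0), (10, 10)])
print(centers)
print(within_ss(centers, clusters))
print(within_to_between(points, centers, clusters))
```

Running the same procedure for several values of k and plotting the ratio against k gives the curve in which the sharp bend is sought. Because the result depends on the starting centers, different starting values may give different solutions.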
5. Interpret and Profile the clusters
Interpreting and profiling clusters involves examining the cluster centroids. The centroids represent the mean values of the objects contained in the cluster on each of the variables. The centroids enable us to describe each cluster by assigning it a name or label. It is often helpful to profile the clusters in terms of variables that were not used for clustering. Demographic, psychographic, product usage, media usage or other variables can be used for profiling. The variables that significantly differentiate between clusters can be identified via discriminant analysis and one-way analysis of variance.
6. Assess Reliability and Validity
Several decisions are made on the basis of cluster analysis, hence clustering solutions should not be accepted without assessing their reliability and validity. The following procedures provide adequate checks on the quality of clustering results.
Perform cluster analysis on the same data using different distance measures. Compare the results across measures to determine the stability of the solutions.
Use different methods of clustering and compare the results.
Split the data randomly into halves, perform clustering separately on each half and compare the cluster centroids across the two subsamples.
Delete variables randomly. Perform clustering based on the reduced set of variables. Compare the results with those obtained by clustering based on the entire set of variables.
In nonhierarchical clustering, the solution may depend on the order of cases in the data set. Multiple runs using different orders of cases can be performed until the solution stabilizes.
4.5.3 Uses of Cluster analysis: Some practical areas where cluster analysis can be used are explained below;
i. Segmenting the market: Consumers may be clustered on the basis of the benefits sought from the purchase of a product. Each cluster would consist of consumers who are relatively homogeneous in terms of the benefits they seek. This is called benefit segmentation.
ii. Understanding buyer behaviour: Cluster analysis can be used to identify homogeneous groups of buyers. The buying behaviour of each group may then be examined separately.
iii. Identifying new product opportunities: Clustering brands and products makes it possible to identify the competitive sets within the market. Brands within the same cluster compete more fiercely with each other than with brands in other clusters. A firm can examine its current offerings compared to those of its competitors to identify potential new product opportunities.
iv. Selecting test markets: Clustering geographical areas makes it possible to select comparable cities in which to test various marketing strategies.
v. Reducing data: Cluster analysis can be used as a data reduction tool to develop clusters or subgroups of data that are more manageable than individual observations.
4.5.4 Application of Statistical package : Cluster analysis A car manufacturing concern would like to ascertain the current market for its vehicles. For this it needs to group cars based on the information available regarding various models of vehicles. Information regarding the vehicle type, price, engine size, fuel capacity, fuel efficiency, wheel base, horsepower, width and length is available. The segmentation can be performed using the Hierarchical Cluster Analysis procedure. The steps are discussed below;
To perform cluster analysis from the menus choose: Analyze Classify Hierarchical Cluster...
Select the variables on basis of which clusters are to be formed. Also select the case labeling variable.
Click Plots. Select Dendrogram. Select None in the Icicle group. Click Continue.
Click Method in the Hierarchical Cluster Analysis dialog box. Select Nearest neighbor as the cluster method. Select Z scores as the standardization in the Transform Values group. Click Continue. Click OK in the Hierarchical Cluster Analysis dialog box.
Interpretation of the output The output of cluster analysis is discussed below: The dendrogram is a graphical summary of the cluster solution. Cases are listed along the left vertical axis. The horizontal axis shows the distance between clusters when they are joined. Parsing the classification tree to determine the number of clusters is a subjective process. Generally, one looks for "gaps" between joinings along the horizontal axis. Starting from the right, there is a gap between 20 and 25, which splits the automobiles into two clusters. There is another gap from approximately 4 to 15, which suggests 6 clusters.
The agglomeration schedule The agglomeration schedule is a numerical summary of the cluster solution.
At the first stage, cases 8 and 11 are combined because they have the smallest distance. The cluster created by their joining next appears in stage 7. In stage 7, the clusters created in stages 1 and 3 are joined. The resulting cluster next appears in stage 8. When there are many cases, the table becomes rather long, but it may be easier to scan the coefficients column for large gaps than to scan the dendrogram. A good cluster solution shows a sudden jump (gap) in the distance coefficient. The solution just before the gap is the one to choose. The largest gaps in the coefficients column occur between stages 5 and 6, indicating a 6-cluster solution, and between stages 9 and 10, indicating a 2-cluster solution. These are the same as the findings from the dendrogram.
4.6. Discriminant analysis Discriminant analysis is a dependence multivariate technique. The purpose of a dependence technique is to predict a variable from a set of independent variables. Discriminant analysis is used for predicting group membership on the basis of two or more independent variables. It is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. For example, the dependent variable may be the choice of a brand and the independent variables may be the ratings of attributes of soft drinks on a 5-point Likert scale. The objectives of discriminant analysis are as follows;
1. Development of discriminant functions which will best discriminate between the categories of the dependent variable. A discriminant function is the linear combination of the predictor or independent variables that will best discriminate between the categories of the dependent variable.
2. Examination of whether significant differences exist among the groups in terms of the predictor variables.
3. Determination of the predictor variables which contribute most to the intergroup differences.
4. Classification of cases into one of the groups based on the values of the predictor variables.
5. Evaluation of the accuracy of classification.
The discriminant analysis techniques are described by the number of categories possessed by the dependent or criterion variable. When the criterion variable has two categories, the technique is known as two-group discriminant analysis. When three or more categories are involved, the technique is referred to as multiple discriminant analysis. In two-group discriminant analysis it is possible to derive only one discriminant function. In multiple discriminant analysis, more than one function may be computed. The discriminant analysis model involves a linear combination of the following form:
D = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + ... + b_k X_k
where
D = discriminant score
b_n = discriminant coefficients or weights
X_n = predictors or independent variables
The coefficients or weights (b) are estimated so that the groups differ as much as possible on the values of the discriminant function. This happens when the ratio of the between-group sum of squares to the within-group sum of squares for the discriminant scores is at its maximum. Any other linear combination will result in a smaller ratio. Several statistics are associated with discriminant analysis; these are dealt with below:
4.6.1 Statistics associated with discriminant analysis
The important statistics associated with discriminant analysis are;
Centroid: The centroid is the mean value of the discriminant scores for a particular group. The means for a group on all functions are the group centroids.
Classification matrix: This is also called the confusion matrix or prediction matrix. It contains the numbers of correctly classified and misclassified cases. The correctly classified cases appear on the diagonal because the predicted and actual groups are the same. The off-diagonal elements represent cases that have been incorrectly classified. The sum of the diagonal elements divided by the total number of cases is the hit ratio.
Discriminant function coefficients: The unstandardized discriminant function coefficients are the multipliers of the variables when the variables are in their original units of measurement.
Discriminant scores: The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.
Eigenvalue: For each discriminant function, the eigenvalue is the ratio of the between-group to the within-group sum of squares. Large eigenvalues imply superior functions.
F values and their significance: These are calculated from a one-way ANOVA, with the grouping variable serving as the categorical independent variable. Each predictor serves as the metric dependent variable in the ANOVA.
Group means and group standard deviations: These are computed for each predictor for each group.
Pooled within-group correlation matrix: The pooled within-group correlation matrix is computed by averaging the separate covariance matrices for all the groups.
Structure correlations: Also referred to as discriminant loadings, the structure correlations represent the simple correlations between the predictors and the discriminant function.
Total correlation matrix: If the cases are treated as if they were from a single sample and the correlations are computed, a total correlation matrix is obtained.
Wilks' λ: Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of the within-group sum of squares to the total sum of squares. Its value ranges between 0 and 1. Large values of λ (near 1) indicate that the group means do not differ. Small values of λ (near 0) indicate that the group means differ.
4.6.2 Steps in conducting Two group Discriminant analysis The steps in conducting two-group discriminant analysis are discussed below:
1. Formulate the problem: The first step in discriminant analysis is to formulate the problem by identifying the objectives, the criterion (dependent) variable and the predictor (independent) variables. The criterion variable must consist of two or more mutually exclusive and collectively exhaustive categories. When the dependent variable is interval or ratio scaled, it must first be converted into categories. The predictor variables should be selected on the basis of a theoretical model or previous research or, in the case of exploratory research, the experience of the researcher should guide the selection.
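Three of the quantities defined in section 4.6.1 (the discriminant score, the hit ratio and Wilks' λ for a single predictor) can be illustrated with a few lines of Python. The coefficients, the classification matrix and the group values below are hypothetical numbers chosen only to show the arithmetic; they do not come from any data set discussed in this unit.

```python
def discriminant_score(b0, weights, values):
    # D = b_0 + b_1 X_1 + ... + b_k X_k with unstandardized coefficients
    return b0 + sum(b * x for b, x in zip(weights, values))

def hit_ratio(matrix):
    # Rows are actual groups, columns are predicted groups.
    # Correctly classified cases lie on the diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def wilks_lambda(groups):
    # Within-group sum of squares divided by total sum of squares
    # for one predictor; `groups` holds one list of values per group.
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    within_ss = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return within_ss / total_ss

# Hypothetical constant b_0 = -1 and weights for two predictors
print(discriminant_score(-1.0, [0.5, 0.25], [2, 4]))   # 1.0

# Hypothetical two-group classification matrix
print(hit_ratio([[40, 10], [5, 45]]))                  # 0.85

# Well separated group means give a value near 0 ...
print(wilks_lambda([[1, 2, 3], [7, 8, 9]]))
# ... and identical group means give exactly 1
print(wilks_lambda([[1, 2, 3], [1, 2, 3]]))            # 1.0
```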
2. Research design issues Research design for discriminant analysis requires consideration of the following issues: (1) selection of both dependent and independent variables, (2) deciding the sample size needed for estimation of the discriminant function and (3) division of the sample for validation purposes.
(i) Selection of dependent and independent variables To apply discriminant analysis the researcher should specify the dependent and the independent variables. The dependent variable should be categorical and the independent variables metric. The number of categories of the dependent variable can be two or more, but these groups must be mutually exclusive and exhaustive. Each observation should be such that it can be placed into only one group. The dependent variable in some cases may involve two groups, e.g. purchasers and non-purchasers. In other cases it may involve several groups, such as heavy users, medium users, light users and non-users of a product. After deciding on the dependent variable, the researcher must decide which independent variables to include in the analysis. Independent variables can be selected in two ways. The first approach is to identify the variables from previous research or from the theoretical model that underlies the research question. The second approach is intuition, i.e. utilizing the researcher's knowledge and intuitively selecting variables for which previous research is not available.
(ii) Sample size The ratio of sample size to the number of predictor variables should be considered in discriminant analysis. Many studies suggest a ratio of 20 observations for each predictor variable. If an adequate sample is not maintained, the results become unstable. The minimum size recommended is five observations per independent variable. The ratio applies to all variables considered in the analysis, even if not all of them are entered into the discriminant function. In addition to the overall sample size, the researcher must also consider the sample size of each group. The smallest group size must exceed the number of independent variables. The practical guideline is that each group should have at least 20 observations.
(iii) Division of sample The sample should be divided into two groups, called the estimation or analysis sample and the holdout or validation sample. The analysis sample is used to estimate the discriminant function, and the holdout or validation sample is reserved for validating it. It is essential that each subsample be of adequate size to support conclusions from the results. If the sample is large enough, it can be split in half. One half serves as the analysis sample and the other is used for validation. This method of validating the function is referred to as the split-sample or cross-validation approach. The roles of the halves are then interchanged and the analysis is repeated. This is called double cross-validation. The distribution of cases in the analysis and validation samples should follow the distribution in the total sample. For example, if the total sample contains 60 percent users and 40 percent non-users of the product, then the analysis and validation samples should each contain 60 percent users and 40 percent non-users.
3. Assumptions
The key assumptions in deriving the discriminant function are multivariate normality of the independent variables and unknown (but equal) dispersion and covariance structures for the groups as defined by the dependent variable. As in the case of all multivariate techniques, the implicit assumption that all relationships are linear applies to discriminant analysis also. The researcher should examine the data and, if the assumptions are violated, identify the alternative methods available and the impact on the results that can be expected. Data not meeting the multivariate normality assumption can cause problems in the estimation of the discriminant function.
4. Estimating the discriminant function To derive the discriminant functions, the researcher must decide on the method of estimation and then determine the number of functions to be retained. After the function has been estimated, the overall model fit can be assessed in several ways.
Methods to derive discriminant function Two computational methods are used to derive the discriminant function, viz., the simultaneous or direct method and the stepwise method. The direct method involves estimating the discriminant function so that all the predictors are included simultaneously. In this case each independent variable is included regardless of its discriminating power. This method is appropriate when the researcher wants to include all the independent variables for theoretical reasons and is not interested in viewing intermediate results based only on the most discriminating variables. In stepwise discriminant analysis the independent variables are entered one at a time, based on their ability to discriminate among groups. The stepwise method is useful when the researcher wants to consider a relatively large number of independent variables for inclusion in the function.
Statistical significance The researcher must assess the level of significance of the discriminant function computed.
It would not be meaningful to interpret the analysis if the discriminant functions estimated were not statistically significant. Significance can be tested on the basis of a number of statistical criteria, viz., Wilks' lambda, Hotelling's trace and Pillai's criterion. A significance level of .05 or stricter is conventionally used. If a higher risk of including nonsignificant results is acceptable, a significance level of .2 or .3 may be fixed. If the number of groups is three or more, the researcher must decide not only whether the discrimination between groups is significant overall but also whether each of the estimated discriminant functions is statistically significant.
Assessing Overall Fit Assessing the overall fit of the selected discriminant function involves three tasks: calculating discriminant Z scores for each observation, evaluating group differences on the discriminant Z scores and assessing the accuracy of group membership prediction.
5. Interpretation of discriminant functions Interpretation involves examining the discriminant functions to determine the relative importance of each independent variable in discriminating between the groups. Three methods are available to assess the importance of each variable in the discriminant function.
i. The sign and magnitude of the standardized discriminant weight or discriminant coefficient assigned to each variable are taken into consideration. A small weight may indicate that the corresponding variable is irrelevant in determining the relationship.
ii. Discriminant loadings, also referred to as structure correlations, measure the simple linear correlation between each independent variable and the discriminant function. A variable is associated with the function on which it has the highest loading.
iii. If the stepwise method is selected in deriving the discriminant functions, an additional means of interpreting the relative discriminating power of the independent variables is available through partial F values. The absolute sizes of the significant F values are examined and ranked. Large F values indicate greater discriminatory power.
6. Validation of the discrimination results The final stage in discriminant analysis involves validating the discriminant results to provide assurance that the results have external as well as internal validity. The most frequently used procedure to validate the discriminant function is to divide the groups randomly into analysis and holdout samples. This involves developing a discriminant function with the analysis sample and applying it to the holdout sample. Instead of randomly dividing the total sample into analysis and holdout samples once, the total sample can be randomly divided into analysis and holdout samples several times, each time testing the validity of the function through the development of a classification matrix and a hit ratio. Either of these two approaches can be used only when the smallest group size is at least three times the number of predictor variables.
4.6.3 Uses of Discriminant analysis:
A medical researcher may record different variables relating to patients' backgrounds in order to learn which variables best predict whether a patient is likely to recover completely (group 1), partially (group 2), or not at all (group 3).
A biologist could record different characteristics of similar types (groups) of flowers, and then perform a discriminant function analysis to determine the set of characteristics that allows for the best discrimination between the types.
Discriminant analysis can help to distinguish between heavy, medium and light users of a product in terms of consumption habits and lifestyles.
It makes image research possible, i.e. it distinguishes between customers who exhibit favorable perceptions of a store and those who do not.
It assists in identifying how market segments differ in media consumption habits.
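The division of the sample into analysis and holdout halves described in section 4.6.2, with each half keeping the group proportions of the total sample, can be sketched in Python as follows. The function name, the case records and the fixed random seed are illustrative assumptions.

```python
import random

def stratified_split(cases, group_of, holdout_fraction=0.5, seed=0):
    # Split cases into analysis and holdout samples so that each sample
    # keeps (as nearly as rounding allows) the group proportions of the
    # total sample. `group_of` returns the group label of a case.
    rng = random.Random(seed)          # fixed seed: reproducible split
    by_group = {}
    for case in cases:
        by_group.setdefault(group_of(case), []).append(case)
    analysis, holdout = [], []
    for members in by_group.values():
        members = members[:]           # do not reorder the caller's list
        rng.shuffle(members)
        cut = round(len(members) * holdout_fraction)
        holdout.extend(members[:cut])
        analysis.extend(members[cut:])
    return analysis, holdout

# Hypothetical sample: 60 users and 40 non-users of a product
sample = [("user", i) for i in range(60)] + [("non-user", i) for i in range(40)]
analysis, holdout = stratified_split(sample, group_of=lambda case: case[0])

for name, part in (("analysis", analysis), ("holdout", holdout)):
    users = sum(1 for c in part if c[0] == "user")
    print(name, len(part), users / len(part))   # each half: 50 cases, 0.6 users
```

For double cross-validation the roles of the two returned samples are simply interchanged. Repeating the split with different seeds corresponds to dividing the total sample several times.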
4.6.4 Application of Statistical package : Discriminant analysis Using cluster analysis, a telephone company has categorized its customers into four groups, viz., Basic service, E-service, Plus service and Total service. The concern wants to predict group membership so as to customize offers for individual prospective customers. The prediction should be based on demographic data, viz., gender, age, marital status, income, education, number of years at current address, years with current employer, retired and number of people in the family. The Discriminant Analysis procedure can be used to classify customers. The steps are discussed below; To run the discriminant analysis, from the menus choose:
Analyze > Classify > Discriminant...
Select the grouping variable. Click Define Range. Enter the Minimum. Enter the Maximum. Click Continue. Click Classify in the Discriminant Analysis dialog box. Select Summary table and Territorial map. Click Continue. Click OK in the Discriminant Analysis dialog box. These selections produce a discriminant model using the stepwise method of variable selection.
Interpretation of the Output The discriminant model produced using the stepwise method of variable selection is discussed below;
Variables Not in Analysis
When there are lots of predictors, the stepwise method can be useful by automatically selecting the "best" variables to use in the model. The stepwise method starts with a model that doesn't include any of the predictors. At each step, the predictor with the largest F to Enter value that exceeds the entry criteria (by default, 3.84) is added to the model. The variables left out of the analysis have F to Enter values smaller than 3.84, so no more are added. The following table displays statistics for the variables that are in the analysis at each step.
Variables in Analysis
Tolerance is the proportion of a variable's variance not accounted for by the other independent variables in the equation. A variable with very low tolerance contributes little information to a model and can cause computational problems. F to Remove values are useful for describing what happens if a variable is removed from the current model (given that the other variables remain). F to Remove for the entering variable is the same as F to Enter at the previous step (shown in the Variables Not in the Analysis table). From the eigenvalues table in the summary of canonical functions it can be seen that nearly all of the variance explained by the model is due to the first two discriminant functions.
Eigenvalues
Three functions are fit automatically, but due to its minuscule eigenvalue, the third function can be ignored. Wilks' lambda shows that only the first two functions are useful.
Wilks' Lambda
Structure Matrix The structure matrix makes it possible to identify the significant variables within each function.
When there is more than one discriminant function, an asterisk (*) marks each variable's largest absolute correlation with one of the canonical functions. Within each function, these marked variables are then ordered by the size of the correlation. Level of education is most strongly correlated with the first function, and it is the only variable most strongly correlated with this function. Years with current employer, Age in years, Household income in thousands, Years at current address, Retired, and Gender are most strongly correlated with the second function, although Gender and Retired are more weakly correlated than the others. The other variables mark this function as a "stability" function. Number of people in household and Marital status are most strongly correlated with the third discriminant function, but since this function is of no practical use, these predictors are of little use as well.
The territorial map The territorial map helps to study the relationships between the groups and the discriminant functions. Combined with the structure matrix results, it gives a graphical interpretation of the relationship between predictors and groups.
The territorial map offers a comprehensive view of the discriminant model. The first function, shown on the horizontal axis, separates group 4 from the others. Since Level of education is strongly positively correlated with the first function, this suggests that group 4 customers are, in general, the most highly educated. The second function separates groups 1 and 3. Since the third function was found to be rather insignificant, only the first two discriminant functions are plotted. From Wilks' lambda, it can be understood that the model is doing better than guessing, but the classification results should be considered to determine how much better the model is.
Classification Matrix
From the observed data in the above table it can be seen that the "null" model (that is, one without predictors) would classify all customers into the modal group, Plus service. Thus, the null model would be correct for 281 customers out of 1000, i.e. 281/1000 = 28.1% of the time. The discriminant model classifies 39.5% of the customers correctly, i.e. 11.4 percentage points more. In particular, the model excels at identifying Total service customers. However, it does an exceptionally poor job of classifying E-service customers.
4.7 Multiple Regression & Correlation Multiple regression is a multivariate statistical technique used to examine the relationship between a single dependent variable and a set of independent variables. The objective of multiple regression analysis is to use the independent variables, whose values are known, to predict the single dependent variable. Each independent variable is weighted by the regression analysis procedure so as to obtain the maximal prediction from the set of independent variables. The weights denote the relative contribution of the independent variables to the overall prediction and facilitate interpretation of the influence of each variable in making the prediction. Regression analysis is the most widely used technique for business decision making. It is the foundation for building business forecasting models. It can also be used to study the factors influencing consumer decisions.
It can also be used, for example, to evaluate the expected return from a stock option.
4.7.1 Statistics associated with multiple regression and correlation analysis The statistics and some of the basic terms used in multiple correlation and regression analysis are explained below:
Beta coefficient: It is a standardized regression coefficient that allows direct comparison between coefficients as to their relative power in explaining the dependent variable.
Correlation coefficient (r): It indicates the strength of the association between any two metric variables. The sign (+ or -) indicates the direction of the relationship. The correlation value can range from -1 to +1, with +1 indicating a perfect positive relationship, 0 indicating no relationship and -1 indicating a perfect negative or reverse relationship.
Coefficient of determination (R2): It is the measure of the proportion of the variance of the dependent variable about its mean that is explained by the independent variables. The coefficient varies between 0 and 1. The higher the value of R2, the greater the explanatory power of the regression equation and therefore the better the prediction of the dependent variable.
Collinearity: It is an expression of the relationship between two (collinearity) or more (multicollinearity) independent variables. Two independent variables are said to exhibit complete collinearity if their correlation coefficient is 1, and a complete lack of collinearity if their correlation coefficient is 0. Multicollinearity occurs when any single independent variable is highly correlated with a set of other independent variables.
Regression coefficient (b_n): Numerical value of the parameter estimate directly associated with an independent variable. In the model Y = b_0 + b_1 X_1, the value b_1 is the regression coefficient for the variable X_1.
Residual (e or E): Error in predicting the sample data. It is taken as an estimate of the true random error in the population and not just the error in the prediction for the sample.
4.7.2 Steps in conducting multiple regression analysis The steps in conducting multiple regression analysis are discussed below:
1. Formulating the research problem The starting point in multiple regression is the identification of the research problem. In selecting a suitable application for multiple regression, three issues are to be considered, viz., the appropriateness of the research problem, specification of a statistical relationship and selection of the dependent and independent variables.
(i) Appropriateness of research problem Multiple regression is an appropriate tool for research problems concerned with prediction and explanation. These problems are not mutually exclusive; an application of multiple regression analysis can address either or both types of research problem.
The fundamental purpose of multiple regression is to predict the dependent variable with a set of independent variables. In predicting the dependent variable, two more objectives are fulfilled, viz., it provides an objective means of assessing the predictive power of a set of independent variables, and it enables comparison of two or more sets of independent variables to ascertain the predictive power of each variate. It also provides a means of objectively assessing the degree and character of the relationship between the dependent and independent variables. The independent variables, in addition to their collective prediction of the dependent variable, may be considered for their individual contribution to the variate and its predictions. The variate may be interpreted from any of three perspectives: the importance of the independent variables, the types of relationships found, or the interrelationships among the independent variables.

(ii) Specifying a statistical relationship

Multiple regression is appropriate when the researcher is interested in a statistical, not a functional, relationship. In a functional relationship there is no error in prediction. In a statistical relationship there will always be some random component to the relationship being examined, so more than one value of the dependent variable will usually be observed for any value of an independent variable.

(iii) Selection of dependent and independent variables
The success of multiple regression techniques depends on the selection of the variables that are to be used in the analysis. The selection of dependent and independent variables should be based on conceptual or theoretical grounds. In selecting the variables, measurement error and specification error should be taken into consideration. Measurement error concerns the degree to which the variable is an accurate and consistent measure of the concept being studied. It is a particular concern in the selection of the dependent variable. The most problematic issue in the selection of independent variables is specification error, which is concerned with the inclusion of irrelevant variables in, or the omission of relevant variables from, the set of independent variables.

2. Research design issues

In the design of a multiple regression analysis the researcher must consider the sample size, the nature of the independent variables and the possible creation of new variables to represent special relationships between the dependent and the independent variables. The sample size used in multiple regression is most important, as its effect is felt most directly in the statistical power of the significance testing and in the generalizability of the result. Power in multiple regression refers to the probability of detecting as statistically significant a specific level of R2 or a regression coefficient at a specified significance level for a specific sample size. Sample size has a direct and sizable impact on power. Sample size also affects the generalizability of the results through the ratio of observations to independent variables. There should be at least five observations for each independent variable in the variate. If the ratio falls below this level, there is a risk of overfitting the variate to the sample, i.e., making the result too specific to the sample, which leads to a lack of generalizability.
Multiple regression deals with the linear association between metric dependent and independent variables. If non-metric data need to be included, or if effects other than linear relationships are to be represented, new variables must be created by transformations. The transformations can be performed by using simple commands in various statistical packages.

3. Assumptions

In carrying out multiple regression analysis several assumptions are made about the dependent and independent variables and about the relationship as a whole. Once the variate has been derived through multiple regression, it acts collectively in predicting the dependent variable. The assumptions are therefore made not only for the individual variables but also for the variate itself. The variate and its relationship with the dependent variable should also meet the assumptions of multiple regression. The assumptions are:

i. The linearity of the relationship between dependent and independent variables is assumed. This represents the degree to which the change in the dependent variable is associated with the independent variable. Partial regression plots can be used to show the relationship between a single independent variable and the dependent variable. A curvilinear pattern of residuals indicates a non-linear relationship between a specific independent variable and the dependent variable.

ii. Constant variance of the error terms is assumed. The presence of unequal variances, i.e., heteroscedasticity, is one of the most common assumption violations. Plotting the residuals against the predicted dependent values and comparing the result to the null plot shows a consistent pattern if the variance is not constant. The null plot and heteroscedasticity are shown below.

iii. Independence of the error terms is assumed in regression. Each predicted value is assumed to be independent, i.e., it is not related to any other prediction and is not sequenced by any other variable.

iv. Normality of the dependent and independent variables, or both, is assumed. However, non-normality is the most frequently encountered assumption violation.
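The residual checks described in the assumptions above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming NumPy is available: the function names (fit_ols, spread_trend, lag1_autocorrelation) and the synthetic data are hypothetical and are not part of any statistical package discussed in this unit. The correlation between the absolute residuals and the predicted values is used here as a crude numerical stand-in for inspecting the residual plot against the null plot.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    b, *_ = np.linalg.lstsq(A, y, rcond=None)      # b[0] is the intercept b0
    fitted = A @ b
    return b, fitted, y - fitted                   # coefficients, predictions, residuals

def r_squared(y, residuals):
    """Share of the variance of y about its mean explained by the model."""
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

def spread_trend(fitted, residuals):
    """Correlation of |residual| with the predicted value (near 0 in a null plot)."""
    return np.corrcoef(fitted, np.abs(residuals))[0, 1]

def lag1_autocorrelation(residuals):
    """Correlation between successive residuals (near 0 if errors are independent)."""
    return np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Illustrative synthetic data (an assumption for this sketch, not from the text).
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
X = np.column_stack([x1, x2])
noise = rng.normal(0, 1, n)
y_const = 3 + 2 * x1 - x2 + noise            # constant error variance
y_hetero = 3 + 2 * x1 - x2 + noise * x1      # error spread grows with x1

for label, y in [("constant variance", y_const), ("heteroscedastic", y_hetero)]:
    b, fitted, resid = fit_ols(X, y)
    print(label,
          "| R2:", round(r_squared(y, resid), 3),
          "| spread trend:", round(spread_trend(fitted, resid), 3),
          "| lag-1 autocorrelation:", round(lag1_autocorrelation(resid), 3))
```

A spread trend clearly above zero suggests that the residuals fan out as the predicted value grows, which is the heteroscedastic pattern described under assumption ii. In practice the plots themselves should still be examined.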
4. Estimating the regression model

In order to estimate the regression model and to assess the overall predictive accuracy of the independent variables, three tasks must be performed, viz., (i) selecting a method for estimating the regression model, (ii) assessing the statistical significance of the overall model in predicting the dependent variable and (iii) determining whether any observations exert undue influence on the results.

(i) Method for estimating the regression model

The regression model can be estimated using the confirmatory approach, sequential search methods or the combinational approach.

(a) The confirmatory approach is used when the set of independent variables is completely specified. The researcher has total control over the variable selection.

(b) In the sequential search method the regression equation is estimated using a set of variables, and variables are then selectively added or deleted until some overall criterion is achieved. This approach provides an objective method for selecting variables that maximizes the prediction with the smallest number of variables employed. There are two types of sequential approach, viz., stepwise estimation, and forward addition and backward elimination.

(i). Stepwise estimation allows the researcher to examine the contribution of each independent variable to the regression model. Each variable is considered for inclusion prior to developing the equation. The independent variable with the greatest contribution is added first; further variables are then selected on the basis of their incremental contribution.

(ii). Forward addition and backward elimination procedures are based on a trial and error approach. The forward addition model is similar to the stepwise procedure mentioned above. The backward elimination procedure computes a regression equation with all the independent variables and then deletes the independent variables that do not contribute significantly.
(c) The combinational approach uses a procedure called all-possible-subsets regression. All possible combinations of the independent variables are examined and the best fitting set of variables is identified.

(ii) Assessing the statistical significance of the overall model in predicting the dependent variable

Testing for statistical significance is needed when the analysis is based on a sample rather than a census. Significance testing of a regression coefficient provides a statistically based probability estimate of whether the estimated coefficients across a large number of samples of a certain size will be different from zero. The test is done to determine whether the impacts represented by the coefficients are generalizable to other samples from the population.

(iii) Identifying the influential observations

Individual observations should be examined so as to identify those that lie outside the general pattern of the data set or that strongly influence the regression results. Influential observations are of three basic types: outliers, leverage points and influentials. Outliers are observations that have large residual values and can be identified only with respect to a specific regression model. Leverage points are observations that are distinct from the remaining observations based on their independent variable values. Influential observations include all observations that have a disproportionate effect on the regression results. They include outliers and leverage points and may include other observations as well.
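The three kinds of observation described above can be illustrated numerically. The following Python sketch, assuming NumPy, computes leverage values from the hat matrix, standardized residuals and Cook's distance for a simple regression. The function name influence_measures and the data are hypothetical, chosen only so that one added point is a leverage point that also distorts the fit and another is an outlier with ordinary leverage.

```python
import numpy as np

def influence_measures(X, y):
    """Leverage, standardized residuals and Cook's distance for an OLS fit."""
    A = np.column_stack([np.ones(len(y)), X])       # design matrix with intercept
    n, p = A.shape                                  # p counts the intercept too
    H = A @ np.linalg.pinv(A.T @ A) @ A.T           # hat matrix: fitted = H @ y
    leverage = np.diag(H)                           # distance in predictor space
    residuals = y - H @ y
    mse = residuals @ residuals / (n - p)
    std_resid = residuals / np.sqrt(mse * (1.0 - leverage))
    cooks_d = residuals ** 2 / (p * mse) * leverage / (1.0 - leverage) ** 2
    return leverage, std_resid, cooks_d

# Illustrative data (assumed): 30 well-behaved points on y = 1 + 2x plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 1 + 2 * x + rng.normal(0, 1, 30)

# Case 30: extreme x value and far below the line (leverage point, influential).
# Case 31: typical x value but far above the line (outlier with low leverage).
x = np.append(x, [25.0, 5.0])
y = np.append(y, [21.0, 26.0])

leverage, std_resid, cooks_d = influence_measures(x, y)
for case in (30, 31):
    print("case", case,
          "| leverage:", round(leverage[case], 3),
          "| standardized residual:", round(std_resid[case], 2),
          "| Cook's distance:", round(cooks_d[case], 2))
```

Cook's distance combines the size of the residual with the leverage, which is why a case can be flagged as an outlier by its standardized residual and still have little effect on the estimated coefficients.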
5. Interpreting the regression variate

Each of the independent variables should be standardized before the regression equation is estimated. The coefficients resulting from standardized data are called beta coefficients. Their advantage is that they eliminate the problem of dealing with different units of measurement, and thus reflect the relative impact on the dependent variable of a change of one standard deviation in either variable. Since there is a common unit of measurement, it becomes possible to identify the variable that has the highest impact.

6. Validation of the results

After identifying the regression model, the next step is to ensure that it represents the general population and is appropriate for the situation in which it will be used. The most appropriate empirical validation approach is to test the regression model on a new sample drawn from the general population. The ability to collect new data is often limited by cost, time pressures or the availability of respondents. In this case, split samples can be used, i.e., the sample can be divided into two parts, viz., an estimation sub-sample for creating the regression model and a holdout or validation sub-sample for testing the equation.

4.7.3 Application of Statistical package: Multiple Regression and Correlation

An automobile concern wants to model the sales of a variety of personal motor vehicles so as to identify over- and underperforming models. This necessitates establishing a relationship between vehicle sales and vehicle characteristics. Information concerning different makes and models of cars, such as the vehicle type, price, engine size, fuel capacity, fuel efficiency, wheelbase, horsepower, width and length, is available. Linear regression can be performed in the statistical package to identify models that are not selling well. The steps are discussed below.

To run a linear regression analysis, from the menus choose: Analyze > Regression > Linear
Select the dependent variable
Select the Independent variables. Select Stepwise as the entry method. Select the case labeling variable. Click Statistics
Select Casewise diagnostics and type 2 in the text box. Click Continue. Click Plots in the Linear Regression dialog box. Select the y variable and the x variable. Select Histogram. Click Continue. Click Save in the Linear Regression dialog box. Select Standardized in the Predicted Values group. Select Cook's and Leverage values in the Distances group. Click Continue. Click OK in the Linear Regression dialog box.

Interpretation of output

The collinearity among the variables needs to be verified from the collinearity diagnostics in the output. If the eigenvalues are close to 0, the predictors are highly intercorrelated and small changes in the data values may lead to large changes in the estimates of the coefficients. Condition index values greater than 15 indicate a possible problem with collinearity; values greater than 30 indicate a serious problem. The following collinearity table shows that there are no eigenvalues close to 0 and that all of the condition indices are much less than 15. The model built using stepwise methods does not have problems with collinearity.

(Output table: Collinearity diagnostics)

Checking the model fit

The ability of the model to predict the dependent variable can be checked through the model fit summary.

(Output table: Model Summary)

The adjusted R square value indicates the fit of the model. A higher value is preferable.

(Output table: Stepwise Coefficients)
The stepwise algorithm chooses price and size of the vehicle (wheelbase) as predictors. Sales are negatively affected by price and positively affected by size. Hence the conclusion is that cheaper, bigger cars sell well.

4.8 Canonical correlation

Canonical correlation analysis is a multivariate statistical model that facilitates the study of interrelationships among sets of multiple dependent variables and multiple independent variables. Whereas multiple regression predicts a single dependent variable from a set of multiple independent variables, canonical correlation simultaneously predicts multiple dependent variables from multiple independent variables. Canonical correlation places the fewest restrictions on the types of data on which it operates. Other techniques impose a higher level of restriction, and hence the information obtained from them is of higher quality. However, in situations with multiple dependent and independent variables, canonical correlation is the most appropriate and powerful multivariate technique. The goal of canonical correlation is to quantify the strength of the relationship between the two sets of variables, dependent and independent. Canonical correlation deals with the association between composites of the sets of multiple dependent and independent variables. In the process it develops a number of independent canonical functions that maximize the correlation between the linear composites, also known as canonical variates, of the sets of dependent and independent variables. Each canonical function is based on the correlation between two canonical variates, one variate for the dependent variables and one for the independent variables. The variates are derived to maximize their correlation.
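To make the idea of maximally correlated composites concrete, the following Python sketch computes the canonical correlations between two sets of variables. It assumes NumPy and uses one standard numerical route (orthogonalizing each centred set by QR decomposition and taking the singular values of the product); this is an illustrative choice, not the procedure of any particular statistical package. The function name canonical_correlations and the synthetic data are hypothetical.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and the columns of Y.

    Assumes each set has full column rank and more observations than variables.
    """
    Xc = X - X.mean(axis=0)                  # centre each variable
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)                 # orthonormal basis for the X set
    Qy, _ = np.linalg.qr(Yc)                 # orthonormal basis for the Y set
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)              # one value per canonical function

# Illustrative data (assumed): one shared latent factor links the two sets.
rng = np.random.default_rng(42)
n = 300
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.5 * rng.normal(size=n),   # independent set, 3 variables
                     rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([latent + 0.5 * rng.normal(size=n),   # dependent set, 2 variables
                     rng.normal(size=n)])

rc = canonical_correlations(X, Y)
print("canonical correlations:", np.round(rc, 3))
```

With three variables in one set and two in the other, two canonical functions are obtained. With a single variable in each set the result reduces to the absolute value of the ordinary correlation coefficient.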
4.8.1 Statistics and Key terms associated with Canonical correlation

The statistics and some of the basic terms used in canonical correlation analysis are explained below:

Canonical variable or variate: A canonical variable, also called a variate, is a linear combination of a set of original variables in which the within-set correlation has been controlled. That is, the variance of each variable accounted for by other variables in the set has been removed. It is a form of latent variable. There are two canonical variables per canonical correlation (function). One is the dependent canonical variable; the one formed from the independent variables is called the covariate canonical variable.

Canonical correlation, also called a characteristic root, is a form of correlation relating two sets of variables. There may be more than one canonical correlation, each representing an orthogonally separate pattern of relationships between the two latent variables. The maximum number of canonical correlations between two sets of variables is the number of variables in the smaller set.

Pooled Rc2 (pooled canonical correlation) is the sum of the squares of all the canonical correlation coefficients, representing all the orthogonal dimensions in the solution by which the two sets of variables are related. Pooled Rc2 is used to assess the extent to which one set of variables can be predicted or explained by the other set.

Eigenvalues: They reflect the proportion of variance in the canonical variate explained by the canonical correlation relating two sets of variables.

Canonical weight: This is also called the canonical function coefficient or the canonical coefficient. The standardized canonical weights are used to assess the relative importance of individual variables' contributions to a given canonical correlation.
Structure correlation coefficient: This is also called the canonical factor loading. A structure correlation is the correlation of a canonical variable with an original variable in its set. Structure correlations are used for the following purposes.
1. Interpreting the Canonical Variables: The magnitudes of the structure correlations help in interpreting the meaning of the canonical variables with which they are associated. Larger canonical factor loadings should be weighted more when assigning an interpretive label to the given canonical correlation. A rule of thumb is for variables with correlations of 0.3 or above to be interpreted as being part of the canonical variable, and those below not to be considered part of the canonical variable.

2. Calculating Variance Explained in a Given Original Variable: The square of the structure correlation is the percent of the variance in a given original variable accounted for by a given canonical variable on a given canonical correlation.

Canonical communality coefficient is the sum of the squared structure coefficients for a given variable. The canonical communality coefficient measures how much of a given original variable's variance is reproducible from the canonical variables.

Redundancy coefficient, d, also called Rd, measures the percent of the variance of the original variables of one set that may be predicted from a (usually the first) canonical variable from the other set. High redundancy means high ability to predict.

4.8.2 Steps in conducting Canonical correlation

The steps involved in building a canonical correlation model are discussed below.

1. Formulating the objectives

Canonical correlation analysis is highly flexible in terms of both the number and types of variables handled, and hence more complex problems can be addressed. Two sets of variables, dependent and independent, are identified in canonical correlation. Once the variables are identified, canonical correlation can be performed for the following purposes:

i. Determining the magnitude of the relationship between two sets of variables.

ii. Deriving a set of weights for each set of dependent and independent variables so that the linear combinations of each set are maximally correlated.

iii. Explaining the relationship between the sets of dependent and independent variables by measuring the relative contribution of each variable to the extracted canonical functions.
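The three purposes listed above correspond to quantities that can be computed directly: the canonical correlations (purpose i), the standardized canonical weights (purpose ii), and the structure correlations and redundancy coefficient (purpose iii). The Python sketch below, assuming NumPy, derives them from the correlation matrices of the two sets. The function names (inv_sqrt, canonical_analysis), the dictionary keys and the synthetic data are all hypothetical, and the redundancy shown is the usual product of the average squared loading and the squared canonical correlation.

```python
import numpy as np

def inv_sqrt(R):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(R)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def canonical_analysis(X, Y):
    """Canonical correlations, weights, loadings and redundancy for two sets."""
    n = len(X)
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize both sets
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    Rxx = Zx.T @ Zx / (n - 1)                            # within-set correlations
    Ryy = Zy.T @ Zy / (n - 1)
    Rxy = Zx.T @ Zy / (n - 1)                            # between-set correlations
    Kx, Ky = inv_sqrt(Rxx), inv_sqrt(Ryy)
    U, rc, Vt = np.linalg.svd(Kx @ Rxy @ Ky, full_matrices=False)
    weights_x = Kx @ U                                   # standardized canonical weights
    weights_y = Ky @ Vt.T
    loadings_x = Rxx @ weights_x                         # structure correlations
    loadings_y = Ryy @ weights_y
    return {
        "rc": rc,
        "weights_x": weights_x, "weights_y": weights_y,
        "loadings_x": loadings_x, "loadings_y": loadings_y,
        "variates_x": Zx @ weights_x, "variates_y": Zy @ weights_y,
        # share of Y-set variance predictable from each X-set canonical variate
        "redundancy_y": (loadings_y ** 2).mean(axis=0) * rc ** 2,
    }

# Illustrative data (assumed): a shared latent factor links the two sets.
rng = np.random.default_rng(7)
n = 300
latent = rng.normal(size=n)
X = np.column_stack([latent + 0.5 * rng.normal(size=n),
                     0.5 * latent + rng.normal(size=n),
                     rng.normal(size=n)])
Y = np.column_stack([latent + 0.5 * rng.normal(size=n),
                     rng.normal(size=n)])

out = canonical_analysis(X, Y)
print("canonical correlations:", np.round(out["rc"], 3))
print("loadings, independent set:")
print(np.round(out["loadings_x"], 3))
print("redundancy of dependent set:", np.round(out["redundancy_y"], 3))
```

Under the 0.3 rule of thumb stated earlier, the variables whose loadings on the first function reach 0.3 in absolute value would be treated as part of that canonical variable.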
2. Designing a Canonical Correlation Analysis

In canonical analysis the researcher may be tempted to add more dependent and independent variables without understanding the implications for the sample size. Issues of sample size and the necessity for a sufficient number of observations per variable are frequently encountered. Small samples will not represent the correlations well, and very large samples will indicate statistical significance in all instances, even when practical significance is not indicated. The sample should consist of at least 4 observations per variable to avoid overfitting the data. The classification of variables as dependent or independent does not assume much significance for statistical estimation, as canonical correlation weights both variates to maximize the correlation and does not place any particular emphasis on either variate. However, the researcher must have conceptually linked sets of variables before applying canonical correlation analysis. This makes the specification of dependent and independent variables essential so as to establish a strong conceptual foundation for the variables.

3. Assumptions

The following assumptions are made:

Multivariate normality is required for significance testing in canonical correlation. This assumption is violated when dichotomous, dummy and other discrete variables are used.

Low multicollinearity: To the extent that the variables within the independent set are highly intercorrelated, the canonical coefficients will be unstable. The coefficients for some variables may be misleadingly low or even negative because variance has already been explained by other variables.

Homoscedasticity and the other assumptions of correlation are assumed.

Minimal measurement error is assumed, since low reliability affects the correlation coefficient. Canonical correlation can also be quite sensitive to missing data.

Adequate sample size must exist to reduce the chances of Type II error (thinking you don't have something when you do). Stevens (1986) recommends at least 20 times as many cases as variables in the analysis in order to interpret the first canonical correlation only. For two canonical correlations, Barcikowski and Stevens (1975) recommend 40 to 60 times as many cases as variables.

No or few outliers. Outliers can substantially affect canonical correlation coefficients, particularly if the sample size is not very large.

4. Deriving the Canonical Functions and Assessing Overall Fit
The first step in canonical correlation analysis is to derive one or more canonical functions. Canonical correlation analysis focuses on accounting for the maximum amount of relationship between the two sets of variables. The first pair of canonical variates is derived so as to have the highest intercorrelation possible between the two sets of variables. The second pair of canonical variates is then derived so that it exhibits the maximum relationship between the two sets of variables not accounted for by the first pair of variates. Three criteria can be used in conjunction with one another to decide which canonical functions should be interpreted. They are (1) the level of statistical significance of the function, where the minimum acceptable level is generally considered to be the .05 level, (2) the magnitude of the canonical relationship, as represented by the size of the canonical correlations, and (3) the redundancy measure for the percentage of variance accounted for.

5. Interpreting the Canonical Variate
Interpretation involves examining the canonical functions to determine the relative importance of each of the original variables in the canonical relationships. The following three methods can be used to interpret the variate.

a. Canonical weights can be used to interpret the canonical functions. This involves examining the sign and magnitude of the canonical weight assigned to each variable in its canonical variate. Variables with relatively larger weights contribute more to the variates and vice versa.

b. Canonical loadings can be used to interpret the functions. A loading measures the simple linear correlation between an original observed variable in the dependent or independent set and that set's canonical variate. The larger the coefficient, the more important the variable is in deriving the canonical variate.

c. Canonical cross loadings can be used as an alternative to canonical loadings. This involves correlating each of the original observed dependent variables directly with the independent canonical variate and vice versa.

6. Validation and Diagnosis
Canonical correlation analysis should be subjected to validation methods to ensure that the results are not specific only to the sample data and can be generalized to the population. For the purpose of validation, two sub-samples can be created and the analysis can be performed on each sub-sample separately. The results are then compared for similarity of canonical functions, variate loadings, etc. If marked differences are found, additional investigation should be performed. Another approach is to assess the sensitivity of the results to the removal of a dependent or independent variable. To ensure the stability of the canonical weights and loadings, multiple canonical correlations can be estimated, each time removing a different independent or dependent variable.

4.8.3 Application of Statistical Package: Canonical correlation

Canonical correlation can be carried out in SPSS using syntax. There are two ways to do so. One is to use the Canonical correlation.sps macro. The other is to use MANOVA with the DISCRIM subcommand.

1. Canonical correlation.sps macro
The macro is a part of the SPSS package and can be found in a subdirectory where SPSS is installed. To use the canonical correlation macro, locate the file Canonical correlation.sps on the computer. Suppose that it is in c:\Program Files\spss. In the syntax window, type:

    include file 'c:\Program files\spss\canonical correlation.sps'.
    cancorr set1=var1 var2 var3 /set2=var4 var5 var6.

In the above syntax, replace var1 to var6 with the names of the variables to be used in the canonical correlation analysis.

2. MANOVA

To use MANOVA, the following syntax should be typed in the syntax window:

    MANOVA set1 WITH set2
    /DISCRIM ALL ALPHA(1)
    /PRINT SIG(EIG DIM).

Replace set1 and set2 with the variable lists. Then run the program by selecting Run from the menu. The data set is to be kept open in the data window while running the program. The MANOVA output also contains multivariate regression results in addition to the canonical correlation analysis. The canonical correlation coefficients in the macro output have the same values as, but opposite signs to, the ones in the MANOVA output. The table names are also different; for example, the correlations between the variables under analysis and the canonical variables are called loadings in the macro output.

SUMMARY

Selection of multivariate techniques to analyze the data is based on two criteria: whether the variables are classified as dependent or independent, and the type of data, i.e., metric or non-metric. The various multivariate techniques, namely factor analysis, cluster analysis, multiple regression and correlation, discriminant analysis and canonical correlation, were presented. The criteria for applying the statistical tests and the steps involved in conducting them were explained in detail. Applications of these statistical tests using the software package were also discussed. Once the data analysis is done, the report has to be prepared to communicate the results to all concerned. The next unit, on report writing, deals with this.

Have you understood?

Explain the significance of multivariate techniques in the context of research studies.
Identify a situation where factor analysis can be used. Discuss the steps involved in performing the same.
Explain the application of cluster analysis with an example. Elucidate the process of performing the same in SPSS. How will you interpret the results?
What are the uses of discriminant analysis? Explain the process of building a discriminant model.
What is multiple regression? Explain the steps involved in the application of the same.
When can you apply canonical correlation? Explain the steps involved in building the model.
Unit 5 The Research Report

5.1 Introduction

Report writing is an integral part of the research process. Research reports are written to communicate to the world at large the results of the research, field work and other activities. The research report is a concrete outcome of the research work undertaken. The quality of the research is judged by the quality of the writing and by how well the importance of the findings is conveyed. Research carried out very scientifically and revealing findings of great importance may be of no value if it is not communicated effectively. In the context of business, the report assumes importance because it is through reports that the management gets information regarding the activities performed at various levels of the organization. The management takes decisions and controls various activities of the business on the basis of information provided through business reports. According to Louis L.N., a business report is an unbiased and arranged presentation of facts by one or more persons for a definite and specified important business purpose. Koontz and O'Donnell define a report as a document in which, for the purpose of providing information, a specified problem is researched and analyzed, and conclusions, thoughts and sometimes references are presented. In a nutshell, a business report is any factual, objective document that serves a business purpose. This unit provides an insight into the basics of writing research reports, in addition to the contents and characteristic features of a good report. The contents of a research proposal and the use of visual aids in preparing reports are dealt with in detail.

5.2 Learning objectives

After reading this unit you will be able to understand:

The basics of a research report
The importance and types of report
The characteristics of a good report
The need for audience analysis
The contents of a report
The steps in generating a report
The contents of a research proposal
The use of visual aids in a report
5.3 Purpose of business reports: The business reports are prepared for the following purposes:
Reports enable the management to monitor the operations undertaken at various levels and to control them.
The written report acts as a guideline for the future course of action. It makes it possible to plan and organize things in an effective manner.
Feedback regarding the various aspects, controls and processes implemented in the organization can be obtained through reports.
Information regarding specific problems or issues can be obtained by way of a report. Such a report may be narrowly focused and provide the desired information to the management in a brief format.
Information provided in reports enables decision making.
A report may also be prepared to convince the reader or to sell an idea. The report in this case would be more detailed and would set out convincingly how the proposed idea could add to the organization's value, or why it should be adopted.
A report may also be prepared to provide several alternative solutions or recommendations so that the pros and cons can be compared and the best course of action selected. A detailed discussion of methodology, criteria for comparison, data analyses, etc., should be provided.
Reports may be prepared to provide an insight into a problem and may also provide a final solution to it.
5.4 Types of reports:

Reports can be classified on the basis of purpose, source, frequency, target audience, length, subject dealt with, function performed and intention.

1. Source
Source refers to the person or persons who initiated the report. Voluntary reports are prepared on the writer's own initiative and require more detail. The background of the subject should be more carefully planned. Authorized reports are those prepared in response to a request.

2. Frequency

Routine or periodic reports are submitted on a recurring basis, which may be daily, weekly, monthly, etc. Some routine reports may be prepared in a preprinted computerized form. Because of its routine nature, such a report requires less introduction than a special report. Special reports are nonrecurring in nature and present the results of specific, one-time studies or investigations.

3. Length

A short report differs from a long report in scope, research and duration. A long report examines the problem in detail and requires more extensive time and effort in preparation. In contrast, a short report may discuss only one module of a problem. A summary is a short report which gives a concise overview of a situation. It highlights the important details but does not include background material, examples or specific details. A short report is suitable when the problem is well defined, is of limited scope and has a simple methodology. It normally runs to five pages.

4. Intent

Informational reports focus on the facts and explain or educate the readers. An analytical report is designed to solve a problem by convincing readers that the conclusions and recommendations reached are justified on the basis of the data collected, its analysis and interpretation. The information provided plays a supporting role in convincing the reader.

5. Function

Reports may be classified as informative or interpretative on the basis of the function performed. Informative reports present facts pertinent to the issue or situation. Common types of informational reports include those for monitoring and controlling operations, statements of policies and procedures, compliance reports and progress reports.
An informative report may take the form of an operating or a periodic report. Operating reports provide managers with detailed information regarding activities such as sales, inventory and costs. Periodic reports describe the activities in a department during a particular period. An interpretative report, also known as an analytical or investigative report, analyses the facts and presents recommendations and conclusions. The report presents facts and persuades the reader to accept a stated decision, action or the recommendations detailed throughout the report. It may take the form of a problem-solving report providing background information and an analysis of the various options. A troubleshooting report is a form of problem-solving report which discusses the source of the problem, the extent of the damage done and the possible solutions. A feasibility report is a problem-solving report that studies proposed options to assess whether all or any one of them is sound.
6. Subject dealt
Reports may be categorized as problem determining, fact finding, performance reports, technical reports etc. A problem determining report focuses on identifying the cause underlying a problem or on ascertaining whether a problem actually exists. Technical reports are concerned with presenting data on a specialized subject with or without comments.
7. Legal reports
Reports may be prepared to meet government regulations. For example, a compliance report explains what a company is doing to conform to government regulations. It may be prepared on an annual basis, like income tax returns, the annual shareholders' report etc. Interim compliance reports can also be prepared to monitor and control the licenses granted by the government.
5.5 The Concept of audience
Reports are written for the sake of the audience, i.e. the readers of the reports. The goal of the report writer is to enable the audience to act, and hence the audience should be taken into consideration throughout: in planning, organizing, word choice, sentence structure, deciding on visual aids etc. A good report is tuned to the various aspects of the audience, viz., their knowledge level, their role in the given situation, their place in the organization and their attitude.
(1) Knowledge level of the audience
Knowledge level refers to the extent to which the audience is aware of the subject matter discussed in the report. This level ranges from expert to non-expert. An expert audience understands the basic terminology, facts, concepts and implications associated with the topic. Knowing the extent of the audience's knowledge helps the writer choose the information to be presented and the depth of explanation needed. In ascertaining the audience's knowledge level the following aspects should be considered.
Understanding the knowledge level of the audience
The researcher should ascertain what the prospective reader of the report knows about the topic. The knowledge level can be ascertained by discussing the topic directly with the audience. The duties and responsibilities held by the audience also provide a key to understanding their familiarity with the concept.
Adapting to the knowledge level of audience
The report should be adapted to the knowledge level of the audience by building on their schemata, i.e. the concepts they have formed from their prior experiences.
The basic principle here is that the report should add to the knowledge of the audience and not waste their time by concentrating on what they already know.
(2) Audience role in the situation
The audience takes decisions and plans the further course of action based on the report presented to them. A good report should be adapted to accommodate different audience roles. The topics and subtopics may be similar, but the reports should be quite different because of the different roles of the intended audiences. In order to determine the audience roles the following aspects should be considered.
i. Type of audience
The audience for whom the report is drafted should be taken into account. The audience could be a single person, the members of a committee or a large group. Sometimes there may be both a primary and a secondary audience. Primary audiences are those to whom the document is addressed. Secondary audiences are people who may read the report for information but not for immediate action.
ii. Audience need
Information in reports can be presented in different formats and in different ways. The audience's need should be considered in deciding the format of the report, its content, the details needed, the level of precision required and the time period within which the report is to be submitted.
iii. Writer's goal and audience's need
The basic goal of the researcher or the person who prepares the report is to enable the audience to act. This is fulfilled by delivering a basic message that has a specific purpose. The basic message consists of the basic facts that need to be presented to fulfil the purpose, which may be to inform, instruct or persuade the readers. If the goal is to persuade the reader to act in a certain manner because of the information, then the report should clearly point out the significance of the data and the action they support.
iv. Audience's task
Audience task refers to the type of activity which the audience will engage in after reading the report. The tasks of different audiences may differ, and hence the report directed at each audience should also differ. For example, for a manager the report should contain explanatory paragraphs rather than the numbered how-to-do-it steps which would be apt for an operator.
v. Number of audience
Where there is both a primary and a secondary audience, the report writer must decide whether to concentrate on the primary or the secondary audience. If only a brief note is needed rather than a lengthy format, the brief note should be preferred. If the same report is to be circulated among various members then a brief informal report is inappropriate.
(3) Audience's attitude
Audience attitude refers to the expectations of a reader while reading the report. The expectations arise from the reader's role in the organization, the social situation, and the reader's feelings about the message and its sender. These attitudes powerfully affect the way the readers read the message in the report.
The following factors help to determine the attitudes of the readers:
The consequences which arise out of the information given in the report should be considered. A positive message may be received by the readers in an optimistic light.
Understanding the history may provide an insight into the attitude of the reader. History is the situation prior to the writing of the report. The report writer needs to show that this prior situation is understood. Otherwise the readers may dismiss the report on the ground that the writer does not understand the implications of what is being reported.
The reader's power affects his attitude towards the information provided in the report. Power is the supervisory relationship between the author and the reader. The more powerful the reader, the less likely the report is to give orders and the more likely it is to make suggestions.
Formality is the degree of impersonality in the document. A written report is mostly presented to the reader in an official tone. The extent of formality in the report affects the reader's perception of the message conveyed through the report.
The reader's feelings regarding the subject dealt with may be positive, neutral or negative. If the reader is positively inclined towards the subject, he may be more receptive to the message conveyed.
The writer should establish a relationship with the reader. The relationship is affected by the writer's credibility and authority. If the readers believe that the writer has followed a clear and scientific method of investigating the topic, a positive image will be created.
The audience expects messages in a certain form. To stimulate a positive attitude, the document should be presented in the form expected.
5.6 Basics of written reports
This section gives an overview of the steps involved in writing reports and highlights the characteristics of a good report.
5.6.1 Stages in report writing
Report writing is a process carried out in stages. The goal of the writing process is to generate a clear, effective document that enables the audience to act. The writing process is performed in three stages, viz., the prewriting, writing and post writing stages. The stages are discussed below.
I. Prewriting stage
The prewriting stage involves planning the task of writing the report. It includes collecting all the relevant information and deciding the steps to be followed. It involves three tasks, viz., analyzing the situation, investigation and adaptation.
1. Analyzing the situation
A thorough analysis of the situation should be made to decide whether the situation merits writing a report. Sometimes it may be enough to make a phone call, send an email or conduct a meeting. If the situation warrants writing a report, then the next step is to decide the type of report needed. It may be an informational or an analytical report. In the case of an informational report, the specific purpose of the report should be defined and the appropriate report type should be selected. For analytical reports, the problem should be defined before stating the purpose of the report.
Problem definition
The problem addressed by a report may be defined by the person who authorizes the report or by the researcher himself. The readers of the report should be convinced of the existence of the problem. This requires persuasive writing. The problem can be defined by answering the following questions:
What needs to be ascertained?
When did the problem start?
What is the importance of the issue?
Who is involved in the situation?
Where is the trouble located?
Problem factoring can also be done, which involves breaking down the perceived problem into a series of logical, connected questions that try to identify cause and effect. Speculating on the cause of a problem leads to forming a hypothesis. A hypothesis is a potential explanation that needs to be tested. Dividing the problem and framing the hypothesis based on the available evidence makes it possible to tackle even the most complex situation.
Developing the statement of purpose
The problem statement defines what is going to be investigated, whereas the statement of purpose defines why the report is prepared. The purpose statement can be started with an infinitive phrase, for example: To analyse the reasons for the fall in the share price. Using an infinitive phrase (to plus a verb) encourages the writer to take control and decide where to start. The purpose statement should be highly specific and should be checked with the person who has authorized the report. The confirmed statement can be used as the basis for developing the preliminary outline of the report.
Developing a preliminary outline
The preliminary outline establishes the framework for preparing the report. It provides a visual diagram of the report to be prepared, its important points, the order in which the discussion will take place and the details to be included. The preliminary outline might look different from the final outline of the report; however, the outline guides the research effort and acts as a foundation for organizing and composing the report. Since the outline is only a working draft, it will be revised and modified in the further steps. The two common outline formats used to guide the writing effort are alphanumeric and decimal. Grammatical parallelism should be ensured among the items presented at the same level. Parallelism shows that the ideas are related and are of similar importance.
Preparing the work plan
Most reports have a firm deadline to be met. A carefully prepared work plan ensures that quality reports are produced on schedule. If the work plan is prepared for the researcher himself, it can be prepared in an informal manner. However, in the case of a proposal, a detailed work plan should be prepared, which becomes the basis for the contract if the proposal is accepted.
A formal work plan might include the following elements:
A statement of the problem, which helps the writer stay focused on the core problem.
The purpose statement, which describes what is to be accomplished with the report and the boundaries of the work.
A description of the product that will arise out of the investigation. Many times the report may be the only outcome.
A review of the project assignments, schedules and resource requirements, indicating who will be responsible for what, when each task will be completed and how much the investigation will cost.
A plan for following up after delivering the report.

2. Investigating information
Information for writing reports should be gathered from various perspectives, such as specific company information, trends, issues, products, events, related literature, and the micro and macro economic perspectives of the problem taken up for study. The following tasks should be completed in investigating the information:

Identify the right questions
Find and access primary and secondary sources of information
Evaluate and finalize the resources
Process the information
Analyze the data
Interpret the findings

The tasks mentioned need not be performed in a specific order.
3. Adapting the report
A good relationship with the audience should be maintained in order to ensure that the report is audience centered. A report will be successful only if it focuses on the audience. The focus on the audience can be maintained by following the criteria given below:
The "you" attitude should be followed, and the report should answer the audience's questions and solve their problems.
Emphasis should be given to the positive aspects. If the report recommends a negative action, the facts should be stated and the recommendation should be made positively.
Credibility should be established by building the audience's trust. Trust can be gained by researching the topic from all sides and documenting the findings with credible sources.
The report should address the audience in a polite manner. The audience's respect should be earned by being courteous, kind and tactful.
Bias-free language should be used. Unethical and embarrassing blunders in language related to gender, race, age and disability should be avoided.
The style and language of the report should reflect and adapt to the image of the organization.
Selecting the appropriate channel and medium
The right medium should be selected for conveying the report. It may be an oral presentation, an electronic format, an email, a letter or a formal written report. Written reports are chosen to convey complex, lengthy information which needs to be presented in a structured format and is needed for further reference. If immediate feedback is needed, oral reports are appropriate. Electronic reports are stored on electronic media and may be distributed on disk, attached to an email or posted on a website. Compared with paper-based reports, electronic reports save cost and space. They also allow faster distribution and the inclusion of multimedia features. The appropriate channel should be chosen based on the requirements of the audience and the researcher.
II. Writing stage
The actual composing of the report should be preceded by organizing the material collected and arranging it in a logical order that will meet the audience's needs. The format, length, order and structure of the report should be decided before drafting the report.
Deciding the format and length
Four options are available for formatting the report, viz.,
Preprinted form: The preprinted form is a fill-in-the-blank type of report. These reports are relatively short and deal only with routine information.
Memo: It is a short informal report distributed within the organization. It has headings and visual aids, and if its length exceeds ten pages it is called a memo report.
Letter: It includes the normal parts of a letter and in addition may have headings, footnotes, tables and illustrations. It is commonly used for reports of five or fewer pages that are directed to outsiders.
Manuscript: It is commonly used for reports that require a formal approach and may range from a few pages to several hundred pages. The longer the report, the more pages the prefatory and supplementary parts will have.
If the report is of a routine nature, there is much less flexibility in deciding the format and length. The length of the report is often decided by the subject matter and the type of relationship with the audience.
Choosing the approach
The researcher may choose a direct or an indirect approach in writing the report. A direct report starts with the main idea, thereby saving time and making the report easier to understand. The direct approach is used when the audience is receptive or open minded. The report starts with the findings, conclusions and recommendations. This method is mostly followed in business reports. The indirect approach withholds the main idea until the latter part of the report. If the audience is skeptical or hostile, the complete findings and all supporting details should be presented before the conclusions and recommendations.
Structuring the reports
The structure of the report deals with the way in which the ideas will be subdivided and developed. The structure of the report depends on its type, viz., informational, analytical, investigative etc. Reports may follow a topical organization, i.e. arranging materials in one of the following ways:
Materials may be organized on the basis of the importance of the subject matter. The most important topic may be presented first and the least important at the end of the report.
If the report describes a process, it should be arranged in the sequential order of the process.
If events are reported in the study, they should be reported in chronological order.
If a physical object is discussed in a report, it should be described from left to right, top to bottom, or outside to inside.
If the report is organized on the basis of geographical area, it should be organized by the regions under study, viz., city, district, state or country.
Composing Reports
Once the decisions regarding length, approach and structure are made, composing of the first draft can begin. The writing task should start with the preparation of a final outline. This acts as a guide to the writing process and also makes it possible to critically evaluate the selection and order of the information to be presented in the report. Preparing the outline may lead to rephrasing the points and changing the tone of the report. While composing the report, the researcher should concentrate only on drafting the message and not on editing or polishing it, which is taken up at a later stage. While composing reports the following points should be kept in mind:
Formal language should be used in writing reports. Obsolete and pompous language should be avoided. Similarly, big words, trite expressions and overly complicated sentences should not be used to impress others.
Correct words should be used in the report. The words selected should convey the meaning clearly, specifically and dynamically. Words that are familiar to the audience should be chosen. Clichés and jargon should be used only when they are understood by the audience at whom the report is directed.
Due attention should be paid to the grammatical accuracy of the content, as it affects the image of the researcher.
The report should concentrate on presenting the facts.
The arguments for or against any aspect should be constructed in a rational manner.
Active or passive voice should be used appropriately in composing reports. Active voice can be used to emphasize the subject and to produce shorter sentences. Passive voice is mostly used in research reports, as they are prepared in a formal situation.
To ensure readability, the report should be broken up into paragraphs with suitable headings.
A consistent time perspective should be ensured in the report, i.e. the report should be in either the past or the present tense. Chronological sequence should also be followed in presenting events.
The reader's perspective of the report might be different from the researcher's perspective. Hence a preview or road map of the report structure should be included. This clarifies for the reader the overall organization and flow of the report.
III. Post writing stage
A research report will undergo many drafts before finalization. The report is revised many times to check the content, organization, style and tone, readability, clarity and conciseness. The post writing stage involves revising, producing and proofreading the report.
(1) Revision
Revision takes place during and after preparation of the first draft. It is an ongoing process that occurs throughout the writing process. Revision involves searching for the best way of saying something, probing for the right words, rephrasing sentences, reshaping, juggling elements etc. Revision is a never-ending process; however, every research report has a deadline, and hence schedules should be drawn up and met. Revision consists of three main activities, viz., (i) evaluating content, organization, style and tone, (ii) reviewing for readability and scannability and (iii) editing for clarity and conciseness.
(i) Evaluating content, organization, style and tone
While evaluating the content, the following aspects should be given due attention:

Accuracy of the information presented
Relevance of the facts presented to the concerned audience
Completeness of the information provided to suit the audience's needs
Balance between specific and general information
While reviewing the organization the following aspects should be considered;

Logical order in presentation and coverage of all main points
Giving the main theme more space and prominence
Correctness in the sequence of presentation
Grouping of scattered details in an appropriate manner
More attention should be given to the introduction and conclusion of the report, as they have a major impact on the audience. The words used should be of the right style and tone. The opening statements should be relevant and interesting, and should entice the reader to read further. They should establish the subject, purpose and organization of the information in the report. The conclusion should be reviewed to ensure that it summarizes the main idea and leaves the reader with a positive impression.
(ii) Reviewing for readability and scannability
Readability depends on the choice of words, sentence length, sentence structure, organization and the physical appearance of the message. The following techniques can be used to ensure readability:
Variety in sentence structure makes the information presented more appealing to the reader. While long sentences should be avoided, too many short sentences should not be used either. The average sentence length should be 20 words or fewer.
Important ideas can be presented in the form of lists. Lists are effective tools for highlighting and simplifying the information presented. They provide the reader with clues, simplify complex subjects, highlight the main points, break up the page visually and ease the skimming process for busy readers.
A heading is a brief title that provides cues to the reader about the content of the section that follows. Headings should be properly used to attract the reader's attention and to divide the material into shorter sections.
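The sentence-length guideline above can be checked mechanically on a draft. The following is a minimal sketch in Python, not a standard readability tool: the function names and the punctuation-based sentence splitting are assumptions made for illustration, and the splitting is a rough heuristic that will miscount abbreviations such as "viz." or "etc.".

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at '.', '!' or '?' followed by whitespace.

    This is a rough heuristic (hypothetical helper); abbreviations that end
    in a full stop are treated as sentence endings.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def average_sentence_length(text: str) -> float:
    """Return the mean number of words per sentence, or 0.0 for empty text."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def long_sentences(text: str, limit: int = 20) -> List[str]:
    """Return the sentences that contain more than `limit` words."""
    return [s for s in split_sentences(text) if len(s.split()) > limit]
```

Calling average_sentence_length on a draft paragraph gives a quick indication of whether the 20-word guideline is being met, and long_sentences lists the candidates for breaking up. The figures are only a prompt for revision; variety in sentence structure still has to be judged by the writer.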
(iii) Editing for clarity and conciseness
Clarity in the information presented should be ensured. Clarity prevents confusion. If the information is presented in a cluttered manner, the reader may interpret it in ways not intended by the researcher. The following aspects should be considered to ensure clarity:
Long sentences should be broken up. Connecting too many clauses with "and" should be avoided.
Too many hedging statements should be avoided.
Parallelism should be ensured among related ideas. It can be achieved by repeating the pattern in words, phrases, clauses or entire sentences.
Long noun sequences should be avoided.
Words ending in -ion, -tion, -ing, -ment, -ant, -ent, -ance and -ency should be used with care, as they change verbs into nouns and adjectives. Verbs should be used instead of noun phrases.
Reports use expressions like 'above-mentioned', 'as mentioned above', 'the former', 'the latter' and 'respectively'. These words cause the reader to jump from point to point, and hence such awkward references should be kept to a minimum.

To ensure conciseness, every word in the report should be carefully scrutinized. Words which do not serve any function should be eliminated. Long words should be replaced with short words wherever possible. Conciseness is achieved by deleting unnecessary words and phrases, shortening words and phrases and eliminating redundancies. Computers make it possible to revise the report faster and more efficiently. A word processor helps to add, delete and move text with functions like cut and paste, search and replace, replace all etc. The autocorrect feature stores commonly misspelled or mistyped words along with their correct spelling. A history of the revisions made can also be retrieved by enabling the relevant software options. Three advanced software functions, viz., the spell checker, thesaurus and grammar checker, help in creating an effective report.
(2) Producing the report
Producing the report involves adding elements such as graphics and designing the page layout to give the report an attractive and contemporary appearance. Adding graphics is dealt with in detail in the latter part of this unit. The appearance of the report meets the eyes of the reader first and plays an important role in creating an impression. Effective design should have the following elements:
Consistency should be ensured throughout the report in terms of margins, typeface, type size, spacing, paragraph indents, borders, columns etc.
Proper balance should be maintained between the text, graphs and white space.
Too much highlighting and too many decorative touches and design elements should be avoided. Simplicity in design should be the aim.
Attention should be paid to details, such as not separating a heading from the information it introduces, avoiding narrow columns and the like.
A variety of design elements such as line justification, typefaces, styles etc. can be used to create a professional and interesting report, but it should be kept in mind that too many design elements might confuse the reader.
(3) Proofreading
While proofreading, attention should be devoted to spelling, punctuation and typographical errors. The credibility of the researcher is affected by the attention paid to details, mechanics and form. The researcher should carefully check for errors in grammar and language, missing material, design errors and typographical errors. Design errors include elements like the wrong typeface, the wrong type style, misalignment etc. Typographical errors include uneven spacing between lines and words, a heading at the bottom of a page, incorrect hyphenation, non-conformity with the guidelines provided etc. Attention should also be paid to the overall format. Routine documents will have fewer elements to check. Longer, more complex documents have many components that need checking, and more time should be devoted to them. The three stages in report writing are summarized in the following pictorial representation.
Figure: Stages in report writing
5.6.2 Characteristics of a good report
The desirable features of a good report are dealt with under the various sections on report writing in the previous pages. A summary of the salient features is given below:
A good research report should focus on the purpose of the study and the type of audience.
It should have clarity, conciseness and coherence.
The right emphasis should be placed on the important aspects of the problem identified.
Meaningful organization of paragraphs and sentences and smooth transition from one topic to the next should be achieved by ensuring parallelism, specificity etc.
The report should be free of technical or statistical jargon if it is addressed to audiences who may not understand it.
Care should be taken to avoid grammatical, spelling and typographical errors.
Assumptions made by the researcher should be clearly spelled out.
Operational definitions of words used with a specific meaning should be given at the beginning of the report.
The report should be organized in a meaningful manner so as to enable a smooth flow of information.

Side headings should be properly used to ensure a smooth and logical flow of meaning.
Appearance should be given due emphasis so that a professional image is created.
Ambiguity, multiple meanings and allusions should be avoided by choosing the right words and sentences.
The report should adhere to the guidelines specified regarding format etc. and should be prepared within the schedule provided.
5.7 Integral parts of a report
A research report has a set of identifiable components. The components of the report should be decided keeping in mind the needs of the audience. The headings and side headings should also focus on the requirements of the audience and the problem identified for the study. Generally a research report consists of three parts: the preliminaries, the text and the reference materials. Each of the main parts may consist of several subsections as shown below;
A. The preliminaries
The preliminaries do not make a direct contribution to the identified research problem; however, they assist the reader in using the research report. The subsections in the preliminaries are discussed below.
Letter of transmittal
A letter of transmittal is required where there is a formal relationship between the researcher and the audience at whom the report is directed. It is mostly used when research work is carried out for a specific client or for an outside organization. The letter should highlight the authorization for conducting the project and the specific instructions provided for completing the study. It should also state the purpose and scope of the study. The letter of transmittal is not necessary if the report is aimed at authorities within the organization.
Title page
Most organizations have their own form of title page for research reports, and it should be complied with. The title page generally has the following information:
Title of the report
The month and year of submission
For whom and by whom the report is submitted
If the project report is submitted for the award of a degree, the degree for which the dissertation is submitted

The best practice is to centre the title of the report on the page in upper case letters. If the title is too long to be centered on one line, the inverted pyramid principle should be followed without splitting words or phrases.
Preface
The preface may include the writer's purpose in conducting the study, a brief resume of the background, scope, purpose and general nature of the research for which the report is prepared, and the acknowledgments. A preface can be prepared only after the final form of the report is ready. In the case of a dissertation submitted for the award of a degree, the preface is omitted and an acknowledgment is added instead.
The acknowledgment recognizes the persons to whom the writer is indebted for guidance and assistance during the study. It also credits the institution for providing funds to conduct the study and for granting permission to use its facilities. The researcher should acknowledge the assistance provided by all concerned honestly, in a simple and tactful manner.
Executive summary
An executive summary is a brief account of the research study. It is a report in miniature, covering all aspects in the body of the report but in a brief manner. It provides an overview of the research problem identified and highlights important information such as the sampling design, the data collection method used, the results of data analysis, the findings and the recommendations. The length of the executive summary will normally be two to three pages. The executive summary is usually written after the completion of the report. Sometimes a synopsis or an abstract may be included instead of the executive summary; however, they are not one and the same. Executive summaries are more comprehensive than a synopsis. They include headings, visual aids and enough information to help busy people make quick decisions. Although executive summaries are not designed to replace the report, in some cases the summary may be the only thing read by the audience. By contrast, a synopsis is only a brief overview of the entire report and may either highlight the main points as they appear in the report or simply inform the reader of the content of the report. The purpose of a synopsis is to entice the audience to read the report.
Table of contents
The table of contents includes the major divisions of the report. It indicates in outline form the topics included in the report. The purpose of a table of contents is to provide an analytical overview of the topics included in the report together with the sequence of presentation.
Depending on the length and complexity of the report, the contents page may show the top two or three levels of headings or only the first-level headings. Care should be taken to see that the titles of chapters and captions of subdivisions within chapters correspond exactly with those in the body of the report. Page numbers for each of the divisions are given. The relationship between major divisions and minor subdivisions should be shown by using capital letters and indentation or by using a numeric system. The table of contents is prepared after the other parts of the report have been typed, so that the page numbers can be given. If there are fewer than four visual aids, they may be listed in the table of contents; if there are more than four, a separate list of illustrations should be prepared. Some guidelines for writing the table of contents are given below:

- The page is titled Table of Contents or Contents.
- The name of each section should be worded and formatted as it appears in the text.
- The entries in the table of contents should not be underlined, as underlining may overwhelm the words.
- Use only the page number on which the section starts.
- The margins should be set such that the page numbers align on the right.
- Not more than three levels of headings should be given.
- Leaders, a series of dots, can be used to connect the words to the page numbers.
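The guidelines above can be illustrated with a small plain-text sketch. The function name `format_toc`, the 60-character page width and the sample headings are assumptions made only for illustration; they are not prescribed by this unit.

```python
def format_toc(entries, width=60, max_level=3, indent=2):
    """Return a plain-text table of contents with dot leaders.

    entries: list of (heading_level, section_title, start_page) tuples.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    lines = ["CONTENTS".center(width)]
    for level, title, page in entries:
        if level > max_level:
            continue  # not more than three levels of headings are listed
        # Indentation shows the relation between divisions and subdivisions.
        label = " " * (indent * (level - 1)) + title
        page_text = str(page)
        # Leaders (a series of dots) connect the words to the page number,
        # and pad the line so that page numbers align on the right.
        dots = "." * max(width - len(label) - len(page_text) - 2, 3)
        lines.append(f"{label} {dots} {page_text}")
    return "\n".join(lines)


# Hypothetical sample entries; only the starting page of a section is given.
entries = [
    (1, "Introduction", 1),
    (2, "Background", 2),
    (2, "Problem statement", 4),
    (1, "Methodology", 7),
    (4, "Too deep to list", 8),
]
toc = format_toc(entries)
print(toc)
```

The fourth-level entry is dropped, and every remaining line is padded to the same width so that the page numbers form a right-aligned column.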
List of Tables
The researcher should prepare a list of tables under the heading LIST OF TABLES. The heading should be centred, and the list should be placed on a separate page by itself. Two spaces below the heading, the column heads 'Table number', 'Title' and 'Page number' should be given. The table number should be aligned to the left, the page number should be aligned to the right and the title should be centred.
List of Illustrations
The list of figures should be prepared in the same form as the list of tables. The page is headed LIST OF FIGURES. The list includes the figure number, the title of the figure and the page number. Normally Arabic numerals are used for numbering.

B. The Text
The text is the most important part of a report, as it is in this section that the writer presents the facts. The researcher should devote the greater part of his attention to the careful organization and presentation of the findings or arguments. The text may be organized as an introduction, the methodology and as many chapters as are required for presenting the report.

Introduction
The introduction prepares the reader for the report by describing its parts: background, problem statement and research objectives.

Background
The background information provides the reader with a prelude to the research report. It may draw on the preliminary results of exploration, a survey or any other source. Secondary data from the literature review could also be highlighted. Previous research, theory or situations that led to the research issue can be discussed. The literature should be organized, integrated and presented in a logical manner. The background includes definitions, assumptions etc. It provides the information needed to understand the remainder of the research report, and it contains information pertinent to the management problem or the situation that led to the study. It may be placed before the problem statement.

Problem statement
The problem statement explains the need for the research project. The problem is usually represented by a management question. It is followed by a more detailed set of objectives. The guidelines for the problem statement are given below:

- It gives basic facts about the problem.
- It specifies the causes or origin of the problem.
- It explains the significance of the problem.
Research objectives
The research objectives state the purpose of the research. The objectives may be research questions and associated investigative questions. In a correlational study, the hypothesis statements are included. Hypotheses are declarative statements describing the relationship between two or more variables. They state clearly the variables of concern, the relationships among them and the target group being studied. Operational definitions of variables should be included.

Methodology
The methodology contains the following sections:

- The type of study, viz., descriptive or exploratory, should be mentioned in the methodology.
- The sampling design explains the sampling method and the sample size.
- The data collection method is described in the report.
- The tools used for analysis of the data should be explained.
Findings and Conclusions
The findings section is generally the longest section of the report. Its objective is to explain the data. Wherever needed, the data should be supplemented with charts and graphs. The conclusion serves the important function of tying together the whole thesis or assignment. The recommendations of the study are also presented in this section. They give an idea of the corrective actions required. In academic research, the suggestions broaden the understanding of the subject area. In applied research, the recommendations include guidelines for further managerial action. Several alternatives may be provided with further justification. The conclusion should leave the reader with the impression of completeness and of positive gain.

C. Reference material
The reference material includes the bibliography, appendix and index.

Bibliography
The bibliography follows the main body of the text and is a separate but integral part of a thesis, preceded by a division sheet or introduced by a centred, capitalized heading BIBLIOGRAPHY. A bibliography is a list of secondary sources consulted while preparing the report. Strictly speaking, a bibliography differs from a reference list. A bibliography is a listing of the work that is relevant to the main topic of research interest, arranged in the alphabetical order of the last names of the authors. A reference list is a subset of the bibliography. It includes details of all the citations used in the literature survey and elsewhere in the research report, arranged in the alphabetical order of the last names of the authors. These citations are provided for the purpose of crediting the author and enabling the reader to find the works cited. Proper citation style and format should be followed in providing references.
Various methods of referencing are available, viz., the Publication Manual of the American Psychological Association (APA), The Chicago Manual of Style, the Modern Language Association (MLA) system and the American Chemical Society (ACS) system. Each of the manuals specifies, with examples, how books, journals, newspaper articles, dissertations and so on should be referenced.

For books the order may be as under:
1. Name of the author, last name first
2. Title of the book in italics
3. Place of publication and the publisher
4. Year of publication

Example: Peeru Mohamed et al., Customer Relationship Management, Delhi, Vikas Publishing House, 2002.

References for articles in journals could be cited as under:
1. Name of the author, last name first
2. Title of the article in quotation marks
3. Name of the periodical, in italics
4. The volume, or volume and number
5. The date of the issue
6. The pagination
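The two orders of elements described above can be sketched as simple formatting functions. This is only a minimal illustration of one consistent style, not an implementation of APA, Chicago, MLA or ACS; the function names `book_entry` and `article_entry` are hypothetical, and plain text cannot show the italics the style calls for.

```python
def book_entry(author, title, place, publisher, year):
    # Order for books: author, title, place and publisher, year.
    # The title would be set in italics in a typeset report.
    return f"{author}, {title}, {place}, {publisher}, {year}."


def article_entry(author, title, periodical, volume, date, pages):
    # Order for journal articles: author, title in quotation marks,
    # periodical (italics when typeset), volume/number, date, pagination.
    return f'{author}, "{title}", {periodical}, {volume}, {date}, pp. {pages}.'


# The two entries below reuse the sample references given in this unit.
book = book_entry(
    "Peeru Mohamed et al.",
    "Customer Relationship Management",
    "Delhi",
    "Vikas Publishing House",
    2002,
)
article = article_entry(
    "Chitra. K",
    "In search of Green Consumer: A Perceptual Study",
    "Journal of Services Research",
    "Volume 7, No. 1",
    "April-September 2007",
    "173-191",
)

# A bibliography is arranged alphabetically by the authors' names.
bibliography = sorted([book, article], key=str.lower)
for entry in bibliography:
    print(entry)
```

Building every entry through the same function is one way to keep the reference style consistent throughout the report.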
Example of a journal article entry: Chitra. K, "In search of Green Consumer: A Perceptual Study", Journal of Services Research, Volume 7, No. 1, April-September 2007, pp. 173-191.

The above examples are just samples of bibliography entries. There are many other acceptable forms which can be used. However, a researcher should follow a consistent style of reference throughout the report.

Appendix
The appendix contains information of a subordinate, supplementary or highly technical nature that the researcher does not want to place in the body of the report. Each appendix should be clearly separated from the others and should be listed in the table of contents. The guidelines for preparing the appendix are:

- Each appendix item should be referred to at the appropriate place in the body of the report.
- In short reports, the page numbers may be continued in sequence from the last page of the body.
- In long reports, a separate pagination system can be followed, as the appendixes are often identified as Appendix A, Appendix B and so on. The page numbers can be given along with the appropriate letter: A-1, A-2, B-1, B-2.
- The illustrations in the appendix may continue with the sequence started in the body of the report.
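The two pagination schemes for appendixes can be sketched as follows. The function names and the sample page counts are hypothetical and serve only to illustrate the numbering.

```python
from string import ascii_uppercase


def appendix_page_labels(pages_per_appendix):
    """Long reports: label pages A-1, A-2, B-1, ... per appendix."""
    labels = []
    for letter, count in zip(ascii_uppercase, pages_per_appendix):
        labels.extend(f"{letter}-{n}" for n in range(1, count + 1))
    return labels


def continued_page_numbers(last_body_page, appendix_pages):
    """Short reports: continue the sequence from the last page of the body."""
    return list(range(last_body_page + 1, last_body_page + appendix_pages + 1))


# Hypothetical report: two appendixes of two pages each.
long_report = appendix_page_labels([2, 2])
# Hypothetical short report: body ends on page 24, appendix has three pages.
short_report = continued_page_numbers(24, 3)
print(long_report)
print(short_report)
```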
Index
The index should be included after the bibliography and the appendix. It acts as a good guide to the reader. The index may be prepared both as a subject index and as an author index. The subject index gives the names of the subject topics or concepts along with the numbers of the pages on which they appear or are discussed in the report. The author index gives similar information regarding the names of the authors. The index should always be arranged alphabetically. An index is not required for an unpublished thesis or report. If the findings in the report are subsequently published as a book, monograph or bulletin, an index is necessary.

5.8 Research proposal
A research proposal is also a type of research report, prepared to obtain permission to proceed with the research work. It is a work plan, outline, statement of intent or draft plan of the proposed research work. It gives an insight into what research is to be done, why, how, where and for whom. It is a road map showing all the elements of the research process and the resources required at every step from beginning to end. The preparation of a research proposal benefits both the researcher and the research sponsor. The proposal enables the sponsor to assess the research design and its validity. The sincerity of the researcher can be evaluated by comparing the completed work with the proposal. It serves as the basis for additional discussion of the problem identified. The proposal benefits the researcher more than the sponsor, as it requires the researcher to plan and review the logical steps involved in the research process. This enables the researcher to revise the research process where needed. It also acts as an outline for preparing the final project report. The research proposal can be prepared for internal or external audiences. An internal proposal is prepared by the research department or staff within the firm.
An external proposal is sanctioned by outside sponsors such as government agencies, the University Grants Commission and the like.

5.8.1 Structure of the research proposal
The research proposal may include the following modules. The modules are flexible: the contents and length can be altered to suit the requirements of the researcher and the sponsoring agency. A brief overview of the contents is given below.

Executive summary
The executive summary enables the sponsors to understand the core of the research proposal within a short time. The goal of the summary is to secure a positive evaluation by the sponsors who will authorize the research work. It should include a brief background of the proposed research work, its importance, the objectives, the proposed research design, the deliverables and the implications of conducting the research work. It should highlight the benefits of conducting the proposed research.

Problem statement
This section should provide the background of the problem, its consequences and its implications for the management or the sponsor. The importance of finding an answer to the research question should be asserted. The section should also specify the boundaries of the problem and the issues which will not be addressed. The problem statement should be clear enough for the management to decide on its significance and on the future action required to solve it.

Research objectives
This section highlights the purpose of conducting the research. It should give specific, concrete and achievable goals. The objectives should be listed in order of importance, or they can be stated in general terms first, with the specific objectives highlighted later. This section is the core of the proposed research work and also of the final research report.

Review of literature
This module examines and presents the recent research studies, industry reports etc. that support the proposed study. Unnecessary information should be avoided. A brief review of the information of interest should be highlighted. The objectives, methodology, results and conclusions of similar studies should be presented.
The researcher should discuss how the literature applies to the proposed study and the gap which will be addressed by conducting the study.

Benefits of the study
The explicit benefits that can be gained by conducting the study should be highlighted, and the importance of doing the study emphasized. This section gains more importance if the proposal is submitted to an external body, particularly if it is an unsolicited proposal. The section should be geared to convincing the sponsors that their needs will be met by the conduct of the study.

Research design
The design module describes the technical issues involved in conducting the study. What is going to be done is described in technical terms. It can be divided into subsections, viz., type of study, sampling design, data collection method, tools for analysis, scope of the study and limitations. The justification for choosing a particular method of sampling or data collection should also be discussed.

Qualifications of the researcher
This section should provide the names of the principal investigator, the co-investigators and the other individuals involved in the project. The professional research competence and experience of the researchers should be highlighted to assure the sponsor. The academic experience, research experience and similar projects conducted for internal and external agencies should be listed. The researcher's membership of various associations and other relevant accomplishments can be mentioned. A profile of the researcher can be enclosed in the appendix.

Budget
The budget should be prepared in the format required by the sponsoring agency. The details to be presented in the budget vary depending on the sponsor's requirements. It should not be more than one or two pages long. All the expenses should be presented with a proper break-up.

Schedule
The schedule should indicate the major phases of the project, the time required for each phase and the milestones that determine the completion of the project. For example, the major phases may be refining the problem based on interaction with management, tuning the objectives, designing the questionnaire, conducting a pilot study, data collection, analysis and interpretation, and report writing. Each of the phases should be presented along with its time schedule and the resources, including the people, assigned to complete the work.

Facilities and special resources
The special facilities or resources needed to complete the project should be described in detail along with the justification for them. The proposal should carefully list the relevant facilities and the resources that will be used. The costs of such facilities should also be detailed in the budget.

Apart from the above, the bibliography listing the books, journals and websites referred to should be given in alphabetical order. The appendixes, including the glossary of terms, the questionnaire, the profile of the investigator etc., should be prepared.
For a detailed discussion of these sections, refer to the integral parts of the research report.

5.9 Visual aids in reports
Visual aids are an essential part of a report. Carefully presented visual aids can make the report more interesting and understandable. Visual aids have the simple purpose of revealing the data. They make complex data, and the interrelationships among data, easier to understand. They also make it possible to view data from different perspectives. The purposes of using visuals are:
- To summarize data and present information in concise form.
- To give readers an opportunity to explore the data on their own. Readers can focus on any aspects that are relevant to their needs.
- To orient the readers to the topic even before the text is presented.
- To communicate effectively with a diverse audience.
- To attract and hold the readers' attention.
- To help the reader understand the text description and quantitative information better.
- To simplify information by breaking complicated descriptions into components that can be depicted with conceptual models, flowcharts, diagrams etc.
- To summarize the major points in a narrative with the help of charts that sum up the data.
- To enhance the retention of important messages in the readers' minds by presenting them in visual form.

5.9.1 Steps in creating visual aids
The steps involved in creating visual aids, viz., planning, drafting, finishing and discussing, are presented below.

Planning the visuals
A visual aid is an opportunity to present data and to engage the reader. The overall goal of presenting visuals is to help the reader find the needed information. While planning the visuals the following aspects should be considered:

- The level of knowledge of the reader
- The need for the information
- The researcher's goal in presenting the visuals
The researcher should consider the following aspects;

- Each visual aid has one main point to communicate
- The time needed for preparing a clear visual
- The layout of the visuals
Drafting the visuals
Drafting involves producing the visuals and revising them until they present the data in the most effective manner. It involves selecting the type of visual, the wording, tick marks, data line characteristics, type of legend, colours etc. The various elements are selected and tried until the best visual is produced to present the concept clearly.

Finishing the visuals
Care should be taken to finish all the visuals in a consistent manner. The visuals should be pleasant to look at. Cluttering pages with unnecessary visuals, or using visuals for simple facts that could easily be understood from the text, should be avoided.

Discussing the visual aids
The reader's attention should be carefully guided to the visual aids being discussed. A visual aid can be described at three levels, viz., elementary, intermediate or overall information. In addition, the background which necessitates the visual, the methodology (the aspects used to represent the various components in the visual) and the overall significance of the information derived from the visual should be explained. The visual can be referred to by number. If it is placed several pages before or after the reference, the page number should be mentioned along with the visual number. Either the textual or the parenthetical method can be followed in referring to visuals. A textual reference is a statement in the text that calls attention to the visual aid. For example: Table 1 shows that the price of the product is declining gradually. In a parenthetical reference the visual aid is referred to in parentheses within a sentence. For example: Sales show an increasing trend (see Figure 22).

5.9.2 Guidelines for creating effective visual aids
The following guidelines can lead to the creation of effective visual aids.
- The researcher should plan the visual aids and develop them as early as the planning of the first draft of the report.
- Each visual aid should be prepared in such a manner that it concentrates on conveying one point only. If too much data is included, the reader may not be able to grasp the meaning clearly.
- The visual aids should be positioned in the report at logical and convenient places.
- The visuals should be revised to eliminate clutter in terms of unnecessary words, lines, three-dimensional effects etc.
- High quality visuals should be created in terms of clarity of lines, words, numbers and organization, as quality is an important aspect which determines the effectiveness of a report.
Various types of visuals are available to present the data. Some types of visuals depict certain kinds of data better than others;
- Tables can be used to present detailed, exact values.
- Frequencies and percentages can be represented better with a pie chart, segmented bar chart or area chart.
- A line chart or bar chart can be used to illustrate a trend over a time period.
- A bar chart is used to compare one item with another.
- A pie chart is used to compare one part with the whole.
- A line chart, bar chart or scatter chart can be used to depict correlations.
- A map is used to show geographical relationships.
- A flowchart or diagram is used to illustrate a process or a procedure.
5.9.3 Types of visuals
The various types of visuals are discussed in detail below.

1. Tables
A table is a collection of information presented in columns and rows. A table should contain enough information to enable readers to understand its contents. It should have a caption that contains the table number and title, rules, column heads, data and notes. The title should explain the subject of the table, details regarding the data classification and the time period or other related matters. A subtitle is sometimes included under the title to explain some aspect of the table, such as the measurement units in which the data are expressed. The contents of the columns are explained by the column heads and the contents of the rows are explained by the stub. The body of the table contains the data and the footnotes contain the needed explanation. Footnotes should be identified by letters or by symbols such as asterisks. The source note should also be presented. Tables should be accompanied by text that directs the reader's attention to the important figures. The guidelines relating to the creation of tables are given below:
- The tables should be numbered consecutively throughout the report.
- The number and the title are given above the table.
- The table title should be informative and identify the main points of the table.
- Horizontal rules are used to separate the parts of the table. The rules are placed above and below the column heads and below the last row of the table.
- Vertical lines can also be used to separate the columns.
- Spanner heads should be used to group related column headings. The spanners eliminate repetition in column headings.
- Common, understandable units should be used. All items in a column should be expressed in the same unit and rounded off for simplicity.
- Column or row totals should be provided where needed.
- Explanatory comments should be placed below the table with the word Note.
- The source of the data given in the table should be mentioned.
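Several of the table guidelines above (number and title above the table, horizontal rules around the column heads and below the last row, column totals, a note and a source line) can be shown in a short plain-text sketch. The function `render_table` and all the figures in the sample are hypothetical, invented purely for illustration.

```python
def render_table(number, title, heads, rows, note=None, source=None,
                 total_label=None):
    """Render a simple text table; the first column is the stub."""
    body = [[str(cell) for cell in row] for row in rows]
    if total_label is not None:
        # Column totals for every data column after the stub column.
        totals = [sum(row[i] for row in rows) for i in range(1, len(heads))]
        body.append([total_label] + [str(t) for t in totals])
    widths = [max(len(heads[i]), *(len(row[i]) for row in body))
              for i in range(len(heads))]

    def fmt(cells):
        stub = cells[0].ljust(widths[0])
        data = [c.rjust(w) for c, w in zip(cells[1:], widths[1:])]
        return "  ".join([stub] + data)

    rule = "-" * len(fmt(heads))
    # Number and title go above the table; rules sit above and below
    # the column heads and below the last row.
    lines = [f"Table {number}: {title}", rule, fmt(heads), rule]
    lines += [fmt(row) for row in body]
    lines.append(rule)
    if note:
        lines.append(f"Note: {note}")
    if source:
        lines.append(f"Source: {source}")
    return "\n".join(lines)


# Hypothetical data, all in the same unit (thousands of units sold).
table = render_table(
    1,
    "Units sold by region, in thousands",
    ["Region", "Q1", "Q2"],
    [["North", 12, 15], ["South", 9, 11]],
    note="Figures are hypothetical.",
    source="Illustrative data only",
    total_label="Total",
)
print(table)
```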
2. Line graphs
A line graph depicts trends or relationships. It shows the relationship between two variables by a line connecting points plotted against the X axis and the Y axis. Line graphs usually show trends over time. The line connects the points, and its ups and downs illustrate the changes. Line graphs have conventional parts: a caption that contains the number and title, axis rules, axis labels and a legend. Some guidelines for creating line graphs are given below:
- The figures should be numbered consecutively using Arabic numerals.
- A brief, clear title should be used to specify the content of the graph.
- The caption can be given either above or below the figure, but a consistent pattern should be followed throughout the report.
- The independent variable is recorded on the X axis and the dependent variable on the Y axis.
- Clear axis labels should be provided.
- If the graph has more than one line, a visual distinction should be made between the lines. The lines should also be identified with labels or in a legend.

A surface chart, also called an area chart, is a form of line chart with a cumulative effect; all the lines add up to the top line, which represents the total. This form of chart helps to illustrate changes in the composition of something over time. In preparing a surface chart, the most important segment should be placed on the baseline and the number of strata should be restricted to four or five.

3. Bar graphs
A bar chart depicts numbers by the height or length of its rectangular bars. It makes numbers easy to read and understand. Bar charts are useful to:

- Compare the size of several items at one time
- Track changes over time
- Indicate the composition of several items over time
- Show the relative size of the components of a whole
The guidelines for preparing bar charts are:
- A proper title that informs the reader about the content should be given, and the graphs should be numbered consecutively in Arabic numerals.
- The independent variable is placed on the horizontal axis and the dependent variable on the vertical axis.
- Proper axis labels and descriptions of legends should be used in a consistent manner.
- Elaborate cross-hatching and striping should be avoided.
- Bars can be subdivided to show additional comparisons.
A bar chart can be created in many ways depending on the need and creativity of the researcher. However, care should be taken to see that the widths of all the bars are uniform and that the bars are placed evenly in a logical order.

4. Pie charts
A pie chart is used to show the relative sizes of the parts of a whole. It uses segments of a circle to indicate percentages of a total. The whole circle represents 100 percent, and each segment of the circle represents one item's percentage of the total. Pie charts are an effective way to show percentages or to compare one segment with another. General guidelines are:
- A pie chart should not be divided into more than five segments, as the reader may have difficulty in differentiating the sizes of small segments.
- Segments should be identified with legends or callouts.
- The segments should be arranged clockwise in sequence from largest to smallest.
- Different colours or patterns can be used to distinguish the various pieces.
- All the segments put together should add up to 100 percent, if percentages are used.
- Percentages can be placed inside the segments.
- The segment which needs greater attention can be exploded, i.e., pulled out from the rest of the segments.
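The arithmetic behind these pie chart guidelines can be sketched as below: raw counts are converted into percentages of the total, ordered from largest to smallest, and limited to five segments. Merging the smallest items into a single "Other" segment is an assumption made for this sketch, not a rule stated in this unit, and the brand counts are hypothetical.

```python
def pie_segments(counts, max_segments=5):
    """Return (label, percentage) pairs ordered from largest to smallest."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) > max_segments:
        # Assumption: fold the smallest items into one "Other" segment
        # so that the chart has no more than five segments.
        kept = ordered[:max_segments - 1]
        other = sum(value for _, value in ordered[max_segments - 1:])
        ordered = kept + [("Other", other)]
    return [(label, round(100 * value / total, 1)) for label, value in ordered]


# Hypothetical survey counts for six brands.
counts = {
    "Brand A": 40, "Brand B": 25, "Brand C": 15,
    "Brand D": 10, "Brand E": 6, "Brand F": 4,
}
segments = pie_segments(counts)
for label, share in segments:
    print(f"{label}: {share}%")
```

With rounded percentages the segments may not always total exactly 100, so the sum should be checked before the chart is drawn.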
5. Pictograms
A chart that uses symbols instead of words or numbers to portray data is known as a pictogram. It is a novel way of presenting data, and it conveys a more literal visual message. Pictograms enhance a report's value.

6. Flow charts
Flow charts are used to show a time sequence, a decision sequence or conceptual relationships. Flowcharts are indispensable for illustrating processes, procedures and sequential relationships. Arrows indicate the direction of the action, and symbols represent steps or particular points in the action. In computer programming, the symbols have special shapes for certain activities.

7. Organization charts
An organization chart illustrates the positions, units or functions of an organization and the way they interrelate. An organization's normal communication channels can be explained in detail with the help of an organization chart.
8. Decision charts
A decision chart or decision tree is a flow chart that uses graphs to explain whether or not to perform a certain action in a certain situation. At each point, the reader must decide yes or no and then follow the appropriate path until the final goal is reached.

9. Gantt charts
A Gantt chart represents the schedule of a project. Units of time are represented along the horizontal axis and the sub-processes are listed on the vertical axis. The lines indicate the starting and ending points of each sub-process.

10. Maps
Maps are used to represent statistics by geographical area. They are also used to show location relationships. For example, maps can be used to show regional differences in the sales of a company. Maps can be illustrated to suit the need, using dots, shading, colour, lines, labels, numbers and symbols. Computer software such as Excel and CorelDRAW has templates that make the production of maps easier.

11. Photographs
Photographs capture the exact appearance of an object and use visual appeal to capture the reader's attention. Advances in technology such as digital cameras have drastically reduced the cost of including photographs. Photographs can also be modified as required with the help of software. A photograph duplicates the item to be discussed and also shows the relationships among its various parts. However, photographs reduce three-dimensional reality to two dimensions. Photographs can be used to provide a general introduction that orients the reader towards the object.

12. Drawings and diagrams
Drawings and diagrams are often used to show how something looks or operates. Diagrams can be much clearer than words in explaining to readers a process or the uses of an object. A variety of software programs can be used to add a decorative touch to the report. Drawings and diagrams make it possible to eliminate unnecessary details so that readers can focus on the important aspects.
Two commonly used drawings are the exploded view and the detailed drawing. An exploded view shows the parts disconnected but arranged in the order in which they fit together. Exploded views are used to show the internal parts of a small and intricate object or to explain how objects are assembled. Manuals often use exploded drawings with named or numbered parts. Detailed drawings are renditions of particular parts or assemblies.

Have you understood?
- Discuss the need for understanding the audience while preparing the research report.
- What are the steps involved in writing a research report?
- Discuss the contents of a report.
- Prepare a research proposal for identifying the market potential of a new product launched by your concern.
- What types of visual aids can be used for presenting a report on customer satisfaction with a new brand of laptop introduced in the market by your concern?

SUMMARY
The research report is prepared to communicate the research findings. This unit covered the different types of reports. The importance of audience analysis was explained. The steps involved in the preparation of a report and the integral parts of the report were discussed. The contents of a research proposal were highlighted. In addition, the basic guidelines for using visual aids and the various types of visual aids were dealt with.