# Chapter 6. Data analysis

6.1. Information systems used for data analysis
6.2. Descriptive statistics
6.3. Inferential statistics

**6.1. Information systems used for data analysis**

SPSS (Statistical Package for the Social Sciences) is widely used in marketing research for data analysis.

It is used mainly for data gathered with questionnaires, but also for other quantitative data (official statistics, company records, etc.). The resulting information is presented as tables and charts. SPSS offers multiple ways of analysing data, such as summarizing data, transforming variables, and running statistical tests.

The flow of using the SPSS system for information processing:

1. Data gathering
2. Creating the SPSS data base
3. Selecting the procedure of data analysis
4. Selecting the variables for analysis
5. Data processing in order to obtain the information

Data gathering

Data gathering depends on the research method: surveys, statistical data bases, company records, etc. Avoiding data gathering errors is very important for the success of the research. The researcher should pay special attention to:

- Proper training of the operators who collect the data.
- Verification in the field to ensure that the interviewers follow the sampling procedures.
- Checking the data records to determine whether interviewers are cheating.
- Secondary data: official statistics, questionnaire.

Creating the SPSS data base

To create a data base in SPSS, the following steps are followed:

1. Opening a new file
2. Defining the variables of the research
3. Recording the data in the data base
4. Verifying the recorded data

Start/ Programs/ SPSS for Windows

[Figure: A new empty data base]

[Figure: The window for defining variables]

[Figure: Setting the type of data]

[Figure: Defining the codes for response categories]

[Figure: Defining the codes for missing responses]

Coding data

Coding is the process of identifying and assigning numerical scores or other character symbols to data expressed in words.

- Codes allow data to be processed by computers.
- Codes facilitate the introduction of data into data bases.
- Coding depends on the type of scale used in the questionnaire.

Ex: Nominal scale

What brand of cigarettes do you smoke most often?

- Winston (1)
- L&M (2)
- Kent (3)
- Marlboro (4)
- Winchester (5)
- Viceroy (6)
- Other. Please specify ________ (7)

Attention: The assigned codes do not represent an order or a specific quantity. They are allotted only to identify a response category (like the numbers of football players).

Binary (dichotomous) scale, a particular case:

Do you smoke? Yes (1) No (0)
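The nominal coding above can be sketched as a small codebook lookup. This is only an illustration: the `MISSING_CODE` value of 99, the `encode_brand` helper, and the sample answers are assumptions made for the example, not part of the original material.

```python
# Codebook for "What brand of cigarettes do you smoke most often?"
# The codes only identify categories; they carry no order or quantity.
BRAND_CODES = {
    "Winston": 1,
    "L&M": 2,
    "Kent": 3,
    "Marlboro": 4,
    "Winchester": 5,
    "Viceroy": 6,
    "Other": 7,
}
MISSING_CODE = 99  # hypothetical code reserved for a missing response


def encode_brand(answer):
    """Return the numeric code for a raw answer, or MISSING_CODE if it is blank."""
    if answer is None or not answer.strip():
        return MISSING_CODE
    # Any brand that is not listed falls into the "Other" category.
    return BRAND_CODES.get(answer.strip(), BRAND_CODES["Other"])


raw_answers = ["Kent", "Marlboro", "", "Camel"]  # made-up responses
coded = [encode_brand(a) for a in raw_answers]
print(coded)
```

Keeping the missing-response code outside the range of valid codes makes it easy to exclude those cases later.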

Ex: Ordinal scale

1. The rank order scale according to a characteristic

Please rank the following 5 brands of laundry detergent according to your preference (give rank 1 to the most preferred brand, rank 2 to the second most preferred brand, and so on, down to rank 5 for the least preferred brand).

OMO ___ ARIEL ___ DERO ___ PERSIL ___ TIDE ___

Coding: in this case a variable is defined for every response category. The rank assigned by each respondent (from 1 to 5) is entered in the data base.

Attention: for ordinal scales, the assigned codes generate an order.

2. Likert scale

Please indicate your opinion related to the following statement: "When somebody chooses a laundry detergent, the price is the most important, all brands having about the same whitening power".

- strongly agree (5)
- agree (4)
- neither agree nor disagree (3)
- disagree (2)
- strongly disagree (1)

3. Semantic differential

How important is the quality-price ratio when you choose a brand of laundry detergent?

- very important (5)
- important (4)
- neither important nor unimportant (3)
- not important (2)
- not at all important (1)

4. Numerical scale

How satisfied are you with the whitening power of Ariel laundry detergent?

Very satisfied 5 4 3 2 1 Very dissatisfied

Usually, in this case only the extreme values are coded (1 = very dissatisfied, 5 = very satisfied).
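The ordinal codings described above can be sketched in a few lines. The helper names (`encode_likert`, `is_valid_ranking`, `ranking_to_row`) and the sample respondent are hypothetical, and ranks are assumed to be entered as integers.

```python
# Likert labels mapped to their codes; on an ordinal scale the codes express an order.
LIKERT_CODES = {
    "strongly agree": 5,
    "agree": 4,
    "neither agree nor disagree": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

# Rank order scale: one variable (column) per brand.
BRANDS = ["OMO", "ARIEL", "DERO", "PERSIL", "TIDE"]


def encode_likert(label):
    """Return the code of a Likert response label (case-insensitive)."""
    return LIKERT_CODES[label.strip().lower()]


def is_valid_ranking(ranks):
    """A ranking is valid if every brand is ranked and each rank 1..5 is used once."""
    values = [ranks.get(brand) for brand in BRANDS]
    if any(v is None for v in values):
        return False
    return sorted(values) == list(range(1, len(BRANDS) + 1))


def ranking_to_row(ranks):
    """Return the ranks in the fixed column order used by the data base."""
    return [ranks[brand] for brand in BRANDS]


respondent = {"OMO": 3, "ARIEL": 1, "DERO": 5, "PERSIL": 2, "TIDE": 4}  # made-up answer
if is_valid_ranking(respondent):
    print(ranking_to_row(respondent))
```

Checking that each rank is used exactly once catches a common recording error (ties or skipped ranks) before the data reach the analysis stage.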

Interval scale

The midpoint of every interval is recorded in the data base.

Ex: How many cigarettes do you generally smoke during a day?

- 5-9 (7)
- 10-14 (12)
- 15-19 (17)
- 20-24 (22)
- 25-29 (27)

Ratio scale

For this type of scale, coding is not used. The exact value indicated by the respondent is recorded in the data base. This value serves both as the value of the variable and as the code of the response category.

Ex: How many hours do you study for an exam during the examination session? ____5 h____
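The midpoint coding for the interval example can be checked with a short calculation. The `midpoint` helper and the dictionary layout are illustrative choices, not something prescribed by the text.

```python
def midpoint(low, high):
    """Return the middle point of the interval [low, high]."""
    return (low + high) / 2


# Answer intervals from the cigarettes-per-day question.
intervals = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]

# Map each interval label to the code recorded in the data base.
codes = {f"{low}-{high}": midpoint(low, high) for low, high in intervals}
print(codes)
```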

Ex: Divide 100 points among the following brands according to your preference for each brand:

ARIEL __40__ DERO __20__ PERSIL __30__ TIDE __10__

Coding: in this case a variable is defined for every response category (as in the case of the rank order scale). The value assigned by each respondent is entered in the data base.
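When recording answers to the 100-point question above, a simple check is that the points of each respondent add up to 100. The sketch below assumes a hypothetical `is_valid_allocation` helper and uses the example values from the text.

```python
def is_valid_allocation(points, total=100):
    """Check that no brand gets negative points and that the points sum to `total`."""
    return all(v >= 0 for v in points.values()) and sum(points.values()) == total


# One variable per brand, holding the value assigned by the respondent.
allocation = {"ARIEL": 40, "DERO": 20, "PERSIL": 30, "TIDE": 10}
print(is_valid_allocation(allocation))
```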

**6.2. Descriptive statistics**

Descriptive analysis

Descriptive analysis refers to the transformation of raw data into a form that makes them easy to understand and interpret (summarizing data).

The most common ways to summarize data are:

- frequency distribution,
- percentage distribution,
- calculation of central tendency and variation indicators.

Charts can be associated with frequency tables in order to facilitate the understanding of the information.

Attention: Descriptive statistics are computed exclusively at the level of the sample, using the data collected from the sample members.

[Figure: Selecting the procedures of descriptive analysis in SPSS]

Frequency table

An arrangement of statistical data in a row-and-column format that exhibits the count of responses and the percentages for each category assigned to a variable.

General Happiness

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Very Happy | 467 | 30.8 | 31.1 | 31.1 |
| | Pretty Happy | 872 | 57.5 | 58.0 | 89.0 |
| | Not Too Happy | 165 | 10.9 | 11.0 | 100.0 |
| | Total | 1504 | 99.1 | 100.0 | |
| Missing | NA | 13 | 0.9 | | |
| Total | | 1517 | 100.0 | | |
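The percentages in the General Happiness frequency table can be recomputed from the counts. The sketch below is a plain-Python illustration, not SPSS output; the `frequency_table` function is a hypothetical helper. Percent is taken relative to all cases (including missing), valid percent relative to valid cases only.

```python
def frequency_table(counts, missing=0):
    """Return rows of (category, frequency, percent, valid percent, cumulative percent)."""
    valid_total = sum(counts.values())
    total = valid_total + missing
    rows = []
    cumulative = 0.0
    for category, freq in counts.items():
        percent = 100 * freq / total              # share of all cases
        valid_percent = 100 * freq / valid_total  # share of valid cases only
        cumulative += valid_percent
        rows.append((category, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


counts = {"Very Happy": 467, "Pretty Happy": 872, "Not Too Happy": 165}
rows = frequency_table(counts, missing=13)
for row in rows:
    print(row)
```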

Measures of central tendency: mode, mean, median

Mode: the response category with the highest frequency.

Median: the middle value when the data are arranged in ascending or descending order. It divides the sample into two equal groups (50% of the sample members are on the left and the other 50% on the right of the median).

Mean: the most commonly used measure of central tendency when data are measured with a ratio or interval scale.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{n}$$

For binary scale:

$$\bar{x} = \frac{1 \cdot f_{Yes} + 0 \cdot f_{No}}{n} = \frac{f_{Yes}}{n} = p$$

Mean score: a summarized rank used in the case of an ordinal scale for creating the final order of the analyzed categories. It is calculated like the mean, but it does not have the same properties.
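The measures of central tendency can be illustrated on a small frequency distribution. The data below are made up, and the sketch assumes that $f_i$ is the number of respondents who gave the value $x_i$, with the sum running over the distinct values and $n$ being the sample size.

```python
from statistics import median


def freq_mode(freq):
    """Return the value with the highest frequency."""
    return max(freq, key=freq.get)


def freq_mean(freq):
    """Return sum(x_i * f_i) / n, where n is the total number of responses."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n


def freq_median(freq):
    """Return the middle value after expanding the distribution into sorted raw data."""
    expanded = [x for x, f in sorted(freq.items()) for _ in range(f)]
    return median(expanded)


# Made-up answers on a 5-point scale: value -> number of respondents.
scores = {1: 5, 2: 10, 3: 20, 4: 40, 5: 25}
print(freq_mode(scores), freq_median(scores), freq_mean(scores))

# Binary scale: the mean equals the proportion p of "Yes" (coded 1) answers.
smokers = {1: 30, 0: 70}
print(freq_mean(smokers))
```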

Variation indicators: range, variance, standard deviation, standard error of mean

Range: measures the spread of the data.

$$\text{Range} = x_{\text{largest}} - x_{\text{smallest}}$$

Variance: the mean of the squared deviations from the mean.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 f_i}{n}$$

For binary scale:

$$s^2 = p(1 - p) \quad \text{or} \quad s^2 = p(100 - p)$$

where the second form applies when $p$ is expressed as a percentage.

Standard deviation: the square root of the variance. It is an indicator of sample homogeneity. It is expressed in the same units as the data.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 f_i}{n}}$$

For binary scale:

$$s = \sqrt{p(1 - p)} \quad \text{or} \quad s = \sqrt{p(100 - p)}$$

Standard error of mean: a measure of how much the value of the mean may vary from sample to sample taken from the same distribution.

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
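The variation indicators can be computed from a frequency distribution as in the sketch below. The data are made up, $f_i$ is assumed to be the frequency of the value $x_i$, and the variance divides by $n$ as in the formula above; many statistical packages divide by $n - 1$ instead, so their output may differ slightly.

```python
import math


def freq_mean(freq):
    """Return sum(x_i * f_i) / n."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n


def freq_range(freq):
    """Largest observed value minus smallest observed value."""
    return max(freq) - min(freq)


def freq_variance(freq):
    """Mean of the squared deviations from the mean (denominator n)."""
    n = sum(freq.values())
    mean = freq_mean(freq)
    return sum((x - mean) ** 2 * f for x, f in freq.items()) / n


def freq_std(freq):
    """Square root of the variance."""
    return math.sqrt(freq_variance(freq))


def standard_error(freq):
    """Standard deviation divided by the square root of the sample size."""
    n = sum(freq.values())
    return freq_std(freq) / math.sqrt(n)


scores = {1: 5, 2: 10, 3: 20, 4: 40, 5: 25}  # made-up 5-point scale answers
print(freq_range(scores), freq_variance(scores), freq_std(scores), standard_error(scores))

# Binary scale: the variance reduces to p * (1 - p).
smokers = {1: 30, 0: 70}
p = freq_mean(smokers)
print(freq_variance(smokers), p * (1 - p))
```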


The indicators of descriptive statistics that can be calculated for every type of scale:

- Nominal scale: Mode.
- Ordinal scale: Mode, Median, Mean score. The mean score is calculated like the mean, but it does not have the same properties. Variance and standard deviation cannot be calculated, because the distances between the scale levels are not equal.
- Interval scale: Mode, Median, Mean, Variance, Standard deviation, Standard error of mean.
- Ratio scale: the same as for the interval scale; in addition, one scale value can be divided by another (due to the existence of an absolute zero).

Exceptions:

- Binary scale: even though it is a nominal scale, the absence of a named characteristic represents an absolute zero. We can calculate: Mean, Variance, Standard deviation, Standard error of mean.
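The rules above can be written down as a lookup table, which is handy for checking an analysis plan before running it. The dictionary layout and the `is_permissible` helper are illustrative assumptions; the binary entry includes the mode because a binary scale is still a nominal scale.

```python
# Indicators that may be calculated for each type of scale, as listed above.
INTERVAL_INDICATORS = [
    "mode", "median", "mean", "variance",
    "standard deviation", "standard error of mean",
]

PERMISSIBLE = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median", "mean score"],
    "interval": INTERVAL_INDICATORS,
    "ratio": INTERVAL_INDICATORS,
    "binary": ["mode", "mean", "variance",
               "standard deviation", "standard error of mean"],
}


def is_permissible(scale, indicator):
    """Return True if `indicator` may be calculated for data on `scale`."""
    return indicator.lower() in PERMISSIBLE[scale.lower()]


print(is_permissible("ordinal", "variance"))
print(is_permissible("binary", "mean"))
```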