
Business Analytics

Computer Lab # 1
Data Preprocessing
Objectives:
1. To understand data cleaning (data preprocessing) so that outliers in a dataset can be
identified.
2. To understand how box plots are used to find outliers.

Contents:
• Introduction to Data and Data Analysis
• Types of data
• Measures of position
• Finding quartiles, deciles and percentiles in a dataset
• Interquartile range (IQR)

Assessment Mechanism
Assessment tools               Sessional
Lab Sheet + Lab Performance    15%
Project                        20%
Midterm Exam                   25%
Final Exam                     40%

Recommended:

• Reference: Lectures 1 and 2 of the BA class


Consider the following dataset
Dataset:
X = 23, 24, 225, 227, 228, 231, 33, 236, 240, 241, 42, 248, 250, 253, 257, 260, 263, 276, 67, 300, 815, 301

a) Determine the measures of position (the quartiles Q1 and Q3, and the interquartile range
IQR = Q3 - Q1) needed to find the allowed minimum and maximum values of the dataset.
b) Determine the outliers in the dataset, using both 1.5 × IQR and 0.5 × IQR as the margin
beyond the quartiles.
c) Verify that the cleaned data is free of outliers (again using both 1.5 × IQR and 0.5 × IQR).
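
As a rough illustration of parts (a) to (c), the sketch below works through the dataset X in Python rather than SPSS. It assumes the (n + 1)p percentile position rule, which is `method="exclusive"` in Python's `statistics.quantiles`; SPSS output or a box plot based on hinges may report slightly different quartiles. The function names `iqr_fences`, `split_outliers`, and `clean` are illustrative only, and repeating the check until no outliers remain is just one possible reading of part (c).

```python
from statistics import quantiles

X = [23, 24, 225, 227, 228, 231, 33, 236, 240, 241, 42, 248,
     250, 253, 257, 260, 263, 276, 67, 300, 815, 301]


def iqr_fences(values, k=1.5):
    """Return (allowed minimum, allowed maximum) = (Q1 - k*IQR, Q3 + k*IQR)."""
    # Assumption: quartiles use the (n + 1)p position rule.
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using one pass of the fences."""
    low, high = iqr_fences(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


def clean(values, k=1.5):
    """Remove outliers repeatedly until a pass finds none (assumed reading of part c)."""
    kept, removed = list(values), []
    while len(kept) >= 2:  # quartiles need at least two data points
        kept, outliers = split_outliers(kept, k)
        if not outliers:
            break
        removed.extend(outliers)
    return kept, removed


for k in (1.5, 0.5):
    low, high = iqr_fences(X, k)
    kept, removed = clean(X, k)
    print(f"k={k}: allowed range [{low}, {high}] on the raw data")
    print(f"  removed: {sorted(removed)}")
    print(f"  cleaned: {sorted(kept)}")
```

Because the fences are recomputed from the remaining values on every pass, a value that lies inside the first allowed range can still fall outside the range computed from the cleaned data, which is why the check in part (c) is worth doing.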

Deliverables

Calculations:
Screenshots of SPSS results
Take-home exercises (practice exercises)
Question No. 1
For the dataset given below, determine the allowed minimum and maximum values that avoid
outliers, and draw a box plot of the data.

Dataset:
X = 2, 12, 23, 63, 121, 123, 126, 131, 137, 138, 140, 147, 148, 150, 154, 164, 364, 356

Question No. 2
Determine the outliers (if any) in the following dataset without using the box plot approach.

Dataset:
X = 2, 37, 225, 227, 228, 231, 33, 236, 240, 241, 42, 248, 270, 253, 257, 260, 263, 276, 67, 300, 815, 901

Question No. 3
Determine the outliers (if any) in the following dataset using the box plot approach.

Dataset:
X = 23, 24, 225, 227, 228, 231, 33, 236, 240, 241, 42, 248, 250, 253, 257, 260, 263, 276, 67, 300, 815, 301

Question No. 4
For the following dataset, remove the outliers (if any). What is the cleaned data?

Group   Dataset
0       21
0       22
0       26
0       8
0       29
0       14
0       19
0       26
0       28
0       7
1       48
1       45
1       39
1       35
1       33
1       47
1       21
1       11
1       51
1       42
2       78
2       71
2       74
2       73
2       65
2       64
2       32
2       84
2       81
2       22
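
One possible way to organise the calculation for this table is to treat each group separately, as in the Python sketch below. The question does not say whether outliers should be judged within each group or over the whole column, so the per-group choice is an assumption, as are the helper name `group_outliers` and the (n + 1)p quartile rule (`method="exclusive"` in `statistics.quantiles`). A box plot built on hinges, as some SPSS output is, can flag different values.

```python
from collections import defaultdict
from statistics import quantiles

# (group, value) rows copied from the table above.
ROWS = [
    (0, 21), (0, 22), (0, 26), (0, 8), (0, 29),
    (0, 14), (0, 19), (0, 26), (0, 28), (0, 7),
    (1, 48), (1, 45), (1, 39), (1, 35), (1, 33),
    (1, 47), (1, 21), (1, 11), (1, 51), (1, 42),
    (2, 78), (2, 71), (2, 74), (2, 73), (2, 65),
    (2, 64), (2, 32), (2, 84), (2, 81), (2, 22),
]


def group_outliers(rows, k=1.5):
    """Return {group: [outlier values]} using Q1 - k*IQR and Q3 + k*IQR per group."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)

    result = {}
    for group, values in by_group.items():
        # Assumption: quartiles use the (n + 1)p position rule.
        q1, _median, q3 = quantiles(values, n=4, method="exclusive")
        margin = k * (q3 - q1)
        low, high = q1 - margin, q3 + margin
        result[group] = [v for v in values if v < low or v > high]
    return result


for k in (1.5, 0.5):
    print(f"k={k}: {group_outliers(ROWS, k)}")
```

The cleaned data is then whatever remains in each group after the flagged values are dropped; pooling all 30 values into one column before computing the quartiles would be an equally simple variation.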

Question No. 5
What type of measurement scale (data type) is used in each of the following cases?

1. High school men's soccer players classified by their athletic ability: Superior, Average, Above
average.

2. Baking temperatures for various main dishes: 350, 400, 325, 250, 300

3. The colors of crayons in a 24-crayon box.

4. Social security numbers.

5. Incomes measured in dollars


6. A satisfaction survey of a social website by number: 1 very satisfied, 2 somewhat satisfied, 3
not satisfied.

7. Political outlook: extreme left, left-of-center, right-of-center, extreme right.

8. Time of day on an analog watch.

9. The distance in miles to the closest grocery store.

10. The dates 1066, 1492, 1644, 1947, 1944.

11. The heights of 21 women, each 65 years old.

12. Common letter grades A, B, C, D, F
