
AFT2103

BUSINESS STATISTICS
FEBRUARY SEMESTER
SESSION 2020/2021
GROUP ASSIGNMENT
LOCAL BANK LOAN

LECTURER’S NAME : Dr. Nur Ain Ayuni Binti Sabri


TUTORIAL GROUP : L1T3
GROUP NUMBER : Group 18
VIDEO LINK : https://youtu.be/nTmmIWtc7Xg

Student’s Name Matric Number


Iryne Athirah Binti Shahrol Anuar A20A1372
Intan Farihah Binti Matali A20A1369
Siti Aminah Binti Daud A20A1951
BB Ellyna Binti Brahim A20A1285
Mohd Sukri Bin Jalapar A20A2160
Niran A/L Nadaraja A20A1597

Table of Contents

1. Data description
2. Graphical representation
3. Numerical descriptive measures
4. Statistical analysis
5. Theoretical modelling
6. Concluding remarks

1.0 DATA DESCRIPTION

Descriptive statistics is the analysis of data to describe, show, or summarize it in the form of
tables, graphs, and numbers. These summaries of the sample and its measurements help in
describing and understanding the characteristics of a data set, and they make large amounts of
data easier to interpret by reducing them to a few key figures.

LOCAL BANK LOAN

In our assignment, we selected local bank loans to review, analyze, and draw conclusions. We
chose to analyze four levels of education: did not complete high school, high school degree,
some college, and college degree. Through this study, we examine two questions about the
variables of interest: does the level of education affect the debt-to-income ratio, and does the
level of education affect household income?

Age group   Educational background          Household income ('000)   Debt-to-income ratio
≤24         Did not complete high school    49                        28.7
25-34       Did not complete high school    256                       96.3
35-44       Did not complete high school    559                       96.8
≥45         Did not complete high school    233                       25.8
≤24         High school degree              51                        49.2
25-34       High school degree              99                        54.1
35-44       High school degree              386                       37.2
≥45         High school degree              76                        14.6
≤24         Some college                    25                        4.8
25-34       Some college                    27                        9.8
35-44       Some college                    25                        10.2
≥45         Some college                    92                        12.3
≤24         College degree                  31                        6.6
25-34       College degree                  117                       26.4
35-44       College degree                  25                        19.7
≥45         College degree                  42                        9.5

Table 1.0: Age group, educational level, household income, and debt-to-income ratio of
50 respondents taken from local bank data.

Table 1.0 describes the bank loan data through four variables: age group, educational level,
household income, and debt-to-income ratio. In this study, educational level has four
categories (did not complete high school, high school degree, some college, and college
degree), and age group has four categories (≤24, 25-34, 35-44, and ≥45).

In the sample data in Table 1.0, the variable of interest, 'local bank loan', can be viewed from
the perspective of each of these fields. The questions we ask are: does educational level affect
household income, and does educational level affect the debt-to-income ratio?

2.0 Graphical Representation

Graphical representation refers to the use of intuitive charts to illustrate and simplify a data
set clearly. The data is entered into charting software and then represented with various
symbols, such as lines on a line chart, bars on a bar chart, or slices on a pie chart, from which
the reader can obtain a broader view than from numerical analysis alone.

Household Income

Class      Lower boundary   Upper boundary   Midpoint   Frequency   Cumulative frequency
14-40      13.5             40.5             27         28          28
41-67      40.5             67.5             54         17          45
68-94      67.5             94.5             81         2           47
95-121     94.5             121.5            108        2           49
122-148    121.5            148.5            135        0           49
149-175    148.5            175.5            162        0           49
176-202    175.5            202.5            189        1           50
                                                        ∑f = 50

The table above is the frequency distribution of household income. A frequency distribution is
a listing that shows the classes of values in the data and how often each occurs, and it can be
used to estimate the probability that an observation falls in a particular class. To decide the
number of classes (k), we used the rule 2^k > n, where n is the number of observations.
Household income has n = 50 observations, so k = 6, because 2^6 = 64 is greater than 50 while
2^5 = 32 is not.

Household Income

\[
2^k > n \;\Rightarrow\; 2^k > 50 \;\Rightarrow\; 2^6 = 64 > 50 \;\Rightarrow\; k = 6
\]

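Determine the class width for household income:

\[
c = \frac{\text{maximum entry} - \text{minimum entry}}{\text{number of classes}}
  = \frac{176 - 14}{6} = 27
\]

Because six classes of width 27 starting at 14 end at 175, a seventh class (176-202) is needed
to include the maximum entry of 176.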
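
The two calculations above can be sketched in Python. This is a minimal illustration, assuming the 2^k > n rule and a class width rounded up to a whole number; the function names `number_of_classes` and `class_width` are our own and are not part of the assignment.

```python
import math


def number_of_classes(n: int) -> int:
    """Return the smallest k such that 2**k > n (the 2^k > n rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_width(maximum: float, minimum: float, k: int) -> int:
    """Return (maximum - minimum) / k, rounded up to a whole number."""
    return math.ceil((maximum - minimum) / k)


k = number_of_classes(50)      # 2**6 = 64 > 50, so k = 6
c = class_width(176, 14, k)    # (176 - 14) / 6 = 27
print(k, c)
```

Rounding up is a common convention so that the classes cover the whole data range; here the division is exact, so no rounding is needed.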

From the frequency distribution, the following charts can be used to describe the household
income variable.

2.1 Histogram

1. A histogram is a bar graph that represents the frequency distribution of a data set.

2. It must have the following properties:

• The horizontal scale is quantitative and measures the data values (either class
midpoints or class boundaries).
• The vertical scale measures the frequencies of the classes.
• Consecutive bars must touch, so bars begin and end at class boundaries instead
of class limits.

3. Class boundaries are the numbers that separate classes without forming gaps between them.

4. Lower boundary = lower limit - 0.5

5. Upper boundary = upper limit + 0.5
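
As a rough sketch of rules 4 and 5, the class limits, boundaries, and midpoints for household income can be generated as follows. The helper name `build_classes` is hypothetical, and the sketch assumes whole-number class limits starting at 14 with a width of 27.

```python
def build_classes(first_lower: int, width: int, count: int):
    """Return (lower limit, upper limit, lower boundary, upper boundary, midpoint) per class."""
    classes = []
    for i in range(count):
        lower = first_lower + i * width
        upper = lower + width - 1           # whole-number limits, e.g. 14-40
        lower_boundary = lower - 0.5        # rule 4
        upper_boundary = upper + 0.5        # rule 5
        midpoint = (lower + upper) / 2
        classes.append((lower, upper, lower_boundary, upper_boundary, midpoint))
    return classes


rows = build_classes(first_lower=14, width=27, count=7)
for row in rows:
    print(row)
```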

Table of Histogram:

Class      Lower boundary   Upper boundary   Midpoint   Frequency
14-40      13.5             40.5             27         28
41-67      40.5             67.5             54         17
68-94      67.5             94.5             81         2
95-121     94.5             121.5            108        2
122-148    121.5            148.5            135        0
149-175    148.5            175.5            162        0
176-202    175.5            202.5            189        1
                                                        ∑f = 50

Graph of Histogram:

[Graph 1: Household income. Histogram with class midpoints (27 to 189) on the horizontal axis and frequency on the vertical axis.]

The histogram is drawn from the frequency table, with each bar labelled by its class midpoint.
Each bar spans the interval between its class boundaries, and its area is proportional to the
frequency of the class. Graph 1 shows the distribution of household income. The X-axis shows
the class midpoints and the Y-axis shows the frequencies. From the table, the class 14 - 40 has
the highest frequency, 28.

2.2 Frequency Polygon

i) To construct the frequency polygon, use the same horizontal and vertical scales that were
used in the histogram, with the horizontal axis labelled with class midpoints.

ii) Because the graph should begin and end on the horizontal axis, extend the left side to one
class width before the first class midpoint and extend the right side to one class width after
the last class midpoint.

Table of Frequency Polygon

Class         Midpoint   Frequency
(-13) – 13    0          0
14 – 40       27         28
41 – 67       54         17
68 – 94       81         2
95 – 121      108        2
122 – 148     135        0
149 – 175     162        0
176 – 202     189        1
203 – 229     216        0
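
The padding step in (ii) can be sketched in a few lines of Python. This is only an illustration of how the plotting points in the table are obtained; `polygon_points` is a hypothetical helper name.

```python
midpoints = [27, 54, 81, 108, 135, 162, 189]
frequencies = [28, 17, 2, 2, 0, 0, 1]
width = 27


def polygon_points(midpoints, frequencies, width):
    """Add one zero-frequency midpoint on each side so the polygon meets the axis."""
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))


points = polygon_points(midpoints, frequencies, width)
print(points)
```

The first and last points, (0, 0) and (216, 0), correspond to the extra classes (-13) – 13 and 203 – 229 in the table.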

Graph of Frequency Polygon:

[Graph 2: Frequency polygon of household income, with class midpoints (0 to 216) on the horizontal axis and frequency on the vertical axis.]

A frequency polygon is a graph constructed by using lines to join the points plotted at the
midpoint of each class. The heights of the points represent the frequencies. A frequency
polygon can be created from the histogram or by calculating the midpoints from the frequency
distribution table. Graph 2 shows household income. The polygon starts at midpoint 0 with
frequency 0. The second midpoint is 27 with frequency 28, the third is 54 with frequency 17,
followed by midpoint 81 (frequency 2), midpoint 108 (frequency 2), midpoint 135 (frequency 0),
midpoint 162 (frequency 0), and midpoint 189 (frequency 1); the last midpoint is 216 with
frequency 0. The highest point of the polygon is at midpoint 27, with frequency 28. Among the
classes within the data range, the lowest points are at midpoints 135 and 162, with frequency 0.

2.3 Ogive Graph

A cumulative frequency graph, or ogive, is a line graph that displays the cumulative frequency
of each class at its upper class boundary.

(i) Construct a frequency distribution that includes cumulative frequencies as one of
the columns.
(ii) Specify the horizontal and vertical scales.
➢ Horizontal: upper class boundaries
➢ Vertical: cumulative frequencies
(iii) Plot points that represent the upper class boundaries and their corresponding
cumulative frequencies.
(iv) Connect the points in order from left to right.
(v) The graph should start at the lower boundary of the first class (cumulative
frequency is 0) and should end at the upper boundary of the last class (cumulative
frequency is equal to the sample size).
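
Steps (i) to (v) can be sketched in Python using the class boundaries from the household income frequency table. The helper name `ogive_points` is our own; the sketch only produces the points to plot and does not draw the chart.

```python
from itertools import accumulate

upper_boundaries = [40.5, 67.5, 94.5, 121.5, 148.5, 175.5, 202.5]
frequencies = [28, 17, 2, 2, 0, 0, 1]


def ogive_points(first_lower_boundary, upper_boundaries, frequencies):
    """Pair each upper boundary with its cumulative frequency, starting from zero."""
    cumulative = list(accumulate(frequencies))
    start = [(first_lower_boundary, 0)]          # step (v): start at cumulative frequency 0
    return start + list(zip(upper_boundaries, cumulative))


points = ogive_points(13.5, upper_boundaries, frequencies)
print(points)
```

The last point has a cumulative frequency equal to the sample size, which is a quick check that no class was missed.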

Table of Ogive:

Class         Upper boundary   Frequency   Cumulative frequency
(-13) – 13    13.5             0           0
14 – 40       40.5             28          28
41 – 67       67.5             17          45
68 – 94       94.5             2           47
95 – 121      121.5            2           49
122 – 148     148.5            0           49
149 – 175     175.5            0           49
176 – 202     202.5            1           50

[Graph 3: Ogive chart of household income, with upper boundaries on the horizontal axis and cumulative frequency on the vertical axis.]

In this ogive of household income, the X-axis shows the upper class boundaries, each obtained
by adding 0.5 to the upper class limit. The Y-axis shows the cumulative frequencies, obtained by
adding up the class frequencies in order. Following the table, the graph rises from a cumulative
frequency of 0 at the start of the first class to 50, the sample size, at the upper boundary of
the last class.

3.0 Numerical Descriptive Measures

A numerical descriptive measure is a statistic that summarizes a characteristic of a data set
with a single number. For qualitative data, the primary numerical measure is the proportion or
percentage of data values in each category. For quantitative data, the most commonly used
numerical measures are the mean, median, mode, range, variance, and standard deviation.

3.1 MEAN

The mean is the sum of all values in a data set divided by the number of values; it is also
called the average. The formula for the mean of grouped data is

\[
\bar{x} = \frac{\sum (f x)}{n}
\]

where x = midpoint of a class
f = frequency of a class
n = total frequency (sample size)
MEAN OF HOUSEHOLD INCOME

CLASS        MIDPOINT, x   FREQUENCY, f   fx
14 - 40      27            28             756
41 - 67      54            17             918
68 - 94      81            2              162
95 - 121     108           2              216
122 - 148    135           0              0
149 - 175    162           0              0
176 - 202    189           1              189
                           Σf = 50        Σ(fx) = 2241

\[
\text{Mean} = \frac{\sum (f x)}{n} = \frac{2241}{50} = 44.82
\]
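
The same calculation can be checked with a short Python sketch. The function name `grouped_mean` is our own; it simply applies the grouped-data formula to the midpoints and frequencies in the table.

```python
midpoints = [27, 54, 81, 108, 135, 162, 189]
frequencies = [28, 17, 2, 2, 0, 0, 1]


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f * x divided by the total frequency."""
    n = sum(frequencies)
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    return total / n


mean = grouped_mean(midpoints, frequencies)
print(round(mean, 2))  # 44.82
```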

MEDIAN

The median is the value that lies in the middle of the data when the data set is ordered. For an
odd number of data entries, the median is the middle entry; for an even number of data entries,
the median is the mean of the two middle entries.

For grouped data, the median can be calculated using the following formula:

\[
\text{Median} = L_m + \left[ \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \right] \times c
\]

where L_m = lower class boundary of the median class
n = total frequency (sample size)
f_m = frequency of the median class
Σf_{m−1} = cumulative frequency before the median class
c = class size / class width

MEDIAN OF HOUSEHOLD INCOME

CLASS         FREQUENCY   CUMULATIVE FREQUENCY
(-13) - 13    0           0
14 - 40       28          28
41 - 67       17          45
68 - 94       2           47
95 - 121      2           49
122 - 148     0           49
149 - 175     0           49
176 - 202     1           50
              Σf = 50

Step 1: Determine the location of the median class

(n/2)th observation = (50/2)th observation = 25th observation

Therefore, the median class is 14 – 40, the first class whose cumulative frequency (28) reaches 25.

Step 2: Determine the lower class boundary: L_m = 13.5

Step 3: Determine class size / class width

c = 40.5 – 13.5 = 27

Step 4: Determine Σf_{m−1} and f_m

Σf_{m−1} = 0 and f_m = 28

Step 5: Calculate the median

\[
\text{Median} = L_m + \left[ \frac{\frac{n}{2} - \sum f_{m-1}}{f_m} \right] \times c
             = 13.5 + \left[ \frac{\frac{50}{2} - 0}{28} \right] \times 27
             = 37.61
\]
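
Steps 1 to 5 can be sketched as a small Python function. The name `grouped_median` is hypothetical, and the sketch assumes equal class widths and the lower class boundaries from the household income table.

```python
lower_boundaries = [13.5, 40.5, 67.5, 94.5, 121.5, 148.5, 175.5]
frequencies = [28, 17, 2, 2, 0, 0, 1]


def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data by interpolating inside the median class."""
    n = sum(frequencies)
    target = n / 2                      # position of the median observation
    cumulative_before = 0               # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative_before + f >= target:
            return lower + (target - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("frequencies must contain at least one observation")


median = grouped_median(lower_boundaries, frequencies, width=27)
print(round(median, 2))  # 37.61
```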

MODE

The mode is the data entry that occurs with the greatest frequency. If no entry is repeated, the
data set has no mode. If two entries occur with the same greatest frequency, each entry is a
mode and the data set is called bimodal. The formula for the mode of grouped data is:

\[
\text{Mode} = L_m + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] \times c
\]

where L_m = lower class boundary of the modal class

Δ₁ = the difference between the frequency of the modal class and the frequency of the class
before the modal class

Δ₂ = the difference between the frequency of the modal class and the frequency of the class
after the modal class

c = class size / class width

MODE OF HOUSEHOLD INCOME

CLASS         FREQUENCY
(-13) - 13    0
14 - 40       28
41 - 67       17
68 - 94       2
95 - 121      2
122 - 148     0
149 - 175     0
176 - 202     1
              Σf = 50

Step 1: Determine the location of the modal class (based on the highest frequency)

The modal class is 14 – 40

Step 2: Determine the lower class boundary, L_m:
L_m = 13.5

Step 3: Determine class size / class width
c = 40.5 – 13.5 = 27

Step 4: Determine Δ₁ and Δ₂
Δ₁ = 28 – 0 = 28
Δ₂ = 28 – 17 = 11

Step 5: Calculate the mode

\[
\text{Mode} = L_m + \left[ \frac{\Delta_1}{\Delta_1 + \Delta_2} \right] \times c
           = 13.5 + \left[ \frac{28}{28 + 11} \right] \times 27
           = 32.88
\]
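
A minimal Python sketch of the same steps is shown below. The name `grouped_mode` is our own, and the sketch assumes there is a single modal class and that a missing neighbouring class counts as frequency 0, as in the table above.

```python
lower_boundaries = [13.5, 40.5, 67.5, 94.5, 121.5, 148.5, 175.5]
frequencies = [28, 17, 2, 2, 0, 0, 1]


def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data, assuming one modal class."""
    i = max(range(len(frequencies)), key=lambda j: frequencies[j])  # modal class index
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    delta1 = frequencies[i] - before
    delta2 = frequencies[i] - after
    return lower_boundaries[i] + delta1 / (delta1 + delta2) * width


mode = grouped_mode(lower_boundaries, frequencies, width=27)
print(round(mode, 2))  # 32.88
```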

MEASURES OF POSITION

To determine measures of position, the data must be sorted from lowest to highest. The measures
of position are quartiles, which divide the data set into four (4) equal parts; deciles, which
divide it into ten (10) equal parts; and percentiles, which divide it into one hundred (100)
equal parts.

Quartile

A quartile is one of the three points that divide a data set into four equal groups. The first
quartile (Q₁) is the lower quartile, which cuts off the lowest 25% of the data; it is equal to
the 25th percentile. The second quartile (Q₂) is the median, which cuts the data set in half; it
is equal to the 50th percentile. The third quartile (Q₃) is the upper quartile, which cuts off
the highest 25% of the data, or the lowest 75%; it is equal to the 75th percentile. The
difference between the upper and lower quartiles is called the interquartile range.

QUARTILE OF HOUSEHOLD INCOME

CLASS         FREQUENCY   CUMULATIVE FREQUENCY
(-13) - 13    0           0
14 - 40       28          28
41 - 67       17          45
68 - 94       2           47
95 - 121      2           49
122 - 148     0           49
149 - 175     0           49
176 - 202     1           50
              Σf = 50

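Step 1: Find the location of Q₁ and Q₃

Q₁ position = N/4 = 50/4 = 12.5, class interval: 14 – 40
Q₃ position = 3N/4 = 3(50)/4 = 37.5, class interval: 41 – 67

The Q₃ class is 41 – 67 because it is the first class whose cumulative frequency (45) reaches 37.5.

Step 2: Find the value of Q₁ and Q₃

\[
Q_1 = L_{Q_1} + \left[ \frac{\frac{N}{4} - F}{f_{Q_1}} \right] \times c
    = 13.5 + \left[ \frac{12.5 - 0}{28} \right] \times 27
    = 25.55
\]

\[
Q_3 = L_{Q_3} + \left[ \frac{\frac{3N}{4} - F}{f_{Q_3}} \right] \times c
    = 40.5 + \left[ \frac{37.5 - 28}{17} \right] \times 27
    = 55.59
\]

where F is the cumulative frequency before the quartile class.

Step 3: Interquartile Range = Q₃ - Q₁

= 55.59 – 25.55

= 30.04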
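
The quartile steps can be sketched in Python by locating the quartile class from the cumulative frequencies and then interpolating inside it. The name `grouped_quantile` is hypothetical; the quartiles are rounded to two decimals before subtracting, as in a hand calculation.

```python
lower_boundaries = [13.5, 40.5, 67.5, 94.5, 121.5, 148.5, 175.5]
frequencies = [28, 17, 2, 2, 0, 0, 1]


def grouped_quantile(lower_boundaries, frequencies, width, fraction):
    """Quantile of grouped data; fraction is 0.25 for Q1 and 0.75 for Q3."""
    n = sum(frequencies)
    target = fraction * n               # position of the quantile observation
    cumulative_before = 0               # F: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative_before + f >= target:
            return lower + (target - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("frequencies must contain at least one observation")


q1 = round(grouped_quantile(lower_boundaries, frequencies, 27, 0.25), 2)
q3 = round(grouped_quantile(lower_boundaries, frequencies, 27, 0.75), 2)
iqr = round(q3 - q1, 2)
print(q1, q3, iqr)  # 25.55 55.59 30.04
```

Because Q₃ is located in the class where the cumulative frequency first reaches 3N/4, the result always lies between that class's boundaries and cannot be negative.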

3.3 Measures of Variation

Measures of variation describe how much the values in a data set differ from one another. Three
measures commonly used to quantify variation are the range, the variance, and the standard
deviation.

3.3.1 Range

The range is the difference between the maximum and minimum data entries in the set. The
formula for the range is:

Range = (Maximum data entry) – (Minimum data entry)

For the grouped household income data, using the upper limit of the last class and the lower
limit of the first class:

Range of household income = 202 – 14

= 188

3.3.2 Variance and Standard Deviation

The variance is the average of the squared differences from the mean, and the standard
deviation, a measure of how spread out the numbers are, is the square root of the variance.

Formula for the variance of grouped data:

\[
s^2 = \frac{1}{n-1} \left[ \sum f x^2 - \frac{\left( \sum f x \right)^2}{n} \right]
\]

Formula for the standard deviation:

\[
s = \sqrt{s^2}
\]

Variance and standard deviation of household income

Here fx² is the frequency multiplied by the squared midpoint, f × x².

Class        f     x      x²       fx     fx²
14 - 40      28    27     729      756    20412
41 - 67      17    54     2916     918    49572
68 - 94      2     81     6561     162    13122
95 - 121     2     108    11664    216    23328
122 - 148    0     135    18225    0      0
149 - 175    0     162    26244    0      0
176 - 202    1     189    35721    189    35721

∑f = 50    ∑x² = 102060    ∑fx = 2241    ∑fx² = 142155

\[
s^2 = \frac{1}{n-1} \left[ \sum f x^2 - \frac{\left( \sum f x \right)^2}{n} \right]
    = \frac{1}{50-1} \left[ 142\,155 - \frac{5\,022\,081}{50} \right]
    = 851.29
\]

\[
s = \sqrt{851.29} = 29.18
\]
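
A short Python sketch of the grouped variance and standard deviation is given below, with fx² taken as f × x² (frequency times the squared midpoint). The function name `grouped_variance` is our own.

```python
import math

midpoints = [27, 54, 81, 108, 135, 162, 189]
frequencies = [28, 17, 2, 2, 0, 0, 1]


def grouped_variance(midpoints, frequencies):
    """Sample variance of grouped data using the computational formula."""
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))  # f times x squared
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


variance = grouped_variance(midpoints, frequencies)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 851.29 29.18
```

A standard deviation of about 29 is in line with the standard error of the estimate (25.651) reported by SPSS for the same variable, which is a useful sanity check on the hand calculation.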

3.3.3 The Shape of Distributions

The shape of a distribution falls into one of two categories. A distribution is symmetric when a
vertical line can be drawn through the middle of its graph and the resulting halves are
approximately mirror images; uniform and bell-shaped (normal) distributions are symmetric. A
non-symmetric distribution is skewed to the left or skewed to the right.

The Shape of Distributions (Household income):

MEAN = 44.82 MEDIAN = 37.61 MODE = 32.88

MEAN > MEDIAN > MODE

Therefore, the distribution of household income is skewed to the right: its tail extends to the
right.

4.1 NUMERICAL DESCRIPTIVE MEASURES

4.2 HYPOTHESIS TESTING
ANOVA (a)

Model        Sum of Squares   df   Mean Square   F       Sig.
Regression   9670.132         2    4835.066      7.349   .002 (b)
Residual     30923.868        47   657.955
Total        40594.000        49

a. Dependent Variable: Household Income
b. Predictors: (Constant), Age, Educational Background

• The p-value from the ANOVA table is 0.002, which is less than 0.05. This means
that at least one of the two variables, age and educational background, can be used to
model household income. The F test statistic is 7.349.
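
The F statistic in the ANOVA table can be reproduced from the sums of squares and degrees of freedom. The sketch below is only a check of the arithmetic; `f_statistic` is a hypothetical helper name, and the p-value itself is taken from the SPSS output rather than computed here.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """F = mean square of regression / mean square of residual."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


f_value = f_statistic(9670.132, 2, 30923.868, 47)
print(round(f_value, 3))  # 7.349
```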

4.3 CORRELATION ANALYSIS

Correlations

                                              Household Income   Educational Background   Age
Pearson Correlation   Household Income        1.000              -.065                    .476
                      Educational Background  -.065              1.000                    .085
                      Age                     .476               .085                     1.000
Sig. (1-tailed)       Household Income        .                  .327                     .000
                      Educational Background  .327               .                        .278
                      Age                     .000               .278                     .
N                     Household Income        50                 50                       50
                      Educational Background  50                 50                       50
                      Age                     50                 50                       50

In the scatter plot matrix, graph 1 plots household income against age. It shows that the higher
the age, the higher the household income (r = 0.476).

• Graph 2 plots household income against educational background. There is no clear
relationship between household income and educational background (r = -0.065).
• Graph 3 plots age against educational background, and there is also no clear relationship
between them (r = 0.085).

4.5 CONFIDENCE INTERVALS

Coefficients (a)

                             Unstandardized Coefficients   Standardized Coefficients
Model                        B         Std. Error          Beta                        t       Sig.
1  (Constant)                -6.413    15.632                                          -.410   .683
   Educational Background    -2.913    3.503               -.106                       -.832   .410
   Age                       1.581     .416                .486                        3.800   .000

a. Dependent Variable: Household Income

• The equation: Household income = -6.413 + 1.581(age) – 2.913(educational background)
• Thus, for every one-unit increase in educational background, household income drops by
2.913 units, provided the customer's age remains unchanged. Similarly, for every one-unit
increase in age, household income goes up by 1.581 units, provided the customer's
educational background remains the same. Only the age coefficient is statistically
significant (Sig. = .000); the educational background coefficient is not (Sig. = .410).
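
The fitted equation can be written as a small Python function to make the interpretation concrete. This is a sketch under assumptions: the function name `predict_household_income` is our own, and educational background is assumed to be coded 1 to 4 as in the SPSS output.

```python
def predict_household_income(age: float, educational_background: int) -> float:
    """Predicted household income ('000) from the fitted regression equation."""
    return -6.413 + 1.581 * age - 2.913 * educational_background


# Hypothetical customer: age 30, educational background code 2
base = predict_household_income(30, 2)
one_year_older = predict_household_income(31, 2)
one_level_higher = predict_household_income(30, 3)
print(round(base, 3))                      # 35.191
print(round(one_year_older - base, 3))     # 1.581
print(round(one_level_higher - base, 3))   # -2.913
```

Changing one predictor while holding the other fixed changes the prediction by exactly that predictor's coefficient, which is what the interpretation above describes.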

5.0 Theoretical Modelling

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .488 (a)   .238       .206                25.651

a. Predictors: (Constant), Age, Educational Background
b. Dependent Variable: Household Income

• The R-Square value is 0.238, which means 23.8% of the variation in household
income can be explained by age and educational background.
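
The model summary figures can be reproduced from the ANOVA sums of squares (regression 9670.132, residual 30923.868, n = 50, two predictors). The sketch below is only an arithmetic check; `model_summary` is a hypothetical helper name.

```python
import math


def model_summary(ss_regression, ss_residual, n, predictors):
    """Return (R squared, adjusted R squared, standard error of the estimate)."""
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    df_residual = n - predictors - 1
    adjusted = 1 - (1 - r_squared) * (n - 1) / df_residual
    std_error = math.sqrt(ss_residual / df_residual)
    return r_squared, adjusted, std_error


r_squared, adjusted, std_error = model_summary(9670.132, 30923.868, n=50, predictors=2)
print(round(r_squared, 3), round(adjusted, 3), round(std_error, 3))  # 0.238 0.206 25.651
```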

Coefficients (a)

                             Unstandardized Coefficients   Standardized Coefficients
Model                        B         Std. Error          Beta                        t       Sig.
1  (Constant)                -6.413    15.632                                          -.410   .683
   Educational Background    -2.913    3.503               -.106                       -.832   .410
   Age                       1.581     .416                .486                        3.800   .000

a. Dependent Variable: Household Income

• The equation: Household income = -6.413 + 1.581(age) – 2.913(educational background)

• Thus, for every one-unit increase in educational background, household income drops by 2.913
units, provided the customer's age remains unchanged. Similarly, for every one-unit increase in
age, household income goes up by 1.581 units, provided the customer's educational background
remains the same.

Tests of Normality

                                            Kolmogorov-Smirnov (a)       Shapiro-Wilk
                   Educational Background   Statistic   df   Sig.        Statistic   df   Sig.
Household Income   1                        .145        27   .151        .926        27   .056
                   2                        .342        12   .000        .714        12   .001
                   3                        .345        5    .051        .807        5    .093
                   4                        .241        6    .200*       .862        6    .195

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

• The Kolmogorov-Smirnov test of normality on household income within each educational
background group gives p-values of 0.151, 0.051, and 0.200 for groups 1, 3, and 4, which are
more than 0.05. Thus, the assumption of normality is met for these groups. For group 2, the
p-value is 0.000, which is less than 0.05, so the assumption of normality is not met.
• The Shapiro-Wilk test of normality gives p-values of 0.056, 0.093, and 0.195 for groups 1, 3,
and 4, which are more than 0.05. Thus, the assumption of normality is met for these groups. For
group 2, the p-value is 0.001, which is less than 0.05. Thus, the assumption of normality is not
met for that group.
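
The decision rule used here (normality is not rejected when Sig. is greater than 0.05) can be sketched in Python using the Sig. columns of the Tests of Normality table. The names `ALPHA` and `normality_met` are our own, and the group codes 1 to 4 are the educational background codes from the SPSS output.

```python
ALPHA = 0.05

# Sig. values per educational background group, copied from the Tests of Normality table
kolmogorov_smirnov_sig = {1: 0.151, 2: 0.000, 3: 0.051, 4: 0.200}
shapiro_wilk_sig = {1: 0.056, 2: 0.001, 3: 0.093, 4: 0.195}


def normality_met(sig_values, alpha=ALPHA):
    """Map each group to True when its p-value is greater than alpha."""
    return {group: p > alpha for group, p in sig_values.items()}


print(normality_met(kolmogorov_smirnov_sig))
print(normality_met(shapiro_wilk_sig))
```

Both tests agree: only group 2 fails the 0.05 threshold.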

6.0 CONCLUDING REMARKS

In conclusion, this group assignment let us apply what we have learned in business statistics:
how to build grouped data from raw data, how to represent it graphically, how to use numerical
descriptive measures, and how to use SPSS. It also showed us how well we understand the subject
and where we can improve.

We also learned how to analyze data using graphs and tables. We turned our raw data into a
distribution table and then into two sets of grouped data, one for each dependent variable. To
present the data graphically, we used two types of graph, the histogram and the ogive. For the
histogram, we used midpoints on the x-axis and frequency on the y-axis; for the ogive, we used
upper boundaries on the x-axis and cumulative frequency on the y-axis.

Statistics is already used in day-to-day life, for example in making predictions, quality
testing, weather forecasting, emergency preparedness, predicting disease, political campaigns,
insurance, consumer goods, financial markets, and sports. Statistics plays a very important role
in all aspects of life. From this assignment, we learned that descriptive statistics is the
analysis of data to describe, show, or summarize it in the form of tables, graphs, and numbers,
which makes large amounts of data easier to understand.
