What is the Relationship between SAT Scores and Family Income of the Test Takers around the World?
Introduction
Statement of Task
I am investigating the relationship between SAT scores and the family income of test takers around the world. I have collected data on SAT scores and the family income of test takers, and analyzed it using several mathematical processes: a scatter plot of the data, the least squares regression line, and the correlation coefficient. I will then perform a χ² test on the data to determine whether SAT scores and family income are independent of each other.
Mathematical Investigation
Collected Data
Table 1: Mean SAT scores per section, by family income of test takers (2007)
The bottom row, “More than $100,000”, is treated as an outlier and excluded from all calculations: it extends from $100,000 to incomes in the millions of dollars, which is too wide a range to compare with the other income brackets in this assessment.
Graph 1: Average SAT Score vs. Family Income (vertical axis: overall averaged SAT score, top score 2400; horizontal axis: family income of SAT takers in $)
Graph 1 shows the average SAT score against the family income of test takers. At this stage there appears to be a very strong positive correlation: average SAT scores generally rise as family income increases. (The graph was generated in Microsoft Excel.)
Calculation of the Least Squares Regression
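The least squares regression line describes the linear relationship between the independent variable, x (family income), and the dependent variable, y (average SAT score). It is given by the formula $y - \bar{y} = \frac{S_{xy}}{S_x^2}(x - \bar{x})$, where $\bar{x}$ and $\bar{y}$ are the means of x and y, $S_{xy}$ is the covariance of x and y, and $S_x$ is the standard deviation of x. The table below lists the sums needed to compute these quantities.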
x y xy x²
15000 1301 19515000 225000000
25000 1371 34275000 625000000
35000 1363 47705000 1225000000
45000 1427 64215000 2025000000
55000 1462 80410000 3025000000
65000 1487 96655000 4225000000
75000 1508 113100000 5625000000
85000 1522 129370000 7225000000
95000 1559 148105000 9025000000
∑ = 495000 ∑ = 13000 ∑ = 733350000 ∑ = 33225000000
$\bar{x} = 55000$, $\bar{y} = 1444.44$, $\bar{x}\bar{y} = 79444444.44$, $\overline{x^2} = \frac{\sum x^2}{n} = 3691666667$
These values are used to find the least squares regression line.
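As a cross-check on the table, the column sums and means can be recomputed in a few lines. This is a minimal Python sketch, assuming the nine (income, score) pairs are exactly those listed in the table (bracket values in dollars, with the “More than $100,000” row excluded); the variable names are our own.

```python
# Assumption: data transcribed from the regression table in this report
# (income in $, mean total SAT score); the "More than $100,000" row is excluded.
incomes = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]
scores = [1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522, 1559]

n = len(incomes)  # 9 data points

# Column sums for x, y, xy and x^2
sum_x = sum(incomes)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(incomes, scores))
sum_x2 = sum(x * x for x in incomes)

# Means used later in the formulas for S_xy and S_x
mean_x = sum_x / n
mean_y = sum_y / n
mean_x2 = sum_x2 / n  # mean of x^2, not the square of the mean

print(sum_x, sum_y, sum_xy, sum_x2)  # 495000 13000 733350000 33225000000
print(mean_x, round(mean_y, 2), round(mean_x * mean_y, 2), round(mean_x2))
# 55000.0 1444.44 79444444.44 3691666667
```

Keeping $\overline{x^2}$ and $\bar{x}^2$ as separate quantities matters, because their difference is the variance used for $S_x$.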
$S_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y}$

$S_{xy} = \frac{733350000}{9} - 79444444.44$

$S_{xy} = 2038888.89$
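The covariance formula translates directly into code. A minimal Python sketch, assuming the same nine data points as the table; `covariance` is a hypothetical helper name, and it uses the population form (dividing by n) used in the text.

```python
# Assumption: data transcribed from the regression table in this report.
incomes = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]
scores = [1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522, 1559]


def covariance(xs, ys):
    """Hypothetical helper: population covariance S_xy = sum(xy)/n - mean(x)*mean(y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y


s_xy = covariance(incomes, scores)
print(round(s_xy, 2))  # 2038888.89
```

Dividing by n rather than n − 1 matches the formula above; a sample covariance would divide by n − 1 and give a slightly larger value.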
$S_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$

$S_x = \sqrt{\frac{33225000000}{9} - 3025000000}$

$S_x = 25819.88897$
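The standard deviation of x can be checked the same way. A minimal Python sketch, assuming the income values from the table; `std_dev` is a hypothetical helper, and Python's standard `statistics.pstdev` (population standard deviation) is used as an independent cross-check.

```python
import math
import statistics

# Assumption: income values transcribed from the regression table in this report.
incomes = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]


def std_dev(values):
    """Hypothetical helper: population standard deviation sqrt(mean of squares - mean^2)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum(v * v for v in values) / n - mean ** 2)


s_x = std_dev(incomes)
print(round(s_x, 5))  # 25819.88897

# Cross-check: pstdev uses the equivalent form sqrt(sum((x - mean)^2) / n).
print(round(statistics.pstdev(incomes), 5))  # 25819.88897
```

Both forms agree because $\frac{\sum x^2}{n} - \bar{x}^2 = \frac{\sum (x-\bar{x})^2}{n}$.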
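$y - \bar{y} = \frac{S_{xy}}{S_x^2}(x - \bar{x})$

$y - 1444.44444 = \frac{2038888.89}{(25819.88897)^2}(x - 55000)$

$y - 1444.44444 = 0.0030583333(x - 55000)$

$y = 0.0030583333x + 1276.236111$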
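The steps above can be bundled into one function that returns the slope and intercept. This is a minimal Python sketch, assuming the nine data points from the table; `least_squares_line` is a hypothetical name, not something from the source.

```python
# Assumption: data transcribed from the regression table in this report.
incomes = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]
scores = [1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522, 1559]


def least_squares_line(xs, ys):
    """Hypothetical helper: (slope, intercept) of y - mean_y = (S_xy / S_x^2)(x - mean_x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y  # covariance
    s_x_sq = sum(x * x for x in xs) / n - mean_x ** 2                 # variance of x
    slope = s_xy / s_x_sq
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = least_squares_line(incomes, scores)
print(f"y = {slope:.10f}x + {intercept:.6f}")  # y = 0.0030583333x + 1276.236111
```

On Python 3.10 or later, `statistics.linear_regression(incomes, scores)` should give the same slope and intercept: the divisor (n or n − 1) appears in both $S_{xy}$ and $S_x^2$, so it cancels in the slope.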
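The correlation coefficient is given by

$r = \frac{S_{xy}}{S_x S_y}$, where $S_x = \sqrt{\frac{\sum (x-\bar{x})^2}{n}}$, $S_y = \sqrt{\frac{\sum (y-\bar{y})^2}{n}}$, and $S_{xy} = \frac{\sum xy}{n} - \bar{x}\bar{y}$ is the covariance.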
x y $(x-\bar{x})^2$ $(y-\bar{y})^2$
15000 1301 1600000000 20576.30864
25000 1371 900000000 5394.08642
35000 1363 400000000 6633.197531
45000 1427 100000000 304.308642
55000 1462 0 308.1975309
65000 1487 100000000 1810.975309
75000 1508 400000000 4039.308642
85000 1522 900000000 6014.864198
95000 1559 1600000000 13122.97531
∑ = 495000 ∑ = 13000 ∑ = 6000000000 ∑ = 58204.22222
$\bar{x} = 55000$, $\bar{y} = 1444.44$
These values are used to find the correlation coefficient.
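The squared-deviation columns can be regenerated row by row. A minimal Python sketch, assuming the nine data points from the table; the variable names are our own.

```python
# Assumption: data transcribed from the regression table in this report.
incomes = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]
scores = [1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522, 1559]

n = len(incomes)
mean_x = sum(incomes) / n
mean_y = sum(scores) / n

# One row per data point: x, y, (x - mean_x)^2, (y - mean_y)^2
for x, y in zip(incomes, scores):
    print(x, y, (x - mean_x) ** 2, round((y - mean_y) ** 2, 6))

# Column totals used for S_x and S_y
sum_dx2 = sum((x - mean_x) ** 2 for x in incomes)
sum_dy2 = sum((y - mean_y) ** 2 for y in scores)
print(sum_dx2, round(sum_dy2, 5))  # 6000000000.0 58204.22222
```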
$S_x = 25819.88897$

$S_y = \sqrt{\frac{58204.22222}{9}}$

$S_y = 80.4185041$

$r = \frac{2038888.89}{(25819.88897)(80.4185041)}$

$r \approx 0.981936$

$r^2 \approx 0.964198$
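The correlation coefficient and its square can be reproduced directly from the definitions. A minimal Python sketch, assuming the nine data points from the table; `correlation` is a hypothetical helper using the same divide-by-n form as the text.

```python
import math

# Assumption: data transcribed from the regression table in this report.
incomes = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]
scores = [1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522, 1559]


def correlation(xs, ys):
    """Hypothetical helper: Pearson r = S_xy / (S_x * S_y), population (divide-by-n) form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy / (s_x * s_y)


r = correlation(incomes, scores)
print(round(r, 6), round(r ** 2, 6))  # 0.981936 0.964198
```

On Python 3.10 or later, `statistics.correlation(incomes, scores)` should return the same r, since the factor n cancels between numerator and denominator.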
Graph 2: Average SAT Score vs. Family Income, with the least squares regression line (Excel linear trend line)
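Graph 2 shows a strong positive linear correlation between family income and average SAT score. This is confirmed by the correlation coefficient, r ≈ 0.98; the coefficient of determination, r² ≈ 0.96, indicates that about 96% of the variation in average SAT score is accounted for by the linear relationship with family income. (The graph was generated in Microsoft Excel.)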
Calculation of a χ² Test
The χ² test is used to determine whether two classifications (factors) of the same sample are independent of each other, that is, whether the occurrence of one does not affect the occurrence of the other.
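$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

where $f_o$ is the observed frequency and $f_e$ is the expected frequency of each cell.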
Observed Values:
        B1      B2      Total
A1      A       B       A+B
A2      C       D       C+D
Total   A+C     B+D     N

Expected Values:
        B1               B2               Total
A1      (A+B)(A+C)/N     (A+B)(B+D)/N     A+B
A2      (A+C)(C+D)/N     (B+D)(C+D)/N     C+D
Total   A+C              B+D              N
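The expected-value table generalizes to any contingency table: each expected count is (row total × column total) / N. A minimal Python sketch; `expected_frequencies` is a hypothetical helper, and the counts in the usage line are made up purely for illustration.

```python
def expected_frequencies(observed):
    """Hypothetical helper: expected count per cell = (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Illustrative counts only (A=3, B=2, C=1, D=4), not data from this study.
print(expected_frequencies([[3, 2], [1, 4]]))  # [[2.0, 3.0], [2.0, 3.0]]
```

Because the function only uses row and column totals, the same code works for tables larger than 2 × 2.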
The degrees of freedom (Df) measure the number of values in the final calculation that are free to vary:

$Df = (\text{rows} - 1)(\text{columns} - 1)$
Null Hypothesis ($H_0$): SAT scores and family income are independent of each other.
Alternative Hypothesis ($H_1$): SAT scores and family income are dependent on each other.
Table 2: Observed Values
Income ($)       Score 1300-1430    Score 1431-1561    Total
15000 – 55000    4                  1                  5
56000 – 96000    0                  4                  4
Total            4                  5                  9
Table 2 shows the observed values of SAT score against family income. The nine data points have been grouped into ranges of family income (rows) and of average SAT score (columns).
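The grouping into ranges can be reproduced mechanically. A minimal Python sketch, assuming the nine (income, score) pairs from the regression table and the cut-offs shown in the observed-values table (income up to $55,000 or above it, score up to 1430 or above it); `income_band` and `score_band` are hypothetical helper names.

```python
# Assumption: data transcribed from the regression table in this report.
incomes = [15000, 25000, 35000, 45000, 55000, 65000, 75000, 85000, 95000]
scores = [1301, 1371, 1363, 1427, 1462, 1487, 1508, 1522, 1559]


def income_band(x):
    """Hypothetical helper, row index: 0 for 15000-55000, 1 for 56000-96000."""
    return 0 if x <= 55000 else 1


def score_band(y):
    """Hypothetical helper, column index: 0 for 1300-1430, 1 for 1431-1561."""
    return 0 if y <= 1430 else 1


# Count how many data points fall into each (income band, score band) cell
observed = [[0, 0], [0, 0]]
for x, y in zip(incomes, scores):
    observed[income_band(x)][score_band(y)] += 1

print(observed)  # [[4, 1], [0, 4]]
```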
Income ($)       Score 1300-1430    Score 1431-1561    Total
15000 – 55000    (4+1)(4+0)/9       (4+1)(1+4)/9       4+1
56000 – 96000    (4+0)(0+4)/9       (1+4)(0+4)/9       0+4
Total            4+0                1+4                9
Table 3 shows the individual calculations for each of the expected values.
Income ($)       Score 1300-1430    Score 1431-1561    Total
15000 – 55000    2.22222            2.77778            5
56000 – 96000    1.77778            2.22222            4
Total            4                  5                  9
Table 4 shows the expected values obtained from the calculations in Table 3.
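Computing the expected values with exact fractions shows where the rounded decimals come from. A minimal Python sketch, assuming the observed counts [[4, 1], [0, 4]] from the observed-values table; it uses the standard `fractions.Fraction` type.

```python
from fractions import Fraction

# Assumption: observed counts from the observed-values table in this report.
observed = [[4, 1], [0, 4]]  # rows: income bands, columns: score bands

row_totals = [sum(row) for row in observed]        # [5, 4]
col_totals = [sum(col) for col in zip(*observed)]  # [4, 5]
n = sum(row_totals)                                # 9

# Exact expected counts: row total * column total / n
expected = [[Fraction(r * c, n) for c in col_totals] for r in row_totals]

for row in expected:
    print([str(e) for e in row], [f"{float(e):.5f}" for e in row])
# ['20/9', '25/9'] ['2.22222', '2.77778']
# ['16/9', '20/9'] ['1.77778', '2.22222']
```

Every expected count here is below 5, which is relevant to the validity of the test.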
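$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

$\chi^2 = \frac{(4 - 2.22222)^2}{2.22222} + \frac{(1 - 2.77778)^2}{2.77778} + \frac{(0 - 1.77778)^2}{1.77778} + \frac{(4 - 2.22222)^2}{2.22222}$

$\chi^2 = 5.76$

$Df = (2-1)(2-1)$

$Df = 1$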
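The χ² statistic and degrees of freedom for the observed table can be computed exactly. A minimal Python sketch, assuming the observed counts [[4, 1], [0, 4]]; `chi_squared` is a hypothetical helper that applies $\sum \frac{(f_o - f_e)^2}{f_e}$ without any continuity correction.

```python
from fractions import Fraction


def chi_squared(observed):
    """Hypothetical helper: (chi^2, Df) from sum((f_o - f_e)^2 / f_e), no continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = Fraction(0)
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = Fraction(row_totals[i] * col_totals[j], n)  # expected count
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)
    return chi2, df


# Assumption: observed counts from the observed-values table in this report.
chi2, df = chi_squared([[4, 1], [0, 4]])
print(chi2, float(chi2), df)  # 144/25 5.76 1
```

If SciPy is installed, `scipy.stats.chi2_contingency(observed, correction=False)` should report the same statistic; by default SciPy applies Yates's continuity correction to tables with one degree of freedom, which gives a smaller value. For comparison, the 5% critical value of χ² with 1 degree of freedom is 3.841, but because every expected count is below 5, such a comparison should be treated with caution.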
Discussion/Validity
Limitations
Throughout this investigation of the correlation between SAT scores and family income, various limitations may have affected the results.
One limitation is that the data only reflects the test takers who filled in the family income section when registering for the SAT. There is no evidence that the data represents everyone who took the SAT, as some test takers may have left that section blank.
Another limitation is that not everyone in the world chooses to take the SAT: people who cannot afford it, or who take alternative tests, are not represented. In addition, the data does not state how many SAT takers were included. For these reasons, the data may be insufficient or inaccurate.
There is also a limitation in the top income category, “More than $100,000”, which could extend to family incomes in the millions and is therefore not proportionate to the other income ranges. For this reason, that row was left out of the calculations.
There may also be a limitation in how the data was recorded: SAT takers report their family income in a survey when registering for the SAT. Many SAT takers, mostly aged 15 to 17, may not know their family's actual income, so inaccurate values may be entered.
The data may also be limited by culture and race, which it does not record. This could bias the results if, for example, American test takers reported their family income more often than Asian test takers did.
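Another limitation is that every value in the table of expected values for the χ² test is less than 5. This reduces the validity of the test, since the χ² test is generally considered reliable only when expected frequencies are at least 5.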
In addition, only 9 data points were used, which may not be sufficient to reflect the correlation between SAT scores and family income from a worldwide perspective.
Lastly, many other factors may influence the relationship between SAT scores and family income, such as the reasons a family has a high income and the IQ of the test takers.
Conclusion