Regression and ANOVA - Introduction to Statistical Learning
Statistical learning
- invited lecturers
- seminars
Topics 15.03.2005.
- Nonparametric statistics
- ANOVA
- Regression
17.03.2004. 1/22
Introduction to statistical learning
χ² (chi-square) test
Nonparametric test
2 x 2 Table (Irisdat.sta)
                        Column 1  Column 2  Row totals
Frequencies, row 1           110        90         200
Percent of total         27.500%   22.500%     50.000%
Frequencies, row 2           104        96         200
Percent of total         26.000%   24.000%     50.000%
Column totals                214       186         400
Percent of total         53.500%   46.500%

Chi-square (df=1)                 .36   p = .5475
V-square (df=1)                   .36   p = .5480
Yates corrected Chi-square        .25   p = .6162
Phi-square                     .00090
Fisher exact p, one-tailed              p = .3081
                two-tailed              p = .6163
McNemar Chi-square (A/D)          .82   p = .3651
        Chi-square (B/C)          .87   p = .3506
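A short stdlib-only Python sketch can reproduce the printed statistics. The function name `chi2_2x2` is hypothetical. Two formulas are assumptions inferred from the printed numbers rather than taken from Statistica documentation: the V-square formula (χ² · (n − 1)/n) and the McNemar form with continuity correction, (|A − D| − 1)²/(A + D). The p-values rely on the fact that for df = 1, P(χ² > x) = erfc(√(x/2)).

```python
import math


def chi2_2x2(a, b, c, d):
    """Chi-square family of statistics for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0   # Pearson chi-square
    yates = 0.0  # chi-square with Yates continuity correction
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            chi2 += diff ** 2 / expected
            yates += max(diff - 0.5, 0.0) ** 2 / expected

    def p_df1(x):
        # upper-tail probability of a chi-square variable with df = 1
        return math.erfc(math.sqrt(x / 2))

    return {
        "chi2": chi2, "p": p_df1(chi2),
        "yates": yates, "p_yates": p_df1(yates),
        "v2": chi2 * (n - 1) / n,  # assumed definition of V-square
        "phi2": chi2 / n,
        # assumed McNemar form with continuity correction
        "mcnemar_ad": (abs(a - d) - 1) ** 2 / (a + d),
        "mcnemar_bc": (abs(b - c) - 1) ** 2 / (b + c),
    }


result = chi2_2x2(110, 90, 104, 96)  # the table above
print(round(result["chi2"], 2), round(result["p"], 4))  # 0.36 0.5475
```

The expected counts are computed from the margins (row total × column total / n). The Yates correction shrinks each |observed − expected| by 0.5 before squaring.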
Example 2:
Men preferring the new formula: 41 of 50.
Women preferring the new formula: 27 of 50. (Statistica example)
2 x 2 Table (Irisdat.sta)
                        Column 1  Column 2  Row totals
Frequencies, row 1            41         9          50
Percent of total         41.000%    9.000%     50.000%
Frequencies, row 2            27        23          50
Percent of total         27.000%   23.000%     50.000%
Column totals                 68        32         100
Percent of total         68.000%   32.000%

Chi-square (df=1)                9.01   p = .0027
V-square (df=1)                  8.92   p = .0028
Yates corrected Chi-square       7.77   p = .0053
Phi-square                     .09007
Fisher exact p, one-tailed              p = .0025
                two-tailed              p = .0049
McNemar Chi-square (A/D)         4.52   p = .0336
        Chi-square (B/C)         8.03   p = .0046
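Fisher's exact p-values can be sketched by summing hypergeometric probabilities over all 2 x 2 tables that share the observed margins. The sketch makes two assumptions, since Statistica's exact rule is not shown in the output. The one-tailed p sums the tail on the side of the observed deviation. The two-tailed p sums every table that is no more probable than the observed one, which is a common convention. `fisher_exact_2x2` is a hypothetical name.

```python
import math


def fisher_exact_2x2(a, b, c, d):
    """One- and two-tailed Fisher exact p-values for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, row1)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # hypergeometric probability of every table with the same margins,
    # indexed by the value k of the top-left cell
    probs = {
        k: math.comb(col1, k) * math.comb(n - col1, row1 - k) / total
        for k in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # one-tailed: the tail on the side of the observed deviation
    if a >= row1 * col1 / n:
        one = sum(p for k, p in probs.items() if k >= a)
    else:
        one = sum(p for k, p in probs.items() if k <= a)
    # two-tailed: all tables no more probable than the observed one
    # (small relative tolerance guards against floating-point ties)
    two = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return min(one, 1.0), min(two, 1.0)


one, two = fisher_exact_2x2(41, 9, 27, 23)  # Example 2
print(round(one, 4), round(two, 4))
```

Both tables have equal row totals, so the hypergeometric distribution is symmetric. As a result, the two-tailed p is about twice the one-tailed p, which matches the printed output.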
ANOVA
(ANalysis Of VAriance)
R.A. Fisher
Purpose:
Finding the factors that most strongly affect the model (for example, in regression).
It reduces to testing the differences between the means of several samples.
In principle, the samples are not independent; they are obtained through experimental design (when the value of the factor is controlled).
Partitioning the variance:
        Group 1   Group 2
O1         3         6
O2         2         7
O3         1         5
x̄          2         6
SS         2         2
Grand mean x̄ = 4
TOTAL SS = 28
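Here is a minimal Python sketch of this partition, using the observations from the table. The total SS (28) splits into two parts. The within-group part is the sum of the two group SS values (2 + 2). The between-group part comes from each group mean's deviation from the grand mean, weighted by group size. `partition_ss` is a hypothetical helper name, not a library function.

```python
def partition_ss(groups):
    """Split the total sum of squares into between- and within-group parts."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ss_within += sum((x - mean) ** 2 for x in g)     # spread inside the group
        ss_between += len(g) * (mean - grand_mean) ** 2  # spread of the group mean
    return ss_total, ss_between, ss_within


total, between, within = partition_ss([[3, 2, 1], [6, 7, 5]])
print(total, between, within)  # 28.0 24.0 4.0
```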
Many statistical procedures use the ratio $\dfrac{\text{explained variability}}{\text{unexplained variability}}$.
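In one-way ANOVA this ratio is the F statistic: the between-group (explained) mean square divided by the within-group (unexplained) mean square. A worked instance uses the numbers from the partitioning table, with k = 2 groups and N = 6 observations. There, SS_within = 2 + 2 = 4, and SS_between = 28 − 4 = 24:

$$F = \frac{SS_{between}/(k-1)}{SS_{within}/(N-k)} = \frac{24/(2-1)}{4/(6-2)} = \frac{24}{1} = 24$$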
ANOVA AND REGRESSION
History Lesson
Sir Francis Galton, in his 1885 Presidential address before the anthropology section
of the British Association for the Advancement of Science (Stigler, 1986), described a study he
had made that compared the heights of children with the heights of their parents. He examined
the heights of parents and their grown children, perhaps to gain some insight into to what degree
height is an inherited characteristic. He published his results in a paper, "Regression Towards
Mediocrity In Hereditary Stature," (Galton, F. (1886)).
Figure A shows a JMP scatterplot of Galton's original data. The right-hand plot is his attempt to
summarize the data and fit a line. He multiplied the women's heights by 1.08 to make them
comparable to men's heights and defined the parents' height as the average of the two parents.
He defined ranges of parents' heights and calculated the mean child's height for each range. Then
he drew a straight line that went through the means as best he could.
He thought he had made a discovery when he found that the heights of the
children tended to be more moderate than the heights of their parents. For
example, if parents were very tall the children tended to be tall but shorter than
their parents. If parents were very short the children tended to be short but taller
than their parents were. This discovery he called "regression to the mean," with
the word "regression" meaning to come back to.
Linear regression
Most often the model is a linear function (a straight line), but it does not have to be: the same procedure can be applied as long as the model is linear (!).
$Y = ax + b$
$Y = a + bx + cx^2$
A linear model is one that is linear in the parameters being estimated; the relationship between the variables does not have to be!
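The sketch below illustrates "linear in the parameters". It fits $Y = a + bx + cx^2$ by ordinary least squares, using a design matrix with columns 1, x and x². The normal equations are then solved by Gaussian elimination. The data are hypothetical, generated exactly from a = 1, b = 2, c = 3. `fit_linear_in_params` is a hypothetical helper, not a library API.

```python
def fit_linear_in_params(xs, ys, basis):
    """Least-squares coefficients for y = sum_j beta_j * basis[j](x)."""
    p = len(basis)
    rows = [[f(x) for f in basis] for x in xs]  # design matrix X
    # augmented normal equations [X^T X | X^T y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    # back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(m[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (m[i][p] - s) / m[i][i]
    return beta


# Y = a + b*x + c*x^2 is nonlinear in x but linear in a, b, c
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 3 * x * x for x in xs]  # hypothetical data: a=1, b=2, c=3
a, b, c = fit_linear_in_params(xs, ys, basis)
print(round(a, 6), round(b, 6), round(c, 6))  # 1.0 2.0 3.0
```

Only the basis functions change between a straight line and a parabola. The solver itself stays the same, which is the point of the remark above.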
$\hat{Y}_i = a + bX_i$
$Y_i - \hat{Y}_i$ is the residual (deviation, or error) that arises when Y is predicted from X.
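The following minimal sketch computes the residuals $Y_i - \hat{Y}_i$ and their sum of squares for one candidate line. The data and the line (a = 0, b = 2) are hypothetical. Least squares chooses the a and b that make this sum as small as possible.

```python
def residuals(xs, ys, a, b):
    """Residuals Y_i - Yhat_i for the line Yhat = a + b*X."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
res = residuals(xs, ys, a=0.0, b=2.0)
print(res)                      # [0.0, 0.0, -1.0, 0.0]
print(sum(r * r for r in res))  # 1.0
```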
Can the best fit be obtained in some other way? Why minimize the SS in particular?
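Part of the answer is practical. The sum of squares is differentiable in a and b, so setting its partial derivatives to zero gives closed-form estimates. Under normally distributed errors, least squares also coincides with maximum likelihood. Other criteria, such as minimizing the sum of absolute residuals, are possible but have no comparable closed-form solution. The standard derivation runs as follows:

$$S(a,b) = \sum_{i=1}^{n} \left(Y_i - a - bX_i\right)^2$$

$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n}\left(Y_i - a - bX_i\right) = 0, \qquad \frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} X_i\left(Y_i - a - bX_i\right) = 0$$

$$b = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\bar{X}$$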
$0 \le R^2 \le 1$
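The coefficient of determination is $R^2 = 1 - SS_{res}/SS_{tot}$. For a least-squares line with an intercept, $0 \le SS_{res} \le SS_{tot}$, which keeps $R^2$ between 0 and 1. The sketch below fits a simple regression with the closed-form estimates and computes $R^2$. The data are hypothetical, and `fit_and_r2` is a hypothetical helper name.

```python
def fit_and_r2(xs, ys):
    """Simple least-squares line a + b*x and its coefficient of determination."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                    # total
    return a, b, 1 - ss_res / ss_tot


a, b, r2 = fit_and_r2([1, 2, 3, 4], [2, 4, 5, 8])  # hypothetical data
print(round(b, 4), round(r2, 4))  # 1.9 0.9627
```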
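$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_k x_k + \varepsilon$
The overall fit of the model is tested with an F statistic with $k$ and $n-k-1$ degrees of freedom, $F(k,\, n-k-1)$.
F is related to $r^2$ (goodness of fit):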
$F = \dfrac{r^2\,(n-k-1)}{(1-r^2)\,k}$
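The relation between F and $r^2$ follows from $r^2 = SS_{reg}/SS_{tot}$ and $F = \frac{SS_{reg}/k}{SS_{res}/(n-k-1)}$. The quick numerical check below uses hypothetical sums of squares (SS_reg = 30, SS_res = 10) with n = 10 observations and k = 2 predictors. Both function names are hypothetical.

```python
def f_from_r2(r2, n, k):
    """F statistic of a regression with k predictors computed from r^2."""
    return r2 * (n - k - 1) / ((1 - r2) * k)


def f_from_ss(ss_reg, ss_res, n, k):
    """Same F statistic from the regression and residual sums of squares."""
    return (ss_reg / k) / (ss_res / (n - k - 1))


ss_reg, ss_res, n, k = 30.0, 10.0, 10, 2  # hypothetical values
r2 = ss_reg / (ss_reg + ss_res)           # r^2 = SS_reg / SS_tot = 0.75
print(f_from_r2(r2, n, k), f_from_ss(ss_reg, ss_res, n, k))  # both about 10.5
```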
[Figure: overlap diagram of the variability of Y and X1]
[Figure: overlap diagram of the variability of Y, X1 and X2, with regions labelled a, b, c, d]