Uploaded by Syaiful Mansyur

Multivariate Statistical Analysis


**Subagyo, Teknik Industri UGM**

Iegmubetter


LEARNING OBJECTIVES

Upon completing this chapter, you should be able to do the following:

• Select the appropriate graphical method to examine the characteristics of the data or relationships of interest.
• Assess the type and potential impact of missing data.
• Understand the different types of missing data processes.
• Explain the advantages and disadvantages of the approaches available for dealing with missing data.

______________________________ Creative-Productive-Efficient


LEARNING OBJECTIVES…

Upon completing this chapter, you should be able to do the following:

• Identify univariate, bivariate, and multivariate outliers.
• Test your data for the assumptions underlying most multivariate techniques.
• Determine the best method of data transformation given a specific problem.
• Understand how to incorporate nonmetric variables as metric variables.


Examination Phases

• Graphical examination.
• Identify and evaluate missing values.
• Identify and deal with outliers.
• Check whether statistical assumptions are met.
• Develop a preliminary understanding of your data.

Graphical Examination

• Shape: histogram, bar chart, box & whisker plot, stem and leaf plot.
• Relationships: scatterplot, outliers.

Histograms and the Normal Curve

This is the distribution for HBAT database variable X19 – Satisfaction.

Stem & Leaf Diagram – HBAT Variable X6

X6 - Product Quality Stem-and-Leaf Plot

| Frequency | Stem | Leaf |
|---|---|---|
| 3.00 | 5 | 012 |
| 10.00 | 5 | 5567777899 |
| 10.00 | 6 | 0112344444 |
| 10.00 | 6 | 5567777999 |
| 5.00 | 7 | 01144 |
| 11.00 | 7 | 55666777899 |
| 9.00 | 8 | 000122234 |
| 14.00 | 8 | 55556667777778 |
| 18.00 | 9 | 001111222333333444 |
| 8.00 | 9 | 56699999 |
| 2.00 | 10 | 00 |

Stem width: 1.0
Each leaf: 1 case(s)

This table shows the distribution of X6 with a stem and leaf diagram (Figure 2.2). Each stem is shown by the numbers, and each number is a leaf. The length of the stem, indicated by the number of leaves, shows the frequency distribution.

The first category is from 5.0 to 5.5, thus the stem is 5.0. There are three observations with values in this range (5.0, 5.1 and 5.2). This is shown as three leaves of 0, 1 and 2. These are also the three lowest values for X6. In the next stem, the stem value is again 5.0 and there are ten observations, ranging from 5.5 to 5.9. These correspond to the leaves of 5.5 to 5.9. For the stem with leaves 55556667777778, the frequency is 14. At the other end of the figure, the stem is 10.0. It is associated with two leaves (0 and 0), representing two values of 10.0, the two highest values for X6.

Frequency Distribution: Variable X6 – Product Quality

HBAT Diagnostics: Box & Whiskers Plots

Figure labels: Median; Outlier = #13.

Group 2 has substantially more dispersion than the other groups.

Boxplot

• Besides showing differences between groups, a boxplot can also be used to spot outliers.
• Outliers: 1.0 to 1.5 quartiles away from the box.
• Extreme values: greater than 1.5 quartiles away from the box.
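The boxplot rule can be sketched in a few lines of Python. This is only an illustration: it reads "quartiles away from the box" as distance measured in box lengths (the interquartile range), and the function name, the quartile method, and the sample data are assumptions, not part of the slides. Many statistical packages use wider cutoffs (1.5 and 3 box lengths), so the thresholds are kept as parameters.

```python
import statistics


def classify_boxplot_points(values, outlier_from=1.0, extreme_from=1.5):
    """Label each value by its distance from the box, measured in box lengths."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1  # interquartile range
    if box_length == 0:
        raise ValueError("box length is zero; distances are undefined")
    labels = []
    for v in values:
        # Distance beyond the nearest edge of the box (0 if inside the box).
        if v < q1:
            distance = (q1 - v) / box_length
        elif v > q3:
            distance = (v - q3) / box_length
        else:
            distance = 0.0
        if distance > extreme_from:
            labels.append("extreme")
        elif distance >= outlier_from:
            labels.append("outlier")
        else:
            labels.append("typical")
    return labels


# Hypothetical data, not taken from the HBAT database.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 30]
for value, label in zip(data, classify_boxplot_points(data)):
    if label != "typical":
        print(value, label)
```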

HBAT Scatterplot: Variables X19 and X6

Relationships between variables.

Graphical Displays

• Are not intended as a replacement for the statistical diagnostics, just as complementary tools.

Missing Data

• Missing data = information not available for a subject (or case) about whom other information is available. Typically occurs when a respondent fails to answer one or more questions in a survey.
  • Systematic?
  • Random?
• Researcher's concern = to identify the patterns and relationships underlying the missing data in order to stay as close as possible to the original distribution of values when any remedy is applied.
• Impact: reduces the sample size available for analysis.

Four-Step Process for Identifying Missing Data

Step 1: Determine the Type of Missing Data. Is the missing data ignorable?

Step 2: Determine the Extent of Missing Data. Examine the patterns of missing data and determine the extent of missing data.

Step 3: Diagnose the Randomness of the Missing Data Processes.
• MAR, Missing At Random: a nonrandom component.
• MCAR, Missing Completely at Random: sufficiently random to accommodate any type of missing data remedy.

Step 4: Select the Imputation Method.

MAR vs. MCAR

• Missing at Random (MAR): the missing values of Y depend on X, but not on Y. The observed Y values represent a random sample of the actual Y values for each value of X.
• Missing Completely at Random (MCAR): the observed values of Y are truly a random sample of all Y values, with no underlying process that lends bias to the observed data.
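A small simulation can make the MAR and MCAR distinction concrete. The sketch below is an illustration under invented assumptions: X and Y are simulated normal variables, the missingness probabilities (30%, 80%, 10%) are arbitrary, and the function name is hypothetical. Under MCAR the mean of the observed Y stays close to the full-data mean; under MAR, where missingness depends on X only, the overall observed mean of Y drifts even though Y is still a random sample within each level of X.

```python
import random
import statistics


def simulate_missingness(n=10000, seed=0):
    """Compare the mean of Y under complete data, MCAR, and MAR deletion."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]  # Y is related to X

    # MCAR: every Y value has the same 30% chance of being missing.
    y_mcar = [y for y in ys if rng.random() >= 0.3]

    # MAR: the chance that Y is missing depends on X only
    # (80% when X > 0, 10% otherwise), never on Y itself.
    y_mar = [y for x, y in zip(xs, ys)
             if rng.random() >= (0.8 if x > 0 else 0.1)]

    return {
        "full": statistics.mean(ys),
        "mcar": statistics.mean(y_mcar),
        "mar": statistics.mean(y_mar),
    }


result = simulate_missingness()
for name, value in result.items():
    print(f"{name}: mean of observed Y = {value:.3f}")
```

The point of the design is that the deletion rule never looks at Y, which is what separates MAR from a process in which missingness depends on the missing value itself.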

Missing Data

Strategies for handling missing data:
• use observations with complete data only;
• delete case(s) and/or variable(s);
• estimate missing values.

Rules of Thumb 2–1
How Much Missing Data Is Too Much?

• Missing data under 10% for an individual case or observation can generally be ignored, except when the missing data occurs in a specific nonrandom fashion (e.g., concentration in a specific set of questions, attrition at the end of the questionnaire, etc.).
• The number of cases with no missing data must be sufficient for the selected analysis technique if replacement values will not be substituted (imputed) for the missing data.
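To apply the 10% guideline, the extent of missing data has to be tabulated by case and by variable first. The following is a minimal sketch, assuming the data are held as a list of rows with `None` marking a missing value; the helper names and the tiny data set are hypothetical.

```python
def missing_by_case(rows):
    """Share of missing values in each case (row)."""
    return [sum(v is None for v in row) / len(row) for row in rows]


def missing_by_variable(rows):
    """Share of missing values in each variable (column)."""
    n_vars = len(rows[0])
    return [sum(row[j] is None for row in rows) / len(rows)
            for j in range(n_vars)]


def complete_case_count(rows):
    """Number of cases with no missing data at all."""
    return sum(all(v is not None for v in row) for row in rows)


# Hypothetical data: 4 cases, 4 variables, None = missing.
rows = [
    [1, 2, 3, 4],
    [1, None, 3, 4],
    [None, None, 3, 4],
    [1, 2, 3, None],
]

by_case = missing_by_case(rows)
cases_at_or_over_10pct = [i for i, share in enumerate(by_case) if share >= 0.10]
print("missing share by case:", by_case)
print("missing share by variable:", missing_by_variable(rows))
print("cases at or above 10% missing:", cases_at_or_over_10pct)
print("complete cases:", complete_case_count(rows))
```

The percentages alone do not settle the question: the pattern of missingness (for example, concentration in one block of questions) still has to be inspected.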

Rules of Thumb 2–3
Imputation of Missing Data

• Under 10% – Any of the imputation methods can be applied when missing data is this low, although the complete case method has been shown to be the least preferred.
• 10 to 20% – The increased presence of missing data makes the all-available, hot deck, case substitution, and regression methods most preferred for MCAR data, while model-based methods are necessary with MAR missing data processes.
• Over 20% – If it is necessary to impute missing data when the level is over 20%, the preferred methods are the regression method for MCAR situations and model-based methods when MAR missing data occurs.
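As an illustration of one of the methods named above, the regression method can be sketched for the simplest case: one fully observed predictor X and one variable Y with gaps. This is a sketch under stated assumptions, not a recommended implementation: the function name is hypothetical, the model is a single least-squares line fitted on complete cases, and no random error term is added to the predictions.

```python
import statistics


def regression_impute(xs, ys):
    """Fill None entries of ys with predictions from a line fitted on complete cases."""
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if len(complete) < 2:
        raise ValueError("need at least two complete cases to fit a line")
    cx = [x for x, _ in complete]
    cy = [y for _, y in complete]
    mean_x, mean_y = statistics.mean(cx), statistics.mean(cy)
    sxx = sum((x - mean_x) ** 2 for x in cx)
    if sxx == 0:
        raise ValueError("X has no variation among complete cases")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Keep observed values; replace missing ones with the fitted value.
    return [y if y is not None else intercept + slope * x
            for x, y in zip(xs, ys)]


# Hypothetical data: the last Y value is missing.
print(regression_impute([1, 2, 3, 4], [3, 5, 7, None]))
```

Because every imputed value falls exactly on the fitted line, this approach understates the variance of Y, which is one reason the choice of method depends on the extent and randomness of the missing data.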

Outlier

Outlier = an observation/response with a unique combination of characteristics identifiable as distinctly different from the other observations/responses.

Issue: “Is the observation/response representative of the population?”

Why Do Outliers Occur?

• Procedural Error. Data entry error or a mistake in coding.
• Extraordinary Event. Accounts for uniqueness of the observation. Example: a hailstorm in Yogyakarta.
• Extraordinary Observations. Perhaps they represent an emerging element, or untapped element previously not identified. Example: a KKN (student community service) team finds a resident with a TOEFL score of 590.
• Observations unique in their combination of values. Each variable is fine on its own, but the combination is an outlier. Example: grades in sports and in mathematics that are both high.

Dealing with Outliers

• Identify outliers.
• Describe outliers.
• Delete or retain?

Identifying Outliers

• Standardize data and then identify outliers in terms of number of standard deviations.
• Examine data using box plots, stem & leaf, and scatterplots.
• Multivariate detection (Mahalanobis D²).
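The first approach, standardizing and counting standard deviations, can be sketched as follows. The function name and data are hypothetical, and the default cutoff of 2.5 is only an illustrative choice; the appropriate threshold depends on sample size.

```python
import statistics


def standard_score_outliers(values, threshold=2.5):
    """Return the indices of values whose absolute standard score meets the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    if sd == 0:
        raise ValueError("standard deviation is zero; scores are undefined")
    return [i for i, v in enumerate(values)
            if abs(v - mean) / sd >= threshold]


# Hypothetical data: the last value sits far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(standard_score_outliers(data))               # cutoff 2.5
print(standard_score_outliers(data, threshold=4))  # stricter cutoff
```

Note that an extreme value inflates the standard deviation it is judged against, so in very small samples even a gross outlier may not reach a high cutoff.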

Rules of Thumb 2–4
Outlier Detection

• Univariate methods – examine all metric variables to identify unique or extreme observations. For small samples (80 or fewer observations), outliers typically are defined as cases with standard scores of 2.5 or greater. For larger sample sizes, increase the threshold value of standard scores up to 4. If standard scores are not used, identify cases falling outside the ranges of 2.5 versus 4 standard deviations, depending on the sample size.
• Bivariate methods – focus their use on specific variable relationships, such as the independent versus dependent variables: use scatterplots with confidence intervals at a specified alpha level.
• Multivariate methods – best suited for examining a complete variate, such as the independent variables in regression or the variables in factor analysis: threshold levels for the D²/df measure should be very conservative (.005 or .001), resulting in values of 2.5 (small samples) versus 3 or 4 in larger samples.
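The D²/df measure can be illustrated for the two-variable case, where the covariance matrix can be inverted by hand. This is a sketch under assumptions: the function names and data are hypothetical, df is taken as the number of variables (2 here), and a general implementation would use a linear algebra library to handle more variables.

```python
import statistics


def mahalanobis_d2_bivariate(xs, ys):
    """Mahalanobis D² of each (x, y) case from the sample centroid."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    d2 = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return d2


def multivariate_outliers(xs, ys, threshold=2.5):
    """Indices of cases whose D²/df (df = 2 variables) exceeds the threshold."""
    return [i for i, d in enumerate(mahalanobis_d2_bivariate(xs, ys))
            if d / 2 > threshold]


# Hypothetical data: the last case breaks the pattern of the others,
# although neither of its values is extreme on its own.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 2.0]
print(multivariate_outliers(xs, ys))
```

The example is chosen so that the flagged case is unremarkable on each variable separately; only the combination of values is unusual, which is the situation multivariate detection is meant for.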

Multivariate Assumptions

• Normality
• Linearity
• Homoscedasticity (dependent variables exhibit equal levels of variance across the range of predictor variables)
• Non-correlated errors

Data transformations?

Homoscedasticity

Testing Assumptions

• Normality
  • Visual check of histogram.
  • Kurtosis.
  • Normal probability plot.
• Homoscedasticity
  • Equal variances across independent variables.
  • Levene test (univariate).
  • Box’s M (multivariate).
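As a rough illustration of the Levene test named above, the sketch below computes the test statistic W from absolute deviations around each group mean. The function name and data are hypothetical, and only the statistic is computed: judging significance requires comparing W with an F distribution with (k − 1, N − k) degrees of freedom, which is left out here.

```python
import statistics


def levene_w(*groups):
    """Levene's W statistic (mean-centered version) for equality of variances."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each value from its own group mean.
    z = [[abs(v - statistics.mean(g)) for v in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2
                  for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("no variation in deviations; W is undefined")
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: the first pair has equal spread, the second does not.
print(levene_w([1, 2, 3], [4, 5, 6]))
print(levene_w([1, 2, 3], [0, 5, 10]))
```

A W near zero indicates that the groups have similar dispersion; larger values point toward heteroscedasticity.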

Rules of Thumb 2–5
Testing Statistical Assumptions

• Nonnormality can have serious effects in small samples (fewer than 50 cases), but the impact effectively diminishes when sample sizes reach 200 cases or more.
• Most cases of heteroscedasticity are a result of nonnormality in one or more variables. Thus, remedying normality may not be needed due to sample size, but may be needed to equalize the variance.
• Nonlinear relationships can be very well defined, but seriously understated unless the data is transformed to a linear pattern or explicit model components are used to represent the nonlinear portion of the relationship.
• Correlated errors arise from a process that must be treated much like missing data. That is, the researcher must first define the “causes” among variables either internal or external to the dataset. If they are not found and remedied, serious biases can occur in the results, many times unknown to the researcher.

Data Transformations?

Data transformations provide a means of modifying variables for one of two reasons:
1. To correct violations of the statistical assumptions underlying the multivariate techniques, or
2. To improve the relationship (correlation) between the variables.

Rules of Thumb 2–6
Transforming Data

• To judge the potential impact of a transformation, calculate the ratio of the variable’s mean to its standard deviation. Noticeable effects should occur when the ratio is less than 4. When the transformation can be performed on either of two variables, select the variable with the smallest ratio.
• Transformations should be applied to the independent variables except in the case of heteroscedasticity.
• Heteroscedasticity can be remedied only by the transformation of the dependent variable in a dependence relationship. If a heteroscedastic relationship is also nonlinear, the dependent variable, and perhaps the independent variables, must be transformed.
• Transformations may change the interpretation of the variables. For example, transforming variables by taking their logarithm translates the relationship into a measure of proportional change (elasticity). Always be sure to explore thoroughly the possible interpretations of the transformed variables.
• Use variables in their original (untransformed) format when profiling or interpreting results.
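The mean-to-standard-deviation check in the first rule is simple arithmetic and can be sketched as below. The function names and the two variables are hypothetical, and the sample standard deviation is used as an assumption, since the slide does not say which form is meant.

```python
import statistics


def mean_to_sd_ratio(values):
    """Ratio of a variable's mean to its (sample) standard deviation."""
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero; ratio is undefined")
    return statistics.mean(values) / sd


def variable_to_transform(candidates):
    """Pick the candidate with the smallest mean/sd ratio; also return all ratios."""
    ratios = {name: mean_to_sd_ratio(vals) for name, vals in candidates.items()}
    return min(ratios, key=ratios.get), ratios


# Hypothetical variables, not taken from the HBAT database.
candidates = {
    "x1": [2, 4, 4, 4, 5, 5, 7, 9],
    "x2": [99, 100, 100, 101],
}
name, ratios = variable_to_transform(candidates)
for var, ratio in ratios.items():
    note = "noticeable effect expected" if ratio < 4 else "little effect expected"
    print(f"{var}: mean/sd = {ratio:.2f} ({note})")
print("transform:", name)
```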

Dummy Variable

Dummy variable: a nonmetric independent variable that has two (or more) distinct levels that are coded 0 and 1. These variables act as replacement variables to enable nonmetric variables to be used as metric variables.

Dummy Variable Coding

| Category | X1 | X2 |
|---|---|---|
| Physician | 1 | 0 |
| Attorney | 0 | 1 |
| Professor | 0 | 0 |
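The coding scheme for Physician, Attorney, and Professor can be reproduced with a short helper. This is a minimal sketch with a hypothetical function name: one level is chosen as the reference category (Professor here, coded 0 on every dummy), and each remaining level gets its own 0/1 variable.

```python
def dummy_code(values, levels, reference):
    """Code a nonmetric variable as 0/1 indicators, omitting the reference level."""
    if reference not in levels:
        raise ValueError("reference must be one of the levels")
    indicator_levels = [lv for lv in levels if lv != reference]
    rows = []
    for v in values:
        if v not in levels:
            raise ValueError(f"unknown category: {v!r}")
        # One indicator per non-reference level; the reference gets all zeros.
        rows.append([1 if v == lv else 0 for lv in indicator_levels])
    return indicator_levels, rows


levels = ["Physician", "Attorney", "Professor"]
columns, coded = dummy_code(levels, levels, reference="Professor")
print(columns)
for category, row in zip(levels, coded):
    print(category, row)
```

A variable with k levels needs only k − 1 dummy variables, because the omitted level is identified by zeros on all of them.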

Simple Approaches to Understanding Data

• Tabulation = a listing of how respondents answered all possible answers to each question. This typically is shown in a frequency table.
• Cross Tabulation = a listing of how respondents answered two or more questions. This typically is shown in a two-way frequency table to enable comparisons between groups.
• Chi-Square = a statistic that tests for significant differences between the frequency distributions for two (or more) categorical (nonmetric) variables in a cross-tabulation table. Note: chi-square results will be distorted if more than 20 percent of the cells have an expected count of less than 5, or if any cell has an expected count of less than 1.
• ANOVA = a statistic that tests for significant differences between two or more means.
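The chi-square statistic and the note about small expected counts can be checked with a few lines of code. The sketch below uses hypothetical function names and made-up cross-tabulations; it computes expected counts from the row and column totals and assumes every row and column total is greater than zero.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic for a cross-tabulation (all totals must be > 0)."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def expected_counts_too_small(table):
    """True if more than 20% of cells have expected count < 5, or any cell has < 1."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 > 0.20 or any(e < 1 for e in cells)


# Hypothetical cross-tabulations.
well_filled = [[10, 20], [30, 40]]
sparse = [[1, 2], [3, 40]]
print(chi_square(well_filled), expected_counts_too_small(well_filled))
print(chi_square(sparse), expected_counts_too_small(sparse))
```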

Examining Data
Learning Checkpoint

1. Why examine your data?
2. What are the principal aspects of data that need to be examined?
3. What approaches would you use?

Assignment #1 (maximum 10 pages)

• Discuss why outliers might be classified as beneficial and as problematic.
• Describe the conditions under which a researcher would delete a case with missing data versus the conditions under which a researcher would use an imputation method.

Thank you
