Uploaded by Anjali Shergil

Intro Multivariate Stats(Lecture10)

Overview of Methods

Sophisticated multivariate statistical methods are becoming standard practice in the physical, natural and social sciences, as well as in business.

• Variations of existing methods are being developed, existing techniques are being applied to new applications, and new methods continue to be designed

[Figure: two-dimensional map of most important information source (talking to co-workers, reading a book, talking to friends, radio, cable television, newspaper, broadcast television, Internet, magazines) and age group (25-29 through 50-54), plotted on Dimension 1 (horizontal) and Dimension 2 (vertical).]

**The accelerated use of advanced multivariate techniques is being driven by**

• Growing complexity in the topics being addressed
• Ever-larger data sets
• Ability to apply computationally intensive methods through powerful computer tools
• Academic training


| Scale | Definition | Examples | Descriptive statistics |
|---|---|---|---|
| Nominal | Non-ordered categories | Race, gender, marital status | Percentages, frequency, mode |
| Ordinal | Ordered relation between categories | Attitudes, social class | Percentiles, median |
| Interval (metric) | Ordered relation, equality of differences | Economic indices | Range, mean, standard deviation |
| Ratio (metric) | Ordered relation, equality of differences, absolute zero | Sales, costs | All of above, coefficient of variation |
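As a rough illustration of matching descriptive statistics to measurement scale, the sketch below computes one suitable statistic per scale with Python's standard `statistics` module. All data values and variable names are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode, stdev

# Hypothetical sample data, one variable per measurement scale.
marital_status = ["single", "married", "married", "divorced", "married"]  # nominal
social_class = [1, 2, 2, 3, 3, 3, 2]           # ordinal codes: 1=lower, 2=middle, 3=upper
economic_index = [98.0, 101.0, 104.0, 101.0]   # interval: differences are meaningful
sales = [120.0, 80.0, 100.0, 100.0]            # ratio: has an absolute zero

nominal_mode = mode(marital_status)        # most frequent category
ordinal_median = median(social_class)      # middle rank
interval_mean = mean(economic_index)       # mean needs equal differences
interval_sd = stdev(economic_index)        # sample standard deviation
ratio_cv = stdev(sales) / mean(sales)      # coefficient of variation needs a true zero

print(nominal_mode, ordinal_median, interval_mean, round(interval_sd, 3), round(ratio_cv, 3))
```

Taking the mean of the nominal codes, by contrast, would be one of the inappropriate variable operations that identifying the scale is meant to prevent.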

Response vs explanatory
• Response or dependent variable
  ▪ Variable to be modeled or predicted
• Explanatory or independent variable
  ▪ Variables used to predict or model dependent variable

Importance of identifying data and variable types
• Critical in determining analysis objectives and appropriate analysis method
• Avoid inappropriate variable operations

Dependence techniques
• One or a set of variables are regarded as dependent variables
• Objective is to predict or explain the value of the dependent variable(s) based on the values of a set of independent variables
• Examples
  ▪ What is the probability that a loan applicant will default?
  ▪ What factors best differentiate people whose primary news source is the Internet?

Dependence techniques
• Multiple regression
• Logistic regression
• Discriminant analysis
• Canonical correlation
• Structural equation modeling
• Analysis of variance
• Decision trees

Interdependence techniques
• No single group of variables defined as dependent or independent
• Objective is to identify and characterize underlying structure between the variables
• Examples
  ▪ What are the underlying factors that define a customer’s perception of a brand?
  ▪ Which signal returns arise from the same object and how many objects are present?

Interdependence techniques
• Factor analysis
• Multidimensional scaling
• Correspondence analysis
• Cluster analysis

Interdependence techniques are valuable data reduction methods
• Data reduction attempts to manage and interpret the large amounts of data gathered
• One goal is to combine groups of cases measured over multiple variables into a relatively small number of understandable segments
• Or to group variables together into categories of latent traits and then characterize cases with respect to this smaller number of traits

The reduced data variables are then often used as variables in dependence techniques

Multiple regression is a dependence technique used to model the relationship between the value of a single metric dependent variable and a set of metric independent variables
• Categorical variables can be included as “dummy” variables

Model can be applied to predict changes in the dependent variable’s response to changes in the independent variables

Regression also indicates the relative importance of independent variables on the response of the dependent variable

For example, a client may be interested in understanding the effect of price and promotional activity on a product’s market share among both “loyal” and “not loyal” customers

Technical result is a linear model of the form
• Y = a0 + a1X1 + a2X2 + … + anXn

Best visualizations of the results control all but one (or two) of the independent variables and examine how the value of the dependent variable changes with respect to the “free” independent variables

[Figure: two panels, "Market share for loyal customers" and "Market share for not-loyal customers", each plotting Market Share (vertical axis) against Promotion Index (horizontal axis).]
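The market-share example can be sketched in code as a least-squares fit with a 0/1 “dummy” variable for loyalty. This is a minimal sketch using only the standard library. The data are hypothetical and generated from a known model so the recovered coefficients can be checked, and the helper names (`solve`, `fit_linear`, `predict`) are illustrative rather than taken from any package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(rows, y):
    """Least-squares coefficients [a0, a1, ..., an] from the normal equations."""
    x = [[1.0] + list(r) for r in rows]              # leading 1.0 is the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical observations: (price, promotion index, loyal dummy: 1 = loyal, 0 = not loyal)
rows = [(10, 20, 1), (12, 40, 1), (11, 60, 1), (9, 80, 1),
        (10, 20, 0), (12, 40, 0), (11, 60, 0), (9, 80, 0)]
# Market share generated from an assumed model: 30 - 1.5*price + 0.2*promotion + 10*loyal
share = [30 - 1.5 * p + 0.2 * promo + 10 * loyal for p, promo, loyal in rows]

a0, a_price, a_promo, a_loyal = fit_linear(rows, share)


def predict(price, promo, loyal):
    """Predicted market share from the fitted linear model."""
    return a0 + a_price * price + a_promo * promo + a_loyal * loyal


# Hold price fixed and vary the promotion index for each customer group.
for promo in (20, 50, 80):
    print(promo, round(predict(10, promo, 1), 2), round(predict(10, promo, 0), 2))
```

Holding price at one value and sweeping the promotion index, as in the final loop, mirrors the advice to control all but one independent variable when visualizing results.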

Properties
• Single interval scale dependent variable
• Multiple independent variables, preferably on interval scale
• Familiar and useful technique

Issues
• Assumes linear relationship between dependent and independent variables
• Overused and often assumptions not fully checked
• Often misapplied to classification problems

Logistic regression is a dependence technique used to model the relationship between a single categorical dependent variable and a set of metric independent variables
• Typically dependent variable takes one of two values: success/failure, buy/do not buy
• Multinomial formulations

A logistic model gives the probability that the dependent variable takes a target value given the values of the independent variables

For example, which credit and demographic factors best predict whether a customer will keep a loan current
• Dependent variable taken as 60 days past due or worse
• Independent variables are credit and employment history, and demographic descriptors
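A minimal sketch of the idea, assuming a single hypothetical predictor (number of prior late payments) and a 0/1 outcome (1 = 60 days past due or worse). The model is fitted by plain gradient ascent on the log-likelihood, which is one simple option among several. The data, learning rate and function names are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)   # observed outcome minus predicted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: prior late payments and whether the loan went 60+ days past due.
late_payments = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
past_due = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(late_payments, past_due)


def default_probability(x):
    """Modeled probability of being 60+ days past due given x prior late payments."""
    return sigmoid(b0 + b1 * x)


for x in (0, 2, 5):
    print(x, round(default_probability(x), 3))
```

The output is a probability rather than a class label, which is why communicating probabilistic concepts becomes part of presenting the results.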

Properties
• Powerful technique for predicting group membership and identifying important independent variables
• Becoming more widely used
• Procedures and results similar to linear regression

Issues
• Adequate data
• Model validation
• Communicating probabilistic concepts

Decision trees are a dependence technique used to develop a model to classify the value of a single dependent variable based on a set of independent variables
• Dependent and independent variables can be any data type

The typical product of CART (classification and regression trees) is a straightforward, easily interpretable set of segmentation rules
• For example, classify existing customers as high or low likelihood buyers of a new product based on demographics and historical purchasing behavior. Classification could be used to focus an advertising campaign

Decision trees can also be used to examine profiles of different market segments with respect to underlying demographic and psychographic variables
  ▪ For example, what are the most significant demographic variables determining whether the Internet is a person’s most important information source?
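To make the idea of a segmentation rule concrete, here is a sketch of a single tree split chosen by Gini impurity, the criterion commonly used by CART for classification. A full tree would apply the same search recursively to each resulting segment. The ages and responses are hypothetical, and the function names are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the segment is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best rule 'value <= threshold'."""
    best = (None, gini(labels))                  # baseline: no split at all
    for t in sorted(set(values))[:-1]:           # the largest value cannot split the data
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical survey: respondent age and whether the Internet is the
# most important information source.
ages = [22, 27, 31, 36, 44, 52, 58, 63]
internet = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]

threshold, impurity = best_split(ages, internet)
print(f"Rule: age <= {threshold} -> Internet is the main source (impurity {impurity})")
```

The resulting rule reads directly as a segment definition, which is the interpretability advantage noted for tree output.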


Properties
• Single dependent variable of any scale
• Multiple independent variables of any scale
• Free of model assumptions typical in other dependence techniques
• Powerful statistical learning algorithm able to identify complex variable interactions

Issues
• Not as familiar
• Standard inferential statistics not applicable
• Often leads to asymmetric relationships

Factor analysis is an interdependence technique used to identify a set of underlying latent traits (factors) that explain the correlations between a large number of variables
• Data summarizing
  ▪ Derive a set of underlying concepts that summarize a larger set of variables
• Data reduction
  ▪ Develop a set of factor variables that serves as a more parsimonious description of the data

Interested in defining underlying dimensions influencing the perception of online destinations
• Survey respondents are asked to rate a set of destinations (including client’s) with respect to a number of traits
• Factor analysis can be applied to develop a succinct set of perception dimensions
• This manageable set of dimensions can be used to characterize a client’s site and to develop a focused plan to reposition it
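The sketch below illustrates the underlying idea on hypothetical ratings: traits that correlate strongly load on a common dimension. It is not a full factor analysis. It extracts only the first principal component of the correlation matrix by power iteration, a simpler stand-in that is often used as an initial extraction step. The ratings, trait selection and function names are assumptions made for illustration.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length variable columns."""
    def standardize(col):
        m = sum(col) / len(col)
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
        return [(v - m) / sd for v in col]

    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(corr, iterations=200):
    """Dominant eigenvalue and unit eigenvector of corr, found by power iteration."""
    v = [1.0] * len(corr)
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in corr]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(r * x for r, x in zip(row, v)) for vi, row in zip(v, corr))
    return eigenvalue, v


# Hypothetical 1-to-5 ratings of one site by six respondents on four traits.
traits = ["Honest", "Dependable", "Reliable", "Daring"]
ratings = [
    [5, 4, 4, 2, 1, 2],   # Honest
    [5, 5, 4, 2, 2, 1],   # Dependable
    [4, 5, 4, 1, 2, 2],   # Reliable
    [3, 1, 4, 2, 5, 3],   # Daring
]

corr = correlation_matrix(ratings)
eigenvalue, vector = first_component(corr)
# Component loadings: correlation of each trait with the extracted dimension.
loadings = [x * math.sqrt(eigenvalue) for x in vector]

for name, loading in zip(traits, loadings):
    print(f"{name:<11}{loading:6.2f}")
```

With these made-up ratings the first three traits load heavily on one dimension, which an analyst might label something like trustworthiness, while "Daring" loads much more weakly. Naming the dimension is the interpretive step, not something the computation supplies.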

Factors can then be used to provide visual summary of data

[Figure: perceptual map positioning sites A through H and the client's site by factor score, with axes running from 1.0 to 4.5. Factor labels shown: Competence, Sophistication, Trustworthy, Exciting.]

Survey question: On a scale of 1 to 5 where "1" means "not at all descriptive" and "5" means "extremely descriptive," how well do each of the following words or phrases describe the website?

Items 15 to 34: Down-to-earth, Daring, Intelligent, Confusing, Friendly, Up-to-date, Clumsy, Slick, Genuine, Imaginative, Pretentious, Upper class, Honest, Spirited, Dependable, Reliable, Informative, Silly, Efficient, Sassy

Properties
• Very useful in identifying structure and relationships in data
• Provides tractable set of concepts for both managerial and analytical uses
• Provides opportunities for visualizations

Issues
• Questionnaire design
• Variable selection
• Factor interpretation and validity

Cluster analysis is an interdependence technique used to segment cases into homogeneous groups based on a specified set of variables
• Data reduction
  ▪ Develop a more parsimonious description of cases which can then be used in analytical classification methods
• Identify similarities between cases with respect to clustering variables
• Characterize clusters with respect to other sets of variables

Want to identify and then characterize similar groups of TV pilot shows based on survey responses rating shows on various traits
• For one or two traits it may be possible to do this subjectively. Cluster analysis provides an objective method for multiple traits
• Clusters can be characterized with respect to variables not used in the analysis, such as show success, and cluster membership can be used as a dependent variable in a classification method

[Figure: scatter plot of TV pilot shows by HUMOR and CLEVER ratings (both axes run from 20 to 60), each show labeled with its title and cluster number. Cluster 1: Low likelihood of success. Cluster 2: Moderate likelihood of success. Cluster 3: High likelihood of success.]
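One way to sketch this kind of segmentation is k-means clustering, shown here on hypothetical (clever, humor) ratings for six unnamed shows. The slides do not say which clustering method produced the figure, so k-means is an assumed choice. The starting centroids are fixed to keep the run deterministic.

```python
def kmeans(points, centroids, iterations=20):
    """Lloyd's k-means: returns (cluster index per point, final centroids)."""
    assign = []
    for _ in range(iterations):
        # Assignment step: each case joins its nearest centroid (squared Euclidean distance).
        assign = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])   # keep the centroid of an empty cluster
        if new_centroids == centroids:               # converged: nothing moved
            break
        centroids = new_centroids
    return assign, centroids


# Hypothetical (clever, humor) survey ratings for six pilot shows.
shows = [(25, 22), (28, 27), (24, 30), (52, 55), (55, 50), (58, 57)]
labels, centers = kmeans(shows, centroids=[shows[0], shows[3]])

# Characterize clusters with a variable not used in the clustering
# (hypothetical success flags: 1 = picked up, 0 = not picked up).
picked_up = [0, 0, 1, 1, 1, 1]
for k in range(len(centers)):
    flags = [s for s, a in zip(picked_up, labels) if a == k]
    print(k, centers[k], sum(flags) / len(flags))
```

The number of clusters and the starting centroids are inputs here, which is where the issues of choosing a method, choosing the number of clusters and validating the result arise in practice.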

Properties
• Many cluster techniques are available for data of all scales
• Can identify structure in large data sets that may be difficult to discover in any other way
• Provides objective segmentation method

Issues
• Selecting appropriate clustering method
• Determining appropriate number of clusters
• Validating clusters
