The Artsy Corporation has been sued in the United States Federal Court on charges of employment discrimination under Title VII of the Civil Rights Act of 1964. (Artsy is an actual corporation and the data given in the case is real, but the name has been changed to protect the firm's true identity.) The litigation at contention here is a "class action" lawsuit brought on behalf of all females whom the company employed, or who had applied for work with the company, between 1979 and 1987. Artsy operates in several states, runs four quite distinct businesses, and has many different types of employees. The allegations against Artsy include issues of hiring, pay, promotions, and other "conditions of employment." In such large class action employment discrimination lawsuits statistical evidence commonly plays a central role in the determination of guilt or damages. In an interesting twist on traditional legal procedures, the precedent in these cases is that plaintiffs may make a "prima-facie" case purely in terms of circumstantial statistical evidence. If that statistical evidence is reasonably strong, the burden of proof shifts to the defendants to rebut the plaintiff's statistics with other statistical data, other statistical analyses of the same data, or by non-statistical testimony. In practice, statistical arguments often dominate the proceedings of such EEO cases. Indeed, in this case the statistical data used filled numerous computer tapes and the supporting statistical analysis comprised thousands of pages of computer printouts and reports. We work here with a small subset of the voluminous data that pertain to one of the several contested issues in one of the company's locations.

Specifically, the data in Table 1 relate to the pay of 256 employees on the biweekly payroll at one of the Artsy Company's Pocahontas, Maine production facilities. The data include: an identification number (IDNUMBER) that would permit us to identify the person by name or social security number, the person's sex (SEX), where 0 denotes a female and 1 denotes a male, the person's job grade in 1986 (GRADE), the length of time (in years) the person had been in that job grade as of 12/31/86 (TING), and the person's weekly pay rate as of 12/31/86 (RATE). The issue of concern is fair pay for female employees. The plaintiff's attorneys have proposed settling the pay issues for this group of female employees for a "back pay" lump payment of 25% of their pay during the period 1979 to 1987. It is our task to examine the data in the table for evidence in favor of, or against, the charges of pay discrimination against the females. To make our mission explicit, suppose that we are to advise the lawyers for the Artsy Company on how to proceed. (An alternative mission would be to assist the plaintiffs.)

Please consider the following issues:

1) Overall, how different is pay by sex? Are the differences in pay statistically significant? Is a statistical hypothesis test appropriate in an issue like this? If so, how should it be done? How could it be explained to a judge? What arguments do you anticipate the plaintiffs will be making with these data?

Answer: A box plot can be used to compare pay by sex graphically.

The box plot suggests that median pay for females is lower than that for males. There are also a few outliers.

Hypothesis testing
H0: There is no significant difference in the mean pay of males and females.
H1: The mean pay of females is significantly lower than that of males.
The test used is the independent-samples t test.

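The test statistic used is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S}$$

where $\bar{X}_1$ and $\bar{X}_2$ are the two sample means and $S$ is the standard error of their difference.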

Rejection criterion: reject the null hypothesis if the absolute value of the calculated t is greater than the critical value of t at the 0.05 significance level. Details:

Group Statistics (RATE)

SEX      N     Mean      Std. Deviation   Std. Error Mean
Female   171   832.77    158.529          12.123
Male     85    1128.18   223.338          24.224

Independent Samples Test (RATE)

Levene's Test for Equality of Variances: F = 18.431, Sig. = .000

t-test for Equality of Means
                              t         df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -12.195   254       .000              -295.405          24.224                  -343.109       -247.700
Equal variances not assumed   -10.905   127.396   .000              -295.405          27.089                  -349.006       -241.803
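The two t statistics reported above can be checked by hand from the Group Statistics alone. The following is a minimal sketch, not the original SPSS procedure: the function name `two_sample_t` is hypothetical, and the inputs are the rounded N, mean and standard deviation values from the table, so the results should agree with the SPSS output only up to rounding.

```python
import math

def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled and Welch two-sample t statistics from summary statistics."""
    diff = mean1 - mean2

    # Equal-variance (pooled) version: one common variance estimate.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Unequal-variance (Welch) version with Satterthwaite degrees of freedom.
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

    return {
        "diff": diff,
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }

# Female group first, male group second (values from the Group Statistics table).
result = two_sample_t(171, 832.77, 158.529, 85, 1128.18, 223.338)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because Levene's test rejects equal variances here, the Welch row is arguably the safer one to quote; both versions lead to the same conclusion.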

Conclusion: Reject the null hypothesis. The sample provides enough evidence to support the claim that the mean pay of females is significantly lower than that of males. Plaintiffs may argue that this difference in average pay is due to gender discrimination at the Artsy Corporation.

2) The Artsy Company wishes to argue that a legitimate explanation of any pay rate difference is the difference in job grades by sex. (In this analysis we will tacitly assume that each person's job grade is, in fact, appropriate for them, even though the plaintiff's attorneys have charged that females have been unfairly kept in the lower grades. Other statistical data, not available here, are used in the analysis of the job placement issue.) The company's lawyers ask, "Is there a relatively easy way to understand, to analyze and display the pay differences by job grade? Easy enough that it could be presented to an average jury without confusing them?" Again, try to anticipate the possible arguments of the plaintiffs. To what extent does job grade appear to explain the pay rate differences between the sexes? Propose and carry out appropriate hypothesis tests or confidence intervals to check whether the difference in pay between sexes is statistically significant within each of the grades.

Answer: A two-way ANOVA with an interaction term can be used to answer the question.

H01: There is no significant difference in the mean pay of males and females.
H11: There is a significant difference in the mean pay of males and females.
H02: There is no significant difference in the mean pay in different grades.
H12: There is a significant difference in the mean pay in different grades.
H03: There is no significant interaction effect between sex and grade for pay rate.
H13: There is a significant interaction effect between sex and grade for pay rate.

Here we are mainly interested in the third hypothesis, about the interaction effect. The test statistic used is the F test (ANOVA). Rejection criterion: reject the null hypothesis if the calculated value of F is greater than the critical value of F at the 0.05 significance level. Details:

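Descriptive statistics for RATE by grade (GRD) and sex

GRD     Female N   Female Mean   Female Std. Dev.   Male N   Male Mean   Male Std. Dev.
1       22         [missing]     81.492             1        804.00      .
2       51         [missing]     56.311             0        -           -
3       22         [missing]     57.085             9        835.33      36.776
4       18         [missing]     87.505             5        824.20      87.688
5       24         [missing]     67.578             11       918.64      161.421
6       15         [missing]     99.692             10       1130.80     113.927
7       17         [missing]     122.897            33       1212.85     133.423
8       2          [missing]     128.693            16       1375.94     103.461
Total   171        [missing]     158.529            85       1128.18     223.338

(The female mean column did not survive in this table. There are no males in grade 2 in Table 1.)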

Tests of Between-Subjects Effects (Dependent Variable: RATE)

Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   1.127E7                   14    804906.055    90.301     .000
Intercept         8.378E7                   1     8.378E7       9398.924   .000
SEX               109218.580                1     109218.580    12.253     .001
GRD               4244536.124               7     606362.303    68.027     .000
SEX * GRD         114509.511                6     19084.919     2.141      .050
Error             2148174.881               241   8913.589
Total             2.352E8                   256
Corrected Total   1.342E7                   255
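Each F value in the table is the effect's mean square divided by the error mean square, and the model fit can be recovered from the sums of squares. The sketch below recomputes both from the figures printed above. The variable names are illustrative, and the corrected total is only available in rounded form (1.342E7), so the fit measures are approximate.

```python
# Mean squares copied from the "Tests of Between-Subjects Effects" table.
ms_error = 8913.589
mean_squares = {"SEX": 109218.580, "GRD": 606362.303, "SEX*GRD": 19084.919}

# F ratio for each effect: effect mean square over error mean square.
f_ratios = {name: ms / ms_error for name, ms in mean_squares.items()}

# R-squared and adjusted R-squared from the sums of squares.
ss_error = 2148174.881
ss_corrected_total = 1.342e7   # rounded in the table, so results are approximate
n_obs, model_df = 256, 14

r_squared = 1 - ss_error / ss_corrected_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - 1 - model_df)

for name, f in f_ratios.items():
    print(f"F({name}) = {f:.3f}")
print(f"R^2 ~ {r_squared:.3f}, adjusted R^2 ~ {adj_r_squared:.3f}")
```

On these inputs the unadjusted and adjusted values differ in the second decimal place, which is worth keeping in mind when quoting a single "variance explained" figure to a court.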

Conclusion: We fail to reject the null hypothesis of no interaction at the 0.05 level, although the result is borderline (F = 2.141, Sig. = .050). The sample therefore does not provide sufficient evidence that the pay difference between the sexes varies from grade to grade. The other two null hypotheses are rejected, which suggests that there is a significant difference in pay rate with respect to both gender (Sig. = .001) and grade (Sig. = .000).

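The interaction plot also supports the above arguments. The model adequacy measure is adjusted R2 = 0.831 (the unadjusted R2 implied by the sums of squares above is about 0.840). Thus roughly 83% of the variability in pay rate can be explained by the two-way ANOVA model.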


3) In the actual case, the analysis carried out in (2) above suggested to the attorneys that differences in pay rates are due, at least in part, to differences in job grades. They had heard that in another EEO case the dependence of pay rate on job grade had been investigated with regression analysis. Perform a simple linear regression of pay rate on job grade. Interpret the results fully. Is the regression significant? How much of the variability in pay does job grade account for? What light does this analysis shed on the pay fairness issue? Does it help or hurt the Artsy company?

Answer: A scatter diagram can be used to represent the relationship between pay rate and grade graphically.

The scatter diagram suggests that there is a positive correlation between grade and pay rate. The estimated regression equation is RATE = 533.937 + 90.001*GRD, with coefficients as follows.

Coefficients (Dependent Variable: RATE)

Model 1      B         Std. Error   Standardized Beta   t        Sig.
(Constant)   533.937   15.341                           34.804   .000
GRD          90.001    3.105        .876                28.989   .000

Thus for a unit increase in grade, the pay rate increases by 90.001 units. The t test for the regression coefficient is highly significant (t = 28.989, p-value reported as .000), so we can conclude that grade has a significant effect on the pay rate.

Model Summary

Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .876   .768       .767                110.724

The model adequacy measure R2 suggests that 76.8% of the variability in pay rate can be explained by the simple regression model with grade as the explanatory variable.
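To make the fitted line concrete for a non-technical audience, the predicted pay at each grade can be tabulated directly from the two coefficients. This is a small illustrative sketch using the rounded values from the Coefficients and Model Summary tables; the helper name `predicted_rate` is hypothetical, and small differences from the SPSS output are due to rounding.

```python
# Rounded estimates from the simple regression of RATE on GRD.
INTERCEPT = 533.937
SLOPE = 90.001
SE_SLOPE = 3.105
R = 0.876

def predicted_rate(grade):
    """Fitted pay rate for a given job grade under the simple regression."""
    return INTERCEPT + SLOPE * grade

# Fitted pay for each of the eight job grades.
fitted = {grade: predicted_rate(grade) for grade in range(1, 9)}
for grade, rate in fitted.items():
    print(f"grade {grade}: predicted RATE = {rate:.2f}")

# Consistency checks against the printed table.
t_slope = SLOPE / SE_SLOPE   # t statistic for the slope
r_squared = R ** 2           # in simple regression, R^2 is the squared correlation
print(f"t for slope ~ {t_slope:.2f}, R^2 ~ {r_squared:.3f}")
```

A table of fitted values like this, set beside the actual mean pay by grade, is one simple way to show a jury how closely pay tracks grade.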

4) It is argued that seniority within a job grade should be taken into account since the Artsy Company's written pay policy explicitly calls for the consideration of this factor. How different are times in grade by sex? Enough to matter?

Answer: An independent-samples t test can be applied.

H0: There is no significant difference in the mean time in grade between males and females.
H1: There is a significant difference in the mean time in grade between males and females.
The test used is the independent-samples t test.

Group Statistics (TinG)

SEX      N     Mean    Std. Deviation   Std. Error Mean
Female   171   1.286   1.0602           .0811
Male     85    2.628   1.8322           .1987

Independent Samples Test (TinG)

Levene's Test for Equality of Variances: F = 84.526, Sig. = .000

t-test for Equality of Means
                              t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       -7.411   254       .000              -1.3423           .1811                   -1.6989        -.9856
Equal variances not assumed   -6.254   112.747   .000              -1.3423           .2146                   -1.7675        -.9170

Conclusion: Reject the null hypothesis. The sample provides enough evidence to support the claim that there is a significant difference in the mean time in grade between males and females. The mean time in grade for males (2.628 years) is clearly higher than that for females (1.286 years). The box plot also supports this argument.

5) The Artsy legal team wants an analysis of the simultaneous influence of grade and time in grade on pay. Perform a multiple linear regression of pay rate versus grade and time in grade. Is the regression significant? How much of the variability in pay rates does this model explain? Will this analysis help your clients? Could the plaintiffs effectively attack it? Utilize residuals in your analysis of these issues.

Answer: The pay rate can be analyzed using a multiple regression model with an interaction term for grade and time in grade.

Coefficients (Dependent Variable: RATE)

Model 1      B         Std. Error   Standardized Beta
(Constant)   537.622   22.454
TinG         18.943    12.838       .124
GRD          75.867    4.659        .739
TinG*GRD     2.891     2.114        .138

(The t and Sig. columns did not survive in this table.)

Rate = 537.622 + 18.943*TinG + 75.867*GRD + 2.891*TinG*GRD

The t tests for the regression coefficients suggest that only grade has a significant effect on the rate. The main effect of TinG and the interaction effect TinG*GRD are not statistically significant. The assumptions of the regression model are checked using residual analysis. The histogram and P-P plot of the residuals suggest that the errors are approximately normally distributed.
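The significance claims for the individual coefficients can be checked approximately by dividing each estimate by its standard error. The sketch below does this under two stated assumptions: the pairing of each B with its standard error is inferred from the order of the values in the coefficient table, and 1.97 is used as an approximate two-sided 5% critical value for a t distribution with 252 degrees of freedom (256 observations minus 4 estimated coefficients).

```python
# (estimate, standard error) pairs; the pairing is inferred from the table order.
coefficients = {
    "TinG": (18.943, 12.838),
    "GRD": (75.867, 4.659),
    "TinG*GRD": (2.891, 2.114),
}

T_CRITICAL = 1.97  # assumed approximate two-sided 5% cutoff for df = 252

# t ratio for each term: estimate divided by its standard error.
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_ratios.items()}

for name in coefficients:
    flag = "significant" if significant[name] else "not significant"
    print(f"{name}: t ~ {t_ratios[name]:.2f} ({flag} at the 5% level)")
```

Note that in a model with an interaction term, the TinG coefficient describes the effect of time in grade at GRD = 0, so its individual t ratio should be read with some care.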

The homogeneity of variance assumption is validated using the plot of residuals against the predicted value.


Thus, with the support of the residual analysis, we can claim that the major factor that influences the rate is the grade. The model adequacy measure R2 indicates that 81.5% of the variability in pay rate can be explained by the multiple regression model.

Model Summary (Dependent Variable: RATE)

Model   R      R Square   Adjusted R Square
1       .903   .815       .813

Predictors: (Constant), TinG*GRD, GRD, TinG

The results from the above analysis give statistical support to the claim that pay is driven mainly by grade rather than by gender, although this model does not itself include sex as a predictor. The residual checks do not reveal violations of the regression assumptions, so it is difficult for the plaintiffs to attack the analysis on technical grounds.


6) The attorneys ask: Is it possible to do a regression analysis that simultaneously considers the effect on pay of grade, time-in-grade and sex? If so, carry one out.

Answer: A multiple regression analysis with a dummy-coded variable for sex can be used to answer the question. The estimated regression model is

Rate = 526.882 + 75.019*Grade + 59.667*Sex + 30.790*TinG

Coefficients (Dependent Variable: RATE)

Model 1      B         Std. Error   Standardized Beta   t        Sig.
(Constant)   526.882   14.131                           37.285   .000
GRD          75.019    3.325        .730                22.562   .000
SEX          59.667    15.980       .123                3.734    .000
TinG         30.790    4.562        .202                6.749    .000

The regression coefficients can be interpreted as follows. For a unit increase in grade, pay increases by 75.019 units. For males, the pay rate is 59.667 units higher than for females with the same grade and time in grade. For each additional year of time in grade, pay increases by 30.790 units. The t tests for all regression coefficients are highly significant, with p-values less than 0.05. Thus we can conclude that all explanatory variables have a significant effect on pay rate. The model adequacy measure is R2 = 0.823, so 82.3% of the variability in pay rate can be explained by the regression model.
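Since the SEX coefficient is the adjusted pay gap, it may help to put an interval around it and to express it relative to average female pay. The following is a rough sketch, not part of the original analysis. It assumes 1.97 as an approximate two-sided 5% critical value for 252 degrees of freedom, and uses the female mean pay of 832.77 from the Group Statistics in question 1. Setting the result beside the proposed 25% settlement is purely illustrative.

```python
# SEX coefficient and its standard error from the regression in question 6.
B_SEX = 59.667
SE_SEX = 15.980
T_CRITICAL = 1.97            # assumed approximate 97.5th percentile of t with df = 252
FEMALE_MEAN_RATE = 832.77    # female mean pay from the question 1 group statistics

# Approximate 95% confidence interval for the adjusted male-female pay gap.
margin = T_CRITICAL * SE_SEX
ci_low, ci_high = B_SEX - margin, B_SEX + margin

# Gap expressed as a share of mean female pay.
gap_share = B_SEX / FEMALE_MEAN_RATE
gap_share_high = ci_high / FEMALE_MEAN_RATE

print(f"adjusted gap: {B_SEX:.3f} (approx. 95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"as a share of mean female pay: {gap_share:.1%} (upper bound {gap_share_high:.1%})")
```

Under these assumptions the interval excludes zero, which the plaintiffs can be expected to emphasize, while the size of the adjusted gap relative to mean female pay is the figure the defense would want in front of the court when the settlement amount is discussed.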


7) Organize your analyses and conclusions in a brief report summarizing your findings for your client, the Artsy Corporation. Be complete but succinct. Be sure to advise them on the issue of the settlement. Please be as forceful as you can be in arguing "the Artsy Case" without misusing the data or statistical theory. Apprise your client of the risks they face by developing the most forceful counter argument that you believe the female plaintiffs could fairly make.

Conclusion: Statistical techniques are applied here to argue that the difference in pay rate between males and females is not the result of discrimination. A naive use of the independent-samples t test suggests that there is a significant difference in pay rate with respect to gender, but that comparison ignores job grade. The regression analyses and the two-way ANOVA are used to counter this argument: they indicate that pay rate is determined primarily by grade. The assumptions of the regression analysis are also checked using the residual plots. The plaintiffs' case is therefore considerably weaker than the raw comparison suggests. Their strongest remaining counter-argument is the regression in question 6, where the SEX coefficient (59.667) remains statistically significant after controlling for grade and time in grade, together with their charge that females have been unfairly kept in the lower grades.


Table 1. Pay data for 256 employees (SEX: 0 = female, 1 = male)

ID    RATE   TinG   SEX   GRD
1     865    1.5    0     2
2     820    0.5    0     4
3     675    1.5    0     2
4     1494   1.5    1     8
5     730    0.5    1     4
6     710    1.5    0     2
7     692    1.1    0     2
8     723    0.5    0     3
9     727    0.8    0     2
10    692    1.5    0     2
11    1142   0.5    1     6
12    1413   0.5    1     8
13    795    1.5    0     3
14    825    1.5    1     3
15    867    0.5    0     4
16    779    0.5    0     3
17    1057   0.5    1     5
18    706    1.5    0     1
19    1052   0.5    1     7
20    735    0.5    0     2
21    780    1.5    0     2
22    1255   0.5    1     7
23    1264   5.0    1     7
24    692    0.7    0     2
25    946    2.5    0     6
26    1410   5.0    1     8
27    747    1.5    0     2
28    789    2.5    0     2
29    1110   1.5    1     7
30    923    0.5    0     5
31    692    0.2    0     2
32    648    1.3    0     1
33    1067   1.5    0     7
34    870    2.5    1     5
35    882    2.5    0     5
36    885    1.5    1     3
37    909    0.5    1     3
38    1035   0.5    0     7
39    658    2.2    0     1
40    860    1.5    1     4
41    616    0.8    0     2
42    924    2.5    0     6
43    929    1.5    1     5
44    762    0.6    0     3
45    1223   2.5    1     7
46    907    3.5    1     4
47    1119   4.5    1     7
48    1050   0.5    0     4
49    1500   4.5    1     7
50    740    0.7    1     5
51    1183   0.5    0     8
52    990    2.5    0     5
53    1368   5.0    1     7
54    1385   0.5    1     8
55    834    0.5    0     3
56    1263   5.0    1     7
57    1154   5.0    1     6
58    1263   5.0    1     7
59    814    1.5    0     5
60    825    1.5    1     3
61    840    5.0    0     3
62    692    0.3    0     2
63    837    0.9    0     3
64    813    0.5    0     4
65    963    2.5    0     7
66    747    2.5    0     2
67    916    0.5    0     6
68    952    1.5    0     3
69    831    0.5    1     3
70    854    1.5    0     3
71    660    0.9    0     2
72    1174   0.5    1     6
73    1057   3.5    1     6
74    1230   1.5    1     7
75    628    1.0    0     1
76    762    1.6    0     2
77    885    0.5    0     5
78    865    0.1    0     4
79    1177   3.5    1     5
80    825    1.5    0     3
81    848    1.5    0     3
82    682    0.8    0     1
83    1240   5.0    1     7
84    1519   5.0    1     7
85    730    0.1    0     4
86    1500   3.3    1     8
87    806    0.4    1     5
88    813    1.5    0     2
89    801    0.5    1     3
90    894    0.1    0     4
91    825    0.5    0     4
92    893    1.5    0     5
93    687    2.5    0     2
94    796    0.5    0     3
95    702    1.2    0     2
96    788    0.5    0     1
97    1110   1.5    1     7
98    779    4.5    0     1
99    795    2.5    0     2
100   780    0.1    0     2
101   819    2.5    1     3
102   1229   4.5    1     8
103   810    0.5    0     5
104   630    0.2    0     1
105   730    0.5    0     4
106   1065   0.5    1     7
107   816    1.5    0     3
108   1172   5.0    1     7
109   723    1.3    0     2
110   958    3.0    1     6
111   1275   5.0    1     8
112   894    0.5    0     6
113   602    1.0    0     1
114   1004   2.5    0     7
115   1135   0.5    1     6
116   840    1.5    0     3
117   756    2.5    0     2
118   770    1.5    0     2
119   750    0.5    0     2
120   687    2.5    0     2
121   900    0.5    0     4
122   780    1.0    0     2
123   1428   5.0    1     7
124   1275   5.0    1     8
125   912    0.5    0     5
126   1174   0.5    1     7
127   710    1.8    0     2
128   1263   4.5    1     7
129   788    1.2    0     2
130   808    0.5    0     2
131   1338   5.0    1     7
132   808    2.2    0     4
133   1230   3.5    1     7
134   1024   0.5    0     7
135   588    1.1    0     1
136   906    0.8    0     5
137   1552   5.0    1     8
138   1177   5.0    1     5
139   802    1.2    0     2
140   612    0.5    0     1
141   1002   1.5    0     6
142   932    1.5    0     4
143   1191   1.5    1     7
144   730    0.1    0     4
145   1365   0.5    0     8
146   810    1.5    0     3
147   856    1.5    0     1
148   1269   3.5    1     7
149   624    0.8    0     2
150   865    0.5    0     4
151   698    0.9    0     1
152   1238   2.0    0     7
153   990    2.5    0     6
154   818    1.5    0     1
155   687    2.5    0     2
156   1067   1.5    0     7
157   730    0.5    1     4
158   1350   1.5    1     8
159   1385   5.0    1     8
160   867    1.5    0     5
161   1128   3.7    1     7
162   1082   5.0    0     6
163   1396   5.0    1     8
164   831    0.5    0     5
165   692    0.0    0     2
166   1131   4.5    1     7
167   837    0.1    0     3
168   735    0.5    0     2
169   1073   1.5    0     5
170   710    0.7    0     2
171   923    0.5    0     4
172   1200   0.9    0     6
173   894    1.5    1     4
174   804    1.5    0     2
175   590    0.8    0     1
176   914    0.5    0     6
177   588    1.0    0     1
178   780    0.5    0     5
179   623    0.3    0     1
180   717    0.7    0     1
181   762    0.7    0     4
182   1154   2.5    1     8
183   779    1.5    0     2
184   771    0.5    0     3
185   1350   1.5    1     8
186   1360   0.5    0     7
187   616    0.8    0     2
188   1428   5.0    1     8
189   813    1.5    1     5
190   740    0.5    1     5
191   635    0.2    0     2
192   817    0.5    0     5
193   713    1.5    0     2
194   952    1.5    0     6
195   1376   5.0    1     6
196   630    4.5    0     1
197   901    1.5    0     5
198   579    0.8    0     1
199   952    1.5    0     5
200   1125   0.5    1     6
201   663    0.5    0     2
202   1390   5.0    1     7
203   1038   0.7    0     7
204   720    0.2    0     2
205   960    4.5    1     7
206   756    2.3    0     2
207   597    0.9    0     1
208   623    0.5    0     2
209   756    2.5    0     2
210   804    3.5    1     1
211   1158   5.0    1     7
212   1148   2.5    0     7
213   1050   0.5    0     7
214   858    3.5    0     5
215   1004   2.5    1     6
216   1390   5.0    0     7
217   894    1.5    0     5
218   952    0.8    1     7
219   1200   0.5    1     7
220   842    0.5    0     3
221   1131   2.5    1     7
222   990    2.5    1     5
223   1073   3.5    0     7
224   690    0.7    0     2
225   961    5.0    0     5
226   762    0.8    0     2
227   1419   0.5    1     8
228   1258   5.0    1     7
229   900    1.5    0     3
230   804    1.5    1     3
231   1096   0.4    0     6
232   932    2.5    0     5
233   819    0.5    1     3
234   1056   2.5    0     7
235   764    0.5    0     3
236   1079   1.5    0     6
237   690    0.5    0     2
238   1183   0.5    1     6
239   837    0.2    0     5
240   929    0.5    0     5
241   835    1.5    0     5
242   886    1.5    0     3
243   806    0.5    1     5
244   929    0.5    0     6
245   1070   2.5    1     7
246   730    1.0    0     4
247   762    0.5    0     4
248   1053   0.5    0     7
249   1188   3.3    0     6
250   981    1.3    0     7
251   951    5.0    0     3
252   606    0.5    0     1
253   806    0.5    0     5
254   720    0.2    0     2
255   981    0.5    0     6
256   1038   2.5    0     7


