Jager

Name: Hantao (Forrest) Ling

Fall 2013 ST314: Homework 1
Total: 25 points
Due: Wednesday, October 8th, at 11:55pm

Please download, complete, and upload as PDF. No other format will be accepted. Typing or entering answers by hand is accepted as long as solutions are neatly given and the document is uploaded as PDF. Give the solutions in the space provided. Material from the Week 1 Lecture and Chapters 1 and 2 is covered on this assignment. Only select problems will be graded for correctness. Others will be graded upon completion.

Part I: (5 points) 3x5 intro card. Bring to class Tuesday, October 7th, a 3x5 index card with the following information:
1. A picture of you attached.
2. The type of engineering you are interested in.
3. Best class you ever took and why.
4. Something interesting about you!
This helps me remember your names.

Part II: Data collection and Visualization: Chapters 1 and 2

1. (Question 1.4 in Text) Researchers are interested in the breaking strength of the plastic. Small plastic pellets are pressed into a test specimen and then measured for breaking strength. There are two suppliers of the plastic pellets. Interest lies in determining if the mean breaking strength differs for the two suppliers. Fifteen batches of pellets are selected at random from each supplier.

a. What is the response of interest? Hint: What is being measured? The response of interest is the breaking strength of the test specimen pressed from the plastic pellets.

b. List the factors of interest. State whether they are categorical or continuous.
- Supplier - Categorical
- Test Specimen - Categorical

c. List the levels for the factors.
Factor: Supplier. Levels: Supplier A, Supplier B
Factor: Test Specimen. Levels: Test specimen (only one level)

d. List out all possible treatment combinations.
- Breaking strength tested on the structure of the plastic pellets from Supplier A pressed into a test specimen
- Breaking strength tested on the structure of the plastic pellets from Supplier B pressed into a test specimen

e. What is the experimental unit? What is the observation unit?
Experimental unit: 15 batches of plastic pellets from each supplier
Observation unit: The structure of the plastic pellets pressed into a test specimen

f. (1 point) Is the experiment randomized? Yes. The batches from each supplier are selected at random.

2. Define the following terms and give an example of each (within the space provided): a. Population A population is the total set of all entities in a certain group.

b. Simple Random Sample A simple random sample is a sample from a population where all of the possible combinations of the sample size have an equal chance of being chosen from the population.

c. Stratified Random Sample A stratified random sample is a sample taken from a population that has been divided into sub-populations (strata) that share a common characteristic, with a random sample drawn from each sub-population.

d. Systematic Sample A systematic sample is taken by generating a random starting position within a population and then selecting every nth item.
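The three sampling definitions above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the population of 60 numbered batches, the two supplier strata, and the function names are all hypothetical and do not come from the text.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_random_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the same size from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def systematic_sample(population, step, rng):
    """Pick a random start among the first `step` items, then take every step-th item."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(314)  # fixed seed so the run is repeatable

# Hypothetical population: 60 numbered batches, 30 from each of two suppliers.
batches = [f"batch-{i:02d}" for i in range(1, 61)]
strata = {"Supplier A": batches[:30], "Supplier B": batches[30:]}

srs = simple_random_sample(batches, 10, rng)
stratified = stratified_random_sample(strata, 5, rng)
systematic = systematic_sample(batches, 6, rng)

print("Simple random:", srs)
print("Stratified:   ", stratified)
print("Systematic:   ", systematic)
```

All three calls return 10 batches here, but only the stratified sample guarantees exactly 5 from each supplier.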

3. In a short paragraph name, describe and compare the three methods for collecting data, or studies, described in the text. Which method seems to have the least disadvantages? There are three methods used to collect data: the retrospective study, the observational study, and the designed experiment. The retrospective study focuses on compiling and analyzing data from previous studies to create a new or more comprehensive study. This study is cost efficient but does not account for missing data and does not allow for the distinction between variable effects. The observational study is an observation of subjects or units where variables or measurements are collected based on an established process. This study is useful when a treatment cannot be imposed on a subject. However, it also does not allow for the distinction between variable effects, and thus a cause-and-effect relationship cannot be established. The last data collection method is the designed experiment. The designed experiment can control for confounding between variables and allows the researcher to claim cause-and-effect relationships. This process can be more complicated and difficult to do properly, but it yields the best results. Thus, the designed experiment is the method with the fewest disadvantages for collecting data.

4. Given that a data display of a distribution indicates extreme outliers and skewness, which measure, the mean or the median, would be more appropriate to represent the center of the data? Explain your reasoning. In the case that extreme outliers and skewness are present in a data distribution, the more appropriate measure of the center of the data would be the median. Because the mean includes the actual values of the outliers in its calculation, it is more easily influenced by the outliers than the median. For the calculation of the median, an outlier carries the same weight in determining the center of the data as any other data point and thus will not affect the calculation nearly as much as it affects the mean. For example, when looking at the center of data for the salaries of employees at a large company, the mean of the data may show a skewed center salary value because of the highly paid CEO and manager positions. However, if the median of the data is used instead, the center salary value may be more realistic because a much larger portion of the company may be made up of manual laborers and lower-paying positions.
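The salary example can be made concrete with a small Python sketch. The salary figures below are hypothetical values chosen only for illustration; they are not data from the text.

```python
from statistics import mean, median

# Hypothetical annual salaries in thousands of dollars: nine staff and one CEO.
salaries = [32, 35, 36, 38, 40, 41, 43, 45, 48, 900]

center_mean = mean(salaries)      # uses every value, so the CEO salary pulls it up
center_median = median(salaries)  # depends only on the middle of the sorted list

print("mean:  ", center_mean)
print("median:", center_median)
```

With these made-up numbers the mean lands far above what nine of the ten employees earn, while the median stays among the typical salaries.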

5. A study was done of the strength of high-performance concrete obtained by using superplasticizers and certain binders. In particular, the following data were gathered on the flexural strength (a measure of the ability to resist failure in bending), measured in megapascals (MPa), for a sample of 28 beams made of high-performance concrete.

5.9 8.2 7.2 8.7 7.3 7.8 6.3 9.7 8.1 8.5 6.8 7.7 7.0 9.7
7.6 7.8 6.8 7.7 6.4 8.3 6.3 7.9 9.0 11.6 10.3 11.8 10.7 13.1

a) Construct a stem-leaf display for these data by hand. Hint: Page 40 in Text

**Note: 5 | 9 = 5.9
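As a cross-check on a hand-drawn display, the following Python sketch builds a stem-and-leaf display from the 28 flexural strengths, using the integer part as the stem and the tenths digit as the leaf (so 5 | 9 = 5.9). The function name and layout are illustrative choices, not the textbook's exact format.

```python
from collections import defaultdict

strengths = [5.9, 8.2, 7.2, 8.7, 7.3, 7.8, 6.3, 9.7, 8.1, 8.5, 6.8, 7.7, 7.0, 9.7,
             7.6, 7.8, 6.8, 7.7, 6.4, 8.3, 6.3, 7.9, 9.0, 11.6, 10.3, 11.8, 10.7, 13.1]

def stem_and_leaf(values):
    """Return display lines: integer part as stem, tenths digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)  # work in whole tenths to avoid floating-point error
        leaves[tenths // 10].append(tenths % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

display = stem_and_leaf(strengths)
print("\n".join(display))
```

Keeping the empty stem for 12 makes the gap before the 13.1 MPa beam visible in the display.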

b) Discuss the distribution of the flexural strength of these beams made of high-performance concrete (i.e. symmetric, skewed, unimodal, outliers etc...). The distribution of the flexural strength of these beams is skewed to the right. The majority of the data points exist around the 6 MPa to 8 MPa range, but there is an outlier beam with a flexural strength of 13.1 MPa that skews the data.

c) Suppose that specifications state that the flexural strength of a beam must be 6.5 MPa or higher if it is to be used as a support beam on a bridge. What proportion of the beams in this sample would meet specifications? In this sample twenty-four out of twenty-eight, or 6/7, beams would meet the given specifications.
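The proportion in part c) can be verified by counting directly. This is a minimal sketch; the constant name SPEC_MIN is an illustrative choice.

```python
from fractions import Fraction

strengths = [5.9, 8.2, 7.2, 8.7, 7.3, 7.8, 6.3, 9.7, 8.1, 8.5, 6.8, 7.7, 7.0, 9.7,
             7.6, 7.8, 6.8, 7.7, 6.4, 8.3, 6.3, 7.9, 9.0, 11.6, 10.3, 11.8, 10.7, 13.1]

SPEC_MIN = 6.5  # minimum flexural strength in MPa required by the specification

passing = [s for s in strengths if s >= SPEC_MIN]
failing = sorted(s for s in strengths if s < SPEC_MIN)

# Fraction reduces the count ratio to lowest terms.
proportion = Fraction(len(passing), len(strengths))

print("meets spec:", len(passing), "of", len(strengths), "=", proportion)
print("below spec:", failing)
```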

d) What do you think about the advisability of using these beams as bridge supports? I would advise against using these beams as bridge supports because around 1/7 of the beams cannot handle the load of a bridge. If around 14% of the bridge is made up of these unsafe beams, there is a good chance that the bridge would fail.

6. The following data are lifetimes (in hundreds of hours) of twenty-six 40-watt 110-volt internally frosted incandescent lamps taken from forced life tests. Hint: Page 54 in Text

4.1 8.6 9.1 9.2 9.2 9.3 9.4 9.5 9.5 9.6 9.7 9.7 9.8
9.9 10.4 10.5 10.6 10.7 10.9 10.9 11.0 11.3 11.6 11.7 12.4 13.4

a) Find the following
First Quartile ___9.4___
Median ___9.85___
Third Quartile ___10.9___
The step ___2.25___
The lower fences: inner __7.15__ outer __4.9__
The upper fences: inner __13.15__ outer __15.4__

b) Construct a boxplot by hand below and label the axis

Bulb Lifetime (in hundreds of hours)
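The quartiles, step, and fences from part a) can be checked with a short Python sketch. It assumes the convention that the quartiles are the medians of the lower and upper halves of the sorted data and that the step is 1.5 times the interquartile range; other quartile conventions can give slightly different values. The function name is illustrative.

```python
from statistics import median

lifetimes = [4.1, 8.6, 9.1, 9.2, 9.2, 9.3, 9.4, 9.5, 9.5, 9.6, 9.7, 9.7, 9.8,
             9.9, 10.4, 10.5, 10.6, 10.7, 10.9, 10.9, 11.0, 11.3, 11.6, 11.7, 12.4, 13.4]

def boxplot_summary(values):
    """Quartiles (medians of the halves), step, and inner/outer fences."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of the lower half
    q3 = median(data[-half:])   # median of the upper half
    step = 1.5 * (q3 - q1)
    return {
        "q1": q1, "median": median(data), "q3": q3, "step": step,
        "inner": (q1 - step, q3 + step),
        "outer": (q1 - 2 * step, q3 + 2 * step),
    }

summary = boxplot_summary(lifetimes)
inner_lo, inner_hi = summary["inner"]
outer_lo, outer_hi = summary["outer"]

# Points beyond the outer fences are extreme outliers; points between the
# inner and outer fences are mild outliers.
extreme = [x for x in lifetimes if x < outer_lo or x > outer_hi]
mild = [x for x in lifetimes
        if (outer_lo <= x < inner_lo) or (inner_hi < x <= outer_hi)]

for key, value in summary.items():
    print(key, value)
print("extreme outliers:", extreme)
print("mild outliers:", mild)
```

Under this convention the 4.1 lifetime falls below the lower outer fence, and the 13.4 lifetime falls just above the upper inner fence.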

c) Describe in words the distribution of the lifetimes of these 40-watt incandescent lamps. Most of the lifetimes are clustered fairly symmetrically around 1000 hours, but an extreme low outlier gives the distribution a long left tail. One of the bulbs only lasted 410 hours whereas the next lowest bulb lifetime was over double that at 860 hours.

d) Calculate the mean and standard deviation of the lifetimes (in hundreds of hours) of the twenty-six lamps. Show work. 1. How does the mean compare to the median?
Mean = (4.1 + 8.6 + 9.1 + 9.2 + 9.2 + 9.3 + 9.4 + 9.5 + 9.5 + 9.6 + 9.7 + 9.7 + 9.8 + 9.9 + 10.4 + 10.5 + 10.6 + 10.7 + 10.9 + 10.9 + 11.0 + 11.3 + 11.6 + 11.7 + 12.4 + 13.4) / 26

Mean = 10.08
Mean = 1008 hours

The mean of the data set is 1008 hours whereas the median is 985 hours. The mean is slightly higher than the median because the lifetimes above the median are spread over a wider range (990 to 1340 hours) than most of the lifetimes below it (860 to 980 hours), which pulls the mean up. The outlier bulb that only lasted 410 hours pulls the mean down but has almost no effect on the median.

2. How does standard deviation compare to the IQR?
Standard Deviation = sqrt( sum of (BulbLifetime - AverageBulbLifetime)^2 / (26 - 1) )
Standard Deviation = 1.659
Standard Deviation = 165.9 hours
The standard deviation of the data set is 165.9 hours whereas the IQR of the data set is 150 hours. These numbers make sense because the standard deviation uses every data point whereas the IQR uses only the middle half of the data. The standard deviation is inflated more by the outliers than the IQR.
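The following Python sketch recomputes the mean, the sample standard deviation (divisor n - 1), and the IQR, and then repeats the spread calculations with the 410-hour bulb removed to show how differently the two measures react to an outlier. The IQR helper assumes quartiles are the medians of the lower and upper halves; the helper name is illustrative.

```python
from statistics import mean, median, stdev

lifetimes = [4.1, 8.6, 9.1, 9.2, 9.2, 9.3, 9.4, 9.5, 9.5, 9.6, 9.7, 9.7, 9.8,
             9.9, 10.4, 10.5, 10.6, 10.7, 10.9, 10.9, 11.0, 11.3, 11.6, 11.7, 12.4, 13.4]

def iqr(values):
    """Interquartile range using medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])

xbar = mean(lifetimes)
s = stdev(lifetimes)        # sample standard deviation, divisor n - 1
spread_iqr = iqr(lifetimes)

# Same measures with the extreme low outlier (4.1) removed.
trimmed = [x for x in lifetimes if x != 4.1]
s_trimmed = stdev(trimmed)
iqr_trimmed = iqr(trimmed)

print(f"mean = {xbar:.2f}, sd = {s:.3f}, IQR = {spread_iqr:.2f}")
print(f"without 4.1: sd = {s_trimmed:.3f}, IQR = {iqr_trimmed:.2f}")
```

Removing the single outlier lowers the standard deviation from about 1.66 to about 1.15 while the IQR stays at 1.5, which supports the claim that the standard deviation is the less resistant measure.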

7. (Data from Example 2.19 in Text) In 1992 researchers studied the impact of viscosity on the observed coating thicknesses produced by a paint operation. Ideally, this process should produce a coating thickness of 0.8 millimeters. For simplicity, they chose to study only two viscosities: low and high. The following plots were produced using Minitab.

a) Create a Stem-and-leaf display for Coating Thickness for each Viscosity level

Low Viscosity: High Viscosity:

Note: 8 | 5 = 0.85

Note: 7 | 5 = 0.75

Interpret the Stem-and-leaf displays. The low viscosity data set has a smaller spread than the high viscosity data set. The low viscosity data set also seems to have two peaks at the top and bottom of the plot whereas the high viscosity data set seems to have two smaller peaks at the top and bottom of the plot and a larger peak in the center of the plot.

b) Dotplot comparison of Coating Thickness by Viscosity level


[Figure: Dotplot of Thickness by Viscosity (High, Low); Thickness axis from 0.75 to 2.25]

Interpret the Dotplot. In the dotplot, it is a little easier to see the trends. The bulk of the data points in the high viscosity data set are located in the center of the data distribution, around 1.5 mm thick. However, there are also smaller peaks on the left and right sides of the distribution. For the low viscosity data set, the distribution seems more even but is difficult to interpret in the dotplot.

c) Side-by-side boxplots of Coating Thickness by Viscosity level


[Figure: Boxplot of Thickness by Viscosity; Thickness axis from 0.5 to 2.5]

Interpret the side-by-side boxplots. In the box plot, you can see that the spread of data is again larger for the high viscosity data set and smaller for the low viscosity data set. The average thickness for both viscosities seems pretty similar, which was difficult to tell in the previous two plots.

d) Histograms of Coating Thickness by Viscosity Level


[Figure: Histogram of Low and Histogram of High; vertical axis: Frequency; horizontal axes: Low and High, each from 0.6 to 2.4]

Interpret the Histograms. In the histograms, it is easy to see where exactly the data is located for both of the viscosity levels. The low viscosity data set is definitely more consistent with a smaller spread of data. The high viscosity data set, though, is obviously more spread out and includes three different peaks.

e) Comment on the plots. What are the advantages and disadvantages of each display? Which plot do you prefer when comparing the two groups (No wrong answer)? Is there evidence that the coating thickness differs between the two viscosity levels? Explain.

The different plots give different types of information at varying levels. The stem and leaf plot shows raw data in an easy to read format and gives a mediocre display of data distribution. However, beyond those aspects, the stem and leaf plot does not have much else to offer. It is still difficult to read compared to the other graphs and gives almost no concrete analysis of the data. The dotplot is also very difficult to read. The data points are hard to see and it is difficult to interpret much from the plot. The averages and spread of each data set are mostly left up to guesswork. The dotplot does give a better idea of the data distribution than the stem and leaf plot. The boxplot was a good tool to easily interpret the averages of each data set and see the total distribution of data. The downside to the boxplot is that the reader loses the ability to look at individual data points and instead looks at general distributions. Finally, the histogram seemed to provide the most information in the most aspects. The display is easy to look at and read. The distribution and spread of data can be seen clearly along with the individual data points. The averages of the data sets may be more difficult to locate, but the histogram still provides a better display for estimating averages than the stem and leaf plot or the dotplot. In my opinion, the histogram is the best overall plot to use for analyzing raw data.

Based on the two data sets, it seems that there is a coating thickness difference between the two viscosity levels. While the averages of each of the viscosity levels were about even, the distributions were very different. The high viscosity data was very spread out, which indicates that that method is not very consistent. Thus, the low viscosity paint seems to be more effective.
