Sampling Design Lecture Notes UNIT III

Published by Amit Kumar
Sampling Design- Census & Sample Surveys; Steps in Sampling Design; Types of Sample designs-Probability & Non Probability sampling.
Published by: Amit Kumar on Mar 04, 2013
Prof Amit Kumar, FIT Group of Institutions
ANALYSIS OF DATA
Analysis of data is a process of inspecting, cleaning, transforming, and modelling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Data is often coded before storage. Coding means changing the original data into a shortened version by assigning a code. For example, consider these data codes:
Instead of Male, Female it could be shortened to M, F
The full date of 18th January 2000 AD could be shortened to JAN
The colours of the houses could be said to be 'lilac', 'light blue', 'black', 'sage' and so on, but the data coding could change these to Pink, Blue, Black, Green.
Advantages of coding
Less storage space required
Comparisons are shortened and can therefore be made quicker, thus speeding up searches
A limited number of codes makes data input faster and simplifies validation.
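The coding idea above can be sketched in a few lines of Python. This is only an illustration: the code tables, the function names `encode` and `is_valid_code`, and the sample responses are assumptions made for the example, not part of any standard scheme.

```python
# Hypothetical code tables; the names and entries are illustrative assumptions.
GENDER_CODES = {"Male": "M", "Female": "F"}
COLOUR_CODES = {"lilac": "Pink", "light blue": "Blue", "black": "Black"}

def encode(value, code_table):
    """Return the short code for a value; reject values that have no code."""
    if value not in code_table:
        raise ValueError(f"no code defined for {value!r}")
    return code_table[value]

def is_valid_code(code, code_table):
    """Validation is a simple membership check against the limited code set."""
    return code in set(code_table.values())

responses = ["Male", "Female", "Female"]
coded = [encode(r, GENDER_CODES) for r in responses]
print(coded)                              # ['M', 'F', 'F']
print(is_valid_code("X", GENDER_CODES))   # False
```

Because only a small set of codes is allowed, an invalid entry is caught by a single lookup, which is the validation advantage listed above.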
Problems of Coding
Coding obscures the meaning of data
A reader seeing the 'gender' data as M/F is pretty likely to know that it means Male/Female. But with more obscure codes, such as Switzerland being coded as CHE, the reader must be given the complete list of possibilities to understand the meaning of the data.
Coding of Value Judgements
For example, was that curry too spicy? Suppose the answer is to be coded as a judgement on a scale of 1-4. This will be coded differently by different people, which makes comparison difficult. The problem with a value judgement is that there is no single correct value. The value depends on someone's opinion. Coding of value judgements will inevitably lead to coarsening of the data, since there will be a wide range of opinions that could be held and only a limited number of codes available.
Data should be edited before being presented as information. This action ensures that the information provided is accurate, complete and consistent. No matter what type of data you are working with, certain edits are performed on all surveys. Data editing can be performed manually, with the assistance of computer programming, or a combination of both techniques. It depends on the medium (electronic, paper) by which the data are submitted.
There are two levels of data editing: micro-editing and macro-editing.
Micro-editing corrects the data at the record level. This process detects errors in data through checks of the individual data records. The intent at this point is to determine the consistency of the data and correct the individual data records.
Macro-editing also detects errors in data, but does this through the analysis of aggregate data (totals). The data are compared with data from other surveys, administrative files, or earlier versions of the same data. This process determines the compatibility of data.
We might ask the question "Why are there errors in our files?" There are several situations where errors can be introduced into the data, and the following list gives some of them:
A respondent could have misunderstood a question.
A respondent or an interviewer could have checked the wrong response.
An interviewer could have miscoded or misunderstood a written response.
An interviewer could have forgotten to ask a question or record the answer.
A respondent could have provided inaccurate responses.
Always keep in mind the objectives of data editing:
to ensure the accuracy of data;
to establish the consistency of data;
to determine whether or not the data are complete;
to ensure the coherence of aggregated data; and
to obtain the best possible data available.
Applying editing rules
So, how do we edit? The first step is to apply 'rules' (or factors to be taken into consideration) to the data. These rules are determined by the expert knowledge of a subject-matter specialist, the structure of the questionnaire, the history of the data, and any other related surveys or data.
Expert knowledge can come from a variety of sources. The specialist could be an analyst who has extensive experience with the type of data being edited. An expert could also be one of the survey sponsors who is familiar with the relationships between the data.
The layout and structure of the questionnaire will also impact the rules for editing data. For example, sometimes respondents are instructed to skip certain questions if the questions do not apply to them or their situation. This specification must be respected and incorporated into the editing rules.
Lastly, other surveys relating to the same sort of variables or characteristics are used in order to establish some of the rules for editing data.
Data editing types
There are several types of data edits available. They include:
Validity edits look at one question field or cell at a time. They check to ensure that the record identifiers, invalid characters, and values have been accounted for; essential fields have been completed (e.g., no quantity field is left blank where a number is required); specified units of measure have been properly used; and the reporting time is within the specified limits.
Range edits are similar to validity edits in that they look at one field at a time. The purpose of this type of edit is to ensure that the values, ratios and calculations fall within the pre-established limits.
Duplication edits examine one full record at a time. These types of edits check for duplicated records, making certain that a respondent or a survey item has only been recorded once. A duplication edit also checks to ensure that the respondent does not appear in the survey universe more than once, especially if there has been a name change. Finally, it ensures that the data have been entered into the system only once.
Consistency edits compare different answers from the same record to ensure that they are coherent with one another. For example, if a person is declared to be in the 0 to 14 age group, but also claims that he or she is retired, there is a consistency problem between the two answers. Inter-field edits are another form of consistency edit. These edits verify that if a figure is reported in one section, a corresponding figure is reported in another.
Historical edits are used to compare survey answers in current and previous surveys. For example, any dramatic changes since the last survey will be flagged. The ratios and calculations are also compared, and any percentage variance that falls outside the established limits will be noted and questioned.
Statistical edits look at the entire set of data. This type of edit is performed only after all other edits have been applied and the data have been corrected. The data are compiled and all extreme values, suspicious data and outliers are rejected.
Miscellaneous edits fall in the range of special reporting arrangements; dynamic edits particular to the survey; correct classification checks; changes to physical addresses, locations and/or contacts; and legibility edits (i.e., making sure the figures or symbols are recognizable and easy to read).
Data editing is influenced by the complexity of the questionnaire. Complexity refers to the length, as well as the number of questions asked. It also includes the detail of questions and the range of subject matter that the questionnaire may cover. In some cases, the terminology of a question can be very technical. For these types of surveys, special reporting arrangements and industry-specific edits may occur.
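Several of the edit types described above can be sketched as small record-level checks. The following is a minimal illustration only: the field names (`id`, `age`, `status`), the age limits of 0 to 120, and the sample records are all hypothetical and would in practice come from the editing rules set by the subject-matter specialist.

```python
# Hypothetical survey records; field names and limits are assumptions.
records = [
    {"id": 1, "age": 34, "status": "employed"},
    {"id": 2, "age": 12, "status": "retired"},    # age group 0-14 but retired
    {"id": 3, "age": 150, "status": "employed"},  # outside the allowed range
    {"id": 3, "age": 150, "status": "employed"},  # same respondent entered twice
    {"id": 4, "age": None, "status": "student"},  # essential field left blank
]

def validity_edit(rec):
    """Essential fields must be completed."""
    return rec["age"] is not None and rec["status"] is not None

def range_edit(rec, low=0, high=120):
    """A reported age must fall within pre-established limits."""
    return rec["age"] is None or low <= rec["age"] <= high

def consistency_edit(rec):
    """Answers in the same record must agree with one another."""
    return not (rec["age"] is not None and rec["age"] <= 14
                and rec["status"] == "retired")

def run_edits(records):
    """Return (record index, failed edit) pairs for every failed check."""
    failures = []
    seen_ids = set()
    for index, rec in enumerate(records):
        if not validity_edit(rec):
            failures.append((index, "validity"))
        if not range_edit(rec):
            failures.append((index, "range"))
        if not consistency_edit(rec):
            failures.append((index, "consistency"))
        if rec["id"] in seen_ids:          # duplication edit
            failures.append((index, "duplication"))
        seen_ids.add(rec["id"])
    return failures

for index, edit in run_edits(records):
    print(f"record {index}: failed {edit} edit")
```

The sketch only flags records. Whether a flagged record is corrected, re-contacted or rejected is a separate decision that depends on the survey.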
The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements whereas columns are vertical arrangements. A table may be simple, double or complex depending upon the type of classification.
Types of Tabulation:
(1) Simple Tabulation or One-way Tabulation:
When the data are tabulated according to one characteristic, it is said to be simple tabulation or one-way tabulation.
For Example:
Tabulation of data on the population of the world classified by one characteristic like Religion is an example of simple tabulation.
(2) Double Tabulation or Two-way Tabulation:
When the data are tabulated according to two characteristics at a time, it is said to be double tabulation or two-way tabulation.
For Example:
Tabulation of data on the population of the world classified by two characteristics like Religion and Sex is an example of double tabulation.
(3) Complex Tabulation:
When the data are tabulated according to many characteristics, it is said to be complex tabulation.
For Example:
Tabulation of data on the population of the world classified by several characteristics like Religion, Sex and Literacy is an example of complex tabulation.
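The three types of tabulation differ only in how many characteristics are counted together, which a short sketch can make concrete. The records below are invented for illustration (religions are labelled simply "A" and "B"), and the helper name `tabulate` is an assumption of this example.

```python
from collections import Counter

# Hypothetical individual records; the values are invented for illustration.
people = [
    {"religion": "A", "sex": "F", "literate": True},
    {"religion": "A", "sex": "M", "literate": True},
    {"religion": "B", "sex": "F", "literate": False},
    {"religion": "A", "sex": "F", "literate": False},
    {"religion": "B", "sex": "M", "literate": True},
]

def tabulate(records, *characteristics):
    """Count records for each combination of the named characteristics."""
    return Counter(tuple(rec[c] for c in characteristics) for rec in records)

one_way = tabulate(people, "religion")                   # simple tabulation
two_way = tabulate(people, "religion", "sex")            # double tabulation
complex_table = tabulate(people, "religion", "sex", "literate")

print(one_way[("A",)])        # 3
print(two_way[("A", "F")])    # 2
```

Each key of the resulting counter is one cell of the table, and the cell counts always add up to the number of records.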
A pie chart (or a circle graph) is a circular chart divided into sectors, illustrating proportion. In a pie chart, the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents. When angles are measured with 1 turn as unit, a number of percent is identified with the same number of centiturns. Together, the sectors create a full disk. It is named for its resemblance to a pie which has been sliced.
The pie chart is perhaps the most widely used statistical chart in the business world and the mass media. However, it has been criticized, and some recommend avoiding it, pointing out in particular that it is difficult to compare different sections of a given pie chart, or to compare data across different pie charts. Pie charts can be an effective way of displaying information in some cases, in particular if the intent is to compare the size of a slice with the whole pie, rather than comparing the slices among themselves. Pie charts work particularly well when the slices represent 25 to 50% of the data, but in general, other plots such as the bar chart or the dot plot, or non-graphical methods such as tables, may be better suited for representing certain information. It also shows the frequency within certain groups of information.
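Since each sector's central angle is proportional to the quantity it represents, the angles follow directly from the data: angle = 360 x quantity / total. A minimal sketch, with invented regional sales figures and a helper name `pie_angles` chosen for this example:

```python
def pie_angles(quantities):
    """Return the central angle in degrees of each sector of a pie chart."""
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("the quantities must sum to a positive total")
    return {label: 360.0 * value / total for label, value in quantities.items()}

# Hypothetical data, for illustration only.
sales = {"North": 50, "South": 25, "East": 15, "West": 10}
angles = pie_angles(sales)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

Here the North sector, with half of the total, takes half of the disk (180 degrees), and the angles of all sectors sum to 360 degrees.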
A bar chart or bar graph is a chart with rectangular bars with lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.
Bar charts are used for plotting discrete (or discontinuous) data, that is, data which has discrete values. Some examples of discontinuous data include 'shoe size' or 'eye color', for which you would use a bar chart. In contrast, some examples of continuous data would be 'height' or 'weight'. A bar chart is very useful if you are trying to record certain information, whether it is continuous or not continuous data. Bar charts also look a lot like a histogram. They are often mistaken for each other.
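The difference between the two charts lies in how the bar heights are obtained, which the following sketch illustrates. For discrete data each distinct value gets its own bar, while for continuous data the values are first grouped into intervals (bins), as in a histogram. The sample values, the bin width of 10 and the function names are assumptions made for this example.

```python
from collections import Counter

def bar_chart_counts(values):
    """Discrete data: one bar per distinct value."""
    return Counter(values)

def histogram_counts(values, start, bin_width):
    """Continuous data: one bar per interval [low, low + bin_width)."""
    counts = Counter()
    for v in values:
        bin_index = int((v - start) // bin_width)
        low = start + bin_index * bin_width
        counts[(low, low + bin_width)] += 1
    return counts

# Hypothetical observations, for illustration only.
eye_colours = ["brown", "blue", "brown", "green", "brown"]
heights_cm = [151.0, 158.5, 162.3, 167.8, 171.2, 169.9]

bars = bar_chart_counts(eye_colours)
bins = histogram_counts(heights_cm, start=150, bin_width=10)

for label, count in bars.items():
    print(f"{label:<12}{'#' * count}")
for (low, high), count in sorted(bins.items()):
    print(f"{low}-{high:<8}{'#' * count}")
```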
SPSS is a computer program used for conducting statistical analysis, manipulating data, and generating tables and graphs that summarize data. SPSS is one of the most popular computer software packages for data analysis. The software provides a comprehensive set of flexible tools that can be used to accomplish a wide variety of data analysis tasks. SPSS is especially useful for social scientists and social science students, including scholars performing quantitative research and undergraduates working on their theses.
SPSS is the statistical package most widely used by social scientists. The main uses and advantages of SPSS are as follows:
1. One can use it either through a window point-and-click approach or through syntax (i.e., writing out SPSS commands). Each has its own advantages and users can switch between the approaches.
2. Of the major packages, it seems to be the easiest to use for the most widely used statistical techniques.
The Analysis of Variance, popularly known as the ANOVA test, can be used in cases where there are more than two groups.
When we have only two samples we can use the t-test to compare the means of the samples, but it might become unreliable in case of more than two samples. If we only compare two means, then the t-test (independent samples) will give the same results as the ANOVA.
ANOVA is used to compare the means of more than two samples. This can be understood better with the help of an example.
EXAMPLE: Suppose we want to test the effect of five different exercises. For this, we recruit 20 men and assign each type of exercise to 4 men (5 groups). Their weights are recorded after a few weeks.
We may find out whether the effect of these exercises on them is significantly different or not, and this may be done by comparing the weights of the 5 groups of 4 men each.
The example above is a case of one-way balanced ANOVA. It is termed one-way because there is only one category whose effect has been studied, and balanced because the same number of men has been assigned to each exercise. Thus the basic idea is to test whether the samples are all alike or not.
As mentioned above, the t-test can only be used to test differences between two means. When there are more than two means, it is possible to compare each mean with each other mean using many t-tests.
But conducting such multiple t-tests can lead to severe complications, and in such circumstances we use ANOVA. Thus, this technique is used whenever an alternative procedure is needed for testing hypotheses concerning means when there are several populations.
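One well-known complication of running many t-tests is that the chance of at least one false positive grows with the number of comparisons. The sketch below counts the pairwise comparisons for k groups and estimates that chance. It assumes a significance level of 0.05 per test and treats the tests as independent, which pairwise t-tests on shared groups are not exactly, so the result is only a rough indication.

```python
from math import comb

def pairwise_comparisons(k):
    """Number of t-tests needed to compare every pair among k group means."""
    return comb(k, 2)

def familywise_error_rate(k, alpha=0.05):
    """Approximate chance of at least one false positive across all pairs.

    Assumes each test is run at level alpha and that tests are independent.
    """
    return 1 - (1 - alpha) ** pairwise_comparisons(k)

k = 5  # five groups, as in the exercise example
print(pairwise_comparisons(k))                 # 10
print(round(familywise_error_rate(k), 3))      # 0.401
```

With five groups, ten separate t-tests are needed, and under these assumptions the chance of at least one spurious "significant" difference is roughly 40% rather than 5%. ANOVA avoids this by testing all the means in a single procedure.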
Now some questions may arise as to what are the means we are talking about and why variances are analyzed in order to derive conclusions about means. The whole procedure can be made clear with the help of an experiment.
Let us study the effect of fertilizers on the yield of wheat. We apply five fertilizers, each of different quality, on four plots of land each of wheat. The yield from each plot of land is recorded and the difference in yield among the plots is observed. Here, fertilizer is a factor and the different qualities of fertilizers are called levels.
This is a case of one-way or one-factor ANOVA since there is only one factor, fertilizer. We may also be interested to study the effect of fertility of the plots of land. In such a case we would have two factors, fertilizer and fertility. This would be a case of two-way or two-factor ANOVA. Similarly, a third factor may be incorporated to have a case of three-way or three-factor ANOVA.
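To see why variances are analyzed to draw conclusions about means, it helps to compute the one-way ANOVA test statistic by hand. The F ratio compares the variation between the group means with the variation within the groups. If the means are all alike, the ratio should be small. The sketch below uses the standard one-way formulas. The yield figures are invented purely for illustration and the function name `one_way_anova_f` is an assumption of this example.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean (between groups).
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean (within groups).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # N - k degrees of freedom
    return ms_between / ms_within

# Hypothetical yields: five fertilizers (levels), four plots each.
yields = [
    [20.1, 21.4, 19.8, 20.7],
    [22.3, 23.0, 21.9, 22.6],
    [19.5, 20.2, 18.9, 19.9],
    [24.1, 23.5, 24.8, 23.9],
    [21.0, 20.4, 21.7, 20.9],
]
f_statistic = one_way_anova_f(yields)
print(f"F = {f_statistic:.2f} with {len(yields) - 1} and "
      f"{sum(len(g) for g in yields) - len(yields)} degrees of freedom")
```

The computed F value would then be compared with the critical value of the F distribution for the stated degrees of freedom. A statistical package such as SPSS reports the corresponding p-value directly.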

