Assignment -1
Name – Yogender Bansal
Roll No- 11712303920
________________________________________________________________
Q1. Based on the data summarization methods and techniques discussed in class, compile the
advantages and disadvantages of each technique in tabular form, and mention the data type
suitable for each technique.
Ans.
| No. | Technique | Advantages | Disadvantages | Suitable data type |
|-----|-----------|------------|---------------|--------------------|
| 2 | Histogram | Estimates key values at a glance | May fail to reveal key assumptions, norms, causes, effects, or patterns | Continuous data |
|   |   | Clarifies trends better than tables or arrays | Can be easily manipulated to yield false impressions |   |
|   |   | Shows each interval in the frequency distribution | May be inadequate to describe the attribute, behavior, or condition of interest |   |
|   |   | Can closely resemble the bell curve if sufficient data and classes are used | May require additional written or verbal explanation |   |
| 3 | Frequency | Summarizes a large data set in visual form | Loses detail on relative numbers and proportions compared with the histogram | Nominal data |
|   |   | Begins to show central tendency, dispersion, and clustering/modality | May be inadequate to describe the attribute, behavior, or condition of interest |   |
|   |   | Estimates key values, especially the mean, and shows skew and kurtosis | Fails to delineate each interval in a frequency distribution |   |
|   |   | Clarifies trends better than tables, arrays, and most other graphs | May require additional written or verbal explanation |   |
| 4 | Bar graph | Is visually simpler than other types of graphs | Can be easily manipulated to yield false impressions | Nominal data |
|   |   | Shows areas proportional to the number of data points in each category | May fail to reveal key assumptions, norms, causes, effects, or patterns |   |
|   |   | Displays relative proportions of multiple classes of data | May fail to describe the attribute, behavior, or condition of interest |   |
|   |   | Permits a visual check of the reasonableness or accuracy of calculations | Reveals little about central tendency, dispersion, skew, or kurtosis |   |
| 5 | Mean | Is based on all the values | Can give misleading conclusions | Interval data |
|   |   | Is rigidly defined | Can have an upward bias |   |
|   |   | Is not based on the position in the series | Is affected by extreme values |   |
|   |   | Is easy to understand and simple to calculate | Is not appropriate for nominal or ordinal data |   |
|   |   | The arithmetic average is easy to understand even if some details of the data are lacking | Is sensitive to extreme outliers |   |
|   |   |   | Cannot be calculated for open-end classes |   |
|   |   |   | Cannot be located graphically |   |
| 8 | Standard deviation | Gives a more accurate idea of how the data is distributed | Can only be used with data where an independent variable is plotted against its frequency | Continuous data |
|   |   | Shows how closely the data is clustered around the mean | Does not give the full range of the data |   |
|   |   | Is less affected by extreme values than the range | Can be hard to calculate |   |
|   |   |   | Assumes a normal distribution pattern |   |
| 9 | Range | Is easy to compute and understand | Is strongly affected by extreme values | Ordinal data |
|   |   | Communicates information of interest to readers of a report | Cannot be computed for open-end distributions |   |
|   |   | Works best for symmetric data with no outliers | Is not based on every item of the distribution |   |
|   |   | Is a good option for ordinal data | Is heavily affected by sampling fluctuations |   |
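
Some of the points listed for the techniques above can be checked on a small example. The following is only a minimal sketch in Python, using the standard library. The data values, the interval width, and the helper names value_range and histogram_counts are assumptions made for illustration and are not part of the class material. It shows how one extreme value changes the mean, the standard deviation, and the range, and how a histogram (continuous data) and a bar graph (nominal data) both reduce to simple counts.

```python
from collections import Counter
from statistics import mean, pstdev


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def histogram_counts(values, low, width, bins):
    """Count values in equal-width intervals [low + i*width, low + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < bins:  # values outside the intervals are ignored
            counts[i] += 1
    return counts


# Hypothetical continuous data (for example, heights in cm).
heights = [150, 152, 155, 158, 160, 161, 163, 165, 170, 172]
# The same data with one extreme value added.
with_outlier = heights + [250]

# Mean and standard deviation both move when the extreme value is added.
print("mean:", mean(heights), "->", round(mean(with_outlier), 2))
print("std dev:", round(pstdev(heights), 2), "->", round(pstdev(with_outlier), 2))

# The range depends only on the two end values, so it changes the most.
print("range:", value_range(heights), "->", value_range(with_outlier))  # 22 -> 100

# Histogram: counts per interval for continuous data.
print(histogram_counts(heights, low=150, width=10, bins=3))  # [4, 4, 2]

# Bar graph: counts per category for (hypothetical) nominal data.
colours = ["red", "blue", "red", "green", "blue", "red"]
bar_counts = Counter(colours)
print(bar_counts.most_common())
```

Here pstdev is the population standard deviation; statistics.stdev could be used instead if the data is treated as a sample.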