
BUSINESS ANALYTICS ASSIGNMENT

PGDM, 2019-21 Batch
NEHA SINGH
Roll No. & Section: 10, Sec. 'A'

Q-1 How do you compute cumulative relative frequencies in an Excel sheet? Take an
example to support your answer and structure the process.

SOLUTION:
To compute the cumulative relative frequency, we first have to find the cumulative frequency
and the relative frequency.

The data in the table represent the tuition for all 2-year community colleges in a region in
2009-2010.

Tuition (dollars)    Number of community colleges
775-799              20
800-824              67
825-849              15
850-874              5
875-899              0
900-924              0
925-949              0
950-974              2

Steps for Cumulative Frequency are:

Step 1: Create a frequency distribution table in an Excel worksheet.

Step 2: Add a third column to your frequency chart. Title it “Cumulative Frequency.”

Step 3: Type the formula “=C2” (where C2 is the actual location of your first frequency
count) in the first row of your new column.

Step 4: Type the formula “=SUM(C$2:C3)” (where C2 is the location of your first frequency
count and C3 is the location of your second frequency count) in the second row of your new
column. The “$” fixes the start of the range at row 2, so the range grows by one row each
time the formula is copied down.

Step 5: Click the cell in which you entered the formula in Step 4. Click and drag the little
black square (the fill handle) in the bottom right-hand corner of the cell to the bottom of the
column. Excel will populate the cells with all of the remaining values.
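
As a cross-check outside Excel, the running total built in the steps above can be sketched in Python. This is only an illustration using the counts from the tuition table; the variable names are assumptions and are not part of the Excel sheet.

```python
from itertools import accumulate

# Frequency counts from the tuition table (classes 775-799 up to 950-974).
frequencies = [20, 67, 15, 5, 0, 0, 0, 2]

# Running total: each entry is the sum of all counts up to and including that class.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [20, 87, 102, 107, 107, 107, 107, 109]
```

The last cumulative frequency equals the total number of colleges (109), which is a quick way to check that the column was filled down correctly.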

After calculating the cumulative frequency, we have to find the relative frequency.


Steps for Relative Frequency are:
Step 1: First calculate the sum (total) of “Number of community colleges”.

Step 2: Type the formula =C2/C$12 in cell E2 (where C12 is the cell that holds the total
from Step 1), then press the Enter key.

Step 3: Click the cell in which you entered the formula in Step 2. Click and drag the little
black square in the bottom right-hand corner of the cell to the bottom of the column. Excel
will populate the cells with all of the remaining values.

Steps for Cumulative Relative Frequency are:


Step 1: Copy the first relative frequency value as it is into cell F2.

Step 2: Type the formula =SUM(E2:E3) in cell F3, then press the Enter key, OR type =SUM(,
select cells E2 to E3 and press the Enter key.

Step 3: Apply the same formula with the range extended by one row: type =SUM(E2:E4) in
cell F4, then press the Enter key, OR type =SUM(, select cells E2 to E4 and press the Enter
key. Do the same for all remaining rows to find the cumulative relative frequency. The last
value should be 1 (100%).
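
The whole process can also be sketched in Python as a check on the worksheet. This is a minimal sketch using the tuition table from this answer; the variable names are assumptions, and the running sum mirrors the Excel approach of adding up the relative frequencies row by row.

```python
# Class labels and counts from the tuition table in this answer.
classes = ["775-799", "800-824", "825-849", "850-874",
           "875-899", "900-924", "925-949", "950-974"]
frequencies = [20, 67, 15, 5, 0, 0, 0, 2]

total = sum(frequencies)  # total number of community colleges

# Relative frequency: each count divided by the total.
relative = [f / total for f in frequencies]

# Cumulative relative frequency: running sum of the relative frequencies.
cumulative_relative = []
running = 0.0
for r in relative:
    running += r
    cumulative_relative.append(running)

# Print one row per class: label, count, relative and cumulative relative frequency.
for label, f, r, c in zip(classes, frequencies, relative, cumulative_relative):
    print(f"{label:<10}{f:>5}{r:>10.4f}{c:>10.4f}")
```

The final cumulative relative frequency should come out as 1 (up to floating-point rounding), just as the last cell of the Excel column should.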

Q-2 You are asked to analyze the impact of Covid-19 (Novel Coronavirus) on the economy
of India, for which you need to conduct a survey of individuals in Delhi city. You are
required to ask the following:
• Gender
• Education
• Ethnicity
• Name
• Age
• Length of residency
• Factors affecting Economy (using a scale of 1–5, going from poor to excellent)

SOLUTION:
1) What types of data (categorical, ordinal, interval, or ratio) would each of the survey
items represent and why?

• Gender
It is categorical (nominal) data. Nominal scales are often called qualitative scales, and
measurements made on qualitative scales are called qualitative data.

• Education
It is ordinal data when it is recorded as the highest level completed (for example school,
graduate, postgraduate), because the levels have a natural order but the gaps between them
are not equal. If it were recorded as years of schooling instead, it would be ratio data.

• Ethnicity
It falls under the nominal scale. The nominal type differentiates between items or
subjects based only on their names or (meta-)categories and other qualitative classifications
they belong to; thus dichotomous data involves the construction of classifications as well as
the classification of items.

• Name
It also falls under the nominal scale, because nominal data are used to label
variables without any quantitative value.

• Age
It is ratio data: age has a true zero and ratios are meaningful (a 40-year-old is twice
as old as a 20-year-old). In this survey age is not grouped into categories, so it stays
on the ratio scale.

• Length of residency
It is also ratio data. A ratio scale is a level of measurement in which the attributes of a
variable are measured as specific numerical scores or values with a true zero (zero years
of residency means no residency).

• Factors affecting Economy (using a scale of 1–5, going from poor to excellent)
The 1-5 rating is ordinal data, because it is a Likert-type scale: the values have a clear
order from poor to excellent, but the distances between them are not necessarily equal.
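
The measurement scale matters because it decides which summary statistic is meaningful. The following is a minimal sketch of that idea, assuming three of the survey items and a few made-up responses; the function name `summarize` and all sample values are hypothetical.

```python
from statistics import mean, median, mode

def summarize(scale, values):
    """Return a summary statistic that is meaningful for the given scale."""
    if scale == "nominal":
        return mode(values)    # categories can only be counted
    if scale == "ordinal":
        return median(values)  # order is meaningful, distances are not
    if scale == "ratio":
        return mean(values)    # true zero, arithmetic is meaningful
    raise ValueError(f"unknown scale: {scale}")

# Hypothetical responses, invented only for illustration.
gender = ["F", "M", "F"]      # nominal
rating = [3, 4, 2, 5, 4]      # ordinal (1-5 scale)
age = [25, 31, 40]            # ratio

print(summarize("nominal", gender))  # F
print(summarize("ordinal", rating))  # 4
print(summarize("ratio", age))       # 32
```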

2) What analytical tools would you use to analyze this data?

When we talk about analysis tools, three names come to mind: MS-Excel, SAS and SPSS.

A comparison of Excel, SPSS and SAS is given below:

Feature                       Excel                   SPSS                               SAS
Maximum number of rows        Approx. 1.048 million   Approx. 2.15 billion               Approx. 2.15 billion
Maximum number of columns     16,384                  32,767                             32,767
Price                         Cheap                   Expensive                          More expensive than SPSS
License                       Perpetual               Perpetual                          Year on year
General use                   Easy to learn and use   Easy to learn and use              Hard to learn and use
Typical user                  Non-programmer          Non-programmer                     Programmer
Statistical analysis          Very limited            Advanced analysis                  Advanced analysis
Data manipulation             OK                      Powerful                           Very powerful
Macro programming             Powerful (VBA)          OK                                 Very powerful
Documentation (help guide)    OK                      Excellent                          Good
Popularity                    Most widely used tool   Popular in market research field   Popular in data mining and business modeling

By Job Function
Let's dig into this job market data and study popularity of these tools by Job Function.

SAS
SAS is mostly used in the business analytics and intelligence industry. The IT software
functional area comes second in the list, followed by KPO/BPO, clinical research, finance etc.

SPSS
SPSS is mostly used in the business analytics and intelligence industry. The IT software
functional area comes second in the list, followed by KPO/BPO, finance and others. Others
include Sales, Retail, Marketing, Logistics etc.

Excel
Excel is mostly used in the finance industry. The KPO/BPO industry comes second in the list,
followed by IT software, business analytics and others. Others include HR, Retail, Marketing,
Supply Chain, Logistics etc.

Conclusion

For the Covid-19 survey we are working as market researchers. The best tool for market
research is therefore SPSS: it is the most popular tool in the market research industry, and
of the three tools it is one whose main job function is analytics. It has many advanced
functions and is also cheaper than SAS.

3) What precautions will you take as an analyst before analyzing the data?

Before analyzing the data, the analyst should remove the errors in the data. Here are 7
criteria you should consider:

1. Respondents who only answer a portion of your questions

Respondents who answer just a fraction of your required questions can bias your overall
results for many reasons:

• It can be a sign that they weren’t qualified to take your survey to begin with (leading
them to leave).
• It can indicate that they weren’t as engaged and considerate in their responses as those
who were willing to complete it.
• When you're working with an incomplete dataset, using filters or Compare Rules may
not show you the full picture, but offer a partial (and potentially skewed) view instead.

2. Respondents who don’t meet your target criteria

Say you want to survey women between the ages of 18 and 29.

You wouldn’t want the responses of a 50-year-old influencing your overall findings, would
you?

Whatever audience specifications you land on, you can ignore respondents who don’t match
them by filtering them out.

3. Respondents who speed through your survey

Imagine sending a respondent a 10-question survey.

If they only take a few seconds to complete it, they’re likely speeding through the questions,
which means they aren’t reading them carefully and answering them thoughtfully.

So how do you go about deciding who’s a speeder and who isn’t? The answer can vary,
depending on the subject of your survey and the types of questions you ask.

4. Respondents who “straight-line”

Straight-lining is when a respondent chooses the same answer choice over and over again
(e.g. the first answer option). Straight-liners are often speeders as well, as they race through
the survey by answering each question with little to no thought.

5. Respondents who provide unrealistic answers

Imagine asking respondents how much TV they watch per week, on average. If a respondent
writes in 165 hours, they’re likely exaggerating (Hint: there are only 168 hours in a week).

We call this type of response an outlier, because it falls beyond the range of answers from our
other respondents, and is, quite frankly, unrealistic.

6. Respondents who give inconsistent responses

When a respondent’s answer contradicts their response to another question, it’s clear that
they’re either being dishonest or careless (or even both!).

7. Respondents who offer nonsensical feedback in your open-ended questions

Having a response like: “Fdsklj” might make you smile, but it isn’t going to get you far in
your analysis.
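
Several of these checks can be automated before the analysis starts. The sketch below flags a response against criteria 1 to 5; every field name (`seconds_taken`, `tv_hours_per_week` and so on) and every threshold is an assumption chosen for illustration, and real cut-offs depend on the survey.

```python
# Hypothetical settings: adjust to the actual survey.
REQUIRED = ["gender", "age", "ratings"]   # questions that must be answered
AGE_RANGE = (18, 29)                      # target audience from the example above
MIN_SECONDS = 30                          # assumed minimum plausible completion time
MAX_TV_HOURS = 100                        # assumed upper bound for a realistic answer

def flag_response(resp):
    """Return the list of reasons a response looks unreliable (empty if none)."""
    reasons = []
    # 1. Only a portion of the required questions answered.
    if any(resp.get(key) in (None, "", []) for key in REQUIRED):
        reasons.append("incomplete")
    # 2. Respondent outside the target criteria.
    age = resp.get("age")
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        reasons.append("off_target")
    # 3. Survey completed implausibly fast.
    seconds = resp.get("seconds_taken")
    if seconds is not None and seconds < MIN_SECONDS:
        reasons.append("speeder")
    # 4. Same answer chosen for every rating question.
    ratings = resp.get("ratings") or []
    if len(ratings) > 1 and len(set(ratings)) == 1:
        reasons.append("straight_liner")
    # 5. Unrealistic numeric answer.
    tv = resp.get("tv_hours_per_week")
    if tv is not None and tv > MAX_TV_HOURS:
        reasons.append("unrealistic")
    return reasons

# Made-up responses for illustration only.
careful = {"gender": "F", "age": 24, "ratings": [3, 4, 2, 5],
           "seconds_taken": 120, "tv_hours_per_week": 10}
careless = {"gender": "F", "age": 50, "ratings": [1, 1, 1, 1],
            "seconds_taken": 5, "tv_hours_per_week": 165}

print(flag_response(careful))   # []
print(flag_response(careless))  # ['off_target', 'speeder', 'straight_liner', 'unrealistic']
```

Inconsistent answers and nonsensical open-ended text (criteria 6 and 7) usually need survey-specific rules or manual review, so they are left out of this sketch.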

Q-3 Discuss the concept of contingency tables and analyze the dashboard
given below on the following parameters:

• Region wise sale of Product/Count of Customer id
• Payment Mode
• Sale Amount
• Is the visualization used in dashboard appropriate?

DATA SET

Below is the header/parameter row for the source sheet.

Remark: The headers are given only for better understanding; no calculation needs to be
done.

Cust ID   Region   Payment   Transaction Code   Source   Amount   Product   Time Of Day
10001     East     Paypal    93816545           Web      $20.19   DVD       22:19

OUTPUT OF DASHBOARD

SOLUTION:

CONCEPT OF CONTINGENCY TABLES


A contingency table (cross-tabulation) is a tabular method that displays the number of
observations in a data set for different subcategories of two categorical variables.

The subcategories of the variables must be mutually exclusive and exhaustive, meaning that
each observation can be classified into only one subcategory, and taken together over all
subcategories, they must constitute the complete data set.

Additionally, a pivot table is one of the possible ways of creating a contingency table. A
typical pivot table has the visual form of a contingency table, and a pivot table can be used
to quickly create cross-tabulations and to drill down into a large set of data in numerous ways.
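
To make the idea concrete, a contingency table of Region by Product can be sketched in Python by counting pairs. The orders listed here are made up for illustration and are not the dashboard's data; only the column names follow the source sheet header.

```python
from collections import Counter

# Hypothetical (Region, Product) pairs, one per order.
orders = [
    ("East", "DVD"), ("East", "Book"),
    ("West", "Book"), ("West", "Book"),
    ("North", "DVD"),
]

# Count how many orders fall into each (Region, Product) cell.
counts = Counter(orders)

regions = sorted({region for region, _ in orders})
products = sorted({product for _, product in orders})

# Rows are regions, columns are products; missing combinations count as 0.
table = {r: {p: counts[(r, p)] for p in products} for r in regions}

print("Region".ljust(8) + "".join(p.rjust(6) for p in products))
for r in regions:
    print(r.ljust(8) + "".join(str(table[r][p]).rjust(6) for p in products))
```

Because every order falls into exactly one cell, the cell counts add up to the number of orders, which reflects the mutually exclusive and exhaustive requirement.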

ANALYZE THE DASHBOARD

• Region wise sale of Product/Count of Customer id
  - In every region, book sales are higher than DVD sales.
  - The West has the maximum sales of both products.
  - The East has the lowest sales, while the North and South are almost level in terms of
    the total quantity of products.
  - In the South the difference between the number of DVDs and books ordered is high,
    whereas in the North the numbers of books and DVDs ordered are almost the same.

• Payment Mode
  - In the East, North and South, credit is used more than Paypal, but in the West
    Paypal is used more.
  - In total, most of the payment amount is made and accepted by credit.

• Sale Amount
  - The highest sale amount is in the West, because the number of products ordered in the
    West is the highest.
  - The number of products sold in the East and South does not differ much, and similarly
    the sale amounts do not differ much (the East has $251.67 more than the South).
  - The North has the lowest sale amount.

• Is the visualization used in dashboard appropriate?
  - The charts visualize only the region, the source and the number of each product
    ordered.
  - The sale amount received per region and the payment mode are not shown in the charts.
REFERENCES

• https://www.youtube.com/watch?v=DrtChy0dBuk
• https://en.wikipedia.org/wiki/Level_of_measurement
• https://stats.stackexchange.com/questions/240363/is-age-interval-scale
• https://www.surveymonkey.com/curiosity/survey-data-cleaning-7-things-to-check-before-you-start-your-analysis/
• https://www.listendata.com/2013/04/data-analysis-tools-excel-spss-or-sas.html
• https://stats.stackexchange.com/questions/44834/what-is-the-difference-between-the-pivot-table-and-contingency-table
• Michael.hahsler.net
