
STATISTICS

AND
ANALYTICS

TEACHER’S
GUIDE
EXPERIMENT -1
AIM: To prepare a closed-ended or open-ended handwritten questionnaire containing 25 questions.
PURPOSE: A questionnaire helps to collect data and analyse it.
STRUCTURE:
• The language should be concise, so that every respondent interprets each question in the
same way.
• Language should be clear and straightforward.
• Questions should not be long, as long questions bore the respondents.
• Avoid phrases and expressions that are abstract.
• Leading questions and statements, that is, ones that put replies into the mouth of
the respondent, should be guarded against.
• Units used in questions should be precisely stated or defined in order to ensure proper
orientation of the respondent.
• Subjective words such as ‘bad’, ‘good’, ‘fair’ and the like do not lend themselves
to either quantitative or qualitative analysis and as such should be avoided.
• No single question should deal with more than one issue; the principle of
one question, one issue should be followed.
• Vocabulary employed in the questions should be appropriate to the background of the
respondents.
• Questions should be so worded that the ego of the respondents is not injured in any way.
• Complex questions that require the respondent to go through several steps of
reasoning before answering are undesirable and as such should be avoided.
• Questions on controversial issues should be broken down into components.
A closed-ended questionnaire for the problem statement ‘potential involvement of a person
with drugs, not including alcoholic beverages’ is given below for your reference.
****************************
Name: ……………………………….. Date: …………………
Please answer every question by ticking your response. If you have difficulty with a
statement, then choose the response that is mostly right.
1. Have you used drugs other than those required for medical reasons? YES NO
2. Have you abused prescription drugs? YES NO
3. Do you abuse more than one drug at a time? YES NO
4. Can you get through a week without using drugs? YES NO
5. Can you stop using drugs at any time you want? YES NO
6. Have you had blackouts or flashbacks as a result of drug use? YES NO
7. Do you feel guilty about your drug use? YES NO
8. Do your parents (or spouse) ever complain about your involvement with drugs? YES NO
9. Has drug use created problems between you and your parents or spouse? YES NO
10. Have you lost friends because of your use of drugs? YES NO
11. Have you neglected your family because of your drug use? YES NO
12. Have you been in trouble at work because of your drug use? YES NO
13. Have you lost your job because of drug use? YES NO
14. Have you gotten into fights when under the influence of drugs? YES NO
15. Have you engaged in illegal activities in order to obtain drugs? YES NO
16. Have you been arrested for possession of illegal drugs? YES NO
17. Have you experienced withdrawal symptoms? YES NO
18. Have you had medical problems as a result of drug use? YES NO
19. Have you gone to anyone for help for a drug problem? YES NO
20. Have you been involved in a treatment program specifically related to drug use? YES NO
21. Do you feel anxious when you use drugs? YES NO
22. Do you feel comfortable after drug use? YES NO
23. Did you start using drugs for fun? YES NO
24. Do you take drugs in larger doses? YES NO
25. Are you conscious when you use drugs? YES NO
EXPERIMENT -2
AIM: To transform a questionnaire into a Google form for a well-defined problem statement.
PURPOSE: A questionnaire in the form of a Google Form helps to gather data through an online
survey.
STEPS INVOLVED: The reference questionnaire in Experiment 1 is transformed
into a Google Form. The following are the steps involved in creating the Google Form.
• Log in to your email account

• Click on Google apps icon


• Click on My Drive

• Click on New

• Click on Google Forms and choose a blank form


• Enter the title and form description

• Start entering the closed-ended questions


EXPERIMENT -3
AIM: To send out a survey for a well-defined problem statement and collect the dataset in a
spreadsheet (.csv file).
PURPOSE: Data in a spreadsheet (.csv file) is convenient for analysing and interpreting the
dataset.
STEPS INVOLVED: The reference questionnaire in Experiment 1 is transformed
into a Google Form and sent out as an online survey. The following are the steps involved in sending
the Google Form and downloading the responses as a spreadsheet (.csv file).
• Gather the email IDs of the respondents
• Click on Send, enter the recipients’ email IDs in the dialogue box and click Send.

• Click on responses and download the .csv file


VIEW OF .CSV FILE
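The downloaded responses can also be inspected outside the spreadsheet. The sketch below is only an illustration: it assumes the export has a `Timestamp` column followed by one column per question with YES/NO answers, and the sample rows, the column names `Q1` and `Q2`, and the file name `responses.csv` are hypothetical stand-ins for the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the downloaded file. With a real export, pass
# open("responses.csv", newline="") instead of io.StringIO(SAMPLE_CSV).
SAMPLE_CSV = """Timestamp,Q1,Q2
2024-01-01 10:00:00,YES,NO
2024-01-01 10:05:00,NO,NO
2024-01-01 10:09:00,YES,NO
"""


def tally_responses(csv_file, skip=("Timestamp",)):
    """Count how often each answer occurs in every question column."""
    tallies = {}
    for row in csv.DictReader(csv_file):
        for column, answer in row.items():
            if column in skip:
                continue
            tallies.setdefault(column, Counter())[answer.strip().upper()] += 1
    return tallies


tallies = tally_responses(io.StringIO(SAMPLE_CSV))
for question, counts in tallies.items():
    print(question, dict(counts))
```

Each question ends up with its own YES/NO count, which is the starting point for the tabulation and charting experiments that follow.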
EXPERIMENT - 4
AIM: Remove unwanted observations from the dataset (spreadsheet) provided, including
duplicate observations or irrelevant observations.
PURPOSE: Data cleaning is a very important process for maintaining the quality of the dataset
and conducting efficient hypothesis testing.
STEPS INVOLVED:
The following are the steps involved in arranging the data in a consistent pattern.
➢ The TRIM function in Excel removes leading, trailing and repeated spaces
from text, so the data in a column can be brought into a consistent format.
Below is an example where the TRIM function is used to fix the format of all
the required cells in the spread sheet.
The data set shown below consists of a list of Hollywood movies in which
the names are entered erratically. The mistakes can be fixed using the TRIM
function.

In the first movie name the text is corrected as shown above.
Once the mistakes are fixed in the first movie name, auto fill is used to format the
rest.
Now the data set is arranged in a consistent format.
This is the best practice to make the data look identical.
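For comparison, the same clean-up can be sketched in Python. This is an illustration under assumptions, not the spreadsheet procedure itself: the movie names are hypothetical, `trim` only approximates Excel's TRIM (it collapses every kind of whitespace, not just spaces), and the title-casing step assumes that the "proper text" mentioned above means capitalising each word.

```python
def trim(text):
    """Drop leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(text.split())


# Hypothetical, erratically typed movie names
raw_names = ["  the   godfather ", "JURASSIC  park"]

# Trim first, then capitalise each word so all names share one format
cleaned = [trim(name).title() for name in raw_names]
print(cleaned)
```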
➢ In the example below the dataset consists of 50 passengers who travelled on
the Titanic and their ages. Some cells in the age column are empty. The logical
test (IF) is used to fill the missing values in those cells.

A new age column is created and the IF logic is built.


If the value of the cell exists then the value is retained, and if it is empty then a
chosen value, say 28, is filled in.
The function below performs this task.
IF(logical_test, [value_if_true], [value_if_false])
Auto fill is used to copy this to all the cells in the new age column. It
is seen that the missing data is filled.
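The same rule (keep the value if present, otherwise fill in 28) can be sketched in Python. The ages below are a hypothetical excerpt, and `None` is assumed to mark an empty cell.

```python
DEFAULT_AGE = 28  # fill value quoted in the text

# Hypothetical excerpt of the age column; None marks an empty cell
ages = [22, None, 35, None, 54]

# Equivalent of the IF logic: keep the age when it exists, else use the default
new_ages = [age if age is not None else DEFAULT_AGE for age in ages]
print(new_ages)
```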
➢ DEDUPLICATION OF DATA
Duplicate values occur when the same value or set of values appears more than once in the data.
Removal of such data is called deduplication of data.
The following are the steps involved in removing duplicates from the data set.
The data set given below describes malnutrition in children in different states of India, based
on three causes. The data set contains duplicates, which are highlighted. The
duplicates are removed as shown below.

• Select the entire data, across all rows and columns


• Click on DATA tab
• Then click on remove duplicates button
• On the window which appears on the screen, Click OK
• A window which appears shows the number of duplicates removed and the number of
unique values remaining in the dataset. Click on OK button
• The dataset is now cleaned for duplicate data.
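A rough Python equivalent of the Remove Duplicates button is sketched below. The rows are hypothetical placeholders, and a row is assumed to count as a duplicate only when every field matches an earlier row.

```python
def remove_duplicates(rows):
    """Return (unique rows in original order, number of duplicates removed)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique, len(rows) - len(unique)


# Hypothetical rows: (state, number of cases)
rows = [("State A", 10), ("State B", 12), ("State A", 10), ("State C", 7)]

unique_rows, removed = remove_duplicates(rows)
print(removed, "duplicates removed;", len(unique_rows), "unique rows remain")
```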
EXPERIMENT -5
AIM: Using Microsoft Excel spread sheet, draw the frequency distribution table for the given data (the
data set should contain a minimum of 50 values).

PURPOSE: Tabulation makes the data brief. Therefore, it can be easily presented in the form
of graphs.
STEPS INVOLVED: The following are the steps for tabulating data in Excel spread sheet.
• Gather the ungrouped data. For instance:
The data set consists of 50 student marks in Mathematics in a class.

Marks Scored
47 44 49 46 13
21 44 36 22 5
31 11 40 26 8
1 29 33 25 47
21 11 16 45 2
41 11 19 9 48
9 29 2 11 11
33 25 23 21 7
16 16 3 46 17
15 32 39 25 49

The data in the Excel spread sheet is as follows

• Using the COUNTIFS function the data can be tabulated.

Construct the table in Excel as shown below.


• Enter the formula =COUNTIFS(A3:A52,">=1",A3:A52,"<=5") and press Enter as shown
in the figure below
• For the marks listed above, Excel returns 5, so 5 students have scored marks in the interval 1-5

Similarly, this can be done for the other ranges.


The final frequency distribution table is as shown.
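The COUNTIFS tabulation can be cross-checked with a short Python sketch. It uses the 50 marks listed in this experiment and assumes class intervals of width 5 starting at 1, matching the 1-5 interval used in the formula; the function name `frequency_table` is illustrative.

```python
# The 50 marks listed in this experiment
marks = [
    47, 44, 49, 46, 13,
    21, 44, 36, 22, 5,
    31, 11, 40, 26, 8,
    1, 29, 33, 25, 47,
    21, 11, 16, 45, 2,
    41, 11, 19, 9, 48,
    9, 29, 2, 11, 11,
    33, 25, 23, 21, 7,
    16, 16, 3, 46, 17,
    15, 32, 39, 25, 49,
]


def frequency_table(values, start=1, stop=50, width=5):
    """Count the values falling in each closed interval [low, low + width - 1]."""
    table = {}
    for low in range(start, stop + 1, width):
        high = low + width - 1
        # Same test as COUNTIFS(range, ">=low", range, "<=high")
        table[(low, high)] = sum(low <= v <= high for v in values)
    return table


table = frequency_table(marks)
for (low, high), count in table.items():
    print(f"{low}-{high}: {count}")
```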
EXPERIMENT -6
AIM: To prepare a Microsoft Excel spread sheet and draw the relative frequency distribution
table for the data.

PURPOSE: The data in the Excel spread sheet is used to build the relative frequency
distribution, which can then be shown in charts.

STEPS INVOLVED:
Step 1: Tabulate the data which has been gathered.

Step 2: Divide the frequency of each range by the total frequency to determine the relative frequency.
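A minimal Python sketch of Step 2 follows. The frequencies are those of the Experiment 5 marks regrouped into classes of width 10, which is an assumption made only for illustration.

```python
# Frequencies of the Experiment 5 marks, regrouped into classes of width 10
frequencies = {"1-10": 9, "11-20": 12, "21-30": 11, "31-40": 7, "41-50": 11}

total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency
relative = {interval: count / total for interval, count in frequencies.items()}

for interval, value in relative.items():
    print(f"{interval}: {value:.2f}")
```

The relative frequencies always add up to 1, which is a quick check on the table.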

INFERENCE:
EXPERIMENT -7
AIM: To conduct a survey on the favorite fruit of 100 persons using an Excel spread sheet and to plot
a bar graph for the collected data.
PURPOSE: Bar graphs are in widespread use everywhere from textbooks to
newspapers, so most audiences understand how to read a bar graph and can grasp the
information the graph conveys.
STEPS INVOLVED:

Step1: Tabulate the data collected.

Step2: Select the entire tabulated data as shown in the figure.

Step 3: Click on Insert tab > Chart. Modify the chart by adding a chart title and axis titles and by
formatting the data labels.
(Bar chart “Count of Fruits”: orange 23, banana 18, sapota 18, grapes 10, kiwi 12, Total Response 100)

INFERENCE:
EXPERIMENT - 8
AIM: Using Microsoft Excel spread sheet, plot a pie chart for the data collected from 50 people
(for example, conduct a survey on the smokers with respect to their ages in your locality).
PURPOSE: A pie chart presents data as a simple and easy-to-understand picture. It can be an
effective communication tool for even an uninformed audience, because it represents data
visually as a fractional part of a whole. Readers or audiences see a data comparison at a glance,
enabling them to make an immediate analysis or to understand information quickly.
STEPS INVOLVED:
Step1: Tabulate the collected data as shown below.

Step2: Select the entire tabulated data and Click on insert >> pie-chart.

Step3: Select pie chart of your choice and label the chart.

PIE CHART (age groups 20-34, 35-49, 50-64, 65-79 and 80-94; slice labels 8%, 21%, 35%, 18% and 18%)

INFERENCE:
EXPERIMENT - 9
AIM: Using Microsoft Excel spread sheet draw a line graph for the given dataset.
PURPOSE: A line graph, also known as a line chart, is a type of chart used to visualize the
value of something over time.
STEPS INVOLVED:
Step 1: Put each category and the associated value on one line:

Step 2: Add projected sales along with actual sales here

Step 3: Highlight the data you want in the graph:


Step 4: Then, open the Insert tab in the Ribbon. In the Charts group, click the Insert Line or
Area Chart Button:

Step 5: From the resulting menu, click the 2D line button:

Step 6: The line graph will appear

INFERENCE:
EXPERIMENT - 10
AIM: To prepare a Microsoft Excel spread sheet and to draw the frequency polygon and frequency
curve for the data.
PURPOSE: Frequency polygons are a graphical device for understanding the shapes of
distributions. They serve the same purpose as histograms, but are especially helpful for
comparing sets of data.
STEPS INVOLVED:

Step 1: Tabulate the given data

Step 2: Select the entire data, click on Insert tab >> Insert Bar Chart >> click on the bar
graph >> select Format Data Series >> set the gap width to 0 to obtain the histogram
Histogram

Step 3: Once the histogram is constructed, join the midpoints of the tops of the bars by hand with
straight lines to obtain the frequency polygon, and with a smooth curve to obtain the frequency curve.
INFERENCE:
EXPERIMENT - 11
AIM: Using Microsoft Excel spread sheet, construct a box plot for the given dataset.
PURPOSE: A box plot is helpful in exploratory data analysis. It indicates the spread of the data
based on the five-number summary, namely Minimum, Q1 (Quartile 1), Median, Q3 (Quartile 3) and
Maximum.
STEPS INVOLVED:
Step 1: Select the data >> go to Recommended Charts >> All Charts >> click on Box and Whisker >> press Enter
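The five-number summary behind the box plot can be sketched in Python as follows. The dataset is hypothetical, and the quartiles use the `inclusive` method of the standard `statistics` module, which should correspond to Excel's QUARTILE.INC convention; other quartile conventions can give slightly different values.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # "inclusive" treats the minimum as the 0th and the maximum as the
    # 100th percentile and interpolates linearly in between.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical dataset
print(five_number_summary(data))
```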
EXPERIMENT - 12
AIM: Using Microsoft Excel spread sheet construct a stem and leaf plot for the given dataset.
PURPOSE: A stem-and-leaf display (also known as a stemplot) is a diagram designed to allow
you to quickly assess the distribution of a given dataset. It indicates the recurrence of data.
STEPS INVOLVED:
Step 1: Select the data as shown in the figure and sort the values in ascending order.


1. Select any cell within the dataset range (A2:A25).
2. Go to the Data tab.
3. Click the “Sort” button.
4. In each dropdown menu, sort by the following:
   a. For “Column,” select “Customer Age” (Column A).
   b. For “Sort On,” select “Values” / “Cell Values.”
   c. For “Order,” select “Smallest to Largest.”
Step 2: Set up a helper table.
Once the column of data has been sorted, set up a separate helper table for storing all the chart
data as follows:

A few words on each element of the table:


• Stem (Column C) – This will contain the first digit of all of the ages.
• Leaf (Column D) – This will contain the second digit of all the ages.
Step 3: Enter the left digit of each value, column wise, as the stem.

Step 4: Enter the right digit of each value as the leaf. The result is the stem and leaf plot.
stem | leaf
1 | 5 9
2 | 3 3 6 7 8 8
3 | 0 1 1 1 2 5 6 6
4 | 0 0 4 5 5
6 | 0 0 0
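The helper-table logic can be sketched in Python. The ages are read back from the stem and leaf plot shown above; the function name `stem_and_leaf` is illustrative, and two-digit values are assumed so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# The 24 ages, read back from the stem and leaf plot in this experiment
ages = [15, 19, 23, 23, 26, 27, 28, 28, 30, 31, 31, 31,
        32, 35, 36, 36, 40, 40, 44, 45, 45, 60, 60, 60]


def stem_and_leaf(values):
    """Group the units digits (leaves) under their tens digits (stems)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```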

INFERENCE:
EXPERIMENT - 13
AIM: Using Microsoft Excel spread sheet, find the Mean, Mode and Median for the given
(univariate) data and also represent them in a Histogram.
PURPOSE: The measures of central tendency Mean, Mode and Median help us understand what has
already taken place and predict future values as well.
STEPS INVOLVED:
Step 1: Tabulate the collected data as shown below

Step 2: Select the data and enter the syntax for Mean, Mode and Median (the Excel functions
AVERAGE, MODE and MEDIAN)
Step 3: Press Enter to obtain the Mean or Average, Mode and Median

Step 4: Plot the Histogram following the instructions given in the previous experiments.
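The three measures can be cross-checked with Python's standard `statistics` module. The marks below are hypothetical and chosen only so that the mode is unambiguous.

```python
import statistics

marks = [12, 15, 15, 18, 20, 22, 15, 30]  # hypothetical univariate data

mean = statistics.mean(marks)      # arithmetic average
median = statistics.median(marks)  # middle value of the sorted data
mode = statistics.mode(marks)      # most frequent value

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
```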

INFERENCE:
EXPERIMENT - 14
AIM: To generate a random data sample of 50 values (datasets with an even and an odd number of
values) using Microsoft Excel spread sheet and determine the range and quartiles.
PURPOSE: The quartiles are especially useful when working with data that isn't symmetrically
distributed, or a data set that has outliers.
STEPS INVOLVED:
Step 1: Consider the datasets with an odd and an even number of values as shown below

Step 2: Obtain the quartiles with the syntax shown below. The range and the interquartile range are
found using the formulas Range = Maximum - Minimum and IQR = Q3 - Q1.

The same instructions are to be followed for the odd-numbered set.


Step 3: The end result obtained is
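A Python sketch of the same calculation is given below. The value range 1 to 100, the fixed seed and the `inclusive` quartile method are assumptions made for illustration; change the 50 to 51 to try a dataset with an odd number of values.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sample is repeatable

# 50 random integers between 1 and 100 (the bounds are an arbitrary choice)
sample = [random.randint(1, 100) for _ in range(50)]

data_range = max(sample) - min(sample)  # Range = Maximum - Minimum
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1                           # IQR = Q3 - Q1

print("Range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("Interquartile range:", iqr)
```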

INFERENCE:
EXPERIMENT - 15
AIM: To determine the mean deviation and quartile deviation for the data collected.
PURPOSE: The Mean deviation is an important descriptive statistic that is not frequently
encountered in mathematical statistics. This is essentially because while mean deviation has a
natural intuitive definition as the "mean deviation from the mean," the introduction of the
absolute value makes analytical calculations using this statistic much more complicated than
the standard deviation.
STEPS INVOLVED:
Step 1: Find the mean of the data collected as shown in the figure below

Step 2: Determine the elements of the new columns as shown below


Step 3: Determine the average of the absolute deviations. This value is called the mean deviation.
Next, find the quartiles Q1 and Q3, from which the quartile deviation QD = (Q3 - Q1)/2 is
obtained.
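The two measures can be sketched in Python as follows. The dataset is hypothetical, the function names are illustrative, and the quartiles again assume the `inclusive` method of the `statistics` module.

```python
import statistics


def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    mean = statistics.mean(values)
    return sum(abs(x - mean) for x in values) / len(values)


def quartile_deviation(values):
    """Half the interquartile range: QD = (Q3 - Q1) / 2."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2


data = [2, 4, 6, 8, 10]  # hypothetical dataset

md = mean_deviation(data)
qd = quartile_deviation(data)
print("Mean deviation:", md)
print("Quartile deviation:", qd)
```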

INFERENCE:
EXPERIMENT - 16
AIM: To determine the standard deviation for the data collected.
PURPOSE: SD tells us about the spread of our distribution, that is, how close the individual data
values are to the mean value.
STEPS INVOLVED:
Step 1: Select the data gathered for 2 livestock as shown below
Step 2: Type the syntax for standard deviation (for example STDEV.S for a sample or STDEV.P for a
population) and determine the same
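For comparison, the standard deviation of two groups can be computed in Python. The two lists are hypothetical measurements, not the livestock data from the spreadsheet; `pstdev` divides by n (population) and `stdev` divides by n - 1 (sample).

```python
import statistics

# Hypothetical measurements for two groups with the same mean
group_a = [2, 4, 4, 4, 5, 5, 7, 9]
group_b = [4, 5, 5, 5, 5, 5, 5, 6]

for name, values in (("Group A", group_a), ("Group B", group_b)):
    print(name,
          "population SD:", statistics.pstdev(values),
          "sample SD:", round(statistics.stdev(values), 3))
```

Both groups have a mean of 5, but the larger standard deviation of the first group shows that its values lie further from the mean.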

INFERENCE:
EXPERIMENT - 17
AIM: To determine the variance for the data collected.
PURPOSE: SD and variance tell us about the spread of our distribution, that is, how close the
individual data values are to the mean value.
STEPS INVOLVED:
Step 1: Find the mean and then follow the instructions as shown below
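The calculation can be followed step by step in a short Python sketch. The dataset is hypothetical; the result is checked against the `variance` function of the standard `statistics` module, which uses the sample (n - 1) definition.

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9]  # hypothetical dataset

# Step-by-step: mean, squared deviations from the mean, then their average
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / len(data)    # divide by n
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1

print("Mean:", mean)
print("Population variance:", population_variance)
print("Sample variance:", sample_variance)
```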

INFERENCE:
EXPERIMENT - 18
AIM: Using Microsoft Excel spread sheet, draw a skewness graph and a kurtosis graph for a
randomly generated dataset.
PURPOSE: Skewness and kurtosis are useful in assessing the symmetry and peakedness of the
data distribution.
STEPS INVOLVED:
Step 1: For the collected data determine the mean and median.
Step 2: Draw the line graph.
Step 3: In the graph, mark the mean and median and state whether the distribution is
negatively or positively skewed.
(If the mean lies to the left of the median, declare the data negatively skewed;
if it lies to the right, declare it positively skewed.)
Step 4: To determine the kurtosis enter the syntax KURT(number1, number2, …)
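Both quantities can be sketched in Python. The functions below assume the bias-adjusted sample formulas that Excel documents for SKEW and KURT; the function names and both datasets are hypothetical, and results from other software that uses different definitions may not match.

```python
import statistics


def sample_skewness(values):
    """Adjusted sample skewness (the formula assumed for Excel's SKEW)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def sample_excess_kurtosis(values):
    """Adjusted sample excess kurtosis (the formula assumed for Excel's KURT)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print("Skewness (symmetric):", sample_skewness(symmetric))
print("Kurtosis (symmetric):", sample_excess_kurtosis(symmetric))
print("Skewness (right-tailed):", sample_skewness(right_tailed))
```

In the right-tailed data the mean is larger than the median and the skewness is positive, which agrees with the rule in Step 3.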

INFERENCE:
EXPERIMENT - 19
AIM: To write a Python program to convert Decimal to Binary, Octal and Hexadecimal.

CODE:

# Read a decimal integer from the user
dec = int(input("enter the number:"))

print("The decimal value of", dec, "is:")
print(bin(dec), "in binary.")
print(oct(dec), "in octal.")
print(hex(dec), "in hexadecimal.")

OUTPUT:

The decimal value of 344 is:


0b101011000 in binary.
0o530 in octal.
0x158 in hexadecimal.

EXPERIMENT - 20
AIM: To write a Python program to add 2 integers and 2 strings and print the result.

CODE:

# Read two integers and print their sum
num1 = int(input('Enter the first number '))
num2 = int(input('Enter the second number '))
x = num1 + num2
print("Sum", x)

OUTPUT:

Enter the first number 10


Enter the second number 5
Sum 15
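The listing above covers only the integer part of the aim. A minimal sketch of the string part is given below; the function name `add_strings` and the two sample strings are illustrative, and in the lab they could be read with `input()` instead.

```python
def add_strings(first, second):
    """'Adding' two strings in Python joins them end to end."""
    return first + second


str1 = "Statistics"      # sample value; could be input("Enter the first string ")
str2 = " and Analytics"  # sample value; could be input("Enter the second string ")

result = add_strings(str1, str2)
print("Concatenated string:", result)
```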
EXPERIMENT - 21
AIM: To write a Python program to find the sum of first 10 natural numbers.

CODE:
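A minimal sketch consistent with the output shown for this experiment is given here. The function name `sum_of_naturals` is illustrative, and n is fixed at 10 so that the sketch runs unattended; in the lab it can be read from the keyboard as noted in the comment.

```python
def sum_of_naturals(n):
    """Add up 1 + 2 + ... + n with a simple loop."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


n = 10  # in the lab: n = int(input("Enter the number "))
total = sum_of_naturals(n)
print("The sum of the first n natural numbers is", total)
```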

OUTPUT: Enter the number 10


The sum of the first n natural numbers is 55

EXPERIMENT - 22
AIM: To write a Python program to find whether the number is odd or even.

CODE:
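A minimal sketch consistent with the output shown for this experiment is given here. The function name `parity` is illustrative, and the number is fixed at 10 so that the sketch runs unattended.

```python
def parity(number):
    """Return 'even' when the number is divisible by 2, otherwise 'odd'."""
    return "even" if number % 2 == 0 else "odd"


number = 10  # in the lab: number = int(input("Enter any number "))
print("Number is", parity(number))
```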

OUTPUT: Enter any number 10


Number is even
EXPERIMENT - 23
AIM: To write a Python program to find whether the number is odd or even.
CODE:

OUTPUT: Enter any number 10


Number is even
EXPERIMENT - 24
AIM: To write a Python program to enter the marks of a student across subjects.
CODE:

OUTPUT
EXPERIMENT - 25
AIM: To write a Python program to create a labeled bar graph using matplotlib.pyplot.
CODE:
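A minimal sketch of a labeled bar graph is given here, assuming matplotlib is installed. The fruit counts are the data labels from the bar chart in Experiment 7; the axis titles and the output file name `fruit_bar_chart.png` are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file; remove this line to use plt.show() instead
import matplotlib.pyplot as plt

# Data labels taken from the bar chart in Experiment 7
fruits = ["orange", "banana", "sapota", "grapes", "kiwi"]
counts = [23, 18, 18, 10, 12]

fig, ax = plt.subplots()
bars = ax.bar(fruits, counts)

# Write each count just above its bar
for bar, count in zip(bars, counts):
    ax.text(bar.get_x() + bar.get_width() / 2, count, str(count),
            ha="center", va="bottom")

ax.set_title("Count of Fruits")
ax.set_xlabel("Fruit")
ax.set_ylabel("Number of responses")

fig.savefig("fruit_bar_chart.png")  # illustrative file name
```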

OUTPUT:
EXPERIMENT - 26
AIM: To write a Python program to create a labeled bar graph using matplotlib.pyplot.
CODE:

OUTPUT:
