
MID TERM TEST – SECTION B (Data Analytics with Python)

Time Limit: 40 Minutes


HANDS-ON ASSESSMENT (30)
Note: Attempt all questions in Python. Paste the code for each
question, with appropriate comments, into a single Word file. Submit
the Word file as an assignment through Google Classroom before
12:15 PM. The submission window is scheduled to close at that time,
so late submissions will not be possible.
Q1. Write a program to find the maximum and minimum element in each row and each column,
and the cumulative sum along each row.
arr = np.array([[1, 5, 6],
                [4, 7, 2],
                [3, 1, 9]])
   (4)
Q2. The Python code below demonstrates sorting in NumPy.
Complete it with NumPy functions so that each comment statement
below is followed by code that produces the output it describes:
# Python program to demonstrate sorting in numpy

import numpy as np
  
a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])
  

# to get Array elements in Sorted Order


# to get Row Wise Sorted array
# Column wise sort by applying merge sort
  
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
  
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7), 
           ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]
             
arr = np.array(values, dtype = dtypes)
# to sort an array by name
# To Sort an array by graduation year and then cgpa
              
(5)

Q3. Case Study (20)

In this case study, you are expected to apply the NumPy Python library
to explore a dataset. The dataset is a medical dataset with
information about patients on metrics such as glucose and insulin
levels, along with other metrics related to diabetes. The assignment
serves two primary objectives: (a) to practice NumPy on a realistic
task, and (b) to learn how to get a feel for a large dataset (also
known as data cleaning and data exploration).

Dataset description

The following are the column names: Pregnancies, Glucose,
BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction,
Age, Outcome

Perform the following on the given dataset using the Python library
NumPy:

1. Import the Data Set


2. How many patients does the dataset have information about?
3. What is the blood pressure of patient number 5 (0-indexed)?
4. What is the age of patient number 112 (0-indexed)?

In this dataset, Outcome = 0 denotes that the patient does not have
diabetes. And Outcome = 1 denotes that the patient has diabetes.

5. Does patient number 227 (0-indexed) have diabetes?

6. Out of the total patients, how many have diabetes?

7. For the features Glucose, BloodPressure, SkinThickness, Insulin and
BMI (columns 1, 2, 3, 4 and 5, 0-indexed), the values are missing for
some of the patients. Instead of the actual value, the dataset simply
has a 0. Find the total number of missing values.

8. For how many patients is at least one of the features missing? (Be
careful: it is okay for someone to have been pregnant 0 times.)

9. What is the total number of patients who have diabetes in the
dataset?

10. What is the average glucose level in the dataset?

11. What is the average glucose level among the diabetes patients?

12. What is the average glucose level among the non-diabetic people?
ANSWERS

CODE

#################################################### Question 1
import numpy as np

arr = np.array([[1, 5, 6],
                [4, 7, 2],
                [3, 1, 9]])

# largest element of the whole array
print("Largest element is", arr.max())

# maximum and minimum of each row (axis = 1)
print("Row-wise max elements:", arr.max(axis=1))
print("Row-wise min elements:", arr.min(axis=1))

# maximum and minimum of each column (axis = 0)
print("Column-wise max elements:", arr.max(axis=0))
print("Column-wise min elements:", arr.min(axis=0))

# sum of all array elements
print("Sum of all array elements:", arr.sum())

# cumulative sum along each row
print("Cumulative sum along each row:\n", arr.cumsum(axis=1))
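The row and column reductions above all hinge on the axis argument. As a minimal sketch on the same array (the argmax lookup and the column-wise cumulative sum are additions for illustration, not part of the question), axis=1 collapses across the columns to give one result per row, while axis=0 collapses across the rows to give one result per column:

```python
import numpy as np

arr = np.array([[1, 5, 6],
                [4, 7, 2],
                [3, 1, 9]])

# axis=1 reduces across the columns: one value per row
row_max = arr.max(axis=1)

# argmax reports the column position of each row's maximum
row_max_pos = arr.argmax(axis=1)

# axis=0 reduces across the rows: one value per column
col_min = arr.min(axis=0)

# cumsum keeps the array shape; axis=0 accumulates down each column
col_cumsum = arr.cumsum(axis=0)

print(row_max, row_max_pos, col_min)
print(col_cumsum)
```

Reductions such as max and min drop the reduced axis, whereas cumsum returns an array of the same shape as its input.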

####################################################### Question 2
a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])

# Array elements in sorted order (axis = None flattens the array first)
print("Array elements in sorted order:\n", np.sort(a, axis=None))

# Row-wise sorted array
print("Row-wise sorted array:\n", np.sort(a, axis=1))

# Column-wise sort by applying merge sort
print("Column wise sort by applying merge-sort:\n",
      np.sort(a, axis=0, kind='mergesort'))

# Field names and types for the structured array
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]

values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7),
          ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]

## Creating the structured array
arr = np.array(values, dtype=dtypes)

## Sort by name
print("\nArray sorted by names:\n", np.sort(arr, order='name'))

## Sort by graduation year and then cgpa
print("Array sorted by graduation year and then cgpa:\n",
      np.sort(arr, order=['grad_year', 'cgpa']))
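The multi-key sort above uses the order argument of np.sort. As an alternative sketch (not what the question asks for), np.lexsort and np.argsort return index arrays instead of sorted copies, which helps when the same ordering has to be applied to several arrays. The data is the structured array from Question 2; the variable names idx, by_year_then_cgpa and name_order are illustrative.

```python
import numpy as np

dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7),
          ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]
arr = np.array(values, dtype=dtypes)

# np.lexsort treats the LAST key as the primary key, so grad_year is
# sorted first and cgpa breaks ties within the same year.
idx = np.lexsort((arr['cgpa'], arr['grad_year']))
by_year_then_cgpa = arr[idx]

# np.argsort on a single field gives the ordering without moving the data
name_order = np.argsort(arr['name'])

print(idx)
print(by_year_then_cgpa['name'])
print(arr['name'][name_order])
```

Indexing the array with idx reproduces the same row order as np.sort with order=['grad_year', 'cgpa'].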

####################################################### CASE STUDY

###### Question 1
# Importing dataset
dbdata = np.loadtxt('C:/Users/Sarthak Kaushik/Desktop/diabetes.csv',
skiprows=1, delimiter=',')
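The loadtxt call above can be tried without the CSV file on disk. The sketch below assumes a three-row sample with the same column layout; the numbers are made up for illustration and are not taken from the dataset. It relies on np.loadtxt accepting a file-like object as well as a path.

```python
import io
import numpy as np

# Hypothetical three-row sample in the dataset's column layout
# (values invented for illustration only).
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,30,100,28.5,0.35,33,0
0,150,0,0,0,35.1,0.60,45,1
5,95,64,22,0,24.0,0.20,29,0
"""

# skiprows=1 drops the header line; delimiter=',' splits the CSV fields.
sample = np.loadtxt(io.StringIO(csv_text), skiprows=1, delimiter=',')

print(sample.shape)  # (number of rows, number of columns)
print(sample.dtype)  # loadtxt produces floats by default
```

Because every value is loaded as a float, integer-looking columns such as Age and Outcome print with a trailing ".0", as in the recorded output.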

###### 2 Number of patients = number of rows (first entry of shape)
print(dbdata.shape)

###### 3 Blood pressure (column 2) of patient 5
print(dbdata[5, 2])

###### 4 Age (column 7) of patient 112
print(dbdata[112, 7])

###### 5 Outcome (column 8) of patient 227: 1.0 means diabetes
print(dbdata[227, 8])
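The lookups above use bare column numbers. One way to make them easier to read, sketched here under the assumption that the columns appear in the order given in the dataset description, is a name-to-index mapping. The COL dictionary and the two-patient toy array are hypothetical helpers, not part of the original answer.

```python
import numpy as np

COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]
# Map each column name to its 0-based position
COL = {name: i for i, name in enumerate(COLUMNS)}

# Hypothetical two-patient array (values invented for illustration)
toy = np.array([[2, 120, 70, 30, 100, 28.5, 0.35, 33, 0],
                [0, 150, 80, 25,  90, 35.1, 0.60, 45, 1]])

patient = 1  # 0-indexed row number
bp = toy[patient, COL["BloodPressure"]]
age = toy[patient, COL["Age"]]
has_diabetes = bool(toy[patient, COL["Outcome"]] == 1)

print(bp, age, has_diabetes)
```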

##### 6 No. of patients having diabetes
# The comparison gives a boolean array; summing it counts the True values
pat = np.sum(dbdata[:, 8] == 1)
print(pat)
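Counting with a boolean mask works because True counts as 1 when summed. A minimal sketch on a hypothetical six-patient Outcome column (values invented for illustration) shows two equivalent counts and the share of diabetic patients:

```python
import numpy as np

# Hypothetical Outcome column: 1 = diabetes, 0 = no diabetes
outcome = np.array([0., 1., 1., 0., 0., 1.])

has_diabetes = outcome == 1                    # boolean mask
n_diabetic = np.count_nonzero(has_diabetes)    # number of True entries
n_alt = has_diabetes.sum()                     # same count, True counts as 1
fraction = has_diabetes.mean()                 # share of patients with diabetes

print(n_diabetic, n_alt, fraction)
```

The mean of a 0/1 column is the fraction of ones, which is why the Outcome entry in the recorded column means (0.34895833) equals 268 / 768.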

##### 7 Missing values
# Note: this counts the zeros in column 4 (Insulin) only,
# not the total over columns 1 to 5
miss = np.sum(dbdata[:, 4] == 0)
print(miss)
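Question 7 asks for the total across Glucose, BloodPressure, SkinThickness, Insulin and BMI, whereas the snippet above inspects column 4 (Insulin) only. A minimal sketch of the total over columns 1 to 5 follows, using a hypothetical four-patient array rather than the real dataset, so no dataset result is claimed here.

```python
import numpy as np

# Hypothetical rows in the dataset's column order (values invented)
toy = np.array([
    [2, 120, 70, 30, 100, 28.5, 0.35, 33, 0],
    [0, 150,  0,  0,   0, 35.1, 0.60, 45, 1],
    [5,  95, 64, 22,   0, 24.0, 0.20, 29, 0],
    [0,   0, 72,  0,  85,  0.0, 0.45, 51, 1],
])

# Columns 1..5 hold the features where 0 marks a missing value.
# The slice end is exclusive, so 1:6 selects columns 1, 2, 3, 4 and 5.
features = toy[:, 1:6]
is_missing = features == 0

total_missing = is_missing.sum()                      # all missing entries
missing_per_feature = is_missing.sum(axis=0)          # one count per column
patients_with_missing = is_missing.any(axis=1).sum()  # rows with >= 1 zero

print(total_missing, missing_per_feature, patients_with_missing)
```

Column 0 (Pregnancies) is left out of the slice on purpose, so the zero pregnancies in the second and fourth rows are not counted as missing.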

#### 8 Patients with at least one feature missing
# Only columns 1 to 5 use 0 as a missing-value marker;
# np.any(..., axis=1) is True for a row that has at least one zero there
print(np.sum(np.any(dbdata[:, 1:6] == 0, axis=1)))

##### 9 Total no. of patients having diabetes
p = np.sum(dbdata[:, 8] == 1)
print(p)

##### 10 Average glucose level in data
# axis=0 gives the mean of every column;
# the glucose average is the entry at index 1
avg = np.mean(dbdata, axis=0)
print(avg)

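#### 11 Average glucose level among diabetic patients
# Keep only the rows with Outcome == 1, then take the column means;
# the glucose average is the entry at index 1
d = dbdata[dbdata[:, 8] == 1, :]
diabetic_avg = np.mean(d, axis=0)
print(diabetic_avg)

##### 12 Average glucose level among non-diabetic patients
# Same approach for the rows with Outcome == 0
nd = dbdata[dbdata[:, 8] == 0, :]
non_diabetic_avg = np.mean(nd, axis=0)
print(non_diabetic_avg)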
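Questions 10 to 12 ask only for glucose, which is column 1. The code above prints the mean of every column, so the glucose answer is the second entry of each printed array. A minimal sketch that extracts the glucose figures directly follows, on a hypothetical four-patient array (values invented). The final step, which treats 0 as missing via np.nan, is an optional variation and an assumption, not something the questions require.

```python
import numpy as np

# Hypothetical rows in the dataset's column order (values invented)
toy = np.array([
    [2, 120, 70, 30, 100, 28.5, 0.35, 33, 0],
    [0, 150, 80, 25,  90, 35.1, 0.60, 45, 1],
    [5,   0, 64, 22,  60, 24.0, 0.20, 29, 0],
    [1, 170, 72, 28,  85, 31.0, 0.45, 51, 1],
])

glucose = toy[:, 1]          # column 1 = Glucose
diabetic = toy[:, 8] == 1    # boolean mask from the Outcome column

mean_all = glucose.mean()
mean_diabetic = glucose[diabetic].mean()
mean_non_diabetic = glucose[~diabetic].mean()   # ~ inverts the mask

# Optional variation: replace the 0 placeholders with NaN so that
# np.nanmean ignores them when averaging.
glucose_known = np.where(glucose == 0, np.nan, glucose)
mean_known = np.nanmean(glucose_known)

print(mean_all, mean_diabetic, mean_non_diabetic, mean_known)
```

Whether to exclude the zero placeholders before averaging is a data-cleaning decision; the recorded output includes them.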

OUTPUT

Largest element is 9
Row-wise max elements: [6 7 9]
Row-wise min elements: [1 2 1]
Column-wise max elements: [4 7 9]
Column-wise min elements: [1 1 2]
Sum of all array elements: 38
Cumulative sum along each row:
 [[ 1  6 12]
 [ 4 11 13]
 [ 3  4 13]]
Array elements in sorted order:
 [-1  0  1  2  3  4  4  5  6]
Row-wise sorted array:
 [[ 1  2  4]
 [ 3  4  6]
 [-1  0  5]]
Column wise sort by applying merge-sort:
 [[ 0 -1  2]
 [ 1  4  5]
 [ 3  4  6]]

Array sorted by names:
 [(b'Aakash', 2009, 9. ) (b'Ajay', 2008, 8.7) (b'Hrithik', 2009, 8.5)
 (b'Pankaj', 2008, 7.9)]
Array sorted by graduation year and then cgpa:
 [(b'Pankaj', 2008, 7.9) (b'Ajay', 2008, 8.7) (b'Hrithik', 2009, 8.5)
 (b'Aakash', 2009, 9. )]
(768, 9)
74.0
23.0
1.0
268
374
376
268
[ 3.84505208 120.89453125 69.10546875 20.53645833 79.79947917
31.99257812 0.4718763 33.24088542 0.34895833]
[ 4.86567164 141.25746269 70.82462687 22.1641791 100.3358209
35.14253731 0.5505 37.06716418 1. ]
[ 3.298 109.98 68.184 19.664 68.792 30.3042
0.429734 31.19 0. ]
