
B.E / B.Tech.

PRACTICAL END SEMESTER EXAMINATIONS, NOVEMBER/DECEMBER 2022


Third Semester

CS3361 – DATA SCIENCE LABORATORY

(Regulations 2021)

Time: 3 Hours        Answer any one question        Max. Marks: 100

Aim/Principle/Apparatus required/Procedure    20
Tabulation/Circuit/Program/Drawing            30
Calculation & Results                         30
Viva-Voce                                     10
Record                                        10
Total                                         100

1. a. Write a NumPy program to convert an array to a float type

b. Write a NumPy program to add a border (filled with 0's) around an existing array

c. Write a NumPy program to convert a list and tuple into arrays

d. Write a NumPy program to append values to the end of an array
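A minimal sketch of parts a to d using standard NumPy calls (`astype`, `np.pad`, `np.asarray`, `np.append`). The sample arrays are illustrative choices, not data given in the question.

```python
import numpy as np

# a. Convert an array to a float type
a = np.array([1, 2, 3, 4])
a_float = a.astype(float)  # returns a new float64 array; `a` itself is unchanged
print("a:", a_float, a_float.dtype)

# b. Add a border (filled with 0's) around an existing array
b = np.ones((3, 3))
b_bordered = np.pad(b, pad_width=1, mode="constant", constant_values=0)
print("b:\n", b_bordered)

# c. Convert a list and a tuple into arrays
from_list = np.asarray([1, 2, 3, 4, 5, 6, 7, 8])
from_tuple = np.asarray(((8, 4, 6), (1, 2, 3)))
print("c:", from_list, "\n", from_tuple)

# d. Append values to the end of an array
d = np.array([10, 20, 30])
d_appended = np.append(d, [40, 50, 60, 70, 80, 90])  # returns a new array
print("d:", d_appended)
```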

2. a. Write a NumPy program to convert an array to a float type

b. Write a NumPy program to create an empty and a full array

c. Write a NumPy program to convert a list and tuple into arrays

d. Write a NumPy program to find the real and imaginary parts of an array of complex numbers
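Parts a and c repeat Question 1 (`astype` and `np.asarray` cover them). A minimal sketch of parts b and d follows; the shapes, the fill value and the complex numbers are illustrative choices.

```python
import numpy as np

# b. Create an empty and a full array
empty_arr = np.empty((3, 4))   # allocated but uninitialised: contents are arbitrary
full_arr = np.full((3, 3), 6)  # every element set to 6
print("Empty array:\n", empty_arr)
print("Full array:\n", full_arr)

# d. Real and imaginary parts of an array of complex numbers
z = np.array([1.0 + 0.0j, 0.70710678 + 0.70710678j])
print("Real part:", z.real)       # np.real(z) is equivalent
print("Imaginary part:", z.imag)  # np.imag(z) is equivalent
```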

3. Write a Pandas program to create and display a DataFrame from the specified dictionary data, using the
given list of labels as the index.
Sample Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
....
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
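A minimal sketch, assuming pandas and NumPy are installed (NumPy supplies `np.nan` in the data). Current pandas keeps the dictionary's key order, so `print(df)` lists the columns as name, score, attempts, qualify; the alphabetical layout in the expected output reflects older pandas behaviour, and `df.sort_index(axis=1)` reproduces it.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# The list of labels becomes the row index
df = pd.DataFrame(exam_data, index=labels)
print(df)

# Alphabetical column order, matching the expected output
print(df.sort_index(axis=1))
```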

Page 1 of 6
4. Write a Pandas program to select the rows where the number of attempts in the examination is
greater than 2.
Sample Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of attempts in the examination is greater than 2:
name score attempts qualify
b Dima 9.0 3 no
d James NaN 3 no
f Michael 20.0 3 yes
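A minimal sketch using a boolean mask on the `attempts` column; the DataFrame is rebuilt from the same sample data so the block runs on its own.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# Keep only the rows whose attempts value exceeds 2
over_two = df[df['attempts'] > 2]
print("Number of attempts in the examination is greater than 2:")
print(over_two)
```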

5. Write a Pandas program to get the first 3 rows of a given DataFrame.


Sample Python dictionary data and list labels:
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
First three rows of the data frame:
attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
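A minimal sketch using `DataFrame.head`; `df.iloc[:3]` selects the same rows. Under current pandas the columns print in dictionary order, so `.sort_index(axis=1)` is applied to match the alphabetical layout shown.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

first_three = df.head(3)  # first 3 rows by position
print("First three rows of the data frame:")
print(first_three.sort_index(axis=1))
```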

6. Write a Pandas program to select the rows where the score is missing, i.e. is NaN.

Sample Python dictionary data and list labels:


exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura',
'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Rows where score is missing:
attempts name qualify score
d 3 James no NaN
h 1 Laura no NaN
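A minimal sketch that filters with `Series.isnull()` (`isna()` is an equivalent alias); `.sort_index(axis=1)` only reorders the columns to match the layout shown.

```python
import numpy as np
import pandas as pd

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
                      'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
             'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
             'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df = pd.DataFrame(exam_data, index=labels)

# True where score is NaN; the mask selects those rows
missing = df[df['score'].isnull()]
print("Rows where score is missing:")
print(missing.sort_index(axis=1))
```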

7. Read data from text files, Excel and the web, and explore various commands for doing descriptive
analytics on the Iris data set.
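A minimal sketch of the reading and exploring steps. Six rows of the Iris data (two per species) are embedded as text so the block runs offline; in the lab, `pd.read_csv` would point at the real file. The column names (`sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species`) are an assumption about the file's layout. The Excel and web readers appear only as comments because they need an Excel engine such as openpyxl or network access.

```python
import io
import pandas as pd

# Stand-in for a text file (sample rows only; the full data set has 150 rows)
IRIS_TEXT = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# Reading: text/CSV file (here from an in-memory buffer)
df = pd.read_csv(io.StringIO(IRIS_TEXT))
# Other sources (not run here; file names and url are hypothetical):
#   pd.read_csv("iris.txt", sep="\t")   # tab-separated text file
#   pd.read_excel("iris.xlsx")          # needs an Excel engine such as openpyxl
#   pd.read_csv(url)                    # pandas accepts an http(s) URL directly

# Descriptive analytics
print(df.shape)                          # (rows, columns)
print(df.head())                         # first rows
df.info()                                # dtypes and non-null counts
print(df.describe())                     # count, mean, std, min, quartiles, max
print(df["species"].value_counts())      # class frequencies
species_means = df.groupby("species").mean()
print(species_means)                     # per-species averages
```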

8. Use the diabetes data set from the UCI repository to perform the following:

Apply Univariate analysis:

• Frequency
• Mean
• Median
• Mode
• Variance
• Standard Deviation
• Skewness and Kurtosis
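A minimal sketch using pandas' built-in statistics. The file name `diabetes.csv` and the `Outcome` column follow the widely circulated CSV version of the Pima Indians diabetes data and are assumptions; `frequency` and `univariate_summary` are hypothetical helper names. pandas reports the sample variance (ddof=1) and excess kurtosis (0 for a normal distribution).

```python
import pandas as pd

def frequency(df, column):
    """Frequency table (value -> count) of one column, sorted by value."""
    return df[column].value_counts().sort_index()

def univariate_summary(df):
    """Central tendency, spread and shape statistics for every numeric column."""
    num = df.select_dtypes("number")
    return pd.DataFrame({
        "mean": num.mean(),
        "median": num.median(),
        "mode": num.mode().iloc[0],   # first mode when a column has several
        "variance": num.var(),        # sample variance (ddof=1)
        "std_dev": num.std(),         # sample standard deviation (ddof=1)
        "skewness": num.skew(),
        "kurtosis": num.kurt(),       # excess kurtosis: 0 for a normal distribution
    })

# Usage, assuming a local copy of the data saved as "diabetes.csv":
# df = pd.read_csv("diabetes.csv")
# print(frequency(df, "Outcome"))
# print(univariate_summary(df))
```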

9. Use the diabetes data set from the UCI repository to perform the following:

Apply Bivariate analysis:

• Linear and logistic regression modeling
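A NumPy-only sketch, so it runs without extra libraries: `np.polyfit` fits the least-squares line, and logistic regression is fitted by plain batch gradient descent on the mean log-loss. In practice scikit-learn's `LinearRegression` and `LogisticRegression` are the usual route. The helper names, learning rate, epoch count and the column pairings in the usage comments are illustrative assumptions.

```python
import numpy as np

def linear_fit(x, y):
    """Least-squares straight line y = slope * x + intercept."""
    slope, intercept = np.polyfit(np.asarray(x, dtype=float),
                                  np.asarray(y, dtype=float), deg=1)
    return slope, intercept

def logistic_fit(x, y, lr=0.1, epochs=5000):
    """One-predictor logistic regression by batch gradient descent on the mean log-loss.

    Returns a function mapping new x values to predicted probabilities of class 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd                            # standardise so a fixed step size behaves well
    w, b = 0.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(w * z + b)))   # sigmoid of the linear score
        w -= lr * np.mean((p - y) * z)           # gradient of the mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)                 # gradient w.r.t. the intercept b

    def predict_proba(x_new):
        z_new = (np.asarray(x_new, dtype=float) - mu) / sd
        return 1.0 / (1.0 + np.exp(-(w * z_new + b)))
    return predict_proba

# Usage on the diabetes data (file and column names are assumptions):
# df = pd.read_csv("diabetes.csv")
# print(linear_fit(df["BMI"], df["Glucose"]))
# proba = logistic_fit(df["Glucose"], df["Outcome"])
# print(proba([100, 150, 200]))
```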

10. Use the diabetes data set from the UCI repository to perform the following:

Apply Bivariate analysis:

• Multiple Regression analysis
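A minimal ordinary-least-squares sketch with `np.linalg.lstsq`: an intercept column is prepended to the predictors, and R-squared is computed from the residuals. The function name and the predictor/target choice in the usage comment are hypothetical.

```python
import numpy as np

def multiple_regression(X, y):
    """OLS fit of y ~ b0 + b1*x1 + ... + bk*xk; returns (coefficients, R-squared)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])      # prepend the intercept column
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
    y_hat = A @ coefs
    ss_res = np.sum((y - y_hat) ** 2)              # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)           # total sum of squares
    return coefs, 1 - ss_res / ss_tot

# Usage (file and column names are assumptions):
# df = pd.read_csv("diabetes.csv")
# coefs, r2 = multiple_regression(df[["BMI", "Age", "Insulin"]], df["Glucose"])
```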

11. Apply and explore various plotting functions on the Pima Indians Diabetes data set to perform the
following:

a) Normal values

b) Density and contour plots

c) Three-dimensional plotting
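A sketch under stated assumptions: "Normal values" is read here as a normal curve fitted to one column, and the columns `Glucose` and `BMI` assume the common CSV layout of the Pima data. `kde_grid` is a hypothetical helper implementing a simple product-kernel Gaussian KDE with Scott's-rule bandwidths (`scipy.stats.gaussian_kde` is the usual library route). matplotlib is imported inside the plotting function so the numeric helpers need only NumPy.

```python
import numpy as np

def normal_curve(values, num=100):
    """x grid and the normal PDF using the sample mean and standard deviation."""
    v = np.asarray(values, dtype=float)
    mu, sigma = v.mean(), v.std(ddof=1)
    x = np.linspace(v.min(), v.max(), num)
    pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return x, pdf

def kde_grid(x, y, num=50):
    """Product-kernel Gaussian KDE of (x, y) evaluated on a num x num grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    hx = x.std(ddof=1) * n ** (-1 / 6)   # Scott's rule factor for 2-D data
    hy = y.std(ddof=1) * n ** (-1 / 6)
    X, Y = np.meshgrid(np.linspace(x.min(), x.max(), num),
                       np.linspace(y.min(), y.max(), num))
    # One Gaussian bump per observation, summed over the last axis
    dx = (X[..., None] - x) / hx
    dy = (Y[..., None] - y) / hy
    Z = np.exp(-0.5 * (dx ** 2 + dy ** 2)).sum(axis=-1) / (n * 2 * np.pi * hx * hy)
    return X, Y, Z

def plot_pima(df, col="Glucose", xcol="Glucose", ycol="BMI"):
    """a) normal curve, b) density contours, c) 3-D density surface."""
    import matplotlib.pyplot as plt
    fig = plt.figure(figsize=(15, 4))
    ax1 = fig.add_subplot(1, 3, 1)
    ax1.hist(df[col], bins=20, density=True, alpha=0.5)   # data histogram
    ax1.plot(*normal_curve(df[col]))                       # fitted normal curve
    ax1.set_title(f"Normal curve: {col}")
    X, Y, Z = kde_grid(df[xcol], df[ycol])
    ax2 = fig.add_subplot(1, 3, 2)
    ax2.clabel(ax2.contour(X, Y, Z), fontsize=7)          # labelled density contours
    ax2.set_xlabel(xcol)
    ax2.set_ylabel(ycol)
    ax3 = fig.add_subplot(1, 3, 3, projection="3d")
    ax3.plot_surface(X, Y, Z, cmap="viridis")             # same density as a 3-D surface
    plt.show()
    return fig

# Usage (file name assumed; zeros in Glucose/BMI mark missing values in this data set):
# df = pd.read_csv("diabetes.csv")
# plot_pima(df[(df["Glucose"] > 0) & (df["BMI"] > 0)])
```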

12. Apply and explore various plotting functions on the Pima Indians Diabetes data set to perform the
following:

a) Correlation and scatter plots


b) Histograms
c) Three-dimensional plotting
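A sketch under stated assumptions: the column names `Glucose`, `BMI`, `Age` and `Outcome` follow the common CSV layout of the Pima data, and `correlations_with`, `histogram_counts` and `plot_pima` are hypothetical helper names. The numeric helpers expose the numbers behind the plots; matplotlib is imported only inside the plotting function.

```python
import numpy as np
import pandas as pd

def correlations_with(df, target):
    """Pearson correlation of each numeric column with `target`, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

def histogram_counts(values, bins=10):
    """Bin counts and bin edges: the numbers behind a histogram."""
    return np.histogram(np.asarray(values, dtype=float), bins=bins)

def plot_pima(df, target="Outcome", cols=("Glucose", "BMI", "Age")):
    import matplotlib.pyplot as plt
    cols = list(cols)
    # a) correlation matrix as a heat map, plus pairwise scatter plots
    corr = df.select_dtypes("number").corr()
    fig, ax = plt.subplots()
    im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im)
    pd.plotting.scatter_matrix(df[cols], figsize=(8, 8), diagonal="hist")
    # b) one histogram per column
    df.hist(bins=20, figsize=(10, 8))
    # c) 3-D scatter of three columns, coloured by the target class
    fig3 = plt.figure()
    ax3 = fig3.add_subplot(1, 1, 1, projection="3d")
    ax3.scatter(df[cols[0]], df[cols[1]], df[cols[2]], c=df[target])
    ax3.set_xlabel(cols[0])
    ax3.set_ylabel(cols[1])
    ax3.set_zlabel(cols[2])
    plt.show()

# Usage (file name assumed):
# df = pd.read_csv("diabetes.csv")
# print(correlations_with(df, "Outcome"))
# plot_pima(df)
```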

13. Apply and explore various plotting functions on a UCI data set to perform the following:

a) Normal values
b) Density and contour plots
c) Three-dimensional plotting

14. Apply and explore various plotting functions on a UCI data set to perform the following:

a) Correlation and scatter plots


b) Histograms
c) Three-dimensional plotting

15. Write a Pandas program to get the numeric representation of an array by identifying distinct values
of a given column of a dataframe.
Sample Output:
Original DataFrame:
Name Date_Of_Birth Age
0 Alberto Franco 17/05/2002 18.5
1 Gino Mcneill 16/02/1999 21.2
2 Ryan Parkes 25/09/1998 22.5
3 Eesha Hinton 11/05/2002 22.0
4 Gino Mcneill 15/09/1997 23.0
Numeric representation of an array by identifying distinct values:
[0 1 2 3 1]
Index(['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton'], dtype='object')
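A minimal sketch with `pd.factorize`, which returns an integer code for every row and the distinct values in order of first appearance; the DataFrame is rebuilt from the sample output.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill'],
    'Date_Of_Birth': ['17/05/2002', '16/02/1999', '25/09/1998', '11/05/2002', '15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23],
})
print("Original DataFrame:")
print(df)

# codes[i] is the position of df['Name'][i] within `uniques`
codes, uniques = pd.factorize(df['Name'])
print("Numeric representation of an array by identifying distinct values:")
print(codes)
print(uniques)
```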

16. Write a Pandas program to check for inequality of two given DataFrames.
Sample Output:
Original DataFrames:
      W     X   Y   Z
0  68.0  78.0  84  86
1  75.0  85.0  94  97
2  86.0   NaN  89  96
3  80.0  80.0  83  72
4   NaN  86.0  86  83
      W   X   Y   Z
0  78.0  78  84  86
1  75.0  85  84  97
2  86.0  96  89  96
3  80.0  80  83  72
4   NaN  76  86  83
Check for inequality of the said dataframes:
       W      X      Y      Z
0   True  False  False  False
1  False  False   True  False
2  False   True  False  False
3  False  False  False  False
4   True   True  False  False
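A minimal sketch using `DataFrame.ne` (element-wise `!=`). NaN never compares equal, which is why row 4, column W is True even though both frames hold NaN there.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'W': [68, 75, 86, 80, np.nan], 'X': [78, 85, np.nan, 80, 86],
                    'Y': [84, 94, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]})
df2 = pd.DataFrame({'W': [78, 75, 86, 80, np.nan], 'X': [78, 85, 96, 80, 76],
                    'Y': [84, 84, 89, 83, 86], 'Z': [86, 97, 96, 72, 83]})
print("Original DataFrames:")
print(df1)
print(df2)

# True wherever the two frames differ (NaN counts as different from everything)
unequal = df1.ne(df2)
print("Check for inequality of the said dataframes:")
print(unequal)
```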

17. Write a Pandas program to get the first n records of a DataFrame.


Sample Output:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     5
2     3     6     8
3     4     9    12
4     7     5     1
5    11     0    11
First 3 rows of the said DataFrame:
   col1  col2  col3
0     1     4     7
1     2     5     5
2     3     6     8
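A minimal sketch using `DataFrame.head(n)` (equivalent to `df.iloc[:n]`), with the DataFrame rebuilt from the sample output.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11],
                   'col2': [4, 5, 6, 9, 5, 0],
                   'col3': [7, 5, 8, 12, 1, 11]})
n = 3
print("Original DataFrame")
print(df)

first_n = df.head(n)  # first n rows by position
print(f"First {n} rows of the said DataFrame:")
print(first_n)
```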

18. Write a Pandas program to select all columns except one given column in a DataFrame.

Sample Output:
Original DataFrame
   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11
All columns except 'col3':
   col1  col2
0     1     4
1     2     5
2     3     6
3     4     9
4     7     5
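A minimal sketch using a boolean mask on `df.columns`; `df.drop(columns='col3')` is an equivalent alternative. The DataFrame is rebuilt from the sample output.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 7],
                   'col2': [4, 5, 6, 9, 5],
                   'col3': [7, 8, 12, 1, 11]})
print("Original DataFrame")
print(df)

# Keep every column whose name is not 'col3'
without_col3 = df.loc[:, df.columns != 'col3']
print("All columns except 'col3':")
print(without_col3)
```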

19. Write a NumPy program to convert a Python dictionary to a NumPy ndarray.


Sample Output:
Original dictionary:
{'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0},
'column1': {'a': 3.0, 'b': 1, 'c': 0.0, 'd': -1.0},
'column2': {'a': 4, 'b': 1, 'c': 5.0, 'd': -1.0},
'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}}
Type: <class 'dict'>
ndarray:
[[ 1.  0.  0.  2.]
 [ 3.  1.  0. -1.]
 [ 4.  1.  5. -1.]
 [ 3. -1. -1. -1.]]
Type: <class 'numpy.ndarray'>
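A minimal sketch: each inner dictionary becomes one row of the array, relying on dictionaries preserving insertion order (guaranteed since Python 3.7), and `dtype=float` upcasts the mixed integer and float values.

```python
import numpy as np

udict = {'column0': {'a': 1, 'b': 0.0, 'c': 0.0, 'd': 2.0},
         'column1': {'a': 3.0, 'b': 1, 'c': 0.0, 'd': -1.0},
         'column2': {'a': 4, 'b': 1, 'c': 5.0, 'd': -1.0},
         'column3': {'a': 3.0, 'b': -1.0, 'c': -1.0, 'd': -1.0}}
print("Original dictionary:")
print(udict)
print("Type:", type(udict))

# One row per outer key, one column per inner key ('a'..'d')
result = np.array([list(inner.values()) for inner in udict.values()], dtype=float)
print("ndarray:")
print(result)
print("Type:", type(result))
```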

20. Write a NumPy program to find the index of a given array within another given array.

Sample Output:
Original NumPy array:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
Searched array:
[4 5 6]
Index of the searched array in the original array:
[1]
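A minimal sketch: broadcasting compares every row with the searched array, `.all(axis=1)` keeps rows where all elements match, and `np.where` returns their indices.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
searched = np.array([4, 5, 6])
print("Original NumPy array:")
print(arr)
print("Searched array:")
print(searched)

# Row-wise match: True only for rows equal to `searched` in every position
index = np.where((arr == searched).all(axis=1))[0]
print("Index of the searched array in the original array:")
print(index)
```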

