Homework 02 ISE-291-T222
Term 222
Homework 02 2023
Covers: Topics 4-5 Material
Deadline: 10 March 2023, 11:59 PM
ISE-291: Homework 02
Problem A [40 Marks]: Consider data given in “HW2_DataA” Microsoft Excel Comma Separated
Values (.CSV) file.
A-1. [3 marks]: Read the data (Assume the 1st row in HW2_DataA contains the column headings).
Then display:
A-2. [5 marks]: Create a new dataframe (let’s say: ndf) by selecting the first 20 rows from column-2
(Type_of_Payment) to column-6 (Late_delivery) and then sort the second column of the new
dataframe in descending order and the third column of the new dataframe in ascending order.
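The selection and sorting pattern described in A-2 can be sketched on a small made-up dataframe rather than on HW2_DataA. Everything except the names Type_of_Payment and Late_delivery is a placeholder, and the sketch assumes that "column-2" counts from 1 (position 1 in iloc) and that the two sort orders are applied together, with the third column breaking ties in the second.

```python
import pandas as pd

# Toy stand-in for HW2_DataA; values and the col_* names are invented.
df = pd.DataFrame({
    "col_1": range(1, 7),
    "Type_of_Payment": ["DEBIT", "CASH", "DEBIT", "TRANSFER", "CASH", "PAYMENT"],
    "col_3": [3, 1, 3, 5, 4, 6],
    "col_4": [10.0, 20.0, 5.0, 30.0, 20.0, 10.0],
    "col_5": list("abcdef"),
    "Late_delivery": [0, 1, 0, 1, 1, 0],
    "col_7": [0.1] * 6,
})

n_rows = 4  # the homework asks for 20; the toy frame is smaller

# iloc is position-based and end-exclusive:
# columns 2..6 (counting from 1) are positions 1..5.
ndf = df.iloc[:n_rows, 1:6]

# Sort by the 2nd column of ndf descending; ties are broken
# by the 3rd column ascending.
second, third = ndf.columns[1], ndf.columns[2]
ndf_sorted = ndf.sort_values(by=[second, third], ascending=[False, True])
print(ndf_sorted)
```

If the question is instead read as two independent sorts, each column would need its own sort_values call, so it is worth stating which reading was used.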
A-3. [2 marks]: Considering the new dataframe from part A-2, display the separate statistical
A-4. [5 marks]: From the original dataset “HW2_DataA”, How many customers below the age of 40
“Order_Item_Discount” column values from float to nearest integer using the apply command
A-7. [3 marks]: Select the rows having Order_Item_Discount greater than 30% and plot their
A-8. [10 marks]: Draw a plot showing the payment method preference (Type_of_Payment) based on
the Customer_Segment (your graph must show the count of the different types of
(i) Which customer segment has the highest number of DEBIT type?
(ii) Which customer segment prefers PAYMENT over the other types?
A-9. [5 marks]: Make boxplots for the Sales_per_customer of the COMPLETE and PENDING orders
of Smart watch. Compare the two boxplots and explain how the median Sales per customer is
Problem B [40 Marks]: Consider the data given in the “HW2_DataB” Microsoft Excel (.csv) file and
described in Table 1. Note: Solve all the following questions using Python. Use the Pandas and Sklearn libraries for
all the following analyses.
B-1. [3 marks]: Read and display the data. Identify the number of rows and columns. Does any
B-2. [2 marks]: Type Consistency: For each column, identify each field type and verify that each
column in Python is identified correctly. If there is any discrepancy, then indicate it.
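A type-consistency check of the kind B-2 describes might look like the sketch below. The data is invented, and "bmi" is a hypothetical column name; only "age" and "ever_married" appear in this homework.

```python
import pandas as pd

# Invented sample: "age" contains a stray string, so pandas cannot
# infer a numeric type for that column.
df = pd.DataFrame({
    "age": ["34", "51", "unknown", "27"],
    "ever_married": ["Yes", "No", "Yes", "No"],
    "bmi": [22.5, 30.1, 27.8, 24.0],  # hypothetical column
})

print(df.dtypes)  # inferred type of each column

# Coerce to numeric; entries that cannot be parsed become NaN
# instead of raising an error.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
print(df.dtypes)
```

Comparing the printed types against the field types listed for the dataset is one way to spot a discrepancy.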
B-3. [5 marks]: Filter noise: Looking at the data, some values in the numeric column (“age”) were
entered as less than 1 (by mistake). Fix the inconsistencies. Furthermore, find unique categorical
B-4. [7 marks]: Handling NaN values: Drop all columns containing 30% or more missing values.
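One way to apply the 30% rule, shown here on a made-up frame with hypothetical column names a, b, and c, is to compute the fraction of missing values per column and keep only the columns below the threshold.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Invented data: 10 rows, with 20%, 30% and 0% missing values.
df = pd.DataFrame({
    "a": [1.0, nan, nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "b": [nan, nan, nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "c": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# isna() marks missing cells; the mean of the booleans is the
# fraction of missing values in each column.
missing_frac = df.isna().mean()

# Keep columns with strictly less than 30% missing, so a column
# with exactly 30% missing is dropped ("30% or more").
cleaned = df.loc[:, missing_frac < 0.30]
print(cleaned.columns.tolist())
```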
B-5. [5 marks]: Normalization/Transformation: Normalize all numeric columns to a mean of zero and
B-7. [5 marks]: Encoding: For the “ever_married,” convert it using binary values (0 and 1). Do not
(i) When is it best to use a label encoder rather than one-hot encoding?
(iii) Give a real-world example of direct and indirect data acquisition approaches.
Access the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php, and then do the
following:
C-1. [5 marks]: Find the “Abalone” Data Set and download the data file. The data file is in the “Data
Folder” with “.data” extension. To customize and limit your search, you may use the filters on
Attribute Type: mixed; Data Type: multivariate; # Attributes: less than 10.
C-2. [5 marks]: Identify the columns’ headers of the data. (Tip: The columns’ headers are under the
C-3. [5 marks]: Read the data into Pandas dataframe. Include the list of columns’ headers to your
“abalone.data” file. (Tip: Create a list of columns’ headers and then add it as data headers to
C-4. [5 marks]: Display a statistical summary for each of the numerical and non-numerical data.
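Separate summaries for numerical and non-numerical columns can be produced with describe(), as in this sketch. The frame and its column names (group, measure, count) are invented and are not the Abalone headers.

```python
import numpy as np
import pandas as pd

# Invented mixed-type data; column names are placeholders.
df = pd.DataFrame({
    "group": ["M", "F", "I", "M"],
    "measure": [0.455, 0.350, 0.530, 0.440],
    "count": [15, 7, 9, 10],
})

# By default describe() summarises only the numeric columns
# (count, mean, std, min, quartiles, max).
numeric_summary = df.describe()

# Excluding numeric types leaves the non-numeric columns, which are
# summarised by count, unique, top and freq.
non_numeric_summary = df.describe(exclude=[np.number])

print(numeric_summary)
print(non_numeric_summary)
```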
Consider the following Python methods, available in native Python, or pandas/seaborn libraries:
D-1. pandas.DataFrame()
D-3. pandas.DataFrame.head()
D-4. pandas.DataFrame.index
D-5. pandas.DataFrame.columns
D-6. pandas.DataFrame.describe()
D-7. pandas.DataFrame.info()
D-8. pandas.DataFrame.loc()
D-9. pandas.DataFrame.iloc()
D-11. pandas.DataFrame.isin()
D-13. pandas.DataFrame.apply()
D-14. pandas.DataFrame.applymap()
D-15. seaborn.relplot()
D-16. seaborn.pairplot()
D-17. seaborn.catplot()
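A few of the pandas entries listed above can be tried on a small invented frame, as sketched here. The column names, index labels, and values are placeholders. Note that loc and iloc are indexers used with square brackets rather than called like functions.

```python
import pandas as pd

# Invented data with explicit row labels.
df = pd.DataFrame(
    {
        "segment": ["Consumer", "Corporate", "Home Office", "Consumer"],
        "discount": [12.4, 30.6, 45.7, 8.2],
    },
    index=["r1", "r2", "r3", "r4"],
)

by_label = df.loc["r2", "discount"]  # label-based lookup
by_position = df.iloc[0, 1]          # position-based lookup (row 0, column 1)

# isin with a dict tests each named column against its own list of values;
# columns not named in the dict come back as all False.
in_consumer = df.isin({"segment": ["Consumer"]})

# apply passes each selected column to the function in turn.
rounded = df[["discount"]].apply(lambda col: col.round())

print(df.head(2))
print(in_consumer)
print(rounded)
```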
Consider the following Python methods, available in native Python, or pandas/sklearn libraries:
E-1. pandas.DataFrame.index
E-2. pandas.DataFrame.columns
E-3. pandas.DataFrame.dtypes
E-5. pandas.DataFrame.apply()
E-6. pandas.DataFrame.map()
E-8. sklearn.preprocessing.LabelEncoder.fit()
E-9. sklearn.preprocessing.LabelEncoder.transform()
E-10. sklearn.preprocessing.StandardScaler.transform()
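The fit/transform pattern shared by the sklearn entries above can be sketched as follows. The label and age values are invented for illustration and are not taken from HW2_DataB.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# LabelEncoder: fit() learns the sorted set of classes,
# transform() maps each label to its integer position in that set.
labels = ["Yes", "No", "No", "Yes"]
encoder = LabelEncoder()
encoder.fit(labels)
codes = encoder.transform(labels)
print(encoder.classes_, codes)

# StandardScaler: fit() learns the column mean and (population)
# standard deviation, transform() rescales to mean 0 and std 1.
ages = np.array([[20.0], [30.0], [40.0], [50.0]])  # 2-D: rows x columns
scaler = StandardScaler()
scaler.fit(ages)
scaled = scaler.transform(ages)
print(scaled.ravel())
```

Keeping fit and transform as separate steps allows the same fitted encoder or scaler to be reused on new data.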
☞ Note: You can use the following online references to answer the above questions:
♣ https://docs.python.org/3.8/library/functions.html#help
♦ https://docs.python.org/3/library/index.html
♥ https://pandas.pydata.org/pandas-docs/stable/index.html
♠ https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing