RATING

Pre-Processing

Data Cleaning
• The “Unnamed:0” column in the dataset is set as the index column.
• The “id” column is dropped since it has no significance in the analysis.

Unique values
• The dataset is checked for unique values in each column.
  – Most columns (14) take values in the range 0 to 5.
  – Four columns take only two distinct values.
  – The target column also has two values.
  – The remaining columns have the following numbers of unique values:
    • Age: 75
    • Class: 3
    • Flight Distance: 3281
    • Departure Delay in Minutes: 313
    • Arrival Delay in Minutes: 320

Null values
• Null values are present only in “Arrival Delay in Minutes”.
• The null values are replaced with the mean value of the column.

Handling the outlier in 'Flight Distance' column
• Outliers are present in 584 rows.

Handling the outlier in 'CheckIn service' column
• The normal range of CheckIn service is 1 to 5, and the value flagged as an outlier is 1.
• This value accounts for 12.1 percent of the total dataset.
• We therefore concluded that it is not an unwanted outlier and retained it.

Handling the outlier in 'Departure Delay in Minutes' column

Handling the outlier in 'Arrival Delay in Minutes' column

Encoding
• The categorical columns are encoded as follows.
  – Gender: Male (1), Female (0).
  – Customer Type: Loyal Customer (0), Disloyal Customer (1).
  – Type of Travel: Business travel (0), Personal travel (1).
  – Class: Business (0), Eco (1), Eco Plus (2).
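
The preprocessing steps described above can be sketched in pandas as follows. This is a minimal illustration, not the code used in the analysis: the toy DataFrame and its values are made up, the index column is spelled "Unnamed: 0" (the name pandas usually gives an unnamed CSV column), and the 1.5 × IQR rule used to flag outliers is an assumption, since the text does not say which detection rule was applied. The helper name `iqr_outlier_mask` is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real dataset (all values are made up).
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4, 5],
    "id": [101, 102, 103, 104, 105, 106],
    "Gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "Customer Type": ["Loyal Customer", "Disloyal Customer", "Loyal Customer",
                      "Loyal Customer", "Disloyal Customer", "Loyal Customer"],
    "Type of Travel": ["Business travel", "Personal travel", "Business travel",
                       "Personal travel", "Business travel", "Business travel"],
    "Class": ["Business", "Eco", "Eco Plus", "Eco", "Business", "Eco"],
    "Flight Distance": [300, 450, 500, 520, 610, 4900],
    "Arrival Delay in Minutes": [0.0, 10.0, np.nan, 5.0, 25.0, 0.0],
})

# Data cleaning: use the unnamed column as the index and drop "id".
df = df.set_index("Unnamed: 0").drop(columns=["id"])

# Unique values: number of distinct values per column.
unique_counts = df.nunique()

# Null values: count them, then fill the delay column with its mean.
null_counts = df.isnull().sum()
delay_col = "Arrival Delay in Minutes"
df[delay_col] = df[delay_col].fillna(df[delay_col].mean())


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Outliers: count the flagged rows and their share of the dataset.
mask = iqr_outlier_mask(df["Flight Distance"])
n_outliers = int(mask.sum())
outlier_share_pct = 100 * mask.mean()

# Encoding: map each category to the integer code listed in the text.
encoding = {
    "Gender": {"Male": 1, "Female": 0},
    "Customer Type": {"Loyal Customer": 0, "Disloyal Customer": 1},
    "Type of Travel": {"Business travel": 0, "Personal travel": 1},
    "Class": {"Business": 0, "Eco": 1, "Eco Plus": 2},
}
for column, mapping in encoding.items():
    df[column] = df[column].map(mapping)

print(unique_counts)
print(null_counts)
print(n_outliers, outlier_share_pct)
print(df)
```

Computing the share of flagged rows, as in `outlier_share_pct`, is the kind of check that supports the decision for 'CheckIn service': when a flagged value covers a large fraction of the data, it is better treated as a legitimate category than as an anomaly.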