
Data Preprocessing - 1

Using Python

Data preprocessing is an important step in the data analysis and machine learning
pipeline. It involves cleaning, transforming, and organizing raw data into a format
suitable for analysis or modeling. Python offers several libraries for this work,
including NumPy, pandas, and scikit-learn.
Example:
1) Start by importing the necessary libraries for data preprocessing, such as
NumPy and Pandas:
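The original code for this step is not shown. A minimal version, assuming the conventional `np` and `pd` aliases that most pandas code uses, might look like this:

```python
# Conventional aliases used throughout NumPy/pandas code
import numpy as np
import pandas as pd

# Printing the installed versions helps when behavior differs between
# pandas releases (for example, how inplace operations on columns behave)
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
```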

2) Load Dataset
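The loading code is also not shown. For a CSV file, a common approach is `pd.read_csv`. The sketch below uses a small, made-up sample held in a string (the column names and values are hypothetical, not taken from the original dataset) so that it runs without a file on disk:

```python
import io

import pandas as pd

# Hypothetical sample standing in for a real CSV file; values are invented
# for illustration. With a real file you would pass its path instead.
csv_text = """Glucose,BMI,Age,Outcome
148,33.6,50,1
85,,31,0
183,23.3,,1
89,28.1,21,0
"""

# read_csv accepts any file-like object, so StringIO behaves like a file here;
# empty fields are read as NaN (missing values)
data = pd.read_csv(io.StringIO(csv_text))
print(data)
```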

3) Data Exploration
data.head()      # View the first five rows of the dataset
data.info()      # Column data types and non-null counts (reveals missing values)
data.describe()  # Summary statistics for the numeric columns
data.shape       # (number of rows, number of columns); an attribute, not a method
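A short, self-contained illustration of what these exploration calls report, using a small hypothetical DataFrame (the columns and values are invented for the example):

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame used only to illustrate the exploration calls
data = pd.DataFrame({
    "Glucose": [148, 85, 183, 89],
    "BMI": [33.6, np.nan, 23.3, 28.1],
    "Age": [50, 31, np.nan, 21],
})

# shape is a (rows, columns) tuple and can be unpacked directly
n_rows, n_cols = data.shape
print(f"{n_rows} rows, {n_cols} columns")  # 4 rows, 3 columns

# dtypes shows the inferred type of each column; a column containing NaN
# is stored as float64 even if its other values are whole numbers
print(data.dtypes)

# describe() skips NaN, so its 'count' row gives the non-missing values per column
print(data.describe().loc["count"])
```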

4) Handle Missing Values

# Check for missing values: count the NaN entries in each column
missing_values = data.isna().sum()
print(missing_values)
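Raw counts are easier to judge next to percentages. A minimal sketch, on a hypothetical DataFrame, that reports both and also counts the rows containing at least one missing value (useful for estimating how much `dropna()` would discard):

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two columns
data = pd.DataFrame({
    "Glucose": [148, 85, 183, 89],
    "BMI": [33.6, np.nan, 23.3, 28.1],
    "Age": [50, np.nan, np.nan, 21],
})

# isna() marks missing cells as True; sum() counts them per column and
# mean() gives the fraction of missing cells per column
summary = pd.DataFrame({
    "missing": data.isna().sum(),
    "percent": data.isna().mean() * 100,
})
print(summary)

# Number of rows with at least one missing value
rows_with_missing = data.isna().any(axis=1).sum()
print(rows_with_missing)  # 2
```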
a) Remove Rows with Missing Values

data.dropna(inplace=True) # This will remove rows with any missing values
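Dropping every incomplete row can discard a lot of data. `dropna()` also accepts `subset` (only check certain columns) and `thresh` (minimum number of non-missing values a row must have). A sketch on hypothetical data comparing the three options:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values scattered across rows
data = pd.DataFrame({
    "Glucose": [148, 85, 183, np.nan],
    "BMI": [33.6, np.nan, 23.3, np.nan],
    "Age": [50, 31, np.nan, 21],
})

# Default: drop any row containing at least one NaN
all_complete = data.dropna()

# Only drop rows where BMI is missing
bmi_known = data.dropna(subset=["BMI"])

# Keep rows that have at least 2 non-missing values
mostly_complete = data.dropna(thresh=2)

print(len(all_complete), len(bmi_known), len(mostly_complete))  # 1 2 3
```

Assigning the result to a new name instead of using `inplace=True` keeps the original DataFrame available for comparison.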

b) Impute Missing Values (e.g., with the column mean):

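# 'column_name' is a placeholder for a numeric column. Assigning the result back
# avoids chained inplace calls, which recent pandas versions warn about and which
# may not modify 'data' under Copy-on-Write.
data['column_name'] = data['column_name'].fillna(data['column_name'].mean())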
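To impute every numeric column at once, `fillna()` can take the Series returned by `mean()` or `median()`; pandas matches its values to columns by name. A sketch on hypothetical data (the median is shown as an alternative that is less sensitive to outliers, not as the original author's choice):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with gaps
data = pd.DataFrame({
    "Glucose": [148.0, 85.0, np.nan, 89.0],
    "BMI": [33.6, np.nan, 23.3, 28.1],
    "Outcome": [1, 0, 1, 0],
})

# Each column's NaNs are replaced by that column's own mean
mean_filled = data.fillna(data.mean(numeric_only=True))

# Same idea with the median
median_filled = data.fillna(data.median(numeric_only=True))

print(mean_filled)
print(median_filled)
```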

c) Replace with a Constant Value

data['column_name'] = data['column_name'].fillna(0)  # 'column_name' is a placeholder
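Different columns often need different constants, for example 0 for a count and a label such as "unknown" for a text column. Passing a dict to `fillna()` handles this in one call. A sketch with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column and one text column
data = pd.DataFrame({
    "reading": [148.0, np.nan, 183.0],
    "category": ["a", None, "b"],
})

# The dict maps each column name to its own replacement value
filled = data.fillna({"reading": 0, "category": "unknown"})
print(filled)
```

A constant like 0 can be mistaken for a genuine measurement later in the analysis, so it is worth choosing a value that cannot occur naturally in that column.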

5) Save the Processed Data
data.to_csv("diabetes.csv", index=False)  # index=False keeps the row index out of the file
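A sketch of why `index=False` matters: without it, the row index is written as an unnamed first column, which comes back as an extra `Unnamed: 0` column when the file is read again. The example writes to a temporary directory and uses a hypothetical file name (`diabetes_clean.csv`); saving to a new name also avoids overwriting the raw input:

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data
data = pd.DataFrame({"Glucose": [148, 85], "BMI": [33.6, 26.6]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # index=False: only the data columns are written
    clean_path = os.path.join(tmp_dir, "diabetes_clean.csv")
    data.to_csv(clean_path, index=False)
    without_index = pd.read_csv(clean_path)

    # Default (index=True): the row index becomes an extra, unnamed column
    indexed_path = os.path.join(tmp_dir, "with_index.csv")
    data.to_csv(indexed_path)
    with_index = pd.read_csv(indexed_path)

print(without_index.columns.tolist())  # ['Glucose', 'BMI']
print(with_index.columns.tolist())     # ['Unnamed: 0', 'Glucose', 'BMI']
```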
