Lecturer: Le Hai Ha
Std.ID: 20192664
Table of Contents
Introduction
I. Introduction about project: House Prices – Advanced Regression Techniques
1.1 Topic description
1.2 About Linear regression
1.3 Practice Skills
II. Project solution
2.1 Importing libraries
2.2 Importing and read data
2.3 Data exploration and visualization
2.3.1 Histogram and Normal probability plot
2.3.2 Relationship with numerical variables and categorical features
2.4 Data preprocessing
2.4.1 Separating target and features
2.4.2 Looking at NaN % within the data
2.4.3 Data cleaning with missing values
2.5 Feature engineering
2.6 Modeling
Conclusion
Introduction
Business data science is the study of data to extract insights that are meaningful to business
operations. It is a multidisciplinary approach that combines principles and practices from
mathematics, statistics, artificial intelligence, and computer engineering to analyze large
volumes of data. This analysis helps data scientists ask and answer questions such as what
happened, why it happened, what will happen, and how the results can be used and for what
purpose.
After finishing the course, students have learned to use tools such as RapidMiner, Python, and
SQL to collect and process structured and unstructured data, along with skills in data
administration, aggregation, and extraction, exploiting information, detecting hidden trends,
and discovering new knowledge in data.
Learning about this field now offers promising prospects for learners. Investment in this field
is also an important direction for modern society, indispensable to the development of every
country and every economy, and Vietnam will certainly not stand aside from this trend.
Are we sure that all these NaNs are real missing values? Looking at the given description file,
we can see that most of these NaNs reflect the absence of something (for example, NA in PoolQC
means "No Pool"), so they are not truly missing values. We can impute them (for numerical
features) or replace them with the meaning given in the description file (for categorical
features).
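The idea in the paragraph above can be sketched in pandas. This is a minimal sketch, not the project's own code: the helper names `missing_percentage` and `fill_absent_values` are hypothetical, and the column groups are illustrative assumptions based on the Kaggle data description (NA in `PoolQC` means "No Pool", NA in `GarageType` means "No Garage"); the project's actual column lists may differ. Features where NaN means "absent" get an explicit `"None"` category or `0`, and any remaining numerical gaps fall back to median imputation, which is one common choice rather than the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical, illustrative column groups (not the project's full lists).
# Categorical features where NaN means "the house has no such thing".
NONE_MEANS_ABSENT = ["PoolQC", "GarageType"]
# Numerical features where NaN means "zero of this thing".
ZERO_MEANS_ABSENT = ["GarageArea"]


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of NaNs per column, largest first, omitting complete columns."""
    pct = df.isna().mean().mul(100).sort_values(ascending=False)
    return pct[pct > 0]


def fill_absent_values(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where 'absence' NaNs are replaced and other numeric NaNs are imputed."""
    out = df.copy()
    # NaN that encodes absence becomes an explicit category.
    for col in NONE_MEANS_ABSENT:
        if col in out.columns:
            out[col] = out[col].fillna("None")
    # NaN that encodes absence in a numerical feature becomes 0.
    for col in ZERO_MEANS_ABSENT:
        if col in out.columns:
            out[col] = out[col].fillna(0)
    # Any remaining numerical NaNs are treated as truly missing: median imputation.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    return out
```

Filling `GarageArea` with 0 before the median step matters: otherwise a house without a garage would be assigned the median garage size, which contradicts what the description file says the NaN means.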
2.4.3 Data cleaning with missing values
Important questions when thinking about missing data: