
Conceptual points in ML

• In pandas, the count() method of a groupby object does not accept
a column name as an argument; passing one raises an error.
• Ex: df.groupby('YearsExperience').count('Salary') (wrong syntax)
• To correct this syntax, select the column first and then call
count():
• Ex: df.groupby('YearsExperience')['Salary'].count() (correct
syntax)
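A minimal sketch of the difference between the two calls, assuming a small hypothetical DataFrame (only the column names come from the notes; the values are made up, and one Salary is left missing to show that count() skips nulls):

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the notes.
df = pd.DataFrame({
    "YearsExperience": [1.1, 1.1, 1.3, 2.0, 2.0, 2.0],
    "Salary": [39343, 40000, 46205, 43525, None, 44000],
})

# Wrong: count() takes no column argument, so this raises a TypeError.
try:
    df.groupby("YearsExperience").count("Salary")
    error_raised = False
except TypeError:
    error_raised = True

# Correct: select the column first, then count non-null values per group.
salary_counts = df.groupby("YearsExperience")["Salary"].count()
print(salary_counts)
```

Calling count() without selecting a column also works, but it then returns the non-null count for every remaining column rather than for Salary alone.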

• Positive correlation: To determine whether there is a positive
correlation, compare the values of "YearsExperience" with the
corresponding "Salary" values. If, in general, the salary tends to
increase as the years of experience increase, then there is a
positive correlation between the two variables.
• For example, consider the first two data points in the dataset:
• For 1.1 years of experience, the salary is 39343.
• For 1.3 years of experience, the salary is 46205.
• Here, as the years of experience increase from 1.1 to 1.3, the
salary also increases from 39343 to 46205. Two points alone do not
establish a correlation, but this pattern holds for most of the
data points, indicating a positive correlation.
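The comparison can also be summarised in one number, the Pearson correlation coefficient. The sketch below assumes a tiny sample: the first two rows are the data points quoted in the notes, and the remaining rows are made-up values for illustration only.

```python
import pandas as pd

# First two rows are quoted in the notes; the rest are made up for illustration.
df = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 2.0, 3.0, 4.0, 5.0],
    "Salary": [39343, 46205, 43000, 60000, 57000, 66000],
})

# Pearson correlation: close to +1 means a strong positive linear association,
# close to -1 a strong negative one, and close to 0 little linear association.
r = df["YearsExperience"].corr(df["Salary"])
print(round(r, 3))
```

A positive value of r here reflects the same pattern described in words: higher experience generally goes with higher salary, even though individual points (such as the third row) can go against the trend.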

• Linearity: Linearity refers to whether the relationship between
"YearsExperience" and "Salary" follows a straight line on a
graph. The scatter plot does not have to be a perfectly straight
line; we want to see whether there is a general trend or pattern.
• For example, if we plot the data points on a graph and they form
a roughly upward-sloping straight line, it suggests a linear
relationship. If they follow an upward-bending curve instead, the
relationship is increasing but non-linear.
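One rough way to check linearity numerically is to fit a straight line and see how much of the variation it explains. This is a sketch under assumptions, not the notes' own method: it uses NumPy's polyfit, and all values except the first two points are made up for illustration.

```python
import numpy as np

# First two points are quoted in the notes; the rest are made up for this sketch.
years = np.array([1.1, 1.3, 2.0, 3.0, 4.0, 5.0])
salary = np.array([39343, 46205, 43000, 60000, 57000, 66000], dtype=float)

# Fit salary = slope * years + intercept by least squares.
slope, intercept = np.polyfit(years, salary, 1)

# R^2: share of the variance in salary explained by the straight line.
predicted = slope * years + intercept
ss_res = np.sum((salary - predicted) ** 2)
ss_tot = np.sum((salary - salary.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r_squared:.3f}")
```

An R^2 near 1 is consistent with a roughly linear trend, but a scatter plot of the residuals is still worth inspecting, since a curved pattern can hide behind a fairly high R^2.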

• Outliers: Outliers are data points that deviate significantly
from the overall pattern. By examining the dataset, we can look for
any extreme values that are notably different from the majority
of data points.
• For example, if all data points have similar salaries for a
particular range of years of experience, but there is one data
point with an exceptionally high or low salary compared to
others, it may be considered an outlier.

• By considering these observations, we can draw inferences
about the relationship between "YearsExperience" and "Salary".
In the given dataset, we can infer that there is a positive
correlation between the two variables, meaning that as years of
experience increase, salaries tend to increase as well.
Additionally, the scatter plot may exhibit a roughly linear
relationship.

• Range = Maximum value - Minimum value

• Correlation coefficients: You can calculate the correlation
coefficient between each housing feature and the target
variable (house price). Features with higher absolute
correlation coefficients are more strongly linearly related to
the price and can be considered as predictors (independent
variables) for determining the price of the house.
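As an illustration, features can be ranked by the absolute value of their correlation with the price. The column names (area, bedrooms, age, price) and every value below are hypothetical, chosen only to make the sketch runnable.

```python
import pandas as pd

# Hypothetical housing data; names and values are assumptions for illustration.
houses = pd.DataFrame({
    "area": [50, 60, 80, 100, 120, 150],
    "bedrooms": [1, 2, 2, 3, 3, 4],
    "age": [30, 5, 20, 10, 25, 15],
    "price": [150, 180, 250, 290, 370, 450],
})

# Correlation of every feature with the target, excluding the target itself.
corr_with_price = houses.corr()["price"].drop("price")

# Rank by absolute value so strong negative correlations are not overlooked.
ranking = corr_with_price.abs().sort_values(ascending=False)
print(ranking)
```

A high correlation only indicates a linear association with the price; it does not show that the feature causes the price, and two highly ranked features may largely carry the same information.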

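• Z-score: The Z-score measures how many standard deviations a
value lies from the mean of its feature, and can be used to
identify outliers in the housing features. Outliers may have a
significant impact on the fitted relationship with the dependent
variable (house price), so checking Z-scores helps identify
unusually influential data points.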
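A minimal sketch of Z-score outlier detection on one hypothetical feature. The values are made up, and the cutoff of 2 is an assumption chosen for this small sample (3 is also commonly used on larger datasets).

```python
import pandas as pd

# Hypothetical "area" values with one deliberately extreme entry.
area = pd.Series([50, 55, 60, 62, 65, 68, 70, 72, 75, 300], name="area")

# Z-score: distance from the mean in units of the (sample) standard deviation.
z_scores = (area - area.mean()) / area.std()

# Assumed cutoff for this sketch; flag values whose |z| exceeds it.
threshold = 2.0
outliers = area[z_scores.abs() > threshold]
print(outliers.tolist())  # [300] for this sample
```

Because the mean and standard deviation are themselves pulled by extreme values, the Z-score can understate how unusual a point is in small samples.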

• IQR (interquartile range): The IQR, the difference between the
third and first quartiles (Q3 - Q1), can be used to detect and
handle outliers in the housing features. Removing outliers or
treating them separately can be important for accurately
determining which features affect the house price.
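The IQR rule can be sketched as follows, assuming the common convention of fences at 1.5 times the IQR beyond the quartiles. The feature values are hypothetical.

```python
import pandas as pd

# Hypothetical "area" values with one deliberately extreme entry.
area = pd.Series([50, 55, 60, 62, 65, 68, 70, 72, 75, 300], name="area")

# Quartiles and interquartile range.
q1 = area.quantile(0.25)
q3 = area.quantile(0.75)
iqr = q3 - q1

# Conventional fences: values beyond 1.5 * IQR from the quartiles are flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (area < lower) | (area > upper)

outliers = area[is_outlier]
filtered = area[~is_outlier]
print(outliers.tolist(), len(filtered))
```

Whether to drop the flagged rows, cap them, or keep them depends on whether they are data errors or genuine but unusual houses.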

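• Range of the Features: Analyzing the range (maximum minus
minimum) of each feature can provide insights into the
variability of the data. A larger range means the feature's values
are more spread out, but range depends on the unit of measurement
and is sensitive to outliers, so a large range alone does not show
a larger impact on the dependent variable.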
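Applying Range = Maximum value - Minimum value to every feature at once might look like this; the feature names and values are hypothetical.

```python
import pandas as pd

# Hypothetical housing features; names and values are made up for illustration.
features = pd.DataFrame({
    "area": [50, 60, 80, 100, 120, 150],
    "bedrooms": [1, 2, 2, 3, 3, 4],
})

# Column-wise range: maximum minus minimum for each feature.
ranges = features.max() - features.min()
print(ranges)
```

The two ranges are in different units, so they are not directly comparable; scaling the features first is one way to put them on a common footing.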

• It's worth noting that the above measures are used for
examining the relationship between the features and the
dependent variable (house price) and judging their
significance. Other considerations, such as domain knowledge
and feature importance analysis, can also play a role in selecting
the features (independent variables) for predicting house prices
accurately.
