MACHINE LEARNING:
Arthur Samuel, a pioneer of AI, defined ML as: 'The field of study that gives
computers the ability to learn without being explicitly programmed.'
1. Supervised Learning:
Used when labelled historical data is available to predict future events.
2. Unsupervised Learning:
Used when the historical data is not labelled. It is used to discover unknown
patterns in the data.
4. Reinforcement Learning:
Discovers which actions yield the maximum reward through trial and error.
Applications: Robotics, Gaming and Navigation.
Train the machine with known data so that it learns from it.
The machine classifies a new, unknown data point using the knowledge gained in the
previous step.
The model is evaluated on how accurately it classifies the unknown data.
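The three steps above (train, classify, evaluate) can be sketched in a few lines of plain Python. This is only an illustration: the nearest-centroid rule, the function names and the toy numbers are all assumptions made for the example, not part of the original notes.

```python
# Hypothetical toy data: (feature, label) pairs, values made up for illustration.
train = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
         (8.0, "high"), (9.0, "high"), (10.0, "high")]
test = [(1.2, "low"), (9.5, "high"), (2.5, "low"), (7.0, "high")]


def fit_centroids(samples):
    """Step 1: learn one mean value (centroid) per label from labelled data."""
    sums, counts = {}, {}
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Step 2: assign a new point to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def accuracy(centroids, samples):
    """Step 3: fraction of unseen points that are classified correctly."""
    correct = sum(1 for x, label in samples if predict(centroids, x) == label)
    return correct / len(samples)


centroids = fit_centroids(train)
print(centroids)
print(accuracy(centroids, test))
```

The test points are kept separate from the training points so that the accuracy reflects data the model has not seen.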
1. Classification:
Used to predict a discrete result
Eg: Whether an email is spam or not
Whether a transaction is fraudulent
2. Regression:
Used to predict continuous numeric results
Eg: Price of car
Delivery time
Credit limit
REGRESSION MODEL:
SIMPLE LINEAR REGRESSION:
Y = mX + C
m = Slope
C = Intercept
MULTIPLE LINEAR REGRESSION:
Y = m1X1 + m2X2 + m3X3 ……
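As a rough sketch of how the line Y = mX + C can be fitted, the ordinary least-squares formulas for the slope m and the intercept C are written out below in plain Python. The function name fit_line and the sample numbers are hypothetical, chosen so that the points lie exactly on Y = 2X + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c for one input variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: chosen so the line passes through the point of means.
    c = y_mean - m * x_mean
    return m, c


# Made-up points that lie on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)
```

With these points the fit recovers a slope of 2 and an intercept of 1. Real data will not lie exactly on a line, and the same formulas then give the best-fit line.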
ADVANTAGES OF LINEAR REGRESSION:
1. Linear regression is very easy to use. If the relationship between the
independent and dependent variables is known to be linear, the method can be
applied directly.
2. Linear regression produces a best-fit line, which can then be used for
prediction according to the business requirement.
3. It is easy to implement and interpret, and efficient to train.
4. It can overfit, but this is easily handled with techniques such as
regularization and cross-validation.
USES OF LIBRARIES:
Faster application development
Enhances code efficiency
Achieves code modularization
1. NUMPY:
Used for scientific computation, to gain a mathematical and structural understanding of the data.
2. PANDAS:
Data structures and data analysis.
3. MATPLOTLIB:
Plotting and visualization.
Eg: box plot, scatter plot, histogram.
4. SCIKIT-LEARN (sklearn):
Used to build predictive models.
PROJECT
STEP 1: Import all the libraries that are required for the program.
PLOTTING is of 2 types:
MATLAB-style and OBJECT ORIENTED
MATLAB-style: (matplotlib.pyplot). It is the simple way. Used for box, scatter and
histogram plots.
OBJECT ORIENTED: Used for more control and customization in the plot.
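The two plotting styles can be compared side by side in a short sketch. The data values and titles are made up, and the "Agg" backend is selected only so that the script runs without a display; in an interactive session that line can be dropped and plt.show() called instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# Made-up sample points.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

# MATLAB-style: pyplot keeps track of the "current" figure behind the scenes.
plt.figure()
plt.scatter(x, y)
plt.title("pyplot style")
plt.close()

# Object-oriented style: we hold the Figure and Axes objects ourselves,
# which gives finer control over each part of the plot.
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, color="tab:blue")
ax.set_title("object-oriented style")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

The object-oriented form is a little longer, but every setting is attached to a named Axes object, which helps once a figure holds several plots.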
STEP 2: Read the input data. Data is available in multiple formats (eg: 'json',
'csv', 'xlsx', ….)
CSV = COMMA SEPARATED VALUES
print(df.shape)
It gives the shape of the data, here the number of rows and columns.
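A minimal sketch of reading CSV data and checking its shape is shown below. The column names Date and Price and all values are hypothetical, and the text is read from an in-memory string only to keep the example self-contained; with a real file, its path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the project's input file.
csv_text = """Date,Price
01/01/2021,100.0
01/02/2021,102.5
02/01/2021,110.0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# shape is a (number of rows, number of columns) tuple.
print(df.shape)
```

pandas has similar readers for the other formats, such as pd.read_json and pd.read_excel.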
STEP 3: PREPROCESSING
Since the data does not have a uniform data format, this can be changed either in Excel or
Python.
df.info()
It gives a complete summary of the dataframe (our 2D data structure). It includes:
A list of all columns with their data types.
The number of non-null values in each column, etc.
From the above output it is clear that Date is stored as an object (text) and not
as a date type.
%m, %d, %Y: These are the format codes for month, day and four-digit year.
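One way to turn text dates into real dates is pd.to_datetime with a format string built from these codes. The sketch below assumes the dates are written as month/day/year separated by slashes; the column names and values are made up.

```python
import pandas as pd

# Hypothetical data with dates stored as text.
df = pd.DataFrame({
    "Date": ["01/01/2021", "01/02/2021", "02/01/2021"],
    "Price": [100.0, 102.5, 110.0],
})

# %m = month, %d = day of month, %Y = four-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# info() now reports a datetime type for the Date column.
df.info()
print(df["Date"].iloc[1])
```

Giving the format explicitly avoids ambiguity: here "01/02/2021" is read as 2 January, not 1 February.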
Since y = mx + c needs a numeric x, the dates are converted to numbers:
(df['Date']-df['Date'][0]) :
Here we add an extra column Date1 that holds the difference in days between
each date and the first date.
np.timedelta64(1,'D'):
The differences are still in a days (timedelta) format; we convert them to plain
numbers using this syntax.
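The date-to-number conversion might look like the sketch below. It assumes the conversion is done by dividing the differences by np.timedelta64(1, 'D'), which is the usual way this value is used; the dates themselves are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical dates, already parsed as datetimes.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["01/01/2021", "01/02/2021", "02/01/2021"], format="%m/%d/%Y"
    )
})

# Subtracting the first date gives a timedelta for every row.
delta = df["Date"] - df["Date"][0]

# Dividing by one day turns each timedelta into a plain number of days.
df["Date1"] = delta / np.timedelta64(1, "D")
print(df)
```

Note that the division gives floating-point values (0.0, 1.0, 31.0 here). If whole-number days are wanted, delta.dt.days returns integers.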
iloc:
It is a pandas indexer.
It helps us select specific rows or columns of the data set by integer position.
df.iloc[a,b]
Here a- Row position
b- Column position
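A few typical iloc selections are sketched below on a small made-up dataframe. Positions start at 0, and a colon selects everything along that axis.

```python
import pandas as pd

# Hypothetical dataframe with two columns.
df = pd.DataFrame({
    "Date1": [0.0, 1.0, 31.0],
    "Price": [100.0, 102.5, 110.0],
})

first_row = df.iloc[0]          # whole first row, returned as a Series
cell = df.iloc[1, 1]            # second row, second column
first_col = df.iloc[:, 0]       # all rows of the first column
top_two = df.iloc[0:2, :]       # first two rows, all columns

print(cell)
print(first_col.tolist())
print(top_two.shape)
```

Slices in iloc follow normal Python rules, so 0:2 covers positions 0 and 1 but not 2.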
reshape(-1, 1):
Used to change a one-dimensional array into a two-dimensional array with a single
column (-1 lets NumPy work out the number of rows). Only the shape changes, the data
itself does not.
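The effect of reshape(-1, 1) is easiest to see by printing the shape before and after. The array values here are made up.

```python
import numpy as np

# A one-dimensional array with three values.
x = np.array([0.0, 1.0, 31.0])
print(x.shape)

# -1 asks NumPy to infer the number of rows; 1 fixes a single column.
x_col = x.reshape(-1, 1)
print(x_col.shape)
print(x_col)
```

The shape goes from (3,) to (3, 1), which is the two-dimensional layout that scikit-learn expects for its input features.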
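fit(x, y):
Trains the model on the inputs x and the targets y.
{model.intercept_}:
The fitted intercept (c in y = mx + c).
{model.coef_}:
The fitted slope(s) (m in y = mx + c).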
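Putting the pieces together, a minimal scikit-learn sketch could look like this. The numbers are made up and lie exactly on y = 2x + 1, so the fitted values are easy to check; the project's real data would replace them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up inputs (as a single column) and targets on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(x, y)  # learn the slope and intercept from the data

print(f"Intercept: {model.intercept_}")
print(f"Slope: {model.coef_}")

# Predict y for a new x value; the input must again be two-dimensional.
pred = model.predict(np.array([[4.0]]))
print(pred)
```

model.coef_ is an array with one slope per input column, so with a single input it holds one value.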
1. Simplicity: A scatter diagram is a simple and non-mathematical method of studying the
correlation between two variables.
2. Easily understandable: It can be easily understood and interpreted. It enables us to know the
presence or absence of correlation at a single glance of the diagram.
3. Not affected by extreme items: It is not influenced by the size of extreme values, whereas most of
the mathematical methods lack this quality.
4. First step: It is the first step in investigating the relationship between two variables.