
Data Science with Python
Session 2
Instructor: Etibar Hüseynli

www.qss.az
QSS Analytics / Research and Development Center. All rights reserved.
Lesson 2: Summary
Topics to be covered

Topic 2: data types in Python, the NumPy and pandas packages, Git and GitHub

Case discussion: marketing campaign for a bank

Probability Distribution

Probability can be used for more than calculating the likelihood of one event; it can summarize the likelihood of all possible outcomes.

A quantity of interest in probability is called a random variable, and the relationship between each possible outcome of a random variable and its probability is called a probability distribution.

The structure and type of a probability distribution depend on the properties of the random variable, such as whether it is continuous or discrete. This, in turn, affects how the distribution can be summarized and how the most likely outcome and its probability are calculated.

Random Variable

A random variable is a quantity that is produced by a random process.

In probability, a random variable can take on one of many possible values, e.g. events from the state space. A specific value or set of values for a random variable can be assigned a probability.

A random variable can be either discrete or continuous.

Discrete Random Variable

Example: Coin toss

Discrete random variables take on a countable number of distinct values.
Consider an experiment where a coin is tossed three times. If X represents the number of times that the coin comes up heads, then X is a discrete random variable that can only have the values 0, 1, 2, 3 (from no heads in three successive coin tosses to all heads). No other value is possible for X.
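
The coin-toss example can be checked by listing every outcome. The sketch below assumes a fair coin, which the slide does not state; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of three tosses (fair coin assumed).
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of X, kept as an exact fraction.
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(heads_counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

Only the values 0, 1, 2 and 3 appear as keys, and their probabilities add up to 1.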


Continuous Random Variable

Example: Time

A continuous random variable is a random variable that can take any value within a range, so it has infinitely many possible values.
For example, a random variable measuring the time taken for something to be done is continuous, since there are an infinite number of possible times that can be taken.

Discrete Random Variable

The two types of discrete random variables most commonly used in machine learning are binary and categorical.
A binary random variable is a discrete random variable whose finite set of outcomes is {0, 1}.
A categorical random variable is a discrete random variable whose finite set of outcomes is {1, 2, …, K}, where K is the total number of unique outcomes.
Each outcome or event for a discrete random variable has a probability.

Probability of discrete random variable

Probability Distribution

A probability distribution is a summary of probabilities for the values of a random variable.
Important properties of a probability distribution are:
• the expected value (the average value of a random variable)
• the variance (the average squared distance of values from the expected value, a measure of spread)
• skewness (the asymmetry of the distribution)
• kurtosis (the heaviness of the tails)
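
As a rough illustration of these four properties, the sketch below computes them for the number of heads in three tosses of a coin. The fair-coin probabilities and the helper name `central_moment` are assumptions made for this example.

```python
# Probability of each number of heads in three tosses of a fair coin (assumed).
pmf = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}


def central_moment(pmf, mean, k):
    """Probability-weighted average of (x - mean) ** k."""
    return sum(p * (x - mean) ** k for x, p in pmf.items())


mean = sum(p * x for x, p in pmf.items())            # expected value
variance = central_moment(pmf, mean, 2)              # average squared deviation
std = variance ** 0.5
skewness = central_moment(pmf, mean, 3) / std ** 3   # 0 for a symmetric distribution
kurtosis = central_moment(pmf, mean, 4) / std ** 4   # 3 for a normal distribution

print(mean, variance, skewness, round(kurtosis, 4))
```

The distribution is symmetric, so the skewness comes out as zero, and the kurtosis is below the normal distribution's value of 3.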

Skewness

Kurtosis

Discrete random variable distribution

[Figure: bar chart of probability (vertical axis) against sum (horizontal axis), for sums from 2 to 12]

Continuous Probability Distributions

A continuous probability distribution summarizes the probability for a continuous random variable.

The probability density function, or PDF, defines the probability distribution for a continuous random variable.

• The probabilities of the heights of humans form a normal distribution.
• The probabilities of movies being a hit form a power-law distribution.
• The probabilities of income levels form a Pareto distribution.

Example: nine temperature readings grouped into one-degree bins

Temperature readings: 30.6, 31.4, 31.2, 32.1, 32.2, 32.7, 33.4, 33.8, 34.6

Bin        Frequency   Probability
30 – 31    1           11.1%
31 – 32    2           22.22%
32 – 33    3           33.3%
33 – 34    2           22.22%
34 – 35    1           11.1%

Panels in the original figure: Frequency Distribution with Bins, Probability of the Bins, Probability Density
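
The binning behind this temperature example can be sketched in a few lines. The one-degree bin width is read off the bin labels; dividing each bin probability by the bin width to get a density is the usual histogram convention and is an assumption about what the lost "Probability Density" panel showed.

```python
from collections import Counter

temperatures = [30.6, 31.4, 31.2, 32.1, 32.2, 32.7, 33.4, 33.8, 34.6]
bin_width = 1  # each bin spans one degree, e.g. 30 to 31

# Assign each reading to the bin that starts at its integer part.
frequencies = Counter(int(t) for t in temperatures)

for start in sorted(frequencies):
    count = frequencies[start]
    probability = count / len(temperatures)
    density = probability / bin_width  # probability per degree
    print(f"{start} to {start + bin_width}: count={count}, "
          f"probability={probability:.1%}, density={density:.3f}")
```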

Normal Distribution


Example of Normal Distribution

• Diastolic blood pressure (axis marks: 50, 82, 110)
• Manufacturing (axis marks: 94 mm, 100 mm, 106 mm)
• Arrival time at office (axis marks: 7:45 AM, 8:00 AM, 8:15 AM)

Positively Skewed Distribution

Computing Normal Probabilities

To compute normal probabilities, you first convert a normally distributed random variable, X, to a standardized normal random variable, Z, using the transformation formula $Z = (X - \mu) / \sigma$, where $\mu$ is the mean and $\sigma$ is the standard deviation of X.


Time to download a video is normally distributed, with a mean of 7 seconds and a standard deviation of 2 seconds. Therefore, a download time of 9 seconds is equivalent to 1 standardized unit (1 standard deviation) above the mean because $Z = (9 - 7) / 2 = +1$.


A download time of 1 second is equivalent to -3 standardized units (3 standard deviations below the mean) because $Z = (1 - 7) / 2 = -3$.


The standard deviation is the unit of measurement. In other words, a time of 9 seconds is 2 seconds (1 standard deviation) higher, or slower, than the mean time of 7 seconds. Similarly, a time of 1 second is 6 seconds (3 standard deviations) lower, or faster, than the mean time.


To further illustrate the transformation formula, suppose that another website has a download time for a video that is normally distributed, with a mean $\mu = 4$ seconds and a standard deviation $\sigma = 1$ second.


• Comparing these results with the previous ones, you see that a download time of 5 seconds is 1 standard deviation above the mean download time because $Z = (5 - 4) / 1 = +1$.

• A time of 1 second is 3 standard deviations below the mean download time because $Z = (1 - 4) / 1 = -3$.


With the Z value computed, you look up the normal probability using a table of values from the cumulative standardized normal distribution. Suppose you wanted to find the probability that the download time for the first example is less than 9 seconds. Recall that transforming to standardized Z units, given a mean of 7 seconds and a standard deviation of 2 seconds, leads to a Z value of +1.00.


With this value, you use the table to find the cumulative area under the normal curve less than (to the left of) Z = +1.00. To read the probability, or area under the curve, less than Z = +1.00, you scan down the Z column in the table until you locate the Z value of interest (in tenths) in the Z row for 1.0. Reading across this row to the column for the hundredths digit (.00) gives the probability 0.8413.


However, for the other website, you see that a time of 5 seconds is
1 standardized unit above the mean time of 4 seconds. Thus, the
probability that the download time will be less than 5 seconds is also
0.8413.
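
The table lookups above can be cross-checked in code. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative, and the slides themselves work from a printed table.

```python
from statistics import NormalDist

# First website: mean 7 seconds, standard deviation 2 seconds.
first_site = NormalDist(mu=7, sigma=2)
# Second website: mean 4 seconds, standard deviation 1 second.
second_site = NormalDist(mu=4, sigma=1)

# Transformation formula Z = (X - mean) / standard deviation.
z_first = (9 - first_site.mean) / first_site.stdev
z_second = (5 - second_site.mean) / second_site.stdev

p_first = first_site.cdf(9)         # P(X < 9) on the first website
p_second = second_site.cdf(5)       # P(X < 5) on the second website
p_standard = NormalDist().cdf(1.0)  # P(Z < 1) on the standard normal

print(z_first, z_second)
print(round(p_first, 4), round(p_second, 4), round(p_standard, 4))
```

All three probabilities agree because both download times sit exactly one standard deviation above their own mean.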

Challenge

• What is the probability that the video download time for the first website will be more than 9 seconds?
• What is the probability that the video download time for the first website will be under 7 seconds or over 9 seconds?
• What is the probability that the video download time for the first website will be between 5 and 9 seconds?

Golden rule

• For any normal distribution, 68.26% of the values fall within ±1 standard deviation of the mean. Thus, 68.26% of the download times are between 5 and 9 seconds.
• 95.44% of the values fall within ±2 standard deviations of the mean. Thus, 95.44% of the download times are between 3 and 11 seconds.
• 99.73% of the values fall within ±3 standard deviations of the mean. Thus, 99.73% of the download times are between 1 and 13 seconds.
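
These percentages can be reproduced for the download-time example (mean 7 seconds, standard deviation 2 seconds). The sketch below uses the standard library's `NormalDist`; its results differ from the slide in the last digit (68.27% and 95.45%) because the slide values come from a table rounded to four decimals.

```python
from statistics import NormalDist

download_time = NormalDist(mu=7, sigma=2)

coverage = {}
for k in (1, 2, 3):
    low = download_time.mean - k * download_time.stdev
    high = download_time.mean + k * download_time.stdev
    # Probability of falling between low and high.
    coverage[k] = download_time.cdf(high) - download_time.cdf(low)
    print(f"within {k} sd ({low:g} to {high:g} s): {coverage[k]:.2%}")
```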

Example

Therefore, it is unlikely (0.0027, or only 27 in 10,000) that a download time will be so fast or so slow that it will take under 1 second or more than 13 seconds. In general, you can use 6σ (that is, 3 standard deviations below the mean to 3 standard deviations above the mean) as a practical approximation of the range for normally distributed data.

Rule

• Approximately 68.26% of the values fall within ±1 standard deviation of the mean.
• Approximately 95.44% of the values fall within ±2 standard deviations of the mean.
• Approximately 99.73% of the values fall within ±3 standard deviations of the mean.

Example

How much time (in seconds) will elapse before the fastest 10% of the downloads of the video in the first example are complete?
Because 10% of the videos are expected to download in under X seconds, the area under the normal curve less than this value is 0.1000. Using the body of the table, you search for the area or probability of 0.1000. The closest result is 0.1003.


Working from this area to the margins of the table, you find that the Z value corresponding to the particular Z row (-1.2) and Z column (.08) is -1.28. Converting back with $X = \mu + Z\sigma$ gives $X = 7 + (-1.28)(2) = 4.44$ seconds.
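
The reverse lookup can also be sketched in code. Assuming the first website's mean of 7 seconds and standard deviation of 2 seconds, the block below converts the table value Z = -1.28 back to seconds and compares it with the exact 10th percentile from `NormalDist.inv_cdf`.

```python
from statistics import NormalDist

download_time = NormalDist(mu=7, sigma=2)

z_table = -1.28  # Z value read from the table (area 0.1003)
x_table = download_time.mean + z_table * download_time.stdev

z_exact = NormalDist().inv_cdf(0.10)   # Z with exactly 10% of the area below it
x_exact = download_time.inv_cdf(0.10)  # the same percentile, in seconds

print(round(x_table, 2), round(z_exact, 4), round(x_exact, 4))
```

The table-based answer and the exact percentile differ only in the third decimal place, which is the cost of rounding Z to two decimals.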

Tea break

Finding outliers

Outliers are stragglers (extremely high or extremely low values) in a data set that can throw off your stats. For example, if you were measuring children’s nose length, your average value might be thrown off if Pinocchio was in the class.

An outlier is a piece of data that is an abnormal distance from other points. In other words, it’s data that lies outside the other values in the set.

Outliers

In this set of numbers, 1 and 201 are outliers:
1, 99, 100, 101, 103, 109, 110, 201
“1” is an extremely low value and “201” is an extremely high value.

Outliers aren’t always that obvious. Let’s say you received the following paychecks last month:
$225, $250, $25, $235.
Your average paycheck is $183.75. But that small paycheck ($25) might be because you went on vacation, so a weekly paycheck average of $183.75 isn’t a true reflection of how much you earned. Your average is actually closer to $237 if you take the outlier ($25) out of the set.


Of course, trying to find outliers isn’t always that simple. Your data set may look like this:
61, 10, 32, 19, 22, 29, 36, 14, 49, 3.
You could take a guess that 3 might be an outlier and perhaps 61. But a guess is not a test: under the 1.5 × IQR rule used in the rest of this lesson, the limits for this data set are -19 and 69, so neither 3 nor 61 is flagged as an outlier.

Boxplot

A box and whiskers chart (boxplot) often shows outliers.

Finding outliers

A common and effective way to find outliers is to use the interquartile range (IQR). The IQR contains the middle half of your data, so outliers can be easily found once you know the IQR.

An outlier is defined as any data point that lies more than 1.5 IQRs below the first quartile (Q1) or above the third quartile (Q3) in a data set.

High = Q3 + 1.5 × IQR
Low = Q1 - 1.5 × IQR


Sample Question: Find the outliers for the following data set: 3, 10, 14, 22, 19, 29, 70, 49, 36, 32.

Step 1: Find Q1, Q3 and the IQR.
Step 2: Multiply the IQR by 1.5.
Step 3: Add the amount you found in Step 2 to Q3 from Step 1.
Step 4: Subtract the amount you found in Step 2 from Q1.
Step 5: Insert your low and high values into the data set.
Step 6: Highlight any number below the low value or above the high value you inserted in Step 5.

Finding IQR

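- Sort the data: 3, 10, 14, 19, 22, 29, 32, 36, 49, 70
- Find the median, which splits the data into two halves: (3, 10, 14, 19, 22 | 29, 32, 36, 49, 70)
- Find Q1, the median of the lower half: (3, 10, 14, 19, 22) = 14
- Find Q3, the median of the upper half: (29, 32, 36, 49, 70) = 36
- Find IQR = Q3 - Q1 = 36 - 14 = 22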

Finding outliers

Step 2: Multiply IQR by 1.5
22 * 1.5 = 33

Step 3: Add the amount you found in Step 2 to Q3 from Step 1
33 + 36 = 69
This is your upper limit.


Step 4: Subtract the amount you found in Step 2 from Q1 from Step 1:
14 - 33 = -19.
This is your lower limit.

Step 5: Insert your low and high values into your data set, in order:
-19, 3, 10, 14, 19, 22, 29, 32, 36, 49, 69, 70

Step 6: Highlight any number below the low value or above the high value:
-19, 3, 10, 14, 19, 22, 29, 32, 36, 49, 69, [70]
Only 70 lies outside the limits, so 70 is the only outlier.
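
The six steps can be sketched as a short script. It follows the slides' convention of taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data; other quartile conventions (for example the default interpolation in `numpy.percentile`) can give slightly different limits, so treat this as one possible implementation.

```python
from statistics import median

data = [3, 10, 14, 22, 19, 29, 70, 49, 36, 32]

# Step 1: sort, split at the median, and take the median of each half.
values = sorted(data)
half = len(values) // 2
lower_half = values[:half]
upper_half = values[half + len(values) % 2:]  # skip the middle value if n is odd

q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1

# Steps 2 to 4: compute the lower and upper limits.
low = q1 - 1.5 * iqr
high = q3 + 1.5 * iqr

# Steps 5 and 6: flag everything outside the limits.
outliers = [x for x in values if x < low or x > high]

print(q1, q3, iqr, low, high, outliers)
```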

NumPy

Using NumPy, a developer can perform the following operations:
• Mathematical and logical operations on arrays.
• Fourier transforms and routines for shape manipulation.
• Operations related to linear algebra. NumPy has built-in functions for linear algebra and random number generation.
NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib (plotting library). This combination is widely used as a replacement for MATLAB, a popular platform for technical computing.


NumPy is suited to many applications:


• Image processing
• Signal processing
• Linear algebra
• A plethora of others

NumPy: Arrays

An array in NumPy is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, the number of dimensions of the array is called the rank of the array. A tuple of integers giving the size of the array along each dimension is known as the shape of the array. The array class in NumPy is called ndarray. Elements in NumPy arrays are accessed by using square brackets, and arrays can be initialized from nested Python lists.
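
A small sketch of these terms in code, assuming NumPy is installed; the array contents are made up for illustration.

```python
import numpy as np

# Build a 2-D array from nested Python lists.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(type(matrix))  # the ndarray class
print(matrix.ndim)   # number of dimensions (the "rank" described above)
print(matrix.shape)  # size along each dimension
print(matrix.dtype)  # the single element type shared by all items

first_row = matrix[0]   # index with one integer
element = matrix[1, 2]  # index with a tuple of integers: row 1, column 2
```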


The key difference between an array and a list is that arrays are designed to handle vectorized operations, while a Python list is not.

That means that if you apply a function, it is performed on every item in the array, rather than on the whole array object.

Another characteristic is that, once a NumPy array is created, you cannot increase its size. To do so, you will have to create a new array. Extending the size, by contrast, is natural in a list.

A NumPy array must have all items of the same data type, unlike lists. This is another significant difference.
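
The three differences can be seen side by side in a short sketch (NumPy assumed installed; the values are illustrative).

```python
import numpy as np

prices = [10, 20, 30]
price_array = np.array(prices)

# Vectorized: the multiplication is applied to every item of the array.
doubled_array = price_array * 2
# The same operator on a list repeats the list instead.
repeated_list = prices * 2

# np.append does not grow the array in place; it returns a new array.
extended = np.append(price_array, 40)

# Mixed input is converted to one common type (here, float).
mixed = np.array([1, 2.5, 3])

print(doubled_array, repeated_list, extended, mixed.dtype)
```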

Pandas

This tool is essentially your data’s home. Through pandas, you get acquainted with your data by cleaning, transforming, and analyzing it.
For example, say you want to explore a dataset stored in a CSV on your computer. Pandas will extract the data from that CSV into a DataFrame (a table, basically), then let you do things like:
• Calculate statistics and answer questions about the data, like:
  • What's the average, median, max, or min of each column?
  • Does column A correlate with column B?
  • What does the distribution of data in column C look like?
• Clean the data by doing things like removing missing values and filtering rows or columns by some criteria
• Visualize the data with help from Matplotlib. Plot bars, lines, histograms, bubbles, and more.
• Store the cleaned, transformed data back into a CSV, other file or database


It takes data (like a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns called a data frame, which looks very similar to a table in statistical software (think Excel or SPSS, for example). Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.


When you want to use Pandas for data analysis, you’ll usually use it in one of three different ways:
• Convert a Python list, dictionary or NumPy array to a Pandas data frame
• Open a local file using Pandas, usually a CSV file, but it could also be a delimited text file (like TSV), Excel, etc.
• Open a remote file or database, like a CSV or a JSON on a website through a URL, or read from a SQL table/database


The general pattern for opening a file is pd.read_filetype(). There are different filetypes Pandas can work with, so you would replace “filetype” with the actual filetype (like CSV). You would give the path, filename, etc. inside the parentheses. Inside the parentheses you can also pass different arguments that relate to how to open the file. There are numerous arguments, and in order to know all of them you would have to read the documentation (for example, the documentation for pd.read_csv() contains all the arguments you can pass to this Pandas command).


In order to convert a certain Python object (dictionary, lists, etc.) the basic command is pd.DataFrame().

Inside the parentheses you would specify the object(s) you’re creating the data frame from.
You can also save a data frame you’re working with to different kinds of files (like CSV, Excel, JSON and SQL tables). The general code for that follows the pattern df.to_filetype(filename), for example df.to_csv(filename).
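
The round trip from a Python object to a data frame, out to CSV and back, might look like the sketch below. The column names and values are made up, and an in-memory `io.StringIO` buffer stands in for a file path so the example leaves nothing on disk.

```python
import io

import pandas as pd

# Hypothetical example data: a dictionary of column name -> values.
records = {"name": ["Ann", "Bob", "Cara"], "age": [34, 41, 29]}
df = pd.DataFrame(records)

# Save to CSV text. The buffer stands in for a file path here.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read the CSV text back into a new data frame.
buffer.seek(0)
df_from_csv = pd.read_csv(buffer)

print(df_from_csv)
```

With a real file you would pass a path instead of the buffer, for example `df.to_csv("people.csv", index=False)`, where the file name is a placeholder.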

Pandas: Data viewing

Viewing the data:
• Running the name of the data frame would give you the entire table, but you can also get the first n rows with df.head(n) or the last n rows with df.tail(n).
• df.shape would give you the number of rows and columns.
• df.info() would give you the index, datatype and memory information.
• The command s.value_counts(dropna=False) would allow you to view unique values and counts for a series (such as a single column).
• A very useful command is df.describe(), which outputs summary statistics for numerical columns.

Pandas: Statistics

• df.mean() returns the mean of each column (pass numeric_only=True if some columns are not numeric)
• df.corr() returns the correlation between columns in a data frame
• df.count() returns the number of non-null values in each data frame column
• df.max() returns the highest value in each column
• df.min() returns the lowest value in each column
• df.median() returns the median of each column
• df.std() returns the standard deviation of each column
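
A quick sketch of the viewing and statistics calls on a small, made-up data frame. The statistics are computed on the numeric columns only, which sidesteps errors from the text column.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "city": ["Baku", "Ganja", "Baku", "Sumqayit"],
    "sales": [120.0, 80.0, 100.0, 60.0],
    "visits": [12, 8, 10, 6],
})

print(df.head(2))                             # first two rows
print(df.shape)                               # (rows, columns)
print(df["city"].value_counts(dropna=False))  # counts per unique value
print(df.describe())                          # summary statistics, numeric columns

numeric = df[["sales", "visits"]]
means = numeric.mean()        # mean of each column
medians = numeric.median()    # median of each column
correlation = numeric.corr()  # pairwise correlation matrix
```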

Pandas: Data selection

Selection of Data
• You can select a single column with df[col], which returns the column with label col as a Series, or several columns with df[[col1, col2]], which returns them as a new DataFrame. You can select by position (s.iloc[0]) or by index label (s.loc['index_one']). In order to select the first row you can use df.iloc[0,:], and in order to select the first element of the first column you would run df.iloc[0,0].
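
The selection forms above, applied to a small data frame with made-up values and string index labels:

```python
import pandas as pd

# Hypothetical example data with string index labels.
df = pd.DataFrame(
    {"year": [1980, 1990, 2000], "score": [7.5, 8.1, 6.9]},
    index=["index_one", "index_two", "index_three"],
)

year_series = df["year"]        # one column -> Series
subset = df[["year", "score"]]  # list of columns -> DataFrame

first_by_position = year_series.iloc[0]        # by integer position
first_by_label = year_series.loc["index_one"]  # by index label

first_row = df.iloc[0, :]   # first row, all columns
first_cell = df.iloc[0, 0]  # first row, first column
```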

Pandas

You can use different conditions to filter rows. For example, df[df['year'] > 1984] would give you only the rows where the column year is greater than 1984. You can use & (and) or | (or) to combine conditions in your filtering, wrapping each condition in parentheses. This is also called boolean filtering.

It is possible to sort values by a certain column in ascending order using df.sort_values(col1), and in descending order using df.sort_values(col2, ascending=False). Furthermore, it’s possible to sort values by col1 in ascending order, then col2 in descending order, by using df.sort_values([col1, col2], ascending=[True, False]).
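
A minimal sketch of boolean filtering and sorting. The column names `title`, `year` and `rating` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [1980, 1990, 1990, 2001],
    "rating": [7.0, 8.5, 6.0, 9.0],
})

after_1984 = df[df["year"] > 1984]
# Each condition needs its own parentheses when combined with & or |.
recent_and_good = df[(df["year"] > 1984) & (df["rating"] >= 8.0)]

by_year = df.sort_values("year")
by_rating_desc = df.sort_values("rating", ascending=False)
by_year_then_rating = df.sort_values(["year", "rating"], ascending=[True, False])
```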

The last command in this section is groupby. It involves splitting the data into groups based on some criteria, applying a function to each group independently and combining the results into a data structure. df.groupby(col) returns a groupby object for values from one column, while df.groupby([col1,col2]) returns a groupby object for values from multiple columns.
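
The split, apply and combine steps can be seen on a small made-up table. Here the applied functions are `sum` and `mean`; any aggregation could be used in their place.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "product": ["x", "x", "y", "y"],
    "sales": [10, 20, 30, 40],
})

# Group by one column, then total the sales within each group.
by_region = df.groupby("region")["sales"].sum()

# Group by two columns, then average the sales within each pair.
by_region_product = df.groupby(["region", "product"])["sales"].mean()

print(by_region)
print(by_region_product)
```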

Pandas: Data cleaning

Check for missing values in the data by running pd.isnull(df) (or df.isnull()), which checks for null values and returns a boolean array (True for missing values and False for non-missing values).

In order to get the number of null/missing values in each column, run df.isnull().sum().

pd.notnull() is the opposite of pd.isnull().

After you have found the missing values you can get rid of them by using df.dropna() to drop the rows that contain them, or df.dropna(axis=1) to drop the columns.

A different approach would be to fill the missing values with other values by using df.fillna(x), which fills the missing values with x (you can put there whatever you want), or s.fillna(s.mean()) to replace all null values with the mean (mean can be replaced with almost any function from the statistics section).
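
A short sketch of the cleaning calls on a made-up data frame with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value in column "a".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

missing_mask = df.isnull()               # True where a value is missing
missing_per_column = df.isnull().sum()   # number of missing values per column

rows_dropped = df.dropna()               # drop rows with any missing value
columns_dropped = df.dropna(axis=1)      # drop columns with any missing value

filled_with_zero = df.fillna(0)                     # replace missing values with 0
filled_with_mean = df["a"].fillna(df["a"].mean())   # replace with the column mean
```

Whether dropping or filling is the better choice depends on how much data is missing and why; the code only shows the mechanics.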

Pandas: join and combine

The last set of basic Pandas commands are for joining or combining data frames or rows/columns. The three commands are:
• pd.concat([df1, df2]): add the rows of df2 to the end of df1 (columns should be identical). Older tutorials use df1.append(df2) for this, but DataFrame.append was removed in pandas 2.0.
• pd.concat([df1, df2], axis=1): add the columns of df2 next to the columns of df1 (rows should be identical)
• df1.join(df2, on=col1, how='inner'): SQL-style join of the columns in df1 with the columns in df2, where the values of col1 in df1 match the index of df2. how can be equal to one of: 'left', 'right', 'outer', 'inner'
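
The three ways of combining data frames, sketched on small made-up tables. Row stacking uses `pd.concat`, which works across pandas versions; the frame and column names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["c", "d"], "x": [3, 4]})

# Stack rows: df2 goes under df1.
stacked = pd.concat([df1, df2], ignore_index=True)

# Place columns side by side (rows are matched on the index).
extra = pd.DataFrame({"y": [10, 20]})
side_by_side = pd.concat([df1, extra], axis=1)

# SQL-style join: values of df1["key"] are matched against the index of lookup.
lookup = pd.DataFrame({"label": ["alpha", "gamma"]}, index=["a", "c"])
joined = df1.join(lookup, on="key", how="inner")

print(stacked, side_by_side, joined, sep="\n\n")
```

With `how="inner"`, only the row whose key appears in both frames survives; switching to `how="left"` would keep every row of `df1` and leave the unmatched label empty.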

Tea break

Matplotlib

#pip install matplotlib

Scatter plot

Clustering

Density Plot

GitHub

www.github.com
• Largest web-based Git repository hosting service
• Allows code collaboration
• Allows open source projects and documentation

Git

• Git is a distributed revision control and source code management system.
• A version control system (VCS) is software that helps software developers work together and maintain a complete history of their work.
• Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people.
• Git helps you keep track of the changes you make to your code.


Install Git
• https://git-scm.com/downloads

Create a GitHub account
• https://github.com/


• A system that keeps records of your changes
• Allows for collaborative development
• Allows you to know who made what changes and when
• Allows you to revert changes

Common Git commands: git config, git init, git clone, git add, git commit, git diff, git status, git remote, git push, git pull


• git config
Usage: git config --global user.name "[name]"
Usage: git config --global user.email "[email address]"


• git init
This command is used to start a new repository.
• git clone
This command is used to obtain a repository from an existing URL.


• git remote
Usage: git remote add [variable name] [Remote Server Link]
This command is used to connect your local repository to the remote server.

• git push
Usage: git push [variable name] master
This command sends the committed changes of master branch to your
remote repository.

THANK YOU FOR COMING!
