Data Science

What we have learned so far.


• We started with Vlookup, if-else, and loops.

• Then we moved on to business statistics, under which we learned descriptive and
inferential statistics, types of data, mean, median, mode, and standard
deviation.

• We learned about hypothesis testing, along with the t-statistic and the p-value.

• Linear, multiple, and logistic regression.

• In R we learned to use lm(), data manipulation, joins, and libraries such as dplyr.

Use of Vlookup and if-else in Excel.

Vlookup is an Excel function that retrieves data from a table organized vertically.
Lookup values must appear in the first column of the table passed into
Vlookup. Vlookup supports approximate and exact matching, and
wildcards (* ?) for partial matches. Ex: looking up all of a person's
information using just his/her name.
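
Since these notes also cover R, the exact-match lookup described above can be sketched with base R's match(). This is only a rough analogue, not how Vlookup itself is implemented; the table, names, and values are made up for illustration.

```r
# Hypothetical lookup table; the lookup key (name) is in the first column,
# mirroring the Vlookup requirement described above.
people <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  city   = c("Pune", "Leeds", "Taipei"),
  salary = c(30000, 45000, 60000)
)

# Exact-match lookup: match() returns the position of the first exact
# match of the lookup value, or NA if there is none.
row_index <- match("Ben", people$name)

# Pull every column for the matched person.
ben_info <- people[row_index, ]
```

Here ben_info holds the full row for the matched name, which is the "get all the information of a person" case from the example.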

We use if-else when we want to impose a condition on the data, or to
extract only the records that satisfy a particular condition. Ex: if we
only want information on people whose salary is between 10000 and 50000
(10000 < salary < 50000).

Descriptive statistics provide simple summaries of a sample and its
measures. They only describe the data and do not, on their own, help in prediction. Here we can
use the mean, median, and mode to see where most of our data lies, and the
standard deviation to see how spread out it is.
Ex: if we have a sample of 100 students and want to know their most common
favourite subject (the mode).
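
As a minimal sketch, these summaries can be computed in R as follows. The scores and subjects are made-up data, and stat_mode() is a hypothetical helper written for this example, since base R has no built-in function for the statistical mode.

```r
# Hypothetical sample: exam scores for ten students (made-up numbers).
scores <- c(55, 62, 62, 70, 71, 75, 78, 80, 85, 92)

mean_score   <- mean(scores)    # arithmetic average
median_score <- median(scores)  # middle value of the sorted data
sd_score     <- sd(scores)      # sample standard deviation (n - 1 denominator)

# Hypothetical helper: returns the most frequent value as a string.
# Ties resolve to the first value in sorted order.
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Hypothetical favourite subjects, echoing the student example above.
subjects  <- c("Maths", "Science", "Maths", "History", "Maths", "Science")
favourite <- stat_mode(subjects)
```

The mode is the only one of these measures that applies to categorical data such as a favourite subject; the others need numeric data.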

Inferential statistics is used when we want to draw conclusions about a
whole population from a sample, for example by comparing two samples.
Ex: take sample data from a small number of people and try to determine
whether the data indicates that the medicine will work for everyone (i.e. the
population).
In inferential statistics we make use of hypothesis testing. In the above
example, the null hypothesis would be that the medicine has no effect in the
population, and the alternative hypothesis would be that the medicine does
have an effect. We reject the null hypothesis only if the sample data provide
strong enough evidence against it, which is typically judged by the p-value.
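
A hedged sketch of such a test in R, using t.test() from base R to get the t-statistic and p-value mentioned earlier in these notes. The recovery times are invented for illustration, and the two-group design (treated vs. placebo) is an assumption about how the medicine example might be set up.

```r
# Hypothetical recovery times in days (made-up numbers, not real trial data).
treated <- c(5.1, 4.8, 6.0, 5.5, 4.9, 5.2, 5.0, 4.7)
placebo <- c(6.2, 6.8, 5.9, 7.1, 6.5, 6.0, 6.9, 6.4)

# Two-sample t-test (t.test() uses Welch's version by default).
# H0: both groups have the same mean; H1: the means differ.
result <- t.test(treated, placebo)

t_stat  <- unname(result$statistic)  # the t-statistic
p_value <- result$p.value            # the p-value

# Conventional 5% significance level; the threshold is a choice, not a rule.
alpha <- 0.05
reject_null <- p_value < alpha
```

If reject_null is TRUE, the sample gives evidence that the two groups differ. It does not prove that the medicine works for every individual.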

Linear regression is used when we want to predict the value of a
dependent variable from the value of a single independent variable. Ex:
predicting social media usage based on whether or not
there is a lockdown in the city.

Multiple regression is used when we have several independent variables
to predict our dependent variable. Ex: if we want to explain
rising coronavirus cases, the independent variables could be travelling,
not wearing masks, lack of social distancing, etc.

In R we learned about the linear regression model (fitted with lm()), which
helps us predict our dependent variable. Ex: if we want to estimate the price of
an apartment on the basis of area, locality, etc.
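
The apartment example can be sketched with lm() as below. The data frame, its column names, and all numbers are hypothetical, chosen only to show the difference between one predictor and several.

```r
# Hypothetical apartment data (made-up numbers for illustration only).
apartments <- data.frame(
  area     = c(50, 65, 80, 95, 110, 60, 75, 90),
  locality = factor(c("suburb", "suburb", "suburb", "suburb",
                      "centre", "centre", "centre", "centre")),
  price    = c(100, 128, 161, 190, 262, 159, 193, 221)
)

# Simple linear regression: one independent variable.
simple_fit <- lm(price ~ area, data = apartments)

# Multiple regression: several independent variables.
multi_fit <- lm(price ~ area + locality, data = apartments)

# Predict the price of a new, hypothetical apartment.
new_flat <- data.frame(
  area     = 85,
  locality = factor("centre", levels = levels(apartments$locality))
)
predicted_price <- predict(multi_fit, newdata = new_flat)
```

Calling summary(multi_fit) prints the estimated coefficients together with their t-statistics and p-values.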

We also learned about dplyr, which helps us manipulate our data. Ex: if we
want to filter our data based on some condition, such as keeping
only the rows that contain NA values.
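
A minimal sketch of this kind of filtering with dplyr's filter(), assuming the dplyr package is installed. The employees table is hypothetical, and the second filter reuses the salary range from the if-else example earlier in these notes.

```r
library(dplyr)

# Hypothetical employee data with some missing salaries.
employees <- data.frame(
  name   = c("Asha", "Ben", "Chen", "Dina"),
  salary = c(30000, NA, 60000, NA)
)

# Keep only the rows where salary is missing (NA).
missing_salary <- employees %>% filter(is.na(salary))

# Keep only the rows where salary lies strictly between 10000 and 50000.
# Rows with an NA salary are dropped because the condition evaluates to NA.
mid_salary <- employees %>% filter(salary > 10000, salary < 50000)
```

Note that filter() silently drops rows whose condition is NA, so missing values need an explicit is.na() check if they should be kept.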