
* To check current directory

getwd()

* 2 types of variables in statistics

1. Categorical variable (no arithmetic operations are possible, e.g. gender) (Factor in R)


2. Metric variable (arithmetic operations are possible, e.g. age) (num, int in R)
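
A minimal sketch of how the two types show up in R. The data frame `people` and its values are made up for illustration and are not part of the course data.

```r
# Hypothetical toy data: one categorical and one metric variable
people <- data.frame(
  gender = factor(c("F", "M", "F", "M")),  # categorical -> factor
  age    = c(23, 35, 41, 29)               # metric -> numeric
)

str(people)            # shows the type of each column
levels(people$gender)  # the categories of the factor
mean(people$age)       # arithmetic works on a metric variable
```

Calling mean() on the factor column instead returns NA with a warning, which is a quick way to spot a variable that was imported with the wrong type.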

* Descriptive statistics
summary(variablename)

* To get summary statistics for a particular region, e.g. Europe

* To store data in another .csv file, e.g. only the extracted Europe data
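
One possible way to do both steps, sketched on made-up data. The data frame name `vic` and the column `Under15` come from these notes; the `Region` and `Country` columns, the values, and the output file name are assumptions.

```r
# Hypothetical toy data standing in for the imported file
vic <- data.frame(
  Country = c("A", "B", "C", "D"),
  Region  = c("Europe", "Africa", "Europe", "Asia"),
  Under15 = c(15.2, 41.0, 17.8, 30.5)
)

# Keep only the rows for one region
vic_europe <- subset(vic, Region == "Europe")

# Summary statistics for that region only
summary(vic_europe$Under15)

# Store the extracted rows in a separate .csv file
out_file <- file.path(tempdir(), "vic_europe.csv")  # hypothetical output path
write.csv(vic_europe, out_file, row.names = FALSE)
```

row.names = FALSE keeps R from writing its row numbers as an extra first column.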

* To list the variables we created: ls()

* To perform analytics on a particular variable:

* To access a particular column variable (Under15) in an imported file

vic$Under15

* To compute the mean of Under15


mean(vic$Under15)

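* Standard deviation is used as a measure of risk


the higher the SD value, the higher the risk

sd()

* To find the min and max values

which.min(), which.max()

these return the position (row) at which the value occurs, which identifies the observation that has it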
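
A small sketch of these calls on made-up data. The `Country` column and all values are assumptions for illustration.

```r
# Hypothetical toy data
vic <- data.frame(
  Country = c("A", "B", "C", "D"),
  Under15 = c(15.2, 41.0, 17.8, 30.5)
)

sd(vic$Under15)  # spread of the values around the mean

i_min <- which.min(vic$Under15)  # row position of the smallest value
i_max <- which.max(vic$Under15)  # row position of the largest value

vic$Country[i_min]  # the country with the smallest Under15
vic$Country[i_max]  # the country with the largest Under15
```

min() and max() return the values themselves; which.min() and which.max() return where they occur, which is what lets you look up another column in the same row.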

------------

Install RStudio

tm - list all packages

Rcmdr -

QDAP -

gephi -

snowball -

------------

plot(x,y)

hist(x)

boxplot(x,y)

* To add axis labels
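
A sketch of the three plot types with a title and axis labels set through the standard `main`, `xlab` and `ylab` arguments. The vectors `x`, `y` and `g` are made-up data, and the boxplot uses the formula form (`y ~ g`) to draw one box per group, which is an assumption about the intended use.

```r
# Hypothetical toy data
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)
g <- factor(c("a", "a", "a", "b", "b", "b"))

# Draw to a null device so the sketch also runs without a display;
# leave this line (and dev.off()) out when working in RStudio
pdf(NULL)

plot(x, y, main = "y against x", xlab = "x (units)", ylab = "y (units)")
h <- hist(y, main = "Distribution of y", xlab = "y (units)")
boxplot(y ~ g, main = "y by group", xlab = "group", ylab = "y (units)")

dev.off()
```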

------------

* How to create a table

table() - tabular output works better for a categorical variable (for a metric variable, use a plot)
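
For example, on a made-up categorical variable (the name `gender` and its values are illustrative):

```r
# Hypothetical categorical variable
gender <- factor(c("F", "M", "F", "F", "M"))

counts <- table(gender)  # number of observations per category
counts
prop.table(counts)       # the same counts as proportions
```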

----------

Next: try a new data set, USDA

--------

Wine Test

formula

Model 1

y = b0+b1x+e

From summary(model1):

y = -3.4178 + 0.6351*AGST

A one-unit change in AGST changes the price by about 0.64 units.
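
A minimal sketch of fitting such a one-variable model with lm(). The numbers below are made up, so the fitted coefficients will not match the values quoted in these notes; the column names `AGST` and `Price` follow the notes.

```r
# Hypothetical toy data; the real wine data set is not reproduced here
wine <- data.frame(
  AGST  = c(15.0, 15.5, 16.0, 16.5, 17.0, 17.5),
  Price = c(6.1, 6.5, 6.7, 7.2, 7.3, 7.8)
)

# y = b0 + b1*x + e, with Price as y and AGST as x
model1 <- lm(Price ~ AGST, data = wine)

summary(model1)       # coefficients, significance stars, R-squared
b <- coef(model1)     # b["(Intercept)"] is b0, b["AGST"] is b1
b["AGST"]             # change in Price per one-unit change in AGST
```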

Model 2

The same, with all available independent variables.

The more stars a variable has in the summary, the more statistically significant its contribution.

So we can remove variables without stars, such as Age and FrancePop.

Model 3

R-squared has now also increased, which indicates a better model,

and every variable has stars (even Age), which shows that no unneeded variables remain.

To find the correlation among the variables:

Correlation lies between -1 and 1.

Close to -1 or 1: highly related


Close to 0.0: least related

Age and FrancePop have a correlation of -0.999, so they are highly related.


So we want to remove one of them, since the independent variables should not
depend on each other; they should relate to the dependent variable.

Age is more closely related to price than FrancePop, so we drop FrancePop.
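
A sketch of checking correlations with cor(). The values are made up so that `Age` and `FrancePop` move almost exactly in opposite directions; they are not the course data.

```r
# Hypothetical toy data in which two predictors are almost perfectly related
wine <- data.frame(
  Age       = c(31, 29, 27, 25, 23, 21),
  FrancePop = c(43.2, 44.0, 44.9, 45.7, 46.6, 47.4),
  Price     = c(7.5, 7.1, 7.0, 6.6, 6.3, 6.2)
)

cor(wine$Age, wine$FrancePop)  # correlation between two variables
round(cor(wine), 3)            # correlation matrix for all columns
```

The matrix form is convenient here because it shows, in one place, both how the predictors relate to each other and how each relates to Price.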

(Intercept) -3.4299802
AGST 0.6072093
HarvestRain -0.0039715
WinterRain 0.0010755
Age 0.0239308

Final Outcome:

Price = -3.4299802 + 0.6072093(AGST) - 0.0039715(HR) + 0.0010755(WR) + 0.0239308(Age)

Thus,
about 79% of the variance in price (from R-squared) is explained by AGST, HR, WR and Age.
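
As a worked example, the coefficients listed above can be applied by hand. The helper `predict_price` and the input values below are hypothetical and only illustrate the arithmetic.

```r
# Coefficients as listed in the notes
coefs <- c(
  intercept   = -3.4299802,
  AGST        =  0.6072093,
  HarvestRain = -0.0039715,
  WinterRain  =  0.0010755,
  Age         =  0.0239308
)

# Hypothetical helper: intercept plus each coefficient times its variable
predict_price <- function(AGST, HarvestRain, WinterRain, Age) {
  coefs[["intercept"]] +
    coefs[["AGST"]]        * AGST +
    coefs[["HarvestRain"]] * HarvestRain +
    coefs[["WinterRain"]]  * WinterRain +
    coefs[["Age"]]         * Age
}

# Made-up inputs, chosen only for illustration
p <- predict_price(AGST = 17, HarvestRain = 100, WinterRain = 600, Age = 10)
p
```

With a fitted model object, predict(model, newdata = ...) does the same calculation without copying coefficients by hand.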

---------------------

Day 2

Text Analytics

ggplot2 - for WordCloud -

To download FB data

Netvizz v1.44

---------------

Contact No. pachayappanvn@gmail.com


9894917049
