
Introduction to Analytics

There are four major stages in any analytics work:

• Stage 1: Descriptive Analytics


• Where data or information is gathered and
summarized. This stage usually answers
questions like “How many students dropped
out last year?”
• Stage 2: Diagnostic Analytics
• Where data is analysed and insights are
generated that help explain the answer
found in the first stage. Here the question
that comes up will be “Why has the dropout
rate increased over the last year?”
• Stage 3: Predictive Analytics
• With the help of the analyses done in the
previous two stages, this stage tries to answer
questions about events that have not happened
yet, like “Which students are most likely to
drop out?”
• Stage 4: Prescriptive Analytics
• Finally, the last stage tries to determine the type
of action required to support or
avoid the events predicted in
the previous stage. In this scenario,
prescriptive analytics has queries like “Which
students should I target to keep from dropping
out?”
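The first two stages can be sketched in a few lines of R. The data frame below is invented purely for illustration (the column names and values are assumptions, not real student records); it only shows how a descriptive count and a simple diagnostic comparison might look.

```r
# Hypothetical toy data, invented for illustration only
students <- data.frame(
  id         = 1:8,
  attendance = c(95, 60, 88, 45, 92, 55, 70, 98),  # percent of classes attended
  dropped    = c(0, 1, 0, 1, 0, 1, 0, 0)           # 1 = dropped out, 0 = stayed
)

# Stage 1 (descriptive): how many students dropped out?
n_dropped <- sum(students$dropped)

# Stage 2 (diagnostic): does average attendance differ between the two groups?
mean_attendance <- tapply(students$attendance, students$dropped, mean)

print(n_dropped)
print(mean_attendance)
```

The predictive and prescriptive stages would build on such summaries with a fitted model, which this sketch deliberately leaves out.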
• Banking:
• A large amount of customer data is generated
every day in banks. When dealing with
millions of customers on a regular basis,
it becomes hard to track their mortgages.
• Solution:
• A custom model built in R can keep track of
the loans provided to each individual customer,
which helps us decide the amount to be paid
by the customer over time.
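The slides do not say which model is meant. As one possible illustration, the sketch below uses the standard fixed-rate annuity formula to compute a monthly loan payment; the function name `monthly_payment` and the loan figures are assumptions chosen for the example.

```r
# Monthly payment on a fixed-rate loan (standard annuity formula).
# principal:   amount borrowed
# annual_rate: yearly interest rate as a fraction (0.06 = 6%)
# years:       loan term in years
monthly_payment <- function(principal, annual_rate, years) {
  r <- annual_rate / 12          # monthly interest rate
  n <- years * 12                # number of monthly payments
  if (r == 0) {
    return(principal / n)        # no interest: spread the principal evenly
  }
  principal * r / (1 - (1 + r)^(-n))
}

# Hypothetical loan: 200,000 at 6% per year over 30 years
payment <- monthly_payment(200000, 0.06, 30)
print(round(payment, 2))
```

A real banking model would also track balances, missed payments, and rate changes per customer; this only shows the core calculation.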
• Healthcare:
• Every year millions of people are admitted
to hospital, and billions are spent annually
on the admission process alone.
• Solution:
• Given the patient history and medical history,
a predictive model can be built to identify
who is at risk of hospitalization and to
what extent medical equipment should be scaled.
• Insurance:
• Insurance depends extensively on
forecasting. It is difficult to decide
which policies to accept or reject.
• Solution:
• By using continuous credit reports as input,
we can create a model in R that not only
assesses risk appetite but also makes
a predictive forecast.
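A predictive risk model of the kind described in the healthcare and insurance scenarios could, for example, be a logistic regression fitted with `glm()`. The sketch below uses simulated data; the variable names (`age`, `prior_admissions`, `hospitalized`) and the coefficients used to generate the data are assumptions made up for the example.

```r
set.seed(42)  # make the simulation reproducible

# Simulated (not real) patient data
n <- 200
age <- round(runif(n, 20, 90))
prior_admissions <- rpois(n, 1)

# Assumed true relationship, used only to generate example outcomes
risk <- plogis(-6 + 0.06 * age + 0.8 * prior_admissions)
hospitalized <- rbinom(n, 1, risk)
patients <- data.frame(age, prior_admissions, hospitalized)

# Fit a logistic regression: probability of hospitalization
model <- glm(hospitalized ~ age + prior_admissions,
             data = patients, family = binomial)

# Predicted risk for a hypothetical new patient
new_patient <- data.frame(age = 75, prior_admissions = 2)
p <- predict(model, newdata = new_patient, type = "response")
print(p)
```

The same pattern applies to an insurance setting by swapping in applicant features and an accept/reject or claim outcome.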
Now we know how data analytics helps organizations harness
their data and use it to identify new opportunities. When it
comes to the need for analytics in an organization, you will
come across these four aspects:
• Data visualization: Data visualization gives visual access to the large
amounts of data generated by analytics. The human mind processes
images and graphics much better than raw data.
It is always easier for us to understand a pie chart or a bar graph than
raw numbers.

R Vs Python: What’s the Difference?

• R and Python are both open-source
programming languages with large
communities. New libraries and tools are added
continuously to their respective catalogs. R is
mainly used for statistical analysis, while
Python provides a more general approach to
data science.
Introduction to R
• R is a free and powerful programming language
for statistical computing and data
visualization.
Why R
• R can be used to compute a large variety of classical
statistical tests, including:
• Student’s t-test comparing the means of two groups of
samples
• Wilcoxon test, a non-parametric alternative to the t-test
• Analysis of variance (ANOVA) comparing the means of
more than two groups
• Chi-square test comparing proportions/distributions
• Correlation analysis for evaluating the relationship
between two or more variables
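Each of the tests listed above has a function in base R. The sketch below runs them on data sets that ship with R (`sleep`, `iris`, `HairEyeColor`, `mtcars`); the choice of data sets is only for illustration, and `sleep` is treated here as two independent groups for simplicity.

```r
# Student's t-test: compare mean extra sleep between the two drug groups
t_res <- t.test(extra ~ group, data = sleep)

# Wilcoxon rank-sum test, a non-parametric alternative
# (exact = FALSE because the data contain ties)
w_res <- wilcox.test(extra ~ group, data = sleep, exact = FALSE)

# One-way ANOVA: compare mean sepal length across three species
aov_res <- aov(Sepal.Length ~ Species, data = iris)

# Chi-square test of independence between hair and eye colour
hair_eye <- margin.table(HairEyeColor, c(1, 2))
chi_res <- chisq.test(hair_eye)

# Correlation between car weight and fuel efficiency
cor_res <- cor.test(mtcars$wt, mtcars$mpg)

print(t_res$p.value)
print(w_res$p.value)
print(summary(aov_res))
print(chi_res$p.value)
print(cor_res$estimate)
```

Each result is an object whose components (such as `p.value` or `estimate`) can be inspected or reused in later code.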
• It is also possible to use R for performing
multivariate analyses such as:
• Principal component analysis
• Clustering
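Both techniques are available in base R. The sketch below applies `prcomp()` and `kmeans()` to the built-in `iris` measurements; choosing three clusters is an assumption based on the three species in that data set.

```r
# Use the four numeric measurement columns of the built-in iris data
measurements <- iris[, 1:4]

# Principal component analysis on standardized variables
pca <- prcomp(measurements, scale. = TRUE)
print(summary(pca))          # variance explained by each component

# k-means clustering into 3 groups (assumed number of clusters)
set.seed(1)                  # k-means starts from random centers
clusters <- kmeans(scale(measurements), centers = 3, nstart = 10)

# Compare the clusters found with the known species labels
print(table(clusters$cluster, iris$Species))
```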
• Many types of graphs can be drawn using R,
including: box plot, histogram, density curve,
scatter plot, line plot, bar plot, etc.
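The graph types named above can all be produced with base R graphics. The sketch below uses the built-in `mtcars` and `pressure` data sets; titles and axis labels are arbitrary choices for the example.

```r
# Box plot: fuel efficiency by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Histogram of fuel efficiency
h <- hist(mtcars$mpg, main = "Histogram of mpg", xlab = "Miles per gallon")

# Density curve of the same variable
plot(density(mtcars$mpg), main = "Density of mpg")

# Scatter plot: weight against fuel efficiency
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")

# Line plot: vapor pressure of mercury against temperature
plot(pressure$temperature, pressure$pressure, type = "l",
     xlab = "Temperature", ylab = "Pressure")

# Bar plot: number of cars per cylinder count
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")
```

In RStudio each plot appears in the Plots tab; when run as a script, R writes them to a file instead.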
• R is open source and completely free. R community
members regularly contribute packages to increase
R’s functionality.
• R has extensive statistical and graphing capabilities.
• R provides hundreds of built-in statistical functions as
well as its own built-in programming language.
• R is used in teaching and performing computational
statistics. It is the language of choice for many
academics who teach computational statistics.
• Getting help from the R user community is easy.
There are readily available online tutorials, data sets,
and discussion forums about R.
• R combines aspects of functional and object-
oriented programming.
• R can be used in interactive mode.
• It is an interpreted language rather than a
compiled one.
• Because code can be run one line at a time,
finding and fixing mistakes is typically much
easier in R than in many other languages.
What is CRAN?
• CRAN stands for the Comprehensive R Archive
Network. It provides the binary files for R. To
install R, download the binary for your platform,
follow the installation instructions, and accept
all defaults.
• Download from http://cran.r-project.org/
RStudio:
• RStudio is an Integrated Development
Environment (IDE) for the R language with a
more advanced and user-friendly GUI. RStudio
allows the user to run R in a more
user-friendly environment. It
is open source and free, and is available at
http://www.rstudio.com/.
• RStudio is a four-pane workspace for:
• 1) creating a file containing R script
• 2) typing R commands
• 3) viewing command histories
• 4) viewing plots and more
• Top-left panel: Code editor allowing you to create and open a
file containing R script. The R script is where you keep a
record of your work. An R script can be created as follows: File –>
New –> R Script.
• Bottom-left panel: R console for typing R commands
Top-right panel:
• Workspace tab: shows the list of R objects you created during
your R session
• History tab: shows the history of all previous commands
Bottom-right panel:
• Files tab: shows the files in your working directory
• Plots tab: shows the history of plots you created. From this tab,
you can export a plot to a PDF or an image file
• Packages tab: shows the external R packages available on your
system. If a package is checked, it is loaded in R.
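Ticking a box in the Packages tab corresponds to calling `library()` in the console. The sketch below loads `tools`, a package that ships with R but is not attached by default; the commented `install.packages()` line uses `ggplot2` only as an example name and needs an internet connection.

```r
# One-time download of an add-on package from CRAN (example name only):
# install.packages("ggplot2")

# Load a package for the current session (same effect as ticking its box)
library(tools)

# Packages currently attached to the session
attached <- search()
print("package:tools" %in% attached)

# Names of all packages installed on this system
installed <- rownames(installed.packages())
print(head(installed))
```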
Who uses R?
• Bank of America uses R for reporting.
• Mozilla, the foundation responsible for the
Firefox web browser, uses R to visualize Web
activity.
• Google uses R to predict economic activity.
• ANZ, the fourth largest bank in Australia, uses
R for credit risk analysis.
How R Works
• First, R is an interpreted language, not a compiled one,
meaning that all commands typed on the keyboard are executed
directly without the need to build a complete program as in
most computer languages (C, Fortran, Pascal, ...).
• Second, R's syntax is very simple and intuitive.
• A function always needs to be written with parentheses, even
if there is nothing within them (e.g., ls()).
• When R is running, variables, data, functions, results, etc. are
stored in the active memory of the computer in the form of
objects which have a name.
• The user can act on these objects with operators
(arithmetic, logical, comparison, ...) and functions (which
are themselves objects).
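The points above can be tried directly in the console. In this small sketch the object names (`n`, `greeting`, `square`) are arbitrary examples.

```r
# Create named objects; they are stored in memory for the session
n <- 15
greeting <- "hello"
square <- function(x) x^2   # a function is an object too

# ls() must be called with parentheses, even with no arguments
print(ls())

# Operators act on objects
print(n + 2)                # arithmetic
print(n > 10)               # comparison
print(n > 10 & n < 20)      # logical

# Functions act on objects, and are objects themselves
print(square(n))
print(class(square))
```

Typing `ls` without parentheses prints the function's definition instead of calling it, which is why the parentheses are always required.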
• R is available in several forms: the sources (written
mainly in C and some routines in Fortran), essentially
for Unix and Linux machines, or some pre-compiled
binaries for Windows, Linux, and Macintosh.
• The files needed to install R, either from the sources
or from the pre-compiled binaries, are distributed
from the internet site of the Comprehensive R Archive
Network (CRAN) where the instructions for the
installation are also available.
• The functions available to the user are stored in a
library located on the disk in a directory called
R_HOME/library (R_HOME is the directory where R
is installed).
• This directory contains packages of functions,
which are themselves structured in directories.
The package named base is in a way the core of R
and contains the basic functions of the language,
particularly for reading and manipulating data.
• Each package has a directory called R with a file
named like the package.
• For instance, for the package base, this is
the file R_HOME/library/base/R/base.
• This file contains all the functions of the
package.
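The directory layout described above can be inspected from within R. This sketch only uses base functions (`R.home()`, `file.path()`, `list.files()`, `system.file()`); the exact paths and file names printed depend on the installation and R version.

```r
# Directory where R is installed (R_HOME)
r_home <- R.home()
print(r_home)

# The library directory that holds the installed packages
lib_dir <- file.path(r_home, "library")
print(head(list.files(lib_dir)))

# Location of the base package on this system
base_dir <- system.file(package = "base")
print(base_dir)

# Contents of the package's R directory
print(list.files(file.path(base_dir, "R")))
```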
Active memory??
• In the description of how R works, active memory simply means the
computer's main memory (RAM), where R keeps its objects during a
session.
• The term is also used in computer architecture. With active memory,
computation that is typically handled by a
CPU is performed within the memory system. Performance is
improved and energy consumption reduced because processing is done
close to the data, without incurring the overhead of moving
the data across chip interconnects from memory to the
processor.
• A separate logic layer added to the memory stack holds
compute elements that operate directly on data in the memory
package itself. The benefits of this approach are two-fold. First,
the amount of data transferred from the memory chips to the
CPU can be reduced because computation can occur in the
logic layer.
