There are four major stages in any analytics work:
• Stage 1: Descriptive Analytics
  • Data or information is gathered and summarized. This stage usually answers questions like “How many students dropped out last year?”
• Stage 2: Diagnostic Analytics
  • Data is analysed and insights are generated that help explain the findings of the first stage. Here the question will be “Why has the dropout rate increased over the last year?”
• Stage 3: Predictive Analytics
  • Building on the analyses done in the previous two stages, this stage tries to anticipate events that have not happened yet, with questions like “Which students are most likely to drop out?”
• Stage 4: Prescriptive Analytics
  • Finally, the last stage analyses the type of action required to support or avoid the events predicted in the previous stage. In this scenario, prescriptive analytics asks questions like “Which students should I target to keep from dropping out?”

• Banking:
  • A large amount of customer data is generated every day in banks. With millions of customers to deal with on a regular basis, it becomes hard to track their mortgages.
  • Solution: A custom model can be built in R that tracks the loans provided to each individual customer, which helps decide the amount to be paid by the customer over time.
• Healthcare:
  • Every year millions of people are admitted to hospital, and billions are spent annually on the admission process alone.
  • Solution: Given the patient history and medical history, a predictive model can be built to identify who is at risk of hospitalization and to what extent the medical equipment should be scaled.
• Insurance:
  • Insurance depends extensively on forecasting. It is difficult to decide which policy to accept or reject.
  • Solution: Using the continuous credit report as input, we can create a model in R that not only assesses risk appetite but also produces a predictive forecast.

• Now we know how data analytics helps organizations harness their data and use it to identify new opportunities.
When we talk about the need for analytics in an organization, four aspects come up:
• Data visualization: Data visualization gives visual access to the huge amounts of data generated after analytics. The human mind processes visual images and graphics better than raw data. It is always easier for us to understand a pie chart or a bar graph than raw numbers.
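As a small illustration of the point about charts versus raw numbers, the sketch below answers the descriptive question from Stage 1 (“How many students dropped out last year?”) in R and then draws the counts as a bar graph. The `students` data frame, its column names, and all of its values are invented for this example; they are not real data.

```r
# Hypothetical enrolment records (made-up values, for illustration only)
students <- data.frame(
  year    = c(2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023),
  dropped = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Descriptive summary: number of dropouts per year
# (summing a logical vector counts its TRUE values)
dropouts_by_year <- tapply(students$dropped, students$year, sum)
print(dropouts_by_year)

# The same two numbers drawn as a bar graph
barplot(dropouts_by_year,
        main = "Dropouts per year (hypothetical data)",
        xlab = "Year",
        ylab = "Students who dropped out")
```

Comparing the two bars takes less effort than scanning the nine rows of the data frame, which is the effect described above.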
R vs Python: What’s the Difference?
• R and Python are both open-source programming languages with large communities. New libraries and tools are added continuously to their respective catalogs. R is mainly used for statistical analysis, while Python provides a more general approach to data science.

Introduction to R
• R is a free and powerful programming language for statistical computing and data visualization.

Why R
• R can be used to compute a large variety of classical statistical tests, including:
  • Student’s t-test, comparing the means of two groups of samples
  • Wilcoxon test, a non-parametric alternative to the t-test
  • Analysis of variance (ANOVA), comparing the means of more than two groups
  • Chi-square test, comparing proportions/distributions
  • Correlation analysis, for evaluating the relationship between two or more variables
• It is also possible to use R for exploratory multivariate analyses such as:
  • Principal component analysis
  • Clustering
• Many types of graphs can be drawn using R, including box plots, histograms, density curves, scatter plots, line plots, and bar plots.
• R is open source and completely free. R community members regularly contribute packages to increase R’s functionality.
• R has extensive statistical and graphing capabilities.
• R provides hundreds of built-in statistical functions as well as its own built-in programming language.
• R is used in teaching and performing computational statistics. It is the language of choice for many academics who teach computational statistics.
• Getting help from the R user community is easy. There are readily available online tutorials, data sets, and discussion forums about R.
• R combines aspects of functional and object-oriented programming.
• R can be used in interactive mode.
• It is an interpreted language rather than a compiled one.
• Finding and fixing mistakes is typically much easier in R than in many other languages.

What is CRAN?
• CRAN stands for the Comprehensive R Archive Network. It provides the binary files for installing R; follow the installation instructions and accept all defaults.
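The tests and analyses listed under “Why R” all ship with a standard R installation (the `stats` package), so nothing beyond the base download is needed to try them. The following is a minimal sketch that runs each one on data sets bundled with R (`sleep`, `PlantGrowth`, `mtcars`, `USArrests`). The choice of data sets, the variable names, and the number of clusters (3) are assumptions made for illustration only.

```r
# Student's t-test: mean extra sleep in two groups (built-in 'sleep' data)
t_res <- t.test(extra ~ group, data = sleep)

# Wilcoxon test: non-parametric alternative on the same data
# (exact = FALSE uses the normal approximation, because the data contain ties)
w_res <- wilcox.test(extra ~ group, data = sleep, exact = FALSE)

# ANOVA: mean plant weight across three groups (built-in 'PlantGrowth' data)
a_res <- aov(weight ~ group, data = PlantGrowth)

# Chi-square test: transmission type vs engine shape in 'mtcars'
c_res <- chisq.test(table(mtcars$am, mtcars$vs))

# Correlation between fuel economy and car weight
r <- cor(mtcars$mpg, mtcars$wt)

# Principal component analysis on standardized variables
pca <- prcomp(USArrests, scale. = TRUE)

# Hierarchical clustering, cut into three groups (3 is an arbitrary choice)
hc <- hclust(dist(scale(USArrests)))
groups <- cutree(hc, k = 3)

# Inspect a few of the results
print(t_res$p.value)
print(w_res$p.value)
print(summary(a_res))
print(c_res$p.value)
print(r)
print(summary(pca))
print(table(groups))
```

Each test returns an object whose components (for example `p.value`) can be inspected or reused in later steps, rather than only printed output.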
• Download R from http://cran.r-project.org/

RStudio:
• RStudio is an Integrated Development Environment (IDE) for the R language with an advanced, more user-friendly GUI. RStudio allows the user to run R in a more user-friendly environment. It is open source and free to use, and is available at http://www.rstudio.com/.
• RStudio is a four-pane workspace for:
  1) creating files containing R scripts
  2) typing R commands
  3) viewing command histories
  4) viewing plots and more
• Top-left panel: code editor that lets you create and open a file containing an R script. The R script is where you keep a record of your work. An R script can be created as follows: File –> New –> R Script.
• Bottom-left panel: R console for typing R commands.
• Top-right panel:
  • Workspace tab: shows the list of R objects you created during your R session
  • History tab: shows the history of all previous commands
• Bottom-right panel:
  • Files tab: shows the files in your working directory
  • Plots tab: shows the history of plots you created. From this tab, you can export a plot to a PDF or an image file.
  • Packages tab: shows the external R packages available on your system. If checked, the package is loaded in R.

Who uses R?
• Bank of America uses R for reporting.
• Mozilla, the foundation responsible for the Firefox web browser, uses R to visualize Web activity.
• Google uses R to predict economic activity.
• ANZ, the fourth largest bank in Australia, uses R for credit risk analysis.

How R Works
• First, R is an interpreted language, not a compiled one, meaning that all commands typed on the keyboard are executed directly, without the need to build a complete program as in most computer languages (C, Fortran, Pascal, ...).
• Second, R's syntax is very simple and intuitive.
• To be executed, a function always needs to be written with parentheses, even if there is nothing within them (e.g., ls()).
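The two points just made, that commands run one at a time as they are entered and that a function call needs parentheses, can be tried with a short sketch like the one below. The object names `x`, `msg`, and `y` are arbitrary choices for this example.

```r
# Each line is executed as soon as it is entered; no separate build step
x <- c(2, 4, 6)
msg <- "hello"

# ls() is called with empty parentheses and returns the names of the
# objects created so far
objs <- ls()
print(objs)

# Without parentheses, ls refers to the function itself and is not executed
print(class(ls))   # "function"

# A function call with an argument, applied to the result of an operator
y <- x * 2
print(mean(y))     # 8
```

Typing `ls` alone at the console prints the source of the function instead of the list of objects, which is a common surprise for newcomers.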
• When R is running, variables, data, functions, results, etc., are stored in the active memory of the computer in the form of objects, each of which has a name.
• The user can act on these objects with operators (arithmetic, logical, comparison, ...) and functions (which are themselves objects).
• R is available in several forms: as sources (written mainly in C, with some routines in Fortran), essentially for Unix and Linux machines, or as pre-compiled binaries for Windows, Linux, and Macintosh.
• The files needed to install R, either from the sources or from the pre-compiled binaries, are distributed from the website of the Comprehensive R Archive Network (CRAN), where the installation instructions are also available.
• The functions available to the user are stored in a library located on the disk in a directory called R_HOME/library (R_HOME is the directory where R is installed).
• This directory contains packages of functions, which are themselves structured in directories. The package named base is in a way the core of R and contains the basic functions of the language, particularly for reading and manipulating data.
• Each package has a directory called R with a file named after the package. For instance, for the package base, this is the file R_HOME/library/base/R/base. This file contains all the functions of the package.

Active memory?
• In the bullets above, "active memory" refers to the computer's main memory (RAM), where R keeps its objects during a session. The same term is also used for a memory architecture, described in the rest of this section.
• With active memory in this architectural sense, computation that is typically handled by a CPU is performed within the memory system. Performance is improved and energy use reduced because processing is done close to the data, without incurring the overhead of moving the data across chip interconnects from memory to the processor.
• A separate logic layer added to the memory stack holds compute elements that operate directly on data in the memory package itself. The benefits of this approach are two-fold.
First, the amount of data transferred from the memory chips to the CPU can be reduced because computation can occur in the logic layer.