You are on page 1of 63

R,

30 Open Technet
-
2012.07.26

R,

Excellence in Application & Data Governance

What is R?
R is a language and environment for statistical computing and graphics.

l Data analysis software


l A programming language
-

l An environment
-

, , , ,

l An open-source software project


-

Free, open, and active

l A community
-

contributors, 2

Excellence in Application & Data Governance

R : Open source analytics for the enterprise

Excellence in Application & Data Governance

R
l ,

: Revolution Analytics
Excellence in Application & Data Governance

R
l ,
tool

Excellence in Application & Data Governance

R
l , Google Facebook R

Google's R Style Guide


Google uses R for data exploration
and model prototyping, it is not
typically used in production: in Bos
group, R is typically run in a desktop
environment.
- Bo Cowgill, Google

Itamar conveyed how Facebooks


Data Team used R in 2007 to
answer two questions about new
users: (i) which data points predict
whether a user will stay? and (ii) if
they stay, which data points predict
how active theyll be after three
months?
- Itamar Rosenn, Facebook
http://www.dataspora.com/2009/02/predictive-analytics-using-r/
Excellence in Application & Data Governance

Vendor R
l Oracle, IBM Netezza, SAP HANA, Teradata in-memory
in-database R

Excellence in Application & Data Governance

Vendor R
l SAS SPSS R

SAS/JMP

Excellence in Application & Data Governance

SAS understands why R


A key benefit of R is that it provides near
instant availability of new and experimental
methods created by its user base without
waiting for the development/release cycle of
commercial software. SAS recognizes the
value of R to our customer base
Michael Gilliland, Product Marketing
Manager SAS Institute, Inc.

Excellence in Application & Data Governance

10

SPSS with R
l SPSS R ?
-
- , R
, . ( )

l SPSS Statistics 17( 18 ) R


Win-Win
-

SPSS, R
SPSS , , TEST ,
,
Tobit SPSS
, , ,
. SPSS
!!!
: RUserConf2011 - SPSS R , , SPSS Korea, 2011.10.28

Excellence in Application & Data Governance

11

Big Data R
l

: Revolution Analytics
Excellence in Application & Data Governance

12

R
l In-Memory Computing
- , H/W
l Object-oriented programming
- , object
- (class) & (method)
l Package
-
- , Help Examples
l Visualization
-
- Chart, Plot, MotionChart, Map R
Excellence in Application & Data Governance

13

History of R

: (, 2011)

l R 1993 2(Ross Ihaka,


Robert Gentleman)
l 1976 Bell Lab John Chambers, Rick Becker, Allan Wilks
S Language
Excellence in Application & Data Governance

14

The R Foundation (http://www.r-project.org)

l R Development Core Team


l R R Development Core Team

Excellence in Application & Data Governance

15

CRAN (The Comprehensive R Archive Network)


http://cran.r-project.org/

Korea :
http://cran.nexr.com/

l R CRAN Site
l 39 87 Mirror

Excellence in Application & Data Governance

16

R Package

R Package (http://cran.r-project.org/web/packages/)
R

l CRAN Site 3,938


(2012 7 20 )

l
IT

l Software Vendor
Version Up

Excellence in Application & Data Governance

17

R Package

CRAN Task View (http://cran.r-project.org/web/views/)

Excellence in Application & Data Governance

18

R Package

Crantastic! (http://crantastic.org)

Excellence in Application & Data Governance

19

R Visualization

ggplot2
l ggplot2 is a flexible, colorful, and dynamic graphics package:

Excellence in Application & Data Governance

20

R Visualization

Word Cloud
l R text mining Word Cloud

Excellence in Application & Data Governance

21

R Visualization

2-D contour map of Maunga Whau volcano

Excellence in Application & Data Governance

22

R Visualization

3-D contour map of Maunga Whau volcano


Make it 3-d:

Excellence in Application & Data Governance

23

R Visualization

googleVis
l R API

MotionChart Gap Miner


googleVis R

Hans Rosling: No more boring data: TEDTalks

http://code.google.com/p/google-motion-charts-with-r/
Excellence in Application & Data Governance

24

R Visualization

RgoogleMaps
l

# ...
library(RgoogleMaps)
#
tollgate_info <- read.csv(".csv")
#
map.center.loc <- c(36, 128)
#
input.zoom <- 7
map_data <- tollgate_info
#
win.graph()
mymap <- GetMap(center = map.center.loc,
zoom = input.zoom, maptype = "road", format
= "roadmap", destfile = "mymap.png")
PlotOnStaticMap(mymap, lat =
map_data$Y, lon = map_data$X,
destfile = "mymap.point.png", cex = 1, pch
=20, col="blue)
: R ,
Excellence in Application & Data Governance

25

R Visualization

animation
l R Graph animation
# animation
saveHTML({
for(map.i in 1:length(unique.name)) {
mymap <- GetMap(center =
map.center.loc, zoom = input.zoom, maptype
= "road", format = "roadmap", destfile =
"mymap.png")
PlotOnStaticMap(mymap, lat =
map_data[map_data$ ==
unique.name[map.i], ]$Y, lon =
map_data[map_data$ ==
unique.name[map.i], ]$X, destfile =
"mymap.point.png", cex = 1, pch =20,
col="blue")
}
}, img.name = "unif_plot", imgdir = "unif_dir",
htmlfile = "random.html",
autobrowse = FALSE, title = "
",
description = c("RgoogleMaps
.\n\n"))
: R ,
Excellence in Application & Data Governance

26

Useful R IDE

RGUI
l RGui , ,

Excellence in Application & Data Governance

27

Useful R IDE

R Studio (http://www.rstudio.org/)

Excellence in Application & Data Governance

28

Useful R IDE

R Studio

Excellence in Application & Data Governance

29

Useful R IDE

Red-R (http://www.red-r.org/)

Excellence in Application & Data Governance

30

Useful R IDE

Red-R

Excellence in Application & Data Governance

31

Useful R IDE

Rattle (http://rattle.togaware.com/), (install.packages("rattle"))

Excellence in Application & Data Governance

32

Useful R IDE

Rattle (library(rattle); rattle())

Excellence in Application & Data Governance

33

Useful R IDE

RExcel (http://rcom.univie.ac.at/)

Excellence in Application & Data Governance

34

Useful R IDE

RExcel

Excellence in Application & Data Governance

35

Useful R IDE

R Commander (install.packages("Rcmdr"), library(Rcmdr))

Excellence in Application & Data Governance

36

Useful R IDE

R Commander

Excellence in Application & Data Governance

37

Useful R IDE

Revolution R (http://www.revolutionanalytics.com/)

Excellence in Application & Data Governance

38

Useful R IDE

Revolution R

Excellence in Application & Data Governance

39

Useful R IDE

StatET for R (http://www.walware.de/goto/statet)

Excellence in Application & Data Governance

40

Useful R IDE

StatET for R

Excellence in Application & Data Governance

41

R
l http://www.r-project.org
l http://www.r-project.kr ( R - KRUG)
l http://www.r-bloggers.com
l http://stackoverflow.com
l http://stats.stackexchange.com
l http://www.inside-r.org/
l http://www.r-statistics.com/
l http://support.rstudio.org/
l http://quora.com
Excellence in Application & Data Governance

42

Excellence in Application & Data Governance

43

Learning R

: Revolution Analytics
Excellence in Application & Data Governance

44

R,

Excellence in Application & Data Governance

45

Advanced Analytics
l (Advanced Analytics) ,
() ,

Optimization
Predictive Modeling

egatnavda evititepmoC

Forecasting/extrapolation
Statistical Analysis

Whats the best that can happen?


What will happen next?
What if these trends continue?

Analytics

Why is this happening?


What actions are needed?

Alerts
Query/drill down

Where exactly is the problem?

Ad hoc Reports

How many, how often, where?

Standard Reports

What happened?

Access and
Reporting

Degree of Intelligence
: Davenport and Harris, Competing on Analytics, Harvard Business School Press (2007)
Excellence in Application & Data Governance

46

What is Big Data Analysis?

: SAS and IDC


Excellence in Application & Data Governance

47

What makes a good data scientist?


l Technical expertise( ): the best data scientists
typically have deep expertise in some scientific discipline
l Curiosity(): a desire to go beneath the surface and discover
and distill a problem down into a very clear set of hypotheses that
can be tested.
l Storytelling(): the ability to use data to tell a story and to
be able to communicate it effectively.
l Cleverness(): the ability to look at a problem
in different, creative ways.
* Patil, DJ (2011-09-22). Building Data Science Teams. OReilly Media
Excellence in Application & Data Governance

48

R
l Single Core
- CPU 1
- R 2.14 parallel

l In-Memory
-
-

Open Source
l Snow, multicore, parallel, bigmemory Multi-core

l

Excellence in Application & Data Governance

49

RHIPE (http://www.datadr.org/index.html)
l RHIPE(R and Hadoop Integrated Programming Environment)
Purdue Univ. Saptarshi Guha
R
l R Hadoop MapReduce
l Revolution Analytics RHadoop .
l Divide & Recombine (D&R) , not parallel processing.

Facebook RHIPE Video (2010.03.09)


http://www.lecturemaker.com/2011/02/rhipe/

Excellence in Application & Data Governance

50

RHadoop (https://github.com/RevolutionAnalytics/RHadoop)
l Hadoop Map-Reduce R Code
Hadoop

: Revolution Analytics
Excellence in Application & Data Governance

51

RHadoop (Word Count Example)

: R ,
Excellence in Application & Data Governance

52

RHive Motivation of Rhive (NexR)

: R and RHive in Data Scientists toolbox, (2011)


Excellence in Application & Data Governance

53

RHive (https://github.com/nexr/RHive)
l R Hive , R
Hive Hadoop

: R and RHive in Data Scientists toolbox, (2011)


Excellence in Application & Data Governance

54

RHive Example

: http://www.nexr.co.kr/nexrcorp_en/products_and_services/rhive_useCase.php
Excellence in Application & Data Governance

55

RevoScaleR
l RevoScaleR Single Core
l HPC Clusters/Cloud R

RevoScaleR on Sing Computer

RevoScaleR on Multiple Computers


) HPC Clusters, IBM LSF Cluster ,
2012 Linux Clusters
: Revolution Analytics

Excellence in Application & Data Governance

56

High-Performance Computing
(http://cran.r-project.org/web/views/HighPerformanceComputing.html)
l Parallel computing: Explicit parallelism Rmpi, snow, snowfall, foreach, doMPI
l Parallel computing: Implicit parallelism fork, multicore
l Parallel computing: Grid computing
l Parallel computing: Hadoop

GridR, xgrid, biocep-distrib


RHIPE, Rhadoop, EMR(Elastic Map Reduce)

l Parallel computing: Random numbers doRNG, rprng


l Parallel computing: Resource managers and batch schedulers batch, BatchJobs
l Parallel computing: Applications

caret, maanova, tm, bcp

l Parallel computing: GPUs

gputools, cudaBayesreg, OpenCL, WideLM

l Large memory and out-of-memory data bigmemory, biglm, ff, HaddpStreaming


l Easier interfaces for Compiled code

inline, Rcpp, rJava

l Profiling tools

profr, proftools

Excellence in Application & Data Governance

57

RevoDeployR (Architecture)
l RevoDeployR R Application

Excellence in Application & Data Governance

: Revolution Analytics

58

RevoDeployR (Sample App)


l DeployR HTML5 plot library from JingCharts

: Revolution Analytics
Excellence in Application & Data Governance

59


l ?
:
RDBMS (Scale-up) ,
.
Scale-out
?
l ?
: , SAS SPSS
.
.

?

Excellence in Application & Data Governance

60

LCBEx (Low Cost But Excellent)


l (Low
Cost) (Excellent)
Analytic Platform( )

Efficiency




, ,

Scalability

Agility




Hadoop


,



: R ,

Excellence in Application & Data Governance

61


l .
l ?
( , ) .
Data Scientist ( ) .

Slow and Steady Wins the Race.


Study,

study,

Excellence in Application & Data Governance

study, study.

study

62

DG
Tel : 010-4801-6609
email : leewow@gtone.co.kr

Excellence in Application & Data Governance

63

You might also like