You are on page 1of 5

R Reference Card for Data Mining Performance Evaluation kmeansCBI() interface function for kmeans (fpc)

Yanchang Zhao, RDataMining.com, February 6, 2014 performance() provide various measures for evaluating performance of pre- cluster.optimal() search for the optimal k-clustering of the dataset
diction and classification models (ROCR) (bayesclust)
- See the latest version at http://www.RDataMining.com PRcurve() precision-recall curves (DMwR) clara() Clustering Large Applications (cluster)
- The package names are in parentheses. CRchart() cumulative recall charts (DMwR) fanny(x,k,...) compute a fuzzy clustering of the data into k clusters (cluster)
- Recommended packages and functions are shown in bold. roc() build a ROC curve (pROC) kcca() k-centroids clustering (flexclust)
- Click a package in this PDF file to find it on CRAN. auc() compute the area under the ROC curve (pROC) ccfkms() clustering with Conjugate Convex Functions (cba)
ROC() draw a ROC curve (DiagnosisMed) apcluster() affinity propagation clustering for a given similarity matrix (ap-
Packages cluster)
Association Rules and Sequential Patterns party recursive partitioning
apclusterK() affinity propagation clustering to get K clusters (apcluster)
cclust() Convex Clustering, incl. k-means and two other clustering algorithms
Functions rpart recursive partitioning and regression trees
(cclust)
randomForest classification and regression based on a forest of trees using ran-
apriori() mine associations with APRIORI algorithm – a level-wise, KMeansSparseCluster() sparse k-means clustering (sparcl)
dom inputs
breadth-first algorithm which counts transactions to find frequent item- tclust(x,k,alpha,...) trimmed k-means with which a proportion alpha of
ROCR visualize the performance of scoring classifiers
sets (arules) observations may be trimmed (tclust)
caret classification and regression models
eclat() mine frequent itemsets with the Eclat algorithm, which employs
r1071 functions for latent class analysis, short time Fourier transform, fuzzy
equivalence classes, depth-first search and set intersection instead of
clustering, support vector machines, shortest path computation, bagged cluster- Hierarchical Clustering
counting (arules) a hierarchical decomposition of data in either bottom-up (agglomerative) or top-
ing, naive Bayes classifier, . . .
cspade() mine frequent sequential patterns with the cSPADE algorithm (aru- down (divisive) way
rpartOrdinal ordinal classification trees, deriving a classification tree when the
lesSequences) hclust() hierarchical cluster analysis on a set of dissimilarities
response to be predicted is ordinal
seqefsub() search for frequent subsequences (TraMineR) birch() the BIRCH algorithm that clusters very large data with a CF-tree (birch)
rpart.plot plots rpart models
Packages pROC display and analyze ROC curves pvclust() hierarchical clustering with p-values via multi-scale bootstrap re-
arules mine frequent itemsets, maximal frequent itemsets, closed frequent item- nnet feed-forward neural networks and multinomial log-linear models sampling (pvclust)
sets and association rules. It includes two algorithms, Apriori and Eclat. RSNNS neural networks in R using the Stuttgart Neural Network Simulator agnes() agglomerative hierarchical clustering (cluster)
arulesViz visualizing association rules (SNNS) diana() divisive hierarchical clustering (cluster)
arulesSequences add-on for arules to handle and mine frequent sequences neuralnet training of neural networks using backpropagation, resilient backprop- mona() divisive hierarchical clustering of a dataset with binary variables only
TraMineR mining, describing and visualizing sequences of states or events agation with or without weight backtracking (cluster)
rockCluster() cluster a data matrix using the Rock algorithm (cba)
Classification & Prediction Regression proximus() cluster the rows of a logical matrix using the Proximus algorithm
Decision Trees Functions (cba)
ctree() conditional inference trees, recursive partitioning for continuous, cen- isopam() Isopam clustering algorithm (isopam)
lm() linear regression
sored, ordered, nominal and multivariate response variables in a condi- flashClust() optimal hierarchical clustering (flashClust)
glm() generalized linear regression
tional inference framework (party) fastcluster() fast hierarchical clustering (fastcluster)
gbm() generalized boosted regression models (gbm)
rpart() recursive partitioning and regression trees (rpart) cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchical clus-
predict() predict with models
mob() model-based recursive partitioning, yielding a tree with fitted models as- tering dendrograms (dynamicTreeCut)
residuals() residuals, the difference between observed values and fitted val-
sociated with each terminal node (party) HierarchicalSparseCluster() hierarchical sparse clustering (sparcl)
ues
Random Forest nls() non-linear regression
gls() fit a linear model using generalized least squares (nlme) Model based Clustering
cforest() random forest and bagging ensemble (party)
gnls() fit a nonlinear model using generalized least squares (nlme) Mclust() model-based clustering (mclust)
randomForest() random forest (randomForest)
Packages HDDC() a model-based method for high dimensional data clustering (HDclassif )
importance() variable importance (randomForest)
fixmahal() Mahalanobis Fixed Point Clustering (fpc)
varimp() variable importance (party) nlme linear and nonlinear mixed effects models
fixreg() Regression Fixed Point Clustering (fpc)
Neural Networks gbm generalized boosted regression models
mergenormals() clustering by merging Gaussian mixture components (fpc)
nnet() fit single-hidden-layer neural network (nnet) Clustering Density based Clustering
mlp(), dlvq(), rbf(), rbfDDA(), elman(), jordan(), som(),
Partitioning based Clustering generate clusters by connecting dense regions
art1(), art2(), artmap(), assoz()
partition the data into k groups first and then try to improve the quality of clus- dbscan(data,eps,MinPts,...) generate a density based clustering of
various types of neural networks (RSNNS)
tering by moving objects from one group to another arbitrary shapes, with neighborhood radius set as eps and density thresh-
neuralnet training of neural networks (neuralnet)
kmeans() perform k-means clustering on a data matrix old as MinPts (fpc)
Support Vector Machine (SVM) pdfCluster() clustering via kernel density estimation (pdfCluster)
kmeansruns() call kmeans for the k-means clustering method and includes
svm() train a support vector machine for regression, classification or density-
estimation of the number of clusters and finding an optimal solution from
estimation (e1071) Other Clustering Techniques
several starting points (fpc)
ksvm() support vector machines (kernlab)
pam() the Partitioning Around Medoids (PAM) clustering method (cluster) mixer() random graph clustering (mixer)
Bayes Classifiers pamk() the Partitioning Around Medoids (PAM) clustering method with esti- nncluster() fast clustering with restarted minimum spanning tree (nnclust)
naiveBayes() naive Bayes classifier (e1071) mation of number of clusters (fpc) orclus() ORCLUS subspace clustering (orclus)

1
Plotting Clustering Solutions Packages Frequent Terms and Association
plotcluster() visualisation of a clustering or grouping in data (fpc) Rlof a parallel implementation of the LOF algorithm findAssocs() find associations in a term-document matrix (tm)
bannerplot() a horizontal barplot visualizing a hierarchical clustering (cluster) extremevalues detect extreme values in one-dimensional data findFreqTerms() find frequent terms in a term-document matrix (tm)
Cluster Validation mvoutlier multivariate outlier detection based on robust methods termFreq() generate a term frequency vector from a text document (tm)
outliers some tests commonly used for identifying outliers Topic Modelling
silhouette() compute or extract silhouette information (cluster)
cluster.stats() compute several cluster validity statistics from a clustering Time Series Analysis LDA() fit a LDA (latent Dirichlet allocation) model (topicmodels)
and a dissimilarity matrix (fpc) CTM() fit a CTM (correlated topics model) model (topicmodels)
clValid() calculate validation measures for a given set of clustering algorithms Construction & Plot terms() extract the most likely terms for each topic (topicmodels)
and number of clusters (clValid) ts() create time-series objects topics() extract the most likely topics for each document (topicmodels)
clustIndex() calculate the values of several clustering indexes, which can be plot.ts() plot time-series objects Sentiment Analysis
independently used to determine the number of clusters existing in a data smoothts() time series smoothing (ast )
sfilter() remove seasonal fluctuation using moving average (ast ) polarity() polarity score (sentiment analysis) (qdap)
set (cclust)
NbClust() provide 30 indices for cluster validation and determining the number Decomposition Text Categorization
of clusters (NbClust) decomp() time series decomposition by square-root filter (timsac) textcat() n-gram based text categorization (textcat)
Packages decompose() classical seasonal decomposition by moving averages Text Visualizatoin
cluster cluster analysis stl() seasonal decomposition of time series by loess wordcloud() plot a word cloud (wordcloud)
fpc various methods for clustering and cluster validation tsr() time series decomposition (ast) comparison.cloud() plot a cloud comparing the frequencies of words
mclust model-based clustering and normal mixture modeling ardec() time series autoregressive decomposition (ArDec) across documents (wordcloud)
birch clustering very large datasets using the BIRCH algorithm Forecasting commonality.cloud() plot a cloud of words shared across documents
pvclust hierarchical clustering with p-values arima() fit an ARIMA model to a univariate time series (wordcloud)
apcluster Affinity Propagation Clustering predict.Arima() forecast from models fitted by arima Packages
cclust Convex Clustering methods, including k-means algorithm, On-line Up- auto.arima() fit best ARIMA model to univariate time series (forecast) tm a framework for text mining applications
date algorithm and Neural Gas algorithm and calculation of indexes for finding forecast.stl(), forecast.ets(), forecast.Arima() topicmodels fit topic models with LDA and CTM
the number of clusters in a data set forecast time series using stl, ets and arima models (forecast) wordcloud various word clouds
cba Clustering for Business Analytics, including clustering techniques such as lda fit topic models with LDA
Proximus and Rock
Correlation and Covariance
acf() autocovariance or autocorrelation of a time series wordnet an interface to the WordNet
bclust Bayesian clustering using spike-and-slab hierarchical model, suitable for RTextTools automatic text classification via supervised learning
clustering high-dimensional data ccf() cross-correlation or cross-covariance of two univariate series
qdap transcript analysis, text mining and natural language processing
biclust algorithms to find bi-clusters in two-dimensional data Packages tm.plugin.dc a plug-in for package tm to support distributed text mining
clue cluster ensembles forecast displaying and analysing univariate time series forecasts tm.plugin.mail a plug-in for package tm to handle mail
clues clustering method based on local shrinking TSclust time series clustering utilities textir a suite of tools for inference about text documents and associated sentiment
clValid validation of clustering results dtw Dynamic Time Warping (DTW) tau utilities for text analysis
clv cluster validation techniques, contains popular internal and external cluster timsac time series analysis and control program textcat n-gram based text categorization
validation methods for outputs produced by package cluster ast time series analysis
bayesclust tests/searches for significant clusters in genetic data ArDec time series autoregressive-based decomposition Social Network Analysis and Graph Mining
clustsig significant cluster analysis, tests to see which (if any) clusters are statis- dse tools for multivariate, linear, time-invariant, time series models
tically different
Functions
clusterSim search for optimal clustering procedure for a data set Text Mining graph(), graph.edgelist(), graph.adjacency(),
clusterGeneration random cluster generation graph.incidence() create graph objects respectively from edges,
Text Cleaning and Preparation an edge list, an adjacency matrix and an incidence matrix (igraph)
gcExplorer graphical cluster explorer Corpus() build a corpus, which is a collection of text documents (tm)
hybridHclust hybrid hierarchical clustering via mutual clusters plot(), tkplot(), rglplot() static, interactive and 3D plotting of
tm map() transform text documents, e.g., stemming, stopword removal (tm) graphs (igraph)
Modalclust hierarchical modal Clustering tm filter() filtering out documents (tm)
iCluster integrative clustering of multiple genomic data types gplot(), gplot3d() plot graphs (sna)
TermDocumentMatrix(), DocumentTermMatrix() construct a vcount(), ecount() number of vertices/edges (igraph)
EMCC evolutionary Monte Carlo (EMC) methods for clustering term-document matrix or a document-term matrix (tm)
rEMM extensible Markov Model (EMM) for data stream clustering V(), E() vertex/edge sequence of igraph (igraph)
Dictionary() construct a dictionary from a character vector or a term- is.directed() whether the graph is directed (igraph)
Outlier Detection document matrix (tm) are.connected() check whether two nodes are connected (igraph)
stemDocument() stem words in a text document (tm) degree(), betweenness(), closeness(), transitivity() various cen-
Functions stemCompletion() complete stemmed words (tm) trality scores (igraph, sna)
boxplot.stats()$out list data points lying beyond the extremes of the SnowballStemmer() Snowball word stemmers (Snowball) add.edges(), add.vertices(), delete.edges(), delete.vertices()
whiskers stopwords(language) return stopwords in different languages (tm) add and delete edges and vertices (igraph)
lofactor() calculate local outlier factors using the LOF algorithm (DMwR removeNumbers(), removePunctuation(), removeWords() re- neighborhood() neighborhood of graph vertices (igraph, sna)
or dprep) move numbers, punctuation marks, or a set of words from a text docu- get.adjlist() adjacency lists for edges or vertices (igraph)
lof() a parallel implementation of the LOF algorithm (Rlof ) ment (tm) nei(), adj(), from(), to() vertex/edge sequence indexing (igraph)
removeSparseTerms() remove sparse terms from a term-document matrix (tm)

2
cliques(), largest.cliques(), maximal.cliques(), clique.number() Statistics persp() perspective plots of surfaces over the x?y plane
find cliques, ie. complete subgraphs (igraph) cloud(), wireframe() 3d scatter plots and surfaces (lattice)
clusters(), no.clusters() maximal connected components of a graph and Summarization interaction.plot() two-way interaction plot
the number of them (igraph) summary() summarize data iplot(), ihist(), ibar(), ipcp() interactive scatter plot, histogram, bar
fastgreedy.community(), spinglass.community() community detection describe() concise statistical description of data (Hmisc) plot, and parallel coordinates plot (iplots)
(igraph) boxplot.stats() box plot statistics pdf(), postscript(), win.metafile(), jpeg(), bmp(),
cohesive.blocks() calculate cohesive blocks (igraph) Analysis of Variance png(), tiff() save graphs into files of various formats
induced.subgraph() create a subgraph of a graph (igraph) aov() fit an analysis of variance model gvisAnnotatedTimeLine(), gvisAreaChart(),
%->%, %<-%, %--% edge sequence indexing (igraph) anova() compute analysis of variance (or deviance) tables for one or more fitted gvisBarChart(), gvisBubbleChart(),
get.edgelist() return an edge list in a two-column matrix (igraph) model objects gvisCandlestickChart(), gvisColumnChart(),
read.graph(), write.graph() read and writ graphs from and to files gvisComboChart(), gvisGauge(), gvisGeoChart(),
Statistical Tests
of various formats (igraph) gvisGeoMap(), gvisIntensityMap(),
chisq.test() chi-squared contingency table tests and goodness-of-fit tests gvisLineChart(), gvisMap(), gvisMerge(),
Packages ks.test() Kolmogorov-Smirnov tests gvisMotionChart(), gvisOrgChart(),
igraph network analysis and visualization t.test() student’s t-test gvisPieChart(), gvisScatterChart(),
sna social network analysis prop.test() test of equal or given proportions gvisSteppedAreaChart(), gvisTable(),
statnet a set of tools for the representation, visualization, analysis and simulation binom.test() exact binomial test gvisTreeMap() various interactive charts produced with the Google
of network data
Mixed Effects Models Visualisation API (googleVis)
egonet ego-centric measures in social network analysis
lme() fit a linear mixed-effects model (nlme) gvisMerge() merge two googleVis charts into one (googleVis)
snort social network-analysis on relational tables
network tools to create and modify network objects nlme() fit a nonlinear mixed-effects model (nlme) Packages
bipartite visualising bipartite networks and calculating some (ecological) indices Principal Components and Factor Analysis ggplot2 an implementation of the Grammar of Graphics
blockmodelinggeneralized and classical blockmodeling of valued networks princomp() principal components analysis googleVis an interface between R and the Google Visualisation API to create
diagram visualising simple graphs (networks), plotting flow diagrams prcomp() principal components analysis interactive charts
NetCluster clustering for networks Other Functions rCharts interactive javascript visualizations from R
NetData network data for McFarland’s SNA R labs lattice a powerful high-level data visualization system, with an emphasis on mul-
var(), cov(), cor() variance, covariance, and correlation
NetIndices estimating network indices, including trophic structure of foodwebs tivariate data
density() compute kernel density estimates
in R vcd visualizing categorical data
NetworkAnalysis statistical inference on populations of weighted or unweighted Packages iplots interactive graphics
networks nlme linear and nonlinear mixed effects models
Data Manipulation
tnet analysis of weighted, two-mode, and longitudinal networks Graphics
Spatial Data Analysis Functions
Functions transform() transform a data frame
Functions plot() generic function for plotting scale() scaling and centering of matrix-like objects
geocode() geocodes a location using Google Maps (ggmap) barplot(), pie(), hist() bar chart, pie chart and histogram t() matrix transpose
plotGoogleMaps() create a plot of spatial data on Google Maps (plot- boxplot() box-and-whisker plot aperm() array transpose
GoogleMaps) stripchart() one dimensional scatter plot sample() sampling
qmap() quick map plot (ggmap) dotchart() Cleveland dot plot table(), tabulate(), xtabs() cross tabulation
get map() queries the Google Maps, OpenStreetMap, or Stamen Maps server qqnorm(), qqplot(), qqline() QQ (quantile-quantile) plot stack(), unstack() stacking vectors
for a map at a certain location (ggmap) coplot() conditioning plot split(), unsplit() divide data into groups and reassemble
gvisGeoChart(), gvisGeoMap(), gvisIntensityMap(), splom() conditional scatter plot matrices (lattice) reshape() reshape a data frame between “wide” and “long” format
gvisMap() Google geo charts and maps (googleVis) pairs() a matrix of scatterplots merge() merge two data frames; similar to database join operations
GetMap() download a static map from the Google server (RgoogleMaps) cpairs() enhanced scatterplot matrix (gclus) aggregate() compute summary statistics of data subsets
ColorMap() plot levels of a variable in a colour-coded map (RgoogleMaps) parcoord() parallel coordinate plot (MASS) by() apply a function to a data frame split by factors
PlotOnStaticMap() overlay plot on background image of map tile cparcoord() enhanced parallel coordinate plot (gclus) melt(), cast() melt and then cast data into the reshaped or aggregated
(RgoogleMaps) parallelplot() parallel coordinates plot (lattice) form you want (reshape)
TextOnStaticMap() plot text on map (RgoogleMaps) densityplot() kernel density plot (lattice) complete.cases() find complete cases, i.e., cases without missing values
contour(), filled.contour() contour plot na.fail, na.omit, na.exclude, na.pass handle missing values
Packages levelplot(), contourplot() level plots and contour plots (lattice)
plotGoogleMaps plot spatial data as HTML map mushup over Google Maps smoothScatter() scatterplots with smoothed densities color representation;
Packages
RgoogleMaps overlay on Google map tiles in R capable of visualizing large datasets reshape flexibly restructure and aggregate data
ggmap Spatial visualization with Google Maps and OpenStreetMap sunflowerplot() a sunflower scatter plot data.table extension of data.frame for fast indexing, ordered joins, assignment,
plotKML visualization of spatial and spatio-temporal objects in Google Earth assocplot() association plot and grouping and list columns
SGCS Spatial Graph based Clustering Summaries for spatial point patterns mosaicplot() mosaic plot gdata various tools for data manipulation
spdep spatial dependence: weighting schemes, statistics and models matplot() plot the columns of one matrix against the columns of another
fourfoldplot() a fourfold display of a 2 × 2 × k contingency table

3
Data Access XML reading and creating XML and HTML documents registerDoSEQ(), registerDoSNOW(), registerDoMC() register respec-
tively the sequential, SNOW and multicore parallel backend with the
Functions MapReduce and Hadoop foreach package (foreach, doSNOW, doMC)
save(), load() save and load R data objects
Functions Packages
read.csv(), write.csv() import from and export to .CSV files
read.table(), write.table(), scan(), write() read and mapreduce() define and execute a MapReduce job (rmr2) snowfall usability wrapper around snow for easier development of parallel R
write data keyval() create a key-value object (rmr2) programs
read.xlsx(), write.xlsx() read and write Excel files (xlsx) from.dfs(), to.dfs() read/write R objects from/to file system (rmr2) snow simple parallel computing in R
read.fwf() read fixed width format files Packages multicore parallel processing of R code on machines with multiple cores or
write.matrix() write a matrix or data frame (MASS) rmr2 perform data analysis with R via MapReduce on a Hadoop cluster CPUs
readLines(), writeLines() read/write text lines from/to a connection, rhdfs connect to the Hadoop Distributed File System (HDFS) snowFT extension of snow supporting fault tolerant and reproducible applica-
such as a text file rhbase connect to the NoSQL HBase database tions, and easy-to-use parallel programming
sqlQuery() submit an SQL query to an ODBC database (RODBC) Rhipe R and Hadoop Integrated Processing Environment Rmpi interface (Wrapper) to MPI (Message-Passing Interface)
sqlFetch() read a table from an ODBC database (RODBC) RHive distributed computing via HIVE query rpvm R interface to PVM (Parallel Virtual Machine)
sqlSave(), sqlUpdate() write or update a table in an ODBC database Segue Parallel R in the cloud using Amazon’s Elastic Map Reduce (EMR) engine nws provide coordination and parallel execution facilities
(RODBC) HadoopStreaming Utilities for using R scripts in Hadoop streaming foreach foreach looping construct for R
sqlColumns() enquire about the column structure of tables (RODBC) hive distributed computing via the MapReduce paradigm doMC foreach parallel adaptor for the multicore package
sqlTables() list tables on an ODBC connection (RODBC) rHadoopClient Hadoop client interface for R doSNOW foreach parallel adaptor for the snow package
odbcConnect(), odbcClose(), odbcCloseAll() open/close con- doMPI foreach parallel adaptor for the Rmpi package
nections to ODBC databases (RODBC)
Large Data doParallel foreach parallel adaptor for the multicore package
dbSendQuery execute an SQL statement on a given database connection (DBI) doRNG generic reproducible parallel backend for foreach Loops
Functions
dbConnect(), dbDisconnect() create/close a connection to a DBMS GridR execute functions on remote hosts, clusters or grids
as.ffdf() coerce a dataframe to an ffdf (ff ) fork R functions for handling multiple processes
(DBI) read.table.ffdf(), read.csv.ffdf() read data from a flat file to an ffdf
Packages object (ff ) Interface to Weka
RODBC ODBC database access write.table.ffdf(), write.csv.ffdf() write an ffdf object to a flat file Package RWeka is an R interface to Weka, and enables to use the following Weka
foreign read and write data in other formats, such as Minitab, S, SAS, SPSS, (ff ) functions in R.
Stata, Systat, . . . ffdfappend() append a dataframe or an ffdf to an existing ffdf (ff ) Association rules:
DBI a database interface (DBI) between R and relational DBMS big.matrix() create a standard big.matrix, which is constrained to available Apriori(), Tertius()
RMySQL interface to the MySQL database RAM (bigmemory) Regression and classification:
RJDBC access to databases through the JDBC interface read.big.matrix() create a big.matrix by reading from an ASCII file (big- LinearRegression(), Logistic(), SMO()
RSQLite SQLite interface for R memory) Lazy classifiers:
ROracle Oracle database interface (DBI) driver write.big.matrix() write a big.matrix to a file (bigmemory) IBk(), LBR()
RpgSQL DBI/RJDBC interface to PostgreSQL database filebacked.big.matrix() create a file-backed big.matrix, which may ex- Meta classifiers:
RODM interface to Oracle Data Mining ceed available RAM by using hard drive space (bigmemory) AdaBoostM1(), Bagging(), LogitBoost(), MultiBoostAB(),
xlsx read, write, format Excel 2007 and Excel 97/2000/XP/2003 files mwhich() expanded “which”-like functionality (bigmemory) Stacking(),
xlsReadWrite read and write Excel files Packages CostSensitiveClassifier()
WriteXLS create Excel 2003 (XLS) files from data frames Rule classifiers:
ff memory-efficient storage of large data on disk and fast access functions
Web Data Access ffbase basic statistical functions for package ff JRip(), M5Rules(), OneR(), PART()
filehash a simple key-value database for handling large data Regression and classification trees:
Functions J48(), LMT(), M5P(), DecisionStump()
g.data create and maintain delayed-data packages
download.file() download a file from the Internet Clustering:
BufferedMatrix a matrix data storage object held in temporary files
xmlParse(), htmlParse() parse an XML or HTML file (XML) Cobweb(), FarthestFirst(), SimpleKMeans(), XMeans(),
biglm regression for data too large to fit in memory
userTimeline(), homeTimeline(), mentions(), DBScan()
bigmemory manage massive matrices with shared memory and memory-mapped
retweetsOfMe() retrieve various timelines within the Twitter uni- Filters:
files
verse (twitteR) Normalize(), Discretize()
biganalytics extend the bigmemory package with various analytics
searchTwitter() a search of Twitter based on a supplied search string (twit- Word stemmers:
bigtabulate table-, tapply-, and split-like functionality for matrix and
teR) IteratedLovinsStemmer(), LovinsStemmer()
big.matrix objects
getUser(), lookupUsers() get information of Twitter users (twitteR) Tokenizers:
getFollowers(), getFollowerIDs(), getFriends(), Parallel Computing AlphabeticTokenizer(), NGramTokenizer(), WordTokenizer()
getFriendIDs() get a list of followers/friends or their IDs of a
Twitter user (twitteR) Functions Interface to Other Programming Languages
twListToDF() convert twitteR lists to data frames (twitteR) sfInit(), sfStop() initialize and stop the cluster (snowfall) Functions
sfLapply(), sfSapply(), sfApply() parallel versions of
Packages .jcall() call a Java method (rJava)
lapply(), sapply(), apply() (snowfall)
twitteR an interface to the Twitter web API .jnew() create a new Java object (rJava)
foreach(...) %dopar% looping in parallel (foreach)
RCurl general network (HTTP/FTP/. . . ) client interface for R .jinit() initialize the Java Virtual Machine (JVM) (rJava)

4
.jaddClassPath() adds directories or JAR files to the class path (rJava) RDataMining on Twitter (1400+ followers)
Packages http://twitter.com/rdatamining
RDataMining Project on R-Forge
rJava low-level R to Java interface
http://www.rdatamining.com/package
Generating Reports http://package.rdatamining.com
Functions Comments & Feedback
Sweave() mixing text and R/S code for automatic report generation If you have any comments, or would like to suggest any relevant R pack-
Packages ages/functions, please feel free to email me <yanchang@rdatamining.com>.
Thanks.
knitr a general-purpose package for dynamic report generation in R
R2HTML making HTML reports
R2PPT generating Microsoft PowerPoint presentations
Building GUIs and Web Applications
shiny web application framework for R
svDialogs dialog boxes
gWidgets a toolkit-independent API for building interactive GUIs
R Editors/GUIs
RStudio a free integrated development environment (IDE) for R
Tinn-R a free GUI for R language and environment
rattle graphical user interface for data mining in R
Rpad workbook-style, web-based interface to R
RPMG graphical user interface (GUI) for interactive R analysis sessions
Red-R An open source visual programming GUI interface for R
R AnalyticFlow a software which enables data analysis by drawing analysis
flowcharts
latticist a graphical user interface for exploratory visualisation
Other R Reference Cards
R Reference Card, by Tom Short
http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/
R-refcard.pdf or
http://cran.r-project.org/doc/contrib/Short-refcard.pdf
R Reference Card, by Jonathan Baron
http://cran.r-project.org/doc/contrib/refcard.pdf
R Functions for Regression Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.
pdf
R Functions for Time Series Analysis, by Vito Ricci
http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
RDataMining Books
R and Data Mining: Examples and Case Studies
introduces into using R for data mining with examples and case studies.
http://www.rdatamining.com/books/rdm
Data Mining Applications with R
presents 15 real-world applications on data mining with R.
http://www.rdatamining.com/books/dmar
RDataMining Website, Group, Twitter & Package
RDataMining Website
http://www.rdatamining.com
RDataMining Group on LinkedIn (4000+ members)
http://group.rdatamining.com

You might also like