Introduction
1.1 How to Read This Book?
1.2 A Short Introduction to R
1.2.1 Starting with R
1.2.2 R Objects
1.2.3 Vectors
1.2.4 Vectorization
1.2.5 Factors
1.2.6 Generating Sequences
1.2.7 Sub-Setting
1.2.8 Matrices and Arrays
1.2.9 Lists
1.2.10 Data Frames
1.2.11 Creating New Functions
1.2.12 Objects, Classes, and Methods
1.3 A Short Introduction to MySQL
Predicting Algae Blooms
2.1 Problem Description and Objectives
2.2 Data Description
2.4 Data Visualization and Summarization
FIGURE 2.3: An “enriched” box plot for orthophosphate
FIGURE 2.4: A conditioned box plot of Algal a1
FIGURE 2.5: A conditioned box percentile plot of Algal a1
FIGURE 2.6: A conditioned strip plot of Algal a3 using a continuous variable
2.5 Unknown Values
2.5.1 Removing the Observations with Unknown Values
2.5.2 Filling in the Unknowns with the Most Frequent Values
FIGURE 2.7: A histogram of variable mxPH conditioned by season
FIGURE 2.8: The values of variable mxPH by river size and speed
2.6 Obtaining Prediction Models
2.6.1 Multiple Linear Regression
2.6.2 Regression Trees
2.7 Model Evaluation and Selection
FIGURE 2.11: Visualization of the cross-validation results
FIGURE 2.12: Visualization of the cross-validation results on all algae
2.8 Predictions for the Seven Algae
2.9 Summary
Predicting Stock Market Returns
3.1 Problem Description and Objectives
3.2 The Available Data
3.2.1 Handling Time-Dependent Data in R
3.2.2 Reading the Data from the CSV File
3.2.3 Getting the Data from the Web
3.2.4 Reading the Data from a MySQL Database
3.3.1 What to Predict?
FIGURE 3.1: S&P500 over the last 3 months and our indicator
3.3.2 Which Predictors?
3.4 The Prediction Models
3.4.1 How Will the Training Data Be Used?
FIGURE 3.3: Three forms of obtaining predictions for a test period
3.4.2 The Modeling Tools
3.4.2.1 Artificial Neural Networks
3.4.2.2 Support Vector Machines
FIGURE 3.5: An example of two hinge functions with the same threshold
3.5 From Predictions into Actions
3.5.1 How Will the Predictions Be Used?
3.5.3 Putting Everything Together: A Simulated Trader
3.6 Model Evaluation and Selection
3.6.1 Monte Carlo Estimates
FIGURE 3.7: The Monte Carlo experimental process
3.6.2 Experimental Comparisons
3.6.3 Results Analysis
3.7.1 Evaluation of the Final Test Data
FIGURE 3.11: Yearly percentage returns of “grow.nnetR.v12” system
3.8 Summary
4.2.2 Exploring the Dataset
FIGURE 4.1: The number of transactions per salesperson
4.2.3 Data Problems
4.2.3.1 Unknown Values
4.2.3.2 Few Transactions of Some Products
FIGURE 4.4: Some properties of the distribution of unit prices
4.3 Defining the Data Mining Tasks
4.3.1 Different Approaches to the Problem
4.3.1.1 Unsupervised Techniques
4.3.1.2 Supervised Techniques
4.3.1.3 Semi-Supervised Techniques
4.3.2 Evaluation Criteria
4.3.2.1 Precision and Recall
4.3.2.2 Lift Charts and Precision/Recall Curves
4.3.2.3 Normalized Distance to Typical Price
4.3.3 Experimental Methodology
4.4 Obtaining Outlier Rankings
4.4.1 Unsupervised Approaches
4.4.1.1 The Modified Box Plot Rule
4.4.1.2 Local Outlier Factors (LOF)
4.4.1.3 Clustering-Based Outlier Rankings (ORh)
4.4.2 Supervised Approaches
4.4.2.1 The Class Imbalance Problem
FIGURE 4.10: Using SMOTE to create more rare class examples
4.4.2.2 Naive Bayes
4.4.3 Semi-Supervised Approaches
4.5 Summary
Data Mining with R: Learning with Case Studies (Nov. 2010)
