Table of Contents

1.1 How to Read This Book?
1.2 A Short Introduction to R
1.2.1 Starting with R
1.2.2 R Objects
1.2.3 Vectors
1.2.4 Vectorization
1.2.5 Factors
1.2.6 Generating Sequences
1.2.7 Sub-Setting
1.2.8 Matrices and Arrays
1.2.9 Lists
1.2.10 Data Frames
1.2.11 Creating New Functions
1.2.12 Objects, Classes, and Methods
1.2.13 Managing Your Sessions
1.3 A Short Introduction to MySQL
Predicting Algae Blooms
2.1 Problem Description and Objectives
2.2 Data Description
2.3 Loading the Data into R
2.4 Data Visualization and Summarization
FIGURE 2.3: An “enriched” box plot for orthophosphate
FIGURE 2.4: A conditioned box plot of Algal a1
FIGURE 2.5: A conditioned box percentile plot of Algal a1
FIGURE 2.6: A conditioned strip plot of Algal a3 using a continuous variable
2.5 Unknown Values
2.5.1 Removing the Observations with Unknown Values
2.5.2 Filling in the Unknowns with the Most Frequent Values
FIGURE 2.7: A histogram of variable mxPH conditioned by season
FIGURE 2.8: The values of variable mxPH by river size and speed
2.6 Obtaining Prediction Models
2.6.1 Multiple Linear Regression
2.6.2 Regression Trees
2.7 Model Evaluation and Selection
FIGURE 2.11: Visualization of the cross-validation results
FIGURE 2.12: Visualization of the cross-validation results on all algae
2.8 Predictions for the Seven Algae
2.9 Summary
Predicting Stock Market Returns
3.1 Problem Description and Objectives
3.2 The Available Data
3.2.1 Handling Time-Dependent Data in R
3.2.2 Reading the Data from the CSV File
3.2.3 Getting the Data from the Web
3.2.4 Reading the Data from a MySQL Database
    Loading the Data into R Running on Windows
    Loading the Data into R Running on Linux
3.3 Defining the Prediction Tasks
3.3.1 What to Predict?
FIGURE 3.1: S&P500 on the last 3 months and our indicator
3.3.2 Which Predictors?
3.4 The Prediction Models
3.4.1 How Will the Training Data Be Used?
FIGURE 3.3: Three forms of obtaining predictions for a test period
3.4.2 The Modeling Tools
    Artificial Neural Networks
    Support Vector Machines
    Multivariate Adaptive Regression Splines
FIGURE 3.5: An example of two hinge functions with the same threshold
3.5 From Predictions into Actions
3.5.1 How Will the Predictions Be Used?
3.5.2 Trading-Related Evaluation Criteria
3.5.3 Putting Everything Together: A Simulated Trader
3.6 Model Evaluation and Selection
3.6.1 Monte Carlo Estimates
FIGURE 3.7: The Monte Carlo experimental process
3.6.2 Experimental Comparisons
3.6.3 Results Analysis
3.7 The Trading System
3.7.1 Evaluation of the Final Test Data
FIGURE 3.11: Yearly percentage returns of “grow.nnetR.v12” system
3.7.2 An Online Trading System
3.8 Summary
4.2.2 Exploring the Dataset
FIGURE 4.1: The number of transactions per salesperson
4.2.3 Data Problems
    Unknown Values
    Few Transactions of Some Products
FIGURE 4.4: Some properties of the distribution of unit prices
4.3 Defining the Data Mining Tasks
4.3.1 Different Approaches to the Problem
    Unsupervised Techniques
    Supervised Techniques
    Semi-Supervised Techniques
4.3.2 Evaluation Criteria
    Precision and Recall
    Lift Charts and Precision/Recall Curves
    Normalized Distance to Typical Price
4.3.3 Experimental Methodology
4.4 Obtaining Outlier Rankings
4.4.1 Unsupervised Approaches
    The Modified Box Plot Rule
    Local Outlier Factors (LOF)
    Clustering-Based Outlier Rankings (ORh)
4.4.2 Supervised Approaches
    The Class Imbalance Problem
FIGURE 4.10: Using SMOTE to create more rare class examples
    Naive Bayes
    AdaBoost
4.4.3 Semi-Supervised Approaches
4.5 Summary
