Table Of Contents

INTRODUCTION
CLASSIFICATION TREE CONSTRUCTION
Split Selection
Data Access
Tree Pruning
Missing Values
A SHORT INTRODUCTION TO REGRESSION TREES
Problem Definition
APPLICATIONS AND AVAILABLE SOFTWARE
Cataloging Sky Objects
Decision Trees in Today’s Data Mining Tools
SUMMARY
REFERENCES
MARKET BASKET ANALYSIS
ASSOCIATION RULE DISCOVERY
The Apriori Algorithm
The Power of the Frequent Item Set Strategy
MEASURES OF INTERESTINGNESS
Lift
Leverage
ITEM SET DISCOVERY
TECHNIQUES FOR FREQUENT ITEM SET DISCOVERY
Closed Item Set Strategies
Long Item Sets
Sampling
TECHNIQUES FOR DISCOVERING ASSOCIATION RULES WITHOUT ITEM SET DISCOVERY
ASSOCIATIONS WITH NUMERIC VALUES
APPLICATIONS OF ASSOCIATION RULE DISCOVERY
Artificial Neural Network Models for Data Mining
INTRODUCTION TO MULTILAYER FEEDFORWARD NETWORKS
GRADIENT BASED TRAINING METHODS FOR MFN
The Partial Derivatives
Nonlinear Least Squares Methods
Batch versus Incremental Learning
COMPARISON OF MFN AND OTHER CLASSIFICATION METHODS
Decision Tree Methods
Discriminant Analysis Methods
Multiple Partition Decision Tree
A Growing MFN
CASE STUDY 1—CLASSIFYING SURFACE TEXTURE
Experimental Conditions
Quantitative Comparison Results of Classification Methods
Closing Discussions on Case 1
INTRODUCTION TO SOM
The SOM Algorithm
SOM Building Blocks
Implementation of the SOM Algorithm
Trajectory Computation from Motor Cortical Discharge Rates
Using Data from Spiral Tasks to Train the SOM
Using Data from Spiral and Center→Out Tasks to Train the SOM
Average Testing Result Using the Leave-K-Out Method
Closing Discussions on Case 2
FINAL CONCLUSIONS AND DISCUSSIONS
Statistical Analysis of Normal and Abnormal Data
UNIVARIATE CONTROL CHARTS
Variables Control Charts
Attributes Control Charts
Cumulative Sum Control Charts
Exponentially Weighted Moving Average Control Charts
Choice of Control Charting Techniques
Average Run Length
MULTIVARIATE CONTROL CHARTS
Data Description
Multivariate EWMA Control Charts
Bayesian Data Analysis
FUNDAMENTALS OF BAYESIAN INFERENCE
A Simple Example
A More Complicated Example
Hierarchical Models and Exchangeability
Prior Distributions in Practice
BAYESIAN MODEL SELECTION AND MODEL AVERAGING
Model Selection
Model Averaging
Model Assessment
BAYESIAN COMPUTATION
Importance Sampling
Markov Chain Monte Carlo (MCMC)
An Example
Application to Massive Data
Importance Sampling for Analysis of Massive Data Sets
Variational Methods
BAYESIAN MODELING
BUGS and Models of Realistic Complexity via MCMC
Bayesian Predictive Modeling
Bayesian Descriptive Modeling
AVAILABLE SOFTWARE
DISCUSSION AND FUTURE DIRECTIONS
ACKNOWLEDGMENTS
Hidden Markov Processes and Sequential Pattern Mining
INTRODUCTION TO HIDDEN MARKOV MODELS
PARAMETER ESTIMATION IN THE PRESENCE OF MISSING DATA
The EM Algorithm
MCMC Data Augmentation
Missing Data Summary
LOCAL COMPUTATION
The Likelihood Recursion
The Forward-Backward Recursions
The Viterbi Algorithm
Understanding the Recursions
A Numerical Example Illustrating the Recursions
ILLUSTRATIVE EXAMPLES AND APPLICATIONS
Fetal Lamb Movements
The Business Cycle
HMM STATIONARY AND PREDICTIVE DISTRIBUTIONS
Stationary Distribution of dt
Predictive Distributions
Posterior Covariance of h
Strategies and Methods for Prediction
INTRODUCTION TO THE PREDICTION PROBLEM
Guiding Examples
Prediction Model Components
LOSS FUNCTIONS—WHAT WE ARE TRYING TO ACCOMPLISH
Common Regression Loss Functions
Common Classification Loss Functions
Cox Loss Function for Survival Data
LINEAR MODELS
Linear Regression
Classification
Generalized Linear Model
NONLINEAR MODELS
Nearest Neighbor and Kernel Methods
Tree Models
Neural Networks
Support Vector Machines
Boosting
AVAILABILITY OF SOFTWARE
Principal Components and Factor Analysis
Examples of Variation Patterns in Correlated Multivariate Data
Overview of Methods for Identifying Variation Patterns
REPRESENTATION AND ILLUSTRATION OF VARIATION PATTERNS IN MULTIVARIATE DATA
PRINCIPAL COMPONENTS ANALYSIS
Definition of Principal Components
Using Principal Components as Estimates of the Variation Patterns
FACTOR ROTATION
Capabilities and Limitations of PCA
Methods for Factor Rotation
BLIND SOURCE SEPARATION
The Classic Blind Source Separation Problem
Blind Separation Principles
Fourth-Order Blind Separation Methods
ADDITIONAL MANUFACTURING APPLICATIONS
Psychometric Methods of Latent Variable Modeling
BASIC LATENT VARIABLE MODELS
The Basic Latent Class Model
The Basic Finite Mixture Model
The Basic Latent Trait Model
The Basic Factor Analytic Model
Common Structure
EXTENSION FOR DATA MINING
Extending the Basic Latent Class Model
Extending the Basic Mixture Model
Extending the Latent Trait Model
Extending the Factor Analytic Model
AN ILLUSTRATIVE EXAMPLE
Hierarchical Structure in Transaction Data
Individualized Mixture Models
Data Sets
Experimental Results
REFERENCES AND TOOLS
References
Tools
Scalable Clustering
CLUSTERING TECHNIQUES: A BRIEF SURVEY
Partitional Methods
Hierarchical Methods
Discriminative versus Generative Models
Assessment of Results
Visualization of Results
CLUSTERING CHALLENGES IN DATA MINING
Transactional Data Analysis
Next Generation Clickstream Clustering
Clustering Coupled Sequences
Large Scale Remote Sensing
SCALABLE CLUSTERING FOR DATA MINING
Scalability to Large Number of Records or Patterns, N
Scalability to Large Number of Attributes or Dimensions, d
Balanced Clustering
SEQUENCE CLUSTERING TECHNIQUES
CASE STUDY: SIMILARITY BASED CLUSTERING OF MARKET BASKETS AND WEB LOGS
CASE STUDY: IMPACT OF SIMILARITY MEASURES ON WEB DOCUMENT CLUSTERING
Similarity Measures: A Sampler
Clustering Algorithms and Text Data Sets
Comparative Results
CLUSTERING SOFTWARE
Time Series Similarity and Indexing
TIME SERIES SIMILARITY MEASURES
Euclidean Distances and Lp Norms
Normalization Transformations
General Transformations
Dynamic Time Warping
Longest Common Subsequence Similarity
Piecewise Linear Representations
Probabilistic Methods
Other Similarity Measures
INDEXING TECHNIQUES FOR TIME SERIES
Indexing Time Series When the Distance Function Is a Metric
A Survey of Dimensionality Reduction Techniques
Subsequence Retrieval
Nonlinear Time Series Analysis
EMBEDDING METHOD FOR CHAOTIC TIME SERIES ANALYSIS
Reconstruction of Phase Space
Computation of Dimension
Detection of Unstable Periodic Orbits
Computing Lyapunov Exponents from Time Series
TIME-FREQUENCY ANALYSIS OF TIME SERIES
Analytic Signals and Hilbert Transform
Method of EMD
ACKNOWLEDGMENT
Distributed Data Mining
RELATED RESEARCH
DATA DISTRIBUTION AND PREPROCESSING
Homogeneous/Heterogeneous Data Scenarios
Data Preprocessing
DISTRIBUTED DATA MINING ALGORITHMS
Distributed Classifier Learning
Collective Data Mining
Distributed Association Rule Mining
Distributed Clustering
Privacy Preserving Distributed Data Mining
Other DDM Algorithms
DISTRIBUTED DATA MINING SYSTEMS
Architectural Issues
Communication Models in DDM Systems
Components Maintenance
FUTURE DIRECTIONS
How Data Relates to Data Mining
The “10 Commandments” of Data Mining
What You Need to Know about Algorithms Before Preparing Data
Why Data Needs to be Prepared Before Mining It
DATA COLLECTION
Choosing the Right Data
Assembling the Data Set
Assaying the Data Set
Assessing the Effect of Missing Values
DATA PREPARATION
Why Data Needs Preparing: The Business Case
Representing Time: Absolute, Relative, and Cyclic
Outliers and Distribution Normalization
Ranges and Normalization
Numbers and Categories
DATA QUALITY
What Is Quality?
Enforcing Quality: Advantages and Disadvantages
Data Quality and Model Quality
DATA VISUALIZATION
Seeing Is Believing
Absolute Versus Relative Visualization
Visualizing Multiple Interactions
Data Storage and Management
TEXT FILES AND SPREADSHEETS
Text Files for Data
Spreadsheet Files
DATABASE SYSTEMS
Historical Databases
Relational Database
Object-Oriented Database
ADVANCED TOPICS IN DATA STORAGE AND MANAGEMENT
OLAP
Data Warehouse
Distributed Databases
Feature Extraction, Selection, and Construction
FEATURE EXTRACTION
Concepts
Algorithms
Summary
FEATURE SELECTION
Algorithm
FEATURE CONSTRUCTION
Algorithms and Examples
SOME APPLICATIONS
Performance Analysis and Evaluation
OVERVIEW OF EVALUATION
Training versus Testing
MEASURING ERROR
Error Measurement
Error from Regression
Error from Classification
Error from Conditional Density Estimation
Accuracy
False Positives and Negatives
Precision, Recall, and the F Measure
Sensitivity and Specificity
Confusion Tables
ROC Curves
Lift Curves
Clustering Performance: Unlabeled Data
ESTIMATING ERROR
Independent Test Cases
Significance Testing
Resampling and Cross-Validation
Bootstrap
Time Series
ESTIMATING COST AND RISK
OTHER ATTRIBUTES OF PERFORMANCE
Training Time
Application Time
Interpretability
Expert Evaluation
Field Testing
Cost of Obtaining Labeled Data
Security and Privacy
INTRODUCTION: WHY THERE ARE SECURITY AND PRIVACY ISSUES WITH DATA MINING
DETAILED PROBLEM ANALYSIS, SOLUTIONS, AND ONGOING RESEARCH
Privacy of Individual Data
RELATIONSHIPS
Mining Human Performance Data
INTRODUCTION AND OVERVIEW
MINING FOR ORGANIZATIONAL LEARNING
Methods
INDIVIDUAL LEARNING
Data on Individual Learning
Individual Forgetting
DISTRIBUTIONS AND PATTERNS OF INDIVIDUAL PERFORMANCE
OTHER AREAS
PRIVACY ISSUES FOR HUMAN PERFORMANCE DATA
Mining Text Data
TAXONOMY CONSTRUCTION
IMPLEMENTATION ISSUES OF TEXT MINING
Soft Matching
Temporal Resolution
Anaphora Resolution
To Parse or Not to Parse?
Database Connectivity
VISUALIZATIONS AND ANALYTICS FOR TEXT MINING
Definitions and Notations
Category Connection Maps
Relationship Maps
Trend Graphs
Mining Geospatial Data
SPATIAL OUTLIER DETECTION TECHNIQUES
Illustrative Examples and Application Domains
Tests for Detecting Spatial Outliers
Solution Procedures
SPATIAL COLOCATION RULES
Illustrative Application Domains
Colocation Rule Approaches
LOCATION PREDICTION
An Illustrative Application Domain
Problem Formulation
Modeling Spatial Dependencies Using the SAR and MRF Models
Logistic SAR
MRF Based Bayesian Classifiers
CLUSTERING
Categories of Clustering Algorithms
K-Medoid: An Algorithm for Clustering
Clustering, Mixture Analysis, and the EM Algorithm
Mining Science and Engineering Data
MOTIVATION FOR MINING SCIENTIFIC DATA
DATA MINING EXAMPLES IN SCIENCE AND ENGINEERING
Data Mining in Astronomy
Data Mining in Earth Sciences
Data Mining in Medical Imaging
Data Mining in Nondestructive Testing
Data Mining in Security and Surveillance
Data Mining in Simulation Data
Other Applications of Scientific Data Mining
POTENTIAL SOLUTIONS TO SOME COMMON PROBLEMS
Data Registration
De-Noising Data
Object Identification
Dimensionality Reduction
Generating a Good Training Set
Software for Scientific Data Mining
Mining Data in Bioinformatics
BACKGROUND
Basic Molecular Biology
Mining Methods in Protein Structure Prediction
MINING PROTEIN CONTACT MAPS
Classifying Contacts Versus Noncontacts
Mining Methodology
How Much Information Is There in Amino Acids Alone?
Using Local Structures for Contact Prediction
CHARACTERIZING PHYSICAL, PROTEIN-LIKE CONTACT MAPS
Generating a Database of Protein-Like Structures
Mining Dense Patterns in Contact Maps
Pruning and Integration
FUTURE DIRECTIONS FOR CONTACT MAP MINING
Heuristic Rules for “Physicality”
Rules for Pathways in Contact Map Space
Mining Customer Relationship Management (CRM) Data
Strategic Questions
Operational Questions
Mining Computer and Network Security Data
INTRUSIVE ACTIVITIES AND SYSTEM ACTIVITY DATA
Phases of Intrusions
Data of System Activities
EXTRACTION AND REPRESENTATION OF ACTIVITY FEATURES FOR INTRUSION DETECTION
Features of System Activities
Feature Representation
EXISTING INTRUSION DETECTION TECHNIQUES
Data Source and Representation
Testing Performance
Mining Image Data
RELATED WORKS
METHOD
How to Discover the Number of Clusters: k
K-Automatic Discovery Algorithm
Clustering Algorithm
EXPERIMENTAL RESULTS
Data Item Representation
Evaluation Method
Results and Analysis
Mining Manufacturing Quality Data
MEWMA Charts
NONPARAMETRIC PROPERTIES OF THE MEWMA CONTROL CHARTS
Author Index
Subject Index
7262410 Data Mining Handbook

Published by maxwell3333

Published by: maxwell3333 on Sep 03, 2010
Copyright: Attribution Non-commercial
