Welcome to Scribd. Sign in or start your free trial to enjoy unlimited e-books, audiobooks & documents.Find out more
Standard view
Full view
of .

1.1 Data Matrix
1.2 Attributes
1.3 Data: Algebraic and Geometric View
1.3.1 Distance and Angle
1.3.2 Mean and Total Variance
1.3.3 Orthogonal Projection
1.3.4 Linear Independence and Dimensionality
1.4 Data: Probabilistic View
1.4.1 Bivariate Random Variables
1.4.2 Multivariate Random Variable
1.4.3 Random Sample and Statistics
1.5 Data Mining
1.5.1 Exploratory Data Analysis
1.5.2 Frequent Pattern Mining
1.5.3 Clustering
1.5.4 Classiﬁcation
1.7 Exercises
Data Analysis Foundations
Numeric Attributes
2.1 Univariate Analysis
2.1.1 Measures of Central Tendency
2.1.2 Measures of Dispersion
2.2 Bivariate Analysis
2.2.1 Measures of Location and Dispersion
2.2.2 Measures of Association
2.3 Multivariate Analysis
2.4 Data Normalization
2.5 Normal Distribution
2.5.1 Univariate Normal Distribution
2.5.2 Multivariate Normal Distribution
2.7 Exercises
Categorical Attributes
3.1 Univariate Analysis
3.1.1 Bernoulli Variable
3.1.2 Multivariate Bernoulli Variable
3.2.1 Attribute Dependence: Contingency Analysis
3.3 Multivariate Analysis
3.3.1 Multi-way Contingency Analysis
3.4 Distance and Angle
3.5 Discretization
3.7 Exercises
Graph Data
4.1 Graph Concepts
4.2 Topological Attributes
4.3 Centrality Analysis
4.3.1 Basic Centralities
4.3.2 Web Centralities
4.4 Graph Models
4.4.1 Erdös-Rényi Random Graph Model
4.4.2 Watts-Strogatz Small-world Graph Model
4.4.3 Barabási-Albert Scale-free Model
4.6 Exercises
Kernel Methods
5.1 Kernel Matrix
5.1.1 Reproducing Kernel Map
5.1.2 Mercer Kernel Map
5.2 Vector Kernels
5.3 Basic Kernel Operations in Feature Space
5.4 Kernels for Complex Objects
5.4.1 Spectrum Kernel for Strings
5.4.2 Diﬀusion Kernels on Graph Nodes
5.6 Exercises
High-Dimensional Data
6.1 High-Dimensional Objects
6.2 High-Dimensional Volumes
6.3 Hypersphere Inscribed within Hypercube
6.4 Volume of Thin Hypersphere Shell
6.5 Diagonals in Hyperspace
6.6 Density of the Multivariate Normal
6.7 Appendix: Derivation of Hypersphere Volume
6.9 Exercises
Dimensionality Reduction
7.1 Background
7.2 Principal Component Analysis
7.2.1 Best Line Approximation
7.2.2 Best Two-dimensional Approximation
7.2.3 Best r-dimensional Approximation
7.2.4 Geometry of PCA
7.3 Kernel Principal Component Analysis (Kernel PCA)
7.4 Singular Value Decomposition
7.4.1 Geometry of SVD
7.4.2 Connection between SVD and PCA
7.6 Exercises
Frequent Pattern Mining
Itemset Mining
8.1 Frequent Itemsets and Association Rules
8.2 Itemset Mining Algorithms
8.2.1 Level-Wise Approach: Apriori Algorithm
8.2.2 Tidset Intersection Approach: Eclat Algorithm
8.2.3 Frequent Pattern Tree Approach: FPGrowth Algorithm
8.3 Generating Association Rules
8.5 Exercises
Summarizing Itemsets
9.1 Maximal and Closed Frequent Itemsets
9.2 Mining Maximal Frequent Itemsets: GenMax Algo-
9.3 Mining Closed Frequent Itemsets: Charm algorithm
9.4 Non-Derivable Itemsets
9.6 Exercises
Sequence Mining
10.1 Frequent Sequences
10.2 Mining Frequent Sequences
10.2.1 Level-Wise Mining: GSP
10.2.3 Projection-Based Sequence Mining: PreﬁxSpan
10.3 Substring Mining via Suﬃx Trees
10.3.1 Suﬃx Tree
10.3.2 Ukkonen’s Linear Time Algorithm
10.5 Exercises
Graph Pattern Mining
11.1 Isomorphism and Support
11.2 Candidate Generation
11.2.1 Canonical Code
11.3 The gSpan Algorithm
11.3.1 Extension and Support Computation
11.3.2 Canonicality Checking
11.5 Exercises
Pattern and Rule Assessment
12.1 Rule and Pattern Assessment Measures
12.1.1 Rule Assessment Measures
12.1.2 Pattern Assessment Measures
12.1.3 Comparing Multiple Rules and Patterns
12.2 Signiﬁcance Testing and Conﬁdence Intervals
12.2.1 Fisher Exact Test for Productive Rules
12.2.2 Permutation Test for Signiﬁcance
12.2.3 Bootstrap Sampling for Conﬁdence Interval
12.4 Exercises
Representative-based Clustering
13.1 K-means Algorithm
13.2 Kernel K-means
13.3 Expectation Maximization (EM) Clustering
13.3.1 EM in One Dimension
13.3.2 EM in d-Dimensions
13.3.3 Maximum Likelihood Estimation
13.3.4 Expectation-Maximization Approach
13.5 Exercises
Hierarchical Clustering
14.1 Preliminaries
14.2 Agglomerative Hierarchical Clustering
14.2.1 Distance between Clusters
14.2.2 Updating Distance Matrix
14.2.3 Computational Complexity
14.4 Exercises and Projects
Density-based Clustering
15.1 The DBSCAN Algorithm
15.2 Kernel Density Estimation
15.2.1 Univariate Density Estimation
15.2.2 Multivariate Density Estimation
15.2.3 Nearest Neighbor Density Estimation
15.3 Density-based Clustering: DENCLUE
15.5 Exercises
Spectral and Graph Clustering
16.1 Graphs and Matrices
16.2 Clustering as Graph Cuts
16.2.1 Clustering Objective Functions: Ratio and Normalized Cut
16.2.2 Spectral Clustering Algorithm
Algorithm 16.1: Spectral Clustering Algorithm
16.2.3 Maximization Objectives: Average Cut and Modularity
16.3 Markov Clustering
17.1.3 Pair-wise Measures
17.1.4 Correlation Measures
17.2 Internal Measures
17.5 Exercises
Classiﬁcation
Probabilistic Classiﬁcation
18.1 Bayes Classiﬁer
18.1.1 Estimating the Prior Probability
18.1.2 Estimating the Likelihood
18.2 Naive Bayes Classiﬁer
18.4 Exercises
Decision Tree Classiﬁer
19.1 Decision Trees
19.2 Decision Tree Algorithm
Algorithm 19.1: Decision Tree Algorithm
19.2.1 Split-point Evaluation Measures
19.2.2 Evaluating Split-points
19.2.3 Computational Complexity
19.4 Exercises
Linear Discriminant Analysis
20.1 Optimal Linear Discriminant
20.2 Kernel Discriminant Analysis
20.4 Exercises
Support Vector Machines
21.1 Linear Discriminants and Margins
21.2 SVM: Linear and Separable Case
21.3 Soft Margin SVM: Linear and Non-Separable Case
21.3.1 Hinge Loss
21.4 Kernel SVM: Nonlinear Case
21.5 SVM Training Algorithms
21.5.1 Dual Solution: Stochastic Gradient Ascent
21.5.2 Primal Solution: Newton Optimization
Classiﬁcation Assessment
22.1 Classiﬁcation Performance Measures
22.1.1 Contingency Table Based Measures
22.1.2 Binary Classiﬁcation: Positive and Negative Class
22.1.3 ROC Analysis
22.2 Classiﬁer Evaluation
22.2.1 K-fold Cross-Validation
22.2.2 Bootstrap Resampling
22.2.3 Conﬁdence Intervals
22.2.4 Comparing Classiﬁers: Paired t-Test
22.3 Bias-Variance Decomposition
22.3.1 Ensemble Classiﬁers
22.5 Exercises
Index
0 of .
Results for:
P. 1
Data Mining and Analysis: Fundamental Concepts and Algorithms

# Data Mining and Analysis: Fundamental Concepts and Algorithms

Ratings: (0)|Views: 17|Likes:
Data Mining and Analysis:
Fundamental Concepts and Algorithms
Data Mining and Analysis:
Fundamental Concepts and Algorithms

### Availability:

See more
See less

05/23/2014

pdf

text

original

Pages 14 to 85 are not shown in this preview.