Descriptive Statistics:
- Measures of central tendency (mean, median, mode)
- Measures of variability (variance, standard deviation, range)
- Percentiles and quartiles
- Skewness and kurtosis

Probability Theory:
- Basic probability concepts
- Conditional probability
- Bayes' theorem
- Random variables and probability distributions (discrete and continuous)
- Joint and marginal probability distributions
- Expected value and variance

Statistical Inference:
- Sampling techniques (random sampling, stratified sampling, etc.)
- Hypothesis testing
- Type I and Type II errors
- Confidence intervals
- t-tests and z-tests
- Chi-square tests
- Analysis of variance (ANOVA)
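A minimal sketch tying the descriptive-statistics and inference topics together: summary measures plus a hand-computed one-sample t statistic. Only the standard library is used; the data values and the hypothesized mean are invented for illustration.

```python
import math
import statistics

# Illustrative measurements (made-up data).
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

mean = statistics.mean(data)
median = statistics.median(data)
var = statistics.variance(data)   # sample variance (n - 1 denominator)
sd = statistics.stdev(data)

# One-sample t statistic against a hypothesized mean mu0:
#   t = (x_bar - mu0) / (s / sqrt(n))
mu0 = 12.0
n = len(data)
t_stat = (mean - mu0) / (sd / math.sqrt(n))
```

A small |t| (here well under typical critical values) would mean the sample gives no evidence that the true mean differs from mu0.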

Regression Analysis:
- Simple linear regression
- Multiple linear regression
- Assumptions of linear regression
- Model evaluation and diagnostics (R-squared, adjusted R-squared, residuals analysis)
- Logistic regression (binary and multinomial)

Experimental Design:
- Control and treatment groups
- Randomized controlled trials (RCTs)
- Factorial designs
- Blocking and randomization
- Analysis of variance (ANOVA)

Time Series Analysis:
- Trend analysis
- Seasonality and cyclicity
- Autocorrelation and partial autocorrelation
- Stationarity and differencing
- ARIMA models
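Simple linear regression and its R-squared diagnostic can be sketched with the closed-form least-squares solution; the toy x/y values below are invented (roughly y = 2x) purely to show the computation.

```python
# Ordinary least squares for simple linear regression, computed by hand.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 5.9, 8.2, 9.9]   # illustrative data, roughly y = 2x

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - y_mean) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot
```

An R-squared near 1 indicates the line explains almost all of the variance in y; residuals analysis would then check the regression assumptions (linearity, homoscedasticity, normal errors).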

Multivariate Analysis:
- Principal Component Analysis (PCA)
- Factor Analysis
- Cluster Analysis (k-means, hierarchical clustering)
- Discriminant Analysis

Bayesian Statistics:
- Bayesian inference
- Prior and posterior distributions
- Markov Chain Monte Carlo (MCMC) methods

Non-parametric Methods:
- Mann-Whitney U test
- Wilcoxon signed-rank test
- Kruskal-Wallis test
- Non-parametric correlation (Spearman's rank correlation)
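The prior-to-posterior update at the heart of Bayesian inference is easiest to see with a conjugate pair, where no MCMC is needed. A sketch with a Beta prior and binomial data (the counts are hypothetical):

```python
# Beta-Binomial conjugate update:
# prior Beta(a, b) + k successes in n trials -> posterior Beta(a + k, b + n - k).
a, b = 1.0, 1.0        # Beta(1, 1) is a uniform prior
k, n = 7, 10           # hypothetical data: 7 successes in 10 trials

a_post, b_post = a + k, b + (n - k)
posterior_mean = a_post / (a_post + b_post)   # mean of a Beta distribution
```

For non-conjugate models this closed-form update is unavailable, which is where MCMC methods come in: they draw samples from the posterior instead of computing it analytically.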

Resampling Methods:
- Bootstrapping
- Cross-validation

Data Visualization:
- Histograms
- Box plots
- Scatter plots
- Bar charts
- Heatmaps
- Time series plots

Statistical Software:
- Familiarity with statistical programming languages (R or Python)
- Experience with statistical libraries and packages (e.g., pandas, NumPy, scikit-learn)
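Bootstrapping can be sketched in a few lines of standard-library Python: resample the data with replacement many times and look at the spread of the resampled statistic. The data values and the number of resamples below are illustrative, and the RNG is seeded for reproducibility.

```python
import random
import statistics

random.seed(0)
data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4]   # illustrative sample

# Draw bootstrap resamples (with replacement) and record each resample's mean.
boot_means = []
for _ in range(2000):
    sample = [random.choice(data) for _ in data]
    boot_means.append(statistics.mean(sample))

# The standard deviation of the bootstrap means estimates the
# standard error of the sample mean.
se_mean = statistics.stdev(boot_means)
```

The same idea gives confidence intervals by taking percentiles of `boot_means` instead of its standard deviation.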

Supervised Learning:
- Linear Regression
- Logistic Regression
- Decision Trees
- Random Forests
- Support Vector Machines (SVM)
- Naive Bayes
- k-Nearest Neighbors (k-NN)
- Gradient Boosting (e.g., XGBoost, LightGBM)

Unsupervised Learning:
- Clustering (k-means, hierarchical clustering, DBSCAN)
- Dimensionality Reduction (Principal Component Analysis (PCA), t-SNE)
- Anomaly Detection
- Association Rules

Evaluation Metrics:
- Accuracy, Precision, Recall, F1-score
- ROC curve and AUC
- Confusion Matrix
- Mean Squared Error (MSE), Root Mean Squared Error (RMSE)
- R-squared, Adjusted R-squared
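The classification metrics above all derive from the four cells of a binary confusion matrix. A dependency-free sketch with toy labels and predictions (both invented):

```python
# Toy binary classification results.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Confusion-matrix cells: true/false positives and negatives.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

precision = tp / (tp + fp)            # of predicted positives, how many are right
recall = tp / (tp + fn)               # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
accuracy = (tp + tn) / len(y_true)
```

In practice these come from `sklearn.metrics` (e.g., `precision_score`, `confusion_matrix`), but the definitions are exactly the ratios computed here.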

Cross-Validation and Model Selection:
- k-fold Cross-Validation
- Hyperparameter Tuning
- Grid Search
- Model Selection Techniques (e.g., nested cross-validation)

Regularization:
- L1 and L2 Regularization (Lasso and Ridge Regression)
- Elastic Net
- Early Stopping

Feature Engineering:
- Handling Missing Data
- Feature Scaling
- One-Hot Encoding
- Feature Selection Techniques
- Feature Extraction (e.g., PCA)
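The index-splitting logic behind k-fold cross-validation (conceptually what scikit-learn's `KFold` does, sketched here without any dependencies): partition the sample indices into k roughly equal folds, and for each fold use it as the test set while training on the rest.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k roughly equal, disjoint folds."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 5))   # 5 folds of 2 test indices each
```

Each sample appears in exactly one test fold, so averaging a model's score over the folds uses every observation for validation exactly once.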

Ensemble Learning:
- Bagging
- Boosting
- Stacking
- Voting Classifiers

Handling Imbalanced Data:
- Undersampling
- Oversampling
- SMOTE (Synthetic Minority Over-sampling Technique)
- Evaluation metrics for imbalanced data (e.g., precision-recall curve)
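Random oversampling, the simplest of the imbalance remedies listed above, just duplicates minority-class examples until the classes are balanced (SMOTE, by contrast, synthesizes new points by interpolating between minority neighbors). A sketch with an invented 9-vs-3 dataset, seeded for reproducibility:

```python
import random

random.seed(42)

# Invented imbalanced dataset: (feature, label) pairs, 9 of class 0 vs 3 of class 1.
majority = [(x, 0) for x in range(9)]
minority = [(x, 1) for x in (100, 101, 102)]

# Duplicate randomly chosen minority examples until the classes match in size.
deficit = len(majority) - len(minority)
resampled_minority = minority + [random.choice(minority) for _ in range(deficit)]
balanced = majority + resampled_minority
```

Oversampling should be applied only to the training split, never before cross-validation splitting, or the duplicated points leak into the test folds.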

Reinforcement Learning (basics):
- Markov Decision Processes (MDPs)
- Q-Learning
- Deep Q-Network (DQN)

Reinforcement Learning (advanced):
- Policy Gradients
- Actor-Critic methods
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)

Model Interpretability:
- Feature Importance
- Partial Dependence Plots
- LIME (Local Interpretable Model-Agnostic Explanations)
- SHAP (SHapley Additive exPlanations)
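The tabular Q-learning update rule can be shown on a tiny example; the two-state environment, actions, reward, and hyperparameters below are all invented for illustration.

```python
# Q-learning update:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
alpha, gamma = 0.5, 0.9   # learning rate and discount factor (illustrative)

# Q-table for a made-up 2-state, 2-action environment, initialized to zero.
Q = {0: {"left": 0.0, "right": 0.0},
     1: {"left": 0.0, "right": 0.0}}

def q_update(s, a, r, s_next):
    """Apply one Q-learning update for transition (s, a, r, s_next)."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# One observed transition: in state 0, action "right" yields reward 1,
# landing in state 1.
q_update(0, "right", 1.0, 1)
```

DQN replaces this table with a neural network over states, and the advanced methods listed (policy gradients, actor-critic, PPO, DDPG) learn a policy directly rather than a value table.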
