Programming Skills
• Basic Python programming with functional programming practices
• NumPy for numeric and linear algebra operations
• Pandas for data handling and statistical operations
• Matplotlib, Seaborn for Data Visualization
• Scikit-Learn for machine learning algorithms
• Excel knowledge is an added advantage
Sr. No. Task Duration (Days)
1. Business Understanding 3
• Understand the business process
• Define and Frame the business problem
• Define the business objective
• Formulate the milestones
2. Data Collection and Understanding 5
• Collect the data from different sources
• Understand the important features
• Identify independent and dependent variables
3. Exploratory Data Analysis 15
• Variable Identification
• Univariate Data Analysis
• Bivariate Data Analysis
• Multivariate Data Analysis
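The four analysis steps above can be sketched with Pandas as follows. This is a minimal illustration on a synthetic dataset; the column names (age, income, segment) are hypothetical and not part of the plan.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=200),
    "income": rng.normal(50_000, 12_000, size=200),
    "segment": rng.choice(["A", "B", "C"], size=200),
})

# Variable identification: separate numeric and categorical columns.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Univariate analysis: summary statistics and category frequencies.
univariate_summary = df[numeric_cols].describe()
segment_counts = df["segment"].value_counts()

# Bivariate analysis: one numeric variable against one categorical variable.
income_by_segment = df.groupby("segment")["income"].mean()

# Multivariate analysis: pairwise correlations across all numeric columns.
correlation_matrix = df[numeric_cols].corr()

print(univariate_summary)
print(income_by_segment)
print(correlation_matrix)
```

In practice each step is usually paired with a plot (histograms, scatter plots, heatmaps) from Matplotlib or Seaborn.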
4. Data Preprocessing and Wrangling 10
• Missing values identification
• Scaling using Normalization, Standardization
• Outliers Detection using Boxplot, IQR, Z-Score
• Treatment of special values and obvious inconsistencies
• Feature imputation using Hot-Deck, Cold-Deck,
Mean-substitution, Linear regression methods
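A few of the preprocessing steps listed above (missing value identification, mean substitution, IQR and Z-score outlier detection, normalization and standardization) might look like this in code. The data and the column name "value" are made up for illustration, and the Z-score threshold of 2 is an assumption chosen for this very small sample; 3 is a more common choice on larger data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical column with one missing value and one obvious outlier.
df = pd.DataFrame({"value": [10.0, 12.0, 11.0, np.nan, 13.0, 12.0, 95.0]})

# Missing value identification: count NaNs per column.
missing_counts = df.isna().sum()

# Mean substitution: replace each NaN with the column mean.
df["value_imputed"] = df["value"].fillna(df["value"].mean())

# Outlier detection with the IQR rule (beyond 1.5 * IQR from the quartiles).
q1, q3 = df["value_imputed"].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outliers = (df["value_imputed"] < q1 - 1.5 * iqr) | (df["value_imputed"] > q3 + 1.5 * iqr)

# Outlier detection with Z-scores (threshold of 2 is an assumption, see above).
z_scores = (df["value_imputed"] - df["value_imputed"].mean()) / df["value_imputed"].std()
z_outliers = z_scores.abs() > 2

# Normalization rescales to [0, 1]; standardization gives zero mean, unit variance.
normalized = MinMaxScaler().fit_transform(df[["value_imputed"]])
standardized = StandardScaler().fit_transform(df[["value_imputed"]])

print(missing_counts)
print(df[iqr_outliers | z_outliers])
```

Hot-deck, cold-deck and regression imputation follow the same pattern but draw the replacement value from a donor record or a fitted model instead of the column mean.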
5. Feature Engineering and Baseline Model Training 8
• Discretization - Continuous Features and
Categorical Features
• Reframe Numerical Quantities – Scale all
variables to one unit
• Crossing – Generate new features from existing
data
• Train the baseline model and check the
performance with feature engineering and
without feature engineering
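As a rough sketch of the comparison described above, the example below trains the same baseline model with and without an engineered feature. The dataset is synthetic and the names (length, width, area) are hypothetical; linear regression is used only as a convenient baseline, not as a prescribed choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: the target depends on the product of two raw features.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "length": rng.uniform(1, 10, n),
    "width": rng.uniform(1, 10, n),
})
y = df["length"] * df["width"] + rng.normal(0, 1, n)

# Discretization: bin a continuous feature into three equal-width categories.
df["length_bin"] = pd.cut(df["length"], bins=3, labels=["short", "medium", "long"])

# Crossing: generate a new feature by combining two existing ones.
df["area"] = df["length"] * df["width"]

# Baseline model without and with the engineered feature.
model = LinearRegression()
baseline_r2 = cross_val_score(model, df[["length", "width"]], y, cv=5, scoring="r2").mean()
engineered_r2 = cross_val_score(model, df[["length", "width", "area"]], y, cv=5, scoring="r2").mean()

print(f"R^2 without feature engineering: {baseline_r2:.3f}")
print(f"R^2 with feature engineering:    {engineered_r2:.3f}")
```

Because the target here is built from the crossed feature, the gain is large by construction; on real data the improvement has to be checked the same way, not assumed.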
6. Feature Selection and Baseline Model Training 15
• Correlation
• Dimensionality Reduction - PCA
• Feature Importance Methods
o Filter based
▪ Correlation
▪ Chi-Square Test
▪ ANOVA
o Wrapper based
▪ Forward Selection
▪ Backward Selection
▪ Recursive Feature Elimination
o Embedded methods
▪ L1 Regularization
▪ L2 Regularization
• Model Training Comparison across Feature
selection variants
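One possible way to try a filter, a wrapper and an embedded method side by side with Scikit-Learn is sketched below. The dataset is synthetic, the choice of four features is arbitrary, and the estimators (logistic regression for the wrapper, an L1-penalized linear SVM for the embedded method) are assumptions picked for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic classification data: 10 features, only some of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=2, random_state=0
)

# Filter based: keep the 4 features with the highest ANOVA F-score.
filter_mask = SelectKBest(score_func=f_classif, k=4).fit(X, y).get_support()

# Wrapper based: recursive feature elimination around a logistic regression.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
wrapper_mask = rfe.support_

# Embedded: L1 regularization drives some coefficients to exactly zero.
l1_svc = LinearSVC(C=0.05, penalty="l1", dual=False, max_iter=5000).fit(X, y)
embedded_mask = np.abs(l1_svc.coef_).ravel() > 1e-8

# Dimensionality reduction: PCA builds new components instead of selecting columns.
X_reduced = PCA(n_components=4).fit_transform(X)

print("Filter  :", np.flatnonzero(filter_mask))
print("Wrapper :", np.flatnonzero(wrapper_mask))
print("Embedded:", np.flatnonzero(embedded_mask))
print("PCA shape:", X_reduced.shape)
```

Each mask can then be applied as X[:, mask] and fed to the same baseline model to compare the variants. Note that L2 regularization shrinks coefficients without zeroing them, so on its own it ranks features by weight and does not remove any.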
7. Data Sampling and Model Selection 15
• Data Sampling technique
o Random Sampling – train/test split
o Stratified Sampling – Stratified k-fold
o Cross Validation
• Model Selection
o Linear Models
o Non-Linear Models
o Tree-based Models
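The sampling techniques and model families above could be combined along these lines. This is a sketch on synthetic data; the three candidate models are examples of each family chosen for illustration, not a recommended shortlist.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced classification data.
X, y = make_classification(n_samples=400, n_features=8, weights=[0.7, 0.3], random_state=0)

# Random sampling: hold out 25% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Stratified sampling: each fold keeps roughly the same class ratio as the training set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# One example candidate per model family (illustrative choices).
candidates = {
    "linear (logistic regression)": LogisticRegression(max_iter=1000),
    "non-linear (k-nearest neighbours)": KNeighborsClassifier(n_neighbors=5),
    "tree-based (decision tree)": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Cross validation: mean accuracy over the stratified folds for each candidate.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=cv).mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the best candidate on the full training set and check it on the held-out test set.
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)

print(cv_scores)
print(best_name, test_accuracy)
```

Keeping the test set out of the cross-validation loop means the final accuracy is measured on rows that played no part in choosing the model.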
8. Hyperparameter Tuning of the Model 12
• Grid Search with cross validation
• Random Search with cross validation
• Bayesian Search with cross validation
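Grid search and random search with cross validation are both available in Scikit-Learn; a minimal sketch follows. The model, the parameter ranges and the synthetic data are assumptions for illustration. Bayesian search is left out here because Scikit-Learn itself does not ship one; it normally requires a separate third-party package.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Small synthetic dataset so the searches finish quickly.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Grid search: evaluates every combination (2 x 3 = 6 candidates) with 3-fold CV.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, 5, None]}
grid_search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid_search.fit(X, y)

# Random search: samples a fixed number of candidates (n_iter) from distributions.
param_distributions = {"n_estimators": randint(10, 60), "max_depth": [3, 5, None]}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=4,
    cv=3,
    random_state=0,
)
random_search.fit(X, y)

print("Grid search best:  ", grid_search.best_params_, grid_search.best_score_)
print("Random search best:", random_search.best_params_, random_search.best_score_)
```

Grid search cost grows multiplicatively with each added parameter, while random search keeps the budget fixed at n_iter, which is why it is often preferred for larger search spaces.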
9. Model Integration 7
• Using Flask API
• Using Streamlit