1. One-hot encoding is preferred by most ML engineers even though it increases the
dimensionality of the dataset. Justify this with an example.
2. Demonstrate the steps for performing forward feature selection and backward feature
selection. How will you check which one is better suited?
3. Suppose you have a dataset and need to apply feature engineering. How will you decide
whether to apply feature extraction or feature selection?

4. Suppose you have collected data through questionnaires and need to preprocess it.
Answer the following questions:

a. If the dataset has a date column, how will you perform feature engineering on it?

b. When would you remove correlated variables?

c. How do you transform a skewed distribution into a normal distribution?

d. How do you handle wrong values?

e. How do you perform feature engineering on unknown features?

5. Consider the following dataset and answer the queries below:

Region   age   income   online_shopper
India    49    86400    no
Brazil   32    56700    yes
USA      35    64800    no
Brazil   43    73200    no
USA      45             yes
India    40    69600    yes
Brazil         62400    no
India    53    94800    yes
USA      55    99600    no
India    42    80400    yes

a. The data is in the file data.txt. Briefly describe the steps for reading it and creating a pandas DataFrame.

b. Find the missing values and impute them using an appropriate method.

c. Identify the categorical features and apply an appropriate encoding method.

d. Which of the features should be chosen for model building? Why, and how?

e. How will you generate a heat map to check correlation?

f. Will you generate any new features? If yes, why and how?
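
One possible way to approach parts (a), (b), (c) and (e) of question 5 is sketched below in Python with pandas. It is only an illustration under stated assumptions: the sheet does not give the layout of data.txt, so the sketch assumes a whitespace-separated file in which the two missing cells are written as NA, with the last column named online_shopper. Median imputation and one-hot encoding are one reasonable choice among several, not the only valid answer.

```python
import io

import pandas as pd

# Assumption: data.txt is whitespace-separated and marks the two missing
# cells with "NA"; the question sheet does not specify the real layout.
RAW = """\
Region age income online_shopper
India 49 86400 no
Brazil 32 56700 yes
USA 35 64800 no
Brazil 43 73200 no
USA 45 NA yes
India 40 69600 yes
Brazil NA 62400 no
India 53 94800 yes
USA 55 99600 no
India 42 80400 yes
"""

# (a) Read the text into a DataFrame. With a real file, pass the path
#     "data.txt" instead of the in-memory buffer.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", na_values=["NA"])

# (b) Count missing values per column, then fill numeric gaps with the
#     column median (one common choice; the mean is another).
missing_counts = df.isna().sum()
for col in ["age", "income"]:
    df[col] = df[col].fillna(df[col].median())

# (c) One-hot encode the nominal Region column and map the yes/no
#     column to 1/0.
encoded = pd.get_dummies(df, columns=["Region"], dtype=int)
encoded["online_shopper"] = encoded["online_shopper"].map({"yes": 1, "no": 0})

# (e) Pairwise correlation matrix: the table a heat map would visualise.
corr = encoded.corr()

print(missing_counts)
print(corr.round(2))
```

The resulting `corr` matrix could then be passed to a plotting routine such as seaborn's `heatmap` function to inspect the correlations visually.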
