Table of Contents
STATISTICS ........................................................................................................................................................... 6
Q1. WHAT IS THE CENTRAL LIMIT THEOREM AND WHY IS IT IMPORTANT? ........................................................................ 6
Q2. WHAT IS SAMPLING? HOW MANY SAMPLING METHODS DO YOU KNOW? ................................................................... 7
Q3. WHAT IS THE DIFFERENCE BETWEEN TYPE I AND TYPE II ERRORS? ............................................................................... 9
Q4. WHAT IS LINEAR REGRESSION? WHAT DO THE TERMS P-VALUE, COEFFICIENT, AND R-SQUARED VALUE MEAN? WHAT IS THE
SIGNIFICANCE OF EACH OF THESE COMPONENTS? ................................................................................................................. 9
Q5. WHAT ARE THE ASSUMPTIONS REQUIRED FOR LINEAR REGRESSION? ........................................................................ 10
Q6. WHAT IS A STATISTICAL INTERACTION? .............................................................................................................. 10
Q7. WHAT IS SELECTION BIAS? .............................................................................................................................. 11
Q8. WHAT IS AN EXAMPLE OF A DATA SET WITH A NON-GAUSSIAN DISTRIBUTION? .......................................................... 11
DATA SCIENCE .................................................................................................................................................... 12
Q1. WHAT IS DATA SCIENCE? LIST THE DIFFERENCES BETWEEN SUPERVISED AND UNSUPERVISED LEARNING. ......................... 12
Q2. WHAT IS SELECTION BIAS? ............................................................................................................................. 12
Q3. WHAT IS BIAS-VARIANCE TRADE-OFF? ............................................................................................................... 12
Q4. WHAT IS A CONFUSION MATRIX? ..................................................................................................................... 13
Q5. WHAT IS THE DIFFERENCE BETWEEN “LONG” AND “WIDE” FORMAT DATA?............................................................... 14
Q6. WHAT DO YOU UNDERSTAND BY THE TERM NORMAL DISTRIBUTION? ...................................................................... 15
Q7. WHAT IS CORRELATION AND COVARIANCE IN STATISTICS?...................................................................................... 15
Q8. WHAT IS THE DIFFERENCE BETWEEN POINT ESTIMATES AND CONFIDENCE INTERVAL? ................................................. 16
Q9. WHAT IS THE GOAL OF A/B TESTING? ............................................................................................................... 16
Q10. WHAT IS P-VALUE? ....................................................................................................................................... 16
Q11. IN ANY 15-MINUTE INTERVAL, THERE IS A 20% PROBABILITY THAT YOU WILL SEE AT LEAST ONE SHOOTING STAR. WHAT IS THE
PROBABILITY THAT YOU SEE AT LEAST ONE SHOOTING STAR IN THE PERIOD OF AN HOUR? ........................................................... 16
Q12. HOW CAN YOU GENERATE A RANDOM NUMBER BETWEEN 1 AND 7 WITH ONLY A DIE? .................................................. 17
Q13. A CERTAIN COUPLE TELLS YOU THAT THEY HAVE TWO CHILDREN, AT LEAST ONE OF WHICH IS A GIRL. WHAT IS THE
PROBABILITY THAT THEY HAVE TWO GIRLS? ....................................................................................................................... 17
Q14. A JAR HAS 1000 COINS, OF WHICH 999 ARE FAIR AND 1 IS DOUBLE HEADED. PICK A COIN AT RANDOM AND TOSS IT 10
TIMES. GIVEN THAT YOU SEE 10 HEADS, WHAT IS THE PROBABILITY THAT THE NEXT TOSS OF THAT COIN IS ALSO A HEAD? ................. 17
Q15. WHAT DO YOU UNDERSTAND BY STATISTICAL POWER OF SENSITIVITY AND HOW DO YOU CALCULATE IT? ......................... 18
Q16. WHY IS RE-SAMPLING DONE? ......................................................................................................................... 18
Q17. WHAT ARE THE DIFFERENCES BETWEEN OVER-FITTING AND UNDER-FITTING? ............................................................ 19
Q18. HOW TO COMBAT OVERFITTING AND UNDERFITTING? ......................................................................................... 19
Q19. WHAT IS REGULARIZATION? WHY IS IT USEFUL? .................................................................................................. 20
Q20. WHAT IS THE LAW OF LARGE NUMBERS? .......................................................................................................... 20
Q21. WHAT ARE CONFOUNDING VARIABLES? ........................................................................................................... 20
Q22. WHAT ARE THE TYPES OF BIASES THAT CAN OCCUR DURING SAMPLING? ............................................................... 20
Q23. WHAT IS SURVIVORSHIP BIAS? ........................................................................................................................ 20
Q24. WHAT IS SELECTION BIAS? WHAT IS UNDER COVERAGE BIAS? ............................................................................... 21
Q25. EXPLAIN HOW A ROC CURVE WORKS. .............................................................................................................. 21
Q26. WHAT IS TF/IDF VECTORIZATION? .................................................................................................................. 22
Q27. WHY DO WE GENERALLY USE A SOFT-MAX (OR SIGMOID) NON-LINEARITY FUNCTION AS THE LAST OPERATION IN A NETWORK? WHY
RELU IN AN INNER LAYER?............................................................................................................................................ 22
DATA ANALYSIS.................................................................................................................................................. 23
Q1. PYTHON OR R – WHICH ONE WOULD YOU PREFER FOR TEXT ANALYTICS? ................................................................. 23
Q2. HOW DOES DATA CLEANING PLAY A VITAL ROLE IN THE ANALYSIS? ........................................................................... 23
Q3. DIFFERENTIATE BETWEEN UNIVARIATE, BIVARIATE AND MULTIVARIATE ANALYSIS........................................................ 23
Q4. EXPLAIN STAR SCHEMA. ................................................................................................................................. 23
Q5. WHAT IS CLUSTER SAMPLING? ........................................................................................................................ 23
Steve Nouri
Q6. DESCRIBE THE STRUCTURE OF ARTIFICIAL NEURAL NETWORKS. ............................................................................. 57
Q7. HOW ARE WEIGHTS INITIALIZED IN A NETWORK? ............................................................................................... 57
Q8. WHAT IS THE COST FUNCTION? ....................................................................................................................... 58
Q9. WHAT ARE HYPERPARAMETERS? ..................................................................................................................... 58
Q10. WHAT WILL HAPPEN IF THE LEARNING RATE IS SET INACCURATELY (TOO LOW OR TOO HIGH)? ................................... 58
Q11. WHAT IS THE DIFFERENCE BETWEEN EPOCH, BATCH, AND ITERATION IN DEEP LEARNING? ......................................... 58
Q12. WHAT ARE THE DIFFERENT LAYERS IN A CNN? .................................................................................................. 58
Convolution Operation ...................................................................................................................................... 60
Pooling Operation ............................................................................................................................................. 62
Classification ..................................................................................................................................................... 63
Training ............................................................................................................................................................. 64
Testing ............................................................................................................................................................... 65
Q13. WHAT IS POOLING IN A CNN, AND HOW DOES IT WORK? .................................................................................. 65
Q14. WHAT ARE RECURRENT NEURAL NETWORKS (RNNS)? ........................................................................................ 65
Parameter Sharing ............................................................................................................................................ 67
Deep RNNs ......................................................................................................................................................... 68
Bidirectional RNNs ............................................................................................................................................. 68
Recursive Neural Network ................................................................................................................................. 69
Encoder Decoder Sequence to Sequence RNNs ................................................................................................. 70
LSTMs ................................................................................................................................................................ 70
Q15. HOW DOES AN LSTM NETWORK WORK? ......................................................................................................... 70
Recurrent Neural Networks ............................................................................................................................... 71
The Problem of Long-Term Dependencies ......................................................................................................... 72
LSTM Networks.................................................................................................................................................. 73
The Core Idea Behind LSTMs ............................................................................................................................. 74
Q16. WHAT IS A MULTI-LAYER PERCEPTRON (MLP)? ................................................................................................. 75
Q17. EXPLAIN GRADIENT DESCENT. ......................................................................................................................... 76
Q18. WHAT ARE EXPLODING GRADIENTS? ................................................................................................................... 77
Solutions ............................................................................................................................................................ 78
Q19. WHAT ARE VANISHING GRADIENTS? ................................................................................................................... 78
Solutions ............................................................................................................................................................ 79
Q20. WHAT IS BACK PROPAGATION? EXPLAIN HOW IT WORKS. ................................................................................. 79
Q21. WHAT ARE THE VARIANTS OF BACK PROPAGATION? ............................................................................................ 79
Q22. WHAT ARE THE DIFFERENT DEEP LEARNING FRAMEWORKS? .................................................................................. 81
Q23. WHAT IS THE ROLE OF THE ACTIVATION FUNCTION? ............................................................................................ 81
Q24. NAME A FEW MACHINE LEARNING LIBRARIES FOR VARIOUS PURPOSES..................................................................... 81
Q25. WHAT IS AN AUTO-ENCODER? ........................................................................................................................ 81
Q26. WHAT IS A BOLTZMANN MACHINE? ................................................................................................................. 82
Q27. WHAT IS DROPOUT AND BATCH NORMALIZATION? ............................................................................................. 83
Q28. WHY IS TENSORFLOW THE MOST PREFERRED LIBRARY IN DEEP LEARNING? ............................................................. 83
Q29. WHAT DO YOU MEAN BY TENSOR IN TENSORFLOW? .......................................................................................... 83
Q30. WHAT IS THE COMPUTATIONAL GRAPH? ........................................................................................................... 83
Q31. HOW IS LOGISTIC REGRESSION DONE? ............................................................................................................... 83
MISCELLANEOUS ................................................................................................................................................ 84
Q1. EXPLAIN THE STEPS IN MAKING A DECISION TREE. ................................................................................................. 84
Q2. HOW DO YOU BUILD A RANDOM FOREST MODEL? ................................................................................................ 84
Q3. DIFFERENTIATE BETWEEN UNIVARIATE, BIVARIATE, AND MULTIVARIATE ANALYSIS....................................................... 85
Univariate .......................................................................................................................................................... 85
Bivariate ............................................................................................................................................................ 85
Multivariate ....................................................................................................................................................... 85
Q4. WHAT ARE THE FEATURE SELECTION METHODS USED TO SELECT THE RIGHT VARIABLES? .............................................. 86
Filter Methods ................................................................................................................................................... 86
Wrapper Methods ............................................................................................................................................. 86
Q5. IN YOUR CHOICE OF LANGUAGE, WRITE A PROGRAM THAT PRINTS THE NUMBERS RANGING FROM ONE TO 50. BUT FOR
MULTIPLES OF THREE, PRINT "FIZZ" INSTEAD OF THE NUMBER AND FOR THE MULTIPLES OF FIVE, PRINT "BUZZ." FOR NUMBERS WHICH
ARE MULTIPLES OF BOTH THREE AND FIVE, PRINT "FIZZBUZZ." .............................................................................................. 86
Q6. YOU ARE GIVEN A DATA SET CONSISTING OF VARIABLES WITH MORE THAN 30 PERCENT MISSING VALUES. HOW WILL YOU
DEAL WITH THEM?....................................................................................................................................................... 87
Q7. FOR THE GIVEN POINTS, HOW WILL YOU CALCULATE THE EUCLIDEAN DISTANCE IN PYTHON? ........................................ 87
Q8. WHAT IS DIMENSIONALITY REDUCTION, AND WHAT ARE ITS BENEFITS? ................................................................. 87
Q9. HOW WILL YOU CALCULATE EIGENVALUES AND EIGENVECTORS OF THE FOLLOWING 3X3 MATRIX? ................................. 88
Q10. HOW SHOULD YOU MAINTAIN A DEPLOYED MODEL? ............................................................................................ 88
Q11. HOW CAN TIME-SERIES DATA BE DECLARED AS STATIONARY? .................................................................................. 88
Q12. 'PEOPLE WHO BOUGHT THIS ALSO BOUGHT...' RECOMMENDATIONS SEEN ON AMAZON ARE A RESULT OF WHICH ALGORITHM? ..... 89
Q13. WHAT IS A GENERATIVE ADVERSARIAL NETWORK?.............................................................................................. 89
Q14. YOU ARE GIVEN A DATASET ON CANCER DETECTION. YOU HAVE BUILT A CLASSIFICATION MODEL AND ACHIEVED AN ACCURACY
OF 96 PERCENT. WHY SHOULDN'T YOU BE HAPPY WITH YOUR MODEL PERFORMANCE? WHAT CAN YOU DO ABOUT IT? ................... 90
Q15. BELOW ARE THE EIGHT ACTUAL VALUES OF THE TARGET VARIABLE IN THE TRAIN FILE. WHAT IS THE ENTROPY OF THE TARGET
VARIABLE? [0, 0, 0, 1, 1, 1, 1, 1] .................................................................................................................................. 90
Q16. WE WANT TO PREDICT THE PROBABILITY OF DEATH FROM HEART DISEASE BASED ON THREE RISK FACTORS: AGE, GENDER, AND
BLOOD CHOLESTEROL LEVEL. WHAT IS THE MOST APPROPRIATE ALGORITHM FOR THIS CASE? CHOOSE THE CORRECT OPTION: ........... 90
Q17. AFTER STUDYING THE BEHAVIOR OF A POPULATION, YOU HAVE IDENTIFIED FOUR SPECIFIC INDIVIDUAL TYPES THAT ARE
VALUABLE TO YOUR STUDY. YOU WOULD LIKE TO FIND ALL USERS WHO ARE MOST SIMILAR TO EACH INDIVIDUAL TYPE. WHICH
ALGORITHM IS MOST APPROPRIATE FOR THIS STUDY?.......................................................................................................... 90
Q18. YOU HAVE RUN THE ASSOCIATION RULES ALGORITHM ON YOUR DATASET, AND THE TWO RULES {BANANA, APPLE} => {GRAPE}
AND {APPLE, ORANGE} => {GRAPE} HAVE BEEN FOUND TO BE RELEVANT. WHAT ELSE MUST BE TRUE? CHOOSE THE RIGHT ANSWER: .. 90
Q19. YOUR ORGANIZATION HAS A WEBSITE WHERE VISITORS RANDOMLY RECEIVE ONE OF TWO COUPONS. IT IS ALSO POSSIBLE THAT
VISITORS TO THE WEBSITE WILL NOT RECEIVE A COUPON. YOU HAVE BEEN ASKED TO DETERMINE IF OFFERING A COUPON TO WEBSITE
VISITORS HAS ANY IMPACT ON THEIR PURCHASE DECISIONS. WHICH ANALYSIS METHOD SHOULD YOU USE?.................................... 91
Q20. WHAT ARE FEATURE VECTORS? ...................................................................................................................... 91
Q21. WHAT IS ROOT CAUSE ANALYSIS? ..................................................................................................................... 91
Q22. DO GRADIENT DESCENT METHODS ALWAYS CONVERGE TO SIMILAR POINTS? ............................................................. 91
Q23. WHAT ARE THE MOST POPULAR CLOUD SERVICES USED IN DATA SCIENCE? .............................................................. 91
Q24. WHAT IS A CANARY DEPLOYMENT? .................................................................................................................. 92
Q25. WHAT IS A BLUE GREEN DEPLOYMENT? ............................................................................................................ 93