

Test 1 & 2: Mixed Questions and Options. BATC_631_ Data Mining

Question No. Question Ans Options
1. Least squares regression works by choosing the unique regression line that ____.
   Ans: 1
   Options: (1) minimizes the sum of squared residuals over all data points (2) maximizes the sum of squared residuals over all data points (3) minimizes the sum of squared residuals over some data points (4) none of these
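
For Q1, a minimal Python sketch (an illustration, not part of the original test) of simple linear regression: the closed-form slope and intercept below are the values that minimize the sum of squared residuals over all data points. The helper names `fit_least_squares` and `sse` and the sample data are hypothetical.

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) minimizing sum((y - (b0 + b1*x))**2) over all points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # The least squares line always passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
b0, b1 = fit_least_squares(xs, ys)
# Nudging the slope away from the least squares value can only increase SSE.
assert sse(xs, ys, b0, b1) <= sse(xs, ys, b0, b1 + 0.1)
```

Shifting the slope or intercept away from the fitted values increases SSE, which is what makes the least squares line unique.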

2. In general, we ____ the null hypothesis if the p-value is less than the level of significance α (a small preset value, say 0.05).
   Ans: 2
   Options: (1) accept (2) reject (3) more tests required (4) none of these

3. Hypothesis testing with too many variables may result in ____.
   Ans: 2
   Options: (1) understanding the data (2) overfitting the data (3) perfectly fitting the data (4) none of these

4. The result of min-max normalization is always in the range ____.
   Ans: 2
   Options: (1) -1 to 1 (2) 0 to 1 (3) 0 to 5 (4) none of these
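
A minimal sketch of min-max normalization for Q4, assuming the usual formula X* = (X - min(X)) / (max(X) - min(X)). The function name and data are hypothetical, and the guard for a constant column is an added assumption.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: there is no spread to rescale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


scaled = min_max_normalize([10, 20, 25, 40])
print(scaled)  # [0.0, 0.3333333333333333, 0.5, 1.0]
```

Values of new records that fall outside the original minimum and maximum would map outside 0 to 1, so the range guarantee applies to the data used to compute the minimum and maximum.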

5. ____ is known as the standard error of the estimate.
   Ans: 1
   Options: (1) RMSE (root mean square error) (2) MSE (mean square error) (3) MAE (mean absolute error) (4) none of these
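
To make the error measures in Q5 concrete, a hedged sketch computing MSE, RMSE and MAE as plain averages over n, plus the regression standard error of the estimate s = sqrt(SSE / (n - m - 1)) for a model with m predictors. In regression output, MSE is usually SSE / (n - m - 1), and its square root is the standard error of the estimate. The function name, the `num_predictors` parameter and the data are hypothetical.

```python
import math


def error_measures(actual, predicted, num_predictors=1):
    """Return MSE, RMSE, MAE (averaged over n) and the regression
    standard error of the estimate s = sqrt(SSE / (n - m - 1))."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    mse = sse / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # MAE weights every error linearly, so a large error counts less than in MSE.
        "MAE": sum(abs(r) for r in residuals) / n,
        "s": math.sqrt(sse / (n - num_predictors - 1)),
    }


m = error_measures([3, 5, 7, 9], [2, 5, 8, 13])  # hypothetical actual vs predicted
print(m)
```

The single large residual (-4) inflates MSE and RMSE much more than MAE, which is why MAE treats all errors equally, outliers or not.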

6. Generally, a high-complexity model has ____ bias (in terms of the error rate on the training set) and ____ variance.
   Ans: 2
   Options: (1) high, low (2) low, high (3) low, low (4) high, high

7. According to the minimum description length principle, the best representation (or description) of a model or body of data is the one that ____ the information required (in bits) to encode (i) the model and (ii) the exceptions to the model.
   Ans: 1
   Options: (1) minimizes (2) maximizes (3) neither minimizes nor maximizes (4) none of these

8. The factor solutions provided by factor analysis are not invariant to ____.
   Ans: 1
   Options: (1) transformations (2) rotation (3) scaling (4) none of these

9. The proportions of false positives and false negatives are the additive inverses of the proportions of ____ and ____, respectively.
   Ans: 1
   Options: (1) true positives, true negatives (2) true negatives, true positives (3) false positives, false negatives (4) none of these
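
A sketch for Q9, assuming the convention that these proportions are taken within the predicted classes: proportion of false positives = FP / (TP + FP) and proportion of false negatives = FN / (TN + FN). Under that assumption each is 1 minus its "true" counterpart. The function name and counts are hypothetical.

```python
def predicted_class_proportions(tp, fp, tn, fn):
    """Proportions within predicted positives and predicted negatives (assumed convention)."""
    prop_tp = tp / (tp + fp)  # true positives among all predicted positives
    prop_fp = fp / (tp + fp)  # false positives among all predicted positives
    prop_tn = tn / (tn + fn)  # true negatives among all predicted negatives
    prop_fn = fn / (tn + fn)  # false negatives among all predicted negatives
    return prop_tp, prop_fp, prop_tn, prop_fn


prop_tp, prop_fp, prop_tn, prop_fn = predicted_class_proportions(tp=40, fp=10, tn=45, fn=5)
print(prop_fp, 1 - prop_tp)  # false positives pair with true positives
print(prop_fn, 1 - prop_tn)  # false negatives pair with true negatives
```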

10. A rule of thumb is to flag observations whose standardized residuals exceed ____ in absolute value as outliers.
   Ans: 1
   Options: (1) 2 (2) 3 (3) 5 (4) 1

11. For most real-world data, skewness is ____.
   Ans: 1
   Options: (1) positive (2) negative (3) zero (4) none of these
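
A small sketch for Q11 using Pearson's median skewness, 3 * (mean - median) / standard deviation, which is one common way to measure skewness (an assumption; other definitions exist). Right-skewed data have a mean above the median, giving a positive value. The income figures are hypothetical.

```python
import statistics


def pearson_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / sample standard deviation."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)


# Hypothetical right-skewed sample (e.g., incomes in thousands): a long tail of large values.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 60, 150]
print(pearson_skewness(incomes) > 0)  # True
```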

12. It must be stressed that model evaluation techniques should be performed on the ____ data set, rather than on the training set or on the data set as a whole.
   Ans: 1
   Options: (1) test (2) verification (3) training (4) none of these

13. ____ the sample size is the only way to decrease the margin of error while maintaining a constant level of confidence.
   Ans: 1
   Options: (1) increasing (2) decreasing (3) keeping constant (4) none of these
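
For Q13, a sketch of the margin of error for a mean, z * sigma / sqrt(n), assuming sigma is known so the normal quantile applies (with an estimated standard deviation, a t quantile would be used instead). With the confidence level held fixed, the only remaining lever is n. The function name and sigma = 10 are hypothetical.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) for a mean, sigma assumed known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


# Same 95% confidence and hypothetical sigma = 10, growing sample sizes.
for n in (25, 100, 400):
    print(n, round(margin_of_error(10, n), 3))
```

Because the margin shrinks with 1/sqrt(n), quadrupling the sample size halves it.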

14. In ANOVA for a continuous variable, as an extension of two-sample t-tests, if we have a three-fold partition of the data set, ANOVA tests whether the ____ value of the continuous variable is the same across the subsets of data.
   Ans: 1
   Options: (1) mean (2) variance (3) error (4) none of these

15. ____ is also a good estimate of the overall variance, but only on the condition that the null hypothesis is true.
   Ans: 2
   Options: (1) MSE (2) MSR (3) RMSE (4) none of these

16. x and y are ____ if, as the value of x increases, the value of y tends to decrease.
   Ans: 2
   Options: (1) positively correlated (2) negatively correlated (3) uncorrelated (4) all of these

17. In the perfect case (SSR = SST), what are the values of the coefficient of determination and SSE, respectively?
   Ans: 3
   Options: (1) 0, 1 (2) 1, 1 (3) 1, 0 (4) 1, -1
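
A sketch for Q17 computing SST, SSR and SSE for a least squares line, with r^2 = SSR / SST. The function name is hypothetical, and the data are hypothetical points lying exactly on y = 3 + 2x, so every residual is zero.

```python
def regression_sums(xs, ys):
    """Return SST, SSR, SSE and r^2 = SSR / SST for a least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variability
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left unexplained
    return sst, ssr, sse, ssr / sst


# Perfect case: all points on the line, so SSR = SST.
sst, ssr, sse, r2 = regression_sums([0, 1, 2, 3], [3, 5, 7, 9])
print(r2, sse)
```

Since SST = SSR + SSE, SSR = SST forces SSE = 0 and r^2 = 1.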

18. In the ____ task, analysts try to find ways to describe patterns and trends lying within the data.
   Ans: 3
   Options: (1) estimation (2) prediction (3) description (4) classification

This study source was downloaded by 100000824045226 from on 01-03-2023 00:21:17 GMT -06:00

19. Data mining is a process of ____.
   Ans: 3
   Options: (1) gathering data (2) plotting data (3) finding useful patterns and trends in large data sets (4) filtering data

20. In a regression model, changing the order in which the variables enter the model changes nothing except the ____.
   Ans: 1
   Options: (1) sequential sum of squares (2) sum of squares (3) difference of squares (4) none of these

21. Decreasing the confidence level to reduce the margin of error for a constant sample size is always ____.
   Ans: 3
   Options: (1) recommended (2) of no effect (3) not recommended (4) none of these

22. Extrapolation refers to estimates and predictions of the target variable made using the regression equation with values of the predictor variable outside the range of the values of ____ in the data set.
   Ans: 1
   Options: (1) x (2) y (3) both (1) and (2) (4) none of these
23. For a multinomial variable, the test generally used is the test for ____.
   Ans: 3
   Options: (1) difference in means (2) difference in proportions (3) homogeneity of proportions (4) none of these

24. Most data mining algorithms search for patterns and structures among all the variables with respect to the ____.
   Ans: 3
   Options: (1) error (2) model (3) target (4) none of these

25. A 95% confidence interval about the mean number of customer service calls for all customers indicates that ____.
   Ans: 1
   Options: (1) we are 95% confident that the population mean number of customer service calls for all customers falls within some range (2) we are 95% confident that the population mean number of customer service calls for all customers falls within some range (3) we are 5% confident that the population mean number of customer service calls for all customers falls within some range (4) none of these

26. Generally, the F-test is used to assess the significance of the regression model; the F-test considers the ____ relationship between the target variable y and the set of predictors taken as a whole, not as individual predictors.
   Ans: 1
   Options: (1) linear (2) non-linear (3) both (1) and (2) (4) none of these

27. For a continuous variable, the two-sample t-test is generally used for ____.
   Ans: 1
   Options: (1) difference in means (2) difference in proportions (3) homogeneity of proportions (4) none of these
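
For Q27, a sketch of a two-sample t statistic for the difference in means, here the unpooled (Welch) form t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2); that choice, the function name and the data are assumptions. Turning t into a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available.

```python
import math
import statistics


def two_sample_t(sample1, sample2):
    """Unpooled two-sample t statistic for the difference in means."""
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)  # sample variances
    return (m1 - m2) / math.sqrt(v1 / len(sample1) + v2 / len(sample2))


# Hypothetical check that training and test partitions look similar on a continuous variable.
train = [5.1, 4.9, 5.6, 5.0, 5.3, 4.8]
test = [5.2, 5.0, 5.4, 4.9, 5.1, 5.3]
print(two_sample_t(train, test))
```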

28. According to CRISP-DM, how many phases are there in the data mining project life cycle?
   Ans: 2
   Options: (1) five (2) six (3) four (4) seven

29. In data mining in general, the data analyst has ____.
   Ans: 2
   Options: (1) an a priori hypothesis in mind that needs to be validated (2) no a priori hypothesis; the task is to find actionable inferences from the data (3) an existing model of the data, which only needs to be run on the data set to get output (4) none of these

30. Generally, increasing model complexity makes the model perform well on the training set but may result in ____ on the test set.
   Ans: 1
   Options: (1) overfitting (2) underfitting (3) performing perfectly well (4) none of these

1. In ANOVA, to reject the null hypothesis, the F statistic (F-data) will be ____ when between-sample variability is much ____ than within-sample variability.
   Ans: 1
   Options: (1) large, greater (2) small, greater (3) small, lesser (4) large, lesser

2. For most real-world data, skewness is ____.
   Ans: 1
   Options: (1) positive (2) negative (3) zero (4) none of these

3. A 95% confidence interval about the mean number of customer service calls for all customers indicates that ____.
   Ans: 1
   Options: (1) we are 95% confident that the population mean number of customer service calls for all customers falls within some range (2) we are 95% confident that the population mean number of customer service calls for all customers falls within some range (3) we are 5% confident that the population mean number of customer service calls for all customers falls within some range (4) none of these

4. In real-world applications, data mining methods generally have widespread applicability, and ____.
   Ans: 2
   Options: (1) each method is distinct and there is no correlation between methods (2) these are multiple methods to be used for a real-world task (3) there are multiple methods to be used for a real-world task (4) all of these


5. Which of the following binning methods is best for numerical variables?
   Ans: 1
   Options: (1) equal-width binning (2) equal-frequency binning (3) binning by clustering (4) none of these
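
A sketch of the first two binning methods in Q5 for a numerical variable with k bins. The function names and data are hypothetical, and binning by clustering (which would need a clustering algorithm such as k-means) is omitted.

```python
def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k bins of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: a single bin
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin indices so each bin holds about len(values) / k records."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


data = [1, 2, 2, 3, 4, 6, 7, 9, 10, 12]  # hypothetical numerical variable
print(equal_width_bins(data, 3))      # [0, 0, 0, 0, 0, 1, 1, 2, 2, 2]
print(equal_frequency_bins(data, 3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```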

6. ____ is always a good estimate of the overall variance, regardless of whether the null hypothesis is true or not.
   Ans: 1
   Options: (1) MSE (2) MSR (3) RMSE (4) none of these

7. Principal component analysis is used for ____.
   Ans: 3
   Options: (1) dimensionality reduction of a given set of attributes (2) finding correlations among a set of attributes (3) both (1) and (2) (4) none of these

8. Which of the following is used as the standard error of the estimate for linear regression models?
   Ans: 1
   Options: (1) RMSE (2) MSE (3) MAE (4) none of these

9. In general, a user-defined composite is simply a ____ combination of the variables, which combines several variables into a single composite measure.
   Ans: 3
   Options: (1) homogeneous (2) superposition (3) linear (4) none of these

10. In general, we ____ the null hypothesis if the p-value is less than the level of significance α (a small preset value, say 0.05).
   Ans: 2
   Options: (1) accept (2) reject (3) more tests required (4) none of these

11. Predictive analytics is the process of ____.
   Ans: 4
   Options: (1) just cleaning data (2) just compressing data (3) guessing about present output without any data (4) information retrieval to make useful predictions about future outcomes

12. Hypothesis testing with too many variables may result in ____.
   Ans: 2
   Options: (1) underfitting the data (2) overfitting the data (3) perfectly fitting the data (4) none of these

13. In ANOVA for a continuous variable, as an extension of two-sample t-tests, if we have a three-fold partition of the data set, ANOVA tests whether the ____ value of the continuous variable is the same across the subsets of data.
   Ans: 1
   Options: (1) mean (2) variance (3) error (4) none of these

14. For a flag variable, the two-sample Z-test is generally used for ____.
   Ans: 2
   Options: (1) difference in means (2) difference in proportions (3) homogeneity of proportions (4) none of these
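
For Q14, together with the p-value decision rule used throughout these questions, a hedged sketch of the two-sample Z test for a difference in proportions, using the pooled proportion in the standard error. The function name, the alpha = 0.05 default and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-sample Z test for a difference in proportions (pooled standard error).
    Returns (z, two-tailed p-value, reject_null)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision rule: reject H0 when the p-value is below the significance level alpha.
    return z, p_value, p_value < alpha


# Hypothetical flag variable (e.g., churn = yes) counted in two partitions of 100 records each.
print(two_proportion_z_test(60, 100, 40, 100))
```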

15. In which phase of CRISP-DM is the report generated?
   Ans: 4
   Options: (1) data understanding phase (2) modeling phase (3) evaluation phase (4) deployment phase

16. Generally, increasing model complexity makes the model perform well on the training set but may result in ____ on the test set.
   Ans: 1
   Options: (1) overfitting (2) underfitting (3) performing perfectly well (4) none of these

17. The proportions of false positives and false negatives are the additive inverses of the proportions of ____ and ____, respectively.
   Ans: 1
   Options: (1) true positives, true negatives (2) true negatives, true positives (3) false positives, false negatives (4) none of these

18. The sample mean is equal to the population mean ____.
   Ans: 2
   Options: (1) always (2) when the sample represents the population (3) it can never be equal (4) none of these

19. Outliers may represent ____ in data entry.
   Ans: 3
   Options: (1) peaks (2) means (3) errors (4) none of these


20. In ANOVA, the F statistic (F-data) is calculated as the ratio ____.
   Ans: 3
   Options: (1) MSTR/MSTE (2) MSE/MSTR (3) MSTR/MSE (4) MSTE/MSE
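
For Q20, a sketch of one-way ANOVA computing MSTR (between-sample variability), MSE (within-sample variability) and F = MSTR / MSE. The function name and the three groups are hypothetical. A p-value would come from the F distribution with (k - 1, N - k) degrees of freedom, for example via `scipy.stats.f_oneway` if SciPy is available.

```python
import statistics


def one_way_anova_f(groups):
    """Return (MSTR, MSE, F = MSTR / MSE) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # Between-sample (treatment) sum of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-sample (error) sum of squares.
    sse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return mstr, mse, mstr / mse


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 1.0, 27.0)
```

Here the group means differ a lot relative to the spread inside each group, so F is large, which is the situation in which the null hypothesis of equal means would be rejected.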

21. Generally, a low-complexity model has ____ bias and ____ variance.
   Ans: 1
   Options: (1) high, low (2) low, high (3) low, low (4) high, high

22. Data mining is a process of ____.
   Ans: 3
   Options: (1) gathering data (2) plotting data (3) finding useful patterns and trends in large data sets (4) filtering data

23. ____ the sample size is the only way to decrease the margin of error while maintaining a constant level of confidence.
   Ans: 1
   Options: (1) increasing (2) decreasing (3) keeping constant (4) none of these

24. ____ is known as the standard error of the estimate.
   Ans: 1
   Options: (1) RMSE (root mean square error) (2) MSE (mean square error) (3) MAE (mean absolute error) (4) none of these

25. The result of min-max normalization is always in the range ____.
   Ans: 2
   Options: (1) -1 to 1 (2) 0 to 1 (3) 0 to 5 (4) none of these

26. Generally in ANOVA, a small p-value and a large F statistic lead us to ____.
   Ans: 1
   Options: (1) reject the null hypothesis (2) accept the null hypothesis (3) reject the alternative hypothesis (4) accept the alternative hypothesis

27. According to the minimum description length principle, the best representation (or description) of a model or body of data is the one that ____ the information required (in bits) to encode (i) the model and (ii) the exceptions to the model.
   Ans: 1
   Options: (1) minimizes (2) maximizes (3) neither minimizes nor maximizes (4) none of these

28. A ____ confidence interval for μ is equivalent to a two-tailed hypothesis test for μ with level of significance α.
   Ans: 1
   Options: (1) 100(1 - α)% (2) 100(α - 1)% (3) 100(1 - α)% (4) 100/(α - 1)%
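
For Q28, a sketch checking the equivalence numerically: a z-based 100(1 - α)% interval for μ (assuming σ is known) and a two-tailed z test at level α retain or reject H0: μ = μ0 in exactly the same cases. Function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, alpha=0.05):
    """100(1 - alpha)% confidence interval for mu, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half


def z_test_rejects(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 at significance level alpha."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


# H0 is rejected exactly when mu0 falls outside the 95% interval (hypothetical sample).
lo, hi = z_interval(x_bar=1.6, sigma=1.3, n=100)
for mu0 in (1.3, 1.5, 1.7, 1.9):
    inside = lo <= mu0 <= hi
    assert z_test_rejects(1.6, mu0, 1.3, 100) == (not inside)
    print(mu0, "inside" if inside else "outside")
```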

29. A multiple regression model uses a ____ surface, such as a ____, to approximate the relationship between a continuous response (target) variable and a set of predictor variables.
   Ans: 1
   Options: (1) linear; plane or hyperplane (2) non-linear; parabola or hyperbola (3) both (1) and (2) (4) none of these

30. ____ will treat all errors equally, whether outliers or not, and thereby avoid the problem of undue influence of outliers.
   Ans: 1
   Options: (1) MAE (2) MSE (3) RMSE (4) none of these
