DSR Mid1

1. What are the applications of Data science?
Ans) -> Fraud and Risk Detection. -> Healthcare. -> Internet Search. -> Targeted Advertising.
-> Website Recommendations. -> Advanced Image Recognition. -> Speech Recognition.
-> Airline Route Planning. -> Search Engines. -> Transport.
2. Explain half space in linear algebra?

Ans) A half-space is a set defined by a single affine inequality. Precisely, a half-space in $\mathbb{R}^n$ is a set of the form
$H = \{x \in \mathbb{R}^n : a^T x \ge b\}$, where $a \in \mathbb{R}^n$, $a \ne 0$, and $b \in \mathbb{R}$.
-> A half-space is a convex set, the boundary of which is the hyperplane $\{x : a^T x = b\}$.
-> A half-space separates the whole space into two halves.
-> The complement of the half-space $H$ is the open half-space $\{x : a^T x < b\}$.
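As a minimal sketch of the definition above, the following Python snippet tests whether a point lies in a closed half-space, assuming the convention $a^T x \ge b$. The function name `in_half_space` and the sample values of `a` and `b` are illustrative assumptions, not part of the original notes.

```python
from typing import Sequence


def in_half_space(x: Sequence[float], a: Sequence[float], b: float) -> bool:
    """Return True if x lies in the closed half-space {x : a.x >= b}."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    return dot >= b


# Half-space x1 + x2 >= 1 in the plane (a and b chosen only for illustration).
a, b = (1.0, 1.0), 1.0
inside = in_half_space((2.0, 0.5), a, b)       # strictly inside the half-space
on_boundary = in_half_space((0.5, 0.5), a, b)  # on the boundary hyperplane
outside = in_half_space((0.0, 0.0), a, b)      # in the complementary open half-space
print(inside, on_boundary, outside)
```

Points on the boundary hyperplane count as members here because the inequality is not strict.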
3. Explain Type-I and Type-II error in hypothesis testing?

Ans) A type I error (false positive) occurs if an investigator rejects a null hypothesis that is actually true in the
population. Ex: Taking "you do not have coronavirus" as the null hypothesis, the test result says you have coronavirus, but you actually don't.
A type II error (false negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the
population. Ex: The test result says you don't have coronavirus, but you actually do.
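The two error types can be sketched as a small lookup on two facts: whether the null hypothesis is really true, and whether the test rejected it. The function name `classify_outcome` is a hypothetical helper used only for illustration.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Label a test decision given the (normally unknown) truth about H0."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"


# H0: "the patient does not have the disease".
# A healthy patient who tests positive: H0 is true but was rejected.
print(classify_outcome(null_is_true=True, null_rejected=True))
# A sick patient who tests negative: H0 is false but was not rejected.
print(classify_outcome(null_is_true=False, null_rejected=False))
```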
4. Differentiate between continuous and discrete random variables?

Ans) Discrete random variables take values from a finite or countably infinite set of isolated values. For example, the outcome of rolling a
die is a discrete random variable, as it can only land on one of six possible numbers.
Continuous random variables, on the other hand, can take on any value in a given interval.

COMPARISON                 | DISCRETE VARIABLE                                 | CONTINUOUS VARIABLE
Meaning                    | A variable that assumes a finite (or countable) number of isolated values. | A variable that assumes an infinite number of different values.
Range of specified numbers | Complete                                          | Incomplete
Values                     | Values are obtained by counting.                  | Values are obtained by measuring.
Classification             | Non-overlapping                                   | Overlapping
Assumes                    | Distinct or separate values.                      | Any value between two given values.
5. Write the application of data science in the present scenario?

Ans)
● Healthcare: Data science can identify and predict disease, and personalize healthcare recommendations.
● Transportation: Data science can optimize shipping routes in real-time.
● Sports: Data science can accurately evaluate athletes’ performance.
● Government: Data science can prevent tax evasion and predict incarceration rates.
● E-commerce: Data science can automate digital ad placement.
● Gaming: Data science can improve online gaming experiences.
● Social media: Data science can create algorithms to pinpoint compatible partners.
6. Explain about hyper-planes in n-dimensional vectors?
Ans) A hyperplane H is a linear subspace of a vector space V such that the basis of H has cardinality one less than the
cardinality of the basis for V. In other words, if V is an n-dimensional vector space, then H is an (n-1)-dimensional
subspace. Examples of hyperplanes in 2 dimensions are any straight line through the origin. In 3 dimensions, any
plane containing the origin is a hyperplane. More generally, a hyperplane can be a member of an affine family of (n-1)-dimensional
subspaces (the parallel translates of H), such that the entire space is partitioned into these affine subspaces. This family is stacked along the
unique direction (up to sign) that is perpendicular to the original hyperplane.
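A small sketch of this description, assuming the hyperplane is written as the set of points x with normal . x = offset. The names `dot` and `on_hyperplane` and the chosen normal vector are illustrative assumptions; integer coordinates are used so that the equality test is exact.

```python
from typing import Sequence


def dot(u: Sequence[int], v: Sequence[int]) -> int:
    """Standard dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def on_hyperplane(x: Sequence[int], normal: Sequence[int], offset: int = 0) -> bool:
    """True if x satisfies normal . x == offset.

    offset = 0 gives the hyperplane through the origin (a linear subspace);
    other offsets give its parallel, affine translates.
    """
    return dot(normal, x) == offset


normal = (0, 0, 1)             # illustrative choice: the hyperplane is the xy-plane in 3D
p, q = (1, 2, 0), (-3, 5, 0)   # two points on that plane
p_plus_q = tuple(pi + qi for pi, qi in zip(p, q))

print(on_hyperplane(p, normal), on_hyperplane(q, normal))
print(on_hyperplane(p_plus_q, normal))             # the subspace is closed under addition
print(on_hyperplane((1, 2, 4), normal))            # not on the plane through the origin
print(on_hyperplane((1, 2, 4), normal, offset=4))  # on the parallel translate z = 4
```

Every point of 3D space lies on exactly one translate z = c, which is the partition into affine subspaces mentioned above.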
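7. Explain different types of distribution functions in Random variables?

Ans) The probability distribution for a random variable describes how the probabilities are distributed over the values
of the random variable. For a discrete random variable, x, the probability distribution is defined by a probability mass
function, denoted by f(x), which gives the probability of each value. For a continuous random variable, it is defined by a
probability density function, whose area over an interval gives the probability of that interval. In both cases, the
cumulative distribution function F(x) = P(X ≤ x) gives the probability that the variable is at most x.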
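As an illustration, the fair die used as an example earlier in these notes has a probability mass function that can be written as a dictionary. This is only a sketch; the helper name `is_valid_pmf` is an assumption made for the example.

```python
from fractions import Fraction


def is_valid_pmf(pmf: dict) -> bool:
    """A PMF must be non-negative everywhere and sum to exactly 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1


# f(x) = 1/6 for each face of a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Probability of an event = sum of f(x) over the values in the event.
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)

print(is_valid_pmf(die_pmf))  # checks the two PMF conditions
print(die_pmf[3], p_even)     # f(3) and P(X is even)
```

Exact fractions are used instead of floats so that the "sums to 1" check is not affected by rounding.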
8. What is a random variable? Explain briefly.

Ans) A random variable is a variable whose value is determined by chance; formally, it is a function that assigns a numerical value to each of an
experiment's outcomes. A random variable can be discrete (having specific, isolated values) or continuous (any value in
a continuous range). -> Random variables are commonly classified into three types: discrete random variables, continuous random
variables, and mixed random variables.
———————————————————————————————————————————————————
LAQ
1. (a) Difference between Probability Distribution Function and Probability Density Function
Probability distribution functions and probability density functions are functions defined over the sample space that
assign a probability (or, in the continuous case, a probability density) to each element.
• Probability distribution functions are defined for discrete random variables, while probability density functions are
defined for continuous random variables.
• The distribution of probability values (i.e. the probability distribution) is best portrayed by the probability distribution
function in the discrete case and by the probability density function in the continuous case.
• The probability distribution function can be represented as values in a table, but that is not possible for the
probability density function because the variable is continuous.
• When plotted, the probability distribution function gives a bar plot while the probability density function gives a
curve.
• The heights of the bars of the probability distribution function must add to 1, while the area under the curve of
the probability density function must equal 1.
• In both cases, all the values of the function must be non-negative.
Evaluate the cumulative distribution function (CDF): The cumulative distribution function is used to describe the
probability distribution of random variables. It can be used to describe the probability for a discrete, continuous or
mixed variable. It is obtained by summing the probability mass function (discrete case) or integrating the probability density
function (continuous case) up to a given value, which gives the cumulative probability for a random variable.
-> The CDF of a random variable X is defined as $F_X(x) = P(X \le x)$, for all $x \in \mathbb{R}$. Note that the subscript X indicates that
this is the CDF of the random variable X.
-> Note that the CDF gives us $P(X \le x)$. To find $P(X < x)$ for a discrete random variable, we can simply write
$P(X < x) = P(X \le x) - P(X = x) = F_X(x) - P_X(x)$. Let X be a discrete random variable with range $R_X = \{1, 2, 3, ...\}$.
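The relation between the CDF and the PMF of a discrete random variable can be sketched as follows. The fair-die PMF and the function names `cdf` and `prob_less_than` are assumptions chosen for illustration; they are not the example the notes were about to give.

```python
from fractions import Fraction


def cdf(pmf: dict, x) -> Fraction:
    """F_X(x) = P(X <= x): sum the PMF over all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


def prob_less_than(pmf: dict, x) -> Fraction:
    """P(X < x) = F_X(x) - P(X = x) for a discrete random variable."""
    return cdf(pmf, x) - pmf.get(x, Fraction(0))


die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(cdf(die_pmf, 3))             # P(X <= 3)
print(prob_less_than(die_pmf, 3))  # P(X < 3)
print(cdf(die_pmf, 6))             # the CDF reaches 1 at the largest value
```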
(b)
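2. (a) Question: Let X be a random variable, and P(X=x) is the PMF given by,
X      | 0 | 1 | 2  | 3  | 4  | 5   | 6    | 7
P(X=x) | 0 | k | 2k | 2k | 3k | k^2 | 2k^2 | 7k^2+k

Find (i) k (ii) P(X<6) (iii) P(X≥6) (iv) P(0<X≤5)
Ans) (i) ∑P(xi) = 1
Therefore, 0 + k + 2k + 2k + 3k + k^2 + 2k^2 + 7k^2 + k = 1 ⇒ 9k + 10k^2 = 1 ⇒ 10k^2 + 9k - 1 = 0 ⇒ (10k - 1)(k + 1) = 0, so k = 1/10 or k = -1.
Since a probability cannot be negative, k = 1/10.
(ii) P(X<6) = 0 + k + 2k + 2k + 3k + k^2 = 8k + k^2 = 0.81
(iii) P(X≥6) = 1 - P(X<6) = 0.19
(iv) P(0<X≤5) = k + 2k + 2k + 3k + k^2 = 0.81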
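The calculation can be checked with exact arithmetic. This is a sketch under the assumption that the table entries are k, 2k, 2k, 3k, k^2, 2k^2 and 7k^2 + k, so that the normalisation condition is 10k^2 + 9k - 1 = 0; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Solve 10k^2 + 9k - 1 = 0 with the quadratic formula (discriminant is a perfect square).
a, b, c = 10, 9, -1
root = math.isqrt(b * b - 4 * a * c)
roots = [Fraction(-b + root, 2 * a), Fraction(-b - root, 2 * a)]

# A probability cannot be negative, so keep the non-negative root.
k = next(r for r in roots if r >= 0)

pmf = {0: 0 * k, 1: k, 2: 2 * k, 3: 2 * k, 4: 3 * k,
       5: k**2, 6: 2 * k**2, 7: 7 * k**2 + k}

p_lt_6 = sum(p for x, p in pmf.items() if x < 6)
p_ge_6 = sum(p for x, p in pmf.items() if x >= 6)
p_0_to_5 = sum(p for x, p in pmf.items() if 0 < x <= 5)

print(roots, k)
print(sum(pmf.values()))         # should be exactly 1
print(p_lt_6, p_ge_6, p_0_to_5)
```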
(b)
———————————————————————————————————————————————————
3. Eigen values and eigen vectors
—————————————————————————————————————————————————
4. (a) Difference between Z-test and T-test. Explain with example
-> Definition of Z-Test: The Z-test is a statistical test used to analyze whether two population means are different or
not when the variances are known and the sample size is large. The Z-test is based on
the normal distribution. The assumptions for the Z-test are:
-> All observations are independent. -> The sample size should be more than 30. -> The Z
statistic follows the standard normal distribution, which has mean 0 and variance 1.
The test statistic is defined by: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ ★ $\bar{x}$ (X bar) is the sample mean
★ σ is the population standard deviation ★ n is the sample size ★ μ is the population mean
Example: Suppose it is claimed that the mean score of students in a class is greater than 70, with a standard deviation of 10. If a
sample of 50 students was selected with a mean score of 80, calculate the Z-value to check if there is enough
evidence to support this claim at a 0.05 significance level.
Solution: Here, the sample size is 50 and we know the standard deviation. This is a case of a right-tailed
one-sample Z-test. The null hypothesis is that the mean score is 70. The alternative hypothesis is that the mean score is greater
than 70. From the z-table, the critical value at alpha = 0.05 is 1.645 ★ Xbar = 80 ★ μ = 70 ★ n = 50 ★ σ = 10
Substituting the values in the formula, z = (80 - 70) / (10 / √50) ≈ 7.07. Since 7.07 > 1.645, the null
hypothesis is rejected and there is enough evidence to support the claim that the mean of the class is greater than 70.
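The worked example can be reproduced with the Python standard library, assuming the usual one-sample statistic z = (x̄ - μ) / (σ / √n). The function name `z_statistic` is a hypothetical helper; `statistics.NormalDist` supplies the critical value instead of a printed z-table.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """One-sample z statistic: (x_bar - mu) / (sigma / sqrt(n))."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))


alpha = 0.05
z = z_statistic(sample_mean=80, pop_mean=70, pop_sd=10, n=50)

# Right-tailed critical value of the standard normal distribution.
critical = NormalDist().inv_cdf(1 - alpha)
reject_null = z > critical

print(round(z, 2), round(critical, 3), reject_null)  # 7.07 1.645 True
```

With these inputs the statistic evaluates to about 7.07, well above the critical value, so the decision to reject the null hypothesis does not depend on rounding.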
-> Definition of T-Test: A T-test is a parametric test applied to identify how the averages of two data sets differ
when the variance is not given. When the sample size is small and the population standard
deviation is unknown, the T-test is used in conjunction with the t-distribution. The
degrees of freedom significantly impact the shape of a t-distribution. The number of
independent observations in a given set of observations is the degrees of freedom; for a one-sample T-test it is n - 1.
The assumptions taken for the T-Test: ★ All the data points are independent. ★ The
sample size is small. ★ The sample values should be taken and recorded accurately.
The test statistic is defined by: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
★ Xbar is the sample mean ★ s is the sample standard deviation ★ n is the sample size ★ μ is the population mean
Example: A store wants to improve its sales. The previous sales data shows that the average sale of 30 salesmen
was $40 per sale. After some training, the current data showed an average sale of $60 per transaction. If the
standard deviation given is $20, find the t-value. Did training improve the sales?
Solution: ★ Xbar = 60 ★ s = 20 ★ n = 30 ★ μ = 40
Substituting the values in the formula, t = (60 - 40) / (20 / √30) ≈ 5.48. For an alpha value of 0.05 and n - 1 = 29 degrees of freedom, the one-tailed critical value is
1.699. Since 5.48 > 1.699, we can reject the null hypothesis and conclude that training did improve sales.
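A sketch of the same calculation in Python, assuming the one-sample statistic t = (x̄ - μ) / (s / √n). The standard library has no t-distribution, so the critical value below is an assumption taken from a printed t-table (one-tailed, alpha = 0.05, 29 degrees of freedom); `t_statistic` is a hypothetical helper name.

```python
import math


def t_statistic(sample_mean: float, pop_mean: float, sample_sd: float, n: int) -> float:
    """One-sample t statistic: (x_bar - mu) / (s / sqrt(n))."""
    return (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))


n = 30
t = t_statistic(sample_mean=60, pop_mean=40, sample_sd=20, n=n)
degrees_of_freedom = n - 1

# Assumed t-table value for a one-tailed test, alpha = 0.05, 29 degrees of freedom.
critical = 1.699
reject_null = t > critical

print(round(t, 2), degrees_of_freedom, reject_null)  # 5.48 29 True
```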
Comparison Table Between Z-Test Vs T-Test
(b) What is hypothesis testing?
Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a
population parameter or a population probability distribution. First, a tentative assumption, called the null hypothesis, is made about the
parameter or distribution. The opposite claim is called the alternative hypothesis.
Null Hypothesis | Alternative Hypothesis
In the null hypothesis, there is no relationship between the two variables. | In the alternative hypothesis, there is some relationship between the two variables, i.e. they are dependent upon each other.
Generally, researchers and scientists try to reject or disprove the null hypothesis. | Generally, researchers and scientists try to accept or support the alternative hypothesis.
If the null hypothesis is accepted, researchers have to make changes in their opinions and statements. | If the alternative hypothesis gets accepted, researchers do not have to make changes in their opinions and statements.
Here no effect can be observed, i.e. it does not affect the output. | Here an effect can be observed, i.e. it affects the output.
Here the testing process is implicit and indirect. | Here the testing process is explicit and direct.
This hypothesis is denoted by H0. | This hypothesis is denoted by Ha or H1.
It is retained if we fail to reject the null hypothesis. | It is generally accepted when we reject the null hypothesis.
It is retained when the p-value is greater than the significance level. | It is supported when the p-value is smaller than (or equal to) the significance level.
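The p-value rule in the last row of the comparison can be sketched as follows, assuming a right-tailed Z-test and the common convention of rejecting H0 when the p-value is at most the significance level. The function names `right_tailed_p_value` and `decide` are illustrative assumptions.

```python
from statistics import NormalDist


def right_tailed_p_value(z: float) -> float:
    """P(Z >= z) under the standard normal distribution."""
    return 1 - NormalDist().cdf(z)


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value does not exceed the significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for z in (1.0, 2.0):
    p = right_tailed_p_value(z)
    print(z, round(p, 4), decide(p))
```

A z-value of 1.0 gives a p-value of roughly 0.16, so H0 is retained; a z-value of 2.0 gives roughly 0.02, so H0 is rejected in favour of the alternative.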
