
15. What is the role of a statistical model in Data Analytics?

A statistical model in data analytics serves as a mathematical representation of relationships within data. It enables the analysis of patterns, predictions, and inferences about a population based on a sample. Statistical models quantify uncertainty, aiding data interpretation and decision-making. They offer a framework for extracting insights, validating hypotheses, and guiding data-driven strategies in various fields such as finance, healthcare, and marketing, contributing to informed decision-making and robust analytics.

55. Explain the concept of core points in the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm, and discuss their significance in the clustering process. Provide an example to illustrate your explanation.

Core Points in DBSCAN: Core points in DBSCAN are data points within a dense neighborhood, indicating regions with sufficient data density. A core point has a minimum number of data points (minPts) within a specified radius (eps).
Significance: Core points initiate cluster formation and connect neighboring points, distinguishing dense regions from sparse ones. They are pivotal in identifying clusters of varying shapes and sizes, and they make the algorithm resilient to noise and outliers.
Example: Consider a set of GPS coordinates where each core point represents a location with at least five nearby points within a radius of 0.1 kilometers. These core points become central to forming distinct clusters, like urban centers, in spatial data.
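A minimal sketch of this idea, assuming scikit-learn's DBSCAN implementation; the toy coordinates and the eps / min_samples values are illustrative, chosen to mirror the "radius 0.1, at least five neighbors" example above:

```python
# Minimal DBSCAN sketch (illustrative values; scikit-learn assumed).
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 2-D points standing in for GPS coordinates: two dense "urban centers" plus one outlier.
points = np.array([
    [0.00, 0.00], [0.02, 0.01], [0.01, 0.03], [0.03, 0.02], [0.02, 0.04], [0.04, 0.01],
    [1.00, 1.00], [1.02, 1.01], [1.01, 1.03], [1.03, 1.02], [1.02, 1.04], [1.04, 1.01],
    [5.00, 5.00],                      # isolated point -> labeled as noise (-1)
])

# eps is the neighborhood radius; min_samples plays the role of minPts.
db = DBSCAN(eps=0.1, min_samples=5).fit(points)

print(db.labels_)                      # cluster id per point; -1 marks noise
core_mask = np.zeros(len(points), dtype=bool)
core_mask[db.core_sample_indices_] = True
print(core_mask)                       # True where a point is a core point
```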
1. Describe why SVMs offer more accurate results than Logistic Regression.

● SVM tries to maximize the margin between the closest support vectors, whereas logistic regression maximizes the posterior class probability.
● Logistic regression is used for solving classification problems, while SVM can be used for both classification and regression.
● SVM is deterministic while logistic regression is probabilistic.
● Logistic regression is vulnerable to overfitting, while the risk of overfitting is lower in SVM.
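A short sketch contrasting the two models, assuming scikit-learn; the synthetic dataset and hyperparameters are illustrative, not a claim about which model wins in general:

```python
# Sketch: SVM vs Logistic Regression on synthetic data (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # models class probability
svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)                # maximizes the margin

print("Logistic Regression accuracy:", log_reg.score(X_te, y_te))
print("SVM accuracy:", svm.score(X_te, y_te))
print("Class probabilities (LR only):", log_reg.predict_proba(X_te[:1]))
```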

2. Explain Probability Distribution and Entropy.

Probability Distribution: A probability distribution describes the likelihood
of different outcomes in a random experiment. It assigns
probabilities to each possible event, ensuring they sum to 1,
providing a basis for statistical analysis.
Entropy: Entropy measures the amount of surprise, or information, present in a variable. In information theory, a random variable's
entropy reflects the average uncertainty level in its possible
outcomes. Events with higher uncertainty have higher entropy.
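A minimal numerical sketch of both ideas, assuming NumPy; the biased-coin probabilities are illustrative:

```python
# Sketch: a discrete probability distribution and its Shannon entropy.
import numpy as np

p = np.array([0.9, 0.1])            # biased coin; probabilities must sum to 1
assert np.isclose(p.sum(), 1.0)

entropy = -np.sum(p * np.log2(p))   # H(X) = -sum p(x) * log2 p(x), in bits
print(entropy)                      # ~0.469 bits; a fair coin [0.5, 0.5] gives 1 bit
```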

3. Explain the use of multiple one-way ANOVAs on a two-way design.

One-way ANOVA: Analyzes differences in means for one
factor with multiple levels.
Two-way Design: Involves two independent variables
influencing a dependent variable.
If dealing with repeated measures in a two-way design,
consider using repeated measures ANOVA.
Typically, use two-way ANOVA for two independent
variables.
One-way ANOVA is suitable for one factor with multiple
levels.
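A sketch of a two-way ANOVA using statsmodels; the DataFrame, factor names (fertilizer, watering), and response values are hypothetical, only meant to show the shape of the analysis:

```python
# Sketch: two-way ANOVA with statsmodels (hypothetical data; column names are illustrative).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical design: two factors (fertilizer, watering) and one response (yield_).
df = pd.DataFrame({
    "fertilizer": ["A", "A", "B", "B"] * 6,
    "watering":   ["low", "high"] * 12,
    "yield_":     [20, 25, 22, 30, 21, 26, 23, 29, 19, 24, 22, 31,
                   20, 27, 24, 28, 18, 25, 21, 30, 22, 26, 23, 29],
})

# Main effects plus their interaction: yield_ ~ fertilizer + watering + fertilizer:watering
model = ols("yield_ ~ C(fertilizer) * C(watering)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F-statistics and p-values per factor
```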

4. Why do organizations embrace Big Data?

Big Data is embraced for its capacity to handle massive volumes
of diverse information, extracting valuable insights and
patterns. Its scalability and analytics potential empower
businesses to make data-driven decisions, enhance efficiency,
and gain a competitive edge in today's complex and fast-paced
digital landscape.

7. Compare and contrast the relationship between clustering and centroids.

Several approaches to clustering exist, and each approach is best suited to a particular data distribution. Centroid-based clustering organizes the data into non-hierarchical clusters, in contrast to hierarchical clustering, which builds a nested hierarchy of clusters. A centroid is the mean of the points in a cluster, and centroid-based methods assign each point to its nearest centroid. k-means is the most widely used centroid-based clustering algorithm. Centroid-based algorithms are efficient but sensitive to initial conditions and outliers.
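A minimal k-means sketch, assuming scikit-learn; the two synthetic blobs are illustrative:

```python
# Sketch: centroid-based clustering with k-means (scikit-learn assumed; data is synthetic).
import numpy as np
from sklearn.cluster import KMeans

# Two blobs of points; each cluster is summarized by its centroid (the mean of its members).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)   # the two centroids
print(km.labels_[:5])        # each point is assigned to its nearest centroid
```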

8. Write short note on Deep Learning.


Deep Learning, a subset of machine learning, employs deep
neural networks to learn from extensive data, excelling in tasks
like image recognition and natural language processing. With
multiple layers, these architectures autonomously extract
intricate features. Despite computational demands, deep
learning drives transformative advances in artificial intelligence,
impacting diverse industries with its capacity to understand and
model complex patterns.
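A very small sketch of the layered idea using scikit-learn's MLPClassifier; this is only to mirror "multiple layers extracting features", since real deep learning work typically uses dedicated frameworks such as PyTorch or TensorFlow, and the layer sizes here are illustrative:

```python
# Sketch: a small multi-layer network on an image-recognition task (scikit-learn assumed).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)              # 8x8 digit images flattened to feature vectors
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two hidden layers: successive layers learn progressively more abstract features.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0).fit(X_tr, y_tr)
print(net.score(X_te, y_te))
```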

9. Explain ANOVA.
ANOVA stands for Analysis of Variance. It is a statistical method
used to analyze the differences between the means of two or
more groups or treatments. It is often used to determine
whether there are any statistically significant differences
between the means of different groups. ANOVA compares the
variation between group means to the variation within the
groups. If the variation between group means is significantly
larger than the variation within groups, it suggests a significant
difference between the means of the groups.
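A minimal one-way ANOVA sketch, assuming SciPy; the three groups and their values are illustrative:

```python
# Sketch: one-way ANOVA comparing three groups (SciPy assumed; numbers are illustrative).
from scipy import stats

group_a = [23, 25, 27, 22, 26]
group_b = [30, 31, 29, 32, 30]
group_c = [24, 23, 26, 25, 27]

# F compares variation between group means to variation within groups.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)   # a small p-value suggests at least one group mean differs
```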

10. Explain Quadratic Discriminant Analysis.


Quadratic Discriminant Analysis (QDA) is a classification
algorithm used in statistics and machine learning. It's an
extension of Linear Discriminant Analysis (LDA) but allows for
different covariance matrices in each class. QDA calculates
quadratic decision boundaries to discriminate between classes,
making it more flexible in capturing non-linear relationships.
While it can perform well with complex data, QDA might be
sensitive to overfitting in high-dimensional spaces due to the
estimation of individual covariance matrices for each class.
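A short sketch contrasting LDA and QDA, assuming scikit-learn; the synthetic dataset is illustrative:

```python
# Sketch: QDA vs LDA on synthetic data (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

lda = LinearDiscriminantAnalysis()      # shared covariance -> linear decision boundary
qda = QuadraticDiscriminantAnalysis()   # per-class covariance -> quadratic decision boundary

print(cross_val_score(lda, X, y, cv=5).mean())
print(cross_val_score(qda, X, y, cv=5).mean())
```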

11. Describe Probability Distribution in details.


Probability distribution is a statistical concept describing the
likelihood of different outcomes in a random experiment. It
assigns probabilities to each possible event, ensuring the sum of
all probabilities equals 1. Discrete distributions involve distinct
outcomes, while continuous distributions span a range of
values. The distribution can be described by its shape, center,
and spread. Common types include the normal, binomial, and
Poisson distributions, crucial for statistical analysis and
understanding uncertainty in various fields like finance, physics,
and biology.
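A brief sketch of evaluating the distributions named above, assuming SciPy; all parameter values are illustrative:

```python
# Sketch: common probability distributions with scipy.stats (parameters are illustrative).
from scipy import stats

print(stats.norm(loc=0, scale=1).pdf(0.5))            # normal: density at x = 0.5
print(stats.binom(n=10, p=0.3).pmf(4))                # binomial: P(4 successes in 10 trials)
print(stats.poisson(mu=2).pmf(3))                     # Poisson: P(3 events when the mean is 2)
print(stats.binom(n=10, p=0.3).pmf(range(11)).sum())  # probabilities over all outcomes sum to 1
```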

13. What is the role of Statistical Model in data analytics?


A Statistical Model in data analytics serves as a mathematical
representation of relationships within data. It helps analyze
patterns, make predictions, and infer information about a
population based on a sample. By quantifying uncertainty and
capturing underlying structures, statistical models aid in data
interpretation and decision-making. They provide a framework
for extracting meaningful insights, validating hypotheses, and
guiding data-driven strategies in diverse fields such as finance,
healthcare, and marketing.

14. Justify why SVM is fast.

SVM generalizes well to out-of-sample data. Prediction is also fast because, to classify a single sample, the kernel function only has to be evaluated against the support vectors rather than against the entire training set, and the support vectors are typically only a small subset of the training data.
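A small sketch illustrating that prediction cost depends on the support vectors, assuming scikit-learn; the synthetic dataset is illustrative:

```python
# Sketch: only the support vectors enter the kernel at prediction time (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = SVC(kernel="rbf").fit(X, y)

print(len(X))                          # training samples: 1000
print(clf.support_vectors_.shape[0])   # usually far fewer; these are what predict() uses
```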

15. Main characteristics of Data Analytics.


Data analytics is the process of examining, cleaning,
transforming, and interpreting data to extract valuable insights
and support decision-making. The main characteristics of data analytics are:
● Data collection from various sources.
● Data cleaning and preprocessing for accuracy.
● Descriptive, diagnostic, predictive, and prescriptive analytics.
● Data visualization for better understanding.
● Use of machine learning and AI.
● Handling big data and real-time analytics.
● Emphasis on data security and privacy.
● An iterative process to refine insights.
● Domain-specific and fostering a data-driven culture.

30. Briefly describe descriptive statistics.
Descriptive statistics involve summarizing and presenting key
features of a dataset, providing insights into its central tendencies,
variability, and distribution. Common measures include mean,
median, and mode for central tendency, along with standard
deviation for dispersion. Descriptive statistics help convey the
overall characteristics of data, aiding in the initial understanding and
interpretation of its essential features in quantitative terms.
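A minimal sketch of these measures, assuming pandas; the sample scores are illustrative:

```python
# Sketch: basic descriptive statistics with pandas (values are illustrative).
import pandas as pd

scores = pd.Series([68, 72, 75, 75, 80, 83, 90, 95])

print(scores.mean())      # central tendency: mean
print(scores.median())    # central tendency: median
print(scores.mode()[0])   # central tendency: mode
print(scores.std())       # dispersion: standard deviation
print(scores.describe())  # count, mean, std, min, quartiles, max in one summary
```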
