ASSIGNMENT
Q1: Define PCA in brief.
- Principal Component Analysis (PCA) is a statistical technique that reduces the dimensionality
of data while preserving as much of its variance as possible.
- It transforms correlated features into a set of linearly uncorrelated components, called principal
components, ordered by the amount of variance each one captures.
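As a rough illustration of the definition, the sketch below runs PCA on two features using only the
Python standard library. It relies on the closed-form eigendecomposition of a 2x2 covariance matrix,
so it is limited to exactly two features; the function name pca_2d and the data values are
hypothetical, and real work would normally use a linear algebra library instead.

```python
import math

def pca_2d(xs, ys):
    """PCA for exactly two features, using the closed-form eigendecomposition
    of the 2x2 sample covariance matrix [[a, b], [b, c]]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]  # centred feature 1
    dy = [y - my for y in ys]  # centred feature 2
    a = sum(u * u for u in dx) / (n - 1)
    c = sum(v * v for v in dy) / (n - 1)
    b = sum(u * v for u, v in zip(dx, dy)) / (n - 1)

    # Eigenvalues: the variance captured along each principal component.
    centre = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    variances = (centre + radius, centre - radius)

    # Eigenvectors: rotate the axes by theta so the covariance term vanishes.
    theta = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))

    # Scores: coordinates of each centred point along pc1 and pc2.
    scores1 = [u * pc1[0] + v * pc1[1] for u, v in zip(dx, dy)]
    scores2 = [u * pc2[0] + v * pc2[1] for u, v in zip(dx, dy)]
    return variances, (pc1, pc2), (scores1, scores2)

# Hypothetical, strongly correlated features.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]

variances, components, (s1, s2) = pca_2d(xs, ys)

# Share of the total variance kept by the first component alone.
explained = variances[0] / sum(variances)

# The scores are centred, so their sample covariance is a scaled dot product.
score_cov = sum(p * q for p, q in zip(s1, s2)) / (len(s1) - 1)

print(f"variance explained by PC1: {explained:.4f}")
print(f"covariance between PC1 and PC2 scores: {score_cov:.2e}")
```

For this data the first component should carry nearly all of the variance, and the covariance
between the two sets of scores should be zero up to rounding error, which is what "linearly
uncorrelated components" means in practice.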
Q2: Why do we need dimensionality reduction? What are its drawbacks?
- Need for Dimensionality Reduction:
  - Reduces computational cost and storage requirements.
  - Makes high-dimensional data easier to visualize.
  - Removes noise and redundant features.
  - Can improve model performance by reducing the risk of overfitting.
- Drawbacks:
  - May lead to loss of information.
  - The transformed features can be difficult to interpret.
  - Some models may perform worse on the reduced data if the discarded dimensions carried
    useful signal.
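The information-loss drawback can be made concrete with a small sketch: reduce two features to one,
map the result back to two dimensions, and measure how far the reconstruction is from the original.
The four data points are hypothetical, and the direction formula is the closed form that applies
only to the two-feature case.

```python
import math

# Hypothetical 2-D data set: four points scattered around the line y = x.
points = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
n = len(points)

# Centre the data (PCA works on deviations from the mean).
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
centred = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
a = sum(x * x for x, _ in centred) / (n - 1)
c = sum(y * y for _, y in centred) / (n - 1)
b = sum(x * y for x, y in centred) / (n - 1)

# Direction of largest variance (first principal component).
theta = 0.5 * math.atan2(2 * b, a - c)
pc1 = (math.cos(theta), math.sin(theta))

# Reduce: keep one number per point (its coordinate along pc1).
scores = [x * pc1[0] + y * pc1[1] for x, y in centred]

# Reconstruct: map the 1-D scores back into the original 2-D space.
reconstructed = [(s * pc1[0] + mean_x, s * pc1[1] + mean_y) for s in scores]

# Information loss: squared distance between original and reconstructed points,
# expressed as a fraction of the total squared deviation from the mean.
squared_error = sum(
    (x - rx) ** 2 + (y - ry) ** 2
    for (x, y), (rx, ry) in zip(points, reconstructed)
)
total_variation = sum(x * x + y * y for x, y in centred)
lost_fraction = squared_error / total_variation

print(f"fraction of variation lost: {lost_fraction:.3f}")
```

With these points the storage is halved, but about 0.2 (20%) of the variation cannot be recovered:
(1, 2) and (2, 1) both reconstruct to roughly (1.5, 1.5), so the difference between them is gone.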
Q3: Explain the Limitations of PCA.
- Assumes linear relationships among variables, which may not always be true.
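- Sensitive to feature scaling; features measured on different scales should be standardized before
applying PCA, otherwise the features with the largest variance dominate the components.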
- May discard useful information along with noise.
- Principal components are difficult to interpret, since each one is a linear combination of the
original features.
- Results depend on how many components are retained.
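The scaling limitation is easy to demonstrate. The sketch below computes the first principal
direction for two features recorded in very different units, first on the raw values and then after
standardization. The measurements and the helper names first_component and standardize are
hypothetical, and the closed-form direction again assumes exactly two features.

```python
import math

def first_component(xs, ys):
    """Unit direction of largest variance for two features
    (closed form for a 2x2 sample covariance matrix)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    theta = 0.5 * math.atan2(2 * b, a - c)
    return math.cos(theta), math.sin(theta)

def standardize(values):
    """Rescale to zero mean and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

# Hypothetical measurements: height in metres, weight in grams.
height_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weight_g = [55000.0, 62000.0, 64000.0, 72000.0, 78000.0]

# Weights given to (height, weight) by the first component.
raw_pc1 = first_component(height_m, weight_g)
scaled_pc1 = first_component(standardize(height_m), standardize(weight_g))

print("raw data:     ", raw_pc1)
print("standardized: ", scaled_pc1)
```

On the raw data the first component points almost entirely along weight, simply because grams
produce far larger numbers than metres. After standardization both features receive equal weight,
so the component reflects their shared trend rather than the choice of units.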
Q4: Should one remove highly correlated variables before doing PCA?
- No. Correlation among variables is exactly what PCA exploits: it transforms correlated
variables into uncorrelated principal components.
- However, if some variables are near-duplicates of others, removing them before applying PCA
can reduce the computational cost.
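A short sketch of why correlation helps PCA rather than hurting it. For two standardized features
with correlation r, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|, so
the first component carries a share (1 + |r|) / 2 of the total variance. The data and the function
names pearson_r and pc1_share are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pc1_share(r):
    """Share of total variance on the first principal component for two
    standardized features with correlation r (eigenvalues 1 + |r|, 1 - |r|)."""
    return (1 + abs(r)) / 2

# Hypothetical features.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
nearly_duplicate = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9]  # tracks x closely
unrelated = [3.0, 1.0, 2.0, 2.0, 1.0, 3.0]         # no linear relation to x

share_correlated = pc1_share(pearson_r(x, nearly_duplicate))
share_unrelated = pc1_share(pearson_r(x, unrelated))

print(f"PC1 share, correlated pair: {share_correlated:.3f}")
print(f"PC1 share, unrelated pair:  {share_unrelated:.3f}")
```

The correlated pair can be summarized by a single component with almost no loss, whereas the
unrelated pair splits its variance evenly, so dropping either component would discard half of it.
Removing correlated variables beforehand would therefore remove the structure PCA is meant to
compress.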