Welcome to Scribd!

Rahul Datta - Assignment

Uploaded by

0% found this document useful (0 votes)

8 views2 pages

Redundant, trivial, and obsolete data can be identified and resolved using correlation and covariance analysis. Pearson's chi-square test can also be used to test if observed frequency distributions are significantly different than expected distributions, helping to identify redundant data. The chi-square test statistic calculates the sum of the squared differences between observed and expected frequencies, with a larger chi-square value indicating a bigger difference between observations and expectations. Comparing the chi-square value to a critical value determines if the difference is statistically significant.

Original Description:

Use of Data Mining to generate predictive model for future predictions

Original Title

2101150_Rahul Datta_Assignment

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

8 views2 pages

Rahul Datta - Assignment

Uploaded by

Rahul Datta

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 2

Search inside document

Q How to handle redundant data through Co-relations and Co-variance analysis via example?

A- We can check if an attribute is redundant or not using correlation analysis. Given two attributes,
we can measure how strongly one attribute implies the other, based on the available data.

Let us suppose we have numeric attributes. We can use the correlation coefficient and covariance,
both of which assess how one attribute’s values vary from those of another.

• Redundant data is duplicate data stored in multiple places within the same system or across
multiple systems. Intranet networks often contain a great deal of redundant content.
• Trivial data is information that does not need to be kept. Specifically, trivial information is
content that does not contribute to corporate knowledge, business insight or record-keeping
requirements.
• Obsolete data is information that is incorrect, incomplete or simply no longer in use. It can
include outdated data that has been superseded by new information.

Common examples of redundant, trivial and obsolete data include:

• Unneeded duplicate copies of emails

• Content that would not be subject to discovery in a lawsuit and that does not need to be
retained to meet regulations or legal requirements
• Old server session cookies
• Web content that is no longer accurate
• Documents that have been superseded by new content

Redundancy can be resolved using Pearson Chi-Square test.

Pearson’s chi-square (Χ2) tests, often referred to simply as chi-square tests, are among the most
common nonparametric tests. Nonparametric tests are used for data that don’t follow the
assumptions of parametric tests, especially the assumption of a normal distribution.

If you want to test a hypothesis about the distribution of a categorical variable you’ll need to use a
chi-square test or another nonparametric test. Categorical variables can be nominal or ordinal and
represent groupings such as species or nationalities. Because they can only have a few specific
values, they can’t have a normal distribution.

Test hypotheses about frequency distributions

There are two types of Pearson’s chi-square tests, but they both test whether the observed
frequency distribution of a categorical variable is significantly different from its expected frequency
distribution. A frequency distribution describes how observations are distributed between different
groups.

Frequency distributions are often displayed using frequency distribution tables. A frequency
distribution table shows the number of observations in each group. When there are two categorical
variables, you can use a specific type of frequency distribution table called a contingency table to
show the number of observations in each combination of groups.
A chi-square test (a chi-square goodness of fit test) can test whether these observed frequencies are
significantly different from what was expected, such as equal frequencies.

The chi-square formula

Both of Pearson’s chi-square tests use the same formula to calculate the test statistic, chi-square
(Χ2):

Where:

• Χ2 is the chi-square test statistic

• Σ is the summation operator (it means “take the sum of”)
• is the observed frequency
• E is the expected frequency

The larger the difference between the observations and the expectations (O − E in the equation), the
bigger the chi-square will be. To decide whether the difference is big enough to be statistically
significant, you compare the chi-square value to a critical value.

Dictionary of Literary Characters
Document849 pages
Dictionary of Literary Characters
Pritesh Chakraborty
No ratings yet
Content 2:: Descriptive and Inferential Statistics
Document59 pages
Content 2:: Descriptive and Inferential Statistics
John Fuerzas
No ratings yet
Q2e LS5 U01 AnswerKey
Document4 pages
Q2e LS5 U01 AnswerKey
Tkp Android
No ratings yet
Success Through Excess: How Property and Casualty Insurers Are Boosting Profits by Entering The Excess and Surplus Market
Document32 pages
Success Through Excess: How Property and Casualty Insurers Are Boosting Profits by Entering The Excess and Surplus Market
Nauman Noor
No ratings yet
The Book of Jonah, by Alfred Von Rohr Sauer
Document13 pages
The Book of Jonah, by Alfred Von Rohr Sauer
Daniel A. Hinton
No ratings yet
Data Screening Checklist
Document57 pages
Data Screening Checklist
Sugan Pragasam
No ratings yet
Practical Engineering, Process, and Reliability Statistics
From Everand
Practical Engineering, Process, and Reliability Statistics
Mark Allen Durivage
No ratings yet
Module 6 Chi-Square T Z Test
Document72 pages
Module 6 Chi-Square T Z Test
Lavanya Shetty
100% (1)
Spectrum 3rd ESO
Document23 pages
Spectrum 3rd ESO
Isalor3
0% (1)
Parametric and Non Parametric
Document4 pages
Parametric and Non Parametric
Nadia Thaqief
100% (2)
Praise For The Art of Community
Document394 pages
Praise For The Art of Community
api-26285909
No ratings yet
What Is Statistics: - Statistics 2 Ways To Define
Document135 pages
What Is Statistics: - Statistics 2 Ways To Define
Anurag Dave
No ratings yet
Shiseido Co LTD in Beauty and Personal Care (World)
Document43 pages
Shiseido Co LTD in Beauty and Personal Care (World)
Rachel
No ratings yet
Thermal Conductivity in Solids
Document302 pages
Thermal Conductivity in Solids
Yang Jubiao
No ratings yet
Unit 12 Progress Test
Document2 pages
Unit 12 Progress Test
PressCall Academy
No ratings yet
Chi Square Good
Document12 pages
Chi Square Good
Rheanne Cerveza
No ratings yet
File 2
Document44 pages
File 2
Shushanth munna
No ratings yet
Chapter 2 4 - FBAS
Document17 pages
Chapter 2 4 - FBAS
Jasper Lagrimas
No ratings yet
Marketing Ii: Facultad de Economía y Negocios Universidad de Chile
Document18 pages
Marketing Ii: Facultad de Economía y Negocios Universidad de Chile
Paz Guerrero Wolf
No ratings yet
IRS Automatic Indexing UNIT-2
Document18 pages
IRS Automatic Indexing UNIT-2
suryachandra podugu
50% (2)
Statistical Methods: 4 Unit
Document39 pages
Statistical Methods: 4 Unit
Premalatha KP
No ratings yet
Stat
Document16 pages
Stat
Bagioso, Cyrus Jude G.
No ratings yet
Data Analysis and Evaluation
Document18 pages
Data Analysis and Evaluation
adejuwonlo
No ratings yet
Video Questions
Document13 pages
Video Questions
sebaszj
No ratings yet
Unit-2 Statistics
Document64 pages
Unit-2 Statistics
Jyothi Pulikanti
No ratings yet
Research
Document35 pages
Research
Roodrapratap Singh Parihar
No ratings yet
Data Analysis
Document10 pages
Data Analysis
Freya Aparis
No ratings yet
Analyzing The Data
Document54 pages
Analyzing The Data
Magnolia Khine
No ratings yet
Week 12 Data Analysis and Presentation
Document21 pages
Week 12 Data Analysis and Presentation
Ace
No ratings yet
Individual Reading - Chapter 11 - Lê M NH Duy - 1852293
Document7 pages
Individual Reading - Chapter 11 - Lê M NH Duy - 1852293
Duy Lê Mạnh
No ratings yet
Research Methods Session 11 Data Preparation and Preliminary Data Analysis (Compatibility Mode)
Document9 pages
Research Methods Session 11 Data Preparation and Preliminary Data Analysis (Compatibility Mode)
Gopaul Usha
No ratings yet
It Is Also Including Hypothesis Testing and Sampling
Document12 pages
It Is Also Including Hypothesis Testing and Sampling
Melanie Arangel
No ratings yet
IRS III Year UNIT-3 Part 1
Document18 pages
IRS III Year UNIT-3 Part 1
Banala ramyasree
50% (2)
Distribution Free Methods, Which Do Not Rely On Assumptions That The
Document4 pages
Distribution Free Methods, Which Do Not Rely On Assumptions That The
Nihar Jawadin
No ratings yet
Dewi Prita Statistic 4D
Document7 pages
Dewi Prita Statistic 4D
Sihat
No ratings yet
Chi Square
Document14 pages
Chi Square
NOYON KUMAR DA
No ratings yet
Fundamentals of Measurement
Document14 pages
Fundamentals of Measurement
Raju Nagendra
No ratings yet
Discriminant Analysis
Document29 pages
Discriminant Analysis
balaa aiswarya
No ratings yet
Biostatistics - Data and Its Types
Document11 pages
Biostatistics - Data and Its Types
prerana chakraborty
No ratings yet
Unit 5 BRM
Document19 pages
Unit 5 BRM
Kajal Tyagi
No ratings yet
Business Research Method: Unit 5
Document19 pages
Business Research Method: Unit 5
Prince Singh
No ratings yet
Research Methods Unit 4
Document6 pages
Research Methods Unit 4
Mukul
No ratings yet
Chapter 11
Document26 pages
Chapter 11
Carmenn Lou
No ratings yet
Measures of Variabilit1
Document7 pages
Measures of Variabilit1
Ken Enciso
No ratings yet
Analysis of Data and Interpretetion of The Results of Statistical Computations
Document62 pages
Analysis of Data and Interpretetion of The Results of Statistical Computations
EL FRANCE MACATE
No ratings yet
Interpretation & Analysis of Data
Document4 pages
Interpretation & Analysis of Data
Priyanka Goyal
100% (1)
Hybrid Model For Clinical Diagnosis and Treatment Using Data Mining Techniques
Document4 pages
Hybrid Model For Clinical Diagnosis and Treatment Using Data Mining Techniques
theijes
No ratings yet
Nonparametric Statistics
Document5 pages
Nonparametric Statistics
joseph676
No ratings yet
Statistics Theory Notes
Document21 pages
Statistics Theory Notes
aanya jain
No ratings yet
Batch 17 - Semester 3 Question Bank Big Data & Business Analytics
Document25 pages
Batch 17 - Semester 3 Question Bank Big Data & Business Analytics
Ann Amitha Antony
No ratings yet
Statistical Methods in Data Mining
Document4 pages
Statistical Methods in Data Mining
Muhammad Tehseen Qureshi
No ratings yet
Data Analysis Chp9 RM
Document9 pages
Data Analysis Chp9 RM
nikjadhav
No ratings yet
Unit 333
Document18 pages
Unit 333
Himajanaidu
No ratings yet
Mixed-Method Data Analysis
Document10 pages
Mixed-Method Data Analysis
Jihan Hidayah
No ratings yet
MBR Session 20
Document78 pages
MBR Session 20
mariamabbasi2626
No ratings yet
Qualitative Data Analysis
Document26 pages
Qualitative Data Analysis
Yousra HR
No ratings yet
Presentation Mbs712 08
Document43 pages
Presentation Mbs712 08
Lizel Morcilla
No ratings yet
Statistics For Communication Research
Document48 pages
Statistics For Communication Research
Faiz Yasin
No ratings yet
Local Media1419236475208910846
Document36 pages
Local Media1419236475208910846
Maria Khristina Buera Bongala
No ratings yet
Topic 12
Document10 pages
Topic 12
Arokiasamy Jairish
No ratings yet
Research Theory
Document8 pages
Research Theory
Pratiksha Gondane
No ratings yet
CC Unit - 4 Imp Questions
Document4 pages
CC Unit - 4 Imp Questions
Sahana Urs
No ratings yet
Advanced Statistics in Quantitative Research
Document21 pages
Advanced Statistics in Quantitative Research
GESIVER MANAOIS
No ratings yet
Stats Interview Questions Answers 1697190472
Document54 pages
Stats Interview Questions Answers 1697190472
Amit Sinha
No ratings yet
Descriptive Statistics
Document5 pages
Descriptive Statistics
api-472656698
No ratings yet
Dersnot 4626 1666356899
Document16 pages
Dersnot 4626 1666356899
Samet Özdemir
No ratings yet
An Analytic Data Set (ADS) Is The
Document27 pages
An Analytic Data Set (ADS) Is The
Pavan Reshmanth
No ratings yet
STAT Notes
Document6 pages
STAT Notes
Celine Therese Bu
No ratings yet
BRM CS
Document4 pages
BRM CS
palija shakya
No ratings yet
Circular Assignment
Document3 pages
Circular Assignment
Rahul Datta
No ratings yet
Group10 1002 1006 1012 1050 1104
Document6 pages
Group10 1002 1006 1012 1050 1104
Rahul Datta
No ratings yet
Class Notes 1 - LogiM - Integrated Supply Chain
Document42 pages
Class Notes 1 - LogiM - Integrated Supply Chain
Rahul Datta
No ratings yet
Class Notes 3 - LogiM - Best Practices in Distribution Centre MGMT
Document51 pages
Class Notes 3 - LogiM - Best Practices in Distribution Centre MGMT
Rahul Datta
No ratings yet
Group 2
Document19 pages
Group 2
Rahul Datta
No ratings yet
LM - Group 7
Document7 pages
LM - Group 7
Rahul Datta
No ratings yet
Group 7 - SCM Assignment 2
Document14 pages
Group 7 - SCM Assignment 2
Rahul Datta
No ratings yet
Group 7 - SCM Assignment 3
Document16 pages
Group 7 - SCM Assignment 3
Rahul Datta
No ratings yet
Group 7 - SCM Assignment 1
Document6 pages
Group 7 - SCM Assignment 1
Rahul Datta
No ratings yet
BS 6001-2-1993 - (2017-10-06 - 02-55-17 Am)
Document30 pages
BS 6001-2-1993 - (2017-10-06 - 02-55-17 Am)
Rankie Choi
No ratings yet
38 Pamintuan v. CA
Document1 page
38 Pamintuan v. CA
Joshua Alexander Calaguas
No ratings yet
Soal Ramedial Bahasa Inggris Kelas XI
Document3 pages
Soal Ramedial Bahasa Inggris Kelas XI
Kusnadi Dahari
No ratings yet
Carkhuff Counselling Skills Level 1
Document2 pages
Carkhuff Counselling Skills Level 1
Mohamad Shuhmy Shuib
No ratings yet
SMJK Notre Dame Convent Melaka Peperiksaan Percubaan SPM Tahun 2017
Document19 pages
SMJK Notre Dame Convent Melaka Peperiksaan Percubaan SPM Tahun 2017
Loh Chee Wei
No ratings yet
The Oral Approach and Situational Language Teaching
Document5 pages
The Oral Approach and Situational Language Teaching
alfa1411
0% (1)
Chapter 2 Understanding Management's Context Constraints and Challenges
Document36 pages
Chapter 2 Understanding Management's Context Constraints and Challenges
Einal Abidin
No ratings yet
CH 02
Document40 pages
CH 02
BaoYu MoZhi
No ratings yet
The Moon: The Moon Abiding in The Midst of Serene Mind Great Waves Breaking Into Light
Document3 pages
The Moon: The Moon Abiding in The Midst of Serene Mind Great Waves Breaking Into Light
metaltiger
No ratings yet
224 Midterm
Document6 pages
224 Midterm
RGM36
No ratings yet
Kesetimbangan Partikel
Document25 pages
Kesetimbangan Partikel
Galih Indra Sukmana
No ratings yet
Algebra: Paid Ebook
Document31 pages
Algebra: Paid Ebook
Souvik Bhattacharjee
No ratings yet
Traxmaker User Manual
Document220 pages
Traxmaker User Manual
rojozurdo
100% (1)
Get Ready For The TOEFL Primary - Grade 5 - For Students
Document110 pages
Get Ready For The TOEFL Primary - Grade 5 - For Students
Anh Nguyễn
No ratings yet
Capstone Project
Document36 pages
Capstone Project
api-462295397
No ratings yet
2 Esguerra VS Trinidad
Document5 pages
2 Esguerra VS Trinidad
KRISIA MICHELLE ORENSE
No ratings yet
Schilling Dennis 2.10 Resume
Document4 pages
Schilling Dennis 2.10 Resume
dschill2
No ratings yet
Libre Office
Document12 pages
Libre Office
Toan Hang
No ratings yet
Danfoss Bbauer
Document35 pages
Danfoss Bbauer
jhephe46
No ratings yet
HSO408 Assessment 2 Expanded Rubrics - FINAL2019
Document5 pages
HSO408 Assessment 2 Expanded Rubrics - FINAL2019
sidney drecotte
No ratings yet
The Influences of Leadership, Work Discipline, and Communication On Employees Performance
Document7 pages
The Influences of Leadership, Work Discipline, and Communication On Employees Performance
International Journal of Innovative Science and Research Technology
No ratings yet