Data Mining - Weka 3 Questions

Uploaded by

Hakufuu


1. Setting a random seed affects which random sample from the data set will be chosen for the training (and
therefore also for the testing) set. How do we change the random seed in Weka?
We can do that by clicking the More options button in the Classify panel and then changing the
value in the “Random seed for XVal / % Split” field.
2. At minute 4:17 we see that he got 10 different results for the accuracy of J48. Why are there different
results?
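Because each run used a different value of the random seed, the data were split differently into training
and test sets. J48 was therefore trained and tested on different instances each time, which changed the
measured accuracy.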
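Questions 1 and 2 can be made concrete with a minimal Python sketch of a seeded percentage split. The helper `percentage_split` is hypothetical and is not how Weka implements its split internally; it only shows that the seed decides which instances end up in the test set, so the same seed reproduces a split and different seeds usually give different ones.

```python
import random

def percentage_split(instances, train_fraction, seed):
    """Hypothetical helper: shuffle a copy of the instances with a seeded RNG,
    then cut it into a training part and a test part."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(20))  # stand-in for 20 instances

# Same seed: the split is reproducible.
train_a, test_a = percentage_split(data, 0.9, seed=1)
train_b, test_b = percentage_split(data, 0.9, seed=1)
print(test_a == test_b)  # True

# Seeds 1..10: the held-out instances change from run to run, so a
# classifier trained and tested on each split can score differently.
for seed in range(1, 11):
    _, test = percentage_split(data, 0.9, seed)
    print(seed, sorted(test))
```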
3. In the weather.nominal dataset they play golf 9 times altogether and don’t play 5 times. If that is ALL you
knew (you don’t know anything about the other variables), then what would you guess would happen
tomorrow with regard to whether they would play golf or not?
If all I knew was the number of times they did and did not play golf, I would guess that they would play
tomorrow, since that outcome is more frequent (9/14 > 5/14).
4. Which classifier behaves like the guess you made in question 3?
The baseline classifier, which in Weka is ZeroR: it ignores all the attributes and always predicts the most
frequent class in the training data (here, play = yes).
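The guess from questions 3 and 4 can be written as a tiny majority-class predictor in the spirit of ZeroR. This is a sketch, not Weka's code; in particular, breaking ties by first occurrence is an assumption of this example.

```python
from collections import Counter

def zero_r(train_labels):
    """Return the most frequent class in the training labels.
    Ties go to the label seen first (an assumption of this sketch)."""
    return Counter(train_labels).most_common(1)[0][0]

# Class counts for weather.nominal quoted in question 3: 9 yes, 5 no.
labels = ["yes"] * 9 + ["no"] * 5
guess = zero_r(labels)
p_guess = Counter(labels)[guess] / len(labels)

print(guess)              # yes
print(round(p_guess, 3))  # 0.643
```

Always predicting "yes" is right on 9 of the 14 days, so this baseline scores about 64% on the training data.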
5. One of the datasets in the data folder does not work very well in terms of building a model to predict the
class. In fact many algorithms perform worse than the baseline. Which dataset is that?

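It is the supermarket.arff dataset. Its attributes carry little information about the class, so most classifiers
do no better than (and some do worse than) the ZeroR baseline.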

6. He repeated the experiment 10 times. The standard deviation measures the variation of the accuracy
measure over the 10 repetitions. That is also referred to as the variance of the estimate. What was the value
of the variance of the estimate?
The standard deviation of the accuracy over the 10 repetitions was 0.018. Strictly speaking, the variance is
the square of the standard deviation, 0.018² ≈ 0.0003.
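The spread of repeated accuracy estimates can be summarized as below. The five accuracy values are made up for illustration and are not the numbers from the video; the sketch uses the sample standard deviation (dividing by n − 1), which is an assumption about how the reported figure was computed.

```python
import statistics

def summarize(accuracies):
    """Mean, sample standard deviation and sample variance (n - 1 in the
    denominator) of repeated accuracy estimates."""
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)
    var = statistics.variance(accuracies)  # equals sd ** 2
    return mean, sd, var

# Hypothetical accuracies from five repeated holdout runs.
runs = [0.95, 0.93, 0.96, 0.94, 0.97]
mean, sd, var = summarize(runs)
print(f"mean={mean:.3f} sd={sd:.4f} variance={var:.5f}")
# mean=0.950 sd=0.0158 variance=0.00025
```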
7. In the video around minute 2:40 you can see two “pies”. Explain what they are.
Each pie represents one dataset divided into 10 parts: 9 parts are used for training and 1 for testing. The
tenth chosen for testing is different in each pie, indicating that in repeated holdout a different 10% of the
dataset is used for testing each time.
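The partition the pies illustrate can be sketched as plain 10-fold cross-validation: the data are cut into 10 parts, each part is held out for testing once, and the other 9 are used for training. `k_fold_indices` is a hypothetical helper; Weka stratifies the folds by default when the class is nominal, and this sketch skips stratification for brevity.

```python
import random

def k_fold_indices(n, k, seed):
    """Shuffle the instance indices with a seeded RNG and deal them into k
    folds of (almost) equal size. Unstratified, unlike Weka's default."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

n, k = 20, 10
folds = k_fold_indices(n, k, seed=1)

test_count = [0] * n
train_count = [0] * n
for i, test_fold in enumerate(folds):
    # Fold i is held out for testing; the other k - 1 folds form the training set.
    for j in test_fold:
        test_count[j] += 1
    for f, fold in enumerate(folds):
        if f != i:
            for j in fold:
                train_count[j] += 1

print(set(test_count))   # {1}: every instance is tested exactly once
print(set(train_count))  # {9}: and used for training k - 1 times
```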
8. At minute 4:15 you can see the word “DEPLOY”. Can you understand what he means on this screen by that
word? If so, explain.
After cross-validation, Weka runs the algorithm an 11th time, using 100% of the data set, to get a final
classifier that can be deployed in practice. In my opinion, “deploy” here refers to putting that final
classifier to use on new, real-world data.
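The "10 runs plus one" pattern from question 8 can be sketched as follows. `majority_learner` is a hypothetical stand-in for a real learner such as J48, chosen only to keep the example self-contained: the 10 fold models produce the accuracy estimate, and the extra run on all the data produces the model that would actually be deployed.

```python
import random
from collections import Counter

def majority_learner(train_labels):
    """Hypothetical stand-in for a real learner: the returned model ignores
    the instance and predicts the training majority class."""
    majority = Counter(train_labels).most_common(1)[0][0]

    def predict(instance):
        return majority

    return predict

def cross_validate_then_deploy(labels, k=10, seed=1):
    """k runs on k - 1 folds to estimate accuracy, then one extra run on
    100% of the data to build the model that would be deployed."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    runs, correct = 0, 0
    for i, test_fold in enumerate(folds):
        train = [labels[j] for f, fold in enumerate(folds) if f != i for j in fold]
        model = majority_learner(train)
        runs += 1
        correct += sum(model(j) == labels[j] for j in test_fold)

    final_model = majority_learner(labels)  # the (k + 1)-th run, on all the data
    runs += 1
    return correct / len(labels), final_model, runs

labels = ["yes"] * 9 + ["no"] * 5  # weather.nominal class counts
accuracy, deployed, runs = cross_validate_then_deploy(labels)
print(runs)                # 11
print(round(accuracy, 3))  # 0.643
```

Note that the deployed model is never evaluated itself; its expected performance is the cross-validation estimate computed from the other 10 runs.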
9. What is meant by a fold?
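A fold is one of the k groups (of roughly equal size) that a given data sample is split into; the number of
groups, k, gives the procedure its name, k-fold cross-validation. In 10-fold cross-validation there are 10 folds,
and each one is held out for testing once.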
10. At minute 2:20 he says “Each branch assigns the most frequent class that comes down that branch.” What
does that mean?
It means that each branch corresponds to one possible value of the attribute, and for each value the algorithm
assigns the class that occurs most often among the training instances that have that value.
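The per-branch majority vote described above is essentially what Weka's OneR does when it builds a rule from a single attribute. The sketch below applies it to the outlook and play columns transcribed from weather.nominal.arff; `one_attribute_rule` is a hypothetical helper, not Weka code.

```python
from collections import Counter, defaultdict

def one_attribute_rule(values, labels):
    """For each value of one attribute (one branch), assign the most frequent
    class among the instances with that value; also count the errors."""
    by_value = defaultdict(Counter)
    for v, c in zip(values, labels):
        by_value[v][c] += 1
    rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
    errors = sum(sum(counts.values()) - counts[rule[v]]
                 for v, counts in by_value.items())
    return rule, errors

# outlook and play columns of weather.nominal (14 days, in file order)
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

rule, errors = one_attribute_rule(outlook, play)
print(rule)    # {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'}
print(errors)  # 4 (out of 14)
```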
