Framework To Approach A Kaggle Problem: 1. Importing The Training / Test Population

This document provides a framework and tips for approaching problems on the Kaggle machine learning competition platform. It notes that competing with experienced data scientists can be challenging, as some have automated tools for data exploration. The tips include: working hard; teaming up initially; focusing on feature engineering; researching the domain and problem; making simple initial submissions; being open to starting from scratch; and experimenting with algorithms and ensembles. It then outlines a framework involving importing training/test data, sampling the population, choosing attributes, and comparing models. The goal is to help readers get started competing on Kaggle to enter the new era of analytics and machine learning.

Original Description:

problem statement

Original Title

Problem Statement

Uploaded by

Govind Naik

Competing with the best data scientists can be challenging, especially when some of them
have been doing it for years. I know a few people who have well-automated scripts that perform
most of the data exploration! These people are already deciding on the best algorithms while the
rest of the world is still figuring out the nuances of the data.

Here are a few things you need to keep in mind before starting a problem on Kaggle:

1. Like all good things in life, winning a Kaggle competition is all about hard work. Get
ready to devote long hours to working on the same problem for days/weeks/months.
2. Team up with a good teammate for your initial competitions. A good teammate
is someone with a similar bent of mind and thought process, but who might have
complementary skills in tools / domain / work experience.
3. Be ready to do a lot of feature engineering – that is what differentiates the best from
the rest.
4. Do some preliminary research on the domain and the problem. There might be good
research papers with unconventional but effective solutions available on the internet.
5. Make simple initial solutions and submit them to get a sense of how much of a gap you
need to cover.
6. Always be open to starting from scratch.
7. Experiment with different algorithms and be prepared to build ensembles.

The list is not exhaustive, but it covers a significant portion. Now let’s look at a simple framework
to approach a Kaggle problem. Kaggle challenges participants at each step of this framework.

Framework to approach a Kaggle Problem

Next, we will take you through a step-by-step process for taking a simple shot at a Kaggle
problem statement. The process generally involves the following pieces:

1. Importing the training / test population: Kaggle challenges you to import the training /
test dataset. In general, this is not very straightforward. For example, in the following problems,
the training data needs to be massaged well before we start working on the model.

Here are two problem statements where you need to extract data from multiple Excel files:

a. Driver Telematics Analysis

b. BCI Challenge @ NER 2015
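A minimal sketch of this import step, assuming each file has been exported as CSV with the same header row (reading native Excel workbooks would need a third-party library). The function name `load_csv_folder` and the `source_file` column are hypothetical, chosen only for illustration.

```python
import csv
from pathlib import Path


def load_csv_folder(folder, pattern="*.csv"):
    """Read every CSV matching `pattern` in `folder` into one list of dict rows.

    Each row gets a `source_file` key so it can be traced back to its file.
    All values are kept as strings; type conversion is left to later steps.
    """
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path.name
                rows.append(row)
    return rows
```

Calling `load_csv_folder("train")` on a hypothetical `train` folder would return one combined list of rows. Keeping the originating file name is useful when each file corresponds to a separate subject, session or trip.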

2. Sampling the population: In general, the population is huge, and training on the entire
population might not be the best idea. For example, in “Sentiment Analysis on Movie
Reviews”, building an initial dictionary from the enormous number of phrases might be a bad idea.
The sample can be chosen randomly or in a stratified way.
3. Choosing the right attributes: This is the most critical step, and it is what distinguishes different
submissions on Kaggle. In general, we use principal component analysis, factor analysis,
Information Value and Weight of Evidence for this part, but there is no set procedure.
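Steps 2 and 3 can be sketched together in a few lines. The code below is an illustration under stated assumptions, not a prescribed procedure: rows are dictionaries, the target is binary (0/1), the feature is already categorical or binned, and Weight of Evidence follows one common convention (log of non-event share over event share). The function names and the `smoothing` parameter are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Draw roughly `fraction` of the rows from each group defined by `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for members in groups.values():
        # Keep at least one row per group so rare classes are not lost.
        k = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, k))
    return sample


def information_value(rows, feature, target, smoothing=0.5):
    """Return (woe_by_bin, iv) for a categorical `feature` and a 0/1 `target`.

    WoE(bin) = ln(share of non-events in bin / share of events in bin)
    IV       = sum over bins of (non-event share - event share) * WoE(bin)
    `smoothing` is added to every count to avoid division by zero
    (an assumption of this sketch, not part of the definition).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [non-events, events]
    for row in rows:
        counts[row[feature]][int(row[target])] += 1
    n_bins = len(counts)
    total_non = sum(c[0] for c in counts.values()) + smoothing * n_bins
    total_evt = sum(c[1] for c in counts.values()) + smoothing * n_bins
    woe, iv = {}, 0.0
    for bin_value, (non, evt) in counts.items():
        p_non = (non + smoothing) / total_non
        p_evt = (evt + smoothing) / total_evt
        woe[bin_value] = math.log(p_non / p_evt)
        iv += (p_non - p_evt) * woe[bin_value]
    return woe, iv
```

Stratifying on the target column keeps the class balance of the sample close to that of the full population. A feature whose bins all have the same event rate gets an Information Value near zero, while a feature that separates events from non-events gets a larger value.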

4. Comparing different ensemble / simple models: Once we have the input and the target
variables, we start building different models. The choice of model depends on the evaluation
metric, the type of input / target variables, the distribution of the population across target values, etc.
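One hedged way to picture step 4 is to score every candidate on the same cross-validation folds with the same metric. The sketch below uses plain accuracy as a stand-in for whatever metric the competition specifies, and two deliberately simple toy models. All names are hypothetical, and it assumes a single numeric input, a 0/1 target, at least as many rows as folds, and both classes present in every training fold.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean hold-out accuracy of the model built by `fit(X_train, y_train)`.

    `fit` must return a function mapping one input to a predicted label.
    """
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        X_train = [x for i, x in enumerate(X) if i not in held_out]
        y_train = [t for i, t in enumerate(y) if i not in held_out]
        predict = fit(X_train, y_train)
        correct = sum(1 for i in fold if predict(X[i]) == y[i])
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def fit_majority(X_train, y_train):
    """Baseline: always predict the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def fit_threshold(X_train, y_train):
    """Toy model: split at the midpoint between the two class means."""
    zeros = [x for x, t in zip(X_train, y_train) if t == 0]
    ones = [x for x, t in zip(X_train, y_train) if t == 1]
    mean0 = sum(zeros) / len(zeros)
    mean1 = sum(ones) / len(ones)
    cut = (mean0 + mean1) / 2
    if mean1 > mean0:
        return lambda x: int(x > cut)
    return lambda x: int(x < cut)
```

Because both models are evaluated with the same `seed`, they see identical folds, so the difference in their scores reflects the models rather than the split. A trivial baseline such as `fit_majority` also shows how much of the score comes from the class distribution alone.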

In this article, we will start with the first step, using the BCI challenge as the example. We will
begin with the problem statement and then define the scope of this article. After reading this article, I
believe you can start competing on Kaggle and begin your journey into the new era of
Analytics & Machine Learning.
