This is one of the most useful resources on practical AI you can find today! Have you ever wanted to play around with the amazing Kaggle datasets? If this interests you, buckle in!
GETTING STARTED!
If you are new to Kaggle, you can create your account with:
- Google
- Facebook
- Kaggle
YOU'RE ALREADY LOGGED IN TO KAGGLE
Ok, now that you have logged into Kaggle, to start playing around you can go to the competitions. Have a look at the most recent ones and the prizes they offer.
HUMPBACK WHALE IDENTIFICATION
CHALLENGE
In this tutorial we're going to be looking at the recent (late 2018) Humpback Whale Identification Challenge. MNIST is also a good place to start; have a look at how simple it is.
CREATING YOUR FIRST KERNEL
Now that we are on the competition page, you can see a blue button called "New Kernel". Just press it.
USING JUPYTER NOTEBOOKS
If you have never heard of Jupyter Notebooks (do you live in a cave? haha), they are an amazing resource for sharing replicable code; they were even used in the gravitational-wave research! Pretty amazing, no? We are going to be using them in our Kernels at Kaggle.
GAME ON!
Now the game is on! We can build amazing Kernels and share them with the community!
COMMITTING YOUR CHANGES
If you aren't familiar with the term commit, have a look here (this channel also has amazing resources in the data science field!). You can commit your changes; don't worry, this won't make your Kernel public yet, so you can go nuts!
YOUR PROFILE HAS YOUR
KERNELS!
You've finished all your code and tests and now you are ready to move on! Make the final commit and go to your profile. There lie your precious Kernels!
ALMOST THERE!
You can open your Kernel by clicking on the title.
GO PUBLIC
To make your now-private Kernel public, you need to click on the "Access" button.
BE PROUD
Change the privacy option from Private to Public and that's it! But wait, my kernel isn't showing up? I know, I know, I've been where you are. If you went to the public kernels and didn't find your own, don't panic; the Kaggle website takes some time to update the kernel list. Now be PROUD: you've made your very first public Kernel!
10 MAJOR STEPS IN YOUR KERNEL
In this video I'll show you the 10 major steps in creating your very first simple model for this whale competition! If you enjoy the content, consider subscribing and turning on notifications; I upload videos every week on data science topics!
If you want more of this content, or any other content, let me know in the comments below. We post weekly on topics such as data science; if you don't want to miss out, just subscribe to our newsletter to receive weekly news in your email!
KAGGLE IMAGE COMPETITION,
HOW TO DEAL WITH LARGE
DATASETS
When I have to deal with huge image datasets, this is what I do. Working with image datasets in Kaggle competitions can be quite problematic: your computer could just freeze and stop caring about you. To keep these things from happening, I'm going to share with you here the 5 major steps for working with image datasets.
THE 5 MAJOR STEPS
I'm posting videos every week, and if you don't want to miss out, subscribe to the channel!
KAGGLE TUTORIAL :
COMPETITIONS – PART I
This Kaggle competition is a great way to get your
hands on real data science and data analysis
problems.
HUMPBACK WHALE IDENTIFICATION
One of the major problems when learning data science is getting your hands on real problems. If you want to become a real data scientist, Kaggle is one of the best places to practice.
ABOUT THIS TUTORIAL
Here I'm going to be doing this Kaggle tutorial on how to get started in one of the current competitions on the website. If you want to follow along, just go to the competitions and scroll down to the Humpback Whale Identification Challenge.
I've been playing around with this challenge for about a month now. You can check out the prizes for this competition; they go up to 10k dollars.
BREAKING DOWN
This Kaggle competition is a great way to get your hands on real data science and data analysis, and I'm going to be breaking down this competition from the very start.
We are going to go from zero to creating a model and making our submissions.
In this first video I'll be showing you the kernel I've made so that you can follow along with the videos. For those who aren't familiar with Kaggle, these kernels are like Jupyter notebooks that you can run in the cloud.
You can check out the specifications of the machine running your scripts here, and you can also check out the commits made to the kernel. The specifications are quite reasonable for running your first models.
LET'S GET CODING
Ok, now that I've given you an introduction to the kernels at Kaggle, we can move on to the coding part. To make our model we'll use PyTorch. I was quite surprised when I asked whether you wanted more videos on Keras or PyTorch and you chose PyTorch, but this is great; I've enjoyed PyTorch much more than Keras and TensorFlow so far.
Apart from PyTorch, which you can see being imported here, we'll use the os library to work with the files, and we're also going to be using pandas; we can't miss that in a data science project.
For the matrix and vector calculations we'll be using our old friend NumPy. To understand our dataset a bit better and visualize it, we'll use matplotlib.
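As a rough sketch of this step: in the kernel itself you'd have `import torch`, `import pandas as pd`, `import numpy as np`, and `import matplotlib.pyplot as plt` alongside `os`. Only the `os` part, "working with the files", is shown running below, against a throwaway folder with made-up file names:

```python
import os
import tempfile

# Throwaway folder standing in for Kaggle's input directory;
# the file names below are invented for illustration.
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "train")
os.makedirs(train_dir)

# Fake a few files so we have something to list.
for name in ("whale_001.jpg", "whale_002.jpg", "notes.txt"):
    open(os.path.join(train_dir, name), "w").close()

# os.listdir plus a filter is a common way to collect the image names.
images = sorted(f for f in os.listdir(train_dir) if f.endswith(".jpg"))
print(images)  # ['whale_001.jpg', 'whale_002.jpg']
```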
KAGGLE TUTORIAL :
COMPETITIONS – PART II
HUMPBACK WHALE IDENTIFICATION
We are going to take the first steps in the Kaggle competition today! YEAH! To participate in Kaggle, one of the major choices you have to make today is which deep learning framework to use, because, well, there are lots of frameworks out there.
PYTORCH
I've asked around and you've chosen PyTorch, and this is great, because I'm loving PyTorch so far. If you haven't seen the first video, it's fine, I know your time is precious; I'll just lay out a quick review for you. I introduced the Kaggle website, the competition, the prizes, and what this series of videos is all about. If after finishing this video you still want to watch the first one, great, I'll see you there.
THE FUN PART
Now for the fun part. We already went through the libraries here; the next step is to create a class for our dataset.
But why do we need a class for our dataset? I understand you; the first time I tried to play around with PyTorch, I got a little frustrated that there wasn't a simple way to load the dataset.
I'm not talking about MNIST- and CIFAR10-like datasets here; there are simple ways to load those into memory.
I'm talking about a custom dataset, just like the ones you'll encounter if you get the chance to work as a data scientist. But I'm glad I got around to creating the dataset class, because it gets pretty handy for dealing with more complex situations.
And once you've created it the first time for a dataset, you pretty much copy and paste the class and make the adjustments for your specific dataset.
I myself followed the tutorial in the PyTorch documentation; if you want to have a look, it's a great reading addition to this tutorial. Let me know in the comments if you found the reference useful, so I make more of these in the videos.
THE CLASS
The first thing we create here is the __init__ method. If you want to share your code, it's a good idea to create a docstring in your functions. I've explained here the parameters of this function: we need to pass the path of the CSV file containing the data and the root directory of our project; then we can pass a transform (we'll come back to this later), and we can also pass whether this is the testing dataset.
You can see here that if we have a test dataset, I pass the dataset to the class; you could also change this to receive the CSV filename of the test dataset and read it with pandas inside here.
If we are not passing the test dataset, we call the one-hot encoding function. Here we read the training dataset with pandas; you can use df.head() to check out the dataset. We have the names of the images and the classes.
Now that we have created our dataframe, we can also create a variable for our labels. To transform our labels into one-hot encoded vectors, we can use sklearn. We can see here that it transformed the classes into a one-dimensional vector.
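The kernel uses sklearn for this step. As a dependency-free sketch of what the encoding does (the whale Ids below are invented): the one-dimensional vector mentioned above corresponds to the integer codes that sklearn's LabelEncoder produces, while the one-hot form expands each code into a row:

```python
# Invented whale Ids, like the Id column of train.csv.
labels = ["new_whale", "w_123", "new_whale", "w_456"]

# Step 1: map each distinct class to an integer code; this 1-D
# vector of codes is what sklearn's LabelEncoder gives you.
classes = sorted(set(labels))
codes = [classes.index(lab) for lab in labels]
print(codes)  # [0, 1, 0, 2]

# Step 2: expand each code into a one-hot row, as sklearn's
# OneHotEncoder (or pandas' get_dummies) would.
one_hot = [[1 if i == c else 0 for i in range(len(classes))] for c in codes]
print(one_hot)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```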
Continuing here, we just store the root directory and the transform; we'll get back to this transform later. Now we have two more methods, __len__ and __getitem__. The __len__ method will only return the length of our dataset; __getitem__ is more interesting.
This function is the one you need to implement to get one record from your dataset. We get the img_name by joining the root directory of our project and the name of the image; we use pandas' iloc function here.
We can use iloc to get a record from our dataset: if we just put the index 0 here, it'll return the first record, but we want the image name, so we add another argument to tell the function we want the first column.
After this we get the label associated with that image, load the image into memory, and return everything as a dict.
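PyTorch's Dataset protocol only requires `__len__` and `__getitem__`, so the class described above can be sketched without torch or pandas installed. The class name, CSV columns, and placeholder image loading below are assumptions; in the real kernel you would subclass `torch.utils.data.Dataset`, read the CSV with pandas, and open the image with PIL:

```python
import csv
import io

# Stand-in for train.csv: image filename + whale Id columns,
# with invented values for illustration.
CSV_TEXT = "Image,Id\nwhale_001.jpg,new_whale\nwhale_002.jpg,w_123\n"

class WhaleDataset:
    """Torch-free sketch of the tutorial's dataset class.

    Parameters:
        csv_file: file-like object with Image,Id columns
        root_dir: directory that holds the image files
        transform: optional callable applied to each sample
    """

    def __init__(self, csv_file, root_dir, transform=None):
        self.records = list(csv.DictReader(csv_file))
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        # Length of the dataset = number of CSV rows.
        return len(self.records)

    def __getitem__(self, idx):
        row = self.records[idx]
        img_name = self.root_dir + "/" + row["Image"]
        # In the real kernel you'd load the image here (e.g. PIL.Image.open)
        # instead of just passing the path along.
        sample = {"image": img_name, "label": row["Id"]}
        if self.transform:
            sample = self.transform(sample)
        return sample

ds = WhaleDataset(io.StringIO(CSV_TEXT), root_dir="train")
print(len(ds))          # 2
print(ds[0]["label"])   # new_whale
```

Because only the two dunder methods matter, the same object would drop straight into a `torch.utils.data.DataLoader` once the placeholder loading is replaced with real image reads.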
INSTANTIATE OUR CLASS
We can instantiate our dataset now. You can call the dataset and pass an index; this is the index used in the __getitem__ function we just saw.
We have the image and the label, and we can use matplotlib to plot the image if we want to check that it's ok. In the next tutorials we'll build on our dataset class, doing some basic preprocessing so we can create our convolutional neural net with PyTorch.