Kaggle Tutorial 1

This document provides instructions for getting started with Kaggle competitions using Jupyter Notebooks on Kaggle. It discusses creating a Kaggle account, navigating to competitions, creating a kernel (notebook), installing necessary libraries like PyTorch, defining a dataset class to load competition data, and making the kernel's code public. The steps covered include logging into Kaggle, exploring sample competitions and datasets, starting a new kernel, using Jupyter Notebooks, committing and publishing code, and accessing kernels on one's profile page. The overall document aims to guide users through getting set up on Kaggle and beginning to engage with the platform and its machine learning competitions.

This is quite possibly the most useful resource on practical AI that you can find today! Have you ever wanted to play around with the AMAZING Kaggle datasets? If this interests you, buckle in!
GETTING STARTED!
If you are new to Kaggle, you can create your account with:
- Google
- Facebook
- Kaggle
YOU'RE ALREADY LOGGED IN TO KAGGLE

Ok, now that you are logged into Kaggle, you can go to the competitions to start playing around. You can have a look at the most recent ones and what prizes they offer.
HUMPBACK WHALE IDENTIFICATION
CHALLENGE

In this tutorial we're going to be looking at the recent (late 2018) Humpback Whale Identification Challenge. MNIST is also a good place to start; you can have a look at how simple it is.
CREATING YOUR FIRST KERNEL

Now that we're on the competition page, you can see a blue button called "New Kernel"; just press it.
USING JUPYTER NOTEBOOKS
If you have never heard of Jupyter Notebooks (do you live in a cave? hahaha), they are an amazing resource for sharing replicable code; they were even used for the Gravitational Waves discovery! Pretty amazing, right? We are going to be using them in our Kernels at Kaggle.
GAME ON!

Now the game is on! We can make amazing Kernels and share them with the community!
COMMITING YOUR CHANGES

If you aren't familiar with the term commit, have a look here (this channel also has amazing resources in the Data Science field!). You can commit your changes; don't worry, this won't make your Kernel public yet, so you can go nuts!
YOUR PROFILE HAS YOUR
KERNELS!

You finished all your code and tests and now you are ready to move on! Make the final commit and go to your profile. There lie your precious Kernels!
ALMOST THERE!

You can open your Kernel by clicking on the title.


GO PUBLIC
To make your now Private Kernel Public you need
to click on the "Access" button
BE PROUD
Change the Privacy Options from Private to Public and that's it! But wait, my kernel isn't showing up? I know, I know, I've been where you are. If you went to the public kernels and didn't find your own, don't panic; the Kaggle website takes some time to update the kernels list. Now be PROUD: you've made your very first public Kernel!
10 MAJOR STEPS IN YOUR KERNEL
In this video I'll show you the 10 major steps in creating your very first simple model for this Whale Competition! If you enjoy the content, consider subscribing and activating notifications; I upload videos every week on Data Science topics!
If you want more of this content, or any other content, let me know in the comments below. We post weekly on topics such as Data Science; if you don't want to miss out, just subscribe to our Newsletter to receive weekly news in your email!
KAGGLE IMAGE COMPETITION,
HOW TO DEAL WITH LARGE
DATASETS

When I have to deal with huge image datasets, this is what I do. Working with image datasets in Kaggle competitions can be quite problematic: your computer could just freeze and stop responding. To stop these things from happening, I'm going to share with you here the 5 Major Steps for working with image datasets.
THE 5 MAJOR STEPS
I'm posting videos every week and if you don't
want to miss out, subscribe to the channel!
KAGGLE TUTORIAL :
COMPETITIONS – PART I

This Kaggle competition is a great way to get your hands on real data science and data analysis problems.
HUMPBACK WHALE IDENTIFICATION
One of the major problems when learning data science is getting your hands on real problems. If you want to become a real data scientist or learn data science, Kaggle is one of the best places to practice.

ABOUT THIS TUTORIAL


Here I'm going to be doing this Kaggle tutorial on how to get started in one of the website's current competitions. If you want to follow along, just go to the competitions and scroll down to the Humpback Whale Identification Challenge. I've been playing around with this challenge for about a month now. You can check out the prizes for this competition; they go up to 10k dollars.
BREAKING DOWN
This Kaggle competition is a great way to get your hands on real data science and data analysis. I'm going to be breaking down this competition from the very start.

We are going to go from 0 to creating a model and making our submissions.

In this first video I'll be showing you the kernel I've made so that you can follow along with the videos. For those who aren't familiar with Kaggle, these kernels are like Jupyter notebooks that you can run in the cloud.

You can check out the specifications of the machine running your scripts here, and you can also check out the commits made to the kernel. The specifications are quite reasonable for running your first models.
LET'S GET CODING
Ok, now that I've given you an introduction to the kernels at Kaggle, we can move into the coding part. To make our model we'll use PyTorch. I was quite surprised when I asked whether you wanted more videos on Keras or PyTorch and you chose PyTorch, but this is great; I've enjoyed PyTorch much more than Keras and TensorFlow so far.

Apart from PyTorch, which you can see being imported here, we'll use the os library to work with files, and also pandas; we can't miss that in our data science project.

For matrix and vector calculations we'll be using our old friend NumPy. To understand our dataset a bit better and visualize it, we'll use Matplotlib.
KAGGLE TUTORIAL :
COMPETITIONS – PART II

This Kaggle competition is a great way to get your hands on real data science and data analysis problems.
HUMPBACK WHALE IDENTIFICATION
We are going to take the first steps in the Kaggle competition today! YEAH! To participate in Kaggle, one of the major choices one has to make today is which deep learning framework to use, because, well, there are lots of frameworks out there.

PYTORCH
I’ve asked around and you’ve chosen PyTorch, and this is great, because I’m loving PyTorch so far. If you haven’t seen the first video, it’s fine, I know your time is precious; I’ll just lay out a quick review for you. I introduced the Kaggle website, the competition, the prizes, and what this series of videos is all about. If after finishing this video you still want to watch the first one, great, I’ll see you there.
BREAKING DOWN
I’m going to be breaking down this competition from the very start. We are going to go from 0 to creating a model and making our submissions.

In the first video I showed you the kernel I’ve made so that you can follow along with the videos. For those who aren’t familiar with Kaggle, these kernels are like Jupyter notebooks that you can run in the cloud.

You can check out the specifications of the machine running your scripts here, and you can also check out the commits made to the kernel. The specifications are quite reasonable for running your first models.
THE FUN PART
Now for the fun part. We already went through the libraries; the next step is to create a class for our dataset.

But why do we need a class for our dataset? I understand you; the first time I tried to play around with PyTorch I got a little frustrated that there wasn’t a simple way to load the dataset.

I’m not talking about MNIST- and CIFAR10-like datasets here; there are simple ways to load those datasets into memory.
THE FUN PART
I’m talking about a custom dataset, just like the ones you’ll encounter if you get the chance to work as a data scientist. But I’m glad I got around to creating the dataset class, because it gets pretty handy for dealing with more complex situations.

When you create one for the first time for a dataset, you pretty much copy and paste the class and make the adjustments for your specific dataset.

I myself followed this tutorial in the PyTorch documentation; if you want to have a look, it’s a great reading addition to this tutorial. Let me know in the comments if you found the reference useful, so I can make more of these in the videos.
THE CLASS
The first thing we create here is the __init__ method; if you want to share your code, it’s a good idea to write a docstring for your functions. I’ve explained the parameters to this function here: we need to pass the path of the CSV file containing the data and the root directory of our project; then we can pass a transform (we’ll come back to this later), and we can also pass whether this is the testing dataset.

You can see here that if we have a test dataset, in this case I passed the dataset to the class; you could also change this to receive the CSV filename of the test dataset and read it with pandas inside here.
THE CLASS
If we are not passing the test dataset, we call the one-hot encoding function. Here we read the training dataset with pandas; you can use df.head() to check out the dataset. We have the names of the images and the classes.
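Reading the labels file and peeking at it might look like this. The `Image` and `Id` column names follow the competition's `train.csv` format, but treat the exact values here as illustrative; a small inline sample stands in for the real file:

```python
import io

import pandas as pd

# In the real kernel this would be pd.read_csv("../input/train.csv");
# here a tiny inline sample stands in for the competition file.
csv_text = """Image,Id
img_001.jpg,new_whale
img_002.jpg,w_f48451c
img_003.jpg,new_whale
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first rows: image filename and whale class
print(df.shape)    # (rows, columns)
```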

Now that we have created our dataframe, we can also create a variable for our labels. To transform our labels into one-hot encoded vectors, we can use sklearn. We can see here that it transformed the classes into a one-dimensional vector.

Continuing here, we just add the root directory and the transform; we’ll get back to this transform later. Now we have two more methods, __len__ and __getitem__. The __len__ method only returns the length of our dataset; __getitem__ is more interesting.
THE CLASS
This is the method you need to implement to get one record from your dataset. We get the img_name by joining the root directory of our project and the name of the image; we use the iloc function from pandas here.
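A quick illustration of how iloc picks out the image name (the column names and values are just example data):

```python
import pandas as pd

df = pd.DataFrame({"Image": ["img_001.jpg", "img_002.jpg"],
                   "Id": ["new_whale", "w_f48451c"]})

print(df.iloc[0])      # the whole first record, as a Series
print(df.iloc[0, 0])   # first row, first column: the image filename
print(df.iloc[0, 1])   # first row, second column: the label
```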

We can use this to get a record from our dataset: if we just put the index 0 here, it’ll return the first record. But we want the image name, so we add another argument to let the function know we want the first column.

After this we get the label associated with that image, load the image into memory, and return everything as a dict.
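Putting the pieces described above together, the class might look roughly like this. This is a minimal sketch: the `WhaleDataset` name is my choosing, it assumes the `Image`/`Id` CSV layout, and it integer-encodes labels with pandas rather than sklearn one-hot vectors; the original kernel's code may differ:

```python
import os

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class WhaleDataset(Dataset):
    """Loads whale images and labels for the competition.

    Args:
        csv_file: path to the CSV with image names and classes.
        root_dir: directory containing the image files.
        transform: optional transform applied to each image.
        test: if True, the dataset has no labels.
    """

    def __init__(self, csv_file, root_dir, transform=None, test=False):
        self.df = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform
        self.test = test
        if not self.test:
            # integer-encode the class names (a one-hot step could go here)
            self.labels = pd.factorize(self.df.iloc[:, 1])[0]

    def __len__(self):
        # number of records in the CSV
        return len(self.df)

    def __getitem__(self, idx):
        # first column holds the image file name
        img_name = os.path.join(self.root_dir, self.df.iloc[idx, 0])
        image = Image.open(img_name)
        if self.transform:
            image = self.transform(image)
        if self.test:
            return {"image": image}
        return {"image": image, "label": self.labels[idx]}
```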
INSTANTIATE OUR CLASS
We can instantiate our dataset now. You can call the dataset and pass an index; this is the index used in the __getitem__ method we just saw.

We have the image and the label; we can use matplotlib to plot them if we want to check that everything is OK. In the next tutorials we’ll be moving on to creating a class to handle our dataset, then doing some basic preprocessing so we can create our convolutional neural net with PyTorch.
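Checking a sample with matplotlib could look like this. A random array stands in for an actual whale image here, since the competition files aren't available in this sketch:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# a stand-in sample dict, shaped like the one our dataset returns
sample = {"image": np.random.rand(64, 64, 3), "label": 0}

plt.imshow(sample["image"])
plt.title(f"label: {sample['label']}")
plt.axis("off")
plt.savefig("sample.png")  # use plt.show() inside a notebook
```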
