
05/02/2016

Data - Second Annual Data Science Bowl | Kaggle

$200,000 460 teams

Second Annual Data Science Bowl


Mon 14 Dec 2015


Merger and 1st Submission Deadline

Mon 14 Mar 2016 (38 days to go)


Data Files

File Name                         Available Formats
validate                          .zip (5.16 gb)
train                             .zip (12.71 gb)
train.csv                         .zip (3.05 kb)
sample_submission_validate.csv    .zip (3.12 kb)

In this dataset, you are given hundreds of cardiac MRI images in DICOM format. These
are 2D cine images that contain approximately 30 images across the cardiac cycle. Each
slice is acquired on a separate breath hold. This is important since the registration from
slice to slice is expected to be imperfect.
The competition task is to create an automated method capable of determining the left
ventricle volume at two points in time: after systole, when the heart is contracted and
the ventricles are at their minimum volume, and after diastole, when the heart is at its
largest volume.


The volumes at systole, $V_S$, and diastole, $V_D$, form the basis of an important clinical
measurement known as the ejection fraction:

$$100 \cdot \frac{V_D - V_S}{V_D}$$

This quantity represents the fraction of outbound blood pumped from the heart with
each heartbeat. An ejection fraction that is too low can signify a wide range of cardiac
problems.
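
As a minimal sketch, the ejection fraction, 100 * (VD - VS) / VD, can be computed from the two volumes in Python. The function name `ejection_fraction` and the example volumes are illustrative assumptions, not values from the competition data.

```python
def ejection_fraction(v_diastole: float, v_systole: float) -> float:
    """Return the ejection fraction in percent: 100 * (VD - VS) / VD."""
    if v_diastole <= 0:
        raise ValueError("diastolic volume must be positive")
    return 100.0 * (v_diastole - v_systole) / v_diastole


# Hypothetical volumes in millilitres, chosen only for illustration.
print(ejection_fraction(150.0, 60.0))  # 60.0
```

Guarding against a non-positive diastolic volume avoids a division by zero if a predicted volume is degenerate.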


Variations in anatomy, function, image quality, and acquisition make automated
quantification of left ventricle size a challenging problem. You will encounter this
variation in the competition dataset, which aims to provide a diverse representation of
cases. It contains patients from young to old, images from numerous hospitals, and
hearts from normal to abnormal cardiac function. A computational method which is
robust to these variations could both validate and automate the cardiologists' manual
measurement of ejection fraction.
This is a two-stage competition. In the first stage, you are building models based on the
training dataset, and testing your models by submitting predictions on the validation
set. Two weeks before the final deadline, you will submit your model to Kaggle. At this
point, the second stage of the competition starts. Kaggle will release the final test
dataset, on which you will run your models. The final standings are based on this final
test set.

File descriptions
Each case has an associated directory of DICOM files. The exact number of images will
differ from case to case, varying in the number of slices, the views which are
captured, or the number of frames in the time sequences.
The main view for assessing ventricle size is the short axis stack, which contains images
taken in a plane perpendicular to the long axis of the left ventricle. These have the
prefix "sax_" in the competition dataset. Most cases also have alternative views, which
you should feel free to incorporate into your methodology.
The structure is as follows:
train.zip - the train set directory, containing cases for which you have the
  associated systolic and diastolic volumes.
validate.zip - the validation set directory, used for the leaderboard in stage
  one of the competition. You should predict the volumes for these cases
  during stage one.
test.zip - the test set, used for the leaderboard in stage two of the
  competition (a.k.a. the final standings). You should predict the volumes for
  these cases during stage two. This file will not be released until the second
  stage.
train.csv - contains the systolic and diastolic volumes for the cases in the
  training set.
sample_submission_validate.csv - a sample submission file in the correct
  format for stage one.
sample_submission_test.csv - a sample submission file in the correct
  format for stage two. This file will not be released until the second stage.
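
The layout described above can be explored with a short Python sketch. It assumes the archives have been extracted locally. The helper names `find_sax_dirs` and `read_volumes` are hypothetical, and the column names `Id`, `Systole` and `Diastole` are an assumption to check against the real header of train.csv. Only the "sax_" prefix comes from the description above.

```python
import csv
import io
from pathlib import Path


def find_sax_dirs(case_dir):
    """Return all directories under case_dir whose name starts with 'sax_'."""
    return sorted(p for p in Path(case_dir).rglob("sax_*") if p.is_dir())


def read_volumes(csv_file):
    """Map case id -> (systole, diastole) from an open CSV file object.

    The column names 'Id', 'Systole' and 'Diastole' are an assumption;
    check the header of the real train.csv before relying on them.
    """
    reader = csv.DictReader(csv_file)
    return {row["Id"]: (float(row["Systole"]), float(row["Diastole"]))
            for row in reader}


# Tiny in-memory stand-in for train.csv with made-up values.
example = io.StringIO("Id,Systole,Diastole\n1,60.0,150.0\n")
print(read_volumes(example))
```

Searching recursively avoids hard-coding how deep the short axis directories sit inside each case directory, which is not specified here.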

DICOM
The DICOM standard is complex and there are a number of different tools to work
with DICOM files. You may find the following resources helpful for managing the
competition data:
The lite version of OsiriX is useful for viewing images on OS X

pydicom - a package for working with images in Python
oro.dicom - a package for working with images in R
Mango is a useful DICOM viewer for Windows users
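
For the pydicom route, a minimal sketch of reading one file and summarising it might look like the following. It assumes a recent pydicom release, where the reader is `pydicom.dcmread` (older releases exposed it as `dicom.read_file`). The helper names `read_dicom` and `describe` are hypothetical, and the stand-in values are made up.

```python
from types import SimpleNamespace


def read_dicom(path):
    """Read one DICOM file with pydicom (imported lazily so the rest of
    this sketch runs without it). Returns a pydicom Dataset."""
    import pydicom  # third-party: pip install pydicom
    return pydicom.dcmread(path)


def describe(ds):
    """Summarise image size and slice position of a dataset-like object."""
    return {
        "rows": int(ds.Rows),
        "columns": int(ds.Columns),
        # SliceLocation is optional in DICOM, so fall back to NaN.
        "slice_location": float(getattr(ds, "SliceLocation", float("nan"))),
    }


# Stand-in object with made-up values; a real Dataset returned by
# read_dicom() exposes the same attribute names.
fake = SimpleNamespace(Rows=256, Columns=192, SliceLocation=12.5)
print(describe(fake))
```

On a real Dataset, the pixel data is available as a NumPy array through `ds.pixel_array`.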

FAQ
We will add to this section as relevant common questions arise.
How do I know where the left ventricle is? How do I compute its volume?
Watch this video for a primer on the anatomy and process used by clinicians:

Video: Second Annual Data Science Bowl Competition Tutorial ...

I see more than one series at the same slice location. How should we deal with
those cases?
Generally, a slice location is repeated if there is an artifact on the images. You can use
either slice but the odds are that the last slice at a given slice location is the best the
technologist could acquire.
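
One way to apply this advice is to keep only the last series seen at each slice location. The sketch below assumes the caller already has the series in acquisition order (for example, by series number) and that rounding locations to 0.1 is enough to treat two locations as equal. The function name and sample values are hypothetical.

```python
def keep_last_per_location(series):
    """series: iterable of (series_name, slice_location) in acquisition order.

    Returns the names to use, one per slice location, preferring the
    series that appears last for each location.
    """
    last = {}
    for name, location in series:
        # Round so tiny floating-point differences do not split a location.
        last[round(location, 1)] = name
    # Order the result by slice location.
    return [last[loc] for loc in sorted(last)]


# Made-up series names and locations for illustration.
acquired = [("sax_5", -10.0), ("sax_6", 0.0), ("sax_7", 0.0), ("sax_8", 10.0)]
print(keep_last_per_location(acquired))  # ['sax_5', 'sax_7', 'sax_8']
```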
Some MRI images are not consistent (in size, shape, or structure). What should
we do about these?
We have opted to include as many cases as possible in this dataset. As this is real data
from many sources, it is bound to have some amount of unwanted variability. You
should do your best to handle these files. Since this is a two-stage competition and the
test set may have unseen abnormalities, we recommend including some form of error
catching as you write your code.
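
A minimal sketch of such error catching is to wrap the per-case work in a try/except and substitute a default prediction when a case fails. Here `predict_case` and `fallback` are hypothetical: the first stands for whatever per-case pipeline you build, the second for a default such as the mean training-set volumes.

```python
import logging

logger = logging.getLogger("dsb")


def predict_all(case_ids, predict_case, fallback):
    """Run predict_case on every case; use fallback when a case fails.

    predict_case and fallback are hypothetical: predict_case(case_id)
    returns (systole, diastole), and fallback is a default such as the
    training-set mean volumes.
    """
    results = {}
    for case_id in case_ids:
        try:
            results[case_id] = predict_case(case_id)
        except Exception:
            # Log the traceback and keep going so one bad case does not
            # abort the whole run.
            logger.exception("case %s failed; using fallback", case_id)
            results[case_id] = fallback
    return results
```

Logging the traceback rather than silently swallowing it keeps a record of which cases needed the fallback.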

Citation
The data for the Data Science Bowl is available for research and academic pursuits.
Please cite as Data Science Bowl Cardiac Challenge Data.

2016 Kaggle Inc


https://www.kaggle.com/c/secondannualdatasciencebowl/data
