
2020 Big Data Challenge Data Portal

Categorized Data Sets



○ This repository represents data for over 50,000 patients with diabetes. Datasets
include allergy intolerance, family history, medical procedures, disease case
indicator, health conditions, demographics, diagnosis laboratory results, risk factors,
encounter, medication, and vaccine.
● Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE
○ Real-time updates on coronavirus confirmed cases, deaths, and recoveries

General Data Portals - Use these portals to search for your data sets
● Government of Canada Open Data
○ The Health and Safety section includes 1588 open data sets that are relevant to Canadians
● Google Dataset Search
● Kaggle
○ Kaggle, a subsidiary of Google LLC, is an online community of data scientists and
machine learning practitioners.
● Geodata
○ The Environmental Data Explorer is the authoritative source for data sets used by
UNEP and its partners in the Global Environment Outlook (GEO) report and other
integrated environment assessments. Its online database holds more than 500
variables, such as national, subregional, regional and global statistics, covering
themes like Freshwater, Population, Forests, Emissions, Climate, Disasters, Health
and GDP.
● Gapminder
○ Gapminder Foundation is a non-profit venture registered in Stockholm, Sweden,
that promotes sustainable global development and achievement of the United
Nations Millennium Development Goals by increased use and understanding of
statistics and other information about social, economic and environmental
development at local, national and global levels
● World Bank
○ World Bank Climate Change
○ World Bank Urban Development
○ World Bank Health
○ Data Portal
○ The World Bank is an international financial institution that provides loans and
grants to the governments of poorer countries for the purpose of pursuing capital
projects. It comprises two institutions: the International Bank for Reconstruction
and Development, and the International Development Association.
● STEM Base

Public Health Data - General


● Government of Canada Public Health Data
○ Resources on the state of Canadians' health, including statistics, facts, reports and
data.
● Public Health Agency Canada
○ The Public Health Agency of Canada is an agency of the Government of Canada that
is responsible for public health, emergency preparedness and response, and
infectious and chronic disease control and prevention.
● Health Research Data Resources
● National Center for Health Statistics
○ The National Center for Health Statistics is a principal agency of the U.S. Federal
Statistical System which provides statistical information to guide actions and
policies to improve the health of the American people.
● UN Office on Drugs and Crime Data
○ UNODC regularly provides global statistical series on crime, criminal justice, drug
trafficking and prices, drug production, and drug use. Data produced by UNODC
have multiple sources. Member States regularly submit to UNODC statistics on drugs
(through the Annual Report Questionnaire) and crime and criminal justice (through
the Crime Trend Survey). Other data are collected through national surveys
implemented by UNODC in cooperation with national governments or are compiled
from scientific literature. UNODC also applies scientific methods to maximize the
comparability of the data and estimate regional and global statistics.
● CDC Drug Overdose Data
● US National Survey on Drug Use and Health
● CDC Wonder (General Health Data)
● Health Data by US County/State
● Canadian Institute for Health Information
○ The Canadian Institute for Health Information (CIHI) is an independent,
not-for-profit organization that provides essential information on Canada's health
systems and the health of Canadians. CIHI provides comparable and actionable data
and information that are used to accelerate improvements in health care, health
system performance and population health across Canada. Stakeholders use the
broad range of health system databases, measurements and standards, together
with evidence-based reports and analyses, in their decision-making processes. CIHI
protects the privacy of Canadians by ensuring the confidentiality and integrity of the
health care information.
● Leafly: Reviews and consumer data; the strain reviews could potentially support a
meta-analysis
● Molecule Attributes: drug bank data
● WHO:
○ Description: Provides datasets based on global health priorities. The site
offers easy search and provides insights on topics along with the datasets.
● CDC:
○ Description: Use this for US-specific public health data. The CDC maintains WONDER
(Wide-ranging Online Data for Epidemiologic Research); data sets are searchable by
topic, state, and other factors.
● 1000 Genomes Project:
○ Sequencing data from 2500 individuals across 26 different populations. It’s one of the
biggest genome repositories you can access and is an international collaboration. It’s
accessed through AWS. (Note: grants are available for genome projects.)

● CHDS: Child Health and Development Studies datasets are intended for research into how disease and
health pass down through generations. They support research into not just genomic
expression but also how social, environmental, and cultural factors play into disease and health.
● Biobank: Integrated within the Canadian Health Measures Survey (CHMS), the biobank is
designed to produce a nationally representative cohort to facilitate the progress of new and
innovative health research projects. The biobank currently holds biospecimens (blood,
urine and DNA) collected from over 22,000 consenting Canadians between the ages of 3 and
79 years. Additionally, respondent privacy and confidentiality are of utmost importance and
are upheld by Statistics Canada governance and responsibilities under the Statistics Act.
● Centre for Health Informatics and Analytics (CHIA): The Centre for Health Informatics and
Analytics (CHIA), based at Memorial University, provides next-generation health informatics
and data analytics hardware and software platforms to facilitate the rapid interrogation and
integration of complex clinical and research data from multiple partner organizations.
● Canadian Research Data Centre Network (CRDCN): The CRDCN offers access to over 300 data
cycles collected by Statistics Canada that include social, economic, and health determinants.
For example, health information is available from the Canadian Community Health Survey,
Healthy Aging Survey, Survey on Living with Chronic Diseases in Canada, Canadian Survey
on Disability, and Mental Health Survey.

Images
● OASIS:
○ Description: The Open Access Series of Imaging Studies makes neuroimages of the brain
freely available, hoping to foster research and new advances in both basic health and
clinical neuroscience.
● OpenfMRI:
○ Other imaging data sets from MRI machines to foster research, better diagnostics, and
training. It includes 95 datasets from 3372 subjects with new material being added as
researchers make their own data open to the public.
● CT Medical Images:
○ This one is a small dataset, but it’s specifically cancer-related. It contains labeled
images with age, modality, and contrast tags. Again, high-quality images associated
with training data may help speed breakthroughs.
● Deep Lesion:
○ One of the largest image sets currently available. These CT images were released by the
NIH to help improve the accuracy of lesion documentation and diagnosis. It includes over
32,000 lesions from 4000 unique patients.
