List of Datasets For Machine-Learning Research
These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of
machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively,
the availability of high-quality training datasets.[1] High-quality labeled training datasets for supervised and semi-supervised machine learning algorithms are
usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality
datasets for unsupervised learning can also be difficult and costly to produce.[2][3][4][5]
Many organizations, including governments, publish and share their datasets. The datasets are classified, based on their licenses, as open data and non-open data.
The datasets from various governmental bodies are presented in List of open government data sites. The datasets are hosted on open data portals, where they are
made available for searching, depositing, and accessing through interfaces such as Open API. The datasets are made available as various sorted types and subtypes.
Specific category: Finance, Economics, Commerce, Societal, Health, Academy, Sports, Food, Agriculture, Travel, Geospatial, Political, Consumer, Transport, Logistics, Environmental, Real-Estate, Legal, Entertainment, Energy, Hospitality
Scope: Supranational Union, National, Subnational, Municipality, Urban, Rural
Status (https://docs.openml.org/#dataset-status): Verified, In-Preparation, Deactivated (or Deprecated)
Number of records: 100s, 1000s, 10000s, 100000s, Millions
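As a rough illustration of how the types and subtypes above might be recorded in code, the following Python sketch models one catalogue entry. All names here (`Status`, `record_bucket`, `DatasetEntry`) are hypothetical and not part of any portal's API, and the bucket boundaries (for example, treating every count below 1,000 as "100s") are an assumption, since the list only names the labels.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Dataset status values as named in the list above."""
    VERIFIED = "Verified"
    IN_PREPARATION = "In-Preparation"
    DEACTIVATED = "Deactivated"


def record_bucket(n: int) -> str:
    """Map a record count to one of the coarse size labels.

    The cut-off points are an assumption made for this sketch.
    """
    if n < 0:
        raise ValueError("record count must be non-negative")
    buckets = (
        (1_000, "100s"),
        (10_000, "1000s"),
        (100_000, "10000s"),
        (1_000_000, "100000s"),
    )
    for upper, label in buckets:
        if n < upper:
            return label
    return "Millions"


@dataclass(frozen=True)
class DatasetEntry:
    """One hypothetical catalogue entry described by the four types."""
    name: str
    category: str   # e.g. "Health", "Transport"
    scope: str      # e.g. "National", "Municipality"
    status: Status
    records: int

    @property
    def size_label(self) -> str:
        return record_bucket(self.records)


# Illustrative values only; they do not describe a real dataset.
example = DatasetEntry(
    name="example-dataset",
    category="Health",
    scope="National",
    status=Status.VERIFIED,
    records=60_000,
)
print(example.size_label)
```

Keeping the size label as a derived property rather than a stored field means it cannot drift out of step with the actual record count.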
Data portals are classified by their type of license. Portals built on open-source licenses are known as open data portals and are used by many
government organizations and academic institutions.
Comprehensive Knowledge Archive Network (CKAN) | AGPL | https://ckan.github.io/ckan-instances/ and https://github.com/sebneu/ckan_instances/blob/master/instances.csv | Data repository for government or non-profit organisations, Data Management Solution for Research Institutes
Dataverse | Apache | https://dataverse.org/installations and https://dataverse.org/metrics | Data Management Solution for Research Institutes
Datasetlist.com https://www.datasetlist.com
Global Open Data Index – Open Knowledge Foundation https://index.okfn.org/ Archived (https://web.archive.org/web/20200525213547/https://index.okfn.org/) 25 May 2020 at the Wayback Machine
Google Dataset Search https://datasetsearch.research.google.com/
Kaggle https://www.kaggle.com/datasets
OpenDOAR https://v2.sherpa.ac.uk/opendoar/
OpenML https://www.openml.org/search?type=data
Papers with Code https://paperswithcode.com/datasets
Image data
These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.
Facial recognition
In computer vision, face images have been used extensively to develop facial recognition systems, face detection, and many other projects that use images of faces.
Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) | 7,356 video and audio recordings of 24 professional actors. 8 emotions each at two intensities. | Files labelled with expression. Perceptual validation ratings provided by 319 raters. | 7,356 | Video, sound files | Classification, face recognition, voice recognition | 2018 | [12][13] | S.R. Livingstone and F.A. Russo
SCFace | Color images of faces at various angles. | Location of facial features extracted. Coordinates of features given. | 4,160 | Images, text | Classification, face recognition | 2011 | [14][15] | M. Grgic et al.
Yale Face Database | Faces of 15 individuals in 11 different expressions. | Labels of expressions. | 165 | Images | Face recognition | 1997 | [16][17] | J. Yang et al.
Cohn-Kanade AU-Coded Expression Database | Large database of images with labels for expressions. | Tracking of certain facial features. | 500+ sequences | Images, text | Facial expression analysis | 2000 | [18][19] | T. Kanade et al.
BioID Face Database | Images of faces with eye positions marked. | Manually set eye positions. | 1521 | Images, text | Face recognition | 2001 | [24][25] | BioID
UOY 3D-Face | neutral face, 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised. | labeling. | 5250 | Images, text | Face recognition, classification | 2004 | [30][31] | University of York
CASIA 3D Face Database | Expressions: Anger, smile, laugh, surprise, closed eyes. | None. | 4624 | Images, text | Face recognition, classification | 2007 | [32][33] | Institute of Automation, Chinese Academy of Sciences
CASIA NIR | Expressions: Anger Disgust Fear Happiness Sadness Surprise | None. | 480 | Annotated Visible Spectrum and Near Infrared Video captures at 25 frames per second | Face recognition, classification | 2011 | [34] | Zhao, G. et al.
Face Recognition Grand Challenge Dataset | Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D Data. | None. | 4007 | Images, text | Face recognition, classification | 2004 | [36][37] | National Institute of Standards and Technology
Gavabdb | Up to 61 samples for each subject. Expressions neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images. | None. | 549 | Images, text | Face recognition, classification | 2008 | [38][39] | King Juan Carlos University
SoF | 112 persons (66 males and 46 females) wear glasses under different illumination conditions. | A set of synthetic filters (blur, occlusions, noise, and posterization) with different level of difficulty. | 42,592 (2,662 original image × 16 synthetic image) | Images, Mat file | Gender classification, face detection, face recognition, age estimation, and glasses detection | 2017 | [42][43] | Afifi, M. et al.
IMDb-WIKI | IMDb and Wikipedia face images with gender and age labels. | None | 523,051 | Images | Gender classification, face detection, face recognition, age estimation | 2015 | [44] | R. Rothe, R. Timofte, L. V. Gool
Action recognition
Dataset name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
THUMOS Dataset | Large video dataset for action classification. | Actions classified and labeled. | 45M frames of video | Video, images, text | Classification, action detection | 2013 | [47][48] | Y. Jiang et al.
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Berkeley 3-D Object Dataset | 849 images taken in 75 different scenes. About 50 different object classes are labeled. | Object bounding boxes and labeling. | 849 | labeled images, text | Object recognition | 2014 | [51][52] | A. al.
ImageNet | Labeled object image database, used in the ImageNet Large Scale Visual Recognition Challenge | Labeled objects, bounding boxes, descriptive words, SIFT features | 14,197,122 | Images, text | Object recognition, scene recognition | 2009 (2014) | [59][60][61] | J.
Open Images | A large set of images listed as having CC BY 2.0 license with image-level labels and bounding boxes spanning thousands of classes. | Image-level labels, Bounding boxes | 9,178,275 | Images, text | Classification, Object recognition | 2017 (V7 : 2022) | [62] |
TV News Channel Commercial Detection Dataset | TV commercials and news broadcasts. | Audio and video features extracted from still images. | 129,685 | Text | Clustering, classification | 2015 | [63][64] | P.
LabelMe | Annotated pictures of scenes. | Objects outlined. | 187,240 | Images, text | Classification, object detection | 2005 | [72] | MI Sc Art Int La
Cityscapes Dataset | Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata also included. | Pixel-level segmentation and labeling | 25,000 | Images, text | Classification, object detection | 2016 | [73] | Da al.
PASCAL VOC Dataset | Large number of images for classification tasks. | Labeling, bounding box included | 500,000 | Images, text | Classification, object detection | 2010 | [74][75] | M. et
CIFAR-100 Dataset | Like CIFAR-10, above, but 100 classes of objects are given. | Classes labelled, training set splits created. | 60,000 | Images | Classification | 2009 | [60][76] | A. et
CINIC-10 Dataset | A unified contribution of CIFAR-10 and Imagenet with 10 classes, and 3 splits. Larger than CIFAR-10. | Classes labelled, training, validation, test set splits created. | 270,000 | Images | Classification | 2018 | [77] | Lu Ell Cro An An Am Sto
Fashion-MNIST | A MNIST-like fashion product database | Classes labelled, training set splits created. | 60,000 | Images | Classification | 2017 | [78] | Za
notMNIST | Some publicly available fonts and extracted glyphs from them to make a dataset similar to MNIST. There are 10 classes, with letters A-J taken from different fonts. | Classes labelled, training set splits created. | 500,000 | Images | Classification | 2011 | [79] | Ya Bu
German Traffic Sign Detection Benchmark Dataset | Images from vehicles of traffic signs on German roads. These signs comply with UN standards and therefore are the same as in other countries. | Signs manually labeled | 900 | Images | Classification | 2013 | [80][81] | S
KITTI Vision Benchmark Dataset | Autonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners. | Many benchmarks extracted from data. | >100 GB of data | Images, text | Classification, object detection | 2012 | [82][83][84] | AG
Linnaeus 5 dataset | Images of 5 classes of objects. | Classes labelled, training set splits created. | 8000 | Images | Classification | 2017 | [85] | Ch Ka
FieldSAFE | Multi-modal dataset for obstacle detection in agriculture including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization. | Classes labelled geographically. | >400 GB of data | Images and 3D point clouds | Classification, object detection, object localization | 2017 | [86] | M.
11K Hands | 11,076 hand images (1600 x 1200 pixels) of 190 subjects, of varying ages between 18 – 75 years old, for gender recognition and biometric identification. | None | 11,076 hand images | Images and (.mat, .txt, and .csv) label files | Gender recognition and biometric identification | 2017 | [87] | M
CORe50 | Specifically designed for Continuous/Lifelong Learning and Object Recognition, is a collection of more than 500 videos (30fps) of 50 domestic objects belonging to 10 different categories. | Classes labelled, training set splits created based on a 3-way, multi-runs benchmark. | 164,866 RGB-D images | images (.png or .pkl) and (.pkl, .txt, .tsv) label files | Classification, Object recognition | 2017 | [88] | V. an
OpenLORIS-Object | Lifelong/Continual Robotic Vision dataset (OpenLORIS-Object) collected by real robots mounted with multiple high-resolution sensors, includes a collection of 121 object instances (1st version of dataset, 40 categories daily necessities objects under 20 scenes). The dataset has rigorously considered 4 environment factors under different scenes, including illumination, occlusion, object pixel size and clutter, and defines the difficulty levels of each factor explicitly. | Classes labelled, training/validation/testing set splits created by benchmark scripts. | 1,106,424 RGB-D images | images (.png and .pkl) and (.pkl) label files | Classification, Lifelong object recognition, Robotic Vision | 2019 | [89] | Q.
CamVid | The Cambridge-driving Labeled Video Database (CamVid) is a collection of videos. | The dataset is labeled with semantic labels for 32 semantic classes. | over 700 images | Images | Object recognition and classification | 2008 | [95][96][97] | Ga Bro Sh Fa Ro
RailSem19 | RailSem19 is a dataset for understanding scenes for vision systems on railways. | The dataset is labeled semantically and box-wise. | 8500 | Images | Object recognition and classification, scene recognition | 2019 | [98][99] | Oli Ma Mu Ma Ze Da Ste Sa Cs
BOREAS | BOREAS is a multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime (128-beam) lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS. | The data is annotated by 3D bounding boxes. | 350 km of driving data | Images, Lidar and Radar data | Object recognition and classification, scene recognition | 2023 | [100][101] | Ke Bu J. Yu An Ha Sh Jin We Ts La Y.K An Sc Tim Ba
Bosch Small Traffic Lights Dataset | It is a dataset of traffic lights. | The labeling includes bounding boxes of traffic lights together with their state (active light). | 5000 images for training and a video sequence of 8334 frames for evaluation | Images | Traffic light recognition | 2017 | [102][103] | Ka Be No Bo
FRSign | It is a dataset of French railway signals. | The labeling includes bounding boxes of railway signals together with their state (active light). | more than 100000 | Images | Railway signal recognition | 2020 | [104][105] | Je Nic Ré Ra Ch Gr Ro Po Ha
GERALD | It is a dataset of German railway signals. | The labeling includes bounding boxes of railway signals together with their state (active light). | 5000 | Images | Railway signal recognition | 2023 | [106][107] | Ph Fa Ch Sc
Multi-cue pedestrian | Multi-cue onboard pedestrian detection dataset is a dataset for detection of pedestrians. | The dataset is labeled box-wise. | 1092 image pairs with 1776 boxes for pedestrians | Images | Object recognition and classification | 2009 | [108] | Ch Wo Wa Sc
RAWPED | RAWPED is a dataset for detection of pedestrians in the context of railways. | The dataset is labeled box-wise. | 26000 | Images | Object recognition and classification | 2020 | [109][110] | Tu Bu Be Bu Cu Gu Alp
OSDaR23 | OSDaR23 is a multi-sensory dataset for detection of objects in the context of railways. | The dataset is labeled box-wise. | 16874 frames | Images, Lidar, Radar and Infrared | Object recognition and classification | 2023 | [111][112] | DZ Sc De an Fu
Argoverse | Argoverse is a multi-sensory dataset for detection of objects in the context of roads. | The dataset is annotated box-wise. | 320 hours of recording | Data from 7 cameras and LiDAR | Object recognition and classification, object tracking | 2022 | [113][114] | Arg Ca Me Un Ge Ins Te
Artificial Characters Dataset | Artificially generated data describing the structure of 10 capital English letters. | Coordinates of lines drawn given as integers. Various other features. | 6000 | Text | Handwriting recognition, classification | 1992 | [115]
CASIA-OLHWDB | Online handwritten Chinese character database, collected using Anoto pen on paper. 3755 classes in the GB 2312 character set. | Provides the sequences of coordinates of strokes. | 1,174,364 | Images, Text | Handwriting recognition, classification | 2009 | [119][118]
Character Trajectories Dataset | Labeled samples of pen tip trajectories for people writing simple characters. | 3-dimensional pen tip velocity trajectory matrix for each sample | 2858 | Text | Handwriting recognition, classification | 2008 | [120][121]
Chars74K Dataset | Character recognition in natural images of symbols used in both English and Kannada | | 74,107 | | Character recognition, handwriting recognition, OCR, classification | 2009 | [122]
EMNIST dataset | Handwritten characters from 3600 contributors | Derived from NIST Special Database 19. Converted to 28x28 pixel images, matching the MNIST dataset.[123] | 800,000 | Images | character recognition, classification, handwriting recognition | 2016 | EMNIST dataset[124] Documentation[125]
UJI Pen Characters Dataset | Isolated handwritten characters | Coordinates of pen position as characters were written given. | 11,640 | Text | Handwriting recognition, classification | 2009 | [126][127]
Gisette Dataset | Handwriting samples from the often-confused 4 and 9 characters. | Features extracted from images, split into train/test, handwriting images size-normalized. | 13,500 | Images, text | Handwriting recognition, classification | 2003 | [128]
Omniglot dataset | 1623 different handwritten characters from 50 different alphabets. | Hand-labeled. | 38,300 | Images, text, strokes | Classification, one-shot learning | 2015 | [129][130]
MNIST database | Database of handwritten digits. | Hand-labeled. | 60,000 | Images, text | Classification | 1994 | [131][132]
Optical Recognition of Handwritten Digits Dataset | Normalized bitmaps of handwritten data. | Size normalized and mapped to bitmaps. | 5620 | Images, text | Handwriting recognition, classification | 1998 | [133]
Pen-Based Recognition of Handwritten Digits Dataset | Handwritten digits on electronic pen-tablet. | Feature vectors extracted to be uniformly spaced. | 10,992 | Images, text | Handwriting recognition, classification | 1998 | [134][135]
Semeion Handwritten Digit Dataset | Handwritten digits from 80 people. | All handwritten digits have been normalized for size and mapped to the same grid. | 1593 | Images, text | Handwriting recognition, classification | 2008 | [136]
Aerial images
Dataset name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
iSAID: Instance Segmentation in Aerial Images Dataset | Precise instance-level annotation carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines. | | 655,451 (15 classes) | Images, jpg, json | Aerial Classification, Object Detection, Instance Segmentation | 2019 | [140][141] | Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu,
Aerial Image Segmentation Dataset | 80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0. | Images manually segmented. | 80 | Images | Aerial Classification, object detection | 2013 | [142][143] | J. Yuan et al.
KIT AIS Data Set | Multiple labeled training and evaluation datasets of aerial images of crowds. | Images manually labeled to show paths of individuals through crowds. | ~ 150 | Images with paths | People tracking, aerial tracking | 2012 | [144][145] | M. Butenuth et al.
MASATI dataset | Maritime scenes of optical aerial images from the visible spectrum. It contains color images in dynamic marine environments; each image may contain one or multiple targets in different weather and illumination conditions. | Object bounding boxes and labeling. | 7389 | Images | Classification, aerial object detection | 2018 | [148][149] | A.-J. Gallego et al.
Forest Type Mapping Dataset | Satellite imagery of forests in Japan. | Image wavelength bands extracted. | 326 | Text | Classification | 2015 | [150][151] | B. Johnson
Overhead Imagery Research Data Set | Annotated overhead imagery. Images with multiple objects. | Over 30 annotations and over 60 statistics that describe the target within the context of the image. | 1000 | Images, text | Classification | 2009 | [152][153] | F. Tanner et al.
SpaceNet | SpaceNet is a corpus of commercial satellite imagery and labeled training data. | GeoTiff and GeoJSON files containing building footprints. | >17533 | Images | Classification, Object Identification | 2017 | [154][155][156] | DigitalGlobe, Inc.
Underwater images
Dataset name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Other images
Dataset name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
NRC-GAMMA | A novel benchmark gas meter image dataset | None | 28,883 | Image, Label | Classification | 2021 | [162][163] | A. Ebadi, P. Paul, S. Auer, & S. Tremblay
The SUPATLANTIQUE dataset | Images of scanned official and Wikipedia documents | None | 4908 | TIFF/pdf | Source device identification, forgery detection, Classification,.. | 2020 | [164] | C. Ben Rabah et al.
StanfordExtra Dataset | 2D keypoints and segmentations for the Stanford Dogs Dataset. | 2D keypoints and segmentations provided. | 12,035 | Labelled images | 3D reconstruction/pose estimation | 2020 | [173] | B. Biggs et al.
The Oxford-IIIT Pet Dataset | 37 categories of pets with roughly 200 images of each. | Breed labeled, tight bounding box, foreground-background segmentation. | ~ 7,400 | Images, text | Classification, object detection | 2012 | [172][174] | O. Parkhi et al.
Online Video Characteristics and Transcoding Time Dataset. | Transcoding times for various different videos and video properties. | Video features given. | 168,286 | Text | Regression | 2015 | [177] | T. Deneke et al.
Microsoft Sequential Image Narrative Dataset (SIND) | Dataset for sequential vision-to-language | Descriptive caption and storytelling given for each photo, and photos are arranged in sequences | 81,743 | Images, text | Visual storytelling | 2016 | [178] | Microsoft Research
Discrete LIRIS-ACCEDE | Short videos annotated for valence and arousal. | Valence and arousal labels. | 9800 | Video | Video emotion elicitation detection | 2015 | [185] | Y. Baveye et al.
LILA BC | Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science. | None | ~10M images | Images | Classification | 2019 | [193] | LILA working group
Text data
These datasets consist primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis.
Reviews
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Amazon reviews | US product reviews from Amazon.com. | None. | 233.1 million | Text | Classification, sentiment analysis | 2015 (2018) | [197][198] | McAuley et al.
Car Evaluation Data Set | Car properties and their overall acceptability. | Six categorical features given. | 1728 | Text | Classification | 1997 | [204][205] | M. Bohanec
YouTube Comedy Slam Preference Dataset | User vote data for pairs of videos shown on YouTube. Users voted on funnier videos. | Video metadata given. | 1,138,562 | Text | Classification | 2012 | [206][207] | Google
Vietnamese Social Media Emotion Corpus (UIT-VSMEC) | Users' Facebook Comments. | Comments | 6,927 | Text | Classification | 1997 | [212] | Nguyen et al.
Vietnamese Open-domain Complaint Detection dataset (ViOCD) | Customer product reviews | Comments | 5,485 | Text | Classification | 2021 | [213] | Nguyen et al.
ViHOS: Hate Speech Spans Detection for Vietnamese | Social Media Texts | Comments | Containing 26k spans on 11k comments | Text | Span Detection | 2021 | [214] | Hoang et al.
News articles
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
The Irish Times Ireland News Corpus | 24 Years of Ireland News from 1996 to 2019 | Publish time, Headline Category and Text | 1,484,340 | CSV | NLP, Computational Linguistics, Events | 2020 | [225] | R. Kulkarni
Messages
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Enron Email Dataset | Emails from employees at Enron organized into folders. | Attachments removed, invalid email addresses converted to user@enron.com or no_address@enron.com. | ~ 500,000 | Text | Network analysis, sentiment analysis | 2004 (2015) | [227][228] | Klimt, B. and Y. Yang
Twenty Newsgroups Dataset | Messages from 20 different newsgroups. | None. | 20,000 | Text | Natural language processing | 1999 | [233] | T. Mitchell et al.
Spambase Dataset | Spam emails. | Many text features extracted. | 4,601 | Text | Spam detection, classification | 1999 | [234] | M. Hopkins et al.
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
SNAP Social Circles: Twitter Database | Large Twitter network data. | Node features, circles, and ego networks. | 1,768,149 | Text | Clustering, graph analysis | 2012 | [242][243] | J. McAuley et al.
Twitter Dataset for Arabic Sentiment Analysis | Arabic tweets. | Samples hand-labeled as positive or negative. | 2000 | Text | Classification | 2014 | [244][245] | N. Abdulla
Dutch Social media collection | This dataset contains COVID-19 tweets made by Dutch speakers or users from the Netherlands. The data has been machine labeled | Classified for sentiment, tweet text & user description translated to English. Industry mentions are extracted | 271,342 | JSONL | Sentiment, multi-label classification, machine translation | 2020 | [252][253][254] | Aaaksh Gupta, CoronaWhy
Dialogues
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
NPS Chat Corpus | Posts from age-specific online chat rooms. | Hand privacy masked, tagged for part of speech and dialogue-act. | ~ 500,000 | XML | NLP, programming, linguistics | 2007 | [255] | Forsyth, E., Lin, J., & Martell, C.
Reddit All Comments Corpus | All Reddit comments (as of 2015). | | ~ 1.7 billion | JSON | NLP, research | 2015 | [259] | Stuck_In_the_Matrix
Ubuntu Dialogue Corpus | Dialogues extracted from Ubuntu chat stream on IRC. | | 930 thousand dialogues, 7.1 million utterances | CSV | Dialogue Systems Research | 2015 | [260] | Lowe, R. et al.
Dialog State Tracking Challenge | The Dialog State Tracking Challenges 2 & 3 (DSTC2&3) were research challenges focused on improving the state of the art in tracking the state of spoken dialog systems. | Transcription of spoken dialogs with labelling | DSTC2 contains ~3.2k calls – DSTC3 contains ~2.3k calls | Json | Dialogue state tracking | 2014 | [261] | Henderson, Matthew and Thomson, Blaise and Williams, Jason D
Legal
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Other text
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Web of Science Dataset | Hierarchical Datasets for Text Classification | None. | 46,985 | Text | Classification, Categorization | 2017 | [266][267] | K. Kowsari et al.
Legal Case Reports | Federal Court of Australia cases from 2006 to 2009. | None. | 4,000 | Text | Summarization, citation analysis | 2012 | [268][269] | F. Galgani et al.
Dataset for the Machine Comprehension of Text | Stories and associated questions for testing comprehension of text. | None. | 660 | Text | Natural language processing, machine comprehension | 2013 | [274][275] | M. Richardson et al.
The Penn Treebank Project | Naturally occurring text annotated for linguistic structure. | Text is parsed into semantic trees. | ~ 1M words | Text | Natural language processing, summarization | 1995 | [276][277] | M. Marcus et al.
DEXTER Dataset | Task given is to determine, from features given, which articles are about corporate acquisitions. | Features extracted include word stems. Distractor features included. | 2600 | Text | Classification | 2008 | [278] | Reuters
Personae Corpus | Collected for experiments in Authorship Attribution and Personality Prediction. Consists of 145 Dutch-language essays. | In addition to normal texts, syntactically annotated texts are given. | 145 | Text | Classification, regression | 2008 | [281][282] | K. Luyckx et al.
PushShift | Archives of social media websites, including Reddit, Twitter, and Hackernews. | Text extracted and normalized from WARCs | ~100,000,000 posts | Json | NLP, sentiment, linguistics | 2022 | [283][284] | J. Baumgartner
CNAE-9 Dataset | Categorization task for free text descriptions of Brazilian companies. | Word frequency has been extracted. | 1080 | Text | Classification | 2012 | [285][286] | P. Ciarelli et al.
Sentiment Labeled Sentences Dataset | 3000 sentiment labeled sentences. | Sentiment of each sentence has been hand labeled as positive or negative. | 3000 | Text | Classification, sentiment analysis | 2015 | [287][288] | D. Kotzias
BlogFeedback Dataset | Dataset to predict the number of comments a post will receive based on features of that post. | Many features of each post extracted. | 60,021 | Text | Regression | 2014 | [289][290] | K. Buza
Stanford Natural Language Inference (SNLI) Corpus | Image captions matched with newly constructed sentences to form entailment, contradiction, or neutral pairs. | Entailment class labels, syntactic parsing by the Stanford PCFG parser | 570,000 | Text | Natural language inference/recognizing textual entailment | 2015 | [291] | S. Bowman et al.
DSL Corpus Collection (DSLCC) | A multilingual collection of short excerpts of journalistic texts in similar languages and dialects. | None | 294,000 phrases | Text | Discriminating between similar languages | 2017 | [292] | Tan, Liling et al.
Urban Dictionary Dataset | Corpus of words, votes and definitions | User names anonymised | 2,580,925 | CSV | NLP, Machine comprehension | 2016 May | [293] | Anonymous
T-REx | Wikipedia abstracts aligned with Wikidata entities | Alignment of Wikidata triples with Wikipedia abstracts | 11M aligned triples | JSON and NIF [2] (https://hadyelsahar.github.io/t-rex/) | NLP, Relation Extraction | 2018 | [294] | H. Elsahar et al.
General Language Understanding Evaluation (GLUE) | Benchmark of nine tasks | Various | ~1M sentences and sentence pairs | | NLU | 2018 | [295][296][297] | Wang et al.
Contract Understanding Atticus Dataset (CUAD) (formerly known as Atticus Open Contract Dataset (AOK)) | Dataset of legal contracts with rich expert annotations | | ~13,000 labels | CSV and PDF | Natural language processing, QnA | 2021 | | The Atticus Project (https://www.atticusprojectai.org/cuad)
Vietnamese Names annotated with Genders (UIT-ViNames) | Vietnamese Names annotated with Genders | | 26,850 Vietnamese full names annotated with genders | CSV | Natural language processing | 2020 | [299] | To et al.
Vietnamese Constructive and Toxic Speech Detection Dataset (UIT-ViCTSD) | Vietnamese Constructive and Toxic Speech Detection Dataset | | 10,000 Vietnamese users' comments on online newspapers on 10 domains | CSV | Natural Language Processing | 2021 | [300] | Nguyen et al.
Sound data
These datasets consist of sounds and sound features used for tasks such as speech recognition and speech synthesis.
Speech
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Zero Resource Speech Challenge 2015 | Spontaneous speech (English), Read speech (Xitsonga). | None, raw WAV files. | English: 5h, 12 speakers; Xitsonga: 2h30, 24 speakers | WAV (audio only) | Unsupervised discovery of speech features/subword units/word units | 2015 | [301][302] | Versteegh et al.
TIMIT | Recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. | Speech is lexically and phonemically transcribed. | 6300 | Text | Speech recognition, classification. | 1986 | [313][314] | J. Garofolo et al.
Arabic Speech Corpus | A single-speaker, Modern Standard Arabic (MSA) speech corpus with phonetic and orthographic transcripts aligned to phoneme level. | Speech is orthographically and phonetically transcribed with stress marks. | ~1900 | Text, WAV | Speech Synthesis, Speech Recognition, Corpus Alignment, Speech Therapy, Education. | 2016 | [315] | N. Halabi
Common Voice | A public domain database of crowdsourced data across a wide range of dialects. | Validation by other users. | English: 1,118 hours | MP3 with corresponding text files | Speech recognition | 2017 June (2019 December) | [316] | Mozilla
LJSpeech | A single-speaker corpus of English public-domain audiobook recordings, split into short clips at punctuation marks. | Quality check, normalized transcription alongside the original. | 13,100 | CSV, WAV | Speech synthesis | 2017 | [317] | Keith Ito, Linda Johnson
Music
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Geographic Origin of Music Data Set | Audio features of music samples from different locations. | Audio features extracted using MARSYAS software. | 1,059 | Text | Geographic classification, clustering | 2014 | [318][319] | F. Zhou et al.
Other sounds
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
AudioSet | 10-second sound snippets from YouTube videos, and an ontology of over 500 labels. | 128-d PCA'd VGG-ish features every 1 second. | 2,084,320 | Text (CSV) and TensorFlow Record files | Classification | 2017 | [328] | J. Gemmeke et al., Google
Bird Audio Detection challenge | Audio from environmental monitoring stations, plus crowdsourced recordings | | 17,000+ | | Classification | 2016 (2018) | [329][330] | Queen Mary University and IEEE Signal Processing Society
Signal data
Datasets containing electric signal information requiring some sort of signal processing for further analysis.
Electrical
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Servo Dataset | Data covering the nonlinear relationships observed in a servo-amplifier circuit. | Levels of various components as a function of other components are given. | 167 | Text | Regression | 1993 | [340][341] | K. Ullrich
Motion-tracking
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Vicon Physical Action Data Set Dataset | 10 normal and 10 aggressive physical actions that measure the human activity tracked by a 3D tracker. | Many parameters recorded by 3D tracker. | 3000 | Text | Classification | 2011 | [350][351] | T. Theodoridis
Daily and Sports Activities Dataset | Motor sensor data for 19 daily and sports activities. | Many sensors given, no preprocessing done on signals. | 9120 | Text | Classification | 2013 | [352][353] | B. Barshan et al.
Human Activity Recognition Using Smartphones Dataset | Gyroscope and accelerometer data from people wearing smartphones and performing normal actions. | Actions performed are labeled, all signals preprocessed for noise. | 10,299 | Text | Classification | 2012 | [354][355] | J. Reyes-Ortiz et al.
Weight Lifting Exercises monitored with Inertial Measurement Units | Five variations of the biceps curl exercise monitored with IMUs. | Some statistics calculated from raw data. | 39,242 | Text | Classification | 2013 | [358][359] | W. Ugulino et al.
sEMG for Basic Hand movements Dataset | Two databases of surface electromyographic signals of 6 hand movements. | None. | 3000 | Text | Classification | 2014 | [360][361] | C. Sapsanis et al.
REALDISP Activity Recognition Dataset | Evaluate techniques dealing with the effects of sensor displacement in wearable activity recognition. | None. | 1419 | Text | Classification | 2014 | [361][362] | O. Banos et al.
PAMAP2 Physical Activity Monitoring Dataset | 18 different types of physical activities performed by 9 subjects wearing 3 IMUs. | None. | 3,850,505 | Text | Classification | 2012 | [367] | A. Reiss
OPPORTUNITY Activity Recognition Dataset | Human Activity Recognition from wearable, object, and ambient sensors is a dataset devised to benchmark human activity recognition algorithms. | None. | 2551 | Text | Classification | 2012 | [368][369] | D. Roggen et al.
Real World Activity Recognition Dataset | Human Activity Recognition from wearable devices. Distinguishes between seven on-body device positions and comprises six different kinds of sensors. | None. | 3,150,000 (per sensor) | Text | Classification | 2016 | [370] | T. Sztyler et al.
10 healthy
3D human pose estimates person and
(Kinect) of stroke patients 9 stroke
Toronto Rehab Stroke and healthy participants survivors [371][372][373] E. Dolatabadi
None. CSV Classification 2017
Pose Dataset performing a set of tasks (3500– et al.
using a stroke rehabilitation 6000
robot. frames per
person)
Other signals
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Physical data
Datasets from physical systems.
High-energy physics
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
HIGGS Dataset | Monte Carlo simulations of particle accelerator collisions. | 28 features of each collision are given. | 11M | Text | Classification | 2014 | [380][381][382] | D. Whiteson
Systems
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Yacht Hydrodynamics Dataset | Yacht performance based on dimensions. | Six features are given for each yacht. | 308 | Text | Regression | 2013 | [384][385] | R. Lopez
Airfoil Self-Noise Dataset | A series of aerodynamic and acoustic tests of two and three-dimensional airfoil blade sections. | Data about frequency, angle of attack, etc., are given. | 1503 | Text | Regression | 2014 | [394] | R. Lopez
Astronomy
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Volcanoes on Venus – JARtool experiment Dataset | Venus images returned by the Magellan spacecraft. | Images are labeled by humans. | not given | Images | Classification | 1991 | [398][399] | M. Burl
MAGIC Gamma Telescope Dataset | Monte Carlo generated high-energy gamma particle events. | Numerous features extracted from the simulations. | 19,020 | Text | Classification | 2007 | [399][400] | R. Bock
Solar Flare Dataset | Measurements of the number of certain types of solar flare events occurring in a 24-hour period. | Many solar flare-specific features are given. | 1389 | Text | Regression, classification | 1989 | [401] | G. Bradshaw
Earth science
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Volcanoes of the World | Volcanic eruption data for all known volcanic events on earth. | Details such as region, subregion, tectonic setting, dominant rock type are given. | 1535 | Text | Regression, classification | 2013 | [403] | E. Venzke et al.
CAMELS-Chile | Catchment hydrology dataset with hydrometeorological timeseries and various attributes | see Reference | 516 | CSV, Text, Shapefile | Regression | 2018 | [408] | C. Alvarez-Garreton et al.
CAMELS-Brazil | Catchment hydrology dataset with hydrometeorological timeseries and various attributes | see Reference | 897 | CSV, Text, Shapefile | Regression | 2020 | [409] | V. Chagas et al.
CAMELS-GB | Catchment hydrology dataset with hydrometeorological timeseries and various attributes | see Reference | 671 | CSV, Text, Shapefile | Regression | 2020 | [410] | G. Coxon et al.
CAMELS-Australia | Catchment hydrology dataset with hydrometeorological timeseries and various attributes | see Reference | 222 | CSV, Text, Shapefile | Regression | 2021 | [411] | K. Fowler et al.
LamaH-CE | Catchment hydrology dataset with hydrometeorological timeseries and various attributes | see Reference | 859 | CSV, Text, Shapefile | Regression | 2021 | [412] | C. Klingler et al.
Other physical
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Concrete Compressive Strength Dataset | Dataset of concrete properties and compressive strength. | Nine features are given for each sample. | 1030 | Text | Regression | 2007 | [413][414] | I. Yeh
Concrete Slump Test Dataset | Concrete slump flow given in terms of properties. | Features of concrete given such as fly ash, water, etc. | 103 | Text | Regression | 2009 | [415][416] | I. Yeh
Musk Dataset | Predict if a molecule, given the features, will be a musk or a non-musk. | 168 features given for each molecule. | 6598 | Text | Classification | 1994 | [417] | Arris Pharmaceutical Corp.
Steel Plates Faults Dataset | Steel plates of 7 different types. | 27 features given for each sample. | 1941 | Text | Classification | 2010 | [418] | Semeion Research Center
Biological data
Datasets from biological systems.
Human
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Age Dataset | A structured general-purpose dataset on life, work, and death of 1.22 million distinguished people. Public domain. | A five-step method to infer birth and death years, gender, and occupation from community-submitted data to all language versions of the Wikipedia project. | 1,223,009 | Text | Regression, Classification | 2022 | Paper[419] Dataset[420] | Amoradnejad et al.
National Survey on Drug Use and Health | Large scale survey on health and drug use in the United States. | None. | 55,268 | Text | Classification, regression | 2012 | [430] | United States Department of Health and Human Services
Diabetes 130-US hospitals for years 1999–2008 Dataset | 9 years of readmission data across 130 US hospitals for patients with diabetes. | Many features of each readmission are given. | 100,000 | Text | Classification, clustering | 2014 | [435][436] | J. Clore et al.
Diabetic Retinopathy Debrecen Dataset | Features extracted from images of eyes with and without diabetic retinopathy. | Features extracted and conditions diagnosed. | 1151 | Text | Classification | 2014 | [437][438] | B. Antal et al.
Diabetic Retinopathy Messidor Dataset | Methods to evaluate segmentation and indexing techniques in the field of retinal ophthalmology (MESSIDOR) | Features retinopathy grade and risk of macular edema | 1200 | Images, Text | Classification, Segmentation | 2008 | [439][440] | Messidor Project
Liver Disorders Dataset | Data for people with liver disorders. | Seven biological features given for each patient. | 345 | Text | Classification | 1990 | [441][442] | Bupa Medical Research Ltd.
Thyroid Disease Dataset | 10 databases of thyroid disease patient data. | None. | 7200 | Text | Classification | 1987 | [443][444] | R. Quinlan
Mesothelioma Dataset | Mesothelioma patient data. | Large number of features, including asbestos exposure, are given. | 324 | Text | Classification | 2016 | [445][446] | A. Tanrikulu et al.
Parkinson's Vision-Based Pose Estimation Dataset | 2D human pose estimates of Parkinson's patients performing a variety of tasks. | Camera shake has been removed from trajectories. | 134 | Text | Classification, regression | 2017 | [447][448][449] | M. Li et al.
KEGG Metabolic Reaction Network (Undirected) Dataset | Network of metabolic pathways. A reaction network and a relation network are given. | Detailed features for each network node and pathway are given. | 65,554 | Text | Classification, clustering, regression | 2011 | [450] | M. Naeem et al.
Animal
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Abalone Dataset | Physical measurements of Abalone. Weather patterns and location are also given. | None. | 4177 | Text | Regression | 1995 | [453] | Marine Research Laboratories – Taroona
Splice-junction Gene Sequences Dataset | Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. | None. | 3190 | Text | Classification | 1992 | [432] | G. Towell et al.
Mice Protein Expression Dataset | Expression levels of 77 proteins measured in the cerebral cortex of mice. | None. | 1080 | Text | Classification, Clustering | 2015 | [457][458] | C. Higuera et al.
Fungi
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Plant
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Forest Fires Dataset | Forest fires and their properties. | 13 features of each fire are extracted. | 517 | Text | Regression | 2008 | [462][463] | P. Cortez et al.
Seeds Dataset | Measurements of geometrical properties of kernels belonging to three different varieties of wheat. | None. | 210 | Text | Classification, clustering | 2012 | [469][470] | Charytanowicz et al.
Microbe
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Yeast Dataset | Predictions of Cellular localization sites of proteins. | Eight features given per instance. | 1484 | Text | Classification | 1996 | [484][485] | K. Nakai et al.
Drug discovery
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Tox21 Dataset | Prediction of outcome of biological assays. | Chemical descriptors of molecules are given. | 12707 | Text | Classification | 2016 | [486] | A. Mayr et al.
Anomaly data
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
DBpedia Neural Question Answering (DBNQA) Dataset | A large collection of Question to SPARQL pairs specially designed for Open Domain Neural Question Answering over the DBpedia Knowledgebase. | This dataset contains a large collection of Open Neural SPARQL Templates and instances for training Neural SPARQL Machines; it was pre-processed by semi-automatic annotation tools as well as by three SPARQL experts. | 894,499 | Question-query pairs | Question Answering | 2018 | [491][492] | Hartmann, Soru, and Marx et al.
Vietnamese Question Answering Dataset (UIT-ViQuAD) | A large collection of Vietnamese questions for evaluating MRC models. | This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia. | 23,074 | Question-answer pairs | Question Answering | 2020 | [493] | Nguyen et al.
Vietnamese Multiple-Choice Machine Reading Comprehension Corpus (ViMMRC) | A collection of Vietnamese multiple-choice questions for evaluating MRC models. | This corpus includes 2,783 Vietnamese multiple-choice questions. | 2,783 | Question-answer pairs | Question Answering/Machine Reading Comprehension | 2020 | [494] | Nguyen et al.
Taskmaster: "The Taskmaster corpus consists of THREE datasets, Taskmaster-1 (TM-1), Taskmaster-2 (TM-2), and Taskmaster-3 (TM-3), comprising over 55,000 spoken and written task-oriented dialogs in over a dozen domains."[497]
Taskmaster-1: goal-oriented conversational dataset. It includes 13,215 task-based dialogs comprising six domains.
Taskmaster-2: 17,289 dialogs in the seven domains (restaurants, food ordering, movies, hotels, flights, music and sports).
Taskmaster-3: 23,757 movie ticketing dialogs.
Taskmaster-1 and Taskmaster-2: conversation id, utterances, Instruction id. Taskmaster-3: conversation id, utterances, vertical, scenario, instructions.
For further details check the project's GitHub repository (https://github.com/google-research-datasets/Taskmaster) or the Hugging Face dataset cards (taskmaster-1 (https://huggingface.co/datasets/taskmaster1), taskmaster-2 (https://huggingface.co/datasets/taskmaster2), taskmaster-3 (https://huggingface.co/datasets/taskmaster3)).
Default Task: Dialog/Instruction prompted. Created: 2019. Reference: [498]. Creator: Byrne and Krishnamoorthi et al.
Cybersecurity
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference
CVE | CVE is a list of publicly disclosed cybersecurity vulnerabilities that is free to search, use, and incorporate into products and services. | | | Data can be downloaded from: Allitems (https://cve.mitre.org/data/downloads/allitems.csv) | | | [506]
CWE | Common Weakness Enumeration data. | | | Software Development (https://cwe.mitre.org/data/csv/699.csv.zip), Hardware Design (https://cwe.mitre.org/1194.csv.zip), Research Concepts (https://cwe.mitre.org/data/csv/1000.csv.zip) | | | [507]
 | | | | 2009 (https://www.usenix.org/legacy/events/sec09/tech/), 2010 (https://www.usenix.org/legacy/events/sec10/tech/), 2011 (https://static.usenix.org/event/sec11/tech/), 2012 (https://www.usenix.org/conference/usenixsecurity12/technical-sessions), 2013 (https://www.usenix.org/conference/usenixsecurity13/technical-sessions), 2014 (https://www.usenix.org/conference/usenixsecurity14/technical-sessions), 2015 (https://www.usenix.org/conference/usenixsecurity15/technical-sessions), 2016 (https://www.usenix.org/conference/usenixsecurity16/technical-sessions), 2017 (https://www.usenix.org/conference/usenixsecurity17/technical-sessions), 2018 (https://www.usenix.org/conference/usenixsecurity18/technical-sessions), 2019 (https://www.usenix.org/conference/usenixsecurity19/technical-sessions), 2020 (https://www.usenix.org/conference/usenixsecurity20/technical-sessions), 2021 (https://www.usenix.org/conference/usenixsecurity21/technical-sessions), 2022 (https://www.usenix.org/conference/usenixsecurity22/technical-sessions). | | |
APTNotes | Collection of public documents, whitepapers and articles about APT campaigns. All the documents are publicly available data. | This data is not pre-processed. | | The GitHub repository (https://github.com/aptnotes/data) of the project contains a file with links to the data stored in box. Data files can also be downloaded here (https://github.com/ameza13/APTNotesData/). | | | [510]
Security eBooks for free | Small collection of security eBooks, and security presentations publicly available. | This data is not pre-processed. | | | | | [512][513][514][515][516][517][518][519][520][521][522][523]
National Cyber Security strategy repository | Repository of worldwide strategy documents about cybersecurity. | This data is not pre-processed. | | | | | [524]
Cyber Security Natural Language Processing | Data about cybersecurity strategies from more than 75 countries. | Tokenization, meaningless-frequent words removal. | | | | | [525]
APT Reports collection | Sample of APT reports, malware, technology, and intelligence collection | Raw and tokenized data available. | | All data is available in this GitHub (https://github.com/blackorbird/APT_REPORT) repository. | | | [526]
Databreaches news | | This data is not pre-processed. | | News (https://www.databreaches.net/news/), list of news from Aug 2022 to Feb 2023 (https://github.com/bee3202/cybersecurity-data-sources/blob/main/DATABREACHES.md) | | | [531]
Cybernews | | This data is not pre-processed. | | News (https://cybernews.com/news/), curated list of news (https://github.com/bee3202/cybersecurity-data-sources/blob/main/CYBERNEWS.md) | | | [532]
Hipaajournal | | This data is not pre-processed. | | News (https://www.hipaajournal.com/category/hipaa-compliance-news/) | | | [533]
Mitre Defend | Matrix of Defend artifacts | | | json files | | | [544]
Mitre Atlas | Mitre Atlas is a knowledge base of adversary tactics, techniques, and case studies for machine learning (ML) systems based on real-world observations. | This data is not pre-processed | | | | | [545]
Mitre Engage | MITRE Engage is a framework for planning and discussing adversary engagement operations that empowers you to engage your adversaries and achieve your cybersecurity goals. | This data is not pre-processed | | | | | [546]
Hacking Tutorials | | This data is not pre-processed | | | | | [547]
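The preprocessing listed for the Cyber Security Natural Language Processing dataset ("Tokenization, meaningless-frequent words removal") can be illustrated with a minimal sketch. The stop-word list, the regular expression, and the function name `preprocess` are assumptions made for this example; the dataset's actual pipeline is not described in the table.

```python
import re

# Hypothetical stop-word list: a tiny stand-in for the "meaningless frequent
# words" that the dataset authors removed (their real list is not given).
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"})


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Keep runs of letters/digits, allowing internal hyphens ("cyber-security").
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [token for token in tokens if token not in stop_words]


print(preprocess("The strategy of the National Cyber-Security Centre"))
```

A real pipeline would more likely derive the stop-word list from corpus frequencies or reuse an established list rather than hard-coding one.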
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
CLIMATE-FEVER | A dataset adopting the FEVER methodology that consists of 1,535 real-world claims regarding climate-change collected on the internet. | Each claim is accompanied by five manually annotated evidence sentences retrieved from the English Wikipedia that support, refute or do not give enough information to validate the claim totalling in 7,675 claim-evidence pairs.[553] | | Dataset HF card (https://huggingface.co/datasets/climate_fever), and project's GitHub repository (https://github.com/tdiggelm/climate-fever-dataset). | | | [554] | Diggelmann et al.
Climate News dataset | A dataset for NLP and climate change media researchers | The dataset is made up of a number of data artifacts (JSON, JSONL & CSV text files & SQLite database) | | Climate news DB (http://www.climate-news-db.com/), Project's GitHub repository (https://github.com/ADGEfficiency/climate-news-db) | | | [555] | ADGEfficiency
Climatext | Climatext is a dataset for sentence-based climate change topic detection. | | | HF dataset (https://huggingface.co/datasets/mwong/climatetext-evidence-related-evaluation/tree/main/data) | | | [556] | University of Zurich
GreenBiz | Website with articles about climate and sustainability | This data is not pre-processed | | | | | [560] | GreenBiz
Code data
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
OKD | The Community Distribution of Kubernetes that powers Red Hat OpenShift | This data is not pre-processed | | List of GitHub repositories of the project (https://github.com/orgs/okd-project/repositories) | | | |
OpenShift | The developer and operations friendly Kubernetes distro | | | List of GitHub repositories of the project (https://github.com/bee3202/open-shift-repos/blob/main/pages_openshift.md) | | | |
Kubernetes | | This data is not pre-processed | | List of GitHub repositories of the project (https://github.com/bee3202/open-shift-repos/blob/main/pages_kubernetes.md) | | | |
Red Hat Developer | GitHub home of the Red Hat Developer program | This data is not pre-processed | | List of GitHub repositories of the project (https://github.com/bee3202/open-shift-repos/blob/main/pages_redhat_developer.md) | | | |
Red Hat Workshops | | This data is not pre-processed | | List of GitHub repositories of the project (https://github.com/bee3202/open-shift-repos/blob/main/pages_redhat_workshops.md) | | | |
Multivariate data
Financial
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Weather
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Census
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
US Census Data 1990 | Partial data from 1990 US census. | Results randomized and useful attributes selected. | 2,458,285 | Text | Classification, regression | 1990 | [593] | United States Census Bureau
Transit
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Bike Sharing Dataset | Hourly and daily count of rental bikes in a large city. | Many features, including weather, length of trip, etc., are given. | 17,389 | Text | Regression | 2013 | [594][595] | H. Fanaee-T
PeMS | Speed, flow, occupancy and other metrics from loop detectors and other sensors on the freeways of the State of California, U.S.A. | Metrics are usually aggregated by averaging into 5-minute timesteps. | 39,000 individual detectors, each containing years of timeseries | Comma separated values | Regression, Forecasting, Nowcasting, Interpolation | (updated realtime) | [600] | California Department of Transportation
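The PeMS entry notes that detector metrics are usually averaged into 5-minute timesteps. A minimal sketch of that kind of aggregation is shown here, assuming readings arrive as `(timestamp, value)` pairs; the function name `aggregate_by_average`, the sample readings, and the input layout are hypothetical and do not reflect the actual PeMS file format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_by_average(readings, bin_minutes=5):
    """Average (timestamp, value) readings into fixed-width time bins.

    bin_minutes is assumed to divide 60 evenly, so that every bin starts
    on a clean boundary within the hour.
    """
    bins = defaultdict(list)
    for timestamp, value in readings:
        # Floor the timestamp to the start of its bin, e.g. 00:07:30 -> 00:05:00.
        floored_minute = timestamp.minute - timestamp.minute % bin_minutes
        bin_start = timestamp.replace(minute=floored_minute, second=0, microsecond=0)
        bins[bin_start].append(value)
    # One averaged value per bin, in chronological order.
    return {start: mean(values) for start, values in sorted(bins.items())}


# Hypothetical speed readings (mph) from a single loop detector.
sample = [
    (datetime(2023, 1, 1, 0, 0, 0), 60.0),
    (datetime(2023, 1, 1, 0, 2, 30), 62.0),
    (datetime(2023, 1, 1, 0, 5, 0), 50.0),
]
for bin_start, average in aggregate_by_average(sample).items():
    print(bin_start.isoformat(), average)
```

Averaging suits metrics such as speed and occupancy; a count-like metric such as flow might instead be summed per bin, depending on how it is defined.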
Internet
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Webpages from Common Crawl 2012 | Large collection of webpages and how they are connected via hyperlinks | None. | 3.5B | Text | clustering, classification | 2013 | [601] | V. Granville
Internet Advertisements Dataset | Dataset for predicting if a given image is an advertisement or not. | Features encode geometry of ads and phrases occurring in the URL. | 3279 | Text | Classification | 1998 | [602][603] | N. Kushmerick
Internet Usage Dataset | General demographics of internet users. | None. | 10,104 | Text | Classification, clustering | 1999 | [604] | D. Cook
Freebase Simple Topic Dump | Freebase is an online effort to structure all human knowledge. | Topics from Freebase have been extracted. | large | Text | Classification, clustering | 2011 | [609][610] | Freebase
OpenWebText | An open-source recreation of the WebText corpus. The text is web content extracted from URLs shared on Reddit with at least three upvotes. | Extracted non-HTML content, deduplicated, and tokenized. | 8,013,769 Documents, 38GB | Text | Natural Language Processing, Text Prediction | 2019 | [616][617] | A. Gokaslan, V. Cohen
Games
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
Poker Hand Dataset | 5 card hands from a standard 52 card deck. | Attributes of each hand are given, including the Poker hands formed by the cards it contains. | 1,025,010 | Text | Regression, classification | 2007 | [618] | R. Cattral
Other multivariate
Dataset Name | Brief description | Preprocessing | Instances | Format | Default Task | Created (updated) | Reference | Creator
OpenML:[646] Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating
algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms.
PMLB:[647] A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification
and regression datasets in a standardized format that are accessible through a Python API.
Metatext NLP: A community-maintained web repository (https://metatext.io/datasets) containing nearly 1000 benchmark datasets and counting. It covers many tasks, from classification to QA, and various languages, including English, Portuguese, and Arabic.
Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. These biological, image, physical, question
answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.[648][649]
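As an illustration of the Python access mentioned for OpenML, a dataset can be downloaded through scikit-learn's `fetch_openml` helper. This is only a sketch: it assumes scikit-learn and pandas are installed and that network access is available. The wrapper name `load_openml_dataset` is hypothetical, and the dataset name and version in the usage comment are assumptions that should be checked against the OpenML site.

```python
def load_openml_dataset(name, version):
    """Download a dataset from OpenML by name and version.

    Returns the feature table and the target column as pandas objects.
    """
    # Imported lazily so this module can be loaded without scikit-learn installed.
    from sklearn.datasets import fetch_openml

    bunch = fetch_openml(name=name, version=version, as_frame=True)
    return bunch.data, bunch.target


# Example usage (requires network access; name and version are assumptions):
# features, target = load_openml_dataset("miceprotein", 4)
# print(features.shape)
```

Pinning the version keeps results reproducible, since OpenML can host several versions of a dataset under the same name.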
See also
Comparison of deep learning software
List of manual image annotation tools
List of biological databases
References
1. Wissner-Gross, A. "Datasets Over Algorithms" (https://edge.org/response-detail/26587). Edge.com. Retrieved 8 January 2016.
2. Weiss, G. M.; Provost, F. (1 September 2003). "Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction" (https://www.jair.org/index.php/jair/article/download/10346/24739). Journal of Artificial Intelligence Research. AI Access Foundation. 19: 315–354. doi:10.1613/jair.1199 (https://doi.org/10.1613%2Fjair.1199). ISSN 1076-9757 (https://www.worldcat.org/issn/1076-9757). S2CID 2344521 (https://api.semanticscholar.org/CorpusID:2344521).
3. Turney, Peter (2000). "Types of cost in inductive concept learning". arXiv:cs/0212034 (https://arxiv.org/abs/cs/0212034).
4. Abney, Steven (17 September 2007). Semisupervised Learning for Computational Linguistics (https://books.google.com/books?id=VCd67cGB_rAC&pg=PP1). CRC Press. ISBN 978-1-4200-1080-0.
5. Žliobaitė, Indrė; Bifet, Albert; Pfahringer, Bernhard; Holmes, Geoff (2011). "Active Learning with Evolving Streaming Data". Machine Learning and Knowledge Discovery in Databases. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 597–612. doi:10.1007/978-3-642-23808-6_39 (https://doi.org/10.1007%2F978-3-642-23808-6_39). ISBN 978-3-642-23807-9. ISSN 0302-9743 (https://www.worldcat.org/issn/0302-9743).
6. Zafeiriou, S.; Kollias, D.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Kotsia, I. (2017). "Aff-Wild: Valence and Arousal in-the-wild Challenge" (http://openaccess.thecvf.com/content_cvpr_2017_workshops/w33/papers/Zafeiriou_Aff-Wild_Valence_and_CVPR_2017_paper.pdf) (PDF). Computer Vision and Pattern Recognition Workshops (CVPRW), 2017: 1980–1987. doi:10.1109/CVPRW.2017.248 (https://doi.org/10.1109%2FCVPRW.2017.248). ISBN 978-1-5386-0733-6. S2CID 3107614 (https://api.semanticscholar.org/CorpusID:3107614).
7. Kollias, D.; Tzirakis, P.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Schuller, B.; Kotsia, I.; Zafeiriou, S. (2019). "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond" (https://rdcu.be/bmGm2). International Journal of Computer Vision. 127 (6–7): 907–929. doi:10.1007/s11263-019-01158-4 (https://doi.org/10.1007%2Fs11263-019-01158-4). S2CID 13679040 (https://api.semanticscholar.org/CorpusID:13679040).
8. Kollias, D.; Zafeiriou, S. (2019). "Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface" (https://bmvc2019.org/wp-content/uploads/papers/0399-paper.pdf) (PDF). British Machine Vision Conference (BMVC), 2019. arXiv:1910.04855 (https://arxiv.org/abs/1910.04855).
9. Kollias, D.; Schulc, A.; Hajiyev, E.; Zafeiriou, S. (2020). "Analysing affective behavior in the first abaw 2020 competition" (https://www.computer.org/csdl/proceedings-article/fg/2020/307900a794/1kecIYu9wL6). IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2020: 637–643. arXiv:2001.11409 (https://arxiv.org/abs/2001.11409). doi:10.1109/FG47880.2020.00126 (https://doi.org/10.1109%2FFG47880.2020.00126). ISBN 978-1-7281-3079-8. S2CID 210966051 (https://api.semanticscholar.org/CorpusID:210966051).
10. Phillips, P. Jonathon; et al. (1998). "The FERET database and evaluation procedure for face-recognition algorithms". Image and Vision Computing. 16 (5): 295–306. doi:10.1016/s0262-8856(97)00070-x (https://doi.org/10.1016%2Fs0262-8856%2897%2900070-x).
11. Wiskott, Laurenz; et al. (1997). "Face recognition by elastic bunch graph matching". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (7): 775–779. CiteSeerX 10.1.1.44.2321 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.44.2321). doi:10.1109/34.598235 (https://doi.org/10.1109%2F34.598235). S2CID 30523165 (https://api.semanticscholar.org/CorpusID:30523165).
12. Livingstone, Steven R.; Russo, Frank A. (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955500). PLOS ONE. 13 (5): e0196391. Bibcode:2018PLoSO..1396391L (https://ui.adsabs.harvard.edu/abs/2018PLoSO..1396391L). doi:10.1371/journal.pone.0196391 (https://doi.org/10.1371%2Fjournal.pone.0196391). PMC 5955500 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955500). PMID 29768426 (https://pubmed.ncbi.nlm.nih.gov/29768426).
13. Livingstone, Steven R.; Russo, Frank A. (2018). "Emotion". The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). doi:10.5281/zenodo.1188976 (https://doi.org/10.5281%2Fzenodo.1188976).
14. Grgic, Mislav; Delac, Kresimir; Grgic, Sonja (2011). "SCface – surveillance cameras face database". Multimedia Tools and Applications. 51 (3): 863–879. doi:10.1007/s11042-009-0417-2 (https://doi.org/10.1007%2Fs11042-009-0417-2). S2CID 207218990 (https://api.semanticscholar.org/CorpusID:207218990).
15. Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication (https://repository.ubn.ru.nl/bitstream/handle/2066/94489/94489.pdf)." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011.
16. Georghiades, A. "Yale face database". Center For Computational Vision And Control At Yale University, http://CVC.yale.edu/Projects/Yalefaces/Yalefa. 2: 1997.
17. Nguyen, Duy; et al. (2006). "Real-time face detection and lip feature extraction using field-programmable gate arrays". IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics. 36 (4): 902–912. CiteSeerX 10.1.1.156.9848 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.9848). doi:10.1109/tsmcb.2005.862728 (https://doi.org/10.1109%2Ftsmcb.2005.862728). PMID 16903373 (https://pubmed.ncbi.nlm.nih.gov/16903373). S2CID 7334355 (https://api.semanticscholar.org/CorpusID:7334355).
18. Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "Comprehensive database for facial expression analysis (http://www.ri.cmu.edu/pub_files/pub2/kanade_takeo_2000_1/kanade_takeo_2000_1.pdf)." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.
19. Zeng, Zhihong; et al. (2009). "A survey of affect recognition methods: Audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (1): 39–58. CiteSeerX 10.1.1.144.217 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.144.217). doi:10.1109/tpami.2008.52 (https://doi.org/10.1109%2Ftpami.2008.52). PMID 19029545 (https://pubmed.ncbi.nlm.nih.gov/19029545).
20. Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro (1998). "Facial expression images". The Japanese Female Facial Expression (JAFFE) Database. doi:10.5281/zenodo.3451524 (https://doi.org/10.5281%2Fzenodo.3451524).
21. Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro "Coding facial expressions with Gabor wavelets (https://zenodo.org/record/3430156)." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998.
22. Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets (http://vintage.winklerbros.net/Publications/icip2014a.pdf)." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014.
23. RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2015). "One-to-many face recognition with bilinear CNNs". arXiv:1506.01342 (https://arxiv.org/abs/1506.01342) [cs.CV (https://arxiv.org/archive/cs.CV)].
24. Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.
25. Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments (https://hal.inria.fr/docs/00/32/19/23/PDF/Huang_long_eccv2008-lfw.pdf). Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007.
26. Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.708.9158&rep=rep1&type=pdf)." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009.
27. Lingala, Mounika; et al. (2014). "Fuzzy logic color detection: Blue areas in melanoma dermoscopy images" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287461). Computerized Medical Imaging and Graphics. 38 (5): 403–410. doi:10.1016/j.compmedimag.2014.03.007 (https://doi.org/10.1016%2Fj.compmedimag.2014.03.007). PMC 4287461 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287461). PMID 24786720 (https://pubmed.ncbi.nlm.nih.gov/24786720).
28. Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition (https://lirias.kuleuven.be/retrieve/135678)." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010.
29. Savran, Arman, et al. "Bosphorus database for 3D face analysis (https://web.archive.org/web/20190222192331/http://pdfs.semanticscholar.org/4254/fbba3846008f50671edc9cf70b99d7304543.pdf)." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56.
30. Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach (http://eprints.whiterose.ac.uk/1526/01/austinj4.pdf)." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004.
31. Ge, Yun; et al. (2011). "3D Novel Face Sample Modeling for Face Recognition". Journal of Multimedia. 6 (5): 467–475. CiteSeerX 10.1.1.461.9710 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.461.9710). doi:10.4304/jmm.6.5.467-475 (https://doi.org/10.4304%2Fjmm.6.5.467-475).
32. Wang, Yueming; Liu, Jianzhuang; Tang, Xiaoou (2010). "Robust 3D face recognition by local shape difference boosting". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (10): 1858–1870. CiteSeerX 10.1.1.471.2424 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.471.2424). doi:10.1109/tpami.2009.200 (https://doi.org/10.1109%2Ftpami.2009.200). PMID 20724762 (https://pubmed.ncbi.nlm.nih.gov/20724762). S2CID 15263913 (https://api.semanticscholar.org/CorpusID:15263913).
33. Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.8534&rep=rep1&type=pdf)." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
34. Zhao, G.; Huang, X.; Taini, M.; Li, S. Z.; Pietikäinen, M. (2011). "Facial expression recognition from near-infrared videos" (http://www.academia.edu/download/42229488/Image_and_Vision_Computing20160206-29020-1auzaon.pdf) (PDF). Image and Vision Computing. 29 (9): 607–619. doi:10.1016/j.imavis.2011.07.002 (https://doi.org/10.1016%2Fj.imavis.2011.07.002).
35. Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances (https://pdfs.semanticscholar.org/cf81/4b618fcbc9a556cdce225e74a8806867ba84.pdf)." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838.
36. Bowyer, Kevin W.; Chang, Kyong; Flynn, Patrick (2006). "A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition". Computer Vision and Image Understanding. 101 (1): 1–15. CiteSeerX 10.1.1.134.8784 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.8784). doi:10.1016/j.cviu.2005.05.005 (https://doi.org/10.1016%2Fj.cviu.2005.05.005).
37. Tan, Xiaoyang; Triggs, Bill (2010). "Enhanced local texture feature sets for face recognition under difficult lighting conditions". IEEE Transactions on Image Processing. 19 (6): 1635–1650. Bibcode:2010ITIP...19.1635T (https://ui.adsabs.harvard.edu/abs/2010ITIP...19.1635T). CiteSeerX 10.1.1.105.3355 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.3355). doi:10.1109/tip.2010.2042645 (https://doi.org/10.1109%2Ftip.2010.2042645). PMID 20172829 (https://pubmed.ncbi.nlm.nih.gov/20172829). S2CID 4943234 (https://api.semanticscholar.org/CorpusID:4943234).
38. Mousavi, Mir Hashem, Karim Faez, and Amin Asghari. "Three dimensional face recognition using SVM classifier (https://ieeexplore.ieee.org/abstract/document/4529822/)." Computer and Information Science, 2008. ICIS 08. Seventh IEEE/ACIS International Conference on. IEEE, 2008.
39. Amberg, Brian, Reinhard Knothe, and Thomas Vetter. "Expression invariant 3D face recognition with a morphable model (https://gravis.dmi.unibas.ch/publications/2008/FG08_Amberg.pdf)." Automatic Face & Gesture Recognition, 2008. FG'08. 8th IEEE International Conference on. IEEE, 2008.
40. İrfanoğlu, M. O., Berk Gökberk, and Lale Akarun. "3D shape-based face recognition using automatically registered facial surfaces (https://www.researchgate.net/profile/Berk_Gokberk/publication/4090704_3D_Shape-based_face_recognition_using_automatically_registered_facial_surfaces/links/0fcfd50ee9450e057a000000.pdf)." Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 4. IEEE, 2004.
41. Beumier, Charles; Acheroy, Marc (2001). "Face verification from 3D and grey level clues". Pattern Recognition Letters. 22 (12): 1321–1329. Bibcode:2001PaReL..22.1321B (https://ui.adsabs.harvard.edu/abs/2001PaReL..22.1321B). doi:10.1016/s0167-8655(01)00077-0 (https://doi.org/10.1016%2Fs0167-8655%2801%2900077-0).
42. Afifi, Mahmoud; Abdelhamed, Abdelrahman (13 June 2017). "AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces". arXiv:1706.04277 (https://arxiv.org/abs/1706.04277) [cs.CV (https://arxiv.org/archive/cs.CV)].
43. "SoF dataset" (https://sites.google.com/view/sof-dataset). sites.google.com. Retrieved 18 November 2017.
44. "IMDb-WIKI" (https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/). data.vision.ee.ethz.ch. Retrieved 13 March 2018.
45. Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A. (2012). "Structured learning of human interactions in TV shows". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (12): 2441–2453. doi:10.1109/tpami.2012.24 (https://doi.org/10.1109%2Ftpami.2012.24). PMID 23079467 (https://pubmed.ncbi.nlm.nih.gov/23079467). S2CID 6060568 (https://api.semanticscholar.org/CorpusID:6060568).
46. Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.5113&rep=rep1&type=pdf). In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE.
47. Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013.
48. Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos (https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf)." Advances in Neural Information Processing Systems. 2014.
49. Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu, Michel (2016). "Fast Action Localization in Large-Scale Video Archives". IEEE Transactions on Circuits and Systems for Video Technology. 26 (10): 1917–1930. doi:10.1109/TCSVT.2015.2475835 (https://doi.org/10.1109%2FTCSVT.2015.2475835). S2CID 31537462 (https://api.semanticscholar.org/CorpusID:31537462).
50. Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A; Bernstein, Michael S; Fei-Fei, Li (2017). "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations". International Journal of Computer Vision. 123: 32–73. arXiv:1602.07332 (https://arxiv.org/abs/1602.07332). doi:10.1007/s11263-016-0981-7 (https://doi.org/10.1007%2Fs11263-016-0981-7). S2CID 4492210 (https://api.semanticscholar.org/CorpusID:4492210).
51. Karayev, S., et al. "A category-level 3-D object dataset: putting the Kinect to work (http://alliejanoch.com/iccvw2011.pdf)." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011.
52. Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable nonparametric image parsing with superpixels (http://152.2.128.56/~jtighe/Papers/ECCV10/eccv10-jtighe.pdf) Archived (https://web.archive.org/web/20190806022752/http://152.2.128.56/~jtighe/Papers/ECCV10/eccv10-jtighe.pdf) 6 August 2019 at the Wayback Machine." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365.
53. Arbelaez, P.; Maire, M; Fowlkes, C; Malik, J (May 2011). "Contour Detection and Hierarchical Image Segmentation" (http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/papers/amfm_pami2010.pdf) (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 33 (5): 898–916. doi:10.1109/tpami.2010.161 (https://doi.org/10.1109%2Ftpami.2010.161). PMID 20733228 (https://pubmed.ncbi.nlm.nih.gov/20733228). S2CID 206764694 (https://api.semanticscholar.org/CorpusID:206764694). Retrieved 27 February 2016.
54. Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Bourdev, Lubomir; Girshick, Ross; Hays, James; Perona, Pietro; Ramanan, Deva; Lawrence Zitnick, C.; Dollár, Piotr (2014). "Microsoft COCO: Common Objects in Context". arXiv:1405.0312 (https://arxiv.org/abs/1405.0312) [cs.CV (https://arxiv.org/archive/cs.CV)].
55. Russakovsky, Olga; et al. (2015). "Imagenet large scale visual recognition challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575 (https://arxiv.org/abs/1409.0575). doi:10.1007/s11263-015-0816-y (https://doi.org/10.1007%2Fs11263-015-0816-y). hdl:1721.1/104944 (https://hdl.handle.net/1721.1%2F104944). S2CID 2930547 (https://api.semanticscholar.org/CorpusID:2930547).
56. "COCO – Common Objects in Context" (https://cocodataset.org/). cocodataset.org.
57. Xiao, Jianxiong, et al. "Sun database: Large-scale scene recognition from abbey to zoo." Computer vision and pattern recognition (CVPR), 2010 IEEE conference on. IEEE, 2010.
58. Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang, Ning; Tzeng, Eric; Darrell, Trevor (2013). "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition". arXiv:1310.1531 (https://arxiv.org/abs/1310.1531) [cs.CV (https://arxiv.org/archive/cs.CV)].
59. Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database (https://www.researchgate.net/profile/Li_Jia_Li/publication/221361415_ImageNet_a_Large-Scale_Hierarchical_Image_Database/links/00b495388120dbc339000000/ImageNet-a-Large-Scale-Hierarchical-Image-Database.pdf)." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.
60. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks (http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)." Advances in neural information processing systems. 2012.
61. Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; et al. (11 April 2015). "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575 (https://arxiv.org/abs/1409.0575). doi:10.1007/s11263-015-0816-y (https://doi.org/10.1007%2Fs11263-015-0816-y). hdl:1721.1/104944 (https://hdl.handle.net/1721.1%2F104944). S2CID 2930547 (https://api.semanticscholar.org/CorpusID:2930547).
62. Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification, 2017. Available from https://github.com/openimages."
63. Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast News Videos (https://dl.acm.org/citation.cfm?id=2683546)." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014.
64. Hauptmann, Alexander G., and Michael J. Witbrock. "Story segmentation and detection of commercials in broadcast news video (https://pdfs.semanticscholar.org/5c21/6db7892fa3f515d816f84893bfab1137f0b2.pdf)." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998.
65. Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "Curler: finding and visualizing nonlinear correlation clusters (https://www.researchgate.net/profile/Anthony_Tung/publication/221214229_CURLER_Finding_and_Visualizing_Nonlinear_Correlated_Clusters/links/55b8691a08aed621de05cd92.pdf)." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005.
66. Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition? (https://ieeexplore.ieee.org/abstract/document/5459469/)." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
67. Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories (https://hal.inria.fr/inria-00548585/document)." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.
68. Griffin, G., A. Holub, and P. Perona. Caltech-256 object category dataset California Inst. Technol., Tech. Rep. 7694, 2007. Available: http://authors.library.caltech.edu/7694, 2007.
69. Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.
70. COYO-700M: Image-Text Pair Dataset (https://github.com/kakaobrain/coyo-dataset), Kakao Brain, 3 November 2022, retrieved 3 November 2022
71. Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing (https://pdfs.semanticscholar.org/9da2/abae3072fd9fcff0e13b8f00fc21f22d0085.pdf)." Computer Vision – ACCV 2014. Springer International Publishing, 2014. 162–177.
72. Heitz, Geremy; et al. (2009). "Shape-based object localization for descriptive classification". International Journal of Computer Vision. 84 (1): 40–62. CiteSeerX 10.1.1.142.280 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.280). doi:10.1007/s11263-009-0228-y (https://doi.org/10.1007%2Fs11263-009-0228-y). S2CID 646320 (https://api.semanticscholar.org/CorpusID:646320).
73. M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset (https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2015cvprw.pdf)." In CVPR Workshop on The Future of Datasets in Vision, 2015.
74. Everingham, Mark; et al. (2010). "The pascal visual object classes (voc) challenge" (https://www.research.ed.ac.uk/portal/en/publications/the-pascal-visual-object-classes-voc-challenge(88a29de3-6220-442b-ab2d-284210cf72d6).html). International Journal of Computer Vision. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4 (https://doi.org/10.1007%2Fs11263-009-0275-4). hdl:20.500.11820/88a29de3-6220-442b-ab2d-284210cf72d6 (https://hdl.handle.net/20.500.11820%2F88a29de3-6220-442b-ab2d-284210cf72d6). S2CID 4246903 (https://api.semanticscholar.org/CorpusID:4246903).
75. Felzenszwalb, Pedro F.; et al. (2010). "Object detection with discriminatively trained part-based models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (9): 1627–1645. CiteSeerX 10.1.1.153.2745 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.153.2745). doi:10.1109/tpami.2009.167 (https://doi.org/10.1109%2Ftpami.2009.167). PMID 20634557 (https://pubmed.ncbi.nlm.nih.gov/20634557). S2CID 3198903 (https://api.semanticscholar.org/CorpusID:3198903).
76. Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.
77. "CINIC-10 dataset" (http://www.bayeswatch.com/2018/10/09/CINIC/). Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10. 9 October 2018. Retrieved 13 November 2018.
78. fashion-mnist: A MNIST-like fashion product database. Benchmark (https://github.com/zalandoresearch/fashion-mnist), Zalando Research, 7 October 2017, retrieved 7 October 2017
79. "notMNIST dataset" (http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html). Machine Learning, etc. 8 September 2011. Retrieved 13 October 2017.
80. Houben, Sebastian, et al. "Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark (https://www.researchgate.net/profile/Sebastian_Houben/publication/242346625_Detection_of_Traffic_Signs_in_Real-World_Images_The_German_Traffic_Sign_Detection_Benchmark/links/0046352a03ec384e97000000/Detection-of-Traffic-Signs-in-Real-World-Images-The-German-Traffic-Sign-Detection-Benchmark.pdf)." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
81. Mathias, Mayeul, et al. "Traffic sign recognition – How far are we from the solution? (http://www.varcity.eu/paper/ijcnn2013_mathias_trafficsign.pdf)." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013.
82. Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? the kitti vision benchmark suite (https://www.cvlibs.net/publications/Geiger2012CVPR.pdf)." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
83. Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems (http://jsturm.de/publications/data/sturm12iros.pdf)." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012.
84. The KITTI Vision Benchmark Suite (https://www.youtube.com/watch?v=KXpZ6B1YB_k) on YouTube
85. Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/
86. Kragh, Mikkel F.; et al. (2017). "FieldSAFE – Dataset for Obstacle Detection in Agriculture" (https://vision.eng.au.dk/fieldsafe). Sensors. 17 (11): 2579. arXiv:1709.03526 (https://arxiv.org/abs/1709.03526). Bibcode:2017Senso..17.2579K (https://ui.adsabs.harvard.edu/abs/2017Senso..17.2579K). doi:10.3390/s17112579 (https://doi.org/10.3390%2Fs17112579). PMC 5713196 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5713196). PMID 29120383 (https://pubmed.ncbi.nlm.nih.gov/29120383).
87. Afifi, Mahmoud (12 November 2017). "Gender recognition and biometric identification using a large dataset of hand images". arXiv:1711.04322 (https://arxiv.org/abs/1711.04322) [cs.CV (https://arxiv.org/archive/cs.CV)].
88. Lomonaco, Vincenzo; Maltoni, Davide (18 October 2017). "CORe50: a New Dataset and Benchmark for Continuous Object Recognition". arXiv:1705.03550 (https://arxiv.org/abs/1705.03550) [cs.CV (https://arxiv.org/archive/cs.CV)].
89. She, Qi; Feng, Fan; Hao, Xinyue; Yang, Qihan; Lan, Chuanlin; Lomonaco, Vincenzo; Shi, Xuesong; Wang, Zhengwei; Guo, Yao; Zhang, Yimin; Qiao, Fei; Chan, Rosa H.M. (15 November 2019). "OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning". arXiv:1911.06487v2 (https://arxiv.org/abs/1911.06487v2) [cs.CV (https://arxiv.org/archive/cs.CV)].
90. Morozov, Alexei; Sushkova, Olga (13 June 2019). "THz and thermal video data set" (http://www.fullvision.ru/monitoring/description_eng.php). Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance. Moscow: IRE RAS. Retrieved 19 July 2019.
91. Morozov, Alexei; Sushkova, Olga; Kershner, Ivan; Polupanov, Alexander (9 July 2019). "Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images" (http://ceur-ws.org/Vol-2391/paper19.pdf) (PDF). CEUR. 2391: paper19. Retrieved 19 July 2019.
92. "Papers with Code - Daimler Monocular Pedestrian Detection Dataset" (https://paperswithcode.com/dataset/daimler-monocular-pedestrian-detection). paperswithcode.com. Retrieved 5 May 2023.
93. Enzweiler, Markus; Gavrila, Dariu M. (December 2009). "Monocular Pedestrian Detection: Survey and Experiments" (https://ieeexplore.ieee.org/document/4657363). IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (12): 2179–2195. doi:10.1109/TPAMI.2008.260 (https://doi.org/10.1109%2FTPAMI.2008.260). ISSN 1939-3539 (https://www.worldcat.org/issn/1939-3539). PMID 19834140 (https://pubmed.ncbi.nlm.nih.gov/19834140). S2CID 1192198 (https://api.semanticscholar.org/CorpusID:1192198).
94. Yin, Guojun; Liu, Bin; Zhu, Huihui; Gong, Tao; Yu, Nenghai (28 July 2020). "A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis". arXiv:1904.11784 (https://arxiv.org/abs/1904.11784) [cs.CV (https://arxiv.org/archive/cs.CV)].
95. "Object Recognition in Video Dataset" (https://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/). mi.eng.cam.ac.uk. Retrieved 5 May 2023.
96. Brostow, Gabriel J.; Shotton, Jamie; Fauqueur, Julien; Cipolla, Roberto (2008). "Segmentation and Recognition Using Structure from Motion Point Clouds" (https://link.springer.com/chapter/10.1007/978-3-540-88682-2_5). Computer Vision – ECCV 2008. Lecture Notes in Computer Science. Springer. 5302: 44–57. doi:10.1007/978-3-540-88682-2_5 (https://doi.org/10.1007%2F978-3-540-88682-2_5). ISBN 978-3-540-88681-5.
97. Brostow, Gabriel J.; Fauqueur, Julien; Cipolla, Roberto (15 January 2009). "Semantic object classes in video: A high-definition ground truth database" (https://www.sciencedirect.com/science/article/abs/pii/S0167865508001220). Pattern Recognition Letters. 30 (2): 88–97. Bibcode:2009PaReL..30...88B (https://ui.adsabs.harvard.edu/abs/2009PaReL..30...88B). doi:10.1016/j.patrec.2008.04.005 (https://doi.org/10.1016%2Fj.patrec.2008.04.005). ISSN 0167-8655 (https://www.worldcat.org/issn/0167-8655).
98. "WildDash 2 Benchmark" (https://wilddash.cc/railsem19). wilddash.cc. Retrieved 5 May 2023.
99. Zendel, Oliver; Murschitz, Markus; Zeilinger, Marcel; Steininger, Daniel; Abbasi, Sara; Beleznai, Csaba (June 2019). "RailSem19: A Dataset for Semantic Rail Scene Understanding" (https://ieeexplore.ieee.org/document/9025646). 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): 1221–1229. doi:10.1109/CVPRW.2019.00161 (https://doi.org/10.1109%2FCVPRW.2019.00161). ISBN 978-1-7281-2506-0. S2CID 198166233 (https://api.semanticscholar.org/CorpusID:198166233).
100. "The Boreas Dataset" (https://www.boreas.utias.utoronto.ca/#/). www.boreas.utias.utoronto.ca. Retrieved 5 May 2023.
101. Burnett, Keenan; Yoon, David J.; Wu, Yuchen; Li, Andrew Zou; Zhang, Haowei; Lu, Shichen; Qian, Jingxing; Tseng, Wei-Kang; Lambert, Andrew; Leung, Keith Y. K.; Schoellig, Angela P.; Barfoot, Timothy D. (26 January 2023). "Boreas: A Multi-Season Autonomous Driving Dataset". arXiv:2203.10168 (https://arxiv.org/abs/2203.10168) [cs.RO (https://arxiv.org/archive/cs.RO)].
102. "Bosch Small Traffic Lights Dataset" (https://hci.iwr.uni-heidelberg.de/content/bosch-small-traffic-lights-dataset). hci.iwr.uni-heidelberg.de. 1 March 2017. Retrieved 5 May 2023.
103. Behrendt, Karsten; Novak, Libor; Botros, Rami (May 2017). "A deep learning approach to traffic lights: Detection, tracking, and classification" (https://ieeexplore.ieee.org/document/7989163). 2017 IEEE International Conference on Robotics and Automation (ICRA): 1370–1377. doi:10.1109/ICRA.2017.7989163 (https://doi.org/10.1109%2FICRA.2017.7989163). ISBN 978-1-5090-4633-1. S2CID 6257133 (https://api.semanticscholar.org/CorpusID:6257133).
104. "FRSign Dataset" (https://frsign.irt-systemx.fr/). frsign.irt-systemx.fr. Retrieved 5 May 2023.
105. Harb, Jeanine; Rébéna, Nicolas; Chosidow, Raphaël; Roblin, Grégoire; Potarusov, Roman; Hajri, Hatem (5 February 2020). "FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains". arXiv:2002.05665 (https://arxiv.org/abs/2002.05665) [cs.CY (https://arxiv.org/archive/cs.CY)].
106. "ifs-rwth-aachen/GERALD" (https://github.com/ifs-rwth-aachen/GERALD). Chair and Institute for Rail Vehicles and Transport Systems. 30 April 2023. Retrieved 5 May 2023.
107. Leibner, Philipp; Hampel, Fabian; Schindler, Christian (3 April 2023). "GERALD: A novel dataset for the detection of German mainline railway signals" (https://journals.sagepub.com/doi/abs/10.1177/09544097231166472). Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit: 095440972311664. doi:10.1177/09544097231166472 (https://doi.org/10.1177%2F09544097231166472). ISSN 0954-4097 (https://www.worldcat.org/issn/0954-4097). S2CID 257939937 (https://api.semanticscholar.org/CorpusID:257939937).
108. Wojek, Christian; Walk, Stefan; Schiele, Bernt (June 2009). "Multi-cue onboard pedestrian detection" (https://ieeexplore.ieee.org/document/5206638). 2009 IEEE Conference on Computer Vision and Pattern Recognition: 794–801. doi:10.1109/CVPR.2009.5206638 (https://doi.org/10.1109%2FCVPR.2009.5206638). ISBN 978-1-4244-3992-8. S2CID 18000078 (https://api.semanticscholar.org/CorpusID:18000078).
109. Toprak, Tuğçe; Aydın, Burak; Belenlioğlu, Burak; Güzeliş, Cüneyt; Selver, M. Alper (5 April 2020). "Railway Pedestrian Dataset (RAWPED)" (https://zenodo.org/record/3741742). doi:10.1109/TVT.2020.2983825 (https://doi.org/10.1109%2FTVT.2020.2983825). S2CID 216510283 (https://api.semanticscholar.org/CorpusID:216510283). Retrieved 5 May 2023.
110. Toprak, Tugce; Belenlioglu, Burak; Aydın, Burak; Guzelis, Cuneyt; Selver, M. Alper (May 2020). "Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems" (https://ieeexplore.ieee.org/document/9050835). IEEE Transactions on Vehicular Technology. 69 (5): 5041–5054. doi:10.1109/TVT.2020.2983825 (https://doi.org/10.1109%2FTVT.2020.2983825). ISSN 1939-9359 (https://www.worldcat.org/issn/1939-9359). S2CID 216510283 (https://api.semanticscholar.org/CorpusID:216510283).
111. Tilly, Roman; Neumaier, Philipp; Schwalbe, Karsten; Klasek, Pavel; Tagiew, Rustam; Denzler, Patrick; Klockau, Tobias; Boekhoff, Martin; Köppel, Martin (2023). "Open Sensor Data for Rail 2023" (in German). doi:10.57806/9mv146r0 (https://doi.org/10.57806%2F9mv146r0).
112. Tagiew, Rustam; Köppel, Martin; Schwalbe, Karsten; Denzler, Patrick; Neumaier, Philipp; Klockau, Tobias; Boekhoff, Martin; Klasek, Pavel; Tilly, Roman (4 May 2023). "OSDaR23: Open Sensor Data for Rail 2023". arXiv:2305.03001 (https://arxiv.org/abs/2305.03001) [cs.CV (https://arxiv.org/archive/cs.CV)].
113. "Home" (https://www.argoverse.org/). Argoverse. Retrieved 5 May 2023.
114. Chang, Ming-Fang; Lambert, John; Sangkloy, Patsorn; Singh, Jagjeet; Bak, Slawomir; Hartnett, Andrew; Wang, De; Carr, Peter; Lucey, Simon; Ramanan, Deva; Hays, James (6 November 2019). "Argoverse: 3D Tracking and Forecasting with Rich Maps". arXiv:1911.02620 (https://arxiv.org/abs/1911.02620) [cs.CV (https://arxiv.org/archive/cs.CV)].
115. Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept definitions (https://pdfs.semanticscholar.org/9f0e/1349d1422f1b455b8ccc26ebf7b114b8db20.pdf)." Fuzzy Systems, 1993., Second IEEE International Conference on. IEEE, 1993.
116. Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive classifiers" (https://doi.org/10.1007%2Fbf00114162). Machine Learning. 6 (2): 161–182. doi:10.1007/bf00114162 (https://doi.org/10.1007%2Fbf00114162).
117. Peltonen, Jaakko; Klami, Arto; Kaski, Samuel (2004). "Improved learning of Riemannian metrics for exploratory analysis". Neural Networks. 17 (8): 1087–1100. CiteSeerX 10.1.1.59.4865 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.4865). doi:10.1016/j.neunet.2004.06.008 (https://doi.org/10.1016%2Fj.neunet.2004.06.008). PMID 15555853 (https://pubmed.ncbi.nlm.nih.gov/15555853).
118. Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. Bibcode:2013PatRe..46..155L (https://ui.adsabs.harvard.edu/abs/2013PatRe..46..155L). doi:10.1016/j.patcog.2012.06.021 (https://doi.org/10.1016%2Fj.patcog.2012.06.021).
119. Wang, D.; Liu, C.; Yu, J.; Zhou, X. (2009). "CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters". 2009 10th International Conference on Document Analysis and Recognition: 1206–1210. doi:10.1109/ICDAR.2009.163 (https://doi.org/10.1109%2FICDAR.2009.163). ISBN 978-1-4244-4500-4. S2CID 5705532 (https://api.semanticscholar.org/CorpusID:5705532).
120. Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting motion primitives from natural handwriting data (https://www.era.lib.ed.ac.uk/bitstream/handle/1842/3221/BH%20Williams%20PhD%20thesis%2009.pdf?sequence=1). Springer Berlin Heidelberg, 2006.
121. Meier, Franziska, et al. "Movement segmentation using a primitive library (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.395.8598&rep=rep1&type=pdf)." Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011.
122. T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images (http://personal.ee.surrey.ac.uk/Personal/T.Decampos/papers/decampos_etal_visapp2009.pdf). In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009
123. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van Schaik (2017). "EMNIST: An extension of MNIST to handwritten letters". arXiv:1702.05373v1 (https://arxiv.org/abs/1702.05373v1) [cs.CV (https://arxiv.org/archive/cs.CV)].
124. "The EMNIST Dataset" (https://www.nist.gov/itl/products-and-services/emnist-dataset). NIST. 4 April 2017.
125. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van Schaik (2017). "EMNIST: An extension of MNIST to handwritten letters". arXiv:1702.05373 (https://arxiv.org/abs/1702.05373) [cs.CV (https://arxiv.org/archive/cs.CV)].
126. Llorens, David, et al. "The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters (https://web.archive.org/web/20190806015012/https://pdfs.semanticscholar.org/24cf/ef15094c59322560377bbf8e4185245c654f.pdf)." LREC. 2008.
127. Calderara, Simone; Prati, Andrea; Cucchiara, Rita (2011). "Mixtures of von mises distributions for people trajectory shape analysis". IEEE Transactions on Circuits and Systems for Video Technology. 21 (4): 457–471. doi:10.1109/tcsvt.2011.2125550 (https://doi.org/10.1109%2Ftcsvt.2011.2125550). S2CID 1427766 (https://api.semanticscholar.org/CorpusID:1427766).
132. Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008 (https://doi.org/10.1016%2Fj.imavis.2004.03.008).
133. Xu, Lei; Krzyżak, Adam; Suen, Ching Y. (1992). "Methods of combining multiple classifiers and their applications to handwriting recognition". IEEE Transactions on Systems, Man and Cybernetics. 22 (3): 418–435. doi:10.1109/21.155943 (https://doi.org/10.1109%2F21.155943). hdl:10338.dmlcz/135217 (https://hdl.handle.net/10338.dmlcz%2F135217).
134. Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based handwritten digit recognition (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.6299)." (1996).
135. Tang, E. Ke; et al. (2005). "Linear dimensionality reduction using relevance weighted LDA". Pattern Recognition. 38 (4): 485–493. Bibcode:2005PatRe..38..485T (https://ui.adsabs.harvard.edu/abs/2005PatRe..38..485T). doi:10.1016/j.patcog.2004.09.005 (https://doi.org/10.1016%2Fj.patcog.2004.09.005). S2CID 10580110 (https://api.semanticscholar.org/CorpusID:10580110).
136. Hong, Yi, et al. "Learning a mixture of sparse distance metrics for classification and dimensionality reduction (https://pages.ucsd.edu/~ztu/publication/iccv11_sparsemetric.pdf)." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
137. Thoma, Martin (2017). "The HASYv2 dataset". arXiv:1701.08380 (https://arxiv.org/abs/1701.08380) [cs.CV (https://arxiv.org/archive/cs.CV)].
138. Karki, Manohar; Liu, Qun; DiBiano, Robert; Basu, Saikat; Mukhopadhyay, Supratik (20 June 2018). "Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters". arXiv:1806.08037 (https://arxiv.org/abs/1806.08037) [cs.CV (https://arxiv.org/archive/cs.CV)].
139. Liu, Qun; Collier, Edward; Mukhopadhyay, Supratik (2019), "PCGAN-CHAR: Progressively Trained Classifier Generative Adversarial Networks for Classification of Noisy Handwritten Bangla Characters", Digital Libraries at the Crossroads of Digital Information for the Future, Springer International Publishing, pp. 3–15, arXiv:1908.08987 (https://arxiv.org/abs/1908.08987), doi:10.1007/978-3-030-34058-2_1 (https://doi.org/10.1007%2F978-3-030-34058-2_1), ISBN 978-3-030-34057-5, S2CID 201665955 (https://api.semanticscholar.org/CorpusID:201665955)
140. "iSAID" (https://captain-whu.github.io/iSAID/index.html). captain-whu.github.io. Retrieved 30 November 2021.
141. Zamir, Syed & Arora, Aditya & Gupta, Akshita & Khan, Salman & Sun, Guolei & Khan, Fahad & Zhu, Fan & Shao, Ling & Xia, Gui-Song & Bai, Xiang. (2019). iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. website (https://captain-whu.github.io/iSAID/index.html)
142. Yuan, Jiangye; Gleason, Shaun S.; Cheriyadat, Anil M. (2013). "Systematic benchmarking of aerial image segmentation". IEEE Geoscience and Remote Sensing Letters. 10 (6): 1527–1531. Bibcode:2013IGRSL..10.1527Y (https://ui.adsabs.harvard.edu/abs/2013IGRSL..10.1527Y). doi:10.1109/lgrs.2013.2261453 (https://doi.org/10.1109%2Flgrs.2013.2261453). S2CID 629629 (https://api.semanticscholar.org/CorpusID:629629).
128. Guyon, Isabelle, et al. "Result analysis of the nips 2003 feature
selection challenge (http://papers.nips.cc/paper/2728-result-analysi 143. Vatsavai, Ranga Raju. "Object based image classification: state of
s-of-the-nips-2003-feature-selection-challenge.pdf)." Advances in the art and computational challenges (https://dl.acm.org/citation.cf
neural information processing systems. 2004. m?id=2534927)." Proceedings of the 2nd ACM SIGSPATIAL
129. Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. (11 December International Workshop on Analytics for Big Geospatial Data. ACM,
2015). "Human-level concept learning through probabilistic 2013.
program induction" (https://doi.org/10.1126%2Fscience.aab3050). 144. Butenuth, Matthias, et al. "Integrating pedestrian simulation,
Science. 350 (6266): 1332–1338. Bibcode:2015Sci...350.1332L (htt tracking and event detection for crowd analysis (http://www.hartman
ps://ui.adsabs.harvard.edu/abs/2015Sci...350.1332L). n-alberts.de/dirk/pub/proceedings2011e.pdf)." Computer Vision
doi:10.1126/science.aab3050 (https://doi.org/10.1126%2Fscience.a Workshops (ICCV Workshops), 2011 IEEE International
ab3050). ISSN 0036-8075 (https://www.worldcat.org/issn/0036-807 Conference on. IEEE, 2011.
5). PMID 26659050 (https://pubmed.ncbi.nlm.nih.gov/26659050). 145. Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis
130. Lake, Brenden (9 November 2019), Omniglot data set for one-shot using frame-wise normalized feature for people counting (http://ww
learning (https://github.com/brendenlake/omniglot), retrieved w.eurecom.fr/fr/publication/3841/download/mm-publi-3841.pdf)."
10 November 2019 Information Forensics and Security (WIFS), 2012 IEEE International
131. LeCun, Yann; et al. (1998). "Gradient-based learning applied to Workshop on. IEEE, 2012.
document recognition". Proceedings of the IEEE. 86 (11): 2278– 146. Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan.
2324. CiteSeerX 10.1.1.32.9552 (https://citeseerx.ist.psu.edu/viewd "A hybrid pansharpening approach and multiscale object-based
oc/summary?doi=10.1.1.32.9552). doi:10.1109/5.726791 (https://do image analysis for mapping diseased pine and oak trees (http://cite
i.org/10.1109%2F5.726791). S2CID 14542261 (https://api.semantic seerx.ist.psu.edu/viewdoc/download?doi=10.1.1.826.9200&rep=rep
scholar.org/CorpusID:14542261). 1&type=pdf)." International journal of remote sensing34.20 (2013):
6969–6982.
147. Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran (2015). "A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification" (https://www.tandfonline.com/doi/abs/10.1080/2150704X.2015.1062159). Remote Sensing Letters. 6 (7): 568–577. doi:10.1080/2150704X.2015.1062159 (https://doi.org/10.1080%2F2150704X.2015.1062159). S2CID 58788630 (https://api.semanticscholar.org/CorpusID:58788630).
148. Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks (https://www.mdpi.com/2072-4292/10/4/511)." Remote Sensing. 2018; 10(4):511.
149. Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset". Available: https://www.iuii.ua.es/datasets/masati/, 2018.
150. Johnson, Brian; Tateishi, Ryutaro; Xie, Zhixiao (2012). "Using geographically weighted variables for image classification". Remote Sensing Letters. 3 (6): 491–499. doi:10.1080/01431161.2011.629637 (https://doi.org/10.1080%2F01431161.2011.629637). S2CID 122543681 (https://api.semanticscholar.org/CorpusID:122543681).
151. Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid NN-GA Model Based Approach (https://www.researchgate.net/profile/Sankhadeep_Chatterjee/publication/282605325_Forest_Type_Classification_A_Hybrid_NN-GA_Model_Based_Approach/links/57493cb308ae5c51e29e6f1b/Forest-Type-Classification-A-Hybrid-NN-GA-Model-Based-Approach.pdf)." Information Systems Design and Intelligent Applications. Springer India, 2016. 227–236.
152. Diegert, Carl. "A combinatorial method for tracing objects using semantics of their shape (https://www.osti.gov/servlets/purl/1278837)." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010.
153. Razakarivony, Sebastien, and Frédéric Jurie. "Small target detection combining foreground and background manifolds (https://hal.archives-ouvertes.fr/hal-00943444/file/13_mva-detection.pdf)." IAPR International Conference on Machine Vision Applications. 2013.
154. "SpaceNet" (http://explore.digitalglobe.com/spacenet). explore.digitalglobe.com. Retrieved 13 March 2018.
155. Etten, Adam Van (5 January 2017). "Getting Started With SpaceNet Data" (https://medium.com/the-downlinq/getting-started-with-spacenet-data-827fd2ec9f53). The DownLinQ. Retrieved 13 March 2018.
156. Vakalopoulou, M.; Bus, N.; Karantzalosa, K.; Paragios, N. (July 2017). Integrating edge/boundary priors with classification scores for building detection in very high resolution data. 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 3309–3312. doi:10.1109/IGARSS.2017.8127705 (https://doi.org/10.1109%2FIGARSS.2017.8127705). ISBN 978-1-5090-4951-6. S2CID 8297433 (https://api.semanticscholar.org/CorpusID:8297433).
157. Yang, Yi; Newsam, Shawn (2010). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems – GIS '10. New York, New York, USA: ACM Press. doi:10.1145/1869790.1869829 (https://doi.org/10.1145%2F1869790.1869829). ISBN 9781450304283. S2CID 993769 (https://api.semanticscholar.org/CorpusID:993769).
158. Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (3 November 2015). DeepSat: a learning framework for satellite imagery. ACM. p. 37. doi:10.1145/2820783.2820816 (https://doi.org/10.1145%2F2820783.2820816). ISBN 9781450339674. S2CID 4387134 (https://api.semanticscholar.org/CorpusID:4387134).
159. Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (21 November 2019). "DeepSat V2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747 (https://arxiv.org/abs/1911.07747). doi:10.1080/2150704x.2019.1693071 (https://doi.org/10.1080%2F2150704x.2019.1693071). ISSN 2150-704X (https://www.worldcat.org/issn/2150-704X). S2CID 208138097 (https://api.semanticscholar.org/CorpusID:208138097).
160. Md Jahidul Islam, et al. "Semantic Segmentation of Underwater Imagery: Dataset and Benchmark (https://ieeexplore.ieee.org/abstract/document/9340821)." 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020.
161. Waszak et al. "Semantic Segmentation in Underwater Ship Inspections: Benchmark and Data Set (https://ieeexplore.ieee.org/document/9998080)." IEEE Journal of Oceanic Engineering. IEEE, 2022.
162. Ebadi, Ashkan; Paul, Patrick; Auer, Sofia; Tremblay, Stéphane (12 November 2021). "NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset". arXiv:2111.06827 (https://arxiv.org/abs/2111.06827) [cs.CV (https://arxiv.org/archive/cs.CV)].
163. Canada, Government of Canada National Research Council (2021). "The gas meter image dataset (NRC-GAMMA) - NRC Digital Repository" (https://nrc-digital-repository.canada.ca/eng/view/object/?id=ba1fc493-e65f-4c0a-ab31-ecbcdf00bfa4). nrc-digital-repository.canada.ca. doi:10.4224/3c8s-z290 (https://doi.org/10.4224%2F3c8s-z290). Retrieved 2 December 2021.
164. Rabah, Chaima Ben; Coatrieux, Gouenou; Abdelfattah, Riadh (October 2020). "The Supatlantique Scanned Documents Database for Digital Image Forensics Purposes" (https://dx.doi.org/10.1109/icip40778.2020.9190665). 2020 IEEE International Conference on Image Processing (ICIP). IEEE: 2096–2100. doi:10.1109/icip40778.2020.9190665 (https://doi.org/10.1109%2Ficip40778.2020.9190665). ISBN 978-1-7281-6395-6. S2CID 224881147 (https://api.semanticscholar.org/CorpusID:224881147).
165. Mills, Kyle; Tamblyn, Isaac (16 May 2018), Big graphene dataset, National Research Council of Canada, doi:10.4224/c8sc04578j.data (https://doi.org/10.4224%2Fc8sc04578j.data)
166. Mills, Kyle; Spanner, Michael; Tamblyn, Isaac (16 May 2018). "Quantum simulation". Quantum simulations of an electron in a two dimensional potential well. National Research Council of Canada. doi:10.4224/PhysRevA.96.042113.data (https://doi.org/10.4224%2FPhysRevA.96.042113.data).
167. Rohrbach, M.; Amin, S.; Andriluka, M.; Schiele, B. (2012). "A database for fine grained activity detection of cooking activities". 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. pp. 1194–1201. doi:10.1109/cvpr.2012.6247801 (https://doi.org/10.1109%2Fcvpr.2012.6247801). ISBN 978-1-4673-1228-8.
168. Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of actions: Recovering the syntax and semantics of goal-directed human activities (https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kuehne_The_Language_of_2014_CVPR_paper.pdf)." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
169. Sviatoslav, Voloshynovskiy, et al. "Towards Reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS) (http://vision.unige.ch/publications/postscript/2012/2012.WIFS.database.pdf)." Proceedings of IEEE International Workshop on Information Forensics and Security. 2012.
170. Olga, Taran and Shideh, Rezaeifar, et al. "PharmaPack: mobile fine-grained recognition of pharma packages (https://archive-ouverte.unige.ch/unige:97444/ATTACHMENT01)." Proc. European Signal Processing Conference (EUSIPCO). 2017.
171. Khosla, Aditya, et al. "Novel dataset for fine-grained image categorization: Stanford dogs (https://people.csail.mit.edu/khosla/papers/fgvc2011.pdf)." Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011.
172. Parkhi, Omkar M., et al. "Cats and dogs (http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf)." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
173. Biggs, Benjamin; Boyne, Oliver; Charles, James; Fitzgibbon, Andrew; Cipolla, Roberto (2020). Computer Vision – ECCV 2020. Lecture Notes in Computer Science. Vol. 12356. arXiv:2007.11110 (https://arxiv.org/abs/2007.11110). doi:10.1007/978-3-030-58621-8 (https://doi.org/10.1007%2F978-3-030-58621-8). ISBN 978-3-030-58620-1. S2CID 227173931 (https://api.semanticscholar.org/CorpusID:227173931).
174. Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition (https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf)." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014.
175. Ortega, Michael; et al. (1998). "Supporting ranked boolean similarity queries in MARS". IEEE Transactions on Knowledge and Data Engineering. 10 (6): 905–925. CiteSeerX 10.1.1.36.6079 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.6079). doi:10.1109/69.738357 (https://doi.org/10.1109%2F69.738357).
176. He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "Multiscale conditional random fields for image labeling (ftp://www-vhost.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr04.pdf)." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004.
177. Deneke, Tewodros, et al. "Video transcoding time prediction for proactive load balancing (https://ieeexplore.ieee.org/abstract/document/6890256/)." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014.
178. Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell (13 April 2016). "Visual Storytelling". arXiv:1604.03968 (https://arxiv.org/abs/1604.03968) [cs.CL (https://arxiv.org/archive/cs.CL)].
179. Wah, Catherine, et al. "The caltech-ucsd birds-200-2011 dataset (https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf)." (2011).
180. Duan, Kun, et al. "Discovering localized attributes for fine-grained recognition (http://vision.soic.indiana.edu/papers/attributes2012cvpr.pdf)." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
181. "YouTube-8M Dataset" (https://research.google.com/youtube8m/). research.google.com. Retrieved 1 October 2016.
182. Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul; Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan, Sudheendra (27 September 2016). "YouTube-8M: A Large-Scale Video Classification Benchmark". arXiv:1609.08675 (https://arxiv.org/abs/1609.08675) [cs.CV (https://arxiv.org/archive/cs.CV)].
183. "YFCC100M Dataset" (http://mmcommons.org). mmcommons.org. Yahoo-ICSI-LLNL. Retrieved 1 June 2017.
184. Bart Thomee; David A Shamma; Gerald Friedland; Benjamin Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li (25 April 2016). "Yfcc100m: The new data in multimedia research". Communications of the ACM. 59 (2): 64–73. arXiv:1503.01817 (https://arxiv.org/abs/1503.01817). doi:10.1145/2812802 (https://doi.org/10.1145%2F2812802). S2CID 207230134 (https://api.semanticscholar.org/CorpusID:207230134).
185. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-ACCEDE: A Video Database for Affective Content Analysis (https://hal.archives-ouvertes.fr/hal-01375518/document)," in IEEE Transactions on Affective Computing, 2015.
186. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos (https://hal.archives-ouvertes.fr/hal-01193144/document)," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
187. M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The mediaeval 2015 affective impact of movies task (https://www.researchgate.net/profile/Hanli_Wang2/publication/309704559_The_MediaEval_2015_Affective_Impact_of_Movies_Task/links/581dada308ae12715af33bc8/The-MediaEval-2015-Affective-Impact-of-Movies-Task.pdf)," in MediaEval 2015 Workshop, 2015.
188. S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation (http://sam.johnson.io/research/publications/johnson10bmvc.pdf)", in Proceedings of the 21st British Machine Vision Conference (BMVC2010)
189. S. Johnson and M. Everingham, "Learning Effective Human Pose Estimation from Inaccurate Annotation (http://sam.johnson.io/research/publications/johnson11cvpr.pdf)", in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2011)
190. Afifi, Mahmoud; Hussain, Khaled F. (2 November 2017). "The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques". arXiv:1711.00972 (https://arxiv.org/abs/1711.00972) [cs.CV (https://arxiv.org/archive/cs.CV)].
191. "MCQ Dataset" (https://sites.google.com/view/mcq-dataset/mcqe-dataset). sites.google.com. Retrieved 18 November 2017.
192. Taj-Eddin, I. A. T. F.; Afifi, M.; Korashy, M.; Hamdy, D.; Nasser, M.; Derbaz, S. (July 2016). A new compression technique for surveillance videos: Evaluation using new dataset. 2016 Sixth International Conference on Digital Information and Communication Technology and Its Applications (DICTAP). pp. 159–164. doi:10.1109/DICTAP.2016.7544020 (https://doi.org/10.1109%2FDICTAP.2016.7544020). ISBN 978-1-4673-9609-7. S2CID 8698850 (https://api.semanticscholar.org/CorpusID:8698850).
193. Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David W.; Sweeney, Steven J.; Vercauteren, Kurt C.; Snow, Nathan P.; Halseth, Joseph M.; Di Salvo, Paul A.; Lewis, Jesse S.; White, Michael D.; Teton, Ben; Beasley, James C.; Schlichting, Peter E.; Boughton, Raoul K.; Wight, Bethany; Newkirk, Eric S.; Ivan, Jacob S.; Odell, Eric A.; Brook, Ryan K.; Lukacs, Paul M.; Moeller, Anna K.; Mandeville, Elizabeth G.; Clune, Jeff; Miller, Ryan S.; Photopoulou, Theoni (2018). "Machine learning to classify animal species in camera trap images: Applications in ecology" (https://doi.org/10.1111%2F2041-210X.13120). Methods in Ecology and Evolution. 10 (4): 585–590. doi:10.1111/2041-210X.13120 (https://doi.org/10.1111%2F2041-210X.13120). ISSN 2041-210X (https://www.worldcat.org/issn/2041-210X).
194. Taj-Eddin, Islam A. T. F.; Afifi, Mahmoud; Korashy, Mostafa; Ahmed, Ali H.; Ng, Yoke Cheng; Hernandez, Evelyng; Abdel-Latif, Salma M. (November 2017). "Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification". Journal of Electronic Imaging. 26 (6): 060501. arXiv:1706.03867 (https://arxiv.org/abs/1706.03867). Bibcode:2017JEI....26f0501T (https://ui.adsabs.harvard.edu/abs/2017JEI....26f0501T). doi:10.1117/1.jei.26.6.060501 (https://doi.org/10.1117%2F1.jei.26.6.060501). ISSN 1017-9909 (https://www.worldcat.org/issn/1017-9909). S2CID 12367169 (https://api.semanticscholar.org/CorpusID:12367169).
195. "Mathematical Mathematics Memes" (https://www.kaggle.com/abdelghanibelgaid/mathematical-mathematics-memes).
196. Karras, Tero; Laine, Samuli; Aila, Timo (June 2019). "A Style-Based Generator Architecture for Generative Adversarial Networks" (https://dx.doi.org/10.1109/cvpr.2019.00453). 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE: 4396–4405. arXiv:1812.04948 (https://arxiv.org/abs/1812.04948). doi:10.1109/cvpr.2019.00453 (https://doi.org/10.1109%2Fcvpr.2019.00453). ISBN 978-1-7281-3293-8. S2CID 54482423 (https://api.semanticscholar.org/CorpusID:54482423).
197. McAuley, Julian; Targett, Christopher; Shi, Qinfeng; Anton van den Hengel (2015). "Image-based Recommendations on Styles and Substitutes". arXiv:1506.04757 (https://arxiv.org/abs/1506.04757) [cs.CV (https://arxiv.org/archive/cs.CV)].
198. "Amazon review data" (https://nijianmo.github.io/amazon/index.html). nijianmo.github.io. Retrieved 8 October 2021.
199. Ganesan, Kavita; Zhai, Chengxiang (2012). "Opinion-based entity ranking". Information Retrieval. 15 (2): 116–150. doi:10.1007/s10791-011-9174-8 (https://doi.org/10.1007%2Fs10791-011-9174-8). hdl:2142/15252 (https://hdl.handle.net/2142%2F15252). S2CID 16258727 (https://api.semanticscholar.org/CorpusID:16258727).
200. Lv, Yuanhua, Dimitrios Lymberopoulos, and Qiang Wu. "An exploration of ranking heuristics in mobile local search (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.599.1442&rep=rep1&type=pdf)." Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2012.
201. Harper, F. Maxwell; Konstan, Joseph A. (2015). "The MovieLens Datasets: History and Context". ACM Transactions on Interactive Intelligent Systems. 5 (4): 19. doi:10.1145/2827872 (https://doi.org/10.1145%2F2827872). S2CID 16619709 (https://api.semanticscholar.org/CorpusID:16619709).
202. Koenigstein, Noam, Gideon Dror, and Yehuda Koren. "Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy (https://www.researchgate.net/profile/Noam_Koenigstein/publication/221141054_Yahoo_music_recommendations_Modeling_music_ratings_with_temporal_dynamics_and_item_taxonomy/links/5404184a0cf2c48563b03c68/Yahoo-music-recommendations-Modeling-music-ratings-with-temporal-dynamics-and-item-taxonomy.pdf)." Proceedings of the fifth ACM conference on Recommender systems. ACM, 2011.
203. McFee, Brian, et al. "The million song dataset challenge (https://bmcfee.github.io/papers/msdchallenge.pdf)." Proceedings of the 21st international conference companion on World Wide Web. ACM, 2012.
204. Bohanec, Marko, and Vladislav Rajkovic. "Knowledge acquisition and explanation for multi-attribute decision making (https://www.researchgate.net/profile/Marko_Bohanec/publication/246614940_KNOWLEDGE_ACQUISITION_AND_EXPLANATION_FOR_MULTI-ATTRIBUTE_DECISION_MAKING/links/02e7e532152f452d87000000.pdf)." 8th Intl Workshop on Expert Systems and their Applications. 1988.
205. Tan, Peter J., and David L. Dowe. "MML inference of decision graphs with multi-way joins (http://www.csse.monash.edu.au/~dld/Publications/2002/Tan+Dowe2002_MMLDecisionGraphs.ps)." Australian Joint Conference on Artificial Intelligence. 2002.
206. "Quantifying comedy on YouTube: why the number of o's in your LOL matter" (https://metatext.io/datasets). Metatext NLP Database. Retrieved 26 October 2020.
207. Kim, Byung Joo (2012). "A Classifier for Big Data" (https://link.springer.com/chapter/10.1007/978-3-642-32692-9_63). Convergence and Hybrid Information Technology. Communications in Computer and Information Science. Vol. 310. pp. 505–512. doi:10.1007/978-3-642-32692-9_63 (https://doi.org/10.1007%2F978-3-642-32692-9_63). ISBN 978-3-642-32691-2.
208. Pérezgonzález, Jose D.; Gilbey, Andrew (2011). "Predicting Skytrax airport rankings from customer reviews" (https://www.ingentaconnect.com/content/hsp/cam/2011/00000005/00000004/art00007). Journal of Airport Management. 5 (4): 335–339.
209. Loh, Wei-Yin, and Yu-Shan Shih. "Split selection methods for classification trees (http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n41.pdf)." Statistica Sinica (1997): 815–840.
210. Lim, Tjen-Sien; Loh, Wei-Yin; Shih, Yu-Shan (2000). "A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms". Machine Learning. 40 (3): 203–228. doi:10.1023/a:1007608224229 (https://doi.org/10.1023%2Fa%3A1007608224229). S2CID 17030953 (https://api.semanticscholar.org/CorpusID:17030953).
211. Kiet Van Nguyen, Vu Duc Nguyen, Phu X. V. Nguyen, Tham T. H. Truong, Ngan Luu-Thuy Nguyen. "UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis (https://ieeexplore.ieee.org/document/8573337)."
212. Ho, Vong Anh; Nguyen, Duong Huynh-Cong; Nguyen, Danh Hoang; Pham, Linh Thi-Van; Nguyen, Duc-Vu; Nguyen, Kiet Van; Nguyen, Ngan Luu-Thuy (2020). "Emotion Recognition for Vietnamese Social Media Text" (https://link.springer.com/chapter/10.1007/978-981-15-6168-9_27). Computational Linguistics. Communications in Computer and Information Science. Vol. 1215. pp. 319–333. arXiv:1911.09339 (https://arxiv.org/abs/1911.09339). doi:10.1007/978-981-15-6168-9_27 (https://doi.org/10.1007%2F978-981-15-6168-9_27). ISBN 978-981-15-6167-2. S2CID 208202333 (https://api.semanticscholar.org/CorpusID:208202333).
213. Nhung Thi-Hong Nguyen, Phuong Ha-Dieu Phan, Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen (24 April 2021). "Vietnamese Open-domain Complaint Detection in E-Commerce Websites". arXiv:2104.11969 (https://arxiv.org/abs/2104.11969) [cs.CL (https://arxiv.org/archive/cs.CL)].
214. Phu Gia Hoang, Canh Duc Luu, Khanh Quoc Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen (26 January 2023). "ViHOS: Hate Speech Spans Detection for Vietnamese". arXiv:2301.10186 (https://arxiv.org/abs/2301.10186) [cs.CL (https://arxiv.org/archive/cs.CL)].
215. Dermouche, Mohamed; Velcin, Julien; Khouas, Leila; Loudcher, Sabine (2014). "A Joint Model for Topic-Sentiment Evolution over Time". 2014 IEEE International Conference on Data Mining. IEEE. pp. 773–778. doi:10.1109/icdm.2014.82 (https://doi.org/10.1109%2Ficdm.2014.82). ISBN 978-1-4799-4302-9.
216. Rose, Tony; Stevenson, Mark; Whitehead, Miles (2002). "The Reuters Corpus Volume 1-from Yesterday's News to Tomorrow's Language Resources" (https://web.archive.org/web/20190806015015/https://pdfs.semanticscholar.org/3e4b/dc7f8904c58f8fce199389299ec1ed8e1226.pdf) (PDF). LREC. 2. S2CID 9239414 (https://api.semanticscholar.org/CorpusID:9239414). Archived from the original (https://pdfs.semanticscholar.org/3e4b/dc7f8904c58f8fce199389299ec1ed8e1226.pdf) (PDF) on 6 August 2019.
217. Amini, Massih R.; Usunier, Nicolas; Goutte, Cyril (2009). "Learning from Multiple Partially Observed Views – an Application to Multilingual Text Categorization" (http://papers.nips.cc/paper/3690-learning-from-multiple-partially-observed-views-an-application-to-multilingual-text-categorization). Advances in Neural Information Processing Systems. 22: 28–36.
218. Liu, Ming; et al. (2015). "VRCA: a clustering algorithm for massive amount of texts" (https://www.aaai.org/ocs/index.php/IJCAI/IJCAI15/paper/download/10903/10990). Proceedings of the 24th International Conference on Artificial Intelligence. AAAI Press.
219. Al-Harbi, S; Almuhareb, A; Al-Thubaity, A; Khorsheed, M. S.; Al-Rajeh, A (2008). "Automatic Arabic Text Classification". Proceedings of the 9th International Conference on the Statistical Analysis of Textual Data, Lyon, France.
220. "Relationship and Entity Extraction Evaluation Dataset: Dstl/re3d" (https://github.com/dstl/re3d). GitHub. 17 December 2018.
221. "The Examiner – SpamClickBait Catalogue" (https://www.kaggle.com/therohk/examine-the-examiner).
222. "A Million News Headlines" (https://www.kaggle.com/therohk/million-headlines).
223. "One Week of Global News Feeds" (https://www.kaggle.com/therohk/global-news-week).
224. Kulkarni, Rohit (2018), Reuters News-Wire Archive, Harvard Dataverse, doi:10.7910/DVN/XDB74W (https://doi.org/10.7910%2FDVN%2FXDB74W)
225. "IrishTimes – the Waxy-Wany News" (https://www.kaggle.com/therohk/ireland-historical-news).
226. "News Headlines Dataset For Sarcasm Detection" (https://kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection). kaggle.com. Retrieved 27 April 2019.
227. Klimt, Bryan, and Yiming Yang. "Introducing the Enron Corpus (https://bklimt.com/papers/2004_klimt_ceas.pdf)." CEAS. 2004.
228. Kossinets, Gueorgi; Kleinberg, Jon; Watts, Duncan (2008). "The Structure of Information Pathways in a Social Communication Network". arXiv:0806.3201 (https://arxiv.org/abs/0806.3201) [physics.soc-ph (https://arxiv.org/archive/physics.soc-ph)].
229. Androutsopoulos, Ion; Koutsias, John; Chandrinos, Konstantinos V.; Paliouras, George; Spyropoulos, Constantine D. (2000). "An evaluation of Naive Bayesian anti-spam filtering". In Potamias, G.; Moustakis, V.; van Someren, M. (eds.). Proceedings of the Workshop on Machine Learning in the New Information Age. 11th European Conference on Machine Learning, Barcelona, Spain. Vol. 11. pp. 9–17. arXiv:cs/0006013 (https://arxiv.org/abs/cs/0006013). Bibcode:2000cs........6013A (https://ui.adsabs.harvard.edu/abs/2000cs........6013A).
230. Bratko, Andrej; et al. (2006). "Spam filtering using statistical data compression models" (http://www.jmlr.org/papers/volume7/bratko06a/bratko06a.pdf) (PDF). The Journal of Machine Learning Research. 7: 2673–2698.
231. Almeida, Tiago A., José María G. Hidalgo, and Akebo Yamakami. "Contributions to the study of SMS spam filtering: new collection and results (http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/doceng11.pdf)." Proceedings of the 11th ACM symposium on Document engineering. ACM, 2011.
232. Delany; Jane, Sarah; Buckley, Mark; Greene, Derek (2012). "SMS spam filtering: methods and data" (https://arrow.dit.ie/cgi/viewcontent.cgi?article=1022&context=scschcomart). Expert Systems with Applications. 39 (10): 9899–9908. doi:10.1016/j.eswa.2012.02.053 (https://doi.org/10.1016%2Fj.eswa.2012.02.053). S2CID 15546924 (https://api.semanticscholar.org/CorpusID:15546924).
233. Joachims, Thorsten. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization (https://apps.dtic.mil/dtic/tr/fulltext/u2/a307731.pdf). No. CMU-CS-96-118. Carnegie-mellon univ pittsburgh pa dept of computer science, 1996.
234. Dimitrakakis, Christos, and Samy Bengio. Online Policy Adaptation for Ensemble Algorithms (https://infoscience.epfl.ch/record/82788/files/rr02-28.pdf). No. EPFL-REPORT-82788. IDIAP, 2002.
235. Dooms, S. et al. "Movietweetings: a movie rating dataset collected from twitter, 2013. Available from https://github.com/sidooms/MovieTweetings."
236. RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2017). "Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval". arXiv:1703.06618 (https://arxiv.org/abs/1703.06618) [cs.CV (https://arxiv.org/archive/cs.CV)].
237. "huyt16/Twitter100k" (https://github.com/huyt16/Twitter100k). GitHub. Retrieved 26 March 2018.
238. Go, Alec; Bhayani, Richa; Huang, Lei (2009). "Twitter sentiment 256. Sordoni, Alessandro; Galley, Michel; Auli, Michael; Brockett, Chris;
classification using distant supervision". CS224N Project Report, Ji, Yangfeng; Mitchell, Margaret; Nie, Jian-Yun; Gao, Jianfeng;
Stanford. 1: 12. Dolan, Bill (2015). "A Neural Network Approach to Context-
239. Chikersal, Prerna, Soujanya Poria, and Erik Cambria. "SeNTU: Sensitive Generation of Conversational Responses".
sentiment analysis of tweets by combining a rule-based classifier arXiv:1506.06714 (https://arxiv.org/abs/1506.06714) [cs.CL (https://a
with supervised learning (https://www.aclweb.org/anthology/S15-21 rxiv.org/archive/cs.CL)].
08)." Proceedings of the International Workshop on Semantic 257. Shaoul, C. & Westbury C. (2013) A reduced redundancy USENET
Evaluation, SemEval. 2015. corpus (2005–2011) Edmonton, AB: University of Alberta
240. Zafarani, Reza, and Huan Liu. "Social computing data repository at (downloaded from
ASU." School of Computing, Informatics and Decision Systems http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.d
Engineering, Arizona State University (2009). 258. KAN, M. (2011, January). NUS Short Message Service (SMS)
241. Bisgin, Halil, Nitin Agarwal, and Xiaowei Xu. "Investigating Corpus. Retrieved from
homophily in online social networks (http://www.academia.edu/dow http://www.comp.nus.edu.sg/entrepreneurship/innovation/osr/corpus/
nload/3746109/4191a533.pdf)." Web Intelligence and Intelligent Archived (https://web.archive.org/web/20180629055042/http://www.
Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International comp.nus.edu.sg/entrepreneurship/innovation/osr/corpus/) 29 June
Conference on. Vol. 1. IEEE, 2010. 2018 at the Wayback Machine
242. McAuley, Julian J.; Leskovec, Jure. "Learning to Discover Social 259. Stuck_In_the_Matrix. (2015, July 3). I have every publicly available
Circles in Ego Networks". NIPS. 2012: 2012. Reddit comment for research. ~ 1.7 billion comments @ 250 GB
compressed. Any interest in this? [Original post]. Message posted to
243. Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko (2014). "Network-based
statistical comparison of citation topology of bibliographic https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_pu
databases" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC417829 260. Lowe, Ryan; Pow, Nissan; Serban, Iulian; Pineau, Joelle (2015).
2). Scientific Reports. 4 (6496): 6496. arXiv:1502.05061 (https://arxi "The Ubuntu Dialogue Corpus: A Large Dataset for Research in
v.org/abs/1502.05061). Bibcode:2014NatSR...4E6496S (https://ui.a Unstructured Multi-Turn Dialogue Systems". arXiv:1506.08909 (http
dsabs.harvard.edu/abs/2014NatSR...4E6496S). s://arxiv.org/abs/1506.08909) [cs.CL (https://arxiv.org/archive/cs.C
doi:10.1038/srep06496 (https://doi.org/10.1038%2Fsrep06496). L)].
PMC 4178292 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178 261. Jason Williams, Antoine Raux, Matthew Henderson, "The Dialog State Tracking Challenge Series: A Review (https://ww
292). PMID 25263231 (https://pubmed.ncbi.nlm.nih.gov/25263231). w.microsoft.com/en-us/research/publication/the-dialog-state-trackin
244. Abdulla, N., et al. "Arabic sentiment analysis: Corpus-based and g-challenge-series-a-review/)", Dialogue & Discourse, April 2016.
lexicon-based." Proceedings of the IEEE conference on Applied 262. Hoppe, Travis (16 December 2021), The-Pile-FreeLaw (https://githu
Electrical Engineering and Computing Technologies (AEECT). b.com/thoppe/The-Pile-FreeLaw), retrieved 11 January 2023
2013. 263. Zheng, Lucia; Guha, Neel; Anderson, Brandon R.; Henderson,
245. Abooraig, Raddad, et al. "On the automatic categorization of Arabic Peter; Ho, Daniel E. (21 June 2021). "When does pretraining help?"
articles based on their political orientation (https://www.researchgat (https://dx.doi.org/10.1145/3462757.3466088). Proceedings of the
e.net/profile/Shadi_Alzubi/publication/324487844_Automatic_categ Eighteenth International Conference on Artificial Intelligence and
orization_of_Arabic_articles_based_on_their_political_orientation/li Law. New York, NY, USA: ACM: 159–168.
nks/5c1201c9299bf139c7549e1a/Automatic-categorization-of-Arabi doi:10.1145/3462757.3466088 (https://doi.org/10.1145%2F346275
c-articles-based-on-their-political-orientation.pdf)." Third 7.3466088). ISBN 9781450385268. S2CID 233296302 (https://api.s
International Conference on Informatics Engineering and emanticscholar.org/CorpusID:233296302).
Information Science (ICIEIS2014). 2014. 264. "pile-of-law/pile-of-law · Datasets at Hugging Face" (https://huggingf
246. Kawala, François, et al. "Prédictions d'activité dans les réseaux ace.co/datasets/pile-of-law/pile-of-law). huggingface.co. 4 July
sociaux en ligne (https://hal.archives-ouvertes.fr/hal-00881395/docu 2022. Retrieved 11 January 2023.
ment)." 4ième conférence sur les modèles et l'analyse des réseaux: 265. "About | Caselaw Access Project" (https://case.law/about/).
Approches mathématiques et informatiques. 2013. case.law. Retrieved 11 January 2023.
247. Sabharwal, Ashish; Samulowitz, Horst; Tesauro, Gerald (2015). 266. K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S.
"Selecting Near-Optimal Learners via Incremental Data Allocation". Gerber and L. E. Barnes, "HDLTex: Hierarchical Deep Learning for
arXiv:1601.00024 (https://arxiv.org/abs/1601.00024) [cs.LG (https://a Text Classification", 2017 16th IEEE International Conference on
rxiv.org/archive/cs.LG)]. Machine Learning and Applications (ICMLA), pp. 364–371. doi:
248. Xu et al. "SemEval-2015 Task 1: Paraphrase and Semantic 10.1109/ICMLA.2017.0-134 (https://doi.org/10.1109/ICMLA.2017.0-
Similarity in Twitter (PIT) (https://www.aclweb.org/anthology/S15-20 134)
01)" Proceedings of the 9th International Workshop on Semantic 267. K. Kowsari, D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S.
Evaluation. 2015. Gerber and L. E. Barnes, "Web of Science Dataset",
249. Xu et al. "Extracting Lexically Divergent Paraphrases from Twitter (h doi:10.17632/9rw3vkcfy4.6 (https://doi.org/10.17632%2F9rw3vkcfy
ttps://transacl.org/ojs/index.php/tacl/article/viewFile/498/64)" 4.6)
Transactions of the Association for Computational Linguistics (TACL). 2014. 250. Middleton, Stuart E; Middleton, Lee; Modafferi, Stefano (2014).
250. Middleton, Stuart E; Middleton, Lee; Modafferi, Stefano (2014). different summarization techniques for legal text (https://www.aclwe
"Real-Time Crisis Mapping of Natural Disasters Using Social b.org/anthology/W12-0515)." Proceedings of the Workshop on
Media" (https://eprints.soton.ac.uk/370581/1/ieee-is2014.pdf) Innovative Hybrid Approaches to the Processing of Textual Data.
(PDF). IEEE Intelligent Systems. 29 (2): 9–17. Association for Computational Linguistics, 2012.
doi:10.1109/MIS.2013.126 (https://doi.org/10.1109%2FMIS.2013.12 269. Nagwani, N. K. (2015). "Summarizing large text collection using
6). S2CID 15139204 (https://api.semanticscholar.org/CorpusID:151 topic modeling and clustering based on MapReduce framework" (ht
39204). tps://doi.org/10.1186%2Fs40537-015-0020-5). Journal of Big Data.
251. "geoparsepy" (https://pypi.org/project/geoparsepy). 2016. Python 2 (1): 1–18. doi:10.1186/s40537-015-0020-5 (https://doi.org/10.118
PyPI library 6%2Fs40537-015-0020-5).
252. Gupta, Aakash (5 December 2020). "Dutch social media collection" 270. Schler, Jonathan; et al. (2006). "Effects of Age and Gender on
(http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/ Blogging" (https://www.aaai.org/Papers/Symposia/Spring/2006/SS-
MTPTL7). doi:10.5072/FK2/MTPTL7 (https://doi.org/10.5072%2FF 06-03/SS06-03-039.pdf) (PDF). AAAI Spring Symposium:
K2%2FMTPTL7). Computational Approaches to Analyzing Weblogs. 6.
271. Anand, Pranav, et al. "Believe Me-We Can Do This! Annotating
253. "Streamlit" (https://huggingface.co/datasets/viewer/?dataset=dutch_ Persuasive Acts in Blog Text." Computational Models of Natural
social). huggingface.co. Retrieved 18 December 2020. Argument. 2011.
254. "Dutch Social media collection" (https://kaggle.com/skylord/dutch-t 272. Traud, Amanda L., Peter J. Mucha, and Mason A. Porter. "Social
weets). kaggle.com. Retrieved 18 December 2020. structure of Facebook networks." Physica A: Statistical Mechanics
255. Forsyth, E., Lin, J., & Martell, C. (2008, June 25). The NPS Chat and its Applications 391.16 (2012): 4165–4180.
Corpus. Retrieved from http://faculty.nps.edu/cmartell/NPSChat.htm
273. Richard, Emile; Savalle, Pierre-Andre; Vayatis, Nicolas (2012). 292. "DSL Corpus Collection" (http://ttg.uni-saarland.de/resources/DSLC
"Estimation of Simultaneously Sparse and Low Rank Matrices". C/). ttg.uni-saarland.de. Retrieved 22 September 2017.
arXiv:1206.6474 (https://arxiv.org/abs/1206.6474) [cs.DS (https://arx 293. "Urban Dictionary Words and Definitions" (https://www.kaggle.com/t
iv.org/archive/cs.DS)]. herohk/urban-dictionary-words-dataset).
274. Richardson, Matthew; Burges, Christopher JC; Renshaw, Erin 294. H. Elsahar, P. Vougiouklis, A. Remaci, C. Gravier, J. Hare, F.
(2013). "MCTest: A Challenge Dataset for the Open-Domain Laforest, E. Simperl, "T-REx: A Large Scale Alignment of Natural
Machine Comprehension of Text" (https://www.aclweb.org/antholog Language with Knowledge Base Triples (https://www.aclweb.org/an
y/D13-1020). EMNLP. 1. thology/L18-1544)", Proceedings of the Eleventh International
275. Weston, Jason; Bordes, Antoine; Chopra, Sumit; Rush, Alexander Conference on Language Resources and Evaluation (LREC-2018).
M.; Bart van Merriënboer; Joulin, Armand; Mikolov, Tomas (2015). 295. Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy,
"Towards AI-Complete Question Answering: A Set of Prerequisite Omer; Bowman, Samuel R. (2018). "GLUE: A Multi-Task
Toy Tasks". arXiv:1502.05698 (https://arxiv.org/abs/1502.05698) Benchmark and Analysis Platform for Natural Language
[cs.AI (https://arxiv.org/archive/cs.AI)]. Understanding". arXiv:1804.07461 (https://arxiv.org/abs/1804.0746
276. Marcus, Mitchell P.; Ann Marcinkiewicz, Mary; Santorini, Beatrice 1) [cs.CL (https://arxiv.org/archive/cs.CL)].
(1993). "Building a large annotated corpus of English: The Penn 296. "Computers Are Learning to Read—But They're Still Not So Smart"
Treebank" (http://repository.upenn.edu/cgi/viewcontent.cgi?article=1 (https://www.wired.com/story/computers-are-learning-to-read-but-th
246&context=cis_reports). Computational Linguistics. 19 (2): 313– eyre-still-not-so-smart/). Wired. Retrieved 29 December 2019.
330. 297. "GLUE Benchmark" (https://gluebenchmark.com/).
277. Collins, Michael (2003). "Head-driven statistical models for natural gluebenchmark.com. Retrieved 25 February 2019.
language parsing" (https://doi.org/10.1162%2F0891201033227533 298. Quan, Hoang Lam; Quang, Duy Le; Van Kiet, Nguyen; Ngan, Luu-
56). Computational Linguistics. 29 (4): 589–637. Thuy Nguyen. "UIT-ViIC: A Dataset for the First Evaluation on
doi:10.1162/089120103322753356 (https://doi.org/10.1162%2F089
Vietnamese Image Captioning" (https://www.springerprofessional.d
120103322753356).
e/uit-viic-a-dataset-for-the-first-evaluation-on-vietnamese-image-/18
278. Guyon, Isabelle, et al., eds. Feature extraction: foundations and 612672).
applications (https://books.google.com/books?id=FOTzBwAAQBAJ
299. To, Quoc Huy; Nguyen, Van Kiet; Nguyen, Luu Thuy Ngan; Nguyen,
&q=DEXTER). Vol. 207. Springer, 2008. Gia Tuan Anh (2020). "Gender Prediction Based on Vietnamese
279. Lin, Yuri, et al. "Syntactic annotations for the google books ngram Names with Machine Learning Techniques". Proceedings of the 4th
corpus (https://www.aclweb.org/anthology/P/P12/P12-3029.pdf)." International Conference on Natural Language Processing and
Proceedings of the ACL 2012 system demonstrations. Association Information Retrieval. pp. 55–60. arXiv:2010.10852 (https://arxiv.or
for Computational Linguistics, 2012. g/abs/2010.10852). doi:10.1145/3443279.3443309 (https://doi.org/1
280. Krishnamoorthy, Niveda; et al. (2013). "Generating Natural- 0.1145%2F3443279.3443309). ISBN 9781450377607.
Language Video Descriptions Using Text-Mined Knowledge" (http S2CID 224814110 (https://api.semanticscholar.org/CorpusID:22481
s://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/download/645 4110).
4/7204). AAAI. 1. 300. Nguyen, Luan Thanh; Van Nguyen, Kiet; Nguyen, Ngan Luu-Thuy
281. Luyckx, Kim, and Walter Daelemans. "Personae: a Corpus for (18 March 2021). "Constructive and Toxic Speech Detection for
Author and Personality Prediction from Text (http://www.academia.e Open-Domain Social Media Comments in Vietnamese". Advances
du/download/30766398/759.pdf)." LREC. 2008. and Trends in Artificial Intelligence. Artificial Intelligence Practices.
282. Solorio, Thamar, Ragib Hasan, and Mainul Mizan. "A case study of Lecture Notes in Computer Science. Vol. 12798. pp. 572–583.
sockpuppet detection in wikipedia (https://www.aclweb.org/antholog arXiv:2103.10069 (https://arxiv.org/abs/2103.10069).
y/W13-1107)." Workshop on Language Analysis in Social Media doi:10.1007/978-3-030-79457-6_49 (https://doi.org/10.1007%2F978
(LASM) at NAACL HLT. 2013. -3-030-79457-6_49). ISBN 978-3-030-79456-9. S2CID 232269671
(https://api.semanticscholar.org/CorpusID:232269671).
283. "Pushshift Files" (https://files.pushshift.io/). files.pushshift.io.
Retrieved 12 January 2023. 301. M. Versteegh, R. Thiollière, T. Schatz, X.-N. Cao, X. Anguera, A.
Jansen, and E. Dupoux (2015). "The Zero Resource Speech
284. Baumgartner, Jason; Zannettou, Savvas; Keegan, Brian; Squire,
Challenge 2015," in INTERSPEECH-2015.
Megan; Blackburn, Jeremy (23 January 2020). "The Pushshift
Reddit Dataset". arXiv:2001.08435 (https://arxiv.org/abs/2001.0843 302. M. Versteegh, X. Anguera, A. Jansen, and E. Dupoux, (2016). "The
5) [cs.SI (https://arxiv.org/archive/cs.SI)]. Zero Resource Speech Challenge 2015: Proposed Approaches
285. Ciarelli, Patrick Marques, and Elias Oliveira. "Agglomeration and and Results (https://core.ac.uk/download/pdf/82574050.pdf)," in
elimination of terms for dimensionality reduction (https://ieeexplore.i SLTU-2016.
eee.org/abstract/document/5364970/)." Intelligent Systems Design 303. Sakar, Betul Erdogdu; et al. (2013). "Collection and analysis of a
and Applications, 2009. ISDA'09. Ninth International Conference Parkinson speech dataset with multiple types of sound recordings".
on. IEEE, 2009. IEEE Journal of Biomedical and Health Informatics. 17 (4): 828–
286. Zhou, Mingyuan, Oscar Hernan Madrid Padilla, and James G. 834. doi:10.1109/jbhi.2013.2245674 (https://doi.org/10.1109%2Fjbh
Scott. "Priors for random count matrices derived from a family of i.2013.2245674). PMID 25055311 (https://pubmed.ncbi.nlm.nih.gov/
25055311). S2CID 15491516 (https://api.semanticscholar.org/Corp
negative binomial processes." Journal of the American Statistical
usID:15491516).
Association just-accepted (2015): 00–00.
304. Zhao, Shunan, et al. "Automatic detection of expressed emotion in
287. Kotzias, Dimitrios, et al. "From group to individual labels using deep
Parkinson's disease (https://www.researchgate.net/profile/Steven_L
features (http://datalab.ics.uci.edu/papers/kdd2015_dimitris.pdf)."
Proceedings of the 21th ACM SIGKDD International Conference on ivingstone2/publication/267623907_Automatic_detection_of_expre
Knowledge Discovery and Data Mining. ACM, 2015. ssed_emotion_in_Parkinson%27s_Disease/links/5453af1d0cf26d5
090a54cfe/Automatic-detection-of-expressed-emotion-in-Parkinson
288. Ning, Yue; Muthiah, Sathappan; Rangwala, Huzefa; Ramakrishnan, s-Disease.pdf)." Acoustics, Speech and Signal Processing
Naren (2016). "Modeling Precursors for Event Forecasting via (ICASSP), 2014 IEEE International Conference on. IEEE, 2014.
Nested Multi-Instance Learning". arXiv:1602.08033 (https://arxiv.or
g/abs/1602.08033) [cs.SI (https://arxiv.org/archive/cs.SI)]. 305. Used in: Hammami, Nacereddine, and Mouldi Bedda. "Improved
tree model for Arabic speech recognition." Computer Science and
289. Buza, Krisztian. "Feedback prediction for blogs (http://www.cs.bme. Information Technology (ICCSIT), 2010 3rd IEEE International
hu/~buza/pdfs/gfkl2012_blogs.pdf)." Data analysis, machine Conference on. Vol. 5. IEEE, 2010.
learning and knowledge discovery. Springer International
306. Maaten, Laurens. "Learning discriminative fisher kernels (https://lvd
Publishing, 2014. 145–152.
maaten.github.io/publications/papers/ICML_2011.pdf)."
290. Soysal, Ömer M (2015). "Association rule mining with mostly Proceedings of the 28th International Conference on Machine
associated sequential patterns". Expert Systems with Applications. Learning (ICML-11). 2011.
42 (5): 2582–2592. doi:10.1016/j.eswa.2014.10.049 (https://doi.org/
307. Cole, Ronald, and Mark Fanty. "Spoken letter recognition (https://w
10.1016%2Fj.eswa.2014.10.049).
ww.aclweb.org/anthology/H90-1075)." Proc. Third DARPA Speech
291. Bowman, Samuel R.; Angeli, Gabor; Potts, Christopher; Manning, and Natural Language Workshop. 1990.
Christopher D. (2015). "A large annotated corpus for learning
natural language inference". arXiv:1508.05326 (https://arxiv.org/ab
s/1508.05326) [cs.CL (https://arxiv.org/archive/cs.CL)].
308. Chapelle, Olivier; Sindhwani, Vikas; Keerthi, Sathiya S. (2008). 324. Esposito, Roberto; Radicioni, Daniele P. (2009). "Carpediem:
"Optimization techniques for semi-supervised support vector Optimizing the viterbi algorithm and applications to supervised
machines" (http://www.jmlr.org/papers/volume9/chapelle08a/chapel sequential learning" (http://www.jmlr.org/papers/volume10/esposito
le08a.pdf) (PDF). The Journal of Machine Learning Research. 9: 09a/esposito09a.pdf) (PDF). The Journal of Machine Learning
203–233. Research. 10: 1851–1880.
309. Kudo, Mineichi; Toyama, Jun; Shimbo, Masaru (1999). 325. Sourati, Jamshid; et al. (2016). "Classification Active Learning
"Multidimensional curve classification using passing-through Based on Mutual Information" (https://doi.org/10.3390%2Fe180200
regions". Pattern Recognition Letters. 20 (11): 1103–1111. 51). Entropy. 18 (2): 51. Bibcode:2016Entrp..18...51S (https://ui.ads
Bibcode:1999PaReL..20.1103K (https://ui.adsabs.harvard.edu/abs/ abs.harvard.edu/abs/2016Entrp..18...51S). doi:10.3390/e18020051
1999PaReL..20.1103K). CiteSeerX 10.1.1.46.2515 (https://citeseer (https://doi.org/10.3390%2Fe18020051).
x.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2515). 326. Salamon, Justin; Jacoby, Christopher; Bello, Juan Pablo. "A dataset
doi:10.1016/s0167-8655(99)00077-x (https://doi.org/10.1016%2Fs0 and taxonomy for urban sound research (https://www.researchgate.
167-8655%2899%2900077-x). net/profile/Justin_Salamon/publication/267269056_A_Dataset_and
310. Jaeger, Herbert; et al. (2007). "Optimization and applications of _Taxonomy_for_Urban_Sound_Research/links/544936af0cf2f6388
echo state networks with leaky-integrator neurons". Neural 0810a84/A-Dataset-and-Taxonomy-for-Urban-Sound-Research.pd
Networks. 20 (3): 335–352. doi:10.1016/j.neunet.2007.04.016 (http f)." Proceedings of the ACM International Conference on
s://doi.org/10.1016%2Fj.neunet.2007.04.016). PMID 17517495 (http Multimedia. ACM, 2014.
s://pubmed.ncbi.nlm.nih.gov/17517495). 327. Lagrange, Mathieu; Lafay, Grégoire; Rossignol, Mathias; Benetos,
311. Tsanas, Athanasios; et al. (2010). "Accurate telemonitoring of Emmanouil; Roebel, Axel (2015). "An evaluation framework for
Parkinson's disease progression by noninvasive speech tests" (htt event detection using a morphological model of acoustic scenes".
p://precedings.nature.com/documents/3920/version/1). IEEE arXiv:1502.00141 (https://arxiv.org/abs/1502.00141) [stat.ML (https://
Transactions on Biomedical Engineering (Submitted manuscript). arxiv.org/archive/stat.ML)].
57 (4): 884–893. doi:10.1109/tbme.2009.2036000 (https://doi.org/1 328. Gemmeke, Jort F., et al. "Audio Set: An ontology and human-
0.1109%2Ftbme.2009.2036000). PMID 19932995 (https://pubmed. labeled dataset for audio events." IEEE International Conference on
ncbi.nlm.nih.gov/19932995). S2CID 7382779 (https://api.semantics Acoustics, Speech, and Signal Processing (ICASSP). 2017.
cholar.org/CorpusID:7382779).
329. "Watch out, birders: Artificial intelligence has learned to spot birds
312. Clifford, Gari D.; Clifton, David (2012). "Wireless technology in from their songs" (https://www.science.org/content/article/watch-out-
disease management and medicine". Annual Review of Medicine. birders-artificial-intelligence-has-learned-spot-birds-their-songs).
63: 479–492. doi:10.1146/annurev-med-051210-114650 (https://doi. Science | AAAS. 18 July 2018. Retrieved 22 July 2018.
org/10.1146%2Fannurev-med-051210-114650). PMID 22053737 (h
330. "Bird Audio Detection challenge" (http://machine-listening.eecs.qmu
ttps://pubmed.ncbi.nlm.nih.gov/22053737). l.ac.uk/bird-audio-detection-challenge/). Machine Listening Lab at
313. Zue, Victor; Seneff, Stephanie; Glass, James (1990). "Speech Queen Mary University. 3 May 2016. Retrieved 22 July 2018.
database development at MIT: TIMIT and beyond". Speech 331. Wichern, Gordon; Antognini, Joe; Flynn, Michael; Licheng Richard
Communication. 9 (4): 351–356. doi:10.1016/0167-6393(90)90010- Zhu; McQuinn, Emmett; Crow, Dwight; Manilow, Ethan; Jonathan Le
7 (https://doi.org/10.1016%2F0167-6393%2890%2990010-7).
Roux (2019). "WHAM!: Extending Speech Separation to Noisy
314. Kapadia, Sadik, Valtcho Valtchev, and S. J. Young. "MMI training for Environments". arXiv:1907.01160 (https://arxiv.org/abs/1907.01160)
continuous phoneme recognition on the TIMIT database." [cs.SD (https://arxiv.org/archive/cs.SD)].
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93.,
332. Drossos, K., Lipping, S., and Virtanen, T. "Clotho: An Audio
1993 IEEE International Conference on. Vol. 2. IEEE, 1993. Captioning Dataset" IEEE International Conference on Acoustics,
315. Halabi, Nawar (2016). Modern Standard Arabic Phonetics for Speech, and Signal Processing (ICASSP). 2020.
Speech Synthesis (http://en.arabicspeechcorpus.com/Nawar%20H 333. Drossos, K., Lipping, S., and Virtanen, T. (2019). Clotho dataset
alabi%20PhD%20Thesis%20Revised.pdf) (PDF) (PhD Thesis). (Version 1.0) [Data set]. Zenodo.
University of Southampton, School of Electronics and Computer
http://doi.org/10.5281/zenodo.3490684 (https://doi.org/10.5281/zeno
Science. do.3490684)
316. Ardila, Rosana; Branson, Megan; Davis, Kelly; Henretty, Michael;
334. The CAIDA UCSD Dataset on the Witty Worm – 19–24 March 2004,
Kohler, Michael; Meyer, Josh; Morais, Reuben; Saunders, Lindsay;
http://www.caida.org/data/passive/witty_worm_dataset.xml
Tyers, Francis M.; Weber, Gregor (13 December 2019). "Common
Voice: A Massively-Multilingual Speech Corpus". 335. Chen, Zesheng, and Chuanyi Ji. "Optimal worm-scanning method
arXiv:1912.06670v2 (https://arxiv.org/abs/1912.06670v2) [cs.CL (htt using vulnerable-host distributions (https://web.archive.org/web/201
ps://arxiv.org/archive/cs.CL)]. 90806022753/https://pdfs.semanticscholar.org/672e/7be9499fef9a7
ff6b131b650a4de7614aae8.pdf)." International Journal of Security
317. "The LJ Speech Dataset" (https://keithito.com/LJ-Speech-Dataset).
and Networks 2.1–2 (2007): 71–80.
keithito.com. Retrieved 13 April 2022.
336. Kachuee, Mohamad, et al. "Cuff-less high-accuracy calibration-free
318. Zhou, Fang, Q. Claire, and Ross D. King. "Predicting the
blood pressure estimation using pulse transit time (http://download.
geographical origin of music (https://ieeexplore.ieee.org/abstract/do
xuebalib.com/533elteIDEwk.pdf)." Circuits and Systems (ISCAS),
cument/7023456/)." Data Mining (ICDM), 2014 IEEE International 2015 IEEE International Symposium on. IEEE, 2015.
Conference on. IEEE, 2014.
337. PhysioBank, PhysioToolkit. "PhysioNet: components of a new
319. Saccenti, Edoardo; Camacho, José (2015). "On the use of the research resource for complex physiologic signals." Circulation.
observation‐wise k‐fold operation in PCA cross‐validation". Journal
v101 i23. e215-e220.
of Chemometrics. 29 (8): 467–478. doi:10.1002/cem.2726 (https://d
oi.org/10.1002%2Fcem.2726). hdl:10481/55302 (https://hdl.handle. 338. Vergara, Alexander; et al. (2012). "Chemical gas sensor drift
net/10481%2F55302). S2CID 62248957 (https://api.semanticschola compensation using classifier ensembles". Sensors and Actuators
r.org/CorpusID:62248957). B: Chemical. 166: 320–329. doi:10.1016/j.snb.2012.01.074 (https://
doi.org/10.1016%2Fj.snb.2012.01.074).
320. Bertin-Mahieux, Thierry, et al. "The million song dataset." ISMIR
2011: Proceedings of the 12th International Society for Music 339. Korotcenkov, G.; Cho, B. K. (2014). "Engineering approaches to
Information Retrieval Conference, 24–28 October 2011, Miami, improvement of conductometric gas sensor parameters. Part 2:
Florida. University of Miami, 2011. Decrease of dissipated (consumable) power and improvement
stability and reliability". Sensors and Actuators B: Chemical. 198:
321. Henaff, Mikael; et al. (2011). "Unsupervised learning of sparse
316–341. doi:10.1016/j.snb.2014.03.069 (https://doi.org/10.1016%2
features for scalable audio classification" (https://archives.ismir.net/i Fj.snb.2014.03.069).
smir2011/paper/000128.pdf) (PDF). ISMIR. 11.
340. Quinlan, John R (1992). "Learning with continuous classes" (https://
322. Rafii, Zafar (2017). "Music". MUSDB18 – a corpus for music
sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Quinlan-AI.pdf)
separation. doi:10.5281/zenodo.1117372 (https://doi.org/10.5281% (PDF). 5th Australian Joint Conference on Artificial Intelligence. 92.
2Fzenodo.1117372).
341. Merz, Christopher J.; Pazzani, Michael J. (1999). "A principal
323. Defferrard, Michaël; Benzi, Kirell; Vandergheynst, Pierre; Bresson, components approach to combining regression estimates" (https://d
Xavier (6 December 2016). "FMA: A Dataset For Music Analysis". oi.org/10.1023%2Fa%3A1007507221352). Machine Learning. 36
arXiv:1612.01840 (https://arxiv.org/abs/1612.01840) [cs.SD (https://
(1–2): 9–32. doi:10.1023/a:1007507221352 (https://doi.org/10.102
arxiv.org/archive/cs.SD)]. 3%2Fa%3A1007507221352).
342. Torres-Sospedra, Joaquin, et al. "UJIIndoorLoc-Mag: A new 353. Nathan, Ran; et al. (2012). "Using tri-axial acceleration data to
database for magnetic field-based localization problems." Indoor identify behavioral modes of free-ranging animals: general
Positioning and Indoor Navigation (IPIN), 2015 International concepts and tools illustrated for griffon vultures" (https://www.ncbi.
Conference on. IEEE, 2015. nlm.nih.gov/pmc/articles/PMC3284320). The Journal of
343. Berkvens, Rafael, Maarten Weyn, and Herbert Peremans. "Mean Experimental Biology. 215 (6): 986–996. doi:10.1242/jeb.058602 (h
Mutual Information of Probabilistic Wi-Fi Localization (https://www.r ttps://doi.org/10.1242%2Fjeb.058602). PMC 3284320 (https://www.
esearchgate.net/profile/Raf_Berkvens/publication/284154212_Mea ncbi.nlm.nih.gov/pmc/articles/PMC3284320). PMID 22357592 (http
n_Mutual_Information_of_Probabilistic_Wi-Fi_Localization/links/56 s://pubmed.ncbi.nlm.nih.gov/22357592).
4c6b7508aeab8ed5e92fcb.pdf)." Indoor Positioning and Indoor 354. Anguita, Davide, et al. "Human activity recognition on smartphones
Navigation (IPIN), 2015 International Conference on. Banff, using a multiclass hardware-friendly support vector machine (http
Canada: IPIN. 2015. s://upcommons.upc.edu/bitstream/handle/2117/101769/IWAAL201
344. Paschke, Fabian, et al. "Sensorlose Zustandsüberwachung an 2.pdf)." Ambient assisted living and home care. Springer Berlin
Synchronmotoren." Proceedings. 23. Workshop Computational Heidelberg, 2012. 216–223.
Intelligence, Dortmund, 5.-6. Dezember 2013. KIT Scientific 355. Su, Xing; Tong, Hanghang; Ji, Ping (2014). "Activity recognition
Publishing, 2013. with smartphone sensors". Tsinghua Science and Technology. 19
345. Lessmeier, Christian, et al. "Data Acquisition and Signal Analysis (3): 235–249. doi:10.1109/tst.2014.6838194 (https://doi.org/10.110
from Measured Motor Currents for Defect Detection in 9%2Ftst.2014.6838194). S2CID 62751498 (https://api.semanticsch
Electromechanical Drive Systems (https://www.researchgate.net/pr olar.org/CorpusID:62751498).
ofile/Olaf_Enge-Rosenblatt/publication/264441239_Data_Acquisiti 356. Kadous, Mohammed Waleed. Temporal classification: Extending
on_and_Signal_Analysis_from_Measured_Motor_Currents_for_De the classification paradigm to multivariate time series (https://pdfs.s
fect_Detection_in_Electromechanical_Drive_Systems/links/53df97 emanticscholar.org/4bad/c3f0ad169ed9ec7d073375e9b168fa9f6c8
e90cf2a768e49bb3b9.pdf)." f.pdf). Diss. The University of New South Wales, 2002.
346. Ugulino, Wallace, et al. "Wearable computing: Accelerometers’ data 357. Graves, Alex, et al. "Connectionist temporal classification: labelling
classification of body postures and movements (http://groupware.se unsegmented sequence data with recurrent neural networks (https://
condlab.inf.puc-rio.br/public/papers/2012.Ugulino.WearableComput mediatum.ub.tum.de/doc/1292048/file.pdf)." Proceedings of the
ing.HAR.Classifier.RIBBON.pdf) Archived (https://web.archive.org/w 23rd international conference on Machine learning. ACM, 2006.
eb/20200925222906/http://groupware.secondlab.inf.puc-rio.br/publi 358. Velloso, Eduardo, et al. "Qualitative activity recognition of weight
c/papers/2012.Ugulino.WearableComputing.HAR.Classifier.RIBBO lifting exercises (https://www.perceptualui.org/publications/velloso1
N.pdf) 25 September 2020 at the Wayback Machine." Advances in 3_ah.pdf)." Proceedings of the 4th Augmented Human International
Artificial Intelligence-SBIA 2012. Springer Berlin Heidelberg, 2012. Conference. ACM, 2013.
52–61.
359. Mortazavi, Bobak Jack, et al. "Determining the single best axis for
347. Schneider, Jan; et al. (2015). "Augmenting the senses: a review on exercise repetition recognition and counting on smartwatches (htt
sensor-based learning support" (https://www.ncbi.nlm.nih.gov/pmc/ p://www.thehabitslab.com/assets/papers/28.pdf) Archived (https://w
articles/PMC4367401). Sensors. 15 (2): 4097–4133. eb.archive.org/web/20211104043511/https://www.thehabitslab.co
Bibcode:2015Senso..15.4097S (https://ui.adsabs.harvard.edu/abs/2 m/assets/papers/28.pdf) 4 November 2021 at the Wayback
015Senso..15.4097S). doi:10.3390/s150204097 (https://doi.org/10. Machine." Wearable and Implantable Body Sensor Networks
3390%2Fs150204097). PMC 4367401 (https://www.ncbi.nlm.nih.go (BSN), 2014 11th International Conference on. IEEE, 2014.
v/pmc/articles/PMC4367401). PMID 25679313 (https://pubmed.ncb
360. Sapsanis, Christos, et al. "Improving EMG based Classification of
i.nlm.nih.gov/25679313).
basic hand movements using EMD (https://www.researchgate.net/pr
348. Madeo, Renata CB, Clodoaldo AM Lima, and Sarajane M. Peres. ofile/Christos_Sapsanis/publication/257602303_Improving_EMG_b
"Gesture unit segmentation using support vector machines: ased_classification_of_basic_hand_movements_using_EMD/links/
segmenting gestures from rest positions (https://tarjomefa.com/wp-c 56dfb7fd08ae979addef64a2/Improving-EMG-based-classification-o
ontent/uploads/2016/11/5781-English.pdf)." Proceedings of the f-basic-hand-movements-using-EMD.pdf)." Engineering in Medicine
28th Annual ACM Symposium on Applied Computing. ACM, 2013. and Biology Society (EMBC), 2013 35th Annual International
349. Lun, Roanna; Zhao, Wenbing (2015). "A survey of applications and Conference of the IEEE. IEEE, 2013.
human motion recognition with Microsoft Kinect" (https://engagedsc 361. Andrianesis, Konstantinos; Tzes, Anthony (2015). "Development
holarship.csuohio.edu/cgi/viewcontent.cgi?article=1417&context=e and control of a multifunctional prosthetic hand with shape memory
nece_facpub). International Journal of Pattern Recognition and alloy actuators". Journal of Intelligent & Robotic Systems. 78 (2):
Artificial Intelligence. 29 (5): 1555008. 257–289. doi:10.1007/s10846-014-0061-6 (https://doi.org/10.100
doi:10.1142/s0218001415550083 (https://doi.org/10.1142%2Fs021 7%2Fs10846-014-0061-6). S2CID 207174078 (https://api.semantic
8001415550083). scholar.org/CorpusID:207174078).
350. Theodoridis, Theodoros, and Huosheng Hu. "Action classification 362. Banos, Oresti; et al. (2014). "Dealing with the effects of sensor
of 3d human models using dynamic ANNs for mobile robot displacement in wearable activity recognition" (https://www.ncbi.nl
surveillance (https://cswww.sx.ac.uk/staff/hhu/Papers/ROBIO07-66. m.nih.gov/pmc/articles/PMC4118358). Sensors. 14 (6): 9995–
pdf) Archived (https://web.archive.org/web/20190806015015/https:// 10023. Bibcode:2014Senso..14.9995B (https://ui.adsabs.harvard.e
cswww.sx.ac.uk/staff/hhu/Papers/ROBIO07-66.pdf) 6 August 2019 du/abs/2014Senso..14.9995B). doi:10.3390/s140609995 (https://do
at the Wayback Machine." Robotics and Biomimetics, 2007. ROBIO i.org/10.3390%2Fs140609995). PMC 4118358 (https://www.ncbi.nl
2007. IEEE International Conference on. IEEE, 2007. m.nih.gov/pmc/articles/PMC4118358). PMID 24915181 (https://pub
351. Etemad, Seyed Ali, and Ali Arya. "3D human action recognition and med.ncbi.nlm.nih.gov/24915181).
style transformation using resilient backpropagation neural 363. Stisen, Allan, et al. "Smart Devices are Different: Assessing and
networks." Intelligent Computing and Intelligent Systems, 2009. Mitigating Mobile Sensing Heterogeneities for Activity Recognition
ICIS 2009. IEEE International Conference on. Vol. 4. IEEE, 2009. (h (https://www.researchgate.net/profile/Henrik_Blunck/publication/30
ttps://ieeexplore.ieee.org/abstract/document/5357690/) 1464144_Smart_Devices_are_Different_Assessing_and_Mitigatin
352. Altun, Kerem; Barshan, Billur; Tunçel, Orkun (2010). "Comparative gMobile_Sensing_Heterogeneities_for_Activity_Recognition/links/
study on classifying human activities with miniature inertial and 585a4c4908ae3852d256f186.pdf)." Proceedings of the 13th ACM
magnetic sensors". Pattern Recognition. 43 (10): 3605–3620. Conference on Embedded Networked Sensor Systems. ACM,
Bibcode:2010PatRe..43.3605A (https://ui.adsabs.harvard.edu/abs/2 2015.
010PatRe..43.3605A). doi:10.1016/j.patcog.2010.04.019 (https://do 364. Bhattacharya, Sourav, and Nicholas D. Lane. "From Smart to Deep:
i.org/10.1016%2Fj.patcog.2010.04.019). hdl:11693/11947 (https://h Robust Activity Recognition on Smartwatches using Deep Learning
dl.handle.net/11693%2F11947). (http://discovery.ucl.ac.uk/1503672/1/deepwatch_wristsense.pdf)."
365. Bacciu, Davide; et al. (2014). "An experimental characterization of
reservoir computing in ambient assisted living applications". Neural
Computing and Applications. 24 (6): 1451–1464.
doi:10.1007/s00521-013-1364-4 (https://doi.org/10.1007%2Fs0052
1-013-1364-4). hdl:11568/237959 (https://hdl.handle.net/11568%2F
237959). S2CID 14124013 (https://api.semanticscholar.org/CorpusI
D:14124013).
366. Palumbo, Filippo; Barsocchi, Paolo; Gallicchio, Claudio; Chessa, 379. Kaya, Heysem, Pınar Tüfekci, and Fikret S. Gürgen. "Local and
Stefano; Micheli, Alessio (2013). "Multisensor Data Fusion for global learning methods for predicting power of a combined gas &
Activity Recognition Based on Reservoir Computing" (https://link.sp steam turbine." International conference on emerging trends in
ringer.com/chapter/10.1007/978-3-642-41043-7_3). Evaluating AAL computer and electronics engineering (ICETCEE'2012), Dubai.
Systems Through Competitive Benchmarking. Communications in 2012.
Computer and Information Science. Vol. 386. pp. 24–35. 380. Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2014).
doi:10.1007/978-3-642-41043-7_3 (https://doi.org/10.1007%2F978- "Searching for exotic particles in high-energy physics with deep
3-642-41043-7_3). ISBN 978-3-642-41042-0. learning". Nature Communications. 5: 4308. arXiv:1402.4735 (http
367. Reiss, Attila, and Didier Stricker. "Introducing a new benchmarked s://arxiv.org/abs/1402.4735). Bibcode:2014NatCo...5.4308B (https://
dataset for activity monitoring (https://www.researchgate.net/profile/ ui.adsabs.harvard.edu/abs/2014NatCo...5.4308B).
Attila_Reiss/publication/235348485_Introducing_a_New_Benchma doi:10.1038/ncomms5308 (https://doi.org/10.1038%2Fncomms530
rked_Dataset_for_Activity_Monitoring/links/00b7d5309d19ca43460 8). PMID 24986233 (https://pubmed.ncbi.nlm.nih.gov/24986233).
00000/Introducing-a-New-Benchmarked-Dataset-for-Activity-Monito S2CID 195953 (https://api.semanticscholar.org/CorpusID:195953).
ring.pdf)." Wearable Computers (ISWC), 2012 16th International 381. Baldi, Pierre; Sadowski, Peter; Whiteson, Daniel (2015).
Symposium on. IEEE, 2012. "Enhanced Higgs Boson to τ+ τ− Search with Deep Learning".
368. Roggen, Daniel, et al. "OPPORTUNITY: Towards opportunistic Physical Review Letters. 114 (11): 111801. arXiv:1410.3469 (http
activity and context recognition systems (https://infoscience.epfl.ch/r s://arxiv.org/abs/1410.3469). Bibcode:2015PhRvL.114k1801B (http
ecord/138648/files/RoggenFoCaHoFaTrLuPiBaKuFeHoRiChMi09. s://ui.adsabs.harvard.edu/abs/2015PhRvL.114k1801B).
pdf)." World of Wireless, Mobile and Multimedia Networks & doi:10.1103/physrevlett.114.111801 (https://doi.org/10.1103%2Fph
Workshops, 2009. WoWMoM 2009. IEEE International Symposium ysrevlett.114.111801). PMID 25839260 (https://pubmed.ncbi.nlm.ni
on a. IEEE, 2009. h.gov/25839260). S2CID 2339142 (https://api.semanticscholar.org/
369. Kurz, Marc, et al. "Dynamic quantification of activity recognition CorpusID:2339142).
capabilities in opportunistic systems (https://www.researchgate.net/ 382. Adam-Bourdarios, C.; Cowan, G.; Germain-Renaud, C.; Guyon, I.;
profile/Marc_Kurz/publication/220271166_Dynamic_Quantification Kégl, B.; Rousseau, D. (2015). "The Higgs Machine Learning
_of_Activity_Recognition_Capabilities_in_Opportunistic_Systems/li Challenge" (https://higgsml.lal.in2p3.fr/). Journal of Physics:
nks/09e4150f66b480c97a000000/Dynamic-Quantification-of-Activit Conference Series. 664 (7): 072015.
y-Recognition-Capabilities-in-Opportunistic-Systems.pdf)." Bibcode:2015JPhCS.664g2015A (https://ui.adsabs.harvard.edu/ab
Vehicular Technology Conference (VTC Spring), 2011 IEEE 73rd. s/2015JPhCS.664g2015A). doi:10.1088/1742-6596/664/7/072015
IEEE, 2011. (https://doi.org/10.1088%2F1742-6596%2F664%2F7%2F072015).
370. Sztyler, Timo, and Heiner Stuckenschmidt. "On-body localization of 383. Baldi, Pierre; Cranmer, Kyle; Faucett, Taylor; Sadowski, Peter;
wearable devices: an investigation of position-aware activity Whiteson, Daniel (2016). "Parameterized neural networks for high-
recognition (https://sensor.informatik.uni-mannheim.de/publications/ energy physics". The European Physical Journal C. 76 (5): 235.
presentation/percom2016.pdf)." Pervasive Computing and arXiv:1601.07913 (https://arxiv.org/abs/1601.07913).
Communications (PerCom), 2016 IEEE International Conference Bibcode:2016EPJC...76..235B (https://ui.adsabs.harvard.edu/abs/2
on. IEEE, 2016. 016EPJC...76..235B). doi:10.1140/epjc/s10052-016-4099-4 (https://
371. Zhi, Ying Xuan; Lukasik, Michelle; Li, Michael H.; Dolatabadi, doi.org/10.1140%2Fepjc%2Fs10052-016-4099-4).
Elham; Wang, Rosalie H.; Taati, Babak (2018). "Automatic S2CID 254108545 (https://api.semanticscholar.org/CorpusID:25410
Detection of Compensation During Robotic Stroke Rehabilitation 8545).
Therapy" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788403). 384. Ortigosa, I.; Lopez, R.; Garcia, J. "A neural networks approach to
IEEE Journal of Translational Engineering in Health and Medicine. residuary resistance of sailing yachts prediction". Proceedings of
6: 2100107. doi:10.1109/JTEHM.2017.2780836 (https://doi.org/10.1 the International Conference on Marine Engineering MARINE.
109%2FJTEHM.2017.2780836). ISSN 2168-2372 (https://www.worl 2007.
dcat.org/issn/2168-2372). PMC 5788403 (https://www.ncbi.nlm.nih. 385. Gerritsma, J., R. Onnink, and A. Versluis. Geometry, resistance and
gov/pmc/articles/PMC5788403). PMID 29404226 (https://pubmed.n stability of the delft systematic yacht hull series. Delft University of
cbi.nlm.nih.gov/29404226). Technology, 1981.
372. Dolatabadi, Elham; Zhi, Ying Xuan; Ye, Bing; Coahran, Marge; 386. Liu, Huan, and Hiroshi Motoda. Feature extraction, construction and
Lupinacci, Giorgia; Mihailidis, Alex; Wang, Rosalie; Taati, Babak selection: A data mining perspective (https://books.google.com/book
(23 May 2017). The toronto rehab stroke pose dataset to detect s?id=zi_0EdWW5fYC). Springer Science & Business Media, 1998.
compensation during stroke rehabilitation therapy. ACM. pp. 375–
387. Reich, Yoram. Converging to Ideal Design Knowledge by Learning.
381. doi:10.1145/3154862.3154925 (https://doi.org/10.1145%2F31 [Carnegie Mellon University], Engineering Design Research
54862.3154925). ISBN 9781450363631. S2CID 24581930 (https://
Center, 1989.
api.semanticscholar.org/CorpusID:24581930).
388. Todorovski, Ljupčo; Džeroski, Sašo (1999). "Experiments in Meta-
373. "Toronto Rehab Stroke Pose Dataset" (https://www.kaggle.com/der
level Learning with ILP" (https://link.springer.com/chapter/10.1007/9
ekdb/toronto-robot-stroke-posture-dataset).
78-3-540-48247-5_11). Principles of Data Mining and Knowledge
374. Jung, Merel M.; Poel, Mannes; Poppe, Ronald; Heylen, Dirk K. J. (1 Discovery. Lecture Notes in Computer Science. Vol. 1704. pp. 98–
March 2017). "Automatic recognition of touch gestures in the corpus 106. doi:10.1007/978-3-540-48247-5_11 (https://doi.org/10.1007%2
of social touch". Journal on Multimodal User Interfaces. 11 (1): 81– F978-3-540-48247-5_11). ISBN 978-3-540-66490-1.
96. doi:10.1007/s12193-016-0232-9 (https://doi.org/10.1007%2Fs1 S2CID 39382993 (https://api.semanticscholar.org/CorpusID:393829
2193-016-0232-9). ISSN 1783-8738 (https://www.worldcat.org/issn/ 93).
1783-8738). S2CID 1802116 (https://api.semanticscholar.org/Corpu
389. Wang, Yong. A new approach to fitting linear models in high
sID:1802116). dimensional spaces (http://www.cs.waikato.ac.nz/~ml/publications/2
375. Jung, M.M. (Merel) (1 June 2016). "Corpus of Social Touch (CoST)" 000/thesis.pdf). Diss. The University of Waikato, 2000.
(https://data.4tu.nl/articles/dataset/Corpus_of_Social_Touch_CoST 390. Kibler, Dennis; Aha, David W.; Albert, Marc K. (1989). "Instance‐
_/12696869). University of Twente. doi:10.4121/uuid:5ef62345-
based prediction of real‐valued attributes" (https://escholarship.org/
3b3e-479c-8e1d-c922748c9b29 (https://doi.org/10.4121%2Fuuid%
uc/item/68f860zb). Computational Intelligence. 5 (2): 51–57.
3A5ef62345-3b3e-479c-8e1d-c922748c9b29). doi:10.1111/j.1467-8640.1989.tb00315.x (https://doi.org/10.1111%2
376. Aeberhard, S., D. Coomans, and O. De Vel. "Comparison of Fj.1467-8640.1989.tb00315.x). S2CID 40800413 (https://api.seman
classifiers in high dimensional settings." Dept. Math. Statist., James ticscholar.org/CorpusID:40800413).
Cook Univ., North Queensland, Australia, Tech. Rep 92-02 (1992). 391. Palmer, Christopher R., and Christos Faloutsos. "Electricity based
377. Basu, Sugato. "Semi-supervised clustering with limited background external similarity of categorical attributes (http://citeseerx.ist.psu.ed
knowledge (http://www.aaai.org/Papers/AAAI/2004/AAAI04-138.pd u/viewdoc/download?doi=10.1.1.469.989&rep=rep1&type=pdf)."
f)." AAAI. 2004. Advances in Knowledge Discovery and Data Mining. Springer
378. Tüfekci, Pınar (2014). "Prediction of full load electrical power output Berlin Heidelberg, 2003. 486–500.
of a base load operated combined cycle power plant using machine
learning methods". International Journal of Electrical Power &
Energy Systems. 60: 126–140. doi:10.1016/j.ijepes.2014.02.027 (ht
tps://doi.org/10.1016%2Fj.ijepes.2014.02.027).
392. Tsanas, Athanasios; Xifara, Angeliki (2012). "Accurate quantitative 404. Sikora, Marek; Wróbel, Łukasz (2010). "Application of rule induction
estimation of energy performance of residential buildings using algorithms for analysis of data collected by seismic hazard
statistical machine learning tools". Energy and Buildings. 49: 560– monitoring systems in coal mines" (https://www.infona.pl/resource/b
567. doi:10.1016/j.enbuild.2012.03.003 (https://doi.org/10.1016%2F wmeta1.element.baztech-article-BPZ5-0008-0008). Archives of
j.enbuild.2012.03.003). Mining Sciences. 55 (1): 91–114.
393. De Wilde, Pieter (2014). "The gap between predicted and 405. Sikora, Marek, and Beata Sikora. "Rough natural hazards
measured energy performance of buildings: A framework for monitoring." Rough Sets: Selected Methods and Applications in
investigation". Automation in Construction. 41: 40–49. Management and Engineering. Springer London, 2012. 163–179.
doi:10.1016/j.autcon.2014.02.009 (https://doi.org/10.1016%2Fj.autc 406. Addor, Nans; Newman, Andrew J.; Mizukami, Naoki; Clark, Martyn
on.2014.02.009). P. (20 October 2017). "The CAMELS data set: catchment attributes
394. Brooks, Thomas F., D. Stuart Pope, and Michael A. Marcolini. Airfoil and meteorology for large-sample studies" (https://hess.copernicus.
self-noise and prediction (https://ntrs.nasa.gov/archive/nasa/casi.ntr org/articles/21/5293/2017/). Hydrology and Earth System Sciences.
s.nasa.gov/19890016302.pdf). Vol. 1218. National Aeronautics and 21 (10): 5293–5313. Bibcode:2017HESS...21.5293A (https://ui.adsa
Space Administration, Office of Management, Scientific and bs.harvard.edu/abs/2017HESS...21.5293A). doi:10.5194/hess-21-
Technical Information Division, 1989. 5293-2017 (https://doi.org/10.5194%2Fhess-21-5293-2017).
395. Draper, David. "Assessment and propagation of model uncertainty ISSN 1607-7938 (https://www.worldcat.org/issn/1607-7938).
(http://www2.denizyuret.com/ref/draper/assessment-and-propagatio 407. Newman, A. J.; Clark, M. P.; Sampson, K.; Wood, A.; Hay, L. E.;
n.pdf)." Journal of the Royal Statistical Society, Series B Bock, A.; Viger, R. J.; Blodgett, D.; Brekke, L.; Arnold, J. R.; Hopson,
(Methodological) (1995): 45–97. T. (14 January 2015). "Development of a large-sample watershed-
396. Lavine, Michael (1991). "Problems in extrapolation illustrated with scale hydrometeorological data set for the contiguous USA: data
space shuttle O-ring data". Journal of the American Statistical set characteristics and assessment of regional variability in
Association. 86 (416): 919–921. hydrologic model performance" (https://hess.copernicus.org/articles/
doi:10.1080/01621459.1991.10475132 (https://doi.org/10.1080%2F 19/209/2015/). Hydrology and Earth System Sciences. 19 (1): 209–
01621459.1991.10475132). 223. Bibcode:2015HESS...19..209N (https://ui.adsabs.harvard.edu/
397. Wang, Jun, Bei Yu, and Les Gasser. "Concept tree based clustering abs/2015HESS...19..209N). doi:10.5194/hess-19-209-2015 (https://
doi.org/10.5194%2Fhess-19-209-2015). ISSN 1607-7938 (https://w
visualization with shaded similarity matrices (https://www.researchg
ate.net/profile/Bei_Yu2/publication/228407462_Concept_Tree_Bas ww.worldcat.org/issn/1607-7938).
ed_Ordering_for_Shaded_Similarity_Matrix/links/00b7d5175607b6 408. Alvarez-Garreton, Camila; Mendoza, Pablo A.; Boisier, Juan Pablo;
1d2e000000.pdf)." Data Mining, 2002. ICDM 2003. Proceedings. Addor, Nans; Galleguillos, Mauricio; Zambrano-Bigiarini, Mauricio;
2002 IEEE International Conference on. IEEE, 2002. Lara, Antonio; Puelma, Cristóbal; Cortes, Gonzalo; Garreaud, Rene;
McPhee, James (13 November 2018). "The CAMELS-CL dataset:
398. Pettengill, Gordon H.; Ford, Peter G.; Johnson, William T. K.; Raney,
catchment attributes and meteorology for large sample studies –
R. Keith; Soderblom, Laurence A. (1991). "Magellan: Radar
Performance and Data Products" (https://www.science.org/doi/abs/1 Chile dataset" (https://hess.copernicus.org/articles/22/5817/2018/).
Hydrology and Earth System Sciences. 22 (11): 5817–5846.
0.1126/science.252.5003.260). Science. 252 (5003): 260–265.
Bibcode:2018HESS...22.5817A (https://ui.adsabs.harvard.edu/abs/
Bibcode:1991Sci...252..260P (https://ui.adsabs.harvard.edu/abs/19
91Sci...252..260P). doi:10.1126/science.252.5003.260 (https://doi.o 2018HESS...22.5817A). doi:10.5194/hess-22-5817-2018 (https://do
rg/10.1126%2Fscience.252.5003.260). PMID 17769272 (https://pub i.org/10.5194%2Fhess-22-5817-2018). ISSN 1607-7938 (https://ww
w.worldcat.org/issn/1607-7938). S2CID 133955609 (https://api.sem
med.ncbi.nlm.nih.gov/17769272). S2CID 43398343 (https://api.sem
anticscholar.org/CorpusID:43398343). anticscholar.org/CorpusID:133955609).
409. Chagas, Vinícius B. P.; Chaffe, Pedro L. B.; Addor, Nans; Fan,
399. Aharonian, F.; et al. (2008). "Energy spectrum of cosmic-ray
electrons at TeV energies". Physical Review Letters. 101 (26): Fernando M.; Fleischmann, Ayan S.; Paiva, Rodrigo C. D.;
261104. arXiv:0811.3894 (https://arxiv.org/abs/0811.3894). Siqueira, Vinícius A. (8 September 2020). "CAMELS-BR:
hydrometeorological time series and landscape attributes for 897
Bibcode:2008PhRvL.101z1104A (https://ui.adsabs.harvard.edu/ab
catchments in Brazil" (https://essd.copernicus.org/articles/12/2075/2
s/2008PhRvL.101z1104A). doi:10.1103/PhysRevLett.101.261104
(https://doi.org/10.1103%2FPhysRevLett.101.261104). 020/). Earth System Science Data. 12 (3): 2075–2096.
Bibcode:2020ESSD...12.2075C (https://ui.adsabs.harvard.edu/abs/
hdl:2440/51450 (https://hdl.handle.net/2440%2F51450).
2020ESSD...12.2075C). doi:10.5194/essd-12-2075-2020 (https://do
PMID 19437632 (https://pubmed.ncbi.nlm.nih.gov/19437632).
S2CID 41850528 (https://api.semanticscholar.org/CorpusID:418505 i.org/10.5194%2Fessd-12-2075-2020). ISSN 1866-3516 (https://ww
w.worldcat.org/issn/1866-3516). S2CID 234737197 (https://api.sem
28).
anticscholar.org/CorpusID:234737197).
400. Bock, R. K.; et al. (2004). "Methods for multidimensional event
classification: a case study using images from a Cherenkov 410. Coxon, Gemma; Addor, Nans; Bloomfield, John P.; Freer, Jim; Fry,
Matt; Hannaford, Jamie; Howden, Nicholas J. K.; Lane, Rosanna;
gamma-ray telescope". Nuclear Instruments and Methods in
Physics Research Section A: Accelerators, Spectrometers, Lewis, Melinda; Robinson, Emma L.; Wagener, Thorsten (12
Detectors and Associated Equipment. 516 (2): 511–528. October 2020). "CAMELS-GB: hydrometeorological time series and
landscape attributes for 671 catchments in Great Britain" (https://ess
Bibcode:2004NIMPA.516..511B (https://ui.adsabs.harvard.edu/abs/
2004NIMPA.516..511B). doi:10.1016/j.nima.2003.08.157 (https://do d.copernicus.org/articles/12/2459/2020/). Earth System Science
i.org/10.1016%2Fj.nima.2003.08.157). Data. 12 (4): 2459–2483. Bibcode:2020ESSD...12.2459C (https://ui.
adsabs.harvard.edu/abs/2020ESSD...12.2459C). doi:10.5194/essd-
401. Li, Jinyan; et al. (2004). "Deeps: A new instance-based lazy 12-2459-2020 (https://doi.org/10.5194%2Fessd-12-2459-2020).
discovery and classification system" (https://doi.org/10.1023%2Fb% ISSN 1866-3516 (https://www.worldcat.org/issn/1866-3516).
3Amach.0000011804.08528.7d). Machine Learning. 54 (2): 99– S2CID 226192657 (https://api.semanticscholar.org/CorpusID:22619
124. doi:10.1023/b:mach.0000011804.08528.7d (https://doi.org/10. 2657).
1023%2Fb%3Amach.0000011804.08528.7d).
411. Fowler, Keirnan J. A.; Acharya, Suwash Chandra; Addor, Nans;
402. Villaescusa-Navarro, Francisco; et al. (2022). "The CAMELS Chou, Chihchung; Peel, Murray C. (6 August 2021). "CAMELS-
Multifield Data Set: Learning the Universe's Fundamental AUS: hydrometeorological time series and landscape attributes for
Parameters with Artificial Intelligence". The Astrophysical Journal 222 catchments in Australia" (https://essd.copernicus.org/articles/1
Supplement Series. 259 (2): 61. arXiv:2109.10915 (https://arxiv.org/ 3/3847/2021/). Earth System Science Data. 13 (8): 3847–3867.
abs/2109.10915). Bibcode:2022ApJS..259...61V (https://ui.adsabs. Bibcode:2021ESSD...13.3847F (https://ui.adsabs.harvard.edu/abs/
harvard.edu/abs/2022ApJS..259...61V). doi:10.3847/1538- 2021ESSD...13.3847F). doi:10.5194/essd-13-3847-2021 (https://do
4365/ac5ab0 (https://doi.org/10.3847%2F1538-4365%2Fac5ab0). i.org/10.5194%2Fessd-13-3847-2021). ISSN 1866-3516 (https://ww
S2CID 237604997 (https://api.semanticscholar.org/CorpusID:23760 w.worldcat.org/issn/1866-3516). S2CID 238796784 (https://api.sem
4997). anticscholar.org/CorpusID:238796784).
403. Siebert, Lee, and Tom Simkin. "Volcanoes of the world: an
illustrated catalog of Holocene volcanoes and their eruptions."
(2014).
412. Klingler, Christoph; Schulz, Karsten; Herrnegger, Mathew (16 425. Donchin, Emanuel; Spencer, Kevin M.; Wijesinghe, Ranjith (2000).
September 2021). "LamaH-CE: LArge-SaMple DAta for Hydrology "The mental prosthesis: assessing the speed of a P300-based
and Environmental Sciences for Central Europe" (https://essd.coper brain-computer interface". IEEE Transactions on Rehabilitation
nicus.org/articles/13/4529/2021/). Earth System Science Data. 13 Engineering. 8 (2): 174–179. doi:10.1109/86.847808 (https://doi.org/
(9): 4529–4565. Bibcode:2021ESSD...13.4529K (https://ui.adsabs.h 10.1109%2F86.847808). PMID 10896179 (https://pubmed.ncbi.nlm.
arvard.edu/abs/2021ESSD...13.4529K). doi:10.5194/essd-13-4529- nih.gov/10896179).
2021 (https://doi.org/10.5194%2Fessd-13-4529-2021). ISSN 1866- 426. Detrano, Robert; et al. (1989). "International application of a new
3516 (https://www.worldcat.org/issn/1866-3516). S2CID 240533508 probability algorithm for the diagnosis of coronary artery disease".
(https://api.semanticscholar.org/CorpusID:240533508). The American Journal of Cardiology. 64 (5): 304–310.
413. Yeh, I–C (1998). "Modeling of strength of high-performance doi:10.1016/0002-9149(89)90524-9 (https://doi.org/10.1016%2F000
concrete using artificial neural networks". Cement and Concrete 2-9149%2889%2990524-9). PMID 2756873 (https://pubmed.ncbi.nl
Research. 28 (12): 1797–1808. doi:10.1016/s0008-8846(98)00165- m.nih.gov/2756873).
3 (https://doi.org/10.1016%2Fs0008-8846%2898%2900165-3). 427. Bradley, Andrew P (1997). "The use of the area under the ROC
414. Zarandi, MH Fazel; et al. (2008). "Fuzzy polynomial neural curve in the evaluation of machine learning algorithms" (http://espac
networks for approximation of the compressive strength of e.library.uq.edu.au/view/UQ:8925/pr-t.pdf) (PDF). Pattern
concrete". Applied Soft Computing. 8 (1): 488–498. Recognition. 30 (7): 1145–1159. Bibcode:1997PatRe..30.1145B (ht
Bibcode:2008ApSoC...8...79S (https://ui.adsabs.harvard.edu/abs/20 tps://ui.adsabs.harvard.edu/abs/1997PatRe..30.1145B).
08ApSoC...8...79S). doi:10.1016/j.asoc.2007.02.010 (https://doi.org/ doi:10.1016/s0031-3203(96)00142-2 (https://doi.org/10.1016%2Fs0
10.1016%2Fj.asoc.2007.02.010). 031-3203%2896%2900142-2). S2CID 13806304 (https://api.seman
415. Yeh, I. "Modeling slump of concrete with fly ash and ticscholar.org/CorpusID:13806304).
superplasticizer." Computers and Concrete 5.6 (2008): 559–572. 428. Street, W. N.; Wolberg, W. H.; Mangasarian, O. L. (1993). "Nuclear
416. Gencel, Osman; et al. (2011). "Comparison of artificial neural feature extraction for breast tumor diagnosis" (https://www.spiedigita
networks and general linear model approaches for the analysis of llibrary.org/conference-proceedings-of-spie/1905/0000/Nuclear-feat
abrasive wear of concrete". Construction and Building Materials. 25 ure-extraction-for-breast-tumor-diagnosis/10.1117/12.148698.short).
(8): 3486–3494. doi:10.1016/j.conbuildmat.2011.03.040 (https://doi. In Acharya, Raj S; Goldgof, Dmitry B (eds.). Biomedical Image
org/10.1016%2Fj.conbuildmat.2011.03.040). Processing and Biomedical Visualization (http://digital.library.wisc.e
du/1793/59692). Vol. 1905. pp. 861–870. doi:10.1117/12.148698 (ht
417. Dietterich, Thomas G., et al. "A comparison of dynamic reposing
and tangent distance for drug activity prediction (http://papers.nips.c tps://doi.org/10.1117%2F12.148698). S2CID 14922543 (https://api.
semanticscholar.org/CorpusID:14922543).
c/paper/781-a-comparison-of-dynamic-reposing-and-tangent-distan
ce-for-drug-activity-prediction.pdf)." Advances in Neural Information 429. Demir, Cigdem, and Bülent Yener. "Automated cancer diagnosis
Processing Systems (1994): 216–216. based on histopathological images: a systematic survey (http://cites
eerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.1199&rep=rep1
418. Buscema, Massimo, William J. Tastle, and Stefano Terzi. "Meta net:
&type=pdf)." Rensselaer Polytechnic Institute, Tech. Rep (2005).
A new meta-classifier family (https://www.researchgate.net/profile/M
assimo_Buscema/publication/13731626_MetaNet_The_Theory_of 430. Abuse, Substance. "Mental Health Services Administration, Results
_Independent_Judges/links/0deec52baf2937fc8e000000.pd from the 2010 National Survey on Drug Use and Health: Summary
f)." Data Mining Applications Using Artificial Adaptive Systems. of National Findings, NSDUH Series H-41, HHS Publication No.
Springer New York, 2013. 141–182. (SMA) 11-4658." Rockville, MD: Substance Abuse and Mental
Health Services Administration 201 (2011).
419. Amoradnejad, Issa; Amoradnejad, Rahimberdi; et al. (2022). "Age
dataset: A structured general-purpose dataset on life, work, and 431. Hong, Zi-Quan; Yang, Jing-Yu (1991). "Optimal discriminant plane
death of 1.22 million distinguished people" (http://workshop-procee for a small number of samples and design method of classifier on
dings.icwsm.org/abstract?id=2022_82). Workshop Proceedings of the plane". Pattern Recognition. 24 (4): 317–324.
the 16th International AAAI Conference on Web and Social Media Bibcode:1991PatRe..24..317H (https://ui.adsabs.harvard.edu/abs/1
(ICWSM). 3: 1–4. doi:10.36190/2022.82 (https://doi.org/10.36190%2 991PatRe..24..317H). doi:10.1016/0031-3203(91)90074-f (https://do
F2022.82). S2CID 249668669 (https://api.semanticscholar.org/Corp i.org/10.1016%2F0031-3203%2891%2990074-f).
usID:249668669). 432. Li, Jinyan, and Limsoon Wong. "Using rules to analyse bio-medical
420. "Age Dataset" (https://github.com/Moradnejad/AgeDataset). GitHub. data: a comparison between C4. 5 and PCL." Advances in Web-
7 June 2022. Age Information Management. Springer Berlin Heidelberg, 2003.
421. "Synthetic Fundus Dataset" (https://web.archive.org/web/20211129 254–265.
155047/http://math.unipa.it/cvalenti/fundus/). Archived from the 433. Güvenir, H. Altay, et al. "A supervised machine learning algorithm
original (http://math.unipa.it/cvalenti/fundus/) on 29 November 2021. for arrhythmia analysis (http://repository.bilkent.edu.tr/bitstream/han
Retrieved 22 February 2023. dle/11693/27699/bilkent-research-paper.pdf?sequence=
422. Lo Castro, Dario; et al. (2020). "A visual framework to create 1)." Computers in Cardiology 1997. IEEE, 1997.
photorealistic retinal vessels for diagnosis purposes". Journal of 434. Lagus, Krista, et al. "Independent variable group analysis in
Biomedical Informatics. 108: 103490. doi:10.1016/j.jbi.2020.103490 learning compact representations for data (http://users.ics.aalto.fi/ah
(https://doi.org/10.1016%2Fj.jbi.2020.103490). PMID 32640292 (htt onkela/papers/Lagus05akrr.pdf)." Proceedings of the International
ps://pubmed.ncbi.nlm.nih.gov/32640292). S2CID 220429697 (http and Interdisciplinary Conference on Adaptive Knowledge
s://api.semanticscholar.org/CorpusID:220429697). Representation and Reasoning (AKRR'05), T. Honkela, V.
423. Ingber, Lester (1997). "Statistical mechanics of neocortical Könönen, M. Pöllä, and O. Simula, Eds., Espoo, Finland. 2005.
interactions: Canonical momenta indicators of 435. Strack, Beata, et al. "Impact of HbA1c measurement on hospital
electroencephalography". Physical Review E. 55 (4): 4578–4593. readmission rates: analysis of 70,000 clinical database patient
arXiv:physics/0001052 (https://arxiv.org/abs/physics/0001052). records (http://downloads.hindawi.com/journals/bmri/2014/781670.
Bibcode:1997PhRvE..55.4578I (https://ui.adsabs.harvard.edu/abs/1 pdf)." BioMed Research International 2014; 2014
997PhRvE..55.4578I). doi:10.1103/PhysRevE.55.4578 (https://doi.o 436. Rubin, Daniel J (2015). "Hospital readmission of patients with
rg/10.1103%2FPhysRevE.55.4578). S2CID 6390999 (https://api.se diabetes". Current Diabetes Reports. 15 (4): 1–9.
manticscholar.org/CorpusID:6390999). doi:10.1007/s11892-015-0584-7 (https://doi.org/10.1007%2Fs1189
424. Hoffmann, Ulrich; Vesin, Jean-Marc; Ebrahimi, Touradj; Diserens, 2-015-0584-7). PMID 25712258 (https://pubmed.ncbi.nlm.nih.gov/2
Karin (2008). "An efficient P300-based brain–computer interface for 5712258). S2CID 3908599 (https://api.semanticscholar.org/CorpusI
disabled subjects". Journal of Neuroscience Methods. 167 (1): 115– D:3908599).
125. CiteSeerX 10.1.1.352.4630 (https://citeseerx.ist.psu.edu/viewd 437. Antal, Bálint; Hajdu, András (2014). "An ensemble-based system for
oc/summary?doi=10.1.1.352.4630). automatic screening of diabetic retinopathy". Knowledge-Based
doi:10.1016/j.jneumeth.2007.03.005 (https://doi.org/10.1016%2Fj.jn Systems. 60 (2014): 20–27. arXiv:1410.8576 (https://arxiv.org/abs/1
eumeth.2007.03.005). PMID 17445904 (https://pubmed.ncbi.nlm.ni 410.8576). Bibcode:2014arXiv1410.8576A (https://ui.adsabs.harvar
h.gov/17445904). S2CID 9648828 (https://api.semanticscholar.org/ d.edu/abs/2014arXiv1410.8576A).
CorpusID:9648828). doi:10.1016/j.knosys.2013.12.023 (https://doi.org/10.1016%2Fj.kno
sys.2013.12.023). S2CID 13984326 (https://api.semanticscholar.or
g/CorpusID:13984326).
438. Haloi, Mrinal (2015). "Improved Microaneurysm Detection using 451. Javadi, Soroush; Mirroshandel, Seyed Abolghasem (2019). "A
Deep Neural Networks". arXiv:1505.04424 (https://arxiv.org/abs/150 novel deep learning method for automatic assessment of human
5.04424) [cs.CV (https://arxiv.org/archive/cs.CV)]. sperm images". Computers in Biology and Medicine. 109: 182–194.
439. ELIE, Guillaume PATRY, Gervais GAUTHIER, Bruno LAY, Julien doi:10.1016/j.compbiomed.2019.04.030 (https://doi.org/10.1016%2
ROGER, Damien. "ADCIS Download Third Party: Messidor Fj.compbiomed.2019.04.030). ISSN 0010-4825 (https://www.worldc
Database" (http://www.adcis.net/en/Download-Third-Party/Messido at.org/issn/0010-4825). PMID 31059902 (https://pubmed.ncbi.nlm.ni
r.htmldownload.php). adcis.net. Retrieved 25 February 2018. h.gov/31059902). S2CID 146809768 (https://api.semanticscholar.or
g/CorpusID:146809768).
440. Decencière, Etienne; Zhang, Xiwei; Cazuguel, Guy; Lay, Bruno;
Cochener, Béatrice; Trone, Caroline; Gain, Philippe; Ordonez, 452. "soroushj/mhsma-dataset: MHSMA: The Modified Human Sperm
Richard; Massin, Pascale (26 August 2014). "Feedback on a Morphology Analysis Dataset" (https://github.com/soroushj/mhsma-
Publicly Distributed Image Database: The Messidor Database" (http dataset). github.com. Retrieved 3 May 2019.
s://doi.org/10.5566%2Fias.1155). Image Analysis & Stereology. 33 453. Clark, David, Zoltan Schreter, and Anthony Adams. "A quantitative
(3): 231–234. doi:10.5566/ias.1155 (https://doi.org/10.5566%2Fias. comparison of dystal and backpropagation." Proceedings of 1996
1155). ISSN 1854-5165 (https://www.worldcat.org/issn/1854-5165). Australian Conference on Neural Networks. 1996.
441. Bagirov, A. M.; et al. (2003). "Unsupervised and supervised data 454. Jiang, Yuan, and Zhi-Hua Zhou. "Editing training data for kNN
classification via nonsmooth and global optimization". Top. 11 (1): classifiers with neural network ensemble (https://cs.nju.edu.cn/zhou
1–75. CiteSeerX 10.1.1.1.6429 (https://citeseerx.ist.psu.edu/viewdo zh/zhouzh.files/publication/isnn04a.pdf)." Advances in Neural
c/summary?doi=10.1.1.1.6429). doi:10.1007/bf02578945 (https://do Networks–ISNN 2004. Springer Berlin Heidelberg, 2004. 356–361.
i.org/10.1007%2Fbf02578945). S2CID 14165678 (https://api.seman 455. Ontañón, Santiago, and Enric Plaza. "On similarity measures based
ticscholar.org/CorpusID:14165678). on a refinement lattice." Case-Based Reasoning Research and
442. Fung, Glenn, et al. "A fast iterative algorithm for fisher discriminant Development. Springer Berlin Heidelberg, 2009. 240–255.
using heterogeneous kernels (https://jinbo-bi.uconn.edu/wp-conten 456. "PLF data inventory" (https://github.com/Animal-Data-Inventory/PLF
t/uploads/sites/2638/2018/12/icml04_kernel.pdf)." Proceedings of DataInventory). GitHub. 5 November 2021.
the twenty-first international conference on Machine learning. ACM,
457. Higuera, Clara; Gardiner, Katheleen J.; Cios, Krzysztof J. (2015).
2004. "Self-organizing feature maps identify proteins critical to learning in
443. Quinlan, John Ross, et al. "Inductive knowledge acquisition: a case a mouse model of down syndrome" (https://www.ncbi.nlm.nih.gov/p
study." Proceedings of the Second Australian Conference on mc/articles/PMC4482027). PLOS ONE. 10 (6): e0129126.
Applications of expert systems. Addison-Wesley Longman Bibcode:2015PLoSO..1029126H (https://ui.adsabs.harvard.edu/ab
Publishing Co., Inc., 1987. s/2015PLoSO..1029126H). doi:10.1371/journal.pone.0129126 (http
444. Zhou, Zhi-Hua; Jiang, Yuan (2004). "NeC4. 5: neural ensemble s://doi.org/10.1371%2Fjournal.pone.0129126). PMC 4482027 (http
based C4. 5". IEEE Transactions on Knowledge and Data s://www.ncbi.nlm.nih.gov/pmc/articles/PMC4482027).
Engineering. 16 (6): 770–773. CiteSeerX 10.1.1.1.8430 (https://cites PMID 26111164 (https://pubmed.ncbi.nlm.nih.gov/26111164).
eerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.8430). 458. Ahmed, Md Mahiuddin; et al. (2015). "Protein dynamics associated
doi:10.1109/tkde.2004.11 (https://doi.org/10.1109%2Ftkde.2004.1 with failed and rescued learning in the Ts65Dn mouse model of
1). S2CID 1024861 (https://api.semanticscholar.org/CorpusID:1024 Down syndrome" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4
861). 368539). PLOS ONE. 10 (3): e0119491.
445. Er, Orhan; et al. (2012). "An approach based on probabilistic neural Bibcode:2015PLoSO..1019491A (https://ui.adsabs.harvard.edu/ab
network for diagnosis of Mesothelioma's disease". Computers & s/2015PLoSO..1019491A). doi:10.1371/journal.pone.0119491 (http
Electrical Engineering. 38 (1): 75–81. s://doi.org/10.1371%2Fjournal.pone.0119491). PMC 4368539 (http
doi:10.1016/j.compeleceng.2011.09.001 (https://doi.org/10.1016%2 s://www.ncbi.nlm.nih.gov/pmc/articles/PMC4368539).
Fj.compeleceng.2011.09.001). PMID 25793384 (https://pubmed.ncbi.nlm.nih.gov/25793384).
446. Er, Orhan, A. Çetin Tanrikulu, and Abdurrahman Abakay. "Use of 459. Langley, PAT (2014). "Trading off simplicity and coverage in
artificial intelligence techniques for diagnosis of malignant pleural incremental concept learning" (https://web.archive.org/web/201908
mesothelioma (https://dergipark.org.tr/download/article-file/5452 06184005/https://www.westmont.edu/~iba/pubs/hillary-paper.pdf)
1)." Dicle Tıp Dergisi 42.1 (2015). (PDF). Machine Learning Proceedings. 1988: 73. Archived from the
447. Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (25 original (https://www.westmont.edu/~iba/pubs/hillary-paper.pdf)
July 2017). "Vision-Based Assessment of Parkinsonism and (PDF) on 6 August 2019. Retrieved 6 August 2019.
Levodopa-Induced Dyskinesia with Deep Learning Pose 460. "Mushroom Data Set 2020" (https://mushroom.mathematik.uni-marb
Estimation" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC621908 urg.de/). mushroom.mathematik.uni-marburg.de. Retrieved 6 April
2). Journal of Neuroengineering and Rehabilitation. 15 (1): 97. 2021.
arXiv:1707.09416 (https://arxiv.org/abs/1707.09416). 461. Wagner, Dennis; Heider, Dominik; Hattab, Georges (14 April 2021).
Bibcode:2017arXiv170709416L (https://ui.adsabs.harvard.edu/abs/ "Mushroom data creation, curation, and simulation to support
2017arXiv170709416L). doi:10.1186/s12984-018-0446-z (https://do classification tasks" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC
i.org/10.1186%2Fs12984-018-0446-z). PMC 6219082 (https://www. 8046754). Scientific Reports. 11 (1): 8134.
ncbi.nlm.nih.gov/pmc/articles/PMC6219082). PMID 30400914 (http Bibcode:2021NatSR..11.8134W (https://ui.adsabs.harvard.edu/abs/
s://pubmed.ncbi.nlm.nih.gov/30400914). 2021NatSR..11.8134W). doi:10.1038/s41598-021-87602-3 (https://
448. Li, Michael H.; Mestre, Tiago A.; Fox, Susan H.; Taati, Babak (May doi.org/10.1038%2Fs41598-021-87602-3). ISSN 2045-2322 (http
2018). "Automated assessment of levodopa-induced dyskinesia: s://www.worldcat.org/issn/2045-2322). PMC 8046754 (https://www.
Evaluating the responsiveness of video-based features". ncbi.nlm.nih.gov/pmc/articles/PMC8046754). PMID 33854157 (http
Parkinsonism & Related Disorders. 53: 42–45. s://pubmed.ncbi.nlm.nih.gov/33854157).
doi:10.1016/j.parkreldis.2018.04.036 (https://doi.org/10.1016%2Fj.p 462. Cortez, Paulo, and Aníbal de Jesus Raimundo Morais. "A data
arkreldis.2018.04.036). ISSN 1353-8020 (https://www.worldcat.org/i mining approach to predict forest fires using meteorological data."
ssn/1353-8020). PMID 29748112 (https://pubmed.ncbi.nlm.nih.gov/ (2007).
29748112). S2CID 13666294 (https://api.semanticscholar.org/Corp 463. Farquad, M. A. H.; Ravi, V.; Raju, S. Bapi (2010). "Support vector
usID:13666294). regression based hybrid rule extraction methods for forecasting".
449. "Parkinson's Vision-Based Pose Estimation Dataset | Kaggle" (http Expert Systems with Applications. 37 (8): 5577–5589.
s://www.kaggle.com/limi44/parkinsons-visionbased-pose-estimatio doi:10.1016/j.eswa.2010.02.055 (https://doi.org/10.1016%2Fj.eswa.
n-dataset/home). kaggle.com. Retrieved 22 August 2018. 2010.02.055).
450. Shannon, Paul; et al. (2003). "Cytoscape: a software environment 464. Fisher, Ronald A (1936). "The use of multiple measurements in
for integrated models of biomolecular interaction networks" (https:// taxonomic problems". Annals of Eugenics. 7 (2): 179–188.
www.ncbi.nlm.nih.gov/pmc/articles/PMC403769). Genome doi:10.1111/j.1469-1809.1936.tb02137.x (https://doi.org/10.1111%2
Research. 13 (11): 2498–2504. doi:10.1101/gr.1239303 (https://doi. Fj.1469-1809.1936.tb02137.x). hdl:2440/15227 (https://hdl.handle.n
org/10.1101%2Fgr.1239303). PMC 403769 (https://www.ncbi.nlm.ni et/2440%2F15227).
h.gov/pmc/articles/PMC403769). PMID 14597658 (https://pubmed.n
cbi.nlm.nih.gov/14597658).
465. Ghahramani, Zoubin, and Michael I. Jordan. "Supervised learning 478. Muresan, Horea; Oltean, Mihai (2018). "Fruit recognition from
from incomplete data via an EM approach (http://papers.nips.cc/pap images using deep learning" (https://www.researchgate.net/publicat
er/767-supervised-learning-from-incomplete-data-via-an-em-approa ion/321475443). Acta Univ. Sapientiae, Informatica. 10 (1): 26–42.
ch.pdf)." Advances in neural information processing systems 6. doi:10.2478/ausi-2018-0002 (https://doi.org/10.2478%2Fausi-2018-
1994. 0002).
466. Mallah, Charles; Cope, James; Orwell, James (2013). "Plant leaf 479. Oltean, Mihai; Muresan, Horea (2017). "A dataset with fruit images
classification using probabilistic integration of shape, texture and on Kaggle" (https://www.kaggle.com/moltean/fruits).
margin features" (https://www.researchgate.net/publication/2666323 480. Nakai, Kenta; Kanehisa, Minoru (1991). "Expert system for
57). Signal Processing, Pattern Recognition and Applications. 5: 1. predicting protein localization sites in gram‐negative bacteria".
467. Yahiaoui, Itheri, Olfa Mzoughi, and Nozha Boujemaa. "Leaf shape Proteins: Structure, Function, and Bioinformatics. 11 (2): 95–110.
descriptor for tree species identification (http://www.cmlab.csie.ntu.e doi:10.1002/prot.340110203 (https://doi.org/10.1002%2Fprot.34011
du.tw/~zenic/Data/Download/ICME2012/Conference/data/4711a25 0203). PMID 1946347 (https://pubmed.ncbi.nlm.nih.gov/1946347).
4.pdf) Archived (https://web.archive.org/web/20190806184006/htt S2CID 27606447 (https://api.semanticscholar.org/CorpusID:276064
p://www.cmlab.csie.ntu.edu.tw/~zenic/Data/Download/ICME2012/C 47).
onference/data/4711a254.pdf) 6 August 2019 at the Wayback 481. Ling, Charles X., et al. "Decision trees with minimal costs (https://cli
Machine." Multimedia and Expo (ICME), 2012 IEEE International ng.csd.uwo.ca/cs860/ICML04-Ling.pdf)." Proceedings of the twenty-
Conference on. IEEE, 2012. first international conference on Machine learning. ACM, 2004.
468. Tan, Ming, and Larry Eshelman. "Using weighted networks to 482. Mahé, Pierre, et al. "Automatic identification of mixed bacterial
represent classification knowledge in noisy domains (https://www.s species fingerprints in a MALDI-TOF mass-spectrum (https://acade
ciencedirect.com/science/article/pii/B9780934613644500189)." mic.oup.com/bioinformatics/article/30/9/1280/237488)."
Proceedings of the Fifth International Conference on Machine Bioinformatics (2014): btu022.
Learning. 2014.
483. Barbano, Duane; et al. (2015). "Rapid characterization of
469. Charytanowicz, Małgorzata, et al. "Complete gradient clustering microalgae and microalgae mixtures using matrix-assisted laser
algorithm for features analysis of x-ray images (http://home.agh.edu. desorption ionization time-of-flight mass spectrometry (MALDI-TOF
pl/~kulpi/publ/Charytanowicz_Niewczas_Kulczycki_Kowalski_Luk MS)" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4536233).
asik_Zak_-_Information_Technologies_in_Biomedicine_-_2010.pd PLOS ONE. 10 (8): e0135337. Bibcode:2015PLoSO..1035337B (htt
f)." Information technologies in biomedicine. Springer Berlin ps://ui.adsabs.harvard.edu/abs/2015PLoSO..1035337B).
Heidelberg, 2010. 15–24. doi:10.1371/journal.pone.0135337 (https://doi.org/10.1371%2Fjour
470. Sanchez, Mauricio A.; et al. (2014). "Fuzzy granular gravitational nal.pone.0135337). PMC 4536233 (https://www.ncbi.nlm.nih.gov/p
clustering algorithm for multivariate data". Information Sciences. mc/articles/PMC4536233). PMID 26271045 (https://pubmed.ncbi.nl
279: 498–511. doi:10.1016/j.ins.2014.04.005 (https://doi.org/10.101 m.nih.gov/26271045).
6%2Fj.ins.2014.04.005). 484. Horton, Paul; Nakai, Kenta (1996). "A probabilistic classification
471. Blackard, Jock A.; Dean, Denis J. (1999). "Comparative accuracies system for predicting the cellular localization sites of proteins" (http
of artificial neural networks and discriminant analysis in predicting s://www.aaai.org/Papers/ISMB/1996/ISMB96-012.pdf) (PDF). ISMB-
forest cover types from cartographic variables". Computers and 96 Proceedings. 4: 109–15. PMID 8877510 (https://pubmed.ncbi.nl
Electronics in Agriculture. 24 (3): 131–151. m.nih.gov/8877510).
CiteSeerX 10.1.1.128.2475 (https://citeseerx.ist.psu.edu/viewdoc/su 485. Allwein, Erin L.; Schapire, Robert E.; Singer, Yoram (2001).
mmary?doi=10.1.1.128.2475). doi:10.1016/s0168-1699(99)00046-0 "Reducing multiclass to binary: A unifying approach for margin
(https://doi.org/10.1016%2Fs0168-1699%2899%2900046-0). classifiers" (http://www.jmlr.org/papers/volume1/allwein00a/allwein
S2CID 13985407 (https://api.semanticscholar.org/CorpusID:139854 00a.pdf) (PDF). The Journal of Machine Learning Research. 1:
07). 113–141.
472. Fürnkranz, Johannes. "Round robin rule learning (http://citeseerx.is 486. Mayr, Andreas; Klambauer, Guenter; Unterthiner, Thomas;
t.psu.edu/viewdoc/summary?doi=10.1.1.20.9520)." Proceedings of Hochreiter, Sepp (2016). "DeepTox: Toxicity Prediction Using Deep
the 18th International Conference on Machine Learning (ICML-01): Learning" (http://bioinf.jku.at/research/DeepTox/tox21.html).
146–153. 2001. Frontiers in Environmental Science. 3: 80.
473. Li, Song; Assmann, Sarah M.; Albert, Réka (2006). "Predicting doi:10.3389/fenvs.2015.00080 (https://doi.org/10.3389%2Ffenvs.20
essential components of signal transduction networks: a dynamic 15.00080).
model of guard cell abscisic acid signaling" (https://www.ncbi.nlm.ni 487. Lavin, Alexander; Ahmad, Subutai (12 October 2015). Evaluating
h.gov/pmc/articles/PMC1564158). PLOS Biol. 4 (10): e312. arXiv:q- Real-time Anomaly Detection Algorithms – the Numenta Anomaly
bio/0610012 (https://arxiv.org/abs/q-bio/0610012). Benchmark. p. 38. arXiv:1510.03336 (https://arxiv.org/abs/1510.033
Bibcode:2006q.bio....10012L (https://ui.adsabs.harvard.edu/abs/200 36). doi:10.1109/ICMLA.2015.141 (https://doi.org/10.1109%2FICML
6q.bio....10012L). doi:10.1371/journal.pbio.0040312 (https://doi.org/ A.2015.141). ISBN 978-1-5090-0287-0. S2CID 6842305 (https://api.
10.1371%2Fjournal.pbio.0040312). PMC 1564158 (https://www.ncb semanticscholar.org/CorpusID:6842305).
i.nlm.nih.gov/pmc/articles/PMC1564158). PMID 16968132 (https://p 488. Iurii D. Katser; Vyacheslav O. Kozitsin. "SKAB GitHub repository" (h
ubmed.ncbi.nlm.nih.gov/16968132).
ttps://github.com/waico/skab). GitHub. Retrieved 12 January 2021.
474. Munisami, Trishen; et al. (2015). "Plant Leaf Recognition Using
489. Iurii D. Katser; Vyacheslav O. Kozitsin (2020). "Skoltech Anomaly
Shape Features and Colour Histogram with K-nearest Neighbour Benchmark (SKAB)" (https://www.kaggle.com/yuriykatser/skoltech-
Classifiers" (https://doi.org/10.1016%2Fj.procs.2015.08.095).
anomaly-benchmark-skab). Kaggle.
Procedia Computer Science. 58: 740–747.
doi:10.34740/KAGGLE/DSV/1693952 (https://doi.org/10.34740%2F
doi:10.1016/j.procs.2015.08.095 (https://doi.org/10.1016%2Fj.procs. KAGGLE%2FDSV%2F1693952). Retrieved 12 January 2021.
2015.08.095).
490. Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello,
475. Li, Bai (2016). "Atomic potential matching: An evolutionary target Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira;
recognition approach based on edge features". Optik. 127 (5): Houle, Michael E. (2016). "On the evaluation of unsupervised
3162–3168. Bibcode:2016Optik.127.3162L (https://ui.adsabs.harvar
outlier detection: measures, datasets, and an empirical study". Data
d.edu/abs/2016Optik.127.3162L). doi:10.1016/j.ijleo.2015.11.186 (h Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-
ttps://doi.org/10.1016%2Fj.ijleo.2015.11.186). 015-0444-8 (https://doi.org/10.1007%2Fs10618-015-0444-8).
476. Nilsback, Maria-Elena, and Andrew Zisserman. "A visual ISSN 1384-5810 (https://www.worldcat.org/issn/1384-5810).
vocabulary for flower classification (http://www.robots.ox.ac.uk/~me S2CID 1952214 (https://api.semanticscholar.org/CorpusID:195221
n/papers/nilsback_cvpr06.pdf)." Computer Vision and Pattern 4).
Recognition, 2006 IEEE Computer Society Conference on. Vol. 2.
491. Ann-Kathrin Hartmann, Tommaso Soru, Edgard Marx. Generating a
IEEE, 2006. Large Dataset for Neural Question Answering over the DBpedia
477. Giselsson, Thomas M.; et al. (2017). "A Public Image Database for Knowledge Base (https://www.researchgate.net/publication/324482
Benchmark of Plant Seedling Classification Algorithms". 598_Generating_a_Large_Dataset_for_Neural_Question_Answeri
arXiv:1711.05458 (https://arxiv.org/abs/1711.05458) [cs.CV (https:// ng_over_the_DBpedia_Knowledge_Base). 2018.
arxiv.org/archive/cs.CV)].
492. Tommaso Soru, Edgard Marx. Diego Moussallem, Andre
Valdestilhas, Diego Esteves, Ciro Baron. SPARQL as a Foreign
Language. 2018.
493. Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan 507. "CWE - Common Weakness Enumeration" (https://cwe.mitre.org/ind
Luu-Thuy Nguyen. A Vietnamese Dataset for Evaluating Machine ex.html). cwe.mitre.org. Retrieved 14 January 2023.
Reading Comprehension (https://www.aclweb.org/anthology/2020.c 508. Lim, Swee Kiat; Muis, Aldrian Obaja; Lu, Wei; Ong, Chen Hui (July
oling-main.233.pdf). COLING 2020. 2017). "MalwareTextDB: A Database for Annotated Malware
494. Kiet Van Nguyen, Khiem Vinh Tran, Son T. Luu, Anh Gia-Tuan Articles" (https://aclanthology.org/P17-1143). Proceedings of the
Nguyen, Ngan Luu-Thuy Nguyen. Enhancing Lexical-Based 55th Annual Meeting of the Association for Computational
Approach With External Knowledge for Vietnamese Multiple- Linguistics (Volume 1: Long Papers). Vancouver, Canada:
Choice Machine Reading Comprehension (https://ieeexplore.ieee.o Association for Computational Linguistics: 1557–1567.
rg/document/9247161). IEEE Access. 2020. doi:10.18653/v1/P17-1143 (https://doi.org/10.18653%2Fv1%2FP17
495. Anantha, Raviteja; Vakulenko, Svitlana; Tu, Zhucheng; Longpre, -1143). S2CID 7816596 (https://api.semanticscholar.org/CorpusID:7
Shayne; Pulman, Stephen; Chappidi, Srinivas (2020). "Open- 816596).
Domain Question Answering Goes Conversational via Question 509. "USENIX" (https://www.usenix.org/). USENIX. Retrieved
Rewriting". arXiv:2010.04898 (https://arxiv.org/abs/2010.04898) 19 January 2023.
[cs.IR (https://arxiv.org/archive/cs.IR)]. 510. "APTnotes | Read the Docs" (https://readthedocs.org/projects/aptno
496. Khashabi, Daniel; Min, Sewon; Khot, Tushar; Sabharwal, Ashish; tes/). readthedocs.org. Retrieved 19 January 2023.
Tafjord, Oyvind; Clark, Peter; Hajishirzi, Hannaneh (November 511. "Cryptography and Security authors/titles recent submissions" (http
2020). "UNIFIEDQA: Crossing Format Boundaries with a Single QA s://arxiv.org/list/cs.CR/recent). arxiv.org. Retrieved 19 January 2023.
System" (https://aclanthology.org/2020.findings-emnlp.171). 512. "Holistic Info-Sec for Web Developers - Fascicle 0" (https://f0.holisti
Findings of the Association for Computational Linguistics: EMNLP
cinfosecforwebdevelopers.com/).
2020. Online: Association for Computational Linguistics: 1896– f0.holisticinfosecforwebdevelopers.com. Retrieved 20 January
1907. arXiv:2005.00700 (https://arxiv.org/abs/2005.00700). 2023.
doi:10.18653/v1/2020.findings-emnlp.171 (https://doi.org/10.1865
3%2Fv1%2F2020.findings-emnlp.171). S2CID 218487109 (https://a 513. "Holistic Info-Sec for Web Developers - Fascicle 1" (https://f1.holisti
pi.semanticscholar.org/CorpusID:218487109). cinfosecforwebdevelopers.com/).
f1.holisticinfosecforwebdevelopers.com. Retrieved 20 January
497. Taskmaster (https://github.com/google-research-datasets/Taskmast
2023.
er), Google Research Datasets, 17 December 2022, retrieved
7 January 2023 514. Vincent, Adam. "Web Services Hacking and
Hardening" (https://owasp.org/www-pdf-archive/Web_Services_Ha
498. Byrne, Bill; Krishnamoorthi, Karthik; Sankar, Chinnadhurai;
cking_and_Hardening.pdf) (PDF). owasp.org.
Neelakantan, Arvind; Duckworth, Daniel; Yavuz, Semih; Goodrich,
Ben; Dubey, Amit; Cedilnik, Andy; Kim, Kyu-Young (1 September 515. McCray, Joe. "Advanced SQL Injection" (https://defcon.org/images/d
2019). "Taskmaster-1: Toward a Realistic and Diverse Dialog efcon-17/dc-17-presentations/defcon-17-joseph_mccray-adv_sql_in
Dataset". arXiv:1909.05358 (https://arxiv.org/abs/1909.05358) jection.pdf) (PDF). defcon.org.
[cs.CL (https://arxiv.org/archive/cs.CL)]. 516. Shah, Shreeraj. "Blind SQL injection discovery & exploitation
499. Yasunaga, Michihiro; Liang, Percy (21 November 2020). "Graph- technique" (https://blueinfy.com/wp/blindsql.pdf) (PDF).
based, Self-Supervised Program Repair from Diagnostic blueinfy.com.
Feedback" (https://proceedings.mlr.press/v119/yasunaga20a.html). 517. Palcer, C. C. "Ethical hacking" (https://blueinfy.com/wp/blindsql.pdf)
International Conference on Machine Learning. PMLR: 10799– (PDF). textfiles.
10808. arXiv:2005.10636 (https://arxiv.org/abs/2005.10636). 518. "Hacking Secrets Revealed - Information and Instructional Guide"
500. Wang, Yizhong; Mishra, Swaroop; Alipoormolabashi, Pegah; Kordi, (https://www.onlinepot.org/security/HackersSecrets.pdf) (PDF).
Yeganeh; Mirzaei, Amirreza; Arunkumar, Anjana; Ashok, Arjun; 519. Park, Alexis. "Hack any website" (https://defcon.org/images/defcon-
Dhanasekaran, Arut Selvan; Naik, Atharva; Stap, David; Pathak, 11/dc-11-presentations/dc-11-Gentil/dc-11-gentil.pdf) (PDF).
Eshaan; Karamanolakis, Giannis; Lai, Haizhi Gary; Purohit, Ishan; 520. Cerrudo, Cesar; Martinez Fayo, Esteban. "Hacking Databases for
Mondal, Ishani (24 October 2022). "Super-NaturalInstructions: Owning your Data" (https://www.blackhat.com/presentations/bh-eur
Generalization via Declarative Instructions on 1600+ NLP Tasks". ope-07/Cerrudo/Whitepaper/bh-eu-07-cerrudo-WP-up.pdf) (PDF).
arXiv:2204.07705 (https://arxiv.org/abs/2204.07705) [cs.CL (https://a blackhat.
rxiv.org/archive/cs.CL)].
521. O'Connor, Tj. "Violent Python-A Cookbook for Hackers, Forensic
501. Paperno, Denis; Kruszewski, Germán; Lazaridou, Angeliki; Pham, Analysts, Penetration Testers and Security Engineers" (https://githu
Quan Ngoc; Bernardi, Raffaella; Pezzelle, Sandro; Baroni, Marco; b.com/reconSF/python/blob/master/Syngress.Violent.Python.a.Coo
Boleda, Gemma; Fernández, Raquel (7 August 2016), The kbook.for.Hackers.2013.pdf) (PDF). Github.
LAMBADA dataset (https://zenodo.org/record/2630551),
522. Grand, Joe. "Hardware Reverse Engineering: Access, Analyze, &
doi:10.5281/zenodo.2630551 (https://doi.org/10.5281%2Fzenodo.2
Defeat" (https://media.blackhat.com/bh-dc-11/Grand/BlackHat_DC_
630551), retrieved 7 January 2023
2011_Grand-Workshop.pdf) (PDF). blackhat.
502. Paperno, Denis; Kruszewski, Germán; Lazaridou, Angeliki; Pham,
523. Chang, Jason V. "Computer Hacking: Making the Case for National
Ngoc Quan; Bernardi, Raffaella; Pezzelle, Sandro; Baroni, Marco;
Reporting Requirement" (https://cyber.harvard.edu/sites/cyber.law.h
Boleda, Gemma; Fernández, Raquel (August 2016). "The
arvard.edu/files/ComputerHacking.pdf) (PDF). cyber.harvard.edu.
LAMBADA dataset: Word prediction requiring a broad discourse
context" (https://aclanthology.org/P16-1144). Proceedings of the 524. "National Cybersecurity Strategies Repository" (https://www.itu.int:4
54th Annual Meeting of the Association for Computational 43/en/ITU-D/Cybersecurity/Pages/National-Strategies-repository.as
Linguistics (Volume 1: Long Papers). Berlin, Germany: Association px). ITU. Retrieved 20 January 2023.
for Computational Linguistics: 1525–1534. doi:10.18653/v1/P16- 525. Chen, Yanlin (31 August 2022), Cyber Security Natural Language
1144 (https://doi.org/10.18653%2Fv1%2FP16-1144). Processing (https://github.com/Ychen463/Cyber), retrieved
hdl:10230/32702 (https://hdl.handle.net/10230%2F32702). 20 January 2023
S2CID 2381275 (https://api.semanticscholar.org/CorpusID:238127 526. "https://twitter.com/blackorbird" (https://twitter.com/blackorbird).
5). Twitter. Retrieved 20 January 2023.
503. Wei, Jason; Bosma, Maarten; Zhao, Vincent; Guu, Kelvin; Yu,
Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V. 527. Zampieri, Marcos; Malmasi, Shervin; Nakov, Preslav; Rosenthal,
(10 February 2022). "Finetuned Language Models are Zero-Shot Sara; Farra, Noura; Kumar, Ritesh (16 April 2019). "Predicting the
Learners" (https://openreview.net/forum?id=gEZrGCozdqR). Type and Target of Offensive Posts in Social Media".
arXiv:2109.01652 (https://arxiv.org/abs/2109.01652). arXiv:1902.09666 (https://arxiv.org/abs/1902.09666) [cs.CL (https://a
504. "Working with ATT&CK | MITRE ATT&CK®" (https://attack.mitre.or rxiv.org/archive/cs.CL)].
g/resources/working-with-attack/). attack.mitre.org. Retrieved 528. "Threat reports" (https://www.ncsc.gov.uk/section/keep-up-to-date/th
14 January 2023. reat-reports). www.ncsc.gov.uk. Retrieved 20 January 2023.
505. "CAPEC - Common Attack Pattern Enumeration and Classification 529. "Category: APT reports | Securelist" (https://securelist.com/category/
(CAPEC™)" (https://capec.mitre.org/). capec.mitre.org. Retrieved apt-reports/). securelist.com. Retrieved 23 January 2023.
14 January 2023. 530. "Your Cybersecurity News Connection - Cyber News | CyberWire"
506. "CVE - Home" (https://cve.mitre.org/cve/). cve.mitre.org. Retrieved (https://thecyberwire.com/). The CyberWire. Retrieved 23 January
14 January 2023. 2023.
531. "News" (https://www.databreaches.net/news/). Retrieved 556. "Climatext" (http://www.sustainablefinance.uzh.ch/en/research/clim
23 January 2023. ate-fever/climatext.html). www.sustainablefinance.uzh.ch. Retrieved
532. "Cybernews" (https://cybernews.com/). Cybernews. 19 February 2023.
533. "HIPAA Journal" (https://www.hipaajournal.com/). HIPAA Journal. 557. "Greenbiz" (https://www.greenbiz.com/). www.greenbiz.com.
Retrieved 23 January 2023. Retrieved 2 March 2023.
534. "BleepingComputer" (https://www.bleepingcomputer.com/). 558. "Explore the @Reuters Hot List of 1,000 top climate scientists" (http
BleepingComputer. Retrieved 23 January 2023. s://www.reuters.com/investigates/special-report/climate-change-sci
entists-list/). Reuters. Retrieved 22 March 2023.
535. "Homepage" (https://therecord.media/). The Record from Recorded
Future News. Retrieved 23 January 2023. 559. "Blogs | Alliance for Research on Corporate Sustainability" (https://c
orporate-sustainability.org/blogs/). corporate-sustainability.org.
536. "HackRead | Latest Cyber Crime - InfoSec- Tech - Hacking News"
Retrieved 27 March 2023.
(https://www.hackread.com/). 8 January 2022. Retrieved 23 January
2023. 560. "Greenbiz" (https://www.greenbiz.com/). www.greenbiz.com.
Retrieved 29 March 2023.
537. "Securelist | Kaspersky's threat research and reports" (https://secure
list.com/). securelist.com. Retrieved 31 January 2023. 561. "CSR News" (https://www.csrwire.com/press_releases).
www.csrwire.com. Retrieved 29 March 2023.
538. Harshaw, Christopher R.; Bridges, Robert A.; Iannacone, Michael
D.; Reed, Joel W.; Goodall, John R. (5 April 2016). "GraphPrints: 562. "CDP Homepage" (https://www.cdp.net/en). www.cdp.net.
Towards a Graph Analytic Method for Network Anomaly Detection" Retrieved 29 March 2023.
(https://doi.org/10.1145/2897795.2897806). Proceedings of the 11th 563. "Hybrid cloud blog" (https://content.cloud.redhat.com/blog).
Annual Cyber and Information Security Research Conference. content.cloud.redhat.com. Retrieved 9 April 2023.
CISRC '16. New York, NY, USA: Association for Computing 564. "Production-Grade Container Orchestration" (https://kubernetes.io/).
Machinery: 1–4. doi:10.1145/2897795.2897806 (https://doi.org/10.1 Kubernetes. Retrieved 9 April 2023.
145%2F2897795.2897806). ISBN 978-1-4503-3752-6. 565. "Home | Official Red Hat OpenShift Documentation" (https://docs.op
539. "Farsight Security, cyber security intelligence solutions" (https://ww enshift.com/). docs.openshift.com. Retrieved 9 April 2023.
w.farsightsecurity.com/). Farsight Security. Retrieved 13 February 566. "Cloud Native Computing Foundation" (https://www.cncf.io/). Cloud
2023. Native Computing Foundation. Retrieved 9 April 2023.
540. "Schneier on Security" (https://www.schneier.com/).
567. CNCF Community Presentations (https://github.com/cncf/presentati
www.schneier.com. Retrieved 13 February 2023. ons/blob/2ff57e4d78f6d70bb1fd5daf81e76f04a54c8520/kubernete
541. "#1 in Cloud Security & Endpoint Cybersecurity" (https://www.trend s/README.md), Cloud Native Computing Foundation (CNCF), 11
micro.com/en_us/business.html). Trend Micro. Retrieved April 2023, retrieved 11 April 2023
13 February 2023. 568. "Red Hat - We make open source technologies for the enterprise"
542. "The Hacker News | #1 Trusted Cybersecurity News Site" (https://th (https://www.redhat.com/en). www.redhat.com. Retrieved 1 May
ehackernews.com/). The Hacker News. Retrieved 13 February 2023.
2023.
569. Brown, Michael Scott, Michael J. Pelosi, and Henry Dirska.
543. "Krebs on Security – In-depth security news and investigation" (http "Dynamic-radius species-conserving genetic algorithm for the
s://krebsonsecurity.com/). Retrieved 25 February 2023. financial forecasting of Dow Jones index stocks (http://www.academ
544. "MITRE D3FEND Knowledge Graph" (https://d3fend.mitre.org/). ia.edu/download/46729605/BrownPelosiDirska79880027.pdf)."
d3fend.mitre.org. Retrieved 31 March 2023. Machine Learning and Data Mining in Pattern Recognition.
545. "MITRE | ATLAS™" (https://atlas.mitre.org/). atlas.mitre.org. Springer Berlin Heidelberg, 2013. 27–41.
Retrieved 31 March 2023. 570. Shen, Kao-Yi; Tzeng, Gwo-Hshiung (2015). "Fuzzy Inference-
546. "MITRE Engage™ | An Adversary Engagement Framework from Enhanced VC-DRSA Model for Technical Analysis: Investment
MITRE" (https://engage.mitre.org/). Retrieved 1 April 2023. Decision Aid". International Journal of Fuzzy Systems. 17 (3): 375–
547. "Hacking Tutorials - The best Step-by-Step Hacking Tutorials" (http 389. doi:10.1007/s40815-015-0058-8 (https://doi.org/10.1007%2Fs
s://www.hackingtutorials.org/). Hacking Tutorials. Retrieved 1 April 40815-015-0058-8). S2CID 68241024 (https://api.semanticscholar.o
rg/CorpusID:68241024).
2023.
548. "TCFD Knowledge Hub" (https://www.tcfdhub.org/). TCFD 571. Quinlan, J. Ross (1987). "Simplifying decision trees". International
Journal of Man-Machine Studies. 27 (3): 221–234.
Knowledge Hub. Retrieved 3 February 2023.
CiteSeerX 10.1.1.18.4267 (https://citeseerx.ist.psu.edu/viewdoc/su
549. "ResponsibilityReports.com" (https://www.responsibilityreports.co mmary?doi=10.1.1.18.4267). doi:10.1016/s0020-7373(87)80053-6
m/). www.responsibilityreports.com. Retrieved 3 February 2023. (https://doi.org/10.1016%2Fs0020-7373%2887%2980053-6).
550. "About — IPCC" (https://www.ipcc.ch/about/). Retrieved 572. Hamers, Bart; Suykens, Johan AK; De Moor, Bart (2003). "Coupled
20 February 2023. transductive ensemble learning of kernel models" (http://ftp.esat.kul
551. "Alliance for Research on Corporate Sustainability | ARCS serves euven.be/pub/SISTA/hamers/BH_clm.pdf) (PDF). Journal of
as a vehicle for advancing rigorous academic research on Machine Learning Research. 1: 1–48.
corporate sustainability issues" (https://corporate-sustainability.or 573. Shmueli, Galit, Ralph P. Russo, and Wolfgang Jank. "The
g/). corporate-sustainability.org. Retrieved 2 March 2023. BARISTA: a model for bid arrivals in online auctions (https://project
552. Mehra, Srishti; Louka, Robert; Zhang, Yixun (26 March 2022). euclid.org/download/pdfview_1/euclid.aoas/1196438025)." The
"ESGBERT: Language Model to Help with Classification Tasks Annals of Applied Statistics (2007): 412–441.
Related to Companies Environmental, Social, and Governance 574. Peng, Jie, and Hans-Georg Müller. "Distance-based clustering of
Practices". Embedded Systems and Applications: 183–190. sparsely observed stochastic processes, with applications to online
arXiv:2203.16788 (https://arxiv.org/abs/2203.16788). auctions (https://projecteuclid.org/download/pdfview_1/euclid.aoas/
doi:10.5121/csit.2022.120616 (https://doi.org/10.5121%2Fcsit.2022. 1223908052)." The Annals of Applied Statistics (2008): 1056–1077.
120616). ISBN 9781925953657. S2CID 247825524 (https://api.sem
575. Eggermont, Jeroen, Joost N. Kok, and Walter A. Kosters. "Genetic
anticscholar.org/CorpusID:247825524).
programming for data classification: Partitioning the search space
553. This article incorporates text (https://www.tensorflow.or (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.9.8725&r
g/datasets/community_catalog/huggingface/climate_fever) ep=rep1&type=pdf)." Proceedings of the 2004 ACM symposium on
available under the CC BY 4.0 license. Applied computing. ACM, 2004.
554. Diggelmann, Thomas; Boyd-Graber, Jordan; Bulian, Jannis; 576. Moro, Sérgio; Cortez, Paulo; Rita, Paulo (2014). "A data-driven
Ciaramita, Massimiliano; Leippold, Markus (2 January 2021). approach to predict the success of bank telemarketing". Decision
"CLIMATE-FEVER: A Dataset for Verification of Real-World Climate Support Systems. 62: 22–31. doi:10.1016/j.dss.2014.03.001 (https://
Claims". arXiv:2012.00614 (https://arxiv.org/abs/2012.00614) [cs.CL doi.org/10.1016%2Fj.dss.2014.03.001). hdl:10071/9499 (https://hdl.
(https://arxiv.org/archive/cs.CL)]. handle.net/10071%2F9499). S2CID 14181100 (https://api.semantic
555. "climate-news-db" (http://www.climate-news-db.com/). www.climate- scholar.org/CorpusID:14181100).
news-db.com. Retrieved 3 February 2023.
577. Payne, Richard D.; Mallick, Bani K. (2014). "Bayesian Big Data 593. Meek, Christopher, Bo Thiesson, and David Heckerman. "The
Classification: A Review with Complements". arXiv:1411.5653 (http Learning Curve Method Applied to Clustering (https://www.microsof
s://arxiv.org/abs/1411.5653) [stat.ME (https://arxiv.org/archive/stat.M t.com/en-us/research/wp-content/uploads/2001/01/lc-aistats.pdf)."
E)]. AISTATS. 2001.
578. Akbilgic, Oguz; Bozdogan, Hamparsum; Balaban, M. Erdal (2014). 594. Fanaee-T, Hadi; Gama, Joao (2013). "Event labeling combining
"A novel Hybrid RBF Neural Networks model as a forecaster". ensemble detectors and background knowledge" (http://repositorio.i
Statistics and Computing. 24 (3): 365–375. doi:10.1007/s11222- nesctec.pt/handle/123456789/3506). Progress in Artificial
013-9375-7 (https://doi.org/10.1007%2Fs11222-013-9375-7). Intelligence. 2 (2–3): 113–127. doi:10.1007/s13748-013-0040-3 (htt
S2CID 17764829 (https://api.semanticscholar.org/CorpusID:177648 ps://doi.org/10.1007%2Fs13748-013-0040-3). S2CID 3345087 (http
29). s://api.semanticscholar.org/CorpusID:3345087).
579. Jabin, Suraiya. "Stock market prediction using feed-forward artificial 595. Giot, Romain, and Raphaël Cherrier. "Predicting bikeshare system
neural network (http://citeseerx.ist.psu.edu/viewdoc/download?doi= usage up to one day ahead (https://hal.archives-ouvertes.fr/docs/01/
10.1.1.677.8985&rep=rep1&type=pdf)." Int. J. Comput. Appl. (IJCA) 06/59/83/PDF/paper_final.pdf)." Computational intelligence in
99.9 (2014). vehicles and transportation systems (CIVTS), 2014 IEEE
580. Yeh, I-Cheng; Che-hui, Lien (2009). "The comparisons of data symposium on. IEEE, 2014.
mining techniques for the predictive accuracy of probability of 596. Zhan, Xianyuan; et al. (2013). "Urban link travel time estimation
default of credit card clients". Expert Systems with Applications. 36 (2): 2473–2480. doi:10.1016/j.eswa.2007.12.020 (https://doi.org/10.1016%2Fj.eswa.2007.12.020).
using large-scale taxi data with partial information". Transportation Research Part C: Emerging Technologies. 33: 37–49. doi:10.1016/j.trc.2013.04.001 (https://doi.org/10.1016%2Fj.trc.2013.04.001).
581. Lin, Shu Ling (2009). "A new two-stage hybrid approach of credit risk in banking industry". Expert Systems with Applications. 36 (4): 8333–8341. doi:10.1016/j.eswa.2008.10.015 (https://doi.org/10.1016%2Fj.eswa.2008.10.015).
582. Pelckmans, Kristiaan; et al. (2005). "The differogram: Non-parametric noise variance estimation and its use for model selection". Neurocomputing. 69 (1): 100–122. doi:10.1016/j.neucom.2005.02.015 (https://doi.org/10.1016%2Fj.neucom.2005.02.015).
583. Bay, Stephen D.; et al. (2000). "The UCI KDD archive of large data sets for data mining research and experimentation". ACM SIGKDD Explorations Newsletter. 2 (2): 81–85. CiteSeerX 10.1.1.15.9776 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.9776). doi:10.1145/380995.381030 (https://doi.org/10.1145%2F380995.381030). S2CID 534881 (https://api.semanticscholar.org/CorpusID:534881).
584. Lucas, D. D.; et al. (2015). "Designing optimal greenhouse gas observing networks that consider performance and cost" (https://doi.org/10.5194%2Fgi-4-121-2015). Geoscientific Instrumentation, Methods and Data Systems. 4 (1): 121. Bibcode:2015GI......4..121L (https://ui.adsabs.harvard.edu/abs/2015GI......4..121L). doi:10.5194/gi-4-121-2015 (https://doi.org/10.5194%2Fgi-4-121-2015).
585. Pales, Jack C.; Keeling, Charles D. (1965). "The concentration of atmospheric carbon dioxide in Hawaii". Journal of Geophysical Research. 70 (24): 6053–6076. Bibcode:1965JGR....70.6053P (https://ui.adsabs.harvard.edu/abs/1965JGR....70.6053P). doi:10.1029/jz070i024p06053 (https://doi.org/10.1029%2Fjz070i024p06053).
586. Sigillito, Vincent G., et al. "Classification of radar returns from the ionosphere using neural networks." Johns Hopkins APL Technical Digest 10.3 (1989): 262–266.
587. Zhang, Kun, and Wei Fan. "Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.218.9860&rep=rep1&type=pdf)." Knowledge and Information Systems 14.3 (2008): 299–326.
588. Reich, Brian J., Montserrat Fuentes, and David B. Dunson. "Bayesian spatial quantile regression (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3583387/)." Journal of the American Statistical Association (2012).
589. Kohavi, Ron (1996). "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid". KDD. 96.
590. Oza, Nikunj C., and Stuart Russell. "Experimental comparisons of online and batch versions of bagging and boosting." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001.
591. Bay, Stephen D (2001). "Multivariate discretization for set mining". Knowledge and Information Systems. 3 (4): 491–512. CiteSeerX 10.1.1.217.921 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.217.921). doi:10.1007/pl00011680 (https://doi.org/10.1007%2Fpl00011680). S2CID 10945544 (https://api.semanticscholar.org/CorpusID:10945544).
592. Ruggles, Steven (1995). "Sample designs and sampling errors". Historical Methods. 28 (1): 40–46. doi:10.1080/01615440.1995.9955312 (https://doi.org/10.1080%2F01615440.1995.9955312).
597. Moreira-Matias, Luis; et al. (2013). "Predicting taxi–passenger demand using streaming data" (http://repositorio.inesctec.pt/handle/123456789/5356). IEEE Transactions on Intelligent Transportation Systems. 14 (3): 1393–1402. doi:10.1109/tits.2013.2262376 (https://doi.org/10.1109%2Ftits.2013.2262376). S2CID 14764358 (https://api.semanticscholar.org/CorpusID:14764358).
598. Hwang, Ren-Hung; Hsueh, Yu-Ling; Chen, Yu-Ting (2015). "An effective taxi recommender system based on a spatio-temporal factor analysis model". Information Sciences. 314: 28–40. doi:10.1016/j.ins.2015.03.068 (https://doi.org/10.1016%2Fj.ins.2015.03.068).
599. H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. Big data and its technical challenges. Commun. ACM, 57(7):86–94, July 2014.
600. Caltrans PeMS (http://pems.dot.ca.gov/)
601. Meusel, Robert, et al. "The Graph Structure in the Web—Analyzed on Different Aggregation Levels (https://www.nowpublishers.com/article/OpenAccessDownload/JWS-0003)." The Journal of Web Science 1.1 (2015).
602. Kushmerick, Nicholas. "Learning to remove internet advertisements (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.35.5686&rep=rep1&type=pdf)." Proceedings of the third annual conference on Autonomous Agents. ACM, 1999.
603. Fradkin, Dmitriy, and David Madigan. "Experiments with random projections for machine learning (https://www.researchgate.net/profile/Dmitriy_Fradkin/publication/2573186_Experiments_with_Random_Projections_for_Machine_Learning/links/0fcfd50b6230aaf309000000.pdf)." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003.
604. This data was used in the American Statistical Association Statistical Graphics and Computing Sections 1999 Data Exposition.
605. Ma, Justin, et al. "Identifying suspicious URLs: an application of large-scale online learning (https://cseweb.ucsd.edu/~voelker/pubs/mal-url-icml09.pdf)." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.
606. Levchenko, Kirill, et al. "Click trajectories: End-to-end analysis of the spam value chain (http://www.icir.org/christian/publications/2011-oakland-trajectory.pdf)." Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 2011.
607. Mohammad, Rami M., Fadi Thabtah, and Lee McCluskey. "An assessment of features related to phishing websites using an automated technique (http://eprints.hud.ac.uk/16229/1/The_7th_ICITST_2012_Conference_-An_Assessment_of_Features_Related_to_Phishing_Websites_using_an_Automated_Technique.pdf)." Internet Technology And Secured Transactions, 2012 International Conference for. IEEE, 2012.
608. Singh, Ashishkumar, et al. "Clustering Experiments on Big Transaction Data for Market Segmentation (https://dl.acm.org/citation.cfm?id=2644161)." Proceedings of the 2014 International Conference on Big Data Science and Computing. ACM, 2014.
609. Bollacker, Kurt, et al. "Freebase: a collaboratively created graph database for structuring human knowledge (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.538.7139&rep=rep1&type=pdf)." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
610. Mintz, Mike, et al. "Distant supervision for relation extraction without labeled data (https://www.aclweb.org/anthology/P09-1113)." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2. Association for Computational Linguistics, 2009.
611. Mesterharm, Chris, and Michael J. Pazzani. "Active learning using on-line algorithms (http://research.cs.rutgers.edu/~mesterha/active-online.pdf) Archived (https://web.archive.org/web/20170922013803/http://research.cs.rutgers.edu/~mesterha/active-online.pdf) 22 September 2017 at the Wayback Machine." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.
612. Wang, Shusen; Zhang, Zhihua (2013). "Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling" (http://www.jmlr.org/papers/volume14/wang13c/wang13c.pdf) (PDF). The Journal of Machine Learning Research. 14 (1): 2729–2769. arXiv:1303.4207 (https://arxiv.org/abs/1303.4207). Bibcode:2013arXiv1303.4207W (https://ui.adsabs.harvard.edu/abs/2013arXiv1303.4207W).
613. "The Pile" (https://pile.eleuther.ai/). pile.eleuther.ai. Retrieved 14 April 2022.
614. "JSON Lines" (https://jsonlines.org/). jsonlines.org. Retrieved 14 April 2022.
615. Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 (https://arxiv.org/abs/2101.00027) [cs.CL (https://arxiv.org/archive/cs.CL)].
616. Cohen, Vanya. "OpenWebTextCorpus" (https://skylion007.github.io/OpenWebTextCorpus/). OpenWebTextCorpus. Retrieved 9 January 2023.
617. "openwebtext · Datasets at Hugging Face" (https://huggingface.co/datasets/openwebtext). huggingface.co. 16 November 2022. Retrieved 9 January 2023.
618. Cattral, Robert; Oppacher, Franz; Deugo, Dwight (2002). "Evolutionary data mining with automatic rule generalization" (https://web.archive.org/web/20190806015013/https://pdfs.semanticscholar.org/c068/ea7807367573f4b5f98c0681fca665e9ef74.pdf) (PDF). Recent Advances in Computers, Computing and Communications: 296–300. S2CID 18625415 (https://api.semanticscholar.org/CorpusID:18625415). Archived from the original (https://pdfs.semanticscholar.org/c068/ea7807367573f4b5f98c0681fca665e9ef74.pdf) (PDF) on 6 August 2019.
619. Burton, Ariel N.; Kelly, Paul H.J. (2006). "Performance prediction of paging workloads using lightweight tracing". Future Generation Computer Systems. Elsevier BV. 22 (7): 784–793. doi:10.1016/j.future.2006.02.003 (https://doi.org/10.1016%2Fj.future.2006.02.003). ISSN 0167-739X (https://www.worldcat.org/issn/0167-739X).
620. Bain, Michael; Muggleton, Stephen (1994). "Learning optimal chess strategies". Machine Intelligence. Oxford University Press, Inc. 13.
621. Quinlan, J. R. (1983). "Learning efficient classification procedures and their application to chess end games". Machine Learning: An Artificial Intelligence Approach. 1: 463–482. doi:10.1007/978-3-662-12405-5_15 (https://doi.org/10.1007%2F978-3-662-12405-5_15). ISBN 978-3-662-12407-9.
622. Shapiro, Alen D. (1987). Structured induction in expert systems. Addison-Wesley Longman Publishing Co., Inc.
623. Matheus, Christopher J.; Rendell, Larry A. (1989). "Constructive Induction on Decision Trees" (http://www.academia.edu/download/40413240/Constructive_Induction_On_Decision_Trees20151126-4470-tjt71n.pdf) (PDF). IJCAI. 89.
624. Belsley, David A., Edwin Kuh, and Roy E. Welsch. Regression diagnostics: Identifying influential data and sources of collinearity. Vol. 571. John Wiley & Sons, 2005.
625. Ruotsalo, Tuukka; Aroyo, Lora; Schreiber, Guus (2009). "Knowledge-based linguistic annotation of digital cultural heritage collections" (http://dare.ubvu.vu.nl/bitstream/handle/1871/24407/243319.pdf?sequence=3) (PDF). IEEE Intelligent Systems. 24 (2): 64–75. doi:10.1109/MIS.2009.32 (https://doi.org/10.1109%2FMIS.2009.32). hdl:1871.1/9f6091aa-9596-46a9-9251-f11edeeb28b7 (https://hdl.handle.net/1871.1%2F9f6091aa-9596-46a9-9251-f11edeeb28b7). S2CID 6667472 (https://api.semanticscholar.org/CorpusID:6667472).
626. Li, Lihong; Chu, Wei; Langford, John; Wang, Xuanhui (2011). "Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms". Proceedings of the fourth ACM international conference on Web search and data mining. pp. 297–306. arXiv:1003.5956 (https://arxiv.org/abs/1003.5956). doi:10.1145/1935826.1935878 (https://doi.org/10.1145%2F1935826.1935878). ISBN 9781450304931. S2CID 744200 (https://api.semanticscholar.org/CorpusID:744200).
627. Yeung, Kam Fung, and Yanyan Yang. "A proactive personalized mobile news recommendation system (https://ieeexplore.ieee.org/abstract/document/5633837/)." Developments in E-systems Engineering (DESE), 2010. IEEE, 2010.
628. Gass, Susan E.; Roberts, J. Murray (2006). "The occurrence of the cold-water coral Lophelia pertusa (Scleractinia) on oil and gas platforms in the North Sea: colony growth, recruitment and environmental controls on distribution". Marine Pollution Bulletin. 52 (5): 549–559. Bibcode:2006MarPB..52..549G (https://ui.adsabs.harvard.edu/abs/2006MarPB..52..549G). doi:10.1016/j.marpolbul.2005.10.002 (https://doi.org/10.1016%2Fj.marpolbul.2005.10.002). PMID 16300800 (https://pubmed.ncbi.nlm.nih.gov/16300800).
629. Gionis, Aristides; Mannila, Heikki; Tsaparas, Panayiotis (2007). "Clustering aggregation". ACM Transactions on Knowledge Discovery from Data. 1 (1): 4. CiteSeerX 10.1.1.709.528 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.709.528). doi:10.1145/1217299.1217303 (https://doi.org/10.1145%2F1217299.1217303). S2CID 433708 (https://api.semanticscholar.org/CorpusID:433708).
630. Obradovic, Zoran, and Slobodan Vucetic. Challenges in Scientific Data Mining: Heterogeneous, Biased, and Large Samples. Technical Report, Center for Information Science and Technology Temple University, 2004.
631. Van Der Putten, Peter; van Someren, Maarten (2000). "CoIL challenge 2000: The insurance company case". Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report. 9: 1–43.
632. Mao, K. Z. (2002). "RBF neural network center selection based on Fisher ratio class separability measure". IEEE Transactions on Neural Networks. 13 (5): 1211–1217. doi:10.1109/tnn.2002.1031953 (https://doi.org/10.1109%2Ftnn.2002.1031953). PMID 18244518 (https://pubmed.ncbi.nlm.nih.gov/18244518).
633. Olave, Manuel; Rajkovic, Vladislav; Bohanec, Marko (1989). "An application for admission in public school systems" (http://kt.ijs.si/MarkoBohanec/pub/Nursery89.pdf) (PDF). Expert Systems in Public Administration. 1: 145–160.
634. Lizotte, Daniel J.; Madani, Omid; Greiner, Russell (2012). "Budgeted Learning of Naive-Bayes Classifiers". arXiv:1212.2472 (https://arxiv.org/abs/1212.2472) [cs.LG (https://arxiv.org/archive/cs.LG)].
635. Lebowitz, Michael (1986). Concept learning in a rich input domain: Generalization-based memory (https://books.google.com/books?id=f9RylgKpHZsC&q=%22Concept+learning+in+a+rich+input+domain:+Generalization-based+memory%22&pg=PA193). Machine Learning: An Artificial Intelligence Approach. Vol. 2. pp. 193–214. ISBN 9780934613002.
636. Yeh, I-Cheng; Yang, King-Jang; Ting, Tao-Ming (2009). "Knowledge discovery on RFM model using Bernoulli sequence". Expert Systems with Applications. 36 (3): 5866–5871. doi:10.1016/j.eswa.2008.07.018 (https://doi.org/10.1016%2Fj.eswa.2008.07.018).
637. Lee, Wen-Chen; Cheng, Bor-Wen (2011). "An intelligent system for improving performance of blood donation" (http://www.airitilibrary.com/Publication/alDetailedMesh?docid=10220690-201104-201105050019-201105050019-173-185). Journal of Quality Vol. 18 (2): 173.
638. Schmidtmann, Irene, et al. "Evaluation des Krebsregisters NRW Schwerpunkt Record Linkage (http://www.krebsregister-nrw.de/fileadmin/user_upload/dokumente/Evaluation/EKR_NRW_Evaluation_Abschlussbericht_2009-06-11.pdf)." Abschlußbericht vom 11 (2009).
639. Sariyar, Murat; Borg, Andreas; Pommerening, Klaus (2011). "Controlling false match rates in record linkage using extreme value theory". Journal of Biomedical Informatics. 44 (4): 648–654. doi:10.1016/j.jbi.2011.02.008 (https://doi.org/10.1016%2Fj.jbi.2011.02.008). PMID 21352952 (https://pubmed.ncbi.nlm.nih.gov/21352952).
640. Candillier, Laurent, and Vincent Lemaire. "Design and Analysis of the Nomao challenge Active Learning in the Real-World (https://web.archive.org/web/20181206102406/https://pdfs.semanticscholar.org/1647/fc91cfe3e68ef3c41d727b7292ce20482b11.pdf)." Proceedings of the ALRA: Active Learning in Real-world Applications, Workshop ECML-PKDD. 2012.
641. Marquez, Ivan Garrido. "A Domain Adaptation Method for Text Classification based on Self-adjusted Training Approach (http://ccc.inaoep.mx/~mmontesg/tesis%20estudiantes/TesisMaestria-IvanGarrido.pdf)." (2013).
642. Nagesh, Harsha S., Sanjay Goil, and Alok N. Choudhary. "Adaptive Grids for Clustering Massive Data Sets." SDM. 2001.
643. Kuzilek, Jakub, et al. "OU Analyse: analysing at-risk students at The Open University (http://oro.open.ac.uk/42529/1/__userdata_documents4_ctb44_Desktop_analysing-at-risk-students-at-open-university.pdf)." Learning Analytics Review (2015): 1–16.
644. Siemens, George, et al. Open Learning Analytics: an integrated & modularized platform (http://search.ror.unisa.edu.au/record/UNISA_ALMA11143300720001831/media/digital/open/9915909179101831/12143300710001831/13143328550001831/pdf). Diss. Open University Press, 2011.
645. Barlacchi, Gianni; De Nadai, Marco; Larcher, Roberto; Casella, Antonio; Chitic, Cristiana; Torrisi, Giovanni; Antonelli, Fabrizio; Vespignani, Alessandro; Pentland, Alex; Lepri, Bruno (2015). "A multi-source dataset of urban life in the city of Milan and the Province of Trentino" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4622222). Scientific Data. 2: 150055. Bibcode:2015NatSD...250055B (https://ui.adsabs.harvard.edu/abs/2015NatSD...250055B). doi:10.1038/sdata.2015.55 (https://doi.org/10.1038%2Fsdata.2015.55). ISSN 2052-4463 (https://www.worldcat.org/issn/2052-4463). PMC 4622222 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4622222). PMID 26528394 (https://pubmed.ncbi.nlm.nih.gov/26528394).
646. Vanschoren J, van Rijn JN, Bischl B, Torgo L (2013). "OpenML: networked science in machine learning". SIGKDD Explorations. 15 (2): 49–60. arXiv:1407.7722 (https://arxiv.org/abs/1407.7722). doi:10.1145/2641190.2641198 (https://doi.org/10.1145%2F2641190.2641198). S2CID 4977460 (https://api.semanticscholar.org/CorpusID:4977460).
647. Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH (2017). "PMLB: a large benchmark suite for machine learning evaluation and comparison" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5725843). BioData Mining. 10: 36. arXiv:1703.00512 (https://arxiv.org/abs/1703.00512). Bibcode:2017arXiv170300512O (https://ui.adsabs.harvard.edu/abs/2017arXiv170300512O). doi:10.1186/s13040-017-0154-4 (https://doi.org/10.1186%2Fs13040-017-0154-4). PMC 5725843 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5725843). PMID 29238404 (https://pubmed.ncbi.nlm.nih.gov/29238404).
648. "Off The Shelf Datasets" (https://appen.com/off-the-shelf-datasets/). appen.com. Appen. Retrieved 30 December 2020.
649. "Open Source Datasets" (https://appen.com/resources/datasets/). appen.com. Appen. Retrieved 30 December 2020.