We are all familiar with the concept of a database. For most developers, a database is a magic black box to which they can send a piece of code (a query) and that, as a response, returns a bunch of data that is the answer to the query, based on the information hidden inside the database.

We can thus cleanly model a database that stores values of type T as a higher-order function database ∷ (T → R) → [(T, R)] that takes a query function of type T → R as argument, and produces a result set of tuples of input values and results.
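As a sketch of this model (Python standing in for the type notation above; the employee records are invented for illustration), a database over values of type T is just a function that takes a query T → R and returns the result set [(T, R)]:

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def database(values: Iterable[T]) -> Callable[[Callable[[T], R]], List[Tuple[T, R]]]:
    """Model a store of values of type T as a higher-order function:
    given a query T -> R, return the result set [(T, R)]."""
    stored = list(values)
    def run(query: Callable[[T], R]) -> List[Tuple[T, R]]:
        return [(t, query(t)) for t in stored]
    return run

# A projection query over hypothetical employee records:
# it returns each employee paired with its salary.
employees = database([{"name": "ada", "salary": 100},
                      {"name": "bob", "salary": 80}])
salaries = employees(lambda e: e["salary"])
```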
An example query function is a projection function that produces an employee's salary from an entire employee record, and returns the employee plus its salary. Note that our notion of query is slightly unusual in the sense that we return both the original item and the result.

Typical queries are multi-staged compositions of elemental filters, projections, grouping, aggregation, and ordering operators.

A filter query that operates on a database with items of type T uses a predicate P ∷ T → Bool to reject values for which the predicate is false. After filtering out uninteresting items, a query then usually groups the remaining elements according to some partition function G ∷ T → K into a key space K, which yields a nested collection of equivalence classes of values of type T that map into the same representative key of type K, and then extracts the information we are interested in from each item in the group using a projection function X ∷ T → S. We optionally aggregate each group (whose items are now of type S after the projection step) into a single value using an aggregation function A ∷ [S] → R.
A relational database is not able to deal with nested collections and requires the groups to be aggregated into a single value; in the general case, however, we can aggregate collections into other collections.
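A minimal sketch of this multi-stage query shape (Python; the employee records and the particular choices of P, G, X, and A are made up for illustration):

```python
from collections import defaultdict

def query(items, P, G, X, A):
    """Filter with predicate P, group by partition function G: T -> K,
    project each item with X: T -> S, aggregate each group with A: [S] -> R."""
    groups = defaultdict(list)
    for t in items:
        if P(t):                       # filter
            groups[G(t)].append(X(t))  # group + project
    return [(k, A(ss)) for k, ss in groups.items()]  # aggregate per group

# Example: total salary per department for active employees.
employees = [
    {"dept": "eng", "salary": 100, "active": True},
    {"dept": "eng", "salary": 90,  "active": True},
    {"dept": "hr",  "salary": 70,  "active": False},
]
totals = query(employees,
               P=lambda e: e["active"],
               G=lambda e: e["dept"],
               X=lambda e: e["salary"],
               A=sum)
# totals == [("eng", 190)]
```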
Finally, we sort the resulting collection of values of type R using an ordering, or distance, function O ∷ (R, R) → ℝ to obtain the final result.
Unfortunately, in many cases, we do not know how to implement the various filter, project, grouping, and ordering functions that make up a query. For example, if we are querying our email, we might want to first filter out all spam messages. While we know (intuitively) what is and what isn't spam, it is impossible to define an exact algorithmic implementation of a predicate IsSpamEmail. However, given a set of examples of spam and non-spam emails represented as a collection of examples of type [(Email, Bool)], we can use Bayesian classification [http://en.wikipedia.org/wiki/Naive_Bayes_classifier], Decision Trees [http://en.wikipedia.org/wiki/Decision_tree], or other off-the-shelf machine learning algorithms to infer a function of type Email → Bool.

If we slightly squint at filtering a database of emails using a predicate IsSpamEmail, so that instead of throwing away emails that are spam it returns a collection [(Email, Bool)] of emails paired with the result of applying IsSpam (we can always group the results later and throw away the spam messages then, just like our email client puts spam messages in a special mailbox), then we see that query is the dual of machine learning. Given a function of type Email → Bool, a query produces a collection of pairs of type [(Email, Bool)].
Similarly, when we want to group a set of values, we often don't know exactly how to define the partition function G ∷ T → K either, but we are able to identify small groups of mean values that are representative of the groups we want to construct, represented as a k-tuple of collections ([T], …, [T]). Given such a k-tuple, we can use the k-means clustering [http://en.wikipedia.org/wiki/K-means_clustering] algorithm to create a grouping function G ∷ T → ℕ. For example, a social network like Google+ or Facebook can cluster your existing friends (and recommend new friends) into circles or categories based on an initial classification into k groups.
Note that the type ([T], …, [T]) of k-tuples of collections of type T is isomorphic to the type [(T, ℕ)] of collections of sets of pairs of values of type T and integers, and hence it precisely fits the pattern of synthesizing a function T → ℕ from a set of example pairs [(T, ℕ)].
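A toy version of this construction (Python, over one-dimensional numeric values, with the seeds given as the k-tuple of collections mentioned above; a real implementation would use an off-the-shelf library rather than this hand-rolled loop):

```python
def kmeans_grouping(seeds, values, iterations=10):
    """Given a k-tuple of seed collections ([T], ..., [T]), run k-means over
    numeric values and return a grouping function G: T -> int (group index)."""
    centroids = [sum(s) / len(s) for s in seeds]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid; keep the old one if its cluster went empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return lambda v: min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))

# Two seed groups near 1 and 10 yield a grouping function over the values.
G = kmeans_grouping(seeds=([1.0], [10.0]), values=[0, 1, 2, 9, 10, 11])
# G(1.5) == 0, G(10.5) == 1
```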
On the query side, grouping takes a function T → ℕ and produces a nested collection of type [([T], ℕ)], which is isomorphic to [(T, ℕ)] assuming each group has a unique key; again, this is the dual of the learning algorithm. Since automatic clustering is based on the similarity of values, a common trick is to first project T to another type K on which you then cluster.

To sort a collection of values of type T, we need a measure function d ∷ (T, T) → ℝ. But often, presented with two values of type T, we can tell the distance between them; that is, we can produce a collection of pairs [((T, T), ℝ)] that compares individual values of type T.
When you sign up for Netflix, it will ask you a few questions about movies. Your answers effectively seed the training set for their ranking algorithm that will suggest which movies you will most enjoy. Unsurprisingly, we can turn that collection into a distance function (T, T) → ℝ using one of many ranking algorithms [http://en.wikipedia.org/wiki/Learning_to_rank]. As with grouping, it is common to first project T into another type S that is easier to train. Based on your actual picks of movies, Netflix will use this information to incrementally refine the inferred distance function.
All the above machine learning algorithms are examples of supervised learning methods [http://en.wikipedia.org/wiki/Supervised_learning]. Category theorists define a "(partial) function" of type T → R as "a subset of the cartesian product (T, R) such that no element of T appears more than once in the subset" (we'll say more about this restriction later).
The supervised learning training set is a noised-up sample of the function. The Machine Learning algorithm uses heuristics to infer (a rule for) the function from the noisy sample. Often k-means clustering is not considered a supervised learning method, but that reflects the case where we start with a k-tuple with empty collections of seed values.

Selector functions do not fit into the supervised learning model; they are the prime example of unsupervised learning.
Instead of giving a training set of example pairs [(S, R)], we fix the result type R and infer the entity-extraction function from S to R. In many cases, entity extraction uses domain knowledge and natural language processing to extract phone numbers, zip codes, names, etc. from unstructured text.
Finding aggregation functions is usually not a problem, and such functions are chosen from a standard set like summation, average, etc. Where aggregation becomes interesting is when dealing with infinite streams, in which case we need to apply approximate streaming algorithms [http://en.wikipedia.org/wiki/Streaming_algorithm] that run in constant space.
For example, given an infinite, or very large, stream, we cannot exactly determine the n most frequently occurring items in the stream, and we must use techniques like the space-saving algorithm to compute a running approximation.
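A compact sketch of the space-saving idea (Python; the sample stream is invented): keep a bounded number of counters, and when a new item arrives with all counters occupied, evict the smallest counter and let the newcomer inherit its count, so estimates can over-count but never under-count.

```python
def space_saving(stream, capacity):
    """Approximate the most frequently occurring items in a stream using
    constant space: keep at most `capacity` counters; on overflow, evict
    the minimum counter and inherit its count (an over-estimate)."""
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
        else:
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts

approx = space_saving("aababcabad", capacity=2)
# "a" (true count 5) survives with an exact count here; rarer items
# may carry inflated counts inherited from evicted counters.
```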
Bloom filters, count-min sketch, and "sketching" in general are terms referring to algorithms for finding approximate answers in big data. But that is a topic outside the scope of the current paper.
The above examples show that we can view supervised machine learning as co-query processing that takes as input a training set [(T, R)] of pairs of examples, and turns this into a function of type T → R. Machine learning thus is the dual of query. From a developer's perspective, we can also say that machine learning is automated Test-Driven Development (TDD), wherein a human developer generalizes from a small set of specific examples to a full working program.

Machine Learning is much more exciting than query. Query specializes from a large set of data into a small fraction (by filtering, projection, and aggregation), whereas Machine Learning generalizes from a small set of examples to handle a large set of inputs.
Caveat Emptor
Many people will object that we have painted a very naive picture of machine learning and that we ignore many of the essential elements of using machine learning such as cross-validation, ... That is a totally fair criticism; however, instead of pointing at those issues as a reason that black-box machine learning won't work, we should take them in the spirit of Hamming's advice and start working to eliminate these obstacles to adoption.
When (relational) databases were introduced in the 1970s, using them required tweaking a lot of parameters and setting up indexes, and papers about databases were full of Greek symbols and mathematics that was hard to understand for practitioners. Only specialized DBAs (the data scientists of the 1970s) could operate a database. Since then, however, the database has been fully democratized, and today any toddler can install and use a relational database such as SQLite without any knowledge of mathematical theory such as relational algebra or implementation details like B-trees and indexes. The role of DBA is now in the same class of obsolete occupations as blacksmith and newspaper delivery boy.
For Machine Learning we are not there yet. Especially the literature on Machine Learning is rife with obscure terminology and hard mathematics. However, things are changing fast. Countless hopeful startups and established companies like Google and Microsoft are offering machine learning as a service that brings the power of turning data into code to the masses. Within a few years, we believe, using machine learning will be as easy for practitioners as using databases is today.
Just like databases are self-tuning, machine learning services and APIs will take care of all the gory details that currently prevent mass adoption of the technology by practitioners. We should also not forget that, just like query optimization, every Machine Learning algorithm is essentially a hack that uses heuristics to infer the best possible function from noisy data.
We have explained machine learning as the dual of pull-based querying of persisted data, but machine learning really shines when we use it to query streaming data. Relational databases leverage the fact that data is fairly static (no need to rebuild indexes) and has highly predictable statistics (used by the query optimizer). For streaming data, patterns are constantly changing over time (weather or seasons change, trending Twitter topics come and go, ...), so new queries must be generated continuously to reflect changes in the characteristics of the input streams.
Streaming data is often generated by sensors, so it already starts out as an approximation, and it is used in non-critical situations: we are not talking about airplanes, cars, certain medical equipment, and nuclear reactors, but about home thermostats, personal health-monitoring devices, Twitter and Facebook feeds, etc. Hence it is not sufficiently valuable to justify expensive cleansing, normalization, and curation like mission-critical enterprise data in SQL databases. Lastly, the results of processing streaming data must be specialized for each individual user, so a one-size-fits-all approach or a fixed library of standard queries (calculate net salary and taxes for all employees) will not work.
When we use machine learning in the context of streaming data processing, we are learning the transition function f ∷ (S, T) → (S, R) of a Mealy automaton [http://en.wikipedia.org/wiki/Mealy_machine], or more generally a hidden Markov model [http://en.wikipedia.org/wiki/Hidden_Markov_model], that allows the automaton to react to current events, from its (desired) behavior [((S, T), (S, R))] on past events.
In the next sections we will have a closer look at a couple of machine learning services, such as the Google Prediction API, BigML and TextRazor, to process streaming data from weather predictions and Google Calendar updates that tell us if we should play golf or not. Another interesting approach is to use genetic algorithms [http://en.wikipedia.org/wiki/Genetic_algorithm], like Eureqa, to produce a (syntactical) mathematical expression that mimics input data, but that is outside the scope of this paper.
The Google Prediction API is a service that (incrementally) accepts training data [(T, R)] in the form of CSV files. Each row in the input file represents an example pair consisting of the serialized representation of type T, plus an answer R, which must be either categorical (a finite set represented by strings) or numeric. Once the model is trained, the prediction API acts as a remote implementation of a function of type T → R that, given a serialized value of type T, will produce a prediction of type R.

The Google Prediction API does not specify which machine learning algorithms are used under the covers. It only allows users to specify the type of predictor as categorical or regression, or infers this from the type of the answers in the training data (when the answer type is numeric, it will use regression).
To train our model, we first pull historical data from Google Calendar of type Iterable[Event] and the NOAA historical weather data to obtain a list of example pairs of weather information and a boolean indicating whether we cancelled our golf appointment for that day or not. In pseudo-code (the actual code you need to write is cluttered by non-essential OAuth authentication to connect to the Google API) this looks something like this:
val trainingData: List[{ val weather: Weather; val cancelled: Boolean }] =
  myCalendar.events(start = …, end = …)
    .filter(e => isAboutGolf(e.description))
    .map(e => new {
      val weather = noaa.lookup(e.location, e.start)
      val cancelled = e.status == cancelled
    })
    .toList
The function isAboutGolf is defined below using entity extraction, and is an example of unsupervised machine learning. Given this training data in the form of a list of example input/output pairs, we can submit it to the Google prediction API to get back a function from Weather to Boolean:
val play: Weather => Boolean = createModel(trainingData)
The actual prediction API obviously requires more elaborate code than this, but, in essence, that is precisely what it provides: a way to turn data into code. Given the model, we switch to reacting to real-time weather updates of type Observable[Forecast], from which we generate a stream of events that we must cancel on our calendar:
val shouldCancel: Observable[Event] =
  weatherUpdates.flatMap(forecast =>
    myCalendar.events(date = forecast.date)
      .filter(e => isAboutGolf(e.description) && !play(forecast)))

shouldCancel.subscribe(e => e.status = cancelled)
The big question is whether we should automatically cancel our golf appointments based on the generated model. In other words, how sure can we be that the model gives us the correct answer? For categorical models, the Google prediction API (and in fact the other APIs we illustrate below) returns, in addition to the most likely answer, a sorted list of alternative answers in decreasing probability. Slightly simplified, the model is a representation of a function of type T → [R] that, given a value of type T, returns a discrete probability distribution for the possible answers in R.
In some sense this is a more realistic reflection of what is happening: "Machine Learning is statistical function inversion". The training data can contain multiple sample output values for the same input value, i.e. the training data [(T, R)] is effectively of the form [(T, [R])], and hence we expect the synthesized model to be a nondeterministic function of type T → [R]. For databases, it is very common to have a query of the form T → [R], which is then flatmapped over the data to yield a result of type [R], or, using tupling, a result set of type [(T, R)].
Learning by counting: Bayesian Classification
The conditional probability P(R|T) of R given T is really a different notation for a monadic function of type T → ℙ(R) that maps inputs T to probability mass functions ℙ(R) for the output. Naively, we can represent probability mass functions as (sorted) collections [(R, ℝ)] that associate a real-valued probability with each value in R. For practical applications, we need a more computationally efficient representation, for instance using sampling functions [http://www.keithschwarz.com/darts-dice-coins/], but for now the concrete representation using lists of pairs will be much more illustrative. We will use special list brackets to indicate that the sum of the probabilities is normalized to 1.
Given a probability mass function pt ∈ ℙ(T) and a conditional probability f(R|T) = T → ℙ(R), we can use monadic bind, or flatMap as developers call it, to calculate a posterior probability mass function of type ℙ(R), or, using tupling, ℙ(T × R), as follows:

pt.flatMap(f, id) = [((t, r), p·pr) | (t, p) ∈ pt, (r, pr) ∈ f(t)]
Bayes' Law tells us that given an f(R|T) and a prior pt(T), the inverse of f is equivalent to

f⁻¹ ∷ R → ℙ(T) = r ↦ [(t, p) | ((t, r̄), p) ∈ pt.flatMap(f, id), r̄ = r]
From a training set training ∷ [(T, R)] we can calculate pt(T) and f(R|T) using (simple) counting, namely

pt = [(t, #[t̄ | (t̄, r) ∈ training, t̄ = t]) | t ∈ T]
f(t) = [(r, #[r̄ | (t̄, r̄) ∈ training, t̄ = t, r̄ = r]) | r ∈ R]
It could happen that the training data does not contain samples for every t ∈ T, but we can always pretend the training data is initialized with (t, ∗) for each value t ∈ T, where ∗ matches any value in R, and hence compute

pt = [(t, #[t̄ | (t̄, r) ∈ training, t̄ = t] + 1) | t ∈ T]
f(t) = [(r, #[r̄ | (t̄, r̄) ∈ training, t̄ = t, r̄ = r] + 1) | r ∈ R]

Smoothing the counts like this (artificially) makes everything total and prevents probabilities for unknown values from becoming zero.
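A small Python sketch of the counting construction and the Bayesian inversion above (dictionaries stand in for the normalized list-of-pairs representation; the weather/play training data is invented):

```python
from collections import Counter

def fit(training, ts, rs):
    """Learn the prior pt(T) and likelihood f(R|T) from example pairs by
    counting, with add-one smoothing over the declared domains ts and rs
    so that no probability becomes zero."""
    t_counts = Counter(t for t, _ in training)
    pair_counts = Counter(training)
    raw = {t: t_counts[t] + 1 for t in ts}
    z = sum(raw.values())
    pt = {t: c / z for t, c in raw.items()}
    def f(t):
        row = {r: pair_counts[(t, r)] + 1 for r in rs}
        zr = sum(row.values())
        return {r: c / zr for r, c in row.items()}
    return pt, f

def invert(pt, f, ts):
    """Bayes' law as flatMap: join the prior with the likelihood, keep the
    pairs whose result matches r, and renormalize over T."""
    def finv(r):
        joint = {t: pt[t] * f(t)[r] for t in ts}
        z = sum(joint.values())
        return {t: p / z for t, p in joint.items()}
    return finv

training = [("sunny", True), ("sunny", True), ("rainy", False)]
pt, f = fit(training, ts=["sunny", "rainy"], rs=[True, False])
posterior = invert(pt, f, ts=["sunny", "rainy"])(True)
# Given that we played, "sunny" is the more probable weather.
```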
In the real world, people happily use approximate answers to watch movies, buy books, find partners, etc. without any problems. But programming languages are stuck in the 1950s and demand exact answers like true and false. We should embrace computing with uncertainty instead of expecting exact answers. In fact, floating-point numbers are a prime example of computing with approximations that most of the time we don't even think about. The development of novel probabilistic programming languages should be an important focus for programming language researchers, instead of arguing about trivialities such as static versus dynamic types.
In the example code above we used a yet undefined function isAboutGolf: String => Boolean. To implement this function we will use the topic-tagging feature of the TextRazor API which, given a text string, creates a list of topics that it infers the text is about.

isAboutGolf: String => Boolean = input =>
  textRazor.topics(input).contains(golf)
Again, the actual code is a little more involved, but, in essence, it is as simple as this. If we submit the text "Hey Erik, fancy a few rounds of golf next week?", TextRazor will identify the word golf as referring to http://www.freebase.com/m/037hz.
Clustering: BigML
From a developer perspective, the BigML machine learning API is very similar to the Google Prediction API: you submit training data as a CSV file, train the model, and then query the model by sending it inputs to receive outputs. Where BigML differs is that it also provides a downloadable representation of the trained model in various forms, including as source code in various languages. Also, BigML offers more choice of learning algorithms, including clustering.
[[TODO: Code sample for https://bigml.com/developers/clusters]]
Conclusion
Sooner than we expect, we will all be familiar with the concept of machine learners. For most developers, a machine learner will be a magic black box to which they can send training data and that, as a response, will return a function (model) synthesized from the training data, based on the algorithms hidden inside the machine learner. Just as the democratization of databases in past decades has automated and hidden all complexity behind easy-to-use interfaces, the same is happening for machine learners, except at a much more rapid pace.
Just as no practitioner in a sane state of mind will implement their own database (rather leaving this to specialized companies), no one will implement their own machine learning algorithms either (but leave this to specialized companies). Easy access to machine learners will enable a new wave of smart applications that are highly specialized for each user, especially applications that work over streaming and (human) real-time data. The future of big data is Machine Learning.