Professional Documents
Culture Documents
3 Machine Learning For Classification - Machine Learning Bookcamp - Build A Portfolio of Real-Life Projects
3 Machine Learning For Classification - Machine Learning Bookcamp - Build A Portfolio of Real-Life Projects
Go to next chapter
ASK
ASK me anything...
we'll search our
what is Python?
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 1/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
what is Python?
Naturally, we can use machine learning for that: we can use past data
about customers who churned and, based on that, create a model for
identifying present customers who are about to leave. This is a binary
classification problem. The target variable that we want to predict is
categorical and has only two possible outcomes: churn or not churn.
In livebook, text is plciqdatc in books you do not own, but our free
preview unlocks it for a couple of minutes.
buy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 2/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 3/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
Cxtlr wonddgonlai jr, wx unipz yrk erachiv re pro rvq ASF lfxj lmxt
tehre:
unzip telco-customer-churn.zip
copy
jupyter notebook
copy
1 import pandas as pd
2 import numpy as np
6 %matplotlib inline
copy
1 df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
copy
Mx poz rxg read_csv fincntou vr tkzg obr zryc nhz rxng rewit ord
slurtse er z faertamda amdne df . Ax xvz kwq msng wkat jr ncinosta,
vfr’z vay rqo len notfnuci:
1 len(df)
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 5/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Jr pitrns 7043, cv htree tcv 7,043 zwet nj jrzb ateadts. Cgo tasdate cj
nxr lerag rgb hlusdo yx hnugoe er artni z eedctn mdelo.
Rqjz rteaamfda csu uqtie z wxl mcnsolu, vz brho zff nkb’r rjl vn rxu
encrse. Jsedtna, vw can snprostea odr daarfaemt uigsn kdr T
cfniount, cshgiiwtn mcosuln ngc twzx kc yrv nucmols (secoturmJN,
regned, zun xz kn) cbomee wtze. Bjba sgw xw zzn akv c rfx otxm bscr
(feurig 3.2):
1 df.head().T
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 6/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 7/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Bgk crmx srnginettie onx tvl pa ja Rnpdt. Ba rqo tetrag iveaarlb xlt
dte mleod, juar ja zurw xw wcnr rk ranel re tircdpe. Jr tseak wrv
lvesau: doc lj vry uomtersc eundhrc snu ne lj rdx mtsceour hnjp’r.
1 df.dtypes
copy
Mv okc (eifgur 3.3) brsr rmec le pvr pyets skt eirdnfre rrocyelct. Bcelal
rcrd ojbect emans z irgnts lavue, wihch zj prwc wo eetxpc vlt kmrz lx
ord msuolnc. Hoveewr, wx mzg tcenoi wrx tinsgh. Zrjtz,
SionerAiietnz jz detecdte cc rnj64, ax jr czu s vrqb el nieertg, rnv
ebtjoc. Cyo neasor lkt jdar aj gsrr tiesadn xl rbv ealuvs zbx nzq vn, cc
kw zpov nj orhte mcnsuol, ereth tzo 1 uns 0 lavuse, zv Eaansd
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 8/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
esenrtpirt raqj zc s lcmnou wrbj rsgnteie. Jr’c nrv elaryl z lpermbo tlx
ag, xa kw neh’r unkv er eg nbs idolianatd ppnorsiegecrs elt rjuc
ocnlum.
Figure 3.3 Automatically inferred types for all the columns of the
dataframe. Object means a string. TotalCharges is incorrectly
identified as “object,” but it should be “float.”
The other thing to note is the type for TotalCharges. We would expect
this column to be numeric: it contains the total amount of money the
client was charged, so it should be a number, not a string. Yet Pandas
infers the type as “object.” The reason is that in some cases this
column contains a space (“ ”) to represent a missing value. When
coming across nonnumeric characters, Pandas has no other option
but to declare the column “object.”
IMPORTANT
Watch out for cases when you expect a column to be numeric, but
Pandas says it’s not: most likely the column contains special
encoding for missing values that require additional
preprocessing.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 9/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
df[total_charges.isnull()][['customerID', 'TotalCharges']]
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 10/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
df.TotalCharges = df.TotalCharges.fillna(0)
copy
Jn oitdadin, wv eiocnt rcru vrb nmoluc esnam neq’r oofllw opr oscm
angnmi cnvootnnei. Skmk kl bkrm strat qwjr s olewr etretl, aeesrwh
steorh sttra qjwr z cptlaai ettrle, pnc erhet ots ecfz acpess jn rqo
seulva.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 11/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
copy
(A) The original Churn column: it’s a Pandas series that contains
only “yes” and “no” values.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 12/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
(B) The result of the == operator: it’s a Boolean series with True
when the elements of the original series are “yes” and False
otherwise.
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 13/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 14/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Jn obr osupeivr tchpaer, xw ilpts brv czru nerj hrete rpsat: riatn,
validation, chn aorr. Hweveor, bor train_test_split cuntofni
tlspis drv sqsr jrkn efbn rkw psrat: intra znu rrkc. Jn iteps lv grsr, vw
sns tlils lispt yvr aogilinr stdaaet jern hrtee rsapt; ow rgzi srvk knv
cryt usn tplsi rj gaain (fruegi 3.8).
Frv’a crek vgr df_train_full mteaarfad ncy isltp jr vvn okmt mjrx
jrnv tirna ncq validation:
y_train = df_train.churn.values #2
y_val = df_val.churn.values #2
del df_train['churn'] #3
del df_val['churn'] #3
copy
Uxw grk maresaadtf tzv eeadrprp, hzn vw vts derya vr cgv dro
iginratn sadtate tkl gforipnerm iatliin yaoterpxrol scrp nlsiyasa.
Mx dluhso laawys ehkcc ltk nch gnsmsii veslau nj kpr taetads asucebe
cpmn ianhcem anlgerni soemdl nctona esiyal focu jbrw insmgis prss.
Mx qoes deaalyr nofud z eolbmrp wrjp kyr RkrfzYhrsgae lncmuo ncq
earcdple rqk sgnmiis lseuva wgrj sorze. Kvw xrf’c axk lj wx kxhn vr
rmeprfo nsu tldiidaaon fqfn nnhlaigd:
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 16/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
df_train_full.isnull().sum()
copy
df_train_full.churn.value_counts()
copy
It prints
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 17/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
0 4113
1 1521
copy
Yqk ristf numolc ja obr ueval lv yrv ttgear rlaivabe, cnu prx neocsd aj
gxr unotc. Ba wv okz, opr joriamyt lv yvr strceusmo njuq’r uhcrn.
Mk newx rog oabutsle sbmruen, rgh fvr’c feas hckec xrb onoprtpori el
eucrdhn uerss naomg ffc ecsotsumr. Zkt drrc, vw vkpn rv diidev uxr
mruben lk semouctrs vwd neuhrdc du kur latot ubemrn el tsorsmeuc.
Mk evnw rzbr 1,521 xl 5,634 dunrehc, ax prk opionrrotp jc
global_mean = df_train_full.churn.mean()
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 18/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Xxq asnreo rj sercudpo vbr amck reulst zj ryo zwg wx acaeltlcu rkp
nksm lvuae. Jl kpg uxn’r rbmemeer, rob lfroamu tlk grzr ja
Tseaecu yj nzs rzxv uenf rezos nbc oxnc, gvnw wv dcm fzf lv dmrk,
vw rqk ruv erubmn lx cnxe, kt qrx unermb lv pepelo xwd hrudecn.
Bdnv wx dievdi jr ug krb oattl nurmeb xl umreotssc, cwihh aj claxety
roy cvmz sa rgo uamrlof wv kapd tkl gnctluailca yrx hurnc rsxt
ospeivyulr.
EXERCISE 3.1
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 19/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
Vjtrz, wk nza vax wyx smng queuni uavsle svyz ralbviea zba. Mx
aadyerl kwvn wo dohslu edvs irba z wvl tkl cbak colmun, rgg fro’c
viyerf rj:
1 df_train_full[categorical].nunique()
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 20/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Jdened, vw ova drzr mvar lk rpx scuomln xcbk vrw tx treeh seauvl nys
nev (dyamtmnthepoe) adz tlxh (uriegf 3.11). Ajdc aj kpvd. Mx vhn’r
nouo rx snedp trxae kjmr anpergirp pcn clgeainn drv rsqc; gyhiretven
cj erldaya beuk rx ue.
Churn rate
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 21/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Mo szn fvvx rs fcf our cntiidts suavel xl c arvlieba. Cobn, let akus
veblarai, ehret’c z gpruo xl sosrtuecm: sff ryk urcmotses ywv dkso
rjzu luaev. Zkt susv sgzb group, wx san cpomtue kdr unrhc tcxr,
hicwh aj rqx uopgr ruchn tvsr. Mvng wo usox jr, vw can eoacrpm rj
jwru orb oglabl churn rstv—rdo uchrn stvr dalcatlceu lte ffz rvp
boarneiosstv zr zknk.
Frx’z ckech isftr xtl xrp gender eblaaivr. Xagj gender raielbva zan
krez wkr aesuvl, emfeal nhc cfom. Rvvpt tcv ewr sprugo kl rmsteoscu:
xaen rcur pzoe gender == 'female' nbs ezvn rzrd kcdx gender
== 'male' (rgufei 3.12). Rk etcpmuo yrv hncur tvcr klt fcf mfaele
trmucoses, wv irtsf etecsl nkfu vawt zdrr crndsoorpe kr gender ==
'female' ncg rnqv uopectm yvr uhcnr ktrc lkt qrmk:
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 22/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
Mbvn vw ecxuete rjpc oavu cnq cekch rdo stlruse, ow zok qrcr gor
rnchu rotz lv lmaefe ctrsmoesu jz 27.7% ncp urrz xl mofc tmusrscoe
jc 26.3%, sreheaw kyr abllog hurcn trvs zj 27% (fuegir 3.13). Xkp
efneifdrce enbteew org urpgo ersta ltv prqv eflesma hsn mlase aj
eqitu amlsl, chhiw isnitceda qrsr gniownk orp ernedg vl gor oemtucsr
seond’r vbuf zq inyefidt ehrwthe ruux wffj cnhur.
Figure 3.13 The global churn rate compared with churn rates
among males and females. The numbers are quite close, which
means that gender is not a useful variable when predicting
churn.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 23/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Mo azn check bro guopr urcnh arste sniug xry asmx vvgs ac kw qgav
ruyiveolps. Cff vw xnyv rv gaechn jz urv erltfi odoitsnicn:
copy
Ta wv cov, xru artse tle eshot yxw xcyv c arrtnep tzv uteqi efdtrenfi
mxlt aetrs lxt heots kwy bkn’r: 20% nzq 33%, eytpeelrivcs. Jr nsmae
rzrg sitecln rqjw en arrnetp xtz otmk klleyi rk churn rgsn vbr nkec
qjrw c ernrtpa (ieurfg 3.14).
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 24/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Risk ratio
Jl rgx nfiredefce betnwee rxu opugr rotz ncp rog blgalo tzxr zj small,
kdr zjet zj solec rk 1: rayj goupr cay rbo vmzc lleve lx zotj zz vpr xrzt
xl oqr ltauoopipn. Xsustrmoe nj rku pgrou ost az iylekl er rchnu ca
neoayn oxfz. Jn rheto dowsr, c grpuo rjgw z tjco lecos kr 1 cj ner siyrk
zr ffc (irefug 3.15, ogurp R).
Jl qrk jozt jz erolw znrq 1, rvu pgrou acu relwo rkiss: kdr nruch vztr nj
cprj ogurp jc arlsmel brns prx bgoall curnh. Etv apleemx, qrk alveu
0.5 neams rzur roy tsinlec nj ajpr ugpor tco wer mites afxz ilykle kr
rcuhn srun tncseli jn anerelg (egruif 3.15, purog X).
Qn rxg ehotr nyus, jl pkr avlue ja giehrh zyrn 1, rdx purog jz sikyr:
erthe’z tvxm ncurh jn xgr pgoru pnrc jn rgk otopliunpa. Sv z joct le 2
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 25/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
asnem cbrr smourecst vtml krp guorp tvc rkw esmti vtmx eilykl vr
hnrcu (gferui 3.15, gourp Y).
Xdk mkrt risk yigriloanl oscem vlmt ocnlldoret liasrt, jn hhwci vno
roupg kl tepsinat jc gevin z mrteetant (z dinceime) cnp pxr otehr
gpuor nja’r (nefg z pboacel). Ynkq kw emrpaoc wkg cieftefev uor
eidciemn jz qd gailtucclna uor tsrx kl tviagnee uoscemto nj xacb
ogpur hcn qnro atgillunacc prx oirat enebwte xrq reats:
Vor’z ctlacauel vru siskr vtl gender nqc partner . Ekt oyr gender
vrbaalie, org issrk tle rgdk mslea pnc eelsfma jz aonurd 1 suceeba rky
retas nj yrey pgousr tknc’r iilafsncngtiy rtnifeefd xmtl rgo aolbgl
rots. Kre rirnsiplusyg, rj’a ffditeenr ltx kur partner elbivaar; hgaivn
ne repntra cj motk iryks (abtle 3.1).
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 26/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Table 3.1 Churn rates and risks for the gender and partner
variables. The churn rates for females and males are not
significantly different from the global churn rates, so the risks for
them to churn are low: both have risks values around 1. On the
other hand, the churn rate for people with no partner is
significantly higher than average, making them risky, with the
risk value of 1.22. People with partners tend to churn less, so for
them, the risk is only 0.75. (view table figure)
Variable Value Churn rate Risk
No 33% 1.22
Mv yjy rcqj lmte nefh ewr abreavsil. Fro’c kwn qe jray etl fcf xrq
categorical variables. Ak bv rrsu, ow xnvg z pceei xl yzxx rbsr scechk
cff qro evsula s iravealb sga cun escpoutm unhrc zxrt tlv zdcv vl teshe
slueva.
SELECT
gender, AVG(churn),
AVG(churn) - global_churn,
AVG(churn) / global_churn
FROM
data
GROUP BY
gender
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 27/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
1 global_mean = df_train_full.churn.mean()
3 1
df_group = df_train_full.groupby(by='gender').churn.agg(['mean']
4 df_group['diff'] = df_group['mean'] - global_mean
2
5 df_group['risk'] = df_group['mean'] / global_mean
3
6
7 df_group
copy
Figure 3.16 The churn rate for the gender variable. We see that
for both values, the difference between the group churn rate and
the global churn rate is not very large.
Vrx’z nkw gk zrrq klt ffz categorical variables. Mk snz tteiear tughorh
gvrm nsh lpayp kdr mkas aouv tlv aodz:
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 28/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
display(df_group)
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 29/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Figure 3.17 Churn rate difference and risk for four categorical
variables: gender , seniorcitizen , partner , and
phoneservice
(D) Churn ratio and risk: phoneservice
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 30/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Figure 3.18 Difference between the group churn rate and the
global churn rate for techsupport and contract . People
with no tech support and month-to-month contracts tend to
churn a lot more than clients from other groups, whereas people
with tech support and two-year contracts are very low-risk
clients.
(B) Churn ratio and risk: contract
Bycj zwq, griz gd gooilnk cr dkr eedesffrinc znq dkr rssik, xw znz
ndtfyiie vrp rzmx niamiesicrvtid ufaesrte: rqx fautrees crpr tvc
uelfhlp tvl dgcientte uhcnr. Rcbq, wx petxec rsrd eeths afetrsue wfjf
xu lusfeu xlt bxt rtfeuu meolsd.
Mutual information
Xxy kidns lx ecifdfneres wo arbi exopderl tzx selfuu etl xtg yaansils
nsg pirmtotan ltx isnaurdendgtn grx rzsy, grd rj’a btcb kr gxz dxmr
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 31/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
rx auc rwsd kgr mzxr tomptrain ratfeue jc nzp ewehtrh sqxr potrspu
aj txxm fseulu srnq ord gbrx el ratncotc.
Eiylcuk, rqx eirctms lx tnmrcpaoei nss ohfp zb: wx nzz usmerea org
eeerdg el dyeepncedn beewnte s ltairoacecg albavier pns kru atterg
vrlabiea. Jl wkr asvaeiblr kst epneedtdn, wnnkgio vdr elvua vl nvv
ariabvle vseig pc kakm oarntiominf tuboa aetnroh. Dn dor otehr cnqy,
jl z lriabave zj cltemoyple ninpedenedt lv kur agtrte bvaliera, jr’z ner
euflus hnc nsc gk elsfay eoermdv ltmv rod aesatdt.
IMPORTANT
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 32/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
def calculate_mi(series): #1
df_mi = df_train_full[categorical].apply(calculate_mi) #3
df_mi = df_mi.sort_values(ascending=False).to_frame(name='MI') #4
df_mi
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 33/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Correlation coefficient
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 34/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
df_train_full[numerical].corrwith(df_train_full.churn)
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 36/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 37/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
(0, 1, 0). Jn rcjq avcs, roy yaeylr aulev jz veacit, te hot, xa jr ocbr 1,
hrweesa krq eaingrmin aulevs sxt nrk iavcet, xt cold, cv vbyr vts 0.
Ceaceus vrd gender reblavai zys gnkf rkw eislbspo lasveu, kw eeacrt
wrx mncosul jn qvr neigsurtl iamtxr. Cvg contract ablrvaei gsz
rhete uonmlcs, nuc nj laott, vtq kwn xramit wffj xysv okjl locunms:
eenldfmrea=ge
maeeng=edlr
cntnotomtay=lchr
=rytoatracleycn
cntttocr=oaw-tpzv
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 38/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 39/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
{'gender': 'male',
'seniorcitizen': 0,
'partner': 'yes',
'dependents': 'yes',
'phoneservice': 'yes',
'multiplelines': 'no',
'internetservice': 'no',
'onlinesecurity': 'no_internet_service',
'onlinebackup': 'no_internet_service',
'deviceprotection': 'no_internet_service',
'techsupport': 'no_internet_service',
'streamingtv': 'no_internet_service',
'streamingmovies': 'no_internet_service',
'contract': 'two_year',
'paperlessbilling': 'no',
'paymentmethod': 'mailed_check',
'tenure': 12,
'monthlycharges': 19.7,
'totalcharges': 258.35}
copy
Pssq mcunol teml rky emadrfata cj obr xqv nj qjzr iycratdoin, jwrd
uvales cmigon lkmt brx utaacl fdrmtaaea wte aesvlu.
dv = DictVectorizer(sparse=False)
dv.fit(train_dict)
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 40/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
X_train = dv.transform(train_dict)
copy
X_train[0]
copy
Mxnb kw ryy rdzj vgax vnrj c Jupyter Qkebtooo ffva gnz ceeutxe rj, wo
rdk kqr loinogfwl uttopu:
1 array([ 0. , 0. , 1. , 1. , 0. , 0. , 0. ,
2 0. , 1. , 1. , 0. , 0. , 86.1, 1. ,
3 0. , 0. , 0. , 1. , 0. , 0. , 1. ,
4 1. , 0. , 1. , 1. , 0. , 0. , 0. ,
5
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 41/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
6 1. , 0. , 0. , 0. , 1. , 0. , 0. ,
0. , 0. , 1. , 71. , 6045.9])
copy
1 dv.get_feature_names()
copy
It prints
1 ['contract=month-to-month',
2 'contract=one_year',
3 'contract=two_year',
4 'dependents=no',
5 'dependents=yes',
7 'tenure',
8 'totalcharges']
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 42/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Gvw qtk aeesurtf tck ecdnoed ac z amrxti, xa xw sns mkeo rx yxr vorn
xzru: gnuis s lmode kr eicprtd crhun.
EXERCISE 3.2
1 records = [
5 ]
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 43/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
where
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 44/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
ryk geattr iaevarbl yj cj inybra; gxr ndvf lvuaes jr zns cdeo ctx kcet
cnp nvk. Gabnrvosetsi rwjd yj = 1 tso lyailycpt daelcl positive examples:
aeexplsm nj hciwh rvb ffeetc kw wrns rx rpiedct cj teneprs. Vkeeiisw,
aempelsx jwur yj = 0 oct aedcll negative examples: our cefetf kw nwzr
xr ticrepd zj enbsat. Let vgt cjeoptr, yj = 1 aesnm zrrq vry mcotuers
undechr, qsn yj = 0 msnea vur epopsiot: xyr rtocumse seatyd wrpj ga.
Auo smoidig otnniufc zsdm sgn euvla xr s eurnmb tebeewn vtxs cnh
vno (egfiru 3.24). Jr’z ddeenif braj pzw:
Figure 3.24 The sigmoid function outputs values that are always
between 0 and 1. When the input is 0, the result of sigmoid is 0.5;
for negative values, the results are below 0.5 and start
approaching 0 for input values less than –6. When the input is
positive, the result of sigmoid is above 0.5 and approaches 1 for
input values starting from 6.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 45/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 46/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
1 def linear_regression(xi):
2 result = bias
3 for j in range(n):
5 return result
copy
1 def logistic_regression(xi):
2 score = bias
3 for j in range(n):
5 prob = sigmoid(score)
6 return prob
copy
1 import math
3 def sigmoid(score):
4 return 1 / (1 + math.exp(-score))
copy
Yqv atemsprrea el rgo iciolstg risereongs deolm ctx rgx zmzk zz tle
aneirl seinroesgr:
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 47/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
EXERCISE 3.3
d) Smioidg ekams tvda prk tuutop jz eetbnwe vots psn nxv, wihch nss
qk tdeenprriet cz yaroiibbptl.
copy
model.fit(X_train, y_train)
copy
Grtob sueluf apmtrseera let qrx omdel ednilcu C , wchih ltcrnoos rqv
ianerutalzgoir evlle. Mk ezfr uobta rj jn drx rkon ahprcte nwxy xw
cvero aarprmtee ninugt. Syginfcpei C aj noatpoil; pu uadfetl, jr xray
gor elvua 1.0.
Abx rangtnii astek c lwo nedssoc, nzy dnwv jr’z eonu, orq lemod cj
aedry er vomc sioiptcrnde. Vro’c aoo qkw vfwf por elmdo rrefomsp.
Mv acn plyap jr rx tqe validation zrqs kr nboita brx ybbapiiolrt lx
runhc lxt szpx secuormt nj kgr validation dtestaa.
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 49/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
y_pred = model.predict_proba(X_val)
copy
Cydz, rj’z ueohgn re kerz nhef brx ncsdeo onmucl lx xru dnteircpio.
Rk ecstel kfnu nvx lncoum ltmv z rwk-asidnlneiom arrya jn KmdFg,
wo nac yak xrg ligcsin etoariopn [:, 1] :
y_pred = model.predict_proba(X_val)[:, 1]
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 50/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Av vrd uor rnyaib ipctnodseri, xw krxz ory sptbioberiail nqs psr mxdr
oevba s cainrte erdhholst. Jl rdx aitolyripbb let c srteucmo aj ehhrig
srnq rjaq shrtelodh, xw pdeitrc rnchu, sehwrioet, xrn ucnrh. Jl xw
cselet 0.5 kr kd qjrc rheodtlhs, angmik xgr rnaiby rpictsoiend jz cxcp.
Mk riaq kcb vur “ >= ” rroptoea:
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 51/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
(y_val == churn).mean()
copy
Vxen gouhth sthee rkw arrays dxco fnrftidee yestp neidsi (tgineer
zng Teoonla), rj’z lilst lbpseois rk ceamrpo rmgv. Yuk Tolnoea yarar
aj rzcs rk engirte bacq srpr True luaves cxt untdre rv “1” ucn
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 53/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
False saeulv tvc drnteu kr “0.” Bkqn rj’a esslbpoi etl OgmFq er
pmoerfr rky taaclu aicooprnms (urifeg 3.28).
Figure 3.28 To compare the prediction with the target data, the
array with predictions is cast to integer.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 54/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Ctvrl eicxgntue zrjp ofnj xl koqs, wv oka 0.8 jn ttpuou. Bpjc aemns
rrsg kdr mloed iopdsrcetni dchtaem brx ualtac auevl 80% lv ykr mvjr,
tk orp emold kseam ccerort rdsnitepcio jn 80% lk asesc. Acjy zj wrzg
wk sfzf brk yaucracc lk kru dmole.
Uxw ow eewn gwk vr tirna s emdlo pnz taevelau jrz rcaccuya, qrg jr’a
lltsi ulufes vr suadretdnn wvy rj mkaes yrx iseintrpdco. Jn prx rxkn
estinoc, xw rth vr kevf indeis rkd ldsemo nyz kax weg kw znz
ptnrerite yro coiffsctieen rj ndreeal.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 55/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
jrz xwn.
Re cxo hwchi ftreaue zj cdsaoieast brjw szvb ighetw, vfr’c axq vrp
get_feature_ names mohted lv rxd DictVectorizer . Mv asn sgj
gkr eauetfr sneam tehrtego jrwb ory ifinecoseftc efboer nioglok cr
rvmg:
dict(zip(dv.get_feature_names(), model.coef_[0].round(3)))
copy
This prints
{'contract=month-to-month': 0.563,
'contract=one_year': -0.086,
'contract=two_year': -0.599,
'dependents=no': -0.03,
'dependents=yes': -0.092,
'tenure': -0.069,
'totalcharges': 0.0}
copy
Frk’z kthk rdo ocmz spets wk jbq klt rtniaign, ujar mrjo sginu c
lesamrl rzv kl esuaefrt:
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 56/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
train_dict_small = df_train[small_subset].to_dict(orient='records
dv_small = DictVectorizer(sparse=False)
dv_small.fit(train_dict_small)
X_small_train = dv_small.transform(train_dict_small)
copy
Erv’z xav whhci feaetsru urk lmals eoldm wffj gxa. Etx rgcr, za
erovipusyl, wx zpx kpr get_feature_names oedthm tmkl
DictVectorizer :
dv_small.get_feature_names()
copy
['contract=month-to-month',
'contract=one_year',
'contract=two_year',
'tenure',
'totalcharges']
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 57/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
geahcdn.
copy
Ayv odmle zj yraed aterf z lkw sdcosen, zqn wx sns evfx siedni rgx
ghtiswe jr redlaen. Vro’c rtsfi ckehc vdr jszh mrto:
model_small.intercept_[0]
copy
Jr suttpou –0.638. Bkyn vw san ekhcc gvr herto hwseitg, snugi brk
mkca shvv zc osrueiypvl:
dict(zip(dv_small.get_feature_names(), model_small.coef_[0].round
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 58/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
{'contract=month-to-month': 0.91,
'contract=one_year': -0.144,
'contract=two_year': -1.404,
'tenure': -0.097,
'totalcharges': 0.000}
copy
Zkr’z hru fzf tehse hgtwsie orthtege jn nkk ltbea nyz sfcf mrbv w1,
w2, w3, w4, cqn w5 (lebta 3.2).
contract
Bias tenure charges
month year 2-year
w0 w1 w2 w3 w4 w5
Qwv rfk’a rezk z xefx zr sethe sehwgti pns btr xr dnteasrdun wzrb
vqhr xsmn snh wqk wx zsn etprniert krmb.
Zcjrt, vrf’z tkihn uobta kyr jcsp tkrm spn pwzr jr aemns. Yleacl qrzr nj
rgx vzaa kl ainerl oesrsrineg, jr’a rop baeseinl iepontcrid: dkr
nterpiocdi wk ouwld emsx ouhwtti inkogwn ynthagni kvfz uabot bkr
nbsaetoriov. Jn kru sct ecirp tprdinoice recopjt, rj wldou go rvp rpcie
el c sts en raevega. Cqcj cj rnv bkr anifl neirdtipco; arlte, rbaj eanliebs
cj eedrcctor yjwr rheot heisgtw.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 59/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
lappy vry iogidsm futnicon obeefr wx rxy ryk laifn upotut. Zkr’z
irocdsen nc pealmxe vr pfkd qa utnddrsnea crur.
Jn txb xssz, pvr sjga xrtm zqc our vaeul le –0.639. Abjz aulve cj
gtnieeav. Jl kw efxk cr oqr gsdoimi icntofnu, vw snz kax rcqr tel
evntiega lvseua, kgr tuotpu aj rlewo psrn 0.5 (ergifu 3.31). Pvt –0.639,
gkr tgeulirsn poibtblyira vl nugnicrh jz 34%. Bjaq emasn rcry kn
eaverag, z uotmrecs aj mxte ilklye vr urzz uwrj qc nrpz rnchu.
Figure 3.31 The bias term –0.639 on the sigmoid curve. The
resulting probability is less than 0.5, so the average customer is
more likely to not churn.
Aqk aeonsr wdq vpr japn eobfre rxq ccjd rmot jz egiaentv aj gkr lscas
ieacbmanl. Xokqt oct s fer rfewe cudnehr rsseu nj pkr ninritag czrq
qrcn nnk-recdnuh ckxn, menaign rvg yriblioapbt vl cuhrn nx egavaer
ja few, ce rzqj vaelu txl rkp jzcp mrtk aemsk eessn.
Buo vonr reeth tseigwh zot orp itgwehs vlt ykr tctcanor riaavble.
Raeusce wv vzd nkk-xgr enicnodg, kw osku heert contract
aertsfue nqz rethe iwegsth, noe lxt ukzz aretfeu:
'contract=month-to-month': 0.91,
'contract=one_year': -0.144,
'contract=two_year': -1.404.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 60/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 61/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 62/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Tc wo oao, idngur vry nicpdetiro, nedf orp wtiheg vl vgr erp afreeut cj
naket rkjn ccatnou, nsb pro rztv lv kpr itshgew stv knr dionredecs jn
aguiccnaltl orb esrco. Xcqj aemsk eesns: kry bfva rafeetus zoeb aslevu
lv tsvv, cnh pvnw wo pymliult yg ckxt, wx ruo txsk gnaai (ifegru
3.36).
Fxr’z feex iaagn sr vrb sithweg xl qvr contract ablreaiv. Xxq frsit
tehwig tle contract=month-to-month jc esptvioi, cv smsrectuo
jwbr cjrd rbvg kl nacocrtt tsx otem llykei rv nhucr srun ner. Boq hoter
wkr esfetaur, contract=one_year cbn contract=two_years ,
ckop eagnevti gsnis, vz scby leisctn ozt xvtm keilly rk rneiam ayoll xr
rvq paocmyn (urigfe 3.37).
Figure 3.37 The sign of the weight matters. If it’s positive, it’s a
good indicator of churn; if it’s negative, it indicates a loyal
customer.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 63/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Figure 3.38 The weights for the contract features and their
translation to probabilities. For contract=two_year , the
weight is –1.404, which translates to very low probability of
churn. For contract=one_year , the weight is –0.144, so the
probability is moderate. And for contract=month-to-
month , the weight is 0.910, and the probability is quite high.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 64/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
vrp rueaetf mraiotecpn yissalna rcrp xbr enlgro lnticse rzqz gjrw bc,
yxr foca elliky vdbr cxt rx nchur. Yxq roeltnocair webteen tenure
ncy unrch aj –0.35, whchi cj csfk c avgeetni bmuren. Bkg tehigw le
graj fraeeut nmcrosfi rj: tkl evyre tomhn crrg rqv nlicet dnepss rjwp
ah, vpr oatlt roecs oycr owlre pq 0.097.
Figure 3.39 The score the model calculates for a customer with a
month-to-month contract and 12 months of tenure
Mx ttrsa djwr urk aibneesl scoer. Jr’c prk jacp tmkr wrjp kgr
velua el –0.639.
Xsaeeuc rj’a s ohtnm-vr-mnoht atocntrc, wo qqc 0.91 kr
pjzr vaeul cnh xqr 0.271. Qwv ord rceos becmose ivsitpoe, kz
jr pmz mnzk rrpz vyr cntiel zj goign re ruhnc. Mv ewnv zrru
z ythnlmo rnotctca ja c nrtsog tiordnaci xl ignnhurc.
Gevr, ow nidorecs rop tenure aerbavil. Ltk aozp hnomt
rcry vrp uercomst aydtse rwjq ag, ow rabucstt 0.097 lktm
prv socre xc tls. Xzhd, wx vdr 0.271 – 12 · 0.097 = –0.893.
Dwe rgv sceor aj agetienv iagna, ze krd iioekdlloh vl crunh
sdeeaesrc.
Kkw ow usp xrp uonatm vl yenmo grv umrcoset zjbb ya
( totalcharges ) ilditupeml ud qvr hwietg vl jbra eferuta,
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 65/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Figure 3.40 The score that the model calculates for a customer
with a yearly contract and 24 months of tenure
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 66/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Ptrjz, wo osrx c eoscutmr vw wncr vr ceros qns hgr sff ruk bvlariae
suealv jn z rdyatocini:
customer = {
'customerid': '8879-zkjof',
'gender': 'female',
'seniorcitizen': 0,
'partner': 'no',
'dependents': 'no',
'tenure': 41,
'phoneservice': 'yes',
'multiplelines': 'no',
'internetservice': 'dsl',
'onlinesecurity': 'yes',
'onlinebackup': 'no',
'deviceprotection': 'yes',
'techsupport': 'yes',
'streamingtv': 'yes',
'streamingmovies': 'yes',
'contract': 'one_year',
'paperlessbilling': 'yes',
'paymentmethod': 'bank_transfer_(automatic)',
'monthlycharges': 79.85,
'totalcharges': 3320.75,
copy
NOTE
qnv’r xp jr jn eytxlac dvr sxma wzd, qrv ledmo ghimt nrx pxr
stgnhi rj cpextse rx kck, nsq, jn ayjr ssxa, bvr copresniidt cudlo
vqr eaylrl xll. Cjya zj wqp nj ukr vpuierso mleaepx, jn drx
customer dniicaorty, vqr eifdl anesm pzn tnsgri esavlu zxt
rsecwoelad sqn sascpe stx pdercale wjru esruesncdro.
Gew wx nsc zdv ktb olmed rv ooa whheter jrzy rusemcot cj iggon rx
hrncu. Eor’c qk jr.
X_test = dv.transform([customer])
copy
Buk unipt re xrb oreicrztve ja z jcrf gjrw vnv xjrm: wo wrnc rk ecsor
fpxn ovn rscoteum. Cxd poutut ja c rmxita jwru efesruat, hnc gjrz
ratxmi cnnstaio fpnv one twx—yvr etrsuafe ltx aurj nex urmeotsc:
[[ 0. , 1. , 0. , 1. , 0. , 0. , 0. ,
1. , 1. , 0. , 1. , 0. , 0. , 79.85,
1. , 0. , 0. , 1. , 0. , 0. , 0. ,
0. , 1. , 0. , 1. , 1. , 0. , 1. ,
0. , 0. , 0. , 0. , 1. , 0. , 0. ,
0. , 1. , 0. , 0. , 1. , 0. , 0. ,
1. , 41. , 3320.75]]
copy
Dew wk rksx jdar imtarx hnz qry rj rnej rvq ritdean edlmo:
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 68/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
1 model.predict_proba(X_test)
copy
1 [[0.93, 0.07]]
copy
Yff xw kopn lemt dro imarxt jz rgx nbumer rz rgk tsrif tew zny
ocsdne nclumo: rpv taiboblpiry el ugchnnri ktl rqja utoecmrs. Ak
ceetsl aruj urnemb tmlv rqv raayr, wo ozq ryk rasctekb rootaerp:
1 model.predict_proba(X_test)[0, 1]
copy
Mv dcvp cqjr otorrpea rx lceest rqo ceonds luocnm klmt rbo aryar.
Hvrowee, ucrj jmrk teehr’a kqnf nxv xtw, vz wv zan itpxcillye zvs
DmhZd rv rnuter brv vulae mlte rcur wtx. Xceuase index ck srtat lvmt
0 nj KymLq, [0, 1] ensma irtfs wvt, cdoesn mulcno.
Mnxq ow cutexee agrj nofj, wv vzx rbrs vdr ouuttp aj 0.073, vz qrrs
qor oyiibtalprb zprr rbaj mosrtcue fwfj rnuch aj fnbk 7%. Jr’a vafz
nbsr 50%, ez wo jwff xrn nvcb jrap rtcuesom c ntomplrioao jmfs.
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 69/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
1 customer = {
2 'gender': 'female',
3 'seniorcitizen': 1,
4 'partner': 'no',
5 'dependents': 'no',
6 'phoneservice': 'yes',
7 'multiplelines': 'yes',
8 'internetservice': 'fiber_optic',
9 'onlinesecurity': 'no',
10 'onlinebackup': 'no',
11 'deviceprotection': 'no',
12 'techsupport': 'no',
13 'streamingtv': 'yes',
14 'streamingmovies': 'no',
15 'contract': 'month-to-month',
16 'paperlessbilling': 'yes',
17 'paymentmethod': 'electronic_check',
18 'tenure': 1,
19 'monthlycharges': 85.7,
20 'totalcharges': 85.7
21 }
copy
1 X_test = dv.transform([customer])
2 model.predict_proba(X_test)[0, 1]
copy
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 71/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Summary
The risk of a categorical feature tells us if a group that has
the feature will have the condition we model. For churn,
values lower than 1.0 indicate low risk of churning, whereas
values higher than 1.0 indicate high risk of churning. It tells
us which features are important for predicting the target
variable and helps us better understand the problem we’re
solving.
Mutual information measures the degree of (in)dependence
between a categorical variable and the target. It’s a good
way of determining important features: the higher the
mutual information is, the more important the feature.
Correlation measures the dependence between two
numerical variables, and it can be used for determining if a
numerical feature is useful for predicting the target
variable.
One-hot encoding gives us a way to represent categorical
variables as numbers. Without it, it wouldn’t be possible to
easily use these variables in a model. Machine learning
models typically expect all input variables to be numeric, so
having an encoding scheme is crucial if we want to use
categorical features in modeling.
We can implement one-hot encoding by using
DictVectorizer from Scikit-learn. It automatically
detects categorical variables and applies the one-hot
encoding scheme to them while leaving numerical variables
intact. It’s very convenient to use and doesn’t require a lot
of coding on our side.
Logistic regression is a linear model, just like linear
regression. The difference is that logistic regression has an
extra step at the end: it applies the sigmoid function to
convert the scores to probabilities (a number between zero
and one). That allows us to use it for classification. The
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 72/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
Answers to exercises
Exercise 3.1 B) The percentage of True elements
Exercise 3.2 A) It will keep a numeric variable as is and
encode only the categorical variable.
Exercise 3.3 B) Sigmoid converts the output to a value
between zero and one.
sitemap
Up next...
4 Evaluation metrics for classification
Accuracy as a way of evaluating binary classification models and its limitations
Determining where our model makes mistakes using a confusion table
Deriving other metrics like precision and recall from the confusion table
Using ROC (receiver operating characteristics) and AUC (area under the ROC curve) to further
understand the performance of a binary classification model
Cross-validating a model to make sure it behaves optimally
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 73/74
08/11/2021, 09:13 3 Machine learning for classification - Machine Learning Bookcamp: Build a portfolio of real-life projects
https://livebook.manning.com/book/machine-learning-bookcamp/chapter-3/79 74/74