import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch  # imported in the notebook but not used below
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import warnings
import seaborn as sns
%matplotlib inline

from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

df = pd.read_csv("/content/drive/My Drive/datasets/heart.csv")
df.head()

[Output: first five rows of the dataset. The table is not legible in the source. The 14 columns are age, sex, cp, trtbps, chol, fbs, restecg, thalachh, exng, oldpeak, slp, caa, thall and output.]

df.tail()

[Output: last five rows, indexed 298 to 302. The values are not legible in the source.]

Listing the null-value counts

print(df.isnull().sum())

[Output: null count per column. The values are not legible in the source.]

Dropping null values, since there are very few of them

df = df.dropna()

Data visualization

Show the data-type information

df.info()

[Output: Data columns (total 14 columns); 303 non-null values per column where legible; dtypes: float64(1), int64(13); memory usage: 35.5 KB]

Describing the full dataset

df.describe()

[Output: summary statistics (count, mean, std, min, 25%, 50%, 75%, max) for each column. The individual values are not reliably legible in the source.]

Visualizing null values, if any exist

df.isnull().sum()

[Output: null count per column; the legible entries are 0; dtype: int64]

plt.figure(figsize=(22, 18))
plt.xticks(size=20, color='grey')
plt.tick_params(size=12, color='grey')
plt.title('Finding Null Values Using Heatmap\n', color='grey', size=30)
# The end of this call is cut off in the source; any further arguments are unknown.
sns.heatmap(df.isnull(), yticklabels=False, cbar=False, cmap='PuBu')

This plot shows which gender suffers more heart attacks

[Code cell only partly legible: a plt.figure call followed by a seaborn count plot.]

!pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
  Using cached https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
Requirement already satisfied: [pip reports every dependency of pandas-profiling==3.2.0 as already installed in /usr/local/lib/python3.7/dist-packages, among them joblib, scipy, pandas, matplotlib, pydantic, PyYAML, jinja2, visions, numpy, htmlmin, missingno, phik, tangled-up-in-unicode, requests, tqdm, seaborn and multimethod, together with their own dependencies]

import pandas_profiling as pp
pp.ProfileReport(df)

Feature selection

1. Univariate selection

Univariate selection: statistical tests can be used to select the features that have the strongest relationship with the output variable.

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

data = df.copy()
X = data.iloc[:, :-1]   # feature columns (the slice bounds are illegible in the source)
y = data.iloc[:, -1]    # target column, 'output'

# The k argument is cut off in the source; chi2 scores are computed for every feature regardless of k.
bestfeatures = SelectKBest(score_func=chi2)
fit = bestfeatures.fit(X, y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)

# Concatenate the two frames to show each feature name next to its score.
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Specs', 'Score']
print(featureScores.nlargest(12, 'Score'))

   Specs      Score
   oldpeak    72.644253
   cp         62.598098
   age        23.286624
   restecg    2.978271
[The scores of the remaining features, including exng, trtbps and slp, are not legible in the source.]

2. Feature importance

Feature importance: you can obtain the importance of each feature in your dataset from the model's feature-importance property.

from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
model.fit(X, y)
print(model.feature_importances_)
importances = pd.Series(model.feature_importances_, index=X.columns)
importances.nlargest(13).plot(kind='barh')
plt.show()

3. Correlation matrix with heatmap

Correlation matrix with heatmap: correlation indicates how the features relate to each other or to the target variable.

plt.figure(figsize=(12, 10))
# The colormap and format arguments are illegible in the source.
sns.heatmap(df.corr(), annot=True)

for i in df.columns:
    print(i, len(df[i].unique()))

[Output: number of unique values per column. Legible entries: chol 152, fbs 2, restecg 3, caa 5, output 2.]

Data visualization

sns.set_style('darkgrid')
sns.set_palette('Set2')

df2 = df.copy()

def chng(sex):
    # In this dataset sex is coded 0 = female, 1 = male.
    if sex == 0:
        return 'female'
    else:
        return 'male'

df2['sex'] = df2['sex'].apply(chng)

def chng2(prob):
    if prob == 0:
        return 'Heart Disease'
    else:
        return 'No Heart Disease'

df2['output'] = df2['output'].apply(chng2)

sns.countplot(data=df2, x='sex', hue='output')
plt.title('Gender v/s target\n')

Text(0.5, 1.0, 'Gender v/s target\n')
[Figure: count plot titled "Gender v/s target"]

sns.countplot(data=df2, x='cp', hue='output')
plt.title('Chest Pain Type v/s target\n')

Text(0.5, 1.0, 'Chest Pain Type v/s target\n')
[Figure: count plot titled "Chest Pain Type v/s target"]

Age of patients with heart disease

plt.figure(figsize=(16, 7))
sns.distplot(df[df['output'] == 0]['age'], kde=False, bins=50)
plt.title('Age of patients with heart disease\n')

/usr/local/lib/python3.7/dist-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and wi…
  warnings.warn(msg, FutureWarning)
Text(0.5, 1.0, 'Age of patients with heart disease\n')
[Figure: histogram titled "Age of patients with heart disease"]

Cholesterol of patients with heart disease

plt.figure(figsize=(16, 7))
sns.distplot(df[df['output'] == 0]['chol'], kde=False, bins=4)
plt.title('Cholesterol of patients with heart disease\n')

/usr/local/lib/python3.7/dist-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and wi…
  warnings.warn(msg, FutureWarning)
Text(0.5, 1.0, 'Cholesterol of patients with heart disease\n')
[Figure: histogram titled "Cholesterol of patients with heart disease"]

Boxplot / violinplot

sns.boxplot(data=df2, x='output', y='age')

<matplotlib.axes._subplots.AxesSubplot at 0x...>
[Figure: box plot of age by output]

plt.figure(figsize=(14, 8))
sns.violinplot(data=df2, x='caa', y='age', hue='output')

<matplotlib.axes._subplots.AxesSubplot at 0x...>
[Figure: violin plot of age by caa, split by output]
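The chng and chng2 helpers in the visualization section relabel integer codes one value at a time through apply. A minimal sketch of the same relabeling with Series.map and a dictionary follows. The four-row sample frame and the names sample, sex_labels, output_labels and labelled are illustrative assumptions, not data or names from the notebook.

```python
import pandas as pd

# Tiny made-up sample standing in for the heart dataset (assumption).
sample = pd.DataFrame({"sex": [0, 1, 1, 0], "output": [0, 1, 0, 1]})

# Code-to-label dictionaries mirroring chng and chng2 from the notebook.
sex_labels = {0: "female", 1: "male"}
output_labels = {0: "Heart Disease", 1: "No Heart Disease"}

labelled = sample.copy()
labelled["sex"] = labelled["sex"].map(sex_labels)
labelled["output"] = labelled["output"].map(output_labels)

print(labelled)
```

One difference worth keeping in mind: map returns NaN for any code that is missing from the dictionary, whereas the else branch of the helper functions assigns a label to every non-zero value.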
df.columns = ['age', 'sex', 'chest_pain_type', 'resting_blood_pressure', 'cholesterol',
              'fasting_blood_sugar', 'rest_ecg_type', 'max_heart_rate_achieved',
              'exercise_induced_angina', 'st_depression', 'st_slope_type',
              'num_major_vessels', 'thalassemia_type', 'target']
df.columns

Index(['age', 'sex', 'chest_pain_type', 'resting_blood_pressure',
       'cholesterol', 'fasting_blood_sugar', 'rest_ecg_type',
       'max_heart_rate_achieved', 'exercise_induced_angina', 'st_depression',
       'st_slope_type', 'num_major_vessels', 'thalassemia_type', 'target'],
      dtype='object')

Generating values for the categorical columns

# cp - chest_pain_type (the integer codes are partly garbled in the source; 0 to 3 is assumed)
df.loc[df['chest_pain_type'] == 0, 'chest_pain_type'] = 'asymptomatic'
df.loc[df['chest_pain_type'] == 1, 'chest_pain_type'] = 'atypical angina'
df.loc[df['chest_pain_type'] == 2, 'chest_pain_type'] = 'non-anginal pain'
df.loc[df['chest_pain_type'] == 3, 'chest_pain_type'] = 'typical angina'

# restecg - rest_ecg_type
df.loc[df['rest_ecg_type'] == 0, 'rest_ecg_type'] = 'left ventricular hypertrophy'
df.loc[df['rest_ecg_type'] == 1, 'rest_ecg_type'] = 'normal'
df.loc[df['rest_ecg_type'] == 2, 'rest_ecg_type'] = 'ST-T wave abnormality'

# slope - st_slope_type
df.loc[df['st_slope_type'] == 0, 'st_slope_type'] = 'downsloping'
df.loc[df['st_slope_type'] == 1, 'st_slope_type'] = 'flat'
df.loc[df['st_slope_type'] == 2, 'st_slope_type'] = 'upsloping'

# thal - thalassemia_type (the integer codes are missing in the source; 0 to 3 is assumed)
df.loc[df['thalassemia_type'] == 0, 'thalassemia_type'] = 'nothing'
df.loc[df['thalassemia_type'] == 1, 'thalassemia_type'] = 'fixed defect'
df.loc[df['thalassemia_type'] == 2, 'thalassemia_type'] = 'normal'
df.loc[df['thalassemia_type'] == 3, 'thalassemia_type'] = 'reversable defect'
df.head()

[Output: first five rows with the categorical columns now shown as text labels. The table is not legible in the source.]

data = pd.get_dummies(df, drop_first=False)
data.columns

[Output: Index listing the numeric columns plus one indicator column per category, for example 'chest_pain_type_asymptomatic', 'st_slope_type_downsloping', 'thalassemia_type_fixed defect' and 'thalassemia_type_nothing'.]

df_temp = data['thalassemia_type_fixed defect']
data = pd.get_dummies(df, drop_first=True)
data.head()

[Output: first five rows of the encoded frame. The table is not legible in the source.]

frames = [data, df_temp]
result = pd.concat(frames, axis=1)
result.head()

[Output: first five rows of the concatenated frame. The table is not legible in the source.]

Because one-hot encoding with drop_first=True removed the column 'thalassemia_type_fixed defect', which is a useful column, while keeping 'thalassemia_type_nothing', which is a null column, we drop 'thalassemia_type_nothing' and concatenate 'thalassemia_type_fixed defect'.

result.drop('thalassemia_type_nothing', axis=1, inplace=True)
resultc = result.copy()

LOGISTIC REGRESSION

result.columns

Index(['age', 'sex', 'resting_blood_pressure', 'cholesterol',
       'fasting_blood_sugar', 'max_heart_rate_achieved',
       'exercise_induced_angina', 'st_depression', 'num_major_vessels',
       'target', 'chest_pain_type_atypical angina',
       'chest_pain_type_non-anginal pain', 'chest_pain_type_typical angina',
       'rest_ecg_type_left ventricular hypertrophy', 'rest_ecg_type_normal',
       'st_slope_type_flat', 'st_slope_type_upsloping',
       'thalassemia_type_normal', 'thalassemia_type_reversable defect',
       'thalassemia_type_fixed defect'],
      dtype='object')

X = result.drop('target', axis=1)
y = result['target']   # this assignment is missing in the source and is assumed here

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Normalization

# Column-wise min-max scaling; each split is scaled with its own minimum and maximum, as in the notebook.
X_train = (X_train - X_train.min()) / (X_train.max() - X_train.min()).values
X_test = (X_test - X_test.min()) / (X_test.max() - X_test.min()).values

Fitting the model

from sklearn.linear_model import LogisticRegression
logre = LogisticRegression()
logre.fit(X_train, y_train)

LogisticRegression()

Prediction

y_pred = logre.predict(X_test)

actual = []
prediction = []
for i, j in zip(y_test, y_pred):
    actual.append(i)
    prediction.append(j)

dic = {'Actual': actual, 'Prediction': prediction}
result = pd.DataFrame(dic)

import plotly.graph_objects as go

fig = go.Figure()
fig.add_trace(go.Scatter(x=np.arange(0, len(y_test)), y=y_test,
                         mode='markers+lines', name='Test'))
fig.add_trace(go.Scatter(x=np.arange(0, len(y_test)), y=y_pred,
                         mode='markers', name='Pred'))

[Figure: actual test labels (markers and lines) and predicted labels (markers) plotted against the test-sample index]

Model evaluation

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

[Output: classification report with precision, recall, f1-score and support. Only the overall accuracy, 0.87, is legible in the source.]

from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_pred))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True)

[Output: 2x2 confusion matrix. Only the second column, 3 and 29, is legible in the source.]
<matplotlib.axes._subplots.AxesSubplot at 0x...>
[Figure: annotated heatmap of the confusion matrix]

ROC curve

ROC curves summarize the trade-off between the true positive rate and the false positive rate of a predictive model across different probability thresholds.

from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_test, y_pred)
plt.plot(fpr, tpr)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve for Heart disease classifier')
plt.xlabel('False positive rate (1-Specificity)')
plt.ylabel('True positive rate (Sensitivity)')
plt.grid(True)

[Figure: "ROC curve for Heart disease classifier"; x-axis: False positive rate (1-Specificity); y-axis: True positive rate (Sensitivity)]
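The notebook passes hard 0/1 predictions to roc_curve, which leaves only one threshold between the trivial end points of the curve, and it scales the test split with the test split's own minimum and maximum. The following is a hedged sketch of one alternative, not the notebook's method. It assumes synthetic data from make_classification in place of the heart dataset, fits a MinMaxScaler on the training split only, and feeds predict_proba scores to roc_curve so that several thresholds are evaluated. All variable names (X_demo, y_demo, clf, scores) are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the heart dataset (assumption: 303 rows, 13 features).
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# The scaler learns min and max from the training split only and
# reuses them when the pipeline transforms the test split.
clf = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

# Probability of the positive class, one score per test row.
scores = clf.predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, scores)
auc = roc_auc_score(y_te, scores)
print(f"thresholds evaluated: {len(thresholds)}, AUC: {auc:.3f}")
```

The resulting fpr and tpr arrays can be passed to plt.plot exactly as in the notebook's ROC cell. Wrapping the scaler and the classifier in one pipeline keeps the test split from influencing the scaling parameters.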
