Parallelizing a for loop with Python
Table of contents
Introduction
Example 1: cumulative sum
Sequential
multiprocessing Pool.map
joblib
Example 2: model training
Sequential
multiprocessing Pool.map
joblib
Session information
Introduction
For each of the two libraries, multiprocessing and joblib, this document shows how to parallelize a simple function and how to parallelize the training of scikit-learn models (https://www.cienciadedatos.net/documentos/py06_machine_learning_python_scikitlearn.html).
Example 1: cumulative sum
Sequential
In [1]: # Sequential (not parallelized)
# ==============================================================================
import pandas as pd
import numpy as np

# Define the function: sum of the integers from 1 to number
def suma_acumulada(number):
    return sum(range(1, number + 1))
Out[2]: [5000000050000000,
5000000050000000,
5000000050000000,
5000000050000000,
5000000050000000]
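The cell that produced Out[2] is not shown. Each output value equals n(n + 1)/2 for n = 100,000,000, so the input was presumably a list of five copies of that number. Below is a minimal sequential sketch under that assumption; the names `inputs` and `results` are illustrative, and a smaller n keeps it fast.

```python
import time

def suma_acumulada(number):
    # Sum of the integers 1..number, computed by brute force
    return sum(range(1, number + 1))

# Hypothetical inputs: five copies of the same value. The notebook's
# Out[2] corresponds to 100_000_000; a smaller value keeps this quick.
inputs = [1_000_000] * 5

start = time.perf_counter()
# Sequential version: one call after another, in a single process
results = [suma_acumulada(n) for n in inputs]
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed: {elapsed:.3f} s")
```

This sequential loop is the baseline that the parallel versions try to beat.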
multiprocessing Pool.map
In [3]: # multiprocessing Pool.map
# ==============================================================================
import pandas as pd
import numpy as np
import multiprocessing

# Define the function: sum of the integers from 1 to number
def suma_acumulada(number):
    return sum(range(1, number + 1))
In [4]: %%time
Out[4]: [5000000050000000,
5000000050000000,
5000000050000000,
5000000050000000,
5000000050000000]
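The `%%time` cell for Pool.map is also incomplete. Here is a hedged, self-contained sketch of the same pattern written as a script. `run_parallel` is a hypothetical helper, and the inputs are again scaled down. Pool creation sits behind `if __name__ == "__main__":` because the spawn start method (the default on Windows, and on macOS since Python 3.8) re-imports the main module in every worker.

```python
import multiprocessing

def suma_acumulada(number):
    # Sum of the integers 1..number, computed by brute force
    return sum(range(1, number + 1))

def run_parallel(inputs):
    # One worker process per logical CPU. map() splits `inputs` among the
    # workers and returns the results in the same order as `inputs`.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        return pool.map(suma_acumulada, inputs)

if __name__ == "__main__":
    # Hypothetical inputs, smaller than the notebook's 100_000_000
    inputs = [1_000_000] * 5
    print(run_parallel(inputs))
```

On Linux, the platform listed in the session information, the default start method was fork, so the notebook works without the guard. Under spawn, workers typically cannot find functions defined interactively in a notebook.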
joblib
In [5]: # joblib
# ==============================================================================
import pandas as pd
import numpy as np
import multiprocessing
from joblib import Parallel, delayed

# Define the function: sum of the integers from 1 to number
def suma_acumulada(number):
    return sum(range(1, number + 1))
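The joblib call cell (In [6]) is missing. Below is a minimal sketch with `Parallel` and `delayed`, using the same scaled-down, hypothetical inputs. `delayed` records the function and its arguments without calling it. `Parallel` then dispatches those calls to worker processes (the default loky backend) and returns the results in input order.

```python
from joblib import Parallel, delayed

def suma_acumulada(number):
    # Sum of the integers 1..number, computed by brute force
    return sum(range(1, number + 1))

# Hypothetical inputs, smaller than the notebook's 100_000_000
inputs = [1_000_000] * 5

# n_jobs=-1 uses all available CPUs; each delayed(...) call becomes one task
results = Parallel(n_jobs=-1)(delayed(suma_acumulada)(n) for n in inputs)
print(results)
```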
Example 2: model training
Sequential
In [7]: # Sequential (not parallelized)
# ==============================================================================
import pandas as pd
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2 (this notebook used 0.23.1)
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

# Training data
X, y = load_boston(return_X_y=True)

# Training function
def train_model(X, y, n_estimators):
    model = RandomForestRegressor(
        n_estimators = n_estimators,
        n_jobs       = 1,
        random_state = 123
    )
    model.fit(X, y)
    return model
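In [8]: %%time
# Train one model per value in list_n_estimators, one after another
modelos = []
for n_estimators in list_n_estimators:
    modelos.append(train_model(X, y, n_estimators))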
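The sequential cell depends on `list_n_estimators`, whose values are not shown. It also depends on `load_boston`, which no longer exists in current scikit-learn. The self-contained sketch below swaps in synthetic data from `make_regression` and uses a hypothetical list of forest sizes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the Boston data (assumption, not the original dataset)
X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=123)

def train_model(X, y, n_estimators):
    # n_jobs=1 keeps each forest single-threaded
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        n_jobs=1,
        random_state=123,
    )
    model.fit(X, y)
    return model

# Hypothetical forest sizes; the notebook's values are not shown
list_n_estimators = [10, 20, 30, 40]

# Sequential loop: each model starts only after the previous one finishes
modelos = []
for n_estimators in list_n_estimators:
    modelos.append(train_model(X, y, n_estimators))

print([m.n_estimators for m in modelos])
```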
multiprocessing Pool.map
In [9]: # Parallel training of multiple models with multiprocessing Pool.starmap()
# ==============================================================================
import pandas as pd
import numpy as np
import multiprocessing
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

# Training data
X, y = load_boston(return_X_y=True)

# Training function
def train_model(X, y, n_estimators):
    model = RandomForestRegressor(
        n_estimators = n_estimators,
        n_jobs       = 1,
        random_state = 123
    )
    model.fit(X, y)
    return model
In [10]: %%time
n_jobs = multiprocessing.cpu_count()
# The context manager shuts down the worker processes once starmap returns
with multiprocessing.Pool(processes=n_jobs) as pool:
    modelos = pool.starmap(
        train_model,
        [(X, y, n_estimators) for n_estimators in list_n_estimators]
    )
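Here is a runnable script version of the Pool.starmap approach. It makes the same assumptions as before: synthetic data, a hypothetical `list_n_estimators`, and a hypothetical helper, `train_parallel`. `starmap` unpacks each `(X, y, n_estimators)` tuple into the positional arguments of `train_model`. Plain `map` would instead pass the whole tuple as a single argument.

```python
import multiprocessing
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def train_model(X, y, n_estimators):
    # Each forest is single-threaded; parallelism comes from the pool
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        n_jobs=1,
        random_state=123,
    )
    model.fit(X, y)
    return model

def train_parallel(X, y, list_n_estimators):
    # One task per forest size; every tuple is pickled and sent to a worker
    tasks = [(X, y, n) for n in list_n_estimators]
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        return pool.starmap(train_model, tasks)

if __name__ == "__main__":
    # Synthetic data and forest sizes (assumptions for illustration)
    X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=123)
    modelos = train_parallel(X, y, [10, 20, 30, 40])
    print([m.n_estimators for m in modelos])
```

Because each task carries its own copy of `X` and `y`, the transfer cost grows with the data and can eat into the speed-up. Keeping `n_jobs = 1` inside each forest presumably avoids oversubscription. With one worker per core, forests that each used every core would run far more threads than there are CPUs.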
joblib
In [11]: # Parallel training of multiple models with joblib
# ==============================================================================
import pandas as pd
import numpy as np
import multiprocessing
from joblib import Parallel, delayed
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

# Training data
X, y = load_boston(return_X_y=True)

# Training function
def train_model(X, y, n_estimators):
    model = RandomForestRegressor(
        n_estimators = n_estimators,
        n_jobs       = 1,
        random_state = 123
    )
    model.fit(X, y)
    return model
In [12]: %%time
n_jobs = multiprocessing.cpu_count()
modelos = Parallel(n_jobs=n_jobs)(
    delayed(train_model)(X, y, n_estimators) for n_estimators in list_n_estimators
)
CPU times: user 1.65 s, sys: 1.64 s, total: 3.3 s
Wall time: 35.9 s
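Below is the joblib equivalent, again with synthetic data and hypothetical forest sizes. A fixed `random_state` should make each forest identical whether it is trained in a worker or in the main process. The test compares one parallel model with a sequentially trained one on that basis.

```python
from joblib import Parallel, delayed
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the original dataset (assumption)
X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=123)

def train_model(X, y, n_estimators):
    # Each forest is single-threaded; joblib runs several forests at once
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        n_jobs=1,
        random_state=123,
    )
    model.fit(X, y)
    return model

# Hypothetical forest sizes; the notebook's values are not shown
list_n_estimators = [10, 20, 30, 40]

# n_jobs=-1 uses all CPUs; results come back in the order of list_n_estimators
modelos = Parallel(n_jobs=-1)(
    delayed(train_model)(X, y, n) for n in list_n_estimators
)
print([m.n_estimators for m in modelos])
```

With the default loky backend, joblib memory-maps NumPy arrays larger than `max_nbytes` (a `Parallel` parameter) instead of copying them into every task. This can make it cheaper than `Pool.starmap` when `X` is large.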
Session information
In [13]: from sinfo import sinfo
sinfo()
-----
joblib 0.15.1
numpy 1.19.2
pandas 1.1.3
sinfo 0.3.1
sklearn 0.23.1
-----
IPython 7.18.1
jupyter_client 6.1.7
jupyter_core 4.6.3
jupyterlab 2.2.9
notebook 6.1.4
-----
Python 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
Linux-5.4.0-1029-aws-x86_64-with-debian-buster-sid
8 logical CPU cores, x86_64
-----
Session information updated at 2020-10-31 13:07