Copilot
Sure! Here's a complete code snippet to develop a predictive model for your water quality project, based on the inputs and objectives we've discussed:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load the dataset
df = pd.read_csv("water_quality_dataset.csv")
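# Data Cleaning
# .copy() makes df_cleaned an independent frame, so the column assignment below is safe
df_cleaned = df.dropna().copy()
df_cleaned['pH'] = pd.to_numeric(df_cleaned['pH'], errors='coerce')
df_cleaned = df_cleaned.dropna(subset=['pH'])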
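# Data Analysis
summary_stats = df_cleaned.describe()
# numeric_only=True skips any text columns, which would otherwise raise in recent pandas
correlation_matrix = df_cleaned.corr(numeric_only=True)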
# Data Visualization
plt.figure(figsize=(10, 6))
sns.histplot(df_cleaned['pH'], kde=True)
plt.title('Distribution of pH Levels')
plt.xlabel('pH')
plt.ylabel('Frequency')
plt.show()

plt.figure(figsize=(10, 6))
sns.scatterplot(x='Dissolved Oxygen', y='BOD', data=df_cleaned)
plt.title('Dissolved Oxygen vs. Biochemical Oxygen Demand')
plt.xlabel('Dissolved Oxygen (mg/L)')
plt.ylabel('BOD (mg/L)')
plt.show()

plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix of Water Quality Parameters')
plt.show()
# Potability Determination
def is_potable(row):
    # Return 1 only when every parameter falls inside its ideal range
    if (6.5 <= row['pH'] <= 8.5 and
            row['Dissolved Oxygen'] > 5 and
            row['BOD'] < 3 and
            row['Nitrate Nitrogen'] < 10 and
            row['Fecal_Coliform'] < 1):
        return 1
    else:
        return 0

df_cleaned['Potable'] = df_cleaned.apply(is_potable, axis=1)
# Predictive Model Development
X = df_cleaned[['pH', 'Dissolved Oxygen', 'BOD', 'Nitrate Nitrogen', 'Fecal_Coliform']]
y = df_cleaned['Potable']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(report)
# Feature Importance
feature_importances = model.feature_importances_
# X.columns holds the feature names in the order the model was trained on
for feature, importance in zip(X.columns, feature_importances):
    print(f"{feature}: {importance:.4f}")
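The listing above needs water_quality_dataset.csv, which may not be at hand. As a rough sketch, the labelling and modelling steps can be tried on a synthetic table instead. Everything about the data here is an assumption: the value ranges are invented, the column names and thresholds simply mirror the listing, and the potability rule is written with column-wise masks rather than apply().

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

FEATURES = ["pH", "Dissolved Oxygen", "BOD", "Nitrate Nitrogen", "Fecal_Coliform"]

# Synthetic stand-in for the CSV: the ranges are invented, not real measurements.
rng = np.random.default_rng(42)
n_rows = 1000
df = pd.DataFrame({
    "pH": rng.uniform(6.0, 9.0, n_rows),
    "Dissolved Oxygen": rng.uniform(3.0, 10.0, n_rows),
    "BOD": rng.uniform(0.0, 5.0, n_rows),
    "Nitrate Nitrogen": rng.uniform(0.0, 12.0, n_rows),
    "Fecal_Coliform": rng.integers(0, 2, n_rows),
})

# Same thresholds as is_potable, expressed as boolean masks over whole columns.
df["Potable"] = (
    df["pH"].between(6.5, 8.5)
    & (df["Dissolved Oxygen"] > 5)
    & (df["BOD"] < 3)
    & (df["Nitrate Nitrogen"] < 10)
    & (df["Fecal_Coliform"] < 1)
).astype(int)

X = df[FEATURES]
y = df["Potable"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Same model settings as the listing.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.3f}")

# Pair each importance with its column name and show the largest first.
importances = pd.Series(model.feature_importances_, index=FEATURES)
print(importances.sort_values(ascending=False))
```

Because the label is computed from the same five columns the model sees, the score mostly shows how closely the forest recovers the threshold rule.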
Explanation:
1. Data Cleaning: The code removes missing values and ensures the pH values are numeric.
2. Data Analysis: Summary statistics and a correlation matrix are generated.
3. Data Visualization: Histograms, scatter plots, and heatmaps are created to visualize the data.
4. Potability Determination: A function is defined to classify water samples as potable or non-potable based on ideal parameter ranges.
5. Predictive Model Development: A Random Forest Classifier is trained to predict water potability. The model's accuracy and a classification report are printed.
6. Feature Importance: The importance of each feature in the predictive model is displayed.
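To make steps 1 and 2 concrete, here is a small sketch on a hand-made table. The rows, and the extra Station text column, are hypothetical and exist only to show how unparseable pH entries are coerced to NaN and dropped, and how the correlation is restricted to numeric columns. It assumes a pandas version where corr() accepts numeric_only (1.5 or later).

```python
import pandas as pd

# Hypothetical rows; the values are made up for illustration.
samples = pd.DataFrame({
    "Station": ["A", "B", "C", "D"],
    "pH": ["7.2", "9.1", "not measured", "6.8"],
    "Dissolved Oxygen": [6.5, 7.0, 8.0, 4.0],
    "BOD": [1.5, 2.0, 1.0, 2.5],
})

# Step 1: coerce pH to numbers; entries that cannot be parsed become NaN.
samples["pH"] = pd.to_numeric(samples["pH"], errors="coerce")
cleaned = samples.dropna(subset=["pH"]).copy()

# Step 2: summary statistics and a correlation matrix over numeric columns only.
summary_stats = cleaned.describe()
correlation_matrix = cleaned.corr(numeric_only=True)

print(cleaned)
print(correlation_matrix)
```

The row with "not measured" is removed during cleaning, and Station is left out of the correlation matrix because it is not numeric.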
This code provides a comprehensive approach to developing a predictive model for your water quality project. If you have any questions or need further customization, feel free to ask!