Student: Cecilia Baggini

Course: Women in Data Academy

Modules: Python Libraries

Using data to explore water quality issues in the West Midlands

Using the Python libraries scikit-learn, sktime and seaborn, I investigated whether the causes of water quality issues in West Midlands rivers can be predicted from temporal patterns in phosphate, ammonia, river flow and rainfall.

To get started, I collected the data needed for the model: water quality measurements, monthly flow and rainfall data, and the known causes of water quality issues for each catchment. I then used pandas and geopandas to select the water quality sites in the West Midlands and associate each one with its river catchment.
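The site-to-catchment step can be sketched as a geopandas spatial join, which attaches to each sampling point the attributes of the catchment polygon it falls within. This is a minimal sketch, not the exact code used in the project: the sites, polygons, column names (`site_id`, `region`, `catchment`) and coordinates are all hypothetical, and the data are placed in British National Grid (EPSG:27700) purely for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical catchment polygons (two squares, coordinates in metres).
catchments = gpd.GeoDataFrame(
    {"catchment": ["Catchment A", "Catchment B"]},
    geometry=[
        Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)]),
        Polygon([(1000, 0), (2000, 0), (2000, 1000), (1000, 1000)]),
    ],
    crs="EPSG:27700",
)

# Hypothetical water quality sampling sites with an assumed region column.
sites = gpd.GeoDataFrame(
    {
        "site_id": ["S1", "S2", "S3"],
        "region": ["West Midlands", "West Midlands", "Other"],
    },
    geometry=[Point(250, 500), Point(1500, 200), Point(900, 900)],
    crs="EPSG:27700",
)

# Keep only West Midlands sites (a plain pandas-style attribute filter).
wm_sites = sites[sites["region"] == "West Midlands"]

# Spatial join: attach the catchment each site lies within.
sites_with_catchment = gpd.sjoin(
    wm_sites, catchments, how="left", predicate="within"
)
print(sites_with_catchment[["site_id", "catchment"]])
```

Both layers must share a coordinate reference system before the join; if they do not, one can be reprojected first with `to_crs`.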

The images below show the initial data exploration with seaborn, which examined how phosphate levels, and their relationship with rainfall, differ depending on the cause of the water quality problems, for catchments where that cause is known for certain.
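Alongside the plots, the same question can be put in numbers by computing the rainfall/phosphate correlation separately for each known cause. The sketch below assumes a tidy pandas table with hypothetical columns `cause`, `rainfall_mm` and `phosphate_mg_l`; the values are invented for illustration only and are not project results.

```python
import pandas as pd

# Hypothetical monthly observations: invented values for illustration only.
df = pd.DataFrame(
    {
        "cause": ["cause_A"] * 4 + ["cause_B"] * 4,
        "rainfall_mm": [10, 20, 30, 40, 10, 20, 30, 40],
        "phosphate_mg_l": [0.10, 0.20, 0.30, 0.40, 0.40, 0.30, 0.20, 0.10],
    }
)

# Pearson correlation between rainfall and phosphate within each cause group.
corr_by_cause = {
    cause: group["rainfall_mm"].corr(group["phosphate_mg_l"])
    for cause, group in df.groupby("cause")
}
print(corr_by_cause)
```

In seaborn, the matching picture is one regression line per cause, for example `sns.lmplot(data=df, x="rainfall_mm", y="phosphate_mg_l", hue="cause")`.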

Data shown below in NumPy and pandas

TechTalent Academy Women in Data student Cecilia Student Showcase

Following the preliminary data exploration, I applied what I had learned on the course to train a model that predicts the causes of water quality issues in the river catchments where these were unknown. I chose the sktime library because it allows time series to be used directly as predictor variables in machine learning models.
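sktime classifiers accept panel data as a 3D NumPy array of shape (instances, variables, time points), here one instance per catchment with monthly series for phosphate, ammonia, flow and rainfall. The sketch below builds data in that layout but, to keep dependencies light, swaps in a simpler technique: it flattens each catchment's series into one feature vector and fits a scikit-learn random forest. This is not the project's model. The array sizes, the random data and the labels `cause_A`/`cause_B` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical panel: 40 labelled catchments x 4 variables
# (phosphate, ammonia, flow, rainfall) x 24 monthly values.
# The (instances, variables, time points) layout is the 3D NumPy
# format that sktime classifiers accept.
n_labelled, n_vars, n_months = 40, 4, 24
X_labelled = rng.normal(size=(n_labelled, n_vars, n_months))
y_labelled = rng.choice(["cause_A", "cause_B"], size=n_labelled)

# Hypothetical catchments whose cause is unknown (same layout, no labels).
X_unknown = rng.normal(size=(5, n_vars, n_months))


def flatten(panel):
    """Turn (instances, variables, months) into (instances, variables * months)."""
    return panel.reshape(panel.shape[0], -1)


# Stand-in for an sktime classifier: a standard tabular model on flattened series.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(flatten(X_labelled), y_labelled)

# Predict the likely cause for each unlabelled catchment.
predicted_causes = clf.predict(flatten(X_unknown))
print(predicted_causes)
```

Flattening treats each month as an independent feature and so ignores temporal order; that is the gap dedicated time series classifiers aim to close. With sktime installed, a classifier such as `RocketClassifier` from `sktime.classification.kernel_based` can be fitted on `X_labelled` directly, without the flattening step.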

The initial model achieved an accuracy of 70%, so I am now trying other approaches I learned during the Women in Data Academy to improve its performance.
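An accuracy figure is easiest to interpret next to a majority-class baseline: if most catchments share one cause, always predicting that cause can already score well. The sketch below compares a model against scikit-learn's `DummyClassifier` using stratified cross-validation. The features, the 40/20 class split and the labels are hypothetical and chosen only so the baseline is easy to check by hand; they say nothing about the project's data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)

# Hypothetical flattened features: 60 catchments x (4 variables * 24 months).
X = rng.normal(size=(60, 4 * 24))
# Hypothetical imbalanced labels: two thirds of catchments share one cause.
y = np.array(["cause_A"] * 40 + ["cause_B"] * 20)

# Stratified folds keep the class proportions the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

baseline = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=cv, scoring="accuracy"
)
model = cross_val_score(
    RandomForestClassifier(random_state=0), X, y, cv=cv, scoring="accuracy"
)

print(f"baseline accuracy: {baseline.mean():.2f}")
print(f"model accuracy:    {model.mean():.2f} (+/- {model.std():.2f})")
```

Reporting the spread across folds as well as the mean shows how stable an accuracy figure is when the number of labelled catchments is small.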

If you would like to learn more about our Women in Data Academy, visit our website for further information.