Student: Cecilia Baggini
Course: Women in Data Academy
Modules: Python Libraries
Using data to explore water quality issues in the West Midlands
Using the Python libraries scikit-learn, sktime and seaborn, I investigated whether the causes of water quality issues in West Midlands rivers can be predicted from temporal patterns in water phosphate, ammonia, flow and rainfall.
To get started, I collected the data I needed for the model, including water quality data, monthly flow and rainfall data, and the causes of water quality issues for each catchment. I then used geopandas and pandas to select the water quality sites in the West Midlands and associate them with their river catchments.
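The site-to-catchment step can be illustrated with a spatial join. This is only a minimal sketch under stated assumptions, not the code used in the project: the two square "catchments", the site IDs, the coordinates and all column names are made up for illustration.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Hypothetical catchment boundaries: two adjacent unit squares (toy data).
catchments = gpd.GeoDataFrame(
    {"catchment": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
    ],
    crs="EPSG:27700",  # British National Grid, assumed here
)

# Hypothetical monitoring sites given as plain coordinates in a pandas table.
sites = pd.DataFrame(
    {
        "site_id": ["S1", "S2", "S3"],
        "easting": [0.5, 1.5, 5.0],
        "northing": [0.5, 0.5, 5.0],
    }
)

# Turn the table into point geometries in the same coordinate system.
sites_gdf = gpd.GeoDataFrame(
    sites,
    geometry=gpd.points_from_xy(sites["easting"], sites["northing"]),
    crs="EPSG:27700",
)

# Keep only sites that fall inside a catchment and attach the catchment name.
sites_in_catchments = gpd.sjoin(
    sites_gdf, catchments, how="inner", predicate="within"
)
site_to_catchment = dict(
    zip(sites_in_catchments["site_id"], sites_in_catchments["catchment"])
)
```

With an inner join, sites that lie outside every polygon (S3 in this toy data) are dropped, which is one way of restricting a national dataset to a region of interest.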
The images below show the initial data exploration, in which I used seaborn to see how phosphate, and its relationship with rainfall, changes depending on the cause of water quality problems in catchments where that cause is known for certain.
Data shown below in NumPy and Pandas
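An exploratory plot of phosphate against rainfall, split by cause, could be produced along the following lines. This is a hedged sketch only: the data are synthetic, and the column names and the labels cause_a and cause_b are placeholders rather than the categories used in the project.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 60

# Synthetic stand-in data: monthly rainfall and a placeholder cause label.
df = pd.DataFrame(
    {
        "rainfall_mm": rng.uniform(20, 120, n),
        "cause": np.repeat(["cause_a", "cause_b"], n // 2),
    }
)

# In this toy data, phosphate rises with rainfall only for cause_a.
df["phosphate_mg_l"] = (
    0.1
    + 0.002 * df["rainfall_mm"] * (df["cause"] == "cause_a")
    + rng.normal(0, 0.02, n)
)

# One scatter panel per cause, sharing axes so the panels are comparable.
g = sns.relplot(
    data=df, x="rainfall_mm", y="phosphate_mg_l", col="cause", kind="scatter"
)
g.set_axis_labels("Monthly rainfall (mm)", "Phosphate (mg/l)")
```

Putting each cause in its own panel makes it easy to see whether the rainfall and phosphate relationship differs between groups before any model is trained.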
Following the preliminary data exploration, I used the knowledge I had gained during the course to train a model predicting the causes of water quality issues in the river catchments where these were unknown. I chose the sktime library because it allowed me to use time series directly as predictor variables in machine learning models.
The initial model achieved an accuracy of 70%, so I am now trying other approaches I learned during the Women in Data Academy to improve its performance.
If you would like to learn more about our Women in Data Academy, visit our website for further information.