Climate Wins
Machine Learning Project
Background
Climate Wins is concerned with the increase in extreme weather events, especially in the past 10 to 20 years. The company wants to assess the tools available to categorize and predict the weather in mainland Europe, focusing on machine learning as a predictive tool.
Objective
Assess the viability of machine learning as a technique for predicting climate change with Climate Wins weather data from the past century.
Context
This is a project I completed as a part of the data analytics course at Career Foundry.
The analysis for this project was conducted in Python. Libraries used include pandas, NumPy, matplotlib, scipy, seaborn, sklearn, keras, and more.
Visualizations were created in Python primarily using matplotlib.
Data Sets
The data set compiles weather observations from 18 different weather stations across Europe with data ranging from the late 1800s to 2022. Recordings exist for almost every day with values such as temperature, wind speed, snow, global radiation, and more. This data is collected by the European Climate Assessment & Data Set project.
The data set can be found here.
Steps
Constraining and Directing
It was important to first anonymize the data to protect data privacy. Also, identifying potential biases behind the data was vital to avoid skewing the results of the project. Examples include:
Economic bias - wealthy countries are more likely to have the capability of producing high-quality data, skewing the data towards those countries
Latent bias - past data may not be effective for predicting the future since new impacts of climate change are being identified each year. Prior data may not exist to help predict these effects in the future.
Scaling Data
Scaling the data around its mean provided a consistent framework for the machine learning techniques, normalized the values, and helped identify outliers.
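As a minimal sketch of this step, assuming that scaling around the mean refers to z-score standardization (subtract the mean, divide by the standard deviation), the example below scales a small made-up temperature series and flags values more than two standard deviations from the mean. The readings, the function names, and the threshold of 2 are illustrative assumptions, not values from the project.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]

def flag_outliers(values, threshold=2.0):
    """Return the indices whose absolute z-score exceeds the threshold."""
    return [i for i, z in enumerate(standardize(values)) if abs(z) > threshold]

# Hypothetical daily mean temperatures in degrees Celsius (not project data).
temps = [11.2, 12.1, 10.8, 11.9, 12.4, 11.5, 30.0, 11.0]
z_scores = standardize(temps)      # mean 0, standard deviation 1
outlier_idx = flag_outliers(temps) # only the 30.0 reading is flagged
```

After scaling, every feature has a mean of 0 and a standard deviation of 1, so distance-based models are not dominated by whichever feature happens to have the largest units.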
Optimization
This step involved finding the best parameters for the machine learning models to achieve the most accurate results. I performed gradient-descent optimizations, which helped to predict future parameters. I also ran feature importance analyses to identify how future modeling should be weighted in favor of or against certain features. In addition, I backed up the results with confusion matrices to test accuracy.
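To illustrate the idea behind gradient-descent optimization, here is a minimal sketch that fits a one-feature linear model by repeatedly stepping against the gradient of the mean squared error. The data, learning rate, and step count are illustrative assumptions; the project's actual optimization setup is not reproduced here.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimizing the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current model on every sample.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical points generated from y = 2x + 1 (not project data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)  # converges toward w = 2, b = 1
```

The learning rate controls the step size: too large and the updates diverge, too small and convergence takes many more steps.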
Testing Supervised and Unsupervised Learning Models
Supervised - models tested include K-Nearest Neighbors, Decision Trees, and Artificial Neural Networks
Unsupervised - models tested include Principal Component Analysis, Random Forests, Convolutional Neural Networks, and Generative Adversarial Networks
Decision Trees - Using branching, weighted decision points to predict weather patterns.
Generative Adversarial Network - correctly predicting the weather shown in a generated image.
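As a rough sketch of how a supervised model such as K-Nearest Neighbors can be checked with a confusion matrix, the example below classifies made-up observations as pleasant (1) or unpleasant (0) weather. The features, labels, and k=3 are hypothetical, and the classifier is a plain-Python stand-in, not the project's actual implementation.

```python
from collections import Counter
from math import dist

def knn_predict(train_x, train_y, point, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def confusion_matrix(actual, predicted):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

# Hypothetical (temperature in C, wind speed in m/s); 1 = pleasant, 0 = not.
train_x = [(22, 2), (24, 3), (21, 1), (8, 9), (5, 7), (10, 8)]
train_y = [1, 1, 1, 0, 0, 0]
test_x = [(23, 2), (7, 8), (20, 4), (12, 6)]
test_y = [1, 0, 1, 1]

predicted = [knn_predict(train_x, train_y, p) for p in test_x]
matrix = confusion_matrix(test_y, predicted)
accuracy = (matrix[0][0] + matrix[1][1]) / len(test_y)
```

Unlike a single accuracy figure, the confusion matrix shows which kind of mistake the model makes: here the one error is a pleasant day predicted as unpleasant (a false negative).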
Recommendations
Random Forest modeling was the most accurate and is currently a viable model for use in predicting weather patterns.
99% accuracy in unsupervised testing.
Generative Adversarial Networks have the most potential for future applications.
Visual identification of potentially disastrous weather patterns from satellite images.
Highly accurate GANs will be able to detect bad weather earlier and with greater accuracy.
Accuracy of all models is dependent on quality of inputs, especially key feature importance weighting.
Weather is not uniform throughout the data; each area places more importance on some factors than on others.
Average temperature is the most heavily weighted feature.