Climate Wins

Machine Learning Project

Background

  • Climate Wins is concerned with the increase in extreme weather events, especially in the past 10 to 20 years. The company wants to assess the tools available to categorize and predict the weather in mainland Europe, focusing on machine learning as a predictive tool.

Objective

  • Assess the viability of machine learning as a technique for predicting climate change with Climate Wins weather data from the past century.

Context

  • This is a project I completed as part of the data analytics course at Career Foundry.

  • The analysis for this project was conducted in Python. Libraries used include pandas, NumPy, matplotlib, SciPy, seaborn, scikit-learn, Keras, and others.

  • Visualizations were created in Python primarily using matplotlib.

Data Sets

  • The data set compiles weather observations from 18 weather stations across Europe, with data ranging from the late 1800s to 2022. Recordings exist for almost every day and include variables such as temperature, wind speed, snow, and global radiation. The data were collected by the European Climate Assessment & Data Set project.

    • The data set can be found here.


Steps

  • Constraining and Directing

    • The first step was to anonymize the data to protect privacy. It was also vital to identify potential biases in the data that could skew the project's results. Examples include:

      • Economic bias - wealthier countries are more likely to be able to produce high-quality data, so the data set may be skewed toward those countries.

      • Latent bias - historical data may not predict the future well, because new impacts of climate change are identified each year and there may be no prior data from which to predict these effects.

  • Scaling Data

    • Standardizing each variable around its mean put the data on a common scale for the machine learning models, normalized it, and made outliers easier to identify.

  • Optimization

    • The goal was to find the model parameters that give the most accurate results. I used gradient descent to optimize parameter values for future modeling, ran feature importance analyses to determine which features future models should weight more or less heavily, and evaluated accuracy with confusion matrices.
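
The scaling step can be sketched in Python with NumPy. This is a minimal illustration rather than the project's script: it assumes standardization (subtract each column's mean, divide by its standard deviation), and the `readings` array and the `standardize`/`flag_outliers` helpers are hypothetical. scikit-learn's `StandardScaler` performs the same transformation.

```python
import numpy as np

def standardize(data):
    """Return z-scores: each column centered on its mean and divided by its std."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (data - mean) / std

def flag_outliers(z, threshold=3.0):
    """Boolean mask of values more than `threshold` standard deviations from the mean."""
    return np.abs(z) > threshold

# Hypothetical daily readings: columns are mean temperature (°C) and wind speed (m/s)
readings = np.array([
    [10.0, 3.0],
    [12.0, 4.0],
    [11.0, 3.5],
    [35.0, 4.2],  # an unusually hot day
    [9.0, 2.8],
])
z = standardize(readings)
print(z.mean(axis=0).round(6))  # each column is now centered on 0
print(z.std(axis=0).round(6))   # and has standard deviation 1
print(flag_outliers(z, threshold=1.5))
```

The outlier threshold is lowered to 1.5 only because this toy sample has five rows; with so few points no z-score can exceed 2.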
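
Gradient descent can be illustrated on its simplest case: fitting a straight line by repeatedly stepping the parameters against the gradient of the mean squared error. This is a hedged sketch on synthetic data; the learning rate, iteration count, and `gradient_descent` helper are illustrative assumptions, not values from the project.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.1, iterations=2000):
    """Fit y ≈ intercept + slope * x by minimizing mean squared error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        error = intercept + slope * x - y
        # Partial derivatives of MSE = (1/n) * sum(error**2)
        grad_intercept = (2.0 / n) * error.sum()
        grad_slope = (2.0 / n) * (error * x).sum()
        # Step each parameter against its gradient
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope

# Synthetic data: a scaled feature x with a known linear relationship plus noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 8.0 + 3.0 * x + rng.normal(0, 0.1, size=x.size)
intercept, slope = gradient_descent(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Because the synthetic feature is already centered and scaled, a fixed learning rate of 0.1 converges; unscaled features usually need a smaller rate, which is one practical reason to scale before optimizing.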

Testing Supervised and Unsupervised Learning Models

  • Supervised - models tested include K-Nearest Neighbors, Decision Trees, Random Forests, Artificial Neural Networks, and Convolutional Neural Networks

  • Unsupervised - models tested include Principal Component Analysis and Generative Adversarial Networks
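
K-Nearest Neighbors can be sketched from scratch in NumPy, together with a confusion matrix for checking accuracy. This is a minimal illustration under stated assumptions: the two standardized features, the binary "pleasant day" label, and the toy arrays are hypothetical. scikit-learn provides equivalent `KNeighborsClassifier` and `confusion_matrix` utilities.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test row by majority vote among its k nearest training rows."""
    predictions = []
    for row in X_test:
        distances = np.linalg.norm(X_train - row, axis=1)  # Euclidean distance
        nearest = np.argsort(distances)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Hypothetical standardized features (temperature, sunshine); label 1 = "pleasant day"
X_train = np.array([[1.0, 1.2], [0.8, 0.9], [1.1, 0.7],
                    [-1.0, -0.8], [-0.7, -1.1], [-1.2, -0.5]])
y_train = np.array([1, 1, 1, 0, 0, 0])
X_test = np.array([[0.9, 1.0], [-0.9, -0.9], [0.2, -0.1]])
y_test = np.array([1, 0, 0])

y_pred = knn_predict(X_train, y_train, X_test, k=3)
cm = confusion_matrix(y_test, y_pred)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(f"accuracy = {accuracy:.2f}")
```

The third test day sits between the two clusters and is misclassified, which shows up as the off-diagonal count in the first row of the matrix.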
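
Principal Component Analysis can be sketched with NumPy's singular value decomposition. The data here are synthetic, three correlated variables driven by one hidden factor, chosen as an assumption to show how PCA compresses redundant weather variables into fewer components; the `pca` helper is hypothetical.

```python
import numpy as np

def pca(data, n_components=2):
    """Project data onto its top principal components using SVD."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (len(data) - 1)
    ratio = explained_variance / explained_variance.sum()
    scores = centered @ vt[:n_components].T
    return scores, ratio[:n_components]

# Synthetic data: three correlated variables driven by one hidden factor
rng = np.random.default_rng(1)
factor = rng.normal(size=200)
data = np.column_stack([
    factor + rng.normal(scale=0.1, size=200),
    2 * factor + rng.normal(scale=0.1, size=200),
    -factor + rng.normal(scale=0.1, size=200),
])
scores, ratio = pca(data, n_components=2)
print(scores.shape)
print(ratio.round(3))
```

Nearly all of the variance lands on the first component, because the three columns largely move together.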

Decision Trees - using a series of branching decision points, each splitting the data on a feature threshold, to predict weather patterns.
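
Each branch of a decision tree is chosen by testing thresholds on a feature and keeping the one that best separates the classes. A minimal sketch of a single split using Gini impurity follows; the temperatures, labels, and `best_split` helper are hypothetical. scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default but places thresholds at midpoints between values, so its exact thresholds differ from this sketch.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions ** 2)

def best_split(feature, labels):
    """Return the threshold (feature <= threshold goes left) with the lowest weighted Gini impurity."""
    best_threshold, best_impurity = None, gini(labels)
    # Try each distinct value except the largest, so both branches are non-empty
    for threshold in np.unique(feature)[:-1]:
        left = labels[feature <= threshold]
        right = labels[feature > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical daily mean temperatures (°C) and labels (1 = "pleasant day")
temperature = np.array([2.0, 5.0, 8.0, 14.0, 18.0, 21.0, 25.0])
pleasant = np.array([0, 0, 0, 1, 1, 1, 1])
threshold, impurity = best_split(temperature, pleasant)
print(f"split at temperature <= {threshold}, weighted Gini = {impurity:.3f}")
```

A full tree applies the same search to every feature and then repeats it recursively within each branch.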

Generative Adversarial Network - correctly predicting the weather shown in a generated image.

Recommendations

  • Random Forest modeling was the most accurate of the models tested and is already viable for predicting weather patterns.

    • 99% accuracy in testing.

  • Generative Adversarial Networks have the most potential for future applications.

    • Visual identification of potentially disastrous weather patterns from satellite images.

    • More accurate GANs could help detect severe weather earlier and more reliably.

  • The accuracy of every model depends on the quality of its inputs, especially how key features are weighted.

    • Weather is not uniform across the data set; different areas place more importance on some factors than others.

    • Average temperature is the most heavily weighted feature.
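
Feature importance can be estimated in several ways. The sketch below uses permutation importance on a simple linear model, a swapped-in technique that is not necessarily the one used in the project: shuffle one feature at a time and measure how much the prediction error grows. The column names `temp_mean`, `wind_speed`, and `snow_depth` and the synthetic coefficients are hypothetical, chosen so that temperature dominates. Random forests in scikit-learn also expose impurity-based importances through `feature_importances_`.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def permutation_importance(coef, X, y, rng):
    """Increase in mean squared error when each feature column is shuffled."""
    baseline = np.mean((predict_linear(coef, X) - y) ** 2)
    importances = []
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])  # break this feature's link to y
        error = np.mean((predict_linear(coef, X_shuffled) - y) ** 2)
        importances.append(error - baseline)
    return np.array(importances)

rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 3))  # hypothetical columns: temp_mean, wind_speed, snow_depth
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.2, size=n)
coef = fit_linear(X, y)
importances = permutation_importance(coef, X, y, rng)
for name, score in zip(["temp_mean", "wind_speed", "snow_depth"], importances):
    print(f"{name}: {score:.3f}")
```

Shuffling the dominant feature destroys most of the model's predictive power, so its importance score is far larger than the others.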
