Project Overview
Developed a model to improve the accuracy of annual crop yield estimates from limited training data, using feature engineering and machine learning techniques. The goal was to explore the predictive potential of the customer's dataset of annual crop yields.
Data Acquisition & Feature Engineering
Additional data was acquired from external geospatial satellite sources, and new features were engineered from the existing ones. Both regression and classification models were trained on the data to explore its full predictive potential.
Results
Support Vector Regression was the best-performing regression model, achieving an average mean absolute percentage error (MAPE) of 15.6% across the crops. For various technical reasons, the problem was then reframed as a classification problem, and an XGBoost model achieved F1-scores between 0.78 and 0.85 across the crops. This is a promising result that gives the customer improved predictive capability.
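For readers less familiar with the two metrics quoted above, here is a minimal sketch of how MAPE and the F1-score are commonly computed. It is not the project's evaluation code: the function names, the sample yields, and the binary class labels are made up for illustration, and this summary does not say how the classes were actually defined.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (the ratio would be undefined).
    """
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def f1_score(actual, predicted, positive=1):
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical yields (for example, tonnes per hectare), not project data.
actual_yield = [4.0, 5.0, 2.0, 8.0]
predicted_yield = [3.0, 5.5, 2.0, 10.0]
yield_mape = mape(actual_yield, predicted_yield)

# Hypothetical binary labels (for example, 1 = above-average yield).
actual_class = [1, 0, 1, 1, 0, 1]
predicted_class = [1, 0, 0, 1, 1, 1]
class_f1 = f1_score(actual_class, predicted_class)

print(f"MAPE: {yield_mape:.1f}%  F1: {class_f1:.2f}")
```

A lower MAPE means predictions are closer to the observed yields in relative terms, while an F1-score closer to 1 means the classifier balances precision and recall well for the chosen class.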