Volcano Eruption Prediction

Time Series Analysis and Regression Modeling for Volcanic Activity Forecasting

Project Overview

This project tackles the challenge of predicting volcanic eruptions from time series sensor data. The system predicts two quantities: the tilt-erupt (pressure) value at the moment of eruption and the time remaining until the eruption occurs. Using features extracted from temporal patterns and several regression models, the project shows how machine learning can contribute to volcanic hazard assessment and early warning systems.
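The two targets can be derived from a single record roughly as follows. This is a minimal sketch under an assumption the project does not state: that each record's last sample marks the eruption, so the tilt-erupt value is the final pressure reading and the time remaining is measured from a chosen cutoff sample. The function name `make_targets` and the `cutoff_index` parameter are hypothetical.

```python
import numpy as np

def make_targets(timestamps, pressures, cutoff_index):
    """Return (tilt_erupt, time_to_eruption) for one record.

    Hypothetical convention: the last sample is the eruption, and the model
    only sees data up to and including `cutoff_index`.
    """
    t = np.asarray(timestamps, dtype=float)
    p = np.asarray(pressures, dtype=float)
    if not 0 <= cutoff_index < len(t):
        raise IndexError("cutoff_index out of range")
    tilt_erupt = float(p[-1])                          # pressure at the (assumed) eruption sample
    time_to_eruption = float(t[-1] - t[cutoff_index])  # time between cutoff and eruption
    return tilt_erupt, time_to_eruption

# Example: record sampled at t = 0..3, observed up to t = 1
print(make_targets([0, 1, 2, 3], [1.0, 2.0, 4.0, 8.0], cutoff_index=1))  # → (8.0, 2.0)
```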

Key Objectives

Dataset

Size: 189 volcanic observations (150 training, 39 testing)

Format: Time series sensor data (timestamp and pressure measurements) tracking volcanic behavior from the rest phase through eruption
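Key Findings mention different window sizes and overlap configurations without specifying them. A minimal sketch of overlapping windowing with NumPy, assuming `overlap` is counted as the number of samples shared by consecutive windows (the function name, parameter names and this convention are hypothetical):

```python
import numpy as np

def sliding_windows(values, window_size, overlap):
    """Split a 1-D series into fixed-length, possibly overlapping windows.

    Hypothetical convention: `overlap` = samples shared by consecutive windows,
    so each window starts `window_size - overlap` samples after the previous one.
    Trailing samples that do not fill a whole window are dropped.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    values = np.asarray(values, dtype=float)
    step = window_size - overlap
    starts = range(0, len(values) - window_size + 1, step)
    # reshape keeps a (0, window_size) shape when the series is too short
    return np.array([values[s:s + window_size] for s in starts]).reshape(-1, window_size)

windows = sliding_windows(range(10), window_size=4, overlap=2)
print(windows)  # rows start at samples 0, 2, 4 and 6
```

Larger overlaps yield more (but more correlated) training windows from the same record, which is one reason to check robustness across several configurations.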

Key Features:

Methods & Techniques

Feature Engineering:
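Key Findings single out pressure range and temporal length as the most predictive features. The project lists tsfresh, which extracts a large library of such features automatically; as a lighter illustration, here is a minimal hand-rolled sketch with NumPy. The function name and the extra mean, standard deviation and slope features are assumptions, not the project's actual feature set.

```python
import numpy as np

def extract_features(timestamps, pressures):
    """Summarise one time-sorted pressure record as a flat feature dict.

    Hypothetical feature set: range and length echo the features named in
    Key Findings; mean, std and slope are illustrative extras.
    """
    t = np.asarray(timestamps, dtype=float)
    p = np.asarray(pressures, dtype=float)
    # Least-squares linear trend of pressure over time (needs >= 2 samples)
    slope = np.polyfit(t, p, deg=1)[0] if len(t) > 1 else 0.0
    return {
        "pressure_range": float(p.max() - p.min()),
        "length": float(t[-1] - t[0]),  # temporal length of the record
        "pressure_mean": float(p.mean()),
        "pressure_std": float(p.std()),
        "pressure_slope": float(slope),
    }

features = extract_features([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])
print(features)
```

One dict per record (or per window) can then be stacked into a pandas DataFrame to form the regression design matrix.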

Models Evaluated:

  • Linear Regression
  • RidgeCV
  • Linear SVR
  • Decision Tree
  • Random Forest
  • K-Neighbors
  • Gradient Boosting
  • AdaBoost
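A minimal sketch of comparing the eight listed models with scikit-learn on a 150/39 train/test split, scoring each by test-set R². The synthetic data, the hyperparameters, and the scaling pipelines for Linear SVR and K-Neighbors are assumptions for illustration only; LazyPredict, which the project lists, automates a similar sweep.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR
from sklearn.tree import DecisionTreeRegressor

def compare_models(X, y, test_size=39, seed=0):
    """Fit each model on the training split and return test R², best first."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    models = {
        "Linear Regression": LinearRegression(),
        "RidgeCV": RidgeCV(alphas=np.logspace(-3, 3, 13)),
        # Margin- and distance-based models are sensitive to feature scale
        "Linear SVR": make_pipeline(StandardScaler(), LinearSVR(max_iter=10_000, random_state=seed)),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(random_state=seed),
        "K-Neighbors": make_pipeline(StandardScaler(), KNeighborsRegressor()),
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
        "AdaBoost": AdaBoostRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Synthetic stand-in for 189 feature vectors (NOT the project's data)
rng = np.random.default_rng(0)
X = rng.normal(size=(189, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=189)
scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name:>18}: R² = {score:.4f}")
```

Expect the linear models to score close to 1 on this purely linear synthetic target; that says nothing about how they rank on the volcano features.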

Results & Performance

  • RidgeCV R² (tilt-erupt): 99.73%
  • Linear Regression R² (tilt-erupt): 99.60%
  • Random Forest R² (time to eruption): 95.94%
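These scores are coefficients of determination, R² = 1 − SS_res / SS_tot, reported as percentages: 99.73% means R² = 0.9973, i.e. the residual sum of squares is 0.27% of the target's total sum of squares around its mean. A minimal NumPy version (the function name `r2` is hypothetical; scikit-learn's `sklearn.metrics.r2_score` computes the same quantity):

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # variation around the mean
    return 1.0 - ss_res / ss_tot

# SS_res = 0.10 and SS_tot = 5.0, so R² = 1 - 0.02 = 0.98
print(r2([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))
```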

Key Findings

  • Best for Tilt-Erupt Prediction: RidgeCV and Linear Regression achieve R² scores above 0.99, i.e. near-perfect prediction of the pressure value at eruption
  • Best for Time Prediction: Random Forest (R²=0.9594), AdaBoost (R²=0.9570), and Gradient Boosting (R²=0.9549) outperform linear models for temporal forecasting
  • Model Robustness: Linear models maintain R²>0.94 across different window sizes and overlap configurations
  • Feature Importance: Pressure range and temporal length emerge as the most predictive features
  • Residual Analysis: Model residuals are close to normally distributed, and outliers in the test data are detected effectively
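The residual checks above are not spelled out. A minimal sketch, assuming a Shapiro-Wilk normality test via `scipy.stats.shapiro` (SciPy is an assumption; it is not in the project's technology list) and a hypothetical |z| > 3 rule for flagging outliers; the function name `residual_report` is also hypothetical:

```python
import numpy as np
from scipy import stats

def residual_report(y_true, y_pred, z_threshold=3.0):
    """Test residual normality and flag large residuals (hypothetical |z| rule)."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    w_stat, p_value = stats.shapiro(residuals)  # H0: residuals are normally distributed
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    outliers = np.flatnonzero(np.abs(z) > z_threshold)
    return {"shapiro_W": float(w_stat), "shapiro_p": float(p_value),
            "outlier_indices": outliers.tolist()}

# Synthetic example (NOT project data): small Gaussian errors plus one injected outlier
rng = np.random.default_rng(1)
y_true = rng.normal(size=100)
y_pred = y_true - rng.normal(scale=0.1, size=100)
y_pred[10] = y_true[10] - 5.0  # residual of 5.0 at index 10
report = residual_report(y_true, y_pred)
print(report)
```

A large Shapiro-Wilk p-value means normality is not rejected; here the injected outlier drives the p-value down, which is why outliers are usually inspected before judging normality.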

Technologies Used

Python, scikit-learn, tsfresh, pandas, NumPy, matplotlib, seaborn, statsmodels, XGBoost, LazyPredict, Google Colab