Project Overview
This project predicts volcanic eruptions from time series sensor data. The system predicts two quantities: the tilt-erupt (pressure) value at eruption time and the time remaining until eruption. Features extracted from temporal patterns feed multiple regression models, showing how machine learning can contribute to volcanic hazard assessment and early warning systems.
Key Objectives
- Predict the final pressure/tilt-erupt reading at the moment of eruption
- Estimate time to eruption based on real-time sensor readings
- Develop robust models that perform consistently across different window sizes
- Extract meaningful temporal features from imbalanced time series data
Dataset
Size: 189 volcanic observations (150 training, 39 testing)
Format: Time series sensor data with timestamp and pressure measurements tracking volcano behavior from rest phase through eruption
Key Features:
- Time: Temporal information from sensor readings
- Pressure (tilt-erupt): Volcanic pressure/tilt measurements
- Data undergoes Yeo-Johnson transformation for normalization
- Sliding window approach with variable sizes (w=100, w=300) and overlap
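The sliding window step can be illustrated with a minimal sketch. The function name `sliding_windows`, the synthetic `pressure` series, and the overlap of 50 samples are assumptions made for illustration; the project's actual windowing code and overlap values are not shown in this overview.

```python
import numpy as np

def sliding_windows(series, window, overlap):
    """Split a 1-D series into fixed-size windows that share `overlap` samples."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    series = np.asarray(series, dtype=float)
    step = window - overlap  # distance between consecutive window starts
    starts = range(0, len(series) - window + 1, step)
    if len(starts) == 0:
        # Series shorter than one window: return an empty (0, window) array.
        return np.empty((0, window))
    return np.array([series[s:s + window] for s in starts])

# Synthetic stand-in for one observation's pressure readings (assumption).
pressure = np.linspace(0.0, 1.0, 1000)
windows = sliding_windows(pressure, window=100, overlap=50)
print(windows.shape)  # (19, 100)
```

A trailing remainder shorter than `window` is dropped in this sketch; padding it instead would be an equally reasonable choice.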
Methods & Techniques
Feature Engineering:
- tsfresh library for comprehensive time series feature extraction
- Sliding window approach with overlapping segments
- Statistical feature selection to identify significant predictors
- Yeo-Johnson power transformation applied per observation
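Applying the Yeo-Johnson transformation per observation might look like the sketch below. It uses scikit-learn's `PowerTransformer`; the helper name `yeo_johnson_per_observation`, the synthetic exponential data, and the choice to standardize the output are assumptions, not details confirmed by the project.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

def yeo_johnson_per_observation(observations):
    """Fit a separate Yeo-Johnson transform on each observation's readings."""
    transformed = []
    for values in observations:
        column = np.asarray(values, dtype=float).reshape(-1, 1)
        # A fresh transformer per observation, so each lambda is estimated
        # independently of the other observations.
        pt = PowerTransformer(method="yeo-johnson", standardize=True)
        transformed.append(pt.fit_transform(column).ravel())
    return transformed

rng = np.random.default_rng(0)
# Two synthetic, right-skewed "observations" of different lengths (assumption).
observations = [rng.exponential(2.0, size=400), rng.exponential(5.0, size=250)]
normalized = yeo_johnson_per_observation(observations)

for i, series in enumerate(normalized):
    print(f"observation {i}: mean={series.mean():.3f}, std={series.std():.3f}")
```

Fitting one transformer per observation keeps each volcano's scale from leaking into the others, at the cost of making transformed values not directly comparable across observations.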
Models Evaluated:
Linear Regression
RidgeCV
Linear SVR
Decision Tree
Random Forest
K-Neighbors
Gradient Boosting
AdaBoost
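A comparison across the models listed above could be set up roughly as follows. This is a sketch on synthetic data from `make_regression`, sized to mirror the 150/39 split; the hyperparameters, random seeds, and feature count are assumptions, and the scores it prints are unrelated to the project's reported results.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import LinearSVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the extracted feature matrix (assumption).
X, y = make_regression(n_samples=189, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=39, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "RidgeCV": RidgeCV(),
    "Linear SVR": LinearSVR(max_iter=10000, random_state=0),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "K-Neighbors": KNeighborsRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: R2 = {score:.4f}")
```

Keeping the models in a dictionary makes it straightforward to rerun the same loop for each target (tilt-erupt value or time to eruption) and each window configuration.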
Results & Performance
- Linear Regression R²: 99.60%
- Random Forest R² (Time): 95.94%
Key Findings
- Best for Tilt-Erupt Prediction: RidgeCV and Linear Regression achieve R² > 0.99 on the tilt-erupt (pressure) target
- Best for Time Prediction: Random Forest (R²=0.9594), AdaBoost (R²=0.9570), and Gradient Boosting (R²=0.9549) outperform linear models for temporal forecasting
- Model Robustness: Linear models maintain R²>0.94 across different window sizes and overlap configurations
- Feature Importance: Pressure range and temporal length emerge as the most predictive features
- Residual Analysis: Residuals are approximately normally distributed, and residual checks on the test data flag outlying predictions
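One way to check residuals of the kind described in the last finding is sketched below. The Shapiro-Wilk test, the z-score threshold of 3, the helper name `residual_report`, and the synthetic predictions are all assumptions; the overview does not state which normality test or outlier rule the project used.

```python
import numpy as np
from scipy import stats

def residual_report(y_true, y_pred, z_threshold=3.0):
    """Test residuals for normality and flag large standardized residuals."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Shapiro-Wilk: a small p-value suggests the residuals are not normal.
    _, p_value = stats.shapiro(residuals)
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    outliers = np.flatnonzero(np.abs(z) > z_threshold)
    return {"shapiro_p": float(p_value), "outlier_idx": outliers.tolist()}

rng = np.random.default_rng(0)
# Synthetic test targets and predictions, sized like the 39-sample test set.
y_true = rng.normal(size=39)
y_pred = y_true + rng.normal(scale=0.1, size=39)
y_pred[5] += 2.0  # inject one deliberately bad prediction

report = residual_report(y_true, y_pred)
print(report["outlier_idx"])
```

With only 39 test points, a single extreme residual inflates the standard deviation, so a robust scale estimate (for example the median absolute deviation) may be preferable to the plain z-score used here.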
Technologies Used
Python
scikit-learn
tsfresh
pandas
NumPy
matplotlib
seaborn
statsmodels
XGBoost
LazyPredict
Google Colab