Table of Contents

Submitter Details

Name: Ranjana Rajendran

Email: ranjana.rajendran@gmail.com

Problem Statement: Predict tilt-erupt of volcano eruption and time duration till eruption

Note : Throughout this notebook, 'pressure' and 'tilt-erupt' have been used synonymously

SUMMARY OF THE STUDY


Result

For predicting tilt-erupt 2 Linear models gave best result

  1. Linear Regression model gave Rsquare value of 0.996039470506752
  2. RidgeCV model gave Rsquare model of 0.9973216655765884.

The 2 Linear models are robust to window sizes and overlap windows and consistently gave results above 0.94 for the task of predicting final tilt-erupt value. The models have been analyzed using residual analysis in this study.

For predicting time to eruption of volcano, a linear model and several ensemble methods showed good performance.

  1. RidgeCV model gave 0.9441159515751051.
  2. AdaBoostRegressor gave 0.9569821587366184.
  3. Random Forest gave 0.959427257586498
  4. GradientBoostRegressor gave 0.9549051234483795

Explanation of method

There are to regression tasks in this project. First one is to predict tilt-erupt value at time of explosion of a volcano and the second one is to predict time to explosion of a volcano. The methods used to build feature vectors for the 2 tasks are similar, but there is a difference. The observations from the sensors are split into sliding windows and features exracted from each window using tsfresh.extract_features. There is a poroperty of tsfresh.extract_features - it does a group by on the column_id and then extracts features over all windows in each group. For predicting the final tilt erupt, where the final target value is unique for each observation, we can leverage this aspect to our advantage to get a model which shows a high accuracy. As this groups by column_id, if we give obs_id as column_id, it will extract information from all windows of that observation and give a summary result which will be a single row in the feature vector. Features such as pressure_maximum, pressume_absolute_maximum ranked high in significance in this task as this was a summary of all windows in that observation and these features directly implied the target value. Utilizing this method, I found that the choice of model, their Rsquare values are even robust to different window sizes and overlap sizes as the function will give a summary result which will remain almost the same for several features.

For predicting time to eruption, column_id is a tuple (obs_id, first time stamp of window). This produced reasonable number of windows which gave meaningful features. This is summarized in the section for predicting time to eruption. I was able to get high Rsquare values in the range of 0.9x for this task for a window size of 300 and overlap of less than 100. For window size of 100, the Rsquare value reduced to 0.6x range.

Choice of Metric

The scorer used throughout this notebook is Rsquare value. For each model , the mean absolute error, mean squared error, mean absolute percentage error have been computed for each model evaluated. Mean squared log error couldn't be utilized the target value was often negative.

The reason for choosing Rsquare value: (excerpt from ChatGPT)

Emphasis on Deviations: Squaring the residuals gives more weight to larger deviations from the regression line. This is beneficial because it penalizes larger errors more heavily, providing a clearer picture of how well the model fits the data.

Ensuring Positivity: Squaring ensures that all residuals are positive, as the square of any real number is non-negative. This simplifies calculations and makes interpretation easier.

Differentiability: Squaring the residuals results in a smooth, differentiable function. This property is essential for many optimization algorithms used to minimize the sum of squared residuals, such as the ordinary least squares (OLS) method.

Statistical Basis: The sum of squared residuals is directly related to the variance of the dependent variable around the regression line. Minimizing this sum is equivalent to maximizing the likelihood function in the context of normal errors, which is a common assumption in linear regression.

Mathematical Convenience: Squaring the residuals leads to a straightforward mathematical formulation, making it easier to derive properties of the estimators and conduct statistical inference.

Overall, using the square of the residuals in linear regression provides a robust and statistically sound framework for fitting models to data and assessing their goodness of fit.

Test & Train Split of Dataset

Where applicable, sklearn's test_train_split function has been used. Howeever, for extrating features to create the feature vector for test and training set, a different method has been used. First 150 observations have been used for training and the remaining 39 used for testing. For running the notebook for a different window/overlap size, play the cell which will shuffle the list of observations, so that a different set of observations will be training and test set.

The reason for implementing the training and test set in this manner are as follows

  1. To ensure no data leakage. The test set has to be a completely unseen set during training. No window from the test set should be in training.
  2. The test set should be unbiased, ie , the test set should have an equal propertion of all phases of the volcano as there is in the training set. Providing 39 full observations for test set ensures that the test set does not have bias.
  3. tsfresh.extract_features could spill data between windows and yet produce different rows. Separating the training and test well in advance will avoid such data leakage.

Residual Analysis

Thorough residual analysis has been attempted in this project. Residuals and their density are plotted against the predicted values and predictors. QQPLots were made to confirm that the residuals follow a normal distribution. I did not have to make adjustments in the models to fix issues as the residual plots looked very good. The analysis of residual plots were made referring https://www.qualtrics.com/support/stats-iq/analyses/regression-guides/interpreting-residual-plots-improve-regression/

Different window/overlap sizes

The notebook can be made to run for different window and overlap values by editing the corresponding cells for the w and overlap sizes and then running the subsequent sections by clicking the play button and not making any code changes.

Load and extract the data

In [ ]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

#Import necessary model selection tools
from sklearn.model_selection import cross_val_score, RepeatedKFold, train_test_split, KFold
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

#Import necessary models
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.svm import LinearSVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

#Import necessary metrics
from sklearn.metrics import roc_auc_score, roc_curve, r2_score, mean_absolute_error, mean_squared_error, mean_squared_log_error, mean_absolute_percentage_error
#Simple decision tree
from sklearn.tree import DecisionTreeRegressor
# Simple bagged model
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
# Simple boosted model
from xgboost import XGBRegressor
In [ ]:
from google.colab import drive
drive.mount('/content/drive')

path_prefix = '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/'
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
import os
from glob import glob

dir = path_prefix + 'Volcano_Dataset/*/*'

fnames = glob(dir)
print(fnames)

# Read each observation as a 2 x n array in a list of arrays
arrays = [np.loadtxt(f, skiprows=14, delimiter = ',') for f in fnames]
['/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation13.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation14.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation15.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation16.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation17.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation18.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation20.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation19.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano8/observation21.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano1/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation13.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation14.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation15.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation16.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation18.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation17.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation20.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation21.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation19.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation22.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation23.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation24.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation26.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation25.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation27.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation28.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation30.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano3/observation29.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation14.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation13.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation15.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation16.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation17.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation18.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation19.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation20.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation21.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation22.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation23.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation24.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation25.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation26.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation27.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation28.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation29.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano10/observation30.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation13.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation14.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano4/observation15.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano9/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation13.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation15.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation14.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation16.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation17.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation18.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation20.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation19.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation21.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation22.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation24.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation23.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation25.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation27.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano6/observation26.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano7/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation3.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation4.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation5.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation6.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation7.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation9.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation8.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation10.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation11.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation12.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation13.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation14.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation15.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation16.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation17.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation18.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation19.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation21.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation20.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation23.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation22.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation24.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation25.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation26.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano2/observation27.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano5/observation1.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano5/observation2.txt', '/content/drive/MyDrive/ML-SwitchUP/Mini Project/Volcano/Volcano_Dataset/Volcano5/observation3.txt']

EDA

Scatter plot of tilt-erupt vs time stamp

In [ ]:
for x in arrays:
  plt.plot(x[0, :], x[1, :])
No description has been provided for this image
In [ ]:
!mkdir path_prefix + 'Volcano_Dataset_Transformed'

Investigate Yeo-Johnson power transformation

In [ ]:
# Transformation applied separately for each observation to avoid data leakage between observations
from sklearn.preprocessing import PowerTransformer, power_transform

new_arrays = []
path = path_prefix + 'Volcano_Dataset_Transformed/obs'
for i, x in enumerate(arrays, 0):
  x_transformed = power_transform(x, method = 'yeo-johnson', standardize = False, copy = True)
  new_arrays.append(x_transformed)
  pd.DataFrame(x_transformed).to_csv(path+str(i)+'.csv', index = False, header = False)
In [ ]:
import os
from glob import glob

dir_t = path_prefix + 'Volcano_Dataset_Transformed/*'

fnames_t = glob(dir_t)
#print(fnames_t)

# Read each observation as a 2 x n array in a list of arrays
new_arrays = [np.loadtxt(f, delimiter = ',', dtype = np.float64) for f in fnames_t]
In [ ]:
plt.xlabel('Time')
plt.ylabel('tilt-erupt')
for x in new_arrays:
  plt.plot(x[0, :], x[1, :])
plt.show()
No description has been provided for this image
In [ ]:
plt.subplot(1,2,1)
plt.xlabel('Time to eruption')
for x in new_arrays:
  sns.kdeplot(x[0, :])

plt.subplots_adjust(wspace=0.4)

plt.subplot(1,2,2)
plt.xlabel('tilt-erupt')
for x in new_arrays:
  sns.kdeplot(x[1, :])
plt.show()
No description has been provided for this image

This shows that most of the points are in the rest phase of the volcano and there are fewer points towards the time of eruption.

Investigate simply fitting to a polynomial curve

In [ ]:
x = []
y = []
for item in new_arrays:
  x.append(item[0,:].reshape(-1,1))
  y.append(item[1,:].reshape(-1,1))

x = np.concatenate(x)
y = np.concatenate(y)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Initialize the scaler
scaler = StandardScaler()

x_train_s = np.ravel(scaler.fit_transform(x_train))
x_test_s = np.ravel(scaler.transform(x_test))

print(y_train.shape)
print(y_test.shape)
print(x_train_s.shape)
print(x_test_s.shape)
(194261, 1)
(48566, 1)
(194261,)
(48566,)
In [ ]:
# Fit a polynomial curve
degree = 5  # Degree of the polynomial curve
coefficients = np.polyfit(x_train_s, y_train, degree)  # Fitting the polynomial curve
fitted_curve = np.polyval(coefficients, x_test_s)  # Evaluating the polynomial curve at given time points

# Plot the original data and the fitted curve
plt.scatter(x_train_s, y_train, label='Training Data', color = 'b')
plt.scatter(x_test_s, y_test, label='Test Data', color = 'g')
plt.scatter(x_test_s, fitted_curve, color='red', label='Fitted Curve (degree={})'.format(degree))
plt.xlabel('Time')
plt.ylabel('Tilt-Erupt')
plt.title('Fitting Polynomial Curve to Time Series Data')
plt.legend()
plt.grid(True)
plt.show()

print("r2 score of polynomial curve fitting: ",r2_score(fitted_curve,y_test))
print("Mean squared error of Linear Regression model: ",mean_squared_error(fitted_curve,y_test))
print("Mean absolute error of Linear Regression model: ",mean_absolute_error(fitted_curve,y_test))
print("Root mean squared error of Linear Regression model: ",np.sqrt(mean_squared_error(fitted_curve,y_test)))
No description has been provided for this image
r2 score of polynomial curve fitting:  -6.568577058389797
Mean squared error of Linear Regression model:  5.850870808354021e-17
Mean absolute error of Linear Regression model:  1.5680448224050767e-09
Root mean squared error of Linear Regression model:  7.649098514435555e-09

Increasing or decreasing the degree of polynomial did not help to improve Rsquare value considerably

Linear Regression with Polynomial Features

In [ ]:
#### Linear Regression with Polynomial Features

from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree=degree)
x_poly = poly_features.fit_transform(x_train_s.reshape(-1, 1))

# Linear regression
model = LinearRegression()
model.fit(x_poly, y_train)

x_fit_poly = poly_features.transform(x_test_s.reshape(-1, 1))
y_fit = model.predict(x_fit_poly)

# Plot the original data and the fitted curve
plt.scatter(x_train_s, y_train, label='Training Data', color = 'b')
plt.scatter(x_test_s, y_test, label='Test Data', color = 'g')
plt.scatter(x_test_s, y_fit, color='red', label='Fitted Curve (degree={})'.format(degree))
plt.xlabel('Time')
plt.ylabel('Tilt-Erupt')
plt.title('Fitting Linear Regression with Polynomial Features to Time Series Data')
plt.legend()
plt.grid(True)
plt.show()

print("r2 score of LR with Poly features: ",r2_score(y_fit,y_test))
print("Mean squared error of LR with Poly features: ",mean_squared_error(y_fit,y_test))
print("Mean absolute error of LR with Poly features: ",mean_absolute_error(y_fit,y_test))
print("Root mean squared error of LR with poly features: ",np.sqrt(mean_squared_error(y_fit,y_test)))
No description has been provided for this image
r2 score of LR with Poly features:  -6.568577058389565
Mean squared error of LR with Poly features:  5.850870808354017e-17
Mean absolute error of LR with Poly features:  1.568044822405037e-09
Root mean squared error of LR with poly features:  7.649098514435552e-09
In [ ]:
# Linear regression
model = LinearRegression()
model.fit(x_train_s.reshape(-1,1), y_train)

y_predicted = model.predict(x_test_s.reshape(-1,1))

print("r2 score of LR with Poly features: ",r2_score(y_predicted,y_test))
print("Mean squared error of LR with Poly features: ",mean_squared_error(y_predicted,y_test))
print("Mean absolute error of LR with Poly features: ",mean_absolute_error(y_predicted,y_test))
print("Root mean squared error of LR with poly features: ",np.sqrt(mean_squared_error(y_predicted,y_test)))
r2 score of LR with Poly features:  -253.95022922819055
Mean squared error of LR with Poly features:  6.769170619890776e-17
Mean absolute error of LR with Poly features:  1.523567466034987e-09
Root mean squared error of LR with poly features:  8.227496958304376e-09

We see that Linear Regression, or polynomial fitting , neither works on the data set as given above. For time series, we need to try extracting features from windows.

Install required libraries

In [ ]:
! pip install tsfresh
Collecting tsfresh
  Downloading tsfresh-0.20.2-py2.py3-none-any.whl (95 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 95.8/95.8 kB 1.3 MB/s eta 0:00:00
Requirement already satisfied: requests>=2.9.1 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (2.31.0)
Requirement already satisfied: numpy>=1.15.1 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (1.25.2)
Requirement already satisfied: pandas>=0.25.0 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (2.0.3)
Requirement already satisfied: scipy>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (1.11.4)
Requirement already satisfied: statsmodels>=0.13 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (0.14.1)
Requirement already satisfied: patsy>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (0.5.6)
Requirement already satisfied: scikit-learn>=0.22.0 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (1.2.2)
Requirement already satisfied: tqdm>=4.10.0 in /usr/local/lib/python3.10/dist-packages (from tsfresh) (4.66.2)
Collecting stumpy>=1.7.2 (from tsfresh)
  Downloading stumpy-1.12.0-py3-none-any.whl (169 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 169.1/169.1 kB 3.7 MB/s eta 0:00:00
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.10/dist-packages (from tsfresh) (2.2.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.25.0->tsfresh) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.25.0->tsfresh) (2023.4)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=0.25.0->tsfresh) (2024.1)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from patsy>=0.4.1->tsfresh) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.9.1->tsfresh) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.9.1->tsfresh) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.9.1->tsfresh) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.9.1->tsfresh) (2024.2.2)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.22.0->tsfresh) (1.4.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.22.0->tsfresh) (3.4.0)
Requirement already satisfied: packaging>=21.3 in /usr/local/lib/python3.10/dist-packages (from statsmodels>=0.13->tsfresh) (24.0)
Requirement already satisfied: numba>=0.55.2 in /usr/local/lib/python3.10/dist-packages (from stumpy>=1.7.2->tsfresh) (0.58.1)
Requirement already satisfied: llvmlite<0.42,>=0.41.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba>=0.55.2->stumpy>=1.7.2->tsfresh) (0.41.1)
Installing collected packages: stumpy, tsfresh
Successfully installed stumpy-1.12.0 tsfresh-0.20.2
In [ ]:
!pip install lazypredict
Collecting lazypredict
  Downloading lazypredict-0.2.12-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from lazypredict) (8.1.7)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from lazypredict) (1.2.2)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from lazypredict) (2.0.3)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from lazypredict) (4.66.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from lazypredict) (1.4.0)
Requirement already satisfied: lightgbm in /usr/local/lib/python3.10/dist-packages (from lazypredict) (4.1.0)
Requirement already satisfied: xgboost in /usr/local/lib/python3.10/dist-packages (from lazypredict) (2.0.3)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from lightgbm->lazypredict) (1.25.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from lightgbm->lazypredict) (1.11.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->lazypredict) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->lazypredict) (2023.4)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->lazypredict) (2024.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->lazypredict) (3.4.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->lazypredict) (1.16.0)
Installing collected packages: lazypredict
Successfully installed lazypredict-0.2.12

Extract Features over a sliding window on Time Series Data

  1. Windowing Technique: Each observation is divided into windows of size 'w' with an overlap of 'v', and features are extracted from each window. This allows for capturing temporal patterns within the data.

  2. Robustness: This windowing technique is robust to different observation lengths because residual readings are appended as the last window after the penultimate window of size 'w'. This allows for flexibility in handling observations of varying lengths.

  3. Two Tasks: There are two tasks: predicting the final 'tilt-erupt' value and predicting the 'time to eruption'.

  4. Different Y Values: For predicting the final 'tilt-erupt' value, the 'y' value appended to each window DataFrame is the same for all windows from the same observation. However, for predicting the 'time to eruption', the 'y' value is the beginning timestamp of each window, allowing for predicting the time until eruption.

  5. Feature Extraction: Features are extracted from the windows DataFrame using tsfresh.extract_features. This function performs a group by on 'column_id', which is defined differently for the two tasks. For predicting 'tilt-erupt', the 'column_id' is a tuple (obs_id, final tilt-erupt value), while for predicting 'time to eruption', it is a tuple (obs_id, first time stamp of window).

  6. Data Leakage Prevention: To avoid data leakage between training and test sets, it's crucial to separate them well in advance, before windowing. This ensures that overlapping windows don't end up in both the training and test sets.

In [ ]:
import math
import pandas as pd

from tsfresh.feature_extraction import ComprehensiveFCParameters,extract_features
from tsfresh.utilities.dataframe_functions import impute

from sklearn.model_selection import train_test_split

def computeFeatureVector(start_pos, end_pos, train_test_spec, w, overlap, ifptime) :
  df_windows = []
  count = 0
  for observation_num, x in enumerate(new_arrays[start_pos:end_pos], start_pos):
    df_obs = pd.DataFrame({'time': x[0, :].astype(float), 'Pressure': x[1, :], 'obs_id': observation_num})
    window_size = w
    window_hop = w - overlap
    start_frame = w
    end_frame = window_hop * math.floor(float(df_obs.shape[0]) / window_hop)

    df_list = []
    for frame_idx in range(start_frame, end_frame+1, window_hop):
      df_list.append(df_obs.iloc[frame_idx-window_size:frame_idx, :])
    df_list.append(df_obs.iloc[end_frame:df_obs.shape[0], :])

    for df in df_list:
      if (df.shape[0] == 0):
        break;
      df['y'] = x[1][x.shape[1]-1]
      if (ifptime):
        df['y'] = df.iloc[0][0]
      df['obs_id'] = list(zip(df.obs_id, df.y))
    pd.concat(df_list)
    df_windows.append(pd.concat(df_list))
  df_all_windows = pd.concat(df_windows)
  print(df_all_windows.shape)
  print(df_all_windows.head())
  path_suffix=""
  if (ifptime):
    path_suffix = "ptime"
  df_all_windows.to_csv(path_prefix + 'Windows' + str(w) + ',' + str(overlap) + path_suffix + train_test_spec+'.csv', index = False)

  df_full_features = extract_features(df_all_windows.drop(['y'], axis = 1, errors = 'ignore'), column_id='obs_id', column_sort="time", impute_function=impute, default_fc_parameters=ComprehensiveFCParameters())
  print(df_full_features.shape)
  print(df_full_features.index)
  df_full_features['y'] = [(a[1]) for a in df_full_features.index]
  df_full_features['obs_id'] = [(a[0]) for a in df_full_features.index]
  print(df_full_features.head())

  df_full_features.to_csv(path_prefix + 'WindowsComprehensiveExtractedFeatures' + str(w) + ',' + str(overlap) + path_suffix +train_test_spec+'.csv', index = False)

Prediction of tilt-erupt at eruption

In this section we will analyze models derived for features extracted for different window and overlap sizes for predicting the final tilt-erupt value at volcano eruption time.

The feature extraction is done by tsfresh.extract_features which will group the dataset based on column_id. In this case the column_id is a tuple (obs_id, last tilt-erupt value for the observation). This tuple is the same value for all windows from the same observation. The feature vector extracted from a training set of 150 observation will hence have 150 rows, one row for each observation, as the windows were grouped by column_id which is the tuple (obs_id, last tilt-erupt value of the observation).

w = 100, overlap = 0

In [ ]:
w = 100
ov = 0
In [ ]:
# Shuffle the array of observations so that it is
# not always the same set of observations which are training and test
np.random.shuffle(new_arrays)
In [ ]:
computeFeatureVector(0, 150, 'training', w, ov, False) # Observations 0 to 149 for training set
computeFeatureVector(150, 189, 'test', w, ov, False) # Observations 150 to 188 for test set
(235963, 4)
   time  Pressure                       obs_id    y
0 -7.42      0.00  (0, 3.5538654389651816e-07) 0.00
1 -7.42      0.00  (0, 3.5538654389651816e-07) 0.00
2 -7.42     -0.00  (0, 3.5538654389651816e-07) 0.00
3 -7.42      0.00  (0, 3.5538654389651816e-07) 0.00
4 -7.42      0.00  (0, 3.5538654389651816e-07) 0.00
WARNING:tsfresh.feature_extraction.settings:Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction: 100%|██████████| 150/150 [07:04<00:00,  2.83s/it]
(150, 783)
MultiIndex([(  0, 3.5538654389651816e-07),
            (  1, 3.5583512963837676e-07),
            (  2, 3.5536777062596756e-07),
            (  3, 3.2361928388801734e-08),
            (  4,  3.198339354297349e-08),
            (  5,  3.217059208356934e-08),
            (  6, 1.9559341821258307e-08),
            (  7,  1.949650826210547e-08),
            (  8,   8.23516017236024e-08),
            (  9,  1.964272370128105e-08),
            ...
            (140, 1.7804439487989928e-09),
            (141, 1.2529020699264907e-08),
            (142, 1.2482785393902992e-08),
            (143,  1.275488711165709e-08),
            (144,   1.86524683256835e-09),
            (145, 2.1835586551504416e-09),
            (146, 1.7075436484240946e-09),
            (147,  1.130550312394371e-09),
            (148,  7.367719536224891e-10),
            (149,  7.952451868192843e-10)],
           length=150)
        Pressure__variance_larger_than_standard_deviation  \
0 0.00                                               0.00   
1 0.00                                               0.00   
2 0.00                                               0.00   
3 0.00                                               0.00   
4 0.00                                               0.00   

        Pressure__has_duplicate_max  Pressure__has_duplicate_min  \
0 0.00                         0.00                         0.00   
1 0.00                         0.00                         0.00   
2 0.00                         0.00                         1.00   
3 0.00                         0.00                         0.00   
4 0.00                         0.00                         0.00   

        Pressure__has_duplicate  Pressure__sum_values  Pressure__abs_energy  \
0 0.00                     1.00                 -0.00                  0.00   
1 0.00                     1.00                 -0.00                  0.00   
2 0.00                     1.00                 -0.00                  0.00   
3 0.00                     1.00                  0.00                  0.00   
4 0.00                     1.00                  0.00                  0.00   

        Pressure__mean_abs_change  Pressure__mean_change  \
0 0.00                       0.00                   0.00   
1 0.00                       0.00                   0.00   
2 0.00                       0.00                   0.00   
3 0.00                       0.00                   0.00   
4 0.00                       0.00                   0.00   

        Pressure__mean_second_derivative_central  Pressure__median  ...  \
0 0.00                                      0.00             -0.00  ...   
1 0.00                                      0.00             -0.00  ...   
2 0.00                                      0.00             -0.00  ...   
3 0.00                                      0.00              0.00  ...   
4 0.00                                      0.00              0.00  ...   

        Pressure__fourier_entropy__bins_100  \
0 0.00                                 0.44   
1 0.00                                 0.74   
2 0.00                                 0.84   
3 0.00                                 0.14   
4 0.00                                 1.69   

        Pressure__permutation_entropy__dimension_3__tau_1  \
0 0.00                                               1.45   
1 0.00                                               1.50   
2 0.00                                               1.51   
3 0.00                                               1.51   
4 0.00                                               1.57   

        Pressure__permutation_entropy__dimension_4__tau_1  \
0 0.00                                               2.48   
1 0.00                                               2.56   
2 0.00                                               2.58   
3 0.00                                               2.59   
4 0.00                                               2.70   

        Pressure__permutation_entropy__dimension_5__tau_1  \
0 0.00                                               3.56   
1 0.00                                               3.66   
2 0.00                                               3.69   
3 0.00                                               3.73   
4 0.00                                               3.92   

        Pressure__permutation_entropy__dimension_6__tau_1  \
0 0.00                                               4.56   
1 0.00                                               4.67   
2 0.00                                               4.72   
3 0.00                                               4.76   
4 0.00                                               4.99   

        Pressure__permutation_entropy__dimension_7__tau_1  \
0 0.00                                               5.28   
1 0.00                                               5.39   
2 0.00                                               5.44   
3 0.00                                               5.48   
4 0.00                                               5.67   

        Pressure__query_similarity_count__query_None__threshold_0.0  \
0 0.00                                               0.00             
1 0.00                                               0.00             
2 0.00                                               0.00             
3 0.00                                               0.00             
4 0.00                                               0.00             

        Pressure__mean_n_absolute_max__number_of_maxima_7    y  obs_id  
0 0.00                                               0.00 0.00       0  
1 0.00                                               0.00 0.00       1  
2 0.00                                               0.00 0.00       2  
3 0.00                                               0.00 0.00       3  
4 0.00                                               0.00 0.00       4  

[5 rows x 785 columns]
(62664, 4)
   time  Pressure                        obs_id    y
0 -7.28     -0.00  (150, 2.623199859347049e-09) 0.00
1 -7.28     -0.00  (150, 2.623199859347049e-09) 0.00
2 -7.28      0.00  (150, 2.623199859347049e-09) 0.00
3 -7.28      0.00  (150, 2.623199859347049e-09) 0.00
4 -7.28      0.00  (150, 2.623199859347049e-09) 0.00
WARNING:tsfresh.feature_extraction.settings:Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction: 100%|██████████| 39/39 [01:50<00:00,  2.82s/it]
(39, 783)
MultiIndex([(150,  2.623199859347049e-09),
            (151, 1.6745400618242413e-09),
            (152, 2.3056905123176618e-09),
            (153,  2.130970730439121e-09),
            (154, 2.2203502550347822e-09),
            (155, 2.1720938225974015e-09),
            (156, 3.5291951621696047e-09),
            (157, 2.9365797970238453e-09),
            (158,   8.87742655268486e-08),
            (159, 2.9134127543336565e-09),
            (160,  8.908283301866573e-08),
            (161,  8.864296608370148e-08),
            (162, 5.3139513871931684e-08),
            (163,  5.375945532272394e-08),
            (164,  5.352125296215788e-08),
            (165,  4.359177948709936e-08),
            (166,  4.311601097723991e-08),
            (167,  4.293780085893408e-08),
            (168, 4.8256747930963684e-08),
            (169, 4.8084994427495644e-08),
            (170, 4.8641012340700516e-08),
            (171,  5.970207083196416e-08),
            (172,  5.988787647178924e-08),
            (173,   6.00232972447479e-08),
            (174, 1.0599871848013916e-07),
            (175, 1.0616198235927892e-07),
            (176, 1.0600264280439614e-07),
            (177,  4.613371717976378e-08),
            (178,  4.623525680005148e-08),
            (179,  4.612834016656873e-08),
            (180,  4.125577971897842e-08),
            (181,  4.101918938757605e-08),
            (182, 4.1330238129247715e-08),
            (183,  4.158545161946747e-08),
            (184, 4.1438173579691054e-08),
            (185,  4.206503600602408e-08),
            (186, 1.4080263397645962e-09),
            (187, 1.5685048816779147e-09),
            (188,    9.6059247231473e-10)],
           )
          Pressure__variance_larger_than_standard_deviation  \
150 0.00                                               0.00   
151 0.00                                               0.00   
152 0.00                                               0.00   
153 0.00                                               0.00   
154 0.00                                               0.00   

          Pressure__has_duplicate_max  Pressure__has_duplicate_min  \
150 0.00                         0.00                         1.00   
151 0.00                         0.00                         0.00   
152 0.00                         0.00                         0.00   
153 0.00                         0.00                         0.00   
154 0.00                         0.00                         0.00   

          Pressure__has_duplicate  Pressure__sum_values  Pressure__abs_energy  \
150 0.00                     1.00                  0.00                  0.00   
151 0.00                     1.00                  0.00                  0.00   
152 0.00                     1.00                  0.00                  0.00   
153 0.00                     1.00                  0.00                  0.00   
154 0.00                     1.00                  0.00                  0.00   

          Pressure__mean_abs_change  Pressure__mean_change  \
150 0.00                       0.00                   0.00   
151 0.00                       0.00                   0.00   
152 0.00                       0.00                   0.00   
153 0.00                       0.00                   0.00   
154 0.00                       0.00                   0.00   

          Pressure__mean_second_derivative_central  Pressure__median  ...  \
150 0.00                                      0.00              0.00  ...   
151 0.00                                     -0.00              0.00  ...   
152 0.00                                     -0.00              0.00  ...   
153 0.00                                      0.00              0.00  ...   
154 0.00                                     -0.00              0.00  ...   

          Pressure__fourier_entropy__bins_100  \
150 0.00                                 4.00   
151 0.00                                 3.93   
152 0.00                                 3.77   
153 0.00                                 3.92   
154 0.00                                 3.87   

          Pressure__permutation_entropy__dimension_3__tau_1  \
150 0.00                                               1.52   
151 0.00                                               1.61   
152 0.00                                               1.56   
153 0.00                                               1.60   
154 0.00                                               1.62   

          Pressure__permutation_entropy__dimension_4__tau_1  \
150 0.00                                               2.60   
151 0.00                                               2.78   
152 0.00                                               2.68   
153 0.00                                               2.75   
154 0.00                                               2.80   

          Pressure__permutation_entropy__dimension_5__tau_1  \
150 0.00                                               3.72   
151 0.00                                               4.01   
152 0.00                                               3.85   
153 0.00                                               3.95   
154 0.00                                               4.06   

          Pressure__permutation_entropy__dimension_6__tau_1  \
150 0.00                                               4.77   
151 0.00                                               5.19   
152 0.00                                               4.94   
153 0.00                                               5.01   
154 0.00                                               5.17   

          Pressure__permutation_entropy__dimension_7__tau_1  \
150 0.00                                               5.56   
151 0.00                                               5.92   
152 0.00                                               5.71   
153 0.00                                               5.67   
154 0.00                                               5.84   

          Pressure__query_similarity_count__query_None__threshold_0.0  \
150 0.00                                               0.00             
151 0.00                                               0.00             
152 0.00                                               0.00             
153 0.00                                               0.00             
154 0.00                                               0.00             

          Pressure__mean_n_absolute_max__number_of_maxima_7    y  obs_id  
150 0.00                                               0.00 0.00     150  
151 0.00                                               0.00 0.00     151  
152 0.00                                               0.00 0.00     152  
153 0.00                                               0.00 0.00     153  
154 0.00                                               0.00 0.00     154  

[5 rows x 785 columns]
In [ ]:
X_train_p = pd.read_csv(path_prefix + 'WindowsComprehensiveExtractedFeatures'+ str(w) + ',' + str(ov) + 'training.csv')
X_test_p = pd.read_csv(path_prefix + 'WindowsComprehensiveExtractedFeatures'+ str(w) + ',' + str(ov) +'test.csv')

pressures_train = X_train_p['y'] # Training target
pressures_test = X_test_p['y'] # Test target

X_train_p = X_train_p.drop(['y', 'obs_id'], axis = 1)
X_test_p = X_test_p.drop(['y', 'obs_id'], axis = 1)

column_names = X_train_p.columns

scaler = StandardScaler()
X_train_ps = pd.DataFrame(scaler.fit_transform(X_train_p), columns=column_names)
X_test_ps = pd.DataFrame(scaler.transform(X_test_p), columns=column_names)
In [ ]:
print(X_train_ps.shape)
print(X_test_ps.shape)
print(pressures_train.shape)
print(pressures_test.shape)
(150, 783)
(39, 783)
(150,)
(39,)
In [ ]:
from tsfresh import select_features

X_train_pressure = select_features(X_train_ps, pressures_train)
X_test_pressure = X_test_ps[X_train_pressure.columns]

print(X_train_pressure.shape)
print(X_test_pressure.shape)
(150, 506)
(39, 506)
In [ ]:
#for x in X_train_pressure.columns:
  #print(x)

LazyPredict

In [ ]:
from lazypredict.Supervised import LazyRegressor

lpredict = LazyRegressor(verbose=0,ignore_warnings=True, custom_metric=None)
models,predictions = lpredict.fit(X_train_pressure, X_test_pressure, pressures_train, pressures_test)


print(models)
 95%|█████████▌| 40/42 [00:12<00:00,  7.83it/s]
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001142 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 25410
[LightGBM] [Info] Number of data points in the train set: 150, number of used features: 501
[LightGBM] [Info] Start training from score 0.000000
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
100%|██████████| 42/42 [00:13<00:00,  3.13it/s]
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
                                                   Adjusted R-Squared  \
Model                                                                   
SGDRegressor                  336163185725451509960840059983757312.00   
MLPRegressor                                         7215713882599.33   
HuberRegressor                                                  68.66   
NuSVR                                                           14.01   
SVR                                                              3.69   
DecisionTreeRegressor                                            1.32   
LGBMRegressor                                                    1.24   
PassiveAggressiveRegressor                                       1.22   
GaussianProcessRegressor                                         1.22   
KernelRidge                                                      1.19   
HistGradientBoostingRegressor                                    1.16   
TweedieRegressor                                                 1.08   
ElasticNet                                                       1.08   
LassoLars                                                        1.08   
LassoLarsCV                                                      1.08   
DummyRegressor                                                   1.08   
Lars                                                             1.08   
LarsCV                                                           1.08   
Lasso                                                            1.08   
PoissonRegressor                                                 1.08   
XGBRegressor                                                     1.08   
ExtraTreeRegressor                                               1.07   
GammaRegressor                                                   1.04   
KNeighborsRegressor                                              1.02   
GradientBoostingRegressor                                        1.01   
ExtraTreesRegressor                                              1.01   
BaggingRegressor                                                 1.01   
RandomForestRegressor                                            1.01   
AdaBoostRegressor                                                1.01   
Ridge                                                            1.00   
BayesianRidge                                                    1.00   
LinearRegression                                                 1.00   
TransformedTargetRegressor                                       1.00   
LinearSVR                                                        1.00   
RidgeCV                                                          1.00   
OrthogonalMatchingPursuitCV                                      1.00   
OrthogonalMatchingPursuit                                        1.00   
LassoCV                                                          1.00   
ElasticNetCV                                                     1.00   

                                                              R-Squared  \
Model                                                                     
SGDRegressor                  -4140115024197665413320312640482836480.00   
MLPRegressor                                         -88867213080420.52   
HuberRegressor                                                  -832.30   
NuSVR                                                           -159.19   
SVR                                                              -32.18   
DecisionTreeRegressor                                             -2.90   
LGBMRegressor                                                     -1.90   
PassiveAggressiveRegressor                                        -1.69   
GaussianProcessRegressor                                          -1.69   
KernelRidge                                                       -1.34   
HistGradientBoostingRegressor                                     -0.91   
TweedieRegressor                                                  -0.03   
ElasticNet                                                        -0.03   
LassoLars                                                         -0.03   
LassoLarsCV                                                       -0.03   
DummyRegressor                                                    -0.03   
Lars                                                              -0.03   
LarsCV                                                            -0.03   
Lasso                                                             -0.03   
PoissonRegressor                                                  -0.03   
XGBRegressor                                                      -0.03   
ExtraTreeRegressor                                                 0.20   
GammaRegressor                                                     0.51   
KNeighborsRegressor                                                0.79   
GradientBoostingRegressor                                          0.83   
ExtraTreesRegressor                                                0.84   
BaggingRegressor                                                   0.86   
RandomForestRegressor                                              0.86   
AdaBoostRegressor                                                  0.88   
Ridge                                                              0.99   
BayesianRidge                                                      0.99   
LinearRegression                                                   0.99   
TransformedTargetRegressor                                         0.99   
LinearSVR                                                          0.99   
RidgeCV                                                            1.00   
OrthogonalMatchingPursuitCV                                        1.00   
OrthogonalMatchingPursuit                                          1.00   
LassoCV                                                            1.00   
ElasticNetCV                                                       1.00   

                                        RMSE  Time Taken  
Model                                                     
SGDRegressor                  64686553991.36        0.05  
MLPRegressor                            0.30        0.30  
HuberRegressor                          0.00        0.13  
NuSVR                                   0.00        0.04  
SVR                                     0.00        0.05  
DecisionTreeRegressor                   0.00        0.13  
LGBMRegressor                           0.00        0.34  
PassiveAggressiveRegressor              0.00        0.05  
GaussianProcessRegressor                0.00        0.10  
KernelRidge                             0.00        0.07  
HistGradientBoostingRegressor           0.00        0.70  
TweedieRegressor                        0.00        0.06  
ElasticNet                              0.00        0.04  
LassoLars                               0.00        0.08  
LassoLarsCV                             0.00        0.06  
DummyRegressor                          0.00        0.04  
Lars                                    0.00        0.08  
LarsCV                                  0.00        0.05  
Lasso                                   0.00        0.04  
PoissonRegressor                        0.00        0.07  
XGBRegressor                            0.00        0.36  
ExtraTreeRegressor                      0.00        0.10  
GammaRegressor                          0.00        0.55  
KNeighborsRegressor                     0.00        0.04  
GradientBoostingRegressor               0.00        0.31  
ExtraTreesRegressor                     0.00        0.35  
BaggingRegressor                        0.00        0.13  
RandomForestRegressor                   0.00        1.03  
AdaBoostRegressor                       0.00        0.65  
Ridge                                   0.00        0.05  
BayesianRidge                           0.00        0.08  
LinearRegression                        0.00        0.05  
TransformedTargetRegressor              0.00        0.06  
LinearSVR                               0.00        0.23  
RidgeCV                                 0.00        0.10  
OrthogonalMatchingPursuitCV             0.00        0.12  
OrthogonalMatchingPursuit               0.00        0.05  
LassoCV                                 0.00        2.35  
ElasticNetCV                            0.00        4.15  

Accordig to alaysis by LazyPredict, Linear models like RidgeCV, Linear Regression are suggested as the best models for this task. We will focus this study on Linear Regression, then Linear SVR, then we can try a Decision Tree, Random Forest and KNeighborsRegressor to see if we can infer anything useful.

Linear Models

Linear Regression [Good Model according to this study for predicting tilt-erupt]

In [ ]:
from sklearn.linear_model import LinearRegression

model_lr = LinearRegression()

param_grid = {
    'fit_intercept': [True, False],       # Tuning fit_intercept
    'copy_X': [True, False],               # Tuning copy_X
    'positive': [True, False],             # Tuning positive
    'n_jobs': [None, -1],                  # Tuning n_jobs
}

# Instantiate the GridSearchCV object
grid_search_lr = GridSearchCV(model_lr, param_grid, cv=5, scoring='neg_mean_squared_error')

# Fit the model to the training data
grid_search_lr.fit(X_train_pressure, pressures_train)

# Get the best model and its parameters
best_model_lr = grid_search_lr.best_estimator_
best_params_lr = grid_search_lr.best_params_

# Make predictions on the test data
prediction_lr = best_model_lr.predict(X_test_pressure)
print("best parameters:",best_params_lr)
print("r2 score of Linear Regression model: ",r2_score(prediction_lr,pressures_test))
print("Mean squared error of Linear Regression model: ",mean_squared_error(prediction_lr,pressures_test))
print("MAE score of Linear Regression model: ",mean_absolute_error(prediction_lr,pressures_test))
#print("MSLE score of Linear Regression model: ",mean_squared_log_error(prediction_lr,pressures_test))
print("MAPE score of Linear Regression model: ",mean_absolute_percentage_error(prediction_lr,pressures_test))
best parameters: {'copy_X': True, 'fit_intercept': True, 'n_jobs': None, 'positive': True}
r2 score of Linear Regression model:  0.998803034173374
Mean squared error of Linear Regression model:  1.2533941491150075e-18
MAE score of Linear Regression model:  6.670393160729303e-10
MAPE score of Linear Regression model:  0.04087138065695761

Evaluation of Linear regression best model

Residual Evaluation
In [ ]:
### Calculate residual

prediction_lr_train = best_model_lr.predict(X_train_pressure)

train_residuals = pressures_train - prediction_lr_train
test_residuals = pressures_test - prediction_lr

Plot residuals, density of residuals vs predicted values
In [ ]:
# Residual evaluation plots
plt.figure(figsize=(12, 6))

# Residuals vs Predictions plot
plt.subplot(2, 2, 1)
plt.scatter(pressures_test, test_residuals, c='green', marker='s', label='Test data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

plt.subplot(2, 2, 2)
plt.scatter(pressures_train, train_residuals, c='blue', marker='o', label='Training data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

# Residuals distribution plot
plt.subplot(2, 2, 3)
sns.histplot(test_residuals, bins = 20, color='green', label='Test residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.subplot(2, 2, 4)
sns.histplot(train_residuals, bins = 20, color='blue', label='Training residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

For training data, the residuals follow a normal distribution relative to predicted values. For certain window / overlap sizes, in the test data, it looks like there are outliers, which is expected as test data is scaled to transform, not to fit. The trained model is hence capable of detecting outliers in unseen data.

QQ Plot of residuals
In [ ]:
from cProfile import label
import statsmodels.api as sm
import scipy.stats as stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.qqplot(test_residuals, stats.t, fit=True, line="45", ax=ax1)
ax1.set_xlabel('Theoretical Quantiles')
ax1.set_ylabel('Sample Quantiles')
ax1.set_title('Q-Q Plot of Test Residuals')

sm.qqplot(train_residuals, stats.t, fit=True, line="45", ax=ax2)
ax2.set_xlabel('Theoretical Quantiles')
ax2.set_ylabel('Sample Quantiles')
ax2.set_title('Q-Q Plot of Training Residuals')

plt.tight_layout()
plt.show()
No description has been provided for this image

The training residuals follow a normal distribution and plot along the 45 degree line. For the test residuals, it follows the 45 degree line but not exactly. RidgeCV , analyzed below, showed better QQ plot for test data set. Ridge CV has also showed good results in this analysis.

Plot residuals against the most significant feature
In [ ]:
def find_most_important_feature(features,model):

    features = features.columns
    coefficients = best_model_lr.coef_
    #most_significant_feature_index = abs(coefficients).argmax()
    most_significant_feature_index = coefficients.argmax()
    most_significant_feature = features[most_significant_feature_index]
    print("Most significant feature:", most_significant_feature)
    print("Coefficient:", coefficients[most_significant_feature_index])
    return most_significant_feature
In [ ]:
print('\n Most important feature for Linear Regression\n')
most_important_feature = find_most_important_feature(X_train_pressure,best_model_lr)
 Most important feature for Linear Regression

Most significant feature: Pressure__maximum
Coefficient: 5.3460227410884995e-08
In [ ]:
# Residual evaluation plots
plt.figure(figsize=(8, 4))

# Residuals vs Predictions plot
plt.subplot(1, 2, 1)
plt.scatter(X_test_pressure[most_important_feature], test_residuals, c='green', marker='s', label='Test data')
plt.xlabel(most_important_feature)
plt.ylabel('Residuals')
plt.title('Residuals vs most significant feature')
plt.legend()

plt.subplot(1, 2, 2)
plt.scatter(X_train_pressure[most_important_feature], train_residuals, c='blue', marker='o', label='Training data')
plt.xlabel(most_important_feature)
plt.ylabel('Residuals')
plt.title('Residuals vs most significant feature')
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

This test is done to understand if there is any correlation between the residuals and the most significant feature. You can see that residuals are mostly concentrated around zero on X and Y axes and otherwise the residuals are randomly spread.

Test autocorrelation of residuals against time lags
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.graphics.tsa.plot_acf(test_residuals, lags=20, ax = ax1)
ax1.set_xlabel('Lag')
ax1.set_ylabel('Autocorrelation')
ax1.set_title('Autocorrelation of Test Residuals')

sm.graphics.tsa.plot_acf(train_residuals, lags=50, ax = ax2)
ax2.set_xlabel('Lag')
ax2.set_ylabel('Autocorrelation')
ax2.set_title('Autocorrelation of Training Residuals')

plt.show()
No description has been provided for this image

The residuals are independent of each other across time lag.

Plot actual vs predicted values for training and test

In the residuals vs fitted data and residuals vs predictor plots we saw above, we saw an X axis imbalance for the training data set. This does not necessarily mean that the model has a problem. We should plot actual vs predicted data to see if there is a lot of deviation from the 45 degree line.

In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

ax1.scatter(prediction_lr, pressures_test, label='Test data')
#ax1.axline(xy1=(0, 0), xy2 =(1,1), color='r', lw=2)
ax1.set_xlabel('Predicted')
ax1.set_ylabel('Actual')
ax1.grid(True)
ax1.legend()
ax1.set_title('predicted vs actual')

ax2.plot(prediction_lr_train, pressures_train, 'o', label='Training data')
ax2.set_xlabel('Predicted')
ax2.set_ylabel('Actual')
ax2.set_title('predicted vs actual')
ax2.grid(True)
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

The points seem to all align to a 45 degree line. If not we could have tuned the model. This proves that the X axis impbalance we saw previously can be ignored. It just showed that we had more points closer to zero predicted value.

In [ ]:
 

Plot predicted values and actual values against most significant feature
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

Xvalues_test = X_test_pressure[most_important_feature]
ax1.scatter(Xvalues_test, pressures_test,  color='black', linewidth = 3, label = 'test')
ax1.scatter(Xvalues_test, prediction_lr, color='green', label = 'prediction')
ax1.set_xlabel(most_important_feature)
ax1.set_ylabel('tilt-erupt')
ax1.set_title('Linear Regression from test data')
ax2.legend(loc="upper left")

Xvalues_train = X_train_pressure[most_important_feature]
ax2.scatter(Xvalues_train, pressures_train,  color='black', linewidth = 3, label = 'training')
ax2.scatter(Xvalues_train, prediction_lr_train, color='blue', label = 'prediction')
ax2.set_xlabel(most_important_feature)
ax2.set_ylabel('tilt-erupt')
ax2.set_title('Linear Regression from test data')
ax2.legend(loc="upper left")
plt.tight_layout()
plt.show()
WARNING:matplotlib.legend:No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
No description has been provided for this image

Evaluate Cross-validation error for training data
In [ ]:
# evaluate the model
cv = RepeatedKFold(random_state=1)

n_scores_lr = cross_val_score(best_model_lr, X_train_pressure, pressures_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')

# report performance
print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores_lr), np.std(n_scores_lr)))

sns.distplot(n_scores_lr , label = 'Best_linear_regression')

plt.legend()
plt.show()
Avg. error: 0.998 (0.003)
No description has been provided for this image

Linear Support Vector Regression

In [ ]:
from sklearn.svm import LinearSVR

model_lsvr = LinearSVR()

param_grid = {
    'C': [0.1, 1, 10, 100],
    'epsilon': [0.1, 0.2, 0.3, 0.4],
    'fit_intercept': [True, False],
    'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive'],
    'max_iter': [1000, 2000, 3000],
    'tol': [1e-3, 1e-4, 1e-5],
    'verbose': [False],
    'random_state': [42],
    'dual': [True, False]
}

# Instantiate the GridSearchCV object
grid_search_lsvr = GridSearchCV(model_lsvr, param_grid, cv=5, scoring='r2')

# Fit the model to the training data
grid_search_lsvr.fit(X_train_pressure, pressures_train)

# Get the best model and its parameters
best_model_lsvr = grid_search_lsvr.best_estimator_
best_params_lsvr = grid_search_lsvr.best_params_

# Make predictions on the test data
prediction_lsvr = best_model_lsvr.predict(X_test_pressure)
print("best parameters:",best_params_lsvr)
print("r2 score of Linear SV Regression model: ",r2_score(prediction_lsvr,pressures_test))
print("Mean squared error of Linear SV Regression model: ",mean_squared_error(prediction_lsvr,pressures_test))
best parameters: {'C': 0.1, 'dual': True, 'epsilon': 0.1, 'fit_intercept': True, 'loss': 'epsilon_insensitive', 'max_iter': 1000, 'random_state': 42, 'tol': 0.001, 'verbose': False}
r2 score of Linear SV Regression model:  0.0
Mean squared error of Linear SV Regression model:  2.721163533922422e-15

Not investigating Linear SVR further as the test above did not produce good r2 value.

RidgeCV [As good as LR for predicting tilt-erupt]

In [ ]:
model_rcv = RidgeCV(alphas=[0.1, 0.5, 1.0, 5.0, 10.0, 100, 150, 200], cv=5, scoring = 'r2', gcv_mode = 'eigen')

# Fit the model to the training data
model_rcv.fit(X_train_pressure, pressures_train)

# Make predictions on the test data
prediction_rcv = model_rcv.predict(X_test_pressure)
print("Best alpha:", model_rcv.alpha_)
print("r2 score of RidgeCV model: ",r2_score(prediction_rcv,pressures_test))
print("Mean squared error of RidgeCV model: ",mean_squared_error(prediction_rcv,pressures_test))
print("MAE score of Linear Regression model: ",mean_absolute_error(prediction_rcv,pressures_test))
print("MAPE score of Linear Regression model: ",mean_absolute_percentage_error(prediction_rcv,pressures_test))
Best alpha: 1.0
r2 score of RidgeCV model:  0.9973216655765884
Mean squared error of RidgeCV model:  2.752770938887454e-18
MAE score of Linear Regression model:  1.290416203417458e-09
MAPE score of Linear Regression model:  0.19155376993325463

RidgeCV is also a good model for this purpose and prduced similar result as Linear Regression and we could analyze this model further like we did LR. Here, we just plot the residuals to make sure its distribution looks good.

Evaluation of RidgeCV for predicting tilt-erupt at eruption time

Residual Evaluation
In [ ]:
prediction_rcv_train = model_rcv.predict(X_train_pressure)

train_residuals_rcv = pressures_train - prediction_rcv_train
test_residuals_rcv = pressures_test - prediction_rcv

Plot residuals, density of residuals vs predicted values
In [ ]:
# Residual evaluation plots
plt.figure(figsize=(12, 6))

# Residuals vs Predictions plot
plt.subplot(2, 2, 1)
plt.scatter(pressures_test, test_residuals_rcv, c='green', marker='s', label='Test data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

plt.subplot(2, 2, 2)
plt.scatter(pressures_train, train_residuals_rcv, c='blue', marker='o', label='Training data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

# Residuals distribution plot
plt.subplot(2, 2, 3)
sns.histplot(test_residuals_rcv, bins = 20, color='green', label='Test residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.subplot(2, 2, 4)
sns.histplot(train_residuals_rcv, bins = 20, color='blue', label='Training residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

QQPlot
In [ ]:
from cProfile import label
import statsmodels.api as sm
import scipy.stats as stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.qqplot(test_residuals_rcv, stats.t, fit=True, line="45", ax=ax1)
ax1.set_xlabel('Theoretical Quantiles')
ax1.set_ylabel('Sample Quantiles')
ax1.set_title('Q-Q Plot of Test Residuals')

sm.qqplot(train_residuals_rcv, stats.t, fit=True, line="45", ax=ax2)
ax2.set_xlabel('Theoretical Quantiles')
ax2.set_ylabel('Sample Quantiles')
ax2.set_title('Q-Q Plot of Training Residuals')

plt.tight_layout()
plt.show()
No description has been provided for this image

Test autocorrelation of residuals against time lags
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.graphics.tsa.plot_acf(test_residuals_rcv, lags=20, ax = ax1)
ax1.set_xlabel('Lag')
ax1.set_ylabel('Autocorrelation')
ax1.set_title('Autocorrelation of Test Residuals')

sm.graphics.tsa.plot_acf(train_residuals_rcv, lags=50, ax = ax2)
ax2.set_xlabel('Lag')
ax2.set_ylabel('Autocorrelation')
ax2.set_title('Autocorrelation of Training Residuals')

plt.show()
No description has been provided for this image

Cross-validation of training data
In [ ]:
# evaluate the model
cv = RepeatedKFold(random_state=1)

n_scores_rcv = cross_val_score(model_rcv, X_train_pressure, pressures_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')

# report performance
print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores_rcv), np.std(n_scores_rcv)))

sns.distplot(n_scores_lr , label = 'RidgeCV Cross Validation')

plt.legend()
plt.show()
Avg. error: 0.998 (0.003)
No description has been provided for this image
In [ ]:
print('\n Most important feature for RidgeCV\n')
most_important_feature = find_most_important_feature(X_train_pressure,model_rcv)
print(most_important_feature)
 Most important feature for RidgeCV

Most significant feature: Pressure__mean_second_derivative_central
Coefficient: 3.9445295237626595e-09
Pressure__mean_second_derivative_central

RidgeCV also looks like a very good model for predicting tilt-erupt value at explosion time.

Decision Tree

In [ ]:
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor(random_state=1)
dt_parameters = {
'max_depth':[2,4,8,12,16,24,32],
'max_features':['auto','sqrt','log2']}
dt_search = GridSearchCV(model,dt_parameters,cv=5,scoring='r2')
dt_search.fit(X_train_pressure, pressures_train)

#Import necessary metrics
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

best_model_dt = dt_search.best_estimator_
best_params_dt = dt_search.best_params_
prediction_dt = best_model_dt.predict(X_test_pressure)
print("best parameters:",best_params_dt)
print("r2 score of best decision tree model: ",r2_score(prediction_dt,pressures_test))
print("Mean squared error of best decision tree model: ",mean_squared_error(prediction_dt,pressures_test))
best parameters: {'max_depth': 4, 'max_features': 'log2'}
r2 score of best decision tree model:  0.4382403260270359
Mean squared error of best decision tree model:  2.985718032405973e-15

Decision Tree is not a good model for predicting tilt-erupt at explosion time.

Random Forest

In [ ]:
from sklearn.ensemble import RandomForestRegressor
model_rf = RandomForestRegressor(random_state=123)
rf_parameters = {
'max_depth':[2,4,8,12,16,20,24,32],
'n_estimators':[10, 20, 30, 40, 60, 80, 100, 150]}
rf_search = GridSearchCV(model_rf,rf_parameters,cv=5,scoring='r2')
rf_search.fit(X_train_pressure, pressures_train)

#Import necessary metrics
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

best_model_rf = rf_search.best_estimator_
best_params_rf = rf_search.best_params_
prediction_rf = best_model_rf.predict(X_test_pressure)
print("best parameters:",best_params_rf)
print("r2 score of best random forest model: ",r2_score(prediction_rf,pressures_test))
print("Mean squared error of best random forest model: ",mean_squared_error(prediction_rf,pressures_test))
best parameters: {'max_depth': 2, 'n_estimators': 100}
r2 score of best random forest model:  0.9348132202885535
Mean squared error of best random forest model:  4.8065633785474004e-17

Random Forest seems to perform well, but when we have got a simple Linear model like Linear Regression and RidgeCV, why take on a complex model like Random Forest. We are not digging in to evaluate Random forest for this task.

KNeighborsRegressor

In [ ]:
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor()
knn_parameters = {
'n_neighbors':[2,4,8,12,16,20,24,32]}
knn_search = GridSearchCV(knn,knn_parameters,cv=5,scoring='r2')
knn_search.fit(X_train_pressure, pressures_train)

#Import necessary metrics
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

best_model_knn = knn_search.best_estimator_
best_params_knn = knn_search.best_params_
prediction_knn = best_model_knn.predict(X_test_pressure)
print("best parameters:",best_params_knn)
print("r2 score of best knn model: ",r2_score(prediction_knn,pressures_test))
print("Mean squared error of best knn model: ",mean_squared_error(prediction_knn,pressures_test))
print("MAE score of Linear Regression model: ",mean_absolute_error(prediction_rcv,pressures_test))
print("MAPE score of Linear Regression model: ",mean_absolute_percentage_error(prediction_rcv,pressures_test))
best parameters: {'n_neighbors': 2}
r2 score of best knn model:  0.6379213619645967
Mean squared error of best knn model:  2.329026088628052e-16
MAE score of Linear Regression model:  1.290416203417458e-09
MAPE score of Linear Regression model:  0.19155376993325463

We have got better results with Linear Regression and RidgeCV. Hence we skip analysing this model.

Plot feature importance of different models for predicting tilt-erupt

In [ ]:
def plot_feature_importance(features,model, isLinear):

    print('\n Feature importance plot\n')
    features = features.columns
    importances = model.coef_ if (isLinear == True ) else model.feature_importances_
    indices = np.argsort(importances)[::-1]
    fig, ax =plt.subplots(figsize=(4,4))
    plt.title('Feature Importances')
    plt.barh(range(5), [importances[i] for i in indices[0:5]], color='b', align='center')
    plt.yticks(range(5), [features[i] for i in indices[0:5]])
    plt.xlabel('Relative Importance')
    plt.show()

    for i in indices[0:5]  :
      print(features[i],importances[i])

    return features[indices[0]]

RidgeCV

In [ ]:
most_important_feature_rcv_p = plot_feature_importance(X_train_pressure,model_rcv, True)
 Feature importance plot

No description has been provided for this image
Pressure__mean_second_derivative_central 2.4707501019723e-09
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.8 1.5564531088929196e-09
Pressure__agg_linear_trend__attr_"slope"__chunk_len_50__f_agg_"max" 1.5089377218097741e-09
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_50__f_agg_"max" 1.4429100054622083e-09
Pressure__agg_linear_trend__attr_"slope"__chunk_len_10__f_agg_"max" 1.4057883211046688e-09

Linear Regression

In [ ]:
most_important_feature_lr_p = plot_feature_importance(X_train_pressure,best_model_lr, True)
 Feature importance plot

No description has been provided for this image
Pressure__mean_second_derivative_central 3.9445295237626595e-09
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.0 1.8225358627781535e-09
Pressure__mean_change 1.8225358627781363e-09
Pressure__agg_linear_trend__attr_"slope"__chunk_len_50__f_agg_"max" 1.743411483948896e-09
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_50__f_agg_"max" 1.7049028104894433e-09

Random Forest

In [ ]:
most_important_feature_rf_p = plot_feature_importance(X_train_pressure,best_model_rf, False)
 Feature importance plot

No description has been provided for this image
Pressure__minimum 0.07432327413589969
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_5__f_agg_"var" 0.06620647248735846
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.6 0.04789272830714917
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_50__f_agg_"var" 0.03989776691273986
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_1.0__ql_0.6 0.03739358555741338

Prediction of time to eruption

w= 300, overlap = 10

Feature Extraction

We will analyze this task for different window sizes.

In [ ]:
w_t = 300
ov_t = 10
In [ ]:
# Shuffle the array of observations so that it is
# not always the same set of observations which are training and test
np.random.shuffle(new_arrays)
In [ ]:
computeFeatureVector(0, 150, 'training', w_t, ov_t, True)
computeFeatureVector(150, 189, 'test', w_t, ov_t, True)
(152783, 4)
   time  Pressure                   obs_id     y
0 -7.42      0.00  (0, -7.418179380927304) -7.42
1 -7.42      0.00  (0, -7.418179380927304) -7.42
2 -7.42     -0.00  (0, -7.418179380927304) -7.42
3 -7.42      0.00  (0, -7.418179380927304) -7.42
4 -7.42      0.00  (0, -7.418179380927304) -7.42
WARNING:tsfresh.feature_extraction.settings:Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction: 100%|██████████| 580/580 [01:33<00:00,  6.21it/s]
(580, 783)
MultiIndex([(  0,  -7.418179380927304),
            (  0,  -7.226933041613358),
            (  0,  -6.990254385239971),
            (  0,  -6.679595019152675),
            (  0,  -5.375278407684165),
            (  1,  -7.418180822726788),
            (  1,  -7.226937123122801),
            (  1,  -6.990258298229178),
            (  1, -6.6795918578357965),
            (  1, -5.3752735746945675),
            ...
            (147,  -6.977281341630747),
            (147,   -6.66185539526958),
            (147,  -5.308265688129788),
            (148, -7.1380739880059965),
            (148,  -6.876265310232346),
            (148,  -6.520621753389733),
            (148,  -4.595114593374947),
            (149,  -7.030856742579076),
            (149,  -6.734591659972948),
            (149,  -5.564520864150209)],
           length=580)
         Pressure__variance_larger_than_standard_deviation  \
0 -7.42                                               0.00   
  -7.23                                               0.00   
  -6.99                                               0.00   
  -6.68                                               0.00   
  -5.38                                               0.00   

         Pressure__has_duplicate_max  Pressure__has_duplicate_min  \
0 -7.42                         0.00                         1.00   
  -7.23                         1.00                         0.00   
  -6.99                         0.00                         1.00   
  -6.68                         1.00                         0.00   
  -5.38                         0.00                         0.00   

         Pressure__has_duplicate  Pressure__sum_values  Pressure__abs_energy  \
0 -7.42                     1.00                 -0.00                  0.00   
  -7.23                     1.00                 -0.00                  0.00   
  -6.99                     1.00                 -0.00                  0.00   
  -6.68                     1.00                 -0.00                  0.00   
  -5.38                     1.00                 -0.00                  0.00   

         Pressure__mean_abs_change  Pressure__mean_change  \
0 -7.42                       0.00                  -0.00   
  -7.23                       0.00                  -0.00   
  -6.99                       0.00                  -0.00   
  -6.68                       0.00                  -0.00   
  -5.38                       0.00                   0.00   

         Pressure__mean_second_derivative_central  Pressure__median  ...  \
0 -7.42                                     -0.00             -0.00  ...   
  -7.23                                      0.00             -0.00  ...   
  -6.99                                      0.00             -0.00  ...   
  -6.68                                      0.00             -0.00  ...   
  -5.38                                      0.00             -0.00  ...   

         Pressure__fourier_entropy__bins_100  \
0 -7.42                                 1.78   
  -7.23                                 0.53   
  -6.99                                 2.49   
  -6.68                                 1.15   
  -5.38                                 0.16   

         Pressure__permutation_entropy__dimension_3__tau_1  \
0 -7.42                                               1.75   
  -7.23                                               1.77   
  -6.99                                               1.68   
  -6.68                                               1.70   
  -5.38                                               1.49   

         Pressure__permutation_entropy__dimension_4__tau_1  \
0 -7.42                                               3.04   
  -7.23                                               3.12   
  -6.99                                               2.91   
  -6.68                                               2.94   
  -5.38                                               2.51   

         Pressure__permutation_entropy__dimension_5__tau_1  \
0 -7.42                                               4.39   
  -7.23                                               4.54   
  -6.99                                               4.16   
  -6.68                                               4.09   
  -5.38                                               3.46   

         Pressure__permutation_entropy__dimension_6__tau_1  \
0 -7.42                                               5.21   
  -7.23                                               5.42   
  -6.99                                               4.99   
  -6.68                                               4.87   
  -5.38                                               3.98   

         Pressure__permutation_entropy__dimension_7__tau_1  \
0 -7.42                                               5.52   
  -7.23                                               5.61   
  -6.99                                               5.34   
  -6.68                                               5.20   
  -5.38                                               4.16   

         Pressure__query_similarity_count__query_None__threshold_0.0  \
0 -7.42                                               0.00             
  -7.23                                               0.00             
  -6.99                                               0.00             
  -6.68                                               0.00             
  -5.38                                               0.00             

         Pressure__mean_n_absolute_max__number_of_maxima_7     y  obs_id  
0 -7.42                                               0.00 -7.42       0  
  -7.23                                               0.00 -7.23       0  
  -6.99                                               0.00 -6.99       0  
  -6.68                                               0.00 -6.68       0  
  -5.38                                               0.00 -5.38       0  

[5 rows x 785 columns]
(40714, 4)
   time  Pressure                      obs_id     y
0 -7.28     -0.00  (150, -7.2834494338709765) -7.28
1 -7.28     -0.00  (150, -7.2834494338709765) -7.28
2 -7.28      0.00  (150, -7.2834494338709765) -7.28
3 -7.28      0.00  (150, -7.2834494338709765) -7.28
4 -7.28      0.00  (150, -7.2834494338709765) -7.28
WARNING:tsfresh.feature_extraction.settings:Dependency not available for matrix_profile, this feature will be disabled!
Feature Extraction: 100%|██████████| 155/155 [00:25<00:00,  6.15it/s]
(155, 783)
MultiIndex([(150, -7.2834494338709765),
            (150, -7.0613250026343115),
            (150,   -6.77536508845462),
            (150,   -6.37331733785454),
            (150, -1.9459175593575684),
            (151, -7.2181688292566655),
            (151,  -6.979146008480729),
            (151,  -6.664407601004179),
            (151,  -5.318119573292807),
            (152,  -7.295057645142567),
            ...
            (186, -3.6375863548978438),
            (187,  -7.284136014196062),
            (187,  -7.062192472435525),
            (187,  -6.776506992372183),
            (187,  -6.375025418707996),
            (187, -2.0794441334897864),
            (188,  -7.207112079434446),
            (188,  -6.965076083478976),
            (188,  -6.645091620020865),
            (188,  -5.247024072160486)],
           length=155)
           Pressure__variance_larger_than_standard_deviation  \
150 -7.28                                               0.00   
    -7.06                                               0.00   
    -6.78                                               0.00   
    -6.37                                               0.00   
    -1.95                                               0.00   

           Pressure__has_duplicate_max  Pressure__has_duplicate_min  \
150 -7.28                         0.00                         0.00   
    -7.06                         0.00                         0.00   
    -6.78                         0.00                         0.00   
    -6.37                         0.00                         0.00   
    -1.95                         0.00                         0.00   

           Pressure__has_duplicate  Pressure__sum_values  \
150 -7.28                     1.00                  0.00   
    -7.06                     1.00                  0.00   
    -6.78                     1.00                  0.00   
    -6.37                     1.00                  0.00   
    -1.95                     0.00                  0.00   

           Pressure__abs_energy  Pressure__mean_abs_change  \
150 -7.28                  0.00                       0.00   
    -7.06                  0.00                       0.00   
    -6.78                  0.00                       0.00   
    -6.37                  0.00                       0.00   
    -1.95                  0.00                       0.00   

           Pressure__mean_change  Pressure__mean_second_derivative_central  \
150 -7.28                   0.00                                      0.00   
    -7.06                  -0.00                                      0.00   
    -6.78                   0.00                                     -0.00   
    -6.37                   0.00                                      0.00   
    -1.95                   0.00                                      0.00   

           Pressure__median  ...  Pressure__fourier_entropy__bins_100  \
150 -7.28              0.00  ...                                 3.73   
    -7.06              0.00  ...                                 3.68   
    -6.78              0.00  ...                                 3.70   
    -6.37              0.00  ...                                 3.79   
    -1.95              0.00  ...                                 1.04   

           Pressure__permutation_entropy__dimension_3__tau_1  \
150 -7.28                                               1.74   
    -7.06                                               1.75   
    -6.78                                               1.73   
    -6.37                                               1.72   
    -1.95                                               0.50   

           Pressure__permutation_entropy__dimension_4__tau_1  \
150 -7.28                                               3.03   
    -7.06                                               3.05   
    -6.78                                               3.02   
    -6.37                                               2.97   
    -1.95                                               0.56   

           Pressure__permutation_entropy__dimension_5__tau_1  \
150 -7.28                                               4.35   
    -7.06                                               4.40   
    -6.78                                               4.34   
    -6.37                                               4.22   
    -1.95                                               0.64   

           Pressure__permutation_entropy__dimension_6__tau_1  \
150 -7.28                                               5.20   
    -7.06                                               5.26   
    -6.78                                               5.15   
    -6.37                                               5.07   
    -1.95                                               0.69   

           Pressure__permutation_entropy__dimension_7__tau_1  \
150 -7.28                                               5.51   
    -7.06                                               5.62   
    -6.78                                               5.52   
    -6.37                                               5.46   
    -1.95                                              -0.00   

           Pressure__query_similarity_count__query_None__threshold_0.0  \
150 -7.28                                               0.00             
    -7.06                                               0.00             
    -6.78                                               0.00             
    -6.37                                               0.00             
    -1.95                                               0.00             

           Pressure__mean_n_absolute_max__number_of_maxima_7     y  obs_id  
150 -7.28                                               0.00 -7.28     150  
    -7.06                                               0.00 -7.06     150  
    -6.78                                               0.00 -6.78     150  
    -6.37                                               0.00 -6.37     150  
    -1.95                                               0.00 -1.95     150  

[5 rows x 785 columns]
In [ ]:
X_train_to = pd.read_csv( path_prefix + 'WindowsComprehensiveExtractedFeatures' + str(w_t) + ',' + str(ov_t) + 'ptimetraining.csv')
print(X_train_to.shape)

X_test_to = pd.read_csv(path_prefix + 'WindowsComprehensiveExtractedFeatures' + str(w_t) + ',' + str(ov_t) + 'ptimetest.csv')
print(X_test_to.shape)
(580, 785)
(155, 785)
In [ ]:
times_train = X_train_to['y']
times_test = X_test_to['y']
X_train_t = X_train_to.drop(['y', 'obs_id'], axis = 1)
X_test_t = X_test_to.drop(['y', 'obs_id'], axis = 1)
In [ ]:
column_names = X_train_t.columns

scaler_t = StandardScaler()
X_train_ts = pd.DataFrame(scaler.fit_transform(X_train_t), columns=column_names)
X_test_ts = pd.DataFrame(scaler.transform(X_test_t), columns=column_names)
In [ ]:
from tsfresh import select_features
X_train_times = select_features(X_train_ts, times_train)
X_test_times = X_test_ts[X_train_times.columns]
In [ ]:
print(X_train_times.shape)
print(X_test_times.shape)
(580, 614)
(155, 614)
In [ ]:
for x in X_train_times.columns:
  print(x)
Pressure__length
Pressure__range_count__max_1__min_-1
Pressure__count_above_mean
Pressure__number_peaks__n_1
Pressure__permutation_entropy__dimension_7__tau_1
Pressure__fft_aggregated__aggtype_"variance"
Pressure__number_peaks__n_3
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.0
Pressure__mean_change
Pressure__permutation_entropy__dimension_6__tau_1
Pressure__number_peaks__n_5
Pressure__fft_aggregated__aggtype_"centroid"
Pressure__last_location_of_maximum
Pressure__linear_trend__attr_"rvalue"
Pressure__linear_trend__attr_"slope"
Pressure__agg_linear_trend__attr_"slope"__chunk_len_5__f_agg_"min"
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_5__f_agg_"mean"
Pressure__agg_linear_trend__attr_"slope"__chunk_len_5__f_agg_"mean"
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_5__f_agg_"max"
Pressure__augmented_dickey_fuller__attr_"pvalue"__autolag_"AIC"
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_5__f_agg_"min"
Pressure__agg_linear_trend__attr_"slope"__chunk_len_10__f_agg_"mean"
Pressure__agg_linear_trend__attr_"slope"__chunk_len_5__f_agg_"max"
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_10__f_agg_"mean"
Pressure__number_cwt_peaks__n_5
Pressure__time_reversal_asymmetry_statistic__lag_2
Pressure__time_reversal_asymmetry_statistic__lag_1
Pressure__number_cwt_peaks__n_1
Pressure__approximate_entropy__m_2__r_0.5
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_10__f_agg_"max"
Pressure__time_reversal_asymmetry_statistic__lag_3
Pressure__number_crossing_m__m_0
Pressure__cid_ce__normalize_True
Pressure__approximate_entropy__m_2__r_0.3
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.8
Pressure__permutation_entropy__dimension_5__tau_1
Pressure__c3__lag_2
Pressure__maximum
Pressure__index_mass_quantile__q_0.9
Pressure__agg_linear_trend__attr_"slope"__chunk_len_10__f_agg_"max"
Pressure__approximate_entropy__m_2__r_0.7
Pressure__index_mass_quantile__q_0.8
Pressure__number_peaks__n_10
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_5__f_agg_"max"
Pressure__range_count__max_1000000000000.0__min_0
Pressure__fft_aggregated__aggtype_"kurtosis"
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.6
Pressure__agg_linear_trend__attr_"slope"__chunk_len_10__f_agg_"min"
Pressure__fft_coefficient__attr_"imag"__coeff_8
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.2
Pressure__augmented_dickey_fuller__attr_"teststat"__autolag_"AIC"
Pressure__fft_aggregated__aggtype_"skew"
Pressure__first_location_of_maximum
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_10__f_agg_"max"
Pressure__linear_trend__attr_"stderr"
Pressure__linear_trend__attr_"pvalue"
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_10__f_agg_"min"
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_5__f_agg_"mean"
Pressure__c3__lag_3
Pressure__number_peaks__n_50
Pressure__approximate_entropy__m_2__r_0.9
Pressure__index_mass_quantile__q_0.7
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_4
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_5
Pressure__mean_second_derivative_central
Pressure__fft_coefficient__attr_"imag"__coeff_5
Pressure__index_mass_quantile__q_0.3
Pressure__index_mass_quantile__q_0.6
Pressure__partial_autocorrelation__lag_1
Pressure__fft_coefficient__attr_"imag"__coeff_4
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_10__f_agg_"mean"
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_50__f_agg_"max"
Pressure__index_mass_quantile__q_0.4
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_1.0__ql_0.4
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_9
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_5__f_agg_"min"
Pressure__fourier_entropy__bins_100
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_2
Pressure__agg_linear_trend__attr_"slope"__chunk_len_50__f_agg_"max"
Pressure__fft_coefficient__attr_"imag"__coeff_6
Pressure__c3__lag_1
Pressure__fft_coefficient__attr_"imag"__coeff_7
Pressure__value_count__value_0
Pressure__index_mass_quantile__q_0.2
Pressure__agg_linear_trend__attr_"slope"__chunk_len_50__f_agg_"mean"
Pressure__has_duplicate
Pressure__absolute_maximum
Pressure__fft_coefficient__attr_"abs"__coeff_1
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_10__f_agg_"min"
Pressure__index_mass_quantile__q_0.1
Pressure__fft_coefficient__attr_"imag"__coeff_9
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_0
Pressure__fft_coefficient__attr_"imag"__coeff_10
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_50__f_agg_"mean"
Pressure__large_standard_deviation__r_0.2
Pressure__root_mean_square
Pressure__fft_coefficient__attr_"imag"__coeff_2
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_1.0__ql_0.8
Pressure__fft_coefficient__attr_"real"__coeff_1
Pressure__permutation_entropy__dimension_4__tau_1
Pressure__standard_deviation
Pressure__variance
Pressure__fft_coefficient__attr_"imag"__coeff_12
Pressure__ar_coefficient__coeff_1__k_10
Pressure__symmetry_looking__r_0.05
Pressure__fft_coefficient__attr_"imag"__coeff_11
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_1
Pressure__fft_coefficient__attr_"imag"__coeff_3
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_10__f_agg_"var"
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_3
Pressure__fft_coefficient__attr_"abs"__coeff_2
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_5__f_agg_"mean"
Pressure__linear_trend__attr_"intercept"
Pressure__ar_coefficient__coeff_2__k_10
Pressure__mean
Pressure__mean_n_absolute_max__number_of_maxima_7
Pressure__ar_coefficient__coeff_3__k_10
Pressure__augmented_dickey_fuller__attr_"usedlag"__autolag_"AIC"
Pressure__ratio_beyond_r_sigma__r_1.5
Pressure__fft_coefficient__attr_"imag"__coeff_13
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_5__f_agg_"var"
Pressure__fft_coefficient__attr_"abs"__coeff_5
Pressure__percentage_of_reoccurring_values_to_all_values
Pressure__fft_coefficient__attr_"imag"__coeff_14
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_10__f_agg_"var"
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"mean"
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_6
Pressure__fft_coefficient__attr_"abs"__coeff_3
Pressure__agg_linear_trend__attr_"slope"__chunk_len_10__f_agg_"var"
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_1.0__ql_0.6
Pressure__fft_coefficient__attr_"abs"__coeff_4
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_7
Pressure__fourier_entropy__bins_10
Pressure__fft_coefficient__attr_"abs"__coeff_7
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_1.0__ql_0.8
Pressure__fft_coefficient__attr_"imag"__coeff_15
Pressure__fft_coefficient__attr_"imag"__coeff_17
Pressure__fft_coefficient__attr_"abs"__coeff_0
Pressure__fft_coefficient__attr_"imag"__coeff_20
Pressure__fft_coefficient__attr_"abs"__coeff_8
Pressure__fft_coefficient__attr_"imag"__coeff_16
Pressure__fft_coefficient__attr_"abs"__coeff_6
Pressure__fft_coefficient__attr_"real"__coeff_2
Pressure__fft_coefficient__attr_"imag"__coeff_26
Pressure__quantile__q_0.9
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_5__f_agg_"var"
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_5__f_agg_"max"
Pressure__agg_linear_trend__attr_"slope"__chunk_len_50__f_agg_"var"
Pressure__fft_coefficient__attr_"imag"__coeff_22
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_1.0__ql_0.6
Pressure__agg_linear_trend__attr_"slope"__chunk_len_5__f_agg_"var"
Pressure__fft_coefficient__attr_"imag"__coeff_24
Pressure__fft_coefficient__attr_"imag"__coeff_19
Pressure__fft_coefficient__attr_"angle"__coeff_8
Pressure__ratio_value_number_to_time_series_length
Pressure__permutation_entropy__dimension_3__tau_1
Pressure__large_standard_deviation__r_0.25
Pressure__fft_coefficient__attr_"imag"__coeff_18
Pressure__longest_strike_below_mean
Pressure__fft_coefficient__attr_"imag"__coeff_32
Pressure__sample_entropy
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_1.0__ql_0.8
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_50__f_agg_"var"
Pressure__fft_coefficient__attr_"imag"__coeff_28
Pressure__fft_coefficient__attr_"imag"__coeff_29
Pressure__fft_coefficient__attr_"imag"__coeff_37
Pressure__ar_coefficient__coeff_4__k_10
Pressure__fft_coefficient__attr_"imag"__coeff_21
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_1.0__ql_0.2
Pressure__fourier_entropy__bins_5
Pressure__fft_coefficient__attr_"abs"__coeff_9
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_1.0__ql_0.6
Pressure__percentage_of_reoccurring_datapoints_to_all_datapoints
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"max"
Pressure__approximate_entropy__m_2__r_0.1
Pressure__fft_coefficient__attr_"imag"__coeff_23
Pressure__abs_energy
Pressure__fft_coefficient__attr_"imag"__coeff_27
Pressure__agg_linear_trend__attr_"slope"__chunk_len_50__f_agg_"min"
Pressure__agg_linear_trend__attr_"rvalue"__chunk_len_50__f_agg_"min"
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_1.0__ql_0.4
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.6__ql_0.2
Pressure__fft_coefficient__attr_"imag"__coeff_35
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_50__f_agg_"max"
Pressure__fft_coefficient__attr_"imag"__coeff_39
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.4__ql_0.2
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_50__f_agg_"mean"
Pressure__fft_coefficient__attr_"imag"__coeff_1
Pressure__fft_coefficient__attr_"abs"__coeff_13
Pressure__ratio_beyond_r_sigma__r_1
Pressure__fourier_entropy__bins_3
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_5__f_agg_"min"
Pressure__fft_coefficient__attr_"abs"__coeff_16
Pressure__fft_coefficient__attr_"imag"__coeff_31
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_1.0__ql_0.2
Pressure__fft_coefficient__attr_"abs"__coeff_10
Pressure__fft_coefficient__attr_"abs"__coeff_11
Pressure__fft_coefficient__attr_"imag"__coeff_53
Pressure__fft_coefficient__attr_"abs"__coeff_12
Pressure__fft_coefficient__attr_"imag"__coeff_44
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.6__ql_0.2
Pressure__fft_coefficient__attr_"abs"__coeff_14
Pressure__fft_coefficient__attr_"imag"__coeff_25
Pressure__fft_coefficient__attr_"imag"__coeff_34
Pressure__fft_coefficient__attr_"angle"__coeff_53
Pressure__fft_coefficient__attr_"real"__coeff_3
Pressure__fft_coefficient__attr_"real"__coeff_4
Pressure__fft_coefficient__attr_"real"__coeff_43
Pressure__fft_coefficient__attr_"imag"__coeff_30
Pressure__fft_coefficient__attr_"imag"__coeff_33
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_1.0__ql_0.4
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_1.0__ql_0.4
Pressure__variation_coefficient
Pressure__fft_coefficient__attr_"angle"__coeff_39
Pressure__count_below_mean
Pressure__sum_values
Pressure__fft_coefficient__attr_"real"__coeff_0
Pressure__fft_coefficient__attr_"real"__coeff_41
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.6__ql_0.2
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_1.0__ql_0.2
Pressure__fft_coefficient__attr_"imag"__coeff_43
Pressure__range_count__max_0__min_-1000000000000.0
Pressure__fft_coefficient__attr_"real"__coeff_32
Pressure__fft_coefficient__attr_"angle"__coeff_23
Pressure__fft_coefficient__attr_"abs"__coeff_27
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_50__f_agg_"mean"
Pressure__fft_coefficient__attr_"angle"__coeff_9
Pressure__fft_coefficient__attr_"angle"__coeff_35
Pressure__fft_coefficient__attr_"angle"__coeff_11
Pressure__fft_coefficient__attr_"imag"__coeff_50
Pressure__fft_coefficient__attr_"imag"__coeff_38
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.4__ql_0.2
Pressure__fft_coefficient__attr_"real"__coeff_71
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.8__ql_0.2
Pressure__longest_strike_above_mean
Pressure__fft_coefficient__attr_"angle"__coeff_17
Pressure__fft_coefficient__attr_"angle"__coeff_10
Pressure__fft_coefficient__attr_"real"__coeff_35
Pressure__fft_coefficient__attr_"abs"__coeff_21
Pressure__fft_coefficient__attr_"real"__coeff_48
Pressure__fft_coefficient__attr_"angle"__coeff_24
Pressure__fft_coefficient__attr_"imag"__coeff_40
Pressure__fft_coefficient__attr_"imag"__coeff_36
Pressure__fft_coefficient__attr_"abs"__coeff_20
Pressure__fft_coefficient__attr_"real"__coeff_51
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.4__ql_0.2
Pressure__fft_coefficient__attr_"real"__coeff_57
Pressure__fft_coefficient__attr_"abs"__coeff_24
Pressure__ratio_beyond_r_sigma__r_7
Pressure__fft_coefficient__attr_"angle"__coeff_16
Pressure__fft_coefficient__attr_"imag"__coeff_51
Pressure__fft_coefficient__attr_"abs"__coeff_25
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"var"
Pressure__fft_coefficient__attr_"angle"__coeff_12
Pressure__fft_coefficient__attr_"angle"__coeff_14
Pressure__fft_coefficient__attr_"abs"__coeff_17
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_50__f_agg_"max"
Pressure__fft_coefficient__attr_"abs"__coeff_19
Pressure__fft_coefficient__attr_"imag"__coeff_61
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_10__f_agg_"min"
Pressure__fft_coefficient__attr_"real"__coeff_44
Pressure__ratio_beyond_r_sigma__r_6
Pressure__fft_coefficient__attr_"angle"__coeff_51
Pressure__fft_coefficient__attr_"abs"__coeff_29
Pressure__fft_coefficient__attr_"real"__coeff_39
Pressure__fft_coefficient__attr_"real"__coeff_37
Pressure__fft_coefficient__attr_"angle"__coeff_7
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_50__f_agg_"var"
Pressure__partial_autocorrelation__lag_2
Pressure__fft_coefficient__attr_"angle"__coeff_19
Pressure__fft_coefficient__attr_"abs"__coeff_18
Pressure__fft_coefficient__attr_"abs"__coeff_22
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.8__ql_0.2
Pressure__fft_coefficient__attr_"imag"__coeff_54
Pressure__fft_coefficient__attr_"real"__coeff_64
Pressure__fft_coefficient__attr_"real"__coeff_69
Pressure__fft_coefficient__attr_"real"__coeff_30
Pressure__fft_coefficient__attr_"real"__coeff_38
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.8__ql_0.2
Pressure__large_standard_deviation__r_0.15000000000000002
Pressure__fft_coefficient__attr_"abs"__coeff_41
Pressure__quantile__q_0.8
Pressure__fft_coefficient__attr_"real"__coeff_79
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.4__ql_0.2
Pressure__fft_coefficient__attr_"imag"__coeff_55
Pressure__fft_coefficient__attr_"real"__coeff_5
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_50__f_agg_"var"
Pressure__fft_coefficient__attr_"angle"__coeff_33
Pressure__fft_coefficient__attr_"real"__coeff_34
Pressure__fft_coefficient__attr_"real"__coeff_33
Pressure__fft_coefficient__attr_"real"__coeff_52
Pressure__fft_coefficient__attr_"real"__coeff_53
Pressure__fft_coefficient__attr_"imag"__coeff_42
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.8__ql_0.2
Pressure__fft_coefficient__attr_"abs"__coeff_31
Pressure__fft_coefficient__attr_"angle"__coeff_18
Pressure__fft_coefficient__attr_"real"__coeff_29
Pressure__fft_coefficient__attr_"real"__coeff_73
Pressure__fft_coefficient__attr_"imag"__coeff_49
Pressure__fft_coefficient__attr_"real"__coeff_40
Pressure__fft_coefficient__attr_"abs"__coeff_23
Pressure__fft_coefficient__attr_"angle"__coeff_43
Pressure__fft_coefficient__attr_"abs"__coeff_15
Pressure__fft_coefficient__attr_"real"__coeff_60
Pressure__fft_coefficient__attr_"angle"__coeff_26
Pressure__fft_coefficient__attr_"angle"__coeff_49
Pressure__fft_coefficient__attr_"abs"__coeff_34
Pressure__spkt_welch_density__coeff_2
Pressure__fft_coefficient__attr_"angle"__coeff_61
Pressure__fft_coefficient__attr_"angle"__coeff_4
Pressure__fft_coefficient__attr_"abs"__coeff_30
Pressure__fft_coefficient__attr_"real"__coeff_63
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_5__f_agg_"var"
Pressure__fft_coefficient__attr_"real"__coeff_45
Pressure__ratio_beyond_r_sigma__r_5
Pressure__fft_coefficient__attr_"abs"__coeff_32
Pressure__fft_coefficient__attr_"real"__coeff_61
Pressure__fft_coefficient__attr_"angle"__coeff_50
Pressure__fft_coefficient__attr_"real"__coeff_49
Pressure__fft_coefficient__attr_"real"__coeff_65
Pressure__binned_entropy__max_bins_10
Pressure__fft_coefficient__attr_"imag"__coeff_48
Pressure__fft_coefficient__attr_"angle"__coeff_37
Pressure__fft_coefficient__attr_"imag"__coeff_41
Pressure__large_standard_deviation__r_0.30000000000000004
Pressure__fft_coefficient__attr_"real"__coeff_66
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.6__ql_0.4
Pressure__fft_coefficient__attr_"abs"__coeff_26
Pressure__fft_coefficient__attr_"imag"__coeff_58
Pressure__fft_coefficient__attr_"angle"__coeff_38
Pressure__fft_coefficient__attr_"angle"__coeff_56
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_1.0__ql_0.0
Pressure__mean_abs_change
Pressure__fft_coefficient__attr_"abs"__coeff_48
Pressure__fft_coefficient__attr_"angle"__coeff_44
Pressure__fft_coefficient__attr_"angle"__coeff_15
Pressure__fft_coefficient__attr_"angle"__coeff_32
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.8__ql_0.4
Pressure__fft_coefficient__attr_"angle"__coeff_29
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_1.0__ql_0.0
Pressure__fft_coefficient__attr_"angle"__coeff_28
Pressure__fft_coefficient__attr_"imag"__coeff_45
Pressure__fft_coefficient__attr_"real"__coeff_75
Pressure__fft_coefficient__attr_"real"__coeff_68
Pressure__fft_coefficient__attr_"imag"__coeff_62
Pressure__fft_coefficient__attr_"real"__coeff_62
Pressure__fft_coefficient__attr_"angle"__coeff_13
Pressure__fft_coefficient__attr_"abs"__coeff_33
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.8__ql_0.0
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_1.0__ql_0.0
Pressure__fft_coefficient__attr_"real"__coeff_47
Pressure__fft_coefficient__attr_"imag"__coeff_56
Pressure__fft_coefficient__attr_"angle"__coeff_72
Pressure__fft_coefficient__attr_"real"__coeff_23
Pressure__fft_coefficient__attr_"real"__coeff_27
Pressure__fft_coefficient__attr_"real"__coeff_36
Pressure__ratio_beyond_r_sigma__r_2
Pressure__fft_coefficient__attr_"real"__coeff_6
Pressure__symmetry_looking__r_0.1
Pressure__fft_coefficient__attr_"angle"__coeff_5
Pressure__fft_coefficient__attr_"abs"__coeff_39
Pressure__fft_coefficient__attr_"angle"__coeff_25
Pressure__fft_coefficient__attr_"real"__coeff_72
Pressure__fft_coefficient__attr_"angle"__coeff_52
Pressure__fft_coefficient__attr_"abs"__coeff_28
Pressure__fft_coefficient__attr_"real"__coeff_76
Pressure__fft_coefficient__attr_"abs"__coeff_43
Pressure__fft_coefficient__attr_"abs"__coeff_55
Pressure__fft_coefficient__attr_"angle"__coeff_71
Pressure__fft_coefficient__attr_"real"__coeff_42
Pressure__fft_coefficient__attr_"abs"__coeff_37
Pressure__fft_coefficient__attr_"abs"__coeff_58
Pressure__fft_coefficient__attr_"real"__coeff_7
Pressure__fft_coefficient__attr_"angle"__coeff_41
Pressure__fft_coefficient__attr_"real"__coeff_70
Pressure__fft_coefficient__attr_"abs"__coeff_36
Pressure__fft_coefficient__attr_"imag"__coeff_73
Pressure__fft_coefficient__attr_"angle"__coeff_22
Pressure__fft_coefficient__attr_"abs"__coeff_44
Pressure__fft_coefficient__attr_"imag"__coeff_47
Pressure__fft_coefficient__attr_"angle"__coeff_20
Pressure__fft_coefficient__attr_"imag"__coeff_74
Pressure__fft_coefficient__attr_"abs"__coeff_66
Pressure__fft_coefficient__attr_"abs"__coeff_56
Pressure__fft_coefficient__attr_"real"__coeff_26
Pressure__fft_coefficient__attr_"angle"__coeff_40
Pressure__fft_coefficient__attr_"real"__coeff_78
Pressure__fft_coefficient__attr_"real"__coeff_84
Pressure__fft_coefficient__attr_"angle"__coeff_30
Pressure__fft_coefficient__attr_"angle"__coeff_27
Pressure__fft_coefficient__attr_"angle"__coeff_6
Pressure__fft_coefficient__attr_"imag"__coeff_63
Pressure__fft_coefficient__attr_"imag"__coeff_59
Pressure__fft_coefficient__attr_"angle"__coeff_54
Pressure__fft_coefficient__attr_"abs"__coeff_53
Pressure__fft_coefficient__attr_"abs"__coeff_68
Pressure__ratio_beyond_r_sigma__r_3
Pressure__fft_coefficient__attr_"abs"__coeff_47
Pressure__median
Pressure__fft_coefficient__attr_"abs"__coeff_40
Pressure__fft_coefficient__attr_"real"__coeff_55
Pressure__fft_coefficient__attr_"angle"__coeff_34
Pressure__fft_coefficient__attr_"abs"__coeff_38
Pressure__fft_coefficient__attr_"imag"__coeff_46
Pressure__fft_coefficient__attr_"abs"__coeff_35
Pressure__fft_coefficient__attr_"angle"__coeff_79
Pressure__fft_coefficient__attr_"angle"__coeff_45
Pressure__fft_coefficient__attr_"angle"__coeff_31
Pressure__fft_coefficient__attr_"angle"__coeff_21
Pressure__fft_coefficient__attr_"imag"__coeff_57
Pressure__fft_coefficient__attr_"real"__coeff_28
Pressure__fft_coefficient__attr_"angle"__coeff_66
Pressure__fft_coefficient__attr_"imag"__coeff_60
Pressure__fft_coefficient__attr_"imag"__coeff_70
Pressure__fourier_entropy__bins_2
Pressure__fft_coefficient__attr_"real"__coeff_54
Pressure__fft_coefficient__attr_"abs"__coeff_52
Pressure__fft_coefficient__attr_"real"__coeff_81
Pressure__fft_coefficient__attr_"angle"__coeff_42
Pressure__fft_coefficient__attr_"imag"__coeff_52
Pressure__fft_coefficient__attr_"abs"__coeff_62
Pressure__fft_coefficient__attr_"real"__coeff_31
Pressure__fft_coefficient__attr_"abs"__coeff_69
Pressure__fft_coefficient__attr_"abs"__coeff_45
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.8__ql_0.4
Pressure__fft_coefficient__attr_"angle"__coeff_55
Pressure__fft_coefficient__attr_"angle"__coeff_73
Pressure__fft_coefficient__attr_"real"__coeff_25
Pressure__fft_coefficient__attr_"abs"__coeff_57
Pressure__fft_coefficient__attr_"abs"__coeff_64
Pressure__ar_coefficient__coeff_5__k_10
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.8__ql_0.4
Pressure__fft_coefficient__attr_"angle"__coeff_60
Pressure__fft_coefficient__attr_"real"__coeff_82
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.8__ql_0.0
Pressure__large_standard_deviation__r_0.1
Pressure__fft_coefficient__attr_"real"__coeff_46
Pressure__fft_coefficient__attr_"imag"__coeff_64
Pressure__fft_coefficient__attr_"abs"__coeff_61
Pressure__fft_coefficient__attr_"imag"__coeff_77
Pressure__fft_coefficient__attr_"abs"__coeff_63
Pressure__fft_coefficient__attr_"real"__coeff_77
Pressure__fft_coefficient__attr_"angle"__coeff_48
Pressure__fft_coefficient__attr_"real"__coeff_85
Pressure__fft_coefficient__attr_"angle"__coeff_62
Pressure__fft_coefficient__attr_"angle"__coeff_59
Pressure__fft_coefficient__attr_"real"__coeff_67
Pressure__fft_coefficient__attr_"real"__coeff_91
Pressure__fft_coefficient__attr_"abs"__coeff_67
Pressure__fft_coefficient__attr_"real"__coeff_74
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.8__ql_0.4
Pressure__fft_coefficient__attr_"angle"__coeff_75
Pressure__fft_coefficient__attr_"abs"__coeff_42
Pressure__fft_coefficient__attr_"real"__coeff_83
Pressure__fft_coefficient__attr_"angle"__coeff_65
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.6__ql_0.0
Pressure__fft_coefficient__attr_"real"__coeff_59
Pressure__fft_coefficient__attr_"angle"__coeff_63
Pressure__fft_coefficient__attr_"angle"__coeff_58
Pressure__fft_coefficient__attr_"imag"__coeff_71
Pressure__fft_coefficient__attr_"abs"__coeff_60
Pressure__fft_coefficient__attr_"abs"__coeff_83
Pressure__count_below__t_0
Pressure__fft_coefficient__attr_"angle"__coeff_69
Pressure__fft_coefficient__attr_"abs"__coeff_72
Pressure__fft_coefficient__attr_"angle"__coeff_47
Pressure__fft_coefficient__attr_"angle"__coeff_77
Pressure__fft_coefficient__attr_"abs"__coeff_46
Pressure__agg_linear_trend__attr_"stderr"__chunk_len_50__f_agg_"min"
Pressure__fft_coefficient__attr_"abs"__coeff_76
Pressure__fft_coefficient__attr_"imag"__coeff_78
Pressure__ar_coefficient__coeff_7__k_10
Pressure__fft_coefficient__attr_"real"__coeff_58
Pressure__fft_coefficient__attr_"real"__coeff_80
Pressure__fft_coefficient__attr_"abs"__coeff_77
Pressure__fft_coefficient__attr_"real"__coeff_50
Pressure__fft_coefficient__attr_"real"__coeff_92
Pressure__fft_coefficient__attr_"angle"__coeff_2
Pressure__fft_coefficient__attr_"imag"__coeff_72
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.6__ql_0.2
Pressure__fft_coefficient__attr_"imag"__coeff_75
Pressure__fft_coefficient__attr_"abs"__coeff_71
Pressure__fft_coefficient__attr_"real"__coeff_88
Pressure__lempel_ziv_complexity__bins_3
Pressure__fft_coefficient__attr_"abs"__coeff_51
Pressure__fft_coefficient__attr_"abs"__coeff_54
Pressure__fft_coefficient__attr_"imag"__coeff_65
Pressure__fft_coefficient__attr_"abs"__coeff_59
Pressure__fft_coefficient__attr_"real"__coeff_86
Pressure__fft_coefficient__attr_"abs"__coeff_50
Pressure__fft_coefficient__attr_"imag"__coeff_66
Pressure__fft_coefficient__attr_"real"__coeff_24
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.4__ql_0.0
Pressure__fft_coefficient__attr_"angle"__coeff_46
Pressure__fft_coefficient__attr_"angle"__coeff_76
Pressure__large_standard_deviation__r_0.35000000000000003
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.8__ql_0.0
Pressure__fft_coefficient__attr_"real"__coeff_56
Pressure__fft_coefficient__attr_"angle"__coeff_78
Pressure__fft_coefficient__attr_"angle"__coeff_81
Pressure__fft_coefficient__attr_"imag"__coeff_85
Pressure__fft_coefficient__attr_"abs"__coeff_79
Pressure__fft_coefficient__attr_"real"__coeff_87
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.8__ql_0.6
Pressure__fft_coefficient__attr_"imag"__coeff_67
Pressure__fft_coefficient__attr_"real"__coeff_94
Pressure__fft_coefficient__attr_"abs"__coeff_74
Pressure__fft_coefficient__attr_"abs"__coeff_70
Pressure__fft_coefficient__attr_"abs"__coeff_49
Pressure__first_location_of_minimum
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.6__ql_0.0
Pressure__fft_coefficient__attr_"angle"__coeff_74
Pressure__fft_coefficient__attr_"imag"__coeff_82
Pressure__fft_coefficient__attr_"angle"__coeff_70
Pressure__fft_coefficient__attr_"real"__coeff_8
Pressure__fft_coefficient__attr_"angle"__coeff_68
Pressure__fft_coefficient__attr_"angle"__coeff_36
Pressure__fft_coefficient__attr_"angle"__coeff_57
Pressure__fft_coefficient__attr_"imag"__coeff_68
Pressure__fft_coefficient__attr_"imag"__coeff_80
Pressure__fft_coefficient__attr_"abs"__coeff_65
Pressure__fft_coefficient__attr_"abs"__coeff_73
Pressure__fft_coefficient__attr_"real"__coeff_96
Pressure__fft_coefficient__attr_"imag"__coeff_69
Pressure__fft_coefficient__attr_"angle"__coeff_82
Pressure__agg_linear_trend__attr_"intercept"__chunk_len_50__f_agg_"min"
Pressure__fft_coefficient__attr_"real"__coeff_19
Pressure__energy_ratio_by_chunks__num_segments_10__segment_focus_8
Pressure__ar_coefficient__coeff_9__k_10
Pressure__fft_coefficient__attr_"angle"__coeff_85
Pressure__ar_coefficient__coeff_6__k_10
Pressure__quantile__q_0.7
Pressure__fft_coefficient__attr_"imag"__coeff_81
Pressure__fft_coefficient__attr_"angle"__coeff_64
Pressure__fft_coefficient__attr_"angle"__coeff_92
Pressure__fft_coefficient__attr_"imag"__coeff_76
Pressure__fft_coefficient__attr_"angle"__coeff_3
Pressure__quantile__q_0.4
Pressure__absolute_sum_of_changes
Pressure__change_quantiles__f_agg_"mean"__isabs_True__qh_0.4__ql_0.0
Pressure__fft_coefficient__attr_"abs"__coeff_81
Pressure__fft_coefficient__attr_"real"__coeff_98
Pressure__fft_coefficient__attr_"angle"__coeff_1
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.6__ql_0.4
Pressure__fft_coefficient__attr_"imag"__coeff_79
Pressure__fft_coefficient__attr_"abs"__coeff_80
Pressure__fft_coefficient__attr_"real"__coeff_93
Pressure__fft_coefficient__attr_"angle"__coeff_93
Pressure__fft_coefficient__attr_"real"__coeff_97
Pressure__fft_coefficient__attr_"angle"__coeff_67
Pressure__fft_coefficient__attr_"abs"__coeff_89
Pressure__fft_coefficient__attr_"angle"__coeff_89
Pressure__fft_coefficient__attr_"imag"__coeff_92
Pressure__fft_coefficient__attr_"abs"__coeff_86
Pressure__fft_coefficient__attr_"real"__coeff_22
Pressure__last_location_of_minimum
Pressure__fft_coefficient__attr_"abs"__coeff_75
Pressure__fft_coefficient__attr_"real"__coeff_89
Pressure__fft_coefficient__attr_"angle"__coeff_97
Pressure__fft_coefficient__attr_"abs"__coeff_88
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.6__ql_0.0
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.6__ql_0.4
Pressure__fft_coefficient__attr_"angle"__coeff_95
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.8__ql_0.0
Pressure__fft_coefficient__attr_"angle"__coeff_0
Pressure__fft_coefficient__attr_"angle"__coeff_83
Pressure__change_quantiles__f_agg_"mean"__isabs_False__qh_0.8__ql_0.6
Pressure__fft_coefficient__attr_"abs"__coeff_78
Pressure__fft_coefficient__attr_"abs"__coeff_85
Pressure__fft_coefficient__attr_"angle"__coeff_80
Pressure__fft_coefficient__attr_"abs"__coeff_84
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.8__ql_0.6
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.6__ql_0.4
Pressure__cwt_coefficients__coeff_5__w_5__widths_(2, 5, 10, 20)
Pressure__quantile__q_0.6
Pressure__fft_coefficient__attr_"imag"__coeff_93
Pressure__ratio_beyond_r_sigma__r_0.5
Pressure__cwt_coefficients__coeff_6__w_5__widths_(2, 5, 10, 20)
Pressure__cwt_coefficients__coeff_4__w_5__widths_(2, 5, 10, 20)
Pressure__fft_coefficient__attr_"abs"__coeff_87
Pressure__ratio_beyond_r_sigma__r_10
Pressure__cwt_coefficients__coeff_4__w_10__widths_(2, 5, 10, 20)
Pressure__fft_coefficient__attr_"real"__coeff_90
Pressure__spkt_welch_density__coeff_5
Pressure__fft_coefficient__attr_"angle"__coeff_84
Pressure__fft_coefficient__attr_"angle"__coeff_91
Pressure__cwt_coefficients__coeff_3__w_10__widths_(2, 5, 10, 20)
Pressure__fft_coefficient__attr_"angle"__coeff_90
Pressure__cwt_coefficients__coeff_7__w_5__widths_(2, 5, 10, 20)
Pressure__fft_coefficient__attr_"real"__coeff_99
Pressure__change_quantiles__f_agg_"var"__isabs_False__qh_0.8__ql_0.6
Pressure__fft_coefficient__attr_"imag"__coeff_88
Pressure__fft_coefficient__attr_"imag"__coeff_83
Pressure__benford_correlation
Pressure__fft_coefficient__attr_"imag"__coeff_97
Pressure__cwt_coefficients__coeff_5__w_10__widths_(2, 5, 10, 20)
Pressure__large_standard_deviation__r_0.45
Pressure__large_standard_deviation__r_0.4
Pressure__fft_coefficient__attr_"abs"__coeff_97
Pressure__fft_coefficient__attr_"imag"__coeff_96
Pressure__fft_coefficient__attr_"real"__coeff_95
Pressure__fft_coefficient__attr_"abs"__coeff_82
Pressure__fft_coefficient__attr_"angle"__coeff_87
Pressure__cwt_coefficients__coeff_2__w_2__widths_(2, 5, 10, 20)
Pressure__fft_coefficient__attr_"real"__coeff_9
Pressure__lempel_ziv_complexity__bins_5
Pressure__fft_coefficient__attr_"angle"__coeff_88
Pressure__cwt_coefficients__coeff_3__w_5__widths_(2, 5, 10, 20)
Pressure__cwt_coefficients__coeff_6__w_10__widths_(2, 5, 10, 20)
Pressure__fft_coefficient__attr_"imag"__coeff_98
Pressure__fft_coefficient__attr_"real"__coeff_18
Pressure__fft_coefficient__attr_"abs"__coeff_98
Pressure__fft_coefficient__attr_"real"__coeff_21
Pressure__cwt_coefficients__coeff_2__w_10__widths_(2, 5, 10, 20)

LazyPredict

Shows Random Forest as best model with Rsquare value of 0.63

In [ ]:
from lazypredict.Supervised import LazyRegressor

lpredict = LazyRegressor(verbose=0,ignore_warnings=True, custom_metric=None)
models,predictions = lpredict.fit(X_train_times, X_test_times, times_train, times_test)

print(models)
 98%|█████████▊| 41/42 [01:49<00:03,  3.55s/it]
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.005666 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 110392
[LightGBM] [Info] Number of data points in the train set: 580, number of used features: 608
[LightGBM] [Info] Start training from score -6.348986
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
100%|██████████| 42/42 [01:53<00:00,  2.71s/it]
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
                                      Adjusted R-Squared  \
Model                                                      
SGDRegressor                  37013960459925191655424.00   
Lars                             56909672973441400832.00   
LarsCV                           56909672973441400832.00   
TransformedTargetRegressor                         30.91   
LinearRegression                                   30.91   
GaussianProcessRegressor                           10.38   
KernelRidge                                        10.36   
DummyRegressor                                      1.34   
LassoLars                                           1.33   
Lasso                                               1.33   
ElasticNet                                          1.13   
MLPRegressor                                        1.08   
LinearSVR                                           1.07   
ExtraTreeRegressor                                  1.06   
SVR                                                 1.04   
NuSVR                                               1.04   
Ridge                                               1.04   
PassiveAggressiveRegressor                          1.03   
HuberRegressor                                      1.02   
KNeighborsRegressor                                 1.02   
DecisionTreeRegressor                               1.02   
RidgeCV                                             1.02   
OrthogonalMatchingPursuit                           1.02   
TweedieRegressor                                    1.02   
BayesianRidge                                       1.02   
ElasticNetCV                                        1.02   
OrthogonalMatchingPursuitCV                         1.02   
LassoLarsCV                                         1.02   
LassoCV                                             1.02   
HistGradientBoostingRegressor                       1.01   
XGBRegressor                                        1.01   
LGBMRegressor                                       1.01   
AdaBoostRegressor                                   1.01   
GradientBoostingRegressor                           1.01   
BaggingRegressor                                    1.01   
ExtraTreesRegressor                                 1.01   
RandomForestRegressor                               1.01   

                                                 R-Squared            RMSE  \
Model                                                                        
SGDRegressor                  -110561180594581742288896.00 403557321370.64   
Lars                             -169989932258331459584.00  15823971693.29   
LarsCV                           -169989932258331459584.00  15823971693.29   
TransformedTargetRegressor                          -88.33           11.47   
LinearRegression                                    -88.33           11.47   
GaussianProcessRegressor                            -27.00            6.42   
KernelRidge                                         -26.95            6.42   
DummyRegressor                                       -0.00            1.21   
LassoLars                                             0.01            1.21   
Lasso                                                 0.01            1.21   
ElasticNet                                            0.60            0.77   
MLPRegressor                                          0.76            0.59   
LinearSVR                                             0.79            0.56   
ExtraTreeRegressor                                    0.81            0.53   
SVR                                                   0.87            0.44   
NuSVR                                                 0.87            0.44   
Ridge                                                 0.89            0.40   
PassiveAggressiveRegressor                            0.92            0.35   
HuberRegressor                                        0.93            0.33   
KNeighborsRegressor                                   0.93            0.33   
DecisionTreeRegressor                                 0.93            0.32   
RidgeCV                                               0.93            0.31   
OrthogonalMatchingPursuit                             0.94            0.30   
TweedieRegressor                                      0.94            0.30   
BayesianRidge                                         0.94            0.29   
ElasticNetCV                                          0.95            0.28   
OrthogonalMatchingPursuitCV                           0.95            0.28   
LassoLarsCV                                           0.95            0.28   
LassoCV                                               0.95            0.28   
HistGradientBoostingRegressor                         0.96            0.26   
XGBRegressor                                          0.96            0.25   
LGBMRegressor                                         0.96            0.25   
AdaBoostRegressor                                     0.96            0.25   
GradientBoostingRegressor                             0.96            0.24   
BaggingRegressor                                      0.96            0.24   
ExtraTreesRegressor                                   0.96            0.24   
RandomForestRegressor                                 0.96            0.23   

                               Time Taken  
Model                                      
SGDRegressor                         0.07  
Lars                                 0.42  
LarsCV                               3.29  
TransformedTargetRegressor           0.18  
LinearRegression                     0.20  
GaussianProcessRegressor             0.51  
KernelRidge                          0.09  
DummyRegressor                       0.04  
LassoLars                            0.11  
Lasso                                0.07  
ElasticNet                           0.05  
MLPRegressor                         0.86  
LinearSVR                            1.01  
ExtraTreeRegressor                   0.16  
SVR                                  0.17  
NuSVR                                0.19  
Ridge                                0.06  
PassiveAggressiveRegressor           0.13  
HuberRegressor                       0.25  
KNeighborsRegressor                  0.06  
DecisionTreeRegressor                0.36  
RidgeCV                              0.17  
OrthogonalMatchingPursuit            0.06  
TweedieRegressor                     0.46  
BayesianRidge                        0.32  
ElasticNetCV                        18.41  
OrthogonalMatchingPursuitCV          0.17  
LassoLarsCV                          5.03  
LassoCV                             17.28  
HistGradientBoostingRegressor        5.77  
XGBRegressor                        11.08  
LGBMRegressor                        3.92  
AdaBoostRegressor                    4.27  
GradientBoostingRegressor           10.21  
BaggingRegressor                     1.78  
ExtraTreesRegressor                  7.45  
RandomForestRegressor               18.47  

Among the models evaluated by LazyPredict, let us first consider Linear Models and then go to ensemble models. Although ensemble models seems to have given the best results, let us first start with simpler models. Among Linear Models, RidgeCV showed R2 value of 0.93 and time taken is as low as 0.25. Among ensembles, AdaBoostRegressor took the least amount of time of 2.04 for similar R2 value as GradientBoostRegressor. Let us evaluate RidgeCV first, then the ensemble models.

RidgeCV

In [ ]:
rcv_parameters = {
    'alpha_per_target': [True, False],
    'fit_intercept':[True, False],
    'gcv_mode':['auto', 'svd','eigen']
}

model_rcv_t = RidgeCV(alphas = [0.1, 0.5, 1.0, 5.0, 10.0, 100, 150, 200],cv=5, scoring = 'r2')

model_rcv_t = GridSearchCV(model_rcv_t,rcv_parameters,cv=5,scoring='r2')
model_rcv_t.fit(X_train_times, times_train)

#model_rcv_t = RidgeCV(alphas=[0.1, 0.5, 1.0, 5.0, 10.0, 100, 150, 200], cv=5, scoring = 'r2', gcv_mode = ['auto', 'svd','eigen'])

# Fit the model to the training data
#model_rcv_t.fit(X_train_times, times_train)

# Make predictions on the test data
prediction_rcv_t = model_rcv_t.predict(X_test_times)
#print("Best alpha:", model_rcv_t.alpha_)
best_model_rcv_t = model_rcv_t.best_estimator_
print("Best alpha:", best_model_rcv_t.alpha_)
print("best parameters:",model_rcv_t.best_params_)
print("r2 score of RidgeCV model: ",r2_score(prediction_rcv_t,times_test))
print("Mean squared error of RidgeCV model: ",mean_squared_error(prediction_rcv_t,times_test))
print("MAE score of RidgeCV model: ",mean_absolute_error(prediction_rcv_t,times_test))
print("MAPE score of RidgeCV model: ",mean_absolute_percentage_error(prediction_rcv_t,times_test))
best parameters: {'alpha_per_target': False, 'fit_intercept': True, 'gcv_mode': 'auto'}
r2 score of RidgeCV model:  0.9441159515751051
Mean squared error of RidgeCV model:  0.06675242912655013
MAE score of Linear Regression model:  0.20363439049171383
MAPE score of Linear Regression model:  0.034088789766596704
In [ ]:
print("Best alpha:", best_model_rcv_t.alpha_)
Best alpha: 150.0

Random Forest

italicized text

In [ ]:
from sklearn.ensemble import RandomForestRegressor

model_rf_time = RandomForestRegressor(random_state=123)
rf_parameters = {
'max_depth':[2,4,8,12,16,20,24,32],
'n_estimators':[10, 20, 30]}
rf_search_time = GridSearchCV(model_rf_time,rf_parameters,cv=5,scoring='r2')
rf_search_time.fit(X_train_times, times_train)

#Import necessary metrics
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error

best_model_rf_time = rf_search_time.best_estimator_
best_params_rf_time = rf_search_time.best_params_
prediction_rf_t = best_model_rf_time.predict(X_test_times)
print("best parameters:",best_params_rf_time)
print("r2 score of best random forest model: ",r2_score(prediction_rf_t,times_test))
print("Mean squared error of best random forest model: ",mean_squared_error(prediction_rf_t,times_test))
print("MAE score of RF model: ",mean_absolute_error(prediction_rf_t,times_test))
print("MAPE score of RF model: ",mean_absolute_percentage_error(prediction_rf_t,times_test))
best parameters: {'max_depth': 12, 'n_estimators': 30}
r2 score of best random forest model:  0.959427257586498
Mean squared error of best random forest model:  0.0581036870637935
MAE score of RF model:  0.18624854912275293
MAPE score of RF model:  0.029962504771901585

GradientBoostRegressor

In [ ]:
from sklearn.ensemble import GradientBoostingRegressor

# Define the model
model_gbr = GradientBoostingRegressor()

# Define the grid of hyperparameters to search
param_grid_gbr = {
    'n_estimators': [2, 4, 10, 20, 30],
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [3, 5, 7]
}

# Perform grid search with cross-validation
grid_search_gbr = GridSearchCV(model_gbr, param_grid_gbr, cv=5, scoring='r2')
grid_search_gbr.fit(X_train_times, times_train)

# Get the best model and hyperparameters
best_model_gbr = grid_search_gbr.best_estimator_
best_params_gbr = grid_search_gbr.best_params_

# Print the best hyperparameters found
print("Best Hyperparameters:", best_params_gbr)

# Make predictions on the test set
prediction_gbr = best_model_gbr.predict(X_test_times)

# Print the R-squared score
print("R-squared score:", r2_score(times_test, prediction_gbr))
print("Mean squared error of best random forest model: ",mean_squared_error(prediction_gbr,times_test))
print("MAE score of RF model: ",mean_absolute_error(prediction_gbr,times_test))
print("MAPE score of RF model: ",mean_absolute_percentage_error(prediction_gbr,times_test))
Best Hyperparameters: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 30}
R-squared score: 0.9549051234483795
Mean squared error of best random forest model:  0.05716799831131944
MAE score of RF model:  0.18787431577149286
MAPE score of RF model:  0.030404607871929865

AdaBoostRegressor

In [ ]:
from sklearn.ensemble import AdaBoostRegressor

# Define the model
model_abr = AdaBoostRegressor()

# Define the grid of hyperparameters to search
param_grid_abr = {
    'n_estimators': [5, 10, 15, 30, 40],
    'learning_rate': [0.01, 0.1, 0.5],
}

# Perform grid search with cross-validation
grid_search_abr = GridSearchCV(model_abr, param_grid_abr, cv=5, scoring='r2')
grid_search_abr.fit(X_train_times, times_train)

# Get the best model and hyperparameters
best_model_abr = grid_search_abr.best_estimator_
best_params_abr = grid_search_abr.best_params_

# Print the best hyperparameters found
print("Best Hyperparameters:", best_params_abr)

# Make predictions on the test set
prediction_abr = best_model_abr.predict(X_test_times)

# Print the R-squared score
print("R-squared score:", r2_score(times_test, prediction_abr))
print("Mean squared error of best ABR model: ",mean_squared_error(prediction_abr,times_test))
print("MAE score of ABR model: ",mean_absolute_error(prediction_abr,times_test))
print("MAPE score of ABR model: ",mean_absolute_percentage_error(prediction_abr,times_test))
Best Hyperparameters: {'learning_rate': 0.5, 'n_estimators': 30}
R-squared score: 0.9569821587366184
Mean squared error of best random forest model:  0.054534884331848325
MAE score of RF model:  0.18834641476212952
MAPE score of RF model:  0.030728892886982908

The following text is provided by ChatGPT.

Choosing between RidgeCV and an ensemble model like AdaBoostRegressor or GradientBoostingRegressor depends on various factors such as interpretability, computational efficiency, and model complexity. Here are some considerations:

Interpretability: RidgeCV provides coefficients for each feature, allowing for easier interpretation of the model's behavior. On the other hand, ensemble models like AdaBoostRegressor or GradientBoostingRegressor are generally more complex, making it harder to interpret individual feature contributions.

Computational Efficiency: RidgeCV is generally faster to train compared to ensemble models, especially when dealing with large datasets. Ensemble models often require training multiple weak learners or base estimators, which can be computationally expensive.

Model Complexity: RidgeCV typically produces a simpler model compared to ensemble models. If a simpler model is preferred due to concerns about overfitting or interpretability, RidgeCV might be a better choice. Robustness: Ensemble models can often handle a wider range of data distributions and relationships compared to RidgeCV. They are also less sensitive to outliers and noise in the data.

Generalization Performance: While both RidgeCV and ensemble models can achieve similar R-square values on training data, their performance on unseen data might differ. It's important to evaluate the models on validation or test data to assess their generalization performance. In conclusion, if interpretability, computational efficiency, and model simplicity are important considerations, RidgeCV might be preferred. However, if robustness and potential for higher predictive performance are more important, ensemble models like AdaBoostRegressor or GradientBoostingRegressor might be preferred, even if they show similar R-square values on training data.

RidgeCV gave very good values for all metrics. As it is a simple model, let us evaluate it in detail first.

Evaluation of RidgeCV model for predicting time to eruption

Residual Evaluation

In [ ]:
prediction_rcv_train_t = best_model_rcv_t.predict(X_train_times)

train_residuals_rcv_t = times_train - prediction_rcv_train_t
test_residuals_rcv_t = times_test - prediction_rcv_t

Plot residual and density against predicted for training and test
In [ ]:
# Residual evaluation plots
plt.figure(figsize=(12, 6))

# Residuals vs Predictions plot
plt.subplot(2, 2, 1)
plt.scatter(times_test, test_residuals_rcv_t, c='green', marker='s', label='Test data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

plt.subplot(2, 2, 2)
plt.scatter(times_train, train_residuals_rcv_t, c='blue', marker='o', label='Training data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

# Residuals distribution plot
plt.subplot(2, 2, 3)
sns.histplot(test_residuals_rcv_t, bins = 20, color='green', label='Test residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.subplot(2, 2, 4)
sns.histplot(train_residuals_rcv_t, bins = 20, color='blue', label='Training residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

QQ Plot of residuals
In [ ]:
from cProfile import label
import statsmodels.api as sm
import scipy.stats as stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.qqplot(test_residuals_rcv_t, stats.t, fit=True, line="45", ax=ax1)
ax1.set_xlabel('Theoretical Quantiles')
ax1.set_ylabel('Sample Quantiles')
ax1.set_title('Q-Q Plot of Test Residuals')

sm.qqplot(train_residuals_rcv_t, stats.t, fit=True, line="45", ax=ax2)
ax2.set_xlabel('Theoretical Quantiles')
ax2.set_ylabel('Sample Quantiles')
ax2.set_title('Q-Q Plot of Training Residuals')

plt.tight_layout()
plt.show()
No description has been provided for this image

See that fpr both test and training, the residuals follow the 45 degree line closely, which is an indication of a good model and proves that residuals are normally distributed.

Analyse autocorrelation of residuals
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.graphics.tsa.plot_acf(test_residuals_rcv_t, lags=100, ax = ax1)
ax1.set_xlabel('Lag')
ax1.set_ylabel('Autocorrelation')
ax1.set_title('Autocorrelation of Test Residuals')

sm.graphics.tsa.plot_acf(train_residuals_rcv_t, lags=100, ax = ax2)
ax2.set_xlabel('Lag')
ax2.set_ylabel('Autocorrelation')
ax2.set_title('Autocorrelation of Training Residuals')

plt.show()
No description has been provided for this image

We see that there is no auto correlation of residuals for any lag.

In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

ax1.scatter(prediction_rcv_t, times_test, label='Test data')
#ax1.axline(xy1=(0, 0), xy2 =(1,1), color='r', lw=2)
ax1.set_xlabel('Predicted')
ax1.set_ylabel('Actual')
ax1.grid(True)
ax1.legend()
ax1.set_title('predicted vs actual')

ax2.plot(prediction_rcv_train_t, times_train, 'o', label='Training data')
ax2.set_xlabel('Predicted')
ax2.set_ylabel('Actual')
ax2.set_title('predicted vs actual')
ax2.grid(True)
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

Slight x-axis imbalance in the residuals vs predicted plo is explained here with this plot. The predicted and actual follow each other closely.

Cross-validation check on training data.

Evaluation of AdaBoostRegressor for predicting time to eruption

In [ ]:
# evaluate the model
cv = RepeatedKFold(random_state=1)
n_scores_rcv_t = cross_val_score(best_model_rcv_t, X_train_times, times_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')
#n_scores_best = cross_val_score(best_model_rf, X_train_pressure, pressures_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')
# report performance
print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores_rcv_t), np.std(n_scores_rcv_t)))
#print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores_best), np.std(n_scores_best)))

sns.distplot(n_scores_rcv_t)
Avg. error: 0.917 (0.018)
Out[ ]:
<Axes: ylabel='Density'>
No description has been provided for this image

Residual Evaluation

In [ ]:
prediction_abr_t_train = best_model_abr.predict(X_train_times)

train_residuals_abr_t = times_train - prediction_abr_t_train
test_residuals_abr_t = times_test - prediction_abr

Plot residual and density against predicted for training and test
In [ ]:
# Residual evaluation plots
plt.figure(figsize=(12, 6))

# Residuals vs Predictions plot
plt.subplot(2, 2, 1)
plt.scatter(times_test, test_residuals_abr_t, c='green', marker='s', label='Test data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

plt.subplot(2, 2, 2)
plt.scatter(times_train, train_residuals_abr_t, c='blue', marker='o', label='Training data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

# Residuals distribution plot
plt.subplot(2, 2, 3)
sns.histplot(test_residuals_abr_t, bins = 20, color='green', label='Test residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.subplot(2, 2, 4)
sns.histplot(train_residuals_abr_t, bins = 20, color='blue', label='Training residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

QQPlot
In [ ]:
from cProfile import label
import statsmodels.api as sm
import scipy.stats as stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.qqplot(test_residuals_abr_t, stats.t, fit=True, line="45", ax=ax1)
ax1.set_xlabel('Theoretical Quantiles')
ax1.set_ylabel('Sample Quantiles')
ax1.set_title('Q-Q Plot of Test Residuals')

sm.qqplot(train_residuals_abr_t, stats.t, fit=True, line="45", ax=ax2)
ax2.set_xlabel('Theoretical Quantiles')
ax2.set_ylabel('Sample Quantiles')
ax2.set_title('Q-Q Plot of Training Residuals')

plt.tight_layout()
plt.show()
No description has been provided for this image

Analyse autocorrelation of residuals
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.graphics.tsa.plot_acf(test_residuals_abr_t, lags=100, ax = ax1)
ax1.set_xlabel('Lag')
ax1.set_ylabel('Autocorrelation')
ax1.set_title('Autocorrelation of Test Residuals')

sm.graphics.tsa.plot_acf(train_residuals_abr_t, lags=100, ax = ax2)
ax2.set_xlabel('Lag')
ax2.set_ylabel('Autocorrelation')
ax2.set_title('Autocorrelation of Training Residuals')

plt.show()
No description has been provided for this image

Plot actual vs predicted
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

ax1.scatter(prediction_abr, times_test, label='Test data')
#ax1.axline(xy1=(0, 0), xy2 =(1,1), color='r', lw=2)
ax1.set_xlabel('Predicted')
ax1.set_ylabel('Actual')
ax1.grid(True)
ax1.legend()
ax1.set_title('predicted vs actual')

ax2.plot(prediction_abr_t_train, times_train, 'o', label='Training data')
ax2.set_xlabel('Predicted')
ax2.set_ylabel('Actual')
ax2.set_title('predicted vs actual')
ax2.grid(True)
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

This explins the X axis imbalance in the residual vs predicted plot before.

Cross validation against training data

In [ ]:
# evaluate the model
cv = RepeatedKFold(random_state=1)
n_scores_abr_t = cross_val_score(best_model_abr, X_train_times, times_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')
#n_scores_best = cross_val_score(best_model_rf, X_train_pressure, pressures_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')
# report performance
print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores_abr_t), np.std(n_scores_abr_t)))
#print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores_best), np.std(n_scores_best)))

sns.distplot(n_scores_abr_t)
Avg. error: 0.947 (0.013)
Out[ ]:
<Axes: ylabel='Density'>
No description has been provided for this image

Evaluation of Random Forest Regressor for predicting time to eruption

In [ ]:
prediction_rf_train_t = best_model_rf_time.predict(X_train_times)

train_residuals_rf_t = times_train - prediction_rf_train_t
test_residuals_rf_t = times_test - prediction_rf_t

Residual Analysis

Plot residual and density against predicted for training and test
In [ ]:
# Residual evaluation plots
plt.figure(figsize=(12, 6))

# Residuals vs Predictions plot
plt.subplot(2, 2, 1)
plt.scatter(times_test, test_residuals_rf_t, c='green', marker='s', label='Test data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

plt.subplot(2, 2, 2)
plt.scatter(times_train, train_residuals_rf_t, c='blue', marker='o', label='Training data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Predicted Values')
plt.axhline(y=0, color='red', linestyle='--')
plt.legend()

# Residuals distribution plot
plt.subplot(2, 2, 3)
sns.histplot(test_residuals_rf_t, bins = 20, color='green', label='Test residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.subplot(2, 2, 4)
sns.histplot(train_residuals_rf_t, bins = 20, color='blue', label='Training residuals', kde = True)
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

QQ Plot
In [ ]:
from cProfile import label
import statsmodels.api as sm
import scipy.stats as stats

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.qqplot(test_residuals_rf_t, stats.t, fit=True, line="45", ax=ax1)
ax1.set_xlabel('Theoretical Quantiles')
ax1.set_ylabel('Sample Quantiles')
ax1.set_title('Q-Q Plot of Test Residuals')

sm.qqplot(train_residuals_rf_t, stats.t, fit=True, line="45", ax=ax2)
ax2.set_xlabel('Theoretical Quantiles')
ax2.set_ylabel('Sample Quantiles')
ax2.set_title('Q-Q Plot of Training Residuals')

plt.tight_layout()
plt.show()
No description has been provided for this image

Analysis of Autocorrelation of residuals
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

sm.graphics.tsa.plot_acf(test_residuals_rf_t, lags=100, ax = ax1)
ax1.set_xlabel('Lag')
ax1.set_ylabel('Autocorrelation')
ax1.set_title('Autocorrelation of Test Residuals')

sm.graphics.tsa.plot_acf(train_residuals_rf_t, lags=100, ax = ax2)
ax2.set_xlabel('Lag')
ax2.set_ylabel('Autocorrelation')
ax2.set_title('Autocorrelation of Training Residuals')

plt.show()
No description has been provided for this image

Plot actual vs predicted
In [ ]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

ax1.scatter(prediction_rf_t, times_test, label='Test data')
#ax1.axline(xy1=(0, 0), xy2 =(1,1), color='r', lw=2)
ax1.set_xlabel('Predicted')
ax1.set_ylabel('Actual')
ax1.grid(True)
ax1.legend()
ax1.set_title('predicted vs actual')

ax2.plot(prediction_rf_train_t, times_train, 'o', label='Training data')
ax2.set_xlabel('Predicted')
ax2.set_ylabel('Actual')
ax2.set_title('predicted vs actual')
ax2.grid(True)
plt.legend()

plt.tight_layout()
plt.show()
No description has been provided for this image

Cross validation on training data
In [ ]:
# evaluate the model
cv = RepeatedKFold(random_state=1)
n_scores = cross_val_score(best_model_rf_time, X_train_times, times_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')
#n_scores_best = cross_val_score(best_model_rf, X_train_pressure, pressures_train, scoring='r2', cv=cv, n_jobs=-1, error_score='raise')
# report performance
print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores), np.std(n_scores)))
#print('Avg. error: %.3f (%.3f)' % (np.mean(n_scores_best), np.std(n_scores_best)))

sns.distplot(n_scores)
#sns.distplot(n_scores_best)
Avg. error: 0.944 (0.012)
Out[ ]:
<Axes: ylabel='Density'>
No description has been provided for this image

Plot Feature importance of different models for predicting time to eruption

Feature importance of RidgeCV

In [ ]:
most_important_feature_rcv_time = plot_feature_importance(X_train_times,best_model_rcv_t, True)

plt.scatter(X_test_times[most_important_feature_rcv_time], times_test,  color='black')
plt.scatter(X_test_times[most_important_feature_rcv_time], prediction_rf_t, color='blue', linewidth=3)

plt.ylabel(('time to eruption'))
plt.xlabel((most_important_feature_rcv_time))

plt.show()
 Feature importance plot

No description has been provided for this image
Pressure__change_quantiles__f_agg_"var"__isabs_True__qh_0.6__ql_0.4 0.05153284465290502
Pressure__fft_coefficient__attr_"angle"__coeff_16 0.04980710923336445
Pressure__permutation_entropy__dimension_3__tau_1 0.04714278644046238
Pressure__number_cwt_peaks__n_1 0.04085155510467099
Pressure__agg_linear_trend__attr_"slope"__chunk_len_10__f_agg_"max" 0.02845081586903857
No description has been provided for this image

AdaBoostRegressor

In [ ]:
most_important_feature_abr_time = plot_feature_importance(X_train_times,best_model_abr, False)

plt.scatter(X_test_times[most_important_feature_abr_time], times_test,  color='black')
plt.scatter(X_test_times[most_important_feature_abr_time], prediction_rf_t, color='blue', linewidth=3)

plt.ylabel(('time to eruption'))
plt.xlabel((most_important_feature_abr_time))

plt.show()
 Feature importance plot

No description has been provided for this image
Pressure__range_count__max_1__min_-1 0.355417191095119
Pressure__mean_change 0.2932159198686467
Pressure__length 0.19623052746876923
Pressure__count_above_mean 0.04099818324639365
Pressure__last_location_of_maximum 0.02688819431605379
No description has been provided for this image

GradientBoostRegressor

In [ ]:
#Feature importance of GradientBoostingRegressor

most_important_feature_gbr_time = plot_feature_importance(X_train_times,best_model_gbr, False)

plt.scatter(X_test_times[most_important_feature_gbr_time], times_test,  color='black')
plt.scatter(X_test_times[most_important_feature_gbr_time], prediction_rf_t, color='blue', linewidth=3)

plt.ylabel(('time to eruption'))
plt.xlabel((most_important_feature_gbr_time))

plt.show()
 Feature importance plot

No description has been provided for this image
Pressure__range_count__max_1__min_-1 0.525144424212412
Pressure__length 0.3521496969917338
Pressure__mean_change 0.06126065124766116
Pressure__fft_coefficient__attr_"angle"__coeff_64 0.009474966844870595
Pressure__fft_coefficient__attr_"real"__coeff_29 0.00712515588995123
No description has been provided for this image

Feature imprtance of Random Forest

In [ ]:
most_important_feature_rf_time = plot_feature_importance(X_train_times,best_model_rf_time, False)

plt.scatter(X_test_times[most_important_feature_rf_time], times_test,  color='black')
plt.scatter(X_test_times[most_important_feature_rf_time], prediction_rf_t, color='blue', linewidth=3)

plt.ylabel(('time to eruption'))
plt.xlabel((most_important_feature_rf_time))

plt.show()
 Feature importance plot

No description has been provided for this image
Pressure__length 0.4684653940985864
Pressure__range_count__max_1__min_-1 0.3513040652970508
Pressure__fft_aggregated__aggtype_"variance" 0.07151147999595393
Pressure__agg_linear_trend__attr_"slope"__chunk_len_5__f_agg_"max" 0.02642999456864357
Pressure__count_below_mean 0.014521182265793292
No description has been provided for this image

For all the models considered for predicting time to eruption, it is the same 2 most significant features, which are :

  1. Pressure__range_count__max_1__min_-1
  2. Pressure__length