autots.tools package¶
Submodules¶
autots.tools.anomaly_utils module¶
Created on Fri Jul 1 15:41:21 2022
@author: Colin
Anomalies fall into three types: point, contextual, and collective. Point anomalies are single values that fall within low-density regions of values; collective anomalies indicate that a sequence of values is anomalous rather than any single value by itself; and contextual anomalies are single values that do not fall within low-density regions yet are anomalous with regard to local values. See https://arxiv.org/pdf/1802.04431.pdf
- autots.tools.anomaly_utils.anomaly_df_to_holidays(anomaly_df, actuals=None, anomaly_scores=None, threshold=0.8, min_occurrences=2, splash_threshold=0.65, use_dayofmonth_holidays=True, use_wkdom_holidays=True, use_wkdeom_holidays=False, use_lunar_holidays=False, use_lunar_weekday=False, use_islamic_holidays=False, use_hebrew_holidays=False, use_hindu_holidays=False)¶
- autots.tools.anomaly_utils.anomaly_new_params(method='random')¶
- autots.tools.anomaly_utils.create_dates_df(dates)¶
Take a pd.DatetimeIndex and create simple date parts.
- autots.tools.anomaly_utils.dates_to_holidays(dates, df_cols, style='long', holiday_impacts='value', day_holidays=None, wkdom_holidays=None, wkdeom_holidays=None, lunar_holidays=None, lunar_weekday=None, islamic_holidays=None, hebrew_holidays=None, hindu_holidays=None, max_features: int | None = None)¶
Populate date information for a given pd.DatetimeIndex.
- Parameters:
dates (pd.DatetimeIndex) – list of dates
day_holidays (pd.DataFrame) – list of month/day holidays. Pass None if not available
style (str) – option for how to return information:
'long' - return date, name, series for all holidays in a long style dataframe
'impact' - return dates, series with values of sum of impacts (if given) or joined string of holiday names
'flag' - return dates, holidays flag (is not 0-1 but rather the sum of input series impacted for that holiday and day)
'prophet' - return the format required for prophet; will need to be filtered on series for the multivariate case
'series_flag' - return dates, series 0/1 for whether a holiday occurred in any calendar
holiday_impacts (dict) – a dict passed to .replace containing values for holiday_names, or str 'value' or 'anomaly_score'
- autots.tools.anomaly_utils.detect_anomalies(df, output, method, transform_dict=None, method_params={}, eval_period=None, n_jobs=1)¶
All will return -1 for anomalies.
- Parameters:
output (str) – ‘multivariate’ (each series unique outliers), or ‘univariate’ (all series together for one outlier flag per timestamp)
transform_dict (dict) – optional but helpful; often datepart, differencing, or other standard AutoTS transformer params
forecast_params (dict) – used to backcast and identify ‘unforecastable’ values, required only for predict_interval method
eval_period (int) – only use this length of the data's tail; currently only implemented for forecast_params forecasting, if used
n_jobs (int) – multiprocessing jobs, used by some methods
- Returns:
pd.DataFrame (classifications, -1 = outlier, 1 = not outlier), pd.DataFrame (scores)
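A minimal usage sketch; the 'zscore' method name and empty method_params are assumptions based on the options documented under zscore_survival_function below:

```python
import numpy as np
import pandas as pd
from autots.tools.anomaly_utils import detect_anomalies

# wide-format input: DatetimeIndex rows, one column per series
idx = pd.date_range("2022-01-01", periods=200, freq="D")
df = pd.DataFrame(np.random.randn(200, 3).cumsum(axis=0), index=idx, columns=["a", "b", "c"])
df.iloc[50, 0] += 20  # inject an obvious point anomaly

anomalies, scores = detect_anomalies(
    df, output="multivariate", method="zscore", method_params={}
)
print(anomalies.iloc[48:53, 0])  # -1 marks the injected outlier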
- autots.tools.anomaly_utils.fit_anomaly_classifier(anomalies, scores)¶
Fit a DecisionTree model to predict if a score is an anomaly.
This is a shared utility for both AnomalyDetector and AnomalyRemoval classes. Using DecisionTree as it can handle nonparametric anomalies.
- Parameters:
anomalies (pd.DataFrame) – DataFrame with -1 for anomalies, 1 for normal
scores (pd.DataFrame) – DataFrame with anomaly scores
- Returns:
(classifier, score_categories) - trained classifier and categorical mapping
- Return type:
tuple
- autots.tools.anomaly_utils.gaussian_mixture(df, n_components=2, tol=0.001, max_iter=100, responsibility_threshold=0.05)¶
- autots.tools.anomaly_utils.holiday_new_params(method='random')¶
- autots.tools.anomaly_utils.limits_to_anomalies(df, output, upper_limit, lower_limit, method_params=None)¶
- autots.tools.anomaly_utils.loop_sk_outliers(df, method, method_params={}, n_jobs=1)¶
Multiprocessing on each series for multivariate outliers with sklearn.
- autots.tools.anomaly_utils.nonparametric_multivariate(df, output, method_params, n_jobs=1)¶
- autots.tools.anomaly_utils.score_to_anomaly(scores, classifier, score_categories)¶
Convert anomaly scores to anomaly classifications using a trained classifier.
This is a shared utility for both AnomalyDetector and AnomalyRemoval classes.
- Parameters:
scores (pd.DataFrame) – DataFrame with anomaly scores
classifier – trained sklearn classifier
score_categories – categorical mapping from fit_anomaly_classifier
- Returns:
Classifications (-1 = anomaly, 1 = normal)
- Return type:
pd.DataFrame
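A sketch of the shared classifier workflow; the synthetic anomalies/scores frames stand in for real detect_anomalies output:

```python
import numpy as np
import pandas as pd
from autots.tools.anomaly_utils import fit_anomaly_classifier, score_to_anomaly

idx = pd.date_range("2022-01-01", periods=100, freq="D")
scores = pd.DataFrame(np.random.rand(100, 2), index=idx, columns=["a", "b"])
# -1 for anomalies, 1 for normal, mirroring detect_anomalies output
anomalies = pd.DataFrame(np.where(scores < 0.02, -1, 1), index=idx, columns=["a", "b"])

classifier, score_categories = fit_anomaly_classifier(anomalies, scores)
classifications = score_to_anomaly(scores, classifier, score_categories)
```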
- autots.tools.anomaly_utils.sk_outliers(df, method, method_params={})¶
scikit-learn outlier methods wrapper.
- autots.tools.anomaly_utils.values_to_anomalies(df, output, threshold_method, method_params, n_jobs=1)¶
- autots.tools.anomaly_utils.zscore_survival_function(df, output='multivariate', method='zscore', distribution='norm', rolling_periods: int = 200, center: bool = True)¶
Take a dataframe, generate zscores, and then generate survival probabilities (smaller = more outliery).
- Parameters:
df (pd.DataFrame) – wide style time series data (datetimeindex, series)
output (str) – univariate (1 series from all) or multivariate (all series input returned unique)
method (str) – zscore, rolling_zscore, mad (median abs dev)
distribution (str) – distribution to sample sf/outliers from
rolling_periods (int) – >1, used for rolling_zscore period
center (bool) – passed to pd.rolling for rolling_zscore, True for holiday detection, False for anomaly detection generally
- Returns:
pd.DataFrame of p-values
autots.tools.autoencoder module¶
Autoencoder tools for anomaly detection
Variational Autoencoder (VAE) implementation for time series anomaly detection
- class autots.tools.autoencoder.VAE(input_dim, latent_dim, depth=1, dropout_rate=0.0)¶
Bases:
Module
Complete VAE model.
- forward(x)¶
- reparameterize(mean, logvar)¶
Reparameterization trick.
- class autots.tools.autoencoder.VAEAnomalyDetector(depth=1, batch_size=32, epochs=50, learning_rate=0.001, loss_function='elbo', dropout_rate=0.0, latent_dim=None, beta=1.0, random_state=None, device=None)¶
Bases:
object
Variational Autoencoder for Anomaly Detection.
- fit(X)¶
Fit the VAE model.
- predict(X)¶
Predict anomaly scores.
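A minimal fit/predict sketch (requires a torch backend; the 2d input shape and score orientation are assumptions):

```python
import numpy as np
from autots.tools.autoencoder import VAEAnomalyDetector

X = np.random.randn(500, 8)  # rows = observations, columns = features

detector = VAEAnomalyDetector(epochs=10, batch_size=64, random_state=0)
detector.fit(X)
scores = detector.predict(X)  # higher reconstruction-based scores suggest anomalies
```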
- class autots.tools.autoencoder.VAEDecoder(input_dim, latent_dim, depth=1, dropout_rate=0.0)¶
Bases:
Module
VAE Decoder network.
- forward(z)¶
- class autots.tools.autoencoder.VAEEncoder(input_dim, latent_dim, depth=1, dropout_rate=0.0)¶
Bases:
Module
VAE Encoder network.
- forward(x)¶
- autots.tools.autoencoder.loop_vae_outliers(df, method_params={}, n_jobs=1)¶
Multiprocessing on each series for multivariate VAE outliers.
- autots.tools.autoencoder.vae_outliers(df, method_params={})¶
VAE-based outlier detection.
autots.tools.bayesian_regression module¶
- class autots.tools.bayesian_regression.BayesianMultiOutputRegression(gaussian_prior_mean=0.0, alpha=1.0, wishart_prior_scale=1.0, wishart_dof_excess=0)¶
Bases:
object
Matrix-normal/Inverse-Wishart Bayesian linear regression.
Computes an analytical posterior for multivariate linear regression with Gaussian weight noise and unknown Gaussian observation noise. The posterior mean matches ridge regression, while the stored covariance allows callers to recover coefficient standard deviations, prediction intervals, and posterior samples without resorting to numerical sampling during training.
- coefficient_interval(z_value=1.96)¶
- fit(X, Y)¶
- predict(X, return_std=False)¶
- sample_posterior(n_samples=1)¶
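A usage sketch under the documented signatures; the (mean, std) tuple return of predict(return_std=True) is an assumption:

```python
import numpy as np
from autots.tools.bayesian_regression import BayesianMultiOutputRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(200, 3))  # 3 outputs

model = BayesianMultiOutputRegression(alpha=1.0)
model.fit(X, Y)
point = model.predict(X)                        # posterior mean, matches ridge
point, std = model.predict(X, return_std=True)  # assumed (mean, std) tuple
draws = model.sample_posterior(n_samples=10)    # posterior samples of coefficients
```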
autots.tools.calendar module¶
Calendar conversion functions.
Includes Lunar, Chinese lunar, and Arabic lunar
- autots.tools.calendar.gregorian_to_chinese(datetime_index, epoch=2444238.5)¶
Convert a pandas DatetimeIndex to Chinese Lunar calendar. Potentially has errors.
- autots.tools.calendar.gregorian_to_christian_lunar(datetime_index)¶
Convert a pandas DatetimeIndex to Christian Lunar calendar. Approximate: it does not work exactly.
- autots.tools.calendar.gregorian_to_hebrew(dates)¶
Convert pd.Datetimes to a Hebrew date. From pyluach by simlist.
This is the slowest of the lot and needs to be improved.
- autots.tools.calendar.gregorian_to_hindu(datetime_index, method: str = 'lunar')¶
Convert a pandas DatetimeIndex to Hindu calendar date components. Hindu calendar has numerous regional variations. This implementation aims for compatibility with common internationally recognized Hindu holidays.
- Parameters:
datetime_index (pd.DatetimeIndex) – pandas DatetimeIndex
method (str) – ‘simple’ or ‘lunar’. Simple is faster. Lunar is more accurate.
- autots.tools.calendar.gregorian_to_islamic(date, epoch_adjustment=1.5)¶
Calculate Islamic dates for pandas DatetimeIndex. Approximately. From convertdate by fitnr.
- Parameters:
epoch_adjustment (float) – base of 1.0; needs adjusting by about +/- 0.5 to account for timezone (default 1.5)
- autots.tools.calendar.heb_is_leap(year)¶
- autots.tools.calendar.lunar_from_lunar(new_moon)¶
Assumes continuous daily data and pre-needed start.
- autots.tools.calendar.lunar_from_lunar_full(full_moon)¶
Assumes continuous daily data and pre-needed start.
- autots.tools.calendar.to_jd(year, month, day)¶
Determine Julian day count from Islamic date. From convertdate by fitnr.
autots.tools.changepoints module¶
- class autots.tools.changepoints.ChangepointDetector(method='pelt', method_params=None, aggregate_method='mean', min_segment_length=5, probabilistic_output=False, n_jobs=1)¶
Bases:
object
Advanced changepoint detection class for time series data.
Supports multiple algorithms for detecting changepoints and level shifts in wide-format time series data, similar to HolidayDetector.
- create_features(forecast_length=0)¶
Create changepoint features for regression modeling.
- Parameters:
forecast_length (int) – Number of future periods to extend features
- Returns:
Changepoint features
- Return type:
pd.DataFrame
- detect(df)¶
Run changepoint detection on wide-format time series data.
- Parameters:
df (pd.DataFrame) – Wide-format time series with DatetimeIndex
- fit(df)¶
Fit the changepoint detector and prepare transformer artifacts.
- Parameters:
df (pd.DataFrame) – Training data with DatetimeIndex.
- fit_transform(df)¶
Fit the detector and immediately transform the input data.
- Parameters:
df (pd.DataFrame) – Training data with DatetimeIndex.
- get_market_changepoints(method='dbscan', params=None)¶
Find common changepoints across multiple time series using clustering.
- Parameters:
method (str) – Clustering method (‘dbscan’, ‘kmeans’, ‘hierarchical’)
params (dict) – Parameters for clustering algorithm
- Returns:
Array of market-wide changepoint indices
- Return type:
np.ndarray
- static get_new_params(method='random')¶
Generate new random parameters for changepoint detection.
- Parameters:
method (str) – Method for parameter selection:
'fast': all methods but with the fastest parameter configurations for PELT and composite_fused_lasso
or specify a method name directly: 'basic', 'pelt', 'l1_fused_lasso', 'l1_total_variation', 'cusum', 'autoencoder', 'composite_fused_lasso'
- Returns:
Complete parameter dictionary for ChangepointDetector initialization
- Return type:
dict
- inverse_transform(df, trans_method='forecast')¶
Restore data to the original scale using stored changepoint trends.
- Parameters:
df (pd.DataFrame) – Data to inverse transform.
trans_method (str) – Compatibility argument for transformer interface.
- plot(series_name=None, figsize=(12, 8))¶
Plot time series with detected changepoints.
- Parameters:
series_name (str) – Name of series to plot (for individual detection)
figsize (tuple) – Figure size
- transform(df)¶
Apply changepoint-based detrending to the provided data.
- Parameters:
df (pd.DataFrame) – Data to transform.
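A minimal sketch tying the methods above together; the synthetic level shift is illustrative:

```python
import numpy as np
import pandas as pd
from autots.tools.changepoints import ChangepointDetector

idx = pd.date_range("2021-01-01", periods=300, freq="D")
y = np.concatenate([np.zeros(150), np.full(150, 5.0)])  # level shift at day 150
df = pd.DataFrame({"y": y + np.random.randn(300) * 0.1}, index=idx)

detector = ChangepointDetector(method="pelt", min_segment_length=5)
detector.detect(df)
features = detector.create_features(forecast_length=30)  # regression features

# transformer-style interface
detrended = detector.fit_transform(df)
restored = detector.inverse_transform(detrended)
```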
- autots.tools.changepoints.changepoint_fcst_from_last_row(x_t_last_row, n_forecast=10)¶
- autots.tools.changepoints.create_changepoint_features(DTindex, changepoint_spacing=60, changepoint_distance_end=120, method='basic', params=None, data=None)¶
Creates a feature set for estimating trend changepoints using various algorithms.
- Parameters:
DTindex (pd.DatetimeIndex) – a datetimeindex
changepoint_spacing (int) – distance between consecutive changepoints (legacy, for basic method)
changepoint_distance_end (int) – number of rows that belong to the final changepoint (legacy, for basic method)
method (str) – method for changepoint detection ('basic', 'pelt', 'l1_fused_lasso', 'l1_total_variation', 'cusum', 'ewma', 'autoencoder')
params (dict) – additional parameters for the chosen method
data (array-like) – time series data (required for advanced methods)
- Returns:
DataFrame containing changepoint features for linear regression.
- Return type:
pd.DataFrame
- autots.tools.changepoints.find_market_changepoints_multivariate(df, detector_params=None, clustering_method='dbscan', clustering_params=None, min_series_agreement=0.3)¶
Find common changepoints across multivariate time series data.
- Parameters:
df (pd.DataFrame) – Wide-format time series data
detector_params (dict) – Parameters for ChangePointDetector
clustering_method (str) – Method for clustering changepoints (‘dbscan’, ‘kmeans’, ‘agreement’)
clustering_params (dict) – Parameters for clustering
min_series_agreement (float) – Minimum fraction of series that must agree on a changepoint
- Returns:
Dictionary with market changepoints and individual series changepoints
- Return type:
dict
- autots.tools.changepoints.generate_random_changepoint_params(method='random')¶
Generate random parameters for changepoint detection methods.
This function creates appropriately weighted random parameters for different changepoint detection algorithms, supporting the flexible method/params system.
DEPRECATED: This function now delegates to ChangepointDetector.get_new_params() for consistency. Use ChangepointDetector.get_new_params() directly for new code.
- Parameters:
method (str) – Method for parameter selection:
'random': all methods with balanced weights
'fast': all methods but with the fastest parameter configurations for PELT and composite_fused_lasso
'default'/'basic_weighted': basic method heavily weighted
- Returns:
- (changepoint_method, changepoint_params) where
changepoint_method (str): Selected method name
changepoint_params (dict): Method-specific parameters
- Return type:
tuple
- autots.tools.changepoints.half_yr_spacing(df)¶
autots.tools.cointegration module¶
Cointegration
Johansen heavily based on Statsmodels source code
BTCD heavily based on D. Barba https://towardsdatascience.com/canonical-decomposition-a-forgotten-method-for-time-series-cointegration-and-beyond-4d1213396da1
- autots.tools.cointegration.btcd_decompose(p_mat: ndarray, regression_model, max_lag: int = 1, return_eigenvalues=False)¶
Calculate decomposition. p_mat is of shape (t, n), wide style data.
- autots.tools.cointegration.coint_fast(endog, k_ar_diff=1)¶
A fast estimation of cointegration vectors for VECM.
- Parameters:
endog (np.array) – should be (nobs, n_vars)
k_ar_diff (int) – number of lags to use in VECM.
- autots.tools.cointegration.coint_johansen(endog, det_order=-1, k_ar_diff=1, return_eigenvalues=False, fast: bool = True)¶
Johansen cointegration test of the cointegration rank of a VECM, abbreviated from Statsmodels
- autots.tools.cointegration.fourier_series(dates, period, series_order)¶
Provides Fourier series components with the specified frequency and order.
- Parameters:
dates (pd.Series) – Series containing timestamps
period (float) – number of days of the period
series_order (int) – number of components
- Returns:
Matrix with seasonality features.
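A sketch of generating yearly seasonality features; the 2 * series_order output width follows the usual Fourier-features construction and is an assumption here:

```python
import pandas as pd
from autots.tools.cointegration import fourier_series

dates = pd.Series(pd.date_range("2022-01-01", periods=365, freq="D"))
seasonality = fourier_series(dates, period=365.25, series_order=3)
print(seasonality.shape)  # expected (365, 6): one sin and cos per order
```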
- autots.tools.cointegration.lagmat(x, maxlag: int, trim='forward', original='ex')¶
Create 2d array of lags. Modified from Statsmodels.
autots.tools.constraint module¶
Constraint and adjustment generation functions
- autots.tools.constraint.apply_adjustment_single(forecast: DataFrame, adjustment_method: str, adjustment_params: dict | None = None, df_train: DataFrame | None = None, series_ids=None, lower_forecast: DataFrame | None = None, upper_forecast: DataFrame | None = None)¶
Apply a single adjustment to forecast (and optional bounds).
- adjustment_method:
- “basic”: linear ramp between start/end values and dates
params: start_date, end_date, start_value, end_value, method (“additive”|”multiplicative”)
- “align_last_value”: align start of forecast to recent history, requires df_train
params: any AlignLastValue kwargs (rows, lag, method, strength, etc.)
- “smoothing”: EWMA smoothing
params: span (int)
series_ids limits adjustment to specific columns; defaults to all columns.
- autots.tools.constraint.apply_constraint_single(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None)¶
- autots.tools.constraint.apply_fit_constraint(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, lower_constraint=None, upper_constraint=None, train_min=None, train_max=None, fillna=None)¶
- autots.tools.constraint.constant_growth_rate(periods, final_growth)¶
Take a final target growth rate (i.e., 2% over a year) and convert it to a per-period (e.g., daily) value.
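A minimal call under the documented signature; whether the result is a scalar per-period rate or a series of increments is not specified here, so inspect the output:

```python
from autots.tools.constraint import constant_growth_rate

# spread a 2% total growth over 365 periods (e.g., daily values for a year)
daily = constant_growth_rate(periods=365, final_growth=0.02)
```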
- autots.tools.constraint.constraint_new_params(method: str = 'fast')¶
- autots.tools.constraint.fit_constraint(constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None, forecast_length=None)¶
autots.tools.cpu_count module¶
CPU counter for multiprocessing.
- autots.tools.cpu_count.cpu_count(modifier: float = 1)¶
Find available CPU count, running on both Windows/Linux.
- Attempts to be very conservative:
Remove Intel Hyperthreading logical cores
Find max cores allowed to the process, if less than the machine's total
Runs best with psutil installed, falls back to mkl, then os core count / 2
- Parameters:
modifier (float) – multiple CPU count by this value
- autots.tools.cpu_count.set_n_jobs(n_jobs, verbose=0)¶
autots.tools.fast_kalman module¶
From SIMD KALMAN, (c) 2017 Otto Seiskari (MIT License)
- Some other resources that I have found useful:
https://kevinkotze.github.io/ts-4-state-space/
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_dfm_coincident.html
An Introduction to State Space Time Series Analysis, Commandeur and Koopman, ch. 8
Forecasting, Structural Time Series Models and the Kalman Filter, Andrew Harvey
Following the notation in [1], the Kalman filter framework consists of a dynamic model (state transition model)

\(x_k = A x_{k-1} + q_{k-1}, \qquad q_{k-1} \sim N(0, Q)\)

and a measurement model (observation model)

\(y_k = H x_k + r_k, \qquad r_k \sim N(0, R)\)

where the vector \(x\) is the (hidden) state of the system and \(y\) is an observation. A and H are matrices of suitable shape and \(Q\), \(R\) are positive-definite noise covariance matrices.
Usage example¶
import simdkalman
import numpy as np
import numpy.random as random

random.seed(0)

Define the model:

kf = simdkalman.KalmanFilter(
    state_transition = [[1, 1], [0, 1]],     # matrix A
    process_noise = np.diag([0.1, 0.01]),    # Q
    observation_model = np.array([[1, 0]]),  # H
    observation_noise = 1.0)                 # R

Generate some fake data:

# 100 independent time series
data = random.normal(size=(100, 200))

# with 10% of NaNs denoting missing values
data[random.uniform(size=data.shape) < 0.1] = np.nan

Smooth all data:

smoothed = kf.smooth(data,
                     initial_value = [1, 0],
                     initial_covariance = np.eye(2) * 0.5)

# second time series, third time step, hidden state x
print('mean')
print(smoothed.states.mean[1, 2, :])

print('covariance')
print(smoothed.states.cov[1, 2, :, :])

mean
[ 0.29311384 -0.06948961]
covariance
[[ 0.19959416 -0.00777587]
 [-0.00777587  0.02528967]]

Predict new data for a single series (1d case):

predicted = kf.predict(data[1, :], 123)

# predicted observation y, third new time step
pred_mean = predicted.observations.mean[2]
pred_stdev = np.sqrt(predicted.observations.cov[2])

print('%g +- %g' % (pred_mean, pred_stdev))

1.71543 +- 1.65322
Low-level Kalman filter computation steps with multi-dimensional input arrays. Unlike with the KalmanFilter class, all inputs must be numpy arrays. However, their dimensions can flexibly vary from 1 to 3 as long as they are reasonable from the point of view of matrix multiplication and numpy broadcasting rules. Matrix operations are applied on the last two axes of the arrays.
- class autots.tools.fast_kalman.Gaussian(mean, cov)¶
Bases:
object
- static empty(n_states, n_vars, n_measurements, cov=True)¶
- unvectorize_state()¶
- unvectorize_vars()¶
- class autots.tools.fast_kalman.KalmanFilter(state_transition, process_noise, observation_model, observation_noise)¶
Bases:
object
The main Kalman filter class providing convenient interfaces to vectorized smoothing and filtering operations on multiple independent time series.
As long as the shapes of the given parameters match reasonably according to the rules of matrix multiplication, this class is flexible in their exact nature, accepting:
- scalars: process_noise = 0.1
- (2d) numpy matrices: process_noise = numpy.eye(2)
- 2d arrays: observation_model = [[1,2]]
- 3d arrays and matrices for vectorized computations. Unlike the other options, this locks the shape of the inputs that can be processed by the smoothing and prediction methods.
- Parameters:
state_transition – State transition matrix \(A\)
process_noise – Process noise (state transition covariance) matrix \(Q\)
observation_model – Observation model (measurement model) matrix \(H\)
observation_noise – Observation noise (measurement noise covariance) matrix \(R\)
- class Result¶
Bases:
object
- compute(data, n_test, initial_value=None, initial_covariance=None, smoothed=True, filtered=False, states=True, covariances=True, observations=True, likelihoods=False, gains=False, log_likelihood=False, verbose=False)¶
Smoothing, filtering and prediction at the same time. Used internally by other methods, but can also be used directly if, e.g., both smoothed and predicted data is wanted.
See smooth and predict for explanation of the common parameters. With this method, there also exist the following flags.
- Parameters:
smoothed (boolean) – compute Kalman smoother (used by smooth)
filtered (boolean) – return (one-way) filtered data
likelihoods (boolean) – return likelihoods of each step
gains (boolean) – return Kalman gains and pairwise covariances (used by the EM algorithm). If true, the gains are provided as a member of the relevant subresult filtered.gains and/or smoothed.gains.
log_likelihood (boolean) – return the log-likelihood(s) for the entire series. If matrix data is given, this will be a vector where each element is the log-likelihood of a single row.
- Return type:
result object whose fields depend on which of the above parameter flags are True. The possible values are: smoothed (the return value of smooth, may contain smoothed.gains), filtered (like smoothed, may also contain filtered.gains), predicted (the return value of predict if n_test > 0), pairwise_covariances, likelihoods and log_likelihood.
- em(data, n_iter=5, initial_value=None, initial_covariance=None, verbose=False, tol=None)¶
- em_observation_noise(result, data, verbose=False)¶
- em_process_noise(result, verbose=False)¶
- predict(data, n_test, initial_value=None, initial_covariance=None, states=True, observations=True, covariances=True, verbose=False)¶
Filter past data and predict a given number of future values. The data can be given as any of:
- 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain a 1d array of observations (both .mean and .cov will be 1d).
- 2d matrix, where each row is interpreted as an independent time series, all of which are filtered independently. The returned observations members will be 2-dimensional in this case.
- 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.
Initial values and covariances can be given as scalars or 2d matrices, in which case the same initial states will be used for all rows, or as 3d arrays for different initial values per row.
- Parameters:
data – Past data
n_test (integer) – number of future steps to predict.
initial_value – Initial value \({\mathbb E}[x_0]\)
initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)
states (boolean) – predict states \(x\)?
observations (boolean) – predict observations \(y\)?
covariances (boolean) – include covariances in predictions?
- Return type:
Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)
- predict_next(m, P)¶
Single prediction step
- Parameters:
m – \({\mathbb E}[x_{j-1}]\), the previous mean
P – \({\rm Cov}[x_{j-1}]\), the previous covariance
- Return type:
(prior_mean, prior_cov) – predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)
- predict_observation(m, P)¶
Probability distribution of observation \(y\) for a given distribution of \(x\)
- Parameters:
m – \({\mathbb E}[x]\)
P – \({\rm Cov}[x]\)
- Return type:
mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)
- smooth(data, initial_value=None, initial_covariance=None, observations=True, states=True, covariances=True, verbose=False)¶
Smooth given data, which can be any of:
- 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain a 1d array of observations (both .mean and .cov will be 1d).
- 2d matrix, where each row is interpreted as an independent time series, all of which are smoothed independently. The returned observations members will be 2-dimensional in this case.
- 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.
Initial values and covariances can be given as scalars or 2d matrices, in which case the same initial states will be used for all rows, or as 3d arrays for different initial values per row.
- Parameters:
data – 1d or 2d data, see above
initial_value – Initial value \({\mathbb E}[x_0]\)
initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)
states (boolean) – return smoothed states \(x\)?
observations (boolean) – return smoothed observations \(y\)?
covariances (boolean) – include covariances results?
- Return type:
Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)
- smooth_current(m, P, ms, Ps)¶
Single Kalman smoother backwards step
- Parameters:
m – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)
P – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)
ms – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)
Ps – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)
- Return type:
(smooth_mean, smooth_covariance, smoothing_gain) – smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\), covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\), and smoothing gain \(C\)
- update(m, P, y, log_likelihood=False)¶
Single update step with NaN check.
- Parameters:
m – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)
P – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)
y – observation \(y_j\)
log_likelihood – compute log-likelihood?
- Return type:
(posterior_mean, posterior_covariance, log_likelihood) – posterior mean \({\mathbb E}[x_j|y_1,\ldots,y_j]\) and covariance \({\rm Cov}[x_j|y_1,\ldots,y_j]\) and, if requested, log-likelihood. If \(y_j\) is NaN, returns the prior mean and covariance instead
- autots.tools.fast_kalman.autoshape(func)¶
Automatically shape arguments and return values
- autots.tools.fast_kalman.ddot(A, B)¶
Matrix multiplication over last two axes
- autots.tools.fast_kalman.ddot_t_right(A, B)¶
Matrix multiplication over last 2 axes with right operand transposed
- autots.tools.fast_kalman.ddot_t_right_old(A, B)¶
Matrix multiplication over last 2 axes with right operand transposed
- autots.tools.fast_kalman.dinv(A)¶
Matrix inverse applied to last two axes
- autots.tools.fast_kalman.douter(a, b)¶
Outer product, last two axes
- autots.tools.fast_kalman.em_initial_state(result, initial_means)¶
- autots.tools.fast_kalman.ensure_matrix(x, dim=1)¶
- autots.tools.fast_kalman.predict(mean, covariance, state_transition, process_noise)¶
Kalman filter prediction step
- Parameters:
mean – \({\mathbb E}[x_{j-1}]\), the filtered mean from the previous step
covariance – \({\rm Cov}[x_{j-1}]\), the filtered covariance from the previous step
state_transition – matrix \(A\)
process_noise – matrix \(Q\)
- Return type:
(prior_mean, prior_cov) – predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)
- autots.tools.fast_kalman.predict_observation(mean, covariance, observation_model, observation_noise)¶
Compute probability distribution of the observation \(y\), given the distribution of \(x\).
- Parameters:
mean – \({\mathbb E}[x]\)
covariance – \({\rm Cov}[x]\)
observation_model – matrix \(H\)
observation_noise – matrix \(R\)
- Return type:
mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)
- autots.tools.fast_kalman.priv_smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)¶
- autots.tools.fast_kalman.priv_update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement, log_likelihood=False)¶
- autots.tools.fast_kalman.smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)¶
Kalman smoother backwards step
- Parameters:
posterior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)
posterior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)
state_transition – matrix \(A\)
process_noise – matrix \(Q\)
next_smooth_mean – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)
next_smooth_covariance – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)
- Return type:
(smooth_mean, smooth_covariance, smoothing_gain) – smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\) and covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\)
- autots.tools.fast_kalman.stable_pinv(A, tol=1e-05, regularization=0.0001)¶
- autots.tools.fast_kalman.update(prior_mean, prior_covariance, observation_model, observation_noise, measurement)¶
Kalman filter update step
- Parameters:
prior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)
prior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)
observation_model – matrix \(H\)
observation_noise – matrix \(R\)
measurement – observation \(y_j\)
- Return type:
(posterior_mean, posterior_covariance) – posterior mean and covariance \({\mathbb E}[x_j|y_1,\ldots,y_j]\), \({\rm Cov}[x_j|y_1,\ldots,y_j]\) after observing \(y_j\)
- autots.tools.fast_kalman.update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement)¶
Kalman filter update with a check for NaN observations. Like update, but returns (prior_mean, prior_covariance) if measurement is NaN
autots.tools.fast_kalman_params module¶
Generate random state-space model parameters paired with fast_kalman.py.
- autots.tools.fast_kalman_params.ensure_stability(st)¶
- autots.tools.fast_kalman_params.holt_winters_damped_matrices(M, alpha, beta, gamma, phi=1.0)¶
Not sure if this is correct. It’s close, at least.
- autots.tools.fast_kalman_params.new_kalman_params(method=None, allow_auto=True)¶
- autots.tools.fast_kalman_params.random_matrix(rows, cols, density=0.2)¶
- autots.tools.fast_kalman_params.random_state_space(tries=15)¶
- autots.tools.fast_kalman_params.random_state_space_original()¶
Return randomly generated statespace models.
autots.tools.fft module¶
Created on Mon Oct 9 22:07:37 2023
@author: colincatlin
- class autots.tools.fft.FFT(n_harm=10, detrend='linear', freq_range=None)¶
Bases:
object
- fit(x)¶
- generate_harmonics_dataframe(forecast_length=0)¶
- predict(forecast_length=0)¶
- autots.tools.fft.fourier_extrapolation(x, forecast_length=10, n_harm=10, detrend='linear', freq_range=None)¶
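A minimal sketch; the (observations, series) input shape is an assumption:

```python
import numpy as np
from autots.tools.fft import FFT, fourier_extrapolation

t = np.arange(500)
x = (np.sin(t / 10) + 0.01 * t).reshape(-1, 1)  # wave plus trend

model = FFT(n_harm=10, detrend="linear")
model.fit(x)
forecast = model.predict(forecast_length=50)  # extrapolate fitted harmonics

# functional equivalent
extended = fourier_extrapolation(x, forecast_length=50, n_harm=10)
```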
autots.tools.fir_filter module¶
Created on Fri Sep 6 23:59:35 2024
@author: colincatlin
- autots.tools.fir_filter.apply_fir_filter_time_domain(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')¶
Apply FIR filter using time-domain convolution (lfilter) for smaller memory usage. This function has padding issues currently.
- autots.tools.fir_filter.apply_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')¶
Apply FIR filter to an array of time series data with shape (observations, series).
- Parameters:
data – numpy array of shape (observations, series), where each column represents a time series
sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)
numtaps – number of taps (filter length)
cutoff_hz – the cutoff frequency in Hz (for filtering purposes)
window – the windowing function to use for FIR filter design ('hamming', 'hann', etc.)
- Returns:
filtered_data – the filtered version of the input data
- autots.tools.fir_filter.fft_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming', chunk_size=1000)¶
Apply FIR filter to an array of time series data with shape (observations, series).
- Parameters:
data – numpy array of shape (observations, series), where each column represents a time series
sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)
numtaps – number of taps (filter length)
cutoff_hz – the cutoff frequency in Hz (for filtering purposes)
window – the windowing function to use for FIR filter design ('hamming', 'hann', etc.)
- Returns:
filtered_data – the filtered version of the input data
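A sketch with illustrative daily data:

```python
import numpy as np
from autots.tools.fir_filter import apply_fir_filter_to_timeseries

# three years of daily data for 5 series, shape (observations, series)
data = np.random.randn(1095, 5).cumsum(axis=0)

filtered = apply_fir_filter_to_timeseries(
    data, sampling_frequency=365, numtaps=128, cutoff_hz=20, window="hamming"
)
print(filtered.shape)  # same shape as the input
```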
- autots.tools.fir_filter.generate_random_fir_params(method='random', data_type='time_series')¶
autots.tools.g7xx_codec module¶
G.711 and G.726 audio codec inspired utilities for time series processing.
This module provides:
1. G.711 companding (mu-law and A-law) for non-linear scaling
2. G.726 adaptive differential PCM for denoising/smoothing
3. Transformer classes (G711Scaler, G726Filter) for AutoTS integration
G.711 is used as a non-linear scaler: encode on transform, decode on inverse_transform. G.726 is used as an adaptive filter to denoise while preserving structure.
All operations are fully vectorized for efficient processing of multiple time series.
- class autots.tools.g7xx_codec.G711Scaler(mode: str = 'mu', mu: float = 100.0, A: float = 87.6, center: str = 'median', scale_method: str = 'mad', scale_factor: float = 3.0, min_scale: float = 1e-06, clip: bool = True, zero_offset: float = 0.0, fill_method: str = 'interpolate', on_transform: bool = True, on_inverse: bool = True, bounds_only: bool = False, **kwargs)¶
Bases:
EmptyTransformer
G.711-style non-linear scaler (mu-law or A-law).
Encodes (compands) on transform and decodes (expands to original space) on inverse_transform. Optimized for daily/hourly data by using robust per-series centering and scaling to normalize into [-1, 1] prior to companding.
- Parameters:
mode – ‘mu’ for mu-law or ‘a’ for A-law companding
mu – mu-law parameter (default 255.0, try 100-512 range)
A – A-law parameter (default 87.6, try 50-100 range)
center – ‘median’ or ‘mean’ for centering
scale_method – ‘mad’, ‘std’, ‘maxabs’, or ‘percentile’ for scale estimation
scale_factor – Multiplier for scale (default 3.0, lower=more compression, higher=better reconstruction)
min_scale – Minimum scale to prevent division by zero
clip – Whether to clip normalized values to [-1, 1]
zero_offset – Small constant added to handle exact zeros (e.g., 1e-6 for sparse data)
fill_method – Method to handle NaN values
on_transform – Apply encoding on transform
on_inverse – Apply decoding on inverse_transform
bounds_only – Only apply during bounds adjustment
Note
For sparse/intermittent data with many zeros, consider setting zero_offset or using alternative transformers like ReplaceConstant or CenterSplit.
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
- inverse_transform(df, trans_method: str = 'forecast', adjustment=None)¶
Inverse transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.g7xx_codec.G726Filter(quant_bits: int = 4, adaptation_rate: float = 0.96, prediction_alpha: float = 0.92, floor_step: float = 0.01, dynamic_range: float = 1.5, blend: float = 0.15, noise_gate: float = 0.0, fill_method: str = 'interpolate', on_transform: bool = True, on_inverse: bool = False, bounds_only: bool = False, quantizer: str = 'uniform', use_adaptive_predictor: bool = True, predictor_leak: float = 0.9999, **kwargs)¶
Bases:
EmptyTransformer
Adaptive differential PCM smoothing inspired by the G.726 codec.
Implements proper G.726-style adaptive filtering with:
- 2-pole/6-zero adaptive predictor with decision-directed updates (not unlike an ARMA(2,6) model)
- non-uniform quantization optimized for time series
- fast/slow scale factor adaptation with speed switching
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
- inverse_transform(df, trans_method: str = 'forecast', adjustment=None)¶
Inverse transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- autots.tools.g7xx_codec.g711_decode(encoded: ndarray | DataFrame, center: ndarray, scale: ndarray, mode: str = 'mu', mu: float = 255.0, A: float = 87.6, zero_offset: float = 0.0) ndarray¶
Decode from G.711 companded space back to original feature space.
- Parameters:
zero_offset (float, optional) – Must match the offset used in encoding. Subtracted after decoding.
- autots.tools.g7xx_codec.g711_encode(values: ndarray | DataFrame, mode: str = 'mu', mu: float = 255.0, A: float = 87.6, center: str = 'median', scale_method: str = 'mad', scale_factor: float = 3.0, min_scale: float = 1e-06, clip: bool = True, zero_offset: float = 0.0) tuple[ndarray, ndarray, ndarray]¶
Encode values via G.711-style companding after robust normalization.
Returns encoded array and the center/scale used so that decoding can recover the original feature space.
- Parameters:
zero_offset (float, optional) – Small constant added before transformation to handle exact zeros. Useful for sparse/intermittent data. Default 0.0 (no offset).
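A round-trip sketch of the companding pair; values are illustrative and the reconstruction is approximate because companding is lossy by design:

```python
import numpy as np
from autots.tools.g7xx_codec import g711_encode, g711_decode

values = np.random.randn(365, 4) * 10  # (observations, series)

encoded, center, scale = g711_encode(values, mode="mu", mu=255.0)
decoded = g711_decode(encoded, center=center, scale=scale, mode="mu", mu=255.0)

print(np.abs(decoded - values).max())  # small reconstruction error
```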
- autots.tools.g7xx_codec.g726_adpcm_filter(values: ndarray | DataFrame, quant_bits: int = 4, adaptation_rate: float = 0.96, prediction_alpha: float = 0.92, floor_step: float = 0.01, dynamic_range: float = 1.5, blend: float = 0.15, noise_gate: float = 0.0, quantizer: str = 'uniform', use_adaptive_predictor: bool = True, predictor_leak: float = 0.9999) ndarray¶
Apply the adaptive encode/decode cycle to an array of time series.
- Parameters:
values (array-like or DataFrame) – Shape (observations, series). Each column is processed independently, but operations stay vectorized across series.
quant_bits (int, default 4) – Number of bits used for the quantizer (only for uniform mode).
adaptation_rate (float, default 0.96) – Only used when use_adaptive_predictor=False (legacy EMA mode).
prediction_alpha (float, default 0.92) – Only used when use_adaptive_predictor=False (legacy EMA mode).
floor_step (float, default 0.01) – Initial scale factor (minimum allowed value).
dynamic_range (float, default 1.5) – Initial scale multiplier (legacy mode only).
blend (float, default 0.15) – Optional blending factor with the predictor baseline. Must be in [0, 1].
noise_gate (float, default 0.0) – Additional soft threshold applied to residual magnitudes.
quantizer (str, default "uniform") – Quantization method: “uniform” for simple rounding, “nonuniform” for G.726-style decision levels optimized for time series.
use_adaptive_predictor (bool, default True) – Use 2-pole/6-zero adaptive predictor. If False, falls back to EMA.
predictor_leak (float, default 0.9999) – Leakage factor for predictor coefficients to prevent unbounded growth.
- Returns:
Filtered data with the same shape as values.
- Return type:
numpy.ndarray
autots.tools.hierarchial module¶
- autots.tools.hierarchial.compute_volatility_weights(S: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0) ndarray¶
Compute volatility-based weights for preferential adjustment of high-volatility series.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series.
volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv” (coefficient of variation)
volatility_power (float) – Power to raise volatility weights (higher values increase preference for volatile series)
- Returns:
vol_weights – Volatility-weighted matrix where higher weights are placed on more volatile series.
- Return type:
np.ndarray, shape (L, L)
- autots.tools.hierarchial.erm_reconcile(S: ndarray, y_all: ndarray, W: ndarray) ndarray¶
- ERM (Error or Empirical Risk Minimization) reconciliation using robust numerical methods:
Solve weighted LS: min_{y_bottom} ||y_all - S y_bottom||_W^2 subject to hierarchical constraints.
- The closed-form solution for y_bottom^r:
y_bottom^r = (S' W S)^{-1} S' W y_all
=> y_all^r = S y_bottom^r = S (S' W S)^{-1} S' W y_all
- Parameters:
S (np.ndarray, shape (L, M)) – Hierarchy matrix. L = # total levels, M = # bottom series.
y_all (np.ndarray, shape (T, L)) – Forecasts for T time points, dimension L.
W (np.ndarray, shape (L, L)) – Weight (covariance) matrix for the Weighted LS objective.
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
- class autots.tools.hierarchial.hierarchial(grouping_method: str = 'tile', n_groups: int = 5, reconciliation: str = 'mean', grouping_ids: dict | None = None)¶
Bases:
object
Create hierarchial series, then reconcile.
Currently only performs one-level groupings.
- Parameters:
grouping_method (str) – method to create groups. 'User' requires hier_id input of groupings.
n_groups (int) – number of groups, if the above is not 'User'
reconciliation (str) – None, or 'mean' method to combine top and bottom forecasts
grouping_ids (dict) – dict of series_id: group_id to use if grouping is 'User'
- fit(df)¶
Construct and save object info.
- reconcile(df)¶
Apply to forecasted data containing bottom and top levels.
- transform(df)¶
Apply hierarchy to existing data with bottom levels only.
- autots.tools.hierarchial.iterative_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, max_iterations: int = 10, convergence_threshold: float = 1e-06, damping_factor: float = 0.7) ndarray¶
Iterative MinT reconciliation that gradually converges to an optimal solution.
This method applies MinT reconciliation iteratively, updating the weight matrix based on reconciliation residuals from previous iterations.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.
W (np.ndarray, shape (L, L)) – The initial covariance (or weighting) matrix.
max_iterations (int) – Maximum number of iterations.
convergence_threshold (float) – Convergence threshold for relative change in reconciled forecasts.
damping_factor (float) – Damping factor for weight matrix updates (0 < damping_factor < 1).
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
- autots.tools.hierarchial.iterative_volatility_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0, volatility_mix: float = 0.5, max_iterations: int = 10, convergence_threshold: float = 1e-06, damping_factor: float = 0.7) ndarray¶
Combined iterative and volatility-weighted MinT reconciliation.
This method combines both approaches: volatility-based weighting and iterative refinement.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.
W (np.ndarray, shape (L, L)) – The base covariance (or weighting) matrix.
cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series for volatility computation.
volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv”
volatility_power (float) – Power to raise volatility weights.
volatility_mix (float) – Mixing parameter between base weights and volatility weights.
max_iterations (int) – Maximum number of iterations.
convergence_threshold (float) – Convergence threshold for relative change in reconciled forecasts.
damping_factor (float) – Damping factor for weight matrix updates.
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
- autots.tools.hierarchial.ledoit_wolf_covariance(X: ndarray, assume_centered: bool = False) ndarray¶
Computes the Ledoit-Wolf shrunk covariance matrix of X.
- Parameters:
X (np.ndarray, shape (n_samples, n_features)) – The data matrix. Each row is an observation, each column is a variable.
assume_centered (bool) – If True, X is assumed to already be centered.
- Returns:
lw_cov – The Ledoit-Wolf shrunk covariance matrix estimate.
- Return type:
np.ndarray, shape (n_features, n_features)
Notes
This shrinks the sample covariance toward the identity matrix.
The shrinkage intensity gamma is determined from data per Ledoit & Wolf (2004).
- autots.tools.hierarchial.mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray) ndarray¶
- MinT reconciliation formula using robust numerical methods:
y_all^r = S (S' W^-1 S)^-1 S' W^-1 y_all
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix. L = number of hierarchical levels (top + middle + bottom), M = number of bottom-level series.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points (the second dimension L must match S.shape[0]).
W (np.ndarray, shape (L, L)) – The (regularized) covariance (or weighting) matrix for the hierarchical levels.
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels (top, middle, bottom).
- Return type:
np.ndarray, shape (T, L)
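A worked sketch on a tiny hierarchy (one total over two bottom series); with W as the identity this reduces to OLS reconciliation:

```python
import numpy as np
from autots.tools.hierarchial import mint_reconcile

# L=3 levels, M=2 bottom series
S = np.array([[1.0, 1.0],   # total = a + b
              [1.0, 0.0],   # a
              [0.0, 1.0]])  # b

# incoherent base forecasts for T=4 steps: total != a + b
y_all = np.array([[10.0, 4.0, 5.0],
                  [12.0, 5.0, 6.0],
                  [11.0, 4.5, 5.5],
                  [13.0, 6.0, 6.0]])

W = np.eye(3)
y_rec = mint_reconcile(S, y_all, W)
print(y_rec[:, 0] - y_rec[:, 1] - y_rec[:, 2])  # ~0: coherent after reconciliation
```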
- autots.tools.hierarchial.volatility_weighted_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0, volatility_mix: float = 0.5) ndarray¶
Volatility-weighted MinT reconciliation that preferentially adjusts high-volatility series.
The method combines traditional MinT with volatility-based weighting: W_vol = (1 - α) * W + α * V where V is the volatility-based weight matrix and α is the mixing parameter.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.
W (np.ndarray, shape (L, L)) – The base covariance (or weighting) matrix.
cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series for volatility computation.
volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv”
volatility_power (float) – Power to raise volatility weights.
volatility_mix (float) – Mixing parameter between base weights (0) and volatility weights (1).
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
autots.tools.holiday module¶
Manage holiday features.
- autots.tools.holiday.holiday_flag(DTindex, country: str = 'US', encode_holiday_type: bool = False, holidays_subdiv=None)¶
Create a 0/1 flag for given datetime index. Includes fallback to pandas for US holidays if holidays package unavailable.
- Parameters:
DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates to create flags for
country (str) – country to pass through to the holidays python package; also accepts a list of countries, but not a list of subdivisions
encode_holiday_type (bool) – if True, each holiday gets a unique integer column; if False, 0/1 for all holidays
holidays_subdiv (str) – subdivision (i.e., state), if used
- Returns:
pd.DataFrame with DatetimeIndex
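A minimal sketch:

```python
import pandas as pd
from autots.tools.holiday import holiday_flag

DTindex = pd.date_range("2023-01-01", "2023-12-31", freq="D")
flags = holiday_flag(DTindex, country="US")  # 0/1 column(s) indexed by date
typed = holiday_flag(DTindex, country="US", encode_holiday_type=True)
```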
- autots.tools.holiday.query_holidays(DTindex, country: str, encode_holiday_type: bool = False, holidays_subdiv=None)¶
Query holidays package for dates.
- Parameters:
DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates to create flags for
country (str) – country to pass through to the holidays python package
encode_holiday_type (bool) – if True, each holiday gets a unique integer column; if False, 0/1 for all holidays
autots.tools.impute module¶
Fill NA.
- autots.tools.impute.FillNA(df, method: str = 'ffill', window: int = 10)¶
Fill NA values using different methods.
- Parameters:
method (str) – one of:
'ffill' - fill the most recent non-NA value forward until another non-NA value is reached
'zero' - fill with zero; useful for sales and other data where NA usually does mean $0
'mean' - fill all missing values with the series' overall average value
'median' - fill all missing values with the series' overall median value
'rolling mean' - fill with the mean of the last n (window) values
'ffill mean biased' - simple average of ffill and mean
'fake date' - shift data forward over NaNs, so values will have incorrect timestamps
'seasonal_linear' - seasonally-aware linear regression imputation using datetime and local features
'seasonal_linear_window_3' - seasonal linear with window=3
'seasonal_linear_window_10' - seasonal linear with window=10
also most method values of pd.DataFrame.interpolate()
window (int) – length of rolling windows for filling na, for rolling methods
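A minimal sketch of the two styles of fill:

```python
import numpy as np
import pandas as pd
from autots.tools.impute import FillNA

idx = pd.date_range("2022-01-01", periods=100, freq="D")
df = pd.DataFrame(np.random.randn(100, 2), index=idx, columns=["a", "b"])
df.iloc[10:15, 0] = np.nan

filled = FillNA(df, method="ffill")
filled_seasonal = FillNA(df, method="seasonal_linear", window=10)
```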
- class autots.tools.impute.SeasonalityMotifImputer(k: int = 3, datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False)¶
Bases:
object
- impute(df)¶
Infer missing values on input df.
- class autots.tools.impute.SimpleSeasonalityMotifImputer(datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False, max_iter: int = 100)¶
Bases:
object
- impute(df)¶
Infer missing values on input df.
- autots.tools.impute.biased_ffill(df, mean_weight: float = 1)¶
Fill NaN with average of last value and mean.
- autots.tools.impute.fake_date_fill(df, back_method: str = 'slice')¶
Numpy vectorized version. Return a dataframe where na values are removed and values shifted forward.
Warning
Thus, values will have incorrect timestamps!
- Parameters:
back_method (str) – how to deal with tails left by shifting NaN - ‘bfill’ -back fill the last value - ‘slice’ - drop any rows above threshold where half are nan, then bfill remainder - ‘slice_all’ - drop any rows with any na - ‘keepna’ - keep the lagging na
- autots.tools.impute.fake_date_fill_old(df, back_method: str = 'slice')¶
Return a dataframe where na values are removed and values shifted forward.
Warning
Thus, values will have incorrect timestamps!
- Parameters:
back_method (str) – how to deal with tails left by shifting NaN - ‘bfill’ -back fill the last value - ‘slice’ - drop any rows above threshold where half are nan, then bfill remainder - ‘slice_all’ - drop any rows with any na - ‘keepna’ - keep the lagging na
- autots.tools.impute.fill_forward(df)¶
Fill NaN with previous values.
- autots.tools.impute.fill_forward_alt(df)¶
Fill NaN with previous values.
- autots.tools.impute.fill_mean(df)¶
- autots.tools.impute.fill_mean_old(df)¶
Fill NaN with mean.
- autots.tools.impute.fill_median(df)¶
Fill nan with median values. Does not work with non-numeric types.
- autots.tools.impute.fill_median_old(df)¶
Fill NaN with median.
- autots.tools.impute.fill_one(df)¶
Fill NaN with one.
- autots.tools.impute.fill_zero(df)¶
Fill NaN with zero.
- autots.tools.impute.fillna_np(array, values)¶
- autots.tools.impute.rolling_mean(df, window: int = 10)¶
Fill NaN with mean of last window values.
- autots.tools.impute.seasonal_linear_imputer(df, datepart_method: str = 'common_fourier', window: int = 5, lambda_: float = 0.01)¶
Seasonally-aware linear interpolation using multioutput linear regression.
This method creates a feature set consisting of local linear trend features (changepoints) and datetime features (seasonal patterns). It trains a ridge regression model on all non-missing points, then predicts on all missing points to fill them.
This is fully vectorized for speed on large datasets and uses matrix operations across the entire dataframe. Unlike BasicLinearModel, this handles missing data by training only on non-NaN rows.
- Parameters:
df (pd.DataFrame) – DataFrame with datetime index and potential missing values
datepart_method (str) – Method for generating seasonal features. Default is ‘common_fourier’
window (int) – Controls changepoint spacing for local linear trend features. Default is 5
lambda_ (float) – Ridge regression regularization parameter. Default is 0.01
- Returns:
DataFrame with missing values filled
- Return type:
pd.DataFrame
Example
>>> df_filled = seasonal_linear_imputer(df, window=7)
>>> # or via FillNA:
>>> df_filled = FillNA(df, method='seasonal_linear', window=10)
autots.tools.kalman module¶
Created on Wed Feb 5 16:29:58 2025
- autots.tools.kalman.kalman_fusion_forecasts(F: ndarray, index, columns, coverage: float = 0.9, method: str = 'multi_series', Q_init=0.1, R_init=1.0, adapt_Q: str | None = None, adapt_R: str | None = None, initial_x=None, initial_P=None, min_std=1e-15, scale: bool = True, a: float = 1.0)¶
Fuse multiple forecasts using a Kalman Filter for each forecast step.
- Parameters:
F (np.ndarray) – Forecasts of shape (n, S, T) - n = number of models - S = forecast length (time steps) - T = number of time series
index (pd.Index) – Index for the returned DataFrames (length=S).
columns (list-like) – Column names for the returned DataFrames (length=T).
coverage (float) – e.g. 0.90 => ~1.645 standard deviations for intervals.
method (str) – Either “multi_series” or “per_series”. - “multi_series”: single (T-dim) Kalman Filter for all T series together - “per_series”: run T separate 1D Kalman Filters, one per series
Q_init (float or np.ndarray) – Base process noise (or matrix).
R_init (float or np.ndarray) – Base measurement noise (or matrix).
adapt_Q (str) – If “spread”, adapt Q each step based on model spread.
adapt_R (str) – If “spread”, adapt R each step based on model spread.
initial_x (np.ndarray) – initial state guess. - For multi_series: shape (T,) - For per_series: shape (T,)
initial_P (np.ndarray) – initial covariance guess. - For multi_series: shape (T, T) - For per_series: shape (T,)
min_std (float) – Small floor value for numerical stability in standard deviations.
scale (bool) – If True, each series is scaled (by its standard deviation) before filtering. The fused forecasts are converted back to the original scale.
- Returns:
df_point (pd.DataFrame) – Kalman-fused point forecasts, shape (S, T)
df_lower (pd.DataFrame) – lower bound, shape (S, T)
df_upper (pd.DataFrame) – upper bound, shape (S, T)
- Return type:
tuple
Notes
In “multi_series” mode the state is a T-dimensional vector that is updated with an (n*T)-dimensional measurement (the stacked forecasts from the n models).
In “per_series” mode the algorithm runs T separate 1D Kalman Filters (one per series). In that case, for each forecast step (loop over S) you update each of the T filters independently.
When scale=True, each series is divided by its standard deviation (computed over all forecasts) before filtering; after filtering the outputs are multiplied by the same scale so that the returned forecasts are in the original feature space.
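A minimal sketch fusing three model forecasts; all values are illustrative:

```python
import numpy as np
import pandas as pd
from autots.tools.kalman import kalman_fusion_forecasts

n, S, T = 3, 14, 2  # 3 models, 14-step horizon, 2 series
F = np.random.randn(n, S, T).cumsum(axis=1)  # stacked model forecasts

index = pd.date_range("2024-01-01", periods=S, freq="D")
columns = ["series_a", "series_b"]

df_point, df_lower, df_upper = kalman_fusion_forecasts(
    F, index=index, columns=columns, coverage=0.9, method="per_series"
)
```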
- autots.tools.kalman.robust_pinv(M, rcond=1e-15, reg=1e-08)¶
autots.tools.lunar module¶
Phases of the moon. Modified from https://stackoverflow.com/a/2531541/9492254 by keturn and earlier from John Walker
- autots.tools.lunar.dcos(d)¶
- autots.tools.lunar.dsin(d)¶
- autots.tools.lunar.fixangle(a)¶
- autots.tools.lunar.kepler(m, ecc=0.016718)¶
Solve the equation of Kepler.
- autots.tools.lunar.moon_phase(datetime_index, epsilon=1e-06, epoch=2444237.905, ecliptic_longitude_epoch=278.83354, ecliptic_longitude_perigee=282.596403, eccentricity=0.016718, moon_mean_longitude_epoch=64.975464, moon_mean_perigee_epoch=349.383063)¶
Numpy version. Takes a pd.DatetimeIndex and returns moon phase (% illuminated). The epoch can be adjusted slightly (0.5 = half day) to account for time zones. The default is for the US; epoch=2444238.5 works for Asia generally.
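For example, a sketch of computing illumination for a month of dates (the return is assumed to be array-like; the exact 0-1 vs 0-100 scale is not specified above):

import pandas as pd
from autots.tools.lunar import moon_phase

dates = pd.date_range("2024-01-01", "2024-02-01", freq="D")
illumination = moon_phase(dates)  # % illuminated per date, per the docstring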
- autots.tools.lunar.moon_phase_alternative(datetime_index, epoch=2451545.0)¶
Numpy version. Takes a pd.DatetimeIndex and returns moon phase (%illuminated).
- autots.tools.lunar.moon_phase_df(datetime_index, epoch=2444237.905)¶
Convert pandas DatetimeIndex to moon phases. Note timezone and hour can matter slightly. The epoch can be adjusted slightly (0.5 = half day) to account for time zones. 2444237.905 is for US Central; epoch=2444238.5 works for Asia generally.
- autots.tools.lunar.moon_phase_df_alternative(datetime_index, epoch=None)¶
Convert pandas DatetimeIndex to moon phases. Note timezone and hour can matter slightly.
- Parameters:
datetime_index – pandas DatetimeIndex
epoch – If provided, uses the approximate (legacy) method for backward compatibility. If None, uses the new more accurate Jean Meeus method.
- autots.tools.lunar.phase_string(p, precision=0.05, new=0.0, first=0.25, full=0.4, last=0.75, nextnew=1.0)¶
- autots.tools.lunar.todeg(r)¶
- autots.tools.lunar.torad(d)¶
autots.tools.mocks module¶
Fake/local imports for when libraries aren’t available, to prevent the whole package from failing to load.
- class autots.tools.mocks.DataLoader(dataset, *args, **kwargs)¶
Bases:
object
- class autots.tools.mocks.Dataset(*args, **kwargs)¶
Bases:
object
- class autots.tools.mocks.MinMaxScaler(feature_range=(0, 1))¶
Bases:
object
- fit(X)¶
- fit_transform(X)¶
- transform(X)¶
- class autots.tools.mocks.Module(*args, **kwargs)¶
Bases:
object
- eval()¶
- forward(*args, **kwargs)¶
- load_state_dict(state_dict, strict=True)¶
- state_dict()¶
- train(mode=True)¶
- class autots.tools.mocks.StandardScaler¶
Bases:
object
StandardScaler that works with both numpy arrays and pandas DataFrames. Compatible with sklearn.preprocessing.StandardScaler interface.
- fit(X)¶
Compute the mean and standard deviation for each feature.
- fit_transform(X)¶
Fit the scaler and transform the dataset.
- inverse_transform(X)¶
Revert the scaled data back to the original scale.
- transform(X)¶
Scale the dataset using the stored mean and standard deviation.
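A minimal round-trip sketch, assuming the sklearn-compatible interface described above (the mock is intended as a drop-in when scikit-learn is unavailable):

import numpy as np
from autots.tools.mocks import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per column
X_restored = scaler.inverse_transform(X_scaled)  # back to original values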
- class autots.tools.mocks.TensorDataset(*tensors)¶
Bases:
object
- autots.tools.mocks.butter(x)¶
- autots.tools.mocks.curve_fit(x)¶
- autots.tools.mocks.fftconvolve(x)¶
- class autots.tools.mocks.norm(loc=0, scale=1)¶
Bases:
object
- cdf(x)¶
Cumulative distribution function using error function.
- ppf(q)¶
Simple normal quantile approximation using Abramowitz & Stegun rational approximation.
- autots.tools.mocks.savgol_filter(x)¶
- autots.tools.mocks.sosfiltfilt(x)¶
- autots.tools.mocks.tqdm(x, **kwargs)¶
autots.tools.percentile module¶
Faster percentile and quantile for numpy
Entirely from: https://krstn.eu/np.nanpercentile()-there-has-to-be-a-faster-way/
- autots.tools.percentile.nan_percentile(in_arr, q, method='linear', axis=0, errors='raise')¶
Given a 3D array, return the percentiles specified by q. Beware this is only tested for the limited case required here and will not fully match np.nanpercentile; the supported args are more limited. If errors=”rollover”, it passes through to np.nanpercentile where args are not supported.
- autots.tools.percentile.nan_quantile(arr, q, method='linear', axis=0, errors='raise')¶
Same as nan_percentile but accepts q in the range [0, 1]. The supported args are more limited. If errors=”rollover”, it passes through to np.nanpercentile where args are not supported.
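A hedged sketch of nan_quantile on a 3D array, matching the limited tested case described above (random data is illustrative only):

import numpy as np
from autots.tools.percentile import nan_quantile

arr = np.random.rand(100, 5, 3)
arr[::7] = np.nan  # inject some missing rows
upper = nan_quantile(arr, q=0.95, axis=0)  # expected shape: (5, 3)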
- autots.tools.percentile.trimmed_mean(data, percent, axis=0)¶
autots.tools.plotting module¶
Shared plotting utilities for feature-rich time series.
- autots.tools.plotting.calculate_peak_density(model: str, data: DataFrame, group_col: str = 'Model', y_col: str = 'TotalRuntimeSeconds')¶
Maximum KDE value for the given model’s distribution.
- autots.tools.plotting.create_seaborn_palette_from_cmap(cmap_name: str = 'gist_rainbow', n: int = 10)¶
Return seaborn palette sampling the given matplotlib cmap.
- autots.tools.plotting.plot_distributions(runtimes_data: DataFrame, group_col: str = 'Model', y_col: str = 'TotalRuntimeSeconds', xlim: float | None = None, xlim_right: float | None = None, title_suffix: str = '')¶
Plot runtime density per group with custom palette.
- autots.tools.plotting.plot_feature_panels(series_name: str, date_index: DatetimeIndex, series_data: Series | ndarray, components: Mapping[str, Iterable[float]], labels: Mapping[str, Iterable], series_type_description: str | None = None, scale: float | None = None, noise_to_signal: float | None = None, figsize=(16, 12), title_prefix: str = 'Feature Analysis', save_path: str | None = None, show: bool = True)¶
Create a four-panel diagnostic plot shared by generator and detector.
- autots.tools.plotting.plot_forecast_with_intervals(plot_df: DataFrame, actual_col: str | None = 'actuals', forecast_col: str = 'forecast', lower_col: str = 'low_forecast', upper_col: str = 'up_forecast', title: str | None = None, colors: Mapping[str, str] | None = None, include_bounds: bool = True, alpha: float = 0.3, band_color: str | None = None, interval_label: str | None = 'Prediction Interval', band_kwargs: Mapping[str, Any] | None = None, plot_lines: bool = True, ax=None, **plot_kwargs)¶
Plot forecast (and optionally actuals) with confidence bounds.
- autots.tools.plotting.plot_risk_score_bar(risk_data: Series | ndarray, index=None, bar_color: str = '#6495ED', bar_ylim: tuple | list | None = None, title: str = 'Risk Score', ylabel: str = 'Risk', xlabel: str = 'Forecast Horizon', ax=None, **bar_kwargs)¶
Plot risk scores as a bar chart.
Utility function for plotting event risk or similar probability scores.
- Parameters:
risk_data – Series or array of risk scores to plot
index – x-axis values; if None, uses range or Series index
bar_color – color for bars
bar_ylim – y-axis limits as (min, max) or [min, max]
title – chart title
ylabel – y-axis label
xlabel – x-axis label
ax – matplotlib axis to plot on; if None, creates new subplot
**bar_kwargs – additional arguments passed to ax.bar()
- Returns:
matplotlib axis
- autots.tools.plotting.plot_simulation_paths(simulations: ndarray, index=None, colors: list | None = None, alpha: float = 0.9, linewidth: float = 1.2, ax=None, **plot_kwargs)¶
Plot multiple simulation/forecast paths.
Utility for plotting Monte Carlo simulations, motif neighbors, or ensemble members.
- Parameters:
simulations – 2D array of shape (n_simulations, n_timesteps)
index – x-axis values; if None, uses range
colors – list of color strings for each path; if None, uses random grays
alpha – transparency for lines
linewidth – width of lines
ax – matplotlib axis to plot on; if None, creates new subplot
**plot_kwargs – additional arguments passed to ax.plot()
- Returns:
matplotlib axis
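For instance, a sketch assuming matplotlib is installed (the returned object is the matplotlib axis, per the Returns section above):

import numpy as np
from autots.tools.plotting import plot_simulation_paths

sims = np.cumsum(np.random.randn(25, 60), axis=1)  # 25 paths, 60 timesteps
ax = plot_simulation_paths(sims, alpha=0.5)
ax.set_title("Simulated forecast paths")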
autots.tools.probabilistic module¶
Point to Probabilistic
- autots.tools.probabilistic.Point_to_Probability(train, forecast, prediction_interval=0.9, method: str = 'historic_quantile')¶
Data driven placeholder for model error estimation.
Catlin Point to Probability method (‘a mixture of dark magic and gum disease’)
- Parameters:
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
prediction_interval (float) – confidence or perhaps credible interval
method (str) – spell to cast to create dark magic. ‘historic_quantile’, ‘inferred_normal’, ‘variable_pct_change’ gum disease available separately upon request.
- Returns:
upper_error, lower_error (two pandas.DataFrames for upper and lower bound respectively)
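A minimal sketch, assuming the (upper_error, lower_error) return order given above; the random train/forecast frames are purely illustrative:

import numpy as np
import pandas as pd
from autots.tools.probabilistic import Point_to_Probability

idx = pd.date_range("2024-01-01", periods=100, freq="D")
train = pd.DataFrame({"a": np.random.rand(100)}, index=idx)
future_idx = pd.date_range("2024-04-10", periods=14, freq="D")
forecast = pd.DataFrame({"a": np.random.rand(14)}, index=future_idx)
upper, lower = Point_to_Probability(train, forecast, prediction_interval=0.9)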
- autots.tools.probabilistic.Variable_Point_to_Probability(train, forecast, alpha=0.3, beta=1)¶
Data driven placeholder for model error estimation.
ErrorRange = beta * (En + alpha * En-1 [cum sum of En])
En = abs(0.5 - QTP) * D
D = abs(Xn - ((Avg % Change of Train * Xn-1) + Xn-1))
Xn = Forecast Value
QTP = Percentile of Score in All Percent Changes of Train
Score = Percent Change (from Xn-1 to Xn)
- Parameters:
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
alpha (float) – parameter which affects the broadening of the error range over time. Usually 0 < alpha < 1 (although it can be larger than 1).
beta (float) – parameter which affects the general width of the error bar. Usually 0 < beta < 1 (although it can be larger than 1).
- Returns:
error width for each value of forecast.
- Return type:
ErrorRange (pandas.DataFrame)
- autots.tools.probabilistic.historic_quantile(df_train, prediction_interval: float = 0.9, nan_flag=None)¶
Computes the difference between the median and the prediction interval range in historic data.
- Parameters:
df_train (pd.DataFrame) – a dataframe of training data
prediction_interval (float) – the desired forecast interval range
- Returns:
two 1D arrays
- Return type:
lower, upper (np.array)
- autots.tools.probabilistic.inferred_normal(train, forecast, n: int = 5, prediction_interval: float = 0.9)¶
A corruption of Bayes' theorem. It will be sensitive to the transformations of the data.
- autots.tools.probabilistic.percentileofscore_appliable(x, a, kind='rank')¶
autots.tools.profile module¶
autots.tools.regressor module¶
autots.tools.seasonal module¶
seasonal
@author: Colin
- autots.tools.seasonal.anchored_segment_fourier_features(DTindex, method: str)¶
- autots.tools.seasonal.anchored_warped_fourier_features(DTindex, method: str)¶
- autots.tools.seasonal.create_datepart_components(DTindex, seasonality)¶
Single date part one-hot flags.
- autots.tools.seasonal.create_seasonality_feature(DTindex, t, seasonality, history_days=None)¶
Cassandra-designed feature generator.
- autots.tools.seasonal.date_part(DTindex, method: str = 'simple', set_index: bool = True, polynomial_degree: int | None = None, holiday_country: str | None = None, holiday_countries_used: bool = True, lags: int | None = None, forward_lags: int | None = None)¶
Create date part columns from pd.DatetimeIndex.
If your date_part method isn't recognized, you will get [‘year’, ‘month’, ‘day’, ‘weekday’] output.
- Parameters:
DTindex (pd.DatetimeIndex) – datetime index to provide dates
method (str) – expanded, recurring, or simple:
simple - just day, year, month, weekday
expanded - all available features
recurring - all features that should commonly repeat without aging
also available: simple_2, simple_3, simple_binarized, expanded_binarized, common_fourier
set_index (bool) – if True, return DTindex as index of df
polynomial_degree (int) – add this degree of sklearn polynomial features if not None
holiday_country (list or str) – names of countries to pull calendar holidays for
holiday_countries_used (bool) – to use holiday_country if given
lags (int) – if not None, include the past N previous index date parts
forward_lags (int) – if not None, include the future N index date parts
- Returns:
pd.DataFrame with DTindex
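For instance, a sketch using one of the documented method names:

import pandas as pd
from autots.tools.seasonal import date_part

idx = pd.date_range("2024-01-01", periods=30, freq="D")
features = date_part(idx, method="simple_binarized", set_index=True)
# features is a pd.DataFrame of date part columns indexed by idx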
- autots.tools.seasonal.fourier_df(DTindex, seasonality, order=10, t=None, history_days=None)¶
- autots.tools.seasonal.fourier_series(t, p=365.25, n=10)¶
- autots.tools.seasonal.random_datepart(method='random')¶
New random parameters for seasonality.
- autots.tools.seasonal.seasonal_independent_match(DTindex, DTindex_future, k, datepart_method='simple_binarized', distance_metric='canberra', full_sort=False, nan_array=None)¶
- autots.tools.seasonal.seasonal_int(include_one: bool = False, small=False, very_small=False)¶
Generate a random integer of typical seasonalities.
- Parameters:
include_one (bool) – whether to include 1 in output options
small (bool) – if True, keep below 364
very_small (bool) – if True keep below 30
- autots.tools.seasonal.seasonal_repeating_wavelet(DTindex, p, order=12, sigma=4.0, wavelet_type='morlet')¶
- autots.tools.seasonal.seasonal_window_match(DTindex, k, window_size=10, forecast_length=None, datepart_method='common_fourier_rw', distance_metric='mae', full_sort=False)¶
autots.tools.shaping module¶
Reshape data.
- class autots.tools.shaping.NumericTransformer(na_strings: list = ['', ' '], categorical_fillna: str = 'ffill', handle_unknown: str = 'use_encoded_value', downcast: str | None = None, verbose: int = 0)¶
Bases:
object
General purpose numeric conversion for pandas dataframes.
All categorical data and levels must be passed to .fit(). If new categorical series or levels are present in .transform() it won’t work!
Currently datetimes cannot be inverse_transformed back to datetime.
- Parameters:
na_strings (list) – list of strings to replace as pd.NA
categorical_fillna (str) – how to fill NaN for categorical variables (numeric NaN are unaltered). “ffill” - uses forward and backward filling to supply na values. “indicator” or anything else - currently results in all missing replaced with str “missing_value”
handle_unknown (str) – passed through to scikit-learn OrdinalEncoder
downcast (str) – passed to pd.to_numeric, use None or ‘float’
verbose (int) – greater than 0 to print some messages
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- inverse_transform(df, convert_dtypes: bool = False)¶
Convert numeric back to categorical.
- Parameters:
df (pandas.DataFrame) – df
convert_dtypes (bool) – whether to use pd.convert_dtypes after inverse
- transform(df)¶
Convert categorical dataset to numeric.
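A hedged round-trip sketch on mixed categorical/numeric data (the sample frame is hypothetical):

import pandas as pd
from autots.tools.shaping import NumericTransformer

df = pd.DataFrame({
    "sales": [1.0, 2.0, None, 4.0],
    "region": ["north", "south", "south", "north"],  # categorical column
})
transformer = NumericTransformer(categorical_fillna="ffill")
numeric_df = transformer.fit_transform(df)  # all columns now numeric
restored = transformer.inverse_transform(numeric_df)  # categories restored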
- autots.tools.shaping.clean_weights(weights, series, verbose=0)¶
Polish up series weighting information.
- Parameters:
weights (dict) – dictionary of series_id: weight (float or int)
series (iterable) – list of series_ids in the dataset
- autots.tools.shaping.df_cleanup(df_wide, frequency: str = 'infer', prefill_na: str | None = None, na_tolerance: float = 0.999, drop_data_older_than_periods: int = 100000, drop_most_recent: int = 0, aggfunc: str = 'first', verbose: int = 1)¶
Pass cleaning functions through to dataframe.
- Parameters:
df_wide (pd.DataFrame) – input dataframe to clean.
frequency (str, optional) – frequency in string of alias for DateOffset object, normally “1D” for daily, “MS” for month start, etc. Aliases are listed here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html. Defaults to “infer”.
na_tolerance (float, optional) – allow up to this fraction of values to be NaN, else drop the entire series; e.g. 0.95 means a series can be 95% NaN values and still be included. Defaults to 0.999.
drop_data_older_than_periods (int, optional) – cut off older data because eventually you just get too much. Defaults to 100000.
drop_most_recent (int, optional) – number of most recent data points to remove. Useful if you pull monthly data before month end, and you don’t want an incomplete month appearing complete. Defaults to 0.
aggfunc (str, optional) – passed to pd.pivot_table, determines how to aggregate duplicates for upsampling. Other options include “mean” and other numpy functions, beware data must already be input as numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended. Defaults to ‘first’.
verbose (int, optional) – 0 for silence, higher values for more noise. Defaults to 1.
- Returns:
original dataframe, now possibly shorter.
- Return type:
pd.DataFrame
- autots.tools.shaping.drop_leading_na_block(df: DataFrame, threshold: float = 0.75, window: int = 7, max_drop_fraction: float = 0.5, warn: bool = True) DataFrame¶
- Drop an initial leading block of rows if:
Those rows are ‘bad’ (row NaN proportion > threshold), AND
There are at least window consecutive bad rows starting from the top.
Will warn or raise if more than max_drop_fraction of rows would be dropped.
- Parameters:
df – pandas DataFrame with a DatetimeIndex.
threshold – Proportion of NaNs to classify a row as bad.
window – Number of consecutive leading bad rows required.
max_drop_fraction – Maximum allowed fraction of rows to drop.
warn – If True, raises a UserWarning on excessive drop; if False, proceeds with printed warning.
- Returns:
Trimmed DataFrame (or identical DataFrame if no qualifying block found).
- autots.tools.shaping.freq_to_timedelta(freq)¶
Working around pandas limitations.
- autots.tools.shaping.infer_frequency(df_wide, warn=True, **kwargs)¶
Infer the frequency in a slightly more robust way.
- Parameters:
df_wide (pd.Dataframe or pd.DatetimeIndex) – input to pull frequency from
warn (bool) – unused, here to make swappable with pd.infer_freq
- autots.tools.shaping.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', aggfunc: str = 'first')¶
Take long data and convert into wide, cleaner data.
- Parameters:
df (pd.DataFrame) –
date_col (str) –
value_col (str) –
the name of the column with the values of the time series (ie sales $)
id_col (str) –
name of the id column, unique for each time series
aggfunc (str) –
passed to pd.pivot_table, determines how to aggregate duplicates for series_id and datetime
other options include “mean” and other numpy functions, beware data must already be input as numeric type for these to work. if categorical data is provided, aggfunc=’first’ is recommended
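For example, a sketch using the documented default column names (the sample frame is hypothetical):

import pandas as pd
from autots.tools.shaping import long_to_wide

long_df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]
    ),
    "series_id": ["a", "b", "a", "b"],
    "value": [1.0, 10.0, 2.0, 20.0],
})
wide_df = long_to_wide(long_df)  # dates as index, one column per series_id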
- autots.tools.shaping.simple_train_test_split(df, forecast_length: int = 10, min_allowed_train_percent: float = 0.3, verbose: int = 1)¶
Uses the last forecast_length periods as the test set and the rest as train.
- Parameters:
forecast_length (int) – number of future periods to predict
min_allowed_train_percent (float) – forecast length cannot be greater than 1 - this; this constrains the forecast length from being much larger than the training data (note this includes NaNs in the current configuration)
- Returns:
train, test (both pd DataFrames)
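A minimal sketch of the split (the frame is hypothetical):

import numpy as np
import pandas as pd
from autots.tools.shaping import simple_train_test_split

idx = pd.date_range("2024-01-01", periods=100, freq="D")
df = pd.DataFrame({"a": np.arange(100.0)}, index=idx)
train, test = simple_train_test_split(df, forecast_length=10)
# test holds the final 10 periods; train holds the first 90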
- autots.tools.shaping.split_digits_and_non_digits(s)¶
- autots.tools.shaping.subset_series(df, weights, n: int = 1000, random_state: int = 2020)¶
Return a sample of time series.
- Parameters:
df (pd.DataFrame) – wide df with series as columns and DT index
n (int) – number of unique time series to keep, or None
random_state (int) – random seed
- autots.tools.shaping.wide_to_3d(wide_arr, seasonality=7, output_shape='gst')¶
Generates 3d (groups/seasonality, series, time steps) from wide (time step, series) numpy array.
- Parameters:
wide_arr (np.array) – wide style (timesteps, series) numpy time series
seasonality (int) – seasonality of the series to use, avoid really large values
output_shape (str) – either ‘gst’ or ‘sgt’, the output shape: gst = (groups/seasonality, series, time steps); sgt = (series, groups/seasonality, time steps)
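An illustrative sketch; using 28 rows (a multiple of the seasonality) sidesteps any undocumented handling of partial groups:

import numpy as np
from autots.tools.shaping import wide_to_3d

wide = np.random.rand(28, 3)  # 28 timesteps, 3 series
arr3d = wide_to_3d(wide, seasonality=7, output_shape="gst")
# expected shape roughly (7, 3, 4): groups, series, steps per group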
autots.tools.thresholding module¶
Created on Thu Jul 7 10:27:46 2022
@author: Colin
- class autots.tools.thresholding.NonparametricThreshold(data, warmup_pts: int = 1, p=0.1, error_buffer=1, z_init=2.5, z_limit=12.0, z_step=0.5, max_contamination=0.25, mean_weight: float = 10, sd_weight: float = 10, anomaly_count_weight: float = 1, inverse: bool = False)¶
Bases:
object
- compare_to_epsilon(inverse=False)¶
Compare smoothed error values to epsilon (error threshold) and group consecutive errors together into sequences.
- Parameters:
errors_all (obj) – Errors class object containing list of all previously identified anomalies in test set
- find_epsilon(inverse=False)¶
Find the anomaly threshold that maximizes a function representing the tradeoff between:
a) the number of anomalies and anomalous ranges
b) the reduction in mean and st dev if anomalous points are removed from errors
(see https://arxiv.org/pdf/1802.04431.pdf)
- Parameters:
inverse (bool) – If true, epsilon is calculated for inverted errors
- prune_anoms(inverse=False)¶
Remove anomalies that don’t meet the minimum separation from the next closest anomaly or error value.
- Parameters:
inverse (bool) – If true, epsilon is calculated for inverted errors
- score_anomalies()¶
Calculate anomaly scores based on max distance from epsilon for each anomalous sequence.
- autots.tools.thresholding.consecutive_groups(iterable, ordering=<function <lambda>>)¶
Yield groups of consecutive items using itertools.groupby(). From the more_itertools package; see the description there for details (circa mid 2022).
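For example, matching the more_itertools behavior referenced above (each yielded group is an iterator, hence the list() calls):

from autots.tools.thresholding import consecutive_groups

runs = [list(g) for g in consecutive_groups([1, 2, 3, 7, 8, 12])]
# runs == [[1, 2, 3], [7, 8], [12]]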
- autots.tools.thresholding.nonparametric(series, method_params)¶
autots.tools.transform module¶
autots.tools.wavelet module¶
- autots.tools.wavelet.continuous_db2_wavelet(t, p, order, sigma)¶
- autots.tools.wavelet.create_daubechies_db2_wavelet(p)¶
Create a Daubechies db2 wavelet with specified period p.
Parameters: - p (int): The period or length to generate the wavelet.
Returns: - np.ndarray: The generated Daubechies db2 wavelet.
- autots.tools.wavelet.create_gaussian_wavelet(p, frequency=3, sigma=1.0)¶
Create a Gaussian-modulated cosine wavelet with specified frequency and sigma.
Parameters: - p (float): The period or length to generate the wavelet. - frequency (int): Frequency of the cosine wave. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated Gaussian-modulated wavelet.
- autots.tools.wavelet.create_haar_wavelet(p)¶
Create a Haar wavelet with specified period p.
Parameters: - p (float): The period or length to generate the wavelet.
Returns: - np.ndarray: The generated Haar wavelet.
- autots.tools.wavelet.create_mexican_hat_wavelet(p, frequency=None, sigma=1.0)¶
Create a Mexican Hat wavelet (Ricker wavelet) with specified sigma.
Parameters: - p (float): The period or length to generate the wavelet. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated Mexican Hat wavelet.
- autots.tools.wavelet.create_morlet_wavelet(p, frequency=3, sigma=1.0)¶
Create a Morlet wavelet with specified frequency and sigma.
Parameters: - p (float): The period or length to generate the wavelet. - frequency (int): Frequency of the cosine wave. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated complex Morlet wavelet.
- autots.tools.wavelet.create_narrowing_wavelets(p, max_order, t, sigma=0.5)¶
- autots.tools.wavelet.create_real_morlet_wavelet(p, frequency=3, sigma=1.0)¶
Create a real-valued Morlet wavelet with specified frequency and sigma.
Parameters: - p (float): The period or length to generate the wavelet. - frequency (int): Frequency of the cosine wave. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated real Morlet wavelet.
- autots.tools.wavelet.create_wavelet(t, p, sigma=1.0, phase_shift=0, wavelet_type='morlet')¶
Create a real-valued wavelet based on real-world anchored time steps in t, with an additional phase shift and a choice of wavelet type.
Parameters: - t (np.ndarray): Array of time steps (in days) from a specified origin. - p (float): The period of the wavelet in the same units as t (typically days). - sigma (float): Standard deviation for the Gaussian envelope. - phase_shift (float): Phase shift to adjust the position of the wavelet peak. - wavelet_type (str): Type of wavelet (‘morlet’ or ‘ricker’).
Returns: - np.ndarray: The generated wavelet values for each time step.
- autots.tools.wavelet.offset_wavelet(p, t, order=5, sigma=1.0, wavelet_type='morlet')¶
Create an offset collection of wavelets with order offsets, ensuring that peaks are spaced p/order apart.
Parameters: - p (float): Period of the wavelet in the same units as t (typically days). - t (np.ndarray): Array of time steps. - order (int): The number of offsets. - sigma (float): Standard deviation for the Gaussian envelope. - wavelet_type (str): Type of wavelet (‘morlet’ or ‘ricker’).
Returns: - np.ndarray: A 2D array with order wavelets along axis 1.
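A hedged sketch of building weekly seasonal features from offset wavelets (parameter values are illustrative):

import numpy as np
from autots.tools.wavelet import offset_wavelet

t = np.arange(365)  # time steps in days from an arbitrary origin
weekly = offset_wavelet(p=7, t=t, order=7, sigma=1.0, wavelet_type="morlet")
# weekly holds 7 offset wavelets along axis 1, with peaks spaced p/order = 1 day apart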
autots.tools.window_functions module¶
- autots.tools.window_functions.chunk_reshape(arr, window_size=10, chunk_size=100, sample_fraction=None, random_seed=7734, dtype=<class 'numpy.float32'>)¶
Shifts from (n_records, n_series) to (windows, window_size). Multivariate. More memory efficient, if not quite as fast as x.reshape(-1, x.shape[-1]) for 3D numpy array.
- autots.tools.window_functions.last_window(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False)¶
Pandas based function to provide the last window of window_maker.
- autots.tools.window_functions.np_2d_arange(start=0, stop=3, step=1, num_columns=4)¶
- autots.tools.window_functions.retrieve_closest_indices(df, num_indices, forecast_length, window_size: int = 10, distance_metric: str = 'braycurtis', stride_size: int = 1, start_index: int | None = None, include_differenced: bool = False, include_last: bool = True, verbose: int = 0)¶
Find the next indices closest to the final segment of forecast_length.
- Parameters:
df (pd.DataFrame) – source data in wide format
num_indices (int) – number of indices to return
forecast_length (int) – length of forecast
window_size (int) – length of comparison
distance_metric (str) – distance measure from scipy and nan_euclidean
stride_size (int) – length of spacing between windows
start_index (int) – index to begin creation of windows from
include_differenced (bool) – if True, also compare on differences
- autots.tools.window_functions.rolling_window_view(array, window_shape=(0,), axis=None, writeable=False)¶
Create a view of array which for every point gives the n-dimensional neighbourhood of size window. New dimensions are added at the end of array or after the corresponding original dimension.
Based on: https://gist.github.com/seberg/3866040 but designed to match the newer np.sliding_window_view
- Parameters:
array (np.array) – Array to which the rolling window is applied.
window_shape (int) – Either a single integer to create a window of only the last axis, or a tuple to create it for the last len(window) axes. 0 can be used to ignore a dimension in the window.
axis (int) – If given, must have the same size as window. In this case window is interpreted as the size in the dimension given by axis. I.e. a window of (2, 1) is equivalent to window=2 and axis=-2.
- Returns:
A view on array which is shrunk to fit the windows and has the window dimensions added at the end (0s not counting); i.e. every point of array becomes an array of size window.
- autots.tools.window_functions.sliding_window_view(array, window_shape=(0,), axis=None, writeable=False, **kwargs)¶
Toggles between numpy and internal version depending on np.__version__.
- autots.tools.window_functions.window_id_maker(window_size: int, max_steps: int, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)¶
Create indices for an array of multiple window slices of data.
- Parameters:
window_size (int) – length of time history to include
max_steps (int) – the maximum number of windows to create
start_index (int) – if not starting at the first point, start at this point
stride_size (int) – number of skips between each window start point
skip_size (int) – number of skips between each obs in a window (downsamples)
- Returns:
np.array with 3D shape (num windows, window_length, num columns/series); 2D array if only a 1D array is provided
- autots.tools.window_functions.window_lin_reg(x, y, w)¶
- autots.tools.window_functions.window_lin_reg_mean(x, y, w)¶
- autots.tools.window_functions.window_lin_reg_mean_no_nan(x, y, w)¶
- autots.tools.window_functions.window_maker(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False, shuffle: bool = False, output_dim: str = 'forecast_length', forecast_length: int = 1, max_windows: int = 5000, regression_type: str | None = None, future_regressor=None, random_seed: int = 1234)¶
Convert a dataset into slices with history and y forecast.
- Parameters:
df (pd.DataFrame) – wide format df with sorted index
window_size (int) – length of history to use for X window
input_dim (str) – univariate or multivariate. If multivariate, all series in single X row
shuffle (bool) – (deprecated)
output_dim (str) – ‘forecast_length’ or ‘1step’ where 1 step is basically forecast_length=1
forecast_length (int) – number of periods ahead that will be forecast
max_windows (int) – a cap on the total number of windows to generate. If exceeded, a random selection of this many windows is used.
regression_type (str) – None, or “user” to attempt to concat the regressor to the windows
future_regressor (pd.DataFrame) – values of regressor if used
random_seed (int) – a consistent random seed
- Returns:
X, Y
- autots.tools.window_functions.window_maker_2(array, window_size: int, max_steps: int | None = None, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)¶
Create an array of multiple window slices of data. Note that this returns a different orientation than window_maker_3.
- Parameters:
array (np.array) – source of historic information of shape (num_obs, num_series)
window_size (int) – length of time history to include
max_steps (int) – the maximum number of windows to create
start_index (int) – if not starting at the first point, start at this point
stride_size (int) – number of skips between each window start point
skip_size (int) – number of skips between each obs in a window (downsamples)
- Returns:
np.array with 3D shape (num windows, window_length, num columns/series); 2D array if only a 1D array is provided
- autots.tools.window_functions.window_maker_3(array, window_size: int, **kwargs)¶
Stride tricks version of window maker. About 40% faster than window_maker_2. Note that this returns a different orientation than window_maker_2.
- Parameters:
array (np.array) – in shape of (num_obs, num_series)
window_size (int) – length of slice of history
**kwargs – passed to np.lib.stride_tricks.sliding_window_view
- Returns:
np.array with 3D shape (num windows, num columns/series, window_length); 2D array if only a 1D array is provided
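A sketch contrasting the two orientations described above (random data is illustrative only):

import numpy as np
from autots.tools.window_functions import window_maker_2, window_maker_3

arr = np.random.rand(50, 4)  # (num_obs, num_series)
w2 = window_maker_2(arr, window_size=10)  # (num windows, window_length, num series)
w3 = window_maker_3(arr, window_size=10)  # (num windows, num series, window_length)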
- autots.tools.window_functions.window_sum_mean(x, w, axis=0)¶
- autots.tools.window_functions.window_sum_mean_nan_tail(x, w, axis=0)¶
- autots.tools.window_functions.window_sum_nan_mean(x, w, axis=0)¶
Module contents¶
Basic utilities.