autots.tools package

Submodules

autots.tools.anomaly_utils module

Created on Fri Jul 1 15:41:21 2022

@author: Colin

Anomaly types: point, contextual, and collective. Point anomalies are single values that fall within low-density regions of values; collective anomalies indicate that a sequence of values is anomalous rather than any single value by itself; and contextual anomalies are single values that do not fall within low-density regions yet are anomalous with regard to local values. See https://arxiv.org/pdf/1802.04431.pdf

autots.tools.anomaly_utils.anomaly_df_to_holidays(anomaly_df, actuals=None, anomaly_scores=None, threshold=0.8, min_occurrences=2, splash_threshold=0.65, use_dayofmonth_holidays=True, use_wkdom_holidays=True, use_wkdeom_holidays=False, use_lunar_holidays=False, use_lunar_weekday=False, use_islamic_holidays=False, use_hebrew_holidays=False, use_hindu_holidays=False)
autots.tools.anomaly_utils.anomaly_new_params(method='random')
autots.tools.anomaly_utils.create_dates_df(dates)

Take a pd.DatetimeIndex and create simple date parts.

autots.tools.anomaly_utils.dates_to_holidays(dates, df_cols, style='long', holiday_impacts='value', day_holidays=None, wkdom_holidays=None, wkdeom_holidays=None, lunar_holidays=None, lunar_weekday=None, islamic_holidays=None, hebrew_holidays=None, hindu_holidays=None, max_features: int | None = None)

Populate date information for a given pd.DatetimeIndex.

Parameters:
  • dates (pd.DatetimeIndex) – list of dates

  • day_holidays (pd.DataFrame) – list of month/day holidays. Pass None if not available

  • style (str) – option for how to return information:
    ‘long’ - return date, name, series for all holidays in a long style dataframe
    ‘impact’ - return dates, series with values of the sum of impacts (if given) or a joined string of holiday names
    ‘flag’ - return dates, holidays flag (not 0-1 but rather the sum of input series impacted for that holiday and day)
    ‘prophet’ - return the format required for Prophet; will need to be filtered on series for the multivariate case
    ‘series_flag’ - return dates, series 0/1 for whether a holiday occurred in any calendar

  • holiday_impacts (dict) – a dict passed to .replace containing values for holiday_names, or str ‘value’ or ‘anomaly_score’

autots.tools.anomaly_utils.detect_anomalies(df, output, method, transform_dict=None, method_params={}, eval_period=None, n_jobs=1)

All methods will return -1 for anomalies.

Parameters:
  • output (str) – ‘multivariate’ (each series unique outliers), or ‘univariate’ (all series together for one outlier flag per timestamp)

  • transform_dict (dict) – optional but helpful, often datepart, differencing, or other standard AutoTS transformer params

  • forecast_params (dict) – used to backcast and identify ‘unforecastable’ values, required only for predict_interval method

  • eval_period (int) – only use this length tail of the data, currently only implemented for forecast_params forecasting if used

  • n_jobs (int) – multiprocessing jobs, used by some methods

Returns:

pd.DataFrame (classifications, -1 = outlier, 1 = not outlier), pd.DataFrame (scores)
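
Example (a minimal sketch from the signature and return description above; the ‘zscore’ method name, empty method_params, and two-frame unpacking are assumptions, not a tested recipe):

import numpy as np
import pandas as pd
from autots.tools.anomaly_utils import detect_anomalies

# wide-style data: DatetimeIndex rows, one column per series
df = pd.DataFrame(
    np.random.default_rng(0).normal(size=(200, 3)),
    index=pd.date_range("2022-01-01", periods=200, freq="D"),
    columns=["a", "b", "c"],
)
df.iloc[50, 0] = 15.0  # plant an obvious point anomaly

anomalies, scores = detect_anomalies(
    df, output="multivariate", method="zscore", method_params={}
)
print(anomalies.iloc[50, 0])  # expect -1 (outlier) at the planted spike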

autots.tools.anomaly_utils.fit_anomaly_classifier(anomalies, scores)

Fit a DecisionTree model to predict if a score is an anomaly.

This is a shared utility for both AnomalyDetector and AnomalyRemoval classes. Using DecisionTree as it can handle nonparametric anomalies.

Parameters:
  • anomalies (pd.DataFrame) – DataFrame with -1 for anomalies, 1 for normal

  • scores (pd.DataFrame) – DataFrame with anomaly scores

Returns:

(classifier, score_categories) - trained classifier and categorical mapping

Return type:

tuple

autots.tools.anomaly_utils.gaussian_mixture(df, n_components=2, tol=0.001, max_iter=100, responsibility_threshold=0.05)
autots.tools.anomaly_utils.holiday_new_params(method='random')
autots.tools.anomaly_utils.limits_to_anomalies(df, output, upper_limit, lower_limit, method_params=None)
autots.tools.anomaly_utils.loop_sk_outliers(df, method, method_params={}, n_jobs=1)

Multiprocessing on each series for multivariate outliers with sklearn.

autots.tools.anomaly_utils.nonparametric_multivariate(df, output, method_params, n_jobs=1)
autots.tools.anomaly_utils.score_to_anomaly(scores, classifier, score_categories)

Convert anomaly scores to anomaly classifications using a trained classifier.

This is a shared utility for both AnomalyDetector and AnomalyRemoval classes.

Parameters:
  • scores (pd.DataFrame) – DataFrame with anomaly scores

  • classifier – trained sklearn classifier

  • score_categories – categorical mapping from fit_anomaly_classifier

Returns:

Classifications (-1 = anomaly, 1 = normal)

Return type:

pd.DataFrame
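
Example (continuing the hypothetical detect_anomalies sketch above; inputs follow the parameter descriptions):

from autots.tools.anomaly_utils import fit_anomaly_classifier, score_to_anomaly

# anomalies, scores as returned by detect_anomalies above
classifier, score_categories = fit_anomaly_classifier(anomalies, scores)

# classify a fresh set of scores without rerunning detection
new_classifications = score_to_anomaly(scores, classifier, score_categories)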

autots.tools.anomaly_utils.sk_outliers(df, method, method_params={})

scikit-learn outlier methods wrapper.

autots.tools.anomaly_utils.values_to_anomalies(df, output, threshold_method, method_params, n_jobs=1)
autots.tools.anomaly_utils.zscore_survival_function(df, output='multivariate', method='zscore', distribution='norm', rolling_periods: int = 200, center: bool = True)

Take a dataframe, generate z-scores, and then generate survival probabilities (smaller = more outlier-like).

Parameters:
  • df (pd.DataFrame) – wide style time series data (DatetimeIndex, series)

  • output (str) – univariate (1 series from all) or multivariate (all series input returned unique)

  • method (str) – zscore, rolling_zscore, mad (median abs dev)

  • distribution (str) – distribution to sample sf/outliers from

  • rolling_periods (int) – >1, used for the rolling_zscore period

  • center (bool) – passed to pd.rolling for rolling_zscore; True for holiday detection, False for anomaly detection generally

Returns:

pd.DataFrame of p-values
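
Example (a sketch assuming defaults; ‘zscore’ per the method list above):

import numpy as np
import pandas as pd
from autots.tools.anomaly_utils import zscore_survival_function

df = pd.DataFrame(
    np.random.default_rng(1).normal(size=(300, 2)),
    index=pd.date_range("2023-01-01", periods=300, freq="D"),
    columns=["x", "y"],
)
p_values = zscore_survival_function(df, output="multivariate", method="zscore")
print(p_values.min())  # the smallest p-values mark the most outlier-like points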

autots.tools.autoencoder module

Autoencoder tools for anomaly detection

Variational Autoencoder (VAE) implementation for time series anomaly detection

class autots.tools.autoencoder.VAE(input_dim, latent_dim, depth=1, dropout_rate=0.0)

Bases: Module

Complete VAE model.

forward(x)
reparameterize(mean, logvar)

Reparameterization trick.

class autots.tools.autoencoder.VAEAnomalyDetector(depth=1, batch_size=32, epochs=50, learning_rate=0.001, loss_function='elbo', dropout_rate=0.0, latent_dim=None, beta=1.0, random_state=None, device=None)

Bases: object

Variational Autoencoder for Anomaly Detection.

fit(X)

Fit the VAE model.

predict(X)

Predict anomaly scores.
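
Example (a sketch assuming a working torch install; the input shape and the "higher score = more anomalous" orientation are assumptions):

import numpy as np
from autots.tools.autoencoder import VAEAnomalyDetector

X = np.random.default_rng(0).normal(size=(500, 8)).astype("float32")

detector = VAEAnomalyDetector(
    depth=1, batch_size=32, epochs=10, latent_dim=2, random_state=7
)
detector.fit(X)
scores = detector.predict(X)  # per-sample anomaly scores (orientation assumed)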

class autots.tools.autoencoder.VAEDecoder(input_dim, latent_dim, depth=1, dropout_rate=0.0)

Bases: Module

VAE Decoder network.

forward(z)
class autots.tools.autoencoder.VAEEncoder(input_dim, latent_dim, depth=1, dropout_rate=0.0)

Bases: Module

VAE Encoder network.

forward(x)
autots.tools.autoencoder.loop_vae_outliers(df, method_params={}, n_jobs=1)

Multiprocessing on each series for multivariate VAE outliers.

autots.tools.autoencoder.vae_outliers(df, method_params={})

VAE-based outlier detection.

autots.tools.bayesian_regression module

class autots.tools.bayesian_regression.BayesianMultiOutputRegression(gaussian_prior_mean=0.0, alpha=1.0, wishart_prior_scale=1.0, wishart_dof_excess=0)

Bases: object

Matrix-normal/Inverse-Wishart Bayesian linear regression.

Computes an analytical posterior for multivariate linear regression with Gaussian weight noise and unknown Gaussian observation noise. The posterior mean matches ridge regression, while the stored covariance allows callers to recover coefficient standard deviations, prediction intervals, and posterior samples without resorting to numerical sampling during training.

coefficient_interval(z_value=1.96)
fit(X, Y)
predict(X, return_std=False)
sample_posterior(n_samples=1)
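
Example (a sketch from the signatures above; the exact shapes returned by predict and sample_posterior are assumptions):

import numpy as np
from autots.tools.bayesian_regression import BayesianMultiOutputRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_W = rng.normal(size=(3, 2))
Y = X @ true_W + rng.normal(scale=0.1, size=(100, 2))  # noisy linear targets

model = BayesianMultiOutputRegression(alpha=1.0)
model.fit(X, Y)
mean_pred, std_pred = model.predict(X[:5], return_std=True)  # posterior predictive
draws = model.sample_posterior(n_samples=10)                 # coefficient samples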

autots.tools.calendar module

Calendar conversion functions.

Includes Lunar, Chinese lunar, and Arabic lunar

autots.tools.calendar.gregorian_to_chinese(datetime_index, epoch=2444238.5)

Convert a pandas DatetimeIndex to Chinese Lunar calendar. Potentially has errors.

autots.tools.calendar.gregorian_to_christian_lunar(datetime_index)

Convert a pandas DatetimeIndex to Christian Lunar calendar. Approximate; be aware it may not align exactly.

autots.tools.calendar.gregorian_to_hebrew(dates)

Convert pd.Datetimes to a Hebrew date. From pyluach by simlist.

This is the slowest of the lot and needs to be improved.

autots.tools.calendar.gregorian_to_hindu(datetime_index, method: str = 'lunar')

Convert a pandas DatetimeIndex to Hindu calendar date components. Hindu calendar has numerous regional variations. This implementation aims for compatibility with common internationally recognized Hindu holidays.

Parameters:
  • datetime_index (pd.DatetimeIndex) – pandas DatetimeIndex

  • method (str) – ‘simple’ or ‘lunar’. Simple is faster. Lunar is more accurate.

autots.tools.calendar.gregorian_to_islamic(date, epoch_adjustment=1.5)

Calculate Islamic dates for pandas DatetimeIndex. Approximately. From convertdate by fitnr.

Parameters:

epoch_adjustment (float) – offset applied to the epoch; adjust by about +/- 0.5 to account for timezone

autots.tools.calendar.heb_is_leap(year)
autots.tools.calendar.lunar_from_lunar(new_moon)

Assumes continuous daily data and a start date prior to the first date needed.

autots.tools.calendar.lunar_from_lunar_full(full_moon)

Assumes continuous daily data and a start date prior to the first date needed.

autots.tools.calendar.to_jd(year, month, day)

Determine Julian day count from Islamic date. From convertdate by fitnr.
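
Example (a sketch; the exact return columns of the conversion functions are not documented here and are an assumption):

import pandas as pd
from autots.tools.calendar import gregorian_to_chinese

dates = pd.date_range("2024-01-01", periods=60, freq="D")
lunar = gregorian_to_chinese(dates)  # assumed: a frame of Chinese lunar date parts
print(lunar.head())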

autots.tools.changepoints module

class autots.tools.changepoints.ChangepointDetector(method='pelt', method_params=None, aggregate_method='mean', min_segment_length=5, probabilistic_output=False, n_jobs=1)

Bases: object

Advanced changepoint detection class for time series data.

Supports multiple algorithms for detecting changepoints and level shifts in wide-format time series data, similar to HolidayDetector.

create_features(forecast_length=0)

Create changepoint features for regression modeling.

Parameters:

forecast_length (int) – Number of future periods to extend features

Returns:

Changepoint features

Return type:

pd.DataFrame

detect(df)

Run changepoint detection on wide-format time series data.

Parameters:

df (pd.DataFrame) – Wide-format time series with DatetimeIndex

fit(df)

Fit the changepoint detector and prepare transformer artifacts.

Parameters:

df (pd.DataFrame) – Training data with DatetimeIndex.

fit_transform(df)

Fit the detector and immediately transform the input data.

Parameters:

df (pd.DataFrame) – Training data with DatetimeIndex.

get_market_changepoints(method='dbscan', params=None)

Find common changepoints across multiple time series using clustering.

Parameters:
  • method (str) – Clustering method (‘dbscan’, ‘kmeans’, ‘hierarchical’)

  • params (dict) – Parameters for clustering algorithm

Returns:

Array of market-wide changepoint indices

Return type:

np.ndarray

static get_new_params(method='random')

Generate new random parameters for changepoint detection.

Parameters:

method (str) –

Method for parameter selection:
  • ‘fast’ – all methods, but with the fastest parameter configurations for PELT and composite_fused_lasso

  • or specify a method name directly: ‘basic’, ‘pelt’, ‘l1_fused_lasso’, ‘l1_total_variation’, ‘cusum’, ‘autoencoder’, ‘composite_fused_lasso’

Returns:

Complete parameter dictionary for ChangepointDetector initialization

Return type:

dict

inverse_transform(df, trans_method='forecast')

Restore data to the original scale using stored changepoint trends.

Parameters:
  • df (pd.DataFrame) – Data to inverse transform.

  • trans_method (str) – Compatibility argument for transformer interface.

plot(series_name=None, figsize=(12, 8))

Plot time series with detected changepoints.

Parameters:
  • series_name (str) – Name of series to plot (for individual detection)

  • figsize (tuple) – Figure size

transform(df)

Apply changepoint-based detrending to the provided data.

Parameters:

df (pd.DataFrame) – Data to transform.
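
Example (a sketch of the fit/transform workflow described above; the ‘basic’ method choice is an assumption):

import numpy as np
import pandas as pd
from autots.tools.changepoints import ChangepointDetector

idx = pd.date_range("2021-01-01", periods=300, freq="D")
# one series with a clear level shift halfway through
df = pd.DataFrame(
    {"series_1": np.r_[np.zeros(150), np.full(150, 5.0)]
     + np.random.default_rng(0).normal(scale=0.2, size=300)},
    index=idx,
)

cpd = ChangepointDetector(method="basic")
detrended = cpd.fit_transform(df)
features = cpd.create_features(forecast_length=30)  # regressors extending 30 steps
restored = cpd.inverse_transform(detrended)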

autots.tools.changepoints.changepoint_fcst_from_last_row(x_t_last_row, n_forecast=10)
autots.tools.changepoints.create_changepoint_features(DTindex, changepoint_spacing=60, changepoint_distance_end=120, method='basic', params=None, data=None)

Creates a feature set for estimating trend changepoints using various algorithms.

Parameters:
  • DTindex (pd.DatetimeIndex) – a datetime index

  • changepoint_spacing (int) – distance between consecutive changepoints (legacy, for the basic method)

  • changepoint_distance_end (int) – number of rows that belong to the final changepoint (legacy, for the basic method)

  • method (str) – method for changepoint detection (‘basic’, ‘pelt’, ‘l1_fused_lasso’, ‘l1_total_variation’, ‘cusum’, ‘ewma’, ‘autoencoder’)

  • params (dict) – additional parameters for the chosen method

  • data (array-like) – time series data (required for advanced methods)

Returns:

DataFrame containing changepoint features for linear regression

Return type:

pd.DataFrame

autots.tools.changepoints.find_market_changepoints_multivariate(df, detector_params=None, clustering_method='dbscan', clustering_params=None, min_series_agreement=0.3)

Find common changepoints across multivariate time series data.

Parameters:
  • df (pd.DataFrame) – Wide-format time series data

  • detector_params (dict) – Parameters for ChangePointDetector

  • clustering_method (str) – Method for clustering changepoints (‘dbscan’, ‘kmeans’, ‘agreement’)

  • clustering_params (dict) – Parameters for clustering

  • min_series_agreement (float) – Minimum fraction of series that must agree on a changepoint

Returns:

Dictionary with market changepoints and individual series changepoints

Return type:

dict

autots.tools.changepoints.generate_random_changepoint_params(method='random')

Generate random parameters for changepoint detection methods.

This function creates appropriately weighted random parameters for different changepoint detection algorithms, supporting the flexible method/params system.

DEPRECATED: This function now delegates to ChangepointDetector.get_new_params() for consistency. Use ChangepointDetector.get_new_params() directly for new code.

Parameters:

method (str) – Method for parameter selection - ‘random’: All methods with balanced weights - ‘fast’: All methods but with fastest parameter configurations for PELT and composite_fused_lasso - ‘default’/’basic_weighted’: Basic method heavily weighted

Returns:

(changepoint_method, changepoint_params) where
  • changepoint_method (str): Selected method name

  • changepoint_params (dict): Method-specific parameters

Return type:

tuple

autots.tools.changepoints.half_yr_spacing(df)

autots.tools.cointegration module

Cointegration

Johansen heavily based on Statsmodels source code

BTCD heavily based on D. Barba https://towardsdatascience.com/canonical-decomposition-a-forgotten-method-for-time-series-cointegration-and-beyond-4d1213396da1

autots.tools.cointegration.btcd_decompose(p_mat: ndarray, regression_model, max_lag: int = 1, return_eigenvalues=False)

Calculate decomposition. p_mat is of shape(t,n), wide style data.

autots.tools.cointegration.coint_fast(endog, k_ar_diff=1)

A fast estimation of cointegration vectors for VECM.

Parameters:
  • endog (np.array) – should be (nobs, n_vars)

  • k_ar_diff (int) – number of lags to use in VECM.

autots.tools.cointegration.coint_johansen(endog, det_order=-1, k_ar_diff=1, return_eigenvalues=False, fast: bool = True)

Johansen cointegration test of the cointegration rank of a VECM, abbreviated from Statsmodels

autots.tools.cointegration.fourier_series(dates, period, series_order)

Provides Fourier series components with the specified frequency and order.

Parameters:
  • dates – pd.Series containing timestamps

  • period – number of days of the period

  • series_order – number of components

Return type:

Matrix with seasonality features.

autots.tools.cointegration.lagmat(x, maxlag: int, trim='forward', original='ex')

Create 2d array of lags. Modified from Statsmodels.

autots.tools.constraint module

Constraint and adjustment generation functions

autots.tools.constraint.apply_adjustment_single(forecast: DataFrame, adjustment_method: str, adjustment_params: dict | None = None, df_train: DataFrame | None = None, series_ids=None, lower_forecast: DataFrame | None = None, upper_forecast: DataFrame | None = None)

Apply a single adjustment to forecast (and optional bounds).

adjustment_method:
  • “basic”: linear ramp between start/end values and dates

    params: start_date, end_date, start_value, end_value, method (“additive”|”multiplicative”)

  • “align_last_value”: align start of forecast to recent history, requires df_train

    params: any AlignLastValue kwargs (rows, lag, method, strength, etc.)

  • “smoothing”: EWMA smoothing

    params: span (int)

series_ids limits adjustment to specific columns; defaults to all columns.
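
Example (a sketch of the ‘basic’ ramp described above; whether a single frame or forecast-plus-bounds is returned depends on the optional bound arguments and is an assumption here):

import pandas as pd
from autots.tools.constraint import apply_adjustment_single

idx = pd.date_range("2024-06-01", periods=30, freq="D")
forecast = pd.DataFrame({"sales": [100.0] * 30}, index=idx)

# ramp an additive adjustment from 0 up to 10 across the horizon
result = apply_adjustment_single(
    forecast,
    adjustment_method="basic",
    adjustment_params={
        "start_date": "2024-06-01",
        "end_date": "2024-06-30",
        "start_value": 0.0,
        "end_value": 10.0,
        "method": "additive",
    },
)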

autots.tools.constraint.apply_constraint_single(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None)
autots.tools.constraint.apply_fit_constraint(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, lower_constraint=None, upper_constraint=None, train_min=None, train_max=None, fillna=None)
autots.tools.constraint.constant_growth_rate(periods, final_growth)

Take a final target growth rate (e.g. 2% over a year) and convert it to a daily (etc.) value.

autots.tools.constraint.constraint_new_params(method: str = 'fast')
autots.tools.constraint.fit_constraint(constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None, forecast_length=None)

autots.tools.cpu_count module

CPU counter for multiprocessing.

autots.tools.cpu_count.cpu_count(modifier: float = 1)

Find available CPU count, running on both Windows/Linux.

Attempts to be very conservative:
  • Remove Intel Hyperthreading logical cores

  • Find the max cores allowed to the process, if less than the machine's total

Runs best with psutil installed; falls back to mkl, then OS core count / 2

Parameters:

modifier (float) – multiply the CPU count by this value
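
Example (a minimal sketch):

from autots.tools.cpu_count import cpu_count

n_jobs = cpu_count(modifier=0.75)  # use ~75% of the conservatively counted cores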

autots.tools.cpu_count.set_n_jobs(n_jobs, verbose=0)

autots.tools.fast_kalman module

From SIMD KALMAN, (c) 2017 Otto Seiskari (MIT License)

Some other resources that I have found useful:

  • https://kevinkotze.github.io/ts-4-state-space/

  • https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_dfm_coincident.html

  • An Introduction to State Space Time Series Analysis, Commandeur and Koopman, chp. 8

  • Forecasting, Structural Time Series Models and the Kalman Filter, Andrew Harvey

Following the notation in [1], the Kalman filter framework consists of a dynamic model (state transition model)

\[x_k = A x_{k-1} + q_{k-1}, \qquad q_{k-1} \sim N(0, Q)\]

and a measurement model (observation model)

\[y_k = H x_k + r_k, \qquad r_k \sim N(0, R),\]

where the vector \(x\) is the (hidden) state of the system and \(y\) is an observation. A and H are matrices of suitable shape and \(Q\), \(R\) are positive-definite noise covariance matrices.

Usage example

import numpy as np
import numpy.random as random
import simdkalman

random.seed(0)

# define the model
kf = simdkalman.KalmanFilter(
    state_transition=[[1, 1], [0, 1]],      # matrix A
    process_noise=np.diag([0.1, 0.01]),     # Q
    observation_model=np.array([[1, 0]]),   # H
    observation_noise=1.0)                  # R

# generate some fake data: 100 independent time series
data = random.normal(size=(100, 200))

# with 10% of NaNs denoting missing values
data[random.uniform(size=data.shape) < 0.1] = np.nan

# smooth all data
smoothed = kf.smooth(data,
                     initial_value=[1, 0],
                     initial_covariance=np.eye(2) * 0.5)

# second time series, third time step, hidden state x
print('mean')
print(smoothed.states.mean[1, 2, :])

print('covariance')
print(smoothed.states.cov[1, 2, :, :])

mean
[ 0.29311384 -0.06948961]
covariance
[[ 0.19959416 -0.00777587]
 [-0.00777587  0.02528967]]

# predict new data for a single series (1d case)
predicted = kf.predict(data[1, :], 123)

# predicted observation y, third new time step
pred_mean = predicted.observations.mean[2]
pred_stdev = np.sqrt(predicted.observations.cov[2])

print('%g +- %g' % (pred_mean, pred_stdev))

1.71543 +- 1.65322

Low-level Kalman filter computation steps with multi-dimensional input arrays. Unlike with the KalmanFilter class, all inputs must be numpy arrays. However, their dimensions can flexibly vary from 1 to 3 as long as they are reasonable from the point of view of matrix multiplication and numpy broadcasting rules. Matrix operations are applied on the last two axes of the arrays.

class autots.tools.fast_kalman.Gaussian(mean, cov)

Bases: object

static empty(n_states, n_vars, n_measurements, cov=True)
unvectorize_state()
unvectorize_vars()
class autots.tools.fast_kalman.KalmanFilter(state_transition, process_noise, observation_model, observation_noise)

Bases: object

The main Kalman filter class providing convenient interfaces to vectorized smoothing and filtering operations on multiple independent time series.

As long as the shapes of the given parameters match reasonably according to the rules of matrix multiplication, this class is flexible about their exact nature, accepting

  • scalars: process_noise = 0.1

  • (2d) numpy matrices: process_noise = numpy.eye(2)

  • 2d arrays: observation_model = [[1,2]]

  • 3d arrays and matrices for vectorized computations. Unlike the other options, this locks the shape of the inputs that can be processed by the smoothing and prediction methods.

Parameters:
  • state_transition – State transition matrix \(A\)

  • process_noise – Process noise (state transition covariance) matrix \(Q\)

  • observation_model – Observation model (measurement model) matrix \(H\)

  • observation_noise – Observation noise (measurement noise covariance) matrix \(R\)

class Result

Bases: object

compute(data, n_test, initial_value=None, initial_covariance=None, smoothed=True, filtered=False, states=True, covariances=True, observations=True, likelihoods=False, gains=False, log_likelihood=False, verbose=False)

Smoothing, filtering and prediction at the same time. Used internally by other methods, but can also be used directly if, e.g., both smoothed and predicted data are wanted.

See smooth and predict for explanation of the common parameters. With this method, there also exist the following flags.

Parameters:
  • smoothed (boolean) – compute Kalman smoother (used by smooth)

  • filtered (boolean) – return (one-way) filtered data

  • likelihoods (boolean) – return likelihoods of each step

  • gains (boolean) – return Kalman gains and pairwise covariances (used by the EM algorithm). If true, the gains are provided as a member of the relevant subresult filtered.gains and/or smoothed.gains.

  • log_likelihood (boolean) – return the log-likelihood(s) for the entire series. If matrix data is given, this will be a vector where each element is the log-likelihood of a single row.

Return type:

result object whose fields depend on which of the above parameter flags are True. The possible values are: smoothed (the return value of smooth, may contain smoothed.gains), filtered (like smoothed, may also contain filtered.gains), predicted (the return value of predict if n_test > 0), pairwise_covariances, likelihoods and log_likelihood.

em(data, n_iter=5, initial_value=None, initial_covariance=None, verbose=False, tol=None)
em_observation_noise(result, data, verbose=False)
em_process_noise(result, verbose=False)
predict(data, n_test, initial_value=None, initial_covariance=None, states=True, observations=True, covariances=True, verbose=False)

Filter past data and predict a given number of future values. The data can be given as either of

  • 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain an 1d array of observations (both .mean and .cov will be 1d).

  • 2d matrix, where each row is interpreted as an independent time series, all of which are filtered independently. The returned observations members will be 2-dimensional in this case.

  • 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.

Initial values and covariances can be given as scalars or 2d matrices in which case the same initial states will be used for all rows or 3d arrays for different initial values.

Parameters:
  • data – Past data

  • n_test (integer) – number of future steps to predict.

  • initial_value – Initial value \({\mathbb E}[x_0]\)

  • initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)

  • states (boolean) – predict states \(x\)?

  • observations (boolean) – predict observations \(y\)?

  • covariances (boolean) – include covariances in predictions?

Return type:

Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)

predict_next(m, P)

Single prediction step

Parameters:
  • m – \({\mathbb E}[x_{j-1}]\), the previous mean

  • P – \({\rm Cov}[x_{j-1}]\), the previous covariance

Return type:

(prior_mean, prior_cov) predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)

predict_observation(m, P)

Probability distribution of observation \(y\) for a given distribution of \(x\)

Parameters:
  • m – \({\mathbb E}[x]\)

  • P – \({\rm Cov}[x]\)

Return type:

mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)

smooth(data, initial_value=None, initial_covariance=None, observations=True, states=True, covariances=True, verbose=False)

Smooth given data, which can be either

  • 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain an 1d array of observations (both .mean and .cov will be 1d).

  • 2d matrix, where each row is interpreted as an independent time series, all of which are smoothed independently. The returned observations members will be 2-dimensional in this case.

  • 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.

Initial values and covariances can be given as scalars or 2d matrices in which case the same initial states will be used for all rows or 3d arrays for different initial values.

Parameters:
  • data – 1d or 2d data, see above

  • initial_value – Initial value \({\mathbb E}[x_0]\)

  • initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)

  • states (boolean) – return smoothed states \(x\)?

  • observations (boolean) – return smoothed observations \(y\)?

  • covariances (boolean) – include covariances results?

Return type:

Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)

smooth_current(m, P, ms, Ps)

Single Kalman smoother backwards step

Parameters:
  • m – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)

  • P – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)

  • ms – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)

  • Ps – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)

Return type:

(smooth_mean, smooth_covariance, smoothing_gain) smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\), and covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\) & smoothing gain \(C\)

update(m, P, y, log_likelihood=False)

Single update step with NaN check.

Parameters:
  • m – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)

  • P – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)

  • y – observation \(y_j\)

  • log_likelihood – compute log-likelihood?

Return type:

(posterior_mean, posterior_covariance, log_likelihood) posterior mean \({\mathbb E}[x_j|y_1,\ldots,y_j]\) & covariance \({\rm Cov}[x_j|y_1,\ldots,y_j]\) and, if requested, log-likelihood. If \(y_j\) is NaN, returns the prior mean and covariance instead

autots.tools.fast_kalman.autoshape(func)

Automatically shape arguments and return values

autots.tools.fast_kalman.ddot(A, B)

Matrix multiplication over last two axes

autots.tools.fast_kalman.ddot_t_right(A, B)

Matrix multiplication over last 2 axes with right operand transposed

autots.tools.fast_kalman.ddot_t_right_old(A, B)

Matrix multiplication over last 2 axes with right operand transposed

autots.tools.fast_kalman.dinv(A)

Matrix inverse applied to last two axes

autots.tools.fast_kalman.douter(a, b)

Outer product, last two axes

autots.tools.fast_kalman.em_initial_state(result, initial_means)
autots.tools.fast_kalman.ensure_matrix(x, dim=1)
autots.tools.fast_kalman.predict(mean, covariance, state_transition, process_noise)

Kalman filter prediction step

Parameters:
  • mean – \({\mathbb E}[x_{j-1}]\), the filtered mean from the previous step

  • covariance – \({\rm Cov}[x_{j-1}]\), the filtered covariance from the previous step

  • state_transition – matrix \(A\)

  • process_noise – matrix \(Q\)

Return type:

(prior_mean, prior_cov) predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)

autots.tools.fast_kalman.predict_observation(mean, covariance, observation_model, observation_noise)

Compute probability distribution of the observation \(y\), given the distribution of \(x\).

Parameters:
  • mean – \({\mathbb E}[x]\)

  • covariance – \({\rm Cov}[x]\)

  • observation_model – matrix \(H\)

  • observation_noise – matrix \(R\)

Return type:

mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)

autots.tools.fast_kalman.priv_smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)
autots.tools.fast_kalman.priv_update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement, log_likelihood=False)
autots.tools.fast_kalman.smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)

Kalman smoother backwards step

Parameters:
  • posterior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)

  • posterior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)

  • state_transition – matrix \(A\)

  • process_noise – matrix \(Q\)

  • next_smooth_mean – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)

  • next_smooth_covariance – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)

Return type:

(smooth_mean, smooth_covariance, smoothing_gain) smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\), and covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\)

autots.tools.fast_kalman.stable_pinv(A, tol=1e-05, regularization=0.0001)
autots.tools.fast_kalman.update(prior_mean, prior_covariance, observation_model, observation_noise, measurement)

Kalman filter update step

Parameters:
  • prior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)

  • prior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)

  • observation_model – matrix \(H\)

  • observation_noise – matrix \(R\)

  • measurement – observation \(y_j\)

Return type:

(posterior_mean, posterior_covariance) posterior mean and covariance \({\mathbb E}[x_j|y_1,\ldots,y_j]\), \({\rm Cov}[x_j|y_1,\ldots,y_j]\) after observing \(y_j\)

autots.tools.fast_kalman.update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement)

Kalman filter update with a check for NaN observations. Like update but returns (prior_mean, prior_covariance) if measurement is NaN

autots.tools.fast_kalman_params module

Generate random state-space model parameters paired with fast_kalman.py.

autots.tools.fast_kalman_params.ensure_stability(st)
autots.tools.fast_kalman_params.holt_winters_damped_matrices(M, alpha, beta, gamma, phi=1.0)

Not sure if this is correct. It’s close, at least.

autots.tools.fast_kalman_params.new_kalman_params(method=None, allow_auto=True)
autots.tools.fast_kalman_params.random_matrix(rows, cols, density=0.2)
autots.tools.fast_kalman_params.random_state_space(tries=15)
autots.tools.fast_kalman_params.random_state_space_original()

Return randomly generated statespace models.

autots.tools.fft module

Created on Mon Oct 9 22:07:37 2023

@author: colincatlin

class autots.tools.fft.FFT(n_harm=10, detrend='linear', freq_range=None)

Bases: object

fit(x)
generate_harmonics_dataframe(forecast_length=0)
predict(forecast_length=0)
autots.tools.fft.fourier_extrapolation(x, forecast_length=10, n_harm=10, detrend='linear', freq_range=None)
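
Example (a sketch; whether the return includes the fitted history along with the 50 extrapolated steps is an assumption):

import numpy as np
from autots.tools.fft import fourier_extrapolation

t = np.linspace(0, 20 * np.pi, 400)
x = np.sin(t) + 0.005 * np.arange(400)  # seasonal signal plus a linear trend

extrapolated = fourier_extrapolation(x, forecast_length=50, n_harm=10, detrend="linear")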

autots.tools.fir_filter module

Created on Fri Sep 6 23:59:35 2024

@author: colincatlin

autots.tools.fir_filter.apply_fir_filter_time_domain(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')

Apply FIR filter using time-domain convolution (lfilter) for smaller memory usage. This function has padding issues currently.

autots.tools.fir_filter.apply_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')

Apply FIR filter to an array of time series data with shape (observations, series).

Parameters:
  • data – numpy array of shape (observations, series), where each column represents a time series

  • sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)

  • numtaps – number of taps (filter length)

  • cutoff_hz – the cutoff frequency in Hz (for filtering purposes)

  • window – the windowing function to use for FIR filter design (‘hamming’, ‘hann’, etc.)

Returns:

filtered_data – the filtered version of the input data
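
Example (a sketch assuming scipy is installed; numtaps is reduced from the default to suit the short sample):

import numpy as np
from autots.tools.fir_filter import apply_fir_filter_to_timeseries

data = np.random.default_rng(0).normal(size=(730, 4))  # two years daily, 4 series
smoothed = apply_fir_filter_to_timeseries(
    data, sampling_frequency=365, numtaps=128, cutoff_hz=20, window="hamming"
)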

autots.tools.fir_filter.fft_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming', chunk_size=1000)

Apply FIR filter to an array of time series data with shape (observations, series).

Parameters:
  • data – numpy array of shape (observations, series), where each column represents a time series

  • sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)

  • numtaps – number of taps (filter length)

  • cutoff_hz – the cutoff frequency in Hz (for filtering purposes)

  • window – the windowing function to use for FIR filter design (‘hamming’, ‘hann’, etc.)

Returns:

filtered_data – the filtered version of the input data

autots.tools.fir_filter.generate_random_fir_params(method='random', data_type='time_series')

autots.tools.g7xx_codec module

G.711 and G.726 audio codec inspired utilities for time series processing.

This module provides: 1. G.711 companding (mu-law and A-law) for non-linear scaling 2. G.726 adaptive differential PCM for denoising/smoothing 3. Transformer classes (G711Scaler, G726Filter) for AutoTS integration

G.711 is used as a non-linear scaler: encode on transform, decode on inverse_transform. G.726 is used as an adaptive filter to denoise while preserving structure.

All operations are fully vectorized for efficient processing of multiple time series.

class autots.tools.g7xx_codec.G711Scaler(mode: str = 'mu', mu: float = 100.0, A: float = 87.6, center: str = 'median', scale_method: str = 'mad', scale_factor: float = 3.0, min_scale: float = 1e-06, clip: bool = True, zero_offset: float = 0.0, fill_method: str = 'interpolate', on_transform: bool = True, on_inverse: bool = True, bounds_only: bool = False, **kwargs)

Bases: EmptyTransformer

G.711-style non-linear scaler (mu-law or A-law).

Encodes (compands) on transform and decodes (expands to original space) on inverse_transform. Optimized for daily/hourly data by using robust per-series centering and scaling to normalize into [-1, 1] prior to companding.

Parameters:
  • mode – ‘mu’ for mu-law or ‘a’ for A-law companding

  • mu – mu-law parameter (the class signature defaults to 100.0, while classic G.711 uses 255.0; try the 100-512 range)

  • A – A-law parameter (default 87.6, try 50-100 range)

  • center – ‘median’ or ‘mean’ for centering

  • scale_method – ‘mad’, ‘std’, ‘maxabs’, or ‘percentile’ for scale estimation

  • scale_factor – Multiplier for scale (default 3.0, lower=more compression, higher=better reconstruction)

  • min_scale – Minimum scale to prevent division by zero

  • clip – Whether to clip normalized values to [-1, 1]

  • zero_offset – Small constant added to handle exact zeros (e.g., 1e-6 for sparse data)

  • fill_method – Method to handle NaN values

  • on_transform – Apply encoding on transform

  • on_inverse – Apply decoding on inverse_transform

  • bounds_only – Only apply during bounds adjustment

Note

For sparse/intermittent data with many zeros, consider setting zero_offset or using alternative transformers like ReplaceConstant or CenterSplit.
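
Example (a round-trip sketch; reconstruction is approximate by design):

import numpy as np
import pandas as pd
from autots.tools.g7xx_codec import G711Scaler

df = pd.DataFrame(
    np.random.default_rng(0).lognormal(size=(100, 2)),
    index=pd.date_range("2024-01-01", periods=100, freq="D"),
    columns=["a", "b"],
)

scaler = G711Scaler(mode="mu", mu=100.0)
companded = scaler.fit_transform(df)            # compress dynamic range
restored = scaler.inverse_transform(companded)  # approximate round trip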

fit(df)

Learn behavior of data to change.

Parameters:

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Transform data.

Parameters:

df (pandas.DataFrame) – input dataframe

static get_new_params(method: str = 'random')
inverse_transform(df, trans_method: str = 'forecast', adjustment=None)

Inverse transform data.

Parameters:

df (pandas.DataFrame) – input dataframe

transform(df)

Transform data.

Parameters:

df (pandas.DataFrame) – input dataframe

class autots.tools.g7xx_codec.G726Filter(quant_bits: int = 4, adaptation_rate: float = 0.96, prediction_alpha: float = 0.92, floor_step: float = 0.01, dynamic_range: float = 1.5, blend: float = 0.15, noise_gate: float = 0.0, fill_method: str = 'interpolate', on_transform: bool = True, on_inverse: bool = False, bounds_only: bool = False, quantizer: str = 'uniform', use_adaptive_predictor: bool = True, predictor_leak: float = 0.9999, **kwargs)

Bases: EmptyTransformer

Adaptive differential PCM smoothing inspired by the G.726 codec.

Implements proper G.726-style adaptive filtering with:
  • a 2-pole/6-zero adaptive predictor with decision-directed updates (not unlike an ARMA(2, 6) model)

  • non-uniform quantization optimized for time series

  • fast/slow scale factor adaptation with speed switching

fit(df)

Learn behavior of data to change.

Parameters:

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fit and Transform data.

Parameters:

df (pandas.DataFrame) – input dataframe

static get_new_params(method: str = 'random')
inverse_transform(df, trans_method: str = 'forecast', adjustment=None)

Inverse transform data.

Parameters:

df (pandas.DataFrame) – input dataframe

transform(df)

Transform data.

Parameters:

df (pandas.DataFrame) – input dataframe

autots.tools.g7xx_codec.g711_decode(encoded: ndarray | DataFrame, center: ndarray, scale: ndarray, mode: str = 'mu', mu: float = 255.0, A: float = 87.6, zero_offset: float = 0.0) ndarray

Decode from G.711 companded space back to original feature space.

Parameters:

zero_offset (float, optional) – Must match the offset used in encoding. Subtracted after decoding.

autots.tools.g7xx_codec.g711_encode(values: ndarray | DataFrame, mode: str = 'mu', mu: float = 255.0, A: float = 87.6, center: str = 'median', scale_method: str = 'mad', scale_factor: float = 3.0, min_scale: float = 1e-06, clip: bool = True, zero_offset: float = 0.0) tuple[ndarray, ndarray, ndarray]

Encode values via G.711-style companding after robust normalization.

Returns encoded array and the center/scale used so that decoding can recover the original feature space.

Parameters:

zero_offset (float, optional) – Small constant added before transformation to handle exact zeros. Useful for sparse/intermittent data. Default 0.0 (no offset).

autots.tools.g7xx_codec.g726_adpcm_filter(values: ndarray | DataFrame, quant_bits: int = 4, adaptation_rate: float = 0.96, prediction_alpha: float = 0.92, floor_step: float = 0.01, dynamic_range: float = 1.5, blend: float = 0.15, noise_gate: float = 0.0, quantizer: str = 'uniform', use_adaptive_predictor: bool = True, predictor_leak: float = 0.9999) ndarray

Apply the adaptive encode/decode cycle to an array of time series.

Parameters:
  • values (array-like or DataFrame) – Shape (observations, series). Each column is processed independently, but operations stay vectorized across series.

  • quant_bits (int, default 4) – Number of bits used for the quantizer (only for uniform mode).

  • adaptation_rate (float, default 0.96) – Only used when use_adaptive_predictor=False (legacy EMA mode).

  • prediction_alpha (float, default 0.92) – Only used when use_adaptive_predictor=False (legacy EMA mode).

  • floor_step (float, default 0.01) – Initial scale factor (minimum allowed value).

  • dynamic_range (float, default 1.5) – Initial scale multiplier (legacy mode only).

  • blend (float, default 0.15) – Optional blending factor with the predictor baseline. Must be in [0, 1].

  • noise_gate (float, default 0.0) – Additional soft threshold applied to residual magnitudes.

  • quantizer (str, default "uniform") – Quantization method: “uniform” for simple rounding, “nonuniform” for G.726-style decision levels optimized for time series.

  • use_adaptive_predictor (bool, default True) – Use 2-pole/6-zero adaptive predictor. If False, falls back to EMA.

  • predictor_leak (float, default 0.9999) – Leakage factor for predictor coefficients to prevent unbounded growth.

Returns:

Filtered data with the same shape as values.

Return type:

numpy.ndarray
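
Example (a sketch using defaults from the signature above):

import numpy as np
from autots.tools.g7xx_codec import g726_adpcm_filter

values = np.random.default_rng(0).normal(size=(200, 3))
denoised = g726_adpcm_filter(values, quant_bits=4, blend=0.15)
assert denoised.shape == values.shape  # same shape, smoothed values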

autots.tools.hierarchial module

autots.tools.hierarchial.compute_volatility_weights(S: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0) ndarray

Compute volatility-based weights for preferential adjustment of high-volatility series.

Parameters:
  • S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.

  • cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series.

  • volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv” (coefficient of variation)

  • volatility_power (float) – Power to raise volatility weights (higher values increase preference for volatile series)

Returns:

vol_weights – Volatility-weighted matrix where higher weights are placed on more volatile series.

Return type:

np.ndarray, shape (L, L)

autots.tools.hierarchial.erm_reconcile(S: ndarray, y_all: ndarray, W: ndarray) ndarray
ERM (Error or Empirical Risk Minimization) Reconciliation using robust numerical methods:

Solve the weighted least squares problem

\[\min_{y_{bottom}} \; \lVert y_{all} - S\, y_{bottom} \rVert_W^2\]

subject to hierarchical constraints. The closed-form solution for \(y_{bottom}^r\) is

\[y_{bottom}^r = (S^\top W S)^{-1} S^\top W\, y_{all}\]

so that

\[y_{all}^r = S\, y_{bottom}^r = S (S^\top W S)^{-1} S^\top W\, y_{all}.\]

Parameters:
  • S (np.ndarray, shape (L, M)) – Hierarchy matrix. L = # total levels, M = # bottom series.

  • y_all (np.ndarray, shape (T, L)) – Forecasts for T time points, dimension L.

  • W (np.ndarray, shape (L, L)) – Weight (covariance) matrix for the Weighted LS objective.

Returns:

y_all_reconciled – Reconciled forecasts for all L levels.

Return type:

np.ndarray, shape (T, L)

class autots.tools.hierarchial.hierarchial(grouping_method: str = 'tile', n_groups: int = 5, reconciliation: str = 'mean', grouping_ids: dict | None = None)

Bases: object

Create hierarchical series, then reconcile.

Currently only performs one-level groupings.

Parameters:
  • grouping_method (str) – method to create groups. ‘User’ requires hier_id input of groupings.

  • n_groups (int) – number of groups, if grouping_method is not ‘User’

  • reconciliation (str) – None, or ‘mean’ method to combine top and bottom forecasts

  • grouping_ids (dict) – dict of series_id: group_id to use if grouping is ‘User’

fit(df)

Construct and save object info.

reconcile(df)

Apply to forecasted data containing bottom and top levels.

transform(df)

Apply hierarchy to existing data with bottom levels only.
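
Example (a sketch of the fit/transform/reconcile workflow; in practice reconcile is applied to forecast output that contains both bottom and top levels):

import numpy as np
import pandas as pd
from autots.tools.hierarchial import hierarchial

idx = pd.date_range("2023-01-01", periods=100, freq="D")
df = pd.DataFrame(
    np.random.default_rng(0).normal(size=(100, 6)),
    index=idx,
    columns=[f"series_{i}" for i in range(6)],
)

hier = hierarchial(grouping_method="tile", n_groups=2, reconciliation="mean")
hier.fit(df)
expanded = hier.transform(df)          # bottom series plus group-level aggregates
# forecast `expanded` with any model, then reconcile the forecast output:
reconciled = hier.reconcile(expanded)  # shown on history only for illustration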

autots.tools.hierarchial.iterative_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, max_iterations: int = 10, convergence_threshold: float = 1e-06, damping_factor: float = 0.7) ndarray

Iterative MinT reconciliation that gradually converges to an optimal solution.

This method applies MinT reconciliation iteratively, updating the weight matrix based on reconciliation residuals from previous iterations.

Parameters:
  • S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.

  • y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.

  • W (np.ndarray, shape (L, L)) – The initial covariance (or weighting) matrix.

  • max_iterations (int) – Maximum number of iterations.

  • convergence_threshold (float) – Convergence threshold for relative change in reconciled forecasts.

  • damping_factor (float) – Damping factor for weight matrix updates (0 < damping_factor < 1).

Returns:

y_all_reconciled – Reconciled forecasts for all L levels.

Return type:

np.ndarray, shape (T, L)

autots.tools.hierarchial.iterative_volatility_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0, volatility_mix: float = 0.5, max_iterations: int = 10, convergence_threshold: float = 1e-06, damping_factor: float = 0.7) ndarray

Combined iterative and volatility-weighted MinT reconciliation.

This method combines both approaches: volatility-based weighting and iterative refinement.

Parameters:
  • S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.

  • y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.

  • W (np.ndarray, shape (L, L)) – The base covariance (or weighting) matrix.

  • cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series for volatility computation.

  • volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv”

  • volatility_power (float) – Power to raise volatility weights.

  • volatility_mix (float) – Mixing parameter between base weights and volatility weights.

  • max_iterations (int) – Maximum number of iterations.

  • convergence_threshold (float) – Convergence threshold for relative change in reconciled forecasts.

  • damping_factor (float) – Damping factor for weight matrix updates.

Returns:

y_all_reconciled – Reconciled forecasts for all L levels.

Return type:

np.ndarray, shape (T, L)

autots.tools.hierarchial.ledoit_wolf_covariance(X: ndarray, assume_centered: bool = False) ndarray

Computes the Ledoit-Wolf shrunk covariance matrix of X.

Parameters:
  • X (np.ndarray, shape (n_samples, n_features)) – The data matrix. Each row is an observation, each column is a variable.

  • assume_centered (bool) – If True, X is assumed to already be centered.

Returns:

lw_cov – The Ledoit-Wolf shrunk covariance matrix estimate.

Return type:

np.ndarray, shape (n_features, n_features)

Notes

  • This shrinks the sample covariance toward the identity matrix.

  • The shrinkage intensity gamma is determined from data per Ledoit & Wolf (2004).

autots.tools.hierarchial.mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray) ndarray
MinT reconciliation formula using robust numerical methods:

\[y_{all}^r = S (S^\top W^{-1} S)^{-1} S^\top W^{-1} y_{all}\]

Parameters:
  • S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix. L = number of hierarchical levels (top + middle + bottom), M = number of bottom-level series.

  • y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points (the second dimension L must match S.shape[0]).

  • W (np.ndarray, shape (L, L)) – The (regularized) covariance (or weighting) matrix for the hierarchical levels.

Returns:

y_all_reconciled – Reconciled forecasts for all L levels (top, middle, bottom).

Return type:

np.ndarray, shape (T, L)
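
Example (a worked sketch with an identity weight matrix, which reduces MinT to OLS reconciliation):

import numpy as np
from autots.tools.hierarchial import mint_reconcile

# hierarchy: one total over two bottom series (L=3, M=2)
S = np.array([[1.0, 1.0],   # total = a + b
              [1.0, 0.0],   # a
              [0.0, 1.0]])  # b
y_all = np.array([[10.0, 4.0, 5.0]])  # incoherent: 4 + 5 != 10
W = np.eye(3)                          # identity weights => OLS reconciliation

reconciled = mint_reconcile(S, y_all, W)
# after reconciliation, reconciled[0, 0] == reconciled[0, 1] + reconciled[0, 2]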

autots.tools.hierarchial.volatility_weighted_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0, volatility_mix: float = 0.5) ndarray

Volatility-weighted MinT reconciliation that preferentially adjusts high-volatility series.

The method combines traditional MinT with volatility-based weighting: W_vol = (1 - α) * W + α * V where V is the volatility-based weight matrix and α is the mixing parameter.

Parameters:
  • S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.

  • y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.

  • W (np.ndarray, shape (L, L)) – The base covariance (or weighting) matrix.

  • cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series for volatility computation.

  • volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv”

  • volatility_power (float) – Power to raise volatility weights.

  • volatility_mix (float) – Mixing parameter between base weights (0) and volatility weights (1).

Returns:

y_all_reconciled – Reconciled forecasts for all L levels.

Return type:

np.ndarray, shape (T, L)

autots.tools.holiday module

Manage holiday features.

autots.tools.holiday.holiday_flag(DTindex, country: str = 'US', encode_holiday_type: bool = False, holidays_subdiv=None)

Create a 0/1 flag for a given datetime index. Includes a fallback to pandas for US holidays if the holidays package is unavailable.

Parameters:
  • DTindex (pd.DatetimeIndex) – DatetimeIndex of dates to create flags for

  • country (str) – passed through to the python holidays package; also accepts a list of countries, but not a list of subdivisions

  • encode_holiday_type (bool) – if True, each holiday gets a unique integer column; if False, 0/1 for all holidays

  • holidays_subdiv (str) – subdivision (i.e. state), if used

Returns:

pd.DataFrame with DatetimeIndex
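
Example (a minimal sketch assuming the holidays package is installed):

import pandas as pd
from autots.tools.holiday import holiday_flag

dates = pd.date_range("2024-01-01", "2024-12-31", freq="D")
flags = holiday_flag(dates, country="US")
print(flags[flags.iloc[:, 0] != 0].head())  # dates flagged as US holidays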

autots.tools.holiday.query_holidays(DTindex, country: str, encode_holiday_type: bool = False, holidays_subdiv=None)

Query holidays package for dates.

Parameters:
  • DTindex (pd.DatetimeIndex) – DatetimeIndex of dates to create flags for

  • country (str) – passed through to the python holidays package

  • encode_holiday_type (bool) – if True, each holiday gets a unique integer column, if False, 0/1 for all holidays

autots.tools.impute module

Fill NA.

autots.tools.impute.FillNA(df, method: str = 'ffill', window: int = 10)

Fill NA values using different methods.

Parameters:
  • method (str) – method to fill NA values:
    ‘ffill’ - fill the most recent non-NA value forward until another non-NA value is reached
    ‘zero’ - fill with zero. Useful for sales and other data where NA usually does mean $0.
    ‘mean’ - fill all missing values with the series’ overall average value
    ‘median’ - fill all missing values with the series’ overall median value
    ‘rolling mean’ - fill with the mean of the last n (window) values
    ‘ffill mean biased’ - simple average of ffill and mean
    ‘fake date’ - shift data forward over NaN, so values will have incorrect timestamps
    ‘seasonal_linear’ - seasonally-aware linear regression imputation using datetime and local features
    ‘seasonal_linear_window_3’ - seasonal linear with window=3
    ‘seasonal_linear_window_10’ - seasonal linear with window=10
    also accepts most method values of pd.DataFrame.interpolate()

  • window (int) – length of rolling windows for filling na, for rolling methods
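
Example (a minimal sketch of two of the methods above):

import numpy as np
import pandas as pd
from autots.tools.impute import FillNA

idx = pd.date_range("2024-01-01", periods=8, freq="D")
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0, 7.0, np.nan]}, index=idx
)

print(FillNA(df, method="ffill"))
print(FillNA(df, method="rolling mean", window=3))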

class autots.tools.impute.SeasonalityMotifImputer(k: int = 3, datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False)

Bases: object

impute(df)

Infer missing values on input df.

class autots.tools.impute.SimpleSeasonalityMotifImputer(datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False, max_iter: int = 100)

Bases: object

impute(df)

Infer missing values on input df.

autots.tools.impute.biased_ffill(df, mean_weight: float = 1)

Fill NaN with average of last value and mean.

autots.tools.impute.fake_date_fill(df, back_method: str = 'slice')

Numpy vectorized version. Return a dataframe where na values are removed and values shifted forward.

Warning

Thus, values will have incorrect timestamps!

Parameters:

back_method (str) – how to deal with tails left by shifting NaN:
  • ‘bfill’ - back fill the last value

  • ‘slice’ - drop any rows above a threshold where half are NaN, then bfill the remainder

  • ‘slice_all’ - drop any rows with any NaN

  • ‘keepna’ - keep the lagging NaN

autots.tools.impute.fake_date_fill_old(df, back_method: str = 'slice')

Return a dataframe where na values are removed and values shifted forward.

Warning

Thus, values will have incorrect timestamps!

Parameters:

back_method (str) – how to deal with tails left by shifting NaN:
  • ‘bfill’ - back fill the last value

  • ‘slice’ - drop any rows above a threshold where half are NaN, then bfill the remainder

  • ‘slice_all’ - drop any rows with any NaN

  • ‘keepna’ - keep the lagging NaN

autots.tools.impute.fill_forward(df)

Fill NaN with previous values.

autots.tools.impute.fill_forward_alt(df)

Fill NaN with previous values.

autots.tools.impute.fill_mean(df)
autots.tools.impute.fill_mean_old(df)

Fill NaN with mean.

autots.tools.impute.fill_median(df)

Fill nan with median values. Does not work with non-numeric types.

autots.tools.impute.fill_median_old(df)

Fill NaN with median.

autots.tools.impute.fill_one(df)

Fill NaN with one.

autots.tools.impute.fill_zero(df)

Fill NaN with zero.

autots.tools.impute.fillna_np(array, values)
autots.tools.impute.rolling_mean(df, window: int = 10)

Fill NaN with mean of last window values.

autots.tools.impute.seasonal_linear_imputer(df, datepart_method: str = 'common_fourier', window: int = 5, lambda_: float = 0.01)

Seasonally-aware linear interpolation using multioutput linear regression.

This method creates a feature set consisting of local linear trend features (changepoints) and datetime features (seasonal patterns). It trains a ridge regression model on all non-missing points, then predicts on all missing points to fill them.

This is fully vectorized for speed on large datasets and uses matrix operations across the entire dataframe. Unlike BasicLinearModel, this handles missing data by training only on non-NaN rows.

Parameters:
  • df (pd.DataFrame) – DataFrame with datetime index and potential missing values

  • datepart_method (str) – Method for generating seasonal features. Default is ‘common_fourier’

  • window (int) – Controls changepoint spacing for local linear trend features. Default is 5

  • lambda_ (float) – Ridge regression regularization parameter. Default is 0.01

Returns:

DataFrame with missing values filled

Return type:

pd.DataFrame

Example

>>> df_filled = seasonal_linear_imputer(df, window=7)
>>> # Or via FillNA:
>>> df_filled = FillNA(df, method='seasonal_linear', window=10)

autots.tools.kalman module

Created on Wed Feb 5 16:29:58 2025

autots.tools.kalman.kalman_fusion_forecasts(F: ndarray, index, columns, coverage: float = 0.9, method: str = 'multi_series', Q_init=0.1, R_init=1.0, adapt_Q: str | None = None, adapt_R: str | None = None, initial_x=None, initial_P=None, min_std=1e-15, scale: bool = True, a: float = 1.0)

Fuse multiple forecasts using a Kalman Filter for each forecast step.

Parameters:
  • F (np.ndarray) – Forecasts of shape (n, S, T) - n = number of models - S = forecast length (time steps) - T = number of time series

  • index (pd.Index) – Index for the returned DataFrames (length=S).

  • columns (list-like) – Column names for the returned DataFrames (length=T).

  • coverage (float) – e.g. 0.90 => ~1.645 standard deviations for intervals.

  • method (str) – Either “multi_series” or “per_series”. - “multi_series”: single (T-dim) Kalman Filter for all T series together - “per_series”: run T separate 1D Kalman Filters, one per series

  • Q_init (float or np.ndarray) – Base process noise (or matrix).

  • R_init (float or np.ndarray) – Base measurement noise (or matrix).

  • adapt_Q (str) – If “spread”, adapt Q each step based on model spread.

  • adapt_R (str) – If “spread”, adapt R each step based on model spread.

  • initial_x (np.ndarray) – initial state guess. - For multi_series: shape (T,) - For per_series: shape (T,)

  • initial_P (np.ndarray) – initial covariance guess. - For multi_series: shape (T, T) - For per_series: shape (T,)

  • min_std (float) – Small floor value for numerical stability in standard deviations.

  • scale (bool) – If True, each series is scaled (by its standard deviation) before filtering. The fused forecasts are converted back to the original scale.

Returns:
  • df_point (pd.DataFrame) – Kalman-fused point forecasts, shape (S, T)

  • df_lower (pd.DataFrame) – lower bound, shape (S, T)

  • df_upper (pd.DataFrame) – upper bound, shape (S, T)

Return type:

(df_point, df_lower, df_upper) tuple of pd.DataFrame

Notes

  • In “multi_series” mode the state is a T-dimensional vector that is updated with an (n*T)-dimensional measurement (the stacked forecasts from the n models).

  • In “per_series” mode the algorithm runs T separate 1D Kalman Filters (one per series). In that case, for each forecast step (loop over S) you update each of the T filters independently.

  • When scale=True, each series is divided by its standard deviation (computed over all forecasts) before filtering; after filtering the outputs are multiplied by the same scale so that the returned forecasts are in the original feature space.
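
Example

A minimal usage sketch with random stand-in forecasts, assuming the tuple of three DataFrames described under Returns:

>>> import numpy as np
>>> import pandas as pd
>>> F = np.random.rand(3, 14, 2)  # n=3 models, S=14 steps, T=2 series
>>> idx = pd.date_range("2024-01-01", periods=14, freq="D")
>>> point, lower, upper = kalman_fusion_forecasts(F, index=idx, columns=["a", "b"])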

autots.tools.kalman.robust_pinv(M, rcond=1e-15, reg=1e-08)

autots.tools.lunar module

Phases of the moon. Modified from https://stackoverflow.com/a/2531541/9492254 by keturn and earlier from John Walker

autots.tools.lunar.dcos(d)
autots.tools.lunar.dsin(d)
autots.tools.lunar.fixangle(a)
autots.tools.lunar.kepler(m, ecc=0.016718)

Solve the equation of Kepler.

autots.tools.lunar.moon_phase(datetime_index, epsilon=1e-06, epoch=2444237.905, ecliptic_longitude_epoch=278.83354, ecliptic_longitude_perigee=282.596403, eccentricity=0.016718, moon_mean_longitude_epoch=64.975464, moon_mean_perigee_epoch=349.383063)

Numpy version. Takes a pd.DatetimeIndex and returns moon phase (% illuminated). Epoch can be adjusted slightly (0.5 = half day) to account for time zones. The default is for the US; epoch=2444238.5 for Asia generally.

autots.tools.lunar.moon_phase_alternative(datetime_index, epoch=2451545.0)

Numpy version. Takes a pd.DatetimeIndex and returns moon phase (% illuminated).

autots.tools.lunar.moon_phase_df(datetime_index, epoch=2444237.905)

Convert pandas DatetimeIndex to moon phases. Note that timezone and hour can matter slightly. Epoch can be adjusted slightly (0.5 = half day) to account for time zones. 2444237.905 is for US Central; epoch=2444238.5 for Asia generally.
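
Example

A short usage sketch with arbitrary dates, using the default epochs:

>>> import pandas as pd
>>> dates = pd.date_range("2024-01-01", periods=30, freq="D")
>>> illumination = moon_phase(dates)  # % illuminated per date
>>> phase_features = moon_phase_df(dates)  # DataFrame of moon phase values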

autots.tools.lunar.moon_phase_df_alternative(datetime_index, epoch=None)

Convert pandas DatetimeIndex to moon phases. Note timezone and hour can matter slightly.

Parameters:
  • datetime_index – pandas DatetimeIndex

  • epoch – If provided, uses the approximate (legacy) method for backward compatibility. If None, uses the new more accurate Jean Meeus method.

autots.tools.lunar.phase_string(p, precision=0.05, new=0.0, first=0.25, full=0.4, last=0.75, nextnew=1.0)
autots.tools.lunar.todeg(r)
autots.tools.lunar.torad(d)

autots.tools.mocks module

Fake/local imports for when libraries aren’t available, to prevent the whole package from failing to load.

class autots.tools.mocks.DataLoader(dataset, *args, **kwargs)

Bases: object

class autots.tools.mocks.Dataset(*args, **kwargs)

Bases: object

class autots.tools.mocks.MinMaxScaler(feature_range=(0, 1))

Bases: object

fit(X)
fit_transform(X)
transform(X)
class autots.tools.mocks.Module(*args, **kwargs)

Bases: object

eval()
forward(*args, **kwargs)
load_state_dict(state_dict, strict=True)
state_dict()
train(mode=True)
class autots.tools.mocks.StandardScaler

Bases: object

StandardScaler that works with both numpy arrays and pandas DataFrames. Compatible with sklearn.preprocessing.StandardScaler interface.

fit(X)

Compute the mean and standard deviation for each feature.

fit_transform(X)

Fit the scaler and transform the dataset.

inverse_transform(X)

Revert the scaled data back to the original scale.

transform(X)

Scale the dataset using the stored mean and standard deviation.

class autots.tools.mocks.TensorDataset(*tensors)

Bases: object

autots.tools.mocks.butter(x)
autots.tools.mocks.curve_fit(x)
autots.tools.mocks.fftconvolve(x)
class autots.tools.mocks.norm(loc=0, scale=1)

Bases: object

cdf(x)

Cumulative distribution function using error function.

ppf(q)

Simple normal quantile approximation using Abramowitz & Stegun rational approximation.

autots.tools.mocks.savgol_filter(x)
autots.tools.mocks.sosfiltfilt(x)
autots.tools.mocks.tqdm(x, **kwargs)

autots.tools.percentile module

Faster percentile and quantile for numpy

Entirely from: https://krstn.eu/np.nanpercentile()-there-has-to-be-a-faster-way/

autots.tools.percentile.nan_percentile(in_arr, q, method='linear', axis=0, errors='raise')

Given a 3D array, return the percentiles given by q. Beware this is only tested for the limited case required here and will not match np fully; args are more limited. If errors=”rollover”, passes to np.nanpercentile where args are not supported.

autots.tools.percentile.nan_quantile(arr, q, method='linear', axis=0, errors='raise')

Same as nan_percentile but accepts q in the range [0, 1]. Args are more limited. If errors=”rollover”, passes to np.nanpercentile where not supported.
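
Example

A minimal sketch on a small 3D array containing a NaN, using the default axis=0:

>>> import numpy as np
>>> arr = np.random.rand(5, 4, 3)
>>> arr[0, 0, 0] = np.nan
>>> p90 = nan_percentile(arr, q=90)  # NaN-aware 90th percentile
>>> q90 = nan_quantile(arr, q=0.9)   # same, with q on the [0, 1] scale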

autots.tools.percentile.trimmed_mean(data, percent, axis=0)

autots.tools.plotting module

Shared plotting utilities for feature-rich time series.

autots.tools.plotting.calculate_peak_density(model: str, data: DataFrame, group_col: str = 'Model', y_col: str = 'TotalRuntimeSeconds')

Maximum KDE value for the given model’s distribution.

autots.tools.plotting.create_seaborn_palette_from_cmap(cmap_name: str = 'gist_rainbow', n: int = 10)

Return seaborn palette sampling the given matplotlib cmap.

autots.tools.plotting.plot_distributions(runtimes_data: DataFrame, group_col: str = 'Model', y_col: str = 'TotalRuntimeSeconds', xlim: float | None = None, xlim_right: float | None = None, title_suffix: str = '')

Plot runtime density per group with custom palette.

autots.tools.plotting.plot_feature_panels(series_name: str, date_index: DatetimeIndex, series_data: Series | ndarray, components: Mapping[str, Iterable[float]], labels: Mapping[str, Iterable], series_type_description: str | None = None, scale: float | None = None, noise_to_signal: float | None = None, figsize=(16, 12), title_prefix: str = 'Feature Analysis', save_path: str | None = None, show: bool = True)

Create a four-panel diagnostic plot shared by generator and detector.

autots.tools.plotting.plot_forecast_with_intervals(plot_df: DataFrame, actual_col: str | None = 'actuals', forecast_col: str = 'forecast', lower_col: str = 'low_forecast', upper_col: str = 'up_forecast', title: str | None = None, colors: Mapping[str, str] | None = None, include_bounds: bool = True, alpha: float = 0.3, band_color: str | None = None, interval_label: str | None = 'Prediction Interval', band_kwargs: Mapping[str, Any] | None = None, plot_lines: bool = True, ax=None, **plot_kwargs)

Plot forecast (and optionally actuals) with confidence bounds.

autots.tools.plotting.plot_risk_score_bar(risk_data: Series | ndarray, index=None, bar_color: str = '#6495ED', bar_ylim: tuple | list | None = None, title: str = 'Risk Score', ylabel: str = 'Risk', xlabel: str = 'Forecast Horizon', ax=None, **bar_kwargs)

Plot risk scores as a bar chart.

Utility function for plotting event risk or similar probability scores.

Parameters:
  • risk_data – Series or array of risk scores to plot

  • index – x-axis values; if None, uses range or Series index

  • bar_color – color for bars

  • bar_ylim – y-axis limits as (min, max) or [min, max]

  • title – chart title

  • ylabel – y-axis label

  • xlabel – x-axis label

  • ax – matplotlib axis to plot on; if None, creates new subplot

  • **bar_kwargs – additional arguments passed to ax.bar()

Returns:

matplotlib axis

autots.tools.plotting.plot_simulation_paths(simulations: ndarray, index=None, colors: list | None = None, alpha: float = 0.9, linewidth: float = 1.2, ax=None, **plot_kwargs)

Plot multiple simulation/forecast paths.

Utility for plotting Monte Carlo simulations, motif neighbors, or ensemble members.

Parameters:
  • simulations – 2D array of shape (n_simulations, n_timesteps)

  • index – x-axis values; if None, uses range

  • colors – list of color strings for each path; if None, uses random grays

  • alpha – transparency for lines

  • linewidth – width of lines

  • ax – matplotlib axis to plot on; if None, creates new subplot

  • **plot_kwargs – additional arguments passed to ax.plot()

Returns:

matplotlib axis
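
Example

A minimal sketch plotting random-walk stand-ins for simulation paths:

>>> import numpy as np
>>> sims = np.cumsum(np.random.randn(20, 60), axis=1)  # (n_simulations, n_timesteps)
>>> ax = plot_simulation_paths(sims, alpha=0.5)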

autots.tools.probabilistic module

Point to Probabilistic

autots.tools.probabilistic.Point_to_Probability(train, forecast, prediction_interval=0.9, method: str = 'historic_quantile')

Data driven placeholder for model error estimation.

Catlin Point to Probability method (‘a mixture of dark magic and gum disease’)

Parameters:
  • train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex

  • forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.

  • prediction_interval (float) – confidence or perhaps credible interval

  • method (str) – spell to cast to create dark magic. ‘historic_quantile’, ‘inferred_normal’, ‘variable_pct_change’. Gum disease available separately upon request.

Returns:

upper_error, lower_error (two pandas.DataFrames for upper and lower bound respectively)
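
Example

A minimal sketch, assuming train and forecast are aligned wide DataFrames as described above:

>>> up, low = Point_to_Probability(
...     train, forecast, prediction_interval=0.9, method='historic_quantile'
... )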

autots.tools.probabilistic.Variable_Point_to_Probability(train, forecast, alpha=0.3, beta=1)

Data driven placeholder for model error estimation.

ErrorRange = beta * (E_n + alpha * E_{n-1}), where E_{n-1} is the cumulative sum of E_n
E_n = abs(0.5 - QTP) * D
D = abs(X_n - ((Avg % Change of Train * X_{n-1}) + X_{n-1}))
X_n = forecast value
QTP = percentile of Score among all percent changes of train
Score = percent change (from X_{n-1} to X_n)

Parameters:
  • train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex

  • forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.

  • alpha (float) – parameter which affects the broadening of the error range over time. Usually 0 < alpha < 1 (although it can be larger than 1)

  • beta (float) – parameter which affects the general width of the error bar. Usually 0 < beta < 1 (although it can be larger than 1)

Returns:

error width for each value of forecast.

Return type:

ErrorRange (pandas.DataFrame)

autots.tools.probabilistic.historic_quantile(df_train, prediction_interval: float = 0.9, nan_flag=None)

Computes the difference between the median and the prediction interval range in historic data.

Parameters:
  • df_train (pd.DataFrame) – a dataframe of training data

  • prediction_interval (float) – the desired forecast interval range

Returns:

two 1D arrays

Return type:

lower, upper (np.array)

autots.tools.probabilistic.inferred_normal(train, forecast, n: int = 5, prediction_interval: float = 0.9)

A corruption of Bayes theorem. It will be sensitive to the transformations of the data.

autots.tools.probabilistic.percentileofscore_appliable(x, a, kind='rank')

autots.tools.profile module

autots.tools.regressor module

autots.tools.seasonal module

seasonal

@author: Colin

autots.tools.seasonal.anchored_segment_fourier_features(DTindex, method: str)
autots.tools.seasonal.anchored_warped_fourier_features(DTindex, method: str)
autots.tools.seasonal.create_datepart_components(DTindex, seasonality)

Single date part one-hot flags.

autots.tools.seasonal.create_seasonality_feature(DTindex, t, seasonality, history_days=None)

Cassandra-designed feature generator.

autots.tools.seasonal.date_part(DTindex, method: str = 'simple', set_index: bool = True, polynomial_degree: int | None = None, holiday_country: str | None = None, holiday_countries_used: bool = True, lags: int | None = None, forward_lags: int | None = None)

Create date part columns from pd.DatetimeIndex.

If your date_part method isn’t recognized, you will get a [‘year’, ‘month’, ‘day’, ‘weekday’] output

Parameters:
  • DTindex (pd.DatetimeIndex) – datetime index to provide dates

  • method (str) – ‘expanded’, ‘recurring’, or ‘simple’. “simple” - just day, year, month, weekday; “expanded” - all available features; “recurring” - all features that should commonly repeat without aging. Also available: simple_2, simple_3, simple_binarized, expanded_binarized, common_fourier

  • set_index (bool) – if True, return DTindex as index of df

  • polynomial_degree (int) – add this degree of sklearn polynomial features if not None

  • holiday_country (list or str) – names of countries to pull calendar holidays for

  • holiday_countries_used (bool) – to use holiday_country if given

  • lags (int) – if not None, include the past N previous index date parts

  • forward_lags (int) – if not None, include the future N index date parts

Returns:

pd.DataFrame with DTindex
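
Example

A short sketch generating two styles of date part features:

>>> import pandas as pd
>>> DTindex = pd.date_range("2023-01-01", periods=90, freq="D")
>>> simple_feats = date_part(DTindex, method="simple")
>>> fourier_feats = date_part(DTindex, method="common_fourier")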

autots.tools.seasonal.fourier_df(DTindex, seasonality, order=10, t=None, history_days=None)
autots.tools.seasonal.fourier_series(t, p=365.25, n=10)
autots.tools.seasonal.random_datepart(method='random')

New random parameters for seasonality.

autots.tools.seasonal.seasonal_independent_match(DTindex, DTindex_future, k, datepart_method='simple_binarized', distance_metric='canberra', full_sort=False, nan_array=None)
autots.tools.seasonal.seasonal_int(include_one: bool = False, small=False, very_small=False)

Generate a random integer of typical seasonalities.

Parameters:
  • include_one (bool) – whether to include 1 in output options

  • small (bool) – if True, keep below 364

  • very_small (bool) – if True, keep below 30

autots.tools.seasonal.seasonal_repeating_wavelet(DTindex, p, order=12, sigma=4.0, wavelet_type='morlet')
autots.tools.seasonal.seasonal_window_match(DTindex, k, window_size=10, forecast_length=None, datepart_method='common_fourier_rw', distance_metric='mae', full_sort=False)

autots.tools.shaping module

Reshape data.

class autots.tools.shaping.NumericTransformer(na_strings: list = ['', ' '], categorical_fillna: str = 'ffill', handle_unknown: str = 'use_encoded_value', downcast: str | None = None, verbose: int = 0)

Bases: object

General purpose numeric conversion for pandas dataframes.

All categorical data and levels must be passed to .fit(). If new categorical series or levels are present in .transform() it won’t work!

Currently datetimes cannot be inverse_transformed back to datetime

Parameters:
  • na_strings (list) – list of strings to replace as pd.NA

  • categorical_fillna (str) – how to fill NaN for categorical variables (numeric NaN are unaltered). “ffill” - uses forward and backward filling to supply na values; “indicator” or anything else currently results in all missing replaced with str “missing_value”

  • handle_unknown (str) – passed through to scikit-learn OrdinalEncoder

  • downcast (str) – passed to pd.to_numeric, use None or ‘float’

  • verbose (int) – greater than 0 to print some messages

fit(df)

Learn behavior of data to change.

Parameters:

df (pandas.DataFrame) – input dataframe

fit_transform(df)

Fits and Returns Magical DataFrame.

Parameters:

df (pandas.DataFrame) – input dataframe

inverse_transform(df, convert_dtypes: bool = False)

Convert numeric back to categorical.

Parameters:
  • df (pandas.DataFrame) – df

  • convert_dtypes (bool) – whether to use pd.convert_dtypes after inverse

transform(df)

Convert categorical dataset to numeric.
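
Example

A minimal round-trip sketch with hypothetical mixed-type data:

>>> import pandas as pd
>>> df = pd.DataFrame({"num": [1.0, 2.0, 3.0], "cat": ["a", "b", "a"]})
>>> trans = NumericTransformer()
>>> numeric = trans.fit_transform(df)            # categories encoded to numeric
>>> restored = trans.inverse_transform(numeric)  # categories restored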

autots.tools.shaping.clean_weights(weights, series, verbose=0)

Polish up series weighting information.

Parameters:
  • weights (dict) – dictionary of series_id: weight (float or int)

  • series (iterable) – list of series_ids in the dataset

autots.tools.shaping.df_cleanup(df_wide, frequency: str = 'infer', prefill_na: str | None = None, na_tolerance: float = 0.999, drop_data_older_than_periods: int = 100000, drop_most_recent: int = 0, aggfunc: str = 'first', verbose: int = 1)

Pass cleaning functions through to dataframe.

Parameters:
  • df_wide (pd.DataFrame) – input dataframe to clean.

  • frequency (str, optional) – frequency in string of alias for DateOffset object, normally “1D” - daily, “MS” - month start, etc. Currently, aliases are listed somewhere in here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html. Defaults to “infer”.

  • na_tolerance (float, optional) – allow up to this fraction of values to be NaN, else drop the entire series. The default of 0.999 means a series can be 99.9% NaN values and still be included. Defaults to 0.999.

  • drop_data_older_than_periods (int, optional) – cut off older data because eventually you just get too much. Defaults to 100000.

  • drop_most_recent (int, optional) – number of most recent data points to remove. Useful if you pull monthly data before month end, and you don’t want an incomplete month appearing complete. Defaults to 0.

  • aggfunc (str, optional) – passed to pd.pivot_table, determines how to aggregate duplicates for upsampling. Other options include “mean” and other numpy functions; beware data must already be input as numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended. Defaults to ‘first’.

  • verbose (int, optional) – 0 for silence, higher values for more noise. Defaults to 1.

Returns:

original dataframe, now possibly shorter.

Return type:

pd.DataFrame

autots.tools.shaping.drop_leading_na_block(df: DataFrame, threshold: float = 0.75, window: int = 7, max_drop_fraction: float = 0.5, warn: bool = True) → DataFrame

Drop an initial leading block of rows if:
  • Those rows are ‘bad’ (row NaN proportion > threshold), AND

  • There are at least window consecutive bad rows starting from the top.

Will warn or raise if more than max_drop_fraction of rows would be dropped.

Parameters:
  • df – pandas DataFrame with a DatetimeIndex.

  • threshold – Proportion of NaNs to classify a row as bad.

  • window – Number of consecutive leading bad rows required.

  • max_drop_fraction – Maximum allowed fraction of rows to drop.

  • warn – If True, raises a UserWarning on excessive drop; if False, proceeds with printed warning.

Returns:

Trimmed DataFrame (or identical DataFrame if no qualifying block found).

autots.tools.shaping.freq_to_timedelta(freq)

Working around pandas limitations.

autots.tools.shaping.infer_frequency(df_wide, warn=True, **kwargs)

Infer the frequency in a slightly more robust way.

Parameters:
  • df_wide (pd.DataFrame or pd.DatetimeIndex) – input to pull frequency from

  • warn (bool) – unused, here to make swappable with pd.infer_freq

autots.tools.shaping.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', aggfunc: str = 'first')

Take long data and convert into wide, cleaner data.

Parameters:
  • df (pd.DataFrame) – long format data

  • date_col (str) – name of the column containing the datetimes

  • value_col (str) – the name of the column with the values of the time series (ie sales $)

  • id_col (str) – name of the id column, unique for each time series

  • aggfunc (str) – passed to pd.pivot_table, determines how to aggregate duplicates for series_id and datetime. Other options include “mean” and other numpy functions; beware data must already be input as numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended.
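
Example

A brief sketch with hypothetical long-format data, using the default column names:

>>> import pandas as pd
>>> long_df = pd.DataFrame({
...     "datetime": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
...     "series_id": ["a", "b", "a"],
...     "value": [1.0, 2.0, 3.0],
... })
>>> wide_df = long_to_wide(long_df)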

autots.tools.shaping.simple_train_test_split(df, forecast_length: int = 10, min_allowed_train_percent: float = 0.3, verbose: int = 1)

Uses the last forecast_length periods as the test set and the rest as train.

Parameters:
  • forecast_length (int) – number of future periods to predict

  • min_allowed_train_percent (float) – forecast length cannot be greater than 1 - this; constrains the forecast length from being much larger than the training data. Note this includes NaNs in the current configuration.

Returns:

train, test (both pd.DataFrames)
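
Example

A minimal sketch, assuming df is a wide DataFrame with a DatetimeIndex:

>>> train, test = simple_train_test_split(df, forecast_length=14)
>>> len(test)  # 14, the last forecast_length periods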

autots.tools.shaping.split_digits_and_non_digits(s)
autots.tools.shaping.subset_series(df, weights, n: int = 1000, random_state: int = 2020)

Return a sample of time series.

Parameters:
  • df (pd.DataFrame) – wide df with series as columns and DT index

  • n (int) – number of unique time series to keep, or None

  • random_state (int) – random seed

autots.tools.shaping.wide_to_3d(wide_arr, seasonality=7, output_shape='gst')

Generates 3d (groups/seasonality, series, time steps) from wide (time step, series) numpy array.

Parameters:
  • wide_arr (np.array) – wide style (timesteps, series) numpy time series

  • seasonality (int) – seasonality of the series to use, avoid really large values

  • output_shape (str) – either ‘gst’ or ‘sgt’, the output shape. gst: (groups/seasonality, series, time steps); sgt: (series, groups/seasonality, time steps)
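
Example

A small sketch assuming 28 daily observations and weekly seasonality; the expected output shape is an inference from the description above:

>>> import numpy as np
>>> arr = np.random.rand(28, 3)  # (time steps, series)
>>> cube = wide_to_3d(arr, seasonality=7, output_shape='gst')
>>> cube.shape  # expected (7, 3, 4): weekday groups, series, weeks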

autots.tools.thresholding module

Created on Thu Jul 7 10:27:46 2022

@author: Colin

class autots.tools.thresholding.NonparametricThreshold(data, warmup_pts: int = 1, p=0.1, error_buffer=1, z_init=2.5, z_limit=12.0, z_step=0.5, max_contamination=0.25, mean_weight: float = 10, sd_weight: float = 10, anomaly_count_weight: float = 1, inverse: bool = False)

Bases: object

compare_to_epsilon(inverse=False)

Compare smoothed error values to epsilon (error threshold) and group consecutive errors together into sequences.

Parameters:

errors_all (obj) – Errors class object containing a list of all previously identified anomalies in the test set

find_epsilon(inverse=False)

Find the anomaly threshold that maximizes a function representing the tradeoff between:

  1. the number of anomalies and anomalous ranges

  2. the reduction in mean and standard deviation if anomalous points are removed from errors

(see https://arxiv.org/pdf/1802.04431.pdf)

Parameters:

inverse (bool) – If true, epsilon is calculated for inverted errors

prune_anoms(inverse=False)

Remove anomalies that don’t meet minimum separation from the next closest anomaly or error value.

Parameters:

inverse (bool) – If true, epsilon is calculated for inverted errors

score_anomalies()

Calculate anomaly scores based on max distance from epsilon for each anomalous sequence.

autots.tools.thresholding.consecutive_groups(iterable, ordering=<function <lambda>>)

Yield groups of consecutive items using itertools.groupby().

From more_itertools package, see description there for details (circa mid 2022)
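
Example

A short sketch mirroring the more_itertools behavior:

>>> [list(g) for g in consecutive_groups([1, 2, 3, 10, 11, 15])]
[[1, 2, 3], [10, 11], [15]]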

autots.tools.thresholding.nonparametric(series, method_params)

autots.tools.transform module

autots.tools.wavelet module

autots.tools.wavelet.continuous_db2_wavelet(t, p, order, sigma)
autots.tools.wavelet.create_daubechies_db2_wavelet(p)

Create a Daubechies db2 wavelet with specified period p.

Parameters:
  • p (int) – The period or length to generate the wavelet.

Returns:

The generated Daubechies db2 wavelet.

Return type:

np.ndarray

autots.tools.wavelet.create_gaussian_wavelet(p, frequency=3, sigma=1.0)

Create a Gaussian-modulated cosine wavelet with specified frequency and sigma.

Parameters:
  • p (float) – The period or length to generate the wavelet.

  • frequency (int) – Frequency of the cosine wave.

  • sigma (float) – Standard deviation for the Gaussian envelope.

Returns:

The generated Gaussian-modulated wavelet.

Return type:

np.ndarray

autots.tools.wavelet.create_haar_wavelet(p)

Create a Haar wavelet with specified period p.

Parameters:
  • p (float) – The period or length to generate the wavelet.

Returns:

The generated Haar wavelet.

Return type:

np.ndarray

autots.tools.wavelet.create_mexican_hat_wavelet(p, frequency=None, sigma=1.0)

Create a Mexican Hat wavelet (Ricker wavelet) with specified sigma.

Parameters:
  • p (float) – The period or length to generate the wavelet.

  • sigma (float) – Standard deviation for the Gaussian envelope.

Returns:

The generated Mexican Hat wavelet.

Return type:

np.ndarray

autots.tools.wavelet.create_morlet_wavelet(p, frequency=3, sigma=1.0)

Create a Morlet wavelet with specified frequency and sigma.

Parameters:
  • p (float) – The period or length to generate the wavelet.

  • frequency (int) – Frequency of the cosine wave.

  • sigma (float) – Standard deviation for the Gaussian envelope.

Returns:

The generated complex Morlet wavelet.

Return type:

np.ndarray

autots.tools.wavelet.create_narrowing_wavelets(p, max_order, t, sigma=0.5)
autots.tools.wavelet.create_real_morlet_wavelet(p, frequency=3, sigma=1.0)

Create a real-valued Morlet wavelet with specified frequency and sigma.

Parameters:
  • p (float) – The period or length to generate the wavelet.

  • frequency (int) – Frequency of the cosine wave.

  • sigma (float) – Standard deviation for the Gaussian envelope.

Returns:

The generated real Morlet wavelet.

Return type:

np.ndarray

autots.tools.wavelet.create_wavelet(t, p, sigma=1.0, phase_shift=0, wavelet_type='morlet')

Create a real-valued wavelet based on real-world anchored time steps in t, with an additional phase shift and a choice of wavelet type.

Parameters:
  • t (np.ndarray) – Array of time steps (in days) from a specified origin.

  • p (float) – The period of the wavelet in the same units as t (typically days).

  • sigma (float) – Standard deviation for the Gaussian envelope.

  • phase_shift (float) – Phase shift to adjust the position of the wavelet peak.

  • wavelet_type (str) – Type of wavelet (‘morlet’ or ‘ricker’).

Returns:

The generated wavelet values for each time step.

Return type:

np.ndarray

autots.tools.wavelet.offset_wavelet(p, t, order=5, sigma=1.0, wavelet_type='morlet')

Create an offset collection of wavelets with order offsets, ensuring that peaks are spaced p/order apart.

Parameters:
  • p (float) – Period of the wavelet in the same units as t (typically days).

  • t (np.ndarray) – Array of time steps.

  • order (int) – The number of offsets.

  • sigma (float) – Standard deviation for the Gaussian envelope.

  • wavelet_type (str) – Type of wavelet (‘morlet’ or ‘ricker’).

Returns:

A 2D array with order wavelets along axis 1.

Return type:

np.ndarray
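
Example

A minimal sketch building weekly wavelet features over daily time steps; the shape follows the Returns note above:

>>> import numpy as np
>>> t = np.arange(365)  # days since an arbitrary origin
>>> waves = offset_wavelet(p=7.0, t=t, order=3)
>>> waves.shape  # (365, 3): order wavelets along axis 1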

autots.tools.window_functions module

autots.tools.window_functions.chunk_reshape(arr, window_size=10, chunk_size=100, sample_fraction=None, random_seed=7734, dtype=<class 'numpy.float32'>)

Shifts from (n_records, n_series) to (windows, window_size). Multivariate. More memory efficient, if not quite as fast as x.reshape(-1, x.shape[-1]) for 3D numpy array.

autots.tools.window_functions.last_window(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False)

Pandas based function to provide the last window of window_maker.

autots.tools.window_functions.np_2d_arange(start=0, stop=3, step=1, num_columns=4)
autots.tools.window_functions.retrieve_closest_indices(df, num_indices, forecast_length, window_size: int = 10, distance_metric: str = 'braycurtis', stride_size: int = 1, start_index: int | None = None, include_differenced: bool = False, include_last: bool = True, verbose: int = 0)

Find the next indices closest to the final segment of forecast_length.

Parameters:
  • df (pd.DataFrame) – source data in wide format

  • num_indices (int) – number of indices to return

  • forecast_length (int) – length of forecast

  • window_size (int) – length of comparison

  • distance_metric (str) – distance measure from scipy and nan_euclidean

  • stride_size (int) – length of spacing between windows

  • start_index (int) – index to begin creation of windows from

  • include_differenced (bool) – if True, also compare on differences

autots.tools.window_functions.rolling_window_view(array, window_shape=(0,), axis=None, writeable=False)

Create a view of array which for every point gives the n-dimensional neighbourhood of size window. New dimensions are added at the end of array or after the corresponding original dimension.

Based on: https://gist.github.com/seberg/3866040 but designed to match the newer np.sliding_window_view

Parameters:
  • array (np.array) – Array to which the rolling window is applied.

  • window_shape (int) – Either a single integer to create a window over only the last axis, or a tuple to create it for the last len(window) axes. 0 can be used to ignore a dimension in the window.

  • axis (int) – If given, must have the same size as window. In this case window is interpreted as the size in the dimension given by axis. I.e., a window of (2, 1) is equivalent to window=2 and axis=-2.

Returns:

A view on the array which is smaller so that the windows fit, with the window dimensions added at the end (0s not counting); i.e., every point of the array is an array of size window.

autots.tools.window_functions.sliding_window_view(array, window_shape=(0,), axis=None, writeable=False, **kwargs)

Toggles between numpy and internal version depending on np.__version__.

autots.tools.window_functions.window_id_maker(window_size: int, max_steps: int, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)

Create indices for array of multiple window slices of data

Parameters:
  • window_size (int) – length of time history to include

  • max_steps (int) – the maximum number of windows to create

  • start_index (int) – index at which to begin creating windows, if not starting at the first point

  • stride_size (int) – number of skips between each window start point

  • skip_size (int) – number of skips between each obs in a window (downsamples)

Returns:

np.array with 3D shape (num windows, window_length, num columns/series); 2D array if only a 1D array was provided

autots.tools.window_functions.window_lin_reg(x, y, w)

From https://stackoverflow.com/questions/70296498/efficient-computation-of-moving-linear-regression-with-numpy-numba/70304475#70304475

autots.tools.window_functions.window_lin_reg_mean(x, y, w)

From https://stackoverflow.com/questions/70296498/efficient-computation-of-moving-linear-regression-with-numpy-numba/70304475#70304475

autots.tools.window_functions.window_lin_reg_mean_no_nan(x, y, w)

From https://stackoverflow.com/questions/70296498/efficient-computation-of-moving-linear-regression-with-numpy-numba/70304475#70304475

autots.tools.window_functions.window_maker(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False, shuffle: bool = False, output_dim: str = 'forecast_length', forecast_length: int = 1, max_windows: int = 5000, regression_type: str | None = None, future_regressor=None, random_seed: int = 1234)

Convert a dataset into slices with history and y forecast.

Parameters:
  • df (pd.DataFrame) – wide format df with sorted index

  • window_size (int) – length of history to use for X window

  • input_dim (str) – univariate or multivariate. If multivariate, all series in single X row

  • shuffle (bool) – (deprecated)

  • output_dim (str) – ‘forecast_length’ or ‘1step’ where 1 step is basically forecast_length=1

  • forecast_length (int) – number of periods ahead that will be forecast

  • max_windows (int) – a cap on the total number of windows to generate. If exceeded, a random selection of this many windows is used.

  • regression_type (str) – None, or “user” to attempt to concat the regressor to the windows

  • future_regressor (pd.DataFrame) – values of regressor if used

  • random_seed (int) – a consistent random seed

Returns:

X, Y

autots.tools.window_functions.window_maker_2(array, window_size: int, max_steps: int | None = None, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)

Create an array of multiple window slices of data. Note that this returns a different orientation than window_maker_3.

Parameters:
  • array (np.array) – source of historic information of shape (num_obs, num_series)

  • window_size (int) – length of time history to include

  • max_steps (int) – the maximum number of windows to create

  • start_index (int) – index at which to begin creating windows, if not starting at the first point

  • stride_size (int) – number of skips between each window start point

  • skip_size (int) – number of skips between each obs in a window (downsamples)

Returns:

np.array with 3D shape (num windows, window_length, num columns/series); 2D array if only a 1D array was provided
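
Example

A small sketch, assuming the documented output shape with default stride and skip:

>>> import numpy as np
>>> arr = np.arange(20).reshape(10, 2)  # (num_obs, num_series)
>>> windows = window_maker_2(arr, window_size=3)
>>> windows.shape  # expected (8, 3, 2)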

autots.tools.window_functions.window_maker_3(array, window_size: int, **kwargs)

Stride tricks version of window maker. About 40% faster than window_maker_2. Note that this returns a different orientation than window_maker_2.

Parameters:
  • array (np.array) – in shape of (num_obs, num_series)

  • window_size (int) – length of slice of history

  • **kwargs – passed to np.lib.stride_tricks.sliding_window_view

Returns:

np.array with 3D shape (num windows, num columns/series, window_length); 2D array if only a 1D array was provided

autots.tools.window_functions.window_sum_mean(x, w, axis=0)
autots.tools.window_functions.window_sum_mean_nan_tail(x, w, axis=0)
autots.tools.window_functions.window_sum_nan_mean(x, w, axis=0)

Module contents

Basic utilities.