autots.tools package¶
Submodules¶
autots.tools.anomaly_utils module¶
Created on Fri Jul 1 15:41:21 2022
@author: Colin
Anomalies fall into three types: point, contextual, and collective. Point anomalies are single values that fall within low-density regions of values; collective anomalies indicate that a sequence of values is anomalous rather than any single value by itself; and contextual anomalies are single values that do not fall within low-density regions yet are anomalous with regard to local values. See https://arxiv.org/pdf/1802.04431.pdf
- autots.tools.anomaly_utils.anomaly_df_to_holidays(anomaly_df, actuals=None, anomaly_scores=None, threshold=0.8, min_occurrences=2, splash_threshold=0.65, use_dayofmonth_holidays=True, use_wkdom_holidays=True, use_wkdeom_holidays=False, use_lunar_holidays=False, use_lunar_weekday=False, use_islamic_holidays=False, use_hebrew_holidays=False, use_hindu_holidays=False)¶
- autots.tools.anomaly_utils.anomaly_new_params(method='random')¶
- autots.tools.anomaly_utils.create_dates_df(dates)¶
Take a pd.DatetimeIndex and create simple date parts.
- autots.tools.anomaly_utils.dates_to_holidays(dates, df_cols, style='long', holiday_impacts='value', day_holidays=None, wkdom_holidays=None, wkdeom_holidays=None, lunar_holidays=None, lunar_weekday=None, islamic_holidays=None, hebrew_holidays=None, hindu_holidays=None, max_features: int | None = None)¶
Populate date information for a given pd.DatetimeIndex.
- Parameters:
dates (pd.DatetimeIndex) – list of dates
day_holidays (pd.DataFrame) – list of month/day holidays. Pass None if not available
style (str) – option for how to return information:
'long' - return date, name, series for all holidays in a long style dataframe
'impact' - return dates, series with values of sum of impacts (if given) or joined string of holiday names
'flag' - return dates, holidays flag (is not 0-1 but rather the sum of input series impacted for that holiday and day)
'prophet' - return the format required for prophet; will need to be filtered on series for the multivariate case
'series_flag' - return dates, series 0/1 for whether a holiday occurred in any calendar
holiday_impacts (dict) – a dict passed to .replace containing values for holiday_names, or str 'value' or 'anomaly_score'
- autots.tools.anomaly_utils.detect_anomalies(df, output, method, transform_dict=None, method_params={}, eval_period=None, n_jobs=1)¶
All will return -1 for anomalies.
- Parameters:
output (str) – ‘multivariate’ (each series unique outliers), or ‘univariate’ (all series together for one outlier flag per timestamp)
transform_dict (dict) – optional but helpful; often datepart, differencing, or other standard AutoTS transformer params
forecast_params (dict) – used to backcast and identify ‘unforecastable’ values, required only for predict_interval method
eval_period (int) – only use this length of the data's tail; currently only implemented for forecast_params forecasting, if used
n_jobs (int) – multiprocessing jobs, used by some methods
- Returns:
pd.DataFrame (classifications, -1 = outlier, 1 = not outlier), pd.DataFrame (scores)
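A minimal usage sketch; the 'zscore' method name and empty method_params are assumptions based on the options documented under zscore_survival_function below:

```python
import numpy as np
import pandas as pd
from autots.tools.anomaly_utils import detect_anomalies

# wide-format input: DatetimeIndex rows, one column per series
idx = pd.date_range("2022-01-01", periods=200, freq="D")
df = pd.DataFrame(np.random.randn(200, 3).cumsum(axis=0), index=idx, columns=["a", "b", "c"])
df.iloc[50, 0] += 20  # inject an obvious point anomaly

anomalies, scores = detect_anomalies(
    df, output="multivariate", method="zscore", method_params={}
)
print(anomalies.iloc[48:53, 0])  # -1 marks the injected outlier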
- autots.tools.anomaly_utils.fit_anomaly_classifier(anomalies, scores)¶
Fit a DecisionTree model to predict if a score is an anomaly.
This is a shared utility for both AnomalyDetector and AnomalyRemoval classes. Using DecisionTree as it can handle nonparametric anomalies.
- Parameters:
anomalies (pd.DataFrame) – DataFrame with -1 for anomalies, 1 for normal
scores (pd.DataFrame) – DataFrame with anomaly scores
- Returns:
(classifier, score_categories) - trained classifier and categorical mapping
- Return type:
tuple
- autots.tools.anomaly_utils.gaussian_mixture(df, n_components=2, tol=0.001, max_iter=100, responsibility_threshold=0.05)¶
- autots.tools.anomaly_utils.holiday_new_params(method='random')¶
- autots.tools.anomaly_utils.limits_to_anomalies(df, output, upper_limit, lower_limit, method_params=None)¶
- autots.tools.anomaly_utils.loop_sk_outliers(df, method, method_params={}, n_jobs=1)¶
Multiprocessing on each series for multivariate outliers with sklearn.
- autots.tools.anomaly_utils.nonparametric_multivariate(df, output, method_params, n_jobs=1)¶
- autots.tools.anomaly_utils.score_to_anomaly(scores, classifier, score_categories)¶
Convert anomaly scores to anomaly classifications using a trained classifier.
This is a shared utility for both AnomalyDetector and AnomalyRemoval classes.
- Parameters:
scores (pd.DataFrame) – DataFrame with anomaly scores
classifier – trained sklearn classifier
score_categories – categorical mapping from fit_anomaly_classifier
- Returns:
Classifications (-1 = anomaly, 1 = normal)
- Return type:
pd.DataFrame
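A sketch of the shared classifier workflow; the synthetic anomalies/scores frames stand in for real detect_anomalies output:

```python
import numpy as np
import pandas as pd
from autots.tools.anomaly_utils import fit_anomaly_classifier, score_to_anomaly

idx = pd.date_range("2022-01-01", periods=100, freq="D")
scores = pd.DataFrame(np.random.rand(100, 2), index=idx, columns=["a", "b"])
# -1 for anomalies, 1 for normal, mirroring detect_anomalies output
anomalies = pd.DataFrame(np.where(scores < 0.02, -1, 1), index=idx, columns=["a", "b"])

classifier, score_categories = fit_anomaly_classifier(anomalies, scores)
classifications = score_to_anomaly(scores, classifier, score_categories)
```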
- autots.tools.anomaly_utils.sk_outliers(df, method, method_params={})¶
scikit-learn outlier methods wrapper.
- autots.tools.anomaly_utils.values_to_anomalies(df, output, threshold_method, method_params, n_jobs=1)¶
- autots.tools.anomaly_utils.zscore_survival_function(df, output='multivariate', method='zscore', distribution='norm', rolling_periods: int = 200, center: bool = True)¶
Take a dataframe, generate zscores, and then generate survival probabilities (smaller = more outliery).
- Parameters:
df (pd.DataFrame) – wide style time series data (datetimeindex, series)
output (str) – univariate (1 series from all) or multivariate (all series input returned unique)
method (str) – zscore, rolling_zscore, mad (median abs dev)
distribution (str) – distribution to sample sf/outliers from
rolling_periods (int) – >1, used for rolling_zscore period
center (bool) – passed to pd.rolling for rolling_zscore, True for holiday detection, False for anomaly detection generally
- Returns:
pd.DataFrame of p-values
autots.tools.autoencoder module¶
Autoencoder tools for anomaly detection
Variational Autoencoder (VAE) implementation for time series anomaly detection
- class autots.tools.autoencoder.VAE(input_dim, latent_dim, depth=1, dropout_rate=0.0)¶
Bases:
Module
Complete VAE model.
- forward(x)¶
- reparameterize(mean, logvar)¶
Reparameterization trick.
- class autots.tools.autoencoder.VAEAnomalyDetector(depth=1, batch_size=32, epochs=50, learning_rate=0.001, loss_function='elbo', dropout_rate=0.0, latent_dim=None, beta=1.0, random_state=None, device=None)¶
Bases:
object
Variational Autoencoder for Anomaly Detection.
- fit(X)¶
Fit the VAE model.
- predict(X)¶
Predict anomaly scores.
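A minimal fit/predict sketch (requires a torch backend; the 2d input shape and score orientation are assumptions):

```python
import numpy as np
from autots.tools.autoencoder import VAEAnomalyDetector

X = np.random.randn(500, 8)  # rows = observations, columns = features

detector = VAEAnomalyDetector(epochs=10, batch_size=64, random_state=0)
detector.fit(X)
scores = detector.predict(X)  # higher reconstruction-based scores suggest anomalies
```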
- class autots.tools.autoencoder.VAEDecoder(input_dim, latent_dim, depth=1, dropout_rate=0.0)¶
Bases:
Module
VAE Decoder network.
- forward(z)¶
- class autots.tools.autoencoder.VAEEncoder(input_dim, latent_dim, depth=1, dropout_rate=0.0)¶
Bases:
Module
VAE Encoder network.
- forward(x)¶
- autots.tools.autoencoder.loop_vae_outliers(df, method_params={}, n_jobs=1)¶
Multiprocessing on each series for multivariate VAE outliers.
- autots.tools.autoencoder.vae_outliers(df, method_params={})¶
VAE-based outlier detection.
autots.tools.bayesian_regression module¶
- class autots.tools.bayesian_regression.BayesianMultiOutputRegression(gaussian_prior_mean=0.0, alpha=1.0, wishart_prior_scale=1.0, wishart_dof_excess=0)¶
Bases:
object
Matrix-normal/Inverse-Wishart Bayesian linear regression.
Computes an analytical posterior for multivariate linear regression with Gaussian weight noise and unknown Gaussian observation noise. The posterior mean matches ridge regression, while the stored covariance allows callers to recover coefficient standard deviations, prediction intervals, and posterior samples without resorting to numerical sampling during training.
- coefficient_interval(z_value=1.96)¶
- fit(X, Y)¶
- predict(X, return_std=False)¶
- sample_posterior(n_samples=1)¶
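A usage sketch under the documented signatures; the (mean, std) tuple return of predict(return_std=True) is an assumption:

```python
import numpy as np
from autots.tools.bayesian_regression import BayesianMultiOutputRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(200, 3))  # 3 outputs

model = BayesianMultiOutputRegression(alpha=1.0)
model.fit(X, Y)
point = model.predict(X)                        # posterior mean, matches ridge
point, std = model.predict(X, return_std=True)  # assumed (mean, std) tuple
draws = model.sample_posterior(n_samples=10)    # posterior samples of coefficients
```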
autots.tools.calendar module¶
Calendar conversion functions.
Includes Lunar, Chinese lunar, and Arabic lunar
- autots.tools.calendar.gregorian_to_chinese(datetime_index, epoch=2444238.5)¶
Convert a pandas DatetimeIndex to Chinese Lunar calendar. Potentially has errors.
- autots.tools.calendar.gregorian_to_christian_lunar(datetime_index)¶
Convert a pandas DatetimeIndex to Christian Lunar calendar. Approximate: it does not work exactly.
- autots.tools.calendar.gregorian_to_hebrew(dates)¶
Convert pd.Datetimes to a Hebrew date. From pyluach by simlist.
This is the slowest of the lot and needs to be improved.
- autots.tools.calendar.gregorian_to_hindu(datetime_index, method: str = 'lunar')¶
Convert a pandas DatetimeIndex to Hindu calendar date components. Hindu calendar has numerous regional variations. This implementation aims for compatibility with common internationally recognized Hindu holidays.
- Parameters:
datetime_index (pd.DatetimeIndex) – pandas DatetimeIndex
method (str) – ‘simple’ or ‘lunar’. Simple is faster. Lunar is more accurate.
- autots.tools.calendar.gregorian_to_islamic(date, epoch_adjustment=1.5)¶
Calculate Islamic dates for pandas DatetimeIndex. Approximately. From convertdate by fitnr.
- Parameters:
epoch_adjustment (float) – base of 1.0; needs adjusting by about +/- 0.5 to account for timezone (default 1.5)
- autots.tools.calendar.heb_is_leap(year)¶
- autots.tools.calendar.lunar_from_lunar(new_moon)¶
Assumes continuous daily data and pre-needed start.
- autots.tools.calendar.lunar_from_lunar_full(full_moon)¶
Assumes continuous daily data and pre-needed start.
- autots.tools.calendar.to_jd(year, month, day)¶
Determine Julian day count from Islamic date. From convertdate by fitnr.
autots.tools.changepoints module¶
- class autots.tools.changepoints.ChangepointDetector(method='pelt', method_params=None, aggregate_method='mean', min_segment_length=5, probabilistic_output=False, n_jobs=1)¶
Bases:
object
Advanced changepoint detection class for time series data.
Supports multiple algorithms for detecting changepoints and level shifts in wide-format time series data, similar to HolidayDetector.
- create_features(forecast_length=0)¶
Create changepoint features for regression modeling.
- Parameters:
forecast_length (int) – Number of future periods to extend features
- Returns:
Changepoint features
- Return type:
pd.DataFrame
- detect(df)¶
Run changepoint detection on wide-format time series data.
- Parameters:
df (pd.DataFrame) – Wide-format time series with DatetimeIndex
- fit(df)¶
Fit the changepoint detector and prepare transformer artifacts.
- Parameters:
df (pd.DataFrame) – Training data with DatetimeIndex.
- fit_transform(df)¶
Fit the detector and immediately transform the input data.
- Parameters:
df (pd.DataFrame) – Training data with DatetimeIndex.
- get_market_changepoints(method='dbscan', params=None)¶
Find common changepoints across multiple time series using clustering.
- Parameters:
method (str) – Clustering method (‘dbscan’, ‘kmeans’, ‘hierarchical’)
params (dict) – Parameters for clustering algorithm
- Returns:
Array of market-wide changepoint indices
- Return type:
np.ndarray
- static get_new_params(method='random')¶
Generate new random parameters for changepoint detection.
- Parameters:
method (str) – Method for parameter selection:
'fast': all methods but with the fastest parameter configurations for PELT and composite_fused_lasso
or specify a method name directly: 'basic', 'pelt', 'l1_fused_lasso', 'l1_total_variation', 'cusum', 'autoencoder', 'composite_fused_lasso'
- Returns:
Complete parameter dictionary for ChangepointDetector initialization
- Return type:
dict
- inverse_transform(df, trans_method='forecast')¶
Restore data to the original scale using stored changepoint trends.
- Parameters:
df (pd.DataFrame) – Data to inverse transform.
trans_method (str) – Compatibility argument for transformer interface.
- plot(series_name=None, figsize=(12, 8))¶
Plot time series with detected changepoints.
- Parameters:
series_name (str) – Name of series to plot (for individual detection)
figsize (tuple) – Figure size
- transform(df)¶
Apply changepoint-based detrending to the provided data.
- Parameters:
df (pd.DataFrame) – Data to transform.
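A minimal sketch tying the methods above together; the synthetic level shift is illustrative:

```python
import numpy as np
import pandas as pd
from autots.tools.changepoints import ChangepointDetector

idx = pd.date_range("2021-01-01", periods=300, freq="D")
y = np.concatenate([np.zeros(150), np.full(150, 5.0)])  # level shift at day 150
df = pd.DataFrame({"y": y + np.random.randn(300) * 0.1}, index=idx)

detector = ChangepointDetector(method="pelt", min_segment_length=5)
detector.detect(df)
features = detector.create_features(forecast_length=30)  # regression features

# transformer-style interface
detrended = detector.fit_transform(df)
restored = detector.inverse_transform(detrended)
```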
- autots.tools.changepoints.changepoint_fcst_from_last_row(x_t_last_row, n_forecast=10)¶
- autots.tools.changepoints.create_changepoint_features(DTindex, changepoint_spacing=60, changepoint_distance_end=120, method='basic', params=None, data=None)¶
Creates a feature set for estimating trend changepoints using various algorithms.
- Parameters:
DTindex (pd.DatetimeIndex) – a datetimeindex
changepoint_spacing (int) – distance between consecutive changepoints (legacy, for basic method)
changepoint_distance_end (int) – number of rows that belong to the final changepoint (legacy, for basic method)
method (str) – method for changepoint detection ('basic', 'pelt', 'l1_fused_lasso', 'l1_total_variation', 'cusum', 'ewma', 'autoencoder')
params (dict) – additional parameters for the chosen method
data (array-like) – time series data (required for advanced methods)
- Returns:
DataFrame containing changepoint features for linear regression.
- Return type:
pd.DataFrame
- autots.tools.changepoints.find_market_changepoints_multivariate(df, detector_params=None, clustering_method='dbscan', clustering_params=None, min_series_agreement=0.3)¶
Find common changepoints across multivariate time series data.
- Parameters:
df (pd.DataFrame) – Wide-format time series data
detector_params (dict) – Parameters for ChangePointDetector
clustering_method (str) – Method for clustering changepoints (‘dbscan’, ‘kmeans’, ‘agreement’)
clustering_params (dict) – Parameters for clustering
min_series_agreement (float) – Minimum fraction of series that must agree on a changepoint
- Returns:
Dictionary with market changepoints and individual series changepoints
- Return type:
dict
- autots.tools.changepoints.generate_random_changepoint_params(method='random')¶
Generate random parameters for changepoint detection methods.
This function creates appropriately weighted random parameters for different changepoint detection algorithms, supporting the flexible method/params system.
DEPRECATED: This function now delegates to ChangepointDetector.get_new_params() for consistency. Use ChangepointDetector.get_new_params() directly for new code.
- Parameters:
method (str) – Method for parameter selection:
'random': all methods with balanced weights
'fast': all methods but with the fastest parameter configurations for PELT and composite_fused_lasso
'default'/'basic_weighted': basic method heavily weighted
- Returns:
- (changepoint_method, changepoint_params) where
changepoint_method (str): Selected method name
changepoint_params (dict): Method-specific parameters
- Return type:
tuple
- autots.tools.changepoints.half_yr_spacing(df)¶
autots.tools.cointegration module¶
Cointegration
Johansen heavily based on Statsmodels source code
BTCD heavily based on D. Barba https://towardsdatascience.com/canonical-decomposition-a-forgotten-method-for-time-series-cointegration-and-beyond-4d1213396da1
- autots.tools.cointegration.btcd_decompose(p_mat: ndarray, regression_model, max_lag: int = 1, return_eigenvalues=False)¶
Calculate decomposition. p_mat is of shape (t, n), wide style data.
- autots.tools.cointegration.coint_fast(endog, k_ar_diff=1)¶
A fast estimation of cointegration vectors for VECM.
- Parameters:
endog (np.array) – should be (nobs, n_vars)
k_ar_diff (int) – number of lags to use in VECM.
- autots.tools.cointegration.coint_johansen(endog, det_order=-1, k_ar_diff=1, return_eigenvalues=False, fast: bool = True)¶
Johansen cointegration test of the cointegration rank of a VECM, abbreviated from Statsmodels
- autots.tools.cointegration.fourier_series(dates, period, series_order)¶
Provides Fourier series components with the specified frequency and order.
- Parameters:
dates (pd.Series) – Series containing timestamps
period (float) – number of days of the period
series_order (int) – number of components
- Returns:
Matrix with seasonality features.
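A sketch of generating yearly seasonality features; the 2 * series_order output width follows the usual Fourier-features construction and is an assumption here:

```python
import pandas as pd
from autots.tools.cointegration import fourier_series

dates = pd.Series(pd.date_range("2022-01-01", periods=365, freq="D"))
seasonality = fourier_series(dates, period=365.25, series_order=3)
print(seasonality.shape)  # expected (365, 6): one sin and cos per order
```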
- autots.tools.cointegration.lagmat(x, maxlag: int, trim='forward', original='ex')¶
Create 2d array of lags. Modified from Statsmodels.
autots.tools.constraint module¶
Constraint and adjustment generation functions
- autots.tools.constraint.apply_adjustment_single(forecast: DataFrame, adjustment_method: str, adjustment_params: dict | None = None, df_train: DataFrame | None = None, series_ids=None, lower_forecast: DataFrame | None = None, upper_forecast: DataFrame | None = None)¶
Apply a single adjustment to forecast (and optional bounds).
- adjustment_method:
- “basic”: linear ramp between start/end values and dates
params: start_date, end_date, start_value, end_value, method (“additive”|”multiplicative”)
- “align_last_value”: align start of forecast to recent history, requires df_train
params: any AlignLastValue kwargs (rows, lag, method, strength, etc.)
- “smoothing”: EWMA smoothing
params: span (int)
series_ids limits adjustment to specific columns; defaults to all columns.
- autots.tools.constraint.apply_constraint_single(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None)¶
- autots.tools.constraint.apply_fit_constraint(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, lower_constraint=None, upper_constraint=None, train_min=None, train_max=None, fillna=None)¶
- autots.tools.constraint.constant_growth_rate(periods, final_growth)¶
Take a final target growth rate (i.e., 2% over a year) and convert it to a per-period (e.g., daily) value.
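A minimal call under the documented signature; whether the result is a scalar per-period rate or a series of increments is not specified here, so inspect the output:

```python
from autots.tools.constraint import constant_growth_rate

# spread a 2% total growth over 365 periods (e.g., daily values for a year)
daily = constant_growth_rate(periods=365, final_growth=0.02)
```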
- autots.tools.constraint.constraint_new_params(method: str = 'fast')¶
- autots.tools.constraint.fit_constraint(constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None, forecast_length=None)¶
autots.tools.cpu_count module¶
CPU counter for multiprocessing.
- autots.tools.cpu_count.cpu_count(modifier: float = 1)¶
Find available CPU count, running on both Windows/Linux.
- Attempts to be very conservative:
Remove Intel Hyperthreading logical cores
Find max cores allowed to the process, if less than the machine's total
Runs best with psutil installed, falls back to mkl, then os core count / 2
- Parameters:
modifier (float) – multiple CPU count by this value
- autots.tools.cpu_count.set_n_jobs(n_jobs, verbose=0)¶
autots.tools.fast_kalman module¶
From SIMD KALMAN, (c) 2017 Otto Seiskari (MIT License)
- Some other resources that I have found useful:
https://kevinkotze.github.io/ts-4-state-space/
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_dfm_coincident.html
An Introduction to State Space Time Series Analysis, Commandeur and Koopman, ch. 8
Forecasting, Structural Time Series Models and the Kalman Filter, Andrew Harvey
Following the notation in [1], the Kalman filter framework consists of a dynamic model (state transition model)

\(x_k = A x_{k-1} + q_{k-1}, \qquad q_{k-1} \sim N(0, Q)\)

and a measurement model (observation model)

\(y_k = H x_k + r_k, \qquad r_k \sim N(0, R)\)

where the vector \(x\) is the (hidden) state of the system and \(y\) is an observation. A and H are matrices of suitable shape and \(Q\), \(R\) are positive-definite noise covariance matrices.
Usage example¶
import simdkalman
import numpy as np
import numpy.random as random

random.seed(0)

Define the model:

kf = simdkalman.KalmanFilter(
    state_transition = [[1, 1], [0, 1]],     # matrix A
    process_noise = np.diag([0.1, 0.01]),    # Q
    observation_model = np.array([[1, 0]]),  # H
    observation_noise = 1.0)                 # R

Generate some fake data:

# 100 independent time series
data = random.normal(size=(100, 200))

# with 10% of NaNs denoting missing values
data[random.uniform(size=data.shape) < 0.1] = np.nan

Smooth all data:

smoothed = kf.smooth(data,
                     initial_value = [1, 0],
                     initial_covariance = np.eye(2) * 0.5)

# second time series, third time step, hidden state x
print('mean')
print(smoothed.states.mean[1, 2, :])

print('covariance')
print(smoothed.states.cov[1, 2, :, :])

mean
[ 0.29311384 -0.06948961]
covariance
[[ 0.19959416 -0.00777587]
 [-0.00777587  0.02528967]]

Predict new data for a single series (1d case):

predicted = kf.predict(data[1, :], 123)

# predicted observation y, third new time step
pred_mean = predicted.observations.mean[2]
pred_stdev = np.sqrt(predicted.observations.cov[2])

print('%g +- %g' % (pred_mean, pred_stdev))

1.71543 +- 1.65322
Low-level Kalman filter computation steps with multi-dimensional input arrays. Unlike with the KalmanFilter class, all inputs must be numpy arrays. However, their dimensions can flexibly vary from 1 to 3 as long as they are reasonable from the point of view of matrix multiplication and numpy broadcasting rules. Matrix operations are applied on the last two axes of the arrays.
- class autots.tools.fast_kalman.Gaussian(mean, cov)¶
Bases:
object
- static empty(n_states, n_vars, n_measurements, cov=True)¶
- unvectorize_state()¶
- unvectorize_vars()¶
- class autots.tools.fast_kalman.KalmanFilter(state_transition, process_noise, observation_model, observation_noise)¶
Bases:
object
The main Kalman filter class providing convenient interfaces to vectorized smoothing and filtering operations on multiple independent time series.
As long as the shapes of the given parameters match reasonably according to the rules of matrix multiplication, this class is flexible in their exact nature, accepting:
- scalars: process_noise = 0.1
- (2d) numpy matrices: process_noise = numpy.eye(2)
- 2d arrays: observation_model = [[1,2]]
- 3d arrays and matrices for vectorized computations. Unlike the other options, this locks the shape of the inputs that can be processed by the smoothing and prediction methods.
- Parameters:
state_transition – State transition matrix \(A\)
process_noise – Process noise (state transition covariance) matrix \(Q\)
observation_model – Observation model (measurement model) matrix \(H\)
observation_noise – Observation noise (measurement noise covariance) matrix \(R\)
- class Result¶
Bases:
object
- compute(data, n_test, initial_value=None, initial_covariance=None, smoothed=True, filtered=False, states=True, covariances=True, observations=True, likelihoods=False, gains=False, log_likelihood=False, verbose=False)¶
Smoothing, filtering and prediction at the same time. Used internally by other methods, but can also be used directly if, e.g., both smoothed and predicted data is wanted.
See smooth and predict for explanation of the common parameters. With this method, there also exist the following flags.
- Parameters:
smoothed (boolean) – compute Kalman smoother (used by smooth)
filtered (boolean) – return (one-way) filtered data
likelihoods (boolean) – return likelihoods of each step
gains (boolean) – return Kalman gains and pairwise covariances (used by the EM algorithm). If true, the gains are provided as a member of the relevant subresult filtered.gains and/or smoothed.gains.
log_likelihood (boolean) – return the log-likelihood(s) for the entire series. If matrix data is given, this will be a vector where each element is the log-likelihood of a single row.
- Return type:
result object whose fields depend on which of the above parameter flags are True. The possible values are: smoothed (the return value of smooth, may contain smoothed.gains), filtered (like smoothed, may also contain filtered.gains), predicted (the return value of predict if n_test > 0), pairwise_covariances, likelihoods and log_likelihood.
- em(data, n_iter=5, initial_value=None, initial_covariance=None, verbose=False, tol=None)¶
- em_observation_noise(result, data, verbose=False)¶
- em_process_noise(result, verbose=False)¶
- predict(data, n_test, initial_value=None, initial_covariance=None, states=True, observations=True, covariances=True, verbose=False)¶
Filter past data and predict a given number of future values. The data can be given as any of:
- 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain a 1d array of observations (both .mean and .cov will be 1d).
- 2d matrix, where each row is interpreted as an independent time series, all of which are filtered independently. The returned observations members will be 2-dimensional in this case.
- 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.
Initial values and covariances can be given as scalars or 2d matrices, in which case the same initial states will be used for all rows, or as 3d arrays for different initial values per row.
- Parameters:
data – Past data
n_test (integer) – number of future steps to predict.
initial_value – Initial value \({\mathbb E}[x_0]\)
initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)
states (boolean) – predict states \(x\)?
observations (boolean) – predict observations \(y\)?
covariances (boolean) – include covariances in predictions?
- Return type:
Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)
- predict_next(m, P)¶
Single prediction step
- Parameters:
m – \({\mathbb E}[x_{j-1}]\), the previous mean
P – \({\rm Cov}[x_{j-1}]\), the previous covariance
- Return type:
(prior_mean, prior_cov) – predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)
- predict_observation(m, P)¶
Probability distribution of observation \(y\) for a given distribution of \(x\)
- Parameters:
m – \({\mathbb E}[x]\)
P – \({\rm Cov}[x]\)
- Return type:
mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)
- smooth(data, initial_value=None, initial_covariance=None, observations=True, states=True, covariances=True, verbose=False)¶
Smooth given data, which can be any of:
- 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain a 1d array of observations (both .mean and .cov will be 1d).
- 2d matrix, where each row is interpreted as an independent time series, all of which are smoothed independently. The returned observations members will be 2-dimensional in this case.
- 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.
Initial values and covariances can be given as scalars or 2d matrices, in which case the same initial states will be used for all rows, or as 3d arrays for different initial values per row.
- Parameters:
data – 1d or 2d data, see above
initial_value – Initial value \({\mathbb E}[x_0]\)
initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)
states (boolean) – return smoothed states \(x\)?
observations (boolean) – return smoothed observations \(y\)?
covariances (boolean) – include covariances results?
- Return type:
Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)
- smooth_current(m, P, ms, Ps)¶
Single Kalman smoother backwards step
- Parameters:
m – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)
P – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)
ms – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)
Ps – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)
- Return type:
(smooth_mean, smooth_covariance, smoothing_gain) – smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\), covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\), and smoothing gain \(C\)
- update(m, P, y, log_likelihood=False)¶
Single update step with NaN check.
- Parameters:
m – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)
P – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)
y – observation \(y_j\)
log_likelihood – compute log-likelihood?
- Return type:
(posterior_mean, posterior_covariance, log_likelihood) – posterior mean \({\mathbb E}[x_j|y_1,\ldots,y_j]\) and covariance \({\rm Cov}[x_j|y_1,\ldots,y_j]\) and, if requested, log-likelihood. If \(y_j\) is NaN, returns the prior mean and covariance instead
- autots.tools.fast_kalman.autoshape(func)¶
Automatically shape arguments and return values
- autots.tools.fast_kalman.ddot(A, B)¶
Matrix multiplication over last two axes
- autots.tools.fast_kalman.ddot_t_right(A, B)¶
Matrix multiplication over last 2 axes with right operand transposed
- autots.tools.fast_kalman.ddot_t_right_old(A, B)¶
Matrix multiplication over last 2 axes with right operand transposed
- autots.tools.fast_kalman.dinv(A)¶
Matrix inverse applied to last two axes
- autots.tools.fast_kalman.douter(a, b)¶
Outer product, last two axes
- autots.tools.fast_kalman.em_initial_state(result, initial_means)¶
- autots.tools.fast_kalman.ensure_matrix(x, dim=1)¶
- autots.tools.fast_kalman.predict(mean, covariance, state_transition, process_noise)¶
Kalman filter prediction step
- Parameters:
mean – \({\mathbb E}[x_{j-1}]\), the filtered mean from the previous step
covariance – \({\rm Cov}[x_{j-1}]\), the filtered covariance from the previous step
state_transition – matrix \(A\)
process_noise – matrix \(Q\)
- Return type:
(prior_mean, prior_cov) – predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)
- autots.tools.fast_kalman.predict_observation(mean, covariance, observation_model, observation_noise)¶
Compute probability distribution of the observation \(y\), given the distribution of \(x\).
- Parameters:
mean – \({\mathbb E}[x]\)
covariance – \({\rm Cov}[x]\)
observation_model – matrix \(H\)
observation_noise – matrix \(R\)
- Return type:
mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)
- autots.tools.fast_kalman.priv_smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)¶
- autots.tools.fast_kalman.priv_update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement, log_likelihood=False)¶
- autots.tools.fast_kalman.smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)¶
Kalman smoother backwards step
- Parameters:
posterior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)
posterior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)
state_transition – matrix \(A\)
process_noise – matrix \(Q\)
next_smooth_mean – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)
next_smooth_covariance – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)
- Return type:
(smooth_mean, smooth_covariance, smoothing_gain) – smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\) and covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\)
- autots.tools.fast_kalman.stable_pinv(A, tol=1e-05, regularization=0.0001)¶
- autots.tools.fast_kalman.update(prior_mean, prior_covariance, observation_model, observation_noise, measurement)¶
Kalman filter update step
- Parameters:
prior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)
prior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)
observation_model – matrix \(H\)
observation_noise – matrix \(R\)
measurement – observation \(y_j\)
- Return type:
(posterior_mean, posterior_covariance) – posterior mean and covariance \({\mathbb E}[x_j|y_1,\ldots,y_j]\), \({\rm Cov}[x_j|y_1,\ldots,y_j]\) after observing \(y_j\)
- autots.tools.fast_kalman.update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement)¶
Kalman filter update with a check for NaN observations. Like update, but returns (prior_mean, prior_covariance) if measurement is NaN
autots.tools.fast_kalman_params module¶
Generate random state-space model parameters paired with fast_kalman.py.
- autots.tools.fast_kalman_params.ensure_stability(st)¶
- autots.tools.fast_kalman_params.holt_winters_damped_matrices(M, alpha, beta, gamma, phi=1.0)¶
Not sure if this is correct. It’s close, at least.
- autots.tools.fast_kalman_params.new_kalman_params(method=None, allow_auto=True)¶
- autots.tools.fast_kalman_params.random_matrix(rows, cols, density=0.2)¶
- autots.tools.fast_kalman_params.random_state_space(tries=15)¶
- autots.tools.fast_kalman_params.random_state_space_original()¶
Return randomly generated statespace models.
autots.tools.fft module¶
Created on Mon Oct 9 22:07:37 2023
@author: colincatlin
- class autots.tools.fft.FFT(n_harm=10, detrend='linear', freq_range=None)¶
Bases:
object
- fit(x)¶
- generate_harmonics_dataframe(forecast_length=0)¶
- predict(forecast_length=0)¶
- autots.tools.fft.fourier_extrapolation(x, forecast_length=10, n_harm=10, detrend='linear', freq_range=None)¶
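A minimal sketch; the (observations, series) input shape is an assumption:

```python
import numpy as np
from autots.tools.fft import FFT, fourier_extrapolation

t = np.arange(500)
x = (np.sin(t / 10) + 0.01 * t).reshape(-1, 1)  # wave plus trend

model = FFT(n_harm=10, detrend="linear")
model.fit(x)
forecast = model.predict(forecast_length=50)  # extrapolate fitted harmonics

# functional equivalent
extended = fourier_extrapolation(x, forecast_length=50, n_harm=10)
```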
autots.tools.fir_filter module¶
Created on Fri Sep 6 23:59:35 2024
@author: colincatlin
- autots.tools.fir_filter.apply_fir_filter_time_domain(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')¶
Apply FIR filter using time-domain convolution (lfilter) for smaller memory usage. This function has padding issues currently.
- autots.tools.fir_filter.apply_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')¶
Apply FIR filter to an array of time series data with shape (observations, series).
- Parameters:
data – numpy array of shape (observations, series), where each column represents a time series
sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)
numtaps – number of taps (filter length)
cutoff_hz – the cutoff frequency in Hz (for filtering purposes)
window – the windowing function to use for FIR filter design ('hamming', 'hann', etc.)
- Returns:
filtered_data – the filtered version of the input data
- autots.tools.fir_filter.fft_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming', chunk_size=1000)¶
Apply FIR filter to an array of time series data with shape (observations, series).
- Parameters:
data – numpy array of shape (observations, series), where each column represents a time series
sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)
numtaps – number of taps (filter length)
cutoff_hz – the cutoff frequency in Hz (for filtering purposes)
window – the windowing function to use for FIR filter design ('hamming', 'hann', etc.)
- Returns:
filtered_data – the filtered version of the input data
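A sketch with illustrative daily data:

```python
import numpy as np
from autots.tools.fir_filter import apply_fir_filter_to_timeseries

# three years of daily data for 5 series, shape (observations, series)
data = np.random.randn(1095, 5).cumsum(axis=0)

filtered = apply_fir_filter_to_timeseries(
    data, sampling_frequency=365, numtaps=128, cutoff_hz=20, window="hamming"
)
print(filtered.shape)  # same shape as the input
```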
- autots.tools.fir_filter.generate_random_fir_params(method='random', data_type='time_series')¶
autots.tools.g7xx_codec module¶
G.711 and G.726 audio codec inspired utilities for time series processing.
This module provides:
1. G.711 companding (mu-law and A-law) for non-linear scaling
2. G.726 adaptive differential PCM for denoising/smoothing
3. Transformer classes (G711Scaler, G726Filter) for AutoTS integration
G.711 is used as a non-linear scaler: encode on transform, decode on inverse_transform. G.726 is used as an adaptive filter to denoise while preserving structure.
All operations are fully vectorized for efficient processing of multiple time series.
- class autots.tools.g7xx_codec.G711Scaler(mode: str = 'mu', mu: float = 100.0, A: float = 87.6, center: str = 'median', scale_method: str = 'mad', scale_factor: float = 3.0, min_scale: float = 1e-06, clip: bool = True, zero_offset: float = 0.0, fill_method: str = 'interpolate', on_transform: bool = True, on_inverse: bool = True, bounds_only: bool = False, **kwargs)¶
Bases:
EmptyTransformer
G.711-style non-linear scaler (mu-law or A-law).
Encodes (compands) on transform and decodes (expands to original space) on inverse_transform. Optimized for daily/hourly data by using robust per-series centering and scaling to normalize into [-1, 1] prior to companding.
- Parameters:
mode – ‘mu’ for mu-law or ‘a’ for A-law companding
mu – mu-law parameter (default 255.0, try 100-512 range)
A – A-law parameter (default 87.6, try 50-100 range)
center – ‘median’ or ‘mean’ for centering
scale_method – ‘mad’, ‘std’, ‘maxabs’, or ‘percentile’ for scale estimation
scale_factor – Multiplier for scale (default 3.0, lower=more compression, higher=better reconstruction)
min_scale – Minimum scale to prevent division by zero
clip – Whether to clip normalized values to [-1, 1]
zero_offset – Small constant added to handle exact zeros (e.g., 1e-6 for sparse data)
fill_method – Method to handle NaN values
on_transform – Apply encoding on transform
on_inverse – Apply decoding on inverse_transform
bounds_only – Only apply during bounds adjustment
Note
For sparse/intermittent data with many zeros, consider setting zero_offset or using alternative transformers like ReplaceConstant or CenterSplit.
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
- inverse_transform(df, trans_method: str = 'forecast', adjustment=None)¶
Inverse transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.g7xx_codec.G726Filter(quant_bits: int = 4, adaptation_rate: float = 0.96, prediction_alpha: float = 0.92, floor_step: float = 0.01, dynamic_range: float = 1.5, blend: float = 0.15, noise_gate: float = 0.0, fill_method: str = 'interpolate', on_transform: bool = True, on_inverse: bool = False, bounds_only: bool = False, quantizer: str = 'uniform', use_adaptive_predictor: bool = True, predictor_leak: float = 0.9999, **kwargs)¶
Bases:
EmptyTransformer
Adaptive differential PCM smoothing inspired by the G.726 codec.
Implements proper G.726-style adaptive filtering with:
- 2-pole/6-zero adaptive predictor with decision-directed updates (not unlike an ARMA(2,6) model)
- non-uniform quantization optimized for time series
- fast/slow scale factor adaptation with speed switching
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
- inverse_transform(df, trans_method: str = 'forecast', adjustment=None)¶
Inverse transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Transform data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- autots.tools.g7xx_codec.g711_decode(encoded: ndarray | DataFrame, center: ndarray, scale: ndarray, mode: str = 'mu', mu: float = 255.0, A: float = 87.6, zero_offset: float = 0.0) ndarray¶
Decode from G.711 companded space back to original feature space.
- Parameters:
zero_offset (float, optional) – Must match the offset used in encoding. Subtracted after decoding.
- autots.tools.g7xx_codec.g711_encode(values: ndarray | DataFrame, mode: str = 'mu', mu: float = 255.0, A: float = 87.6, center: str = 'median', scale_method: str = 'mad', scale_factor: float = 3.0, min_scale: float = 1e-06, clip: bool = True, zero_offset: float = 0.0) tuple[ndarray, ndarray, ndarray]¶
Encode values via G.711-style companding after robust normalization.
Returns encoded array and the center/scale used so that decoding can recover the original feature space.
- Parameters:
zero_offset (float, optional) – Small constant added before transformation to handle exact zeros. Useful for sparse/intermittent data. Default 0.0 (no offset).
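A round-trip sketch of the companding pair; values are illustrative and the reconstruction is approximate because companding is lossy by design:

```python
import numpy as np
from autots.tools.g7xx_codec import g711_encode, g711_decode

values = np.random.randn(365, 4) * 10  # (observations, series)

encoded, center, scale = g711_encode(values, mode="mu", mu=255.0)
decoded = g711_decode(encoded, center=center, scale=scale, mode="mu", mu=255.0)

print(np.abs(decoded - values).max())  # small reconstruction error
```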
- autots.tools.g7xx_codec.g726_adpcm_filter(values: ndarray | DataFrame, quant_bits: int = 4, adaptation_rate: float = 0.96, prediction_alpha: float = 0.92, floor_step: float = 0.01, dynamic_range: float = 1.5, blend: float = 0.15, noise_gate: float = 0.0, quantizer: str = 'uniform', use_adaptive_predictor: bool = True, predictor_leak: float = 0.9999) ndarray¶
Apply the adaptive encode/decode cycle to an array of time series.
- Parameters:
values (array-like or DataFrame) – Shape (observations, series). Each column is processed independently, but operations stay vectorized across series.
quant_bits (int, default 4) – Number of bits used for the quantizer (only for uniform mode).
adaptation_rate (float, default 0.96) – Only used when use_adaptive_predictor=False (legacy EMA mode).
prediction_alpha (float, default 0.92) – Only used when use_adaptive_predictor=False (legacy EMA mode).
floor_step (float, default 0.01) – Initial scale factor (minimum allowed value).
dynamic_range (float, default 1.5) – Initial scale multiplier (legacy mode only).
blend (float, default 0.15) – Optional blending factor with the predictor baseline. Must be in [0, 1].
noise_gate (float, default 0.0) – Additional soft threshold applied to residual magnitudes.
quantizer (str, default "uniform") – Quantization method: “uniform” for simple rounding, “nonuniform” for G.726-style decision levels optimized for time series.
use_adaptive_predictor (bool, default True) – Use 2-pole/6-zero adaptive predictor. If False, falls back to EMA.
predictor_leak (float, default 0.9999) – Leakage factor for predictor coefficients to prevent unbounded growth.
- Returns:
Filtered data with the same shape as values.
- Return type:
numpy.ndarray
autots.tools.hierarchial module¶
- autots.tools.hierarchial.compute_volatility_weights(S: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0) ndarray¶
Compute volatility-based weights for preferential adjustment of high-volatility series.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series.
volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv” (coefficient of variation)
volatility_power (float) – Power to raise volatility weights (higher values increase preference for volatile series)
- Returns:
vol_weights – Volatility-weighted matrix where higher weights are placed on more volatile series.
- Return type:
np.ndarray, shape (L, L)
- autots.tools.hierarchial.erm_reconcile(S: ndarray, y_all: ndarray, W: ndarray) ndarray¶
- ERM (Error or Empirical Risk Minimization) reconciliation using robust numerical methods:
Solve weighted LS: min_{y_bottom} ||y_all - S y_bottom||_W^2 subject to hierarchical constraints.
- The closed-form solution for y_bottom^r:
y_bottom^r = (S' W S)^{-1} S' W y_all
=> y_all^r = S y_bottom^r = S (S' W S)^{-1} S' W y_all
- Parameters:
S (np.ndarray, shape (L, M)) – Hierarchy matrix. L = # total levels, M = # bottom series.
y_all (np.ndarray, shape (T, L)) – Forecasts for T time points, dimension L.
W (np.ndarray, shape (L, L)) – Weight (covariance) matrix for the Weighted LS objective.
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
- class autots.tools.hierarchial.hierarchial(grouping_method: str = 'tile', n_groups: int = 5, reconciliation: str = 'mean', grouping_ids: dict | None = None)¶
Bases:
object
Create hierarchial series, then reconcile.
Currently only performs one-level groupings.
- Parameters:
grouping_method (str) – method to create groups. 'User' requires hier_id input of groupings.
n_groups (int) – number of groups, if the above is not 'User'
reconciliation (str) – None, or 'mean' method to combine top and bottom forecasts
grouping_ids (dict) – dict of series_id: group_id to use if grouping is 'User'
- fit(df)¶
Construct and save object info.
- reconcile(df)¶
Apply to forecasted data containing bottom and top levels.
- transform(df)¶
Apply hierarchy to existing data with bottom levels only.
- autots.tools.hierarchial.iterative_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, max_iterations: int = 10, convergence_threshold: float = 1e-06, damping_factor: float = 0.7) ndarray¶
Iterative MinT reconciliation that gradually converges to an optimal solution.
This method applies MinT reconciliation iteratively, updating the weight matrix based on reconciliation residuals from previous iterations.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.
W (np.ndarray, shape (L, L)) – The initial covariance (or weighting) matrix.
max_iterations (int) – Maximum number of iterations.
convergence_threshold (float) – Convergence threshold for relative change in reconciled forecasts.
damping_factor (float) – Damping factor for weight matrix updates (0 < damping_factor < 1).
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
- autots.tools.hierarchial.iterative_volatility_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0, volatility_mix: float = 0.5, max_iterations: int = 10, convergence_threshold: float = 1e-06, damping_factor: float = 0.7) ndarray¶
Combined iterative and volatility-weighted MinT reconciliation.
This method combines both approaches: volatility-based weighting and iterative refinement.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.
W (np.ndarray, shape (L, L)) – The base covariance (or weighting) matrix.
cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series for volatility computation.
volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv”
volatility_power (float) – Power to raise volatility weights.
volatility_mix (float) – Mixing parameter between base weights and volatility weights.
max_iterations (int) – Maximum number of iterations.
convergence_threshold (float) – Convergence threshold for relative change in reconciled forecasts.
damping_factor (float) – Damping factor for weight matrix updates.
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
- autots.tools.hierarchial.ledoit_wolf_covariance(X: ndarray, assume_centered: bool = False) ndarray¶
Computes the Ledoit-Wolf shrunk covariance matrix of X.
- Parameters:
X (np.ndarray, shape (n_samples, n_features)) – The data matrix. Each row is an observation, each column is a variable.
assume_centered (bool) – If True, X is assumed to already be centered.
- Returns:
lw_cov – The Ledoit-Wolf shrunk covariance matrix estimate.
- Return type:
np.ndarray, shape (n_features, n_features)
Notes
This shrinks the sample covariance toward the identity matrix.
The shrinkage intensity gamma is determined from data per Ledoit & Wolf (2004).
- autots.tools.hierarchial.mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray) ndarray¶
- MinT reconciliation formula using robust numerical methods:
y_all^r = S (S' W^-1 S)^-1 S' W^-1 y_all
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix. L = number of hierarchical levels (top + middle + bottom), M = number of bottom-level series.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points (the second dimension L must match S.shape[0]).
W (np.ndarray, shape (L, L)) – The (regularized) covariance (or weighting) matrix for the hierarchical levels.
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels (top, middle, bottom).
- Return type:
np.ndarray, shape (T, L)
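A worked sketch on a tiny hierarchy (one total over two bottom series); with W as the identity this reduces to OLS reconciliation:

```python
import numpy as np
from autots.tools.hierarchial import mint_reconcile

# L=3 levels, M=2 bottom series
S = np.array([[1.0, 1.0],   # total = a + b
              [1.0, 0.0],   # a
              [0.0, 1.0]])  # b

# incoherent base forecasts for T=4 steps: total != a + b
y_all = np.array([[10.0, 4.0, 5.0],
                  [12.0, 5.0, 6.0],
                  [11.0, 4.5, 5.5],
                  [13.0, 6.0, 6.0]])

W = np.eye(3)
y_rec = mint_reconcile(S, y_all, W)
print(y_rec[:, 0] - y_rec[:, 1] - y_rec[:, 2])  # ~0: coherent after reconciliation
```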
- autots.tools.hierarchial.volatility_weighted_mint_reconcile(S: ndarray, y_all: ndarray, W: ndarray, cov_bottom: ndarray, volatility_method: str = 'variance', volatility_power: float = 1.0, volatility_mix: float = 0.5) ndarray¶
Volatility-weighted MinT reconciliation that preferentially adjusts high-volatility series.
The method combines traditional MinT with volatility-based weighting: W_vol = (1 - α) * W + α * V where V is the volatility-based weight matrix and α is the mixing parameter.
- Parameters:
S (np.ndarray, shape (L, M)) – The hierarchy (aggregator) matrix.
y_all (np.ndarray, shape (T, L)) – Forecasts at all levels for T time points.
W (np.ndarray, shape (L, L)) – The base covariance (or weighting) matrix.
cov_bottom (np.ndarray, shape (M, M)) – Covariance matrix of bottom-level series for volatility computation.
volatility_method (str) – Method to compute volatility: “variance”, “std”, “cv”
volatility_power (float) – Power to raise volatility weights.
volatility_mix (float) – Mixing parameter between base weights (0) and volatility weights (1).
- Returns:
y_all_reconciled – Reconciled forecasts for all L levels.
- Return type:
np.ndarray, shape (T, L)
autots.tools.holiday module¶
Manage holiday features.
- autots.tools.holiday.holiday_flag(DTindex, country: str = 'US', encode_holiday_type: bool = False, holidays_subdiv=None)¶
Create a 0/1 flag for given datetime index. Includes fallback to pandas for US holidays if holidays package unavailable.
- Parameters:
DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates to create flags for
country (str) – country to pass through to the holidays python package; also accepts a list of countries, but not a list of subdivisions
encode_holiday_type (bool) – if True, each holiday gets a unique integer column; if False, 0/1 for all holidays
holidays_subdiv (str) – subdivision (i.e., state), if used
- Returns:
pd.DataFrame with DatetimeIndex
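A minimal sketch:

```python
import pandas as pd
from autots.tools.holiday import holiday_flag

DTindex = pd.date_range("2023-01-01", "2023-12-31", freq="D")
flags = holiday_flag(DTindex, country="US")  # 0/1 column(s) indexed by date
typed = holiday_flag(DTindex, country="US", encode_holiday_type=True)
```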
- autots.tools.holiday.query_holidays(DTindex, country: str, encode_holiday_type: bool = False, holidays_subdiv=None)¶
Query holidays package for dates.
- Parameters:
DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates to create flags for
country (str) – country to pass through to the holidays python package
encode_holiday_type (bool) – if True, each holiday gets a unique integer column; if False, 0/1 for all holidays
autots.tools.impute module¶
Fill NA.
- autots.tools.impute.FillNA(df, method: str = 'ffill', window: int = 10)¶
Fill NA values using different methods.
- Parameters:
method (str) – one of:
'ffill' - fill the most recent non-NA value forward until another non-NA value is reached
'zero' - fill with zero; useful for sales and other data where NA usually does mean $0
'mean' - fill all missing values with the series' overall average value
'median' - fill all missing values with the series' overall median value
'rolling mean' - fill with the mean of the last n (window) values
'ffill mean biased' - simple average of ffill and mean
'fake date' - shift data forward over NaNs, so values will have incorrect timestamps
'seasonal_linear' - seasonally-aware linear regression imputation using datetime and local features
'seasonal_linear_window_3' - seasonal linear with window=3
'seasonal_linear_window_10' - seasonal linear with window=10
also most method values of pd.DataFrame.interpolate()
window (int) – length of rolling windows for filling na, for rolling methods
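A minimal sketch of the two styles of fill:

```python
import numpy as np
import pandas as pd
from autots.tools.impute import FillNA

idx = pd.date_range("2022-01-01", periods=100, freq="D")
df = pd.DataFrame(np.random.randn(100, 2), index=idx, columns=["a", "b"])
df.iloc[10:15, 0] = np.nan

filled = FillNA(df, method="ffill")
filled_seasonal = FillNA(df, method="seasonal_linear", window=10)
```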
- class autots.tools.impute.SeasonalityMotifImputer(k: int = 3, datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False)¶
Bases:
object
- impute(df)¶
Infer missing values on input df.
- class autots.tools.impute.SimpleSeasonalityMotifImputer(datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False, max_iter: int = 100)¶
Bases:
object
- impute(df)¶
Infer missing values on input df.
- autots.tools.impute.biased_ffill(df, mean_weight: float = 1)¶
Fill NaN with average of last value and mean.
- autots.tools.impute.fake_date_fill(df, back_method: str = 'slice')¶
Numpy vectorized version. Return a dataframe where na values are removed and values shifted forward.
Warning
Thus, values will have incorrect timestamps!
- Parameters:
back_method (str) – how to deal with tails left by shifting NaN - ‘bfill’ -back fill the last value - ‘slice’ - drop any rows above threshold where half are nan, then bfill remainder - ‘slice_all’ - drop any rows with any na - ‘keepna’ - keep the lagging na
- autots.tools.impute.fake_date_fill_old(df, back_method: str = 'slice')¶
Return a dataframe where na values are removed and values shifted forward.
Warning
Thus, values will have incorrect timestamps!
- Parameters:
back_method (str) – how to deal with tails left by shifting NaN - ‘bfill’ -back fill the last value - ‘slice’ - drop any rows above threshold where half are nan, then bfill remainder - ‘slice_all’ - drop any rows with any na - ‘keepna’ - keep the lagging na
- autots.tools.impute.fill_forward(df)¶
Fill NaN with previous values.
- autots.tools.impute.fill_forward_alt(df)¶
Fill NaN with previous values.
- autots.tools.impute.fill_mean(df)¶
- autots.tools.impute.fill_mean_old(df)¶
Fill NaN with mean.
- autots.tools.impute.fill_median(df)¶
Fill nan with median values. Does not work with non-numeric types.
- autots.tools.impute.fill_median_old(df)¶
Fill NaN with median.
- autots.tools.impute.fill_one(df)¶
Fill NaN with one.
- autots.tools.impute.fill_zero(df)¶
Fill NaN with zero.
- autots.tools.impute.fillna_np(array, values)¶
- autots.tools.impute.rolling_mean(df, window: int = 10)¶
Fill NaN with mean of last window values.
- autots.tools.impute.seasonal_linear_imputer(df, datepart_method: str = 'common_fourier', window: int = 5, lambda_: float = 0.01)¶
Seasonally-aware linear interpolation using multioutput linear regression.
This method creates a feature set consisting of local linear trend features (changepoints) and datetime features (seasonal patterns). It trains a ridge regression model on all non-missing points, then predicts on all missing points to fill them.
This is fully vectorized for speed on large datasets and uses matrix operations across the entire dataframe. Unlike BasicLinearModel, this handles missing data by training only on non-NaN rows.
- Parameters:
df (pd.DataFrame) – DataFrame with datetime index and potential missing values
datepart_method (str) – Method for generating seasonal features. Default is ‘common_fourier’
window (int) – Controls changepoint spacing for local linear trend features. Default is 5
lambda_ (float) – Ridge regression regularization parameter. Default is 0.01
- Returns:
DataFrame with missing values filled
- Return type:
pd.DataFrame
Example
>>> df_filled = seasonal_linear_imputer(df, window=7)
>>> # or via FillNA:
>>> df_filled = FillNA(df, method='seasonal_linear', window=10)
autots.tools.kalman module¶
Created on Wed Feb 5 16:29:58 2025
- autots.tools.kalman.kalman_fusion_forecasts(F: ndarray, index, columns, coverage: float = 0.9, method: str = 'multi_series', Q_init=0.1, R_init=1.0, adapt_Q: str | None = None, adapt_R: str | None = None, initial_x=None, initial_P=None, min_std=1e-15, scale: bool = True, a: float = 1.0)¶
Fuse multiple forecasts using a Kalman Filter for each forecast step.
- Parameters:
F (np.ndarray) – Forecasts of shape (n, S, T) - n = number of models - S = forecast length (time steps) - T = number of time series
index (pd.Index) – Index for the returned DataFrames (length=S).
columns (list-like) – Column names for the returned DataFrames (length=T).
coverage (float) – e.g. 0.90 => ~1.645 standard deviations for intervals.
method (str) – Either “multi_series” or “per_series”. - “multi_series”: single (T-dim) Kalman Filter for all T series together - “per_series”: run T separate 1D Kalman Filters, one per series
Q_init (float or np.ndarray) – Base process noise (or matrix).
R_init (float or np.ndarray) – Base measurement noise (or matrix).
adapt_Q (str) – If “spread”, adapt Q each step based on model spread.
adapt_R (str) – If “spread”, adapt R each step based on model spread.
initial_x (np.ndarray) – initial state guess. - For multi_series: shape (T,) - For per_series: shape (T,)
initial_P (np.ndarray) – initial covariance guess. - For multi_series: shape (T, T) - For per_series: shape (T,)
min_std (float) – Small floor value for numerical stability in standard deviations.
scale (bool) – If True, each series is scaled (by its standard deviation) before filtering. The fused forecasts are converted back to the original scale.
- Returns:
df_point (pd.DataFrame) – Kalman-fused point forecasts, shape (S, T)
df_lower (pd.DataFrame) – lower bound, shape (S, T)
df_upper (pd.DataFrame) – upper bound, shape (S, T)
- Return type:
tuple
Notes
In “multi_series” mode the state is a T-dimensional vector that is updated with an (n*T)-dimensional measurement (the stacked forecasts from the n models).
In “per_series” mode the algorithm runs T separate 1D Kalman Filters (one per series). In that case, for each forecast step (loop over S) you update each of the T filters independently.
When scale=True, each series is divided by its standard deviation (computed over all forecasts) before filtering; after filtering the outputs are multiplied by the same scale so that the returned forecasts are in the original feature space.
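A minimal sketch fusing three model forecasts; all values are illustrative:

```python
import numpy as np
import pandas as pd
from autots.tools.kalman import kalman_fusion_forecasts

n, S, T = 3, 14, 2  # 3 models, 14-step horizon, 2 series
F = np.random.randn(n, S, T).cumsum(axis=1)  # stacked model forecasts

index = pd.date_range("2024-01-01", periods=S, freq="D")
columns = ["series_a", "series_b"]

df_point, df_lower, df_upper = kalman_fusion_forecasts(
    F, index=index, columns=columns, coverage=0.9, method="per_series"
)
```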
- autots.tools.kalman.robust_pinv(M, rcond=1e-15, reg=1e-08)¶
autots.tools.lunar module¶
Phases of the moon. Modified from https://stackoverflow.com/a/2531541/9492254 by keturn and earlier from John Walker
- autots.tools.lunar.dcos(d)¶
- autots.tools.lunar.dsin(d)¶
- autots.tools.lunar.fixangle(a)¶
- autots.tools.lunar.kepler(m, ecc=0.016718)¶
Solve the equation of Kepler.
- autots.tools.lunar.moon_phase(datetime_index, epsilon=1e-06, epoch=2444237.905, ecliptic_longitude_epoch=278.83354, ecliptic_longitude_perigee=282.596403, eccentricity=0.016718, moon_mean_longitude_epoch=64.975464, moon_mean_perigee_epoch=349.383063)¶
Numpy version. Takes a pd.DatetimeIndex and returns moon phase (% illuminated). The epoch can be adjusted slightly (0.5 = half day) to account for time zones. The default is for the US; epoch=2444238.5 works for Asia generally.
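For example, a sketch of computing illumination for a month of dates (the return is assumed to be array-like; the exact 0-1 vs 0-100 scale is not specified above):

import pandas as pd
from autots.tools.lunar import moon_phase

dates = pd.date_range("2024-01-01", "2024-02-01", freq="D")
illumination = moon_phase(dates)  # % illuminated per date, per the docstring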
- autots.tools.lunar.moon_phase_alternative(datetime_index, epoch=2451545.0)¶
Numpy version. Takes a pd.DatetimeIndex and returns moon phase (%illuminated).
- autots.tools.lunar.moon_phase_df(datetime_index, epoch=2444237.905)¶
Convert pandas DatetimeIndex to moon phases. Note timezone and hour can matter slightly. The epoch can be adjusted slightly (0.5 = half day) to account for time zones. 2444237.905 is for US Central; epoch=2444238.5 works for Asia generally.
- autots.tools.lunar.moon_phase_df_alternative(datetime_index, epoch=None)¶
Convert pandas DatetimeIndex to moon phases. Note timezone and hour can matter slightly.
- Parameters:
datetime_index – pandas DatetimeIndex
epoch – If provided, uses the approximate (legacy) method for backward compatibility. If None, uses the new more accurate Jean Meeus method.
- autots.tools.lunar.phase_string(p, precision=0.05, new=0.0, first=0.25, full=0.4, last=0.75, nextnew=1.0)¶
- autots.tools.lunar.todeg(r)¶
- autots.tools.lunar.torad(d)¶
autots.tools.mocks module¶
Fake/local imports for when libraries aren’t available, to prevent the whole package from failing to load.
- class autots.tools.mocks.DataLoader(dataset, *args, **kwargs)¶
Bases:
object
- class autots.tools.mocks.Dataset(*args, **kwargs)¶
Bases:
object
- class autots.tools.mocks.MinMaxScaler(feature_range=(0, 1))¶
Bases:
object
- fit(X)¶
- fit_transform(X)¶
- transform(X)¶
- class autots.tools.mocks.Module(*args, **kwargs)¶
Bases:
object
- eval()¶
- forward(*args, **kwargs)¶
- load_state_dict(state_dict, strict=True)¶
- state_dict()¶
- train(mode=True)¶
- class autots.tools.mocks.StandardScaler¶
Bases:
object
StandardScaler that works with both numpy arrays and pandas DataFrames. Compatible with sklearn.preprocessing.StandardScaler interface.
- fit(X)¶
Compute the mean and standard deviation for each feature.
- fit_transform(X)¶
Fit the scaler and transform the dataset.
- inverse_transform(X)¶
Revert the scaled data back to the original scale.
- transform(X)¶
Scale the dataset using the stored mean and standard deviation.
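A minimal round-trip sketch, assuming the sklearn-compatible interface described above (the mock is intended as a drop-in when scikit-learn is unavailable):

import numpy as np
from autots.tools.mocks import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per column
X_restored = scaler.inverse_transform(X_scaled)  # back to original values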
- class autots.tools.mocks.TensorDataset(*tensors)¶
Bases:
object
- autots.tools.mocks.butter(x)¶
- autots.tools.mocks.curve_fit(x)¶
- autots.tools.mocks.fftconvolve(x)¶
- class autots.tools.mocks.norm(loc=0, scale=1)¶
Bases:
object
- cdf(x)¶
Cumulative distribution function using error function.
- ppf(q)¶
Simple normal quantile approximation using Abramowitz & Stegun rational approximation.
- autots.tools.mocks.savgol_filter(x)¶
- autots.tools.mocks.sosfiltfilt(x)¶
- autots.tools.mocks.tqdm(x, **kwargs)¶
autots.tools.percentile module¶
Faster percentile and quantile for numpy
Entirely from: https://krstn.eu/np.nanpercentile()-there-has-to-be-a-faster-way/
- autots.tools.percentile.nan_percentile(in_arr, q, method='linear', axis=0, errors='raise')¶
Given a 3D array, return the percentiles specified by q. Beware this is only tested for the limited case required here and will not fully match np.nanpercentile; the supported args are more limited. If errors=”rollover”, it passes through to np.nanpercentile where args are not supported.
- autots.tools.percentile.nan_quantile(arr, q, method='linear', axis=0, errors='raise')¶
Same as nan_percentile but accepts q in the range [0, 1]. The supported args are more limited. If errors=”rollover”, it passes through to np.nanpercentile where args are not supported.
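A hedged sketch of nan_quantile on a 3D array, matching the limited tested case described above (random data is illustrative only):

import numpy as np
from autots.tools.percentile import nan_quantile

arr = np.random.rand(100, 5, 3)
arr[::7] = np.nan  # inject some missing rows
upper = nan_quantile(arr, q=0.95, axis=0)  # expected shape: (5, 3)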
- autots.tools.percentile.trimmed_mean(data, percent, axis=0)¶
autots.tools.plotting module¶
Shared plotting utilities for feature-rich time series.
- autots.tools.plotting.calculate_peak_density(model: str, data: DataFrame, group_col: str = 'Model', y_col: str = 'TotalRuntimeSeconds')¶
Maximum KDE value for the given model’s distribution.
- autots.tools.plotting.create_seaborn_palette_from_cmap(cmap_name: str = 'gist_rainbow', n: int = 10)¶
Return seaborn palette sampling the given matplotlib cmap.
- autots.tools.plotting.plot_distributions(runtimes_data: DataFrame, group_col: str = 'Model', y_col: str = 'TotalRuntimeSeconds', xlim: float | None = None, xlim_right: float | None = None, title_suffix: str = '')¶
Plot runtime density per group with custom palette.
- autots.tools.plotting.plot_feature_panels(series_name: str, date_index: DatetimeIndex, series_data: Series | ndarray, components: Mapping[str, Iterable[float]], labels: Mapping[str, Iterable], series_type_description: str | None = None, scale: float | None = None, noise_to_signal: float | None = None, figsize=(16, 12), title_prefix: str = 'Feature Analysis', save_path: str | None = None, show: bool = True)¶
Create a four-panel diagnostic plot shared by generator and detector.
- autots.tools.plotting.plot_forecast_with_intervals(plot_df: DataFrame, actual_col: str | None = 'actuals', forecast_col: str = 'forecast', lower_col: str = 'low_forecast', upper_col: str = 'up_forecast', title: str | None = None, colors: Mapping[str, str] | None = None, include_bounds: bool = True, alpha: float = 0.3, band_color: str | None = None, interval_label: str | None = 'Prediction Interval', band_kwargs: Mapping[str, Any] | None = None, plot_lines: bool = True, ax=None, **plot_kwargs)¶
Plot forecast (and optionally actuals) with confidence bounds.
- autots.tools.plotting.plot_risk_score_bar(risk_data: Series | ndarray, index=None, bar_color: str = '#6495ED', bar_ylim: tuple | list | None = None, title: str = 'Risk Score', ylabel: str = 'Risk', xlabel: str = 'Forecast Horizon', ax=None, **bar_kwargs)¶
Plot risk scores as a bar chart.
Utility function for plotting event risk or similar probability scores.
- Parameters:
risk_data – Series or array of risk scores to plot
index – x-axis values; if None, uses range or Series index
bar_color – color for bars
bar_ylim – y-axis limits as (min, max) or [min, max]
title – chart title
ylabel – y-axis label
xlabel – x-axis label
ax – matplotlib axis to plot on; if None, creates new subplot
**bar_kwargs – additional arguments passed to ax.bar()
- Returns:
matplotlib axis
- autots.tools.plotting.plot_simulation_paths(simulations: ndarray, index=None, colors: list | None = None, alpha: float = 0.9, linewidth: float = 1.2, ax=None, **plot_kwargs)¶
Plot multiple simulation/forecast paths.
Utility for plotting Monte Carlo simulations, motif neighbors, or ensemble members.
- Parameters:
simulations – 2D array of shape (n_simulations, n_timesteps)
index – x-axis values; if None, uses range
colors – list of color strings for each path; if None, uses random grays
alpha – transparency for lines
linewidth – width of lines
ax – matplotlib axis to plot on; if None, creates new subplot
**plot_kwargs – additional arguments passed to ax.plot()
- Returns:
matplotlib axis
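For instance, a sketch assuming matplotlib is installed (the returned object is the matplotlib axis, per the Returns section above):

import numpy as np
from autots.tools.plotting import plot_simulation_paths

sims = np.cumsum(np.random.randn(25, 60), axis=1)  # 25 paths, 60 timesteps
ax = plot_simulation_paths(sims, alpha=0.5)
ax.set_title("Simulated forecast paths")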
autots.tools.probabilistic module¶
Point to Probabilistic
- autots.tools.probabilistic.Point_to_Probability(train, forecast, prediction_interval=0.9, method: str = 'historic_quantile')¶
Data driven placeholder for model error estimation.
Catlin Point to Probability method (‘a mixture of dark magic and gum disease’)
- Parameters:
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
prediction_interval (float) – confidence or perhaps credible interval
method (str) – spell to cast to create dark magic. ‘historic_quantile’, ‘inferred_normal’, ‘variable_pct_change’ gum disease available separately upon request.
- Returns:
upper_error, lower_error (two pandas.DataFrames for upper and lower bound respectively)
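A minimal sketch, assuming the (upper_error, lower_error) return order given above; the random train/forecast frames are purely illustrative:

import numpy as np
import pandas as pd
from autots.tools.probabilistic import Point_to_Probability

idx = pd.date_range("2024-01-01", periods=100, freq="D")
train = pd.DataFrame({"a": np.random.rand(100)}, index=idx)
future_idx = pd.date_range("2024-04-10", periods=14, freq="D")
forecast = pd.DataFrame({"a": np.random.rand(14)}, index=future_idx)
upper, lower = Point_to_Probability(train, forecast, prediction_interval=0.9)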
- autots.tools.probabilistic.Variable_Point_to_Probability(train, forecast, alpha=0.3, beta=1)¶
Data driven placeholder for model error estimation.
ErrorRange = beta * (En + alpha * En-1 [cum sum of En])
En = abs(0.5 - QTP) * D
D = abs(Xn - ((Avg % Change of Train * Xn-1) + Xn-1))
Xn = Forecast Value
QTP = Percentile of Score in All Percent Changes of Train
Score = Percent Change (from Xn-1 to Xn)
- Parameters:
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
alpha (float) – parameter which affects the broadening of the error range over time. Usually 0 < alpha < 1 (although it can be larger than 1).
beta (float) – parameter which affects the general width of the error bar. Usually 0 < beta < 1 (although it can be larger than 1).
- Returns:
error width for each value of forecast.
- Return type:
ErrorRange (pandas.DataFrame)
- autots.tools.probabilistic.historic_quantile(df_train, prediction_interval: float = 0.9, nan_flag=None)¶
Computes the difference between the median and the prediction interval range in historic data.
- Parameters:
df_train (pd.DataFrame) – a dataframe of training data
prediction_interval (float) – the desired forecast interval range
- Returns:
two 1D arrays
- Return type:
lower, upper (np.array)
- autots.tools.probabilistic.inferred_normal(train, forecast, n: int = 5, prediction_interval: float = 0.9)¶
A corruption of Bayes' theorem. It will be sensitive to the transformations of the data.
- autots.tools.probabilistic.percentileofscore_appliable(x, a, kind='rank')¶
autots.tools.profile module¶
autots.tools.regressor module¶
autots.tools.seasonal module¶
seasonal
@author: Colin
- autots.tools.seasonal.anchored_segment_fourier_features(DTindex, method: str)¶
- autots.tools.seasonal.anchored_warped_fourier_features(DTindex, method: str)¶
- autots.tools.seasonal.create_datepart_components(DTindex, seasonality)¶
Single date part one-hot flags.
- autots.tools.seasonal.create_seasonality_feature(DTindex, t, seasonality, history_days=None)¶
Cassandra-designed feature generator.
- autots.tools.seasonal.date_part(DTindex, method: str = 'simple', set_index: bool = True, polynomial_degree: int | None = None, holiday_country: str | None = None, holiday_countries_used: bool = True, lags: int | None = None, forward_lags: int | None = None)¶
Create date part columns from pd.DatetimeIndex.
If your date_part method isn't recognized, you will get [‘year’, ‘month’, ‘day’, ‘weekday’] output.
- Parameters:
DTindex (pd.DatetimeIndex) – datetime index to provide dates
method (str) – expanded, recurring, or simple:
simple - just day, year, month, weekday
expanded - all available features
recurring - all features that should commonly repeat without aging
also available: simple_2, simple_3, simple_binarized, expanded_binarized, common_fourier
set_index (bool) – if True, return DTindex as index of df
polynomial_degree (int) – add this degree of sklearn polynomial features if not None
holiday_country (list or str) – names of countries to pull calendar holidays for
holiday_countries_used (bool) – to use holiday_country if given
lags (int) – if not None, include the past N previous index date parts
forward_lags (int) – if not None, include the future N index date parts
- Returns:
pd.DataFrame with DTindex
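For instance, a sketch using one of the documented method names:

import pandas as pd
from autots.tools.seasonal import date_part

idx = pd.date_range("2024-01-01", periods=30, freq="D")
features = date_part(idx, method="simple_binarized", set_index=True)
# features is a pd.DataFrame of date part columns indexed by idx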
- autots.tools.seasonal.fourier_df(DTindex, seasonality, order=10, t=None, history_days=None)¶
- autots.tools.seasonal.fourier_series(t, p=365.25, n=10)¶
- autots.tools.seasonal.random_datepart(method='random')¶
New random parameters for seasonality.
- autots.tools.seasonal.seasonal_independent_match(DTindex, DTindex_future, k, datepart_method='simple_binarized', distance_metric='canberra', full_sort=False, nan_array=None)¶
- autots.tools.seasonal.seasonal_int(include_one: bool = False, small=False, very_small=False)¶
Generate a random integer of typical seasonalities.
- Parameters:
include_one (bool) – whether to include 1 in output options
small (bool) – if True, keep below 364
very_small (bool) – if True keep below 30
- autots.tools.seasonal.seasonal_repeating_wavelet(DTindex, p, order=12, sigma=4.0, wavelet_type='morlet')¶
- autots.tools.seasonal.seasonal_window_match(DTindex, k, window_size=10, forecast_length=None, datepart_method='common_fourier_rw', distance_metric='mae', full_sort=False)¶
autots.tools.shaping module¶
Reshape data.
- class autots.tools.shaping.NumericTransformer(na_strings: list = ['', ' '], categorical_fillna: str = 'ffill', handle_unknown: str = 'use_encoded_value', downcast: str | None = None, verbose: int = 0)¶
Bases:
object
General purpose numeric conversion for pandas dataframes.
All categorical data and levels must be passed to .fit(). If new categorical series or levels are present in .transform() it won’t work!
Currently datetimes cannot be inverse_transformed back to datetime.
- Parameters:
na_strings (list) – list of strings to replace as pd.NA
categorical_fillna (str) – how to fill NaN for categorical variables (numeric NaN are unaltered). “ffill” - uses forward and backward filling to supply na values. “indicator” or anything else - currently results in all missing replaced with str “missing_value”
handle_unknown (str) – passed through to scikit-learn OrdinalEncoder
downcast (str) – passed to pd.to_numeric, use None or ‘float’
verbose (int) – greater than 0 to print some messages
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- inverse_transform(df, convert_dtypes: bool = False)¶
Convert numeric back to categorical.
- Parameters:
df (pandas.DataFrame) – df
convert_dtypes (bool) – whether to use pd.convert_dtypes after inverse
- transform(df)¶
Convert categorical dataset to numeric.
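A hedged round-trip sketch on mixed categorical/numeric data (the sample frame is hypothetical):

import pandas as pd
from autots.tools.shaping import NumericTransformer

df = pd.DataFrame({
    "sales": [1.0, 2.0, None, 4.0],
    "region": ["north", "south", "south", "north"],  # categorical column
})
transformer = NumericTransformer(categorical_fillna="ffill")
numeric_df = transformer.fit_transform(df)  # all columns now numeric
restored = transformer.inverse_transform(numeric_df)  # categories restored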
- autots.tools.shaping.clean_weights(weights, series, verbose=0)¶
Polish up series weighting information.
- Parameters:
weights (dict) – dictionary of series_id: weight (float or int)
series (iterable) – list of series_ids in the dataset
- autots.tools.shaping.df_cleanup(df_wide, frequency: str = 'infer', prefill_na: str | None = None, na_tolerance: float = 0.999, drop_data_older_than_periods: int = 100000, drop_most_recent: int = 0, aggfunc: str = 'first', verbose: int = 1)¶
Pass cleaning functions through to dataframe.
- Parameters:
df_wide (pd.DataFrame) – input dataframe to clean.
frequency (str, optional) – frequency in string of alias for DateOffset object, normally “1D” for daily, “MS” for month start, etc. Aliases are listed here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html. Defaults to “infer”.
na_tolerance (float, optional) – allow up to this fraction of values to be NaN, else drop the entire series; e.g. 0.95 means a series can be 95% NaN values and still be included. Defaults to 0.999.
drop_data_older_than_periods (int, optional) – cut off older data because eventually you just get too much. Defaults to 100000.
drop_most_recent (int, optional) – number of most recent data points to remove. Useful if you pull monthly data before month end, and you don’t want an incomplete month appearing complete. Defaults to 0.
aggfunc (str, optional) – passed to pd.pivot_table, determines how to aggregate duplicates for upsampling. Other options include “mean” and other numpy functions, beware data must already be input as numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended. Defaults to ‘first’.
verbose (int, optional) – 0 for silence, higher values for more noise. Defaults to 1.
- Returns:
original dataframe, now possibly shorter.
- Return type:
pd.DataFrame
- autots.tools.shaping.drop_leading_na_block(df: DataFrame, threshold: float = 0.75, window: int = 7, max_drop_fraction: float = 0.5, warn: bool = True) DataFrame¶
- Drop an initial leading block of rows if:
Those rows are ‘bad’ (row NaN proportion > threshold), AND
There are at least window consecutive bad rows starting from the top.
Will warn or raise if more than max_drop_fraction of rows would be dropped.
- Parameters:
df – pandas DataFrame with a DatetimeIndex.
threshold – Proportion of NaNs to classify a row as bad.
window – Number of consecutive leading bad rows required.
max_drop_fraction – Maximum allowed fraction of rows to drop.
warn – If True, raises a UserWarning on excessive drop; if False, proceeds with printed warning.
- Returns:
Trimmed DataFrame (or identical DataFrame if no qualifying block found).
- autots.tools.shaping.freq_to_timedelta(freq)¶
Working around pandas limitations.
- autots.tools.shaping.infer_frequency(df_wide, warn=True, **kwargs)¶
Infer the frequency in a slightly more robust way.
- Parameters:
df_wide (pd.Dataframe or pd.DatetimeIndex) – input to pull frequency from
warn (bool) – unused, here to make swappable with pd.infer_freq
- autots.tools.shaping.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', aggfunc: str = 'first')¶
Take long data and convert into wide, cleaner data.
- Parameters:
df (pd.DataFrame) –
date_col (str) –
value_col (str) –
the name of the column with the values of the time series (ie sales $)
id_col (str) –
name of the id column, unique for each time series
aggfunc (str) –
passed to pd.pivot_table, determines how to aggregate duplicates for series_id and datetime
other options include “mean” and other numpy functions, beware data must already be input as numeric type for these to work. if categorical data is provided, aggfunc=’first’ is recommended
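For example, a sketch using the documented default column names (the sample frame is hypothetical):

import pandas as pd
from autots.tools.shaping import long_to_wide

long_df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]
    ),
    "series_id": ["a", "b", "a", "b"],
    "value": [1.0, 10.0, 2.0, 20.0],
})
wide_df = long_to_wide(long_df)  # dates as index, one column per series_id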
- autots.tools.shaping.simple_train_test_split(df, forecast_length: int = 10, min_allowed_train_percent: float = 0.3, verbose: int = 1)¶
Uses the last forecast_length periods as the test set and the rest as train.
- Parameters:
forecast_length (int) – number of future periods to predict
min_allowed_train_percent (float) – forecast length cannot be greater than 1 - this; this constrains the forecast length from being much larger than the training data (note this includes NaNs in the current configuration)
- Returns:
train, test (both pd DataFrames)
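A minimal sketch of the split (the frame is hypothetical):

import numpy as np
import pandas as pd
from autots.tools.shaping import simple_train_test_split

idx = pd.date_range("2024-01-01", periods=100, freq="D")
df = pd.DataFrame({"a": np.arange(100.0)}, index=idx)
train, test = simple_train_test_split(df, forecast_length=10)
# test holds the final 10 periods; train holds the first 90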
- autots.tools.shaping.split_digits_and_non_digits(s)¶
- autots.tools.shaping.subset_series(df, weights, n: int = 1000, random_state: int = 2020)¶
Return a sample of time series.
- Parameters:
df (pd.DataFrame) – wide df with series as columns and DT index
n (int) – number of unique time series to keep, or None
random_state (int) – random seed
- autots.tools.shaping.wide_to_3d(wide_arr, seasonality=7, output_shape='gst')¶
Generates 3d (groups/seasonality, series, time steps) from wide (time step, series) numpy array.
- Parameters:
wide_arr (np.array) – wide style (timesteps, series) numpy time series
seasonality (int) – seasonality of the series to use, avoid really large values
output_shape (str) – either ‘gst’ or ‘sgt’, the output shape: gst = (groups/seasonality, series, time steps); sgt = (series, groups/seasonality, time steps)
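An illustrative sketch; using 28 rows (a multiple of the seasonality) sidesteps any undocumented handling of partial groups:

import numpy as np
from autots.tools.shaping import wide_to_3d

wide = np.random.rand(28, 3)  # 28 timesteps, 3 series
arr3d = wide_to_3d(wide, seasonality=7, output_shape="gst")
# expected shape roughly (7, 3, 4): groups, series, steps per group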
autots.tools.thresholding module¶
Created on Thu Jul 7 10:27:46 2022
@author: Colin
- class autots.tools.thresholding.NonparametricThreshold(data, warmup_pts: int = 1, p=0.1, error_buffer=1, z_init=2.5, z_limit=12.0, z_step=0.5, max_contamination=0.25, mean_weight: float = 10, sd_weight: float = 10, anomaly_count_weight: float = 1, inverse: bool = False)¶
Bases:
object
- compare_to_epsilon(inverse=False)¶
Compare smoothed error values to epsilon (error threshold) and group consecutive errors together into sequences.
- Parameters:
errors_all (obj) – Errors class object containing list of all previously identified anomalies in test set
- find_epsilon(inverse=False)¶
Find the anomaly threshold that maximizes a function representing the tradeoff between:
a) the number of anomalies and anomalous ranges
b) the reduction in mean and st dev if anomalous points are removed from errors
(see https://arxiv.org/pdf/1802.04431.pdf)
- Parameters:
inverse (bool) – If true, epsilon is calculated for inverted errors
- prune_anoms(inverse=False)¶
Remove anomalies that don’t meet the minimum separation from the next closest anomaly or error value.
- Parameters:
inverse (bool) – If true, epsilon is calculated for inverted errors
- score_anomalies()¶
Calculate anomaly scores based on max distance from epsilon for each anomalous sequence.
- autots.tools.thresholding.consecutive_groups(iterable, ordering=<function <lambda>>)¶
Yield groups of consecutive items using itertools.groupby(). From the more_itertools package; see the description there for details (circa mid 2022).
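For example, matching the more_itertools behavior referenced above (each yielded group is an iterator, hence the list() calls):

from autots.tools.thresholding import consecutive_groups

runs = [list(g) for g in consecutive_groups([1, 2, 3, 7, 8, 12])]
# runs == [[1, 2, 3], [7, 8], [12]]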
- autots.tools.thresholding.nonparametric(series, method_params)¶
autots.tools.transform module¶
autots.tools.wavelet module¶
- autots.tools.wavelet.continuous_db2_wavelet(t, p, order, sigma)¶
- autots.tools.wavelet.create_daubechies_db2_wavelet(p)¶
Create a Daubechies db2 wavelet with specified period p.
Parameters: - p (int): The period or length to generate the wavelet.
Returns: - np.ndarray: The generated Daubechies db2 wavelet.
- autots.tools.wavelet.create_gaussian_wavelet(p, frequency=3, sigma=1.0)¶
Create a Gaussian-modulated cosine wavelet with specified frequency and sigma.
Parameters: - p (float): The period or length to generate the wavelet. - frequency (int): Frequency of the cosine wave. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated Gaussian-modulated wavelet.
- autots.tools.wavelet.create_haar_wavelet(p)¶
Create a Haar wavelet with specified period p.
Parameters: - p (float): The period or length to generate the wavelet.
Returns: - np.ndarray: The generated Haar wavelet.
- autots.tools.wavelet.create_mexican_hat_wavelet(p, frequency=None, sigma=1.0)¶
Create a Mexican Hat wavelet (Ricker wavelet) with specified sigma.
Parameters: - p (float): The period or length to generate the wavelet. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated Mexican Hat wavelet.
- autots.tools.wavelet.create_morlet_wavelet(p, frequency=3, sigma=1.0)¶
Create a Morlet wavelet with specified frequency and sigma.
Parameters: - p (float): The period or length to generate the wavelet. - frequency (int): Frequency of the cosine wave. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated complex Morlet wavelet.
- autots.tools.wavelet.create_narrowing_wavelets(p, max_order, t, sigma=0.5)¶
- autots.tools.wavelet.create_real_morlet_wavelet(p, frequency=3, sigma=1.0)¶
Create a real-valued Morlet wavelet with specified frequency and sigma.
Parameters: - p (float): The period or length to generate the wavelet. - frequency (int): Frequency of the cosine wave. - sigma (float): Standard deviation for the Gaussian envelope.
Returns: - np.ndarray: The generated real Morlet wavelet.
- autots.tools.wavelet.create_wavelet(t, p, sigma=1.0, phase_shift=0, wavelet_type='morlet')¶
Create a real-valued wavelet based on real-world anchored time steps in t, with an additional phase shift and a choice of wavelet type.
Parameters: - t (np.ndarray): Array of time steps (in days) from a specified origin. - p (float): The period of the wavelet in the same units as t (typically days). - sigma (float): Standard deviation for the Gaussian envelope. - phase_shift (float): Phase shift to adjust the position of the wavelet peak. - wavelet_type (str): Type of wavelet (‘morlet’ or ‘ricker’).
Returns: - np.ndarray: The generated wavelet values for each time step.
- autots.tools.wavelet.offset_wavelet(p, t, order=5, sigma=1.0, wavelet_type='morlet')¶
Create an offset collection of wavelets with order offsets, ensuring that peaks are spaced p/order apart.
Parameters: - p (float): Period of the wavelet in the same units as t (typically days). - t (np.ndarray): Array of time steps. - order (int): The number of offsets. - sigma (float): Standard deviation for the Gaussian envelope. - wavelet_type (str): Type of wavelet (‘morlet’ or ‘ricker’).
Returns: - np.ndarray: A 2D array with order wavelets along axis 1.
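A hedged sketch of building weekly seasonal features from offset wavelets (parameter values are illustrative):

import numpy as np
from autots.tools.wavelet import offset_wavelet

t = np.arange(365)  # time steps in days from an arbitrary origin
weekly = offset_wavelet(p=7, t=t, order=7, sigma=1.0, wavelet_type="morlet")
# weekly holds 7 offset wavelets along axis 1, with peaks spaced p/order = 1 day apart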
autots.tools.window_functions module¶
- autots.tools.window_functions.chunk_reshape(arr, window_size=10, chunk_size=100, sample_fraction=None, random_seed=7734, dtype=<class 'numpy.float32'>)¶
Shifts from (n_records, n_series) to (windows, window_size). Multivariate. More memory efficient, if not quite as fast as x.reshape(-1, x.shape[-1]) for 3D numpy array.
- autots.tools.window_functions.last_window(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False)¶
Pandas based function to provide the last window of window_maker.
- autots.tools.window_functions.np_2d_arange(start=0, stop=3, step=1, num_columns=4)¶
- autots.tools.window_functions.retrieve_closest_indices(df, num_indices, forecast_length, window_size: int = 10, distance_metric: str = 'braycurtis', stride_size: int = 1, start_index: int | None = None, include_differenced: bool = False, include_last: bool = True, verbose: int = 0)¶
Find the next indices closest to the final segment of forecast_length.
- Parameters:
df (pd.DataFrame) – source data in wide format
num_indices (int) – number of indices to return
forecast_length (int) – length of forecast
window_size (int) – length of comparison
distance_metric (str) – distance measure from scipy and nan_euclidean
stride_size (int) – length of spacing between windows
start_index (int) – index to begin creation of windows from
include_differenced (bool) – if True, also compare on differences
- autots.tools.window_functions.rolling_window_view(array, window_shape=(0,), axis=None, writeable=False)¶
Create a view of array which for every point gives the n-dimensional neighbourhood of size window. New dimensions are added at the end of array or after the corresponding original dimension.
Based on: https://gist.github.com/seberg/3866040 but designed to match the newer np.sliding_window_view
- Parameters:
array (np.array) – Array to which the rolling window is applied.
window_shape (int) – Either a single integer to create a window of only the last axis, or a tuple to create it for the last len(window) axes. 0 can be used to ignore a dimension in the window.
axis (int) – If given, must have the same size as window. In this case window is interpreted as the size in the dimension given by axis. I.e. a window of (2, 1) is equivalent to window=2 and axis=-2.
- Returns:
A view on array which is shrunk to fit the windows and has the window dimensions added at the end (0s not counting); i.e. every point of array becomes an array of size window.
- autots.tools.window_functions.sliding_window_view(array, window_shape=(0,), axis=None, writeable=False, **kwargs)¶
Toggles between numpy and internal version depending on np.__version__.
- autots.tools.window_functions.window_id_maker(window_size: int, max_steps: int, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)¶
Create indices for an array of multiple window slices of data.
- Parameters:
window_size (int) – length of time history to include
max_steps (int) – the maximum number of windows to create
start_index (int) – if not starting at the first point, start at this point
stride_size (int) – number of skips between each window start point
skip_size (int) – number of skips between each obs in a window (downsamples)
- Returns:
np.array with 3D shape (num windows, window_length, num columns/series); 2D array if only a 1D array is provided
- autots.tools.window_functions.window_lin_reg(x, y, w)¶
- autots.tools.window_functions.window_lin_reg_mean(x, y, w)¶
- autots.tools.window_functions.window_lin_reg_mean_no_nan(x, y, w)¶
- autots.tools.window_functions.window_maker(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False, shuffle: bool = False, output_dim: str = 'forecast_length', forecast_length: int = 1, max_windows: int = 5000, regression_type: str | None = None, future_regressor=None, random_seed: int = 1234)¶
Convert a dataset into slices with history and y forecast.
- Parameters:
df (pd.DataFrame) – wide format df with sorted index
window_size (int) – length of history to use for X window
input_dim (str) – univariate or multivariate. If multivariate, all series in single X row
shuffle (bool) – (deprecated)
output_dim (str) – ‘forecast_length’ or ‘1step’ where 1 step is basically forecast_length=1
forecast_length (int) – number of periods ahead that will be forecast
max_windows (int) – a cap on the total number of windows to generate. If exceeded, a random selection of this many windows is used.
regression_type (str) – None, or “user” to attempt to concat the regressor to the windows
future_regressor (pd.DataFrame) – values of regressor if used
random_seed (int) – a consistent random seed
- Returns:
X, Y
- autots.tools.window_functions.window_maker_2(array, window_size: int, max_steps: int | None = None, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)¶
Create an array of multiple window slices of data. Note that this returns a different orientation than window_maker_3.
- Parameters:
array (np.array) – source of historic information of shape (num_obs, num_series)
window_size (int) – length of time history to include
max_steps (int) – the maximum number of windows to create
start_index (int) – if not starting at the first point, start at this point
stride_size (int) – number of skips between each window start point
skip_size (int) – number of skips between each obs in a window (downsamples)
- Returns:
np.array with 3D shape (num windows, window_length, num columns/series); 2D array if only a 1D array is provided
- autots.tools.window_functions.window_maker_3(array, window_size: int, **kwargs)¶
Stride tricks version of window maker. About 40% faster than window_maker_2. Note that this returns a different orientation than window_maker_2.
- Parameters:
array (np.array) – in shape of (num_obs, num_series)
window_size (int) – length of slice of history
**kwargs – passed to np.lib.stride_tricks.sliding_window_view
- Returns:
np.array with 3D shape (num windows, num columns/series, window_length); 2D array if only a 1D array is provided
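A sketch contrasting the two orientations described above (random data is illustrative only):

import numpy as np
from autots.tools.window_functions import window_maker_2, window_maker_3

arr = np.random.rand(50, 4)  # (num_obs, num_series)
w2 = window_maker_2(arr, window_size=10)  # (num windows, window_length, num series)
w3 = window_maker_3(arr, window_size=10)  # (num windows, num series, window_length)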
- autots.tools.window_functions.window_sum_mean(x, w, axis=0)¶
- autots.tools.window_functions.window_sum_mean_nan_tail(x, w, axis=0)¶
- autots.tools.window_functions.window_sum_nan_mean(x, w, axis=0)¶
Module contents¶
Basic utilities.