autots.tools package¶
Submodules¶
autots.tools.anomaly_utils module¶
Created on Fri Jul 1 15:41:21 2022
@author: Colin
There are three types of anomalies: point, contextual, and collective. Point anomalies are single values that fall within low-density regions of values; collective anomalies indicate that a sequence of values is anomalous rather than any single value by itself; and contextual anomalies are single values that do not fall within low-density regions yet are anomalous with regard to local values (https://arxiv.org/pdf/1802.04431.pdf).
- autots.tools.anomaly_utils.anomaly_df_to_holidays(anomaly_df, actuals=None, anomaly_scores=None, threshold=0.8, min_occurrences=2, splash_threshold=0.65, use_dayofmonth_holidays=True, use_wkdom_holidays=True, use_wkdeom_holidays=False, use_lunar_holidays=False, use_lunar_weekday=False, use_islamic_holidays=False, use_hebrew_holidays=False, use_hindu_holidays=False)¶
- autots.tools.anomaly_utils.anomaly_new_params(method='random')¶
- autots.tools.anomaly_utils.create_dates_df(dates)¶
Take a pd.DatetimeIndex and create simple date parts.
- autots.tools.anomaly_utils.dates_to_holidays(dates, df_cols, style='long', holiday_impacts='value', day_holidays=None, wkdom_holidays=None, wkdeom_holidays=None, lunar_holidays=None, lunar_weekday=None, islamic_holidays=None, hebrew_holidays=None, hindu_holidays=None, max_features: int | None = None)¶
Populate date information for a given pd.DatetimeIndex.
- Parameters:
dates (pd.DatetimeIndex) – list of dates
day_holidays (pd.DataFrame) – list of month/day holidays. Pass None if not available
style (str) – option for how to return information: ‘long’ returns date, name, and series for all holidays in a long-style dataframe; ‘impact’ returns dates and series with values of the sum of impacts (if given) or a joined string of holiday names; ‘flag’ returns dates and a holiday flag (not 0-1, but rather the sum of input series impacted for that holiday and day); ‘prophet’ returns the format required for Prophet, which will need to be filtered on series for the multivariate case; ‘series_flag’ returns dates and a 0/1 per series for whether a holiday occurred in any calendar
holiday_impacts (dict) – a dict passed to .replace containing values for holiday_names, or str ‘value’ or ‘anomaly_score’
- autots.tools.anomaly_utils.detect_anomalies(df, output, method, transform_dict=None, method_params={}, eval_period=None, n_jobs=1)¶
All methods return -1 for anomalies.
- Parameters:
output (str) – ‘multivariate’ (each series unique outliers), or ‘univariate’ (all series together for one outlier flag per timestamp)
transform_dict (dict) – optional but helpful; often datepart, differencing, or other standard AutoTS transformer params
forecast_params (dict) – used to backcast and identify ‘unforecastable’ values; required only for the predict_interval method
eval_period (int) – only use this length tail of the data; currently only implemented for forecast_params forecasting, if used
n_jobs (int) – multiprocessing jobs, used by some methods
- Returns:
pd.DataFrame (classifications, -1 = outlier, 1 = not outlier), pd.DataFrame (scores)
- autots.tools.anomaly_utils.gaussian_mixture(df, n_components=2, tol=0.001, max_iter=100, responsibility_threshold=0.05)¶
- autots.tools.anomaly_utils.holiday_new_params(method='random')¶
- autots.tools.anomaly_utils.limits_to_anomalies(df, output, upper_limit, lower_limit, method_params=None)¶
- autots.tools.anomaly_utils.loop_sk_outliers(df, method, method_params={}, n_jobs=1)¶
Multiprocessing on each series for multivariate outliers with sklearn.
- autots.tools.anomaly_utils.nonparametric_multivariate(df, output, method_params, n_jobs=1)¶
- autots.tools.anomaly_utils.sk_outliers(df, method, method_params={})¶
scikit-learn outlier methods wrapper.
- autots.tools.anomaly_utils.values_to_anomalies(df, output, threshold_method, method_params, n_jobs=1)¶
- autots.tools.anomaly_utils.zscore_survival_function(df, output='multivariate', method='zscore', distribution='norm', rolling_periods: int = 200, center: bool = True)¶
Take a dataframe, generate z-scores, and then generate survival probabilities (smaller = more outlier-like).
- Parameters:
df (pd.DataFrame) – wide style time series data (datetimeindex, series)
output (str) – univariate (1 series from all) or multivariate (all series input returned unique)
method (str) – zscore, rolling_zscore, mad (median abs dev)
distribution (str) – distribution to sample sf/outliers from
rolling_periods (int) – >1, used for the rolling_zscore period
center (bool) – passed to pd.rolling for rolling_zscore; True for holiday detection, False for anomaly detection generally
- Returns:
pd.DataFrame of p-values
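A minimal NumPy sketch of the z-score to survival-probability idea, assuming a normal distribution, the plain `zscore` method (no rolling window), and a one-sided survival function applied to |z|. The helper name `zscore_survival_sketch` is hypothetical and the exact AutoTS computation may differ.

```python
import math

import numpy as np


def zscore_survival_sketch(values: np.ndarray) -> np.ndarray:
    """Map each value to the normal survival probability of its absolute z-score.

    values: 2d array shaped (timestamps, series), like a wide-style DataFrame's values.
    Smaller output = more outlier-like. Assumes normality and a one-sided sf on |z|.
    """
    values = np.asarray(values, dtype=float)
    mean = np.nanmean(values, axis=0)
    std = np.nanstd(values, axis=0)
    z = np.abs((values - mean) / std)
    # standard normal survival function: sf(z) = 0.5 * erfc(z / sqrt(2))
    erfc = np.vectorize(math.erfc, otypes=[float])
    return 0.5 * erfc(z / math.sqrt(2))


data = np.array([[1.0], [1.2], [0.9], [1.1], [8.0]])
p = zscore_survival_sketch(data)  # the 8.0 row gets the smallest probability
```

Thresholding these probabilities (for example, flagging values below 0.05) would turn the scores into point-anomaly flags.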
autots.tools.calendar module¶
Calendar conversion functions.
Includes Lunar, Chinese lunar, and Arabic lunar
- autots.tools.calendar.gregorian_to_chinese(datetime_index)¶
Convert a pandas DatetimeIndex to Chinese Lunar calendar. Potentially has errors.
- autots.tools.calendar.gregorian_to_christian_lunar(datetime_index)¶
Convert a pandas DatetimeIndex to Christian Lunar calendar. Aspirational: it doesn’t work exactly.
- autots.tools.calendar.gregorian_to_hebrew(dates)¶
Convert pd.Datetimes to a Hebrew date. From pyluach by simlist.
This is the slowest of the lot and needs to be improved.
- autots.tools.calendar.gregorian_to_hindu(datetime_index)¶
Convert a pandas DatetimeIndex to Hindu calendar date components. Hindu calendar has numerous regional variations.
Put together with the help of an LLM. It gets the dates wrong, but the errors appear to be consistent, so it may still work for modeling. Suggestions for improvement welcomed.
- autots.tools.calendar.gregorian_to_islamic(date, epoch_adjustment=1.5)¶
Calculate approximate Islamic dates for a pandas DatetimeIndex. From convertdate by fitnr.
- Parameters:
epoch_adjustment (float) – offset applied to the epoch; nominally 1.0, but it needs to be adjusted by about +/- 0.5 to account for timezone
- autots.tools.calendar.heb_is_leap(year)¶
- autots.tools.calendar.lunar_from_lunar(new_moon)¶
Assumes continuous daily data that starts earlier than the period needed.
- autots.tools.calendar.lunar_from_lunar_full(full_moon)¶
Assumes continuous daily data that starts earlier than the period needed.
- autots.tools.calendar.to_jd(year, month, day)¶
Determine Julian day count from Islamic date. From convertdate by fitnr.
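A small sketch of the tabular (arithmetic) Islamic calendar formula that `to_jd` is described as borrowing from convertdate. The epoch constant and formula are assumed to match convertdate's `islamic` module; `islamic_to_jd` and `ISLAMIC_EPOCH` are hypothetical names.

```python
import math

# Julian day of 1 Muharram AH 1 in the tabular Islamic calendar (assumed convertdate EPOCH)
ISLAMIC_EPOCH = 1948439.5


def islamic_to_jd(year: int, month: int, day: int) -> float:
    """Julian day number for a tabular Islamic calendar date."""
    return (
        day
        + math.ceil(29.5 * (month - 1))  # months alternate 30 and 29 days
        + (year - 1) * 354  # 354 days in a common year
        + math.trunc((3 + 11 * year) / 30)  # 11 leap days per 30-year cycle
        + ISLAMIC_EPOCH
    ) - 1
```

Because the calendar is arithmetic rather than observational, results can differ by a day or so from locally announced dates.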
autots.tools.cointegration module¶
Cointegration
Johansen heavily based on Statsmodels source code
BTCD heavily based on D. Barba https://towardsdatascience.com/canonical-decomposition-a-forgotten-method-for-time-series-cointegration-and-beyond-4d1213396da1
- autots.tools.cointegration.btcd_decompose(p_mat: ndarray, regression_model, max_lag: int = 1, return_eigenvalues=False)¶
Calculate decomposition. p_mat is of shape (t, n), wide style data.
- autots.tools.cointegration.coint_johansen(endog, det_order=-1, k_ar_diff=1, return_eigenvalues=False)¶
Johansen cointegration test of the cointegration rank of a VECM, abbreviated from Statsmodels
- autots.tools.cointegration.fourier_series(dates, period, series_order)¶
Provides Fourier series components with the specified frequency and order.
- Parameters:
dates – pd.Series containing timestamps.
period – number of days of the period.
series_order – number of components.
- Returns:
matrix with seasonality features.
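A sketch of Fourier seasonality features in the Prophet style, assuming time is measured in days since 1970-01-01 and that each order contributes one sine and one cosine column; `fourier_series_sketch` is a hypothetical stand-in, not the module's code.

```python
import numpy as np


def fourier_series_sketch(dates, period: float, series_order: int) -> np.ndarray:
    """Sine/cosine seasonality features of shape (len(dates), 2 * series_order)."""
    # days since the Unix epoch as floats
    t = (np.asarray(dates, dtype="datetime64[ns]") - np.datetime64("1970-01-01")) / np.timedelta64(1, "D")
    columns = []
    for k in range(1, series_order + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(columns)


dates = np.arange("2024-01-01", "2024-03-01", dtype="datetime64[D]")
X = fourier_series_sketch(dates, period=7, series_order=3)  # weekly seasonality
```

Rows seven days apart are identical here, which is what makes these columns useful as regressors for a fixed-period cycle.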
- autots.tools.cointegration.lagmat(x, maxlag: int, trim='forward', original='ex')¶
Create 2d array of lags. Modified from Statsmodels.
autots.tools.constraint module¶
Constraint generation functions
- autots.tools.constraint.apply_constraint_single(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None)¶
- autots.tools.constraint.apply_fit_constraint(forecast, lower_forecast, upper_forecast, constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, lower_constraint=None, upper_constraint=None, train_min=None, train_max=None, fillna=None)¶
- autots.tools.constraint.constant_growth_rate(periods, final_growth)¶
Take a final target growth rate (e.g. 2% over a year) and convert it to a per-period (e.g. daily) value.
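A sketch of the conversion, assuming compounding growth: the per-period rate r satisfies (1 + r)^periods = 1 + final_growth. Whether `constant_growth_rate` compounds or returns the cumulative path is an assumption; `per_period_growth` is hypothetical.

```python
import numpy as np


def per_period_growth(final_growth: float, periods: int) -> float:
    """Compounding per-period rate that reaches final_growth after `periods` steps."""
    return (1.0 + final_growth) ** (1.0 / periods) - 1.0


daily = per_period_growth(0.02, 365)  # 2% over a year, roughly 0.0054% per day
cumulative = (1.0 + daily) ** np.arange(1, 366) - 1.0  # growth relative to the start
```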
- autots.tools.constraint.constraint_new_params(method: str = 'fast')¶
- autots.tools.constraint.fit_constraint(constraint_method, constraint_value, constraint_direction='upper', constraint_regularization=1.0, bounds=True, df_train=None, forecast_length=None)¶
autots.tools.cpu_count module¶
CPU counter for multiprocessing.
- autots.tools.cpu_count.cpu_count(modifier: float = 1)¶
Find available CPU count, running on both Windows and Linux.
- Attempts to be very conservative:
Remove Intel Hyperthreading logical cores
Find max cores allowed to the process, if less than machine has total
Runs best with psutil installed, falls back to mkl, then os core count/2
- Parameters:
modifier (float) – multiply CPU count by this value
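A standard-library-only sketch of the last fallback described above: count the CPUs the process is allowed to use, then halve as a rough stand-in for removing hyperthreads. The halving heuristic and the name `conservative_cpu_count` are assumptions; the real function prefers psutil or mkl when available.

```python
import os


def conservative_cpu_count(modifier: float = 1.0) -> int:
    """Rough, conservative CPU count without psutil."""
    try:
        # CPUs this process may actually run on (Linux and some Unix systems only)
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        available = os.cpu_count() or 1
    physical_guess = max(1, available // 2)  # assume two logical cores per physical core
    return max(1, int(physical_guess * modifier))
```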
- autots.tools.cpu_count.set_n_jobs(n_jobs, verbose=0)¶
autots.tools.fast_kalman module¶
From SIMD KALMAN, (c) 2017 Otto Seiskari (MIT License)
Some other resources that I have found useful:
https://kevinkotze.github.io/ts-4-state-space/
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_dfm_coincident.html
An Introduction to State Space Time Series Analysis, Commandeur and Koopman, chp 8
Forecasting, structural time series models, and the Kalman Filter, Andrew Harvey
Following the notation in [1], the Kalman filter framework consists of a dynamic model (state transition model)
\(x_j = A x_{j-1} + q_{j-1}, \quad q_{j-1} \sim N(0, Q)\)
and a measurement model (observation model)
\(y_j = H x_j + r_j, \quad r_j \sim N(0, R)\),
where the vector \(x\) is the (hidden) state of the system and \(y\) is an observation. \(A\) and \(H\) are matrices of suitable shape and \(Q\), \(R\) are positive-definite noise covariance matrices.
Usage example¶
```python
import numpy as np
import numpy.random as random
import simdkalman

random.seed(0)

# Define model
kf = simdkalman.KalmanFilter(
    state_transition=[[1, 1], [0, 1]],     # matrix A
    process_noise=np.diag([0.1, 0.01]),    # Q
    observation_model=np.array([[1, 0]]),  # H
    observation_noise=1.0)                 # R

# Generate some fake data: 100 independent time series,
# with 10% of NaNs denoting missing values
data = random.normal(size=(100, 200))
data[random.uniform(size=data.shape) < 0.1] = np.nan

# Smooth all data
smoothed = kf.smooth(data, initial_value=[1, 0], initial_covariance=np.eye(2) * 0.5)

# second timeseries, third time step, hidden state x
print('mean')
print(smoothed.states.mean[1, 2, :])    # [ 0.29311384 -0.06948961]
print('covariance')
print(smoothed.states.cov[1, 2, :, :])  # [[ 0.19959416 -0.00777587]
                                        #  [-0.00777587  0.02528967]]

# Predict new data for a single series (1d case)
predicted = kf.predict(data[1, :], 123)

# predicted observation y, third new time step
pred_mean = predicted.observations.mean[2]
pred_stdev = np.sqrt(predicted.observations.cov[2])
print('%g +- %g' % (pred_mean, pred_stdev))  # 1.71543 +- 1.65322
```
Low-level Kalman filter computation steps with multi-dimensional input arrays. Unlike the KalmanFilter class, all inputs must be numpy arrays. However, their dimensions can flexibly vary from 1 to 3 as long as they are reasonable from the point of view of matrix multiplication and numpy broadcasting rules. Matrix operations are applied on the last two axes of the arrays.
- class autots.tools.fast_kalman.Gaussian(mean, cov)¶
Bases:
object
- static empty(n_states, n_vars, n_measurements, cov=True)¶
- unvectorize_state()¶
- unvectorize_vars()¶
- class autots.tools.fast_kalman.KalmanFilter(state_transition, process_noise, observation_model, observation_noise)¶
Bases:
object
The main Kalman filter class providing convenient interfaces to vectorized smoothing and filtering operations on multiple independent time series.
As long as the shapes of the given parameters match reasonably according to the rules of matrix multiplication, this class is flexible in their exact nature, accepting:
scalars: process_noise = 0.1
(2d) numpy matrices: process_noise = numpy.eye(2)
2d arrays: observation_model = [[1,2]]
3d arrays and matrices for vectorized computations. Unlike the other options, this locks the shape of the inputs that can be processed by the smoothing and prediction methods.
- Parameters:
state_transition – State transition matrix \(A\)
process_noise – Process noise (state transition covariance) matrix \(Q\)
observation_model – Observation model (measurement model) matrix \(H\)
observation_noise – Observation noise (measurement noise covariance) matrix \(R\)
- class Result¶
Bases:
object
- compute(data, n_test, initial_value=None, initial_covariance=None, smoothed=True, filtered=False, states=True, covariances=True, observations=True, likelihoods=False, gains=False, log_likelihood=False, verbose=False)¶
Smoothing, filtering and prediction at the same time. Used internally by other methods, but can also be used directly if, e.g., both smoothed and predicted data is wanted.
See smooth and predict for explanation of the common parameters. With this method, there also exist the following flags.
- Parameters:
smoothed (boolean) – compute Kalman smoother (used by smooth)
filtered (boolean) – return (one-way) filtered data
likelihoods (boolean) – return likelihoods of each step
gains (boolean) – return Kalman gains and pairwise covariances (used by the EM algorithm). If true, the gains are provided as a member of the relevant subresult, filtered.gains and/or smoothed.gains.
log_likelihood (boolean) – return the log-likelihood(s) for the entire series. If matrix data is given, this will be a vector where each element is the log-likelihood of a single row.
- Return type:
result object whose fields depend on which of the above parameter flags are True. The possible values are: smoothed (the return value of smooth, may contain smoothed.gains), filtered (like smoothed, may also contain filtered.gains), predicted (the return value of predict if n_test > 0), pairwise_covariances, likelihoods and log_likelihood.
- em(data, n_iter=5, initial_value=None, initial_covariance=None, verbose=False)¶
- em_observation_noise(result, data, verbose=False)¶
- em_process_noise(result, verbose=False)¶
- predict(data, n_test, initial_value=None, initial_covariance=None, states=True, observations=True, covariances=True, verbose=False)¶
Filter past data and predict a given number of future values. The data can be given as either of:
a 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain a 1d array of observations (both .mean and .cov will be 1d).
a 2d matrix, each row of which is interpreted as an independent time series, all of which are filtered independently. The returned observations members will be 2-dimensional in this case.
a 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.
Initial values and covariances can be given as scalars or 2d matrices, in which case the same initial states will be used for all rows, or as 3d arrays for different initial values.
- Parameters:
data – Past data
n_test (integer) – number of future steps to predict.
initial_value – Initial value \({\mathbb E}[x_0]\)
initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)
states (boolean) – predict states \(x\)?
observations (boolean) – predict observations \(y\)?
covariances (boolean) – include covariances in predictions?
- Return type:
Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)
- predict_next(m, P)¶
Single prediction step
- Parameters:
m – \({\mathbb E}[x_{j-1}]\), the previous mean
P – \({\rm Cov}[x_{j-1}]\), the previous covariance
- Return type:
(prior_mean, prior_cov)
predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)
- predict_observation(m, P)¶
Probability distribution of observation \(y\) for a given distribution of \(x\)
- Parameters:
m – \({\mathbb E}[x]\)
P – \({\rm Cov}[x]\)
- Return type:
mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)
- smooth(data, initial_value=None, initial_covariance=None, observations=True, states=True, covariances=True, verbose=False)¶
Smooth given data, which can be either:
a 1d array, like [1,2,3,4]. In this case, one Kalman filter is used and the return value structure will contain a 1d array of observations (both .mean and .cov will be 1d).
a 2d matrix, each row of which is interpreted as an independent time series, all of which are smoothed independently. The returned observations members will be 2-dimensional in this case.
a 3d matrix, whose last dimension can be used for multi-dimensional observations, i.e., data[1,2,:] defines the components of the third observation of the second series. In the multi-dimensional case the returned observations.mean will be 3-dimensional and observations.cov 4-dimensional.
Initial values and covariances can be given as scalars or 2d matrices, in which case the same initial states will be used for all rows, or as 3d arrays for different initial values.
- Parameters:
data – 1d or 2d data, see above
initial_value – Initial value \({\mathbb E}[x_0]\)
initial_covariance – Initial uncertainty \({\rm Cov}[x_0]\)
states (boolean) – return smoothed states \(x\)?
observations (boolean) – return smoothed observations \(y\)?
covariances (boolean) – include covariances results?
- Return type:
Result object with fields states and observations, if the respective parameter flags are set to True. Both are Gaussian result objects with fields mean and cov (if the covariances flag is True)
- smooth_current(m, P, ms, Ps)¶
Single Kalman smoother backwards step
- Parameters:
m – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)
P – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)
ms – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)
Ps – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)
- Return type:
(smooth_mean, smooth_covariance, smoothing_gain)
smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\), and covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\) & smoothing gain \(C\)
- update(m, P, y, log_likelihood=False)¶
Single update step with NaN check.
- Parameters:
m – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)
P – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)
y – observation \(y_j\)
log_likelihood – compute log-likelihood?
- Return type:
(posterior_mean, posterior_covariance, log_likelihood)
posterior mean \({\mathbb E}[x_j|y_1,\ldots,y_j]\) & covariance \({\rm Cov}[x_j|y_1,\ldots,y_j]\) and, if requested, log-likelihood. If \(y_j\) is NaN, returns the prior mean and covariance instead
- autots.tools.fast_kalman.autoshape(func)¶
Automatically shape arguments and return values
- autots.tools.fast_kalman.ddot(A, B)¶
Matrix multiplication over last two axes
- autots.tools.fast_kalman.ddot_t_right(A, B)¶
Matrix multiplication over last 2 axes with right operand transposed
- autots.tools.fast_kalman.ddot_t_right_old(A, B)¶
Matrix multiplication over last 2 axes with right operand transposed
- autots.tools.fast_kalman.dinv(A)¶
Matrix inverse applied to last two axes
- autots.tools.fast_kalman.douter(a, b)¶
Outer product, last two axes
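Illustrative NumPy equivalents of what these helpers are documented to do, so the "last two axes" convention is concrete. These are not the module's source: using `np.einsum` (and `np.linalg.inv`, which already works on stacked matrices) is an assumption about one reasonable implementation.

```python
import numpy as np


def ddot(A, B):
    # matrix product over the last two axes; leading axes broadcast
    return np.einsum("...ij,...jk->...ik", A, B)


def ddot_t_right(A, B):
    # A @ B^T, with B transposed over its last two axes
    return np.einsum("...ij,...kj->...ik", A, B)


def douter(a, b):
    # outer product of the last axes: (..., n) x (..., m) -> (..., n, m)
    return a[..., :, None] * b[..., None, :]


def dinv(A):
    # np.linalg.inv inverts each matrix stored in the last two axes
    return np.linalg.inv(A)
```

With a leading "series" axis, one call processes every independent time series at once, which is the point of the vectorized filter.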
- autots.tools.fast_kalman.em_initial_state(result, initial_means)¶
- autots.tools.fast_kalman.ensure_matrix(x, dim=1)¶
- autots.tools.fast_kalman.ensure_stability(st)¶
- autots.tools.fast_kalman.holt_winters_damped_matrices(M, alpha, beta, gamma, phi=1.0)¶
Not sure if this is correct. It’s close, at least.
- autots.tools.fast_kalman.new_kalman_params(method=None, allow_auto=True)¶
- autots.tools.fast_kalman.predict(mean, covariance, state_transition, process_noise)¶
Kalman filter prediction step
- Parameters:
mean – \({\mathbb E}[x_{j-1}]\), the filtered mean from the previous step
covariance – \({\rm Cov}[x_{j-1}]\), the filtered covariance from the previous step
state_transition – matrix \(A\)
process_noise – matrix \(Q\)
- Return type:
(prior_mean, prior_cov)
predicted mean and covariance \({\mathbb E}[x_j]\), \({\rm Cov}[x_j]\)
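A single-series sketch of the prediction step, using the standard equations \({\mathbb E}[x_j] = A\,{\mathbb E}[x_{j-1}]\) and \({\rm Cov}[x_j] = A\,{\rm Cov}[x_{j-1}]A^T + Q\). The name `kalman_predict` is hypothetical and it handles one 1d mean at a time rather than the module's stacked arrays.

```python
import numpy as np


def kalman_predict(mean, covariance, A, Q):
    """One Kalman prediction step for a single series (1d mean, 2d covariance)."""
    prior_mean = A @ mean
    prior_cov = A @ covariance @ A.T + Q  # uncertainty grows by the process noise
    return prior_mean, prior_cov


A = np.array([[1.0, 1.0], [0.0, 1.0]])  # local linear trend: level += slope
Q = np.diag([0.1, 0.01])
prior_mean, prior_cov = kalman_predict(np.array([1.0, 0.5]), np.eye(2), A, Q)
```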
- autots.tools.fast_kalman.predict_observation(mean, covariance, observation_model, observation_noise)¶
Compute probability distribution of the observation \(y\), given the distribution of \(x\).
- Parameters:
mean – \({\mathbb E}[x]\)
covariance – \({\rm Cov}[x]\)
observation_model – matrix \(H\)
observation_noise – matrix \(R\)
- Return type:
mean \({\mathbb E}[y]\) and covariance \({\rm Cov}[y]\)
- autots.tools.fast_kalman.priv_smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)¶
- autots.tools.fast_kalman.priv_update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement, log_likelihood=False)¶
- autots.tools.fast_kalman.random_matrix(rows, cols, density=0.2)¶
- autots.tools.fast_kalman.random_state_space(tries=15)¶
- autots.tools.fast_kalman.random_state_space_original()¶
Return randomly generated statespace models.
- autots.tools.fast_kalman.smooth(posterior_mean, posterior_covariance, state_transition, process_noise, next_smooth_mean, next_smooth_covariance)¶
Kalman smoother backwards step
- Parameters:
posterior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_j]\), the filtered mean of \(x_j\)
posterior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_j]\), the filtered covariance of \(x_j\)
state_transition – matrix \(A\)
process_noise – matrix \(Q\)
next_smooth_mean – \({\mathbb E}[x_{j+1}|y_1,\ldots,y_T]\)
next_smooth_covariance – \({\rm Cov}[x_{j+1}|y_1,\ldots,y_T]\)
- Return type:
(smooth_mean, smooth_covariance, smoothing_gain)
smoothed mean \({\mathbb E}[x_j|y_1,\ldots,y_T]\), and covariance \({\rm Cov}[x_j|y_1,\ldots,y_T]\)
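A single-series sketch of the backwards step as the standard Rauch-Tung-Striebel update: gain \(C = P A^T (A P A^T + Q)^{-1}\), then the filtered estimate is corrected by the difference between the next smoothed and next predicted values. `rts_smooth_step` is a hypothetical name and uses a plain inverse for clarity.

```python
import numpy as np


def rts_smooth_step(m, P, A, Q, ms_next, Ps_next):
    """One Rauch-Tung-Striebel backward step.

    m, P: filtered mean/covariance of x_j; ms_next, Ps_next: smoothed mean/covariance of x_{j+1}.
    """
    prior_mean = A @ m
    prior_cov = A @ P @ A.T + Q
    gain = P @ A.T @ np.linalg.inv(prior_cov)  # smoothing gain C
    smooth_mean = m + gain @ (ms_next - prior_mean)
    smooth_cov = P + gain @ (Ps_next - prior_cov) @ gain.T
    return smooth_mean, smooth_cov, gain


A = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = np.diag([0.1, 0.01])
m, P = np.array([1.0, 0.5]), np.eye(2)
# if the next smoothed estimate equals the one-step prediction, smoothing changes nothing
ms, Ps, C = rts_smooth_step(m, P, A, Q, A @ m, A @ P @ A.T + Q)
```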
- autots.tools.fast_kalman.stable_pinv(A, tol=1e-05, regularization=0.0001)¶
- autots.tools.fast_kalman.update(prior_mean, prior_covariance, observation_model, observation_noise, measurement)¶
Kalman filter update step
- Parameters:
prior_mean – \({\mathbb E}[x_j|y_1,\ldots,y_{j-1}]\), the prior mean of \(x_j\)
prior_covariance – \({\rm Cov}[x_j|y_1,\ldots,y_{j-1}]\), the prior covariance of \(x_j\)
observation_model – matrix \(H\)
observation_noise – matrix \(R\)
measurement – observation \(y_j\)
- Return type:
(posterior_mean, posterior_covariance)
posterior mean and covariance \({\mathbb E}[x_j|y_1,\ldots,y_j]\), \({\rm Cov}[x_j|y_1,\ldots,y_j]\) after observing \(y_j\)
- autots.tools.fast_kalman.update_with_nan_check(prior_mean, prior_covariance, observation_model, observation_noise, measurement)¶
Kalman filter update with a check for NaN observations. Like update, but returns (prior_mean, prior_covariance) if measurement is NaN.
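A single-series sketch of the update step with the NaN check, using the textbook equations: innovation covariance \(S = H P H^T + R\), gain \(K = P H^T S^{-1}\). `kalman_update_with_nan_check` is a hypothetical name; R must be passed as a 2d matrix here.

```python
import numpy as np


def kalman_update_with_nan_check(prior_mean, prior_cov, H, R, y):
    """One Kalman update step; a NaN observation leaves the prior unchanged."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    if np.any(np.isnan(y)):
        return prior_mean, prior_cov  # missing data: nothing to learn from
    S = H @ prior_cov @ H.T + R  # innovation covariance
    K = prior_cov @ H.T @ np.linalg.inv(S)  # Kalman gain
    posterior_mean = prior_mean + K @ (y - H @ prior_mean)
    posterior_cov = prior_cov - K @ S @ K.T
    return posterior_mean, posterior_cov


H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
post_mean, post_cov = kalman_update_with_nan_check(np.zeros(2), np.eye(2), H, R, 2.0)
same_mean, same_cov = kalman_update_with_nan_check(np.zeros(2), np.eye(2), H, R, np.nan)
```

Skipping the update on NaN is what lets the smoother and predictor above run straight through gaps in the data.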
autots.tools.fft module¶
Created on Mon Oct 9 22:07:37 2023
@author: colincatlin
- class autots.tools.fft.FFT(n_harm=10, detrend='linear', freq_range=None)¶
Bases:
object
- fit(x)¶
- generate_harmonics_dataframe(forecast_length=0)¶
- predict(forecast_length=0)¶
- autots.tools.fft.fourier_extrapolation(x, forecast_length=10, n_harm=10, detrend='linear', freq_range=None)¶
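A sketch of a common FFT extrapolation recipe matching the parameters above: remove a linear trend, keep the DC term plus the `n_harm` lowest-frequency positive/negative pairs, extend those sinusoids past the end, and add the trend back. That AutoTS follows exactly these steps (including the lowest-frequency selection) is an assumption; `fourier_extrapolation_sketch` is hypothetical.

```python
import numpy as np


def fourier_extrapolation_sketch(x, forecast_length=10, n_harm=10):
    """Linear detrend, keep low-frequency FFT components, extend them forward."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(n)
    trend = np.polyfit(t, x, 1)  # slope, intercept
    spectrum = np.fft.fft(x - np.polyval(trend, t))
    freqs = np.fft.fftfreq(n)
    # DC term plus n_harm positive/negative frequency pairs, lowest frequencies first
    keep = np.argsort(np.abs(freqs))[: 1 + 2 * n_harm]
    t_ext = np.arange(n + forecast_length)
    restored = np.zeros(t_ext.size)
    for i in keep:
        amplitude = np.abs(spectrum[i]) / n
        phase = np.angle(spectrum[i])
        restored += amplitude * np.cos(2 * np.pi * freqs[i] * t_ext + phase)
    return restored + np.polyval(trend, t_ext)


t = np.arange(64)
x = 0.3 * t + np.sin(2 * np.pi * t / 16)
out = fourier_extrapolation_sketch(x, forecast_length=5, n_harm=64)
```

Keeping every component reproduces the history exactly; a small `n_harm` instead acts as a smoother and tends to extrapolate more sensibly.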
autots.tools.fir_filter module¶
Created on Fri Sep 6 23:59:35 2024
@author: colincatlin
- autots.tools.fir_filter.apply_fir_filter_time_domain(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')¶
Apply FIR filter using time-domain convolution (lfilter) for smaller memory usage. This function has padding issues currently.
- autots.tools.fir_filter.apply_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming')¶
Apply FIR filter to an array of time series data with shape (observations, series).
- Parameters:
data – numpy array of shape (observations, series), where each column represents a time series
sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)
numtaps – number of taps (filter length)
cutoff_hz – the cutoff frequency in Hz (for filtering purposes)
window – the windowing function to use for FIR filter design (‘hamming’, ‘hann’, etc.)
- Returns:
filtered_data – the filtered version of the input data
- autots.tools.fir_filter.fft_fir_filter_to_timeseries(data, sampling_frequency, numtaps=512, cutoff_hz=20, window='hamming', chunk_size=1000)¶
Apply FIR filter to an array of time series data with shape (observations, series).
- Parameters:
data – numpy array of shape (observations, series), where each column represents a time series
sampling_frequency – the sampling frequency of the time series data (e.g., 365 for daily data)
numtaps – number of taps (filter length)
cutoff_hz – the cutoff frequency in Hz (for filtering purposes)
window – the windowing function to use for FIR filter design (‘hamming’, ‘hann’, etc.)
- Returns:
filtered_data – the filtered version of the input data
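A NumPy-only sketch of a low-pass FIR filter with the same parameter meanings: a Hamming-windowed sinc designed from `cutoff_hz / sampling_frequency` (cycles per sample), applied to each column by zero-phase "same" convolution. The module itself likely uses SciPy filter design; this hand-rolled design and the name `fir_lowpass_sketch` are assumptions.

```python
import numpy as np


def fir_lowpass_sketch(data, sampling_frequency, numtaps=101, cutoff_hz=20.0):
    """Windowed-sinc low-pass FIR filter applied to each column of (observations, series)."""
    data = np.asarray(data, dtype=float)
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    normalized_cutoff = cutoff_hz / sampling_frequency  # cycles per sample, must be < 0.5
    taps = 2 * normalized_cutoff * np.sinc(2 * normalized_cutoff * n)  # ideal low-pass response
    taps *= np.hamming(numtaps)  # window to reduce ripple
    taps /= taps.sum()  # unit gain at zero frequency
    return np.column_stack(
        [np.convolve(data[:, j], taps, mode="same") for j in range(data.shape[1])]
    )


t = np.arange(730)
slow = np.sin(2 * np.pi * t / 365)        # 1 cycle per year
fast = np.sin(2 * np.pi * 100 * t / 365)  # 100 cycles per year
filtered = fir_lowpass_sketch(np.column_stack([slow + fast, fast]), 365, numtaps=101, cutoff_hz=12)
```

With daily data and `sampling_frequency=365`, "Hz" effectively means cycles per year. About `numtaps // 2` rows at each end are distorted by zero padding, a version of the padding issue noted above.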
- autots.tools.fir_filter.generate_random_fir_params(method='random', data_type='time_series')¶
autots.tools.hierarchial module¶
- class autots.tools.hierarchial.hierarchial(grouping_method: str = 'tile', n_groups: int = 5, reconciliation: str = 'mean', grouping_ids: dict | None = None)¶
Bases:
object
Create hierarchical series, then reconcile.
Currently only performs one-level groupings.
- Parameters:
grouping_method (str) – method to create groups. ‘User’ requires hier_id input of groupings.
n_groups (int) – number of groups, if above is not ‘User’
reconciliation (str) – None, or ‘mean’ method to combine top and bottom forecasts.
grouping_ids (dict) – dict of series_id: group_id to use if grouping is ‘User’
- fit(df)¶
Construct and save object info.
- reconcile(df)¶
Apply to forecasted data containing bottom and top levels.
- transform(df)¶
Apply hierarchy to existing data with bottom levels only.
autots.tools.holiday module¶
Manage holiday features.
- autots.tools.holiday.holiday_flag(DTindex, country: str = 'US', encode_holiday_type: bool = False, holidays_subdiv=None)¶
Create a 0/1 flag for given datetime index. Includes fallback to pandas for US holidays if holidays package unavailable.
- Parameters:
DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates to create flags
country (str) – country to pass through to the Python package holidays; also accepts a list of countries, but not a list of subdivisions
encode_holiday_type (bool) – if True, each holiday gets a unique integer column, if False, 0/1 for all holidays
holidays_subdiv (str) – subdivision (e.g. state), if used
- Returns:
pd.DataFrame with DatetimeIndex
- autots.tools.holiday.query_holidays(DTindex, country: str, encode_holiday_type: bool = False, holidays_subdiv=None)¶
Query holidays package for dates.
- Parameters:
DTindex (pandas.DatetimeIndex) – DatetimeIndex of dates to create flags
country (str) – to pass through to python package Holidays
encode_holiday_type (bool) – if True, each holiday gets a unique integer column, if False, 0/1 for all holidays
autots.tools.impute module¶
Fill NA.
- autots.tools.impute.FillNA(df, method: str = 'ffill', window: int = 10)¶
Fill NA values using different methods.
- Parameters:
method (str) – ‘ffill’ - fill the most recent non-na value forward until another non-na value is reached; ‘zero’ - fill with zero, useful for sales and other data where NA usually does mean $0; ‘mean’ - fill all missing values with the series’ overall average value; ‘median’ - fill all missing values with the series’ overall median value; ‘rolling mean’ - fill with the mean of the last n (window) values; ‘ffill mean biased’ - simple average of ffill and mean; ‘fake date’ - shifts data forward over NaN, so values will have incorrect timestamps; also accepts most method values of pd.DataFrame.interpolate()
window (int) – length of rolling windows for filling na, for rolling methods
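Rough pandas equivalents of several of the methods listed above, on a toy series. These are approximations written for illustration (for instance, the exact rolling-mean and biased-ffill weighting in AutoTS may differ), not FillNA's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sales": [1.0, np.nan, 3.0, np.nan, 5.0]})

filled = {
    "ffill": df.ffill(),
    "zero": df.fillna(0),
    "mean": df.fillna(df.mean()),
    "median": df.fillna(df.median()),
    # mean of the last `window` rows (window=2 here), ignoring NaN
    "rolling mean": df.fillna(df.rolling(2, min_periods=1).mean().ffill()),
    # simple average of forward fill and the column mean
    "ffill mean biased": df.fillna((df.ffill() + df.mean()) / 2),
    # any pd.DataFrame.interpolate method, e.g. linear
    "linear": df.interpolate(method="linear"),
}
```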
- class autots.tools.impute.SeasonalityMotifImputer(k: int = 3, datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False)¶
Bases:
object
- impute(df)¶
Infer missing values on input df.
- class autots.tools.impute.SimpleSeasonalityMotifImputer(datepart_method: str = 'simple_2', distance_metric: str = 'canberra', linear_mixed: bool = False, max_iter: int = 100)¶
Bases:
object
- impute(df)¶
Infer missing values on input df.
- autots.tools.impute.biased_ffill(df, mean_weight: float = 1)¶
Fill NaN with average of last value and mean.
- autots.tools.impute.fake_date_fill(df, back_method: str = 'slice')¶
Numpy vectorized version. Return a dataframe where na values are removed and values shifted forward.
Warning
Thus, values will have incorrect timestamps!
- Parameters:
back_method (str) – how to deal with tails left by shifting NaN: ‘bfill’ - back fill the last value; ‘slice’ - drop any rows above threshold where half are nan, then bfill remainder; ‘slice_all’ - drop any rows with any na; ‘keepna’ - keep the lagging na
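A hypothetical NumPy sketch of the "fake date" idea as read from the description above: drop each column's NaNs and push the remaining values toward the most recent rows, leaving a gap at the start. Only the ‘keepna’ and ‘bfill’ tail options are mimicked, and the alignment direction is an assumption; `fake_date_fill_sketch` is not the module's function.

```python
import numpy as np


def fake_date_fill_sketch(arr, back_method="keepna"):
    """Drop NaNs per column and shift values to the end; timestamps no longer match values."""
    arr = np.asarray(arr, dtype=float)
    out = np.full_like(arr, np.nan)
    n_rows = arr.shape[0]
    for j in range(arr.shape[1]):
        valid = arr[~np.isnan(arr[:, j]), j]
        if valid.size == 0:
            continue
        out[n_rows - valid.size :, j] = valid  # most recent values keep the latest rows
        if back_method == "bfill":
            out[: n_rows - valid.size, j] = valid[0]  # fill the leading gap backward
    return out


arr = np.array([[1.0, np.nan], [np.nan, 2.0], [3.0, np.nan], [4.0, 5.0]])
shifted = fake_date_fill_sketch(arr)
```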
- autots.tools.impute.fake_date_fill_old(df, back_method: str = 'slice')¶
Return a dataframe where na values are removed and values shifted forward.
Warning
Thus, values will have incorrect timestamps!
- Parameters:
back_method (str) – how to deal with tails left by shifting NaN: ‘bfill’ - back fill the last value; ‘slice’ - drop any rows above threshold where half are nan, then bfill remainder; ‘slice_all’ - drop any rows with any na; ‘keepna’ - keep the lagging na
- autots.tools.impute.fill_forward(df)¶
Fill NaN with previous values.
- autots.tools.impute.fill_forward_alt(df)¶
Fill NaN with previous values.
- autots.tools.impute.fill_mean(df)¶
- autots.tools.impute.fill_mean_old(df)¶
Fill NaN with mean.
- autots.tools.impute.fill_median(df)¶
Fill nan with median values. Does not work with non-numeric types.
- autots.tools.impute.fill_median_old(df)¶
Fill NaN with median.
- autots.tools.impute.fill_one(df)¶
Fill NaN with one.
- autots.tools.impute.fill_zero(df)¶
Fill NaN with zero.
- autots.tools.impute.fillna_np(array, values)¶
- autots.tools.impute.rolling_mean(df, window: int = 10)¶
Fill NaN with mean of last window values.
autots.tools.lunar module¶
Phases of the moon. Modified from https://stackoverflow.com/a/2531541/9492254 by keturn and earlier from John Walker
- autots.tools.lunar.dcos(d)¶
- autots.tools.lunar.dsin(d)¶
- autots.tools.lunar.fixangle(a)¶
- autots.tools.lunar.kepler(m, ecc=0.016718)¶
Solve the equation of Kepler.
- autots.tools.lunar.moon_phase(datetime_index, epsilon=1e-06, epoch=2444237.905, ecliptic_longitude_epoch=278.83354, ecliptic_longitude_perigee=282.596403, eccentricity=0.016718, moon_mean_longitude_epoch=64.975464, moon_mean_perigee_epoch=349.383063)¶
Numpy version. Takes a pd.DatetimeIndex and returns moon phase (% illuminated). Epoch can be adjusted slightly (0.5 = half day) to adjust for time zones. The default is for the US; use epoch=2444238.5 for Asia generally.
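A much simpler approximation for comparison, assuming a constant mean synodic month and a reference new moon near JD 2451550.1 (January 2000). This swaps in a linear phase model instead of the Kepler-equation approach used by `moon_phase`, so it can be off by several hours; all names here are hypothetical.

```python
import numpy as np

SYNODIC_MONTH = 29.530588853  # mean days between new moons
REFERENCE_NEW_MOON_JD = 2451550.1  # new moon of early 2000-01-06 UT (approximate)


def datetime64_to_jd(dates):
    """Julian day from numpy datetime64 values (UTC assumed)."""
    days = (np.asarray(dates, dtype="datetime64[s]") - np.datetime64("1970-01-01T00:00:00")) / np.timedelta64(1, "D")
    return days + 2440587.5  # JD of the Unix epoch


def approx_illumination(julian_days):
    """Approximate fraction of the Moon illuminated, from days since a reference new moon."""
    phase = ((np.asarray(julian_days, dtype=float) - REFERENCE_NEW_MOON_JD) / SYNODIC_MONTH) % 1.0
    return (1.0 - np.cos(2.0 * np.pi * phase)) / 2.0


dates = np.array(["2024-01-11T12:00", "2024-01-25T18:00"], dtype="datetime64[s]")
illumination = approx_illumination(datetime64_to_jd(dates))  # near new moon, near full moon
```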
- autots.tools.lunar.moon_phase_df(datetime_index, epoch=2444237.905)¶
Convert pandas DatetimeIndex to moon phases. Note that timezone and hour can matter slightly. Epoch can be adjusted slightly (0.5 = half day) to adjust for time zones. 2444237.905 is for US Central; use epoch=2444238.5 for Asia generally.
- autots.tools.lunar.phase_string(p, precision=0.05, new=0.0, first=0.25, full=0.4, last=0.75, nextnew=1.0)¶
- autots.tools.lunar.todeg(r)¶
- autots.tools.lunar.torad(d)¶
autots.tools.percentile module¶
Faster percentile and quantile for numpy
Entirely from: https://krstn.eu/np.nanpercentile()-there-has-to-be-a-faster-way/
- autots.tools.percentile.nan_percentile(in_arr, q, method='linear', axis=0, errors='raise')¶
Given a 3D array, return the given percentiles as input by q. Beware: this is only tested for the limited case required here and will not match np fully. Arguments are more limited; if errors=‘rollover’, falls back to np.nanpercentile where arguments are not supported.
- autots.tools.percentile.nan_quantile(arr, q, method='linear', axis=0, errors='raise')¶
Same as nan_percentile but accepts q in range [0, 1]. Arguments are more limited; if errors=‘rollover’, falls back to np.nanpercentile where not supported.
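A sketch of the sort-once trick behind faster NaN-aware quantiles: NumPy sorts NaNs to the end, so after one sort the valid values occupy the first `count` slots, and linear interpolation between neighbouring ranks gives the same answer as `np.nanquantile`'s default method. This is written for illustration and is not the linked blog's or the module's exact code; `nan_quantile_sketch` is hypothetical and takes a scalar q only.

```python
import numpy as np


def nan_quantile_sketch(arr, q, axis=0):
    """Linear-interpolation quantile ignoring NaNs, computed with a single sort."""
    arr = np.moveaxis(np.asarray(arr, dtype=float), axis, 0)
    sorted_arr = np.sort(arr, axis=0)  # NaNs go to the end
    count = np.sum(~np.isnan(arr), axis=0)
    position = q * (count - 1)  # fractional rank among valid values
    lower = np.floor(position).astype(int)
    upper = np.minimum(lower + 1, np.maximum(count - 1, 0))
    weight = position - lower
    lo_val = np.take_along_axis(sorted_arr, lower[None, ...], axis=0)[0]
    hi_val = np.take_along_axis(sorted_arr, upper[None, ...], axis=0)[0]
    result = lo_val + weight * (hi_val - lo_val)
    return np.where(count > 0, result, np.nan)  # all-NaN slices return NaN
```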
- autots.tools.percentile.trimmed_mean(data, percent, axis=0)¶
autots.tools.probabilistic module¶
Point to Probabilistic
- autots.tools.probabilistic.Point_to_Probability(train, forecast, prediction_interval=0.9, method: str = 'historic_quantile')¶
Data driven placeholder for model error estimation.
Catlin Point to Probability method (‘a mixture of dark magic and gum disease’)
- Parameters:
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
prediction_interval (float) – confidence or perhaps credible interval
method (str) – spell to cast to create dark magic: ‘historic_quantile’, ‘inferred_normal’, or ‘variable_pct_change’. Gum disease available separately upon request.
- Returns:
upper_error, lower_error (two pandas.DataFrames for upper and lower bound respectively)
- autots.tools.probabilistic.Variable_Point_to_Probability(train, forecast, alpha=0.3, beta=1)¶
Data driven placeholder for model error estimation.
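For forecast step n:
$$\text{ErrorRange}_n = \beta \left( E_n + \alpha \sum_{i=1}^{n-1} E_i \right)$$
$$E_n = \left| 0.5 - \mathrm{QTP}_n \right| \cdot D_n$$
$$D_n = \left| X_n - \left( \bar{c}\, X_{n-1} + X_{n-1} \right) \right|$$
The symbols are defined as follows. $X_n$ is the forecast value. $\bar{c}$ is the average percent change of train. $\mathrm{Score}_n$ is the percent change from $X_{n-1}$ to $X_n$. $\mathrm{QTP}_n$ is the percentile of $\mathrm{Score}_n$ among all percent changes of train.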
- Parameters:
train (pandas.DataFrame) – DataFrame of time series where index is DatetimeIndex
forecast (pandas.DataFrame) – DataFrame of forecast time series in which the index is a DatetimeIndex and columns/series aligned with train. Forecast must be > 1 in length.
alpha (float) – parameter that affects the broadening of the error range over time. Usually 0 < alpha < 1 (although it can be larger than 1)
beta (float) – parameter that affects the general width of the error bar. Usually 0 < beta < 1 (although it can be larger than 1)
- Returns:
error width for each value of forecast.
- Return type:
ErrorRange (pandas.DataFrame)
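The ErrorRange formula can be sketched for a single series with NumPy. The sketch makes three assumptions not stated in the original. First, the first forecast step uses the last training value as $X_{n-1}$. Second, training values are nonzero, since percent changes divide by them. Third, QTP is the share of historic percent changes less than or equal to the score; percentileofscore_appliable with kind='rank' may treat ties differently. The function name is hypothetical.

```python
import numpy as np


def variable_error_range(train, forecast, alpha=0.3, beta=1.0):
    """ErrorRange per forecast step for one series (illustrative sketch)."""
    train = np.asarray(train, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    train_changes = np.diff(train) / train[:-1]  # historic percent changes
    avg_change = train_changes.mean()
    # X_{n-1}: the last training value for step 1, then the previous forecast value
    prev = np.concatenate([train[-1:], forecast[:-1]])
    score = (forecast - prev) / prev  # percent change from X_{n-1} to X_n
    # QTP: share of historic changes at or below each score (assumed ranking rule)
    qtp = (train_changes[None, :] <= score[:, None]).mean(axis=1)
    d = np.abs(forecast - (avg_change * prev + prev))
    e = np.abs(0.5 - qtp) * d
    cum_prev = np.concatenate([[0.0], np.cumsum(e)[:-1]])  # E_1 + ... + E_{n-1}
    return beta * (e + alpha * cum_prev)
```

A forecast that grows exactly at the average historic rate has D = 0 at every step, so its error range is zero.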
- autots.tools.probabilistic.historic_quantile(df_train, prediction_interval: float = 0.9, nan_flag=None)¶
Computes the difference between the median and the prediction interval range in historic data.
- Parameters:
df_train (pd.DataFrame) – a dataframe of training data
prediction_interval (float) – the desired forecast interval range
- Returns:
two 1D arrays
- Return type:
lower, upper (np.array)
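The following sketch shows one plausible reading of this description using np.nanquantile. For a 90% interval, it takes the 5th, 50th and 95th percentiles of each training series and returns the distances from the median to each bound. The function name and the exact meaning of the outputs are assumptions, not the AutoTS implementation.

```python
import numpy as np


def historic_quantile_sketch(train, prediction_interval: float = 0.9):
    """Distances from the median to the lower and upper historic bounds, per series."""
    tail = (1.0 - prediction_interval) / 2.0  # e.g. 0.05 for a 90% interval
    lower_q, median, upper_q = np.nanquantile(
        np.asarray(train, dtype=float), [tail, 0.5, 1.0 - tail], axis=0
    )
    return median - lower_q, upper_q - median
```

These distances could then be applied around a point forecast as forecast - lower and forecast + upper.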
- autots.tools.probabilistic.inferred_normal(train, forecast, n: int = 5, prediction_interval: float = 0.9)¶
A corruption of Bayes' theorem. It will be sensitive to transformations of the data.
- autots.tools.probabilistic.percentileofscore_appliable(x, a, kind='rank')¶
autots.tools.profile module¶
Profiling
- autots.tools.profile.data_profile(df)¶
Legacy profiler. Input: a pd DataFrame of columns which are time series, and a datetime index
Output: a pd DataFrame of column per time series, with rows which are statistics
- autots.tools.profile.profile_time_series(df, adi_threshold=1.3, cvar_threshold=0.5, flat_threshold=0.92, new_product_threshold='auto', seasonal_threshold=0.5)¶
- Profiles time series data into categories:
smooth, intermittent, erratic, lumpy, flat, new_product
- Parameters:
df (pd.DataFrame) – Wide format DataFrame with datetime index and each column as a time series.
new_product_threshold (float) – one of the more finicky thresholds: the percent of null or zero data from the beginning needed to declare a new product
new_product_correct (bool) – use dt index to correct
- Returns:
DataFrame with ‘SERIES’ and ‘DEMAND_PROFILE’ columns.
- Return type:
pd.DataFrame
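The smooth, intermittent, erratic and lumpy categories correspond to the widely used Syntetos-Boylan scheme. That scheme splits series on the average demand interval (ADI) and the squared coefficient of variation (CV²) of nonzero demand. Its classic cutoffs are ADI 1.32 and CV² 0.49, close to the defaults here.

The following is a minimal sketch under several assumptions. It assumes cvar_threshold applies to CV² of the nonzero values. It replaces flat_threshold with a simplified flat rule (a single unique value). It omits new_product detection. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def classify_demand(series: pd.Series, adi_threshold=1.3, cvar_threshold=0.5) -> str:
    """Assign one demand profile using ADI and CV^2 of nonzero values."""
    values = series.dropna().to_numpy(dtype=float)
    nonzero = values[values != 0]
    if nonzero.size == 0 or np.unique(values).size == 1:
        return "flat"  # simplified stand-in for the flat_threshold rule
    adi = values.size / nonzero.size  # average periods per nonzero observation
    cv2 = (nonzero.std() / nonzero.mean()) ** 2  # squared coefficient of variation
    if adi < adi_threshold:
        return "smooth" if cv2 < cvar_threshold else "erratic"
    return "intermittent" if cv2 < cvar_threshold else "lumpy"
```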
- autots.tools.profile.summarize_series(df)¶
Summarize time series data.
- Parameters:
df (pd.DataFrame) – wide style data with datetimeindex
autots.tools.regressor module¶
- autots.tools.regressor.create_lagged_regressor(df, forecast_length: int, frequency: str = 'infer', scale: bool = True, summarize: str | None = None, backfill: str = 'bfill', n_jobs: str = 'auto', fill_na: str = 'ffill')¶
Create a regressor of features lagged by forecast length. Useful to some models that don’t otherwise use such information.
It is recommended to drop the first forecast_length rows (.head(forecast_length)) of both regressor_train and the training df, e.g. df = df.iloc[forecast_length:]
- Parameters:
df (pd.DataFrame) – training data
forecast_length (int) – length of forecasts, to shift data by
frequency (str) – the ever necessary frequency for datetime things. Default ‘infer’
scale (bool) – if True, use the StandardScaler to standardize the features
summarize (str) – options to summarize the features, if large: ‘pca’, ‘median’, ‘mean’, ‘mean+std’, ‘feature_agglomeration’, ‘gaussian_random_projection’, “auto”
backfill (str) – method to deal with the NaNs created by shifting: “bfill” backfills with last values, “ETS” backfills with an ETS backwards forecast, and “DatepartRegression” backfills with DatepartRegression
fill_na (str) – method to prefill NAs in data, same methods as available elsewhere
- Returns:
regressor_train, regressor_forecast
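The core idea of a forecast-length lag can be shown with plain pandas. Shift every feature forward by forecast_length so each training row only sees data at least that old. Then use the final forecast_length rows as the regressor for the future dates. This sketch omits scaling, summarization and the ETS/DatepartRegression backfill options. The function name and the explicit freq argument are assumptions.

```python
import pandas as pd


def lagged_regressor_sketch(df: pd.DataFrame, forecast_length: int, freq: str = "D"):
    """Lag all features by forecast_length (illustrative, not the AutoTS code)."""
    # Each training row now holds values from forecast_length periods earlier;
    # the NaN rows created at the start are backfilled (the "bfill" option).
    regressor_train = df.shift(forecast_length).bfill()
    # The most recent forecast_length rows line up with the future dates.
    future_index = pd.date_range(df.index[-1], periods=forecast_length + 1, freq=freq)[1:]
    regressor_forecast = df.tail(forecast_length).copy()
    regressor_forecast.index = future_index
    return regressor_train, regressor_forecast
```

After this step, dropping the first forecast_length rows of both df and regressor_train removes the rows whose lagged values were only backfilled.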
- autots.tools.regressor.create_regressor(df, forecast_length, frequency: str = 'infer', holiday_countries: list = ['US'], datepart_method: str = 'simple_binarized', drop_most_recent: int = 0, scale: bool = True, summarize: str = 'auto', backfill: str = 'bfill', n_jobs: str = 'auto', fill_na: str = 'ffill', aggfunc: str = 'first', encode_holiday_type=False, holiday_detector_params={'anomaly_detector_params': {'forecast_params': None, 'method': 'mad', 'method_params': {'alpha': 0.05, 'distribution': 'gamma'}, 'transform_dict': {'fillna': None, 'transformation_params': {'0': {}}, 'transformations': {'0': 'DifferencedTransformer'}}}, 'output': 'univariate', 'splash_threshold': None, 'threshold': 0.8, 'use_dayofmonth_holidays': True, 'use_hebrew_holidays': False, 'use_islamic_holidays': False, 'use_lunar_holidays': False, 'use_lunar_weekday': False, 'use_wkdeom_holidays': False, 'use_wkdom_holidays': True}, holiday_regr_style: str = 'flag', preprocessing_params: dict | None = None)¶
Create a regressor from information available in the existing dataset. Its components are lagged data, datepart information, and holidays.
This function has been confusing people. It is NOT necessary for machine learning models; in AutoTS, those models internally create more elaborate feature sets separately. Instead, it may help some other models (GLM, ARIMA) that accept regressors but won’t build a regressor feature set internally. It also allows post-hoc customization as needed before input to AutoTS.
It is recommended to drop the first forecast_length rows (.head(forecast_length)) of both regressor_train and the training df, e.g. df = df.iloc[forecast_length:]. If you don’t want the lagged features, set summarize=”median”. This gives only one such column, which can then be easily dropped.
- Parameters:
df (pd.DataFrame) – WIDE style dataframe (use long_to_wide if the data isn’t already); categorical series will be discarded for this, if present
forecast_length (int) – time ahead that will be forecast
frequency (str) – those annoying offset codes you have to always use for time series
holiday_countries (list) – list of countries to pull holidays for. Requires the holidays package. Can also be a dict of {‘country’: “subdiv”} to include a subdivision (state)
datepart_method (str) – see date_part from seasonal
scale (bool) – if True, use the StandardScaler to standardize the features
summarize (str) – options to summarize the features, if large: ‘pca’, ‘median’, ‘mean’, ‘mean+std’, ‘feature_agglomeration’, ‘gaussian_random_projection’
backfill (str) – method to deal with the NaNs created by shifting: “bfill” backfills with last values, “ETS” backfills with an ETS backwards forecast, and “DatepartRegression” backfills with DatepartRegression
fill_na (str) – method to prefill NAs in data, same methods as available elsewhere
aggfunc (str) – str or func, used if frequency is resampled
encode_holiday_type (bool) – if True, returns column per holiday, ONLY for holidays package country holidays (not Detector)
holiday_detector_params (dict) – passed to HolidayDetector, or None
holiday_regr_style (str) – passed to detector’s dates_to_holidays ‘flag’, ‘series_flag’, ‘impact’
preprocessing_params (dict) – GeneralTransformer params to be applied before regressor creation
- Returns:
regressor_train, regressor_forecast
autots.tools.seasonal module¶
seasonal
@author: Colin
- autots.tools.seasonal.changepoint_fcst_from_last_row(x_t_last_row, n_forecast=10)¶
- autots.tools.seasonal.create_changepoint_features(DTindex, changepoint_spacing=60, changepoint_distance_end=120)¶
Creates a feature set for estimating trend changepoints using linear regression, ensuring the final changepoint is at changepoint_distance_end from the last row.
- Parameters:
DTindex (pd.DatetimeIndex) – a datetimeindex
changepoint_spacing (int) – distance between consecutive changepoints
changepoint_distance_end (int) – number of rows that belong to the final changepoint
- Returns:
DataFrame containing changepoint features for linear regression.
- Return type:
pd.DataFrame
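A common way to build such features is a hinge basis, max(0, t - c_k), for each changepoint c_k; Prophet-style piecewise linear trends work this way. The following is a minimal sketch under that assumption. It places the last changepoint so that it owns the final changepoint_distance_end rows and spaces earlier ones changepoint_spacing apart. The exact placement rules in AutoTS may differ, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def changepoint_features_sketch(dt_index, changepoint_spacing=60, changepoint_distance_end=120):
    """Hinge features max(0, t - c_k), one column per changepoint c_k."""
    n = len(dt_index)
    t = np.arange(n)
    last_cp = n - changepoint_distance_end  # final changepoint owns the last rows
    # Earlier changepoints every changepoint_spacing rows before the final one
    cps = np.arange(last_cp, 0, -changepoint_spacing)[::-1]
    features = np.maximum(0, t[:, None] - cps[None, :])
    columns = [f"changepoint_{i + 1}" for i in range(len(cps))]
    return pd.DataFrame(features, index=dt_index, columns=columns)
```

Regressing a series on these columns plus a plain time index gives a trend whose slope can change at each changepoint.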
- autots.tools.seasonal.create_datepart_components(DTindex, seasonality)¶
single date part one-hot flags.
- autots.tools.seasonal.create_seasonality_feature(DTindex, t, seasonality, history_days=None)¶
Cassandra-designed feature generator.
- autots.tools.seasonal.date_part(DTindex, method: str = 'simple', set_index: bool = True, polynomial_degree: int | None = None, holiday_country: str | None = None, holiday_countries_used: bool = True, lags: int | None = None, forward_lags: int | None = None)¶
Create date part columns from pd.DatetimeIndex.
If your method isn’t recognized, you will see a [‘year’, ‘month’, ‘day’, ‘weekday’] output
- Parameters:
DTindex (pd.DatetimeIndex) – datetime index to provide dates
method (str) – expanded, recurring, or simple, among others. simple: just day, year, month, weekday; expanded: all available features; recurring: all features that should commonly repeat without aging; other options are simple_2, simple_3, simple_binarized, expanded_binarized, and common_fourier
set_index (bool) – if True, return DTindex as index of df
polynomial_degree (int) – add this degree of sklearn polynomial features if not None
holiday_country (list or str) – names of countries to pull calendar holidays for
holiday_countries_used (bool) – to use holiday_country if given
lags (int) – if not None, include the past N previous index date parts
forward_lags (int) – if not None, include the future N index date parts
- Returns:
pd.DataFrame with DTindex
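For the “simple” method, and for the fallback output when a method isn't recognized, the four date parts can be read directly from DatetimeIndex attributes. The following is a minimal sketch. The function name is hypothetical, and the real method options add many more columns.

```python
import pandas as pd


def simple_date_part(dt_index: pd.DatetimeIndex) -> pd.DataFrame:
    """Year, month, day and weekday (Monday=0) for each date."""
    return pd.DataFrame(
        {
            "year": dt_index.year,
            "month": dt_index.month,
            "day": dt_index.day,
            "weekday": dt_index.weekday,
        },
        index=dt_index,
    )
```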
- autots.tools.seasonal.fourier_df(DTindex, seasonality, order=10, t=None, history_days=None)¶
- autots.tools.seasonal.fourier_series(t, p=365.25, n=10)¶
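fourier_series and fourier_df have no description here. The standard construction behind such features is a set of sine and cosine pairs at integer multiples of the base frequency 1/p. The following is a minimal sketch. It assumes t is a numeric time index in days (for example, days since an epoch) and that sines come before cosines in the column order. The AutoTS column layout may differ, and the function name is hypothetical.

```python
import numpy as np


def fourier_series_sketch(t, p: float = 365.25, n: int = 10) -> np.ndarray:
    """Return 2 * n columns: sin(2*pi*k*t/p) for k = 1..n, then the matching cosines."""
    k = np.arange(1, n + 1)
    angles = 2.0 * np.pi * np.outer(np.asarray(t, dtype=float), k) / p
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
```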
- autots.tools.seasonal.half_yr_spacing(df)¶
- autots.tools.seasonal.random_datepart(method='random')¶
New random parameters for seasonality.
- autots.tools.seasonal.seasonal_independent_match(DTindex, DTindex_future, k, datepart_method='simple_binarized', distance_metric='canberra', full_sort=False, nan_array=None)¶
- autots.tools.seasonal.seasonal_int(include_one: bool = False, small=False, very_small=False)¶
Generate a random integer of typical seasonalities.
- Parameters:
include_one (bool) – whether to include 1 in output options
small (bool) – if True, keep below 364
very_small (bool) – if True keep below 30
- autots.tools.seasonal.seasonal_repeating_wavelet(DTindex, p, order=12, sigma=4.0, wavelet_type='morlet')¶
- autots.tools.seasonal.seasonal_window_match(DTindex, k, window_size, forecast_length, datepart_method, distance_metric, full_sort=False)¶
autots.tools.shaping module¶
Reshape data.
- class autots.tools.shaping.NumericTransformer(na_strings: list = ['', ' '], categorical_fillna: str = 'ffill', handle_unknown: str = 'use_encoded_value', downcast: str | None = None, verbose: int = 0)¶
Bases:
object
General purpose numeric conversion for pandas dataframes.
All categorical data and levels must be passed to .fit(). If new categorical series or levels are present in .transform() it won’t work!
Currently datetimes cannot be inverse_transformed back to datetime
- Parameters:
na_strings (list) – list of strings to replace as pd.NA
categorical_fillna (str) – how to fill NaN for categorical variables (numeric NaN are unaltered). “ffill” uses forward and backward filling to supply NA values; “indicator” or anything else currently results in all missing values being replaced with the str “missing_value”
handle_unknown (str) – passed through to scikit-learn OrdinalEncoder
downcast (str) – passed to pd.to_numeric, use None or ‘float’
verbose (int) – greater than 0 to print some messages
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- inverse_transform(df, convert_dtypes: bool = False)¶
Convert numeric back to categorical.
- Parameters:
df (pandas.DataFrame) – input dataframe
convert_dtypes (bool) – whether to use pd.convert_dtypes after inverse
- transform(df)¶
Convert categorical dataset to numeric.
- autots.tools.shaping.clean_weights(weights, series, verbose=0)¶
Polish up series weighting information
- Parameters:
weights (dict) – dictionary of series_id: weight (float or int)
series (iterable) – list of series_ids in the dataset
- autots.tools.shaping.df_cleanup(df_wide, frequency: str = 'infer', prefill_na: str | None = None, na_tolerance: float = 0.999, drop_data_older_than_periods: int = 100000, drop_most_recent: int = 0, aggfunc: str = 'first', verbose: int = 1)¶
Pass cleaning functions through to dataframe.
- Parameters:
df_wide (pd.DataFrame) – input dataframe to clean.
frequency (str, optional) – frequency in string of alias for DateOffset object, normally “1D” -daily, “MS” -month start etc. Currently, aliases are listed somewhere in here: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html. Defaults to “infer”.
na_tolerance (float, optional) – allow up to this fraction of values to be NaN, else drop the entire series. For example, 0.95 means a series can be 95% NaN values and still be included. Defaults to 0.999.
drop_data_older_than_periods (int, optional) – cut off older data because eventually you just get too much. Defaults to 100000.
drop_most_recent (int, optional) – number of most recent data points to remove. Useful if you pull monthly data before month end, and you don’t want an incomplete month appearing complete. Defaults to 0.
aggfunc (str, optional) – passed to pd.pivot_table, determines how to aggregate duplicates for upsampling. Other options include “mean” and other numpy functions, beware data must already be input as numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended. Defaults to ‘first’.
verbose (int, optional) – 0 for silence, higher values for more noise. Defaults to 1.
- Returns:
original dataframe, now possibly shorter.
- Return type:
pd.DataFrame
- autots.tools.shaping.freq_to_timedelta(freq)¶
Working around pandas limitations.
- autots.tools.shaping.infer_frequency(df_wide, warn=True, **kwargs)¶
Infer the frequency in a slightly more robust way.
- Parameters:
df_wide (pd.Dataframe or pd.DatetimeIndex) – input to pull frequency from
warn (bool) – unused, here to make swappable with pd.infer_freq
- autots.tools.shaping.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', aggfunc: str = 'first')¶
Take long data and convert into wide, cleaner data.
- Parameters:
df (pd.DataFrame) – long-style input data
date_col (str) – name of the datetime column
value_col (str) – the name of the column with the values of the time series (e.g. sales $)
id_col (str) – name of the id column, unique for each time series
aggfunc (str) – passed to pd.pivot_table, determines how to aggregate duplicates for series_id and datetime. Other options include “mean” and other numpy functions; beware, data must already be input as numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended
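At its core, this conversion is a pd.pivot_table call. Dates become the index, series ids become columns, and aggfunc resolves duplicate (date, id) pairs. The following is a minimal sketch using the default column names. It shows only the reshaping step, not any additional cleaning the function may perform.

```python
import pandas as pd

long_df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
        "series_id": ["a", "b", "a", "b"],
        "value": [1.0, 10.0, 2.0, 20.0],
    }
)
# One row per date, one column per series; duplicates resolved by aggfunc
wide = long_df.pivot_table(
    index="datetime", columns="series_id", values="value", aggfunc="first"
)
```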
- autots.tools.shaping.simple_train_test_split(df, forecast_length: int = 10, min_allowed_train_percent: float = 0.3, verbose: int = 1)¶
Uses the last forecast_length periods as the test set and the rest as train.
- Parameters:
forecast_length (int) – number of future periods to predict
min_allowed_train_percent (float) – forecast length cannot be greater than 1 minus this fraction of the data. This constrains the forecast length from being much larger than the training data. Note this includes NaNs in the current configuration.
- Returns:
train, test (both pd DataFrames)
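The following is a minimal sketch of this split. It assumes the check compares forecast_length against the full row count, NaN rows included, and raises an error on violation. The AutoTS function may handle a violation differently, and the function name is hypothetical.

```python
import pandas as pd


def train_test_split_sketch(df: pd.DataFrame, forecast_length=10, min_allowed_train_percent=0.3):
    """Hold out the final forecast_length rows as the test set."""
    # forecast_length may be at most (1 - min_allowed_train_percent) of all rows,
    # counting rows that contain NaN
    if forecast_length > len(df) * (1 - min_allowed_train_percent):
        raise ValueError("forecast_length is too long for the available history")
    return df.iloc[:-forecast_length], df.iloc[-forecast_length:]
```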
- autots.tools.shaping.split_digits_and_non_digits(s)¶
- autots.tools.shaping.subset_series(df, weights, n: int = 1000, random_state: int = 2020)¶
Return a sample of time series.
- Parameters:
df (pd.DataFrame) – wide df with series as columns and DT index
n (int) – number of unique time series to keep, or None
random_state (int) – random seed
- autots.tools.shaping.wide_to_3d(wide_arr, seasonality=7, output_shape='gst')¶
Generates 3d (groups/seasonality, series, time steps) from wide (time step, series) numpy array.
- Parameters:
wide_arr (np.array) – wide style (timesteps, series) numpy time series
seasonality (int) – seasonality of the series to use, avoid really large values
output_shape (str) – either ‘gst’ or ‘sgt’ which is output shape gst: (groups/seasonality, series, time steps) sgt: (series, groups/seasonality, time steps)
autots.tools.thresholding module¶
Created on Thu Jul 7 10:27:46 2022
@author: Colin
- class autots.tools.thresholding.NonparametricThreshold(data, warmup_pts: int = 1, p=0.1, error_buffer=1, z_init=2.5, z_limit=12.0, z_step=0.5, max_contamination=0.25, mean_weight: float = 10, sd_weight: float = 10, anomaly_count_weight: float = 1, inverse: bool = False)¶
Bases:
object
- compare_to_epsilon(inverse=False)¶
Compare smoothed error values to epsilon (error threshold) and group consecutive errors together into sequences.
- Parameters:
errors_all (obj) – Errors class object containing list of all previously identified anomalies in test set
- find_epsilon(inverse=False)¶
Find the anomaly threshold that maximizes a function representing the tradeoff between:
a) the number of anomalies and anomalous ranges
b) the reduction in mean and st dev if anomalous points are removed from errors
(see https://arxiv.org/pdf/1802.04431.pdf)
- Parameters:
inverse (bool) – If true, epsilon is calculated for inverted errors
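The linked paper (Hundman et al.) scores candidate thresholds ε = mean + z·sd over a range of z values. The score rewards the relative drop in mean and standard deviation when points above ε are removed. It is divided by the number of anomalous points plus the squared number of anomalous sequences.

The following is a minimal sketch of that criterion using this class's z_init, z_limit and z_step defaults. It ignores mean_weight, sd_weight and anomaly_count_weight, assumes positive errors, and uses hypothetical function names.

```python
import numpy as np


def count_sequences(mask: np.ndarray) -> int:
    """Number of runs of consecutive True values."""
    previous = np.concatenate([[False], mask[:-1]])
    return int(np.sum(mask & ~previous))


def find_epsilon_sketch(errors, z_init=2.5, z_limit=12.0, z_step=0.5):
    errors = np.asarray(errors, dtype=float)
    mean, sd = errors.mean(), errors.std()
    best_eps, best_score = mean + z_limit * sd, -np.inf
    for z in np.arange(z_init, z_limit, z_step):
        eps = mean + z * sd
        above = errors > eps
        n_anom = int(above.sum())
        if n_anom == 0:
            continue  # nothing flagged at this z
        below = errors[~above]
        delta_mean = mean - below.mean()  # drop in mean after removing anomalies
        delta_sd = sd - below.std()  # drop in standard deviation
        score = (delta_mean / mean + delta_sd / sd) / (n_anom + count_sequences(above) ** 2)
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps
```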
- prune_anoms(inverse=False)¶
Remove anomalies that don’t meet minimum separation from the next closest anomaly or error value
- Parameters:
inverse (bool) – If true, epsilon is calculated for inverted errors
- score_anomalies()¶
Calculate anomaly scores based on max distance from epsilon for each anomalous sequence.
- autots.tools.thresholding.consecutive_groups(iterable, ordering=<function <lambda>>)¶
Yield groups of consecutive items using itertools.groupby(). From the more_itertools package; see the description there for details (circa mid 2022).
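The more_itertools recipe relies on one property: within a run of consecutive values, the difference between each item's position and its value stays constant. That difference can therefore serve as the groupby key. The following sketch reproduces the recipe from memory and is illustrative only.

```python
from itertools import groupby
from operator import itemgetter


def consecutive_groups(iterable, ordering=lambda x: x):
    """Yield iterators over runs of consecutive items."""
    # position - ordering(value) is constant inside each consecutive run
    for _, group in groupby(enumerate(iterable), key=lambda pair: pair[0] - ordering(pair[1])):
        yield map(itemgetter(1), group)
```

As with itertools.groupby, consume each group before advancing to the next one.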
- autots.tools.thresholding.nonparametric(series, method_params)¶
autots.tools.transform module¶
Preprocessing data methods.
- class autots.tools.transform.AlignLastDiff(rows: int = 1, quantile: float = 0.5, decay_span: float | None = None, displacement_rows: int = 1, **kwargs)¶
Bases:
EmptyTransformer
Shift all data relative to the last value(s) of the series. This version aligns based on historic diffs rather than direct values.
- Parameters:
rows (int) – number of rows to average as diff history. rows=1 behaves rather differently from other values
quantile (float) – quantile of historic diffs to use as allowed [0, 1]
decay_span (int) – span of an exponential decay that softens the adjustment toward no adjustment
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast', adjustment=None)¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
adjustment (float) – size of shift, utilized for adjusting the upper and lower bounds to match point forecast
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.AlignLastValue(rows: int = 1, lag: int = 1, method: str = 'additive', strength: float = 1.0, first_value_only: bool = False, threshold: int | None = None, threshold_method: str = 'max', **kwargs)¶
Bases:
EmptyTransformer
Shift all data relative to the last value(s) of the series.
- Parameters:
rows (int) – number of rows to average as last record
lag (int) – use the last value from this many steps back: 1 is no shift, 2 is lag one from the end, i.e. second to last
method (str) – ‘additive’, ‘multiplicative’
strength (float) – softening parameter [0, 1], 1.0 for full difference
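One way to picture the additive and multiplicative modes is as follows. Compute the center of the last `rows` observations, then shift or scale the forecast so its first value moves toward that center by a fraction set by strength. This is an illustrative assumption about the mechanics, not the AutoTS code. It ignores lag and the threshold options, and the function name is hypothetical.

```python
import pandas as pd


def align_last_value(history: pd.DataFrame, forecast: pd.DataFrame, rows: int = 1,
                     method: str = "additive", strength: float = 1.0) -> pd.DataFrame:
    """Move the forecast's first value toward the recent history (per column)."""
    center = history.tail(rows).mean()  # average of the last `rows` observations
    first = forecast.iloc[0]
    if method == "additive":
        # Shift the whole forecast by a fraction of the gap
        return forecast + strength * (center - first)
    # "multiplicative": scale the whole forecast by a softened ratio
    return forecast * (1 + strength * (center / first - 1))
```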
- static find_centerpoint(df, rows, lag)¶
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast', adjustment=None)¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
adjustment (float) – size of shift, utilized for adjusting the upper and lower bounds to match point forecast
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.AnomalyRemoval(output='multivariate', method='zscore', transform_dict={'transformation_params': {0: {'datepart_method': 'simple_3', 'regression_model': {'model': 'ElasticNet', 'model_params': {}}}}, 'transformations': {0: 'DatepartRegression'}}, method_params={}, fillna=None, isolated_only=False, on_inverse=False, n_jobs=1)¶
Bases:
EmptyTransformer
- fit(df)¶
All will return -1 for anomalies.
- Parameters:
df (pd.DataFrame) – pandas wide-style data
- Returns:
pd.DataFrame (classifications, -1 = outlier, 1 = not outlier), pd.DataFrame (scores)
- fit_anomaly_classifier()¶
Fit a model to predict if a score is an anomaly.
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method='random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- score_to_anomaly(scores)¶
A DecisionTree model, used as models are nonstandard (and nonparametric).
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.BKBandpassFilter(low: int = 6, high: int = 32, K: int = 1, lanczos_factor: int = False, return_diff: int = True, on_transform: bool = True, on_inverse: bool = False, **kwargs)¶
Bases:
EmptyTransformer
More complete implementation of the Baxter-King bandpass filter, based on the successful but somewhat confusing StatsmodelsFilter transformer.
- Parameters:
window (int) – or None, the most recent n history to use for alignment
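The Baxter-King filter approximates an ideal band-pass filter with a symmetric moving average of 2K+1 weights. The weights are adjusted to sum to zero, so constants and linear trends are removed, and K observations are lost at each end.

The following is a minimal NumPy sketch of the textbook weights. low and high are the shortest and longest periods kept, matching the defaults 6 and 32. The function names are hypothetical, and K=12 is chosen for illustration.

```python
import numpy as np


def bk_weights(low: int = 6, high: int = 32, K: int = 12) -> np.ndarray:
    """Symmetric Baxter-King weights of length 2K+1 that sum to zero."""
    a = 2 * np.pi / high  # lowest frequency kept (longest period)
    b = 2 * np.pi / low  # highest frequency kept (shortest period)
    j = np.arange(1, K + 1)
    b_j = (np.sin(j * b) - np.sin(j * a)) / (np.pi * j)  # ideal band-pass weights
    weights = np.concatenate([b_j[::-1], [(b - a) / np.pi], b_j])
    return weights - weights.mean()  # spread the truncation error so weights sum to zero


def bk_filter(x, low: int = 6, high: int = 32, K: int = 12) -> np.ndarray:
    # Symmetric weights make convolution equal to correlation; K points lost per end
    return np.convolve(np.asarray(x, dtype=float), bk_weights(low, high, K), mode="valid")
```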
- filter(df)¶
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.BTCD(regression_model: dict = {'model': 'LinearRegression', 'model_params': {}}, max_lags: int = 1, name: str = 'BTCD', **kwargs)¶
Bases:
EmptyTransformer
Box and Tiao Canonical Decomposition.
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original space.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.CenterLastValue(rows: int = 1, **kwargs)¶
Bases:
EmptyTransformer
Scale all data relative to the last value(s) of the series.
- Parameters:
rows (int) – number of rows to average from most recent data
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.CenterSplit(center: str = 'zero', fillna='linear', suffix: str = '_lltmicro', **kwargs)¶
Bases:
EmptyTransformer
Vaguely Croston-inspired approach separating occurrence from magnitude.
- Parameters:
center (str) – ‘zero’ or ‘median’, the value to treat as the intermittent gap
fillna (str) – a fillna method, see standard fillna methods
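In the spirit of Croston's method, a series can be split into an occurrence flag and a magnitude series whose gaps are filled in. The following is a minimal sketch for center='zero'. It fills gaps with linear interpolation, then forward and back fills the ends. This is an assumption about the mechanics, and the function names are hypothetical.

```python
import pandas as pd


def center_split_sketch(df: pd.DataFrame):
    """Split into occurrence flags and gap-filled magnitudes (center='zero')."""
    occurrence = (df != 0).astype(int)  # 1 where there is nonzero demand
    # Keep magnitudes only where demand occurred, fill gaps linearly,
    # then fill the ends from the nearest observed value
    magnitude = df.where(df != 0).interpolate(method="linear").ffill().bfill()
    return occurrence, magnitude


def center_split_inverse(occurrence: pd.DataFrame, magnitude: pd.DataFrame) -> pd.DataFrame:
    """Put the zero gaps back wherever the occurrence flag is 0."""
    return magnitude.where(occurrence.astype(bool), 0.0)
```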
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.ChangepointDetrend(model: str = 'Linear', changepoint_spacing: int = 60, changepoint_distance_end: int = 120, datepart_method: str | None = None, **kwargs)¶
Bases:
Detrend
Remove trend using changepoint features linked to a specific datetime origin.
- fit(df)¶
Fits trend for later detrending using changepoint features.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and return detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df)¶
Return data to original form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.ClipOutliers(method: str = 'clip', std_threshold: float = 4, fillna: str | None = None, **kwargs)¶
Bases:
EmptyTransformer
PURGE THE OUTLIERS.
- Parameters:
method (str) – “clip” or “remove”
std_threshold (float) – number of std devs from mean to call an outlier
fillna (str) – fillna method to use per tools.impute.FillNA
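The following is a minimal sketch of the two methods, assuming the per-series mean and standard deviation are computed over the whole input. “clip” caps values at mean ± std_threshold·std. “remove” replaces them with NaN, which the fillna option would then handle. The function name is hypothetical.

```python
import pandas as pd


def clip_outliers_sketch(df: pd.DataFrame, std_threshold: float = 4.0, method: str = "clip"):
    """Cap or blank values beyond mean +/- std_threshold * std, per column."""
    mean, std = df.mean(), df.std()
    lower = mean - std_threshold * std
    upper = mean + std_threshold * std
    if method == "clip":
        return df.clip(lower=lower, upper=upper, axis=1)
    # "remove": outliers become NaN for a later fill step
    return df.where((df >= lower) & (df <= upper))
```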
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.Cointegration(det_order: int = -1, k_ar_diff: int = 1, name: str = 'Cointegration', **kwargs)¶
Bases:
EmptyTransformer
Johansen Cointegration Decomposition.
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original space.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.Constraint(constraint_method: int = 'historic_growth', constraint_value: int = 1.0, constraint_direction: str = 'upper', constraint_regularization: int = 1.0, forecast_length: int | None = None, bounds_only: bool = False, fillna: str | None = None, **kwargs)¶
Bases:
EmptyTransformer
Apply constraints (caps on values based on history).
See base.py constraints function for argument documentation
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast', adjustment=None)¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.CumSumTransformer(**kwargs)¶
Bases:
EmptyTransformer
Cumulative Sum of Data.
Warning
Inverse transformed values will not exactly equal the originals due to floating point imprecision. inverse_transform can only be applied to the original series or an immediately following forecast.
- fit(df)¶
Fits.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- inverse_transform(df, trans_method: str = 'forecast')¶
Returns data to original or forecast form
- Parameters:
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original
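The restriction to an immediately following forecast exists because differencing back needs the last cumulative value of the original data as its seed. The following is a minimal sketch of the ‘forecast’ inverse, with a hypothetical function name.

```python
import pandas as pd


def cumsum_inverse_forecast(forecast_cumsum: pd.DataFrame, last_cumsum: pd.Series) -> pd.DataFrame:
    """Undo a cumulative sum on a forecast that directly follows the original data."""
    # The first step is measured against the final cumulative value of the history
    first_row = forecast_cumsum.iloc[[0]] - last_cumsum
    # Later steps are plain differences between consecutive cumulative values
    return pd.concat([first_row, forecast_cumsum.diff().iloc[1:]])
```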
- transform(df)¶
Returns changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- autots.tools.transform.DatepartRegression¶
alias of
DatepartRegressionTransformer
- class autots.tools.transform.DatepartRegressionTransformer(regression_model: dict = {'model': 'DecisionTree', 'model_params': {'max_depth': 5, 'min_samples_split': 2}}, datepart_method: str = 'expanded', polynomial_degree: int | None = None, transform_dict: dict | None = None, holiday_country: list | None = None, holiday_countries_used: bool = False, lags: int | None = None, forward_lags: int | None = None, n_jobs: int = 1, **kwargs)¶
Bases:
EmptyTransformer
Remove a regression on datepart from the data. See tools.seasonal.date_part
- fit(df, regressor=None)¶
Fits trend for later detrending.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df, regressor=None)¶
Fit and Return Detrended DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random', holiday_countries_used=None)¶
Generate new random parameters
- impute(df, regressor=None)¶
Fill Missing. Needs to have same general pattern of missingness (full rows of NaN only or scattered NaN) as was present during .fit()
- inverse_transform(df, regressor=None)¶
Return data to original form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df, regressor=None)¶
Return detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.Detrend(model: str = 'GLS', phi: float = 1.0, window: int | None = None, transform_dict=None, **kwargs)¶
Bases:
EmptyTransformer
Remove a linear trend from the data.
- fit(df)¶
Fits trend for later detrending.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and Return Detrended DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df)¶
Return data to original form. Will only match the original exactly if phi == 1.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
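Detrend defaults to model='GLS'. As a rough illustration of the fit/remove/extrapolate cycle only, the sketch below substitutes ordinary least squares on a single Python list: fit a line on the integer time index, subtract it, and add the extrapolated line back onto a following forecast. The function names are hypothetical and not part of AutoTS.

```python
def fit_linear_trend(values):
    """Ordinary least squares line through (t, value) for t = 0..n-1 (needs n >= 2)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return slope, intercept


def detrend(values, slope, intercept):
    """Subtract the fitted line from the training values."""
    return [y - (intercept + slope * t) for t, y in enumerate(values)]


def retrend_forecast(residuals, slope, intercept, start):
    """Add the line back to a segment whose first time index is `start`,
    extrapolating the trend past the end of the training data."""
    return [r + intercept + slope * (start + i) for i, r in enumerate(residuals)]


history = [1.0, 3.2, 4.9, 7.1]
slope, intercept = fit_linear_trend(history)
residuals = detrend(history, slope, intercept)
# A model would forecast the residuals; the trend continues from t = len(history).
forecast = retrend_forecast([0.0, 0.0], slope, intercept, start=len(history))
```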
- class autots.tools.transform.DiffSmoother(output='multivariate', method=None, transform_dict=None, method_params=None, fillna=2.0, n_jobs=1, adjustment: int = 2, reverse_alignment=True, isolated_only=False)¶
Bases:
EmptyTransformer
- fit(df)¶
Fit.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method='fast')¶
Generate new random parameters
- transform(df)¶
Return differenced data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.DifferencedTransformer(lag=1, fill='bfill')¶
Bases:
object
Difference from lag n value. inverse_transform can only be applied to the original series, or an immediately following forecast.
- Parameters:
lag (int) – number of periods to shift.
fill (str) – method to fill NaN values created by differencing, options: ‘bfill’, ‘zero’.
- fit(df)¶
Fit.
- Parameters:
df (pandas.DataFrame) – input dataframe.
- fit_transform(df)¶
Fits and returns differenced DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe.
- static get_new_params(method: str = 'random')¶
- inverse_transform(df, trans_method='forecast')¶
Returns data to original or forecast form
- Parameters:
df (pandas.DataFrame) – input dataframe.
trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original.
- transform(df)¶
Return differenced data.
- Parameters:
df (pandas.DataFrame) – input dataframe.
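A minimal sketch of lag differencing, and of why inverse_transform only works on the original series or an immediately following forecast: the inverse needs the last `lag` actual levels to rebuild from. Plain lists stand in for DataFrame columns, and the function names are hypothetical.

```python
def difference(values, lag=1, fill="bfill"):
    """values[t] - values[t - lag]; the first `lag` positions have no prior value."""
    diffs = [None] * lag + [values[t] - values[t - lag] for t in range(lag, len(values))]
    if fill == "zero":
        return [0 if d is None else d for d in diffs]
    # 'bfill': copy the first valid difference backward over the leading gap
    first_valid = diffs[lag]
    return [first_valid if d is None else d for d in diffs]


def inverse_forecast(diffs, history, lag=1):
    """Rebuild levels for a segment that immediately follows `history`."""
    levels = list(history)
    for d in diffs:
        # each new level is the level `lag` steps back plus the forecast difference
        levels.append(levels[-lag] + d)
    return levels[len(history):]


history = [10, 12, 15, 14]
print(difference(history))                # [2, 2, 3, -1]
print(inverse_forecast([1, 2], history))  # [15, 17]
```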
- class autots.tools.transform.Discretize(discretization: str = 'center', n_bins: int = 10, nan_flag=False, **kwargs)¶
Bases:
EmptyTransformer
Round/convert data to bins.
- Parameters:
discretization (str) – method of binning to apply:
None - no discretization
‘center’ - values are rounded to the center value of each bin
‘lower’ - values are rounded to the lower edge of the closest bin
‘upper’ - values are rounded up to the upper edge of the closest bin
‘sklearn-quantile’, ‘sklearn-uniform’, ‘sklearn-kmeans’ - sklearn KBinsDiscretizer
n_bins (int) – number of bins to group data into.
nan_flag (bool) – set to True if this has to run on NaN values
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
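The ‘center’, ‘lower’ and ‘upper’ options can be pictured with the sketch below, which assumes n_bins equal-width bins spanning the minimum and maximum of the data; AutoTS may place its bin edges differently, and the sklearn options are not covered. The function name is hypothetical.

```python
def discretize(values, n_bins=10, discretization="center"):
    """Round each value to the center, lower edge, or upper edge of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return list(values)  # constant series: nothing to bin
    out = []
    for v in values:
        # index of the bin containing v; the maximum value goes into the last bin
        idx = min(int((v - lo) // width), n_bins - 1)
        lower = lo + idx * width
        if discretization == "lower":
            out.append(lower)
        elif discretization == "upper":
            out.append(lower + width)
        else:
            out.append(lower + width / 2)
    return out


print(discretize([0, 3, 10], n_bins=5))  # [1.0, 3.0, 9.0]
```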
- class autots.tools.transform.EWMAFilter(span: int = 7, **kwargs)¶
Bases:
EmptyTransformer
Irreversible filter using an exponentially weighted moving average (EWMA).
- Parameters:
span (int) – span of the exponential weighting, converted to the smoothing factor alpha
- fit_transform(df)¶
Fit and return filtered DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- transform(df)¶
Return filtered data.
- Parameters:
df (pandas.DataFrame) – input dataframe
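The span-to-alpha conversion usually follows the pandas convention alpha = 2 / (span + 1). The sketch below uses the simple recursive form (equivalent to pandas `ewm(span=..., adjust=False)`), which is an assumption about how EWMAFilter smooths; pandas' default adjust=True weighting gives slightly different early values.

```python
def ewma(values, span=7):
    """Recursive exponentially weighted moving average with alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(v)  # first value seeds the average
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


print(ewma([0, 4, 8], span=3))  # alpha = 0.5 -> [0, 2.0, 5.0]
```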
- class autots.tools.transform.EmptyTransformer(name: str = 'EmptyTransformer', **kwargs)¶
Bases:
object
Base transformer returning raw data.
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.FFTDecomposition(n_harmonics: float = 0.1, detrend: str = 'linear', **kwargs)¶
Bases:
EmptyTransformer
FFT decomposition, then removal, then extrapolation and addition.
- Parameters:
n_harmonics (float) – number of frequencies to include
detrend (str) – None, ‘linear’, or ‘quadratic’
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.FFTFilter(cutoff: float = 0.1, reverse: bool = False, on_transform: bool = True, on_inverse: bool = False, **kwargs)¶
Bases:
EmptyTransformer
Fit a Fourier transform and keep only the lowest frequencies, below cutoff.
- Parameters:
cutoff (float) – smoothing value
reverse (bool) – if True, keep highest frequencies only
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
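To make "keep only the lowest frequencies" concrete, here is a pure-Python sketch using a naive discrete Fourier transform. Interpreting cutoff as the fraction of positive frequencies to keep is an assumption for illustration, not necessarily how FFTFilter maps cutoff to frequencies, and the quadratic-time DFT is only suitable for short examples.

```python
import cmath


def dft(xs):
    n = len(xs)
    return [sum(xs[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(coeffs):
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def fft_filter(xs, cutoff=0.1, reverse=False):
    """Zero out high (or, with reverse=True, low) frequencies and transform back."""
    n = len(xs)
    coeffs = dft(xs)
    keep = max(1, int(cutoff * (n // 2)))  # hypothetical rule: count of low frequencies kept
    filtered = []
    for k, c in enumerate(coeffs):
        freq_index = min(k, n - k)  # distance from zero frequency (mirrored negative half)
        is_low = freq_index <= keep
        keep_it = (not is_low) if reverse else is_low
        filtered.append(c if keep_it else 0)
    return idft(filtered)


signal = [1 + (-1) ** t for t in range(8)]  # level 1 plus the fastest possible oscillation
smooth = fft_filter(signal)                 # approximately [1.0] * 8
```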
- class autots.tools.transform.FIRFilter(sampling_frequency: int = 365, numtaps: int = 512, cutoff_hz: float = 30, window: str = 'hamming', on_transform: bool = True, on_inverse: bool = False, **kwargs)¶
Bases:
EmptyTransformer
Apply a finite impulse response (FIR) filter designed with scipy.signal.firwin.
- filter(df)¶
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.FastICA(**kwargs)¶
Bases:
EmptyTransformer
sklearn FastICA for signal decomposition, wrapped so that the column names are stored.
- Parameters:
span (int) – span of exponential period to convert to alpha
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.GeneralTransformer(fillna: str | None = None, transformations: dict = {}, transformation_params: dict = {}, grouping: str | None = None, reconciliation: str | None = None, grouping_ids=None, random_seed: int = 2020, n_jobs: int = 1, holiday_country: list | None = None, verbose: int = 0, forecast_length: int = 30)¶
Bases:
object
Fill NA values, then apply mathematical transformations.
Expects a chronologically sorted pandas.DataFrame with a DatetimeIndex, only numeric data, and a ‘wide’ (one column per series) shape.
Warning
- inverse_transform will not fully return the original data under many conditions:
the primary intention of inverse_transform is to invert forecast data from models (immediately following the historical time period), not to return original data
NAs filled will be returned with the filled value
Discretization, statsmodels filters, Round, Slice, ClipOutliers cannot be inverted
- RollingMean, PctChange, CumSum, SeasonalDifference, and DifferencedTransformer will only return the original or an immediately following forecast
by default ‘forecast’ is expected; ‘original’ can be set in trans_method
- Parameters:
fillna (str) –
method to fill NA, passed through to FillNA():
‘ffill’ - fill most recent non-na value forward until another non-na value is reached
‘zero’ - fill with zero. Useful for sales and other data where NA does usually mean $0.
‘mean’ - fill all missing values with the series’ overall average value
‘median’ - fill all missing values with the series’ overall median value
‘rolling_mean’ - fill with the mean of the last n (window = 10) values
‘rolling_mean_24’ - fill with the mean of the last 24 values
‘ffill_mean_biased’ - simple average of ffill and mean
‘fake_date’ - shifts data forward over NaN, thus values will have incorrect timestamps
‘IterativeImputer’ - sklearn iterative imputer
most of the interpolate methods from pandas.DataFrame.interpolate
transformations (dict) –
transformations to apply {0: “MinMaxScaler”, 1: “Detrend”, …}
‘None’
‘MinMaxScaler’ - Sklearn MinMaxScaler
‘PowerTransformer’ - Sklearn PowerTransformer
‘QuantileTransformer’ - Sklearn
‘MaxAbsScaler’ - Sklearn
‘StandardScaler’ - Sklearn
‘RobustScaler’ - Sklearn
‘PCA’, ‘FastICA’ - performs sklearn decomposition and returns n-cols worth of n_components
‘Detrend’ - fit then remove a linear regression from the data
‘RollingMeanTransformer’ - 10 period rolling average, can receive a custom window by transformation_param if used as second_transformation
‘FixedRollingMean’ - same as RollingMean, but with inverse_transform disabled, so smoothed forecasts are maintained
‘RollingMean10’ - 10 period rolling average (smoothing)
‘RollingMean100thN’ - rolling mean of periods of len(train)/100 (minimum 2)
‘DifferencedTransformer’ - makes each value the difference of that value and the previous value
‘PctChangeTransformer’ - converts to pct_change, not recommended if there are lots of zeroes in the data
‘SinTrend’ - removes a sin trend (fitted to each column) from the data
‘CumSumTransformer’ - makes each value the sum of all previous values
‘PositiveShift’ - makes all values >= 1
‘Log’ - log transform (uses PositiveShift first as necessary)
‘IntermittentOccurrence’ - -1, 1 for non-median values
‘SeasonalDifference’ - remove the last lag values from all values
‘SeasonalDifferenceMean’ - remove the average lag values from all values
‘SeasonalDifference7’, ‘12’, ‘28’ - non-parameterized versions of SeasonalDifference
‘CenterLastValue’ - center data around the tail of the dataset
‘Round’ - round values on inverse or transform
‘Slice’ - use only recent records
‘ClipOutliers’ - simple outlier removal
‘Discretize’ - bin or round data into groups
‘DatepartRegression’ - remove a trend trained on the datetime index
‘ScipyFilter’ - filter data from scipy (loses information but is smoother!)
‘HPFilter’ - statsmodels hp_filter
‘STLFilter’ - seasonal decompose and keep just one part of the decomposition
‘EWMAFilter’ - use an exponentially weighted moving average to smooth data
‘MeanDifference’ - joint version of differencing
‘Cointegration’ - VECM but just the vectors
‘BTCD’ - Box Tiao decomposition
‘AlignLastValue’ - align forecast start to end of training data
‘AnomalyRemoval’ - more tailored anomaly removal options
‘HolidayTransformer’ - detects holidays and wishes good cheer to all
‘LocalLinearTrend’ - rolling local trend, using tails for future and past trend
‘KalmanSmoothing’ - smooth using a state space model
‘RegressionFilter’ - fit seasonal removal and local linear trend, clip std devs away from this fit
‘LevelShiftTransformer’ - automatically compensate for historic level shifts in data
‘CenterSplit’ - Croston-inspired magnitude/occurrence split for intermittent data
‘FFTFilter’ - filter using a fast Fourier transform
‘FFTDecomposition’ - remove FFT harmonics, later add back
‘ReplaceConstant’ - replace a value with NaN, optionally fillna, then later reintroduce
‘AlignLastDiff’ - shift forecast to be within range of historical diffs
‘DiffSmoother’ - smooth diffs then return to original space
‘HistoricValues’ - match predictions to most similar historic value and overwrite
‘BKBandpassFilter’ - another version of the Baxter King bandpass filter
‘Constraint’ - apply constraints (caps) on values
‘FIRFilter’ - apply a FIR filter (firwin)
‘ShiftFirstValue’ - similar to PositiveShift but uses the first values as the basis of zero
‘ThetaTransformer’ - decomposes into theta lines, then recombines
‘ChangepointDetrend’ - detrend but with changepoints, and seasonality thrown in for fun
‘MeanPercentSplitter’ - split data into rolling mean and percent of rolling mean
transformation_params (dict) – params of transformers {0: {}, 1: {‘model’: ‘Poisson’}, …}; pass a dictionary of empty dictionaries to use defaults
random_seed (int) – random state passed through where applicable
forecast_length (int) – length of forecast, not needed as argument for most transformers/params
- fill_na(df, window: int = 10)¶
- Parameters:
df (pandas.DataFrame) – Datetime Indexed
window (int) – passed through to rolling mean fill technique
- Returns:
pandas.DataFrame
- fit(df)¶
Apply transformations and return transformer object.
- Parameters:
df (pandas.DataFrame) – Datetime Indexed
- fit_transform(df)¶
Directly fit and apply transformations to convert df.
- static get_new_params(method='fast')¶
- inverse_transform(df, trans_method: str = 'forecast', fillzero: bool = False, bounds: bool = False)¶
Undo the madness.
- Parameters:
df (pandas.DataFrame) – Datetime Indexed
trans_method (str) – ‘forecast’ or ‘original’ passed through
fillzero (bool) – if inverse returns NaN, fill with zero
bounds (bool) – currently ignores AlignLastValue transform if True (also used in process_components of Cassandra)
- classmethod retrieve_transformer(transformation: str | None = None, param: dict = {}, df=None, random_seed: int = 2020, n_jobs: int = 1, holiday_country: list | None = None, forecast_length: int = 30)¶
Retrieves a specific transformer object from a string.
- Parameters:
df (pandas.DataFrame) – Datetime Indexed - required to set params for some transformers
transformation (str) – name of desired method
param (dict) – dict of kwargs to pass (legacy: an actual param)
- Returns:
transformer object
- transform(df)¶
Apply transformations to convert df.
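GeneralTransformer applies the numbered entries of transformations in key order; to undo them, the inverses presumably have to run in the opposite order. The toy sketch below shows that ordering with two hypothetical reversible steps (shift and scale) standing in for real transformers; it is not the AutoTS implementation.

```python
def make_shift(amount):
    """Hypothetical step: add a constant; the inverse subtracts it."""
    return (lambda xs: [x + amount for x in xs], lambda xs: [x - amount for x in xs])


def make_scale(factor):
    """Hypothetical step: multiply by a constant; the inverse divides by it."""
    return (lambda xs: [x * factor for x in xs], lambda xs: [x / factor for x in xs])


def apply_chain(steps, values):
    """Apply each (forward, inverse) step in ascending key order, like {0: ..., 1: ...}."""
    for key in sorted(steps):
        values = steps[key][0](values)
    return values


def invert_chain(steps, values):
    """Undo the steps in descending key order, so the last step applied is undone first."""
    for key in sorted(steps, reverse=True):
        values = steps[key][1](values)
    return values


steps = {0: make_shift(1.0), 1: make_scale(2.0)}
transformed = apply_chain(steps, [1.0, 2.0])  # [4.0, 6.0]
restored = invert_chain(steps, transformed)   # [1.0, 2.0]
```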
- class autots.tools.transform.HPFilter(part: str = 'trend', lamb: float = 1600, **kwargs)¶
Bases:
EmptyTransformer
Irreversible filter.
- Parameters:
lamb (float) – lambda (smoothing parameter) for hpfilter
- fit_transform(df)¶
Fit and return filtered DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- transform(df)¶
Return filtered data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.HistoricValues(window: int | None = None, **kwargs)¶
Bases:
EmptyTransformer
Overwrite (align) all forecast values with the nearest actual value in window (tail) of history. (affected by upstream transformers, as usual)
- Parameters:
window (int or None) – number of most recent historical records to use for alignment
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
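A minimal sketch of the HistoricValues idea on plain lists: each forecast value is replaced by the closest value found in the last `window` historical observations, using all history when window is None (an assumption). The function name is hypothetical.

```python
def align_to_history(forecast, history, window=None):
    """Overwrite each forecast value with the nearest value in the tail of history."""
    pool = history if window is None else history[-window:]
    # min() with an absolute-distance key picks the closest historic value (first one on ties)
    return [min(pool, key=lambda h: abs(h - f)) for f in forecast]


print(align_to_history([2.4, 9.0], [1, 2, 3, 10]))            # [2, 10]
print(align_to_history([2.4, 9.0], [1, 2, 3, 10], window=2))  # [3, 10]
```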
- class autots.tools.transform.HolidayTransformer(anomaly_detector_params={}, threshold=0.8, min_occurrences=2, splash_threshold=0.65, use_dayofmonth_holidays=True, use_wkdom_holidays=True, use_wkdeom_holidays=True, use_lunar_holidays=True, use_lunar_weekday=False, use_islamic_holidays=False, use_hebrew_holidays=False, use_hindu_holidays=False, remove_excess_anomalies=True, impact=None, regression_params=None, n_jobs: int = 1, output='multivariate', verbose: int = 1)¶
Bases:
EmptyTransformer
- dates_to_holidays(dates, style='flag', holiday_impacts=False, max_features=365)¶
- Parameters:
dates (pd.DatetimeIndex) – list of dates
style (str) – option for how to return information:
‘long’ - return date, name, series for all holidays in a long style dataframe
‘impact’ - returns dates, series with values of sum of impacts (if given) or joined string of holiday names
‘flag’ - return dates, holidays flag (not 0-1 but rather the sum of input series impacted for that holiday and day)
‘prophet’ - return format required for prophet. Will need to be filtered on series for multivariate case
‘series_flag’ - dates, series 0/1 for if holiday occurred in any calendar
- fit(df)¶
Run holiday detection. Input wide-style pandas time series.
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method='random')¶
Generate new random parameters
- inverse_transform(df)¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.IntermittentOccurrence(center: str = 'median', **kwargs)¶
Bases:
EmptyTransformer
Intermittent-demand inspired binning; predicts the probability of a value not being the center.
Does not inverse to original values!
- Parameters:
center (str) – one of “mean”, “median”, “midhinge”
- fit(df)¶
Fits shift interval.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and return transformed DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df)¶
Return data to original form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
0 if equal to the center (median by default), 1 if greater, -1 if less.
- Parameters:
df (pandas.DataFrame) – input dataframe
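A minimal sketch of the -1/0/1 encoding, using Python's statistics module for the center. The midhinge here is computed from statistics.quantiles with its default method, which may differ slightly from the quantile calculation AutoTS uses; the function name is hypothetical.

```python
import statistics


def occurrence(values, center="median"):
    """Map each value to -1, 0 or 1 relative to a center statistic of the series."""
    if center == "mean":
        c = statistics.mean(values)
    elif center == "midhinge":
        # midhinge = average of the first and third quartiles
        q1, _, q3 = statistics.quantiles(values, n=4)
        c = (q1 + q3) / 2
    else:
        c = statistics.median(values)
    return [0 if v == c else (1 if v > c else -1) for v in values]


print(occurrence([0, 0, 5, 0, 3]))  # median is 0 -> [0, 0, 1, 0, 1]
```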
- class autots.tools.transform.KalmanSmoothing(state_transition=[[1, 1], [0, 1]], process_noise=[[0.1, 0.0], [0.0, 0.01]], observation_model=[[1, 0]], observation_noise: float = 1.0, em_iter: int | None = None, on_transform: bool = True, on_inverse: bool = False, **kwargs)¶
Bases:
EmptyTransformer
Apply a Kalman Filter to smooth data given a transition matrix.
- Parameters:
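state_transition (list) – state transition matrix
process_noise (list) – process noise covariance matrix
observation_model (list) – observation matrix mapping the state to the observed value
observation_noise (float) – observation noise variance
em_iter (int or None) – if given, number of expectation-maximization iterations used to estimate the noise parameters
on_transform (bool) – perform smoothing on transform
on_inverse (bool) – perform smoothing on inverse transform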
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.LevelShiftMagic(window_size: int = 90, alpha: float = 2.5, grouping_forward_limit: int = 3, max_level_shifts: int = 20, alignment: str = 'average', old_way: bool = False, **kwargs)¶
Bases:
EmptyTransformer
Detects and corrects for level shifts. May seriously alter trend.
- Parameters:
method (str) – “clip” or “remove”
std_threshold (float) – number of std devs from mean to call an outlier
fillna (str) – fillna method to use per tools.impute.FillNA
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- autots.tools.transform.LevelShiftTransformer¶
alias of
LevelShiftMagic
- class autots.tools.transform.LocalLinearTrend(rolling_window: float = 0.1, n_future: float = 0.2, method: str = 'mean', macro_micro: bool = False, suffix: str = '_lltmicro', **kwargs)¶
Bases:
EmptyTransformer
Remove a rolling linear trend. Note this will probably perform poorly with long forecast horizons as forecast trend is simply the tail (n_future) of data’s trend.
- Parameters:
rolling_window (int) – width of window to take trend on
n_future (int) – amount of data for the trend to be used extending beyond the edges of history.
macro_micro (bool) – when True, splits the data into separate parts (trend and residual) and later combines together in inverse
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.MeanDifference(**kwargs)¶
Bases:
EmptyTransformer
Difference from lag n value, but differenced by mean of all series. inverse_transform can only be applied to the original series, or an immediately following forecast
- Parameters:
lag (int) – number of periods to shift (not implemented, default = 1)
- fit(df)¶
Fit.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- inverse_transform(df, trans_method: str = 'forecast')¶
Returns data to original or forecast form
- Parameters:
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original
- transform(df)¶
Return differenced data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.MeanPercentSplitter(window=10, forecast_length=None, **kwargs)¶
Bases:
EmptyTransformer
Splits data into rolling means and percentages. Designed to help with intermittent demand forecasting.
- Parameters:
window (int or str) – Rolling window size. If ‘forecast_length’, uses forecast_length as window.
forecast_length (int) – The forecast length to use if window == ‘forecast_length’.
- fit(df)¶
Fit the transformer to the data.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with pd.DatetimeIndex.
- fit_transform(df)¶
Fit to data, then transform it.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with pd.DatetimeIndex.
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df)¶
Inverse transform the data back to original space.
- Parameters:
df (pandas.DataFrame) – Transformed DataFrame with rolling means and percentages.
- transform(df)¶
Transform the data by splitting into rolling means and percentages.
- Parameters:
df (pandas.DataFrame) – Input DataFrame with pd.DatetimeIndex.
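The split and its inverse can be sketched on a plain list, assuming a trailing rolling window that uses partial windows at the start, and nonnegative data (so a zero mean only occurs when every value in the window is zero). Function names are hypothetical.

```python
def split_mean_percent(values, window=3):
    """Return (rolling means, value as a fraction of its rolling mean)."""
    means, pcts = [], []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]  # trailing window, partial at the start
        m = sum(chunk) / len(chunk)
        means.append(m)
        pcts.append(values[i] / m if m != 0 else 0.0)
    return means, pcts


def combine(means, pcts):
    """Inverse: multiply each rolling mean by its percentage."""
    return [m * p for m, p in zip(means, pcts)]


means, pcts = split_mean_percent([2, 0, 4, 2], window=2)
print(means)                 # [2.0, 1.0, 2.0, 3.0]
print(combine(means, pcts))  # approximately [2, 0, 4, 2]
```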
- class autots.tools.transform.PCA(**kwargs)¶
Bases:
EmptyTransformer
sklearn PCA for signal decomposition, wrapped so that the column names are stored.
- Parameters:
span (int) – span of exponential period to convert to alpha
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.PctChangeTransformer(**kwargs)¶
Bases:
EmptyTransformer
% Change of Data.
Warning
Because % change doesn’t play well with zeroes, zeroes are replaced by the positive of the lowest non-zero value. Inverse-transformed values will also not be exactly equal to the originals due to floating-point imprecision. inverse_transform can only be applied to the original series, or an immediately following forecast.
- fit(df)¶
Fits.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and Return Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- inverse_transform(df, trans_method: str = 'forecast')¶
Returns data to original or forecast form
- Parameters:
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to inverse on original data, or on a following sequence - ‘original’ return original data to original numbers - ‘forecast’ inverse the transform on a dataset immediately following the original
- transform(df)¶
Returns changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
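The % change round trip can be sketched as follows, assuming a series with no zeroes (the zero replacement from the warning is not shown). The recovered forecast levels are close to, but not always bit-for-bit equal to, the originals, which is the floating-point caveat above. Function names are hypothetical.

```python
import math


def pct_change(values):
    """(x[t] - x[t-1]) / x[t-1] for t >= 1; the first value has no predecessor."""
    return [(values[t] - values[t - 1]) / values[t - 1] for t in range(1, len(values))]


def inverse_pct_forecast(pcts, last_value):
    """Compound percentage changes forward from the last actual value."""
    out, level = [], last_value
    for p in pcts:
        level = level * (1 + p)
        out.append(level)
    return out


levels = inverse_pct_forecast([0.1, -0.1], 100)
print(levels)  # close to [110, 99], possibly with tiny floating-point error
print(all(math.isclose(a, b) for a, b in zip(levels, [110, 99])))  # True
```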
- class autots.tools.transform.PositiveShift(log: bool = False, center_one: bool = True, squared=False, **kwargs)¶
Bases:
EmptyTransformer
Shift each series if necessary to ensure all values >= 1.
- Parameters:
log (bool) – whether to include a log transform.
center_one (bool) – whether to shift to 1 instead of 0.
squared (bool) – whether to square (**2) values after shift.
- fit(df)¶
Fits shift interval.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and return shifted DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- inverse_transform(df)¶
Return data to original form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return shifted data.
- Parameters:
df (pandas.DataFrame) – input dataframe
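A minimal sketch of the shift and its inverse on a plain list, assuming the shift is chosen so the series minimum lands exactly on 1 (center_one=True) or 0, and that no shift is applied when the minimum is already high enough. The squared option is omitted, and the function names are hypothetical.

```python
import math


def fit_shift(values, center_one=True):
    """Amount to add so the minimum reaches the target (1 or 0); 0 if already there."""
    target = 1 if center_one else 0
    low = min(values)
    return target - low if low < target else 0


def positive_shift(values, shift, log=False):
    shifted = [v + shift for v in values]
    # log needs strictly positive inputs, so pair it with center_one=True
    return [math.log(v) for v in shifted] if log else shifted


def inverse_positive_shift(values, shift, log=False):
    restored = [math.exp(v) for v in values] if log else list(values)
    return [v - shift for v in restored]


data = [-3, 0, 2]
shift = fit_shift(data)                   # 4
print(positive_shift(data, shift))        # [1, 4, 6]
print(inverse_positive_shift(positive_shift(data, shift), shift))  # [-3, 0, 2]
```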
- autots.tools.transform.RandomTransform(transformer_list: dict = {'AlignLastDiff': 0.01, 'AlignLastValue': 0.2, 'AnomalyRemoval': 0.03, 'BKBandpassFilter': 0.01, 'BTCD': 0.01, 'CenterLastValue': 0.01, 'CenterSplit': 0.01, 'ChangepointDetrend': 0.01, 'ClipOutliers': 0.03, 'Cointegration': 0.01, 'Constraint': 0.01, 'CumSumTransformer': 0.02, 'DatepartRegression': 0.01, 'Detrend': 0.02, 'DiffSmoother': 0.005, 'DifferencedTransformer': 0.05, 'Discretize': 0.01, 'EWMAFilter': 0.02, 'FFTDecomposition': 0.01, 'FFTFilter': 0.01, 'FIRFilter': 0.01, 'FastICA': 0.01, 'HPFilter': 0.01, 'HistoricValues': 0.01, 'HolidayTransformer': 0.01, 'IntermittentOccurrence': 0.01, 'KalmanSmoothing': 0.02, 'LevelShiftTransformer': 0.03, 'LocalLinearTrend': 0.01, 'Log': 0.01, 'MaxAbsScaler': 0.03, 'MeanDifference': 0.002, 'MeanPercentSplitter': 0.01, 'MinMaxScaler': 0.03, 'PCA': 0.01, 'PctChangeTransformer': 0.01, 'PositiveShift': 0.02, 'PowerTransformer': 0.01, 'QuantileTransformer': 0.03, 'RegressionFilter': 0.01, 'ReplaceConstant': 0.02, 'RobustScaler': 0.03, 'RollingMean100thN': 0.01, 'RollingMeanTransformer': 0.02, 'Round': 0.02, 'STLFilter': 0.01, 'ScipyFilter': 0.02, 'SeasonalDifference': 0.06, 'ShiftFirstValue': 0.01, 'SinTrend': 0.01, 'Slice': 0.02, 'StandardScaler': 0.04, 'ThetaTransformer': 0.01, 'bkfilter': 0.05, 'cffilter': 0.01, 'convolution_filter': 0.001, None: 0.0}, transformer_max_depth: int = 4, na_prob_dict: dict = {'DatepartRegressionImputer': 0.01, 'IterativeImputer': 0.025, 'IterativeImputerExtraTrees': 0.0001, 'KNNImputer': 0.02, 'SeasonalityMotifImputer': 0.005, 'SeasonalityMotifImputer1K': 0.005, 'SeasonalityMotifImputerLinMix': 0.005, 'fake_date': 0.1, 'ffill': 0.4, 'ffill_mean_biased': 0.1, 'interpolate': 0.4, 'mean': 0.06, 'median': 0.03, 'rolling_mean': 0.1, 'rolling_mean_24': 0.1, 'zero': 0.05, None: 0.001}, fast_params: bool | None = None, superfast_params: bool | None = None, traditional_order: bool = False, transformer_min_depth: int = 1, allow_none: bool = True, no_nan_fill: bool = False)¶
Return a dict of randomly chosen transformation selections.
BTCD is used as a signal that slow parameters are allowed.
- class autots.tools.transform.RegressionFilter(name: str = 'RegressionFilter', sigma: float = 2.0, rolling_window: int = 90, run_order: str = 'season_first', regression_params: dict | None = None, holiday_params: dict | None = None, holiday_country: str = 'US', trend_method: str = 'local_linear', **kwargs)¶
Bases:
EmptyTransformer
Models seasonality and a local linear trend, and clips values more than sigma standard deviations away from this fit.
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
No changes made.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.ReplaceConstant(constant: float = 0, fillna: str = 'linear', reintroduction_model: str | None = None, n_jobs: int = 1, **kwargs)¶
Bases:
EmptyTransformer
Replace a constant with NaN, fill the NaN, then possibly reintroduce the constant. If reintroduction is used, inverse_transform is unlikely to match the original exactly.
- Parameters:
constant (float) – target to replace
fillna (str) – None, and standard fillna methods of AutoTS
reintroduction_model (dict) – if given, attempts to predict occurrence of constant and reintroduce
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
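A sketch of the replace-then-fill step, assuming the default fillna='linear' behaves like linear interpolation between neighboring observations, with the nearest observation copied into leading or trailing gaps (an assumption). Reintroduction is not shown, and the function names are hypothetical.

```python
def replace_constant(values, constant=0.0):
    """Mark every occurrence of `constant` as missing (None)."""
    return [None if v == constant else v for v in values]


def linear_fill(values):
    """Linearly interpolate None gaps; edge gaps copy the nearest known value."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        return out
    for i, v in enumerate(out):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = out[right]
        elif right is None:
            out[i] = out[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = out[left] + frac * (out[right] - out[left])
    return out


print(linear_fill(replace_constant([1, 0, 0, 4])))  # [1, 2.0, 3.0, 4]
```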
- class autots.tools.transform.RollingMeanTransformer(window: int = 10, fixed: bool = False, macro_micro: bool = False, suffix: str = '_lltmicro', center: bool = False, **kwargs)¶
Bases:
EmptyTransformer
Attempt at a rolling mean with built-in inverse_transform for time series. inverse_transform can only be applied to the original series, or an immediately following forecast. Does not play well with data containing NaNs. Inverse-transformed values will also not be exactly equal to the originals due to floating-point imprecision.
- Parameters:
window (int) – number of periods to take mean over
fixed (bool) – if True, don’t inverse to volatile state
macro_micro (bool) – if True, split on rolling trend vs remainder and later recombine. Overrides fixed arg.
- fit(df)¶
Fits.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fits and Returns Magical DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Returns data to original or forecast form
- Parameters:
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to invert on the original data or on a following sequence: ‘original’ returns the original data to its original values; ‘forecast’ inverts the transform on a dataset immediately following the original
- transform(df)¶
Returns rolling data.
- Parameters:
df (pandas.DataFrame) – input dataframe
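Why inversion only works on the original series or a directly following one can be seen in a small numpy sketch. It assumes a trailing mean over complete windows and that the first window - 1 original values are kept; rolling_mean and invert_rolling_mean are hypothetical helpers, not the class internals. Each rebuilt value is a difference of accumulated floats, hence "not exactly equal", and a single NaN propagates through every later step.

```python
import numpy as np

def rolling_mean(x, window):
    """Hypothetical helper: trailing mean over complete windows, len(x) - window + 1 values."""
    c = np.cumsum(np.insert(np.asarray(x, dtype=float), 0, 0.0))
    return (c[window:] - c[:-window]) / window

def invert_rolling_mean(means, head, window):
    """Hypothetical helper: rebuild a series from its rolling means and its first window - 1 values."""
    x = [float(v) for v in head]
    for m in means:
        # m * window is the sum of the newest `window` values; subtract the known ones
        known = sum(x[len(x) - (window - 1):]) if window > 1 else 0.0
        x.append(m * window - known)
    return np.array(x)

series = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0])
means = rolling_mean(series, 3)
rebuilt = invert_rolling_mean(means, series[:2], 3)
print(np.allclose(rebuilt, series))  # True
```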
- class autots.tools.transform.Round(decimals: int = 0, on_transform: bool = False, on_inverse: bool = True, force_int: bool = False, **kwargs)¶
Bases:
EmptyTransformer
Round all values. Convert to integers if decimals <= 0.
Inverse_transform will not undo the transformation!
- Parameters:
method (str) – only “middle”, in future potentially up/ceiling floor/down
decimals (int) – number of decimal places to round to.
on_transform (bool) – perform rounding on transformation
on_inverse (bool) – perform rounding on inverse transform
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and return the transformed DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.STLFilter(decomp_type='STL', part: str = 'trend', seasonal: int = 7, **kwargs)¶
Bases:
EmptyTransformer
Irreversible filters.
- Parameters:
decomp_type (str) – which decomposition to use
part (str) – which part of decomposition to return
seasonal (int) – seasonal parameter of STL
- fit_transform(df)¶
Fit and Return Detrended DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- transform(df)¶
Return detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.ScipyFilter(method: str = 'hilbert', method_args: list | None = None, **kwargs)¶
Bases:
EmptyTransformer
Irreversible filters from SciPy.
- Parameters:
method (str) – “hilbert”, “wiener”, “savgol_filter”, “butter”, “cheby1”, “cheby2”, “ellip”, or “bessel”
method_args (list) – passed to filter as appropriate
- fit(df)¶
Fits filter.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and Return Detrended DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df)¶
Return data to original form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.SeasonalDifference(lag_1: int = 7, method: str = 'LastValue', **kwargs)¶
Bases:
EmptyTransformer
Remove seasonal component.
“Oh that’s nice - ash on my tomatoes!” - Pippin
- Parameters:
lag_1 (int) – length of seasonal period to remove.
method (str) – ‘LastValue’, ‘Mean’, ‘Median’ to construct seasonality
- fit(df)¶
Fits.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and return the transformed DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Returns data to original or forecast form
- Parameters:
df (pandas.DataFrame) – input dataframe
trans_method (str) – whether to invert on the original data or on a following sequence: ‘original’ returns the original data to its original values; ‘forecast’ inverts the transform on a dataset immediately following the original
- transform(df)¶
Return data with the seasonal component removed.
- Parameters:
df (pandas.DataFrame) – input dataframe
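A numpy sketch of removing a seasonal template and adding it back onto a forecast. It assumes ‘Mean’/‘Median’ aggregate by position within the cycle and ‘LastValue’ reuses the final lag_1 observations. The function names are hypothetical and this is not taken from the AutoTS source. Aligning cycle positions to the end of the history is what lets the same template line up with a forecast that immediately follows it.

```python
import numpy as np

def _positions(n, lag):
    # position in the cycle, aligned so the last `lag` observations map to 0..lag-1
    return (np.arange(n) - n) % lag

def seasonal_template(x, lag, method="Mean"):
    """Hypothetical helper: one seasonal cycle of length `lag`."""
    x = np.asarray(x, dtype=float)
    if method == "LastValue":
        return x[-lag:].copy()
    agg = np.median if method == "Median" else np.mean
    pos = _positions(len(x), lag)
    return np.array([agg(x[pos == p]) for p in range(lag)])

def remove_season(x, template):
    x = np.asarray(x, dtype=float)
    return x - template[_positions(len(x), len(template))]

def add_season_forecast(forecast, template):
    """Inverse on a forecast that starts right after the fitted history."""
    forecast = np.asarray(forecast, dtype=float)
    return forecast + template[np.arange(len(forecast)) % len(template)]

history = np.tile([1.0, 5.0, 3.0], 4) + 10.0
template = seasonal_template(history, 3)
print(add_season_forecast(np.zeros(4), template))  # [11. 15. 13. 11.]
```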
- class autots.tools.transform.ShiftFirstValue(rows: int = 1, **kwargs)¶
Bases:
EmptyTransformer
Shift all data relative to the first value(s) of the series.
- Parameters:
rows (int) – number of rows to average from beginning of data
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and return the transformed DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
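Because this transform only subtracts a per-series constant, it is exactly invertible. The numpy sketch below assumes the anchor is the column mean of the first rows rows, as the parameter description suggests. The helper name shift_first_value is hypothetical.

```python
import numpy as np

def shift_first_value(arr, rows=1):
    """Hypothetical helper: subtract the mean of the first `rows` rows of each column."""
    arr = np.asarray(arr, dtype=float)
    anchor = np.nanmean(arr[:rows], axis=0)  # per-series starting level
    return arr - anchor, anchor

data = np.array([[10.0, 100.0], [12.0, 90.0], [15.0, 95.0]])
shifted, anchor = shift_first_value(data, rows=1)
restored = shifted + anchor  # the inverse adds the stored anchor back
```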
- class autots.tools.transform.SinTrend(n_jobs=1, method='lm', **kwargs)¶
Bases:
EmptyTransformer
Detrend by modelling a sine wave.
- fit(df)¶
Fits trend for later detrending.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static fit_sin(tt, yy, method='lm')¶
Fit sin to the input time sequence, and return fitting parameters “amp”, “omega”, “phase”, “offset”, “freq”, “period” and “fitfunc”
from user unsym @ https://stackoverflow.com/questions/16716302/how-do-i-fit-a-sine-curve-to-my-data-with-pylab-and-numpy
- fit_transform(df)¶
Fits and Returns Detrended DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df)¶
Returns data to original form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Returns detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
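The Stack Overflow answer cited for fit_sin guesses the frequency from an FFT and then refines all parameters with scipy.optimize.curve_fit. As a dependency-light sketch, the version below keeps the FFT guess but swaps curve_fit for ordinary linear least squares on sine, cosine and offset at that fixed frequency. This is a different (linear) fitting technique that returns the same keys. It assumes evenly spaced tt, and the name fit_sin_lstsq is hypothetical.

```python
import numpy as np

def fit_sin_lstsq(tt, yy):
    """Hypothetical helper: FFT frequency guess + linear least squares, evenly spaced tt."""
    tt = np.asarray(tt, dtype=float)
    yy = np.asarray(yy, dtype=float)
    # frequency of the strongest non-zero FFT bin
    ff = np.fft.fftfreq(len(tt), tt[1] - tt[0])
    power = np.abs(np.fft.fft(yy - yy.mean()))
    freq = abs(ff[np.argmax(power[1:]) + 1])
    omega = 2.0 * np.pi * freq
    # a*sin(wt) + b*cos(wt) + c is linear in (a, b, c) once omega is fixed
    design = np.column_stack([np.sin(omega * tt), np.cos(omega * tt), np.ones_like(tt)])
    (a, b, offset), *_ = np.linalg.lstsq(design, yy, rcond=None)
    amp, phase = np.hypot(a, b), np.arctan2(b, a)  # a*sin + b*cos = amp*sin(wt + phase)

    def fitfunc(t):
        return amp * np.sin(omega * np.asarray(t, dtype=float) + phase) + offset

    return {"amp": amp, "omega": omega, "phase": phase, "offset": offset,
            "freq": freq, "period": 1.0 / freq, "fitfunc": fitfunc}

tt = np.arange(100.0)
yy = 2.0 * np.sin(2.0 * np.pi * 0.05 * tt + 0.3) + 1.0
params = fit_sin_lstsq(tt, yy)
```

Unlike curve_fit, the frequency is never refined here, so it is only as precise as the FFT bin spacing.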
- class autots.tools.transform.Slice(method: str = '100', forecast_length: int = 30, **kwargs)¶
Bases:
EmptyTransformer
Take the .tail() of the data, returning only the most recent values.
Inverse_transform will not undo the transformation!
- Parameters:
method (str) – how much of the tail to keep, using the method options of simple_context_slicer (default ‘100’)
forecast_length (int) – forecast horizon, scales some slice windows
- fit(df)¶
Learn behavior of data to change.
- Parameters:
df (pandas.DataFrame) – input dataframe
- fit_transform(df)¶
Fit and return the transformed DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- static get_new_params(method: str = 'random')¶
Generate new random parameters
- inverse_transform(df, trans_method: str = 'forecast')¶
Return data to original or forecast form.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return changed data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.StandardScaler¶
Bases:
object
- fit(df: DataFrame)¶
Compute the mean and standard deviation for each feature.
- fit_transform(df: DataFrame) → DataFrame¶
Fit the scaler and transform the dataset.
- inverse_transform(df: DataFrame) → DataFrame¶
Revert the scaled data back to the original scale.
- transform(df: DataFrame) → DataFrame¶
Scale the dataset using the stored mean and standard deviation.
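The four methods above describe ordinary column-wise standardization. The numpy sketch uses a hypothetical class ColumnStandardizer. Two choices are assumptions, not stated in the docs: the population standard deviation (ddof=0), and replacing a zero std with 1 so constant columns map to 0 instead of dividing by zero.

```python
import numpy as np

class ColumnStandardizer:
    """Hypothetical stand-in illustrating column-wise standardization."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.mean_ = np.nanmean(x, axis=0)
        std = np.nanstd(x, axis=0)  # population std (ddof=0) is an assumption here
        self.std_ = np.where(std == 0, 1.0, std)  # guard constant columns (assumption)
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.mean_) / self.std_

    def fit_transform(self, x):
        return self.fit(x).transform(x)

    def inverse_transform(self, z):
        return np.asarray(z, dtype=float) * self.std_ + self.mean_

x = np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
scaler = ColumnStandardizer()
z = scaler.fit_transform(x)
```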
- class autots.tools.transform.StatsmodelsFilter(method: str = 'bkfilter', **kwargs)¶
Bases:
EmptyTransformer
Irreversible filters.
- Parameters:
method (str) – bkfilter or cffilter or convolution_filter
- bkfilter(df)¶
- cffilter(df)¶
- convolution_filter(df)¶
- fit_transform(df)¶
Fit and Return Detrended DataFrame.
- Parameters:
df (pandas.DataFrame) – input dataframe
- transform(df)¶
Return detrended data.
- Parameters:
df (pandas.DataFrame) – input dataframe
- class autots.tools.transform.ThetaTransformer(theta_values=[0, 2], regularization=0.001, verbose=0)¶
Bases:
object
- fit(df)¶
Fit the transformer to the data.
- Parameters:
df (pd.DataFrame) – DataFrame with DatetimeIndex and columns representing time series.
- Returns:
self
- fit_transform(df)¶
Fit the transformer to the data and then transform it.
- Parameters:
df (pd.DataFrame) – DataFrame to fit and transform.
- Returns:
Transformed DataFrame containing theta lines.
- Return type:
pd.DataFrame
- static get_new_params(method: str = 'random')¶
- inverse_transform(df, trans_method: str = 'forecast')¶
Reconstruct the original data from theta lines.
- Parameters:
df (pd.DataFrame) – DataFrame with theta lines.
- Returns:
Reconstructed DataFrame in the original feature space.
- Return type:
pd.DataFrame
- transform(df)¶
Transform the data into theta lines.
- Parameters:
df (pd.DataFrame) – DataFrame with same index and columns as fitted.
- Returns:
Transformed DataFrame containing theta lines.
- Return type:
pd.DataFrame
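In the classical Theta method, a theta line is Z_θ = θ·y + (1 − θ)·L, where L is a least-squares linear trend. θ = 0 gives the trend line itself and θ = 2 doubles the local curvature. Suppose ThetaTransformer follows that definition (its regularization argument is not reproduced here). Then the default theta_values=[0, 2] average back to the original series, because the mean of θ is 1. A numpy sketch for one series, with hypothetical helper names:

```python
import numpy as np

def theta_lines(y, theta_values=(0.0, 2.0)):
    """Hypothetical helper: theta lines Z = theta * y + (1 - theta) * L for a 1-D series."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    design = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    trend = design @ coef  # least-squares linear trend L
    lines = np.column_stack([th * y + (1.0 - th) * trend for th in theta_values])
    return lines, trend

def from_theta_lines(lines, trend, theta_values=(0.0, 2.0)):
    """Invert via the line average: mean(Z) = mean(theta) * y + (1 - mean(theta)) * L."""
    mean_theta = float(np.mean(theta_values))  # must be non-zero
    return (lines.mean(axis=1) - (1.0 - mean_theta) * trend) / mean_theta

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
lines, trend = theta_lines(y)
```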
- autots.tools.transform.bkfilter_st(x, low=6, high=32, K=12, lanczos_factor=False)¶
This code is mostly from statsmodels’ bkfilter function.
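For reference, here are the Baxter-King weights as computed by statsmodels’ bkfilter. They form an ideal band-pass between periods low and high, truncated at K lags on each side and shifted so they sum to zero. The numpy-only sketch leaves out the lanczos_factor option of bkfilter_st, whose effect is not described here, and the names bk_weights and bk_filter_1d are hypothetical.

```python
import numpy as np

def bk_weights(low=6, high=32, K=12):
    """Hypothetical helper: symmetric Baxter-King weights b_-K..b_K that sum to zero."""
    omega_1 = 2.0 * np.pi / high  # lower cutoff frequency (long period)
    omega_2 = 2.0 * np.pi / low   # upper cutoff frequency (short period)
    j = np.arange(1, K + 1)
    side = (np.sin(omega_2 * j) - np.sin(omega_1 * j)) / (np.pi * j)
    weights = np.concatenate([side[::-1], [(omega_2 - omega_1) / np.pi], side])
    # zero sum removes the level; with symmetric weights a linear trend is removed too
    return weights - weights.mean()

def bk_filter_1d(x, low=6, high=32, K=12):
    """Cyclical component of a 1-D series; K observations are lost at each end."""
    return np.convolve(np.asarray(x, dtype=float), bk_weights(low, high, K), mode="valid")

cycle = bk_filter_1d(3.0 + 0.5 * np.arange(60.0))  # a pure linear trend filters to ~0
```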
- autots.tools.transform.clip_outliers(df, std_threshold: float = 4)¶
Replace outliers beyond the threshold with the threshold value itself (clipping). Computed along axis 0, i.e. per column.
- Parameters:
df (pandas.DataFrame) – DataFrame containing numeric data
std_threshold (float) – The number of standard deviations away from mean to count as outlier.
- autots.tools.transform.exponential_decay(n, span=None, halflife=None)¶
- autots.tools.transform.get_transformer_params(transformer: str = 'EmptyTransformer', method: str | None = None)¶
Retrieve new random params for new Transformers.
- autots.tools.transform.random_cleaners()¶
Returns transformation params that clean data without shifting.
- autots.tools.transform.remove_outliers(df, std_threshold: float = 3)¶
Replace outliers with np.nan. https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-data-frame
- Parameters:
df (pandas.DataFrame) – DataFrame containing numeric data, DatetimeIndex
std_threshold (float) – The number of standard deviations away from mean to count as outlier.
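clip_outliers and remove_outliers both reduce to a per-column z-score rule. clip_outliers pulls extreme values back to mean ± std_threshold·std, while remove_outliers blanks them to NaN. The numpy sketch assumes column-wise statistics with the sample standard deviation (ddof=1, the pandas default), and the function names are hypothetical. The statistics include the outliers themselves, so one extreme point inflates the std. In very short series no point can exceed a high threshold at all.

```python
import numpy as np

def clip_by_std(arr, std_threshold=4.0):
    """Hypothetical helper: clip each column to mean +/- std_threshold * std."""
    arr = np.asarray(arr, dtype=float)
    mean = np.nanmean(arr, axis=0)
    std = np.nanstd(arr, axis=0, ddof=1)  # ddof=1 mirrors pandas' default .std()
    return np.clip(arr, mean - std_threshold * std, mean + std_threshold * std)

def nan_outliers(arr, std_threshold=3.0):
    """Hypothetical helper: set values beyond std_threshold stds from the column mean to NaN."""
    arr = np.asarray(arr, dtype=float).copy()
    mean = np.nanmean(arr, axis=0)
    std = np.nanstd(arr, axis=0, ddof=1)
    arr[np.abs(arr - mean) > std_threshold * std] = np.nan
    return arr

x = np.r_[np.tile([1.0, -1.0], 25), 100.0]  # one spike at the end
clipped = clip_by_std(x, 4.0)
blanked = nan_outliers(x, 3.0)
```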
- autots.tools.transform.simple_context_slicer(df, method: str = 'None', forecast_length: int = 30)¶
Condensed version of context_slicer with more limited options.
- Parameters:
df (pandas.DataFrame) – training data frame to slice
method (str) – option to slice the dataframe:
‘None’ - return the unaltered dataframe
‘HalfMax’ - return half of the dataframe
‘ForecastLength’ - return a dataframe equal to the length of the forecast
‘2ForecastLength’ - return a dataframe equal to twice the length of the forecast (also takes 4, 6, 8, 10 in addition to 2)
‘n’ - any integer length to slice by
‘-n’ - the full length less this amount
‘0.n’ - this percent of the full data
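The slicing options can be sketched in numpy as follows. Treat it as an illustration of the documented semantics, with assumptions where the text is silent: slices keep the most recent rows, ‘HalfMax’ rounds down, and ‘-n’ drops the oldest n rows. context_slice and tail_rows are hypothetical names.

```python
import numpy as np

def tail_rows(arr, k):
    """Last k rows; avoids the arr[-0:] trap, which would return everything."""
    k = max(int(k), 0)
    return arr[max(len(arr) - k, 0):]

def context_slice(arr, method="None", forecast_length=30):
    """Hypothetical sketch of the documented method options."""
    method = str(method)
    n = len(arr)
    if method == "None":
        return arr
    if method == "HalfMax":
        return tail_rows(arr, n // 2)  # rounding down is an assumption
    if method.endswith("ForecastLength"):
        mult = method[: -len("ForecastLength")]  # "" or "2", "4", ...
        return tail_rows(arr, forecast_length * (int(mult) if mult else 1))
    value = float(method)
    if 0 < value < 1:
        return tail_rows(arr, int(n * value))  # fraction of the full length
    if value < 0:
        return tail_rows(arr, n + int(value))  # full length less |value|
    return tail_rows(arr, int(value))

recent = context_slice(np.arange(100), "2ForecastLength", forecast_length=10)  # last 20 rows
```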
- autots.tools.transform.transformer_list_to_dict(transformer_list)¶
Convert various possibilities to dict.
autots.tools.wavelet module¶
- autots.tools.wavelet.continuous_db2_wavelet(t, p, order, sigma)¶
- autots.tools.wavelet.create_daubechies_db2_wavelet(p)¶
Create a Daubechies db2 wavelet with specified period p.
- Parameters:
p (int) – The period or length to generate the wavelet.
- Returns:
np.ndarray – The generated Daubechies db2 wavelet.
- autots.tools.wavelet.create_gaussian_wavelet(p, frequency=3, sigma=1.0)¶
Create a Gaussian-modulated cosine wavelet with specified frequency and sigma.
- Parameters:
p (float) – The period or length to generate the wavelet.
frequency (int) – Frequency of the cosine wave.
sigma (float) – Standard deviation for the Gaussian envelope.
- Returns:
np.ndarray – The generated Gaussian-modulated wavelet.
- autots.tools.wavelet.create_haar_wavelet(p)¶
Create a Haar wavelet with specified period p.
- Parameters:
p (float) – The period or length to generate the wavelet.
- Returns:
np.ndarray – The generated Haar wavelet.
- autots.tools.wavelet.create_mexican_hat_wavelet(p, frequency=None, sigma=1.0)¶
Create a Mexican Hat wavelet (Ricker wavelet) with specified sigma.
- Parameters:
p (float) – The period or length to generate the wavelet.
sigma (float) – Standard deviation for the Gaussian envelope.
- Returns:
np.ndarray – The generated Mexican Hat wavelet.
- autots.tools.wavelet.create_morlet_wavelet(p, frequency=3, sigma=1.0)¶
Create a Morlet wavelet with specified frequency and sigma.
- Parameters:
p (float) – The period or length to generate the wavelet.
frequency (int) – Frequency of the cosine wave.
sigma (float) – Standard deviation for the Gaussian envelope.
- Returns:
np.ndarray – The generated complex Morlet wavelet.
- autots.tools.wavelet.create_narrowing_wavelets(p, max_order, t, sigma=0.5)¶
- autots.tools.wavelet.create_real_morlet_wavelet(p, frequency=3, sigma=1.0)¶
Create a real-valued Morlet wavelet with specified frequency and sigma.
- Parameters:
p (float) – The period or length to generate the wavelet.
frequency (int) – Frequency of the cosine wave.
sigma (float) – Standard deviation for the Gaussian envelope.
- Returns:
np.ndarray – The generated real Morlet wavelet.
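For orientation, here are the textbook shapes behind two of these functions. The Ricker (Mexican hat) wavelet is (1 − x²/σ²)·exp(−x²/(2σ²)), and the real Morlet is a cosine under a Gaussian envelope. The sketch samples each at n points and scales it to a peak of 1. The sampling grids, scalings and helper names are assumptions, not the AutoTS definitions of p, frequency and sigma.

```python
import numpy as np

def ricker(n, sigma=1.0):
    """Hypothetical helper: Mexican hat at n points centred on zero, sigma in samples."""
    x = np.arange(n) - (n - 1) / 2.0
    r = (x / sigma) ** 2
    return (1.0 - r) * np.exp(-r / 2.0)  # peak-normalized, no energy normalization

def real_morlet(n, frequency=3, sigma=1.0):
    """Hypothetical helper: Gaussian envelope times a cosine on a grid from -pi to pi."""
    x = np.linspace(-np.pi, np.pi, n)
    return np.exp(-(x ** 2) / (2.0 * sigma ** 2)) * np.cos(frequency * x)

hat = ricker(21, sigma=3.0)
morlet = real_morlet(21)
```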
- autots.tools.wavelet.create_wavelet(t, p, sigma=1.0, phase_shift=0, wavelet_type='morlet')¶
Create a real-valued wavelet based on real-world anchored time steps in t, with an additional phase shift and a choice of wavelet type.
- Parameters:
t (np.ndarray) – Array of time steps (in days) from a specified origin.
p (float) – The period of the wavelet in the same units as t (typically days).
sigma (float) – Standard deviation for the Gaussian envelope.
phase_shift (float) – Phase shift to adjust the position of the wavelet peak.
wavelet_type (str) – Type of wavelet (‘morlet’ or ‘ricker’).
- Returns:
np.ndarray – The generated wavelet values for each time step.
- autots.tools.wavelet.offset_wavelet(p, t, order=5, sigma=1.0, wavelet_type='morlet')¶
Create an offset collection of wavelets with order offsets, ensuring that peaks are spaced p/order apart.
- Parameters:
p (float) – Period of the wavelet in the same units as t (typically days).
t (np.ndarray) – Array of time steps.
order (int) – The number of offsets.
sigma (float) – Standard deviation for the Gaussian envelope.
wavelet_type (str) – Type of wavelet (‘morlet’ or ‘ricker’).
- Returns:
np.ndarray – A 2D array with order wavelets along axis 1.
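The offset idea can be sketched with periodic Ricker bumps anchored on t: shift each copy by k·p/order, so the peaks fall p/order apart. Because the features depend on t rather than on row position, the same time step always gets the same values. This is a hypothetical illustration (offset_ricker_features, ricker_at), not the AutoTS implementation, which may place peaks and scale sigma differently.

```python
import numpy as np

def ricker_at(x, sigma=1.0):
    """Ricker (Mexican hat) evaluated at offsets x from its peak."""
    r = (x / sigma) ** 2
    return (1.0 - r) * np.exp(-r / 2.0)

def offset_ricker_features(p, t, order=3, sigma=1.0):
    """Hypothetical helper: `order` periodic Ricker columns with peaks spaced p / order apart."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(order):
        shift = k * p / order
        # distance from the peak, which sits at phase p / 2 within each period
        x = (t - shift) % p - p / 2.0
        cols.append(ricker_at(x, sigma))
    return np.column_stack(cols)

features = offset_ricker_features(12, np.arange(24.0), order=3, sigma=1.5)
```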
autots.tools.window_functions module¶
- autots.tools.window_functions.chunk_reshape(arr, window_size=10, chunk_size=100, sample_fraction=None, random_seed=7734, dtype=<class 'numpy.float32'>)¶
Reshapes from (n_records, n_series) to (windows, window_size). Multivariate. More memory-efficient, if not quite as fast, than x.reshape(-1, x.shape[-1]) on a 3D numpy array.
- autots.tools.window_functions.last_window(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False)¶
Pandas based function to provide the last window of window_maker.
- autots.tools.window_functions.np_2d_arange(start=0, stop=3, step=1, num_columns=4)¶
- autots.tools.window_functions.retrieve_closest_indices(df, num_indices, forecast_length, window_size: int = 10, distance_metric: str = 'braycurtis', stride_size: int = 1, start_index: int | None = None, include_differenced: bool = False, include_last: bool = True, verbose: int = 0)¶
Find the next indices closest to the final segment of forecast_length.
- Parameters:
df (pd.DataFrame) – source data in wide format
num_indices (int) – number of indices to return
forecast_length (int) – length of forecast
window_size (int) – length of comparison
distance_metric (str) – distance measure: a scipy distance metric, or nan_euclidean
stride_size (int) – length of spacing between windows
start_index (int) – index to begin creation of windows from
include_differenced (bool) – if True, also compare on differences
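The underlying idea is an analog (nearest-neighbour) search: compare the final window with every earlier window and take what followed the closest matches. The simplified numpy sketch is univariate, uses Euclidean distance and stride 1, and skips differencing. Its name and exact return shape are assumptions and may differ from what retrieve_closest_indices returns.

```python
import numpy as np

def closest_following_indices(x, num_indices, forecast_length, window_size=10):
    """Hypothetical helper: indices of the steps that followed the windows most similar
    (Euclidean) to the final window; univariate, stride 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    target = x[n - window_size:]
    # candidate windows must leave room for forecast_length following observations
    starts = np.arange(0, n - window_size - forecast_length + 1)
    dists = np.array([np.linalg.norm(x[s:s + window_size] - target) for s in starts])
    best = starts[np.argsort(dists, kind="stable")[:num_indices]]
    return best[:, None] + window_size + np.arange(forecast_length)

series = np.tile([0.0, 1.0, 2.0, 3.0], 6)
idx = closest_following_indices(series, num_indices=2, forecast_length=2, window_size=4)
analog_forecasts = series[idx]  # one candidate forecast per matched window
```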
- autots.tools.window_functions.rolling_window_view(array, window_shape=(0,), axis=None, writeable=False)¶
Create a view of array which for every point gives the n-dimensional neighbourhood of size window. New dimensions are added at the end of array or after the corresponding original dimension.
Based on: https://gist.github.com/seberg/3866040 but designed to match the newer np.sliding_window_view
- Parameters:
array (np.array) – Array to which the rolling window is applied.
window_shape (int) – Either a single integer to create a window of only the last axis or a tuple to create it for the last len(window) axes. 0 can be used to ignore a dimension in the window.
axis (int) – If given, must have the same size as window. In this case window is interpreted as the size in the dimension given by axis, i.e. a window of (2, 1) is equivalent to window=2 and axis=-2.
- Returns:
A view on array that is shrunk to fit the windows and has the window dimensions added (0s not counting), i.e. every point of array is an array of size window.
- autots.tools.window_functions.sliding_window_view(array, window_shape=(0,), axis=None, writeable=False, **kwargs)¶
Toggles between numpy and internal version depending on np.__version__.
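On numpy 1.20 and later, np.lib.stride_tricks.sliding_window_view exists and behaves as shown. The window axis is appended after the original dimensions, which gives the (num windows, num columns/series, window_length) orientation documented for window_maker_3. The result is a read-only view that shares memory with the input.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(20, dtype=float).reshape(10, 2)  # (num_obs, num_series)
windows = sliding_window_view(arr, window_shape=4, axis=0)
print(windows.shape)  # (7, 2, 4): the window axis is appended last
print(windows.flags.writeable)  # False: a read-only view, no data copied
```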
- autots.tools.window_functions.window_id_maker(window_size: int, max_steps: int, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)¶
Create indices for array of multiple window slices of data
- Parameters:
window_size (int) – length of time history to include
max_steps (int) – the maximum number of windows to create
start_index (int) – if not starting at the first point, start at this point
stride_size (int) – number of skips between each window start point
skip_size (int) – number of skips between each obs in a window (downsamples)
- Returns:
np.array with 3D shape (num windows, window_length, num columns/series), or a 2D array if only a 1D array is provided
- autots.tools.window_functions.window_lin_reg(x, y, w)¶
- autots.tools.window_functions.window_lin_reg_mean(x, y, w)¶
- autots.tools.window_functions.window_lin_reg_mean_no_nan(x, y, w)¶
- autots.tools.window_functions.window_maker(df, window_size: int = 10, input_dim: str = 'univariate', normalize_window: bool = False, shuffle: bool = False, output_dim: str = 'forecast_length', forecast_length: int = 1, max_windows: int = 5000, regression_type: str | None = None, future_regressor=None, random_seed: int = 1234)¶
Convert a dataset into slices with history and y forecast.
- Parameters:
df (pd.DataFrame) – wide format df with sorted index
window_size (int) – length of history to use for X window
input_dim (str) – univariate or multivariate. If multivariate, all series in single X row
shuffle (bool) – (deprecated)
output_dim (str) – ‘forecast_length’ or ‘1step’ where 1 step is basically forecast_length=1
forecast_length (int) – number of periods ahead that will be forecast
max_windows (int) – a cap on the total number of windows to generate. If exceeded, a random selection of this many windows is used.
regression_type (str) – None, or “user” to try to concatenate the regressor to the windows
future_regressor (pd.DataFrame) – values of regressor if used
random_seed (int) – seed for a consistent random selection of windows
- Returns:
X, Y
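The input_dim='univariate', output_dim='forecast_length' case can be sketched minimally: every series contributes rows of window_size history (X) followed by forecast_length targets (Y). make_xy is hypothetical. Regressors and window normalization are omitted, and drawing the max_windows sample with numpy's Generator is an assumption about how the cap is applied.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_xy(arr, window_size=10, forecast_length=1, max_windows=5000, random_seed=1234):
    """Hypothetical helper: one row per (window, series) pair, split into history and target."""
    arr = np.asarray(arr, dtype=float)
    if arr.ndim == 1:
        arr = arr[:, None]
    span = window_size + forecast_length
    # (num windows, num series, span) -> (num windows * num series, span)
    rows = sliding_window_view(arr, span, axis=0).reshape(-1, span)
    if rows.shape[0] > max_windows:
        rng = np.random.default_rng(random_seed)
        rows = rows[rng.choice(rows.shape[0], size=max_windows, replace=False)]
    return rows[:, :window_size], rows[:, window_size:]

X, Y = make_xy(np.arange(10.0), window_size=3, forecast_length=2)
```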
- autots.tools.window_functions.window_maker_2(array, window_size: int, max_steps: int | None = None, start_index: int = 0, stride_size: int = 1, skip_size: int = 1)¶
Create an array of multiple window slices of data. Note that this returns a different orientation than window_maker_3.
- Parameters:
array (np.array) – source of historic information of shape (num_obs, num_series)
window_size (int) – length of time history to include
max_steps (int) – the maximum number of windows to create
start_index (int) – if not starting at the first point, start at this point
stride_size (int) – number of skips between each window start point
skip_size (int) – number of skips between each obs in a window (downsamples)
- Returns:
np.array with 3D shape (num windows, window_length, num columns/series), or a 2D array if only a 1D array is provided
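The difference between window_maker_2 and window_maker_3 is mostly orientation and memory. Gathering rows with an index array copies the data into (num windows, window_length, num columns/series). The stride-tricks view is (num windows, num columns/series, window_length) and copies nothing. The numpy sketch of the index-array approach uses a hypothetical helper and does not model skip_size. With stride_size=1, the result equals the stride-tricks view with its last two axes swapped.

```python
import numpy as np

def window_indices(n_obs, window_size, stride_size=1, start_index=0, max_steps=None):
    """Hypothetical helper: one row of observation indices per window (skip_size not modelled)."""
    starts = np.arange(start_index, n_obs - window_size + 1, stride_size)
    if max_steps is not None:
        starts = starts[:max_steps]  # cap on the number of windows
    return starts[:, None] + np.arange(window_size)[None, :]

arr = np.arange(20, dtype=float).reshape(10, 2)  # (num_obs, num_series)
idx = window_indices(len(arr), window_size=3)
windows = arr[idx]  # fancy indexing copies: (num windows, window_length, num series)
```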
- autots.tools.window_functions.window_maker_3(array, window_size: int, **kwargs)¶
Stride-tricks version of windowing. About 40% faster than window_maker_2. Note that this returns a different orientation than window_maker_2.
- Parameters:
array (np.array) – in shape of (num_obs, num_series)
window_size (int) – length of slice of history
**kwargs – passed to np.lib.stride_tricks.sliding_window_view
- Returns:
np.array with 3D shape (num windows, num columns/series, window_length), or a 2D array if only a 1D array is provided
- autots.tools.window_functions.window_sum_mean(x, w, axis=0)¶
- autots.tools.window_functions.window_sum_mean_nan_tail(x, w, axis=0)¶
- autots.tools.window_functions.window_sum_nan_mean(x, w, axis=0)¶
Module contents¶
Basic utilities.