autots package

Subpackages

Module contents

Automated Time Series Model Selection for Python

https://github.com/winedarksea/AutoTS

class autots.AnomalyDetector(output='multivariate', method='zscore', transform_dict={'transformation_params': {0: {'datepart_method': 'simple_3', 'regression_model': {'model': 'ElasticNet', 'model_params': {}}}}, 'transformations': {0: 'DatepartRegression'}}, forecast_params=None, method_params={}, eval_period=None, isolated_only=False, n_jobs=1)

Bases: object

detect(df)

Shared anomaly detection routine.

fit(df)
fit_anomaly_classifier()

Fit a model to predict if a score is an anomaly.

static get_new_params(method='random')

Generate random new parameter combinations.

Parameters:

method (str) – ‘fast’, ‘deep’, ‘default’, or any of the anomaly method names (ie ‘IQR’) to specify only that method

plot(series_name=None, title=None, marker_size=None, plot_kwargs={}, start_date=None)
remove_anomalies(df=None, fillna=None)

Detect and return a copy of the data with anomalies removed (set NaN or filled).

Parameters:
  • df (pd.DataFrame, optional) – data to run detection on. If None, uses previous detect input.

  • fillna (str, optional) – fill method passed to autots.tools.impute.FillNA.

score_to_anomaly(scores)

Convert scores to anomaly labels using a DecisionTree classifier, used because the anomaly scores are nonstandard (and nonparametric).
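
A minimal usage sketch of the typical fit/detect flow (a hedged illustration; df is assumed to be a wide-style pd.DataFrame with a DatetimeIndex, and the .anomalies/.scores attributes are referenced as in the Cassandra documentation below):

# Hypothetical AnomalyDetector workflow; method and parameters are illustrative
from autots import AnomalyDetector

detector = AnomalyDetector(output="multivariate", method="zscore")
detector.detect(df)                    # run shared anomaly detection
print(detector.anomalies.head())       # assumed attribute: anomaly flags per series
print(detector.scores.head())          # assumed attribute: anomaly scores per series
cleaned = detector.remove_anomalies(df, fillna="ffill")  # anomalies set to NaN, then filled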

class autots.AutoTS(forecast_length: int = 14, frequency: str = 'infer', prediction_interval: float = 0.9, max_generations: int = 25, no_negatives: bool = False, constraint: float | None = None, ensemble: str | None = None, initial_template: str = 'General+Random', random_seed: int = 2022, holiday_country: str = 'US', subset: int | None = None, aggfunc: str = 'first', na_tolerance: float = 1, metric_weighting: dict = {'containment_weighting': 0, 'contour_weighting': 0.01, 'imle_weighting': 0, 'made_weighting': 0.05, 'mae_weighting': 2, 'mage_weighting': 0, 'mle_weighting': 0, 'oda_weighting': 0.001, 'rmse_weighting': 2, 'runtime_weighting': 0.01, 'smape_weighting': 5, 'spl_weighting': 3, 'wasserstein_weighting': 0.01}, drop_most_recent: int = 0, drop_data_older_than_periods: int | None = None, model_list: str = 'scalable', transformer_list: dict = 'auto', transformer_max_depth: int = 6, models_mode: str = 'random', num_validations: int = 'auto', models_to_validate: float = 0.15, max_per_model_class: int | None = None, skip_slow_models_seconds: float | None = None, validation_method: str = 'backwards', min_allowed_train_percent: float = 0.5, remove_leading_zeroes: bool = False, prefill_na: str | None = None, introduce_na: bool | None = None, preclean: dict | None = None, model_interrupt: bool = 'stop', generation_timeout: int | None = None, current_model_file: str | None = None, force_gc: bool = False, horizontal_ensemble_validation: bool = True, custom_metric: Callable[[ndarray, ndarray, ndarray, float], ndarray] | None = None, verbose: int = 1, n_jobs: float = 0.5)

Bases: object

Automate time series modeling using a genetic algorithm.

Parameters:
  • forecast_length (int) – number of periods over which to evaluate forecast. Can be overridden later in .predict(). When you don’t have much historical data, using a small forecast length for .fit and the full desired forecast length for .predict is usually the best possible approach given limitations.

  • frequency (str) – ‘infer’ or a specific pandas datetime offset. Can be used to force rollup of data (ie daily input, but frequency ‘M’ will rollup to monthly).

  • prediction_interval (float) – 0-1, uncertainty range for upper and lower forecasts. Adjust range, but rarely matches actual containment.

  • max_generations (int) – number of genetic algorithm generations to run. More runs = longer runtime, generally better accuracy. It’s called max because someday there will be an auto early stopping option, but for now this is just the exact number of generations to run.

  • no_negatives (bool) – if True, all negative predictions are rounded up to 0.

  • constraint (float) –

    when not None, use this float value * the data st dev above max or below min to constrain forecast values. Now also accepts a dictionary containing the following keys/values instead (see the example dictionary after this parameter list):

    constraint_method (str): one of

      stdev_min - threshold is min and max of historic data +/- constraint * st dev of data
      stdev - threshold is the mean of historic data +/- constraint * st dev of data
      absolute - input is an array of length num series containing the threshold’s final value for each
      quantile - constraint is the quantile of historic data to use as the threshold

    constraint_regularization (float): 0 to 1

      where 0 means no constraint, 1 is a hard threshold cutoff, and in between is a penalty term

    upper_constraint (float): or array, depending on method, None if unused
    lower_constraint (float): or array, depending on method, None if unused
    bounds (bool): if True, apply to upper/lower forecast; if False, apply only to the point forecast

  • ensemble (str) – None or list or comma-separated string containing: ‘auto’, ‘simple’, ‘distance’, ‘horizontal’, ‘horizontal-min’, ‘horizontal-max’, “mosaic”, “subsample”

  • initial_template (str) – ‘Random’ - randomly generates starting template, ‘General’ uses template included in package, ‘General+Random’ - both of previous. Also can be overridden with self.import_template()

  • random_seed (int) – random seed allows (slightly) more consistent results.

  • holiday_country (str) – passed through to Holidays package for some models.

  • subset (int) – maximum number of series to evaluate at once. Useful to speed evaluation when many series are input. Takes a new subset of columns on each validation, unless mosaic ensembling, in which case columns are the same in each validation.

  • aggfunc (str) – if data is to be rolled up to a higher frequency (daily -> monthly) or duplicate timestamps are included. Default ‘first’ removes duplicates, for rollup try ‘mean’ or np.sum. Beware numeric aggregations like ‘mean’ will not work with non-numeric inputs. Numeric aggregations like ‘sum’ will also change nan values to 0

  • na_tolerance (float) – 0 to 1. Series are dropped if they have more than this percent NaN. 0.95 here would allow series containing up to 95% NaN values.

  • metric_weighting (dict) – weights to assign to metrics, affecting how the ranking score is generated.

  • drop_most_recent (int) – option to drop the n most recent data points. Useful, say, for monthly sales data where the current (unfinished) month is included. Occurs after any aggregation is applied, so the dropped period will be whatever is specified by frequency (n frequency periods are dropped).

  • drop_data_older_than_periods (int) – take only the n most recent timestamps

  • model_list (list) – str alias or list of names of model objects to use. Can now also be a dictionary of {“model”: prob}, but that only affects starting random templates; the genetic algorithm takes over from there.

  • transformer_list (list) – list of transformers to use, or dict of transformer:probability. Note this does not apply to initial templates. Can accept string aliases: “all”, “fast”, “superfast”, ‘scalable’ (scalable is a subset of fast that should have fewer memory issues at scale)

  • transformer_max_depth (int) – maximum number of sequential transformers to generate for new Random Transformers. Fewer will be faster.

  • models_mode (str) – option to adjust parameter options for newly generated models. Only sporadically utilized. Currently includes: ‘default’/’random’, ‘deep’ (searches more params, likely slower), and ‘regressor’ (forces ‘User’ regressor mode in regressor capable models), ‘gradient_boosting’, ‘neuralnets’ (~Regression class models only)

  • num_validations (int) – number of cross validations to perform. 0 for just train/test on the best split. Possible confusion: num_validations is the number of validations to perform after the first eval segment, so the total number of eval/validation segments will be this + 1. Also “auto” and “max” aliases available. Max maxes out at 50.

  • models_to_validate (int) – top n models to pass through to cross validation. Or float in 0 to 1 as % of tried. 0.99 is forced to 100% validation. 1 evaluates just 1 model. If horizontal or mosaic ensemble, then additional min per_series models above the number here are added to validation.

  • max_per_model_class (int) – of the models_to_validate what is the maximum to pass from any one model class/family.

  • skip_slow_models_seconds (float) – if not None, models that take longer than this many seconds in the initial evaluation round are skipped and not passed to validation. Does not apply to ensemble models. Defaults to None (no skipping). If too small a number is chosen, it can cause an error in validation when all models are too slow.

  • validation_method (str) – ‘even’, ‘backwards’, ‘seasonal n’ (where n is an integer), ‘similarity’, ‘mixed_length’, or ‘custom’.
    ‘backwards’ is better for recency and for shorter training sets.
    ‘even’ splits the data into equally-sized slices, best for more consistent data; a poetic but less effective strategy than others here.
    ‘seasonal n’ uses the most similar seasonal indexes; for example ‘seasonal 364’ would test all data on each previous year of the forecast_length that would immediately follow the training data.
    ‘similarity’ automatically finds the data sections most similar to the most recent data that will be used for prediction.
    ‘mixed_length’ - validation_indexes is a list of tuples (train, test). Can be different forecast lengths. Mosaic ensembles are not functional with this.
    ‘custom’ - if used, .fit() needs validation_indexes passed - a list of pd.DatetimeIndex’s, tail of each is used as test.

  • min_allowed_train_percent (float) – percent of forecast length to allow as min training, else raises error. 0.5 with a forecast length of 10 would mean 5 training points are mandated, for a total of 15 points. Useful in (unrecommended) cases where forecast_length > training length.

  • remove_leading_zeroes (bool) – replace leading zeroes with NaN. Useful in data where initial zeroes mean data collection hasn’t started yet.

  • prefill_na (str) – value to fill all NaNs with. Leaving as None and allowing model interpolation is recommended. None, 0, ‘mean’, or ‘median’. 0 may be useful in, for example, sales cases where all NaN can be assumed equal to zero.

  • introduce_na (bool) – whether to force last values in one training validation to be NaN. Helps make more robust models. defaults to None, which introduces NaN in last rows of validations if any NaN in tail of training data. Will not introduce NaN to all series if subset is used. if True, will also randomly change 20% of all rows to NaN in the validations

  • preclean (dict) – if not None, a dictionary of Transformer params to be applied to input data {“fillna”: “median”, “transformations”: {}, “transformation_params”: {}} This will change data used in model inputs for fit and predict, and for accuracy evaluation in cross validation!

  • model_interrupt (bool | str | dict) – configure how KeyboardInterrupts are handled. False keeps default Python behaviour (immediate termination). True or “skip” skips only the current model; press Ctrl+C twice within 1.5 seconds to stop the entire run. “end_generation” skips the current model and ends the rest of the active generation; a second Ctrl+C within 1.5 seconds still stops the run. “stop” or “run” ends the whole run on the first interrupt. Provide a dict such as {“mode”: “skip”, “double_press_window”: 1.2} to change the mode and double-press window.

  • generation_timeout (int) – if not None, this is the number of minutes from start at which the generational search ends, then proceeding to validation This is only checked after the end of each generation, so only offers an ‘approximate’ timeout for searching. It is an overall cap for total generation search time, not per generation.

  • current_model_file (str) – file path to write to disk of current model params (for debugging if computer crashes). .json is appended

  • force_gc (bool) – if True, run gc.collect() after each model run. Probably won’t make much difference.

  • horizontal_ensemble_validation (bool) – True is slower but more reliable model selection on unstable data, if horz. ensembles are used

  • custom_metric (Callable) – a function to generate a custom metric. Expects func(A, F, df_train, prediction_interval) where the first three are wide-style 2d np arrays.

  • verbose (int) – setting to 0 or lower should reduce most output. Higher numbers give more output.

  • n_jobs (int) – Number of cores available to pass to parallel processing. A joblib context manager can be used instead (pass None in this case). Also ‘auto’.
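
As referenced under the constraint parameter above, a sketch of the dictionary form (a hedged illustration; the key names come from the parameter description, the values are arbitrary examples, not defaults):

# Illustrative constraint dictionary; values are examples only
from autots import AutoTS

constraint = {
    "constraint_method": "stdev_min",  # one of: stdev_min, stdev, absolute, quantile
    "constraint_regularization": 0.9,  # 0 = no constraint, 1 = hard cutoff, in between = penalty
    "upper_constraint": 2.0,           # float or array depending on method, None if unused
    "lower_constraint": 2.0,
    "bounds": True,                    # also apply to the upper/lower forecast bounds
}
model = AutoTS(forecast_length=14, constraint=constraint)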

best_model

DataFrame containing template for the best ranked model

Type:

pd.DataFrame

best_model_name

model name

Type:

str

best_model_params

model params

Type:

dict

best_model_transformation_params

transformation parameters

Type:

dict

best_model_ensemble

Ensemble type int id

Type:

int

used_frequency

datetime frequency offset string

Type:

str

regression_check

If True, the best_model uses an input ‘User’ future_regressor

Type:

bool

df_wide_numeric

dataframe containing shaped final data, will include preclean

Type:

pd.DataFrame

initial_results.model_results

contains a collection of result metrics

Type:

object

score_per_series

generated score of metrics given per input series, if horizontal ensembles

Type:

pd.DataFrame

fit, predict, get_new_params
export_template, import_template, import_results, import_best_model
results, failure_rate
horizontal_to_df, mosaic_to_df
plot_horizontal, plot_horizontal_transformers, plot_generation_loss, plot_backforecast
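
A minimal end-to-end sketch of the fit/predict workflow listed above (a hedged illustration; load_daily from autots.datasets is assumed to be available as an example data loader, and the argument values are only for a quick trial run):

# Minimal fit/predict sketch; arguments are illustrative, not recommendations
from autots import AutoTS
from autots.datasets import load_daily  # assumed example data loader

df = load_daily(long=False)  # wide-style DataFrame with a DatetimeIndex

model = AutoTS(
    forecast_length=14,
    frequency="infer",
    ensemble="auto",
    model_list="superfast",  # small, fast model list alias for a quick trial
    max_generations=5,
)
model = model.fit(df)

prediction = model.predict()
forecast = prediction.forecast        # point forecasts, wide-style DataFrame
upper = prediction.upper_forecast     # upper prediction interval
lower = prediction.lower_forecast     # lower prediction interval
print(model.best_model_name, model.best_model_params)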
back_forecast(series=None, n_splits: int = 'auto', tail: int = 'auto', verbose: int = 0)

Create forecasts for the historical training data, ie. backcast or back forecast. OUT OF SAMPLE

This actually forecasts on historical data, these are not fit model values as are often returned by other packages. As such, this will be slower, but more representative of real world model performance. There may be jumps in data between chunks.

Args are the same as for model_forecast except:
n_splits (int): how many pieces to split data into. Pass 2 for fastest, or “auto” for best accuracy.
series (str): to run on only one column, pass the column name. Faster than full.
tail (int): df.tail() of the dataset; back_forecast is only run on the n most recent observations, which points at eval_periods of the lower-level back_forecast function.

Returns a standard prediction object (access .forecast, .lower_forecast, .upper_forecast)

best_model_per_series_mape()

This isn’t quite classic mape but is a percentage mean error intended for quick visuals not final statistics (see model.results()).

best_model_per_series_score()
best_model_str_val_results()

Generate a readable string of validation results for the best model.

Returns:

Formatted string containing validation results for key metrics (SMAPE, MAE, SPL)

Return type:

str

create_unpredictability_score(df_wide=None, scale=False)

Create a dataframe per validation index of relative unpredictability. Most representative on longer model searches.

diagnose_params(target='runtime', waterfall_plots=True)

Attempt to explain params causing measured outcomes using shap and linear regression coefficients.

Parameters:
  • target (str) – runtime, smape, mae, oda, or exception, the measured outcome to correlate parameters with

  • waterfall_plots (bool) – whether to show waterfall SHAP plots

diagnose_params_new(target='runtime', min_occurrences: int = 3)

Summarize which parameter choices are linked to slow runtimes or errors.

Parameters:
  • target (str) – one of runtime, smape, mae, oda, exception, or a column in the results DataFrame.

  • min_occurrences (int) – minimum number of rows a parameter/value pair must appear in to be included.

Returns:

aggregated statistics per parameter/value pair.

Return type:

pandas.DataFrame

expand_horizontal(force=False)

Enables expanding horizontal models trained on a subset to full data. Reruns template models and generates new template. Requires a horizontal model set as best model.

see best_model_original and best_model_original_id for reference back to original best model after this runs

Parameters:

force (bool) – if True, runs expansions whether subset or not. Necessary on imported template without .fit() as subset flag is set in fit

export_best_model(filename, **kwargs)

Basically the same as export_template but only ever the one best model.

export_template(filename=None, models: str = 'best', n: int = 40, max_per_model_class: int | None = None, include_results: bool = False, unpack_ensembles: bool = False, min_metrics: list = ['smape', 'spl', 'wasserstein', 'mle', 'imle', 'ewmae'], max_metrics: list | None = None, focus_models: list | None = None, include_ensemble: bool = True)

Export top results as a reusable template.

Parameters:
  • filename (str) – ‘csv’ or ‘json’ (in filename). None to return a dataframe and not write a file.

  • models (str) – ‘best’ or ‘all’, and ‘slowest’ for diagnostics

  • n (int) – if models = ‘best’, how many n-best to export

  • max_per_model_class (int) – if models = ‘best’, the max number of each model class to include in template

  • include_results (bool) – whether to include performance metrics

  • unpack_ensembles (bool) – if True, ensembles are returned only as components (will result in larger n models, as full ensemble counts as 1 model)

  • min_metrics (list) – if not None and models==’best’, include the lowest for this metric, a way to include even if not a major part of metric weighting as an addon

  • max_metrics (list) – for metrics to take the max model for

  • focus_models (list) – also pull the best score/min/max metrics as per just this model

  • include_ensemble (bool) – if False, exclude Ensembles (ignored with “all” models)
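
A hedged sketch of the export/import round trip (the filename is illustrative and model is assumed to be an already-fit AutoTS instance):

# Export the best models, then reuse them as the starting template of a new search
from autots import AutoTS

model.export_template(
    "my_template.csv",  # illustrative filename
    models="best",
    n=15,
    max_per_model_class=3,
)

new_model = AutoTS(forecast_length=14)
new_model.import_template(
    "my_template.csv",
    method="only",  # use only this template, discarding the generated initial template
    enforce_model_list=True,
)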

failure_rate(result_set: str = 'initial')

Return the fraction of models that failed with exceptions.

Parameters:

result_set (str, optional) – ‘validation’ or ‘initial’. Defaults to ‘initial’.

Returns:

float.

fit(df, date_col: str | None = None, value_col: str | None = None, id_col: str | None = None, future_regressor=None, weights: dict = {}, result_file: str | None = None, grouping_ids=None, validation_indexes: list | None = None)

Train algorithm given data supplied.

Parameters:
  • df (pandas.DataFrame) – Datetime Indexed dataframe of series, or dataframe of three columns as below.

  • date_col (str) – name of datetime column if long style data

  • value_col (str) – name of column containing the data of series if using long style data. NOT for pointing out the most important column if several, that’s weights

  • id_col (str) – name of column identifying different series if long style data.

  • future_regressor (numpy.Array) – single external regressor matching train.index

  • weights (dict) – {‘colname1’: 2, ‘colname2’: 5} - increase importance of a series in metric evaluation. Any left blank assumed to have weight of 1. pass the alias ‘mean’ as a str ie weights=’mean’ to automatically use the mean value of a series as its weight available aliases: mean, median, min, max

  • result_file (str) – results saved on each new generation. Does not include validation rounds. “.csv” save model results table. “.pickle” saves full object, including ensemble information.

  • grouping_ids (dict) – currently a one-level dict containing series_id:group_id mapping. used in 0.2.x but not 0.3.x+ versions. retained for potential future use

  • validation_indexes (list) – list of datetime indexes, tail of forecast length is used as holdout
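
A sketch of calling .fit() with long-style data using the date_col/value_col/id_col arguments above (a hedged illustration; column names and values are invented, and model is assumed to be an AutoTS instance):

# Long-style input: one row per (date, series id, value); column names are illustrative
import numpy as np
import pandas as pd

dates = pd.date_range("2023-01-01", periods=90, freq="D")
long_df = pd.DataFrame({
    "datetime": list(dates) * 2,
    "series_id": ["sales_a"] * 90 + ["sales_b"] * 90,
    "value": np.random.rand(180),
})

model = model.fit(
    long_df,
    date_col="datetime",
    value_col="value",
    id_col="series_id",
    weights={"sales_a": 2},  # optional: weight one series more heavily in evaluation
)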

fit_data(df, date_col=None, value_col=None, id_col=None, future_regressor=None, weights={})

Part of the setup that involves fitting the initial data but not running any models.

get_metric_corr(percent_best=0.1)

Returns a dataframe of correlation among evaluation metrics across evaluations.

Parameters:

percent_best (float) – percent (ie 0.1 for 10%) of models to use, best by score first

static get_new_params(method='random')

Randomly generate new parameters for the class.

get_params_from_id(model_id=None)

Model id must be one that was run in the fit or imported results.

get_top_n_counts(input_dict=None, n=5)

Get the top n most common value counts using a basic dictionary.

horizontal_per_generation()
horizontal_to_df()

helper function for plotting.

import_best_model(import_target, enforce_model_list: bool = True, include_ensemble: bool = True)

Load a best model, overriding any existing setting.

Parameters:

import_target – pd.DataFrame or file path

import_results(filename)

Add results from another run on the same data.

Input can be a filename ending in .csv or .pickle, a DataFrame of model results, or a full TemplateEvalObject.

import_template(filename: str, method: str = 'add_on', enforce_model_list: bool = True, include_ensemble: bool = False, include_horizontal: bool = False, force_validation: bool = False)

Import a previously exported template of model parameters. Must be done before the AutoTS object is .fit().

Use import_best_model instead for loading a model for immediate prediction.

Parameters:
  • filename (str) – file location (or a pd.DataFrame already loaded)

  • method (str) – ‘add_on’ or ‘only’ - “add_on” keeps initial_template generated in init. “only” uses only this template.

  • enforce_model_list (bool) – if True, remove model types not in model_list

  • include_ensemble (bool) – if enforce_model_list is True, this specifies whether to allow ensembles anyway (otherwise they are unpacked and parts kept)

  • include_horizontal (bool) – if enforce_model_list is True, this specifies whether to allow ensembles except horizontal (overridden by keep_ensemble)

  • force_validation (bool) – if True, all models imported here will automatically get sent to full cross validation (regardless of first eval performance). Weird behavior can occur with force_validation if another template is added later with method==’only’. In that case, model.validate_import should be erased by setting it to None.

list_failed_model_types()

Return a list of model types (ie ETS, LastValueNaive) that failed. If all had at least one success, then return an empty list.

load_template(filename)

Helper function for just loading the file part of import_template.

mosaic_to_df()

Helper function to create a readable df of models in mosaic.

parse_best_model()
plot_back_forecast(**kwargs)
plot_backforecast(series=None, n_splits: int = 'auto', start_date='auto', title=None, alpha=0.25, facecolor='black', loc='upper left', **kwargs)

Plot the historical data and fit forecast on historic. Out of sample in chunks = forecast_length by default.

Parameters:
  • series (str or list) – column names of time series

  • n_splits (int or str) – “auto”, number > 2, higher more accurate but slower

  • start_date (datetime.datetime) – or “auto”

  • title (str) –

  • **kwargs – passed to pd.DataFrame.plot()

plot_chosen_transformer(df_wide=None, series=None, chosen_params=None, color1='grey', color2='darkorange')

visualizes the best model transformer, fit_transform (not inverse) effects. Won’t show much for ensembles, only shows overall transformer if present.

Parameters:
  • df_wide (pd.DataFrame) – optional, useful if preclean used

  • series (str) – name of time series to plot

  • chosen_params (dict) – the parameters of the transformer to use, defaults to best of model search

  • color1 (str) – color of original

  • color2 (str) – color of transformed

plot_failure_rate(target='transformers')
plot_generation_loss(title='Single Model Accuracy Gain Over Generations', **kwargs)

Plot improvement in accuracy over generations. Note: this is only “one size fits all” accuracy and doesn’t account for the benefits seen for ensembling.

Parameters:

**kwargs – passed to pd.DataFrame.plot()

plot_horizontal(max_series: int = 20, title='Model Types Chosen by Series', **kwargs)

Simple plot to visualize assigned series: models.

Note that for ‘mosaic’ ensembles, it only plots the type of the most common model_id for that series, or the first if all are equally common.

Parameters:
  • max_series (int) – max number of points to plot

  • **kwargs – passed to pandas.plot()

plot_horizontal_model_count(color_list=None, top_n: int = 20, title='Most Frequently Chosen Models', **kwargs)

Plots most common models. Does not factor in nested in non-horizontal Ensembles.

plot_horizontal_per_generation(title='Horizontal Ensemble Accuracy Gain (first eval sample only)', **kwargs)

Plot how well the horizontal ensembles would do after each new generation. Slow.

plot_horizontal_transformers(method='transformers', color_list=None, **kwargs)

Simple plot to visualize transformers used. Note this doesn’t capture transformers nested in simple ensembles.

Parameters:
  • method (str) – ‘fillna’ or ‘transformers’ - which to plot

  • color_list (list) – colors to sample for bar colors. Can be names or hex, or “roman”.

  • **kwargs – passed to pandas.plot()

plot_metric_corr(cols=None, percent_best=0.1)

Plot correlation in results among metrics. The metrics that are highly correlated are mostly the unscaled ones.

Parameters:
  • cols (list) – strings of columns to show, ‘all’ for all

  • percent_best (float) – percent (ie 0.1 for 10%) of models to use, best by score first

plot_model_failure_rate()

Failure rate per model type (ignoring ensembles); a failure may be due to the transformer rather than the model itself.

plot_mosaic(max_series: int = 60, max_rows: int | None = None, colors=None)

Show the mosaic in a mosaic ensemble, if used.

plot_per_series_error(title: str = 'Top Series Contributing Score Error', max_series: int = 10, max_name_chars: int = 25, color: str = '#ff9912', figsize=(12, 4), kind: str = 'bar', upper_clip: float = 1000, **kwargs)

Plot which series are contributing most to error (Score) of final model. Avg of validations for best_model

Parameters:
  • title (str) – plot title

  • max_series (int) – max number of series to show on plot (sorted)

  • max_name_chars (str) – if horizontal ensemble, will chop series names to this

  • color (str) – hex or name of color of plot

  • figsize (tuple) – passed through to plot axis

  • kind (str) – bar or pie

  • upper_clip (float) – set the max error shown to this value, to prevent unnecessary distortion

  • **kwargs – passed to pandas.plot()

plot_per_series_mape(title: str | None = None, max_series: int = 10, max_name_chars: int = 25, color: str = '#ff9912', figsize=(12, 4), kind: str = 'bar', **kwargs)

Plot which series are contributing most to SMAPE of final model. Avg of validations for best_model

Parameters:
  • title (str) – plot title

  • max_series (int) – max number of series to show on plot (sorted)

  • max_name_chars (str) – if horizontal ensemble, will chop series names to this

  • color (str) – hex or name of color of plot

  • figsize (tuple) – passed through to plot axis

  • kind (str) – bar or pie

  • **kwargs – passed to pandas.plot()

plot_per_series_smape(title: str | None = None, max_series: int = 10, max_name_chars: int = 25, color: str = '#ff9912', figsize=(12, 4), kind: str = 'bar', **kwargs)

Retained for backwards compatibility and not necessarily maintained; plot_per_series_mape is preferred.

plot_series_corr(cols=15)

Plot series correlation. Data must be fit first.

Parameters:

cols (list) – strings of columns to show, ‘all’ for all, or int of number to sample

plot_transformer_by_class(template=None, colors: dict | None = None, top_n: int = 15, plot_group: str = 'ModelClass')

Using the best results (from exported template), plot usage of transformers by model class.

Parameters:
  • template (pd.DataFrame) – template object of models to use for assessment, uses best 50 otherwise

  • colors (dict) – color mapping of model class to color

  • top_n (int) – number of most frequently used transformers to plot

plot_transformer_failure_rate()

Failure Rate per Transformer type (ignoring ensembles), failure may be due to other model or transformer.

plot_unpredictability(df_wide=None, series=None, **kwargs)
plot_validations(df_wide=None, models=None, series=None, title=None, start_date='auto', end_date='auto', subset=None, compare_horizontal=False, colors=None, include_bounds=True, alpha=0.35, start_color='darkred', end_color='#A2AD9C', **kwargs)

Similar to plot_backforecast but using the model’s validation segments specifically. Must reforecast. Saves results to self.validation_forecasts and caches them. Set validation_forecasts_template to None to force a rerun; otherwise stored results are used (when models is the same). ‘chosen’ refers to best_model_id, the model chosen to run for predict. Validation sections may overlap (depending on method), which can confuse graph readers.

Parameters:
  • models (list) – list, str, df or None, models to compare (IDs unless df of model params)

  • series (str) – time series to graph

  • title (str) – graph title

  • start_date (str) – ‘auto’ or datetime, place to begin graph, None for full

  • end_date (str) – ‘auto’ or datetime, end of graph x axis

  • subset (str) – overrides series, shows either ‘best’ or ‘worst’

  • compare_horizontal (bool) – if True, plot horizontal ensemble versus best non-horizontal model, when available

  • include_bounds (bool) – if True (default) include the upper/lower forecast bounds

  • start_color (str) – color of vline for val start marker, None to remove vline

  • end_color (str) – color of vline for val end marker, None to remove vline

predict(forecast_length: int = 'self', prediction_interval: float = 'self', future_regressor=None, hierarchy=None, just_point_forecast: bool = False, fail_on_forecast_nan: bool = True, verbose: int = 'self', df=None)

Generate forecast data immediately following dates of index supplied to .fit().

If using a model from the update_fit list, with no ensembling, the underlying model will not be retrained when used as below with a single prediction interval. This is designed for high speed forecasting; full retraining is best when there is sufficient time.

model = AutoTS(model_list='update_fit')
model.fit(df)
model.predict()
# for new data without retraining
model.fit_data(df)
model.predict()
# to force retrain of best model (but not a full model search)
model.model = None
model.fit_data(df)
model.predict()

Parameters:
  • forecast_length (int) – Number of periods of data to forecast ahead

  • prediction_interval (float) –

    interval of upper/lower forecasts. defaults to ‘self’ ie the interval specified in __init__() if prediction_interval is a list, then returns a dict of forecast objects.

    {str(interval): prediction_object}

  • future_regressor (numpy.Array) – additional regressor

  • hierarchy – Not yet implemented

  • just_point_forecast (bool) – If True, return a pandas.DataFrame of just point forecasts

  • fail_on_forecast_nan (bool) – if False, return forecasts even if NaN present, if True, raises error if any nan in forecast

  • df (pd.DataFrame) – wide style df, if present, calls fit_data with this dataframe. Recommended strongly to use model.fit_data(df) first instead as it has more args.

Returns:

Either a PredictionObject of forecasts and metadata, or if just_point_forecast == True, a dataframe of point forecasts

results(result_set: str = 'initial')

Convenience function to return tested models table.

Parameters:

result_set (str) – ‘validation’ or ‘initial’

retrieve_validation_forecasts(models=None, compare_horizontal=False, id_name='SeriesID', value_name='Value', interval_name='PredictionInterval')

Retrieve validation forecasts in long form dataframe. If a user is looking to do custom review of the holdout forecasts, this is the function to use.

save_template(filename, export_template, **kwargs)

Helper function for the save part of export_template.

validation_agg()
class autots.Cassandra(preprocessing_transformation: dict | None = None, scaling: str = 'BaseScaler', past_impacts_intervention: str | None = None, seasonalities: dict = ['common_fourier'], ar_lags: list | None = None, ar_interaction_seasonality: dict | None = None, anomaly_detector_params: dict | None = None, anomaly_intervention: str | None = None, holiday_detector_params: dict | None = None, holiday_countries: dict | None = None, holiday_countries_used: bool = True, multivariate_feature: str | None = None, multivariate_transformation: str | None = None, regressor_transformation: dict | None = None, regressors_used: bool = True, linear_model: dict | None = None, randomwalk_n: int | None = None, trend_window: int = 30, trend_standin: str | None = None, trend_anomaly_detector_params: dict | None = None, trend_transformation: dict = {}, trend_model: dict = {'Model': 'LastValueNaive', 'ModelParameters': {}}, trend_phi: float | None = None, constraint: dict | None = None, x_scaler: bool = False, max_colinearity: float = 0.998, max_multicolinearity: float = 0.001, frequency: str = 'infer', prediction_interval: float = 0.9, random_seed: int = 2022, verbose: int = 0, n_jobs: int = 'auto', forecast_length: int = 30, **kwargs)

Bases: ModelObject

Explainable decomposition-based forecasting with advanced trend modeling and preprocessing.

Tunc etiam fatis aperit Cassandra futuris ora, dei iussu non umquam credita Teucris. Nos delubra deum miseri, quibus ultimus esset ille dies, festa velamus fronde per urbem. -Aeneid 2.246-2.249

In general, all time series data inputs (df, regressors, impacts) should be wide-style data in a pd.DataFrame:

an index that is a pd.DatetimeIndex, and one column per time series, with a uniquely identifiable column name

Impacts get confusing. A past impact of 0.05 would mean an outside, unforecastable force caused/added 5% of the value at this time. Accordingly, that 5% will be removed before forecasting, then added back on after. Impacts can also be negative values. A future impact of 5% would mean an outside force adds 5% above the original forecast. Future impacts can be used to model product goals or temporary anomalies which can’t or shouldn’t be modeled by forecasting and whose relative effect is known. Compare this with regressors, which are essentially the model estimating the relative impact given the raw size or presence of an outside effect.

Warn about remove_excess_anomalies from the holiday detector if relying on anomaly prediction.
Linear components are always model elements, but trend is actuals (history) and model (future).
Running predict updates some internal attributes used in plotting and other figures; generally expect these to reflect the latest predict.
Seasonalities are hard-coded as days, so 7 will always = weekly even if data isn’t daily.
For slope analysis and zero crossings, a slope of 0 evaluates as a positive sign (>= 0); exactly 0 slope is rare in real world data.
Does not currently follow the regression_type=’User’ pattern of other models and fails if no regressor is given.
For component decomposition, scale will be inaccurate unless ‘BaseScaler’ is used, but regardless this won’t affect the final forecast.

Parameters:

pass

fit()
predict()
holiday_detector.dates_to_holidays()
create_forecast_index()

after .fit, can be used to create index of prediction

plot_forecast()
plot_components()
plot_trend()
get_new_params()
return_components()
.anomaly_detector.anomalies
.anomaly_detector.scores
.holiday_count
.holidays
Type:

series flags, holiday detector only

.params
.keep_cols, .keep_cols_idx
.x_array
.predict_x_array
.trend_train
.predicted_trend
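
A hedged minimal sketch of the Cassandra fit/predict flow (df is assumed to be a wide-style DataFrame with a DatetimeIndex; the arguments shown are illustrative):

# Minimal Cassandra sketch; parameter choices are illustrative only
from autots import Cassandra

mod = Cassandra(
    forecast_length=30,
    seasonalities=["common_fourier"],
    trend_model={"Model": "LastValueNaive", "ModelParameters": {}},
)
mod.fit(df)
prediction = mod.predict(forecast_length=30)
mod.plot_forecast(prediction, series=df.columns[0])  # plot a single series
components = mod.return_components()                 # additive linear and trend elements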
analyze_trend(slope, index)
auto_fit(df, validation_method)
base_scaler(df)
compare_actual_components()
create_t(DTindex)
cross_validate(df, validation_method)
feature_importance()
fit(df, future_regressor=None, regressor_per_series=None, flag_regressors=None, categorical_groups=None, past_impacts=None)
fit_data(df, forecast_length=None, future_regressor=None, regressor_per_series=None, flag_regressors=None, future_impacts=None, regressor_forecast_model=None, regressor_forecast_model_params=None, regressor_forecast_transformations=None, include_history=False, past_impacts=None)
get_new_params(method='fast')

Return dict of new parameters for parameter tuning.

get_params()

Return dict of current parameters.

next_fit()
plot_components(prediction=None, series=None, figsize=(16, 9), to_origin_space=True, title=None, start_date=None)

Plot breakdown of linear model components.

Parameters:
  • prediction – the forecast object

  • series (str) – name of series to plot, if desired

  • figsize (tuple) – figure size

  • to_origin_space (bool) – setting to False can make the graph look right (due to preprocessing transformers) but at the wrong scale; especially useful if “AlignLastValue” and other transformers are present

  • title (str) – title

  • start_date (str) – slice point for start date, can make some high frequency components easier to see with a shorter window

plot_forecast(prediction, actuals=None, series=None, start_date=None, anomaly_color='darkslateblue', holiday_color='darkgreen', trend_anomaly_color='slategray', point_size=12.0)

Plot a forecast time series.

Parameters:
  • prediction (model prediction object, required) –

  • actuals (pd.DataFrame) – wide style df of known data, if available

  • series (str) – name of time series column to plot

  • start_date (str or Timestamp) – point at which to begin X axis

  • anomaly_color (str) – name of anomaly point color

  • holiday_color (str) – name of holiday point color

  • trend_anomaly_color (str) – name of trend anomaly point color

  • point_size (float) – point size for all anomalies

plot_things()
plot_trend(series=None, vline=None, colors=['#d4f74f', '#82ab5a', '#ff6c05', '#c12600'], title=None, start_date=None, **kwargs)
predict(forecast_length=None, include_history=False, future_regressor=None, regressor_per_series=None, flag_regressors=None, future_impacts=None, new_df=None, regressor_forecast_model=None, regressor_forecast_model_params=None, regressor_forecast_transformations=None, include_organic=False, df=None, past_impacts=None)

Generate a forecast.

future_regressor and regressor_per_series should only include new future values; history is already stored. They should match the forecast_length and the index of the forecasts.

Parameters:
  • forecast_length (int) – steps ahead to predict, or None

  • include_history (bool) – include past predictions if True

  • regressor args – all the same regressor args as .fit, but the future forecast versions here

  • future_impacts (pd.DataFrame) – like past impacts but for the forecast ahead

  • new_df (pd.DataFrame) – or df, equivalent to fit_data update

predict_new_product()
process_components(to_origin_space=True)

Scale and standardize component outputs.

return_components(to_origin_space=True, include_impacts=False)

Return additive elements of forecast, linear and trend. If impacts included, it is a multiplicative term.

Parameters:
  • to_origin_space (bool) –

  • include_impacts (bool) –

rolling_trend(trend_residuals, t)
scale_data(df)
to_origin_space(df, trans_method='forecast', components=False, bounds=False)

Take transformed outputs back to original feature space.

treatment_causal_impact(df, intervention_dates)
trend_analysis()
class autots.EventRiskForecast(df_train, forecast_length, frequency: str = 'infer', prediction_interval=0.9, lower_limit=0.05, upper_limit=0.95, model_name='BallTreeRegressionMotif', model_param_dict={'abs_energy': False, 'additional_lag_periods': None, 'combination_transformation': {'fillna': 'time', 'transformation_params': {'0': {'decay_span': 2, 'displacement_rows': 1, 'quantile': 1.0, 'rows': 7}}, 'transformations': {'0': 'AlignLastDiff'}}, 'comparison_transformation': {'fillna': 'cubic', 'transformation_params': {'0': {'decay_span': None, 'displacement_rows': 1, 'quantile': 1.0, 'rows': 364}}, 'transformations': {'0': 'AlignLastDiff'}}, 'datepart_method': None, 'distance_metric': 'euclidean', 'ewm_alpha': None, 'ewm_var_alpha': None, 'extend_df': True, 'frac_slice': None, 'holiday': False, 'k': 100, 'macd_periods': 74, 'max_rolling_periods': 12, 'mean_rolling_periods': 12, 'min_rolling_periods': None, 'nonzero_last_n': None, 'point_method': 'midhinge', 'polynomial_degree': None, 'quantile10_rolling_periods': 10, 'quantile90_rolling_periods': 10, 'regression_type': None, 'rolling_autocorr_periods': None, 'sample_fraction': 5000000, 'scale_full_X': False, 'series_hash': True, 'std_rolling_periods': 30, 'window': 3}, model_transform_dict={'fillna': 'akima', 'transformation_params': {'0': {}, '1': {}, '2': {'changepoint_distance_end': 520, 'changepoint_spacing': 5040, 'datepart_method': 'common_fourier', 'model': 'Linear'}, '3': {}}, 'transformations': {'0': 'Log', '1': 'SinTrend', '2': 'ChangepointDetrend', '3': 'RobustScaler'}}, model_forecast_kwargs={'max_generations': 30, 'n_jobs': 'auto', 'random_seed': 321, 'transformer_list': 'no_expanding', 'verbose': 1}, future_regressor_train=None, future_regressor_forecast=None)

Bases: object

Generate a risk score (0 to 1, but usually close to 0) for a future event exceeding user specified upper or lower bounds.

Event risk was created for the use case of predicting the risk of an out of stock event (lower limit ~= 0), or a very high demand day, and similar issues. It can also be used to identify “regressions”, that is, cases where the real world data (run as a holdout) was unlikely, to raise an alert that something is amiss. This is a capability somewhat unique to motif models, as most probabilistic models “fan out” rather than creating a mix of quality outcomes.

Upper and lower limits can be one of four types, and may each be different.
  1. None (no risk score calculated for this direction)
  2. Float in range [0, 1]: the historic quantile of the series (which is the historic min and max at the edges) is chosen as the limit.
  3. A dictionary of {“model_name”: x, “model_param_dict”: y, “model_transform_dict”: z, “prediction_interval”: 0.9} to generate a forecast as the limits. Primarily intended for simple forecasts like SeasonalNaive, but can be used with any AutoTS model.
  4. A custom input numpy array or pandas DataFrame of shape (forecast_length, num_series).

This can be used to find the “middle” limit too, flip so upper=lower and lower=upper, then abs(U - (1 - L)). In some cases it may help to drop the results from the first forecast timestep or two.

This functions by generating multiple outcome forecast possibilities in two ways. If a ‘Motif’ type model is passed, it uses all the k neighbor motifs as outcome paths (recommended). All other AutoTS models will generate the possible outcomes by utilizing multiple prediction_intervals (more intervals = slower but more resolution). The risk score is then the % of outcome forecasts which cross the limit (less than or equal for lower, greater than or equal for upper).

Only accepts wide style dataframe input. Methods are class_methods and can be used standalone. They default to __init__ inputs, but can be overridden. Results are usually a numpy array of shape (forecast_length, num_series).

Parameters:
  • df_train (pd.DataFrame) – wide style data, pd.DatetimeIndex for index and one series per column

  • forecast_length (int) – number of forecast steps to make

  • frequency (str) – frequency of timesteps

  • prediction_interval (float) – float or list of floats for probabilistic forecasting. If a list, the first item in the list is the one used as the .fit default.

  • model_forecast_kwargs (dict) – AutoTS kwargs to pass to generate_result_windows, .fit_forecast, and forecast-style limits

  • model_name – for model_forecast in generate_result_windows

  • model_param_dict – for model_forecast in generate_result_windows

  • model_transform_dict – for model_forecast in generate_result_windows

  • future_regressor_train – regressor arrays if used

  • future_regressor_forecast – regressor arrays if used

fit()
predict()
predict_historic()
generate_result_windows()
generate_risk_array()
generate_historic_risk_array()
set_limit()
plot()
result_windows, forecast_df, up_forecast_df, low_forecast_df
lower_limit_2d, upper_limit_2d, upper_risk_array, lower_risk_array
window_index, prediction_object
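
A hedged sketch of the workflow described above (limit values are illustrative; df_train is assumed to be a wide-style DataFrame with a DatetimeIndex):

# Risk of exceeding the historic 95th percentile or dropping below the 5th percentile
from autots import EventRiskForecast

risk = EventRiskForecast(
    df_train,
    forecast_length=14,
    upper_limit=0.95,  # historic quantile used as the upper threshold
    lower_limit=0.05,  # historic quantile used as the lower threshold
)
risk.fit()                               # shortcut model param search
upper_risk, lower_risk = risk.predict()  # arrays of shape (forecast_length, num_series)
risk.plot(column_idx=0)                  # sample plot of outcome paths and risk scores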
fit(df_train=None, forecast_length=None, prediction_interval=None, models_mode='event_risk', model_list=['UnivariateMotif', 'MultivariateMotif', 'SectionalMotif', 'ARCH', 'MetricMotif', 'SeasonalityMotif'], ensemble=None, autots_kwargs=None, future_regressor_train=None)

Shortcut for generating model params.

args specified are those suggested for an otherwise normal AutoTS run

Parameters:
  • df_train (pd.DataFrame) – wide style only

  • models_mode (str) – ‘event_risk’ here is used by motif models

  • model_list (list) – suggesting the use of motif models

  • ensemble (list) – must be None or empty list to get motif result windows

  • autots_kwargs (dict) – all other args are passed in as kwargs. If None, defaults to the class model_forecast_kwargs; for blank, pass an empty dict.

static generate_historic_risk_array(df, limit, direction='upper')

Given a df and a limit, returns a 0/1 array of whether limit was equaled or exceeded.

generate_result_windows(df_train=None, forecast_length=None, frequency=None, prediction_interval=None, model_name=None, model_param_dict=None, model_transform_dict=None, model_forecast_kwargs=None, future_regressor_train=None, future_regressor_forecast=None)

For event risk forecasting. Params default to class init but can be overridden here.

Returns:

(num_samples/k, forecast_length, num_series/columns)

Return type:

result_windows (numpy.array)

static generate_risk_array(result_windows, limit, direction='upper')

Given a df and a limit, returns a 0/1 array of whether limit was equaled or exceeded.

plot(column_idx=0, column=None, grays=['#838996', '#c0c0c0', '#dcdcdc', '#a9a9a9', '#808080', '#989898', '#757575', '#696969', '#c9c0bb', '#c8c8c8', '#323232', '#e5e4e2', '#778899', '#4f666a', '#848482', '#414a4c', '#8a7f80', '#c4c3d0', '#bebebe', '#dbd7d2'], up_low_color=['#ff4500', '#ff5349'], bar_color='#6495ED', bar_ylim=[0.0, 0.5], figsize=(14, 8), result_windows=None, lower_limit_2d=None, upper_limit_2d=None, upper_risk_array=None, lower_risk_array=None)

Plot a sample of the risk forecast outcomes.

Parameters:
  • column_idx (int) – positional index of series to sample for plot

  • column (str) – optional column name to select instead of column_idx

  • grays (list of str) – list of hex codes for colors for the potential forecasts

  • up_low_color (list of str) – two hex code colors for upper and lower limits

  • bar_color (str) – hex color for bar graph

  • bar_ylim (list) – passed to ylim of plot, sets scale of axis of barplot

  • figsize (tuple) – passed to figsize of output figure

plot_eval(df_test, column_idx=0, actuals_color=['#00BFFF'], up_low_color=['#ff4500', '#ff5349'], bar_color='#6495ED', bar_ylim=[0.0, 0.5], figsize=(14, 8), lower_limit_2d=None, upper_limit_2d=None, upper_risk_array=None, lower_risk_array=None)

Plot a sample of the risk forecast with known value vs risk score.

Parameters:
  • df_test (pd.DataFrame) – dataframe of known values (dt index, series)

  • column_idx (int) – positional index of series to sample for plot

  • actuals_color (list of str) – list of one hex code for line of known actuals

  • up_low_color (list of str) – two hex code colors for lower and upper

  • bar_color (str) – hex color for bar graph

  • bar_ylim (list) – passed to ylim of plot, sets scale of axis of barplot

  • figsize (tuple) – passed to figsize of output figure

predict()

Returns forecast upper, lower risk probability arrays for input limits.

predict_historic(upper_limit=None, lower_limit=None, eval_periods=None)

Returns upper, lower risk probability arrays for input limits for the historic data. If manual numpy array limits are used, the limits will need to be appropriate shape (for df_train and eval_periods if used)

Parameters:
  • upper_limit – if different than the version passed to init

  • lower_limit – if different than the version passed to init

  • eval_periods (int) – only assess the n most recent periods of history

query_risk(dates=None, series=None, include_limits=True, include_forecast=False, historic=False, return_json=False)

Query a specific slice of risk forecast results with minimal token usage.

Designed for LLM-friendly output with compact representation.

Parameters:
  • dates (str, datetime, list, slice) – Date(s) to query. - Single date: “2024-01-15” or datetime object - Date range: slice(“2024-01-01”, “2024-01-31”) - List of dates: [“2024-01-15”, “2024-01-20”] - None: all dates

  • series (str, list) – Series name(s) to query. - Single series: “sales” - Multiple series: [“sales”, “revenue”] - None: all series

  • include_limits (bool) – Include the limit thresholds used

  • include_forecast (bool) – Include the underlying forecast values

  • historic (bool) – Query historic risk (from predict_historic) instead of forecast risk

  • return_json (bool) – Return JSON string instead of dict

Returns:

Compact risk data

Return type:

dict or str

Examples

>>> # Single series, single date
>>> risk.query_risk(dates="2024-01-15", series="sales")
{'upper_risk': {'sales': {'2024-01-15': 0.05}}, 'lower_risk': {...}}
>>> # Include limits and forecast
>>> risk.query_risk(
...     dates=slice("2024-01-01", "2024-01-07"),
...     series=["sales"],
...     include_limits=True,
...     include_forecast=True
... )
static set_limit(limit, target_shape, df_train, direction='upper', period='forecast', forecast_length=None, eval_periods=None)

Handles all limit input styles and returns numpy array.

Parameters:
  • limit – see class overview for input options, also accepts pandas DataFrame

  • target_shape (tuple) – of (forecast_length, num_series)

  • df_train (pd.DataFrame) – training data

  • direction (str) – whether it is the “upper” or “lower” limit

  • period (str) – “forecast” or “historic” only used for limits defined by forecast algorithm params

  • forecast_length (int) – needed only for the ‘historic’ period with a forecast-algorithm-defined limit

  • eval_periods (int) – only for historic forecast limit, only runs on the tail n (this) of data

autots.FillNA(df, method: str = 'ffill', window: int = 10)

Fill NA values using different methods.

Parameters:
  • method (str) –
    ‘ffill’ - fill most recent non-na value forward until another non-na value is reached
    ‘zero’ - fill with zero. Useful for sales and other data where NA does usually mean $0.
    ‘mean’ - fill all missing values with the series’ overall average value
    ‘median’ - fill all missing values with the series’ overall median value
    ‘rolling mean’ - fill with last n (window) values
    ‘ffill mean biased’ - simple avg of ffill and mean
    ‘fake date’ - shifts forward data over nan, thus values will have incorrect timestamps
    ‘seasonal_linear’ - seasonally-aware linear regression imputation using datetime and local features
    ‘seasonal_linear_window_3’ - seasonal linear with window=3
    ‘seasonal_linear_window_10’ - seasonal linear with window=10
    also most method values of pd.DataFrame.interpolate()

  • window (int) – length of rolling windows for filling na, for rolling methods
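
A quick sketch of the function above on a small illustrative frame:

# Fill missing values with a rolling mean of nearby observations
import numpy as np
import pandas as pd
from autots import FillNA

df = pd.DataFrame(
    {"series_a": [1.0, np.nan, 3.0, np.nan, 5.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
filled = FillNA(df, method="rolling mean", window=5)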

class autots.GeneralTransformer(fillna: str | None = None, transformations: dict = {}, transformation_params: dict = {}, grouping: str | None = None, reconciliation: str | None = None, grouping_ids=None, random_seed: int = 2020, n_jobs: int = 1, holiday_country: list | None = None, verbose: int = 0, forecast_length: int = 90)

Bases: object

Fill NA values and then apply mathematical transformations. Has .fit(), .fit_transform(), and .inverse_transform() primary methods.

Expects a chronologically sorted pandas.DataFrame with a DatetimeIndex, only numeric data, and a ‘wide’ (one column per series) shape.

Warning

  • inverse_transform will not fully return the original data under many conditions
    • the primary intention of inverse_transform is to invert forecast data from models (data immediately following the historical time period), not to return the original data

    • NAs filled will be returned with the filled value

    • Discretization, statsmodels filters, Round, Slice, ClipOutliers cannot be inversed

    • RollingMean, PctChange, CumSum, Seasonal Difference, and DifferencedTransformer will only return original or an immediately following forecast
      • by default ‘forecast’ is expected, ‘original’ can be set in trans_method

Parameters:
  • fillNA (str) –

    • method to fill NA, passed through to FillNA()

    ‘ffill’ - fill most recent non-na value forward until another non-na value is reached
    ‘zero’ - fill with zero. Useful for sales and other data where NA does usually mean $0.
    ‘mean’ - fill all missing values with the series’ overall average value
    ‘median’ - fill all missing values with the series’ overall median value
    ‘rolling_mean’ - fill with last n (window = 10) values
    ‘rolling_mean_24’ - fill with avg of last 24
    ‘ffill_mean_biased’ - simple avg of ffill and mean
    ‘fake_date’ - shifts forward data over nan, thus values will have incorrect timestamps
    ‘IterativeImputer’ - sklearn iterative imputer
    most of the interpolate methods from pandas.interpolate

  • transformations (dict) –

    • transformations to apply {0: “MinMaxScaler”, 1: “Detrend”, …}

    ‘None’
    ‘MinMaxScaler’ - Sklearn MinMaxScaler
    ‘PowerTransformer’ - Sklearn PowerTransformer
    ‘QuantileTransformer’ - Sklearn
    ‘MaxAbsScaler’ - Sklearn
    ‘StandardScaler’ - Sklearn
    ‘RobustScaler’ - Sklearn
    ‘PCA’, ‘FastICA’ - performs sklearn decomposition and returns n-cols worth of n_components
    ‘Detrend’ - fit then remove a linear regression from the data
    ‘RollingMeanTransformer’ - 10 period rolling average, can receive a custom window by transformation_param if used as second_transformation
    ‘FixedRollingMean’ - same as RollingMean, but with inverse_transform disabled, so smoothed forecasts are maintained
    ‘RollingMean10’ - 10 period rolling average (smoothing)
    ‘RollingMean100thN’ - Rolling mean of periods of len(train)/100 (minimum 2)
    ‘DifferencedTransformer’ - makes each value the difference of that value and the previous value
    ‘PctChangeTransformer’ - converts to pct_change, not recommended if lots of zeroes in data
    ‘SinTrend’ - removes a sin trend (fitted to each column) from the data
    ‘CumSumTransformer’ - makes value sum of all previous
    ‘PositiveShift’ - makes all values >= 1
    ‘Log’ - log transform (uses PositiveShift first as necessary)
    ‘IntermittentOccurrence’ - -1, 1 for non median values
    ‘SeasonalDifference’ - remove the last lag values from all values
    ‘SeasonalDifferenceMean’ - remove the average lag values from all
    ‘SeasonalDifference7’, ‘12’, ‘28’ - non-parameterized versions of SeasonalDifference
    ‘CenterLastValue’ - center data around tail of dataset
    ‘Round’ - round values on inverse or transform
    ‘Slice’ - use only recent records
    ‘ClipOutliers’ - simple remove outliers
    ‘Discretize’ - bin or round data into groups
    ‘DatepartRegression’ - move a trend trained on datetime index
    ‘ScipyFilter’ - filter data (lose information but smoother!) from scipy
    ‘HPFilter’ - statsmodels hp_filter
    ‘STLFilter’ - seasonal decompose and keep just one part of decomposition
    ‘EWMAFilter’ - use an exponential weighted moving average to smooth data
    ‘MeanDifference’ - joint version of differencing
    ‘BTCD’ - Box Tiao decomposition
    ‘AlignLastValue’ - align forecast start to end of training data
    ‘AnomalyRemoval’ - more tailored anomaly removal options
    ‘HolidayTransformer’ - detects holidays and wishes good cheer to all
    ‘LocalLinearTrend’ - rolling local trend, using tails for future and past trend
    ‘KalmanSmoothing’ - smooth using a state space model
    ‘RegressionFilter’ - fit seasonal removal and local linear trend, clip std devs away from this fit
    ‘LevelShiftTransformer’ - automatically compensate for historic level shifts in data
    ‘CenterSplit’ - Croston inspired magnitude/occurrence split for intermittent data
    ‘FFTFilter’ - filter using a fast fourier transform
    ‘FFTDecomposition’ - remove FFT harmonics, later add back
    ‘ReplaceConstant’ - replace a value with NaN, optionally fillna then later reintroduce
    ‘AlignLastDiff’ - shift forecast to be within range of historical diffs
    ‘DiffSmoother’ - smooth diffs then return to original space
    ‘HistoricValues’ - match predictions to most similar historic value and overwrite
    ‘BKBandpassFilter’ - another version of the Baxter King bandpass filter
    ‘Constraint’ - apply constraints (caps) on values
    ‘FIRFilter’ - apply a FIR filter (firwin)
    ‘ShiftFirstValue’ - similar to PositiveShift but uses the first values as the basis of zero
    ‘ThetaTransformer’ - decomposes into theta lines, then recombines
    ‘ChangepointDetrend’ - detrend but with changepoints, and seasonality thrown in for fun
    ‘ChangepointDetector’ - detect changepoints using advanced detector and remove associated trend
    ‘MeanPercentSplitter’ - split data into rolling mean and percent of rolling mean
    ‘UpscaleDownscaleTransformer’ - upscales and downscales
    ‘ReconciliationTransformer’ - creates hierarchies then reconciles on the way back
    ‘CointegrationTransformer’ - creates stationary features from cointegrated sets of series

  • transformation_params (dict) – params of transformers {0: {}, 1: {‘model’: ‘Poisson’}, …}; pass a dictionary of empty dictionaries to use the defaults (see the usage sketch after this parameter list)

  • random_seed (int) – random state passed through where applicable

  • forecast_length (int) – length of forecast, not needed as argument for most transformers/params
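
A minimal usage sketch (not part of the original docstring), assuming the GeneralTransformer constructor accepts the transformations / transformation_params dicts described above:

>>> from autots import GeneralTransformer, load_daily
>>> df = load_daily(long=False)  # wide DataFrame, DatetimeIndex, series as columns
>>> transformer = GeneralTransformer(
...     transformations={0: "ClipOutliers", 1: "DifferencedTransformer"},
...     transformation_params={0: {}, 1: {}},  # empty dicts fall back to defaults
... )
>>> transformed = transformer.fit_transform(df)
>>> restored = transformer.inverse_transform(transformed, trans_method="original")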

fill_na(df, window: int = 10)
Parameters:
  • df (pandas.DataFrame) – Datetime Indexed

  • window (int) – passed through to rolling mean fill technique

Returns:

pandas.DataFrame

fit(df)

Apply transformations and return transformer object.

Parameters:

df (pandas.DataFrame) – Datetime Indexed

fit_transform(df)

Directly fit and apply transformations to convert df.

static get_new_params(method='fast', **kwargs)
inverse_transform(df, trans_method: str = 'forecast', fillzero: bool = False, bounds: bool = False, start: int | None = None)

Undo the madness.

Parameters:
  • df (pandas.DataFrame or PredictionObject) – Datetime Indexed data or AutoTS prediction container

  • trans_method (str) – ‘forecast’ or ‘original’ passed through

  • fillzero (bool) – if inverse returns NaN, fill with zero

  • bounds (bool) – currently ignores AlignLastValue transform if True (also used in process_components of Cassandra)

classmethod retrieve_transformer(transformation: str | None = None, param: dict = {}, df=None, random_seed: int = 2020, n_jobs: int = 1, holiday_country: list | None = None, forecast_length: int = 90)

Retrieves a specific transformer object from a string.

Parameters:
  • df (pandas.DataFrame) – Datetime Indexed - required to set params for some transformers

  • transformation (str) – name of desired method

  • param (dict) – dict of kwargs to pass (legacy: an actual param)

Returns:

transformer object

transform(df)

Apply transformations to convert df.

class autots.HolidayDetector(anomaly_detector_params={}, threshold=0.8, min_occurrences=2, splash_threshold=0.65, use_dayofmonth_holidays=True, use_wkdom_holidays=True, use_wkdeom_holidays=True, use_lunar_holidays=True, use_lunar_weekday=False, use_islamic_holidays=False, use_hebrew_holidays=False, use_hindu_holidays=False, output: str = 'multivariate', n_jobs: int = 1)

Bases: object

dates_to_holidays(dates, style='flag', holiday_impacts=False)

Populate date information for a given pd.DatetimeIndex.

Parameters:
  • dates (pd.DatetimeIndex) – list of dates

  • day_holidays (pd.DataFrame) – list of month/day holidays. Pass None if not available

  • style (str) – option for how to return information
    ‘long’ - return date, name, series for all holidays in a long style dataframe
    ‘impact’ - returns dates, series with values of sum of impacts (if given) or joined string of holiday names
    ‘flag’ - return dates, holidays flag (is not 0-1 but rather sum of input series impacted for that holiday and day)
    ‘prophet’ - return format required for prophet. Will need to be filtered on series for multivariate case
    ‘series_flag’ - dates, series 0/1 for if holiday occurred in any calendar

  • holiday_impacts (dict) – a dict passed to .replace containing values for holiday_names, or str ‘value’ or ‘anomaly_score’

detect(df)

Run holiday detection. Input wide-style pandas time series.

fit(df)
static get_new_params(method='random')
plot(series_name=None, include_anomalies=True, title=None, marker_size=None, plot_kwargs={}, series=None, start_date=None)
plot_anomaly(kwargs={})
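
A hedged usage sketch (not from the original docs): run detection on wide daily data, then export flag-style holiday features for the same date index:

>>> from autots import HolidayDetector, load_daily
>>> df = load_daily(long=False)
>>> detector = HolidayDetector(threshold=0.8, output="multivariate")
>>> detector.detect(df)
>>> holiday_flags = detector.dates_to_holidays(df.index, style="flag")
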
class autots.ModelPrediction(forecast_length: int, transformation_dict: dict, model_str: str, parameter_dict: dict, frequency: str = 'infer', prediction_interval: float = 0.9, no_negatives: bool = False, constraint: float | None = None, holiday_country: str = 'US', startTimeStamps=None, grouping_ids=None, fail_on_forecast_nan: bool = True, return_model: bool = False, random_seed: int = 2020, verbose: int = 0, n_jobs: int | None = None, current_model_file: str | None = None, model_count: int = 0, force_gc: bool = False)

Bases: ModelObject

Feed parameters into the modeling pipeline. A class object; does NOT work with ensembles.

Parameters:
  • df_train (pandas.DataFrame) – numeric training dataset of DatetimeIndex and series as cols

  • forecast_length (int) – number of periods to forecast

  • transformation_dict (dict) – a dictionary of outlier, fillNA, and transformation methods to be used

  • model_str (str) – a string directing to the appropriate model, used in ModelMonster

  • frequency (str) – str representing frequency alias of time series

  • prediction_interval (float) – width of errors (note: rarely do the intervals accurately match the % asked for…)

  • no_negatives (bool) – whether to force all forecasts to be > 0

  • constraint (float) – when not None, use this value * data st dev above max or below min for constraining forecast values.

  • future_regressor_train (pd.Series) – with datetime index, of known in advance data, section matching train data

  • future_regressor_forecast (pd.Series) – with datetime index, of known in advance data, section matching test data

  • holiday_country (str) – passed through to holiday package, used by a few models as 0/1 regressor.

  • startTimeStamps (pd.Series) – index (series_ids), columns (Datetime of First start of series)

  • fail_on_forecast_nan (bool) – if False, return forecasts even if NaN present, if True, raises error if any nan in forecast

  • return_model (bool) – if True, forecast will have .model and .transformer attributes set to model object.

  • n_jobs (int) – number of processes

  • current_model_file (str) – file path to write to disk of current model params (for debugging if computer crashes). .json is appended

Returns:

Prediction from AutoTS model object

Return type:

PredictionObject (autots.PredictionObject)

fit(df, future_regressor=None)
fit_data(df, future_regressor=None)
fit_predict(df, forecast_length, future_regressor_train=None, future_regressor_forecast=None)
predict(forecast_length=None, future_regressor=None)
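
A hedged sketch of running a single named model through ModelPrediction (the model name, parameters, and transformation dict here are illustrative, not prescriptive):

>>> from autots import ModelPrediction, load_daily
>>> df = load_daily(long=False)
>>> model = ModelPrediction(
...     forecast_length=14,
...     transformation_dict={
...         "fillna": "ffill",
...         "transformations": {"0": "StandardScaler"},
...         "transformation_params": {"0": {}},
...     },
...     model_str="ETS",
...     parameter_dict={},
... )
>>> prediction = model.fit_predict(df, forecast_length=14)
>>> prediction.forecast.head()
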
autots.RandomTransform(transformer_list: dict = {'AlignLastDiff': 0.01, 'AlignLastValue': 0.2, 'AnomalyRemoval': 0.03, 'BKBandpassFilter': 0.01, 'BTCD': 0.005, 'CenterLastValue': 0.01, 'CenterSplit': 0.01, 'ChangepointDetector': 0.01, 'ChangepointDetrend': 0.01, 'ClipOutliers': 0.03, 'CointegrationTransformer': 0.01, 'Constraint': 0.01, 'ConvolutionFilter': 0.01, 'CumSumTransformer': 0.005, 'DatepartRegression': 0.01, 'Detrend': 0.02, 'DiffSmoother': 0.005, 'DifferencedTransformer': 0.05, 'Discretize': 0.01, 'EWMAFilter': 0.02, 'FFTDecomposition': 0.01, 'FFTFilter': 0.01, 'FIRFilter': 0.01, 'FastICA': 0.005, 'G711Scaler': 0.01, 'G726Filter': 0.01, 'HPFilter': 0.01, 'HistoricValues': 0.01, 'HolidayTransformer': 0.01, 'IntermittentOccurrence': 0.01, 'KalmanSmoothing': 0.02, 'LevelShiftTransformer': 0.03, 'LocalLinearTrend': 0.01, 'Log': 0.01, 'MaxAbsScaler': 0.03, 'MeanDifference': 0.002, 'MeanPercentSplitter': 0.01, 'MinMaxScaler': 0.03, 'PCA': 0.01, 'PctChangeTransformer': 0.01, 'PositiveShift': 0.02, 'PowerTransformer': 0.01, 'QuantileTransformer': 0.03, 'ReconciliationTransformer': 0.01, 'RegressionFilter': 0.01, 'ReplaceConstant': 0.02, 'RobustScaler': 0.03, 'RollingMean100thN': 0.01, 'RollingMeanTransformer': 0.02, 'Round': 0.02, 'STLFilter': 0.01, 'ScipyFilter': 0.02, 'SeasonalDifference': 0.06, 'ShiftFirstValue': 0.01, 'SinTrend': 0.01, 'Slice': 0.02, 'StandardScaler': 0.04, 'ThetaTransformer': 0.01, 'UpscaleDownscaleTransformer': 0.005, 'bkfilter': 0.05, 'cffilter': 0.01, 'convolution_filter': 0.001, None: 0.0}, transformer_max_depth: int = 4, na_prob_dict: dict = {'DatepartRegressionImputer': 0.01, 'IterativeImputer': 0.025, 'IterativeImputerExtraTrees': 0.0001, 'KNNImputer': 0.02, 'SeasonalityMotifImputer1K': 0.005, 'SeasonalityMotifImputerLinMix': 0.005, 'fake_date': 0.1, 'ffill': 0.4, 'ffill_mean_biased': 0.1, 'interpolate': 0.4, 'mean': 0.06, 'median': 0.03, 'rolling_mean': 0.1, 'rolling_mean_24': 0.1, 'seasonal_linear': 0.01, 'seasonal_linear_window_3': 0.01, 'zero': 0.05, None: 0.001}, fast_params: bool | None = None, superfast_params: bool | None = None, traditional_order: bool = False, transformer_min_depth: int = 1, allow_none: bool = True, no_nan_fill: bool = False, exclude_fillna: list | None = None)

Return a dict of randomly chosen transformation selections.

BTCD is used as a signal that slow parameters are allowed.

Parameters:

exclude_fillna (list) – list of fillna methods to exclude from selection (e.g., [‘fake_date’])
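
A hedged sketch: draw a random transformation specification and apply it via GeneralTransformer (assuming the returned dict carries the fillna / transformations / transformation_params keys used elsewhere in this module):

>>> from autots import GeneralTransformer, RandomTransform, load_daily
>>> params = RandomTransform(transformer_max_depth=3)
>>> transformer = GeneralTransformer(**params)
>>> transformed = transformer.fit_transform(load_daily(long=False))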

class autots.SyntheticDailyGenerator(start_date='2015-01-01', n_days=2555, n_series=10, random_seed=42, trend_changepoint_freq=0.5, level_shift_freq=0.1, level_shift_strength=0.4, anomaly_freq=0.05, shared_anomaly_prob=0.2, shared_level_shift_prob=0.2, weekly_seasonality_strength=1.0, yearly_seasonality_strength=1.0, noise_level=0.1, include_regressors=False, anomaly_types=None, disable_holiday_splash=False)

Bases: object

Generate realistic synthetic daily time series data with labeled components.

Creates multivariate time series with:
  • Piecewise linear trends with changepoints
  • Level shifts (instantaneous and ramped)
  • Seasonality (weekly, yearly) with stochastic variation
  • Holiday effects (common and custom) with splash and bridge effects
  • Anomalies with various post-event patterns
  • Noise with regime changes
  • Optional regressor impacts
  • Business day series with weekend NaN
  • Multiple scales across series

All components are labeled and stored for model evaluation.

Variability Across Series:
  • Noise levels vary 0.5x-2.0x the base noise_level per series
  • Weekly seasonality strength varies 0.3x-2.5x per series
  • Yearly seasonality strength varies 0.2x-2.0x per series
  • Level shift frequency varies across series (some have none, some have several)
  • This creates a range from subtle, hard-to-detect patterns to very obvious ones

Event Scaling with Dataset Length:
  • Events (anomalies, level shifts, etc.) scale appropriately with n_days
  • Short datasets (< 1 year) use probabilistic event generation
  • Longer datasets use Poisson-based event counts
  • Level shifts are rare events, appropriately distributed

Template Compatibility:
  • Template structure is compatible with TimeSeriesFeatureDetector
  • Both use the same JSON-friendly format for components and labels
  • Templates can be saved/loaded and used for model evaluation

Parameter Tuning:
  • Use tune_to_data() to optimize parameters to match real-world data
  • Tuning adjusts frequency and strength parameters based on statistical properties
  • See TUNING_GUIDE.md for detailed usage examples

Parameters:
  • start_date (str or pd.Timestamp) – Start date for the time series

  • n_days (int) – Number of days to generate

  • n_series (int) – Number of time series to generate

  • random_seed (int) – Random seed for reproducibility

  • trend_changepoint_freq (float) – Probability per year of a trend changepoint (default 0.5)

  • level_shift_freq (float) – Probability per year of a level shift (default 0.1)

  • level_shift_strength (float) – Controls the magnitude of level shifts as a percentage of the series baseline. Shifts will be sampled from 10% to this value (skewed toward 10%), but always at least 5x the noise standard deviation for detectability (default 0.4 = 40%)

  • anomaly_freq (float) – Probability per week of an anomaly (default 0.05)

  • weekly_seasonality_strength (float) – Base strength of weekly seasonality (default 1.0) Actual per-series strength will vary 0.3x-2.5x this value

  • yearly_seasonality_strength (float) – Base strength of yearly seasonality (default 1.0) Actual per-series strength will vary 0.2x-2.0x this value

  • noise_level (float) – Base noise level (default 0.1, relative to signal) Actual per-series level will vary 0.5x-2.0x this value

  • include_regressors (bool) – Whether to include regressor effects (default False)

  • anomaly_types (list of str or None) – List of anomaly types to generate. Valid types are: ‘point_outlier’, ‘noisy_burst’, ‘impulse_decay’, ‘linear_decay’, ‘transient_change’ If None (default), all types will be generated

  • disable_holiday_splash (bool) – If True, holidays will only affect a single day with no splash or bridge effects (default False)

Examples

Basic usage:

>>> from autots.datasets import generate_synthetic_daily_data
>>> gen = generate_synthetic_daily_data(n_days=365, n_series=5)
>>> data = gen.get_data()
>>> labels = gen.get_all_labels()

Tuning to real-world data:

>>> import pandas as pd
>>> real_data = pd.read_csv('real_data.csv', index_col=0, parse_dates=True)
>>> gen = generate_synthetic_daily_data(
...     start_date=real_data.index[0],
...     n_days=len(real_data),
...     n_series=len(real_data.columns),
... )
>>> results = gen.tune_to_data(real_data, n_iterations=20, verbose=True)
>>> gen._generate()  # Regenerate with tuned parameters
>>> tuned_data = gen.get_data()
SERIES_TYPE_DESCRIPTIONS = {'autocorrelated_noise': 'Autocorrelated Noise (AR)', 'business_day': 'Business Day (weekend NaN)', 'granger_lagged': 'Granger Lagged (7-day lag from Lunar Holidays)', 'lunar_holidays': 'Lunar Holidays', 'multiplicative_seasonality': 'Multiplicative Seasonality (AR noise)', 'no_level_shifts': 'No Level Shifts', 'ramadan_holidays': 'Ramadan Holidays', 'saturating_trend': 'Saturating Trend (logistic)', 'seasonality_changepoints': 'Seasonality Changepoints', 'standard': 'Standard', 'time_varying_seasonality': 'Time-Varying Seasonality', 'variance_regimes': 'Variance Regimes (GARCH)'}
TEMPLATE_VERSION = '1.0'
get_all_labels(series_name=None)

Get all labels in a structured format for easy model evaluation.

Parameters:

series_name (str, optional) – If provided, return labels for specific series only.

Returns:

Comprehensive dictionary of all labels and metadata.

Return type:

dict

get_anomalies(series_name=None)

Get anomaly labels: {series_name: [(date, magnitude, type, duration, shared), …]}

get_components(series_name=None)

Get individual components for analysis.

Parameters:

series_name (str, optional) – If provided, return components for specific series. If None, return all components.

Returns:

Dictionary of {series_name: {component_name: array}}

Return type:

dict

get_data()

Get the generated time series data.

get_holiday_config()

Get holiday splash/bridge configuration: {holiday_name: {‘has_splash’: bool, ‘has_bridge’: bool}}

get_holiday_impacts(series_name=None)

Get holiday impact labels (main holiday dates only): {series_name: {date: impact}}

get_holiday_splash_impacts(series_name=None)

Get holiday splash/bridge day impacts: {series_name: {date: impact}}

get_lagged_influences(series_name=None)

Get lagged influence information for Granger-style causal relationships.

Parameters:

series_name (str, optional) – If provided, return lagged influence info for specific series. If None, return all lagged influences.

Returns:

Dictionary of {series_name: {‘source’: source_series, ‘lag’: lag_days, ‘coefficient’: coef}} or single dict if series_name is specified

Return type:

dict

get_level_shifts(series_name=None)

Get level shift labels: {series_name: [(date, magnitude, type, shared), …]}

get_noise_changepoints(series_name=None)

Get noise distribution changepoints: {series_name: [(date, old_params, new_params), …]}

get_noise_to_signal_ratios()

Get noise-to-signal ratios for all series.

get_regressor_impacts(series_name=None)

Get regressor impacts: {series_name: {‘by_date’: {date: {regressor: impact}}, ‘coefficients’: {…}}}

get_regressors()

Get the generated regressors (if any).

get_seasonality_changepoints(series_name=None)

Get seasonality changepoints: {series_name: [(date, description), …]}

get_series_noise_levels()

Get per-series noise levels.

get_series_scales()

Get scale factors for all series.

get_series_seasonality_strengths()

Get per-series seasonality strengths.

get_series_type_description(series_name)

Get human-readable description for a series type.

Parameters:

series_name (str) – Name of the series

Returns:

Human-readable description of the series type

Return type:

str

get_template(series_name=None, deep=True)

Get the JSON-friendly template describing the generated data.

get_trend_changepoints(series_name=None)

Get trend changepoint labels: {series_name: [(date, old_slope, new_slope), …]}

machine_summary(series_name=None, include_events=True, include_regressors=True, max_events_per_type=25, round_decimals=6, as_json=False)

Return a structured summary tailored for LLM or tool consumption.

plot(series_name=None, figsize=(16, 12), save_path=None, show=True)

Plot a series with all its labeled components clearly marked.

Parameters:
  • series_name (str, optional) – Name of series to plot. If None, randomly selects one.

  • figsize (tuple, optional) – Figure size (width, height) in inches. Default (16, 12).

  • save_path (str, optional) – If provided, saves the plot to this path instead of displaying.

  • show (bool, optional) – Whether to display the plot. Default True.

Returns:

fig – The generated figure object

Return type:

matplotlib.figure.Figure

Raises:

ImportError – If matplotlib is not installed

classmethod render_template(template, return_components=False)

Render a template into time series using the generator’s renderer.
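
A hedged round-trip sketch (assuming generation runs at construction, so the template is immediately available, as in the Basic usage example above):

>>> from autots import SyntheticDailyGenerator
>>> gen = SyntheticDailyGenerator(n_days=365, n_series=3, random_seed=7)
>>> template = gen.get_template()
>>> rebuilt = SyntheticDailyGenerator.render_template(template)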

summary()

Print a summary of the generated data.

to_csv(filepath, include_regressors=False)

Save generated data to CSV.

Parameters:
  • filepath (str) – Path to save the CSV file

  • include_regressors (bool) – Whether to include regressors in the output

tune_to_data(target_df, n_iterations=20, n_standard_series=None, metric='mse', verbose=True, random_seed=None)

Tune generator parameters to match real-world time series data.

This method optimizes the generator’s parameters to minimize the difference between synthetic data and real-world data based on distributional statistics. Special series types are not tuned but will still be generated with optimized base parameters.

TODO: this is a fairly basic implementation, and won’t tune many aspects of real world data

Parameters:
  • target_df (pd.DataFrame) – Real-world time series data to match (DatetimeIndex, numeric columns)

  • n_iterations (int, optional) – Number of optimization iterations (default 20)

  • n_standard_series (int, optional) – Number of standard series to generate for comparison during tuning. If None, uses min(target_df.shape[1], 5) series.

  • metric (str, optional) – Distance metric to minimize: ‘mse’, ‘mae’, ‘wasserstein’ (default ‘mse’)

  • verbose (bool, optional) – Whether to print progress (default True)

  • random_seed (int, optional) – Random seed for tuning process (default None, uses current random_seed)

Returns:

Dictionary containing:
  • ‘best_params’: Optimized parameter dictionary
  • ‘best_score’: Best score achieved
  • ‘target_stats’: Statistics from target data
  • ‘synthetic_stats’: Statistics from best synthetic data (scaled)
  • ‘scale_multiplier’: Factor to multiply synthetic data by to match target magnitude

Return type:

dict

Notes

Updates self with best parameters found. After calling this method, new data generation will use the tuned parameters.

Important: The synthetic data is generated on a base scale (~50), which may differ from your real-world data scale. The returned ‘scale_multiplier’ should be applied to generated data to match the magnitude of the target data:

>>> gen._generate()  # Regenerate with tuned parameters
>>> scaled_data = gen.data * gen.tuning_results['scale_multiplier']

The scale multiplier matches the mean of absolute means between target and synthetic data, ensuring the overall magnitude is similar.

Raises:
  • ImportError – If scipy is not installed (required for optimization)

  • ValueError – If target_df is invalid

autots.TransformTS

alias of GeneralTransformer

autots.create_lagged_regressor(df, forecast_length: int, frequency: str = 'infer', scale: bool = True, summarize: str | None = None, backfill: str = 'bfill', n_jobs: str = 'auto', fill_na: str = 'ffill')

Create a regressor of features lagged by forecast length. Useful for some models that don’t otherwise use such information.

It is recommended that the first forecast_length rows (.head(forecast_length)) of both regressor_train and the training df be dropped: df = df.iloc[forecast_length:]

Parameters:
  • df (pd.DataFrame) – training data

  • forecast_length (int) – length of forecasts, to shift data by

  • frequency (str) – the ever necessary frequency for datetime things. Default ‘infer’

  • scale (bool) – if True, use the StandardScaler to standardize the features

  • summarize (str) – options to summarize the features, if large: ‘pca’, ‘median’, ‘mean’, ‘mean+std’, ‘feature_agglomeration’, ‘gaussian_random_projection’, “auto”

  • backfill (str) – method to deal with the NaNs created by shifting
    ‘bfill’ - backfill with last values
    ‘ETS’ - backfill with ETS backwards forecast
    ‘DatepartRegression’ - backfill with DatepartRegression

  • fill_na (str) – method to prefill NAs in data, same methods as available elsewhere

Returns:

regressor_train, regressor_forecast
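
A hedged usage sketch following the recommendation above to drop the first forecast_length rows of both the training data and regressor_train:

>>> from autots import create_lagged_regressor, load_daily
>>> df = load_daily(long=False)
>>> forecast_length = 14
>>> regressor_train, regressor_forecast = create_lagged_regressor(
...     df, forecast_length=forecast_length, summarize="median"
... )
>>> df = df.iloc[forecast_length:]
>>> regressor_train = regressor_train.iloc[forecast_length:]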

autots.create_regressor(df, forecast_length, frequency: str = 'infer', holiday_countries: list = ['US'], datepart_method: str = 'simple_binarized', drop_most_recent: int = 0, scale: bool = True, summarize: str = 'auto', backfill: str = 'bfill', n_jobs: str = 'auto', fill_na: str = 'ffill', aggfunc: str = 'first', encode_holiday_type=False, holiday_detector_params={'anomaly_detector_params': {'forecast_params': None, 'method': 'mad', 'method_params': {'alpha': 0.05, 'distribution': 'gamma'}, 'transform_dict': {'fillna': None, 'transformation_params': {'0': {}}, 'transformations': {'0': 'DifferencedTransformer'}}}, 'output': 'univariate', 'splash_threshold': None, 'threshold': 0.8, 'use_dayofmonth_holidays': True, 'use_hebrew_holidays': False, 'use_islamic_holidays': False, 'use_lunar_holidays': False, 'use_lunar_weekday': False, 'use_wkdeom_holidays': False, 'use_wkdom_holidays': True}, holiday_regr_style: str = 'flag', preprocessing_params: dict | None = None, fft_n_harmonics: int = 6, fft_detrend: str = 'linear')

Create a regressor from information available in the existing dataset. Components are lagged data, datepart information, holidays, and FFT harmonic features.

This function has been confusing people. It is NOT necessary for machine learning models; in AutoTS, those internally create more elaborate feature sets separately. It may, however, help some other models (GLM, ARIMA) which accept regressors but won’t build a regressor feature set internally, and it allows post-hoc customization as needed before input to AutoTS.

It is recommended that the first forecast_length rows (.head(forecast_length)) of both regressor_train and the training df be dropped: df = df.iloc[forecast_length:]. If you don’t want the lagged features, set summarize=”median”, which yields only one such column, which can then be easily dropped.

Parameters:
  • df (pd.DataFrame) – WIDE style dataframe (use long_to_wide if the data isn’t already); categorical series will be discarded, if present

  • forecast_length (int) – time ahead that will be forecast

  • frequency (str) – those annoying offset codes you have to always use for time series

  • holiday_countries (list) – list of countries to pull holidays for. Requires the holidays package. Can also be a dict of {‘country’: ‘subdiv’} to include a subdivision (state)

  • datepart_method (str) – see date_part from seasonal

  • scale (bool) – if True, use the StandardScaler to standardize the features

  • summarize (str) – options to summarize the features, if large: ‘pca’, ‘median’, ‘mean’, ‘mean+std’, ‘feature_agglomeration’ (auto default when larger data), ‘gaussian_random_projection’

  • backfill (str) – method to deal with the NaNs created by shifting
    ‘bfill’ - backfill with last values
    ‘ETS’ - backfill with ETS backwards forecast
    ‘DatepartRegression’ - backfill with DatepartRegression

  • fill_na (str) – method to prefill NAs in data, same methods as available elsewhere

  • aggfunc (str) – str or func, used if frequency is resampled

  • encode_holiday_type (bool) – if True, returns column per holiday, ONLY for holidays package country holidays (not Detector)

  • holiday_detector_params (dict) – passed to HolidayDetector, or None

  • holiday_regr_style (str) – passed to detector’s dates_to_holidays ‘flag’, ‘series_flag’, ‘impact’

  • preprocessing_params (dict) – GeneralTransformer params to be applied before regressor creation

  • fft_n_harmonics (int) – if not None, extract FFT harmonic features. Number of harmonics to extract (e.g., 10). Can also be float for percentage or None to disable FFT features

  • fft_detrend (str) – detrending method for FFT - None, ‘linear’, ‘quadratic’, ‘cubic’, ‘quartic’

Returns:

regressor_train, regressor_forecast
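
A hedged end-to-end sketch (AutoTS settings here are illustrative) showing the regressor passed to fit and predict, again dropping the first forecast_length rows as recommended above:

>>> from autots import AutoTS, create_regressor, load_daily
>>> df = load_daily(long=False)
>>> forecast_length = 14
>>> regressor_train, regressor_forecast = create_regressor(
...     df, forecast_length=forecast_length, holiday_countries=["US"]
... )
>>> df = df.iloc[forecast_length:]
>>> regressor_train = regressor_train.iloc[forecast_length:]
>>> model = AutoTS(forecast_length=forecast_length, max_generations=4)
>>> model = model.fit(df, future_regressor=regressor_train)
>>> prediction = model.predict(future_regressor=regressor_forecast)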

autots.generate_synthetic_daily_data(start_date='2015-01-01', n_days=2555, n_series=10, random_seed=42, **kwargs)

Quick function to generate synthetic daily data.

Parameters:
  • start_date (str) – Start date for the time series

  • n_days (int) – Number of days to generate

  • n_series (int) – Number of series to generate

  • random_seed (int) – Random seed for reproducibility

  • **kwargs – Additional parameters passed to SyntheticDailyGenerator

Returns:

generator – Generator object with data and labels

Return type:

SyntheticDailyGenerator

autots.infer_frequency(df_wide, warn=True, **kwargs)

Infer the frequency in a slightly more robust way.

Parameters:
  • df_wide (pd.Dataframe or pd.DatetimeIndex) – input to pull frequency from

  • warn (bool) – unused, here to make swappable with pd.infer_freq
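
A minimal example using the bundled daily sample data (which should yield the daily frequency alias):

>>> from autots import infer_frequency, load_daily
>>> freq = infer_frequency(load_daily(long=False))  # expected 'D' for daily sample data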

autots.load_artificial(long=False, date_start=None, date_end=None)

Load artificially generated series from random distributions.

Parameters:
  • long (bool) – if True long style data, if False, wide style data

  • date_start – str or datetime.datetime of start date

  • date_end – str or datetime.datetime of end date

autots.load_daily(long: bool = True)

Daily sample data.

```
# most of the wiki data was chosen to show holidays or holiday-like patterns
wiki = [
    'United_States', 'Germany', 'List_of_highest-grossing_films', 'Jesus',
    'Michael_Jackson', 'List_of_United_States_cities_by_population',
    'Microsoft_Office', 'Google_Chrome', 'Periodic_table', 'Standard_deviation',
    'Easter', 'Christmas', 'Chinese_New_Year', 'Thanksgiving',
    'List_of_countries_that_have_gained_independence_from_the_United_Kingdom',
    'History_of_the_hamburger', 'Elizabeth_II', 'William_Shakespeare',
    'George_Washington', 'Cleopatra', 'all',
]

df2 = load_live_daily(
    observation_start="2017-01-01",
    weather_years=7,
    trends_list=None,
    gov_domain_list=None,
    wikipedia_pages=wiki,
    fred_series=['DGS10', 'T5YIE', 'SP500', 'DEXUSEU'],
    sleep_seconds=10,
    fred_key="93873d40f10c20fe6f6e75b1ad0aed4d",
    weather_data_types=["WSF2", "PRCP"],
    weather_stations=["USW00014771"],  # looking for intermittent
    tickers=None,
    london_air_stations=None,
    weather_event_types=None,
    earthquake_min_magnitude=None,
)
data_file_name = join("autots", "datasets", "data", "holidays.zip")
df2.to_csv(
    data_file_name,
    index=True,
    compression={
        'method': 'zip',
        'archive_name': 'holidays.csv',
        'compresslevel': 9,  # Maximum compression level (0-9)
    },
)
```

Sources: Wikimedia Foundation

Parameters:

long (bool) – if True, return data in long format. Otherwise return wide

autots.load_hourly(long: bool = True)

Traffic data from the MN DOT via the UCI data repository.

autots.load_linear(long=False, shape=None, start_date: str = '2021-01-01', introduce_nan: float | None = None, introduce_random: float | None = None, random_seed: int = 123)

Create a dataset of simple linear series for testing edge cases.

Parameters:
  • long (bool) – whether to make long or wide

  • shape (tuple) – shape of output dataframe

  • start_date (str) – first date of index

  • introduce_nan (float) – percent of rows to make null. 0.2 = 20%

  • introduce_random (float) – shape of gamma distribution

  • random_seed (int) – seed for random

autots.load_live_daily(long: bool = False, observation_start: str | None = None, observation_end: str | None = None, fred_key: str | None = None, fred_series=['DGS10', 'T5YIE', 'SP500', 'DCOILWTICO', 'DEXUSEU', 'WPU0911'], tickers: list = ['MSFT'], trends_list: list = ['forecasting', 'cycling', 'microsoft'], trends_geo: str = 'US', weather_data_types: list = ['AWND', 'WSF2', 'TAVG', 'PRCP'], weather_stations: list = ['USW00094846', 'USW00014925', 'USW00014771'], weather_years: int = 5, noaa_cdo_token: str | None = None, london_air_stations: list = ['CT3', 'SK8'], london_air_species: str = 'PM25', london_air_days: int = 180, earthquake_days: int = 180, earthquake_min_magnitude: int = 5, gsa_key: str | None = None, nasa_api_key: str = 'DEMO_KEY', gov_domain_list=['nasa.gov'], gov_domain_limit: int = 600, wikipedia_pages: list = ['Microsoft_Office', 'List_of_highest-grossing_films'], wiki_language: str = 'en', weather_event_types=['%28Z%29+Winter+Weather', '%28Z%29+Winter+Storm'], caiso_query: str | None = None, eia_key: str | None = None, eia_respondents: list = ['MISO', 'PJM', 'TVA', 'US48'], timeout: float = 300.05, sleep_seconds: int = 10, **kwargs)

Generates a dataframe of data up to the present day. Requires an active internet connection. Try to be respectful of these free data sources by not calling them too often or too heavily. Pass None instead of specification lists to exclude a data source.

Parameters:
  • long (bool) – whether to return in long format or wide

  • observation_start (str) – %Y-%m-%d earliest day to retrieve, passed to Fred.get_series and yfinance.history note that apis with more restrictions have other default lengths below which ignore this

  • observation_end (str) – %Y-%m-%d most recent day to retrieve

  • fred_key (str) – https://fred.stlouisfed.org/docs/api/api_key.html

  • fred_series (list) – list of FRED series IDs. This requires fredapi package

  • tickers (list) – list of stock tickers, requires yfinance pypi package

  • trends_list (list) – list of search keywords, requires pytrends pypi package. None to skip.

  • weather_data_types (list) – from NCEI NOAA api data types, GHCN Daily Weather Elements PRCP, SNOW, TMAX, TMIN, TAVG, AWND, WSF1, WSF2, WSF5, WSFG

  • weather_stations (list) – from NCEI NOAA api station ids. Pass empty list to skip.

  • noaa_cdo_token (str) – API token from https://www.ncdc.noaa.gov/cdo-web/token (free, required for weather data)

  • london_air_stations (list) – londonair.org.uk source station IDs. Pass empty list to skip.

  • london_air_species (str) – what measurement to pull from London Air. Not all stations have all metrics.

  • earthquake_min_magnitude (int) – smallest earthquake magnitude to pull from earthquake.usgs.gov. Set None to skip this.

  • gsa_key (str) – api key from https://open.gsa.gov/api/dap/

  • nasa_api_key (str) – API key for https://api.nasa.gov/. Set to None to skip NASA DONKI data.

  • gov_domain_list (list) – list of government-run domains to get traffic data for. Can be very slow, so fewer is better. Some examples: [‘usps.com’, ‘ncbi.nlm.nih.gov’, ‘cdc.gov’, ‘weather.gov’, ‘irs.gov’, “usajobs.gov”, “studentaid.gov”, ‘nasa.gov’, “uk.usembassy.gov”, “tsunami.gov”]

  • gov_domain_limit (int) – max number of records. Smaller will be faster. Max is currently 10000.

  • wikipedia_pages (list) – list of Wikipedia pages, html encoded if needed (underscore for space)

  • weather_event_types (list) – list of html encoded severe weather event types https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Export-Format.pdf

  • caiso_query (str) – ENE_SLRS or None, can try others but probably won’t work due to other hardcoded params

  • timeout (float) – used by some queries

  • sleep_seconds (int) – increasing this may reduce probability of server download failures

autots.load_monthly(long: bool = True)

Federal Reserve of St. Louis monthly economic indicators.

autots.load_sine(long=False, shape=None, start_date: str = '2021-01-01', introduce_random: float | None = None, random_seed: int = 123)

Create a dataset of sine waves for testing edge cases.

autots.load_weekdays(long: bool = False, categorical: bool = True, periods: int = 180)

Test edge cases by creating a Series with values as day of week.

Parameters:
  • long (bool) – if True, return a df with columns “value” and “datetime” if False, return a Series with dt index

  • categorical (bool) – if True, return str/object, else return int

  • periods (int) – number of periods, ie length of data to generate

autots.load_weekly(long: bool = True)

Weekly petroleum industry data from the EIA.

autots.load_yearly(long: bool = True)

Federal Reserve of St. Louis annual economic indicators.

autots.long_to_wide(df, date_col: str = 'datetime', value_col: str = 'value', id_col: str = 'series_id', aggfunc: str = 'first')

Take long data and convert into wide, cleaner data.

Parameters:
  • df (pd.DataFrame) – long format data, one row per datetime/series_id observation

  • date_col (str) – name of the column containing the datetime values

  • value_col (str) – the name of the column with the values of the time series (ie sales $)

  • id_col (str) – name of the id column, unique for each time series

  • aggfunc (str) – passed to pd.pivot_table, determines how to aggregate duplicates for series_id and datetime. Other options include “mean” and other numpy functions; beware that data must already be input as a numeric type for these to work. If categorical data is provided, aggfunc=’first’ is recommended.
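
A hedged example using the bundled daily data in long format (assuming its columns follow the default datetime / series_id / value names):

>>> from autots import load_daily, long_to_wide
>>> df_long = load_daily(long=True)
>>> df_wide = long_to_wide(
...     df_long,
...     date_col="datetime",
...     value_col="value",
...     id_col="series_id",
...     aggfunc="first",
... )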

autots.model_forecast(model_name, model_param_dict, model_transform_dict, df_train, forecast_length: int, frequency: str = 'infer', prediction_interval: float = 0.9, no_negatives: bool = False, constraint: float | None = None, future_regressor_train=None, future_regressor_forecast=None, holiday_country: str = 'US', startTimeStamps=None, grouping_ids=None, fail_on_forecast_nan: bool = True, random_seed: int = 2020, verbose: int = 0, n_jobs: int = 'auto', template_cols: list = ['Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'], horizontal_subset: list | None = None, return_model: bool = False, current_model_file: str | None = None, model_count: int = 0, force_gc: bool = False, internal_validation: bool = False, **kwargs)

Takes numeric data, returns numeric forecasts.

Only one model (albeit potentially an ensemble)! Horizontal ensembles cannot be nested; other ensemble types can be.

Well, she turned me into a newt. A newt? I got better. -Python

Parameters:
  • model_name (str) – a string directing to the appropriate model, used in ModelMonster

  • model_param_dict (dict) – dictionary of parameters to be passed into the model.

  • model_transform_dict (dict) – a dictionary of fillNA and transformation methods to be used pass an empty dictionary if no transformations are desired.

  • df_train (pandas.DataFrame) – numeric training dataset of DatetimeIndex and series as cols

  • forecast_length (int) – number of periods to forecast

  • frequency (str) – str representing frequency alias of time series

  • prediction_interval (float) – width of errors (note: rarely do the intervals accurately match the % asked for…)

  • no_negatives (bool) – whether to force all forecasts to be > 0

  • constraint (float) – when not None, use this value * data st dev above max or below min for constraining forecast values.

  • future_regressor_train (pd.Series) – with datetime index, of known in advance data, section matching train data

  • future_regressor_forecast (pd.Series) – with datetime index, of known in advance data, section matching test data

  • holiday_country (str) – passed through to holiday package, used by a few models as 0/1 regressor.

  • n_jobs (int) – number of CPUs to use when available.

  • template_cols (list) – column names of columns used as model template

  • horizontal_subset (list) – columns of df_train to use for forecast, meant for internal use for horizontal ensembling

  • fail_on_forecast_nan (bool) – if False, return forecasts even if NaN present, if True, raises error if any nan in forecast. True is recommended.

  • return_model (bool) – if True, forecast will have .model and .transformer attributes set to model object. Only works for non-ensembles.

  • current_model_file (str) – file path to write to disk of current model params (for debugging if computer crashes). .json is appended

  • force_gc (bool) – if True, run gc.collect() after each model

  • internal_validation – niche flag to tell that it is running inside a template model search

Returns:

Prediction from AutoTS model object

Return type:

PredictionObject (autots.PredictionObject)
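
A hedged single-model sketch (model choice and parameters are illustrative):

>>> from autots import model_forecast, load_daily
>>> prediction = model_forecast(
...     model_name="AverageValueNaive",
...     model_param_dict={"method": "Mean"},
...     model_transform_dict={
...         "fillna": "mean",
...         "transformations": {"0": "DifferencedTransformer"},
...         "transformation_params": {"0": {}},
...     },
...     df_train=load_daily(long=False),
...     forecast_length=14,
... )
>>> prediction.forecast.head()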