autots.evaluator package

Submodules

autots.evaluator.anomaly_detector module

Anomaly Detector. Created on Mon Jul 18 14:19:55 2022.

@author: Colin

class autots.evaluator.anomaly_detector.AnomalyDetector(output='multivariate', method='zscore', transform_dict={'transformation_params': {0: {'datepart_method': 'simple_3', 'regression_model': {'model': 'ElasticNet', 'model_params': {}}}}, 'transformations': {0: 'DatepartRegression'}}, forecast_params=None, method_params={}, eval_period=None, isolated_only=False, n_jobs=1)

Bases: object

detect(df)

All methods return -1 for anomalies.

Parameters:

df (pd.DataFrame) – pandas wide-style data

Returns:

pd.DataFrame (classifications, -1 = outlier, 1 = not outlier), pd.DataFrame (scores)
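The -1 / 1 return convention can be illustrated with a plain z-score rule. This is a minimal sketch on a single list of numbers, not the library's implementation: `zscore_classify` and its `threshold` argument are hypothetical names, and the real detector also applies the configured transformations before scoring.

```python
from statistics import mean, pstdev


def zscore_classify(values, threshold=3.0):
    """Return (classifications, scores); -1 marks an anomaly, 1 a normal point."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series has no spread, so nothing can be flagged.
        scores = [0.0 for _ in values]
    else:
        scores = [abs(v - mu) / sigma for v in values]
    classifications = [-1 if s > threshold else 1 for s in scores]
    return classifications, scores


series = [10, 11, 9, 10, 12, 10, 11, 60]
labels, scores = zscore_classify(series, threshold=2.0)
print(labels)
```

Only the last point (60) exceeds the threshold here, so it is the only one labeled -1.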

fit(df)
fit_anomaly_classifier()

Fit a model to predict if a score is an anomaly.

static get_new_params(method='random')

Generate random new parameter combinations.

Parameters:

method (str) – ‘fast’, ‘deep’, ‘default’, or any of the anomaly method names (ie ‘IQR’) to specify only that method

plot(series_name=None, title=None, plot_kwargs={})
score_to_anomaly(scores)

A DecisionTree model, used as models are nonstandard (and nonparametric).

class autots.evaluator.anomaly_detector.HolidayDetector(anomaly_detector_params={}, threshold=0.8, min_occurrences=2, splash_threshold=0.65, use_dayofmonth_holidays=True, use_wkdom_holidays=True, use_wkdeom_holidays=True, use_lunar_holidays=True, use_lunar_weekday=False, use_islamic_holidays=True, use_hebrew_holidays=True, output: str = 'multivariate', n_jobs: int = 1)

Bases: object

dates_to_holidays(dates, style='flag', holiday_impacts=False)

Populate date information for a given pd.DatetimeIndex.

Parameters:
  • dates (pd.DatetimeIndex) – list of dates

  • day_holidays (pd.DataFrame) – list of month/day holidays. Pass None if not available

  • style (str) – option for how to return information:

    “long” - return date, name, series for all holidays in a long style dataframe

    “impact” - returns dates, series with values of sum of impacts (if given) or joined string of holiday names

    ‘flag’ - return dates, holidays flag (is not 0-1 but rather sum of input series impacted for that holiday and day)

    ‘prophet’ - return format required for prophet. Will need to be filtered on series for multivariate case

    ‘series_flag’ - dates, series 0/1 for if holiday occurred in any calendar

  • holiday_impacts (dict) – a dict passed to .replace containing values for holiday_names, or str ‘value’ or ‘anomaly_score’

detect(df)

Run holiday detection. Input wide-style pandas time series.

fit(df)
static get_new_params(method='random')
plot(series_name=None, include_anomalies=True, title=None, plot_kwargs={}, series=None)
plot_anomaly(kwargs={})

autots.evaluator.auto_model module

Mid-level helper functions for AutoTS.

autots.evaluator.auto_model.ModelMonster(model: str, parameters: dict = {}, frequency: str = 'infer', prediction_interval: float = 0.9, holiday_country: str = 'US', startTimeStamps=None, forecast_length: int = 14, random_seed: int = 2020, verbose: int = 0, n_jobs: int | None = None, **kwargs)

Directs strings and parameters to appropriate model objects.

Parameters:
  • model (str) – Name of Model Function

  • parameters (dict) – Dictionary of parameters to pass through to model

class autots.evaluator.auto_model.ModelPrediction(forecast_length: int, transformation_dict: dict, model_str: str, parameter_dict: dict, frequency: str = 'infer', prediction_interval: float = 0.9, no_negatives: bool = False, constraint: float | None = None, holiday_country: str = 'US', startTimeStamps=None, grouping_ids=None, fail_on_forecast_nan: bool = True, return_model: bool = False, random_seed: int = 2020, verbose: int = 0, n_jobs: int | None = None, current_model_file: str | None = None, model_count: int = 0, force_gc: bool = False)

Bases: ModelObject

Feed parameters into modeling pipeline. A class object, does NOT work with ensembles.

Parameters:
  • df_train (pandas.DataFrame) – numeric training dataset of DatetimeIndex and series as cols

  • forecast_length (int) – number of periods to forecast

  • transformation_dict (dict) – a dictionary of outlier, fillNA, and transformation methods to be used

  • model_str (str) – a string to be directed to the appropriate model, used in ModelMonster

  • frequency (str) – str representing frequency alias of time series

  • prediction_interval (float) – width of errors (note: rarely do the intervals accurately match the % asked for…)

  • no_negatives (bool) – whether to force all forecasts to be > 0

  • constraint (float) – when not None, use this value * data st dev above max or below min for constraining forecast values.

  • future_regressor_train (pd.Series) – with datetime index, of known in advance data, section matching train data

  • future_regressor_forecast (pd.Series) – with datetime index, of known in advance data, section matching test data

  • holiday_country (str) – passed through to holiday package, used by a few models as 0/1 regressor.

  • startTimeStamps (pd.Series) – index (series_ids), columns (Datetime of First start of series)

  • fail_on_forecast_nan (bool) – if False, return forecasts even if NaN present, if True, raises error if any nan in forecast

  • return_model (bool) – if True, forecast will have .model and .transformer attributes set to model object.

  • n_jobs (int) – number of processes

  • current_model_file (str) – file path to write to disk of current model params (for debugging if computer crashes). .json is appended

Returns:

Prediction from AutoTS model object

Return type:

PredictionObject (autots.PredictionObject)
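The float form of `constraint` described above ("this value * data st dev above max or below min") can be sketched as a simple clipping step. This is an illustration under stated assumptions, not the library's code: `apply_constraint` is a hypothetical helper, it works on one series held in a list, and the choice of population standard deviation is an assumption.

```python
from statistics import pstdev


def apply_constraint(forecast, history, constraint):
    """Clip forecast values to [min - c * sd, max + c * sd] of the history."""
    if constraint is None:
        # No constraint requested: return the forecast unchanged.
        return list(forecast)
    spread = constraint * pstdev(history)  # population st dev is an assumption
    upper = max(history) + spread
    lower = min(history) - spread
    return [min(max(value, lower), upper) for value in forecast]


history = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # st dev is 2.0
clipped = apply_constraint([-3.0, 5.0, 20.0], history, constraint=1.0)
print(clipped)
```

With a standard deviation of 2.0, the bounds are 0.0 and 11.0, so the two extreme forecast values are pulled back to those limits.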

fit(df, future_regressor=None)
fit_data(df, future_regressor=None)
fit_predict(df, forecast_length, future_regressor_train=None, future_regressor_forecast=None)
predict(forecast_length=None, future_regressor=None)
autots.evaluator.auto_model.NewGeneticTemplate(model_results, submitted_parameters, sort_column: str = 'Score', sort_ascending: bool = True, max_results: int = 50, max_per_model_class: int = 5, top_n: int = 50, template_cols: list = ['Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'], transformer_list: dict = {}, transformer_max_depth: int = 8, models_mode: str = 'default', score_per_series=None, recursive_count=0, model_list=None)

Return new template given old template with model accuracies.

“No mating!” - Pattern, Sanderson

Parameters:
  • model_results (pandas.DataFrame) – models that have actually been run

  • submitted_parameters (pandas.DataFrame) – models tried (may have returned different parameters to results)

autots.evaluator.auto_model.RandomTemplate(n: int = 10, model_list: list = ['ZeroesNaive', 'LastValueNaive', 'AverageValueNaive', 'GLS', 'GLM', 'ETS'], transformer_list: dict = 'fast', transformer_max_depth: int = 8, models_mode: str = 'default')

Returns a template dataframe of randomly generated transformations, models, and hyperparameters.

Parameters:

n (int) – number of random models to return

class autots.evaluator.auto_model.TemplateEvalObject(model_results=pd.DataFrame(), per_timestamp_smape=pd.DataFrame(), per_series_metrics=pd.DataFrame(), per_series_mae=None, per_series_rmse=None, per_series_made=None, per_series_contour=None, per_series_spl=None, per_series_mle=None, per_series_imle=None, per_series_maxe=None, per_series_oda=None, per_series_mqae=None, per_series_dwae=None, per_series_ewmae=None, per_series_uwmse=None, per_series_smoothness=None, per_series_mate=None, per_series_matse=None, per_series_wasserstein=None, per_series_dwd=None, model_count: int = 0)

Bases: object

Object to contain all the failures!

full_mae_ids

list of model_ids corresponding to full_mae_errors

Type:

list

full_mae_errors

list of numpy arrays of shape (rows, columns), appended in order of validation. Only provided for ‘mosaic’ ensembling.

Type:

list

concat(another_eval)

Merge another TemplateEvalObject onto this one.

load(filename)
save(filename='initial_results.pickle')

Save results to a file.

Parameters:

filename (str) – *.pickle or *.csv. .pickle saves full results

autots.evaluator.auto_model.TemplateWizard(template, df_train, df_test, weights, model_count: int = 0, ensemble: list = ['mosaic', 'distance'], forecast_length: int = 14, frequency: str = 'infer', prediction_interval: float = 0.9, no_negatives: bool = False, constraint: float | None = None, future_regressor_train=None, future_regressor_forecast=None, holiday_country: str = 'US', startTimeStamps=None, random_seed: int = 2020, verbose: int = 0, n_jobs: int | None = None, validation_round: int = 0, current_generation: int = 0, max_generations: str = '0', model_interrupt: bool = False, grouping_ids=None, template_cols: list = ['Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'], traceback: bool = False, current_model_file: str | None = None, mosaic_used=None, force_gc: bool = False, additional_msg: str = '')

Take Template, returns Results.

There are some who call me… Tim. - Python

Parameters:
  • template (pandas.DataFrame) – containing model str, and json of transformations and hyperparameters

  • df_train (pandas.DataFrame) – numeric training dataset of DatetimeIndex and series as cols

  • df_test (pandas.DataFrame) – dataframe of actual values of (forecast length * n series)

  • weights (dict) – key = column/series_id, value = weight

  • ensemble (list) – list of ensemble types to prepare metric collection

  • forecast_length (int) – number of periods to forecast

  • transformation_dict (dict) – a dictionary of outlier, fillNA, and transformation methods to be used

  • model_str (str) – a string to be directed to the appropriate model, used in ModelMonster

  • frequency (str) – str representing frequency alias of time series

  • prediction_interval (float) – width of errors (note: rarely do the intervals accurately match the % asked for…)

  • no_negatives (bool) – whether to force all forecasts to be > 0

  • constraint (float) – when not None, use this value * data st dev above max or below min for constraining forecast values.

  • future_regressor_train (pd.Series) – with datetime index, of known in advance data, section matching train data

  • future_regressor_forecast (pd.Series) – with datetime index, of known in advance data, section matching test data

  • holiday_country (str) – passed through to holiday package, used by a few models as 0/1 regressor.

  • startTimeStamps (pd.Series) – index (series_ids), columns (Datetime of First start of series)

  • validation_round (int) – int passed to record current validation.

  • current_generation (int) – info to pass to print statements

  • max_generations (str) – info to pass to print statements

  • model_interrupt (bool) – if True, keyboard interrupts are caught and only break current model eval.

  • template_cols (list) – column names of columns used as model template

  • traceback (bool) – include traceback over just error representation

  • current_model_file (str) – file path to write to disk of current model params (for debugging if computer crashes). .json is appended

  • force_gc (bool) – if True, run gc.collect after every model run

Returns:

TemplateEvalObject

autots.evaluator.auto_model.UniqueTemplates(existing_templates, new_possibilities, selection_cols: list = ['Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'])

Returns unique dataframe rows from new_possibilities not in existing_templates.

Parameters:

selection_cols (list) – list of column names to use to judge uniqueness/match on
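The uniqueness check described above can be sketched without pandas by treating each template row as a dict. This is only an illustration of matching on `selection_cols`: `unique_templates` is a hypothetical helper, and dropping duplicates inside the new candidates themselves is a choice made in this sketch.

```python
def unique_templates(existing, new_possibilities, selection_cols):
    """Return rows of new_possibilities whose key columns are not already present."""
    seen = {tuple(row[col] for col in selection_cols) for row in existing}
    unique = []
    for row in new_possibilities:
        key = tuple(row[col] for col in selection_cols)
        if key not in seen:
            seen.add(key)  # also skips repeats within new_possibilities
            unique.append(row)
    return unique


cols = ["Model", "ModelParameters", "TransformationParameters", "Ensemble"]
existing = [
    {"Model": "ETS", "ModelParameters": "{}", "TransformationParameters": "{}", "Ensemble": 0},
]
candidates = [
    {"Model": "ETS", "ModelParameters": "{}", "TransformationParameters": "{}", "Ensemble": 0},
    {"Model": "GLS", "ModelParameters": "{}", "TransformationParameters": "{}", "Ensemble": 0},
    {"Model": "GLS", "ModelParameters": "{}", "TransformationParameters": "{}", "Ensemble": 0},
]
fresh = unique_templates(existing, candidates, cols)
print([row["Model"] for row in fresh])
```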

autots.evaluator.auto_model.back_forecast(df, model_name, model_param_dict, model_transform_dict, future_regressor_train=None, n_splits: int = 'auto', forecast_length=7, frequency='infer', prediction_interval=0.9, no_negatives=False, constraint=None, holiday_country='US', random_seed=123, n_jobs='auto', verbose=0, eval_periods: int | None = None, current_model_file: str | None = None, force_gc: bool = False, **kwargs)

Create forecasts for the historical training data, ie. backcast or back forecast.

This actually forecasts on historical data, these are not fit model values as are often returned by other packages. As such, this will be slower, but more representative of real world model performance. There may be jumps in data between chunks.

Args are same as for model_forecast except…

n_splits (int): how many pieces to split data into. Pass 2 for fastest, or “auto” for best accuracy

Returns a standard prediction object (access .forecast, .lower_forecast, .upper_forecast)

Parameters:

eval_periods (int) – if passed, only returns results for this many time steps of recent history
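The idea of forecasting history in chunks, rather than returning in-sample fitted values, can be sketched as follows. This is a simplified illustration and not the library's procedure: `chunked_back_forecast` and `naive_forecast` are hypothetical names, a last-value forecaster stands in for a real model, and the first chunk is simply left empty here because it has no earlier data to train on.

```python
def naive_forecast(train, horizon):
    """Stand-in model: repeat the last observed value."""
    return [train[-1]] * horizon


def chunked_back_forecast(history, n_splits, forecaster=naive_forecast):
    """Forecast each chunk of history using only the data that precedes it."""
    if n_splits < 2 or len(history) < n_splits:
        raise ValueError("need at least 2 splits and one point per split")
    chunk = len(history) // n_splits
    result = [None] * chunk  # no earlier data exists for the first chunk
    for i in range(1, n_splits):
        start = i * chunk
        # The last chunk absorbs any remainder.
        end = len(history) if i == n_splits - 1 else start + chunk
        result.extend(forecaster(history[:start], end - start))
    return result


back = chunked_back_forecast([1, 2, 3, 4, 5, 6, 7, 8, 9], n_splits=3)
print(back)
```

The output steps from 3 to 6 at the chunk boundary, which is the kind of jump between chunks the description warns about.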

autots.evaluator.auto_model.create_model_id(model_str: str, parameter_dict: dict = {}, transformation_dict: dict = {})

Create a hash ID which should be unique to the model parameters.
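One way to build such an ID is to hash a canonical serialization of the model name and both dictionaries. The sketch below is an assumption about the general technique only: `sketch_model_id` is a hypothetical name, and the hash function and serialization actually used by `create_model_id` are not stated in this reference.

```python
import hashlib
import json


def sketch_model_id(model_str, parameter_dict=None, transformation_dict=None):
    """Hash the model name and parameters into a repeatable hex ID."""
    payload = json.dumps(
        [model_str, parameter_dict or {}, transformation_dict or {}],
        sort_keys=True,  # key order must not change the ID
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


id_a = sketch_model_id("ETS", {"damped": False, "trend": None})
id_b = sketch_model_id("ETS", {"trend": None, "damped": False})
id_c = sketch_model_id("ETS", {"damped": True, "trend": None})
print(id_a == id_b, id_a == id_c)
```

Sorting keys before hashing means two dictionaries with the same contents map to the same ID regardless of insertion order.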

autots.evaluator.auto_model.dict_recombination(a: dict, b: dict)

Recombine two dictionaries with identical keys. Return new dict.
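A minimal sketch of recombination, assuming each key's value is taken at random from one of the two parents. The selection rule is an assumption, and `recombine` and its `rng` argument are hypothetical names, not the library's signature.

```python
import random


def recombine(a, b, rng=None):
    """Build a new dict, taking each key's value from either parent at random."""
    if a.keys() != b.keys():
        raise ValueError("both dictionaries must have identical keys")
    rng = rng or random.Random()
    return {key: rng.choice((a[key], b[key])) for key in a}


parent_a = {"trend": "add", "damped": False, "seasonal": None}
parent_b = {"trend": "mul", "damped": True, "seasonal": 12}
child = recombine(parent_a, parent_b, rng=random.Random(0))
print(child)
```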

autots.evaluator.auto_model.generate_score(model_results, metric_weighting: dict = {}, prediction_interval: float = 0.9)

Generate score based on relative accuracies.

SMAPE - smaller is better
MAE - smaller is better
RMSE - smaller is better
MADE - smaller is better
MLE - smaller is better
MAGE - smaller is better
SPL - smaller is better
ODA - bigger is better
DWAE - smaller is better
Contour - bigger is better (is 0 to 1)
Containment - bigger is better (is 0 to 1)
Runtime - smaller is better
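A weighted score built from relative accuracies might look like the sketch below, where each metric is scaled against the best model for that metric and a lower total is better. The scaling rule, the plain metric keys, and the `relative_score` helper are assumptions for illustration; the exact formula used by `generate_score` is not given in this reference.

```python
def relative_score(results, weights, bigger_is_better=("oda", "contour", "containment")):
    """Sum weighted ratios to the best model per metric; lower totals are better."""
    eps = 1e-12  # guards against division by zero
    scores = [0.0] * len(results)
    for metric, weight in weights.items():
        values = [row[metric] for row in results]
        if metric in bigger_is_better:
            best = max(values)
            ratios = [best / max(v, eps) for v in values]
        else:
            best = min(values)
            ratios = [v / max(best, eps) for v in values]
        # The best model for a metric contributes exactly `weight`.
        scores = [s + weight * r for s, r in zip(scores, ratios)]
    return scores


results = [
    {"smape": 10.0, "mae": 2.0, "oda": 0.5},
    {"smape": 20.0, "mae": 4.0, "oda": 0.25},
]
weights = {"smape": 5, "mae": 2, "oda": 1}
totals = relative_score(results, weights)
print(totals)
```

The first model is best on every metric, so its total equals the sum of the weights, and the second model scores twice as high.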

autots.evaluator.auto_model.generate_score_per_series(results_object, metric_weighting, total_validations=1, models_to_use=None)

Score generation on per_series_metrics for ensembles.

autots.evaluator.auto_model.horizontal_template_to_model_list(template)

Helper function to take a template dataframe of ensembles to a single list of models.

autots.evaluator.auto_model.model_forecast(model_name, model_param_dict, model_transform_dict, df_train, forecast_length: int, frequency: str = 'infer', prediction_interval: float = 0.9, no_negatives: bool = False, constraint: float | None = None, future_regressor_train=None, future_regressor_forecast=None, holiday_country: str = 'US', startTimeStamps=None, grouping_ids=None, fail_on_forecast_nan: bool = True, random_seed: int = 2020, verbose: int = 0, n_jobs: int = 'auto', template_cols: list = ['Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'], horizontal_subset: list | None = None, return_model: bool = False, current_model_file: str | None = None, model_count: int = 0, force_gc: bool = False, **kwargs)

Takes numeric data, returns numeric forecasts.

Only one model (albeit potentially an ensemble)! Horizontal ensembles cannot be nested, other ensemble types can be.

Well, she turned me into a newt. A newt? I got better. -Python

Parameters:
  • model_name (str) – a string to be directed to the appropriate model, used in ModelMonster

  • model_param_dict (dict) – dictionary of parameters to be passed into the model.

  • model_transform_dict (dict) – a dictionary of fillNA and transformation methods to be used. Pass an empty dictionary if no transformations are desired.

  • df_train (pandas.DataFrame) – numeric training dataset of DatetimeIndex and series as cols

  • forecast_length (int) – number of periods to forecast

  • frequency (str) – str representing frequency alias of time series

  • prediction_interval (float) – width of errors (note: rarely do the intervals accurately match the % asked for…)

  • no_negatives (bool) – whether to force all forecasts to be > 0

  • constraint (float) – when not None, use this value * data st dev above max or below min for constraining forecast values.

  • future_regressor_train (pd.Series) – with datetime index, of known in advance data, section matching train data

  • future_regressor_forecast (pd.Series) – with datetime index, of known in advance data, section matching test data

  • holiday_country (str) – passed through to holiday package, used by a few models as 0/1 regressor.

  • n_jobs (int) – number of CPUs to use when available.

  • template_cols (list) – column names of columns used as model template

  • horizontal_subset (list) – columns of df_train to use for forecast, meant for internal use for horizontal ensembling

  • fail_on_forecast_nan (bool) – if False, return forecasts even if NaN present, if True, raises error if any nan in forecast. True is recommended.

  • return_model (bool) – if True, forecast will have .model and .transformer attributes set to model object. Only works for non-ensembles.

  • current_model_file (str) – file path to write to disk of current model params (for debugging if computer crashes). .json is appended

  • force_gc (bool) – if True, run gc.collect() after each model

Returns:

Prediction from AutoTS model object

Return type:

PredictionObject (autots.PredictionObject)

autots.evaluator.auto_model.random_model(model_list, model_prob, transformer_list='fast', transformer_max_depth=2, models_mode='random', counter=15, n_models=5, keyword_format=False)

Generate a random model from a given list of models and probabilities.

autots.evaluator.auto_model.remove_leading_zeros(df)

Accepts wide dataframe, returns dataframe with zeroes preceding the first non-zero value replaced by NaN.
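For a single series, the transformation can be sketched as below. This list-based version is only an illustration: `leading_zeros_to_nan` is a hypothetical helper, the real function operates on every column of a wide dataframe, and turning an all-zero series entirely into NaN is simply what this sketch does.

```python
import math


def leading_zeros_to_nan(values):
    """Replace zeros before the first non-zero value with NaN; keep later zeros."""
    out = list(values)
    for i, value in enumerate(out):
        if value != 0:
            break  # first real observation reached; stop replacing
        out[i] = math.nan
    return out


cleaned = leading_zeros_to_nan([0, 0, 3, 0, 5])
print(cleaned)
```

Zeros that appear after the first non-zero observation are treated as real data and left alone.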

autots.evaluator.auto_model.trans_dict_recomb(dict_array)

Recombine two transformation param dictionaries from array of dicts.

autots.evaluator.auto_model.unpack_ensemble_models(template, template_cols: list = ['Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'], keep_ensemble: bool = True, recursive: bool = False)

Take ensemble models from template and add as new rows. Some confusion may exist, as ensembles require both an ‘Ensemble’ column > 0 and the model name ‘Ensemble’.

Parameters:
  • template (pd.DataFrame) – AutoTS template containing template_cols

  • keep_ensemble (bool) – if False, drop row containing original ensemble

  • recursive (bool) – if True, unnest ensembles of ensembles…

autots.evaluator.auto_model.validation_aggregation(validation_results, df_train=None, groupby_cols=['ID', 'Model', 'ModelParameters', 'TransformationParameters', 'Ensemble'])

Aggregate a TemplateEvalObject.

autots.evaluator.auto_ts module

Higher-level functions of automated time series modeling.

class autots.evaluator.auto_ts.AutoTS(forecast_length: int = 14, frequency: str = 'infer', prediction_interval: float = 0.9, max_generations: int = 20, no_negatives: bool = False, constraint: float | None = None, ensemble: str | None = None, initial_template: str = 'General+Random', random_seed: int = 2022, holiday_country: str = 'US', subset: int | None = None, aggfunc: str = 'first', na_tolerance: float = 1, metric_weighting: dict = {'containment_weighting': 0, 'contour_weighting': 0.01, 'imle_weighting': 0, 'made_weighting': 0.05, 'mae_weighting': 2, 'mage_weighting': 0, 'mle_weighting': 0, 'oda_weighting': 0.001, 'rmse_weighting': 2, 'runtime_weighting': 0.01, 'smape_weighting': 5, 'spl_weighting': 3, 'wasserstein_weighting': 0.01}, drop_most_recent: int = 0, drop_data_older_than_periods: int | None = None, model_list: str = 'default', transformer_list: dict = 'auto', transformer_max_depth: int = 6, models_mode: str = 'random', num_validations: int = 'auto', models_to_validate: float = 0.15, max_per_model_class: int | None = None, validation_method: str = 'backwards', min_allowed_train_percent: float = 0.5, remove_leading_zeroes: bool = False, prefill_na: str | None = None, introduce_na: bool | None = None, preclean: dict | None = None, model_interrupt: bool = True, generation_timeout: int | None = None, current_model_file: str | None = None, force_gc: bool = False, horizontal_ensemble_validation: bool = False, verbose: int = 1, n_jobs: int = 0.5)

Bases: object

Automate time series modeling using a genetic algorithm.

Parameters:
  • forecast_length (int) – number of periods over which to evaluate forecast. Can be overridden later in .predict(). When you don’t have much historical data, using a small forecast length for .fit and the full desired forecast length for .predict is usually the best possible approach given limitations.

  • frequency (str) – ‘infer’ or a specific pandas datetime offset. Can be used to force rollup of data (ie daily input, but frequency ‘M’ will rollup to monthly).

  • prediction_interval (float) – 0-1, uncertainty range for upper and lower forecasts. Adjust range, but rarely matches actual containment.

  • max_generations (int) – number of genetic algorithms generations to run. More runs = longer runtime, generally better accuracy. It’s called max because someday there will be an auto early stopping option, but for now this is just the exact number of generations to run.

  • no_negatives (bool) – if True, all negative predictions are rounded up to 0.

  • constraint (float) –

    when not None, use this float value * data st dev above max or below min for constraining forecast values. Alternatively accepts a dictionary containing the following key/values:

    constraint_method (str): one of

    stdev_min - threshold is min and max of historic data +/- constraint * st dev of data

    stdev - threshold is the mean of historic data +/- constraint * st dev of data

    absolute - input is array of length series containing the threshold’s final value for each

    quantile - constraint is the quantile of historic data to use as threshold

    constraint_regularization (float): 0 to 1, where 0 means no constraint, 1 is hard threshold cutoff, and in between is penalty term

    upper_constraint (float): or array, depending on method, None if unused

    lower_constraint (float): or array, depending on method, None if unused

    bounds (bool): if True, apply to upper/lower forecast, otherwise False applies only to forecast

  • ensemble (str) – None or list or comma-separated string containing: ‘auto’, ‘simple’, ‘distance’, ‘horizontal’, ‘horizontal-min’, ‘horizontal-max’, “mosaic”, “subsample”

  • initial_template (str) – ‘Random’ - randomly generates starting template, ‘General’ uses template included in package, ‘General+Random’ - both of previous. Also can be overridden with self.import_template()

  • random_seed (int) – random seed allows (slightly) more consistent results.

  • holiday_country (str) – passed through to Holidays package for some models.

  • subset (int) – maximum number of series to evaluate at once. Useful to speed evaluation when many series are input. takes a new subset of columns on each validation, unless mosaic ensembling, in which case columns are the same in each validation

  • aggfunc (str) – if data is to be rolled up to a higher frequency (daily -> monthly) or duplicate timestamps are included. Default ‘first’ removes duplicates, for rollup try ‘mean’ or np.sum. Beware numeric aggregations like ‘mean’ will not work with non-numeric inputs. Numeric aggregations like ‘sum’ will also change nan values to 0

  • na_tolerance (float) – 0 to 1. Series are dropped if they have more than this percent NaN. 0.95 here would allow series containing up to 95% NaN values.

  • metric_weighting (dict) – weights to assign to metrics, affecting how the ranking score is generated.

  • drop_most_recent (int) – option to drop n most recent data points. Useful, say, for monthly sales data where the current (unfinished) month is included. Occurs after any aggregation is applied, so it drops n periods of whatever is specified by frequency.

  • drop_data_older_than_periods (int) – take only the n most recent timestamps

  • model_list (list) – str alias or list of names of model objects to use. Can also be a dictionary of {“model”: prob}, but this only affects starting random templates. Genetic algorithm takes from there.

  • transformer_list (list) – list of transformers to use, or dict of transformer:probability. Note this does not apply to initial templates. can accept string aliases: “all”, “fast”, “superfast”, ‘scalable’ (scalable is a subset of fast that should have fewer memory issues at scale)

  • transformer_max_depth (int) – maximum number of sequential transformers to generate for new Random Transformers. Fewer will be faster.

  • models_mode (str) – option to adjust parameter options for newly generated models. Only sporadically utilized. Currently includes: ‘default’/’random’, ‘deep’ (searches more params, likely slower), and ‘regressor’ (forces ‘User’ regressor mode in regressor capable models), ‘gradient_boosting’, ‘neuralnets’ (~Regression class models only)

  • num_validations (int) – number of cross validations to perform. 0 for just train/test on best split. Possible confusion: num_validations is the number of validations to perform after the first eval segment, so the total number of evals/validations will be this + 1. Also “auto” and “max” aliases available. Max maxes out at 50.

  • models_to_validate (int) – top n models to pass through to cross validation. Or float in 0 to 1 as % of tried. 0.99 is forced to 100% validation. 1 evaluates just 1 model. If horizontal or mosaic ensemble, then additional min per_series models above the number here are added to validation.

  • max_per_model_class (int) – of the models_to_validate what is the maximum to pass from any one model class/family.

  • validation_method (str) – ‘even’, ‘backwards’, or ‘seasonal n’ where n is an integer of seasonal

    ‘backwards’ is better for recency and for shorter training sets

    ‘even’ splits the data into equally-sized slices, best for more consistent data; a poetic but less effective strategy than others here

    ‘seasonal’ most similar indexes

    ‘seasonal n’ for example ‘seasonal 364’ would test all data on each previous year of the forecast_length that would immediately follow the training data.

    ‘similarity’ automatically finds the data sections most similar to the most recent data that will be used for prediction

    ‘custom’ - if used, .fit() needs validation_indexes passed - a list of pd.DatetimeIndex’s, tail of each is used as test

  • min_allowed_train_percent (float) – percent of forecast length to allow as min training, else raises error. 0.5 with a forecast length of 10 would mean 5 training points are mandated, for a total of 15 points. Useful in (unrecommended) cases where forecast_length > training length.

  • remove_leading_zeroes (bool) – replace leading zeroes with NaN. Useful in data where initial zeroes mean data collection hasn’t started yet.

  • prefill_na (str) – value to input to fill all NaNs with. Leaving as None and allowing model interpolation is recommended. None, 0, ‘mean’, or ‘median’. 0 may be useful, for example, in sales cases where all NaN can be assumed equal to zero.

  • introduce_na (bool) – whether to force last values in one training validation to be NaN. Helps make more robust models. Defaults to None, which introduces NaN in last rows of validations if any NaN in tail of training data. Will not introduce NaN to all series if subset is used. If True, will also randomly change 20% of all rows to NaN in the validations.

  • preclean (dict) – if not None, a dictionary of Transformer params to be applied to input data {“fillna”: “median”, “transformations”: {}, “transformation_params”: {}} This will change data used in model inputs for fit and predict, and for accuracy evaluation in cross validation!

  • model_interrupt (bool) – if False, KeyboardInterrupts quit entire program. If True, KeyboardInterrupts attempt to only quit current model; in that case, using it in conjunction with verbose > 0 and result_file is recommended in the event of accidental complete termination. If “end_generation”, behaves as True and also ends the entire generation of the run. Note skipped models will not be tried again.

  • generation_timeout (int) – if not None, this is the number of minutes from start at which the generational search ends, then proceeding to validation. This is only checked after the end of each generation, so only offers an ‘approximate’ timeout for searching. It is an overall cap for total generation search time, not per generation.

  • current_model_file (str) – file path to write to disk of current model params (for debugging if computer crashes). .json is appended

  • force_gc (bool) – if True, run gc.collect() after each model run. Probably won’t make much difference.

  • horizontal_ensemble_validation (bool) – True is slower but more reliable model selection on unstable data, if horz. ensembles are used

  • verbose (int) – setting to 0 or lower should reduce most output. Higher numbers give more output.

  • n_jobs (int) – Number of cores available to pass to parallel processing. A joblib context manager can be used instead (pass None in this case). Also ‘auto’.
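The arithmetic behind `min_allowed_train_percent` and `num_validations` in the list above can be worked through directly. This sketch only restates the numbers given in the descriptions; `minimum_points` and `total_eval_segments` are hypothetical helpers, and rounding up a fractional training length is an assumption.

```python
import math


def minimum_points(forecast_length, min_allowed_train_percent):
    """Return (minimum training points, minimum total points) for one split."""
    # Rounding up is an assumption for non-integer products.
    min_train = math.ceil(forecast_length * min_allowed_train_percent)
    return min_train, min_train + forecast_length


def total_eval_segments(num_validations):
    """The first eval segment plus the requested validations."""
    return num_validations + 1


print(minimum_points(10, 0.5))
print(total_eval_segments(2))
```

For a forecast length of 10 and a value of 0.5 this gives 5 training points and 15 points in total, matching the example in the parameter description.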

best_model

DataFrame containing template for the best ranked model

Type:

pd.DataFrame

best_model_name

model name

Type:

str

best_model_params

model params

Type:

dict

best_model_transformation_params

transformation parameters

Type:

dict

best_model_ensemble

Ensemble type int id

Type:

int

regression_check

If True, the best_model uses an input ‘User’ future_regressor

Type:

bool

df_wide_numeric

dataframe containing shaped final data, will include preclean

Type:

pd.DataFrame

initial_results.model_results

contains a collection of result metrics

Type:

object

score_per_series

generated score of metrics given per input series, if horizontal ensembles

Type:

pd.DataFrame

fit, predict
export_template, import_template, import_results, import_best_model
results, failure_rate
horizontal_to_df, mosaic_to_df
plot_horizontal, plot_horizontal_transformers, plot_generation_loss, plot_backforecast
back_forecast(series=None, n_splits: int = 'auto', tail: int = 'auto', verbose: int = 0)

Create forecasts for the historical training data, ie. backcast or back forecast. These are OUT OF SAMPLE forecasts.

This actually forecasts on historical data, these are not fit model values as are often returned by other packages. As such, this will be slower, but more representative of real world model performance. There may be jumps in data between chunks.

Args are same as for model_forecast except…

n_splits (int): how many pieces to split data into. Pass 2 for fastest, or “auto” for best accuracy

series (str): if to run on only one column, pass column name. Faster than full.

tail (int): df.tail() of the dataset, back_forecast is only run on n most recent observations. This points at eval_periods of the lower-level back_forecast function.

Returns a standard prediction object (access .forecast, .lower_forecast, .upper_forecast)

best_model_per_series_mape()

This isn’t quite classic mape but is a percentage mean error intended for quick visuals not final statistics (see model.results()).

best_model_per_series_score()
diagnose_params(target='runtime', waterfall_plots=True)

Attempt to explain params causing measured outcomes using shap and linear regression coefficients.

Parameters:
  • target (str) – runtime, smape, mae, oda, or exception, the measured outcome to correlate parameters with

  • waterfall_plots (bool) – whether to show waterfall SHAP plots

expand_horizontal()

Enables expanding horizontal models trained on a subset to full data. Reruns template models and generates new template.

See best_model_original and best_model_original_id for reference back to the original best model after this runs.

export_best_model(filename, **kwargs)

Basically the same as export_template but only ever the one best model.

export_template(filename=None, models: str = 'best', n: int = 40, max_per_model_class: int | None = None, include_results: bool = False, unpack_ensembles: bool = False, min_metrics: list = ['smape', 'spl', 'wasserstein', 'mle', 'imle', 'ewmae'], max_metrics: list | None = None)

Export top results as a reusable template.

Parameters:
  • filename (str) – ‘csv’ or ‘json’ (in filename). None to return a dataframe and not write a file.

  • models (str) – ‘best’ or ‘all’, and ‘slowest’ for diagnostics

  • n (int) – if models = ‘best’, how many n-best to export

  • max_per_model_class (int) – if models = ‘best’, the max number of each model class to include in template

  • include_results (bool) – whether to include performance metrics

  • unpack_ensembles (bool) – if True, ensembles are returned only as components (will result in larger n models, as full ensemble counts as 1 model)

  • min_metrics (list) – if not None and models==’best’, include the lowest for this metric, a way to include even if not a major part of metric weighting as an addon

  • max_metrics (list) – for metrics to take the max model for

failure_rate(result_set: str = 'initial')

Return the fraction of models that failed with exceptions.

Parameters:

result_set (str, optional) – ‘validation’ or ‘initial’. Defaults to ‘initial’.

Returns:

float.
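As a rough sketch of the calculation, assuming each tested model records either None or an exception name (an assumption about the results table, not confirmed here), the rate is the failed count over the total. `failure_rate_sketch` is a hypothetical name.

```python
def failure_rate_sketch(exceptions):
    """Fraction of model runs whose recorded exception is not None."""
    if not exceptions:
        return 0.0
    failed = sum(1 for exc in exceptions if exc is not None)
    return failed / len(exceptions)


runs = [None, "ValueError", None, "MemoryError"]
rate = failure_rate_sketch(runs)
```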

fit(df, date_col: str | None = None, value_col: str | None = None, id_col: str | None = None, future_regressor=None, weights: dict = {}, result_file: str | None = None, grouping_ids=None, validation_indexes: list | None = None)

Train algorithm given data supplied.

Parameters:
  • df (pandas.DataFrame) – Datetime Indexed dataframe of series, or dataframe of three columns as below.

  • date_col (str) – name of datetime column if long style data

  • value_col (str) – name of column containing the data of series if using long style data. NOT for pointing out the most important column if several, that’s weights

  • id_col (str) – name of column identifying different series if long style data.

  • future_regressor (numpy.Array) – single external regressor matching train.index

  • weights (dict) – {‘colname1’: 2, ‘colname2’: 5} - increase importance of a series in metric evaluation. Any left blank are assumed to have a weight of 1. Pass an alias as a str, i.e. weights=‘mean’, to automatically use the mean value of a series as its weight. Available aliases: mean, median, min, max

  • result_file (str) – results saved on each new generation. Does not include validation rounds. “.csv” saves the model results table. “.pickle” saves the full object, including ensemble information.

  • grouping_ids (dict) – currently a one-level dict containing series_id:group_id mapping. used in 0.2.x but not 0.3.x+ versions. retained for potential future use
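The two forms of the weights argument can be illustrated with a small sketch. `resolve_weights` is a hypothetical helper written for this example only; it shows the documented behavior (missing series default to 1, aliases compute a statistic per series) and is not how AutoTS resolves weights internally.

```python
import numpy as np


def resolve_weights(data, weights):
    """Hypothetical helper: expand a weights dict or alias to one weight per series."""
    aliases = {
        "mean": np.nanmean,
        "median": np.nanmedian,
        "min": np.nanmin,
        "max": np.nanmax,
    }
    if isinstance(weights, str):
        func = aliases[weights]
        return {name: float(func(vals)) for name, vals in data.items()}
    # any series left out of the dict gets a weight of 1
    return {name: weights.get(name, 1) for name in data}


data = {"colname1": [1.0, 3.0], "colname2": [10.0, 30.0], "colname3": [5.0, 5.0]}
explicit = resolve_weights(data, {"colname1": 2, "colname2": 5})
by_mean = resolve_weights(data, "mean")
```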

fit_data(df, date_col=None, value_col=None, id_col=None, future_regressor=None, weights={})

Part of the setup that involves fitting the initial data but not running any models.

get_metric_corr(percent_best=0.1)

Returns a dataframe of correlation among evaluation metrics across evaluations.

Parameters:

percent_best (float) – percent (ie 0.1 for 10%) of models to use, best by score first

static get_new_params(method='random')

Randomly generate new parameters for the class.

get_params_from_id(model_id=None)

Model id must be one that was run in the fit or imported results.

get_top_n_counts(input_dict=None, n=5)

Get the top n most common value counts using a basic dictionary.

horizontal_per_generation()
horizontal_to_df()

helper function for plotting.

import_best_model(import_target, enforce_model_list: bool = True, include_ensemble: bool = True)

Load a best model, overriding any existing setting.

Parameters:

import_target – pd.DataFrame or file path

import_results(filename)

Add results from another run on the same data.

Input can be a filename ending in .csv or .pickle, a DataFrame of model results, or a full TemplateEvalObject.

import_template(filename: str, method: str = 'add_on', enforce_model_list: bool = True, include_ensemble: bool = False, include_horizontal: bool = False, force_validation: bool = False)

Import a previously exported template of model parameters. Must be done before the AutoTS object is .fit().

Use import_best_model instead for loading a model for immediate prediction.

Parameters:
  • filename (str) – file location (or a pd.DataFrame already loaded)

  • method (str) – ‘add_on’ or ‘only’ - “add_on” keeps initial_template generated in init. “only” uses only this template.

  • enforce_model_list (bool) – if True, remove model types not in model_list

  • include_ensemble (bool) – if enforce_model_list is True, this specifies whether to allow ensembles anyway (otherwise they are unpacked and parts kept)

  • include_horizontal (bool) – if enforce_model_list is True, this specifies whether to allow ensembles except horizontal (overridden by keep_ensemble)

  • force_validation (bool) – if True, all models imported here will automatically get sent to full cross validation (regardless of first eval performance). Weird behavior can occur with force_validation if another template is added later with method==‘only’. In that case, model.validate_import should be erased by setting it to None.

list_failed_model_types()

Return a list of model types (ie ETS, LastValueNaive) that failed. If all had at least one success, then return an empty list.

load_template(filename)

Helper function for just loading the file part of import_template.

mosaic_to_df()

Helper function to create a readable df of models in mosaic.

parse_best_model()
plot_back_forecast(**kwargs)
plot_backforecast(series=None, n_splits: int = 'auto', start_date='auto', title=None, alpha=0.25, facecolor='black', loc='upper left', **kwargs)

Plot the historical data and fit forecast on historic. Out of sample in chunks = forecast_length by default.

Parameters:
  • series (str or list) – column names of time series

  • n_splits (int or str) – “auto”, number > 2, higher more accurate but slower

  • start_date (datetime.datetime) – or “auto”

  • title (str) –

  • **kwargs – passed to pd.DataFrame.plot()

plot_generation_loss(title='Single Model Accuracy Gain Over Generations', **kwargs)

Plot improvement in accuracy over generations. Note: this is only “one size fits all” accuracy and doesn’t account for the benefits seen for ensembling.

Parameters:

**kwargs – passed to pd.DataFrame.plot()

plot_horizontal(max_series: int = 20, title='Model Types Chosen by Series', **kwargs)

Simple plot to visualize assigned series: models.

Note that for ‘mosaic’ ensembles, it only plots the type of the most common model_id for that series, or the first if there is a tie for most common.

Parameters:
  • max_series (int) – max number of points to plot

  • **kwargs – passed to pandas.plot()

plot_horizontal_model_count(color_list=None, top_n: int = 20, title='Most Frequently Chosen Models', **kwargs)

Plots most common models. Does not factor in nested in non-horizontal Ensembles.

plot_horizontal_per_generation(title='Horizontal Ensemble Accuracy Gain (first eval sample only)', **kwargs)

Plot how well the horizontal ensembles would do after each new generation. Slow.

plot_horizontal_transformers(method='transformers', color_list=None, **kwargs)

Simple plot to visualize transformers used. Note this doesn’t capture transformers nested in simple ensembles.

Parameters:
  • method (str) – ‘fillna’ or ‘transformers’ - which to plot

  • color_list (list) – list of colors to sample for bar colors. Can be names or hex.

  • **kwargs – passed to pandas.plot()

plot_metric_corr(cols=None, percent_best=0.1)

Plot correlation in results among metrics. The metrics that are highly correlated are mostly the unscaled ones.

Parameters:
  • cols (list) – strings of columns to show, ‘all’ for all

  • percent_best (float) – percent (ie 0.1 for 10%) of models to use, best by score first

plot_per_series_error(title: str = 'Top Series Contributing Score Error', max_series: int = 10, max_name_chars: int = 25, color: str = '#ff9912', figsize=(12, 4), kind: str = 'bar', upper_clip: float = 1000, **kwargs)

Plot which series are contributing most to error (Score) of final model. Avg of validations for best_model

Parameters:
  • title (str) – plot title

  • max_series (int) – max number of series to show on plot (sorted)

  • max_name_chars (int) – if horizontal ensemble, will chop series names to this

  • color (str) – hex or name of color of plot

  • figsize (tuple) – passed through to plot axis

  • kind (str) – bar or pie

  • upper_clip (float) – set max error shown to this value, to prevent unnecessary distortion

  • **kwargs – passed to pandas.plot()

plot_per_series_mape(title: str | None = None, max_series: int = 10, max_name_chars: int = 25, color: str = '#ff9912', figsize=(12, 4), kind: str = 'bar', **kwargs)

Plot which series are contributing most to SMAPE of final model. Avg of validations for best_model

Parameters:
  • title (str) – plot title

  • max_series (int) – max number of series to show on plot (sorted)

  • max_name_chars (int) – if horizontal ensemble, will chop series names to this

  • color (str) – hex or name of color of plot

  • figsize (tuple) – passed through to plot axis

  • kind (str) – bar or pie

  • **kwargs – passed to pandas.plot()

plot_per_series_smape(title: str | None = None, max_series: int = 10, max_name_chars: int = 25, color: str = '#ff9912', figsize=(12, 4), kind: str = 'bar', **kwargs)

To be backwards compatible, not necessarily maintained, plot_per_series_mape is to be preferred.

plot_series_corr(cols=15)

Plot series correlation. Data must be fit first.

Parameters:

cols (list) – strings of columns to show, ‘all’ for all, or int of number to sample

plot_transformer_failure_rate()

Failure Rate per Transformer type (ignoring ensembles), failure may be due to other model or transformer.

plot_validations(df_wide=None, models=None, series=None, title=None, start_date='auto', end_date='auto', subset=None, compare_horizontal=False, colors=None, include_bounds=True, alpha=0.35, start_color='darkred', end_color='#A2AD9C', **kwargs)

Similar to plot_backforecast but using the model’s validation segments specifically. Must reforecast. Saves results to self.validation_forecasts and caches. Set validation_forecasts_template to None to force rerun; otherwise it uses stored results (when models is the same). ‘chosen’ refers to best_model_id, the model chosen to run for predict. Validation sections may overlap (depending on method), which can confuse graph readers.

Parameters:
  • models (list) – list, str, df or None, models to compare (IDs unless df of model params)

  • series (str) – time series to graph

  • title (str) – graph title

  • start_date (str) – ‘auto’ or datetime, place to begin graph, None for full

  • end_date (str) – ‘auto’ or datetime, end of graph x axis

  • subset (str) – overrides series, shows either ‘best’ or ‘worst’

  • compare_horizontal (bool) – if True, plot horizontal ensemble versus best non-horizontal model, when available

  • include_bounds (bool) – if True (default) include the upper/lower forecast bounds

  • start_color (str) – color of vline for val start marker, None to remove vline

  • end_color (str) – color of vline for val end marker, None to remove vline

predict(forecast_length: int = 'self', prediction_interval: float = 'self', future_regressor=None, hierarchy=None, just_point_forecast: bool = False, fail_on_forecast_nan: bool = True, verbose: int = 'self', df=None)

Generate forecast data immediately following dates of index supplied to .fit().

If using a model from the update_fit list, with no ensembling, the underlying model will not be retrained when used as below, with a single prediction interval. This is designed for high speed forecasting. Full retraining is best when there is sufficient time.

```python
from autots import AutoTS

model = AutoTS(model_list='update_fit')
model.fit(df)  # df: wide style training data
model.predict()
# for new data without retraining
model.fit_data(df)
model.predict()
# to force retrain of best model (but not full model search)
model.model = None
model.fit_data(df)
model.predict()
```

Parameters:
  • forecast_length (int) – Number of periods of data to forecast ahead

  • prediction_interval (float) – interval of upper/lower forecasts. Defaults to ‘self’, i.e. the interval specified in __init__(). If prediction_interval is a list, then returns a dict of forecast objects: {str(interval): prediction_object}

  • future_regressor (numpy.Array) – additional regressor

  • hierarchy – Not yet implemented

  • just_point_forecast (bool) – If True, return a pandas.DataFrame of just point forecasts

  • fail_on_forecast_nan (bool) – if False, return forecasts even if NaN present, if True, raises error if any nan in forecast

  • df (pd.DataFrame) – wide style df, if present, calls fit_data with this dataframe. Recommended strongly to use model.fit_data(df) first instead as it has more args.

Returns:

Either a PredictionObject of forecasts and metadata, or if just_point_forecast == True, a dataframe of point forecasts

results(result_set: str = 'initial')

Convenience function to return tested models table.

Parameters:

result_set (str) – ‘validation’ or ‘initial’

retrieve_validation_forecasts(models=None, compare_horizontal=False, id_name='SeriesID', value_name='Value', interval_name='PredictionInterval')
save_template(filename, export_template, **kwargs)

Helper function for the save part of export_template.

validation_agg()
autots.evaluator.auto_ts.error_correlations(all_result, result: str = 'corr')

Onehot encode AutoTS result df and return df or correlation with errors.

Parameters:
  • all_result (pandas.DataFrame) – AutoTS model_results df

  • result (str) – whether to return ‘df’, ‘corr’, ‘poly corr’ with errors

autots.evaluator.auto_ts.fake_regressor(df, forecast_length: int = 14, date_col: str | None = None, value_col: str | None = None, id_col: str | None = None, frequency: str = 'infer', aggfunc: str = 'first', drop_most_recent: int = 0, na_tolerance: float = 0.95, drop_data_older_than_periods: int = 100000, dimensions: int = 1, verbose: int = 0)

Create a fake regressor of random numbers for testing purposes.

autots.evaluator.benchmark module

Created on Fri Nov 5 13:45:01 2021

@author: Colin

class autots.evaluator.benchmark.Benchmark

Bases: object

run(n_jobs: int = 'auto', times: int = 3, random_seed: int = 123, base_models_only=False, verbose: int = 0)

Run benchmark.

Parameters:
  • n_jobs (int) – passed to model_forecast for n cpus

  • times (int) – number of times to run benchmark models (returns avg of n times)

  • random_seed (int) – random seed, increases consistency

  • base_models_only (bool) – if True, doesn’t attempt Tensorflow, GluonTS, or Prophet models

autots.evaluator.event_forecasting module

Generate probabilities of forecasts crossing limit thresholds. Created on Thu Jan 27 13:36:18 2022

class autots.evaluator.event_forecasting.EventRiskForecast(df_train, forecast_length, frequency: str = 'infer', prediction_interval=0.9, lower_limit=0.05, upper_limit=0.95, model_name='UnivariateMotif', model_param_dict={'distance_metric': 'euclidean', 'k': 10, 'pointed_method': 'median', 'return_result_windows': True, 'window': 14}, model_transform_dict={'fillna': 'pchip', 'transformation_params': {'0': {'method': 0.5}, '1': {}, '2': {'fixed': False, 'window': 7}, '3': {}}, 'transformations': {'0': 'Slice', '1': 'DifferencedTransformer', '2': 'RollingMeanTransformer', '3': 'MaxAbsScaler'}}, model_forecast_kwargs={'max_generations': 30, 'n_jobs': 'auto', 'random_seed': 321, 'verbose': 1}, future_regressor_train=None, future_regressor_forecast=None)

Bases: object

Generate a risk score (0 to 1, but usually close to 0) for a future event exceeding user specified upper or lower bounds.

Upper and lower limits can be one of four types, and may each be different.

  1. None (no risk score calculated for this direction)

  2. Float in range [0, 1], historic quantile of series (which is historic min and max at edges) is chosen as limit.

  3. A dictionary of {“model_name”: x, “model_param_dict”: y, “model_transform_dict”: z, “prediction_interval”: 0.9} to generate a forecast as the limits. Primarily intended for simple forecasts like SeasonalNaive, but can be used with any AutoTS model

  4. a custom input numpy array of shape (forecast_length, num_series)

This can be used to find the “middle” limit too, flip so upper=lower and lower=upper, then abs(U - (1 - L)). In some cases it may help to drop the results from the first forecast timestep or two.

This functions by generating multiple outcome forecast possibilities in two ways. If a ‘Motif’ type model is passed, it uses all the k neighbors motifs as outcome paths (recommended). All other AutoTS models will generate the possible outcomes by utilizing multiple prediction_intervals (more intervals = slower but more resolution). The risk score is then the % of outcome forecasts which cross the limit (less than or equal for lower, greater than or equal for upper).
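The risk score calculation can be sketched with plain NumPy. This is an illustration under stated assumptions, not the class's code: `risk_scores` is a hypothetical name, and the outcome paths are assumed to be stacked with shape (num_paths, forecast_length, num_series).

```python
import numpy as np


def risk_scores(result_windows, limit, direction="upper"):
    """Fraction of outcome paths crossing the limit at each step and series."""
    windows = np.asarray(result_windows, dtype=float)
    if direction == "upper":
        crossed = windows >= limit  # greater than or equal for upper
    else:
        crossed = windows <= limit  # less than or equal for lower
    return crossed.mean(axis=0)


# 4 outcome paths, 2 forecast steps, 1 series
paths = np.array([[[1.0], [5.0]], [[2.0], [6.0]], [[3.0], [7.0]], [[4.0], [8.0]]])
upper_limit = np.array([[3.0], [8.0]])  # shape (forecast_length, num_series)
upper_risk = risk_scores(paths, upper_limit, "upper")
```

With only a handful of paths the score moves in coarse steps, which matches the note that more outcome forecasts give more resolution.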

Only accepts wide style dataframe input. Methods are class_methods and can be used standalone. They default to __init__ inputs, but can be overridden. Results are usually a numpy array of shape (forecast_length, num_series)

Parameters:
  • df_train (pd.DataFrame) – wide style data, pd.DatetimeIndex for index and one series per column

  • forecast_length (int) – number of forecast steps to make

  • frequency (str) – frequency of timesteps

  • prediction_interval (float) – float or list of floats for probabilistic forecasting. If a list, the first item in the list is the one used for .fit default

  • model_forecast_kwargs (dict) – AutoTS kwargs to pass to generate_result_windows, .fit_forecast, and forecast-style limits

  • model_name – for model_forecast in generate_result_windows

  • model_param_dict – for model_forecast in generate_result_windows

  • model_transform_dict – for model_forecast in generate_result_windows

  • future_regressor_train – regressor arrays if used

  • future_regressor_forecast – regressor arrays if used

Methods:

fit(), predict(), predict_historic(), generate_result_windows(), generate_risk_array(), generate_historic_risk_array(), set_limit(), plot()

Attributes:

result_windows, forecast_df, up_forecast_df, low_forecast_df, lower_limit_2d, upper_limit_2d, upper_risk_array, lower_risk_array
fit(df_train=None, forecast_length=None, prediction_interval=None, models_mode='event_risk', model_list=['UnivariateMotif', 'MultivariateMotif', 'SectionalMotif', 'ARCH', 'MetricMotif', 'SeasonalityMotif'], ensemble=None, autots_kwargs=None, future_regressor_train=None)

Shortcut for generating model params.

args specified are those suggested for an otherwise normal AutoTS run

Parameters:
  • df_train (pd.DataFrame) – wide style only

  • models_mode (str) – event_risk here is used by motif models

  • model_list (list) – suggesting the use of motif models

  • ensemble (list) – must be None or empty list to get motif result windows

  • autots_kwargs (dict) – all other args passed in as kwargs. If None, defaults to class model_forecast_kwargs; for blank, pass an empty dict

static generate_historic_risk_array(df, limit, direction='upper')

Given a df and a limit, returns a 0/1 array of whether limit was equaled or exceeded.
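A minimal sketch of the 0/1 flagging, using a NumPy array in place of a DataFrame. `historic_risk_sketch` is a hypothetical name, and the lower direction is assumed to mirror the upper one with less than or equal.

```python
import numpy as np


def historic_risk_sketch(values, limit, direction="upper"):
    """Return 1 where the limit was equaled or exceeded, else 0."""
    values = np.asarray(values, dtype=float)
    if direction == "upper":
        return (values >= limit).astype(int)
    return (values <= limit).astype(int)


history = np.array([[1.0, 10.0], [5.0, 20.0], [9.0, 30.0]])
flags = historic_risk_sketch(history, np.array([5.0, 25.0]), "upper")
```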

generate_result_windows(df_train=None, forecast_length=None, frequency=None, prediction_interval=None, model_name=None, model_param_dict=None, model_transform_dict=None, model_forecast_kwargs=None, future_regressor_train=None, future_regressor_forecast=None)

For event risk forecasting. Params default to class init but can be overridden here.

Returns:

(num_samples/k, forecast_length, num_series/columns)

Return type:

result_windows (numpy.array)

static generate_risk_array(result_windows, limit, direction='upper')

Given a df and a limit, returns a 0/1 array of whether limit was equaled or exceeded.

plot(column_idx=0, grays=['#838996', '#c0c0c0', '#dcdcdc', '#a9a9a9', '#808080', '#989898', '#808080', '#757575', '#696969', '#c9c0bb', '#c8c8c8', '#323232', '#e5e4e2', '#778899', '#4f666a', '#848482', '#414a4c', '#8a7f80', '#c4c3d0', '#bebebe', '#dbd7d2'], up_low_color=['#ff4500', '#ff5349'], bar_color='#6495ED', bar_ylim=[0.0, 0.5], figsize=(14, 8), result_windows=None, lower_limit_2d=None, upper_limit_2d=None, upper_risk_array=None, lower_risk_array=None)

Plot a sample of the risk forecast outcomes.

Parameters:
  • column_idx (int) – positional index of series to sample for plot

  • grays (list of str) – list of hex codes for colors for the potential forecasts

  • up_low_color (list of str) – two hex code colors for lower and upper

  • bar_color (str) – hex color for bar graph

  • bar_ylim (list) – passed to ylim of plot, sets scale of axis of barplot

  • figsize (tuple) – passed to figsize of output figure

plot_eval(df_test, column_idx=0, actuals_color=['#00BFFF'], up_low_color=['#ff4500', '#ff5349'], bar_color='#6495ED', bar_ylim=[0.0, 0.5], figsize=(14, 8), lower_limit_2d=None, upper_limit_2d=None, upper_risk_array=None, lower_risk_array=None)

Plot a sample of the risk forecast with known value vs risk score.

Parameters:
  • df_test (pd.DataFrame) – dataframe of known values (dt index, series)

  • column_idx (int) – positional index of series to sample for plot

  • actuals_color (list of str) – list of one hex code for line of known actuals

  • up_low_color (list of str) – two hex code colors for lower and upper

  • bar_color (str) – hex color for bar graph

  • bar_ylim (list) – passed to ylim of plot, sets scale of axis of barplot

  • figsize (tuple) – passed to figsize of output figure

predict()

Returns forecast upper, lower risk probability arrays for input limits.

predict_historic(upper_limit=None, lower_limit=None, eval_periods=None)

Returns upper, lower risk probability arrays for input limits for the historic data. If manual numpy array limits are used, the limits will need to be appropriate shape (for df_train and eval_periods if used)

Parameters:
  • upper_limit – if different than the version passed to init

  • lower_limit – if different than the version passed to init

  • eval_periods (int) – only assess the n most recent periods of history

static set_limit(limit, target_shape, df_train, direction='upper', period='forecast', forecast_length=None, eval_periods=None)

Handles all limit input styles and returns numpy array.

Parameters:
  • limit – see class overview for input options

  • target_shape (tuple) – of (forecast_length, num_series)

  • df_train (pd.DataFrame) – training data

  • direction (str) – whether it is the “upper” or “lower” limit

  • period (str) – “forecast” or “historic” only used for limits defined by forecast algorithm params

  • forecast_length (int) – needed only for historic of forecast algorithm defined limit

  • eval_periods (int) – only for historic forecast limit, only runs on the tail n (this) of data
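For the float (quantile) style of limit, the conversion to a 2-D array can be sketched as below. `quantile_limit` is a hypothetical helper covering only that one input style; it assumes the per-series historic quantile is simply repeated for every forecast step.

```python
import numpy as np


def quantile_limit(train, quantile, forecast_length):
    """Hypothetical sketch: per-series historic quantile repeated for each step."""
    train = np.asarray(train, dtype=float)
    per_series = np.nanquantile(train, quantile, axis=0)  # shape (num_series,)
    return np.tile(per_series, (forecast_length, 1))  # (forecast_length, num_series)


train = np.array([[0.0, 100.0], [10.0, 200.0], [20.0, 300.0]])
limit = quantile_limit(train, 0.5, forecast_length=2)
at_max = quantile_limit(train, 1.0, forecast_length=1)
```

A quantile of 1.0 returns the historic max of each series, matching the note that the edges correspond to historic min and max.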

autots.evaluator.event_forecasting.extract_result_windows(forecasts, model_name=None)

Standardize result windows from different models.

autots.evaluator.event_forecasting.extract_window_index(forecasts)
autots.evaluator.event_forecasting.set_limit_forecast(df_train, forecast_length, model_name='SeasonalNaive', model_param_dict={'lag_1': 28, 'lag_2': None, 'method': 'median'}, model_transform_dict={'fillna': 'nearest', 'transformation_params': {}, 'transformations': {}}, prediction_interval=0.9, frequency='infer', model_forecast_kwargs={'n_jobs': 'auto', 'random_seed': 321, 'verbose': 1}, future_regressor_train=None, future_regressor_forecast=None)

Helper function for forecast limits set by forecast algorithms.

autots.evaluator.event_forecasting.set_limit_forecast_historic(df_train, forecast_length, model_name='SeasonalNaive', model_param_dict={'lag_1': 28, 'lag_2': None, 'method': 'median'}, model_transform_dict={'fillna': 'nearest', 'transformation_params': {}, 'transformations': {}}, prediction_interval=0.9, frequency='infer', model_forecast_kwargs={'n_jobs': 'auto', 'random_seed': 321, 'verbose': 2}, future_regressor_train=None, future_regressor_forecast=None, eval_periods=None)

Helper function for forecast limits set by forecast algorithms.

autots.evaluator.metrics module

Tools for calculating forecast errors.

Some common args:

  • A or actual (np.array) – actuals ndim 2 (timesteps, series)

  • F or forecast (np.array) – forecast values ndim 2 (timesteps, series)

  • ae (np.array) – precalculated np.abs(A - F)

autots.evaluator.metrics.array_last_val(arr)
autots.evaluator.metrics.chi_squared_hist_distribution_loss(F, A, bins='auto', plot=False)

Distribution loss, chi-squared distance from histograms.

autots.evaluator.metrics.containment(lower_forecast, upper_forecast, actual)

Expects two, 2-D numpy arrays of forecast_length * n series.

Returns a 1-D array of results in len n series

Parameters:
  • actual (numpy.array) – known true values

  • forecast (numpy.array) – predicted values

autots.evaluator.metrics.contour(A, F)

A measure of how well the actual and forecast follow the same pattern of change. Note: If actual values are unchanging, they will match positive changing forecasts. This is faster, and if actuals are a flat line, contour probably isn’t a concern regardless.

# bluff tops follow the shape of the river below, at different elevation

Expects two, 2-D numpy arrays of forecast_length * n series. Returns a 1-D array of results in len n series.

NaN diffs are filled with 0, essentially equivalent to assuming a forward fill of NaN

Concat the last row of history to head of both A and F (req for 1 step)

Parameters:
  • A (numpy.array) – known true values

  • F (numpy.array) – predicted values
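A sketch of this kind of direction-matching score follows. The asymmetric comparison (>= for actuals, > for forecasts) is an assumption inferred from the note about flat actuals matching rising forecasts; `contour_sketch` is a hypothetical name and the library's code may differ.

```python
import numpy as np


def contour_sketch(A, F):
    """Share of steps where actual and forecast move in the same direction."""
    dA = np.nan_to_num(np.diff(A, axis=0))  # NaN diffs become 0
    dF = np.nan_to_num(np.diff(F, axis=0))
    # flat actuals (diff == 0) count as "up", so they match rising forecasts
    up_A = dA >= 0
    up_F = dF > 0
    return np.sum(up_A == up_F, axis=0) / dA.shape[0]


A = np.array([[1.0], [2.0], [2.0], [1.0]])
F = np.array([[1.0], [3.0], [4.0], [2.0]])
score = contour_sketch(A, F)
```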

autots.evaluator.metrics.default_scaler(df_train)
autots.evaluator.metrics.dwae(A, F, last_of_array)

Directional Weighted Absolute Error, the accuracy of growth or decline relative to most recent data.

autots.evaluator.metrics.full_metric_evaluation(A, F, upper_forecast, lower_forecast, df_train, prediction_interval, columns=None, scaler=None, return_components=False, cumsum_A=None, diff_A=None, last_of_array=None, **kwargs)

Create a pd.DataFrame of metrics per series given actuals, forecast, and precalculated errors. There are some extra args which are precomputed metrics for efficiency in loops, don’t worry about them.

Parameters:
  • A (np.array) – array or df of actuals

  • F (np.array) – array or df of forecasts

  • return_components (bool) – if True, return tuple of detailed errors

autots.evaluator.metrics.kde(actuals, forecasts, bandwidth, x)
autots.evaluator.metrics.kde_kl_distance(F, A, bandwidth=0.5, x=None)

Distribution loss by means of KDE and KL Divergence.

autots.evaluator.metrics.kl_divergence(p, q, epsilon=1e-10)

Compute KL Divergence between two distributions.
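For reference, the standard discrete KL divergence can be sketched as below. Adding epsilon to both inputs to avoid log(0) is an assumption about how the parameter is used; `kl_divergence_sketch` is a hypothetical name.

```python
import numpy as np


def kl_divergence_sketch(p, q, epsilon=1e-10):
    """Sum of p * log(p / q), with epsilon added to avoid log(0) and division by 0."""
    p = np.asarray(p, dtype=float) + epsilon
    q = np.asarray(q, dtype=float) + epsilon
    return np.sum(p * np.log(p / q))


same = kl_divergence_sketch([0.5, 0.5], [0.5, 0.5])
different = kl_divergence_sketch([0.9, 0.1], [0.5, 0.5])
```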

autots.evaluator.metrics.linearity(arr)

Score percentage of a np.array with linear progression, along the index (0) axis.

autots.evaluator.metrics.mae(ae)

Accepting abs error already calculated
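The precalculated-error pattern can be illustrated as follows: the absolute error is computed once and reused by several metrics. This is a sketch of the idea with a hypothetical `mae_from_ae`, assuming a NaN-aware mean over the timestep axis.

```python
import numpy as np

A = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # actuals (timesteps, series)
F = np.array([[1.0, 12.0], [4.0, 20.0], [3.0, 27.0]])  # forecasts, same shape
ae = np.abs(A - F)  # computed once, shared across metrics


def mae_from_ae(ae):
    """Mean absolute error per series from a precalculated abs error array."""
    return np.nanmean(ae, axis=0)


result = mae_from_ae(ae)
```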

autots.evaluator.metrics.mda(A, F)

A measure of how well the actual and forecast follow the same pattern of change. Expects two, 2-D numpy arrays of forecast_length * n series. Returns a 1-D array of results in len n series.

NaN diffs are filled with 0, essentially equivalent to assuming a forward fill of NaN

Concat the last row of history to head of both A and F (req for 1 step)

Parameters:
  • A (numpy.array) – known true values

  • F (numpy.array) – predicted values

autots.evaluator.metrics.mean_absolute_differential_error(A, F, order: int = 1, df_train=None, scaler=None)

Expects two, 2-D numpy arrays of forecast_length * n series.

Returns a 1-D array of results in len n series

Parameters:
  • A (numpy.array) – known true values

  • F (numpy.array) – predicted values

  • order (int) – order of differential

  • df_train (np.array) – if provided, uses this as starting point for first diff step. Tail(1) must be most recent historical point before forecast. Must be numpy Array not DataFrame. Highly recommended if using this as the sole optimization metric. Without, it is an “unanchored” shape fitting metric. This will also allow this to work on forecast_length = 1 forecasts

  • scaler (np.array) – if provided, metrics are scaled by this. 1d array of shape (num_series,)
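The effect of anchoring with df_train can be seen in a small sketch. `made_sketch` is a hypothetical simplification (no scaler) that prepends the last training row before differencing, as the parameter description suggests.

```python
import numpy as np


def made_sketch(A, F, order=1, df_train=None):
    """Mean absolute error between the differenced actuals and forecasts."""
    if df_train is not None:
        last = df_train[-1:, :]  # most recent historical row anchors the first diff
        A = np.concatenate([last, A])
        F = np.concatenate([last, F])
    dA = np.diff(A, n=order, axis=0)
    dF = np.diff(F, n=order, axis=0)
    return np.nanmean(np.abs(dA - dF), axis=0)


history = np.array([[8.0], [10.0]])
A = np.array([[11.0], [13.0]])
F = np.array([[12.0], [14.0]])
unanchored = made_sketch(A, F)
anchored = made_sketch(A, F, df_train=history)
```

Here the forecast has the right shape but the wrong level: the unanchored score is 0, while the anchored score penalizes the first step.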

autots.evaluator.metrics.mean_absolute_error(A, F)

Expects two, 2-D numpy arrays of forecast_length * n series.

Returns a 1-D array of results in len n series

Parameters:
  • A (numpy.array) – known true values

  • F (numpy.array) – predicted values

autots.evaluator.metrics.medae(ae, nan_flag=True)

Accepting abs error already calculated

autots.evaluator.metrics.median_absolute_error(A, F)

Expects two, 2-D numpy arrays of forecast_length * n series.

Returns a 1-D array of results in len n series

Parameters:
  • A (numpy.array) – known true values

  • F (numpy.array) – predicted values

autots.evaluator.metrics.mlvb(A, F, last_of_array)

Mean last value baseline, the % difference of forecast vs last value naive forecast. Does poorly with near-zero values.

Parameters:
  • A (np.array) – actuals

  • F (np.array) – forecast values

  • last_of_array (np.array) – the last row of the historic training data, most recent values

autots.evaluator.metrics.mqae(ae, q=0.85, nan_flag=True)

Return the mean of errors less than q quantile of the errors per series. np.nans count as largest values, and so are removed as part of the > q group.
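One way to sketch this trimmed mean is to sort each series' errors and keep the smallest share. `mqae_sketch` is hypothetical, and the rounding of the cutoff is an assumption; it relies on np.sort placing NaN last so that NaN falls in the removed group.

```python
import numpy as np


def mqae_sketch(ae, q=0.85):
    """Mean of the smallest q share of errors per series."""
    ae = np.asarray(ae, dtype=float)
    keep = max(int(ae.shape[0] * q), 1)  # number of smallest errors to keep
    smallest = np.sort(ae, axis=0)[:keep]  # np.sort places NaN at the end
    return np.nanmean(smallest, axis=0)


ae = np.array([[1.0], [2.0], [3.0], [100.0]])
trimmed = mqae_sketch(ae, q=0.75)
```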

autots.evaluator.metrics.msle(full_errors, ae, le, nan_flag=True)

input is array of y_pred - y_true to over-penalize underestimate. Use instead y_true - y_pred to over-penalize overestimate. AE used here for the log just to avoid divide by zero warnings (values aren’t used either way)

autots.evaluator.metrics.numpy_ffill(arr)

Fill np.nan forward down the zero axis.
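A common vectorized idiom for this is sketched below for 2-D arrays. It is offered as an illustration of the technique and may not match the library's code; `ffill_sketch` is a hypothetical name.

```python
import numpy as np


def ffill_sketch(arr):
    """Forward fill NaN down axis 0 of a 2-D array; leading NaN stay NaN."""
    arr = np.asarray(arr, dtype=float)
    mask = np.isnan(arr)
    # row index of each valid value, 0 where the value is missing
    idx = np.where(~mask, np.arange(arr.shape[0])[:, None], 0)
    # running max gives the row of the most recent valid value
    np.maximum.accumulate(idx, axis=0, out=idx)
    return arr[idx, np.arange(arr.shape[1])[None, :]]


raw = np.array([[1.0, np.nan], [np.nan, 2.0], [3.0, np.nan]])
filled = ffill_sketch(raw)
```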

autots.evaluator.metrics.oda(A, F, last_of_array)

Origin Directional Accuracy, the accuracy of growth or decline relative to most recent data.
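A sketch of the idea, assuming direction is measured by the sign of the difference from the last historic value. `oda_sketch` is a hypothetical name and the library's handling of NaN and ties may differ.

```python
import numpy as np


def oda_sketch(A, F, last_of_array):
    """Share of steps where forecast and actual lie on the same side of the origin value."""
    same_side = np.sign(F - last_of_array) == np.sign(A - last_of_array)
    return np.mean(same_side, axis=0)


last = np.array([10.0])  # most recent training value per series
A = np.array([[11.0], [12.0], [9.0]])
F = np.array([[12.0], [9.0], [8.0]])
accuracy = oda_sketch(A, F, last)
```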

autots.evaluator.metrics.pinball_loss(A, F, quantile)

Bigger is bad-er.
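The standard pinball (quantile) loss can be sketched as below; `pinball_sketch` is a hypothetical name, and averaging over the timestep axis is an assumption consistent with the other per-series metrics.

```python
import numpy as np


def pinball_sketch(A, F, quantile):
    """Quantile loss: under-forecasts cost quantile, over-forecasts cost (1 - quantile)."""
    diff = A - F
    loss = np.where(diff >= 0, quantile * diff, (quantile - 1) * diff)
    return np.nanmean(loss, axis=0)


A = np.array([[10.0], [10.0]])
F = np.array([[8.0], [12.0]])
upper_loss = pinball_sketch(A, F, quantile=0.9)
```

For an upper bound at quantile 0.9, an actual above the forecast is penalized nine times more heavily than one below it.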

autots.evaluator.metrics.precomp_wasserstein(F, cumsum_A)
autots.evaluator.metrics.qae(ae, q=0.9, nan_flag=True)

Return the q quantile of the errors per series. np.nans count as smallest values and will push more values into the exclusion group.

autots.evaluator.metrics.rmse(sqe)

Accepting squared error already calculated

autots.evaluator.metrics.root_mean_square_error(actual, forecast)

Expects two, 2-D numpy arrays of forecast_length * n series.

Returns a 1-D array of results in len n series

Parameters:
  • actual (numpy.array) – known true values

  • forecast (numpy.array) – predicted values

autots.evaluator.metrics.rps(predictions, observed)

Vectorized version of Ranked Probability Score. A lower value is a better score. From: Colin Catlin, https://syllepsis.live/2022/01/22/ranked-probability-score-in-python/

Parameters:
  • predictions (pd.DataFrame) – each column is an outcome category, with values as the 0 to 1 probability of that category

  • observed (pd.DataFrame) – each column is an outcome category, with values of 0 OR 1 with 1 being that category occurred
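A sketch of the score using NumPy arrays in place of DataFrames follows. Dividing by the number of categories minus one is one common normalization and is an assumption here; `rps_sketch` is a hypothetical name.

```python
import numpy as np


def rps_sketch(predictions, observed):
    """Ranked Probability Score per row from cumulative probabilities."""
    pred_cdf = np.cumsum(predictions, axis=1)
    obs_cdf = np.cumsum(observed, axis=1)
    n_cat = predictions.shape[1]
    return np.sum((pred_cdf - obs_cdf) ** 2, axis=1) / (n_cat - 1)


predictions = np.array([[1.0, 0.0, 0.0], [0.2, 0.5, 0.3]])
observed = np.array([[1, 0, 0], [0, 0, 1]])
scores = rps_sketch(predictions, observed)
```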

autots.evaluator.metrics.scaled_pinball_loss(A, F, df_train, quantile)

Scaled pinball loss.

Parameters:
  • A (np.array) – actual values

  • F (np.array) – forecast values

  • df_train (np.array) – values of historic data for scaling

  • quantile (float) – which bound of upper/lower forecast this is

autots.evaluator.metrics.smape(actual, forecast, ae, nan_flag=True)

Accepting abs error already calculated

autots.evaluator.metrics.smoothness(arr)

A gradient measure of linearity, where 0 is linear and larger values are more volatile.

autots.evaluator.metrics.spl(precomputed_spl, scaler)

Accepting most of it already calculated

autots.evaluator.metrics.symmetric_mean_absolute_percentage_error(actual, forecast)

Expects two, 2-D numpy arrays of forecast_length * n series. Allows NaN in actuals, and corresponding NaN in forecast, but not unmatched NaN in forecast. Also doesn’t like zeroes in either forecast or actual, which result in a poor error value even if the forecast is accurate.

Returns a 1-D array of results in len n series

Parameters:
  • actual (numpy.array) – known true values

  • forecast (numpy.array) – predicted values

References

https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error
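The definition from the reference above can be sketched as follows. `smape_sketch` is a hypothetical name; it follows the half-sum denominator form, and the library's handling of zeros and NaN may differ.

```python
import numpy as np


def smape_sketch(actual, forecast):
    """SMAPE per series, in percent, using the (|A| + |F|) / 2 denominator."""
    ae = np.abs(forecast - actual)
    denom = (np.abs(actual) + np.abs(forecast)) / 2
    return 100 * np.nanmean(ae / denom, axis=0)


actual = np.array([[100.0], [200.0]])
forecast = np.array([[110.0], [180.0]])
result = smape_sketch(actual, forecast)
```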

autots.evaluator.metrics.threshold_loss(actual, forecast, threshold, penalty_threshold=None)

Run once for overestimate then again for underestimate. Add both for combined view.

Parameters:
  • actual/forecast – 2D wide style data DataFrame or np.array

  • threshold – (0, 2), 0.9 (penalize 10% and greater underestimates) and 1.1 (penalize overestimate over 10%)

  • penalty_threshold – defaults to same as threshold, adjust strength of penalty

autots.evaluator.metrics.unsorted_wasserstein(F, A)

Also known as earth moving distance.
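An unsorted, cumulative-sum form of this distance can be sketched as below. That the function works this way is an assumption suggested by precomp_wasserstein taking cumsum_A; `unsorted_wasserstein_sketch` is a hypothetical name.

```python
import numpy as np


def unsorted_wasserstein_sketch(F, A):
    """Mean absolute gap between cumulative sums of forecast and actual, per series."""
    gap = np.cumsum(F, axis=0) - np.cumsum(A, axis=0)
    return np.mean(np.abs(gap), axis=0)


A = np.array([[1.0], [2.0], [3.0]])
F = np.array([[2.0], [2.0], [2.0]])
distance = unsorted_wasserstein_sketch(F, A)
```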

autots.evaluator.metrics.wasserstein(F, A)

This version has sorting, which is perhaps less relevant on average than the unsorted.

autots.evaluator.validation module

Extracted from auto_ts.py, the functions to create validation segments.

Warning, these are used in AMFM, possibly other places. Avoid modification of function structures, if possible.

Created on Mon Jan 16 11:36:01 2023

@author: Colin

autots.evaluator.validation.extract_seasonal_val_periods(validation_method)
autots.evaluator.validation.generate_validation_indices(validation_method, forecast_length, num_validations, df_wide_numeric, validation_params={}, preclean=None, verbose=0)

Generate validation indices (equals num_validations + 1 as it includes the initial eval).

Parameters:
  • validation_method (str) – ‘backwards’, ‘even’, ‘similarity’, ‘seasonal’, ‘seasonal 364’, etc.

  • forecast_length (int) – number of steps ahead for forecast

  • num_validations (int) – number of additional vals after first eval sample

  • df_wide_numeric (pd.DataFrame) – pandas DataFrame with a dt index and columns as time series

  • preclean (dict) – transformer dict, used for similarity cleaning

  • verbose (int) – verbosity
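The ‘backwards’ idea can be sketched with integer positions. This is a simplified, hypothetical illustration (`backwards_validation_splits` is not part of the library, and the real function works on the DataFrame index): each split holds out the forecast_length steps just before the previous holdout.

```python
def backwards_validation_splits(n_obs, forecast_length, num_validations):
    """Hypothetical sketch: (train, test) position ranges, most recent first."""
    splits = []
    for i in range(num_validations + 1):  # +1 for the initial evaluation
        test_end = n_obs - i * forecast_length
        test_start = test_end - forecast_length
        if test_start <= 0:
            break  # not enough history left to train on
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits


splits = backwards_validation_splits(n_obs=10, forecast_length=2, num_validations=2)
```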

autots.evaluator.validation.validate_num_validations(validation_method, num_validations, df_wide_numeric, forecast_length, min_allowed_train_percent=0.5, verbose=0)

Check how many validations are possible given the length of the data. Beyond initial eval split which is always assumed.

Module contents

Model Evaluators