autots.evaluator.feature_detector package¶
Subpackages¶
- autots.evaluator.feature_detector.components package
- Submodules
- autots.evaluator.feature_detector.components.anomalies module
- autots.evaluator.feature_detector.components.decomposition module
- autots.evaluator.feature_detector.components.holidays module
- autots.evaluator.feature_detector.components.seasonality module
- autots.evaluator.feature_detector.components.trend module
- Module contents
- autots.evaluator.feature_detector.loss package
- autots.evaluator.feature_detector.utils package
Submodules¶
autots.evaluator.feature_detector.detector module¶
TimeSeriesFeatureDetector - Main orchestrator class.
Composes functionality from component mixins for decomposition, seasonality, trend, holidays, anomalies, rescaling, and formatting.
- class autots.evaluator.feature_detector.detector.TimeSeriesFeatureDetector(seasonality_params=None, rough_seasonality_params=None, holiday_params=None, anomaly_params=None, changepoint_params=None, level_shift_params=None, level_shift_validation=None, general_transformer_params=None, smoothing_window=None, standardize=True, detection_mode='multivariate', global_holiday_anomaly_suppression=True, extended_anomaly_params=None, event_dag_params=None, holiday_country=None, holiday_countries=None)¶
Bases:
DecompositionMixin, SeasonalityMixin, TrendMixin, HolidayMixin, AnomalyMixin, ExtendedAnomalyMixin, RescalingMixin, FormattingMixin

Comprehensive feature detection pipeline for time series.
TODOs:
- Upstream more of this code into the component classes (e.g., HolidayDetector, AnomalyRemoval, ChangepointDetector)
- Handle multiplicative seasonality
- Handle time-varying seasonality using fast_kalman
- Improve holiday "splash" effect and weekend interactions
- Support identifying regressor impacts and Granger lag impacts
- Build upon the JSON template so that it can be converted to a fixed-size embedding (probably a 2D embedding). The fixed size may vary by parameters, but for a given parameter set should always be the same size. The embedding does not need to be capable of fully reconstructing the time series, just representing it.
- Support modeling the trend with a fast_kalman state-space approach, ideally aligned with changepoints in some way if possible.
- Consider also having "deviation from group" type anomaly detection for multivariate series
- Improve anomaly typing in univariate mode (currently defaults to point_outlier) and incorporate detector scores into type confidence.
- Detect and expose non-holiday regressor impacts (not just holiday coefficients), and persist them in template/features output.
Parameters
- rough_seasonality_params : dict, optional
Parameters for DatepartRegressionTransformer used in the initial rough seasonality decomposition (to improve holiday and anomaly detection).
- holiday_params : dict, optional
Parameters for HolidayDetector
- anomaly_params : dict, optional
Parameters for AnomalyRemoval
- changepoint_params : dict, optional
Parameters for ChangepointDetector
- level_shift_params : dict, optional
Parameters for LevelShiftMagic
- level_shift_validation : dict, optional
Validation parameters for level shifts
- general_transformer_params : dict, optional
Parameters for GeneralTransformer applied before trend detection
- smoothing_window : int, optional
Window size for smoothing before trend detection
- standardize : bool, default=True
Whether to standardize series before processing
- detection_mode : str, default='multivariate'
Controls whether detections are unique per series ('multivariate') or shared across all series ('univariate').
  - 'multivariate': each series gets unique anomalies, holidays, changepoints, and level shifts
  - 'univariate': all series share common anomalies, holidays, changepoints, and level shifts (level shifts are detected on an aggregated signal and scaled appropriately per series)
- global_holiday_anomaly_suppression : bool, default=True
If True, anomaly detection suppresses holiday-proximate flags using a merged holiday date set from all series. Set False to disable this suppression.
- TEMPLATE_VERSION = '1.2'¶
- fit(df)¶
Fit the feature detector to time series data.
Decomposition follows this sequential removal strategy:
INITIAL DECOMPOSITION (for detection only):
- Remove rough seasonality → rough_residual
- Detect holidays on rough_residual
- Detect anomalies on rough_residual

FINAL SEASONALITY FIT:
- Fit on: original - anomalies
- Holidays fitted simultaneously as regressors
- Output: final_residual (has seasonality + holidays removed)

LEVEL SHIFT DETECTION:
- Detect on: original - anomalies - seasonality - holidays (this is final_residual)

TREND DETECTION:
- Detect on: original - anomalies - seasonality - holidays - level_shifts

NOISE & ANOMALY COMPONENTS:
- Noise: original - trend - level_shifts - seasonality - holidays - anomalies
- Anomalies: difference between original and de-anomalied version
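The sequential-removal flow above can be sketched in a few lines of pandas. This is purely illustrative, not the library's internals: the stand-in component estimators here are simple rolling means, chosen only to show how each detection stage runs on the residual left by the previous removals.

```python
# Illustrative sketch of the sequential removal order used by fit().
# Rolling means stand in for the real seasonality/trend estimators.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=365, freq="D")
original = pd.Series(
    np.linspace(0, 5, 365)                        # trend
    + 2 * np.sin(2 * np.pi * np.arange(365) / 7)  # weekly seasonality
    + rng.normal(0, 0.3, 365),                    # noise
    index=idx,
)

# 1) Rough seasonality removal (placeholder: 7-day centered rolling mean).
rough_seasonality = original.rolling(7, center=True, min_periods=1).mean()
rough_residual = original - rough_seasonality

# 2) Holidays and anomalies would be detected on rough_residual.
anomaly_mask = (rough_residual - rough_residual.mean()).abs() > 3 * rough_residual.std()

# 3) Final seasonality fit on the de-anomalied original.
deanomalied = original.mask(anomaly_mask).interpolate(limit_direction="both")
seasonality = (
    deanomalied.rolling(7, center=True, min_periods=1).mean()
    - deanomalied.rolling(91, center=True, min_periods=1).mean()
)
final_residual = deanomalied - seasonality

# 4) Trend is detected on what remains; noise is the final leftover.
trend = final_residual.rolling(31, center=True, min_periods=1).mean()
noise = final_residual - trend
```

Each stage subtracts one structured component before the next detector runs, so later detectors see progressively cleaner residuals.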
- forecast(forecast_length, frequency=None)¶
Generate a simple forward projection similar to BasicLinearModel. This detector is not optimized for forecasting; dedicated forecasting models may provide better results.
- get_cleaned_data(series_name=None)¶
Return cleaned time series data with anomalies, noise, and level shifts removed.
The cleaned data consists of:
- Trend (with mean included)
- Seasonality
- Holiday effects
Level shifts are corrected by removing the cumulative shift effect, returning the data to its baseline level. Anomalies and noise are excluded entirely.
- Parameters:
series_name (str, optional) – If provided, return cleaned data for only this series. If None, return cleaned data for all series.
- Returns:
Cleaned time series data with the same index as the original data. If series_name is specified, returns a DataFrame with a single column.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If fit() has not been called yet.
ValueError – If series_name is provided but not found in the original data.
Examples
>>> detector = TimeSeriesFeatureDetector()
>>> detector.fit(df)
>>> cleaned = detector.get_cleaned_data()
>>> cleaned_single = detector.get_cleaned_data('series_1')
- get_detected_features(series_name=None, include_components=False, include_metadata=True)¶
- get_event_dag(deep=True)¶
Return Event DAG metadata derived from detector outputs.
- static get_new_params(method='random')¶
Sample random parameters for detector optimization.
- get_template(deep=True)¶
- plot(series_name=None, figsize=(16, 14), save_path=None, show=True, separate_noise_anomaly_panels=True, dual_axis_seasonality_holidays=True, dual_axis_trend_level_shift=True)¶
- plot_event_dag(series=None, start_date=None, end_date=None, show_members=False, figsize=(14, 6), save_path=None, show=True)¶
Plot Event DAG macro-events on a timeline-first layout.
- query_features(dates=None, series=None, include_components=False, include_metadata=False, include_event_dag=False, include_event_members=False, return_json=False)¶
Query a specific slice of detected features with minimal token usage.
Designed for LLM-friendly output with compact representation.
- Parameters:
dates (str, datetime, list, slice) – Date(s) to query for features.
  - Single date: "2024-01-15" or a datetime object
  - Date range: slice("2024-01-01", "2024-01-31")
  - List of dates: ["2024-01-15", "2024-01-20"]
  - None: return all features (not filtered by date)
series (str, list) – Series name(s) to query.
  - Single series: "sales"
  - Multiple series: ["sales", "revenue"]
  - None: all series
include_components (bool) – Include component time series values for the date range
include_metadata (bool) – Include metadata like noise levels, scales, etc.
include_event_dag (bool) – Include Event DAG cluster and family metadata
include_event_members (bool) – Include raw Event DAG member events
return_json (bool) – Return JSON string instead of dict
- Returns:
Compact feature data including anomalies, changepoints, level shifts, holidays, and optionally components.
- Return type:
dict or str
Examples
>>> # Get all features for one series
>>> detector.query_features(series="sales")

>>> # Get features occurring in a date range
>>> detector.query_features(
...     dates=slice("2024-01-01", "2024-01-31"),
...     series=["sales", "revenue"]
... )

>>> # Get components for specific dates
>>> detector.query_features(
...     dates=["2024-01-15", "2024-01-16"],
...     series="sales",
...     include_components=True
... )
- classmethod render_template(template, return_components=False)¶
Render a feature detection template back into time series data.
- summary()¶
- tune_with_synthetic(real_df, n_synthetic_series=16, n_tune_iterations=25, n_detector_iterations=30, tune_seed=42, loss_params=None, loss_weights=None, synthetic_starting_params=None, starting_params=None, verbose=True)¶
Tune synthetic data to a real dataset, optimize detector params, and fit self.
After completion, this instance is fitted on real_df with the optimized detector parameters and stores optimization artifacts on the instance.
autots.evaluator.feature_detector.event_dag module¶
Event DAG utilities for TimeSeriesFeatureDetector.
- autots.evaluator.feature_detector.event_dag.build_event_dag_from_detector(detector)¶
Build an Event DAG from detector public event outputs.
- autots.evaluator.feature_detector.event_dag.empty_event_dag(params=None, detection_mode='multivariate', construction_mode='full', series_names=None)¶
Return a valid empty Event DAG container.
- autots.evaluator.feature_detector.event_dag.resolve_event_dag_params(params=None)¶
Return normalized Event DAG params.
autots.evaluator.feature_detector.event_dag_view module¶
Event DAG filtering and plotting helpers.
- autots.evaluator.feature_detector.event_dag_view.filter_event_dag(event_dag, series=None, start_date=None, end_date=None, include_members=True)¶
Filter an Event DAG view by series and date range.
- autots.evaluator.feature_detector.event_dag_view.plot_event_dag_timeline(event_dag, series=None, start_date=None, end_date=None, show_members=False, figsize=(14, 6), save_path=None, show=True)¶
Render a timeline-first Event DAG view.
autots.evaluator.feature_detector.extended_anomaly module¶
ExtendedAnomalyDetector - Two-pass anomaly detector for multi-day patterns.
Pass 1: Point anomaly proposals via AnomalyRemoval.
Pass 2: Extended/multi-day pattern detection:
- CUSUM (sustained mean shift for noisy_burst / transient_change)
- Cumulative-sum template matching (slope_reversion onset + hold + reversion)
- Decay extension of pass-1 points (impulse_decay / linear_decay)
- Segmented sliding-window mean shift (noisy_burst / transient_change)
Can be used standalone or embedded in TimeSeriesFeatureDetector via ExtendedAnomalyMixin.
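The CUSUM pass above follows the standard tabular cumulative-sum scheme, which the `cusum_k` / `cusum_h` parameters parameterize. A minimal sketch of that scheme, assuming the textbook two-sided formulation (not the module's exact implementation):

```python
# Two-sided tabular CUSUM sketch: k is the allowance (slack) and h the
# decision threshold, both in standardized units, mirroring cusum_k/cusum_h.
import numpy as np

def cusum_alarms(x, k=0.5, h=5.0):
    """Return indices where the cumulative-sum statistic exceeds h."""
    z = (x - x.mean()) / x.std()          # standardize against the full-series baseline
    s_pos = s_neg = 0.0
    alarms = []
    for i, zi in enumerate(z):
        s_pos = max(0.0, s_pos + zi - k)  # accumulates sustained upward drift
        s_neg = max(0.0, s_neg - zi - k)  # accumulates sustained downward drift
        if s_pos > h or s_neg > h:
            alarms.append(i)
            s_pos = s_neg = 0.0           # reset after an alarm fires
    return alarms

rng = np.random.default_rng(1)
series = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 40)])
alarms = cusum_alarms(series, k=0.5, h=5.0)
# An alarm should fire shortly after the mean shift at index 100.
```

Smaller `k` makes the statistic accumulate faster (more sensitive); larger `h` delays alarms but suppresses false positives.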
- class autots.evaluator.feature_detector.extended_anomaly.ExtendedAnomalyDetector(point_anomaly_params=None, sustained_window=7, sustained_baseline=60, sustained_threshold=2.5, cusum_k=0.5, cusum_h=5.0, slope_reversion_min_hold=5, slope_reversion_min_reversion=7, slope_reversion_cumsum_threshold=3.0, slope_reversion_max_duration=84, decay_lookahead=14, decay_fit_min_r2=0.5, min_segment_run=2, sustained_hysteresis=0.7, segment_max_gap=1, merge_distance_days=3, max_anomalies_per_series=25)¶
Bases:
object

Two-pass anomaly detector combining point detection with multi-day pattern detection.

Pass 1 produces point-level proposals via AnomalyRemoval (optional if pass1_records are provided externally).

Pass 2 runs four independent detection methods on a clean residual:
- CUSUM: cumulative-sum alarm for sustained mean shifts.
- Slope-reversion template: cumulative-sum peak/trough analysis for slow onset → hold → reversion patterns.
- Decay extension: extends pass-1 point detections with exponential or linear decay tails.
- Segmented mean shift: sliding-window mean comparison against a rolling baseline.

All per-series events are then merged and de-duplicated into a final list that preserves start_date, end_date, duration, type, magnitude, and score.

- Parameters:
point_anomaly_params (dict, optional) – Keyword arguments forwarded to AnomalyRemoval for pass-1 point detection. If None, a conservative rolling-zscore detector is used.
sustained_window (int) – Short rolling window (days) used by the segmented-shift and CUSUM detectors to compute local means.
sustained_baseline (int) – Longer window used to estimate the baseline mean and standard deviation.
sustained_threshold (float) – Standardized deviation threshold (in units of baseline σ) above which a window is considered anomalous.
cusum_k (float) – CUSUM allowance parameter (slack / half-width) in standardized units. Smaller values are more sensitive.
cusum_h (float) – CUSUM decision threshold. An alarm fires when the accumulated statistic exceeds this value.
slope_reversion_min_hold (int) – Minimum number of days that the cumulative sum must stay elevated before a slope-reversion event is flagged.
slope_reversion_min_reversion (int) – Minimum number of days the reversion phase must last.
slope_reversion_cumsum_threshold (float) – Minimum peak z-score of the cumulative sum (relative to its rolling σ) required to trigger a slope-reversion candidate.
slope_reversion_max_duration (int) – Maximum allowed duration (days) for slope-reversion events. Longer candidates are treated as structural drift and ignored.
decay_lookahead (int) – Number of days to inspect after a pass-1 peak for a decay tail.
decay_fit_min_r2 (float) – Minimum R² required for a decay template fit to extend a point event.
min_segment_run (int) – Minimum number of consecutive elevated windows required by the segmented-shift detector to form an event.
sustained_hysteresis (float) – Fraction of sustained_threshold used to keep an in-progress segmented run active after it starts. Values in (0, 1] reduce fragmentation from brief dips.
segment_max_gap (int) – Maximum number of non-flagged days allowed between two segmented runs before stitching them into one event.
merge_distance_days (int) – Events within this many days of each other (or overlapping) are merged.
max_anomalies_per_series (int) – Cap on the number of events returned per series.
- fit(residual_df, pass1_records=None)¶
Detect extended anomalies in residual_df.
- Parameters:
residual_df (pd.DataFrame) – Clean residual DataFrame (original minus all structured components). Each column is treated as an independent series.
pass1_records (dict, optional) – Pre-computed point anomaly records keyed by series name ({series: [{‘date’: …, ‘magnitude’: …, ‘type’: …, ‘score’: …}]}). When provided the internal pass-1 AnomalyRemoval run is skipped.
- Return type:
self
- get_events(series_name=None)¶
Return detected events.
- Parameters:
series_name (str, optional) – If given, return events for that series only (as a list). Otherwise return the full dict.
- static get_new_params(method='random')¶
Sample random parameters for optimizer search.
- class autots.evaluator.feature_detector.extended_anomaly.ExtendedAnomalyMixin¶
Bases:
object

Mixin for TimeSeriesFeatureDetector that adds a second extended anomaly detection pass after the main decomposition is complete.

The extended pass operates on the cleanest available residual: noise_component + anomaly_component (= original minus all structured components), so that structured effects do not contaminate the extended anomaly detection.

Requires the host class to expose:
- self.extended_anomaly_params (dict or falsy to disable)
- self._anomaly_records_temp (populated by pass-1 in DecompositionMixin)
autots.evaluator.feature_detector.optimizer module¶
FeatureDetectionOptimizer - Hyperparameter optimization using synthetic data.
- class autots.evaluator.feature_detector.optimizer.FeatureDetectionOptimizer(synthetic_generator, loss_calculator=None, n_iterations=50, random_seed=42, starting_params=None, search_strategy='random', selection_strategy='recovery_lexicographic', stage_budget=None)¶
Bases:
object

Optimize TimeSeriesFeatureDetector parameters using synthetic labeled data.
Defaults to a broad random/genetic search with recovery-first selection.
- fine_tune_changepoints(starting_params, n_per_stage=200, curriculum_sigmas=None, tversky_alpha=0.3, tversky_beta=0.7, tversky_gamma=2.0, level_shift_weight=0.35, exclude_changepoint_methods=None, over_prediction_penalty=0.1, location_weight=0.35, count_weight=0.25, slope_match_weight=0.15)¶
Focused fine-tuning pass that freezes every parameter group except changepoint_params and level_shift_params.

All other parameters (seasonality, anomaly, holiday, etc.) are held fixed so the optimizer can zero in on changepoint quality without interference.
The loss function is a statistical translation of techniques designed for neural changepoint training:
- Gaussian Label Smoothing
Instead of a hard ±tolerance binary label, each true changepoint is represented as a Gaussian probability distribution centred on its date with standard deviation sigma. This converts the step-function loss landscape into smooth, convex basins and ensures that detections that are "close but not exact" receive a constructive gradient signal.
- Focal Tversky Loss (statistical translation)
The metric used for scoring is the Focal Tversky index with alpha < beta (default 0.3 / 0.7), which heavily penalises false negatives over false positives, directly preventing the zero-prediction collapse that plagues changepoint tuning. The focal exponent gamma=2.0 concentrates the gradient on partially-matched changepoints rather than already-correct ones.
- Curriculum Learning (sigma annealing)
Three stages with decreasing sigma drive the search from coarse to fine sensitivity:
  - Stage 1: sigma=14 days (wide window builds initial recall)
  - Stage 2: sigma=7 days (medium window matches ±7-day tolerance)
  - Stage 3: sigma=3.5 days (tight window polishes placement precision)
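The label smoothing and Focal Tversky pieces above can be combined into a small scoring function. This sketch is illustrative only (the optimizer's exact scoring differs, e.g. its count and location penalties are omitted here): true and detected changepoints become Gaussian bumps on the date axis, and the Tversky index with alpha < beta penalises misses more than false alarms before the focal exponent is applied.

```python
# Smoothed-label Focal Tversky score for changepoint placement (sketch).
import numpy as np

def smoothed_labels(n_days, changepoints, sigma):
    """Gaussian bump of width sigma at each changepoint, max-combined."""
    t = np.arange(n_days)
    labels = np.zeros(n_days)
    for cp in changepoints:
        labels = np.maximum(labels, np.exp(-0.5 * ((t - cp) / sigma) ** 2))
    return labels

def focal_tversky_loss(true_cps, detected_cps, n_days, sigma=7.0,
                       alpha=0.3, beta=0.7, gamma=2.0):
    y = smoothed_labels(n_days, true_cps, sigma)
    p = smoothed_labels(n_days, detected_cps, sigma)
    tp = np.sum(y * p)
    fp = np.sum((1 - y) * p)   # detection mass away from true changepoints
    fn = np.sum(y * (1 - p))   # true-changepoint mass left uncovered
    tversky = tp / (tp + alpha * fp + beta * fn + 1e-9)
    return (1.0 - tversky) ** gamma

# A detection 3 days off scores far better than no detection at all:
close = focal_tversky_loss([100], [103], n_days=365)
miss = focal_tversky_loss([100], [], n_days=365)
```

Shrinking `sigma` across curriculum stages narrows the bumps, so the same 3-day miss is rewarded less as the search tightens.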
- Parameters:
starting_params (dict) – Full detector parameter dict to use as the frozen baseline. All keys except changepoint_params and level_shift_params are immutably frozen throughout the run.
n_per_stage (int) – Number of candidate configurations evaluated per curriculum stage.
curriculum_sigmas (list of float, optional) – Sigma values (in days) for each curriculum stage. Defaults to [14.0, 7.0, 3.5].
tversky_alpha (float) – FP weight in Tversky denominator (keep < tversky_beta).
tversky_beta (float) – FN weight in Tversky denominator (keep > tversky_alpha).
tversky_gamma (float) – Focal exponent applied to (1 - Tversky_index).
level_shift_weight (float) – Blend weight for level-shift Tversky loss in the final score (trend changepoints get 1 - weight). Defaults below 0.5 so the fine-tune remains changepoint-first while still rewarding cleaner level-shift separation.
exclude_changepoint_methods (list of str, optional) – Changepoint method names to exclude from the search. Defaults to ['basic'], which prevents the evenly-spaced pseudo-detector from being selected (it cannot be used for analytic purposes). Pass an empty list [] to allow all methods including 'basic'.
over_prediction_penalty (float) – Scales how quickly the count penalty ramps once detections exceed the slight-over buffer. Higher values curb severe over-segmentation without removing the mild recall bias near the target count.
location_weight (float) – Weight on an explicit symmetric distance penalty. This makes a count-correct but badly misplaced solution score worse than a nearby over-detected one, which is the balance needed for downstream trend fitting.
count_weight (float) – Weight on count calibration. Slight over-detection is tolerated more than under-detection, but the penalty ramps quickly once excess changepoints move beyond the preferred buffer.
slope_match_weight (float) – Weight on slope-change alignment for nearby trend changepoints. This favors candidates that place changepoints where the underlying trend change is directionally and numerically similar to ground truth.
- Returns:
Best full parameter dict found, with only changepoint/level-shift params potentially changed from starting_params.
- Return type:
dict
- get_optimization_summary()¶
Return summary of optimization results.
- optimize(starting_params=None)¶
Run genetic-style optimization to find best detector parameters.
- Parameters:
starting_params (dict, optional) – Optional seed parameter configuration. Overrides constructor value when provided.
- Returns:
Best parameters found
- Return type:
dict
Module contents¶
Time Series Feature Detection and Optimization package.
- class autots.evaluator.feature_detector.ExtendedAnomalyDetector(point_anomaly_params=None, sustained_window=7, sustained_baseline=60, sustained_threshold=2.5, cusum_k=0.5, cusum_h=5.0, slope_reversion_min_hold=5, slope_reversion_min_reversion=7, slope_reversion_cumsum_threshold=3.0, slope_reversion_max_duration=84, decay_lookahead=14, decay_fit_min_r2=0.5, min_segment_run=2, sustained_hysteresis=0.7, segment_max_gap=1, merge_distance_days=3, max_anomalies_per_series=25)¶
Bases:
object

Two-pass anomaly detector combining point detection with multi-day pattern detection.

Pass 1 produces point-level proposals via AnomalyRemoval (optional if pass1_records are provided externally).

Pass 2 runs four independent detection methods on a clean residual:
- CUSUM: cumulative-sum alarm for sustained mean shifts.
- Slope-reversion template: cumulative-sum peak/trough analysis for slow onset → hold → reversion patterns.
- Decay extension: extends pass-1 point detections with exponential or linear decay tails.
- Segmented mean shift: sliding-window mean comparison against a rolling baseline.

All per-series events are then merged and de-duplicated into a final list that preserves start_date, end_date, duration, type, magnitude, and score.

- Parameters:
point_anomaly_params (dict, optional) – Keyword arguments forwarded to AnomalyRemoval for pass-1 point detection. If None, a conservative rolling-zscore detector is used.
sustained_window (int) – Short rolling window (days) used by the segmented-shift and CUSUM detectors to compute local means.
sustained_baseline (int) – Longer window used to estimate the baseline mean and standard deviation.
sustained_threshold (float) – Standardized deviation threshold (in units of baseline σ) above which a window is considered anomalous.
cusum_k (float) – CUSUM allowance parameter (slack / half-width) in standardized units. Smaller values are more sensitive.
cusum_h (float) – CUSUM decision threshold. An alarm fires when the accumulated statistic exceeds this value.
slope_reversion_min_hold (int) – Minimum number of days that the cumulative sum must stay elevated before a slope-reversion event is flagged.
slope_reversion_min_reversion (int) – Minimum number of days the reversion phase must last.
slope_reversion_cumsum_threshold (float) – Minimum peak z-score of the cumulative sum (relative to its rolling σ) required to trigger a slope-reversion candidate.
slope_reversion_max_duration (int) – Maximum allowed duration (days) for slope-reversion events. Longer candidates are treated as structural drift and ignored.
decay_lookahead (int) – Number of days to inspect after a pass-1 peak for a decay tail.
decay_fit_min_r2 (float) – Minimum R² required for a decay template fit to extend a point event.
min_segment_run (int) – Minimum number of consecutive elevated windows required by the segmented-shift detector to form an event.
sustained_hysteresis (float) – Fraction of sustained_threshold used to keep an in-progress segmented run active after it starts. Values in (0, 1] reduce fragmentation from brief dips.
segment_max_gap (int) – Maximum number of non-flagged days allowed between two segmented runs before stitching them into one event.
merge_distance_days (int) – Events within this many days of each other (or overlapping) are merged.
max_anomalies_per_series (int) – Cap on the number of events returned per series.
- fit(residual_df, pass1_records=None)¶
Detect extended anomalies in residual_df.
- Parameters:
residual_df (pd.DataFrame) – Clean residual DataFrame (original minus all structured components). Each column is treated as an independent series.
pass1_records (dict, optional) – Pre-computed point anomaly records keyed by series name ({series: [{‘date’: …, ‘magnitude’: …, ‘type’: …, ‘score’: …}]}). When provided the internal pass-1 AnomalyRemoval run is skipped.
- Return type:
self
- get_events(series_name=None)¶
Return detected events.
- Parameters:
series_name (str, optional) – If given, return events for that series only (as a list). Otherwise return the full dict.
- static get_new_params(method='random')¶
Sample random parameters for optimizer search.
- class autots.evaluator.feature_detector.FeatureDetectionLoss(changepoint_tolerance_days=7, level_shift_tolerance_days=7, anomaly_tolerance_days=1, holiday_tolerance_days=1, seasonality_window=14, weights=None, holiday_over_anomaly_bonus=0.4, trend_component_penalty='component', trend_complexity_window=7, trend_complexity_weight=0.0, focus_component_weights=False, validation_strictness=1.0, invalid_loss_mode='penalty', invalid_loss_penalty=1000000.0)¶
Bases:
LossMetricsMixin, LossEvaluatorsMixin

Comprehensive loss calculator for feature detection optimization.

Each synthetic label family contributes to the total loss:
- Trend changepoints and slopes
- Level shifts
- Anomalies (including shared events and post patterns)
- Holiday timing, direct impacts, and splash/bridge days
- Seasonality strength, patterns, and changepoints
- Noise regimes and noise-to-signal characteristics
- Low-frequency noise structure consistency (drift/shift leakage)
- Series-level metadata consistency (scale, type)
- Regressor impacts when present
- DEFAULT_WEIGHTS = {'anomaly_loss': 1.3, 'holiday_event_loss': 1.2, 'holiday_impact_loss': 0.9, 'holiday_recall_loss': 0.9, 'holiday_splash_loss': 0.03, 'level_shift_loss': 1.3, 'metadata_loss': 0.05, 'noise_level_loss': 0.5, 'noise_regime_loss': 0.4, 'noise_structure_loss': 0.2, 'regressor_loss': 0.3, 'seasonality_changepoint_loss': 0.01, 'seasonality_pattern_loss': 2.0, 'seasonality_strength_loss': 2.0, 'trend_loss': 1.0}¶
- INVALID_LOSS_PENALTY = 1000000.0¶
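One plausible reading of how DEFAULT_WEIGHTS feeds the total is a weighted sum over the per-component losses, with user-supplied weights overriding the defaults. The library's exact aggregation may differ; this is only a sketch of the idea:

```python
# Hypothetical weighted aggregation over DEFAULT_WEIGHTS-style keys.
def total_weighted_loss(component_losses, weights, overrides=None):
    """Sum each component loss scaled by its (possibly overridden) weight."""
    w = {**weights, **(overrides or {})}
    return sum(w.get(name, 1.0) * value for name, value in component_losses.items())

losses = {"trend_loss": 0.2, "anomaly_loss": 0.1, "level_shift_loss": 0.05}
weights = {"trend_loss": 1.0, "anomaly_loss": 1.3, "level_shift_loss": 1.3}
total = total_weighted_loss(losses, weights)  # 0.2*1.0 + 0.1*1.3 + 0.05*1.3
```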
- calculate_loss(detected_features, true_labels, series_name=None, true_components=None, date_index=None)¶
Calculate overall loss comparing detected features to true labels.
- Parameters:
detected_features (dict) – Output from TimeSeriesFeatureDetector.get_detected_features(…)
true_labels (dict) – Labels from SyntheticDailyGenerator.get_all_labels(…)
series_name (str, optional) – If provided, only evaluate the named series.
true_components (dict, optional) – Mapping of series -> component arrays from SyntheticDailyGenerator.get_components()
date_index (pd.DatetimeIndex, optional) – Index used for the time series. Required for seasonality changepoint evaluation.
- Returns:
Loss breakdown with per-component metrics and total weighted loss.
- Return type:
dict
- class autots.evaluator.feature_detector.FeatureDetectionOptimizer(synthetic_generator, loss_calculator=None, n_iterations=50, random_seed=42, starting_params=None, search_strategy='random', selection_strategy='recovery_lexicographic', stage_budget=None)¶
Bases:
object

Optimize TimeSeriesFeatureDetector parameters using synthetic labeled data.
Defaults to a broad random/genetic search with recovery-first selection.
- fine_tune_changepoints(starting_params, n_per_stage=200, curriculum_sigmas=None, tversky_alpha=0.3, tversky_beta=0.7, tversky_gamma=2.0, level_shift_weight=0.35, exclude_changepoint_methods=None, over_prediction_penalty=0.1, location_weight=0.35, count_weight=0.25, slope_match_weight=0.15)¶
Focused fine-tuning pass that freezes every parameter group except changepoint_params and level_shift_params.

All other parameters (seasonality, anomaly, holiday, etc.) are held fixed so the optimizer can zero in on changepoint quality without interference.
The loss function is a statistical translation of techniques designed for neural changepoint training:
- Gaussian Label Smoothing
Instead of a hard ±tolerance binary label, each true changepoint is represented as a Gaussian probability distribution centred on its date with standard deviation sigma. This converts the step-function loss landscape into smooth, convex basins and ensures that detections that are "close but not exact" receive a constructive gradient signal.
- Focal Tversky Loss (statistical translation)
The metric used for scoring is the Focal Tversky index with alpha < beta (default 0.3 / 0.7), which heavily penalises false negatives over false positives, directly preventing the zero-prediction collapse that plagues changepoint tuning. The focal exponent gamma=2.0 concentrates the gradient on partially-matched changepoints rather than already-correct ones.
- Curriculum Learning (sigma annealing)
Three stages with decreasing sigma drive the search from coarse to fine sensitivity:
  - Stage 1: sigma=14 days (wide window builds initial recall)
  - Stage 2: sigma=7 days (medium window matches ±7-day tolerance)
  - Stage 3: sigma=3.5 days (tight window polishes placement precision)
- Parameters:
starting_params (dict) – Full detector parameter dict to use as the frozen baseline. All keys except changepoint_params and level_shift_params are immutably frozen throughout the run.
n_per_stage (int) – Number of candidate configurations evaluated per curriculum stage.
curriculum_sigmas (list of float, optional) – Sigma values (in days) for each curriculum stage. Defaults to [14.0, 7.0, 3.5].
tversky_alpha (float) – FP weight in Tversky denominator (keep < tversky_beta).
tversky_beta (float) – FN weight in Tversky denominator (keep > tversky_alpha).
tversky_gamma (float) – Focal exponent applied to (1 - Tversky_index).
level_shift_weight (float) – Blend weight for level-shift Tversky loss in the final score (trend changepoints get 1 - weight). Defaults below 0.5 so the fine-tune remains changepoint-first while still rewarding cleaner level-shift separation.
exclude_changepoint_methods (list of str, optional) – Changepoint method names to exclude from the search. Defaults to ['basic'], which prevents the evenly-spaced pseudo-detector from being selected (it cannot be used for analytic purposes). Pass an empty list [] to allow all methods including 'basic'.
over_prediction_penalty (float) – Scales how quickly the count penalty ramps once detections exceed the slight-over buffer. Higher values curb severe over-segmentation without removing the mild recall bias near the target count.
location_weight (float) – Weight on an explicit symmetric distance penalty. This makes a count-correct but badly misplaced solution score worse than a nearby over-detected one, which is the balance needed for downstream trend fitting.
count_weight (float) – Weight on count calibration. Slight over-detection is tolerated more than under-detection, but the penalty ramps quickly once excess changepoints move beyond the preferred buffer.
slope_match_weight (float) – Weight on slope-change alignment for nearby trend changepoints. This favors candidates that place changepoints where the underlying trend change is directionally and numerically similar to ground truth.
- Returns:
Best full parameter dict found, with only changepoint/level-shift params potentially changed from starting_params.
- Return type:
dict
- get_optimization_summary()¶
Return summary of optimization results.
- optimize(starting_params=None)¶
Run genetic-style optimization to find best detector parameters.
- Parameters:
starting_params (dict, optional) – Optional seed parameter configuration. Overrides constructor value when provided.
- Returns:
Best parameters found
- Return type:
dict
- class autots.evaluator.feature_detector.ReconstructionLoss(trend_complexity_window=7, trend_complexity_weight=1.0, metric_weights=None, trend_dominance_target=0.65, trend_min_other_variance=0.0001, seasonality_lags=(7, 365), seasonality_min_autocorr=0.1, seasonality_improvement_target=0.35, anomaly_improvement_target=0.25, anomaly_min_pre_std=0.001)¶
Bases:
FeatureDetectionLoss

Loss function tailored for real-world datasets lacking component-level labels.
Focuses on reconstruction quality while discouraging overly complex trend fits and encouraging variance to be attributed to seasonality, holidays, anomalies, and level shifts.
- DEFAULT_METRIC_WEIGHTS = {'anomaly_capture_loss': 0.7, 'noise_whiteness_loss': 0.5, 'reconstruction_loss': 0.5, 'seasonality_capture_loss': 0.8, 'seasonality_shape_loss': 0.6, 'structural_loss': 1.0, 'trend_dominance_loss': 0.9, 'trend_smoothness_loss': 1.2}¶
- calculate_loss(observed_df, detected_features, components=None, series_name=None)¶
Calculate reconstruction-oriented loss for unlabeled datasets.
- Parameters:
observed_df (pd.DataFrame) – Original time series data used for detection.
detected_features (dict) – Output from TimeSeriesFeatureDetector.get_detected_features(…, include_components=True).
components (dict, optional) – Explicit component container matching get_detected_features()[‘components’].
series_name (str, optional) – Restrict evaluation to a single series.
- Returns:
Loss metrics per series and aggregate total weighted loss.
- Return type:
dict
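DEFAULT_METRIC_WEIGHTS suggests the aggregate total is a weighted sum of per-metric losses. A minimal sketch of that aggregation, with made-up metric values (not produced by the library) and a subset of the weight keys shown above:

```python
# Hypothetical per-series metric values; keys mirror a few of the
# DEFAULT_METRIC_WEIGHTS entries, values are invented for illustration.
weights = {'reconstruction_loss': 0.5, 'trend_smoothness_loss': 1.2,
           'seasonality_capture_loss': 0.8}
metrics = {'reconstruction_loss': 0.10, 'trend_smoothness_loss': 0.20,
           'seasonality_capture_loss': 0.05}

def weighted_total(metrics, weights):
    """Total weighted loss: each metric scaled by its weight, then summed."""
    return sum(weights.get(name, 1.0) * value for name, value in metrics.items())

total = weighted_total(metrics, weights)  # 0.5*0.10 + 1.2*0.20 + 0.8*0.05
```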
- class autots.evaluator.feature_detector.TimeSeriesFeatureDetector(seasonality_params=None, rough_seasonality_params=None, holiday_params=None, anomaly_params=None, changepoint_params=None, level_shift_params=None, level_shift_validation=None, general_transformer_params=None, smoothing_window=None, standardize=True, detection_mode='multivariate', global_holiday_anomaly_suppression=True, extended_anomaly_params=None, event_dag_params=None, holiday_country=None, holiday_countries=None)¶
Bases:
DecompositionMixin, SeasonalityMixin, TrendMixin, HolidayMixin, AnomalyMixin, ExtendedAnomalyMixin, RescalingMixin, FormattingMixin
Comprehensive feature detection pipeline for time series.
TODO:
- Upstream more of this code into the component classes (e.g., HolidayDetector, AnomalyRemoval, ChangepointDetector)
- Handle multiplicative seasonality
- Handle time-varying seasonality using fast_kalman
- Improve holiday “splash” effect and weekend interactions
- Support identifying regressor impacts and Granger lag impacts
- Build upon the JSON template so that it can be converted to a fixed-size embedding (probably a 2D embedding). The fixed size may vary by parameters, but for a given parameter set should always be the same size. The embedding does not need to be capable of fully reconstructing the time series, just representing it.
- Support modeling the trend with a fast_kalman state-space approach, ideally aligned with changepoints in some way if possible.
- Consider also having “deviation from group” type anomaly detection for multivariate series
- Improve anomaly typing in univariate mode (currently defaults to point_outlier) and incorporate detector scores into type confidence.
- Detect and expose non-holiday regressor impacts (not just holiday coefficients), and persist them in template/features output.
Parameters
- rough_seasonality_params (dict, optional)
Parameters for DatepartRegressionTransformer used in the initial rough seasonality decomposition (to improve holiday and anomaly detection).
- holiday_params (dict, optional)
Parameters for HolidayDetector
- anomaly_params (dict, optional)
Parameters for AnomalyRemoval
- changepoint_params (dict, optional)
Parameters for ChangepointDetector
- level_shift_params (dict, optional)
Parameters for LevelShiftMagic
- level_shift_validation (dict, optional)
Validation parameters for level shifts
- general_transformer_params (dict, optional)
Parameters for GeneralTransformer applied before trend detection
- smoothing_window (int, optional)
Window size for smoothing before trend detection
- standardize (bool, default=True)
Whether to standardize series before processing
- detection_mode (str, default=’multivariate’)
Controls whether detections are unique per series (‘multivariate’) or shared across all series (‘univariate’).
- ‘multivariate’: each series gets unique anomalies, holidays, changepoints, and level shifts.
- ‘univariate’: all series share common anomalies, holidays, changepoints, and level shifts (level shifts are detected on an aggregated signal and scaled appropriately per series).
- global_holiday_anomaly_suppression (bool, default=True)
If True, anomaly detection suppresses holiday-proximate flags using a merged holiday date set from all series. Set False to disable this suppression.
- TEMPLATE_VERSION = '1.2'¶
- fit(df)¶
Fit the feature detector to time series data.
Decomposition follows this sequential removal strategy:
INITIAL DECOMPOSITION (for detection only):
- Remove rough seasonality → rough_residual
- Detect holidays on rough_residual
- Detect anomalies on rough_residual
FINAL SEASONALITY FIT:
- Fit on: original - anomalies
- Holidays fitted simultaneously as regressors
- Output: final_residual (has seasonality + holidays removed)
LEVEL SHIFT DETECTION:
- Detect on: original - anomalies - seasonality - holidays (this is final_residual)
TREND DETECTION:
- Detect on: original - anomalies - seasonality - holidays - level_shifts
NOISE & ANOMALY COMPONENTS:
- Noise: original - trend - level_shifts - seasonality - holidays - anomalies
- Anomalies: difference between original and the de-anomalied version
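The first step of this sequential removal strategy can be illustrated with plain pandas (a simplified sketch, not the detector’s actual code: a rolling-mean detrend and day-of-week means stand in for DatepartRegressionTransformer):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=140, freq="D")
trend = np.linspace(10.0, 17.0, len(idx))
seasonality = 2.0 * np.sin(2 * np.pi * idx.dayofweek / 7)
noise = rng.normal(0, 0.1, len(idx))
observed = pd.Series(trend + seasonality + noise, index=idx)

# Rough seasonality removal: subtract day-of-week means, estimated after
# a simple rolling-mean detrend so the trend does not bias the estimate.
rolling_trend = observed.rolling(7, center=True, min_periods=1).mean()
detrended = observed - rolling_trend
dow_means = detrended.groupby(idx.dayofweek).transform("mean")
rough_residual = observed - dow_means
# Downstream steps (holiday, anomaly, level-shift, and trend detection)
# would then operate on rough_residual and its successors.
```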
- forecast(forecast_length, frequency=None)¶
Generate a simple forward projection similar to BasicLinearModel. This detector is not optimized for forecasting; dedicated forecasting models may provide better results.
- get_cleaned_data(series_name=None)¶
Return cleaned time series data with anomalies, noise, and level shifts removed.
The cleaned data consists of:
- Trend (with mean included)
- Seasonality
- Holiday effects
Level shifts are corrected by removing the cumulative shift effect, returning the data to its baseline level. Anomalies and noise are excluded entirely.
- Parameters:
series_name (str, optional) – If provided, return cleaned data for only this series. If None, return cleaned data for all series.
- Returns:
Cleaned time series data with the same index as the original data. If series_name is specified, returns a DataFrame with a single column.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If fit() has not been called yet.
ValueError – If series_name is provided but not found in the original data.
Examples
>>> detector = TimeSeriesFeatureDetector()
>>> detector.fit(df)
>>> cleaned = detector.get_cleaned_data()
>>> cleaned_single = detector.get_cleaned_data('series_1')
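The cumulative level-shift correction described above can be sketched with plain pandas (illustrative only; the (date, magnitude) representation of detected shifts is an assumption, not the library’s internal format):

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=10, freq="D")
values = pd.Series([5.0] * 10, index=idx)
values.iloc[6:] += 3.0  # a +3 level shift starting at the 7th observation

# Represent a detected shift as a magnitude at its start date; the running
# (cumulative) sum of magnitudes is what gets subtracted to return the
# series to its baseline level.
shifts = pd.Series(0.0, index=idx)
shifts.iloc[6] = 3.0
cumulative_shift = shifts.cumsum()
cleaned = values - cumulative_shift  # back to the flat 5.0 baseline
```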
- get_detected_features(series_name=None, include_components=False, include_metadata=True)¶
- get_event_dag(deep=True)¶
Return Event DAG metadata derived from detector outputs.
- static get_new_params(method='random')¶
Sample random parameters for detector optimization.
- get_template(deep=True)¶
- plot(series_name=None, figsize=(16, 14), save_path=None, show=True, separate_noise_anomaly_panels=True, dual_axis_seasonality_holidays=True, dual_axis_trend_level_shift=True)¶
- plot_event_dag(series=None, start_date=None, end_date=None, show_members=False, figsize=(14, 6), save_path=None, show=True)¶
Plot Event DAG macro-events on a timeline-first layout.
- query_features(dates=None, series=None, include_components=False, include_metadata=False, include_event_dag=False, include_event_members=False, return_json=False)¶
Query a specific slice of detected features with minimal token usage.
Designed for LLM-friendly output with compact representation.
- Parameters:
dates (str, datetime, list, slice) – Date(s) to query for features.
- Single date: “2024-01-15” or datetime object
- Date range: slice(“2024-01-01”, “2024-01-31”)
- List of dates: [“2024-01-15”, “2024-01-20”]
- None: return all features (not filtered by date)
series (str, list) – Series name(s) to query.
- Single series: “sales”
- Multiple series: [“sales”, “revenue”]
- None: all series
include_components (bool) – Include component time series values for the date range
include_metadata (bool) – Include metadata like noise levels, scales, etc.
include_event_dag (bool) – Include Event DAG cluster and family metadata
include_event_members (bool) – Include raw Event DAG member events
return_json (bool) – Return JSON string instead of dict
- Returns:
Compact feature data including anomalies, changepoints, level shifts, holidays, and optionally components
- Return type:
dict or str
Examples
>>> # Get all features for one series
>>> detector.query_features(series="sales")
>>> # Get features occurring in a date range
>>> detector.query_features(
...     dates=slice("2024-01-01", "2024-01-31"),
...     series=["sales", "revenue"]
... )
>>> # Get components for specific dates
>>> detector.query_features(
...     dates=["2024-01-15", "2024-01-16"],
...     series="sales",
...     include_components=True
... )
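The filtering semantics of dates and series can be sketched on flat records (a hypothetical record shape invented for illustration, not the actual query_features output format):

```python
import pandas as pd

# Hypothetical flat feature records, loosely resembling query output.
features = [
    {"date": "2024-01-05", "series": "sales", "type": "anomaly"},
    {"date": "2024-01-20", "series": "sales", "type": "changepoint"},
    {"date": "2024-02-02", "series": "revenue", "type": "level_shift"},
]

def query(features, date_slice=None, series=None):
    """Filter records by an inclusive date range and series name(s)."""
    names = [series] if isinstance(series, str) else series
    out = []
    for f in features:
        d = pd.Timestamp(f["date"])
        if date_slice is not None:
            if date_slice.start and d < pd.Timestamp(date_slice.start):
                continue
            if date_slice.stop and d > pd.Timestamp(date_slice.stop):
                continue
        if names is not None and f["series"] not in names:
            continue
        out.append(f)
    return out

january_sales = query(features, slice("2024-01-01", "2024-01-31"), "sales")
```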
- classmethod render_template(template, return_components=False)¶
Render a feature detection template back into time series data.
- summary()¶
- tune_with_synthetic(real_df, n_synthetic_series=16, n_tune_iterations=25, n_detector_iterations=30, tune_seed=42, loss_params=None, loss_weights=None, synthetic_starting_params=None, starting_params=None, verbose=True)¶
Tune synthetic data to a real dataset, optimize detector params, and fit self.
After completion, this instance is fitted on real_df with the optimized detector parameters and stores optimization artifacts on the instance.