autots.datasets package

Submodules

autots.datasets.fred module

FRED (Federal Reserve Economic Data) Data Import

requires API key from FRED and pip install fredapi

autots.datasets.fred.get_fred_data(fredkey: str, SeriesNameDict: dict | None = None, long=True, observation_start=None, sleep_seconds: int = 1, **kwargs)

Imports data from the Federal Reserve (FRED). For simplest results, make sure the requested series all share the same frequency.

Parameters:
  • fredkey (str) – an API key from FRED

  • SeriesNameDict (dict) – pairs of FRED series IDs and series names, like {'SeriesID': 'SeriesName'}, or a list of FRED IDs. Series IDs must match FRED IDs, but the names can be anything. If None, several default series are returned.

  • long (bool) – if True, return long-style data; else return wide-style data with a datetime index

  • observation_start (datetime) – passed to Fred get_series

  • sleep_seconds (int) – seconds to sleep between each series call; usually reduces the chance of failure
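
A minimal usage sketch; the key below is a placeholder, so substitute a real API key from FRED:

```
from autots.datasets.fred import get_fred_data

# wide dataframe with a datetime index; the dict values are
# arbitrary display names mapped to real FRED series IDs
df = get_fred_data(
    fredkey="your_fred_api_key",  # placeholder, not a real key
    SeriesNameDict={"DGS10": "10 Year Treasury", "T5YIE": "5 Year Inflation Expectation"},
    long=False,
    observation_start="2020-01-01",
    sleep_seconds=2,
)
```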

Module contents

Tools for Importing Sample Data

autots.datasets.load_artificial(long=False, date_start=None, date_end=None)

Load artificially generated series from random distributions.

Parameters:
  • long (bool) – if True, long-style data; if False, wide-style data

  • date_start – str or datetime.datetime of start date

  • date_end – str or datetime.datetime of end date
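
A minimal sketch, using string dates as the docstring allows:

```
from autots.datasets import load_artificial

# wide-style dataframe of synthetic series over a two-year window
df = load_artificial(long=False, date_start="2020-01-01", date_end="2021-12-31")
```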

autots.datasets.load_daily(long: bool = True)

Daily sample data.

The bundled data was generated roughly as follows (imports added for completeness):

```
from os.path import join

from autots.datasets import load_live_daily

# most of the wiki data was chosen to show holidays or holiday-like patterns
wiki = [
    "United_States", "Germany", "List_of_highest-grossing_films",
    "Jesus", "Michael_Jackson",
    "List_of_United_States_cities_by_population", "Microsoft_Office",
    "Google_Chrome", "Periodic_table", "Standard_deviation",
    "Easter", "Christmas", "Chinese_New_Year", "Thanksgiving",
    "List_of_countries_that_have_gained_independence_from_the_United_Kingdom",
    "History_of_the_hamburger", "Elizabeth_II", "William_Shakespeare",
    "George_Washington", "Cleopatra", "all",
]
df2 = load_live_daily(
    observation_start="2017-01-01", weather_years=7,
    trends_list=None, gov_domain_list=None,
    wikipedia_pages=wiki,
    fred_series=["DGS10", "T5YIE", "SP500", "DEXUSEU"],
    sleep_seconds=10,
    fred_key="93873d40f10c20fe6f6e75b1ad0aed4d",
    weather_data_types=["WSF2", "PRCP"],
    weather_stations=["USW00014771"],  # looking for intermittent
    tickers=None, london_air_stations=None,
    weather_event_types=None, earthquake_min_magnitude=None,
)
data_file_name = join("autots", "datasets", "data", "holidays.zip")
df2.to_csv(
    data_file_name,
    index=True,
    compression={
        "method": "zip",
        "archive_name": "holidays.csv",
        "compresslevel": 9,  # maximum compression level (0-9)
    },
)
```

Sources: Wikimedia Foundation

Parameters:
  • long (bool) – if True, return data in long format; otherwise return wide
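
A short usage sketch; the pivot assumes the long format uses the standard AutoTS columns 'datetime', 'series_id', and 'value':

```
from autots.datasets import load_daily

# wide format: datetime index, one column per series
df_wide = load_daily(long=False)

# long format: one row per (series, date) observation
df_long = load_daily(long=True)
df_pivoted = df_long.pivot(index="datetime", columns="series_id", values="value")
```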

autots.datasets.load_hourly(long: bool = True)

Traffic data from the MN DOT via the UCI data repository.

autots.datasets.load_linear(long=False, shape=None, start_date: str = '2021-01-01', introduce_nan: float | None = None, introduce_random: float | None = None, random_seed: int = 123)

Create a dataset of simple linear series for testing edge cases.

Parameters:
  • long (bool) – whether to make long or wide

  • shape (tuple) – shape of output dataframe

  • start_date (str) – first date of index

  • introduce_nan (float) – fraction of rows to make null, e.g. 0.2 = 20%

  • introduce_random (float) – shape of gamma distribution

  • random_seed (int) – seed for random
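
A sketch that exercises the NaN injection:

```
from autots.datasets import load_linear

# 200 rows x 5 series of linear data, with 20% of rows made null
df = load_linear(
    long=False,
    shape=(200, 5),
    start_date="2021-01-01",
    introduce_nan=0.2,
    random_seed=123,
)
```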

autots.datasets.load_live_daily(long: bool = False, observation_start: str | None = None, observation_end: str | None = None, fred_key: str | None = None, fred_series=['DGS10', 'T5YIE', 'SP500', 'DCOILWTICO', 'DEXUSEU', 'WPU0911'], tickers: list = ['MSFT'], trends_list: list = ['forecasting', 'cycling', 'microsoft'], trends_geo: str = 'US', weather_data_types: list = ['AWND', 'WSF2', 'TAVG', 'PRCP'], weather_stations: list = ['USW00094846', 'USW00014925', 'USW00014771'], weather_years: int = 5, london_air_stations: list = ['CT3', 'SK8'], london_air_species: str = 'PM25', london_air_days: int = 180, earthquake_days: int = 180, earthquake_min_magnitude: int = 5, gsa_key: str | None = None, gov_domain_list=['nasa.gov'], gov_domain_limit: int = 600, wikipedia_pages: list = ['Microsoft_Office', 'List_of_highest-grossing_films'], wiki_language: str = 'en', weather_event_types=['%28Z%29+Winter+Weather', '%28Z%29+Winter+Storm'], caiso_query: str | None = None, eia_key: str | None = None, eia_respondents: list = ['MISO', 'PJM', 'TVA', 'US48'], timeout: float = 300.05, sleep_seconds: int = 2, **kwargs)

Generates a dataframe of data up to the present day. Requires an active internet connection. Try to be respectful of these free data sources by not calling them too frequently or too heavily. Pass None instead of a specification list to exclude a data source.

Parameters:
  • long (bool) – whether to return in long format or wide

  • observation_start (str) – %Y-%m-%d earliest day to retrieve; passed to Fred.get_series and yfinance.history. Note that APIs with more restrictions have their own default lengths (below) which ignore this.

  • observation_end (str) – %Y-%m-%d most recent day to retrieve

  • fred_key (str) – https://fred.stlouisfed.org/docs/api/api_key.html

  • fred_series (list) – list of FRED series IDs. This requires fredapi package

  • tickers (list) – list of stock tickers, requires yfinance pypi package

  • trends_list (list) – list of search keywords, requires pytrends pypi package. None to skip.

  • weather_data_types (list) – NCEI NOAA API data types; GHCN Daily weather elements such as PRCP, SNOW, TMAX, TMIN, TAVG, AWND, WSF1, WSF2, WSF5, WSFG

  • weather_stations (list) – from NCEI NOAA api station ids. Pass empty list to skip.

  • london_air_stations (list) – londonair.org.uk source station IDs. Pass empty list to skip.

  • london_air_species (str) – what measurement to pull from London Air. Not all stations have all metrics.

  • earthquake_min_magnitude (int) – smallest earthquake magnitude to pull from earthquake.usgs.gov. Set None to skip this.

  • gsa_key (str) – api key from https://open.gsa.gov/api/dap/

  • gov_domain_list (list) – list of government-run domains to get traffic data for. Can be very slow, so fewer is better. Some examples: ['usps.com', 'ncbi.nlm.nih.gov', 'cdc.gov', 'weather.gov', 'irs.gov', 'usajobs.gov', 'studentaid.gov', 'nasa.gov', 'uk.usembassy.gov', 'tsunami.gov']

  • gov_domain_limit (int) – max number of records. Smaller will be faster. Max is currently 10000.

  • wikipedia_pages (list) – list of Wikipedia pages, html encoded if needed (underscore for space)

  • weather_event_types (list) – list of html encoded severe weather event types https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Export-Format.pdf

  • caiso_query (str) – ENE_SLRS or None, can try others but probably won’t work due to other hardcoded params

  • timeout (float) – used by some queries

  • sleep_seconds (int) – increasing this may reduce probability of server download failures
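
A minimal sketch that pulls only a keyless source (Wikipedia page views) and disables the rest, following the None / empty-list conventions above:

```
from autots.datasets import load_live_daily

df = load_live_daily(
    long=False,
    observation_start="2022-01-01",
    fred_key=None, fred_series=None,
    tickers=None, trends_list=None,
    weather_stations=[],     # empty list skips NOAA weather
    london_air_stations=[],  # empty list skips London Air
    gov_domain_list=None,
    wikipedia_pages=["Microsoft_Office", "Standard_deviation"],
    weather_event_types=None,
    earthquake_min_magnitude=None,
    caiso_query=None,
    sleep_seconds=5,
)
```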

autots.datasets.load_monthly(long: bool = True)

Federal Reserve of St. Louis monthly economic indicators.

autots.datasets.load_sine(long=False, shape=None, start_date: str = '2021-01-01', introduce_random: float | None = None, random_seed: int = 123)

Create a dataset of sine waves for testing.

autots.datasets.load_weekdays(long: bool = False, categorical: bool = True, periods: int = 180)

Test edge cases by creating a Series with values as day of week.

Parameters:
  • long (bool) – if True, return a df with columns “value” and “datetime”; if False, return a Series with a datetime index

  • categorical (bool) – if True, return str/object, else return int

  • periods (int) – number of periods, i.e. the length of data to generate
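
A short sketch of both modes:

```
from autots.datasets import load_weekdays

# Series of day-of-week labels with a datetime index
ser = load_weekdays(long=False, categorical=True, periods=180)

# long dataframe of integer day-of-week values
df = load_weekdays(long=True, categorical=False, periods=60)
```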

autots.datasets.load_weekly(long: bool = True)

Weekly petroleum industry data from the EIA.

autots.datasets.load_yearly(long: bool = True)

Federal Reserve of St. Louis annual economic indicators.

autots.datasets.load_zeroes(long=False, shape=None, start_date: str = '2021-01-01')

Create a dataset of all zeroes for testing edge cases.
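
A sketch pairing the two edge-case generators; per the parameter docs above, introduce_random is the shape of a gamma distribution used to roughen the sine waves:

```
from autots.datasets import load_sine, load_zeroes

# all-zero wide dataframe for degenerate-input testing
zeroes = load_zeroes(long=False, shape=(100, 3), start_date="2021-01-01")

# sine-wave series with gamma-distributed noise added
waves = load_sine(long=False, shape=(100, 3), introduce_random=1.0, random_seed=123)
```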