autots.datasets package¶
Submodules¶
autots.datasets.fred module¶
FRED (Federal Reserve Economic Data) Data Import
Requires an API key from FRED and the fredapi package (pip install fredapi).
- autots.datasets.fred.get_fred_data(fredkey: str, SeriesNameDict: dict | None = None, long=True, observation_start=None, sleep_seconds: int = 1, **kwargs)¶
Imports data from the Federal Reserve (FRED). For simplest results, make sure all requested series share the same frequency. A usage sketch follows the parameter list below.
- Parameters:
fredkey (str) – an API key from FRED
SeriesNameDict (dict) – pairs of FRED series IDs and series names, like {'SeriesID': 'SeriesName'}, or a list of FRED IDs. Series IDs must match FRED IDs, but names can be anything. If None, several default series are returned.
long (bool) – if True, return long style data; else return wide style data with a datetime index
observation_start (datetime) – passed to Fred get_series
sleep_seconds (int) – seconds to sleep between each series call; usually reduces the chance of failure
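A minimal usage sketch of get_fred_data (the API key is a placeholder; assumes fredapi is installed):
```
from autots.datasets.fred import get_fred_data

# map FRED series IDs to display names; the IDs must be valid FRED series
series = {'DGS10': '10_year_treasury', 'SP500': 'sp500'}
df = get_fred_data(
    fredkey='YOUR_FRED_API_KEY',  # placeholder, see the FRED API docs
    SeriesNameDict=series,
    long=False,        # wide style data with a datetime index
    sleep_seconds=1,   # pause between series calls
)
```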
Module contents¶
Tools for Importing Sample Data
- autots.datasets.load_artificial(long=False, date_start=None, date_end=None)¶
Load artificially generated series from random distributions.
- Parameters:
long (bool) – if True long style data, if False, wide style data
date_start – str or datetime.datetime of start date
date_end – str or datetime.datetime of end date
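A quick sketch of load_artificial in wide format (the date range here is illustrative):
```
from autots.datasets import load_artificial

# wide style: datetime index, one column per generated series
df = load_artificial(long=False, date_start='2018-01-01', date_end='2022-12-31')
```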
- autots.datasets.load_daily(long: bool = True)¶
Daily sample data.
```
# most of the wiki data was chosen to show holidays or holiday-like patterns
wiki = [
    'United_States', 'Germany', 'List_of_highest-grossing_films', 'Jesus',
    'Michael_Jackson', 'List_of_United_States_cities_by_population',
    'Microsoft_Office', 'Google_Chrome', 'Periodic_table', 'Standard_deviation',
    'Easter', 'Christmas', 'Chinese_New_Year', 'Thanksgiving',
    'List_of_countries_that_have_gained_independence_from_the_United_Kingdom',
    'History_of_the_hamburger', 'Elizabeth_II', 'William_Shakespeare',
    'George_Washington', 'Cleopatra', 'all',
]
df2 = load_live_daily(
    observation_start="2017-01-01", weather_years=7, trends_list=None,
    gov_domain_list=None, wikipedia_pages=wiki,
    fred_series=['DGS10', 'T5YIE', 'SP500', 'DEXUSEU'], sleep_seconds=10,
    fred_key="93873d40f10c20fe6f6e75b1ad0aed4d",
    weather_data_types=["WSF2", "PRCP"],
    weather_stations=["USW00014771"],  # looking for intermittent
    tickers=None, london_air_stations=None, weather_event_types=None,
    earthquake_min_magnitude=None,
)
data_file_name = join("autots", "datasets", 'data', 'holidays.zip')
df2.to_csv(
    data_file_name,
    index=True,
    compression={
        'method': 'zip', 'archive_name': 'holidays.csv',
        'compresslevel': 9,  # maximum compression level (0-9)
    },
)
```
Sources: Wikimedia Foundation
- Parameters:
long (bool) – if True, return data in long format; otherwise return wide
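A minimal sketch of calling load_daily in either shape:
```
from autots.datasets import load_daily

df_wide = load_daily(long=False)  # wide style: datetime index, one column per series
df_long = load_daily(long=True)   # long style, as described by the long parameter above
```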
- autots.datasets.load_hourly(long: bool = True)¶
Traffic data from the MN DOT via the UCI data repository.
- autots.datasets.load_linear(long=False, shape=None, start_date: str = '2021-01-01', introduce_nan: float | None = None, introduce_random: float | None = None, random_seed: int = 123)¶
Create a dataset of linearly trending series for testing edge cases.
- Parameters:
long (bool) – whether to make long or wide
shape (tuple) – shape of output dataframe
start_date (str) – first date of index
introduce_nan (float) – fraction of rows to make null; 0.2 = 20%
introduce_random (float) – shape of gamma distribution
random_seed (int) – seed for random
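A short sketch of load_linear using the parameters above to produce a small dataset with missing values:
```
from autots.datasets import load_linear

# 90 rows x 5 series, with roughly 20% of rows set to NaN
df = load_linear(long=False, shape=(90, 5), introduce_nan=0.2, random_seed=123)
```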
- autots.datasets.load_live_daily(long: bool = False, observation_start: str | None = None, observation_end: str | None = None, fred_key: str | None = None, fred_series=['DGS10', 'T5YIE', 'SP500', 'DCOILWTICO', 'DEXUSEU', 'WPU0911'], tickers: list = ['MSFT'], trends_list: list = ['forecasting', 'cycling', 'microsoft'], trends_geo: str = 'US', weather_data_types: list = ['AWND', 'WSF2', 'TAVG', 'PRCP'], weather_stations: list = ['USW00094846', 'USW00014925', 'USW00014771'], weather_years: int = 5, london_air_stations: list = ['CT3', 'SK8'], london_air_species: str = 'PM25', london_air_days: int = 180, earthquake_days: int = 180, earthquake_min_magnitude: int = 5, gsa_key: str | None = None, gov_domain_list=['nasa.gov'], gov_domain_limit: int = 600, wikipedia_pages: list = ['Microsoft_Office', 'List_of_highest-grossing_films'], wiki_language: str = 'en', weather_event_types=['%28Z%29+Winter+Weather', '%28Z%29+Winter+Storm'], caiso_query: str | None = None, eia_key: str | None = None, eia_respondents: list = ['MISO', 'PJM', 'TVA', 'US48'], timeout: float = 300.05, sleep_seconds: int = 2, **kwargs)¶
Generates a dataframe of data up to the present day. Requires an active internet connection. Try to be respectful of these free data sources by not calling them too heavily. Pass None instead of a specification list to exclude a data source. A minimal example follows the parameter list below.
- Parameters:
long (bool) – whether to return in long format or wide
observation_start (str) – %Y-%m-%d earliest day to retrieve; passed to Fred.get_series and yfinance.history. Note that more restrictive APIs have their own default lengths below, which ignore this.
observation_end (str) – %Y-%m-%d most recent day to retrieve
fred_key (str) – https://fred.stlouisfed.org/docs/api/api_key.html
fred_series (list) – list of FRED series IDs. This requires fredapi package
tickers (list) – list of stock tickers, requires yfinance pypi package
trends_list (list) – list of search keywords, requires pytrends pypi package. None to skip.
weather_data_types (list) – data types from the NCEI NOAA API; GHCN Daily weather elements include PRCP, SNOW, TMAX, TMIN, TAVG, AWND, WSF1, WSF2, WSF5, WSFG
weather_stations (list) – from NCEI NOAA api station ids. Pass empty list to skip.
london_air_stations (list) – londonair.org.uk source station IDs. Pass empty list to skip.
london_air_species (str) – which measurement to pull from London Air. Not all stations have all metrics.
earthquake_min_magnitude (int) – smallest earthquake magnitude to pull from earthquake.usgs.gov. Set None to skip this.
gsa_key (str) – api key from https://open.gsa.gov/api/dap/
gov_domain_list (list) – list of government-run domains to get traffic data for. Can be very slow, so fewer is better. Some examples: ['usps.com', 'ncbi.nlm.nih.gov', 'cdc.gov', 'weather.gov', 'irs.gov', 'usajobs.gov', 'studentaid.gov', 'nasa.gov', 'uk.usembassy.gov', 'tsunami.gov']
gov_domain_limit (int) – max number of records. Smaller will be faster. Max is currently 10000.
wikipedia_pages (list) – list of Wikipedia pages, html encoded if needed (underscore for space)
weather_event_types (list) – list of html encoded severe weather event types https://www1.ncdc.noaa.gov/pub/data/swdi/stormevents/csvfiles/Storm-Data-Export-Format.pdf
caiso_query (str) – ENE_SLRS or None, can try others but probably won’t work due to other hardcoded params
timeout (float) – used by some queries
sleep_seconds (int) – increasing this may reduce the probability of server download failures
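The load_daily docstring above shows a fuller call; a lighter sketch that pulls only FRED and stock data while excluding the other sources might look like this (the key is a placeholder):
```
from autots.datasets import load_live_daily

df = load_live_daily(
    long=False,
    fred_key='YOUR_FRED_API_KEY',   # placeholder
    fred_series=['DGS10', 'SP500'],
    tickers=['MSFT'],
    # pass None (or an empty list, where noted above) to exclude a source
    trends_list=None,
    gov_domain_list=None,
    wikipedia_pages=None,
    weather_stations=[],
    london_air_stations=[],
    weather_event_types=None,
    earthquake_min_magnitude=None,
    caiso_query=None,
    sleep_seconds=5,  # be gentle on these free data sources
)
```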
- autots.datasets.load_monthly(long: bool = True)¶
Federal Reserve of St. Louis monthly economic indicators.
- autots.datasets.load_sine(long=False, shape=None, start_date: str = '2021-01-01', introduce_random: float | None = None, random_seed: int = 123)¶
Create a dataset of sine-wave series for testing edge cases.
- autots.datasets.load_weekdays(long: bool = False, categorical: bool = True, periods: int = 180)¶
Test edge cases by creating a Series with values as day of week.
- Parameters:
long (bool) – if True, return a df with columns "value" and "datetime"; if False, return a Series with a datetime index
categorical (bool) – if True, return str/object values, else return int
periods (int) – number of periods, i.e., the length of data to generate
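For example, a minimal sketch:
```
from autots.datasets import load_weekdays

# a Series of weekday values with a datetime index
weekdays = load_weekdays(long=False, categorical=True, periods=180)
```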
- autots.datasets.load_weekly(long: bool = True)¶
Weekly petroleum industry data from the EIA.
- autots.datasets.load_yearly(long: bool = True)¶
Federal Reserve of St. Louis annual economic indicators.
- autots.datasets.load_zeroes(long=False, shape=None, start_date: str = '2021-01-01')¶
Create a dataset of just zeroes for testing edge cases.
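A minimal sketch:
```
from autots.datasets import load_zeroes

# 60 rows x 3 columns of zeroes in wide format
df = load_zeroes(long=False, shape=(60, 3))
```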