Data Module Guide
The finm.data module provides a standardized interface for loading financial data from several sources.
All load functions return polars DataFrames by default.
Standard Interface
Each data source submodule follows the same pattern:
```python
from finm.data import federal_reserve

# Download data from source
federal_reserve.pull(data_dir="./data", accept_license=True)

# Load cached data (returns polars DataFrame)
df = federal_reserve.load(data_dir="./data")

# Load in long format for time series analysis
df_long = federal_reserve.load(data_dir="./data", format="long")

# Get a LazyFrame for deferred computation
lf = federal_reserve.load(data_dir="./data", lazy=True)

# Manual conversion to long format
df_long = federal_reserve.to_long_format(df)
```
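Because every submodule exposes the same two functions, the shared shape can be written down as a structural type. The sketch below is a hypothetical description, not a type that finm defines: `DataSource` and `InMemorySource` are illustrative names, and the signatures are inferred from the calls above.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class DataSource(Protocol):
    """Hypothetical structural type for a pull/load data submodule."""

    def pull(self, data_dir: str, **kwargs: Any) -> None:
        ...

    def load(self, data_dir: str, **kwargs: Any) -> Any:
        ...


class InMemorySource:
    """Toy implementation used only to exercise the protocol."""

    def __init__(self) -> None:
        self.pulled = False

    def pull(self, data_dir: str, **kwargs: Any) -> None:
        # A real source would download files into data_dir here.
        self.pulled = True

    def load(self, data_dir: str, **kwargs: Any) -> Dict[str, Any]:
        # Echo the options so the caller can see what was requested.
        return {"pulled": self.pulled, **kwargs}


source = InMemorySource()
source.pull(data_dir="./data")
# runtime_checkable only checks that the methods exist, not their signatures.
print(isinstance(source, DataSource))
print(source.load(data_dir="./data", format="long"))
```

A Protocol fits better than a base class here because Python modules cannot inherit from a class; structural typing lets any object with matching methods qualify.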
Caching with pull_if_not_found
Each load function supports automatic downloading when data is missing locally:
```python
from finm.data import federal_reserve

# Will pull data if not found locally
df = federal_reserve.load(
    data_dir="./data",
    pull_if_not_found=True,
    accept_license=True,
)
```
When using pull_if_not_found=True, you must also set accept_license=True to
acknowledge the data provider’s license terms.
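The fallback logic can be sketched in a few lines. This is a minimal illustration assuming a single cached CSV file per dataset; `load_with_fallback` and the error types it raises are hypothetical and do not describe finm's internals. It does follow the documented rule that an automatic pull requires `accept_license=True`.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List


def load_with_fallback(
    path: Path,
    pull: Callable[[Path], None],
    pull_if_not_found: bool = False,
    accept_license: bool = False,
) -> List[Dict[str, str]]:
    """Read a cached CSV, pulling it first only when allowed (hypothetical helper)."""
    if not path.exists():
        if not pull_if_not_found:
            raise FileNotFoundError(f"{path} not found; call pull() first")
        if not accept_license:
            # Mirrors the documented rule: automatic pulls need license acceptance.
            raise ValueError("pull_if_not_found=True requires accept_license=True")
        pull(path)  # download once; later calls read the cached file
    with path.open(newline="") as f:
        return list(csv.DictReader(f))
```

Once the file exists, later calls skip `pull` entirely, so the download cost is paid only on first use.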
WRDS Special Handling
WRDS data requires credentials when pulling:
```python
from finm.data import wrds

df = wrds.load(
    data_dir="./data",
    variant="treasury",
    pull_if_not_found=True,
    wrds_username="your_username",
    start_date="2020-01-01",
    end_date="2023-12-31",
)
```
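To avoid hard-coding a username in notebooks or scripts, the arguments can be assembled from the environment. The sketch below is a hypothetical helper: `wrds_load_kwargs` and the `WRDS_USERNAME` environment variable name are assumptions, not something finm reads itself. It also rejects a reversed date range before any connection is attempted.

```python
import os
from datetime import date
from typing import Any, Dict


def wrds_load_kwargs(start_date: str, end_date: str, variant: str = "treasury") -> Dict[str, Any]:
    """Build arguments for wrds.load without hard-coding a username (hypothetical helper)."""
    # WRDS_USERNAME is an assumed environment variable name, not one finm reads itself.
    username = os.environ.get("WRDS_USERNAME")
    if not username:
        raise RuntimeError("set WRDS_USERNAME before pulling WRDS data")
    # Fail early on a reversed range instead of after contacting WRDS.
    if date.fromisoformat(start_date) > date.fromisoformat(end_date):
        raise ValueError("start_date must not be after end_date")
    return {
        "data_dir": "./data",
        "variant": variant,
        "pull_if_not_found": True,
        "wrds_username": username,
        "start_date": start_date,
        "end_date": end_date,
    }
```

Usage would then look like `df = wrds.load(**wrds_load_kwargs("2020-01-01", "2023-12-31"))`.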
Polars DataFrames
All load functions return polars DataFrames by default for better performance.
Use the lazy=True parameter to get a LazyFrame instead:
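```python
import polars as pl
from finm.data import federal_reserve

# Returns polars.DataFrame (default)
df = federal_reserve.load(data_dir="./data")

# Returns polars.LazyFrame for deferred computation
lf = federal_reserve.load(data_dir="./data", lazy=True)
# The filter only runs when .collect() executes the query
result = lf.filter(pl.col("SVENY01") > 0.03).collect()
```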
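The difference between the two return types is when the work happens. The toy class below mimics only the deferral idea in plain Python; `ToyLazyFrame` is a hypothetical name, and polars' real LazyFrame additionally optimizes the whole query plan (for example, pushing filters down into the scan) before running it.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]
Predicate = Callable[[Row], bool]


class ToyLazyFrame:
    """Conceptual toy: records filters and applies them only in collect()."""

    def __init__(self, rows: List[Row], predicates: Optional[List[Predicate]] = None) -> None:
        self._rows = rows
        self._predicates: List[Predicate] = list(predicates or [])

    def filter(self, predicate: Predicate) -> "ToyLazyFrame":
        # Return a new plan that includes the predicate; no rows are inspected yet.
        return ToyLazyFrame(self._rows, [*self._predicates, predicate])

    def collect(self) -> List[Row]:
        # Only now is the data scanned, with every recorded predicate applied.
        return [row for row in self._rows if all(p(row) for p in self._predicates)]
```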
Long Format
The long format uses three standard columns:
- unique_id: identifier for the time series (e.g., yield maturity, factor name, CUSIP)
- ds: date
- y: value
This format is useful for panel data analysis and time series forecasting.
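Conceptually, converting a wide table to this layout turns each (date, column) cell into one record. The plain-Python sketch below assumes the wide data has a date column named `date`; it illustrates the reshaping and is not finm's `to_long_format` implementation.

```python
from typing import Any, Dict, List, Sequence


def wide_to_long(rows: Sequence[Dict[str, Any]], date_col: str = "date") -> List[Dict[str, Any]]:
    """Reshape wide rows (one column per series) into unique_id/ds/y records."""
    records = []
    for row in rows:
        for name, value in row.items():
            if name == date_col:
                continue  # the date becomes ds on every record
            records.append({"unique_id": name, "ds": row[date_col], "y": value})
    return records
```

In polars the same reshape is available as `DataFrame.unpivot(index="date", variable_name="unique_id", value_name="y")` (called `melt` before polars 1.0), followed by renaming the date column to `ds`.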
Available Data Sources
Federal Reserve Yield Curve
GSW (Gürkaynak, Sack, and Wright) yield curve model data.
```python
from finm.data import federal_reserve

# Download and save to ./data
federal_reserve.pull(data_dir="./data")

# Load standard yield columns (SVENY01-30)
df = federal_reserve.load(data_dir="./data", variant="standard")

# Load all columns
df_all = federal_reserve.load(data_dir="./data", variant="all")
```
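The standard variant corresponds to the 30 yield columns SVENY01 through SVENY30, whose names follow a zero-padded pattern. The sketch below rebuilds that list, for instance to pick the same subset out of the all variant yourself; the helper names are illustrative, and it assumes the all variant uses the same column names.

```python
from typing import Dict, List


def sveny_columns(max_maturity: int = 30) -> List[str]:
    """Zero-padded GSW yield column names: SVENY01, SVENY02, ..., SVENY30."""
    return [f"SVENY{m:02d}" for m in range(1, max_maturity + 1)]


def select_standard(row: Dict[str, float]) -> Dict[str, float]:
    """Keep only the standard yield columns present in one wide row."""
    return {col: row[col] for col in sveny_columns() if col in row}
```

With polars, `df_all.select(sveny_columns())` would select those columns, assuming they are all present in the frame.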
Fama-French Factors
Fama-French three-factor data from Ken French’s Data Library.
```python
from finm.data import fama_french

# Load bundled data (no download needed)
df = fama_french.load()

# Filter by date
df = fama_french.load(start="2020-01-01", end="2023-12-31")

# Download latest data (requires pandas-datareader)
fama_french.pull(data_dir="./data", frequency="daily")
```
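The start and end arguments restrict the returned rows to a date window. A plain-Python sketch of that kind of filter follows; `filter_by_date` is a hypothetical helper, and it assumes rows carry a `datetime.date` under a `date` key and that both bounds are inclusive, which you should confirm against finm's behavior.

```python
from datetime import date
from typing import Any, Dict, List, Optional


def filter_by_date(
    rows: List[Dict[str, Any]],
    start: Optional[str] = None,
    end: Optional[str] = None,
    date_col: str = "date",
) -> List[Dict[str, Any]]:
    """Keep rows whose date falls in [start, end]; both bounds inclusive by assumption."""
    lo = date.fromisoformat(start) if start else date.min
    hi = date.fromisoformat(end) if end else date.max
    return [row for row in rows if lo <= row[date_col] <= hi]
```

On a polars DataFrame the equivalent is `df.filter(pl.col("date").is_between(date(2020, 1, 1), date(2023, 12, 31)))`, whose `closed` argument defaults to including both ends.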
He-Kelly-Manela Factors
Intermediary capital risk factors from He, Kelly, and Manela (2017).
```python
from finm.data import he_kelly_manela

# Download data
he_kelly_manela.pull(data_dir="./data")

# Load variants
df_monthly = he_kelly_manela.load(data_dir="./data", variant="factors_monthly")
df_daily = he_kelly_manela.load(data_dir="./data", variant="factors_daily")
df_all = he_kelly_manela.load(data_dir="./data", variant="all")
```
Open Source Bond Returns
Treasury and corporate bond returns from Open Bond Asset Pricing.
```python
from finm.data import open_source_bond

# Download both datasets
open_source_bond.pull(data_dir="./data", variant="all")

# Or download individually
open_source_bond.pull(data_dir="./data", variant="treasury")
open_source_bond.pull(data_dir="./data", variant="corporate")

# Load data
treasury = open_source_bond.load(data_dir="./data", variant="treasury")
corporate = open_source_bond.load(data_dir="./data", variant="corporate")
```
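Passing variant="all" pulls both datasets, which amounts to expanding "all" into the individual variant names. The sketch below shows that dispatch with validation; `expand_variant` and `pull_variants` are hypothetical helpers, and `pull_one` stands in for a per-variant download.

```python
from typing import Callable, List

VARIANTS = ("treasury", "corporate")


def expand_variant(variant: str) -> List[str]:
    """Map 'all' to every individual dataset and reject unknown names."""
    if variant == "all":
        return list(VARIANTS)
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS + ('all',)}")
    return [variant]


def pull_variants(variant: str, pull_one: Callable[[str], None]) -> None:
    """Call pull_one once per concrete variant, in a fixed order."""
    for name in expand_variant(variant):
        pull_one(name)
```

Validating the name up front means a typo fails immediately rather than after a partial download.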
WRDS Data (Requires Credentials)
CRSP Treasury and corporate bond data from WRDS.
```python
from finm.data import wrds

# Pull Treasury data
wrds.pull(
    data_dir="./data",
    variant="treasury",
    wrds_username="your_username",
    start_date="2020-01-01",
    end_date="2023-12-31",
)

# Pull corporate bond data
wrds.pull(
    data_dir="./data",
    variant="corp_bond",
    wrds_username="your_username",
    start_date="2020-01-01",
    end_date="2023-12-31",
)

# Load from cache
df_treasury = wrds.load(data_dir="./data", variant="treasury")
df_corp = wrds.load(data_dir="./data", variant="corp_bond")
```
Convenience Functions
The data module also exposes convenience functions with descriptive names at the module level:
```python
from finm import data

# Federal Reserve
data.pull_fed_yield_curve(data_dir="./data")
df = data.load_fed_yield_curve(data_dir="./data")

# Fama-French
df = data.load_fama_french_factors()

# He-Kelly-Manela
data.pull_he_kelly_manela(data_dir="./data")
df = data.load_he_kelly_manela_factors_monthly(data_dir="./data")

# Open Source Bond
data.pull_open_source_bond(data_dir="./data")
df = data.load_treasury_returns(data_dir="./data")
df = data.load_corporate_bond_returns(data_dir="./data")

# WRDS
data.pull_wrds_treasury(data_dir="./data", wrds_username="user", ...)
df = data.load_wrds_treasury(data_dir="./data")
```
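One plausible way to build such aliases (not necessarily how finm does it) is to pin the variant argument with `functools.partial`. The `load` function below is a stand-in that only echoes its arguments, so the alias behavior can be seen without any data.

```python
from functools import partial
from typing import Any, Dict


def load(data_dir: str = "./data", variant: str = "treasury", **kwargs: Any) -> Dict[str, Any]:
    """Stand-in for a submodule's load(); it just echoes its arguments."""
    return {"data_dir": data_dir, "variant": variant, **kwargs}


# Descriptive aliases that pin the variant, in the spirit of load_treasury_returns.
load_treasury_returns = partial(load, variant="treasury")
load_corporate_bond_returns = partial(load, variant="corporate")

print(load_corporate_bond_returns(data_dir="./data"))
```

A trade-off of `partial` is that the resulting objects have no `__name__` or docstring of their own; small wrapper functions with explicit docstrings read better in `help()`.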