Cleaning Summary: Treasury Bond Returns#

import sys
from pathlib import Path

sys.path.insert(0, "../../src")
sys.path.insert(0, "./src")

import calc_treasury_bond_returns
import pandas as pd

# import polars as pl
from he_kelly_manela import pull_he_kelly_manela
from settings import config

DATA_DIR = Path(config("DATA_DIR"))

Treasury Bond Returns Summary#

By leveraging the TRACE dataset from openbondassetpricing.com, the FTSFR dataset ensures a robust foundation for analyzing treasury bond returns, adhering to established methodologies and incorporating comprehensive data cleaning procedures.

Data Cleaning and Construction#

The treasury bond returns dataset is constructed using the following cleaning and processing steps:

1. Bond Selection Criteria#

  • CUSIP Filtering:

    • Only include bonds with CUSIPs starting with โ€˜91โ€™ (indicating Treasury securities)

    • This ensures weโ€™re only analyzing genuine Treasury bonds

2. Return Processing#

  • Return Conversion:

    • Convert raw returns to decimal form by dividing by 100

    • This standardizes the return format for analysis

  • Return Filtering:

    • Remove observations where returns exceed 50% (tr_return > 0.5)

    • This eliminates potential data errors or extreme outliers

3. Maturity Grouping#

  • Maturity Bins:

    • Create 10 maturity groups using 0.5-year intervals from 0 to 5 years

    • Bins: [0.0, 0.5, 1.0, โ€ฆ, 5.0]

    • Each group represents a specific maturity range for analysis

  • Group Assignment:

    • Assign each bond to a maturity group based on its time to maturity (tau)

    • Drop observations with missing maturity information

    • Convert group labels to integers for easier analysis

4. Portfolio Construction#

  • Return Aggregation:

    • Group bonds by date and maturity group

    • Calculate mean returns for each group

    • This creates a time series of portfolio returns for each maturity group

5. Data Quality Checks#

  • Missing Value Handling:

    • Remove observations with missing returns

    • Remove observations with missing maturity group assignments

  • Outlier Treatment:

    • Extreme returns (>50%) are filtered out

    • This ensures the analysis focuses on normal market conditions

This cleaning process ensures a high-quality dataset for analyzing Treasury bond returns across different maturity groups, facilitating comparison with the Kelly-Manzello data.

hkm = pull_he_kelly_manela.load_he_kelly_manela_all(
    data_dir=DATA_DIR / "he_kelly_manela"
)
treas_hkm = hkm.iloc[:, 34:44].copy()
treas_hkm["yyyymm"] = hkm["yyyymm"]
treas_hkm.head()
treas_hkm.tail()
treas_hkm.describe()
treas_hkm.isnull().sum()
US_bonds_01    25
US_bonds_02    25
US_bonds_03    25
US_bonds_04    25
US_bonds_05    25
US_bonds_06    25
US_bonds_07    25
US_bonds_08    25
US_bonds_09    25
US_bonds_10    25
yyyymm          0
dtype: int64
treas_bond_returns = calc_treasury_bond_returns.calc_returns(
    data_dir=DATA_DIR / "us_treasury_returns"
)
treas_bond_returns.describe()
DATE 1 2 3 4 5 6 7 8 9 10
count 668 668.000000 668.000000 668.000000 665.000000 665.000000 665.000000 668.000000 668.000000 665.000000 659.000000
mean 1997-11-14 12:25:52.095808384 0.003809 0.004242 0.004447 0.004466 0.004474 0.004838 0.005006 0.005152 0.005141 0.005120
min 1970-01-31 00:00:00 -0.000283 -0.008939 -0.025195 -0.032362 -0.046647 -0.052993 -0.052765 -0.060457 -0.058666 -0.059056
25% 1983-12-23 06:00:00 0.001228 0.000874 0.000446 0.000224 -0.000375 -0.000747 -0.001473 -0.002057 -0.002714 -0.003201
50% 1997-11-15 00:00:00 0.003798 0.003604 0.003525 0.003321 0.003453 0.003900 0.004090 0.004340 0.004370 0.004511
75% 2011-10-07 18:00:00 0.005326 0.006152 0.007166 0.007829 0.008772 0.010052 0.011202 0.012094 0.012924 0.013117
max 2025-08-31 00:00:00 0.023142 0.045551 0.064735 0.068330 0.091789 0.095415 0.086319 0.093149 0.093774 0.121050
std NaN 0.003098 0.004619 0.006256 0.007471 0.008891 0.010458 0.011638 0.012975 0.013851 0.014982

Comparing FTSFR with He Kelly Manela#

# Print initial data info
print("Treasury Bond Returns Info:")
print(treas_bond_returns.info())
print("\nTreasury Bond Returns Head:")
print(treas_bond_returns.head())
print("\nTreasury Bond Returns Date Range:")
print(treas_bond_returns["DATE"].min(), "to", treas_bond_returns["DATE"].max())

print("\nHKM Treasury Bonds Info:")
print(treas_hkm.info())
print("\nHKM Treasury Bonds Head:")
print(treas_hkm.head())

# Convert treas_hkm dates to datetime
treas_hkm["date"] = pd.to_datetime(
    treas_hkm["yyyymm"].astype(int).astype(str), format="%Y%m"
) + pd.offsets.MonthEnd(0)

print("\nAfter date conversion - HKM Treasury Bonds Head:")
print(treas_hkm.head())
print("\nHKM Treasury Bonds Date Range:")
print(treas_hkm["date"].min(), "to", treas_hkm["date"].max())

# Convert treas_bond_returns DATE to datetime if it's not already
treas_bond_returns["DATE"] = pd.to_datetime(treas_bond_returns["DATE"])

print("\nAfter date conversion - Treasury Bond Returns Head:")
print(treas_bond_returns.head())
print("\nTreasury Bond Returns Date Range:")
print(treas_bond_returns["DATE"].min(), "to", treas_bond_returns["DATE"].max())

# Try the merge
merged_df = pd.merge(
    treas_bond_returns, treas_hkm, left_on="DATE", right_on="date", how="inner"
)

print("\nMerged DataFrame Shape:", merged_df.shape)
print("\nMerged DataFrame Head:")
print(merged_df.head())

# Create subplots for each pair of columns
import matplotlib.pyplot as plt

if not merged_df.empty:
    fig, axes = plt.subplots(5, 2, figsize=(15, 20))
    axes = axes.flatten()

    for i in range(10):
        col1 = str(i + 1)  # Column from treas_bond_returns
        if i == 9:
            col2 = "US_bonds_10"  # Column from treas_hkm
        else:
            col2 = f"US_bonds_0{i + 1}"  # Column from treas_hkm

        ax = axes[i]
        ax.plot(
            merged_df["DATE"], merged_df[col1], label=f"Portfolio {i + 1}", color="blue"
        )
        ax.plot(
            merged_df["DATE"],
            merged_df[col2],
            label=f"HKM {i + 1}",
            color="red",
            linestyle="--",
        )
        ax.set_title(f"Comparison: Portfolio {i + 1} vs HKM {i + 1}")
        ax.legend()
        ax.grid(True)

        # Rotate x-axis labels for better readability
        plt.setp(ax.get_xticklabels(), rotation=45)

    plt.tight_layout()
    plt.show()
else:
    print("\nNo data to plot - merged DataFrame is empty")

# Print correlation between corresponding columns
print("\nCorrelations between corresponding columns:")
for i in range(10):
    col1 = str(i + 1)
    if i == 9:
        col2 = "US_bonds_10"
    else:
        col2 = f"US_bonds_0{i + 1}"
    corr = merged_df[col1].corr(merged_df[col2])
    print(f"Portfolio {i + 1} vs HKM {i + 1}: {corr:.4f}")
Treasury Bond Returns Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 668 entries, 0 to 667
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   DATE    668 non-null    datetime64[ns]
 1   1       668 non-null    float64       
 2   2       668 non-null    float64       
 3   3       668 non-null    float64       
 4   4       665 non-null    float64       
 5   5       665 non-null    float64       
 6   6       665 non-null    float64       
 7   7       668 non-null    float64       
 8   8       668 non-null    float64       
 9   9       665 non-null    float64       
 10  10      659 non-null    float64       
dtypes: datetime64[ns](1), float64(10)
memory usage: 57.5 KB
None

Treasury Bond Returns Head:
        DATE         1         2         3         4         5         6  \
0 1970-01-31  0.008027  0.008752  0.006678  0.008800  0.010397  0.007652   
1 1970-02-28  0.008731  0.015935  0.016617  0.023389  0.028954       NaN   
2 1970-03-31  0.006317  0.007638  0.007676  0.009226  0.009374       NaN   
3 1970-04-30  0.004571  0.000628 -0.002641 -0.005875 -0.009144       NaN   
4 1970-05-31  0.004894  0.006297  0.005494  0.004096  0.004789  0.009457   

          7         8         9        10  
0 -0.006523  0.010780  0.009950  0.002499  
1  0.026928  0.040144  0.043813  0.035437  
2  0.009485  0.007844  0.007447  0.008757  
3 -0.015769 -0.015136 -0.018791 -0.020009  
4  0.008328  0.008393  0.010475  0.010748  

Treasury Bond Returns Date Range:
1970-01-31 00:00:00 to 2025-08-31 00:00:00

HKM Treasury Bonds Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 516 entries, 0 to 515
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   US_bonds_01  491 non-null    float64
 1   US_bonds_02  491 non-null    float64
 2   US_bonds_03  491 non-null    float64
 3   US_bonds_04  491 non-null    float64
 4   US_bonds_05  491 non-null    float64
 5   US_bonds_06  491 non-null    float64
 6   US_bonds_07  491 non-null    float64
 7   US_bonds_08  491 non-null    float64
 8   US_bonds_09  491 non-null    float64
 9   US_bonds_10  491 non-null    float64
 10  yyyymm       516 non-null    float64
dtypes: float64(11)
memory usage: 44.5 KB
None

HKM Treasury Bonds Head:
   US_bonds_01  US_bonds_02  US_bonds_03  US_bonds_04  US_bonds_05  \
0          NaN          NaN          NaN          NaN          NaN   
1          NaN          NaN          NaN          NaN          NaN   
2          NaN          NaN          NaN          NaN          NaN   
3          NaN          NaN          NaN          NaN          NaN   
4          NaN          NaN          NaN          NaN          NaN   

   US_bonds_06  US_bonds_07  US_bonds_08  US_bonds_09  US_bonds_10    yyyymm  
0          NaN          NaN          NaN          NaN          NaN  197001.0  
1          NaN          NaN          NaN          NaN          NaN  197002.0  
2          NaN          NaN          NaN          NaN          NaN  197003.0  
3          NaN          NaN          NaN          NaN          NaN  197004.0  
4          NaN          NaN          NaN          NaN          NaN  197005.0  

After date conversion - HKM Treasury Bonds Head:
   US_bonds_01  US_bonds_02  US_bonds_03  US_bonds_04  US_bonds_05  \
0          NaN          NaN          NaN          NaN          NaN   
1          NaN          NaN          NaN          NaN          NaN   
2          NaN          NaN          NaN          NaN          NaN   
3          NaN          NaN          NaN          NaN          NaN   
4          NaN          NaN          NaN          NaN          NaN   

   US_bonds_06  US_bonds_07  US_bonds_08  US_bonds_09  US_bonds_10    yyyymm  \
0          NaN          NaN          NaN          NaN          NaN  197001.0   
1          NaN          NaN          NaN          NaN          NaN  197002.0   
2          NaN          NaN          NaN          NaN          NaN  197003.0   
3          NaN          NaN          NaN          NaN          NaN  197004.0   
4          NaN          NaN          NaN          NaN          NaN  197005.0   

        date  
0 1970-01-31  
1 1970-02-28  
2 1970-03-31  
3 1970-04-30  
4 1970-05-31  

HKM Treasury Bonds Date Range:
1970-01-31 00:00:00 to 2012-12-31 00:00:00

After date conversion - Treasury Bond Returns Head:
        DATE         1         2         3         4         5         6  \
0 1970-01-31  0.008027  0.008752  0.006678  0.008800  0.010397  0.007652   
1 1970-02-28  0.008731  0.015935  0.016617  0.023389  0.028954       NaN   
2 1970-03-31  0.006317  0.007638  0.007676  0.009226  0.009374       NaN   
3 1970-04-30  0.004571  0.000628 -0.002641 -0.005875 -0.009144       NaN   
4 1970-05-31  0.004894  0.006297  0.005494  0.004096  0.004789  0.009457   

          7         8         9        10  
0 -0.006523  0.010780  0.009950  0.002499  
1  0.026928  0.040144  0.043813  0.035437  
2  0.009485  0.007844  0.007447  0.008757  
3 -0.015769 -0.015136 -0.018791 -0.020009  
4  0.008328  0.008393  0.010475  0.010748  

Treasury Bond Returns Date Range:
1970-01-31 00:00:00 to 2025-08-31 00:00:00

Merged DataFrame Shape: (516, 23)

Merged DataFrame Head:
        DATE         1         2         3         4         5         6  \
0 1970-01-31  0.008027  0.008752  0.006678  0.008800  0.010397  0.007652   
1 1970-02-28  0.008731  0.015935  0.016617  0.023389  0.028954       NaN   
2 1970-03-31  0.006317  0.007638  0.007676  0.009226  0.009374       NaN   
3 1970-04-30  0.004571  0.000628 -0.002641 -0.005875 -0.009144       NaN   
4 1970-05-31  0.004894  0.006297  0.005494  0.004096  0.004789  0.009457   

          7         8         9  ...  US_bonds_03  US_bonds_04  US_bonds_05  \
0 -0.006523  0.010780  0.009950  ...          NaN          NaN          NaN   
1  0.026928  0.040144  0.043813  ...          NaN          NaN          NaN   
2  0.009485  0.007844  0.007447  ...          NaN          NaN          NaN   
3 -0.015769 -0.015136 -0.018791  ...          NaN          NaN          NaN   
4  0.008328  0.008393  0.010475  ...          NaN          NaN          NaN   

   US_bonds_06  US_bonds_07  US_bonds_08  US_bonds_09  US_bonds_10    yyyymm  \
0          NaN          NaN          NaN          NaN          NaN  197001.0   
1          NaN          NaN          NaN          NaN          NaN  197002.0   
2          NaN          NaN          NaN          NaN          NaN  197003.0   
3          NaN          NaN          NaN          NaN          NaN  197004.0   
4          NaN          NaN          NaN          NaN          NaN  197005.0   

        date  
0 1970-01-31  
1 1970-02-28  
2 1970-03-31  
3 1970-04-30  
4 1970-05-31  

[5 rows x 23 columns]
../_images/440c66a4286e274e406ae50773e5a638ddd491e7ccc87be185414ee8dee70cbe.png
Correlations between corresponding columns:
Portfolio 1 vs HKM 1: 0.9939
Portfolio 2 vs HKM 2: 0.9996
Portfolio 3 vs HKM 3: 0.9995
Portfolio 4 vs HKM 4: 0.9983
Portfolio 5 vs HKM 5: 0.9960
Portfolio 6 vs HKM 6: 0.9987
Portfolio 7 vs HKM 7: 0.9982
Portfolio 8 vs HKM 8: 0.9988
Portfolio 9 vs HKM 9: 0.9953
Portfolio 10 vs HKM 10: 0.9975

๐Ÿ“ˆ Comparison of Treasury Bond Portfolio Returns: FTSFR Portfolios vs. HKM Portfolios#

The figure above compares the time-series returns of Treasury bond portfolios:

  • Portfolios 1โ€“10 (in blue): Portfolios constructed by FTSFR, where Treasury bonds are sorted by time remaining to maturity, in 6-month intervals:

    • Portfolio 1: 0 to 6 months

    • Portfolio 2: 6 months to 1 year

    • Portfolio 3: 1 year to 1.5 years

    • Portfolio 4: 1.5 to 2 years

    • Portfolio 5: 2 to 2.5 years

    • Portfolio 6: 2.5 to 3 years

    • Portfolio 7: 3 to 3.5 years

    • Portfolio 8: 3.5 to 4 years

    • Portfolio 9: 4 to 4.5 years

    • Portfolio 10: 4.5 to 5 years

  • HKM Portfolios 1โ€“10 (in red): Portfolios from He, Kelly, and Manella (HKM) using a similar 6-month maturity bucket structure for comparison.


๐Ÿ” Observations#

  • The returns between FTSFR portfolios (blue) and HKM portfolios (red) show close alignment, indicating a consistent term-structure pattern across both datasets.

  • During periods of heightened volatilityโ€”such as the 2008 financial crisis โ€”portfolios with longer time to maturity generally exhibit greater return sensitivity, seen consistently in both series.

  • Small return differences may result from:

    • Rounding errors due to different data sources and small values.

    • Missing values in the HKM data.


This comparison confirms that the FTSFR replication tracks the structure and return behavior of the HKM maturity-sorted Treasury portfolios.