The first column is the entity id, the second column is the date, and the third
column is the value. Each dataset is sorted by entity, then by date,
so that the dates are ordered within each entity.
Explanation
This is the guidance I used to determine the format of the datasets:
For panel data intended for time series forecasting in Python, the recommended structure and sorting approach is as follows:
Recommended Column Order: entity id, then date, then value.
This ordering clearly separates each individual entity’s time series, making it easier to manage and forecast multiple series simultaneously.
Recommended Sorting: first by entity id, then by date.
Sorting first by entity ensures that all observations for each entity are grouped together. Sorting secondarily by date ensures that within each entity, observations are chronologically ordered. This sorting is crucial because time series forecasting methods rely on the correct temporal order of observations within each entity.
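Because the ordering matters this much, it can be worth checking it before fitting anything. The helper below is a minimal sketch of such a check using pandas; the function name `is_sorted_panel` and the column names `entity_id` and `date` are assumptions made for illustration, not names taken from the datasets.

```python
import pandas as pd


def is_sorted_panel(df, entity_col="entity_id", date_col="date"):
    """Return True if df is ordered by entity, then by date within each entity.

    The function name and the default column names are hypothetical.
    """
    # Compare the key columns as they are with the same columns after sorting.
    keys = df[[entity_col, date_col]].reset_index(drop=True)
    expected = keys.sort_values([entity_col, date_col]).reset_index(drop=True)
    return keys.equals(expected)


# Hypothetical rows: entity "A" has its dates out of order here.
unsorted = pd.DataFrame(
    {
        "entity_id": ["A", "A", "B"],
        "date": pd.to_datetime(["2024-01-02", "2024-01-01", "2024-01-01"]),
        "value": [11.0, 10.0, 7.5],
    }
)
fixed = unsorted.sort_values(["entity_id", "date"]).reset_index(drop=True)

print(is_sorted_panel(unsorted))
print(is_sorted_panel(fixed))
```

The check only looks at the two key columns, so it does not care what the value column contains.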
Example:
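A minimal sketch of the layout in pandas, assuming the columns are named `entity_id`, `date`, and `value`. Those names and all of the numbers are made up for illustration; the real datasets may use different ones.

```python
import pandas as pd

# Hypothetical sample data with columns and rows deliberately out of order.
# The names entity_id, date, and value are illustrative assumptions.
raw = pd.DataFrame(
    {
        "value": [12.0, 7.5, 11.0, 8.1, 10.0, 7.9],
        "date": [
            "2024-01-03", "2024-01-01", "2024-01-02",
            "2024-01-03", "2024-01-01", "2024-01-02",
        ],
        "entity_id": ["A", "B", "A", "B", "A", "B"],
    }
)
# Parse the dates so they sort chronologically rather than as text.
raw["date"] = pd.to_datetime(raw["date"])

panel = (
    raw[["entity_id", "date", "value"]]    # column order: entity, date, value
    .sort_values(["entity_id", "date"])    # sort by entity, then by date
    .reset_index(drop=True)
)
print(panel)
```

After this step every row for entity `A` comes before every row for entity `B`, and the dates increase within each entity.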
This format and sorting make the data straightforward to use with common Python forecasting libraries such as Prophet, statsmodels, sktime, or darts. With an explicit entity identifier and chronological ordering, the long-format table can be split into one series per entity or converted to whatever multi-series structure a given library expects.
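Some forecasting tools fit one series at a time, so a common first step is to split the long-format table into one date-indexed series per entity. The snippet below is a sketch of that step using only pandas; it does not call any forecasting library, and the column names and values are again hypothetical.

```python
import pandas as pd

# Hypothetical panel already in the recommended layout:
# columns entity_id, date, value, sorted by entity and then by date.
panel = pd.DataFrame(
    {
        "entity_id": ["A", "A", "A", "B", "B", "B"],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-03"] * 2
        ),
        "value": [10.0, 11.0, 12.0, 7.5, 7.9, 8.1],
    }
)

# One date-indexed Series per entity; sort=False keeps the existing group order.
series_by_entity = {
    entity: group.set_index("date")["value"]
    for entity, group in panel.groupby("entity_id", sort=False)
}

for entity, series in series_by_entity.items():
    print(entity, series.tolist())
```

Each resulting series can then be passed to a single-series model, or the whole table can be handed to a library's own multi-series loader if it provides one.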