Thank you for your great work. I'm currently exploring the IMU pretraining component using the Ego4D dataset.
I've observed that the pretraining code requires specific index files, `datasets/Pretrain/imu/ego4d/window_idx_train.json` and `datasets/Pretrain/imu/ego4d/window_idx_val.json`, in addition to the processed IMU data itself.
My assumption is that these `window_idx_*.json` files were generated by processing the official Ego4D metadata. Could you please confirm whether this is correct?
To help clarify the generation process, I have a few specific questions regarding the fields within these JSON files:
- Caption Source: Was the `caption` field derived from the `scenarios` field (or another specific field) in the Ego4D metadata?
- Data Split: How was the train/validation split determined? Was it based on the `split_em` and `split_av` fields from the metadata, or was a different method (e.g., random sampling across videos) employed?
- Window Definition: Could you elaborate on how the `window_start` and `window_end` timestamps or indices were defined? For instance, are they fixed-size sliding windows, related to specific events, or defined differently?
If possible, would you be able to share the script, or provide details on the methodology, used to generate these `window_idx_*.json` files from the metadata? Thank you!
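For context, here is a minimal sketch of how I currently imagine the index files could be produced, assuming fixed-size, non-overlapping windows. All specifics here are my assumptions, not details from this repo: the metadata field names (`video_uid`, `duration_sec`, `scenarios`), the 5-second window length, and the output keys are hypothetical placeholders I'd like confirmed or corrected.

```python
import json

# Hypothetical sketch: build window_idx entries from Ego4D-style metadata.
# Field names ("video_uid", "duration_sec", "scenarios") and the 5-second
# window size are assumptions, not confirmed details of this repo.
WINDOW_SEC = 5.0  # assumed fixed window length


def build_window_index(videos, window_sec=WINDOW_SEC):
    """Slide a fixed-size, non-overlapping window over each video."""
    index = []
    for video in videos:
        duration = video["duration_sec"]
        # Assumed: caption comes from joining the scenarios list.
        caption = ", ".join(video.get("scenarios", []))
        start = 0.0
        while start + window_sec <= duration:
            index.append({
                "video_uid": video["video_uid"],
                "window_start": start,
                "window_end": start + window_sec,
                "caption": caption,
            })
            start += window_sec
    return index


if __name__ == "__main__":
    videos = [{"video_uid": "abc", "duration_sec": 12.0,
               "scenarios": ["cooking"]}]
    print(json.dumps(build_window_index(videos), indent=2))
```

Is this roughly the shape of the actual generation script, or does it differ (e.g., overlapping windows, event-aligned windows, per-split metadata filtering)?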