Just wondering: how would you implement masking in ViViT, in case we want to process videos of different lengths? Any suggestions?
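One common approach is padding plus an attention mask. Pad every clip in a batch to the same number of frames and keep a boolean mask of which frames are real. Set attention logits for padded keys to a large negative value, then pool only over the real frames. The sketch below uses plain NumPy rather than any particular ViViT codebase. The names `make_frame_mask`, `masked_attention` and `masked_mean` are hypothetical.

It assumes a factorised setup where the temporal transformer sees one token per frame, shape `(B, T, D)`. For joint space-time tokens, the frame mask would need to be repeated across the spatial tokens of each frame, e.g. `np.repeat(mask, num_patches, axis=1)`. It also assumes any learned temporal positional embedding covers `max_frames`.

```python
import numpy as np

def make_frame_mask(lengths, max_frames):
    """Boolean mask of shape (B, T): True where the frame is real, False where padded."""
    idx = np.arange(max_frames)[None, :]
    return idx < np.asarray(lengths)[:, None]

def masked_attention(q, k, v, key_mask):
    """Single-head scaled dot-product attention that ignores padded keys.

    q, k, v: (B, T, D); key_mask: (B, T) with True = may be attended to.
    Assumes every clip has at least one real frame.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)          # (B, T, T)
    # Padded keys get a huge negative logit, so softmax gives them ~0 weight.
    scores = np.where(key_mask[:, None, :], scores, -1e9)
    scores = scores - scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def masked_mean(x, mask):
    """Average (B, T, D) tokens over real frames only, giving (B, D)."""
    m = mask[..., None].astype(x.dtype)
    return (x * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1.0)

# Usage: two clips with 2 and 4 real frames, padded to 4 frames.
mask = make_frame_mask([2, 4], max_frames=4)
x = np.random.default_rng(0).standard_normal((2, 4, 8))
out, w = masked_attention(x, x, x, mask)
pooled = masked_mean(out, mask)   # (2, 8) clip-level features
```

Outputs at padded query positions are meaningless but harmless, because the masked pooling discards them. If a CLS token is used instead of mean pooling, it can be prepended with its mask entry always set to True. In PyTorch, `nn.MultiheadAttention` and `nn.TransformerEncoder` accept a `key_padding_mask` / `src_key_padding_mask` argument for the same purpose. Note that there, True marks positions to *ignore*, which is the inverse of the mask above.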