See here: https://github.com/OpenNMT/CTranslate2/pull/1687#issuecomment-2163523905. But maybe we also need to avoid the empty tokens that `a.split(" ")` produces when `a` contains two consecutive spaces.
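Here is a quick Python illustration of the problem and two possible ways around it. It assumes `a` is a plain Python `str`. The sample input and the helper name `split_without_empty` are hypothetical and do not come from CTranslate2.

```python
def split_without_empty(a: str) -> list[str]:
    """Hypothetical helper: split on " " and drop the empty strings
    that consecutive spaces produce."""
    return [t for t in a.split(" ") if t]


a = "hello  world"  # two consecutive spaces

# split(" ") keeps an empty string between the two spaces
tokens_with_empty = a.split(" ")
print(tokens_with_empty)  # ['hello', '', 'world']

# Option 1: split on " " and filter out the empty strings
tokens_filtered = split_without_empty(a)

# Option 2: split() with no argument treats any run of whitespace as one separator
tokens_default = a.split()
```

The two options differ slightly. Filtering only removes empty strings, so tabs or newlines stay inside tokens. `a.split()` also splits on tabs and newlines and ignores leading or trailing whitespace. Which one fits depends on what `a` is expected to contain.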