Conversation

Great!!
Yes, the modeling units are 5000 tokens including "<blank>".

Thanks!
You may run into memory problems. Fangjun recently committed a code change
that can be used to work around a related issue, though.
In any case, we need to make sure our recipes can run at those kinds of sizes.
…On Tue, May 25, 2021 at 10:21 AM LIyong.Guo ***@***.***> wrote:
Great!!
I assume the modeling units are BPE pieces? I think a good step towards
resolving the difference would be to train
(i) a CTC model
(ii) a LF-MMI model
using those same BPE pieces.
Yes, the modeling units are 5000 tokens including "<blank>".
I will do the suggested experiments.
    b_to_a_map=b_to_a_map,
    sorted_match_a=True)
lm_path_lats = k2.top_sort(k2.connect(lm_path_lats.to('cpu'))).to(device)
lm_scores = lm_path_lats.get_tot_scores(True, True)
The 2nd arg to get_tot_scores() here, representing log_semiring, should be False. ARPA-type language models are constructed so that the backoff probability is already included in the direct arc, i.e. we would be double-counting if we summed the probabilities of the non-backoff and backoff arcs.
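A minimal sketch (pure Python, not the k2 API) of why the log semiring overcounts here; the probabilities are made up for illustration:

```python
# With an ARPA-style LM, the direct (non-backoff) arc already includes the
# backoff mass, so log-summing the direct and backoff paths double-counts.
import math

# Hypothetical log-probabilities for reaching the same word from a state:
direct_arc = math.log(0.30)          # P(word | history), backoff already folded in
backoff_path = math.log(0.1 * 0.5)   # backoff weight * P(word | shorter history)

# Tropical semiring (log_semiring=False): take the single best path.
tropical = max(direct_arc, backoff_path)

# Log semiring (log_semiring=True): sum probabilities over both paths,
# which double-counts the backoff mass.
log_sum = math.log(math.exp(direct_arc) + math.exp(backoff_path))

assert math.isclose(tropical, direct_arc)
assert log_sum > tropical  # the inflated (double-counted) score
```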
csukuangfj left a comment:
Please add more documentation to your code.
x -= self.mean

if norm_vars:
    x /= self.std
norm_means uses a requires_grad guard to choose whether to perform an in-place update. Is there a reason not to do the same here?
The original implementation
https://github.com/espnet/espnet/blob/08feae5bb93fa8f6dcba66760c8617a4b5e39d70/espnet/nets/pytorch_backend/frontends/feature_transform.py#L135
uses self.scale to do a multiplication, which is more efficient than dividing by self.std.
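A sketch of the suggested optimization: precompute the reciprocal once and multiply, instead of dividing on every call. The names (std, scale) follow the discussion; plain Python floats stand in for tensors:

```python
# Precompute scale = 1 / std once (e.g. in __init__); multiplying by it per
# batch is cheaper than dividing by std each time, with the same result.
std = [2.0, 4.0, 8.0]
scale = [1.0 / s for s in std]   # computed once

x = [10.0, 20.0, 40.0]
divided = [xi / si for xi, si in zip(x, std)]
multiplied = [xi * sc for xi, sc in zip(x, scale)]

assert divided == multiplied  # identical values here; multiplication is cheaper
```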
def encode(
        self, speech: torch.Tensor,
        speech_lengths: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
Would you mind adding documentation describing the shapes of the various tensors?
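A hedged sketch of the kind of docstring being requested; the dimension names (N = batch size, T = padded frame count, C = feature dimension) are assumptions, not taken from the PR:

```python
# Illustrative only: the shapes below are assumed, and the `torch` annotations
# are quoted strings so this sketch runs without torch installed.
def encode(self, speech: "torch.Tensor", speech_lengths: "torch.Tensor"):
    """Encode padded acoustic features.

    Args:
        speech: A 3-D float tensor of shape (N, T, C), where N is the batch
            size, T the padded number of frames, and C the feature dimension.
        speech_lengths: A 1-D int tensor of shape (N,) giving the number of
            valid frames per utterance (each entry <= T).

    Returns:
        A pair (encoder_out, encoder_out_lens):
            encoder_out: shape (N, T', D) after subsampling, D the encoder dim.
            encoder_out_lens: shape (N,), valid output frames per utterance.
    """
    raise NotImplementedError  # documentation sketch only
```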
    return nnet_output

@classmethod
def build_model(cls, asr_train_config, asr_model_file, device):
cls is never used.
I would suggest changing @classmethod to @staticmethod and removing cls.
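A sketch of the suggested change; the body is a placeholder, not the PR's actual loading logic:

```python
# Since cls is never used, build_model can be a @staticmethod with cls removed.
class ASRModel:
    @staticmethod
    def build_model(asr_train_config, asr_model_file, device):
        # ... load config and checkpoint here (placeholder body) ...
        return (asr_train_config, asr_model_file, device)

# Callable on the class (or an instance) with no implicit first argument:
out = ASRModel.build_model("conf.yaml", "model.pt", "cpu")
assert out == ("conf.yaml", "model.pt", "cpu")
```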
| """ | ||
| model = TransformerLM(**config) | ||
|
|
||
| assert model_file is not None, f"model file doesn't exist" |
f"{model_file} doesn't exist"
if model_type == 'espnet':
    return load_espnet_model(config, model_file)
elif model_type == 'snowfall':
    raise NotImplementedError(f'Snowfall model to be suppported')
No need to use f-string here.
self.unk_idx = self.token2idx['<unk>']

@dataclass
Do we really need to use dataclass here?
Also, could you remove the class NumericalizerMixin?
The extra level of inheritance makes the code hard to read.
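A hedged sketch of the suggestion: fold the mixin's methods directly into the class so there is one class to read. The class name and method body below are illustrative, not copied from the PR:

```python
# Flattened version: what was a NumericalizerMixin method becomes an ordinary
# method on the class itself, removing one level of inheritance.
class Numericalizer:
    def __init__(self, tokens):
        self.token2idx = {t: i for i, t in enumerate(tokens)}
        self.unk_idx = self.token2idx['<unk>']

    # Previously on the mixin; now defined directly here.
    def ids(self, pieces):
        return [self.token2idx.get(p, self.unk_idx) for p in pieces]

num = Numericalizer(['<blank>', '<unk>', 'he', 'llo'])
assert num.ids(['he', 'llo', 'xyz']) == [2, 3, 1]  # unknown piece -> unk_idx
```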
# The original link of these models is:
# https://zenodo.org/record/4604066#.YKtNrqgzZPY
# which is accessible by espnet utils
# They are ported to the following link for users who don't have espnet dependencies.
if [ ! -d snowfall_model_zoo ]; then
  echo "About to download pretrained models."
  git clone https://huggingface.co/GuoLiyong/snowfall_model_zoo
I would suggest using git clone --depth 1. It improves the clone speed.
blank_bias = -1.0
nnet_output[:, :, 0] += blank_bias

supervision_segments = torch.tensor([[0, 0, nnet_output.shape[1]]],
Is the batch size always 1? A larger batch size can improve decoding speed.
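A sketch of what batching implies for supervision_segments: one row [utterance_index, start_frame, num_frames] per utterance rather than a single row for batch size 1. Plain lists stand in for the torch tensor, and the helper name is hypothetical:

```python
# k2 additionally expects the rows sorted by decreasing num_frames, so the
# utterance indices are pre-sorted by length here.
def make_supervision_segments(num_frames_per_utt):
    order = sorted(range(len(num_frames_per_utt)),
                   key=lambda i: -num_frames_per_utt[i])
    # Each utterance starts at frame 0 of its own row in the padded batch.
    return [[i, 0, num_frames_per_utt[i]] for i in order]

segments = make_supervision_segments([97, 153, 120])
assert segments == [[1, 0, 153], [2, 0, 120], [0, 0, 97]]
```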
ref = batch['supervisions']['text']
for i in range(len(ref)):
    hyp_words = text.split(' ')
What's the format of text?
Does text depend on i? If not, you can split it outside of the for loop.
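A sketch of the two readings raised above, with hypothetical data (not from the PR): if text is one utterance's transcript, it must be re-derived per i; if it is genuinely loop-invariant, split it once, outside the loop.

```python
ref = ["hello world", "good morning"]

# Per-utterance reading: split inside the loop, indexed by i.
ref_words_per_utt = []
for i in range(len(ref)):
    ref_words_per_utt.append(ref[i].split(' '))

# Loop-invariant reading: hoist the split out of the loop.
text = "hello world"
hyp_words = text.split(' ')

assert ref_words_per_utt == [["hello", "world"], ["good", "morning"]]
assert hyp_words == ["hello", "world"]
```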
WER results of this PR (with models loaded from the espnet model zoo):
This PR implements the following procedure with models from the espnet model zoo:

An added benefit: an espnet-trained conformer encoder can be loaded into the equivalent snowfall model definition.
Also, the loaded espnet transformer LM could be used as a baseline for snowfall LM training tasks.