korp_mono.py list index out of range #3

@Phaqui

Something seems to be going wrong somewhere. The file I am trying to convert with korp_mono.py contains analyses like this one:

"<1024x768>"
	"1024x" Err/MissingSpace"768" Num @HNOUN #2->0

This makes the script crash with the message below. (The three lines starting with word_form, lemma and rest_cohort are debug output I added to see what the input looked like.)

anders@debian:~/corpus/corpus-fao$ korp_mono --skip-existing --ncpus most analysed/blogs/web_mix.txt.xml
--skip-existing given. Skipping 0 files that are already processed
Processing 1 files in parallel (9 workers)
word_form='1024x768'
lemma='1024x_∞_@HNOUN #2->0'
rest_cohort='\t"1024x" Err/MissingSpace"768" Num @HNOUN #2->0'
[1/1 FAILED: /home/anders/corpus/corpus-fao/analysed/blogs/web_mix.txt.xml
list index out of range
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/anders/.pyenv/versions/3.11.1/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 528, in process_file
    make_vrt_xml(file, analysed_file.lang),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 547, in make_vrt_xml
    make_sentences(valid_sentences(old_root.find(".//body/dependency").text), lang)
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 888, in make_sentences
    return [
           ^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 889, in <listcomp>
    make_sentence(current_sentence, current_lang) for current_sentence in sentences
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 879, in make_sentence
    [
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 880, in <listcomp>
    make_analysis_tuple(word_form, rest_cohort, current_lang)
  File "/home/anders/projects/CorpusTools/corpustools/korp_mono.py", line 840, in make_analysis_tuple
    maybe_pos = parts[1].replace("_∞_", "").strip()
                ~~~~~^^^
IndexError: list index out of range
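
To make the odd part of the input easier to see, here is a minimal sketch that pulls the reading line apart. It is not the code in korp_mono.py: `split_reading` and `READING_RE` are hypothetical names made up for this illustration, and the only assumption is that a reading line starts with one quoted lemma followed by whitespace-separated tags. On the line above, it shows that the tag `Err/MissingSpace` is glued to a second quoted string, `"768"`, which is presumably what the parser does not expect.

```python
import re

# Hypothetical helper, not part of korp_mono.py: splits one reading line
# into its first quoted lemma and the whitespace-separated tokens after it.
READING_RE = re.compile(r'^\s*"(?P<lemma>[^"]*)"\s*(?P<rest>.*)$')


def split_reading(rest_cohort):
    """Return (lemma, tokens, suspicious) for a single reading line."""
    match = READING_RE.match(rest_cohort)
    if match is None:
        raise ValueError(f"not a reading line: {rest_cohort!r}")
    tokens = match.group("rest").split()
    # A token that contains a quote means a second quoted string is
    # embedded in the tag list, as in Err/MissingSpace"768".
    suspicious = [token for token in tokens if '"' in token]
    return match.group("lemma"), tokens, suspicious


# The reading line from the debug output above.
lemma, tokens, suspicious = split_reading(
    '\t"1024x" Err/MissingSpace"768" Num @HNOUN #2->0'
)
print(lemma)
print(tokens)
print(suspicious)
```

A check like this could be used to log or skip such readings before they reach `make_analysis_tuple`, but whether that is the right fix depends on how korp_mono.py is meant to handle `Err/MissingSpace` sub-readings.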
