Very large document fails to generate PDF #91

@karelszurman

I have a software design document that consists of 54 sdoc files, so it is now really large. It is part of a project with three requirements documents that together contain 3000+ nodes across SyRD, SRD, and SDD. We generate a PDF for each document individually.

For the SDD, rendering a single HTML page takes some time, but it works in the strictdoc web server, so there should not be any syntax errors. Previously we generated a PDF file for each module, and that worked. Now we want to generate a single PDF file (there is a main.sdoc that includes all the others), which fails with:

Traceback (most recent call last):
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/features/html2pdf/pdf_print_driver.py", line 98, in get_pdf_from_html
    _: CompletedProcess[bytes] = run(
                                 ^^^^
  File "/usr/lib/python3.12/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.12/subprocess.py", line 1885, in _execute_child
    self.pid = _fork_exec(
               ^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not int

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/cli/main.py", line 95, in _main
    parser.run(parallelizer)
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/cli/cli_arg_parser.py", line 57, in run
    command_instance.run(parallelizer)
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/commands/export.py", line 208, in run
    export_action.export()
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/helpers/timing.py", line 32, in wrap
    result = func(*args, **kw)
             ^^^^^^^^^^^^^^^^^
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/core/actions/export_action.py", line 133, in export
    HTML2PDFGenerator.export_tree(
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/features/html2pdf/html2pdf_generator.py", line 116, in export_tree
    pdf_print_driver.get_pdf_from_html(
  File "/mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/features/html2pdf/pdf_print_driver.py", line 108, in get_pdf_from_html
    raise PDFPrintDriverException(e_) from e_
strictdoc.features.html2pdf.pdf_print_driver.PDFPrintDriverException

error:
error source: /mnt/c/workspace/project/.venv/lib/python3.12/site-packages/strictdoc/features/html2pdf/pdf_print_driver.py:108, function: get_pdf_from_html()

From the project folder, inside the virtual environment, I run strictdoc export as follows:

../../.venv/bin/strictdoc export --formats=html2pdf .

I also tried extending the timeout for the PDFPrintDriver in strictdoc/features/html2pdf/pdf_print_driver.py to well over 10 minutes:

class PDFPrintDriver:
    @staticmethod
    def get_pdf_from_html(
        project_config: ProjectConfig,
        paths_to_print: List[Tuple[str, str]],
        path_to_input_root: str,
    ) -> None:
        assert isinstance(paths_to_print, list), paths_to_print
        path_to_html2pdf4doc_cache = os.path.join(
            project_config.get_path_to_cache_dir(), "html2pdf"
        )
        cmd: List[str] = [
            # Using sys.executable instead of "python" is important because
            # venv subprocess call to python resolves to wrong interpreter,
            # https://github.com/python/cpython/issues/86207
            # Switching back to calling html2pdf4doc directly because the
            # python -m doesn't work well with PyInstaller.
            # sys.executable, "-m"
            "html2pdf4doc",
            "print",
            "--cache-dir",
            path_to_html2pdf4doc_cache,
            "--debug",
            "--page-load-timeout",
            25*60,
            "--strict",
        ]
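
One possible reading of the first traceback: the `25*60` entry in the `cmd` list above is an `int`, while `subprocess.run` expects every argument to be `str`, bytes, or a path-like object. The sketch below reproduces that behaviour in isolation. It does not call html2pdf4doc; `run_with_timeout_arg` and `try_run` are hypothetical helpers, and the child process is just the current Python interpreter echoing its argument. If this reading is right, the `TypeError` would come from the local modification and be separate from the rendering problem.

```python
import subprocess
import sys


def run_with_timeout_arg(timeout_arg):
    """Hypothetical helper: pass `timeout_arg` to a child process that echoes it."""
    cmd = [
        sys.executable,
        "-c",
        "import sys; print(sys.argv[1])",
        timeout_arg,  # must be str, bytes, or path-like; an int is rejected
    ]
    result = subprocess.run(cmd, capture_output=True, check=True, text=True)
    return result.stdout.strip()


def try_run(timeout_arg):
    """Return the echoed value, or a short description of the TypeError."""
    try:
        return run_with_timeout_arg(timeout_arg)
    except TypeError as error:
        return f"TypeError: {error}"


if __name__ == "__main__":
    print(try_run(25 * 60))       # int argument: subprocess raises TypeError
    print(try_run(str(25 * 60)))  # string argument: the child process starts
```

Under that assumption, writing the entry as `str(25 * 60)` would keep the list consistent with its `List[str]` annotation.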

I also extended the allowed range of the timeout parameter in html2pdf4doc/main.py:

    command_parser_print.add_argument(
        "--page-load-timeout",
        # 10 minutes should be enough to print even the largest documents.
        type=IntRange(0, 30 * 60),
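
For context, here is a minimal sketch of how a range-checked integer argument can be built with `argparse`. The real `IntRange` implementation in html2pdf4doc is not shown in this report, so `int_range` below is a hypothetical stand-in. It only illustrates why a value above the upper bound would be rejected at parse time unless the bound is raised.

```python
import argparse


def int_range(minimum: int, maximum: int):
    """Hypothetical stand-in for IntRange: build an argparse type with bounds."""

    def parse(value: str) -> int:
        number = int(value)  # a ValueError here is reported by argparse
        if not minimum <= number <= maximum:
            raise argparse.ArgumentTypeError(
                f"{number} is outside the range [{minimum}, {maximum}]"
            )
        return number

    return parse


parser = argparse.ArgumentParser(prog="example")
parser.add_argument(
    "--page-load-timeout",
    type=int_range(0, 30 * 60),  # upper bound of 30 minutes, in seconds
    default=10 * 60,
)

if __name__ == "__main__":
    args = parser.parse_args(["--page-load-timeout", "1500"])
    print(args.page_load_timeout)
```

Note that the value still arrives on the command line as a string and is converted to an integer only by the parser.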

When I try to open the generated HTML file 000_main-PDF.html, it never finishes rendering in the Chrome browser. The HTML file looks fine from a structural perspective. The Chrome developer console shows this error:

Uncaught (in promise) Error:
 ⛔ [grid.split] ♾️ loop guard triggered
  (anonymous) @ html2pdf4doc.min.js:6
  split @ html2pdf4doc.min.js:6
  Ri @ html2pdf4doc.min.js:6
  _parseNode @ html2pdf4doc.min.js:11
  _parseNodes @ html2pdf4doc.min.js:9
  _parseNode @ html2pdf4doc.min.js:11
  _parseNodes @ html2pdf4doc.min.js:9
  _parseNode @ html2pdf4doc.min.js:11
  _parseNodes @ html2pdf4doc.min.js:9
  _calculatePageStarts @ html2pdf4doc.min.js:9
  calculate @ html2pdf4doc.min.js:9
  render @ html2pdf4doc.min.js:11
  await in render
  (anonymous) @ html2pdf4doc.min.js:11
  (anonymous) @ html2pdf4doc.min.js:11
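
One way to narrow this down might be to bisect the list of included files until the first one that triggers the loop guard is found. The sketch below is only an outline under stated assumptions: that a single included file is responsible, and that any prefix of the include list containing it fails. `find_first_failing` is a hypothetical helper, and the `fails` callback stands in for "export this subset and check whether PDF generation breaks", which is not implemented here.

```python
from typing import Callable, Sequence


def find_first_failing(
    items: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> int:
    """Return the index of the first item whose inclusion makes `fails` true.

    Returns -1 if the full list does not fail. Assumes that once a prefix
    fails, every longer prefix fails as well.
    """
    if not fails(items):
        return -1
    low, high = 0, len(items) - 1
    while low < high:
        mid = (low + high) // 2
        if fails(items[: mid + 1]):
            high = mid  # the culprit is at or before mid
        else:
            low = mid + 1  # the culprit is after mid
    return low


if __name__ == "__main__":
    # Placeholder file names; the real include list comes from main.sdoc.
    modules = [f"module_{index:02d}.sdoc" for index in range(54)]
    culprit = find_first_failing(modules, lambda subset: "module_37.sdoc" in subset)
    print(modules[culprit])
```

With 54 files this needs about seven export runs, compared with up to 54 when checking the files one at a time.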

strictdoc version
html2pdf4doc version 0.0.33
running on WSL2

Could you please give me some hints on how to deal with this issue or debug it further?
Thank you.
