@isaacseymour
Contributor

tl;dr: make codegen much faster through parallel writing of files, and writing each file's contents once.


We have a very large API design, with over 100 services, which generates almost 3k files with goa generate. As our API has grown, running goa gen has got really slow.

There are three changes in this PR:

  1. Emit [TIMING] logs when --debug is passed - this makes finding bottlenecks much easier
  2. Use a worker pool to write files: parallelising this makes it significantly faster (~3x)
  3. Only write each file once it is finalized - i.e. tidy up imports on an in-memory copy of the file, then persist it at the end.

This cut the time on that stage for us from ~52s to ~10s 🎉

lawrencejones and others added 2 commits November 10, 2025 16:47
tl;dr: make codegen much faster through parallel writing of files

We have a _very large_ API design, with over 100 services, which
generates almost 3k files with `goa generate`. As our API has grown,
running `goa generate` has got really slow.

This commit adds timing data to the `--debug` output.

Then, the lowest-hanging-fruit optimisation has been applied: writing
all the output files in parallel.

---

As a recap, the generation process has several stages:

1. **Compile temporary binary**: Goa creates a temporary Go program that
   imports the design package (which triggers package initialization and
   DSL execution via blank import)
2. **Execute binary**: The binary runs through multiple phases:
   - Package initialization (runs DSL definitions)
   - `eval.RunDSL()` - processes the DSL in 4 phases (execute, prepare,
     validate, finalize)
   - `generator.Generate()` - produces the actual Go files

Measuring codegen for the incident-io codebase on 3,036 generated files:

**Total time: 61 seconds**

Breakdown:
- build.Import: 117ms
- NewGenerator (packages.Load): 52ms
- Write (generate main.go): 14ms
- Compile (go get + go build): 3.6s
  - packages.Load: 47ms
  - go get: 514ms
  - go build: 3.0s
- **Run (execute binary): 52.2s** ⚠️ 85% of total time
  - Check eval.Context.Errors: <1ms
  - eval.RunDSL(): 105ms
  - **generator.Generate(): 51.6s** ⚠️
    - Stage 1 (Compute design roots): <1ms
    - Stage 2 (Compute gen package): 33ms
    - Stage 3 (Retrieve generators): <1ms
    - Stage 4 (Pre-generation plugins): <1ms
    - **Stage 5 (Generate files): 26.2s** (3 generators, sequential)
      - Generator 0: 7.2s → 1,438 files
      - Generator 1: 18.8s → 1,594 files
      - Generator 2: 0.2s → 4 files
    - Stage 6 (Post-generation plugins): <1ms
    - **Stage 7 (Write files): 32.1s** ⚠️ Biggest bottleneck (52% of total time)
    - Stage 8 (Compute filenames): 2ms

This commit optimises the biggest stage of this process by changing
file rendering from a sequential loop to a parallel worker pool:

```go
// Before: Sequential (32.1s for 3,036 files)
for _, f := range genfiles {
    filename, err := f.Render(dir)
    // ...
}

// After: Parallel with runtime.NumCPU() workers
numWorkers := runtime.NumCPU()
// Worker pool processes files concurrently
```

Which changed execution time from **61 seconds total** to **34 seconds
total**, for an overall speedup of 1.8x.

While here, I also tried parallelising Stage 5 (generator functions) but
hit infinite recursion in `AsObject()` when handling circular type
references concurrently. This is where I'd go for the next biggest
speed-up.

Eliminate redundant file I/O operations that were causing severe performance
degradation. The previous implementation performed 2 reads and 3 writes per Go
file during code generation.

Changes:
- Refactor finalizeGoSource to process entirely in memory
- Render template sections to buffer instead of directly to file
- Perform all formatting/import cleanup in memory
- Write final content exactly once per file

Performance improvement:
- Before: 40s for 3000+ files (5 I/O operations per file)
- After: 9s for 3000+ files (1 write operation per file)
- 77.5% reduction in generation time

This maintains all existing functionality including:
- Import cleanup optimizations
- Proper goimports formatting
- Parallel file writing benefits

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
}

// 5. Generate initial set of files produced by goa code generators.
// NOTE: Parallelization causes infinite recursion in AsObject() for circular type references
Contributor Author

I'm going to try to find time to work on this problem, ideally in the next couple of weeks. This stage is now the slowest (~20s), but parallelising it is fiddly due to circular references.

type ExampleGenerator struct {
Randomizer
seen map[string]*any
mu sync.RWMutex
Contributor Author

While this isn't essential right now, I think it's sensible to add locks around map operations to avoid panics in future!
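
To make the locking concrete, here is a simplified sketch of a map guarded by a `sync.RWMutex`. The `seenCache` type and its methods are hypothetical and only mirror the `seen` and `mu` fields from the diff above, not the real `ExampleGenerator`. Unsynchronized concurrent map writes make the Go runtime abort the program, which is what the lock protects against.

```go
package main

import (
	"fmt"
	"sync"
)

// seenCache is a hypothetical, simplified guarded map.
type seenCache struct {
	mu   sync.RWMutex
	seen map[string]*any
}

func newSeenCache() *seenCache {
	return &seenCache{seen: make(map[string]*any)}
}

// get takes the read lock, so many readers can proceed at once.
func (c *seenCache) get(key string) (*any, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.seen[key]
	return v, ok
}

// set takes the write lock, excluding all readers and other writers.
func (c *seenCache) set(key string, v *any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.seen[key] = v
}

func main() {
	c := newSeenCache()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			var v any = i
			c.set(fmt.Sprintf("type%d", i%10), &v)
			c.get("type0")
		}(i)
	}
	wg.Wait()
	fmt.Println("distinct keys:", len(c.seen))
}
```

Running a program like this with `go run -race` is a quick way to check that no access slips past the lock.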

@raphael
Member

raphael commented Nov 14, 2025

This is awesome, thank you for the great work! Just one thing that might be nice before we can merge: would it be possible to add a few tests for the new concurrent part of the code?

@isaacseymour
Contributor Author

I've added some tests that hit the "generate lots of files" code, but they don't directly test the concurrency (although I have run them with -race successfully!)

Was there anything else you were thinking of, @raphael?
