Skip to content

Sqlite writes limit parallel scaling #21235

@JukkaL

Description

@JukkaL

I measured time spent in sqlite operations when running the mypy_parallel benchmark. When using 8 workers, sqlite writes were over 10x slower compared to using just 1 worker, likely because only one worker can be writing at any time. This seems to be one of the main sources of limited parallel scaling on macOS at least.

Here are some ideas which might help:

  1. Instead of writing cache files in each worker, send them via IPC and have the main process write everything.
  2. Use a separate extra cache file per worker, and the main process will copy per-worker cached data to the main database. Every worker reads from the main database.
  3. Use a separate extra cache file per worker, and somehow allow each worker to read from the right database but always write to their own database.
  4. Shard the database -- have 1 (non-exclusive) database per worker, and use module name based hash to choose the target database for both reading and writing.

Next steps:

  • Measure on Linux as well
  • Prototype some of the ideas and measure impact (at least ones that are easy enough to implement)

cc @ilevkivskyi

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions