What pytest's collection phase costs you

Neul Labs · May 21, 2026

pytest · performance

If you run pytest -v --collect-only on a real project you’ll see something interesting: the command takes most of the time of a normal pytest invocation, and zero tests have run. That gap — between “pytest started” and “pytest ran a test” — is collection. For a lot of teams it is the largest source of “why does pytest feel slow” and also the least understood.

This post is about what collection actually does, why it costs what it costs, and what removing it from the per-invocation hot path looks like. We’ll cite the rpytest benchmark for one concrete data point, but the broader claim is about pytest the framework, not specifically about us.

What collection actually does

When you run pytest, the framework has to figure out which tests exist before it can run any of them. That requires:

Walking the file tree under testpaths (or the current directory) and matching names against python_files (default test_*.py and *_test.py).
Importing every matched file. Importing a test module means executing it top-to-bottom: module-level fixtures, marker definitions, helper imports, and anything else at module scope runs now, before any test runs.
Resolving conftest.py files from the project root down to each test directory. Every conftest.py is itself a Python module that gets executed. Fixtures defined there are registered into pytest’s fixture resolution.
Loading plugins: both packaged plugins (declared via entry points in installed distributions) and local plugins (modules named in a pytest_plugins variable or with -p). Each plugin module is imported and its hook implementations are registered, and each one is free to add new hooks, fixtures, markers, and CLI options.
Generating parametrized test IDs. @pytest.mark.parametrize is expanded at collection time; one parametrized function with 200 cases becomes 200 collected node IDs in this phase.
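Matching applied markers and keywords against the -m and -k selectors, and deselecting items that don’t match. skipif conditions written as plain boolean expressions have already been evaluated by this point, when the module was imported.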

Only after all of that is done does pytest’s pytest_collection_finish hook fire and the actual run phase begin. If you’ve ever wondered why pytest --collect-only -q | wc -l takes seconds for a small repo, this is why: the work to know what to run is most of the work.
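The first two steps are easy to approximate outside pytest. The sketch below is a simplification, not pytest's collector: it walks a directory, matches file names against the default python_files patterns, and executes each match the way an import would. The helper names discover and import_and_list_tests are made up for illustration, and conftest handling, plugins, test classes, and parametrization are all left out.

```python
import fnmatch
import importlib.util
import os
import time

PATTERNS = ("test_*.py", "*_test.py")  # pytest's default python_files


def discover(root, patterns=PATTERNS):
    """Yield paths under root whose file name matches any pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                yield os.path.join(dirpath, name)


def import_and_list_tests(path):
    """Execute the module at path; return (seconds, test function names)."""
    module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    spec.loader.exec_module(module)  # all module-level code runs here
    elapsed = time.perf_counter() - start
    names = [
        name for name, value in vars(module).items()
        if name.startswith("test") and callable(value)
    ]
    return elapsed, names
```

Looping import_and_list_tests over everything discover yields gives a rough feel for how much of a suite's collection time is plain import work. No test function is ever called.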

Why it’s expensive in practice

Three things compound:

Interpreter startup is non-zero. Cold CPython startup is typically tens of milliseconds; with site-packages enabled and a few imports it climbs further. You pay this on every invocation.

The plugin import graph is large. Modern Python test environments routinely have pytest-cov, pytest-mock, pytest-asyncio, pytest-django (or whatever framework integration is in play), pytest-xdist, hypothesis, pytest-randomly, and friends. Each one adds import time. None of them individually feels slow. Together they noticeably move the needle.

Test files import production code. A test file that says from myapp.services import UserService imports myapp.services, which imports the ORM, which imports the database driver, which imports a C extension. The transitive import graph reachable from a test module is often most of your application. Collection executes it.

The result is a fixed cost per invocation that has nothing to do with the tests you actually want to run. Whether you ran the whole suite or one parametrized case, you paid it.
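One way to see how much a single import drags in is to compare sys.modules before and after it. This is a minimal sketch using only the standard library. The function name import_footprint is invented here, and the numbers it reports depend on what the current process has already imported.

```python
import importlib
import sys
import time


def import_footprint(module_name):
    """Import module_name; return (seconds, sorted list of newly loaded modules)."""
    before = set(sys.modules)
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    newly_loaded = sorted(set(sys.modules) - before)
    return elapsed, newly_loaded


if __name__ == "__main__":
    # Pass the module one of your test files imports, e.g. myapp.services.
    target = sys.argv[1] if len(sys.argv) > 1 else "json"
    seconds, modules = import_footprint(target)
    print(f"{target}: {seconds * 1000:.1f} ms, {len(modules)} new modules")
```

Run it in a fresh interpreter so nothing is cached. For a per-module breakdown, CPython's own python -X importtime flag prints the time spent in each import.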

When the fixed cost dominates

The fixed cost dominates whenever your tests are individually cheap. Consider:

A TDD loop: edit, save, run a single test, repeat. The cheap individual test runs in milliseconds; the collection phase runs in hundreds of milliseconds to seconds. The dev experience is “pytest is slow” even though the actual test is fast.
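pytest --lf (last-failed): you’re running one failing test out of thousands. pytest still starts the interpreter, loads plugins, walks the file tree, and imports the modules that hold the failures, so most of the wall clock is spent before the one test you care about is dispatched.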
CI sharding: a CI job that runs one-fourth of the suite still pays the full startup and plugin cost. If you split by file path, it collects a one-fourth-sized inventory; if the shard is chosen by node ID from the complete inventory, it pays for the full collection too.
Watch-mode tools that re-invoke pytest on every file change: collection happens on every save.

The benchmark in rpytest’s BENCHMARK.md measures this. On a 500-test synthetic suite, pytest’s wall clock including startup is 0.63 s. Of that, execution itself is roughly 0.30 s. The other ~0.33 s is everything-before-execution: interpreter startup, plugin import, file walk, conftest evaluation, parametrization expansion. Roughly half of pytest’s wall clock on that suite (0.33 / 0.63 ≈ 52%) is collection-shaped overhead.

That ratio gets worse, not better, for short suites. On a single-test invocation, the inventory work is the same — but the execution share is much smaller, so collection’s percentage approaches 100.
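The arithmetic can be made concrete with the figures quoted above. This is a back-of-the-envelope sketch, not a measurement: it assumes every test costs the same (0.30 s spread over 500 tests) and that the 0.33 s fixed cost does not shrink when fewer tests are selected.

```python
# Figures quoted above for the 500-test synthetic suite.
FIXED_SECONDS = 0.33       # everything before execution
EXECUTION_SECONDS = 0.30   # running all 500 tests
PER_TEST_SECONDS = EXECUTION_SECONDS / 500  # assumption: uniform test cost


def overhead_share(n_tests, fixed=FIXED_SECONDS, per_test=PER_TEST_SECONDS):
    """Fraction of wall clock spent before any test executes."""
    return fixed / (fixed + n_tests * per_test)


for n in (500, 50, 1):
    print(f"{n:>3} tests: {overhead_share(n):.1%} overhead")
```

Under those assumptions the share is about 52% at 500 tests, about 92% at 50, and above 99% for a single test.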

What pytest-xdist does and doesn’t fix

A common response to “pytest is slow” is pytest -n auto. That makes the execution phase faster (more cores), but it makes the collection phase worse: each xdist worker is a separate Python process that has to do its own collection. The rpytest benchmark shows this clearly: pytest -n 4 is slower than plain pytest for that 500-test suite (0.87 s vs 0.63 s wall clock), because the workers’ startup and collection costs collectively exceed the execution speedup. pytest -n auto is slower still (1.90 s) for the same reason at higher worker counts.

This isn’t a knock on xdist — it’s a knock on doing collection per worker. xdist amortizes collection beautifully on large suites where execution time vastly exceeds collection time. It just doesn’t help, and can hurt, on the short suites where you most want parallel to help.
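A rough break-even estimate follows from the same numbers. The model below is an assumption made for illustration, not something the benchmark states: wall clock under xdist is taken to be the plain fixed cost, plus a constant extra overhead for the workers, plus execution divided perfectly across workers.

```python
def xdist_overhead(plain_wall, xdist_wall, execution, workers):
    """Extra seconds xdist adds, assuming execution parallelizes perfectly."""
    fixed = plain_wall - execution
    return xdist_wall - fixed - execution / workers


def break_even_execution(overhead, workers):
    """Execution seconds at which xdist wall clock equals plain pytest.

    Solves fixed + overhead + E / workers == fixed + E for E.
    """
    return overhead / (1 - 1 / workers)


# Figures quoted above: 0.63 s plain, 0.87 s with -n 4, 0.30 s execution.
extra = xdist_overhead(plain_wall=0.63, xdist_wall=0.87, execution=0.30, workers=4)
print(f"extra overhead with 4 workers: {extra:.3f} s")
print(f"break-even execution time: {break_even_execution(extra, 4):.2f} s")
```

With these inputs the extra overhead comes to about 0.47 s and the break-even to about 0.62 s of execution, roughly twice what the 500-test suite has. Real suites will differ, since per-worker collection grows with the suite, so treat this as an optimistic lower bound.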

What removing it from the hot path looks like

There are a few ways to make collection’s per-invocation cost go away, in increasing order of how much they require you to change:

Run pytest less often. Run the whole suite once instead of one test ten times. This works for some workflows; it ruins TDD.
Cache collection between runs. Tools like pytest-testmon partially do this — they skip tests whose dependencies haven’t changed. They don’t skip the import cost, but they reduce the execution set.
Keep the interpreter warm. Hold a Python process that has already imported your suite, and dispatch from outside. This is what rpytest does: a daemon hosts a real pytest process, the inventory lives in its memory, and each rpytest invocation is an RPC into it. The interpreter only starts once. Plugins only load once. Collection only runs once per session.

On rpytest’s measured 500-test suite, that drops wall clock from 0.63 s to 0.32 s with the daemon warm — about half the time. The execution phase is unchanged (you can’t get the actual tests to run faster by hosting them). The collection phase is just gone from the per-invocation path.
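The shape of the warm-process idea fits in a few lines. The sketch below is not rpytest's implementation: it is a hypothetical in-process stand-in with no daemon and no RPC, where WarmRunner and slow_collect are invented names. It only shows the structure of collecting once and dispatching many times.

```python
import time


class WarmRunner:
    """Hypothetical long-lived test host: collect once, run many times."""

    def __init__(self, collect):
        self._collect = collect    # callable returning {node_id: test function}
        self._inventory = None     # filled on first use, then reused
        self.collections = 0

    def run(self, node_id):
        if self._inventory is None:
            self._inventory = self._collect()
            self.collections += 1
        try:
            self._inventory[node_id]()
        except AssertionError:
            return "failed"
        return "passed"


def slow_collect():
    """Stand-in for imports, conftest evaluation, and plugin loading."""
    time.sleep(0.05)

    def test_ok():
        assert 1 + 1 == 2

    def test_bad():
        assert 1 + 1 == 3

    return {"test_ok": test_ok, "test_bad": test_bad}


runner = WarmRunner(slow_collect)
for node_id in ("test_ok", "test_bad", "test_ok"):
    start = time.perf_counter()
    outcome = runner.run(node_id)
    print(f"{node_id}: {outcome} in {(time.perf_counter() - start) * 1000:.1f} ms")
```

Only the first call pays the simulated collection delay. A real daemon also has to notice edited files and refresh its inventory, which this sketch ignores.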

What this means for you

The honest framing is that the daemon model is a structural fit for any workflow where you invoke the test runner more than once per session: TDD, watch mode, last-failed, CI hot paths that re-run after a failed step, sharded CI jobs that retry. For a one-shot pytest invocation in a developer’s terminal, you’ll still pay the daemon’s first-run cost; the gain is on every subsequent invocation.

The right way to know whether it matters to you isn’t a blog post. It’s:

Run pytest -v --collect-only on your suite and look at the time.
Compare to a real pytest run with -x and a single passing test.
The time the two runs have in common, the part spent before any test executes, is what removing collection from the hot path is worth to your team per invocation; multiply it by how many invocations you actually do.

If that number is small, you don’t have a collection-cost problem. If it’s not, the daemon model is the structural fix.
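The measurement can be scripted. This is a minimal sketch that shells out to pytest with the flags mentioned above. The function names are invented for illustration, and the selection arguments in the usage line are placeholders to replace with a test from your own suite.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd to completion and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def compare(select_args):
    """Time a collect-only pass against a run of one selected test."""
    pytest_cmd = [sys.executable, "-m", "pytest"]
    collect_only = time_command(pytest_cmd + ["--collect-only", "-q"])
    single_test = time_command(pytest_cmd + ["-x"] + list(select_args))
    return collect_only, single_test


if __name__ == "__main__":
    # Placeholder selection: swap in the name of a fast, passing test.
    collect_only, single_test = compare(["-k", "test_something"])
    print(f"collect-only: {collect_only:.2f} s")
    print(f"single test:  {single_test:.2f} s")
```

Selecting with -k keeps the full collection in the second run, which makes the two numbers directly comparable. Passing a file path or node ID instead limits collection to that file, so the second number can come out lower than the first. Take the median of several runs, since the first one also pays for cold disk caches.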