Fix state not showing up fast enough with TestDelayedEvents#865

Merged
MadLittleMods merged 1 commit into main from madlittlemods/more-robust-delayed-event-state-change
Apr 27, 2026

Conversation

Collaborator

@MadLittleMods MadLittleMods commented Apr 24, 2026

Fix state not showing up fast enough with TestDelayedEvents.

Follow-up to #830

As experienced when running this test against the worker-based Synapse setup we use alongside the Synapse Pro Rust apps, https://github.com/element-hq/synapse-rust-apps/actions/runs/24910122124/job/72949760158?pr=360 (https://github.com/element-hq/synapse-rust-apps/pull/360)

❌ TestDelayedEvents/delayed_state_events_are_kept_on_server_restart (10.12s)
      delayed_event_test.go:425: StopServer hs1
      delayed_event_test.go:429: StartServer hs1
      delayed_event_test.go:443: CSAPI.MustDo GET http://127.0.0.1:32978/_matrix/client/v3/rooms/%21MbDncghrqxTzEmQhCP:hs1/state/com.example.test/1 returned non-2xx code: 404 Not Found - body: {"errcode":"M_NOT_FOUND","error":"Event not found."}

Why does this happen?

Discussed in #865 (comment)

Dev notes

MSC4140 Synapse implementation added in element-hq/synapse#17326

@@ -371,6 +371,11 @@ func TestDelayedEvents(t *testing.T) {

Collaborator Author

@MadLittleMods MadLittleMods Apr 24, 2026

As experienced when running this test against the worker-based Synapse setup we use alongside the Synapse Pro Rust apps, https://github.com/element-hq/synapse-rust-apps/actions/runs/24910122124/job/72949760158?pr=360 (https://github.com/element-hq/synapse-rust-apps/pull/360)

Error encountered:

❌ TestDelayedEvents/delayed_state_events_are_kept_on_server_restart (10.12s)
      delayed_event_test.go:425: StopServer hs1
      delayed_event_test.go:429: StartServer hs1
      delayed_event_test.go:443: CSAPI.MustDo GET http://127.0.0.1:32978/_matrix/client/v3/rooms/%21MbDncghrqxTzEmQhCP:hs1/state/com.example.test/1 returned non-2xx code: 404 Not Found - body: {"errcode":"M_NOT_FOUND","error":"Event not found."}

I haven't actually checked whether this PR fixes the problem there (it's just a theory).

Collaborator Author

@MadLittleMods MadLittleMods Apr 24, 2026

Why does this happen?

I guess this happens because the worker that processes delayed events and updates the room's state isn't necessarily the one that serves state requests. Is this even true?

It looks like the main process in Synapse handles processing delayed events.

And it looks like /state requests can be handled by workers. However, the regex there is slightly strange: ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$ doesn't cover /state/{eventType}/{stateKey} requests (only /state). ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/ is also there, but it's listed under "event sending requests", probably because GET and PUT share the same path. In the actual worker config we use in the workerized Synapse image for Complement, /state/{eventType}/{stateKey} isn't covered by any workers.

So I guess both are processed on the Synapse main process and this shouldn't be a problem? Perhaps this is a problem with Synapse itself 🤔

It shouldn't have anything to do with running with the Synapse Rust apps, as those currently only cover state federation servlets (/_matrix/federation/v1/state_ids/{roomId}, /_matrix/federation/v1/state/{roomId}, /_matrix/federation/v1/event/{eventId}), which aren't part of the client API.
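
The regex observation above can be checked mechanically. The sketch below compiles the two patterns quoted in this comment with Go's standard `regexp` package and tests them against a bare `/state` path and a `/state/{eventType}/{stateKey}` path. The room ID is a made-up placeholder, and this only illustrates what the patterns match; it says nothing about how a given deployment actually routes requests.

```go
package main

import (
	"fmt"
	"regexp"
)

// The two patterns quoted in the comment above.
var (
	// Listed for /state requests: anchored at the end, so it only matches the bare /state path.
	stateOnly = regexp.MustCompile(`^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$`)
	// Listed under "event sending requests": matches anything below /state/.
	statePrefix = regexp.MustCompile(`^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/`)
)

func main() {
	// Hypothetical room ID, used only for illustration.
	paths := []string{
		"/_matrix/client/v3/rooms/!room:hs1/state",
		"/_matrix/client/v3/rooms/!room:hs1/state/com.example.test/1",
	}
	for _, p := range paths {
		fmt.Printf("%s\n  matches .../state$: %v\n  matches .../state/: %v\n",
			p, stateOnly.MatchString(p), statePrefix.MatchString(p))
	}
}
```

Only the second pattern matches the `/state/{eventType}/{stateKey}` path, which is consistent with the point that a GET for a single state event falls under the "event sending" pattern rather than the plain `/state` one.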

Collaborator Author

cc @AndrewFerr any insight?

Contributor

@reivilibre reivilibre Apr 27, 2026

> It looks like the main process in Synapse handles processing delayed events.

Even if so, the state will have to get persisted on the correct event_persister for the room and the main process might be serving stale data until the invalidation comes through. So that could explain the difference.

Collaborator Author

Ahh, good catch on the event_persister wrench being thrown in here @reivilibre!

I'm going to merge this, as it's already a step forward and has received review attention and approval, but we should also update the other spots that check /state as a one-off (wherever user.Do(t, "GET", getPathForState(...)) is happening). The negative assertions are a bit tricky.
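
For reference, the difference between a one-off check and a retried check can be sketched roughly as follows. This is not the change made in this PR and does not use Complement's own helpers; it uses only the Go standard library, and `waitForOK` and `newLaggingServer` are hypothetical names invented for the illustration. The fake server answers 404 with the same `M_NOT_FOUND` body seen in the failing run for the first few requests, standing in for a homeserver whose state read lags behind the write.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitForOK (hypothetical helper) polls url with GET until it returns 200 OK
// or the timeout elapses. It returns the number of attempts and whether a 200 was seen.
func waitForOK(url string, timeout, interval time.Duration) (int, bool) {
	deadline := time.Now().Add(timeout)
	attempts := 0
	for {
		attempts++
		res, err := http.Get(url)
		if err == nil {
			res.Body.Close()
			if res.StatusCode == http.StatusOK {
				return attempts, true
			}
		}
		if time.Now().After(deadline) {
			return attempts, false
		}
		time.Sleep(interval)
	}
}

// newLaggingServer (hypothetical) simulates a server whose state read returns
// 404 for the first `misses` requests and 200 afterwards.
func newLaggingServer(misses int32) *httptest.Server {
	var seen int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&seen, 1) <= misses {
			http.Error(w, `{"errcode":"M_NOT_FOUND","error":"Event not found."}`, http.StatusNotFound)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := newLaggingServer(2)
	defer srv.Close()

	// Placeholder room ID; the path shape mirrors the request in the failing log.
	url := srv.URL + "/_matrix/client/v3/rooms/%21room:hs1/state/com.example.test/1"
	attempts, ok := waitForOK(url, 2*time.Second, 10*time.Millisecond)
	fmt.Printf("attempts=%d ok=%v\n", attempts, ok)
}
```

A one-off GET against this server would fail on the first request, whereas the polling version succeeds once the state becomes visible. The same approach does not transfer directly to negative assertions: polling until a 404 appears proves nothing about whether the state shows up later, which is presumably why those are the tricky ones.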

Contributor

@reivilibre reivilibre left a comment

This seems correct to me

@MadLittleMods MadLittleMods marked this pull request as ready for review April 27, 2026 18:19
@MadLittleMods MadLittleMods requested review from a team as code owners April 27, 2026 18:19
@MadLittleMods MadLittleMods merged commit cea922e into main Apr 27, 2026
4 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/more-robust-delayed-event-state-change branch April 27, 2026 18:22
MadLittleMods added a commit that referenced this pull request Apr 28, 2026