Skip to content

feat: implement error hierarchy and fix sitemap deadlock#1852

Closed
dylanoik115-hue wants to merge 1 commit intoapify:masterfrom
dylanoik115-hue:feat-add-exceptions
Closed

feat: implement error hierarchy and fix sitemap deadlock#1852
dylanoik115-hue wants to merge 1 commit intoapify:masterfrom
dylanoik115-hue:feat-add-exceptions

Conversation

@dylanoik115-hue
Copy link
Copy Markdown

@dylanoik115-hue dylanoik115-hue commented Apr 21, 2026

🎯 Goal

Align the Python SDK with Crawlee JS standards and resolve intermittent crawler hangs.

🛠 Changes

  • Exception Hierarchy: Added NonRetryableError and CriticalError to src/crawlee/errors.py.
  • Logic Injection: Updated BasicCrawler to respect these errors, preventing wasted retry budgets.
  • Deadlock Fix: Replaced asyncio.sleep polling in SitemapRequestLoader with an asyncio.Event model for deterministic synchronization.

✅ Verification

Closes #1842, Closes #1739

@vdusek
Copy link
Copy Markdown
Collaborator

vdusek commented Apr 23, 2026

Hi @dylanoik115-hue, thanks for the contribution, but I'm going to close this PR.

A few blocking issues:

  • The deadlock fix for Possible deadlock in SitemapRequestLoader #1842 is not in the diff. The PR description says SitemapRequestLoader was changed from asyncio.sleep polling to an asyncio.Event model, but src/crawlee/request_loaders/_sitemap_request_loader.py is untouched (and btw. already uses asyncio.Event).
  • The added tests never run. tests/stress_test_fixes.py doesn't match pytest's default discovery (test_*.py / *_test.py), and it's in tests/ rather than tests/unit/.
  • Align exception hierarchy with Crawlee JS #1739 is explicitly a discussion issue ("not a final decision… needs to be discussed with the whole team before any implementation begins").

@vdusek vdusek closed this Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Possible deadlock in SitemapRequestLoader Align exception hierarchy with Crawlee JS

4 participants