Crawl4AI v0.8.0 Release Notes

Release Date: January 2026
Previous Version: v0.7.6
Status: Release Candidate


Highlights

  • Critical Security Fixes for Docker API deployment
  • 11 New Features including crash recovery, prefetch mode, and proxy improvements
  • Breaking Changes - see migration guide below

Breaking Changes

1. Docker API: Hooks Disabled by Default

What changed: Hooks are now disabled by default on the Docker API.

Why: Security fix for Remote Code Execution (RCE) vulnerability.

Who is affected: Users of the Docker API who use the hooks parameter in /crawl requests.

Migration:

# To re-enable hooks (only if you trust all API users):
export CRAWL4AI_HOOKS_ENABLED=true

2. Docker API: file:// URLs Blocked

What changed: The endpoints /execute_js, /screenshot, /pdf, and /html now reject file:// URLs.

Why: Security fix for Local File Inclusion (LFI) vulnerability.

Who is affected: Users who were reading local files via the Docker API.

Migration: Use the Python library directly for local file processing:

# Instead of API call with file:// URL, use library:
from crawl4ai import AsyncWebCrawler
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(url="file:///path/to/file.html")


Security Fixes

Critical: Remote Code Execution via Hooks (CVE Pending)

Severity: CRITICAL (CVSS 10.0)
Affected: Docker API deployment (all versions before v0.8.0)
Vector: POST /crawl with a malicious hooks parameter

Details: The __import__ builtin was available in hook code, allowing attackers to import os, subprocess, etc. and execute arbitrary commands.

Fix:

  1. Removed __import__ from the allowed builtins
  2. Hooks disabled by default (CRAWL4AI_HOOKS_ENABLED=false)
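The escape is easy to reproduce in isolation. The sketch below is an illustrative restricted-exec harness, not Crawl4AI's actual hook runner: with __import__ in the builtins table, untrusted hook code can pull in os or subprocess; with it removed, the import statement itself fails.

```python
import builtins

def run_hook(code: str, allow_import: bool):
    """Execute hook code against a restricted builtins table.

    Returns None on success, or the exception raised. Illustrative only --
    not Crawl4AI's real sandbox.
    """
    allowed = {"len": len, "print": print}
    if allow_import:
        allowed["__import__"] = builtins.__import__  # the dangerous builtin
    try:
        exec(code, {"__builtins__": allowed})
        return None
    except Exception as exc:
        return exc

# With __import__ exposed, hook code can reach the os module (and from
# there, subprocess and arbitrary commands).
assert run_hook("import os; os.getcwd()", allow_import=True) is None
# With __import__ removed, the import statement raises ImportError.
assert isinstance(run_hook("import os", allow_import=False), ImportError)
```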

High: Local File Inclusion via file:// URLs (CVE Pending)

Severity: HIGH (CVSS 8.6)
Affected: Docker API deployment (all versions before v0.8.0)
Vector: POST /execute_js (and other endpoints) with file:///etc/passwd

Details: API endpoints accepted file:// URLs, allowing attackers to read arbitrary files from the server.

Fix: URL scheme validation now only allows http://, https://, and raw: URLs.
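A scheme allowlist along these lines is enough to reject file:// and any other unexpected scheme. This is a sketch, not the project's actual validator; splitting the scheme off by hand sidesteps fragment-handling quirks that generic URL parsers have with raw: content.

```python
ALLOWED_SCHEMES = {"http", "https", "raw"}

def is_allowed_url(url: str) -> bool:
    """Accept only http://, https://, and raw: URLs (hypothetical helper)."""
    scheme = url.split(":", 1)[0].lower() if ":" in url else ""
    return scheme in ALLOWED_SCHEMES

assert is_allowed_url("https://example.com")
assert is_allowed_url("raw:<html></html>")
assert not is_allowed_url("file:///etc/passwd")
```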

Credits

Discovered by Neo by ProjectDiscovery (projectdiscovery.io) - December 2025


New Features

1. init_scripts Support for BrowserConfig

Pre-page-load JavaScript injection for stealth evasions.

from crawl4ai import BrowserConfig

config = BrowserConfig(
    init_scripts=[
        "Object.defineProperty(navigator, 'webdriver', {get: () => false})"
    ]
)

2. CDP Connection Improvements

  • WebSocket URL support (ws://, wss://)
  • Proper cleanup with cdp_cleanup_on_close=True
  • Browser reuse across multiple connections

3. Crash Recovery for Deep Crawl Strategies

All deep crawl strategies (BFS, DFS, Best-First) now support crash recovery:

from crawl4ai.deep_crawling import BFSDeepCrawlStrategy

strategy = BFSDeepCrawlStrategy(
    max_depth=3,
    resume_state=saved_state,  # Resume from checkpoint
    on_state_change=save_callback  # Persist state in real-time
)
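The two hooks are ordinary callables, so a checkpoint can be as simple as atomic JSON writes. The helpers below are a sketch that assumes the state object is a plain serializable dict; the file name and helper names are ours, not the library's.

```python
import json
import os
import tempfile

def save_state(state: dict, path: str) -> None:
    """Write crawl state atomically so a crash mid-write can't corrupt it."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX and Windows

def load_state(path: str):
    """Return the last checkpoint, or None to start a fresh crawl."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None

# Wiring it up (assuming the v0.8.0 signature shown above):
# strategy = BFSDeepCrawlStrategy(
#     max_depth=3,
#     resume_state=load_state("crawl_state.json"),
#     on_state_change=lambda s: save_state(s, "crawl_state.json"),
# )
```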

4. PDF and MHTML for raw:/file:// URLs

Generate PDFs and MHTML from cached HTML content.

5. Screenshots for raw:/file:// URLs

Render cached HTML and capture screenshots.

6. base_url Parameter for CrawlerRunConfig

Proper URL resolution for raw: HTML processing:

config = CrawlerRunConfig(base_url='https://example.com')
result = await crawler.arun(url='raw:{html}', config=config)

7. Prefetch Mode for Two-Phase Deep Crawling

Fast link extraction without full page processing:

config = CrawlerRunConfig(prefetch=True)
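The two phases compose naturally: crawl the seeds with prefetch=True to harvest links cheaply, deduplicate, then run a full-config pass. The driver below is a generic sketch; fetch_links and fetch_page stand in for arun calls with a prefetch config and a full config respectively, and both names are ours.

```python
import asyncio

async def two_phase_crawl(seed_urls, fetch_links, fetch_page):
    """Phase 1: cheap link discovery. Phase 2: full processing of unique links."""
    discovered = set()
    # Phase 1: gather links from every seed concurrently.
    for links in await asyncio.gather(*(fetch_links(u) for u in seed_urls)):
        discovered.update(links)
    # Phase 2: fully process the deduplicated link set.
    return await asyncio.gather(*(fetch_page(u) for u in sorted(discovered)))
```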

8. Proxy Rotation and Configuration

Enhanced proxy rotation with sticky sessions support.

9. Proxy Support for HTTP Strategy

Non-browser crawler now supports proxies.

10. Browser Pipeline for raw:/file:// URLs

New process_in_browser parameter for browser operations on local content:

config = CrawlerRunConfig(
    process_in_browser=True,  # Force browser processing
    screenshot=True
)
result = await crawler.arun(url='raw:<html>...</html>', config=config)

11. Smart TTL Cache for Sitemap URL Seeder

Intelligent cache invalidation for sitemaps:

config = SeedingConfig(
    cache_ttl_hours=24,
    validate_sitemap_lastmod=True
)
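The invalidation rule combines two checks: age against the TTL, and staleness against the sitemap's own lastmod timestamp when one is advertised. A hypothetical helper (not the library's internals) capturing that logic:

```python
import time

def cache_entry_valid(cached_at: float, ttl_hours: float,
                      sitemap_lastmod: float = None,
                      now: float = None) -> bool:
    """True if a cached sitemap fetch is still usable: younger than the TTL
    and not superseded by the sitemap's lastmod (when known)."""
    now = time.time() if now is None else now
    if now - cached_at > ttl_hours * 3600:
        return False  # expired by age
    if sitemap_lastmod is not None and sitemap_lastmod > cached_at:
        return False  # sitemap changed since we cached it
    return True
```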

Bug Fixes

raw: URL Parsing Truncates at # Character

Problem: CSS color codes like #eee were being truncated.

Before: raw:body{background:#eee} → body{background:
After:  raw:body{background:#eee} → body{background:#eee}
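The truncation is exactly what any fragment-aware URL parser does with a '#': the Python standard library reproduces it. The fix is to split raw: content off by prefix instead of running it through a URL parser. A quick demonstration (illustrative; raw_content is our name, not the project's parsing code):

```python
from urllib.parse import urlparse

raw_url = "raw:body{background:#eee}"

# The old behavior: '#' is treated as the start of a URL fragment,
# silently dropping the rest of the inline content.
parsed = urlparse(raw_url)
assert parsed.path == "body{background:"  # truncated at '#'
assert parsed.fragment == "eee}"

# The fixed behavior: strip the scheme prefix, keep the body byte-for-byte.
def raw_content(url: str) -> str:
    prefix = "raw:"
    return url[len(prefix):] if url.startswith(prefix) else url

assert raw_content(raw_url) == "body{background:#eee}"
```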

Caching System Improvements

Various fixes to cache validation and persistence.


Documentation Updates

  • Multi-sample schema generation documentation
  • URL seeder smart TTL cache parameters
  • Security documentation (SECURITY.md)

Upgrade Guide

From v0.7.x to v0.8.0

  1. Update the package:

    pip install --upgrade crawl4ai
    

  2. Docker API users:

     • Hooks are now disabled by default
     • If you need hooks: export CRAWL4AI_HOOKS_ENABLED=true
     • file:// URLs no longer work via the API (use the library directly)

  3. Review security settings:

    # config.yml - recommended for production
    security:
      enabled: true
      jwt_enabled: true
    

  4. Test your integration before deploying to production

Breaking Change Checklist

  • [ ] Check if you use hooks parameter in API calls
  • [ ] Check if you use file:// URLs via the API
  • [ ] Update environment variables if needed
  • [ ] Review security configuration

Full Changelog

See CHANGELOG.md for complete version history.


Contributors

Thanks to all contributors who made this release possible.

Special thanks to Neo by ProjectDiscovery for responsible security disclosure.


For questions or issues, please open a GitHub Issue.

