Crawl4AI Assistant

Chrome Extension for Visual Web Scraping

About Crawl4AI Assistant

Transform any website into structured data with just a few clicks! The Crawl4AI Assistant Chrome Extension provides three powerful tools for web scraping and data extraction.

🎉 NEW: Click2Crawl extracts data INSTANTLY without any LLM! Test your schema and see JSON results immediately in the browser!
🎯 Click2Crawl - visual data extraction: click elements to build schemas instantly!

🔴 Script Builder (Alpha) - record browser actions to create automation scripts.

📝 Markdown Extraction (New!) - convert any webpage content to clean markdown with Visual Text Mode.

Quick Start

Installation
1. Download the Extension - get the latest release from GitHub, or use the Download Extension (v1.3.0) button below.

2. Load in Chrome - navigate to chrome://extensions/ and enable Developer Mode.

3. Load Unpacked - click "Load unpacked" and select the extracted extension folder.

Explore Our Tools

🎯 Click2Crawl - visual data extraction (Available)

🔴 Script Builder - browser automation (Alpha)

📝 Markdown Extraction - content to markdown (New!)

🎯 Click2Crawl

Click elements to build extraction schemas - No LLM needed!
1. Select Container - click any repeating element, such as a product card or article, and use the up/down navigation to fine-tune the selection. The chosen container is highlighted in green.

2. Click Fields to Extract - click the data fields inside the container and choose text, links, images, or attributes. Selected fields are highlighted in pink.

3. Test & Extract Data Instantly - 🎉 click "Test Schema" to see the extracted JSON immediately; no LLM or coding required!
🚀 Zero LLM dependency
📊 Instant JSON extraction
🎯 Visual element selection
🐍 Export Python code
✨ Live preview
📥 Download results
📝 Export to markdown
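Because the schema Click2Crawl builds is plain JSON, it can be sanity-checked before a crawl ever runs. A minimal validator sketch (the rules below are illustrative assumptions, not part of the extension):

```python
def validate_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a Click2Crawl-style schema."""
    problems = []
    if not schema.get("baseSelector"):
        problems.append("missing baseSelector (the repeating container)")
    for i, field in enumerate(schema.get("fields", [])):
        if not field.get("name") or not field.get("selector"):
            problems.append(f"field {i}: needs both 'name' and 'selector'")
        # attribute-type fields must say which attribute to read
        if field.get("type") == "attribute" and not field.get("attribute"):
            problems.append(f"field {i}: type 'attribute' needs an 'attribute' key")
    return problems

schema = {
    "name": "Product Catalog",
    "baseSelector": "div.product-card",
    "fields": [
        {"name": "title", "selector": "h3.product-title", "type": "text"},
        {"name": "image", "selector": "img.product-img", "type": "attribute"},
    ],
}
print(validate_schema(schema))  # flags the image field's missing 'attribute' key
```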

🔴 Script Builder

Record actions, generate automation
1. Hit Record - start capturing your browser interactions; a recording indicator appears.

2. Interact Naturally - click, type, scroll; everything is captured. 🖱️ ⌨️ 📜

3. Export Script - get JavaScript ready to drop into Crawl4AI's js_code parameter. 📝 Automation ready.
Smart action grouping
Wait detection
Keyboard shortcuts
Alpha version

📝 Markdown Extraction

Convert webpage content to clean markdown "as you see"
1. Ctrl/Cmd + Click - hold Ctrl/Cmd and click each element you want to extract; 🔢 numbered selection badges track your picks.

2. Enable Visual Text Mode - 👁️ extract content "as you see" it: clean text without complex HTML structures.

3. Export Clean Markdown - 📄 get beautifully formatted markdown ready for documentation or LLMs.
Multi-select with Ctrl/Cmd
Visual Text Mode (As You See)
Clean markdown output
Export to Crawl4AI Cloud (soon)
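The "as you see" idea - keep the text a reader sees, drop scripts, styles, and markup - can be approximated with the standard library in a few lines (a rough sketch of the concept, not the extension's actual implementation):

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect only the text a reader would see, skipping script/style."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = "<h1>The 24 Hour Restaurant</h1><script>track()</script><p>124 points</p>"
parser = VisibleText()
parser.feed(html)
print(" | ".join(parser.chunks))  # → The 24 Hour Restaurant | 124 points
```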

See the Generated Code & Extracted Data

click2crawl_extraction.py
#!/usr/bin/env python3
"""
🎉 NO LLM NEEDED! Direct extraction with CSS selectors
Generated by Crawl4AI Chrome Extension - Click2Crawl
"""

import asyncio
import json
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

# The EXACT schema from Click2Crawl - no guessing!
EXTRACTION_SCHEMA = {
    "name": "Product Catalog",
    "baseSelector": "div.product-card",  # The container you selected
    "fields": [
        {
            "name": "title",
            "selector": "h3.product-title",
            "type": "text"
        },
        {
            "name": "price",
            "selector": "span.price",
            "type": "text"
        },
        {
            "name": "image",
            "selector": "img.product-img",
            "type": "attribute",
            "attribute": "src"
        },
        {
            "name": "link",
            "selector": "a.product-link",
            "type": "attribute",
            "attribute": "href"
        }
    ]
}

async def extract_data(url: str):
    # Direct extraction - no LLM API calls!
    extraction_strategy = JsonCssExtractionStrategy(schema=EXTRACTION_SCHEMA)
    
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            config=CrawlerRunConfig(extraction_strategy=extraction_strategy)
        )
        
        if result.success:
            data = json.loads(result.extracted_content)
            print(f"✅ Extracted {len(data)} items instantly!")
            
            # Save to file
            with open('products.json', 'w') as f:
                json.dump(data, f, indent=2)
            
            return data
        
        print(f"❌ Crawl failed: {result.error_message}")
        return []

# Run extraction on any similar page!
data = asyncio.run(extract_data("https://example.com/products"))

# 🎯 Result: Clean JSON data, no LLM costs, instant results!
extracted_data.json (instantly extracted from the page - no coding required!)
[
  {
    "title": "Wireless Bluetooth Headphones",
    "price": "$79.99",
    "image": "https://example.com/images/headphones-bt-01.jpg",
    "link": "/products/wireless-bluetooth-headphones"
  },
  {
    "title": "Smart Watch Pro 2024",
    "price": "$299.00",
    "image": "https://example.com/images/smartwatch-pro.jpg",
    "link": "/products/smart-watch-pro-2024"
  },
  {
    "title": "4K Webcam for Streaming",
    "price": "$149.99",
    "image": "https://example.com/images/webcam-4k.jpg",
    "link": "/products/4k-webcam-streaming"
  },
  {
    "title": "Mechanical Gaming Keyboard RGB",
    "price": "$129.99",
    "image": "https://example.com/images/keyboard-gaming.jpg",
    "link": "/products/mechanical-gaming-keyboard"
  },
  {
    "title": "USB-C Hub 7-in-1",
    "price": "$45.99",
    "image": "https://example.com/images/usbc-hub.jpg",
    "link": "/products/usb-c-hub-7in1"
  }
]
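Once the JSON is out, post-processing is ordinary Python. For example, turning the price strings above into numbers (a small sketch; the field names match the example schema):

```python
import json

# A slice of the extracted JSON shown above
items = json.loads("""[
  {"title": "Wireless Bluetooth Headphones", "price": "$79.99"},
  {"title": "Smart Watch Pro 2024", "price": "$299.00"}
]""")

def price_value(item):
    # Strip the currency symbol and thousands separators
    return float(item["price"].lstrip("$").replace(",", ""))

total = sum(price_value(i) for i in items)
print(f"{len(items)} items, total ${total:.2f}")  # → 2 items, total $378.99
```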
automation_script.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

# JavaScript generated from your recorded actions
js_script = """
// Search for products
document.querySelector('button.search-toggle').click();
await new Promise(r => setTimeout(r, 500));

// Type search query
const searchInput = document.querySelector('input#search');
searchInput.value = 'wireless headphones';
searchInput.dispatchEvent(new Event('input', {bubbles: true}));

// Submit search
searchInput.dispatchEvent(new KeyboardEvent('keydown', {
    key: 'Enter', keyCode: 13, bubbles: true
}));

// Wait for results
await new Promise(r => setTimeout(r, 2000));

// Click first product
document.querySelector('.product-item:first-child').click();

// Wait for product page
await new Promise(r => setTimeout(r, 1000));

// Add to cart
document.querySelector('button.add-to-cart').click();
"""

async def automate_shopping():
    config = CrawlerRunConfig(
        js_code=js_script,
        wait_for="css:.cart-confirmation",
        screenshot=True
    )
    
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://shop.example.com",
            config=config
        )
        print(f"✓ Automation complete: {result.url}")
        return result

asyncio.run(automate_shopping())
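The "wait detection" feature - turning pauses between recorded actions into explicit waits in the generated script - can be sketched like this (the event format here is invented for illustration; the extension's actual recording format may differ):

```python
def events_to_js(events, min_gap_ms=400):
    """Turn (timestamp_ms, js_snippet) pairs into a script with waits.

    Gaps shorter than min_gap_ms are treated as incidental and dropped.
    """
    lines = []
    prev_ts = None
    for ts, snippet in events:
        if prev_ts is not None:
            gap = ts - prev_ts
            if gap >= min_gap_ms:
                lines.append(f"await new Promise(r => setTimeout(r, {gap}));")
        lines.append(snippet)
        prev_ts = ts
    return "\n".join(lines)

recorded = [
    (0,   "document.querySelector('button.search-toggle').click();"),
    (520, "document.querySelector('input#search').focus();"),
    (560, "document.querySelector('input#search').value = 'headphones';"),
]
# Only the 520 ms pause is long enough to become an explicit wait
print(events_to_js(recorded))
```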
extracted_content.md
# Extracted from Hacker News with Visual Text Mode 👁️

1. **Show HN: I built a tool to find and reach out to YouTubers** (hellosimply.io)
   84 points by erickim 2 hours ago | hide | 31 comments

2. **The 24 Hour Restaurant** (logicmag.io)
   124 points by helsinkiandrew 5 hours ago | hide | 52 comments

3. **Building a Better Bloom Filter in Rust** (carlmastrangelo.com)
   89 points by carlmastrangelo 3 hours ago | hide | 27 comments

---

### Article: The 24 Hour Restaurant

In New York City, the 24-hour restaurant is becoming extinct. What we lose when we can no longer eat whenever we want.

When I first moved to New York, I loved that I could get a full meal at 3 AM. Not just pizza or fast food, but a proper sit-down dinner with table service and a menu that ran for pages. The city that never sleeps had restaurants that matched its rhythm.

Today, finding a 24-hour restaurant in Manhattan requires genuine effort. The pandemic accelerated a decline that was already underway, but the roots go deeper: rising rents, changing labor laws, and shifting cultural patterns have all contributed to the death of round-the-clock dining.

---

### Product Review: Framework Laptop 16

**Specifications:**
- Display: 16" 2560×1600 165Hz
- Processor: AMD Ryzen 7 7840HS
- Memory: 32GB DDR5-5600
- Storage: 2TB NVMe Gen4
- Price: Starting at $1,399

**Pros:**
- Fully modular and repairable
- Excellent Linux support
- Great keyboard and trackpad
- Expansion card system

**Cons:**
- Battery life could be better
- Slightly heavier than competitors
- Fan noise under load

Crawl4AI Cloud

Your browser cluster without the cluster.

⚡ POST /crawl
🌐 JS-rendered pages
📊 Schema extraction built-in
💰 $0.001/page

See it extract your own data. Right now.
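The page above only names the endpoint (POST /crawl), so the exact request shape is an assumption - but a call would look roughly like this (the URL and payload keys are hypothetical; check the Cloud docs for the real contract):

```python
import json
import urllib.request

# Hypothetical payload: target URL plus a Click2Crawl-style schema
payload = {
    "url": "https://example.com/products",
    "schema": {"name": "Product Catalog", "baseSelector": "div.product-card"},
}
req = urllib.request.Request(
    "https://api.example.com/crawl",  # hypothetical endpoint URL
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here
```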

More Features Coming Soon

Roadmap

We're continuously expanding the Crawl4AI Assistant with powerful new features:

Direct Data Download - skip the code generation entirely! Download extracted data directly from Click2Crawl as JSON or CSV files. 📊 One-click download • No Python needed • Multiple export formats
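Until that ships, the products.json file the generated script saves converts to CSV with the standard library (a sketch using the example schema's field names):

```python
import csv
import json

# In practice this would be json.load(open("products.json"))
products = [
    {"title": "Wireless Bluetooth Headphones", "price": "$79.99",
     "image": "https://example.com/images/headphones-bt-01.jpg",
     "link": "/products/wireless-bluetooth-headphones"},
]

with open("products.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price", "image", "link"])
    writer.writeheader()          # header row from the field names
    writer.writerows(products)    # one row per extracted item

print(open("products.csv").read().splitlines()[0])  # → title,price,image,link
```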
Smart Field Detection - AI-powered field detection for Click2Crawl that automatically suggests the most likely data fields on any page. 🤖 Auto-detect fields • Smart naming • Pattern recognition

🚀 Stay tuned for updates! Follow our GitHub for the latest releases.