Crawl4AI Assistant

Chrome Extension for Visual Web Scraping

About Crawl4AI Assistant

Transform any website into structured data with just a few clicks! The Crawl4AI Assistant Chrome Extension provides three powerful tools for web scraping and data extraction.

🎉 NEW: Click2Crawl extracts data INSTANTLY without any LLM! Test your schema and see JSON results immediately in the browser!
🎯 Click2Crawl - visual data extraction: click elements to build schemas instantly!

🔴 Script Builder (Alpha) - record browser actions to create automation scripts.

📝 Markdown Extraction (New!) - convert any webpage content to clean markdown with Visual Text Mode.

Quick Start

Installation
1. Download the Extension - get the latest release from GitHub, or use the Download Extension (v1.3.0) button below.

2. Load in Chrome - navigate to chrome://extensions/ and enable Developer Mode.

3. Load Unpacked - click "Load unpacked" and select the extracted extension folder.

Explore Our Tools

🎯 Click2Crawl - visual data extraction (Available)

🔴 Script Builder - browser automation (Alpha)

📝 Markdown Extraction - content to markdown (New!)

🎯 Click2Crawl

Click elements to build extraction schemas - No LLM needed!
1. Select Container - click any repeating element, such as a product card or article, and use the up/down navigation to fine-tune the selection. The chosen container is highlighted in green.

2. Click Fields to Extract - click the data fields inside the container and choose text, links, images, or attributes. Selected fields are highlighted in pink.

3. Test & Extract Data Instantly - 🎉 click "Test Schema" to see the extracted JSON immediately; no LLM or coding required!
🚀 Zero LLM dependency
📊 Instant JSON extraction
🎯 Visual element selection
🐍 Export Python code
✨ Live preview
📥 Download results
📝 Export to markdown
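Because the schema Click2Crawl builds is plain JSON, it can be sanity-checked before a crawl ever runs. A minimal validator sketch (the rules below are illustrative assumptions, not part of the extension):

```python
def validate_schema(schema: dict) -> list[str]:
    """Return a list of problems found in a Click2Crawl-style schema."""
    problems = []
    if not schema.get("baseSelector"):
        problems.append("missing baseSelector (the repeating container)")
    for i, field in enumerate(schema.get("fields", [])):
        if not field.get("name") or not field.get("selector"):
            problems.append(f"field {i}: needs both 'name' and 'selector'")
        # attribute-type fields must say which attribute to read
        if field.get("type") == "attribute" and not field.get("attribute"):
            problems.append(f"field {i}: type 'attribute' needs an 'attribute' key")
    return problems

schema = {
    "name": "Product Catalog",
    "baseSelector": "div.product-card",
    "fields": [
        {"name": "title", "selector": "h3.product-title", "type": "text"},
        {"name": "image", "selector": "img.product-img", "type": "attribute"},
    ],
}
print(validate_schema(schema))  # flags the image field's missing 'attribute' key
```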

🔴 Script Builder

Record actions, generate automation
1. Hit Record - start capturing your browser interactions; a recording indicator appears.

2. Interact Naturally - click, type, scroll; everything is captured. 🖱️ ⌨️ 📜

3. Export Script - get JavaScript ready to drop into Crawl4AI's js_code parameter. 📝 Automation ready.
Smart action grouping
Wait detection
Keyboard shortcuts
Alpha version

📝 Markdown Extraction

Convert webpage content to clean markdown "as you see"
1. Ctrl/Cmd + Click - hold Ctrl/Cmd and click each element you want to extract; 🔢 numbered selection badges track your picks.

2. Enable Visual Text Mode - 👁️ extract content "as you see" it: clean text without complex HTML structures.

3. Export Clean Markdown - 📄 get beautifully formatted markdown ready for documentation or LLMs.
Multi-select with Ctrl/Cmd
Visual Text Mode (As You See)
Clean markdown output
Export to Crawl4AI Cloud (soon)
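The "as you see" idea - keep the text a reader sees, drop scripts, styles, and markup - can be approximated with the standard library in a few lines (a rough sketch of the concept, not the extension's actual implementation):

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect only the text a reader would see, skipping script/style."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = "<h1>The 24 Hour Restaurant</h1><script>track()</script><p>124 points</p>"
parser = VisibleText()
parser.feed(html)
print(" | ".join(parser.chunks))  # → The 24 Hour Restaurant | 124 points
```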

See the Generated Code & Extracted Data

click2crawl_extraction.py
#!/usr/bin/env python3
"""
🎉 NO LLM NEEDED! Direct extraction with CSS selectors
Generated by Crawl4AI Chrome Extension - Click2Crawl
"""

import asyncio
import json
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

# The EXACT schema from Click2Crawl - no guessing!
EXTRACTION_SCHEMA = {
    "name": "Product Catalog",
    "baseSelector": "div.product-card",  # The container you selected
    "fields": [
        {
            "name": "title",
            "selector": "h3.product-title",
            "type": "text"
        },
        {
            "name": "price",
            "selector": "span.price",
            "type": "text"
        },
        {
            "name": "image",
            "selector": "img.product-img",
            "type": "attribute",
            "attribute": "src"
        },
        {
            "name": "link",
            "selector": "a.product-link",
            "type": "attribute",
            "attribute": "href"
        }
    ]
}

async def extract_data(url: str):
    # Direct extraction - no LLM API calls!
    extraction_strategy = JsonCssExtractionStrategy(schema=EXTRACTION_SCHEMA)
    
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            config=CrawlerRunConfig(extraction_strategy=extraction_strategy)
        )
        
        if result.success:
            data = json.loads(result.extracted_content)
            print(f"✅ Extracted {len(data)} items instantly!")
            
            # Save to file
            with open('products.json', 'w') as f:
                json.dump(data, f, indent=2)
            
            return data
        
        print(f"❌ Crawl failed: {result.error_message}")
        return []

# Run extraction on any similar page!
data = asyncio.run(extract_data("https://example.com/products"))

# 🎯 Result: Clean JSON data, no LLM costs, instant results!
extracted_data.json (instantly extracted from the page - no coding required!)
[
  {
    "title": "Wireless Bluetooth Headphones",
    "price": "$79.99",
    "image": "https://example.com/images/headphones-bt-01.jpg",
    "link": "/products/wireless-bluetooth-headphones"
  },
  {
    "title": "Smart Watch Pro 2024",
    "price": "$299.00",
    "image": "https://example.com/images/smartwatch-pro.jpg",
    "link": "/products/smart-watch-pro-2024"
  },
  {
    "title": "4K Webcam for Streaming",
    "price": "$149.99",
    "image": "https://example.com/images/webcam-4k.jpg",
    "link": "/products/4k-webcam-streaming"
  },
  {
    "title": "Mechanical Gaming Keyboard RGB",
    "price": "$129.99",
    "image": "https://example.com/images/keyboard-gaming.jpg",
    "link": "/products/mechanical-gaming-keyboard"
  },
  {
    "title": "USB-C Hub 7-in-1",
    "price": "$45.99",
    "image": "https://example.com/images/usbc-hub.jpg",
    "link": "/products/usb-c-hub-7in1"
  }
]
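Once the JSON is out, post-processing is ordinary Python. For example, turning the price strings above into numbers (a small sketch; the field names match the example schema):

```python
import json

# A slice of the extracted JSON shown above
items = json.loads("""[
  {"title": "Wireless Bluetooth Headphones", "price": "$79.99"},
  {"title": "Smart Watch Pro 2024", "price": "$299.00"}
]""")

def price_value(item):
    # Strip the currency symbol and thousands separators
    return float(item["price"].lstrip("$").replace(",", ""))

total = sum(price_value(i) for i in items)
print(f"{len(items)} items, total ${total:.2f}")  # → 2 items, total $378.99
```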
automation_script.py
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

# JavaScript generated from your recorded actions
js_script = """
// Search for products
document.querySelector('button.search-toggle').click();
await new Promise(r => setTimeout(r, 500));

// Type search query
const searchInput = document.querySelector('input#search');
searchInput.value = 'wireless headphones';
searchInput.dispatchEvent(new Event('input', {bubbles: true}));

// Submit search
searchInput.dispatchEvent(new KeyboardEvent('keydown', {
    key: 'Enter', keyCode: 13, bubbles: true
}));

// Wait for results
await new Promise(r => setTimeout(r, 2000));

// Click first product
document.querySelector('.product-item:first-child').click();

// Wait for product page
await new Promise(r => setTimeout(r, 1000));

// Add to cart
document.querySelector('button.add-to-cart').click();
"""

async def automate_shopping():
    config = CrawlerRunConfig(
        js_code=js_script,
        wait_for="css:.cart-confirmation",
        screenshot=True
    )
    
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://shop.example.com",
            config=config
        )
        print(f"✓ Automation complete: {result.url}")
        return result

asyncio.run(automate_shopping())
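The "wait detection" feature - turning pauses between recorded actions into explicit waits in the generated script - can be sketched like this (the event format here is invented for illustration; the extension's actual recording format may differ):

```python
def events_to_js(events, min_gap_ms=400):
    """Turn (timestamp_ms, js_snippet) pairs into a script with waits.

    Gaps shorter than min_gap_ms are treated as incidental and dropped.
    """
    lines = []
    prev_ts = None
    for ts, snippet in events:
        if prev_ts is not None:
            gap = ts - prev_ts
            if gap >= min_gap_ms:
                lines.append(f"await new Promise(r => setTimeout(r, {gap}));")
        lines.append(snippet)
        prev_ts = ts
    return "\n".join(lines)

recorded = [
    (0,   "document.querySelector('button.search-toggle').click();"),
    (520, "document.querySelector('input#search').focus();"),
    (560, "document.querySelector('input#search').value = 'headphones';"),
]
# Only the 520 ms pause is long enough to become an explicit wait
print(events_to_js(recorded))
```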
extracted_content.md
# Extracted from Hacker News with Visual Text Mode 👁️

1. **Show HN: I built a tool to find and reach out to YouTubers** (hellosimply.io)
   84 points by erickim 2 hours ago | hide | 31 comments

2. **The 24 Hour Restaurant** (logicmag.io)
   124 points by helsinkiandrew 5 hours ago | hide | 52 comments

3. **Building a Better Bloom Filter in Rust** (carlmastrangelo.com)
   89 points by carlmastrangelo 3 hours ago | hide | 27 comments

---

### Article: The 24 Hour Restaurant

In New York City, the 24-hour restaurant is becoming extinct. What we lose when we can no longer eat whenever we want.

When I first moved to New York, I loved that I could get a full meal at 3 AM. Not just pizza or fast food, but a proper sit-down dinner with table service and a menu that ran for pages. The city that never sleeps had restaurants that matched its rhythm.

Today, finding a 24-hour restaurant in Manhattan requires genuine effort. The pandemic accelerated a decline that was already underway, but the roots go deeper: rising rents, changing labor laws, and shifting cultural patterns have all contributed to the death of round-the-clock dining.

---

### Product Review: Framework Laptop 16

**Specifications:**
- Display: 16" 2560×1600 165Hz
- Processor: AMD Ryzen 7 7840HS
- Memory: 32GB DDR5-5600
- Storage: 2TB NVMe Gen4
- Price: Starting at $1,399

**Pros:**
- Fully modular and repairable
- Excellent Linux support
- Great keyboard and trackpad
- Expansion card system

**Cons:**
- Battery life could be better
- Slightly heavier than competitors
- Fan noise under load

Crawl4AI Cloud

Your browser cluster without the cluster.

⚡ POST /crawl
🌐 JS-rendered pages
📊 Schema extraction built-in
💰 $0.001/page

See it extract your own data. Right now.
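The page above only names the endpoint (POST /crawl), so the exact request shape is an assumption - but a call would look roughly like this (the URL and payload keys are hypothetical; check the Cloud docs for the real contract):

```python
import json
import urllib.request

# Hypothetical payload: target URL plus a Click2Crawl-style schema
payload = {
    "url": "https://example.com/products",
    "schema": {"name": "Product Catalog", "baseSelector": "div.product-card"},
}
req = urllib.request.Request(
    "https://api.example.com/crawl",  # hypothetical endpoint URL
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here
```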

More Features Coming Soon

Roadmap

We're continuously expanding the Crawl4AI Assistant with powerful new features:

Direct Data Download - skip the code generation entirely! Download extracted data directly from Click2Crawl as JSON or CSV files. 📊 One-click download • No Python needed • Multiple export formats
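Until that ships, the products.json file the generated script saves converts to CSV with the standard library (a sketch using the example schema's field names):

```python
import csv
import json

# In practice this would be json.load(open("products.json"))
products = [
    {"title": "Wireless Bluetooth Headphones", "price": "$79.99",
     "image": "https://example.com/images/headphones-bt-01.jpg",
     "link": "/products/wireless-bluetooth-headphones"},
]

with open("products.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price", "image", "link"])
    writer.writeheader()          # header row from the field names
    writer.writerows(products)    # one row per extracted item

print(open("products.csv").read().splitlines()[0])  # → title,price,image,link
```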
Smart Field Detection - AI-powered field detection for Click2Crawl that automatically suggests the most likely data fields on any page. 🤖 Auto-detect fields • Smart naming • Pattern recognition

🚀 Stay tuned for updates! Follow our GitHub for the latest releases.