# Advanced Scraping Guide
Source: https://docs.firecrawl.dev/advanced-scraping-guide
Configure scrape options, browser actions, crawl, map, and the agent endpoint with Firecrawl's full API surface.
Reference for every option across Firecrawl's scrape, crawl, map, and agent endpoints.
## Basic scraping
To scrape a single page and get clean markdown content, use the `/scrape` endpoint.
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape("https://firecrawl.dev")
print(doc.markdown)
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import { Firecrawl } from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
const doc = await firecrawl.scrape('https://firecrawl.dev');
console.log(doc.markdown);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://docs.firecrawl.dev"
}'
```
## Scraping PDFs
Firecrawl supports PDFs. Use the `parsers` option (e.g., `parsers: ["pdf"]`) when you want to ensure PDF parsing. You can control the parsing strategy with the `mode` option:
* **`auto`** (default) — attempts fast text-based extraction first, then falls back to OCR if needed.
* **`fast`** — text-based parsing only (embedded text). Fastest, but skips scanned/image-heavy pages.
* **`ocr`** — forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page.
`{ type: "pdf" }` and `"pdf"` both default to `mode: "auto"`.
```json theme={null}
"parsers": [{ "type": "pdf", "mode": "fast", "maxPages": 50 }]
```
## Scrape options
When using the `/scrape` endpoint, you can customize the request with the following options.
### Formats (`formats`)
The `formats` array controls which output types the scraper returns. Default: `["markdown"]`.
**String formats**: pass the name directly (e.g. `"markdown"`).
| Format | Description |
| ---------- | ---------------------------------------------------------------------------- |
| `markdown` | Page content converted to clean Markdown. |
| `html` | Processed HTML with unnecessary elements removed. |
| `rawHtml` | Original HTML exactly as returned by the server. |
| `links` | All links found on the page. |
| `images` | All images found on the page. |
| `summary` | An LLM-generated summary of the page content. |
| `branding` | Brand identity (colors, fonts, typography, spacing, UI components). |
**Object formats**: pass an object with `type` and additional options.
| Format | Options | Description |
| ---------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `json` | `prompt?: string`, `schema?: object` | Extract structured data using an LLM. Provide a JSON schema and/or a natural-language prompt (max 10,000 characters). |
| `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max one per request. Viewport max resolution is 7680×4320. Screenshot URLs expire after 24 hours. |
| `changeTracking` | `modes?: ("json" \| "git-diff")[]`, `tag?: string`, `schema?: object`, `prompt?: string` | Track changes between scrapes. Requires `"markdown"` to also be in the formats array. |
| `attributes` | `selectors: [{ selector: string, attribute: string }]` | Extract specific HTML attributes from elements matching CSS selectors. |
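For example, a single request can mix string and object formats. A minimal sketch (the prompt and schema here are illustrative, and the structured result is read from the document's `json` field):
```typescript theme={null}
import { Firecrawl } from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// Request markdown plus LLM-extracted structured data in one call.
const doc = await firecrawl.scrape('https://firecrawl.dev', {
  formats: [
    'markdown',
    {
      type: 'json',
      prompt: 'Extract the product name and tagline', // illustrative prompt
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          tagline: { type: 'string' },
        },
      },
    },
  ],
});

console.log(doc.markdown);
console.log(doc.json); // structured output from the json format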
### Content filtering
These parameters control which parts of the page appear in the output. When `onlyMainContent` is `true` (the default), boilerplate (nav, footer, etc.) is stripped. `includeTags` and `excludeTags` are applied against the original page DOM, not the post-filtered result, so your selectors should target elements as they appear in the source HTML. Set `onlyMainContent: false` to use the full page as the starting point for tag filtering.
| Parameter | Type | Default | Description |
| ----------------- | --------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `onlyMainContent` | `boolean` | `true` | Return only the main content. Set `false` for the full page. |
| `includeTags` | `array` | — | CSS selectors to include — tags, classes, IDs, or attribute selectors (e.g. `["h1", "p", ".main-content", "[data-testid=\"main\"]"]`). |
| `excludeTags` | `array` | — | CSS selectors to exclude — tags, classes, IDs, or attribute selectors (e.g. `["#ad", "#footer", "[role=\"banner\"]"]`). |
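For instance, to start from the full page and then narrow the output to article content while dropping ads and the footer (the URL and selectors are illustrative):
```typescript theme={null}
import { Firecrawl } from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// Tag filters run against the original DOM, so target source-HTML selectors.
const doc = await firecrawl.scrape('https://example.com', {
  onlyMainContent: false,
  includeTags: ['article', '.main-content'],
  excludeTags: ['#ad', '#footer'],
  formats: ['markdown'],
});

console.log(doc.markdown);
```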
### Timing and cache
| Parameter | Type | Default | Description |
| --------- | -------------- | ----------- | ------------------------------------------------------------------------------------------------------ |
| `waitFor` | `integer` (ms) | `0` | Extra wait time before scraping, on top of smart-wait. Use sparingly. |
| `maxAge` | `integer` (ms) | `172800000` | Return a cached version if fresher than this value (default is 2 days). Set `0` to always fetch fresh. |
| `timeout` | `integer` (ms) | `60000` | Max request duration before aborting (default is 60 seconds). Minimum is 1000 (1 second). |
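As a sketch, here is a request that bypasses the cache and allows extra settle time for a slow page (values are illustrative):
```typescript theme={null}
import { Firecrawl } from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

const doc = await firecrawl.scrape('https://example.com', {
  maxAge: 0,      // always fetch fresh, never serve from cache
  waitFor: 2000,  // extra 2s on top of smart-wait
  timeout: 30000, // abort if the request takes longer than 30s
});
```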
### PDF parsing
| Parameter | Type | Default | Description |
| --------- | ------- | --------- | -------------------------------------------------------------------------------- |
| `parsers` | `array` | `["pdf"]` | Controls PDF processing. `[]` to skip parsing and return base64 (1 credit flat). |
```json theme={null}
{ "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": 10 }
```
| Property | Type | Default | Description |
| ---------- | --------------------------- | ------------ | ------------------------------------------------------------------------------------- |
| `type` | `"pdf"` | *(required)* | Parser type. |
| `mode` | `"fast" \| "auto" \| "ocr"` | `"auto"` | `fast`: text-based extraction only. `auto`: fast with OCR fallback. `ocr`: force OCR. |
| `maxPages` | `integer` | — | Cap the number of pages to parse. |
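For example, forcing OCR on a scanned document while capping the page count (the URL and values are illustrative):
```typescript theme={null}
import { Firecrawl } from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// Scanned PDFs have no embedded text layer, so force OCR.
const doc = await firecrawl.scrape('https://example.com/scanned-report.pdf', {
  parsers: [{ type: 'pdf', mode: 'ocr', maxPages: 20 }],
});

console.log(doc.markdown);
```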
### Actions
Run browser actions before scraping. This is useful for dynamic content, navigation, or user-gated pages. You can include up to 50 actions per request, and the combined wait time across all `wait` actions and `waitFor` must not exceed 60 seconds.
| Action | Parameters | Description |
| ------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `wait` | `milliseconds?: number`, `selector?: string` | Wait for a fixed duration **or** until an element is visible (provide one, not both). When using `selector`, times out after 30 seconds. |
| `click` | `selector: string`, `all?: boolean` | Click an element matching the CSS selector. Set `all: true` to click every match. |
| `write` | `text: string` | Type text into the currently focused field. You must focus the element with a `click` action first. |
| `press` | `key: string` | Press a keyboard key (e.g. `"Enter"`, `"Tab"`, `"Escape"`). |
| `scroll` | `direction?: "up" \| "down"`, `selector?: string` | Scroll the page or a specific element. Direction defaults to `"down"`. |
| `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max viewport resolution is 7680×4320. |
| `scrape` | *(none)* | Capture the current page HTML at this point in the action sequence. |
| `executeJavascript` | `script: string` | Run JavaScript code in the page. Return values are available in the `actions.javascriptReturns` array of the response. |
| `pdf` | `format?: string`, `landscape?: boolean`, `scale?: number` | Generate a PDF. Supported formats: `"A0"` through `"A6"`, `"Letter"`, `"Legal"`, `"Tabloid"`, `"Ledger"`. Defaults to `"Letter"`. |
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')
doc = firecrawl.scrape(
    'https://example.com',
    actions=[
        { 'type': 'wait', 'milliseconds': 1000 },
        { 'type': 'click', 'selector': '#accept' },
        { 'type': 'scroll', 'direction': 'down' },
        { 'type': 'click', 'selector': '#q' },
        { 'type': 'write', 'text': 'firecrawl' },
        { 'type': 'press', 'key': 'Enter' },
        { 'type': 'wait', 'milliseconds': 2000 }
    ],
    formats=['markdown']
)
print(doc.markdown)
```
```js Node theme={null}
import { Firecrawl } from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
const doc = await firecrawl.scrape('https://example.com', {
actions: [
{ type: 'wait', milliseconds: 1000 },
{ type: 'click', selector: '#accept' },
{ type: 'scroll', direction: 'down' },
{ type: 'click', selector: '#q' },
{ type: 'write', text: 'firecrawl' },
{ type: 'press', key: 'Enter' },
{ type: 'wait', milliseconds: 2000 }
],
formats: ['markdown']
});
console.log(doc.markdown);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "wait", "milliseconds": 1000 },
{ "type": "click", "selector": "#accept" },
{ "type": "scroll", "direction": "down" },
{ "type": "click", "selector": "#q" },
{ "type": "write", "text": "firecrawl" },
{ "type": "press", "key": "Enter" },
{ "type": "wait", "milliseconds": 2000 }
],
"formats": ["markdown"]
}'
```
#### Action execution notes
* **Write** requires a preceding `click` to focus the target element.
* **Scroll** accepts an optional `selector` to scroll a specific element instead of the page.
* **Wait** accepts either `milliseconds` (fixed delay) or `selector` (wait until visible).
* Actions run **sequentially**: each step completes before the next begins.
* Actions are **not supported for PDFs**. If the URL resolves to a PDF, the request will fail.
#### Advanced action examples
**Taking a screenshot:**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "click", "selector": "#load-more" },
{ "type": "wait", "milliseconds": 1000 },
{ "type": "screenshot", "fullPage": true, "quality": 80 }
]
}'
```
**Clicking multiple elements:**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "click", "selector": ".expand-button", "all": true },
{ "type": "wait", "milliseconds": 500 }
],
"formats": ["markdown"]
}'
```
**Generating a PDF:**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "pdf", "format": "A4", "landscape": false }
]
}'
```
**Executing JavaScript (e.g. extracting embedded page data):**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "executeJavascript", "script": "document.querySelector(\"#__NEXT_DATA__\").textContent" }
],
"formats": ["markdown"]
}'
```
The return value of each `executeJavascript` action is captured in the `actions.javascriptReturns` array of the response.
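Via the SDKs, the same values are available on the scraped document. A minimal sketch, assuming the SDK surfaces action results on the document's `actions` field as in the raw API response:
```typescript theme={null}
import { Firecrawl } from '@mendable/firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

const doc = await firecrawl.scrape('https://example.com', {
  actions: [
    { type: 'executeJavascript', script: 'document.title' },
  ],
  formats: ['markdown'],
});

// Each executeJavascript action contributes one entry, in order.
const returns = (doc as any).actions?.javascriptReturns;
console.log(returns?.[0]);
```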
### Full scrape example
The following request combines multiple scrape options:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"formats": [
"markdown",
"links",
"html",
"rawHtml",
{ "type": "screenshot", "fullPage": true, "quality": 80 }
],
"includeTags": ["h1", "p", "a", ".main-content"],
"excludeTags": ["#ad", "#footer"],
"onlyMainContent": false,
"waitFor": 1000,
"timeout": 15000,
"parsers": ["pdf"]
}'
```
This request returns markdown, HTML, raw HTML, links, and a full-page screenshot. It scopes tag filtering to the listed `includeTags` and `excludeTags`, starts from the full page (`onlyMainContent: false`), waits an extra second before scraping, aborts after 15 seconds, and enables PDF parsing.
### Understanding the Frontend
The frontend uses AI Elements components to provide a complete chat interface:
**Key Features:**
* **Conversation Display**: The `Conversation` component automatically handles message scrolling and display
* **Message Rendering**: Each message part is rendered based on its type (text, reasoning, tool calls)
* **Tool Visualization**: Tool calls are displayed with collapsible sections showing inputs and outputs
* **Interactive Controls**: Users can toggle web search, select models, and attach files
* **Message Actions**: Copy and retry actions for assistant messages
To ensure the markdown from the LLM is correctly rendered, add the following import to your `app/globals.css` file:
```css theme={null}
@source "../node_modules/streamdown/dist/index.js";
```
This imports the necessary styles for rendering markdown content in the message responses.
Create the chat API endpoint at `app/api/chat/route.ts`. This route will handle incoming messages and stream responses from the AI.
```typescript theme={null}
import { streamText, UIMessage, convertToModelMessages } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
const openai = createOpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
// Allow streaming responses up to 5 minutes
export const maxDuration = 300;
export async function POST(req: Request) {
const {
messages,
model,
webSearch,
}: {
messages: UIMessage[];
model: string;
webSearch: boolean;
} = await req.json();
const result = streamText({
model: openai(model),
messages: convertToModelMessages(messages),
system:
"You are a helpful assistant that can answer questions and help with tasks.",
});
// send sources and reasoning back to the client
return result.toUIMessageStreamResponse({
sendSources: true,
sendReasoning: true,
});
}
```
This basic route:
* Receives messages from the frontend
* Uses the OpenAI model selected by the user
* Streams responses back to the client
* Doesn't include tools yet - we'll add those next
Create a `.env.local` file in your project root:
```bash theme={null}
touch .env.local
```
Add your OpenAI API key:
```env theme={null}
OPENAI_API_KEY=sk-your-openai-api-key
```
The `OPENAI_API_KEY` is required for the AI model to function.
Now you can test the AI SDK chatbot without Firecrawl integration. Start the development server:
```bash theme={null}
npm run dev
```
Open [localhost:3000](http://localhost:3000) in your browser and test the basic chat functionality. The assistant should respond to messages, but won't have web scraping or search capabilities yet.
Now let's enhance the assistant with web scraping and search capabilities using Firecrawl.
### Install Firecrawl SDK
Firecrawl converts websites into LLM-ready formats with scraping and search capabilities:
```bash theme={null}
npm i @mendable/firecrawl-js
```
### Create the Tools File
Create a `lib` folder and add a `tools.ts` file inside it:
```bash theme={null}
mkdir lib && touch lib/tools.ts
```
Add the following code to define the web scraping and search tools:
```typescript lib/tools.ts theme={null}
import FirecrawlApp from "@mendable/firecrawl-js";
import { tool } from "ai";
import { z } from "zod";
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
export const scrapeWebsiteTool = tool({
description: 'Scrape content from any website URL',
inputSchema: z.object({
url: z.string().url().describe('The URL to scrape')
}),
execute: async ({ url }) => {
console.log('Scraping:', url);
const result = await firecrawl.scrape(url, {
formats: ['markdown'],
onlyMainContent: true,
timeout: 30000
});
console.log('Scraped content preview:', result.markdown?.slice(0, 200) + '...');
return { content: result.markdown };
}
});
export const searchWebTool = tool({
description: 'Search the web using Firecrawl',
inputSchema: z.object({
query: z.string().describe('The search query'),
limit: z.number().optional().describe('Number of results'),
location: z.string().optional().describe('Location for localized results'),
tbs: z.string().optional().describe('Time filter (qdr:h, qdr:d, qdr:w, qdr:m, qdr:y)'),
sources: z.array(z.enum(['web', 'news', 'images'])).optional().describe('Result types'),
categories: z.array(z.enum(['github', 'research', 'pdf'])).optional().describe('Filter categories'),
}),
execute: async ({ query, limit, location, tbs, sources, categories }) => {
console.log('Searching:', query);
const response = await firecrawl.search(query, {
...(limit && { limit }),
...(location && { location }),
...(tbs && { tbs }),
...(sources && { sources }),
...(categories && { categories }),
}) as { web?: Array<{ title?: string; url?: string; description?: string }> };
const results = (response.web || []).map((item) => ({
title: item.title || item.url || 'Untitled',
url: item.url || '',
description: item.description || '',
}));
console.log('Search results:', results.length);
return { results };
},
});
```
### Understanding the Tools
**Scrape Website Tool:**
* Accepts a URL as input (validated by Zod schema)
* Uses Firecrawl's `scrape` method to fetch the page as markdown
* Extracts only the main content to reduce token usage
* Returns the scraped content for the AI to analyze
**Search Web Tool:**
* Accepts a search query with optional filters
* Uses Firecrawl's `search` method to find relevant web pages
* Supports advanced filters like location, time range, and content categories
* Returns structured results with titles, URLs, and descriptions
Learn more about tools: [ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools).
Now update your `app/api/chat/route.ts` to include the Firecrawl tools we just created.
```typescript theme={null}
import { streamText, UIMessage, stepCountIs, convertToModelMessages } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { scrapeWebsiteTool, searchWebTool } from "@/lib/tools";
const openai = createOpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
export const maxDuration = 300;
export async function POST(req: Request) {
const {
messages,
model,
webSearch,
}: {
messages: UIMessage[];
model: string;
webSearch: boolean;
} = await req.json();
const result = streamText({
model: openai(model),
messages: convertToModelMessages(messages),
system:
"You are a helpful assistant that can answer questions and help with tasks.",
// Add the Firecrawl tools here
tools: {
scrapeWebsite: scrapeWebsiteTool,
searchWeb: searchWebTool,
},
stopWhen: stepCountIs(5),
toolChoice: webSearch ? "auto" : "none",
});
return result.toUIMessageStreamResponse({
sendSources: true,
sendReasoning: true,
});
}
```
The key changes from the basic route:
* Import `stepCountIs` from the AI SDK
* Import the Firecrawl tools from `@/lib/tools`
* Add the `tools` object with both `scrapeWebsite` and `searchWeb` tools
* Add `stopWhen: stepCountIs(5)` to limit execution steps
* Set `toolChoice` to "auto" when web search is enabled, "none" otherwise
Learn more about `streamText`: [ai-sdk.dev/docs/reference/ai-sdk-core/stream-text](https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text).
Update your `.env.local` file to include your Firecrawl API key:
```env theme={null}
OPENAI_API_KEY=sk-your-openai-api-key
FIRECRAWL_API_KEY=fc-your-firecrawl-api-key
```
Get your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev).
Restart your development server:
```bash theme={null}
npm run dev
```
Open [localhost:3000](http://localhost:3000) and test the enhanced assistant:
1. Toggle the "Search" button to enable web search
2. Ask: "What are the latest features from firecrawl.dev?"
3. Watch as the AI calls the `searchWeb` or `scrapeWebsite` tool
4. See the tool execution in the UI with inputs and outputs
5. Read the AI's analysis based on the scraped data
## How It Works
### Message Flow
1. **User sends a message**: The user types a question and clicks submit
2. **Frontend sends request**: `useChat` sends the message to `/api/chat` with the selected model and web search setting
3. **Backend processes message**: The API route receives the message and calls `streamText`
4. **AI decides on tools**: The model analyzes the question and decides whether to use `scrapeWebsite` or `searchWeb` (only if web search is enabled)
5. **Tools execute**: If tools are called, Firecrawl scrapes or searches the web
6. **AI generates response**: The model analyzes tool results and generates a natural language response
7. **Frontend displays results**: The UI shows tool calls and the final response in real-time
### Tool Calling Process
The AI SDK's tool calling system ([ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools)) works as follows:
1. The model receives the user's message and available tool descriptions
2. If the model determines a tool is needed, it generates a tool call with parameters
3. The SDK executes the tool function with those parameters
4. The tool result is sent back to the model
5. The model uses the result to generate its final response
This all happens automatically within a single `streamText` call, with results streaming to the frontend in real-time.
## Key Features
### Model Selection
The application supports multiple OpenAI models:
* **GPT-5 Mini (Thinking)**: Recent OpenAI model with advanced reasoning capabilities
* **GPT-4o Mini**: Fast and cost-effective model
Users can switch between models using the dropdown selector.
### Web Search Toggle
The Search button controls whether the AI can use Firecrawl tools:
* **Enabled**: AI can call `scrapeWebsite` and `searchWeb` tools as needed
* **Disabled**: AI responds only with its training knowledge
This gives users control over when to use web data versus the model's built-in knowledge.
## Customization Ideas
### Add More Tools
Extend the assistant with additional tools:
* Database lookups for internal company data
* CRM integration to fetch customer information
* Email sending capabilities
* Document generation
Each tool follows the same pattern: define a schema with Zod, implement the execute function, and register it in the `tools` object.
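For instance, a hypothetical internal document-lookup tool following that pattern (`findDocument` and its data source are placeholders for your own backend):
```typescript theme={null}
import { tool } from "ai";
import { z } from "zod";

// Placeholder for your own data source (database, CMS, etc.).
async function findDocument(id: string): Promise<string> {
  return `Contents of document ${id}`;
}

export const lookupDocumentTool = tool({
  description: "Fetch an internal document by its ID",
  inputSchema: z.object({
    id: z.string().describe("The document ID to fetch"),
  }),
  execute: async ({ id }) => {
    const content = await findDocument(id);
    return { content };
  },
});
```
Register it in the route's `tools` object (e.g. `lookupDocument: lookupDocumentTool`) and the model can call it just like the Firecrawl tools.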
### Change the AI Model
Swap OpenAI for another provider:
```typescript theme={null}
import { anthropic } from "@ai-sdk/anthropic";
const result = streamText({
  model: anthropic("claude-sonnet-4-5"),
// ... rest of config
});
```
The AI SDK supports 20+ providers with the same API. Learn more: [ai-sdk.dev/docs/foundations/providers-and-models](https://ai-sdk.dev/docs/foundations/providers-and-models).
### Customize the UI
AI Elements components are built on shadcn/ui, so you can:
* Modify component styles in the component files
* Add new variants to existing components
* Create custom components that match the design system
## Best Practices
1. **Use appropriate tools**: Choose `searchWeb` to find relevant pages first, `scrapeWebsite` for single pages, or let the AI decide
2. **Monitor API usage**: Track your Firecrawl and OpenAI API usage to avoid unexpected costs
3. **Handle errors gracefully**: The tools include error handling, but consider adding user-facing error messages
4. **Optimize performance**: Use streaming to provide immediate feedback and consider caching frequently accessed content
5. **Set reasonable limits**: The `stopWhen: stepCountIs(5)` setting prevents excessive tool calls and runaway costs
***
## Related Resources
* Explore the AI SDK for building AI-powered applications with streaming, tool calling, and multi-provider support.
* Pre-built UI components for AI applications built on shadcn/ui.
# Building a Brand Style Guide Generator with Firecrawl
Source: https://docs.firecrawl.dev/developer-guides/cookbooks/brand-style-guide-generator-cookbook
Generate professional PDF brand style guides by extracting design systems from any website using Firecrawl's branding format
Build a brand style guide generator that automatically extracts colors, typography, spacing, and visual identity from any website and compiles it into a professional PDF document.
## What You'll Build
A Node.js application that takes any website URL, extracts its complete brand identity using Firecrawl's branding format, and generates a polished PDF style guide with:
* Color palette with hex values
* Typography system (fonts, sizes, weights)
* Spacing and layout specifications
* Logo and brand imagery
* Theme information (light/dark mode)
## Prerequisites
* Node.js 18 or later installed
* A Firecrawl API key from [firecrawl.dev](https://firecrawl.dev)
* Basic knowledge of TypeScript and Node.js
Start by creating a new directory for your project and initializing it:
```bash theme={null}
mkdir brand-style-guide-generator && cd brand-style-guide-generator
npm init -y
```
Update your `package.json` to use ES modules:
```json package.json theme={null}
{
"name": "brand-style-guide-generator",
"version": "1.0.0",
"type": "module",
"scripts": {
"start": "npx tsx index.ts"
}
}
```
Install the required packages for web scraping and PDF generation:
```bash theme={null}
npm i @mendable/firecrawl-js pdfkit
npm i -D typescript tsx @types/node @types/pdfkit
```
These packages provide:
* `@mendable/firecrawl-js`: Firecrawl SDK for extracting brand identity from websites
* `pdfkit`: PDF document generation library
* `tsx`: TypeScript execution for Node.js
Create the main application file at `index.ts`. This script extracts brand identity from a URL and generates a professional PDF style guide.
```typescript index.ts theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import PDFDocument from "pdfkit";
import fs from "fs";
const API_KEY = "fc-YOUR-API-KEY";
const URL = "https://firecrawl.dev";
async function main() {
const fc = new Firecrawl({ apiKey: API_KEY });
const { branding: b } = (await fc.scrape(URL, { formats: ["branding"] })) as any;
const doc = new PDFDocument({ size: "A4", margin: 50 });
doc.pipe(fs.createWriteStream("brand-style-guide.pdf"));
// Fetch logo (PNG/JPG only)
let logoImg: Buffer | null = null;
try {
const logoUrl = b.images?.favicon || b.images?.ogImage;
if (logoUrl?.match(/\.(png|jpg|jpeg)$/i)) {
const res = await fetch(logoUrl);
logoImg = Buffer.from(await res.arrayBuffer());
}
} catch {}
// Header with logo
doc.rect(0, 0, 595, 120).fill(b.colors?.primary || "#333");
const titleX = logoImg ? 130 : 50;
if (logoImg) doc.image(logoImg, 50, 30, { height: 60 });
doc.fontSize(36).fillColor("#fff").text("Brand Style Guide", titleX, 38);
doc.fontSize(14).text(URL, titleX, 80);
// Colors
doc.fontSize(18).fillColor("#333").text("Colors", 50, 160);
const colors = Object.entries(b.colors || {}).filter(([, v]) => typeof v === "string" && (v as string).startsWith("#"));
colors.forEach(([k, v], i) => {
const x = 50 + i * 100;
doc.rect(x, 195, 80, 80).fill(v as string);
doc.fontSize(10).fillColor("#333").text(k, x, 282, { width: 80, align: "center" });
doc.fontSize(9).fillColor("#888").text(v as string, x, 296, { width: 80, align: "center" });
});
// Typography
doc.fontSize(18).fillColor("#333").text("Typography", 50, 340);
doc.fontSize(13).fillColor("#444");
doc.text(`Primary Font: ${b.typography?.fontFamilies?.primary || "—"}`, 50, 370);
doc.text(`Heading Font: ${b.typography?.fontFamilies?.heading || "—"}`, 50, 392);
doc.fontSize(12).fillColor("#666").text("Font Sizes:", 50, 422);
Object.entries(b.typography?.fontSizes || {}).forEach(([k, v], i) => {
doc.text(`${k.toUpperCase()}: ${v}`, 70, 445 + i * 22);
});
// Spacing & Theme
doc.fontSize(18).fillColor("#333").text("Spacing & Theme", 320, 340);
doc.fontSize(13).fillColor("#444");
doc.text(`Base Unit: ${b.spacing?.baseUnit}px`, 320, 370);
doc.text(`Border Radius: ${b.spacing?.borderRadius}`, 320, 392);
doc.text(`Color Scheme: ${b.colorScheme}`, 320, 414);
doc.end();
console.log("Generated: brand-style-guide.pdf");
}
main();
```
For this simple project, the API key is placed directly in the code. If you plan to push this to GitHub or share it with others, move the key to a `.env` file and use `process.env.FIRECRAWL_API_KEY` instead.
Replace `fc-YOUR-API-KEY` with your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev).
### Understanding the Code
**Key Components:**
* **Firecrawl Branding Format**: The `branding` format extracts comprehensive brand identity including colors, typography, spacing, and images
* **PDFKit Document**: Creates a professional A4 PDF with proper margins and sections
* **Color Swatches**: Renders visual color blocks with hex values and semantic names
* **Typography Display**: Shows font families and sizes in an organized layout
* **Spacing & Theme**: Documents the design system's spacing units and color scheme
Run the script to generate a brand style guide:
```bash theme={null}
npm start
```
The script will:
1. Extract the brand identity from the target URL using Firecrawl's branding format
2. Generate a PDF named `brand-style-guide.pdf`
3. Save it in your project directory
To generate a style guide for a different website, simply change the `URL` constant in `index.ts`.
## How It Works
### Extraction Process
1. **URL Input**: The generator receives a target website URL
2. **Firecrawl Scrape**: Calls the `/scrape` endpoint with the `branding` format
3. **Brand Analysis**: Firecrawl analyzes the page's CSS, fonts, and visual elements
4. **Data Return**: Returns a structured `BrandingProfile` object with all design tokens
### PDF Generation Process
1. **Header Creation**: Generates a colored header using the primary brand color
2. **Logo Fetch**: Downloads and embeds the logo or favicon if available
3. **Color Palette**: Renders each color as a visual swatch with metadata
4. **Typography Section**: Documents font families, sizes, and weights
5. **Spacing Info**: Includes base units, border radius, and theme mode
### Branding Profile Structure
The [branding format](https://docs.firecrawl.dev/features/scrape#%2Fscrape-with-branding-endpoint) returns detailed brand information:
```typescript theme={null}
{
colorScheme: "dark" | "light",
logo: "https://example.com/logo.svg",
colors: {
primary: "#FF6B35",
secondary: "#004E89",
accent: "#F77F00",
background: "#1A1A1A",
textPrimary: "#FFFFFF",
textSecondary: "#B0B0B0"
},
typography: {
fontFamilies: { primary: "Inter", heading: "Inter", code: "Roboto Mono" },
fontSizes: { h1: "48px", h2: "36px", body: "16px" },
fontWeights: { regular: 400, medium: 500, bold: 700 }
},
spacing: {
baseUnit: 8,
borderRadius: "8px"
},
images: {
logo: "https://example.com/logo.svg",
favicon: "https://example.com/favicon.ico"
}
}
```
## Key Features
### Automatic Color Extraction
The generator identifies and categorizes all brand colors:
* **Primary & Secondary**: Main brand colors
* **Accent**: Highlight and CTA colors
* **Background & Text**: UI foundation colors
* **Semantic Colors**: Success, warning, error states
### Typography Documentation
Captures the complete type system:
* **Font Families**: Primary, heading, and monospace fonts
* **Size Scale**: All heading and body text sizes
* **Font Weights**: Available weight variations
### Visual Output
The PDF includes:
* Color-coded header matching the brand
* Embedded logo when available
* Professional layout with clear hierarchy
* Metadata footer with generation date
## Customization Ideas
### Add Component Documentation
Extend the generator to include UI component styles:
```typescript theme={null}
// Add after the Spacing & Theme section
if (b.components) {
doc.addPage();
doc.fontSize(20).fillColor("#333").text("UI Components", 50, 50);
// Document button styles
if (b.components.buttonPrimary) {
const btn = b.components.buttonPrimary;
doc.fontSize(14).text("Primary Button", 50, 90);
doc.rect(50, 110, 120, 40).fill(btn.background);
doc.fontSize(12).fillColor(btn.textColor).text("Button", 50, 120, { width: 120, align: "center" });
}
}
```
### Export Multiple Formats
Add JSON export alongside the PDF:
```typescript theme={null}
// Add before doc.end()
fs.writeFileSync("brand-data.json", JSON.stringify(b, null, 2));
```
### Batch Processing
Generate guides for multiple websites:
```typescript theme={null}
const websites = [
"https://stripe.com",
"https://linear.app",
"https://vercel.com"
];
for (const site of websites) {
const { branding } = await fc.scrape(site, { formats: ["branding"] }) as any;
// Generate PDF for each site...
}
```
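To stay within rate limits during a batch run, you may want to pause between requests; a minimal sketch extending the loop above (the 2-second delay is an arbitrary example value):
```typescript theme={null}
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

for (const site of websites) {
  const { branding } = (await fc.scrape(site, { formats: ["branding"] })) as any;
  // Generate PDF for each site...
  await sleep(2000); // pause between requests; tune to your plan's limits
}
```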
### Custom PDF Themes
Create different PDF styles based on the extracted theme:
```typescript theme={null}
const isDarkMode = b.colorScheme === "dark";
const headerBg = isDarkMode ? b.colors?.background : b.colors?.primary;
const textColor = isDarkMode ? "#fff" : "#333";
```
## Best Practices
1. **Handle Missing Data**: Not all websites expose complete branding information. Always provide fallback values for missing properties.
2. **Respect Rate Limits**: When batch processing multiple sites, add delays between requests to respect Firecrawl's rate limits.
3. **Cache Results**: Store extracted branding data to avoid repeated API calls for the same site.
4. **Image Format Handling**: Some logos may be in formats PDFKit doesn't support (like SVG). Consider adding format conversion or graceful fallbacks.
5. **Error Handling**: Wrap the generation process in try-catch blocks and provide meaningful error messages.
***
## Related Resources
* Learn more about the branding format and all available properties you can extract.
* Complete API reference for the scrape endpoint with all format options.
* Learn more about PDFKit for advanced PDF customization options.
* Process multiple URLs efficiently with batch scraping.
# Full-Stack Templates
Source: https://docs.firecrawl.dev/developer-guides/examples
Explore real-world examples and tutorials for Firecrawl
## Official Examples
* [open-lovable](https://github.com/firecrawl/open-lovable): Build a RAG-powered chatbot with live web data
* [open-agent-builder](https://github.com/firecrawl/open-agent-builder): Build and deploy AI agents with web scraping capabilities
* [fireplexity](https://github.com/firecrawl/fireplexity): AI search engine with real-time citations, streaming responses, and live data
* [firegeo](https://github.com/firecrawl/firegeo): SaaS starter with brand monitoring, authentication, and billing
* [fire-enrich](https://github.com/firecrawl/fire-enrich): AI-powered data enrichment tool that transforms emails into rich datasets
* [firesearch](https://github.com/firecrawl/firesearch): AI-powered deep research tool with validated answers and citations
* [firestarter](https://github.com/firecrawl/firestarter): Instantly create AI chatbots for any website with RAG-powered search
* [ai-ready-website](https://github.com/firecrawl/ai-ready-website): Transform any website into AI-ready structured data
* [open-researcher](https://github.com/firecrawl/open-researcher): AI-powered research assistant that gathers and analyzes web data
# Anthropic
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/anthropic
Use Firecrawl with Claude for web scraping + AI workflows
Integrate Firecrawl with Claude to build AI applications powered by web data.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js @anthropic-ai/sdk zod zod-to-json-schema
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
ANTHROPIC_API_KEY=your_anthropic_key
```
> **Note:** If using Node < 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Summarize
This example demonstrates a simple workflow: scrape a website and summarize the content using Claude.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import Anthropic from '@anthropic-ai/sdk';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [
{ role: 'user', content: `Summarize in 100 words: ${scrapeResult.markdown}` }
]
});
console.log('Response:', message);
```
## Tool Use
This example shows how to use Claude's tool use feature to let the model decide when to scrape websites based on user requests.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { Anthropic } from '@anthropic-ai/sdk';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY
});
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY
});
const ScrapeArgsSchema = z.object({
url: z.string()
});
console.log("Sending user message to Claude and requesting tool use if necessary...");
const response = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
tools: [{
name: 'scrape_website',
description: 'Scrape and extract markdown content from a website URL',
input_schema: zodToJsonSchema(ScrapeArgsSchema, 'ScrapeArgsSchema') as any
}],
messages: [{
role: 'user',
content: 'What is Firecrawl? Check firecrawl.dev'
}]
});
const toolUse = response.content.find(block => block.type === 'tool_use');
if (toolUse && toolUse.type === 'tool_use') {
const input = toolUse.input as { url: string };
console.log(`Calling tool: ${toolUse.name} | URL: ${input.url}`);
const result = await firecrawl.scrape(input.url, {
formats: ['markdown']
});
console.log(`Scraped content preview: ${result.markdown?.substring(0, 300)}...`);
// Continue with the conversation or process the scraped content as needed
}
```
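To complete the loop, you can hand the scraped markdown back to Claude as a `tool_result` block so it can answer the original question. A sketch of that continuation (it belongs inside the `if` block above and reuses `anthropic`, `response`, `toolUse`, `result`, and the schema helpers from the example):
```typescript theme={null}
// Send the tool result back so Claude can compose a final answer.
const followUp = await anthropic.messages.create({
  model: 'claude-haiku-4-5',
  max_tokens: 1024,
  tools: [{
    name: 'scrape_website',
    description: 'Scrape and extract markdown content from a website URL',
    input_schema: zodToJsonSchema(ScrapeArgsSchema, 'ScrapeArgsSchema') as any
  }],
  messages: [
    { role: 'user', content: 'What is Firecrawl? Check firecrawl.dev' },
    { role: 'assistant', content: response.content },
    {
      role: 'user',
      content: [{
        type: 'tool_result',
        tool_use_id: toolUse.id,
        content: result.markdown ?? ''
      }]
    }
  ]
});

const answer = followUp.content.find(block => block.type === 'text');
if (answer && answer.type === 'text') console.log(answer.text);
```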
## Structured Extraction
This example demonstrates how to use Claude to extract structured data from scraped website content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const CompanyInfoSchema = z.object({
name: z.string(),
industry: z.string().optional(),
description: z.string().optional()
});
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown'],
onlyMainContent: true
});
const prompt = `Extract company information from this website content.
Output ONLY valid JSON in this exact format (no markdown, no explanation):
{
"name": "Company Name",
"industry": "Industry",
"description": "One sentence description"
}
Website content:
${scrapeResult.markdown}`;
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [
{ role: 'user', content: prompt },
{ role: 'assistant', content: '{' }
]
});
const textBlock = message.content.find(block => block.type === 'text');
if (textBlock && textBlock.type === 'text') {
const jsonText = '{' + textBlock.text;
const companyInfo = CompanyInfoSchema.parse(JSON.parse(jsonText));
console.log(companyInfo);
}
```
For more examples, check the [Claude documentation](https://docs.anthropic.com/claude/docs).
# ElevenAgents
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/elevenagents
Give ElevenLabs voice and chat agents real-time web access with Firecrawl
Give your [ElevenAgents](https://elevenlabs.io/agents) voice and chat agents the ability to scrape, search, and crawl the web in real time using Firecrawl. This guide covers two integration paths:
1. **MCP server** — connect the hosted Firecrawl MCP server for zero-code setup.
2. **Server webhook tool** — point a custom tool at Firecrawl's REST API for full control over requests.
## Prerequisites
* An [ElevenLabs](https://elevenlabs.io) account with access to ElevenAgents
* A Firecrawl API key from your [firecrawl.dev dashboard](https://firecrawl.dev)
## Option 1: Firecrawl MCP Server
The fastest way to give an agent web access. ElevenAgents supports remote MCP servers, and Firecrawl provides a hosted MCP endpoint.
### Add the MCP server
1. Open the [Integrations page](https://elevenlabs.io/app/agents/integrations) in ElevenLabs and click **+ Add integration**.
2. Select **Custom MCP Server** from the integration library.
3. Fill in the following fields:
| Field | Value |
| --------------- | ------------------------------------------------------------ |
| **Name** | Firecrawl |
| **Description** | Search, scrape, crawl, and extract content from any website. |
| **Server type** | Streamable HTTP |
| **Server URL** | `https://mcp.firecrawl.dev/YOUR_FIRECRAWL_API_KEY/v2/mcp` |
Replace `YOUR_FIRECRAWL_API_KEY` with your actual key. Leave the **Type** dropdown set to **Value**. Treat this URL as a secret — it contains your API key.
You must select **Streamable HTTP** as the server type. The default SSE option does not work with the Firecrawl MCP endpoint.
4. Under **Tool Approval Mode**, choose an approval level:
* **No Approval** — the agent uses tools freely. Fine for read-only scraping.
* **Fine-Grained Tool Approval** — lets you pre-select which tools can run automatically and which require approval. Good for controlling expensive crawl operations.
* **Always Ask** (default) — the agent requests permission before every tool call.
5. Check **I trust this server**, then click **Add Server**.
ElevenLabs will connect to the server and list the available tools (scrape, search, crawl, map, and more).
### Attach it to an agent
1. Create or open an agent in the [ElevenAgents dashboard](https://elevenlabs.io/app/agents/agents).
2. Go to the **Tools** tab, then select the **MCP** sub-tab.
3. Click **Add server** and select the **Firecrawl** integration from the dropdown.
### Update the system prompt
In the **Agent** tab, add instructions to the **System prompt** so the agent knows when to use Firecrawl. For example:
```text theme={null}
You are a helpful research assistant. When the user asks about a website,
a company, or any topic that requires up-to-date information, use the
Firecrawl tools to search the web or scrape the relevant page, then
summarize the results.
```
### Test it
Click **Preview** in the top navigation bar. You can test using the text chat input or by starting a voice call. Try a prompt like:
> "What does firecrawl.dev do? Go to the site and summarize it for me."
The agent will call the Firecrawl MCP `scrape` tool, receive the page markdown, and respond with a summary.
***
## Option 2: Server Webhook Tool
Use this approach when you need precise control over request parameters (formats, headers, timeouts, etc.) or want to call a specific Firecrawl endpoint without exposing the full MCP tool set.
### Scrape tool
Create a tool that scrapes a single URL and returns its content as markdown.
1. Open your agent and go to the **Tools** tab.
2. Click **Add tool** and select **Webhook**.
3. Configure the tool:
| Field | Value |
| --------------- | ---------------------------------------------------------- |
| **Name** | `scrape_website` |
| **Description** | Scrape content from a URL and return it as clean markdown. |
| **Method** | POST |
| **URL** | `https://api.firecrawl.dev/v2/scrape` |
The **Method** field defaults to GET — make sure to change it to **POST**.
4. Scroll to the **Headers** section and click **Add header** for authentication:
| Header | Value |
| --------------- | ------------------------------- |
| `Authorization` | `Bearer YOUR_FIRECRAWL_API_KEY` |
Alternatively, if you have workspace auth connections configured, you can use the **Authentication** dropdown instead.
5. Add a **body parameter**:
| Parameter | Type | Description | Required |
| --------- | ------ | ----------------- | -------- |
| `url` | string | The URL to scrape | Yes |
6. Click **Add tool**.
The Firecrawl API returns the page content as markdown by default. The agent receives the JSON response and can use the `markdown` field to answer questions.
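For reference, the request this webhook tool sends is equivalent to the following fetch call (a sketch; the REST response nests the document under `data`):
```typescript theme={null}
const res = await fetch('https://api.firecrawl.dev/v2/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer YOUR_FIRECRAWL_API_KEY',
  },
  body: JSON.stringify({ url: 'https://firecrawl.dev' }),
});

const json = await res.json();
console.log(json.data?.markdown?.slice(0, 200)); // markdown lives on data.markdown
```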
### Search tool
Create a tool that searches the web and returns results with scraped content.
1. Click **Add tool** → **Webhook** again and configure:
| Field | Value |
| --------------- | ------------------------------------------------------------------------- |
| **Name** | `search_web` |
| **Description** | Search the web for a query and return relevant results with page content. |
| **Method** | POST |
| **URL** | `https://api.firecrawl.dev/v2/search` |
2. Add the same `Authorization` header as above.
3. Add **body parameters**:
| Parameter | Type | Description | Required |
| --------- | ------ | ----------------------------------------------- | -------- |
| `query` | string | The search query | Yes |
| `limit` | number | Maximum number of results to return (default 5) | No |
4. Click **Add tool**.
### Update the system prompt
In the **Agent** tab, update the **System prompt**:
```text theme={null}
You are a knowledgeable assistant with access to web tools.
- Use `scrape_website` when the user gives you a specific URL to read.
- Use `search_web` when the user asks a general question that requires
finding information online.
Always summarize the information concisely and cite the source URL.
```
### Test it
Click **Preview** and try asking:
> "Search for the latest Next.js features and give me a summary."
The agent will call `search_web`, receive results from Firecrawl, and respond with a summary of the findings.
***
## Tips
* **Model selection** — For reliable tool calling, use a high-intelligence model such as GPT-4o, Claude Sonnet 4.5 or later, or Gemini 2.5 Flash. Smaller models may struggle to extract the correct parameters.
* **Keep prompts specific** — Tell the agent exactly when to use each tool. Vague instructions lead to missed or incorrect tool calls.
* **Limit response size** — For voice agents, long scraped pages can overwhelm the LLM context. Use `onlyMainContent: true` in scrape options (or instruct the agent to summarize aggressively) to keep responses concise.
* **Tool call sounds** — In the webhook or MCP tool settings, you can configure a **Tool call sound** to play ambient audio while a tool runs. This signals to the user that the agent is working.
## Resources
* [ElevenAgents documentation](https://elevenlabs.io/docs/eleven-agents/overview)
* [ElevenAgents tools overview](https://elevenlabs.io/docs/eleven-agents/customization/tools)
* [ElevenAgents MCP integration](https://elevenlabs.io/docs/eleven-agents/customization/tools/mcp)
* [Firecrawl API reference](https://docs.firecrawl.dev/api-reference/v2-introduction)
* [Firecrawl MCP server](https://docs.firecrawl.dev/mcp-server)
# Gemini
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/gemini
Use Firecrawl with Google's Gemini AI for web scraping + AI workflows
Integrate Firecrawl with Google's Gemini for AI applications powered by web data.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js @google/genai
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
GEMINI_API_KEY=your_gemini_key
```
> **Note:** If using Node < 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Summarize
This example demonstrates a simple workflow: scrape a website and summarize the content using Gemini.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { GoogleGenAI } from '@google/genai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Summarize: ${scrapeResult.markdown}`,
});
console.log('Summary:', response.text);
```
## Content Analysis
This example shows how to analyze website content using Gemini's multi-turn conversation capabilities.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { GoogleGenAI } from '@google/genai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://news.ycombinator.com/', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const chat = ai.chats.create({
model: 'gemini-2.5-flash'
});
// Ask for the top 3 stories on Hacker News
const result1 = await chat.sendMessage({
message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${scrapeResult.markdown}`
});
console.log('Top 3 Stories:', result1.text);
// Ask for the 4th and 5th stories on Hacker News
const result2 = await chat.sendMessage({
message: `Now, what are the 4th and 5th top stories on Hacker News from the same content?`
});
console.log('4th and 5th Stories:', result2.text);
```
## Structured Extraction
This example demonstrates how to extract structured data using Gemini's JSON mode from scraped website content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { GoogleGenAI, Type } from '@google/genai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Extract company information: ${scrapeResult.markdown}`,
config: {
responseMimeType: 'application/json',
responseSchema: {
type: Type.OBJECT,
properties: {
name: { type: Type.STRING },
industry: { type: Type.STRING },
description: { type: Type.STRING },
products: {
type: Type.ARRAY,
items: { type: Type.STRING }
}
},
propertyOrdering: ['name', 'industry', 'description', 'products']
}
}
});
console.log('Extracted company info:', response?.text);
```
For more examples, check the [Gemini documentation](https://ai.google.dev/docs).
# Agent Development Kit (ADK)
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/google-adk
Integrate Firecrawl with Google's ADK using MCP for advanced agent workflows
Integrate Firecrawl with Google's Agent Development Kit (ADK) to build powerful AI agents with web scraping capabilities through the Model Context Protocol (MCP).
## Overview
Firecrawl provides an MCP server that seamlessly integrates with Google's ADK, enabling your agents to efficiently scrape, crawl, and extract structured data from any website. The integration supports both cloud-based and self-hosted Firecrawl instances with streamable HTTP for optimal performance.
## Features
* Efficient web scraping, crawling, and content discovery from any website
* Advanced search capabilities and intelligent content extraction
* Deep research and high-volume batch scraping
* Flexible deployment (cloud-based or self-hosted)
* Optimized for modern web environments with streamable HTTP support
## Prerequisites
* Obtain an API key for Firecrawl from [firecrawl.dev](https://firecrawl.dev)
* Install Google ADK
## Setup
```python Remote MCP Server theme={null}
from google.adk.agents.llm_agent import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
FIRECRAWL_API_KEY = "YOUR-API-KEY"
root_agent = Agent(
model="gemini-2.5-pro",
name="firecrawl_agent",
description='A helpful assistant for scraping websites with Firecrawl',
instruction='Help the user search for website content',
tools=[
MCPToolset(
connection_params=StreamableHTTPServerParams(
url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp",
),
)
],
)
```
```python Local MCP Server theme={null}
from google.adk.agents.llm_agent import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from mcp import StdioServerParameters
root_agent = Agent(
model='gemini-2.5-pro',
name='firecrawl_agent',
description='A helpful assistant for scraping websites with Firecrawl',
instruction='Help the user search for website content',
tools=[
MCPToolset(
connection_params=StdioConnectionParams(
server_params=StdioServerParameters(
command='npx',
args=[
"-y",
"firecrawl-mcp",
],
env={
"FIRECRAWL_API_KEY": "YOUR-API-KEY",
}
),
timeout=30,
),
)
],
)
```
## Available Tools
| Tool | Name | Description |
| ------------------ | ------------------------------ | ------------------------------------------------------------------------------------ |
| Scrape Tool | `firecrawl_scrape` | Scrape content from a single URL with advanced options |
| Batch Scrape Tool | `firecrawl_batch_scrape` | Scrape multiple URLs efficiently with built-in rate limiting and parallel processing |
| Check Batch Status | `firecrawl_check_batch_status` | Check the status of a batch operation |
| Map Tool | `firecrawl_map` | Map a website to discover all indexed URLs on the site |
| Search Tool | `firecrawl_search` | Search the web and optionally extract content from search results |
| Crawl Tool | `firecrawl_crawl` | Start an asynchronous crawl with advanced options |
| Check Crawl Status | `firecrawl_check_crawl_status` | Check the status of a crawl job |
| Extract Tool | `firecrawl_extract` | Extract structured information from web pages using LLM capabilities |
## Configuration
### Required Configuration
**`FIRECRAWL_API_KEY`**: Your Firecrawl API key

* Required when using the cloud API (default)
* Optional when using a self-hosted instance with `FIRECRAWL_API_URL`
### Optional Configuration
**Firecrawl API URL (for self-hosted instances)**:
* `FIRECRAWL_API_URL`: Custom API endpoint
* Example: `https://firecrawl.your-domain.com`
* If not provided, the cloud API will be used
**Retry configuration**:
* `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum retry attempts (default: 3)
* `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds (default: 1000)
* `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds (default: 10000)
* `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2)
**Credit usage monitoring**:
* `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Warning threshold (default: 1000)
* `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Critical threshold (default: 100)
## Example: Web Research Agent
```python theme={null}
from google.adk.agents.llm_agent import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
FIRECRAWL_API_KEY = "YOUR-API-KEY"
# Create a research agent
research_agent = Agent(
model="gemini-2.5-pro",
name="research_agent",
description='An AI agent that researches topics by scraping and analyzing web content',
instruction='''You are a research assistant. When given a topic or question:
1. Use the search tool to find relevant websites
2. Scrape the most relevant pages for detailed information
3. Extract structured data when needed
4. Provide comprehensive, well-sourced answers''',
tools=[
MCPToolset(
connection_params=StreamableHTTPServerParams(
url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp",
),
)
],
)
# Use the agent
response = research_agent.run("What are the latest features in Python 3.13?")
print(response)
```
## Best Practices
1. **Use the right tool for the job**:
* `firecrawl_search` when you need to find relevant pages first
* `firecrawl_scrape` for single pages
* `firecrawl_batch_scrape` for multiple known URLs
* `firecrawl_crawl` for discovering and scraping entire sites
2. **Monitor your usage**: Configure credit thresholds to avoid unexpected usage
3. **Handle errors gracefully**: Configure retry settings based on your use case
4. **Optimize performance**: Use batch operations when scraping multiple URLs
***
## Related Resources
* Learn how to build powerful multi-agent AI systems using Google's ADK framework with Firecrawl for web scraping capabilities.
* Learn more about Firecrawl's Model Context Protocol (MCP) server integration and capabilities.
* Explore the official Google Agent Development Kit documentation for comprehensive guides and API references.
# LangChain
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langchain
Use Firecrawl with LangChain for web scraping + AI workflows
Integrate Firecrawl with LangChain to build AI applications powered by web data.
## Setup
```bash theme={null}
npm install @langchain/openai @mendable/firecrawl-js
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Chat
This example demonstrates a simple workflow: scrape a website and process the content using LangChain.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage } from '@langchain/core/messages';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const chat = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
});
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const response = await chat.invoke([
new HumanMessage(`Summarize: ${scrapeResult.markdown}`)
]);
console.log('Summary:', response.content);
```
## Chains
This example shows how to build a LangChain chain to process and analyze scraped content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const model = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
});
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
// Create processing chain
const prompt = ChatPromptTemplate.fromMessages([
['system', 'You are an expert at analyzing company websites.'],
['user', 'Extract the company name and main products from: {content}']
]);
const chain = prompt.pipe(model).pipe(new StringOutputParser());
// Execute the chain
const result = await chain.invoke({
content: scrapeResult.markdown
});
console.log('Chain result:', result);
```
## Tool Calling
This example demonstrates how to use LangChain's tool calling feature to let the model decide when to scrape websites.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { DynamicStructuredTool } from '@langchain/core/tools';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Create the scraping tool
const scrapeWebsiteTool = new DynamicStructuredTool({
name: 'scrape_website',
description: 'Scrape content from any website URL',
schema: z.object({
url: z.string().url().describe('The URL to scrape')
}),
func: async ({ url }) => {
console.log('Scraping:', url);
const result = await firecrawl.scrape(url, {
formats: ['markdown']
});
console.log('Scraped content preview:', result.markdown?.substring(0, 200) + '...');
return result.markdown || 'No content scraped';
}
});
const model = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
}).bindTools([scrapeWebsiteTool]);
const response = await model.invoke('What is Firecrawl? Visit firecrawl.dev and tell me about it.');
console.log('Response:', response.content);
console.log('Tool calls:', response.tool_calls);
```
## Structured Data Extraction
This example shows how to extract structured data using LangChain's structured output feature.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const CompanyInfoSchema = z.object({
name: z.string(),
industry: z.string(),
description: z.string(),
products: z.array(z.string())
});
const model = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
}).withStructuredOutput(CompanyInfoSchema);
const companyInfo = await model.invoke([
{
role: 'system',
content: 'Extract company information from website content.'
},
{
role: 'user',
content: `Extract data: ${scrapeResult.markdown}`
}
]);
console.log('Extracted company info:', companyInfo);
```
For more examples, check the [LangChain documentation](https://js.langchain.com/docs).
# LangGraph
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langgraph
Integrate Firecrawl with LangGraph for building agent workflows
This guide shows how to integrate Firecrawl with LangGraph to build AI agent workflows that can scrape and process web content.
## Setup
```bash theme={null}
npm install @langchain/langgraph @langchain/openai @mendable/firecrawl-js
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Basic Workflow
This example demonstrates a basic LangGraph workflow that scrapes a website and analyzes the content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { StateGraph, MessagesAnnotation, START, END } from '@langchain/langgraph';
// Initialize Firecrawl
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Initialize LLM
const llm = new ChatOpenAI({
model: "gpt-5-nano",
apiKey: process.env.OPENAI_API_KEY
});
// Define the scrape node
async function scrapeNode(state: typeof MessagesAnnotation.State) {
console.log('Scraping...');
const result = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] });
return {
messages: [{
role: "system",
content: `Scraped content: ${result.markdown}`
}]
};
}
// Define the analyze node
async function analyzeNode(state: typeof MessagesAnnotation.State) {
console.log('Analyzing...');
const response = await llm.invoke(state.messages);
return { messages: [response] };
}
// Build the graph
const graph = new StateGraph(MessagesAnnotation)
.addNode("scrape", scrapeNode)
.addNode("analyze", analyzeNode)
.addEdge(START, "scrape")
.addEdge("scrape", "analyze")
.addEdge("analyze", END);
// Compile the graph
const app = graph.compile();
// Run the workflow
const result = await app.invoke({
messages: [{ role: "user", content: "Summarize the website" }]
});
console.log(JSON.stringify(result, null, 2));
```
## Multi-Step Workflow
This example demonstrates a more complex workflow that scrapes multiple URLs and processes them.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { StateGraph, Annotation, START, END } from '@langchain/langgraph';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const llm = new ChatOpenAI({ model: "gpt-5-nano", apiKey: process.env.OPENAI_API_KEY });
// Define custom state
const WorkflowState = Annotation.Root({
urls: Annotation<string[]>(),
scrapedData: Annotation<Array<{ url: string; content: string }>>(),
summary: Annotation<string>()
});
// Scrape multiple URLs
async function scrapeMultiple(state: typeof WorkflowState.State) {
const scrapedData = [];
for (const url of state.urls) {
const result = await firecrawl.scrape(url, { formats: ['markdown'] });
scrapedData.push({ url, content: result.markdown || '' });
}
return { scrapedData };
}
// Summarize all scraped content
async function summarizeAll(state: typeof WorkflowState.State) {
const combinedContent = state.scrapedData
.map(item => `Content from ${item.url}:\n${item.content}`)
.join('\n\n');
const response = await llm.invoke([
{ role: "user", content: `Summarize these websites:\n${combinedContent}` }
]);
return { summary: response.content as string };
}
// Build the workflow graph
const workflow = new StateGraph(WorkflowState)
.addNode("scrape", scrapeMultiple)
.addNode("summarize", summarizeAll)
.addEdge(START, "scrape")
.addEdge("scrape", "summarize")
.addEdge("summarize", END);
const app = workflow.compile();
// Execute workflow
const result = await app.invoke({
urls: ["https://firecrawl.dev", "https://firecrawl.dev/pricing"]
});
console.log(result.summary);
```
For more examples, check the [LangGraph documentation](https://langchain-ai.github.io/langgraphjs/).
# LlamaIndex
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/llamaindex
Use Firecrawl with LlamaIndex for RAG applications
Integrate Firecrawl with LlamaIndex to build AI applications with vector search and embeddings powered by web content.
## Setup
```bash theme={null}
npm install llamaindex @llamaindex/openai @mendable/firecrawl-js
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## RAG with Vector Search
This example demonstrates how to use LlamaIndex with Firecrawl to crawl a website, create embeddings, and query the content using RAG.
```typescript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
import { Document, VectorStoreIndex, Settings } from 'llamaindex';
import { OpenAI, OpenAIEmbedding } from '@llamaindex/openai';
Settings.llm = new OpenAI({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
const crawlResult = await firecrawl.crawl('https://firecrawl.dev', {
limit: 10,
scrapeOptions: { formats: ['markdown'] }
});
console.log(`Crawled ${crawlResult.data.length} pages`);
const documents = crawlResult.data.map((page: any, i: number) =>
new Document({
text: page.markdown,
id_: `page-${i}`,
metadata: { url: page.metadata?.sourceURL }
})
);
const index = await VectorStoreIndex.fromDocuments(documents);
console.log('Vector index created with embeddings');
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: 'What is Firecrawl and how does it work?' });
console.log('\nAnswer:', response.toString());
```
For more examples, check the [LlamaIndex documentation](https://ts.llamaindex.ai/).
# Mastra
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/mastra
Use Firecrawl with Mastra for building AI workflows
Integrate Firecrawl with Mastra, the TypeScript framework for building AI agents and workflows.
## Setup
```bash theme={null}
npm install @mastra/core @mendable/firecrawl-js zod
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Multi-Step Workflow
This example demonstrates a complete workflow that searches, scrapes, and summarizes documentation using Firecrawl and Mastra.
```typescript theme={null}
import { createWorkflow, createStep } from "@mastra/core/workflows";
import { z } from "zod";
import Firecrawl from "@mendable/firecrawl-js";
import { Agent } from "@mastra/core/agent";
const firecrawl = new Firecrawl({
apiKey: process.env.FIRECRAWL_API_KEY || "fc-YOUR_API_KEY"
});
const agent = new Agent({
name: "summarizer",
instructions: "You are a helpful assistant that creates concise summaries of documentation.",
model: "openai/gpt-5-nano",
});
// Step 1: Search with Firecrawl SDK
const searchStep = createStep({
id: "search",
inputSchema: z.object({
query: z.string(),
}),
outputSchema: z.object({
url: z.string(),
title: z.string(),
}),
execute: async ({ inputData }: { inputData: { query: string } }) => {
console.log(`Searching: ${inputData.query}`);
const searchResults = await firecrawl.search(inputData.query, { limit: 1 });
const webResults = (searchResults as any)?.web;
if (!webResults || !Array.isArray(webResults) || webResults.length === 0) {
throw new Error("No search results found");
}
const firstResult = webResults[0];
console.log(`Found: ${firstResult.title}`);
return {
url: firstResult.url,
title: firstResult.title,
};
},
});
// Step 2: Scrape the URL with Firecrawl SDK
const scrapeStep = createStep({
id: "scrape",
inputSchema: z.object({
url: z.string(),
title: z.string(),
}),
outputSchema: z.object({
markdown: z.string(),
title: z.string(),
}),
execute: async ({ inputData }: { inputData: { url: string; title: string } }) => {
console.log(`Scraping: ${inputData.url}`);
const scrapeResult = await firecrawl.scrape(inputData.url, {
formats: ["markdown"],
});
console.log(`Scraped: ${scrapeResult.markdown?.length || 0} characters`);
return {
markdown: scrapeResult.markdown || "",
title: inputData.title,
};
},
});
// Step 3: Summarize with the agent
const summarizeStep = createStep({
id: "summarize",
inputSchema: z.object({
markdown: z.string(),
title: z.string(),
}),
outputSchema: z.object({
summary: z.string(),
}),
execute: async ({ inputData }: { inputData: { markdown: string; title: string } }) => {
console.log(`Summarizing: ${inputData.title}`);
const prompt = `Summarize the following documentation in 2-3 sentences:\n\nTitle: ${inputData.title}\n\n${inputData.markdown}`;
const result = await agent.generate(prompt);
console.log(`Summary generated`);
return { summary: result.text };
},
});
// Create workflow
export const workflow = createWorkflow({
id: "firecrawl-workflow",
inputSchema: z.object({
query: z.string(),
}),
outputSchema: z.object({
summary: z.string(),
}),
steps: [searchStep, scrapeStep, summarizeStep],
})
.then(searchStep)
.then(scrapeStep)
.then(summarizeStep)
.commit();
async function testWorkflow() {
const run = await workflow.createRunAsync();
const result = await run.start({
inputData: { query: "Firecrawl documentation" }
});
if (result.status === "success") {
const { summarize } = result.steps;
if (summarize.status === "success") {
console.log(`\n${summarize.output.summary}`);
}
} else {
console.error("Workflow failed:", result.status);
}
}
testWorkflow().catch(console.error);
```
For more examples, check the [Mastra documentation](https://mastra.ai/docs).
# OpenAI
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/openai
Use Firecrawl with OpenAI for web scraping + AI workflows
Integrate Firecrawl with OpenAI to build AI applications powered by web data.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js openai zod
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Summarize
This example demonstrates a simple workflow: scrape a website and summarize the content using an OpenAI model.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Scrape the website content
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
// Summarize with OpenAI model
const completion = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [
{ role: 'user', content: `Summarize: ${scrapeResult.markdown}` }
]
});
console.log('Summary:', completion.choices[0]?.message.content);
```
## Function Calling
This example shows how to use OpenAI's function calling feature to let the model decide when to scrape websites based on user requests.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ScrapeArgsSchema = z.object({
url: z.string().describe('The URL of the website to scrape')
});
const tools = [{
type: 'function' as const,
function: {
name: 'scrape_website',
description: 'Scrape content from any website URL',
parameters: z.toJSONSchema(ScrapeArgsSchema)
}
}];
const response = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [{
role: 'user',
content: 'What is Firecrawl? Visit firecrawl.dev and tell me about it.'
}],
tools
});
const message = response.choices[0]?.message;
if (message?.tool_calls && message.tool_calls.length > 0) {
for (const toolCall of message.tool_calls) {
if (toolCall.type === 'function') {
console.log('Tool called:', toolCall.function.name);
const args = ScrapeArgsSchema.parse(JSON.parse(toolCall.function.arguments));
const result = await firecrawl.scrape(args.url, {
formats: ['markdown'] // Other formats: html, links, etc.
});
console.log('Scraped content:', result.markdown?.substring(0, 200) + '...');
// Send the scraped content back to the model for final response
const finalResponse = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [
{
role: 'user',
content: 'What is Firecrawl? Visit firecrawl.dev and tell me about it.'
},
message,
{
role: 'tool',
tool_call_id: toolCall.id,
content: result.markdown || 'No content scraped'
}
],
tools
});
console.log('Final response:', finalResponse.choices[0]?.message?.content);
}
}
} else {
console.log('Direct response:', message?.content);
}
```
## Structured Data Extraction
This example demonstrates how to use OpenAI models with structured outputs to extract specific data from scraped content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const CompanyInfoSchema = z.object({
name: z.string(),
industry: z.string(),
description: z.string(),
products: z.array(z.string())
});
const response = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [
{
role: 'system',
content: 'Extract company information from website content.'
},
{
role: 'user',
content: `Extract data: ${scrapeResult.markdown}`
}
],
response_format: {
type: 'json_schema',
json_schema: {
name: 'company_info',
schema: z.toJSONSchema(CompanyInfoSchema),
strict: true
}
}
});
const content = response.choices[0]?.message?.content;
const companyInfo = content ? CompanyInfoSchema.parse(JSON.parse(content)) : null;
console.log('Validated company info:', companyInfo);
```
## Search + Analyze
This example combines Firecrawl's search capabilities with OpenAI model analysis to find and summarize information from multiple sources.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Search for relevant information
const searchResult = await firecrawl.search('Next.js 16 new features', {
limit: 3,
sources: [{ type: 'web' }], // Other sources: { type: 'news' }, { type: 'images' }
scrapeOptions: { formats: ['markdown'] }
});
console.log('Search results:', searchResult.web?.length, 'pages found');
// Analyze and summarize the key features
const analysis = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [{
role: 'user',
content: `Summarize the key features: ${JSON.stringify(searchResult)}`
}]
});
console.log('Analysis:', analysis.choices[0]?.message?.content);
```
## Responses API with MCP
This example shows how to use OpenAI's Responses API with Firecrawl configured as an MCP (Model Context Protocol) server.
```typescript theme={null}
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.responses.create({
model: 'gpt-5-nano',
tools: [
{
type: 'mcp',
server_label: 'firecrawl',
server_description: 'A web search and scraping MCP server to scrape and extract content from websites.',
server_url: `https://mcp.firecrawl.dev/${process.env.FIRECRAWL_API_KEY}/v2/mcp`,
require_approval: 'never'
}
],
input: 'Find out what the top stories on Hacker News are and the latest blog post on OpenAI and summarize them in a bullet point format'
});
console.log('Response:', JSON.stringify(response.output, null, 2));
```
For more examples, check the [OpenAI documentation](https://platform.openai.com/docs).
# Vercel AI SDK
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/vercel-ai-sdk
Firecrawl tools for Vercel AI SDK. Web scraping, search, interact, and crawling for AI applications.
Firecrawl tools for the Vercel AI SDK. Search, scrape, interact with pages, and crawl the web in your AI applications.
## Install
```bash theme={null}
npm install firecrawl-aisdk ai
```
Set environment variables:
```bash theme={null}
FIRECRAWL_API_KEY=fc-your-key # https://firecrawl.dev
AI_GATEWAY_API_KEY=your-key # https://vercel.com/ai-gateway
```
These examples use the [Vercel AI Gateway](https://vercel.com/ai-gateway) string model format, but Firecrawl tools work with any AI SDK provider. You can also use provider imports like `anthropic('claude-sonnet-4-5-20250514')` from `@ai-sdk/anthropic`.
## Quick Start
`FirecrawlTools()` bundles `search`, `scrape`, and `interact` by default.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { FirecrawlTools } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
tools: FirecrawlTools(),
stopWhen: stepCountIs(30),
prompt: `
1. Use interact on Hacker News to identify the top story
2. Search for other perspectives on the same topic
3. Scrape the most relevant pages you found
4. Summarize everything you found
`,
});
```
## FirecrawlTools
`FirecrawlTools()` gives you the default tools plus an auto-generated `systemPrompt` you can pass to `generateText`.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { FirecrawlTools } from 'firecrawl-aisdk';
const tools = FirecrawlTools();
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
system: `${tools.systemPrompt}\n\nAnswer with citations when possible.`,
tools,
stopWhen: stepCountIs(20),
prompt: 'Find the current Firecrawl pricing page and explain the available plans.',
});
```
You can customize defaults, opt into async tools, or disable individual tools:
```typescript theme={null}
const tools = FirecrawlTools({
search: { limit: 5 },
scrape: { formats: ['markdown'], onlyMainContent: true },
interact: { profile: { name: 'my-session', saveChanges: true } },
crawl: true,
agent: true,
});
```
```typescript theme={null}
// Disable interact, keep search + scrape
FirecrawlTools({ interact: false });
// Opt into deprecated browser compatibility
FirecrawlTools({ browser: {} });
// Include every available tool
FirecrawlTools({ all: true });
```
When scraping to answer a question about a page, prefer query format:
```typescript theme={null}
formats: [{ type: 'query', prompt: 'What does this page say about pricing and rate limits?' }]
```
Use `formats: ['markdown']` only when you need the full page content.
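For example, you can make query format the default for every scrape the model makes (a sketch; it assumes the `scrape` options accept the same object formats shown above):
```typescript theme={null}
// Sketch: default all scrape tool calls to query format (prompt text is illustrative)
const tools = FirecrawlTools({
scrape: {
formats: [{ type: 'query', prompt: 'Answer the current question using this page.' }]
}
});
```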
## Individual Tools
Every tool can be used directly or called with options:
```typescript theme={null}
import { generateText } from 'ai';
import { scrape, search } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Search for Firecrawl, then scrape the most relevant result.',
tools: { search, scrape },
});
const customScrape = scrape({ apiKey: 'fc-custom-key', apiUrl: 'https://api.firecrawl.dev' });
```
### Search + Scrape
```typescript theme={null}
import { generateText } from 'ai';
import { search, scrape } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Search for Firecrawl, scrape the top official result, and explain what it does.',
tools: { search, scrape },
});
```
### Map
```typescript theme={null}
import { generateText } from 'ai';
import { map } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Map https://docs.firecrawl.dev and list the main sections.',
tools: { map },
});
```
### Stream
```typescript theme={null}
import { streamText, stepCountIs } from 'ai';
import { scrape } from 'firecrawl-aisdk';
const result = streamText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'What are the first 100 words of firecrawl.dev?',
tools: { scrape },
stopWhen: stepCountIs(3),
});
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}
await result.fullStream;
```
## Interact
`interact()` creates a scrape-backed interactive session. Call `start(url)` to bootstrap a session and get a live view URL, then let the model reuse that session through the `interact` tool.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { interact, search } from 'firecrawl-aisdk';
const interactTool = interact();
console.log('Live view:', await interactTool.start('https://news.ycombinator.com'));
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
tools: { interact: interactTool, search },
stopWhen: stepCountIs(25),
prompt: 'Use interact on the current Hacker News session, find the top story, then search for more context.',
});
await interactTool.close();
```
If you need the explicit live view URL after startup, use `interactTool.interactiveLiveViewUrl`.
Reuse browser state across sessions with profiles:
```typescript theme={null}
const interactTool = interact({
profile: { name: 'my-session', saveChanges: true },
});
```
`browser()` is deprecated. Prefer `interact()`.
## Async Tools
Crawl, batch scrape, and agent return a job ID. Pair them with `poll`.
### Crawl
```typescript theme={null}
import { generateText } from 'ai';
import { crawl, poll } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Crawl https://docs.firecrawl.dev (limit 3 pages) and summarize.',
tools: { crawl, poll },
});
```
### Batch Scrape
```typescript theme={null}
import { generateText } from 'ai';
import { batchScrape, poll } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Scrape https://firecrawl.dev and https://docs.firecrawl.dev, then compare them.',
tools: { batchScrape, poll },
});
```
### Agent
Autonomous web data gathering that searches, navigates, and extracts on its own.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { agent, poll } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Find the founders of Firecrawl, their roles, and their backgrounds.',
tools: { agent, poll },
stopWhen: stepCountIs(10),
});
```
## Logging
```typescript theme={null}
import { generateText } from 'ai';
import { logStep, scrape, stepLogger } from 'firecrawl-aisdk';
const logger = stepLogger();
const { text, usage } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Scrape https://firecrawl.dev and summarize it.',
tools: { scrape },
onStepFinish: logger.onStep,
experimental_onToolCallFinish: logger.onToolCallFinish,
});
logger.close();
logger.summary(usage);
await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Scrape https://firecrawl.dev and summarize it again.',
tools: { scrape },
onStepFinish: logStep,
});
```
## All Exports
```typescript theme={null}
import {
// Core tools
search, // Search the web
scrape, // Scrape a single URL
map, // Discover URLs on a site
crawl, // Crawl multiple pages (async, use poll)
batchScrape, // Scrape multiple URLs (async, use poll)
agent, // Autonomous web research (async, use poll)
// Job management
poll, // Poll async jobs for results
status, // Check job status
cancel, // Cancel running jobs
// Browser/session tools
interact, // interact({ profile: { name: '...' } })
browser, // deprecated compatibility export
// All-in-one bundle
FirecrawlTools, // FirecrawlTools({ search, scrape, interact, crawl, agent })
// Helpers
stepLogger, // Token stats per tool call
logStep, // Simple one-liner logging
} from 'firecrawl-aisdk';
```
# MCP Web Search & Scrape in ChatGPT
Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/chatgpt
Add web scraping and search to ChatGPT in 2 minutes
MCP support in ChatGPT is currently a beta feature. The interface and availability may change as OpenAI continues development.
**Availability:** Developer mode with MCP connectors is not available on the Free tier and requires a paid ChatGPT subscription. Availability and features vary by plan and region. See [OpenAI's documentation on Developer Mode](https://help.openai.com/en/articles/12584461-developer-mode-apps-and-full-mcp-connectors-in-chatgpt-beta) for current availability and setup instructions.
Add web scraping and search capabilities to ChatGPT with Firecrawl MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) and copy your API key.
### 2. Enable Developer Mode
Open ChatGPT settings by clicking your username in the bottom left corner, or navigate directly to [chatgpt.com/#settings](https://chatgpt.com/#settings).
In the settings modal, scroll to the bottom and select **Advanced Settings**. Toggle **Developer mode** to ON.
### 3. Create the Connector
With Developer mode enabled, go to the **Apps & Connectors** tab in the same settings modal.
Click the **Create** button in the top right corner.
Fill in the connector details:
* **Name:** `Firecrawl MCP`
* **Description:** `Web scraping, crawling, search, and content extraction` (optional)
* **MCP Server URL:** `https://mcp.firecrawl.dev/YOUR_API_KEY_HERE/v2/mcp`
* **Authentication:** `None`
Replace `YOUR_API_KEY_HERE` in the URL with your actual [Firecrawl API key](https://www.firecrawl.dev/app/api-keys).
Check the **"I understand and want to continue"** checkbox, then click **Create**.
### 4. Verify Setup
Go back to the main ChatGPT interface. You should see **Developer mode** displayed, indicating that MCP connectors are active.
If you do not see Developer mode, reload the page. If it still does not appear, open settings again and verify that Developer mode is toggled ON under Advanced Settings.
### 5. Access Firecrawl Tools
To use Firecrawl in a conversation, click the **+** button in the chat input, then select **More** and choose **Firecrawl MCP**.
## Quick Demo
With Firecrawl MCP selected, try these prompts:
**Search:**
```
Search for the latest React Server Components updates
```
**Scrape:**
```
Scrape firecrawl.dev and tell me what it does
```
**Get docs:**
```
Scrape the Vercel documentation for edge functions and summarize it
```
## Tool Confirmation
When ChatGPT uses the Firecrawl MCP tools, you may see a confirmation prompt asking for your approval. Some ChatGPT Desktop versions auto-approve tool calls without showing this dialog. If no prompt appears, you can verify the tool was invoked by checking for a "Called tool" section in the response or reviewing your usage at [firecrawl.dev/app/usage](https://www.firecrawl.dev/app/usage).
**No confirmation prompt appearing?** If ChatGPT answers your question without showing a confirmation dialog, the Firecrawl MCP connector is most likely not attached to the conversation. Go back to [Step 5](#5-access-firecrawl-tools) and make sure you click the **+** button, select **More**, and choose **Firecrawl MCP** before sending your prompt. The connector must be attached to each new conversation.
You can check **"Remember for this conversation"** to avoid repeated confirmations during the same chat session. This security measure is implemented by OpenAI to ensure MCP tools do not perform unintended actions.
Once confirmed, ChatGPT will execute the request and return the results.
# MCP Web Search & Scrape in Claude.ai
Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/claude-ai
Add web scraping and search to Claude.ai (Co-work) in 2 minutes
Add web scraping and search capabilities to Claude.ai with Firecrawl MCP using custom connectors.
Looking for Claude Code setup? See the [Claude Code guide](/quickstarts/claude-code) instead.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys) and copy your API key.
### 2. Add Custom Connector
Go to [Settings > Connectors](https://claude.ai/settings/connectors) in Claude.ai and click **Add custom connector**.
Fill in the connector details:
* **URL:** `https://mcp.firecrawl.dev/YOUR_API_KEY/v2/mcp`
* **OAuth Client ID:** Leave blank
* **OAuth Client Secret:** Leave blank
Replace `YOUR_API_KEY` in the URL with your actual [Firecrawl API key](https://www.firecrawl.dev/app/api-keys). Your API key is embedded directly in the URL, so no additional authentication fields are needed.
Click **Add** to save the connector.
### 3. Enable in Conversation
In any Claude.ai conversation, click the **+** button at the bottom left, go to **Connectors**, and enable the Firecrawl connector.
## Quick Demo
With the Firecrawl connector enabled, try these prompts:
**Search the web:**
```
Search for the latest Next.js 15 features
```
**Scrape a page:**
```
Scrape firecrawl.dev and tell me what it does
```
**Get documentation:**
```
Find and scrape the Stripe API docs for payment intents
```
Claude will automatically use Firecrawl's search and scrape tools to get the information.
# MCP Web Search & Scrape in Factory AI
Source: https://docs.firecrawl.dev/developer-guides/mcp-setup-guides/factory-ai
Add web scraping and search to Factory AI in 2 minutes
Add web scraping and search capabilities to Factory AI with Firecrawl MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key.
### 2. Install Factory AI CLI
Install the [Factory AI CLI](https://docs.factory.ai/cli/getting-started/quickstart) if you haven't already:
**macOS/Linux:**
```bash theme={null}
curl -fsSL https://app.factory.ai/cli | sh
```
**Windows:**
```powershell theme={null}
iwr https://app.factory.ai/cli/install.ps1 -useb | iex
```
### 3. Add Firecrawl MCP Server
In the Factory droid CLI, add Firecrawl using the `/mcp add` command:
```bash theme={null}
/mcp add firecrawl "npx -y firecrawl-mcp" -e FIRECRAWL_API_KEY=your-api-key-here
```
Replace `your-api-key-here` with your actual Firecrawl API key.
### 4. Done!
The Firecrawl tools are now available in your Factory AI session!
## Quick Demo
Try these in Factory AI:
**Search the web:**
```
Search for the latest Next.js 15 features
```
**Scrape a page:**
```
Scrape firecrawl.dev and tell me what it does
```
**Get documentation:**
```
Find and scrape the Stripe API docs for payment intents
```
Factory will automatically use Firecrawl's search and scrape tools to get the information.
# Choosing the Data Extractor
Source: https://docs.firecrawl.dev/developer-guides/usage-guides/choosing-the-data-extractor
Compare /agent, /extract, and /scrape (JSON mode) to pick the right tool for structured data extraction
Firecrawl offers three approaches for extracting structured data from web pages. Each serves different use cases with varying levels of automation and control.
## Quick Comparison
| Feature | `/agent` | `/extract` | `/scrape` (JSON mode) |
| ------------------- | -------------------------------------- | ------------------------------------------ | ---------------------------- |
| **Status** | Active | Deprecated (use `/agent`) | Active |
| **URL Required** | No (optional) | Yes (wildcards supported) | Yes (single URL) |
| **Scope** | Web-wide discovery | Multiple pages/domains | Single page |
| **URL Discovery** | Autonomous web search | Crawls from given URLs | None |
| **Processing** | Asynchronous | Asynchronous | Synchronous |
| **Schema Required** | No (prompt or schema) | No (prompt or schema) | No (prompt or schema) |
| **Pricing** | Dynamic (5 free runs/day) | Token-based (1 credit = 15 tokens) | 1 credit/page |
| **Best For** | Research, discovery, complex gathering | Multi-page extraction (when you know URLs) | Known single-page extraction |
## 1. `/agent` Endpoint
The `/agent` endpoint is Firecrawl's most advanced offering—the successor to `/extract`. It uses AI agents to autonomously search, navigate, and gather data from across the web.
### Key Characteristics
* **URLs Optional**: Just describe what you need via `prompt`; URLs are completely optional
* **Autonomous Navigation**: The agent searches and navigates deep into sites to find your data
* **Deep Web Search**: Autonomously discovers information across multiple domains and pages
* **Parallel Processing**: Processes multiple sources simultaneously for faster results
* **Models Available**: `spark-1-mini` (default, 60% cheaper) and `spark-1-pro` (higher accuracy)
### Example
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional
app = Firecrawl(api_key="fc-YOUR_API_KEY")
class Founder(BaseModel):
name: str = Field(description="Full name of the founder")
role: Optional[str] = Field(None, description="Role or position")
background: Optional[str] = Field(None, description="Professional background")
class FoundersSchema(BaseModel):
founders: List[Founder] = Field(description="List of founders")
result = app.agent(
prompt="Find the founders of Firecrawl",
schema=FoundersSchema,
model="spark-1-mini",
max_credits=100
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
import { z } from 'zod';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
const result = await firecrawl.agent({
prompt: "Find the founders of Firecrawl",
schema: z.object({
founders: z.array(z.object({
name: z.string().describe("Full name of the founder"),
role: z.string().describe("Role or position").optional(),
background: z.string().describe("Professional background").optional()
})).describe("List of founders")
}),
model: "spark-1-mini",
maxCredits: 100
});
console.log(result.data);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Find the founders of Firecrawl",
"model": "spark-1-mini",
"maxCredits": 100,
"schema": {
"type": "object",
"properties": {
"founders": {
"type": "array",
"description": "List of founders",
"items": {
"type": "object",
"properties": {
"name": { "type": "string", "description": "Full name" },
"role": { "type": "string", "description": "Role or position" },
"background": { "type": "string", "description": "Professional background" }
},
"required": ["name"]
}
}
},
"required": ["founders"]
}
}'
```
### Best Use Case: Autonomous Research & Discovery
**Scenario**: You need to find information about AI startups that raised Series A funding, including their founders and funding amounts.
**Why `/agent`**: You don't know which websites contain this information. The agent will autonomously search the web, navigate to relevant sources (Crunchbase, news sites, company pages), and compile the structured data for you.
For more details, see the [Agent documentation](/features/agent).
***
## 2. `/extract` Endpoint
**Use `/agent` instead**: We recommend migrating to [`/agent`](/features/agent)—it's faster, more reliable, doesn't require URLs, and handles all `/extract` use cases plus more.
The `/extract` endpoint collects structured data from specified URLs or entire domains using LLM-powered extraction.
### Key Characteristics
* **URLs Typically Required**: Provide at least one URL (supports wildcards like `example.com/*`)
* **Domain Crawling**: Can crawl and parse all URLs discovered in a domain
* **Web Search Enhancement**: Optional `enableWebSearch` to follow links outside specified domains
* **Schema Optional**: Supports strict JSON schema OR natural language prompts
* **Async Processing**: Returns job ID for status checking
### The URL Limitation
The fundamental challenge with `/extract` is that you typically need to know URLs upfront:
1. **Discovery gap**: For tasks like "find YC W24 companies," you don't know which URLs contain the data. You'd need a separate search step before calling `/extract`.
2. **Awkward web search**: While `enableWebSearch` exists, it's constrained to start from URLs you provide—an awkward workflow for discovery tasks.
3. **Why `/agent` was created**: `/extract` is good at extracting from known locations, but less effective at discovering where data lives.
### Example
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
schema = {
"type": "object",
"properties": {"description": {"type": "string"}},
"required": ["description"],
}
res = firecrawl.extract(
urls=["https://docs.firecrawl.dev"],
prompt="Extract the page description",
schema=schema,
)
print(res.data["description"])
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const schema = {
type: 'object',
properties: {
title: { type: 'string' }
},
required: ['title']
};
const res = await firecrawl.extract({
urls: ['https://docs.firecrawl.dev'],
prompt: 'Extract the page title',
schema,
scrapeOptions: { formats: [{ type: 'json', prompt: 'Extract', schema }] }
});
console.log(res.status || res.success, res.data);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/extract" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://docs.firecrawl.dev"],
"prompt": "Extract the page title",
"schema": {
"type": "object",
"properties": {"title": {"type": "string"}},
"required": ["title"]
},
"scrapeOptions": {
"formats": [{"type": "json", "prompt": "Extract", "schema": {"type": "object"}}]
}
}'
```
### Best Use Case: Targeted Multi-Page Extraction
**Scenario**: You have your competitor's documentation URL and want to extract all their API endpoints from `docs.competitor.com/*`.
**Why `/extract` worked here**: You knew the exact domain. But even then, `/agent` with URLs provided would typically give better results today.
For more details, see the [Extract documentation](/features/extract).
***
## 3. `/scrape` Endpoint with JSON Mode
The `/scrape` endpoint with JSON mode is the most controlled approach—it extracts structured data from a single known URL using an LLM to parse the page content into your specified schema.
### Key Characteristics
* **Single URL Only**: Designed for extracting data from one specific page at a time
* **Exact URL Required**: You must know the precise URL containing the data
* **Schema Optional**: Can use JSON schema OR just a prompt (LLM chooses structure)
* **Synchronous**: Returns data immediately (no job polling needed)
* **Additional Formats**: Can combine JSON extraction with markdown, HTML, screenshots in one request
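On that last point: a single request can return structured JSON alongside the page markdown. A minimal Node sketch (the schema is illustrative; field names follow the v2 scrape response):
```js theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const app = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// One request, two outputs: extracted JSON plus the full page markdown
const result = await app.scrape("https://firecrawl.dev", {
formats: [
{ type: "json", schema: { type: "object", properties: { company_mission: { type: "string" } }, required: ["company_mission"] } },
"markdown"
]
});
console.log(result.json); // structured extraction
console.log(result.markdown?.length); // full page content
```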
### Example
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR-API-KEY")
class CompanyInfo(BaseModel):
company_mission: str
supports_sso: bool
is_open_source: bool
is_in_yc: bool
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"schema": CompanyInfo.model_json_schema()
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { z } from "zod";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
// Define schema to extract contents into
const schema = z.object({
company_mission: z.string(),
supports_sso: z.boolean(),
is_open_source: z.boolean(),
is_in_yc: z.boolean()
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
schema: schema
}],
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [ {
"type": "json",
"schema": {
"type": "object",
"properties": {
"company_mission": {
"type": "string"
},
"supports_sso": {
"type": "boolean"
},
"is_open_source": {
"type": "boolean"
},
"is_in_yc": {
"type": "boolean"
}
},
"required": [
"company_mission",
"supports_sso",
"is_open_source",
"is_in_yc"
]
}
} ]
}'
```
### Best Use Case: Single-Page Precision Extraction
**Scenario**: You're building a price monitoring tool and need to extract the price, stock status, and product details from a specific product page you already have the URL for.
**Why `/scrape` with JSON mode**: You know exactly which page contains the data, need precise single-page extraction, and want synchronous results without job management overhead.
For more details, see the [JSON mode documentation](/features/llm-extract).
***
## Decision Guide
**Do you know the exact URL(s) containing your data?**
* **NO** → Use `/agent` (autonomous web discovery)
* **YES**
* **Single page?** → Use `/scrape` with JSON mode
* **Multiple pages?** → Use `/agent` with URLs (or batch `/scrape`)
### Recommendations by Scenario
| Scenario | Recommended Endpoint |
| -------------------------------------------------- | ------------------------------- |
| "Find all AI startups and their funding" | `/agent` |
| "Extract data from this specific product page" | `/scrape` (JSON mode) |
| "Get all blog posts from competitor.com" | `/agent` with URL |
| "Monitor prices across multiple known URLs" | `/scrape` with batch processing |
| "Research companies in a specific industry" | `/agent` |
| "Extract contact info from 50 known company pages" | `/scrape` with batch processing |
***
## Pricing
| Endpoint | Cost | Notes |
| --------------------- | ---------------------------------- | ------------------------------------- |
| `/scrape` (JSON mode) | 1 credit/page | Fixed, predictable |
| `/extract` | Token-based (1 credit = 15 tokens) | Variable based on content |
| `/agent` | Dynamic | 5 free runs/day; varies by complexity |
### Example: "Find the founders of Firecrawl"
| Endpoint | How It Works | Credits Used |
| ---------- | ----------------------------------------------- | ---------------------- |
| `/scrape` | You find the URL manually, then scrape 1 page | \~1 credit |
| `/extract` | You provide URL(s), it extracts structured data | Variable (token-based) |
| `/agent` | Just send the prompt—agent finds and extracts | \~100–500 credits |
**Tradeoff**: `/scrape` is cheapest but requires you to know the URL. `/agent` costs more but handles discovery automatically.
For detailed pricing, see [Firecrawl Pricing](https://firecrawl.dev/pricing).
***
## Migration: `/extract` → `/agent`
If you're currently using `/extract`, migration is straightforward:
**Before (extract):**
```python theme={null}
result = app.extract(
urls=["https://example.com/*"],
prompt="Extract product information",
schema=schema
)
```
**After (agent):**
```python theme={null}
result = app.agent(
urls=["https://example.com"], # Optional - can omit entirely
prompt="Extract product information from example.com",
schema=schema,
model="spark-1-mini" # or "spark-1-pro" for higher accuracy
)
```
The key advantage: with `/agent`, you can drop the URLs entirely and just describe what you need.
***
## Key Takeaways
1. **Know the exact URL?** Use `/scrape` with JSON mode—it's the cheapest (1 credit/page), fastest (synchronous), and most predictable option.
2. **Need autonomous research?** Use `/agent`—it handles discovery automatically with 5 free runs/day, then dynamic pricing based on complexity.
3. **Migrate from `/extract`** to `/agent` for new projects—`/agent` is the successor with better capabilities.
4. **Cost vs. convenience tradeoff**: `/scrape` is most cost-effective when you know your URLs; `/agent` costs more but eliminates manual URL discovery.
***
## Further Reading
* [Agent documentation](/features/agent)
* [Agent models](/features/models)
* [JSON mode documentation](/features/llm-extract)
* [Extract documentation](/features/extract)
* [Batch scraping](/features/batch-scrape)
# Firecrawl + Dify
Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/dify
Official plugin for Firecrawl + Dify AI workflow automation
**Official Dify Plugin:** [marketplace.dify.ai/plugins/langgenius/firecrawl](https://marketplace.dify.ai/plugins/langgenius/firecrawl)
Official plugin by Dify team • 44,000+ installs • Chatflow & Agent apps • Free to use
## Dify Integration Overview
Dify is an open-source LLM app development platform. The official Firecrawl plugin enables web crawling and scraping directly in your AI workflows.
* Build visual pipelines with Firecrawl nodes for data extraction
* Give AI agents the power to scrape live web data on demand
## Firecrawl Tools in Dify
**Scrape:** Convert any URL into clean, structured data. Transform raw HTML into actionable insights.
**Use Cases:** Extract product data, scrape article content, get structured data with JSON mode.
**Crawl:** Perform recursive crawls of websites and subdomains to gather extensive content.
**Use Cases:** Full site content extraction, documentation scraping, multi-page data collection.
**Map:** Generate a complete map of all URLs present on a website.
**Use Cases:** Site structure analysis, SEO auditing, URL discovery for batch scraping.
**Crawl Job:** Retrieve scraping results based on a Job ID or cancel ongoing tasks.
**Use Cases:** Monitor long-running crawls, manage async scraping workflows, cancel operations when needed.
## Getting Started
1. Access the [Dify Plugin Marketplace](https://marketplace.dify.ai/plugins/langgenius/firecrawl) and install the Firecrawl tool
2. Visit [Firecrawl API Keys](https://www.firecrawl.dev/app/api-keys) and create a new API key
3. Navigate to **Plugins > Firecrawl > To Authorize** and input your API key
4. Drag Firecrawl tools into your Chatflow, Workflow, or Agent application
5. Set up parameters and test your workflow
## Usage Patterns
**Visual Pipeline Integration**
1. Add Firecrawl node to your pipeline
2. Select action (Map, Crawl, Scrape)
3. Define input variables
4. Execute pipeline sequentially
**Example Flow:**
```
User Input → Firecrawl (Scrape) → LLM Processing → Response
```
**Automated Data Processing**
Build multi-step workflows with:
* Scheduled scraping
* Data transformation
* Database storage
* Notifications
**Example Flow:**
```
Schedule Trigger → Firecrawl (Crawl) → Data Processing → Storage
```
**AI-Powered Web Access**
Give agents real-time web scraping capabilities:
1. Add Firecrawl tool to Agent
2. Agent autonomously decides when to scrape
3. LLM analyzes extracted content
4. Agent provides informed responses
**Use Case:** Customer support agents that reference live documentation
## Common Use Cases
* Build RAG-powered chatbots that scrape and reference live website content
* Agents that research topics by scraping and analyzing multiple sources
* Automated workflows that track competitor websites and alert on changes
* Extract and enrich data from websites into structured databases
## Firecrawl Actions
| Tool | Description | Best For |
| ------------- | ------------------------------ | ----------------------- |
| **Scrape** | Single-page data extraction | Quick content capture |
| **Crawl** | Multi-page recursive crawling | Full site extraction |
| **Map** | URL discovery and site mapping | SEO analysis, URL lists |
| **Crawl Job** | Async job management | Long-running operations |
## Best Practices
* Let agents decide when to scrape
* Use natural language instructions
* Enable tool calling in LLM settings
* Monitor token usage with large scrapes
* Use Map before Crawl for large sites
* Set appropriate crawl limits
* Add error handling nodes
* Test with small datasets first
## Dify vs Other Platforms
| Feature | Dify | Make | Zapier | n8n |
| --------------- | -------------------- | ------------------- | ------------------- | ------------------- |
| **Type** | LLM app platform | Workflow automation | Workflow automation | Workflow automation |
| **Best For** | AI agents & chatbots | Visual workflows | Quick automation | Developer control |
| **Pricing** | Open-source + Cloud | Operations-based | Per-task | Flat monthly |
| **AI-Native** | Yes | Partial | Partial | Partial |
| **Self-Hosted** | Yes | No | No | Yes |
**Pro Tip:** Dify excels at building AI-native applications where agents need dynamic web access. Perfect for chatbots, research assistants, and AI tools that need live data.
# Firecrawl + Make
Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/make
Official integration and workflow automation for Firecrawl + Make
**Official Make Integration:** [make.com/en/integrations/firecrawl](https://www.make.com/en/integrations/firecrawl)
Connect with 3,000+ apps • Visual workflow builder • Enterprise-grade automation • AI-powered scenarios
## Make Integration Overview
Make (formerly Integromat) provides a verified, officially supported Firecrawl integration maintained by Mendable.
* Design complex automations with Make's intuitive visual interface
* Scale securely with enterprise-grade automation and controls
## Firecrawl Modules in Make
### Crawl a Website
Crawl a URL and get its content from multiple pages
***
### Extract a Website
Extract structured data from pages using LLMs
***
### Scrape a Website
Scrape a URL and get its content from a single page
***
### Map a Website
Map multiple URLs based on options
***
### Search a Website
Web search with Firecrawl's scraping capabilities, returning full-page content for any search query
***
### Get Crawl Status
Get the status of a given crawl event ID
***
### Get Extract Status
Get the status of a given extraction event ID
***
### Make an API Call
Perform arbitrary authorized API calls for custom use cases
## Popular App Integrations
* **Google Sheets** - Track and log scraped data in spreadsheets
* **Airtable** - Build structured databases with scraped content
* **Google Drive** - Store scraped files and reports
* **Notion** - Organize research and web data
* **Slack** - Get alerts for website changes and updates
* **Telegram Bot** - Instant notifications for monitoring
* **Gmail** - Email reports and digests
* **Microsoft 365 Email** - Enterprise email automation
* **HubSpot CRM** - Enrich leads with web data
* **monday.com** - Track competitor intelligence
* **ClickUp** - Manage research tasks
* **OpenAI (ChatGPT, DALL-E)** - Analyze and summarize scraped content
* **Google Gemini AI** - Process and extract insights
* **Perplexity AI** - Enhanced research workflows
* **Make AI Agents** - Build adaptive AI-powered automations
## Common Workflow Patterns
**Schedule** → Firecrawl (Scrape) → Google Sheets (Log) → Slack (Alert)
Track competitor websites and get instant notifications
**Google Forms** → Firecrawl (Scrape company site) → HubSpot CRM (Update)
Automatically enrich leads with company data
**Schedule** → Firecrawl (Crawl blog) → OpenAI (Summarize) → Gmail (Send digest)
Automated content curation and distribution
**Schedule (Hourly)** → Firecrawl (Scrape) → Filter → Telegram (Alert)
Real-time price tracking and alerts
## Getting Started
1. Get your API key at [firecrawl.dev](https://firecrawl.dev)
2. Log into [Make](https://make.com) and click "Create a new scenario"
3. Search for "Firecrawl" and select your desired action
4. Add your Firecrawl API key to authenticate
5. Set up your workflow parameters and run a test
## Firecrawl Actions Overview
| Module | Use Case | Best For |
| --------------------- | -------------------------------- | ------------------- |
| **Scrape a Website** | Single-page data extraction | Quick data capture |
| **Crawl a Website** | Multi-page content collection | Full site scraping |
| **Extract a Website** | AI-powered structured extraction | Complex data needs |
| **Search a Website** | Search + full content | Research automation |
| **Map a Website** | URL discovery | SEO analysis |
## Best Practices
* Use **Scrape** for single pages (fastest)
* Use **Crawl** with limits for large sites
* Schedule appropriately to avoid rate limits
* Add error handling modules
* Schedule strategically (hourly/daily/weekly)
* Use filters to prevent unnecessary runs
* Set crawl limits to control API usage
* Test in Firecrawl playground first
## Industry Use Cases
* **E-commerce**: competitor price monitoring, product availability tracking, review aggregation and analysis, inventory level monitoring
* **Real estate**: listing aggregation from multiple sources, market trend analysis, property data enrichment, competitive pricing intelligence
* **Marketing & SEO**: competitor content monitoring, SEO performance tracking, backlink analysis, social media mention tracking
* **Finance**: market data collection, news and sentiment aggregation, regulatory filing monitoring, stock price tracking
* **Recruiting**: job posting aggregation, company research automation, candidate background enrichment, salary benchmarking
## Make vs Zapier vs n8n
| Feature | Make | Zapier | n8n |
| ------------------ | --------------------------------- | ---------------- | -------------------- |
| **Setup** | Visual builder, cloud | No-code, cloud | Self-hosted or cloud |
| **Pricing** | Operations-based | Per-task pricing | Flat monthly |
| **Integrations** | 3,000+ apps | 8,000+ apps | 400+ integrations |
| **Complexity** | Advanced workflows | Simple workflows | Complex workflows |
| **Best For** | Visual automation, mid-complexity | Quick automation | Developer control |
| **Learning Curve** | Moderate | Easy | Moderate-Advanced |
**Pro Tip:** Make excels at visual workflow design and complex automations. Perfect for teams that need more control than Zapier but prefer visual building over n8n's code-first approach.
# Firecrawl + n8n
Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/n8n
Learn how to use Firecrawl with n8n for web scraping automation, a complete step-by-step guide.
## Introduction to Firecrawl and n8n
Web scraping automation has become essential for modern businesses. Whether you need to monitor competitor prices, aggregate content, generate leads, or power AI applications with real-time data, the combination of Firecrawl and n8n provides a powerful solution without requiring programming knowledge.
**What is n8n?**
n8n is an open-source workflow automation platform that connects different tools and services together. Think of it as a visual programming environment where you drag and drop nodes onto a canvas, connect them, and create automated workflows. With over 400 integrations, n8n lets you build complex automations without writing code.
## Why Use Firecrawl with n8n?
Traditional web scraping presents several challenges. Custom scripts break when websites update their structure. Anti-bot systems block automated requests. JavaScript-heavy sites don't render properly. Infrastructure requires constant maintenance.
Firecrawl handles these technical complexities on the scraping side, while n8n provides the automation framework. Together, they let you build production-ready workflows that:
* Extract data from any website reliably
* Connect scraped data to other business tools
* Run on schedules or triggered by events
* Scale from simple tasks to complex pipelines
This guide will walk you through setting up both platforms and building your first scraping workflow from scratch.
## Step 1: Create Your Firecrawl Account
Firecrawl provides the web scraping capabilities for your workflows. Let's set up your account and get your API credentials.
### Sign Up for Firecrawl
1. Navigate to [firecrawl.dev](https://firecrawl.dev) in your web browser
2. Click the "Get Started" or "Sign Up" button
3. Create an account using your email address or GitHub login
4. Verify your email if prompted
### Get Your API Key
After signing in, you need an API key to connect Firecrawl to n8n:
1. Go to your Firecrawl dashboard
2. Navigate to the [API Keys page](https://www.firecrawl.dev/app/api-keys)
3. Click "Create New API Key"
4. Give your key a descriptive name (e.g., "n8n Integration")
5. Copy the generated API key and save it somewhere secure
Your API key is like a password. Keep it secure and never share it publicly. You'll need this key in the next section.
Firecrawl provides free credits when you sign up, which are enough to test your workflows and complete this tutorial.
## Step 2: Set Up n8n
n8n offers two deployment options: cloud-hosted or self-hosted. For beginners, the cloud version is the fastest way to get started.
### Choose Your n8n Version
**n8n Cloud (Recommended for beginners):**
* No installation required
* Free tier available
* Managed infrastructure
* Automatic updates
**Self-Hosted:**
* Complete data control
* Run on your own servers
* Requires Docker installation
* Good for advanced users with specific security requirements
Choose the option that fits your needs. Both paths will lead you to the same workflow editor interface.
### Option A: n8n Cloud (Recommended for Beginners)
1. Visit [n8n.cloud](https://n8n.cloud)
2. Click "Start Free" or "Sign Up"
3. Register using your email address or GitHub
4. Complete the verification process
5. You'll be directed to your n8n dashboard
The free tier provides everything you need to build and test workflows. You can upgrade later if you need more execution time or advanced features.
### Option B: Self-Hosted with Docker
If you prefer to run n8n on your own infrastructure, you can set it up quickly using Docker.
**Prerequisites:**
* [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed on your computer
* Basic familiarity with command line/terminal
**Installation Steps:**
1. Open your terminal or command prompt
2. Create a Docker volume to persist your workflow data:
```bash theme={null}
docker volume create n8n_data
```
This volume stores your workflows, credentials, and execution history so they persist even if you restart the container.
3. Run the n8n Docker container:
```bash theme={null}
docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n
```
4. Wait for n8n to start. You'll see output indicating the server is running
5. Open your web browser and navigate to `http://localhost:5678`
6. Create your n8n account by registering with an email
Your self-hosted n8n instance is now running locally. The interface is identical to n8n Cloud, so you can follow the rest of this guide regardless of which option you chose.
The `--rm` flag automatically removes the container when you stop it, but your data remains safe in the `n8n_data` volume. For production deployments, see the [n8n self-hosting documentation](https://docs.n8n.io/hosting/) for more advanced configuration options.
### Understanding the n8n Interface
When you first log in to n8n, you'll see the main dashboard. Key interface elements:
* **Workflows**: Your saved automations appear here
* **Executions**: History of workflow runs
* **Credentials**: Stored API keys and authentication tokens
* **Settings**: Account and workspace configuration
Click "Create New Workflow" to open the workflow editor.
### The Workflow Canvas
The workflow editor is where you'll build your automations. Important elements:
* **Canvas**: The main area where you place and connect nodes
* **Add Node Button (+)**: Click this to add new nodes to your workflow
* **Node Panel**: Opens when you click "+" showing all available nodes
* **Execute Workflow**: Runs your workflow manually for testing
* **Save**: Saves your workflow configuration
Let's build your first workflow by adding the Firecrawl node.
## Step 3: Install and Configure the Firecrawl Node
n8n includes native support for Firecrawl. You'll install the node and connect it to your Firecrawl account using the API key you created earlier.
### Add the Firecrawl Node to Your Workflow
1. In your new workflow canvas, click the "**+**" button in the center
2. The node selection panel opens on the right side
3. In the search box at the top, type "**Firecrawl**"
4. You'll see the Firecrawl node appear in the search results
5. Click "**Install**" next to the Firecrawl node
6. Wait for the installation to complete (this takes a few seconds)
7. Once installed, click on the Firecrawl node to add it to your canvas
The Firecrawl node will appear on your canvas as a box with the Firecrawl logo. This node represents a single Firecrawl operation in your workflow.
### Connect Your Firecrawl API Key
**n8n Cloud users:** Instead of manually entering an API key, you can use the one-click **"Connect to Firecrawl"** OAuth button when adding the Firecrawl node. This automatically creates a new Firecrawl team linked to your n8n account and grants **100,000 free credits**. To view these credits on the [Firecrawl dashboard](https://www.firecrawl.dev/app/usage), make sure you switch to your n8n-linked team using the team selector in the top-left corner.
Before you can use the Firecrawl node, you need to authenticate it with your API key:
1. Click on the Firecrawl node box to open its configuration panel on the right
2. At the top, you'll see a "Credential to connect with" dropdown
3. Since this is your first time, click "**Create New Credential**"
4. A credential configuration window opens
5. Enter a name for this credential (e.g., "My Firecrawl Account")
6. Paste your Firecrawl API key in the "API Key" field
7. Click "**Save**" at the bottom
The credential is now saved in n8n. You won't need to enter your API key again for future Firecrawl nodes.
### Test Your Connection
Let's verify that your Firecrawl node is properly connected:
1. With the Firecrawl node still selected, look at the configuration panel
2. In the "Resource" dropdown, select "**Scrape a url and get its content**"
3. In the "URL" field, enter: `https://firecrawl.dev`
4. Leave other settings at their defaults for now
5. Click the "**Test step**" button at the bottom right of the node
If everything is configured correctly, you'll see the scraped content from firecrawl.dev appear in the output panel below the node.
Congratulations! Your Firecrawl node is now connected and working.
## Step 4: Create Your Telegram Bot
Before building your first workflow, you'll need a Telegram bot to receive notifications. Telegram bots are free and easy to create through Telegram's BotFather.
### Create a Bot with BotFather
1. Open Telegram on your phone or desktop
2. Search for "**@BotFather**" (the official bot from Telegram)
3. Start a conversation with BotFather by clicking "**Start**"
4. Send the command `/newbot` to create a new bot
5. BotFather will ask you to choose a name for your bot (this is the display name users will see)
6. Enter a name like "**My Firecrawl Bot**"
7. Next, choose a username for your bot. It must end with "bot" (e.g., "**my\_firecrawl\_updates\_bot**")
8. If the username is available, BotFather will create your bot and send you a message with your bot token
Save your bot token securely. This token is like a password that allows n8n to send messages as your bot. Never share it publicly.
### Get Your Chat ID
To send messages to yourself, you need your Telegram chat ID:
1. Open your web browser and visit this URL (replace `YOUR_BOT_TOKEN` with your actual bot token):
```
https://api.telegram.org/botYOUR_BOT_TOKEN/getUpdates
```
2. Keep this browser tab open
3. Now, search for your bot's username in Telegram (the one you just created)
4. Start a conversation with your bot by clicking "**Start**"
5. Send any message to your bot (e.g., "hello")
6. Go back to the browser tab and refresh the page
7. Look for the `"chat":{"id":` field in the JSON response
8. The number next to `"id":` is your chat ID (e.g., `123456789`)
9. Save this chat ID for later
Your chat ID is the unique identifier for your conversation with the bot. You'll use this to tell n8n where to send messages.
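If you'd rather grab the chat ID programmatically, here's a minimal Python sketch that calls the same `getUpdates` endpoint (the `requests` library and the `YOUR_BOT_TOKEN` placeholder are assumptions; make sure you've messaged your bot first so the update list isn't empty):

```python Python theme={null}
# Minimal sketch: read your Telegram chat ID from the Bot API's getUpdates method.
import requests

BOT_TOKEN = "YOUR_BOT_TOKEN"  # placeholder: paste the token from BotFather

resp = requests.get(f"https://api.telegram.org/bot{BOT_TOKEN}/getUpdates", timeout=10)
resp.raise_for_status()

for update in resp.json().get("result", []):
    message = update.get("message")
    if message:
        # Each update carries the chat ID at message.chat.id
        print("Chat ID:", message["chat"]["id"])
```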
You now have everything needed to integrate Telegram with your n8n workflows.
## Step 5: Build Practical Workflows with Telegram
Now let's build three real-world workflows that send information directly to your Telegram. These examples demonstrate different Firecrawl operations and how to integrate them with Telegram notifications.
### Example 1: Daily Firecrawl Product Updates Summary
Get a daily summary of Firecrawl product updates delivered to your Telegram every morning.
**What you'll build:**
* Scrapes Firecrawl's product updates blog at 9 AM daily
* Uses AI to generate a summary of the content
* Sends the summary to your Telegram
**Step-by-step:**
1. Create a new workflow in n8n
2. Add a **Schedule Trigger** node:
* Click the "**+**" button on canvas
* Search for "**Schedule Trigger**"
* Configure: Every day at 9:00 AM
3. Add the **Firecrawl** node:
* Click "**+**" next to Schedule Trigger
* Search for and add "**Firecrawl**"
* Select your Firecrawl credential
* Configure:
* **Resource**: Scrape a url and get its content
* **URL**: `https://www.firecrawl.dev/blog/category/product-updates`
* **Formats**: Select "Summary"
4. Add the **Telegram** node:
* Click "**+**" next to Firecrawl
* Search for "**Telegram**"
* Click "**Send a text message**" to add it to the canvas
5. Set up Telegram credentials:
* Click on the Telegram node to open its configuration
* In the "Credential to connect with" dropdown, click "**Create New Credential**"
* Paste your bot token from BotFather
* Click "**Save**"
6. Configure the Telegram message:
* **Operation**: Send Message
* **Chat ID**: Enter your chat ID
* **Text**: Enter a placeholder message (e.g., "hello") for now
* Click **Execute step** to confirm the bot delivers the test message to your chat
* Once the test succeeds, drag the `summary` field from the Firecrawl node's output into the **Text** field so the daily summary replaces the placeholder
7. Test the workflow:
* Click "**Execute Workflow**"
* Check your Telegram for the summary message
8. Activate the workflow by toggling the "**Active**" switch
Your Telegram bot will now send you a daily summary of Firecrawl product updates every morning at 9 AM.
### Example 2: AI News Search to Telegram
This workflow uses Firecrawl's Search operation to find AI news and send formatted results to Telegram.
**Key differences from Example 1:**
* Uses a **Manual Trigger** instead of Schedule (run on demand)
* Uses **Search** operation instead of Scrape
* Includes a **Code** node to format multiple results
**Build the workflow:**
1. Create a new workflow and add a **Manual Trigger** node
2. Add **Firecrawl** node with these settings:
* **Resource**: Search and optionally scrape search results
* **Query**: `ai news`
* **Limit**: 5
3. Add a **Code** node to format the search results:
* Select "Run Once for All Items"
* Paste this code:
```javascript theme={null}
// Gather every item emitted by the Firecrawl node
const results = $input.all();
let message = "Latest AI News:\n\n";

results.forEach((item) => {
  // Search results are nested under data.web in the Firecrawl response
  const webData = item.json.data.web;
  webData.forEach((article, index) => {
    message += `${index + 1}. ${article.title}\n`;
    message += `${article.description}\n`;
    message += `${article.url}\n\n`;
  });
});

// Return a single item so the Telegram node receives one formatted message
return [{ json: { message } }];
```
4. Update **Telegram** node (using your saved credential):
* **Text**: Drag the `message` field from Code node
Replace the Manual Trigger with a Schedule Trigger to get automatic AI news updates at set intervals.
### Example 3: AI-Powered News Summary
This workflow adds AI to Example 2, using OpenAI to generate intelligent summaries of the latest AI news before sending to Telegram.
**Key changes from Example 2:**
* Add **OpenAI credentials** setup
* Add **AI Agent** node between Code and Telegram
* AI Agent analyzes and summarizes all the news articles intelligently
* Telegram receives the AI-generated summary instead of raw news list
**Modify the workflow:**
1. **Get your OpenAI API key**:
* Go to [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
* Sign in or create an account
* Click "**Create new secret key**"
* Give it a name (e.g., "n8n Integration")
* Copy the API key immediately (you won't see it again)
2. **Add and connect the AI Agent node**:
* Click "**+**" after the Code node
* Search for "**Basic LLM Chain**" or "**AI Agent**"
* Drag the `message` field from the Code node to the AI Agent's input prompt field
* Select **OpenAI** as the LLM provider
3. **Add your OpenAI credentials**:
* Click "**Create New Credential**" for OpenAI
* Paste your OpenAI API key
* Select model: **gpt-5-mini** (cost-effective) or **gpt-5** (more capable)
* Click "**Save**"
4. **Add the system prompt to the AI Agent**:
* In the AI Agent node, add this system prompt:
```
You are an AI news analyst. Analyze the provided AI news articles and create a concise,
insightful summary highlighting the most important developments and trends.
Group related topics together and provide context about why these developments matter.
Keep the summary conversational and engaging, around 3-4 paragraphs.
```
5. **Update the Telegram node and test**:
* Update the Telegram node:
* **Text**: Drag the AI Agent's output (the generated summary)
* Remove the old mapping to the Code node's message
* Click "**Execute Workflow**" to test
* The AI will analyze all news articles and create a summary
* Check your Telegram for the AI-generated summary
The AI Agent receives all the formatted news articles and creates an intelligent summary, making it easier to understand trends and important developments at a glance.
## Understanding Firecrawl Operations
Now that you've built some workflows, let's explore the different Firecrawl operations available in n8n. Each operation is designed for specific web scraping use cases.
### Scrape a url and get its content
Extracts content from a single web page and returns it in various formats.
**What it does:**
* Scrapes a single URL
* Returns clean markdown, HTML, or AI-generated summaries
* Can capture screenshots and extract links
**Best for:**
* Article extraction
* Product page monitoring
* Blog post scraping
* Generating page summaries
**Example use case:** Daily blog summaries (like Example 1 above)
### Search and optionally scrape search results
Performs web searches and returns results with optional content scraping.
**What it does:**
* Searches the web, news, or images
* Returns titles, descriptions, and URLs
* Optionally scrapes the full content of results
**Best for:**
* Research automation
* News monitoring
* Trend discovery
* Finding relevant content
**Example use case:** AI news aggregation (like Example 2 above)
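For reference, the Search operation maps to Firecrawl's search API. A rough Python SDK equivalent of Example 2's node settings looks like this (a sketch; it assumes the SDK's `search` method returns results with a `web` list, matching the `data.web` array the Code node reads):

```python Python theme={null}
# Sketch: roughly what the n8n Search operation calls under the hood.
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Same settings as Example 2: query "ai news", limit 5
results = firecrawl.search("ai news", limit=5)

for article in results.web:
    print(article.title, article.url)
```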
### Crawl a website
Recursively discovers and scrapes multiple pages from a website.
**What it does:**
* Follows links automatically
* Scrapes multiple pages in one operation
* Can filter URLs by patterns
**Best for:**
* Full documentation extraction
* Site archiving
* Multi-page data collection
### Map a website and get urls
Returns all URLs found on a website without scraping content.
**What it does:**
* Discovers all links on a site
* Returns clean URL list
* Fast and lightweight
**Best for:**
* URL discovery
* Sitemap generation
* Planning larger crawls
### Extract Data
Uses AI to extract structured information based on custom prompts or schemas.
**What it does:**
* AI-powered data extraction
* Returns data in your specified format
* Works across multiple pages
**Best for:**
* Custom data extraction
* Building databases
* Structured information gathering
### Batch Scrape
Scrapes multiple URLs in parallel efficiently.
**What it does:**
* Processes multiple URLs at once
* More efficient than loops
* Returns all results together
**Best for:**
* Processing URL lists
* Bulk data collection
* Large-scale scraping projects
### Agent
Uses an AI agent to autonomously browse and extract data from websites based on a natural language prompt.
**What it does:**
* Accepts a prompt describing what data you need
* AI agent navigates and extracts information autonomously
* Available in **Sync** mode (waits for results) and **Async** mode (returns a job ID immediately)
* Use **Get Agent Status** to poll for results when using Async mode
**Best for:**
* Complex, multi-page data gathering guided by a prompt
* Extracting information when you don't know the exact page structure
* Research tasks that require navigating multiple pages
**Sync vs. Async:**
* **Agent (Sync)** starts the job and waits for the result in one step — simplest for most use cases. The **Max Wait Time** parameter controls how long the node polls before timing out (default: 300 seconds, maximum: 600 seconds). If the agent job takes longer than this, the node returns a timeout status even though the job may still complete on the Firecrawl side. For jobs that may exceed 10 minutes, use the async mode instead.
* **Agent (Async)** returns a job ID immediately. Add a second Firecrawl node with the **Get Agent Status** operation to retrieve results once the job completes.
For details on the agent feature, see the [Agent documentation](/features/agent).
### Browser Sandbox
Provides a persistent browser session that you can control with code, allowing multi-step browser automation within a single session.
**Operations:**
* **Create Browser Session** — starts a new browser session and returns a `sessionId`
* **Execute Browser Code** — runs JavaScript, Python, or bash code in the session (using the `sessionId` from the Create step)
* **List Browser Sessions** — lists active or destroyed sessions
* **Delete Browser Session** — destroys a session when you are done
**Best for:**
* Multi-step browser workflows that require maintaining state across pages
* Dynamic page navigation where the number of steps is not known in advance
* Workflows that use persistent browser profiles to preserve cookies and localStorage across runs
In n8n, select the **Browser** resource on the Firecrawl node to access these operations. Pass the `sessionId` from the Create step into each subsequent Execute or Delete step. Use n8n's **Loop Over Items** node to iterate through a dynamic list of pages, calling Execute for each one within the same session.
For details on the Browser Sandbox feature, see the [Browser Sandbox documentation](/features/browser-sandbox).
## Workflow Templates and Examples
Instead of building from scratch, you can start with pre-built templates. The n8n community has created numerous Firecrawl workflows you can copy and customize.
### Featured Templates
* Build an AI chatbot with web access using Firecrawl and n8n
* Ready-to-use templates for lead generation, price monitoring, and more
* Scrape pages into embeddings and store in Supabase pgvector for RAG
* Scrape company websites and extract structured business signals
* Scrape pages into embeddings and store in Pinecone for RAG
* Browse hundreds of community workflows that use Firecrawl
* View the official integration documentation
### How to Import Templates
To use a template from the n8n community:
1. Click on a workflow template link
2. Click the "**Import template**" button on the template page (the button label includes your n8n instance, e.g., `localhost:5678` for a self-hosted setup)
3. The workflow opens in your n8n instance
4. Configure credentials for each node
5. Customize settings for your use case
6. Activate the workflow
## Best Practices
Follow these guidelines to build reliable, efficient workflows:
### Testing and Debugging
* Always test workflows manually before activating schedules
* Use the "**Execute Workflow**" button to test the entire flow
* Check output data at each node to verify correctness
* Use the "**Executions**" tab to review past runs and debug issues
### Error Handling
* Add Error Trigger nodes to catch and handle failures
* Set up notifications when workflows fail
* Use the "**Continue On Fail**" setting for non-critical nodes
* Monitor your workflow executions regularly
### Performance Optimization
* Use Batch Scrape for multiple URLs instead of loops
* Set appropriate rate limits to avoid overwhelming target sites
* Cache data when possible to reduce unnecessary requests
* Schedule intensive workflows during off-peak hours
### Security
* Never expose API keys in workflow configurations
* Use n8n's credential system to securely store authentication
* Be careful when sharing workflows publicly
* Follow target websites' terms of service and robots.txt
## Next Steps
You now have the fundamentals to build web scraping automations with Firecrawl and n8n. Here's how to continue learning:
### Explore Advanced Features
* Study webhook configurations for real-time data processing
* Experiment with AI-powered extraction using prompts and schemas
* Build complex multi-step workflows with branching logic
### Join the Community
* [Firecrawl Discord](https://discord.gg/firecrawl) - Get help with Firecrawl and discuss web scraping
* [n8n Community Forum](https://community.n8n.io/) - Ask questions about workflow automation
* Share your workflows and learn from others
### Recommended Learning Path
1. Complete the example workflows in this guide
2. Modify templates from the community library
3. Build a workflow to solve a real problem in your work
4. Explore advanced Firecrawl operations
5. Contribute your own templates to help others
**Need help?** If you're stuck or have questions, the Firecrawl and n8n communities are active and helpful. Don't hesitate to ask for guidance as you build your automations.
## Additional Resources
* [Firecrawl API Documentation](/api-reference/v2-introduction)
* [n8n Documentation](https://docs.n8n.io/)
* [Web Scraping Best Practices](https://www.firecrawl.dev/blog)
# Firecrawl + Zapier
Source: https://docs.firecrawl.dev/developer-guides/workflow-automation/zapier
Official tutorials and Zapier integration templates for Firecrawl + Zapier automation
**Official Zapier Integration:** [zapier.com/apps/firecrawl/integrations](https://zapier.com/apps/firecrawl/integrations)
Connect with 8,000+ apps • No-code automation • Pre-built Zap templates • Cloud-based
## Official Blog Post
Real-world case study: How Zapier integrated Firecrawl into Zapier Chatbots in a single afternoon.
## Popular Integrations
### Google Sheets
→ [View Integration](https://zapier.com/apps/google-sheets/integrations/firecrawl)
Track competitor data, centralize marketing insights, and automate data collection.
**Best For:** Business owners, marketing teams
***
### Airtable
→ [View Integration](https://zapier.com/apps/airtable/integrations/firecrawl)
Build lead generation databases and content aggregation systems with structured storage.
**Best For:** Sales teams, project managers
***
### Zapier Tables
→ [View Integration](https://zapier.com/apps/zapier-tables/integrations/firecrawl)
No-code database automation for employee onboarding and centralized lead management.
**Best For:** HR teams, operations
### Slack
→ [View Integration](https://zapier.com/apps/slack/integrations/firecrawl)
Get website change notifications, competitor monitoring alerts, and market intelligence updates.
**Best For:** Marketing teams, product managers
***
### Telegram
→ [View Integration](https://zapier.com/apps/telegram/integrations/firecrawl)
Instant price alerts, breaking news notifications, and real-time monitoring.
**Best For:** Traders, news enthusiasts
### HubSpot
→ [View Integration](https://zapier.com/apps/hubspot/integrations/firecrawl)
Contact enrichment, lead scoring with web data, and marketing automation.
**Best For:** Marketing ops, sales ops
***
### Pipedrive
→ [View Integration](https://zapier.com/apps/pipedrive/integrations/firecrawl)
Lead enrichment from websites and competitor intelligence tracking.
**Best For:** Sales teams, account executives
***
### Attio
→ [View Integration](https://zapier.com/apps/attio/integrations/firecrawl)
Modern CRM data enrichment and relationship intelligence.
**Best For:** Modern sales teams
### Google Docs
→ [View Integration](https://zapier.com/apps/google-docs/integrations/firecrawl)
Automated report generation, research documentation, and content aggregation.
**Best For:** Researchers, content creators
***
### Notion
→ [View Integration](https://zapier.com/apps/notion/integrations/firecrawl)
Knowledge base updates, research library building, and content curation.
**Best For:** Product teams, researchers
### Schedule by Zapier
→ [View Integration](https://zapier.com/apps/schedule/integrations/firecrawl)
Run hourly, daily, weekly, or monthly scraping automatically.
***
### Zapier Interfaces
→ [View Integration](https://zapier.com/apps/interfaces/integrations/firecrawl)
Build custom internal tools with form-based scraping and team dashboards.
**Best For:** Operations teams
***
### Zapier Chatbots
→ [View Integration](https://zapier.com/apps/zapier-chatbots/integrations/firecrawl)
AI chatbots with live web knowledge for customer support and lead generation.
This official Zapier product uses Firecrawl internally.
## Firecrawl Actions
| Action | Use Case |
| --------------------------- | ----------------------------------------- |
| **Scrape URL** | Quick single-page data capture |
| **Crawl Website** | Full site scraping with multiple pages |
| **Extract Structured Data** | AI-powered extraction with custom schemas |
| **Search Web** | Research automation with search + scrape |
| **Map Website** | SEO analysis and site structure mapping |
## Quick Reference
1. Sign up at [firecrawl.dev](https://firecrawl.dev)
2. Get your API key
3. Create a Zap in Zapier
4. Connect Firecrawl with your API key
5. Choose your workflow and activate
* Use **Scrape URL** for single pages (faster)
* Schedule strategically (hourly/daily/weekly)
* Test in Firecrawl playground first
* Add error handling for failed scrapes
* Use filters to prevent unnecessary runs
## Industry Use Cases
* **E-commerce**: price monitoring across competitors, product availability tracking, review aggregation
* **Real estate**: listing aggregation, market trend analysis, property data collection
* **Marketing & SEO**: competitor content tracking, SEO monitoring, backlink analysis
* **Finance**: market data collection, news aggregation, regulatory filing monitoring
* **Recruiting**: job posting aggregation, company research automation, candidate information enrichment
## Zapier vs n8n
| Feature | Zapier | n8n |
| ---------------- | ------------------------------------- | ------------------------ |
| **Setup** | No-code, cloud-based | Self-hosted or cloud |
| **Pricing** | Per-task pricing | Flat monthly |
| **Integrations** | 8,000+ apps | 400+ integrations |
| **Best For** | Quick automation, non-technical users | Custom logic, developers |
**Pro Tip:** Start with Zapier's pre-built templates and customize as needed. Perfect for quick, no-code automation!
# Agent
Source: https://docs.firecrawl.dev/features/agent
Gather data wherever it lives on the web.
Firecrawl `/agent` is a magic API that searches, navigates, and gathers data from the widest range of websites, finding data in hard-to-reach places and uncovering data in ways no other API can. It accomplishes in a few minutes what would take a human many hours — end-to-end data collection, without scripts or manual work.
Whether you need one data point or entire datasets at scale, Firecrawl `/agent` works to get your data.
**Think of `/agent` as deep research for data, wherever it is!**
**Research Preview**: Agent is in early access. Expect rough edges. It will get significantly better over time. [Share feedback →](mailto:product@firecrawl.com)
Agent builds on everything great about `/extract` and takes it further:
* **No URLs Required**: Just describe what you need via `prompt` parameter. URLs are optional
* **Deep Web Search**: Autonomously searches and navigates deep into sites to find your data
* **Reliable and Accurate**: Works with a wide variety of queries and use cases
* **Faster**: Processes multiple sources in parallel for quicker results
Test the agent in the interactive playground — no code required.
## Using `/agent`
The only required parameter is `prompt`. Simply describe what data you want to extract. For structured output, provide a JSON schema. The SDKs support Pydantic (Python) and Zod (Node) for type-safe schema definitions:
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional
app = Firecrawl(api_key="fc-YOUR_API_KEY")
class Founder(BaseModel):
name: str = Field(description="Full name of the founder")
role: Optional[str] = Field(None, description="Role or position")
background: Optional[str] = Field(None, description="Professional background")
class FoundersSchema(BaseModel):
founders: List[Founder] = Field(description="List of founders")
result = app.agent(
prompt="Find the founders of Firecrawl",
schema=FoundersSchema,
model="spark-1-mini",
max_credits=100
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
import { z } from 'zod';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
const result = await firecrawl.agent({
prompt: "Find the founders of Firecrawl",
schema: z.object({
founders: z.array(z.object({
name: z.string().describe("Full name of the founder"),
role: z.string().describe("Role or position").optional(),
background: z.string().describe("Professional background").optional()
})).describe("List of founders")
}),
model: "spark-1-mini",
maxCredits: 100
});
console.log(result.data);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Find the founders of Firecrawl",
"model": "spark-1-mini",
"maxCredits": 100,
"schema": {
"type": "object",
"properties": {
"founders": {
"type": "array",
"description": "List of founders",
"items": {
"type": "object",
"properties": {
"name": { "type": "string", "description": "Full name" },
"role": { "type": "string", "description": "Role or position" },
"background": { "type": "string", "description": "Professional background" }
},
"required": ["name"]
}
}
},
"required": ["founders"]
}
}'
```
### Response
```json JSON theme={null}
{
"success": true,
"status": "completed",
"data": {
"founders": [
{
"name": "Eric Ciarla",
"role": "Co-founder",
"background": "Previously at Mendable"
},
{
"name": "Nicolas Camara",
"role": "Co-founder",
"background": "Previously at Mendable"
},
{
"name": "Caleb Peffer",
"role": "Co-founder",
"background": "Previously at Mendable"
}
]
},
"expiresAt": "2024-12-15T00:00:00.000Z",
"creditsUsed": 15
}
```
## Providing URLs (Optional)
You can optionally provide URLs to focus the agent on specific pages:
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
result = app.agent(
urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
prompt="Compare the features and pricing information from these pages"
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
const result = await firecrawl.agent({
urls: ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
prompt: "Compare the features and pricing information from these pages"
});
console.log(result.data);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://docs.firecrawl.dev",
"https://firecrawl.dev/pricing"
],
"prompt": "Compare the features and pricing information from these pages"
}'
```
## Job Status and Completion
Agent jobs run asynchronously. When you submit a job, you'll receive a Job ID that you can use to check status:
* **Default method**: `agent()` waits and returns final results
* **Start then poll**: Use `start_agent` (Python) or `startAgent` (Node) to get a Job ID immediately, then poll with `get_agent_status` / `getAgentStatus`
Job results are available via the API for 24 hours after completion. After this period, you can still view your agent history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Start an agent job
agent_job = app.start_agent(
prompt="Find the founders of Firecrawl"
)
# Check the status
status = app.get_agent_status(agent_job.id)
print(status)
# Example output:
# status='completed'
# success=True
# data={ ... }
# expires_at=datetime.datetime(...)
# credits_used=15
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Start an agent job
const started = await firecrawl.startAgent({
prompt: "Find the founders of Firecrawl"
});
// Check the status
if (started.id) {
const status = await firecrawl.getAgentStatus(started.id);
console.log(status.status, status.data);
}
```
```bash cURL theme={null}
curl -X GET "https://api.firecrawl.dev/v2/agent/YOUR_JOB_ID" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
### Possible States
| Status | Description |
| ------------ | ------------------------------------------ |
| `processing` | The agent is still working on your request |
| `completed` | Extraction finished successfully |
| `failed` | An error occurred during extraction |
| `cancelled` | The job was cancelled by the user |
#### Processing Example
```json JSON theme={null}
{
"success": true,
"status": "processing",
"expiresAt": "2024-12-15T00:00:00.000Z"
}
```
#### Completed Example
```json JSON theme={null}
{
"success": true,
"status": "completed",
"data": {
"founders": [
{
"name": "Eric Ciarla",
"role": "Co-founder"
},
{
"name": "Nicolas Camara",
"role": "Co-founder"
},
{
"name": "Caleb Peffer",
"role": "Co-founder"
}
]
},
"expiresAt": "2024-12-15T00:00:00.000Z",
"creditsUsed": 15
}
```
## Share agent runs
You can share agent runs directly from the Agent playground. Shared links are public — anyone with the link can view the run output and activity — and you can revoke access at any time to disable the link. Shared pages are not indexed by search engines.
## Model Selection
Firecrawl Agent offers two models. **Spark 1 Mini is 60% cheaper** and is the default — perfect for most use cases. Upgrade to Spark 1 Pro when you need maximum accuracy on complex tasks.
| Model | Cost | Accuracy | Best For |
| -------------- | --------------- | -------- | ------------------------------------- |
| `spark-1-mini` | **60% cheaper** | Standard | Most tasks (default) |
| `spark-1-pro` | Standard | Higher | Complex research, critical extraction |
**Start with Spark 1 Mini** (default) — it handles most extraction tasks well at 60% lower cost. Switch to Pro only for complex multi-domain research or when accuracy is critical.
### Spark 1 Mini (Default)
`spark-1-mini` is our efficient model, ideal for straightforward data extraction tasks.
**Use Mini when:**
* Extracting simple data points (contact info, pricing, etc.)
* Working with well-structured websites
* Cost efficiency is a priority
* Running high-volume extraction jobs
### Spark 1 Pro
`spark-1-pro` is our flagship model, designed for maximum accuracy on complex extraction tasks.
**Use Pro when:**
* Performing complex competitive analysis
* Extracting data that requires deep reasoning
* Accuracy is critical for your use case
* Dealing with ambiguous or hard-to-find data
### Specifying a Model
Pass the `model` parameter to select which model to use:
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Using Spark 1 Mini (default - can be omitted)
result = app.agent(
prompt="Find the pricing of Firecrawl",
model="spark-1-mini"
)
# Using Spark 1 Pro for complex tasks
result = app.agent(
prompt="Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model="spark-1-pro"
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Using Spark 1 Mini (default - can be omitted)
const result = await firecrawl.agent({
prompt: "Find the pricing of Firecrawl",
model: "spark-1-mini"
});
// Using Spark 1 Pro for complex tasks
const resultPro = await firecrawl.agent({
prompt: "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model: "spark-1-pro"
});
console.log(result.data);
console.log(resultPro.data);
```
```bash cURL theme={null}
# Using Spark 1 Mini (default)
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Find the pricing of Firecrawl",
"model": "spark-1-mini"
}'
# Using Spark 1 Pro for complex tasks
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
"model": "spark-1-pro"
}'
```
## Parameters
| Parameter | Type | Required | Description |
| ------------ | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `prompt` | string | **Yes** | Natural language description of the data you want to extract (max 10,000 characters) |
| `model` | string | No | Model to use: `spark-1-mini` (default) or `spark-1-pro` |
| `urls` | array | No | Optional list of URLs to focus the extraction |
| `schema` | object | No | Optional JSON schema for structured output |
| `maxCredits` | number | No | Maximum number of credits to spend on this agent task. Defaults to **2,500** if not set. The dashboard supports values up to **2,500**; for higher limits, set `maxCredits` via the API (values above 2,500 are always treated as paid requests). If the limit is reached, the job fails and **no data is returned**. Failed runs are not billed: credits used for AI reasoning are never charged on failure, any credits used for tool calls during the run (scraping, search, mapping, etc.) are refunded, and the response reports `creditsUsed: 0`. |
## Agent vs Extract: What's Improved
| Feature | Agent (New) | Extract |
| ----------------- | ----------- | -------- |
| URLs Required | No | Yes |
| Speed | Faster | Standard |
| Cost | Lower | Standard |
| Reliability | Higher | Standard |
| Query Flexibility | High | Moderate |
## Example Use Cases
* **Research**: "Find the top 5 AI startups and their funding amounts"
* **Competitive Analysis**: "Compare pricing plans between Slack and Microsoft Teams"
* **Data Gathering**: "Extract contact information from company websites"
* **Content Summarization**: "Summarize the latest blog posts about web scraping"
## CSV Upload in Agent Playground
The [Agent Playground](https://www.firecrawl.dev/app/agent) supports CSV upload for batch processing. Your CSV can contain one or more columns of input data. For example, a single column of company names, or multiple columns such as company name, product, and website URL. Each row represents one item for the agent to process.
Upload your CSV, then add output columns using the "+" button in the grid header. Each column has its own prompt — click a column header to describe what the agent should find for that field (e.g., "CEO or founder name", "Total funding raised"). Hit Run, and the agent processes each row in parallel, filling in the results.
## API Reference
Check out the [Agent API Reference](/api-reference/endpoint/agent) for more details.
Have feedback or need help? Email [help@firecrawl.com](mailto:help@firecrawl.com).
## Pricing
Firecrawl Agent uses **dynamic billing** that scales with the complexity of your data extraction request. You pay based on the actual work Agent performs, ensuring fair pricing whether you're extracting simple data points or complex structured information from multiple sources.
### How Agent pricing works
Agent pricing is **dynamic and credit-based** during Research Preview:
* **Simple extractions** (like contact info from a single page) typically use fewer credits and cost less
* **Complex research tasks** (like competitive analysis across multiple domains) use more credits but reflect the total effort involved
* **Transparent usage** shows you exactly how many credits each request consumed
* **Credit conversion** automatically converts Agent usage into standard Firecrawl credits for easy billing
Credit usage varies based on the complexity of your prompt, the amount of data processed, and the structure of the output requested. As a rough guide, most agent runs consume **a few hundred credits**, though simpler single-page tasks may use less and complex multi-domain research may use more.
### Parallel Agents Pricing
If you run multiple agents in parallel with Spark-1 Fast, pricing is much more predictable: a flat 10 credits per cell (each output field filled in the playground grid).
### Getting started
**All users** receive **5 free daily runs**, which can be used from either the playground or the API, to explore Agent's capabilities without any cost.
Additional usage is billed against your standard Firecrawl credit balance.
### Managing costs
Agent can be expensive, but there are some ways to decrease the cost:
* **Start with free runs**: Use your 5 daily free requests to understand pricing
* **Set a `maxCredits` parameter**: Limit your spending by setting a maximum number of credits you're willing to spend. The dashboard caps this at 2,500 credits; to set a higher limit, use the `maxCredits` parameter directly via the API (note: values above 2,500 are always billed as paid requests)
* **Optimize prompts**: More specific prompts often use fewer credits
* **Break large tasks into smaller runs**: A single agent run has an output ceiling based on the underlying model's generation capacity (\~150-200 rows of structured data). For large extraction jobs, split by category, region, or URL batch (3-5 URLs per run) and merge the results, as shown in the sketch after this list. This also keeps each run well under the `maxCredits` limit.
* **Monitor usage**: Track your consumption through the dashboard
* **Set expectations**: Complex multi-domain research will use more credits than simple single-page extractions
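As an illustration of the split-and-merge approach above, here's a minimal sketch that runs the same prompt over small URL batches and merges the results (the URL list, prompt, and `companies` output field are hypothetical; pass a `schema` as shown earlier if you need a guaranteed output shape):

```python Python theme={null}
# Sketch: split a large extraction job into small agent runs and merge the results.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")

# Hypothetical input list: a dozen company sites to process
urls = [f"https://example.com/company/{i}" for i in range(12)]
all_rows = []

# 4 URLs per run keeps each job small and well under the maxCredits cap
for i in range(0, len(urls), 4):
    result = app.agent(
        prompt="Extract the company name and total funding raised",
        urls=urls[i:i + 4],
        max_credits=200,  # cap spend per run
    )
    # "companies" is a hypothetical field; its shape depends on your prompt/schema
    all_rows.extend(result.data.get("companies", []))

print(f"Collected {len(all_rows)} rows across {len(urls) // 4} runs")
```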
Try Agent now at [firecrawl.dev/app/agent](https://www.firecrawl.dev/app/agent) to see how credit usage scales with your specific use cases.
Pricing is subject to change as we move from Research Preview to general availability. Current users will receive advance notice of any pricing updates.
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Batch Scrape
Source: https://docs.firecrawl.dev/features/batch-scrape
Scrape multiple URLs in a single batch job
Batch scrape lets you scrape multiple URLs in a single job. Pass a list of URLs and optional parameters, and Firecrawl processes them concurrently and returns all results together.
* Works like `/crawl` but for an explicit list of URLs
* Synchronous and asynchronous modes
* Supports all scrape options including structured extraction
* Configurable concurrency per job
## How it works
You can run a batch scrape in two ways:
| Mode | SDK method (JS / Python) | Behavior |
| ------------ | ----------------------------------------- | ---------------------------------------------------------------- |
| Synchronous | `batchScrape` / `batch_scrape` | Starts the batch and waits for completion, returning all results |
| Asynchronous | `startBatchScrape` / `start_batch_scrape` | Starts the batch and returns a job ID for polling or webhooks |
## Basic usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
start = firecrawl.start_batch_scrape([
"https://firecrawl.dev",
"https://docs.firecrawl.dev",
], formats=["markdown"]) # returns id
job = firecrawl.batch_scrape([
"https://firecrawl.dev",
"https://docs.firecrawl.dev",
], formats=["markdown"], poll_interval=2, wait_timeout=120)
print(job.status, job.completed, job.total)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Start a batch scrape job
const { id } = await firecrawl.startBatchScrape([
'https://firecrawl.dev',
'https://docs.firecrawl.dev'
], {
options: { formats: ['markdown'] },
});
// Wait for completion
const job = await firecrawl.batchScrape([
'https://firecrawl.dev',
'https://docs.firecrawl.dev'
], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 });
console.log(job.status, job.completed, job.total);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
"formats": ["markdown"]
}'
```
### Response
Calling `batchScrape` / `batch_scrape` returns the full results when the batch completes.
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
Calling `startBatchScrape` / `start_batch_scrape` returns a job ID you can track via `getBatchScrapeStatus` / `get_batch_scrape_status`, the API endpoint `/batch/scrape/{id}`, or webhooks. Job results are available via the API for 24 hours after completion. After this period, you can still view your batch scrape history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
## Concurrency
By default, a batch scrape job uses your team's full concurrent browser limit (see [Rate Limits](/rate-limits)). You can lower this per job with the `maxConcurrency` parameter.
For example, `maxConcurrency: 50` limits that job to 50 simultaneous scrapes. Setting this value too low on large batches will significantly slow down processing, so only reduce it if you need to leave capacity for other concurrent jobs.
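For example, here's how you might cap a batch at 50 simultaneous scrapes with the Python SDK (a sketch; it assumes the SDK exposes the API's `maxConcurrency` option as `max_concurrency`):

```python Python theme={null}
# Sketch: limit this batch job to 50 concurrent scrapes.
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

job = firecrawl.batch_scrape(
    ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
    formats=["markdown"],
    max_concurrency=50,  # assumption: SDK name for the maxConcurrency parameter
)
print(job.status, job.completed, job.total)
```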
## Structured extraction
You can use batch scrape to extract structured data from every page in the batch. This is useful when you want the same schema applied to a list of URLs.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Scrape multiple websites:
batch_scrape_result = firecrawl.batch_scrape(
['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
formats=[{
'type': 'json',
'prompt': 'Extract the title and description from the page.',
'schema': {
'type': 'object',
'properties': {
'title': {'type': 'string'},
'description': {'type': 'string'}
},
'required': ['title', 'description']
}
}]
)
print(batch_scrape_result)
# Or, you can use the start method:
batch_scrape_job = firecrawl.start_batch_scrape(
['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
formats=[{
'type': 'json',
'prompt': 'Extract the title and description from the page.',
'schema': {
'type': 'object',
'properties': {
'title': {'type': 'string'},
'description': {'type': 'string'}
},
'required': ['title', 'description']
}
}]
)
print(batch_scrape_job)
# You can then use the job ID to check the status of the batch scrape:
batch_scrape_status = firecrawl.get_batch_scrape_status(batch_scrape_job.id)
print(batch_scrape_status)
```
```js Node theme={null}
import Firecrawl, { ScrapeResponse } from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
// Define schema to extract contents into
const schema = {
type: "object",
properties: {
title: { type: "string" },
description: { type: "string" }
},
required: ["title", "description"]
};
// Scrape multiple websites (synchronous):
const batchScrapeResult = await firecrawl.batchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], {
formats: [
{
type: "json",
prompt: "Extract the title and description from the page.",
schema: schema
}
]
});
// Output all the results of the batch scrape:
console.log(batchScrapeResult)
// Or, you can use the start method:
const batchScrapeJob = await firecrawl.startBatchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], {
formats: [
{
type: "json",
prompt: "Extract the title and description from the page.",
schema: schema
}
]
});
console.log(batchScrapeJob)
// You can then use the job ID to check the status of the batch scrape:
const batchScrapeStatus = await firecrawl.getBatchScrapeStatus(batchScrapeJob.id);
console.log(batchScrapeStatus)
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"urls": ["https://docs.firecrawl.dev", "https://docs.firecrawl.dev/sdks/overview"],
"formats" : [{
"type": "json",
"prompt": "Extract the title and description from the page.",
"schema": {
"type": "object",
"properties": {
"title": {
"type": "string"
},
"description": {
"type": "string"
}
},
"required": [
"title",
"description"
]
}
}]
}'
```
### Response
`batchScrape` / `batch_scrape` returns full results:
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
"data": [
{
"json": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot."
}
},
...
]
}
```
`startBatchScrape` / `start_batch_scrape` returns a job ID:
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
## Webhooks
You can configure webhooks to receive real-time notifications as each URL in your batch is scraped. This lets you process results immediately instead of waiting for the entire batch to complete.
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
],
"webhook": {
"url": "https://your-domain.com/webhook",
"metadata": {
"any_key": "any_value"
},
"events": ["started", "page", "completed"]
}
}'
```
### Event types
| Event | Description |
| ------------------------ | ------------------------------------- |
| `batch_scrape.started` | The batch scrape job has begun |
| `batch_scrape.page` | A single URL was successfully scraped |
| `batch_scrape.completed` | All URLs have been processed |
| `batch_scrape.failed` | The batch scrape encountered an error |
### Payload
Each webhook delivery includes a JSON body with the following structure:
```json theme={null}
{
"success": true,
"type": "batch_scrape.page",
"id": "batch-job-id",
"data": [...],
"metadata": {},
"error": null
}
```
### Verifying webhook signatures
Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with.
1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings
2. Extract the signature from the `X-Firecrawl-Signature` header
3. Compute HMAC-SHA256 of the raw request body using your secret
4. Compare with the signature header using a timing-safe function
Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...`
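As a minimal Python sketch of those four steps (the function name is illustrative, and the `sha256=` header parsing follows the format described above; see the Webhook Security docs below for the complete, supported examples):

```python Python theme={null}
# Sketch: verify the X-Firecrawl-Signature header against the raw request body.
import hashlib
import hmac

def verify_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Header format: "sha256=<hex digest>"
    received = signature_header.removeprefix("sha256=")
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Timing-safe comparison guards against signature-guessing via response timing
    return hmac.compare_digest(expected, received)
```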
For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security).
For comprehensive webhook documentation including detailed event payloads, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview).
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Change Tracking
Source: https://docs.firecrawl.dev/features/change-tracking
Detect and monitor changes in web content between scrapes
Change tracking compares the current content of a page against the last time you scraped it. Add `changeTracking` to your `formats` array to detect whether a page is new, unchanged, or modified, and optionally get a structured diff of what changed.
* Works with `/scrape`, `/crawl`, and `/batch/scrape`
* Two diff modes: `git-diff` for line-level changes, `json` for field-level comparison
* Scoped to your team, and optionally scoped to a tag that you pass in
## How it works
Every scrape with `changeTracking` enabled stores a snapshot and compares it against the previous snapshot for that URL. Snapshots are stored persistently and do not expire, so comparisons remain accurate regardless of how much time has passed between scrapes.
| Scrape | Result |
| ----------------- | -------------------------------------------------- |
| First time | `changeStatus: "new"` (no previous version exists) |
| Content unchanged | `changeStatus: "same"` |
| Content modified | `changeStatus: "changed"` (diff data available) |
| Page removed | `changeStatus: "removed"` |
The response includes these fields in the `changeTracking` object:
| Field | Type | Description |
| ------------------ | --------------------- | ---------------------------------------------------------------------------------------------- |
| `previousScrapeAt` | `string \| null` | Timestamp of the previous scrape (`null` on first scrape) |
| `changeStatus` | `string` | `"new"`, `"same"`, `"changed"`, or `"removed"` |
| `visibility` | `string` | `"visible"` (discoverable via links/sitemap) or `"hidden"` (URL works but is no longer linked) |
| `diff` | `object \| undefined` | Line-level diff (only present in `git-diff` mode when status is `"changed"`) |
| `json` | `object \| undefined` | Field-level comparison (only present in `json` mode when status is `"changed"`) |
## Basic usage
Include both `markdown` and `changeTracking` in the `formats` array. The `markdown` format is required because change tracking compares pages via their markdown content.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
result = firecrawl.scrape(
"https://example.com/pricing",
formats=["markdown", "changeTracking"]
)
print(result.changeTracking)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const result = await firecrawl.scrape('https://example.com/pricing', {
formats: ['markdown', 'changeTracking']
});
console.log(result.changeTracking);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/pricing",
"formats": ["markdown", "changeTracking"]
}'
```
### Response
On the first scrape, `changeStatus` is `"new"` and `previousScrapeAt` is `null`:
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Pricing\n\nStarter: $9/mo\nPro: $29/mo...",
"changeTracking": {
"previousScrapeAt": null,
"changeStatus": "new",
"visibility": "visible"
}
}
}
```
On subsequent scrapes, `changeStatus` reflects whether content changed:
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Pricing\n\nStarter: $12/mo\nPro: $39/mo...",
"changeTracking": {
"previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible"
}
}
}
```
## Git-diff mode
The `git-diff` mode returns line-by-line changes in a format similar to `git diff`. Pass an object in the `formats` array with `modes: ["git-diff"]`:
```python Python theme={null}
result = firecrawl.scrape(
"https://example.com/pricing",
formats=[
"markdown",
{
"type": "changeTracking",
"modes": ["git-diff"]
}
]
)
if result.changeTracking.changeStatus == "changed":
print(result.changeTracking.diff.text)
```
```js Node theme={null}
const result = await firecrawl.scrape('https://example.com/pricing', {
formats: [
'markdown',
{ type: 'changeTracking', modes: ['git-diff'] }
]
});
if (result.changeTracking.changeStatus === 'changed') {
console.log(result.changeTracking.diff.text);
}
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/pricing",
"formats": [
"markdown",
{ "type": "changeTracking", "modes": ["git-diff"] }
]
}'
```
### Response
The `diff` object contains both a plain-text diff and a structured JSON representation:
```json theme={null}
{
"changeTracking": {
"previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible",
"diff": {
"text": "@@ -1,3 +1,3 @@\n # Pricing\n-Starter: $9/mo\n-Pro: $29/mo\n+Starter: $12/mo\n+Pro: $39/mo",
"json": {
"files": [{
"chunks": [{
"content": "@@ -1,3 +1,3 @@",
"changes": [
{ "type": "normal", "content": "# Pricing" },
{ "type": "del", "ln": 2, "content": "Starter: $9/mo" },
{ "type": "del", "ln": 3, "content": "Pro: $29/mo" },
{ "type": "add", "ln": 2, "content": "Starter: $12/mo" },
{ "type": "add", "ln": 3, "content": "Pro: $39/mo" }
]
}]
}]
}
}
}
}
```
The structured `diff.json` object contains:
* `files`: array of changed files (typically one for web pages)
* `chunks`: sections of changes within a file
* `changes`: individual line changes with `type` (`"add"`, `"del"`, or `"normal"`), line number (`ln`), and `content`
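As an example of walking that structure, the sketch below groups changed lines by type (the helper name and return shape are illustrative):
```python Python theme={null}
def summarize_diff(change_tracking: dict) -> dict:
    """Group line changes from changeTracking.diff.json by type."""
    added, removed = [], []
    for changed_file in change_tracking["diff"]["json"]["files"]:
        for chunk in changed_file["chunks"]:
            for change in chunk["changes"]:
                if change["type"] == "add":
                    added.append(change["content"])
                elif change["type"] == "del":
                    removed.append(change["content"])
    return {"added": added, "removed": removed}

# With the sample diff above:
# {"added": ["Starter: $12/mo", "Pro: $39/mo"], "removed": ["Starter: $9/mo", "Pro: $29/mo"]}
```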
## JSON mode
The `json` mode extracts specific fields from both the current and previous version of the page using a schema you define. This is useful for tracking changes in structured data like prices, stock levels, or metadata without parsing a full diff.
Pass `modes: ["json"]` with a `schema` defining the fields to extract:
```python Python theme={null}
result = firecrawl.scrape(
"https://example.com/product/widget",
formats=[
"markdown",
{
"type": "changeTracking",
"modes": ["json"],
"schema": {
"type": "object",
"properties": {
"price": { "type": "string" },
"availability": { "type": "string" }
}
}
}
]
)
if result.changeTracking.changeStatus == "changed":
changes = result.changeTracking.json
print(f"Price: {changes['price']['previous']} → {changes['price']['current']}")
```
```js Node theme={null}
const result = await firecrawl.scrape('https://example.com/product/widget', {
formats: [
'markdown',
{
type: 'changeTracking',
modes: ['json'],
schema: {
type: 'object',
properties: {
price: { type: 'string' },
availability: { type: 'string' }
}
}
}
]
});
if (result.changeTracking.changeStatus === 'changed') {
const changes = result.changeTracking.json;
console.log(`Price: ${changes.price.previous} → ${changes.price.current}`);
}
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product/widget",
"formats": [
"markdown",
{
"type": "changeTracking",
"modes": ["json"],
"schema": {
"type": "object",
"properties": {
"price": { "type": "string" },
"availability": { "type": "string" }
}
}
}
]
}'
```
### Response
Each field in the schema is returned with `previous` and `current` values:
```json theme={null}
{
"changeTracking": {
"previousScrapeAt": "2025-06-05T08:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible",
"json": {
"price": {
"previous": "$19.99",
"current": "$24.99"
},
"availability": {
"previous": "In Stock",
"current": "In Stock"
}
}
}
}
```
You can also pass an optional `prompt` to guide the LLM extraction alongside the schema.
JSON mode uses LLM extraction and costs **5 credits per page**. Basic change tracking and `git-diff` mode have no additional cost.
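Unchanged fields are returned too (like `availability` above), so you may want to filter for fields whose values actually differ. A small helper sketch:
```python Python theme={null}
def changed_fields(ct_json: dict) -> dict:
    """Keep only schema fields whose previous and current values differ."""
    return {
        field: values
        for field, values in ct_json.items()
        if values["previous"] != values["current"]
    }

# With the sample response above:
# {"price": {"previous": "$19.99", "current": "$24.99"}}
```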
## Using tags
By default, change tracking compares against the most recent scrape of the same URL scraped by your team. Tags let you maintain **separate tracking histories** for the same URL, which is useful when you monitor the same page at different intervals or in different contexts.
```python Python theme={null}
# Hourly monitoring (compared against last "hourly" scrape)
result = firecrawl.scrape(
"https://example.com/pricing",
formats=[
"markdown",
{ "type": "changeTracking", "tag": "hourly" }
]
)
# Daily summary (compared against last "daily" scrape)
result = firecrawl.scrape(
"https://example.com/pricing",
formats=[
"markdown",
{ "type": "changeTracking", "tag": "daily" }
]
)
```
```js Node theme={null}
// Hourly monitoring (compared against last "hourly" scrape)
const result = await firecrawl.scrape('https://example.com/pricing', {
formats: [
'markdown',
{ type: 'changeTracking', tag: 'hourly' }
]
});
// Daily summary (compared against last "daily" scrape)
const result2 = await firecrawl.scrape('https://example.com/pricing', {
formats: [
'markdown',
{ type: 'changeTracking', tag: 'daily' }
]
});
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/pricing",
"formats": [
"markdown",
{ "type": "changeTracking", "tag": "hourly" }
]
}'
```
## Crawl with change tracking
Add change tracking to crawl operations to monitor an entire site for changes. Pass the `changeTracking` format inside `scrapeOptions`:
```python Python theme={null}
result = firecrawl.crawl(
"https://example.com",
limit=50,
scrape_options={
"formats": ["markdown", "changeTracking"]
}
)
for page in result.data:
status = page.changeTracking.changeStatus
url = page.metadata.url
print(f"{url}: {status}")
```
```js Node theme={null}
const result = await firecrawl.crawl('https://example.com', {
limit: 50,
scrapeOptions: {
formats: ['markdown', 'changeTracking']
}
});
for (const page of result.data) {
console.log(`${page.metadata.url}: ${page.changeTracking.changeStatus}`);
}
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limit": 50,
"scrapeOptions": {
"formats": ["markdown", "changeTracking"]
}
}'
```
## Batch scrape with change tracking
Use [batch scrape](/features/batch-scrape) to monitor a specific set of URLs:
```python Python theme={null}
result = firecrawl.batch_scrape(
[
"https://example.com/pricing",
"https://example.com/product/widget",
"https://example.com/blog/latest"
],
formats=["markdown", {"type": "changeTracking", "modes": ["git-diff"]}]
)
```
```js Node theme={null}
const result = await firecrawl.batchScrape([
'https://example.com/pricing',
'https://example.com/product/widget',
'https://example.com/blog/latest'
], {
options: {
formats: ['markdown', { type: 'changeTracking', modes: ['git-diff'] }]
}
});
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/pricing",
"https://example.com/product/widget",
"https://example.com/blog/latest"
],
"formats": [
"markdown",
{ "type": "changeTracking", "modes": ["git-diff"] }
]
}'
```
## Scheduling change tracking
Change tracking is most useful when you scrape on a regular schedule. You can automate this with cron, cloud schedulers, or workflow tools.
### Cron job
Create a script that scrapes a URL and alerts on changes:
```bash check-pricing.sh theme={null}
#!/bin/bash
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://competitor.com/pricing",
"formats": [
"markdown",
{
"type": "changeTracking",
"modes": ["json"],
"schema": {
"type": "object",
"properties": {
"starter_price": { "type": "string" },
"pro_price": { "type": "string" }
}
}
}
]
}')
STATUS=$(echo "$RESPONSE" | jq -r '.data.changeTracking.changeStatus')
if [ "$STATUS" = "changed" ]; then
echo "$RESPONSE" | jq '.data.changeTracking.json'
# Send alert via email, Slack, etc.
fi
```
Schedule it with `crontab -e`:
```bash theme={null}
0 */6 * * * /path/to/check-pricing.sh >> /var/log/price-monitor.log 2>&1
```
| Schedule | Expression |
| ------------------------ | ------------- |
| Every hour | `0 * * * *` |
| Every 6 hours | `0 */6 * * *` |
| Daily at 9 AM | `0 9 * * *` |
| Weekly on Monday at 8 AM | `0 8 * * 1` |
### Cloud and serverless schedulers
* **AWS**: EventBridge rule triggering a Lambda function
* **GCP**: Cloud Scheduler triggering a Cloud Function
* **Vercel / Netlify**: Cron-triggered serverless functions
* **GitHub Actions**: Scheduled workflows using the `schedule` trigger with a `cron` expression
### Workflow automation
No-code platforms like **n8n**, **Zapier**, and **Make** can call the Firecrawl API on a schedule and route results to Slack, email, or databases. See the [workflow automation guides](/developer-guides/workflow-automation/n8n).
## Webhooks
For async operations like crawl and batch scrape, use [webhooks](/webhooks/overview) to receive change tracking results as they arrive instead of polling.
```python Python theme={null}
job = firecrawl.start_crawl(
"https://example.com",
limit=50,
scrape_options={
"formats": [
"markdown",
{"type": "changeTracking", "modes": ["git-diff"]}
]
},
webhook={
"url": "https://your-server.com/firecrawl-webhook",
"headers": {"Authorization": "Bearer your-webhook-secret"},
"events": ["crawl.page", "crawl.completed"]
}
)
```
```js Node theme={null}
const { id } = await firecrawl.startCrawl('https://example.com', {
limit: 50,
scrapeOptions: {
formats: [
'markdown',
{ type: 'changeTracking', modes: ['git-diff'] }
]
},
webhook: {
url: 'https://your-server.com/firecrawl-webhook',
headers: { Authorization: 'Bearer your-webhook-secret' },
events: ['crawl.page', 'crawl.completed']
}
});
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limit": 50,
"scrapeOptions": {
"formats": [
"markdown",
{ "type": "changeTracking", "modes": ["git-diff"] }
]
},
"webhook": {
"url": "https://your-server.com/firecrawl-webhook",
"headers": { "Authorization": "Bearer your-webhook-secret" },
"events": ["crawl.page", "crawl.completed"]
}
}'
```
The `crawl.page` event payload includes the `changeTracking` object for each page:
```json theme={null}
{
"success": true,
"type": "crawl.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [{
"markdown": "# Pricing\n\nStarter: $12/mo...",
"metadata": {
"title": "Pricing",
"url": "https://example.com/pricing",
"statusCode": 200
},
"changeTracking": {
"previousScrapeAt": "2025-06-05T12:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible",
"diff": {
"text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo"
}
}
}]
}
```
For webhook configuration details (headers, metadata, events, retries, signature verification), see the [Webhooks documentation](/webhooks/overview).
## Configuration reference
The full set of options available when passing a `changeTracking` format object:
| Parameter | Type | Default | Description |
| --------- | ---------- | ---------- | ----------------------------------------------------------------- |
| `type` | `string` | (required) | Must be `"changeTracking"` |
| `modes` | `string[]` | `[]` | Diff modes to enable: `"git-diff"`, `"json"`, or both |
| `schema` | `object` | (none) | JSON Schema for field-level comparison (required for `json` mode) |
| `prompt` | `string` | (none) | Custom prompt to guide LLM extraction (used with `json` mode) |
| `tag` | `string` | `null` | Separate tracking history identifier |
### Data models
```ts TypeScript theme={null}
interface ChangeTrackingResult {
previousScrapeAt: string | null;
changeStatus: "new" | "same" | "changed" | "removed";
visibility: "visible" | "hidden";
diff?: {
text: string;
json: {
files: Array<{
from: string | null;
to: string | null;
chunks: Array<{
content: string;
changes: Array<{
type: "add" | "del" | "normal";
ln?: number;
ln1?: number;
ln2?: number;
content: string;
}>;
}>;
}>;
};
};
  json?: Record<string, { previous: any; current: any }>;
}
```
```python Python theme={null}
class ChangeTrackingData(BaseModel):
previous_scrape_at: Optional[str] = None
change_status: str # "new" | "same" | "changed" | "removed"
visibility: str # "visible" | "hidden"
diff: Optional[Dict[str, Any]] = None
json: Optional[Dict[str, Any]] = None
```
## Important details
The `markdown` format must always be included alongside `changeTracking`. Change tracking compares pages via their markdown content.
* **Snapshot retention**: Snapshots are stored persistently and do not expire. A scrape performed months after the previous one will still compare correctly against the earlier snapshot.
* **Scoping**: Comparisons are scoped to your team. Your first scrape of any URL returns `"new"`, even if other users have scraped it.
* **URL matching**: Previous scrapes are matched on exact source URL, team ID, `markdown` format, and `tag`. Keep URLs consistent between scrapes.
* **Parameter consistency**: Using different `includeTags`, `excludeTags`, or `onlyMainContent` settings across scrapes of the same URL produces unreliable comparisons.
* **Comparison algorithm**: The algorithm is resistant to whitespace and content order changes. Iframe source URLs are ignored to handle captcha/antibot randomization.
* **Caching**: Requests with `changeTracking` bypass the index cache. The `maxAge` parameter is ignored.
* **Error handling**: Monitor the `warning` field in responses and handle the `changeTracking` object potentially being absent (this can occur if the database lookup for the previous scrape times out).
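A defensive pattern for that last point might look like the sketch below (attribute access mirrors the Python examples earlier on this page):
```python Python theme={null}
result = firecrawl.scrape(
    "https://example.com/pricing",
    formats=["markdown", "changeTracking"],
)
if result.warning:
    print("warning:", result.warning)

ct = result.changeTracking
if ct is None:
    # Previous-scrape lookup timed out; skip the comparison this run
    print("no change tracking data available")
elif ct.changeStatus == "changed":
    print("changed since", ct.previousScrapeAt)
```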
## Billing
| Mode | Cost |
| --------------------- | --------------------------------------- |
| Basic change tracking | No extra cost (standard scrape credits) |
| `git-diff` mode | No extra cost |
| `json` mode | 5 credits per page |
# Crawl
Source: https://docs.firecrawl.dev/features/crawl
Recursively crawl a website and get content from every page
Crawl submits a URL to Firecrawl and recursively discovers and scrapes every reachable subpage. It handles sitemaps, JavaScript rendering, and rate limits automatically, returning clean markdown or structured data for each page.
* Discovers pages via sitemap and recursive link traversal
* Supports path filtering, depth limits, and subdomain/external link control
* Returns results via polling, WebSocket, or webhook
Test crawling in the interactive playground — no code required.
## Installation
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
```
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl
# Authenticate (one-time setup)
firecrawl login
```
## Basic usage
Submit a crawl job by calling `POST /v2/crawl` with a starting URL. The endpoint returns a job ID that you use to poll for results.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
docs = firecrawl.crawl(url="https://docs.firecrawl.dev", limit=10)
print(docs)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const docs = await firecrawl.crawl('https://docs.firecrawl.dev', { limit: 10 });
console.log(docs);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 10
}'
```
```bash CLI theme={null}
# Start a crawl job (returns job ID)
firecrawl crawl https://firecrawl.dev
# Wait for completion with progress
firecrawl crawl https://firecrawl.dev --wait --progress --limit 100
```
Each page crawled consumes 1 credit. The default crawl `limit` is 10,000 pages. Before starting, the crawl endpoint checks that your remaining credits can cover the `limit` — if not, it returns a **402 (Payment Required)** error. Set a lower `limit` to match your intended crawl size (e.g. `limit: 100`) to avoid this. Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, and PDF parsing costs 1 credit per PDF page.
### Scrape options
All options from the [Scrape endpoint](/api-reference/endpoint/scrape) are available in crawl via `scrapeOptions` (JS) / `scrape_options` (Python). These apply to every page the crawler scrapes, including formats, proxy, caching, actions, location, and tags.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
# Crawl with scrape options
response = firecrawl.crawl('https://example.com',
limit=100,
scrape_options={
'formats': [
'markdown',
{ 'type': 'json', 'schema': { 'type': 'object', 'properties': { 'title': { 'type': 'string' } } } }
],
'proxy': 'auto',
'max_age': 600000,
'only_main_content': True
}
)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });
// Crawl with scrape options
const crawlResponse = await firecrawl.crawl('https://example.com', {
limit: 100,
scrapeOptions: {
formats: [
'markdown',
{
type: 'json',
schema: { type: 'object', properties: { title: { type: 'string' } } },
},
],
proxy: 'auto',
maxAge: 600000,
onlyMainContent: true,
},
});
```
## Checking crawl status
Use the job ID to poll for the crawl status and retrieve results.
```python Python theme={null}
status = firecrawl.get_crawl_status("<job-id>")
print(status)
```
```js Node theme={null}
const status = await firecrawl.getCrawlStatus("<job-id>");
console.log(status);
```
```bash cURL theme={null}
# After starting a crawl, poll status by jobId
curl -s -X GET "https://api.firecrawl.dev/v2/crawl/<job-id>" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
```bash CLI theme={null}
# Check crawl status using job ID
firecrawl crawl <job-id>
```
Job results are available via the API for 24 hours after completion. After this period, you can still view your crawl history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
Pages in the crawl results `data` array are pages that Firecrawl successfully scraped, even if the target site returned an HTTP error like 404. The `metadata.statusCode` field shows the HTTP status code from the target site. To retrieve pages that Firecrawl itself failed to scrape (e.g. network errors, timeouts, or robots.txt blocks), use the dedicated [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint (`GET /crawl/{id}/errors`).
### Response handling
The response varies based on the crawl's status. While a crawl is still running, or when a result set exceeds 10MB, a `next` URL parameter is provided; request that URL to retrieve the next 10MB of data. If the `next` parameter is absent, you have reached the end of the crawl data.
The `skip` and `next` parameters are only relevant when hitting the API directly. If you're using the SDK, pagination is handled automatically and all results are returned at once.
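If you do hit the API directly, a minimal pagination loop might look like this (a sketch; it assumes the job has finished and simply follows `next` links):
```python Python theme={null}
import requests

API_KEY = "fc-YOUR-API-KEY"

def fetch_all_crawl_results(job_id: str) -> list:
    """Collect every document from a finished crawl by following `next` links."""
    url = f"https://api.firecrawl.dev/v2/crawl/{job_id}"
    documents = []
    while url:
        body = requests.get(url, headers={"Authorization": f"Bearer {API_KEY}"}).json()
        documents.extend(body.get("data", []))
        url = body.get("next")  # absent once the last chunk has been returned
    return documents
```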
```json Scraping theme={null}
{
"status": "scraping",
"total": 36,
"completed": 10,
"creditsUsed": 10,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=10",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=26",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
## SDK methods
There are two ways to use crawl with the SDK.
### Crawl and wait
The `crawl` method waits for the crawl to complete and returns the full response. It handles pagination automatically. This is recommended for most use cases.
```python Python theme={null}
from firecrawl import Firecrawl
from firecrawl.types import ScrapeOptions
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Crawl a website:
crawl_status = firecrawl.crawl(
'https://firecrawl.dev',
limit=100,
scrape_options=ScrapeOptions(formats=['markdown', 'html']),
poll_interval=30
)
print(crawl_status)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
const crawlResponse = await firecrawl.crawl('https://firecrawl.dev', {
limit: 100,
scrapeOptions: {
formats: ['markdown', 'html'],
}
})
console.log(crawlResponse)
```
The response includes the crawl status and all scraped data:
```bash Python theme={null}
success=True
status='completed'
completed=100
total=100
creditsUsed=100
expiresAt=datetime.datetime(2025, 4, 23, 19, 21, 17, tzinfo=TzInfo(UTC))
next=None
data=[
Document(
markdown='[Day 7 - Launch Week III.Integrations DayApril 14th to 20th](...',
metadata={
'title': '15 Python Web Scraping Projects: From Beginner to Advanced',
...
'scrapeId': '97dcf796-c09b-43c9-b4f7-868a7a5af722',
'sourceURL': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
'url': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
'statusCode': 200
}
),
...
]
```
```json Node theme={null}
{
  success: true,
  status: "completed",
  completed: 100,
  total: 100,
  creditsUsed: 100,
  expiresAt: "2025-04-23T19:28:45.000Z",
  data: [
    {
      markdown: "[Day 7 - Launch Week III.Integrations DayApril ...",
      html: "...",
      metadata: { ... }
    },
    ...
  ]
}
```
### Start then check status
Alternatively, `startCrawl` / `start_crawl` submits the job and returns immediately; poll the crawl status (as shown above) to retrieve results. The initial response returns the job ID:
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/crawl/123-456-789"
}
```
## Real-time results with WebSocket
The watcher method provides real-time updates as pages are crawled. Start a crawl, then subscribe to events for immediate data processing.
```python Python theme={null}
import asyncio
from firecrawl import AsyncFirecrawl
async def main():
firecrawl = AsyncFirecrawl(api_key="fc-YOUR-API-KEY")
# Start a crawl first
started = await firecrawl.start_crawl("https://firecrawl.dev", limit=5)
# Watch updates (snapshots) until terminal status
async for snapshot in firecrawl.watcher(started.id, kind="crawl", poll_interval=2, timeout=120):
if snapshot.status == "completed":
print("DONE", snapshot.status)
for doc in snapshot.data:
print("DOC", doc.metadata.source_url if doc.metadata else None)
elif snapshot.status == "failed":
print("ERR", snapshot.status)
else:
print("STATUS", snapshot.status, snapshot.completed, "/", snapshot.total)
asyncio.run(main())
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
// Start a crawl and then watch it
const { id } = await firecrawl.startCrawl('https://mendable.ai', {
excludePaths: ['blog/*'],
limit: 5,
});
const watcher = firecrawl.watcher(id, { kind: 'crawl', pollInterval: 2, timeout: 120 });
watcher.on('document', (doc) => {
console.log('DOC', doc);
});
watcher.on('error', (err) => {
console.error('ERR', err?.error || err);
});
watcher.on('done', (state) => {
console.log('DONE', state.status);
});
// Begin watching (WS with HTTP fallback)
await watcher.start();
```
## Webhooks
You can configure webhooks to receive real-time notifications as your crawl progresses. This allows you to process pages as they are scraped instead of waiting for the entire crawl to complete.
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 100,
"webhook": {
"url": "https://your-domain.com/webhook",
"metadata": {
"any_key": "any_value"
},
"events": ["started", "page", "completed"]
}
}'
```
### Event types
| Event | Description |
| ----------------- | ---------------------------------------- |
| `crawl.started` | Fires when the crawl begins |
| `crawl.page` | Fires for each page successfully scraped |
| `crawl.completed` | Fires when the crawl finishes |
| `crawl.failed` | Fires if the crawl encounters an error |
### Payload
```json theme={null}
{
"success": true,
"type": "crawl.page",
"id": "crawl-job-id",
"data": [...], // Page data for 'page' events
"metadata": {}, // Your custom metadata
"error": null
}
```
### Verifying webhook signatures
Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with.
1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings
2. Extract the signature from the `X-Firecrawl-Signature` header
3. Compute HMAC-SHA256 of the raw request body using your secret
4. Compare with the signature header using a timing-safe function
Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...`
For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security). For comprehensive webhook documentation including detailed event payloads, payload structure, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview).
## Configuration reference
The full set of parameters available when submitting a crawl job:
| Parameter | Type | Default | Description |
| ----------------------- | ---------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url` | `string` | (required) | The starting URL to crawl from |
| `limit` | `integer` | `10000` | Maximum number of pages to crawl |
| `maxDiscoveryDepth` | `integer` | (none) | Maximum depth from the root URL based on link-discovery hops, not the number of `/` segments in the URL. Each time a new URL is found on a page, it is assigned a depth one higher than the page it was discovered on. The root site and sitemapped pages have a discovery depth of 0. Pages at the max depth are still scraped, but links on them are not followed. |
| `includePaths` | `string[]` | (none) | URL pathname regex patterns to include. Only matching paths are crawled. |
| `excludePaths` | `string[]` | (none) | URL pathname regex patterns to exclude from the crawl |
| `regexOnFullURL` | `boolean` | `false` | Match `includePaths`/`excludePaths` against the full URL (including query parameters) instead of just the pathname |
| `crawlEntireDomain` | `boolean` | `false` | Follow internal links to sibling or parent URLs, not just child paths |
| `allowSubdomains` | `boolean` | `false` | Follow links to subdomains of the main domain |
| `allowExternalLinks` | `boolean` | `false` | Follow links to external websites |
| `sitemap` | `string` | `"include"` | Sitemap handling: `"include"` (default), `"skip"`, or `"only"` |
| `ignoreQueryParameters` | `boolean` | `false` | Avoid re-scraping the same path with different query parameters |
| `ignoreRobotsTxt` | `boolean` | `false` | Ignore the website's robots.txt rules. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. |
| `robotsUserAgent` | `string` | (none) | Custom User-Agent string for robots.txt evaluation. When set, robots.txt is fetched with this User-Agent and rules are matched against it instead of the default. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. |
| `delay` | `number` | (none) | Delay in seconds between scrapes to respect rate limits. Setting this forces concurrency to 1. |
| `maxConcurrency` | `integer` | (none) | Maximum concurrent scrapes. Defaults to your team's concurrency limit. |
| `scrapeOptions` | `object` | (none) | Options applied to every scraped page (formats, proxy, caching, actions, etc.) |
| `webhook` | `object` | (none) | Webhook configuration for real-time notifications |
| `prompt` | `string` | (none) | Natural language prompt to generate crawl options. Explicitly set parameters override generated equivalents. |
## Important details
By default, crawl ignores sublinks that are not children of the URL you provide. For example, `website.com/other-parent/blog-1` would not be returned if you crawled `website.com/blogs/`. Use the `crawlEntireDomain` parameter to include sibling and parent paths. To crawl subdomains like `blog.website.com` when crawling `website.com`, use the `allowSubdomains` parameter.
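For instance, broadening a crawl of `website.com/blogs/` might look like this (a sketch; assuming the Python SDK accepts these as snake_case keyword arguments mirroring the camelCase API parameters):
```python Python theme={null}
docs = firecrawl.crawl(
    "https://website.com/blogs/",
    limit=100,
    crawl_entire_domain=True,  # also follow sibling/parent paths like /other-parent/
    allow_subdomains=True,     # also follow blog.website.com, docs.website.com, ...
)
```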
* **Sitemap discovery**: By default, the crawler includes the website's sitemap to discover URLs (`sitemap: "include"`). If you set `sitemap: "skip"`, only pages reachable through HTML links from the root URL are found. Assets like PDFs or deeply nested pages listed in the sitemap but not directly linked from HTML will be missed. For maximum coverage, keep the default setting.
* **Credit usage**: Each page crawled costs 1 credit. JSON mode adds 4 credits per page, enhanced proxy adds 4 credits per page, and PDF parsing costs 1 credit per PDF page.
* **Result expiration**: Job results are available via the API for 24 hours after completion. After that, view results in the [activity logs](https://www.firecrawl.dev/app/logs).
* **Crawl errors**: The `data` array contains pages Firecrawl successfully scraped. Use the [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint to retrieve pages that failed due to network errors, timeouts, or robots.txt blocks.
* **Non-deterministic results**: Crawl results may vary between runs of the same configuration. Pages are scraped concurrently, so the order in which links are discovered depends on network timing and which pages finish loading first. This means different branches of a site may be explored to different extents near the depth boundary, especially at higher `maxDiscoveryDepth` values. To get more deterministic results, set `maxConcurrency` to `1` or use `sitemap: "only"` if the site has a comprehensive sitemap.
# Document Parsing
Source: https://docs.firecrawl.dev/features/document-parsing
Learn about document parsing capabilities.
Firecrawl provides powerful document parsing capabilities, allowing you to extract structured content from various document formats. This feature is particularly useful for processing files like spreadsheets, Word documents, and more.
## Supported Document Formats
Firecrawl currently supports the following document formats:
* **Excel Spreadsheets** (`.xlsx`, `.xls`)
* Each worksheet is converted to an HTML table
* Worksheets are separated by H2 headings with the sheet name
* Preserves cell formatting and data types
* **Word Documents** (`.docx`, `.doc`, `.odt`, `.rtf`)
* Extracts text content while preserving document structure
* Maintains headings, paragraphs, lists, and tables
* Preserves basic formatting and styling
* **PDF Documents** (`.pdf`)
* Extracts text content with layout information
* Preserves document structure including sections and paragraphs
* Handles both text-based and scanned PDFs (with OCR support)
* Supports `mode` option to control parsing strategy: `fast` (text-only), `auto` (text with OCR fallback, default), or `ocr` (force OCR)
* Priced at 1 credit per page. See [Pricing](https://firecrawl.dev/pricing) for details.
### PDF Parsing Modes
Use the `parsers` option to control how PDFs are processed:
| Mode | Description |
| ------ | --------------------------------------------------------------------------------------------------------------------- |
| `auto` | Attempts fast text-based extraction first, falls back to OCR if needed. This is the default. |
| `fast` | Text-based parsing only (embedded text). Fastest option, but will not extract text from scanned or image-heavy pages. |
| `ocr` | Forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page. |
```js theme={null}
// Object syntax with mode
parsers: [{ type: "pdf", mode: "ocr", maxPages: 20 }]
// Default (auto mode)
parsers: [{ type: "pdf" }]
```
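In a full scrape call, that might look like the following Python sketch (assuming the SDK passes `parsers` through in the same shape as above; the URL is illustrative):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Force OCR for a scanned PDF and cap parsing at the first 20 pages
doc = firecrawl.scrape(
    "https://example.com/scanned-report.pdf",
    formats=["markdown"],
    parsers=[{"type": "pdf", "mode": "ocr", "maxPages": 20}],
)
print(doc.markdown)
```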
## How to Use Document Parsing
Document parsing in Firecrawl works in two ways:
1. **URL-based parsing (`/v2/scrape`)**: provide a URL that points to a supported document type.
2. **File upload parsing (`/v2/parse`)**: upload file bytes directly with `multipart/form-data`.
For URL-based parsing, Firecrawl detects file type from extension or content type automatically.
### Upload documents with `/v2/parse`
Use `/v2/parse` when the source document is local or not publicly accessible by URL.
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/parse" \
-H "Authorization: Bearer fc-YOUR-API-KEY" \
-F 'options={"formats":["markdown"]}' \
-F "file=@./document.docx;type=application/vnd.openxmlformats-officedocument.wordprocessingml.document"
```
```python Python theme={null}
from firecrawl import Firecrawl
from firecrawl.types import ScrapeOptions

app = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = app.parse(
    data="<html><body><h1>Hello</h1></body></html>",  # inline document content
    filename="upload.html",
    content_type="text/html",
    options=ScrapeOptions(formats=["markdown"]),
)
print(doc.markdown)
```
### Example: Scraping an Excel File
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com/data.xlsx');
console.log(doc.markdown);
```
### Example: Scraping a Word Document
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com/data.docx');
console.log(doc.markdown);
```
## Output Format
All supported document types are converted to clean, structured markdown. For example, an Excel file with multiple sheets might be converted to:
```markdown theme={null}
## Sheet1
| Name | Value |
|-------|-------|
| Item 1 | 100 |
| Item 2 | 200 |
## Sheet2
| Date | Description |
|------------|--------------|
| 2023-01-01 | First quarter|
```
# Enhanced Mode
Source: https://docs.firecrawl.dev/features/enhanced-mode
Use enhanced proxies for reliable scraping on complex sites
Firecrawl provides different proxy types to help you scrape websites with varying levels of complexity. Set the `proxy` parameter to control which proxy strategy is used for a request.
## Proxy types
Firecrawl supports three proxy types:
| Type | Description | Speed | Cost |
| ---------- | ------------------------------------------------------------ | ------ | ----------------------------------------------------------- |
| `basic` | Standard proxies suitable for most sites | Fast | 1 credit |
| `enhanced` | Enhanced proxies for complex sites | Slower | 5 credits per request |
| `auto` | Tries `basic` first, then retries with `enhanced` on failure | Varies | 1 credit if basic succeeds, 5 credits if enhanced is needed |
If you do not specify a proxy, Firecrawl defaults to `auto`.
## Basic usage
Set the `proxy` parameter to choose a proxy strategy. The following example uses `auto`, which lets Firecrawl decide when to escalate to enhanced proxies.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')
# Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
doc = firecrawl.scrape('https://example.com', formats=['markdown'], proxy='auto')
print(doc.warning or 'ok')
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
const doc = await firecrawl.scrape('https://example.com', {
formats: ['markdown'],
proxy: 'auto'
});
console.log(doc.warning || 'ok');
```
```bash cURL theme={null}
// Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"proxy": "auto"
}'
```
Enhanced proxy requests cost **5 credits per request**. When using `auto`, the 5-credit cost only applies if the basic proxy fails and the enhanced retry succeeds.
# Faster Scraping
Source: https://docs.firecrawl.dev/features/fast-scraping
Speed up your scrapes by 500% with the maxAge parameter
## How It Works
Firecrawl caches previously scraped pages and, by default, returns a recent copy when available.
* **Default freshness**: `maxAge = 172800000` ms (2 days). If the cached copy is newer than this, it’s returned instantly; otherwise, Firecrawl scrapes fresh and updates the cache.
* **Force fresh**: Set `maxAge: 0` to always scrape. Be aware this bypasses the cache entirely: every request goes through the full scraping pipeline, so it takes longer to complete and is more likely to fail. Use a non-zero `maxAge` if you don't need real-time content on every request.
* **Skip caching**: Set `storeInCache: false` if you don’t want to store results for a request.
Get your results **up to 500% faster** when you don’t need the absolute freshest data. Control freshness via `maxAge`, and Firecrawl will:
1. **Return instantly** if we have a recent version of the page
2. **Scrape fresh** only if our version is older than your specified age
3. **Save you time** - results come back in milliseconds instead of seconds
## When to Use This
**Great for:**
* Documentation, articles, product pages
* Bulk processing jobs
* Development and testing
* Building knowledge bases
**Skip for:**
* Real-time data (stock prices, live scores, breaking news)
* Frequently updated content
* Time-sensitive applications
## Usage
Add `maxAge` to your scrape request. Values are in milliseconds (e.g., `3600000` = 1 hour).
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Use cached data if it's less than 1 hour old (3600000 ms)
# This can be 500% faster than a fresh scrape!
scrape_result = firecrawl.scrape(
'https://firecrawl.dev',
formats=['markdown'],
max_age=3600000 # 1 hour in milliseconds
)
print(scrape_result.markdown)
```
```javascript JavaScript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Use cached data if it's less than 1 hour old (3600000 ms)
// This can be 500% faster than a fresh scrape!
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown'],
maxAge: 3600000 // 1 hour in milliseconds
});
console.log(scrapeResult.markdown);
```
## Common maxAge values
Here are some helpful reference values:
* **5 minutes**: `300000` - For semi-dynamic content
* **1 hour**: `3600000` - For content that updates hourly
* **1 day**: `86400000` - For daily-updated content
* **1 week**: `604800000` - For relatively static content
## Performance impact
With `maxAge` enabled:
* **500% faster response times** for recent content
* **Instant results** instead of waiting for fresh scrapes
## Important notes
* **Default**: `maxAge` is `172800000` (2 days)
* **Fresh when needed**: If our data is older than `maxAge`, we scrape fresh automatically
* **No stale data**: You'll never get data older than your specified `maxAge`
* **Credits**: Cached results still cost 1 credit per page. Caching improves speed and latency, not credit usage.
### When caching is bypassed
Caching is automatically skipped when your request includes any of the following:
* Custom `headers`
* `actions` (browser automation steps)
* A browser `profile`
* `changeTracking` format
* Custom `screenshot` viewport or quality settings
### Cache hit matching
For a cache hit, these parameters must match exactly between the original and subsequent requests: `url`, `mobile`, `location`, `waitFor`, `blockAds`, `screenshot` (enabled/disabled and full-page), and stealth proxy mode.
You can verify cache behavior by checking `metadata.cacheState` in the response — it will be `"hit"` or `"miss"`.
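For example (a sketch; `metadata.cache_state` assumes the Python SDK snake_cases response metadata, as in the crawl example below):
```python Python theme={null}
doc = firecrawl.scrape(
    "https://firecrawl.dev",
    formats=["markdown"],
    max_age=3600000,  # accept cached copies up to 1 hour old
)
print(doc.metadata.cache_state)  # "hit" if served from cache, "miss" otherwise
```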
## Faster crawling
The same speed benefits apply when crawling multiple pages. Use `maxAge` within `scrapeOptions` to get cached results for pages we’ve seen recently.
```python Python theme={null}
from firecrawl import Firecrawl
from firecrawl.v2.types import ScrapeOptions
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Crawl with cached scraping - 500% faster for pages we've seen recently
crawl_result = firecrawl.crawl(
'https://firecrawl.dev',
limit=100,
scrape_options=ScrapeOptions(
formats=['markdown'],
max_age=3600000 # Use cached data if less than 1 hour old
)
)
for page in crawl_result.data:
print(f"URL: {page.metadata.source_url}")
print(f"Content: {page.markdown[:200]}...")
```
```javascript JavaScript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Crawl with cached scraping - 500% faster for pages we've seen recently
const crawlResult = await firecrawl.crawl('https://firecrawl.dev', {
limit: 100,
scrapeOptions: {
formats: ['markdown'],
maxAge: 3600000 // Use cached data if less than 1 hour old
}
});
crawlResult.data.forEach(page => {
console.log(`URL: ${page.metadata.sourceURL}`);
console.log(`Content: ${page.markdown.substring(0, 200)}...`);
});
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"limit": 100,
"scrapeOptions": {
"formats": ["markdown"],
"maxAge": 3600000
}
}'
```
When crawling with `maxAge`, each page in your crawl will benefit from the 500% speed improvement if we have recent cached data for that page.
Start using `maxAge` today for dramatically faster scrapes and crawls!
# Interact after scraping
Source: https://docs.firecrawl.dev/features/interact
Interact with a page you fetched by prompting or running code.
Scrape a page to get clean data, then call `/interact` to start taking actions on that page: click buttons, fill forms, extract dynamic content, or navigate deeper. Just describe what you want, or write code if you need full control.
* Describe the action you want to take on the page in natural language
* Interact via code executed securely with Playwright or agent-browser
* Watch or interact with the browser in real time via an embeddable stream
## How It Works
1. **Scrape** a URL with `POST /v2/scrape`. The response includes a `scrapeId` in `data.metadata.scrapeId`. Optionally pass a `profile` to persist browser state across sessions.
2. **Interact** by calling `POST /v2/scrape/{scrapeId}/interact` with a `prompt` or with playwright `code`. On the first call, the scraped session is resumed and you can start interacting with the page.
3. **Stop** the session with `DELETE /v2/scrape/{scrapeId}/interact` when you're done.
## Quick Start
Scrape a page, interact with it, and stop the session:
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
# 1. Scrape Amazon's homepage
result = app.scrape("https://www.amazon.com", formats=["markdown"])
scrape_id = result.metadata.scrape_id
# 2. Interact — search for a product and get its price
app.interact(scrape_id, prompt="Search for iPhone 16 Pro Max")
response = app.interact(scrape_id, prompt="Click on the first result and tell me the price")
print(response.output)
# 3. Stop the session
app.stop_interaction(scrape_id)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const app = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
// 1. Scrape Amazon's homepage
const result = await app.scrape('https://www.amazon.com', { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
// 2. Interact — search for a product and get its price
await app.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await app.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
console.log(response.output);
// 3. Stop the session
await app.stopInteraction(scrapeId);
```
```bash cURL theme={null}
# 1. Scrape Amazon's homepage
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com", "formats": ["markdown"]}')
SCRAPE_ID=$(echo "$RESPONSE" | jq -r '.data.metadata.scrapeId')
# 2. Interact — search for a product and get its price
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Search for iPhone 16 Pro Max"}'
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Click on the first result and tell me the price"}'
# 3. Stop the session
curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
```bash CLI theme={null}
# 1. Scrape Amazon's homepage (scrape ID is saved automatically)
firecrawl scrape https://www.amazon.com
# 2. Interact — search for a product and get its price
firecrawl interact "Search for iPhone 16 Pro Max"
firecrawl interact "Click on the first result and tell me the price"
# 3. Stop the session
firecrawl interact stop
```
```json Response theme={null}
{
"success": true,
"liveViewUrl": "https://liveview.firecrawl.dev/...",
"interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...",
"output": "The iPhone 16 Pro Max (256GB) is priced at $1,199.00.",
"exitCode": 0,
"killed": false
}
```
## Interact via prompting
The simplest way to interact with a page. Describe what you want in natural language and it will click, type, scroll, and extract data automatically.
```python Python theme={null}
response = app.interact(scrape_id, prompt="What are the customer reviews saying about battery life?")
print(response.output)
```
```js Node theme={null}
const response = await app.interact(scrapeId, {
prompt: 'What are the customer reviews saying about battery life?',
});
console.log(response.output);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "What are the customer reviews saying about battery life?"
}'
```
```bash CLI theme={null}
firecrawl interact "What are the customer reviews saying about battery life?"
```
The response includes an `output` field with the agent's answer:
```json Response theme={null}
{
"success": true,
"liveViewUrl": "https://liveview.firecrawl.dev/...",
"interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...",
"output": "Customers are generally positive about battery life. Most reviewers report 8-10 hours of use on a single charge. A few noted it drains faster with heavy multitasking.",
"stdout": "...",
"result": "...",
"stderr": "",
"exitCode": 0,
"killed": false
}
```
### Keep Prompts Small and Focused
Prompts work best when each one is a **single, clear task**. Instead of asking the agent to do a complex multi-step workflow in one shot, break it into separate interact calls. Each call reuses the same browser session, so state carries over between them.
## Running Code
For full control, you can execute code directly in the browser sandbox. The `page` variable (a Playwright Page object) is available in Node.js and Python. Bash mode has [agent-browser](https://github.com/vercel-labs/agent-browser) pre-installed. You can also take screenshots within the session — use `(await page.screenshot()).toString("base64")` in Node.js, `await page.screenshot(path="/tmp/screenshot.png")` in Python, or `agent-browser screenshot` in Bash.
### Node.js (Playwright)
The default language. Write Playwright code directly — `page` is already connected to the browser.
```python Python theme={null}
response = app.interact(scrape_id, code="""
// Click a button and wait for navigation
await page.click('#next-page');
await page.waitForLoadState('networkidle');
// Extract content from the new page
const title = await page.title();
const content = await page.$eval('.article-body', el => el.textContent);
JSON.stringify({ title, content });
""")
print(response.result)
```
```js Node theme={null}
const response = await app.interact(scrapeId, {
code: `
// Click a button and wait for navigation
await page.click('#next-page');
await page.waitForLoadState('networkidle');
// Extract content from the new page
const title = await page.title();
const content = await page.$eval('.article-body', el => el.textContent);
JSON.stringify({ title, content });
`,
});
console.log(response.result);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"code": "await page.click(\"#next-page\"); await page.waitForLoadState(\"networkidle\"); const title = await page.title(); JSON.stringify({ title });",
"language": "node",
"timeout": 30
}'
```
```bash CLI theme={null}
# Uses the last scrape automatically
firecrawl interact -c "
await page.click('#next-page');
await page.waitForLoadState('networkidle');
const title = await page.title();
const content = await page.\$eval('.article-body', el => el.textContent);
JSON.stringify({ title, content });
"
# Or pass a scrape ID explicitly
# firecrawl interact -c "await page.title()"
```
### Python
Set `language` to `"python"` for Playwright's Python API.
```python Python theme={null}
response = app.interact(
scrape_id,
code="""
import json
await page.click('#load-more')
await page.wait_for_load_state('networkidle')
items = await page.query_selector_all('.item')
data = []
for item in items:
text = await item.text_content()
data.append(text.strip())
print(json.dumps(data))
""",
language="python",
)
print(response.stdout)
```
```js Node theme={null}
const response = await app.interact(scrapeId, {
code: `
import json
await page.click('#load-more')
await page.wait_for_load_state('networkidle')
items = await page.query_selector_all('.item')
data = []
for item in items:
text = await item.text_content()
data.append(text.strip())
print(json.dumps(data))
`,
language: 'python',
});
console.log(response.stdout);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"code": "import json\nawait page.click(\"#load-more\")\nawait page.wait_for_load_state(\"networkidle\")\nitems = await page.query_selector_all(\".item\")\ndata = [await i.text_content() for i in items]\nprint(json.dumps(data))",
"language": "python"
}'
```
```bash CLI theme={null}
firecrawl interact --python -c "
import json
await page.click('#load-more')
await page.wait_for_load_state('networkidle')
items = await page.query_selector_all('.item')
data = [await i.text_content() for i in items]
print(json.dumps(data))
"
```
### Bash (agent-browser)
[agent-browser](https://github.com/vercel-labs/agent-browser) is a CLI pre-installed in the sandbox with 60+ commands. It provides an accessibility tree with element refs (`@e1`, `@e2`, ...) — ideal for LLM-driven automation.
```python Python theme={null}
# Take a snapshot to see interactive elements
snapshot = app.interact(
scrape_id,
code="agent-browser snapshot -i",
language="bash",
)
print(snapshot.stdout)
# Output:
# [document]
# @e1 [input type="text"] "Search..."
# @e2 [button] "Search"
# @e3 [link] "About"
# Interact with elements using @refs
app.interact(
scrape_id,
code='agent-browser fill @e1 "firecrawl" && agent-browser click @e2',
language="bash",
)
```
```js Node theme={null}
// Take a snapshot to see interactive elements
const snapshot = await app.interact(scrapeId, {
code: 'agent-browser snapshot -i',
language: 'bash',
});
console.log(snapshot.stdout);
// Output:
// [document]
// @e1 [input type="text"] "Search..."
// @e2 [button] "Search"
// @e3 [link] "About"
// Interact with elements using @refs
await app.interact(scrapeId, {
code: 'agent-browser fill @e1 "firecrawl" && agent-browser click @e2',
language: 'bash',
});
```
```bash cURL theme={null}
# Take a snapshot to see interactive elements
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"code": "agent-browser snapshot -i", "language": "bash"}'
# Interact with elements using @refs
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"code": "agent-browser fill @e1 \"firecrawl\" && agent-browser click @e2", "language": "bash"}'
```
```bash CLI theme={null}
# Take a snapshot to see interactive elements
firecrawl interact --bash -c "agent-browser snapshot -i"
# Interact with elements using @refs
firecrawl interact --bash -c 'agent-browser fill @e1 "firecrawl" && agent-browser click @e2'
```
Common agent-browser commands (see the sketch after this table):
| Command | Description |
| ------------------------- | ----------------------------------------- |
| `snapshot` | Full accessibility tree with element refs |
| `snapshot -i` | Interactive elements only |
| `click @e1` | Click element by ref |
| `fill @e1 "text"` | Clear field and type text |
| `type @e1 "text"` | Type without clearing |
| `press Enter` | Press a keyboard key |
| `scroll down 500` | Scroll down by pixels |
| `get text @e1` | Get text content |
| `get url` | Get current URL |
| `wait @e1` | Wait for element |
| `wait --load networkidle` | Wait for network idle |
| `find text "X" click` | Find element by text and click |
| `screenshot` | Take a screenshot of the current page |
| `eval "js code"` | Run JavaScript in page |
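Any of these commands can be chained in a single bash interaction. A minimal sketch using the Python SDK, reusing the `app` and `scrape_id` from the examples above:
```python Python theme={null}
# Sketch: chain two table commands in one bash interaction.
# Reuses `app` and `scrape_id` from the earlier examples.
response = app.interact(
    scrape_id,
    code='agent-browser find text "About" click && agent-browser get url',
    language="bash",
)
print(response.stdout)  # the current URL after clicking the "About" link
```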
## Live View
Every interact response returns a `liveViewUrl` that you can embed to watch the browser in real time. Useful for debugging, demos, or building browser-powered UIs.
```json Response theme={null}
{
"success": true,
"liveViewUrl": "https://liveview.firecrawl.dev/...",
"interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...",
"stdout": "",
"result": "...",
"exitCode": 0
}
```
```html theme={null}
<!-- Minimal sketch: embed the liveViewUrl returned by the interact response -->
<iframe src="https://liveview.firecrawl.dev/..." width="1280" height="720"></iframe>
```
### Interactive Live View
The response also includes an `interactiveLiveViewUrl`. Unlike the standard live view which is view-only, the interactive live view allows users to click, type, and interact with the browser session directly through the embedded stream. This is useful for building user-facing browser UIs — such as login flows, or guided workflows where end users need to control the browser.
```html theme={null}
<!-- Minimal sketch: embed the interactiveLiveViewUrl so end users can control the session -->
<iframe src="https://liveview.firecrawl.dev/..." width="1280" height="720"></iframe>
```
## Session Lifecycle
### Creation
The first `POST /v2/scrape/{scrapeId}/interact` continues the scrape session and starts the interaction.
### Reuse
Subsequent interact calls on the same `scrapeId` reuse the existing session. The browser stays open and maintains its state between calls, so you can chain multiple interactions:
```python Python theme={null}
# First call — click a tab
app.interact(scrape_id, code="await page.click('#tab-2')")
# Second call — the tab is still selected, extract its content
result = app.interact(scrape_id, code="await page.$eval('#tab-2-content', el => el.textContent)")
print(result.result)
```
```js Node theme={null}
// First call — click a tab
await app.interact(scrapeId, { code: "await page.click('#tab-2')" });
// Second call — the tab is still selected, extract its content
const result = await app.interact(scrapeId, {
code: "await page.$eval('#tab-2-content', el => el.textContent)",
});
console.log(result.result);
```
```bash CLI theme={null}
# First call — click a tab
firecrawl interact -c "await page.click('#tab-2')"
# Second call — the tab is still selected, extract its content
firecrawl interact -c "await page.\$eval('#tab-2-content', el => el.textContent)"
```
### Cleanup
Stop the session explicitly when done:
```python Python theme={null}
app.stop_interaction(scrape_id)
```
```js Node theme={null}
await app.stopInteraction(scrapeId);
```
```bash cURL theme={null}
curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
```bash CLI theme={null}
# Stops the last scrape session
firecrawl interact stop
# Or stop a specific session by ID
# firecrawl interact stop
```
Sessions also expire automatically based on TTL (default: 10 minutes) or inactivity timeout (default: 5 minutes).
Always stop sessions when you're done to avoid unnecessary billing. Credits are prorated by the second.
## Persistent Profiles
By default, each scrape + interact session starts with a clean browser. With `profile`, you can save and reuse browser state (cookies, localStorage, sessions) across scrapes. This is useful for staying logged in and preserving preferences.
Pass the `profile` parameter when calling scrape. Sessions with the same profile name share state.
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
# Session 1: Scrape with a profile, log in, then stop (state is saved)
result = app.scrape(
"https://app.example.com/login",
formats=["markdown"],
profile={"name": "my-app", "save_changes": True},
)
scrape_id = result.metadata.scrape_id
app.interact(scrape_id, prompt="Fill in user@example.com and password, then click Login")
app.stop_interaction(scrape_id)
# Session 2: Scrape with the same profile — already logged in
result = app.scrape(
"https://app.example.com/dashboard",
formats=["markdown"],
profile={"name": "my-app", "save_changes": True},
)
scrape_id = result.metadata.scrape_id
response = app.interact(scrape_id, prompt="Extract the dashboard data")
print(response.output)
app.stop_interaction(scrape_id)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const app = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
// Session 1: Scrape with a profile, log in, then stop (state is saved)
const result1 = await app.scrape('https://app.example.com/login', {
formats: ['markdown'],
profile: { name: 'my-app', saveChanges: true },
});
const scrapeId1 = result1.metadata?.scrapeId;
await app.interact(scrapeId1, { prompt: 'Fill in user@example.com and password, then click Login' });
await app.stopInteraction(scrapeId1);
// Session 2: Scrape with the same profile — already logged in
const result2 = await app.scrape('https://app.example.com/dashboard', {
formats: ['markdown'],
profile: { name: 'my-app', saveChanges: true },
});
const scrapeId2 = result2.metadata?.scrapeId;
const response = await app.interact(scrapeId2, { prompt: 'Extract the dashboard data' });
console.log(response.output);
await app.stopInteraction(scrapeId2);
```
```bash cURL theme={null}
# Session 1: Scrape with a profile
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://app.example.com/login",
"formats": ["markdown"],
"profile": { "name": "my-app", "saveChanges": true }
}')
SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId')
# Log in via interact
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Fill in user@example.com and password, then click Login"}'
# Stop — state is saved to the profile
curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
# Session 2: Scrape again with the same profile — already logged in
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://app.example.com/dashboard",
"formats": ["markdown"],
"profile": { "name": "my-app", "saveChanges": true }
}')
SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId')
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Extract the dashboard data"}'
```
```bash CLI theme={null}
# Session 1: Scrape with a profile, log in, then stop (state is saved)
firecrawl scrape https://app.example.com/login --profile my-app
firecrawl interact "Fill in user@example.com and password, then click Login"
firecrawl interact stop
# Session 2: Scrape with the same profile — already logged in
firecrawl scrape https://app.example.com/dashboard --profile my-app
firecrawl interact "Extract the dashboard data"
firecrawl interact stop
# Read-only: load profile state without saving changes back
firecrawl scrape https://app.example.com/dashboard --profile my-app --no-save-changes
```
| Parameter | Default | Description |
| ------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name` | — | A name for the persistent profile. Scrapes with the same name share browser state. |
| `saveChanges` | `true` | When `true`, browser state is saved back to the profile when the interact session stops. Set to `false` to load existing data without writing — useful when you need multiple concurrent readers. |
Only one session can save to a profile at a time. If another session is already saving, you'll get a `409` error. You can still open the same profile with `saveChanges: false`, or try again later.
The browser state is saved when the interact session is stopped. Always stop the session when you're done so the profile can be reused.
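For instance, a concurrent reader can load the profile without writing state back. A minimal sketch using the Python options shown above:
```python Python theme={null}
# Sketch: open a profile read-only so it never conflicts with a writer session.
result = app.scrape(
    "https://app.example.com/dashboard",
    formats=["markdown"],
    profile={"name": "my-app", "save_changes": False},  # load state, never write back
)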
## When to Use What
| Use Case | Recommended | Why |
| -------------------------------- | -------------------------- | ------------------------------- |
| Web search | [Search](/features/search) | Dedicated search endpoint |
| Get clean content from a URL | [Scrape](/features/scrape) | One API call, no session needed |
| Click, type, navigate on a page | **Interact** (prompt) | Just describe it in English |
| Extract data behind interactions | **Interact** (prompt) | No selectors needed |
| Complex scraping logic | **Interact** (code) | Full Playwright control |
**Interact vs Browser Sandbox**: Interact is built on the same infrastructure as [Browser Sandbox](/features/browser) but provides a better interface for the most common pattern — scrape a page, then go deeper. Browser Sandbox is better when you need a standalone browser session that isn't tied to a specific scrape.
## Pricing
* **Code-only** (no `prompt`) — 2 credits per session minute
* **With AI prompts** — 7 credits per session minute
* **Scrape** — billed separately (1 credit per scrape, plus any format-specific costs)
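For example, a 90-second code-only session is billed as 1.5 minutes × 2 = 3 credits, since credits are prorated by the second.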
## API Reference
* [Execute Interact](/api-reference/endpoint/scrape-execute) — `POST /v2/scrape/{scrapeId}/interact`
* [Stop Interact](/api-reference/endpoint/scrape-browser-delete) — `DELETE /v2/scrape/{scrapeId}/interact`
### Request Body (POST)
| Field | Type | Default | Description |
| ---------- | -------- | -------- | ---------------------------------------------------------------------------------------------------- |
| `prompt` | `string` | — | Natural language task for the AI agent. Required if `code` is not set. Max 10,000 characters. |
| `code` | `string` | — | Code to execute (Node.js, Python, or Bash). Required if `prompt` is not set. Max 100,000 characters. |
| `language` | `string` | `"node"` | `"node"`, `"python"`, or `"bash"`. Only used with `code`. |
| `timeout` | `number` | `30` | Timeout in seconds (1–300). |
| `origin` | `string` | — | Caller identifier for activity tracking. |
### Response
| Field | Description |
| ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `success` | `true` if the execution completed without errors |
| `liveViewUrl` | Read-only live view URL for the browser session |
| `interactiveLiveViewUrl` | Interactive live view URL (viewers can control the browser) |
| `output` | The agent's natural language answer to your prompt. Only present when using `prompt`. |
| `stdout` | Standard output from the code execution |
| `result` | Raw return value from the sandbox. For `code`: the last expression evaluated. For `prompt`: the raw page snapshot the agent used to produce `output`. |
| `stderr` | Standard error output |
| `exitCode` | Exit code (`0` = success) |
| `killed` | `true` if the execution was terminated due to timeout |
***
Have feedback or need help? Email [help@firecrawl.com](mailto:help@firecrawl.com) or reach out on [Discord](https://discord.gg/firecrawl).
# JSON mode - Structured result
Source: https://docs.firecrawl.dev/features/llm-extract
Extract structured data from pages via LLMs
**v2 API Change:** JSON schema extraction is fully supported in v2, but the API format has changed. In v2, the schema is embedded directly inside the format object as `formats: [{type: "json", schema: {...}}]`. The v1 `jsonOptions` parameter no longer exists in v2.
For schema validation failures and other extraction errors, see [Errors](/api-reference/errors) — extraction-specific issues typically surface as `400` or `422` responses.
## Scrape and extract structured data with Firecrawl
Firecrawl uses AI to get structured data from web pages in 3 steps:
1. **Set the Schema (optional):**
Define a JSON schema (using OpenAI's format) to specify the data you want, or just provide a `prompt` if you don't need a strict schema, along with the webpage URL.
2. **Make the Request:**
Send your URL and schema to our scrape endpoint using JSON mode. See how here:
[Scrape Endpoint Documentation](https://docs.firecrawl.dev/api-reference/endpoint/scrape)
3. **Get Your Data:**
Get back clean, structured data matching your schema that you can use right away.
This makes getting web data in the format you need quick and easy.
## Extract structured data
### JSON mode via /scrape
Used to extract structured data from scraped pages.
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR-API-KEY")
class CompanyInfo(BaseModel):
company_mission: str
supports_sso: bool
is_open_source: bool
is_in_yc: bool
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"schema": CompanyInfo.model_json_schema()
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { z } from "zod";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
// Define schema to extract contents into
const schema = z.object({
company_mission: z.string(),
supports_sso: z.boolean(),
is_open_source: z.boolean(),
is_in_yc: z.boolean()
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
schema: schema
}],
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [ {
"type": "json",
"schema": {
"type": "object",
"properties": {
"company_mission": {
"type": "string"
},
"supports_sso": {
"type": "boolean"
},
"is_open_source": {
"type": "boolean"
},
"is_in_yc": {
"type": "boolean"
}
},
"required": [
"company_mission",
"supports_sso",
"is_open_source",
"is_in_yc"
]
}
} ]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
"supports_sso": true,
"is_open_source": true,
"is_in_yc": true
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### Structured data without schema
You can also extract without a schema by passing only a `prompt` to the endpoint. The LLM chooses the structure of the data.
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"prompt": "Extract the company mission from the page."
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
prompt: "Extract the company mission from the page."
}]
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [{
"type": "json",
"prompt": "Extract the company mission from the page."
}]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### Real-world example: Extracting company information
Here's a comprehensive example extracting structured company information from a website:
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR-API-KEY")
class CompanyInfo(BaseModel):
company_mission: str
supports_sso: bool
is_open_source: bool
is_in_yc: bool
result = app.scrape(
'https://firecrawl.dev/',
formats=[{
"type": "json",
"schema": CompanyInfo.model_json_schema()
}]
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { z } from "zod";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
const companyInfoSchema = z.object({
company_mission: z.string(),
supports_sso: z.boolean(),
is_open_source: z.boolean(),
is_in_yc: z.boolean()
});
const result = await app.scrape("https://firecrawl.dev/", {
formats: [{
type: "json",
schema: companyInfoSchema
}]
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev/",
"formats": [{
"type": "json",
"schema": {
"type": "object",
"properties": {
"company_mission": {
"type": "string"
},
"supports_sso": {
"type": "boolean"
},
"is_open_source": {
"type": "boolean"
},
"is_in_yc": {
"type": "boolean"
}
},
"required": [
"company_mission",
"supports_sso",
"is_open_source",
"is_in_yc"
]
}
}]
}'
```
Output:
```json Output theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "Turn websites into LLM-ready data",
"supports_sso": true,
"is_open_source": true,
"is_in_yc": true
}
}
}
```
### JSON format options
When using JSON mode in v2, include an object in `formats` with the schema embedded directly:
`formats: [{ type: 'json', schema: { ... }, prompt: '...' }]`
Parameters:
* `schema`: JSON Schema describing the structured output you want (required for schema-based extraction).
* `prompt`: Optional prompt to guide extraction (also used for no-schema extraction).
**Important:** Unlike v1, there is no separate `jsonOptions` parameter in v2. The schema must be included directly inside the format object in the `formats` array.
**HTML attributes are not available in JSON extraction.** JSON extraction works on the markdown conversion of the page, which only preserves visible text content. HTML attributes (e.g., `data-id`, custom attributes on elements) are stripped during conversion and the LLM cannot see them. If you need to extract HTML attribute values, use `rawHtml` format and parse attributes client-side, or use an `executeJavascript` action to inject attribute values into visible text before extraction.
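A minimal sketch of the `executeJavascript` approach, assuming a hypothetical page whose elements carry `data-id` attributes:
```python Python theme={null}
# Sketch: make data-id attributes visible as text, then extract them with JSON mode.
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR-API-KEY")

result = app.scrape(
    "https://example.com/products",  # hypothetical page with data-id attributes
    actions=[{
        "type": "executeJavascript",
        # Append each element's data-id as visible text so the LLM can see it
        "script": "document.querySelectorAll('[data-id]').forEach(el => el.append(' [data-id: ' + el.getAttribute('data-id') + ']'))",
    }],
    formats=[{
        "type": "json",
        "prompt": "Extract each product name and the data-id value shown in brackets next to it.",
    }],
)
print(result.json)
```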
## Tips for consistent extraction
If you are seeing inconsistent or incomplete results from JSON extraction, these practices can help:
* **Keep prompts short and focused.** Long prompts with many rules increase variability. Move specific constraints (like allowed values) into the schema instead.
* **Use concise property names.** Avoid embedding instructions or enum lists in property names. Use a short key like `"installation_type"` and put allowed values in an `enum` array.
* **Add `enum` arrays for constrained fields.** When a field has a fixed set of values, list them in `enum` and make sure they match the exact text shown on the page.
* **Include null-handling in field descriptions.** Add `"Return null if not found on the page."` to each field's `description` so the model does not guess missing values.
* **Add location hints.** Tell the model where to find data on the page, e.g. `"Flow rate in GPM from the Specifications table."`.
* **Split large schemas into smaller requests.** Schemas with many fields (e.g. 30+) produce less consistent results. Split them into 2–3 requests of 10–15 fields each.
* **Avoid `minItems`/`maxItems` on arrays.** JSON Schema validation keywords like `minItems` and `maxItems` do not control how much content the scraper collects. Setting `minItems: 20` will not make the LLM return more items — it may instead hallucinate entries to satisfy the constraint. Remove these keywords and use a `prompt` instead (e.g. `"Extract ALL reviews from the page. Do not skip any."`) to guide completeness.
* **Use `"type": "array"` to extract lists of items.** If you need to extract multiple items (e.g. a list of people, products, or reviews), wrap them in an array property with an `items` block. Using `"type": "object"` for a list will return only a single item. See the array schema example below.
**Example of a well-structured schema:**
```json theme={null}
{
"type": "object",
"properties": {
"product_name": {
"type": ["string", "null"],
"description": "Full descriptive product name as shown on the page. Return null if not found."
},
"installation_type": {
"type": ["string", "null"],
"description": "Installation type from the Specifications section. Return null if not found.",
"enum": ["Deck-mount", "Wall-mount", "Countertop", "Drop-in", "Undermount"]
},
"flow_rate_gpm": {
"type": ["string", "null"],
"description": "Flow rate in GPM from the Specifications section. Return null if not found."
}
}
}
```
**Example of extracting a list of items:**
When a page contains multiple items (e.g. team members, products, reviews), use `"type": "array"` with `"items"` to get the full list:
```json theme={null}
{
"type": "object",
"properties": {
"people": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"role": { "type": "string" },
"department": { "type": "string" }
}
}
}
}
}
```
# Lockdown Mode
Source: https://docs.firecrawl.dev/features/lockdown
Cache-only scrape mode for compliance and air-gapped environments. No outbound traffic.
Lockdown mode forces the scrape endpoint to read from Firecrawl's existing index and cache only — it never makes an outbound request to the target URL. It is designed for compliance-constrained and air-gapped environments where the scrape request itself (the URL, headers, and body) could leak sensitive information over the network.
## How it works
When `lockdown: true` is set on a `/v2/scrape` request:
* **No outbound traffic.** Firecrawl never connects to the target URL. All outbound paths (HTTP engines, robots.txt fetching, search-index writes, audio transforms, etc.) are gated off.
* **Cache-only reads.** The request is served from Firecrawl's index if a matching entry exists. The default `maxAge` is bumped to 2 years so existing cached pages are eligible regardless of age.
* **Cache miss returns an error.** If no cached data is available, Firecrawl returns a `404` with error code `SCRAPE_LOCKDOWN_CACHE_MISS`. The URL is never logged on miss.
* **Zero data retention.** Lockdown requests are treated as ZDR: no URL is persisted, no response blob is written to long-term storage, and the scrape job is cleaned up after delivery.
## When to use this
**Great for:**
* Regulated industries (healthcare, finance, legal) where outbound requests require audit or approval
* Air-gapped or compliance-constrained environments where the URL itself is sensitive
* Replaying already-indexed pages without re-hitting origins
**Skip for:**
* Fresh content that has never been scraped before — lockdown mode returns an error on cache miss
* Real-time or time-sensitive data
## Usage
Add `lockdown: true` to your scrape request.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Serve only previously cached results. No outbound request is made.
# Returns SCRAPE_LOCKDOWN_CACHE_MISS if the URL is not in the cache.
scrape_result = firecrawl.scrape(
'https://firecrawl.dev',
formats=['markdown'],
lockdown=True,
)
print(scrape_result.markdown)
```
```javascript JavaScript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Serve only previously cached results. No outbound request is made.
// Returns SCRAPE_LOCKDOWN_CACHE_MISS if the URL is not in the cache.
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown'],
lockdown: true,
});
console.log(scrapeResult.markdown);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": ["markdown"],
"lockdown": true
}'
```
```bash CLI theme={null}
# Serve only previously cached results. No outbound request is made.
firecrawl https://firecrawl.dev --lockdown
```
## Cache miss response
If the URL has not been previously scraped and cached, the response is:
```json theme={null}
{
"success": false,
"code": "SCRAPE_LOCKDOWN_CACHE_MISS",
"error": "No cached data is available for this request in lockdown mode. Lockdown mode only serves previously cached responses and never makes outbound requests. To resolve this, either disable lockdown mode to allow a fresh scrape, or try again after the URL has been scraped and cached."
}
```
To seed the cache, perform a normal (non-lockdown) scrape of the URL first. Subsequent lockdown requests will return the cached result.
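For example, a sketch using the Python SDK shown above:
```python Python theme={null}
# Sketch: seed the cache with a normal scrape, then read it back in lockdown mode.
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# 1. Normal scrape: makes an outbound request and populates the cache.
firecrawl.scrape("https://firecrawl.dev", formats=["markdown"])

# 2. Lockdown scrape: served from the cache, no outbound traffic.
doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown"], lockdown=True)
print(doc.markdown)
```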
## Billing
| Outcome | Credits |
| ----------------------------------------- | --------- |
| Cache hit | 5 credits |
| Cache miss (`SCRAPE_LOCKDOWN_CACHE_MISS`) | 1 credit |
Zero Data Retention does not incur an additional charge on lockdown requests — the ZDR cost is waived because lockdown mode is already ZDR by default.
## Cache hit matching
Lockdown uses the same cache-match rules as regular scrapes. For a cache hit, these parameters must match the cached entry: `url`, `mobile`, `location`, `waitFor`, `blockAds`, `screenshot` (enabled/disabled and full-page), and enhanced proxy mode. You can verify behavior via `metadata.cacheState` in the response — it will be `"hit"` on a served response.
## Availability
Lockdown mode is supported on the `/v2/scrape` endpoint and is exposed across all surfaces that call it:
* **SDKs** — Python, Node.js, Go, Rust, Java, .NET, Ruby, PHP, and Elixir (`lockdown: true` on the scrape options).
* **CLI** — pass `--lockdown` to `firecrawl scrape`.
* **MCP server** — include `"lockdown": true` in the `firecrawl_scrape` tool arguments.
It is not available on `crawl`, `map`, `extract`, or `search`.
# Map
Source: https://docs.firecrawl.dev/features/map
Input a website and get all the URLs on the website - extremely fast
## Introducing /map
The easiest way to go from a single URL to a map of the entire website. This is extremely useful when you:
* Need to prompt the end-user to choose which links to scrape
* Need to quickly know the links on a website
* Need to scrape pages of a website that are related to a specific topic (use the `search` parameter)
* Only need to scrape specific pages of a website
Test mapping in the interactive playground — no code required.
## Mapping
### /map endpoint
Used to map a URL and get the URLs of the website. This returns most links present on the website.
URLs are primarily discovered from the website's sitemap, supplemented with SERP (search engine) results and previously crawled pages to improve coverage. You can control sitemap behavior with the `sitemap` parameter.
### Installation
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
```
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl
# Authenticate (one-time setup)
firecrawl login
```
### Usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
res = firecrawl.map(url="https://firecrawl.dev", limit=50, sitemap="include")
print(res)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const res = await firecrawl.map('https://firecrawl.dev', { limit: 50, sitemap: 'include' });
console.log(res);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/map \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev"
}'
```
```bash CLI theme={null}
# Map a website to discover URLs
firecrawl map https://firecrawl.dev
# Output as JSON with limit
firecrawl map https://firecrawl.dev --json --limit 100 --pretty
```
Each map request consumes 1 credit per call, regardless of the number of URLs returned. For example, setting `limit` to 100,000 still uses 1 credit.
### Response
SDKs will return the data object directly. cURL will return the payload exactly as shown below.
```json theme={null}
{
"success": true,
"links": [
{
"url": "https://docs.firecrawl.dev/features/scrape",
"title": "Scrape | Firecrawl",
"description": "Turn any url into clean data"
},
{
"url": "https://www.firecrawl.dev/blog/5_easy_ways_to_access_glm_4_5",
"title": "5 Easy Ways to Access GLM-4.5",
"description": "Discover how to access GLM-4.5 models locally, through chat applications, via the official API, and using the LLM marketplaces API for seamless integration i..."
},
{
"url": "https://www.firecrawl.dev/playground",
"title": "Playground - Firecrawl",
"description": "Preview the API response and get the code snippets for the API"
},
{
"url": "https://www.firecrawl.dev/?testId=2a7e0542-077b-4eff-bec7-0130395570d6",
"title": "Firecrawl - The Web Data API for AI",
"description": "The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders. Clean, structured, and ..."
},
{
"url": "https://www.firecrawl.dev/?testId=af391f07-ca0e-40d3-8ff2-b1ecf2e3fcde",
"title": "Firecrawl - The Web Data API for AI",
"description": "The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders. Clean, structured, and ..."
},
...
]
}
```
Title and description are not always present, as they depend on the website.
#### Map with search
The `search` parameter lets you search for specific URLs within a website.
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/map \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"search": "docs"
}'
```
The response is ordered from most relevant to least relevant.
```json theme={null}
{
"status": "success",
"links": [
{
"url": "https://docs.firecrawl.dev",
"title": "Firecrawl Docs",
"description": "Firecrawl documentation"
},
{
"url": "https://docs.firecrawl.dev/sdks/python",
"title": "Firecrawl Python SDK",
"description": "Firecrawl Python SDK documentation"
},
...
]
}
```
## Location and Language
Specify country and preferred languages to get relevant content based on your target location and language preferences, similar to the scrape endpoint.
### How it works
When you specify the location settings, Firecrawl will use an appropriate proxy if available and emulate the corresponding language and timezone settings. By default, the location is set to 'US' if not specified.
### Usage
To use the location and language settings, include the `location` object in your request body with the following properties:
* `country`: ISO 3166-1 alpha-2 country code (e.g., 'US', 'AU', 'DE', 'JP'). Defaults to 'US'.
* `languages`: An array of preferred languages and locales for the request in order of priority. Defaults to the language of the specified location.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
res = firecrawl.map('https://example.com',
location={
'country': 'US',
'languages': ['en']
}
)
print(res)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const res = await firecrawl.map('https://example.com', {
location: { country: 'US', languages: ['en'] },
});
console.log(res);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/map" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"location": { "country": "US", "languages": ["en"] }
}'
```
For more details about supported locations, refer to the [Proxies documentation](/features/proxies).
## Considerations
This endpoint prioritizes speed, so it may not capture all website links. It primarily relies on the website's sitemap, supplemented by cached crawl data and search engine results. For a more thorough and up-to-date list of URLs, consider using the [/crawl](/features/crawl) endpoint instead.
# Models
Source: https://docs.firecrawl.dev/features/models
Choose the right model for your agent extraction tasks.
Firecrawl Agent offers two models optimized for different use cases. Choose the right model based on your extraction complexity and cost requirements.
## Available Models
| Model | Cost | Accuracy | Best For |
| -------------- | --------------- | -------- | ------------------------------------- |
| `spark-1-mini` | **60% cheaper** | Standard | Most tasks (default) |
| `spark-1-pro` | Standard | Higher | Complex research, critical extraction |
**Start with Spark 1 Mini** (default) — it handles most extraction tasks well at 60% lower cost. Switch to Pro only for complex multi-domain research or when accuracy is critical.
## Spark 1 Mini (Default)
`spark-1-mini` is our efficient model, ideal for straightforward data extraction tasks.
**Use Mini when:**
* Extracting simple data points (contact info, pricing, etc.)
* Working with well-structured websites
* Cost efficiency is a priority
* Running high-volume extraction jobs
**Example use cases:**
* Extracting product prices from e-commerce sites
* Gathering contact information from company pages
* Pulling basic metadata from articles
* Simple data point lookups
## Spark 1 Pro
`spark-1-pro` is our flagship model, designed for maximum accuracy on complex extraction tasks.
**Use Pro when:**
* Performing complex competitive analysis
* Extracting data that requires deep reasoning
* Accuracy is critical for your use case
* Dealing with ambiguous or hard-to-find data
**Example use cases:**
* Multi-domain competitive analysis
* Complex research tasks requiring reasoning
* Extracting nuanced information from multiple sources
* Critical business intelligence gathering
## Specifying a Model
Pass the `model` parameter to select which model to use:
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Using Spark 1 Mini (default - can be omitted)
result = app.agent(
prompt="Find the pricing of Firecrawl",
model="spark-1-mini"
)
# Using Spark 1 Pro for complex tasks
result = app.agent(
prompt="Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model="spark-1-pro"
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Using Spark 1 Mini (default - can be omitted)
const result = await firecrawl.agent({
prompt: "Find the pricing of Firecrawl",
model: "spark-1-mini"
});
// Using Spark 1 Pro for complex tasks
const resultPro = await firecrawl.agent({
prompt: "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model: "spark-1-pro"
});
console.log(result.data);
```
```bash cURL theme={null}
# Using Spark 1 Mini (default)
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Find the pricing of Firecrawl",
"model": "spark-1-mini"
}'
# Using Spark 1 Pro for complex tasks
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
"model": "spark-1-pro"
}'
```
## Model Comparison
| Feature | Spark 1 Mini | Spark 1 Pro |
| ---------------- | ------------ | ------------- |
| **Cost** | 60% cheaper | Standard |
| **Accuracy** | Standard | Higher |
| **Speed** | Fast | Fast |
| **Best for** | Most tasks | Complex tasks |
| **Reasoning** | Standard | Advanced |
| **Multi-domain** | Good | Excellent |
## Pricing by Model
Both models use dynamic, credit-based pricing that scales with task complexity:
* **Spark 1 Mini**: Uses approximately 60% fewer credits than Pro for equivalent tasks
* **Spark 1 Pro**: Standard credit consumption for maximum accuracy
Credit usage varies based on prompt complexity, data processed, and output structure — regardless of model selected.
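For example, a task that consumes 10 credits on Spark 1 Pro would consume roughly 4 credits on Spark 1 Mini.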
## Choosing the Right Model
```
┌─────────────────────────────────┐
│ What type of task? │
└─────────────────────────────────┘
│
┌──────────────┴──────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Simple/Direct │ │ Complex/Research│
│ extraction │ │ multi-domain │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ spark-1-mini │ │ spark-1-pro │
│ (60% cheaper) │ │ (higher acc.) │
└─────────────────┘ └─────────────────┘
```
## API Reference
See the [Agent API Reference](/api-reference/endpoint/agent) for complete parameter documentation.
Have questions about which model to use? Email [help@firecrawl.com](mailto:help@firecrawl.com).
# Parse
Source: https://docs.firecrawl.dev/features/parse
Upload a local or non-public document and convert it into clean, LLM-ready data
## Introducing /parse
The `/parse` endpoint converts local or non-public documents into clean, LLM-ready data. Upload file bytes via `multipart/form-data` and get back Markdown, JSON, HTML, links, images, or a summary — with reading order and tables preserved.
* Turn PDF, DOCX, XLSX, HTML, and more into Markdown or structured JSON
* Up to **5x faster** parsing via a Rust-based engine
* Files up to **50 MB** per request
* Zero Data Retention support
## When to use `/parse`
Use `/parse` when the source document is **a local file** or **not publicly accessible by URL**. If you have a public URL that points to a document, prefer [`/scrape`](/features/scrape) — it auto-detects the file type from the extension or content type and parses it the same way.
| Source | Endpoint |
| ---------------------------------------------------------------- | ------------------------------------------------ |
| Public URL to a document (e.g. `https://example.com/report.pdf`) | [`POST /scrape`](/api-reference/endpoint/scrape) |
| Local file or non-public bytes (PDF, DOCX, XLSX, HTML, …) | [`POST /parse`](/api-reference/endpoint/parse) |
## Parsing
### /parse endpoint
Used to upload a file and receive parsed content. The request is `multipart/form-data` with a required `file` part and an optional `options` JSON part.
**Supported extensions:** `.html`, `.htm`, `.pdf`, `.docx`, `.doc`, `.odt`, `.rtf`, `.xlsx`, `.xls`.
### Usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.parse("./report.pdf")
print(doc.markdown)
```
```javascript Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import fs from "node:fs";
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.parse({
data: fs.readFileSync("./report.pdf"),
filename: "report.pdf",
});
console.log(doc.markdown);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/parse \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file=@./report.pdf' \
-F 'options={"formats":["markdown"]};type=application/json'
```
### Response
SDKs return the document object directly. cURL returns the JSON payload.
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Annual Report\n\n...",
"metadata": {
"title": "Annual Report",
"numPages": 42,
"sourceFile": "report.pdf"
}
}
}
```
## Options
`/parse` accepts a subset of scrape options under the `options` field. Common settings:
* `formats`: Array of output formats. Defaults to `["markdown"]`. Supported: `markdown`, `html`, `rawHtml`, `links`, `images`, `summary`, and `json` (with a schema or prompt).
* `onlyMainContent`: Only return the main content of the document. Defaults to `true`.
* `includeTags` / `excludeTags`: Tag-level inclusion or exclusion (HTML inputs).
* `timeout`: Request timeout in milliseconds. Defaults to `30000`, max `300000`.
* `parsers`: File-parser controls. For PDFs, set `{ "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": <number> }`.
`/parse` does not support browser-only options like `actions`, `waitFor`, `location`, `mobile`, or change tracking.
### PDF parser modes
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/parse \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file=@./scan.pdf' \
-F 'options={"parsers":[{"type":"pdf","mode":"ocr","maxPages":50}]};type=application/json'
```
* `fast`: text-only extraction, fastest path.
* `auto` (default): text-first with OCR fallback for image-only pages.
* `ocr`: OCR every page — use for scanned documents.
### Structured JSON output
Pass a JSON schema or prompt to extract structured data directly from the document:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/parse \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file=@./invoice.pdf' \
-F 'options={"formats":[{"type":"json","schema":{"type":"object","properties":{"total":{"type":"number"},"vendor":{"type":"string"}}}}]};type=application/json'
```
## Considerations
* Maximum file size is **50 MB** per request.
* Parsing very large or scanned PDFs in `ocr` mode may take longer — increase `timeout` or use `maxPages` to bound the work.
* For batches of files, call `/parse` per file in parallel; there is no batch upload variant.
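A minimal sketch of parallel parsing with the Python SDK (the file paths are hypothetical):
```python Python theme={null}
# Sketch: parse several local files concurrently; there is no batch upload variant.
from concurrent.futures import ThreadPoolExecutor
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
paths = ["./report.pdf", "./invoice.pdf", "./notes.docx"]  # hypothetical files

with ThreadPoolExecutor(max_workers=4) as pool:
    docs = list(pool.map(firecrawl.parse, paths))

for path, doc in zip(paths, docs):
    print(path, "->", len(doc.markdown or ""), "markdown chars")
```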
# Proxies
Source: https://docs.firecrawl.dev/features/proxies
Learn about proxy types, locations, and how Firecrawl selects proxies for your requests.
Firecrawl provides different proxy types to help you scrape websites with varying levels of complexity. The proxy type can be specified using the `proxy` parameter.
> By default, Firecrawl routes all requests through proxies to help ensure reliability and access, even if you do not specify a proxy type or location.
## Location-Based Proxy Selection
Firecrawl automatically selects the best proxy based on your specified or detected location. This helps optimize scraping performance and reliability. However, not all locations are currently supported. The following locations are available:
| Country Code | Country Name | Basic Proxy Support | Enhanced Proxy Support |
| ------------ | -------------------- | ------------------- | ---------------------- |
| AE | United Arab Emirates | Yes | No |
| AU | Australia | Yes | No |
| BR | Brazil | Yes | No |
| CA | Canada | Yes | No |
| CN | China | Yes | No |
| CZ | Czechia | Yes | No |
| DE | Germany | Yes | No |
| DK | Denmark | Yes | Yes |
| EE | Estonia | Yes | No |
| EG | Egypt | Yes | No |
| ES | Spain | Yes | No |
| FR | France | Yes | No |
| GB | United Kingdom | Yes | No |
| GR | Greece | Yes | No |
| HU | Hungary | Yes | No |
| ID | Indonesia | Yes | No |
| IL | Israel | Yes | No |
| IN | India | Yes | No |
| IT | Italy | Yes | No |
| JP | Japan | Yes | No |
| MY | Malaysia | Yes | No |
| NO | Norway | Yes | No |
| PL | Poland | Yes | No |
| PT | Portugal | Yes | No |
| QA | Qatar | Yes | No |
| SG | Singapore | Yes | No |
| US | United States | Yes | Yes |
| VN | Vietnam | Yes | No |
The list of supported proxy locations will change over time.
If you need proxies in a location not listed above, please [contact us](mailto:help@firecrawl.com) and let us know your requirements.
If you do not specify a proxy or location, Firecrawl will automatically use US proxies.
## How to Specify Proxy Location
You can request a specific proxy location by setting the `location.country` parameter in your request. For example, to use a Brazilian proxy, set `location.country` to `BR`.
For full details, see the [API reference for `location.country`](https://docs.firecrawl.dev/api-reference/endpoint/scrape#body-location).
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape('https://example.com',
formats=['markdown'],
location={
'country': 'US',
'languages': ['en']
}
)
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', {
formats: ['markdown'],
location: { country: 'US', languages: ['en'] },
});
console.log(doc.metadata);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"formats": ["markdown"],
"location": { "country": "US", "languages": ["en"] }
}'
```
If you request a country where a proxy is not available, Firecrawl will use the closest available region (EU or US) and set the browser location to your requested country.
## Proxy Types
Firecrawl supports three types of proxies:
* **basic**: Proxies for scraping most sites. Fast and usually works.
* **enhanced**: Enhanced proxies for scraping complex sites while maintaining privacy. Slower, but more reliable on certain sites. [Learn more about Enhanced Mode →](/features/enhanced-mode)
* **auto**: Firecrawl will automatically retry scraping with enhanced proxies if the basic proxy fails. If the retry with enhanced is successful, 5 credits will be billed for the scrape. If the first attempt with basic is successful, only the regular cost will be billed.
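For example, a sketch selecting a proxy type on a scrape, assuming the v2 `proxy` option on the SDK (mirroring the `location` examples above):
```python Python theme={null}
# Sketch: let Firecrawl retry with enhanced proxies if the basic attempt fails.
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = firecrawl.scrape(
    "https://example.com",
    formats=["markdown"],
    proxy="auto",  # "basic", "enhanced", or "auto"
)
print(doc.markdown)
```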
***
> **Note:** For detailed information on using enhanced proxies, including credit costs and retry strategies, see the [Enhanced Mode documentation](/features/enhanced-mode).
# Scrape
Source: https://docs.firecrawl.dev/features/scrape
Turn any url into clean data
Firecrawl converts web pages into markdown, ideal for LLM applications.
* It manages complexities: proxies, caching, rate limits, js-blocked content
* Handles dynamic content: dynamic websites, js-rendered sites, PDFs, images
* Outputs clean markdown, structured data, screenshots, or HTML.
For details, see the [Scrape Endpoint API Reference](https://docs.firecrawl.dev/api-reference/endpoint/scrape).
Test scraping in the interactive playground — no code required.
If a request fails, see [Errors](/api-reference/errors) for the full catalog of error codes, causes, remedies, and retry guidance.
## Scraping a URL with Firecrawl
### /scrape endpoint
Used to scrape a URL and get its content.
### Installation
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
```
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl
# Authenticate (one-time setup)
firecrawl login
```
### Usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
# Scrape a website:
doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown", "html"])
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Scrape a website:
const doc = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown', 'html'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["markdown", "html"]
}'
```
```bash CLI theme={null}
# Scrape a URL and get markdown
firecrawl https://firecrawl.dev
# With multiple formats (returns JSON)
firecrawl https://firecrawl.dev --format markdown,html,links --pretty
```
For more details about the parameters, refer to the [API Reference](https://docs.firecrawl.dev/api-reference/endpoint/scrape).
Each scrape consumes 1 credit. Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, PDF parsing costs 1 credit per PDF page, and audio extraction costs 4 additional credits per page.
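For example, a single page scraped with JSON mode through an enhanced proxy costs 1 + 4 + 4 = 9 credits.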
### Response
SDKs will return the data object directly. cURL will return the payload exactly as shown below.
```json theme={null}
{
"success": true,
"data" : {
"markdown": "Launch Week I is here! [See our Day 2 Release 🚀](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[💥 Get 2 months free...",
"html": "
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR-API-KEY")
class CompanyInfo(BaseModel):
company_mission: str
supports_sso: bool
is_open_source: bool
is_in_yc: bool
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"schema": CompanyInfo.model_json_schema()
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { z } from "zod";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
// Define schema to extract contents into
const schema = z.object({
company_mission: z.string(),
supports_sso: z.boolean(),
is_open_source: z.boolean(),
is_in_yc: z.boolean()
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
schema: schema
}],
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [ {
"type": "json",
"schema": {
"type": "object",
"properties": {
"company_mission": {
"type": "string"
},
"supports_sso": {
"type": "boolean"
},
"is_open_source": {
"type": "boolean"
},
"is_in_yc": {
"type": "boolean"
}
},
"required": [
"company_mission",
"supports_sso",
"is_open_source",
"is_in_yc"
]
}
} ]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
"supports_sso": true,
"is_open_source": true,
"is_in_yc": true
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### Extracting without schema
You can also extract without a schema by passing only a `prompt` to the endpoint. The LLM chooses the structure of the data.
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"prompt": "Extract the company mission from the page."
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
prompt: "Extract the company mission from the page."
}]
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [{
"type": "json",
"prompt": "Extract the company mission from the page."
}]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### JSON format options
When using the `json` format, pass an object inside `formats` with the following parameters:
* `schema`: JSON Schema for the structured output.
* `prompt`: Optional prompt to help guide extraction when a schema is present or when you prefer light guidance.
## Extract brand identity
### /scrape (with branding) endpoint
The branding format extracts comprehensive brand identity information from a webpage, including colors, fonts, typography, spacing, UI components, and more. This is useful for design system analysis, brand monitoring, or building tools that need to understand a website's visual identity.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
result = firecrawl.scrape(
url='https://firecrawl.dev',
formats=['branding']
)
print(result['branding'])
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const result = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['branding']
});
console.log(result.branding);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["branding"]
}'
```
### Response
The branding format returns a comprehensive `BrandingProfile` object with the following structure:
```json Output theme={null}
{
"success": true,
"data": {
"branding": {
"colorScheme": "dark",
"logo": "https://firecrawl.dev/logo.svg",
"colors": {
"primary": "#FF6B35",
"secondary": "#004E89",
"accent": "#F77F00",
"background": "#1A1A1A",
"textPrimary": "#FFFFFF",
"textSecondary": "#B0B0B0"
},
"fonts": [
{
"family": "Inter"
},
{
"family": "Roboto Mono"
}
],
"typography": {
"fontFamilies": {
"primary": "Inter",
"heading": "Inter",
"code": "Roboto Mono"
},
"fontSizes": {
"h1": "48px",
"h2": "36px",
"h3": "24px",
"body": "16px"
},
"fontWeights": {
"regular": 400,
"medium": 500,
"bold": 700
}
},
"spacing": {
"baseUnit": 8,
"borderRadius": "8px"
},
"components": {
"buttonPrimary": {
"background": "#FF6B35",
"textColor": "#FFFFFF",
"borderRadius": "8px"
},
"buttonSecondary": {
"background": "transparent",
"textColor": "#FF6B35",
"borderColor": "#FF6B35",
"borderRadius": "8px"
}
},
"images": {
"logo": "https://firecrawl.dev/logo.svg",
"favicon": "https://firecrawl.dev/favicon.ico",
"ogImage": "https://firecrawl.dev/og-image.png"
}
}
}
}
```
### Branding Profile Structure
The `branding` object contains the following properties:
* `colorScheme`: The detected color scheme (`"light"` or `"dark"`)
* `logo`: URL of the primary logo
* `colors`: Object containing brand colors:
* `primary`, `secondary`, `accent`: Main brand colors
* `background`, `textPrimary`, `textSecondary`: UI colors
* `link`, `success`, `warning`, `error`: Semantic colors
* `fonts`: Array of font families used on the page
* `typography`: Detailed typography information:
* `fontFamilies`: Primary, heading, and code font families
* `fontSizes`: Size definitions for headings and body text
* `fontWeights`: Weight definitions (light, regular, medium, bold)
* `lineHeights`: Line height values for different text types
* `spacing`: Spacing and layout information:
* `baseUnit`: Base spacing unit in pixels
* `borderRadius`: Default border radius
* `padding`, `margins`: Spacing values
* `components`: UI component styles:
* `buttonPrimary`, `buttonSecondary`: Button styles
* `input`: Input field styles
* `icons`: Icon style information
* `images`: Brand images (logo, favicon, og:image)
* `animations`: Animation and transition settings
* `layout`: Layout configuration (grid, header/footer heights)
* `personality`: Brand personality traits (tone, energy, target audience)
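As a rough sketch, assuming dict-style access matching the sample response above, you can read individual design tokens like this:
```python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

result = firecrawl.scrape(url='https://firecrawl.dev', formats=['branding'])
branding = result['branding']

# Keys mirror the sample response; any given site may omit some of them
print(branding['colors']['primary'])            # e.g. "#FF6B35"
print(branding['typography']['fontFamilies'])   # primary/heading/code fonts
print(branding['components']['buttonPrimary'])  # primary button style
```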
### Combining with other formats
You can combine the branding format with other formats to get comprehensive page data:
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
result = firecrawl.scrape(
url='https://firecrawl.dev',
formats=['markdown', 'branding', 'screenshot']
)
print(result['markdown'])
print(result['branding'])
print(result['screenshot'])
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const result = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown', 'branding', 'screenshot']
});
console.log(result.markdown);
console.log(result.branding);
console.log(result.screenshot);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["markdown", "branding", "screenshot"]
}'
```
## Audio extraction
The `audio` format extracts audio from supported websites (e.g. YouTube) as MP3 files and returns a signed Google Cloud Storage URL. This is useful for building audio processing pipelines, transcription services, or podcast tools.
Audio extraction costs 5 credits per page (1 base + 4 additional).
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape("https://www.youtube.com/watch?v=dQw4w9WgXcQ", formats=["audio"])
print(doc.audio) # Signed GCS URL to the MP3 file
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://www.youtube.com/watch?v=dQw4w9WgXcQ', {
formats: ['audio']
});
console.log(doc.audio); // Signed GCS URL to the MP3 file
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"formats": ["audio"]
}'
```
## Interacting with the page with Actions
Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
**We recommend [Interact](/features/interact) over actions: our newer, more powerful way to interact with scraped pages.**
Interact runs as a stateful browser session that stays alive across calls, so you can drive a page turn-by-turn with either:
* **Natural language** for flexible, non-deterministic flows. e.g. *“search for ‘wireless headphones’, filter to 4+ stars under \$200, and return the results”*.
* **Playwright or agent-browser code** for deterministic steps. e.g. `await page.click('#export')`.
Interact also supports profiles, persistent sessions, and a live embeddable browser view (with an interactive mode where end users can drive the browser themselves).
Here is an example of using actions to fill in a login form, submit it, wait for the page to load, and take a screenshot.
In almost all cases, add a `wait` action before or after other actions to give the page enough time to load.
### Example
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape(
url="https://example.com/login",
formats=["markdown"],
actions=[
{"type": "write", "text": "john@example.com"},
{"type": "press", "key": "Tab"},
{"type": "write", "text": "secret"},
{"type": "click", "selector": 'button[type="submit"]'},
{"type": "wait", "milliseconds": 1500},
{"type": "screenshot", "full_page": True},
],
)
print(doc.markdown, doc.screenshot)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com/login', {
formats: ['markdown'],
actions: [
{ type: 'write', text: 'john@example.com' },
{ type: 'press', key: 'Tab' },
{ type: 'write', text: 'secret' },
{ type: 'click', selector: 'button[type="submit"]' },
{ type: 'wait', milliseconds: 1500 },
{ type: 'screenshot', fullPage: true },
],
});
console.log(doc.markdown, doc.screenshot);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://example.com/login",
"formats": ["markdown"],
"actions": [
{ "type": "write", "text": "john@example.com" },
{ "type": "press", "key": "Tab" },
{ "type": "write", "text": "secret" },
{ "type": "click", "selector": "button[type=\"submit\"]" },
{ "type": "wait", "milliseconds": 1500 },
{ "type": "screenshot", "fullPage": true },
],
}'
```
### Output
```json JSON theme={null}
{
"success": true,
"data": {
"markdown": "Our first Launch Week is over! [See the recap 🚀](blog/firecrawl-launch-week-1-recap)...",
"actions": {
"screenshots": [
"https://alttmdsdujxrfnakrkyi.supabase.co/storage/v1/object/public/media/screenshot-75ef2d87-31e0-4349-a478-fb432a29e241.png"
],
"scrapes": [
{
"url": "https://www.firecrawl.dev/",
"html": "
Firecrawl
"
}
]
},
"metadata": {
"title": "Home - Firecrawl",
"description": "Firecrawl crawls and converts any website into clean markdown.",
"language": "en",
"keywords": "Firecrawl,Markdown,Data,Mendable,Langchain",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "Turn any website into LLM-ready data.",
"ogUrl": "https://www.firecrawl.dev/",
"ogImage": "https://www.firecrawl.dev/og.png?123",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "http://google.com",
"statusCode": 200
}
}
}
```
For workflows that require richer browser control after scraping, such as authenticated sessions, multi-step navigation, or a live view of the page, we recommend [Interact](/features/interact) over extending the actions array.
## Location and Language
Specify country and preferred languages to get relevant content based on your target location and language preferences.
### How it works
When you specify location settings, Firecrawl uses an appropriate proxy if available and emulates the corresponding language and timezone. If not specified, the location defaults to 'US'.
### Usage
To use the location and language settings, include the `location` object in your request body with the following properties:
* `country`: ISO 3166-1 alpha-2 country code (e.g., 'US', 'AU', 'DE', 'JP'). Defaults to 'US'.
* `languages`: An array of preferred languages and locales for the request in order of priority. Defaults to the language of the specified location.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape('https://example.com',
formats=['markdown'],
location={
'country': 'US',
'languages': ['en']
}
)
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', {
formats: ['markdown'],
location: { country: 'US', languages: ['en'] },
});
console.log(doc.metadata);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"formats": ["markdown"],
"location": { "country": "US", "languages": ["en"] }
}'
```
For more details about supported locations, refer to the [Proxies documentation](/features/proxies).
## Caching and maxAge
To make requests faster, Firecrawl serves results from cache by default when a recent copy is available.
* **Default freshness window**: `maxAge = 172800000` ms (2 days). If a cached page is newer than this, it’s returned instantly; otherwise, the page is scraped and then cached.
* **Performance**: This can speed up scrapes by up to 5x when data doesn’t need to be ultra-fresh.
* **Always fetch fresh**: Set `maxAge` to `0`. Note that this bypasses the cache entirely, so every request goes through the full scraping pipeline, meaning that the request will take longer to complete and is more likely to fail. Use a non-zero `maxAge` if freshness on every request is not critical.
* **Avoid storing**: Set `storeInCache` to `false` if you don’t want Firecrawl to cache/store results for this request.
* **Cache-only lookup**: Set `minAge` to perform a cache-only lookup without triggering a fresh scrape. The value is in milliseconds and specifies the minimum age the cached data must be. If no cached data is found, a `404` with error code `SCRAPE_NO_CACHED_DATA` is returned. Set `minAge` to `1` to accept any cached data regardless of age (see the cache-only example below).
* **Change tracking**: Requests that include `changeTracking` bypass the cache, so `maxAge` is ignored.
* **Credits**: Cached results still cost 1 credit per page. Caching improves speed, not credit usage.
Example (force fresh content):
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
doc = firecrawl.scrape(url='https://example.com', max_age=0, formats=['markdown'])
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', { maxAge: 0, formats: ['markdown'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"maxAge": 0,
"formats": ["markdown"]
}'
```
Example (use a 10-minute cache window):
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
doc = firecrawl.scrape(url='https://example.com', max_age=600000, formats=['markdown', 'html'])
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', { maxAge: 600000, formats: ['markdown', 'html'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"maxAge": 600000,
"formats": ["markdown", "html"]
}'
```
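Example (cache-only lookup; the Python kwarg name `min_age` is an assumption here, mirroring `max_age`):
```python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

# Accept any cached copy, however old, but never trigger a fresh scrape.
# If nothing is cached, the API returns a 404 with code SCRAPE_NO_CACHED_DATA.
doc = firecrawl.scrape(url='https://example.com', min_age=1, formats=['markdown'])
print(doc)
```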
## Batch scraping multiple URLs
You can batch scrape multiple URLs at the same time. The endpoint takes a list of URLs and optional parameters; the parameters let you specify additional options for the batch scrape job, such as the output formats.
### How it works
It is very similar to how the `/crawl` endpoint works. It submits a batch scrape job and returns a job ID to check the status of the batch scrape.
The SDK provides two methods, synchronous and asynchronous. The synchronous method returns the results of the batch scrape job, while the asynchronous method returns a job ID that you can use to check the status of the batch scrape.
### Usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
job = firecrawl.batch_scrape([
"https://firecrawl.dev",
"https://docs.firecrawl.dev",
], formats=["markdown"], poll_interval=2, wait_timeout=120)
print(job)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const job = await firecrawl.batchScrape([
'https://firecrawl.dev',
'https://docs.firecrawl.dev',
], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 });
console.log(job);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
"formats": ["markdown"]
}'
```
### Response
The synchronous SDK methods return the results of the batch scrape job directly. Otherwise, you receive a job ID that you can use to check the status of the batch scrape.
#### Synchronous
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
#### Asynchronous
You can then use the job ID to check the status of the batch scrape by calling the `/batch/scrape/{id}` endpoint. This endpoint is meant to be used while the job is still running or right after it has completed **as batch scrape jobs expire after 24 hours**.
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
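If you prefer to hit the status endpoint directly rather than through the SDK, here is a minimal sketch with `requests` (the job ID is the placeholder from the response above; the printed fields are the ones shown in the completed response):
```python theme={null}
import requests

job_id = "123-456-789"  # from the batch scrape response
resp = requests.get(
    f"https://api.firecrawl.dev/v2/batch/scrape/{job_id}",
    headers={"Authorization": "Bearer fc-YOUR-API-KEY"},
)
status = resp.json()
print(status["status"], status.get("completed"), "/", status.get("total"))
```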
## Enhanced Mode
For complex websites, Firecrawl offers enhanced mode that provides better success rates while maintaining privacy.
Learn more about [Enhanced Mode](/features/enhanced-mode).
## Zero Data Retention (ZDR)
Firecrawl supports Zero Data Retention (ZDR) for teams with strict data handling requirements. When enabled, Firecrawl will not persist any page content or extracted data beyond the lifetime of the request.
To enable ZDR, set `zeroDataRetention: true` in your request:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"url": "https://example.com",
"formats": ["markdown"],
"zeroDataRetention": true
}'
```
ZDR is available on Enterprise plans and must be enabled for your team. Visit [firecrawl.dev/enterprise](https://www.firecrawl.dev/enterprise) to get started.
ZDR adds **1 additional credit per page** on top of the base scrape cost.
Screenshots are not available in ZDR mode. Because screenshots require uploading to persistent storage, they are incompatible with the ZDR guarantee. Requests that include both `zeroDataRetention: true` and a `screenshot` format will return an error.
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Search
Source: https://docs.firecrawl.dev/features/search
Search the web and get full content from results
Search the web and get clean, structured content from every result in a single API call. Pass a query to `/search` and Firecrawl returns titles, descriptions, and URLs. Add `scrapeOptions` to also retrieve full-page markdown, HTML, links, or screenshots for each result.
For the full parameter list, see the [Search Endpoint API Reference](https://docs.firecrawl.dev/api-reference/endpoint/search).
## Performing a Search with Firecrawl
### /search endpoint
Used to perform web searches and optionally retrieve content from the results.
### Installation
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
```
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl
# Authenticate (one-time setup)
firecrawl login
```
### Basic Usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
results = firecrawl.search(
query="firecrawl",
limit=3,
)
print(results)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const results = await firecrawl.search('firecrawl', {
limit: 3,
scrapeOptions: { formats: ['markdown'] }
});
console.log(results);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/search" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "firecrawl",
"limit": 3
}'
```
```bash CLI theme={null}
# Search the web
firecrawl search "firecrawl web scraping" --limit 5 --pretty
```
### Response
SDKs will return the data object directly. cURL will return the complete payload.
```json JSON theme={null}
{
"success": true,
"data": {
"web": [
{
"url": "https://www.firecrawl.dev/",
"title": "Firecrawl - The Web Data API for AI",
"description": "The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders.",
"position": 1
},
{
"url": "https://github.com/firecrawl/firecrawl",
"title": "mendableai/firecrawl: Turn entire websites into LLM-ready ... - GitHub",
"description": "Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data.",
"position": 2
},
...
],
"images": [
{
"title": "Quickstart | Firecrawl",
"imageUrl": "https://mintlify.s3.us-west-1.amazonaws.com/firecrawl/logo/logo.png",
"imageWidth": 5814,
"imageHeight": 1200,
"url": "https://docs.firecrawl.dev/",
"position": 1
},
...
],
"news": [
{
"title": "Y Combinator startup Firecrawl is ready to pay $1M to hire three AI agents as employees",
"url": "https://techcrunch.com/2025/05/17/y-combinator-startup-firecrawl-is-ready-to-pay-1m-to-hire-three-ai-agents-as-employees/",
"snippet": "It's now placed three new ads on YC's job board for “AI agents only” and has set aside a $1 million budget total to make it happen.",
"date": "3 months ago",
"position": 1
},
...
]
}
}
```
## Search result types
In addition to regular web results, Search supports specialized result types via the `sources` parameter:
* `web`: standard web results (default)
* `news`: news-focused results
* `images`: image search results
You can request multiple sources in a single call (e.g., `sources: ["web", "news"]`). When you do, the `limit` parameter applies **per source type** — so `limit: 5` with `sources: ["web", "news"]` returns up to 5 web results and up to 5 news results (10 total). If you need different parameters per source (for example, different `limit` values or different `scrapeOptions`), make separate calls instead.
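For example, requesting web and news together in the Python SDK (assuming the `sources` kwarg passes straight through to the API parameter):
```python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# limit applies per source: up to 5 web results and up to 5 news results
results = firecrawl.search(
    query="openai",
    sources=["web", "news"],
    limit=5,
)
print(len(results.get("web", [])), len(results.get("news", [])))
```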
## Search Categories
Filter search results by specific categories using the `categories` parameter:
* `github`: Search within GitHub repositories, code, issues, and documentation
* `research`: Search academic and research websites (arXiv, Nature, IEEE, PubMed, etc.)
* `pdf`: Search for PDFs
### GitHub Category Search
Search specifically within GitHub repositories:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "web scraping python",
"categories": ["github"],
"limit": 10
}'
```
### Research Category Search
Search academic and research websites:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "machine learning transformers",
"categories": ["research"],
"limit": 10
}'
```
### Mixed Category Search
Combine multiple categories in one search:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "neural networks",
"categories": ["github", "research"],
"limit": 15
}'
```
### Category Response Format
Each search result includes a `category` field indicating its source:
```json theme={null}
{
"success": true,
"data": {
"web": [
{
"url": "https://github.com/example/neural-network",
"title": "Neural Network Implementation",
"description": "A PyTorch implementation of neural networks",
"category": "github"
},
{
"url": "https://arxiv.org/abs/2024.12345",
"title": "Advances in Neural Network Architecture",
"description": "Research paper on neural network improvements",
"category": "research"
}
]
}
}
```
Example requests for the `sources` parameter described above:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "openai",
"sources": ["news"],
"limit": 5
}'
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "jupiter",
"sources": ["images"],
"limit": 8
}'
```
### HD Image Search with Size Filtering
Use image search operators to find high-resolution images:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "sunset imagesize:1920x1080",
"sources": ["images"],
"limit": 5
}'
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "mountain wallpaper larger:2560x1440",
"sources": ["images"],
"limit": 8
}'
```
**Common HD resolutions:**
* `imagesize:1920x1080` - Full HD (1080p)
* `imagesize:2560x1440` - QHD (1440p)
* `imagesize:3840x2160` - 4K UHD
* `larger:1920x1080` - HD and above
* `larger:2560x1440` - QHD and above
## Search with Content Scraping
Search and retrieve content from the search results in one operation.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Search and scrape content
results = firecrawl.search(
"firecrawl web scraping",
limit=3,
scrape_options={
"formats": ["markdown", "links"]
}
)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const results = await firecrawl.search('firecrawl', {
limit: 3,
scrapeOptions: { formats: ['markdown'] }
});
console.log(results);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "firecrawl web scraping",
"limit": 3,
"scrapeOptions": {
"formats": ["markdown", "links"]
}
}'
```
```bash CLI theme={null}
# Search and scrape results
firecrawl search "firecrawl" --scrape --scrape-formats markdown --limit 5 --pretty
```
Every scrape endpoint option is supported by the search endpoint through the `scrapeOptions` parameter.
### Response with Scraped Content
```json theme={null}
{
"success": true,
"data": [
{
"title": "Firecrawl - The Ultimate Web Scraping API",
"description": "Firecrawl is a powerful web scraping API that turns any website into clean, structured data for AI and analysis.",
"url": "https://firecrawl.dev/",
"markdown": "# Firecrawl\n\nThe Ultimate Web Scraping API\n\n## Turn any website into clean, structured data\n\nFirecrawl makes it easy to extract data from websites for AI applications, market research, content aggregation, and more...",
"links": [
"https://firecrawl.dev/pricing",
"https://firecrawl.dev/docs",
"https://firecrawl.dev/guides"
],
"metadata": {
"title": "Firecrawl - The Ultimate Web Scraping API",
"description": "Firecrawl is a powerful web scraping API that turns any website into clean, structured data for AI and analysis.",
"sourceURL": "https://firecrawl.dev/",
"statusCode": 200
}
}
]
}
```
## Advanced Search Options
Firecrawl's search API supports various parameters to customize your search:
### Location Customization
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Search with location settings (Germany)
search_result = firecrawl.search(
"web scraping tools",
limit=5,
location="Germany"
)
# Process the results
for result in search_result.data:
print(f"Title: {result['title']}")
print(f"URL: {result['url']}")
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Search with location settings (Germany)
const results = await firecrawl.search('web scraping tools', {
limit: 5,
location: "Germany"
});
// Process the results
console.log(results);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "web scraping tools",
"limit": 5,
"location": "Germany"
}'
```
```bash CLI theme={null}
# Search with location
firecrawl search "local restaurants" --location "San Francisco,California,United States" --country US --pretty
```
### Time-Based Search
Use the `tbs` parameter to filter results by time. Note that `tbs` only applies to `web` source results — it does not filter `news` or `images` results. If you need time-filtered news, consider using a `web` source with the `site:` operator to target specific news domains.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
results = firecrawl.search(
query="firecrawl",
limit=5,
tbs="qdr:d",
)
print(len(results.get('web', [])))
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const results = await firecrawl.search('firecrawl', {
limit: 5,
tbs: 'qdr:d', // past day
});
console.log(results.web);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "latest web scraping techniques",
"limit": 5,
"tbs": "qdr:w"
}'
```
```bash CLI theme={null}
# Search with time filter (past week)
firecrawl search "firecrawl updates" --tbs qdr:w --limit 5 --pretty
```
Common `tbs` values:
* `qdr:h` - Past hour
* `qdr:d` - Past 24 hours
* `qdr:w` - Past week
* `qdr:m` - Past month
* `qdr:y` - Past year
* `sbd:1` - Sort by date (newest first)
For more precise time filtering, you can specify exact date ranges using the custom date range format:
```python Python theme={null}
from firecrawl import Firecrawl
# Initialize the client with your API key
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Search for results from December 2024
search_result = firecrawl.search(
"firecrawl updates",
limit=10,
tbs="cdr:1,cd_min:12/1/2024,cd_max:12/31/2024"
)
```
```js JavaScript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
// Initialize the client with your API key
const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
// Search for results from December 2024
firecrawl.search("firecrawl updates", {
limit: 10,
tbs: "cdr:1,cd_min:12/1/2024,cd_max:12/31/2024"
})
.then(searchResult => {
console.log(searchResult.data);
});
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "firecrawl updates",
"limit": 10,
"tbs": "cdr:1,cd_min:12/1/2024,cd_max:12/31/2024"
}'
```
You can combine `sbd:1` with time filters to get date-sorted results within a time range. For example, `sbd:1,qdr:w` returns results from the past week sorted newest first, and `sbd:1,cdr:1,cd_min:12/1/2024,cd_max:12/31/2024` returns results from December 2024 sorted by date.
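For example, the past-week, date-sorted variant in the Python SDK:
```python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Past-week results, sorted newest first
results = firecrawl.search(
    query="firecrawl",
    limit=5,
    tbs="sbd:1,qdr:w",
)
print(results)
```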
### Custom Timeout
Set a custom timeout for search operations:
```python Python theme={null}
from firecrawl import Firecrawl
# Initialize the client with your API key
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Set a 30-second timeout
search_result = firecrawl.search(
"complex search query",
limit=10,
timeout=30000 # 30 seconds in milliseconds
)
```
```js JavaScript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
// Initialize the client with your API key
const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
// Set a 30-second timeout
firecrawl.search("complex search query", {
limit: 10,
timeout: 30000 // 30 seconds in milliseconds
})
.then(searchResult => {
// Process results
console.log(searchResult.data);
});
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "complex search query",
"limit": 10,
"timeout": 30000
}'
```
## Zero Data Retention (ZDR)
For teams with strict data handling requirements, Firecrawl offers Zero Data Retention (ZDR) options for the `/search` endpoint via the `enterprise` parameter. ZDR search is available on Enterprise plans — visit [firecrawl.dev/enterprise](https://www.firecrawl.dev/enterprise) to get started.
This is separate from the `zeroDataRetention` scrape option, which controls ZDR for scraping operations. See [Scrape ZDR](/features/scrape#zero-data-retention-zdr) for details. The `enterprise` parameter only applies to the search portion of the request.
### End-to-End ZDR
With end-to-end ZDR, both Firecrawl and our upstream search provider enforce zero data retention. No query or result data is stored at any point in the pipeline.
* **Cost:** 10 credits per 10 results
* **Parameter:** `enterprise: ["zdr"]`
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "sensitive topic",
"limit": 10,
"enterprise": ["zdr"]
}'
```
### Anonymized ZDR
With anonymized ZDR, Firecrawl enforces full zero data retention on our side. Our search provider may cache the query, but it is fully anonymized — no identifying information is attached.
* **Cost:** 2 credits per 10 results
* **Parameter:** `enterprise: ["anon"]`
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "sensitive topic",
"limit": 10,
"enterprise": ["anon"]
}'
```
### Combining Search ZDR with Scrape ZDR
If you are using search with content scraping (`scrapeOptions`), the `enterprise` parameter covers the search portion while `zeroDataRetention` in `scrapeOptions` covers the scraping portion. To get full ZDR across both, set both:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/search \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"query": "sensitive topic",
"limit": 5,
"enterprise": ["zdr"],
"scrapeOptions": {
"formats": ["markdown"],
"zeroDataRetention": true
}
}'
```
## Cost Implications
The cost of a search is 2 credits per 10 results, rounded up (1–10 results = 2 credits, 11–20 = 4 credits, and so on). If scraping options are enabled, the standard scraping costs apply to each search result:
* **Basic scrape**: 1 credit per webpage
* **PDF parsing**: 1 credit per PDF page
* **Enhanced proxy mode**: 4 additional credits per webpage
* **JSON mode**: 4 additional credits per webpage
To help control costs:
* Set `parsers: []` if PDF parsing isn’t required
* Use `proxy: "basic"` instead of `"enhanced"` when possible, or set it to `"auto"`
* Limit the number of search results with the `limit` parameter
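Putting those together, a cost-conscious search-and-scrape call might look like this (a sketch; `parsers` and `proxy` are assumed to pass through `scrape_options` like any other scrape option):
```python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

results = firecrawl.search(
    "firecrawl web scraping",
    limit=3,                      # fewer results, fewer scrape credits
    scrape_options={
        "formats": ["markdown"],
        "parsers": [],            # skip PDF parsing
        "proxy": "auto",          # avoid forcing enhanced proxy mode
    },
)
print(results)
```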
## Advanced Scraping Options
For more details about the scraping options, refer to the [Scrape Feature documentation](https://docs.firecrawl.dev/features/scrape). Everything except the FIRE-1 Agent and change-tracking features is supported by the Search endpoint.
# Introduction
Source: https://docs.firecrawl.dev/introduction
Search the web, scrape any page, and interact with it — all through one API.
**For AI agents:** Append `.md` to any docs URL for markdown, e.g. [introduction.md](/introduction.md).
## Get started
* Sign up and get your API key to start using Firecrawl
* Test the API instantly without writing any code
### Use Firecrawl with AI agents (recommended)
The Firecrawl skill is the fastest way for agents to discover and use Firecrawl. Without it, your agent will not know Firecrawl is available.
```bash theme={null}
npx -y firecrawl-cli@latest init --all --browser
```
Restart your agent after installing the skill. See [Skill + CLI](/sdks/cli) for the full setup.
Or use the [MCP Server](/mcp-server) to connect Firecrawl directly to Claude, Cursor, Windsurf, VS Code, and other AI tools.
***
## What can Firecrawl do?
* **Search**: search the web and get full page content from results
* **Scrape**: extract content from any URL as markdown, HTML, or structured JSON
* **Interact**: continue working with any scraped page, including clicking, filling forms, and extracting dynamic content
### Why Firecrawl?
* **LLM-ready output**: Clean markdown, structured JSON, screenshots, and more.
* **Handles the hard stuff**: Proxies, anti-bot, JavaScript rendering, and dynamic content.
* **Reliable**: Built for production with high uptime and consistent results.
* **Fast**: Results in seconds, optimized for high throughput.
* **MCP Server**: Connect Firecrawl to any AI tool via the [Model Context Protocol](/mcp-server).
***
## Search
Search the web and get full page content from results in one call. See the [Search feature docs](/features/search) for all options.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
results = firecrawl.search(
query="firecrawl",
limit=3,
)
print(results)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const results = await firecrawl.search('firecrawl', {
limit: 3,
scrapeOptions: { formats: ['markdown'] }
});
console.log(results);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/search" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "firecrawl",
"limit": 3
}'
```
```bash CLI theme={null}
# Search the web
firecrawl search "firecrawl web scraping" --limit 5 --pretty
```
SDKs will return the data object directly. cURL will return the complete payload.
```json JSON theme={null}
{
"success": true,
"data": {
"web": [
{
"url": "https://www.firecrawl.dev/",
"title": "Firecrawl - The Web Data API for AI",
"description": "The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders.",
"position": 1
},
{
"url": "https://github.com/firecrawl/firecrawl",
"title": "mendableai/firecrawl: Turn entire websites into LLM-ready ... - GitHub",
"description": "Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data.",
"position": 2
},
...
],
"images": [
{
"title": "Quickstart | Firecrawl",
"imageUrl": "https://mintlify.s3.us-west-1.amazonaws.com/firecrawl/logo/logo.png",
"imageWidth": 5814,
"imageHeight": 1200,
"url": "https://docs.firecrawl.dev/",
"position": 1
},
...
],
"news": [
{
"title": "Y Combinator startup Firecrawl is ready to pay $1M to hire three AI agents as employees",
"url": "https://techcrunch.com/2025/05/17/y-combinator-startup-firecrawl-is-ready-to-pay-1m-to-hire-three-ai-agents-as-employees/",
"snippet": "It's now placed three new ads on YC's job board for “AI agents only” and has set aside a $1 million budget total to make it happen.",
"date": "3 months ago",
"position": 1
},
...
]
}
}
```
## Scrape
Scrape any URL and get its content in markdown, HTML, or other formats. See the [Scrape feature docs](/features/scrape) for all options.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
# Scrape a website:
doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown", "html"])
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Scrape a website:
const doc = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown', 'html'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["markdown", "html"]
}'
```
```bash CLI theme={null}
# Scrape a URL and get markdown
firecrawl https://firecrawl.dev
# With multiple formats (returns JSON)
firecrawl https://firecrawl.dev --format markdown,html,links --pretty
```
SDKs will return the data object directly. cURL will return the payload exactly as shown below.
```json theme={null}
{
"success": true,
"data" : {
"markdown": "Launch Week I is here! [See our Day 2 Release 🚀](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[💥 Get 2 months free...",
"html": "
## Interact
Scrape a page, then keep working with it — click buttons, fill forms, extract dynamic content, or navigate deeper. Describe what you want in plain English or write code for full control. See the [Interact feature docs](/features/interact) for all options.
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
# 1. Scrape Amazon's homepage
result = app.scrape("https://www.amazon.com", formats=["markdown"])
scrape_id = result.metadata.scrape_id
# 2. Interact — search for a product and get its price
app.interact(scrape_id, prompt="Search for iPhone 16 Pro Max")
response = app.interact(scrape_id, prompt="Click on the first result and tell me the price")
print(response.output)
# 3. Stop the session
app.stop_interaction(scrape_id)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const app = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
// 1. Scrape Amazon's homepage
const result = await app.scrape('https://www.amazon.com', { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
// 2. Interact — search for a product and get its price
await app.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await app.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
console.log(response.output);
// 3. Stop the session
await app.stopInteraction(scrapeId);
```
```bash cURL theme={null}
# 1. Scrape Amazon's homepage
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com", "formats": ["markdown"]}')
SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId')
# 2. Interact — search for a product and get its price
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Search for iPhone 16 Pro Max"}'
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Click on the first result and tell me the price"}'
# 3. Stop the session
curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
```bash CLI theme={null}
# 1. Scrape Amazon's homepage (scrape ID is saved automatically)
firecrawl scrape https://www.amazon.com
# 2. Interact — search for a product and get its price
firecrawl interact "Search for iPhone 16 Pro Max"
firecrawl interact "Click on the first result and tell me the price"
# 3. Stop the session
firecrawl interact stop
```
```json Response theme={null}
{
"success": true,
"liveViewUrl": "https://liveview.firecrawl.dev/...",
"interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...",
"output": "The iPhone 16 Pro Max (256GB) is priced at $1,199.00.",
"exitCode": 0,
"killed": false
}
```
***
## More capabilities
* **Agent**: autonomous web data gathering powered by AI
* **Browser sessions**: managed sessions for interactive workflows
* **Map**: discover all URLs on a website
* **Crawl**: recursively gather content from entire sites
***
## Resources
* Complete API documentation with interactive examples
* Python, Node.js, CLI, and community SDKs
* Self-host Firecrawl or contribute to the project
* LangChain, LlamaIndex, OpenAI, and more
# Firecrawl MCP Server
Source: https://docs.firecrawl.dev/mcp-server
Use Firecrawl's API through the Model Context Protocol
A Model Context Protocol (MCP) server implementation that integrates [Firecrawl](https://github.com/firecrawl/firecrawl) for searching, scraping, and interacting with the web. Our MCP server is open-source and available on [GitHub](https://github.com/firecrawl/firecrawl-mcp-server).
## Features
* Search the web and get full page content
* Scrape any URL into clean, structured data
* Interact with pages — click, navigate, and operate
* Deep research with autonomous agent
* Browser session management
* Cloud and self-hosted support
* Streamable HTTP support
## Installation
You can either use our remote hosted URL or run the server locally. Get your API key from [https://firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys).
### Remote hosted URL
```bash theme={null}
https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp
```
### Running with npx
```bash theme={null}
env FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
```
### Manual Installation
```bash theme={null}
npm install -g firecrawl-mcp
```
### Running on Cursor
#### Manual Installation
Note: Requires Cursor version 0.45.6+
For the most up-to-date configuration instructions, please refer to the official Cursor documentation on configuring MCP servers:
[Cursor MCP Server Configuration Guide](https://docs.cursor.com/context/model-context-protocol#configuring-mcp-servers)
To configure Firecrawl MCP in Cursor **v0.48.6**
1. Open Cursor Settings
2. Go to Features > MCP Servers
3. Click "+ Add new global MCP server"
4. Enter the following code:
```json theme={null}
{
"mcpServers": {
"firecrawl-mcp": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR-API-KEY"
}
}
}
}
```
To configure Firecrawl MCP in Cursor **v0.45.6**
1. Open Cursor Settings
2. Go to Features > MCP Servers
3. Click "+ Add New MCP Server"
4. Enter the following:
* Name: "firecrawl-mcp" (or your preferred name)
* Type: "command"
* Command: `env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp`
> If you are using Windows and are running into issues, try `cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"`
Replace `your-api-key` with your Firecrawl API key. If you don't have one yet, you can create an account and get it from [https://www.firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys)
After adding, refresh the MCP server list to see the new tools. The Composer Agent will automatically use Firecrawl MCP when appropriate, but you can explicitly request it by describing your web data needs. Access the Composer via Command+L (Mac), select "Agent" next to the submit button, and enter your query.
### Running on Windsurf
Add this to your `./codeium/windsurf/model_config.json`:
```json theme={null}
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY"
}
}
}
}
```
### Running with Streamable HTTP Mode
To run the server using streamable HTTP transport locally instead of the default stdio transport:
```bash theme={null}
env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
```
Use the URL [http://localhost:3000/v2/mcp](http://localhost:3000/v2/mcp) or `https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp`.
### Installing via Smithery (Legacy)
To install Firecrawl for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@mendableai/mcp-server-firecrawl):
```bash theme={null}
npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
```
### Running on VS Code
For one-click installation, use one of the install buttons below:
[Install in VS Code](https://insiders.vscode.dev/redirect/mcp/install?name=firecrawl&inputs=%5B%7B%22type%22%3A%22promptString%22%2C%22id%22%3A%22apiKey%22%2C%22description%22%3A%22Firecrawl%20API%20Key%22%2C%22password%22%3Atrue%7D%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22firecrawl-mcp%22%5D%2C%22env%22%3A%7B%22FIRECRAWL_API_KEY%22%3A%22%24%7Binput%3AapiKey%7D%22%7D%7D) [Install in VS Code Insiders](https://insiders.vscode.dev/redirect/mcp/install?name=firecrawl&inputs=%5B%7B%22type%22%3A%22promptString%22%2C%22id%22%3A%22apiKey%22%2C%22description%22%3A%22Firecrawl%20API%20Key%22%2C%22password%22%3Atrue%7D%5D&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22firecrawl-mcp%22%5D%2C%22env%22%3A%7B%22FIRECRAWL_API_KEY%22%3A%22%24%7Binput%3AapiKey%7D%22%7D%7D&quality=insiders)
For manual installation, add the following JSON block to your User Settings (JSON) file in VS Code. You can do this by pressing `Ctrl + Shift + P` and typing `Preferences: Open User Settings (JSON)`.
```json theme={null}
{
"mcp": {
"inputs": [
{
"type": "promptString",
"id": "apiKey",
"description": "Firecrawl API Key",
"password": true
}
],
"servers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "${input:apiKey}"
}
}
}
}
}
```
Optionally, you can add it to a file called `.vscode/mcp.json` in your workspace. This will allow you to share the configuration with others:
```json theme={null}
{
"inputs": [
{
"type": "promptString",
"id": "apiKey",
"description": "Firecrawl API Key",
"password": true
}
],
"servers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "${input:apiKey}"
}
}
}
}
```
**Note:** Some users have reported issues when adding the MCP server to VS Code due to how it validates JSON with an outdated schema format ([microsoft/vscode#155379](https://github.com/microsoft/vscode/issues/155379)).
This affects several MCP tools, including Firecrawl.
**Workaround:** Disable JSON validation in VS Code to allow the MCP server to load properly.
See reference: [directus/directus#25906 (comment)](https://github.com/directus/directus/issues/25906#issuecomment-3369169513).
The MCP server still works fine when invoked via other extensions, but the issue occurs specifically when registering it directly in the MCP server list. We plan to add guidance once VS Code updates their schema validation.
### Running on Claude Desktop
Add this to the Claude config file:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"url": "https://mcp.firecrawl.dev/v2/mcp",
"headers": {
"Authorization": "Bearer YOUR_API_KEY"
}
}
}
}
```
If you get a "Couldn't reach the MCP server" error, your Claude Desktop version may not support streamable HTTP transport. Use the local npx approach instead (requires [Node.js](https://nodejs.org)):
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY"
}
}
}
}
```
If you see a `spawn npx ENOENT` error, Node.js is not installed or not in your system PATH. Install Node.js from [nodejs.org](https://nodejs.org) (LTS version), then fully restart Claude Desktop. On Windows, you can also run `where npx` in Command Prompt and use the full path (e.g. `C:\\Program Files\\nodejs\\npx.cmd`) as the `command` value.
### Running on Claude Code
Add the Firecrawl MCP server using the Claude Code CLI. You can use the remote hosted URL or run locally:
```bash theme={null}
# Remote hosted URL (recommended)
claude mcp add firecrawl --url https://mcp.firecrawl.dev/your-api-key/v2/mcp
# Or run locally via npx
claude mcp add firecrawl -e FIRECRAWL_API_KEY=your-api-key -- npx -y firecrawl-mcp
```
### Running on Google Antigravity
Google Antigravity allows you to configure MCP servers directly through its Agent interface.
1. Open the Agent sidebar in the Editor or the Agent Manager view
2. Click the "..." (More Actions) menu and select **MCP Servers**
3. Select **View raw config** to open your local `mcp_config.json` file
4. Add the following configuration:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_FIRECRAWL_API_KEY"
}
}
}
}
```
5. Save the file and click **Refresh** in the Antigravity MCP interface to see the new tools
Replace `YOUR_FIRECRAWL_API_KEY` with your API key from [https://firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys).
### Running on n8n
To connect the Firecrawl MCP server in n8n:
1. Get your Firecrawl API key from [https://firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys)
2. In your n8n workflow, add an **AI Agent** node
3. In the AI Agent configuration, add a new **Tool**
4. Select **MCP Client Tool** as the tool type
5. Enter the MCP server Endpoint (replace `{YOUR_FIRECRAWL_API_KEY}` with your actual API key):
```
https://mcp.firecrawl.dev/{YOUR_FIRECRAWL_API_KEY}/v2/mcp
```
6. Set **Server Transport** to **HTTP Streamable**
7. Set **Authentication** to **None**
8. For **Tools to include**, you can select **All**, **Selected**, or **All Except** - this will expose the Firecrawl tools (scrape, crawl, map, search, extract, etc.)
For self-hosted deployments, run the MCP server with npx and enable HTTP transport mode:
```bash theme={null}
env HTTP_STREAMABLE_SERVER=true \
FIRECRAWL_API_KEY=fc-YOUR_API_KEY \
FIRECRAWL_API_URL=YOUR_FIRECRAWL_INSTANCE \
npx -y firecrawl-mcp
```
This starts the server on `http://localhost:3000/v2/mcp`, which you can use as the Endpoint in your n8n workflow. The `HTTP_STREAMABLE_SERVER=true` environment variable is required since n8n needs HTTP transport.
## Configuration
### Environment Variables
#### Required for Cloud API
* `FIRECRAWL_API_KEY`: Your Firecrawl API key
* Required when using cloud API (default)
* Optional when using self-hosted instance with `FIRECRAWL_API_URL`
* `FIRECRAWL_API_URL` (Optional): Custom API endpoint for self-hosted instances
* Example: `https://firecrawl.your-domain.com`
* If not provided, the cloud API will be used (requires API key)
#### Optional Configuration
##### Retry Configuration
* `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum number of retry attempts (default: 3)
* `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds before first retry (default: 1000)
* `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds between retries (default: 10000)
* `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2)
##### Credit Usage Monitoring
* `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Credit usage warning threshold (default: 1000)
* `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Credit usage critical threshold (default: 100)
### Configuration Examples
For cloud API usage with custom retry and credit monitoring:
```bash theme={null}
# Required for cloud API
export FIRECRAWL_API_KEY=your-api-key
# Optional retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=5 # Increase max retry attempts
export FIRECRAWL_RETRY_INITIAL_DELAY=2000 # Start with 2s delay
export FIRECRAWL_RETRY_MAX_DELAY=30000 # Maximum 30s delay
export FIRECRAWL_RETRY_BACKOFF_FACTOR=3 # More aggressive backoff
# Optional credit monitoring
export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000 # Warning at 2000 credits
export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500 # Critical at 500 credits
```
For self-hosted instance:
```bash theme={null}
# Required for self-hosted
export FIRECRAWL_API_URL=https://firecrawl.your-domain.com
# Optional authentication for self-hosted
export FIRECRAWL_API_KEY=your-api-key # If your instance requires auth
# Custom retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
export FIRECRAWL_RETRY_INITIAL_DELAY=500 # Start with faster retries
```
### Custom configuration with Claude Desktop
Add this to your `claude_desktop_config.json`:
```json theme={null}
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
"FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
"FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
"FIRECRAWL_RETRY_MAX_DELAY": "30000",
"FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
"FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
"FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
}
}
}
}
```
### System Configuration
The server includes several configurable parameters that can be set via environment variables. Here are the default values if not configured:
```typescript theme={null}
const CONFIG = {
retry: {
maxAttempts: 3, // Number of retry attempts for rate-limited requests
initialDelay: 1000, // Initial delay before first retry (in milliseconds)
maxDelay: 10000, // Maximum delay between retries (in milliseconds)
backoffFactor: 2, // Multiplier for exponential backoff
},
credit: {
warningThreshold: 1000, // Warn when credit usage reaches this level
criticalThreshold: 100, // Critical alert when credit usage reaches this level
},
};
```
These configurations control:
1. **Retry Behavior**
* Automatically retries failed requests due to rate limits
* Uses exponential backoff to avoid overwhelming the API
* Example: With default settings, retries will be attempted at the following delays (see the sketch after this list):
* 1st retry: 1 second delay
* 2nd retry: 2 seconds delay
* 3rd retry: 4 seconds delay (capped at maxDelay)
2. **Credit Usage Monitoring**
* Tracks API credit consumption for cloud API usage
* Provides warnings at specified thresholds
* Helps prevent unexpected service interruption
* Example: With default settings:
* Warning at 1000 credits remaining
* Critical alert at 100 credits remaining
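As a rough sketch of the retry math (mirroring the documented defaults above, not the server's actual source), each retry delay is `initialDelay * backoffFactor^(attempt - 1)`, capped at `maxDelay`:

```typescript theme={null}
// Sketch only: derives the example delays above from the CONFIG defaults.
function retryDelay(
  attempt: number, // 1-based retry number
  initialDelay = 1000,
  backoffFactor = 2,
  maxDelay = 10000,
): number {
  return Math.min(initialDelay * backoffFactor ** (attempt - 1), maxDelay);
}

[1, 2, 3].forEach((n) => console.log(`retry ${n}: ${retryDelay(n)}ms`));
// retry 1: 1000ms, retry 2: 2000ms, retry 3: 4000ms
```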
### Rate Limiting and Batch Processing
The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:
* Automatic rate limit handling with exponential backoff
* Efficient parallel processing for batch operations
* Smart request queuing and throttling
* Automatic retries for transient errors
## Available Tools
### 1. Scrape Tool (`firecrawl_scrape`)
Scrape content from a single URL with advanced options.
```json theme={null}
{
"name": "firecrawl_scrape",
"arguments": {
"url": "https://example.com",
"formats": ["markdown"],
"onlyMainContent": true,
"waitFor": 1000,
"mobile": false,
"includeTags": ["article", "main"],
"excludeTags": ["nav", "footer"],
"skipTlsVerification": false
}
}
```
### 2. Map Tool (`firecrawl_map`)
Map a website to discover all indexed URLs on the site.
```json theme={null}
{
"name": "firecrawl_map",
"arguments": {
"url": "https://example.com",
"search": "blog",
"sitemap": "include",
"includeSubdomains": false,
"limit": 100,
"ignoreQueryParameters": true
}
}
```
#### Map Tool Options:
* `url`: The base URL of the website to map
* `search`: Optional search term to filter URLs
* `sitemap`: Control sitemap usage - "include", "skip", or "only"
* `includeSubdomains`: Whether to include subdomains in the mapping
* `limit`: Maximum number of URLs to return
* `ignoreQueryParameters`: Whether to ignore query parameters when mapping
**Best for:** Discovering URLs on a website before deciding what to scrape; finding specific sections of a website.
**Returns:** Array of URLs found on the site.
### 3. Search Tool (`firecrawl_search`)
Search the web and optionally extract content from search results.
```json theme={null}
{
"name": "firecrawl_search",
"arguments": {
"query": "your search query",
"limit": 5,
"location": "United States",
"tbs": "qdr:m",
"scrapeOptions": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}
```
#### Search Tool Options:
* `query`: The search query string (required)
* `limit`: Maximum number of results to return
* `location`: Geographic location for search results
* `tbs`: Time-based search filter (e.g., `qdr:d` for past day, `qdr:w` for past week, `qdr:m` for past month)
* `filter`: Additional search filter
* `sources`: Array of source types to search (`web`, `images`, `news`); see the example after this list
* `scrapeOptions`: Options for scraping search result pages
* `enterprise`: Array of enterprise options (`default`, `anon`, `zdr`)
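For example, a variant of the call above that limits results to news articles from the past week (assuming `sources` takes the plain string values listed above):

```json theme={null}
{
  "name": "firecrawl_search",
  "arguments": {
    "query": "your search query",
    "limit": 5,
    "tbs": "qdr:w",
    "sources": ["news"]
  }
}
```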
### 4. Crawl Tool (`firecrawl_crawl`)
Start an asynchronous crawl with advanced options.
```json theme={null}
{
"name": "firecrawl_crawl",
"arguments": {
"url": "https://example.com",
"maxDiscoveryDepth": 2,
"limit": 100,
"allowExternalLinks": false,
"deduplicateSimilarURLs": true
}
}
```
### 5. Check Crawl Status (`firecrawl_check_crawl_status`)
Check the status of a crawl job.
```json theme={null}
{
"name": "firecrawl_check_crawl_status",
"arguments": {
"id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
**Returns:** Status and progress of the crawl job, including results if available.
### 6. Extract Tool (`firecrawl_extract`)
Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
```json theme={null}
{
"name": "firecrawl_extract",
"arguments": {
"urls": ["https://example.com/page1", "https://example.com/page2"],
"prompt": "Extract product information including name, price, and description",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
},
"required": ["name", "price"]
},
"allowExternalLinks": false,
"enableWebSearch": false,
"includeSubdomains": false
}
}
```
Example response:
```json theme={null}
{
"content": [
{
"type": "text",
"text": {
"name": "Example Product",
"price": 99.99,
"description": "This is an example product description"
}
}
],
"isError": false
}
```
#### Extract Tool Options:
* `urls`: Array of URLs to extract information from
* `prompt`: Custom prompt for the LLM extraction
* `schema`: JSON schema for structured data extraction
* `allowExternalLinks`: Allow extraction from external links
* `enableWebSearch`: Enable web search for additional context
* `includeSubdomains`: Include subdomains in extraction
When using a self-hosted instance, extraction uses your configured LLM. With the cloud API, it uses Firecrawl's managed LLM service.
### 7. Agent Tool (`firecrawl_agent`)
Autonomous web research agent that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query. This runs asynchronously -- it returns a job ID immediately, and you poll `firecrawl_agent_status` to check when complete and retrieve results.
```json theme={null}
{
"name": "firecrawl_agent",
"arguments": {
"prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
"schema": {
"type": "object",
"properties": {
"startups": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"funding": { "type": "string" },
"founded": { "type": "string" }
}
}
}
}
}
}
}
```
You can also provide specific URLs for the agent to focus on:
```json theme={null}
{
"name": "firecrawl_agent",
"arguments": {
"urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
"prompt": "Compare the features and pricing information from these pages"
}
}
```
#### Agent Tool Options:
* `prompt`: Natural language description of the data you want (required, max 10,000 characters)
* `urls`: Optional array of URLs to focus the agent on specific pages
* `schema`: Optional JSON schema for structured output
**Best for:** Complex research tasks where you don't know the exact URLs; multi-source data gathering; finding information scattered across the web; extracting data from JavaScript-heavy SPAs that fail with regular scrape.
**Returns:** Job ID for status checking. Use `firecrawl_agent_status` to poll for results.
### 8. Check Agent Status (`firecrawl_agent_status`)
Check the status of an agent job and retrieve results when complete. Poll every 15-30 seconds and keep polling for at least 2-3 minutes before considering the request failed.
```json theme={null}
{
"name": "firecrawl_agent_status",
"arguments": {
"id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
#### Agent Status Options:
* `id`: The agent job ID returned by `firecrawl_agent` (required)
**Possible statuses:**
* `processing`: Agent is still researching -- keep polling
* `completed`: Research finished -- response includes the extracted data
* `failed`: An error occurred
**Returns:** Status, progress, and results (if completed) of the agent job.
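If you drive these tools programmatically through an MCP client rather than letting a model call them, the polling loop looks roughly like this sketch. The `callTool` shape matches the `@modelcontextprotocol/sdk` `Client`; the string-based status check is illustrative, since the exact response payload may vary:

```typescript theme={null}
// Sketch: poll firecrawl_agent_status until the job leaves "processing".
type McpClient = {
  callTool(req: { name: string; arguments: Record<string, unknown> }): Promise<unknown>;
};

async function waitForAgent(client: McpClient, id: string): Promise<unknown> {
  for (let i = 0; i < 12; i++) { // ~3 minutes at 15 s intervals, per the guidance above
    const result = await client.callTool({
      name: "firecrawl_agent_status",
      arguments: { id },
    });
    const text = JSON.stringify(result);
    if (text.includes('"completed"')) return result; // response includes the extracted data
    if (text.includes('"failed"')) throw new Error("agent job failed");
    await new Promise((r) => setTimeout(r, 15_000)); // still processing: keep polling
  }
  throw new Error("gave up waiting for agent job");
}
```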
### 9. Create Browser Session (`firecrawl_browser_create`)
Create a persistent browser session for code execution via CDP (Chrome DevTools Protocol).
```json theme={null}
{
"name": "firecrawl_browser_create",
"arguments": {
"ttl": 120,
"activityTtl": 60
}
}
```
#### Browser Create Options:
* `ttl`: Total session lifetime in seconds (30-3600, optional)
* `activityTtl`: Idle timeout in seconds (10-3600, optional)
**Best for:** Running code (Python/JS) that interacts with a live browser page, multi-step browser automation, sessions with profiles that survive across multiple tool calls.
**Returns:** Session ID, CDP URL, and live view URL.
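Because the response includes a CDP URL, any CDP-compatible tooling can attach to the session. A minimal sketch using Playwright's `connectOverCDP`, assuming `FIRECRAWL_CDP_URL` holds the CDP URL returned by `firecrawl_browser_create`:

```typescript theme={null}
import { chromium } from "playwright";

// Attach to the Firecrawl browser session over CDP.
const cdpUrl = process.env.FIRECRAWL_CDP_URL!; // assumed: CDP URL from firecrawl_browser_create
const browser = await chromium.connectOverCDP(cdpUrl);
const page = browser.contexts()[0]?.pages()[0]; // reuse the session's open page, if any
console.log(await page?.title());
await browser.close(); // detaches; use firecrawl_browser_delete to destroy the session
```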
### 10. Execute Code in Browser (`firecrawl_browser_execute`)
Execute code in an active browser session. Supports agent-browser commands (bash), Python, or JavaScript.
```json theme={null}
{
"name": "firecrawl_browser_execute",
"arguments": {
"sessionId": "session-id-here",
"code": "agent-browser open https://example.com",
"language": "bash"
}
}
```
Python example with Playwright:
```json theme={null}
{
"name": "firecrawl_browser_execute",
"arguments": {
"sessionId": "session-id-here",
"code": "await page.goto('https://example.com')\ntitle = await page.title()\nprint(title)",
"language": "python"
}
}
```
#### Browser Execute Options:
* `sessionId`: The browser session ID (required)
* `code`: The code to execute (required)
* `language`: `bash`, `python`, or `node` (optional, defaults to `bash`)
**Common agent-browser commands (bash; a combined example follows this list):**
* `agent-browser open <url>` -- Navigate to URL
* `agent-browser snapshot` -- Get accessibility tree with clickable refs
* `agent-browser click @e5` -- Click element by ref from snapshot
* `agent-browser type @e3 "text"` -- Type into element
* `agent-browser screenshot [path]` -- Take screenshot
* `agent-browser scroll down` -- Scroll page
* `agent-browser wait 2000` -- Wait 2 seconds
**Returns:** Execution result including stdout, stderr, and exit code.
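Putting those commands together, a typical session might look like the following; the `@e3`/`@e5` refs are placeholders for whatever your own `snapshot` output reports:

```bash theme={null}
agent-browser open https://example.com   # navigate
agent-browser snapshot                   # prints elements with refs like @e3, @e5
agent-browser type @e3 "search terms"    # placeholder ref from your snapshot
agent-browser click @e5                  # placeholder ref from your snapshot
agent-browser wait 2000
agent-browser screenshot result.png
```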
### 11. Delete Browser Session (`firecrawl_browser_delete`)
Destroy a browser session.
```json theme={null}
{
"name": "firecrawl_browser_delete",
"arguments": {
"sessionId": "session-id-here"
}
}
```
#### Browser Delete Options:
* `sessionId`: The browser session ID to destroy (required)
**Returns:** Success confirmation.
### 12. List Browser Sessions (`firecrawl_browser_list`)
List browser sessions, optionally filtered by status.
```json theme={null}
{
"name": "firecrawl_browser_list",
"arguments": {
"status": "active"
}
}
```
#### Browser List Options:
* `status`: Filter by session status -- `active` or `destroyed` (optional)
**Returns:** Array of browser sessions.
### 13. Interact with Scraped Page (`firecrawl_interact`)
Interact with a previously scraped page in a live browser session. Scrape a page first with `firecrawl_scrape`, then use the returned `scrapeId` (from the scrape response metadata) to click buttons, fill forms, extract dynamic content, or navigate deeper. The response includes a `liveViewUrl` and `interactiveLiveViewUrl` you can open in your browser to watch or control the session in real time.
```json theme={null}
{
"name": "firecrawl_interact",
"arguments": {
"scrapeId": "scrape-id-from-previous-scrape",
"prompt": "Click the Sign In button"
}
}
```
#### Interact Tool Options:
* `scrapeId`: The scrape job ID from a previous `firecrawl_scrape` call (required)
* `prompt`: Natural language instruction describing the action to take (provide `prompt` or `code`)
* `code`: Code to execute in the browser session (provide `code` or `prompt`; see the example below)
* `language`: `bash`, `python`, or `node` (optional, defaults to `node`, only used with `code`)
* `timeout`: Execution timeout in seconds, 1–300 (optional, defaults to 30)
**Best for:** Multi-step workflows on a single page — searching a site, clicking through results, filling forms, extracting data that requires interaction.
**Returns:** Interaction result including `liveViewUrl` and `interactiveLiveViewUrl`.
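The `code` path is useful when a prompt is too ambiguous. The execution environment isn't spelled out here, but assuming the default `node` runtime exposes a Playwright-style `page` handle (as the Python `firecrawl_browser_execute` example above suggests), a call might look like:

```json theme={null}
{
  "name": "firecrawl_interact",
  "arguments": {
    "scrapeId": "scrape-id-from-previous-scrape",
    "code": "const title = await page.title(); console.log(title);",
    "language": "node",
    "timeout": 60
  }
}
```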
### 14. Stop Interact Session (`firecrawl_interact_stop`)
Stop an interact session for a scraped page. Call this when you are done interacting to free resources.
```json theme={null}
{
"name": "firecrawl_interact_stop",
"arguments": {
"scrapeId": "scrape-id-from-previous-scrape"
}
}
```
#### Interact Stop Options:
* `scrapeId`: The scrape ID for the session to stop (required)
**Returns:** Confirmation that the session has been stopped.
## Logging System
The server includes comprehensive logging:
* Operation status and progress
* Performance metrics
* Credit usage monitoring
* Rate limit tracking
* Error conditions
Example log messages:
```
[INFO] Firecrawl MCP Server initialized successfully
[INFO] Starting scrape for URL: https://example.com
[INFO] Starting crawl for URL: https://example.com
[WARNING] Credit usage has reached warning threshold
[ERROR] Rate limit exceeded, retrying in 2s...
```
## Error Handling
The server provides robust error handling:
* Automatic retries for transient errors
* Rate limit handling with backoff
* Detailed error messages
* Credit usage warnings
* Network resilience
Example error response:
```json theme={null}
{
"content": [
{
"type": "text",
"text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
}
],
"isError": true
}
```
## Development
```bash theme={null}
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
```
### Contributing
1. Fork the repository
2. Create your feature branch
3. Run tests: `npm test`
4. Submit a pull request
### Thanks to contributors
Thanks to [@vrknetha](https://github.com/vrknetha), [@cawstudios](https://caw.tech) for the initial implementation!
Thanks to MCP.so and Klavis AI for hosting and [@gstarwd](https://github.com/gstarwd), [@xiangkaiz](https://github.com/xiangkaiz) and [@zihaolin96](https://github.com/zihaolin96) for integrating our server.
## License
MIT License - see LICENSE file for details
# ASP.NET Core
Source: https://docs.firecrawl.dev/quickstarts/aspnet-core
Use Firecrawl with ASP.NET Core to search, scrape, and interact with web data using the REST API.
## Prerequisites
* .NET 6.0+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Configuration
Add your API key to `appsettings.json`:
```json theme={null}
{
"Firecrawl": {
"ApiKey": "fc-YOUR-API-KEY",
"BaseUrl": "https://api.firecrawl.dev/v2"
}
}
```
Or use environment variables / user secrets:
```bash theme={null}
export Firecrawl__ApiKey=fc-YOUR-API-KEY
```
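For local development, the standard `dotnet user-secrets` CLI keeps the key out of source control; the key name mirrors the `appsettings.json` structure above:

```bash theme={null}
dotnet user-secrets init
dotnet user-secrets set "Firecrawl:ApiKey" "fc-YOUR-API-KEY"
```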
## Create a service
Create `Services/FirecrawlService.cs`:
```csharp theme={null}
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
public class FirecrawlService
{
private readonly HttpClient _http;
private readonly string _baseUrl;
public FirecrawlService(IConfiguration config)
{
_baseUrl = config["Firecrawl:BaseUrl"] ?? "https://api.firecrawl.dev/v2";
_http = new HttpClient();
_http.DefaultRequestHeaders.Authorization =
new AuthenticationHeaderValue("Bearer", config["Firecrawl:ApiKey"]);
}
public async Task<JsonDocument> SearchAsync(string query, int limit = 5)
{
var content = new StringContent(
JsonSerializer.Serialize(new { query, limit }),
Encoding.UTF8, "application/json");
var response = await _http.PostAsync($"{_baseUrl}/search", content);
response.EnsureSuccessStatusCode();
return JsonDocument.Parse(await response.Content.ReadAsStringAsync());
}
public async Task<JsonDocument> ScrapeAsync(string url)
{
var content = new StringContent(
JsonSerializer.Serialize(new { url }),
Encoding.UTF8, "application/json");
var response = await _http.PostAsync($"{_baseUrl}/scrape", content);
response.EnsureSuccessStatusCode();
return JsonDocument.Parse(await response.Content.ReadAsStringAsync());
}
public async Task<JsonDocument> InteractAsync(string url, string prompt, string? followUp = null)
{
// 1. Scrape to open a browser session
var scrapeContent = new StringContent(
JsonSerializer.Serialize(new { url, formats = new[] { "markdown" } }),
Encoding.UTF8, "application/json");
var scrapeRes = await _http.PostAsync($"{_baseUrl}/scrape", scrapeContent);
scrapeRes.EnsureSuccessStatusCode();
var scrapeDoc = JsonDocument.Parse(await scrapeRes.Content.ReadAsStringAsync());
var scrapeId = scrapeDoc.RootElement
.GetProperty("data").GetProperty("metadata").GetProperty("scrapeId").GetString();
// 2. Send first prompt
var firstPrompt = new StringContent(
JsonSerializer.Serialize(new { prompt }),
Encoding.UTF8, "application/json");
await _http.PostAsync($"{_baseUrl}/scrape/{scrapeId}/interact", firstPrompt);
// 3. Send follow-up prompt
JsonDocument? result = null;
if (followUp != null)
{
var followUpContent = new StringContent(
JsonSerializer.Serialize(new { prompt = followUp }),
Encoding.UTF8, "application/json");
var followUpRes = await _http.PostAsync(
$"{_baseUrl}/scrape/{scrapeId}/interact", followUpContent);
followUpRes.EnsureSuccessStatusCode();
result = JsonDocument.Parse(await followUpRes.Content.ReadAsStringAsync());
}
// 4. Close the session
await _http.DeleteAsync($"{_baseUrl}/scrape/{scrapeId}/interact");
return result ?? scrapeDoc;
}
}
```
## Register and use
In `Program.cs`:
```csharp theme={null}
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton<FirecrawlService>();
var app = builder.Build();
app.MapPost("/api/search", async (FirecrawlService firecrawl, SearchRequest req) =>
{
var result = await firecrawl.SearchAsync(req.Query, req.Limit);
return Results.Ok(result.RootElement);
});
app.MapPost("/api/scrape", async (FirecrawlService firecrawl, ScrapeRequest req) =>
{
var result = await firecrawl.ScrapeAsync(req.Url);
return Results.Ok(result.RootElement);
});
app.MapPost("/api/interact", async (FirecrawlService firecrawl, InteractRequest req) =>
{
var result = await firecrawl.InteractAsync(req.Url, req.Prompt, req.FollowUp);
return Results.Ok(result.RootElement);
});
app.Run();
record SearchRequest(string Query, int Limit = 5);
record ScrapeRequest(string Url);
record InteractRequest(string Url, string Prompt, string? FollowUp = null);
```
## Run it
```bash theme={null}
dotnet run
```
## Test it
```bash theme={null}
# Search the web
curl -X POST http://localhost:5000/api/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
# Scrape a page
curl -X POST http://localhost:5000/api/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Interact with a page
curl -X POST http://localhost:5000/api/interact \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com", "prompt": "Search for iPhone 16 Pro Max", "followUp": "Click on the first result and tell me the price"}'
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Complete REST API documentation
# Astro
Source: https://docs.firecrawl.dev/quickstarts/astro
Use Firecrawl with Astro to scrape, search, and interact with web data in your content-driven site.
## Prerequisites
* Astro project with SSR enabled (`output: "server"` or `"hybrid"`)
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
Create `src/pages/api/search.ts`:
```typescript theme={null}
import type { APIRoute } from "astro";
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: import.meta.env.FIRECRAWL_API_KEY,
});
export const POST: APIRoute = async ({ request }) => {
const { query } = await request.json();
const results = await firecrawl.search(query, { limit: 5 });
return new Response(JSON.stringify(results), {
headers: { "Content-Type": "application/json" },
});
};
```
Or search at request time in a server-rendered page (`src/pages/search.astro`):
```astro theme={null}
---
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: import.meta.env.FIRECRAWL_API_KEY,
});
const query = Astro.url.searchParams.get("q");
let results = [];
if (query) {
const searchData = await firecrawl.search(query, { limit: 5 });
results = searchData.web || [];
}
---
<h1>Search Results</h1>
<ul>
  {results.map((r) => (
    <li><a href={r.url}>{r.title}</a></li>
  ))}
</ul>
```
## Scrape a page
Create `src/pages/api/scrape.ts`:
```typescript theme={null}
import type { APIRoute } from "astro";
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: import.meta.env.FIRECRAWL_API_KEY,
});
export const POST: APIRoute = async ({ request }) => {
const { url } = await request.json();
const result = await firecrawl.scrape(url);
return new Response(JSON.stringify(result), {
headers: { "Content-Type": "application/json" },
});
};
```
Or scrape at request time in a server-rendered page (`src/pages/scrape.astro`):
```astro theme={null}
---
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: import.meta.env.FIRECRAWL_API_KEY,
});
const target = Astro.url.searchParams.get("url");
let markdown = null;
if (target) {
const result = await firecrawl.scrape(target);
markdown = result.markdown;
}
---
<h1>Scraped Content</h1>
{markdown ? (
  <pre>{markdown}</pre>
) : (
  <p>Pass ?url= to scrape a page</p>
)}
```
## Interact with a page
Create `src/pages/api/interact.ts`:
```typescript theme={null}
import type { APIRoute } from "astro";
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: import.meta.env.FIRECRAWL_API_KEY,
});
export const POST: APIRoute = async () => {
const result = await firecrawl.scrape("https://www.amazon.com", {
formats: ["markdown"],
});
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, {
prompt: "Search for iPhone 16 Pro Max",
});
const response = await firecrawl.interact(scrapeId, {
prompt: "Click on the first result and tell me the price",
});
await firecrawl.stopInteraction(scrapeId);
return new Response(JSON.stringify({ output: response.output }), {
headers: { "Content-Type": "application/json" },
});
};
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# AWS Lambda
Source: https://docs.firecrawl.dev/quickstarts/aws-lambda
Use Firecrawl with AWS Lambda to search, scrape, and interact with web data in serverless functions.
## Prerequisites
* AWS account with Lambda access
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
mkdir firecrawl-lambda && cd firecrawl-lambda
npm init -y
npm install @mendable/firecrawl-js
```
## Search the web
Create `index.mjs` with a search handler:
```javascript theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
export async function handler(event) {
const body = JSON.parse(event.body || "{}");
if (body.action === "search") {
const results = await firecrawl.search(body.query, { limit: 5 });
return {
statusCode: 200,
body: JSON.stringify(results),
};
}
return { statusCode: 400, body: JSON.stringify({ error: "Unknown action" }) };
}
```
## Scrape a page
Add a `scrape` action to the same handler:
```javascript theme={null}
if (body.action === "scrape") {
const result = await firecrawl.scrape(body.url);
return {
statusCode: 200,
body: JSON.stringify(result),
};
}
```
## Interact with a page
Add an `interact` action to control a live browser session — click buttons, fill forms, and extract dynamic content:
```javascript theme={null}
if (body.action === "interact") {
const result = await firecrawl.scrape("https://www.amazon.com", {
formats: ["markdown"],
});
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, {
prompt: "Search for iPhone 16 Pro Max",
});
const response = await firecrawl.interact(scrapeId, {
prompt: "Click on the first result and tell me the price",
});
await firecrawl.stopInteraction(scrapeId);
return {
statusCode: 200,
body: JSON.stringify({ output: response.output }),
};
}
```
## Deploy
Package and deploy with the AWS CLI:
```bash theme={null}
zip -r function.zip index.mjs node_modules/
aws lambda create-function \
--function-name firecrawl-scraper \
--runtime nodejs20.x \
--handler index.handler \
--zip-file fileb://function.zip \
--role arn:aws:iam::YOUR_ACCOUNT:role/lambda-role \
--environment Variables="{FIRECRAWL_API_KEY=fc-YOUR-API-KEY}" \
--timeout 60
```
Set the Lambda timeout to at least 30 seconds. Scraping dynamic pages and interact sessions can take longer than the default 3-second timeout.
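To smoke-test the deployed function from the CLI (the payload's `body` string matches what the handler above parses; `--cli-binary-format raw-in-base64-out` is needed on AWS CLI v2):

```bash theme={null}
aws lambda invoke \
  --function-name firecrawl-scraper \
  --cli-binary-format raw-in-base64-out \
  --payload '{"body": "{\"action\": \"search\", \"query\": \"firecrawl web scraping\"}"}' \
  response.json
cat response.json
```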
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Bun
Source: https://docs.firecrawl.dev/quickstarts/bun
Use Firecrawl with Bun to build fast web scraping and search servers.
## Prerequisites
* Bun 1.0+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
bun add @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
Bun has a built-in HTTP server. Create `index.ts`:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
Bun.serve({
port: 3000,
async fetch(req) {
const url = new URL(req.url);
if (req.method === "POST" && url.pathname === "/search") {
const { query } = await req.json();
const results = await firecrawl.search(query, { limit: 5 });
return Response.json(results);
}
return new Response("Not found", { status: 404 });
},
});
console.log("Server running on port 3000");
```
Run it:
```bash theme={null}
bun run index.ts
```
## Scrape a page
Add a `/scrape` route to the same server:
```typescript theme={null}
if (req.method === "POST" && url.pathname === "/scrape") {
const { url: targetUrl } = await req.json();
const result = await firecrawl.scrape(targetUrl);
return Response.json(result);
}
```
## Interact with a page
Use interact to control a live browser session — click buttons, fill forms, and extract dynamic content.
```typescript theme={null}
if (req.method === "POST" && url.pathname === "/interact") {
const { url: targetUrl } = await req.json();
const result = await firecrawl.scrape(targetUrl, { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await firecrawl.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
await firecrawl.stopInteraction(scrapeId);
return Response.json({ output: response.output });
}
```
## Script usage
Use Firecrawl in a standalone Bun script:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const app = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
const results = await app.search("firecrawl web scraping", { limit: 5 });
console.log(results);
```
```bash theme={null}
bun run search.ts
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Cloudflare Workers
Source: https://docs.firecrawl.dev/quickstarts/cloudflare-workers
Use Firecrawl with Cloudflare Workers to search, scrape, and interact with web data at the edge.
## Prerequisites
* Wrangler CLI (`npm install -g wrangler`)
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
npm create cloudflare@latest my-scraper
cd my-scraper
npm install @mendable/firecrawl-js
```
Add your API key as a secret:
```bash theme={null}
wrangler secret put FIRECRAWL_API_KEY
```
## Search the web
Create a handler that searches the web and returns results with full page content.
Edit `src/index.ts`:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";
export interface Env {
FIRECRAWL_API_KEY: string;
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const firecrawl = new Firecrawl({ apiKey: env.FIRECRAWL_API_KEY });
const url = new URL(request.url);
if (request.method === "POST" && url.pathname === "/search") {
const { query } = (await request.json()) as { query: string };
const results = await firecrawl.search(query, { limit: 5 });
return Response.json(results);
}
return new Response("Not found", { status: 404 });
},
};
```
## Scrape a page
Add a `/scrape` route to extract clean markdown from any URL.
```typescript theme={null}
if (request.method === "POST" && url.pathname === "/scrape") {
const { url: targetUrl } = (await request.json()) as { url: string };
const result = await firecrawl.scrape(targetUrl);
return Response.json(result);
}
```
## Interact with a page
Add an `/interact` route to control a live browser session — click buttons, fill forms, and extract dynamic content.
```typescript theme={null}
if (request.method === "POST" && url.pathname === "/interact") {
const result = await firecrawl.scrape("https://www.amazon.com", {
formats: ["markdown"],
});
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, {
prompt: "Search for iPhone 16 Pro Max",
});
const response = await firecrawl.interact(scrapeId, {
prompt: "Click on the first result and tell me the price",
});
await firecrawl.stopInteraction(scrapeId);
return Response.json({ output: response.output });
}
```
## Deploy
```bash theme={null}
wrangler deploy
```
## Test it
```bash theme={null}
curl -X POST https://my-scraper.<your-subdomain>.workers.dev/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Deno Deploy
Source: https://docs.firecrawl.dev/quickstarts/deno-deploy
Use Firecrawl with Deno Deploy to search, scrape, and interact with web data at the edge.
## Prerequisites
* Deno 1.40+ or Deno 2
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
Create `main.ts`:
```typescript theme={null}
import Firecrawl from "npm:@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: Deno.env.get("FIRECRAWL_API_KEY"),
});
```
## Search the web
Add a `/search` route that searches the web and returns results with full page content.
```typescript theme={null}
Deno.serve(async (req) => {
const url = new URL(req.url);
if (req.method === "POST" && url.pathname === "/search") {
const { query } = await req.json();
const results = await firecrawl.search(query, { limit: 5 });
return Response.json(results);
}
return new Response("Not found", { status: 404 });
});
```
## Scrape a page
Add a `/scrape` route to extract clean markdown from any URL.
```typescript theme={null}
if (req.method === "POST" && url.pathname === "/scrape") {
const { url: targetUrl } = await req.json();
const result = await firecrawl.scrape(targetUrl);
return Response.json(result);
}
```
## Interact with a page
Add an `/interact` route to control a live browser session — click buttons, fill forms, and extract dynamic content.
```typescript theme={null}
if (req.method === "POST" && url.pathname === "/interact") {
const result = await firecrawl.scrape("https://www.amazon.com", {
formats: ["markdown"],
});
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, {
prompt: "Search for iPhone 16 Pro Max",
});
const response = await firecrawl.interact(scrapeId, {
prompt: "Click on the first result and tell me the price",
});
console.log(response.output);
await firecrawl.stopInteraction(scrapeId);
return Response.json({ output: response.output });
}
```
## Run locally
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY deno run --allow-net --allow-env main.ts
```
## Deploy
Install the Deno Deploy CLI (`deployctl`) and deploy:
```bash theme={null}
deployctl deploy --project=my-scraper main.ts
```
Set the environment variable in the Deno Deploy dashboard or via CLI:
```bash theme={null}
deployctl env set FIRECRAWL_API_KEY=fc-YOUR-API-KEY --project=my-scraper
```
## Test it
```bash theme={null}
curl -X POST https://my-scraper.deno.dev/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Django
Source: https://docs.firecrawl.dev/quickstarts/django
Use Firecrawl with Django to scrape, search, and interact with web data in your Python web application.
## Prerequisites
* Django 4+ project
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
pip install firecrawl-py
```
Add your API key to your Django settings or environment:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Create views
Add search, scrape, and interact views to your Django app. In `views.py`:
```python theme={null}
import json
import os
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])
@csrf_exempt
@require_POST
def search_view(request):
body = json.loads(request.body)
results = firecrawl.search(body["query"], limit=body.get("limit", 5))
return JsonResponse(
[{"title": r.title, "url": r.url} for r in results.web],
safe=False,
)
@csrf_exempt
@require_POST
def scrape_view(request):
body = json.loads(request.body)
result = firecrawl.scrape(body["url"])
return JsonResponse({
"markdown": result.markdown,
"metadata": result.metadata,
})
@csrf_exempt
@require_POST
def interact_start_view(request):
body = json.loads(request.body)
result = firecrawl.scrape(body["url"], formats=["markdown"])
return JsonResponse({"scrape_id": result.metadata.scrape_id})
@csrf_exempt
@require_POST
def interact_view(request):
body = json.loads(request.body)
response = firecrawl.interact(body["scrape_id"], prompt=body["prompt"])
return JsonResponse({"output": response.output})
@csrf_exempt
@require_POST
def interact_stop_view(request):
body = json.loads(request.body)
firecrawl.stop_interaction(body["scrape_id"])
return JsonResponse({"status": "stopped"})
```
## Wire up URLs
In `urls.py`:
```python theme={null}
from django.urls import path
from . import views
urlpatterns = [
path("api/search/", views.search_view),
path("api/scrape/", views.scrape_view),
path("api/interact/start/", views.interact_start_view),
path("api/interact/", views.interact_view),
path("api/interact/stop/", views.interact_stop_view),
]
```
## Test it
```bash theme={null}
python manage.py runserver
# Search the web
curl -X POST http://localhost:8000/api/search/ \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping", "limit": 5}'
# Scrape a page
curl -X POST http://localhost:8000/api/scrape/ \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Start an interactive session
curl -X POST http://localhost:8000/api/interact/start/ \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com"}'
```
## Management command
Use Firecrawl in a Django management command for scripts and data pipelines. Create `management/commands/scrape.py`:
```python theme={null}
import os
from django.core.management.base import BaseCommand
from firecrawl import Firecrawl
class Command(BaseCommand):
help = "Scrape a URL and print the markdown"
def add_arguments(self, parser):
parser.add_argument("url", type=str)
def handle(self, *args, **options):
firecrawl = Firecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])
result = firecrawl.scrape(options["url"])
self.stdout.write(result.markdown)
```
```bash theme={null}
python manage.py scrape https://example.com
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, async, and more
# .NET
Source: https://docs.firecrawl.dev/quickstarts/dotnet
Get started with Firecrawl in .NET. Scrape, search, and interact with web data using the REST API.
## Prerequisites
* .NET 6.0+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Search the web
Firecrawl works with .NET through the REST API using `HttpClient`.
```csharp theme={null}
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
var apiKey = Environment.GetEnvironmentVariable("FIRECRAWL_API_KEY");
var client = new HttpClient();
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
var content = new StringContent(
JsonSerializer.Serialize(new { query = "firecrawl web scraping", limit = 5 }),
Encoding.UTF8,
"application/json"
);
var response = await client.PostAsync("https://api.firecrawl.dev/v2/search", content);
var json = await response.Content.ReadAsStringAsync();
Console.WriteLine(json);
```
```json theme={null}
{
"success": true,
"data": {
"web": [
{
"url": "https://docs.firecrawl.dev",
"title": "Firecrawl Documentation",
"markdown": "# Firecrawl\n\nFirecrawl is a web scraping API..."
}
]
}
}
```
## Scrape a page
```csharp theme={null}
var scrapeContent = new StringContent(
JsonSerializer.Serialize(new { url = "https://example.com" }),
Encoding.UTF8,
"application/json"
);
var scrapeResponse = await client.PostAsync("https://api.firecrawl.dev/v2/scrape", scrapeContent);
var scrapeJson = await scrapeResponse.Content.ReadAsStringAsync();
using var doc = JsonDocument.Parse(scrapeJson);
var markdown = doc.RootElement.GetProperty("data").GetProperty("markdown").GetString();
Console.WriteLine(markdown);
```
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
"metadata": {
"title": "Example Domain",
"sourceURL": "https://example.com"
}
}
}
```
## Interact with a page
Start a browser session, interact with the page using natural-language prompts, then close the session.
### Step 1 — Scrape to start a session
```csharp theme={null}
var sessionContent = new StringContent(
JsonSerializer.Serialize(new { url = "https://www.amazon.com", formats = new[] { "markdown" } }),
Encoding.UTF8,
"application/json"
);
var sessionResponse = await client.PostAsync("https://api.firecrawl.dev/v2/scrape", sessionContent);
var sessionJson = await sessionResponse.Content.ReadAsStringAsync();
using var sessionDoc = JsonDocument.Parse(sessionJson);
var scrapeId = sessionDoc.RootElement
.GetProperty("data")
.GetProperty("metadata")
.GetProperty("scrapeId")
.GetString();
Console.WriteLine($"scrapeId: {scrapeId}");
```
### Step 2 — Send interactions
```csharp theme={null}
var interactUrl = $"https://api.firecrawl.dev/v2/scrape/{scrapeId}/interact";
// Search for a product
var searchBody = new StringContent(
JsonSerializer.Serialize(new { prompt = "Search for iPhone 16 Pro Max" }),
Encoding.UTF8,
"application/json"
);
var searchResult = await client.PostAsync(interactUrl, searchBody);
Console.WriteLine(await searchResult.Content.ReadAsStringAsync());
// Click on the first result
var clickBody = new StringContent(
JsonSerializer.Serialize(new { prompt = "Click on the first result and tell me the price" }),
Encoding.UTF8,
"application/json"
);
var clickResult = await client.PostAsync(interactUrl, clickBody);
Console.WriteLine(await clickResult.Content.ReadAsStringAsync());
```
### Step 3 — Stop the session
```csharp theme={null}
await client.DeleteAsync(interactUrl);
Console.WriteLine("Session stopped");
```
## Reusable client class
For repeated use, wrap the API in a typed client:
```csharp theme={null}
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
public class FirecrawlClient
{
private readonly HttpClient _http;
private const string BaseUrl = "https://api.firecrawl.dev/v2";
public FirecrawlClient(string apiKey)
{
_http = new HttpClient();
_http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
}
private async Task<JsonDocument> PostAsync(string endpoint, object payload)
{
var content = new StringContent(
JsonSerializer.Serialize(payload),
Encoding.UTF8,
"application/json"
);
var response = await _http.PostAsync($"{BaseUrl}{endpoint}", content);
response.EnsureSuccessStatusCode();
var json = await response.Content.ReadAsStringAsync();
return JsonDocument.Parse(json);
}
public async Task<JsonDocument> ScrapeAsync(string url)
{
return await PostAsync("/scrape", new { url });
}
public async Task<JsonDocument> SearchAsync(string query, int limit = 5)
{
return await PostAsync("/search", new { query, limit });
}
}
// Usage
var firecrawl = new FirecrawlClient(Environment.GetEnvironmentVariable("FIRECRAWL_API_KEY")!);
var result = await firecrawl.SearchAsync("firecrawl web scraping");
Console.WriteLine(result.RootElement);
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Complete REST API documentation
# Elixir
Source: https://docs.firecrawl.dev/quickstarts/elixir
Get started with Firecrawl in Elixir. Search, scrape, and interact with web data using the official SDK.
## Prerequisites
* Elixir 1.14+ and OTP 25+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
Add `firecrawl` to your `mix.exs`:
```elixir theme={null}
defp deps do
[
{:firecrawl, "~> 1.0"}
]
end
```
Configure your API key in `config/config.exs`:
```elixir theme={null}
config :firecrawl, api_key: System.get_env("FIRECRAWL_API_KEY")
```
## Search the web
```elixir theme={null}
{:ok, result} = Firecrawl.search_and_scrape(query: "firecrawl web scraping", limit: 5)
for entry <- result.body["data"]["web"] do
IO.puts("#{entry["title"]} - #{entry["url"]}")
end
```
## Scrape a page
```elixir theme={null}
{:ok, result} = Firecrawl.scrape_and_extract_from_url(url: "https://example.com")
IO.puts(result.body["data"]["markdown"])
```
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
"metadata": {
"title": "Example Domain",
"sourceURL": "https://example.com"
}
}
}
```
## Interact with a page
Scrape a page, then keep working with it using the browser session API.
```elixir theme={null}
{:ok, scrape} = Firecrawl.scrape_and_extract_from_url(
url: "https://www.amazon.com",
formats: ["markdown"]
)
scrape_id = get_in(scrape.body, ["data", "metadata", "scrapeId"])
# Use the REST API for interact (prompt-based)
headers = [
{"Authorization", "Bearer #{Application.get_env(:firecrawl, :api_key)}"},
{"Content-Type", "application/json"}
]
{:ok, _} = Req.post(
"https://api.firecrawl.dev/v2/scrape/#{scrape_id}/interact",
json: %{prompt: "Search for iPhone 16 Pro Max"},
headers: headers
)
{:ok, response} = Req.post(
"https://api.firecrawl.dev/v2/scrape/#{scrape_id}/interact",
json: %{prompt: "Click on the first result and tell me the price"},
headers: headers
)
IO.inspect(response.body)
# Stop the session
Req.delete(
"https://api.firecrawl.dev/v2/scrape/#{scrape_id}/interact",
headers: headers
)
```
## Environment variable
Set `FIRECRAWL_API_KEY` instead of hardcoding:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Express
Source: https://docs.firecrawl.dev/quickstarts/express
Use Firecrawl with Express to build web scraping and search APIs.
## Prerequisites
* Node.js 18+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
npm install express @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
```javascript theme={null}
import express from "express";
import Firecrawl from "@mendable/firecrawl-js";
const app = express();
app.use(express.json());
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
app.post("/search", async (req, res) => {
try {
const { query } = req.body;
const results = await firecrawl.search(query, { limit: 5 });
res.json(results);
} catch (error) {
res.status(500).json({ error: error.message });
}
});
app.listen(3000, () => console.log("Server running on port 3000"));
```
## Scrape a page
```javascript theme={null}
app.post("/scrape", async (req, res) => {
try {
const { url } = req.body;
const result = await firecrawl.scrape(url);
res.json(result);
} catch (error) {
res.status(500).json({ error: error.message });
}
});
```
## Interact with a page
Use interact to control a live browser session — click buttons, fill forms, and extract dynamic content.
```javascript theme={null}
app.post("/interact", async (req, res) => {
try {
const { url } = req.body;
const result = await firecrawl.scrape(url, { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await firecrawl.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
await firecrawl.stopInteraction(scrapeId);
res.json({ output: response.output });
} catch (error) {
res.status(500).json({ error: error.message });
}
});
```
## Test it
```bash theme={null}
curl -X POST http://localhost:3000/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# FastAPI
Source: https://docs.firecrawl.dev/quickstarts/fastapi
Use Firecrawl with FastAPI to build async web scraping and search APIs in Python.
## Prerequisites
* Python 3.8+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
pip install fastapi uvicorn firecrawl-py
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Create the API
Create `main.py`:
```python theme={null}
import os
from fastapi import FastAPI
from pydantic import BaseModel
from firecrawl import Firecrawl
app = FastAPI()
firecrawl = Firecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])
class SearchRequest(BaseModel):
query: str
limit: int = 5
class ScrapeRequest(BaseModel):
url: str
class InteractRequest(BaseModel):
scrape_id: str
prompt: str
@app.post("/search")
async def search(req: SearchRequest):
results = firecrawl.search(req.query, limit=req.limit)
return [{"title": r.title, "url": r.url} for r in results.web]
@app.post("/scrape")
async def scrape(req: ScrapeRequest):
result = firecrawl.scrape(req.url)
return {"markdown": result.markdown, "metadata": result.metadata}
@app.post("/interact/start")
async def interact_start(req: ScrapeRequest):
result = firecrawl.scrape(req.url, formats=["markdown"])
return {"scrape_id": result.metadata.scrape_id}
@app.post("/interact")
async def interact(req: InteractRequest):
response = firecrawl.interact(req.scrape_id, prompt=req.prompt)
return {"output": response.output}
@app.post("/interact/stop")
async def interact_stop(req: InteractRequest):
firecrawl.stop_interaction(req.scrape_id)
return {"status": "stopped"}
```
## Run it
```bash theme={null}
uvicorn main:app --reload
```
## Test it
```bash theme={null}
# Search the web
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping", "limit": 5}'
# Scrape a page
curl -X POST http://localhost:8000/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Start an interactive session, then send prompts
curl -X POST http://localhost:8000/interact/start \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com"}'
```
FastAPI auto-generates interactive docs at `http://localhost:8000/docs`.
## Async variant
For better concurrency under load, use `AsyncFirecrawl`:
```python theme={null}
from firecrawl import AsyncFirecrawl
async_firecrawl = AsyncFirecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])
@app.post("/scrape-async")
async def scrape_async(req: ScrapeRequest):
result = await async_firecrawl.scrape(req.url)
return {"markdown": result.markdown}
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, async, and more
# Fastify
Source: https://docs.firecrawl.dev/quickstarts/fastify
Use Firecrawl with Fastify to build high-performance web scraping and search APIs.
## Prerequisites
* Node.js 18+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
npm install fastify @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
```javascript theme={null}
import Fastify from "fastify";
import Firecrawl from "@mendable/firecrawl-js";
const fastify = Fastify({ logger: true });
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
fastify.post("/search", async (request) => {
const { query } = request.body;
return firecrawl.search(query, { limit: 5 });
});
fastify.listen({ port: 3000 });
```
## Scrape a page
```javascript theme={null}
fastify.post("/scrape", async (request) => {
const { url } = request.body;
return firecrawl.scrape(url);
});
```
## Interact with a page
Use interact to control a live browser session — click buttons, fill forms, and extract dynamic content.
```javascript theme={null}
fastify.post("/interact", async (request) => {
const { url } = request.body;
const result = await firecrawl.scrape(url, { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await firecrawl.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
await firecrawl.stopInteraction(scrapeId);
return { output: response.output };
});
```
## As a Fastify plugin
Encapsulate the client in a plugin for reuse across routes:
```javascript theme={null}
import fp from "fastify-plugin";
import Firecrawl from "@mendable/firecrawl-js";
export default fp(async function firecrawlPlugin(fastify) {
const client = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
fastify.decorate("firecrawl", client);
});
```
Register the plugin, then use `fastify.firecrawl` in any route:
```javascript theme={null}
fastify.register(firecrawlPlugin);
fastify.post("/search", async function (request) {
const { query } = request.body;
return this.firecrawl.search(query, { limit: 5 });
});
```
## Test it
```bash theme={null}
curl -X POST http://localhost:3000/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Flask
Source: https://docs.firecrawl.dev/quickstarts/flask
Use Firecrawl with Flask to build web scraping and search APIs in Python.
## Prerequisites
* Python 3.8+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
pip install flask firecrawl-py
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Create the app
Create `app.py`:
```python theme={null}
import os
from flask import Flask, request, jsonify
from firecrawl import Firecrawl
app = Flask(__name__)
firecrawl = Firecrawl(api_key=os.environ["FIRECRAWL_API_KEY"])
@app.post("/search")
def search():
data = request.get_json()
results = firecrawl.search(data["query"], limit=data.get("limit", 5))
return jsonify([{"title": r.title, "url": r.url} for r in results.web])
@app.post("/scrape")
def scrape():
data = request.get_json()
result = firecrawl.scrape(data["url"])
return jsonify(markdown=result.markdown, metadata=result.metadata)
@app.post("/interact/start")
def interact_start():
data = request.get_json()
result = firecrawl.scrape(data["url"], formats=["markdown"])
return jsonify(scrape_id=result.metadata.scrape_id)
@app.post("/interact")
def interact():
data = request.get_json()
response = firecrawl.interact(data["scrape_id"], prompt=data["prompt"])
return jsonify(output=response.output)
@app.post("/interact/stop")
def interact_stop():
data = request.get_json()
firecrawl.stop_interaction(data["scrape_id"])
return jsonify(status="stopped")
if __name__ == "__main__":
app.run(debug=True)
```
## Run it
```bash theme={null}
flask run
```
## Test it
```bash theme={null}
# Search the web
curl -X POST http://localhost:5000/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping", "limit": 5}'
# Scrape a page
curl -X POST http://localhost:5000/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Start an interactive session
curl -X POST http://localhost:5000/interact/start \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com"}'
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, async, and more
# Go
Source: https://docs.firecrawl.dev/quickstarts/go
Get started with Firecrawl in Go. Scrape, search, and interact with web data using the REST API.
## Prerequisites
* Go 1.21+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Search the web
Firecrawl works with Go through the REST API. Use `net/http` to make requests directly.
```go theme={null}
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"net/http"
"os"
)
func main() {
apiKey := os.Getenv("FIRECRAWL_API_KEY")
body, _ := json.Marshal(map[string]interface{}{
"query": "firecrawl web scraping",
"limit": 5,
})
req, _ := http.NewRequest("POST", "https://api.firecrawl.dev/v2/search", bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer "+apiKey)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
fmt.Fprintf(os.Stderr, "request failed: %v\n", err)
os.Exit(1)
}
defer resp.Body.Close()
result, _ := io.ReadAll(resp.Body)
fmt.Println(string(result))
}
```
```json theme={null}
{
"success": true,
"data": {
"web": [
{
"url": "https://docs.firecrawl.dev",
"title": "Firecrawl Documentation",
"markdown": "# Firecrawl\n\nFirecrawl is a web scraping API..."
}
]
}
}
```
## Scrape a page
```go theme={null}
body, _ := json.Marshal(map[string]string{
"url": "https://example.com",
})
req, _ := http.NewRequest("POST", "https://api.firecrawl.dev/v2/scrape", bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer "+apiKey)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
fmt.Fprintf(os.Stderr, "request failed: %v\n", err)
os.Exit(1)
}
defer resp.Body.Close()
result, _ := io.ReadAll(resp.Body)
fmt.Println(string(result))
```
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
"metadata": {
"title": "Example Domain",
"sourceURL": "https://example.com"
}
}
}
```
## Interact with a page
Start a browser session, interact with the page using natural-language prompts, then close the session.
### Step 1 — Scrape to start a session
```go theme={null}
body, _ := json.Marshal(map[string]interface{}{
"url": "https://www.amazon.com",
"formats": []string{"markdown"},
})
req, _ := http.NewRequest("POST", "https://api.firecrawl.dev/v2/scrape", bytes.NewReader(body))
req.Header.Set("Authorization", "Bearer "+apiKey)
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
fmt.Fprintf(os.Stderr, "request failed: %v\n", err)
os.Exit(1)
}
defer resp.Body.Close()
var scrapeResult map[string]interface{}
json.NewDecoder(resp.Body).Decode(&scrapeResult)
data := scrapeResult["data"].(map[string]interface{})
metadata := data["metadata"].(map[string]interface{})
scrapeId := metadata["scrapeId"].(string)
fmt.Println("scrapeId:", scrapeId)
```
### Step 2 — Send interactions
```go theme={null}
// Search for a product
interactBody, _ := json.Marshal(map[string]string{
    "prompt": "Search for iPhone 16 Pro Max",
})
interactURL := fmt.Sprintf("https://api.firecrawl.dev/v2/scrape/%s/interact", scrapeId)
req, _ = http.NewRequest("POST", interactURL, bytes.NewReader(interactBody))
req.Header.Set("Authorization", "Bearer "+apiKey)
req.Header.Set("Content-Type", "application/json")
resp, err = http.DefaultClient.Do(req)
if err != nil {
    fmt.Fprintf(os.Stderr, "interact failed: %v\n", err)
    os.Exit(1)
}
defer resp.Body.Close()
result, _ := io.ReadAll(resp.Body)
fmt.Println(string(result))

// Click on the first result
interactBody, _ = json.Marshal(map[string]string{
    "prompt": "Click on the first result and tell me the price",
})
req, _ = http.NewRequest("POST", interactURL, bytes.NewReader(interactBody))
req.Header.Set("Authorization", "Bearer "+apiKey)
req.Header.Set("Content-Type", "application/json")
resp, err = http.DefaultClient.Do(req)
if err != nil {
    fmt.Fprintf(os.Stderr, "interact failed: %v\n", err)
    os.Exit(1)
}
defer resp.Body.Close()
result, _ = io.ReadAll(resp.Body)
fmt.Println(string(result))
```
### Step 3 — Stop the session
```go theme={null}
req, _ = http.NewRequest("DELETE", interactURL, nil)
req.Header.Set("Authorization", "Bearer "+apiKey)
resp, err = http.DefaultClient.Do(req)
if err != nil {
fmt.Fprintf(os.Stderr, "delete failed: %v\n", err)
os.Exit(1)
}
defer resp.Body.Close()
fmt.Println("Session stopped")
```
## Reusable helper
For repeated use, wrap the API in a small helper:
```go theme={null}
type FirecrawlClient struct {
    APIKey  string
    BaseURL string
    Client  *http.Client
}

func NewFirecrawlClient(apiKey string) *FirecrawlClient {
    return &FirecrawlClient{
        APIKey:  apiKey,
        BaseURL: "https://api.firecrawl.dev/v2",
        Client:  &http.Client{},
    }
}

func (fc *FirecrawlClient) post(endpoint string, payload interface{}) ([]byte, error) {
    body, err := json.Marshal(payload)
    if err != nil {
        return nil, err
    }
    req, err := http.NewRequest("POST", fc.BaseURL+endpoint, bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+fc.APIKey)
    req.Header.Set("Content-Type", "application/json")
    resp, err := fc.Client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return io.ReadAll(resp.Body)
}

func (fc *FirecrawlClient) Scrape(url string) ([]byte, error) {
    return fc.post("/scrape", map[string]string{"url": url})
}

func (fc *FirecrawlClient) Search(query string, limit int) ([]byte, error) {
    return fc.post("/search", map[string]interface{}{"query": query, "limit": limit})
}
```
A [community Go SDK](https://github.com/mendableai/firecrawl-go) is available for the v1 API. See the [Go SDK docs](/sdks/go) for details.
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Complete REST API documentation
# Hono
Source: https://docs.firecrawl.dev/quickstarts/hono
Use Firecrawl with Hono to build lightweight web scraping and search APIs that run anywhere.
## Prerequisites
* Node.js 18+, Bun, or Deno
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
npm install hono @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
```typescript theme={null}
import { Hono } from "hono";
import Firecrawl from "@mendable/firecrawl-js";
const app = new Hono();
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
app.post("/search", async (c) => {
const { query } = await c.req.json();
const results = await firecrawl.search(query, { limit: 5 });
return c.json(results);
});
export default app;
```
## Scrape a page
```typescript theme={null}
app.post("/scrape", async (c) => {
const { url } = await c.req.json();
const result = await firecrawl.scrape(url);
return c.json(result);
});
```
## Interact with a page
Use interact to control a live browser session — click buttons, fill forms, and extract dynamic content.
```typescript theme={null}
app.post("/interact", async (c) => {
const { url } = await c.req.json();
const result = await firecrawl.scrape(url, { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await firecrawl.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
await firecrawl.stopInteraction(scrapeId);
return c.json({ output: response.output });
});
```
## Deploy anywhere
Hono runs on multiple runtimes. For Cloudflare Workers, pass the API key from the environment binding:
```typescript theme={null}
import { Hono } from "hono";
import Firecrawl from "@mendable/firecrawl-js";
type Bindings = { FIRECRAWL_API_KEY: string };
const app = new Hono<{ Bindings: Bindings }>();
app.post("/search", async (c) => {
const firecrawl = new Firecrawl({ apiKey: c.env.FIRECRAWL_API_KEY });
const { query } = await c.req.json();
const results = await firecrawl.search(query, { limit: 5 });
return c.json(results);
});
export default app;
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Java
Source: https://docs.firecrawl.dev/quickstarts/java
Get started with Firecrawl in Java. Search, scrape, and interact with web data using the official SDK.
## Prerequisites
* Java 11+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```kotlin theme={null}
dependencies {
    implementation("com.firecrawl:firecrawl-java:1.2.0")
}
```
```xml theme={null}
<dependency>
    <groupId>com.firecrawl</groupId>
    <artifactId>firecrawl-java</artifactId>
    <version>1.2.0</version>
</dependency>
```
## Search the web
```java theme={null}
import com.firecrawl.client.FirecrawlClient;
import com.firecrawl.models.SearchData;
import com.firecrawl.models.SearchOptions;

public class Main {
    public static void main(String[] args) {
        FirecrawlClient client = FirecrawlClient.builder()
            .apiKey("fc-YOUR-API-KEY")
            .build();

        SearchData results = client.search(
            "firecrawl web scraping",
            SearchOptions.builder().limit(5).build()
        );

        if (results.getWeb() != null) {
            for (var result : results.getWeb()) {
                System.out.println(result.get("title") + " - " + result.get("url"));
            }
        }
    }
}
```
## Scrape a page
```java theme={null}
import com.firecrawl.models.Document;
Document doc = client.scrape("https://example.com");
System.out.println(doc.getMarkdown());
```
```json theme={null}
{
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "title": "Example Domain",
    "sourceURL": "https://example.com"
  }
}
```
## Interact with a page
Open a browser session, run Playwright code against it, and close it when done:
```java theme={null}
import com.firecrawl.models.ScrapeOptions;
import com.firecrawl.models.BrowserExecuteResponse;
import java.util.List;

Document doc = client.scrape("https://www.amazon.com",
    ScrapeOptions.builder().formats(List.of((Object) "markdown")).build());
String scrapeId = (String) doc.getMetadata().get("scrapeId");

BrowserExecuteResponse run = client.interact(scrapeId,
    "const title = await page.title(); console.log(title);");
System.out.println(run.getStdout());

client.stopInteractiveBrowser(scrapeId);
```
## Environment variable
Instead of passing `apiKey` directly, set the `FIRECRAWL_API_KEY` environment variable:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
```java theme={null}
FirecrawlClient client = FirecrawlClient.fromEnv();
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Laravel
Source: https://docs.firecrawl.dev/quickstarts/laravel
Use Firecrawl with Laravel to search, scrape, and interact with web data using the REST API.
## Prerequisites
* Laravel 10+ project
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Configuration
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
Add the config entry to `config/services.php`:
```php theme={null}
'firecrawl' => [
    'api_key' => env('FIRECRAWL_API_KEY'),
    'base_url' => env('FIRECRAWL_API_URL', 'https://api.firecrawl.dev/v2'),
],
```
## Create a service class
Create `app/Services/FirecrawlService.php`:
```php theme={null}
<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class FirecrawlService
{
    protected string $apiKey;
    protected string $baseUrl;

    public function __construct()
    {
        $this->apiKey = config('services.firecrawl.api_key');
        $this->baseUrl = config('services.firecrawl.base_url');
    }

    public function search(string $query, int $limit = 5): array
    {
        $response = Http::withToken($this->apiKey)
            ->post("{$this->baseUrl}/search", [
                'query' => $query,
                'limit' => $limit,
            ]);

        return $response->json();
    }

    public function scrape(string $url, array $options = []): array
    {
        $response = Http::withToken($this->apiKey)
            ->post("{$this->baseUrl}/scrape", array_merge(['url' => $url], $options));

        return $response->json();
    }

    public function interact(string $url, string $prompt, ?string $followUp = null): array
    {
        // 1. Scrape to open a browser session
        $scrapeResult = $this->scrape($url, ['formats' => ['markdown']]);
        $scrapeId = $scrapeResult['data']['metadata']['scrapeId'];

        // 2. Send first prompt
        Http::withToken($this->apiKey)
            ->post("{$this->baseUrl}/scrape/{$scrapeId}/interact", [
                'prompt' => $prompt,
            ]);

        // 3. Send follow-up prompt
        $result = null;
        if ($followUp) {
            $result = Http::withToken($this->apiKey)
                ->post("{$this->baseUrl}/scrape/{$scrapeId}/interact", [
                    'prompt' => $followUp,
                ])->json();
        }

        // 4. Close the session
        Http::withToken($this->apiKey)
            ->delete("{$this->baseUrl}/scrape/{$scrapeId}/interact");

        return $result ?? $scrapeResult;
    }
}
```
## Create a controller
Create `app/Http/Controllers/FirecrawlController.php`:
```php theme={null}
<?php

namespace App\Http\Controllers;

use App\Services\FirecrawlService;
use Illuminate\Http\Request;

class FirecrawlController extends Controller
{
    public function __construct(private FirecrawlService $firecrawl)
    {
    }

    public function search(Request $request)
    {
        $validated = $request->validate(['query' => 'required|string']);

        return response()->json(
            $this->firecrawl->search($validated['query'], $request->input('limit', 5))
        );
    }

    public function scrape(Request $request)
    {
        $validated = $request->validate(['url' => 'required|url']);

        return response()->json($this->firecrawl->scrape($validated['url']));
    }

    public function interact(Request $request)
    {
        $validated = $request->validate([
            'url' => 'required|url',
            'prompt' => 'required|string',
        ]);

        return response()->json(
            $this->firecrawl->interact(
                $validated['url'],
                $validated['prompt'],
                $request->input('followUp')
            )
        );
    }
}
```
## Register routes
In `routes/api.php`:
```php theme={null}
use App\Http\Controllers\FirecrawlController;
Route::post('/search', [FirecrawlController::class, 'search']);
Route::post('/scrape', [FirecrawlController::class, 'scrape']);
Route::post('/interact', [FirecrawlController::class, 'interact']);
```
## Test it
```bash theme={null}
php artisan serve
# Search the web
curl -X POST http://localhost:8000/api/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
# Scrape a page
curl -X POST http://localhost:8000/api/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Interact with a page
curl -X POST http://localhost:8000/api/interact \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com", "prompt": "Search for iPhone 16 Pro Max", "followUp": "Click on the first result and tell me the price"}'
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Complete REST API documentation
# Mastra
Source: https://docs.firecrawl.dev/quickstarts/mastra
Wire Firecrawl into Mastra tools so your agents and workflows can search and scrape live web data.
## Prerequisites
* Node.js 22.13+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
* An API key from a supported [Mastra model provider](https://mastra.ai/models)
* An existing Mastra project — follow the [Mastra quickstart](https://mastra.ai/guides/getting-started/quickstart) to set one up
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
## Set your API key
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Build the Firecrawl tools
Create `src/mastra/tools/firecrawl.ts` to expose search and scrape as Mastra tools:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { createTool } from "@mastra/core/tools";
import { z } from "zod";
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY! });
export const firecrawlSearch = createTool({
id: "firecrawl-search",
description: "Search the web and return top results.",
inputSchema: z.object({ query: z.string().min(1) }),
outputSchema: z.object({
results: z.array(
z.object({
title: z.string().nullable(),
url: z.string(),
}),
),
}),
execute: async ({ query }) => {
const results = await firecrawl.search(query, { limit: 3 });
return {
results: (results.web ?? []).map((item) => ({
title: item.title ?? null,
url: item.url,
})),
};
},
});
export const firecrawlScrape = createTool({
id: "firecrawl-scrape",
description: "Scrape a URL and return markdown content.",
inputSchema: z.object({ url: z.string().url() }),
outputSchema: z.object({ markdown: z.string() }),
execute: async ({ url }) => {
const result = await firecrawl.scrape(url, {
formats: ["markdown"],
onlyMainContent: true,
});
return { markdown: result.markdown ?? "" };
},
});
```
## Create the agent
Create `src/mastra/agents/web-agent.ts` and give it the Firecrawl tools:
```typescript theme={null}
import { Agent } from "@mastra/core/agent";
import { firecrawlSearch, firecrawlScrape } from "../tools/firecrawl";

export const webAgent = new Agent({
  id: "web-agent",
  name: "Web Agent",
  instructions:
    "Use Firecrawl tools to search and scrape web pages, then summarize the results.",
  model: "openai/gpt-5.4",
  tools: { firecrawlSearch, firecrawlScrape },
});
```
## Register the agent
Register the agent on your Mastra instance in `src/mastra/index.ts`:
```typescript theme={null}
import { Mastra } from "@mastra/core";
import { webAgent } from "./agents/web-agent";

export const mastra = new Mastra({
  agents: { webAgent },
});
```
## Test in Studio
Run the dev server and open [Mastra Studio](https://mastra.ai/docs/studio/overview):
```bash theme={null}
mastra dev
```
Open the **Web Agent** and try prompts like:
* "Find the latest Firecrawl changelog and summarize the last release."
* "Search for Firecrawl pricing and extract the plan tiers."
## Self-hosted Firecrawl
If you run Firecrawl locally, set `FIRECRAWL_API_URL` and pass `apiUrl` to the client:
```typescript theme={null}
const firecrawl = new Firecrawl({
  apiKey: process.env.FIRECRAWL_API_KEY!,
  apiUrl: process.env.FIRECRAWL_API_URL,
});
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Let an agent drive Firecrawl end to end
Full SDK reference with crawl, map, batch scrape, and more
# NestJS
Source: https://docs.firecrawl.dev/quickstarts/nestjs
Use Firecrawl with NestJS to build structured web scraping and search services.
## Prerequisites
* NestJS project (`@nestjs/cli`)
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Create a Firecrawl service
Create `src/firecrawl/firecrawl.service.ts`:
```typescript theme={null}
import { Injectable } from "@nestjs/common";
import Firecrawl from "@mendable/firecrawl-js";

@Injectable()
export class FirecrawlService {
  private readonly client: Firecrawl;

  constructor() {
    this.client = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
  }

  async search(query: string, limit = 5) {
    return this.client.search(query, { limit });
  }

  async scrape(url: string) {
    return this.client.scrape(url);
  }

  async interact(url: string, prompts: string[]) {
    const result = await this.client.scrape(url, { formats: ['markdown'] });
    const scrapeId = result.metadata?.scrapeId;
    const responses = [];
    for (const prompt of prompts) {
      const response = await this.client.interact(scrapeId, { prompt });
      responses.push(response);
    }
    await this.client.stopInteraction(scrapeId);
    return responses;
  }
}
```
## Create a controller
Create `src/firecrawl/firecrawl.controller.ts`:
```typescript theme={null}
import { Body, Controller, Post } from "@nestjs/common";
import { FirecrawlService } from "./firecrawl.service";

@Controller("firecrawl")
export class FirecrawlController {
  constructor(private readonly firecrawlService: FirecrawlService) {}

  @Post("search")
  async search(@Body("query") query: string) {
    return this.firecrawlService.search(query);
  }

  @Post("scrape")
  async scrape(@Body("url") url: string) {
    return this.firecrawlService.scrape(url);
  }

  @Post("interact")
  async interact(@Body("url") url: string, @Body("prompts") prompts: string[]) {
    return this.firecrawlService.interact(url, prompts);
  }
}
```
## Register the module
Create `src/firecrawl/firecrawl.module.ts`:
```typescript theme={null}
import { Module } from "@nestjs/common";
import { FirecrawlService } from "./firecrawl.service";
import { FirecrawlController } from "./firecrawl.controller";

@Module({
  providers: [FirecrawlService],
  controllers: [FirecrawlController],
  exports: [FirecrawlService],
})
export class FirecrawlModule {}
```
Import `FirecrawlModule` in your `AppModule`.
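For reference, a minimal `AppModule` wiring might look like this (assuming the default `src/app.module.ts` layout generated by the Nest CLI):
```typescript theme={null}
import { Module } from "@nestjs/common";
import { FirecrawlModule } from "./firecrawl/firecrawl.module";

@Module({
  imports: [FirecrawlModule],
})
export class AppModule {}
```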
## Test it
```bash theme={null}
curl -X POST http://localhost:3000/firecrawl/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Next.js
Source: https://docs.firecrawl.dev/quickstarts/nextjs
Use Firecrawl with Next.js to scrape, search, and interact with web data in your React application.
## Prerequisites
* Next.js 14+ project (App Router)
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
## Set your API key
Add your API key to `.env.local`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
The SDK should only run server-side since it requires your API key.
### Route Handler
Create `app/api/search/route.ts`:
```typescript theme={null}
import { NextResponse } from "next/server";
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function POST(request: Request) {
  const { query } = await request.json();
  const results = await firecrawl.search(query, { limit: 5 });
  return NextResponse.json(results);
}
```
### Server Action
Create `app/actions.ts` for use from Client Components:
```typescript theme={null}
"use server";
import Firecrawl from "@mendable/firecrawl-js";
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
export async function searchWeb(query: string) {
const results = await firecrawl.search(query, { limit: 5 });
return (results.web || []).map((r) => ({ title: r.title, url: r.url }));
}
```
## Scrape a page
### Route Handler
Create `app/api/scrape/route.ts`:
```typescript theme={null}
import { NextResponse } from "next/server";
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function POST(request: Request) {
  const { url } = await request.json();
  const result = await firecrawl.scrape(url);
  return NextResponse.json(result);
}
```
### Server Component
Fetch data directly in a Server Component at `app/page.tsx`:
```tsx theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export default async function Page() {
  const result = await firecrawl.scrape("https://example.com");
  return (
    <main>
      <h1>Scraped Content</h1>
      <pre>{result.markdown}</pre>
    </main>
  );
}
```
## Interact with a page
Use interact to control a live browser session — click buttons, fill forms, and extract dynamic content.
### Route Handler
Create `app/api/interact/route.ts`:
```typescript theme={null}
import { NextResponse } from "next/server";
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function POST(request: Request) {
  const { url, prompts } = await request.json();
  const result = await firecrawl.scrape(url, { formats: ['markdown'] });
  const scrapeId = result.metadata?.scrapeId;
  await firecrawl.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
  const response = await firecrawl.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
  await firecrawl.stopInteraction(scrapeId);
  return NextResponse.json({ output: response.output });
}
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Node.js
Source: https://docs.firecrawl.dev/quickstarts/nodejs
Get started with Firecrawl in Node.js. Scrape, search, and interact with web data using the official SDK.
## Prerequisites
* Node.js 18+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
## Environment variable
Instead of passing `apiKey` directly, set the `FIRECRAWL_API_KEY` environment variable:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
```javascript theme={null}
const app = new Firecrawl();
```
## Search the web
```javascript theme={null}
import Firecrawl from '@mendable/firecrawl-js';

const app = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });

const results = await app.search("firecrawl web scraping", { limit: 5 });
for (const result of results.web) {
  console.log(result.title, result.url);
}
```
## Scrape a page
```javascript theme={null}
const result = await app.scrape("https://example.com");
console.log(result.markdown);
```
```json theme={null}
{
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "title": "Example Domain",
    "sourceURL": "https://example.com"
  }
}
```
## Interact with a page
Use interact to control a live browser session — click buttons, fill forms, and extract dynamic content.
```javascript theme={null}
const result = await app.scrape('https://www.amazon.com', { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
await app.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await app.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
console.log(response.output);
await app.stopInteraction(scrapeId);
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Nuxt
Source: https://docs.firecrawl.dev/quickstarts/nuxt
Use Firecrawl with Nuxt to scrape, search, and interact with web data in your Vue application.
## Prerequisites
* Nuxt 3+ project
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
Create `server/api/search.post.ts`:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({
  apiKey: process.env.FIRECRAWL_API_KEY,
});

export default defineEventHandler(async (event) => {
  const { query } = await readBody(event);
  const results = await firecrawl.search(query, { limit: 5 });
  return results;
});
```
Call it from a Vue component:
```vue theme={null}
<script setup lang="ts">
const query = ref("");
const results = ref();

async function search() {
  results.value = await $fetch("/api/search", {
    method: "POST",
    body: { query: query.value },
  });
}
</script>

<template>
  <form @submit.prevent="search">
    <input v-model="query" placeholder="Search query" />
    <button type="submit">Search</button>
  </form>
</template>
```
## Scrape a page
Create `server/api/scrape.post.ts`:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({
  apiKey: process.env.FIRECRAWL_API_KEY,
});

export default defineEventHandler(async (event) => {
  const { url } = await readBody(event);
  const result = await firecrawl.scrape(url);
  return result;
});
```
Call it from a Vue component:
```vue theme={null}
<script setup lang="ts">
const { data } = await useFetch("/api/scrape", {
  method: "POST",
  body: { url: "https://example.com" },
});
</script>

<template>
  <pre>{{ data.markdown }}</pre>
</template>
```
## Interact with a page
Create `server/api/interact.post.ts`:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({
  apiKey: process.env.FIRECRAWL_API_KEY,
});

export default defineEventHandler(async (event) => {
  const result = await firecrawl.scrape("https://www.amazon.com", {
    formats: ["markdown"],
  });
  const scrapeId = result.metadata?.scrapeId;
  await firecrawl.interact(scrapeId, {
    prompt: "Search for iPhone 16 Pro Max",
  });
  const response = await firecrawl.interact(scrapeId, {
    prompt: "Click on the first result and tell me the price",
  });
  await firecrawl.stopInteraction(scrapeId);
  return { output: response.output };
});
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# PHP
Source: https://docs.firecrawl.dev/quickstarts/php
Get started with Firecrawl in PHP. Scrape, search, and interact with web data using the REST API.
## Prerequisites
* PHP 8.0+ with cURL extension
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Search the web
Firecrawl works with PHP through the REST API using cURL.
```php theme={null}
<?php

$apiKey = getenv('FIRECRAWL_API_KEY');

$ch = curl_init('https://api.firecrawl.dev/v2/search');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'Authorization: Bearer ' . $apiKey,
        'Content-Type: application/json',
    ],
    CURLOPT_POSTFIELDS => json_encode([
        'query' => 'firecrawl web scraping',
        'limit' => 5,
    ]),
]);
$response = curl_exec($ch);
curl_close($ch);

$results = json_decode($response, true);
foreach ($results['data']['web'] as $result) {
    echo $result['title'] . ' - ' . $result['url'] . "\n";
}
```
```json theme={null}
{
  "success": true,
  "data": {
    "web": [
      {
        "url": "https://docs.firecrawl.dev",
        "title": "Firecrawl Documentation",
        "markdown": "# Firecrawl\n\nFirecrawl is a web scraping API..."
      }
    ]
  }
}
```
## Scrape a page
```php theme={null}
<?php

$apiKey = getenv('FIRECRAWL_API_KEY');

$ch = curl_init('https://api.firecrawl.dev/v2/scrape');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'Authorization: Bearer ' . $apiKey,
        'Content-Type: application/json',
    ],
    CURLOPT_POSTFIELDS => json_encode([
        'url' => 'https://example.com',
    ]),
]);
$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);
echo $data['data']['markdown'];
```
```json theme={null}
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com"
    }
  }
}
```
## Interact with a page
Start a browser session, interact with the page using natural-language prompts, then close the session.
### Step 1 — Scrape to start a session
```php theme={null}
<?php

$apiKey = getenv('FIRECRAWL_API_KEY');

$ch = curl_init('https://api.firecrawl.dev/v2/scrape');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => [
        'Authorization: Bearer ' . $apiKey,
        'Content-Type: application/json',
    ],
    CURLOPT_POSTFIELDS => json_encode([
        'url' => 'https://www.amazon.com',
        'formats' => ['markdown'],
    ]),
]);
$response = curl_exec($ch);
curl_close($ch);

$scrapeResult = json_decode($response, true);
$scrapeId = $scrapeResult['data']['metadata']['scrapeId'];
echo "scrapeId: $scrapeId\n";
```
### Step 2 — Send interactions
```php theme={null}
$headers = [
    'Authorization: Bearer ' . $apiKey,
    'Content-Type: application/json',
];
$interactUrl = "https://api.firecrawl.dev/v2/scrape/{$scrapeId}/interact";

// Search for a product
$ch = curl_init($interactUrl);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => $headers,
    CURLOPT_POSTFIELDS => json_encode([
        'prompt' => 'Search for iPhone 16 Pro Max',
    ]),
]);
$response = curl_exec($ch);
curl_close($ch);
echo $response . "\n";

// Click on the first result
$ch = curl_init($interactUrl);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HTTPHEADER => $headers,
    CURLOPT_POSTFIELDS => json_encode([
        'prompt' => 'Click on the first result and tell me the price',
    ]),
]);
$response = curl_exec($ch);
curl_close($ch);
echo $response . "\n";
```
### Step 3 — Stop the session
```php theme={null}
$ch = curl_init($interactUrl);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CUSTOMREQUEST => 'DELETE',
    CURLOPT_HTTPHEADER => [
        'Authorization: Bearer ' . $apiKey,
    ],
]);
curl_exec($ch);
curl_close($ch);
echo "Session stopped\n";
```
## Reusable helper class
For repeated use, wrap the API in a class:
```php theme={null}
<?php

class Firecrawl
{
    private string $apiKey;
    private string $baseUrl = 'https://api.firecrawl.dev/v2';

    public function __construct(string $apiKey)
    {
        $this->apiKey = $apiKey;
    }

    private function post(string $endpoint, array $payload): array
    {
        $ch = curl_init($this->baseUrl . $endpoint);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_POST => true,
            CURLOPT_HTTPHEADER => [
                'Authorization: Bearer ' . $this->apiKey,
                'Content-Type: application/json',
            ],
            CURLOPT_POSTFIELDS => json_encode($payload),
        ]);
        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($httpCode >= 400) {
            throw new \RuntimeException("Firecrawl API error: HTTP $httpCode");
        }

        return json_decode($response, true);
    }

    public function scrape(string $url, array $options = []): array
    {
        return $this->post('/scrape', array_merge(['url' => $url], $options));
    }

    public function search(string $query, int $limit = 5): array
    {
        return $this->post('/search', ['query' => $query, 'limit' => $limit]);
    }
}

// Usage
$app = new Firecrawl(getenv('FIRECRAWL_API_KEY'));
$result = $app->scrape('https://example.com');
echo $result['data']['markdown'];
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Complete REST API documentation
# Python
Source: https://docs.firecrawl.dev/quickstarts/python
Get started with Firecrawl in Python. Scrape, search, and interact with web data using the official SDK.
## Prerequisites
* Python 3.8+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
pip install firecrawl-py
```
## Search the web
```python theme={null}
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR-API-KEY")

results = app.search("firecrawl web scraping", limit=5)
for result in results.web:
    print(result.title, result.url)
```
## Scrape a page
```python theme={null}
result = app.scrape("https://example.com")
print(result.markdown)
```
```json theme={null}
{
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "title": "Example Domain",
    "sourceURL": "https://example.com"
  }
}
```
## Interact with a page
Use Interact to control a live browser session — click buttons, fill forms, and extract dynamic content.
```python theme={null}
result = app.scrape("https://www.amazon.com", formats=["markdown"])
scrape_id = result.metadata.scrape_id
app.interact(scrape_id, prompt="Search for iPhone 16 Pro Max")
response = app.interact(scrape_id, prompt="Click on the first result and tell me the price")
print(response.output)
app.stop_interaction(scrape_id)
```
## Environment variable
Instead of passing `api_key` directly, set the `FIRECRAWL_API_KEY` environment variable:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
```python theme={null}
app = Firecrawl()
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, async, and more
# Rails
Source: https://docs.firecrawl.dev/quickstarts/rails
Use Firecrawl with Ruby on Rails to search, scrape, and interact with web data using the REST API.
## Prerequisites
* Ruby 3.0+ and Rails 7+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Configuration
Add your API key to your Rails credentials or environment:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Create a service
Create `app/services/firecrawl_service.rb`:
```ruby theme={null}
require "net/http"
require "json"
require "uri"
class FirecrawlService
BASE_URL = "https://api.firecrawl.dev/v2"
def initialize(api_key: ENV.fetch("FIRECRAWL_API_KEY"))
@api_key = api_key
end
def search(query, limit: 5)
post("/search", { query: query, limit: limit })
end
def scrape(url, **options)
post("/scrape", { url: url }.merge(options))
end
def interact(url, prompt, follow_up: nil)
# 1. Scrape to open a browser session
scrape_result = scrape(url, formats: ["markdown"])
scrape_id = scrape_result.dig("data", "metadata", "scrapeId")
# 2. Send first prompt
post("/scrape/#{scrape_id}/interact", { prompt: prompt })
# 3. Send follow-up prompt
result = nil
if follow_up
result = post("/scrape/#{scrape_id}/interact", { prompt: follow_up })
end
# 4. Close the session
delete("/scrape/#{scrape_id}/interact")
result || scrape_result
end
private
def post(endpoint, payload)
uri = URI("#{BASE_URL}#{endpoint}")
request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{@api_key}"
request["Content-Type"] = "application/json"
request.body = payload.to_json
response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
http.request(request)
end
JSON.parse(response.body)
end
def delete(endpoint)
uri = URI("#{BASE_URL}#{endpoint}")
request = Net::HTTP::Delete.new(uri)
request["Authorization"] = "Bearer #{@api_key}"
Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
http.request(request)
end
end
end
```
## Create a controller
Generate a controller:
```bash theme={null}
rails generate controller Firecrawl search scrape interact --skip-routes
```
Edit `app/controllers/firecrawl_controller.rb`:
```ruby theme={null}
class FirecrawlController < ApplicationController
  skip_before_action :verify_authenticity_token

  def search
    service = FirecrawlService.new
    result = service.search(params.require(:query), limit: params.fetch(:limit, 5).to_i)
    render json: result
  end

  def scrape
    service = FirecrawlService.new
    result = service.scrape(params.require(:url))
    render json: result
  end

  def interact
    service = FirecrawlService.new
    result = service.interact(
      params.require(:url),
      params.require(:prompt),
      follow_up: params[:followUp]
    )
    render json: result
  end
end
```
## Add routes
In `config/routes.rb`:
```ruby theme={null}
Rails.application.routes.draw do
  post "api/search", to: "firecrawl#search"
  post "api/scrape", to: "firecrawl#scrape"
  post "api/interact", to: "firecrawl#interact"
end
```
## Test it
```bash theme={null}
rails server
# Search the web
curl -X POST http://localhost:3000/api/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
# Scrape a page
curl -X POST http://localhost:3000/api/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Interact with a page
curl -X POST http://localhost:3000/api/interact \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com", "prompt": "Search for iPhone 16 Pro Max", "followUp": "Click on the first result and tell me the price"}'
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Complete REST API documentation
# Remix
Source: https://docs.firecrawl.dev/quickstarts/remix
Use Firecrawl with Remix to scrape, search, and interact with web data in your full-stack React app.
## Prerequisites
* Remix project
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
Use Firecrawl in an `action` to handle form submissions. Create `app/routes/search.tsx`:
```tsx theme={null}
import { json, type ActionFunctionArgs } from "@remix-run/node";
import { Form, useActionData } from "@remix-run/react";
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function action({ request }: ActionFunctionArgs) {
  const formData = await request.formData();
  const query = formData.get("query") as string;
  const results = await firecrawl.search(query, { limit: 5 });
  return json({ results: (results.web || []).map((r) => ({ title: r.title, url: r.url })) });
}

export default function SearchPage() {
  const data = useActionData<typeof action>();
  return (
    <Form method="post">
      <input name="query" placeholder="Search query" />
      <button type="submit">Search</button>
      <ul>
        {data?.results?.map((r) => (
          <li key={r.url}>
            <a href={r.url}>{r.title}</a>
          </li>
        ))}
      </ul>
    </Form>
  );
}
```
## Scrape a page
Use Firecrawl in a `loader` to fetch data at request time. Create `app/routes/scrape.tsx`:
```tsx theme={null}
import { json, type LoaderFunctionArgs } from "@remix-run/node";
import { useLoaderData } from "@remix-run/react";
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function loader({ request }: LoaderFunctionArgs) {
  const url = new URL(request.url);
  const target = url.searchParams.get("url");
  if (!target) return json({ markdown: null });
  const result = await firecrawl.scrape(target);
  return json({ markdown: result.markdown });
}

export default function ScrapePage() {
  const { markdown } = useLoaderData<typeof loader>();
  return (
    <main>
      <h1>Scraped Content</h1>
      {markdown ? (
        <pre>{markdown}</pre>
      ) : (
        <p>Pass ?url= to scrape a page</p>
      )}
    </main>
  );
}
```
## Interact with a page
Use interact to control a live browser session — click buttons, fill forms, and extract dynamic content. Create `app/routes/interact.tsx`:
```tsx theme={null}
import { json, type ActionFunctionArgs } from "@remix-run/node";
import { Form, useActionData } from "@remix-run/react";
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function action({ request }: ActionFunctionArgs) {
  const formData = await request.formData();
  const url = formData.get("url") as string;
  const result = await firecrawl.scrape(url, { formats: ['markdown'] });
  const scrapeId = result.metadata?.scrapeId;
  await firecrawl.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
  const response = await firecrawl.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
  await firecrawl.stopInteraction(scrapeId);
  return json({ output: response.output });
}

export default function InteractPage() {
  const data = useActionData<typeof action>();
  return (
    <Form method="post">
      <input name="url" placeholder="https://www.amazon.com" />
      <button type="submit">Run</button>
      {data?.output && <pre>{data.output}</pre>}
    </Form>
  );
}
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Ruby
Source: https://docs.firecrawl.dev/quickstarts/ruby
Get started with Firecrawl in Ruby. Search, scrape, and interact with web data using the REST API.
## Prerequisites
* Ruby 3.0+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Search the web
Firecrawl works with Ruby through the REST API using `net/http`.
```ruby theme={null}
require "net/http"
require "json"
require "uri"
api_key = ENV.fetch("FIRECRAWL_API_KEY")
uri = URI("https://api.firecrawl.dev/v2/search")
request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{api_key}"
request["Content-Type"] = "application/json"
request.body = { query: "firecrawl web scraping", limit: 5 }.to_json
response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
results = JSON.parse(response.body)
results["data"]["web"].each do |result|
puts "#{result['title']} - #{result['url']}"
end
```
## Scrape a page
```ruby theme={null}
uri = URI("https://api.firecrawl.dev/v2/scrape")
request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{api_key}"
request["Content-Type"] = "application/json"
request.body = { url: "https://example.com" }.to_json
response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
data = JSON.parse(response.body)
puts data.dig("data", "markdown")
```
```json theme={null}
{
  "success": true,
  "data": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "metadata": {
      "title": "Example Domain",
      "sourceURL": "https://example.com"
    }
  }
}
```
## Interact with a page
Scrape a page, then keep working with it using natural language prompts.
```ruby theme={null}
uri = URI("https://api.firecrawl.dev/v2/scrape")
request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{api_key}"
request["Content-Type"] = "application/json"
request.body = { url: "https://www.amazon.com", formats: ["markdown"] }.to_json
response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(request) }
scrape_id = JSON.parse(response.body).dig("data", "metadata", "scrapeId")
interact_uri = URI("https://api.firecrawl.dev/v2/scrape/#{scrape_id}/interact")
interact_req = Net::HTTP::Post.new(interact_uri)
interact_req["Authorization"] = "Bearer #{api_key}"
interact_req["Content-Type"] = "application/json"
interact_req.body = { prompt: "Search for iPhone 16 Pro Max" }.to_json
interact_resp = Net::HTTP.start(interact_uri.hostname, interact_uri.port, use_ssl: true) { |http| http.request(interact_req) }
puts JSON.parse(interact_resp.body)
# Stop the session
delete_uri = URI("https://api.firecrawl.dev/v2/scrape/#{scrape_id}/interact")
delete_req = Net::HTTP::Delete.new(delete_uri)
delete_req["Authorization"] = "Bearer #{api_key}"
Net::HTTP.start(delete_uri.hostname, delete_uri.port, use_ssl: true) { |http| http.request(delete_req) }
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Complete REST API documentation
Click, fill forms, and extract dynamic content
# Rust
Source: https://docs.firecrawl.dev/quickstarts/rust
Get started with Firecrawl in Rust. Search, scrape, and interact with web data using the official SDK.
## Prerequisites
* Rust 1.70+ with Cargo
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the crate
Add `firecrawl` to your `Cargo.toml`:
```toml theme={null}
[dependencies]
firecrawl = "2"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
```
## Search the web
```rust theme={null}
use firecrawl::{Client, SearchOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("fc-YOUR-API-KEY")?;

    let results = client.search(
        "firecrawl web scraping",
        SearchOptions { limit: Some(5), ..Default::default() },
    ).await?;

    if let Some(web) = results.data.web {
        for item in web {
            if let firecrawl::SearchResultOrDocument::WebResult(r) = item {
                println!("{} - {}", r.url, r.title.unwrap_or_default());
            }
        }
    }

    Ok(())
}
```
## Scrape a page
```rust theme={null}
let doc = client.scrape("https://example.com", None).await?;
println!("{}", doc.markdown.unwrap_or_default());
```
```json theme={null}
{
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "metadata": {
    "title": "Example Domain",
    "sourceURL": "https://example.com"
  }
}
```
## Interact with a page
Scrape a page to get a `scrapeId`, then use the interact API to control the browser session:
```rust theme={null}
use firecrawl::{Client, ScrapeOptions, Format, ScrapeExecuteOptions};

let doc = client.scrape(
    "https://www.amazon.com",
    ScrapeOptions {
        formats: Some(vec![Format::Markdown]),
        ..Default::default()
    },
).await?;

let scrape_id = doc.metadata
    .as_ref()
    .and_then(|m| m.scrape_id.as_deref())
    .expect("scrapeId not found");

// Send a prompt to interact with the page
client.interact(
    scrape_id,
    ScrapeExecuteOptions {
        prompt: Some("Search for iPhone 16 Pro Max".to_string()),
        ..Default::default()
    },
).await?;

let run = client.interact(
    scrape_id,
    ScrapeExecuteOptions {
        prompt: Some("Click on the first result and tell me the price".to_string()),
        ..Default::default()
    },
).await?;
println!("{:?}", run.output);

// Close the session
client.stop_interaction(scrape_id).await?;
```
## Environment variable
Set `FIRECRAWL_API_KEY` instead of passing the key directly:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
```rust theme={null}
let api_key = std::env::var("FIRECRAWL_API_KEY")?;
let client = Client::new(api_key)?;
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Spring Boot
Source: https://docs.firecrawl.dev/quickstarts/spring-boot
Use Firecrawl with Spring Boot to search, scrape, and interact with web data using the official Java SDK.
## Prerequisites
* Java 17+ and Spring Boot 3+
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Add the dependency
```kotlin theme={null}
dependencies {
    implementation("com.firecrawl:firecrawl-java:1.2.0")
}
```
```xml theme={null}
<dependency>
    <groupId>com.firecrawl</groupId>
    <artifactId>firecrawl-java</artifactId>
    <version>1.2.0</version>
</dependency>
```
## Configuration
Add your API key to `application.properties`:
```properties theme={null}
firecrawl.api-key=${FIRECRAWL_API_KEY}
```
Or set it as an environment variable:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Create a configuration bean
Create `FirecrawlConfig.java`:
```java theme={null}
import com.firecrawl.client.FirecrawlClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FirecrawlConfig {

    @Bean
    public FirecrawlClient firecrawlClient(
            @Value("${firecrawl.api-key}") String apiKey) {
        return FirecrawlClient.builder()
            .apiKey(apiKey)
            .build();
    }
}
```
## Create a REST controller
Create `FirecrawlController.java`:
```java theme={null}
import com.firecrawl.client.FirecrawlClient;
import com.firecrawl.models.Document;
import com.firecrawl.models.SearchData;
import com.firecrawl.models.SearchOptions;
import com.firecrawl.models.ScrapeOptions;
import com.firecrawl.models.BrowserExecuteResponse;
import org.springframework.web.bind.annotation.*;
import java.util.List;
import java.util.Map;

@RestController
@RequestMapping("/api")
public class FirecrawlController {

    private final FirecrawlClient firecrawl;

    public FirecrawlController(FirecrawlClient firecrawl) {
        this.firecrawl = firecrawl;
    }

    @PostMapping("/search")
    public SearchData search(@RequestBody Map<String, Object> body) {
        return firecrawl.search(
            (String) body.get("query"),
            SearchOptions.builder()
                .limit((int) body.getOrDefault("limit", 5))
                .build()
        );
    }

    @PostMapping("/scrape")
    public Map<String, Object> scrape(@RequestBody Map<String, String> body) {
        Document doc = firecrawl.scrape(body.get("url"));
        return Map.of(
            "markdown", doc.getMarkdown(),
            "metadata", doc.getMetadata()
        );
    }

    @PostMapping("/interact")
    public Map<String, String> interact(@RequestBody Map<String, String> body) {
        Document doc = firecrawl.scrape(body.get("url"),
            ScrapeOptions.builder().formats(List.of((Object) "markdown")).build());
        String scrapeId = (String) doc.getMetadata().get("scrapeId");
        BrowserExecuteResponse response = firecrawl.interact(scrapeId,
            body.getOrDefault("code", "const title = await page.title(); console.log(title);"));
        firecrawl.stopInteractiveBrowser(scrapeId);
        return Map.of("result", response.getStdout());
    }
}
```
## Run it
```bash theme={null}
./gradlew bootRun
```
## Test it
```bash theme={null}
# Search the web
curl -X POST http://localhost:8080/api/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
# Scrape a page
curl -X POST http://localhost:8080/api/scrape \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Interact with a page
curl -X POST http://localhost:8080/api/interact \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com", "code": "const title = await page.title(); console.log(title);"}'
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Supabase Edge Functions
Source: https://docs.firecrawl.dev/quickstarts/supabase-edge-functions
Use Firecrawl with Supabase Edge Functions to search, scrape, and interact with web data at the edge.
## Prerequisites
* Supabase project with CLI (`supabase init`)
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
supabase functions new firecrawl-search
supabase functions new firecrawl-scrape
supabase functions new firecrawl-interact
```
Set the secret:
```bash theme={null}
supabase secrets set FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
Edit `supabase/functions/firecrawl-search/index.ts`:
```typescript theme={null}
import Firecrawl from "npm:@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: Deno.env.get("FIRECRAWL_API_KEY"),
});
Deno.serve(async (req) => {
const { query } = await req.json();
const results = await firecrawl.search(query, { limit: 5 });
return new Response(JSON.stringify(results), {
headers: { "Content-Type": "application/json" },
});
});
```
## Scrape a page
Edit `supabase/functions/firecrawl-scrape/index.ts`:
```typescript theme={null}
import Firecrawl from "npm:@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: Deno.env.get("FIRECRAWL_API_KEY"),
});
Deno.serve(async (req) => {
const { url } = await req.json();
const result = await firecrawl.scrape(url);
return new Response(JSON.stringify(result), {
headers: { "Content-Type": "application/json" },
});
});
```
## Interact with a page
Edit `supabase/functions/firecrawl-interact/index.ts`:
```typescript theme={null}
import Firecrawl from "npm:@mendable/firecrawl-js";
const firecrawl = new Firecrawl({
apiKey: Deno.env.get("FIRECRAWL_API_KEY"),
});
Deno.serve(async (_req) => {
const result = await firecrawl.scrape("https://www.amazon.com", {
formats: ["markdown"],
});
const scrapeId = result.metadata?.scrapeId;
await firecrawl.interact(scrapeId, {
prompt: "Search for iPhone 16 Pro Max",
});
const response = await firecrawl.interact(scrapeId, {
prompt: "Click on the first result and tell me the price",
});
console.log(response.output);
await firecrawl.stopInteraction(scrapeId);
return new Response(JSON.stringify({ output: response.output }), {
headers: { "Content-Type": "application/json" },
});
});
```
## Deploy
```bash theme={null}
supabase functions deploy firecrawl-search
supabase functions deploy firecrawl-scrape
supabase functions deploy firecrawl-interact
```
## Test it
```bash theme={null}
curl -X POST https://<project-ref>.supabase.co/functions/v1/firecrawl-search \
-H "Authorization: Bearer <anon-key>" \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
```
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# SvelteKit
Source: https://docs.firecrawl.dev/quickstarts/sveltekit
Use Firecrawl with SvelteKit to scrape, search, and interact with web data in your Svelte application.
## Prerequisites
* SvelteKit project
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Install the SDK
```bash theme={null}
npm install @mendable/firecrawl-js
```
Add your API key to `.env`:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
Create a form action in `src/routes/search/+page.server.ts`:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { FIRECRAWL_API_KEY } from "$env/static/private";

const firecrawl = new Firecrawl({ apiKey: FIRECRAWL_API_KEY });

export const actions = {
  default: async ({ request }) => {
    const data = await request.formData();
    const query = data.get("query") as string;
    const results = await firecrawl.search(query, { limit: 5 });
    return { results: (results.web || []).map((r) => ({ title: r.title, url: r.url })) };
  },
};
```
Wire it up in `src/routes/search/+page.svelte`:
```svelte theme={null}
<script lang="ts">
  export let form;
</script>

<form method="POST">
  <input name="query" placeholder="Search query" />
  <button type="submit">Search</button>
</form>

{#if form?.results}
  <ul>
    {#each form.results as result}
      <li><a href={result.url}>{result.title}</a></li>
    {/each}
  </ul>
{/if}
```
## Scrape a page
Fetch data in a load function at `src/routes/scrape/+page.server.ts`:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { FIRECRAWL_API_KEY } from "$env/static/private";

const firecrawl = new Firecrawl({ apiKey: FIRECRAWL_API_KEY });

export async function load({ url }) {
  const target = url.searchParams.get("url");
  if (!target) return { markdown: null };
  const result = await firecrawl.scrape(target);
  return { markdown: result.markdown };
}
```
Display it in `src/routes/scrape/+page.svelte`:
```svelte theme={null}
<script lang="ts">
  export let data;
</script>

{#if data.markdown}
  <pre>{data.markdown}</pre>
{:else}
  <p>Pass ?url= to scrape a page</p>
{/if}
```
## Interact with a page
Create a server endpoint at `src/routes/api/interact/+server.ts`:
```typescript theme={null}
import { json } from "@sveltejs/kit";
import Firecrawl from "@mendable/firecrawl-js";
import { FIRECRAWL_API_KEY } from "$env/static/private";

const firecrawl = new Firecrawl({ apiKey: FIRECRAWL_API_KEY });

export async function POST() {
  const result = await firecrawl.scrape("https://www.amazon.com", {
    formats: ["markdown"],
  });
  const scrapeId = result.metadata?.scrapeId;
  await firecrawl.interact(scrapeId, {
    prompt: "Search for iPhone 16 Pro Max",
  });
  const response = await firecrawl.interact(scrapeId, {
    prompt: "Click on the first result and tell me the price",
  });
  await firecrawl.stopInteraction(scrapeId);
  return json({ output: response.output });
}
```
## Next steps
All scrape options including formats, actions, and proxies
Search the web and get full page content
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Vercel Functions
Source: https://docs.firecrawl.dev/quickstarts/vercel-functions
Use Firecrawl with Vercel Functions to search, scrape, and interact with web data in serverless deployments.
## Prerequisites
* Vercel project (Next.js, SvelteKit, Nuxt, or standalone)
* A Firecrawl API key — [get one free](https://www.firecrawl.dev/app/api-keys)
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js
```
Add `FIRECRAWL_API_KEY` as an environment variable in your Vercel project settings, or in `.env.local` for local development:
```bash theme={null}
FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
## Search the web
Create `api/search.ts` (or `app/api/search/route.ts` for Next.js):
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function POST(request: Request) {
  const { query } = await request.json();
  const results = await firecrawl.search(query, { limit: 5 });
  return new Response(JSON.stringify(results), {
    headers: { "Content-Type": "application/json" },
  });
}
```
## Scrape a page
Create `api/scrape.ts` (or `app/api/scrape/route.ts` for Next.js):
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function POST(request: Request) {
  const { url } = await request.json();
  const result = await firecrawl.scrape(url);
  return new Response(JSON.stringify(result), {
    headers: { "Content-Type": "application/json" },
  });
}
```
## Interact with a page
Create `api/interact.ts` (or `app/api/interact/route.ts` for Next.js):
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

export async function POST(request: Request) {
  const result = await firecrawl.scrape("https://www.amazon.com", {
    formats: ["markdown"],
  });
  const scrapeId = result.metadata?.scrapeId;
  await firecrawl.interact(scrapeId, {
    prompt: "Search for iPhone 16 Pro Max",
  });
  const response = await firecrawl.interact(scrapeId, {
    prompt: "Click on the first result and tell me the price",
  });
  await firecrawl.stopInteraction(scrapeId);
  return new Response(JSON.stringify({ output: response.output }), {
    headers: { "Content-Type": "application/json" },
  });
}
```
## Deploy
```bash theme={null}
vercel deploy
```
## Test it
```bash theme={null}
curl -X POST https://your-project.vercel.app/api/search \
-H "Content-Type: application/json" \
-d '{"query": "firecrawl web scraping"}'
```
Vercel Functions have a default timeout of 10 seconds on the Hobby plan and 60 seconds on Pro. For large crawl jobs, use the Firecrawl async API with webhooks instead.
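As a rough sketch, a function can start the crawl and return immediately, letting Firecrawl deliver results to a webhook. The `webhook` option shape and the `/api/firecrawl-webhook` receiver URL below are assumptions to adapt to your project:
```typescript theme={null}
// api/crawl.ts — start a crawl and return right away; results arrive via webhook.
export async function POST(request: Request) {
  const { url } = await request.json();
  const res = await fetch("https://api.firecrawl.dev/v2/crawl", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url,
      limit: 100,
      // Assumed webhook receiver in this project; Firecrawl POSTs crawl events here.
      webhook: { url: "https://your-project.vercel.app/api/firecrawl-webhook" },
    }),
  });
  const { id } = await res.json();
  return Response.json({ jobId: id });
}
```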
## Next steps
Search the web and get full page content
All scrape options including formats, actions, and proxies
Click, fill forms, and extract dynamic content
Full SDK reference with crawl, map, batch scrape, and more
# Rate Limits
Source: https://docs.firecrawl.dev/rate-limits
Rate limits for different pricing plans and API requests
When you exceed a rate or concurrency limit, the API returns a `429` response. See [Errors](/api-reference/errors) for the full error catalog and a retry-with-backoff snippet.
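If you prefer to handle `429`s inline, a minimal exponential-backoff wrapper (illustrative and not Firecrawl-specific) looks like this:
```typescript theme={null}
// Retry a request with exponential backoff whenever the API answers 429.
async function fetchWithBackoff(url: string, init: RequestInit, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429) return res;
    // Wait 1s, 2s, 4s, ... between attempts.
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
  throw new Error("Rate limited: retries exhausted");
}
```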
## Billing Model
Firecrawl uses subscription-based monthly plans. There is no pure pay-as-you-go model, but the **auto-recharge feature** provides flexible scaling. Once you subscribe to a plan, you can automatically purchase additional credits when you dip below a threshold. Larger auto-recharge packs offer better rates. To test before committing to a larger plan, start with the Free or Hobby tier.
Plan downgrades take effect at the next renewal. Unused-time credits are not issued.
## Concurrent Browser Limits
Concurrent browsers control how many pages Firecrawl can process for you in parallel. Your plan sets the ceiling; any jobs beyond it wait in a queue until a browser frees up.
Time spent in the queue counts against the request's [`timeout`](/advanced-scraping-guide#timing-and-cache) parameter, so you can set a lower timeout to fail fast instead of waiting. To see current availability before sending work, call the [Queue Status](/api-reference/endpoint/queue-status) endpoint. Jobs that are waiting in your concurrency queue will time out after a maximum of 48 hours.
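For example, a sketch of failing fast with the Node SDK, assuming the `timeout` scrape option takes milliseconds as described in the scrape options reference:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

// If no browser frees up within 15 seconds (queue time included),
// the request errors instead of waiting in the queue.
const doc = await firecrawl.scrape("https://example.com", { timeout: 15000 });
```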
### Current Plans
| Plan | Concurrent Browsers | Max Queued Jobs |
| ------------------ | ------------------- | --------------- |
| Free | 2 | 50,000 |
| Hobby | 5 | 50,000 |
| Standard | 50 | 100,000 |
| Growth | 100 | 200,000 |
| Scale / Enterprise | 150+ | 300,000+ |
Each team has a maximum number of jobs that can be waiting in the concurrency queue. If you exceed this limit, new jobs will be rejected with a `429` status code until existing jobs complete. For larger plans with custom concurrency limits, the max queued jobs is 2,000 times your concurrency limit, capped at 2,000,000. For example, a custom plan with 500 concurrent browsers allows up to 1,000,000 queued jobs.
If you require higher concurrency limits, [contact us about enterprise plans](https://firecrawl.dev/enterprise).
### Extract Plans (Legacy)
| Plan | Concurrent Browsers | Max Queued Jobs |
| -------- | ------------------- | --------------- |
| Free | 2 | 50,000 |
| Starter | 50 | 100,000 |
| Explorer | 100 | 200,000 |
| Pro | 200 | 400,000 |
## API Rate Limits
Rate limits are measured in requests per minute and exist primarily to prevent abuse; in practice, your real bottleneck will be concurrent browsers, not rate limits. Rate limits are applied per team, so all API keys on the same team share the same rate limit counters.
### Current Plans
| Plan | /scrape | /map | /crawl | /search | /agent | /crawl/status | /agent/status |
| -------- | ------- | ---- | ------ | ------- | ------ | ------------- | ------------- |
| Free | 10 | 10 | 1 | 5 | 10 | 1500 | 500 |
| Hobby | 100 | 100 | 15 | 50 | 100 | 1500 | 25000 |
| Standard | 500 | 500 | 50 | 250 | 500 | 1500 | 25000 |
| Growth | 5000 | 5000 | 250 | 2500 | 1000 | 1500 | 25000 |
| Scale | 7500 | 7500 | 750 | 7500 | 1000 | 25000 | 25000 |
These rate limits are enforced to ensure fair usage and availability of the API for all users. If you require higher limits, please contact us at [help@firecrawl.com](mailto:help@firecrawl.com) to discuss custom plans.
### Extract Endpoints
The extract endpoints share the corresponding `/agent` rate limits.
### Batch Scrape Endpoints
The batch scrape endpoints share the corresponding `/crawl` rate limits.
### Browser Sandbox
The browser sandbox endpoints have per-plan rate limits that scale with your subscription:
| Plan | /browser | /browser/\{id}/execute |
| -------- | -------- | ---------------------- |
| Free | 2 | 10 |
| Hobby | 20 | 100 |
| Standard | 100 | 500 |
| Growth | 1,000 | 5,000 |
| Scale | 1,500 | 7,500 |
In addition, each team's plan determines how many browser sessions can be active simultaneously (see Concurrent Browser Limits above). If you exceed this limit, new session requests will return a `429` status code until existing sessions are destroyed.
### FIRE-1 Agent
Requests involving the FIRE-1 agent have separate rate limits that are counted independently for each endpoint:

| Endpoint | Rate Limit (requests/min) |
| ---------- | ------------------------- |
| `/scrape` | 10 |
| `/extract` | 10 |
### Extract Plans (Legacy)
| Plan | /extract (requests/min) | /extract/status (requests/min) |
| -------- | ----------------------- | ------------------------------ |
| Starter | 100 | 25000 |
| Explorer | 500 | 25000 |
| Pro | 1000 | 25000 |
# Skill + CLI
Source: https://docs.firecrawl.dev/sdks/cli
The Firecrawl Skill is an easy way for AI agents such as Claude Code, Antigravity, and OpenCode to use Firecrawl through the CLI.
Search, scrape, and interact with the web directly from the terminal. The Firecrawl CLI works standalone or as a skill that AI coding agents like Claude Code, Antigravity, and OpenCode can discover and use automatically.
## Installation
If you are using an AI agent like Claude Code, you can install the Firecrawl skill with the command below and the agent will set it up for you.
```bash theme={null}
npx -y firecrawl-cli@latest init --all --browser
```
* `--all` installs the Firecrawl skill to every detected AI coding agent
* `--browser` opens the browser for Firecrawl authentication automatically
After installing the skill, restart your agent for it to discover the new skill.
You can also manually install the Firecrawl CLI globally using npm:
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl-cli
```
## Authentication
Before using the CLI, you need to authenticate with your Firecrawl API key.
### Login
```bash CLI theme={null}
# Interactive login (opens browser or prompts for API key)
firecrawl login
# Login with browser authentication (recommended for agents)
firecrawl login --browser
# Login with API key directly
firecrawl login --api-key fc-YOUR-API-KEY
# Or set via environment variable
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
### View Configuration
```bash CLI theme={null}
# View current configuration and authentication status
firecrawl view-config
```
### Logout
```bash CLI theme={null}
# Clear stored credentials
firecrawl logout
```
### Self-Hosted / Local Development
For self-hosted Firecrawl instances or local development, use the `--api-url` option:
```bash CLI theme={null}
# Use a local Firecrawl instance (no API key required)
firecrawl --api-url http://localhost:3002 scrape https://example.com
# Or set via environment variable
export FIRECRAWL_API_URL=http://localhost:3002
firecrawl scrape https://example.com
# Configure and persist the custom API URL
firecrawl config --api-url http://localhost:3002
```
When using a custom API URL (anything other than `https://api.firecrawl.dev`), API key authentication is automatically skipped, allowing you to use local instances without an API key.
### Check Status
Verify installation, authentication, and view rate limits:
```bash CLI theme={null}
firecrawl --status
```
Output when ready:
```
🔥 firecrawl cli v1.1.1
● Authenticated via FIRECRAWL_API_KEY
Concurrency: 0/100 jobs (parallel scrape limit)
Credits: 500,000 remaining
```
* **Concurrency**: Maximum parallel jobs. Run parallel operations close to this limit but not above.
* **Credits**: Remaining API credits. Each scrape/crawl consumes credits.
## Commands
### Scrape
Scrape a single URL and extract its content in various formats.
Use `--only-main-content` to get clean output without navigation, footers, and ads. This is recommended for most use cases where you want just the article or main page content.
```bash CLI theme={null}
# Scrape a URL (default: markdown output)
firecrawl https://example.com
# Or use the explicit scrape command
firecrawl scrape https://example.com
# Recommended: use --only-main-content for clean output without nav/footer
firecrawl https://example.com --only-main-content
```
#### Output Formats
```bash CLI theme={null}
# Get HTML output
firecrawl https://example.com --html
# Multiple formats (returns JSON)
firecrawl https://example.com --format markdown,links
# Get images from a page
firecrawl https://example.com --format images
# Get a summary of the page content
firecrawl https://example.com --format summary
# Track changes on a page
firecrawl https://example.com --format changeTracking
# Available formats: markdown, html, rawHtml, links, screenshot, json, images, summary, changeTracking, attributes, branding
```
#### Scrape Options
```bash CLI theme={null}
# Extract only main content (removes navs, footers)
firecrawl https://example.com --only-main-content
# Wait for JavaScript rendering
firecrawl https://example.com --wait-for 3000
# Take a screenshot
firecrawl https://example.com --screenshot
# Include/exclude specific HTML tags
firecrawl https://example.com --include-tags article,main
firecrawl https://example.com --exclude-tags nav,footer
# Save output to file
firecrawl https://example.com -o output.md
# Pretty print JSON output
firecrawl https://example.com --format markdown,links --pretty
# Force JSON output even with single format
firecrawl https://example.com --json
# Show request timing information
firecrawl https://example.com --timing
```
**Available Options:**
| Option | Short | Description |
| ----------------------- | ----- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--url <url>` | `-u` | URL to scrape (alternative to positional argument) |
| `--format <formats>` | `-f` | Output formats (comma-separated): `markdown`, `html`, `rawHtml`, `links`, `screenshot`, `json`, `images`, `summary`, `changeTracking`, `attributes`, `branding` |
| `--html` | `-H` | Shortcut for `--format html` |
| `--only-main-content` | | Extract only main content |
| `--wait-for <ms>` | | Wait time in milliseconds for JS rendering |
| `--screenshot` | | Take a screenshot |
| `--include-tags <tags>` | | HTML tags to include (comma-separated) |
| `--exclude-tags <tags>` | | HTML tags to exclude (comma-separated) |
| `--output <file>` | `-o` | Save output to file |
| `--json` | | Force JSON output even with single format |
| `--pretty` | | Pretty print JSON output |
| `--timing` | | Show request timing and other useful information |
***
### Search
Search the web and optionally scrape the results.
```bash CLI theme={null}
# Search the web
firecrawl search "web scraping tutorials"
# Limit results
firecrawl search "AI news" --limit 10
# Pretty print results
firecrawl search "machine learning" --pretty
```
#### Search Options
```bash CLI theme={null}
# Search specific sources
firecrawl search "AI" --sources web,news,images
# Search with category filters
firecrawl search "react hooks" --categories github
firecrawl search "machine learning" --categories research,pdf
# Time-based filtering
firecrawl search "tech news" --tbs qdr:h # Last hour
firecrawl search "tech news" --tbs qdr:d # Last day
firecrawl search "tech news" --tbs qdr:w # Last week
firecrawl search "tech news" --tbs qdr:m # Last month
firecrawl search "tech news" --tbs qdr:y # Last year
# Location-based search
firecrawl search "restaurants" --location "Berlin,Germany" --country DE
# Search and scrape results
firecrawl search "documentation" --scrape --scrape-formats markdown
# Save to file
firecrawl search "firecrawl" --pretty -o results.json
```
**Available Options:**
| Option | Description |
| ---------------------------- | ------------------------------------------------------------------------------------------- |
| `--limit <number>` | Maximum results (default: 5, max: 100) |
| `--sources <sources>` | Sources to search: `web`, `images`, `news` (comma-separated) |
| `--categories <categories>` | Filter by category: `github`, `research`, `pdf` (comma-separated) |
| `--tbs <filter>` | Time filter: `qdr:h` (hour), `qdr:d` (day), `qdr:w` (week), `qdr:m` (month), `qdr:y` (year) |
| `--location <location>` | Geo-targeting (e.g., "Berlin,Germany") |
| `--country <code>` | ISO country code (default: US) |
| `--timeout <ms>` | Timeout in milliseconds (default: 60000) |
| `--ignore-invalid-urls` | Exclude URLs invalid for other Firecrawl endpoints |
| `--scrape` | Scrape search results |
| `--scrape-formats <formats>` | Formats for scraped content (default: markdown) |
| `--only-main-content` | Include only main content when scraping (default: true) |
| `--json` | Output as JSON |
| `--output <file>` | Save output to file |
| `--pretty` | Pretty print JSON output |
***
### Map
Discover all URLs on a website quickly.
```bash CLI theme={null}
# Discover all URLs on a website
firecrawl map https://example.com
# Output as JSON
firecrawl map https://example.com --json
# Limit number of URLs
firecrawl map https://example.com --limit 500
```
#### Map Options
```bash CLI theme={null}
# Filter URLs by search query
firecrawl map https://example.com --search "blog"
# Include subdomains
firecrawl map https://example.com --include-subdomains
# Control sitemap usage
firecrawl map https://example.com --sitemap include # Use sitemap
firecrawl map https://example.com --sitemap skip # Skip sitemap
firecrawl map https://example.com --sitemap only # Only use sitemap
# Ignore query parameters (dedupe URLs)
firecrawl map https://example.com --ignore-query-parameters
# Wait for map to complete with timeout
firecrawl map https://example.com --wait --timeout 60
# Save to file
firecrawl map https://example.com -o urls.txt
firecrawl map https://example.com --json --pretty -o urls.json
```
**Available Options:**
| Option | Description |
| --------------------------- | ----------------------------------------------- |
| `--url <url>` | URL to map (alternative to positional argument) |
| `--limit <number>` | Maximum URLs to discover |
| `--search <query>` | Filter URLs by search query |
| `--sitemap <mode>` | Sitemap handling: `include`, `skip`, `only` |
| `--include-subdomains` | Include subdomains |
| `--ignore-query-parameters` | Treat URLs with different params as same |
| `--wait` | Wait for map to complete |
| `--timeout <seconds>` | Timeout in seconds |
| `--json` | Output as JSON |
| `--output <file>` | Save output to file |
| `--pretty` | Pretty print JSON output |
***
### Browser
Have your agents interact with the web using a secure browser sandbox.
Launch cloud browser sessions and execute Python, JavaScript, or bash code remotely. Each session runs a full Chromium instance — no local browser install required. Code runs server-side with a pre-configured [Playwright](https://playwright.dev/) `page` object ready to use.
```bash CLI theme={null}
# Launch a cloud browser session
firecrawl browser launch-session
# Execute agent-browser commands (default - "agent-browser" is auto-prefixed)
firecrawl browser execute "open https://example.com"
firecrawl browser execute "snapshot"
firecrawl browser execute "click @e5"
firecrawl browser execute "scrape"
# Execute Playwright Python code
firecrawl browser execute --python 'await page.goto("https://example.com")
print(await page.title())'
# Execute Playwright JavaScript code
firecrawl browser execute --node 'await page.goto("https://example.com"); console.log(await page.title());'
# List all sessions (or: list active / list destroyed)
firecrawl browser list
# Close the active session
firecrawl browser close
```
#### Browser Options
```bash CLI theme={null}
# Launch with custom TTL (10 minutes) and live view
firecrawl browser launch-session --ttl 600 --stream
# Launch with inactivity timeout
firecrawl browser launch-session --ttl 120 --ttl-inactivity 60
# agent-browser commands (default - "agent-browser" is auto-prefixed)
firecrawl browser execute "open https://news.ycombinator.com"
firecrawl browser execute "snapshot"
firecrawl browser execute "click @e3"
firecrawl browser execute "scrape"
# Playwright Python - navigate, interact, extract
firecrawl browser execute --python '
await page.goto("https://news.ycombinator.com")
items = await page.query_selector_all(".titleline > a")
for item in items[:5]:
print(await item.text_content())
'
# Playwright JavaScript - same page object
firecrawl browser execute --node '
await page.goto("https://example.com");
const title = await page.title();
console.log(title);
'
# Explicit bash mode - runs in the sandbox
firecrawl browser execute --bash "agent-browser snapshot"
# Target a specific session
firecrawl browser execute --session <session-id> --python 'print(await page.title())'
# Save output to file
firecrawl browser execute "scrape" -o result.txt
# Close a specific session
firecrawl browser close --session <session-id>
# List sessions (all / active / destroyed)
firecrawl browser list
firecrawl browser list active --json
```
**Subcommands:**
| Subcommand | Description |
| ---------------- | ----------------------------------------------------------------------------------- |
| `launch-session` | Launch a new cloud browser session (returns session ID, CDP URL, and live view URL) |
| `execute <code>` | Execute Playwright Python/JS code or bash commands in a session |
| `list [status]` | List browser sessions (filter by `active` or `destroyed`) |
| `close` | Close a browser session |
**Execute Options:**
| Option | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `--bash` | Execute bash commands remotely in the sandbox (default). [agent-browser](https://github.com/vercel-labs/agent-browser) (40+ commands) is pre-installed and auto-prefixed. `CDP_URL` is auto-injected so agent-browser connects to your session automatically. Best approach for AI agents. |
| `--python` | Execute as Playwright Python code. A Playwright `page` object is available — use `await page.goto()`, `await page.title()`, etc. |
| `--node` | Execute as Playwright JavaScript code. Same `page` object available. |
| `--session <id>` | Target a specific session (default: active session) |
**Launch Options:**
| Option | Description |
| ---------------------------- | ------------------------------------------------------------------- |
| `--ttl <seconds>` | Total session TTL (default: 600, range: 30–3600) |
| `--ttl-inactivity <seconds>` | Auto-close after inactivity (range: 10–3600) |
| `--profile <name>` | Name for a profile (saves and reuses browser state across sessions) |
| `--no-save-changes` | Load existing profile data without saving changes back |
| `--stream` | Enable live view streaming |
**Common Options:**
| Option | Description |
| ----------------- | --------------------- |
| `--output <file>` | Save output to file |
| `--json` | Output as JSON format |
***
### Crawl
Crawl an entire website starting from a URL.
```bash CLI theme={null}
# Start a crawl (returns job ID immediately)
firecrawl crawl https://example.com
# Wait for crawl to complete
firecrawl crawl https://example.com --wait
# Wait with progress indicator
firecrawl crawl https://example.com --wait --progress
```
#### Check Crawl Status
```bash CLI theme={null}
# Check crawl status using job ID
firecrawl crawl <job-id>
# Example with a real job ID
firecrawl crawl 550e8400-e29b-41d4-a716-446655440000
```
#### Crawl Options
```bash CLI theme={null}
# Limit crawl depth and pages
firecrawl crawl https://example.com --limit 100 --max-depth 3 --wait
# Include only specific paths
firecrawl crawl https://example.com --include-paths /blog,/docs --wait
# Exclude specific paths
firecrawl crawl https://example.com --exclude-paths /admin,/login --wait
# Include subdomains
firecrawl crawl https://example.com --allow-subdomains --wait
# Crawl entire domain
firecrawl crawl https://example.com --crawl-entire-domain --wait
# Rate limiting
firecrawl crawl https://example.com --delay 1000 --max-concurrency 2 --wait
# Custom polling interval and timeout
firecrawl crawl https://example.com --wait --poll-interval 10 --timeout 300
# Save results to file
firecrawl crawl https://example.com --wait --pretty -o results.json
```
**Available Options:**
| Option | Description |
| --------------------------- | ------------------------------------------------- |
| `--url <url>` | URL to crawl (alternative to positional argument) |
| `--wait` | Wait for crawl to complete |
| `--progress` | Show progress indicator while waiting |
| `--poll-interval <seconds>` | Polling interval (default: 5) |
| `--timeout <seconds>` | Timeout when waiting |
| `--status` | Check status of existing crawl job |
| `--limit <number>` | Maximum pages to crawl |
| `--max-depth <number>` | Maximum crawl depth |
| `--include-paths <paths>` | Paths to include (comma-separated) |
| `--exclude-paths <paths>` | Paths to exclude (comma-separated) |
| `--sitemap <mode>` | Sitemap handling: `include`, `skip`, `only` |
| `--allow-subdomains` | Include subdomains |
| `--allow-external-links` | Follow external links |
| `--crawl-entire-domain` | Crawl entire domain |
| `--ignore-query-parameters` | Treat URLs with different params as same |
| `--delay <ms>` | Delay between requests |
| `--max-concurrency <number>` | Maximum concurrent requests |
| `--output <file>` | Save output to file |
| `--pretty` | Pretty print JSON output |
***
### Agent
Search and gather data from the web using natural language prompts.
```bash CLI theme={null}
# Basic usage - URLs are optional
firecrawl agent "Find the top 5 AI startups and their funding amounts" --wait
# Focus on specific URLs
firecrawl agent "Compare pricing plans" --urls https://slack.com/pricing,https://teams.microsoft.com/pricing --wait
# Use a schema for structured output
firecrawl agent "Get company information" --urls https://example.com --schema '{"name": "string", "founded": "number"}' --wait
# Use schema from a file
firecrawl agent "Get product details" --urls https://example.com --schema-file schema.json --wait
```
#### Agent Options
```bash CLI theme={null}
# Use Spark 1 Pro for higher accuracy
firecrawl agent "Competitive analysis across multiple domains" --model spark-1-pro --wait
# Set max credits to limit costs
firecrawl agent "Gather contact information from company websites" --max-credits 100 --wait
# Check status of an existing job
firecrawl agent --status <job-id>
# Custom polling interval and timeout
firecrawl agent "Summarize recent blog posts" --wait --poll-interval 10 --timeout 300
# Save output to file
firecrawl agent "Find pricing information" --urls https://example.com --wait -o pricing.json --pretty
```
**Available Options:**
| Option | Description |
| --------------------------- | -------------------------------------------------------------------------------------- |
| `--urls <urls>` | Optional list of URLs to focus the agent on (comma-separated) |
| `--model <model>` | Model to use: `spark-1-mini` (default, 60% cheaper) or `spark-1-pro` (higher accuracy) |
| `--schema <json>` | JSON schema for structured output (inline JSON string) |
| `--schema-file <path>` | Path to JSON schema file for structured output |
| `--max-credits <number>` | Maximum credits to spend (job fails if limit reached) |
| `--status` | Check status of existing agent job |
| `--wait` | Wait for agent to complete before returning results |
| `--poll-interval <seconds>` | Polling interval when waiting (default: 5) |
| `--timeout <seconds>` | Timeout when waiting (default: no timeout) |
| `--output <file>` | Save output to file |
| `--json` | Output as JSON format |
***
### Credit Usage
Check your team's credit balance and usage.
```bash CLI theme={null}
# View credit usage
firecrawl credit-usage
# Output as JSON
firecrawl credit-usage --json --pretty
```
***
### Version
Display the CLI version.
```bash CLI theme={null}
firecrawl version
# or
firecrawl --version
```
## Global Options
These options are available for all commands:
| Option | Short | Description |
| ----------------- | ----- | ------------------------------------------------------ |
| `--status` | | Show version, auth, concurrency, and credits |
| `--api-key <key>` | `-k` | Override stored API key for this command |
| `--api-url <url>` | | Use custom API URL (for self-hosted/local development) |
| `--help` | `-h` | Show help for a command |
| `--version` | `-V` | Show CLI version |
## Output Handling
The CLI outputs to stdout by default, making it easy to pipe or redirect:
```bash CLI theme={null}
# Pipe markdown to another command
firecrawl https://example.com | head -50
# Redirect to a file
firecrawl https://example.com > output.md
# Save JSON with pretty formatting
firecrawl https://example.com --format markdown,links --pretty -o data.json
```
### Format Behavior
* **Single format**: Outputs raw content (markdown text, HTML, etc.)
* **Multiple formats**: Outputs JSON with all requested data
```bash CLI theme={null}
# Raw markdown output
firecrawl https://example.com --format markdown
# JSON output with multiple formats
firecrawl https://example.com --format markdown,links
```
## Examples
### Quick Scrape
```bash CLI theme={null}
# Get markdown content from a URL (use --only-main-content for clean output)
firecrawl https://docs.firecrawl.dev --only-main-content
# Get HTML content
firecrawl https://example.com --html -o page.html
```
### Full Site Crawl
```bash CLI theme={null}
# Crawl a docs site with limits
firecrawl crawl https://docs.example.com --limit 50 --max-depth 2 --wait --progress -o docs.json
```
### Site Discovery
```bash CLI theme={null}
# Find all blog posts
firecrawl map https://example.com --search "blog" -o blog-urls.txt
```
### Research Workflow
```bash CLI theme={null}
# Search and scrape results for research
firecrawl search "machine learning best practices 2024" --scrape --scrape-formats markdown --pretty
```
### Agent
```bash CLI theme={null}
# URLs are optional
firecrawl agent "Find the top 5 AI startups and their funding amounts" --wait
# Focus on specific URLs
firecrawl agent "Compare pricing plans" --urls https://slack.com/pricing,https://teams.microsoft.com/pricing --wait
```
### Browser Automation
```bash CLI theme={null}
# Launch a session, scrape a page, and close
firecrawl browser launch-session
firecrawl browser execute "open https://news.ycombinator.com"
firecrawl browser execute "snapshot"
firecrawl browser execute "scrape"
firecrawl browser close
# Use agent-browser via bash mode (default — recommended for AI agents)
firecrawl browser launch-session
firecrawl browser execute "open https://example.com"
firecrawl browser execute "snapshot"
# snapshot returns @ref IDs — use them to interact
firecrawl browser execute "click @e5"
firecrawl browser execute "fill @e3 'search query'"
firecrawl browser execute "scrape"
# Run --help to see all 40+ commands
firecrawl browser execute --bash "agent-browser --help"
firecrawl browser close
```
### Combine with Other Tools
```bash CLI theme={null}
# Extract URLs from search results
jq -r '.data.web[].url' search-results.json
# Get titles from search results
jq -r '.data.web[] | "\(.title): \(.url)"' search-results.json
# Extract links and process with jq
firecrawl https://example.com --format links | jq '.links[].url'
# Count URLs from map
firecrawl map https://example.com | wc -l
```
## Telemetry
The CLI collects anonymous usage data during authentication to help improve the product:
* CLI version, OS, and Node.js version
* Development tool detection (e.g., Cursor, VS Code, Claude Code)
**No command data, URLs, or file contents are collected via the CLI.**
To disable telemetry, set the environment variable:
```bash CLI theme={null}
export FIRECRAWL_NO_TELEMETRY=1
```
## Open Source
The Firecrawl CLI and Skill are open source and available on GitHub: [firecrawl/cli](https://github.com/firecrawl/cli)
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# AI Platforms
Source: https://docs.firecrawl.dev/use-cases/ai-platforms
Power AI assistants and let customers build AI apps
AI platform builders and teams use Firecrawl to power knowledge bases, chatbots, and enable customers to build AI applications with web data.
## Start with a Template
Instant AI chatbots for websites with web knowledge integration
**Get started with templates and examples.** Build AI-powered applications with web data.
## How It Works
Transform websites into AI-ready data. Power chatbots with real-time web knowledge, build RAG systems with up-to-date documentation, and enable your users to connect their AI applications to web sources.
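As a rough sketch of the ingestion side, you might scrape each source page to markdown and hand it to your pipeline; the ingestion function here is a placeholder for your own chunking and embedding step:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

// Placeholder for your pipeline: chunk, embed, and store in a vector DB.
function ingestIntoKnowledgeBase(markdown: string) {
  console.log(`Ingesting ${markdown.length} characters`);
}

const doc = await firecrawl.scrape("https://docs.example.com/getting-started", {
  formats: ["markdown"],
});
ingestIntoKnowledgeBase(doc.markdown ?? "");
```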
## Why AI Platforms Choose Firecrawl
### Reduce Hallucinations with Real-Time Data
Your AI assistants need current information, not outdated training data. Whether it's domain-specific knowledge, technical documentation, or industry-specific content, Firecrawl ensures your knowledge bases stay synchronized with the latest updates, reducing hallucinations and improving response accuracy.
## Customer Stories
**Replit**
Learn how Replit leverages Firecrawl to keep Replit Agent up-to-date with the latest API documentation and web content.
**Stack AI**
Discover how Stack AI uses Firecrawl to seamlessly feed agentic AI workflows with high-quality web data.
## FAQs
Firecrawl provides simple APIs and SDKs that integrate directly into AI platforms. Whether you're building with LangChain, using no-code tools like n8n, or custom frameworks, Firecrawl delivers clean, structured web data ready for AI consumption.
Yes. Firecrawl is designed for enterprise-scale data extraction, processing millions of pages for AI training datasets. Our infrastructure scales automatically to meet your needs.
Firecrawl delivers data in AI-friendly formats including clean markdown, structured JSON, raw HTML, extracted images, screenshots, and news content. This flexibility ensures compatibility with any AI platform's data ingestion requirements.
Yes! Our API supports real-time data extraction, enabling AI applications to access fresh web data on-demand. This is perfect for AI agents that need current information to make decisions.
## Related Use Cases
* [Deep Research](/use-cases/deep-research) - Advanced research capabilities
* [Content Generation](/use-cases/content-generation) - AI-powered content creation
* [Developers & MCP](/use-cases/developers-mcp) - Developer integrations
# Competitive Intelligence
Source: https://docs.firecrawl.dev/use-cases/competitive-intelligence
Monitor competitor websites and track changes in real-time
Business intelligence teams use Firecrawl to monitor competitors and get alerts on strategic changes.
## Start with a Template
Real-time website monitoring with intelligent alerts
Research and analyze competitor strategies with AI
**Choose from monitoring and research templates.** Track competitors and analyze their strategies.
## How It Works
Set up a pipeline that scrapes competitor sites on a schedule, extracts the data you care about, and alerts your team when something changes (see the sketch after these steps):

1. Crawl or scrape competitor websites with Firecrawl to get clean, structured content from product pages, pricing tables, and blog posts.
2. Pull out specific fields like pricing tiers, feature lists, job openings, or partnership announcements using structured extraction.
3. Store each extraction and diff it against previous snapshots to pinpoint what changed and when.
4. Trigger notifications when you detect important updates such as a new product launch, a pricing shift, or a change in positioning.
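A minimal sketch of this pipeline, using the `json` format from the scrape API and a naive string diff against the previous snapshot; the storage and alert helpers are placeholders:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

// Placeholders for your own storage and alerting.
function loadLastSnapshot(): unknown { return null; }
function notifyTeam(message: string) { console.log(message); }

const doc = await firecrawl.scrape("https://competitor.example.com/pricing", {
  formats: [{
    type: "json",
    prompt: "Extract each pricing tier with its name and monthly price",
  }],
});

// Compare the new extraction against the stored baseline.
const previous = JSON.stringify(loadLastSnapshot());
const current = JSON.stringify(doc.json);
if (previous !== current) {
  notifyTeam(`Competitor pricing changed: ${current}`);
}
```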
## What You Can Track
* **Products**: New launches, features, specs, pricing, documentation
* **Marketing**: Messaging changes, campaigns, case studies, testimonials
* **Business**: Job postings, partnerships, funding, press releases
* **Strategy**: Positioning, target markets, pricing approaches, go-to-market
* **Technical**: API changes, integrations, technology stack updates
## FAQs
Firecrawl extracts current page content whenever called. Build your own monitoring system to check competitors at intervals that match your needs, from hourly for critical updates to daily for routine tracking.
Yes, Firecrawl can access region-specific content. You can monitor different versions of competitor sites across multiple countries and languages.
When building your monitoring system, implement filters to ignore minor changes like timestamps or dynamic content. Compare extracted data over time and use your own logic to determine what constitutes a meaningful change.
Yes. Extract data from competitor press releases, blog posts, and public social media pages. Build systems to analyze announcement patterns, messaging changes, and campaign launches over time.
Extract data from multiple competitor sites using Firecrawl's APIs. Build your own system to organize and compare this data; many users create databases with competitor profiles and custom dashboards for analysis.
## Related Use Cases
* [Product & E-commerce](/use-cases/product-ecommerce) - Track competitor products
* [Investment & Finance](/use-cases/investment-finance) - Market intelligence
* [SEO Platforms](/use-cases/seo-platforms) - Search competitor tracking
# Content Generation
Source: https://docs.firecrawl.dev/use-cases/content-generation
Generate AI content based on website data, images, and news
Turn live website data into presentations, emails, marketing copy, and newsletters. Firecrawl scrapes pages, extracts clean text and images, and feeds structured content to your AI pipeline so every piece you publish is grounded in real, up-to-date sources.
## Start with a Template
Clone and recreate any website as a modern React app
**Get started with the Open Lovable template.** Transform websites into content and applications.
## How It Works
Point Firecrawl at any URL or set of URLs. It returns clean, structured content you can pipe directly into your generation workflow.
### Multiple Output Formats
Extract content as structured HTML, clean Markdown, JSON, or full-page screenshots, whichever format best fits your content pipeline.
### Image and Visual Extraction
Capture images and take screenshots from source pages to enrich presentations, emails, and social posts with real visuals.
### News and Trending Content
Surface relevant news stories as part of your request to keep generated content current and timely.
## What You Can Create
* **Sales Decks**: Custom presentations with prospect data
* **Email Campaigns**: Personalized outreach at scale
* **Marketing Content**: Data-driven blog posts and reports
* **Social Media**: Trending topic and news-driven content generation
* **Documentation**: Auto-updated technical content
* **Newsletters**: Curated updates from industry and competitor news
* **Visual Content**: Posts and reports enriched with extracted images and screenshots
## FAQs
Firecrawl extracts data directly from source websites, preserving the original content structure and context. All extracted data includes source URLs and timestamps for verification.
Firecrawl provides clean markdown, structured JSON, HTML, images, and screenshots from websites. This extracted data serves as the factual foundation for your content generation workflows.
Yes. Firecrawl can extract images, capture screenshots, and pull content from news sites. This enables you to create visually rich content and stay current with industry developments.
Firecrawl excels at extracting from company websites, news sites, blogs, and documentation. Sites with structured HTML and clear content hierarchies yield the cleanest extraction results.
Use Firecrawl's batch scraping and crawl APIs to extract data from multiple websites efficiently. Process hundreds of URLs in parallel to build comprehensive datasets for your content workflows.
## Related Use Cases
* [AI Platforms](/use-cases/ai-platforms) - Build AI-powered content tools
* [Lead Enrichment](/use-cases/lead-enrichment) - Personalize with prospect data
* [SEO Platforms](/use-cases/seo-platforms) - Optimize generated content
# Data Migration
Source: https://docs.firecrawl.dev/use-cases/data-migration
Transfer web data efficiently between platforms and systems
Extract content, structure, and metadata from any website and transform it for import into a new platform. Firecrawl handles the scraping so you can focus on mapping data to your target system.
## Start with a Template
Efficiently migrate data between platforms and systems
**Get started with the Firecrawl Migrator template.** Extract and transform data for platform migrations.
## How It Works
Point Firecrawl at your source website to crawl every page and return clean, structured data. From there, you transform the output to match your target platform's schema and import it using that platform's API or bulk-import tools.
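For instance, a migration script might crawl the source site and reshape each page for the target CMS. A sketch, assuming the crawl response exposes a `data` array of documents; the target fields are illustrative:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const crawl = await firecrawl.crawl("https://old-site.example.com", { limit: 500 });

// Map each scraped page onto the target platform's content model.
const entries = crawl.data.map((page) => ({
  title: page.metadata?.title,
  slug: new URL(page.metadata?.sourceURL ?? "https://old-site.example.com").pathname,
  body: page.markdown,
}));

// Import `entries` with your target platform's bulk API.
console.log(`Prepared ${entries.length} entries for import`);
```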
## What You Can Migrate
* **Content**: Pages, posts, articles, media files, metadata
* **Structure**: Hierarchies, categories, tags, taxonomies
* **Users**: Profiles and user-related data where publicly accessible
* **Settings**: Configurations, custom fields, workflows
* **E-commerce**: Products, catalogs, inventory, orders
## Common Migration Use Cases
Users build migration tools with Firecrawl to extract data from various platforms:
Extract content from WordPress, Drupal, and Joomla sites or custom CMS platforms. Preserve content structure and metadata, then export for import into new systems like Contentful, Strapi, or Sanity.
Extract product catalogs from Magento and WooCommerce stores including inventory, pricing, descriptions, and specifications. Format data for import into Shopify, BigCommerce, or other platforms.
## FAQs
Our infrastructure scales automatically to handle large migrations. We support incremental processing with batching and parallel extraction, allowing you to migrate millions of pages by breaking them into manageable chunks with progress tracking.
Yes! Extract all SEO metadata including URLs, titles, descriptions, and implement proper redirects. We help maintain your search rankings through the migration.
Firecrawl can extract and catalog all media files. You can download them for re-upload to your new platform or reference them directly if keeping the same CDN.
We provide detailed extraction reports and support comparison tools. You can verify content completeness, check broken links, and validate data integrity.
Yes, you can extract publicly visible user-generated content including comments, reviews, and forum posts. Private user data requires appropriate authentication and permissions.
## Related Use Cases
* [Product & E-commerce](/use-cases/product-ecommerce) - Catalog migrations
* [Content Generation](/use-cases/content-generation) - Content transformation
* [AI Platforms](/use-cases/ai-platforms) - Knowledge base migration
# Deep Research
Source: https://docs.firecrawl.dev/use-cases/deep-research
Build agentic research tools with deep web search capabilities
Build automated research agents that search the web, scrape full-page content, and synthesize findings with an LLM. Firecrawl handles source discovery and content extraction so you can focus on analysis, not parsing HTML.
## Start with a Template
Blazing-fast AI search with real-time citations
Deep research agent with LangGraph and answer validation
Visual AI research assistant for comprehensive analysis
**Choose from multiple research templates.** Clone, configure your API key, and start researching.
## How It Works
Build powerful research tools that transform scattered web data into comprehensive insights. The core pattern is a **search → scrape → analyze → repeat** loop: use Firecrawl's search API to discover relevant sources, scrape each source for full content, then feed the results into an LLM to synthesize findings and identify follow-up queries.
Use the `/search` endpoint to find relevant pages for your research topic.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
results = firecrawl.search(
"recent advances in quantum computing",
limit=5,
scrape_options={"formats": ["markdown", "links"]}
)
```
```js Node.js theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const results = await firecrawl.search(
'recent advances in quantum computing',
{ limit: 5, scrapeOptions: { formats: ['markdown', 'links'] } }
);
```
Extract full content from each result to get detailed information with citations.
```python Python theme={null}
for result in results:
doc = firecrawl.scrape(result["url"], formats=["markdown"])
# Feed doc content into your LLM for analysis
```
```js Node.js theme={null}
for (const result of results) {
const doc = await firecrawl.scrape(result.url, { formats: ['markdown'] });
// Feed doc content into your LLM for analysis
}
```
Use an LLM to synthesize findings, identify gaps, and generate follow-up queries. Repeat the loop until your research question is fully answered.
## Why Researchers Choose Firecrawl
### Accelerate Research from Weeks to Hours
Build automated research systems that discover, read, and synthesize information from across the web. Create tools that deliver comprehensive reports with full citations, eliminating manual searching through hundreds of sources.
### Ensure Research Completeness
Reduce the risk of missing critical information. Build systems that follow citation chains, discover related sources, and surface insights that traditional search methods miss.
## Research Tool Capabilities
* **Iterative Exploration**: Build tools that automatically discover related topics and sources
* **Multi-Source Synthesis**: Combine information from hundreds of websites
* **Citation Preservation**: Maintain full source attribution in your research outputs
* **Intelligent Summarization**: Extract key findings and insights for analysis
* **Trend Detection**: Identify patterns across multiple sources
## FAQs
Use Firecrawl's crawl and search APIs to build iterative research systems. Start with search results, extract content from relevant pages, follow citation links, and aggregate findings. Combine with LLMs to synthesize comprehensive research reports.
Yes. Firecrawl can extract data from open-access research papers, academic websites, and publicly available scientific publications. It preserves formatting, citations, and technical content critical for research work.
Firecrawl maintains source attribution and extracts content exactly as presented on websites. All data includes source URLs and timestamps, ensuring full traceability for research purposes.
Yes. Set up scheduled crawls to track how information changes over time. This is perfect for monitoring trends, policy changes, or any research requiring temporal data analysis.
Our crawling infrastructure scales to handle thousands of sources simultaneously. Whether you're analyzing entire industries or tracking global trends, Firecrawl provides the data pipeline you need.
## Related Use Cases
* [AI Platforms](/use-cases/ai-platforms) - Build AI research assistants
* [Content Generation](/use-cases/content-generation) - Research-based content
* [Competitive Intelligence](/use-cases/competitive-intelligence) - Market research
# Developers & MCP
Source: https://docs.firecrawl.dev/use-cases/developers-mcp
Build powerful integrations with Model Context Protocol support
Give your AI coding assistant the ability to scrape, crawl, and search the web in real time. Firecrawl's MCP server connects to Claude Desktop, Cursor, and other Model Context Protocol clients so your assistant can pull live documentation, discover site structures, and extract structured data on demand.
## Start with a Template
Official MCP server - Add web scraping to Claude Desktop and Cursor
Build complete applications from any website instantly
**Get started with MCP in minutes.** Follow our [setup guide](https://github.com/firecrawl/firecrawl-mcp-server#installation) to integrate Firecrawl into Claude Desktop or Cursor.
## How It Works
Integrate Firecrawl directly into your AI coding workflow through Model Context Protocol. Once configured, your AI assistant gains access to a set of web scraping tools it can call on your behalf:
| Tool | What it does |
| ---------------- | ---------------------------------------------------------- |
| **Scrape** | Extract content or structured data from a single URL |
| **Batch Scrape** | Extract content from multiple known URLs in parallel |
| **Map** | Discover all indexed URLs on a website |
| **Crawl** | Walk a site section and extract content from every page |
| **Search** | Search the web and optionally extract content from results |
Your assistant picks the right tool automatically. Ask it to "read the Next.js docs" and it will scrape. Ask it to "find all blog posts on example.com" and it will map then batch scrape.
## Why Developers Choose Firecrawl MCP
### Build Smarter AI Assistants
Give your AI real-time access to documentation, APIs, and web resources. Reduce outdated information and hallucinations by providing your assistant with the latest data.
### Zero Infrastructure Required
No servers to manage, no crawlers to maintain. Just configure once and your AI assistant can access websites instantly through the Model Context Protocol.
## Customer Stories
**Botpress**
Discover how Botpress uses Firecrawl to streamline knowledge base population and improve developer experience.
**Answer HQ**
Learn how Answer HQ uses Firecrawl to help businesses import website data and build intelligent support assistants.
## FAQs
Currently, Claude Desktop and Cursor have native MCP support. More AI assistants are adding support regularly. You can also use the MCP SDK to build custom integrations.
VS Code and other IDEs can use MCP through community extensions or terminal integrations. Native support varies by IDE. Check our [GitHub repository](https://github.com/firecrawl/firecrawl-mcp-server) for IDE-specific setup guides.
The MCP server automatically caches responses for 15 minutes. You can configure cache duration in your MCP server settings or implement custom caching logic.
MCP requests use your standard Firecrawl API rate limits. We recommend batching related requests and using caching for frequently accessed documentation.
Follow our [setup guide](https://github.com/firecrawl/firecrawl-mcp-server#installation) to configure MCP. You'll need to add your Firecrawl API key to your MCP configuration file. The process takes just a few minutes.
## Related Use Cases
* [AI Platforms](/use-cases/ai-platforms) - Build AI-powered dev tools
* [Deep Research](/use-cases/deep-research) - Complex technical research
* [Content Generation](/use-cases/content-generation) - Generate documentation
# Investment & Finance
Source: https://docs.firecrawl.dev/use-cases/investment-finance
Track companies and extract financial insights from web data
Turn public web data into structured financial signals. Firecrawl scrapes company websites, news sites, job boards, and regulatory filings, then returns clean JSON you can feed directly into due diligence workflows, earnings prep, or ongoing portfolio surveillance.
## Start with a Template
Monitor portfolio companies for material changes and trigger events
## What You Can Track
* **Company Metrics**: growth indicators, team changes, product launches, funding rounds
* **Market Signals**: industry trends, competitor moves, sentiment shifts, regulatory changes
* **Risk Indicators**: leadership changes, legal issues, regulatory mentions, customer complaints
* **Financial Data**: pricing updates, revenue signals, partnership announcements
* **Alternative Data**: job postings, web traffic, social signals, news mentions
## Customer Stories
**Athena Intelligence**
Discover how Athena Intelligence leverages Firecrawl to fuel its AI-native analytics platform for enterprise analysts.
**Cargo**
See how Cargo uses Firecrawl to analyze market data and power revenue intelligence workflows.
## FAQs
Yes, you can monitor publicly available information about private companies from their websites, news mentions, job postings, and social media presence.
Firecrawl extracts data in real-time when called. Build your own monitoring system to fetch data at intervals that match your investment strategy, from minute-by-minute for critical events to daily for routine tracking.
Public web sources such as company websites, news sites, job boards, review sites, forums, social media, government filings, and open-access industry data.
Extract data from company ESG reports, sustainability pages, news mentions of environmental initiatives, and regulatory filings. Build tracking systems to identify changes in sustainability commitments or ESG-related developments.
Yes. Extract recent company updates, product launches, executive changes, and industry trends before earnings calls. Combine with competitor data to anticipate questions and identify key discussion points.
## Related Use Cases
* [Competitive Intelligence](/use-cases/competitive-intelligence) - Track market competitors
* [Deep Research](/use-cases/deep-research) - Comprehensive market analysis
* [Lead Enrichment](/use-cases/lead-enrichment) - B2B investment opportunities
# Lead Enrichment
Source: https://docs.firecrawl.dev/use-cases/lead-enrichment
Extract and filter leads from websites to power your sales pipeline
Scrape business directories for leads, enrich your CRM with live company data, and automate account research. Firecrawl handles the extraction so you can focus on closing deals.
## Start with a Template
AI-powered lead enrichment and data extraction from websites
## How It Works
1. **Point Firecrawl at a source.** Pass a directory URL, company website, or list of domains to the API.
2. **Extract structured data.** Firecrawl scrapes each page and returns fields like company name, contact details, team members, and recent news.
3. **Send the data to your CRM.** Push enriched records into Salesforce, HubSpot, or any other tool through the API or a Zapier integration.
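Step 2 might look like this with the `json` format; a sketch where the prompt and target fields are illustrative:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

// Extract structured lead fields from a prospect's site with an LLM prompt.
const doc = await firecrawl.scrape("https://prospect.example.com/about", {
  formats: [{
    type: "json",
    prompt: "Extract the company name, a one-line description, and any contact emails",
  }],
});

// Push this record into your CRM via its API or a Zapier integration.
console.log(doc.json);
```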
## Why Sales Teams Choose Firecrawl
### Extract leads from any directory
Point Firecrawl at a business directory, trade association, or conference attendee list. You get back structured records with company details and contact information, ready to import into your CRM.
### Keep CRM data fresh automatically
Firecrawl pulls information directly from live company websites instead of a static database. Your team always sees the latest company news, team changes, and growth signals.
## Customer Stories
**Zapier**
Discover how Zapier uses Firecrawl to empower customers with custom knowledge in their chatbots.
**Cargo**
See how Cargo uses Firecrawl to instantly analyze webpage content and power Go-To-Market workflows.
## Lead Sources
### Business Directories
* Industry-specific directories
* Chamber of commerce listings
* Trade association members
* Conference attendee lists
### Company Websites
* About pages and team sections
* Press releases and news
* Job postings for growth signals
* Customer case studies
## FAQs
Firecrawl automatically extracts company information, contact details, product offerings, and recent news from prospect websites. This enriches your CRM with accurate, up-to-date information for better sales outreach.
Yes! Firecrawl extracts publicly available contact information including emails and phone numbers from company websites, team pages, and contact sections.
Since Firecrawl extracts data directly from live websites, you get the most current information available. This is more accurate than static databases that quickly become outdated.
Yes. Use our API or Zapier integration to automatically enrich leads in Salesforce, HubSpot, Pipedrive, and other CRMs. Keep your lead data fresh without manual research.
Extract detailed company information, recent updates, and trigger events from target account websites. This intelligence helps personalize outreach and identify the perfect timing for engagement.
## Related Use Cases
* [AI Platforms](/use-cases/ai-platforms) - Build AI sales assistants
* [Competitive Intelligence](/use-cases/competitive-intelligence) - Track competitors
* [Investment & Finance](/use-cases/investment-finance) - Investment opportunities
# Observability & Monitoring
Source: https://docs.firecrawl.dev/use-cases/observability
Monitor websites, track uptime, and detect changes in real-time
DevOps and SRE teams use Firecrawl to monitor websites, track availability, and detect critical changes across their digital infrastructure.
## Start with a Template
Real-time website monitoring and intelligent change detection
## How It Works
Call Firecrawl's scrape or extract API on a schedule to capture page content, then compare each snapshot against a baseline in your own system. When the extracted data changes or a page fails to load, trigger an alert through your existing tools (PagerDuty, Slack, email, etc.).
Because Firecrawl fully renders JavaScript before extracting, you get the page as users see it, not just the raw HTML. This makes it reliable for monitoring SPAs, dynamic dashboards, and client-rendered content.
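A bare-bones change check might hash the rendered markdown and compare it to the last run; a sketch where the baseline storage and alerting are placeholders:
```typescript theme={null}
import { createHash } from "node:crypto";
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const doc = await firecrawl.scrape("https://app.example.com/status", {
  formats: ["markdown"],
});

// Hash the fully rendered content and compare against the stored baseline.
const digest = createHash("sha256").update(doc.markdown ?? "").digest("hex");
const baseline = loadBaselineDigest(); // placeholder: read from your datastore

if (digest !== baseline) {
  // Trigger PagerDuty, Slack, email, etc. here.
  console.log("Page content changed");
}

function loadBaselineDigest(): string { return ""; } // placeholder store
```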
## What You Can Monitor
* **Availability**: Uptime, response times, error rates
* **Content**: Text changes, image updates, layout shifts
* **Performance**: Page load times, resource sizes, Core Web Vitals
* **Security**: SSL certificates, security headers, misconfigurations
* **SEO Health**: Meta tags, structured data, sitemap validity
* **User journeys**: Multi-step transaction flows and cross-browser rendering
## FAQs
Firecrawl extracts website content and structure on demand. Build monitoring systems that call Firecrawl's API to check pages, compare extracted data against baselines, and trigger your own alerts when changes occur.
Yes! Firecrawl fully renders JavaScript, making it perfect for monitoring modern SPAs, React apps, and dynamic content. We capture the page as users see it, not just the raw HTML.
Firecrawl extracts data in real-time when called. Build your monitoring system to check sites at whatever frequency you need, from minute-by-minute for critical pages to daily for routine checks.
Yes. Use the extract API to pull specific elements like prices, inventory levels, or critical content. Build validation logic in your monitoring system to verify that important information is present and correct.
Firecrawl provides webhooks that you can use to build integrations with your alerting tools. Send extracted data to PagerDuty, Slack, email, or any monitoring platform by building connectors that process Firecrawl's responses.
## Related Use Cases
* [Competitive Intelligence](/use-cases/competitive-intelligence) - Monitor competitor changes
* [Product & E-commerce](/use-cases/product-ecommerce) - Track inventory and pricing
* [Data Migration](/use-cases/data-migration) - Validate migrations
# Use Cases
Source: https://docs.firecrawl.dev/use-cases/overview
Transform web data into powerful features for your applications
Turn any website into structured data for your product. Whether you are building RAG pipelines, enriching leads, or monitoring competitors, Firecrawl handles the scraping so you can focus on what you do with the data.
Browse the use cases below to find guides, code samples, and architecture patterns for your workflow.
* [AI Platforms](/use-cases/ai-platforms) - Add web knowledge to your RAG chatbots and AI assistants.
* [Lead Enrichment](/use-cases/lead-enrichment) - Extract and filter leads from websites to enrich your sales pipeline.
* [SEO Platforms](/use-cases/seo-platforms) - Monitor search rankings and optimize content strategy.
* [Deep Research](/use-cases/deep-research) - Build agentic research tools with deep web search capabilities.
* [Product & E-commerce](/use-cases/product-ecommerce) - Monitor pricing and track inventory across e-commerce sites.
* [Content Generation](/use-cases/content-generation) - Generate AI content based on website data and structure.
* [Developers & MCP](/use-cases/developers-mcp) - Build powerful integrations with Model Context Protocol support.
* [Investment & Finance](/use-cases/investment-finance) - Track companies and extract financial insights from web data.
* [Competitive Intelligence](/use-cases/competitive-intelligence) - Monitor competitor websites and track changes in real-time.
* [Data Migration](/use-cases/data-migration) - Transfer web data seamlessly between platforms and systems.
* [Observability & Monitoring](/use-cases/observability) - Monitor websites, track uptime, and detect changes in real-time.
# Product & E-commerce
Source: https://docs.firecrawl.dev/use-cases/product-ecommerce
Monitor pricing and track inventory across e-commerce sites
Turn any e-commerce website into structured product data. Firecrawl extracts pricing, inventory, and catalog information so you can monitor competitors, track stock levels, and migrate products between platforms.
## Start with a Template
Migrate product catalogs and e-commerce data between platforms
## What You Can Extract
* **Product Data**: Title, SKU, specs, descriptions, categories
* **Pricing**: Current price, discounts, shipping, tax
* **Inventory**: Stock levels, availability, lead times
* **Reviews**: Ratings, customer feedback, Q\&A sections
## Use Cases in Action
**Price Monitoring**
Track competitor pricing across multiple e-commerce sites, receive alerts on price changes, and optimize your pricing strategy based on real-time market data.
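A sketch of one such check, using a JSON schema with the scrape API; the schema fields are illustrative:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

// Extract a product's price and availability into a fixed shape.
const doc = await firecrawl.scrape("https://shop.example.com/product/123", {
  formats: [{
    type: "json",
    schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
        inStock: { type: "boolean" },
      },
    },
  }],
});
console.log(doc.json); // e.g. { name: "...", price: 49.99, inStock: true }
```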
**Catalog Migration**
Seamlessly migrate thousands of products between e-commerce platforms, preserving all product data, variants, images, and metadata.
## FAQs
Build a monitoring system using Firecrawl's API to extract prices at regular intervals. Compare extracted data over time to identify pricing trends, promotions, and competitive positioning.
Yes, Firecrawl can extract all product variants including size, color, and other options. Structure the data with custom schemas to capture all variant information.
For dynamic pricing, you can use Firecrawl's JavaScript rendering to capture prices after they load. For user-specific pricing, configure authentication headers in your requests.
Yes. Firecrawl can extract data from any publicly accessible e-commerce website. Users successfully extract from Shopify, WooCommerce, Magento, BigCommerce, and custom-built stores.
Yes. Firecrawl can navigate through paginated product listings and handle infinite scroll mechanisms to extract complete product catalogs, ensuring no products are missed during extraction.
## Related Use Cases
* [Lead Enrichment](/use-cases/lead-enrichment) - Enrich B2B e-commerce leads
* [Competitive Intelligence](/use-cases/competitive-intelligence) - Track competitor strategies
* [Data Migration](/use-cases/data-migration) - Migrate between platforms
# SEO Platforms
Source: https://docs.firecrawl.dev/use-cases/seo-platforms
Optimize websites for AI assistants and search engines
Audit entire websites for SEO and AI readability at scale. Firecrawl extracts meta tags, header structures, internal links, and content across thousands of pages, so you can spot optimization gaps that sampling-based tools miss.
## Start with a Template
GEO-powered SEO monitoring and multi-region rank tracking
**Get started with the FireGEO template.** Optimize for both search engines and AI assistants.
## How It Works
Crawl a site with Firecrawl to get structured markdown and metadata for every page. Use that output to analyze content quality, check technical SEO elements, and evaluate how well each page communicates with AI assistants.
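As a small example of what that enables, you could crawl a site and flag pages with missing or overlong meta descriptions. A sketch, assuming the crawl response exposes each page's `metadata`; the 160-character threshold is a common rule of thumb, not a Firecrawl setting:
```typescript theme={null}
import Firecrawl from "@mendable/firecrawl-js";

const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });

const crawl = await firecrawl.crawl("https://example.com", { limit: 200 });

// Flag pages whose meta description is missing or likely to be truncated.
for (const page of crawl.data) {
  const desc = page.metadata?.description ?? "";
  if (desc.length === 0 || desc.length > 160) {
    console.log(`Check meta description: ${page.metadata?.sourceURL}`);
  }
}
```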
## Why SEO Platforms Choose Firecrawl
### Optimize for AI discovery, not just Google
Tune your clients' content for AI assistant responses alongside traditional search rankings. Firecrawl extracts the structural and semantic signals AI crawlers rely on, so you can audit pages for both Google and the new wave of AI-powered discovery.
### Complete site intelligence at scale
Audit entire websites instead of sampling a handful of pages. Extract meta tags, header structures, internal links, and content across thousands of pages in a single crawl, and surface optimization gaps your competitors miss.
## What You Can Build
* **AI Readability Audit**: Optimize for AI comprehension
* **Content Analysis**: Structure and semantic optimization
* **Technical SEO**: Site performance and crawlability
* **Search Tracking**: Monitor search engine positions
## FAQs
**How does Firecrawl help optimize content for AI assistants?**
Firecrawl helps you structure content for optimal AI comprehension, extract semantic signals, and ensure your site follows best practices for AI discovery. This includes generating experimental formats like llms.txt (an emerging convention for AI crawler guidance).
**What can Firecrawl extract for an SEO audit?**
Firecrawl extracts complete site structures, meta tags, headers, internal links, and content to perform comprehensive SEO audits. Identify optimization opportunities and track improvements over time.
**Can I analyze competitor SEO strategies?**
Yes. Analyze competitor site structures, keyword usage, content strategies, and technical SEO implementations. Understand what's working in your industry to inform your strategy.
**How do I find content gaps?**
Crawl competitor sites to identify topics they cover that you don't. Extract their content categories, blog topics, and page structures to find opportunities for new content.
**Can Firecrawl monitor technical SEO issues?**
Yes. Identify broken links, track redirect chains, extract canonical tags, and monitor meta tag implementation. Regular crawls help identify technical SEO issues across your site.
## Related Use Cases
* [AI Platforms](/use-cases/ai-platforms) - Build AI-powered SEO tools
* [Competitive Intelligence](/use-cases/competitive-intelligence) - Track competitor SEO
* [Content Generation](/use-cases/content-generation) - Create SEO content
# Event Types
Source: https://docs.firecrawl.dev/webhooks/events
Webhook event reference
Firecrawl sends webhook events at each stage of a job's lifecycle, so you can track progress, capture results, and handle failures in real time without polling.
## Quick Reference
| Event | Trigger |
| ------------------------ | ---------------------------------------------------- |
| `crawl.started` | Crawl job begins processing |
| `crawl.page` | A page is scraped during a crawl |
| `crawl.completed` | Crawl job finishes and all pages have been processed |
| `batch_scrape.started` | Batch scrape job begins processing |
| `batch_scrape.page` | A URL is scraped during a batch scrape |
| `batch_scrape.completed` | All URLs in the batch have been processed |
| `extract.started` | Extract job begins processing |
| `extract.completed` | Extraction finishes successfully |
| `extract.failed` | Extraction fails |
| `agent.started` | Agent job begins processing |
| `agent.action` | Agent executes a tool (scrape, search, etc.) |
| `agent.completed` | Agent finishes successfully |
| `agent.failed` | Agent encounters an error |
| `agent.cancelled` | Agent job is cancelled by the user |
## Payload Structure
All webhook events share this structure:
```json theme={null}
{
"success": true,
"type": "crawl.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [...],
"metadata": {}
}
```
| Field | Type | Description |
| ---------- | ------- | ----------------------------------------- |
| `success` | boolean | Whether the operation succeeded |
| `type` | string | Event type (e.g. `crawl.page`) |
| `id` | string | Job ID |
| `data` | array | Event-specific data (see examples below) |
| `metadata` | object | Custom metadata from your webhook config |
| `error` | string | Error message (when `success` is `false`) |
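If you handle these payloads in TypeScript, a minimal type along these lines (a sketch derived from the table above, not an official SDK export) keeps handlers honest:
```typescript theme={null}
// Sketch of the shared payload shape described above.
interface FirecrawlWebhookEvent {
  success: boolean;
  type: string; // e.g. "crawl.page", "agent.completed"
  id: string; // job ID
  data: unknown[]; // event-specific entries (see examples below)
  metadata: Record<string, unknown>; // custom metadata from your webhook config
  error?: string; // present when success is false
}
```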
## Crawl Events
### `crawl.started`
Sent when the crawl job begins processing.
```json theme={null}
{
"success": true,
"type": "crawl.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
```
### `crawl.page`
Sent for each page scraped. The `data` array contains the page content and metadata.
```json theme={null}
{
"success": true,
"type": "crawl.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"markdown": "# Page content...",
"metadata": {
"title": "Page Title",
"description": "Page description",
"url": "https://example.com/page",
"statusCode": 200,
"contentType": "text/html",
"scrapeId": "550e8400-e29b-41d4-a716-446655440001",
"sourceURL": "https://example.com/page",
"proxyUsed": "basic",
"cacheState": "hit",
"cachedAt": "2025-09-03T21:11:25.636Z",
"creditsUsed": 1
}
}
],
"metadata": {}
}
```
### `crawl.completed`
Sent when the crawl job finishes and all pages have been processed.
```json theme={null}
{
"success": true,
"type": "crawl.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
```
## Batch Scrape Events
### `batch_scrape.started`
Sent when the batch scrape job begins processing.
```json theme={null}
{
"success": true,
"type": "batch_scrape.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
```
### `batch_scrape.page`
Sent for each URL scraped. The `data` array contains the page content and metadata.
```json theme={null}
{
"success": true,
"type": "batch_scrape.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"markdown": "# Page content...",
"metadata": {
"title": "Page Title",
"description": "Page description",
"url": "https://example.com",
"statusCode": 200,
"contentType": "text/html",
"scrapeId": "550e8400-e29b-41d4-a716-446655440001",
"sourceURL": "https://example.com",
"proxyUsed": "basic",
"cacheState": "miss",
"cachedAt": "2025-09-03T23:30:53.434Z",
"creditsUsed": 1
}
}
],
"metadata": {}
}
```
### `batch_scrape.completed`
Sent when all URLs in the batch have been processed.
```json theme={null}
{
"success": true,
"type": "batch_scrape.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
```
## Extract Events
### `extract.started`
Sent when the extract job begins processing.
```json theme={null}
{
"success": true,
"type": "extract.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
```
### `extract.completed`
Sent when extraction finishes successfully. The `data` array contains the extracted data and usage info.
```json theme={null}
{
"success": true,
"type": "extract.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"success": true,
"data": { "siteName": "Example Site", "category": "Technology" },
"extractId": "550e8400-e29b-41d4-a716-446655440000",
"llmUsage": 0.0020118,
"totalUrlsScraped": 1,
"sources": {
"siteName": ["https://example.com"],
"category": ["https://example.com"]
}
}
],
"metadata": {}
}
```
### `extract.failed`
Sent when extraction fails. The `error` field contains the failure reason.
```json theme={null}
{
"success": false,
"type": "extract.failed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"error": "Failed to extract data: timeout exceeded",
"metadata": {}
}
```
## Agent Events
### `agent.started`
Sent when the agent job begins processing.
```json theme={null}
{
"success": true,
"type": "agent.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
```
### `agent.action`
Sent after each tool execution (scrape, search, etc.).
```json theme={null}
{
"success": true,
"type": "agent.action",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 5,
"action": "mcp__tools__scrape",
"input": {
"url": "https://example.com"
}
}
],
"metadata": {}
}
```
The `creditsUsed` value in `action` events is an **estimate** of the total
credits used so far. The final accurate credit count is only
available in the `completed`, `failed`, or `cancelled` events.
### `agent.completed`
Sent when the agent finishes successfully. The `data` array contains the extracted data and total credits used.
```json theme={null}
{
"success": true,
"type": "agent.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 15,
"data": {
"company": "Example Corp",
"industry": "Technology",
"founded": 2020
}
}
],
"metadata": {}
}
```
### `agent.failed`
Sent when the agent encounters an error. The `error` field contains the failure reason.
```json theme={null}
{
"success": false,
"type": "agent.failed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 8
}
],
"error": "Max credits exceeded",
"metadata": {}
}
```
### `agent.cancelled`
Sent when the agent job is cancelled by the user.
```json theme={null}
{
"success": false,
"type": "agent.cancelled",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 3
}
],
"metadata": {}
}
```
## Event Filtering
By default, you receive all events. To subscribe to specific events only, use the `events` array in your webhook config:
```json theme={null}
{
"url": "https://your-app.com/webhook",
"events": ["completed", "failed"]
}
```
This is useful if you only care about job completion and don't need per-page updates.
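Putting the reference together, a handler typically dispatches on `type`. A minimal Express sketch (signature verification is omitted here; see [Security](/webhooks/security)):
```typescript theme={null}
import express from 'express';

const app = express();
app.use(express.json());

app.post('/webhook', (req, res) => {
  const event = req.body; // payload shape shown above

  switch (event.type) {
    case 'crawl.page':
    case 'batch_scrape.page':
      // One document per entry in data
      for (const doc of event.data) {
        console.log('Scraped:', doc.metadata?.sourceURL);
      }
      break;
    case 'crawl.completed':
    case 'batch_scrape.completed':
      console.log('Job finished:', event.id);
      break;
    case 'extract.failed':
    case 'agent.failed':
      console.error('Job failed:', event.id, event.error);
      break;
  }

  res.status(200).send('ok');
});

app.listen(3000);
```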
# Overview
Source: https://docs.firecrawl.dev/webhooks/overview
Real-time notifications for your Firecrawl operations
Get notified the moment a crawl, batch scrape, extract, or agent job starts, progresses, or finishes. Instead of polling for status, you provide an HTTPS endpoint and Firecrawl delivers events to it in real time.
## Supported Operations
| Operation | Events |
| ------------ | ------------------------------------------------------- |
| Crawl | `started`, `page`, `completed` |
| Batch Scrape | `started`, `page`, `completed` |
| Extract | `started`, `completed`, `failed` |
| Agent | `started`, `action`, `completed`, `failed`, `cancelled` |
See [Event Types](/webhooks/events) for full payload details and examples.
## Configuration
Add a `webhook` object to your request:
```json JSON theme={null}
{
"webhook": {
"url": "https://your-domain.com/webhook",
"metadata": {
"any_key": "any_value"
},
"events": ["started", "page", "completed", "failed"]
}
}
```
| Field | Type | Required | Description |
| ---------- | ------ | -------- | ------------------------------------- |
| `url` | string | Yes | Your endpoint URL (HTTPS) |
| `headers` | object | No | Custom headers to include |
| `metadata` | object | No | Custom data included in payloads |
| `events` | array | No | Event types to receive (default: all) |
## Usage
### Crawl with Webhook
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 100,
"webhook": {
"url": "https://your-domain.com/webhook",
"metadata": {
"any_key": "any_value"
},
"events": ["started", "page", "completed"]
}
}'
```
### Batch Scrape with Webhook
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
],
"webhook": {
"url": "https://your-domain.com/webhook",
"metadata": {
"any_key": "any_value"
},
"events": ["started", "page", "completed"]
}
}'
```
## Timeouts & Retries
Your endpoint must respond with a `2xx` status within **10 seconds**.
If delivery fails (timeout, non-2xx, or network error), Firecrawl retries automatically:
| Retry | Delay after failure |
| ----- | ------------------- |
| 1st | 1 minute |
| 2nd | 5 minutes |
| 3rd | 15 minutes |
After 3 failed retries, the webhook is marked as failed and no further attempts are made.
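To stay inside the 10-second window, acknowledge first and do slow work afterwards. A minimal sketch of that pattern:
```typescript theme={null}
import express from 'express';

const app = express();
app.use(express.json());

app.post('/webhook', (req, res) => {
  const event = req.body;

  // Acknowledge immediately so Firecrawl doesn't time out and retry,
  res.status(200).send('ok');

  // then do slow work (DB writes, LLM calls) outside the response path.
  setImmediate(async () => {
    try {
      await processEvent(event); // your own processing logic
    } catch (err) {
      console.error('Webhook processing failed:', err);
    }
  });
});

async function processEvent(event: { type: string; data: unknown[] }) {
  // e.g. persist each scraped page
}

app.listen(3000);
```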
# Security
Source: https://docs.firecrawl.dev/webhooks/security
Verify webhook authenticity
Verify that every webhook request actually came from Firecrawl by checking its HMAC-SHA256 signature. This stops attackers from spoofing payloads and lets you trust the data before acting on it.
## Secret Key
Your webhook secret is available in the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings. Each account has a unique secret used to sign all webhook requests.
Keep your webhook secret secure and never expose it publicly. If you believe your secret has been compromised, regenerate it immediately from your account settings.
## Signature Verification
Each webhook request includes an `X-Firecrawl-Signature` header:
```
X-Firecrawl-Signature: sha256=abc123def456...
```
### How to Verify
1. Extract the signature from the `X-Firecrawl-Signature` header
2. Get the raw request body (before parsing)
3. Compute HMAC-SHA256 using your secret key
4. Compare signatures using a timing-safe function
### Implementation
```js Node/Express theme={null}
import crypto from 'crypto';
import express from 'express';
const app = express();
// Use raw body parser for signature verification
app.use('/webhook/firecrawl', express.raw({ type: 'application/json' }));
app.post('/webhook/firecrawl', (req, res) => {
const signature = req.get('X-Firecrawl-Signature');
const webhookSecret = process.env.FIRECRAWL_WEBHOOK_SECRET;
if (!signature || !webhookSecret) {
return res.status(401).send('Unauthorized');
}
// Extract hash from signature header
const [algorithm, hash] = signature.split('=');
if (algorithm !== 'sha256') {
return res.status(401).send('Invalid signature algorithm');
}
// Compute expected signature
const expectedSignature = crypto
.createHmac('sha256', webhookSecret)
.update(req.body)
.digest('hex');
  // Verify signature using a timing-safe comparison.
  // timingSafeEqual throws if the buffers differ in length, so check length first.
  const expected = Buffer.from(expectedSignature, 'hex');
  const received = Buffer.from(hash, 'hex');
  if (received.length !== expected.length || !crypto.timingSafeEqual(received, expected)) {
    return res.status(401).send('Invalid signature');
  }
// Parse and process verified webhook
const event = JSON.parse(req.body);
console.log('Verified Firecrawl webhook:', event);
res.status(200).send('ok');
});
app.listen(3000, () => console.log('Listening on 3000'));
```
```python Python/Flask theme={null}
import hmac
import hashlib
from flask import Flask, request, abort
app = Flask(__name__)
WEBHOOK_SECRET = 'your-webhook-secret-here' # Get from Firecrawl dashboard
@app.post('/webhook/firecrawl')
def webhook():
signature = request.headers.get('X-Firecrawl-Signature')
if not signature:
abort(401, 'Missing signature header')
# Extract hash from signature header
try:
algorithm, hash_value = signature.split('=', 1)
if algorithm != 'sha256':
abort(401, 'Invalid signature algorithm')
except ValueError:
abort(401, 'Invalid signature format')
# Compute expected signature
expected_signature = hmac.new(
WEBHOOK_SECRET.encode('utf-8'),
request.data,
hashlib.sha256
).hexdigest()
# Verify signature using timing-safe comparison
if not hmac.compare_digest(hash_value, expected_signature):
abort(401, 'Invalid signature')
# Parse and process verified webhook
event = request.get_json(force=True)
print('Verified Firecrawl webhook:', event)
return 'ok', 200
if __name__ == '__main__':
app.run(port=3000)
```
## Best Practices
* **Verify every request.** Always check the signature before processing a webhook payload. Reject any request that fails verification with a `401` status.
* **Use timing-safe comparisons.** Standard string comparison can leak timing information. Use `crypto.timingSafeEqual()` in Node.js or `hmac.compare_digest()` in Python.
* **Serve your endpoint over HTTPS.** This ensures webhook payloads are encrypted in transit.
# Testing
Source: https://docs.firecrawl.dev/webhooks/testing
Test and debug webhooks
Verify that your webhook integration works before deploying to production. This page covers how to receive webhooks on your local machine and how to diagnose common delivery and verification failures.
## Local Development
Webhooks require a publicly reachable URL, so you need to expose your local server to the internet during development.
### Using Cloudflare Tunnels
[Cloudflare Tunnels](https://github.com/cloudflare/cloudflared/releases) provide a free way to expose your local server without opening firewall ports:
```bash theme={null}
cloudflared tunnel --url localhost:3000
```
You'll get a public URL like `https://abc123.trycloudflare.com`. Use this in your webhook config:
```json theme={null}
{
"url": "https://abc123.trycloudflare.com/webhook"
}
```
## Troubleshooting
### Webhooks Not Arriving
* **Endpoint not accessible** - Verify your server is publicly reachable and firewalls allow incoming connections
* **Using HTTP** - Webhook URLs must use HTTPS
* **Wrong events** - Check the `events` filter in your webhook config
* **Timeout errors** - Ensure your endpoint responds within 10 seconds
### Signature Verification Failing
The most common cause is using the parsed JSON body instead of the raw request body. A second cause is using the wrong secret, so confirm yours matches the value in your [account settings](https://www.firecrawl.dev/app/settings?tab=advanced).
```javascript theme={null}
// Wrong - using parsed body
const signature = crypto
.createHmac('sha256', secret)
.update(JSON.stringify(req.body))
.digest('hex');
// Correct - using raw body
app.use('/webhook', express.raw({ type: 'application/json' }));
app.post('/webhook', (req, res) => {
const signature = crypto
.createHmac('sha256', secret)
.update(req.body) // Raw buffer
.digest('hex');
});
```
# Advanced Scraping Guide
Source: https://docs.firecrawl.dev/advanced-scraping-guide
Configure scrape options, browser actions, crawl, map, and the agent endpoint with Firecrawl's full API surface.
Reference for every option across Firecrawl's scrape, crawl, map, and agent endpoints.
## Basic scraping
To scrape a single page and get clean markdown content, use the `/scrape` endpoint.
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape("https://firecrawl.dev")
print(doc.markdown)
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import { Firecrawl } from 'firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
const doc = await firecrawl.scrape('https://firecrawl.dev');
console.log(doc.markdown);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://docs.firecrawl.dev"
}'
```
## Scraping PDFs
Firecrawl supports PDFs. Use the `parsers` option (e.g., `parsers: ["pdf"]`) when you want to ensure PDF parsing. You can control the parsing strategy with the `mode` option:
* **`auto`** (default) — attempts fast text-based extraction first, then falls back to OCR if needed.
* **`fast`** — text-based parsing only (embedded text). Fastest, but skips scanned/image-heavy pages.
* **`ocr`** — forces OCR parsing on every page. Use for scanned documents or when `auto` misclassifies a page.
`{ type: "pdf" }` and `"pdf"` both default to `mode: "auto"`.
```json theme={null}
"parsers": [{ "type": "pdf", "mode": "fast", "maxPages": 50 }]
```
## Scrape options
When using the `/scrape` endpoint, you can customize the request with the following options.
### Formats (`formats`)
The `formats` array controls which output types the scraper returns. Default: `["markdown"]`.
**String formats**: pass the name directly (e.g. `"markdown"`).
| Format | Description |
| ---------- | ---------------------------------------------------------------------------- |
| `markdown` | Page content converted to clean Markdown. |
| `html` | Processed HTML with unnecessary elements removed. |
| `rawHtml` | Original HTML exactly as returned by the server. |
| `links` | All links found on the page. |
| `images` | All images found on the page. |
| `summary` | An LLM-generated summary of the page content. |
| `branding` | Extracts brand identity (colors, fonts, typography, spacing, UI components). |
**Object formats**: pass an object with `type` and additional options.
| Format | Options | Description |
| ---------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `json` | `prompt?: string`, `schema?: object` | Extract structured data using an LLM. Provide a JSON schema and/or a natural-language prompt (max 10,000 characters). |
| `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max one per request. Viewport max resolution is 7680×4320. Screenshot URLs expire after 24 hours. |
| `changeTracking` | `modes?: ("json" \| "git-diff")[]`, `tag?: string`, `schema?: object`, `prompt?: string` | Track changes between scrapes. Requires `"markdown"` to also be in the formats array. |
| `attributes` | `selectors: [{ selector: string, attribute: string }]` | Extract specific HTML attributes from elements matching CSS selectors. |
### Content filtering
These parameters control which parts of the page appear in the output. When `onlyMainContent` is `true` (the default), boilerplate (nav, footer, etc.) is stripped. `includeTags` and `excludeTags` are applied against the original page DOM, not the post-filtered result, so your selectors should target elements as they appear in the source HTML. Set `onlyMainContent: false` to use the full page as the starting point for tag filtering.
| Parameter | Type | Default | Description |
| ----------------- | --------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `onlyMainContent` | `boolean` | `true` | Return only the main content. Set `false` for the full page. |
| `includeTags` | `array` | — | CSS selectors to include — tags, classes, IDs, or attribute selectors (e.g. `["h1", "p", ".main-content", "[data-testid=\"main\"]"]`). |
| `excludeTags` | `array` | — | CSS selectors to exclude — tags, classes, IDs, or attribute selectors (e.g. `["#ad", "#footer", "[role=\"banner\"]"]`). |
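For example, to filter a full page down to a few selectors while dropping ads and the footer (the selectors here are illustrative):
```typescript theme={null}
import { Firecrawl } from 'firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

// Start from the full page, then filter against the original DOM.
const doc = await firecrawl.scrape('https://example.com', {
  onlyMainContent: false,
  includeTags: ['h1', 'p', '.main-content'],
  excludeTags: ['#ad', '#footer'],
});

console.log(doc.markdown);
```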
### Timing and cache
| Parameter | Type | Default | Description |
| --------- | -------------- | ----------- | ------------------------------------------------------------------------------------------------------ |
| `waitFor` | `integer` (ms) | `0` | Extra wait time before scraping, on top of smart-wait. Use sparingly. |
| `maxAge` | `integer` (ms) | `172800000` | Return a cached version if fresher than this value (default is 2 days). Set `0` to always fetch fresh. |
| `timeout` | `integer` (ms) | `60000` | Max request duration before aborting (default is 60 seconds). Minimum is 1000 (1 second). |
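For instance, to skip the cache entirely and fail fast:
```typescript theme={null}
import { Firecrawl } from 'firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

const doc = await firecrawl.scrape('https://example.com', {
  maxAge: 0, // never serve a cached version
  timeout: 15000, // abort after 15 seconds (minimum is 1000)
});
```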
### PDF parsing
| Parameter | Type | Default | Description |
| --------- | ------- | --------- | -------------------------------------------------------------------------------- |
| `parsers` | `array` | `["pdf"]` | Controls PDF processing. `[]` to skip parsing and return base64 (1 credit flat). |
```json theme={null}
{ "type": "pdf", "mode": "fast" | "auto" | "ocr", "maxPages": 10 }
```
| Property | Type | Default | Description |
| ---------- | --------------------------- | ------------ | ------------------------------------------------------------------------------------- |
| `type` | `"pdf"` | *(required)* | Parser type. |
| `mode` | `"fast" \| "auto" \| "ocr"` | `"auto"` | `fast`: text-based extraction only. `auto`: fast with OCR fallback. `ocr`: force OCR. |
| `maxPages` | `integer` | — | Cap the number of pages to parse. |
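For example, forcing OCR on a scanned document while capping the page count (the PDF URL is a placeholder):
```typescript theme={null}
import { Firecrawl } from 'firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

const doc = await firecrawl.scrape('https://example.com/scanned-report.pdf', {
  parsers: [{ type: 'pdf', mode: 'ocr', maxPages: 20 }],
});

console.log(doc.markdown);
```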
### Actions
Run browser actions before scraping. This is useful for dynamic content, navigation, or user-gated pages. You can include up to 50 actions per request, and the combined wait time across all `wait` actions and `waitFor` must not exceed 60 seconds.
| Action | Parameters | Description |
| ------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `wait` | `milliseconds?: number`, `selector?: string` | Wait for a fixed duration **or** until an element is visible (provide one, not both). When using `selector`, times out after 30 seconds. |
| `click` | `selector: string`, `all?: boolean` | Click an element matching the CSS selector. Set `all: true` to click every match. |
| `write` | `text: string` | Type text into the currently focused field. You must focus the element with a `click` action first. |
| `press` | `key: string` | Press a keyboard key (e.g. `"Enter"`, `"Tab"`, `"Escape"`). |
| `scroll` | `direction?: "up" \| "down"`, `selector?: string` | Scroll the page or a specific element. Direction defaults to `"down"`. |
| `screenshot` | `fullPage?: boolean`, `quality?: number`, `viewport?: { width, height }` | Capture a screenshot. Max viewport resolution is 7680×4320. |
| `scrape` | *(none)* | Capture the current page HTML at this point in the action sequence. |
| `executeJavascript` | `script: string` | Run JavaScript code in the page. Return values are available in the `actions.javascriptReturns` array of the response. |
| `pdf` | `format?: string`, `landscape?: boolean`, `scale?: number` | Generate a PDF. Supported formats: `"A0"` through `"A6"`, `"Letter"`, `"Legal"`, `"Tabloid"`, `"Ledger"`. Defaults to `"Letter"`. |
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')
doc = firecrawl.scrape(
    'https://example.com',
    actions=[
        { 'type': 'wait', 'milliseconds': 1000 },
        { 'type': 'click', 'selector': '#accept' },
        { 'type': 'scroll', 'direction': 'down' },
        { 'type': 'click', 'selector': '#q' },
        { 'type': 'write', 'text': 'firecrawl' },
        { 'type': 'press', 'key': 'Enter' },
        { 'type': 'wait', 'milliseconds': 2000 }
    ],
    formats=['markdown']
)
print(doc.markdown)
```
```js Node theme={null}
import { Firecrawl } from 'firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
const doc = await firecrawl.scrape('https://example.com', {
actions: [
{ type: 'wait', milliseconds: 1000 },
{ type: 'click', selector: '#accept' },
{ type: 'scroll', direction: 'down' },
{ type: 'click', selector: '#q' },
{ type: 'write', text: 'firecrawl' },
{ type: 'press', key: 'Enter' },
{ type: 'wait', milliseconds: 2000 }
],
formats: ['markdown']
});
console.log(doc.markdown);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "wait", "milliseconds": 1000 },
{ "type": "click", "selector": "#accept" },
{ "type": "scroll", "direction": "down" },
{ "type": "click", "selector": "#q" },
{ "type": "write", "text": "firecrawl" },
{ "type": "press", "key": "Enter" },
{ "type": "wait", "milliseconds": 2000 }
],
"formats": ["markdown"]
}'
```
#### Action execution notes
* **Write** requires a preceding `click` to focus the target element.
* **Scroll** accepts an optional `selector` to scroll a specific element instead of the page.
* **Wait** accepts either `milliseconds` (fixed delay) or `selector` (wait until visible).
* Actions run **sequentially**: each step completes before the next begins.
* Actions are **not supported for PDFs**. If the URL resolves to a PDF the request will fail.
#### Advanced action examples
**Taking a screenshot:**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "click", "selector": "#load-more" },
{ "type": "wait", "milliseconds": 1000 },
{ "type": "screenshot", "fullPage": true, "quality": 80 }
]
}'
```
**Clicking multiple elements:**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "click", "selector": ".expand-button", "all": true },
{ "type": "wait", "milliseconds": 500 }
],
"formats": ["markdown"]
}'
```
**Generating a PDF:**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "pdf", "format": "A4", "landscape": false }
]
}'
```
**Executing JavaScript (e.g. extracting embedded page data):**
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"actions": [
{ "type": "executeJavascript", "script": "document.querySelector(\"#__NEXT_DATA__\").textContent" }
],
"formats": ["markdown"]
}'
```
The return value of each `executeJavascript` action is captured in the `actions.javascriptReturns` array of the response.
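A sketch of reading that array back via the Node SDK (the exact entry shape may vary by SDK version):
```typescript theme={null}
import { Firecrawl } from 'firecrawl-js';

const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });

const doc = await firecrawl.scrape('https://example.com', {
  actions: [
    { type: 'executeJavascript', script: 'document.title' },
  ],
  formats: ['markdown'],
});

// Return values appear in action order.
console.log(doc.actions?.javascriptReturns);
```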
### Full scrape example
The following request combines multiple scrape options:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"formats": [
"markdown",
"links",
"html",
"rawHtml",
{ "type": "screenshot", "fullPage": true, "quality": 80 }
],
"includeTags": ["h1", "p", "a", ".main-content"],
"excludeTags": ["#ad", "#footer"],
"onlyMainContent": false,
"waitFor": 1000,
"timeout": 15000,
"parsers": ["pdf"]
}'
```
This request returns markdown, HTML, raw HTML, links, and a full-page screenshot. It scopes content with the `includeTags` and `excludeTags` selectors (applied against the full page, since `onlyMainContent` is `false`), waits an extra second before scraping, aborts after 15 seconds, and parses PDFs.
### Understanding the Frontend
The frontend uses AI Elements components to provide a complete chat interface:
**Key Features:**
* **Conversation Display**: The `Conversation` component automatically handles message scrolling and display
* **Message Rendering**: Each message part is rendered based on its type (text, reasoning, tool calls)
* **Tool Visualization**: Tool calls are displayed with collapsible sections showing inputs and outputs
* **Interactive Controls**: Users can toggle web search, select models, and attach files
* **Message Actions**: Copy and retry actions for assistant messages
To ensure the markdown from the LLM is correctly rendered, add the following import to your `app/globals.css` file:
```css theme={null}
@source "../node_modules/streamdown/dist/index.js";
```
This imports the necessary styles for rendering markdown content in the message responses.
Create the chat API endpoint at `app/api/chat/route.ts`. This route will handle incoming messages and stream responses from the AI.
```typescript theme={null}
import { streamText, UIMessage, convertToModelMessages } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
const openai = createOpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
// Allow streaming responses up to 5 minutes
export const maxDuration = 300;
export async function POST(req: Request) {
const {
messages,
model,
webSearch,
}: {
messages: UIMessage[];
model: string;
webSearch: boolean;
} = await req.json();
const result = streamText({
model: openai(model),
messages: convertToModelMessages(messages),
system:
"You are a helpful assistant that can answer questions and help with tasks.",
});
// send sources and reasoning back to the client
return result.toUIMessageStreamResponse({
sendSources: true,
sendReasoning: true,
});
}
```
This basic route:
* Receives messages from the frontend
* Uses the OpenAI model selected by the user
* Streams responses back to the client
* Doesn't include tools yet - we'll add those next
Create a `.env.local` file in your project root:
```bash theme={null}
touch .env.local
```
Add your OpenAI API key:
```env theme={null}
OPENAI_API_KEY=sk-your-openai-api-key
```
The `OPENAI_API_KEY` is required for the AI model to function.
Now you can test the AI SDK chatbot without Firecrawl integration. Start the development server:
```bash theme={null}
npm run dev
```
Open [localhost:3000](http://localhost:3000) in your browser and test the basic chat functionality. The assistant should respond to messages, but won't have web scraping or search capabilities yet.
Now let's enhance the assistant with web scraping and search capabilities using Firecrawl.
### Install Firecrawl SDK
Firecrawl converts websites into LLM-ready formats with scraping and search capabilities:
```bash theme={null}
npm i @mendable/firecrawl-js
```
### Create the Tools File
Create a `lib` folder and add a `tools.ts` file inside it:
```bash theme={null}
mkdir lib && touch lib/tools.ts
```
Add the following code to define the web scraping and search tools:
```typescript lib/tools.ts theme={null}
import FirecrawlApp from "@mendable/firecrawl-js";
import { tool } from "ai";
import { z } from "zod";
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
export const scrapeWebsiteTool = tool({
description: 'Scrape content from any website URL',
inputSchema: z.object({
url: z.string().url().describe('The URL to scrape')
}),
execute: async ({ url }) => {
console.log('Scraping:', url);
const result = await firecrawl.scrape(url, {
formats: ['markdown'],
onlyMainContent: true,
timeout: 30000
});
console.log('Scraped content preview:', result.markdown?.slice(0, 200) + '...');
return { content: result.markdown };
}
});
export const searchWebTool = tool({
description: 'Search the web using Firecrawl',
inputSchema: z.object({
query: z.string().describe('The search query'),
limit: z.number().optional().describe('Number of results'),
location: z.string().optional().describe('Location for localized results'),
tbs: z.string().optional().describe('Time filter (qdr:h, qdr:d, qdr:w, qdr:m, qdr:y)'),
sources: z.array(z.enum(['web', 'news', 'images'])).optional().describe('Result types'),
categories: z.array(z.enum(['github', 'research', 'pdf'])).optional().describe('Filter categories'),
}),
execute: async ({ query, limit, location, tbs, sources, categories }) => {
console.log('Searching:', query);
const response = await firecrawl.search(query, {
...(limit && { limit }),
...(location && { location }),
...(tbs && { tbs }),
...(sources && { sources }),
...(categories && { categories }),
}) as { web?: Array<{ title?: string; url?: string; description?: string }> };
const results = (response.web || []).map((item) => ({
title: item.title || item.url || 'Untitled',
url: item.url || '',
description: item.description || '',
}));
console.log('Search results:', results.length);
return { results };
},
});
```
### Understanding the Tools
**Scrape Website Tool:**
* Accepts a URL as input (validated by Zod schema)
* Uses Firecrawl's `scrape` method to fetch the page as markdown
* Extracts only the main content to reduce token usage
* Returns the scraped content for the AI to analyze
**Search Web Tool:**
* Accepts a search query with optional filters
* Uses Firecrawl's `search` method to find relevant web pages
* Supports advanced filters like location, time range, and content categories
* Returns structured results with titles, URLs, and descriptions
Learn more about tools: [ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools).
Now update your `app/api/chat/route.ts` to include the Firecrawl tools we just created.
```typescript theme={null}
import { streamText, UIMessage, stepCountIs, convertToModelMessages } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { scrapeWebsiteTool, searchWebTool } from "@/lib/tools";
const openai = createOpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
export const maxDuration = 300;
export async function POST(req: Request) {
const {
messages,
model,
webSearch,
}: {
messages: UIMessage[];
model: string;
webSearch: boolean;
} = await req.json();
const result = streamText({
model: openai(model),
messages: convertToModelMessages(messages),
system:
"You are a helpful assistant that can answer questions and help with tasks.",
// Add the Firecrawl tools here
tools: {
scrapeWebsite: scrapeWebsiteTool,
searchWeb: searchWebTool,
},
stopWhen: stepCountIs(5),
toolChoice: webSearch ? "auto" : "none",
});
return result.toUIMessageStreamResponse({
sendSources: true,
sendReasoning: true,
});
}
```
The key changes from the basic route:
* Import `stepCountIs` from the AI SDK
* Import the Firecrawl tools from `@/lib/tools`
* Add the `tools` object with both `scrapeWebsite` and `searchWeb` tools
* Add `stopWhen: stepCountIs(5)` to limit execution steps
* Set `toolChoice` to "auto" when web search is enabled, "none" otherwise
Learn more about `streamText`: [ai-sdk.dev/docs/reference/ai-sdk-core/stream-text](https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text).
Update your `.env.local` file to include your Firecrawl API key:
```env theme={null}
OPENAI_API_KEY=sk-your-openai-api-key
FIRECRAWL_API_KEY=fc-your-firecrawl-api-key
```
Get your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev).
Restart your development server:
```bash theme={null}
npm run dev
```
Open [localhost:3000](http://localhost:3000) and test the enhanced assistant:
1. Toggle the "Search" button to enable web search
2. Ask: "What are the latest features from firecrawl.dev?"
3. Watch as the AI calls the `searchWeb` or `scrapeWebsite` tool
4. See the tool execution in the UI with inputs and outputs
5. Read the AI's analysis based on the scraped data
## How It Works
### Message Flow
1. **User sends a message**: The user types a question and clicks submit
2. **Frontend sends request**: `useChat` sends the message to `/api/chat` with the selected model and web search setting
3. **Backend processes message**: The API route receives the message and calls `streamText`
4. **AI decides on tools**: The model analyzes the question and decides whether to use `scrapeWebsite` or `searchWeb` (only if web search is enabled)
5. **Tools execute**: If tools are called, Firecrawl scrapes or searches the web
6. **AI generates response**: The model analyzes tool results and generates a natural language response
7. **Frontend displays results**: The UI shows tool calls and the final response in real-time
### Tool Calling Process
The AI SDK's tool calling system ([ai-sdk.dev/docs/foundations/tools](https://ai-sdk.dev/docs/foundations/tools)) works as follows:
1. The model receives the user's message and available tool descriptions
2. If the model determines a tool is needed, it generates a tool call with parameters
3. The SDK executes the tool function with those parameters
4. The tool result is sent back to the model
5. The model uses the result to generate its final response
This all happens automatically within a single `streamText` call, with results streaming to the frontend in real-time.
## Key Features
### Model Selection
The application supports multiple OpenAI models:
* **GPT-5 Mini (Thinking)**: Recent OpenAI model with advanced reasoning capabilities
* **GPT-4o Mini**: Fast and cost-effective model
Users can switch between models using the dropdown selector.
### Web Search Toggle
The Search button controls whether the AI can use Firecrawl tools:
* **Enabled**: AI can call `scrapeWebsite` and `searchWeb` tools as needed
* **Disabled**: AI responds only with its training knowledge
This gives users control over when to use web data versus the model's built-in knowledge.
## Customization Ideas
### Add More Tools
Extend the assistant with additional tools:
* Database lookups for internal company data
* CRM integration to fetch customer information
* Email sending capabilities
* Document generation
Each tool follows the same pattern: define a schema with Zod, implement the execute function, and register it in the `tools` object.
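For instance, a hypothetical order-status tool following that pattern (the lookup function is a stand-in for your own backend):
```typescript theme={null}
import { tool } from "ai";
import { z } from "zod";

// Hypothetical internal lookup - replace with your own data source.
async function lookupOrder(orderId: string) {
  return { orderId, status: "shipped" };
}

export const orderStatusTool = tool({
  description: "Look up the status of an order by its ID",
  inputSchema: z.object({
    orderId: z.string().describe("The order ID to look up"),
  }),
  execute: async ({ orderId }) => lookupOrder(orderId),
});
```
Register it in the route's `tools` object alongside `scrapeWebsite` and `searchWeb`.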
### Change the AI Model
Swap OpenAI for another provider:
```typescript theme={null}
import { anthropic } from "@ai-sdk/anthropic";
const result = streamText({
model: anthropic("claude-sonnet-4-5"),
// ... rest of config
});
```
The AI SDK supports 20+ providers with the same API. Learn more: [ai-sdk.dev/docs/foundations/providers-and-models](https://ai-sdk.dev/docs/foundations/providers-and-models).
### Customize the UI
AI Elements components are built on shadcn/ui, so you can:
* Modify component styles in the component files
* Add new variants to existing components
* Create custom components that match the design system
## Best Practices
1. **Use appropriate tools**: Choose `searchWeb` to find relevant pages first, `scrapeWebsite` for single pages, or let the AI decide
2. **Monitor API usage**: Track your Firecrawl and OpenAI API usage to avoid unexpected costs
3. **Handle errors gracefully**: The tools include error handling, but consider adding user-facing error messages
4. **Optimize performance**: Use streaming to provide immediate feedback and consider caching frequently accessed content
5. **Set reasonable limits**: The `stopWhen: stepCountIs(5)` prevents excessive tool calls and runaway costs
***
## Related Resources
Explore the AI SDK for building AI-powered applications with streaming, tool
calling, and multi-provider support.
Pre-built UI components for AI applications built on shadcn/ui.
# Building a Brand Style Guide Generator with Firecrawl
Source: https://docs.firecrawl.dev/developer-guides/cookbooks/brand-style-guide-generator-cookbook
Generate professional PDF brand style guides by extracting design systems from any website using Firecrawl's branding format
Build a brand style guide generator that automatically extracts colors, typography, spacing, and visual identity from any website and compiles it into a professional PDF document.
## What You'll Build
A Node.js application that takes any website URL, extracts its complete brand identity using Firecrawl's branding format, and generates a polished PDF style guide with:
* Color palette with hex values
* Typography system (fonts, sizes, weights)
* Spacing and layout specifications
* Logo and brand imagery
* Theme information (light/dark mode)
## Prerequisites
* Node.js 18 or later installed
* A Firecrawl API key from [firecrawl.dev](https://firecrawl.dev)
* Basic knowledge of TypeScript and Node.js
Start by creating a new directory for your project and initializing it:
```bash theme={null}
mkdir brand-style-guide-generator && cd brand-style-guide-generator
npm init -y
```
Update your `package.json` to use ES modules:
```json package.json theme={null}
{
"name": "brand-style-guide-generator",
"version": "1.0.0",
"type": "module",
"scripts": {
"start": "npx tsx index.ts"
}
}
```
Install the required packages for web scraping and PDF generation:
```bash theme={null}
npm i @mendable/firecrawl-js pdfkit
npm i -D typescript tsx @types/node @types/pdfkit
```
These packages provide:
* `@mendable/firecrawl-js`: Firecrawl SDK for extracting brand identity from websites
* `pdfkit`: PDF document generation library
* `tsx`: TypeScript execution for Node.js
Create the main application file at `index.ts`. This script extracts brand identity from a URL and generates a professional PDF style guide.
```typescript index.ts theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import PDFDocument from "pdfkit";
import fs from "fs";
const API_KEY = "fc-YOUR-API-KEY";
const URL = "https://firecrawl.dev";
async function main() {
const fc = new Firecrawl({ apiKey: API_KEY });
const { branding: b } = (await fc.scrape(URL, { formats: ["branding"] })) as any;
const doc = new PDFDocument({ size: "A4", margin: 50 });
doc.pipe(fs.createWriteStream("brand-style-guide.pdf"));
// Fetch logo (PNG/JPG only)
let logoImg: Buffer | null = null;
try {
const logoUrl = b.images?.favicon || b.images?.ogImage;
if (logoUrl?.match(/\.(png|jpg|jpeg)$/i)) {
const res = await fetch(logoUrl);
logoImg = Buffer.from(await res.arrayBuffer());
}
} catch {}
// Header with logo
doc.rect(0, 0, 595, 120).fill(b.colors?.primary || "#333");
const titleX = logoImg ? 130 : 50;
if (logoImg) doc.image(logoImg, 50, 30, { height: 60 });
doc.fontSize(36).fillColor("#fff").text("Brand Style Guide", titleX, 38);
doc.fontSize(14).text(URL, titleX, 80);
// Colors
doc.fontSize(18).fillColor("#333").text("Colors", 50, 160);
const colors = Object.entries(b.colors || {}).filter(([, v]) => typeof v === "string" && (v as string).startsWith("#"));
colors.forEach(([k, v], i) => {
const x = 50 + i * 100;
doc.rect(x, 195, 80, 80).fill(v as string);
doc.fontSize(10).fillColor("#333").text(k, x, 282, { width: 80, align: "center" });
doc.fontSize(9).fillColor("#888").text(v as string, x, 296, { width: 80, align: "center" });
});
// Typography
doc.fontSize(18).fillColor("#333").text("Typography", 50, 340);
doc.fontSize(13).fillColor("#444");
doc.text(`Primary Font: ${b.typography?.fontFamilies?.primary || "—"}`, 50, 370);
doc.text(`Heading Font: ${b.typography?.fontFamilies?.heading || "—"}`, 50, 392);
doc.fontSize(12).fillColor("#666").text("Font Sizes:", 50, 422);
Object.entries(b.typography?.fontSizes || {}).forEach(([k, v], i) => {
doc.text(`${k.toUpperCase()}: ${v}`, 70, 445 + i * 22);
});
// Spacing & Theme
doc.fontSize(18).fillColor("#333").text("Spacing & Theme", 320, 340);
doc.fontSize(13).fillColor("#444");
doc.text(`Base Unit: ${b.spacing?.baseUnit}px`, 320, 370);
doc.text(`Border Radius: ${b.spacing?.borderRadius}`, 320, 392);
doc.text(`Color Scheme: ${b.colorScheme}`, 320, 414);
doc.end();
console.log("Generated: brand-style-guide.pdf");
}
main();
```
For this simple project, the API key is placed directly in the code. If you plan to push this to GitHub or share it with others, move the key to a `.env` file and use `process.env.FIRECRAWL_API_KEY` instead.
Replace `fc-YOUR-API-KEY` with your Firecrawl API key from [firecrawl.dev](https://firecrawl.dev).
### Understanding the Code
**Key Components:**
* **Firecrawl Branding Format**: The `branding` format extracts comprehensive brand identity including colors, typography, spacing, and images
* **PDFKit Document**: Creates a professional A4 PDF with proper margins and sections
* **Color Swatches**: Renders visual color blocks with hex values and semantic names
* **Typography Display**: Shows font families and sizes in an organized layout
* **Spacing & Theme**: Documents the design system's spacing units and color scheme
Run the script to generate a brand style guide:
```bash theme={null}
npm start
```
The script will:
1. Extract the brand identity from the target URL using Firecrawl's branding format
2. Generate a PDF named `brand-style-guide.pdf`
3. Save it in your project directory
To generate a style guide for a different website, simply change the `URL` constant in `index.ts`.
## How It Works
### Extraction Process
1. **URL Input**: The generator receives a target website URL
2. **Firecrawl Scrape**: Calls the `/scrape` endpoint with the `branding` format
3. **Brand Analysis**: Firecrawl analyzes the page's CSS, fonts, and visual elements
4. **Data Return**: Returns a structured `BrandingProfile` object with all design tokens
### PDF Generation Process
1. **Header Creation**: Generates a colored header using the primary brand color
2. **Logo Fetch**: Downloads and embeds the logo or favicon if available
3. **Color Palette**: Renders each color as a visual swatch with metadata
4. **Typography Section**: Documents font families, sizes, and weights
5. **Spacing Info**: Includes base units, border radius, and theme mode
### Branding Profile Structure
The [branding format](https://docs.firecrawl.dev/features/scrape#%2Fscrape-with-branding-endpoint) returns detailed brand information:
```typescript theme={null}
{
colorScheme: "dark" | "light",
logo: "https://example.com/logo.svg",
colors: {
primary: "#FF6B35",
secondary: "#004E89",
accent: "#F77F00",
background: "#1A1A1A",
textPrimary: "#FFFFFF",
textSecondary: "#B0B0B0"
},
typography: {
fontFamilies: { primary: "Inter", heading: "Inter", code: "Roboto Mono" },
fontSizes: { h1: "48px", h2: "36px", body: "16px" },
fontWeights: { regular: 400, medium: 500, bold: 700 }
},
spacing: {
baseUnit: 8,
borderRadius: "8px"
},
images: {
logo: "https://example.com/logo.svg",
favicon: "https://example.com/favicon.ico"
}
}
```
## Key Features
### Automatic Color Extraction
The generator identifies and categorizes all brand colors:
* **Primary & Secondary**: Main brand colors
* **Accent**: Highlight and CTA colors
* **Background & Text**: UI foundation colors
* **Semantic Colors**: Success, warning, error states
### Typography Documentation
Captures the complete type system:
* **Font Families**: Primary, heading, and monospace fonts
* **Size Scale**: All heading and body text sizes
* **Font Weights**: Available weight variations
### Visual Output
The PDF includes:
* Color-coded header matching the brand
* Embedded logo when available
* Professional layout with clear hierarchy
* Metadata footer with generation date
## Customization Ideas
### Add Component Documentation
Extend the generator to include UI component styles:
```typescript theme={null}
// Add after the Spacing & Theme section
if (b.components) {
doc.addPage();
doc.fontSize(20).fillColor("#333").text("UI Components", 50, 50);
// Document button styles
if (b.components.buttonPrimary) {
const btn = b.components.buttonPrimary;
doc.fontSize(14).text("Primary Button", 50, 90);
doc.rect(50, 110, 120, 40).fill(btn.background);
doc.fontSize(12).fillColor(btn.textColor).text("Button", 50, 120, { width: 120, align: "center" });
}
}
```
### Export Multiple Formats
Add JSON export alongside the PDF:
```typescript theme={null}
// Add before doc.end()
fs.writeFileSync("brand-data.json", JSON.stringify(b, null, 2));
```
### Batch Processing
Generate guides for multiple websites:
```typescript theme={null}
const websites = [
"https://stripe.com",
"https://linear.app",
"https://vercel.com"
];
for (const site of websites) {
const { branding } = await fc.scrape(site, { formats: ["branding"] }) as any;
// Generate PDF for each site...
}
```
### Custom PDF Themes
Create different PDF styles based on the extracted theme:
```typescript theme={null}
const isDarkMode = b.colorScheme === "dark";
const headerBg = isDarkMode ? b.colors?.background : b.colors?.primary;
const textColor = isDarkMode ? "#fff" : "#333";
```
## Best Practices
1. **Handle Missing Data**: Not all websites expose complete branding information. Always provide fallback values for missing properties.
2. **Respect Rate Limits**: When batch processing multiple sites, add delays between requests to respect Firecrawl's rate limits (see the sketch after this list).
3. **Cache Results**: Store extracted branding data to avoid repeated API calls for the same site.
4. **Image Format Handling**: Some logos may be in formats PDFKit doesn't support (like SVG). Consider adding format conversion or graceful fallbacks.
5. **Error Handling**: Wrap the generation process in try-catch blocks and provide meaningful error messages.
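A sketch of practice 2, extending the batch-processing snippet above (`fc` and `websites` as defined there):
```typescript theme={null}
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

for (const site of websites) {
  const { branding } = (await fc.scrape(site, { formats: ["branding"] })) as any;
  // ...generate the PDF for this site...
  await sleep(2000); // pause between requests; tune to your plan's rate limits
}
```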
***
## Related Resources
Learn more about the branding format and all available properties you can extract.
Complete API reference for the scrape endpoint with all format options.
Learn more about PDFKit for advanced PDF customization options.
Process multiple URLs efficiently with batch scraping.
# Anthropic
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/anthropic
Use Firecrawl with Claude for web scraping + AI workflows
Integrate Firecrawl with Claude to build AI applications powered by web data.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js @anthropic-ai/sdk zod zod-to-json-schema
```
Create `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
ANTHROPIC_API_KEY=your_anthropic_key
```
> **Note:** On Node 20.6+, you can load the `.env` file with `node --env-file=.env`. On older versions, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Summarize
This example demonstrates a simple workflow: scrape a website and summarize the content using Claude.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import Anthropic from '@anthropic-ai/sdk';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [
{ role: 'user', content: `Summarize in 100 words: ${scrapeResult.markdown}` }
]
});
console.log('Response:', message);
```
## Tool Use
This example shows how to use Claude's tool use feature to let the model decide when to scrape websites based on user requests.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { Anthropic } from '@anthropic-ai/sdk';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
const firecrawl = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY
});
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY
});
const ScrapeArgsSchema = z.object({
url: z.string()
});
console.log("Sending user message to Claude and requesting tool use if necessary...");
const response = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
tools: [{
name: 'scrape_website',
description: 'Scrape and extract markdown content from a website URL',
input_schema: zodToJsonSchema(ScrapeArgsSchema, 'ScrapeArgsSchema') as any
}],
messages: [{
role: 'user',
content: 'What is Firecrawl? Check firecrawl.dev'
}]
});
const toolUse = response.content.find(block => block.type === 'tool_use');
if (toolUse && toolUse.type === 'tool_use') {
const input = toolUse.input as { url: string };
console.log(`Calling tool: ${toolUse.name} | URL: ${input.url}`);
const result = await firecrawl.scrape(input.url, {
formats: ['markdown']
});
console.log(`Scraped content preview: ${result.markdown?.substring(0, 300)}...`);
// Continue with the conversation or process the scraped content as needed
}
```
## Structured Extraction
This example demonstrates how to use Claude to extract structured data from scraped website content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const CompanyInfoSchema = z.object({
name: z.string(),
industry: z.string().optional(),
description: z.string().optional()
});
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown'],
onlyMainContent: true
});
const prompt = `Extract company information from this website content.
Output ONLY valid JSON in this exact format (no markdown, no explanation):
{
"name": "Company Name",
"industry": "Industry",
"description": "One sentence description"
}
Website content:
${scrapeResult.markdown}`;
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [
{ role: 'user', content: prompt },
{ role: 'assistant', content: '{' }
]
});
const textBlock = message.content.find(block => block.type === 'text');
if (textBlock && textBlock.type === 'text') {
const jsonText = '{' + textBlock.text;
const companyInfo = CompanyInfoSchema.parse(JSON.parse(jsonText));
console.log(companyInfo);
}
```
For more examples, check the [Claude documentation](https://docs.anthropic.com/claude/docs).
# ElevenAgents
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/elevenagents
Give ElevenLabs voice and chat agents real-time web access with Firecrawl
Give your [ElevenAgents](https://elevenlabs.io/agents) voice and chat agents the ability to scrape, search, and crawl the web in real time using Firecrawl. This guide covers two integration paths:
1. **MCP server** — connect the hosted Firecrawl MCP server for zero-code setup.
2. **Server webhook tool** — point a custom tool at Firecrawl's REST API for full control over requests.
## Prerequisites
* An [ElevenLabs](https://elevenlabs.io) account with access to ElevenAgents
* A Firecrawl API key from your [firecrawl.dev dashboard](https://firecrawl.dev)
## Option 1: Firecrawl MCP Server
The fastest way to give an agent web access. ElevenAgents supports remote MCP servers, and Firecrawl provides a hosted MCP endpoint.
### Add the MCP server
1. Open the [Integrations page](https://elevenlabs.io/app/agents/integrations) in ElevenLabs and click **+ Add integration**.
2. Select **Custom MCP Server** from the integration library.
3. Fill in the following fields:
| Field | Value |
| --------------- | ------------------------------------------------------------ |
| **Name** | Firecrawl |
| **Description** | Search, scrape, crawl, and extract content from any website. |
| **Server type** | Streamable HTTP |
| **Server URL** | `https://mcp.firecrawl.dev/YOUR_FIRECRAWL_API_KEY/v2/mcp` |
Replace `YOUR_FIRECRAWL_API_KEY` with your actual key. Leave the **Type** dropdown set to **Value**. Treat this URL as a secret — it contains your API key.
You must select **Streamable HTTP** as the server type. The default SSE option does not work with the Firecrawl MCP endpoint.
4. Under **Tool Approval Mode**, choose an approval level:
* **No Approval** — the agent uses tools freely. Fine for read-only scraping.
* **Fine-Grained Tool Approval** — lets you pre-select which tools can run automatically and which require approval. Good for controlling expensive crawl operations.
* **Always Ask** (default) — the agent requests permission before every tool call.
5. Check **I trust this server**, then click **Add Server**.
ElevenLabs will connect to the server and list the available tools (scrape, search, crawl, map, and more).
### Attach it to an agent
1. Create or open an agent in the [ElevenAgents dashboard](https://elevenlabs.io/app/agents/agents).
2. Go to the **Tools** tab, then select the **MCP** sub-tab.
3. Click **Add server** and select the **Firecrawl** integration from the dropdown.
### Update the system prompt
In the **Agent** tab, add instructions to the **System prompt** so the agent knows when to use Firecrawl. For example:
```text theme={null}
You are a helpful research assistant. When the user asks about a website,
a company, or any topic that requires up-to-date information, use the
Firecrawl tools to search the web or scrape the relevant page, then
summarize the results.
```
### Test it
Click **Preview** in the top navigation bar. You can test using the text chat input or by starting a voice call. Try a prompt like:
> "What does firecrawl.dev do? Go to the site and summarize it for me."
The agent will call the Firecrawl MCP `scrape` tool, receive the page markdown, and respond with a summary.
***
## Option 2: Server Webhook Tool
Use this approach when you need precise control over request parameters (formats, headers, timeouts, etc.) or want to call a specific Firecrawl endpoint without exposing the full MCP tool set.
### Scrape tool
Create a tool that scrapes a single URL and returns its content as markdown.
1. Open your agent and go to the **Tools** tab.
2. Click **Add tool** and select **Webhook**.
3. Configure the tool:
| Field | Value |
| --------------- | ---------------------------------------------------------- |
| **Name** | scrape\_website |
| **Description** | Scrape content from a URL and return it as clean markdown. |
| **Method** | POST |
| **URL** | `https://api.firecrawl.dev/v2/scrape` |
The **Method** field defaults to GET — make sure to change it to **POST**.
4. Scroll to the **Headers** section and click **Add header** for authentication:
| Header | Value |
| --------------- | ------------------------------- |
| `Authorization` | `Bearer YOUR_FIRECRAWL_API_KEY` |
Alternatively, if you have workspace auth connections configured, you can use the **Authentication** dropdown instead.
5. Add a **body parameter**:
| Parameter | Type | Description | Required |
| --------- | ------ | ----------------- | -------- |
| `url` | string | The URL to scrape | Yes |
6. Click **Add tool**.
The Firecrawl API returns the page content as markdown by default. The agent receives the JSON response and can use the `markdown` field to answer questions.
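If you want to verify the tool configuration outside ElevenLabs, you can reproduce the same request directly. A minimal Python sketch (the API key and target URL are placeholders):
```python theme={null}
# Reproduces the request the scrape_website webhook tool sends.
# Assumes the `requests` package is installed (pip install requests).
import requests

response = requests.post(
    "https://api.firecrawl.dev/v2/scrape",
    headers={"Authorization": "Bearer YOUR_FIRECRAWL_API_KEY"},
    json={"url": "https://firecrawl.dev"},  # same body parameter as the tool
)
response.raise_for_status()
# Preview the `markdown` field described above
print(response.json()["data"]["markdown"][:300])
```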
### Search tool
Create a tool that searches the web and returns results with scraped content.
1. Click **Add tool** → **Webhook** again and configure:
| Field | Value |
| --------------- | ------------------------------------------------------------------------- |
| **Name** | search\_web |
| **Description** | Search the web for a query and return relevant results with page content. |
| **Method** | POST |
| **URL** | `https://api.firecrawl.dev/v2/search` |
2. Add the same `Authorization` header as above.
3. Add **body parameters**:
| Parameter | Type | Description | Required |
| --------- | ------ | ----------------------------------------------- | -------- |
| `query` | string | The search query | Yes |
| `limit` | number | Maximum number of results to return (default 5) | No |
4. Click **Add tool**.
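The search tool issues the equivalent of the following request; a minimal Python sketch with placeholder values:
```python theme={null}
# Reproduces the request the search_web webhook tool sends.
import requests

response = requests.post(
    "https://api.firecrawl.dev/v2/search",
    headers={"Authorization": "Bearer YOUR_FIRECRAWL_API_KEY"},
    json={"query": "firecrawl", "limit": 5},  # same body parameters as the tool
)
response.raise_for_status()
print(response.json())
```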
### Update the system prompt
In the **Agent** tab, update the **System prompt**:
```text theme={null}
You are a knowledgeable assistant with access to web tools.
- Use `scrape_website` when the user gives you a specific URL to read.
- Use `search_web` when the user asks a general question that requires
finding information online.
Always summarize the information concisely and cite the source URL.
```
### Test it
Click **Preview** and try asking:
> "Search for the latest Next.js features and give me a summary."
The agent will call `search_web`, receive results from Firecrawl, and respond with a summary of the findings.
***
## Tips
* **Model selection** — For reliable tool calling, use a high-intelligence model such as GPT-4o, Claude Sonnet 4.5 or later, or Gemini 2.5 Flash. Smaller models may struggle to extract the correct parameters.
* **Keep prompts specific** — Tell the agent exactly when to use each tool. Vague instructions lead to missed or incorrect tool calls.
* **Limit response size** — For voice agents, long scraped pages can overwhelm the LLM context. Use `onlyMainContent: true` in scrape options (or instruct the agent to summarize aggressively) to keep responses concise.
* **Tool call sounds** — In the webhook or MCP tool settings, you can configure a **Tool call sound** to play ambient audio while a tool runs. This signals to the user that the agent is working.
## Resources
* [ElevenAgents documentation](https://elevenlabs.io/docs/eleven-agents/overview)
* [ElevenAgents tools overview](https://elevenlabs.io/docs/eleven-agents/customization/tools)
* [ElevenAgents MCP integration](https://elevenlabs.io/docs/eleven-agents/customization/tools/mcp)
* [Firecrawl API reference](https://docs.firecrawl.dev/api-reference/v2-introduction)
* [Firecrawl MCP server](https://docs.firecrawl.dev/mcp-server)
# Gemini
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/gemini
Use Firecrawl with Google's Gemini AI for web scraping + AI workflows
Integrate Firecrawl with Google's Gemini for AI applications powered by web data.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js @google/genai
```
Create a `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
GEMINI_API_KEY=your_gemini_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Summarize
This example demonstrates a simple workflow: scrape a website and summarize the content using Gemini.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { GoogleGenAI } from '@google/genai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Summarize: ${scrapeResult.markdown}`,
});
console.log('Summary:', response.text);
```
## Content Analysis
This example shows how to analyze website content using Gemini's multi-turn conversation capabilities.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { GoogleGenAI } from '@google/genai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://news.ycombinator.com/', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const chat = ai.chats.create({
model: 'gemini-2.5-flash'
});
// Ask for the top 3 stories on Hacker News
const result1 = await chat.sendMessage({
message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${scrapeResult.markdown}`
});
console.log('Top 3 Stories:', result1.text);
// Ask for the 4th and 5th stories on Hacker News
const result2 = await chat.sendMessage({
message: `Now, what are the 4th and 5th top stories on Hacker News from the same content?`
});
console.log('4th and 5th Stories:', result2.text);
```
## Structured Extraction
This example demonstrates how to extract structured data using Gemini's JSON mode from scraped website content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { GoogleGenAI, Type } from '@google/genai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Extract company information: ${scrapeResult.markdown}`,
config: {
responseMimeType: 'application/json',
responseSchema: {
type: Type.OBJECT,
properties: {
name: { type: Type.STRING },
industry: { type: Type.STRING },
description: { type: Type.STRING },
products: {
type: Type.ARRAY,
items: { type: Type.STRING }
}
},
propertyOrdering: ['name', 'industry', 'description', 'products']
}
}
});
console.log('Extracted company info:', response?.text);
```
For more examples, check the [Gemini documentation](https://ai.google.dev/docs).
# Agent Development Kit (ADK)
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/google-adk
Integrate Firecrawl with Google's ADK using MCP for advanced agent workflows
Integrate Firecrawl with Google's Agent Development Kit (ADK) to build powerful AI agents with web scraping capabilities through the Model Context Protocol (MCP).
## Overview
Firecrawl provides an MCP server that seamlessly integrates with Google's ADK, enabling your agents to efficiently scrape, crawl, and extract structured data from any website. The integration supports both cloud-based and self-hosted Firecrawl instances with streamable HTTP for optimal performance.
## Features
* Efficient web scraping, crawling, and content discovery from any website
* Advanced search capabilities and intelligent content extraction
* Deep research and high-volume batch scraping
* Flexible deployment (cloud-based or self-hosted)
* Optimized for modern web environments with streamable HTTP support
## Prerequisites
* Obtain an API key for Firecrawl from [firecrawl.dev](https://firecrawl.dev)
* Install Google ADK
## Setup
```python Remote MCP Server theme={null}
from google.adk.agents.llm_agent import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
FIRECRAWL_API_KEY = "YOUR-API-KEY"
root_agent = Agent(
model="gemini-2.5-pro",
name="firecrawl_agent",
description='A helpful assistant for scraping websites with Firecrawl',
instruction='Help the user search for website content',
tools=[
MCPToolset(
connection_params=StreamableHTTPServerParams(
url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp",
),
)
],
)
```
```python Local MCP Server theme={null}
from google.adk.agents.llm_agent import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from mcp import StdioServerParameters
root_agent = Agent(
model='gemini-2.5-pro',
name='firecrawl_agent',
description='A helpful assistant for scraping websites with Firecrawl',
instruction='Help the user search for website content',
tools=[
MCPToolset(
connection_params=StdioConnectionParams(
server_params=StdioServerParameters(
command='npx',
args=[
"-y",
"firecrawl-mcp",
],
env={
"FIRECRAWL_API_KEY": "YOUR-API-KEY",
}
),
timeout=30,
),
)
],
)
```
## Available Tools
| Tool | Name | Description |
| ------------------ | ------------------------------ | ------------------------------------------------------------------------------------ |
| Scrape Tool | `firecrawl_scrape` | Scrape content from a single URL with advanced options |
| Batch Scrape Tool | `firecrawl_batch_scrape` | Scrape multiple URLs efficiently with built-in rate limiting and parallel processing |
| Check Batch Status | `firecrawl_check_batch_status` | Check the status of a batch operation |
| Map Tool | `firecrawl_map` | Map a website to discover all indexed URLs on the site |
| Search Tool | `firecrawl_search` | Search the web and optionally extract content from search results |
| Crawl Tool | `firecrawl_crawl` | Start an asynchronous crawl with advanced options |
| Check Crawl Status | `firecrawl_check_crawl_status` | Check the status of a crawl job |
| Extract Tool | `firecrawl_extract` | Extract structured information from web pages using LLM capabilities |
## Configuration
### Required Configuration
**FIRECRAWL\_API\_KEY**: Your Firecrawl API key
* Required when using cloud API (default)
* Optional when using self-hosted instance with FIRECRAWL\_API\_URL
### Optional Configuration
**Firecrawl API URL (for self-hosted instances)**:
* `FIRECRAWL_API_URL`: Custom API endpoint
* Example: `https://firecrawl.your-domain.com`
* If not provided, the cloud API will be used
**Retry configuration**:
* `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum retry attempts (default: 3)
* `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds (default: 1000)
* `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds (default: 10000)
* `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2)
**Credit usage monitoring**:
* `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Warning threshold (default: 1000)
* `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Critical threshold (default: 100)
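When running the local MCP server, these variables go in the `env` dict of `StdioServerParameters`. A minimal sketch, assuming a self-hosted instance (the URL and retry values are illustrative):
```python theme={null}
from mcp import StdioServerParameters

# Hypothetical self-hosted setup with tuned retries; all values are illustrative.
server_params = StdioServerParameters(
    command="npx",
    args=["-y", "firecrawl-mcp"],
    env={
        "FIRECRAWL_API_URL": "https://firecrawl.your-domain.com",  # self-hosted endpoint
        "FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
        "FIRECRAWL_RETRY_INITIAL_DELAY": "2000",  # milliseconds
        "FIRECRAWL_RETRY_BACKOFF_FACTOR": "2",
    },
)
```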
## Example: Web Research Agent
```python theme={null}
from google.adk.agents.llm_agent import Agent
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
FIRECRAWL_API_KEY = "YOUR-API-KEY"
# Create a research agent
research_agent = Agent(
model="gemini-2.5-pro",
name="research_agent",
description='An AI agent that researches topics by scraping and analyzing web content',
instruction='''You are a research assistant. When given a topic or question:
1. Use the search tool to find relevant websites
2. Scrape the most relevant pages for detailed information
3. Extract structured data when needed
4. Provide comprehensive, well-sourced answers''',
tools=[
MCPToolset(
connection_params=StreamableHTTPServerParams(
url=f"https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp",
),
)
],
)
# Use the agent via ADK's in-memory runner
import asyncio

from google.adk.runners import InMemoryRunner
from google.genai import types

async def main():
    runner = InMemoryRunner(agent=research_agent)
    session = await runner.session_service.create_session(
        app_name=runner.app_name, user_id="user"
    )
    new_message = types.Content(
        role="user",
        parts=[types.Part(text="What are the latest features in Python 3.13?")],
    )
    async for event in runner.run_async(
        user_id="user", session_id=session.id, new_message=new_message
    ):
        if event.is_final_response() and event.content and event.content.parts:
            print(event.content.parts[0].text)

asyncio.run(main())
```
## Best Practices
1. **Use the right tool for the job**:
* `firecrawl_search` when you need to find relevant pages first
* `firecrawl_scrape` for single pages
* `firecrawl_batch_scrape` for multiple known URLs
* `firecrawl_crawl` for discovering and scraping entire sites
2. **Monitor your usage**: Configure credit thresholds to avoid unexpected usage
3. **Handle errors gracefully**: Configure retry settings based on your use case
4. **Optimize performance**: Use batch operations when scraping multiple URLs
***
## Related Resources
* Build powerful multi-agent AI systems using Google's ADK framework with Firecrawl for web scraping capabilities.
* Learn more about [Firecrawl's MCP server](https://docs.firecrawl.dev/mcp-server) integration and capabilities.
* Explore the official [Google Agent Development Kit documentation](https://google.github.io/adk-docs/) for comprehensive guides and API references.
# LangChain
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langchain
Use Firecrawl with LangChain for web scraping + AI workflows
Integrate Firecrawl with LangChain to build AI applications powered by web data.
## Setup
```bash theme={null}
npm install @langchain/openai @mendable/firecrawl-js
```
Create a `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Chat
This example demonstrates a simple workflow: scrape a website and process the content using LangChain.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage } from '@langchain/core/messages';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const chat = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
});
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const response = await chat.invoke([
new HumanMessage(`Summarize: ${scrapeResult.markdown}`)
]);
console.log('Summary:', response.content);
```
## Chains
This example shows how to build a LangChain chain to process and analyze scraped content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const model = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
});
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
// Create processing chain
const prompt = ChatPromptTemplate.fromMessages([
['system', 'You are an expert at analyzing company websites.'],
['user', 'Extract the company name and main products from: {content}']
]);
const chain = prompt.pipe(model).pipe(new StringOutputParser());
// Execute the chain
const result = await chain.invoke({
content: scrapeResult.markdown
});
console.log('Chain result:', result);
```
## Tool Calling
This example demonstrates how to use LangChain's tool calling feature to let the model decide when to scrape websites.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { DynamicStructuredTool } from '@langchain/core/tools';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Create the scraping tool
const scrapeWebsiteTool = new DynamicStructuredTool({
name: 'scrape_website',
description: 'Scrape content from any website URL',
schema: z.object({
url: z.string().url().describe('The URL to scrape')
}),
func: async ({ url }) => {
console.log('Scraping:', url);
const result = await firecrawl.scrape(url, {
formats: ['markdown']
});
console.log('Scraped content preview:', result.markdown?.substring(0, 200) + '...');
return result.markdown || 'No content scraped';
}
});
const model = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
}).bindTools([scrapeWebsiteTool]);
const response = await model.invoke('What is Firecrawl? Visit firecrawl.dev and tell me about it.');
console.log('Response:', response.content);
console.log('Tool calls:', response.tool_calls);
```
## Structured Data Extraction
This example shows how to extract structured data using LangChain's structured output feature.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const CompanyInfoSchema = z.object({
name: z.string(),
industry: z.string(),
description: z.string(),
products: z.array(z.string())
});
const model = new ChatOpenAI({
model: 'gpt-5-nano',
apiKey: process.env.OPENAI_API_KEY
}).withStructuredOutput(CompanyInfoSchema);
const companyInfo = await model.invoke([
{
role: 'system',
content: 'Extract company information from website content.'
},
{
role: 'user',
content: `Extract data: ${scrapeResult.markdown}`
}
]);
console.log('Extracted company info:', companyInfo);
```
For more examples, check the [LangChain documentation](https://js.langchain.com/docs).
# LangGraph
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/langgraph
Integrate Firecrawl with LangGraph for building agent workflows
This guide shows how to integrate Firecrawl with LangGraph to build AI agent workflows that can scrape and process web content.
## Setup
```bash theme={null}
npm install @langchain/langgraph @langchain/openai @mendable/firecrawl-js
```
Create a `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Basic Workflow
This example demonstrates a basic LangGraph workflow that scrapes a website and analyzes the content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { StateGraph, MessagesAnnotation, START, END } from '@langchain/langgraph';
// Initialize Firecrawl
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Initialize LLM
const llm = new ChatOpenAI({
model: "gpt-5-nano",
apiKey: process.env.OPENAI_API_KEY
});
// Define the scrape node
async function scrapeNode(state: typeof MessagesAnnotation.State) {
console.log('Scraping...');
const result = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown'] });
return {
messages: [{
role: "system",
content: `Scraped content: ${result.markdown}`
}]
};
}
// Define the analyze node
async function analyzeNode(state: typeof MessagesAnnotation.State) {
console.log('Analyzing...');
const response = await llm.invoke(state.messages);
return { messages: [response] };
}
// Build the graph
const graph = new StateGraph(MessagesAnnotation)
.addNode("scrape", scrapeNode)
.addNode("analyze", analyzeNode)
.addEdge(START, "scrape")
.addEdge("scrape", "analyze")
.addEdge("analyze", END);
// Compile the graph
const app = graph.compile();
// Run the workflow
const result = await app.invoke({
messages: [{ role: "user", content: "Summarize the website" }]
});
console.log(JSON.stringify(result, null, 2));
```
## Multi-Step Workflow
This example demonstrates a more complex workflow that scrapes multiple URLs and processes them.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import { ChatOpenAI } from '@langchain/openai';
import { StateGraph, Annotation, START, END } from '@langchain/langgraph';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const llm = new ChatOpenAI({ model: "gpt-5-nano", apiKey: process.env.OPENAI_API_KEY });
// Define custom state
const WorkflowState = Annotation.Root({
  urls: Annotation<string[]>(),
  scrapedData: Annotation<Array<{ url: string; content: string }>>(),
  summary: Annotation<string>()
});
// Scrape multiple URLs
async function scrapeMultiple(state: typeof WorkflowState.State) {
const scrapedData: { url: string; content: string }[] = [];
for (const url of state.urls) {
const result = await firecrawl.scrape(url, { formats: ['markdown'] });
scrapedData.push({ url, content: result.markdown || '' });
}
return { scrapedData };
}
// Summarize all scraped content
async function summarizeAll(state: typeof WorkflowState.State) {
const combinedContent = state.scrapedData
.map(item => `Content from ${item.url}:\n${item.content}`)
.join('\n\n');
const response = await llm.invoke([
{ role: "user", content: `Summarize these websites:\n${combinedContent}` }
]);
return { summary: response.content as string };
}
// Build the workflow graph
const workflow = new StateGraph(WorkflowState)
.addNode("scrape", scrapeMultiple)
.addNode("summarize", summarizeAll)
.addEdge(START, "scrape")
.addEdge("scrape", "summarize")
.addEdge("summarize", END);
const app = workflow.compile();
// Execute workflow
const result = await app.invoke({
urls: ["https://firecrawl.dev", "https://firecrawl.dev/pricing"]
});
console.log(result.summary);
```
For more examples, check the [LangGraph documentation](https://langchain-ai.github.io/langgraphjs/).
# LlamaIndex
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/llamaindex
Use Firecrawl with LlamaIndex for RAG applications
Integrate Firecrawl with LlamaIndex to build AI applications with vector search and embeddings powered by web content.
## Setup
```bash theme={null}
npm install llamaindex @llamaindex/openai @mendable/firecrawl-js
```
Create a `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## RAG with Vector Search
This example demonstrates how to use LlamaIndex with Firecrawl to crawl a website, create embeddings, and query the content using RAG.
```typescript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
import { Document, VectorStoreIndex, Settings } from 'llamaindex';
import { OpenAI, OpenAIEmbedding } from '@llamaindex/openai';
Settings.llm = new OpenAI({ model: "gpt-4o" });
Settings.embedModel = new OpenAIEmbedding({ model: "text-embedding-3-small" });
const firecrawl = new Firecrawl({ apiKey: process.env.FIRECRAWL_API_KEY });
const crawlResult = await firecrawl.crawl('https://firecrawl.dev', {
limit: 10,
scrapeOptions: { formats: ['markdown'] }
});
console.log(`Crawled ${crawlResult.data.length} pages`);
const documents = crawlResult.data.map((page: any, i: number) =>
new Document({
text: page.markdown,
id_: `page-${i}`,
metadata: { url: page.metadata?.sourceURL }
})
);
const index = await VectorStoreIndex.fromDocuments(documents);
console.log('Vector index created with embeddings');
const queryEngine = index.asQueryEngine();
const response = await queryEngine.query({ query: 'What is Firecrawl and how does it work?' });
console.log('\nAnswer:', response.toString());
```
For more examples, check the [LlamaIndex documentation](https://ts.llamaindex.ai/).
# Mastra
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/mastra
Use Firecrawl with Mastra for building AI workflows
Integrate Firecrawl with Mastra, the TypeScript framework for building AI agents and workflows.
## Setup
```bash theme={null}
npm install @mastra/core @mendable/firecrawl-js zod
```
Create a `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Multi-Step Workflow
This example demonstrates a complete workflow that searches, scrapes, and summarizes documentation using Firecrawl and Mastra.
```typescript theme={null}
import { createWorkflow, createStep } from "@mastra/core/workflows";
import { z } from "zod";
import Firecrawl from "@mendable/firecrawl-js";
import { Agent } from "@mastra/core/agent";
const firecrawl = new Firecrawl({
apiKey: process.env.FIRECRAWL_API_KEY || "fc-YOUR_API_KEY"
});
const agent = new Agent({
name: "summarizer",
instructions: "You are a helpful assistant that creates concise summaries of documentation.",
model: "openai/gpt-5-nano",
});
// Step 1: Search with Firecrawl SDK
const searchStep = createStep({
id: "search",
inputSchema: z.object({
query: z.string(),
}),
outputSchema: z.object({
url: z.string(),
title: z.string(),
}),
execute: async ({ inputData }: { inputData: { query: string } }) => {
console.log(`Searching: ${inputData.query}`);
const searchResults = await firecrawl.search(inputData.query, { limit: 1 });
const webResults = (searchResults as any)?.web;
if (!webResults || !Array.isArray(webResults) || webResults.length === 0) {
throw new Error("No search results found");
}
const firstResult = webResults[0];
console.log(`Found: ${firstResult.title}`);
return {
url: firstResult.url,
title: firstResult.title,
};
},
});
// Step 2: Scrape the URL with Firecrawl SDK
const scrapeStep = createStep({
id: "scrape",
inputSchema: z.object({
url: z.string(),
title: z.string(),
}),
outputSchema: z.object({
markdown: z.string(),
title: z.string(),
}),
execute: async ({ inputData }: { inputData: { url: string; title: string } }) => {
console.log(`Scraping: ${inputData.url}`);
const scrapeResult = await firecrawl.scrape(inputData.url, {
formats: ["markdown"],
});
console.log(`Scraped: ${scrapeResult.markdown?.length || 0} characters`);
return {
markdown: scrapeResult.markdown || "",
title: inputData.title,
};
},
});
// Step 3: Summarize with the agent
const summarizeStep = createStep({
id: "summarize",
inputSchema: z.object({
markdown: z.string(),
title: z.string(),
}),
outputSchema: z.object({
summary: z.string(),
}),
execute: async ({ inputData }: { inputData: { markdown: string; title: string } }) => {
console.log(`Summarizing: ${inputData.title}`);
const prompt = `Summarize the following documentation in 2-3 sentences:\n\nTitle: ${inputData.title}\n\n${inputData.markdown}`;
const result = await agent.generate(prompt);
console.log(`Summary generated`);
return { summary: result.text };
},
});
// Create workflow
export const workflow = createWorkflow({
id: "firecrawl-workflow",
inputSchema: z.object({
query: z.string(),
}),
outputSchema: z.object({
summary: z.string(),
}),
steps: [searchStep, scrapeStep, summarizeStep],
})
.then(searchStep)
.then(scrapeStep)
.then(summarizeStep)
.commit();
async function testWorkflow() {
const run = await workflow.createRunAsync();
const result = await run.start({
inputData: { query: "Firecrawl documentation" }
});
if (result.status === "success") {
const { summarize } = result.steps;
if (summarize.status === "success") {
console.log(`\n${summarize.output.summary}`);
}
} else {
console.error("Workflow failed:", result.status);
}
}
testWorkflow().catch(console.error);
```
For more examples, check the [Mastra documentation](https://mastra.ai/docs).
# OpenAI
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/openai
Use Firecrawl with OpenAI for web scraping + AI workflows
Integrate Firecrawl with OpenAI to build AI applications powered by web data.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js openai zod
```
Create a `.env` file:
```bash theme={null}
FIRECRAWL_API_KEY=your_firecrawl_key
OPENAI_API_KEY=your_openai_key
```
> **Note:** If using Node \< 20, install `dotenv` and add `import 'dotenv/config'` to your code.
## Scrape + Summarize
This example demonstrates a simple workflow: scrape a website and summarize the content using an OpenAI model.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Scrape the website content
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
// Summarize with OpenAI model
const completion = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [
{ role: 'user', content: `Summarize: ${scrapeResult.markdown}` }
]
});
console.log('Summary:', completion.choices[0]?.message.content);
```
## Function Calling
This example shows how to use OpenAI's function calling feature to let the model decide when to scrape websites based on user requests.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const ScrapeArgsSchema = z.object({
url: z.string().describe('The URL of the website to scrape')
});
const tools = [{
type: 'function' as const,
function: {
name: 'scrape_website',
description: 'Scrape content from any website URL',
parameters: z.toJSONSchema(ScrapeArgsSchema)
}
}];
const response = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [{
role: 'user',
content: 'What is Firecrawl? Visit firecrawl.dev and tell me about it.'
}],
tools
});
const message = response.choices[0]?.message;
if (message?.tool_calls && message.tool_calls.length > 0) {
for (const toolCall of message.tool_calls) {
if (toolCall.type === 'function') {
console.log('Tool called:', toolCall.function.name);
const args = ScrapeArgsSchema.parse(JSON.parse(toolCall.function.arguments));
const result = await firecrawl.scrape(args.url, {
formats: ['markdown'] // Other formats: html, links, etc.
});
console.log('Scraped content:', result.markdown?.substring(0, 200) + '...');
// Send the scraped content back to the model for final response
const finalResponse = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [
{
role: 'user',
content: 'What is Firecrawl? Visit firecrawl.dev and tell me about it.'
},
message,
{
role: 'tool',
tool_call_id: toolCall.id,
content: result.markdown || 'No content scraped'
}
],
tools
});
console.log('Final response:', finalResponse.choices[0]?.message?.content);
}
}
} else {
console.log('Direct response:', message?.content);
}
```
## Structured Data Extraction
This example demonstrates how to use OpenAI models with structured outputs to extract specific data from scraped content.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const scrapeResult = await firecrawl.scrape('https://stripe.com', {
formats: ['markdown']
});
console.log('Scraped content length:', scrapeResult.markdown?.length);
const CompanyInfoSchema = z.object({
name: z.string(),
industry: z.string(),
description: z.string(),
products: z.array(z.string())
});
const response = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [
{
role: 'system',
content: 'Extract company information from website content.'
},
{
role: 'user',
content: `Extract data: ${scrapeResult.markdown}`
}
],
response_format: {
type: 'json_schema',
json_schema: {
name: 'company_info',
schema: z.toJSONSchema(CompanyInfoSchema),
strict: true
}
}
});
const content = response.choices[0]?.message?.content;
const companyInfo = content ? CompanyInfoSchema.parse(JSON.parse(content)) : null;
console.log('Validated company info:', companyInfo);
```
## Search + Analyze
This example combines Firecrawl's search capabilities with OpenAI model analysis to find and summarize information from multiple sources.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Search for relevant information
const searchResult = await firecrawl.search('Next.js 16 new features', {
limit: 3,
sources: [{ type: 'web' }], // Other sources: { type: 'news' }, { type: 'images' }
scrapeOptions: { formats: ['markdown'] }
});
console.log('Search results:', searchResult.web?.length, 'pages found');
// Analyze and summarize the key features
const analysis = await openai.chat.completions.create({
model: 'gpt-5-nano',
messages: [{
role: 'user',
content: `Summarize the key features: ${JSON.stringify(searchResult)}`
}]
});
console.log('Analysis:', analysis.choices[0]?.message?.content);
```
## Responses API with MCP
This example shows how to use OpenAI's Responses API with Firecrawl configured as an MCP (Model Context Protocol) server.
```typescript theme={null}
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const response = await openai.responses.create({
model: 'gpt-5-nano',
tools: [
{
type: 'mcp',
server_label: 'firecrawl',
server_description: 'A web search and scraping MCP server to scrape and extract content from websites.',
server_url: `https://mcp.firecrawl.dev/${process.env.FIRECRAWL_API_KEY}/v2/mcp`,
require_approval: 'never'
}
],
input: 'Find out what the top stories on Hacker News are and the latest blog post on OpenAI and summarize them in a bullet point format'
});
console.log('Response:', JSON.stringify(response.output, null, 2));
```
For more examples, check the [OpenAI documentation](https://platform.openai.com/docs).
# Vercel AI SDK
Source: https://docs.firecrawl.dev/developer-guides/llm-sdks-and-frameworks/vercel-ai-sdk
Firecrawl tools for Vercel AI SDK. Web scraping, search, interact, and crawling for AI applications.
Firecrawl tools for the Vercel AI SDK. Search, scrape, interact with pages, and crawl the web in your AI applications.
## Install
```bash theme={null}
npm install firecrawl-aisdk ai
```
Set environment variables:
```bash theme={null}
FIRECRAWL_API_KEY=fc-your-key # https://firecrawl.dev
AI_GATEWAY_API_KEY=your-key # https://vercel.com/ai-gateway
```
These examples use the [Vercel AI Gateway](https://vercel.com/ai-gateway) string model format, but Firecrawl tools work with any AI SDK provider. You can also use provider imports like `anthropic('claude-sonnet-4-5-20250929')` from `@ai-sdk/anthropic`.
## Quick Start
`FirecrawlTools()` bundles `search`, `scrape`, and `interact` by default.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { FirecrawlTools } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
tools: FirecrawlTools(),
stopWhen: stepCountIs(30),
prompt: `
1. Use interact on Hacker News to identify the top story
2. Search for other perspectives on the same topic
3. Scrape the most relevant pages you found
4. Summarize everything you found
`,
});
```
## FirecrawlTools
`FirecrawlTools()` gives you the default tools plus an auto-generated `systemPrompt` you can pass to `generateText`.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { FirecrawlTools } from 'firecrawl-aisdk';
const tools = FirecrawlTools();
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
system: `${tools.systemPrompt}\n\nAnswer with citations when possible.`,
tools,
stopWhen: stepCountIs(20),
prompt: 'Find the current Firecrawl pricing page and explain the available plans.',
});
```
You can customize defaults, opt into async tools, or disable individual tools:
```typescript theme={null}
const tools = FirecrawlTools({
search: { limit: 5 },
scrape: { formats: ['markdown'], onlyMainContent: true },
interact: { profile: { name: 'my-session', saveChanges: true } },
crawl: true,
agent: true,
});
```
```typescript theme={null}
// Disable interact, keep search + scrape
FirecrawlTools({ interact: false });
// Opt into deprecated browser compatibility
FirecrawlTools({ browser: {} });
// Include every available tool
FirecrawlTools({ all: true });
```
When scraping to answer a question about a page, prefer query format:
```typescript theme={null}
formats: [{ type: 'query', prompt: 'What does this page say about pricing and rate limits?' }]
```
Use `formats: ['markdown']` only when you need the full page content.
## Individual Tools
Every tool can be used directly or called with options:
```typescript theme={null}
import { generateText } from 'ai';
import { scrape, search } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Search for Firecrawl, then scrape the most relevant result.',
tools: { search, scrape },
});
const customScrape = scrape({ apiKey: 'fc-custom-key', apiUrl: 'https://api.firecrawl.dev' });
```
### Search + Scrape
```typescript theme={null}
import { generateText } from 'ai';
import { search, scrape } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Search for Firecrawl, scrape the top official result, and explain what it does.',
tools: { search, scrape },
});
```
### Map
```typescript theme={null}
import { generateText } from 'ai';
import { map } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Map https://docs.firecrawl.dev and list the main sections.',
tools: { map },
});
```
### Stream
```typescript theme={null}
import { streamText, stepCountIs } from 'ai';
import { scrape } from 'firecrawl-aisdk';
const result = streamText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'What are the first 100 words of firecrawl.dev?',
tools: { scrape },
stopWhen: stepCountIs(3),
});
for await (const chunk of result.textStream) {
process.stdout.write(chunk);
}
await result.fullStream;
```
## Interact
`interact()` creates a scrape-backed interactive session. Call `start(url)` to bootstrap a session and get a live view URL, then let the model reuse that session through the `interact` tool.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { interact, search } from 'firecrawl-aisdk';
const interactTool = interact();
console.log('Live view:', await interactTool.start('https://news.ycombinator.com'));
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
tools: { interact: interactTool, search },
stopWhen: stepCountIs(25),
prompt: 'Use interact on the current Hacker News session, find the top story, then search for more context.',
});
await interactTool.close();
```
If you need the explicit live view URL after startup, use `interactTool.interactiveLiveViewUrl`.
Reuse browser state across sessions with profiles:
```typescript theme={null}
const interactTool = interact({
profile: { name: 'my-session', saveChanges: true },
});
```
`browser()` is deprecated. Prefer `interact()`.
## Async Tools
Crawl, batch scrape, and agent return a job ID. Pair them with `poll`.
### Crawl
```typescript theme={null}
import { generateText } from 'ai';
import { crawl, poll } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Crawl https://docs.firecrawl.dev (limit 3 pages) and summarize.',
tools: { crawl, poll },
});
```
### Batch Scrape
```typescript theme={null}
import { generateText } from 'ai';
import { batchScrape, poll } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Scrape https://firecrawl.dev and https://docs.firecrawl.dev, then compare them.',
tools: { batchScrape, poll },
});
```
### Agent
Autonomous web data gathering that searches, navigates, and extracts on its own.
```typescript theme={null}
import { generateText, stepCountIs } from 'ai';
import { agent, poll } from 'firecrawl-aisdk';
const { text } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Find the founders of Firecrawl, their roles, and their backgrounds.',
tools: { agent, poll },
stopWhen: stepCountIs(10),
});
```
## Logging
```typescript theme={null}
import { generateText } from 'ai';
import { logStep, scrape, stepLogger } from 'firecrawl-aisdk';
const logger = stepLogger();
const { text, usage } = await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Scrape https://firecrawl.dev and summarize it.',
tools: { scrape },
onStepFinish: logger.onStep,
experimental_onToolCallFinish: logger.onToolCallFinish,
});
logger.close();
logger.summary(usage);
await generateText({
model: 'anthropic/claude-sonnet-4-5',
prompt: 'Scrape https://firecrawl.dev and summarize it again.',
tools: { scrape },
onStepFinish: logStep,
});
```
## All Exports
```typescript theme={null}
import {
// Core tools
search, // Search the web
scrape, // Scrape a single URL
map, // Discover URLs on a site
crawl, // Crawl multiple pages (async, use poll)
batchScrape, // Scrape multiple URLs (async, use poll)
agent, // Autonomous web research (async, use poll)
// Job management
poll, // Poll async jobs for results
status, // Check job status
cancel, // Cancel running jobs
// Browser/session tools
interact, // interact({ profile: { name: '...' } })
browser, // deprecated compatibility export
// All-in-one bundle
FirecrawlTools, // FirecrawlTools({ search, scrape, interact, crawl, agent })
// Helpers
stepLogger, // Token stats per tool call
logStep, // Simple one-liner logging
} from 'firecrawl-aisdk';
```
# Agent
Source: https://docs.firecrawl.dev/features/agent
Gather data wherever it lives on the web.
Firecrawl `/agent` is a magic API that searches, navigates, and gathers data from the widest range of websites, finding data in hard-to-reach places and uncovering it in ways no other API can. It accomplishes in a few minutes what would take a human many hours — end-to-end data collection, without scripts or manual work.
Whether you need one data point or entire datasets at scale, Firecrawl `/agent` works to get your data.
**Think of `/agent` as deep research for data, wherever it is!**
**Research Preview**: Agent is in early access. Expect rough edges. It will get significantly better over time. [Share feedback →](mailto:product@firecrawl.com)
Agent builds on everything great about `/extract` and takes it further:
* **No URLs Required**: Just describe what you need via the `prompt` parameter. URLs are optional
* **Deep Web Search**: Autonomously searches and navigates deep into sites to find your data
* **Reliable and Accurate**: Works with a wide variety of queries and use cases
* **Faster**: Processes multiple sources in parallel for quicker results
Test the agent in the [interactive playground](https://www.firecrawl.dev/app/agent) — no code required.
## Using `/agent`
The only required parameter is `prompt`. Simply describe what data you want to extract. For structured output, provide a JSON schema. The SDKs support Pydantic (Python) and Zod (Node) for type-safe schema definitions:
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional
app = Firecrawl(api_key="fc-YOUR_API_KEY")
class Founder(BaseModel):
name: str = Field(description="Full name of the founder")
role: Optional[str] = Field(None, description="Role or position")
background: Optional[str] = Field(None, description="Professional background")
class FoundersSchema(BaseModel):
founders: List[Founder] = Field(description="List of founders")
result = app.agent(
prompt="Find the founders of Firecrawl",
schema=FoundersSchema,
model="spark-1-mini",
max_credits=100
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
import { z } from 'zod';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
const result = await firecrawl.agent({
prompt: "Find the founders of Firecrawl",
schema: z.object({
founders: z.array(z.object({
name: z.string().describe("Full name of the founder"),
role: z.string().describe("Role or position").optional(),
background: z.string().describe("Professional background").optional()
})).describe("List of founders")
}),
model: "spark-1-mini",
maxCredits: 100
});
console.log(result.data);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Find the founders of Firecrawl",
"model": "spark-1-mini",
"maxCredits": 100,
"schema": {
"type": "object",
"properties": {
"founders": {
"type": "array",
"description": "List of founders",
"items": {
"type": "object",
"properties": {
"name": { "type": "string", "description": "Full name" },
"role": { "type": "string", "description": "Role or position" },
"background": { "type": "string", "description": "Professional background" }
},
"required": ["name"]
}
}
},
"required": ["founders"]
}
}'
```
### Response
```json JSON theme={null}
{
"success": true,
"status": "completed",
"data": {
"founders": [
{
"name": "Eric Ciarla",
"role": "Co-founder",
"background": "Previously at Mendable"
},
{
"name": "Nicolas Camara",
"role": "Co-founder",
"background": "Previously at Mendable"
},
{
"name": "Caleb Peffer",
"role": "Co-founder",
"background": "Previously at Mendable"
}
]
},
"expiresAt": "2024-12-15T00:00:00.000Z",
"creditsUsed": 15
}
```
## Providing URLs (Optional)
You can optionally provide URLs to focus the agent on specific pages:
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
result = app.agent(
urls=["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
prompt="Compare the features and pricing information from these pages"
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
const result = await firecrawl.agent({
urls: ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
prompt: "Compare the features and pricing information from these pages"
});
console.log(result.data);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://docs.firecrawl.dev",
"https://firecrawl.dev/pricing"
],
"prompt": "Compare the features and pricing information from these pages"
}'
```
## Job Status and Completion
Agent jobs run asynchronously. When you submit a job, you'll receive a Job ID that you can use to check status:
* **Default method**: `agent()` waits and returns final results
* **Start then poll**: Use `start_agent` (Python) or `startAgent` (Node) to get a Job ID immediately, then poll with `get_agent_status` / `getAgentStatus` (a polling sketch follows the examples below)
Job results are available via the API for 24 hours after completion. After this period, you can still view your agent history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Start an agent job
agent_job = app.start_agent(
prompt="Find the founders of Firecrawl"
)
# Check the status
status = app.get_agent_status(agent_job.id)
print(status)
# Example output:
# status='completed'
# success=True
# data={ ... }
# expires_at=datetime.datetime(...)
# credits_used=15
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Start an agent job
const started = await firecrawl.startAgent({
prompt: "Find the founders of Firecrawl"
});
// Check the status
if (started.id) {
const status = await firecrawl.getAgentStatus(started.id);
console.log(status.status, status.data);
}
```
```bash cURL theme={null}
curl -X GET "https://api.firecrawl.dev/v2/agent/YOUR_JOB_ID" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
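To block until a started job finishes, poll the status until it leaves the `processing` state (see the table below). A minimal Python sketch (the polling interval is illustrative):
```python theme={null}
import time
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")
job = app.start_agent(prompt="Find the founders of Firecrawl")

# Poll until the job leaves the `processing` state
status = app.get_agent_status(job.id)
while status.status == "processing":
    time.sleep(5)  # illustrative interval
    status = app.get_agent_status(job.id)

if status.status == "completed":
    print(status.data)
else:
    print(f"Job ended with status: {status.status}")
```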
### Possible States
| Status | Description |
| ------------ | ------------------------------------------ |
| `processing` | The agent is still working on your request |
| `completed` | Extraction finished successfully |
| `failed` | An error occurred during extraction |
| `cancelled` | The job was cancelled by the user |
#### Pending Example
```json JSON theme={null}
{
"success": true,
"status": "processing",
"expiresAt": "2024-12-15T00:00:00.000Z"
}
```
#### Completed Example
```json JSON theme={null}
{
"success": true,
"status": "completed",
"data": {
"founders": [
{
"name": "Eric Ciarla",
"role": "Co-founder"
},
{
"name": "Nicolas Camara",
"role": "Co-founder"
},
{
"name": "Caleb Peffer",
"role": "Co-founder"
}
]
},
"expiresAt": "2024-12-15T00:00:00.000Z",
"creditsUsed": 15
}
```
## Share agent runs
You can share agent runs directly from the Agent playground. Shared links are public — anyone with the link can view the run output and activity — and you can revoke access at any time to disable the link. Shared pages are not indexed by search engines.
## Model Selection
Firecrawl Agent offers two models. **Spark 1 Mini is 60% cheaper** and is the default — perfect for most use cases. Upgrade to Spark 1 Pro when you need maximum accuracy on complex tasks.
| Model | Cost | Accuracy | Best For |
| -------------- | --------------- | -------- | ------------------------------------- |
| `spark-1-mini` | **60% cheaper** | Standard | Most tasks (default) |
| `spark-1-pro` | Standard | Higher | Complex research, critical extraction |
**Start with Spark 1 Mini** (default) — it handles most extraction tasks well at 60% lower cost. Switch to Pro only for complex multi-domain research or when accuracy is critical.
### Spark 1 Mini (Default)
`spark-1-mini` is our efficient model, ideal for straightforward data extraction tasks.
**Use Mini when:**
* Extracting simple data points (contact info, pricing, etc.)
* Working with well-structured websites
* Cost efficiency is a priority
* Running high-volume extraction jobs
### Spark 1 Pro
`spark-1-pro` is our flagship model, designed for maximum accuracy on complex extraction tasks.
**Use Pro when:**
* Performing complex competitive analysis
* Extracting data that requires deep reasoning
* Accuracy is critical for your use case
* Dealing with ambiguous or hard-to-find data
### Specifying a Model
Pass the `model` parameter to select which model to use:
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Using Spark 1 Mini (default - can be omitted)
result = app.agent(
prompt="Find the pricing of Firecrawl",
model="spark-1-mini"
)
# Using Spark 1 Pro for complex tasks
result = app.agent(
prompt="Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model="spark-1-pro"
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Using Spark 1 Mini (default - can be omitted)
const result = await firecrawl.agent({
prompt: "Find the pricing of Firecrawl",
model: "spark-1-mini"
});
// Using Spark 1 Pro for complex tasks
const resultPro = await firecrawl.agent({
prompt: "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model: "spark-1-pro"
});
console.log(result.data);
```
```bash cURL theme={null}
# Using Spark 1 Mini (default)
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Find the pricing of Firecrawl",
"model": "spark-1-mini"
}'
# Using Spark 1 Pro for complex tasks
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
"model": "spark-1-pro"
}'
```
## Parameters
| Parameter | Type | Required | Description |
| ------------ | ------ | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `prompt` | string | **Yes** | Natural language description of the data you want to extract (max 10,000 characters) |
| `model` | string | No | Model to use: `spark-1-mini` (default) or `spark-1-pro` |
| `urls` | array | No | Optional list of URLs to focus the extraction |
| `schema` | object | No | Optional JSON schema for structured output |
| `maxCredits` | number | No | Maximum number of credits to spend on this agent task. Defaults to **2,500** if not set. The dashboard supports values up to **2,500**; for higher limits, set `maxCredits` via the API (values above 2,500 are always treated as paid requests). If the limit is reached, the job fails and **no data is returned**. Failed runs are not billed: credits used for AI reasoning are never charged on failure, any credits used for tool calls during the run (scraping, search, mapping, etc.) are refunded, and the response reports `creditsUsed: 0`. |
## Agent vs Extract: What's Improved
| Feature | Agent (New) | Extract |
| ----------------- | ----------- | -------- |
| URLs Required | No | Yes |
| Speed | Faster | Standard |
| Cost | Lower | Standard |
| Reliability | Higher | Standard |
| Query Flexibility | High | Moderate |
## Example Use Cases
* **Research**: "Find the top 5 AI startups and their funding amounts"
* **Competitive Analysis**: "Compare pricing plans between Slack and Microsoft Teams"
* **Data Gathering**: "Extract contact information from company websites"
* **Content Summarization**: "Summarize the latest blog posts about web scraping"
## CSV Upload in Agent Playground
The [Agent Playground](https://www.firecrawl.dev/app/agent) supports CSV upload for batch processing. Your CSV can contain one or more columns of input data. For example, a single column of company names, or multiple columns such as company name, product, and website URL. Each row represents one item for the agent to process.
Upload your CSV, then add output columns using the "+" button in the grid header. Each column has its own prompt — click a column header to describe what the agent should find for that field (e.g., "CEO or founder name", "Total funding raised"). Hit Run, and the agent processes each row in parallel, filling in the results.
## API Reference
Check out the [Agent API Reference](/api-reference/endpoint/agent) for more details.
Have feedback or need help? Email [help@firecrawl.com](mailto:help@firecrawl.com).
## Pricing
Firecrawl Agent uses **dynamic billing** that scales with the complexity of your data extraction request. You pay based on the actual work Agent performs, ensuring fair pricing whether you're extracting simple data points or complex structured information from multiple sources.
### How Agent pricing works
Agent pricing is **dynamic and credit-based** during Research Preview:
* **Simple extractions** (like contact info from a single page) typically use fewer credits and cost less
* **Complex research tasks** (like competitive analysis across multiple domains) use more credits but reflect the total effort involved
* **Transparent usage** shows you exactly how many credits each request consumed
* **Credit conversion** automatically converts Agent usage into standard Firecrawl credits for easy billing
Credit usage varies based on the complexity of your prompt, the amount of data processed, and the structure of the output requested. As a rough guide, most agent runs consume **a few hundred credits**, though simpler single-page tasks may use less and complex multi-domain research may use more.
### Parallel Agents Pricing
If you are running multiple agents in parallel with Spark-1 Fast, pricing is much more predictable: a flat 10 credits per cell.
### Getting started
**All users** receive **5 free daily runs**, usable from either the playground or the API, to explore Agent's capabilities at no cost.
Additional usage is billed based on credit consumption.
### Managing costs
Agent can be expensive, but there are several ways to reduce the cost:
* **Start with free runs**: Use your 5 free daily runs to understand pricing
* **Set a `maxCredits` parameter**: Limit your spending by setting a maximum number of credits you're willing to spend. The dashboard caps this at 2,500 credits; to set a higher limit, use the `maxCredits` parameter directly via the API (note: values above 2,500 are always billed as paid requests)
* **Optimize prompts**: More specific prompts often use fewer credits
* **Break large tasks into smaller runs**: A single agent run has an output ceiling based on the underlying model's generation capacity (\~150-200 rows of structured data). For large extraction jobs, split by category, region, or URL batch (3-5 URLs per run) and merge the results, as shown in the sketch after this list. This also keeps each run well under the `maxCredits` limit.
* **Monitor usage**: Track your consumption through the dashboard
* **Set expectations**: Complex multi-domain research will use more credits than simple single-page extractions
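To split a large job, a loop over URL batches is usually enough. A minimal sketch, assuming the Python SDK's `agent` method accepts `urls` and `max_credits` keyword arguments:
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

all_urls = [...]  # your full URL list
merged = []
for i in range(0, len(all_urls), 4):  # 3-5 URLs per run, per the guidance above
    result = firecrawl.agent(
        prompt="Extract the product name and price from each page",
        urls=all_urls[i : i + 4],
        max_credits=500,  # keep each run well under the cap
    )
    merged.append(result.data)
```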
Try Agent now at [firecrawl.dev/app/agent](https://www.firecrawl.dev/app/agent) to see how credit usage scales with your specific use cases.
Pricing is subject to change as we move from Research Preview to general availability. Current users will receive advance notice of any pricing updates.
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Batch Scrape
Source: https://docs.firecrawl.dev/features/batch-scrape
Scrape multiple URLs in a single batch job
Batch scrape lets you scrape multiple URLs in a single job. Pass a list of URLs and optional parameters, and Firecrawl processes them concurrently and returns all results together.
* Works like `/crawl` but for an explicit list of URLs
* Synchronous and asynchronous modes
* Supports all scrape options including structured extraction
* Configurable concurrency per job
## How it works
You can run a batch scrape in two ways:
| Mode | SDK method (JS / Python) | Behavior |
| ------------ | ----------------------------------------- | ---------------------------------------------------------------- |
| Synchronous | `batchScrape` / `batch_scrape` | Starts the batch and waits for completion, returning all results |
| Asynchronous | `startBatchScrape` / `start_batch_scrape` | Starts the batch and returns a job ID for polling or webhooks |
## Basic usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
start = firecrawl.start_batch_scrape([
"https://firecrawl.dev",
"https://docs.firecrawl.dev",
], formats=["markdown"]) # returns id
job = firecrawl.batch_scrape([
"https://firecrawl.dev",
"https://docs.firecrawl.dev",
], formats=["markdown"], poll_interval=2, wait_timeout=120)
print(job.status, job.completed, job.total)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Start a batch scrape job
const { id } = await firecrawl.startBatchScrape([
'https://firecrawl.dev',
'https://docs.firecrawl.dev'
], {
options: { formats: ['markdown'] },
});
// Wait for completion
const job = await firecrawl.batchScrape([
'https://firecrawl.dev',
'https://docs.firecrawl.dev'
], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 });
console.log(job.status, job.completed, job.total);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
"formats": ["markdown"]
}'
```
### Response
Calling `batchScrape` / `batch_scrape` returns the full results when the batch completes.
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
Calling `startBatchScrape` / `start_batch_scrape` returns a job ID you can track via `getBatchScrapeStatus` / `get_batch_scrape_status`, the API endpoint `/batch/scrape/{id}`, or webhooks. Job results are available via the API for 24 hours after completion. After this period, you can still view your batch scrape history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
## Concurrency
By default, a batch scrape job uses your team's full concurrent browser limit (see [Rate Limits](/rate-limits)). You can lower this per job with the `maxConcurrency` parameter.
For example, `maxConcurrency: 50` limits that job to 50 simultaneous scrapes. Setting this value too low on large batches will significantly slow down processing, so only reduce it if you need to leave capacity for other concurrent jobs.
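A minimal sketch, assuming the Python SDK exposes the parameter as `max_concurrency`:
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Sketch: cap this batch at 50 concurrent scrapes to leave capacity for other jobs.
job = firecrawl.batch_scrape(
    ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
    formats=["markdown"],
    max_concurrency=50,
)
```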
## Structured extraction
You can use batch scrape to extract structured data from every page in the batch. This is useful when you want the same schema applied to a list of URLs.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Scrape multiple websites:
batch_scrape_result = firecrawl.batch_scrape(
['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
formats=[{
'type': 'json',
'prompt': 'Extract the title and description from the page.',
'schema': {
'type': 'object',
'properties': {
'title': {'type': 'string'},
'description': {'type': 'string'}
},
'required': ['title', 'description']
}
}]
)
print(batch_scrape_result)
# Or, you can use the start method:
batch_scrape_job = firecrawl.start_batch_scrape(
['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'],
formats=[{
'type': 'json',
'prompt': 'Extract the title and description from the page.',
'schema': {
'type': 'object',
'properties': {
'title': {'type': 'string'},
'description': {'type': 'string'}
},
'required': ['title', 'description']
}
}]
)
print(batch_scrape_job)
# You can then use the job ID to check the status of the batch scrape:
batch_scrape_status = firecrawl.get_batch_scrape_status(batch_scrape_job.id)
print(batch_scrape_status)
```
```js Node theme={null}
import Firecrawl, { ScrapeResponse } from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
// Define schema to extract contents into
const schema = {
type: "object",
properties: {
title: { type: "string" },
description: { type: "string" }
},
required: ["title", "description"]
};
// Scrape multiple websites (synchronous):
const batchScrapeResult = await firecrawl.batchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], {
formats: [
{
type: "json",
prompt: "Extract the title and description from the page.",
schema: schema
}
]
});
// Output all the results of the batch scrape:
console.log(batchScrapeResult)
// Or, you can use the start method:
const batchScrapeJob = await firecrawl.startBatchScrape(['https://docs.firecrawl.dev', 'https://docs.firecrawl.dev/sdks/overview'], {
formats: [
{
type: "json",
prompt: "Extract the title and description from the page.",
schema: schema
}
]
});
console.log(batchScrapeJob)
// You can then use the job ID to check the status of the batch scrape:
const batchScrapeStatus = await firecrawl.getBatchScrapeStatus(batchScrapeJob.id);
console.log(batchScrapeStatus)
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"urls": ["https://docs.firecrawl.dev", "https://docs.firecrawl.dev/sdks/overview"],
"formats" : [{
"type": "json",
"prompt": "Extract the title and description from the page.",
"schema": {
"type": "object",
"properties": {
"title": {
"type": "string"
},
"description": {
"type": "string"
}
},
"required": [
"title",
"description"
]
}
}]
}'
```
### Response
`batchScrape` / `batch_scrape` returns full results:
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
"data": [
{
"json": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot."
}
},
...
]
}
```
`startBatchScrape` / `start_batch_scrape` returns a job ID:
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
## Webhooks
You can configure webhooks to receive real-time notifications as each URL in your batch is scraped. This lets you process results immediately instead of waiting for the entire batch to complete.
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/batch/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"urls": [
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
],
"webhook": {
"url": "https://your-domain.com/webhook",
"metadata": {
"any_key": "any_value"
},
"events": ["started", "page", "completed"]
}
}'
```
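If you are using the SDK, the webhook configuration can be passed when starting the job. A minimal sketch, assuming the Python SDK accepts a `webhook` keyword argument here as the crawl methods do:
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Sketch: start a batch scrape that notifies your server as pages complete.
job = firecrawl.start_batch_scrape(
    ["https://example.com/page1", "https://example.com/page2"],
    formats=["markdown"],
    webhook={
        "url": "https://your-domain.com/webhook",
        "metadata": {"any_key": "any_value"},
        "events": ["started", "page", "completed"],
    },
)
```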
### Event types
| Event | Description |
| ------------------------ | ------------------------------------- |
| `batch_scrape.started` | The batch scrape job has begun |
| `batch_scrape.page` | A single URL was successfully scraped |
| `batch_scrape.completed` | All URLs have been processed |
| `batch_scrape.failed` | The batch scrape encountered an error |
### Payload
Each webhook delivery includes a JSON body with the following structure:
```json theme={null}
{
"success": true,
"type": "batch_scrape.page",
"id": "batch-job-id",
"data": [...],
"metadata": {},
"error": null
}
```
### Verifying webhook signatures
Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with.
1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings
2. Extract the signature from the `X-Firecrawl-Signature` header
3. Compute HMAC-SHA256 of the raw request body using your secret
4. Compare with the signature header using a timing-safe function
Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...`
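A minimal sketch of that check in Python (see the linked security docs below for production-ready examples):
```python Python theme={null}
import hashlib
import hmac

def is_valid_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    # Recompute HMAC-SHA256 over the raw request body, then compare timing-safely.
    expected = "sha256=" + hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```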
For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security).
For comprehensive webhook documentation including detailed event payloads, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview).
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Change Tracking
Source: https://docs.firecrawl.dev/features/change-tracking
Detect and monitor changes in web content between scrapes
Change tracking compares the current content of a page against the last time you scraped it. Add `changeTracking` to your `formats` array to detect whether a page is new, unchanged, or modified, and optionally get a structured diff of what changed.
* Works with `/scrape`, `/crawl`, and `/batch/scrape`
* Two diff modes: `git-diff` for line-level changes, `json` for field-level comparison
* Scoped to your team, and optionally scoped to a tag that you pass in
## How it works
Every scrape with `changeTracking` enabled stores a snapshot and compares it against the previous snapshot for that URL. Snapshots are stored persistently and do not expire, so comparisons remain accurate regardless of how much time has passed between scrapes.
| Scrape | Result |
| ----------------- | -------------------------------------------------- |
| First time | `changeStatus: "new"` (no previous version exists) |
| Content unchanged | `changeStatus: "same"` |
| Content modified | `changeStatus: "changed"` (diff data available) |
| Page removed | `changeStatus: "removed"` |
The response includes these fields in the `changeTracking` object:
| Field | Type | Description |
| ------------------ | --------------------- | ---------------------------------------------------------------------------------------------- |
| `previousScrapeAt` | `string \| null` | Timestamp of the previous scrape (`null` on first scrape) |
| `changeStatus` | `string` | `"new"`, `"same"`, `"changed"`, or `"removed"` |
| `visibility` | `string` | `"visible"` (discoverable via links/sitemap) or `"hidden"` (URL works but is no longer linked) |
| `diff` | `object \| undefined` | Line-level diff (only present in `git-diff` mode when status is `"changed"`) |
| `json` | `object \| undefined` | Field-level comparison (only present in `json` mode when status is `"changed"`) |
## Basic usage
Include both `markdown` and `changeTracking` in the `formats` array. The `markdown` format is required because change tracking compares pages via their markdown content.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
result = firecrawl.scrape(
"https://example.com/pricing",
formats=["markdown", "changeTracking"]
)
print(result.changeTracking)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const result = await firecrawl.scrape('https://example.com/pricing', {
formats: ['markdown', 'changeTracking']
});
console.log(result.changeTracking);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/pricing",
"formats": ["markdown", "changeTracking"]
}'
```
### Response
On the first scrape, `changeStatus` is `"new"` and `previousScrapeAt` is `null`:
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Pricing\n\nStarter: $9/mo\nPro: $29/mo...",
"changeTracking": {
"previousScrapeAt": null,
"changeStatus": "new",
"visibility": "visible"
}
}
}
```
On subsequent scrapes, `changeStatus` reflects whether content changed:
```json theme={null}
{
"success": true,
"data": {
"markdown": "# Pricing\n\nStarter: $12/mo\nPro: $39/mo...",
"changeTracking": {
"previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible"
}
}
}
```
## Git-diff mode
The `git-diff` mode returns line-by-line changes in a format similar to `git diff`. Pass an object in the `formats` array with `modes: ["git-diff"]`:
```python Python theme={null}
result = firecrawl.scrape(
"https://example.com/pricing",
formats=[
"markdown",
{
"type": "changeTracking",
"modes": ["git-diff"]
}
]
)
if result.changeTracking.changeStatus == "changed":
print(result.changeTracking.diff.text)
```
```js Node theme={null}
const result = await firecrawl.scrape('https://example.com/pricing', {
formats: [
'markdown',
{ type: 'changeTracking', modes: ['git-diff'] }
]
});
if (result.changeTracking.changeStatus === 'changed') {
console.log(result.changeTracking.diff.text);
}
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/pricing",
"formats": [
"markdown",
{ "type": "changeTracking", "modes": ["git-diff"] }
]
}'
```
### Response
The `diff` object contains both a plain-text diff and a structured JSON representation:
```json theme={null}
{
"changeTracking": {
"previousScrapeAt": "2025-06-01T10:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible",
"diff": {
"text": "@@ -1,3 +1,3 @@\n # Pricing\n-Starter: $9/mo\n-Pro: $29/mo\n+Starter: $12/mo\n+Pro: $39/mo",
"json": {
"files": [{
"chunks": [{
"content": "@@ -1,3 +1,3 @@",
"changes": [
{ "type": "normal", "content": "# Pricing" },
{ "type": "del", "ln": 2, "content": "Starter: $9/mo" },
{ "type": "del", "ln": 3, "content": "Pro: $29/mo" },
{ "type": "add", "ln": 2, "content": "Starter: $12/mo" },
{ "type": "add", "ln": 3, "content": "Pro: $39/mo" }
]
}]
}]
}
}
}
}
```
The structured `diff.json` object contains:
* `files`: array of changed files (typically one for web pages)
* `chunks`: sections of changes within a file
* `changes`: individual line changes with `type` (`"add"`, `"del"`, or `"normal"`), line number (`ln`), and `content`
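A sketch of walking that structure, reusing `result` from the git-diff example above (attribute vs. dictionary access may vary by SDK version):
```python Python theme={null}
# Sketch: print added and removed lines from the structured diff.
diff = result.changeTracking.diff.json
for changed_file in diff["files"]:
    for chunk in changed_file["chunks"]:
        for change in chunk["changes"]:
            if change["type"] == "add":
                print(f"+ {change['content']}")
            elif change["type"] == "del":
                print(f"- {change['content']}")
```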
## JSON mode
The `json` mode extracts specific fields from both the current and previous version of the page using a schema you define. This is useful for tracking changes in structured data like prices, stock levels, or metadata without parsing a full diff.
Pass `modes: ["json"]` with a `schema` defining the fields to extract:
```python Python theme={null}
result = firecrawl.scrape(
"https://example.com/product/widget",
formats=[
"markdown",
{
"type": "changeTracking",
"modes": ["json"],
"schema": {
"type": "object",
"properties": {
"price": { "type": "string" },
"availability": { "type": "string" }
}
}
}
]
)
if result.changeTracking.changeStatus == "changed":
changes = result.changeTracking.json
print(f"Price: {changes['price']['previous']} → {changes['price']['current']}")
```
```js Node theme={null}
const result = await firecrawl.scrape('https://example.com/product/widget', {
formats: [
'markdown',
{
type: 'changeTracking',
modes: ['json'],
schema: {
type: 'object',
properties: {
price: { type: 'string' },
availability: { type: 'string' }
}
}
}
]
});
if (result.changeTracking.changeStatus === 'changed') {
const changes = result.changeTracking.json;
console.log(`Price: ${changes.price.previous} → ${changes.price.current}`);
}
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product/widget",
"formats": [
"markdown",
{
"type": "changeTracking",
"modes": ["json"],
"schema": {
"type": "object",
"properties": {
"price": { "type": "string" },
"availability": { "type": "string" }
}
}
}
]
}'
```
### Response
Each field in the schema is returned with `previous` and `current` values:
```json theme={null}
{
"changeTracking": {
"previousScrapeAt": "2025-06-05T08:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible",
"json": {
"price": {
"previous": "$19.99",
"current": "$24.99"
},
"availability": {
"previous": "In Stock",
"current": "In Stock"
}
}
}
}
```
You can also pass an optional `prompt` to guide the LLM extraction alongside the schema.
JSON mode uses LLM extraction and costs **5 credits per page**. Basic change tracking and `git-diff` mode have no additional cost.
## Using tags
By default, change tracking compares against your team's most recent scrape of the same URL. Tags let you maintain **separate tracking histories** for the same URL, which is useful when you monitor the same page at different intervals or in different contexts.
```python Python theme={null}
# Hourly monitoring (compared against last "hourly" scrape)
result = firecrawl.scrape(
"https://example.com/pricing",
formats=[
"markdown",
{ "type": "changeTracking", "tag": "hourly" }
]
)
# Daily summary (compared against last "daily" scrape)
result = firecrawl.scrape(
"https://example.com/pricing",
formats=[
"markdown",
{ "type": "changeTracking", "tag": "daily" }
]
)
```
```js Node theme={null}
// Hourly monitoring (compared against last "hourly" scrape)
const result = await firecrawl.scrape('https://example.com/pricing', {
formats: [
'markdown',
{ type: 'changeTracking', tag: 'hourly' }
]
});
// Daily summary (compared against last "daily" scrape)
const result2 = await firecrawl.scrape('https://example.com/pricing', {
formats: [
'markdown',
{ type: 'changeTracking', tag: 'daily' }
]
});
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/pricing",
"formats": [
"markdown",
{ "type": "changeTracking", "tag": "hourly" }
]
}'
```
## Crawl with change tracking
Add change tracking to crawl operations to monitor an entire site for changes. Pass the `changeTracking` format inside `scrapeOptions`:
```python Python theme={null}
result = firecrawl.crawl(
"https://example.com",
limit=50,
scrape_options={
"formats": ["markdown", "changeTracking"]
}
)
for page in result.data:
status = page.changeTracking.changeStatus
url = page.metadata.url
print(f"{url}: {status}")
```
```js Node theme={null}
const result = await firecrawl.crawl('https://example.com', {
limit: 50,
scrapeOptions: {
formats: ['markdown', 'changeTracking']
}
});
for (const page of result.data) {
console.log(`${page.metadata.url}: ${page.changeTracking.changeStatus}`);
}
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limit": 50,
"scrapeOptions": {
"formats": ["markdown", "changeTracking"]
}
}'
```
## Batch scrape with change tracking
Use [batch scrape](/features/batch-scrape) to monitor a specific set of URLs:
```python Python theme={null}
result = firecrawl.batch_scrape(
[
"https://example.com/pricing",
"https://example.com/product/widget",
"https://example.com/blog/latest"
],
formats=["markdown", {"type": "changeTracking", "modes": ["git-diff"]}]
)
```
```js Node theme={null}
const result = await firecrawl.batchScrape([
'https://example.com/pricing',
'https://example.com/product/widget',
'https://example.com/blog/latest'
], {
options: {
formats: ['markdown', { type: 'changeTracking', modes: ['git-diff'] }]
}
});
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://example.com/pricing",
"https://example.com/product/widget",
"https://example.com/blog/latest"
],
"formats": [
"markdown",
{ "type": "changeTracking", "modes": ["git-diff"] }
]
}'
```
## Scheduling change tracking
Change tracking is most useful when you scrape on a regular schedule. You can automate this with cron, cloud schedulers, or workflow tools.
### Cron job
Create a script that scrapes a URL and alerts on changes:
```bash check-pricing.sh theme={null}
#!/bin/bash
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://competitor.com/pricing",
"formats": [
"markdown",
{
"type": "changeTracking",
"modes": ["json"],
"schema": {
"type": "object",
"properties": {
"starter_price": { "type": "string" },
"pro_price": { "type": "string" }
}
}
}
]
}')
STATUS=$(echo "$RESPONSE" | jq -r '.data.changeTracking.changeStatus')
if [ "$STATUS" = "changed" ]; then
echo "$RESPONSE" | jq '.data.changeTracking.json'
# Send alert via email, Slack, etc.
fi
```
Schedule it with `crontab -e`:
```bash theme={null}
0 */6 * * * /path/to/check-pricing.sh >> /var/log/price-monitor.log 2>&1
```
| Schedule | Expression |
| ------------------------ | ------------- |
| Every hour | `0 * * * *` |
| Every 6 hours | `0 */6 * * *` |
| Daily at 9 AM | `0 9 * * *` |
| Weekly on Monday at 8 AM | `0 8 * * 1` |
### Cloud and serverless schedulers
* **AWS**: EventBridge rule triggering a Lambda function
* **GCP**: Cloud Scheduler triggering a Cloud Function
* **Vercel / Netlify**: Cron-triggered serverless functions
* **GitHub Actions**: Scheduled workflows using the `schedule` trigger with a `cron` expression
### Workflow automation
No-code platforms like **n8n**, **Zapier**, and **Make** can call the Firecrawl API on a schedule and route results to Slack, email, or databases. See the [workflow automation guides](/developer-guides/workflow-automation/n8n).
## Webhooks
For async operations like crawl and batch scrape, use [webhooks](/webhooks/overview) to receive change tracking results as they arrive instead of polling.
```python Python theme={null}
job = firecrawl.start_crawl(
"https://example.com",
limit=50,
scrape_options={
"formats": [
"markdown",
{"type": "changeTracking", "modes": ["git-diff"]}
]
},
webhook={
"url": "https://your-server.com/firecrawl-webhook",
"headers": {"Authorization": "Bearer your-webhook-secret"},
"events": ["crawl.page", "crawl.completed"]
}
)
```
```js Node theme={null}
const { id } = await firecrawl.startCrawl('https://example.com', {
limit: 50,
scrapeOptions: {
formats: [
'markdown',
{ type: 'changeTracking', modes: ['git-diff'] }
]
},
webhook: {
url: 'https://your-server.com/firecrawl-webhook',
headers: { Authorization: 'Bearer your-webhook-secret' },
events: ['crawl.page', 'crawl.completed']
}
});
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"limit": 50,
"scrapeOptions": {
"formats": [
"markdown",
{ "type": "changeTracking", "modes": ["git-diff"] }
]
},
"webhook": {
"url": "https://your-server.com/firecrawl-webhook",
"headers": { "Authorization": "Bearer your-webhook-secret" },
"events": ["crawl.page", "crawl.completed"]
}
}'
```
The `crawl.page` event payload includes the `changeTracking` object for each page:
```json theme={null}
{
"success": true,
"type": "crawl.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [{
"markdown": "# Pricing\n\nStarter: $12/mo...",
"metadata": {
"title": "Pricing",
"url": "https://example.com/pricing",
"statusCode": 200
},
"changeTracking": {
"previousScrapeAt": "2025-06-05T12:00:00.000+00:00",
"changeStatus": "changed",
"visibility": "visible",
"diff": {
"text": "@@ -2,1 +2,1 @@\n-Starter: $9/mo\n+Starter: $12/mo"
}
}
}]
}
```
For webhook configuration details (headers, metadata, events, retries, signature verification), see the [Webhooks documentation](/webhooks/overview).
## Configuration reference
The full set of options available when passing a `changeTracking` format object:
| Parameter | Type | Default | Description |
| --------- | ---------- | ---------- | ----------------------------------------------------------------- |
| `type` | `string` | (required) | Must be `"changeTracking"` |
| `modes` | `string[]` | `[]` | Diff modes to enable: `"git-diff"`, `"json"`, or both |
| `schema` | `object` | (none) | JSON Schema for field-level comparison (required for `json` mode) |
| `prompt` | `string` | (none) | Custom prompt to guide LLM extraction (used with `json` mode) |
| `tag` | `string` | `null` | Separate tracking history identifier |
### Data models
```ts TypeScript theme={null}
interface ChangeTrackingResult {
previousScrapeAt: string | null;
changeStatus: "new" | "same" | "changed" | "removed";
visibility: "visible" | "hidden";
diff?: {
text: string;
json: {
files: Array<{
from: string | null;
to: string | null;
chunks: Array<{
content: string;
changes: Array<{
type: "add" | "del" | "normal";
ln?: number;
ln1?: number;
ln2?: number;
content: string;
}>;
}>;
}>;
};
};
  json?: Record<string, any>;
}
```
```python Python theme={null}
from typing import Any, Dict, Optional

from pydantic import BaseModel

class ChangeTrackingData(BaseModel):
previous_scrape_at: Optional[str] = None
change_status: str # "new" | "same" | "changed" | "removed"
visibility: str # "visible" | "hidden"
diff: Optional[Dict[str, Any]] = None
json: Optional[Dict[str, Any]] = None
```
## Important details
The `markdown` format must always be included alongside `changeTracking`. Change tracking compares pages via their markdown content.
* **Snapshot retention**: Snapshots are stored persistently and do not expire. A scrape performed months after the previous one will still compare correctly against the earlier snapshot.
* **Scoping**: Comparisons are scoped to your team. Your first scrape of any URL returns `"new"`, even if other users have scraped it.
* **URL matching**: Previous scrapes are matched on exact source URL, team ID, `markdown` format, and `tag`. Keep URLs consistent between scrapes.
* **Parameter consistency**: Using different `includeTags`, `excludeTags`, or `onlyMainContent` settings across scrapes of the same URL produces unreliable comparisons.
* **Comparison algorithm**: The algorithm is resistant to whitespace and content order changes. Iframe source URLs are ignored to handle captcha/antibot randomization.
* **Caching**: Requests with `changeTracking` bypass the index cache. The `maxAge` parameter is ignored.
* **Error handling**: Monitor the `warning` field in responses and handle the `changeTracking` object potentially being absent (this can occur if the database lookup for the previous scrape times out).
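For the last point, a defensive sketch:
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Sketch: tolerate a missing changeTracking object (e.g. a previous-scrape lookup timeout).
result = firecrawl.scrape("https://example.com/pricing", formats=["markdown", "changeTracking"])

tracking = getattr(result, "changeTracking", None)
if tracking is None:
    print("Change tracking unavailable:", getattr(result, "warning", None))
elif tracking.changeStatus == "changed":
    print("Page changed since", tracking.previousScrapeAt)
```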
## Billing
| Mode | Cost |
| --------------------- | --------------------------------------- |
| Basic change tracking | No extra cost (standard scrape credits) |
| `git-diff` mode | No extra cost |
| `json` mode | 5 credits per page |
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Crawl
Source: https://docs.firecrawl.dev/features/crawl
Recursively crawl a website and get content from every page
Crawl submits a URL to Firecrawl and recursively discovers and scrapes every reachable subpage. It handles sitemaps, JavaScript rendering, and rate limits automatically, returning clean markdown or structured data for each page.
* Discovers pages via sitemap and recursive link traversal
* Supports path filtering, depth limits, and subdomain/external link control
* Returns results via polling, WebSocket, or webhook
Test crawling in the interactive playground — no code required.
## Installation
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
```
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl
# Authenticate (one-time setup)
firecrawl login
```
## Basic usage
Submit a crawl job by calling `POST /v2/crawl` with a starting URL. The endpoint returns a job ID that you use to poll for results.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
docs = firecrawl.crawl(url="https://docs.firecrawl.dev", limit=10)
print(docs)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const docs = await firecrawl.crawl('https://docs.firecrawl.dev', { limit: 10 });
console.log(docs);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/crawl" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 10
}'
```
```bash CLI theme={null}
# Start a crawl job (returns job ID)
firecrawl crawl https://firecrawl.dev
# Wait for completion with progress
firecrawl crawl https://firecrawl.dev --wait --progress --limit 100
```
Each page crawled consumes 1 credit. The default crawl `limit` is 10,000 pages. Before starting, the crawl endpoint checks that your remaining credits can cover the `limit` — if not, it returns a **402 (Payment Required)** error. Set a lower `limit` to match your intended crawl size (e.g. `limit: 100`) to avoid this. Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, and PDF parsing costs 1 credit per PDF page.
### Scrape options
All options from the [Scrape endpoint](/api-reference/endpoint/scrape) are available in crawl via `scrapeOptions` (JS) / `scrape_options` (Python). These apply to every page the crawler scrapes, including formats, proxy, caching, actions, location, and tags.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
# Crawl with scrape options
response = firecrawl.crawl('https://example.com',
limit=100,
scrape_options={
'formats': [
'markdown',
{ 'type': 'json', 'schema': { 'type': 'object', 'properties': { 'title': { 'type': 'string' } } } }
],
'proxy': 'auto',
'max_age': 600000,
'only_main_content': True
}
)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' });
// Crawl with scrape options
const crawlResponse = await firecrawl.crawl('https://example.com', {
limit: 100,
scrapeOptions: {
formats: [
'markdown',
{
type: 'json',
schema: { type: 'object', properties: { title: { type: 'string' } } },
},
],
proxy: 'auto',
maxAge: 600000,
onlyMainContent: true,
},
});
```
## Checking crawl status
Use the job ID to poll for the crawl status and retrieve results.
```python Python theme={null}
status = firecrawl.get_crawl_status("<crawl-id>")
print(status)
```
```js Node theme={null}
const status = await firecrawl.getCrawlStatus("<crawl-id>");
console.log(status);
```
```bash cURL theme={null}
# After starting a crawl, poll status by jobId
curl -s -X GET "https://api.firecrawl.dev/v2/crawl/<crawl-id>" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
```bash CLI theme={null}
# Check crawl status using job ID
firecrawl crawl <crawl-id>
```
Job results are available via the API for 24 hours after completion. After this period, you can still view your crawl history and results in the [activity logs](https://www.firecrawl.dev/app/logs).
Pages in the crawl results `data` array are pages that Firecrawl successfully scraped, even if the target site returned an HTTP error like 404. The `metadata.statusCode` field shows the HTTP status code from the target site. To retrieve pages that Firecrawl itself failed to scrape (e.g. network errors, timeouts, or robots.txt blocks), use the dedicated [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint (`GET /crawl/{id}/errors`).
### Response handling
The response varies based on the crawl's status. For incomplete or large responses exceeding 10MB, a `next` URL parameter is provided. You must request this URL to retrieve the next 10MB of data. If the `next` parameter is absent, it indicates the end of the crawl data.
The `skip` and `next` parameters are only relevant when hitting the API directly. If you're using the SDK, pagination is handled automatically and all results are returned at once.
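If you are calling the API directly, follow the `next` links until they stop appearing. A minimal sketch using the `requests` library (the job URL is a placeholder):
```python Python theme={null}
import os

import requests

headers = {"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"}
url = "https://api.firecrawl.dev/v2/crawl/<crawl-id>"  # placeholder job URL

documents = []
while url:
    page = requests.get(url, headers=headers).json()
    documents.extend(page.get("data", []))
    url = page.get("next")  # absent once all data has been returned
print(len(documents), "documents retrieved")
```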
```json Scraping theme={null}
{
"status": "scraping",
"total": 36,
"completed": 10,
"creditsUsed": 10,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=10",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/crawl/123-456-789?skip=26",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
## SDK methods
There are two ways to use crawl with the SDK.
### Crawl and wait
The `crawl` method waits for the crawl to complete and returns the full response. It handles pagination automatically. This is recommended for most use cases.
```python Python theme={null}
from firecrawl import Firecrawl
from firecrawl.types import ScrapeOptions
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Crawl a website:
crawl_status = firecrawl.crawl(
'https://firecrawl.dev',
limit=100,
scrape_options=ScrapeOptions(formats=['markdown', 'html']),
poll_interval=30
)
print(crawl_status)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({apiKey: "fc-YOUR_API_KEY"});
const crawlResponse = await firecrawl.crawl('https://firecrawl.dev', {
limit: 100,
scrapeOptions: {
formats: ['markdown', 'html'],
}
})
console.log(crawlResponse)
```
The response includes the crawl status and all scraped data:
```bash Python theme={null}
success=True
status='completed'
completed=100
total=100
creditsUsed=100
expiresAt=datetime.datetime(2025, 4, 23, 19, 21, 17, tzinfo=TzInfo(UTC))
next=None
data=[
Document(
markdown='[Day 7 - Launch Week III.Integrations DayApril 14th to 20th](...',
metadata={
'title': '15 Python Web Scraping Projects: From Beginner to Advanced',
...
'scrapeId': '97dcf796-c09b-43c9-b4f7-868a7a5af722',
'sourceURL': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
'url': 'https://www.firecrawl.dev/blog/python-web-scraping-projects',
'statusCode': 200
}
),
...
]
```
```json Node theme={null}
{
success: true,
status: "completed",
completed: 100,
total: 100,
creditsUsed: 100,
expiresAt: "2025-04-23T19:28:45.000Z",
data: [
{
markdown: "[Day 7 - Launch Week III.Integrations DayApril ...",
      html: "...",
      ...
    },
    ...
  ]
}
```
With `startCrawl` / `start_crawl`, the initial response returns the job ID:
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/crawl/123-456-789"
}
```
## Real-time results with WebSocket
The watcher method provides real-time updates as pages are crawled. Start a crawl, then subscribe to events for immediate data processing.
```python Python theme={null}
import asyncio
from firecrawl import AsyncFirecrawl
async def main():
firecrawl = AsyncFirecrawl(api_key="fc-YOUR-API-KEY")
# Start a crawl first
started = await firecrawl.start_crawl("https://firecrawl.dev", limit=5)
# Watch updates (snapshots) until terminal status
async for snapshot in firecrawl.watcher(started.id, kind="crawl", poll_interval=2, timeout=120):
if snapshot.status == "completed":
print("DONE", snapshot.status)
for doc in snapshot.data:
print("DOC", doc.metadata.source_url if doc.metadata else None)
elif snapshot.status == "failed":
print("ERR", snapshot.status)
else:
print("STATUS", snapshot.status, snapshot.completed, "/", snapshot.total)
asyncio.run(main())
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
// Start a crawl and then watch it
const { id } = await firecrawl.startCrawl('https://mendable.ai', {
excludePaths: ['blog/*'],
limit: 5,
});
const watcher = firecrawl.watcher(id, { kind: 'crawl', pollInterval: 2, timeout: 120 });
watcher.on('document', (doc) => {
console.log('DOC', doc);
});
watcher.on('error', (err) => {
console.error('ERR', err?.error || err);
});
watcher.on('done', (state) => {
console.log('DONE', state.status);
});
// Begin watching (WS with HTTP fallback)
await watcher.start();
```
## Webhooks
You can configure webhooks to receive real-time notifications as your crawl progresses. This allows you to process pages as they are scraped instead of waiting for the entire crawl to complete.
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 100,
"webhook": {
"url": "https://your-domain.com/webhook",
"metadata": {
"any_key": "any_value"
},
"events": ["started", "page", "completed"]
}
}'
```
### Event types
| Event | Description |
| ----------------- | ---------------------------------------- |
| `crawl.started` | Fires when the crawl begins |
| `crawl.page` | Fires for each page successfully scraped |
| `crawl.completed` | Fires when the crawl finishes |
| `crawl.failed` | Fires if the crawl encounters an error |
### Payload
```json theme={null}
{
"success": true,
"type": "crawl.page",
"id": "crawl-job-id",
"data": [...], // Page data for 'page' events
"metadata": {}, // Your custom metadata
"error": null
}
```
### Verifying webhook signatures
Every webhook request from Firecrawl includes an `X-Firecrawl-Signature` header containing an HMAC-SHA256 signature. Always verify this signature to ensure the webhook is authentic and has not been tampered with.
1. Get your webhook secret from the [Advanced tab](https://www.firecrawl.dev/app/settings?tab=advanced) of your account settings
2. Extract the signature from the `X-Firecrawl-Signature` header
3. Compute HMAC-SHA256 of the raw request body using your secret
4. Compare with the signature header using a timing-safe function
Never process a webhook without verifying its signature first. The `X-Firecrawl-Signature` header contains the signature in the format: `sha256=abc123def456...`
For complete implementation examples in JavaScript and Python, see the [Webhook Security documentation](/webhooks/security). For comprehensive webhook documentation including detailed event payloads, payload structure, advanced configuration, and troubleshooting, see the [Webhooks documentation](/webhooks/overview).
## Configuration reference
The full set of parameters available when submitting a crawl job:
| Parameter | Type | Default | Description |
| ----------------------- | ---------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url` | `string` | (required) | The starting URL to crawl from |
| `limit` | `integer` | `10000` | Maximum number of pages to crawl |
| `maxDiscoveryDepth` | `integer` | (none) | Maximum depth from the root URL based on link-discovery hops, not the number of `/` segments in the URL. Each time a new URL is found on a page, it is assigned a depth one higher than the page it was discovered on. The root site and sitemapped pages have a discovery depth of 0. Pages at the max depth are still scraped, but links on them are not followed. |
| `includePaths` | `string[]` | (none) | URL pathname regex patterns to include. Only matching paths are crawled. |
| `excludePaths` | `string[]` | (none) | URL pathname regex patterns to exclude from the crawl |
| `regexOnFullURL` | `boolean` | `false` | Match `includePaths`/`excludePaths` against the full URL (including query parameters) instead of just the pathname |
| `crawlEntireDomain` | `boolean` | `false` | Follow internal links to sibling or parent URLs, not just child paths |
| `allowSubdomains` | `boolean` | `false` | Follow links to subdomains of the main domain |
| `allowExternalLinks` | `boolean` | `false` | Follow links to external websites |
| `sitemap` | `string` | `"include"` | Sitemap handling: `"include"` (default), `"skip"`, or `"only"` |
| `ignoreQueryParameters` | `boolean` | `false` | Avoid re-scraping the same path with different query parameters |
| `ignoreRobotsTxt` | `boolean` | `false` | Ignore the website's robots.txt rules. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. |
| `robotsUserAgent` | `string` | (none) | Custom User-Agent string for robots.txt evaluation. When set, robots.txt is fetched with this User-Agent and rules are matched against it instead of the default. **Enterprise only** — contact [support@firecrawl.com](mailto:support@firecrawl.com) to enable. |
| `delay` | `number` | (none) | Delay in seconds between scrapes to respect rate limits. Setting this forces concurrency to 1. |
| `maxConcurrency` | `integer` | (none) | Maximum concurrent scrapes. Defaults to your team's concurrency limit. |
| `scrapeOptions` | `object` | (none) | Options applied to every scraped page (formats, proxy, caching, actions, etc.) |
| `webhook` | `object` | (none) | Webhook configuration for real-time notifications |
| `prompt` | `string` | (none) | Natural language prompt to generate crawl options. Explicitly set parameters override generated equivalents. |
## Important details
By default, crawl ignores sublinks that are not children of the URL you provide. For example, `website.com/other-parent/blog-1` would not be returned if you crawled `website.com/blogs/`. Use the `crawlEntireDomain` parameter to include sibling and parent paths. To crawl subdomains like `blog.website.com` when crawling `website.com`, use the `allowSubdomains` parameter.
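For example, a minimal sketch (assuming the Python SDK exposes these parameters as `crawl_entire_domain` and `allow_subdomains`):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Sketch: widen the crawl beyond child paths of the starting URL.
docs = firecrawl.crawl(
    "https://website.com/blogs/",
    limit=100,
    crawl_entire_domain=True,  # also follow sibling/parent paths like /other-parent/blog-1
    allow_subdomains=True,     # also follow links to e.g. blog.website.com
)
```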
* **Sitemap discovery**: By default, the crawler includes the website's sitemap to discover URLs (`sitemap: "include"`). If you set `sitemap: "skip"`, only pages reachable through HTML links from the root URL are found. Assets like PDFs or deeply nested pages listed in the sitemap but not directly linked from HTML will be missed. For maximum coverage, keep the default setting.
* **Credit usage**: Each page crawled costs 1 credit. JSON mode adds 4 credits per page, enhanced proxy adds 4 credits per page, and PDF parsing costs 1 credit per PDF page.
* **Result expiration**: Job results are available via the API for 24 hours after completion. After that, view results in the [activity logs](https://www.firecrawl.dev/app/logs).
* **Crawl errors**: The `data` array contains pages Firecrawl successfully scraped. Use the [Get Crawl Errors](/api-reference/endpoint/crawl-get-errors) endpoint to retrieve pages that failed due to network errors, timeouts, or robots.txt blocks.
* **Non-deterministic results**: Crawl results may vary between runs of the same configuration. Pages are scraped concurrently, so the order in which links are discovered depends on network timing and which pages finish loading first. This means different branches of a site may be explored to different extents near the depth boundary, especially at higher `maxDiscoveryDepth` values. To get more deterministic results, set `maxConcurrency` to `1` or use `sitemap: "only"` if the site has a comprehensive sitemap.
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Enhanced Mode
Source: https://docs.firecrawl.dev/features/enhanced-mode
Use enhanced proxies for reliable scraping on complex sites
Firecrawl provides different proxy types to help you scrape websites with varying levels of complexity. Set the `proxy` parameter to control which proxy strategy is used for a request.
## Proxy types
Firecrawl supports three proxy types:
| Type | Description | Speed | Cost |
| ---------- | ------------------------------------------------------------ | ------ | ----------------------------------------------------------- |
| `basic` | Standard proxies suitable for most sites | Fast | 1 credit |
| `enhanced` | Enhanced proxies for complex sites | Slower | 5 credits per request |
| `auto` | Tries `basic` first, then retries with `enhanced` on failure | Varies | 1 credit if basic succeeds, 5 credits if enhanced is needed |
If you do not specify a proxy, Firecrawl defaults to `auto`.
## Basic usage
Set the `proxy` parameter to choose a proxy strategy. The following example uses `auto`, which lets Firecrawl decide when to escalate to enhanced proxies.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR-API-KEY')
# Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
doc = firecrawl.scrape('https://example.com', formats=['markdown'], proxy='auto')
print(doc.warning or 'ok')
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
const doc = await firecrawl.scrape('https://example.com', {
formats: ['markdown'],
proxy: 'auto'
});
console.log(doc.warning || 'ok');
```
```bash cURL theme={null}
# Choose proxy strategy: 'basic' | 'enhanced' | 'auto'
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR-API-KEY' \
-d '{
"url": "https://example.com",
"proxy": "auto"
}'
```
Enhanced proxy requests cost **5 credits per request**. When using `auto`, the 5-credit cost only applies if the basic proxy fails and the enhanced retry succeeds.
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# Faster Scraping
Source: https://docs.firecrawl.dev/features/fast-scraping
Speed up your scrapes by 500% with the maxAge parameter
## How It Works
Firecrawl caches previously scraped pages and, by default, returns a recent copy when available.
* **Default freshness**: `maxAge = 172800000` ms (2 days). If the cached copy is newer than this, it’s returned instantly; otherwise, Firecrawl scrapes fresh and updates the cache.
* **Force fresh**: Set `maxAge: 0` to always scrape. This bypasses the cache entirely: every request goes through the full scraping pipeline, so it takes longer to complete and is more likely to fail. Use a non-zero `maxAge` if you don't need real-time content on every request.
* **Skip caching**: Set `storeInCache: false` if you don’t want to store results for a request.
Get your results **up to 500% faster** when you don’t need the absolute freshest data. Control freshness via `maxAge`:
1. **Return instantly** if we have a recent version of the page
2. **Scrape fresh** only if our version is older than your specified age
3. **Save you time** - results come back in milliseconds instead of seconds
## When to Use This
**Great for:**
* Documentation, articles, product pages
* Bulk processing jobs
* Development and testing
* Building knowledge bases
**Skip for:**
* Real-time data (stock prices, live scores, breaking news)
* Frequently updated content
* Time-sensitive applications
## Usage
Add `maxAge` to your scrape request. Values are in milliseconds (e.g., `3600000` = 1 hour).
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Use cached data if it's less than 1 hour old (3600000 ms)
# This can be 500% faster than a fresh scrape!
scrape_result = firecrawl.scrape(
'https://firecrawl.dev',
formats=['markdown'],
max_age=3600000 # 1 hour in milliseconds
)
print(scrape_result.markdown)
```
```javascript JavaScript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Use cached data if it's less than 1 hour old (3600000 ms)
// This can be 500% faster than a fresh scrape!
const scrapeResult = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown'],
maxAge: 3600000 // 1 hour in milliseconds
});
console.log(scrapeResult.markdown);
```
## Common maxAge values
Here are some helpful reference values:
* **5 minutes**: `300000` - For semi-dynamic content
* **1 hour**: `3600000` - For content that updates hourly
* **1 day**: `86400000` - For daily-updated content
* **1 week**: `604800000` - For relatively static content
## Performance impact
With `maxAge` enabled:
* **500% faster response times** for recent content
* **Instant results** instead of waiting for fresh scrapes
## Important notes
* **Default**: `maxAge` is `172800000` (2 days)
* **Fresh when needed**: If our data is older than `maxAge`, we scrape fresh automatically
* **No stale data**: You'll never get data older than your specified `maxAge`
* **Credits**: Cached results still cost 1 credit per page. Caching improves speed and latency, not credit usage.
### When caching is bypassed
Caching is automatically skipped when your request includes any of the following:
* Custom `headers`
* `actions` (browser automation steps)
* A browser `profile`
* `changeTracking` format
* Custom `screenshot` viewport or quality settings
### Cache hit matching
For a cache hit, these parameters must match exactly between the original and subsequent requests: `url`, `mobile`, `location`, `waitFor`, `blockAds`, `screenshot` (enabled/disabled and full-page), and stealth proxy mode.
You can verify cache behavior by checking `metadata.cacheState` in the response — it will be `"hit"` or `"miss"`.
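For example, a minimal sketch (the raw API field is `metadata.cacheState`; the Python attribute name below is an assumption):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown"], max_age=3600000)
# "hit" means the result came from cache; "miss" means it was scraped fresh.
print(getattr(doc.metadata, "cache_state", None))  # attribute name assumed
```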
## Faster crawling
The same speed benefits apply when crawling multiple pages. Use `maxAge` within `scrapeOptions` to get cached results for pages we’ve seen recently.
```python Python theme={null}
from firecrawl import Firecrawl
from firecrawl.v2.types import ScrapeOptions
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
# Crawl with cached scraping - 500% faster for pages we've seen recently
crawl_result = firecrawl.crawl(
'https://firecrawl.dev',
limit=100,
scrape_options=ScrapeOptions(
formats=['markdown'],
max_age=3600000 # Use cached data if less than 1 hour old
)
)
for page in crawl_result.data:
print(f"URL: {page.metadata.source_url}")
print(f"Content: {page.markdown[:200]}...")
```
```javascript JavaScript theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Crawl with cached scraping - 500% faster for pages we've seen recently
const crawlResult = await firecrawl.crawl('https://firecrawl.dev', {
limit: 100,
scrapeOptions: {
formats: ['markdown'],
maxAge: 3600000 // Use cached data if less than 1 hour old
}
});
crawlResult.data.forEach(page => {
console.log(`URL: ${page.metadata.sourceURL}`);
console.log(`Content: ${page.markdown.substring(0, 200)}...`);
});
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer fc-YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"limit": 100,
"scrapeOptions": {
"formats": ["markdown"],
"maxAge": 3600000
}
}'
```
When crawling with `maxAge`, each page in your crawl will benefit from the 500% speed improvement if we have recent cached data for that page.
Start using `maxAge` today for dramatically faster scrapes and crawls!
# JSON mode - Structured result
Source: https://docs.firecrawl.dev/features/llm-extract
Extract structured data from pages via LLMs
**v2 API Change:** JSON schema extraction is fully supported in v2, but the API format has changed. In v2, the schema is embedded directly inside the format object as `formats: [{type: "json", schema: {...}}]`. The v1 `jsonOptions` parameter no longer exists in v2.
For schema validation failures and other extraction errors, see [Errors](/api-reference/errors) — extraction-specific issues typically surface as `400` or `422` responses.
## Scrape and extract structured data with Firecrawl
Firecrawl uses AI to get structured data from web pages in 3 steps:
1. **Set the Schema (optional):**
Define a JSON schema (using OpenAI's format) to specify the data you want, or just provide a `prompt` if you don't need a strict schema, along with the webpage URL.
2. **Make the Request:**
Send your URL and schema to our scrape endpoint using JSON mode. See how here:
[Scrape Endpoint Documentation](https://docs.firecrawl.dev/api-reference/endpoint/scrape)
3. **Get Your Data:**
Get back clean, structured data matching your schema that you can use right away.
This makes getting web data in the format you need quick and easy.
## Extract structured data
### JSON mode via /scrape
Used to extract structured data from scraped pages.
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR-API-KEY")
class CompanyInfo(BaseModel):
company_mission: str
supports_sso: bool
is_open_source: bool
is_in_yc: bool
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"schema": CompanyInfo.model_json_schema()
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { z } from "zod";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
// Define schema to extract contents into
const schema = z.object({
company_mission: z.string(),
supports_sso: z.boolean(),
is_open_source: z.boolean(),
is_in_yc: z.boolean()
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
schema: schema
}],
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [ {
"type": "json",
"schema": {
"type": "object",
"properties": {
"company_mission": {
"type": "string"
},
"supports_sso": {
"type": "boolean"
},
"is_open_source": {
"type": "boolean"
},
"is_in_yc": {
"type": "boolean"
}
},
"required": [
"company_mission",
"supports_sso",
"is_open_source",
"is_in_yc"
]
}
} ]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
"supports_sso": true,
"is_open_source": true,
"is_in_yc": true
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### Structured data without schema
You can also extract without a schema by just passing a `prompt` to the endpoint. The LLM chooses the structure of the data.
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"prompt": "Extract the company mission from the page."
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
prompt: "Extract the company mission from the page."
}]
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [{
"type": "json",
"prompt": "Extract the company mission from the page."
}]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### Real-world example: Extracting company information
Here's a comprehensive example extracting structured company information from a website:
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR-API-KEY")
class CompanyInfo(BaseModel):
company_mission: str
supports_sso: bool
is_open_source: bool
is_in_yc: bool
result = app.scrape(
'https://firecrawl.dev/',
formats=[{
"type": "json",
"schema": CompanyInfo.model_json_schema()
}]
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { z } from "zod";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
const companyInfoSchema = z.object({
company_mission: z.string(),
supports_sso: z.boolean(),
is_open_source: z.boolean(),
is_in_yc: z.boolean()
});
const result = await app.scrape("https://firecrawl.dev/", {
formats: [{
type: "json",
schema: companyInfoSchema
}]
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev/",
"formats": [{
"type": "json",
"schema": {
"type": "object",
"properties": {
"company_mission": {
"type": "string"
},
"supports_sso": {
"type": "boolean"
},
"is_open_source": {
"type": "boolean"
},
"is_in_yc": {
"type": "boolean"
}
},
"required": [
"company_mission",
"supports_sso",
"is_open_source",
"is_in_yc"
]
}
}]
}'
```
Output:
```json Output theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "Turn websites into LLM-ready data",
"supports_sso": true,
"is_open_source": true,
"is_in_yc": true
}
}
}
```
### JSON format options
When using JSON mode in v2, include an object in `formats` with the schema embedded directly:
`formats: [{ type: 'json', schema: { ... }, prompt: '...' }]`
Parameters:
* `schema`: JSON Schema describing the structured output you want (required for schema-based extraction).
* `prompt`: Optional prompt to guide extraction (also used for no-schema extraction).
**Important:** Unlike v1, there is no separate `jsonOptions` parameter in v2. The schema must be included directly inside the format object in the `formats` array.
**HTML attributes are not available in JSON extraction.** JSON extraction works on the markdown conversion of the page, which only preserves visible text content. HTML attributes (e.g., `data-id`, custom attributes on elements) are stripped during conversion and the LLM cannot see them. If you need to extract HTML attribute values, use `rawHtml` format and parse attributes client-side, or use an `executeJavascript` action to inject attribute values into visible text before extraction.
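For example, a minimal sketch of the `rawHtml` fallback, parsing `data-id` attributes client-side with BeautifulSoup (an assumed third-party dependency; the SDK field name for raw HTML is also assumed):
```python Python theme={null}
from bs4 import BeautifulSoup  # pip install beautifulsoup4
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = firecrawl.scrape('https://example.com', formats=['rawHtml'])

# Parse attributes client-side, since the LLM never sees them
soup = BeautifulSoup(doc.raw_html, 'html.parser')  # field name assumed
ids = [el['data-id'] for el in soup.select('[data-id]')]
print(ids)
```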
## Tips for consistent extraction
If you are seeing inconsistent or incomplete results from JSON extraction, these practices can help:
* **Keep prompts short and focused.** Long prompts with many rules increase variability. Move specific constraints (like allowed values) into the schema instead.
* **Use concise property names.** Avoid embedding instructions or enum lists in property names. Use a short key like `"installation_type"` and put allowed values in an `enum` array.
* **Add `enum` arrays for constrained fields.** When a field has a fixed set of values, list them in `enum` and make sure they match the exact text shown on the page.
* **Include null-handling in field descriptions.** Add `"Return null if not found on the page."` to each field's `description` so the model does not guess missing values.
* **Add location hints.** Tell the model where to find data on the page, e.g. `"Flow rate in GPM from the Specifications table."`.
* **Split large schemas into smaller requests.** Schemas with many fields (e.g. 30+) produce less consistent results. Split them into 2–3 requests of 10–15 fields each.
* **Avoid `minItems`/`maxItems` on arrays.** JSON Schema validation keywords like `minItems` and `maxItems` do not control how much content the scraper collects. Setting `minItems: 20` will not make the LLM return more items — it may instead hallucinate entries to satisfy the constraint. Remove these keywords and use a `prompt` instead (e.g. `"Extract ALL reviews from the page. Do not skip any."`) to guide completeness.
* **Use `"type": "array"` to extract lists of items.** If you need to extract multiple items (e.g. a list of people, products, or reviews), wrap them in an array property with an `items` block. Using `"type": "object"` for a list will return only a single item. See the array schema example below.
**Example of a well-structured schema:**
```json theme={null}
{
"type": "object",
"properties": {
"product_name": {
"type": ["string", "null"],
"description": "Full descriptive product name as shown on the page. Return null if not found."
},
"installation_type": {
"type": ["string", "null"],
"description": "Installation type from the Specifications section. Return null if not found.",
"enum": ["Deck-mount", "Wall-mount", "Countertop", "Drop-in", "Undermount"]
},
"flow_rate_gpm": {
"type": ["string", "null"],
"description": "Flow rate in GPM from the Specifications section. Return null if not found."
}
}
}
```
**Example of extracting a list of items:**
When a page contains multiple items (e.g. team members, products, reviews), use `"type": "array"` with `"items"` to get the full list:
```json theme={null}
{
"type": "object",
"properties": {
"people": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"role": { "type": "string" },
"department": { "type": "string" }
}
}
}
}
}
```
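Putting these tips together, here's a minimal sketch pairing an array schema with a completeness prompt (the team-page URL is hypothetical):
```python Python theme={null}
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR-API-KEY")

people_schema = {
    "type": "object",
    "properties": {
        "people": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "role": {"type": "string"}
                }
            }
        }
    }
}

# The schema constrains the shape; the prompt guides completeness
result = app.scrape(
    "https://example.com/team",  # hypothetical URL
    formats=[{
        "type": "json",
        "schema": people_schema,
        "prompt": "Extract ALL people listed on the page. Do not skip any."
    }]
)
print(result)
```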
# Models
Source: https://docs.firecrawl.dev/features/models
Choose the right model for your agent extraction tasks.
Firecrawl Agent offers two models optimized for different use cases. Choose the right model based on your extraction complexity and cost requirements.
## Available Models
| Model | Cost | Accuracy | Best For |
| -------------- | --------------- | -------- | ------------------------------------- |
| `spark-1-mini` | **60% cheaper** | Standard | Most tasks (default) |
| `spark-1-pro` | Standard | Higher | Complex research, critical extraction |
**Start with Spark 1 Mini** (default) — it handles most extraction tasks well at 60% lower cost. Switch to Pro only for complex multi-domain research or when accuracy is critical.
## Spark 1 Mini (Default)
`spark-1-mini` is our efficient model, ideal for straightforward data extraction tasks.
**Use Mini when:**
* Extracting simple data points (contact info, pricing, etc.)
* Working with well-structured websites
* Cost efficiency is a priority
* Running high-volume extraction jobs
**Example use cases:**
* Extracting product prices from e-commerce sites
* Gathering contact information from company pages
* Pulling basic metadata from articles
* Simple data point lookups
## Spark 1 Pro
`spark-1-pro` is our flagship model, designed for maximum accuracy on complex extraction tasks.
**Use Pro when:**
* Performing complex competitive analysis
* Extracting data that requires deep reasoning
* Accuracy is critical for your use case
* Dealing with ambiguous or hard-to-find data
**Example use cases:**
* Multi-domain competitive analysis
* Complex research tasks requiring reasoning
* Extracting nuanced information from multiple sources
* Critical business intelligence gathering
## Specifying a Model
Pass the `model` parameter to select which model to use:
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR_API_KEY")
# Using Spark 1 Mini (default - can be omitted)
result = app.agent(
prompt="Find the pricing of Firecrawl",
model="spark-1-mini"
)
# Using Spark 1 Pro for complex tasks
result = app.agent(
prompt="Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model="spark-1-pro"
)
print(result.data)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR_API_KEY" });
// Using Spark 1 Mini (default - can be omitted)
const result = await firecrawl.agent({
prompt: "Find the pricing of Firecrawl",
model: "spark-1-mini"
});
// Using Spark 1 Pro for complex tasks
const resultPro = await firecrawl.agent({
prompt: "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
model: "spark-1-pro"
});
console.log(result.data);
```
```bash cURL theme={null}
# Using Spark 1 Mini (default)
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Find the pricing of Firecrawl",
"model": "spark-1-mini"
}'
# Using Spark 1 Pro for complex tasks
curl -X POST "https://api.firecrawl.dev/v2/agent" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "Compare all enterprise features and pricing across Firecrawl, Apify, and ScrapingBee",
"model": "spark-1-pro"
}'
```
## Model Comparison
| Feature | Spark 1 Mini | Spark 1 Pro |
| ---------------- | ------------ | ------------- |
| **Cost** | 60% cheaper | Standard |
| **Accuracy** | Standard | Higher |
| **Speed** | Fast | Fast |
| **Best for** | Most tasks | Complex tasks |
| **Reasoning** | Standard | Advanced |
| **Multi-domain** | Good | Excellent |
## Pricing by Model
Both models use dynamic, credit-based pricing that scales with task complexity:
* **Spark 1 Mini**: Uses approximately 60% fewer credits than Pro for equivalent tasks
* **Spark 1 Pro**: Standard credit consumption for maximum accuracy
Credit usage varies based on prompt complexity, data processed, and output structure — regardless of model selected.
## Choosing the Right Model
```
┌─────────────────────────────────┐
│ What type of task? │
└─────────────────────────────────┘
│
┌──────────────┴──────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Simple/Direct │ │ Complex/Research│
│ extraction │ │ multi-domain │
└─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ spark-1-mini │ │ spark-1-pro │
│ (60% cheaper) │ │ (higher acc.) │
└─────────────────┘ └─────────────────┘
```
## API Reference
See the [Agent API Reference](/api-reference/endpoint/agent) for complete parameter documentation.
Have questions about which model to use? Email [help@firecrawl.com](mailto:help@firecrawl.com).
# Proxies
Source: https://docs.firecrawl.dev/features/proxies
Learn about proxy types, locations, and how Firecrawl selects proxies for your requests.
Firecrawl provides different proxy types to help you scrape websites with varying levels of complexity. The proxy type can be specified using the `proxy` parameter.
> By default, Firecrawl routes all requests through proxies to help ensure reliability and access, even if you do not specify a proxy type or location.
## Location-Based Proxy Selection
Firecrawl automatically selects the best proxy based on your specified or detected location. This helps optimize scraping performance and reliability. However, not all locations are currently supported. The following locations are available:
| Country Code | Country Name | Basic Proxy Support | Enhanced Proxy Support |
| ------------ | -------------------- | ------------------- | ---------------------- |
| AE | United Arab Emirates | Yes | No |
| AU | Australia | Yes | No |
| BR | Brazil | Yes | No |
| CA | Canada | Yes | No |
| CN | China | Yes | No |
| CZ | Czechia | Yes | No |
| DE | Germany | Yes | No |
| DK | Denmark | Yes | Yes |
| EE | Estonia | Yes | No |
| EG | Egypt | Yes | No |
| ES | Spain | Yes | No |
| FR | France | Yes | No |
| GB | United Kingdom | Yes | No |
| GR | Greece | Yes | No |
| HU | Hungary | Yes | No |
| ID | Indonesia | Yes | No |
| IL | Israel | Yes | No |
| IN | India | Yes | No |
| IT | Italy | Yes | No |
| JP | Japan | Yes | No |
| MY | Malaysia | Yes | No |
| NO | Norway | Yes | No |
| PL | Poland | Yes | No |
| PT | Portugal | Yes | No |
| QA | Qatar | Yes | No |
| SG | Singapore | Yes | No |
| US | United States | Yes | Yes |
| VN | Vietnam | Yes | No |
The list of supported proxy locations will change over time.
If you need proxies in a location not listed above, please [contact us](mailto:help@firecrawl.com) and let us know your requirements.
If you do not specify a proxy or location, Firecrawl will automatically use US proxies.
## How to Specify Proxy Location
You can request a specific proxy location by setting the `location.country` parameter in your request. For example, to use a Brazilian proxy, set `location.country` to `BR`.
For full details, see the [API reference for `location.country`](https://docs.firecrawl.dev/api-reference/endpoint/scrape#body-location).
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape('https://example.com',
formats=['markdown'],
location={
'country': 'US',
'languages': ['en']
}
)
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', {
formats: ['markdown'],
location: { country: 'US', languages: ['en'] },
});
console.log(doc.metadata);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"formats": ["markdown"],
"location": { "country": "US", "languages": ["en"] }
}'
```
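For the Brazilian-proxy case mentioned above, only the `location` object changes (the `pt-BR` language preference here is illustrative):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Route the request through a Brazilian proxy
doc = firecrawl.scrape('https://example.com',
    formats=['markdown'],
    location={
        'country': 'BR',
        'languages': ['pt-BR']  # illustrative language preference
    }
)
print(doc)
```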
If you request a country where a proxy is not available, Firecrawl will use the closest available region (EU or US) and set the browser location to your requested country.
## Proxy Types
Firecrawl supports three types of proxies:
* **basic**: Proxies for scraping most sites. Fast and usually works.
* **enhanced**: Enhanced proxies for scraping complex sites while maintaining privacy. Slower, but more reliable on certain sites. [Learn more about Enhanced Mode →](/features/enhanced-mode)
* **auto**: Firecrawl will automatically retry scraping with enhanced proxies if the basic proxy fails. If the retry with enhanced is successful, 5 credits will be billed for the scrape. If the first attempt with basic is successful, only the regular cost will be billed.
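The proxy type is set at the top level of the request; a minimal sketch (assuming the Python SDK forwards `proxy` as a keyword argument, mirroring the API field):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# "auto" retries with enhanced proxies only if the basic attempt fails
doc = firecrawl.scrape('https://example.com', formats=['markdown'], proxy='auto')  # kwarg assumed
print(doc.metadata)
```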
***
> **Note:** For detailed information on using enhanced proxies, including credit costs and retry strategies, see the [Enhanced Mode documentation](/features/enhanced-mode).
# Scrape
Source: https://docs.firecrawl.dev/features/scrape
Turn any URL into clean data
Firecrawl converts web pages into markdown, ideal for LLM applications.
* It manages complexities: proxies, caching, rate limits, JS-blocked content
* Handles dynamic content: dynamic websites, JS-rendered sites, PDFs, images
* Outputs clean markdown, structured data, screenshots, or HTML.
For details, see the [Scrape Endpoint API Reference](https://docs.firecrawl.dev/api-reference/endpoint/scrape).
Test scraping in the interactive playground — no code required.
If a request fails, see [Errors](/api-reference/errors) for the full catalog of error codes, causes, remedies, and retry guidance.
## Scraping a URL with Firecrawl
### /scrape endpoint
Used to scrape a URL and get its content.
### Installation
```python Python theme={null}
# pip install firecrawl-py
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
```
```js Node theme={null}
// npm install @mendable/firecrawl-js
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
```
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl
# Authenticate (one-time setup)
firecrawl login
```
### Usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
# Scrape a website:
doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown", "html"])
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Scrape a website:
const doc = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown', 'html'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["markdown", "html"]
}'
```
```bash CLI theme={null}
# Scrape a URL and get markdown
firecrawl https://firecrawl.dev
# With multiple formats (returns JSON)
firecrawl https://firecrawl.dev --format markdown,html,links --pretty
```
For more details about the parameters, refer to the [API Reference](https://docs.firecrawl.dev/api-reference/endpoint/scrape).
Each scrape consumes 1 credit. Additional credits apply for certain options: JSON mode costs 4 additional credits per page, enhanced proxy costs 4 additional credits per page, PDF parsing costs 1 credit per PDF page, and audio extraction costs 4 additional credits per page.
### Response
SDKs will return the data object directly. cURL will return the payload exactly as shown below.
```json theme={null}
{
"success": true,
"data" : {
"markdown": "Launch Week I is here! [See our Day 2 Release 🚀](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[💥 Get 2 months free...",
"html": "
```python Python theme={null}
from firecrawl import Firecrawl
from pydantic import BaseModel
app = Firecrawl(api_key="fc-YOUR-API-KEY")
class CompanyInfo(BaseModel):
company_mission: str
supports_sso: bool
is_open_source: bool
is_in_yc: bool
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"schema": CompanyInfo.model_json_schema()
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
import { z } from "zod";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
// Define schema to extract contents into
const schema = z.object({
company_mission: z.string(),
supports_sso: z.boolean(),
is_open_source: z.boolean(),
is_in_yc: z.boolean()
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
schema: schema
}],
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [ {
"type": "json",
"schema": {
"type": "object",
"properties": {
"company_mission": {
"type": "string"
},
"supports_sso": {
"type": "boolean"
},
"is_open_source": {
"type": "boolean"
},
"is_in_yc": {
"type": "boolean"
}
},
"required": [
"company_mission",
"supports_sso",
"is_open_source",
"is_in_yc"
]
}
} ]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
"supports_sso": true,
"is_open_source": true,
"is_in_yc": true
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### Extracting without schema
You can also extract without a schema by just passing a `prompt` to the endpoint. The LLM chooses the structure of the data.
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
result = app.scrape(
'https://firecrawl.dev',
formats=[{
"type": "json",
"prompt": "Extract the company mission from the page."
}],
only_main_content=False,
timeout=120000
)
print(result)
```
```js Node theme={null}
import Firecrawl from "@mendable/firecrawl-js";
const app = new Firecrawl({
apiKey: "fc-YOUR_API_KEY"
});
const result = await app.scrape("https://firecrawl.dev", {
formats: [{
type: "json",
prompt: "Extract the company mission from the page."
}]
});
console.log(result);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://firecrawl.dev",
"formats": [{
"type": "json",
"prompt": "Extract the company mission from the page."
}]
}'
```
Output:
```json JSON theme={null}
{
"success": true,
"data": {
"json": {
"company_mission": "AI-powered web scraping and data extraction",
},
"metadata": {
"title": "Firecrawl",
"description": "AI-powered web scraping and data extraction",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "AI-powered web scraping and data extraction",
"ogUrl": "https://firecrawl.dev/",
"ogImage": "https://firecrawl.dev/og.png",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "https://firecrawl.dev/"
},
}
}
```
### JSON format options
When using the `json` format, pass an object inside `formats` with the following parameters:
* `schema`: JSON Schema for the structured output.
* `prompt`: Optional prompt to help guide extraction when a schema is present or when you prefer light guidance.
## Extract brand identity
### /scrape (with branding) endpoint
The branding format extracts comprehensive brand identity information from a webpage, including colors, fonts, typography, spacing, UI components, and more. This is useful for design system analysis, brand monitoring, or building tools that need to understand a website's visual identity.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
result = firecrawl.scrape(
url='https://firecrawl.dev',
formats=['branding']
)
print(result['branding'])
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const result = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['branding']
});
console.log(result.branding);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["branding"]
}'
```
### Response
The branding format returns a comprehensive `BrandingProfile` object with the following structure:
```json Output theme={null}
{
"success": true,
"data": {
"branding": {
"colorScheme": "dark",
"logo": "https://firecrawl.dev/logo.svg",
"colors": {
"primary": "#FF6B35",
"secondary": "#004E89",
"accent": "#F77F00",
"background": "#1A1A1A",
"textPrimary": "#FFFFFF",
"textSecondary": "#B0B0B0"
},
"fonts": [
{
"family": "Inter"
},
{
"family": "Roboto Mono"
}
],
"typography": {
"fontFamilies": {
"primary": "Inter",
"heading": "Inter",
"code": "Roboto Mono"
},
"fontSizes": {
"h1": "48px",
"h2": "36px",
"h3": "24px",
"body": "16px"
},
"fontWeights": {
"regular": 400,
"medium": 500,
"bold": 700
}
},
"spacing": {
"baseUnit": 8,
"borderRadius": "8px"
},
"components": {
"buttonPrimary": {
"background": "#FF6B35",
"textColor": "#FFFFFF",
"borderRadius": "8px"
},
"buttonSecondary": {
"background": "transparent",
"textColor": "#FF6B35",
"borderColor": "#FF6B35",
"borderRadius": "8px"
}
},
"images": {
"logo": "https://firecrawl.dev/logo.svg",
"favicon": "https://firecrawl.dev/favicon.ico",
"ogImage": "https://firecrawl.dev/og-image.png"
}
}
}
}
```
### Branding Profile Structure
The `branding` object contains the following properties:
* `colorScheme`: The detected color scheme (`"light"` or `"dark"`)
* `logo`: URL of the primary logo
* `colors`: Object containing brand colors:
* `primary`, `secondary`, `accent`: Main brand colors
* `background`, `textPrimary`, `textSecondary`: UI colors
* `link`, `success`, `warning`, `error`: Semantic colors
* `fonts`: Array of font families used on the page
* `typography`: Detailed typography information:
* `fontFamilies`: Primary, heading, and code font families
* `fontSizes`: Size definitions for headings and body text
* `fontWeights`: Weight definitions (light, regular, medium, bold)
* `lineHeights`: Line height values for different text types
* `spacing`: Spacing and layout information:
* `baseUnit`: Base spacing unit in pixels
* `borderRadius`: Default border radius
* `padding`, `margins`: Spacing values
* `components`: UI component styles:
* `buttonPrimary`, `buttonSecondary`: Button styles
* `input`: Input field styles
* `icons`: Icon style information
* `images`: Brand images (logo, favicon, og:image)
* `animations`: Animation and transition settings
* `layout`: Layout configuration (grid, header/footer heights)
* `personality`: Brand personality traits (tone, energy, target audience)
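For example, pulling a few fields out of the profile (dict-style access shown, matching the examples above; attribute access may also work depending on your SDK version):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

result = firecrawl.scrape(url='https://firecrawl.dev', formats=['branding'])
branding = result['branding']  # dict-style access, matching the example above

print(branding['colors']['primary'])                      # e.g. "#FF6B35"
print(branding['typography']['fontFamilies']['heading'])  # e.g. "Inter"
```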
### Combining with other formats
You can combine the branding format with other formats to get comprehensive page data:
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
result = firecrawl.scrape(
url='https://firecrawl.dev',
formats=['markdown', 'branding', 'screenshot']
)
print(result['markdown'])
print(result['branding'])
print(result['screenshot'])
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const result = await firecrawl.scrape('https://firecrawl.dev', {
formats: ['markdown', 'branding', 'screenshot']
});
console.log(result.markdown);
console.log(result.branding);
console.log(result.screenshot);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["markdown", "branding", "screenshot"]
}'
```
## Audio extraction
The `audio` format extracts audio from supported websites (e.g. YouTube) as MP3 files and returns a signed Google Cloud Storage URL. This is useful for building audio processing pipelines, transcription services, or podcast tools.
Audio extraction costs 5 credits per page (1 base + 4 additional).
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape("https://www.youtube.com/watch?v=dQw4w9WgXcQ", formats=["audio"])
print(doc.audio) # Signed GCS URL to the MP3 file
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://www.youtube.com/watch?v=dQw4w9WgXcQ', {
formats: ['audio']
});
console.log(doc.audio); // Signed GCS URL to the MP3 file
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
"formats": ["audio"]
}'
```
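Because the returned URL is signed (and typically time-limited), you'll usually want to download the file right away. A minimal sketch using the third-party `requests` library:
```python Python theme={null}
import requests  # pip install requests
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

doc = firecrawl.scrape("https://www.youtube.com/watch?v=dQw4w9WgXcQ", formats=["audio"])

# Download the MP3 while the signed URL is still valid
resp = requests.get(doc.audio)
resp.raise_for_status()
with open("audio.mp3", "wb") as f:
    f.write(resp.content)
```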
## Interacting with the page with Actions
Firecrawl allows you to perform various actions on a web page before scraping its content. This is particularly useful for interacting with dynamic content, navigating through pages, or accessing content that requires user interaction.
**We recommend [Interact](/features/interact) over actions: our newer, more powerful way to interact with scraped pages.**
Interact runs as a stateful browser session that stays alive across calls, so you can drive a page turn-by-turn with either:
* **Natural language** for flexible, non-deterministic flows. e.g. *“search for ‘wireless headphones’, filter to 4+ stars under \$200, and return the results”*.
* **Playwright or agent-browser code** for deterministic steps. e.g. `await page.click('#export')`.
Interact also supports profiles, persistent sessions, and a live embeddable browser view (with an interactive mode where end users can drive the browser themselves).
Here is an example of using actions to fill in a login form, submit it, wait for the page to load, and take a screenshot.
In almost all cases, include a `wait` action before or after other actions to give the page enough time to load.
### Example
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape(
url="https://example.com/login",
formats=["markdown"],
actions=[
{"type": "write", "text": "john@example.com"},
{"type": "press", "key": "Tab"},
{"type": "write", "text": "secret"},
{"type": "click", "selector": 'button[type="submit"]'},
{"type": "wait", "milliseconds": 1500},
{"type": "screenshot", "full_page": True},
],
)
print(doc.markdown, doc.screenshot)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com/login', {
formats: ['markdown'],
actions: [
{ type: 'write', text: 'john@example.com' },
{ type: 'press', key: 'Tab' },
{ type: 'write', text: 'secret' },
{ type: 'click', selector: 'button[type="submit"]' },
{ type: 'wait', milliseconds: 1500 },
{ type: 'screenshot', fullPage: true },
],
});
console.log(doc.markdown, doc.screenshot);
```
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://example.com/login",
"formats": ["markdown"],
"actions": [
{ "type": "write", "text": "john@example.com" },
{ "type": "press", "key": "Tab" },
{ "type": "write", "text": "secret" },
{ "type": "click", "selector": "button[type=\"submit\"]" },
{ "type": "wait", "milliseconds": 1500 },
{ "type": "screenshot", "fullPage": true },
],
}'
```
### Output
```json JSON theme={null}
{
"success": true,
"data": {
"markdown": "Our first Launch Week is over! [See the recap 🚀](blog/firecrawl-launch-week-1-recap)...",
"actions": {
"screenshots": [
"https://alttmdsdujxrfnakrkyi.supabase.co/storage/v1/object/public/media/screenshot-75ef2d87-31e0-4349-a478-fb432a29e241.png"
],
"scrapes": [
{
"url": "https://www.firecrawl.dev/",
"html": "
Firecrawl
"
}
]
},
"metadata": {
"title": "Home - Firecrawl",
"description": "Firecrawl crawls and converts any website into clean markdown.",
"language": "en",
"keywords": "Firecrawl,Markdown,Data,Mendable,Langchain",
"robots": "follow, index",
"ogTitle": "Firecrawl",
"ogDescription": "Turn any website into LLM-ready data.",
"ogUrl": "https://www.firecrawl.dev/",
"ogImage": "https://www.firecrawl.dev/og.png?123",
"ogLocaleAlternate": [],
"ogSiteName": "Firecrawl",
"sourceURL": "http://google.com",
"statusCode": 200
}
}
}
```
For workflows that require richer browser control after scraping, such as authenticated sessions, multi-step navigation, or a live view of the page, we recommend [Interact](/features/interact) over extending the actions array.
## Location and Language
Specify country and preferred languages to get relevant content based on your target location and language preferences.
### How it works
When you specify the location settings, Firecrawl will use an appropriate proxy if available and emulate the corresponding language and timezone settings. By default, the location is set to 'US' if not specified.
### Usage
To use the location and language settings, include the `location` object in your request body with the following properties:
* `country`: ISO 3166-1 alpha-2 country code (e.g., 'US', 'AU', 'DE', 'JP'). Defaults to 'US'.
* `languages`: An array of preferred languages and locales for the request in order of priority. Defaults to the language of the specified location.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
doc = firecrawl.scrape('https://example.com',
formats=['markdown'],
location={
'country': 'US',
'languages': ['en']
}
)
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', {
formats: ['markdown'],
location: { country: 'US', languages: ['en'] },
});
console.log(doc.metadata);
```
```bash cURL theme={null}
curl -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"formats": ["markdown"],
"location": { "country": "US", "languages": ["en"] }
}'
```
For more details about supported locations, refer to the [Proxies documentation](/features/proxies).
## Caching and maxAge
To make requests faster, Firecrawl serves results from cache by default when a recent copy is available.
* **Default freshness window**: `maxAge = 172800000` ms (2 days). If a cached page is newer than this, it’s returned instantly; otherwise, the page is scraped and then cached.
* **Performance**: This can speed up scrapes by up to 5x when data doesn’t need to be ultra-fresh.
* **Always fetch fresh**: Set `maxAge` to `0`. This bypasses the cache entirely, so every request goes through the full scraping pipeline; requests take longer to complete and are more likely to fail. Use a non-zero `maxAge` unless freshness on every request is critical.
* **Avoid storing**: Set `storeInCache` to `false` if you don’t want Firecrawl to cache/store results for this request.
* **Cache-only lookup**: Set `minAge` to perform a cache-only lookup without triggering a fresh scrape. The value is in milliseconds and specifies the minimum age the cached data must be. If no cached data is found, a `404` with error code `SCRAPE_NO_CACHED_DATA` is returned. Set `minAge` to `1` to accept any cached data regardless of age.
* **Change tracking**: Requests that include `changeTracking` bypass the cache, so `maxAge` is ignored.
* **Credits**: Cached results still cost 1 credit per page. Caching improves speed, not credit usage.
Example (force fresh content):
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
doc = firecrawl.scrape(url='https://example.com', max_age=0, formats=['markdown'])
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', { maxAge: 0, formats: ['markdown'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"maxAge": 0,
"formats": ["markdown"]
}'
```
Example (use a 10-minute cache window):
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')
doc = firecrawl.scrape(url='https://example.com', max_age=600000, formats=['markdown', 'html'])
print(doc)
```
```js Node theme={null}
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const doc = await firecrawl.scrape('https://example.com', { maxAge: 600000, formats: ['markdown', 'html'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"maxAge": 600000,
"formats": ["markdown", "html"]
}'
```
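Example (cache-only lookup with `minAge`; the `min_age` keyword is assumed to mirror `max_age` in the Python SDK):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key='fc-YOUR_API_KEY')

# min_age=1 accepts any cached copy regardless of age; if nothing is cached,
# the request fails with a 404 (SCRAPE_NO_CACHED_DATA) instead of scraping fresh
doc = firecrawl.scrape(url='https://example.com', min_age=1, formats=['markdown'])
print(doc)
```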
## Batch scraping multiple URLs
You can batch scrape multiple URLs at the same time. Pass the list of URLs along with optional parameters; the parameters apply to the whole batch scrape job and let you specify options such as the output formats.
### How it works
It is very similar to how the `/crawl` endpoint works. It submits a batch scrape job and returns a job ID to check the status of the batch scrape.
The SDK provides two methods, synchronous and asynchronous. The synchronous method returns the results of the batch scrape job, while the asynchronous method returns a job ID that you can use to check the status of the batch scrape.
### Usage
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
job = firecrawl.batch_scrape([
"https://firecrawl.dev",
"https://docs.firecrawl.dev",
], formats=["markdown"], poll_interval=2, wait_timeout=120)
print(job)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const job = await firecrawl.batchScrape([
'https://firecrawl.dev',
'https://docs.firecrawl.dev',
], { options: { formats: ['markdown'] }, pollInterval: 2, timeout: 120 });
console.log(job);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/batch/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://firecrawl.dev", "https://docs.firecrawl.dev"],
"formats": ["markdown"]
}'
```
### Response
If you're using the sync methods from the SDKs, they return the results of the batch scrape job directly. Otherwise, you get a job ID that you can use to check the status of the batch scrape.
#### Synchronous
```json Completed theme={null}
{
"status": "completed",
"total": 36,
"completed": 36,
"creditsUsed": 36,
"expiresAt": "2024-00-00T00:00:00.000Z",
"next": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789?skip=26",
"data": [
{
"markdown": "[Firecrawl Docs home page!...",
"html": "...",
"metadata": {
"title": "Build a 'Chat with website' using Groq Llama 3 | Firecrawl",
"language": "en",
"sourceURL": "https://docs.firecrawl.dev/learn/rag-llama3",
"description": "Learn how to use Firecrawl, Groq Llama 3, and Langchain to build a 'Chat with your website' bot.",
"ogLocaleAlternate": [],
"statusCode": 200
}
},
...
]
}
```
#### Asynchronous
You can then use the job ID to check the status of the batch scrape by calling the `/batch/scrape/{id}` endpoint. This endpoint is meant to be used while the job is still running or right after it has completed **as batch scrape jobs expire after 24 hours**.
```json theme={null}
{
"success": true,
"id": "123-456-789",
"url": "https://api.firecrawl.dev/v2/batch/scrape/123-456-789"
}
```
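A sketch of the asynchronous flow in Python (method names assumed to follow the SDK's start/status naming; check your SDK version):
```python Python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# Submit the job without waiting for results (method name assumed)
job = firecrawl.start_batch_scrape([
    "https://firecrawl.dev",
    "https://docs.firecrawl.dev",
], formats=["markdown"])

# Poll the job by ID until it completes (method name assumed)
status = firecrawl.get_batch_scrape_status(job.id)
print(status.status, status.completed, status.total)
```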
## Enhanced Mode
For complex websites, Firecrawl offers enhanced mode that provides better success rates while maintaining privacy.
Learn more about [Enhanced Mode](/features/enhanced-mode).
## Zero Data Retention (ZDR)
Firecrawl supports Zero Data Retention (ZDR) for teams with strict data handling requirements. When enabled, Firecrawl will not persist any page content or extracted data beyond the lifetime of the request.
To enable ZDR, set `zeroDataRetention: true` in your request:
```bash cURL theme={null}
curl -X POST https://api.firecrawl.dev/v2/scrape \
-H "Content-Type: application/json" \
-H "Authorization: Bearer fc-YOUR_API_KEY" \
-d '{
"url": "https://example.com",
"formats": ["markdown"],
"zeroDataRetention": true
}'
```
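A Python equivalent using the raw HTTP API (shown with the third-party `requests` library, since the API field `zeroDataRetention` is what matters; SDK support for the flag may vary):
```python Python theme={null}
import requests  # pip install requests

resp = requests.post(
    "https://api.firecrawl.dev/v2/scrape",
    headers={"Authorization": "Bearer fc-YOUR_API_KEY"},
    json={
        "url": "https://example.com",
        "formats": ["markdown"],
        "zeroDataRetention": True,  # nothing persisted beyond the request
    },
)
resp.raise_for_status()
print(resp.json()["data"]["markdown"])
```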
ZDR is available on Enterprise plans and must be enabled for your team. Visit [firecrawl.dev/enterprise](https://www.firecrawl.dev/enterprise) to get started.
ZDR adds **1 additional credit per page** on top of the base scrape cost.
Screenshots are not available in ZDR mode. Because screenshots require uploading to persistent storage, they are incompatible with the ZDR guarantee. Requests that include both `zeroDataRetention: true` and a `screenshot` format will return an error.
# Introduction
Source: https://docs.firecrawl.dev/introduction
Search the web, scrape any page, and interact with it — all through one API.
**For AI agents:** Append `.md` to any docs URL for markdown, e.g. [introduction.md](/introduction.md).
## Get started
* Sign up and get your API key to start using Firecrawl
* Test the API instantly without writing any code
### Use Firecrawl with AI agents (recommended)
The Firecrawl skill is the fastest way for agents to discover and use Firecrawl. Without it, your agent will not know Firecrawl is available.
```bash theme={null}
npx -y firecrawl-cli@latest init --all --browser
```
Restart your agent after installing the skill. See [Skill + CLI](/sdks/cli) for the full setup.
Or use the [MCP Server](/mcp-server) to connect Firecrawl directly to Claude, Cursor, Windsurf, VS Code, and other AI tools.
***
## What can Firecrawl do?
* Search the web and get full page content from results
* Extract content from any URL as markdown, HTML, or structured JSON
* Continue working with any scraped page — click, fill forms, extract dynamic content
### Why Firecrawl?
* **LLM-ready output**: Clean markdown, structured JSON, screenshots, and more.
* **Handles the hard stuff**: Proxies, anti-bot, JavaScript rendering, and dynamic content.
* **Reliable**: Built for production with high uptime and consistent results.
* **Fast**: Results in seconds, optimized for high throughput.
* **MCP Server**: Connect Firecrawl to any AI tool via the [Model Context Protocol](/mcp-server).
***
## Search
Search the web and get full page content from results in one call. See the [Search feature docs](/features/search) for all options.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
results = firecrawl.search(
query="firecrawl",
limit=3,
)
print(results)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
const results = await firecrawl.search('firecrawl', {
limit: 3,
scrapeOptions: { formats: ['markdown'] }
});
console.log(results);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/search" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "firecrawl",
"limit": 3
}'
```
```bash CLI theme={null}
# Search the web
firecrawl search "firecrawl web scraping" --limit 5 --pretty
```
SDKs will return the data object directly. cURL will return the complete payload.
```json JSON theme={null}
{
"success": true,
"data": {
"web": [
{
"url": "https://www.firecrawl.dev/",
"title": "Firecrawl - The Web Data API for AI",
"description": "The web crawling, scraping, and search API for AI. Built for scale. Firecrawl delivers the entire internet to AI agents and builders.",
"position": 1
},
{
"url": "https://github.com/firecrawl/firecrawl",
"title": "mendableai/firecrawl: Turn entire websites into LLM-ready ... - GitHub",
"description": "Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data.",
"position": 2
},
...
],
"images": [
{
"title": "Quickstart | Firecrawl",
"imageUrl": "https://mintlify.s3.us-west-1.amazonaws.com/firecrawl/logo/logo.png",
"imageWidth": 5814,
"imageHeight": 1200,
"url": "https://docs.firecrawl.dev/",
"position": 1
},
...
],
"news": [
{
"title": "Y Combinator startup Firecrawl is ready to pay $1M to hire three AI agents as employees",
"url": "https://techcrunch.com/2025/05/17/y-combinator-startup-firecrawl-is-ready-to-pay-1m-to-hire-three-ai-agents-as-employees/",
"snippet": "It's now placed three new ads on YC's job board for “AI agents only” and has set aside a $1 million budget total to make it happen.",
"date": "3 months ago",
"position": 1
},
...
]
}
}
```
## Scrape
Scrape any URL and get its content in markdown, HTML, or other formats. See the [Scrape feature docs](/features/scrape) for all options.
```python Python theme={null}
from firecrawl import Firecrawl
firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")
# Scrape a website:
doc = firecrawl.scrape("https://firecrawl.dev", formats=["markdown", "html"])
print(doc)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const firecrawl = new Firecrawl({ apiKey: "fc-YOUR-API-KEY" });
// Scrape a website:
const doc = await firecrawl.scrape('https://firecrawl.dev', { formats: ['markdown', 'html'] });
console.log(doc);
```
```bash cURL theme={null}
curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://firecrawl.dev",
"formats": ["markdown", "html"]
}'
```
```bash CLI theme={null}
# Scrape a URL and get markdown
firecrawl https://firecrawl.dev
# With multiple formats (returns JSON)
firecrawl https://firecrawl.dev --format markdown,html,links --pretty
```
SDKs will return the data object directly. cURL will return the payload exactly as shown below.
```json theme={null}
{
"success": true,
"data" : {
"markdown": "Launch Week I is here! [See our Day 2 Release 🚀](https://www.firecrawl.dev/blog/launch-week-i-day-2-doubled-rate-limits)[💥 Get 2 months free...",
"html": "
## Interact
Scrape a page, then keep working with it — click buttons, fill forms, extract dynamic content, or navigate deeper. Describe what you want in plain English or write code for full control. See the [Interact feature docs](/features/interact) for all options.
```python Python theme={null}
from firecrawl import Firecrawl
app = Firecrawl(api_key="fc-YOUR-API-KEY")
# 1. Scrape Amazon's homepage
result = app.scrape("https://www.amazon.com", formats=["markdown"])
scrape_id = result.metadata.scrape_id
# 2. Interact — search for a product and get its price
app.interact(scrape_id, prompt="Search for iPhone 16 Pro Max")
response = app.interact(scrape_id, prompt="Click on the first result and tell me the price")
print(response.output)
# 3. Stop the session
app.stop_interaction(scrape_id)
```
```js Node theme={null}
import Firecrawl from '@mendable/firecrawl-js';
const app = new Firecrawl({ apiKey: 'fc-YOUR-API-KEY' });
// 1. Scrape Amazon's homepage
const result = await app.scrape('https://www.amazon.com', { formats: ['markdown'] });
const scrapeId = result.metadata?.scrapeId;
// 2. Interact — search for a product and get its price
await app.interact(scrapeId, { prompt: 'Search for iPhone 16 Pro Max' });
const response = await app.interact(scrapeId, { prompt: 'Click on the first result and tell me the price' });
console.log(response.output);
// 3. Stop the session
await app.stopInteraction(scrapeId);
```
```bash cURL theme={null}
# 1. Scrape Amazon's homepage
RESPONSE=$(curl -s -X POST "https://api.firecrawl.dev/v2/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com", "formats": ["markdown"]}')
SCRAPE_ID=$(echo $RESPONSE | jq -r '.data.metadata.scrapeId')
# 2. Interact — search for a product and get its price
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Search for iPhone 16 Pro Max"}'
curl -s -X POST "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": "Click on the first result and tell me the price"}'
# 3. Stop the session
curl -s -X DELETE "https://api.firecrawl.dev/v2/scrape/$SCRAPE_ID/interact" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY"
```
```bash CLI theme={null}
# 1. Scrape Amazon's homepage (scrape ID is saved automatically)
firecrawl scrape https://www.amazon.com
# 2. Interact — search for a product and get its price
firecrawl interact "Search for iPhone 16 Pro Max"
firecrawl interact "Click on the first result and tell me the price"
# 3. Stop the session
firecrawl interact stop
```
```json Response theme={null}
{
"success": true,
"liveViewUrl": "https://liveview.firecrawl.dev/...",
"interactiveLiveViewUrl": "https://liveview.firecrawl.dev/...",
"output": "The iPhone 16 Pro Max (256GB) is priced at $1,199.00.",
"exitCode": 0,
"killed": false
}
```
***
## More capabilities
* Autonomous web data gathering powered by AI
* Managed browser sessions for interactive workflows
* Discover all URLs on a website
* Recursively gather content from entire sites
***
## Resources
* Complete API documentation with interactive examples
* Python, Node.js, CLI, and community SDKs
* Self-host Firecrawl or contribute to the project
* LangChain, LlamaIndex, OpenAI, and more
# Firecrawl MCP Server
Source: https://docs.firecrawl.dev/mcp-server
Use Firecrawl's API through the Model Context Protocol
A Model Context Protocol (MCP) server implementation that integrates [Firecrawl](https://github.com/firecrawl/firecrawl) for searching, scraping, and interacting with the web. Our MCP server is open-source and available on [GitHub](https://github.com/firecrawl/firecrawl-mcp-server).
## Features
* Search the web and get full page content
* Scrape any URL into clean, structured data
* Interact with pages — click, navigate, and operate
* Deep research with autonomous agent
* Browser session management
* Cloud and self-hosted support
* Streamable HTTP support
## Installation
You can either use our remote hosted URL or run the server locally. Get your API key from [https://firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys)
### Remote hosted URL
```bash theme={null}
https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp
```
### Running with npx
```bash theme={null}
env FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
```
### Manual Installation
```bash theme={null}
npm install -g firecrawl-mcp
```
### Running on Cursor
#### Manual Installation
Configuring Cursor 🖥️
Note: Requires Cursor version 0.45.6+
For the most up-to-date configuration instructions, please refer to the official Cursor documentation on configuring MCP servers:
[Cursor MCP Server Configuration Guide](https://docs.cursor.com/context/model-context-protocol#configuring-mcp-servers)
To configure Firecrawl MCP in Cursor **v0.48.6**:
1. Open Cursor Settings
2. Go to Features > MCP Servers
3. Click "+ Add new global MCP server"
4. Enter the following code:
```json theme={null}
{
"mcpServers": {
"firecrawl-mcp": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR-API-KEY"
}
}
}
}
```
To configure Firecrawl MCP in Cursor **v0.45.6**:
1. Open Cursor Settings
2. Go to Features > MCP Servers
3. Click "+ Add New MCP Server"
4. Enter the following:
* Name: "firecrawl-mcp" (or your preferred name)
* Type: "command"
* Command: `env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp`
> If you are using Windows and are running into issues, try `cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"`
Replace `your-api-key` with your Firecrawl API key. If you don't have one yet, you can create an account and get it from [https://www.firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys)
After adding, refresh the MCP server list to see the new tools. The Composer Agent will automatically use Firecrawl MCP when appropriate, but you can explicitly request it by describing your web data needs. Access the Composer via Command+L (Mac), select "Agent" next to the submit button, and enter your query.
### Running on Windsurf
Add this to your `./codeium/windsurf/model_config.json`:
```json theme={null}
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY"
}
}
}
}
```
### Running with Streamable HTTP Mode
To run the server using streamable HTTP transport locally instead of the default stdio transport:
```bash theme={null}
env HTTP_STREAMABLE_SERVER=true FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp
```
Then use the URL `http://localhost:3000/v2/mcp`, or the hosted endpoint `https://mcp.firecrawl.dev/{FIRECRAWL_API_KEY}/v2/mcp`.
### Installing via Smithery (Legacy)
To install Firecrawl for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@mendableai/mcp-server-firecrawl):
```bash theme={null}
npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
```
### Running on VS Code
For one-click installation, click one of the install buttons below:
[Install in VS Code](https://insiders.vscode.dev/redirect/mcp/install?name=firecrawl\&inputs=%5B%7B%22type%22%3A%22promptString%22%2C%22id%22%3A%22apiKey%22%2C%22description%22%3A%22Firecrawl%20API%20Key%22%2C%22password%22%3Atrue%7D%5D\&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22firecrawl-mcp%22%5D%2C%22env%22%3A%7B%22FIRECRAWL_API_KEY%22%3A%22%24%7Binput%3AapiKey%7D%22%7D%7D) [Install in VS Code Insiders](https://insiders.vscode.dev/redirect/mcp/install?name=firecrawl\&inputs=%5B%7B%22type%22%3A%22promptString%22%2C%22id%22%3A%22apiKey%22%2C%22description%22%3A%22Firecrawl%20API%20Key%22%2C%22password%22%3Atrue%7D%5D\&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22firecrawl-mcp%22%5D%2C%22env%22%3A%7B%22FIRECRAWL_API_KEY%22%3A%22%24%7Binput%3AapiKey%7D%22%7D%7D\&quality=insiders)
For manual installation, add the following JSON block to your User Settings (JSON) file in VS Code. You can do this by pressing `Ctrl + Shift + P` and typing `Preferences: Open User Settings (JSON)`.
```json theme={null}
{
"mcp": {
"inputs": [
{
"type": "promptString",
"id": "apiKey",
"description": "Firecrawl API Key",
"password": true
}
],
"servers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "${input:apiKey}"
}
}
}
}
}
```
Optionally, you can add it to a file called `.vscode/mcp.json` in your workspace. This will allow you to share the configuration with others:
```json theme={null}
{
"inputs": [
{
"type": "promptString",
"id": "apiKey",
"description": "Firecrawl API Key",
"password": true
}
],
"servers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "${input:apiKey}"
}
}
}
}
```
**Note:** Some users have reported issues when adding the MCP server to VS Code due to how it validates JSON with an outdated schema format ([microsoft/vscode#155379](https://github.com/microsoft/vscode/issues/155379)).
This affects several MCP tools, including Firecrawl.
**Workaround:** Disable JSON validation in VS Code to allow the MCP server to load properly. See reference: [directus/directus#25906 (comment)](https://github.com/directus/directus/issues/25906#issuecomment-3369169513).
The MCP server still works when invoked via other extensions; the issue occurs specifically when registering it directly in the MCP server list. We plan to add guidance once VS Code updates its schema validation.
### Running on Claude Desktop
Add this to the Claude config file:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"url": "https://mcp.firecrawl.dev/v2/mcp",
"headers": {
"Authorization": "Bearer YOUR_API_KEY"
}
}
}
}
```
If you get a "Couldn't reach the MCP server" error, your Claude Desktop version may not support streamable HTTP transport. Use the local npx approach instead (requires [Node.js](https://nodejs.org)):
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY"
}
}
}
}
```
If you see a `spawn npx ENOENT` error, Node.js is not installed or not in your system PATH. Install Node.js from [nodejs.org](https://nodejs.org) (LTS version), then fully restart Claude Desktop. On Windows, you can also run `where npx` in Command Prompt and use the full path (e.g. `C:\\Program Files\\nodejs\\npx.cmd`) as the `command` value.
### Running on Claude Code
Add the Firecrawl MCP server using the Claude Code CLI. You can use the remote hosted URL or run locally:
```bash theme={null}
# Remote hosted URL (recommended)
claude mcp add firecrawl --url https://mcp.firecrawl.dev/your-api-key/v2/mcp
# Or run locally via npx
claude mcp add firecrawl -e FIRECRAWL_API_KEY=your-api-key -- npx -y firecrawl-mcp
```
### Running on Google Antigravity
Google Antigravity allows you to configure MCP servers directly through its Agent interface.
1. Open the Agent sidebar in the Editor or the Agent Manager view
2. Click the "..." (More Actions) menu and select **MCP Servers**
3. Select **View raw config** to open your local `mcp_config.json` file
4. Add the following configuration:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_FIRECRAWL_API_KEY"
}
}
}
}
```
5. Save the file and click **Refresh** in the Antigravity MCP interface to see the new tools
Replace `YOUR_FIRECRAWL_API_KEY` with your API key from [https://firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys).
### Running on n8n
To connect the Firecrawl MCP server in n8n:
1. Get your Firecrawl API key from [https://firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys)
2. In your n8n workflow, add an **AI Agent** node
3. In the AI Agent configuration, add a new **Tool**
4. Select **MCP Client Tool** as the tool type
5. Enter the MCP server Endpoint (replace `{YOUR_FIRECRAWL_API_KEY}` with your actual API key):
```
https://mcp.firecrawl.dev/{YOUR_FIRECRAWL_API_KEY}/v2/mcp
```
6. Set **Server Transport** to **HTTP Streamable**
7. Set **Authentication** to **None**
8. For **Tools to include**, you can select **All**, **Selected**, or **All Except**; this exposes the Firecrawl tools (scrape, crawl, map, search, extract, etc.)
For self-hosted deployments, run the MCP server with npx and enable HTTP transport mode:
```bash theme={null}
env HTTP_STREAMABLE_SERVER=true \
FIRECRAWL_API_KEY=fc-YOUR_API_KEY \
FIRECRAWL_API_URL=YOUR_FIRECRAWL_INSTANCE \
npx -y firecrawl-mcp
```
This starts the server on `http://localhost:3000/v2/mcp`, which you can use as the Endpoint in your n8n workflow. The `HTTP_STREAMABLE_SERVER=true` environment variable is required because n8n needs HTTP transport.
## Configuration
### Environment Variables
#### Required for Cloud API
* `FIRECRAWL_API_KEY`: Your Firecrawl API key
* Required when using cloud API (default)
* Optional when using self-hosted instance with `FIRECRAWL_API_URL`
* `FIRECRAWL_API_URL` (Optional): Custom API endpoint for self-hosted instances
* Example: `https://firecrawl.your-domain.com`
* If not provided, the cloud API will be used (requires API key)
#### Optional Configuration
##### Retry Configuration
* `FIRECRAWL_RETRY_MAX_ATTEMPTS`: Maximum number of retry attempts (default: 3)
* `FIRECRAWL_RETRY_INITIAL_DELAY`: Initial delay in milliseconds before first retry (default: 1000)
* `FIRECRAWL_RETRY_MAX_DELAY`: Maximum delay in milliseconds between retries (default: 10000)
* `FIRECRAWL_RETRY_BACKOFF_FACTOR`: Exponential backoff multiplier (default: 2)
##### Credit Usage Monitoring
* `FIRECRAWL_CREDIT_WARNING_THRESHOLD`: Credit usage warning threshold (default: 1000)
* `FIRECRAWL_CREDIT_CRITICAL_THRESHOLD`: Credit usage critical threshold (default: 100)
### Configuration Examples
For cloud API usage with custom retry and credit monitoring:
```bash theme={null}
# Required for cloud API
export FIRECRAWL_API_KEY=your-api-key
# Optional retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=5 # Increase max retry attempts
export FIRECRAWL_RETRY_INITIAL_DELAY=2000 # Start with 2s delay
export FIRECRAWL_RETRY_MAX_DELAY=30000 # Maximum 30s delay
export FIRECRAWL_RETRY_BACKOFF_FACTOR=3 # More aggressive backoff
# Optional credit monitoring
export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000 # Warning at 2000 credits
export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500 # Critical at 500 credits
```
For self-hosted instance:
```bash theme={null}
# Required for self-hosted
export FIRECRAWL_API_URL=https://firecrawl.your-domain.com
# Optional authentication for self-hosted
export FIRECRAWL_API_KEY=your-api-key # If your instance requires auth
# Custom retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
export FIRECRAWL_RETRY_INITIAL_DELAY=500 # Start with faster retries
```
### Custom configuration with Claude Desktop
Add this to your `claude_desktop_config.json`:
```json theme={null}
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
"FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
"FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
"FIRECRAWL_RETRY_MAX_DELAY": "30000",
"FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
"FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
"FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
}
}
}
}
```
### System Configuration
The server includes several configurable parameters that can be set via environment variables. Here are the default values if not configured:
```typescript theme={null}
const CONFIG = {
retry: {
maxAttempts: 3, // Number of retry attempts for rate-limited requests
initialDelay: 1000, // Initial delay before first retry (in milliseconds)
maxDelay: 10000, // Maximum delay between retries (in milliseconds)
backoffFactor: 2, // Multiplier for exponential backoff
},
credit: {
warningThreshold: 1000, // Warn when credit usage reaches this level
criticalThreshold: 100, // Critical alert when credit usage reaches this level
},
};
```
These configurations control:
1. **Retry Behavior**
* Automatically retries failed requests due to rate limits
* Uses exponential backoff to avoid overwhelming the API
   * Example: with default settings (see the sketch below), retries are attempted at:
     * 1st retry: 1 second delay
     * 2nd retry: 2 seconds delay
     * 3rd retry: 4 seconds delay (delays are capped at `maxDelay`)
2. **Credit Usage Monitoring**
* Tracks API credit consumption for cloud API usage
* Provides warnings at specified thresholds
* Helps prevent unexpected service interruption
* Example: With default settings:
* Warning at 1000 credits remaining
* Critical alert at 100 credits remaining
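To make the retry schedule concrete, here is an illustrative sketch (not the server's implementation) of how the retry settings map to delays:
```python theme={null}
# Illustrative only: mirrors the documented retry settings, not the server's code.
def retry_delays(max_attempts=3, initial_delay=1.0, max_delay=10.0, backoff_factor=2.0):
    """Yield the delay in seconds applied before each retry attempt."""
    for attempt in range(max_attempts):
        yield min(initial_delay * backoff_factor**attempt, max_delay)

print(list(retry_delays()))        # defaults: [1.0, 2.0, 4.0]
print(list(retry_delays(5, 2.0)))  # 5 attempts, 2s initial delay: [2.0, 4.0, 8.0, 10.0, 10.0]
```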
### Rate Limiting and Batch Processing
The server utilizes Firecrawl's built-in rate limiting and batch processing capabilities:
* Automatic rate limit handling with exponential backoff
* Efficient parallel processing for batch operations
* Smart request queuing and throttling
* Automatic retries for transient errors
## Available Tools
### 1. Scrape Tool (`firecrawl_scrape`)
Scrape content from a single URL with advanced options.
```json theme={null}
{
"name": "firecrawl_scrape",
"arguments": {
"url": "https://example.com",
"formats": ["markdown"],
"onlyMainContent": true,
"waitFor": 1000,
"mobile": false,
"includeTags": ["article", "main"],
"excludeTags": ["nav", "footer"],
"skipTlsVerification": false
}
}
```
### 2. Map Tool (`firecrawl_map`)
Map a website to discover all indexed URLs on the site.
```json theme={null}
{
"name": "firecrawl_map",
"arguments": {
"url": "https://example.com",
"search": "blog",
"sitemap": "include",
"includeSubdomains": false,
"limit": 100,
"ignoreQueryParameters": true
}
}
```
#### Map Tool Options:
* `url`: The base URL of the website to map
* `search`: Optional search term to filter URLs
* `sitemap`: Control sitemap usage (`"include"`, `"skip"`, or `"only"`)
* `includeSubdomains`: Whether to include subdomains in the mapping
* `limit`: Maximum number of URLs to return
* `ignoreQueryParameters`: Whether to ignore query parameters when mapping
**Best for:** Discovering URLs on a website before deciding what to scrape; finding specific sections of a website.
**Returns:** Array of URLs found on the site.
### 3. Search Tool (`firecrawl_search`)
Search the web and optionally extract content from search results.
```json theme={null}
{
"name": "firecrawl_search",
"arguments": {
"query": "your search query",
"limit": 5,
"location": "United States",
"tbs": "qdr:m",
"scrapeOptions": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}
```
#### Search Tool Options:
* `query`: The search query string (required)
* `limit`: Maximum number of results to return
* `location`: Geographic location for search results
* `tbs`: Time-based search filter (e.g., `qdr:d` for past day, `qdr:w` for past week, `qdr:m` for past month)
* `filter`: Additional search filter
* `sources`: Array of source types to search (`web`, `images`, `news`)
* `scrapeOptions`: Options for scraping search result pages
* `enterprise`: Array of enterprise options (`default`, `anon`, `zdr`)
### 4. Crawl Tool (`firecrawl_crawl`)
Start an asynchronous crawl with advanced options.
```json theme={null}
{
"name": "firecrawl_crawl",
"arguments": {
"url": "https://example.com",
"maxDiscoveryDepth": 2,
"limit": 100,
"allowExternalLinks": false,
"deduplicateSimilarURLs": true
}
}
```
### 5. Check Crawl Status (`firecrawl_check_crawl_status`)
Check the status of a crawl job.
```json theme={null}
{
"name": "firecrawl_check_crawl_status",
"arguments": {
"id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
**Returns:** Status and progress of the crawl job, including results if available.
### 6. Extract Tool (`firecrawl_extract`)
Extract structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
```json theme={null}
{
"name": "firecrawl_extract",
"arguments": {
"urls": ["https://example.com/page1", "https://example.com/page2"],
"prompt": "Extract product information including name, price, and description",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
},
"required": ["name", "price"]
},
"allowExternalLinks": false,
"enableWebSearch": false,
"includeSubdomains": false
}
}
```
Example response:
```json theme={null}
{
"content": [
{
"type": "text",
"text": {
"name": "Example Product",
"price": 99.99,
"description": "This is an example product description"
}
}
],
"isError": false
}
```
#### Extract Tool Options:
* `urls`: Array of URLs to extract information from
* `prompt`: Custom prompt for the LLM extraction
* `schema`: JSON schema for structured data extraction
* `allowExternalLinks`: Allow extraction from external links
* `enableWebSearch`: Enable web search for additional context
* `includeSubdomains`: Include subdomains in extraction
When using a self-hosted instance, the extraction will use your configured LLM. For cloud API, it uses Firecrawl's managed LLM service.
### 7. Agent Tool (`firecrawl_agent`)
Autonomous web research agent that independently browses the internet, searches for information, navigates through pages, and extracts structured data based on your query. This runs asynchronously: it returns a job ID immediately, and you poll `firecrawl_agent_status` to check progress and retrieve results (a polling sketch appears under `firecrawl_agent_status` below).
```json theme={null}
{
"name": "firecrawl_agent",
"arguments": {
"prompt": "Find the top 5 AI startups founded in 2024 and their funding amounts",
"schema": {
"type": "object",
"properties": {
"startups": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": { "type": "string" },
"funding": { "type": "string" },
"founded": { "type": "string" }
}
}
}
}
}
}
}
```
You can also provide specific URLs for the agent to focus on:
```json theme={null}
{
"name": "firecrawl_agent",
"arguments": {
"urls": ["https://docs.firecrawl.dev", "https://firecrawl.dev/pricing"],
"prompt": "Compare the features and pricing information from these pages"
}
}
```
#### Agent Tool Options:
* `prompt`: Natural language description of the data you want (required, max 10,000 characters)
* `urls`: Optional array of URLs to focus the agent on specific pages
* `schema`: Optional JSON schema for structured output
**Best for:** Complex research tasks where you don't know the exact URLs; multi-source data gathering; finding information scattered across the web; extracting data from JavaScript-heavy SPAs that a regular scrape can't handle.
**Returns:** Job ID for status checking. Use `firecrawl_agent_status` to poll for results.
### 8. Check Agent Status (`firecrawl_agent_status`)
Check the status of an agent job and retrieve results when complete. Poll every 15-30 seconds and keep polling for at least 2-3 minutes before considering the request failed.
```json theme={null}
{
"name": "firecrawl_agent_status",
"arguments": {
"id": "550e8400-e29b-41d4-a716-446655440000"
}
}
```
#### Agent Status Options:
* `id`: The agent job ID returned by `firecrawl_agent` (required)
**Possible statuses:**
* `processing`: Agent is still researching; keep polling
* `completed`: Research finished; the response includes the extracted data
* `failed`: An error occurred
**Returns:** Status, progress, and results (if completed) of the agent job.
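A rough polling loop might look like the sketch below. Here `call_tool` is a hypothetical stand-in for however your MCP client invokes tools; only the status values and the 15-30 second cadence come from this page:
```python theme={null}
import time

def wait_for_agent(call_tool, job_id: str, poll_interval: float = 20.0, max_wait: float = 300.0) -> dict:
    """Poll firecrawl_agent_status until the job completes, fails, or max_wait elapses.

    call_tool is a hypothetical stand-in for your MCP client's tool invocation.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        result = call_tool("firecrawl_agent_status", {"id": job_id})
        if result["status"] == "completed":
            return result  # response includes the extracted data
        if result["status"] == "failed":
            raise RuntimeError("agent job failed")
        time.sleep(poll_interval)  # docs suggest polling every 15-30 seconds
    raise TimeoutError("agent still processing; keep polling or raise max_wait")
```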
### 9. Create Browser Session (`firecrawl_browser_create`)
Create a persistent browser session for code execution via CDP (Chrome DevTools Protocol).
```json theme={null}
{
"name": "firecrawl_browser_create",
"arguments": {
"ttl": 120,
"activityTtl": 60
}
}
```
#### Browser Create Options:
* `ttl`: Total session lifetime in seconds (30-3600, optional)
* `activityTtl`: Idle timeout in seconds (10-3600, optional)
**Best for:** Running code (Python/JS) that interacts with a live browser page, multi-step browser automation, sessions with profiles that survive across multiple tool calls.
**Returns:** Session ID, CDP URL, and live view URL.
### 10. Execute Code in Browser (`firecrawl_browser_execute`)
Execute code in an active browser session. Supports agent-browser commands (bash), Python, or JavaScript.
```json theme={null}
{
"name": "firecrawl_browser_execute",
"arguments": {
"sessionId": "session-id-here",
"code": "agent-browser open https://example.com",
"language": "bash"
}
}
```
Python example with Playwright:
```json theme={null}
{
"name": "firecrawl_browser_execute",
"arguments": {
"sessionId": "session-id-here",
"code": "await page.goto('https://example.com')\ntitle = await page.title()\nprint(title)",
"language": "python"
}
}
```
#### Browser Execute Options:
* `sessionId`: The browser session ID (required)
* `code`: The code to execute (required)
* `language`: `bash`, `python`, or `node` (optional, defaults to `bash`)
**Common agent-browser commands (bash):**
* `agent-browser open <url>`: Navigate to a URL
* `agent-browser snapshot`: Get the accessibility tree with clickable refs
* `agent-browser click @e5`: Click an element by ref from the snapshot
* `agent-browser type @e3 "text"`: Type into an element
* `agent-browser screenshot [path]`: Take a screenshot
* `agent-browser scroll down`: Scroll the page
* `agent-browser wait 2000`: Wait 2 seconds
**Returns:** Execution result including stdout, stderr, and exit code.
### 11. Delete Browser Session (`firecrawl_browser_delete`)
Destroy a browser session.
```json theme={null}
{
"name": "firecrawl_browser_delete",
"arguments": {
"sessionId": "session-id-here"
}
}
```
#### Browser Delete Options:
* `sessionId`: The browser session ID to destroy (required)
**Returns:** Success confirmation.
### 12. List Browser Sessions (`firecrawl_browser_list`)
List browser sessions, optionally filtered by status.
```json theme={null}
{
"name": "firecrawl_browser_list",
"arguments": {
"status": "active"
}
}
```
#### Browser List Options:
* `status`: Filter by session status, `active` or `destroyed` (optional)
**Returns:** Array of browser sessions.
### 13. Interact with Scraped Page (`firecrawl_interact`)
Interact with a previously scraped page in a live browser session. Scrape a page first with `firecrawl_scrape`, then use the returned `scrapeId` (from the scrape response metadata) to click buttons, fill forms, extract dynamic content, or navigate deeper. The response includes a `liveViewUrl` and `interactiveLiveViewUrl` you can open in your browser to watch or control the session in real time.
```json theme={null}
{
"name": "firecrawl_interact",
"arguments": {
"scrapeId": "scrape-id-from-previous-scrape",
"prompt": "Click the Sign In button"
}
}
```
#### Interact Tool Options:
* `scrapeId`: The scrape job ID from a previous `firecrawl_scrape` call (required)
* `prompt`: Natural language instruction describing the action to take (provide `prompt` or `code`)
* `code`: Code to execute in the browser session (provide `code` or `prompt`)
* `language`: `bash`, `python`, or `node` (optional, defaults to `node`, only used with `code`)
* `timeout`: Execution timeout in seconds, 1–300 (optional, defaults to 30)
**Best for:** Multi-step workflows on a single page — searching a site, clicking through results, filling forms, extracting data that requires interaction.
**Returns:** Interaction result including `liveViewUrl` and `interactiveLiveViewUrl`.
### 14. Stop Interact Session (`firecrawl_interact_stop`)
Stop an interact session for a scraped page. Call this when you are done interacting to free resources.
```json theme={null}
{
"name": "firecrawl_interact_stop",
"arguments": {
"scrapeId": "scrape-id-from-previous-scrape"
}
}
```
#### Interact Stop Options:
* `scrapeId`: The scrape ID for the session to stop (required)
**Returns:** Confirmation that the session has been stopped.
## Logging System
The server includes comprehensive logging:
* Operation status and progress
* Performance metrics
* Credit usage monitoring
* Rate limit tracking
* Error conditions
Example log messages:
```
[INFO] Firecrawl MCP Server initialized successfully
[INFO] Starting scrape for URL: https://example.com
[INFO] Starting crawl for URL: https://example.com
[WARNING] Credit usage has reached warning threshold
[ERROR] Rate limit exceeded, retrying in 2s...
```
## Error Handling
The server provides robust error handling:
* Automatic retries for transient errors
* Rate limit handling with backoff
* Detailed error messages
* Credit usage warnings
* Network resilience
Example error response:
```json theme={null}
{
"content": [
{
"type": "text",
"text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
}
],
"isError": true
}
```
## Development
```bash theme={null}
# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test
```
### Contributing
1. Fork the repository
2. Create your feature branch
3. Run tests: `npm test`
4. Submit a pull request
### Thanks to contributors
Thanks to [@vrknetha](https://github.com/vrknetha) and [@cawstudios](https://caw.tech) for the initial implementation!
Thanks to MCP.so and Klavis AI for hosting and [@gstarwd](https://github.com/gstarwd), [@xiangkaiz](https://github.com/xiangkaiz) and [@zihaolin96](https://github.com/zihaolin96) for integrating our server.
## License
MIT License. See the LICENSE file for details.
# MCP Web Search & Scrape in Amp
Source: https://docs.firecrawl.dev/quickstarts/amp
Add Firecrawl web scraping and search to Sourcegraph Amp
Add Firecrawl's search, scrape, crawl, and browser tools to [Sourcegraph Amp](https://ampcode.com) via MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://www.firecrawl.dev/app) and copy your API key.
### 2. Add Firecrawl to Amp
Open Amp settings and add an MCP server. Amp accepts standard MCP config:
```json theme={null}
{
"amp.mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "fc-YOUR-API-KEY"
}
}
}
}
```
Replace `fc-YOUR-API-KEY` with your Firecrawl API key.
### 3. Reload Amp
Reload the Amp window. Firecrawl tools are now available to the agent.
## Quick Demo
```
Search the web for "Sourcegraph Cody vs Amp" and summarize the differences.
```
```
Scrape https://docs.firecrawl.dev and list the core endpoints.
```
```
Crawl https://example.com and output a site map as JSON.
```
## Remote Hosted URL (no Node.js required)
```json theme={null}
{
"amp.mcpServers": {
"firecrawl": {
"url": "https://mcp.firecrawl.dev/fc-YOUR-API-KEY/v2/mcp"
}
}
}
```
## Troubleshooting
* **Server fails to start** — check Amp's MCP log view for stderr output.
* **Missing `npx`** — install Node.js 18+ or use the remote hosted URL above.
# MCP Web Search & Scrape in Antigravity
Source: https://docs.firecrawl.dev/quickstarts/antigravity
Add Firecrawl web scraping and search to Google Antigravity
Add Firecrawl's search, scrape, crawl, and browser tools to [Google Antigravity](https://antigravity.google/) via MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://www.firecrawl.dev/app) and copy your API key.
### 2. Add Firecrawl to Antigravity
Open Antigravity settings (`Cmd/Ctrl + ,`), search for **MCP Servers**, and add a new server with:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "fc-YOUR-API-KEY"
}
}
}
}
```
Replace `fc-YOUR-API-KEY` with your Firecrawl API key.
### 3. Reload Antigravity
Reload the window (`Cmd/Ctrl + Shift + P` → `Reload Window`). The agent now has Firecrawl's web tools available.
## Quick Demo
In an Antigravity agent chat:
```
Search the web for "Vercel AI SDK v5 release notes" and summarize.
```
```
Scrape https://docs.firecrawl.dev/ai-onboarding and list every linked guide.
```
```
Crawl https://example.com and extract every page title.
```
Antigravity routes those tool calls through Firecrawl MCP automatically.
## Remote Hosted URL (no Node.js required)
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"url": "https://mcp.firecrawl.dev/fc-YOUR-API-KEY/v2/mcp"
}
}
}
```
## Troubleshooting
* **Server shows as "failed"** — check the MCP output panel for stderr. Most failures are a missing API key or `npx` not on `PATH`.
* **Tools not invoked** — explicitly mention Firecrawl (e.g., "Use Firecrawl to scrape…") in your first prompt so the agent picks the right tool.
# AutoGen
Source: https://docs.firecrawl.dev/quickstarts/autogen
Use Firecrawl as a tool inside Microsoft AutoGen multi-agent conversations.
Integrate Firecrawl with [Microsoft AutoGen](https://github.com/microsoft/autogen) to give multi-agent conversations live web search, scrape, and crawl tools.
## Setup
```bash theme={null}
pip install -U "autogen-agentchat" "autogen-ext[openai]" firecrawl-py
```
Set your keys:
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
export OPENAI_API_KEY=sk-YOUR-OPENAI-KEY
```
## Firecrawl as an AutoGen Tool
This example wraps Firecrawl's `scrape` and `search` as AutoGen function tools, then lets a single `AssistantAgent` use them to answer a question.
```python theme={null}
import asyncio
import os
from firecrawl import FirecrawlApp
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
firecrawl = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])
def scrape_url(url: str) -> str:
"""Scrape a URL and return clean markdown."""
result = firecrawl.scrape(url, formats=["markdown"])
return result.markdown or ""
def web_search(query: str, limit: int = 5) -> list[dict]:
"""Search the web and return the top results."""
result = firecrawl.search(query, limit=limit)
return [
{"title": r.title, "url": r.url, "snippet": r.description}
for r in result.web or []
]
async def main() -> None:
model = OpenAIChatCompletionClient(model="gpt-4o-mini")
researcher = AssistantAgent(
name="researcher",
model_client=model,
tools=[scrape_url, web_search],
system_message=(
"You are a web researcher. Use web_search to find candidate sources, "
"then scrape_url to read the most relevant ones. Cite URLs in your answer."
),
)
await Console(
researcher.run_stream(
task="What does Firecrawl's /agent endpoint do? Cite the docs."
)
)
if __name__ == "__main__":
asyncio.run(main())
```
Run it:
```bash theme={null}
python researcher.py
```
## Multi-Agent: Researcher + Writer
Hand Firecrawl output from a researcher agent to a writer agent in a round-robin team.
```python theme={null}
import asyncio
import os
from firecrawl import FirecrawlApp
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
firecrawl = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])
def scrape_url(url: str) -> str:
result = firecrawl.scrape(url, formats=["markdown"])
return result.markdown or ""
def web_search(query: str, limit: int = 5) -> list[dict]:
result = firecrawl.search(query, limit=limit)
return [
{"title": r.title, "url": r.url, "snippet": r.description}
for r in result.web or []
]
async def main() -> None:
model = OpenAIChatCompletionClient(model="gpt-4o-mini")
researcher = AssistantAgent(
name="researcher",
model_client=model,
tools=[scrape_url, web_search],
system_message="Gather sources with web_search + scrape_url. Reply with bullet-point findings and URLs.",
)
writer = AssistantAgent(
name="writer",
model_client=model,
system_message="Turn the researcher's findings into a 200-word briefing with inline citations.",
)
team = RoundRobinGroupChat(
[researcher, writer],
termination_condition=MaxMessageTermination(max_messages=6),
)
await Console(team.run_stream(task="Write a briefing on Firecrawl's crawl endpoint."))
if __name__ == "__main__":
asyncio.run(main())
```
## Notes
* Firecrawl's Python SDK is synchronous; AutoGen can call your wrappers inside its event loop without issues for small workloads. For heavy concurrent scraping, move calls off the main thread (see the sketch below) or use [batch scrape](/features/batch-scrape).
* Replace `OpenAIChatCompletionClient` with any AutoGen-supported model client (Azure OpenAI, Anthropic via `autogen-ext`, Ollama, etc.). Firecrawl is model-agnostic.
* See the [AutoGen docs](https://microsoft.github.io/autogen/) for agent patterns beyond round-robin (selector, swarm, nested teams).
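As a sketch of that threading note, an async wrapper (reusing the `firecrawl` client from the examples above) keeps the event loop free; AutoGen function tools can be async callables as well:
```python theme={null}
import asyncio

async def scrape_url_async(url: str) -> str:
    """Sketch: run the synchronous scrape call in a worker thread so it
    doesn't block AutoGen's event loop. Assumes the `firecrawl` client
    defined in the examples above."""
    result = await asyncio.to_thread(firecrawl.scrape, url, formats=["markdown"])
    return result.markdown or ""
```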
# MCP Web Search & Scrape in Claude Code
Source: https://docs.firecrawl.dev/quickstarts/claude-code
Add web scraping and search to Claude Code in 2 minutes
Add web scraping and search capabilities to Claude Code with Firecrawl MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key.
### 2. Add Firecrawl MCP Server
**Option A: Remote hosted URL (recommended)**
```bash theme={null}
claude mcp add firecrawl --url https://mcp.firecrawl.dev/your-api-key/v2/mcp
```
**Option B: Local (npx)**
```bash theme={null}
claude mcp add firecrawl -e FIRECRAWL_API_KEY=your-api-key -- npx -y firecrawl-mcp
```
Replace `your-api-key` with your actual Firecrawl API key.
Done! You can now search and scrape the web from Claude Code.
## Quick Demo
Try these in Claude Code:
**Search the web:**
```
Search for the latest Next.js 15 features
```
**Scrape a page:**
```
Scrape firecrawl.dev and tell me what it does
```
**Get documentation:**
```
Find and scrape the Stripe API docs for payment intents
```
Claude will automatically use Firecrawl's search and scrape tools to get the information.
# MCP Web Search & Scrape in Codex CLI
Source: https://docs.firecrawl.dev/quickstarts/codex-cli
Add Firecrawl web scraping and search to OpenAI Codex CLI
Add Firecrawl's search, scrape, crawl, and browser tools to [OpenAI Codex CLI](https://github.com/openai/codex) via MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://www.firecrawl.dev/app) and copy your API key.
### 2. Add Firecrawl to Codex
Codex reads MCP server config from `~/.codex/config.toml`. Add the Firecrawl server:
```toml theme={null}
[mcp_servers.firecrawl]
command = "npx"
args = ["-y", "firecrawl-mcp"]
[mcp_servers.firecrawl.env]
FIRECRAWL_API_KEY = "fc-YOUR-API-KEY"
```
Replace `fc-YOUR-API-KEY` with your Firecrawl API key.
### 3. Start Codex
```bash theme={null}
codex
```
Codex discovers the Firecrawl tools on launch. Confirm they are loaded:
```bash theme={null}
/mcp
```
You should see `firecrawl` listed with tools like `firecrawl_search`, `firecrawl_scrape`, `firecrawl_crawl`, and `firecrawl_extract`.
## Quick Demo
Try these prompts:
```
Search the web for the latest Next.js App Router release notes and summarize.
```
```
Scrape https://docs.firecrawl.dev and list the top-level sections.
```
```
Crawl https://example.com and save the markdown for every page under /blog.
```
## Remote Hosted URL (no Node.js required)
If you prefer not to run `npx` locally:
```toml theme={null}
[mcp_servers.firecrawl]
url = "https://mcp.firecrawl.dev/fc-YOUR-API-KEY/v2/mcp"
```
## Troubleshooting
* **Codex doesn't see the tools** — run `codex --version` to confirm you're on a version with MCP support, then restart the CLI after editing `config.toml`.
* **`spawn npx ENOENT`** — install Node.js 18+ and ensure `npx` is on your `PATH`, or switch to the remote hosted URL above.
* **401 / invalid key** — regenerate an API key at [firecrawl.dev/app/api-keys](https://www.firecrawl.dev/app/api-keys).
# MCP Web Search & Scrape in Cursor
Source: https://docs.firecrawl.dev/quickstarts/cursor
Add web scraping and search to Cursor in 2 minutes
Add web scraping and search capabilities to Cursor with Firecrawl MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key.
### 2. Add to Cursor
Open Settings (`Cmd+,`), search for "MCP", and add:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "your_api_key_here"
}
}
}
}
```
Replace `your_api_key_here` with your actual Firecrawl API key.
### 3. Restart Cursor
Done! You can now search and scrape the web from Cursor.
## Quick Demo
Try these in Cursor Chat (`Cmd+K`):
**Search:**
```
Search for TypeScript best practices 2025
```
**Scrape:**
```
Scrape firecrawl.dev and tell me what it does
```
**Get docs:**
```
Scrape the React hooks documentation and explain useEffect
```
Cursor will automatically use Firecrawl tools.
## Windows Troubleshooting
If you see a `spawn npx ENOENT` or "No server info found" error on Windows, Cursor cannot find `npx` in your PATH. Try one of these fixes:
**Option A: Use the full path to `npx.cmd`**
Run `where npx` in Command Prompt to get the full path, then update your config:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "C:\\Program Files\\nodejs\\npx.cmd",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "your_api_key_here"
}
}
}
}
```
Replace the `command` path with the output from `where npx`.
**Option B: Use the remote hosted URL (no Node.js required)**
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"url": "https://mcp.firecrawl.dev/YOUR-API-KEY/v2/mcp"
}
}
}
```
Replace `YOUR-API-KEY` with your Firecrawl API key.
# MCP Web Search & Scrape in Gemini CLI
Source: https://docs.firecrawl.dev/quickstarts/gemini-cli
Add Firecrawl web scraping and search to Google Gemini CLI
Add Firecrawl's search, scrape, crawl, and browser tools to [Google Gemini CLI](https://github.com/google-gemini/gemini-cli) via MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://www.firecrawl.dev/app) and copy your API key.
### 2. Add Firecrawl to Gemini CLI
Gemini CLI reads MCP config from `~/.gemini/settings.json` (global) or `.gemini/settings.json` in your project. Add:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "fc-YOUR-API-KEY"
}
}
}
}
```
Replace `fc-YOUR-API-KEY` with your Firecrawl API key.
### 3. Launch Gemini CLI
```bash theme={null}
gemini
```
Confirm the server is loaded:
```
/mcp list
```
You should see `firecrawl` and its tools.
## Quick Demo
```
Use firecrawl to search the web for "Gemini 2.5 context window" and summarize the top 5 results.
```
```
Scrape https://ai.google.dev/gemini-api/docs and outline the sections.
```
```
Crawl https://example.com and extract the product names from /products.
```
## Remote Hosted URL (no Node.js required)
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"url": "https://mcp.firecrawl.dev/fc-YOUR-API-KEY/v2/mcp"
}
}
}
```
## Troubleshooting
* **Tools don't show up** — restart Gemini CLI after editing `settings.json`; MCP servers are loaded at startup.
* **`spawn npx ENOENT`** — install Node.js 18+ or use the remote hosted URL.
* **Rate-limited** — upgrade your Firecrawl plan at [firecrawl.dev/pricing](https://www.firecrawl.dev/pricing).
# Nous Research
Source: https://docs.firecrawl.dev/quickstarts/nous-research
Use Firecrawl as a tool with Nous Research Hermes models.
Pair [Nous Research](https://nousresearch.com) Hermes models with Firecrawl to give Hermes live web search, scrape, and crawl.
Hermes models support OpenAI-compatible tool calls, so Firecrawl plugs in as a function the model can invoke. You can reach Hermes via the Nous Portal API, via [OpenRouter](/quickstarts/openrouter), or through Nous's Forge agent platform.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js openai zod
```
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
export NOUS_API_KEY=YOUR-NOUS-PORTAL-KEY
```
## Hermes + Firecrawl Tool Call
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// Nous Portal is OpenAI-compatible
const nous = new OpenAI({
apiKey: process.env.NOUS_API_KEY,
baseURL: 'https://inference-api.nousresearch.com/v1',
});
const SearchArgs = z.object({
query: z.string().describe('The web search query'),
limit: z.number().int().min(1).max(10).default(5),
});
const ScrapeArgs = z.object({
url: z.string().describe('The URL to scrape'),
});
const tools = [
{
type: 'function' as const,
function: {
name: 'web_search',
description: 'Search the web with Firecrawl and return top results.',
parameters: z.toJSONSchema(SearchArgs),
},
},
{
type: 'function' as const,
function: {
name: 'scrape_website',
description: 'Scrape the markdown content of a URL.',
parameters: z.toJSONSchema(ScrapeArgs),
},
},
];
const response = await nous.chat.completions.create({
model: 'Hermes-4-405B',
tools,
messages: [
{
role: 'user',
content: 'Research Firecrawl\'s /agent endpoint and cite the docs.',
},
],
});
for (const call of response.choices[0]?.message.tool_calls ?? []) {
if (call.function.name === 'web_search') {
const { query, limit } = SearchArgs.parse(JSON.parse(call.function.arguments));
const results = await firecrawl.search(query, { limit });
console.log(results.web);
}
if (call.function.name === 'scrape_website') {
const { url } = ScrapeArgs.parse(JSON.parse(call.function.arguments));
const page = await firecrawl.scrape(url, { formats: ['markdown'] });
console.log(page.markdown);
}
}
```
## Hermes via OpenRouter
Prefer a single gateway for multiple models? Route Hermes through OpenRouter:
```typescript theme={null}
const client = new OpenAI({
apiKey: process.env.OPENROUTER_API_KEY,
baseURL: 'https://openrouter.ai/api/v1',
});
await client.chat.completions.create({
model: 'nousresearch/hermes-4-405b',
// ...same tools, same messages
});
```
See the [OpenRouter guide](/quickstarts/openrouter) for the full pattern.
## Notes
* Hermes is strong at structured tool output — pair with Firecrawl's [JSON format](/features/llm-extract) to chain scrape → extract cleanly.
* For long-running agent loops, stream tool calls and use Firecrawl's [async crawl](/features/crawl) so the model isn't blocked on large scrapes.
* Verify the exact Nous Portal base URL and model slug at [portal.nousresearch.com](https://portal.nousresearch.com) — model names update as new Hermes generations ship.
# OpenClaw
Source: https://docs.firecrawl.dev/quickstarts/openclaw
Use Firecrawl with OpenClaw to give your agents web scraping, search, and browser automation capabilities.
Integrate Firecrawl with OpenClaw to give your agents the ability to scrape, search, crawl, extract, and interact with the web — all through the [Firecrawl CLI](/sdks/cli).
## Why Firecrawl + OpenClaw
* **No local browser needed** — every session runs in a remote, isolated sandbox. No Chromium installs, no driver conflicts, no RAM pressure on your machine.
* **Real parallelism** — run many browser sessions at once without local resource fights. Agents can browse in batches across multiple sites simultaneously.
* **Secure by default** — navigation, DOM evaluation, and extraction all happen inside disposable sandboxes, not on your workstation.
* **Better token economics** — agents get back clean artifacts (snapshots, extracted fields) instead of hauling giant DOMs and driver logs into the context window.
* **Full web toolkit** — scrape, search, and browser automation all through a single CLI that your agent already knows how to use.
## Setup
Tell your agent to install the Firecrawl CLI, authenticate, and initialize the skill with this command:
```bash theme={null}
npx -y firecrawl-cli init --browser --all
```
* `--all` installs the Firecrawl skill to every detected AI coding agent
* `--browser` opens the browser for Firecrawl authentication automatically
Or install everything separately:
```bash theme={null}
npm install -g firecrawl-cli
firecrawl init skills
firecrawl login --browser
# Or, skip the browser and provide your API key directly:
export FIRECRAWL_API_KEY="fc-YOUR-API-KEY"
```
Verify everything is set up correctly:
```bash theme={null}
firecrawl --status
```
Once the skill is installed, your OpenClaw agent automatically discovers and uses Firecrawl commands — no extra configuration needed.
## Scrape
Scrape a single page and extract its content:
```bash theme={null}
firecrawl https://example.com --only-main-content
```
Get specific formats:
```bash theme={null}
firecrawl https://example.com --format markdown,links --pretty
```
## Search
Search the web and optionally scrape the results:
```bash theme={null}
firecrawl search "latest AI funding rounds 2025" --limit 10
# Search and scrape the results
firecrawl search "OpenClaw documentation" --scrape --scrape-formats markdown
```
## Browser
Launch a remote browser session for interactive automation. Each session runs in an isolated sandbox — no local Chromium install required. `agent-browser` is pre-installed with 40+ commands.
```bash theme={null}
# Shorthand: auto-launches a session if none is active
firecrawl browser "open https://news.ycombinator.com"
firecrawl browser "snapshot"
firecrawl browser "scrape"
firecrawl browser close
```
Interact with page elements using refs from the snapshot:
```bash theme={null}
firecrawl browser "open https://example.com"
firecrawl browser "snapshot"
# snapshot returns @ref IDs — use them to interact
firecrawl browser "click @e5"
firecrawl browser "fill @e3 'search query'"
firecrawl browser "scrape"
firecrawl browser close
```
The shorthand form (`firecrawl browser "..."`) sends commands to `agent-browser` automatically and auto-launches a sandbox session if there isn't one active. Your agent issues intent-level actions (`open`, `click`, `fill`, `snapshot`, `scrape`) instead of writing Playwright code.
## Example: tell your agent
Here are some prompts you can give your OpenClaw agent:
* *"Use Firecrawl to scrape [https://example.com](https://example.com) and summarize the main content."*
* *"Search for the latest OpenAI news and give me a summary of the top 5 results."*
* *"Use Firecrawl Browser to open Hacker News, get the top 5 stories, and the first 10 comments on each."*
* *"Crawl the docs at [https://docs.firecrawl.dev](https://docs.firecrawl.dev) and save them to a file."*
## Further reading
* [Firecrawl CLI reference](/sdks/cli)
* [Browser Sandbox docs](/features/browser)
* [Agent docs](/features/agent)
# MCP Web Search & Scrape in OpenCode
Source: https://docs.firecrawl.dev/quickstarts/opencode
Add Firecrawl web scraping and search to OpenCode
Add Firecrawl's search, scrape, crawl, and browser tools to [OpenCode](https://opencode.ai) via MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://www.firecrawl.dev/app) and copy your API key.
### 2. Add Firecrawl to OpenCode
OpenCode reads config from `~/.config/opencode/config.json` (global) or `./opencode.json` in your project. Add:
```json theme={null}
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"firecrawl": {
"type": "local",
"command": ["npx", "-y", "firecrawl-mcp"],
"environment": {
"FIRECRAWL_API_KEY": "fc-YOUR-API-KEY"
}
}
}
}
```
Replace `fc-YOUR-API-KEY` with your Firecrawl API key.
### 3. Start OpenCode
```bash theme={null}
opencode
```
OpenCode loads MCP servers on startup. Confirm Firecrawl is attached:
```
/mcp
```
## Quick Demo
```
Search the web for "Bun 2.0 changelog" and summarize the top results.
```
```
Scrape https://docs.firecrawl.dev/introduction and list the code examples.
```
```
Crawl https://example.com/blog and save each post as markdown.
```
## Remote Hosted URL (no Node.js required)
```json theme={null}
{
"mcp": {
"firecrawl": {
"type": "remote",
"url": "https://mcp.firecrawl.dev/fc-YOUR-API-KEY/v2/mcp"
}
}
}
```
## Troubleshooting
* **Server not attached** — run `opencode doctor` to inspect MCP load errors.
* **Permission denied on `npx`** — install Node.js 18+ and ensure your shell picks up the install (`which npx`).
# OpenRouter
Source: https://docs.firecrawl.dev/quickstarts/openrouter
Use Firecrawl as a tool with any model served by OpenRouter.
Pair [OpenRouter](https://openrouter.ai) — a unified API for hundreds of LLMs — with Firecrawl to give any model live web search, scrape, and crawl.
OpenRouter's API is OpenAI-compatible, so you can use the OpenAI SDK pointed at OpenRouter's base URL plus Firecrawl's Python or JavaScript SDK as the tool.
## Setup
```bash theme={null}
npm install @mendable/firecrawl-js openai zod
```
```bash theme={null}
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
export OPENROUTER_API_KEY=sk-or-YOUR-OPENROUTER-KEY
```
## Scrape + Summarize with Any OpenRouter Model
This scrapes a page with Firecrawl and summarizes it with whatever model you pick from OpenRouter — here, Claude Haiku 4.5.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openrouter = new OpenAI({
apiKey: process.env.OPENROUTER_API_KEY,
baseURL: 'https://openrouter.ai/api/v1',
});
const scraped = await firecrawl.scrape('https://docs.firecrawl.dev', {
formats: ['markdown'],
});
const completion = await openrouter.chat.completions.create({
model: 'anthropic/claude-haiku-4.5',
messages: [
{ role: 'user', content: `Summarize in 5 bullets: ${scraped.markdown}` },
],
});
console.log(completion.choices[0]?.message.content);
```
Switch the `model` string to any [OpenRouter-supported model](https://openrouter.ai/models) — `openai/gpt-5`, `google/gemini-2.5-pro`, `meta-llama/llama-4-maverick`, etc.
## Tool Calling: Model Decides When to Scrape
OpenRouter supports OpenAI-style tool calls, so Firecrawl plugs in as a function the model can invoke.
```typescript theme={null}
import FirecrawlApp from '@mendable/firecrawl-js';
import OpenAI from 'openai';
import { z } from 'zod';
const firecrawl = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const openrouter = new OpenAI({
apiKey: process.env.OPENROUTER_API_KEY,
baseURL: 'https://openrouter.ai/api/v1',
});
const ScrapeArgs = z.object({
url: z.string().describe('The URL to scrape'),
});
const tools = [
{
type: 'function' as const,
function: {
name: 'scrape_website',
description: 'Scrape the markdown content of any URL via Firecrawl',
parameters: z.toJSONSchema(ScrapeArgs),
},
},
];
const response = await openrouter.chat.completions.create({
model: 'anthropic/claude-haiku-4.5',
tools,
messages: [
{
role: 'user',
content: 'What is Firecrawl? Visit firecrawl.dev and tell me.',
},
],
});
const call = response.choices[0]?.message.tool_calls?.[0];
if (call?.function.name === 'scrape_website') {
const { url } = ScrapeArgs.parse(JSON.parse(call.function.arguments));
const page = await firecrawl.scrape(url, { formats: ['markdown'] });
console.log(page.markdown);
}
```
## Python
```python theme={null}
import os
from firecrawl import FirecrawlApp
from openai import OpenAI
firecrawl = FirecrawlApp(api_key=os.environ["FIRECRAWL_API_KEY"])
openrouter = OpenAI(
api_key=os.environ["OPENROUTER_API_KEY"],
base_url="https://openrouter.ai/api/v1",
)
page = firecrawl.scrape("https://docs.firecrawl.dev", formats=["markdown"])
completion = openrouter.chat.completions.create(
model="anthropic/claude-haiku-4.5",
messages=[{"role": "user", "content": f"Summarize: {page.markdown}"}],
)
print(completion.choices[0].message.content)
```
## Notes
* Firecrawl is fully model-agnostic — pick any OpenRouter model without changing the scrape code.
* Many top OpenRouter apps (Cline, Roo Code, Kilo, Cursor, Continue) are themselves agent harnesses that can use Firecrawl MCP — see [MCP Server](/mcp-server) to wire Firecrawl into those directly.
* For large jobs, use [batch scrape](/features/batch-scrape) to stay within LLM context budgets.
# MCP Web Search & Scrape in Windsurf
Source: https://docs.firecrawl.dev/quickstarts/windsurf
Add web scraping and search to Windsurf in 2 minutes
Add web scraping and search capabilities to Windsurf with Firecrawl MCP.
## Quick Setup
### 1. Get Your API Key
Sign up at [firecrawl.dev/app](https://firecrawl.dev/app) and copy your API key.
### 2. Add to Windsurf
Add this to your `./codeium/windsurf/model_config.json`:
```json theme={null}
{
"mcpServers": {
"firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY"
}
}
}
}
```
Replace `YOUR_API_KEY` with your actual Firecrawl API key.
### 3. Restart Windsurf
Done! Windsurf can now search and scrape the web.
## Quick Demo
Try these in Windsurf:
**Search:**
```
Search for the latest Tailwind CSS features
```
**Scrape:**
```
Scrape firecrawl.dev and explain what it does
```
**Get docs:**
```
Find and scrape the Supabase authentication documentation
```
Windsurf's AI agents will automatically use Firecrawl tools.
# Rate Limits
Source: https://docs.firecrawl.dev/rate-limits
Rate limits for different pricing plans and API requests
When you exceed a rate or concurrency limit, the API returns a `429` response. See [Errors](/api-reference/errors) for the full error catalog and a retry-with-backoff snippet.
## Billing Model
Firecrawl uses subscription-based monthly plans. There is no pure pay-as-you-go model, but the **auto-recharge feature** provides flexible scaling. Once you subscribe to a plan, you can automatically purchase additional credits when you dip below a threshold. Larger auto-recharge packs offer better rates. To test before committing to a larger plan, start with the Free or Hobby tier.
Plan downgrades take effect at the next renewal; no credit is issued for unused time.
## Concurrent Browser Limits
Concurrent browsers control how many pages Firecrawl can process for you in parallel. Your plan sets the ceiling; any jobs beyond it wait in a queue until a browser frees up.
Time spent in the queue counts against the request's [`timeout`](/advanced-scraping-guide#timing-and-cache) parameter, so you can set a lower timeout to fail fast instead of waiting. To see current availability before sending work, call the [Queue Status](/api-reference/endpoint/queue-status) endpoint. Jobs that are waiting in your concurrency queue will time out after a maximum of 48 hours.
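For example, to fail fast instead of waiting in the queue, set a lower `timeout` on the request. A minimal sketch with the Python SDK, assuming the `timeout` scrape option (in milliseconds) described in the advanced scraping guide:
```python theme={null}
from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR-API-KEY")

# The timeout budget also covers time spent queued for a free browser,
# so a 15-second budget fails fast instead of waiting for capacity.
doc = firecrawl.scrape("https://example.com", timeout=15000)
```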
### Current Plans
| Plan               | Concurrent Browsers | Max Queued Jobs |
| ------------------ | ------------------- | --------------- |
| Free               | 2                   | 50,000          |
| Hobby              | 5                   | 50,000          |
| Standard           | 50                  | 100,000         |
| Growth             | 100                 | 200,000         |
| Scale / Enterprise | 150+                | 300,000+        |
Each team has a maximum number of jobs that can be waiting in the concurrency queue. If you exceed this limit, new jobs will be rejected with a `429` status code until existing jobs complete. For larger plans with custom concurrency limits, the max queued jobs is 2,000 times your concurrency limit, capped at 2,000,000.
If you require higher concurrency limits, [contact us about enterprise plans](https://firecrawl.dev/enterprise).
### Extract Plans (Legacy)
| Plan     | Concurrent Browsers | Max Queued Jobs |
| -------- | ------------------- | --------------- |
| Free     | 2                   | 50,000          |
| Starter  | 50                  | 100,000         |
| Explorer | 100                 | 200,000         |
| Pro      | 200                 | 400,000         |
## API Rate Limits
Rate limits are measured in requests per minute and are primarily in place to prevent abuse. When configured correctly, your real bottleneck will be concurrent browsers. Rate limits are applied per team, so all API keys on the same team share the same rate limit counters.
### Current Plans
| Plan     | /scrape | /map | /crawl | /search | /agent | /crawl/status | /agent/status |
| -------- | ------- | ---- | ------ | ------- | ------ | ------------- | ------------- |
| Free     | 10      | 10   | 1      | 5       | 10     | 1500          | 500           |
| Hobby    | 100     | 100  | 15     | 50      | 100    | 1500          | 25000         |
| Standard | 500     | 500  | 50     | 250     | 500    | 1500          | 25000         |
| Growth   | 5000    | 5000 | 250    | 2500    | 1000   | 1500          | 25000         |
| Scale    | 7500    | 7500 | 750    | 7500    | 1000   | 25000         | 25000         |
These rate limits are enforced to ensure fair usage and availability of the API for all users. If you require higher limits, please contact us at [help@firecrawl.com](mailto:help@firecrawl.com) to discuss custom plans.
### Extract Endpoints
The extract endpoints share the corresponding `/agent` rate limits.
### Batch Scrape Endpoints
The batch scrape endpoints share the corresponding `/crawl` rate limits.
### Browser Sandbox
The browser sandbox endpoints have per-plan rate limits that scale with your subscription:
| Plan     | /browser | /browser/{id}/execute |
| -------- | -------- | --------------------- |
| Free     | 2        | 10                    |
| Hobby    | 20       | 100                   |
| Standard | 100      | 500                   |
| Growth   | 1,000    | 5,000                 |
| Scale    | 1,500    | 7,500                 |
In addition, each team's plan determines how many browser sessions can be active simultaneously (see Concurrent Browser Limits above). If you exceed this limit, new session requests will return a `429` status code until existing sessions are destroyed.
### FIRE-1 Agent
Requests involving the FIRE-1 agent have separate rate limits, counted independently for each endpoint:
| Endpoint | Rate Limit (requests/min) |
| -------- | ------------------------- |
| /scrape  | 10                        |
| /extract | 10                        |
### Extract Plans (Legacy)
| Plan     | /extract (requests/min) | /extract/status (requests/min) |
| -------- | ----------------------- | ------------------------------ |
| Starter  | 100                     | 25000                          |
| Explorer | 500                     | 25000                          |
| Pro      | 1000                    | 25000                          |
# Skill + CLI
Source: https://docs.firecrawl.dev/sdks/cli
Firecrawl Skill is an easy way for AI agents such as Claude Code, Antigravity, and OpenCode to use Firecrawl through the CLI.
Search, scrape, and interact with the web directly from the terminal. The Firecrawl CLI works standalone or as a skill that AI coding agents like Claude Code, Antigravity, and OpenCode can discover and use automatically.
## Installation
If you are using an AI agent like Claude Code, you can install the Firecrawl skill below and the agent will set it up for you.
```bash theme={null}
npx -y firecrawl-cli@latest init --all --browser
```
* `--all` installs the Firecrawl skill to every detected AI coding agent
* `--browser` opens the browser for Firecrawl authentication automatically
After installing the skill, restart your agent for it to discover the new skill.
You can also manually install the Firecrawl CLI globally using npm:
```bash CLI theme={null}
# Install globally with npm
npm install -g firecrawl-cli
```
## Authentication
Before using the CLI, you need to authenticate with your Firecrawl API key.
### Login
```bash CLI theme={null}
# Interactive login (opens browser or prompts for API key)
firecrawl login
# Login with browser authentication (recommended for agents)
firecrawl login --browser
# Login with API key directly
firecrawl login --api-key fc-YOUR-API-KEY
# Or set via environment variable
export FIRECRAWL_API_KEY=fc-YOUR-API-KEY
```
### View Configuration
```bash CLI theme={null}
# View current configuration and authentication status
firecrawl view-config
```
### Logout
```bash CLI theme={null}
# Clear stored credentials
firecrawl logout
```
### Self-Hosted / Local Development
For self-hosted Firecrawl instances or local development, use the `--api-url` option:
```bash CLI theme={null}
# Use a local Firecrawl instance (no API key required)
firecrawl --api-url http://localhost:3002 scrape https://example.com
# Or set via environment variable
export FIRECRAWL_API_URL=http://localhost:3002
firecrawl scrape https://example.com
# Configure and persist the custom API URL
firecrawl config --api-url http://localhost:3002
```
When using a custom API URL (anything other than `https://api.firecrawl.dev`), API key authentication is automatically skipped, allowing you to use local instances without an API key.
### Check Status
Verify installation, authentication, and view rate limits:
```bash CLI theme={null}
firecrawl --status
```
Output when ready:
```
🔥 firecrawl cli v1.1.1
● Authenticated via FIRECRAWL_API_KEY
Concurrency: 0/100 jobs (parallel scrape limit)
Credits: 500,000 remaining
```
* **Concurrency**: the maximum number of parallel jobs your plan allows. Run parallel operations close to this limit, but not above it (see the sketch after this list).
* **Credits**: Remaining API credits. Each scrape/crawl consumes credits.
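To keep bulk work under that limit, cap how many CLI invocations run at once. A minimal sketch using `xargs`; `urls.txt` is a hypothetical file with one URL per line, and `50` stands in for a value safely below your reported concurrency:
```bash CLI theme={null}
# Run at most 50 scrapes at a time, staying below the plan's parallel
# scrape limit. Output from concurrent jobs may interleave on stdout,
# so redirect per-URL output if you need clean files.
xargs -P 50 -I {} firecrawl scrape {} --only-main-content < urls.txt
```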
## Commands
### Scrape
Scrape a single URL and extract its content in various formats.
Use `--only-main-content` to get clean output without navigation, footers, and ads. This is recommended for most use cases where you want just the article or main page content.
```bash CLI theme={null}
# Scrape a URL (default: markdown output)
firecrawl https://example.com
# Or use the explicit scrape command
firecrawl scrape https://example.com
# Recommended: use --only-main-content for clean output without nav/footer
firecrawl https://example.com --only-main-content
```
#### Output Formats
```bash CLI theme={null}
# Get HTML output
firecrawl https://example.com --html
# Multiple formats (returns JSON)
firecrawl https://example.com --format markdown,links
# Get images from a page
firecrawl https://example.com --format images
# Get a summary of the page content
firecrawl https://example.com --format summary
# Track changes on a page (changeTracking requires markdown in the formats list)
firecrawl https://example.com --format markdown,changeTracking
# Available formats: markdown, html, rawHtml, links, screenshot, json, images, summary, changeTracking, attributes, branding
```
#### Scrape Options
```bash CLI theme={null}
# Extract only main content (removes navs, footers)
firecrawl https://example.com --only-main-content
# Wait for JavaScript rendering
firecrawl https://example.com --wait-for 3000
# Take a screenshot
firecrawl https://example.com --screenshot
# Include/exclude specific HTML tags
firecrawl https://example.com --include-tags article,main
firecrawl https://example.com --exclude-tags nav,footer
# Save output to file
firecrawl https://example.com -o output.md
# Pretty print JSON output
firecrawl https://example.com --format markdown,links --pretty
# Force JSON output even with single format
firecrawl https://example.com --json
# Show request timing information
firecrawl https://example.com --timing
```
**Available Options:**
| Option                  | Short | Description                                                                                                                                                      |
| ----------------------- | ----- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--url <url>`           | `-u`  | URL to scrape (alternative to positional argument)                                                                                                               |
| `--format <formats>`    | `-f`  | Output formats (comma-separated): `markdown`, `html`, `rawHtml`, `links`, `screenshot`, `json`, `images`, `summary`, `changeTracking`, `attributes`, `branding`  |
| `--html`                | `-H`  | Shortcut for `--format html`                                                                                                                                     |
| `--only-main-content`   |       | Extract only main content                                                                                                                                        |
| `--wait-for <ms>`       |       | Wait time in milliseconds for JS rendering                                                                                                                       |
| `--screenshot`          |       | Take a screenshot                                                                                                                                                |
| `--include-tags <tags>` |       | HTML tags to include (comma-separated)                                                                                                                           |
| `--exclude-tags <tags>` |       | HTML tags to exclude (comma-separated)                                                                                                                           |
| `--output <file>`       | `-o`  | Save output to file                                                                                                                                              |
| `--json`                |       | Force JSON output even with single format                                                                                                                        |
| `--pretty`              |       | Pretty print JSON output                                                                                                                                         |
| `--timing`              |       | Show request timing and other useful information                                                                                                                 |
***
### Search
Search the web and optionally scrape the results.
```bash CLI theme={null}
# Search the web
firecrawl search "web scraping tutorials"
# Limit results
firecrawl search "AI news" --limit 10
# Pretty print results
firecrawl search "machine learning" --pretty
```
#### Search Options
```bash CLI theme={null}
# Search specific sources
firecrawl search "AI" --sources web,news,images
# Search with category filters
firecrawl search "react hooks" --categories github
firecrawl search "machine learning" --categories research,pdf
# Time-based filtering
firecrawl search "tech news" --tbs qdr:h # Last hour
firecrawl search "tech news" --tbs qdr:d # Last day
firecrawl search "tech news" --tbs qdr:w # Last week
firecrawl search "tech news" --tbs qdr:m # Last month
firecrawl search "tech news" --tbs qdr:y # Last year
# Location-based search
firecrawl search "restaurants" --location "Berlin,Germany" --country DE
# Search and scrape results
firecrawl search "documentation" --scrape --scrape-formats markdown
# Save to file
firecrawl search "firecrawl" --pretty -o results.json
```
**Available Options:**
| Option                       | Description                                                                                  |
| ---------------------------- | -------------------------------------------------------------------------------------------- |
| `--limit <number>`           | Maximum results (default: 5, max: 100)                                                       |
| `--sources <sources>`        | Sources to search: `web`, `images`, `news` (comma-separated)                                 |
| `--categories <categories>`  | Filter by category: `github`, `research`, `pdf` (comma-separated)                            |
| `--tbs <filter>`             | Time filter: `qdr:h` (hour), `qdr:d` (day), `qdr:w` (week), `qdr:m` (month), `qdr:y` (year)  |
| `--location <location>`      | Geo-targeting (e.g., "Berlin,Germany")                                                       |
| `--country <code>`           | ISO country code (default: US)                                                               |
| `--timeout <ms>`             | Timeout in milliseconds (default: 60000)                                                     |
| `--ignore-invalid-urls`      | Exclude URLs invalid for other Firecrawl endpoints                                           |
| `--scrape`                   | Scrape search results                                                                        |
| `--scrape-formats <formats>` | Formats for scraped content (default: markdown)                                              |
| `--only-main-content`        | Include only main content when scraping (default: true)                                      |
| `--json`                     | Output as JSON                                                                               |
| `--output <file>`            | Save output to file                                                                          |
| `--pretty`                   | Pretty print JSON output                                                                     |
***
### Map
Discover all URLs on a website quickly.
```bash CLI theme={null}
# Discover all URLs on a website
firecrawl map https://example.com
# Output as JSON
firecrawl map https://example.com --json
# Limit number of URLs
firecrawl map https://example.com --limit 500
```
#### Map Options
```bash CLI theme={null}
# Filter URLs by search query
firecrawl map https://example.com --search "blog"
# Include subdomains
firecrawl map https://example.com --include-subdomains
# Control sitemap usage
firecrawl map https://example.com --sitemap include # Use sitemap
firecrawl map https://example.com --sitemap skip # Skip sitemap
firecrawl map https://example.com --sitemap only # Only use sitemap
# Ignore query parameters (dedupe URLs)
firecrawl map https://example.com --ignore-query-parameters
# Wait for map to complete with timeout
firecrawl map https://example.com --wait --timeout 60
# Save to file
firecrawl map https://example.com -o urls.txt
firecrawl map https://example.com --json --pretty -o urls.json
```
**Available Options:**
| Option                      | Description                                                 |
| --------------------------- | ----------------------------------------------------------- |
| `--url <url>`               | URL to map (alternative to positional argument)             |
| `--limit <number>`          | Maximum URLs to discover                                    |
| `--search <query>`          | Filter URLs by search query                                 |
| `--sitemap <mode>`          | Sitemap handling: `include`, `skip`, `only`                 |
| `--include-subdomains`      | Include subdomains                                          |
| `--ignore-query-parameters` | Treat URLs that differ only in query parameters as the same |
| `--wait`                    | Wait for map to complete                                    |
| `--timeout <seconds>`       | Timeout in seconds                                          |
| `--json`                    | Output as JSON                                              |
| `--output <file>`           | Save output to file                                         |
| `--pretty`                  | Pretty print JSON output                                    |
***
### Browser
Have your agents interact with the web using a secure browser sandbox.
Launch cloud browser sessions and execute Python, JavaScript, or bash code remotely. Each session runs a full Chromium instance — no local browser install required. Code runs server-side with a pre-configured [Playwright](https://playwright.dev/) `page` object ready to use.
```bash CLI theme={null}
# Launch a cloud browser session
firecrawl browser launch-session
# Execute agent-browser commands (default - "agent-browser" is auto-prefixed)
firecrawl browser execute "open https://example.com"
firecrawl browser execute "snapshot"
firecrawl browser execute "click @e5"
firecrawl browser execute "scrape"
# Execute Playwright Python code
firecrawl browser execute --python 'await page.goto("https://example.com")
print(await page.title())'
# Execute Playwright JavaScript code
firecrawl browser execute --node 'await page.goto("https://example.com"); console.log(await page.title());'
# List all sessions (or: list active / list destroyed)
firecrawl browser list
# Close the active session
firecrawl browser close
```
#### Browser Options
```bash CLI theme={null}
# Launch with custom TTL (10 minutes) and live view
firecrawl browser launch-session --ttl 600 --stream
# Launch with inactivity timeout
firecrawl browser launch-session --ttl 120 --ttl-inactivity 60
# agent-browser commands (default - "agent-browser" is auto-prefixed)
firecrawl browser execute "open https://news.ycombinator.com"
firecrawl browser execute "snapshot"
firecrawl browser execute "click @e3"
firecrawl browser execute "scrape"
# Playwright Python - navigate, interact, extract
firecrawl browser execute --python '
await page.goto("https://news.ycombinator.com")
items = await page.query_selector_all(".titleline > a")
for item in items[:5]:
print(await item.text_content())
'
# Playwright JavaScript - same page object
firecrawl browser execute --node '
await page.goto("https://example.com");
const title = await page.title();
console.log(title);
'
# Explicit bash mode - runs in the sandbox
firecrawl browser execute --bash "agent-browser snapshot"
# Target a specific session
firecrawl browser execute --session <id> --python 'print(await page.title())'
# Save output to file
firecrawl browser execute "scrape" -o result.txt
# Close a specific session
firecrawl browser close --session <id>
# List sessions (all / active / destroyed)
firecrawl browser list
firecrawl browser list active --json
```
**Subcommands:**
| Subcommand | Description |
| ---------------- | ----------------------------------------------------------------------------------- |
| `launch-session` | Launch a new cloud browser session (returns session ID, CDP URL, and live view URL) |
| `execute ` | Execute Playwright Python/JS code or bash commands in a session |
| `list [status]` | List browser sessions (filter by `active` or `destroyed`) |
| `close` | Close a browser session |
**Execute Options:**
| Option | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `--bash` | Execute bash commands remotely in the sandbox (default). [agent-browser](https://github.com/vercel-labs/agent-browser) (40+ commands) is pre-installed and auto-prefixed. `CDP_URL` is auto-injected so agent-browser connects to your session automatically. Best approach for AI agents. |
| `--python` | Execute as Playwright Python code. A Playwright `page` object is available — use `await page.goto()`, `await page.title()`, etc. |
| `--node` | Execute as Playwright JavaScript code. Same `page` object available. |
| `--session <id>` | Target a specific session (default: active session)                                                                                                                                                                                                                                          |
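Because `CDP_URL` is injected as an environment variable, you can inspect it from bash mode; a minimal check (assumes the variable is visible to shell commands in the sandbox):
```bash CLI theme={null}
# Print the CDP endpoint injected for the active session
firecrawl browser execute --bash 'echo "$CDP_URL"'
```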
**Launch Options:**
| Option | Description |
| ---------------------------- | ------------------------------------------------------------------- |
| `--ttl <seconds>`            | Total session TTL in seconds (default: 600, range: 30–3600)          |
| `--ttl-inactivity <seconds>` | Auto-close after inactivity in seconds (range: 10–3600)              |
| `--profile <name>`           | Name for a profile (saves and reuses browser state across sessions)  |
| `--no-save-changes` | Load existing profile data without saving changes back |
| `--stream` | Enable live view streaming |
**Common Options:**
| Option | Description |
| ----------------- | --------------------- |
| `--output <file>` | Save output to file   |
| `--json` | Output as JSON format |
***
### Crawl
Crawl an entire website starting from a URL.
```bash CLI theme={null}
# Start a crawl (returns job ID immediately)
firecrawl crawl https://example.com
# Wait for crawl to complete
firecrawl crawl https://example.com --wait
# Wait with progress indicator
firecrawl crawl https://example.com --wait --progress
```
#### Check Crawl Status
```bash CLI theme={null}
# Check crawl status using job ID
firecrawl crawl <job-id>
# Example with a real job ID
firecrawl crawl 550e8400-e29b-41d4-a716-446655440000
```
#### Crawl Options
```bash CLI theme={null}
# Limit crawl depth and pages
firecrawl crawl https://example.com --limit 100 --max-depth 3 --wait
# Include only specific paths
firecrawl crawl https://example.com --include-paths /blog,/docs --wait
# Exclude specific paths
firecrawl crawl https://example.com --exclude-paths /admin,/login --wait
# Include subdomains
firecrawl crawl https://example.com --allow-subdomains --wait
# Crawl entire domain
firecrawl crawl https://example.com --crawl-entire-domain --wait
# Rate limiting
firecrawl crawl https://example.com --delay 1000 --max-concurrency 2 --wait
# Custom polling interval and timeout
firecrawl crawl https://example.com --wait --poll-interval 10 --timeout 300
# Save results to file
firecrawl crawl https://example.com --wait --pretty -o results.json
```
**Available Options:**
| Option                      | Description                                                 |
| --------------------------- | ----------------------------------------------------------- |
| `--url <url>`               | URL to crawl (alternative to positional argument)           |
| `--wait`                    | Wait for crawl to complete                                  |
| `--progress`                | Show progress indicator while waiting                       |
| `--poll-interval <seconds>` | Polling interval in seconds (default: 5)                    |
| `--timeout <seconds>`       | Timeout when waiting                                        |
| `--status`                  | Check status of existing crawl job                          |
| `--limit <number>`          | Maximum pages to crawl                                      |
| `--max-depth <number>`      | Maximum crawl depth                                         |
| `--include-paths <paths>`   | Paths to include (comma-separated)                          |
| `--exclude-paths <paths>`   | Paths to exclude (comma-separated)                          |
| `--sitemap <mode>`          | Sitemap handling: `include`, `skip`, `only`                 |
| `--allow-subdomains`        | Include subdomains                                          |
| `--allow-external-links`    | Follow external links                                       |
| `--crawl-entire-domain`     | Crawl entire domain                                         |
| `--ignore-query-parameters` | Treat URLs that differ only in query parameters as the same |
| `--delay <number>`          | Delay between requests                                      |
| `--max-concurrency <number>`| Maximum concurrent requests                                 |
| `--output <file>`           | Save output to file                                         |
| `--pretty`                  | Pretty print JSON output                                    |
***
### Agent
Search and gather data from the web using natural language prompts.
```bash CLI theme={null}
# Basic usage - URLs are optional
firecrawl agent "Find the top 5 AI startups and their funding amounts" --wait
# Focus on specific URLs
firecrawl agent "Compare pricing plans" --urls https://slack.com/pricing,https://teams.microsoft.com/pricing --wait
# Use a schema for structured output
firecrawl agent "Get company information" --urls https://example.com --schema '{"name": "string", "founded": "number"}' --wait
# Use schema from a file
firecrawl agent "Get product details" --urls https://example.com --schema-file schema.json --wait
```
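The file passed to `--schema-file` holds the same JSON you would pass inline with `--schema`. A sketch with a hypothetical `schema.json` (the field names are illustrative):
```bash CLI theme={null}
# Write a hypothetical schema.json, then point --schema-file at it
cat > schema.json <<'EOF'
{
  "name": "string",
  "price": "number"
}
EOF
firecrawl agent "Get product details" --urls https://example.com --schema-file schema.json --wait
```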
#### Agent Options
```bash CLI theme={null}
# Use Spark 1 Pro for higher accuracy
firecrawl agent "Competitive analysis across multiple domains" --model spark-1-pro --wait
# Set max credits to limit costs
firecrawl agent "Gather contact information from company websites" --max-credits 100 --wait
# Check status of an existing job
firecrawl agent --status <job-id>
# Custom polling interval and timeout
firecrawl agent "Summarize recent blog posts" --wait --poll-interval 10 --timeout 300
# Save output to file
firecrawl agent "Find pricing information" --urls https://example.com --wait -o pricing.json --pretty
```
**Available Options:**
| Option                      | Description                                                                             |
| --------------------------- | ---------------------------------------------------------------------------------------- |
| `--urls <urls>`             | Optional list of URLs to focus the agent on (comma-separated)                            |
| `--model <model>`           | Model to use: `spark-1-mini` (default, 60% cheaper) or `spark-1-pro` (higher accuracy)   |
| `--schema <json>`           | JSON schema for structured output (inline JSON string)                                   |
| `--schema-file <path>`      | Path to JSON schema file for structured output                                           |
| `--max-credits <number>`    | Maximum credits to spend (job fails if limit reached)                                    |
| `--status`                  | Check status of existing agent job                                                       |
| `--wait`                    | Wait for agent to complete before returning results                                      |
| `--poll-interval <seconds>` | Polling interval when waiting (default: 5)                                               |
| `--timeout <seconds>`       | Timeout when waiting (default: no timeout)                                               |
| `--output <file>`           | Save output to file                                                                      |
| `--json`                    | Output as JSON format                                                                    |
***
### Credit Usage
Check your team's credit balance and usage.
```bash CLI theme={null}
# View credit usage
firecrawl credit-usage
# Output as JSON
firecrawl credit-usage --json --pretty
```
***
### Version
Display the CLI version.
```bash CLI theme={null}
firecrawl version
# or
firecrawl --version
```
## Global Options
These options are available for all commands:
| Option | Short | Description |
| ----------------- | ----- | ------------------------------------------------------ |
| `--status` | | Show version, auth, concurrency, and credits |
| `--api-key <key>` | `-k`  | Override stored API key for this command               |
| `--api-url <url>` |       | Use custom API URL (for self-hosted/local development) |
| `--help` | `-h` | Show help for a command |
| `--version` | `-V` | Show CLI version |
## Output Handling
The CLI outputs to stdout by default, making it easy to pipe or redirect:
```bash CLI theme={null}
# Pipe markdown to another command
firecrawl https://example.com | head -50
# Redirect to a file
firecrawl https://example.com > output.md
# Save JSON with pretty formatting
firecrawl https://example.com --format markdown,links --pretty -o data.json
```
### Format Behavior
* **Single format**: Outputs raw content (markdown text, HTML, etc.)
* **Multiple formats**: Outputs JSON with all requested data
```bash CLI theme={null}
# Raw markdown output
firecrawl https://example.com --format markdown
# JSON output with multiple formats
firecrawl https://example.com --format markdown,links
```
## Examples
### Quick Scrape
```bash CLI theme={null}
# Get markdown content from a URL (use --only-main-content for clean output)
firecrawl https://docs.firecrawl.dev --only-main-content
# Get HTML content
firecrawl https://example.com --html -o page.html
```
### Full Site Crawl
```bash CLI theme={null}
# Crawl a docs site with limits
firecrawl crawl https://docs.example.com --limit 50 --max-depth 2 --wait --progress -o docs.json
```
### Site Discovery
```bash CLI theme={null}
# Find all blog posts
firecrawl map https://example.com --search "blog" -o blog-urls.txt
```
### Research Workflow
```bash CLI theme={null}
# Search and scrape results for research
firecrawl search "machine learning best practices 2024" --scrape --scrape-formats markdown --pretty
```
### Agent
```bash CLI theme={null}
# URLs are optional
firecrawl agent "Find the top 5 AI startups and their funding amounts" --wait
# Focus on specific URLs
firecrawl agent "Compare pricing plans" --urls https://slack.com/pricing,https://teams.microsoft.com/pricing --wait
```
### Browser Automation
```bash CLI theme={null}
# Launch a session, scrape a page, and close
firecrawl browser launch-session
firecrawl browser execute "open https://news.ycombinator.com"
firecrawl browser execute "snapshot"
firecrawl browser execute "scrape"
firecrawl browser close
# Use agent-browser via bash mode (default — recommended for AI agents)
firecrawl browser launch-session
firecrawl browser execute "open https://example.com"
firecrawl browser execute "snapshot"
# snapshot returns @ref IDs — use them to interact
firecrawl browser execute "click @e5"
firecrawl browser execute "fill @e3 'search query'"
firecrawl browser execute "scrape"
# Run --help to see all 40+ commands
firecrawl browser execute --bash "agent-browser --help"
firecrawl browser close
```
### Combine with Other Tools
```bash CLI theme={null}
# Save search results to a file, then extract URLs with jq
firecrawl search "firecrawl" --json --pretty -o search-results.json
jq -r '.data.web[].url' search-results.json
# Get titles from search results
jq -r '.data.web[] | "\(.title): \(.url)"' search-results.json
# Extract links and process with jq
firecrawl https://example.com --format links | jq '.links[].url'
# Count URLs from map
firecrawl map https://example.com | wc -l
```
## Telemetry
The CLI collects anonymous usage data during authentication to help improve the product:
* CLI version, OS, and Node.js version
* Development tool detection (e.g., Cursor, VS Code, Claude Code)
**No command data, URLs, or file contents are collected via the CLI.**
To disable telemetry, set the environment variable:
```bash CLI theme={null}
export FIRECRAWL_NO_TELEMETRY=1
```
## Open Source
The Firecrawl CLI and Skill are open source and available on GitHub: [firecrawl/cli](https://github.com/firecrawl/cli)
> Are you an AI agent that needs a Firecrawl API key? See [firecrawl.dev/agent-onboarding/SKILL.md](https://www.firecrawl.dev/agent-onboarding/SKILL.md) for automated onboarding instructions.
# .NET
Source: https://docs.firecrawl.dev/sdks/dotnet
Firecrawl .NET SDK is a wrapper around the Firecrawl API to help you easily turn websites into markdown.
## Installation
The official .NET SDK is maintained in the Firecrawl monorepo at [apps/dot-net-sdk](https://github.com/firecrawl/firecrawl/tree/main/apps/dot-net-sdk).
To install the Firecrawl .NET SDK, add the NuGet package:
```bash theme={null}
dotnet add package firecrawl-sdk
```
```powershell theme={null}
Install-Package firecrawl-sdk
```
```xml theme={null}
<!-- Or reference the package directly in your .csproj (pin a specific version as needed) -->
<PackageReference Include="firecrawl-sdk" Version="*" />
```
Requires .NET 8.0 or later.
## Usage
1. Get an API key from [firecrawl.dev](https://firecrawl.dev)
2. Set the API key as an environment variable named `FIRECRAWL_API_KEY`, or pass it to the `FirecrawlClient` constructor
Here is a quick example using the current SDK API surface:
```csharp theme={null}
using Firecrawl;
using Firecrawl.Models;
var client = new FirecrawlClient("fc-your-api-key");
// Scrape a single page
var doc = await client.ScrapeAsync("https://firecrawl.dev",
new ScrapeOptions { Formats = new List