Build autonomous web search agents and research assistants that can find, analyze, and synthesize information from millions of web pages using natural language.

Before you start

Make sure you have:
  • Python 3.9 or later
  • LangChain installed (pip install langchain-core)
  • CatchAll API key (obtain from platform.newscatcherapi.com)
  • Basic familiarity with LangChain concepts (agents, tools, LLMs)

Installation

    pip install langchain-catchall

Quickstart

Submit a search query and get structured results:
import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

# Search and wait for results (10-15 minutes)
result = client.search("Semiconductor company earnings announcements")

print(f"Found {result.valid_records} records")
for record in result.all_records[:3]:
    print(f"- {record.record_title}")
Jobs typically complete in 10-15 minutes. The search() method handles submission, polling, and retrieval automatically.

CatchAllClient

CatchAllClient wraps the CatchAll Python SDK with LangChain-friendly patterns. Use it for manual control in scripts, data pipelines, and async applications.

Initialize client

import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    poll_interval=30,    # Check status every 30 seconds (recommended: 30-60s)
    max_wait_time=2400,  # Timeout after 40 minutes (typical jobs: 10-15 min)
)

Submit job

Create a new search job:
job_id = client.submit_job(
    query="AI company acquisitions and mergers",
    context="Focus on deal size and technology sector",
    schema="[ACQUIRER] acquired [TARGET] for [AMOUNT]",
)
print(f"Job submitted: {job_id}")

Wait for completion

Block until job finishes:
client.wait_for_completion(job_id)
print("Job completed!")
Raises TimeoutError if job exceeds max_wait_time.
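Under the hood, waiting boils down to a poll loop. Below is a minimal sketch of that pattern with a stubbed status function; it is illustrative only, not the SDK's actual implementation (the helper name wait_for_job is hypothetical):

```python
import time

def wait_for_job(get_status, poll_interval=30, max_wait_time=2400):
    """Poll get_status() until it reports 'completed' or the deadline passes."""
    deadline = time.monotonic() + max_wait_time
    while time.monotonic() < deadline:
        if get_status() == "completed":
            return
        time.sleep(poll_interval)
    raise TimeoutError(f"Job did not complete within {max_wait_time}s")

# Stubbed status source: the job completes on the third poll
states = iter(["pending", "running", "completed"])
wait_for_job(lambda: next(states), poll_interval=0, max_wait_time=5)
print("Job completed!")
```

The trade-off is the usual one: a shorter poll_interval notices completion sooner but issues more status requests, which is why 30-60 seconds is recommended for jobs that run 10-15 minutes.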

Retrieve results

Get structured records:
# Get first page
result = client.get_results(job_id, page=1, page_size=100)

# Get all pages
result = client.get_all_results(job_id)

for record in result.all_records:
    print(f"Title: {record.record_title}")
    print(f"Data: {record.enrichment}")
    print(f"Sources: {len(record.citations)} articles")
Combine submit, wait, and retrieve in one call:
result = client.search(
    query="Data breach incidents at financial institutions",
    context="Include incident type and affected customer count",
)

print(f"Found {result.valid_records} records")
Set wait=False to submit the job and return immediately:
result = client.search("FDA drug approvals for oncology treatments", wait=False)
print(f"Job ID: {result.job_id}")
# Retrieve later with client.get_all_results(result.job_id)
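Conceptually, fetching all pages amounts to requesting successive pages until a short page comes back. A stubbed sketch of that pagination pattern (the helper names and stub data are illustrative, not the SDK's internals):

```python
def get_all_records(fetch_page, page_size=100):
    """Aggregate records across pages until a page comes back short."""
    records, page = [], 1
    while True:
        batch = fetch_page(page, page_size)
        records.extend(batch)
        if len(batch) < page_size:
            return records
        page += 1

# Stub: 250 records served in pages of 100
data = [f"record-{i}" for i in range(250)]
fetch = lambda page, size: data[(page - 1) * size : page * size]
print(len(get_all_records(fetch)))  # → 250
```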

List jobs

View all jobs for your API key:
jobs = client.list_jobs()

for job in jobs:
    print(f"{job.job_id}: {job.query}")

Advanced patterns

Store job ID for later retrieval (useful for data pipelines):
import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

# Submit and store job_id for later retrieval
job_id = client.submit_job("Technology company IPO filings")

# Store job_id (example using a dict - replace with your database)
job_cache = {}
job_cache["ipo_tracker"] = job_id

# Later: Check if completed and retrieve
status = client.get_status(job_id)
completed = any(s.status == 'completed' and s.completed for s in status.steps)

if completed:
    result = client.get_all_results(job_id)
    print(f"Retrieved {result.valid_records} records from cached job")
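For pipelines that span processes, the in-memory dict above won't survive a restart. A minimal sketch that persists job IDs to a JSON file instead (the file path and helper names are illustrative; swap in your own database for production use):

```python
import json
import os
import tempfile

# Hypothetical cache location; replace with durable storage in production
cache_path = os.path.join(tempfile.gettempdir(), "catchall_jobs.json")

def save_job(name, job_id):
    """Merge one name -> job_id entry into the JSON cache file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    cache[name] = job_id
    with open(cache_path, "w") as f:
        json.dump(cache, f)

def load_job(name):
    """Look a stored job_id back up by name."""
    with open(cache_path) as f:
        return json.load(f)[name]

save_job("ipo_tracker", "job-123")
print(load_job("ipo_tracker"))  # → job-123
```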

CatchAllTools

CatchAllTools provides ready-to-use tools for LangGraph agents with built-in caching. Search once, then analyze many times without additional API costs.

Initialize toolkit

import os
from langchain_openai import ChatOpenAI
from langchain_catchall import CatchAllTools

llm = ChatOpenAI(model="gpt-4o")

toolkit = CatchAllTools(
    api_key=os.environ["CATCHALL_API_KEY"],
    llm=llm,
    max_results=100,     # Balance between context size and completeness
    verbose=True,        # Show progress bars and logs
)

tools = toolkit.get_tools()

Available tools

The toolkit provides two tools:
  • catchall_search_data: Initialize new search (10-15 min operation).
  • catchall_analyze_data: Query cached results (instant).
The bundled CATCHALL_AGENT_PROMPT teaches the agent when to start a new search versus when to analyze cached results. Using this prompt is critical for cost-effective operation.
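The split between the two tools reflects a cache-in-front-of-search design: only cache misses pay the 10-15 minute search cost. A toy sketch of that idea (illustrative only, not the toolkit's actual internals):

```python
class SearchCache:
    """Cache expensive searches by query so repeat lookups are free."""

    def __init__(self, search_fn):
        self.search_fn = search_fn
        self.cache = {}
        self.calls = 0  # counts real (cache-miss) searches

    def search(self, query):
        if query not in self.cache:
            self.calls += 1  # only cache misses hit the expensive backend
            self.cache[query] = self.search_fn(query)
        return self.cache[query]

cache = SearchCache(lambda q: f"results for {q!r}")
cache.search("AI company acquisitions")
cache.search("AI company acquisitions")  # served from cache, no extra call
print(cache.calls)  # → 1
```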

Create agent

Build an autonomous research agent with LangGraph:
import os
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import SystemMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

# Initialize components
llm = ChatOpenAI(model="gpt-4o")
toolkit = CatchAllTools(
    api_key=os.environ["CATCHALL_API_KEY"],
    llm=llm,
    verbose=True
)
tools = toolkit.get_tools()

# Create agent; the system prompt is supplied via the message list below
agent = create_react_agent(model=llm, tools=tools)
messages = [SystemMessage(content=CATCHALL_AGENT_PROMPT)]

# Run agent
response = agent.invoke({
    "messages": messages + [("user", "Find technology company acquisitions announced this week")]
})

print(response["messages"][-1].content)

Conversational agent pattern

Build an interactive agent that remembers previous searches:
import os
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import SystemMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

# Setup
llm = ChatOpenAI(model="gpt-4o")
toolkit = CatchAllTools(
    api_key=os.environ["CATCHALL_API_KEY"],
    llm=llm,
    verbose=True
)
tools = toolkit.get_tools()

agent = create_react_agent(model=llm, tools=tools)
messages = [SystemMessage(content=CATCHALL_AGENT_PROMPT)]

# Initial search
messages.append(("user", "Find articles about corporate headquarters relocations and office openings in the US"))
response = agent.invoke({"messages": messages})
messages.append(("assistant", response["messages"][-1].content))

# Follow-up 1: Filter cached data
messages.append(("user", "Show only California locations"))
response = agent.invoke({"messages": messages})
messages.append(("assistant", response["messages"][-1].content))

# Follow-up 2: Analyze cached data
messages.append(("user", "What are the top 3 cities by number of openings?"))
response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)
Key pattern:
  1. First message: Agent calls catchall_search_data (10-15 min)
  2. Follow-up messages: Agent calls catchall_analyze_data (instant)
  3. New topic: Agent calls catchall_search_data again
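That routing decision can be pictured as a small function (hypothetical and deliberately simplified; the real agent decides via the LLM guided by CATCHALL_AGENT_PROMPT, not keyword matching):

```python
def route(message, have_cached_results):
    """Pick a tool the way the prompt steers the agent: search for new
    topics, analyze when results for the topic are already cached."""
    if not have_cached_results or "find" in message.lower():
        return "catchall_search_data"
    return "catchall_analyze_data"

print(route("Find tech acquisitions", have_cached_results=False))
# → catchall_search_data
print(route("Show only California locations", have_cached_results=True))
# → catchall_analyze_data
```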

Search once, analyze many

The most powerful pattern: perform the expensive search once, then run as many analyses as you like against the cached results at no additional API cost:
import os
from langchain_catchall import CatchAllClient, query_with_llm
from langchain_openai import ChatOpenAI

# Setup
client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])
llm = ChatOpenAI(model="gpt-4o")

# Search once (10-15 minutes, costs API credits)
result = client.search("Cloud computing company quarterly earnings")

# Analyze many times (instant, no additional cost)
questions = [
    "Which companies had highest revenue growth?",
    "Compare profit margins across companies",
    "What are key trends in the earnings reports?",
    "List companies by market cap",
    "Summarize cloud computing revenue",
]

for question in questions:
    answer = query_with_llm(result, question, llm)
    print(f"\nQ: {question}")
    print(f"A: {answer}")
This pattern is ideal for:
  • Financial analysis (analyze same dataset from multiple angles)
  • Research reports (extract different insights from one search)
  • Exploratory data analysis (iterate on questions without re-fetching)

Error handling

Handle timeouts and failures gracefully:
import os
from langchain_catchall import CatchAllClient

client = CatchAllClient(
    api_key=os.environ["CATCHALL_API_KEY"],
    max_wait_time=2400  # 40 minutes
)

try:
    result = client.search("Venture capital funding rounds across all industries")
    print(f"Success: {result.valid_records} records")

except TimeoutError as e:
    print(f"Search timed out after 40 minutes: {e}")
    # Retry with narrower query
    result = client.search("Series B funding rounds for fintech startups")

except Exception as e:
    print(f"Unexpected error: {e}")
    raise

Monitors

Monitors automate recurring CatchAll searches on a schedule. The langchain-catchall package does not support Monitors; to use them, install the underlying SDK:

    pip install newscatcher-catchall-sdk

See the Monitors documentation for the complete usage guide.
