CatchAll is a web search API that generates unique datasets that don’t exist anywhere else on the web. Built on NewsCatcher’s proprietary real-world event index, it delivers state-of-the-art recall, finding all relevant events rather than just the top results.

How it works

CatchAll processes queries through a multi-stage pipeline that analyzes over 50,000 web pages per job:
  1. Analyze: Generates targeted search queries for NewsCatcher’s proprietary news index and creates validation rules and extraction patterns based on your input.
  2. Fetch: Retrieves and processes 50,000+ articles from web sources to ensure comprehensive coverage.
  3. Cluster: Groups related articles into distinct real-world events using the Leiden algorithm for community detection.
  4. Validate: Applies the generated validators to filter clusters, so only relevant events that match your criteria proceed to extraction.
  5. Extract: Transforms validated events into structured JSON records with dynamic schemas tailored to your query.
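To make the cluster, validate, and extract stages concrete, here is a toy, self-contained Python sketch. It is not CatchAll’s implementation: it swaps the Leiden algorithm for a much simpler union-find (connected components) grouping of articles that share a tag, and the articles, tags, validator, and extractor are all hypothetical. Only the four top-level record fields mirror the guaranteed fields listed under Key characteristics.

```python
from collections import defaultdict

# Hypothetical articles, each tagged with the entities it mentions.
ARTICLES = [
    {"id": "a1", "title": "Acme buys Beta Corp", "tags": {"acme", "beta"}},
    {"id": "a2", "title": "Beta Corp acquired for $2B", "tags": {"beta"}},
    {"id": "a3", "title": "Gamma opens new office", "tags": {"gamma"}},
]


def cluster(articles):
    """Group articles that share any tag via union-find (connected components).

    A deliberately simple stand-in for community detection; CatchAll itself
    uses the Leiden algorithm.
    """
    parent = {a["id"]: a["id"] for a in articles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_with_tag = {}
    for a in articles:
        for tag in a["tags"]:
            if tag in first_with_tag:
                parent[find(a["id"])] = find(first_with_tag[tag])
            else:
                first_with_tag[tag] = a["id"]

    groups = defaultdict(list)
    for a in articles:
        groups[find(a["id"])].append(a)
    return list(groups.values())


def is_acquisition(cluster_articles):
    """Toy validator: keep clusters whose titles mention a purchase."""
    words = ("buys", "acquired", "acquisition")
    return any(w in a["title"].lower() for a in cluster_articles for w in words)


def extract(cluster_articles, index):
    """Toy extractor that emits the guaranteed top-level record fields."""
    return {
        "record_id": f"rec-{index}",
        "record_title": cluster_articles[0]["title"],
        "enrichment": {"article_count": len(cluster_articles)},
        "citations": [a["id"] for a in cluster_articles],
    }


clusters = cluster(ARTICLES)
valid = [c for c in clusters if is_acquisition(c)]
records = [extract(c, i) for i, c in enumerate(valid)]
print(records)
```

Union-find is used here only because it fits in a few lines; it merges any articles that share a single tag, which is far cruder than modularity-based community detection.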
Processing typically takes 10-15 minutes per job. Poll the status endpoint every 30-60 seconds to track progress through each stage.
Each job returns structured JSON records with dynamic schemas. Field names are generated automatically from your query; for an acquisitions query, for example, you might see fields like company_name, deal_value, and acquisition_date.
See Quickstart > Review response for a complete response example.

Key characteristics

CatchAll searches NewsCatcher’s continuously updated index of 2+ billion web pages, optimized for finding real-world events (acquisitions, approvals, incidents) within a recent timeframe. The system excels at comprehensive event discovery, not static content retrieval.
See Write effective queries to learn how to construct effective event queries.
Each job generates a unique response schema. Field names and structure in the enrichment object vary between jobs, even with identical inputs.
Guaranteed fields in every record:
  • record_id
  • record_title
  • enrichment object
  • citations array
Variable fields:
  • All fields inside enrichment (names, types, structure)
See Understanding dynamic schemas for integration patterns.
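Because only the top-level fields are guaranteed, one defensive pattern is to check for them explicitly and treat enrichment as arbitrary nested data. The sketch below flattens whatever enrichment contains into dotted keys; the sample record’s enrichment field names and citation shape are illustrative assumptions, not a real response.

```python
GUARANTEED_FIELDS = ("record_id", "record_title", "enrichment", "citations")


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into dotted keys such as 'acquiring_company.name'."""
    if isinstance(value, dict):
        flat = {}
        for key, sub in value.items():
            flat.update(flatten(sub, f"{prefix}.{key}" if prefix else str(key)))
        return flat
    if isinstance(value, list):
        flat = {}
        for i, sub in enumerate(value):
            flat.update(flatten(sub, f"{prefix}[{i}]"))
        return flat
    return {prefix: value}


def normalize_record(record):
    """Verify the guaranteed fields, then flatten whatever the enrichment object holds."""
    missing = [field for field in GUARANTEED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"record is missing guaranteed fields: {missing}")
    return {
        "record_id": record["record_id"],
        "record_title": record["record_title"],
        "citation_count": len(record["citations"]),
        "fields": flatten(record["enrichment"]),
    }


# Illustrative record: the enrichment keys and citation shape are made up.
sample = {
    "record_id": "rec-1",
    "record_title": "Acme acquires Beta Corp",
    "enrichment": {"acquiring_company": {"name": "Acme"}, "deal_value": "$2B"},
    "citations": [{"url": "https://example.com/story"}],
}
print(normalize_record(sample)["fields"])
# {'acquiring_company.name': 'Acme', 'deal_value': '$2B'}
```

Flattening lets downstream code store records as key/value rows without knowing the schema in advance, at the cost of losing type information from nested objects.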
Control what data gets extracted by providing custom validators and enrichments, or let the system generate them automatically based on your query.
Custom validators filter which events are relevant:
{
  "name": "is_acquisition",
  "description": "true if article describes an acquisition",
  "type": "boolean"
}
Custom enrichments define what data to extract:
{
  "name": "acquiring_company",
  "description": "Extract the acquiring company name",
  "type": "company"
}
The company enrichment type extracts structured data including name, alternative names, website candidates, people, and address.
Use the POST /catchAll/initialize endpoint to get suggested validators and enrichments before submitting your job.
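If you build validators and enrichments in code, a small typed wrapper keeps them in the name/description/type shape shown above. This is a sketch: the surrounding payload keys (query, validators, enrichments) are assumptions about the submit request body, so check them against the API reference before use.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Validator:
    name: str
    description: str
    type: str = "boolean"  # validators in the example above are boolean


@dataclass
class Enrichment:
    name: str
    description: str
    type: str  # e.g. "company"


validators = [Validator("is_acquisition", "true if article describes an acquisition")]
enrichments = [Enrichment("acquiring_company", "Extract the acquiring company name", "company")]

# Assumed payload keys; confirm the real field names in the submit endpoint reference.
payload = {
    "query": "Tech company acquisitions",
    "validators": [asdict(v) for v in validators],
    "enrichments": [asdict(e) for e in enrichments],
}
print(json.dumps(payload, indent=2))
```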
Specify custom date ranges for article search, or let the system determine the optimal time window based on your query (default: 5 days).
Date ranges are validated against your plan’s allowed lookback period. If your requested dates exceed plan limits, the API returns a 400 error with specific guidance.
Use the POST /catchAll/initialize endpoint to preview date adjustments before submitting.
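A client-side pre-check can catch out-of-range dates before the API returns a 400. This sketch assumes you know your plan’s lookback in days (plan_lookback_days is a hypothetical parameter you supply) and mirrors the documented 5-day default only as a local fallback; the server may choose a different window when you omit dates, and POST /catchAll/initialize remains the authoritative preview.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_WINDOW_DAYS = 5  # documented default; the server may pick a different window


def check_date_range(plan_lookback_days, start=None, end=None, now=None):
    """Return (start, end), or raise ValueError if the range exceeds the lookback.

    All datetimes must be timezone-aware. plan_lookback_days is a hypothetical
    value you supply yourself; it is not fetched from the API here.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if end is None:
        end = now
    if start is None:
        start = end - timedelta(days=DEFAULT_WINDOW_DAYS)
    if start > end:
        raise ValueError("start must not be after end")
    earliest = now - timedelta(days=plan_lookback_days)
    if start < earliest:
        raise ValueError(
            f"start {start:%Y-%m-%d} is before the lookback limit {earliest:%Y-%m-%d}"
        )
    return start, end
```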
Identical queries can produce different results:
  • LLMs may generate different keywords, validators, and extractors
  • Different content sources may be retrieved
  • Field names and structure vary between runs
  • Record counts differ
Each query creates a job that processes asynchronously. Use the returned job_id to poll the job status and retrieve results when the job completes. Processing typically takes 10-15 minutes.
Track detailed progress through the steps array in the status endpoint response.
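A minimal polling loop might look like this. It assumes the API key goes in an x-api-key header and that the status response carries a status field whose terminal values are "completed" and "failed"; treat those names as assumptions. The status fetcher is passed in as a function so the loop can be tested without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://catchall.newscatcherapi.com"


def fetch_status(job_id, api_key):
    """GET /catchAll/status/{job_id}. The x-api-key header name is an assumption."""
    req = urllib.request.Request(
        f"{BASE_URL}/catchAll/status/{job_id}",
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id, get_status, interval=45, timeout=30 * 60, sleep=time.sleep):
    """Poll every `interval` seconds until the (assumed) terminal status is reached."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout}s")
        sleep(interval)
```

For a real job, call wait_for_job(job_id, lambda j: fetch_status(j, api_key)). The 45-second default interval sits inside the recommended 30 to 60 second range.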
Results become available progressively during the enriching stage as validation completes in batches. While the job reports status: "enriching", you can retrieve partial results before it completes.
The progress_validated field tracks how many candidate clusters have been processed, so you can work with early results while the job continues processing the remaining batches.
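To use partial results, you can pull repeatedly while the job is "enriching" and keep only records you have not seen before, keyed by the guaranteed record_id. In this sketch, the "completed" status value and the record lists you pass in (for example, parsed from GET /catchAll/pull/{job_id}) are assumptions about the response format.

```python
def should_pull(status):
    """Partial results exist while enriching; 'completed' (an assumed value) means final."""
    return status.get("status") in ("enriching", "completed")


def merge_new_records(seen, records):
    """Store unseen records in `seen` (keyed by record_id) and return only the new ones."""
    fresh = []
    for record in records:
        record_id = record["record_id"]
        if record_id not in seen:
            seen[record_id] = record
            fresh.append(record)
    return fresh


# Two illustrative pulls taken while the job was still enriching.
seen = {}
first = merge_new_records(seen, [{"record_id": "r1"}, {"record_id": "r2"}])
second = merge_new_records(seen, [{"record_id": "r2"}, {"record_id": "r3"}])
print(len(first), len(second), sorted(seen))  # 2 1 ['r1', 'r2', 'r3']
```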
Start with fewer records using the limit parameter for quick testing, then use POST /catchAll/continue to process more records without resubmitting the query.
Continue requests preserve all analysis, validation, and extraction logic from the original job.
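A continue request might be built like this. The sketch only constructs the request with urllib.request and does not send it; the body field names job_id and new_limit and the x-api-key header are assumptions, so confirm them in the endpoint reference.

```python
import json
import urllib.request

BASE_URL = "https://catchall.newscatcherapi.com"


def build_continue_request(job_id, new_limit, api_key):
    """Build (but do not send) POST /catchAll/continue; body field names are assumed."""
    body = json.dumps({"job_id": job_id, "new_limit": new_limit}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/catchAll/continue",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_continue_request("your-job-id", 200, "YOUR_API_KEY")
# To send it: urllib.request.urlopen(req, timeout=30)
```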

Endpoints

Base URL: https://catchall.newscatcherapi.com
| Endpoint | Method | Description |
|---|---|---|
| /catchAll/initialize | POST | Get validator, enrichment, and date suggestions |
| /catchAll/submit | POST | Create a new job |
| /catchAll/continue | POST | Continue a job with a higher limit |
| /catchAll/jobs/user | GET | List all jobs for your API key |
| /catchAll/status/{job_id} | GET | Check job processing status |
| /catchAll/pull/{job_id} | GET | Retrieve job results |
Track detailed progress using the steps array in the status endpoint response. See Job status > steps for details.
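For scripting, the endpoint list can be captured as data. This hypothetical helper maps short names (chosen here for illustration) to the documented method and path and fills in {job_id}.

```python
BASE_URL = "https://catchall.newscatcherapi.com"

# Short names are illustrative; methods and paths come from the endpoint list.
ENDPOINTS = {
    "initialize": ("POST", "/catchAll/initialize"),
    "submit": ("POST", "/catchAll/submit"),
    "continue": ("POST", "/catchAll/continue"),
    "jobs": ("GET", "/catchAll/jobs/user"),
    "status": ("GET", "/catchAll/status/{job_id}"),
    "pull": ("GET", "/catchAll/pull/{job_id}"),
}


def endpoint(name, **path_params):
    """Return (method, full URL) for a named endpoint, filling in path parameters."""
    method, path = ENDPOINTS[name]
    return method, BASE_URL + path.format(**path_params)


print(endpoint("status", job_id="abc123"))
# ('GET', 'https://catchall.newscatcherapi.com/catchAll/status/abc123')
```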

Use cases

  • Market intelligence: Company earnings, M&A activity, product launches
  • Regulatory monitoring: Policy changes, government actions, compliance updates
  • Business development: Partnerships, funding rounds, market entries
  • Competitive analysis: Competitor activities and announcements
  • Research automation: Structured data extraction for analysis
  • News aggregation: Topic-specific news with structured output

What’s next

For technical support, contact us at support@newscatcherapi.com.