

How data is indexed

News API stores data in monthly indexes, optimized for search within a single month. Queries that span multiple months access multiple indexes, and performance degrades proportionally with the time range — queries across 5+ years can cause significant slowdowns.

Technical limitations

While you technically can query data across our entire historical range (2019 to present), doing so in a single request is not recommended for several reasons:
  • Performance degradation — queries spanning multiple years search across numerous indexes, significantly increasing response time.
  • Request timeouts — complex queries combined with long time ranges may time out before completion (default: 30 seconds).
  • Multi-index complexity — long time ranges require coordinating searches across multiple monthly indexes.
  • Limited result access — the API limits responses to 10,000 articles per request, so long time range queries may miss the most relevant historical data.

NLP data availability

Historical data is available from 2019 onward. NLP enrichment is available only for articles indexed from July 2023 onward. For earlier articles, the nlp field is present in responses but returned as an empty object {}. To request NLP enrichment for pre-July 2023 data, contact support@newscatcherapi.com.
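Because the nlp field is always present, checking only for the key is not enough to tell enriched articles apart. The following is a minimal sketch that treats an empty nlp object as "no enrichment". The helper name `has_nlp_data` and the `sentiment` key in the sample article are hypothetical and used only for illustration.

```python
def has_nlp_data(article: dict) -> bool:
    """Return True only when the article carries a non-empty nlp object."""
    # Pre-July 2023 articles return "nlp": {}, which is falsy in Python.
    return bool(article.get("nlp"))


# Sample articles; the "sentiment" key is a hypothetical placeholder.
older_article = {"title": "Example from 2021", "nlp": {}}
newer_article = {"title": "Example from 2024", "nlp": {"sentiment": "placeholder"}}

enriched = [a for a in (older_article, newer_article) if has_nlp_data(a)]
print([a["title"] for a in enriched])  # ['Example from 2024']
```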

Efficient query patterns

To retrieve historical data efficiently, break your queries into time chunks rather than querying the full date range at once.

Incorrect approach

q=financial crisis&from_=2019-01-01&to_=2025-01-01
This query attempts to search approximately 72 monthly indexes at once, which may lead to poor performance or timeout errors (408 Request Timeout).
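To see how quickly a date range fans out across indexes, the sketch below counts the calendar months a range touches. It assumes one index per calendar month, as described above; the function name `monthly_indexes_spanned` is hypothetical, and the exact index boundaries used by the service may differ.

```python
from datetime import date


def monthly_indexes_spanned(start: date, end: date) -> int:
    """Count the calendar months touched by the inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# The range from the query above touches every month of 2019-2024 plus January 2025.
print(monthly_indexes_spanned(date(2019, 1, 1), date(2025, 1, 1)))   # 73
# A single-month chunk touches exactly one index.
print(monthly_indexes_spanned(date(2020, 1, 1), date(2020, 1, 31)))  # 1
```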
1. Estimate data volume using aggregation

Before retrieving actual articles, use the /aggregation_count endpoint to understand the volume of data matching your query across time periods.
Example request:
{
    "q": "your search query",
    "aggregation_by": "month",
    "from_": "2020-01-01",
    "to_": "2020-12-31",
    "lang": "en"
}
Example response:
{
    "aggregations": [
        {
            "aggregation_count": [
                {
                    "time_frame": "2020-01-01 00:00:00",
                    "article_count": 2450
                },
                {
                    "time_frame": "2020-02-01 00:00:00",
                    "article_count": 3120
                }
                // Additional months...
            ]
        }
    ]
}
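One way to act on this response is to flatten it into per-month counts and flag any month whose volume exceeds the 10,000-article limit, since those months need finer chunks. This is a sketch that relies only on the response shape shown above; the function names and the `limit` parameter are hypothetical.

```python
MAX_ARTICLES_PER_QUERY = 10_000  # response limit stated in this guide


def monthly_counts(response: dict) -> dict[str, int]:
    """Map 'YYYY-MM' to article_count from an /aggregation_count response."""
    counts: dict[str, int] = {}
    for group in response.get("aggregations", []):
        for bucket in group.get("aggregation_count", []):
            month = bucket["time_frame"][:7]  # "2020-01-01 00:00:00" -> "2020-01"
            counts[month] = counts.get(month, 0) + bucket["article_count"]
    return counts


def months_needing_split(counts: dict[str, int],
                         limit: int = MAX_ARTICLES_PER_QUERY) -> list[str]:
    """Return the months whose volume exceeds what one query can return."""
    return [month for month, n in counts.items() if n > limit]


# The two buckets from the example response above.
sample = {"aggregations": [{"aggregation_count": [
    {"time_frame": "2020-01-01 00:00:00", "article_count": 2450},
    {"time_frame": "2020-02-01 00:00:00", "article_count": 3120},
]}]}

counts = monthly_counts(sample)
print(counts)                        # {'2020-01': 2450, '2020-02': 3120}
print(months_needing_split(counts))  # []
```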
2. Process data in time chunks

Retrieve articles in monthly or weekly chunks. Complex queries spanning more than 30 days risk 408 timeout errors. If a chunk times out, subdivide it further. Example request:
{
    "q": "your search query",
    "from_": "2020-01-01",
    "to_": "2020-01-31",
    "page_size": 100,
    "page": 1
    // Additional parameters as needed
}
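The chunk boundaries themselves can be generated rather than written by hand. The sketch below yields one (from, to) pair per calendar month and halves a chunk that timed out. Both function names are hypothetical, and the pairs assume that from_ and to_ are treated as inclusive days, which matches the 2020-01-01 to 2020-01-31 example above but is not confirmed here.

```python
from datetime import date, timedelta
from typing import Iterator


def month_chunks(start: date, end: date) -> Iterator[tuple[date, date]]:
    """Yield (from, to) pairs, one per calendar month, covering [start, end]."""
    current = start
    while current <= end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        yield current, min(next_month - timedelta(days=1), end)
        current = next_month


def split_chunk(start: date, end: date) -> list[tuple[date, date]]:
    """Halve a chunk that timed out; a single-day chunk is returned unchanged."""
    days = (end - start).days
    if days < 1:
        return [(start, end)]
    middle = start + timedelta(days=days // 2)
    return [(start, middle), (middle + timedelta(days=1), end)]


for chunk_from, chunk_to in month_chunks(date(2020, 1, 15), date(2020, 3, 10)):
    print(chunk_from.isoformat(), chunk_to.isoformat())
# 2020-01-15 2020-01-31
# 2020-02-01 2020-02-29
# 2020-03-01 2020-03-10
```

Each pair maps directly onto the from_ and to_ fields of the request body, formatted with `isoformat()`.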

Example implementation

Here’s a practical example showing how to retrieve a week of data using the recommended approach. The same logic scales to retrieve months or years by adjusting the date ranges and aggregation period (day/month):
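The sketch below is one possible shape for such an implementation, not a definitive one. It estimates daily volume first, then pages through each day. Several parts are assumptions rather than confirmed API details: the `/search` endpoint name, the `articles` and `total_pages` response fields, and the "YYYY-MM-DD HH:MM:SS" form of from_ and to_. The HTTP transport (base URL, authentication, timeouts) is left to a caller-supplied `call_api` function, which is also hypothetical.

```python
from datetime import date, timedelta
from typing import Callable

# (endpoint, payload) -> parsed JSON response; supplied by the caller.
ApiCall = Callable[[str, dict], dict]


def retrieve_week(call_api: ApiCall, query: str, start: date,
                  page_size: int = 100) -> list[dict]:
    """Collect all articles for the 7 days beginning at `start`."""
    end = start + timedelta(days=6)

    # Step 1: estimate the volume per day before fetching anything.
    estimate = call_api("/aggregation_count", {
        "q": query,
        "aggregation_by": "day",
        "from_": start.isoformat(),
        "to_": end.isoformat(),
    })
    daily = {
        bucket["time_frame"][:10]: bucket["article_count"]
        for group in estimate.get("aggregations", [])
        for bucket in group.get("aggregation_count", [])
    }

    # Step 2: retrieve one day at a time, page by page.
    articles: list[dict] = []
    for offset in range(7):
        day = (start + timedelta(days=offset)).isoformat()
        if daily.get(day) == 0:
            continue  # skip only days the estimate explicitly reports as empty
        page = 1
        while True:
            result = call_api("/search", {  # endpoint name is an assumption
                "q": query,
                "from_": f"{day} 00:00:00",  # timestamp format is an assumption
                "to_": f"{day} 23:59:59",
                "page_size": page_size,
                "page": page,
            })
            articles.extend(result.get("articles", []))   # assumed field
            if page >= result.get("total_pages", 1):      # assumed field
                break
            page += 1
    return articles
```

Passing the transport in as a function keeps the chunking and pagination logic testable without network access, and leaves room to wrap it with retry handling.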
For detailed guidance on retrieving large datasets, see Retrieve large datasets.

Common pitfalls to avoid

Pitfall | Impact | Solution
Querying multiple years at once | Slow performance, timeouts (408 errors) | Break queries into monthly chunks
Using overly broad search terms | Excessive result volume | Refine query terms to be more specific
Insufficient error handling | Failed data retrieval | Implement robust retry and error handling
Underestimating data volume | Resource constraints | Use aggregation endpoint to estimate volume first
Requesting too many results per page | Slow response times | Use reasonable page sizes (100-1000)
Improper pagination implementation | Incomplete data retrieval | See Retrieve large datasets
Expecting NLP data before July 2023 | nlp field is present but returned as {} | Set has_nlp=true to filter for NLP-enriched articles only
Not prioritizing recent data | Slower iteration when validating a new query | Start with a recent short range to validate results before querying the full history
Missing delays between requests | 429 errors interrupting long retrieval jobs | Add a delay between requests and implement exponential backoff on 429 responses
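
The last pitfall can be handled with a small retry wrapper. The following is a sketch of exponential backoff with jitter, not part of any official client. The `RateLimited` exception, the `with_backoff` name, and the default retry settings are all hypothetical; the caller-supplied `request` function is expected to raise `RateLimited` when it receives a 429 response.

```python
import random
import time
from typing import Callable


class RateLimited(Exception):
    """Raised by the caller-supplied request function on an HTTP 429 response."""


def with_backoff(request: Callable[[], dict], max_retries: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `request`, retrying on RateLimited with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error to the caller
            # Delays grow as 1s, 2s, 4s, ... with up to 10% random jitter added.
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.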

See also