# Retrieve large datasets

> Retrieve all articles for a query that exceeds the 10,000 article limit using time-chunking across Python, TypeScript, and Java

News API returns up to 10,000 articles per query. Broad queries hit this limit routinely: a search for "artificial intelligence" in English returns 10,000 results even when hundreds of thousands of matching articles exist.

This guide walks through the full retrieval workflow in three steps: measure your dataset volume, choose the right chunk size, then fetch everything. The measurement and retrieval steps include code examples for Python, TypeScript, and Java.

## Before you begin

* An active News API key
* For Python: Python 3.10+ with [News API Python SDK](/news-api/libraries/python) installed

## Retrieval workflow

### Step 1: Measure your dataset volume

Before writing any retrieval logic, use [`/aggregation_count`](/news-api/api-reference/aggregation-count/get-aggregation-count-by-interval-get) to find out how many articles your query actually matches and how they are distributed over time. This tells you which chunk size to use and whether your query needs narrowing.

```bash cURL theme={null}
curl -X POST "https://v3-api.newscatcherapi.com/api/aggregation_count" \
  -H "x-api-token: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "q": "artificial intelligence",
    "lang": "en",
    "aggregation_by": "day"
  }'
```

```python Python theme={null}
from newscatcher import NewscatcherApi

client = NewscatcherApi(api_key="YOUR_API_KEY")

result = client.aggregation.post(
    q="artificial intelligence",
    lang="en",
    aggregation_by="day",
)

print(f"Total articles: {result.total_hits}")
for bucket in result.aggregations[0].aggregation_count:
    print(f"{bucket.time_frame[:10]}: {bucket.article_count:,} articles")
```

```typescript TypeScript theme={null}
import { NewscatcherApiClient } from "newscatcher-sdk";

const client = new NewscatcherApiClient({ apiKey: "YOUR_API_KEY" });

const result = await client.aggregation.post({
  q: "artificial intelligence",
  lang: "en",
  aggregationBy: "day",
});

console.log(`Total articles: ${result.totalHits}`);
result.aggregations?.[0].aggregationCount?.forEach((bucket) => {
  console.log(`${bucket.timeFrame?.slice(0, 10)}: ${bucket.articleCount}`);
});
```

```java Java theme={null}
import com.newscatcher.api.NewscatcherApiClient;
import com.newscatcher.api.resources.aggregation.requests.PostAggregationRequest;

NewscatcherApiClient client = NewscatcherApiClient.builder()
    .apiKey("YOUR_API_KEY")
    .build();

var result = client.aggregation().post(
    PostAggregationRequest.builder()
        .q("artificial intelligence")
        .lang("en")
        .aggregationBy("day")
        .build()
);

System.out.println("Total articles: " + result.getTotalHits());
result.getAggregations().get(0).getAggregationCount().forEach(bucket ->
    System.out.println(bucket.getTimeFrame() + ": " + bucket.getArticleCount())
);
```

The response shows total volume and the per-day distribution:

```json theme={null}
{
  "status": "ok",
  "total_hits": 118562,
  "page": 1,
  "total_pages": 1186,
  "page_size": 100,
  "aggregations": [
    {
      "aggregation_count": [
        {
          "time_frame": "2026-05-04 00:00:00",
          "article_count": 18461
        },
        {
          "time_frame": "2026-05-05 00:00:00",
          "article_count": 20725
        },
        {
          "time_frame": "2026-05-06 00:00:00",
          "article_count": 20880
        },
        {
          "time_frame": "2026-05-07 00:00:00",
          "article_count": 20973
        },
        {
          "time_frame": "2026-05-08 00:00:00",
          "article_count": 15915
        },
        {
          "time_frame": "2026-05-09 00:00:00",
          "article_count": 6708
        },
        {
          "time_frame": "2026-05-10 00:00:00",
          "article_count": 5782
        },
        {
          "time_frame": "2026-05-11 00:00:00",
          "article_count": 9118
        }
      ]
    }
  ]
}
```

The query matches 118,562 articles in total. The five busiest days each return roughly 16,000–21,000 articles, so daily chunks would hit the 10,000 cap on those days. Six-hour chunks bring each window down to roughly 4,000–5,300 articles on average, which stays under the cap.

### Step 2: Choose a chunk size

Pick a chunk size where each window returns fewer than 10,000 articles. Use the per-period counts from Step 1:

| Articles per period       | Recommended chunk size                |
| ------------------------- | ------------------------------------- |
| More than 10,000 per hour | `"1h"` (consider narrowing the query) |
| More than 10,000 per day  | `"6h"` or `"1h"`                      |
| 3,000–10,000 per day      | `"1d"`                                |
| 1,000–3,000 per day       | `"3d"`                                |
| 100–1,000 per day         | `"7d"`                                |
| Fewer than 100 per day    | `"30d"`                               |

For the "artificial intelligence" example above, most days exceed 10,000 articles and the busiest reach about 21,000, so `"6h"` is the right choice.

### Step 3: Fetch everything

With chunk size confirmed, iterate through your date range window by window, paginating each window fully before moving to the next.

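As a rough illustration of the chunk-size table and the window-by-window iteration described above, the sketch below works on plain numbers and dates without calling the API. The helper names `recommend_chunk_size` and `iter_windows` are hypothetical and not part of the News API SDK. Two things in it are assumptions rather than documented behavior: the handling of values that fall exactly on a table boundary (such as 3,000 articles per day), and the rule for choosing between `"6h"` and `"1h"`, which assumes articles are spread evenly across the day.

```python
from datetime import datetime, timedelta
from typing import Iterator

API_LIMIT = 10_000  # per-query article cap described in this guide


def recommend_chunk_size(daily_counts: list[int]) -> str:
    """Map per-day article counts to a chunk size, following the table above."""
    if not daily_counts:
        raise ValueError("daily_counts must not be empty")
    peak = max(daily_counts)  # size chunks for the busiest day
    if peak > API_LIMIT:
        # Assumption: articles are spread evenly over the day, so a 6-hour
        # window holds about a quarter of the daily total.
        return "6h" if peak / 4 < API_LIMIT else "1h"
    if peak >= 3_000:  # boundary handling is an assumption
        return "1d"
    if peak >= 1_000:
        return "3d"
    if peak >= 100:
        return "7d"
    return "30d"


def iter_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    window_start = start
    while window_start < end:
        # Clamp the last window so it never runs past the requested end.
        window_end = min(window_start + step, end)
        yield window_start, window_end
        window_start = window_end


# Per-day counts taken from the sample response in this guide.
daily_counts = [18461, 20725, 20880, 20973, 15915, 6708, 5782, 9118]
print(recommend_chunk_size(daily_counts))

windows = list(
    iter_windows(datetime(2026, 5, 4), datetime(2026, 5, 11), timedelta(hours=6))
)
print(f"{len(windows)} windows, first: {windows[0][0]} to {windows[0][1]}")
```

Sizing for the busiest day rather than the average is a conservative choice: quiet days produce a few extra small requests, while busy days stay under the cap. The retrieval examples that follow apply the same windowing while paginating each window through the API.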
```python Python theme={null}
import time
from datetime import datetime, timedelta

from newscatcher import NewscatcherApi
from newscatcher.core import ApiError

client = NewscatcherApi(api_key="YOUR_API_KEY")


def get_all_articles(query: str, from_date: str, to_date: str, chunk_hours: int) -> list:
    articles = []
    window_start = datetime.fromisoformat(from_date)
    end = datetime.fromisoformat(to_date)

    while window_start < end:
        # Clamp the final window to the requested end date
        window_end = min(window_start + timedelta(hours=chunk_hours), end)
        page = 1
        total_pages = 1

        # Paginate the current window fully before moving on
        while page <= total_pages:
            try:
                response = client.search.post(
                    q=query,
                    lang="en",
                    from_=window_start.isoformat(),
                    to=window_end.isoformat(),
                    page=page,
                    page_size=1000,
                )
                articles.extend(response.articles)
                total_pages = response.total_pages
                page += 1
                time.sleep(0.5)  # pause between requests
            except ApiError as e:
                # Note: this skips the remaining pages of the current window
                print(f"Error on page {page}: {e}")
                break

        window_start = window_end

    return articles


articles = get_all_articles(
    "artificial intelligence",
    "2026-05-04T00:00:00",
    "2026-05-11T00:00:00",
    chunk_hours=6,
)
print(f"Retrieved {len(articles)} articles")
```

```typescript TypeScript theme={null}
import { NewscatcherApiClient } from "newscatcher-sdk";

const client = new NewscatcherApiClient({ apiKey: "YOUR_API_KEY" });

async function getAllArticles(
  query: string,
  from: Date,
  to: Date,
  chunkHours: number
): Promise<object[]> {
  const articles: object[] = [];
  let windowStart = new Date(from);

  while (windowStart < to) {
    // Clamp the final window to the requested end date
    const windowEnd = new Date(
      Math.min(windowStart.getTime() + chunkHours * 3600 * 1000, to.getTime())
    );
    let page = 1;
    let totalPages = 1;

    // Paginate the current window fully before moving on
    while (page <= totalPages) {
      const response = await client.search.post({
        q: query,
        lang: "en",
        from: windowStart,
        to: windowEnd,
        pageSize: 1000,
        page,
      });
      articles.push(...(response.articles ?? []));
      totalPages = response.totalPages ??
        1;
      page++;
    }

    windowStart = windowEnd;
  }

  return articles;
}

const articles = await getAllArticles(
  "artificial intelligence",
  new Date("2026-05-04T00:00:00"),
  new Date("2026-05-11T00:00:00"),
  6,
);

console.log(`Retrieved ${articles.length} articles`);
```

```java Java theme={null}
import com.newscatcher.api.NewscatcherApiClient;
import com.newscatcher.api.resources.search.requests.PostSearchRequest;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

NewscatcherApiClient client = NewscatcherApiClient.builder()
    .apiKey("YOUR_API_KEY")
    .build();

List