NLP features - newscatcher

As part of the NewsCatcher processing pipeline, each article is enriched with NLP data before it is indexed: theme classification, sentiment scores, named entities, content tags, and vector embeddings. News API exposes these fields in the response when you set include_nlp_data to true.

NLP enrichment is available only for articles indexed from July 2023 onward. For earlier articles, the API returns "nlp": {}. To request NLP enrichment for historical articles, contact support@newscatcherapi.com.
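As a rough illustration of the two points above, the sketch below builds a query string that asks for NLP data and checks whether a returned article actually carries it. Only include_nlp_data and the empty "nlp": {} object come from this page; the `q` search parameter and the function names are hypothetical, and the endpoint URL and authentication are deliberately left out.

```python
from urllib.parse import urlencode


def build_query(q):
    """Build a query string that requests NLP enrichment.

    `q` is an assumed search-term parameter; only `include_nlp_data`
    is taken from this page.
    """
    return urlencode({"q": q, "include_nlp_data": "true"})


def has_nlp_data(article):
    """Return True when the article carries a non-empty `nlp` object.

    Articles indexed before July 2023 come back with "nlp": {},
    which is treated here the same as a missing key.
    """
    return bool(article.get("nlp"))


print(build_query("tesla"))
print(has_nlp_data({"title": "Old article", "nlp": {}}))
```

Checking for the empty object before reading individual fields avoids treating pre-2023 articles as if they had neutral sentiment or no entities.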

How NLP processing works

Processing mode depends on the article’s language and determines which response fields are populated and which are null.

Native processing applies to English and Arabic articles. NLP runs on the original text, and results appear in the standard nlp.* fields.

Translation-based processing applies to all other languages. The article is first translated to English, then NLP runs on that translation. Results appear in the nlp.translation_* fields; the corresponding standard fields are explicitly null, not absent. To receive translation fields in the response, set include_translation_fields to true.

This distinction matters when consuming NER or summary fields: a null value in nlp.ner_PER means the article was processed via translation, not that no entities exist. Check nlp.translation_ner_PER instead.
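One way to handle the native and translation-based modes uniformly is a small lookup helper that tries the standard field first and falls back to its translation_ counterpart. This is a minimal sketch: the helper name and return shape are hypothetical, and it assumes the article's nlp object has already been parsed into a Python dict and that translation fields were requested.

```python
def pick_nlp_field(nlp, name):
    """Return (value, source) for an NLP field such as "summary" or "ner_PER".

    source is "native" when the standard field is populated,
    "translation" when only the translation_* field is populated,
    and "missing" when neither is available.
    """
    native = nlp.get(name)
    if native is not None:
        return native, "native"

    translated = nlp.get("translation_" + name)
    if translated is not None:
        return translated, "translation"

    return None, "missing"


# English article: the standard field is populated.
print(pick_nlp_field({"summary": "Original summary."}, "summary"))

# Non-English article: the standard field is null, the translation field is set.
print(pick_nlp_field(
    {"summary": None, "translation_summary": "Translated summary."},
    "summary",
))
```

Returning the source alongside the value lets downstream code record whether a summary or entity list was derived from the original text or from the English translation.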

Available features

Feature	What it produces
Theme	One or more topic labels per article, for example `Tech` or `Finance`. Filterable with `theme` and `not_theme`.
Summary	AI-generated article summary. `nlp.summary` for native, `nlp.translation_summary` for translation-based.
Sentiment	Tone scores from `-1.0` to `1.0` for title and content independently.
Named entity recognition	Persons, organizations, locations, and miscellaneous entities with mention counts.
IPTC tags	Hierarchical news category tags using the IPTC media topic standard.
IAB tags	Content category tags using the IAB content taxonomy, used for audience segmentation.
Custom tags	Organization-specific taxonomy, private to your API key.
Vector embeddings	1024-dimensional semantic vectors for similarity search and clustering.
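To show how the vector embeddings might be used for similarity search, the sketch below computes cosine similarity between two vectors in plain Python. It assumes the embeddings have already been extracted from the response as lists of floats; this page does not name the response field that holds them, so none is named here, and the short vectors in the example stand in for the 1024-dimensional ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors.

    Returns 0.0 when either vector has zero length, so callers
    do not need to guard against division by zero.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-dimensional vectors; real embeddings would have 1024 components.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

For large collections, a vectorized library or a vector index would normally replace this loop; the pure-Python version is only meant to make the computation explicit.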
