Understanding NLP layer
When processing news, we summarize the article content, categorize articles by theme, estimate the overall tone of the writing, and identify important names and places mentioned in the text. As a result, we supply each processed article with additional NLP information that you can use when making requests via News API v3. The NLP layer in News API v3 consists of the following components:
Component | Description | Plan Requirement |
---|---|---|
Theme | General topic or category of the article | v3_nlp |
Summary | Concise overview of the article’s content | v3_nlp |
Sentiment | Separate scores for title and content sentiment | v3_nlp |
Named Entities | Identified persons, organizations, locations, and miscellaneous entities | v3_nlp |
Translations | English translations of non-English content (one-way only): title, content, summary, named entities | v3_nlp |
IPTC Tags | Standardized news category tags | v3_nlp_iptc_tags |
IAB Tags | Content categories for digital advertising | v3_nlp_iptc_tags |
Custom Tags | Organization-specific classification system | All v3 NLP plans |
Embeddings | 1024-dimensional vector representation for semantic similarity | v3_nlp_embeddings |
Including NLP data in API responses
To control NLP data in your API responses, use the following parameters:
- include_nlp_data (boolean): Set to true to include the NLP object for each article in the response.
- has_nlp (boolean): Set to true to filter the results to only articles with available NLP data.
- include_translation_fields (boolean): Set to true to include translation fields in the response.
Some fields within the NLP object may be empty or null if specific analyses were not performed on the article. The full data is available for articles in English and Arabic only. Translation features are only available with the NLP plan or higher.
How these parameters work together
These parameters control both the article filtering and the inclusion of NLP data:
- include_nlp_data=true, has_nlp=false: Returns all matching articles with the NLP object included in each. The completeness of NLP data varies by language.
- include_nlp_data=true, has_nlp=true: Returns only articles processed with NLP. This combination filters out many articles in languages other than English and Arabic.
- include_nlp_data=false: The NLP object is not included in the response, regardless of the has_nlp value.
- include_translation_fields=true: Includes translation fields (title_translated_en and content_translated_en) in the response.
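The combinations above can be pictured with a small local model. This is a toy sketch that mimics the described behavior on in-memory data, not the API's implementation. The function name apply_nlp_flags and the sample article shape are hypothetical, and it assumes has_nlp still filters results when include_nlp_data is false.

```python
from typing import Any

Article = dict[str, Any]


def apply_nlp_flags(
    articles: list[Article],
    include_nlp_data: bool = False,
    has_nlp: bool = False,
) -> list[Article]:
    """Mimic, on local data, how the two flags are described to interact."""
    shaped_articles = []
    for article in articles:
        nlp = article.get("nlp")
        # has_nlp=true keeps only articles that carry NLP data.
        if has_nlp and not nlp:
            continue
        shaped = dict(article)
        if include_nlp_data:
            # The NLP object is present for every article, possibly empty.
            shaped["nlp"] = nlp or {}
        else:
            # include_nlp_data=false drops the NLP object entirely.
            shaped.pop("nlp", None)
        shaped_articles.append(shaped)
    return shaped_articles
```

With two sample articles, one of which lacks NLP data, include_nlp_data=True with has_nlp=False keeps both, while has_nlp=True keeps only the processed one.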
NLP coverage by language
The table below shows which NLP features are available for different language categories:
Feature | English & Arabic | Other Languages | Coverage |
---|---|---|---|
Theme classification | ✓ | ✓ (limited) | 10% of non-English/Arabic articles |
Summary | ✓ | ✓ (limited) | 10% of non-English/Arabic articles |
Sentiment analysis | ✓ | ✗ | 100% of English/Arabic articles |
Named entity recognition | ✓ | ✓ | 100% of English/Arabic articles and all non-English articles via translations |
Content tags | ✓ | ✓ (limited) | 10% of non-English/Arabic articles |
Vector embeddings | ✓ | ✓ | Nearly 100% of all articles |
Clustering | ✓ | ✓ | All articles |
Deduplication | ✓ | ✓ | All articles |
Translations to English | ✗ | ✓ | All non-English articles |
Search with English translations | ✗ | ✓ | All non-English articles |
When working with non-English/Arabic content, using has_nlp=true substantially reduces the result set.
Code example
Here’s how you can make a request to include NLP data in your search results using Python:
nlp_request.py
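A minimal sketch of such a request, using only the Python standard library. The endpoint URL, the x-api-token header name, and the q parameter are assumptions not stated in this section. Only include_nlp_data and has_nlp come from the text above. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: this section does not state the endpoint or the auth header.
API_URL = "https://v3-api.newscatcherapi.com/api/search"
API_KEY = "YOUR_API_KEY"  # placeholder, replace with your own key


def build_nlp_params(query: str, has_nlp: bool = True) -> dict[str, str]:
    """Build query parameters that request the NLP object for each article."""
    return {
        "q": query,  # assumed name of the keyword parameter
        "include_nlp_data": "true",
        "has_nlp": "true" if has_nlp else "false",
    }


def search(params: dict[str, str]) -> dict:
    """Send a GET request and decode the JSON response body."""
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers={"x-api-token": API_KEY})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling search(build_nlp_params("artificial intelligence")) would perform the request. Building the parameters is kept separate so they can be inspected without network access.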
Response example
response.json
Theme classification
Theme classification categorizes articles into predefined topics, allowing for efficient filtering and organization of news content.
Available themes
News API v3 supports the following themes:
- Business
- Economics
- Entertainment
- Finance
- Health
- Politics
- Science
- Sports
- Tech
- Crime
- Financial Crime
- Lifestyle
- Automotive
- Travel
- Weather
- General
Filtering by theme
Use the theme and not_theme parameters to filter articles based on their classified themes:
- theme (string): Includes articles matching the specified theme(s).
- not_theme (string): Excludes articles matching the specified theme(s).
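The following sketch builds theme filters and checks them against the theme list above before sending anything. Joining several themes with commas and the q parameter name are assumptions. The helper theme_params is hypothetical.

```python
from collections.abc import Sequence

# Theme names as listed in this section.
THEMES = frozenset({
    "Business", "Economics", "Entertainment", "Finance", "Health", "Politics",
    "Science", "Sports", "Tech", "Crime", "Financial Crime", "Lifestyle",
    "Automotive", "Travel", "Weather", "General",
})


def theme_params(
    query: str,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> dict[str, str]:
    """Build request parameters with theme and not_theme filters."""
    unknown = [name for name in (*include, *exclude) if name not in THEMES]
    if unknown:
        raise ValueError(f"Unknown theme(s): {unknown}")
    params = {"q": query}  # assumed name of the keyword parameter
    if include:
        # Comma-separated values are an assumption about the string format.
        params["theme"] = ",".join(include)
    if exclude:
        params["not_theme"] = ",".join(exclude)
    return params
```

For example, theme_params("merger", include=["Business", "Finance"], exclude=["Crime"]) yields a theme value of Business,Finance and a not_theme value of Crime.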
Article summarization
Article summarization provides concise overviews of article content, allowing for quick understanding without reading the full text.
Using summaries in searches and clustering
You can use summaries in your searches and clustering:
- In searches, use the search_in parameter. For example, a query that searches for climate change within article summaries can yield more relevant results than searching the full content.
- For clustering, use summaries as the clustering variable. This approach can lead to more concise and focused clusters. For more information on clustering, see Clustering news articles.
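Both uses can be sketched as query strings. The value summary for search_in and the parameter names clustering_enabled and clustering_variable are assumptions, since this section names only search_in. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def summary_search_query(keywords: str) -> str:
    """Query string that limits keyword matching to article summaries."""
    # "summary" as a search_in value is an assumption.
    return urlencode({"q": keywords, "search_in": "summary"})


def summary_clustering_query(keywords: str) -> str:
    """Query string that clusters results by their summaries."""
    # clustering_enabled and clustering_variable are assumed parameter names.
    return urlencode({
        "q": keywords,
        "clustering_enabled": "true",
        "clustering_variable": "summary",
    })


print(summary_search_query("climate change"))
# prints: q=climate+change&search_in=summary
```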
Sentiment analysis
Sentiment analysis determines the emotional tone of an article. News API v3 provides sentiment scores for both the title and content, ranging from -1 (negative) to 1 (positive).
Filtering by sentiment
Filter articles based on sentiment scores using these parameters:
- title_sentiment_min and title_sentiment_max (float): Filter by title sentiment
- content_sentiment_min and content_sentiment_max (float): Filter by content sentiment
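As a sketch, the four parameters can be built from (min, max) ranges with a check that the bounds stay within the documented -1 to 1 scale. The helper sentiment_params and the q parameter name are assumptions.

```python
from typing import Optional

Range = tuple[float, float]


def sentiment_params(
    query: str,
    title_range: Optional[Range] = None,
    content_range: Optional[Range] = None,
) -> dict[str, object]:
    """Build sentiment filters, validating that -1 <= min <= max <= 1."""
    params: dict[str, object] = {"q": query}  # assumed keyword parameter
    for prefix, bounds in (("title", title_range), ("content", content_range)):
        if bounds is None:
            continue
        low, high = bounds
        if not -1.0 <= low <= high <= 1.0:
            raise ValueError(f"{prefix} sentiment range must lie within [-1, 1]")
        params[f"{prefix}_sentiment_min"] = low
        params[f"{prefix}_sentiment_max"] = high
    return params
```

For instance, sentiment_params("earnings", content_range=(0.2, 1.0)) restricts results to articles whose content sentiment is moderately to strongly positive.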
Named Entity Recognition (NER)
NER identifies and categorizes named entities within text content. News API v3 recognizes four types of entities:
- PER_entity_name: Person names
- ORG_entity_name: Organization names
- LOC_entity_name: Location names
- MISC_entity_name: Miscellaneous entities (events, nationalities, products, works of art, etc.)
NER Coverage and Language Support
NER Type | Language Coverage | Response Fields |
---|---|---|
Original Content | 100% English, 100% Arabic, ~10% other languages | nlp.ner_PER, nlp.ner_ORG, nlp.ner_LOC, nlp.ner_MISC |
Translation-Based | 100% all languages (via English translations) | nlp.translation_ner_PER, nlp.translation_ner_ORG, nlp.translation_ner_LOC, nlp.translation_ner_MISC |
Entity search supports boolean operators (AND, OR, NOT), proximity search with NEAR, and count-based filtering.
To learn more about NER, see How to search by entity.
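A minimal sketch of entity filters using the four parameter names listed above. Writing boolean operators directly inside the parameter value is an assumption about the syntax, and the helper entity_params with its keyword names is hypothetical. NEAR and count-based filters are left out because their syntax is not shown in this section.

```python
# Maps readable keyword names (hypothetical) to the documented parameters.
ENTITY_PARAMS = {
    "person": "PER_entity_name",
    "organization": "ORG_entity_name",
    "location": "LOC_entity_name",
    "misc": "MISC_entity_name",
}


def entity_params(query: str, **entities: str) -> dict[str, str]:
    """Build request parameters that filter by named entities."""
    params = {"q": query}  # assumed name of the keyword parameter
    for kind, expression in entities.items():
        if kind not in ENTITY_PARAMS:
            raise ValueError(f"Unknown entity type: {kind}")
        # The expression is passed through unchanged, e.g. "Tesla OR Rivian".
        params[ENTITY_PARAMS[kind]] = expression
    return params
```

For example, entity_params("electric vehicles", organization="Tesla OR Rivian", person="Elon Musk") sets ORG_entity_name and PER_entity_name alongside the keyword query.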
Translation features
We translate and index all non-English articles to English, enabling you to search multilingual content using English keywords and perform named entity recognition across all languages. This feature is available in the NLP plan and higher.
Searching in translated content
You can use English keywords to find relevant content in non-English articles by using the search_in parameter with translation options:
- title_translated: Search in English translations of titles
- content_translated: Search in English translations of content
- summary_translated: Search in summaries of English translations
- title_content_translated: Search in both English translated titles and content
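The options above can be used as in the following sketch, which validates the search_in value and also asks for translation fields in the response. The helper translated_search_params and the q parameter name are assumptions.

```python
# search_in values listed in this section for translated content.
TRANSLATED_SEARCH_FIELDS = frozenset({
    "title_translated",
    "content_translated",
    "summary_translated",
    "title_content_translated",
})


def translated_search_params(
    english_keywords: str,
    field: str = "title_content_translated",
) -> dict[str, str]:
    """Build parameters for searching English translations of articles."""
    if field not in TRANSLATED_SEARCH_FIELDS:
        raise ValueError(f"Unsupported search_in value: {field}")
    return {
        "q": english_keywords,  # assumed name of the keyword parameter
        "search_in": field,
        # Ask for title_translated_en and content_translated_en as well.
        "include_translation_fields": "true",
    }
```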
Available translation fields
When using translation features, the API can include these additional fields in responses:
- title_translated_en: English translation of the article title
- content_translated_en: English translation of the article content
- nlp.summary_translated: Brief AI-generated summary of the English translation
- nlp.translation_ner_PER: Person entities extracted from English translations
- nlp.translation_ner_ORG: Organization entities extracted from English translations
- nlp.translation_ner_LOC: Location entities extracted from English translations
- nlp.translation_ner_MISC: Miscellaneous entities extracted from English translations
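When reading a response, one possible approach is to prefer the translated fields and fall back to the original ones for articles that are already in English. The sketch below assumes the original fields are named title, content, and nlp.summary, which this section does not confirm. The helper english_view is hypothetical, and entity values are passed through without assuming their format.

```python
from typing import Any


def english_view(article: dict[str, Any]) -> dict[str, Any]:
    """Pick English text for an article, preferring translation fields."""
    nlp = article.get("nlp") or {}
    return {
        # Fall back to the original field when no translation is present.
        "title": article.get("title_translated_en") or article.get("title"),
        "content": article.get("content_translated_en") or article.get("content"),
        "summary": nlp.get("summary_translated") or nlp.get("summary"),
        "people": nlp.get("translation_ner_PER") or nlp.get("ner_PER"),
    }
```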
Tagging
Content tagging provides a standardized categorization of news articles, enhancing searchability and enabling more precise content filtering. IPTC and IAB tags are available in the v3_nlp_iptc_tags plan. Custom tags are developed upon request and are available in all NLP plans.
IPTC tags
IPTC (International Press Telecommunications Council) tags are a standardized set of news categories. They offer a hierarchical classification system for news content. To filter articles by IPTC tags, use the following parameters:
- iptc_tags (string): Includes articles with specified IPTC tags.
- not_iptc_tags (string): Excludes articles with specified IPTC tags.
For example, the tag 20000002 encodes arts and entertainment.
For a complete IPTC Media Topic NewsCodes list, visit the IPTC website.
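A sketch of an IPTC filter follows. It checks that each tag looks like an eight-digit Media Topic code such as 20000002 before building the parameters. Joining several codes with commas and the q parameter name are assumptions. The helper iptc_params is hypothetical.

```python
from collections.abc import Sequence


def is_media_topic_code(code: str) -> bool:
    """Return True for eight-digit codes such as 20000002."""
    return len(code) == 8 and code.isascii() and code.isdigit()


def iptc_params(
    query: str,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> dict[str, str]:
    """Build request parameters with iptc_tags and not_iptc_tags filters."""
    invalid = [code for code in (*include, *exclude) if not is_media_topic_code(code)]
    if invalid:
        raise ValueError(f"Not an eight-digit IPTC code: {invalid}")
    params = {"q": query}  # assumed name of the keyword parameter
    if include:
        # Comma-separated values are an assumption about the string format.
        params["iptc_tags"] = ",".join(include)
    if exclude:
        params["not_iptc_tags"] = ",".join(exclude)
    return params
```

For example, iptc_params("film festival", include=["20000002"]) restricts results to articles tagged arts and entertainment.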
IAB tags
IAB (Interactive Advertising Bureau) tags provide a standardized taxonomy for digital advertising content. To filter articles by IAB tags, use the following parameters:
- iab_tags (string): Includes articles with specified IAB tags.
- not_iab_tags (string): Excludes articles with specified IAB tags.
For example, you can include articles tagged Business or Investing but not Personal Finance.
For more information on IAB Content Taxonomy, visit the IAB Tech Lab website.
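The Business or Investing but not Personal Finance filter can be sketched as a URL-encoded query string. The comma-separated format and the q parameter name are assumptions. The helper iab_query is hypothetical.

```python
from collections.abc import Sequence
from urllib.parse import urlencode


def iab_query(
    keywords: str,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> str:
    """Build a URL-encoded query string with IAB tag filters."""
    params = {"q": keywords}  # assumed name of the keyword parameter
    if include:
        # Comma-separated values are an assumption about the string format.
        params["iab_tags"] = ",".join(include)
    if exclude:
        params["not_iab_tags"] = ",".join(exclude)
    # urlencode escapes commas as %2C and spaces as +.
    return urlencode(params)


print(iab_query("markets", ["Business", "Investing"], ["Personal Finance"]))
# prints: q=markets&iab_tags=Business%2CInvesting&not_iab_tags=Personal+Finance
```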
Custom tags
Custom tags help you classify and filter articles based on your organization’s taxonomy. Each taxonomy is organization-specific and protected by your API key, ensuring your custom classification system remains secure and private. We develop and integrate this solution upon your request. Simply provide us with your tags and their descriptions. To filter articles by your taxonomy tags, use the custom_tags parameter following this pattern: "custom_tags.taxonomy": "Tag1,Tag2,Tag3", where taxonomy is your taxonomy name and Tag1,Tag2,Tag3 are specific tags.
To specify multiple tags:
- For GET requests, use a comma-separated string.
- For POST requests, use a comma-separated string or an array of strings.
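The two request styles can be sketched as follows: a comma-separated string in a GET query string and an array of strings in a POST JSON body. The helper names and the q parameter are assumptions. The custom_tags.taxonomy key pattern comes from the text above.

```python
import json
from collections.abc import Sequence
from urllib.parse import urlencode


def custom_tags_key(taxonomy: str) -> str:
    """Return the parameter name for a given taxonomy."""
    return f"custom_tags.{taxonomy}"


def custom_tags_get_query(query: str, taxonomy: str, tags: Sequence[str]) -> str:
    """GET style: tags are sent as one comma-separated string."""
    return urlencode({"q": query, custom_tags_key(taxonomy): ",".join(tags)})


def custom_tags_post_body(query: str, taxonomy: str, tags: Sequence[str]) -> str:
    """POST style: tags are sent as a JSON array of strings."""
    return json.dumps({"q": query, custom_tags_key(taxonomy): list(tags)})
```

A call such as custom_tags_post_body("ai", "taxonomy", ["Tag1", "Tag2"]) produces a JSON body whose custom_tags.taxonomy key holds the array of tags.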
Embeddings
Vector embeddings represent article content as numerical vectors, enabling semantic analysis and similarity comparisons. Embeddings are available exclusively with the v3_nlp_embeddings plan. Each article is processed through the multilingual-e5-large model to generate its vector representation.
The embedding is available in the new_embedding field as an array of 1024 numbers. Embeddings support the following applications:
- Semantic search: Find articles with similar meanings, not just matching keywords.
- Content recommendation: Suggest related articles based on semantic similarity.
- Topic clustering: Group articles by meaning using vector similarity.
- Machine learning: Train models using these dense numerical representations.
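Semantic similarity between two articles is commonly measured with the cosine similarity of their embedding vectors. The sketch below is one way to do this in plain Python. The function names are hypothetical, and it takes the vectors as plain lists rather than assuming where new_embedding sits in the response.

```python
import math
from collections.abc import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: Sequence[float],
    candidates: Mapping[str, Sequence[float]],
) -> list[str]:
    """Return candidate ids ordered from most to least similar."""
    return sorted(
        candidates,
        key=lambda key: cosine_similarity(query_vector, candidates[key]),
        reverse=True,
    )
```

In practice each vector would hold the 1024 numbers from an article's new_embedding field, and rank_by_similarity could back a simple related-articles feature. For large collections, a vector index is usually a better fit than pairwise comparison.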
Use cases
NLP features in News API v3 enable various applications across industries:
Application | Description | Example use case |
---|---|---|
Brand Monitoring | Track mentions, analyze sentiment and identify influencers. | A tech company monitoring public perception of their latest product launch. |
Competitive Intelligence | Monitor competitors’ activities and public perception. | An automotive manufacturer tracking mentions of competitors’ electric vehicle initiatives. |
Market Research | Analyze trends, consumer sentiment, and emerging topics. | A financial services firm identifying emerging fintech trends. |
Political Analysis | Track political figures and analyze public opinion. | A political campaign monitoring sentiment around key policy issues. |
Financial Analysis | Monitor market sentiment and track company mentions. | An investment firm analyzing sentiment around potential acquisition targets. |
Academic Research | Conduct large-scale analysis of media coverage. | A researcher studying media bias in climate change reporting. |
Content Curation | Automatically filter and categorize news content. | A news aggregator app personalizing content for users based on interests. |
Trend Forecasting | Identify emerging trends across industries. | A consulting firm predicting future technology adoption trends. |
Best practices
To maximize the effectiveness of NLP features in News API v3:
- Start with broader queries and gradually refine using NLP parameters.
- Combine multiple NLP parameters for precise results.
- Use entity recognition with boolean operators to refine searches.
- Experiment with sentiment thresholds to find the right balance for your use case.
- Leverage theme classification and content tags to quickly filter large volumes of news data.
- Regularly review and update your queries to adapt to changing news landscapes.