News API includes robots.txt compliance information with every article, helping you build applications that respect publisher permissions. This feature provides transparency about whether content adheres to the source website’s automated access guidelines.

What is robots.txt compliance?

Robots.txt files allow website publishers to specify which parts of their site can be accessed by automated tools. News API respects these guidelines by marking each article with its compliance status.
The robots.txt compliance check is performed at the time of article collection and reflects the publisher’s guidelines at that moment.
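For illustration, a robots.txt file pairs user-agent declarations with Allow and Disallow rules. The snippet below is a generic example, not taken from any particular publisher:

User-agent: *
Disallow: /premium/
Allow: /

A crawler matching User-agent: * would be permitted everywhere on the site except paths under /premium/.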

Article-level compliance

Each API response includes a robots_compliant boolean field indicating whether the content can be accessed according to the publisher's guidelines: true means automated access is allowed; false means it is restricted.
{
  "title": "Revolutionary AI Technology Breakthrough",
  "author": "Jane Smith",
  "domain_url": "techcrunch.com",
  "content": "Scientists have made a groundbreaking discovery...",
  "robots_compliant": true
  // other article fields
}
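A client can gate its own processing on this flag. The minimal Python sketch below assumes an article object already parsed from a response; the handle_article helper is hypothetical:

def handle_article(article: dict) -> None:
    """Process an article only if the publisher permits automated access."""
    if not article.get("robots_compliant", False):
        return  # restricted by the publisher's robots.txt
    # Safe to process: index, display, or analyze the article here.
    print(article["title"], "-", article["domain_url"])

handle_article({
    "title": "Revolutionary AI Technology Breakthrough",
    "domain_url": "techcrunch.com",
    "robots_compliant": True,
})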

Filtering by compliance

Use the optional robots_compliant parameter to filter results server-side. This parameter is available on all endpoints that return articles. For example, to return only compliant articles:
{
  "q": "artificial intelligence",
  "robots_compliant": true
}
If the parameter is omitted, the API returns all articles, each carrying its compliance flag (the default behavior).
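As a sketch, a filtered search request in Python might look like the following; the base URL, authentication header, and "articles" response key are assumptions to adapt to your account:

import requests

# Placeholder base URL and auth header: substitute your plan's actual values.
API_URL = "https://api.example.com/search"
HEADERS = {"x-api-token": "YOUR_API_KEY"}

payload = {
    "q": "artificial intelligence",
    "robots_compliant": True,  # server-side filter: only compliant articles
}

response = requests.post(API_URL, json=payload, headers=HEADERS)
response.raise_for_status()

# The "articles" key is assumed from typical News API response shapes.
for article in response.json().get("articles", []):
    print(article["title"])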

Source-level compliance metrics

The /sources endpoint provides aggregate compliance metrics showing what percentage of articles from each source comply with robots.txt rules. To view compliance metrics, set include_additional_info to true:
{
  "source_name": "technology",
  "include_additional_info": true
}
The response includes robots_compliant as a percentage in the additional_info field:
{
  "message": "Maximum sources displayed according to your plan is set to 1000",
  "sources": [
    {
      "name_source": "China Science and Technology Network",
      "domain_url": "stdaily.com",
      "logo": null,
      "additional_info": {
        "nb_articles_for_7d": 548,
        "country": "CN",
        "rank": 8107,
        "is_news_domain": true,
        "news_domain_type": "Original Content",
        "news_type": "General News Outlets",
        "robots_compliant": "100%"
      }
    }
  ]
}
This metric helps you evaluate the reliability and compliance trends of sources when selecting them for your application.
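For example, you could rank or shortlist sources by their compliance percentage. The sketch below assumes the same placeholder base URL and auth header as above and parses the percentage string from additional_info; the 95% threshold is an arbitrary illustration:

import requests

# Placeholder base URL and auth header: substitute your plan's actual values.
API_URL = "https://api.example.com/sources"
HEADERS = {"x-api-token": "YOUR_API_KEY"}

payload = {"source_name": "technology", "include_additional_info": True}

response = requests.post(API_URL, json=payload, headers=HEADERS)
response.raise_for_status()

for source in response.json().get("sources", []):
    info = source.get("additional_info") or {}
    # robots_compliant is returned as a percentage string, e.g. "100%".
    pct = float(info.get("robots_compliant", "0%").rstrip("%"))
    if pct >= 95.0:  # example threshold for reliably compliant sources
        print(f'{source["name_source"]} ({source["domain_url"]}): {pct:.0f}% compliant')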

Benefits

Legal compliance

Reduce legal risks by respecting publisher-defined access policies and maintaining good relationships with content providers.

Performance optimization

Filter non-compliant content server-side, reducing unnecessary data transfer and improving application performance.

Publisher relations

Demonstrate respect for content providers’ access policies, fostering better long-term partnerships.

Transparency

Clear visibility into which content can be safely used, enabling informed decisions about content usage.

Implementation considerations

The robots_compliant field is included in every article object, ensuring consistent access to compliance information regardless of filtering.
You can include the robots_compliant parameter in requests to filter results, or omit it to receive all content with compliance flags.
When you use the parameter, filtering happens at the API level, improving performance by reducing unnecessary data transfer.
To view source-level compliance percentages, include include_additional_info: true in your /sources endpoint requests.
