What is robots.txt compliance?
Robots.txt files allow website publishers to specify which parts of their site can be accessed by automated tools. News API respects these guidelines by marking each article with its compliance status. The robots.txt compliance check is performed at the time of article collection
and reflects the publisher’s guidelines at that moment.
Article-level compliance
Each API response includes a robots_compliant boolean field indicating whether
content can be safely accessed according to the publisher’s guidelines. If true,
automated access is allowed; if false, it is restricted.
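As an illustration of reading the flag on the client, the sketch below partitions a list of article objects by robots_compliant. Only the robots_compliant field name comes from this page; the title field and the sample data are hypothetical, and treating a missing flag as restricted is a conservative assumption rather than documented behavior.

```python
from typing import Any

Article = dict[str, Any]


def split_by_compliance(articles: list[Article]) -> tuple[list[Article], list[Article]]:
    """Partition article objects on their robots_compliant flag.

    Assumption: an article without the flag is treated as restricted.
    """
    allowed = [a for a in articles if a.get("robots_compliant") is True]
    restricted = [a for a in articles if a.get("robots_compliant") is not True]
    return allowed, restricted


# Hypothetical sample data; real article objects carry many more fields.
sample_articles = [
    {"title": "Example A", "robots_compliant": True},
    {"title": "Example B", "robots_compliant": False},
]

allowed, restricted = split_by_compliance(sample_articles)
```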
Filtering by compliance
Use the optional robots_compliant parameter to filter your API requests.
This parameter is available for all endpoints that return articles.
- All articles: omit the robots_compliant parameter
- Compliant only: set robots_compliant to true
- Non-compliant only: set robots_compliant to false
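The three filtering modes can be sketched as a small helper that builds request parameters. This is a minimal sketch, not the official client: the q parameter name and the helper itself are hypothetical, and only robots_compliant is taken from this page. Check the endpoint reference for the exact request shape before using it.

```python
from typing import Any, Optional


def build_search_params(query: str, robots_compliant: Optional[bool] = None) -> dict[str, Any]:
    """Build request parameters for an article search.

    "q" is a hypothetical query parameter name used for illustration.
    Leaving robots_compliant as None omits the filter entirely.
    """
    params: dict[str, Any] = {"q": query}
    if robots_compliant is not None:
        params["robots_compliant"] = robots_compliant
    return params


all_articles = build_search_params("climate")            # no filter
compliant_only = build_search_params("climate", True)    # compliant only
non_compliant_only = build_search_params("climate", False)  # non-compliant only
```

The resulting dictionary would then be passed as the query string or JSON body of the request, depending on how the endpoint you call expects its parameters.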
Source-level compliance metrics
The /sources endpoint provides aggregate compliance metrics showing what
percentage of articles from each source comply with robots.txt rules.
To view compliance metrics, set include_additional_info to true. The response
then reports robots_compliant as a percentage in the additional_info field.
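A rough sketch of reading these metrics follows. The response shape is an assumption: the code supposes that each source object has a hypothetical domain field and that additional_info is an object whose robots_compliant value is numeric. Only the include_additional_info, additional_info, and robots_compliant names come from this page.

```python
from typing import Any


def sources_request_params() -> dict[str, Any]:
    """Parameters for a /sources request that asks for compliance metrics."""
    return {"include_additional_info": True}


def compliance_by_source(sources: list[dict[str, Any]]) -> dict[str, float]:
    """Map each source to its robots_compliant percentage.

    Assumptions: "domain" identifies the source (hypothetical field name)
    and the percentage is numeric. Sources without the metric are skipped.
    """
    result: dict[str, float] = {}
    for source in sources:
        info = source.get("additional_info") or {}
        value = info.get("robots_compliant")
        if value is not None:
            result[source["domain"]] = float(value)
    return result


# Hypothetical sample data shaped according to the assumptions above.
sample_sources = [
    {"domain": "example.com", "additional_info": {"robots_compliant": 87.5}},
    {"domain": "example.org"},
]

metrics = compliance_by_source(sample_sources)
```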
Benefits
Legal compliance
Reduce legal risks by respecting publisher-defined access policies and
maintaining good relationships with content providers.
Performance optimization
Filter non-compliant content server-side, reducing unnecessary data transfer
and improving application performance.
Publisher relations
Demonstrate respect for content providers’ access policies, fostering better
long-term partnerships.
Transparency
Clear visibility into which content can be safely used, enabling informed
decisions about content usage.
Implementation considerations
Always present in article responses
The robots_compliant field is included in every article object, ensuring
consistent access to compliance information regardless of filtering.
Optional parameter for filtering
You can include the robots_compliant parameter in requests to filter
results, or omit it to receive all content with compliance flags.
Server-side filtering
When you use the parameter, filtering happens at the API level, improving
performance by reducing unnecessary data transfer.
Source metrics require additional_info
To view source-level compliance percentages, include
include_additional_info: true in your /sources endpoint requests.
