# Robots.txt compliance

> Understand robots.txt compliance fields and parameters in News API to build applications that respect publisher permissions.

News API includes robots.txt compliance information with every article,
helping you build applications that respect publisher permissions. The
compliance flag shows whether automated access to each article is permitted by
the source website's robots.txt rules.

## What is robots.txt compliance?

Robots.txt files allow website publishers to specify which parts of their site
can be accessed by automated tools. News API respects these guidelines by
marking each article with its compliance status.

<Note>
  The robots.txt compliance check is performed at the time of article collection
  and reflects the publisher's guidelines at that moment.
</Note>

## Article-level compliance

Each article object in an API response includes a `robots_compliant` boolean
field that indicates whether the publisher's robots.txt rules permit automated
access to the content: `true` means access is allowed, `false` means it is
restricted. The example below is abridged; other article fields are omitted.

```json {6} theme={null}
{
  "title": "Revolutionary AI Technology Breakthrough",
  "author": "Jane Smith",
  "domain_url": "techcrunch.com",
  "content": "Scientists have made a groundbreaking discovery...",
  "robots_compliant": true
}
```
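
As a minimal sketch, assuming the response has already been parsed into a list of article dictionaries, you could separate articles by their flag on the client side. The `split_by_compliance` helper is hypothetical and not part of News API, and treating a missing field as non-compliant is a conservative assumption rather than documented behavior.

```python
from typing import Any


def split_by_compliance(
    articles: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split parsed article objects into (compliant, non_compliant) lists.

    Hypothetical helper, not part of News API. An article without a
    `robots_compliant` field is placed in the non-compliant list (assumption).
    """
    compliant: list[dict[str, Any]] = []
    non_compliant: list[dict[str, Any]] = []
    for article in articles:
        # Only an explicit boolean True counts as compliant.
        if article.get("robots_compliant") is True:
            compliant.append(article)
        else:
            non_compliant.append(article)
    return compliant, non_compliant
```

If you only ever need one of the two groups, filtering in the request itself avoids transferring articles you will discard.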

### Filtering by compliance

Use the optional `robots_compliant` request parameter to filter results by
compliance status. It is available on all endpoints that return articles.

<Tabs>
  <Tab title="All articles">
    ```json theme={null}
    {
      "q": "artificial intelligence"
    }
    ```

    If the parameter is omitted, the API returns both compliant and
    non-compliant articles, each marked with its `robots_compliant` status.
    This is the default behavior.
  </Tab>

  <Tab title="Compliant only">
    ```json theme={null}
    {
      "q": "artificial intelligence",
      "robots_compliant": true
    }
    ```

    Returns only articles that comply with publisher guidelines.
  </Tab>

  <Tab title="Non-compliant only">
    ```json theme={null}
    {
      "q": "artificial intelligence",
      "robots_compliant": false
    }
    ```

    Returns only articles flagged as non-compliant.
  </Tab>
</Tabs>
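
The three request bodies above differ only in whether `robots_compliant` is present and what value it has. A minimal sketch of building them in Python follows; the `build_search_payload` function name is hypothetical, and sending the request (endpoint URL, authentication) is deliberately left out because those details are not covered on this page.

```python
from typing import Any, Optional


def build_search_payload(
    query: str, robots_compliant: Optional[bool] = None
) -> dict[str, Any]:
    """Build a request body with an optional `robots_compliant` filter.

    Hypothetical helper. None means "do not filter", so the parameter is
    omitted and the API applies its default behavior.
    """
    payload: dict[str, Any] = {"q": query}
    # Compare against None explicitly: a plain truthiness check would
    # silently drop robots_compliant=False (the "non-compliant only" case).
    if robots_compliant is not None:
        payload["robots_compliant"] = robots_compliant
    return payload
```

Serializing the result with `json.dumps` turns Python's `True` and `False` into the JSON literals `true` and `false` shown in the tabs.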

## Source-level compliance metrics

The `/sources` endpoint provides aggregate compliance metrics showing what
percentage of articles from each source comply with robots.txt rules.

To view compliance metrics, set `include_additional_info` to `true`:

```json theme={null}
{
  "source_name": "technology",
  "include_additional_info": true
}
```

The response includes `robots_compliant` in the `additional_info` object,
formatted as a percentage string such as `"100%"`:

```json {15} theme={null}
{
  "message": "Maximum sources displayed according to your plan is set to 1000",
  "sources": [
    {
      "name_source": "China Science and Technology Network",
      "domain_url": "stdaily.com",
      "logo": null,
      "additional_info": {
        "nb_articles_for_7d": 548,
        "country": "CN",
        "rank": 8107,
        "is_news_domain": true,
        "news_domain_type": "Original Content",
        "news_type": "General News Outlets",
        "robots_compliant": "100%"
      }
    }
  ]
}
```

This metric helps you compare sources by how consistently their articles comply
with robots.txt rules when selecting them for your application.
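
Because the metric is returned as a string, it needs parsing before you can compare it numerically. The sketch below assumes the value always looks like the `"100%"` in the example response (a number followed by a percent sign); both helper names are hypothetical, and sources with a missing or differently formatted value are skipped rather than guessed at.

```python
from typing import Any, Optional


def compliance_percent(source: dict[str, Any]) -> Optional[float]:
    """Return the source's robots_compliant percentage as a float, or None.

    Hypothetical helper. Assumes the "<number>%" format from the example
    response; anything else yields None.
    """
    info = source.get("additional_info") or {}
    value = info.get("robots_compliant")
    if not isinstance(value, str) or not value.endswith("%"):
        return None
    try:
        return float(value[:-1])
    except ValueError:
        return None


def sources_at_or_above(
    sources: list[dict[str, Any]], threshold: float
) -> list[dict[str, Any]]:
    """Keep sources whose parsed compliance percentage is >= threshold."""
    kept = []
    for source in sources:
        percent = compliance_percent(source)
        if percent is not None and percent >= threshold:
            kept.append(source)
    return kept
```

For example, `sources_at_or_above(response["sources"], 90.0)` would keep only sources where at least 90% of articles were flagged compliant, given a parsed `/sources` response in `response`.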

## Benefits

<CardGroup cols={2}>
  <Card title="Legal compliance" icon="shield-check">
    Reduce legal risk by respecting the access policies that publishers define
    in their robots.txt files.
  </Card>

  <Card title="Performance optimization" icon="gauge">
    Filter non-compliant content server-side, reducing unnecessary data transfer
    and improving application performance.
  </Card>

  <Card title="Publisher relations" icon="handshake">
    Demonstrate respect for content providers' access policies, fostering better
    long-term partnerships.
  </Card>

  <Card title="Transparency" icon="eye">
    Clear visibility into which content can be safely used, enabling informed
    decisions about content usage.
  </Card>
</CardGroup>

## Implementation considerations

<AccordionGroup>
  <Accordion title="Always present in article responses">
    The `robots_compliant` field is included in every article object, ensuring
    consistent access to compliance information regardless of filtering.
  </Accordion>

  <Accordion title="Optional parameter for filtering">
    You can include the `robots_compliant` parameter in requests to filter
    results, or omit it to receive all content with compliance flags.
  </Accordion>

  <Accordion title="Server-side filtering">
    When you use the parameter, filtering happens at the API level, improving
    performance by reducing unnecessary data transfer.
  </Accordion>

  <Accordion title="Source metrics require additional_info">
    To view source-level compliance percentages, include `include_additional_info: true`
    in your `/sources` endpoint requests.
  </Accordion>
</AccordionGroup>

## See also

* [API Reference](/news-api/api-reference/search/search-articles-get)
* [Error handling](/news-api/troubleshooting/error-handling)
