OpenAI – ChatGPT – SearchGPT Web Crawlers, Spiders, and Bots

Source: OpenAI

Overview of OpenAI Crawlers

OpenAI uses web crawlers (“robots”) and user agents to perform actions for its products, either automatically or in response to a user request. The robots.txt user agents listed below let webmasters manage how their sites and content work with AI. Each setting is independent of the others: for example, a webmaster can allow OAI-SearchBot so the site can appear in search results while disallowing GPTBot to indicate that crawled content should not be used for training OpenAI’s generative AI foundation models. For search results, please note that it can take about 24 hours after a site’s robots.txt update for our systems to adjust.
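
As a minimal sketch of the independent settings described above, the following Python snippet parses an example robots.txt with the standard library's `urllib.robotparser` and checks what each user agent may fetch. The robots.txt content, the `build_parser` helper, and the `example.com` URL are illustrative assumptions, not text published by OpenAI, and the standard-library parser only approximates how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not an official template):
# allow the search crawler, opt out of training crawls.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.com/articles/hello"  # placeholder URL

# Each user agent is evaluated on its own group of rules.
results = {
    agent: parser.can_fetch(agent, page)
    for agent in ("OAI-SearchBot", "GPTBot", "ChatGPT-User")
}
print(results)
```

ChatGPT-User has no group in this example file and there is no `*` group, so the parser treats it as allowed; a site that wants to restrict it would add its own `User-agent: ChatGPT-User` group.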

OAI-SearchBot

    • OAI-SearchBot is for search. It is used to link to and surface websites in search results in the SearchGPT prototype. It is not used to crawl content to train OpenAI’s generative AI foundation models. To help ensure your site appears in search results, we recommend allowing OAI-SearchBot in your site’s robots.txt file and allowing requests from our published IP ranges below.

Full user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

Published IP addresses:
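
The published ranges themselves are not reproduced in this copy. As a rough sketch of how a server might check a request's source address against such a list once it has been obtained, the snippet below uses Python's `ipaddress` module. The `ip_in_ranges` helper is hypothetical, and the CIDR blocks are reserved documentation ranges used purely as placeholders; they are not OpenAI's addresses.

```python
import ipaddress

# PLACEHOLDER ranges from the reserved documentation blocks.
# These are NOT OpenAI's ranges; substitute the published list.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Hypothetical helper: True if `ip` falls inside any of the CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


print(ip_in_ranges("192.0.2.17", PLACEHOLDER_RANGES))
print(ip_in_ranges("203.0.113.5", PLACEHOLDER_RANGES))
```

Because a published list can change over time, a real deployment would likely refresh it periodically instead of hard-coding it.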

ChatGPT-User

    • ChatGPT-User is for user actions in ChatGPT and Custom GPTs. When users ask ChatGPT or a Custom GPT a question, it may visit a web page to help answer and include a link to the source in its response. ChatGPT users may also interact with external applications via GPT Actions. ChatGPT-User governs which sites these user requests can be made to. It is not used to crawl the web automatically, nor to crawl content for generative AI training.

Full user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot

Published IP addresses

GPTBot

    • GPTBot is used to make our generative AI foundation models more useful and safe. It is used to crawl content that may be used in training our generative AI foundation models. Disallowing GPTBot indicates a site’s content should not be used in training generative AI foundation models.

Full user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

Published IP addresses
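
The three full user-agent strings listed in this document share one pattern: a product token, a slash, and a version number. A minimal sketch for recognizing them in server logs is shown below, assuming that token-plus-version pattern holds; the `identify_openai_agent` function and its regular expression are illustrative, not an OpenAI-provided API. Any client can set its own User-Agent header, so a match is a hint about the sender, not proof.

```python
import re

# Product tokens taken from the user-agent strings in this document.
_TOKEN_RE = re.compile(r"\b(OAI-SearchBot|ChatGPT-User|GPTBot)/(\d+(?:\.\d+)*)")


def identify_openai_agent(user_agent: str):
    """Hypothetical helper: return (token, version) or None if no token is found."""
    match = _TOKEN_RE.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)


gptbot_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
    "compatible; GPTBot/1.1; +https://openai.com/gptbot"
)
print(identify_openai_agent(gptbot_ua))
print(identify_openai_agent("Mozilla/5.0 (Windows NT 10.0)"))
```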