What is robots.txt and why bother setting one up?
robots.txt is the first file Googlebot, Bingbot or GPTBot requests before crawling your site. It tells them which paths they may crawl and which they should skip. It's a plain text file that must sit at the root of the host: `https://your-domain.com/robots.txt`.
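To see how a compliant crawler reads the file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules are an inline example rather than a file fetched from a real site, and `is_allowed` is a hypothetical helper name. One caveat: this parser applies the first matching rule in a group, while Google uses the most specific (longest) matching path, so the two can disagree when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

# Example rules, written inline instead of fetched from https://your-domain.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "https://your-domain.com/blog/post"),    # allowed by the * group
        ("Googlebot", "https://your-domain.com/admin/login"),  # blocked by /admin/
        ("GPTBot", "https://your-domain.com/blog/post"),       # GPTBot group blocks everything
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The three checks print `True`, `False` and `False`. Nothing forces a crawler to run this check; it is simply what well-behaved bots do before each request.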
With this tool you pick which bots get which rules, add sitemap URLs and instantly see the finished file, ready to copy to your server. You can also block the main AI crawlers (such as GPTBot, ClaudeBot and PerplexityBot) with one button if you don't want your content collected for language models.
Important: robots.txt is a request, not a security measure. Well-behaved bots (Googlebot, Bingbot) honor it, but malicious scrapers simply ignore it. The file is also public, so anyone can read the paths you list in it. For real protection, use authentication, a web application firewall or IP blocking.
How to use it
- Decide what rules you want. Usually one `User-agent: *` group (all bots) is enough.
- For each group, add Allow paths (crawling permitted) and Disallow paths (crawling blocked). For example, `Disallow: /admin/` blocks everything under `/admin/`.
- Type sitemap URLs in the Sitemap field. They should be full URLs with `https://`.
- Use the preset buttons (block AI, block staging, allow everything except admin) to save time.
- Copy the generated file or download it as `robots.txt`. Drop it into the root of your site (next to index.html). Verify at `your-domain.com/robots.txt`.
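The steps above boil down to rendering groups and sitemap lines as text. Here is a minimal Python sketch of that rendering; the `Group` dataclass and `build_robots_txt` function are hypothetical names for illustration, not the tool's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One User-agent group: the bots it targets and their path rules."""
    user_agents: list[str]
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)
    crawl_delay: int | None = None


def build_robots_txt(groups: list[Group], sitemaps: list[str]) -> str:
    """Render the groups and sitemap URLs as robots.txt text."""
    blocks = []
    for group in groups:
        lines = [f"User-agent: {agent}" for agent in group.user_agents]
        # Allow lines first: parsers that apply the first matching rule
        # (such as Python's urllib.robotparser) then honor the exceptions.
        lines += [f"Allow: {path}" for path in group.allow]
        lines += [f"Disallow: {path}" for path in group.disallow]
        if not group.allow and not group.disallow:
            lines.append("Disallow:")  # an empty Disallow blocks nothing
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        blocks.append("\n".join(lines))
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # A blank line separates groups; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # "Allow everything except admin", plus one sitemap.
    print(build_robots_txt(
        [Group(user_agents=["*"], disallow=["/admin/"])],
        sitemaps=["https://your-domain.com/sitemap.xml"],
    ), end="")
```

The example prints a `User-agent: *` group with `Disallow: /admin/`, a blank line, then the `Sitemap:` line; save that output as `robots.txt` in the site root.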
When this tool helps
The most common scenarios where you need to set up robots.txt:
- Blocking AI scrapers. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), CCBot (Common Crawl) and Google-Extended (a control token for Google's AI models, not a separate crawler). More and more companies don't want their content used to train language models. The "Block AI" preset handles it.
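- Keeping crawlers out of the admin panel and API. `Disallow: /admin/`, `Disallow: /api/`, `Disallow: /wp-admin/`. Note that Disallow blocks crawling, not indexing: a blocked URL can still appear in Google results if other pages link to it. To keep a page out of results, use a `noindex` tag on a crawlable page or put it behind authentication.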
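- Staging and test environments. `staging.your-company.com` must stay out of Google. robots.txt applies per host, so serve a separate file at `staging.your-company.com/robots.txt` with a full block (`User-agent: *` followed by `Disallow: /`), and protect the environment with authentication as well.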
- Pointing to sitemaps. A line like `Sitemap: https://your-domain.com/sitemap.xml` helps Google and other search engines discover your pages faster.
- Crawl-delay for slow servers. If your server has a weak CPU and Bingbot is generating too much load, add `Crawl-delay: 10` (roughly a 10-second pause between requests). Googlebot ignores this directive; to slow it down temporarily, Google recommends returning `503` or `429` status codes.
- Different rules for different bots. You can let Google in everywhere but block Yandex on specific paths. Each User-agent gets its own group, and a bot that matches a named group follows only that group, ignoring `User-agent: *`.
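The scenarios above can live in one file. Below is a hypothetical robots.txt (placeholder domain and paths) checked with Python's standard `urllib.robotparser`; `allowed` is an illustrative helper. It also shows why the Bingbot group repeats the admin rules: once a crawler matches a named group, the `User-agent: *` group no longer applies to it. The parser only reports the Crawl-delay value; whether a bot honors it is up to the bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file combining the scenarios above (placeholder domain and paths).
ROBOTS_TXT = """\
# AI crawlers (Google-Extended is a control token, not a crawler)
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Bingbot has its own group, so the admin rules must be repeated here
User-agent: Bingbot
Disallow: /admin/
Disallow: /api/
Crawl-delay: 10

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /wp-admin/

Sitemap: https://your-domain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Would a crawler that honors robots.txt fetch this path?"""
    return parser.can_fetch(agent, "https://your-domain.com" + path)


if __name__ == "__main__":
    print(allowed("GPTBot", "/blog/"))         # False: AI group blocks everything
    print(allowed("Googlebot", "/blog/"))      # True: falls back to User-agent: *
    print(allowed("Googlebot", "/wp-admin/"))  # False: blocked by the * group
    print(allowed("Bingbot", "/admin/users"))  # False: Bingbot's own group
    print(parser.crawl_delay("Bingbot"))       # 10
    print(parser.site_maps())                  # ['https://your-domain.com/sitemap.xml']
```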
After generating the file, check it with the robots.txt + sitemap.xml validator. Once it is on the server, also publish a sitemap.xml so Google discovers your pages faster.
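As a rough offline complement to the validator, here is a hypothetical Python linter (`lint_robots_txt` is an illustrative name, not the validator's API) that flags a few common mistakes: rules before any `User-agent` line, Sitemap values that are not full URLs, non-numeric Crawl-delay, paths without a leading `/`, and unrecognized directives such as a misspelled `Dissallow`. It checks only these cases and is no substitute for a full validator.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Directives this sketch recognizes; others (e.g. engine-specific ones) get a warning.
KNOWN = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for common robots.txt mistakes."""
    warnings = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key not in KNOWN:
            warnings.append(f"line {number}: unrecognized directive '{key}'")
        elif key == "user-agent":
            seen_user_agent = True
        elif key == "sitemap":  # Sitemap may appear anywhere in the file
            url = urlparse(value)
            if url.scheme not in ("http", "https") or not url.netloc:
                warnings.append(f"line {number}: sitemap must be a full URL, got '{value}'")
        elif not seen_user_agent:
            warnings.append(f"line {number}: '{key}' appears before any User-agent line")
        elif key == "crawl-delay" and not value.replace(".", "", 1).isdigit():
            warnings.append(f"line {number}: Crawl-delay should be a number of seconds")
        elif key in ("allow", "disallow") and value and not value.startswith(("/", "*")):
            warnings.append(f"line {number}: path should start with '/'")
    return warnings


if __name__ == "__main__":
    sample = "Disallow: /x\nUser-agent: *\nDisallow: admin/\nSitemap: /sitemap.xml\n"
    for warning in lint_robots_txt(sample):
        print(warning)
```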