DevLog 20250523: Sitemap and `robots.txt`

Search engine optimization (SEO) is not just about keywords and HTML metadata. Those are the most basic steps (and they can noticeably improve site visibility), but there are other techniques that go a bit deeper and are more technical than what ordinary readers see.

I found that webmasters can submit sitemaps through Google Search Console and Bing Webmaster Tools. This continues our previous discussion on Search Engine Architecture.

Sitemap

A sitemap is simply a file (or sometimes a web page) that tells search engines about the pages on our site.

  • Better discovery: Search engines won’t have to “guess” which pages exist.
  • Faster indexing: New or updated content gets found more quickly when we update the sitemap.
  • Structured hints: Metadata in an XML sitemap gives crawlers extra clues about how often and how important different pages are.

There are two main flavors:

  1. XML Sitemap (for search engines)

    • It’s an XML-formatted file (usually named sitemap.xml) that lives at the website’s root.
    • Inside, it lists all of the website’s important URLs, plus optional metadata like:
      • <lastmod> (when the page was last changed)
      • <changefreq> (how often it tends to be updated)
      • <priority> (a hint about which pages we consider most important)
    • By submitting this file to Google Search Console or Bing Webmaster Tools, we help crawlers discover and index our pages more efficiently—especially useful if we have a very large site, pages that aren’t well linked internally, or lots of media content. (A small generation sketch in Python follows this list.)
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/</loc>
        <lastmod>2025-05-20</lastmod>
        <changefreq>daily</changefreq>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://www.example.com/blog/post-1</loc>
        <lastmod>2025-05-18</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
      <!-- more URLs here -->
    </urlset>
    
  2. HTML Sitemap (for people)

    • It’s just a regular web page on the site that lists links to all pages in a human-readable format.
    • It’s primarily a usability feature—helping visitors (and indirectly search engines) navigate large or complex sites.
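
For small sites, the XML flavor can be generated from a plain page list with no extra tooling. Here is a minimal Python sketch using only the standard library; the URLs, dates, and output filename are placeholders borrowed from the example above, not a prescribed workflow.

    import xml.etree.ElementTree as ET

    # Hypothetical page list: (URL, last modified, change frequency, priority).
    pages = [
        ("https://www.example.com/", "2025-05-20", "daily", "1.0"),
        ("https://www.example.com/blog/post-1", "2025-05-18", "weekly", "0.8"),
    ]

    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        for tag, value in (("loc", loc), ("lastmod", lastmod),
                           ("changefreq", changefreq), ("priority", priority)):
            ET.SubElement(url, tag).text = value

    # Writes the XML declaration plus the <urlset> tree to sitemap.xml at the site root.
    ET.ElementTree(urlset).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)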

`robots.txt`

The robots.txt file is another “cheat sheet”. It sits at the very root of the website (e.g. https://www.example.com/robots.txt) and tells well-behaved web crawlers which parts of the site they’re welcome to explore and which parts we’d rather keep off-limits.

  • Privacy & security: Keep staging directories, admin panels, or confidential files out of search results.
  • Crawl-budget control: On large sites, we can steer crawlers away from low-value pages (like faceted filters), so they focus on important content.
  • Performance: Reduce server load by preventing bots from hammering resource-heavy sections.

Some notes:

  1. Location matters
    • Must live at https://domain.com/robots.txt (exactly).
    • Crawlers automatically look here first before they begin crawling website pages.
  2. Basic syntax

    • It’s plain text, with directives grouped by User-agent (the crawler’s name).
    • Common directives:

      • Disallow: — path (or file) we don’t want crawled
      • Allow: — exception to a Disallow: (supported by Google, Bing, etc.)
      • Sitemap: — URL of the XML sitemap
    # Block all crawlers from /private/
    User-agent: *
    Disallow: /private/

    # For Googlebot, allow one page inside /private/ but keep the rest blocked.
    # A crawler follows only its most specific matching group, so the
    # Disallow must be repeated here rather than inherited from *.
    User-agent: Googlebot
    Allow: /private/public-info.html
    Disallow: /private/

    # Let everyone know where the sitemap lives
    Sitemap: https://www.example.com/sitemap.xml
    
  3. User-agents
    • * is the wildcard: applies to every crawler.
    • We can target specific bots (e.g., User-agent: Googlebot, User-agent: Bingbot) if we need different rules.
  4. Disallow vs. Allow
    • Disallow: / — don’t crawl anything on the site.
    • Disallow: (empty) — allow everything.
    • Allow: /path/to/page.html — lets a crawler fetch a page that would otherwise be blocked by a broader Disallow: (see the sketch after this list for a quick way to verify such rules).
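
To check that the rules do what we expect, Python’s standard-library urllib.robotparser can replay a robots.txt against specific user agents and URLs. A minimal sketch, assuming the example file above is actually served at the placeholder domain:

    from urllib import robotparser

    # Fetch and parse the (placeholder) site's robots.txt.
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()

    # Generic crawlers fall under the "*" group: /private/ is blocked.
    print(rp.can_fetch("*", "https://www.example.com/private/"))  # False
    # Googlebot matches its own group: the single allowed page is reachable.
    print(rp.can_fetch("Googlebot",
                       "https://www.example.com/private/public-info.html"))  # True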

A few best practices

  • robots.txt is publicly visible; don’t use it to hide truly sensitive info (use authentication!).
  • Test the file in Google Search Console or Bing Webmaster Tools before relying on it.
  • Combine with sitemaps: always include a Sitemap: line so crawlers can discover all valid URLs easily.
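
Since robots.txt advertises the sitemap, the two files can be tied together programmatically: discover the Sitemap: URL first, then walk its <loc> entries. A minimal sketch, assuming Python 3.8+ (for site_maps()) and the same placeholder domain:

    from urllib import request, robotparser
    import xml.etree.ElementTree as ET

    # Read robots.txt from the (placeholder) site.
    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.com/robots.txt")
    rp.read()

    NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    # site_maps() returns the URLs from Sitemap: lines, or None if there are none.
    for sitemap_url in rp.site_maps() or []:
        with request.urlopen(sitemap_url) as resp:
            root = ET.fromstring(resp.read())
        # Print every <loc> entry declared in the sitemap.
        for loc in root.iter(NS + "loc"):
            print(loc.text)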

Here is an example from Google: https://www.google.com/robots.txt
