Sitemap and robots.txt Configuration Guide

Dynamic XML sitemaps, image sitemaps, news sitemaps, and robots.txt — the complete crawl optimization guide with real Search Console data.

Why Sitemaps Matter More Than You Think

A sitemap is not just an SEO checkbox. It's a direct communication channel with search engine crawlers. Googlebot learns where your sitemap lives from the Sitemap line in robots.txt or from a Search Console submission, then uses it to understand the site structure, discover new pages, and schedule crawling. Without a sitemap, Google relies on following links, which means orphan pages (pages with no internal links) may never be discovered, let alone indexed.

JekCMS generates sitemaps dynamically. No static XML files to maintain. Every new post, category, or page is automatically included within minutes.

XML Sitemap Structure

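A valid XML sitemap follows the sitemaps.org protocol. Each URL entry includes the location and, optionally, the last modification date, a change frequency hint, and a priority hint. Google's documentation states that it ignores changefreq and priority, so an accurate lastmod is the optional value most worth maintaining: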

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://example.com/blog/my-post</loc>
        <lastmod>2026-03-15</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
    </url>
</urlset>

JekCMS generates separate sitemaps for different content types: sitemap-posts.xml, sitemap-pages.xml, sitemap-categories.xml, and sitemap-tags.xml. A master sitemap.xml index links to all of them.
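The master index described above can be sketched roughly as follows. This is a minimal illustration, not JekCMS's actual code: buildSitemapIndex() and its parameters are hypothetical names, and the lastmod dates are placeholders.

```php
<?php
// Sketch: build a sitemap index that points at the per-type sitemaps.
// buildSitemapIndex() is a hypothetical helper, not part of the JekCMS API.

/**
 * @param string $siteUrl Absolute site root, e.g. https://example.com
 * @param array  $files   Map of sitemap filename => last modification date (Y-m-d)
 */
function buildSitemapIndex(string $siteUrl, array $files): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($files as $file => $lastmod) {
        // Child sitemaps are referenced by absolute, XML-escaped URLs.
        $loc = htmlspecialchars(rtrim($siteUrl, '/') . '/' . $file, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "    <sitemap>\n";
        $xml .= "        <loc>{$loc}</loc>\n";
        $xml .= "        <lastmod>{$lastmod}</lastmod>\n";
        $xml .= "    </sitemap>\n";
    }
    $xml .= '</sitemapindex>' . "\n";
    return $xml;
}

// Placeholder dates; a real handler would read them from the database.
echo buildSitemapIndex('https://example.com', [
    'sitemap-posts.xml'      => '2026-03-15',
    'sitemap-pages.xml'      => '2026-03-10',
    'sitemap-categories.xml' => '2026-03-01',
    'sitemap-tags.xml'       => '2026-03-01',
]);
```

The index uses the sitemapindex and sitemap elements from the same sitemaps.org namespace as a regular urlset, so crawlers only need the one index URL to find every child file.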

Dynamic Generation in PHP

<?php
// includes/sitemap-handler.php
// Assumes the CMS bootstrap has already defined SITE_URL and created $db.
header('Content-Type: application/xml; charset=utf-8');

// Pick the content type from the requested sitemap filename.
$uri = $_SERVER['REQUEST_URI'];
$type = '';
if (strpos($uri, 'sitemap-posts') !== false) $type = 'posts';
elseif (strpos($uri, 'sitemap-pages') !== false) $type = 'pages';
elseif (strpos($uri, 'sitemap-categories') !== false) $type = 'categories';

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

// Only the posts branch is shown here.
if ($type === 'posts') {
    $posts = $db->fetchAll(
        "SELECT slug, updated_at FROM posts WHERE status='published' ORDER BY updated_at DESC"
    );
    foreach ($posts as $post) {
        // Escape the complete URL for XML, not just the slug.
        $loc = htmlspecialchars(SITE_URL . '/blog/' . $post['slug'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        echo "<url><loc>" . $loc . "</loc>";
        echo "<lastmod>" . date('Y-m-d', strtotime($post['updated_at'])) . "</lastmod></url>\n";
    }
}

echo '</urlset>';

Image Sitemap

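Google Images can be a significant traffic source. Adding image entries to your sitemap helps Google discover images it might not otherwise find, such as those loaded by JavaScript. The enclosing urlset element must declare the image namespace (xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"). Only image:loc is required; Google deprecated image:title along with several other optional image tags in 2022, so treat the title in this example as optional: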

<url>
    <loc>https://example.com/blog/my-post</loc>
    <image:image>
        <image:loc>https://example.com/uploads/images/photo.avif</image:loc>
        <image:title>Photo description</image:title>
    </image:image>
</url>
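An entry like the one above could be produced by a small helper along these lines. This is a sketch under assumptions: buildImageUrlEntry() and xmlEscape() are hypothetical names rather than JekCMS functions, and the helper emits only image:loc, the one required child element.

```php
<?php
// Sketch: render one <url> entry with its images for an image sitemap.
// xmlEscape() and buildImageUrlEntry() are hypothetical helpers.

function xmlEscape(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

function buildImageUrlEntry(string $pageUrl, array $imageUrls): string
{
    $xml = "<url>\n    <loc>" . xmlEscape($pageUrl) . "</loc>\n";
    foreach ($imageUrls as $imageUrl) {
        $xml .= "    <image:image>\n";
        $xml .= "        <image:loc>" . xmlEscape($imageUrl) . "</image:loc>\n";
        $xml .= "    </image:image>\n";
    }
    return $xml . "</url>\n";
}

// The image: prefix only resolves if <urlset> declares the image namespace.
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"' . "\n";
echo '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">' . "\n";
echo buildImageUrlEntry(
    'https://example.com/blog/my-post',
    ['https://example.com/uploads/images/photo.avif']
);
echo "</urlset>\n";
```

Escaping both the page URL and the image URLs matters because query strings containing & would otherwise produce invalid XML.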

robots.txt

The robots.txt file tells crawlers which parts of your site to crawl and which to ignore:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /includes/
Disallow: /config/
Disallow: /cache/

Sitemap: https://example.com/sitemap.xml

Critical: the Sitemap directive must use an absolute URL. Relative URLs are invalid and Google will ignore them.
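One way to guard against a relative Sitemap URL is to generate robots.txt from code and validate the URL first. The sketch below assumes a hypothetical buildRobotsTxt() helper; it is not how JekCMS necessarily does it.

```php
<?php
// Sketch: generate robots.txt and refuse a non-absolute Sitemap URL.
// buildRobotsTxt() is a hypothetical helper, not part of JekCMS.

function buildRobotsTxt(array $disallow, string $sitemapUrl): string
{
    // Require an http(s) scheme and a host, so a relative path such as
    // /sitemap.xml is rejected before it reaches the file.
    $scheme = strtolower((string) parse_url($sitemapUrl, PHP_URL_SCHEME));
    $host   = (string) parse_url($sitemapUrl, PHP_URL_HOST);
    if (!in_array($scheme, ['http', 'https'], true) || $host === '') {
        throw new InvalidArgumentException("Sitemap URL must be absolute: {$sitemapUrl}");
    }

    $lines = ['User-agent: *', 'Allow: /'];
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path;
    }
    $lines[] = '';
    $lines[] = 'Sitemap: ' . $sitemapUrl;

    return implode("\n", $lines) . "\n";
}

echo buildRobotsTxt(
    ['/admin/', '/api/', '/includes/', '/config/', '/cache/'],
    'https://example.com/sitemap.xml'
);
```

Calling buildRobotsTxt([], '/sitemap.xml') throws an InvalidArgumentException instead of silently writing a directive that crawlers would ignore.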

Crawl Budget Optimization

For sites with thousands of pages, crawl budget matters. Google allocates a finite number of crawls per day. Optimize by: blocking non-essential pages in robots.txt, using noindex meta tags for thin content, keeping each sitemap file under 50,000 URLs and 50 MB uncompressed (the sitemaps.org protocol limits, which Google also enforces), and serving fast responses (slow sites get crawled less).

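Blocking /admin/, /api/, and /cache/ in robots.txt prevents Googlebot from spending crawl budget on URLs that have no place in search results, directing that budget toward your actual content instead. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still appear in results if other sites link to it, and Googlebot cannot see a noindex tag on a page it is not allowed to fetch. Pick one mechanism per URL.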
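Staying under the 50,000-URL limit mentioned above usually means splitting a large URL list across several files. A minimal sketch, assuming a hypothetical chunkSitemapUrls() helper and an illustrative file naming scheme:

```php
<?php
// Sketch: split a URL list into sitemap files that respect the 50,000-URL limit.
// chunkSitemapUrls() and the file naming scheme are hypothetical.

const MAX_URLS_PER_SITEMAP = 50000;

/**
 * @return array Map of sitemap filename => list of URLs for that file
 */
function chunkSitemapUrls(array $urls, string $prefix = 'sitemap-posts'): array
{
    $files = [];
    foreach (array_chunk($urls, MAX_URLS_PER_SITEMAP) as $i => $chunk) {
        // Files are numbered from 1: sitemap-posts-1.xml, sitemap-posts-2.xml, ...
        $files[sprintf('%s-%d.xml', $prefix, $i + 1)] = $chunk;
    }
    return $files;
}

// Example with 120,000 placeholder URLs.
$urls = [];
for ($n = 1; $n <= 120000; $n++) {
    $urls[] = "https://example.com/blog/post-{$n}";
}
foreach (chunkSitemapUrls($urls) as $file => $chunk) {
    echo $file . ': ' . count($chunk) . " URLs\n";
}
```

With 120,000 URLs this yields three files (50,000, 50,000, and 20,000 entries), each of which would then be listed in the sitemap index.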
