You publish content. Google ignores it. Nine times out of ten, the cause sits in one of four buckets: blocked resources, a stray noindex tag, a crawl error, or thin content. This guide walks through each with the settings, filters, and failure modes that matter.
Do not guess. Open Google Search Console, navigate to Pages > All submitted URLs, and filter by 'Not indexed'. This single view shows the exact reason Google assigns to each URL. The reasons are categorical — 'Crawled - currently not indexed', 'Discovered - currently not indexed', 'Blocked by robots.txt', 'Page with redirect', 'Not found (404)', 'Excluded by noindex tag'. Each maps to a specific cure.
In practice, when a URL stays at 'Discovered - currently not indexed' for more than 3 weeks, the usual cause is a crawl budget bottleneck: Google found the URL but has not crawled it yet. Resubmitting does not fix this. Reduce the low-value pages that compete for crawl slots instead. A common example is a site with 50,000 product pages, 10,000 of which are out of stock. Those dead pages drain budget. Consolidate or noindex them.
Check the robots.txt served on the live domain, not a staging copy. A single misplaced Disallow: / or a broad prefix rule like Disallow: /filter/ can block entire content clusters. Validate the rules with Google Search Console's robots.txt report and the URL Inspection tool (the standalone robots.txt tester has been retired). One edge case: a rule that blocks /assets/ may also block CSS and JS files, so Google renders a broken page and may treat it as low quality. Always test the full page rendering.
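The /assets/ edge case above can be checked offline before deploying a rule change. The following is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are made up for illustration, and the standard-library parser does not reproduce every detail of Google's matching (wildcards and longest-match precedence in particular), so treat it as a first pass rather than a replacement for testing in Search Console.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical rules: the /assets/ line also catches CSS and JS files.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /assets/
Disallow: /filter/
"""

SAMPLE_URLS = [
    "https://example.com/blog/post",
    "https://example.com/assets/site.css",
    "https://example.com/filter/color-red",
]

if __name__ == "__main__":
    for url in blocked_urls(SAMPLE_ROBOTS, SAMPLE_URLS):
        print("blocked:", url)
```

Running the same function over a list of a page's CSS and JS URLs shows quickly whether a content rule is also hiding rendering resources.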
If you have duplicate content across multiple URLs, Google expects you to consolidate signals. Read the official guidance on how to consolidate duplicate URLs to avoid spreading crawl budget across near-identical pages.
| Root Cause | Detection Method | Immediate Fix | Hidden Failure Mode |
|---|---|---|---|
| Blocked by robots.txt: a Disallow rule matches the URL path | GSC 'Blocked by robots.txt' status, confirmed with the URL Inspection tool or the robots.txt report | Remove or narrow the Disallow rule, then resubmit the URL in GSC | A Disallow rule on a subfolder such as /assets/ can also block CSS/JS. Always test page rendering |
| Noindex: a robots meta tag or X-Robots-Tag header is present | GSC 'Excluded by noindex tag' plus inspection of the page source and response headers | Remove the noindex directive. Wait for a recrawl or request indexing | Noindex on paginated or filter pages can affect the pages they share canonical tags with. Check the canonical tag too |
| Crawl error (4xx, 5xx): server timeout, 503, 404, 410 | GSC Page indexing report ('Not found (404)', 'Server error (5xx)') and Crawl stats report. Check server logs for response codes | Fix server resources, remove broken redirect chains, correct URL structure | Soft 404s (page returns 200 but shows 'no results') are not always flagged automatically. Review them manually |
| Content quality / thin pages: low word count, low value, auto-generated, or duplicate content | GSC 'Crawled - currently not indexed' plus a manual content audit | Increase page depth, add unique value, remove or consolidate duplicates | Google may index a page and later de-index it when it detects similar low-value content produced at scale. Monitor indexation trends weekly |
1. Open the 'Pages' report. Filter by 'Not indexed'. Note the exact reason for each URL group.
2. Test the exact URL path against robots.txt with GSC's URL Inspection tool or robots.txt report. If it is blocked, fix the rule and resubmit.
3. Look for <meta name='robots' content='noindex'>. Also check HTTP response headers for X-Robots-Tag.
4. Use curl or browser dev tools. Ensure a 200 status, no redirect chain longer than 3 hops, and no soft 404.
5. If all technical checks pass but the page is still not indexed, compare word count and uniqueness against the top 3 competing pages.
6. In GSC, paste the URL and click 'Request indexing'. Wait 5-7 days. Re-check the 'Not indexed' report.
Scenario: An e-commerce store with 15,000 product pages. Google indexed 2,100. The 'Not indexed' report showed 12,900 URLs. Breakdown: 4,500 'Discovered - currently not indexed', 5,200 'Crawled - currently not indexed', 2,600 'Blocked by robots.txt', 600 'Excluded by noindex tag'.
Step 1: Reviewed robots.txt. Found a blanket Disallow: /products/ rule — outdated from a site migration. Removed the rule. Re-submitted 2,600 URLs. Within 10 days, 1,800 were indexed.
Step 2: For the 'Crawled - currently not indexed' group (5,200 URLs), we sampled 300 pages. 210 had fewer than 80 words, no product description, and a single image. These were thin pages. We consolidated 4,800 thin product variants into 1,200 parent pages using canonical tags. The remaining 400 pages we enriched with unique descriptions.
Step 3: The 'Discovered - currently not indexed' group (4,500 URLs) resolved itself after we eliminated the thin pages, freeing up crawl budget. Within 4 weeks, total indexed URLs rose from 2,100 to 9,800.
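As a quick sanity check on the figures in this scenario, the breakdown can be summed and the indexation rate computed before and after. This is only arithmetic on the numbers quoted above. Using 15,000 as the denominator for the "after" rate is an assumption, since consolidating variants into parent pages reduces the number of pages that are meant to be indexed.

```python
# Figures quoted in the scenario above.
TOTAL_PAGES = 15_000
INDEXED_BEFORE = 2_100
INDEXED_AFTER = 9_800
NOT_INDEXED = {
    "Discovered - currently not indexed": 4_500,
    "Crawled - currently not indexed": 5_200,
    "Blocked by robots.txt": 2_600,
    "Excluded by noindex tag": 600,
}


def coverage(indexed: int, total: int) -> float:
    """Indexed share of all pages, as a percentage rounded to one decimal."""
    return round(100 * indexed / total, 1)


not_indexed_total = sum(NOT_INDEXED.values())

if __name__ == "__main__":
    print(not_indexed_total)                      # 12900
    print(TOTAL_PAGES - INDEXED_BEFORE)           # 12900, so the breakdown is complete
    print(coverage(INDEXED_BEFORE, TOTAL_PAGES))  # 14.0
    print(coverage(INDEXED_AFTER, TOTAL_PAGES))   # 65.3
```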
Noindex directives appear in two places: the HTML <head> as a meta tag, or the HTTP response as an X-Robots-Tag header. Check both. Common edge cases: a CMS that applies noindex to all pages older than a certain date, or a staging configuration that leaked to production. We once audited a site where the entire blog archive was set to noindex because the developer had applied it globally on the /blog/ path. It took 8 months to recover indexation after the tag was removed.
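Both locations can be checked in one pass once you have a page's HTML and response headers. The sketch below uses only the Python standard library and takes the HTML and headers as inputs, so it makes no assumption about how you fetch them. The function name `find_noindex` is illustrative, and the substring match is deliberately simple: it does not handle the equivalent `none` directive.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            self.directives.append(attr.get("content", "").lower())


def find_noindex(html: str, headers: dict[str, str]) -> list[str]:
    """Return where noindex was found: 'meta', 'header', both, or an empty list."""
    found = []
    parser = _RobotsMetaParser()
    parser.feed(html)
    parser.close()
    if any("noindex" in directive for directive in parser.directives):
        found.append("meta")
    for name, value in headers.items():
        # Header names are case-insensitive, so normalise before comparing.
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            found.append("header")
            break
    return found
```

A page can be clean in the HTML and still carry the header, which is why the function reports the two sources separately.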
If you are dealing with a migration or reindexing project, a structured workflow is essential. See this technical migration protocol for reindexing a website on Google for step-by-step instructions covering URL mapping, redirect validation, and indexation requests.
- Open Google Search Console > Pages > Not indexed. Note every reason.
- Test the exact URL with GSC's URL Inspection tool or robots.txt report. Confirm the page is not blocked.
- Inspect the page source for a noindex meta tag. Check HTTP response headers for X-Robots-Tag.
- Verify the page returns a 200 status code. Check for soft 404s (an empty result page that returns 200).
- Check redirect chains. No more than 3 hops. No redirect loops.
- Check the canonical tag. It should point to the same URL or a close equivalent.
- Review server logs for recent crawl attempts. Look for 5xx errors or timeouts.
- Assess content quality: word count, uniqueness, image alt text, internal links.
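The redirect items in this checklist (at most 3 hops, no loops, a final 200) can be automated. The sketch below is one way to do it. It takes a `fetch` callable that returns the status code and Location header for a URL without following redirects, so the HTTP client is left to you. The `fetch` signature and the fake site used for the demonstration are assumptions for illustration, and a real fetcher would also need to resolve relative Location values.

```python
from typing import Callable, Optional

REDIRECT_CODES = (301, 302, 303, 307, 308)

# A fetcher returns (status_code, Location header or None) without following redirects.
Fetcher = Callable[[str], tuple[int, Optional[str]]]


def trace_redirects(url: str, fetch: Fetcher, max_hops: int = 3) -> dict:
    """Follow redirects by hand and report the chain plus the first problem found."""
    chain = [url]
    while True:
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or location is None:
            problem = None if status == 200 else f"final status {status}"
            return {"chain": chain, "status": status, "problem": problem}
        if location in chain:
            return {"chain": chain + [location], "status": status, "problem": "redirect loop"}
        if len(chain) > max_hops:
            # Following this redirect would exceed the allowed number of hops.
            return {"chain": chain, "status": status, "problem": f"more than {max_hops} hops"}
        chain.append(location)


# Hypothetical site used only to exercise the function.
FAKE_SITE = {
    "/old": (301, "/older"),
    "/older": (301, "/new"),
    "/new": (200, None),
    "/a": (302, "/b"),
    "/b": (302, "/a"),
}


def fake_fetch(url: str) -> tuple[int, Optional[str]]:
    return FAKE_SITE.get(url, (404, None))


if __name__ == "__main__":
    print(trace_redirects("/old", fake_fetch))
    print(trace_redirects("/a", fake_fetch))
```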
Requesting indexing is not a guarantee. Common reasons a requested page still is not indexed: it is blocked by robots.txt, has a noindex tag, returns a 5xx error, or is too thin. Google may also deprioritize pages it considers low in demand. Wait 5-7 days, then re-check the GSC 'Not indexed' report for the specific reason. If the reason is 'Discovered - currently not indexed', it usually means crawl budget is tight, so consolidate low-value pages.
New domains have zero authority and limited crawl budget. Google may discover the URL but not index it for weeks. Solutions: build quality backlinks to the domain (not just the guest post), ensure the post has unique content (no spun text), and request indexing from GSC. Avoid placing guest posts on domains with a history of spam, as they can inherit a penalty.
Agencies often see 'Crawled - currently not indexed' across client sites due to thin content, duplicate boilerplate text, or misconfigured robots.txt. Use the Search Console URL Inspection API to audit URLs in bulk for each client property. Automate checks for noindex tags and server errors. Set up weekly indexation reports per property to catch regressions early.
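For the weekly per-property report, the inspection results can be reduced to a count per coverage state. The sketch below assumes the response bodies have already been fetched from the URL Inspection API and that each one carries `inspectionResult.indexStatusResult.coverageState`. That field path reflects our understanding of the response shape and should be verified against the current API reference. The API is also subject to per-property quotas, so large sites need sampling. The sample responses are invented.

```python
from collections import Counter


def coverage_summary(responses: list[dict]) -> Counter:
    """Count inspected URLs per coverage state; missing fields count as 'Unknown'."""
    counts: Counter = Counter()
    for body in responses:
        status = body.get("inspectionResult", {}).get("indexStatusResult", {})
        counts[status.get("coverageState", "Unknown")] += 1
    return counts


# Invented sample responses, trimmed to the one field this sketch reads.
SAMPLE_RESPONSES = [
    {"inspectionResult": {"indexStatusResult": {"coverageState": "Submitted and indexed"}}},
    {"inspectionResult": {"indexStatusResult": {"coverageState": "Crawled - currently not indexed"}}},
    {"inspectionResult": {"indexStatusResult": {"coverageState": "Crawled - currently not indexed"}}},
    {"inspectionResult": {}},
]

if __name__ == "__main__":
    for state, count in coverage_summary(SAMPLE_RESPONSES).most_common():
        print(f"{count:>4}  {state}")
```

Storing one such summary per property per week makes a regression visible as a jump in a single state.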
CDNs can block Googlebot if the firewall rules are too aggressive. Check your CDN logs for 403 or 503 responses to Googlebot IPs. Also ensure that the CDN caches the correct robots.txt and does not serve a stale version. Some CDNs also strip the X-Robots-Tag header — verify that the header is preserved in the cached response.
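A first pass over CDN or origin logs can surface these blocks. The sketch below assumes the logs are in the common "combined" access log format and matches Googlebot by user-agent string only. Both are assumptions: your CDN may export a different format, and user agents can be spoofed, so confirm suspicious hits against Google's documented verification method before changing firewall rules. The sample log lines are invented.

```python
import re
from collections import Counter

# Matches the request, status, referrer, and user-agent parts of a combined-format line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_blocks(lines, statuses=(403, 503)) -> Counter:
    """Count (status, path) pairs where a Googlebot user agent received a blocking status."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "googlebot" not in match["agent"].lower():
            continue
        status = int(match["status"])
        if status in statuses:
            hits[(status, match["path"])] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Invented sample lines using a documentation IP range.
SAMPLE_LOG = [
    f'203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /products/1 HTTP/1.1" 403 512 "-" "{GOOGLEBOT_UA}"',
    f'203.0.113.5 - - [10/Oct/2024:13:55:40 +0000] "GET /products/1 HTTP/1.1" 403 512 "-" "{GOOGLEBOT_UA}"',
    f'203.0.113.5 - - [10/Oct/2024:13:56:02 +0000] "GET /blog/post HTTP/1.1" 200 8120 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.9 - - [10/Oct/2024:13:57:11 +0000] "GET /products/1 HTTP/1.1" 403 512 "-" "curl/8.4.0"',
]

if __name__ == "__main__":
    for (status, path), count in googlebot_blocks(SAMPLE_LOG).items():
        print(status, path, count)
```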
Outreach landing pages are often thin — a single paragraph and a link. Google may see them as low-value and skip indexation. Improve the page with at least 300 words of original content, a clear value proposition, and internal links to authoritative pages on your site. Also ensure the page is not blocked by robots.txt and does not have a noindex tag.
Post-migration, Google needs to recrawl and reindex the new URLs. Common blockers: old URLs are not 301-redirected to the new ones, redirect chains are too long, or the new site has a noindex tag from staging. Run a full crawl with Screaming Frog or similar, fix all redirect issues, and submit a new sitemap in GSC. Indexation can take 4 to 8 weeks.
The Indexing API only supports pages with JobPosting structured data or BroadcastEvent structured data embedded in a VideoObject (job listings and livestream pages). It will not work for standard blog posts or product pages. If you use it on unsupported content types, Google may ignore the request. For regular pages, the only reliable methods are sitemap submission and manual URL inspection requests in GSC.
The top three crawl errors: 1) 503 errors (server overload or maintenance), 2) 404 errors (deleted pages without redirects), and 3) soft 404s (a page returns 200 but shows 'no results' or a blank state). If they persist, all three cause Google to drop the URL from the index. Check server logs weekly, set up alerting for spikes in 5xx errors, and review soft 404s manually.
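Manual soft 404 review goes faster with a rough pre-filter. The sketch below flags pages that return 200 but either contain an empty-state phrase or have very little text. The phrase list and the 30-word threshold are assumptions to tune per site, and this heuristic is not how Google classifies soft 404s.

```python
import re

# Assumed empty-state phrases; extend with the wording your own templates use.
EMPTY_STATE_PHRASES = ("no results", "nothing found", "page not found", "0 items")


def looks_like_soft_404(status: int, body_text: str, min_words: int = 30) -> bool:
    """Heuristic: a 200 response whose visible text is an empty state or nearly blank."""
    if status != 200:
        return False  # Real error codes are not soft 404s.
    text = body_text.lower()
    if any(phrase in text for phrase in EMPTY_STATE_PHRASES):
        return True
    return len(re.findall(r"\w+", text)) < min_words
```

Pass in the visible text of the page rather than raw HTML, otherwise markup inflates the word count.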
API-generated pages often lack static HTML content at the point of crawl, so Google may see an empty shell. Pre-render critical content on the server (server-side rendering). Dynamic rendering for Googlebot is an option, but Google describes it as a workaround rather than a long-term solution. Also check that the API endpoint is not rate-limiting Googlebot. Set up a separate crawl path with higher limits for search engine bots.
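One way to spot an empty shell is to count the words present in the raw HTML response before any JavaScript runs. The sketch below does that with the standard-library HTML parser, skipping script and style content. The function name and the sample markup are illustrative, and a low count is only a hint that content arrives client-side, not proof of how Google renders the page.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects words from text nodes, ignoring <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def static_word_count(html: str) -> int:
    """Number of words available in the HTML without executing JavaScript."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(extractor.words)


# Illustrative markup: a client-rendered shell versus a server-rendered page.
SHELL_HTML = '<html><body><div id="root"></div><script>fetch("/api/products")</script></body></html>'
RENDERED_HTML = "<html><body><h1>Blue widget</h1><p>Ships in two days.</p></body></html>"

if __name__ == "__main__":
    print(static_word_count(SHELL_HTML))     # 0
    print(static_word_count(RENDERED_HTML))  # 6
```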
A sitemap is a suggestion, not a guarantee. If you submitted 10,000 URLs but only 200 are indexed, the likely causes: the sitemap includes blocked URLs, thin content pages, or URLs that return 4xx/5xx errors. Run the sitemap through a validator tool. Remove all low-quality or blocked URLs from the sitemap. Only include pages you actively want indexed.
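Pruning a sitemap comes down to extracting its URLs and dropping those that fail your checks. The sketch below parses a standard sitemap with `xml.etree.ElementTree` and splits the URLs using a caller-supplied `is_indexable` predicate. The predicate is hypothetical: in practice it would combine your robots.txt, noindex, status-code, and content checks. The sample sitemap is invented.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> value from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def prune_sitemap(urls: list[str], is_indexable: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Split URLs into (keep, drop) according to the supplied predicate."""
    keep = [u for u in urls if is_indexable(u)]
    drop = [u for u in urls if not is_indexable(u)]
    return keep, drop


# Invented sample sitemap.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/blue-widget</loc></url>
  <url><loc>https://example.com/filter/color-red</loc></url>
  <url><loc>https://example.com/blog/post</loc></url>
</urlset>"""

if __name__ == "__main__":
    urls = sitemap_urls(SAMPLE_SITEMAP)
    # Stand-in predicate: treat anything under /filter/ as not worth indexing.
    keep, drop = prune_sitemap(urls, lambda u: "/filter/" not in u)
    print("keep:", keep)
    print("drop:", drop)
```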