A practical diagnostic hub that shows you exactly which pages are indexed, why others are missing, and how to fix them. No fluff. Just operational steps and real failure modes.
If your pages are not in Google's index, they cannot rank. Period. A Google website index checker tells you exactly which URLs Google knows about and which it ignores. Most site owners run one check, see green lights, and move on. That is a mistake. When you dig into the raw data, you often find entire sections of your site blocked, pages returning soft 404s, or JavaScript content that Google never rendered.
We see this pattern every week: a client with 50,000 product pages, only 12,000 indexed. The fix is rarely a single switch. It requires a systematic workflow. This article walks you through that workflow, from running the check to fixing the root cause.
| Method | How It Works | Best For | Hidden Failure Mode |
|---|---|---|---|
| Google Search Console URL Inspection (manual, single-URL test) | Sends a live fetch request to Google. Returns exact index status and any crawl errors. | Spot-checking critical pages (homepage, money pages, new content). | Rate-limited. You cannot test more than ~600 URLs per day. Also, the 'URL is on Google' status can be stale if the page was indexed months ago and later removed. |
| GSC Index Coverage Report (bulk export of all submitted URLs) | Aggregates data across all pages submitted via sitemap. Gives counts by status: Error, Valid, Excluded, etc. | Overview of site-wide index health. Identifying patterns (e.g., all /blog/ pages excluded). | Does not check pages not in your sitemap. Also, the 'Excluded' category lumps together 'duplicate without canonical', 'noindex', and 'crawled but not indexed'. You must drill down. |
| Third-Party Bulk Checkers (Sitebulb, Screaming Frog, Python scripts) | Uses the Google Indexing API or a cached search result check. Some tools simulate a 'site:domain.com/url' search. | Large-scale audits (10k+ URLs). Comparing index status across different crawl dates. | The Google Indexing API has a quota of 200 URLs per day per project. Most third-party tools fall back to cached data, which can be 1-3 weeks old. You get false negatives for recently published pages. |
| site: Operator in Google Search (manual query) | Type site:yourdomain.com/path in the search bar. Google shows indexed pages that match the query. | Quick gut check. No tools required. | Extremely unreliable. The count shown is an estimate, often off by 50-80%. The results are paginated and filtered by Google's relevance algorithm. You will not see all indexed pages. |
Scenario: A mid-size ecommerce site with 5,000 product pages. The client says 'Google indexes all our pages.' We run a bulk check via GSC Index Coverage export.
Step 1: Export the 'All submitted URLs' report. Total submitted: 4,850 (the rest were never added to the sitemap). Of those, 2,100 are 'Valid', 1,200 are 'Excluded', 450 are 'Error', and 1,100 are 'Crawled but not indexed yet'.
Step 2: Look at the 'Excluded' details. 800 are marked 'Duplicate without canonical' (same product with different sort parameters). 300 are 'Noindex' (staging pages accidentally left live). 100 are 'Page with redirect'.
Step 3: Fix actions: add canonical tags to the 800 duplicate URLs. Remove noindex tags from the 300 staging pages. Update or remove the 100 redirects. After fixes, resubmit the sitemap and recheck after 2 weeks. The recheck shows 3,400 valid pages. Index rate improved from 43% to 70%.
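
The index-rate arithmetic in this scenario is easy to get wrong by hand, so here is a minimal sketch of it. The function name `index_rate` is ours, and the numbers are the ones quoted in the scenario above.

```python
def index_rate(valid: int, submitted: int) -> float:
    """Return the share of submitted URLs that are indexed, as a percentage."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * valid / submitted


# Status buckets from Step 1 of the scenario.
buckets = {
    "Valid": 2_100,
    "Excluded": 1_200,
    "Error": 450,
    "Crawled but not indexed yet": 1_100,
}
submitted = sum(buckets.values())  # 4,850 submitted URLs

before = index_rate(buckets["Valid"], submitted)
after = index_rate(3_400, submitted)  # 'Valid' count at the recheck
print(f"submitted: {submitted}")
print(f"before: {before:.0f}%  after: {after:.0f}%")
```

Tracking the rate against submitted URLs, not against the full 5,000 pages, keeps the figure comparable between checks, because the denominator only changes when the sitemap does.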
Step 1: Export the GSC Index Coverage report. Filter by Excluded and Error. Count the total unindexed URLs.
Step 2: Categorize each unindexed URL by reason: noindex, robots.txt block, soft 404, or duplicate.
Step 3: Apply the fix that matches each reason: remove noindex tags, update robots.txt, add canonical tags, or fix server errors. Where your CMS supports regex-based bulk edits, use them to change many URLs at once.
Step 4: Generate a fresh sitemap with only the fixed URLs. Submit via GSC. Do not submit the old sitemap with broken URLs.
Step 5: After 7-14 days, re-run the bulk check. Track the 'Valid' count. Expect a 15-30% increase per cycle.
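
The categorization step is the one most worth scripting. The sketch below groups an export by exclusion reason and attaches a suggested fix. It assumes a CSV with `URL` and `Reason` columns, which is a hypothetical layout: real exports may use different column names, and the `FIX_BY_REASON` mapping is only our shorthand for the fixes described above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mapping from exclusion reason to the fix described in the workflow.
FIX_BY_REASON = {
    "noindex": "remove the noindex tag",
    "blocked by robots.txt": "update robots.txt",
    "soft 404": "add content or return 404/410",
    "duplicate without canonical": "add a canonical tag",
}


def group_by_reason(csv_text: str) -> dict[str, list[str]]:
    """Group URLs by their (lowercased) exclusion reason."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Reason"].strip().lower()].append(row["URL"].strip())
    return dict(groups)


# Made-up sample rows, only to show the shape of the input.
SAMPLE = """URL,Reason
https://example.com/shoes?sort=price,Duplicate without canonical
https://example.com/staging/home,Noindex
https://example.com/search?q=,Soft 404
https://example.com/shoes?sort=name,Duplicate without canonical
"""

groups = group_by_reason(SAMPLE)
# Largest group first, so the biggest win is at the top of the list.
for reason, urls in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    fix = FIX_BY_REASON.get(reason, "inspect manually")
    print(f"{reason}: {len(urls)} URL(s) -> {fix}")
```

Sorting by group size is a deliberate choice: one template-level fix usually clears the largest bucket.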
No index check is clean. Here are the failures we see most often:
Always double-check with GSC URL inspection on a few random URLs before declaring a crisis.
| GSC Exclusion Reason | Actual Meaning | Most Common Cause | Recommended Fix |
|---|---|---|---|
| Duplicate without canonical | Google found two identical pages and chose one as canonical. The other is excluded. | URL parameters (sort, filter, session IDs) creating near-duplicate content. | Add a rel='canonical' link tag pointing to the preferred version. Do not rely on parameter handling in GSC: Google retired the URL Parameters tool in 2022, so also keep parameterized URLs out of your sitemap and internal links. |
| Noindex | The page has a meta robots tag with 'noindex' or an X-Robots-Tag HTTP header. | Staging pages, old blog posts accidentally set to noindex, or a global noindex tag applied via theme settings. | Remove the noindex tag from the HTML or server response. Use a find-and-replace in the database or a bulk update plugin. |
| Blocked by robots.txt | Googlebot followed a URL in the sitemap but was blocked by a Disallow directive before crawling. | Overly broad Disallow rules, e.g., 'Disallow: /' or 'Disallow: /wp-admin/' that also blocks public pages under that path. | Edit robots.txt to remove the Disallow rule for those paths. Test with the robots.txt tester in GSC before going live. |
| Soft 404 | The page returns a 200 status code but Google thinks it has no useful content (thin page, error message, or blank page). | Empty category pages, search results pages with no results, or pages that redirect to a 404 without a proper 301. | Either add substantive content, return a 404 or 410 status, or redirect to a relevant page with a 301. Do not return 200 for empty pages. |
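
For the 'Blocked by robots.txt' row, you can pre-screen a URL list offline before touching the live file. This is a rough sketch using Python's standard `urllib.robotparser`. Treat it as an approximation: that parser applies rules in file order and does not implement Google's longest-match and wildcard behavior, so confirm edge cases in GSC. The helper name `blocked_urls` and the sample rules are ours.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network fetch
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Illustrative rules and URLs only.
ROBOTS = """User-agent: *
Disallow: /private/
"""
urls = [
    "https://example.com/private/page",
    "https://example.com/blog/post",
]
print(blocked_urls(ROBOTS, urls))
```

Run it against the proposed robots.txt and your sitemap URLs: any sitemap URL that comes back as blocked is a contradiction you should resolve before deploying.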
Modern sites rely heavily on JavaScript. Google renders pages in two waves: first the raw HTML, then a second pass with JavaScript executed. If your content is injected via JavaScript and the second pass fails (e.g., due to a slow API or a blocking script), Google may index an empty shell. This is a known issue covered in Google's official guidance on JavaScript SEO basics. Use the 'Test Live URL' in GSC to see what Google sees after rendering. If the rendered HTML is missing key content, you have a rendering problem, not an indexability problem.
Another common blind spot: lazy loading. Googlebot scrolls down on mobile-first indexing, but it may not trigger lazy-load events if they rely on user interaction. Ensure critical content is present in the initial HTML or use server-side rendering.
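
A quick way to spot this before opening GSC is to check whether your key phrases exist in the raw HTML at all, ignoring anything inside script tags. The sketch below does that with the standard library. It does not execute JavaScript and says nothing about what Google actually rendered; the function name and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_initial_html(html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the page's non-script text."""
    extractor = _TextExtractor()
    extractor.feed(html)
    text = " ".join(" ".join(extractor.chunks).split()).lower()
    return [p for p in phrases if p.lower() not in text]


# Made-up page: the product title only exists inside a script.
RAW = """<html><body><div id="app"></div>
<script>render({title: "Blue Running Shoes"})</script>
<footer>Free shipping over $50</footer></body></html>"""

print(missing_from_initial_html(RAW, ["Blue Running Shoes", "Free shipping"]))
```

Any phrase this reports as missing depends on the rendering pass, so those are the pages to run through 'Test Live URL' first.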
For bulk verification of 10,000+ pages, export your sitemap URLs and use Google Search Console's Index Coverage report. Filter by 'Submitted and indexed' to see which pages are included. For the unindexed ones, use the 'Excluded' tab to see the reason. If you need a script, the Google Indexing API has a daily quota of 200 URLs per project, so it is not suitable for massive bulk checks. Instead, use a crawler like Screaming Frog with the 'Check Index Status' feature, but be aware it relies on cached data.
Zero indexed pages usually means one of three things: (1) your site is brand new and Google has not crawled it yet (submit a sitemap and wait 2-3 weeks), (2) your robots.txt file blocks Googlebot entirely (check the robots.txt tester in GSC), or (3) you have a noindex meta tag on all pages (search your HTML for <meta name='robots' content='noindex'>). Rarely, it can be a manual action or a server error that returns 500 status. Run a single URL inspection in GSC to diagnose.
For agencies, the best option is the Google Search Console API (not the Indexing API). The GSC API lets you pull Index Coverage data for all client properties programmatically. You can build a dashboard that shows each client's valid indexed pages, excluded count, and error breakdown. The quota is generous (2,000 requests per day per project). For real-time single URL checks, use the URL Inspection API, which has a quota of 2,000 queries per day (and 600 per minute) per property. Avoid the Indexing API for bulk checks because it is designed for job posting or live-streaming pages, not general content.
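
For a dashboard, most of the work is flattening each inspection result into a row. The sketch below builds the request body and summarizes a response. The field names (`inspectionUrl`, `siteUrl`, `inspectionResult.indexStatusResult`, `verdict`, `coverageState`) reflect our reading of the URL Inspection API reference, so verify them against the current docs. The sample response is illustrative, not real output, and authentication and the HTTP call itself are left out.

```python
from collections import Counter


def build_request_body(page_url: str, property_url: str) -> dict:
    """Body for a URL Inspection request (field names assumed from the API reference)."""
    return {"inspectionUrl": page_url, "siteUrl": property_url}


def summarize_inspection(response: dict) -> dict:
    """Flatten the index status part of a response into one dashboard row."""
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return {
        "verdict": status.get("verdict", "VERDICT_UNSPECIFIED"),
        "coverage_state": status.get("coverageState", ""),
        "robots_txt_state": status.get("robotsTxtState", ""),
        "last_crawl_time": status.get("lastCrawlTime", ""),
    }


# Illustrative responses only; a real one carries more fields.
responses = [
    {"inspectionResult": {"indexStatusResult": {
        "verdict": "PASS", "coverageState": "Submitted and indexed"}}},
    {"inspectionResult": {"indexStatusResult": {
        "verdict": "NEUTRAL", "coverageState": "Crawled - currently not indexed"}}},
    {},  # a malformed or empty response should not crash the rollup
]

rows = [summarize_inspection(r) for r in responses]
print(Counter(row["verdict"] for row in rows))
```

Using `.get` with defaults throughout means one odd response degrades to an unspecified verdict instead of breaking a run across dozens of client properties.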
For backlinks and guest posts you control, use the URL Inspection tool in GSC. Enter the exact URL of the guest post and check 'Indexing requested'. If it says 'URL is not on Google', the page may have a noindex tag, be blocked by robots.txt, or return a soft 404. For backlinks on sites you do not control, use the 'Links' report in GSC to see which pages Google has indexed that link to you. If the linking page is not indexed, the link does not pass PageRank. Request indexing of your guest post via the GSC URL inspection tool.
Three common errors: (1) The sitemap includes URLs that return 4xx or 5xx status codes, which GSC will show as 'Crawled but not indexed' or 'Error'. (2) The sitemap is too large (over 50,000 URLs or 50MB uncompressed), so Google will truncate it. Split large sitemaps into several files and list them in a sitemap index. (3) The sitemap includes URLs with noindex tags. Google will ignore them but still count them in the 'Submitted' column, causing a misleading index rate. Always validate your sitemap with a tool like Screaming Frog before submission.
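
The size limits are simple to check before you submit. Here is a minimal sketch that parses a sitemap and reports limit violations and duplicate entries. It covers only the structural checks: it does not fetch URLs, so status codes and noindex tags still need a crawler. The function name and sample sitemap are ours.

```python
import xml.etree.ElementTree as ET

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB uncompressed
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def check_sitemap(xml_bytes: bytes) -> list[str]:
    """Return a list of structural problems found in a sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50MB uncompressed")
    root = ET.fromstring(xml_bytes)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    dupes = len(locs) - len(set(locs))
    if dupes:
        problems.append(f"{dupes} duplicate URL(s)")
    return problems


# Tiny made-up sitemap with one duplicated entry.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/a</loc></url>
</urlset>"""

print(check_sitemap(SAMPLE))
```

Duplicates are worth flagging even though they are not on the list above, because they inflate the 'Submitted' count the same way noindexed URLs do.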
Google uses mobile-first indexing, meaning it primarily uses the mobile version of a page for ranking and indexing. If your desktop and mobile pages have different content, HTML structure, or robots directives, the index checker will reflect the mobile version. For example, if the mobile page has a noindex tag but the desktop page does not, Google will not index the URL. Ensure your mobile page is not blocked, has equivalent content, and passes the 'Test Live URL' check in GSC with mobile user-agent.
No single API call returns a complete list of all indexed URLs. The GSC API's sitemap endpoints only return URLs you submitted. The Index Coverage API returns counts and reasons, not individual URLs. To compile a list, you must run a combination: (1) export all submitted URLs from your sitemap, (2) use the GSC URL Inspection API to check each URL's status (limited to 2,000 inspections per day per property), or (3) use the 'site:' operator with a crawl tool, but that is incomplete. For most sites, the Index Coverage report CSV export is the closest you can get.
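
Because per-URL checks are quota-bound, it helps to plan how many days a full pass will take before you start. The sketch below splits a URL list into daily batches. `daily_quota` is a parameter on purpose: the value used here is a placeholder, so set it from the current limits for the API and property you are using.

```python
def plan_batches(urls: list[str], daily_quota: int) -> list[list[str]]:
    """Split URLs into consecutive batches of at most `daily_quota` items."""
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    return [urls[i:i + daily_quota] for i in range(0, len(urls), daily_quota)]


# Placeholder inputs: 4,850 made-up URLs and an assumed quota of 1,000 per day.
urls = [f"https://example.com/p/{n}" for n in range(4_850)]
batches = plan_batches(urls, daily_quota=1_000)
print(f"{len(batches)} days; last batch has {len(batches[-1])} URLs")
```

Put your highest-value URLs at the front of the list so the first day's batch answers the questions you care about most.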
This status means Googlebot fetched the URL but chose not to add it to the index. Common causes: thin content, low page quality, duplicate content, or slow server response. Fixes: (1) improve the content to be unique and valuable (at least 300 words of original text), (2) remove or consolidate thin pages, (3) ensure the page loads in under 3 seconds, (4) add internal links from high-authority pages on your site. After fixes, request indexing via the URL Inspection tool. It may take 2-4 weeks for Google to recrawl and index the page.