Stop manually checking individual URLs. Build a pipeline that fetches indexing status for thousands of pages using the official Google Indexing API. Includes real-world quota math, error handling, and a worked example.
The Google Indexing API is not a general-purpose index checker: it is designed for pages that change frequently, like job postings or live events. With the right wrapper, though, you can repurpose it to check indexing status for any URL on a site you own. The official Google Indexing API Quickstart shows how to create a service account and authenticate. From there, you build a loop that calls the getStatus method (the API's getMetadata call) for each URL.
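Here is a minimal sketch of that per-URL call in Python, using only the standard library. The endpoint is the documented REST path of the Indexing API's `urlNotifications.getMetadata` method. `build_metadata_url` and `fetch_metadata` are hypothetical helper names, and the access token is assumed to come from the Quickstart's service-account flow. The sketch returns the decoded JSON untouched, so check the API reference for the exact response fields before mapping them to the status labels used in this article.

```python
import json
import urllib.parse
import urllib.request

# REST path of the Indexing API's urlNotifications.getMetadata method.
METADATA_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications/metadata"


def build_metadata_url(page_url: str) -> str:
    """Return the request URL with page_url percent-encoded into the ?url= parameter."""
    return METADATA_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def fetch_metadata(page_url: str, access_token: str, timeout: float = 30.0) -> dict:
    """Send one authenticated GET for page_url and return the decoded JSON body.

    access_token is assumed to be an OAuth 2.0 bearer token issued for the
    https://www.googleapis.com/auth/indexing scope.
    """
    request = urllib.request.Request(
        build_metadata_url(page_url),
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx (e.g. 401, 429); callers decide how to retry.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    # An empty body yields {} so the caller can flag the URL and retry later.
    return json.loads(body) if body else {}
```

In the main loop, call `fetch_metadata(url, token)` once per URL. Batching, retries, and classification all wrap around this single call.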
In practice, a poorly batched script can exhaust the 200-request daily quota within 10 minutes. A common situation we see: developers set up the script, run it against 2000 URLs, and wonder why 1800 of them fail with quota-exceeded errors (HTTP 429). The quota resets at midnight Pacific Time. Plan your batches around that window, and stagger large lists over multiple days.
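Staggering a large list is a simple slicing problem. This sketch assumes the 200-requests-per-day figure used throughout this article. `plan_days` is a hypothetical helper. Starting each chunk after the midnight Pacific reset is left to cron or whatever job runner you use.

```python
from typing import List, Sequence

DAILY_QUOTA = 200  # requests per project per day, as assumed throughout this article


def plan_days(urls: Sequence[str], daily_quota: int = DAILY_QUOTA) -> List[List[str]]:
    """Split a URL list into consecutive day-sized chunks of at most daily_quota URLs."""
    if daily_quota <= 0:
        raise ValueError("daily_quota must be positive")
    return [list(urls[i:i + daily_quota]) for i in range(0, len(urls), daily_quota)]
```

With this split, a 2000-URL list becomes ten daily chunks instead of one run that starts failing after the first 200 requests.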
Edge cases matter. A URL blocked by robots.txt returns NOT_FOUND, not a clear 'blocked' error. A page with a noindex tag returns NOT_FOUND as well. You cannot distinguish 'blocked' from 'removed' without cross-referencing the response with a separate check. That is a hard limitation of the API. We will show you how to build a fallback for ambiguous results.
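One way to build that fallback is sketched below in Python, with two deliberate substitutions. First, it parses fetched HTML with the standard library instead of driving a headless browser, so it misses noindex tags injected by JavaScript. Second, it uses `urllib.robotparser`, which does simple prefix matching and does not implement Google's `*` and `$` wildcard rules. Fetching the page, its status code, and robots.txt is left to the caller. The function names and the labels `blocked_robots`, `404`, `noindex`, and `ambiguous` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attributes.get("content", "").lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if a robots meta tag or the X-Robots-Tag header value contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return any("noindex" in d for d in parser.directives + [x_robots_tag.lower()])


def blocked_by_robots(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """True if the robots.txt text disallows url for user_agent (prefix matching only)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def classify_not_found(url: str, http_status: int, html: str, robots_txt: str,
                       x_robots_tag: str = "") -> str:
    """Label a NOT_FOUND result: 'blocked_robots', '404', 'noindex' or 'ambiguous'."""
    if blocked_by_robots(robots_txt, url):
        return "blocked_robots"
    if http_status == 404:
        return "404"
    if has_noindex(html, x_robots_tag):
        return "noindex"
    return "ambiguous"
```

Robots.txt is checked first because Googlebot never sees a noindex tag on a page it is not allowed to crawl.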
| API Endpoint / Method | What It Returns | Quota & Limits | Hidden Failure Mode |
|---|---|---|---|
| getStatus (getMetadata: `GET /v3/urlNotifications/metadata`) | Indexing state: INDEXED, NOT_FOUND, or INVALID_URL | 200 req/day per project; resets at 00:00 PT | Returns NOT_FOUND for noindex and robots.txt-blocked pages, with no differentiation |
| publish (`POST /v3/urlNotifications:publish`) | Confirms submission to the indexing queue | 200 req/day (shared pool with getStatus) | Submitting a blocked URL still returns HTTP 200, but the page stays unindexed |
| batch (`POST https://indexing.googleapis.com/batch`) | One multipart/mixed response part per bundled call | Up to 100 calls per batch request; each call still counts against the daily quota | Batching saves HTTP round-trips, not quota: a 100-call batch spends 100 requests |
| OAuth 2.0 scope (`https://www.googleapis.com/auth/indexing`) | Access token for the service account | Token expires after 60 minutes; refresh logic required | An expired token returns 401 with no retry hint, a silent failure if not handled |
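The token row is the easiest to get wrong in a long run. If you use the google-auth library, its service-account credentials track expiry for you. For a hand-rolled client, a minimal sketch of the check follows. `AccessToken`, `get_valid_token`, and the caller-supplied `refresh` callable are hypothetical, and the 300-second margin is an assumption chosen so a token does not expire mid-batch.

```python
import time
from dataclasses import dataclass


@dataclass
class AccessToken:
    """An OAuth 2.0 access token plus the absolute time (epoch seconds) it expires."""
    value: str
    expires_at: float

    @classmethod
    def from_response(cls, value: str, expires_in: float, now: float) -> "AccessToken":
        # Token endpoints report lifetime in seconds; 3600 corresponds to 60 minutes.
        return cls(value=value, expires_at=now + expires_in)

    def needs_refresh(self, now: float, margin: float = 300.0) -> bool:
        """True once fewer than `margin` seconds of validity remain."""
        return now >= self.expires_at - margin


def get_valid_token(token: AccessToken, refresh, now=None) -> AccessToken:
    """Return token unchanged, or a fresh one from refresh() if it is about to expire."""
    now = time.time() if now is None else now
    return refresh() if token.needs_refresh(now) else token
```

Call `get_valid_token` before every batch rather than once per run, so a multi-hour job never sends an expired token and triggers the silent 401.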
1. Clean the list: remove duplicates, filter out non-HTTP(S) schemes, and exclude blacklisted paths.
2. Authenticate: load the JSON key, request an OAuth token with the indexing scope, and verify the token is not expired.
3. Query in batches: split the list into batches of 10-20 URLs, send sequential getStatus calls, and leave a 1-second gap between batches.
4. Classify: map `INDEXED`, `NOT_FOUND`, and `INVALID_URL`; log ambiguous NOT_FOUND results for manual review.
5. Verify: run a headless browser check for a noindex tag and robots.txt rules; mark the URL as 'blocked' if either is found.
6. Export: write a CSV with columns URL, status, ambiguous flag, and suggestion (re-index / fix robots / ignore).
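The batching step above reduces to a sequential loop with a pause between batches. This is a minimal sketch, assuming a hypothetical `check(url)` callable that performs one status request and returns a label. `sleep` is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_batches(urls: List[str], check: Callable[[str], str], batch_size: int = 20,
                gap_seconds: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Run check(url) sequentially in batches, pausing gap_seconds between batches."""
    results: Dict[str, str] = {}
    for index, batch in enumerate(chunked(urls, batch_size)):
        if index > 0:
            sleep(gap_seconds)  # pause between batches, not before the first one
        for url in batch:
            results[url] = check(url)  # one status request per URL, in order
    return results
```

Keeping the loop sequential makes quota use predictable: the number of requests spent always equals the number of URLs processed so far.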
Scenario: You have 500 product pages. Each URL uses HTTPS, with no duplicates. Quota: 200 requests/day.
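Plan: Day 1: 200 URLs. Day 2: 200 URLs. Day 3: 100 URLs. Each batch of 20 URLs takes ~20 seconds, plus a 1-second gap between batches, so a 200-URL day needs about 3.5 minutes of request time. Budget ~10 minutes per day to leave room for retries and the verification pass.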
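The per-day figure can be checked with a small calculation under the example's assumptions: 20 URLs per batch, about 20 seconds per batch, and a 1-second gap between consecutive batches. `estimate_run_seconds` is a hypothetical helper, and the per-batch time is the example's estimate, not a measured API latency.

```python
import math


def estimate_run_seconds(n_urls: int, batch_size: int = 20,
                         seconds_per_batch: float = 20.0, gap_seconds: float = 1.0) -> float:
    """Request time: every batch's duration plus one gap between each pair of batches."""
    if n_urls <= 0:
        return 0.0
    batches = math.ceil(n_urls / batch_size)
    return batches * seconds_per_batch + (batches - 1) * gap_seconds
```

`estimate_run_seconds(200)` gives 209 seconds (10 batches of 20 s plus 9 one-second gaps), about 3.5 minutes, and day 3's 100 URLs take 104 seconds. That leaves most of a 10-minute daily window for retries and the verification pass.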
Result: 430 URLs returned INDEXED. 55 returned NOT_FOUND. 15 returned INVALID_URL (typos in URL list).
Cross-check: Of the 55 NOT_FOUND results, the script found 30 with a noindex tag and 10 blocked by robots.txt. The remaining 15 were truly not found (404). Action: remove the 404 URLs from the sitemap, fix robots.txt for the 10, remove noindex from the 30, and re-submit those 40 fixed URLs.
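Turning these findings into the CSV from the pipeline's final step is a lookup plus `csv.writer`. In this sketch the verification labels (`noindex`, `blocked_robots`, `404`) and the suggestion strings are hypothetical. Anything unmapped is flagged as ambiguous and routed to manual review. Applied to the 55 NOT_FOUND rows in this example, only the 40 noindex and robots.txt rows get a re-index suggestion, and the 15 true 404s go to sitemap cleanup.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical mapping from (API status, verification label) to the next action.
SUGGESTIONS: Dict[Tuple[str, str], str] = {
    ("NOT_FOUND", "noindex"): "remove noindex, then re-index",
    ("NOT_FOUND", "blocked_robots"): "fix robots.txt, then re-index",
    ("NOT_FOUND", "404"): "remove from sitemap",
    ("INVALID_URL", ""): "fix URL in list",
    ("INDEXED", ""): "ignore",
}


def suggest(status: str, label: str = "") -> str:
    """Look up the action for a result; unmapped combinations go to manual review."""
    return SUGGESTIONS.get((status, label), "manual review")


def write_report(rows: List[Tuple[str, str, str]]) -> str:
    """Render (url, status, label) rows as CSV text with ambiguous flag and suggestion."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "status", "ambiguous", "suggestion"])
    for url, status, label in rows:
        action = suggest(status, label)
        writer.writerow([url, status, action == "manual review", action])
    return buffer.getvalue()
```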
Time saved: Manual check of 500 URLs would take ~4 hours. Automated pipeline: 30 minutes of setup + 30 minutes of run time over 3 days. That is a 4x speedup for a one-time audit.
Duplicate URL lists are the #1 cause of wasted quota. One agency we worked with sent the same 200 URLs every day for a week because their crawler was not deduplicating. Run a set() on your list before the first call.
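A plain `set()` removes exact duplicates, but it also discards the list order and treats `https://Example.com/a` and `https://example.com/a#top` as different URLs. Below is a slightly stricter sketch. It assumes that lowercasing the scheme and host, dropping the fragment, and treating an empty path as `/` are safe normalizations for your site. Paths and queries are left untouched because they can be case-sensitive. `normalize` and `dedupe` are hypothetical names.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Dedup key: lowercase scheme and host, empty path as "/", fragment dropped."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def dedupe(urls: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each normalized HTTP(S) URL, preserving order."""
    seen = set()
    unique: List[str] = []
    for url in urls:
        key = normalize(url)
        if urlsplit(key).scheme not in ("http", "https"):
            continue  # mailto:, ftp:, bare hostnames and similar never reach the API
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```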
Empty results happen when the API returns a 200 with no body — usually a transient backend glitch. Retry with exponential backoff (1s, 2s, 4s). If you get three empty responses in a row, skip that URL and flag it.
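A minimal sketch of that retry rule follows, assuming a hypothetical `fetch(url)` callable that returns the raw response body as text. Read literally, "three empty responses" means three attempts, so the default waits 1s and then 2s before giving up. Pass `max_empty=4` to use the full 1s, 2s, 4s ladder.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(fetch: Callable[[str], str], url: str, base_delay: float = 1.0,
                       max_empty: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Retry fetch(url) while it returns an empty body.

    Waits base_delay * 2**n seconds (1s, 2s, 4s, ...) between attempts and
    returns None after max_empty consecutive empty responses, so the caller
    can skip and flag the URL.
    """
    for attempt in range(max_empty):
        body = fetch(url)
        if body:
            return body
        if attempt < max_empty - 1:  # no point sleeping after the final attempt
            sleep(base_delay * 2 ** attempt)
    return None
```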
Weak pages (thin content, no internal links) can show as INDEXED but have zero search impressions. The API does not tell you that, so you need a separate analytics check. Run a second pass with the Google Search Console API to compare indexed URLs against impression data.
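The second pass boils down to a Search Analytics query grouped by page and a set comparison. This sketch only builds the request body, using fields from the Search Console API's `searchanalytics.query` method, and compares the results. Sending the request is left to your client of choice. Pages with no impressions do not appear in Search Analytics results, so a URL missing from the impressions map counts as zero. `search_analytics_body` and `weak_pages` are hypothetical names, and the comparison assumes both sides use the same URL form.

```python
from typing import Dict, List


def search_analytics_body(start_date: str, end_date: str, start_row: int = 0) -> dict:
    """Request body for a Search Analytics query grouped by page."""
    return {
        "startDate": start_date,  # YYYY-MM-DD
        "endDate": end_date,
        "dimensions": ["page"],
        "rowLimit": 25000,        # maximum rows per request
        "startRow": start_row,    # page through results in steps of rowLimit
    }


def weak_pages(statuses: Dict[str, str], impressions: Dict[str, int]) -> List[str]:
    """URLs reported as INDEXED that earned no impressions in the queried window."""
    return sorted(url for url, status in statuses.items()
                  if status == "INDEXED" and impressions.get(url, 0) == 0)
```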
Slow vendors: Some DNS providers or CDN layers can cause the API to time out. If you see consistent 504 errors for a domain, check the DNS propagation before blaming the API.
Verify service account email is added as owner in Google Search Console
Confirm OAuth scope is exactly https://www.googleapis.com/auth/indexing
Deduplicate URL list and remove non-HTTP(S) entries
Split list into daily batches of 200 max
Set up exponential backoff (1s, 2s, 4s) for transient failures
Add a manual review flag for NOT_FOUND responses
Log the API response body to a file for post-run analysis
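For the logging item, JSON Lines (one JSON object per line) keeps each raw response intact and is easy to reload. This is a sketch with hypothetical helper names. The record fields are an assumption, not a format the API defines.

```python
import json
import time
from typing import List, Optional


def format_record(url: str, http_status: int, body: str, now: Optional[float] = None) -> str:
    """Serialize one API call as a single JSON line."""
    record = {
        "ts": time.time() if now is None else now,
        "url": url,
        "http_status": http_status,
        "body": body,  # raw text, kept even when empty or not valid JSON
    }
    return json.dumps(record) + "\n"


def log_response(path: str, url: str, http_status: int, body: str) -> None:
    """Append one record to the run log; append mode keeps earlier runs."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(format_record(url, http_status, body))


def read_log(path: str) -> List[dict]:
    """Load every record back for post-run analysis."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because the raw body is stored, a later pass can re-classify results without spending any more quota.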
How do I check indexing status for many URLs at once?
Use the Google Indexing API getStatus endpoint with a service account. Build a script that loops through URLs in batches of 20, respecting the 200 requests/day quota. For agencies with many clients, create separate Google Cloud projects per client to multiply the quota.
How does the Indexing API differ from the Search Console API?
The Indexing API checks a single URL's current index status. The Search Console API provides aggregated performance data (impressions, clicks) per page, plus per-URL checks through its URL Inspection endpoint. For bulk index checking, use the Indexing API. For performance analysis, use the Search Console API.
Can I check whether my backlinks are indexed?
Yes, but only if you own the site. The API requires ownership in Search Console, so for third-party guest posts the site owner has to add your service account. A common workaround is to limit the check to backlink pages you have placed on sites you own.
How do I check more than 200 URLs per day?
The quota is 200 requests/day. To exceed it, you need to request a quota increase from Google Cloud Console (not guaranteed). Alternatively, distribute the load across multiple Google Cloud projects, or stagger your checks over several days. The quota resets at midnight Pacific Time.
Why does a page that exists return NOT_FOUND?
This happens when the page has a noindex meta tag or is blocked by robots.txt. The API treats both cases as NOT_FOUND. To differentiate, run a headless browser check or use the Search Console URL Inspection API as a fallback.
What are the best practices for a bulk run?
1) Deduplicate URL list. 2) Batch 20 URLs with 1-second delay. 3) Use exponential backoff (1s, 2s, 4s) for retries. 4) Log raw API responses. 5) Cross-check NOT_FOUND results with a headless browser for noindex/robots.txt. 6) Schedule runs during off-peak hours.
What does INVALID_URL mean, and how do I fix it?
INVALID_URL means the URL is malformed (missing scheme, special characters, etc.). Fix by URL-encoding the string, ensuring it starts with http:// or https://, and removing fragments (#). Add a validation step before the API call to catch these early.
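A validation step along those lines can be sketched with `urllib.parse`. It rejects entries without an `http`/`https` scheme or a host, drops the fragment, and percent-encodes unsafe characters while leaving existing `%XX` escapes alone. `clean_url` and `split_valid` are hypothetical names. Compare the cleaned form with the URL as it appears in your sitemap, since encoding changes the string.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote, urlsplit, urlunsplit


def clean_url(raw: str) -> str:
    """Return a request-ready URL, or raise ValueError for a missing scheme or host."""
    parts = urlsplit(raw.strip())
    if parts.scheme.lower() not in ("http", "https"):
        raise ValueError(f"missing or unsupported scheme: {raw!r}")
    if not parts.netloc:
        raise ValueError(f"missing host: {raw!r}")
    # '%' is in safe= so existing escapes are not double-encoded; spaces and non-ASCII are.
    path = quote(parts.path or "/", safe="/%:@!$&'()*+,;=-._~")
    query = quote(parts.query, safe="=&%/:+,;@!$'()*?-._~")
    # The empty last element drops the #fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def split_valid(urls: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition raw URLs into cleaned, request-ready URLs and rejected originals."""
    valid: List[str] = []
    rejected: List[str] = []
    for raw in urls:
        try:
            valid.append(clean_url(raw))
        except ValueError:
            rejected.append(raw)
    return valid, rejected
```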
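Is there an alternative with a higher or unlimited quota?
No official option offers unlimited checks. For very high volumes (thousands of URLs), note that the Search Console API has no endpoint that lists every indexed page, but a Search Analytics query grouped by page returns every URL that earned impressions, up to 25,000 rows per request, with no per-URL cost. Treat it as a proxy: an indexed page with zero impressions will not appear. This works for owned sites only.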
Can I request indexing after a check?
Yes, the API has a publish endpoint that submits a URL to the indexing queue. After a check, if the status is NOT_FOUND and you have fixed the issue, call the publish endpoint for that URL. Note: the publish endpoint shares the same 200-request quota.
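The publish call takes a small JSON body with the page `url` and a notification `type`: `URL_UPDATED` for new or changed pages, or `URL_DELETED` for removed ones. This is a standard-library sketch. `publish_body` and `publish` are hypothetical helper names, and the token is assumed to carry the indexing scope.

```python
import json
import urllib.request

PUBLISH_ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"


def publish_body(page_url: str, deleted: bool = False) -> bytes:
    """JSON body for urlNotifications.publish."""
    notification_type = "URL_DELETED" if deleted else "URL_UPDATED"
    return json.dumps({"url": page_url, "type": notification_type}).encode("utf-8")


def publish(page_url: str, access_token: str, timeout: float = 30.0) -> dict:
    """POST one notification; each call spends one request from the daily quota."""
    request = urllib.request.Request(
        PUBLISH_ENDPOINT,
        data=publish_body(page_url),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Call it only for URLs whose fix you have confirmed, so the shared quota is not spent on pages that will come back NOT_FOUND again.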
How many URLs typically come back NOT_FOUND?
Expect 12-18% of URLs to return NOT_FOUND because of noindex tags, robots.txt blocks, or 404s. Of those, about 70% are actually blocked and 30% are true 404s. Always run a secondary verification to avoid false positives in your index status report.