Defining Index Bloat
Index bloat occurs when search engines index too many pages from a website. In other words, when your site “bloats” a search engine’s index, Google is indexing an excess of low-quality pages and wasting valuable, limited resources on pages that you probably don’t care about.
Index bloat can lead to the following SEO issues:
- Exhausts crawl budget *
- Decreases the organic quality of the domain
- Lowers the ranking potential of your other pages
Additionally, certain types of websites are especially prone to having too many pages indexed:
- Ecommerce websites that use URL parameters: Product filtering and re-ordering can add hundreds or thousands of possible URL variations.
- Medium to large websites: Sites with a large number of pages that don’t necessarily need to be indexed, such as Thank You pages, PPC landing pages, Testimonial pages, and others.
- Sites that have been blogging for a long time: We very often find that archive pages, such as blog tag and date archive pages, bloat search engine indices, especially when there’s no defined blog category/tag system in place.
- Site redesigns or migrations: It’s very common to find lots of dev or test pages left over after a site redesign or rebuild.
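To get a feel for how quickly URL parameters multiply, here is a minimal Python sketch. The category URL, the filter names, and their values are all hypothetical; the point is only that each optional filter multiplies the number of crawlable URLs for a single page.

```python
from itertools import product
from urllib.parse import urlencode


def filter_url_variations(base_url, filters):
    """Return every URL produced by combining optional filter values."""
    # Each filter can be absent (None) or set to one of its values.
    choices = [[None] + list(values) for values in filters.values()]
    urls = []
    for combo in product(*choices):
        params = [(name, value) for name, value in zip(filters, combo) if value is not None]
        query = urlencode(params)
        urls.append(f"{base_url}?{query}" if query else base_url)
    return urls


# Hypothetical filters on a single category page
filters = {
    "color": ["red", "blue", "green"],
    "size": ["s", "m", "l", "xl"],
    "sort": ["price_asc", "price_desc", "newest"],
}
urls = filter_url_variations("https://example.com/shoes", filters)
print(len(urls))  # (3 + 1) * (4 + 1) * (3 + 1) = 80
```

Three modest filters already turn one category page into 80 distinct URLs, and a real store would repeat this across every category.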
Search Engines’ Limited Crawl Budget *
Gary Illyes, a Google Webmaster Trends Analyst, said in 2017:
“Prioritizing what to crawl, when, and how many resources the server hosting the site can allocate to crawling is more important for bigger sites or those that auto-generate pages based on URL parameters.”
One of the main reasons index bloat occurs is that Google finds too many pages on your website that don’t have any instructions on how they should be treated. Very often, a large number of these pages end up being indexed.
Taking control of how Googlebot and other search engines crawl and index your site is imperative to ensure you reach your maximum ranking potential. Being at this level means that Google efficiently finds your pages, understands your content, and matches the searcher’s need for information to your pages.
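As a rough illustration of what an “instruction” looks like in practice, the sketch below checks whether a page carries a noindex directive, either in a robots meta tag or in an X-Robots-Tag response header. The function and class names are our own, and the check is simplified: it assumes the header is stored under that exact key and ignores user-agent-specific header values.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    directives = set(parser.directives)
    # Simplification: exact header key, no user-agent prefix handling.
    header_value = (headers or {}).get("X-Robots-Tag", "")
    directives.update(d.strip().lower() for d in header_value.split(",") if d.strip())
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))
print(has_noindex('<head><title>Thank you</title></head>'))
```

A page for which this returns False has no blocking instruction, so search engines are free to index it if they find it.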
How to Find Sources of Index Bloat
The ideal scenario here is to have your website audited by SEO experts so that you get a comprehensive and holistic view of your website, its history, and your business objectives.
The second-best scenario is to use the Index Bloat guide I wrote on Search Engine Watch for very specific checks that can be completed. However, keep in mind that the guide focuses on common issues we see, which may not necessarily apply to you.
Resolving Index Bloat
Removing pages from Google’s index can test your patience: depending on the severity of the bloat, it can be a slow and painful process.
The right approach depends heavily on the CMS limitations of your site and the SEO strategy you have in place. If you don’t currently have a keyword research strategy, creating one is one of the first things you need to do before you begin any form of on-page SEO or link-building.
Do not implement recommendations you read online before carefully considering the implications they will have on your website. Fixing index bloat essentially involves manually asking search engines to de-index your pages. Doing so without proper guidance or without having a strategy in place can directly lead to a considerable drop in rankings.
Get Your Website Audited
As always, every website is different. The specific methods to resolve index bloat that work for you must be carefully considered by an SEO based on a comprehensive and thorough SEO audit of your website. Adding noindex meta tags on the wrong pages or disallowing incorrect subdirectories on your site could potentially lead to a drastic drop in organic traffic and conversions.
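One inexpensive safeguard against disallowing the wrong subdirectory is to test proposed robots.txt rules against a list of URLs that must stay crawlable before anything goes live. The sketch below uses Python’s standard urllib.robotparser; the rules and URLs are hypothetical. Note that this parser does simple prefix matching and does not implement Google’s wildcard extensions, so treat it as a first sanity check rather than a replacement for Google’s own tools.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rule meant to block only blog tag archives, written too broadly
proposed_rules = """
User-agent: *
Disallow: /blog
"""

# Hypothetical pages that must remain crawlable
must_stay_crawlable = [
    "https://example.com/blog/what-is-index-bloat",
    "https://example.com/services/seo-audit",
]

for url in blocked_urls(proposed_rules, must_stay_crawlable):
    print("Would be blocked:", url)
```

Here the rule intended for tag archives also blocks a real blog post, because /blog matches every path that starts with it. A narrower rule such as Disallow: /blog/tag/ would leave the post crawlable.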