For full transcript http://bit.ly/WWCrawlBudget
Hello. Welcome to another episode of Weekly Wisdom. Today I would like to talk about crawl budget, as it is going to be understood in 2020 and beyond. Crawl budget is something that has changed over the last year. Basically, there are quite a lot of new factors affecting it and quite a lot of changes with how Google is crawling, rendering, and indexing our content. I will go through all of the things affecting the crawl budget that I could come up with. I am going to try to go through as many things as possible, definitely touching on the most important ones within the next 10 to 15 minutes.
Organized Website Structure
Let’s start with the most important thing, which is actually a classic, something that hasn’t changed over the last few years: an organized website structure. There are quite a lot of different factors packed into that statement. Let’s go through them one by one:
Original content, no duplicates: this means no duplicate content, no near-duplicates, no soft 404s. These are the key offenders behind most crawl budget and index bloat issues.
Index bloat covers everything that is not valuable and not searchable. If you have any pages that people wouldn’t search for, that don’t get any kind of traffic, or that don’t correspond to any queries or user intent, then I wouldn’t have them indexed in Google.
Everything that directly affects crawling your website: internal redirects, internal 404s, server problems, 500 code problems such as timeouts, and so on.
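To make that last point a bit more concrete, here is a minimal sketch of how you could bucket the results of a crawl of your internal links by HTTP status code. The `sample` data and the bucket names are assumptions made up for illustration; in practice the (URL, status code) pairs would come from whatever crawler or log export you already use.

```python
from collections import Counter


def classify_status(status_code):
    """Map an HTTP status code to a rough bucket (bucket names are illustrative)."""
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        return "client_error"
    if 500 <= status_code < 600:
        return "server_error"
    return "other"


def summarize_crawl(results):
    """Count buckets for an iterable of (url, status_code) pairs."""
    return Counter(classify_status(code) for _, code in results)


# Hypothetical crawl output for internal links.
sample = [
    ("/", 200),
    ("/old-page", 301),
    ("/missing", 404),
    ("/checkout", 500),
    ("/blog", 200),
]
summary = summarize_crawl(sample)
print(summary)
```

Anything that lands outside the "ok" bucket for an internal link is a request Google spent without reaching indexable content, so those are the first candidates to fix.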
The most important part, but not as technical as the previous section, would be information architecture. It covers everything that goes with how your website is structured and how logically it is built. Information architecture affects how both users and Google can look into your website structure and understand what to rank and how to index your content properly.
This has everything to do with indexing strategy. For example, you could have an eCommerce store and quite a lot of different pages and faceted navigation with different filters. You would not want to index pages with a filter from $102 to $104 for a certain product. The whole indexing strategy for an eCommerce store has to be in place to make sure that Google’s crawling and indexing are as efficient as possible.
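As a rough sketch of what such a rule could look like, the snippet below decides whether a faceted URL should be indexable based on its query parameters. The parameter names (`price_min`, `price_max`, `sort`, `page_size`) and the example URLs are hypothetical; every store names its filters differently, and which filters deserve indexing is a decision you have to make per site.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical filter parameters that should never produce indexable pages.
NON_INDEXABLE_PARAMS = {"price_min", "price_max", "sort", "page_size"}


def is_indexable(url):
    """Return True if the URL carries none of the non-indexable filter parameters."""
    params = parse_qs(urlparse(url).query, keep_blank_values=True)
    return NON_INDEXABLE_PARAMS.isdisjoint(params)


for url in (
    "https://example.com/shoes",
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?price_min=102&price_max=104",
):
    print(url, "->", "index" if is_indexable(url) else "noindex")
```

The output of a check like this could then drive whatever mechanism you use to keep those pages out of the index; the sketch only covers the decision, not the mechanism.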
If you have a lot of similar products or a lot of content pieces that are somehow similar to each other, you need to differentiate them, so Google clearly knows which of those pages is the most important for a given query. As the oldest rule in Google says, if you have two pages competing for the same query within your structure, neither of these two is going to win. Most likely, your competitor is going to steal quite a lot of that traffic.
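One simple way to spot pages competing with each other is to group them by the query each one is meant to target. This is only a sketch: the `page_queries` mapping is a made-up example, and it assumes you already maintain a list of which query each page is supposed to rank for.

```python
from collections import defaultdict


def find_competing_pages(page_queries):
    """Group URLs by normalized target query and keep queries with more than one URL."""
    by_query = defaultdict(list)
    for url, query in page_queries.items():
        by_query[query.strip().lower()].append(url)
    return {query: urls for query, urls in by_query.items() if len(urls) > 1}


# Hypothetical mapping of page URL to the query it is meant to rank for.
page_queries = {
    "/running-shoes": "running shoes",
    "/shoes-for-running": "Running Shoes",
    "/trail-shoes": "trail running shoes",
}
conflicts = find_competing_pages(page_queries)
print(conflicts)
```

Each entry in the result is a query where two or more of your own pages overlap, which is where you would merge, differentiate, or pick a primary page.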