Chapter 2 – The Dynamics of a Search Engine

A search engine wants to produce results that solve a searcher's problem in the fastest and easiest way. It crawls and interprets sites based on the content on their web pages, and pages are ranked higher or lower based on their relevancy to the search term.

To have your website featured in the search engine results pages, the search engines first need to know that it exists. Getting your website crawled by a search bot and indexed in the SERPs (search engine results pages) is the easiest part of any SEO strategy.

How does a search engine operate?

Search engines use a bot, often referred to as a spider, which crawls the web looking for content to index in the SERPs.

The engine as a whole does the following:

  1. Spider Crawls: The search engine spider goes out and passes from one website to another following links, which enables it to discover new content.
  2. Indexing: During the crawl the content is stored and placed in the index. Once a page is in the index it can be optimized on and off site for a higher ranking.
  3. Rankings: The engine shows the piece of content that covers the searcher's query best at the top, with results ordered from most relevant down to least relevant.

In short, the search spiders crawl, index, and rank the data. Higher rankings depend on many factors.

What happens when a search bot crawls the net?

The search engines send out what is known as a bot (robot), also known as a crawler or spider. The bots find new content and revisit old content to discover if it has been changed. The crawlers find all types of content for each part of the Google search pages, including web pages, images, videos, documents, and a lot more. Content is discovered through links from one website to another; the crawlers follow them and jump from one page to the next.

The Google bot or Bing bot starts out at a web page and hops along its links to new web pages, including your internal links to other pages on your website. By passing from one site to the next the crawlers are able to discover new content and register whether old content has been changed. All of these URLs are stored in the index, which is drawn upon when the searcher types in their search phrase.

What is the search engine index?

The pages/URLs that the spiders have discovered are stored in an index, a database that is drawn upon when a searcher is looking for information that matches anything in it.

What is search engine ranking?

The process starts when the searcher inputs their search phrase into the search engine. The engine then goes to the index/database to retrieve the content and display it in order of relevancy to the searcher's search phrase. The order of the webpages in the SERPs is called ranking: the higher a webpage sits in the search results pages, the more relevant the engine believes it is to the search phrase. Remember that this is just a machine, and it very often does not show the best, most relevant content at the top, because an SEO expert can manipulate the results.

There is a file on any website that can be used to stop search spiders from entering your site or parts of your site. There are many reasons why you would want to block the bots; some pages, such as sensitive pages containing personal details, can be blocked and kept out of the index.

After you have finished reading this section, you will know exactly what the search engines are looking for when they index and rank your pages.

It’s good to keep in mind that Google is the largest search engine, followed by Bing and Yahoo. Because Google holds most of the market share, the majority of SEO agencies and experts focus solely on Google. More than 90% of all web searches happen on Google and its properties; YouTube is also part of its empire.

Can the search engines crawl/find your web site?

To ensure your site appears in the search index it has to be crawled by one of Google’s spider bots. To discover if your website is in the index, you just need to drop your homepage URL into Google search and look at the results. This will tell you roughly how many pages of your website are in the index, which helps you know whether Google bot is crawling your website and which pages it has discovered.
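
A common way to run this check is Google's site: search operator, which restricts the results to a single domain (example.com below is a placeholder for your own domain):

    site:example.com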

The number of results returned gives you a rough idea of how many URLs are indexed. There may be some profile links that show too, but it provides a rough evaluation of what is indexed on Google for your website.

To know the exact number of indexed pages you will need to add your website to a Google Webmaster Tools (now Google Search Console) account. You will be required to verify that you own the domain; this can be done in a few different ways and is not difficult. The tool is very useful: you can use it to ask Google bot to crawl certain pages of your site, and you can even submit a sitemap for Google bot to crawl and index all the pages on your website.

If your website does not appear in the index at all, there are several possible reasons:

  • Your site is brand new and has not been crawled yet.
  • You do not have any links to your website from other sites.
  • You have set your robots.txt file to block Google bot from crawling it.
  • The site may have been penalized for black hat backlinking tactics.

Please remember that there will be pages on your site that you do not want Google bot to crawl and index. These will be account pages, pages where customers' details are stored, and so on. You may also want to block Google from crawling poor, thin content that you wrote before you learned about SEO. To tell Google bot not to crawl certain pages you use a robots.txt file.

Robots.txt File

As you can see from the file extension, this is a basic text file that can be created using Notepad. The file is uploaded to your root directory, the same directory where your index page is found. You can use the file to direct the crawl bots on where they may and may not crawl your website.
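
Here is a minimal sketch of such a file; the /admin/ and /checkout/ paths and the sitemap URL are placeholders for your own site. The User-agent line says which bots the rules apply to (* means all of them), and each Disallow line names a path the bots should stay out of:

    User-agent: *
    Disallow: /admin/
    Disallow: /checkout/

    Sitemap: https://www.example.com/sitemap.xml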

How does Google bot read the robots.txt file?

Remember that if you do not have a robots.txt file on your website, Google bot will go ahead and crawl the whole of your website; of course, what it finds will be determined by your internal links.

If the Google crawl bot finds the robots.txt file on your website then it will follow the rules that you have set for it.

Be sure to note that if Google bot cannot properly read your robots.txt file, for example because of an error when requesting it, it may not crawl your website at all.

It is also good to note that there are people out there who build bad bots that will not follow the rules set in your robots.txt file. Some bots have been developed to scrape sites for content such as emails and even people's addresses. If you place the URLs of pages holding people's account details in the robots.txt file, it makes it easy for these bad bots to find them. With that in mind, it is better practice to noindex these pages and place a login form in front of them, leaving them out of the robots.txt file.

Defining URL parameters in GSC

Some websites (mostly ecommerce) have the same content available on many URLs. This typically happens on product search pages, where the results of search forms and filters are appended to the URL as parameters.
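
As a hypothetical illustration, the following three URLs could all return the same list of products, differing only in their query parameters:

    https://www.example.com/shoes
    https://www.example.com/shoes?sort=price
    https://www.example.com/shoes?sort=price&page=1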

With that in mind, how does Google know which of these URLs to index and show in the SERPs? You can use the URL Parameters tool in your Google Webmasters account to tell Google bot how to handle your pages. You can ask Google bot not to crawl pages with a certain parameter, which would result in all those pages dropping from the index. That's great if you want duplicate content removed, but not so good if you do want all those pages appearing in the index.

Can the search bot find all your major pages/content?

Now let's take a look at what you can do to ensure Google bot crawls and finds all your important content. Some of your major content pages may have been missed by the bot for one reason or another. It's important that all your big keyword content is crawled and indexed to get that traffic to your website.

So, using Google Webmaster Tools, you know Google bot can access and crawl your site. But can Google bot crawl through your entire site, passing through your internal links?

The bots can’t crawl past login forms

If your website users have an account that sits behind a login form, the bots will not crawl that area of your site. A search bot does not act like a real user: it can follow links, but it can't fill in forms.

Do you have search forms to find content?

Remember that Google bot is just that, a bot. A crawl bot can't use a search form to find your content; it follows links. If a search form is the only way for people to discover some of your content, a crawl bot will never find it all.

Is your text/content hidden in images, etc.?

If you have content written into media such as an image, Google bot can't see it and will of course not index it. It is always best to have your text within the body content on your page. The search bots are getting better at reading images; however, for now, it is better to be safe than sorry.

Can the search bots follow your site navigation links?

The main navigation menu on your website is where Google bot works its magic, so you must have all your main content available from the site navigation bar. If you have a big piece of content that you want indexed, it should be one or two clicks away from your main index page. You must make it easy for Google bot to find these pages, and the main navigation on your website is the best way to do it.

People make the following mistakes that stop crawlers from finding content on their website:

  • Using different desktop and mobile navigation; a different menu on each platform can hinder indexing.
  • Building the navigation in JavaScript; search bots can struggle with JavaScript, so a JavaScript-only navigation may hinder the bot from finding your content. Make sure the menu is available in simple HTML.
  • Showing a different navigation to logged-in visitors, which may hinder Google bot from discovering all the content.
  • Leaving main pages off your site navigation.

Keep your site's main navigation bar simple and in plain HTML, and you will not have any problems with Google bot finding and indexing your pages.
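
A minimal sketch of such a plain HTML navigation bar (the page paths are placeholders) is nothing more than a list of ordinary links the bot can follow:

    <nav>
      <a href="/">Home</a>
      <a href="/services/">Services</a>
      <a href="/blog/">Blog</a>
      <a href="/contact/">Contact</a>
    </nav>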

How is your site architecture?

Having a great site architecture will allow not only your visitors to discover all your major content, but the search bots too. Users should be able to find what they are looking for logically, without much thought. If it's not easy, you risk losing the visitor back to the search engines and on to a website other than yours.

Do you have your sitemap set up?

A sitemap is a list of all the URLs on your website. If you use WordPress as your content management platform and set up the Yoast SEO plugin, it will add one automatically for you. You can then submit this sitemap through your Google Webmaster account, which makes it easy for Google bot to crawl and index all your pages. This, combined with a good navigation bar, ensures all the content you want discovered and indexed most certainly will be.
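
For reference, an XML sitemap follows the standard sitemap protocol; a minimal one listing a single URL (example.com is a placeholder) looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/</loc>
        <lastmod>2021-01-01</lastmod>
      </url>
    </urlset>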

Do the crawlers get error codes when they try to access your web pages?

As the search bot crawls through your site it may encounter some errors; these crawl errors can be found in your Google Webmaster account. The log will show you details about errors with your server and even "not found" errors. You can also discover how often your website is crawled by the bots.

4xx codes are when search engine crawlers can't access your content due to a client error

If you see a 4xx error in your Webmaster account, it means the request contained bad syntax or could not be fulfilled. The most common one is where a page can't be found because it has been moved, the "404 – not found" error. This can happen if you have moved the content to a different URL, or the page has been deleted altogether. When a search bot hits a 404 it can't access the URL, and a website user will see a 404 page with none of the expected content on it; they will normally leave your site in frustration.

5xx codes are when search engine crawlers can't access your content due to a server error

If the bot encounters a 5xx error, it means there is a problem with the server. The whole site might be down, or just that page may not be returned by the server. You can discover how and when this has happened to your site using Google Webmaster Tools, under the "crawl error" tab. It usually occurs when the bot can't access the site because the server takes too long to respond.

If you move any pages on your site you can use a 301 permanent redirect. This tells the bots and your visitors that the page has been moved, and takes them to the new page.

Doing a 301 improves user experience and also allows the bots to discover the content. The 301 also passes the link juice over to the new page, so the backlinks pointing to the old URL will now affect the ranking of the new URL.

The 301 code means the page has permanently moved to a new one. Be sure to have the new URL focused on the same content subject as the old one, or the result will be lost rankings.

The alternative is to use a 302 redirect, which is used when you move content on a temporary basis. You move the traffic to a new page for a short time, just until you have finished what you are doing on the main page, and then it comes back again.
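
As a minimal sketch, assuming an Apache server, both kinds of redirect can be set up with one line each in the .htaccess file; the paths here are placeholders for your own URLs:

    Redirect 301 /old-page/ https://www.example.com/new-page/
    Redirect 302 /sale/ https://www.example.com/holding-page/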

Be aware that passing Google bot through many redirects will hinder the chances of the content being indexed; this is known as a redirect chain. If you redirect a page to another page, and then redirect that page to yet another, you have created a redirect chain.
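
For example, a chain might look like this, with the bot forced through two hops before it reaches the final content (hypothetical paths):

    /page-a/  →  301  →  /page-b/  →  301  →  /page-c/

The fix is to point the first redirect straight at the final destination.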

Now that you are certain a bot can crawl and find all of your pages, the next step is to get them indexed.

Getting Indexed: How do the search engines read and store your URLs?

You know that Google bot can crawl your site, but this does not mean your pages will be indexed. The index (database) is where your URLs are stored until Google needs them in the SERPs for a search phrase. The crawler looks over your content and the details are stored in the index.

We are now going to look at how indexing works and how you can ensure your site hits the index.

How does Google bot view my web pages?

If you put your URL into the Google search engine, you will notice that underneath the listing is the word "cached". This is what Google bot discovered the last time it crawled that web page on your site.

Sites that add new content frequently, like Facebook and big news sites, will be crawled often by Google bot. You can click on the "cached" version of your page to see what Google bot saw the last time it was there.

There is also an option to view the text-only version of that page, which allows you to establish what body text has been read by the bot.

Do webpages ever get removed from the index?

Webpages are removed from the index all the time; here are some reasons why this happens:

  • The URL returns a "not found" error (a 4xx code) or a 5xx code where the server does not load the site; it will eventually be removed from the index.
  • The URL has a noindex meta tag attached to it, which tells the search bots not to index the URL.
  • The URL has been penalized for not following Google Webmaster Guidelines; this can be a result of buying backlinks, building PBNs, and so on.
  • The URL has been blocked from being crawled; a login form may have been added that stops the crawler from viewing the content.

To check if a URL has been removed from the index, go to your Google Webmaster account and enter the exact URL into the URL Inspection tool. You will discover whether the URL has been removed from the index, and you can request that it is crawled again for reindexing. You can also view errors on the URL, which will help you understand why it was removed.

How to instruct the search engines to index your site

Believe it or not, there are a few ways you can instruct the search bots on how to index (or not index) your website.

Robots meta tags

Your meta data, also known as meta tags or meta directives, gives the search engines instructions on how you want the URL to be treated.

You can tell the search bot not to index the URL and even tell it not to pass any link juice through the links on the page. This is all done using the robots meta tag, which sits in the <head> section of your HTML.
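
As a minimal sketch, a robots meta tag telling all bots not to index the page or follow its links looks like this:

    <head>
      <meta name="robots" content="noindex, nofollow">
    </head>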

The meta tag can be used to address all search engines or specific ones. We have listed the most used meta directives, including the ways you will use them.

index/noindex: This tells the search bot whether the page should be kept in the database and used in the search results. If you use noindex, you are telling the search bots not to show that page in the SERPs. Indexing is the default behaviour, so the index value is not usually needed.

  • When it will be used: If a page does not need to be indexed you will use the noindex tag; this might be because the page has sensitive data on it.

follow/nofollow: This notifies the crawler whether it should follow the links on the page or not. A nofollow tag tells the crawler not to pass through the links to their target URLs.

  • When it will be used: This can be used if you do not want a crawler to follow a link to a page, perhaps because you do not want the target page to be crawled and indexed. It can also be used to stop link juice passing from one page to another, which is useful when linking to an external resource.

noarchive: This is used when you want to restrict search engines from caching a page; the search engines will cache pages unless directed not to by this tag.

  • When it will be used: You might not want a previous version shown to your website visitors when things on the page change often, like prices.

For Google bot to see this meta data it needs to crawl the pages, since the tag is included in the <head> section of your site. If you want to block search bots from crawling your website altogether, you must use the robots.txt file.

X-Robots-Tag

The X-Robots-Tag is sent within the HTTP response header of your URL and offers more flexibility than meta tags, as it can be applied site wide and to non-HTML files.
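
For example, a server can be configured to include the header in its response; a minimal sketch of the raw response, with the header telling bots not to index or archive the resource:

    HTTP/1.1 200 OK
    Content-Type: text/html
    X-Robots-Tag: noindex, noarchive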

The directives used in a robots meta tag can also be used in an X-Robots-Tag. If you use WordPress, an SEO plugin such as Yoast lets you edit your robots.txt file directly from within the dashboard.

If you follow and understand everything listed here, it will help you get your important pages crawled and indexed in the search engines.

How do the search engines rank website pages?

To really get to grips with SEO, you need to understand why a search engine puts one website above another in the SERPs for particular keywords.

The search engines use what is known as an algorithm, a series of rules used to place websites in order in the search results pages.

Google and Bing are constantly updating their algorithms, and the SEO landscape is constantly changing; this is what makes it so much fun. Some updates have been made to deal with spammers, such as people copying other people's content.

Why does the algorithm change so much? It changes to combat SEO experts gaming the results. This improves the quality of the results pages and ensures the top-listed site is the best resource for the search query.

What does the algorithm want?

The search engines want the best resource at the top of the results pages: the one that answers the searcher's question right away.

Over time the algorithm has become very complex, understanding how words relate to each other and noting where content has been plagiarised from other places on the net. Google does not want to show the same content over and over again on different sites; that would annoy its users and waste people's time. Take that as a big hint: do not copy content.

In the early stages of SEO it was much easier to fool the system: it was possible to stuff a search phrase into some content over and over again and rank high for it. These days that is a big no-no, something that will get your site penalized for over-optimization.

Give the search engines what they want. The crawlers want fresh content that is unique and offers something different to what is already out there.

What about links and SEO?

There are a few kinds of links. You have internal links, which link the webpages together on your own website. Then there are backlinks, which are links from other websites to yours.

Links, both internal and backlinks, play a huge part in SEO. The search engines use backlinks to gauge how good a piece of content is: a page with a lot of backlinks has a lot of votes for it. In the past people caught on to this and went ahead blasting their pages with all sorts of links; nowadays this does not work. Backlinks are still important, but you need contextual links from related sites, and anything else could get your site penalized.

The more backlinks your website earns from other authority sites in the same niche, the higher it will rank. This is like having many votes of confidence for your pages; it tells the bots that a page is a good resource because other people have taken the time to link to it.

How important is content in SEO?

The words you place on any web page play a massive part in SEO. It is often said that longer content, over 2,000 words in length, ranks higher. However, this is not a hard rule; it really does depend on the focus keywords.

How do the search engines determine which piece of content is worthy of the top spot? You will notice that if you visit a top-ranking website for any search phrase, it will be highly relevant to that term: the page will have the search phrase within the content, the content will match the search phrase typed in by the user, and it will satisfy the user's request.

Right now there is no real benchmark for how often a search phrase should be used within the content on a page (keyword density). The words you use in your meta tags will influence where your page is placed in the SERPs, and the content's length and subject matter will also influence the placement.

We do not know how many ranking signals there are; you can be certain there are more than 300. You can also be certain that backlinks, content, and how users behave on your site will directly influence where it is placed in the SERPs.

What is RankBrain?

This is a component of the Google algorithm that learns from the search engine users' behaviour. If a website in a lower ranking has more visitor dwell time, page views, and click-through rate than a higher-ranked site, you can be sure it will move up the rankings.

The way users interact with your website has a big influence on how it is placed in the SERPs. If a user searches for a phrase on Google, clicks on the top site, and returns to the SERPs a few seconds later, it tells the bot that the site visited did not solve the searcher's problem.

This is why you need to have your meta data completely in line with your on-page content, and why your site must load fast too. Provide the best information for the search phrase, have a well-constructed site that loads quickly and directs visitors where they need to be with ease, and top rankings will follow.

User behaviour on the SERPs and your website

As we have already established, the way a user behaves on the SERPs and on your web pages will influence rankings. This includes things like:

  • Clicks to your site from the SERPs.
  • Time spent on your pages.
  • If the searcher bounces back to the SERPs (Bounce Rate).
  • If the searcher spends time on other sites in the SERPs.

User behaviour has been proven to influence rankings, but can really good user engagement metrics get you a top ranking alone? They certainly send a strong signal to the search bots that your content is great and worthy of a top ranking. Combined with niche-related backlinks, this can be very powerful indeed.

From the horse's mouth (Google)

Google have stated that they use user behaviour as a ranking signal. They know that if more people click on the website ranked in position 2 than position 1 and stay on the site, then that site is more relevant to the search term and should be in the top spot.

We do not know how much priority Google put on this data compared to backlinks, but it is something you as an SEO should take seriously. As you would expect, great content goes hand in hand with good user metrics.

These metrics can also be gamed: you could pay people to search for a particular keyword, click on your website, and stay on the site for a long time. So, with that in mind, I do not expect these metrics to have a huge impact on rankings, especially as the search engines get more sophisticated.

How have the search results pages changed?

At one time Google returned just 10 sites in the SERPs; it was very simple, not much to look at, just a list of sites. Over time new sections have been added to the results pages.

Now you will see sections at the top and bottom for adverts, sometimes a featured snippet (also known as the zero spot), question and answer boxes, the local 3-pack, a knowledge panel on the right, shopping image products, and then the main site links.

You can be certain Google will add more sections over time. What you see in the SERPs depends on the search query; if you are looking to buy something, you will sometimes see products in the placements.

If you enter a question into the search bar, you will see a question-and-answer part of the SERPs and sometimes a knowledge section to the right. What you see all depends on what you type into the magic Google search box.

Local searches

If you enter a search term with a local place name attached to it then Google will draw upon its index of local business listings.

You can appear in the local 3-pack for local search terms if you claim and verify the Google My Business listing for your shop or workplace. When it comes to ranking local pages, Google uses 3 main factors:

  1. Relevance
  2. Distance
  3. Prominence

Relevance: This is how well the business matches the user's search term. To get this right, make sure your business details are completely filled out and are the same on your website as they are on your Google My Business page.

Distance: Google uses your business location to display better local results. It also uses the searcher's location, taken from their IP address. This can bring back very specific local results for the user.

Prominence: A business has higher visibility if it is well known. The bot establishes this through signals such as:

  • Reviews: This is the number of Google reviews the business has; it also takes into account reviews on Facebook and other review sites like Yelp.
  • Citations: These are mentions on other sites that list your business's contact details, also known as NAP (Name, Address, Phone number). If your details are consistent across different sites on the web, as in the example below, it builds trust with Google.
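
As a hypothetical illustration, the goal is for every citation of the business to carry an identical NAP block:

    Name:    Smith & Sons Plumbing
    Address: 12 High Street, Anytown, AB1 2CD
    Phone:   01234 567890

Even small differences ("St" vs "Street", or an old phone number) count against that consistency.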

Organic rankings

Regular SEO practices also apply to local SEO, and Google also lists websites in the organic SERPs for local search terms. It's good practice to keep your Google My Business details consistent with your website details.

Local search results are influenced by user behaviour data, because Google wants to display the best business to the searcher. So great reviews and a strong online presence will help with long-term high rankings.

You should be aware that nobody knows the full extent of the Google algorithm, and that it is always changing. Through much research and testing, you can be certain that quality content, user behaviour, and backlinks are the most important parts of the SEO puzzle.
