Where JavaScript and SEO meet

Michel Goldberg | August 2018
In this document we’ll review JavaScript (JS) from an SEO point of view and set out our findings and guidelines on the subject.

We’ll start with a short overview of how Google crawls and indexes the web. We’ll then map out the challenges JavaScript poses for search engines. Finally, we’ll present the solutions and adaptations Google needs in order to crawl and index JS sites properly.

Introduction

(a) To frame our view of JavaScript websites from an SEO perspective, let’s start with a quote from the well-known Googler John Mueller:

The message is clear. Google understands that the use of JavaScript will only keep growing, and it has been adapting accordingly.
In 2018 it is perfectly legitimate to build a website that makes extensive use of JS. The advantages are clear, and Google’s capabilities will keep improving over time as it adapts to web standards.

(b) That said, in the short and medium term JS websites will continue to confront Google with significant crawling and indexing challenges. This paper maps out the problems Google faces when handling JS websites and the recommended practices for overcoming them.
To understand the subject better, we’ll first briefly review how Google crawls and indexes the web, and then look at the challenges JavaScript presents for search engines.

How Does Google Work?

(a) Efficiency
First of all, it is important to realise that efficiency is central to every aspect of technical SEO. Google’s crawling and indexing mechanisms must operate with maximum efficiency. The Internet is a vast ocean to be crawled and indexed, and Google’s resources, extensive as they may be, are ultimately limited.
That is why much of technical SEO aims to make crawling and indexing a website easier. This can mean clearly defining a single <h1> tag on a page or blocking an entire area of the site in the robots.txt file. It also covers a variety of other actions that help the crawling and indexing mechanisms focus only on the relevant pages and clearly “comprehend” their content.

(b) Crawling & Indexing
Let’s start with a short explanation of the process, and how Google’s two central mechanisms work:
Crawling mechanism, known as Googlebot: discovers new URLs and forwards them for indexing. The crawler requests a URL from the server, receives the HTML file and forwards it for indexing. At the same time it collects (scrapes) any links to other URLs the file contains and continues crawling them.
Indexing mechanism, sometimes referred to as Caffeine: evaluates and indexes the content at the URLs obtained from Googlebot. Among other things, it indexes the page content and assesses the page’s importance. It also sets the page’s crawl priority for Googlebot, i.e. how often the page is revisited based on its importance.
As this suggests, the two mechanisms run concurrently and in parallel, one crawling and discovering URLs while the other indexes their content. Again, the point is efficiency.
Note, however, that these are separate mechanisms.
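
The division of labour described above can be illustrated with a minimal TypeScript sketch. It is a breadth-first crawler that discovers URLs and hands each fetched page to a separate indexing queue without waiting for it. The in-memory `site` map is a hypothetical stand-in for real HTTP fetches and link scraping. This is only a model, not Googlebot’s actual implementation.

```typescript
// Hypothetical in-memory "web": each URL maps to the links its HTML contains.
type Site = Record<string, string[]>;

interface CrawlResult {
  crawled: string[];    // order in which the crawler fetched URLs
  indexQueue: string[]; // pages handed off to the (separate) indexer
}

// Breadth-first crawl: fetch a URL, hand it to the indexer, and enqueue
// any links found in its HTML that have not been seen yet.
function crawl(site: Site, start: string, maxPages = 100): CrawlResult {
  const seen = new Set<string>([start]);
  const frontier: string[] = [start];
  const crawled: string[] = [];
  const indexQueue: string[] = [];

  while (frontier.length > 0 && crawled.length < maxPages) {
    const url = frontier.shift()!;
    crawled.push(url);
    indexQueue.push(url); // handed off; crawling continues without waiting for indexing
    for (const link of site[url] ?? []) {
      if (!seen.has(link)) {
        seen.add(link);
        frontier.push(link);
      }
    }
  }
  return { crawled, indexQueue };
}

const site: Site = {
  "/": ["/about", "/products"],
  "/products": ["/products/1", "/products/2"],
  "/about": ["/"],
};
console.log(crawl(site, "/").crawled.join(", ")); // /, /about, /products, /products/1, /products/2
```

The `maxPages` cap is a crude stand-in for the resource limits discussed in this section.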

(c) Speed
With this in mind, we can now address loading speed. Google is strict about using its resources efficiently and allots no more than a few seconds to rendering a web page. It is commonly believed to allow about 5 seconds, though in practice this varies by site. After that Google moves on, so content not rendered within that window may not be indexed. The damage is not limited to the individual page. If the resources on a site’s pages take too long to load, the whole site is affected. Google is far from happy (to put it mildly) to crawl and index a site that places a heavy load on its resources. As a result, fewer of the site’s pages may be crawled, and less often.
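
To illustrate the consequence, here is a toy TypeScript model of a render budget. Content injected by scripts that finish after the renderer stops waiting never makes it into the snapshot. The 5,000 ms default only reflects the commonly cited figure above, and the script names and timings are invented for illustration.

```typescript
// Hypothetical model of a page whose content is injected by scripts.
interface ScriptTask {
  name: string;
  finishesAtMs: number; // when the script finishes adding its content
  content: string;      // the content it adds to the DOM
}

// Returns the content present when the renderer stops waiting.
// The 5000 ms default is an assumption (the commonly cited figure), not a published limit.
function snapshotAfterBudget(tasks: ScriptTask[], budgetMs = 5000): string[] {
  return tasks
    .filter((t) => t.finishesAtMs <= budgetMs)
    .map((t) => t.content);
}

const tasks: ScriptTask[] = [
  { name: "header", finishesAtMs: 800, content: "Site header" },
  { name: "product", finishesAtMs: 3200, content: "Product description" },
  { name: "reviews", finishesAtMs: 7400, content: "Customer reviews" },
];
console.log(snapshotAfterBudget(tasks).join(" | ")); // Site header | Product description
```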

(d) WRS based on Chrome 41
On the subject of Caffeine, Google has officially announced that its Web Rendering Service (WRS), which renders pages for indexing, is based on the Chrome 41 browser and is in fact identical to that version. In other words, Google uses a 3(!)-year-old browser to render web pages.
Understanding Chrome 41’s abilities and limitations makes it far easier to build a website that Google can crawl and index properly.
Chrome 41 should also be used as an important debugging tool.

Chrome 41 may be installed here.

A Challenge Called JavaScript  

Having reviewed Google’s crawling and indexing mechanisms, we can now turn to the challenges JS presents for search engines. In short, the main issue with JS is that it is inefficient for Google (however pleasant it may be for users) and tends to ignore Google’s limitations.
Let’s look at this more closely and set out the points of friction between JS and search engines:

1. Client-side rendering: many JS sites use client-side rendering, meaning the page’s HTML is built in the user’s browser rather than on the server. This has a significant impact on search engines:

> The main implication is that Google must allocate resources to render the web page. On server-side rendered websites, Googlebot requests the page from the server and receives the complete HTML, already rendered and ready for indexing. Googlebot forwards this HTML as-is to Caffeine, which can then index it. This is a 2-stage process: crawling and indexing. On client-side rendered JS websites, the search engine receives only a basic file from the server, so a further stage is needed: rendering the page’s full HTML in preparation for indexing. The process becomes Crawling > Rendering > Indexing. The extra rendering stage requires Google to allocate additional resources, and, as stated, Google manages its resources with maximum efficiency.

> An additional implication, derived from the previous one, is the inability of Googlebot to crawl links and additional addresses concurrently with the work done by Caffeine. As mentioned, on client-side rendered JS websites, Googlebot does not receive a rendered and ready HTML from the server, so for Googlebot to continue to scrape and crawl new URL addresses from the web page, it must wait for Caffeine to finish rendering the web page, identify the links on it and forward them back to it so that they may be crawled: inefficient.

As a result, the crawl rate on JS websites tends to be very low, and some pages are simply not crawled. The issue is not Google’s ability to render JS pages or crawl JS links. It is Google’s reluctance to allocate the resources needed to render client-side pages and crawl their links, in other words to wait for Caffeine to finish rendering the page.
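
The difference between the two kinds of response can be sketched in TypeScript. Both HTML strings are hypothetical. `linksInRawHtml` is a simplified regex-based stand-in for a crawler’s link scraper, not a real HTML parser. It shows which links can be discovered before any JavaScript runs.

```typescript
// Two hypothetical responses for the same URL.
const serverRendered = `
<html><head><title>Shoes</title></head>
<body>
  <h1>Shoes</h1>
  <a href="/shoes/running">Running</a>
  <a href="/shoes/trail">Trail</a>
</body></html>`;

const clientRenderedShell = `
<html><head><title></title></head>
<body>
  <div id="root"></div>
  <script src="/bundle.js"></script>
</body></html>`;

// Links a crawler can collect before any JavaScript runs:
// href values of <a> tags in the raw HTML (simplified regex, not a parser).
function linksInRawHtml(html: string): string[] {
  const links: string[] = [];
  const re = /<a\s[^>]*href="([^"]*)"/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) links.push(m[1]);
  return links;
}

console.log(linksInRawHtml(serverRendered).join(", ")); // /shoes/running, /shoes/trail
console.log(linksInRawHtml(clientRenderedShell).length); // 0
```

In the client-side case, the links only appear after the bundle has run, which is exactly the rendering step Google has to schedule.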

Clarification: the term “rendering” here refers to building the HTML (the DOM), not to painting the DOM as pixels on screen.

It’s worth taking a moment to look more closely at how client-side rendered JS pages are indexed.
In a video for developers (here), Google stated, as explained above, that for the search engine rendering is entirely separate from crawling and indexing. For pages that require rendering, indexing is put on hold until rendering resources become available.

Google also stated (in the same video) that pages requiring rendering actually go through 2 waves of indexing:

First wave: when the crawler receives the basic HTML file, that file is indexed without rendering. The page then waits until rendering resources are available to render the full HTML.
Second wave: once rendering resources are freed and the full HTML has been rendered, the page is resubmitted for indexing in its complete form. The links on it are scraped and forwarded to Googlebot for crawling.
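
The two waves can be modelled as a deferred rendering queue. In this hypothetical TypeScript sketch, `receive` performs the first wave: it indexes the raw HTML and queues the page for rendering. `renderPending` performs the second wave once capacity is available. The class and field names are illustrative only.

```typescript
interface Page {
  url: string;
  rawText: string;        // text present in the initial HTML
  renderedText: string;   // text present only after JS runs
  renderedLinks: string[]; // links that exist only after rendering
}

class TwoWaveIndexer {
  readonly index = new Map<string, string>();
  readonly crawlQueue: string[] = [];
  private renderQueue: Page[] = [];

  // Wave 1: index whatever the raw HTML contains and defer rendering.
  receive(page: Page): void {
    this.index.set(page.url, page.rawText);
    this.renderQueue.push(page);
  }

  // Wave 2: runs later, when rendering capacity is available.
  renderPending(capacity: number): void {
    const batch = this.renderQueue.splice(0, capacity);
    for (const page of batch) {
      this.index.set(page.url, page.renderedText);
      this.crawlQueue.push(...page.renderedLinks); // links reach the crawler only now
    }
  }
}

const indexer = new TwoWaveIndexer();
indexer.receive({ url: "/shoes", rawText: "", renderedText: "Running and trail shoes", renderedLinks: ["/shoes/running"] });
console.log(indexer.index.get("/shoes") || "(indexed, no content yet)"); // (indexed, no content yet)
indexer.renderPending(10);
console.log(indexer.index.get("/shoes")); // Running and trail shoes
console.log(indexer.crawlQueue.join(", ")); // /shoes/running
```

Between the two calls, the URL is indexed with no content. This matches the situation described a little further on.
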
The following is the official Google diagram illustrating both waves:

Incidentally, you may occasionally see indexed URLs of client-side rendered pages that have no indexed content. Such a page may currently be between the two waves of indexing.

2. Incompatibility with Chrome 41: today’s JS libraries may use features that didn’t exist 3 years ago and that Chrome 41 cannot handle (e.g. some ES6 features). Google then cannot render them either, and content Google cannot render cannot be indexed.
Equally important, users with older browsers will hit the same problems when visiting the website and may see the content only partially or not at all. Here “older” may even include fairly recent versions of Internet Explorer or Safari.

3. Speed: a website’s speed is a challenge in itself, independent of JS. Even so, the initial load of a JS website (the first response from the server) is liable to be sluggish, and not necessarily through any fault of the JS. On client-side rendered pages, all the important content (including critical meta tags) is built by JS in the browser and is absent from the initial HTML. If the scripts load slowly, Google may abort before they have fully rendered the page’s content.

4. Hidden content: JS often uses events triggered by user interaction (such as onclick) to load content. Such content is not loaded with the initial page load. Only when the user performs a certain interaction does the browser send an additional request to the server and load it. Google is not a user in this sense, because it does not interact with the page. Any content that is absent once the initial load completes, and is instead hidden behind a user interaction, will not be available to Google and so will not be indexed.

5. URL addresses: some JS libraries (such as Angular 1) use a hash (#) in URLs. However, Google ignores anything after the hash. To Google, that part is not a unique address but an internal link to a location within the same page, a kind of bookmark. Google therefore does not treat it as a unique address warranting separate indexing.
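
A small sketch shows the effect. If everything after the hash is discarded, distinct single-page-app routes collapse into one indexable address. The `indexableKey` helper and the example URLs are hypothetical and simply mirror the behaviour described above.

```typescript
// Treat everything from "#" onward as not part of the indexable address.
function indexableKey(url: string): string {
  const hashAt = url.indexOf("#");
  return hashAt === -1 ? url : url.slice(0, hashAt);
}

const spaRoutes = [
  "https://example.com/#/products/42",
  "https://example.com/#/about",
  "https://example.com/#!/contact",
];
const distinct = new Set(spaRoutes.map(indexableKey));
console.log(distinct.size); // 1
console.log(Array.from(distinct).join(", ")); // https://example.com/
```
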
In summary: although Google can crawl and index JS, limitations remain because Google is a bot and, as a search engine, must run efficiently. For now, it still needs our help to ensure JS content is crawled and indexed.

Guidelines for Adapting a JS Website for Search Engines

As noted in the introduction, everyone understands that JS is here to stay, so there are solutions for adapting JS websites so that search engines can crawl and index them; these are mapped out in this section:

(a) Serving Web Pages
To ensure that web pages powered by JS are served in a way that Google can index easily and efficiently (in other words, sparing it the rendering stage as much as possible), Google offers 2 possible solutions, whose common denominator is that for Google the web pages are served with the main part of their content already rendered.

Solution A—Hybrid Rendering: The ideal solution which Google believes will set the long-term standard, although it is more complex to implement.
Solution B—Dynamic Rendering: A functional solution which currently satisfactorily solves the issue.
Let’s elaborate:

Hybrid Rendering or Isomorphic JavaScript
Google recommends serving the website using hybrid rendering, which can be achieved with isomorphic JavaScript. The recommendation appears in Google’s official video for developers (here), and John Mueller makes a similar one in another video (here).
The guiding principle is to combine server-side and client-side rendering.
The main content that matters to users and search engines is rendered on the server and reaches the browser as HTML. Page elements intended mainly for user interaction, which are mostly irrelevant to Google, are rendered on the client. Google therefore does not need to render the content it cares about for indexing, while the website can still offer users a rich, dynamic experience.
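
The following hedged TypeScript sketch illustrates the principle of isomorphic rendering only. A single `renderProduct` function produces the important markup. The server wraps that markup in a complete HTML response, and the serialized state lets client-side code take over interactivity. The `Product` type, the `window.__INITIAL_STATE__` variable and `/client.js` are hypothetical. Tools such as Angular Universal handle this plumbing for you.

```typescript
interface Product { id: number; name: string; description: string; }

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Shared ("isomorphic") render function: the same code can run on the
// server to produce HTML and in the browser to re-render after updates.
function renderProduct(p: Product): string {
  return `<article><h1>${escapeHtml(p.name)}</h1><p>${escapeHtml(p.description)}</p>` +
    `<button id="add-to-cart" data-id="${p.id}">Add to cart</button></article>`;
}

// Server side: the important content arrives already rendered, plus the
// serialized data client-side code needs to attach interactivity.
function serverResponse(p: Product): string {
  // Escape "<" so the JSON cannot close the <script> tag early.
  const state = JSON.stringify(p).replace(/</g, "\\u003c");
  return `<!doctype html><html><head><title>${escapeHtml(p.name)}</title></head><body>` +
    `<div id="app">${renderProduct(p)}</div>` +
    `<script>window.__INITIAL_STATE__ = ${state};</script>` +
    `<script src="/client.js" defer></script></body></html>`;
}

console.log(serverResponse({ id: 7, name: "Trail Runner", description: "Grippy outsole" }));
```

Google receives the `<h1>` and description as plain HTML. The button only becomes interactive once `/client.js` runs, which is the split the paragraph above describes.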

The fine balance between client side and server side rendering is neatly explained here:

In practice, hybrid rendering can be complex to implement in most frameworks. For Angular, however, Angular Universal can be used to balance server-side and client-side rendering (Angular Universal official website).
This solution also requires a server capable of running JavaScript (e.g. Node.js).

Dynamic Rendering
This solution is based on the principle of the server identifying Googlebot requests (by user-agent) and rendering the web page on the server-side for Google only, but the page continues to be rendered on the client-side for users in the normal fashion.
Google does not recommend rendering pages for Googlebot directly on the web server, since this may consume a large share of the server’s resources. Instead, it recommends adding infrastructure that renders the pages externally, in a separate rendering service, and serves the result to Google.
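
The user-agent check at the heart of dynamic rendering can be sketched as follows. The list of bot patterns is a hypothetical starting point, not an authoritative list. The prerendered HTML itself would come from the separate rendering infrastructure mentioned above. User-agent strings can be spoofed, and Google documents how to verify Googlebot via a reverse DNS lookup if that matters for your setup.

```typescript
// Hypothetical crawler user-agent patterns; maintain your own list.
const BOT_PATTERNS: RegExp[] = [/googlebot/i, /bingbot/i, /yandex/i, /baiduspider/i];

type Strategy = "prerendered" | "client-side";

// Decide which version of the page to serve for a given User-Agent header.
function chooseStrategy(userAgent: string | undefined): Strategy {
  if (!userAgent) return "client-side";
  return BOT_PATTERNS.some((re) => re.test(userAgent)) ? "prerendered" : "client-side";
}

const googlebotUA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
console.log(chooseStrategy(googlebotUA)); // prerendered
```

In a Node.js server, the argument would typically be the request’s `user-agent` header. Every other visitor keeps receiving the normal client-side app.
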
Flowchart of dynamic rendering:

Note that this practice closely resembles the escaped fragment solution, which likewise served dedicated content only to search engine crawlers (a scheme Google deprecated last year). The difference is that here the dedicated content is served according to the user-agent rather than according to a query parameter in the URL (_escaped_fragment_).

This solution may be easier to implement, but as stated, Google believes the long-term recommendation is Hybrid Rendering.

To be emphasised: the solutions above are not a precondition for a JS-powered page to be indexed. Technically, Google can render and index most elements of JS pages. However, given the challenges Google currently faces with JS, we are not aware of any significant client-side rendered websites that rank well in search results. Please take note of this and draw the appropriate conclusions.

(b) Use of Chrome 41
As stated, Google’s WRS is based on the Chrome 41 browser. Familiarity with Chrome 41’s limitations is therefore essential, as is optimising the website for what Google’s renderer can handle.

Using Chrome 41 as a debugging tool is recommended.
Checking the “Console” (under Inspect) in Chrome 41 will show us the list of problems encountered by the browser when loading the web page—these are the same as the obstacles Google will face when loading the page: invaluable intelligence.
The following screenshot is of a JS website experiencing difficulty in ranking; the GSC “fetch & render” showed that things go wrong during rendering, however only with Chrome 41 (in the console, under Inspect) were we able to see what was causing the trouble:

In addition, Google’s WRS (based on Chrome 41) has several limitations worth noting:

(c) Graceful Degradation
Further to the above, the website should be adapted to older and less advanced browsers. Besides helping Google render the website, this also lets users with older browsers load it. Our only general recommendation here is transpiling and polyfilling. Your development team may of course identify the solutions best suited to your site, provided Chrome 41 compatibility tests are carried out as noted.
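
As one concrete example of polyfilling, here is a feature-detected polyfill for `Array.prototype.includes`. This is an ES2016 method that, to our knowledge, Chrome 41 does not support. The code is a sketch for illustration. In production, established tooling is the safer choice, e.g. Babel for transpiling and core-js for polyfills.

```typescript
// SameValueZero comparison, as used by Array.prototype.includes (NaN matches NaN).
function sameValueZero(a: unknown, b: unknown): boolean {
  return a === b || (a !== a && b !== b);
}

function includesPolyfill<T>(this: ArrayLike<T>, search: T, fromIndex = 0): boolean {
  const len = this.length;
  // A negative fromIndex counts back from the end of the array.
  let k = fromIndex >= 0 ? fromIndex : Math.max(len + fromIndex, 0);
  for (; k < len; k++) {
    if (sameValueZero(this[k], search)) return true;
  }
  return false;
}

// Install only when the browser lacks a native implementation (feature detection).
if (typeof (Array.prototype as any).includes !== "function") {
  Object.defineProperty(Array.prototype, "includes", {
    value: includesPolyfill,
    writable: true,
    configurable: true,
    enumerable: false,
  });
}

console.log(includesPolyfill.call([1, 2, NaN], NaN)); // true
```

Feature detection leaves modern browsers on their native method. Only older engines such as Chrome 41 pick up the fallback.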

(d) Optimised Rendering Path
As mentioned, loading speed is important, and JS websites are prone to slow-loading scripts. Content that takes too long to load may not be indexed, because the page will be abandoned before it appears. Page loading should therefore be a priority from the development stage onward. We recommend drawing on the guidance of Google’s technical team in its Rendering Optimisation Guide.
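
Here is one common pattern for an optimised rendering path, sketched in TypeScript as a hypothetical server-side template. It inlines the small amount of CSS needed for above-the-fold content, sends the main content as HTML, and loads non-critical scripts with `defer` so they don’t block HTML parsing. `buildShell` and its options are illustrative names, not part of any library.

```typescript
interface ShellOptions {
  title: string;       // assumed already HTML-escaped
  criticalCss: string; // small CSS needed for above-the-fold content
  contentHtml: string; // main content, already rendered on the server
  scripts: string[];   // non-critical bundles
}

function buildShell(o: ShellOptions): string {
  // defer: scripts download in parallel and run after the HTML is parsed,
  // so they do not block the parser from reaching the main content.
  const scriptTags = o.scripts.map((src) => `<script src="${src}" defer></script>`).join("");
  return (
    "<!doctype html><html><head>" +
    `<title>${o.title}</title>` +
    `<style>${o.criticalCss}</style>` + // inlined: no render-blocking stylesheet request
    scriptTags +
    `</head><body>${o.contentHtml}</body></html>`
  );
}

console.log(buildShell({
  title: "Shoes",
  criticalCss: "h1{margin:0}",
  contentHtml: "<h1>Shoes</h1>",
  scripts: ["/app.js"],
}));
```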

(e) Using History API
Use friendly URLs, and in any case avoid fragment URLs (URLs containing a hash). As explained, Google ignores anything after the hash and so fails to index the unique address.
Hashbang (#!) addresses should also be avoided. Google can crawl them: it recently announced that it no longer uses escaped fragment addresses, because it can now render hashbang addresses directly. Even so, in our experience hashbang addresses still cause plenty of issues.

Google recommends using the History API (about 100 seconds of viewing from this point in the video). It enables friendly URLs in single-page application websites, as well as in sites that make extensive use of JS.
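
Below is a minimal sketch of History API navigation in a single-page app, assuming a hypothetical `/products/:id` route. In the browser you would pass `window.history`. Here a small recording stand-in implements the same `pushState` signature so the code runs anywhere. The server must also respond to `/products/42` directly, since that is the URL Google will request.

```typescript
// The subset of the browser's History interface used here.
interface HistoryLike {
  pushState(data: unknown, title: string, url?: string | null): void;
}

// Builds a clean, crawlable path such as "/products/42"
// instead of a fragment URL such as "/#/products/42".
function productPath(id: number): string {
  return `/products/${encodeURIComponent(String(id))}`;
}

// Updates the address bar without a full page load, then renders the view.
function navigate(history: HistoryLike, path: string, render: (path: string) => void): void {
  history.pushState({ path }, "", path);
  render(path);
}

// Stand-in for window.history that records the URLs pushed.
class RecordingHistory implements HistoryLike {
  readonly urls: string[] = [];
  pushState(_data: unknown, _title: string, url?: string | null): void {
    if (url) this.urls.push(url);
  }
}

const h = new RecordingHistory();
navigate(h, productPath(42), (p) => console.log(`rendering ${p}`)); // rendering /products/42
console.log(h.urls.join(", ")); // /products/42
```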

(f) Links
Every URL that we want Google to crawl and index must be linked internally via an <a> element with an href attribute. Avoid onclick and other JS-driven links. Google only crawls href links (here in the video). Building links in HTML ensures that Google can crawl and reach all the content on the website.
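
The difference is easy to see with a simplified link scraper. In this hypothetical navigation markup only the first item is a real `<a href>` link. The regex-based `crawlableLinks` helper (a sketch, not a full HTML parser) collects only that one, as the text above describes for Google.

```typescript
const navMarkup = `
<nav>
  <a href="/category/shoes">Shoes</a>
  <span onclick="location.href='/category/bags'">Bags</span>
  <a onclick="goTo('/category/hats')">Hats</a>
</nav>`;

// Collects only URLs exposed through an <a> element's href attribute.
function crawlableLinks(html: string): string[] {
  const links: string[] = [];
  const re = /<a\s[^>]*href="([^"]*)"/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) links.push(m[1]);
  return links;
}

console.log(crawlableLinks(navMarkup).join(", ")); // /category/shoes
```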

(g) Central Inline Elements
As implied in paragraph (a), the elements and content that matter most to a search engine must be inline so that search engines can access them. These are the main content and meta tags such as title, canonical and the like. In other words, a correct implementation of hybrid or dynamic rendering ensures that these central elements are rendered on the server and reach Googlebot inline, as HTML.
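
A quick way to check this is to audit the raw server response, i.e. the page source rather than the DOM shown in DevTools. The hypothetical `auditInitialHtml` helper below uses simple regexes, so treat it as a rough check rather than a validator.

```typescript
interface AuditResult { title: boolean; canonical: boolean; h1: boolean; }

// Checks whether central elements are present in the initial HTML.
function auditInitialHtml(html: string): AuditResult {
  return {
    title: /<title>[^<]+<\/title>/i.test(html),            // non-empty <title>
    canonical: /<link[^>]+rel="canonical"[^>]*>/i.test(html),
    h1: /<h1[^>]*>[^<]+<\/h1>/i.test(html),                 // non-empty <h1>
  };
}

const ssr = '<html><head><title>Shoes</title><link rel="canonical" href="https://example.com/shoes"></head><body><h1>Shoes</h1></body></html>';
const csr = '<html><head><title></title></head><body><div id="root"></div></body></html>';
console.log(JSON.stringify(auditInitialHtml(ssr))); // {"title":true,"canonical":true,"h1":true}
console.log(JSON.stringify(auditInitialHtml(csr))); // {"title":false,"canonical":false,"h1":false}
```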

(h) Lazy-Loading of Images
Some websites lazy-load images to improve loading speed. The page initially loads with a placeholder where each image goes, and only when an image enters the user’s viewport is a request sent to the server and the image itself loaded. If your website does this, make sure the images remain accessible to search engines. As stated, a search engine does not interact with the page: it does not scroll and never brings the image into view. It therefore renders the page without the images, which as mentioned are not loaded initially, and cannot index them.

Two solutions are available:

(i) Wrap an <img> element in a <noscript> tag. This tag defines fallback content for browsers that do not run scripts, so for them the image is included in the initial load. Google can then access the image file via this tag in the code.

(ii) Add structured data that refers to the image, so Google can access the image via the structured data.
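
Both solutions can be sketched as TypeScript template helpers. The `data-src` attribute and the `/placeholder.gif` path are hypothetical and depend on the lazy-loading script you use. The JSON-LD uses the schema.org `ImageObject` type with its `contentUrl` and `caption` properties.

```typescript
interface LazyImage { src: string; alt: string; width: number; height: number; } // values assumed HTML-safe

// Solution (i): the visible <img> holds the real URL in data-src for the
// lazy-loading script, while <noscript> exposes a plain <img src> in the initial HTML.
function lazyImageHtml(img: LazyImage): string {
  const size = `width="${img.width}" height="${img.height}"`;
  return (
    `<img class="lazy" src="/placeholder.gif" data-src="${img.src}" alt="${img.alt}" ${size}>` +
    `<noscript><img src="${img.src}" alt="${img.alt}" ${size}></noscript>`
  );
}

// Solution (ii): structured data pointing at the image (schema.org ImageObject).
function imageJsonLd(img: LazyImage): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "ImageObject",
    contentUrl: img.src,
    caption: img.alt,
  };
  // Escape "<" so the JSON cannot terminate the <script> element.
  return `<script type="application/ld+json">${JSON.stringify(data).replace(/</g, "\\u003c")}</script>`;
}

const boots: LazyImage = { src: "https://example.com/img/boots.jpg", alt: "Hiking boots", width: 800, height: 600 };
console.log(lazyImageHtml(boots));
console.log(imageJsonLd(boots));
```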

(i) Additional Information about Google
We’ll take the opportunity to discuss two matters not necessarily related to JS, but important for developing a website adapted to search engines:

> Google is Stateless
The Google crawler is stateless. Any website content shown only to users in a given state, such as users holding certain cookies (e.g. content for registered users), will not be available to Google.
The following shows what Google clears when loading a page. Any content that relies on data saved via one of these mechanisms will not be available to Google and therefore will not be indexed:
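
To make the implication concrete: if the server decides what to show based on a cookie, a stateless crawler always gets the anonymous variant. The `session` cookie name and the content strings below are hypothetical.

```typescript
// Hypothetical request shape: only the Cookie header matters here.
interface Req { headers: Record<string, string | undefined>; }

function hasSessionCookie(req: Req): boolean {
  const cookie = req.headers["cookie"] ?? "";
  return cookie.split(";").some((c) => c.trim().startsWith("session="));
}

function pageContent(req: Req): string {
  return hasSessionCookie(req)
    ? "Public article + members-only analysis"
    : "Public article"; // what a crawler sending no cookies receives
}

console.log(pageContent({ headers: {} })); // Public article
console.log(pageContent({ headers: { cookie: "theme=dark; session=abc123" } })); // Public article + members-only analysis
```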

> Google Uses Caching
The search engine makes heavy use of caching, and AJAX files may also be served from Google’s cache. So if you change your JS files, use a versioning mechanism in their URLs so that Google knows to re-request them.
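
One common versioning approach puts a content hash in the file’s URL, so the URL changes whenever the file does. This TypeScript sketch uses the 32-bit FNV-1a hash purely as a cheap, deterministic fingerprint; it is not cryptographic. The `?v=` parameter name is an assumption. Build tools usually put the hash in the file name instead.

```typescript
// 32-bit FNV-1a over UTF-16 code units: a cheap, deterministic fingerprint.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Same file contents -> same URL; changed contents -> new URL to re-request.
function versionedUrl(path: string, fileContents: string): string {
  return `${path}?v=${fnv1a(fileContents)}`;
}

console.log(versionedUrl("/js/app.js", "")); // /js/app.js?v=811c9dc5
```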

What Not to Do

Do not block JS with robots.txt. The reasons are obvious, yet for some reason it can still be seen on many sites.

Do not use onclick links for important page content. As noted, Google does not behave like a user, so it will neither render nor index content that needs user interaction to load.
An example of important content is an opening navigation menu (e.g. a menu behind a menu icon). A common mistake is to put the navigation menu behind onclick, so that a request is made and the menu loaded only when a user clicks the menu icon. In that case Google does not render the menu, with everything that implies: it does not crawl the links in the menu and cannot assess the importance of the internally linked pages.

Tip for checking this: test with Chrome 41. If the navigation menu (or any other element you wish to check) is in the DOM once the page has finished its initial load, it will also be there for Google, which can find it at that point. Otherwise, the content is hidden behind user interaction, and Google will neither render nor index it.
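
The check can be made semi-automatic. Suppose you copy the DOM after the initial load from Chrome 41’s console, e.g. `document.documentElement.outerHTML`. The hypothetical helper below then reports which expected navigation URLs are missing as `href` links. Anything reported missing is likely hidden behind user interaction, or not rendered at all.

```typescript
// Returns the expected navigation URLs that do not appear as href values
// in a serialized DOM snapshot taken after the initial load.
function missingNavLinks(domSnapshot: string, expected: string[]): string[] {
  return expected.filter((url) => !domSnapshot.includes(`href="${url}"`));
}

const snapshot = '<nav><a href="/shoes">Shoes</a></nav><button class="menu-icon"></button>';
console.log(missingNavLinks(snapshot, ["/shoes", "/bags"]).join(", ")); // /bags
```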

Do not use JS redirects. Redirects should be implemented at the server level (301). In our experience, JS redirects cause problems both with recognising the redirect and with passing “link juice”, despite Google’s hints that it can handle them. JS redirects are best avoided unless strictly necessary.
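
A server-level 301 can be sketched as a lookup performed before any page is rendered. The redirect table and response shape below are hypothetical. In practice the same logic usually lives in your web server or framework’s routing configuration.

```typescript
interface RedirectResponse {
  status: 301;
  headers: { Location: string };
}

// Hypothetical permanent-redirect table (a Map avoids prototype-key surprises).
const permanentRedirects = new Map<string, string>([
  ["/old-shoes", "/shoes"],
  ["/blog/2017/js-seo", "/blog/javascript-seo"],
]);

// Returns a 301 response for known old paths, or null to serve the page normally.
function resolveRedirect(path: string): RedirectResponse | null {
  const target = permanentRedirects.get(path);
  return target === undefined ? null : { status: 301, headers: { Location: target } };
}

console.log(JSON.stringify(resolveRedirect("/old-shoes"))); // {"status":301,"headers":{"Location":"/shoes"}}
```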

As a footnote it is important to realise that all of the aforesaid only applies to Google: other search engines are a long way behind Google in all matters concerning JS crawl and rendering capabilities, which should be borne in mind.

You can check your Google Analytics account to see what proportion of your site’s organic traffic comes from other search engines, under Acquisition > Campaigns > Organic Keywords > Source.

Case Study

Here is a short case study in which we solved a JS problem encountered by one of our clients; the client involved is an authority in its field, as such possessing a large website generating a lot of traffic. The client also has a separate mobile website incorporating large quantities of JavaScript in various areas of the website—where Google had difficulty rendering the web page content. All of this with the Mobile-First Index in the background, making it particularly important to adapt the mobile website pages for the search engines.

For example, this is how the Fetch as Google test looked in the client’s GSC account:

Google was unable to render the pages’ content. The test showed no blocked files, so we knew a rendering problem existed but not what was causing it. To find the cause, we met with the client’s development team and installed Chrome 41 on their system. With a simple check of the Console in the browser’s Developer Tools, the team solved the problem quickly and efficiently. Since the fix, Google has been able to render the content of these pages successfully.

This is how a sample web page looks following the correction:

The results of the fix can also be seen in the traffic to these areas of the mobile website. Admittedly, Google currently bases its search results mainly on the desktop version, although a transition to a mobile-first index is under way. The website was therefore attracting mobile traffic even while Google had difficulty rendering the mobile version. Once these areas were fixed, however, Google could render the pages’ content and better judge their value for mobile users. As a result, these pages performed better in mobile search results.

GSC shows a pleasing increase in mobile search impressions for these pages immediately following the fix:

A similar trend may be seen in clicks from mobile search results for these web pages:

Naturally this is reflected in the Google Analytics User Data as well:

This case illustrates the importance of working with Chrome 41.

Resources

Main references for an understanding of the subject:

This video should be watched first:

Official Google video on adapting JS for search engines: https://www.youtube.com/watch?v=PFwUbgvpdaQ&feature=youtu.be

followed by:

Google instructional video for developers on building a JS-powered website adapted to Google: https://www.youtube.com/watch?v=83As5qYrMno

For those seeking further reading and guidance:

General guide: https://www.elephate.com/blog/ultimate-guide-javascript-seo/
Use of Chrome 41: https://www.elephate.com/blog/chrome-41-key-to-website-rendering/
Excellent account of crawling and indexing mechanisms and the problem with JS: http://www.stateofdigital.com/javascript-seo-crawling-indexing/

Explanation of polyfills: https://hackernoon.com/polyfills-everything-you-ever-wanted-to-know-or-maybe-a-bit-less-7c8de164e423
What is isomorphic JS?: https://www.searchenginejournal.com/javascript-seo-like-peanut-butter-and-jelly-thanks-to-isomorphic-js/183337/
Google notice on no longer using escaped fragments: https://webmasters.googleblog.com/2017/12/rendering-ajax-crawling-pages.html
Major resource list on JS SEO: http://www.stateofdigital.com/javascript-seo-the-definitive-resource-list/
General guide to SEO and JS: https://moz.com/blog/javascript-seo

Another superb general guide: https://www.briggsby.com/dealing-with-javascript-for-seo/
Account of the impact of JS on crawl budget: https://www.elephate.com/blog/javascript-vs-crawl-budget-ready-player-one/
Conclusions drawn from the largest experiment conducted on the subject: https://www.elephate.com/blog/everything-you-know-about-javascript-indexing-is-wrong/

Explanation of the History API, focusing on “The Why”: http://diveinto.html5doctor.com/history.html

Official Angular Universal website: https://universal.angular.io/