Study: 31% of international websites contain hreflang errors

Posted On 06 Apr 2023

Conflicting hreflang directives and missing self-referencing tags are among the widespread issues plaguing international websites today.

There’s also the added complexity of language nuances and regional targeting, which typically only a native speaker or someone who has studied the language thoroughly would understand.

Incorrect hreflang implementation can cause many complications (e.g., duplicate content, erroneous indexing and poor SERP visibility) that are detrimental to SEO performance.

It’s imperative to implement hreflang with care. Thankfully, hreflang is well-documented and related issues can be identified through various SEO tools.

Hreflang errors study

To determine how widespread hreflang issues are and which ones are more common, I partnered with NerdyData, which gave me access to their database of websites that contain hreflang code.

NerdyData provided a list of 18,786 websites that contain at least one instance of hreflang declaring an alternate within the source code. Thus, this study only accounts for hreflang implemented in the <head>, not through XML sitemaps or the HTTP header.

I carried out the study by:

  • Running crawls in Screaming Frog to validate the presence of hreflang on the homepages.
  • Removing GEO-IP redirects so that every URL in the list resolves with a 200 status code.
  • Processing the URLs in batches through HreflangChecker.com and Visual SEO Studio to identify the most common issues.
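As a rough illustration of what "hreflang implemented in the <head>" means in practice, the sketch below pulls hreflang alternates out of a page's HTML using only Python's standard library. It is not the tooling used in the study (that was Screaming Frog, HreflangChecker.com and Visual SEO Studio). The names `HreflangParser`, `extract_hreflang` and `SAMPLE_HTML` are hypothetical, and for simplicity the parser does not check that a tag actually sits inside <head>.

```python
from html.parser import HTMLParser


class HreflangParser(HTMLParser):
    """Collect (hreflang, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        if "alternate" in rel_values and attr.get("hreflang"):
            self.alternates.append((attr["hreflang"].lower(), attr.get("href", "")))


def extract_hreflang(html):
    """Return every hreflang alternate declared in the given HTML string."""
    parser = HreflangParser()
    parser.feed(html)
    return parser.alternates


# Hypothetical sample page used only for illustration.
SAMPLE_HTML = """
<html><head>
<link rel="alternate" href="https://example.com/" hreflang="en" />
<link rel="alternate" href="https://example.com/en-au/" hreflang="en-au" />
<link rel="stylesheet" href="/site.css" />
</head><body></body></html>
"""

print(extract_hreflang(SAMPLE_HTML))
```

The stylesheet link is skipped because it has neither `rel="alternate"` nor a hreflang value, so only the two alternates are returned.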

31.02% of websites contain conflicting hreflang directives

My findings show 31.02% of websites serving multiple languages have conflicting hreflang directives. Conflicts arise when the hreflang tags on a webpage, which cover different languages and geographical targets, contradict one another.

Put simply, more than one URL has been assigned to an individual language or region, sending confusing signals to search engines. For example:

  • <link rel="alternate" href="https://example.com/" hreflang="en" />
  • <link rel="alternate" href="https://example.com/en-uk/" hreflang="en-gb" />
  • <link rel="alternate" href="https://example.com/en-us/" hreflang="en-gb" />
  • <link rel="alternate" href="https://example.com/en-au/" hreflang="en-au" />

Such confusion potentially leads to complications around duplicate content and incorrect ranking and indexing, making it difficult to place well in the SERP.

Even if users find your webpage among those performing well, they will have a poor experience if they are served the incorrect version of the page.
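A conflict of this kind is straightforward to detect once the tags are reduced to (hreflang value, URL) pairs. The following is a minimal sketch, assuming the pairs have already been extracted from the page; `find_conflicts` and `cluster` are illustrative names, and the data mirrors the example tags in this section.

```python
from collections import defaultdict


def find_conflicts(alternates):
    """Return {hreflang value: sorted URLs} for values assigned to more than one URL."""
    urls_by_code = defaultdict(set)
    for code, url in alternates:
        urls_by_code[code.lower()].add(url)
    return {code: sorted(urls) for code, urls in urls_by_code.items() if len(urls) > 1}


# The four tags from the example in this section.
cluster = [
    ("en", "https://example.com/"),
    ("en-gb", "https://example.com/en-uk/"),
    ("en-gb", "https://example.com/en-us/"),
    ("en-au", "https://example.com/en-au/"),
]

print(find_conflicts(cluster))
```

Here only `en-gb` is reported, because it is the one value assigned to two different URLs.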

16.04% of hreflang clusters are missing self-referencing tags

Self-referencing hreflang happens when a page includes a hreflang tag pointing to its own URL.

In essence, the page indicates it is available in various languages, including the original language of the page.

Although it may initially appear redundant, it’s good practice for international SEO. Unfortunately, 16.04% of sites with multiple languages have no self-referencing hreflang tags.

Search engines can better understand the relationship between different versions of the same page when self-referencing hreflang tags are used, including pages available in different languages.

Given that hreflang is one of approximately 20 canonicalization signals, it’s an important one to include.
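Checking for a self-referencing tag amounts to asking whether the page's own URL appears among its alternates. The sketch below assumes the alternates are available as (hreflang value, URL) pairs and treats URLs that differ only by a trailing slash as the same page. That normalization is a simplifying assumption, and `has_self_reference` and `alternates` are hypothetical names.

```python
def has_self_reference(page_url, alternates):
    """Return True if any alternate points back at the page it is declared on."""
    normalized = page_url.rstrip("/")
    return any(url.rstrip("/") == normalized for _, url in alternates)


# Hypothetical cluster declared on https://example.com/en-gb/
alternates = [
    ("en-gb", "https://example.com/en-gb/"),
    ("en-au", "https://example.com/en-au/"),
]

print(has_self_reference("https://example.com/en-gb/", alternates))
print(has_self_reference("https://example.com/en-us/", alternates))
```

The first call finds the page's own URL in the cluster. The second does not, which is the situation the 16.04% figure describes.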

47.95% of websites don’t utilize x-default

The x-default value signals to search engines that a page doesn’t target a specific language or location, defining it as the default version of the page.

It’s especially useful when a page is available in multiple languages but doesn’t deliver content in the user’s preferred language.

The x-default value isn’t strictly required in hreflang, and 47.95% of multilanguage sites are currently not using it.

However, it can be beneficial to use in cases where a user searches for a page in a specific language that isn’t available, as it helps search engines find the most appropriate version of the page to display.

It’s important to note that x-default is a fallback for users whose language isn’t otherwise available. Where other language versions exist, each should be specified with its own hreflang tag.

Additionally, x-default should not be used on pages specific to a particular language or location.
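To make the fallback role of x-default concrete, here is a simplified sketch of how a consumer of hreflang data might choose a URL for a user. It is an illustration under assumed matching rules (exact locale first, then language only, then x-default), not a description of how any search engine actually resolves alternates. `pick_alternate` and the sample cluster are hypothetical.

```python
def pick_alternate(user_locale, alternates):
    """Pick a URL for the user's locale, falling back to x-default if nothing matches."""
    by_code = {code.lower(): url for code, url in alternates}
    locale = user_locale.lower()
    language = locale.split("-")[0]
    # Assumed priority: exact locale, then language only, then x-default.
    for candidate in (locale, language, "x-default"):
        if candidate in by_code:
            return by_code[candidate]
    return None


# Hypothetical cluster with an x-default entry.
cluster_with_default = [
    ("en-gb", "https://example.com/en-gb/"),
    ("de", "https://example.com/de/"),
    ("x-default", "https://example.com/"),
]

print(pick_alternate("de-AT", cluster_with_default))
print(pick_alternate("fr-FR", cluster_with_default))
```

A French-speaking user has no matching alternate in this cluster, so the x-default URL is returned. Without the x-default entry, the function would have nothing to offer that user.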

8.91% of hreflang clusters contain at least one instance of invalid language codes

It is essential to use the two-letter ISO 639-1 format for language codes within hreflang attributes.

Unfortunately, it’s common for language codes to go wrong, causing multiple issues that can affect the international targeting of a website.

My research found that 8.91% of sites targeting more than one language currently contain unknown language codes.

The cause may simply be confusion over how to combine language and location codes, but several other common mistakes can also be responsible.

Some language codes don’t match the English name of the language.

For example, you might expect the language code for Croatian to be “cr,” but it’s actually “hr.” Because the code isn’t obvious, it’s easy to make mistakes when implementing language codes.
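A basic language-code check can be sketched as a lookup against the ISO 639-1 list. In the sketch below, `KNOWN_LANGUAGES` is a deliberately short, illustrative subset; a real check would load the complete standard. `language_is_valid` is a hypothetical helper name. The "jp" case shows another non-obvious code: JP is the country code for Japan, while the language code for Japanese is "ja".

```python
# Illustrative subset of ISO 639-1 codes; NOT the full standard.
KNOWN_LANGUAGES = {"de", "en", "es", "fr", "hr", "it", "ja", "nl", "pt", "zh"}


def language_is_valid(hreflang_value):
    """Check the language part of a hreflang value against the known subset."""
    value = hreflang_value.lower()
    if value == "x-default":
        return True  # reserved value, not a language code
    language = value.split("-")[0]
    return language in KNOWN_LANGUAGES


for value in ("hr", "hr-HR", "jp-JP", "eng", "x-default"):
    print(value, language_is_valid(value))
```

Note that a lookup like this only confirms a code exists. It cannot tell whether an existing code was the one the site owner actually intended.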

1.6% of hreflang clusters contain at least one instance of invalid region codes

Contrary to the previous statistic, relatively few hreflang clusters contain invalid region codes.

While using the two-letter ISO 3166-1 region codes isn’t required, it does help when targeting the same language across two or more countries with different spelling rules. Doing so gives search engines more context about both the user’s location and language.

To return to my previous example, you must use the code “en-US” to target users in the United States. If it is set to “en-GB,” you will only target British-based users, missing your intended audience entirely.

Common errors here include:

  • <link rel="alternate" href="https://example.com/en-gb/" hreflang="en-uk" />
  • <link rel="alternate" href="https://example.com/en-eu/" hreflang="en-eu" />

Here, both entries target English speakers and are intended to cover the UK and Europe, but UK and EU are both invalid region codes. The correct code for the United Kingdom is GB, and hreflang can’t target Europe as a continent.

Spanish targeting can also be problematic in Latin America, with clusters using es-la, es-lx and es-419 in an attempt to target the region as a whole, when they should be targeting individual countries or leaving Spanish as a general language.
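The region errors described above can be caught with the same kind of lookup, this time against ISO 3166-1 two-letter codes. The sketch assumes hreflang values of the simple form language or language-region (script subtags are not handled). `KNOWN_REGIONS` is a short illustrative subset rather than the full standard, and `region_is_valid` is a hypothetical name.

```python
# Illustrative subset of ISO 3166-1 alpha-2 codes; NOT the full standard.
KNOWN_REGIONS = {"AU", "DE", "ES", "FR", "GB", "IE", "MX", "NL", "US"}


def region_is_valid(hreflang_value):
    """Check the region part of a language-region hreflang value, if present."""
    parts = hreflang_value.split("-")
    if len(parts) == 1:
        return True  # language-only value, nothing to validate
    return parts[1].upper() in KNOWN_REGIONS


for value in ("en-gb", "en-uk", "en-eu", "es-419", "es-MX", "es"):
    print(value, region_is_valid(value))
```

Under this check, `en-uk`, `en-eu` and `es-419` are rejected, while `en-gb`, `es-MX` and the language-only `es` pass.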

22.46% of hreflang clusters contain irregular/unusual language-region combinations

There is a range of benefits to using hreflang to target a country in a language other than its native one, a major one being an improved user experience for non-native speakers.

For example, Dutch is the native language of the Netherlands, but an estimated 95% of the population also speaks English. There are also around 97,800 British nationals who live in the Netherlands.

With such high numbers of English speakers, targeting users in the Netherlands with your English website pages makes sense.

However, not all combinations make sense. For example:

  • <link rel="alternate" href="https://example.com/en-vn/" hreflang="en-vn" />
  • <link rel="alternate" href="https://example.com/es-ie/" hreflang="es-ie" />
  • <link rel="alternate" href="https://example.com/zh-zm/" hreflang="zh-zm" />

While the three examples above will pass a hreflang test and are technically valid, the number of Chinese speakers in Zambia means this alternate version is likely to yield little to no results.

Creating alternate versions that make little sense generates additional, unnecessary crawl demand, and Google may well deem those versions duplicates and override the canonicals.
