Google Analytics Spam – Don’t Be Fooled: Filter Bogus Traffic in Google Analytics

Google Analytics Spam: Filtering Bogus Traffic

May 23, 2017

Google Analytics spam is fake traffic that shows up in your Google Analytics reports, and it can make it impossible to use GA for its intended purpose: analyzing the traffic patterns of your website!

No, fake traffic won’t cause any issues with your website per se, but it will result in data that doesn’t represent your real visitors (or any visitors, for that matter).

Fake traffic is Spam traffic that creates sessions on your website. It artificially inflates your numbers and produces traffic spikes and avalanches that don’t represent what’s really happening with your customers at all.

Ridding Google Analytics of Spam traffic is essential to ensuring the data you receive and act upon is real.

So let’s look at how Spam affects your GA reports, and how to get rid of it.

Signs of Google Analytics spam traffic

The traffic reported through Google Analytics has many properties. Each session records a set of metrics, and each metric reports some piece of data about the visitor’s level of engagement. We mostly see those values in aggregate and perceive them as proportions of the total. Examples include average session duration, bounce rate, page views, users and so on.

For typical Spam traffic, at least a few of these fields would look a little suspicious, like session durations in round numbers (unlikely to occur naturally).
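The round-number check can be sketched in a few lines of Python. This is only an illustration: the rows, the page paths and the 30-second step are all made-up assumptions, and a round duration is a hint worth a second look, not proof of Spam.

```python
# Hypothetical rows typed in by hand from a report:
# (page path, average session duration in seconds). None of this is real data.
rows = [
    ("/", 47.3),
    ("/pricing", 132.8),
    ("/some-odd-page", 60.0),
]


def looks_round(seconds: float, step: int = 30) -> bool:
    """True when a duration is a nonzero exact multiple of `step` seconds."""
    return seconds > 0 and seconds % step == 0


# Collect the rows whose duration is suspiciously round.
suspicious = [path for path, seconds in rows if looks_round(seconds)]
print(suspicious)
```

Real visitors can land on a round number by chance, so treat the result as a list of rows to inspect by hand.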

Let us look at an example:

Google Analytics Spam Traffic Session Durations

We did analytics diagnostics for this site recently, and since it is a relatively new site, it has very low traffic. That makes it a perfect example.

The website apparently had over 100 visits in the last month. It did not take very long to see that a significant share of that was Spam traffic. All we had to do was open the Behavior tab and click All Pages.

But let’s back up a bit.

There are multiple types of Spam traffic, the most common of which are:

  • Languages Spam: This type has become very common recently and it involves ghost traffic setting weird values in the Languages report.
  • Pages Spam: This type infiltrates the content reports and causes them to show pages that do not exist on your site.
  • Fake Events Spam: A type of Spam traffic that inserts a fake event report.

All of these have one thing in common: they are called referral Spam.

The Spammers use the Google Analytics Measurement Protocol to send fake data to many Analytics accounts at the same time. The result is that the reports contain messages and links which the Spammers hope users will click. Clicking them can expose you to malware or spyware, so never click those links!

Where to look for sneaky Google Analytics spam

Knowing what we’re dealing with, let’s look at the effects these events have on Google Analytics reports on a real website.

Depending on the type of Spam, the evidence will be in different reports. The only solution is to go through the reports and check for any suspicious traffic.

Google Analytics Spam Traffic

Above is an overview of a website report that contains Spam traffic.

Can you spot it?

This is a prime example of language Spam. The Spammer here (rumor has it that it is a guy disgruntled with Google for some reason) has inserted fake language reports, which you can easily spot.

You can check for the traffic patterns by clicking on the entry and you will get this:

Google Analytics Traffic Patterns

If we don’t take the language field into consideration, it looks like legitimate traffic. You can see that the Spammer even sets the bounce rate, Pages/Session and average duration to make it look like legitimate traffic.

For comparison, let’s see a legitimate language report:

Google Analytics Language Report

Above is the report from legitimate traffic. You can see that it doesn’t differ much from the Spam version. A row with no new users or new sessions should raise an eyebrow, though: that result is almost impossible, since there had to be at least one new session and one new user. These nuances, however, can easily escape our notice.

The second example:

Google Analytics Report Page

Once again, we see an apparently normal Analytics report page. Listed here are the top ten pages on the website.

But if you owned this site (or took care of its analytics account) you would immediately know that /sharebutton.to is not a page that exists on this site. So how did people get there and how did Google Analytics report this page?

Let us look:

Sharebutton?

It seems that every visitor (but one) reached this page using organic search, which would obviously be impossible if this page is not on the website at all. So we have uncovered another instance of Spam traffic.
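If you keep a list of the pages that really exist on your site, the comparison can be sketched like this. Both lists below are assumptions made up for illustration; in practice you might take the known paths from your sitemap and the reported paths from an export of the All Pages report.

```python
# Assumed: paths that really exist on the site (for example, from a sitemap).
known_paths = {"/", "/about", "/contact"}

# Assumed: page paths copied from the All Pages report.
reported_paths = ["/", "/about", "/sharebutton.to", "/contact"]

# Anything reported but not known deserves a closer look.
unknown_paths = [path for path in reported_paths if path not in known_paths]
print(unknown_paths)  # ['/sharebutton.to']
```

A path that shows up here is not automatically Spam (it could be a mistyped URL or an old page), but it tells you where to look.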

Another area where you can look for Spam traffic is the Acquisition reports, which show how visitors got to the site. Once more, you need to know the site you are analyzing to spot anomalies – and Spammers may fool you even then.

Google Analytics Acquisition Reports

Once again, it all seems like normal, legitimate traffic. High organic search is expected as most traffic to websites in general comes from either direct or search. If we were unaware of the presence of Spammers on the site, this report could easily pass as normal. To realize the extent of Spam infiltration, it is necessary to open each individual channel.

The table below is the Organic channel report:

Google Analytics Organic Channel Report

The dead giveaway that there’s Spam traffic here is in the keywords.

It is very hard to come by uncovered (or provided) keywords anymore. In 2011, Google (followed by the other main search engines) moved to “secure search” which, sadly for publishers, prevents you from seeing which keywords were used to reach your site. This data field now shows up as “not provided.”

So if a keyword is provided, something is probably not right.

The second indicator is that the provided keywords all contain the phrase sharebutton.to, our ‘friend’ from the Spam pages report. By setting the keyword value on his fake traffic, the Spammer has revealed its true identity.
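Both keyword checks can be sketched in Python. The keyword strings below are invented for illustration, and the placeholder labels are assumed to appear exactly as written here. A visible keyword is a reason to look closer, not proof on its own.

```python
# Placeholder labels that hide the real keyword (assumed spelling).
PLACEHOLDERS = {"(not provided)", "(not set)"}

# Hypothetical keyword column from the Organic channel report.
organic_keywords = [
    "(not provided)",
    "sharebutton.to free share buttons",
    "(not set)",
]


def provided_keywords(keywords):
    """Return the keywords that are actually visible in the report."""
    return [k for k in keywords if k.strip().lower() not in PLACEHOLDERS]


# First indicator: a keyword is visible at all.
visible = provided_keywords(organic_keywords)

# Second indicator: the visible keyword names a known Spam page.
naming_spam_page = [k for k in visible if "sharebutton.to" in k.lower()]
print(visible, naming_spam_page)
```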

How to separate spam from legitimate traffic

To clear the reports of the Spam traffic, we need to find a way to discriminate between Spam and legitimate traffic.

Google Analytics provides a way to separate Spam traffic at the outset, before it is brought into the reports: Filters.

Using filters, you can exclude the traffic belonging to the staff of the website and developers. You can also filter out traffic coming from certain sources or according to multiple criteria. As such, filters can be used to eliminate Spam.

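To do that, we only need to create a filter and apply it – and from that point on, matching Spam traffic will be kept out of that view’s reports. (Filters are not retroactive, so data already collected stays as it is.)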

The trick here is finding a common element that the Spam traffic shares. Ideally this common denominator should address most of the Spam traffic at once, using one filter.

Google Analytics Spam - Add Filter to View

Adding filters can be done from the Admin tab in your Google Analytics account.

WARNING: You should be cautious about adding filters and do it only in Test View for the first few days. Otherwise, you risk losing valuable data if something goes wrong with the filter.

What can go wrong?

The trouble here is that most Spammers use multiple value settings to infiltrate your website reports, so what works on one set of Spam traffic will not work on another. There is an additional danger that you can filter out legitimate traffic.

Using segments to identify a good common denominator for Spam, we can exclude traffic with certain characteristics and improve the segment until we get only the legitimate traffic.

Google Analytics Spam Using Segments

An example would be a segment created using the common fact that most of the Spammers have only 1 visit to the site and usually use some identifiable word that can be used to segment the data.

For example, you could build a segment that matches all new visitors who made a single visit to the site and arrived via organic search keywords containing the word ‘sharebutton’.
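The logic of such a segment can be sketched in Python. The Session class, its field names and the sample sessions are all hypothetical; Google Analytics segments are built in the interface, not in code, so this only shows which conditions are combined.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical, simplified view of one session."""
    user_type: str      # "New Visitor" or "Returning Visitor"
    session_count: int  # how many sessions this user has had so far
    medium: str         # e.g. "organic", "referral", "(none)"
    keyword: str        # organic keyword, or a placeholder label


def in_spam_segment(s: Session, word: str = "sharebutton") -> bool:
    """New visitor, single visit, organic search, keyword contains `word`."""
    return (
        s.user_type == "New Visitor"
        and s.session_count == 1
        and s.medium == "organic"
        and word in s.keyword.lower()
    )


# Made-up sessions to show what the segment catches and what it lets through.
sessions = [
    Session("New Visitor", 1, "organic", "sharebutton.to free buttons"),
    Session("New Visitor", 1, "organic", "(not provided)"),
    Session("Returning Visitor", 4, "organic", "(not provided)"),
]
print([in_spam_segment(s) for s in sessions])  # [True, False, False]
```

Note how narrow the match is: change the word the Spammer uses and the segment catches nothing.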

But there’s a better way to do this, because this method would mean creating a filter for each category of Spam. That’s a lot of filters.

You can overcome the need to apply an increasing number of filters by using something that most Spammers will not be able to reproduce: The hostname of your website.

Due to the way the internet functions, all the legitimate traffic reports will reach Google Analytics from the hostname on which the website’s tracking code is located.

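Anything that appears in Google Analytics with some other hostname is almost certainly not legitimate traffic. (The rare exceptions are services that display your pages under their own hostname, such as translation or caching services.)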

As you can see from this illustration, legitimate data reaches Google Analytics only from pages served by your website.

Google Analytics Spam Data Illustration

As you can see, a visitor navigates to your website and connects to the server on which the site is located. The GA tracking code, contained on every page of the site, then runs in the visitor’s browser, records the data about the visit and fires off a report to the Google Analytics server. This data contains the hostname the report emanated from – the domain of your website.
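To make the hostname field concrete, here is a minimal Python sketch that reads it out of a hit. It assumes the Universal Analytics Measurement Protocol format, in which a hit is a URL-encoded string and the hostname arrives either in the `dh` parameter or as part of the page URL in `dl`. The payload and the tracking ID below are placeholders, and returning "(not set)" for a missing hostname simply mirrors the label seen in the reports.

```python
from urllib.parse import parse_qs, urlparse

# Placeholder hit; UA-XXXXX-Y is not a real tracking ID.
example_hit = (
    "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview"
    "&dl=https%3A%2F%2Fwww.yourdomain.com%2Fpricing"
)


def hit_hostname(payload: str) -> str:
    """Return the hostname a hit claims to come from."""
    fields = parse_qs(payload)
    if "dh" in fields:                      # explicit hostname parameter
        return fields["dh"][0]
    if "dl" in fields:                      # hostname inside the page URL
        return urlparse(fields["dl"][0]).hostname or "(not set)"
    return "(not set)"                      # the hit carries no hostname


print(hit_hostname(example_hit))  # www.yourdomain.com
```

Because the hostname is just another field in the hit, whoever builds the hit decides what goes into it. A Spammer who does not know your domain has to guess or leave it empty.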

One filter to rule them all

Let us see what this means in practice. Our poor contaminated reports from the low traffic website will once more be the guinea pig.

First off, we will open a Report page.

Google Analytics Audience Report Overview Screen

The report we opened is the Audience report’s overview screen. Immediately, we can see some Spam traffic here, but it is only 18 sessions and we now know that there is more Spam there.

The Audience report contains a handy little report called the Technology report.

Google Analytics Technology Report

By default, this shows the ISPs visitors use, but if you look closely at the top, you can see the Hostname report (circled in red for the purposes of this post). When you click on it, you will see something that will fend off barbarians from your reports.

Google Analytics Hostname

These are the hostnames that have sent data to the Google Analytics server.

Immediately, you should spot that of the four hostnames, only one is yourdomain.com.

That means that the rest came from somewhere else.

Unless you were generous with your tracking code and gave it to your friends and acquaintances, or also happen to maintain websites for USA Today and the New York Post, that right there is all Spam traffic. Including the (not set).

Because of the way most Spammers work, they will not be able to set the hostname field to the correct value. Most of them infiltrate reports by replicating the Google Analytics tracking code and then mimicking traffic reports to the Google Analytics server. Most of the time they do not know which site they are mimicking, so they try to use more or less well-known names (such as, for example, NY Post or USA Today). Or they just don’t bother at all, so you get the (not set) hostname.
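To put a number on the damage, you can tally the Hostname report. The sketch below is in Python, and the session counts and foreign hostnames are invented for illustration; only the idea of comparing your own hostname against everything else comes from the report above.

```python
# Your own hostname; everything else in the report is treated as suspect.
MY_HOSTNAME = "www.yourdomain.com"

# Hypothetical Hostname report: hostname -> sessions. The numbers are made up.
hostname_sessions = {
    "www.yourdomain.com": 80,
    "www.usatoday.com": 10,
    "nypost.com": 6,
    "(not set)": 4,
}

total = sum(hostname_sessions.values())
legitimate = hostname_sessions.get(MY_HOSTNAME, 0)
suspect = total - legitimate

print(f"{suspect} of {total} sessions did not come from {MY_HOSTNAME}")
```

If your site legitimately runs on several hostnames (with and without www, for instance), they would all need to go on the allowed side of this tally.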

Fortunately for most of us (and with condolences to NY Post and USA Today), we can exploit this and create a filter that lets through only the reports coming from the legitimate hostname – our own.

Let’s see how to do this.

Add Filter to View

This is the first dialog box you see when you open the ‘add filter’ menu under the Admin → Views tab.

First, we have to name the filter.

“Spam exterminator” can be a good name for it. “Spam ninja”? “Spam Defender”?

To create your filter, select the filter type as ‘custom.’ Predefined filters do not allow you to include only the hostname.

Once we select ‘custom,’ select ‘Include.’

Then, select the field. To do this, open the dropdown menu ‘Select field’ underneath the Filter Field.

Exclude Filters

When you click Hostname, you will need to enter the hostname you want to include, i.e. www\.yourdomain\.com.

The backslashes are used because the content of the field must be written in Regular Expression (RegEx) form. The dot is a special character in RegEx (it matches any single character) and must be preceded by a backslash in order to tell Google Analytics that it is a literal dot, not a wildcard. A good guide for using RegEx can be found here.
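You can try the effect of the backslash with any RegEx tool. The sketch below uses Python’s re module as a stand-in, on the assumption that Google Analytics treats these basic patterns the same way and accepts a match anywhere in the hostname. The test hostnames are made up.

```python
import re

ESCAPED = r"www\.yourdomain\.com"      # dots are literal dots
UNESCAPED = "www.yourdomain.com"       # dots match any single character
ANCHORED = r"^www\.yourdomain\.com$"   # must match the whole hostname


def matches(pattern: str, hostname: str) -> bool:
    """True when the pattern is found anywhere in the hostname."""
    return re.search(pattern, hostname) is not None


print(matches(ESCAPED, "www.yourdomain.com"))         # True
print(matches(ESCAPED, "wwwXyourdomainYcom"))         # False
print(matches(UNESCAPED, "wwwXyourdomainYcom"))       # True: the dot is a wildcard
print(matches(ESCAPED, "www.yourdomain.com.spam"))    # True: partial match
print(matches(ANCHORED, "www.yourdomain.com.spam"))   # False
```

The last two lines show why you may want to add ^ and $ around the pattern: without them, any hostname that merely contains yours would pass the filter.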

Now your filter should look like this:

Google Analytics Spam - Filter Name

Next, click the ‘Verify this filter’ button, located at the bottom of the filter screen.

The result should be the complete exclusion of all the reports except the ones coming from www.yourdomain.com.

Of course, once you verify that the filter works, just save it and it will start doing its job. Keep it in the Test View for a week or so to ensure you will not compromise your data, and then apply the filter to your Main View.

To do this, just open the Main (or Master, depending on your naming convention) view in the Admin tab of your website analytics and go to Filters.

Google Analytics Spam - Add Filter to View

Click on ‘Apply existing Filter’ and select the name of the filter in the dialog box that appears.

Google Analytics spam conclusion

Maintaining clean data in your Google Analytics account is a critical task, the importance of which cannot be overemphasized. The decision-making process in conversion optimization depends on the integrity of this quantitative data. If it is compromised, it will at best slow down the process of analysis, as you struggle to ignore the fake data points.

At worst, it can lead you to wrong conclusions and cause you to postulate hypotheses based on ‘ghost’ traffic, ignoring the real data points that would actually mean something to your real-world visitors.

The instructions above should rid you of about 95% of Spam traffic. (What remains you can eliminate using segments.)

Eventually, Google will probably work out a solution to eliminate the problem altogether, but until then, we’re on our own.

Your Google Analytics filter checklist

  1. You need to have admin permission to add filters to an account.
  2. Go to the ‘View’ column in the Admin tab and select ‘Filters’.
  3. To add a new filter, click on the red ‘Add New Filter’ button.
  4. Select either the predefined filters or use custom ones, depending on what you need. To filter out certain domains or hostnames, predefined filters will suffice, but to filter on more complex attributes, such as visitors from certain countries or from certain referral channels, you need to use custom filters.
  5. Always use the test view to introduce filters for the first time. Otherwise you may lose data you did not want to filter out.
  6. Filters have other uses besides filtering Spam and internal traffic. For example, if your website sells products only to certain countries, you can use filters to include only users from those countries and get a realistic view of your conversions, based on your target audience.
  7. Use filters with care. Always know exactly what you are trying to achieve.

And to conclude this article, let’s see what the filtered reports look like for our guinea pig site:

Google Analytics Spam - Filtered Reports Conclusion

See also our Google Analytics introductory series.


Published by

Edin is a Senior CRO Consultant. Edin is into Google Analytics and testing (any A/B testing tool really) and likes to write about it. You can follow Edin on Twitter.