Data is all the rage these days. But whether it’s real-time or big, there is one consistent theme: data must be clean to be valuable.
And that’s why Google Analytics referral spam is such a big deal – because it poisons the well of your web analytics data, leaving it tainted and untrustworthy.
Let’s examine why this fast-growing trend is so detrimental to your business, and how you can keep it from showing up in your analytics reports.
What is Google Analytics spam?
There are so many valuable articles already written about the various forms of Google Analytics spam out there, and I’d definitely advise checking them out. But in case you’re in a hurry or just aren’t interested, there are two types of referral spam:
- Ghost Referrals – A website uses your Google Analytics tracking ID (the UA-ID) to send false data to Google. They never even visit your website!
- Spam Bots – Unlike ghost referrals, these bots actually do visit your website, albeit briefly.
Why would they do that?
This isn’t some sadistic high school prankster having a laugh at all the mischief he’s created. No, these are intentional, sophisticated efforts with one (or more) of several intended outcomes:
- Sell you something. By putting their name into your referral list, they are hoping you will be tricked into visiting their website. Here, they may try to sell you services – common referral spammer semalt.com is in this boat. Or, they may have more sinister intentions…
- Malware placement. Similar to gaining traffic, this spammer wants to get you to his website. But once you’re there he will attempt to place malware on your machine. This could take the form of a keylogger for stealing financial information, or including your computer as part of a large botnet. Whatever the case, this means bad news for you if you visit.
- Send you to another site. Affiliate marketing is where retailers pay a small commission if another site sends them a purchasing customer. To this end, some spammers are hoping that you will click one of the thousands of links on their site and buy something, earning them a kickback. Is it likely? No. But it doesn’t take much effort on their part to generate traffic this way, and they’re playing a numbers game.
Why is Google Analytics spam a big deal?
While this referral spam may not be harmful in and of itself, the biggest impact it has is corrupting your Google Analytics data. The degree to which your data is tainted will depend on the amount of traffic your site sees, as well as the amount of referral spam.
If your Google Analytics install is being plagued by referral spam, expect to see bad data in the following places:
- All total/composite metrics (visits, bounce rate, pages viewed, new vs. returning, etc.)
- Referral Sources
- Site Speed
If you know you have bad site speed data, how do you know what steps you need to take to fix it? What decisions can you make about site effectiveness if all your engagement metrics are flawed? How can you send an automated campaign report to your CEO knowing that it’ll show a line item for “Get-Free-Traffic-Now”?
You can’t. The most powerful information you have about your largest digital asset (your website) has just been rendered flawed. It’s like putting a smiley face sticker on the Mona Lisa.
What can I do about Google Analytics referral spam?
There is no “one-size-fits-all” solution for removing spam from your web analytics. However, implementing these steps will solve the issue for most websites with the least amount of ongoing hassle.
Always keep a clean or “raw” view.
The standard setup in Google Analytics is that each Property starts with a View called “All Web Site Data.” Don’t mess with that one, just leave it alone. No filters, no limits (feel free to add goals and site search, though). By doing this, if something goes terribly wrong with your filtering, you always have a clean view to fall back on for verification.
You can have up to 25 views for a Property, such as one just for internal traffic, or one for only traffic from a certain country. For our purposes, we recommend creating one called “Filtered” or “All Data, Filtered” so it is easier to remember later.
Add a Hostname filter.
Since ghost referrals never land on your site, they don’t send a valid hostname to Google Analytics. This makes filtering them relatively easy. Check out your Hostname report for the past 12 months and see what common hostnames have been logged. Choose the ones that fit your specific usage. These will form your filter.
Once you have identified your hostnames, go to the Admin panel and choose your Filtered view (see step above). Create a new filter called “Ghost Spam – Hostname” or something else you’ll remember. I’m personally a fan of “Pac-Man” because it’s killing ghosts! Okay, that was terrible. Moving on…
Set your filter to Include the field Hostname, and then insert acceptable hostnames from your report. Remember that this is RegEx, so precede dots with backslashes. In most cases, your filter will look like “mysite\.com|mysite\.dev”. By including the domain, all subdomains are covered.
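Before saving the filter, it can help to try the pattern against the hostnames from your report. The sketch below is only an approximation: it uses Python's re module rather than Google Analytics itself, it assumes the filter behaves as a partial match, and the sample hostnames are made up for illustration. The pattern is the example one with both dots escaped.

```python
import re

# Hypothetical include pattern; replace mysite.com / mysite.dev with your own domains.
INCLUDE_HOSTNAMES = re.compile(r"mysite\.com|mysite\.dev", re.IGNORECASE)

def passes_hostname_filter(hostname: str) -> bool:
    """True if the hostname contains one of the allowed domains (partial match)."""
    return INCLUDE_HOSTNAMES.search(hostname) is not None

# Made-up hostnames of the kind a Hostname report might list.
sample_hostnames = [
    "www.mysite.com",
    "blog.mysite.dev",
    "(not set)",
    "free-traffic.example",
    "mysitexdev.example",  # would slip through if the dot in "mysite.dev" were unescaped
]

kept = [h for h in sample_hostnames if passes_hostname_filter(h)]
print(kept)
```

Because the match is partial, subdomains such as www and blog pass without being listed separately. The last sample shows why the backslashes matter: an unescaped dot matches any character, so a lookalike hostname could get through.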
Create a filter for known spammers.
Since the Hostname filter includes only your website, you will still need to exclude the bots that actually visit your domain. The trick to this filter is that you want to exclude the field Campaign Source. This seems to work better and more consistently than Referral.
There are two approaches that I have seen employed:
- Enter each spam website exactly as it appears on your referral list, starting with a seed list (com has a good one to start with). This approach is very thorough, but also time-consuming to set up and maintain, as you will need to add new offenders on a regular basis, such as weekly.
- Enter common domain words to eliminate the majority of spammers, and add the rest as they appear. This has a higher chance of filtering legitimate traffic, but also requires less maintenance. An example might look like “porn|buttons|free|seo|money”.
Several of our clients use the second option, and it seems to work very well. If you’re going to use this approach, I’d recommend exporting all referral sources for the past 12 months to Excel, and using that to build your domain filter list. Everybody’s will look a little different – for example, it’s not uncommon for us to get a legitimate link from a site with “seo” in the domain, but a website about cosmetics could pretty safely filter on that term.
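One way to vet a keyword list before applying it is to run it over your exported referral sources and read through what would be excluded. This is a rough sketch in Python, not something Google Analytics provides. The pattern is the example list from the second approach, and every source name except semalt.com is invented for illustration.

```python
import re

# Example keyword pattern from the second approach; adjust to your own site.
SPAM_KEYWORDS = re.compile(r"porn|buttons|free|seo|money", re.IGNORECASE)

def split_sources(sources):
    """Split sources into (kept, excluded) using a partial keyword match."""
    kept, excluded = [], []
    for source in sources:
        (excluded if SPAM_KEYWORDS.search(source) else kept).append(source)
    return kept, excluded

# Hypothetical export of referral sources for the past 12 months.
referral_sources = [
    "google",
    "free-buttons.example",
    "semalt.com",
    "seo-news.example",   # might be a legitimate link for a marketing agency
    "partner.example",
]

kept, excluded = split_sources(referral_sources)
print("excluded:", excluded)
print("kept:", kept)
```

Reviewing the excluded list shows whether a term like “seo” would drop traffic you care about. Reviewing the kept list shows spammers that contain none of the keywords (semalt.com here), which still need to be added by name.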
Filter countries that aren’t relevant.
Depending on your business, you may have a targeted geographic approach. Maybe you can only work with companies in the United States, or North America, or just not anyone in Australia. Based on these limitations, you can employ country exclusion filters to not see traffic that isn’t important to your business. Some common countries that drive referral spam are Russia, Brazil, India and China (aka BRIC).
On a side note, this is a good practice even if you aren’t trying to filter Google Analytics referral spam, because it allows you to focus on how the visitors you care about most are performing. Obviously, don’t filter out countries that you have a legitimate interest in.
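To get a feel for how much traffic a country exclusion would remove, you could test the pattern against exported session counts first. The sketch below is illustrative only: the session numbers are invented, the country list is simply the four named above, and Python's re module stands in for the filter's own pattern matching.

```python
import re

# Countries to exclude; taken from the examples above, adjust to your business.
EXCLUDED_COUNTRIES = ["Russia", "Brazil", "India", "China"]

# Anchor the pattern so "India" cannot match a longer country name by accident.
COUNTRY_PATTERN = "^(" + "|".join(EXCLUDED_COUNTRIES) + ")$"
country_re = re.compile(COUNTRY_PATTERN)

# Hypothetical sessions per country from an exported report.
sessions_by_country = {
    "United States": 1200,
    "Russia": 340,
    "Canada": 150,
    "India": 95,
}

remaining = {c: n for c, n in sessions_by_country.items() if not country_re.search(c)}
removed_sessions = sum(sessions_by_country.values()) - sum(remaining.values())

print(COUNTRY_PATTERN)
print(remaining, removed_sessions)
```

The plain names here contain no special RegEx characters. A name that does would need escaping, just like the dots in the hostname filter.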
Will this solve all of my problems?
This is by no means an exhaustive list of techniques that you can employ to stop Analytics spam. Spammers are creative, constantly coming up with new ways to bypass our protections, almost as soon as we get them set up. In six months, they may have something completely different to deploy.
However, the majority of our clients have seen success by implementing these filters. I encourage you to try these out, and regain confidence in your data.
Also, if you need some help getting these changes in place, or just want a fresh set of eyes on your web analytics, we’d be glad to help. Just drop us a line!