Small businesses rely on Google Analytics to help them understand how well their online marketing efforts are working. But this resource is being crippled by opportunistic spammers. Here’s what you need to know to protect this important data, and improve your marketing intelligence.
Millions of small businesses rely on Google Analytics as a measuring tool for their online marketing success. When the graph line representing their number of site visitors goes up, they’re doing something right. When the line moves down, they start to worry.
So, imagine the confusion when the line starts to look like a punk hairdo, with radical spikes of increased traffic that seem completely unrelated to any actual online marketing efforts.
Introduction to Google Analytics Referrer Spam
Those with a bit of knowledge will take a more detailed look at the data, and discover that these big traffic spikes are fueled by sites with names like forum.darodar.com, buttons-for-website.com, and semalt.com. (Important: DON’T visit those websites! You have been warned.)
What Is The Goal of Referral Spammers?
If you’re like most Google Analytics users, your first instinct may be to trust the data and visit those websites to find out why they’re linking to you. However, if you’re one of those who did click on a link in your list of referral sources (listed next to names you recognize), you did exactly what some spammer lurking in Russia, Brazil, or Uzbekistan wanted you to do. Thanks to you, they just got a little richer. Hopefully, you didn’t pick up an online social disease (malware or virus) by visiting those links.
These fake referrals are called Google Analytics referral spam (yup, just like email spam). Not only do they lead you to a potentially malware-infected website, they can also completely ruin Google Analytics as a reliable measuring tool.
Small Businesses Are The Primary Target
Here’s the temptation: Just as Microsoft Windows, with its oodles of users, offers a big, juicy target for virus-building hackers, Google Analytics has a LOT of users, the vast majority of whom are small businesses. So, if you’re into spamming, why not find a way to exploit the way Google Analytics works and feed it false traffic reports?
Spammers can assume that many users of GA (e.g. small businesses) will:
- wonder what’s happening
- investigate by clicking on links planted inside GA
- come to websites where spammers can infect them or collect affiliate marketing income
Is Referral Spam New?
Referral spam has been around for a long time but, until recently, it was clumsy and easy for hosting providers and analytics services like Google Analytics to block. However, it’s becoming increasingly frequent and sophisticated. All indications are that this trend will continue to get worse.
What’s Led To The Increase In Referral Spam?
Ironically, it’s probable that Google Analytics itself made this type of spam easier to produce. Not too long ago, GA announced its Measurement Protocol, which was intended to help web developers connect external systems (such as point-of-sale systems and other internet-connected devices and services of many types) to GA. The intent was to create a clearer picture of consumer behavior. Google documented how to formulate requests so these devices could send information that would be accepted and integrated with GA.
The Law of Unintended Consequences strikes again. What would have been difficult for opportunistic spammers has now become a snap.
It’s a flaw that — until Google comes up with a fix — has effectively made all GA data subject to abuse and fraud.
How Does Google Analytics Referrer Spam Work?
To clarify how this all works, here’s an illustration that nicely captures how some of these false reports are generated and disguised.
Image used with permission, courtesy Analytics-Toolkit.com
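To make the mechanism a little more concrete, here is a minimal Python sketch of what a single Measurement Protocol "hit" looks like. It only builds the request body and does not send anything. The parameter names (v, tid, cid, t, dh, dp, dr) are the ones Google documents for the Universal Analytics version of the protocol; every value shown, including the tracking ID and both domain names, is a placeholder invented for illustration.

```python
from urllib.parse import urlencode

# One pageview "hit" expressed as Measurement Protocol key-value pairs.
# Every value here is a made-up placeholder, not real account data.
hit = {
    "v": "1",                # protocol version
    "tid": "UA-XXXXXXX-1",   # tracking ID (placeholder, not a real property)
    "cid": "555",            # client ID: any string the sender chooses
    "t": "pageview",         # hit type
    "dh": "example.com",     # hostname, exactly as claimed by the sender
    "dp": "/",               # page path, exactly as claimed by the sender
    "dr": "http://spam-referrer.example/",  # referrer, exactly as claimed by the sender
}

# The hit travels as an ordinary URL-encoded form body.
payload = urlencode(hit)
print(payload)
```

The point to notice is that the hostname and referrer are simply whatever the sender types in. Nothing in the hit proves that a real person ever visited a real page, which is why these "ghost" visits can appear in reports without touching your website.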
What Referrer Spam Means to Small Businesses
A few dozen fake clicks here, a few dozen there — certainly no big user of Google’s paid version of Analytics will notice, as the phoney numbers are lost in a massive flow of legitimate clicks, packets of information that Google Analytics slurps up like a whale consuming krill.
But, to a small business, a few dozen unusual visits may make a graph look very intriguing! You might ask yourself, did some website write a good review about me? I’ve seen some GA accounts in which more than 50% of the apparent traffic comes from fake sources.
These fake “visits” show up as referrals, i.e., as traffic sent to your website from another website, typically from a displayed link. And link traffic, as we’ve been taught, is mostly a good thing when it comes from credible sources like magazines, news organizations, and other trusted websites.
Link traffic can be bad, we’re also told, if it comes from known spam sources, such as the millions of low-value websites set up by or for opportunistic SEO “experts.” Google has made algorithm changes to squash these human cockroaches. (Sorry if you’re fond of cockroaches, but I’m not, and I have a real disgust for these particular SEO tactics.) And Google is now penalizing websites that abuse the link=trust formula.
So, it follows that ignoring those funky referral sites might be a bad decision. What if those are real hits on your website that are coming from spammy websites? How can you tell? How can you tell what’s really going on with your web traffic, if 25% (or more) of it comes from imaginary visitors? Might Google penalize us and rank our website lower? Doesn’t this mess up all the work we’re trying to do to build audiences and use measurement to make decisions?
There Is Something You Can Do
The good news is there are ways to weed your Google Analytics digital data-garden and restore the validity of the measurements you find there.
The bad news is that, just like in your real-life garden, the weeding takes both some work and some specific knowledge (which I’ll offer below). The really bad news is that, until GA makes some big changes, this problem will continue to evolve and could get much worse, so it will require ongoing attention.
Cleaning Up Your Google Analytics
Cleaning up referral spam is unavoidably a technical topic, but I’ll do my best to demystify it.
I may sometimes call “referral spam” “ghost referral spam.” For the most part, those terms mean the same thing, but referral spam is the overall problem, while ghost referral spam is a subset of the problem.
It’s called “ghost” because it exists only in the world of information: the fake visits are reported straight to the analytics service and never actually touch the physical world, meaning the web server with your site aboard.
In other words, ghost referral spam exists only inside Google Analytics or other online analytics tools, all of which work in a similar way.
Solutions From The Experts
Fortunately, there are experts out there who have invested the time and brain power needed to deeply understand how Google Analytics works, and who have developed some good advice on how to respond to these spam threats.
Michael Sullivan is one such expert. Through his web business, AnalyticsEdge.com, he’s recently posted two articles that are a great resource helping us know what to do. For example, Sullivan helps people like us distinguish fake from real, and configure our Analytics accounts to properly filter out the fakes. The best place to start is his Definitive Guide to Removing Referral Spam. He’s also done a follow-up that helps even more, a post entitled, Segment to Eliminate Spam Referrals. Do yourself a favor and read them.
In the meantime, if you’d like a CliffsNotes version, here’s a set of steps from the Definitive Guide article that will give you reports that show only real, valid data.
1) If you’re not worried about the past, create a new View
This step will help most if you’re setting up a brand new website or a new Analytics account.
The really short version of this approach is to set up a filter in your account that includes only traffic whose Hostname is one of your own. Hostnames are the names of websites you control or that are legitimately using your Analytics tracking number. That would include sites like www.yourwebsite.com, blog.yourwebsite.com, store.yourwebsite.com, and sites of third-party services performing functions you want to track.
The article guides you through creating the list of valid hostnames and building the filter. This works well because, for now, you might not have to keep track of or worry about what’s going on in the “Wild, Wild West.” Every few days, it seems, ghost spam pops up from a different website.
An alternative to Sullivan’s approach is pointed out by another GA expert, Georgi Georgiev of Analytics-Toolkit.com. Georgiev’s premise is that Hostname information can also be faked, and that this appears to be happening to some GA users.
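As a rough illustration of the valid-hostname idea, the Python sketch below turns a list of hostnames into a single regular expression of the kind an include filter uses, then checks a few hostnames against it. The hostnames are the example names from this article, and the function names are my own inventions for illustration; this is not Sullivan’s exact filter, so follow his guide for the real setup.

```python
import re

# Hypothetical list: replace with the hostnames that legitimately
# carry your tracking code.
VALID_HOSTNAMES = [
    "www.yourwebsite.com",
    "blog.yourwebsite.com",
    "store.yourwebsite.com",
]

def build_hostname_pattern(hostnames):
    """Return a regex string that matches any listed hostname exactly."""
    escaped = [re.escape(name) for name in hostnames]  # "." becomes "\."
    return "^(" + "|".join(escaped) + ")$"

pattern = build_hostname_pattern(VALID_HOSTNAMES)
hostname_filter = re.compile(pattern, re.IGNORECASE)

def is_valid_hit(hostname):
    """True when the reported hostname is one of our own."""
    return hostname_filter.match(hostname) is not None

print(pattern)
print(is_valid_hit("blog.yourwebsite.com"))
print(is_valid_hit("(not set)"))
```

Anchoring the pattern with ^ and $ is a deliberate choice: without the anchors, a made-up hostname that merely contains one of your names would slip through.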
In my experience, filtering hostnames does seem to produce clean-looking reports for me and the sites we manage. At least for now. And so does Georgiev’s solution of filtering based on Campaign Source. His article is also worth reading.
2) If the past matters, create a Segment
Segments are used to select sessions that match a specific set of criteria. They can be used to eliminate referral spam sources to give you “clean” metrics for reporting. This article shows how a single segment can include valid hostnames (to remove ghost referral traffic) and to exclude spam referrals created by crawlers.
– Analytics Edge
By adding new Segments to your view of your data, you can use multiple filters to both add and exclude traffic. You can begin by adding “All Traffic” to the view produced by filtering for valid hostnames. Then, filters are added that exclude the remaining real-but-computer-generated crawler and bot traffic. That traffic is identified by referral domain information. Because new crawlers and bots appear regularly, this filter will require updating over time, though not as often as filters for the pure “ghost” traffic.
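The two-part logic of such a segment (keep valid hostnames, then drop known crawler referrers) can be sketched in Python as below. This is only a model of the idea applied to a hand-made list of sessions, not how Google Analytics implements segments. The Session class, the function names, and the sample data are all hypothetical; the spam domains are the ones named earlier in this article.

```python
from dataclasses import dataclass

@dataclass
class Session:
    hostname: str  # hostname recorded with the visit
    source: str    # referring domain reported for the visit

# Hypothetical lists: yours will differ and will need updating over time.
VALID_HOSTNAMES = {"www.yourwebsite.com", "blog.yourwebsite.com"}
SPAM_SOURCES = {"semalt.com", "buttons-for-website.com", "darodar.com"}

def is_spam_source(source):
    """True when the source is a listed spam domain or one of its subdomains."""
    source = source.lower()
    return any(source == d or source.endswith("." + d) for d in SPAM_SOURCES)

def clean_segment(sessions):
    """Keep sessions with a valid hostname that did not come from a spam source."""
    return [
        s for s in sessions
        if s.hostname.lower() in VALID_HOSTNAMES and not is_spam_source(s.source)
    ]

sessions = [
    Session("www.yourwebsite.com", "google"),             # real visit
    Session("(not set)", "forum.darodar.com"),            # ghost spam: bad hostname
    Session("www.yourwebsite.com", "semalt.com"),         # crawler spam: real hostname
]
for s in clean_segment(sessions):
    print(s)
```

The third sample session shows why both conditions are needed: a crawler that really visits your site reports a valid hostname, so only the source check catches it.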
3) And if those links are real, then what?
The good news is that, for now, the vast majority of these troubling inbound links and traffic aren’t real. For now, they’re annoying, and they interfere with knowing whether or not your marketing efforts are working, but that’s about it. You can clean the digital graffiti off your GA data. For now, they aren’t going to hurt your search ranking. For now, they’re not a “huge problem.”
If you do find that you have truly questionable inbound links, you can first attempt to block that traffic from ever getting to your site. That’s a whole ‘nother story, which I’ll address in a future post. But, in the meantime, you can learn more about banning that traffic by reading up on editing your .htaccess file. WordPress users can access several different plugins to help, including WP-Ban and SpamReferrerBlock. They do slightly different things, but both can help eliminate traffic you don’t want.
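The .htaccess approach and the plugins above work at the web server or WordPress level. For readers who want to see the underlying check, here is the same idea as a small Python sketch: look at the Referer header a visitor’s browser sends and decide whether it comes from a blocked domain. The function name and the blocklist are hypothetical, and this is not what either plugin does internally. Note that it only affects real requests that reach your server; it does nothing about ghost hits, which never visit your site.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist, using domains named earlier in this article.
BLOCKED_REFERRER_DOMAINS = {"semalt.com", "buttons-for-website.com", "darodar.com"}

def is_blocked_referrer(referer_header):
    """True when the Referer header points at a blocked domain or its subdomain."""
    if not referer_header:
        return False  # no referrer sent: nothing to block on
    host = urlsplit(referer_header).hostname
    if host is None:
        return False  # header was not a full URL
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_REFERRER_DOMAINS
    )

print(is_blocked_referrer("http://forum.darodar.com/some/page"))
print(is_blocked_referrer("https://www.google.com/search?q=widgets"))
```

A site could call a check like this before serving a page and answer with a "403 Forbidden" when it returns True.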
The bad news is that we must all be vigilant, keep an eye on our web traffic, and update our blocked sites lists and our filters in GA.
A Huge Problem That Is Growing
Greed is driving our opponents, and it’s an insatiable hunger. Analytics-Toolkit raises a serious concern: that everything inside GA is now potentially suspect. For now, the attackers are focused on generating referral spam, but that doesn’t mean they’ll stop there. Because they CAN fake traffic, they almost certainly will, whenever there’s a financial motivation, from extremely subtle methods to something as crude as extortion.
The “huge problem” lies in the future. It won’t be long before it becomes a waste of time to use Google Analytics, unless Google makes fundamental changes to this great tool. It means it’s very likely that we’re all going to be updating our websites to enable a new, more secure analytics system, which will likely be more complicated than simply adding a tracking number on our website’s pages.
It’s always going to be necessary to keep on guard, to look for good advice, and to keep a reliable Ghostbusters’ phone number in your address book. Ours is at the bottom of the page.