Online privacy: an oxymoron

There’s plenty of debate in the industry these days about online privacy. There’s even a federal inquiry into “Information Privacy and Innovation in the Internet Economy” being conducted by the US Department of Commerce.  Pending legislation proposes to regulate the online advertising industry in order to protect consumer privacy.  It’s getting crazy in D.C.

Lots of the noise in the blogosphere centers on Facebook’s privacy policies, which some people find to be confusing.  But whether the default settings are too open, or the policies themselves are confusing to some users, the fact is that Facebook’s privacy policy is published.  Facebook makes controls available for every account holder to determine their own privacy settings.  It’s up to the users to take control of their own account and determine how public they want to be.  It’s hard to fault Facebook for allowing people to share as much information as they want.  As B.J. Novak said at the Webby Awards, Facebook users are obviously concerned their personal information “will somehow wind up all over the g.d. Internet.  That’s the last thing Facebook users signed up for.  Also the first thing.”

The more interesting privacy debate is the one the Commerce Department is looking into, and that is more about how much anonymous web surfing is really anonymous.  All kinds of online targeting are already widely in use: contextual targeting, behavioral targeting, geo-targeting, re-targeting, etc.  The optimistic view of these technologies is that they attempt to make advertising more useful and productive for visitors by showing them only ads that they’d actually be interested in seeing.  The pessimistic view is that anonymous activity on the web should be anonymous and there’s a slippery slope from serving behaviorally targeted ads to disclosing sensitive personal information.  Google CEO Eric Schmidt took a lot of heat for famously saying that “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.”  This is a reasonable stance if we’re talking about terrorists researching bomb-building techniques, but totally outrageous if we’re talking about patients researching new treatments for HIV.  Some anonymous activity really should be anonymous, and if it’s not possible to provide anonymity, then the industry must at least provide privacy.

The key point in that last sentence is that THE INDUSTRY should provide privacy.  The 4A’s, the IAB, the DMA, and the Association of National Advertisers all agree that the appropriate approach to address consumer online privacy is through industry self-regulation and education.  And they said so in a letter to the US Dept. of Commerce in response to the aforementioned inquiry.  To date the industry has done an “ok” job of policing itself and delivering acceptable levels of privacy to consumers.  But if we don’t step up our efforts, regulation is coming.  And that would make it harder for advertisers, agencies, and vendors to innovate, and it would definitely slow the pace of online advertising growth – the exact opposite of the desired outcome.

Maybe Scott McNealy, former CEO of Sun Microsystems, said it best more than 10 years ago when he said “You have zero privacy anyway.  Get over it.”

Posted by Steve OBrien on June 17th, 2010 No Comments

Search v. Display? … or Search+Display?

Next week many of us in the online advertising industry will gather for ad:tech in San Francisco.  It’ll be interesting to see how many new DSPs are announced, or how many ad networks and agencies will announce that they’re now DSPs.  Rather than repeat my last blog/rant, I’ll defer to MikeonAds as the authoritative source for defining what makes a real DSP, and why anyone should care.

We’ll be announcing some new offerings and I’ll be interested to see the continued blurring of distinction between search and display advertising.  Most of the effort/activity so far has been about using search terms to create audience segments for targeting (people who look for “mortgage refinance” are a pretty well-defined audience).  Or even more basic, visitors who ended up on your site through search make great candidates for re-targeting.  So using data from search campaigns, either internally generated or obtained from a third party, can help with targeting in display campaigns.  And the DSPs are making it easier than ever for display advertising campaigns to be targeted and cost-effective.

Of course, at some point soon there’s going to be a lot more overlap.  The platforms being used to manage display campaigns, whether DSPs or the exchanges themselves, bear a striking resemblance to the campaign management/optimization platforms that have been used to run search campaigns for years.  And since the largest ad exchanges are run by the largest search providers (Google, Yahoo!), it seems likely that features for complementary campaigns will emerge.  Followed closely by blended campaigns.  And then maybe some day the distinction between search and display will be so small as to be unnoticeable.  But we’re getting ahead of ourselves…

Posted by Steve OBrien on April 16th, 2010 No Comments

DSPs, SSPs, and Other TLAs

The entire online advertising industry is going through sweeping changes and it’s fascinating to watch.  We’ve been focused primarily on performance-based advertising (CPC) for years, preventing fraud and improving traffic quality for advertisers by working with performance-based ad networks.  Working with an ad network certified by Click Forensics provides advertisers with a level of transparency and accountability.  They can feel confident they’re getting what they pay for, and not paying for a bunch of invalid traffic or fraudulent clicks.

Until recently, the issues in the display advertising space were very different.  Sure, advertisers like to measure clicks (as in click-through rate), but since they aren’t paying by the click, there’s really no such thing as “click fraud” in a display campaign.  Advertisers worried more about reach and frequency and audience and had no concerns about “traffic quality.”  But the industry is changing.

Advertisers, or their agencies, no longer negotiate directly with publishers for ad inventory, unless we’re talking about premium placements like the home page of CNN.com.  No, the large majority of the inventory available on the Web is non-premium and is purchased through an intermediary.   Ad exchanges like RightMedia or DoubleClick Ad Exchange provide a clearinghouse for inventory and allow buyers and sellers to connect, benefiting both.  New services like Demand Side Platforms (DSPs) provide buyers with new alternatives for targeting specific audiences and provide real-time bidding features that allow advertisers to connect with their audiences at the lowest possible price.  Publishers, of course, would prefer to receive the highest possible price, so Yield Management Platforms (or Supply-Side Platforms, SSPs) have emerged to help them sell their valuable audiences to the highest bidder.
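For readers who think better in code, here is a minimal sketch of the buyer/seller tension described above: several buyers bid on one impression, the publisher sets a floor, and the highest eligible bid wins. The bidder names, the floor price, and the rule that the winner pays its own bid are all assumptions made for illustration. Pricing rules vary by exchange, and this is not how any particular exchange, DSP, or SSP works.

```python
def run_auction(bids, floor_price=0.0):
    """Pick the highest bid at or above the publisher's floor price.

    bids maps a (hypothetical) bidder name to its CPM bid in dollars.
    Returns (winner, price) or None when no bid clears the floor.
    """
    # Drop any bid below the publisher's floor.
    eligible = {bidder: price for bidder, price in bids.items() if price >= floor_price}
    if not eligible:
        return None
    # The highest remaining bid wins; in this sketch the winner pays its own bid.
    winner = max(eligible, key=eligible.get)
    return winner, eligible[winner]


if __name__ == "__main__":
    example_bids = {"dsp_a": 1.20, "dsp_b": 2.05, "dsp_c": 0.40}
    print(run_auction(example_bids, floor_price=0.50))
```

The floor is the publisher's lever and the bid is the buyer's, which is the whole DSP versus SSP dynamic in miniature.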

So what does all this mean for the advertiser?  It means that by the time they pay their invoice for the 10 million impressions delivered last month, they have very little idea about how those impressions were delivered, to whom, and in what context.  They really have no way of verifying that the impressions were actually seen by a human being (they could have been served “below the fold” in a browser or simply served to a botnet).  They have no way of knowing what appeared next to their ad, whether it was the mutual funds listing on a finance blog or some inappropriate user-generated content on a social networking site.  And they have no way of verifying whether the campaign reached the desired audience, or just some random web surfers in China.

We’ll be talking a lot more in the next few months about ad verification.  It’s an important topic, and we’re working on products to address these challenges.

Posted by Steve OBrien on March 10th, 2010 No Comments

Click Forensics, not just for clicks any more

Today we publicly revealed something that we’ve been working on around here for a while: Click Forensics for display advertising.  I know, what in the world does a click quality company know about display advertising?  We live in the world of CPC, so what do we know about CPM?

As the industry leader in click quality, we’ve been gathering data and scoring clicks for years.  Billions of clicks from hundreds of thousands of sites generated for thousands of advertisers across hundreds of ad networks.  Every single one of those clicks started out in life as an ad impression.  Someone saw the ad, someone clicked on the ad.  But before they clicked, it was only an ad impression.  So the kinds of data that we capture for clicks (CPC) are the same kinds of data required to analyze impressions (CPM).  The kind of machine learning, advanced clustering analysis, and data mining we do to analyze click quality can be applied just as successfully to impression quality.

But who cares about impression quality?  I mean, they sell those things by the thousands! So what if one or two are bad?  Well, we don’t care about one or two impressions, and neither do most advertisers.  But in a world where impressions are purchased by the millions from ad networks, demand side platforms, yield management platforms, and ad exchanges, we all care whether or not our ads are being served, and if they’re being seen, and if so by whom?  Auditing and certifying that ads are served appropriately is what we call Ad Verification, and we’ve written a white paper to explain what it is and why it’s important.

And it is important.  That’s why we’re working on products that address the issues and solve the problems for display advertisers, ad networks, and publishers.  And that’s why Click Forensics is not just for clicks any more.

Posted by Steve OBrien on February 26th, 2010 No Comments

Typosquatting? Yup, It’s Real. And a Real Problem.

Some excellent new research by two Harvard professors, Tyler Moore and Ben Edelman (“Measuring the Perpetrators and Funders of Typosquatting”), finally quantifies the size of the issue with some defensible methodology.  It also provides a concise little history of how “cybersquatting” in the early 1990s evolved into what we now call “typosquatting,” and how it was allowed to happen.  Typosquatting is a practice employed by some domainers (people who register domain names for a living and try to make money from them) by which they purchase common misspellings of popular domain names in hopes of monetizing the traffic generated by poor spellers or those of us with fat fingers.  Who hasn’t mistakenly tried to do a search at Goggle.com, or tried to log in at Facebokk?  Truth is, no matter how badly you butcher the spelling of a popular domain, it’s likely that someone has registered that misspelling as a .com and hopes you’ll click some of his ads.

So how big is the problem?  Well, there are millions (yes, millions) of domain names registered solely because they are popular misspellings of other domain names.  Through some impressive research, the professors found that for 3,264 popular domain names there are approximately 938,000 registered typo domains targeting variations in their spelling.  That’s an average of roughly 287 typo domains for every “legitimate” domain.  The most popular target?  Google.  Also in the top five most targeted are MySpace, FreeCreditReport.com, and Hotels.com.

Great research, but what’s the conclusion?  Well, the authors assert that since most typosquatting is monetized through pay-per-click ads, it’s incumbent upon the ad platforms to do something about it (they’re talking to you, Google and Yahoo!).  Not an unreasonable conclusion.  The problem is that the ad platforms actually make money off these made-for-ad sites, so their incentive to shut them down is not high.  Some of our customers have reported that the traffic generated from these typosquatters converts at a higher rate than traffic from other sources!  So advertisers aren’t necessarily pressuring the ad platforms to “fix” the problem.  We’ve found that the only time advertisers really get upset about typosquatting is when the squatter damages a trademark or a brand, or promotes a direct competitor.  That’s where we come in.

Posted by Steve OBrien on February 24th, 2010 No Comments

Q4 2009 Click Fraud Rate is Down. Or up. Depends.

Today we released our quarterly statistics regarding the rate of click fraud for Q4 2009, which came in at 15.3%.  We first began publishing industry data over four years ago, in 2006, which means we can now look at the trend for the same quarter over the past four years.  The fourth calendar quarter has traditionally been the annual high, and this year is no different.  15.3% is higher than any of the three previous quarters.  Like Willie Sutton, who robbed banks because “that’s where the money is,” fraudsters find the increased search traffic during the Q4 holiday season to be a prime opportunity for illicit gain.

What’s different this year is that the trend of click fraud increasing annually, which we’ve observed for the past three years, has stopped.  For the first time, the Q4 click fraud rate has declined from 2008 to 2009.  Given that Q4 2008 was the highest click fraud rate we’ve ever reported, this isn’t too surprising.  But it’s still good news for the industry.  Even as fraud schemes become increasingly sophisticated with the advent of spyware, malware, adware, and botnets, the industry’s efforts to thwart fraud and protect advertisers seem to be working.  By the way, when I say “the industry,” I’m including the major search engines themselves.  Google, Yahoo!, and Microsoft all have active traffic quality programs in place to keep one step ahead of these new sources and methods of fraud.

Unfortunately, not every ad network, publisher, and advertiser can afford to build a team of PhDs to constantly monitor and fight the problem.  That’s why we’re here.

Posted by Steve OBrien on January 19th, 2010 No Comments

A Graduate Level Course In Click Fraud

On Tuesday Harvard Business School professor Ben Edelman blogged about a new form of click fraud that may be almost as insidious as the Bahama Botnet discovered by Click Forensics last year.  Andy Greenberg did a wonderful job summarizing and translating Professor Edelman’s findings into layman’s terms in his Forbes.com article Google Faces The Slickest Click Fraud Yet.

This new fraud scheme is really a compilation of “Fraudster Greatest Hits,” but with a new twist.  It consists of spyware being installed on unsuspecting users’ machines and clicking on paid links to generate fees for the spyware author and intermediary ad networks, some of whom are complicit and most of whom are not.  Nothing new there.  The spyware that Prof. Edelman tracked, though, was smart enough to click on paid links for sites that the user is already visiting.  What a perfect way to disguise fraud as legitimate traffic!  A visitor to Finishline.com doesn’t notice that a pop-up browser was redirected to Finishline.com, because that’s where he intended to go in the first place.  Visitors browse, shop, and maybe even buy something (convert) at a perfectly normal rate.  The traffic looks completely legitimate to Finishline.com, and to Google.

So, is this it?  The perfect click fraud scheme that successfully foils all attempts at discovery and generates untold riches for the perpetrators?  Well, not quite.  First off, it was discovered.  Prof. Edelman’s blog has been written about on Forbes.com and his discovery will certainly garner some attention in Mountain View.  That’s good, because the spyware perpetrator, TrafficSolar, should be prevented from continuing this fraud.

But it was probably a fairly low-volume scheme to begin with.  It’s limited to users whose machines are infected with the spyware and who also visit select Google advertisers.  So some small percentage of the organic visitors to Finishline.com generated a click fee instead of visiting for free.  It’s a problem, but probably not a huge one.  What would make it more serious is if there were another version of the spyware that simply clicks on paid links in the background without the user’s knowledge (a la the Bahama Botnet).  By mixing the fraudulent clicks with the real end-user visitor behavior and conversions, a fraudster like TrafficSolar could give the impression of being 100% legitimate.

The concluding recommendation in Prof. Edelman’s report is for Google to fire InfoSpace, its ad syndication partner.  A better solution would be for Google and InfoSpace to deal only with reputable partners who provide verified, audited clicks to ensure advertisers get what they pay for.  Check our client list for some worthy candidates.

Posted by Steve OBrien on January 13th, 2010 9 Comments

Hard at Work with Yahoo! TQ Score Prediction

We’re constantly hard at work here at Click Forensics to improve our ability to accurately predict overall traffic quality for our clients.  And, every now and then, we’re able to bundle a number of these enhancements into a tangible release.  We did just that last week, announcing an upgraded version of our Yahoo! TQ Forecast feature.  We’ve been testing these features with a handful of clients, with strong results so far – namely, much better predictive accuracy so that clients can be sure they’re sending high quality/high paying traffic into Yahoo!.  And, we’re excited to now be rolling this out to all our clients.  Specific features include:

  • YTQ Forecast Report – provides a summary of the likely Yahoo! TQ scores particular traffic sources will receive when they’re sent to Yahoo!;
  • Dynamic Adjustments - continuously monitors and adjusts to changes in the YTQ score rankings so that clients can appropriately tune and filter traffic sources;
  • Preemptive Traffic Source Blocking – enables publishers and ad networks to quickly identify and block certain online advertising traffic sources that are likely to deliver low Yahoo! TQ scores; and
  • Enhanced Botnet Detection – delivers better detection of non-human clicks, both malicious and benign, while serving as an early warning system for advanced sources of fraud.

We also got some nice coverage in AdExchanger about the problems some of our CPC and performance-based ad network clients face and how these new enhancements will help solve these challenges.

Posted by Paul Pellman on December 18th, 2009 No Comments

Bahama Botnet Hurts Google, Too

While it’s easy to see how the recently discovered Bahama Botnet is cheating online advertisers out of free traffic and generating fraudulent fees for complicit parked domains and ad networks, it’s important to note that ad providers are being victimized as well.

We have conducted additional research into the behavior of the Bahama Botnet and found that it acts as a sort of perverted “Robin Hood” among ad networks by robbing ad revenue from the top-tier players and delivering fraudulent traffic to second- and third-tier ad networks and publishers.  Chief among the ad provider victims is the one with the biggest treasure to take: Google.

As we’ve seen in this video, when an infected user performs a search on Google.com, they get some peculiar results.  This is because, unbeknownst to the user, they’re not actually on Google.com.  The page looks like Google.com and even says Google.com in the browser’s address bar.  So how can it not be Google.com?  The perpetrators behind the Bahama Botnet are able to steal traffic and revenue from Google using a trick called “DNS poisoning.”

All computers on the internet identify themselves with a set of numbers that we know as an IP address.  Computers can find one another using these numbers.  However, humans find words easier to remember than long sets of numbers, so the Domain Name System (DNS) was devised to translate names into these numbers.  When “Google.com” is typed into a browser, the computer uses DNS to translate that domain name into a number.  In the case of Google.com, that number happens to be 74.125.155.99.  The DNS method for translating domain names into numbers is fundamental to making the internet work.
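The name-to-number translation described above is easy to see for yourself.  Here is a minimal sketch using Python’s standard socket module to ask the local machine’s resolver for the IPv4 addresses behind a name.  The helper name resolve_ipv4 is made up for this example, and the answer you get for any real domain will vary by time and location.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Needs a working network connection; the addresses printed will vary.
    print(resolve_ipv4("google.com"))
```

Note that this asks the same resolver the browser relies on, so on an infected machine it would return whatever answer the malware has arranged.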

However, in the case of the Bahama Botnet, this DNS translation method gets corrupted.  The Bahama Botnet malware causes the infected computer to mistranslate a domain name.  Instead of translating “Google.com” as 74.125.155.99, an infected computer will translate it as 64.86.17.56.  That number doesn’t represent any computer owned by Google.  Instead, it represents a computer located in Canada.  When a user with an infected machine performs a search on what they think is Google.com, the query actually goes to the Canadian computer, which pulls real search results directly from Google, fiddles with them a bit, and displays them to the searcher.  Now the searcher is looking at a page that looks exactly like the Google search results page, but it’s not.  A click on the apparently “organic” results will redirect as a paid click through several ad networks or parked domains, some complicit, some not.  Regardless, cost per click (CPC) fees are generated, advertisers pay, and click fraud has occurred.
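One way to picture the mistranslation is as a simple range check: does the address a machine resolved for a name fall inside a range the name’s owner is known to use?  The sketch below does that with Python’s standard ipaddress module.  The function name and the single trusted range (74.125.0.0/16) are assumptions for illustration.  A real check would need a complete, current list of the owner’s address ranges, and this is not a description of how we detect the botnet.

```python
import ipaddress


def looks_poisoned(resolved_ip, trusted_networks):
    """Return True when resolved_ip falls outside every trusted network.

    trusted_networks is a list of CIDR strings the domain owner is
    assumed to use (illustrative only, not an authoritative list).
    """
    addr = ipaddress.ip_address(resolved_ip)
    return not any(addr in ipaddress.ip_network(net) for net in trusted_networks)


if __name__ == "__main__":
    trusted = ["74.125.0.0/16"]  # assumed range for this example
    print(looks_poisoned("74.125.155.99", trusted))  # the address a clean machine saw
    print(looks_poisoned("64.86.17.56", trusted))    # the address an infected machine saw
```

With that assumed range, the first address passes and the second is flagged.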

An interesting side effect of this whole scheme is that while the perpetrators of the Bahama Botnet turn organic or natural search listings into paid links, they don’t seem to alter the final destination domains of the sponsored links that show up on a search results page.  When an infected user clicks on one of these sponsored links, they always seem to end up on the correct destination domain (so clicking a sponsored link for Dell.com, for example, will always take an infected user to dell.com).  However, due to the DNS poisoning, a click on a sponsored link will never go through Google’s own click-counting redirect.  Google never sees, and therefore never charges for, that click.   The advertiser gets a free click, instead of a paid one, and Google loses the revenue.  The Bahama Botnet strikes again.

Posted by Matt Graham on October 8th, 2009 1 Comment

Beware the “Bahama” Botnet

Just when you thought the fraudsters couldn’t get any more sophisticated … they surprise you.  Click Forensics researchers have recently discovered one of the most advanced sources of click fraud we’ve seen.  We’ve named it the “Bahama botnet” because when first discovered it was redirecting traffic through 200,000 parked domain sites located in the Bahamas.  It has since been reprogrammed to redirect through other intermediate sites hosted in Amsterdam, the U.K., and even San Jose, CA, but the Bahama name stuck.

Interestingly, the Bahama botnet appears to be closely related to the recent spate of “scareware” attacks, such as the one perpetrated against The New York Times digital site just a few days ago, reported by ComputerWorld.  Visitors to the NYTimes.com site were greeted with a pop-up informing them their computer was infected and directed to an authentic-looking site where they could install a program called Personal Antivirus.  Users duped into purchasing this phony software were then infected with a Trojan that gave control of their computer to an unknown third party that we now know to be part of a gang in Ukraine.

We believe the Bahama botnet is controlled by this same gang, or their neighbors down the street.  More info about the “Ukrainian fan club” can be found in Dancho Danchev’s excellent security blog.  We’re pretty sure the Bahama botnet is related to the Ukrainian fan club and the NYTimes.com scareware because they each phone back to a bogus “Windows protection” domain located on the same IP address.

These sources were originally identified by the Black Hat community, but we believe Click Forensics is the first to discover the breadth and depth of click fraud being perpetrated by the botnets they control.  And the botnet is incredibly insidious.

As seen in this video of the botnet in action, caught on film and narrated by Click Forensics’ own Matt Graham, the infected machine will exhibit some really funky behavior.  Clicks on organic search results are redirected through a series of parked domains across a number of top-tier ad providers (search engines and ad networks), eventually arriving at an advertiser unrelated to the original query.  The user is momentarily confused, but likely just performs the search again, this time with easy success.

What makes the botnet so insidious is that it operates intermittently so that the user doesn’t really know that anything is wrong.  Additionally, it can operate independently of the user because the authors appear to be building a large database of authentically user-generated search queries.  And because the queries come from many different machines (IPs) across a broad segment of the Internet population, it is very difficult to find and identify these clicks as fraudulent.  But these auto-generated clicks were not able to disguise themselves well enough to escape Click Forensics’ anomaly detection algorithms.  Additionally, large amounts of non-converting clicks were spotted in the data we receive from advertisers.  From there, our team was able to home in on the source of the Bahama botnet.
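To illustrate the “large amounts of non-converting clicks” signal mentioned above, here is a deliberately simple sketch that flags traffic sources whose conversion rate sits far below the overall rate.  The function name, the thresholds, and the sample numbers are all made up for this example.  It is a toy version of one signal and makes no claim about how our anomaly detection algorithms actually work.

```python
def flag_low_converting_sources(stats, min_clicks=1000, ratio=0.2):
    """Return the sources whose conversion rate is far below the overall rate.

    stats maps a source name to a (clicks, conversions) pair.
    A source is flagged when it has at least min_clicks clicks and its
    conversion rate is below ratio times the overall conversion rate.
    """
    total_clicks = sum(clicks for clicks, _ in stats.values())
    total_conversions = sum(conversions for _, conversions in stats.values())
    if total_clicks == 0:
        return []
    overall_rate = total_conversions / total_clicks
    flagged = []
    for source, (clicks, conversions) in stats.items():
        # Skip small sources: too few clicks to judge fairly.
        if clicks >= min_clicks and conversions / clicks < ratio * overall_rate:
            flagged.append(source)
    return sorted(flagged)


if __name__ == "__main__":
    sample = {
        "source_a": (5000, 100),
        "source_b": (4000, 90),
        "source_c": (3000, 1),  # lots of clicks, almost no conversions
        "source_d": (50, 0),    # too little traffic to judge
    }
    print(flag_low_converting_sources(sample))
```

A single threshold like this would be easy for a careful fraudster to beat, which is why a signal like this is only useful alongside others.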

Seemingly random clicks discovered through advanced pattern detection

Posted by Steve OBrien on September 17th, 2009 3 Comments