I always find it interesting to understand the phenomena that affect me and to discover their root causes, be that in politics or in information overload (IO). As for IO, one path to explore is the motivation behind the production of information.
As you may know, there are clear incentives on the Web to rank well in search engines: good rankings drive traffic to your site, which in some cases generates ad revenue. I only recently realized that this sometimes involves generating massive amounts of content, and I found the details rather startling.
Some background on “Internet marketing”
Search Engine Optimization (SEO) is the work done to make a site stand out in search engine results, for obvious marketing purposes. Some people specialize in doing this. No, scratch that: there’s a whole industry centered around this.
Some SEO techniques are entirely ethical and simply make good content easier to find: they’re referred to as “white hat”. Other techniques, “black hat”, use devious ways to route people to less desirable content.
There are ways to benefit directly from search engine rankings without really creating any value in the process. One of those ways is the pathological MFA (Made For AdSense) site, AdSense/AdWords referring to Google ads.
These are the sites you end up on when you mistype a domain name and, bam, get a page that is essentially a fresco of Google ads with some teeny-weeny content lost in there somewhere. If someone manages to get his MFA page a good spot in Google results for a rather popular keyword, he has won the game and reaps a profit whenever wandering visitors click his ads. At least that’s one scenario.
The techniques of content generation
I knew about this, vaguely, but the other day I stumbled upon this article over at TheNextWeb concerning DataPresser. In a nutshell, this is a tool that allows you to generate content automatically, by following rules.
With DataPresser, for example, you can generate all sorts of variations on “Find cool wallpapers of ______”, where ______ is a blank filled from a database. Not only can you replace the blank, you can even change the way the sentence is worded, using synonyms or even alternative grammatical constructions. That’s to avoid being flagged as “duplicate content” by Google, which obviously tries to eliminate such sites from its results. As you can see, it’s a game of cat and mouse.
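To make the idea concrete, here is a rough sketch of how this kind of rule-based generation could work. It is not DataPresser's actual mechanism, which I haven't seen: the template, the synonym lists and the topic list are all made up for illustration, with the topic list standing in for the database.

```python
from itertools import product

# Hypothetical template: each slot lists interchangeable wordings.
TEMPLATE = [
    ["Find", "Discover"],
    ["cool", "awesome"],
    ["wallpapers of"],
]

# Hypothetical stand-in for the database that fills the blank.
TOPICS = ["cats", "sports cars"]


def generate_variations(template, topics):
    """Return every sentence obtainable from the synonym slots and topics."""
    sentences = []
    # product() enumerates every combination of one wording per slot.
    for words in product(*template):
        for topic in topics:
            sentences.append(" ".join(words) + " " + topic)
    return sentences


if __name__ == "__main__":
    for sentence in generate_variations(TEMPLATE, TOPICS):
        print(sentence)
```

Even this toy version turns two synonym slots and two topics into eight distinct sentences. The count multiplies with every extra slot or database row, which is how thousands of pages can come out of a handful of rules.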
The general keyword for this activity is “Content Generation”. There are many techniques and tools. Some will simply generate new pages by republishing RSS feeds found elsewhere, with ads slapped around them. Others will accumulate tons of text and mix bits of sentences from here and there to create text that doesn’t make sense but appears correct to search engines. You can even buy whole databases of content, say game cheat codes, rather cheaply.
Why mass-produce?
Obviously, it’s profitable to mass-produce text to cover many topics, and therefore many keywords. But there’s another reason: search engines give more credit to sites with links pointing to them. That’s why some content generation involves the creation of many sites linking to each other (the whole thing is called a link farm). Another “link-building” technique targets message boards and blog comments, with posts being made solely to create links that add to a site’s search-engine karma.
Conclusion
I think this phenomenon goes a long way toward explaining the huge “size” of the Web (its number of pages). A very simple technique to generate more revenue for a given site, for example, is to split an article into multiple pages so more ads can be displayed. But with these MFA sites, we’re talking about generating thousands of pages at the click of a button!
In the end, it all boils down to spam and background noise. Whatever the service, if it can generate buzz, chances are it’ll be exploited.

