Structure in the flow
Information overload, learning and personal knowledge management

Posts Tagged ‘information overload’

For the curious: the incentive to mass-produce information

I always find it interesting to understand the phenomena that affect me and discover their root causes, be that in politics or information overload (IO). For IO, one path to explore is the motivation behind the production of information.

As you may know, there are clear incentives on the Web to rank well in search engines: a good rank drives traffic to your site and, in some cases, generates revenue from ads. I only recently realized that this sometimes involves generating massive amounts of content, and I found the details rather startling.

Some background on “Internet marketing”

Search Engine Optimization (SEO) is the work done to make a site stand out in search engine results, for obvious marketing purposes. Some people specialize in doing this. No, scratch that: there’s a whole industry centered around this.

Some SEO techniques are entirely ethical and simply make good content easier to find: they’re referred to as “white hat”. Other techniques, “black hat”, use devious ways to route people to less desirable content.

There are ways to benefit directly from search engine rankings without really creating any value in the process. One of those ways is the pathological MFA site, or Made For AdSense site (AdSense and AdWords being Google’s advertising programs).

These are the sites you end up on when you mistype a domain name and bam, get a page that is essentially a fresco of Google ads with some teeny-weeny content lost in there somewhere. If someone gets a good spot in Google results for his MFA page on a rather popular keyword, he has won the game and reaps some profit when wandering visitors click his ads. At least that’s one scenario.

The techniques of content generation

I knew about this, vaguely, but the other day I stumbled upon this article over at TheNextWeb concerning DataPresser. In a nutshell, this is a tool that allows you to generate content automatically, by following rules.

With DataPresser, for example, you can generate all sorts of variations on “Find cool wallpapers of ______”, where ______ is a blank filled from a database. Not only can you replace the blank, you can even change the way the sentence is worded, using synonyms or different grammatical constructions. That’s to avoid being flagged as “duplicate content” by Google, which obviously tries to eliminate such sites from its results. As you can see, it’s a game of cat and mouse.
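To make the idea concrete, here is a minimal sketch of template-based generation in Python. It is not DataPresser’s actual implementation, which I haven’t seen: the word lists and the function name are made up for illustration, and a real tool would pull the fill-ins from a database rather than from hard-coded lists.

```python
import itertools

# Hypothetical data: a real tool would read the fill-ins from a database.
TOPICS = ["cars", "mountains", "cats"]

# Interchangeable wordings for each slot (assumed synonyms, for illustration).
VERBS = ["Find", "Discover", "Browse"]
ADJECTIVES = ["cool", "great", "awesome"]


def generate_sentences(topics, verbs, adjectives):
    """Yield one sentence per combination of verb, adjective and topic."""
    for verb, adjective, topic in itertools.product(verbs, adjectives, topics):
        yield f"{verb} {adjective} wallpapers of {topic}"


sentences = list(generate_sentences(TOPICS, VERBS, ADJECTIVES))
print(len(sentences))  # 27 variations from 3 verbs x 3 adjectives x 3 topics
print(sentences[0])    # Find cool wallpapers of cars
```

Even with three tiny lists the output is 27 distinct sentences, which gives a feel for how quickly the page count can grow once the lists hold thousands of entries.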

The general keyword for this activity is “Content Generation”. There are many techniques and tools. Some will simply generate new pages by republishing RSS feeds found elsewhere, with ads slapped around them. Others will accumulate tons of text and mix bits of sentences from here and there to create text that doesn’t make sense but appears correct to search engines. You can even buy whole databases of content, say game cheat codes, rather cheaply.
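The sentence-mixing technique can be sketched in a few lines of Python. This is only a toy version under my own assumptions: the corpus is invented, every sentence is assumed to contain exactly one comma, and the function names are hypothetical. Real tools are presumably more elaborate.

```python
import random

# Invented sample corpus; each sentence is assumed to contain one comma.
CORPUS = [
    "The new phone has a bright screen, and the battery lasts two days.",
    "This recipe needs fresh basil, and the sauce simmers for an hour.",
    "The hotel sits near the beach, and the rooms face the sea.",
]


def split_fragments(sentence):
    """Split a sentence at its comma and trim spaces and the final period."""
    return [part.strip(" .") for part in sentence.split(",")]


def mix_sentence(corpus, rng):
    """Glue the first half of one random sentence to the second half of another."""
    first = split_fragments(rng.choice(corpus))[0]
    second = split_fragments(rng.choice(corpus))[1]
    return f"{first}, {second}."


rng = random.Random(0)  # fixed seed so repeated runs give the same text
mixed = [mix_sentence(CORPUS, rng) for _ in range(3)]
for line in mixed:
    print(line)
```

The result is grammatical at a glance but may pair a phone with a simmering sauce, which is exactly the kind of text that reads as nonsense to a person yet looks like ordinary prose to a naive crawler.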

Why mass-produce?

Obviously, it’s profitable to mass-produce text to cover many topics, and therefore many keywords. But there’s another reason: search engines give more credit to sites with links pointing to them. That’s why some content generation involves the creation of many sites linking to each other (the whole thing is called a link farm). Another “link-building” technique targets message boards and blog comments, with posts being made solely to create links that add to a site’s search-engine karma.

Conclusion

I think this phenomenon plays a big role in understanding the huge “size” of the Web (number of pages). A very simple technique to generate more revenue for a given site, for example, is to split an article into multiple pages so more ads can be displayed. But with these MFA sites, we’re talking about generating thousands of pages at the click of a button!

In the end, it all boils down to spam and background noise. Whatever the service, if it can generate buzz, chances are it’ll be exploited.

How a popular blogger reads 600+ RSS feeds every day

This is about a year old, but very relevant here. Timothy Ferriss, author of “The 4-Hour Workweek”, interviewed Robert Scoble and filmed his RSS reading process (he’s subscribed to more than 600 feeds!).

In the end, perhaps unsurprisingly, the magic relies on being really quick at judging an article from its title, its overall look and other cues. There are a couple of technical tips, though, about using the Google Reader interface efficiently, like relying on keyboard shortcuts.

Here’s the video:

Clay Shirky’s talk on information overload

Via a LifeHacker story, I found this video in which NYU New Media professor Clay Shirky gives his opinion on the information overload problem. It’s very interesting, if a bit long, so I made a summary of some of his points:

  • We always hear the same story: information is being produced increasingly fast. That makes us feel good about ourselves: that’s why I can’t get anything done, see!


IDC information overload chart (mentioned in Shirky’s talk)

  • In the past, the editor had to filter for quality what went out of the printing press, due to the risk involved if the book didn’t sell. But the Internet introduced “post-Gutenberg economics”, where the filter for quality is now “way downstream” from the source, since everyone may publish.
  • So we shouldn’t see this as a problem of information overproduction at the source so much as a personal filtering problem.
  • He takes email spam as an example: we set up filters, but after a while we notice that more spam gets in anyway, and our filters need tweaking. It’s about old filters continuously breaking and needing to be fixed.
  • Social media and the Internet in general bring new systems that break old ways of exchanging information. They force us to formalize information flow issues and take responsibility for them: who our information might reach and how public it gets, as with the privacy of Facebook events.
  • Conclusion: information overload is not just a superficial problem that can be solved by programming once and for all. Algorithms can help, yes, but we need to rethink social norms and, when we face overload, ask ourselves personally: which of my filters just broke?

As some people pointed out in the comments at LifeHacker, “solving” information overload is nothing new for another fundamental reason: it’s about choosing what we’re personally interested in. One cannot master every field there is, obviously. In the end, it’s about personal choice, not just about what’s universally “good” or “bad”. That’s one problem with social bookmarking: a story might be very interesting to you but not to the masses, not even to people in what you consider your own field.