Structure in the flow
Information overload, learning and personal knowledge management

Posts Tagged ‘for the curious’

Repetition and my WikidPad dynamic search extension

Digression on repetition

Information overload has numerous causes, and one of them is plain old repetition: two sources delivering the same information, with only superficial differences. It’s natural to repeat information, for various reasons.

For example, when students take notes on a teacher’s lecture, they all record basically the same information. If they all decide to put their notes online, bam, 30 new versions of “Notes on Heisenberg uncertainty principle”. The same goes for journalists and bloggers reporting on a given event.

Of course each version might add some value, with different points being made, but someone researching recent events still ends up reading the same basic facts again and again.

Clearly there’s no simple solution. In fact, discussion in the blogosphere does create repetition, but it also makes that information evolve. Something similar happens when students exchange notes. In this light, repetition appears to be a necessary evil.

If we really want to get philosophical, let’s just say repetition is unavoidable from the very start: repetitive information is simply the consequence of information flowing through the social graph, and of different human beings going through similar experiences and trains of thought. And clearly, just because one person has eaten apple pie doesn’t mean humanity can move on and experience other stuff.


Gratuitous picture of humanity’s bane (source)

(Ah, of course, the irony here is that this very article is just some remix of ideas told a zillion times over).

My WikidPad extension

Yet, being aware of the problem, you can at least work on making your own set of notes as repetition-free as possible. That’s another core reason why I love personal wikis. Instead of rewriting information on two pages, as you’d do with paper notes because your old notebooks aren’t handy, you simply link to the other page and voilà! You just avoided adding a little more repetition to this world (why not add some grandiosity here? :) ).

Yet there are cases where linking is not enough. Say I’m taking notes on the differences between two programming languages, C# and Java. I have a page on C# and a page on Java. Where do I put the notes? I could create a page dedicated to that topic, but for the moment I don’t have enough material to justify it. So say I put them on the Java page. Consequence: when I’m on the C# page, I have to navigate to the other page to read the info.


Diagram explaining the extension

What my extension does is grab the info on the Java page (and any other page) and dynamically bring the relevant sections into the C# page. Technically, you give the extension a keyword, and it searches your whole wiki for pages that contain it. Then, in those pages, it finds precisely the lines that contain your keyword, plus some context around them (“sections”). It then prints a list of those sections.
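To make those three steps concrete, here is a minimal Python sketch (WikidPad itself is written in Python). It is not the extension’s actual code, which you can grab from the post linked below: it assumes the wiki is a plain dictionary mapping page names to page text, and the names `find_sections` and `render`, as well as the `context` parameter (how many lines to keep on each side of a match), are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical model of a wiki: page name -> page text.
# The real extension queries WikidPad's own storage instead.
Wiki = Dict[str, str]
Section = Tuple[str, List[str]]  # (page name, lines of the section)


def find_sections(wiki: Wiki, keyword: str, context: int = 1) -> List[Section]:
    """Collect every line mentioning `keyword`, plus `context` lines around it."""
    needle = keyword.lower()
    results: List[Section] = []
    for page, text in sorted(wiki.items()):
        # Step 1: skip pages that don't mention the keyword at all.
        if needle not in text.lower():
            continue
        lines = text.splitlines()
        # Step 2: for each matching line, keep the surrounding lines too,
        # clipped to the start and end of the page.
        for i, line in enumerate(lines):
            if needle in line.lower():
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                results.append((page, lines[start:end]))
    return results


def render(sections: List[Section]) -> str:
    """Step 3: format the sections as a plain-text list, one block per match."""
    return "\n\n".join(f"[{page}]\n" + "\n".join(lines) for page, lines in sections)


if __name__ == "__main__":
    wiki = {
        "Java": "Java notes\nGenerics use type erasure.\n"
                "CSharpVsJava: C# keeps generic types at runtime.\nMore Java stuff",
        "CSharp": "C# notes\nProperties are first-class.",
    }
    # What the C# page would display: the Java section labelled with the keyword.
    print(render(find_sections(wiki, "CSharpVsJava")))
```

One simplification worth noting: two matches on neighbouring lines produce overlapping sections here; a more careful version would probably merge them.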

Now it doesn’t matter as much where I put the notes. As long as I label the sections correctly, I can gather them in the relevant pages when needed, without copying anything by hand.

Grab the code & read details here: http://www.fsavard.com/flow/wikidpad-dynamic-search-results/

For the curious: the incentive to mass-produce information

I always find it interesting to understand the phenomena that affect me and discover their root causes, be it in politics or in information overload (IO). With IO, one path to explore is the motivation behind the production of information.

As you may know, on the Web there are clear incentives to get good search engine rankings: they drive traffic to your site, in some cases generating revenue from ads. I only recently realized that chasing those rankings sometimes involves generating massive amounts of content, and I found the details rather startling.

Some background on “Internet marketing”

Search Engine Optimization (SEO) is the work done to make a site stand out in search engine results, for obvious marketing purposes. Some people specialize in it. No, scratch that: there’s a whole industry centered around it.

Some SEO techniques are entirely ethical and simply make good content easier to find: they’re referred to as “white hat”. Other techniques, “black hat”, use devious ways to route people to less desirable content.

There are ways to benefit directly from search engine rankings without really creating any value in the process. One of them is the pathological MFA site, or “Made For AdSense” site (AdSense and AdWords are Google’s ad programs).

These are the sites you end up on when you mistype a domain name and, bam, get a page that is essentially a fresco of Google ads with some teeny-weeny content lost somewhere in there. If someone manages to get his MFA page a good spot in Google results for a fairly popular keyword, he has won the game and reaps some profit whenever wandering visitors click his ads. At least that’s one scenario.

The techniques of content generation

I knew about this, vaguely, but the other day I stumbled upon this article over at TheNextWeb about DataPresser. In a nutshell, it’s a tool that generates content automatically by following rules.

With DataPresser, for example, you can generate all sorts of variations on “Find cool wallpapers of _______”, where the blank is filled from a database. Not only can you fill in the blank, you can even change the way the sentence is worded, using synonyms or even different grammatical constructions. That’s to avoid being flagged as “duplicate content” by Google, which obviously tries to eliminate such sites from its results. As you can see, it’s a game of cat and mouse.
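The template-and-synonym idea can be sketched in a few lines of Python. This is not DataPresser’s actual rule syntax; the `{slot}` templates, the `SYNONYMS` table and the `TOPICS` list standing in for the database are all hypothetical, chosen only to show how variations multiply.

```python
import itertools
from typing import Dict, Iterator, List

# Hypothetical rule set, NOT DataPresser's real format.
# Two different sentence structures ("grammatical constructions").
TEMPLATES: List[str] = [
    "{find} {cool} wallpapers of {topic}",
    "Wallpapers of {topic}: {find} the {cool} ones here",
]
# Interchangeable wordings for each slot, used to vary the sentences.
SYNONYMS: Dict[str, List[str]] = {
    "find": ["Find", "Discover", "Browse"],
    "cool": ["cool", "awesome", "stunning"],
}
# Stand-in for the database that fills the blank.
TOPICS: List[str] = ["kittens", "sports cars", "mountains"]


def generate() -> Iterator[str]:
    """Yield one sentence per (template, synonym choice, topic) combination."""
    slots = list(SYNONYMS)
    for template in TEMPLATES:
        for choice in itertools.product(*(SYNONYMS[s] for s in slots)):
            wording = dict(zip(slots, choice))
            for topic in TOPICS:
                yield template.format(topic=topic, **wording)


if __name__ == "__main__":
    pages = list(generate())
    print(len(pages))  # 54
    print(pages[0])    # Find cool wallpapers of kittens
```

Two templates, two three-way synonym slots and three topics already give 2 × 3 × 3 × 3 = 54 distinct sentences. Because the count is a product, swapping the three topics for a database with thousands of rows scales the output up accordingly.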

The general keyword for this activity is “Content Generation”. There are many techniques and tools. Some simply generate new pages by republishing RSS feeds found elsewhere, with ads slapped around them. Others accumulate tons of text and mix bits of sentences from here and there to create text that doesn’t make sense but appears correct to search engines. You can even buy whole databases of content, say game cheat codes, rather cheaply.
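The tools mentioned here don’t document how they stitch fragments together, but one well-known way to produce text that is locally plausible yet meaningless overall is a word-level Markov chain. Here is a minimal Python sketch of that technique; the `scraped` sentences and the function names are made up for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_chain(corpus: List[str]) -> Dict[str, List[str]]:
    """Map each word to the list of words that followed it anywhere in the corpus."""
    chain: Dict[str, List[str]] = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
    return dict(chain)


def babble(chain: Dict[str, List[str]], start: str, max_words: int, seed: int = 0) -> str:
    """Walk the chain from `start`, picking a random successor at each step.

    Every adjacent word pair in the output occurred somewhere in the corpus,
    so the text looks locally plausible even when the whole makes no sense.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    words = [start]
    # Stop at max_words, or when the last word never had a successor.
    while len(words) < max_words and words[-1] in chain:
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


if __name__ == "__main__":
    scraped = [  # hypothetical sentences scraped from elsewhere
        "the cheat code unlocks the hidden level",
        "the hidden wallpaper shows a cool car",
        "a cool level needs the right code",
    ]
    chain = build_chain(scraped)
    print(babble(chain, "the", max_words=12))
```

Because each pair of neighbouring words comes from real sentences, the result can pass a naive word-by-word check while meaning nothing to a human reader.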

Why mass-produce?

Obviously, it’s profitable to mass-produce text covering many topics, and therefore many keywords. But there’s another reason: search engines give more credit to sites with links pointing to them. That’s why some content generation involves creating many sites that link to each other (the whole thing is called a link farm). Another “link-building” technique targets message boards and blog comments, with posts made solely to create links that add to a site’s search-engine karma.
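To see why piling up inbound links helps, here is a toy version of PageRank, the link-analysis score Google was originally built on, computed by power iteration. Real search engines combine many more signals and actively penalize link farms, so treat this only as an illustration of the principle; the page names and the five-page farm are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # page -> pages it links to


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Simplified textbook PageRank, computed by power iteration."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share ("random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page, links in graph.items():
            if links:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(links)
                for target in links:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


if __name__ == "__main__":
    honest = {"blog": ["news"], "news": ["blog"], "mfa": ["blog"]}
    farmed = dict(honest)
    for i in range(5):
        farmed[f"farm{i}"] = ["mfa"]  # hypothetical farm pages all pointing at the MFA site
    before, after = pagerank(honest), pagerank(farmed)
    # MFA score relative to the blog, without and with the farm.
    print(round(before["mfa"] / before["blog"], 2), round(after["mfa"] / after["blog"], 2))
```

In this toy graph the MFA page’s score relative to the blog goes from about 0.1 to about 0.23 once the five farm pages point at it, even though none of the farm pages has a single inbound link of its own.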

Conclusion

I think this phenomenon goes a long way toward explaining the huge “size” of the Web (its number of pages). A very simple technique to generate more revenue for a given site, for example, is to split an article into multiple pages so more ads can be displayed. But with these MFA sites, we’re talking about generating thousands of pages at the click of a button!

In the end, it all boils down to spam and background noise. Whatever the service, if it can generate buzz, chances are it’ll be exploited.

Knowledge and learning concepts map

I found this very detailed map of learning and knowledge concepts, made by Dr. Rodridgue Savoie of the National Research Council of Canada. Readers with some time and curiosity will surely find it a wealth of paths to explore:

A part of the map