Browsing the web, we see tons of different layouts: each site has its own. Though that makes for a more diverse experience, it’s not ideal when you want to sit down and take the time to read a long article.
Those who use Firefox have certainly encountered extensions such as AdBlock Plus and Flashblock, which help make web pages look less like a stress test for epilepsy. More general (cross-browser) solutions, such as Privoxy, use a proxy to filter incoming content.
Yet one can go even further to isolate the text of an article. Some sites offer a “print” version of their articles (usually a single, clean page), but that’s not the general case. That’s where the Aardvark extension comes in. It allows you to delete elements from a page and rearrange it quickly so you only keep the part you want.
Overview of Aardvark’s modification commands
Once the extension is installed, navigate to a page you want to clean up and launch Aardvark (Tools -> Start Aardvark). A red rectangle then appears around whichever element your mouse pointer hovers over. Pressing a key applies an editing operation to the selected element (press ‘h’ to get the list of commands).
Aardvark’s help (list of commands)
It helps here to understand how web pages are coded (HTML), but in essence a page is made of rectangular zones nested inside bigger zones (e.g. an image in a paragraph), forming a hierarchy. As your mouse pointer hovers over a given rectangle (say a paragraph title), you may want to select its parent in the hierarchy (the paragraph itself). To do so, press ‘w’ to ‘widen’ the selection. The inverse operation is ‘n’ for ‘narrow’.
Example of Aardvark’s rectangle selection
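For those who think better in code, here is a toy model of the ‘w’ and ‘n’ keys. It is only a sketch and not Aardvark’s actual code: the Element and Selection classes are names I made up, and I am assuming that ‘narrow’ simply steps back to the element you widened from.

```python
class Element:
    """Hypothetical stand-in for a page element (one rectangular zone)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def add(self, tag):
        """Create a child zone nested inside this one."""
        child = Element(tag, parent=self)
        self.children.append(child)
        return child


class Selection:
    """Tracks the element currently inside the red rectangle."""

    def __init__(self, element):
        self.current = element
        self._history = []  # elements we widened away from

    def widen(self):
        """'w': move the selection to the parent, if there is one."""
        if self.current.parent is not None:
            self._history.append(self.current)
            self.current = self.current.parent

    def narrow(self):
        """'n': step back to the element we last widened from (assumed behaviour)."""
        if self._history:
            self.current = self._history.pop()


# A title inside a section inside the page body.
body = Element("body")
section = body.add("section")
title = section.add("h2")

sel = Selection(title)   # the mouse hovers over the title
sel.widen()
print(sel.current.tag)   # section
sel.narrow()
print(sel.current.tag)   # h2
```

Widening at the top of the hierarchy does nothing in this model, which is also why the selection can never grow past the page itself.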
You can delete elements in essentially two ways. The first is the straightforward one: you select an element and press ‘r’ to remove it. The other is the opposite: you press ‘i’ to isolate the selected element, i.e. keep only that one and remove all the rest. ‘i’ is very useful for selecting the page’s main element, the one that contains the whole text; you can then work out the details with ‘r’.
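To make the difference between the two keys concrete, here is a small sketch of ‘r’ and ‘i’ on a toy page tree. The Element class and the remove, isolate and outline functions are hypothetical names of mine, and the way isolate keeps the ancestors of the chosen element is an assumption about one plausible approach, not a description of how Aardvark does it internally.

```python
class Element:
    """Hypothetical stand-in for a page element."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def add(self, tag):
        child = Element(tag, parent=self)
        self.children.append(child)
        return child


def remove(element):
    """'r': detach the element, and everything inside it, from its parent."""
    if element.parent is not None:
        element.parent.children.remove(element)
        element.parent = None


def isolate(element):
    """'i': keep the element and its ancestors, drop every other branch."""
    node = element
    while node.parent is not None:
        node.parent.children = [node]
        node = node.parent


def outline(element):
    """Return the tags of the tree in document order."""
    tags = [element.tag]
    for child in element.children:
        tags.extend(outline(child))
    return tags


# A typical cluttered page: header, article, sidebar.
body = Element("body")
header = body.add("header")
article = body.add("article")
sidebar = body.add("aside")
text = article.add("p")
widget = article.add("div")  # say, a block of sharing buttons

isolate(article)   # first keep only the article...
remove(widget)     # ...then clean up the details
print(outline(body))  # ['body', 'article', 'p']
```

Doing the same cleanup with ‘r’ alone would take one removal per unwanted block, which is why isolating first tends to be quicker.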
If the isolated text is too narrow (doesn’t fill the page horizontally), you can press ‘d’ to ‘de-widthify’, which means that the ‘width’ attribute (which prevents the block from filling the page) is removed from it. You may have to fiddle a bit until you find the element on which the ‘width’ is applied, though.
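The fiddling can be pictured as walking up the hierarchy until you hit the element that carries the width. The sketch below is a toy model under my own assumptions: Element, nearest_with_width and dewidthify are hypothetical names, and attributes are kept in a plain dictionary rather than in a real page.

```python
class Element:
    """Hypothetical stand-in for a page element with HTML-like attributes."""

    def __init__(self, tag, parent=None, **attributes):
        self.tag = tag
        self.parent = parent
        self.attributes = dict(attributes)

    def add(self, tag, **attributes):
        return Element(tag, parent=self, **attributes)


def nearest_with_width(element):
    """Walk up from the element until one with a 'width' attribute is found.

    Returns None when nothing in the chain sets a width.
    """
    node = element
    while node is not None and "width" not in node.attributes:
        node = node.parent
    return node


def dewidthify(element):
    """'d': drop the 'width' attribute; report whether there was one."""
    return element.attributes.pop("width", None) is not None


# The text sits in a paragraph, but the width is set two levels up.
body = Element("body")
table = body.add("table", width="600")
cell = table.add("td")
text = cell.add("p")

print(dewidthify(text))        # False: the paragraph has no width of its own
target = nearest_with_width(text)
print(target.tag)              # table
print(dewidthify(target))      # True: the constraint is gone
```

In the extension itself you do this search by hand, pressing ‘w’ to move up and ‘d’ to try again.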
Saving the result with ScrapBook
Once the modifications are done, I save the page in its modified state using ScrapBook (which I covered in another blog post). I can then read it in the format I want, and add notes and highlights. (The ScrapBook extension does have a “delete” feature, but it’s not as featureful as Aardvark’s.)
If an article is spread over multiple pages, you can use ScrapBook’s “Combine Wizard” feature (in the SB sidebar: Tools -> Combine Wizard) to merge them into a single page.