One reason to smile during a typical day's routine is when your feed aggregator notifies you of a new strip from the webcomic of your choice. My all-time favourites include xkcd, PhD comics, The Joy of Tech, UserFriendly, Red Meat and Cyanide and Happiness. Others that make the list are Questionable Content, The 5th Wave and Dilbert.
Now, for a late entrant like me into this world, it becomes rather tedious to read a strip online from the beginning. I do not remember where I stopped reading previously, and I already have enough bookmarks without needing to add one more. I would like to save the strips offline so I can read them anytime I want. Then again, who wants to navigate every page and save every image?
A simple customized script in Linux can automate the whole process of downloading. Here I present a Tcl script to download the images from The Joy of Tech.
Of course, this can also be done with a shell script, Perl, Python and the like. The tricky part is when the webcomic's issues are not ordered sequentially by serial number. Instead, the URL looks something like http://ars.userfriendly.org/cartoons/?id=20090113. One solution I can think of is to find the pattern (sequence number or date) in the image URL and modify the script to download according to that sequence. If that does not succeed, download the entire page, parse it for the <img> tags to find the image URL, and download the image. This is also helpful if you want to add the title of the comic to the filename: search for it with a regex in the downloaded page (I did this for PhD comics).
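A rough sketch of both approaches in Python, under a few assumptions: the page URL takes a YYYYMMDD id as in the UserFriendly example above, one page exists per calendar day, and the comic title can be read from the page's <title> tag. The helper names (dated_page_urls, image_urls, page_title) are made up for illustration, and the sample HTML is invented so that the snippet runs without touching the network.

```python
import re
from datetime import date, timedelta
from html.parser import HTMLParser
from urllib.parse import urljoin

# URL prefix taken from the example in the post; the id is a YYYYMMDD date.
BASE = "http://ars.userfriendly.org/cartoons/?id="


def dated_page_urls(start, end, base=BASE):
    """Yield one page URL per calendar day from start to end, inclusive.

    Assumption: one page per day. Days without a strip would need to be
    skipped by the caller when the fetch fails.
    """
    day = start
    while day <= end:
        yield base + day.strftime("%Y%m%d")
        day += timedelta(days=1)


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(page_url, html):
    """Return absolute URLs for all <img> tags found in html."""
    parser = ImgCollector()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.sources]


def page_title(html):
    """Return the text of the <title> tag, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # Invented page content, used only to exercise the helpers offline.
    sample = ('<html><head><title> Strip 42 </title></head>'
              '<body><img src="/img/a.gif"></body></html>')
    for url in dated_page_urls(date(2009, 1, 13), date(2009, 1, 14)):
        print(url, image_urls(url, sample), page_title(sample))
```

A real page usually carries many images (logos, ads, buttons), so the list from image_urls would still need filtering, for example by a path fragment that only the strip images share. Fetching the page and saving the file can be done with urllib.request from the standard library.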
If anybody has a better solution, I would like to hear of it.
This is a blog where I plan to discuss things esoteric (the URL is indicative, I suppose). The posts will range across diverse topics intended for the geeks and übergeeks with whom we share this planet.
Posts will be tech-related: rants of mine, a bizarre idea swimming in my grey cells, a project I carried out, or whatever I feel will justify the few kilobytes it will occupy on the server disk.
Look forward to updates at haphazard intervals, hackneyed phrases and the like.