Tracing the traces in online spaces


Each of this week’s recommended links is about getting down and dirty with the technological details of internet communication.

In a new paper, Geiger and Ribes offer a compelling picture of Wikipedia's "vandal fighting" editors, one that departs in large part from the existing literature. By engaging with the vandal fighters' day-to-day practices, the researchers learned to make sense of an overwhelming heap of Wikipedia data and used it to reconstruct the scene of a malicious user being banned.

Joel Spolsky usually writes for an audience of computer programmers, and this essay about character encoding is no exception. Woven through the technical details, however, Spolsky's history of the digitized alphabet is a parable about the growing pains of a global computing network. A single byte, written as two hexadecimal digits, can represent 256 unique values (0 through 255). That is plenty of space for American engineers to store 26 lowercase letters, 26 uppercase letters, 10 Arabic numerals, and a handful of punctuation marks. But what happens when we start trading files with colleagues overseas? How do today's software designers account for the thousands upon thousands of characters used around the world?
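To make the byte arithmetic concrete, here is a short Python sketch. It is not taken from Spolsky's essay, and the helper names (`describe`, `fits_in_ascii`, `mojibake`) are hypothetical. It shows that one byte holds 256 values, that characters beyond the basic Latin alphabet need several bytes in UTF-8, and what happens when those bytes are read back with the wrong encoding.

```python
# One byte is 8 bits, written as two hex digits from 0x00 to 0xFF.
BYTE_VALUES = 2 ** 8  # 256 distinct values, numbered 0 through 255


def describe(text: str) -> list[tuple[str, str, int]]:
    """Return (character, Unicode code point, UTF-8 byte count) per character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


def fits_in_ascii(text: str) -> bool:
    """ASCII is a 7-bit code, so every code point must be below 128."""
    return all(ord(ch) < 128 for ch in text)


def mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode as Latin-1 (a classic mix-up)."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print(BYTE_VALUES, hex(BYTE_VALUES - 1))  # 256 0xff
    for row in describe("Aé€中"):
        print(row)  # 'A' takes 1 byte; 'é' takes 2; '€' and '中' take 3 each
    print(mojibake("é"))  # Ã©
```

The garbled `Ã©` at the end is the kind of symptom that appears when a sender and a receiver disagree about which encoding a file uses.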

Finally, Asheesh Laroia ran a fascinating workshop on web scraping at PyCon, the annual gathering of Python programmers. In this play-along-at-home presentation, he walks the audience through a variety of tools and techniques for automating data collection from nearly any resource on the web. Novice programmers should feel comfortable jumping right in, and Laroia provides plenty of example code to play with.

  • Laroia, A. (2009) Scrape the Web: Strategies for programming websites that don’t expect it. [Video] PyCon, Chicago, May 8. Retrieved from: http://blip.tv/file/2022154/
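For a taste of what scraping involves, here is a minimal sketch that uses only Python's standard library. It is not code from Laroia's workshop, which covers a wider range of tools, and the names `LinkCollector`, `extract_links`, and `scrape_links` are hypothetical. The sketch downloads a page, decodes it using the charset the server reports, and collects every link on it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str = ""):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str = "") -> list[str]:
    """Parse an HTML string and return its links as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_links(url: str) -> list[str]:
    """Download a page and return its links (requires network access)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, base_url=url)
```

Calling `scrape_links("https://example.com/")` would fetch that page and list its links. Keeping the parsing in `extract_links` separate from the download makes it easy to experiment on saved HTML without touching the network.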

(If you’re not yet a programmer but want to learn, Python is a great language for beginners. If you’re looking for an introductory book, try Think Python.)