I’ve been having fun today on the Afrikaans Wikipedia. Although my Afrikaans is bad enough that everything is quite slow going, and I sit with a dictionary by my side, it’s quite fun working on something that’s still so incomplete. The English Wikipedia is huge now, over 600 000 articles, so the tiny Afrikaans version, where your edits remain visible on the recent changes page the entire day, and don’t disappear within seconds, is quite a different experience altogether.
There are still no entries for most countries in the world. I started today with a Madagaskar entry, and ended up getting distracted into investigating a translation tool, hoping to be able to quickly create stub articles from the English versions. I looked at translate.org as well as the Wikipedia translation pages, but after wading through reams of not very useful discussions and links, I decided it would be quicker to write a tool myself.
It’s really simple. I created a strings table, using the following structure:
CREATE TABLE `string` (
`id` int(11) NOT NULL auto_increment,
`english` varchar(100) NOT NULL default '',
`afrikaans` varchar(100) NOT NULL default '',
`strlen` tinyint(4) NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I just manually entered a whole list of English strings and their Afrikaans counterparts, and then ran it through a sorting script to populate the strlen field, so that longer strings such as "Capital of" are translated before shorter strings such as "Capital".
Having done all this, I created a simple web form that takes a textarea and outputs a hopefully translated textarea, the aim being to be able to quickly run entries from the English Wikipedia through the tool, and paste them into the Afrikaans version with the minimum of fuss.
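To show why the ordering matters, here is a minimal Python sketch of the longest-first idea. It is not the actual tool: the sample pairs and the translate function are made up for illustration, the list stands in for rows from the string table, and sorting by length at run time plays the role of the strlen field.

```python
# Hypothetical sample rows standing in for the `string` table: (english, afrikaans).
PAIRS = [
    ("Capital", "Hoofstad"),
    ("Capital of", "Hoofstad van"),
    ("Kenya", "Kenia"),
]


def translate(text, pairs):
    """Replace each English string with its Afrikaans counterpart, longest first."""
    # Sorting by the length of the English string mimics ordering by strlen DESC,
    # so "Capital of" is handled before the shorter "Capital".
    ordered = sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)
    for english, afrikaans in ordered:
        # Plain substring replacement, one pair at a time (an assumption about
        # how a simple tool like this might work).
        text = text.replace(english, afrikaans)
    return text


if __name__ == "__main__":
    print(translate("Capital of Kenya", PAIRS))
```

If "Capital" were replaced first, the "Capital of" entry would never match and the stray English "of" would be left behind. Plain substring replacement like this is crude, since it can also hit text inside longer words, but it is probably good enough for churning out stubs that get checked by hand afterwards.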
I started with Kenya, and ran it through the tool. However, I noticed that the article uses a country template. It’s a fairly obvious concept, which makes life so much easier. However, none of the Afrikaans articles that I’d looked at until then had made use of one, so each country had different links, different spellings, etc. A nightmare. So, I got distracted again, and with much head-scratching and paging through the dictionary, I created an Afrikaans country template.
After getting it vaguely ready for production (and following the be bold principle), I had it working on the newly-created Kenia page. I decided to try it on the Suid-Afrika page, only to find that there was already a template in use. Unfortunately it’s been badly translated from the Dutch, but if I’d known (or bothered to look more carefully) I could have saved myself a lot of time and used that as a base.
Anyway, it’s been a fun day. The real work continues to pile up, but why did I stop working full-time if not to be able to enjoy unproductive days like this?
My real aim is to get the translation tool to a level where, after some initial translation, it can be used to quickly populate country pages for the Xhosa, Zulu, Sotho and other South African language Wikipedias. There’ve been attempts at starting them, but they haven’t yet got anywhere near the critical mass they need to achieve to become viable resources.