Encyclopaedia Britannica for Mediawiki

Here is a Python script that I cooked up some time ago that feeds the articles from the Encyclopaedia Britannica 2008 Ultimate DVD into Mediawiki, which is a MUCH better interface than the one shipped on the DVD. It supports the Encyclopaedia Britannica proper, the Britannica Books of the Year, the Britannica Student Library, and the index, each of which goes into a separate namespace. To use it, run it from the root of your Mediawiki installation, after declaring the additional namespaces in LocalSettings.php:

$wgAllowExternalImages = true;
$wgExtraNamespaces[100] = 'IndexEntry';
$wgExtraNamespaces[102] = 'BookOfTheYear';
$wgExtraNamespaces[104] = 'YearInReview';
$wgExtraNamespaces[106] = 'Document';
$wgExtraNamespaces[108] = 'BSL';
$wgContentNamespaces[] = 102;
$wgContentNamespaces[] = 106;
$wgContentNamespaces[] = 108;
$wgCapitalLinks = false;

You will also need to set the path to your Britannica DVD at the top of the script. A full import takes several days, but the script can be interrupted at any time and will resume where it left off when started again. Make sure it is allowed to write the necessary data to your Mediawiki directory.
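For readers curious how an interruptible import like this can work, here is a minimal sketch of one common approach: an append-only log of finished article IDs that is consulted on startup. It is not taken from dopidx.py. The file name `import_state.log` and the names `load_done`, `run_import` and `import_one` are all hypothetical and chosen only for illustration.

```python
import os

STATE_FILE = "import_state.log"  # hypothetical name, not from dopidx.py


def load_done(path=STATE_FILE):
    """Return the set of article IDs recorded by earlier runs."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return {line.strip() for line in fh if line.strip()}


def run_import(article_ids, import_one, path=STATE_FILE):
    """Import every article not yet in the log; return how many were imported.

    import_one is a caller-supplied function (an assumption of this sketch)
    that does the actual work for a single article.
    """
    done = load_done(path)
    imported = 0
    with open(path, "a", encoding="utf-8") as log:
        for article_id in article_ids:
            key = str(article_id)
            if key in done:
                continue  # finished in a previous run
            import_one(article_id)
            # Record the ID only after the article went through, so an
            # interrupt in the middle of an article causes it to be retried.
            log.write(key + "\n")
            log.flush()
            done.add(key)
            imported += 1
    return imported
```

Because an ID is logged only after its article has been imported, stopping the process with Ctrl-C loses at most the article in progress, and the next run simply skips everything already listed in the log.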

The script tries its best to transform the highly inconsistent Britannica HTML code to Mediawiki markup, including inline images and diagrams. It even adds additional links to the sparsely linked articles, yielding remarkable results. It is not perfect, however, and I’d like to hear of any improvements you can come up with.
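To give a rough idea of what such a transformation involves, here is a small sketch of the two steps: turning a few HTML tags into wiki markup, and linking the first plain mention of known article titles. It is a simplified illustration under my own assumptions, not the logic of dopidx.py. The function names `html_to_wikitext` and `add_links` are hypothetical, anchors are assumed to link to a page named after their label, and real Britannica markup is far messier than a handful of regular expressions can handle.

```python
import html
import re


def html_to_wikitext(source):
    """Convert a tiny subset of HTML (bold, italic, links, paragraphs)."""
    text = re.sub(r"<\s*/?\s*(?:b|strong)\s*>", "'''", source, flags=re.IGNORECASE)
    text = re.sub(r"<\s*/?\s*(?:i|em)\s*>", "''", text, flags=re.IGNORECASE)
    # Assumption: the link label doubles as the target page title.
    text = re.sub(r"<a\b[^>]*>(.*?)</a\s*>", r"[[\1]]", text,
                  flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r"</p\s*>", "\n\n", text, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)  # drop every remaining tag
    return html.unescape(text).strip()


def add_links(wikitext, known_titles):
    """Link the first plain-text mention of each title not linked already."""
    existing_link = re.compile(r"(\[\[.*?\]\])")
    # Longest titles first, so "house cat" wins over "cat".
    for title in sorted(known_titles, key=len, reverse=True):
        if "[[" + title + "]]" in wikitext:
            continue  # the article already links to this title
        pattern = re.compile(r"\b" + re.escape(title) + r"\b")
        # Even-indexed parts are plain text, odd-indexed parts are links.
        parts = existing_link.split(wikitext)
        for i in range(0, len(parts), 2):
            parts[i], hits = pattern.subn(
                lambda m: "[[" + m.group(0) + "]]", parts[i], count=1)
            if hits:
                break
        wikitext = "".join(parts)
    return wikitext
```

For example, `html_to_wikitext('<p>The <b>cat</b> is a <a href="x.html">mammal</a>.</p>')` gives `The '''cat''' is a [[mammal]].` Note that the match in `add_links` is case-sensitive, which fits the `$wgCapitalLinks = false` setting above, since page titles then keep the case of their first letter.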

Download: dopidx.py
