Data Extraction //
The data was extracted algorithmically using Linked Open Data. We started with a set of historical topics that people might want to see relationships between. In this case we began by extracting all the topic tags from In Our Time - a long-running Radio 4 programme with an excellent archive - using the BBC Programmes RDF API.
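As a rough illustration, each episode page on the BBC Programmes site also exposed an RDF representation that could be parsed to pull out its tags. The episode identifier and the predicate filtering below are assumptions for illustration, not the prototype's actual code:

```python
from rdflib import Graph

# Hypothetical In Our Time episode identifier; at the time each /programmes
# page also had an RDF representation at the same URL with a .rdf suffix.
pid = "b00xxxxx"
graph = Graph()
graph.parse("https://www.bbc.co.uk/programmes/%s.rdf" % pid, format="xml")

# List triples that look like subject/category links - the exact
# Programmes Ontology predicates used are an assumption here.
for subj, pred, obj in graph:
    if "category" in str(pred).lower() or "subject" in str(pred).lower():
        print(pred, obj)
```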
We extracted about 3700 tags from 500 programmes, resulting in around 1200 unique terms. These were just free-text tags, such as ‘medieval’, applied by the BBC production team. In order to start finding out more about these topics we needed to link each one to an appropriate Wikipedia article. This was done using a search engine query whose scope was limited to Wikipedia.
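The prototype used a site-restricted search engine query for this step; a comparable lookup using the MediaWiki search API might look like the following sketch (the function name is ours):

```python
import requests

def wikipedia_page_for(tag):
    """Find the most likely Wikipedia article for a free-text tag such as 'medieval'."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "opensearch", "search": tag, "limit": 1, "format": "json"},
    )
    _, titles, _, urls = resp.json()
    return (titles[0], urls[0]) if titles else None

print(wikipedia_page_for("medieval"))
```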
Once we had a list of all the relevant Wikipedia pages, we had a way to start working out which ones were suitable for putting on our map and timeline - things which were around at a certain time, and in a certain place.
We extracted the dates for our Wikipedia terms in a very crude way: by reading the introduction of each article and looking for patterns like (1819 - 1901) or 7th Century. This produced several false positives for articles such as England, whose introduction includes the phrase “settled during the 5th and 6th centuries.” The date matching could be improved with another Linked Data source such as Freebase, although the number of terms which have start and end dates is still relatively limited.
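A minimal sketch of that kind of pattern matching, with illustrative regular expressions rather than the exact ones we used:

```python
import re

LIFESPAN = re.compile(r"\((\d{3,4})\s*[-–]\s*(\d{3,4})\)")          # e.g. (1819 - 1901)
CENTURY = re.compile(r"\b(\d{1,2})(?:st|nd|rd|th)\s+[Cc]entury\b")  # e.g. 7th Century

def extract_dates(intro_text):
    """Return a (start, end) year pair found in an article introduction, or None."""
    match = LIFESPAN.search(intro_text)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = CENTURY.search(intro_text)
    if match:
        century = int(match.group(1))
        return (century - 1) * 100 + 1, century * 100
    return None

print(extract_dates("Queen Victoria (1819 - 1901) was..."))  # (1819, 1901)
```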
Extracting the places was more complicated because we needed to know not only the name of the place, but also its latitude and longitude so that we could put it on a map. To do this we used DBpedia, a Linked Data service that connects Wikipedia articles to other datasets. In this case we used Geonames, as it offers one of the most comprehensive sets of openly licensed geographic data.
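For example, a DBpedia SPARQL query along these lines returns the Geonames link and coordinates for an article. Alexandria is just an example resource here, and the properties shown reflect DBpedia's usual modelling rather than the prototype's exact query:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?geonames ?lat ?long WHERE {
      <http://dbpedia.org/resource/Alexandria> owl:sameAs ?geonames ;
                                               geo:lat ?lat ;
                                               geo:long ?long .
      FILTER (STRSTARTS(STR(?geonames), "http://sws.geonames.org/"))
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["geonames"]["value"], row["lat"]["value"], row["long"]["value"])
```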
We scanned each event's Wikipedia page and extracted the links to other Wikipedia articles. By checking which of these articles had a Geonames ID listed in DBpedia, we were able to find all the geographical places referred to from an event page. We also used Geonames to filter out any places, such as England, which had crept into our list of events.
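Put together, the classification step amounts to something like this sketch, where has_geonames_id and linked_articles are hypothetical stand-ins for the DBpedia and Wikipedia lookups described above:

```python
def classify_terms(terms, linked_articles, has_geonames_id):
    """Split terms into places and events, then gather the places each event links to."""
    places = [t for t in terms if has_geonames_id(t)]      # e.g. 'England' is a place
    events = [t for t in terms if not has_geonames_id(t)]  # e.g. 'Ancient Egypt'
    event_places = {
        event: [t for t in linked_articles(event) if has_geonames_id(t)]
        for event in events
    }
    return places, events, event_places
```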
Some terms which seem like places are still treated as events. For example, Ancient Egypt does not have a Geonames ID because it is not strictly a place which could be given geographic co-ordinates. It does, however, link to a number of places such as Alexandria. The introduction to the Wikipedia article also tells us that the civilisation “coalesced around 3150 BC”, which gives us a date. We now have everything we need to put it on the map and on the timeline.
Development of the Interface //
We extracted the information for each of our In Our Time terms and exported the data set as a large (~1MB) JSON file to be loaded into the JavaScript interface.
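For illustration, one record in that file might look roughly like this; the field names are assumptions, not the prototype's actual schema:

```python
import json

# Hypothetical shape of one extracted term.
terms = [
    {
        "title": "Ancient Egypt",
        "start": -3150,
        "end": -30,
        "places": [{"name": "Alexandria", "lat": 31.2, "lon": 29.92}],
    }
]

with open("terms.json", "w") as f:
    json.dump(terms, f)
```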
The timeline uses a logarithmic function to compress dates in the far past where there are fewer events of interest. The place markers are all added to the map on page load and then shown/hidden as required. This makes the interface more responsive when dragging the timeline.
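A minimal sketch of that kind of logarithmic mapping, assuming the slider runs from the earliest event to the present; the exact curve and bounds used in the prototype may differ:

```python
import math

def timeline_position(year, earliest=-3200, latest=2011):
    """Map a year to a 0..1 slider position, giving more room to recent history."""
    years_ago = max(latest - year, 1)
    span = latest - earliest
    return 1.0 - math.log(years_ago) / math.log(span)

print(timeline_position(-3000))  # ~0.005: the far past is squeezed near the start
print(timeline_position(1900))   # ~0.45: the last century takes up over half the slider
```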
The map uses Google Maps API v3, the timeline is a customised jQuery UI slider, and jQuery is used extensively throughout. Data and text come from Wikipedia, as discussed above, while related BBC content comes from the Jungle API.