Working with the British Library Dataset

20 Apr 2016 . category: research . Comments

Earlier this year Tim Weyrich directed me onto a dataset published by the British Library and since then my research has focused heavily around this. Within Computer Vision it is unusual to get a large dataset not skewed to achieve a specific research goal. Sometimes datasets can be repurposed, but this is requires extensive effort to get the data in its rawest form.

The British Library dataset is a quite literally a "dump" of all the unknown to OCR elements from the book scanning performed by Microsoft. Therefore is not just a collection of line art imagery, but also of elaborate charachters or section embroidery.

I intend to post more on working with this dataset as time goes on, but for now there is a github which contains the directory of images: Directory of Images (Github)

And what is the main repository on Flickr:Flickr Repository

Assistant Professor in Visual Computing at Durham University. Stuart's research focus is on Visual Reasoning to understand the layout of visual content from Iconography (e.g. Sketches) to 3D Scene understanding and their implications on methods of interaction. He is currently a co-I on the RePAIR EU FET, DCitizens EU Twinning, and BoSS EU Lighthouse. He was a co-I on the MEMEX RIA EU H2020 project coordinated at IIT for increasing social inclusion with Cultural Heritage. Stuart has previously held a Researcher & PostDoc positions at IIT as well as PostDocs at University College London (UCL), and the University of Surrey. Also, at the University of Surrey, Stuart was awarded his PhD on visual information retrieval for sketches. Stuart holds an External Scientist at IIT, Honorary roles UCL and UCL Digital Humanities, and an international collaborator of ITI/LARSyS. He also regularly organises Vision for Art (VISART) workshop and Humanities-orientated tutorials and was Program Chair at British Machine Conference (BMVC) 2021.