Attending CERL’s Visual Approaches to Cultural Heritage, 14 March 2018
The Consortium of European Research Libraries (CERL) held its spring seminar in Zürich, and I was lucky enough to attend. The seminar was held at ETH Zürich (Eidgenössische Technische Hochschule Zürich), the city’s other university, in one of its very stylish and modern buildings. ETH’s central campus is on a hill in the old town, with amazing panoramic views over the city and steep narrow lanes with curious perspectives.
The theme of the seminar, as the introductory speakers explained, was the iconic turn (or pictorial turn). Over the last century, images have become increasingly prevalent, in terms of production, reproduction and the role they play in our lives. This has caused cultural theorists to suggest that we need new ways to think about images, and understand how this new visual culture (or age of spectacle) is affecting us.
For those of us working in the field of information technology, the rise of visual resources has developed hand in hand with increasing capacity, bandwidth, processing power and new software for manipulating images. The International Image Interoperability Framework (IIIF) is becoming standard in research libraries. Artificial intelligence techniques like deep learning (neural networks) can classify images. Digitisation is faster than ever and optical character recognition (OCR) and handwriting recognition are getting more and more accurate.
The presentations in the seminar reflected the diversity of approaches in this area, from mass digitisation to using IIIF for uniting fragmented manuscripts to using data visualisations in research.
Meda Hotea, from the ETH Library, demonstrated some of the work done there on presenting visual material. e-rara is the platform for rare Swiss books held across five different institutions. One can view fully digitised copies of 65,000 titles spanning six centuries of book production.
Another digital image platform hosted by ETH is e-pics, which holds hundreds of thousands of images. These are broken down into various collections. Of particular interest to many in the audience were the ex meis libris scans, which can help one to track the provenance of a book.
Unfortunately I had already done my sightseeing before the seminar, so I wasn’t able to use ETH’s ETHorama, which links documents to locations, allowing one to find literary references, travelogue descriptions and historical accounts using a map interface. This is similar to Edinburgh’s Lit Long project, but for the whole of Switzerland.
The final site demonstrated by Meda Hotea was Explora, which presents material from the ETH library in story form, using a mixture of text, images and other media.
The next presentation was from William Duba and Maria Widmer, both from Fribourg University. Maria Widmer works on the e-codices project, which is a virtual library presenting all medieval Swiss manuscripts, bringing together 2000 manuscripts from over 80 different libraries. William Duba works at Fragmentarium, which is also a virtual manuscript library, but specialising in fragmented manuscripts.
In the past, manuscripts were not always treated as valuable historical documents. They were recycled as binding material for other manuscripts, cut up if the value of the parts was greater than the whole, or damaged by fire or flood. Fragmentarium aims to reunite manuscripts which have become separated and split across different institutions. e-codices also offers useful advice on digitisation and storage. During the questions, we learned of a case where people photographing manuscripts didn’t include a colour guide or ruler, but Photoshopped them onto the images afterwards!
Claudia Fabian, from the Bayerische Staatsbibliothek, demonstrated some very impressive work which the BSB has done with its immense collection (10,000 incunabula, 60,000 C16 books, 43 million images). With such big data, it is possible to do some clever things.
As with character and handwriting recognition, the BSB performs optical music recognition (OMR) on sheet music. Most impressively, one can perform similarity searches on images (Bildähnlichkeitssuche), to find other images which resemble an initial image. This involves performing edge detection, simplifying the colours, and encoding the result as a description string. The distance between two strings can then be calculated: the closer the strings, the more similar the images should be. I recommend exploring this feature.
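The core idea, as described, is that once an image is reduced to a compact description string, image similarity becomes string distance. The following is only a toy sketch of that principle, with a made-up three-level greyscale quantisation standing in for the BSB’s actual (unpublished, as far as I know) feature encoding:

```python
# Toy sketch of a description-string similarity search.
# Each "image" is a tiny grid of 0-255 greyscale values; every pixel is
# quantised to one of three levels, and the levels are joined into a
# description string. The Hamming distance between two strings then
# approximates how different the images are.

def encode(grid):
    """Quantise a greyscale grid into a string of levels 0/1/2."""
    return "".join(str(min(v // 86, 2)) for row in grid for v in row)

def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

dark = [[10, 20], [30, 40]]
darker = [[15, 25], [35, 45]]
light = [[200, 210], [220, 230]]

# The two dark images encode identically; the light one is far away.
print(hamming(encode(dark), encode(darker)))  # 0
print(hamming(encode(dark), encode(light)))   # 4
```

The attraction of the approach is that the expensive image analysis happens once, at indexing time; searching is then just cheap string comparison.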
There was an interesting discussion on how working with images from books can de-contextualise the image as one no longer sees the page or work it is from. But this image search can also re-contextualise images, by linking between different works which contain similar images.
An important use of this search is to locate duplicate images, thus reducing the amount of metadata which needs to be generated. Also, provenance information (ex libris marks) can be automatically extracted.
The final presentation of the morning came from Etienne Posthumus (Brill) and Hans Brandhorst (Leiden University), regarding Arkyves, a serendipity engine which uses Iconclass.
Iconclass, as I learned, is a system for categorising pictures, not dissimilar to a thesaurus containing a hierarchy of concepts. An Iconclass classification is an alphanumeric string. Read from left to right, each additional character narrows the concept, e.g. 54F2 represents Abstract Ideas and Concepts (5), Process of Action (54), Fortune and Misfortune (54F), Victory (54F2).
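Because each extra character refines the concept, a code carries its entire ancestry within itself: the chain of broader concepts is just the chain of prefixes. A minimal sketch of that (the label table is only the example from the talk, not the real Iconclass vocabulary, and plain prefixing would need extra care for codes containing bracketed qualifiers):

```python
# Expand an Iconclass-style code into its chain of increasingly
# specific ancestor codes, then look up labels where we have them.

LABELS = {  # tiny excerpt for illustration only
    "5": "Abstract Ideas and Concepts",
    "54": "Process of Action",
    "54F": "Fortune and Misfortune",
    "54F2": "Victory",
}

def ancestors(code):
    """Return every prefix of the code, shortest (broadest) first."""
    return [code[:i] for i in range(1, len(code) + 1)]

for prefix in ancestors("54F2"):
    print(prefix, "->", LABELS.get(prefix, "(no label)"))
```

This prefix structure is also what makes hierarchical filtering cheap: every picture tagged 54F2 automatically matches a search for 54F.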
One of the strengths of Iconclass is that the labels can be translated into other languages. This means that the same classification strings can be used across languages. There was some interesting debate as to how to update Iconclass, to replace out-dated words, for example, those referring to people of colour.
The idea of serendipity which Arkyves aims to embody is that of finding things which one didn’t know one was searching for. In a library where one can access the shelves, one will find other books near the first book one searched for. Often, database searches will return just what one searched for, removing the possibility for serendipitous discovery. Arkyves contains about a million images and metadata, from many famous collections, so there is plenty to find.
Arkyves uses Iconclass to return a list of classifications which are most often associated with the search term one used. Etienne used the search term alcohol, which returns many hits. These can be filtered using associated Iconclass classifications, e.g. 11N37 Sloth and Indolence, 33B31 fist-fight and 56B13 Jocularity.
The site has a useful help document, so that one can get the most out of this valuable resource.
After lunch, Matthias Bixler (Zürich University) talked about his research into historical social networks, and how these can be plotted. Taking, as an example, the complex family of the Roman Emperor Nero, Matthias showed how a well constructed network diagram can give one insights which would be unavailable if one only looked at the raw information (in the way that a graph conveys information better than a simple list of numbers).
There was a lot of discussion as to whether network diagrams were stand-alone pieces of research. One can construct diagrams which suggest convincing answers to research questions (e.g. why did Nero murder particular people in his family). But, as Matthias argued, network diagrams are there to provide insight and to illustrate arguments (or to show off!), rather than as evidence in their own right.
Cristina Dondi (Oxford) then gave a presentation on visualising the circulation of books over space and time using Material Evidence in Incunabula (MEI) and 15cv.
The MEI project uses several types of evidence to determine the hands through which a book has passed. The type of binding, handwriting styles in marginalia and ex libris marks can all tell one where and when a book has been. These routes can then be visualised with 15cv. Many of these books were produced in France or Italy and, over time, made their way into American libraries. In between, they were bought and sold, given as presents or bequeathed.
Cristina also talked about illustrations in 15th century books, which were printed from woodcuts. Individual woodcuts were often copied or reused across multiple illustrations, and a single illustration could combine several woodcut prints. Using open source software developed at Oxford, it is possible to trace the use of particular woodcuts and, by finding instances of the same woodcut in different books, reconstruct networks between different publishers and printers.
The billion Euro presentation came from Frederic Kaplan on the Time Machine Project. This is a very ambitious project which hopes to win €1Bn of European funding to be spent over 10 years by 200 partner institutions in digitising 2000 years of European history.
This builds on the successful Venice Time Machine, which scanned 1000 years of official records from the city. These, together with 3D scans of objects, digitised maps and 3D point clouds obtained from matching features on buildings in paintings, allow one to reconstruct a street view of Venice at any point in its history. One can see who lived where, along with what they did.
As Frederic explained, technologies have advanced to such a degree that mass digitisation of the past is now a reality. Using robots and artificial intelligence, manuscripts can be scanned in seconds and their text accurately extracted (5% error per letter). This software can also recognise duplicate images, and it was discovered that 7% of duplicates had conflicting metadata.
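Once duplicate images can be matched, flagging the 7% with conflicting metadata is a simple bookkeeping step: group records by image and check whether the metadata within a group agrees. A hedged sketch (the fingerprints and titles are invented for illustration; the real system matches the images themselves):

```python
from collections import defaultdict

# Each record pairs an image fingerprint (a made-up hash here) with metadata.
records = [
    ("a1f3", {"title": "View of the Rialto"}),
    ("a1f3", {"title": "View of the Rialto"}),
    ("9c7e", {"title": "Doge's Palace"}),
    ("9c7e", {"title": "Palazzo Ducale"}),  # duplicate, conflicting title
]

# Group metadata records by the image they describe.
by_image = defaultdict(list)
for fingerprint, meta in records:
    by_image[fingerprint].append(meta)

# A duplicate group "conflicts" if its metadata entries disagree.
conflicts = {
    fp: metas
    for fp, metas in by_image.items()
    if len(metas) > 1 and any(m != metas[0] for m in metas[1:])
}
print(conflicts)
```

Beyond flagging errors, such conflicts can also be productive: two institutions disagreeing about the same image is itself a research lead.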
One example we were shown used (from what I remember) a Canaletto painting of the Doge’s Palace. In front of the palace are two very humble looking buildings. Using tax records, it is possible to know who worked in these little huts.
Anna Neovesky (Academy of Sciences and Literature, Mainz) gave the final presentation of the day, on the Regesta Imperii (RI), a database containing full texts from royal (Roman-German) and Papal charters, from 751 to 1519. This project was started in 1829, when a librarian, Johann Friedrich Böhmer, started collecting and publishing the deeds. There are now over 90 print volumes and they were distributed as CD-ROMs in the 1990s and put online in 2001.
This project reminded me of the Records of the Parliaments of Scotland (RPS) project, upon which I worked. It also started out in printed form in the early C19 and was intended to be published on CD-ROM, before the internet made that technology obsolete. The RPS makes both the original (mostly Latin) text available along with a standardised modern English translation. However, I can’t seem to find much of the original texts in RI, and the modern German appears to be a report of the original rather than a translation.
The project is still expanding, bringing in supplementary texts. They are currently transcribing and geo-tagging inscriptions.
The whole visit to Zürich was very interesting. A lot of the projects being presented are similar to ones I have worked on at St Andrews, and I picked up quite a few ideas I look forward to implementing.