Mining History: Innovative Tools in Knowledge Discovery

The shift in focus from accessibility to curatorial services that support the creative re-use of digital documents calls for an exploration of the web resources and computational schemes needed to meaningfully break open the structure of the digital document. Under the supervision of two guest lecturers, Sándor Darányi and Peter Wittek (an information scientist and a computer scientist, respectively), participants learned about and experimented with toolkits that extract the latent semantic content of textual data sets and display it as infographics. Presentations of the algorithmic methods used to represent quality by quantity, and then by images, gave participants a gradual insight into the difficulties of computational text processing and their solutions. These methods operate within complex procedures such as topic modeling (building a statistical model of the content), data visualization (translating semantic content into visual representations) and machine learning (detecting patterns that emerge from the data). Visual browsing and mining were thus introduced as alternatives to classical information retrieval, while participants were made aware of the need to reposition information within contextual landscapes. Current tools, which combine insights from fields as disparate as classical lexical semantics (theories of the relational nature of meaning) and web science, now allow us to grasp the conceptual dynamics of documentary corpora in both their spatial and their temporal dimensions.

During the first ArchivaLab session we therefore experimented with datasets curated by OSA, both to trace otherwise unnoticed connections in our archival collections and to develop a systematic approach to identifying them. The aim is to make data and archives an inspiring space for collaborative research.

Guest lecturers: Sándor Darányi, Peter Wittek