Data as DNA

CORE Admin

LAB1100 is exploring new approaches to the discovery of patterns in historical texts.

LAB1100 is setting up a new project that aims to apply pattern-recognition techniques developed in bioinformatics to transcribed handwritten documents. The project will develop a tool that can produce a data-driven index of terms from any type of textual data. Because the tool will rely on recurring patterns rather than on semantics, it will be language-agnostic and able to deal with spelling variations and transcription errors.

LAB1100 plans to build on algorithms that were developed in the field of bioinformatics for the discovery of DNA sequences.
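To give a rough idea of what indexing by recurring patterns rather than by words could look like, here is a minimal sketch in Python. It uses overlapping character k-mers, a common building block in sequence analysis; this is an illustrative assumption and not necessarily the algorithm the project will adopt. The function names, the sample pages, and the parameters `k` and `min_pages` are all hypothetical.

```python
from collections import defaultdict


def kmers(text, k):
    """Yield every overlapping substring of length k (a 'k-mer') in text."""
    # Lowercase and collapse whitespace so layout differences do not matter.
    normalized = " ".join(text.lower().split())
    for i in range(len(normalized) - k + 1):
        yield normalized[i:i + k]


def build_kmer_index(pages, k=5):
    """Map each k-mer to the set of page ids on which it occurs."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for kmer in kmers(text, k):
            index[kmer].add(page_id)
    return index


def recurring_patterns(index, min_pages=2):
    """Keep only the k-mers that occur on at least min_pages pages."""
    return {
        kmer: sorted(page_ids)
        for kmer, page_ids in index.items()
        if len(page_ids) >= min_pages
    }


# Hypothetical transcriptions; "Amsterdm" imitates a transcription error.
pages = {
    "p1": "The merchant shipped goods to Amsterdam",
    "p2": "Goods arrived in Amsterdm by ship",
    "p3": "A letter about the harvest",
}

index = build_kmer_index(pages, k=5)
patterns = recurring_patterns(index)
```

Because the fragment "amste" survives the misspelling, both pages are still linked to the same pattern, without any dictionary or language model being involved.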

The purpose of this project is not to provide scholars with a tool that shows 'the most relevant' terms, but to create a heuristic tool that helps to identify terms and to locate the texts and pages in which these terms have been used. The tool can also be used to find texts or pages with co-occurring patterns. The tool's API and user interactions will be facilitated by a nodegoat research environment that will ingest the indices, weights, and references to the pages and texts.
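Locating pages with co-occurring patterns can be pictured as a set intersection over an index that maps patterns to page references. The sketch below assumes such an index already exists as a plain Python dictionary; the function `pages_with_all` and the sample data are hypothetical and say nothing about how nodegoat will store or expose the indices.

```python
def pages_with_all(index, patterns):
    """Return the sorted page ids on which every given pattern occurs."""
    # Unknown patterns map to an empty set, so the intersection is empty.
    page_sets = [index.get(pattern, set()) for pattern in patterns]
    if not page_sets:
        return []
    return sorted(set.intersection(*page_sets))


# Hypothetical index: pattern -> set of page references.
index = {
    "amste": {"p1", "p2"},
    "goods": {"p1", "p2", "p4"},
    "harve": {"p3"},
}

shared = pages_with_all(index, ["amste", "goods"])
```

A scholar could use a lookup of this kind as a heuristic starting point: the result is a list of pages to read, not a ranking of relevance.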
