Data as DNA

CORE Admin

LAB1100 is exploring new approaches to the discovery of patterns in historical texts.

LAB1100 is setting up a new project that aims to apply pattern recognition techniques developed in the field of bioinformatics to transcribed handwritten documents. This project will develop a tool that produces a data-driven index of terms from any type of textual data. Because the tool will rely on recurring patterns rather than on semantics, the application will be language-agnostic and will be able to deal with spelling variations and transcription errors.

LAB1100 aims to use algorithms that have been developed in the field of bioinformatics for the discovery of DNA sequences.
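To illustrate why a pattern-based approach can cope with spelling variation without knowing the language, here is a minimal sketch that compares word forms by the overlapping character sequences (k-mers, a term borrowed from bioinformatics) they share. The project text does not name the algorithms LAB1100 will use, so the k-mer comparison, the function names `kmers` and `jaccard`, and the example spellings are all assumptions made for illustration only.

```python
def kmers(text: str, k: int = 3) -> set[str]:
    """Return the set of overlapping character k-mers in a string."""
    text = text.lower()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Fraction of k-mers that two strings share (0.0 = none, 1.0 = all)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        # Strings shorter than k have no k-mers to compare.
        return 0.0
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    # Hypothetical spellings: a variant of one name and an unrelated name.
    print(jaccard("Amsterdam", "Amsterdamme"))
    print(jaccard("Amsterdam", "Rotterdam"))
```

In this sketch a spelling variant scores higher than a different name because most of its character patterns are unchanged, and no dictionary or language model is involved.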

The purpose of this project is not to provide scholars with a tool that shows 'the most relevant' terms, but to create a heuristic tool that helps to identify terms and to locate the texts and pages in which these terms have been used. The tool can also be used to find texts or pages with co-occurring patterns. The tool's API and user interactions will be facilitated by a nodegoat research environment that will ingest the indices, weights, and references to the pages and texts.
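The kind of index described here can be sketched roughly as follows: recurring character patterns are mapped to the pages on which they occur, with a simple count as weight, and a second function looks up pages on which several patterns co-occur. This is a sketch under stated assumptions, not the project's implementation: the fixed-length patterns, the names `build_index` and `pages_with_all`, the `min_pages` threshold, and the sample pages are all hypothetical.

```python
from collections import defaultdict


def build_index(pages: dict[str, str], k: int = 4,
                min_pages: int = 2) -> dict[str, dict[str, int]]:
    """Map each character k-mer to {page_id: count}.

    Only k-mers that recur on at least `min_pages` pages are kept,
    so the index reflects recurring patterns, not one-off strings.
    """
    index: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for page_id, text in pages.items():
        # Lowercase and collapse whitespace so layout does not affect patterns.
        normalised = " ".join(text.lower().split())
        for i in range(len(normalised) - k + 1):
            index[normalised[i:i + k]][page_id] += 1
    return {p: dict(refs) for p, refs in index.items() if len(refs) >= min_pages}


def pages_with_all(index: dict[str, dict[str, int]],
                   patterns: list[str]) -> set[str]:
    """Return the ids of pages on which every given pattern occurs."""
    page_sets = [set(index.get(p, {})) for p in patterns]
    return set.intersection(*page_sets) if page_sets else set()


if __name__ == "__main__":
    # Hypothetical transcriptions with spelling variation.
    pages = {
        "p1": "the ship sailed to batavia",
        "p2": "a shipp arrived from batauia",
        "p3": "grain prices rose",
    }
    index = build_index(pages)
    print(index["ship"])
    print(pages_with_all(index, ["ship", "bata"]))
```

A structure like this keeps the pattern, its weight, and its page references together, which is the shape of data the text says a nodegoat environment would ingest.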

Latest Blog Posts

Learn how to connect your nodegoat environment to Transkribus and other services

CORE Admin
An example of a document that can be transcribed and ingested into nodegoat.

The nodegoat Guides have been extended with a new section on 'Ingestion Processes'. An Ingestion Process allows you to query an external resource and ingest the returned data into your nodegoat environment. Once the data is stored in nodegoat, it can be used for tagging, referencing, filtering, analysis, and visualisation.[....]
