Data as DNA

CORE Admin

LAB1100 is exploring new approaches to the discovery of patterns in historical texts.

LAB1100 is setting up a new project that aims to apply pattern recognition techniques developed in the field of bioinformatics to transcribed handwritten documents. The project will develop a tool that produces a data-driven index of terms from any type of textual data. Because the tool relies on recurring patterns rather than on semantics, it will be language-agnostic and able to cope with spelling variations and transcription errors.
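The announcement does not describe how the index will be built. As a hypothetical illustration only, the sketch below uses character k-mer counting, a standard bioinformatics technique for finding recurring subsequences. It treats each page as a plain character string, so it needs no knowledge of the language. The function names, `k=4`, and the `min_pages` threshold are assumptions made for this example, not LAB1100's design.

```python
from collections import defaultdict


def kmers(text, k):
    """Yield every overlapping character substring of length k (a "k-mer")."""
    s = text.lower()
    for i in range(len(s) - k + 1):
        yield s[i:i + k]


def build_index(pages, k=4, min_pages=2):
    """Hypothetical index builder: map each k-mer to the set of page ids
    it occurs on, keeping only k-mers that recur on at least `min_pages`
    pages. No dictionary or grammar is used, only raw characters."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for kmer in kmers(text, k):
            index[kmer].add(page_id)
    return {kmer: ids for kmer, ids in index.items() if len(ids) >= min_pages}


pages = {
    "p1": "the merchant of Amsterdam",
    "p2": "a marchant from Amsterdam",
    "p3": "the harbour",
}
index = build_index(pages)
```

Here the variant spellings "merchant" and "marchant" still share the k-mers "rcha", "chan" and "hant", so both pages are linked under those patterns despite the spelling difference.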

LAB1100 intends to rely on algorithms that were developed in the field of bioinformatics for discovering patterns in DNA sequences.
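The post does not name the specific algorithms. One well-known example from bioinformatics is Smith-Waterman local alignment, which compares two DNA or protein sequences and scores their best-matching region. The sketch below applies it to words to show why such methods tolerate spelling variation. It is an illustration, not LAB1100's chosen method, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are arbitrary assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b,
    using the Smith-Waterman dynamic programme with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] holds the best score of an alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, which is what makes the alignment local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


variant = smith_waterman("merchant", "marchant")
unrelated = smith_waterman("merchant", "harbour")
```

The spelling variant "marchant" scores 13 against "merchant" (one mismatch among eight letters), while the unrelated "harbour" scores only 4, so a threshold on the score can group variants without knowing the language.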

The purpose of this project is not to provide scholars with a tool that shows 'the most relevant' terms, but to create a heuristic tool that helps identify terms and locate the texts and pages in which these terms are used. The tool can also be used to find texts or pages with co-occurring patterns. The tool's API and user interactions will be provided by a nodegoat research environment that will ingest the indices, weights, and references to the pages and texts.
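The post does not specify how co-occurrence or weights will be computed. As one hypothetical approach, given an index that maps each pattern to the set of pages containing it, the pages where two patterns co-occur are the intersection of their page sets. A simple weight is the Jaccard similarity: shared pages divided by pages containing either pattern. The function name, the choice of Jaccard weighting, and the sample data are assumptions for illustration; this does not reflect nodegoat's actual ingestion format.

```python
def co_occurrence(index, a, b):
    """Return the sorted ids of pages on which patterns a and b both occur,
    plus a Jaccard weight (shared pages / pages containing either pattern).
    Patterns missing from the index are treated as occurring on no pages."""
    pages_a = index.get(a, set())
    pages_b = index.get(b, set())
    shared = pages_a & pages_b
    union = pages_a | pages_b
    weight = len(shared) / len(union) if union else 0.0
    return sorted(shared), weight


# Hypothetical pattern -> page-id index.
index = {"amst": {"p1", "p2", "p4"}, "chan": {"p1", "p2"}, "harb": {"p3"}}
shared, weight = co_occurrence(index, "amst", "chan")
```

For the sample index, "amst" and "chan" co-occur on pages p1 and p2 with a weight of 2/3, while "chan" and "harb" never co-occur and get a weight of 0.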
