Niedersächsische Staats- und Universitätsbibliothek Göttingen
Image: data streams, AI-generated (Pixabay, Sayedur Rahman)

Text and Data Mining

SUB Göttingen participates in projects on text and data mining, develops tools for natural language processing, and provides text resources as well as text and data mining tools.

MONAPipe – Modes of Narration and Attribution Pipeline

MONAPipe stands for ‘Modes of Narration and Attribution Pipeline’ and offers natural language processing tools for the German language, implemented in Python/spaCy. In addition to the components provided by spaCy, MONAPipe offers specific components and models for digital humanities and computational literary studies.

MONAPipe was originally created in the MONA project group and is now being further developed within the Text+ infrastructure.

You can find a description of the tool and the MONAPipe documentation in the SSH Open Marketplace.

MINE – Text Mining Service for Digital Resources

Project Objective

The MINE project aims to pool text resources available on the Göttingen Campus or provided by partners around the world. The service will then allow full-text searches and searches via metadata, which will also include results from text and data mining tools. These results will also be made available in a knowledge graph.

Service Infrastructure for Text and Data Mining

MINE is developing a service infrastructure for text and data mining (TDM) that will become a permanent campus service once the project ends. The aim is to provide researchers and digital services with easy and direct access to TDM tools and text resources. MINE not only enables searches of existing data and metadata but also enriches the metadata using ready-made TDM tools. The enriched results are stored in a knowledge graph, which offers new opportunities for exploring the available resources.

Currently, the service offers searches in approximately 7 million data records from various data sources, which are combined in a normalised data model. The technical infrastructure is still under development and is continually being expanded with new tools and additional text resources.
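To illustrate what combining records from various sources in a normalised data model can look like, here is a minimal sketch in Python. It is not MINE's actual data model: the class `NormalisedRecord`, the two source formats, and all field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NormalisedRecord:
    """Hypothetical common shape shared by records from all sources."""
    source: str
    identifier: str
    title: str
    creators: Tuple[str, ...] = ()
    year: Optional[int] = None


def from_catalogue(raw: dict) -> NormalisedRecord:
    """Map a hypothetical catalogue export (creators as one ';'-separated string)."""
    creators = tuple(
        name.strip() for name in raw.get("authors", "").split(";") if name.strip()
    )
    # The hypothetical catalogue stores dates as ISO strings, e.g. "1808-01-01".
    year = int(raw["date"][:4]) if raw.get("date") else None
    return NormalisedRecord("catalogue", raw["id"], raw["title"].strip(), creators, year)


def from_repository(raw: dict) -> NormalisedRecord:
    """Map a hypothetical repository export (creators as a list, year as an int)."""
    return NormalisedRecord(
        source="repository",
        identifier=raw["handle"],
        title=raw["dc_title"].strip(),
        creators=tuple(raw.get("dc_creator", [])),
        year=raw.get("issued"),
    )


# Invented sample inputs in the two hypothetical source formats.
catalogue_item = {
    "id": "cat-001",
    "title": " Faust ",
    "authors": "Goethe, Johann Wolfgang von",
    "date": "1808-01-01",
}
repository_item = {
    "handle": "repo/42",
    "dc_title": "Effi Briest",
    "dc_creator": ["Fontane, Theodor"],
    "issued": 1895,
}

records = [from_catalogue(catalogue_item), from_repository(repository_item)]

# Once normalised, a single filter works across both sources.
hits = [r for r in records if r.year is not None and r.year < 1850]
```

Each source gets its own small mapping function, so a new data source can be added without changing the search logic that operates on the common record type.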

Access

You can access the prototype at https://mine-graph.de/. Some functions are only available on the Göttingen Campus. MINE provides various REST endpoints that other systems can use. There is a Python client library and an Orange widget for integrating text resources into your own pipelines or tools. 
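As a rough idea of how another system might call a REST endpoint of this kind, the following Python sketch builds and sends a search request using only the standard library. Only the base URL comes from this page. The path `api/search`, the parameters `q` and `limit`, and the JSON response format are assumptions made for illustration; please consult the MINE team or the service documentation for the actual endpoints, or use the Python client library mentioned above.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

BASE_URL = "https://mine-graph.de/"  # prototype address given on this page
SEARCH_PATH = "api/search"           # ASSUMPTION: hypothetical path, not a documented endpoint


def build_search_url(query: str, limit: int = 10) -> str:
    """Build a GET URL; the parameter names `q` and `limit` are assumptions."""
    params = urlencode({"q": query, "limit": limit})
    return urljoin(BASE_URL, SEARCH_PATH) + "?" + params


def search(query: str, limit: int = 10, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming the service returns JSON."""
    request = Request(
        build_search_url(query, limit),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access; search() is only defined here, not called.
url = build_search_url("Göttingen Sieben", limit=5)
print(url)
```

Keeping URL construction separate from the network call makes the request easy to inspect and test before it is sent to the service.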

MINE is being developed in collaboration with the Göttingen Scientific Data Processing Association (GWDG).

If you have any further questions or would like to receive full access, please contact the MINE team.