Projects of the node for bacteria and archaea of GBIF-Germany
The node GBIF-D Bacteria and Archaea aims to open up data on prokaryotic organisms using innovative methods of data mobilization. The scale and density of available taxon-related information are to be increased substantially. This enhanced database will enable entirely new biodiversity analyses (distribution patterns, omics-based comparison with physiological data, the conformity of degradation or synthesis of natural products).
Specific biodiversity information on bacteria and archaea has so far been insufficiently available in digital form, for example information on habitats and biogeography, biochemistry and physiology, and detailed descriptions of culture conditions. This is especially true when this information is to be linked with the results of genomic and metagenomic analyses.
Even where relevant information exists in written form, such as formal descriptions of new taxa or records describing living bacterial collections, these sources of knowledge are often neither centrally located nor searchable in a structured way and accessible via the Internet.
Mobilisation of written and visual material
In a first step, the written source material is reviewed and digitized. Digitizing is only the first stage of processing: the digitized information is then converted into a structured, analysable format using text and data mining methods. These include techniques that extract unstructured text, tables or figures and transform them into a structured and searchable form. The results are stored in a database. The algorithms used have their origins in computational linguistics as well as statistical analysis and machine learning. The processed data and images are assigned to the respective taxa. The database content is semantically structured, edited and finally made publicly available via the Internet.
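The step from unstructured text to a structured, searchable record can be illustrated with a minimal sketch. It is not the node's actual pipeline: simple regular expressions stand in for the linguistic and machine-learning methods mentioned above, and the table name `culture_conditions`, the field names, the taxon label and the example sentence are all hypothetical and chosen only for illustration.

```python
import re
import sqlite3

# Hypothetical patterns for two culture conditions. Real taxon descriptions
# vary far more and would need more robust extraction than this.
TEMP_RE = re.compile(r"(\d+(?:\.\d+)?)\s*°C")
PH_RE = re.compile(r"pH\s*(\d+(?:\.\d+)?)")


def extract_record(taxon, text):
    """Turn a free-text description into a structured record (dict)."""
    temp = TEMP_RE.search(text)
    ph = PH_RE.search(text)
    return {
        "taxon": taxon,
        "temperature_c": float(temp.group(1)) if temp else None,
        "ph": float(ph.group(1)) if ph else None,
    }


def store(conn, record):
    """Store the record in a (hypothetical) table keyed by taxon name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS culture_conditions "
        "(taxon TEXT, temperature_c REAL, ph REAL)"
    )
    conn.execute(
        "INSERT INTO culture_conditions VALUES (:taxon, :temperature_c, :ph)",
        record,
    )


# Invented example sentence, not taken from any real description.
conn = sqlite3.connect(":memory:")
record = extract_record(
    "Example taxon",
    "Grows optimally at 37 °C and pH 7.2 on nutrient agar.",
)
store(conn, record)
rows = conn.execute(
    "SELECT taxon, temperature_c, ph FROM culture_conditions"
).fetchall()
print(rows)
```

Once the values are stored as typed columns assigned to a taxon, they can be queried and compared across taxa, which is what makes the digitized text analysable rather than merely readable.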