Open Source

The Beilstein-Institut supports a number of open source initiatives that are hosted on GitHub. Here we summarize the outputs from our BeilsteinChemInfo Labs projects. All software, tools, and lookup tables that we publish are released under the most liberal open source licenses available, to encourage the community not only to use and share this work but also to work with us to develop these outputs further for the chemistry community.

BChemLookup – An Open Science Initiative for Mapping Common Chemical Abbreviations

 

In chemical drawings or reaction schemes, it is common to use abbreviated labels to represent (larger) substructures. Unfortunately, these abbreviations are not standardized, and they are difficult to digitize without a mapping between each abbreviation and the correct chemical name. The Beilstein Chemical Lookup Collection is an open, community-driven resource dedicated to collecting and standardizing chemical abbreviations and shorthand notations used in structure drawing, reaction schemes, and synthetic chemistry. This open-science repository aims to gather, curate, and share a comprehensive set of chemical lookup tables. Here is an example of the normalized and curated information in the abbreviation lookup table, presented in human-readable form:

Abbreviation  Meaning              SMILES/SMARTS  Notes
Bn            benzyl               c1ccccc1CH2    common protecting group
Et            ethyl                CC             alkyl chain
Ph            phenyl               c1ccccc1       benzene ring
Boc           tert-butoxycarbonyl  OC(C)(C)C=O    amine protecting group
t-Bu          tert-butyl           C(C)(C)C       often used as steric bulk

The goal of this project is to create comprehensive lists of the chemical abbreviations used in academic literature, chemical drawing software, and reaction databases, and to provide them in machine-readable formats for use in parsers, renderers, and other cheminformatics tools. We also aim to identify ambiguities, overloads, and regional variations, and to enable community contribution and review for open data curation.
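To illustrate how a parser might consume such a machine-readable table, here is a minimal Java sketch. It assumes a tab-separated layout with the four columns shown in the example above; the actual file formats in the repository may differ, and the class name AbbreviationLookup and its methods are hypothetical, not part of BChemLookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BChemLookup: loads a tab-separated
// abbreviation table (assumed layout) into an in-memory lookup.
final class AbbreviationLookup {

    /** One row of the table: abbreviation, meaning, SMILES/SMARTS, notes. */
    static final class Entry {
        final String abbreviation;
        final String meaning;
        final String smiles;
        final String notes;

        Entry(String abbreviation, String meaning, String smiles, String notes) {
            this.abbreviation = abbreviation;
            this.meaning = meaning;
            this.smiles = smiles;
            this.notes = notes;
        }
    }

    /**
     * Parses tab-separated rows; the first line is treated as the header.
     * Keys are case-sensitive. Each key maps to a list so that overloaded
     * abbreviations (one label, several meanings) are kept, not overwritten.
     */
    static Map<String, List<Entry>> parse(List<String> lines) {
        Map<String, List<Entry>> table = new LinkedHashMap<>();
        for (int i = 1; i < lines.size(); i++) {
            String[] cols = lines.get(i).split("\t", -1);
            if (cols.length < 4) {
                continue; // skip malformed rows instead of guessing columns
            }
            Entry entry = new Entry(cols[0], cols[1], cols[2], cols[3]);
            table.computeIfAbsent(entry.abbreviation, k -> new ArrayList<>()).add(entry);
        }
        return table;
    }

    public static void main(String[] args) {
        // Rows taken from the example table above, joined with tabs.
        List<String> rows = Arrays.asList(
            "Abbreviation\tMeaning\tSMILES/SMARTS\tNotes",
            "Et\tethyl\tCC\talkyl chain",
            "Ph\tphenyl\tc1ccccc1\tbenzene ring",
            "t-Bu\ttert-butyl\tC(C)(C)C\toften used as steric bulk");

        Map<String, List<Entry>> table = parse(rows);
        for (Entry e : table.get("Ph")) {
            System.out.println(e.abbreviation + " -> " + e.meaning + " (" + e.smiles + ")");
        }
    }
}
```

Mapping each abbreviation to a list rather than a single entry is one possible way to surface the ambiguities and overloads mentioned above: any key with more than one entry is a candidate for community review.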

All data in this repository is released under the CC0 1.0 Universal license. Anyone may use it freely in research, software, education, or commercial applications. For more information, visit the Beilstein BChemLookup GitHub repository:

https://github.com/Beilstein-Institut/BChemLookup

BChemXtract – A Pure-Java Extractor of ChemDraw Structures and Reactions

One of the first steps toward making chemical knowledge FAIR and enabling data-driven science is to unlock the structural data in ChemDraw files. The open-source Java parser BChemXtract (Beilstein Chemical Extraction) extracts chemical structures and reactions (the latter with experimental status) from ChemDraw CDX and CDXML files. It was initially developed to extract chemical structures and make them FAIR for embedding into the Beilstein Diamond Open Access journals. However, the tool was developed and released as a stand-alone software toolkit so that it can be integrated into other systems, for example any publishing system. Furthermore, BChemXtract is not limited to extracting chemical structures from articles in chemical journals; it can also be used simply to process CDX/CDXML files.

The project aims to provide a pure-Java implementation for reading ChemDraw files (binary and XML) as well as extracting and validating meaningful chemical structures and reactions from the parsed file.
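To give a feel for what reading the XML variant involves, the following sketch uses only the standard Java XML parser, not the BChemXtract API (which is not shown here). It assumes the CDXML convention that structures are stored as fragment elements containing n (node/atom) and b (bond) elements; the class CdxmlFragmentCounter and the hand-written sample document are hypothetical and heavily simplified.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical illustration, not the BChemXtract API: counts the raw
// structural elements in a CDXML document with the JDK's DOM parser.
final class CdxmlFragmentCounter {

    /** Returns {fragment count, node (atom) count, bond count}. */
    static int[] count(String cdxml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(cdxml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            // getElementsByTagName also matches nested fragments, so these
            // are raw element counts, not validated molecules.
            return new int[] {
                root.getElementsByTagName("fragment").getLength(),
                root.getElementsByTagName("n").getLength(),
                root.getElementsByTagName("b").getLength()
            };
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Simplified sample: one fragment with two nodes joined by one bond.
        String sample = "<CDXML><page><fragment>"
            + "<n id=\"1\" p=\"100 100\"/>"
            + "<n id=\"2\" p=\"120 100\" Element=\"8\"/>"
            + "<b B=\"1\" E=\"2\"/>"
            + "</fragment></page></CDXML>";

        int[] counts = count(sample);
        System.out.println("fragments=" + counts[0]
            + " atoms=" + counts[1] + " bonds=" + counts[2]);
    }
}
```

Counting elements is only the first step. Turning them into meaningful, validated structures and reactions requires interpreting labels, nested fragments, and reaction arrows, and the binary CDX format needs a dedicated reader; that is the work a dedicated extractor has to do.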

Structures and reactions can be enriched with calculated chemical descriptors such as InChI and SMILES (or RInChI and reaction SMILES, respectively). This project has been piloted by the Beilstein Journals to disseminate FAIR chemical structures; more information about how the software was integrated can be found in the Chemical Structures section.

All code in this repository is released under the MIT license. Anyone may use it freely in research, software, education, or commercial applications. For more information, visit the Beilstein BChemXtract GitHub repository:

https://github.com/Beilstein-Institut/BChemXtract