Proceedings of the

1st Beilstein Glyco-Bioinformatics Symposium

Bits 'n' Bytes of Sugars

4 – 8 October 2009, Potsdam, Germany

The articles of the conference proceedings are available in PDF format.

Download the complete proceedings book in EPUB format (6 MB).

Preface

Glycomics is an emerging field within the “omics” sciences that addresses the investigation of the structure-function relationships of complex biosynthesized carbohydrates and the role they play within biological systems.

The post-genomic era has seen an explosion of activities in the areas of genomics and proteomics in both fundamental research and biotechnology applications. Sequencing and synthesis of nucleic acids and proteins have been performed in an automated fashion for many years now; these important basic techniques are now being applied to carbohydrates. Through the work of a number of key laboratories around the world, significant scientific and technical advances are being made. As a result, an increasing number of data sets on important interactions of carbohydrates with proteins and nucleic acids are becoming available.

The scientific progress of both genomics and proteomics relies on interdisciplinary cooperation between laboratory-based and computer-based research. Such an interdisciplinary “glyco”-based community is currently starting to become established; this symposium is aimed at supporting these efforts by bringing together glycochemists and biologists with experts in bioinformatics and computer sciences to lay the groundwork for a concerted effort in the area of glyco-bioinformatics.

The symposium covered the use of publicly available data, data mining, structure prediction and docking of carbohydrates, and web-based services to combine proteomics and glycomics data for structure-function research and glycosylation analysis.

The meeting successfully paved the way towards the development of first ideas for the integration of glyco-bioinformatics in a universal platform that will serve biologists, chemists and all interested in glycosciences.

We would like to thank particularly the authors who provided us with written versions of the papers that they presented. Special thanks go to all those involved with the preparation and organization of the symposium, to the chairmen who piloted us successfully through the sessions and to the speakers and participants for their contribution in making this symposium a success.

Frankfurt/Main, December 2010

Martin G. Hicks
Carsten Kettner
Peter Seeberger

Glycoinformatics for Structural Glycomics

Stuart M. Haslam1 and David Goldberg2

1Division of Molecular Biosciences, Imperial College London

2Palo Alto Research Center, Palo Alto

Ultra-high sensitivity mass spectrometric strategies incorporating MALDI-MS/MS and nano-electrospray(ES)-MS/MS enable very complex mixtures of glycoproteins and glycolipids from biological extracts of cells and tissues to be screened, thereby revealing the types of glycans present and, importantly, providing clues to structures that are likely to be functionally important. However, in contrast to the genomic and proteomic areas, the glycosciences lack accessible, curated and comprehensive data collections that summarize the structure, characteristics, biological origin and potential function of glycans that have been experimentally verified and reported in the literature. This lack of glycan databases has been identified by glycobiologists as the single biggest hindrance to their research. Additionally, the sparseness of glycan databases hampers the realization of bioinformatics tools for the interpretation of experimental data and the automatic determination of glycan structure, thereby limiting the possibility of large-scale glycomics studies. The current status of the field and possible future developments are outlined.

Mining Patterns from Glycan Structures

Ichigaku Takigawa1,3, Kosuke Hashimoto1,2, Motoki Shiga1,3, Minoru Kanehisa1 and Hiroshi Mamitsuka1,3

1Bioinformatics Center, Institute for Chemical Research, Kyoto University

2DHHS/NIH/NLM, National Center for Biotechnology Information (NCBI), Bethesda

3Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Agency (JST), Saitama

Glycans can be represented as directed trees in which nodes are monosaccharides and edges are linkages extending between monosaccharides in the direction away from the amino acid-connecting monosaccharide. We present an efficient method for mining frequent and statistically significant subtrees from glycan trees. The two key points of the method are: (1) it reduces the number of redundant subtrees obtained by the usual frequent subtree mining techniques and (2) it keeps only significant subtrees by removing frequent but insignificant ones, such as those with only one or a few nodes. We confirmed the efficiency of the approach in various ways, including in terms of biological significance. Our approach should be useful for mining unknown conserved patterns in the larger glycan datasets that will be obtained in a high-throughput manner in the future.
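To make the tree representation concrete, the following is a minimal sketch of plain frequent-subtree counting over glycan trees. It is not the authors' algorithm: it enumerates every connected subtree by brute force and applies only a support threshold, without the redundancy reduction or the statistical significance filter described above. The nested-tuple encoding, the function names and the linkage labels are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a glycan node is (monosaccharide, [(linkage, child_node), ...]),
# with the root being the monosaccharide attached to the amino acid.

def rooted_subtrees(node):
    """Canonical strings of all connected subtrees whose root is `node`."""
    label, children = node
    options = []
    for linkage, child in children:
        # Each child branch is either dropped (None) or kept as one of its subtrees.
        options.append([None] + [f"{linkage}:{s}" for s in rooted_subtrees(child)])
    patterns = set()
    for combo in product(*options):
        kept = sorted(c for c in combo if c is not None)  # sort: branch order is irrelevant
        patterns.add(label + ("(" + ",".join(kept) + ")" if kept else ""))
    return patterns

def all_subtrees(node):
    """Canonical strings of all connected subtrees rooted anywhere in the glycan."""
    found = set(rooted_subtrees(node))
    for _, child in node[1]:
        found |= all_subtrees(child)
    return found

def frequent_subtrees(glycans, min_support):
    """Patterns occurring in at least `min_support` glycans, with their support."""
    support = Counter()
    for glycan in glycans:
        support.update(all_subtrees(glycan))  # a set, so each glycan counts once
    return {p: n for p, n in support.items() if n >= min_support}

# Toy input: an N-glycan core and a truncated variant lacking the a6 branch.
core = ("GlcNAc", [("b4", ("GlcNAc", [("b4", ("Man", [
    ("a3", ("Man", [])), ("a6", ("Man", []))]))]))])
truncated = ("GlcNAc", [("b4", ("GlcNAc", [("b4", ("Man", [
    ("a3", ("Man", []))]))]))])
frequent = frequent_subtrees([core, truncated], min_support=2)
```

On this toy input, the pattern `Man(a3:Man)` is shared by both glycans, whereas the doubly branched `Man(a3:Man,a6:Man)` occurs only in the core and is dropped. Brute-force enumeration grows exponentially with branching, which is one reason dedicated mining methods are needed for real datasets.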

Bioinformatics - Key to the Future of Chemical Glycomics

Peter H. Seeberger

Max Planck Institute for Colloids and Surfaces, Potsdam

Free University of Berlin

The Burnham Institute, La Jolla

The glycome is more complex than either the genome or the proteome. Efforts to understand glycomics are producing information regarding the structure and function of carbohydrates. Branching and the stereochemistry of the glycosidic linkage render carbohydrates much more complex than oligonucleotides and proteins. Bioinformatics is a key technology to extract the information relayed via glycans. Three major classes of mammalian carbohydrates, glycolipids, O- and N-linked glycans, were analyzed based on the largest available database. The average oligosaccharide is composed of about eight monosaccharide units and while about a quarter of all oligosaccharides are strictly linear, the remainder are branched at least once. Glucosamine, galactose and mannose dominate and comprise about 75% of the monosaccharides within mammalian oligosaccharide frameworks. α-linked sialic acid, α-linked fucose and β-linked galactose decorate the majority of reducing termini. Glucose, the most abundant carbohydrate in mammals, plays only a very minor role within these structures. Particular emphasis was placed on analyzing the way the monosaccharide units are linked within the oligomeric framework. Just eleven monosaccharide connections account for more than 75% of all linkages.
Thus, the number of structural combinations found in nature – the part of the occupied mammalian glycospace – is much smaller than expected. Only 36 monosaccharide building blocks are required to construct 75% of the 3299 mammalian oligosaccharides.
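Coverage statements such as "eleven connections account for more than 75% of all linkages" come down to a cumulative-frequency calculation. The sketch below shows that calculation under stated assumptions: the function name is hypothetical and the linkage counts are invented toy numbers, not values from the database analyzed in the article.

```python
from collections import Counter

def items_needed_for_coverage(counts, target=0.75):
    """Smallest number of most-frequent items whose counts reach `target` of the total."""
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= target * total:
            return rank
    return len(counts)

# Invented toy tally of linkage types (NOT data from the article).
toy_linkages = Counter({
    "Gal(b1-4)GlcNAc": 50,
    "Man(a1-3)Man": 30,
    "Fuc(a1-6)GlcNAc": 15,
    "Glc(b1-4)Glc": 5,
})
needed = items_needed_for_coverage(toy_linkages, target=0.75)
```

With these toy counts, the two most frequent linkage types already cover 80 of 100 observations, so `needed` is 2. Applied to a real tally of monosaccharides or linkages, the same function would yield figures of the kind quoted in the abstract.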

Automated N-Glycan Composition Analysis with LC-MS/MSMS

Hannu Peltoniemi1, Ilja Ritamo2, Jarkko Räbinä2 and Leena Valmu2

1Applied Numerics Ltd., Helsinki

2Finnish Red Cross Blood Service, R&D, Helsinki

Compared to proteomics, mass spectrometric glycan analysis still involves a lot of manual work, and differential glycomics can become a burden as the number of spectra increases. Our aim is to ease these tasks by using in-house developed glycomic software in combination with existing proteomics tools. The resulting workflow is targeted especially at glycan LC-MS/MSMS analytics and can be run with a minimal amount of human intervention. Here the method was applied to cell surface N-glycans from umbilical cord blood derived mononuclear cells. The final goal is to profile and differentiate the stem cell surface glycans which are being analysed at the Finnish Red Cross Blood Service.
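As an illustration of what composition analysis involves at its simplest, the sketch below enumerates monosaccharide compositions whose calculated mass matches an observed neutral monoisotopic mass. It is not the authors' software: the function name, the residue set, the tolerance and the assumption of an underivatized glycan with a free reducing end are all choices made for this example. The residue masses are standard monoisotopic values.

```python
from itertools import product

# Monoisotopic residue masses in Da (monosaccharide minus one water).
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}
WATER = 18.01056  # added once for the free reducing end (assumed underivatized glycan)

def candidate_compositions(neutral_mass, tol=0.02, max_count=12):
    """All residue-count combinations whose mass lies within `tol` Da of `neutral_mass`."""
    names = list(RESIDUE_MASS)
    hits = []
    # Brute force over 0..max_count copies of each residue type.
    for counts in product(range(max_count + 1), repeat=len(names)):
        mass = WATER + sum(n * RESIDUE_MASS[name] for n, name in zip(counts, names))
        if abs(mass - neutral_mass) <= tol:
            hits.append(dict(zip(names, counts)))
    return hits

# 1234.4334 Da is the calculated neutral mass of Hex5HexNAc2 (Man5GlcNAc2).
hits = candidate_compositions(1234.4334)
```

The returned list includes the composition Hex5HexNAc2. A production workflow would additionally need to handle adducts, charge states and derivatization, and would use the MS/MS data to discriminate between isobaric candidates.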

Glycoinformatic Platforms for Data Interpretation: An HPLC Perspective

Matthew P. Campbell, Natalia V. Artemenko and Pauline M. Rudd

NIBRT Dublin-Oxford Glycobiology Laboratory, National Institute for Bioprocessing Research and Training, Conway Institute, University College Dublin

High-throughput and automated HPLC techniques allow for the rapid, detailed structural analysis of complex glycans. These advances have the potential for (i) validating a new generation of biomarkers that relate alterations in glycan processing to disease by mining the glycosylation patterns in a variety of disease types and (ii) monitoring the production of therapeutic glycoproteins. The wealth of knowledge that can be generated justifies the requirement for databases, analytical tools and search facilities. Recent efforts by international consortia have increased the awareness of the need for glycoinformatics and several resources are now available, including a suite of novel applications to assist the interpretation of HPLC data collections.

ProSim: Development of a User-friendly Molecular Modelling Package

Hans Heindl1, Tamas Kiss2, Gabor Terstyanszky2, Noam Weingarten2, Babak Afrough3 and Pamela Greenwell1

1School of Life Sciences, University of Westminster, London

2Centre for Parallel Computing, School of Electronics and Computing, University of Westminster, London

3Institute for Women's Health, University College London

We have developed and tested a user-friendly automated molecular modelling workflow with a web-based interface. The workflow has been tested using protein:drug, enzyme:substrate and lectin:sugar interactions. The work presented here involves studies using three glycosidases (a human mannosidase, a viral neuraminidase and a protozoan sialidase). We have illustrated the utility of the workflow using the mannosidase and a thio-saccharide inhibitor. We have shown the limitations of such in silico technology when working with enzymes like the viral neuraminidase, in which dynamic conformational changes take place during binding or where access to the active site is blocked by a loop or a single residue. The viral neuraminidase inhibitor does not bind to the protozoan sialidase, suggesting that such inhibitors would have no use in therapeutic strategies.

GlycoOptimization for Fully Human and Largely Improved Biopharmaceutical Antibodies and Proteins

Steffen Goletz, Antje Danielczyk, Renate Stahn, Uwe Karsten, Lars Stoeckl, Anja Loeffler, Annett Hillemann and Hans Baumeister

Glycotope GmbH, Berlin

The vast majority of therapeutic proteins are, by nature, glycosylated. In recent years the attached glycans have attracted more and more attention, since an increasing amount of data shows that glycosylation greatly affects the biopharmaceutical characteristics of the product. This article focuses on the novel GlycoExpress technology, which allows screening for the optimal glycosylation of any protein and production of the glycooptimized biopharmaceutical for clinical use. GlycoExpress is a toolbox of human, glyco-engineered cell lines to express any protein with fully human glycosylation and in a variety of differentially glycosylated isoforms. The optimally glycosylated protein is identified in a number of in vitro and in vivo assays, and the corresponding cell line is further developed for GMP production of that glycooptimized biopharmaceutical.
Twelve proteins have been successfully glycooptimized so far, with four antibodies and one protein hormone in late preclinical or early clinical development. Glycooptimization improved these biologics up to several hundredfold with respect to activity, bioavailability, immunogenicity and/or patient coverage. It is expected that in the near future a number of novel biotherapeutics will be developed whose therapeutic or economic benefit rests upon a fully human and optimized glycosylation.

Structural Glycomics – Molecular Details of Protein-Carbohydrate Interactions and their Prediction

Dirk Neumann1 and Oliver Kohlbacher1,2

1neumann scientific computing & consulting, Berlin

2Center of Bioinformatics, Division for Simulation of Biological Systems, University of Tübingen

Protein-ligand docking is an essential technique in computer-aided drug design. While generally available docking programs work well for most drug classes, carbohydrates and carbohydrate-like compounds are often problematic for docking. We discuss the peculiarities of protein-carbohydrate interactions and their impact on protein-carbohydrate docking and review the state of the art in docking of carbohydrates to proteins. Finally, we give an overview of carbohydrate docking studies and present a new docking method specifically designed to handle docking of carbohydrate-like compounds. BALLDock/SLICK combines an evolutionary docking algorithm for flexible ligands and flexible receptor side chains with carbohydrate-specific scoring and energy functions. The scoring function has been designed to identify accurate ligand poses, while the energy function yields accurate estimates of the binding free energies of these poses. On a test set of known protein-sugar complexes we demonstrate the ability of the approach to generate correct poses for almost all of the structures and achieve very low mean errors for the predicted binding free energies.

Software Tools for Storing, Processing and Displaying Carbohydrate Microarray Data

Mark Stoll and Ten Feizi

The Glycosciences Laboratory, Imperial College London

We describe a suite of software modules to store, retrieve and display carbohydrate microarray data. Storage is in a relational database that holds all the microarray data and associated glycan, protein and experimental information. The retrieval and display software has a comprehensive system of sorters, filters and arrangers to allow highly customized presentation of data as charts, tables, 2D matrices and array graphics. Matrices allow arrangement of proteins along one axis and glycans along the other, so that comparisons can be made between the binding patterns of proteins. Sorting and filtering include a large assortment of built-in parameters that range from glycan features to data grouping in slides and experiments, but may also be completely customized to suit individual needs. Charts, tables and matrices are customizable to maximize presentation clarity. There are customizable automatic chart titles, chart axis annotation and scaling, table layouts, matrix arrangements and colour schemes for all graphics. All display output from the software can be saved or printed for permanent record.

SysBioWare: Structure Assignment Tool for Automated Glycomics

Sergey Y. Vakhrushev1, Denis Dadimov2 and Jasna Peter-Katalinic3,4

1Genome Biology Unit, Quantitative Proteomics, EMBL, Heidelberg

2MechSystemProject U.E., Bioinformatics Division, Minsk

3Department of Biotechnology, University of Rijeka

4Institute of Pharmaceutical Biology and Phytochemistry, University of Münster

Glycomics, as a part of systems biology closely related to proteomics, encompasses knowledge acquired by comprehensive and systematic studies of the entire complement of carbohydrates in a cell, organ or organism. The prerequisite for these studies is detailed information on the molecular structure of complex carbohydrates, which play a crucial role in processes like signalling, cell-cell recognition and immune response, and which act as therapeutic agents, vaccines or drug targets. Although the modern methods of mass spectrometry are well suited to integrative “omics” experiment design, the interpretation of carbohydrate mass spectral data still depends strongly on human expertise. In this chapter we present a computational approach for automatic interpretation of mass spectral data of complex carbohydrates. We contribute to the field by designing a software package which will significantly reduce the need for human expertise in the interpretation of mass spectrometric data derived from glycoconjugates and enable the discovery and improvement of high-throughput protocols for automated glycomics.
The proposed structure assignment tool, named SysBioWare, was constructed for automated processing of raw MS and MS/MS data, performing isotopic grouping of detected peaks after de-noising and wavelet analysis. Monoisotopic m/z values associate the peak list with the raw MS spectrum and allow compositional assignment according to the tuned building block library. This platform has been applied to the human urinome and glycolipidome as a potent tool for rapid assignment of already known and/or new carbohydrate structures.
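The isotopic grouping step can be illustrated with a simple greedy sketch that collects peaks separated by about 1.00335/z in m/z and reports the monoisotopic m/z and charge of each group. This is not the SysBioWare implementation: the function name, the tolerance and the greedy strategy are assumptions, and intensity patterns, de-noising and wavelet analysis are ignored.

```python
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da

def group_isotopes(mzs, max_charge=3, tol=0.01):
    """Greedily group m/z values into isotope envelopes; return one dict per group."""
    peaks = sorted(mzs)
    used = set()
    groups = []
    for i, mz in enumerate(peaks):
        if i in used:
            continue
        best_z, best_members = 1, [i]  # a lone peak is assumed to be singly charged
        for z in range(1, max_charge + 1):
            members = [i]
            expected = mz + ISOTOPE_SPACING / z
            for j in range(i + 1, len(peaks)):
                if j in used:
                    continue
                if abs(peaks[j] - expected) <= tol:
                    members.append(j)
                    expected = peaks[j] + ISOTOPE_SPACING / z  # chain from the matched peak
                elif peaks[j] > expected + tol:
                    break  # sorted input: no later peak can match
            if len(members) > len(best_members):
                best_z, best_members = z, members
        used.update(best_members)
        groups.append({"mono_mz": mz, "charge": best_z, "n_peaks": len(best_members)})
    return groups

# Toy peak list: one doubly charged envelope plus an isolated peak.
groups = group_isotopes([1000.0, 1000.5, 1001.0, 1001.5, 1200.0])
```

For this toy list, the first four peaks form a single group with charge 2 and monoisotopic m/z 1000.0, and the peak at 1200.0 remains a singleton. The monoisotopic values obtained this way are the natural input for a compositional assignment against a building block library.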