Proceedings of

the Beilstein Bozen Symposium

Molecular Informatics:

Confronting Complexity

13 – 16 May 2002 in Bozen, Italy



The Beilstein-Institut organizes and sponsors scientific meetings, workshops and seminars, with the aim of catalysing advances in chemical science by facilitating the interdisciplinary exchange and communication of ideas amongst the participants.

This workshop – Molecular Informatics: Confronting Complexity – addressed some of the new challenges that face scientists in the post-genome era, in particular, the integration of two, until recently, disparate sciences – chemistry and biology. The underlying theme of the workshop was to gain insight into the behaviour of biological and molecular systems through the application of molecular informatics.

The flood of data being generated as a result of research into genomics and proteomics is often overwhelming. Well-publicised successes tend to draw the focus away from some of the significant issues relating to a better understanding of molecular systems, many of which are still far from clear. Whereas the development of predictive models based on analogy has been very successful in chemistry and cheminformatics, the non-linear nature of biomolecular systems, often with multiple pathways, restricts similar transference within bioinformatics. However, without a critical analysis taking into account the assumptions and limitations of hypotheses and predictive models, advances in molecular informatics will not achieve lasting significance. Before such analysis can be carried out effectively, more effort needs to be made to bridge the gap between chemists, who deal with the structure and properties of molecules, and biologists, who work with complex molecular and cell-physiological systems.

Participants as well as speakers were confronted with the following complex challenges from cheminformatics and bioinformatics: knowledge discovery and data mining, rational drug design, prediction of small-molecule bioavailability (ADME/Tox) properties, protein structure and function determination, new methods of drug-target modeling, cellular metabolism and metabolic pathways, and the use of high-throughput methods (biochips, X-ray crystallography) for acquiring gene expression, protein structure and binding information.

This meeting did not set out to solve all problems, but to initiate a dialogue between scientists of different disciplines. Over the three days of the workshop, the participants not only heard excellent talks and took part in lively discussions, but also, in the time between the official sessions of the scientific program, exchanged ideas and thoughts and generally made a valuable personal contribution to bridging the gap!

We would like to thank particularly the authors who provided us with written versions of the papers that they presented. Special thanks go to all those involved with the preparation and organization of the workshop, to the chairmen who piloted us successfully through the sessions, and to the speakers and participants for their contribution in making this workshop a success.

Frankfurt / Main, February 2003

Martin G. Hicks
Carsten Kettner


Carsten Kettner

Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Frankfurt am Main.

With the availability of complete genome sequences, emphasis has shifted toward the understanding of protein function, and this in turn has opened up a new "-omics" field, namely functional proteomics. Structural studies of proteins are only one aspect of functional proteomics and are mostly carried out by computational means. However, these investigations must be complemented by functional studies, yielding structure/function relationships, and this can only be accomplished at the lab bench.

Some examples of comprehensive investigations of transport proteins of the yeast Saccharomyces cerevisiae can be used to illustrate these relationships. This research includes various methods and tools for visualisation, sequencing and annotation. In addition, the transport activity of a tonoplast-residing proton pump has been studied in detail by biophysical approaches. The results of these investigations into structure/function relationships demonstrate the fruitful cooperation of so-called traditional "wet" biology and computational biology.


Christopher M. Dobson

Department of Chemistry, University of Cambridge.

The ability of proteins to fold to their functional states following synthesis on the ribosome is one of the most remarkable features of biology. The sequences of natural proteins have emerged through evolutionary processes such that their unique native states can be found very efficiently even in the complex environment inside a living cell. But under some conditions proteins fail to fold correctly, or to remain correctly folded, in living systems, and this failure can result in a wide range of diseases. One group of diseases, known as amyloidoses, which includes Alzheimer’s and the transmissible spongiform encephalopathies, involves deposition of aggregated proteins in a variety of tissues. These diseases are particularly intriguing because evidence is accumulating that the formation of the highly organized amyloid aggregates is a generic property of polypeptides, and not simply a feature of the few proteins associated with recognized pathological conditions. Moreover, such aggregates appear to possess inherent toxicity. That aggregates of this type are not normally found in properly functioning biological systems is a further testament to the efficiency of biological evolution, in this case resulting in the emergence of a variety of mechanisms inhibiting their formation. Understanding the nature of such protective mechanisms is a crucial step in the development of strategies to prevent and treat these debilitating diseases.


Richard A. Goldstein

Siena Biotech, Siena, Italy.

The genomic data available to computational biologists represents the product of the complex processes of evolution. In particular, the forces of mutation, duplication, and selection have acted to sculpt modern protein sequence and structure in the context of changing functional requirements. Just as crystallographers are able to determine protein structures through an analysis of X-ray diffraction patterns, we wish to read the evolutionary history of proteins in order to understand their structures, functions, and interactions. To this end, we have been developing models of natural site substitutions that are informed by the protein structure and function and the resulting variations in selective pressures, even when the structure and function of the protein are unknown. By phrasing the substitution process in terms of the underlying properties of the constituent amino acids we can build models that are both much more accurate and more interpretable. The model is applied to a large set of globular proteins as well as a set of G-protein-coupled receptors, identifying general structural and functional features of these biomolecules.


W. Graham Richards

Department of Chemistry, University of Oxford, Central Chemistry Laboratory, Oxford.

Computational methods developed in the area of medical imaging can be adapted to find ligand binding sites on proteins. Once the binding site is specified, libraries of real or virtual molecules may be screened to seek out compounds which have very strong affinity. Massively distributed computing enables huge numbers of molecules to be screened.

These approaches will be illustrated by reference to a search for inhibitors of the binding of anthrax lethal factor to the protective antigen. With the site identified, some 3.5 billion molecules were tested in 24 days using the power of 1.4 million personal computers running a screen saver. Over 300,000 hits were revealed, with approximately 12,000 looking particularly promising.
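A quick back-of-the-envelope check, using only the figures quoted above, shows what this distributed screen amounted to per machine. The rates are averages for illustration only, since machines joined and left the project over the 24 days:

```python
# Rough throughput arithmetic for the distributed anthrax screen,
# using the figures quoted in the text (averages only).
molecules = 3_500_000_000   # compounds screened
days = 24                   # duration of the screen
machines = 1_400_000        # participating personal computers

per_machine_total = molecules / machines      # compounds per PC overall
per_machine_daily = per_machine_total / days  # compounds per PC per day
overall_daily = molecules / days              # compounds screened per day

print(f"{per_machine_total:.0f} compounds per PC in total")   # 2500
print(f"{per_machine_daily:.0f} compounds per PC per day")    # ~104
print(f"{overall_daily:.2e} compounds screened per day")      # ~1.46e+08
```

In other words, each volunteer machine handled on the order of a hundred compounds a day, while the collective screened well over a hundred million compounds daily.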


Christopher A. Lipinski

Pfizer Global Research and Development, Groton Laboratories, Connecticut, USA.

Poor aqueous solubility is the largest physicochemical problem hindering drug oral activity. Among combinatorial libraries, poor solubility is a frequently encountered problem but poor permeability is seldom a problem. The relative importance of poor solubility vs. poor permeability as a source of poor oral activity depends on the method of lead generation. Solubility or permeability problems are not purely a technical issue of assay design or computational prediction. People and organizational issues are extremely important. A computational ADME filter like the "rule of 5" is most effective when used prior to the beginning of experimentation.
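The "rule of 5" referred to above lends itself to a compact illustration. A minimal sketch, assuming the four classic thresholds (molecular weight, logP, H-bond donors, H-bond acceptors) and the common convention of flagging compounds with two or more violations; the function name and the toy property values are our own, for illustration only:

```python
# A minimal sketch of a Lipinski "rule of 5" filter. Thresholds are
# the published ones; the flagging convention (two or more violations
# => poor absorption more likely) follows common usage.

def rule_of_five_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count rule-of-5 violations for one compound."""
    violations = 0
    if mol_weight > 500:     # molecular weight in Daltons
        violations += 1
    if logp > 5:             # calculated octanol/water partition coefficient
        violations += 1
    if h_donors > 5:         # H-bond donors (OH and NH groups)
        violations += 1
    if h_acceptors > 10:     # H-bond acceptors (N and O atoms)
        violations += 1
    return violations

# Hypothetical property values, purely for illustration.
candidates = {
    "compound_a": (320.4, 2.1, 2, 5),    # drug-like
    "compound_b": (712.9, 6.3, 7, 14),   # violates all four criteria
}
for name, props in candidates.items():
    n = rule_of_five_violations(*props)
    print(f"{name}: {n} violation(s) -> {'pass' if n < 2 else 'flag'}")
```

Applying such a filter computationally, before any compound is synthesized or assayed, is precisely the "prior to the beginning of experimentation" usage the abstract advocates.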


Christoph Helma, Stefan Kramer and Luc De Raedt

Albert-Ludwigs-Universität Freiburg, Institut für Informatik, Freiburg im Breisgau, Germany.

Inductive databases are a new generation of databases that are capable of dealing not only with data but also with patterns or regularities within the data. A user can generate, manipulate and search for patterns of interest using an inductive query language. Data mining then becomes an interactive querying process.

The inductive database framework is especially interesting for bio- and chemoinformatics, because of the large and complex databases that exist in these domains and the lack of methods to gain scientific knowledge from them. In this article we present an example of an inductive database: Molfea, the Molecular Feature Miner, which mines for linear fragments in the 2D structure of chemical compounds. In the methodological part we explain the inner workings of the Molfea algorithm using a simple example. In the second part we present applications to the NCI DTP AIDS Antiviral Screen database and to several benchmark structure-activity relationship problems in toxicology.
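The flavour of level-wise fragment mining under a minimum-support constraint can be sketched in a few lines if we simplify drastically: here molecules are treated as plain strings and "linear fragments" as substrings, whereas Molfea itself operates on the 2D molecular graph. The function, its parameters and the toy data are our own illustration, not the Molfea implementation:

```python
# Toy illustration of level-wise linear-fragment mining with a
# minimum-support constraint. Real Molfea operates on the 2D molecular
# graph; here each molecule is simplified to a plain string, so
# "fragments" are just substrings.

def frequent_fragments(molecules, min_support):
    """Return substrings occurring in at least min_support molecules."""
    # Level 1: all single characters seen anywhere in the data.
    alphabet = {c for m in molecules for c in m}
    frequent, level = {}, set(alphabet)
    while level:
        next_level = set()
        for frag in level:
            support = sum(frag in m for m in molecules)
            if support >= min_support:
                frequent[frag] = support
                # Grow only frequent fragments by one character:
                # an extension can be frequent only if its prefix is
                # (the anti-monotonicity of minimum support).
                next_level.update(frag + c for c in alphabet)
        level = next_level - frequent.keys()
    return frequent

# Hypothetical toy data, not real SMILES from the NCI screen.
mols = ["CCO", "CCN", "CCC", "OCC"]
result = frequent_fragments(mols, min_support=3)
print(result)  # contains {'C': 4, 'CC': 4}
```

The pruning step is what makes such mining tractable on large compound databases: infrequent fragments are never extended, so the search space stays far smaller than the set of all possible substrings.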


Athel Cornish-Bowden and María Luz Cárdenas

CNRS-BIP, Marseille, France.

The metabolism of living organisms is certainly complicated, but it does not follow from this that it is complex, which would mean that its behaviour could not be computed, even in principle. For a simple organism like the parasite Trypanosoma brucei, the kinetics of glycolysis can in fact be computed with good accuracy from the known kinetic properties of the component enzymes. There may thus be no complexity to confront in the design of molecules intended to act as drugs. On the other hand, successful drug design will require much more attention to the functions of the intended targets than is evident in current practice, which is overwhelmingly structure-based. This will involve recognizing the different kinds of inhibition possible in a complete system and analysing the stoichiometric constraints that limit the variations in metabolite concentrations that are possible.


Christian Lemer1, Avi Naim2, Yong Zhang2, Didier Croes1, Georges N. Cohen4, Gaurab Mukherjee2, Lorenz Wernisch2,3, Klaudia Walter2, Jean Richelle1, Jacques Van Helden1 and Shoshana J. Wodak1,2

1Centre de Biologie Structurale et Bioinformatique, Université Libre de Bruxelles.
2European Bioinformatics Institute (EBI), Genome Campus – Hinxton, Cambridge.
3School of Crystallography, Birkbeck College, University of London.
4Unité d'Expression des Gènes Eucaryotes, Paris.

The aMAZE database manages information on the molecular functions of genes and proteins, their interactions and the biochemical processes in which they participate. Its data model embodies general rules for associating molecules and interactions into large complex networks that can be analysed using graph theory methods. The processes represented include metabolic pathways, protein-protein interactions, gene regulation, transport and signal transduction. These processes are mapped to their spatial localisation. A distinct feature of aMAZE is its object-oriented, modular and open user interface. Queries are invoked through dedicated modules, data can be linked to external sources, interactively browsed and transferred between modules, and new modules can be readily added. Available modules also include a custom-built Diagram Editor for the automatic layout, display and interactive modification of pathway diagrams, and procedures for analysing network graphs.


Brian K. Shoichet, Susan L. McGovern, Binqing Q. Wei and John J. Irwin

Northwestern University, Department of Molecular Pharmacology & Biological Chemistry, Chicago.

Molecular docking attempts to find complementary fits for two molecules, typically a candidate ligand and a macromolecular receptor. Among the most popular applications of docking computer programs is that of screening a database of small molecules for those that might act as ligands for a biological receptor of known or modeled structure. The motivating idea is that the receptor structure can act as a template to select database molecules that will complement it structurally and chemically, and so bind to it, modulating its function. The hope is that this will allow novel families of ligands to be found, allowing one to escape from the tedium of substrate-based or analog-based design.


Harren Jhoti

Astex Technology Ltd, Cambridge Science Park, Cambridge.

In the past two decades the promise of structure-based drug design has continued to attract significant interest from the pharmaceutical industry. The initial wave of enthusiasm in the late eighties resulted in some notable successes: for example, the crystal structures of HIV protease and influenza neuraminidase were used to design Viracept and Relenza, both drugs currently used in anti-viral therapy. However, although structure-based design methods continued to be developed, the approach became largely eclipsed in the early nineties by other technologies such as combinatorial chemistry and high-throughput screening (HTS), which seemed to offer a more effective approach for drug discovery. The goal of obtaining a crystal structure of the target protein, particularly in complex with lead compounds, was regarded as a resource-intensive, unpredictable and slow process. During that period it was clear that protein crystallography was unable to keep pace with the other drug discovery technologies being performed in a high-throughput mode.

More recently, there has been a resurgence of interest in structure-based approaches, driven largely by major technology developments in protein crystallography that have resulted in crystal structures for many of today’s therapeutic targets. Furthermore, the ability to rapidly obtain crystal structures of a target protein in complex with small molecules is driving a new wave of structure-based drug design. In this chapter I will briefly describe some of these technology developments and focus on how they have enabled high-throughput X-ray crystallography to be applied to drug discovery.


Gerald M. Maggiora

Computer-Aided Drug Discovery, Pharmacia Corporation, Kalamazoo, USA.

A description of a computer-aided decision-making methodology, called the Analytic Hierarchy Process (AHP), is presented. The method was developed by Thomas Saaty over three decades ago to handle a variety of business-oriented decision-making activities. The AHP is a flexible methodology that allows both subjective and objective data to be considered in a decision process. Moreover, it is intuitive, and the way in which decisions are made is relatively easy to understand. Although many business-related applications have been carried out over the years, very few science-based applications currently exist. In addition to a description of the basic methodology, an example from drug-discovery research, namely biological target selection, will be presented as an illustration of how the AHP methodology can be applied in pharmaceutical research. A brief mention of other possible applications will also be provided.


Karl-Heinz Baringhaus, Thomas Klabunde, Hans Matter, Thorsten Naumann and Bernard Pirard

Aventis Pharma Deutschland GmbH, LG Chemistry-Computational Chemistry, Industriepark Hoechst, Frankfurt am Main.

This paper describes a target family-related lead-finding approach, which consists of capturing public and proprietary information to build a biological and a chemical space. Computational tools to assemble these spaces, as well as appropriate techniques to match them, are covered. Three recent applications in the fields of kinases, ion channels and GPCRs have already demonstrated improved lead-finding capabilities compared with traditional approaches.


Carleton R. Sage, Kevin R. Holme, Nianish Sud and Rudy Potenzone

Lionbioscience, Inc., San Diego, USA.

Drug discovery programs have had to deal with an avalanche of data coming both from the adoption of new technologies such as high-throughput screening and combinatorial chemistry, and from advances in genomics and structural genomics, which have facilitated a gene-family target approach to drug discovery. Although this data-rich environment has been a challenge to manage, it has provided an opportunity for the development of informatics-based tools and solutions to extract information from this large body of data, and convert this information into knowledge that can be used and reused for drug discovery. [...]

This paper will describe some concepts and implementations of cheminformatics solutions that begin to address the need for reusable knowledge generation within drug discovery projects. The talk will address requirements for the integration of chemical and biological data as well as the integration of tools and models. The power of using predictive tools for compound design will be highlighted, as well as methods to simultaneously consider multiple SARs. We will describe how such solutions can be provided.


Timothy Clark

Computer-Chemie-Centrum, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.

The possible role of quantum mechanical (QM) techniques in cheminformatics is discussed, along with the advantages, disadvantages and capabilities of QM and its applicability to databases of thousands of molecules. The critical relationship between quantitative structure-property relationships (QSPRs) and the quality of the experimental data is examined using aqueous solubility as an example. The use of QM-derived descriptors to investigate physical-property space and to characterise compounds as drug-like or non-drug-like is illustrated. Finally, it is pointed out that it is not QM calculations as such, but rather knowledge of the molecular electron density, that is necessary for the examples shown, and a technique that can reproduce the electron density without QM calculations is presented.


Gisbert Schneider

Johann Wolfgang Goethe-Universität, Institut für Organische Chemie und Chemische Biologie, Frankfurt am Main, Germany.

"Molecular informatics" is a scientific discipline devoted to analysing and understanding the storage, processing and distribution of information encoded by molecules and molecular interactions, coined by contemporary bio- and cheminformatics research. Although this definition of molecular informatics may not be perfect, it is comparably easy to comprehend. The term "complexity" appears more vague and difficult to define. Although most of us do have an intuitive understanding of what complexity suggests, different persons will probably give a different answer to the question what complexity actually means and implies in the context of molecular informatics. The Beilstein-Workshop Molecular Informatics: Confronting Complexity brought together an international group of scientists to present their research, exchange ideas and opinions, and discuss complex systems in the light of the workshop’s challenging title.