Studies on Yeast Membrane Transporters – How can Computational Biology help?



Carsten Kettner

Beilstein-Institut, Frankfurt am Main, Germany

With the availability of complete genome sequences, the emphasis has shifted toward understanding protein function, which in turn has opened up a new “-omics” field: functional proteomics. Functional proteomics of membrane proteins provides an important tool for understanding tissue development, cell-cell communication, signal transduction and the transport of nutrients across membranes. Structural studies of proteins are only one aspect of functional proteomics and are mostly carried out by computational means. However, these investigations must be complemented by functional studies that establish structure/function relationships, and this can only be accomplished at the lab bench.

Some examples of comprehensive investigations of transport proteins in the model eukaryotic cell Saccharomyces cerevisiae (a yeast) can be used to illustrate these relationships. This research includes visualisation by microscopy, localisation within the cell and examination of the subunit composition, as well as sequencing of the peptide chain and assignment to the corresponding gene by biochemical means. In addition, the transport activities of a plasma membrane-bound potassium channel and a tonoplast-resident proton pump have been studied in detail by biophysical approaches (1, 2). The results of these investigations of structure/function relationships demonstrate a fruitful cooperation between so-called traditional “wet” biology and computational biology.

1. Reid, J.D., Lukas, W., Shafaatian, R., Bertl, A., Scheurmann-Kettner, C., Guy, H.R., North, R.A. (1996). The S. cerevisiae outwardly-rectifying potassium channel (DUK1) identifies a new family of channels with duplicated pore domains. Rec. Channels 4:51-62.

2. Bertl, A., Bihler, H., Kettner, C., Slayman, C.L. (1998). Electrophysiology in the eukaryotic model cell Saccharomyces cerevisiae. Eur. J. Physiol. 436:999-1013.


Protein Folding, Evolution and Disease



Christopher M. Dobson

Department of Chemistry, University of Cambridge, Cambridge, United Kingdom

Protein folding is perhaps the most fundamental process associated with the generation of functional structures in biology.  There has been considerable progress in the last few years in understanding the underlying principles that govern this highly complex process.  Central to much of this progress has been the development of ideas as to the nature of the energy surface or landscape for a folding reaction.  These ideas have arisen from a combination of theoretical analysis and experimental investigation (Dinner et al., TIBS 25, 331-339, 2000).  Of particular importance in the latter has been the concerted application of a wide range of experimental techniques, each able to describe aspects of the structural changes taking place during the folding process.  NMR spectroscopy and protein engineering have both been key methods in this approach because of their ability to provide structural and dynamical information at the level of individual residues.  Recently, new approaches have been devised that combine experimental data directly with simulation techniques to define the structures of key species on the folding surface (Vendruscolo et al., Nature 409, 641-645, 2001).

Recently, much research has also focused on the realisation that proteins can misfold in vivo and that this phenomenon is linked with a wide range of diseases. We have been investigating in particular the nature of the amyloidogenic conditions (which include Alzheimer’s disease and the spongiform encephalopathies, e.g. BSE and CJD), in which protein misfolding leads to the aggregation of proteins, often into fibrillar or thread-like structures. Specifically, we have studied a non-neuropathic amyloidosis associated with lysozyme, one of our model systems for investigating the fundamentals of protein folding (Booth et al., Nature 385, 787-793, 1997; Canet et al., Nature Struct. Biol. 9, 308-315, 2002). This work has recently been extended by the finding that many proteins not associated with clinical manifestations of disease can form amyloid structures in the laboratory under appropriately chosen conditions (Chiti et al., PNAS 96, 3590-3594, 1999; Fändrich et al., Nature 410, 165-166, 2001). Such findings have led us to explore some of the underlying characteristics of protein sequences that result in a high propensity to form amyloid structures (Chiti et al., Nature Struct. Biol. 9, 137-143, 2002), and to put forward ideas concerning the origin of the various diseases associated with amyloid formation (Bucciantini et al., Nature 416, 507-511, 2002). We have also speculated more generally that the avoidance of aggregation could be a significant driving force in the evolution of protein sequences and structures (Dobson, Phil. Trans. R. Soc. Lond. B356, 133-145, 2001).


Using Evolutionary Information to Study G-Protein Coupled Receptors



Richard Goldstein

Department of Chemistry, University of Michigan, Ann Arbor, United States of America

The genomic data available to computational biologists represents the product of the complex processes of evolution. In particular, the forces of mutation, duplication, and selection have acted to sculpt modern protein sequence and structure in the context of changing functional requirements. Just as crystallographers are able to determine protein structures through an analysis of X-ray diffraction patterns, we wish to read the evolutionary history of proteins in order to understand their structures, functions, and interactions.  To this end, we have been developing models of natural site substitutions that are informed by the protein structure and function and the resulting variations in selective pressures.  The model identifies locations in the protein under similar selective pressure, characterizes the form of the selective pressure, and locates positions under unique selective pressures.  Comparison of different groups of proteins can identify changes in selective pressure corresponding to changes in function or specificity. We illustrate this approach by applying these models to sets of G-protein coupled receptors, identifying general structural and functional features of these receptors.
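As a rough illustration of reading selective pressure from an alignment, the sketch below uses per-site Shannon entropy as a crude proxy for conservation and flags sites whose entropy shifts between two groups of proteins. This is a much simpler stand-in, not the structure-informed substitution model described above. The function names, the one-bit threshold and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


def site_entropies(alignment: list[str]) -> list[float]:
    """Entropy of every column in a list of equal-length aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


def shifted_sites(group_a: list[str], group_b: list[str], threshold: float = 1.0) -> list[int]:
    """Column indices whose entropy differs by at least `threshold` bits between groups.

    Hypothetical heuristic: a site conserved in one group but variable in the
    other may point to a change in selective pressure (e.g. altered specificity).
    """
    ea, eb = site_entropies(group_a), site_entropies(group_b)
    return [i for i, (a, b) in enumerate(zip(ea, eb)) if abs(a - b) >= threshold]


# Hypothetical toy alignments: site 2 is invariant in group A but variable in group B.
group_a = ["ACDE", "ACDF", "ACDG"]
group_b = ["ACKE", "ACRF", "ACHG"]
print(shifted_sites(group_a, group_b))
```

Entropy ignores phylogeny and amino acid similarity, which is exactly what an explicit substitution model accounts for; it is used here only because it fits in a few lines.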


Bioinformatics' View on Molecular Mimicry



Cornelius Frömmel and Robert Preissner

Institute of Biochemistry, Medical Faculty Charité, Humboldt University, Berlin, Germany

The specific interaction between macromolecules and their ligands is one of the basic phenomena of life. Molecular recognition relies on the particular geometric and physicochemical properties of the interacting sites of the distinct molecules. An interesting question is whether two unrelated protein surface elements can be superimposed at the atomic level, that is, whether structural elements can mimic each other. Such molecular mimicry is relevant to several fields of modern biology:

  • drug design
  • design of catalytic antibodies
  • structure/function prediction of genes
  • autoreactive antibodies/cross-reactivity

To discuss molecular mimicry in biology, one has to consider several related concepts (mimicry, imitation, similarity, homology, cross-reactivity) and to distinguish different levels of molecular mimicry (one-dimensional, i.e. sequence similarity, and two- and three-dimensional similarity of molecules).

One relevant question for antibody-antigen complexes is whether molecular mimicry can be observed by chance. By comparing the theoretical number of conformations with the number observed in proteins, we could show that the number of different atomic arrangements actually found in proteins is significantly restricted compared with a completely free arrangement of the same number of atoms in the same volume. Furthermore, using the databank of interacting surface areas in proteins (DIP, Preißner et al., 1998), we were able to find several structurally similar subsets of up to 40 atoms that belong to different secondary structure elements and interact with other parts of the same protein molecule or with surrounding water. Neither the amino acid sequences nor the atoms found at comparable positions show any evolutionary or functional relationship. A comparison of known complexes of multispecific monoclonal antibodies with different non-homologous peptides (Keitel et al., 1997; Kramer et al., 1997; Hoffmüller et al., 2000) shows that different peptides bind at the identical antibody site in geometrically completely different ways. This means that binding to one fixed site can be achieved by ligands of very different geometry (mimicry in the sense of the Greek philosophers). On the other hand, our search procedure based on the concept of geometrical similarity of ligands (Frömmel et al., 1999a; Preißner et al., 1999b,c; Frömmel et al., 2001) successfully predicted ligands (e.g. substrates and inhibitors of proteases), and it was also able to find peptidomimetics for protein design in a databank of low-molecular-weight compounds (Preißner et al., 2001).
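To make "superimposable at the atomic level" concrete, the following sketch scores the geometric similarity of two atom subsets by comparing their internal distance matrices (distance RMSD). This is not the DIP search procedure; it assumes the atom-to-atom correspondence is already known, and the coordinates are invented for illustration.

```python
import itertools
import math

Point = tuple[float, float, float]


def distance_rmsd(set_a: list[Point], set_b: list[Point]) -> float:
    """RMS difference between the intra-set pairwise distances of two atom sets.

    Atom i of set_a is assumed to correspond to atom i of set_b (hypothetical
    matching step not shown). Internal distances do not change under rotation
    or translation, so no superposition has to be fitted.
    """
    if len(set_a) != len(set_b) or len(set_a) < 2:
        raise ValueError("need two equal-sized sets of at least two atoms")
    diffs = [
        math.dist(set_a[i], set_a[j]) - math.dist(set_b[i], set_b[j])
        for i, j in itertools.combinations(range(len(set_a)), 2)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Invented coordinates (ångström): b is a rotated and translated copy of a, c is distorted.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
b = [(5.0, 5.0, 5.0), (5.0, 6.5, 5.0), (3.5, 5.0, 5.0)]
c = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
print(distance_rmsd(a, b), distance_rmsd(a, c))
```

Because only distances are compared, mirror-image arrangements also score as identical; a coordinate superposition (e.g. the Kabsch algorithm) would be needed to tell them apart.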

These results show that molecular mimicry remains a successful concept for molecular design and is also relevant for understanding the cross-reactivity of antibodies.

Frömmel, C., Preißner, R. and Goede, A. (1999a) Determination of Ligands for Proteins. WO 00/04380.

Hoffmüller, U., Knaute, T., Hahn, U., Höhne, W., Schneider-Mergener, J. and Kramer, A. (2000) Evolutionary transition pathways for changing peptide ligand specificity and structure. EMBO J. 19: 4866-4874.

Keitel, T., Kramer, A., Wessner, H., Scholz, C., Schneider-Mergener, J. and Höhne, W. (1997) Crystallographic analysis of anti-p24 (HIV-1) monoclonal antibody cross-reactivity and polyspecificity. Cell 91: 811-820.

Kramer, A., Keitel, T., Winkler, K., Stöcklein, W., Höhne, W. and Schneider-Mergener, J. (1997) Molecular basis for the binding promiscuity of an anti-p24 (HIV-1) monoclonal antibody. Cell 91: 799-809.

Preißner, R., Goede, A. and Frömmel, C. (1998) Dictionary of interfaces in proteins (DIP). Data bank of complementary molecular surface patches. J. Mol. Biol. 280: 535-550.

Preißner, R., Goede, A. and Frömmel, C. (1999b) Homonyms and Synonyms in the Dictionary of Interfaces in Proteins (DIP). Bioinformatics 15: 832-836.

Preißner, R., Goede, A. and Frömmel, C. (1999c) Spare Parts for Helix-Helix interaction. Protein Eng. 12: 825-831.

Preißner, R., Goede, A., Rother, K., Osterkamp, F., Koert, U. and Frömmel, C. (2001) Matching organic libraries with protein substructures. J. Comput.-Aided Mol. Des. 15: 811-817.


Pattern Recognition and Distributed Computing in Drug Design



Graham Richards

Department of Chemistry, University of Oxford, Oxford, United Kingdom

Computational methods developed in the area of medical imaging can be adapted to find ligand binding sites on proteins. Once the binding site is specified, libraries of real or virtual molecules may be screened to seek out compounds which have very strong affinity. Massively distributed computing enables huge numbers of molecules to be screened.

These approaches will be illustrated by reference to a search for inhibitors of the binding of anthrax lethal factor to the protective antigen. With the site identified, some 3.5 billion molecules were tested in 24 days using the power of 1.4 million personal computers running a screen saver. Over 300,000 hits were found, of which approximately 12,000 looked particularly promising.
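The figures quoted above can be turned into per-machine throughput and hit rates with a little arithmetic. The sketch assumes, purely for illustration, that the work was spread evenly over all machines; real volunteer computing loads are far less uniform.

```python
# Figures quoted in the abstract.
molecules_screened = 3.5e9
days = 24
computers = 1.4e6
hits = 300_000
promising = 12_000

# Averages under the (illustrative) assumption of an even split across machines.
per_day = molecules_screened / days             # roughly 1.46e8 molecules per day overall
per_computer = molecules_screened / computers   # 2,500 molecules per computer
per_computer_per_day = per_computer / days      # roughly 104 per computer per day
hit_rate = hits / molecules_screened            # roughly 8.6e-5, i.e. about 0.0086 %
promising_fraction = promising / hits           # 0.04, i.e. 4 % of the hits

print(f"{per_day:.3g} molecules/day, {per_computer:.0f} per PC, {per_computer_per_day:.0f} per PC per day")
print(f"hit rate {hit_rate:.4%}, promising fraction {promising_fraction:.0%}")
```

Each personal computer therefore only had to process on the order of a hundred molecules per day, which is why a screen saver running in idle time was sufficient.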


Generating Synthetically Accessible Ligands by De Novo Design



Peter Johnson

School of Chemistry, University of Leeds, Leeds, United Kingdom

Because structures are built up from scratch, de novo design systems can generate a tremendously diverse range of ligands that are predicted to bind tightly to a target protein. However, a significant disadvantage of this approach is that, after a structure generation process (which may be very demanding of computer resources), many of the solutions produced may not be synthetically accessible. The CAESA program attempts to overcome this deficiency by post-generation scoring and ranking according to an estimate of synthetic accessibility, but this approach is inefficient because large numbers of structures are generated by de novo design and then pruned in a separate, computationally demanding process.

The approach used in SYNSPROUT, a new variant of SPROUT, is to build synthetic constraints into the structure generation process itself. It starts from a library of readily available starting materials, which are used both in the initial docking process and in a build-up process that only permits joins corresponding exactly to a chemical reaction defined in a user-created knowledge base.
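The core idea of reaction-constrained growth can be sketched as a filter on candidate joins: two fragments may only be connected if a rule in the knowledge base links one of their reactive groups. This is a minimal illustration, not SYNSPROUT's implementation; the `Material` class, the group labels and the two reaction rules are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator


@dataclass(frozen=True)
class Material:
    """A purchasable starting material and its reactive groups (hypothetical labels)."""
    name: str
    groups: frozenset[str]


# Hypothetical user-created knowledge base: an unordered pair of groups -> reaction name.
REACTIONS = {
    frozenset({"carboxylic_acid", "amine"}): "amide coupling",
    frozenset({"aldehyde", "amine"}): "reductive amination",
}


def allowed_joins(seeds, library, reactions=REACTIONS) -> Iterator[tuple[str, str, str]]:
    """Yield (seed, partner, reaction) only where a known reaction can link the two materials."""
    for seed, partner in product(seeds, library):
        if seed == partner:
            continue
        for g1, g2 in product(sorted(seed.groups), sorted(partner.groups)):
            reaction = reactions.get(frozenset({g1, g2}))
            if reaction is not None:
                yield seed.name, partner.name, reaction


# Docked fragment(s) to grow from, and the library of available starting materials.
seeds = [Material("acid_A", frozenset({"carboxylic_acid"}))]
library = [
    Material("amine_B", frozenset({"amine"})),
    Material("ketone_C", frozenset({"ketone"})),
]
print(list(allowed_joins(seeds, library)))
```

Filtering joins during generation, rather than scoring finished structures afterwards, is what avoids building and then discarding synthetically inaccessible molecules. The nested loop over seeds and library also shows why very large starting-material databases become a combinatorial problem.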

The current version of the program works well with medium-sized databases of starting materials. For large databases (hundreds of thousands of starting materials), the combinatorial nature of the structure generation process means that even the recently developed parallel version would be too slow, and work in progress is aimed at overcoming this problem.

Details of the methods will be presented, along with a discussion of remaining problems and potential methods for overcoming them.  Experimental results relating to the application of SPROUT to the design of specific enzyme inhibitors will also be discussed.

Finally, some very recent work on the development of a novel approach to the flexible docking of ligands to proteins will be outlined.


Physicochemical Properties and the Discovery of Orally Active Drugs: Technical and People Issues



Christopher A. Lipinski

Pfizer Global Research and Development, Groton Laboratories, United States of America

Oral absorption depends on adequate solubility and intestinal permeability.

A compound is poorly soluble either because it is too lipophilic or because its intermolecular crystal packing forces are too strong. Globally, in the current era, poor aqueous solubility is the single largest physicochemical problem hindering oral drug activity. The relative importance of poor solubility vs. poor permeability as a cause of poor oral activity depends strongly on the method by which leads are generated, as can be seen by examining the time-dependent trends in Merck vs. Pfizer Groton clinical candidates.

Dealing with solubility or permeability problems in an early discovery setting is not purely a technical issue of assay design or computational prediction. People and organizational issues are extremely important. In the age of combinatorial chemistry, a modest batch-mode capability for searching the literature for chemical structures and physicochemical properties (e.g. melting point) is very important. The policies of the major organizations that abstract the literature have generally not recognized these recent changes in chemistry. Assay or computational results must be communicated to medicinal chemists in a way that allows them to decide how to modify the chemical structure. Communication with chemists works best when it takes advantage of their superb pattern recognition skills and is least effective when results are presented as equations or in terms that cannot be related to chemical structure. Internal organizational issues are very important because the requirements for a computational or experimental assay on a topic like solubility differ greatly between early discovery and early development settings. Lack of clarity about the requirements of the two settings can lead to unintended negative consequences.

A computational ADME filter like the "rule of 5" is most effective when used prior to the beginning of experimentation. In my opinion, there is a hierarchy of properties that can be controlled by chemistry. Tight structure activity relationships (SAR) equate with good control. Properties important to oral activity like solubility and permeability do not show tight SAR and hence need early computational prediction and early experimental assays.
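For reference, the sketch below encodes the usual statement of the rule of 5: poor absorption or permeation is more likely when a compound exceeds two or more of the limits molecular weight 500, calculated log P 5, 5 hydrogen-bond donors (sum of OH and NH) and 10 hydrogen-bond acceptors (sum of N and O). It assumes the descriptors have already been computed by another tool; the `Descriptors` class and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float
    clogp: float
    h_donors: int      # count of OH + NH
    h_acceptors: int   # count of N + O


def rule_of_five_violations(d: Descriptors) -> int:
    """Number of rule-of-5 limits the compound exceeds."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def flag_poor_absorption(d: Descriptors) -> bool:
    """Alert when two or more limits are exceeded."""
    return rule_of_five_violations(d) >= 2


print(flag_poor_absorption(Descriptors(350.0, 2.1, 2, 5)))   # compound within all limits
print(flag_poor_absorption(Descriptors(650.0, 6.2, 3, 9)))   # exceeds MW and cLogP limits
```

Because the check needs only computed descriptors, it can be applied to virtual structures before any synthesis, which is the "prior to experimentation" use described above.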


Molecular Feature Mining in Predictive Toxicology



Luc De Raedt, Christoph Helma, Stefan Kramer

Institut für Informatik, Albert-Ludwigs-University Freiburg, Freiburg, Germany

Molecular feature mining is a novel data mining approach to extract scientifically relevant features from molecular databases.

First, we introduce inductive databases, on which molecular feature mining is based. Inductive databases are a recent framework in which one stores not only data but also patterns. Data mining in inductive databases then becomes a querying process, where the user specifies the patterns of interest in an inductive database query language. For example, the user might specify that the patterns should occur within a given example molecule, in at least 15 % of the actives and in at most 0.5 % of the inactives. The inductive database system then generates the patterns of interest.

Secondly, we apply this general framework in the context of the molecular feature miner MolFea [Kramer, De Raedt, Helma, SIGKDD 01]. MolFea searches for linear patterns in the 2D structure of molecules, using SMILES and SMARTS strings to encode the molecules and the patterns, respectively. MolFea can efficiently extract patterns of interest from large databases. In one experiment, MolFea mined the Developmental Therapeutics Program's AIDS Antiviral Screen Database of approximately 40,000 compounds in a couple of CPU hours on a Pentium III.
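The example query (patterns occurring in a given molecule, in at least 15 % of actives and at most 0.5 % of inactives) can be sketched as a filter over candidate fragments. As a crude, hypothetical stand-in for MolFea's linear fragments, this sketch treats plain substrings of SMILES strings as patterns; real substructure matching must respect branches and ring closures, which string containment ignores. The toy molecules are invented.

```python
def linear_fragments(smiles: str, min_len: int = 2, max_len: int = 6) -> set[str]:
    """All substrings of a SMILES string within a length range (crude stand-in for fragments)."""
    return {
        smiles[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(smiles) - n + 1)
    }


def frequency(pattern: str, molecules: list[str]) -> float:
    """Fraction of molecules whose SMILES string contains the pattern."""
    if not molecules:
        return 0.0
    return sum(pattern in m for m in molecules) / len(molecules)


def query(example: str, actives: list[str], inactives: list[str],
          min_active: float = 0.15, max_inactive: float = 0.005) -> list[str]:
    """Patterns of `example` frequent among actives and rare among inactives."""
    return sorted(
        p for p in linear_fragments(example)
        if frequency(p, actives) >= min_active and frequency(p, inactives) <= max_inactive
    )


# Hypothetical toy data.
actives = ["CCN=[N+]=[N-]", "OCCN=[N+]=[N-]"]
inactives = ["CCO", "CCCC"]
patterns = query("CCN=[N+]=[N-]", actives, inactives)
print(len(patterns), patterns[:5])
```

Each constraint is a simple frequency threshold, which is what lets an inductive database answer such queries declaratively; the surviving patterns can then serve as binary features for a learner.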

Thirdly, we show how the patterns extracted with MolFea can be used as features for Quantitative Structure-Activity Relationships (QSAR) and traditional data mining techniques, such as support vector machines, decision trees and regression models. We report on several benchmarks in the domains of carcinogenicity, biodegradability and predictive toxicology. It turns out that the combination of MolFea with support vector machines leads to accurate predictive models.

Kramer, S., De Raedt, L. and Helma, C. Molecular Feature Mining in HIV Data. Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, August 26-29, 2001, San Francisco, CA, USA. ACM, 2001, pp. 136-143.


Metabolic Analysis as a Tool in Drug Design



Athel Cornish-Bowden

CNRS-BIP, Marseille, France

The design and discovery of new drugs has stagnated over the past 30 years. Of many possible illustrations, the failure to discover any new class of antibiotics during the 34 years between 1963 and 1997 is especially striking: the development of genetic engineering and related fields during that period had almost no effect. The time has therefore come to ask whether fundamentally different approaches to drug design would work better than the methods currently in use, especially approaches that take account of the knowledge of metabolism and enzyme kinetic behaviour gathered during the past century. To take a more positive example, the herbicide Roundup (glyphosate) acts not by depressing the metabolism of affected plants but by increasing the concentration of a normal metabolite (shikimate) to the point where it becomes lethal. It does this by acting as an uncompetitive inhibitor; competitive inhibition of the same enzyme would certainly be ineffective. Judging whether a given enzyme inhibitor is likely to have significant effects in a living organism requires detailed knowledge of the organism's metabolism, of the amount of flux control exerted by the inhibited enzyme, and of any constraints that may affect the possibility of causing large changes in metabolite concentrations.
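Why uncompetitive inhibition can be so much more effective than competitive inhibition can be seen from the standard rate equations, v = V s / (Km(1 + i/Ki) + s) for competitive and v = V s / (Km + s(1 + i/Ki)) for uncompetitive inhibition. The sketch assumes that upstream steps impose a fixed flux v on the inhibited enzyme and solves each equation for the substrate concentration s needed to carry that flux. The parameter values are arbitrary illustrations.

```python
import math


def s_needed_competitive(v, V, Km, i, Ki):
    """Substrate concentration at which a competitively inhibited enzyme runs at rate v."""
    if v >= V:
        return math.inf
    return v * Km * (1 + i / Ki) / (V - v)


def s_needed_uncompetitive(v, V, Km, i, Ki):
    """Substrate concentration at which an uncompetitively inhibited enzyme runs at rate v."""
    denominator = V - v * (1 + i / Ki)
    if denominator <= 0:
        return math.inf  # no finite substrate level can carry this flux
    return v * Km / denominator


# Arbitrary units: V = Km = Ki = 1 and a fixed upstream flux of v = 0.5.
for i in (0.0, 0.5, 1.0, 2.0):
    print(i, s_needed_competitive(0.5, 1, 1, i, 1), s_needed_uncompetitive(0.5, 1, 1, i, 1))
```

Under competitive inhibition the required substrate level rises only modestly (1.0, 1.5, 2.0, 3.0), so a small accumulation of substrate compensates for the inhibitor. Under uncompetitive inhibition it rises faster and becomes unbounded once v(1 + i/Ki) reaches V, which corresponds to the runaway metabolite accumulation described for shikimate.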

We have studied these ideas in relation to some simple artificial models and to a model of glycolysis in the parasite Trypanosoma brucei. This more complex example illustrates some of the complications that need to be taken into account when trying to predict the effects of inhibiting enzymes in living cells. One important point is that non-trivial pathways may impose non-trivial and non-obvious stoicheiometric constraints on the concentrations of many metabolites. Another is that, although reverse reactions have frequently been ignored in experimental studies of enzyme kinetics, they cannot be ignored when constructing metabolic models; if they are, the results may be entirely misleading.
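The danger of ignoring reverse reactions can be illustrated with the reversible Michaelis-Menten equation for a one-substrate, one-product reaction, v = (Vf s/Ks - Vr p/Kp) / (1 + s/Ks + p/Kp), compared with the irreversible form that omits the product. The parameter values are arbitrary and chosen so that the Haldane relationship gives Keq = 1.

```python
def rate_irreversible(s, Vf, Ks):
    """Michaelis-Menten rate that ignores the reverse reaction and the product."""
    return Vf * s / Ks / (1 + s / Ks)


def rate_reversible(s, p, Vf, Vr, Ks, Kp):
    """Reversible Michaelis-Menten rate for S <-> P."""
    return (Vf * s / Ks - Vr * p / Kp) / (1 + s / Ks + p / Kp)


def haldane_keq(Vf, Vr, Ks, Kp):
    """Equilibrium constant implied by the kinetic parameters (Haldane relationship)."""
    return Vf * Kp / (Vr * Ks)


# Arbitrary parameters giving Keq = 1: with s = p = 1 the reaction is at equilibrium.
print(haldane_keq(1, 1, 1, 1), rate_reversible(1, 1, 1, 1, 1, 1), rate_irreversible(1, 1, 1))
```

At equilibrium the reversible equation correctly gives zero net rate, whereas the irreversible form still predicts half-maximal flux. The two agree only when no product is present, which is roughly the situation in initial-rate experiments but rarely in a working pathway.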


Representing and Analysing Information on Networks of Cellular Function



Shoshana J. Wodak1,2, Jacques van Helden1, Christian Lemer1, Avi Naim2, and Lorenz Wernisch2,3

1 SCMBB, CP 160-16, Free University of Brussels, Brussels, Belgium

2 EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom

3 Department of Crystallography, Birkbeck College, London, United Kingdom

With the availability of more than 60 complete genome sequences, including that of the human genome, the need to step up our efforts to interpret these vast amounts of data in terms of molecular and cellular function has become pressing. Working at the level of individual genes or proteins is not sufficient for such interpretation, because their biological functions and evolution are often interdependent and context-dependent. It is therefore necessary to use information on the networks of physical and functional interactions between the proteins and genes in the living cell.

These networks are the metabolic and regulatory pathways, the signal transduction and transport processes.  They involve very complex biological information, which is not readily available in electronic form, and is often lacking altogether.  Systematic efforts for generating and archiving this information are therefore urgently called for.  In particular, availability of this information in electronic form will make it amenable to systematic analyses. Such analyses are required not only for the interpretation of high throughput gene expression data and proteomics experiments, but are also bound to be a central component in the assignment of gene function, elucidating cellular mechanisms and evolutionary processes.

We will present on-going work on the development of aMAZE, a database on networks of cellular functions. The conceptual framework for representing data on metabolic pathways and its extension to other cellular processes will be described.  We will furthermore illustrate the types of analyses that the database is enabling, discuss the tools that are required to perform them, and if time permits, outline how the analyses of functional networks could be integrated with comparative genome studies to improve the assignment of gene function.
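One simple kind of network analysis is path finding in a metabolic graph. The sketch below, which is not aMAZE's data model, represents a hypothetical toy network as a directed bipartite graph (compound to reaction to compound) and runs a breadth-first search. It also shows why highly connected cofactors such as ATP and ADP need special treatment: left in, they create biochemically meaningless shortcuts. Excluding them is one simple remedy, assumed here for illustration.

```python
from collections import deque

# Hypothetical toy network: reaction -> (substrates, products).
TOY_NETWORK = {
    "R1": ({"glucose", "ATP"}, {"glucose-6-P", "ADP"}),
    "R2": ({"glucose-6-P"}, {"fructose-6-P"}),
    "R3": ({"fructose-6-P", "ATP"}, {"fructose-1,6-bP", "ADP"}),
    "R4": ({"phosphoenolpyruvate", "ADP"}, {"pyruvate", "ATP"}),
}


def build_graph(reactions):
    """Directed bipartite graph: compound -> reaction -> compound."""
    graph = {}
    for rxn, (subs, prods) in reactions.items():
        for s in subs:
            graph.setdefault(s, set()).add(rxn)
        graph.setdefault(rxn, set()).update(prods)
    return graph


def shortest_path(graph, start, goal, excluded=frozenset()):
    """Breadth-first search; nodes in `excluded` (e.g. cofactors) are never traversed."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen and nxt not in excluded:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


graph = build_graph(TOY_NETWORK)
print(shortest_path(graph, "glucose", "pyruvate"))                            # shortcut via ADP
print(shortest_path(graph, "glucose", "pyruvate", excluded={"ATP", "ADP"}))   # None
```

Without the exclusion, the search "converts" glucose into pyruvate in two reactions by passing through ADP, which is not a real pathway. With cofactors excluded, no path exists in this incomplete toy network.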

References

- Graph-based analysis of metabolic networks

van Helden, J., Wernisch, L., Gilbert, D. and Wodak, S.J., Proceedings of the Schering Symposium, 2002 (in press).

- Representing and analysing molecular and cellular function using the computer

van Helden, J., Naim, A., Mancuso, R., Eldridge, M., Wernisch, L., Gilbert, D. & Wodak, S. J., Biol Chem 381(9-10), 921-35 (2000)

- From molecular activities and processes to biological function

van Helden, J., Naim, A., Lemer, C., Mancuso, R., Eldridge, M. & Wodak, S., Briefings in Bioinformatics 2(1), 98-93 (2001).


Functional Genome Analysis



Jörg Hoheisel

Deutsches Krebsforschungszentrum, Heidelberg, Germany

The Division of Functional Genome Analysis at the DKFZ develops technologies for analysing large genomic regions, up to entire genomes, with respect to the functions they encode and their regulation. Based on technical advances, various functional aspects are being analysed. One emphasis is work on DNA, protein and peptide microarrays. Apart from addressing chemical and biophysical issues, the resulting methods are immediately put to the test in relevant, biologically driven projects. Systems are also being developed for early diagnosis, prognosis and evaluation of the success of disease treatment, with some emphasis on cancer. Comparative studies of transcription, actual protein expression and epigenetic factors are under way (http://www.dkfz-heidelberg.de/funct_genome/index.html).


HaloLex: A Lighthouse in the Flood of Information



F. Pfeiffer, J. Wolfertz, C. Garcia-Rizo, V. Hickmann, T. Schmidt, M. Falb, D. Oesterhelt

Max Planck Institute of Biochemistry, Martinsried, Germany

The HaloLex database is designed to store biological information concerning the genomics, proteomics, genetics and physiology of Halobacterium salinarum. We attempt to link biological information to its experimental basis wherever possible. In this way we hope to overcome the problem of database errors resulting from automatic annotation and incautious homology-based information transfer.

Genome sequencing results in mass prediction of protein sequences, most of which have never been subject to experimental analysis. Large-scale function assignments are made by application of bioinformatic techniques, usually based on sequence homology analysis. In automatic annotation, biological information is transferred from a source sequence to a target sequence considered to be homologous.

Homology-based function assignments may introduce errors due to inherent biological problems:

(a) It is difficult to reliably distinguish between distant homologs and nonhomologous sequences. Below 30% sequence identity, additional evidence should be considered (e.g. statistical analysis, conservation of motifs).

(b) Differences in domain architecture must be considered. Only functions that can be assigned to the common domain(s) can be transferred.

(c) False positives need to be detected, e.g. non-homologous sequence pairs with artificially high BLAST scores.

(d) Even closely related proteins may differ in function, e.g. by using different substrates.

(e) Identical molecular activities may result in distinct physiological responses because of differences in protein-protein interactions.
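Point (a) relies on a sequence identity figure. A minimal sketch of one common definition (identical residues divided by aligned, gap-free columns) is shown below, together with a flag for pairs below the 30% threshold. Other definitions divide by the length of the shorter sequence or by the alignment length, so the numbers are only comparable within one convention; the function names and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def needs_extra_evidence(aligned_a: str, aligned_b: str, cutoff: float = 30.0) -> bool:
    """True when identity is below the cutoff, so homology alone should not be trusted."""
    return percent_identity(aligned_a, aligned_b) < cutoff


print(percent_identity("MKT-LLV", "MRTALLI"))             # 4 of 6 gap-free columns identical
print(needs_extra_evidence("ACDEFGHIKL", "ACWYPQRSTV"))   # 20 % identity
```

A flag like this only marks where point (a) applies; it does not address domain architecture, BLAST artefacts or functional divergence, which are the concerns of points (b) to (e).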

Most of the proteins in public sequence databases have never been subjected to experimental analysis, and the same holds for the functions assigned to them. Unfortunately, the databases scarcely discriminate between experimental data and computer predictions, and it is usually not possible to trace the source of the annotation information. Annotation errors, once entered into databases, are propagated, leading to the "homolog_of_homolog_of_homolog" problem referred to as the "transitive catastrophe". Current estimates are that up to 20% of the annotation information in public sequence databases is incorrect.

In the HaloLex project, we are committed to providing reliable biological information.

(a) Attempts are made to collect experimentally validated biological information.

(b) We validate the biological information used for homology-based function assignment. Validation is documented in "microreviews", which are linked to the corresponding literature and sequences.

(c) Documenting which homolog was used for function assignment, and annotating the level of homology, makes it possible to trace the route of information transfer. The overall validity of the resulting information is characterized, and derived information is updated when revisions and corrections are made.
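Recording the source of each transferred annotation turns the "route of information transfer" into a chain that can be followed mechanically. The sketch below is a hypothetical data model, not HaloLex's schema: it traces an annotation back to its origin, accepts it only if the chain ends in experimental evidence within a maximum number of transfers (an assumed policy), and lists the derived annotations that would need updating if a source is corrected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """One function assignment and where it came from (hypothetical schema)."""
    protein: str
    function: str
    source: Optional[str] = None   # protein the function was transferred from
    experimental: bool = False     # True if based on direct experimental evidence


def trace(protein: str, annotations: dict[str, Annotation]) -> list[Annotation]:
    """Follow transfer links from `protein` back to the origin of its annotation."""
    chain, seen, current = [], set(), protein
    while current is not None:
        if current in seen:
            raise ValueError(f"circular annotation chain at {current}")
        seen.add(current)
        chain.append(annotations[current])
        current = annotations[current].source
    return chain


def is_grounded(protein: str, annotations: dict[str, Annotation], max_hops: int = 1) -> bool:
    """Hypothetical policy: origin must be experimental and at most `max_hops` transfers away."""
    chain = trace(protein, annotations)
    return chain[-1].experimental and len(chain) - 1 <= max_hops


def dependents(protein: str, annotations: dict[str, Annotation]) -> set[str]:
    """All proteins whose annotation derives, directly or indirectly, from `protein`."""
    result, frontier = set(), {protein}
    while frontier:
        frontier = {a.protein for a in annotations.values() if a.source in frontier} - result
        result |= frontier
    return result


# Hypothetical annotations: P1 is experimental, P2 and P3 are transfers, X is an unverified root.
ANNOTATIONS = {
    "P1": Annotation("P1", "proton pump", experimental=True),
    "P2": Annotation("P2", "proton pump", source="P1"),
    "P3": Annotation("P3", "proton pump", source="P2"),
    "X": Annotation("X", "transporter"),
    "P4": Annotation("P4", "transporter", source="X"),
}
print(is_grounded("P2", ANNOTATIONS), is_grounded("P3", ANNOTATIONS), dependents("P1", ANNOTATIONS))
```

The `dependents` query is what makes the "homolog_of_homolog" problem manageable: when the annotation of P1 is revised, every annotation derived from it can be found and updated.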

The HaloLex project is in the early implementation phase. We have loaded the genome and annotation information derived from our genome sequencing project of Halobacterium salinarum DSM671. Part of the information is available on our public web page "www.halolex.mpg.de". Additional data and functionality, currently being developed on our internal web pages, will be made publicly available in the near future.


Computer-Aided Design and Screening of Combinatorial Drug Libraries



William L. Jorgensen

Department of Chemistry, Yale University, New Haven, United States of America

A hierarchy of approaches is being taken to estimate protein-ligand binding affinities as a critical element in drug design.

(a) The most sophisticated calculations feature Monte Carlo (MC) and molecular dynamics (MD) simulations with free energy perturbation (FEP) theory. The OPLS-AA force field is used with explicit inclusion of TIP4P water molecules. These procedures can reliably yield relative free energies of binding for structurally similar ligands at a rate of about 1 ligand per 1 GHz-processor week.

(b) A faster (1 ligand per day) “linear response” (LR) approach is also being pursued. It involves the same complete MC simulations, but only for the unbound and bound states of the ligand, whereas the FEP calculations also require multiple intermediate calculations. Descriptors from the MC/LR calculations are used in a QSAR-like approach to correlate known binding affinities or activities and to predict new ones.

(c) Finally, a rapid (20 ligands per hour) procedure is implemented in the GenMol program, which performs an extensive conformational search in a protein’s binding site to find the optimal positioning of a ligand. Large trial sets of ligands are generated automatically by GenMol from a virtual library of >10^10 compounds, which is created from >100 cores and >300 substituents. Each ligand is screened for desired drug-like properties with the integrated QikProp program. A scoring function is used to rate the binding affinity or activity of each trial ligand.
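The practical consequence of this hierarchy is easiest to see as processor time for a fixed number of ligands. The sketch converts the quoted throughputs to ligands per processor-hour; it assumes that the rates for (b) and (c) also refer to a single 1 GHz processor (the abstract states this only for (a)) and that ligands can be processed independently in parallel.

```python
HOURS_PER_DAY = 24

# Approximate throughputs quoted in the abstract, as ligands per processor-hour.
RATES_PER_HOUR = {
    "MC/MD with FEP": 1 / (7 * HOURS_PER_DAY),  # about 1 ligand per week
    "MC/LR": 1 / HOURS_PER_DAY,                 # about 1 ligand per day
    "GenMol": 20.0,                             # about 20 ligands per hour
}


def wall_clock_days(n_ligands: int, rate_per_hour: float, processors: int = 1) -> float:
    """Days to evaluate n_ligands, assuming ligands are independent and perfectly parallel."""
    return n_ligands / (rate_per_hour * processors) / HOURS_PER_DAY


for method, rate in RATES_PER_HOUR.items():
    print(f"{method}: {wall_clock_days(1000, rate):.1f} days on one processor for 1,000 ligands")
```

For 1,000 ligands this gives roughly 7,000, 1,000 and 2 processor-days, respectively, which is why the fast docking tier is the only one that can touch a library of 10^10 compounds, while the slower tiers are reserved for much smaller sets.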

Representative applications of the methods will be presented.

Recent references:

COX-2, Src SH2 Domain, HIV Reverse Transcriptase, and Thrombin: Computational Approaches to Drug Design. W. L. Jorgensen, M. L. P. Price, D. J. Price, R. C. Rizzo, D. Wang, A. C. Pierce, and J. Tirado-Rives, In Free Energy Calculations in Rational Drug Design, M. R. Reddy and M. D. Erion, Eds.; Kluwer: New York, 2001; pp. 299-316.

Antiviral Drug Design: Computational Analyses of the Effects of the L100I Mutation for HIV-RT on the Binding of NNRTIs. D.-P. Wang, R. C. Rizzo, J. Tirado-Rives, and W. L. Jorgensen, Bioorg. Med. Chem. Lett., 11, 2799-2802 (2001).

Prediction of Drug Solubility from Structure. W. L. Jorgensen and E. M. Duffy, Adv. Drug Delivery Reviews, 54, 355-366 (2002).

Estimation of Binding Affinities for Celecoxib Analogues with COX-2 via Monte Carlo – Linear Response. S. S. Wesolowski and W. L. Jorgensen, Bioorg. Med. Chem. Lett., 12, 267-270 (2002).

Prediction of Activity for Non-nucleoside Inhibitors with HIV Reverse Transcriptase Based on Monte Carlo Simulations. R. C. Rizzo, M. Udier Blagovic, D.-P. Wang, E. K. Watkins, M. B. Kroeger Smith, R. H. Smith, Jr., J. Tirado-Rives, and W. L. Jorgensen, J. Med. Chem., 45 (2002), in press.


New Approaches to In Silico Screening



Hans-Joachim Böhm

Discovery Chemistry, Pharmaceuticals Division, F. Hoffmann-LaRoche Ltd, Basel, Switzerland

The 3D structure of a target protein can be used to design chemical libraries with significantly enhanced hit rates. Several examples will be presented to highlight the current status of the field. Recent improvements in scoring functions further enhance the hit rates.

Increasingly, physicochemical and pharmacokinetic properties are taken into account in the library design process. The goal is to design compounds that bind to the target protein and also have the right “drug-like” properties. Predicted properties include lipophilicity, membrane permeability, solubility and protein binding, but also certain aspects of metabolism and toxicity. Combined with novel, rapid three-point pharmacophore-based search methods, these predictions make it possible to process large virtual libraries.
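Three-point pharmacophore searching is fast because each triplet of features can be reduced to a small, hashable key (feature types plus binned inter-feature distances) and compared by set lookup. The sketch below is a hypothetical, simplified key scheme, not the method used in the work described; the feature labels, the 1 Å bin width and the coordinates are assumptions.

```python
import math
from itertools import combinations


def triplet_key(features, bin_width=1.0):
    """Order-independent key for three (type, (x, y, z)) features: sorted (type, type, bin) edges."""
    edges = []
    for (t1, p1), (t2, p2) in combinations(features, 2):
        d_bin = int(math.dist(p1, p2) // bin_width)
        edges.append((*sorted((t1, t2)), d_bin))
    return tuple(sorted(edges))


def all_triplet_keys(features, bin_width=1.0):
    """Keys for every combination of three features in a molecule or binding site."""
    return {triplet_key(tri, bin_width) for tri in combinations(features, 3)}


def shares_pharmacophore(query_features, library_features, bin_width=1.0):
    """True if the two feature sets have at least one three-point pharmacophore in common."""
    return bool(all_triplet_keys(query_features, bin_width) & all_triplet_keys(library_features, bin_width))


# Hypothetical features (ångström); `moved` is the same arrangement translated by 10 Å.
features = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("aromatic", (0.0, 4.0, 0.0)),
]
moved = [(kind, (x + 10.0, y, z)) for kind, (x, y, z) in features]
print(triplet_key(features), shares_pharmacophore(features, moved))
```

Keys depend only on types and distances, so they are unaffected by translation, rotation and the order in which features are listed. A naive scheme like this is sensitive to bin boundaries: two almost equal distances on either side of an edge land in different bins.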


Hits, Leads, and Artefacts from Molecular Docking & other Screens



Brian Shoichet

Molecular Pharmacology & Biological Chemistry, Northwestern University, Chicago, United States of America

Molecular docking is widely used to discover novel lead compounds that complement a receptor of known structure.  Because of problems with scoring functions and sampling, docking remains a screening technique prone to false positives and false negatives. 

We have investigated how well docking enriches hit rates, and how accurate its predictions are, in both highly simplified and more complex systems. Docking was tested on small cavities engineered into T4 lysozyme to investigate how well the program could both reproduce known binding patterns and predict new ones. Docking screens were also run against PTP1B, a drug target, in parallel with high-throughput screening experiments, which allowed us to compare docking hit rates with those afforded by random screening. Finally, docking was used in an attempt to discover novel inhibitors of beta-lactamase.
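The comparison between docking and random screening described above can be expressed as a ratio of hit rates, and enrichment within a docking-ranked list is often summarized by an enrichment factor. The sketch below shows both; the function names and the toy numbers are hypothetical and are not data from the PTP1B or beta-lactamase studies.

```python
def hit_rate(hits, tested):
    """Fraction of tested compounds confirmed as hits."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


def hit_rate_ratio(dock_hits, dock_tested, random_hits, random_tested):
    """How many times higher the docking hit rate is than the random-screening rate."""
    if random_hits == 0:
        raise ValueError("random screening found no hits; ratio undefined")
    return hit_rate(dock_hits, dock_tested) / hit_rate(random_hits, random_tested)


def enrichment_factor(ranked_is_active, fraction):
    """Enrichment in the top `fraction` of a docking-ranked list.

    ranked_is_active: 0/1 flags ordered from best to worst docking score.
    EF = (actives in top / size of top) / (all actives / list size).
    """
    n = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    n_top = max(1, round(n * fraction))
    top_actives = sum(ranked_is_active[:n_top])
    return (top_actives / n_top) / (total_actives / n)


# Hypothetical example: 10% docking hit rate versus 2% for random screening.
print(hit_rate_ratio(10, 100, 2, 100))
```

An enrichment factor near 1 means the ranking does no better than random selection; values well above 1 mean actives are concentrated at the top of the docking list.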

Many inhibitors were found; although some turned out to be specific, most were non-specific. Surprisingly, examination of similar compounds in the literature, discovered by both virtual and high-throughput screening, suggests that many reported inhibitors bind non-specifically to many targets. These inhibitors all have unusual kinetic profiles, distinctly different from those of classically behaved inhibitors.

Biophysical studies suggest a single, non-classical mechanism for these inhibitors. Studies of pharmaceutical databases suggest that promiscuous inhibitors acting by this mechanism are widespread in drug screening databases.


High Throughput X-Ray Crystallography for Drug Discovery



Harren Jhoti

Astex Technology Ltd, Cambridge Science Park, Cambridge, United Kingdom

Despite improvements in high-throughput screening technologies and combinatorial chemistry, and despite the rising number of protein targets, the number of drug candidates entering development remains unchanged. To address this productivity gap, companies are now making greater use of knowledge-based approaches that exploit the three-dimensional structure of the protein target to accelerate the discovery of lead compounds.

Traditionally, protein structures have contributed to the drug discovery process at stages ranging from the initial selection and validation of biological targets to directing chemistry programs in the actual discovery and optimisation of lead compounds. However, technology advances now allow protein/ligand crystal structures to be determined at such a rate that X-ray crystallography is being developed as a screening technology. A key advantage of using X-ray crystallography as a detection method in lead discovery is that weak-binding compounds that would have been missed by bioassay-based screening can be identified and their binding modes defined.

In this presentation I will outline key technology developments that have allowed protein/ligand structures to be solved at significantly higher throughput. I will also illustrate the advantages of, and the technical challenges that remain for, using X-ray crystallography as a screening tool. As an example of this approach to lead discovery, I will describe the use of X-ray crystallography to screen libraries of molecular fragments to identify novel lead compounds for p38 kinase and CDK-2. In these programs, X-ray crystallography was used to identify weak-binding molecular fragments, which have novel scaffolds but only mM affinity, and to transform them in under six months into novel lead compounds with nanomolar affinity and cellular activity.
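To put the move from mM fragments to nanomolar leads in energetic terms, dissociation constants can be converted into standard binding free energies with ΔG = RT ln(Kd / 1 M). The sketch assumes 298.15 K, a 1 M standard state, and illustrative Kd values of 1 mM and 10 nM; the abstract reports only the orders of magnitude, so these specific numbers are assumptions.

```python
from math import log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol: dG = RT ln(Kd / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * log(kd_molar)


# Illustrative values only: the abstract gives "mM" and "nanomolar", not exact Kd values.
fragment = binding_free_energy(1e-3)
lead = binding_free_energy(10e-9)
print(f"fragment {fragment:.1f}, lead {lead:.1f}, gain {lead - fragment:.1f} kcal/mol")
# → fragment -4.1, lead -10.9, gain -6.8 kcal/mol
```

Each tenfold gain in affinity is worth roughly 1.4 kcal/mol at room temperature, so a fragment-to-lead improvement of five orders of magnitude corresponds to several kcal/mol of additional binding energy.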


Computer-Assisted Decision Making in Pharmaceutical Research



Gerald M. Maggiora

Computer-Aided Drug Discovery, Pharmacia Corporation, Kalamazoo, United States of America

Decision making in pharmaceutical research can be a daunting task, because scientists must deal with a wide range of data, some of it quantitative and some of it qualitative. Any methodology for assisting decision making must therefore be able to handle both of these quite different classes of data.

The Analytic Hierarchy Process (AHP), invented by Thomas Saaty more than twenty years ago, is ideally suited to this task. It allows decision makers to handle both qualitative and quantitative data in a comprehensive, integrated, and mathematically rigorous manner. More importantly, it presents the results of analyses in a way that is comprehensible and intuitive to scientists. Examples related to ‘target selection’ and ‘assessment of compound quality’ that illustrate applications of the AHP to pharmaceutical research will be presented.
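The core AHP calculations can be sketched in a few lines: a reciprocal pairwise-comparison matrix is reduced to a priority vector (its principal eigenvector, here found by power iteration), a consistency ratio checks whether the judgements hang together, and criterion weights are combined with each alternative's local priorities into a global score. The criteria, compounds and judgement values below are hypothetical, and this is a generic AHP sketch rather than the Pharmacia implementation.

```python
# Saaty's random consistency indices for matrix sizes 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_priorities(matrix, iterations=100):
    """Priority vector (principal eigenvector, normalised to sum 1) and lambda_max
    of a reciprocal pairwise-comparison matrix, found by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(matrix):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is the usual rule of thumb."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    _, lambda_max = ahp_priorities(matrix)
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


def synthesize(criteria_weights, local_priorities):
    """Global score per alternative: sum over criteria of weight * local priority.
    local_priorities[k][a] is the priority of alternative a under criterion k."""
    n_alt = len(local_priorities[0])
    return [sum(cw * local[a] for cw, local in zip(criteria_weights, local_priorities))
            for a in range(n_alt)]


# Hypothetical criteria (potency, selectivity, solubility) judged pairwise on Saaty's scale.
criteria = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
weights, lam = ahp_priorities(criteria)
# Hypothetical local priorities of two compounds, A and B, under each criterion.
scores = synthesize(weights, [[0.75, 0.25], [0.5, 0.5], [0.2, 0.8]])
print(weights, consistency_ratio(criteria), scores)
```

The example matrix is perfectly consistent, so its weights come out as 4/7, 2/7 and 1/7 and the global scores for A and B are about 0.6 and 0.4; real expert judgements are rarely this consistent, which is what the consistency ratio is there to flag.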


Knowledge Based Lead Finding by Matching Chemical and Biological Space



Karl-Heinz Baringhaus, Thomas Klabunde, Thorsten Naumann and Bernard Pirard

Chemistry/Computational Chemistry, Aventis Pharma Deutschland GmbH, Frankfurt am Main, Germany

Competitive pressure in the pharmaceutical industry requires shorter project cycle times and hence higher productivity and efficiency. Effective lead finding techniques that shorten the early stages of drug discovery are therefore of major interest.

We are currently strengthening our lead finding activities in target families. Such a target family approach makes use of all available public and proprietary chemical and biological information (e.g. compounds, 3D target structures) to build knowledge about compounds and targets. In this way, a target-related chemical and biological space is assembled, which offers new, efficient opportunities for lead finding. Potential new leads for a given biological target are then identified by matching biological and chemical space.

The chemical space of a target family is generated by building databases of biologically active compounds and of privileged frameworks and fragments. Similarity and substructure searching, together with 3D-based virtual screening of internal and external compound collections, yield a compound library directed at the target family. The biological space usually comprises the sequences, 3D structures and, where available, pharmacophores of a target family. Matching the two spaces then reveals the preferred compounds for biological testing.
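Similarity searching of the kind mentioned above is commonly based on the Tanimoto coefficient between binary fingerprints. The sketch below represents each fingerprint as a set of "on" bit positions and returns library compounds above a similarity threshold; the bit positions, compound names and the default threshold of 0.7 are hypothetical, and in practice the fingerprints would come from a cheminformatics toolkit rather than being written by hand.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints stored as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def similarity_search(query_fp, library, threshold=0.7):
    """Return (name, similarity) pairs at or above threshold, most similar first.
    The default threshold is an illustrative assumption."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    hits = [(name, s) for name, s in scored if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy fingerprints: bit positions are invented, not from a real fingerprint scheme.
library = {"cpd_A": {1, 2, 3, 4}, "cpd_B": {1, 2, 3, 5}, "cpd_C": {7, 8}}
print(similarity_search({1, 2, 3, 4}, library, threshold=0.5))
# → [('cpd_A', 1.0), ('cpd_B', 0.6)]
```

Running such a search with known actives of a target family as queries is one simple way to pull a family-directed subset out of a larger compound collection.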

The Aventis ion channel space currently contains 15,412 compounds and 1,743 scaffolds. Screening this library, or subsets of it, against new ion channels gave hit rates of approximately 4%. The high quality of these hits reduced the cycle time for screening and optimization.

Our kinase-directed compound library was built on known kinase crystal structures, from which generic kinase pharmacophore models were generated to identify the most essential scaffolds. In addition, matching kinase frameworks against frameworks from the Aventis compound repository yielded the current biased kinase library. A detailed biological space of kinases was generated by sequence- and 3D-based comparison of the ATP binding site. Biological profiling of this library against new kinase targets has already shown higher success rates than high-throughput screening (HTS) of our compound collection, and has revealed detailed information on how to achieve selectivity.
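A sequence-based comparison of ATP binding sites can be illustrated, in a very simplified form, as per-position identity between pre-aligned binding-site residue strings, used to rank known kinases by their similarity to a new target. The abstract does not specify the comparison method; the residue strings, kinase names and functions below are hypothetical.

```python
def site_identity(site_a, site_b, gap="-"):
    """Fraction of identical residues between two pre-aligned binding-site strings.
    Positions where either sequence has a gap are ignored."""
    if len(site_a) != len(site_b):
        raise ValueError("binding-site strings must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(site_a, site_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def rank_known_targets(new_site, known_sites):
    """Known targets ordered by binding-site identity to a new target, highest first."""
    scored = [(name, site_identity(new_site, site)) for name, site in known_sites.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy aligned ATP-site strings; names and residues are hypothetical.
known = {"kinase_X": "LGKGAVEY", "kinase_Y": "LGEGTFEY"}
print(rank_known_targets("LGKGSVEY", known))
# → [('kinase_X', 0.875), ('kinase_Y', 0.625)]
```

Compounds already profiled against the closest-ranked kinases would then be natural first candidates to test on the new target, while positions that differ between sites hint at where selectivity might be gained.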

All target-family libraries are dynamic and are constantly improved using proprietary and public knowledge. In addition, several techniques for matching chemical and biological space are being developed to find better leads faster.


Providing Cheminformatics Solutions to Medicinal Chemists to Support Drug Discovery Decisions



Kevin R. Holme and Rudolph Potenzone

Lion Bioscience, Inc., San Diego, United States of America

Drug discovery programs have had to deal with an avalanche of data arising both from the adoption of new technologies such as high-throughput screening and combinatorial chemistry, and from advances in genomics and structural genomics, which have facilitated a gene-family target approach to drug discovery. Although this data-rich environment has been a challenge to manage, it has provided an opportunity to develop informatics-based tools and solutions that extract information from this large body of data and convert it into knowledge that can be used and reused for drug discovery.

In the cheminformatics field there has been considerable focus on developing new tools to visualize and analyze the data, particularly for identifying new leads and analyzing SAR for lead optimization. While individual cheminformatics tools are critical for analyzing these data, a real opportunity exists to provide solutions that synthesize the results of these analyses into knowledge that supports drug discovery decisions. This synthesis remains largely a "manual" activity that takes place within individual project teams.

This paper will describe some concepts and implementations of cheminformatics solutions that begin to address the need for reusable knowledge generation within drug discovery projects. The talk will address the integration of chemical and biological data as well as the integration of tools and models. The power of using predictive tools for compound design will be highlighted, as will methods for considering multiple structure-activity relationships simultaneously. We will describe how providing such solutions to project teams can increase the effectiveness of discovery programs by giving biologists and chemists the ability to perform virtual hypothesis evaluation and multidimensional optimization from their desktops. Finally, some specific examples of how this approach has been applied to internal drug discovery programs at Lion will be presented.
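One generic way to consider several structure-activity relationships at once, assumed here for illustration and not taken from the Lion solutions described in the talk, is Pareto analysis: a design is kept if no other design is at least as good on every predicted property and strictly better on at least one. The property names and predicted values below are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better
    on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates that no other candidate dominates, in sorted order.
    candidates: dict mapping name -> tuple of objective values."""
    front = []
    for name, score in candidates.items():
        if not any(dominates(other, score)
                   for other_name, other in candidates.items() if other_name != name):
            front.append(name)
    return sorted(front)


# Hypothetical predictions: (predicted pIC50, predicted log selectivity ratio).
designs = {"c1": (7.0, 2.0), "c2": (6.0, 3.0), "c3": (5.0, 1.0), "c4": (7.5, 0.5)}
print(pareto_front(designs))  # → ['c1', 'c2', 'c4']
```

The front does not pick a single winner; it removes clearly inferior designs (here c3) and leaves the trade-offs between the remaining ones for the project team to judge.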


Does Quantum Chemistry Have a Place in Cheminformatics?



Timothy Clark

Computer-Chemie-Centrum, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

Cheminformatics demands, on the one hand, computational techniques that are as fast and simple as possible and, on the other, a description of molecular properties that is as accurate and detailed as possible. Simple 2D approaches fulfil the first condition, but it is becoming increasingly clear that they are often inadequate for the second. Modern hardware, however, makes semiempirical molecular orbital theory feasible for tens of thousands of molecules, and even density functional theory for hundreds. These techniques allow a far more detailed description of molecular electronic properties, especially electrostatics and polarizability.
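As a small sketch of one electrostatic descriptor, the dipole moment can be estimated from atomic partial charges of the kind a semiempirical or DFT calculation provides, using μ = Σ q_i r_i. The charges and coordinates below are invented, and the point-charge approximation is an assumption; dipoles computed by QM programs directly from the electron density can differ from this estimate.

```python
from math import sqrt

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*angstrom expressed in debye


def dipole_moment(charges, coords, tol=1e-3):
    """Dipole moment magnitude (debye) from atomic point charges (units of e)
    and Cartesian coordinates (angstrom): mu = sum_i q_i * r_i."""
    if abs(sum(charges)) > tol:
        # For a charged species the point-charge dipole depends on the chosen origin.
        raise ValueError("point-charge dipole is origin-dependent for non-neutral sets")
    mu = [sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3)]
    return sqrt(sum(c * c for c in mu)) * E_ANGSTROM_TO_DEBYE


# Hypothetical two-site model: -0.5 e and +0.5 e separated by 1.0 angstrom.
print(dipole_moment([-0.5, 0.5], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]))
```

Descriptors like this, together with quantities such as polarizability, are what the quantum-mechanical route adds over purely 2D representations.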

Examples of the use of such techniques for predicting physical properties or biological activity will be given. Special attention will be paid to the number and nature of the factors determining physical properties and to the use of quantum mechanically calculated binding vectors and points for high throughput virtual screening (HTVS).

The advantages and disadvantages of different types of computational techniques and descriptors for property prediction and HTVS will be discussed, as will the problem of multiple conformations in activity or toxicity prediction. The contrast between the quantum mechanical approach, which treats whole molecules and generally uses only their properties at the molecular surface, and 2D- or 3D-structure-based approaches will be emphasized.
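One common way to deal with multiple conformations, shown here as an assumption-laden sketch and not as the approach taken in this talk, is to compute a descriptor for each conformer and combine the values with Boltzmann weights derived from the conformers' relative energies. The conformer energies and descriptor values below are hypothetical.

```python
from math import exp

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_average(energies_kcal, values, temperature_k=298.15):
    """Boltzmann-weighted average of a per-conformer property.

    energies_kcal: relative conformer energies (kcal/mol); values: the property
    computed for each conformer. Energies are shifted by their minimum before
    exponentiating to avoid overflow/underflow.
    """
    if not energies_kcal or len(energies_kcal) != len(values):
        raise ValueError("need one property value per conformer")
    kt = R_KCAL * temperature_k
    e_min = min(energies_kcal)
    weights = [exp(-(e - e_min) / kt) for e in energies_kcal]
    z = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / z


# Hypothetical conformers: relative energies (kcal/mol) and a per-conformer descriptor.
print(boltzmann_average([0.0, 0.5, 2.0], [1.2, 2.0, 3.5]))
```

Weighting in this way assumes the conformer energies are reliable; for activity prediction the bioactive conformation may not be the lowest-energy one, which is part of why multiple conformations remain a problem.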
