Special Reviews

The Many Roles of Computation in Drug Discovery

See allHide authors and affiliations

Science  19 Mar 2004:
Vol. 303, Issue 5665, pp. 1813-1818
DOI: 10.1126/science.1096361


An overview is given on the diverse uses of computational chemistry in drug discovery. Particular emphasis is placed on virtual screening, de novo design, evaluation of drug-likeness, and advanced methods for determining protein-ligand binding.

“Is there really a case where a drug that's on the market was designed by a computer?” When asked this, I invoke the professorial mantra (“All questions are good questions.”), while sensing that the desired answer is “no”. Then, the inquisitor could go back to the lab with the reassurance that his or her choice to avoid learning about computational chemistry remains wise. The reality is that the use of computers and computational methods permeates all aspects of drug discovery today. Those who are most proficient with the computational tools have the advantage for delivering new drug candidates more quickly and at lower cost than their competitors.

However, the phrasing of the question suggests misunderstanding and oversimplification of the drug discovery process. First, it is the rare case today when an unmodified natural product like taxol becomes a drug. It is also inconceivable that a human with or without computational tools could propose a single chemical structure that ends up as a drug; there are far too many hurdles and subtleties along the way. Most drugs now arise through discovery programs that begin with identification of a biomolecular target of potential therapeutic value through biological studies including, for example, analysis of mice with gene knockouts. A multidisciplinary project team is then assembled with the goal of finding clinical candidates, i.e., druglike compounds that are ready for human clinical trials, which typically selectively bind to the molecular target and interfere either with its activity as a receptor or enzyme. Molecular libraries are screened, and the resulting leads are optimized in a cycle that features design, synthesis and assaying of numerous analogs, and animal studies. Crystal structure determination for complexes of some analogs with the biomolecular target is often possible, which enables “structure-based drug design” (SBDD) and the efficient optimization of leads. The success of SBDD is well documented (1, 2); it has contributed to the introduction of ∼50 compounds into clinical trials and to numerous drug approvals. Minimally, the role of computation here is in the structure refinement using simulated annealing (3), development of the underlying molecular mechanics (MM) force fields, structure display, and building and MM evaluation of analogs. All top pharmaceutical companies have substantial structural biology and computational chemistry groups that are intertwined and participate on the project teams.

There is usually much “tweaking” toward the end of the preclinical period of drug discovery when a series of compounds with adequate potency has been identified and the remaining concerns focus on differences in pharmacological issues relating to bioavailabilty, duration of action, and toxicity. As an example, the methyl group in celecoxib (Celebrex) makes the compound a weaker COX-2 inhibitor than the unsubstituted parent or chloro analog (Scheme 1); however, it also introduces a site for metabolic oxidation that then leads to acceptable clearance of the drug (4). In the more common case, too much metabolic activity reduces bioavailability, so, for example, a reactive hydrogen may be replaced by a halogen to block a metabolic process (5).

Scheme 1.

Although the production of primary metabolites can be well predicted, the details of the kinetics cannot. Tweaking is common as well to improve solubility or cell permeability, which are also central to bioavailability. The list of other issues and concerns is long, including the action of efflux pumps, active transport, potential drug-drug interactions (e.g., don't take antacids and fluoroquinolones including ciprofloxacin because metal ion chelation reduces their absorption), drug distribution in blood and tissue, binding to plasma proteins, microbial resistance, and toxicities of many forms such as mutagenicity, nephrotoxicity, hepatotoxicity, and the ventricular irregularity known as long Q-T syndrome (6). The complex differences between animals and humans are addressed mainly in human clinical trials, and, finally there are the differences between humans, which lead to the future of pharmacogenomics (7). Drug discovery is complex: Successful teams and companies need to be congratulated, whereas the search for one responsible individual or computer program is counterproductive. There is not going to be a “voilà” moment at the computer terminal. Instead, there is systematic use of wide-ranging computational tools to facilitate and enhance the drug discovery process. Aside from the universal and now taken for-granted use of software developed by chemists for structure drawing, database entry-management-query, two- to three-dimensional (2D to 3D) structure conversion, molecular visualization, quantum chemistry, molecular mechanics, conformational searching, molecular dynamics, and biomolecular structure refinement, recent advances in several specific areas in the lead to candidate progression should be highlighted.

Library Screening and De Novo Design

At the lead-generation stage, experimental high-throughput screening (HTS) requires a library of compounds and an assay for activity (8). A successful hit would have a concentration for 50% inhibition (IC50) of ∼10 μM; extensive lead optimization is typically needed to lower this value to the 1 to 10 nM range. Sometimes there are no hits (9), and the combination of high costs and low hit rates has put the large-scale approaches of the early 1990s out of favor (10). Computational virtual screening (1113) can be performed on libraries of known or constructed compounds and requires either measured activities for some known compounds or a structure of the biomolecular target.

When the structure of the target is unknown, the activity data can be used to construct a pharmacophore model for the positioning of key features like hydrogen-bonding and hydrophobic groups. Such a model can be used as a template to select the most promising candidates from the library.

Even simple considerations for a set of active compounds can greatly reduce the search space. The biomolecular binding sites for inhibitors have associated characteristics of spatial extent and overall polarity. In a plot such as Fig. 1, it is easy to see that inhibitors for a given target cluster in a limited region. The ordinate shows the solvent-accessible surface area (SASA) for each compound, and the abscissa shows the predicted log of the octanol/water partition coefficient (log P), which is a measure of hydrophobicity. The specific example is for nonnucleoside inhibitors of HIV-1 reverse transcriptase (NNRTIs), including efavirenz (Sustiva) and UC-781 (Scheme 2). The binding site is relatively small and hydrophobic, so the solvent-accessible surface area and the log P for an inhibitor should fall in the 450 to 650 Å2 and 2 to 6 ranges, respectively. The return can be expected to be low for investing in compounds outside these ranges in search of new NNRTIs. Such computational screening for similarity is common in selecting compounds for purchase from commercial libraries. A related activity is to build computationally all members of a proposed combinatorial library and then select the final reagents for purchase or synthesis based on the diversity of the resultant library members (14).

Fig. 1.

Plot of solvent accessible surface area (Å2) versus log P, as computed by QikProp (27), for the >70,000 compounds in the 2001 Maybridge catalog with the addition of 20 nonnucleoside inhibitors of HIV-1 reverse transcriptase (NNRTIs) in red.

Scheme 2.

Docking. When the structure of the target is known, usually from x-ray crystallography or predicted by homology modeling, the most common virtual screening approach is molecular docking (12, 15). Libraries of potential ligands are built in the computer, optimally positioned in the binding site, and scored for potential activity. The top-scoring compounds can then be purchased or synthesized and submitted for experimental testing. Numerous successes of this approach for obtaining lead compounds are well documented (12, 13). Some particularly notable examples are the discovery of DNA gyrase inhibitors after HTS failed (9) and the findings of much higher hit rates for docking than HTS in two comparative studies for tyrosine phosphatase 1B (16) and dihydrodipicolinate reductase (17). There are many approaches to docking and related software packages (15); some widely used ones are DOCK (18), FlexX (19), Glide (20), and GOLD (21). Although conformational sampling of the ligand is now performed by all of the programs, remaining issues for these programs to overcome include the generation of optimal structures of the complexes, flexibility of the host, and details of the scoring functions including treatment of solvation.

So, what's the best docking program? In general, true objectivity is difficult to realize in software comparisons owing to lingering favoritism, associated biased choices for the test data sets, comparison of new versions of one program with older versions of another, and uneven skill levels with different programs. If a developer or vendor is doing the comparisons, the strains on objectivity are obvious and they have the last shot at fine-tuning algorithms to remove problem cases. Therefore, serious potential users need to invest some effort in testing alternatives in their own environment. From the broader performance comparisons of docking programs, one general observation is that improved results are obtained using consensus scoring, i.e., application of multiple scoring functions to the same structures (2225). Thus, current individual scoring functions are not optimal. The typical test is to seed a library of inactive compounds with active ones and then determine the number of actives that are predicted to be among the ∼5% of top-scoring compounds. For thymidine kinase and estrogen receptor, Bissantz et al. (23) applied Dock, FlexX, and Gold and found that the hit rates of ∼10% from a single scoring function could rise to 25 to 40% and 65 to 70% with double and triple scoring.

Similarity searching. Further comparisons also need to be made with simpler alternatives, such as similarity matching against actives (26). For example, I removed ∼20,000 non-druglike molecules from the 2001 Maybridge catalog and then added the 20 NNRTIs of Fig. 1. The remaining 50684 compounds were processed by QikProp (27) to compute 36 descriptors and properties for each compound that are stored in a spreadsheet. This file was then processed with QikSim (28) to compute the similarity of each compound to the average of the active compounds with simple Euclidian and Tanimoto measures of similarity (26). Using the default weights, which emphasize similarity of surface areas, hydrogen bond counts, overall shape (globularity), and log P, 65% (Euclidian) and 50% (Tanimoto) of the active molecules were ranked as being among the top 5% in similarity to the average NNRTI. The corresponding random result would be that only 5% of the active molecules (1 molecule) would emerge in the top 5% of processed molecules (0.05 × 50684 = 2534 molecules). The results for the Euclidian analysis are illustrated in Fig. 2. The analogous procedure was performed for 20 nonsteroidal anti-inflammatory drugs, including celecoxib, rofecoxib (Vioxx), and the common over-the-counter alternatives; 50% (Euclidian) and 75% (Tanimoto) of the active compounds emerged in the top-ranked 5%. Examples of two compounds from the Maybridge catalog, which were ranked among those most similar to the average NNRTI, are RJC 02097 and RJC 03066. They can be compared to efavirenz and UC-781, respectively (Scheme 3).

Scheme 3.

De novo design. A related activity is de novo design (15), i.e., design of inhibitors from scratch given the target binding site. Docking programs can also be used for this purpose by coupling them with a structure generator. However, a variety of more specialized programs have been developed for building ligands in binding sites, usually by positioning and connecting molecular fragments or by growing substituents on a core; early examples of the alternatives are LUDI (29) and SPROUT (30). As a recent example, my contribution in this area is the growing program BOMB (biochemical and organic model builder). The user positions a core, which may be as simple as methane, in the binding site, and a combinatorial library of analogs is created by appending up to four substituents from a list of more than 500 typical drug fragments. All reasonable conformers of each analog are constructed and optimized with the OPLS-AA force field with variation of the dihedral angles, position and orientation of the analog, and some dihedral angles for protein side chains. The final structures are again scored with consideration of the interaction energy, hydrogen bonding, and solvation energy changes.

One scoring function has been trained on several hundred complexes for HIV-RT and COX2. It can then be used with BOMB in construction mode to design new inhibitors (Fig. 3). BOMB is fully integrated with QikProp, so each analog that is built is filtered for drug likeness. Molecules can be built in five topologies, e.g., by sprouting from a central core or by linking groups sequentially to yield more elongated constructs. The weakest point is the scoring, which is reflected in limited transferability of scoring functions between different classes of protein targets.

Fig. 2.

Results of similarity searching for NNRTIs seeded into the Maybridge catalog.

Fig. 3.

An NNRTI built by BOMB in the HIV-1 RT binding site starting from ammonia as the core. Full atomic detail is provided, though only selected protein atoms are illustrated here, including Tyr181, Tyr188, Phe227, Trp229, Leu100, Lys101, and Val106. The protein surface is colored according to its electrostatic potential, red and blue for negatively and positively charged areas, respectively.

Another general issue with de novo design is the synthetic accessibility of the constructs. Our approach is to build libraries with a parallel synthesis plan in mind that centers on standard reactions such as amide bond formation, coupling reactions, and cycloaddition chemistry. A similar notion has been formalized in the SYNOPSIS program by only considering molecules that can be built from known precursors through successive application of appropriate reactions from a set of 70 possibilities (31). Interaction between all team members including the computational and synthetic chemists is clearly important.

Prediction of Properties and Drug-likeness

Lipinski, Murcko, and co-workers at Pfizer and Vertex deserve much credit for raising awareness about properties and structural features that render molecules more or less druglike (11, 3235). The motivation is to apply ADME (absorption, distribution, metabolism, and excretion) considerations early in preclinical development to avoid costly late-stage preclinical and clinical failures (36). The well-known rule-of-five (33) points out that most orally administered drugs have a molecular weight (MW) of 500 or less, a log P no higher than 5, five or fewer hydrogen-bond donor sites, and 10 or fewer hydrogen-bond acceptor sites (N and O atoms). There are other important issues for drug likeness, including lack of reactive functionality except in prodrugs, cell permeability, and, for central nervous system (CNS) compounds, brain/blood partitioning. As reviewed elsewhere (37), an acceptable level of solubility is also critical to permit dissolution and absorption; virtually all drugs have aqueous solubilities above 10–6 M (log S > –6).

The selection of such properties and features and their allowable ranges can be adjusted to provide filters for specific applications such as selection of screening compounds or purchase of reagents for a combinatorial library. Of course, rules invite exceptions, and there is always someone who is quick to note cases such as macrolide antibiotics like erythromycin (MW 734), which through evolution and use of active transport mechanisms are effective antibacterial agents. Cost-efficient modern drug discovery has a substantial statistical component to it: Envelope pushers need to have the resources to take on the higher risks.

Most chemical software companies now offer modules for computation of ADME-related properties (38). The predictions mostly come from regression equations or neural networks (39) that are trained on experimental data. The amount of data and reliability of the corresponding predictions varies from excellent for log P and solubility to more limited reliability for properties such as log BB (brain/blood partitioning) and the IC50 for HERG K+-channel blockage, which promotes long Q-T syndrome. In 2000, I released one of the earliest general programs of this type, QikProp (27). The input is a 3D molecular structure and the output is a profile of structural features, ADME-relevant properties, undesirable functionality, primary metabolites, and comparisons of the same quantities against a database of known drugs. The structural features include surface area components and hydrogen-bonding potentials, and the properties include octanol/water and water/gas log Ps, log S (37), log BB (40), overall CNS activity (41), Caco-2 and MDCK cell permeabilities (42), log Khsa for human serum albumin binding (43), and log IC50 for HERG K+-channel blockage (44). ADME software is generally very fast and can be executed for large libraries, e.g., the processing time for the 71,500 compounds in the 2001 Maybridge catalog with QikProp is 3 min on a 3 GHz laptop PC. Such software is often implemented in corporate Web-based systems for ready processing of individual compounds or libraries.

Some results of a QikProp analysis for the antipsychotic drug chlorpromazine (Thorazine) and the antiadrenergic atenolol are listed in Table 1 (Scheme 4). The two drugs approach the hydrophobic and hydrophilic extremes, and they illustrate the typical patterns. Hydrophobic compounds have relatively poor solubility, high log P, and high serum-protein binding, but good cell permeability, whereas the opposite is true for the hydrophilic compounds. This dichotomy is responsible for the classic lead-optimization struggle of solubility versus permeability. ADME software can help thorough predictions for a proposed analog series and early identification of potential problems. As an example of a promising compound that went over the line, the peptide-like thrombin inhibitor (Scheme 5) had to be abandoned late in development due to its oral bioavailability of only 1% (2). A quick sketch with ChemDraw, conversion to a 3D structure with Chem3D, and processing by QikProp, reveals that the problem appears to be poor cell permeability for this relatively polar molecule, with predicted PCaco and PMDCK values near 10 nm/s.

Scheme 4.
Scheme 5.
Table 1.

Predicted and experimental (in parentheses) ADME-related properties.

log S -4.5 (-5.01) -0.61 (-1.30)
log P 4.80 (5.19) 0.40 (0.16)
log BB 0.74 (1.06) -1.09
log Khsa 0.78 (1.10) -0.79 (-0.48)
PCaco (nm/s) 2003 66 (33)
PMDCK (nm/s) 1425 33 (18)
CNS Activity ++ --

Beyond the use of ADME software for individual compounds and the obvious use as a filter for library screening, analyses for classes of compounds can also be informative. Avoidance of the extremes in a class is a safe strategy. For example, some predictions from QikProp are shown in Table 2 for NNRTIs in order of increasing log P. UC-781 is predicted to be the most hydrophobic with poor solubility and high serum-protein binding; although it is potent in vitro, its pursuit as a topical microbicide rather than as an oral drug is undoubtedly related to these properties (45). The log BB predictions are also interesting from the standpoint of potential CNS penetration, which can be beneficial for attack on latent HIV reservoirs or a concern for CNS side effects.

Table 2.

Selected predicted properties for NNRTIs.

NNRTIlog Plog BBlog Slog Khsa
Nevirapine 2.52 0.02 -3.92 0.11
Delavirdine 2.80 -1.59 -5.74 0.31
TMC 125 2.80 -2.13 -5.79 0.25
MKC-442 3.08 -0.61 -3.57 0.13
8-chloro-TIBO 3.41 0.61 -3.83 0.43
Efavirenz 3.52 -0.28 -5.05 0.29
DPC083 4.10 -0.28 -5.52 0.52
UC-781 5.05 0.04 -5.74 0.70

Advanced Treatments of Protein-Ligand Binding

To progress beyond scoring functions, Monte Carlo (MC) statistical mechanics or molecular dynamics (MD) simulations are typically applied (4648). Classical force fields are used, and the methodology allows extensive sampling of all degrees of freedom for the biomolecule-ligand complexes and representation of the aqueous surroundings with hundreds to thousands of discrete water molecules. Free-energy perturbation (FEP) and thermodynamic integration (TI) calculations then provide formally rigorous means to compute free-energy changes. For biomolecule-ligand affinities, perturbations are made to convert one ligand to another using the thermodynamic cycle shown in Scheme 6. The conversions involve a coupling parameter that causes one molecule to be smoothly mutated to the other by changing the force field parameters and geometry. The difference in free energies of binding for the ligands X and Y then comes from ΔΔGb = ΔGX – ΔGY = ΔGF – ΔGC (Scheme 6). Two series of mutations are performed to convert X to Y unbound in water and complexed to the biomolecule, which yield ΔGF and ΔGC. The same cycle can be used to compute the effect of a protein mutation on the binding of one inhibitor by making P the inhibitor and X and Y the two proteins.

Scheme 6.

There have been numerous successful applications, as reviewed elsewhere (4648). Some of our own recent contributions have addressed substituent optimization for selective COX-2 inhibitors (49), the origin of COX-2/COX-1 selectivity (50), analysis of large substituent effects on Factor Xa inhibition (51), and elucidation of effects of protein mutations on the activity of anti-HIV drugs (5254). The last work included predictions for the structures of the complexes of efavirenz and TMC125 with HIV-1 RT, which were subsequently confirmed by x-ray crystallography; it also elucidated design principles for minimizing the effects of protein mutations on drug activity.

The explicit inclusion of water molecules in these studies certainly makes the representation of the molecular systems more realistic. The relevance of discrete water molecules that form hydrogen-bonded bridges between inhibitors and protein hosts is well illustrated in x-ray structures, when bound water molecules are resolved, and in many of the computational studies (Fig. 4). Water molecules are individual entities that form specific, hydrogen-bonding interactions with the inhibitors and proteins as well as clathrate-like networks around nonpolar groups (55, 56). Optimization of the water structure for protein-ligand complexes can be expected to become a more common part of inhibitor design in the future. Other strengths of the FEP and TI approaches are the rigor, extensive sampling, and the detailed insights that can be obtained on variations in binding affinities. The drawbacks are that they are comparatively time-consuming (∼2 days per compound versus a minute with scoring function approaches); the sampling and convergence of the results are not always complete, especially for significant backbone and side-chain conformational changes or 180° reorientation of the ligand; and it remains difficult to handle large structural differences between ligands as in changing the core structure (chemotype).

Fig. 4.

Computed structure from a Monte Carlo simulation for BIRB 796, an anti-inflammatory clinical candidate (67), bound to p38 mitogen-activated protein kinase. Fifty-six amino acid residues and 74 water molecules nearest the inhibitor are illustrated. The inhibitor is shown as a CPK model, and the water molecules and a few protein residues are presented as stick images; the remaining protein residues are rendered in the surface.

Hybrid methods. Faster, more approximate alternatives that still include substantial sampling are being pursued. An approximate approach based on linear response theory was introduced by Åqvist et al. (57). In this model, the free energy of interaction of a solute with its environment is given by one-half the electrostatic (Coulombic) energy plus the van der Waals (Lennard-Jones) energy scaled by an empirical parameter, α. For binding a ligand to a protein, the differences in the interactions between the ligand in the unbound state and bound in the complex then provide an estimate of the free energy of binding. The required energy components are obtained from MC or MD simulations for inhibitors in water and for the protein-inhibitor complexes in water. Key advantages over FEP and TI methods are that absolute free energies of binding can be approximated and that only simulations at the endpoints of a mutation are required, so the computing demands are reduced to a few hours per compound. A major plus for the LR approach is that it is easy to treat structurally diverse ligands, which might be impractical to tackle with FEP calculations. In spite of the approximations including the neglect of intramolecular energetics, the approach has yielded promising results (58). It is expected to be most viable for a series of relatively rigid analogs.

An early extension of the LR method to calculate free energies of hydration incorporated a third term proportional to the solute's solvent-accessible surface area, as an index for cavity formation (59). The technology has evolved, and results have since been obtained for much larger protein-inhibitor data sets, showing that better accuracy can be achieved by considering alternate descriptors in the LR equations. Some recent large studies have considered 45 celecoxib analogs with COX-2 (60), ∼250 NNRTIs with HIV-RT (61, 62), and 148 inhibitors with CDK2, Lck, and p38 kinase (63). The latter work is particularly notable because it demonstrated that the LR models that were built using the activity data for two kinases yielded excellent predictions for the third kinase (the predictive correlation coefficient q2 = 0.54 to 0.71). LR methods can be viewed as a merger of QSAR (quantitative structure-activity relationships) and simulation technology, and they have the advantage of using powerful descriptors including energy components and estimates of solvation energies that are extracted from the MD or MC simulations on the protein-ligand complexes. Other approximate methods have also been introduced (6466) that typically feature simplified representation of the solvent through Poisson-Boltzmann or generalized Born/surface area (GB/SA) procedures. It can be expected that these more advanced techniques will continue to evolve and become more widely used in drug discovery.

References and Notes

View Abstract

Navigate This Article