Molecular Design: 2018

Friday 30 November 2018

Ligand efficiency and fragment-to-lead optimizations

The third annual survey (F2L2017) of fragment-to-lead (F2L) optimizations was published last week. Given that it was the second survey (F2L2016) in this series, that prompted me to write 'The Nature of Ligand Efficiency' (NoLE), I thought that some comments would be in order. F2L2017 presents analysis of data that had been aggregated from all three surveys and I'll be focusing on the aspects of this analysis that relate to ligand efficiency (LE).

As noted in NoLE, perception of efficiency changes when affinity is expressed in different concentration units and I have argued that this is an undesirable feature for a quantity that is widely touted as useful for design. At very least, it does place a burden of proof on those who advocate the use of LE in design to either show that the change in perception of efficiency with concentration unit is not a problem or to justify their choice of the 1 M concentration unit. One difficulty that LE advocates face is that the nontrivial dependency of LE on the concentration unit only came to light a few years after LE was introduced as "a useful metric for lead selection" and, even now, some LE advocates appear to be in a state of denial. Put more bluntly, you weren't even aware that you were choosing the 1 M concentration unit when you started telling medicinal chemists that they should be using LE to do their jobs but you still want us to believe that you made the correct choice?

I'm assuming that the authors of F2L2017 would all claim familiarity with the fundamentals of physical chemistry and biophysics while some of the authors may even consider themselves to be experts in these areas. I'll put the following question to each of the authors of F2L2017: what would your reaction be to analysis showing that the space group for a crystal structure changed if the unit cell parameters were expressed using different units? I can also put things a bit more coarsely by noting that to examine the effect on perception of changing a unit is, when applicable, a most efficacious bullshit detector.

The analysis in F2L2017 that I'll focus on is the the comparison between fragment hits and leads. As I showed in NoLE, it is meaningless to compare LE values because LE has a nontrivial dependency on the concentration unit used to express affinity. LE advocates can of course declare themselves to be Experts (or even Thought Leaders) and invoke morality in support of their choice of the 1 M concentration unit. However, this is a risky tactic because physical science can't accommodate 'privileged' units and an insistence that quantities have to be expressed in specific units might be taken as evidence that one is not actually an Expert (at least not in physical science).

So let's take a look at what F2L2017 has to say about LE in the context of F2L optimizations.

"The distributions for fragment and lead LE have also remained reasonably constant. On average there is no significant change in LE between fragment and lead (ΔLE = 0.004, p ≈ 0.8). Figure 5A shows the distribution of ΔLE, which is approximately centered around zero, although interestingly there are more examples where LE increases from fragment to lead (40) than where a decrease is seen (25). Some caution is warranted when interpreting these data, as our minimum criterion for 100-fold potency improvement may have introduced some selection bias. Nevertheless, there is no clear evidence in this data set that LE changes systematically during fragment optimization. Although the average change in LE from fragment to lead is small, Figure 5B shows that the correlation between fragment and lead LE is modest (R2 = 0.22), with a mean absolute difference between fragment and lead LE of 0.08."

This might be a good point at which to remind the authors of F2L2017 about some of the more extravagant claims that have been made for LE. It has been asserted that “fragment hits typically possess high ‘ligand efficiency’ (binding affinity per heavy atom) and so are highly suitable for optimization into clinical candidates with good drug-like properties”. It has also been claimed that "ligand efficiency validated fragment-based design". However, the more important point is that it is completely meaningless to compare values of LE of hits and leads because you will come to different conclusions if you express affinity using a different concentration unit (see Table 2 in NoLE). It is also worth noting that expressing affinity in units of 1 M introduces selection bias just as does the requirement for 100-fold potency improvement.

Had I been reviewing F2L2017, I'd have suggested that the authors might think a bit more carefully about exactly why they are analyzing differences between LE values for fragments and leads. A perspective on fragment library design (reviewed in this post) correctly stated that a general objective of optimization projects is “ensuring that any additional molecular weight and lipophilicity also produces an acceptable increase in affinity". If you're thinking along these lines then scaling the F2L potency increase by the corresponding increase in molecular size makes a lot more sense than comparing LE for the fragments and leads. This quantifies how efficiently (in terms of increased molecular size) the potency gains for the F2L project have been achieved. This is not a new idea and I'll direct readers toward a 2006 study in which it was noted that a tenfold increase in affinity corresponded to a mean increase in molecular weight of 64 Da (standard deviation = 18 Da) for 73 compound pairs from FBLD projects. This is how group efficiency (GE) works and I draw the attention of the two F2L2017 authors from Astex to a perceptive statement made by their colleagues that GE is “a more sensitive metric to define the quality of an added group than a comparison of the LE of the parent and newly formed compounds”.

The distinction between a difference in LE and a difference in affinity that has been scaled by a difference in molecular size becomes a whole lot clearer if you examine the relevant equations. Equation (1) defines the F2L LE difference and first thing that you'll notice is that is that it is algebraically more complex than equation (2). This is relevant because LE advocates often tout the simplicity of the LE metric. However, the more significant difference between the two is that the concentration that defines the standard state is present in equation (1) but absent in equation (2). This means that you get the same answer when you scale affinity difference by the corresponding molecular size difference regardless of the units in which you express affinity.

So let's see how things look if you're prepared to think beyond LE when assessing F2L optimizations. Here's a figure from NoLE in which I've plotted the change in affinity against the change in number of non-hydrogen atoms for the F2L optimizations surveyed in F2L2016. The molecular size efficiency for each optimization can be calculated by dividing the change in affinity by the change in in number of non-hydrogen atoms. I've drawn lines corresponding to minimum and maximum values of molecular size efficiency and have also shown the quartiles.

So now it's time to wrap things up. A physical quantity that is expressed in a different unit is still the same physical quantity and I presume that all the authors of F2L2017 would have been aware of this while they were still undergraduates. LE was described as thermodynamically indefensible in comments on Derek's post on NoLE and choosing to defend an indefensible position usually ends in tears (just as it did for the French at Dien Bien Phu in 1954). The dilemma facing those who seek to lead opinion in FBDD is that to embrace the view that the 1 M concentration unit is somehow privileged requires that they abandon fundamental physicochemical principles that they would have learned as undergraduates.

Sunday 14 October 2018

A PAINful itch

I've been meaning to take a look at the Seven Year Itch (SYI) article on PAINS for some time. SYI looks back over the the preceding 7 years of PAINS while presenting a view of future directions. One general comment that I would make of SYI is that it appears to try to counter criticisms of PAINS filters without explicitly acknowledging these criticisms.

This will a long post and strong coffee may be required. Before starting, it must be stressed that I neither deny that assay interference is a significant problem nor do I assert that compounds identified by PAINS filters are benign. The essence of my criticism of much of the PAINS analysis is that the rhetoric is simply not supported by the data. It has always been easy to opine that chemical structures look unwholesome but it has always been rather more difficult to demonstrate that compounds are behaving pathologically in assays. One observation that I would make about modern drug discovery is that fact and opinion often become entangled to the extent that those who express (and seek to influence) opinions are no longer capable of distinguishing what they know from what they believe.

I've included some photos to break up the text a bit and these are from a 2016 visit to the north of Vietnam. I'll start with this one taken from the western shore of Hoan Kiem Lake the night after the supermoon.

Hanoi moon

I found SYI to be something of a propaganda piece with all the coherence of a six-hour Fidel Castro harangue. As is typical for articles in the PAINS literature, SYI is heavy in speculation and opinion but is considerably lighter in facts and measured data. It wastes little time in letting readers know how many times the original PAINS article was cited. One criticism that I have made about the original PAINS article (that also applies to SYI and the articles in between) is that the article neither defines the term PAINS (other than to expand the acronym) nor does it provide objective criteria by which a compound can be shown experimentally to be (or not to be) a PAINS (or is that a PAIN). An 'unofficial' definition for the term PAINS has actually been published and I think that it's pretty good:

"PAINS, or pan-assay interference compounds, are compounds that have been observed to show activity in multiple types of assays by interfering with the assay readout rather than through specific compound/target interactions."

While PAINS purists might denounce the creators of the unofficial PAINS definition for heresy and unspecified doctrinal errors, I would argue that the unofficial definition is more useful than the official definition (PAINS are pan-assay interference compounds). I would also point out that some of those who introduced the unofficial definition actually use experiments to study assay interference when much of the official PAINSology (or should that be PAINSomics) consists of speculation about the causes of frequent-hitter behavior. One question that I shall put to you, the reader, is how often, when reading an article on PAINS, do you see real examples of experimental studies that have clearly demonstrated that specific compounds exhibit pan-assay interference?

Restored bunker and barbed wire at Strongpoint Béatrice which was the first to fall to the Viet Minh.

Although the reception of PAINS filters has generally been positive, JCIM has published two articles (the first by an Associate Editor of that journal and the second by me) that examine the PAINS filters critically from a cheminformatic perspective. The basis of the criticism is that the PAINS filters are predictors of frequent hitter behavior for assays using an AlphaScreen readout and they have been developed using proprietary data. It's a quite a leap from frequent-hitter behavior when tested at single concentrations in a panel of six AlphaScreen assays to pan-assay interference. In the language of cheminfomatics, we can state that the PAINS filters have been extrapolated out of a narrow applicability domain and they have been reported (ref and ref) to be less predictive of frequent-hitter behavior in these situations. One point that I specifically made was that a panel of six assays all using the same readout is a suboptimal design of an experiment to detect and quantify pan-assay interference.

In my article, bad behavior in assays was classified as Type 1 ( assay result gives an incorrect indication of the extent to which the compound affects the function of the target) or Type 2 (compounds affect target function by an undesirable mechanism of action). I used these rather bland labels because I didn't want to become ensnared in a Dien Bien Phu of nomenclature and it must be stressed that there is absolutely no suggestion that other people use these labels. My own preference would actually be to only use the term interference for Type 1 bad behavior and it's worth remembering that Type 1 bad behavior can also lead to false negatives.

The distinction between Type 1 and and Type 2 behaviors is an important and useful one to make from the perspective of drug discovery scientists who are making decisions as to which screening hits to take forward. Type 1 behavior is undesirable because it means that you can't believe the screening result for hits but, provided that you can find an assay (e.g. label-free measurement of affinity) that is not interfered with, Type 1 behavior is a manageable, although irksome, problem. Running a second assay that uses an orthogonal readout may shed light on whether Type 1 behavior is an issue although, in some cases, it may be possible to assess, and even correct for, interference without running the orthogonal second assay. Type 2 behavior is a much more serious problem and a compound that exhibits Type 2 behavior needs to be put out of its misery as swiftly and mercifully as possible. The challenge presented by Type 2 behavior is that you need to establish the mechanism of action simply to determine whether or not it is desirable. Running a second assay with an orthogonal readout is unlikely to provide useful information since the effect on target function is real.

Barbed wire at Strongpoint Béatrice. I'm guessing that it was not far from here that, on the night of 13th/14th March, 1954, Captain Riès would have made the final transmission: "It's all over - the Viets are here. Fire on my position. Out."

Most (all?) of the PAINSology before SYI failed to make any distinction between Type 1 and Type 2 bad behavior. SYI states "There does not seem to be an industry-accepted nomenclature or ontology of anomalous binding behavior" and makes some suggestions as to how this state of affairs might be rectified. SYI recommends that "Actives" be first classified as "target modulators" or "readout modulators". The "target modulators" are all considered to be "true positives" and these are further classified as "true hits" or "false hits". All the "readout modulators" are labelled as "false positives". Unsurprisingly, the authors recommend that all the "false hits" and "false positives" be labelled as pan-assay interference compounds regardless of whether the compounds in question actually exhibit pan-assay interference. In general, I would advise against drawing a distinction between the terms "hit" and "positive" in the context of screening but, if you chose to do so, then you do really do need to define the terms much more precisely than the authors have done.

I think the term "readout modulator" is reasonable and is equivalent to my definition of Type 1 behavior (assay result gives an incorrect indication of the extent to which the compound affects the function of the target). However, I strongly disagree with the classification of compounds showing "non-specific interaction with target leading to active readout" as readout modulators since I'd regard any interaction with the target that affects its function to be modulation. My understanding is that the effects of colloidal aggregators on protein function are real (although not exploitable) and that it is often possible to observe reproducible concentration responses. My advice to the authors is that, if you're going to appropriate colloidal aggregators as PAINS, then you might at least put them in the right category.

While the term "target modulator" is also reasonable, it might not be a such great idea to use it in connection with assay interference since it's also quite a good description of a drug. Consider the possibility of homeopaths and anti-vaxxers denouncing the pharmaceutical industry for poisoning people with target modulators. However, I disagree with the use of the term "false hit" since the modulation of the target is real even when the mechanism of action is not exploitable. There is also a danger of confusing the "false hits" with the "false positives" and SYI is not exactly clear about the distinction between a "hit" and a "positive". In screening both terms tend to be used to specify results for which the readout exceeds a threshold value.

The defensive positions on one of the hills of Strongpoint Béatrice have not been restored. Although the trenches have filled in with time, they are not always as shallow as they appear to be in this photo (as I discovered when I stepped off the path).

It's now time to examine what SYI has to to say and singlet oxygen is as good a place as any to start from. One criticism of PAINS filters that I have made, both in my article and the Molecular Design blog, is that some of the frequent-hitter behavior in the PAINS assay panel may be due to quenching or scavenging of singlet oxygen which is an essential component of the the AlphaScreen readout. SYI states:

"However, while many PAINS classes contain some member compounds that registered as hits in all the assays analyzed and that therefore could be AlphaScreen-specific signal interference compounds, most compounds in such classes signal in only a portion of assays. For these, chemical reactivity that is only induced in some assays is a plausible mechanism for platform-independent assay interference."

The authors seem to be interpreting the observation that a compound only hits in a portion of assays as evidence for platform-independent assay interference. This is actually a very naive argument for a number of reasons. First, compounds do not all appear to have been assayed at the same concentration in the original PAINS assay panel and there may be other sources of variation that were not disclosed. Second, different readout thresholds may have been used for the assays in the panel and noise in the readout introduces a probabilistic element to whether or not the signal for a compound exceeds the threshold. Last, but definitely not least, the molecular structure of a compound does influence the efficiency with which it quenches or scavenges singlet oxygen. A recent study observed that PAINS "alerts appear to encode primarily AlphaScreen promiscuous molecules".

If you read enough PAINS literature, you'll invariably come across sweeping generalizations made about PAINS. For example, it has been claimed that "Most PAINS function as reactive chemicals rather than discriminating drugs." SYI follows this pattern and asserts:

"Another comment we frequently encounter and very relevant to this journal is that PAINS may not be appropriate for drug development but may still comprise useful tool compounds. This is not so, as tool compounds need to be much more pharmacologically precise in order that the biological responses they invoke can be unambiguously interpreted."

While it is encouraging that the authors have finally realized the significance of the distinction between readout modulators and target modulators, they don't seem to be fully aware of the implications of making this distinction. Specifically, one can no longer make the sweeping generalizations about PAINS that are common in PAINS literature. Consider a hypothetical compound that is an efficient quencher of singlet oxygen and that has shown up as a hit in all six AlphaScreen assays of the original PAINS assay panel. While many would consider this compound to be a PAINS (or PAIN), I would strongly challenge a claim that observation of frequent-hitter behavior in this assay panel would be sufficient to rule out the use of the compound as a tool.

SYI notes that PAINS are recognized by other independently developed promiscuity filters.

"The corroboration of PAINS classes by such independent efforts provides strong support for the structural filters and subsequent recognition and awareness of poorly performing compound classes in the literature. It is instructive therefore to introduce two more recent and fully statistically validated frequent-hitter analytical methods that are assay platform-independent. The first was reported in 2014 by AstraZeneca(16) and the second in 2016 by academic researchers and called Badapple.(27)"

I don't think it is particularly surprising (or significant) that some of the PAINS classes are recognized as frequent-hitters by other models for frequent-hitter behavior. What is not clear is how many of the PAINS classes are recognized by the other frequent-hitter models or how 'strong' the recognition is. I would challenge the description of the AstraZeneca frequent-hitter model as "fully statistically validated" since validation was performed using proprietary data. I made a similar criticism of the original PAINS study and would suggest that the authors take a look at what this JCIM editorial has to say about the use of proprietary data in modeling studies.

The French named this place Eliane and it was quieter when I visited than it would have been on 6th May, 1954 when the Viet Minh detonated a large mine beneath the French positions. It has been said that the alphabetically-ordered (Anne-Marie to Isabelle) strongpoints at Dien Bien Phu were named for the mistresses of the commander, Colonel (later General) Christian de Castries although this is unlikely.

SYI summarizes as follows:

"In summary, we have previously discussed a variety of issues key to interpretation of PAINS filter outputs, ranging from HTS library design and screening concentration, relevance of PAINS-bearing FDA-approved drugs, issues in SMARTS to SLN conversion, the reality of nonfrequent hitter PAINS, as well as PAINS and non-PAINS that are respectively not recognized or recognized in the PAINS filters as originally published. However, nowhere has a discussion around these key principles been summarized in one article, and that is the point of the current article. Had this been the case, we believe some recent contributions to the literature would have been more thoughtfully directed. (21,32)"

I must confess that reference to the reality of nonfrequent hitter pan assay interference compounds would normally prompt me to advise authors to stay off the peyote until the manuscript has been safely submitted. However, the bigger problem embedded in the somewhat Rumsfeldesque first sentence is that you need objective and unambiguous criteria by which compounds can be determined to be PAINS or non-PAINS before you can talk about "key principles". You also need to acknowledge that interference with readout and and undesirable mechanisms of action are entirely different problems requiring entirely different solutions.

I noted that recent contributions to the literature from me and from a JCIM Associate Editor (who might know a bit more about cheminformatics than the authors) were criticized for being insufficiently thoughtful. To be criticized in this manner is, as the late, great Denis Healey might have observed, "like being savaged by a dead sheep". Despite what the authors believe, I can confirm that my contribution to the literature would have been very similar even if SYI had been published beforehand. Nevertheless, I would suggest to the authors that dismissing the feedback from a JCIM Associate Editor as if he were a disobedient schoolboy might not have been such a smart move. For example, it could get the JMC editors wondering a bit more about exactly what they'd got themselves into when they decided to endorse a frequent-hitter model as a predictor of pan-assay interference. The endorsement of a predictive model by a premier scientific journal represents a huge benefit to the creators of the model but the flip side is that it also represents a huge risk to the journal.

So that's all that I want to say about PAINS and it's a good point to wrap things up so that I can return to Vietnam for the remainder of the post.

I'm pretty sure that neither General Giap nor General de Castries visited the summit of Fansipan which at 3143 meters is the highest point in Vietnam (I wouldn't have either had a cable car had not been installed a few months before I visited). It's a great place to enjoy the sunset.

Back in Hanoi, I attempted to pay my respects to Uncle Ho, as I've done on two previous visits to this city, but timing was not great (they were doing the annual formaldehyde change). Uncle Ho is in much better shape than Chairman Mao who is actually seven years 'younger' and this is a consequence of having been embalmed by the Russians (the acknowledged experts in this field). Chairman Mao had the misfortune to expire when Sino-Soviet relations were particularly frosty and his pickling was left to some of his less expert fellow citizens. It is also said that the Russian embalming team arrived in Hanoi before Uncle Ho had actually expired...

Catching up with Uncle Ho

Sunday 7 October 2018

More hydrogen bonding asymmetries

<< Previous

I examined an article on the polarized nature of protein-ligand binding interfaces previously and promised that I'd discuss a completely different type of hydrogen bond asymmetry which is based not on structure but on energetics. Readers may be familiar with hydrogen bond (HB) acidity and basicity which may be quantified by measurement of 1:1 association constants for hydrogen bonded complexes in non-hydrogen bonding solvents (e.g. carbon tetrachloride). Here are three references (1 | 2 | 3) and I'll also mention that molecular electrostatic potential (MEP) can be used for prediction of both HB acidity and HB basicity.

As discussed in this article, measurements of HB acidity and basicity have their limitations when trying to use them to understand and predict solvation behavior in aqueous media. First, measuring the association constant for a 1:1 complex does not tell us what will happen when two water molecules simultaneously donate hydrogen bonds to the oxygen atom of a carbonyl group. Second, the measured association constants cannot be used to compare HB acceptors with HB donors. This may seem a perverse sort of thing to want to do but one of the things that drug designers are interested in is the ease of dragging different HB donors and acceptors out of water.

Lake Liadskoye

Prediction of alkane/water partition coefficients (logPalk) has been a long standing interest (1 | 2 | 3) of mine for a number of years. It turns out that analysis of logPalk values measured for structurally prototypical model compounds can tell us quite a lot about what happens when you drag individual HB donors and acceptors out of water. The analysis is based on the observation of a very strong correlation between molecular surface area (MSA) and logPalk. The figure below shows the response of logPalk to MSA for saturated hydrocarbons, aliphatic alcohols (single hydroxyl group) and aliphatic diols. The lines of fit are essentially parallel and equally spaced which suggests that the effect on logPalk of adding a hydroxyl group to a saturated hydrocarbon or to an aliphatic alcohol is constant. This suggests treating polar groups as perturbations of saturated hydrocarbons for prediction of logPalk and analysis of data like what is shown in Figure 1 can be used to parameterize the perturbations for different polar groups. The approach, described in this article is to first calculate logPalk for a hypothetical saturated hydrocarbon with the same MSA as the compound of interest and then to sum the parameters for the polar groups in the molecular structure to account for the introduction of these polar groups.

Figure 1. Relationship between alkane/water logP and molecular surface area (MSA) for saturated hydrocarbons, saturated alcohols and saturated diols. Neither of the aliphatic diols (1,4-butanediol and 1,6-hexanediol) would be expected to form intramolecular HBs in water.

I think we'll need more data (especially for heterocycles and species with intramolecular HBs) to make this approach to prediction of logPalk generally useful. However, the size of the effect on logPalk of introducing an HB acceptor or donor into a saturated hydrocarbon does tell us how strongly the HB donor or acceptor interacts with water. It was actually this article which was published after our article on logPalk prediction that got me thinking along these lines. In our article, we showed how polarity can be defined for HB acceptors and donors and calculated from measured alkane/water partition coefficients. Polarity defined in this manner brings HB donors and acceptors onto the same scale and allows us to explore another type of hydrogen bonding asymmetry.

Insects exploiting surface tension in Belovezhskaya Pushcha

For HB acceptors, the approach is simple. First you need to identify appropriate model compounds for which logPalk has been measured. These have only the HB acceptor functional group of interest, saturated carbon and hydrogen in their molecular structures. Next, calculate logPalk for a saturated hydrocarbon with the same MSA as that for the model compound (use line for saturated hydrocarbons in Figure 1 to do this) and subtract the measured logPalk value from the calculated value. Things are a bit more complicated for HB donors because you can't usually have these without an HB acceptor (this is the 'baggage' I discussed in the previous post) and you need to deal with these on a case-by-case basis. For example, you might estimate the polarity of an amide NH by subtracting the polarity of the tertiary amide group from that of the secondary amide group. Here's a table of polarity estimates for some hydrogen bond acceptors and donors (our article explains how these were derived).

Table 1. Polarity of HB acceptors and donors estimated from measured alkane/water partition coefficient and molecular surface area

The results in Table 1 show that HB donors are typically more easily pulled out of water than HB acceptors and this can be seen as another hydrogen bonding asymmetry. This appears to go against the folklore that HB donors are somehow worse than HB acceptors from the perspective of drug-likeness. The polarity values for the NH (0.8) and carbonyl O (6.8) of the amide group may have some relevance to protein folding. This a good place to wrap up and I'll conclude by noting that, in the supplemental information for our article, you'll find an archive that contains files (in plain text format) of measured values for logPalk, hydrogen bond basicity and pKa that we extracted from the literature (DOI links are included). Here are some more photos from Belarus.

Até mais!

Flora and fauna of Belovezhskaya Pushcha

Sunday 30 September 2018

Hydrogen bonding asymmetries

Next >>

Have you ever wondered why the Rule of 5 (Ro5) specifies hydrogen bond (HB) thresholds of 10 acceptors but only 5 donors? This is, perhaps, the prototypical example of what I'll call a 'hydrogen bonding asymmetry' and it is sometimes invoked in support of the folklore that HB donors are somehow 'worse' than HB acceptors in drug design. I have, on occasion, tried to track down the source of this folklore but that trail has always gone cold on me. In any case, I don't think the HB asymmetry in Ro5 has any physical significance since HB acceptors (especially as defined for Ro5) tend to be more common in chemical structures of interest to medicinal chemists than HB donors. This was discussed in our correlation inflation article and the bigger Ro5 question for me is why the high polarity limit is defined by counts of HB donors and acceptors while the low polarity limit is defined in terms of lipophilicity. As may become a blogging habit, I'll include some random photos (these are from a visit to India late in 2013) to break up the text a bit.

Drum fest at Buland Darwaza

It was this article in JCAMD about the 'polarized' nature of protein-ligand interfaces that got me thinking again about hydrogen bonding asymmetries. The study found that proteins donate twice as many HBs as they accepted. While the observation is certainly interesting, I do think that the authors might be over-interpreting it. For example, the authors suggest that it appears to be an underlying explanation for Ro5 and they may find that there are significant differences in their definitions of HB acceptors and those used to apply Ro5. The authors also state "Peptidyl ligands, on the other hand, showed no strong preference for donating versus accepting H-bonds". This observation would more be consistent with 'polarization' of protein-ligand interfaces being determined by nature of the ligand.

The authors assert that "lone pairs available to accept H-bonds are actually 1.6 times as prevalent as protons available to donate, both on the protein and ligand side of the interface." While it is appropriate to count lone pairs in situations where only one lone pair accepts an HB (e.g. when considering 1:1 hydrogen bonded complexes in low polarity solvents), I would argue that it is not appropriate to do so when considering biomolecular recognition in aqueous media because the acceptance of an HB by one oxygen lone pair makes the other lone pair less able to accept an HB. You can see this effect using molecular electrostatic potential as discussed in this article (see polarization effects section and Table 4). Put another way, how often is a carbonyl oxygen observed to accept two HBs from a binding partner? How many docking tools would explictly penalize a pose in which a carbonyl oxygen accepted two HBs?

As I see it, a typical protein is more likely to have a surplus of HB donors under normal physiological conditions. Some parts (e.g. serine, threonine, tyrosine and histidine side chains and the backbone) of a protein can be regarded as having equal numbers of HB donor and acceptor atoms. While the anionic side chains of aspartate and glutamate cannot donate HBs, the cationic side chains of arginine and lysine have five and three donor hydrogen atoms respectively while lacking HB acceptors. The tryptophan side chain has only a single HB donor (although its p-system is likely to be able to accept HBs) while each side chain of aspargine and glutamine has two donor hydrogen atoms and one acceptor oxygen atom. The histidine side chain is sometimes observed to be protonated in X-ray crystal structures which means that it should be considered to be more HB donor than HB acceptor in the constext of protein-ligand recognition. The tyrosine hydroxyl would be expected to be a stronger HB donor (and weaker HB acceptor) than the hydroxyls of either serine or threonine.

A magical place

The study considers the "possibility is that nature avoids the presence of chemical groups bearing both H-bond donor and acceptor capacity, such as hydroxyl groups, in the binding sites of proteins or ligands" although it is not clear what glycobiologists would have to say about this. Let's think a bit about what happens when a hydroxyl group donates its hydrogen atom. Let's suppose you've spotted a nice juicy hydrogen bond acceptor at the bottom of a deep binding pocket that is otherwise hydrophobic. The ligandability is eye-wateringly awesome (the ligandometer is beeping loudly and appears to have gone into dynamic range overload). Even the tiresome Mothers Against Molecular Obesity (MAMO) are impressed and have recommended that you deploy a hydroxyl group since this will be great for property forecast index (PFI). What could possibly go wrong?

The main problem is that the hydroxyl HB donor comes with baggage. In order to donate an HB to the acceptor at the bottom of that pocket, you're going to need to force an HB acceptor into contact with the non-polar part of that binding pocket. Although this contact is not inherently repulsive, it is destabilizing. Another factor is that donation of an HB by the hydroxyl group is likely to increase the HB basicity of the oxygen (which will exacerbate the problem). You can think of other neutral HB donors (e.g. amide NH) but the vast majority of them come with baggage the form of an accompanying HB acceptor. Exceptions such as NH in pyrrole (not renowned for stability) and indole (steric demands) come with baggage of their own. In contrast, the drug designer has access to a diverse set (e.g. heteroaromatic N, nitrile N, tertiary amide O, sulfoxide O, ether O) of HB acceptors that are not accompanied by HB donors. If you use one of these, you don't have the problem of having to also accommodate a ligand HB donor.

This is a good place to wrap up. In the next post, I'll talk about a completely different type of hydrogen bonding asymmetry, but for now, I'll leave you with some photos from an afternoon spent admiring asses in the Rann of Kutch.

Até mais!

Thursday 13 September 2018

On the Nature of QSAR

With EuroQSAR2018 fast approaching, I'll share some thoughts from Brazil since I won't be there in person. I've not got any QSAR related graphics handy so I'll include a few random photos to break the text up a bit.

East of Marianne River on north coast of Trinidad

Although Corwin Hansch is generally regarded as the "Father of QSAR", it is helpful to look further back to the work of Louis Hammett in order to see the prehistory of the field. Hammett introduced the concept of the linear free energy relationship (LFER) which forms the basis of the formulation of QSAR by Hansch and Toshio Fujita. However, the LFER framework encodes two other concepts that are also relevant to drug design. First, the definition of a substituent constant relates a change in a property to a change in molecular structure and this underpins matched molecular pair analysis (MMPA). Second, establishing an LFER allows the sensitivity of physicochemical behavior to structural change to be quantified and this can be seen as a basis for the activity cliff concept.

Kasbah cats in Ouarzazate

As David Winkler and the late Prof. Fujita noted in this 2016 article, QSAR has evolved into "two QSARs":

Two main branches of QSAR have evolved. The first of these remains true to the origins of QSAR, where the model is often relatively simple and linear and interpretable in terms of molecular interactions or biological mechanisms, and may be considered “pure” or classical QSAR. The second type focuses much more on modeling structure–activity relationships in large data sets with high chemical diversity using a variety of regression or classification methods, and its primary purpose is to make reliable predictions of properties of new molecules—often the interpretation of the model is obscure or impossible.

I'll label the two branches of QSAR as "classical" (C) and "machine learning" (ML). As QSAR evolved from its origins into ML-QSAR, the descriptors became less physical and more numerous. While I would not attempt to interpret ML-QSAR models, I'd still be wary of interpreting a C-QSAR model if there was a high degree of correlation between the descriptors. One significant difficulty for those who advocate ML-QSAR is that machine learning is frequently associated with (or even equated to) artificial intelligence (AI) which, in turn, oozes hype. Here are a couple of recent In The Pipeline posts (don't forget to look at the comments) on machine learning and AI.

One difference between C-QSAR models and ML-QSAR models is that the former are typically local (training set compounds are closely related structurally) while the the latter are typically non-local (although not as global as their creators might have you believe). My view is that most 'global' QSAR models are actually ensembles of local models although many QSAR modelers would have me dispatched to the auto-da-fé for this heresy. A C-QSAR model is usually defined for a particular structural series (or scaffold) and the parameters are often specific (e.g. p value for C3-substituent) to the structural series. Provided that relevant data are available for training, one might anticipate that, within its applicability domain, local model will outperform a global model since the local model is better able to capture the structural context of the scaffold.

I would guess that most chemists would predict the effect on logP of chloro-substituting a compound more confidently than they would predict logP for the compound itself. Put another way, it is typically easier to predict the effect of a relatively small structural change (a perturbation) on chemical behavior than it is to predict chemical behavior directly from molecular structure. This is the basis for using free energy calculations to predict relative affinity and it also provides a motivation for MMPA (which can be seen as the data-analytic equivalent of free energy perturbation). This suggests viewing activity and properties in terms of structural relationships between compounds. I would argue that C-QSAR models are better able than ML-QSAR models to exploit structural relationships between compounds.

Down the islands with Venezuela in the distance

ML-QSAR models typically use many parameters to fit the data and this means that more data is needed to build them. One of the issues that I have with machine learning approaches to modeling is that it is not usually clear how many parameters have been used to build the models (and it's not always clear that the creators of the models know). You can think of number of parameters as the currency in which you pay for the quality of fit to the training data and you need to account for number of parameters when comparing performance of different models. This is an issue that I think ML-QSAR advocates need to address.

Overfitting of training data is an issue even for C-QSAR models that use small numbers of parameters. Generally, it is assumed that if a model satisfies validation criteria it has not been over-fitted. However, cross-validation can lead to an optimistic assessment of model quality if the distribution of compounds in the training space is very uneven. An analogous problem can arise even when using external test sets. Hawkins advocated creating test sets by removing all representatives of particular chemotypes from training sets and I was sufficiently uncouth to mention this to one of the plenaries at EuroQSAR 2016. Training set design and model validation do not appear to be solved problems in the context of ML-QSAR.

The Corniche in Beirut

I get the impression that machine learning algorithms may be better suited for classification than QSAR and it is common to see potency (or affinity) values classified as 'active' or 'inactive' for modeling. This creates a number of difficulties and I'll also point you towards the correlation inflation article that explains why gratuitous categorization of continuous data is very, very naughty. First, transformation of continuous data to categorical data throws away huge amounts of information which would seem to be the data science equivalent of shooting yourself in the foot. Second, categorization distorts your perception of the data (e.g. a pIC50 value of 6.5 might be regarded as more similar to one of 9.0 than one of 5.5). Third, a constant uncertainty in potency translates to a variable uncertainty in the classification. Fourth, if you categorize continuous data then you need to demonstrate that conclusions of analysis do not depend on the categorization scheme.

In the machine learning area not all QSAR is actually QSAR. This article reports that "the performance of Naïve Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods". However, the QSAR methods used appear to be based on categorical rather than quantitative definitions of activity. Even when more than two activity categories (e.g. high, medium, low) are defined, analysis might not be accounting for the ordering of the categories and this issue was also discussed in the correlation inflation article. Some clarification from the machine learning community may be in order as to which of their offerings can be used for modelling quantitative activity data.

I'll conclude the post by taking a look at where QSAR fits into the framework of drug design. Applying QSAR methods requires data and one difficulty for the modeler is that the project may have delivered its endpoint (or been put out of its misery) by the time that there is sufficient data for developing useful models. Simple models can be useful even if they are not particularly predictive. For example, modelling the response of pIC50 to logP makes it easy to see the extent to which the activity of each compound beats (or is beaten by) the trend in the data. Provided that there is sufficient range in the data, a weak correlation between pIC50 and logP is actually very desirable and I'll leave it to the reader to ponder why this might be the case. My view is that ML-QSAR models are unlikely to have significant impact for predicting potency against therapeutic targets in drug discovery projects.

So that's just about all I've got to say. Have an enjoyable conference and make sure keep the speakers honest with your questions. It'd be rude not to.

Early evening in Barra

Saturday 7 April 2018

Hammett

I first became aware of Louis Hammett during the third term of my first year as an undergraduate at the University of Reading. Hammett was a pioneer in physical-organic chemistry and is widely regarded as one of the founders of that field. He would have been 124 today and was less than a year younger than Christopher Ingold, another pioneer in the field. Hammett passed away in 1987 at the age of 92 (here is an excellent obituary).

Today Hammett is remembered primarily for the parameters that describe electronic interactions between aromatic rings and their substituents. He also introduced linear free energy relationships which form the basis of classical QSAR. These days, QSAR has evolved away from its origins in physical-organic chemistry into what many call machine learning and parameters have become less physical (and considerably more numerous). Hammett's work provided an early lesson to wannabe molecular designers in how to think about molecules.

Jens Sadowski and I introduced matched molecular pair analysis (MMPA) in a chapter of a cheminformatics book that was conceived and edited by my dear friend (and favorite Transylvanian) Tudor Oprea. Here's a photo of Tudor and me at an OpenEye meeting (I think CUP II in 2001) during which our props (Tudor is wearing a PoD cape) were provided by the session chair (the formidable Janet Newman who intimidates proteins to the extent that they 'voluntarily' crystallize).

Now you might be wondering what MMPA has to do with Hammett. The short answer is that our book chapter included a table of what are effectively substituent constants for aqueous solubility and these have Hammett's fingerprints all over them. The longer answer is that Hammett introduced the idea of associating parameters with structural relationships (e.g. X is chloro analog of Y) between compounds. This is an important idea because much pharmaceutical design is focused on understanding and predicting the effects of structural modifications on the activity and properties of compounds. One rationale for this focus is the belief that it is easier to predict differences (e.g. relative affinity) in chemical behavior between structurally-related compounds than it is to predict chemical behavior directly from molecular structure.

At first, I didn't see the deeper connection between Hammett's work and pharmaceutical design. The main focus of our book chapter was preparing chemical structures in databases for virtual screening so the full extent of Hammett's influence on MMPA was not immediately recognized. As is often the case, we think we've discovered something really new only to find out later that somebody had been thinking along similar lines many years before.

Happy 124th birthday, Louis Hammett.

Sunday 1 April 2018

The maximal quality of molecular interactions

There is a lot more to drug design than maximization of affinity and the key to successful design is actually that drugs form high quality interactions with their targets. Before the epiphany of ligand efficiency, measurement of interaction quality was a very inexact science. Ground-breaking research from the Budapest Enthalpomics Group (BEG) now puts the concept on a firm theoretical footing by unequivocally demonstrating that individual interactions can be localized on the affinity-quality axis in a unique manner that is completely independent of the standard state definition.

The essence of this novel approach is that, in addition to to its contributions to enthalpy and entropy of binding, each molecular interaction will now be awarded points for the artistic elements of the contact between ligand and target. This industry-leading application of Big Data uses the Blofeld-Auric Normalized Zeta Artificial Intelligence (BANZAI) algorithm to score aesthetic aspects of molecular interactions. This revolutionary machine learning application uses variable-depth, convolutional networks to model the covariance structure of the reduced efficiency tensor. Commenting on these seminal and disruptive findings the institute director, Prof. Kígyó Olaj, noted that "the algorithm is particularly accurate for scoring synchronization of vibrational modes and is even able to determine whether or not a hydrogen bond has made deliberate use of the bottom of the pool to assist another hydrogen bond during the binding routine".