Your Questions Answered: High-Quality Starting Points for Drug Discovery

During our recent webinar, The Greatest Hits: A Beginner’s Guide to High-Quality Starting Points for Drug Discovery, Sam Ceusters (Head of Chemistry, SymeGold, Symeres), Leon van Berkom, and Rutger Folmer (both Directors of Chemistry, Symeres) fielded questions from researchers across the globe. Here’s what the experts had to say.

What’s the importance of carbohydrate-based scaffolds in the future of drug discovery?

It’s growing, but complicated. For practical reasons, carbohydrates haven’t historically been the first thing people reach for when designing a screening campaign. Carbohydrates are polar, often high molecular weight, and their ADME profile tends to be harder to optimize than a conventional small molecule. If you need oral bioavailability and blood-brain barrier penetration, a carbohydrate scaffold is going to be a challenge.

However, it’s not always that clear cut. The sodium-glucose co-transporter 2 (SGLT2) inhibitors, such as dapagliflozin and empagliflozin, are carbohydrate-derived molecules that became major cardiovascular and renal medicines (1–3). They work by blocking glucose reabsorption in the kidney, and they’re now among the most widely prescribed agents for type 2 diabetes and heart failure. So in some cases, you can take a carbohydrate core, modify it into something genuinely drug-like, and end up with a clinically meaningful medicine.

Beyond these, there’s a growing interest in glycan biology, which plays important roles in immune recognition, cellular communication, cancer cell behavior, and bacterial infection. As the tools to study these interactions improve, we have an increasingly strong case for targeting them with small molecules.

There’s also a structural argument, since carbohydrates are inherently sp3-rich. They sit at the spherical end of chemical shape space – the same region that natural products occupy and that drug discovery is broadly moving towards. From a pure library design perspective, that’s worth noting: most high-throughput screening libraries are biased toward flat, aromatic compounds, leaving the spherical end of this space underrepresented.
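One common way to put a number on “sp3-rich” is the fraction of sp3 carbons (Fsp3): the count of sp3-hybridized carbons divided by the total carbon count. The sketch below is illustrative only. The helper name fsp3 is our own, and the carbon counts are tallied by hand from the published structures rather than computed with a cheminformatics toolkit. Keep in mind that Fsp3 measures saturation and is only a rough proxy for three-dimensional shape.

```python
def fsp3(sp3_carbons: int, total_carbons: int) -> float:
    """Fraction of sp3 carbons: sp3-hybridized carbons / all carbons."""
    if total_carbons <= 0:
        raise ValueError("total_carbons must be positive")
    if not 0 <= sp3_carbons <= total_carbons:
        raise ValueError("sp3_carbons must be between 0 and total_carbons")
    return sp3_carbons / total_carbons


# Hand-counted (sp3 carbons, total carbons); illustrative assumptions.
EXAMPLES = {
    "benzene": (0, 6),
    "glucose (pyranose form)": (6, 6),
    # Sugar ring 5 + CH2OH 1 + benzylic CH2 1 + ethoxy 2 = 9 sp3 carbons;
    # two benzene rings contribute 12 sp2 carbons.
    "dapagliflozin": (9, 21),
}

for name, (sp3, total) in EXAMPLES.items():
    print(f"{name}: Fsp3 = {fsp3(sp3, total):.2f}")
```

On these counts the bare sugar is fully saturated, a flat aromatic ring scores zero, and dapagliflozin sits in between: the drug-like modifications add aromatic carbons, but the carbohydrate core keeps the molecule well away from the flat end.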

The honest caveat here, of course, is that carbohydrate-based scaffolds come with real synthetic complexity and ADME challenges, and the field is still developing reliable strategies for both. Glycomimetics – simplified, more drug-like versions of carbohydrate scaffolds that preserve key binding geometry – are where much of the current effort is concentrated (4), and that’s probably where the clearest near-term opportunities will emerge. Carbohydrate scaffolds won’t be right for every program, but for the right biology, they’re underexplored in a way that tends to mean interesting IP.

Regarding the semisynthetic compounds from your library, were the natural scaffolds already from known bioactive molecules?

Short answer – mostly yes, and that’s partly the point. Natural products make compelling starting points for drug discovery because evolution already did the hard work! Over millions of years, they’ve been refined to bind biological targets with reasonable potency and selectivity. They’re rarely flat, tend to be structurally complex, and often sit in exactly the kind of sp3-rich, three-dimensional chemical space that drug hunters now actively seek out. If you caught the Escape from Flatlands section of the webinar, this will ring a bell.

So, when we’re designing pseudo-natural product derivatives for SymeGold, we’re often starting from scaffolds that are either known bioactives or closely related to them. Yes, this sometimes creates IP challenges, but there’s no patent on nature itself, and if you’re the first to synthesize and patent a novel analog that doesn’t exist in any prior art, that’s a legitimate and protectable compound. The natural product gives you the scaffold, while your chemistry gives you the IP.

It’s worth pointing out here that the same complexity that makes natural products biologically interesting can also make them a synthetic nightmare. A 60-step synthesis is impressive organic chemistry, but it’s not a viable starting point for making 120 analogs on a methyl position! The pseudo-natural compounds in SymeGold are designed with this tension in mind – we hold on to the biological inspiration, but we try to engineer tractability from the start.

How do you know when it’s the right time to move on from a hit series, and what does that decision process look like in practice?

There’s no clean rule, and anyone who tells you otherwise has probably never had to make the call.

The most useful framework Rutger laid out during the session is the target lead profile – a set of criteria you define before you start making compounds. This isn’t the full target product profile that covers dosing, shelf life, and patient subgroups, but the intermediate goals: what does a compound coming out of hit-to-lead actually need to look like? Activity levels, selectivity against anti-targets, metabolic stability in microsomes, physicochemical properties – specific, measurable things.

With that in hand, the question becomes much more manageable. If every round of analog synthesis is moving towards those criteria – even slowly – you probably keep going. If you’ve done multiple iterations and nothing’s improving, or compounds are consistently unstable, or potency is being lost with every structural change you make, that’s your signal to consider moving on.

Rutger’s analogy was “When do you stop looking for a lost purse?” You never really want to give up. And whether you can give up also depends on context – how many other projects are running, how deep the budget goes, whether this is the only thing keeping the program alive. “Kill cheap, kill quickly” is the principle. Whether it’s applicable depends on what else you’ve got.

The practical trigger we use to decide when to move on is a lack of improvement towards the target lead profile across a genuine run of chemistry. We always listen to the data.
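To make the idea concrete, here is a minimal sketch of a target lead profile as a checklist, plus a simple “has this series stalled?” test. Everything in it is hypothetical: the criterion names, the thresholds, the three-round window, and the compound data are placeholders chosen for illustration, not the values used in any real program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    threshold: float
    higher_is_better: bool

    def met(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


# Hypothetical target lead profile; every threshold is a placeholder.
TARGET_LEAD_PROFILE = [
    Criterion("potency_pIC50", 7.0, True),
    Criterion("selectivity_fold", 30.0, True),
    Criterion("microsomal_clint", 50.0, False),
    Criterion("logD", 3.0, False),
]


def criteria_met(compound: dict[str, float]) -> int:
    """Number of profile criteria this compound satisfies."""
    return sum(c.met(compound[c.name]) for c in TARGET_LEAD_PROFILE)


def best_per_round(rounds: list[list[dict[str, float]]]) -> list[int]:
    """Best compound score in each round of analog synthesis."""
    return [max(criteria_met(c) for c in compounds) for compounds in rounds]


def has_stalled(scores: list[int], window: int = 3) -> bool:
    """True if none of the last `window` rounds beat the best score before them."""
    if len(scores) <= window:
        return False  # not yet a genuine run of chemistry
    return max(scores[-window:]) <= max(scores[:-window])


# Made-up data: one compound per round, five rounds of chemistry.
ROUNDS = [
    [{"potency_pIC50": 5.8, "selectivity_fold": 5, "microsomal_clint": 120, "logD": 3.4}],
    [{"potency_pIC50": 6.4, "selectivity_fold": 12, "microsomal_clint": 80, "logD": 2.9}],
    [{"potency_pIC50": 6.5, "selectivity_fold": 10, "microsomal_clint": 95, "logD": 3.1}],
    [{"potency_pIC50": 6.3, "selectivity_fold": 8, "microsomal_clint": 110, "logD": 3.3}],
    [{"potency_pIC50": 6.4, "selectivity_fold": 11, "microsomal_clint": 90, "logD": 3.2}],
]

scores = best_per_round(ROUNDS)
print(scores, "stalled:", has_stalled(scores))
```

Counting criteria met is deliberately coarse. A real review would also look at continuous trends, since a series that creeps towards a threshold without crossing it is still moving, and it would weigh the context Rutger described: budget, portfolio, and what else is running.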

We have a limited budget and can only afford to screen a subset of compounds. What criteria should we prioritize when selecting which libraries to use?

If budget limits you to a subset of a library, the worst thing you can do is pick compounds at random – or pick them all from the same region of chemical space. What you want in this case is coverage. The approach would be to cluster the full library by core scaffold, then pick representative molecules from each cluster. That way, even with limited numbers, you’re sampling across the structural diversity of the whole collection rather than drilling too deep into one chemical class.

Within each cluster, our advice is to favor the smallest, most polar compound, because molecular weight and lipophilicity tend to creep up as a compound progresses through optimization. Starting low gives you room to maneuver later.

There’s a trade-off here: if you pick just one compound per cluster, you won’t know at the outset whether an active compound is a genuine hit or a singleton – a molecule that looks active but has no active analogs, which is a much weaker starting position. If budget allows, three to five representatives per cluster is meaningfully better, since you get early structure-activity data alongside your hit. Once you find an active cluster, go deeper there. The initial subset is the map, and the hits tell you where to dig.
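The selection logic described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the library records, scaffold labels, and property values are invented, and in practice the scaffold key, molecular weight, and logP would come from a cheminformatics toolkit rather than being typed in. Ranking by molecular weight first and logP second is just one way to read “smallest, most polar”.

```python
from collections import defaultdict

# Hypothetical library records; all values are made up for illustration.
LIBRARY = [
    {"id": "A1", "scaffold": "indole", "mw": 310.0, "logp": 3.1},
    {"id": "A2", "scaffold": "indole", "mw": 265.0, "logp": 2.2},
    {"id": "A3", "scaffold": "indole", "mw": 402.0, "logp": 4.5},
    {"id": "A4", "scaffold": "indole", "mw": 288.0, "logp": 2.8},
    {"id": "B1", "scaffold": "piperidine", "mw": 240.0, "logp": 1.4},
    {"id": "B2", "scaffold": "piperidine", "mw": 355.0, "logp": 3.0},
    {"id": "C1", "scaffold": "spiroketal", "mw": 298.0, "logp": 0.9},
]


def pick_representatives(library, per_cluster=3):
    """Group compounds by scaffold, then keep the smallest, most polar few."""
    clusters = defaultdict(list)
    for compound in library:
        clusters[compound["scaffold"]].append(compound)

    picks = {}
    for scaffold, members in clusters.items():
        # Lowest molecular weight first; lowest logP breaks ties.
        ranked = sorted(members, key=lambda c: (c["mw"], c["logp"]))
        picks[scaffold] = [c["id"] for c in ranked[:per_cluster]]
    return picks


print(pick_representatives(LIBRARY, per_cluster=1))
print(pick_representatives(LIBRARY, per_cluster=3))
```

With per_cluster=1 every scaffold is sampled once, which maximizes coverage but leaves no way to tell a hit from a singleton. Raising it to three buys early structure-activity data wherever the cluster is large enough to supply it.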

When you’re screening fewer compounds, assay quality really matters. If your hit rate is 0.1% and you’re running 3,000 compounds instead of 300,000, you might get just three hits to work with – so those three need to be real. Assay quality is easy to overlook because libraries are tangible and purchasable, and assay setup is fiddly and expensive, but not taking enough time here is where false positives come from.
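The arithmetic behind that point is worth spelling out. The sketch below uses the 0.1% hit rate and the two library sizes from the text. The false-positive rate is a hypothetical figure added purely for illustration, and treating compounds as independent trials is a simplifying assumption.

```python
def expected_hits(n_compounds: int, hit_rate: float) -> float:
    """Average number of true hits for a given library size and hit rate."""
    return n_compounds * hit_rate


def prob_no_hits(n_compounds: int, hit_rate: float) -> float:
    """Chance of finding zero true hits, treating compounds as independent."""
    return (1.0 - hit_rate) ** n_compounds


HIT_RATE = 0.001             # 0.1%, as in the text
FALSE_POSITIVE_RATE = 0.005  # hypothetical assay artifact rate, for illustration

for n in (3_000, 300_000):
    true_hits = expected_hits(n, HIT_RATE)
    false_hits = expected_hits(n, FALSE_POSITIVE_RATE)
    print(
        f"{n:>7} compounds: ~{true_hits:.0f} true hits, "
        f"~{false_hits:.0f} false positives, "
        f"P(no true hits) = {prob_no_hits(n, HIT_RATE):.3f}"
    )
```

At 3,000 compounds the expected number of true hits is about three, with roughly a 5% chance of finding none at all. Under the assumed artifact rate those three would be outnumbered by about 15 false positives, which is why a small screen leans so heavily on a clean assay.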

How do you decide on the size and composition of a screening library when you’re working with a novel or poorly characterized target?

Leon led this one, and his position was clear: size matters less than you’d expect.

The intuitive answer is ‘screen more stuff, find more things.’ But for a genuinely novel target – unknown binding site, unknown mechanism, no established pharmacology – a large library with poor structural diversity serves you worse than a smaller one with broad coverage. What you need is for your compounds to be meaningfully different from each other in shape and chemistry, so that if a hit exists somewhere in chemical space, you’ve got a reasonable chance of having sampled somewhere near it.
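One way to see why coverage can beat size is to pick compounds so that each new one is as different as possible from those already chosen. The sketch below uses greedy MaxMin selection on Tanimoto similarity, a widely used diversity-picking approach that we are swapping in for illustration; it is not a description of how any particular library was built. The fingerprints are toy sets of feature numbers, and all names are invented.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Shared features divided by total distinct features."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def maxmin_pick(fingerprints: dict[str, frozenset], n_picks: int) -> list[str]:
    """Greedy MaxMin: keep adding the compound least similar to the picks so far."""
    names = sorted(fingerprints)
    picked = [names[0]]  # arbitrary but deterministic starting compound
    while len(picked) < min(n_picks, len(names)):
        candidates = [n for n in names if n not in picked]
        # Score each candidate by its closest match among the current picks,
        # then take the candidate whose closest match is the most distant.
        best = min(
            candidates,
            key=lambda n: max(
                tanimoto(fingerprints[n], fingerprints[p]) for p in picked
            ),
        )
        picked.append(best)
    return picked


# Toy fingerprints: three near-identical flat compounds and two outliers.
FINGERPRINTS = {
    "flat_1": frozenset({1, 2, 3, 4}),
    "flat_2": frozenset({1, 2, 3, 5}),
    "flat_3": frozenset({1, 2, 4, 5}),
    "sp3_1": frozenset({10, 11, 12}),
    "macro_1": frozenset({1, 20, 21, 22}),
}

print(maxmin_pick(FINGERPRINTS, 3))
```

Three picks from this toy set cover all three chemotypes, whereas screening the three flat compounds would sample the same neighborhood three times. Feature-based similarity is only one view of difference; shape-based descriptors would be needed to capture the “different in shape” part of the argument.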

Rutger added something that mirrors his answer to the budget question. With a well-characterized target – a kinase, for example – you already know what credible hits tend to look like, so you can apply prior expectations. But with a truly novel target, you don’t have that reference frame. Your hit output will look unfamiliar, which means the risk of misinterpreting false positives is higher. The assay needs to be more rigorous to compensate.

Leon also noted that novel targets almost always come with more target validation work baked in. If you find something that looks like a hit, you’ll need to confirm actual binding – not just activity in an assay that might be picking up off-target effects. That’s additional work worth planning for upfront, rather than discovering it two months into chasing a compound that isn’t doing what you think.

References

1. S. Park, J. Jung, J. Jeong, M. H. Jang, Y.-G. Kim, S. H. Ann, S.-J. Kim, S. Han, G.-M. Park, Cardiorenal outcomes of empagliflozin versus dapagliflozin in primary prevention among patients with type 2 diabetes: A nationwide cohort study. Diabetes Res. Clin. Pr. 236, 113239 (2026).

2. V. Vallon, S. Verma, Effects of SGLT2 Inhibitors on Kidney and Cardiovascular Function. Annu. Rev. Physiol. 83, 1–26 (2020).

3. P. McLean, J. Bennett, E. “Trey” Woods, S. Chandrasekhar, N. Newman, Y. Mohammad, M. Khawaja, A. Rizwan, R. Siddiqui, Y. Birnbaum, C. J. Lavie, S. Virani, K. E. Hachem, W. H. W. Tang, T. Ahuja, S. Isaacs, C. Krittanawong, SGLT2 inhibitors across various patient populations in the era of precision medicine: the multidisciplinary team approach. npj Metab. Heal. Dis. 3, 29 (2025).

4. L. Yuan, Y. Hua, X. Wang, Recent progress of glycomimetics in drug development. Org. Biomol. Chem. 23, 7671–7680 (2025).
