The Unmet Needs of Next-Generation Sequencing (NGS)

Illustration of GC bias in NGS from two S. aureus strains.

There are plenty of unmet needs in the current iteration of NGS, not the least of which is the effort involved in getting plenty of sequence data.

A short list

The current NGS market is estimated at about $7 billion, which is remarkable for a market that started only in 2005 with the advent of the 454 / Roche GS20 (now discontinued). Behind the market leader Illumina there are alternative sequencing platforms such as Ion Torrent / Thermo Fisher Scientific, newcomers Element Biosciences, Singular Genomics and Ultima Genomics, and single molecule companies Pacific Biosciences (also known as PacBio) and Oxford Nanopore Technologies (also known as ONT).

With major revenues coming from cancer testing (Illumina estimates NGS at $1.5 Billion of the oncology testing market), genetic disease testing ($800 Million) and reproductive health ($700 Million), NGS is well-entrenched in these routine assay fields.

Yet viewed through a different lens, that $1.5B of oncology testing comes out of a total $78B cancer testing market, or about 2% penetration of the cancer testing landscape. Similarly, in both genetic disease (a $10B market) and reproductive health (a $9B market), NGS has penetrated only about 8% of the market. So there is plenty of room to grow.

Yet what is holding NGS back from adoption in more clinical applications? Here I propose a (relatively short) list of unmet needs, which serve as barriers to adoption. To look at it another way, this is a list of things that current NGS providers (as well as new ones) would do well to improve upon.
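The penetration figures above are simple ratios, and a quick sketch makes the arithmetic explicit. The dollar amounts are the ones quoted in this post; the function and variable names are just illustrative.

```python
# Market figures quoted in the text, in billions of US dollars:
# (NGS revenue in the segment, total size of the segment).
markets = {
    "oncology testing": (1.5, 78.0),
    "genetic disease testing": (0.8, 10.0),
    "reproductive health": (0.7, 9.0),
}


def penetration_pct(ngs_revenue, total_market):
    """Share of a testing market captured by NGS, as a percentage."""
    return 100.0 * ngs_revenue / total_market


for name, (ngs, total) in markets.items():
    print(f"{name}: {penetration_pct(ngs, total):.1f}% NGS penetration")
```

Oncology comes out just under 2%, and the other two segments just under or at 8%, which matches the rounded numbers above.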

The list is:

  • PCR bias
  • Sample input amount
  • Library preparation workflow
  • High cost of instrumentation and reagents
  • Sequencing run-times

We’ll address each of these in order, and comment on how single-molecule sequencing (aka ‘third-generation sequencing’, though that term really isn’t used much any more) has addressed each issue (or not, as the case may be).

PCR Bias

PCR bias is something that people doing routine NGS may not think about, but for those who are doing whole genome assemblies or otherwise have to get sequence data from regions that are particularly G-C or A-T rich, this is a Big Problem. Because PCR relies on short oligonucleotides hybridizing under a given set of temperature, salt and cation concentrations, the melting temperature (Tm) of the short DNA primers is really important. PCR also depends upon the strands denaturing and re-annealing at certain temperatures, so the G-C percentage of the sequence being amplified has a very strong influence on the efficiency of the reaction.

This is a complex topic, where research papers dwell on a computer scientist’s love for sequence windows, normalized read coverage and statistical equations. Suffice it to say that bias can go in all kinds of directions, and there’s a lot of variability in several dimensions. (See the figure: both coverage plots are from the same species of bacteria, just different strains, S. aureus USA300 and S. aureus MRSA252. One shows coverage falling as a function of G-C content, and the other shows coverage rising!)
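To get a feel for how strongly base composition moves the melting temperature, here is a minimal sketch using the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a rough approximation for short oligos and ignores salt and primer concentration. The two 18-mer primers are made up for illustration.

```python
def wallace_tm(primer):
    """Rough Tm estimate in degrees C via the Wallace rule: 2*(A+T) + 4*(G+C).

    Only a rule of thumb for short oligos (roughly 14 to 20 nt); real primer
    design uses nearest-neighbor thermodynamics and salt corrections.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Two hypothetical 18-mers at the extremes of base composition.
at_rich = "AT" * 9
gc_rich = "GC" * 9

print(f"AT-rich 18-mer: ~{wallace_tm(at_rich)} C")
print(f"GC-rich 18-mer: ~{wallace_tm(gc_rich)} C")
```

Even this crude estimate puts the two primers tens of degrees apart, which is why a single set of cycling conditions cannot treat A-T rich and G-C rich templates equally.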

Figure 1 from Chen YC and Hwang CC et al. Effects of GC bias in next-generation-sequencing data on de novo genome assembly. PLoS One. (2013) 8(4):e62856.
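For readers who want to see what “sequence windows” and “normalized read coverage” mean in practice, here is a toy sketch. It is not the method from the paper cited above; the sequence and depth values are invented, and the window size is far smaller than anything used on real data.

```python
def windowed_gc(seq, window):
    """GC fraction for consecutive, non-overlapping windows of a sequence."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


def normalized_coverage(depths):
    """Per-window read depth divided by the mean depth across all windows."""
    mean_depth = sum(depths) / len(depths)
    return [d / mean_depth for d in depths]


# Invented toy data: three 10 bp windows and a made-up mean depth for each.
toy_seq = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
toy_depths = [30, 10, 20]

for gc, cov in zip(windowed_gc(toy_seq, 10), normalized_coverage(toy_depths)):
    print(f"GC = {gc:.1f}, normalized coverage = {cov:.2f}")
```

Plotting normalized coverage against window GC fraction gives the kind of curve shown in the figure; a flat line at 1.0 would mean no GC bias.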

A nagging question though – what do these coverage plots imply for all the missing data? A given genomic region could have all kinds of G-C content variation, making it very resistant to study. Single molecule sequencing has largely solved this G-C bias question (link to 2013 publication), but PCR bias is still something to consider.

Sample input amount

Back in the early days of the Ion Torrent PGM (around 2011 to 2012), a key selling point for the Personal Genome Machine was the ability of the AmpliSeq technology to use only 10 ng of FFPE DNA as input material, something the Illumina hybridization-based enrichment methods could not touch. (Their approach at that time required input in the hundreds of ng.)

As many of you are aware, FFPE tissues are in limited supply, and the fixation and embedding process damages nucleic acids (typically fragmenting them to about 300 bp in length). Yet in the current Standard of Care the FFPE process is firmly embedded in the surgeon-to-pathologist workflow in the hospital environment.

Another instance of limiting input is cell-free DNA analysis, where the amount of cfDNA from a given 10 mL blood draw ranges from 10 ng to 40 ng. (This varies by individual, healthy versus diseased, as well as inflamed or normal along with a host of other variables.) And from this limited amount of DNA, companies like GRAIL and Guardant are detecting down to one part in a thousand, or 0.1% minor allele fraction (MAF). They use plenty of tricks and techniques to generate useable NGS libraries.

Yet with a system that does not use PCR for its library preparation (Helicos, covered here), one limitation of using 1 – 9 ng of input material (and simply adding a Poly-A tail to the DNA) was that barcodes could not be added. Thus a key design feature of the Helicos instrument was a flowcell with 50 individual lanes, one per sample. It makes me think they could have perhaps been a bit more imaginative with the sample preparation and somehow added a sample barcode before polyadenylation to enable sample multiplexing.
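To see why 0.1% MAF from 10 to 40 ng is so demanding, it helps to convert nanograms into genome copies. The sketch below assumes the commonly quoted approximation of about 3.3 pg per haploid human genome; that constant and the function names are assumptions for illustration, not figures from the companies mentioned.

```python
# Assumption: a haploid human genome weighs roughly 3.3 picograms.
HAPLOID_GENOME_PG = 3.3


def haploid_genome_equivalents(input_ng):
    """Approximate number of haploid genome copies in a given mass of DNA."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(input_ng, maf):
    """Expected number of mutant molecules at one locus for a given allele fraction."""
    return haploid_genome_equivalents(input_ng) * maf


for ng in (10, 40):
    copies = haploid_genome_equivalents(ng)
    mutants = expected_mutant_copies(ng, 0.001)
    print(f"{ng} ng: ~{copies:,.0f} genome copies, ~{mutants:.0f} mutant copies at 0.1% MAF")
```

Under that assumption a 10 ng sample holds only about three mutant molecules at any one locus, so every molecule lost during library preparation matters.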

Library preparation workflow

Ask anyone who does routine NGS library preparation, with or without the aid of liquid handling automation, and they will tell you it’s work. You purify DNA, you do an enrichment step (whether multiplex PCR like AmpliSeq or QIAseq or Pillar SLIMamp, or a hybrid capture step from NimbleGen / Roche or Agilent SureSelect), you clean up those reactions, you ligate adapters, you purify it again, you set up a short-cycle PCR to add sample indexes, you clean it up again, and you quantitate via fluorometry or qPCR.

And then do some calculations to normalize molarity.
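Those molarity calculations usually look something like the sketch below. It uses the standard approximation of about 660 g/mol per base pair of double-stranded DNA; the library concentration, fragment length, target molarity and volumes are hypothetical example values, not a vendor protocol.

```python
# Approximate average mass of one base pair of double-stranded DNA, in g/mol.
AVG_BP_MASS_G_PER_MOL = 660.0


def library_nanomolar(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration from ng/uL to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def dilution_volumes(stock_nm, target_nm, final_ul):
    """Volumes (uL) of library and diluent needed to reach a target molarity."""
    if target_nm > stock_nm:
        raise ValueError("library is already below the target molarity")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical library: 2.0 ng/uL by fluorometry, 400 bp mean fragment size.
stock = library_nanomolar(2.0, 400)
lib_ul, diluent_ul = dilution_volumes(stock, target_nm=4.0, final_ul=20.0)
print(f"Library is {stock:.2f} nM; mix {lib_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")
```

Repeat that for every sample in a pool and it becomes clear why this step is a common place for errors to creep in.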

After loading the NGS instrument you wait for cluster generation / emulsion reaction with beads, and then you sequence. You are glad you are not in the early 2010s with separate instruments for these processes, but still this all takes time.

And there still is a danger of overloading the NGS instrument. This danger is done away with in single molecule sequencing (a single pore or Zero Mode Waveguide can only handle one molecule at a time, so higher concentrations of molecules don’t matter).

High cost of instrumentation and reagents

A NovaSeq 6000 is almost a cool $1,000,000. A single run at maximum capacity on the NovaSeq is $30,000. This gets the cost-per-Gigabase down to $5.

The newest NovaSeq X from Illumina costs even more, an eye-watering $1,250,000. This gets the cost/GB down to $3, with higher-throughput flowcells expected later in 2023 to lower it further to about $2/GB.

For single molecule sequencing, the instrumentation cost is still high. A PacBio Sequel II is $525K, the new PacBio Revio about $750K. One exception is the ONT PromethION at $310K. However for all these instruments, the cost per GB is very high – the Sequel per-GB cost is around $45 (that’s a good 50% more than the per-GB cost on an Illumina NextSeq, however you are getting long reads with the Sequel). Revio is a lot better at $20/GB, so better than the NextSeq short reads but still 4x what a NovaSeq 6000 costs per GB.

ONT is an attractive $10/GB, still 2x the NovaSeq though.

In many ways single molecule sequencing is far superior to short-read NGS for clinical WGS. (See my friend Brian Krueger’s LinkedIn post – and poll – here and some great insights about the value of clinical WGS in an older post here.) The main barrier to wider clinical use of WGS is cost, and with a need for at least 15x if not 30x genome coverage, that’s 50GB to 100GB of sequence data. Put another way, currently on ONT the cost to generate WGS data is $500 to $1,000, while on the PacBio Revio it is $1,000 to $2,000, which is still too expensive (even though it has broken through the ‘$1000 Genome’ barrier).
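The WGS numbers above follow from coverage times genome size times cost per GB. Here is a minimal sketch of that arithmetic, assuming a human genome of roughly 3.1 gigabases (which is why 15x and 30x round to about 50GB and 100GB) and using the per-GB costs quoted in this post.

```python
# Assumption: approximate human genome size in gigabases.
HUMAN_GENOME_GB = 3.1

# Per-GB sequencing costs in US dollars, as quoted in this post.
COST_PER_GB = {
    "NovaSeq 6000": 5.0,
    "ONT": 10.0,
    "PacBio Revio": 20.0,
}


def wgs_data_gb(coverage, genome_gb=HUMAN_GENOME_GB):
    """Gigabases of sequence needed for a given fold-coverage of the genome."""
    return coverage * genome_gb


def wgs_cost_usd(coverage, cost_per_gb, genome_gb=HUMAN_GENOME_GB):
    """Sequencing reagent cost of one genome at a given coverage."""
    return wgs_data_gb(coverage, genome_gb) * cost_per_gb


for platform, per_gb in COST_PER_GB.items():
    low = wgs_cost_usd(15, per_gb)
    high = wgs_cost_usd(30, per_gb)
    print(f"{platform}: ${low:,.0f} (15x) to ${high:,.0f} (30x)")
```

The unrounded results land a little below the figures in the text because the text rounds 46.5GB and 93GB up to 50GB and 100GB.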

Sequencing run-times

Lastly, NGS takes a long time to run. A MiSeq running 2×150 paired-end (PE) reads is over 48 hours; a NextSeq run is at least over a day; and a NovaSeq almost two days.

Here single molecule sequencing isn’t much faster: a Sequel II run takes over a day, Revio still takes a day, and ONT is three days (in order to maximize throughput, these pores are allowed to run for a long time).

For ultra-fast WGS, ONT was used to get WGS data in 3 hours on the PromethION (GenomeWeb link paywalled) for newborn screening in the NICU, where fast turnaround time is paramount. There are some eight programs underway worldwide to utilize NGS in newborn screening, enrolling from 1,000 to over 100,000 infants. One prominent one is Stephen Kingsmore’s Rady Children’s Institute “BeginNGS” program, subtitled “Newborn genetic screening to end the diagnostic odyssey”. (Here’s a paywalled 360dx article laying out the details of these eight newborn genomic screening programs worldwide.)

Anything else?

Okay, that is my list of ‘unmet needs in NGS’ with some problems solved by single molecule approaches, yet other problems remain. Did I miss anything else?

Comments

2 responses to “The Unmet Needs of Next-Generation Sequencing (NGS)”

  1. Masanari Kitagawa

    What is your assessment of BGI (MGI)?

    1. scooteradmin

      MGI / Complete Genomics is definitely going to be a part of the NGS landscape, given their freedom to operate (FTO) in the US (as the largest market) and lowering the cost-per-gigabase.

      However the competition in the NGS marketplace has shifted with Element Biosciences, Singular Genomics and now Onso / Pacific Biosciences all shipping systems, and Ultima Genomics getting ready to launch. Each system has its ‘sweet spot’ (whether compared to NextSeq or NovaSeq level of throughput / cost / instrument cost) and MGI’s approach is across the board.
