Category: NGS

  • NGS Company Infographic

    While pulling together the most recent blog post (titled “NGS Necropolis Part 1 – the QIAGEN GeneReader”) I realized there were a lot of dates, numbers, and names that lent themselves to a simple timeline. And then it turned into a table, because I could cram in a lot more information that way.

    So here you have it: no fewer than 13 companies, founded as early as 1999, sorted by commercialization launch year, with acquisition dates, founding years, and the year each platform was taken off-market (if applicable). Enjoy!

  • The NGS Necropolis Part 1: The QIAGEN GeneReader


    Since the launch of the first massively parallel sequencer in 2005, the pyrosequencing-based 454 GS20, many approaches have been developed and worked on, yet several never achieved wide adoption. In this series a few prominent platforms will be described, the first of which is QIAGEN’s GeneReader.

    An AI interpretation of the prompt “Use an image of a QIAGEN GeneReader next-generation sequencing instrument as the centerpiece, decorate it with DNA helices in a lattice formation like a wreath around its top, and the setting of a deep blue lake with light shimmering on the surface of the water.” The image model was SDXL 0.9.

    A brief history of the NGS market: from Zero to $7 Billion in 18 years.

    In 2005 the 454 Life Sciences company launched the GS20, the first “next-generation” sequencer (“first-generation sequencing” of course is the venerable Sanger method read out via capillary electrophoresis, which was how the Human Genome Project was completed in 2003). The 454 GS20 offered hundreds of thousands of reads of about 80 to 100 bases in length in a 10-hour run, on the order of 20 million bases, a much higher capability than what could be done via Sanger at that time. Only two years later in 2007 Roche Diagnostics acquired 454 for $155 Million, and iterated the platform with an upgraded model, the Roche / 454 GS FLX with increased readlength in 2008, and a smaller unit called the GS Junior in 2009.

    Against this backdrop, in late 2006 Illumina acquired Solexa for about $600 Million and started selling the first Genome Analyzers in early 2007. Per-run throughput at the beginning was roughly 50 times what 454 could do, although the reads were much shorter. In an approximately 2.5-day (~60 hour) run, the 1G could produce a gigabase of data, with reads as short as 25 base pairs in the first iteration, steadily increasing in that first year to 37 and then to 50 base pairs. Of course in the intervening years the industry has seen Illumina scale this technology in a nothing-short-of-spectacular way:

    | Comparison Feature | Solexa 1G | Illumina NovaSeq X |
    | --- | --- | --- |
    | List price of the instrument | $400,000 | $1,250,000 |
    | Price to run a single experiment (“cost to press GO”) | ~$3,700 | ~$19,000 |
    | Total sequencing yield from a single experiment | ~800 Mb | 6 Tb (6,000 Gb) |
    | Read length | Single-end (1x) 25 to 37 bases | Paired-end (2x) 150 bases |
    | Cost per gigabase | ~$4,600 | ~$2.00* |
    | Fold reduction in cost per gigabase | N/A | 2,300-fold |
    | If a luxury car cost this in 2005, what would it cost in 2023? | $100,000 | $100,000 / 2,300 = $43.48 |
    | If a house cost this in 2005, what would it cost in 2023? | $500,000 | $500,000 / 2,300 = $217.39 |
    Comparing the first Solexa 1G (later renamed the Illumina Genome Analyzer) to the latest iteration of Illumina’s highest-throughput system, including price per gigabase (one billion bases of sequence)
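
    To make the table’s arithmetic explicit, here is a minimal Python sketch of the cost-per-gigabase and fold-reduction calculation (run prices and yields are the table’s figures; the helper function is mine):

    ```python
    def usd_per_gb(run_cost_usd: float, yield_gb: float) -> float:
        """Cost per gigabase: the price to 'press GO' divided by the run's sequence yield."""
        return run_cost_usd / yield_gb

    solexa_1g = usd_per_gb(3_700, 0.8)   # ~800 Mb = 0.8 Gb -> ~$4,600/Gb
    novaseq_x = 2.00                     # the asterisked 25B-flowcell figure from the table
    fold = solexa_1g / novaseq_x         # ~2,300-fold reduction

    # Apply the same deflation to 2005 sticker prices:
    print(f"{fold:,.0f}-fold; car ${100_000 / fold:,.2f}; house ${500_000 / fold:,.2f}")
    # 2,312-fold; car $43.24; house $216.22 (the table rounds to 2,300-fold, hence $43.48 / $217.39)
    ```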

    It was in 2006 that Agencourt Personal Genomics was acquired by Applied Biosystems for $120M; that technology launched as the SOLiD next-generation sequencing system in 2008. The sequencing-by-ligation based system went through five iterations from 2008 through 2012, overlapping with Life Technologies purchasing Ion Torrent in 2010 and beginning to sell the Ion Torrent PGM (“Personal Genome Machine”). As a side note, yours truly started at Illumina in 2003 as a product manager, then in 2005 started selling Illumina whole-genome microarrays in the Mid-Atlantic area (including the US National Institutes of Health), and sold the first Genome Analyzers through 2009; in 2010 I started selling SOLiD 4 systems for Life Technologies, and after the Ion Torrent acquisition moved back into a Regional Marketing role.

    Of course Complete Genomics launched their service business for whole-genome sequencing in 2009 (acquired by BGI in a $117M merger in 2012), Helicos launched their true single-molecule sequencing system in 2008 and closed their business in 2012, and Pacific Biosciences made a splash launching the PacBio RS in 2011. Illumina, Ion Torrent (now under Thermo Fisher Scientific), Pacific Biosciences, and BGI / MGI / Complete Genomics are all still selling (and supporting) their respective products and services; Helicos was recently written up here if you are interested in that history; and Roche / 454 discontinued that product line in 2013.

    QIAGEN’s Investments in NGS

    In this Part 1 we describe the QIAGEN acquisition of Intelligent BioSystems in 2012 (for about $50M). In future posts the Bio-Rad acquisition of GNUbio in 2014 (for about $40M) and the Roche acquisition of single-molecule sequencing firm Genia in 2014 (for $125M in cash and $225M in contingent payments) will be discussed, along with many other NGS startups in various stages of development.

    QIAGEN went the furthest (relative to Bio-Rad and Roche) with their sequencing platform, the GeneReader. Way back in 2012 I wrote up this post on The Next Generation Technologist blog about the Mini-20 system that Intelligent BioSystems had developed, with its individually addressable 20-lane design. The system was revamped, and took some time in development. Understand that QIAGEN had purchased SuperArray Bioscience (also called SABiosciences) in 2009 (for $90M); SABiosciences had pathway- and disease-specific qPCR panels, but also had in development a single-primer enrichment technology, since launched as the QIAseq target enrichment panels. QIAGEN also acquired Ingenuity Systems in 2013 (for $105M), primarily for Ingenuity’s Variant Analysis™ and iReport™ capabilities, now embodied in QIAGEN’s Clinical Insight (QCI) with clinical-grade variant analysis and reporting capability.

    Leveraging the capabilities of the QIAGEN QIAcube and QIAsymphony sample-preparation and liquid-handling automation for library preparation, their vision was to deliver a user-friendly clinical sequencing platform, following a “From Sample to Insight” mantra. Having front-end sample preparation, library preparation, NGS sequencing, and clinical interpretation and reporting from a single vendor is very attractive from both a customer perspective and a business one. (In consultant jargon, this is called increasing value capture.)

    It was November 2015, at the Association for Molecular Pathology (AMP) conference in Austin, Texas, that QIAGEN announced the launch of the GeneReader as “the world’s first truly complete NGS workflow”. I happened to attend that conference, and was in the audience when then-QIAGEN CEO Peer Schatz made the announcement.

    While QIAGEN was not very public about it, the run capacity was apparently up to four flowcells, each accommodating 10 samples; flowcells could be added mid-run using a turntable inside the instrument. QIAGEN emphasized the system’s flexibility in adding additional samples while existing samples were still running.

    A year later, in November 2016, QIAGEN announced a ‘relaunch’ of the GeneReader product with new chemistry. This was in direct response to a lawsuit Illumina filed in mid-2016; Illumina won a preliminary injunction against QIAGEN in September 2016 for patent infringement. A few years later, in 2019, QIAGEN took a $200M charge to exit development of a clinical version of the GeneReader platform.

    One analyst described their clinical effort as a ‘nice niche’; however, subsequent agreements with Illumina to develop IVD panels for the MiSeqDx, NextSeq 550Dx and future diagnostic systems are a clear signal that the GeneReader is something of a dead end.

    In the meantime, QIAGEN continues to sell (and develop) front-end enrichment for NGS, and to further refine their clinical reporting capability.

    A few lessons from the GeneReader

    A few lessons can be drawn from this GeneReader story. (N.B. I have no financial relationship with QIAGEN, although I know a lot of people there, or who were there, from the start of my career with life-science vendors in 1999, first as a Technical Support scientist and then as a Product Manager.)

    One lesson is the importance of a clear value proposition and differentiator. Illumina had just launched the MiniSeq in 2016 (and would follow with the iSeq 100 in 2018), tackling the low end of the market. (The MiniSeq instrument was priced at $49,500, and it is still sold today.) Right around the corner, in March of 2017, the first NovaSeq 6000 system launched. Thermo Fisher Scientific’s Ion Torrent division announced a new system called the GeneStudio S5 and a new higher-capacity sequencing chip (the Ion 550™ Chip); with their Oncomine combination of cancer-specific multiplex-PCR-based AmpliSeq™ panels and their clinical software (also somewhat confusingly called Oncomine), Thermo Fisher has a decent-sized footprint in the clinical market.

    Where would the GeneReader fit in against this backdrop? Its throughput was just too low to be competitive with the existing offerings from Illumina and Thermo Fisher Scientific, so QIAGEN emphasized ‘sample to insight’ and a clinical offering with all-inclusive pricing (sample preparation, enrichment, sequencing, data analysis and clinical reporting).

    There was only one problem with this scenario: Thermo Fisher’s Ion Torrent systems were in exactly that same market, clinical oncology with everything from library preparation through clinical reporting, and making a decent business from it. At least decent enough for further automation with the Ion Torrent Genexus system launch in March 2022. QIAGEN could not claim much higher accuracy, lower cost structure, or superior sophistication of reporting software; it was what can be called a “me too” product several years in the making, while Ion Torrent was already there (as were Illumina clinical users who cobbled together all the requisite clinical parts).

    A second lesson is that the clinical market is price-sensitive regarding adoption of new technology. There may be high value in a new kind of data type (whether it’s sequencing data versus real-time PCR point mutations or single CNVs, or structural variation information from optical mapping systems like Bionano Genomics), but price is an objection. And not only the cost to run a single sample, but also the cost of training personnel on a complex instrument workflow (thus the need for integrated automation, which increases complexity), the cost of acquiring the instrument in the first place, and lastly the cost of maintaining the instrument(s) (roughly 10% of the instrument’s list price every year it is in service).

    QIAGEN had to be willing to invest many, many years into supporting the GeneReader to grab a slice of a relatively small and highly competitive clinical NGS market. Yes, that clinical NGS market is growing (while the research market is flat by comparison), but those cost barriers remain. And there was nothing noteworthy in QIAGEN’s cost structure for their clinical customers; capturing the entire value chain was ultimately too expensive for a limited number of customers, and thus QIAGEN reversed course in 2019.

    The last lesson is that there will be new players and new technologies attempting to disrupt both the research and clinical markets, and what is true today in 2023 may not be true in 2025, or perhaps even 2024. Last week I spoke with a person from Element Biosciences who is talking with many clinical customers about adopting an Element AVITI™ for their Laboratory Developed Tests (LDTs), thanks to its favorable economics. The PacBio Revio and the PromethION 48 also have favorable economics, but they are still too expensive relative to short reads from Illumina (and Element and Singular) for wide adoption.

    On top of this, new companies currently in “stealth” mode are working on cracking a $7 Billion NGS market. The latest of these is Ultima Genomics, which publicly announced last May (with a collection of pre-prints and several high-profile presentations) that their system will be able to cut the cost of sequencing by a factor of 2 relative to the Illumina NovaSeq X number above ($2.00/Gb), to $1.00/Gb. (Edit: the $2.00/Gb NovaSeq X price here assumes a yet-to-launch “25B” NovaSeq X flowcell; the original version of this post used the existing “10B” flowcell per-Gb pricing of $3.20/Gb.)

    Next month (2-6 November 2023) the American Society of Human Genetics (ASHG) annual meeting will take place in Washington, DC, and it will be exciting to see what kinds of advances are in store. I’ll be sure to share what I learn.

    Several AI-generated images using a “QIAGEN GeneReader as the centerpiece”. Room for improvement!
  • High Throughput NGS Systems: Throughput, Time and Cost Graphic


    It is an exciting time for next-generation sequencing (NGS) with new platforms being launched. Below is a chart that illustrates recent progress.

    High-throughput NGS systems compared by Gb/run, days required, and $ per Gb

    A chart from 7 years ago…

    A few days ago I was reminded of a chart from 2016, when a blogger named Lex Nederbragt (now at the University of Oslo, Norway), prompted by the competition in the NGS marketplace on both readlength and throughput, made a handy chart with a lot of platforms on it. (You can see his original blog post with links to the image in a post called “Developments in high throughput sequencing – July 2016 edition”.)

    I had hoped that chart would be updated in the intervening years, but alas his blogging moved platforms and then went quiet a year or so later. (And as a blogger myself, I can relate to the pressures of work and life, as well as of affiliation, which may or may not be conducive to this kind of activity.)

    The current NGS “arms race”

    So I thought about the current “arms race” of new platforms waiting in the wings (for example, Ultima Genomics is the top sponsor of February 2024’s Advances in Genome Biology and Technology meeting), the new platforms only now getting into the hands of customers (specifically the Pacific Biosciences Revio, Element Biosciences AVITI, Singular Genomics G4, and the Pacific Biosciences Onso), and the renewed efforts of BGI / MGI / Complete Genomics now that they have the ability to sell their systems in North America and Europe. Complicating the MGI / Complete Genomics story, however: about a year ago parent company BGI Genomics was added to the US Department of Defense list of blacklisted companies (a GenomeWeb story has more details here), though as far as I can tell MGI / Complete Genomics continues to do business in the US.

    By pulling together specifications and prices (along with some handy source materials assembled by others) I constructed a list of some 21 existing (or, in the case of Ultima, soon-to-exist) offerings for sale, from Illumina’s iSeq all the way up to MGI’s monster DNBSEQ-T20x2. I took this list, calculated a US dollar per gigabase cost based on the highest-throughput configuration (flowcells × readlength × time to sequence), and excluded all the other configurations. (For example, runs using a lower number of flowcells, or shorter runtimes for tag-counting applications, were excluded.) I also noted the number of hours this highest-throughput configuration took for each system.

    I then excluded all systems whose price per gigabase of sequence was greater than $10/Gb. (For those curious, if you figure 100 Gb of sequence per genome for ~33x WGS coverage, $10/Gb is the “$1,000 Genome”.) Therefore any system above that magic “$1,000 Genome” mark is not included, and you have the chart above: gigabases per run on the X-axis, hours per run on the Y-axis, and bubble size showing US dollars per gigabase relative to the other systems; the smaller the bubble, the lower the per-Gb cost.
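
    For anyone who wants to reproduce or extend a chart like this, here is a minimal matplotlib sketch of the same construction. The entries below are an illustrative subset only; the run costs and the PromethION yield are placeholders loosely derived from the $/Gb figures discussed in this post, not quotes:

    ```python
    import matplotlib.pyplot as plt

    # (platform, Gb/run at highest-throughput configuration, hours/run, approx. $ "to press GO")
    # Illustrative figures only, back-calculated from the per-Gb numbers in this post.
    systems = [
        ("NovaSeq X 10B",  3000, 44,  9600),   # ~$3.20/Gb
        ("UG 100",         3000, 24,  3000),   # ~$1.00/Gb
        ("DNBSEQ-T20x2",  72000, 96, 71280),   # ~$0.99/Gb
        ("PromethION 48",  1300, 72, 13000),   # ~$10/Gb
    ]

    fig, ax = plt.subplots()
    for name, gb, hours, run_cost in systems:
        usd_per_gb = run_cost / gb
        if usd_per_gb > 10:                    # the "$1,000 Genome" cutoff (100 Gb x $10/Gb)
            continue
        ax.scatter(gb, hours, s=usd_per_gb * 150, alpha=0.5)  # bubble area tracks $/Gb
        ax.annotate(f"{name} (${usd_per_gb:.2f}/Gb)", (gb, hours))

    ax.set_xscale("log")                       # runs span 3,000 to 72,000 Gb
    ax.set_xlabel("Gigabases per run")
    ax.set_ylabel("Hours per run")
    ax.set_title("High-throughput NGS systems (bubble size = $/Gb)")
    plt.show()
    ```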

    A few observations

    The market leader (estimated market share is about 75%) is of course Illumina, going through an upgrade cycle in which the per-Gb price drops from $4.84 on the NovaSeq 6000 to $3.20 on the NovaSeq X with the latest flowcell iteration (the 10B, released in February 2023). A newer flowcell (the 25B) will further drop that per-Gb price to $2.00 or so in the latter half of this year.

    Element Biosciences has a ‘package deal’ to get to $2.00 per Gb, however that is dependent on special discounting and a large purchase commitment; I’ve left it at their current maximum-capacity use case.

    The Pacific Biosciences Revio did not make the cut due to a higher-than-$10/Gb cost (from the pricing I’ve seen it’s about double that), but the Oxford Nanopore PromethION made it at exactly $10/Gb. Pretty remarkable that you can get a long-read whole genome for $1,000 when you think about it, even if it takes several days to produce the data.

    The MGI / Complete Genomics systems are certainly price-competitive, and the DNBSEQ-T20x2 broke the chart at 72,000 gigabases per run, at $0.99 per Gb. Yes, that’s 720 whole genomes at 33x every 4 days. Their other system, the T7, gained a few installations worldwide while they were effectively blocked from selling in North America and Europe due to patent infringement (and an injunction).

    The new Ultima system (called the UG 100) has a relatively short runtime (24 h), a very low per-Gb price at $1.00, and at 3,000 Gb/run that is 30 whole genomes a day. Certainly a platform to watch, especially with the November 2023 ASHG conference coming up next month (in Washington, DC) and the February 2024 AGBT conference (in Florida).

    I will be attending ASHG this year, and if you’d like to meet in person during the conference be sure to reach out!

  • The Unmet Needs of Next-Generation Sequencing (NGS)


    There are plenty of unmet needs in the current iteration of NGS, not the least of which is the effort involved in generating plenty of sequence data.

    A short list

    The current NGS market is estimated to be about $7,000,000,000 (that’s $7 Billion), which is remarkable for a market that started only in 2005 with the advent of the 454 / Roche GS20 (now discontinued). After the market leader Illumina there are alternative sequencing platforms such as Ion Torrent / Thermo Fisher Scientific, newcomers Element Biosciences, Singular Genomics and Ultima Genomics, and single-molecule companies Pacific Biosciences (also known as PacBio) and Oxford Nanopore Technologies (also known as ONT).

    With major revenues coming from cancer testing (Illumina estimates NGS at $1.5 Billion of the oncology testing market), genetic disease testing ($800 Million) and reproductive health ($700 Million), NGS is well-entrenched in these routine assay fields.

    Yet through a different lens, that $1.5B oncology testing figure comes from a total $78B cancer testing market, or about 2% penetration of the cancer testing landscape. Similarly, in both genetic disease (a $10B market) and reproductive health (a $9B market), NGS has penetrated only about 8% of each. So there is plenty of room to grow.

    Yet what is holding NGS back from more clinical applications? Here I propose a (relatively short) list of unmet needs that serve as barriers to adoption. To look at it another way, this is a list that current NGS providers (as well as new entrants) would do well to improve upon.

    The list is:

    • PCR bias
    • Sample input amount
    • Library preparation workflow
    • High cost of instrumentation and reagents
    • Sequencing run-times

    We’ll address each of these in order, and comment on how single-molecule sequencing (aka ‘third-generation sequencing’, though that term really isn’t used much anymore) has addressed each issue (or not, as the case may be).

    PCR Bias

    PCR bias is something people doing routine NGS may not think about, but for those doing whole-genome assemblies, or who otherwise have to get sequence data from regions that are particularly G-C or A-T rich, this is a Big Problem. Because PCR works on the basis of short oligonucleotides hybridizing under a set of temperature, salt, and cation concentrations, the melting temperature (Tm) of the short DNA primers is really important. PCR also depends upon the strands denaturing and re-annealing at certain temperatures, and all told the G-C percentage of the sequence being amplified has a very strong influence on the efficiency of the reaction.
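
    As a toy illustration (not from any particular assay) of why G-C content matters so much: the old Wallace rule approximates primer Tm as 2 °C per A/T and 4 °C per G/C for short oligos, so two primers of identical length can have wildly different melting behavior:

    ```python
    def wallace_tm(primer: str) -> int:
        """Rough Tm (deg C) of a short oligo via the Wallace rule: 2*(A+T) + 4*(G+C).
        Only a ballpark for oligos under ~14 nt; real primer design also weighs salt,
        cation concentration, and nearest-neighbor thermodynamics."""
        p = primer.upper()
        return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

    print(wallace_tm("ATATATATATAT"))  # 24 -- an A-T rich 12-mer
    print(wallace_tm("GCGCGCGCGCGC"))  # 48 -- a G-C rich 12-mer, double the Tm
    ```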

    This is a complex topic, where research papers dwell on a computer scientist’s love of sequence windows, normalized read coverage, and statistical equations; suffice it to say that the bias can go in all kinds of directions, and there’s a lot of variability in several dimensions. (See the figure: both coverage plots are from the same species of bacteria, just different strains, S. aureus USA300 and S. aureus MRSA252. One shows falling coverage as a function of G-C content, and the other a rising coverage plot!)

    Figure 1 from Chen YC et al., “Effects of GC bias in next-generation-sequencing data on de novo genome assembly,” PLoS ONE (2013) 8(4):e62856.

    A nagging question, though: what do these coverage plots imply about all the missing data? A given genomic region could have all kinds of G-C content variation, making it very resistant to study. Single-molecule sequencing has largely solved this G-C bias question (link to a 2013 publication), but PCR bias is still something to consider.

    Sample input amount

    Back in the early days of the Ion Torrent PGM (around 2011 to 2012), a key selling point for the Personal Genome Machine was the ability of the AmpliSeq technology to use only 10 ng of FFPE DNA as input material, a level the Illumina hybridization-based enrichment methods could not touch. (Their approach at that time required inputs in the hundreds of nanograms.)

    As many of you are aware, FFPE tissues are in limited supply, the fixation and embedding process damages nucleic acids (typically fragmenting them to about 300 bp in length), and in the current Standard of Care the FFPE process is firmly embedded in the surgeon-to-pathologist workflow of the hospital environment.

    Another instance of limited input is cell-free DNA analysis, where the amount of cfDNA from a given 10 mL blood draw ranges from 10 ng to 40 ng. (This varies by individual, healthy versus diseased, inflamed or normal, along with a host of other variables.) And from this limited amount of DNA, companies like GRAIL and Guardant are detecting down to one part in a thousand, or 0.1% minor allele fraction (MAF). They use plenty of tricks and techniques to generate usable NGS libraries.

    Yet for a system that did not use PCR in its library preparation (Helicos, covered here), one limitation of using 1 to 9 ng of input material (and simply adding a poly-A tail to the DNA) was that barcodes could not be added. Thus a key design feature of the Helicos instrument was a flowcell with 50 individual lanes, one per sample. It makes me think they could perhaps have been a bit more imaginative with the sample preparation and somehow added a sample barcode before polyadenylation to enable multiplexing.

    Library preparation workflow

    Ask anyone who does routine NGS library preparation, with or without the aid of liquid-handling automation, and they will tell you it’s work. You purify DNA; you do an enrichment step (whether multiplex PCR like AmpliSeq, QIAseq or Pillar SLIMamp, or a hybrid-capture step from NimbleGen / Roche or Agilent SureSelect); you clean up those reactions; you ligate adapters; you purify again; you set up a short-cycle PCR to add sample indexes; you clean it up again; you quantitate via fluorometry or qPCR.

    And then do some calculations to normalize molarity.
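
    As a concrete sketch of that normalization step (the concentrations and the 4 nM loading target below are hypothetical; loading specs vary by instrument), the standard conversion assumes ~660 g/mol per base pair of double-stranded DNA:

    ```python
    def library_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
        """Convert a dsDNA library concentration (ng/uL) to nM, assuming ~660 g/mol per bp."""
        return conc_ng_per_ul / (660.0 * avg_fragment_bp) * 1_000_000

    nm = library_nM(10.0, 400.0)            # 10 ng/uL at ~400 bp average size -> ~37.9 nM
    target_nM, final_ul = 4.0, 20.0         # hypothetical loading spec
    lib_ul = target_nM * final_ul / nm      # dilution via C1*V1 = C2*V2
    print(f"{nm:.1f} nM; use {lib_ul:.2f} uL library + {final_ul - lib_ul:.2f} uL diluent")
    # 37.9 nM; use 2.11 uL library + 17.89 uL diluent
    ```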

    After loading the NGS instrument you wait for cluster generation (or the emulsion reaction with beads), and then you sequence. You are glad you are not in the early 2010s with separate instruments for these processes, but still, this all takes time.

    And there is still a danger of overloading the NGS instrument. That danger is done away with in single-molecule sequencing (a single pore or zero-mode waveguide can only handle one molecule at a time, so higher concentrations of molecules don’t matter in the same way).

    High cost of instrumentation and reagents

    A NovaSeq 6000 is almost a cool $1,000,000. A single run at maximum capacity on the NovaSeq is $30,000. This gets the cost per gigabase down to $5.

    The newest NovaSeq X from Illumina is even more: an eye-watering $1,250,000. This gets the cost per Gb down to $3, with an additional cost-lowering step via higher-throughput flowcells later in 2023 to about $2/Gb.

    For single-molecule sequencing, instrument costs are still high. A PacBio Sequel II is $525K, and the new PacBio Revio about $750K. One exception is the ONT PromethION at $310K. However, for all these instruments the cost per Gb is very high: the Sequel’s per-Gb cost is around $45 (a good 50% more than the per-Gb cost on an Illumina NextSeq, although you are getting long reads with the Sequel). The Revio is a lot better at $20/Gb, so better than NextSeq short reads but still 4x what a NovaSeq 6000 costs per Gb.

    ONT is an attractive $10/Gb, though still 2x the NovaSeq.

    In many ways single-molecule sequencing is far superior to short-read NGS for clinical WGS. (See my friend Brian Krueger’s LinkedIn post, and poll, here, and some great insights about the value of clinical WGS in an older post here.) The main barrier to wider clinical use of WGS is cost: with a need for at least 15x if not 30x genome coverage, that’s 50 Gb to 100 Gb of sequence data. Put another way, on ONT the cost to generate WGS data is currently $500 to $1,000, while on the PacBio Revio it is $1,000 to $2,000, which is still too expensive (even though it has broken through the ‘$1,000 Genome’ barrier).
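
    The back-of-the-envelope arithmetic behind those dollar figures, assuming a ~3.1 Gb haploid human genome and the per-Gb prices quoted above:

    ```python
    GENOME_GB = 3.1  # haploid human genome, approximately 3.1 gigabases

    def wgs_cost(coverage: float, usd_per_gb: float) -> float:
        """Cost to generate WGS data at a given fold-coverage and per-gigabase price."""
        return coverage * GENOME_GB * usd_per_gb

    for platform, price in [("ONT PromethION", 10.0), ("PacBio Revio", 20.0)]:
        print(f"{platform}: ${wgs_cost(15, price):,.0f} (15x) to ${wgs_cost(30, price):,.0f} (30x)")
    # ONT PromethION: $465 (15x) to $930 (30x)   -> roughly the $500 to $1,000 range above
    # PacBio Revio: $930 (15x) to $1,860 (30x)   -> roughly the $1,000 to $2,000 range above
    ```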

    Sequencing run-times

    Lastly, NGS takes a long time to run. A MiSeq running 2×150 paired-end (PE) reads takes over 48 hours; a NextSeq run is at least a day; and a NovaSeq almost two days.

    Here single-molecule sequencing isn’t much faster: a Sequel II run takes over a day, the Revio still takes a day, and ONT is three days (to maximize throughput, the pores are allowed to run for a long time).

    For ultra-fast WGS, ONT has been used to get WGS data in 3 hours on the PromethION (GenomeWeb link, paywalled) for newborn screening in the NICU, where fast turnaround time is paramount. There are some eight programs worldwide underway to utilize NGS in newborn screening, enrolling from 1,000 to over 100,000 infants. One prominent example is Stephen Kingsmore’s Rady Children’s Institute “BeginNGS” program, subtitled “Newborn genetic screening to end the diagnostic odyssey”. (Here’s a paywalled 360dx article laying out the details of these eight newborn genomic screening programs worldwide.)

    Anything else?

    Okay, that is my list of ‘unmet needs in NGS’, with some problems solved by single-molecule approaches while other problems remain. Did I miss anything?

  • Observations about Helicos, a single molecule sequencer from 2008


    A brief history of Helicos Biosciences

    Does anyone remember Helicos Biosciences? Way back in 2009 (per Wikipedia), Helicos co-founder and Stanford professor Stephen Quake had his genome sequenced (and published in the prestigious journal Nature Biotechnology) for a reported $50K cost in Helicos reagents. That year I remember hearing a talk given by Arul Chinnaiyan at the NCI with single-molecule RNA-Seq data; it was an exciting time.

    Crunchbase indicates Helicos raised $77M and went public in 2007; they shipped their first HeliScope in 2008, only to be delisted in 2010 and then declare bankruptcy in 2012. Remember, the Solexa 1G / Illumina Genome Analyzer (“GA”) only started selling commercially in late 2006, so in those early days it was something of a dogfight. I was selling Illumina microarrays for GWAS to the NIH from 2005 onward, through the first GAs and GA IIx’s, and then from 2010 started selling the Applied Biosystems / Life Technologies SOLiD 4 systems.

    A few distinguishing features

    For those familiar with library preparation for Illumina sequencing, it takes time, several rounds of PCR and PCR cleanup, and quantification to be ready for sequencing. Helicos instead used poly-adenylated nucleic acid to bind to the flowcell, and the chemistry would then sequence the DNA directly, without any of the amplification (into emulsions, nanoballs, or clusters, depending on the platform) that other systems require.

    As the world’s “first true single molecule sequencer”, the DNA sequence had no amplification bias and could read high-GC or low-GC stretches of DNA without any impairment of accuracy. Basecalls were about 96% accurate; the 4% error was not sequence-dependent and was basically random, simplifying analysis. Only a tiny amount of sample input was required: 3 ng of RNA or DNA. The two flowcells gave the instrument the capability to run 50 samples in one run, which took 7 or 8 days to complete. And at a 2008 price per sample of about $325 (for about 14M unique reads; this is from an old GenomeWeb interview), the price per sample for RNA-Seq was attractive, although it required a 48-sample experiment (two of the 50 lanes were reserved for controls), or some $15,600 for a single experiment, which naturally limited the market to high-throughput operations.

    In 2008 the Solexa 1G and Genome Analyzers were all the rage

    The Solexa acquisition occurred in November 2006, and several Solexa 1Gs had already been shipped and started producing data in customer laboratories; at only 25 basepair (bp) reads they still produced about 800 Mb of sequencing data per 3+ day sequencing run. I had been selling microarrays to the NIH for Illumina since 2005, and in early 2007 the first Solexa 1G was installed in the laboratory of Keji Zhao at NHLBI, who ended up being the first person to publish a ChIP-Seq paper using NGS (in the journal Cell in May 2007; here’s a PubMed link).

    By the time Helicos made their first commercial sale in February 2008, Illumina had already sold 50 Genome Analyzers (by the summer of 2007), and by February 2008 had updated the instrument for paired ends and extended readlengths from 25 bases to 35 bases, with announced progress toward 50-base readlengths.

    Against this backdrop, Helicos designed and built their ‘HeliScope’ single-molecule sequencer to be highly scaled: 50 channels; about 25 Gb of sequence data per 7-to-8-day run; read lengths of 25 to 55 basepairs with an average of about 30 to 35; about a 4% error rate across bases with G/C content ranging from 20% to 80%; and a random error model (no systematic bias, which was their big selling point against Illumina, where cluster generation as well as library preparation uses a form of PCR amplification that introduces bias).

    And according to a conversation I had this week, the price of the HeliScope was also scaled: $1.2M at the start in 2008, steadily lowered over time to about $900K in 2011 when they ceased operations. Requiring 48 samples for an RNA-Seq experiment, taking an entire week to generate data, and costing over $15,000 was a tall order to fill; sequencing a whole genome for $50,000 was also not something many laboratories or individuals could afford in 2008.

    Important aspects of the Heliscope

    Sourcing a 2008 product sheet for the HeliScope (PDF), the on-board data storage capacity (remember, this is 2008) was a whopping 28 terabytes. This was to store the enormous imaging data for the two flowcells, and to do all the image registration and basecalling. Any way you look at it, a single run producing 25 gigabases of sequencing data in 2008 was going to pose some challenges.

    And this instrument was big: the spec sheet says the main HeliScope sequencer was 4 feet by 3 feet by 6 feet tall, and a whopping 1,890 lbs. An 800 lb block of Vermont granite was included at the bottom of the instrument to stabilize it against vibration. However, it’s clear from a photograph of the instrument that it was fitted with wheels, so you can say it was portable, as much as a roughly 2,000 lb instrument is portable.

    The world’s first single-molecule sequencing technology (they trademarked the name, calling it True Single Molecule Sequencing (tSMS)™) was not ‘real-time’ chemistry like the latest PacBio Revio™ or Oxford Nanopore PromethION™; it was sequencing-by-synthesis of a single base followed by imaging of the entire flowcell surface. With two flowcells (each with 25 lanes), one would be imaged while the other underwent its flowcell biochemistry. Impressively (or perhaps not that realistically?), they claimed that improvements in flowcell density and tSMS reagent efficiency would eventually produce 1 Gb of sequence per hour (about 7x the above numbers in terms of density and thus overall throughput).

    One source told me that in those days the flowcell had uneven densities of poly-T molecules, so there were areas unusable for basecalling. If an area was too sparse, it was not worth the effort of scanning and analyzing; if it was too dense, the signals would collide and no usable sequence could be obtained. The original design, however, scanned the entire surface of all 50 channels; usable data or not, all the images were scanned and analyzed. There wasn’t the luxury of time or engineering resources to optimize this.

    What was the cause of the ultimate demise of the HeliScope?

    Not only was instrument cost an issue, there was also the problem of getting to longer reads. In 2008 Illumina was getting 35 bp reads and was on their way to 50 bp reads, along with a paired-end capability that meant a large increase in throughput. (For those unaware, in 2023 these reads now go out to 300 bp.) Helicos could not catch up; due to the limits of detection and the optical system, the laser illumination used to excite the fluorescent labels on the nucleotides also had the potential to damage the DNA, leaving molecules unusable. And thus Helicos could talk about extending the average readlength from 35 (plus or minus 10 or 15 bases, as it was a distribution of read lengths) to 50 or longer, but it just did not happen in the timeframe from 2008 to 2011, when Helicos stopped selling HeliScope systems. It is my understanding that they did not sell many of these $1M systems: fewer than a dozen or so worldwide.

    Pricing a new instrument from a new company at the $1M price point is a tall order. One life-science company that sold single-cell analysis equipment and consumables, Berkeley Lights (renamed PhenomeX after acquiring the single-cell proteomics company Isoplexis), tried for years to reduce the size and cost of their flagship Beacon system, but was unable to, and has a limited market for their analyzer.

    You can say Helicos paved the way for market reception of PacBio in 2012 and then Oxford Nanopore a few years later in 2015. The relatively high cost (some 7x to 10x on a cost-per-base basis relative to sequence data coming off Illumina’s flagship NovaSeq X) remains a large barrier.

    Now that Element Biosciences, Singular Genomics, and Ultima Genomics (and let’s not forget PacBio’s Onso) are competing head-to-head with Illumina on short reads, is there room for innovation (and cost reduction) in single molecule long reads? I would certainly hope so.