  • High Throughput Single Cell Morphology from Deepcell

    Single cell analysis has been a mainstay of biological research for almost 4 decades. Is it time to study cells on the basis of their physical properties, instead of surface markers?

    A brief history of flow cytometry

    Going all the way back to 1879, the physics of droplet formation was explored by Lord Rayleigh, who showed that a stream of fluid emerging from a jet orifice is hydrodynamically unstable and breaks up into smaller droplets, driven by surface tension. (Here's a link to the publication; it is remarkable to see what the Royal Society of London's publications were like almost 150 years ago.)

    In 1953, Wallace Coulter received a patent for "Means for Counting Particles Suspended in a Fluid", in which a particle passing through a constricted path produces a detectable change in electrical characteristics. This 'Coulter Principle' is used in automated hematology analyzers today to get RBC (Red Blood Cell) counts, a mainstay of blood-based diagnostics. That same year, the principles of laminar flow were used to design a chamber for optical counting of RBCs; it involved two streams of fluid, a fast-flowing 'sheath' and the RBCs injected interior to that stream, where the particles could be measured optically.

    This principle of hydrodynamic focusing of a sample stream within a sheath fluid is the first key principle of how flow cytometers work today.

    At Stanford University in 1965, an advance in ink jet technology showed that independently charged droplets could be steered for printing applications using a high-speed oscillograph; droplets could thus be directionally controlled by an electrostatic charge. That same year at Los Alamos National Laboratory, Mack Fulwyler adapted a Coulter counting instrument to form droplets after counting; these droplets were then charged and deflected into a collection vessel. The principle and construction of the first cell sorting device were reported in the 1965 paper "Electronic Separation of Biological Cells by Volume".

    The first flow cytometers and cell sorters were commercialized in the late 1960s, and Becton Dickinson introduced their first instruments in the early 1970s. A workhorse in the biological sciences, the flow cytometry business (in the aggregate) has flourished, with current revenues above $4 Billion yearly. Applications span both basic research and applied diagnostic testing for cell counting, biomarker detection and cell sorting. The number of lasers (and dye combinations) for detection of different markers continues to rise, with as many as 12 fluorescence and 2 scatter parameters measured simultaneously.

    Imaging Flow Cytometry (IFC), introduced in 2011 and mainly used to detect extracellular vesicles and circulating exosomes, has increasingly been used for Circulating Tumor Cell (CTC) detection and characterization, as well as for cells in states of transition (cell cycle phases such as mitosis). Deepcell has built its first instrument to image cells at high throughput without any labels, sorting them on the basis of their visual properties (that is, their morphology).

    High-speed brightfield imaging from Deepcell

    Deepcell was co-founded out of the laboratory of Euan Ashley at Stanford University by Maddison Masaeli, who was a postdoc in the Ashley lab, and Mahyar Salek, who was involved in deep learning at both Google and Emotiful. The founding concept was to combine high-resolution imaging of single cells with the sorting of these cells in a label-free fashion by leveraging real-time artificial intelligence and deep learning.

    In May of 2023 the benchtop REM-I platform was announced. This platform is able to image cells at high speed, classify their morphology with Deepcell's Human Foundation Model, and then sort them without the use of labels. Deepcell's Axon data suite enables the user to represent the 115 dimensions of morphology measured by the Human Foundation Model as UMAP (Uniform Manifold Approximation and Projection) plots, a convenient method to reduce high-dimensional data into a 2-dimensional map. (See figure 1 for a simple example of three cell lines mixed with polystyrene beads run on the REM-I platform.)
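    For readers who want a feel for how such a plot is produced, below is a minimal sketch using the open-source umap-learn package. The file name, the export format, and the parameter choices are illustrative assumptions, not Deepcell's actual Axon workflow; the only requirement is a matrix with one row per cell and 115 morphology dimensions per row.

```python
import numpy as np
import umap  # pip install umap-learn

# One row per cell, 115 morphology dimensions per row; the file name and the
# idea of loading from a .npy export are assumptions for this sketch.
embeddings = np.load("cell_morphology_embeddings.npy")   # shape: (n_cells, 115)

# Reduce 115 dimensions to 2 for plotting; parameter values are typical defaults.
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
coords_2d = reducer.fit_transform(embeddings)             # shape: (n_cells, 2)

# coords_2d can now be scattered and colored by cell type, cluster, or sort gate.
```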

    Figure 1: Evaluation of the Deepcell HFM with polystyrene beads and three cell lines. 6μm polystyrene beads, cancer (A375 and Caov-3) and immune (Jurkat) cell lines were imaged on the Deepcell REM-I platform, then combined in silico to evaluate the classification performance of the HFM.

    Real-time AI characterization in 115 dimensions

    Single cell analysis has progressed from a light microscope to genomic characterization of a cell population via single cell transcriptome analysis of all the expressed genes in the genome.
    The REM-I instrument takes cells in suspension, which are loaded and analyzed by an automated microfluidic system. The instrument captures high-resolution brightfield images, and can sort cells six different ways for downstream analysis. (A typical analysis would be single cell RNA-Seq whole transcriptome analysis.) See figure 2 for an illustration of how the instrument works.

    Figure 2: The REM-I Instrument workflow, from cell suspension through examination of high-dimension cell morphology analysis, selecting cells of interest and recovering sorted cells.

    Through an artificial intelligence data model Deepcell calls the Human Foundation Model, cells are characterized through computer vision (to measure morphometrics) and self-supervised deep learning to derive 115 dimensions from each image. There are two different encoders analyzing an image: one called the Deep Learning Encoder (utilizing a Convolutional Neural Network) that extracts 64 dimensions of Deep Learning features, and a second one called the Computer Vision Encoder (utilizing human-constructed algorithms) extracting an additional 51 dimensions of morphometric features. These 51 morphometric dimensions are what you would normally consider morphology – cell shape, pixel intensity, and image texture – while the 64 Deep Learning dimensions are not human-interpretable.
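    To make the two-encoder idea concrete, here is a purely illustrative sketch (in PyTorch and NumPy) of the general pattern: a small convolutional network produces a learned 64-dimensional embedding, a few hand-crafted measurements stand in for the morphometric features, and the two vectors are concatenated into a single per-cell feature vector. None of the layer sizes, features, or names below come from Deepcell's actual Human Foundation Model.

```python
# Illustrative only: a toy "deep learning encoder" plus a "computer vision
# encoder" whose outputs are concatenated per cell image.
import numpy as np
import torch
import torch.nn as nn

class DeepEncoder(nn.Module):
    """Tiny CNN mapping a 1-channel brightfield crop to a 64-d embedding."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, x):                      # x: (batch, 1, H, W)
        h = self.features(x).flatten(1)        # (batch, 32)
        return self.proj(h)                    # (batch, 64)

def morphometric_features(img: np.ndarray) -> np.ndarray:
    """A few hand-crafted stand-ins for morphometric dimensions: thresholded
    area, mean and standard deviation of pixel intensity, and a texture proxy."""
    mask = img > img.mean()
    return np.array([mask.sum(), img.mean(), img.std(),
                     np.abs(np.diff(img, axis=0)).mean()])

# Combine the two encodings for one (placeholder) image
img = np.random.rand(64, 64).astype(np.float32)
deep = DeepEncoder()(torch.from_numpy(img)[None, None])                   # (1, 64)
handcrafted = torch.from_numpy(morphometric_features(img)).float()[None]  # (1, 4)
per_cell_vector = torch.cat([deep, handcrafted], dim=1)  # (1, 68); 115 in the real system
```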

    Their Human Foundation Model has been pre-trained on a variety of cell types, including normal adult blood cells, fetal blood, trophoblast cell lines and a variety of cancer cell lines. Over 25 million high-resolution images were used to derive AI-assisted cell image annotation.

    It's remarkable that so many parameters can be extracted from a single brightfield image, and in real time; a 2023 paper about the REM-I platform technology in Communications Biology (a Nature Portfolio journal), "COSMOS: a platform for real-time morphology-based, label-free cell sorting using deep learning", showed data at rates from 600 cells per second up to 6,000 cells per second.

    Will Morpholomics become mainstream?

    It should come as no surprise that with the wealth of genomic, transcriptomic, epigenetic and proteomic datasets now available, there is great interest at the intersection of these “omics” for furthering both basic research and personalized medicine. Other less-known fields such as metabolomics and lipidomics are limited through current methods of analysis, and may well become more important in the future as better methods of analyses become available.

    At the October 2023 American Society for Human Genetics conference in Washington DC, Deepcell presented a poster called "Morphology profiling driven by deep learning characterizes functional changes in CRISPR knockout cell lines", illustrating how morphology changes with single gene knockouts. Eleven genes were individually knocked out using CRISPR, and it was clear that distinct morphotypes could be distinguished between the individual edited cell lines. The cell images in the poster and their relationship to the respective areas on the UMAP plots were convincing. (This poster is available for download from the Deepcell website here (PDF).)

    Could the "morpholome" become an accepted area of study? One tantalizing application is in the realm of non-invasive prenatal diagnostics; Deepcell's CEO Dr. Maddison Masaeli said in this 2023 interview that circulating Trisomy 21 fetal cells were visibly different from normal healthy fetal cells. This opens up the possibility of early and non-invasive screening methods for detecting certain genetic conditions. The potential for circulating tumor cell (CTC) detection is also obvious; CTCs have been intensively studied since the cell-surface marker EpCAM was first used in the CellSearch CTC platform.[1]

    Other application areas Deepcell is targeting include metastatic cancer research, translational research and Alzheimer's Disease research (several presentation webinars covering these topics are on Deepcell's Webinars page).

    There are several imaging flow cytometers on the market; however, none offer the deep learning cell model and real-time, label-free sorting capabilities that Deepcell does. [2] BioCompare has a brief review called Optimizing Cell Sorting covering the challenges of fluorescent probe panel design, if you want to look into this issue in a little more detail.

    Availability of datasets and ability to access technology

    If you’d like to take a look at several datasets along with Deepcell’s Axon software, you can sign up here or you can request a personal demo. Deepcell also offers a technology access capability they call the Spark Program, where you can send your samples to Deepcell for analysis, available here.

    Reference:

    1. Salek M and Masaeli M et al. COSMOS: a platform for real-time morphology-based, label-free cell sorting using deep learning. Commun Biol. (2023) 6(1):971. doi:10.1038/s42003-023-05325-9
  • Interview and demonstration video of BioIntelli 2.0 Life Science Sales and Marketing lead generation

    A few months ago I wrote about a new lead generation tool called BioIntelli 2.0 (you can find it here).

    David Hines and yours truly held a video interview, with a demonstration of BioIntelli 2.0's new capabilities. The video is about 23 minutes long, and worth watching if you are looking for a tool for account mapping, new top-of-funnel contacts for a new customer acquisition campaign, or competitor analyses.

    Video interview and demonstration of BioIntelli 2.0 with David Hines

    You can contact BioIntelli by email at sales@biointelli.net or by phone at +1 (617) 444-8420 for more information.

  • Nabsys Genome Mapping Technology Launches at ASHG 2023

    Introduction to genetic structural variation

    It is an exciting time to be involved in genetics and its application to healthcare. It was only a little over two decades ago that the first draft of the Human Genome Project was published, and last year the first telomere-to-telomere sequence of a complete human genome was achieved. And the impact of next-generation sequencing is seen in increasingly valuable applications in the clinic: as a companion diagnostic for targeted cancer therapy, as a method for non-invasive prenatal testing for trisomy, and for rare disease diagnostics. Yet there are still so many big problems that remain unsolved.

    One of the larger problems in genetics (and by association its application and impact to healthcare) is the detection and characterization of structural variation. A single gene can be damaged via a multitude of mechanisms (such as non-homologous recombination), and this is a different kind of variation compared to Single Nucleotide Polymorphisms (SNPs), which genotyping microarrays measure, or insertion / deletion mutations (called indels, where a single base to several dozen bases can be inserted or deleted), which sequencing can also detect.

    The size of these insertions and deletions, however, can exceed the resolving power of next-generation sequencing, where readlengths can be limited to 150 to 300 bases. There can be insertions and deletions of kilobases or hundreds of kilobases long, which will be invisible to NGS analyses.

    In a given individual’s whole-genome sequence, there will be some 4 to 5 million SNPs and indels detected. The structural rearrangements (above 50 bases of inserted or deleted nucleotides, to several million bases or even entire chromosome arms) go undetected. For clinical cases, a pathology cytogenetics laboratory routinely uses techniques such as Fluorescent In-Situ Hybridization (known by its acronym FISH), karyotyping and microarrays (typically aCGH or array Comparative Genomic Hybridization) to detect structural rearrangements and specific gene fusions for diagnosing and appropriately guiding the treatment of cancer.

    Figure 1 below (kindly provided by Nabsys) compares conventional next-generation Sequencing by Synthesis (SBS) to genome mapping.

    Figure 1: Sequencing by Synthesis (typical NGS method) compared to genome mapping. Image kindly provided by Nabsys.

    There are an estimated 20,000 or more structural variants in a single human genome, yet with current sequencing technology (including single molecule sequencing from manufacturers such as Pacific Biosciences or Oxford Nanopore Technologies) large swaths of genome sequence can be rearranged but go undetected.

    For example, say there is a balanced structural variant, where a large multi-megabase region is inverted. It is called balanced because there is no gain or loss of DNA sequence; however, a stretch of several megabases sits in the completely opposite orientation. Even with the technical advances of single-molecule sequencing to read lengths in the tens or even hundreds of kilobases, detecting all the different kinds of variation across a wide range of sizes and complexity remains a challenge.

    Mapping versus long read sequencing

    One definite trend over the past few years has been a consistent increase in the throughput of short read sequencing, along with similar throughput increases in long read sequencing. However, on a cost-per-gigabase basis, long read sequencing remains 5-fold to 10-fold more expensive, severely limiting its use in clinical applications.

    Genome mapping using an optical method has been on the market for several years from Bionano Genomics, and is accepted as a complement to whole genome or whole exome sequencing to understand the nature of structural variants and disease. Nabsys now offers better resolution of variants at lower cost, detecting SVs as small as 300 base pairs from >100 kb segments of the genome mapped electronically.

    Nabsys OhmX™ technology

    For a Nabsys run, high-molecular weight genomic DNA (50 kb to 500 kb) is first nicked using sequence-specific nickase enzymes, which can be used alone or in combination, then labeled and coated with a protein called RecA (the RecA protein serves to stiffen the DNA for analysis). The samples are injected into the instrument, and the data is collected.

    Single DNA molecules are translocated through a silicon nanochannel, and the labeled locations are electronically detected to determine the distance between sequence-specific tags on individual molecules. As each electronic event is measured along the linear DNA molecule, a time-to-distance conversion is applied, and the entire genome has enough overlap to assemble what is effectively a restriction map of overlapping fragments (see figure 2).
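    As a toy illustration of the time-to-distance idea: if the detector records the time each tag passes and the molecule translocates at a roughly constant velocity, the inter-tag spacings fall out directly. The event times and velocity below are made-up numbers, and real electronic mapping certainly calibrates velocity far more carefully than this.

```python
# Toy sketch of converting detector event times into inter-tag distances for
# one molecule. Constant translocation velocity is an illustrative assumption.
import numpy as np

event_times_ms = np.array([0.00, 0.42, 1.10, 1.85, 2.60])   # tag detection times (ms), made up
velocity_bp_per_ms = 10_000.0                                 # assumed translocation speed

positions_bp = event_times_ms * velocity_bp_per_ms            # tag positions along the molecule
intertag_distances_bp = np.diff(positions_bp)                 # spacing between adjacent tags

print(intertag_distances_bp)   # [4200. 6800. 7500. 7500.] -> a single-molecule "interval map"
```

    Overlapping interval maps from many molecules are then assembled, much like overlapping restriction fragments, into the genome map shown in figure 2.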

    Figure 2: Individual molecules labeled with sequence specific labels, measured in a Nabsys OhmX Analyzer using a Nabsys OhmX-8 nanochannel device, and assembled into a Genome Map. Drawing courtesy of Nabsys.

    This capability was showcased a few years ago for microbial genomes, and a few publications (1, 2, 3) show proof of the approach for analyzing DNA maps this way at single-molecule resolution in bacterial genomes.

    With the recent commercial release of the Nabsys OhmX Analyzer system and OhmX-8 Detector consumables, a 10-fold increase in throughput has been achieved, combined with 250 electronic detectors per channel. Nabsys uses a kit for efficient high molecular-weight DNA extraction and labeling in preparation for loading onto the system. (The sample input requirement is 5 µg of starting material, sufficient for several instrument runs if necessary; less input can be used if DNA quantities are limited.) In addition, as there are no optics (only fluidics and electronics) the Nabsys instrument is much more compact and less expensive than the equivalent optical instrument, as well as less expensive to run.

    Applications for human disease: cancer and rare disease

    Cancer has been correctly described as a 'disease of the genome', and understanding the role structural variation plays in cancer progression and treatment is an ongoing area of important research. Another important application of genome mapping is rare disease; currently it is estimated that about 70% of suspected Mendelian disorders go undiagnosed even with current short-read whole-genome sequencing (4).

    It remains to be seen whether better detection and characterization of structural variation can provide the needed insights into these two important research areas, currently limited by cost of existing technology.

    Nabsys at ASHG 2023

    At the upcoming American Society for Human Genetics conference in Washington DC (November 2 – 5, 2023), Nabsys will be present in the Hitachi High-Tech America booth (Booth 1423). Hitachi will present their Human Chromosome Explorer bioinformatics pipeline as part of a low-cost, scalable structural variation validation and discovery platform.

    You can find out more about the Nabsys OhmX Analyzer here (a downloadable brochure is available on that page) and also more information about the overall approach to electronic genome mapping is here. A handy whitepaper about EGM can be found here (PDF).

    1. Passera A and Casati P et al. Characterization of Lysinibacillus fusiformis strain S4C11: In vitro, in planta, and in silico analyses reveal a plant-beneficial microbe. Microbiol Res. (2021) 244:126665. doi:10.1016/j.micres.2020.126665
    2. Weigand MR and Tondella ML et al. Screening and Genomic Characterization of Filamentous Hemagglutinin-Deficient Bordetella pertussis. Infect Immun. (2018) 86(4):e00869-17.  doi:10.1128/IAI.00869-17
    3. Abrahams JS and Preston A et al. Towards comprehensive understanding of bacterial genetic diversity: large-scale amplifications in Bordetella pertussis and Mycobacterium tuberculosis. Microb Genom. (2022) 8(2):000761. doi:10.1099/mgen.0.000761
    4. Rehm HL. Evolving health care through personal genomics. Nat Rev Genet. (2017) 18(4):259-267. doi:10.1038/nrg.2016.162
  • NGS Company Infographic

    While pulling together the most recent blog post (titled "NGS Necropolis Part 1 – the QIAGEN GeneReader") I realized there were a lot of dates, numbers and names that lent themselves to a simple timeline. And then it turned into a table, because I could then cram in a lot more information.

    So here you have it – no fewer than 13 companies, founded as early as 1999, sorted by commercialization launch year, with acquisition dates, founding years, and the year taken off the market (if applicable). Enjoy!

  • The NGS Necropolis Part 1: The QIAGEN GeneReader

    Since the launch of the first massively parallel sequencer in 2005, the pyrosequencing-based 454 GS20, many approaches have been developed and worked on, yet several never achieved wide adoption. In this series a few prominent platforms will be described, the first of which is QIAGEN's GeneReader.

    An AI interpretation of what a QIAGEN GeneReader looks like, from the prompt "Use an image of a QIAGEN GeneReader next-generation sequencing instrument as the centerpiece, decorate it with DNA helices in a lattice formation like a wreath around its top, and the setting of a deep blue lake with light shimmering on the surface of the water." The artistic model was SDXL 0.9.

    A brief history of the NGS market: from Zero to $7 Billion in 18 years.

    In 2005 the 454 Life Sciences company launched the GS20, the first "next-generation" sequencer ("first generation sequencing" of course is the venerable Sanger method read out via capillary electrophoresis, which was how the Human Genome Project was completed in 2003). The 454 GS20 offered about a million reads per run of about 80 to 100 bases in length in a 10 hour run, far more sequence than could be generated via Sanger at that time. Only two years later in 2007 Roche Diagnostics acquired 454 for $155 Million, and iterated the platform with an upgraded model called the Roche / 454 GS FLX with increased readlength in 2008, and a smaller unit called the GS Junior in 2009.

    Against this backdrop, in late 2006 Illumina acquired Solexa for about $600 Million and started selling the first Genome Analyzers in early 2007. The throughput at the beginning was orders of magnitude beyond what 454 could do, although the reads were much shorter. In an approximately 2.5 day (~60 hour) run, the 1G could produce a gigabase of data, with the first iteration of reads as short as 25 base pairs, steadily increasing in that first year to 37 and then to 50 base pairs. Of course in the intervening 15 years the industry has seen Illumina scale this technology in a nothing-short-of-spectacular way:

    Comparison Feature | Solexa 1G | Illumina NovaSeq X
    List Price of the Instrument | $400,000 | $1,250,000
    Price to run a Single Experiment ("Cost to press GO") | ~$3,700 | ~$19,000
    Total Sequencing Yield from a Single Experiment | ~800 MB | 6 TB (6,000 GB or 6,000,000 MB)
    Read Length | Single-end (1x) 25 to 37 bases | Paired-end (2x) 150 bases
    Cost per Gigabase | ~$4,600 | ~$2.00*
    Fold-reduction in Cost per Gigabase | N/A | 2,300-fold
    If a luxury car cost this in 2005, what would it cost in 2023? | $100,000 | $100,000 / 2,300 = $43.48
    If a house cost this in 2005, what would it cost in 2023? | $500,000 | $500,000 / 2,300 = $217.39
    Comparing the first Solexa 1G (later renamed the Illumina Genome Analyzer) to the latest iteration of Illumina's highest-throughput system, including the price per Gigabase (one billion bases of sequence). *The $2.00/Gb figure assumes the yet-to-launch 25B NovaSeq X flowcell (see the note at the end of this post).
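    The per-Gigabase and fold-reduction figures in the table are simple arithmetic; here is a small sketch of the calculation using the table's own inputs (the small difference from the table comes from rounding 2,312.5-fold down to 2,300-fold).

```python
# Reproducing the table's cost arithmetic from its own inputs.
solexa_run_cost, solexa_yield_gb = 3_700, 0.8        # ~$3,700 per run, ~800 MB (0.8 Gb)
novaseqx_cost_per_gb = 2.00                           # asterisked 25B flowcell figure

solexa_cost_per_gb = solexa_run_cost / solexa_yield_gb          # ~$4,625 -> "~$4,600"
fold_reduction = solexa_cost_per_gb / novaseqx_cost_per_gb      # ~2,312.5 -> "2,300-fold"

# Applying the same fold-reduction to 2005 prices of a luxury car and a house:
print(round(100_000 / fold_reduction, 2))   # ~43.24  (the table uses exactly 2,300 -> $43.48)
print(round(500_000 / fold_reduction, 2))   # ~216.22 (vs. $217.39 using exactly 2,300)
```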

    It was in 2006 that Agencourt was acquired by Applied Biosystems for $120M; the technology was then launched as the Life Technologies' SOLiD next-generation sequencing system in 2008. The sequencing-by-ligation based system went through five iterations from 2008 through 2012, overlapping with Life Technologies purchasing Ion Torrent in 2010 and starting to sell the Ion Torrent PGM ("Personal Genome Machine"). As a side note, yours truly started at Illumina in 2003 as a product manager first, then in 2005 started selling Illumina whole-genome microarrays in the Mid-Atlantic area (including the US National Institutes of Health), and sold the first Genome Analyzers through 2009; in 2010 I started selling the SOLiD 4 systems for Life Technologies, and then after the Ion Torrent acquisition moved back into a Regional Marketing role.

    Of course Complete Genomics launched their service business for whole-genome sequencing in 2009 (acquired by BGI in a $117M merger in 2012), Helicos launched their true Single Molecule Sequencing system in 2008 and closed their business in 2012, and Pacific Biosciences made a splash launching the PacBio RS in 2011. Illumina, Ion Torrent (under Thermo Fisher Scientific), Pacific Biosciences, and BGI / MGI / Complete Genomics are all still selling (and supporting) their respective products and services; Helicos was recently written up here if you are interested in that history; and Roche / 454 discontinued that product line in 2013.

    QIAGEN’s Investments in NGS

    In this Part 1 we describe the QIAGEN acquisition of Intelligent BioSystems in 2012 (for about $50M). In future posts the Bio-Rad acquisition of GNUbio in 2014 (for about $40M) and the Roche acquisition of single-molecule sequencing firm Genia in 2014 (for $125M in cash and $225M in contingent payments) will be discussed, along with many other NGS startups in various stages of development.

    QIAGEN went the furthest (relative to Bio-Rad and Roche) with their sequencing platform, the GeneReader. Way back in 2012 I wrote up this post on The Next Generation Technologist blog about the Mini-20 system that Intelligent Biosystems had developed, with its individually-addressable 20-lane design. It was revamped, and took some time in development. Understand that QIAGEN had purchased SuperArray Biosciences, also called SABiosciences, in 2009 (for $90M); SABio had pathway- and disease-specific qPCR panels, and also had in development a single-primer enrichment technology, now launched as the QIAseq target enrichment panels. QIAGEN also acquired Ingenuity Systems in 2013 (for $105M), primarily for Ingenuity's Variant Analysis™ and iReport™ capabilities, now embodied in QIAGEN's Clinical Insight (QCI) with clinical-grade variant analysis and reporting capability.

    Leveraging capabilities of the QIAGEN QIAcube and QIAsymphony sample preparation and liquid handling automation for library preparation, their vision and goal was to implement a user-friendly clinical sequencing platform, following a “From Sample to Insight” mantra. Having the front-end sample preparation, library preparation, NGS sequencing and clinical interpretation and reporting from a single vendor is very attractive from both a customer perspective as well as a business one. (In consultant jargon, this is called increasing the value capture model.)

    It was November 2015 at a conference in Austin, Texas (the Association for Molecular Pathology, or AMP) when QIAGEN announced the launch of the GeneReader as "the world's first truly complete NGS workflow". I happened to attend that conference, and was in the audience when then-QIAGEN CEO Peer Schatz made the announcement.

    While QIAGEN was not very public about it, apparently the run capacity was up to four flowcells, and each flowcell could accommodate 10 samples; flowcells could be added mid-run using a turntable inside the instrument. QIAGEN emphasized the system's flexibility to add additional samples while existing samples were still running.

    A year later, in November 2016, QIAGEN announced the 'relaunch' of the GeneReader product with new chemistry. This was in direct response to a lawsuit Illumina pursued in mid-2016, winning a preliminary injunction against QIAGEN in September 2016 for patent infringement. A few years later in 2019, QIAGEN took a $200M charge to exit the development of a clinical version of the GeneReader platform.

    One analyst described their clinical effort as a 'nice niche'; however, subsequent agreements with Illumina to develop IVD panels for the MiSeqDx, NextSeq 550Dx and future diagnostic systems are a clear signal that the GeneReader is something of a dead end.

    In the meantime, QIAGEN continues to sell (and develop) enrichment on the front-end for NGS, and further refine their clinical reporting capability.

    A few lessons from the GeneReader

    A few lessons can be drawn from this GeneReader story. (N.B. – I have no financial relationship with QIAGEN, although I know a lot of people there – or who were there – from when I started my career with life science vendors in 1999, first as a Technical Support scientist and then as a Product Manager.)

    One lesson is the importance of a clear value proposition and differentiator. Illumina had just launched the MiniSeq in 2016 (with the iSeq to follow in 2018), tackling the low end of the market. (The price of a MiniSeq instrument was $49,500, and it is still sold today.) Right around the corner, in March of 2017, the first NovaSeq 6000 system launched. Thermo Fisher Scientific's Ion Torrent division announced a new system called the GeneStudio S5 and a new higher-capacity sequencing chip (the Ion 550™ Chip), and with their Oncomine combination of cancer-specific multiplex-PCR-based AmpliSeq™ panels as well as their clinical software (also somewhat confusingly called Oncomine), Thermo Fisher has a decent-size footprint in the clinical market.

    Where would the GeneReader fit in against this backdrop? The throughput was just too low to be competitive with the existing offerings from Illumina and Thermo Fisher Scientific, thus QIAGEN emphasized 'sample to insight' and a clinical offering with all-inclusive pricing (sample preparation, enrichment, sequencing, data analysis and clinical reporting).

    There was only one problem with this scenario: Thermo Fisher's Ion Torrent systems were in exactly that same market, clinical oncology with library preparation through clinical reporting, and were making a decent business from it; at least decent enough for further automation with the Ion Torrent Genexus system launch in March 2022. QIAGEN could not claim much higher accuracy, a lower cost structure, or superior sophistication of their reporting software; it was what can be called "me too", several years in the making, while Ion Torrent was already there (as were Illumina clinical users who cobbled together all the requisite clinical parts).

    A second lesson is that the clinical market is price sensitive regarding adoption of new technology. There may be high value in a new kind of data type (whether it's sequencing data versus real-time PCR point mutations or single CNVs, or structural variation information from optical mapping systems like Bionano Genomics); however, price is an objection. And it is not only the cost to run a single sample, but the cost of training personnel on a complex instrument workflow (thus the need for integrated automation, which increases complexity), the cost of acquiring the instrument in the first place, and lastly the cost of maintaining the instrument(s) (roughly 10% of the list price of the instrument each and every year that instrument is in service).

    QIAGEN had to be willing to invest many, many years into supporting the GeneReader to grab a slice of a relatively small and highly competitive clinical NGS market. Yes, that clinical NGS market is growing (in contrast to the research market, which is flat), but still those cost barriers remain. And there was nothing noteworthy in QIAGEN's cost structure for their clinical customers; capturing the entire value chain was ultimately too expensive for a limited number of customers, and thus QIAGEN reversed course in 2019.

    The last lesson is that there will be new players and new technologies attempting to disrupt both the research and clinical markets, and what is true today in 2023 may not be true in 2025, or even perhaps 2024. Last week I spoke with a person from Element Biosciences who is talking with many clinical customers about adopting an Element AVITI™ for their Laboratory Developed Tests (LDTs), thanks to its favorable economics. The PacBio Revio and the PromethION 48 also have favorable economics, however both are still too expensive relative to Illumina (and Element and Singular) short reads for wide adoption.

    On top of this, new companies currently in "stealth" mode are working on cracking a $7 Billion NGS market; the latest to emerge is Ultima Genomics, which publicly announced last May (with a collection of pre-prints and several high-profile presentations) that their system will be able to cut the cost of sequencing (in comparison to the Illumina NovaSeq X number above at $2.00/Gb) by a factor of 2, to $1.00/Gb. (Edit: here including a $2.00/Gb NovaSeq X price for a yet-to-launch "25B" NovaSeq X flowcell to achieve these economics; the original version of this post had the existing "10B" flowcell per-Gb pricing at $3.20/Gb.)

    Next month (2-6 November 2023) the American Society for Human Genetics will take place in Washington DC, and it will be exciting to see what kinds of advances are in store. I’ll be sure to share what I learn.

    Several AI Generated images using a “QIAGEN GeneReader as the centerpiece”. Room for improvement!
  • High Throughput NGS Systems: Throughput, Time and Cost Graphic

    It is an exciting time for next-generation sequencing (NGS) with new platforms being launched. Below is a chart that illustrates recent progress.

    High Throughput NGS Systems compared by Gb/run, Days Required and $ per Gb

    A chart from 7 years ago…

    A few days ago I was reminded of a chart from 2016, when a blogger named Lex Nederbragt (now at the University of Oslo, Norway), prompted by the competition in the NGS marketplace on both readlength and throughput, made a handy chart with a lot of platforms on it. (You can see his original blog post with links to the image in a post called "Developments in high throughput sequencing – July 2016 edition".)

    I was hopeful that that chart would have been updated in the intervening years, but alas his blogging moved platforms and then went quiet a year or so later. (And as a blogger myself, I can relate to the pressures of work and life as well as affiliation, which may or may not be conducive to this kind of activity.)

    The current NGS “arms race”

    So I thought about the current "arms race" of new platforms in the wings (e.g. Ultima Genomics is the top sponsor of February 2024's Advances in Genome Biology and Technology meeting), as well as new platforms only now getting into the hands of customers (specifically the Pacific Biosciences' Revio, Element Biosciences' AVITI, Singular Genomics' G4, and the Pacific Biosciences' Onso), and the renewed efforts of BGI / MGI / Complete Genomics now that they have the ability to sell their systems in North America and Europe. Complicating the MGI / Complete Genomics story, however, about a year ago parent company BGI Genomics was added to the US Department of Defense list of blacklisted companies (a GenomeWeb story with more details is here); as far as I can tell, MGI / Complete Genomics continues to do business in the US.

    By pulling together specifications and prices (along with some handy source materials assembled by others) I constructed a list of some 21 existing (or, in the case of Ultima, soon-to-exist) offerings for sale, from Illumina's iSeq all the way up to MGI's monster DNBSEQ-T20x2. I took this list, calculated a US Dollar per Gigabase cost based on the highest-throughput combination of readlength and run time for each system, and excluded all the other configurations. (For example, configurations using a lower number of flowcells, or shorter runtimes for tag-counting applications, were excluded.) I also noted the number of hours it took for this highest-throughput-per-system calculation.

    I then excluded all systems whose price per Gigabase of sequence was greater than $10 per Gb. (For those curious, if you figure 100 Gb of sequence per genome for 33x WGS coverage, that's the "$1,000 Genome".) Therefore any system above the magic "$1,000 Genome" mark is not included, and you have the chart above: Gigabases per run on the X-axis, hours per run on the Y-axis, and the size of each bubble scaled to US Dollars per Gigabase; the smaller the bubble, the lower the per-Gb cost.
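    For the curious, the chart construction itself is straightforward once the specifications are tabulated; below is a sketch of the calculation and plot, with three made-up placeholder rows standing in for the actual 21-system list.

```python
# Sketch of the chart construction: compute $ per Gb per system, drop anything
# above the $10/Gb ("$1,000 genome") cutoff, and plot Gb/run vs hours with
# bubble size scaled to $/Gb. The three example rows are placeholders, not
# the real dataset behind the figure.
import matplotlib.pyplot as plt

systems = [
    # (name, Gb per run, hours per run, cost per run in USD)
    ("System A", 6_000, 44, 19_000),
    ("System B", 3_000, 24, 3_000),
    ("System C", 300, 48, 6_000),
]

kept = []
for name, gb, hours, run_cost in systems:
    usd_per_gb = run_cost / gb
    if usd_per_gb <= 10:                       # the "$1,000 genome" cutoff at ~100 Gb per genome
        kept.append((name, gb, hours, usd_per_gb))

fig, ax = plt.subplots()
for name, gb, hours, usd_per_gb in kept:
    ax.scatter(gb, hours, s=usd_per_gb * 100)  # bubble area tracks $/Gb (smaller = cheaper)
    ax.annotate(name, (gb, hours))
ax.set_xlabel("Gigabases per run")
ax.set_ylabel("Hours per run")
plt.show()
```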

    A few observations

    The market leader (with an estimated market share of about 75%) is of course Illumina, going through an upgrade cycle on the NovaSeq X, where the per-Gb price drops from $4.84 on the NovaSeq 6000 to $3.20 with the latest iteration of the flowcell (the 10B, released in February 2023). A newer flowcell (the 25B) will further drop that per-Gb price to $2.00 or so in the latter half of this year.

    Element Biosciences has a ‘package deal’ to get to $2.00 per-Gb, however that’s dependent on special discounting and large purchase commitment; I’ve left it at their current maximum capacity use-case.

    The Pacific Biosciences’ Revio did not make the cut due to higher than $10/Gb cost (from the pricing I’ve seen it’s about double that), but the Oxford Nanopore PromethION made it at exactly $10/Gb. Pretty remarkable that you can get a long-read whole genome for $1,000 when you think about it, even if it takes several days to produce the data.

    The MGI / Complete Genomics systems are certainly price-competitive – the DNBSEQ-T20x2 broke the chart at 72,000 Gigabases per run, at $0.99 per Gb. Yes, that's 720 whole genomes at 33x every 4 days. Their other system, the T7, had only a few installations worldwide while they were effectively blocked from selling in North America and Europe due to patent infringement (and an injunction).

    The new Ultima system (called the UG 100) has a relatively short runtime (24 h) and a very low per-Gb price at $1.00; at 3,000 Gb/run that is 30 whole genomes a day. Certainly a platform to watch, especially with the November 2023 ASHG conference coming up next month (in Washington DC) and the February 2024 AGBT conference (in Florida).

    I will be attending ASHG this year, and if you’d like to meet in-person during that conference be sure to reach out!

  • The Unmet Needs of Next-Generation Sequencing (NGS)

    There are plenty of unmet needs in the current iteration of NGS, not the least of which is the effort involved in generating large amounts of sequence data.

    A short list

    The current NGS market is estimated to be about $7,000,000,000 (that's $7 Billion), which is remarkable for a market that started only in 2005 with the advent of the 454 / Roche GS20 (now discontinued). After the market leader Illumina there are alternative sequencing platforms such as Ion Torrent / Thermo Fisher Scientific, newcomers Element Biosciences, Singular Genomics and Ultima Genomics, and single molecule companies Pacific Biosciences (also known as PacBio) and Oxford Nanopore Technologies (also known as ONT).

    With major revenues coming from cancer testing (Illumina estimates NGS at $1.5 Billion of the oncology testing market), genetic disease testing ($800 Million) and reproductive health ($700 Million), NGS is well-entrenched in these routine assay fields.

    Yet through a different lens, that oncology testing market of $1.5B comes from a total $78B cancer testing market, or about 2% penetration of the cancer testing landscape. Similarly, in genetic disease (a $10B market) and reproductive health (a $9B market), NGS has penetrated only about 8% of each market. So there is plenty of room to grow.

    Yet what is holding NGS back from broader adoption in clinical applications? Here I propose a (relatively short) list of unmet needs, which serve as barriers to adoption. To look at it another way, this is a list that current NGS providers (as well as new ones) would do well to improve upon.

    The list is:

    • PCR bias
    • Sample input amount
    • Library preparation workflow
    • High cost of instrumentation and reagents
    • Sequencing run-times

    We'll address each of these in order, and comment on how single-molecule sequencing (aka 'third-generation sequencing', though that term really isn't used much any more) has addressed each issue (or not, as the case may be).

    PCR Bias

    PCR bias is something that people doing routine NGS may not think about, but for those who are doing whole genome assemblies or otherwise have to get sequence data from regions that are particularly G-C or A-T rich, this is a Big Problem. Because PCR works on the basis of short oligonucleotides hybridizing under a given set of temperature, salt and cation concentrations, the melting temperature (Tm) of the short DNA primers is really important. PCR also depends upon the strands denaturing and re-annealing at certain temperatures, and through all of this the G-C percentage of the sequence being amplified has a very strong influence on the efficiency of the reaction.

    This is a complex topic, where research papers dwell on a computer scientist's love for sequence windows, normalized read coverage and statistical equations; suffice it to say that bias can go in all kinds of directions, and there's a lot of variability in several dimensions. (See the figure below – both coverage plots are from the same species of bacteria, just different strains: S. aureus USA300 and S. aureus MRSA252. One shows falling coverage as a function of G-C content, and the other a rising coverage plot!)

    Figure 1 from Chen YC and Hwang CC et al. Effects of GC bias in next-generation-sequencing data on de novo genome assembly. PLoS One. (2013) 8(4):e62856.

    A nagging question though – what do these coverage plots imply for all the missing data? A given genomic region could have all kinds of G-C content variation making it very resistant to study. Single molecule sequencing has largely solved this G-C bias question (link to 2013 publication), but for amplification-based platforms PCR bias is still something to consider.
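    For a rough sense of how these coverage-versus-GC plots are generated: slide a fixed window along the reference, compute the GC percentage and mean read depth in each window, and normalize depth to the genome-wide mean. The sketch below assumes the reference sequence and a per-base coverage array are already in memory; it is a simplification, not the pipeline from the cited paper.

```python
# Minimal sketch of a coverage-vs-GC calculation over fixed windows.
# `reference` (a string of A/C/G/T) and `coverage` (per-base read depth) are
# assumed to be loaded already; real analyses work from FASTA + BAM files.
import numpy as np

def gc_vs_coverage(reference: str, coverage: np.ndarray, window: int = 1000):
    gc_pct, norm_cov = [], []
    genome_mean = coverage.mean()
    for start in range(0, len(reference) - window + 1, window):
        seq = reference[start:start + window]
        gc_pct.append(100.0 * (seq.count("G") + seq.count("C")) / window)
        norm_cov.append(coverage[start:start + window].mean() / genome_mean)
    return np.array(gc_pct), np.array(norm_cov)

# gc, cov = gc_vs_coverage(reference, coverage)
# Plotting cov against gc reproduces the shape of the panels in the figure above.
```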

    Sample input amount

    Back in the early days of the Ion Torrent PGM (around 2011 to 2012), a key selling point for the Personal Genome Machine was the ability of the AmpliSeq technology to use only 10 ng of FFPE DNA as input material, something the Illumina hybridization-based enrichment methods could not touch. (Their approach at that time required inputs in the hundreds of nanograms.)

    As many of you are aware, FFPE tissues are in limited supply, the fixation and embedding process damages nucleic acids (typically fragmented to about 300 bp in length), and in the current Standard of Care the FFPE process is firmly embedded in the surgeon-to-pathologist workflow in the hospital environment.

    Another instance of limited input is cell-free DNA analysis, where the amount of cfDNA from a given 10 mL blood draw ranges from 10 ng to 40 ng. (This varies by individual, healthy versus diseased, as well as inflamed or normal, along with a host of other variables.) And from this limited amount of DNA, companies like GRAIL and Guardant are detecting down to one part in a thousand, or 0.1% minor allele fraction (MAF). They use plenty of tricks and techniques to generate useable NGS libraries.

    Yet with a system that did not use PCR for its library preparation (Helicos, covered here), one limitation of using 1 – 9 ng of input material (and simply adding a Poly-A tail to the DNA) was that barcodes could not be added. Thus a key design feature of the Helicos instrument was a flowcell with 50 individual lanes, one per sample. It makes me think they could perhaps have been a bit more imaginative with the sample preparation and somehow added a sample barcode before polyadenylation to enable sample multiplexing.

    Library preparation workflow

    Ask anyone who does routine NGS library preparation, with or without the aid of liquid handling automation, and they will tell you it's work. You purify DNA, you do an enrichment step (whether multiplex PCR like AmpliSeq or QIAseq or Pillar SLIMamp, or a hybrid capture step from NimbleGen / Roche or Agilent SureSelect), you clean up those reactions, you ligate adapters, you purify it again, you set up a short-cycle PCR to add sample indexes, you clean it up again, and you quantitate via fluorometry or qPCR.

    And then do some calculations to normalize molarity.
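    That molarity normalization is a standard back-of-the-envelope conversion from a fluorometric concentration (ng/µL) and an average fragment length, using ~660 g/mol per base pair of double-stranded DNA; the example numbers below are arbitrary.

```python
# Standard dsDNA library molarity conversion:
# nM = (concentration in ng/uL * 1e6) / (660 g/mol per bp * mean fragment length in bp)
def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)

print(library_nM(2.0, 400))   # ~7.6 nM for a 2 ng/uL library averaging 400 bp
```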

    After loading the NGS instrument you wait for cluster generation / the emulsion reaction with beads, and then you sequence. You are glad you are not in the early 2010s with separate instruments for these processes, but still this all takes time.

    And there is still a danger of overloading the NGS instrument. This danger is largely done away with in single molecule sequencing (a single pore or Zero Mode Waveguide can only handle one molecule, so higher concentrations of molecules matter far less).

    High cost of instrumentation and reagents

    A NovaSeq 6000 is almost a cool $1,000,000. A single run at maximum capacity on the NovaSeq is $30,000. This gets the cost-per-Gigabase down to $5.

    The newest NovaSeq X from Illumina is even more: an eye-watering $1,250,000. This gets the cost/GB down to $3, with additional cost reductions to about $2/GB expected from higher-throughput flowcells later in 2023.

    For single molecule sequencing, the instrumentation cost is still high. A PacBio Sequel II is $525K, and the new PacBio Revio is about $750K. One exception is the ONT PromethION at $310K. However, for all these instruments the cost per GB is very high – the Sequel per-GB cost is around $45 (that's a good 50% more than the per-GB cost on an Illumina NextSeq; however, you are getting long reads with the Sequel). Revio is a lot better at $20/GB, so better than NextSeq short reads but still 4x what a NovaSeq 6000 costs per GB.

    ONT is an attractive $10/GB, still 2x the NovaSeq though.

    In many ways single molecule sequencing is far superior to short-read NGS for clinical WGS. (See my friend Brian Krueger’s LinkedIn post – and poll – here and some great insights about the value of clinical WGS in an older post here.) The main barrier to wider clinical use of WGS is cost, and with a need for at least 15x if not 30x genome coverage, that’s 50GB to 100GB of sequence data. Put another way, currently on ONT the cost to generate WGS data is $500 to $1,000, while on the PacBio Revio it is $1,000 to $2,000, which is still too expensive (even though it has broken through the ‘$1000 Genome’ barrier).

    Sequencing run-times

    Lastly NGS takes a long time to run. A MiSeq running 2×150 paired-end (PE) reads is over 48 hours; a NextSeq run is at least over a day; and a NovaSeq almost two days.

    Here single molecule sequencing isn't much faster: a Sequel II run takes over a day, Revio still takes a day, and ONT is three days (in order to maximize throughput, the pores are allowed to run for a long time).

    For ultra-fast WGS, ONT was used to get WGS data in 3 hours on the PromethION (GenomeWeb link paywalled) for newborn screening in the NICU, where fast turnaround time is paramount. There are some eight programs worldwide underway to utilize NGS in newborn screening programs, enrolling from 1,000 to over 100,000 infants. One prominent one is Stephen Kingsmore’s Rady Children’s Institute “BeginNGS” program, subtitled “Newborn genetic screening to end the diagnostic odyssey”. (Here’s a paywalled 360dx article laying out the details of these eight different genomic screening for newborn programs worldwide).

    Anything else?

    Okay, that is my list of ‘unmet needs in NGS’ with some problems solved by single molecule approaches, yet other problems remain. Did I miss anything else?

  • BioIntelli 2.0: A lead generation tool for life science marketers

    Introduction to life science business intelligence

    Life science vendors, whether well-established or just starting to commercialize, need a source of likely customers to whom they can market. They come up with a commercial plan with the "ideal customer profile" laid out, whether an R&D scientist doing CRISPR-Cas9 genome engineering or a process engineer at a large pharmaceutical company looking for better analytical biochemistry capability.
    Of course they are also interested in customers who are already using a competitor's products, so they can sell to them. Or in discovering who the Key Opinion Leaders are in their narrow marketplace vertical, whether AAV-based gene therapy drug development or the latest in single-cell proteomics.

    A world of customer-specific data

    BioIntelli is a Massachusetts-based firm that has been in business for the past 15 years, with a database of continuously-updated data from a variety of public and licensed sources. One area is funding sources, whether from the US Government, non-US governments, or non-profit entities, as well as from Venture Capitalists. Another area is publication databases (PubMed is the most obvious one, but others are included); research areas, collaborators, and products and services mentioned in the Materials and Methods are also indexed. A third area is scientific conferences, where not only the names and affiliations of attendees are stored, but also poster abstracts and platform presentations. (It is important to note that this information, proprietary to each Society that sponsors these events, is properly obtained and licensed.)

    On top of all this is NIH purchasing information, which, cross-referenced to the grants supporting these purchases, adds another dimension to investigate, whether from a product usage perspective or a customer profiling one. In addition, important changes in spending patterns can be a valuable resource for product managers, who can get real-time data on what market segments are growing and what products are selling.

    Boolean search with wildcards

    By using Boolean operators ("AND", "OR" and "WITH"), along with a wildcard symbol ("%"), you can search BioIntelli in a specific research area with specific product names (those of your competitor, for example) to get a list of prospective customers by geography. In addition, you can then set up a macro to simplify future searches.
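    To make that concrete with a purely hypothetical example (the exact field names and syntax in the product may differ): a search along the lines of "CRISPR%" AND "competitor product name" would use the "%" wildcard to catch naming variants such as CRISPR-Cas9 or CRISPRi alongside mentions of the competing product, and saving that search as a macro lets you re-run it as new publications, grants and conference abstracts are indexed.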

    For marketing professionals, you can perform finely detailed competitor trend analysis compared to your own market share, with up-to-date contact data so your contacts are current. And with last year's attendee list for a conference, you can market directly to your target audience to promote your presence at this year's event, or even come up with a virtual campaign to drive traffic to your web presence without needing to exhibit at that tradeshow at all.

    For inside sales professionals, find genuine contact information without having to use personal emails looked up manually over LinkedIn. Obtain extensive information about people, roles and function to do detailed account mapping, to enable a targeted outreach on an account-specific basis.

    BioIntelli is worth checking out

    If you don’t have a similar tool today, or are a current customer of MONOCL or SciLeads, BioIntelli is worth looking into. Send a note to David Hines for a complimentary demonstration.

    If you’d like to read more about what improvements the 2.0 version has over the prior iteration, they recently published a press release here.

  • Observations about Helicos, a single molecule sequencer from 2008

    A brief history of Helicos Biosciences

    Does anyone remember Helicos Biosciences? Way back in 2009 (per Wikipedia), Helicos co-founder and Stanford professor Stephen Quake had his genome sequenced (and published in the prestigious journal Nature Biotechnology) for a reported $50K in Helicos reagents; that year I remember hearing a talk given by Arul Chinnaiyan at the NCI with single-molecule RNA-Seq data. It was an exciting time.

    Crunchbase indicates Helicos raised $77M and went public in 2007; they shipped their first Heliscope in 2008, only to be delisted in 2010 and then declare bankruptcy in 2012. Remember, the Solexa 1G / Illumina Genome Analyzer ("GA") only started selling commercially in late 2006, so in those early days it was something of a dogfight. I was selling Illumina microarrays for GWAS to the NIH from 2005 onward, through the first GAs and GA IIx's, and then from 2010 started selling the Applied Biosystems / Life Technologies SOLiD 4s.

    A few distinguishing features

    For those familiar with library preparation for Illumina sequencing, it takes time and several rounds of PCR and PCR cleanup, along with quantification, to be ready for sequencing. Helicos instead used poly-adenylated nucleic acid to bind to the flowcell, and the chemistry would then sequence the DNA directly, without any further amplification (in emulsions, nanoballs, or clusters, depending on the platform).

    As the world's "first true single molecule sequencer", the DNA sequence had no amplification bias and could read high-GC or low-GC stretches of DNA without any impairment to the accuracy. The base calls were accurate to about 96%; this 4% error was not sequence dependent and was basically random, simplifying analysis. Only a tiny amount of sample input was required: 3 ng of RNA or DNA. The two flowcells gave the instrument the capability to run 50 samples in one run, which took 7 or 8 days to complete. And at a 2008 price per sample of about $325 (for about 14M unique reads; this info is from an old GenomeWeb interview), the price per sample for RNA-Seq was attractive, although it would require a 48-sample experiment (two of the 50 lanes were reserved for controls), or some $15,600 for a single experiment, which naturally would limit the market to high-throughput operations.

    In 2008 the Solexa 1G and Genome Analyzers were all the rage

    The Solexa acquisition occurred in Nov 2006 and several Solexa 1G’s had already been shipped and started producing data in customer laboratories, and at only 25 basepair (bp) reads they still produced about 800 MB of sequencing data per 3+ days’ sequencing run. I was selling microarrays to the NIH for Illumina since 2005, and in early 2007 the first Solexa 1G was installed at the laboratory of Keji Zhao at NHLBI, who ended up being the first person to publish a ChIP-Seq paper using NGS (it was in the journal Cell in May 2007, here’s a PubMed link).

    By the time Helicos made their first commercial sale in February 2008, Illumina had already sold 50 Genome Analyzers by the summer of 2007, and by February 2008 had updated the instrument to do paired ends, with readlengths extended from 25 bases to 35 bases. Illumina also announced progress toward 50 base readlengths.

    Against this backdrop, Helicos designed and built their 'Heliscope' single molecule sequencer to be highly scaled: 50 channels, about 25 GB of sequence data per 7 or 8 day run, read lengths of 25 to 55 basepairs with an average of about 30 to 35, and an error rate of about 4% across bases with G/C content ranging from 20% to 80%. The error model was random, with no systematic bias, which was their big selling point against Illumina, where cluster generation as well as library preparation uses a form of PCR amplification that introduces bias.

    And according to a conversation I had this week, the price of the Heliscope was also scaled: the instrument was $1.2M at the start in 2008, steadily lowered over time to about $900K by 2011 when they ceased operations. Requiring 48 samples for an RNA-Seq experiment, taking an entire week to generate data, and costing over $15,000 was a tall order to fill; sequencing a whole genome for $50,000 was also not something many laboratories or individuals could afford in 2008.

    Important aspects of the Heliscope

    From a 2008 product sheet for the Heliscope (PDF), the on-board data storage capacity (remember, this is 2008) was a whopping 28 Terabytes. This was to store the enormous imaging data for the flowcells, of which there were a pair, and to do all the image registration and base calling. Any way you look at it, a single run producing 25 Gigabases of sequencing data in 2008 was going to pose some challenges.

    And this instrument was big: the spec sheet says the main Heliscope sequencer was 4 feet by 3 feet by 6 feet tall, and a whopping 1,890 lbs. An 800 lb block of Vermont granite was included at the bottom of the instrument to stabilize it against vibration. However, it's clear from a photograph of the instrument that it was fitted with wheels, so you could say it was portable, as much as a 2,000 lb instrument is portable.

    The world's first single molecule sequencing technology (they trademarked the name, calling it True Single Molecule Sequencing (tSMS)™) was not 'real-time' like the latest PacBio Revio™ or Oxford Nanopore PromethION™; it was sequencing-by-synthesis of a single base at a time, followed by imaging the entire flowcell surface. With two flowcells (each with 25 lanes), one would be imaged while the other had its sequencing biochemistry performed. Impressively (or perhaps not that realistically?), they claimed that improvements in flowcell density and tSMS reagent efficiency would eventually produce 1 GB of sequence per hour (about 7x the above numbers in terms of density and thus overall throughput).

    One source told me that in those days the flowcell had uneven densities of poly-T molecules, so there were areas unusable for calling bases. If an area was too sparse, it was not worth the effort of scanning and analyzing; if it was too dense, the signals would collide and no usable sequence could be obtained. The original design, however, scanned the entire surface of all 50 channels; usable data or not, all the images were scanned and analyzed. There wasn't the luxury of time and engineering resources to optimize this.

    What was the cause of the ultimate demise of the Heliscope?

    Not only was the instrument cost an issue, there was also the problem of getting to longer reads. In 2008 Illumina was getting 35 bp reads and was on their way to 50 bp reads, along with a paired-end capability that meant a large increase in throughput. (For those unaware, in 2023 these reads now go out to 300 bp.) Helicos could not catch up; due to limits of detectability and the optical system, the laser illumination used to excite the fluorescent labels on the nucleotides also had the potential to damage the DNA, rendering it unusable. And thus Helicos could talk about extending the average readlength from 35 (plus or minus 10 or 15 bases, as it was a distribution of reads) to 50 or longer, but it just did not happen in the timeframe from 2008 to 2011 when Helicos stopped selling the Heliscope systems. It is my understanding that they did not sell many of these $1M systems, fewer than a dozen or so worldwide.

    Pricing a new instrument from a new company at the $1M price point is a tall order. One life science company that sold single-cell analysis equipment and consumables, Berkeley Lights (now renamed PhenomeX after an acquisition of the single-cell proteomics company Isoplexis), tried for years to reduce the size and cost of their flagship Beacon system, but was unable to, and has a limited market for its analyzer.

    You can say Helicos paved the way for market reception of PacBio in 2012 and then Oxford Nanopore a few years later in 2015. The relatively high cost (some 7x to 10x on a cost-per-base relative to sequence data coming off of Illumina’s flagship NovaSeq X) remains a large barrier.

    Now that Element Biosciences, Singular Genomics, and Ultima Genomics (and let’s not forget PacBio’s Onso) are competing head-to-head with Illumina on short reads, is there room for innovation (and cost reduction) in single molecule long reads? I would certainly hope so.