by Nancy Maddox, MPH, writer
In 1965, Gordon Moore, co-founder of Intel Corporation, foresaw a stunning pace of change in computer power: so-called Moore’s Law posits a doubling in the number of transistors in computer chips every two years.
Yet, Gregory Armstrong, MD, director of CDC’s Office of Advanced Molecular Detection, said the rate of technological improvement in the field of genomics has surpassed even this lofty, biennial benchmark.
The revolution in next generation sequencing (NGS)—also known as high-throughput, deep or massively parallel sequencing—took off in about 2006-2007, after the introduction of pyrosequencing and other computer-based technologies that far surpass the first-generation Sanger sequencing method in terms of speed, throughput and affordability.
Today, a next-gen platform such as Illumina’s popular benchtop MiSeq can handle 8 to 48 samples per run (depending on the pathogen) and generate over a billion bases of output in about 55 hours. The instrument costs on the order of $100,000.
But despite the allure of the new technology, public health laboratories (PHLs) have been slow to join the next-gen revolution. In 2010, for example, virtually no state PHL had NGS capability.
“Both here at CDC and a lot of public health agencies, there was a feeling, especially in laboratories, that public health was missing the boat,” said Armstrong. “This revolution in technology was providing some real opportunities, and we were not taking advantage of those.”
In 2011, CDC convened a Bioinformatics Blue Ribbon Panel to assess the agency’s bioinformatics capabilities, and the panel released a damning report. That report, said Armstrong, “went up as far as Congress and the executive branch in Washington, DC.” The upshot of the effort, which had broadened to include the bioinformatics-dependent NGS technologies, was the creation of CDC’s Advanced Molecular Detection (AMD) program in October 2013.
Armstrong said, “Congress was very clear to us: they don’t want us to start covering the ongoing cost of operations. Our funding is meant to be catalytic and to promote innovation in the US public health system.”
In practice, that has meant working through other CDC programs to support the adoption of appropriate next-gen technologies, such as whole genome sequencing (WGS), and to support workforce training. Partly because of AMD efforts, at the end of 2015, 37 state PHLs had NGS instrumentation in-house, and another nine reported plans to acquire the technology by the end of 2016.
The next round of funding for CDC’s Epidemiology and Laboratory Capacity Cooperative Agreements will include about $2.5 million from the AMD program to develop a cross-cutting NGS infrastructure. Armstrong said his program intends to fund about three dozen laboratories and “maybe more than that.”
Already, NGS testing platforms are in use across the country to address a wide range of public health priorities, from TB and Zika virus to newborn screening and food safety testing.
Robert Myers, PhD, director of the Maryland State Department of Health and Mental Hygiene Laboratories Administration, said his lab has been using NGS for specific applications since 2012. One of the biggest advantages of the technology, he said, is its discriminatory power, enabling epidemiologists to identify novel microbes (such as a Bergeyella zoohelcum isolate from a Maryland pig bite victim) and to identify genetically related sub-clusters of pathogens. In one investigation, for example, WGS revealed that an East Coast Salmonella Newport outbreak two summers ago actually comprised multiple sub-clusters, with different tainted food items implicated in New York cases versus those in Maryland, Virginia, Delaware, Pennsylvania and Ohio. The technology has been used to similar effect in other outbreaks.
************************
New York City is known for its hot, dog days of summer, when air conditioners buzz across the city. When Kimberlee Musser, PhD, answered a call from the New York City (NYC) Public Health Laboratory in July 2015, it turned out to be directly related to these crucial cooling systems.
Musser, who oversees bacterial disease testing at the Wadsworth Center, New York’s state PHL, had worked with the NYC laboratory just months earlier to screen building cooling tower water for Legionella pneumophila, the causative agent of the sometimes fatal pneumonia known as Legionnaires’ disease, which spreads through aerosolized water. City health officials suspected a Legionnaires’ disease outbreak was unfolding in the South Bronx. And they wanted to stop it.
Musser and her staff agreed to support the public health investigation by once again screening cooling tower water and potable water via PCR (327 water samples in all) and conducting WGS on clinical and environmental samples (80 samples over a two-week period). At the same time, NYC and Wadsworth scientists performed pulsed field gel electrophoresis on all Legionella isolates and CDC scientists performed a second typing method, sequence-based typing, on a subset of isolates.
“When we looked at all of that data,” said Musser, “we could determine that there was one hotel, the Opera House Hotel, that had several environmental isolates that matched well to all of the epi-linked cases in the South Bronx. It looked like a solid investigation.”
Then a second, smaller Legionella outbreak was detected in the East Bronx. Subsequent testing led authorities to a cooling tower on a college campus about seven miles from the Opera House Hotel. Although there was some heterogeneity, the isolates from the East Bronx and South Bronx were quite similar.
“Then we stepped back and realized something,” said Musser. “Sometimes you have to think about which reference genome you’re using to align your sequence data to.” The Wadsworth scientists had been using a genome from a 1976 Legionnaires’ disease outbreak in Philadelphia, and had been able to match 88% of the genome with confidence. Then they switched to one of the environmental strains from the Opera House Hotel “to understand if we were missing anything by not using the whole genome.” (In this case, CDC did the testing using a PacBio WGS platform, which is able to perform long reads and generate a completed, closed genome.)
Results were stark: The 41 clinical isolates from the South Bronx outbreak and five environmental isolates from the Opera House Hotel were identical. Moreover, the East Bronx clinical and environmental isolates differed from the South Bronx isolates by just eight base pairs—out of a 3.4 million-base-pair genome. Several historic Legionella isolates recovered from NYC and Wadsworth archives were also one to eight base pairs different from the South Bronx strain.
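To put those figures in perspective, the comparison can be sketched as a count of mismatched positions between two aligned sequences. The snippet below is only an illustration, not the pipeline used by Wadsworth or CDC. The function names and the toy sequences are hypothetical, and it assumes the sequences have already been aligned to the same length.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Positions with an ambiguous base (N) or a gap (-) are not counted.
    ignored = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


def percent_identity(differences: int, genome_length: int) -> float:
    """Share of genome positions that are identical, as a percentage."""
    return 100.0 * (1 - differences / genome_length)


# Toy aligned fragments, made up for illustration only.
south_bronx = "ATGCGTACGTTAGC"
east_bronx = "ATGCGTACGATAGC"
print(snp_distance(south_bronx, east_bronx))

# Figures reported above: eight differences in a 3.4 million-base-pair genome.
print(f"{percent_identity(8, 3_400_000):.5f}%")
```

By this arithmetic, eight differences across 3.4 million base pairs correspond to roughly 99.9998% identity, which is why a method that reads the whole genome was needed to tell the two outbreaks apart.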
Said Musser, “WGS provided extra discrimination in a situation where it was really needed.” She said, “We think this strain has been in NYC at least since 2007 and slowly evolving in different niches. We’re calling it a persistent endemic strain.”
Needless to say, the state governor and NYC health commissioner promulgated new regulations requiring registration, regular testing and, where necessary, disinfection of all water cooling towers to help prevent future problems.
*******************************
CDC’s AMD program currently funds about two dozen program areas. Armstrong said those likely to experience “the earliest impact” are food safety, TB, influenza and antimicrobial resistance.
The technology is most useful in situations where investigators need a lot of information from one isolate, such as virulence markers, drug resistance markers, drug resistance mechanisms (such as plasmid vs. non-plasmid), serotype, subtype, etc. At New York’s Wadsworth Center, for example, WGS has replaced about a dozen tests previously used for TB surveillance, and it delivers findings faster.
Another bonus is the convergent nature of NGS technology, meaning a laboratory can use the same staff, same equipment and highly similar processes to analyze a range of microbes, from viruses to bacteria. In addition, different laboratories can generate comparable digital data using different sample prep methods and different software.
Just this year, the Minnesota Public Health Laboratory Division began using NGS for Streptococcus pneumoniae. Dave Boxrud, MS, who oversees the laboratory’s molecular epidemiology program, said “We might have a little bit different [genetic] extraction method than CDC and that will be fine. In the end, we have sequences and we have ways of assessing the quality of those sequences. Anything we produce in Minnesota can be compared with data from other laboratories.”
Data can also be compared with sequences from various US or international genetic databases.
******************************
Oysters are among the valuable fisheries in the Chesapeake Bay, netting Maryland harvesters $15.7 million in 2013-14. Because they are often eaten raw, they can also be a source of illness, mostly associated with Vibrio parahaemolyticus, a bacterium that thrives in brackish, temperate waters. In 2010, two diners fell ill after eating raw oysters in two different Baltimore restaurants. V. parahaemolyticus was found in their stools, and the strains were sufficiently similar via PFGE analysis to consider the cases a cluster.
In a rare turn of events, laboratorians were also able to analyze oyster samples collected from the same Chesapeake Bay location as those consumed by the diners. Surprisingly, in addition to the outbreak strain, they found numerous other V. parahaemolyticus strains, raising questions about the origins of the organisms and their relatedness.
Last year, Julie Haendiges, head of the Core Sequencing Laboratory in the Maryland State Department of Health and Mental Hygiene (MD DHMH) Laboratories Administration, led a retrospective analysis of the archived isolates using a whole-genome multilocus sequence typing (wgMLST) analysis to answer those questions.
Scientists screened 479 V. parahaemolyticus strains from oysters harvested from the implicated areas. Of these, they identified and sequenced 11 potentially pathogenic strains. The researchers found the 2010 clinical outbreak strain and implicated oyster strain to be >99.999% identical at the nucleotide level, confirming the link to Chesapeake Bay oysters. However, comparison with closely related genomic sequences in GenBank, a genetic sequence database at the National Institutes of Health, showed that this strain belongs to a clonal complex endemic to Asia, where it is considered a pandemic organism. Haendiges said, “We don’t usually see Asian strains. How it got into the Chesapeake Bay has led to a lot of speculation, but we have not seen it since 2010.”
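In general terms, a wgMLST comparison assigns each isolate an allele number at each of many loci across the genome and then counts the loci at which two isolates differ. The sketch below illustrates that idea only. It is not the analysis the Maryland laboratory ran, and the locus names, allele numbers and function name are all hypothetical.

```python
from typing import Dict, Optional

# An allele profile maps a locus name to an allele number.
# None marks a locus that could not be called for that isolate.
AlleleProfile = Dict[str, Optional[int]]


def allele_distance(profile_a: AlleleProfile, profile_b: AlleleProfile) -> int:
    """Count loci, called in both isolates, where the allele numbers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Made-up profiles for illustration only.
clinical = {"locus_0001": 4, "locus_0002": 12, "locus_0003": 7, "locus_0004": None}
oyster = {"locus_0001": 4, "locus_0002": 12, "locus_0003": 7, "locus_0004": 3}
unrelated = {"locus_0001": 9, "locus_0002": 12, "locus_0003": 2, "locus_0004": 3}

print(allele_distance(clinical, oyster))     # 0: no differences at called loci
print(allele_distance(clinical, unrelated))  # 2: locus_0001 and locus_0003 differ
```

Skipping uncalled loci is one possible design choice; it keeps missing data from inflating the distance between otherwise matching isolates.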
******************************
Despite its many pluses, NGS is not without challenges. Perhaps the biggest of these pertain to staffing and data analysis, two sides of the same coin.
Myers said, “It takes highly skilled people to produce quality WGS results. Now you really do need people who are good with data and good with numbers. It’s a personnel issue.”
While CDC has developed its own analysis pipelines that PHLs can use, the agency is also working with APHL to develop a cloud-based pipeline that can be accessed through the APHL Informatics Messaging Service Platform. The hope is, said Boxrud, that PHLs “can have control over the analysis. CDC won’t have to be the gatekeeper, but it can get the information it needs.”
Other challenges have to do with targeting the still-pricey technology to the most cost-effective uses. Myers said that while NGS provides an abundance of detailed data about sequenced microbes, it can be “more powerful than you need for some applications.”
CDC’s Office of Antimicrobial Resistance, headed by Jean Patel, PhD, D(ABMM), will be funding seven PHLs (to be announced this August) to develop reference-level capacity to detect and characterize drug-resistant microbes, such as carbapenem-resistant Enterobacteriaceae, colistin-resistant mcr-1-positive E. coli and Candida auris, an emerging species of yeast impervious to common antifungal drugs. Patel said NGS “is very good for detecting resistance we already know about, but not so good for detecting resistance we don’t know about.” And even when drug-resistance genes are found, phenotypic testing may still be necessary to confirm those genes are active and to detect resistance encoded by novel mechanisms.
Still, PHLs are finding critical niches for NGS, even outside the microbiology laboratory.
*******************************
This past April, Wisconsin’s newborn screening (NBS) program had to make a decision. The State Laboratory of Hygiene was smack in the middle of a two-year pilot program to assess the use of a 240-mutation panel as a second-tier NBS assay for cystic fibrosis (CF), an inherited condition linked to defects in the 250,000-base-pair-long CFTR gene. (Over 10 million Americans have a defect in one copy of the gene, making them CF carriers. About 35,000 have defects in both copies and have the disorder.)
Mei Wang Baker, MD, FACMG, who co-directs the NBS laboratory, said the pilot project had accumulated a “good year” of data and experience, when the program’s routine second-tier CF assay—a conventional molecular test with a much smaller mutation panel—was pulled from the market. “So we had to make an earlier decision,” she said.
“Our CF clinicians and NBS committee members really like this comprehensive [pilot] panel,” said Baker, citing its benefits: “The additional mutations on the panel increase the likelihood of identifying the disease-causing mutation in true CF cases and puts us in a better position to identify disease carriers. In addition, the assay obtains all CFTR gene codon sequence information, and can be updated to include newly discovered mutations through software modifications without changing the assay.”
Even though the NGS option is more costly, Baker said it is also more cost-effective in terms of the per-mutation price. Thus, the state chose to make the pilot assay its routine second-tier CF screen.
Yet, despite a good outcome, Baker remains cautious about incorporating additional NGS assays in the Wisconsin NBS program, and especially about using such assays as first-tier screens.
Traditional NBS assays look for downstream biochemical markers—such as elevated phenylalanine in the case of phenylketonuria—that are indicative of enzyme deficiencies or other congenital disorders. NGS, on the other hand, looks upstream for the genetic source of NBS disorders. The problem is that not all disease-causing mutations are known. In addition, the clinical significance of a particular mutation is not always well understood.
Baker said NGS is “complementary with traditional metabolite detection methods.” She said, “You have to consider the utility. I want us to do this carefully.”
*********************************
Even though NGS might still be considered an emerging PHL technology, CDC’s Armstrong said early investments in the technology are already resulting in public health advances. As one example, he said, a CDC-developed infrastructure for sequencing dengue and chikungunya viruses was readily adapted to Zika virus after that microbe prompted large-scale outbreaks in Latin America.
The next big boost for NGS will likely come from metagenomics—a field focused on the recovery of genetic material directly from environmental or clinical samples, such as a gut swab or water sample. Metagenomic shortcuts would eliminate the need to do extra sequencing to tease out bugs of interest from microbial communities containing many species.
In the meantime, Boxrud said he expects the new AMD funding to be a “game-changer.”
He said, “I think that one year from now, things are going to look really different. I think NGS is going to be the norm within the next year; not for all laboratories, but a significant number.”