By Melanie Padgett Powers, writer
In the near future, Kelsey Florek, PhD, MPH, a bioinformatician at the Wisconsin State Laboratory of Hygiene, imagines being able to “talk” to her data. Florek and other public health laboratory professionals could input their data securely into a software program that uses generative artificial intelligence (AI). They would open a chat box and ask their data questions.
Public health laboratorians could give their data queries or tasks such as: Is this COVID-19 variant more transmissible or evasive? Create a table that shows the location and potential relatedness of mpox cases over the past three months. Using the common characteristics of these cases, where were these individuals exposed to Listeria?
AI could also help solve puzzles, such as: Why do these bacteria look like this? Why did my sequencing run fail? The computer would provide answers immediately, saving hours of analysis, and could also answer questions that experts might not be able to figure out easily on their own.
“One of my biggest roles in the laboratory is acting as that interpreter between what the genomic data is showing us and how do we interpret it and make it actionable,” Florek said. “I’m always looking for new ways to interrogate and make that data more useful. If generative AI is a potential avenue for us to use to make that data more accessible … those are areas we want to explore more of.”
However, we’re not quite there yet. AI needs to continue to advance, and there are plenty of concerns that must be worked out first—around accuracy, privacy, ethics, data protection and more.
The Potential of AI
AI and “machine learning” are often used interchangeably, but they are not synonymous. AI is actually an umbrella term with many different areas and types. However, most AI in society today is based on machine learning, which allows a computer to analyze data to do a task without being explicitly programmed, according to the Office of Public Health Data, Surveillance and Technology (OPHDST) at the US Centers for Disease Control and Prevention (CDC). Machine learning can find patterns in data and/or predict an output based on its training.
Tools such as ChatGPT are a specific type of AI called “generative AI” (GenAI), which uses a type of machine learning called a large language model (LLM). These LLMs can recognize and generate text, images, audio and video. LLMs are trained on large data sets. Right now, this means that ChatGPT and similar tools are not so much answering a question as predicting what the response to a prompt (or question) should be. GenAI imitates human language, but its answers are not always accurate.
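The “prediction, not knowledge” point can be seen in miniature with a toy bigram model: given training text, it guesses the most frequent next word, with no notion of whether the guess is true. (This is a simplified sketch; real LLMs use neural networks over tokens, not word counts, but the predictive principle is the same.)

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower -- a guess, not an 'answer'."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the lab ran the test and the lab reported the result"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "lab" -- it follows "the" most often
```

The model will confidently produce a next word whether or not the resulting sentence is true, which is exactly why GenAI output has to be checked.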
Today’s GenAI has often been described as an intern who really wants to please you and give you answers—but their work products must be supervised and fact-checked.
“We’re seeing a lot of the early adopters now use generative AI, but you have to realize it’s not intelligence—we may call it artificial intelligence, but it’s just math and code,” said Jorge Calzada, MBA, head of machine learning and artificial intelligence and director of OPHDST’s Platforms Division.
While Florek isn’t in conversation with her data quite yet, her laboratory is using free open source machine learning models—often developed by academics—to help interpret data results. Florek works primarily in the infectious disease genomics arena. Her team has used machine learning in conjunction with their data analysis. “We’re often using different models to help interpret and understand the genomic data to help explain or understand why we’re seeing something—whether it’s a new resistance mechanism or a new virus variant,” she said.
Florek and her co-workers used machine learning during the COVID-19 pandemic to identify and classify emerging variants. “That was incredibly powerful and a really excellent use of how we can apply some of these machine learning models to help classify this evolution pattern as it’s happening in real time,” she said.
As the models become more sophisticated, Florek expects to move from using the software to support her analysis to being in conversation with the models, which would allow for more in-depth investigation and faster answers. Currently, working with large data sets can be difficult and require computer programming skills.
Kelly Oakeson, PhD, a bioinformatician at the Utah Public Health Laboratory, is excited about the prospect of using AI to determine things like how resistant an organism is to a specific antibiotic, which could lead to better and faster disease treatment. In that example, a laboratory would build up a training set of bacteria that are resistant to certain antibiotics. The team would conduct whole genome sequencing of those organisms to determine which genes are likely responsible for resistance to a specific antibiotic. Do that for 1,000 or so samples and you have a data set to use to train your internal machine learning algorithm, Oakeson explained.
Once you have the data set, “you could start predicting the level of what might be resistant to what antibiotic, just based on their genetic signature,” he said. “Without having to know anything about their physical attributes, you could predict these bacteria will all be resistant to this class of antibiotic because they’ve got this genetic background.”
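The workflow Oakeson describes can be sketched in a few lines: train on isolates whose resistance is already known, then predict a new isolate’s resistance from its genetic signature alone. This toy version uses gene presence/absence profiles and a nearest-neighbor vote; the gene panel and data are invented for illustration, and a real pipeline would use far richer features and a validated model.

```python
# Toy sketch: predict antibiotic resistance from gene presence/absence.
# Gene panel and training data are invented for illustration.

def hamming(a, b):
    """Distance between two gene presence/absence profiles."""
    return sum(x != y for x, y in zip(a, b))

def predict_resistance(profile, training_set, k=3):
    """Majority vote among the k genomically closest known isolates."""
    ranked = sorted(training_set, key=lambda item: hamming(profile, item[0]))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Each profile: presence (1) / absence (0) of [blaKPC, mecA, vanA, tetM]
training = [
    ((1, 0, 0, 0), "resistant"),
    ((1, 0, 0, 1), "resistant"),
    ((1, 1, 0, 0), "resistant"),
    ((0, 0, 0, 0), "susceptible"),
    ((0, 0, 0, 1), "susceptible"),
    ((0, 1, 0, 0), "susceptible"),
]

new_isolate = (1, 0, 0, 1)
print(predict_resistance(new_isolate, training))  # "resistant"
```

With roughly 1,000 sequenced isolates in the training set, as Oakeson suggests, the same idea scales from a toy vote to a properly trained and validated classifier.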
Managing and Improving Data
AI could also remove a tremendous burden that falls on public health laboratories: data management. Public health laboratory professionals are spending an inordinate amount of time managing data such as electronic reporting—taking time away from their area of expertise, said Michelle Meigs, MBA, APHL director of informatics.
“The maintenance right now to keep up, monitor and support all of those connections, to make sure the data are valid, and to make sure that it’s done properly, is time-consuming,” Meigs said.
Calzada calls it “data janitorial services,” which includes cleaning data and transforming it from one format to another. “There’s a real opportunity for public health in leveraging classical machine learning and generative AI. … How can we lessen the burden on our data generators?”
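Much of that “janitorial” work is normalization: different submitters send the same test under different names, dates in different formats, and results in inconsistent casing. A minimal sketch of the idea (the field names, aliases and formats here are illustrative, not a real standard):

```python
# Illustrative data-cleaning sketch: normalize inconsistent submitter
# records into one shape. Aliases and formats are invented examples.
from datetime import datetime

TEST_ALIASES = {"cov19 pcr": "SARS-CoV-2 PCR", "covid pcr": "SARS-CoV-2 PCR"}
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")

def clean_record(raw):
    test = TEST_ALIASES.get(raw["test"].strip().lower(), raw["test"].strip())
    for fmt in DATE_FORMATS:
        try:
            collected = datetime.strptime(raw["collected"], fmt).date()
            break
        except ValueError:
            continue
    else:
        collected = None  # flag for human review instead of guessing
    return {"test": test, "collected": collected,
            "result": raw["result"].strip().upper()}

print(clean_record({"test": " Covid PCR", "collected": "03/14/2024",
                    "result": "positive "}))
```

Automating exactly this kind of transformation, whether with classical code or with AI assistance, is what would hand the time back to the data generators.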
Leveraging AI to take over those tasks would free up both employees and funds “and then that money could go back into other initiatives,” Meigs said.
Of course, it’s not up to public health laboratories alone; it takes funding, support and policy changes. The 2022 Reimagining Public Health survey by Ernst & Young questioned 301 public health officials. While 82% of senior leaders said they need to modernize their organization, only 39% of service delivery team members agreed. At the time, only 34% of respondents were pursuing major information technology upgrades, with only 10% integrating this activity with system-wide transformations.
The survey also found that while slightly more than half of respondents (52%) are optimistic about what data can achieve for their communities, the rate of digital transformation remains uneven, and organizations face significant barriers to change.
Data modernization has been talked about and worked on for years in public health, but Calzada said he believes these recent and ongoing advancements in technology can help public health catch up to industry.
After the COVID-19 pandemic, CDC created three new offices, including OPHDST, which creates and manages tools to help the agency’s other divisions. Or, as Calzada explains, the Platforms Division is helping CDC become “AI ready.”
The first step is to fix data quality. “We’re starting to implement frameworks for how to measure that, doing that inside of a platform so it happens automatically and then sharing that information, so we can all have a common agreement on what the quality of our data is,” Calzada said.
Organizations and laboratories can also assess the quality of their data. The Artificial Intelligence Maturity Model developed by MITRE is a free tool that provides a systematic framework that organizations follow as they manage, adopt, resource and implement AI. MITRE is a not-for-profit IT research and development company that works through public-private partnerships and federally funded research and development centers to solve challenges to the safety, stability and well-being of the US.
“The AI Maturity Model walks individuals and organizations through how to assess their state of readiness, where they can focus their resources, and how they can … assess where they are in this continuum of AI adoption,” said epidemiologist E. Oscar Alleyne, DrPH, MPH, managing director of the Public Health Division at MITRE. The model also assists organizations in evaluating “how to build in maturity to more systemic, or perhaps more focused, areas of implementation,” he said.
Another free tool from MITRE is ATT&CK, a cybersecurity knowledge base of adversary tactics and techniques that organizations can use to model the threats their data and systems might face.
As Calzada sees it, the next step is to improve the data management lifecycle. A modern data platform should manage its data automatically, he said. That includes ingesting, storing, transforming, analyzing and sharing the data.
Calzada is a data scientist with a background as a technology executive. Since joining the public health arena less than two years ago, he has observed numerous department and data silos across public health. He believes “virtual knowledge graphs” could help public health data integration, management and access. Such graphs are more flexible and can adjust, link and find relationships among various data sets.
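The knowledge-graph idea can be sketched simply: store facts as (subject, relation, object) triples, so new links between previously siloed data sets can be added and queried without reshaping any tables. The entities and relations below are invented examples, and production systems would use a graph database or RDF store rather than a Python set.

```python
# Minimal knowledge-graph sketch: facts as (subject, relation, object)
# triples. Entities and relations are invented for illustration.

triples = {
    ("isolate_42", "collected_in", "Dane County"),
    ("isolate_42", "has_variant", "XBB.1.5"),
    ("isolate_77", "has_variant", "XBB.1.5"),
    ("Dane County", "part_of", "Wisconsin"),
}

def query(subject=None, relation=None, obj=None):
    """Return triples matching any combination of fields (None = wildcard)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

# Which isolates share the same variant as isolate_42?
variant = query(subject="isolate_42", relation="has_variant")[0][2]
print(sorted(t[0] for t in query(relation="has_variant", obj=variant)))
```

Because every fact has the same three-part shape, linking a new data set is just adding more triples, which is the flexibility Calzada points to.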
APHL has been partnering with CDC on a project called DETOR: The New Direction in Electronic Test Orders and Results. The project recognizes the problem of public health laboratories and health care organizations using incompatible data systems. This makes it hard to exchange critical test orders and results quickly, accurately and securely.
DETOR enables real-time electronic test orders and results to be shared between public health laboratories and providers. It is built on the AIMS Platform and transmits and translates data between disparate systems. The platform was created by APHL and fully funded by CDC. It’s available free to public health laboratories and their partners through APHL.
After data quality and management improves, Calzada said the next step is to focus on machine learning operations. He said CDC has been successful at creating sophisticated computer models, but the agency needs to get better at deploying and operationalizing these models across public health.
“Can you build a factory floor that lets you produce machine learning at scale and in a reliable way, or are you going to leave it as an artisanal craft where, if you want to run a model, you call the person who built it to do it,” he said. “I think we can have much more impact on public health with a factory floor approach.”
Calzada encourages public health laboratories and individuals to reach out to him to discuss high-value models that CDC could build to address current public health challenges.
Generative AI has the power right now to improve efficiencies in the workplace, including in public health laboratories, said Jade Wang, a senior bioinformatics scientist at the New York City Public Health Laboratory. GenAI tools include ChatGPT, Google Gemini (now built into the Google search engine) and Microsoft Copilot (an add-on to Microsoft 365, formerly Microsoft Office).
Wang’s laboratory has not used GenAI, but she has personally used it for brainstorming and other tasks that don’t use data, don’t need to be fact-checked and don’t involve any privacy concerns. For example, she might have trouble thinking of a certain phrase or word. Instead of turning to a dictionary or thesaurus, she uses GenAI, asking something like, “I’m looking for the word that means XYZ.”
Wang outlined other use cases that could speed up and streamline tasks: You might prompt or instruct a GenAI tool to draft an email you’ve had difficulty writing, though you will want to edit it. Or you may need to write a new standard operating procedure (SOP). You could input the template and the SOP’s basic steps and then GenAI would turn that into complete sentences and finish writing the SOP.
“That would save so much time. You wouldn’t have to sit there and manually describe every single step of what you’re trying to do,” Wang said.
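The SOP use case Wang describes amounts to building a structured prompt from a template plus terse step notes. A hedged sketch of that assembly step (the template wording and steps are invented; the drafting itself, and the fact-checking, still happen in whatever GenAI tool the prompt is pasted into):

```python
# Illustrative sketch: turn an SOP title, purpose and terse step notes
# into a GenAI prompt. Template wording and steps are invented examples.

def build_sop_prompt(title, purpose, steps):
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, 1))
    return (
        f"Draft a standard operating procedure titled '{title}'.\n"
        f"Purpose: {purpose}\n"
        "Expand each step below into complete, imperative sentences,\n"
        "keeping the step order and numbering unchanged:\n"
        f"{numbered}"
    )

prompt = build_sop_prompt(
    "Specimen Accessioning",
    "Ensure incoming specimens are logged consistently.",
    ["check label vs requisition", "assign accession ID", "log into LIMS"],
)
print(prompt)
```

Keeping the numbering fixed in the prompt makes the generated draft easy to diff against the original steps when it comes back for review.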
Challenges and Concerns
Current public GenAI tools do not follow data privacy or patient privacy guidelines. The models also store and learn from the text you type in, whether it is patient identification details, confidential documents, or drafts of reports that haven’t been fact-checked or published yet.
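Because public tools may store whatever is typed in, one common precaution is to strip obvious identifier patterns before text ever leaves the laboratory. The sketch below masks a few formats with regular expressions; the patterns are illustrative only, and real de-identification (for example, to meet HIPAA requirements) demands far more than pattern matching, including human review.

```python
import re

# Illustrative-only redaction sketch: mask a few obvious identifier
# patterns before text is pasted into a public tool. Real
# de-identification requires much more than this.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DATE]"),
    (re.compile(r"\bMRN[:\s]*\d+\b"), "[MRN]"),
]

def redact(text):
    for pattern, label in PATTERNS:
        text = pattern.sub(label, text)
    return text

print(redact("Patient MRN: 889214, DOB 02/11/1978, SSN 123-45-6789"))
# -> "Patient [MRN], DOB [DATE], SSN [SSN]"
```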
Other concerns include inaccuracies and bias. GenAI is pulling information from the data sets it has been given. That data might include inaccuracies, biases, racist and bigoted language, and outdated guidelines or materials. These biases and falsehoods can be perpetuated as the tool continues to use them. In addition, marginalized voices and outlier cases are underrepresented in the data sets LLMs are trained on.
Back to that “intern” who wants to please you: the GenAI tool may not have an exact answer to your prompt in its data set, but it will still respond to you. These are called “hallucinations”—plausible answers that are made up. There have been instances of GenAI creating fake journal citations and nonexistent court cases.
CDC’s Office of the Chief Information Officer built an internal instance using the Azure OpenAI Service, which allowed for the creation of a chatbot at CDC. Staff can use the tool to test out use cases in a safe and secure environment and increase their familiarity with AI. They can enter questions and other inputs that will not be shared outside CDC and will not be used to train public versions of ChatGPT.
“That was really about protecting employees from inadvertently disseminating information that they shouldn’t to the public,” Calzada said. “The approach that we took was ‘let’s be cautious, but let’s not stifle innovation’.”
AI conversations often include another worry: job loss. Workers fear computers pushing them out of their roles. But that’s something that public health laboratory professionals don’t have to anguish over, Florek said. In fact, the expectation is that AI will help with or take over low-level tasks, like data entry and data management, freeing up laboratory staff to work on more complex tasks and projects.
Florek envisions transitioning people from data entry roles to validation roles—ensuring the quality of the data coming out of the AI models—or data interpretation or process management roles. “It’s less about replacing people, and it’s more about shifting responsibilities,” she said.
Calzada has seen this happen in other industries as technology improves. “Machines don’t replace humans. Humans who use AI are the ones that end up replacing people who don’t,” he said. “We see it time and time again. It’s experts that embrace AI tools and make themselves far more productive than either one alone who are the ones that are the real winners.”
Alleyne has seen public health, and public health laboratories in particular, often excluded from technological advancements. But he sees an opportunity now for public health laboratories to begin exploring machine learning models, brainstorming best use cases and educating themselves about the potential of AI.
“AI has lots of great potential, but it’s not something that I think you’ll see being utilized tomorrow,” Oakeson said. “There are still a lot of questions about its security, safety, privacy and ethics that have to be addressed and answered before it really makes its way into the government level, public health, the state level. But I think once it does, it’s going to be a big boon for public health laboratories.”