Where Industry Meets Experimentation

CMU Libraries and NVIDIA Hackathon Prototypes the Future of Bioinformatics

NVIDIA hackathon

by Sarah Bender

For many biomedical researchers, the biggest challenges don’t begin with a lack of ideas — they begin with a lack of access. Health data is powerful but deeply sensitive, and collaborating across institutions, countries, or health systems often raises legal, ethical, and technical concerns. Yet solving today’s most urgent biomedical questions increasingly depends on working across those boundaries.

To address this challenge, the University Libraries and NVIDIA brought students, researchers, and industry experts together for a three-day bioinformatics hackathon the first week of January. The goal was to explore how researchers can work together while keeping sensitive health data secure and decentralized — meaning the data stays where it is, rather than being copied into one central place.

To do this, participants used NVIDIA FLARE, an open-source platform for federated learning. Federated learning allows computer models to be trained across many local datasets without moving the data itself. During the hackathon, teams explored how federated learning can be applied, and built prototypes that demonstrate its technical feasibility and utility to biobanks around the world that host sensitive biomedical data.
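To make the idea concrete, here is a minimal sketch of federated averaging, one common way to train a shared model without pooling data. It is an illustration only: it does not use NVIDIA FLARE's API, and the function names, the toy linear model, and the synthetic "site" data are all assumptions made for this example. Each site fits the model on its own records and sends back only the updated weights, which a coordinator then averages.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one site's private data.

    Fits y ~ w*x + b. Only the updated weights and the sample count
    leave this function; the raw (x, y) records are never returned.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n

def federated_average(updates):
    """Combine site models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return (w, b)

def make_site_data(rng, n, lo, hi):
    """Hypothetical stand-in for one site's records: y = 3x + 1 plus noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]

def train(sites, rounds=200):
    """Coordinator loop: broadcast weights, collect updates, average them."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates)
    return global_weights

rng = random.Random(0)
# Three sites with different sample sizes and different ranges of x.
sites = [
    make_site_data(rng, 40, 0.0, 1.0),
    make_site_data(rng, 60, 0.5, 1.5),
    make_site_data(rng, 20, -1.0, 0.0),
]
w, b = train(sites)
print(f"learned model: y = {w:.2f}x + {b:.2f}")
```

In this sketch the coordinator sees only model weights and sample counts. Production platforms add pieces the example leaves out, such as secure communication between sites and protections against leaking information through the weights themselves.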

NVIDIA FLARE provided industry-ready, open-source infrastructure specifically designed for federated learning. Cloud provider Amazon Web Services (AWS) also sent a team to the hackathon to provide access to, and guidance on, its services and Open Data on AWS, a program whose mission is to share data in the cloud so that users can spend more time on analysis than on acquisition. With advanced computing resources, open data, and shared workflows, participants were able to test ideas at a scale rarely available in academic settings, moving the work from theory toward real-world application.

Building at Scale

More than 100 participants gathered in person in Hunt Library and virtually from around the world to kick off the hackathon on Jan. 7, making it the Libraries’ largest bioinformatics hackathon to date. Hailing from seven countries, participants represented 27 different universities, five research centers, four companies, and two hospital systems.

“We were joined by undergraduates and graduate students, people from national labs, and a number of industry experts from various biotech companies,” explained STEM Librarian and Open Science Program Director Melanie Gainey, who organized the event with STEM Librarian Huajin Wang and Open Source Programs Office Community Manager Tom Hughes. “The kind of collaboration that results from a group like this has an extraordinary impact — both on students benefiting from valuable mentorship and industry professionals taking advantage of such a large pool of talent and ideas.”

The hackathon wasn’t just the best attended to date. According to NVIDIA Global Alliances Manager Ben Busby, this year’s event also focused on a more complex set of problems than previous iterations.

“This is one of the more complicated hackathons we’ve ever run, both from a technical perspective and also with the legal and policy aspects of data sharing,” said Busby, who has collaborated on similar bioinformatics hackathons at CMU for the last five years. “As science gets bigger and more complicated and hackathons are becoming an increasingly legitimate way to drive the art of the possible, we’re scaling up and out.”

To tackle a number of real-world challenges related to the theme, attendees were divided into 10 teams, each with a different problem to solve. They explored topics like mitigating data gaps stemming from unequal access to healthcare and how machine learning and statistical methods can be used to identify potential cancer markers. On the surface, the teams were competitive, vying for a chance to win prizes like “Best Documentation” and “Most Innovative.”

But the lighthearted competition didn’t overpower the collaborative nature of the hackathon, and many remarked that the opportunity to connect with others was one of the most valuable aspects of the experience. “The event created a truly collaborative and intellectually stimulating environment, and I deeply appreciated the opportunity to learn from experts at the intersection of biomedical science, data infrastructure, and AI,” said University of Arkansas for Medical Sciences graduate student Md. Enamul Hoq. “The mentorship, technical depth, and infrastructure support provided by CMU Libraries made it one of the most productive and rewarding hackathons I have ever participated in.”

Learning By Doing

For 2023 Mellon College of Science graduate Mahtabin Rodela Rozbu, the hackathon was a chance to explore federated learning for the first time. “My main objective was to come away with an understanding of what federated learning is, and I knew that this hackathon would be the perfect learning opportunity,” she said.

Rozbu comes from a background in biology and switched to computational biology later in her academic career. The hackathon’s structure, with people from diverse backgrounds working together and different teams addressing various subtopics, provided an ideal setting to acquire a wide breadth of knowledge.

“I’ve found that participants arrive from different paths and bring different experiences,” she explained. “The ways different people perceive and process information is vast, and that leads to a lot of good debates and arguments that help you see other perspectives, get a better understanding of the bold details and bigger picture, and ask even more fundamental questions that eventually fill in gaps in your knowledge.”

Attendees also had a chance to develop leadership skills. “Over the course of the hackathon, multiple participants established themselves as leaders without anybody asking them to,” recalled University of Colorado Anschutz Biomedical Informatics Professor Sean Davis, who served as one of the judges for the event. “One of them told me, ‘I don’t have any experience as a manager’ when I complimented their work — but now they do! They took people who didn't know each other and whose skills weren’t initially aligned with the specific problem, and they built a functioning team, driving them to be the very best that they can be.”

School of Computer Science sophomore Tyler Yang took this spirit of teamwork a step further by organizing evening outings to various Pittsburgh locations, including the Cathedral of Learning and the Duquesne Incline. “As a second year undergraduate I applied to this hackathon on a whim, but I'm glad I did,” he said. “I will say the actual hacking itself was valuable, but it was truly the people that made it great — both as helpful ‘consultants’ during the hackathon and as great friends in the evenings!”

From Prototype to Publication

The impact of the hackathon did not end when the three days were over. All pipelines are licensed for reuse and can be found on GitHub. Some teams plan to continue working on their projects, and participants are now collaborating on a shared, open-access research paper that will document their approaches and findings, laying the groundwork for future progress.

To support that process, organizers from the Libraries developed a structured manuscript template and publication workflow designed to streamline collaborative authorship. The goal is not only to publish efficiently, but to do so openly, ensuring that the work is accessible, reproducible, and useful to others in the field.

“One of the things we’ve learned over the years is that the paper doesn’t write itself,” Hughes said. “The Libraries really steps in there — we know how to organize collaborative writing, how to support open publishing, and how to get the work out into the world. Translating what happens during the hackathon into research that others can build on is a huge part of why these hackathons have lasting impact.”

“The hackathon demonstrated what can happen when industry expertise, academic curiosity, and library-led research support come together around a serious problem,” Wang added. “By guiding the work toward open, shared outcomes, the Libraries helps ensure that what’s built in three days can have lasting value for the broader research community.”
