By Ronna Hertzano, M.D., Ph.D., and Anup Mahurkar
High-throughput, cell type-specific multi-omic analyses, which process large amounts of data about individual cells and measure them through multiple “omics” (such as genomics, transcriptomics, proteomics, epigenomics, and metabolomics), have advanced our understanding of biological sciences and medicine in an unprecedented way.
The full benefit of these data, however, is achieved through their reuse. Successful reuse requires identifying users and ensuring both data democratization, meaning that data is accessible to average users, and federation, meaning that users’ databases can be connected through a virtual centralized meta-database so their access to others’ data is meaningful. In our paper published in Human Genetics in March 2022, we discuss universal challenges in access and reuse of multi-omic data and possible solutions, including the gEAR (the gene Expression Analysis Resource, umgear.org), a tool for multi-omic data visualization, sharing, and access for the auditory research field.
Omics data generation and analysis have undergone rapid expansion since the publication of the human and mouse genomes two decades ago. Since then, technological advances have improved the speed, throughput (data processing), accuracy, and affordability of these technologies. In addition, advancements in the past few years enable many of these interrogations to be performed at the resolution of a single cell, allowing us to understand the spatial and temporal dynamics in an extremely detailed way.
Multi-omic data serve as the basis for discovery and are usually published in conjunction with peer-reviewed manuscripts. While the manuscripts highlight key findings and may offer pertinent gene lists as attached tables, by convention all the data, raw as well as processed, are deposited in repositories.
The data’s value increases when the data are made available and subsequently reused by other users for new discoveries. Standardized computational approaches that allow for the data’s findability and reusability are needed. File size, access to data, appropriate form of data storage, data annotation, and lack of sufficient experimental metadata are a few of the challenges for sharing and reuse of data. In parallel, we need to continue developing solutions to provide meaningful access to multi-omic data for biologists who are not trained specifically in informatics.
Having progressed from initial seed funding from HHF’s Hearing Restoration Project to now receiving significant National Institute on Deafness and Other Communication Disorders funding, the gEAR portal is an important example of this approach of democratizing data for a specific research community, the hearing research community.
Disabling hearing loss, which affects 1 in 1,000 newborns and over 50 percent of the population older than age 70, may result from mutations in more than 150 genes distributed across the different cell types of the mammalian inner ear. Cell type-specific omics have advanced our understanding of the inner ear cell types, identified critical regulators of cell fate, and uncovered some of the challenges in hair cell regeneration in mammals.
As a primary resource for data sharing within the ear field, the gEAR is cited for data validation, hypothesis generation, and data dissemination. The code, which is open source, has now been used to support research initiatives in other fields beyond the ear. However, such efforts require extensive investment. Funding agencies could propel discovery via the broad use and reuse of multi-omic data across disciplines.
A 2009–2010 Emerging Research Grants scientist and a member of HHF’s Hearing Restoration Project, Ronna Hertzano, M.D., Ph.D., is a professor in the department of otorhinolaryngology–head and neck surgery and an affiliate member of the Institute for Genome Sciences at the University of Maryland School of Medicine. Anup Mahurkar is the executive director of software engineering & information technology at the Institute for Genome Sciences, University of Maryland School of Medicine. For more, see umgear.org.