Data Science SIG Panel Recap: Data sharing and harmonization

On 21 January 2021, the CSCCE Data Science Special Interest Group (SIG) convened a panel on data sharing and harmonization. The goal of the meeting was to highlight common challenges for community managers of data-centric communities, and to discuss solutions and best practices that make it easier for community members to share and reuse data. In this blog post, you can watch the three short presentations from the panelists and catch up on some of the key points raised.

The National Microbiome Data Collaborative (NMDC)

The goal of the NMDC community is to make microbiome data FAIR: Findable, Accessible, Interoperable, and Reusable. NMDC’s community engagement manager Pajau Vangay shared her team’s nascent strategy for making this goal a reality, which includes establishing a champions program to help encourage open data sharing among scientists. A key piece of NMDC’s data harmonization strategy is ensuring a standard format for users to submit their metadata to the platform, something that is challenging across the wide variety of researchers who perform ‘omics studies. 

The AD Knowledge Portal Community

Born out of the need to consolidate and move forward research into Alzheimer’s disease (AD), a field that has struggled to make progress towards viable treatments, the AD Knowledge Portal connects more than 250 clinical researchers from across the US. Zoe Leanza (Sage Bionetworks) manages this community, whose members submit a wide variety of data types to the portal. Consistency, completeness, and accuracy are the main concerns for data collection in this community. To standardize the upload process, Sage created a metadata collection tool called the DCC Validator. This tool flags inconsistencies or errors in the data before they are added to the portal, offering a straightforward “clearing house.”
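To make the idea of pre-upload checking concrete, here is a minimal sketch of what flagging incomplete or inconsistent metadata might look like. It is purely illustrative and is not the DCC Validator's code or schema: the field names, the allowed values, and the validate_record function are all hypothetical assumptions.

```python
# Hypothetical illustration only: these field names and rules are assumptions,
# not the DCC Validator's actual schema or API.
REQUIRED_FIELDS = {"sample_id", "assay", "species"}
ALLOWED_VALUES = {"species": {"Human", "Mouse"}}


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def validate_record(record):
    """Return a list of human-readable problems found in one metadata record."""
    problems = []

    # Completeness: every required field must be present and non-blank.
    for field in sorted(REQUIRED_FIELDS):
        if is_blank(record.get(field)):
            problems.append(f"missing required field: {field}")

    # Consistency: controlled fields must use one of the agreed values.
    for field, allowed in sorted(ALLOWED_VALUES.items()):
        value = record.get(field)
        if not is_blank(value) and value not in allowed:
            problems.append(f"unexpected value for {field}: {value!r}")

    return problems
```

A record that passes returns an empty list. A record with a blank sample_id and a lowercase "human" returns two messages, which a contributor could fix before anything reaches the portal.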

The National Ecological Observatory Network (NEON)

Unlike NMDC and the AD Knowledge Portal, which collect and curate data from members of their communities, NEON staff collect data to share with researchers across a range of ecological and environmental science disciplines. In this presentation, Alycia Crall highlighted how NEON is working to ensure that the data it provides interface seamlessly with other national and international network science projects. This has involved developing standardized protocols for collecting 180 different types of ecological data at 81 field sites across the US, including Alaska, Hawaii, and Puerto Rico. All the data are freely available through a data portal and API, as are the data collection protocols, although the large size of many of the files can make data access quite challenging.

About the CSCCE Data Science SIG

The Data Science SIG is a space for community managers from data science, data science-adjacent, and data science-interested communities to gather and share activities, updates, and observations. We are especially interested in learning how cross-community information sharing and activities can raise up all of our communities.

More about CSCCE SIGs