Biocuration 2023 will host five workshops, all held at the main conference venue. While most precede the main conference agenda, the Careers in Biocuration workshop, hosted by the ISB, will take place during the main agenda on the final day of the conference. Here’s a quick look:

Name | Date (2023) | Time (CEST)
2nd Mapping Commons Workshop on Simple Standard for Sharing Ontological Mappings | April 23rd | 2:30PM - 5:30PM
Aligning Open Biological and Biomedical Ontology Foundry ontologies with Wikidata | April 24th | 9:00AM - 11:00AM
Gaining perspective towards enhancing the intersection of biocuration and machine learning | April 24th | 9:00AM - 12:00PM
Functional impact of glycans and their curation | April 24th | 11:00AM - 1:00PM
Careers in Biocuration | April 26th | 11:00AM - 12:00PM

2nd Mapping Commons Workshop on Simple Standard for Sharing Ontological Mappings

Date April 23rd (Sunday)
Time 2:30PM - 5:30PM (3 Hours)
Organizer Nicolas Matentzoglu, Independent Consultant Semantic Technologies

Despite significant advances in standardisation and FAIRification of data, global interoperability remains an elusive goal. The decentralised nature of standardisation causes semantic spaces to emerge which are governed by diverse standards, in particular controlled vocabularies and semantic data models. To facilitate interoperability across standards, we need to curate and publish mappings.

The Simple Standard for Sharing Ontological Mappings (SSSOM) has been proposed as a standard model for sharing FAIR semantic entity mappings. While the core of the standard model has been solidified, certain issues remain; in particular, the representation of complex mappings (mappings that involve more than 2 entities) and entity-literal mappings. Furthermore, certain use cases remain only partially supported, such as value set mappings and schema crosswalks. In this workshop, we aim to:

  1. Develop a common understanding of the problem of non-simple mappings
  2. Determine if SSSOM is suitable to cover some or all of these non-simple mapping problems
  3. Potentially gather requirements on how SSSOM needs to be extended to cover more of them
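
To make the distinction between simple and non-simple mappings concrete, here is a minimal Python sketch. It assumes the SSSOM slot names `subject_id`, `predicate_id`, `object_id` and `mapping_justification`. The `SimpleMapping` class, the `is_simple` helper and the `EXA:`/`EXB:` identifiers are hypothetical and only illustrate the idea; they are not part of any SSSOM tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimpleMapping:
    """One subject-predicate-object mapping, named after SSSOM slots."""
    subject_id: str
    predicate_id: str
    object_id: str
    mapping_justification: str


def is_simple(entity_ids: List[str]) -> bool:
    """Treat a mapping as simple when it relates exactly two entities."""
    return len(entity_ids) == 2


# Hypothetical identifiers, for illustration only.
mapping = SimpleMapping(
    subject_id="EXA:0001",
    predicate_id="skos:exactMatch",
    object_id="EXB:0042",
    mapping_justification="semapv:ManualMappingCuration",
)

# A two-entity mapping fits the record above.
print(is_simple([mapping.subject_id, mapping.object_id]))
# A mapping involving three entities does not fit a single record.
print(is_simple(["EXA:0001", "EXB:0042", "EXB:0043"]))
```

A mapping that involves three or more entities has no obvious place in a record with a single subject and a single object, which is the kind of gap the workshop sets out to discuss.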

This workshop is organised by members of the Monarch Initiative, the SSSOM developer community, FAIR Impact, and FAIRCORE4EOSC.

Aligning Open Biological and Biomedical Ontology Foundry ontologies with Wikidata

Date April 24th (Monday)
Time 9:00AM - 11:00AM (3 Hours)
Organizer Andra Waagmeester, Micelio

Wikidata is the general knowledge graph of Wikimedia and a sister project of Wikipedia. In this workshop, we will explore how to extend the coverage of OBO ontologies in Wikidata. Starting from lessons learned with, for example, the Disease Ontology and the Gene Ontology, both of which are covered in Wikidata, the workshop will work towards covering two additional OBO ontologies (i.e. ENVO and GAZ) in Wikidata.

During the workshop, we will also explore how licensing affects their usage. Ontologies in OBO use different licenses. The question is whether this leads to license stacking and to what extent that disqualifies certain combinations.

The intended audience consists of Wikidata and OBO curators. The workshop is aimed at ontology curators who want to align their ontologies with Wikidata. Familiarity with identifier mapping, legal frameworks, programming in Python and SPARQL querying is not a hard requirement, but would be beneficial.

Gaining perspective towards enhancing the intersection of biocuration and machine learning

Date April 24th (Monday)
Time 9:00AM - 12:00PM (2 Hours)
Organizer Lynn Schriml, University of Maryland School of Medicine
Organizer Susan Bello
Organizer Cynthia Smith

This workshop is aimed at engaging resource biocurators, genomic resources, machine learning tool developers, members of the Alliance of Genome Resources and the OBO Foundry. In this workshop we will invite members of the Alliance of Genome Resources and the OBO Foundry to discuss:

  1. The challenges and approaches used to automate curation activities through machine learning approaches
  2. Machine learning (ML) approaches utilized and lessons learned
  3. How the biocuration and ML communities can work more productively together.

Genomic resources and ontologies share the desire to figure out how we can implement ML approaches, share these tools and integrate the ML tools we need to augment our expert curation activities. Funding mechanisms increasingly request the automation of ML approaches, thus the impetus is on resource developers to devise solutions. One approach is to identify tools that address our data-driven needs, to identify solutions to challenges we face and to guide future development to address problems that tools do not yet work on. This workshop will facilitate discussions regarding how to enhance ML/AI data readiness from the perspective of ontologies and genomic resources. ML/AI approaches hold the promise of enhancing the capacity of genomic resources to mine, review and assess data for integration. However, the often transitory nature of ML/AI tool development, along with the lack of specifications and planning for long-term development support, challenges the utility of these approaches for production-level genomic resources.

Alternatively, many genomic resources must resort to building in-house approaches. In this workshop, we will address the challenges of integrating ML/AI-ready infrastructure and providing ML-ready datasets from the biocurators' perspective. We propose that this first workshop, to be held at Biocuration 2023, be followed by a second, online ISB workshop focused on learning about the ML perspective from ML developers, to further discussion on how we can work more productively together and to ask what ML developers need from the biocuration community. This first 2-hour workshop will focus on (1) literature triage and (2) mapping data across resources, with one hour dedicated to each topic area.

The workshop introduction (15 minutes) will describe the history of efforts and attempts to bring ML into databases, the need for controls and gold-standard (highly curated) datasets, and the need for long-term support of ML projects. It will be followed by two 45-minute panel-driven discussions focused on successes, challenges and the pros and cons of alternative approaches. Panelists from the Alliance and the OBO Foundry community will be invited to share their successes and challenges. Each (3-5 person) panel will engage workshop participants to share insights on their quality control perspective and to discuss their quality control process when reviewing ML-generated datasets. One goal of the discussions is to outline the scope of biocuration problems that ML tools currently address and what biocuration tasks need ML development.

The panel discussion will touch on the following related topics:

  1. Where are we hoping to apply ML/AI
  2. The funding pitfall: how to get ML tools out of the prototype stage and into a functional product
  3. What are the language and jargon barriers, moving between groups, applying one tool to another dataset
  4. Not all tools work across different organisms and the idiosyncrasies of the literature across species
  5. Dependency: for ML/AI – the data absolutely needs to be highly, expertly curated. The necessity of highly curated data to empower ML/AI; pitfalls of lightly curated data
  6. Longitudinal aspect of data - for example, matching diseases or phenotypes, on names, synonyms over time and between resources
  7. The need for precision and recall metrics for the assessment of ML tools
  8. Exploring the establishment of an ISB community ML/AI tool registry/list - where we can share what we use.
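
As a rough illustration of the precision and recall metrics mentioned in topic 7, the following Python sketch compares the papers flagged by a literature triage tool against a curated gold-standard set. The function name `precision_recall` and the paper identifiers are hypothetical and chosen only for this example; it is a sketch under those assumptions, not a description of any tool discussed at the workshop.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], curated: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted items against a curated set."""
    true_positives = len(predicted & curated)
    # Precision: share of flagged items that curators also marked relevant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of curator-relevant items that the tool managed to flag.
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Hypothetical paper identifiers, for illustration only.
curated_relevant = {"paper_1", "paper_2", "paper_3", "paper_4"}
tool_flagged = {"paper_2", "paper_3", "paper_5"}

precision, recall = precision_recall(tool_flagged, curated_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.67 recall=0.50
```

Both numbers depend entirely on the quality of the curated set, which ties back to topic 5 on the need for expertly curated data.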

The goals of this workshop are to initiate dialog in order to gain a broader understanding of biocuration - ML needs, to reveal challenges and limitations of ML approaches, to foster re-use of ML tools, and to facilitate a greater understanding of where ML tools are being applied successfully.

Output: We propose to author a whitepaper built from the biocurator comments and workshop discussions. The paper will include a list of ML tools and where the tools shine, highlighting the tools that have been used for literature curation and resource content mapping.

Functional impact of glycans and their curation

Date April 24th (Monday)
Time 11:00AM - 1:00PM (2 Hours)
Organizer Raja Mazumder, George Washington University

Dynamic changes in protein glycosylation impact human health and disease progression. However, current resources that capture disease and phenotype information such as MIM, Monarch Initiative, UniProt, Genomics England, and others focus primarily on the macromolecules within the central dogma of molecular biology (DNA, RNA, proteins). In order to gain a more complete understanding of human disease, there is a need to capture the functional impact of glycans and glycosylation on biological processes. While the aforementioned resources include glycan-related genes, such as biosynthetic and degradative enzymes, the function and disease annotations are usually associated with the gene product rather than with the relevant glycosylation and glycan structural changes. Expression of glycan-related genes represents only a subset of factors affecting protein glycosylation. The functional impact of a specific glycan structure may depend on the protein to which it is bound, site of attachment, truncation or loss of the entire glycan structure. A catalog of glycosylation combinations, their relationship with other biomolecules, and their functional implications will provide insight into the biological roles of glycans and the impact of genetic and environmental factors on their expression.

The purpose of this workshop is to bring together subject matter experts, tool developers and biocurators from resources that annotate content that is related to the functional impact of glycans. Each resource will do a short presentation on their data of interest, including types of annotations, what impact glycan function might have on these annotations, and standards and ontologies they are using.

This will be followed by a jamboree/hackathon where we will discuss selected publications to identify commonalities and gaps in our current curation practices, and provide potential solutions. The attendees will help identify areas where curators, data wranglers, and text mining experts can collaborate to address gaps in glycan and glycosylation annotations, leverage each other’s work to improve their respective resources and encourage data sharing amongst resources.

There is a free pre-workshop on Sunday, April 23rd, from 9:00AM to 5:00PM CEST. Lunch and coffee will be provided. For more information and the registration link, please visit here.

Organizers: Raja Mazumder (GlySpace, GlyGen), Mike Tiemeyer (GlySpace, GlyGen), Rene Ranzinger (GlyGen), Maria Martin (UniProt, GlyGen), Kiyoko Aoki-Kinoshita (GlySpace, GlyCosmos, GlyTouCan), Frederique Lisacek (GlySpace, GlyConnect), Cecilia Arighi (PIR, BioCreative, UniProt), Randi Vita (IEDB)

Careers in Biocuration

Date April 26th (Wednesday)
Time 11:00AM - 12:00PM (1 Hour)
Organizer Nicole Vasilevsky, Critical Path Institute
Organizer Randi Vita, La Jolla Institute for Allergy and Immunology
Organizer Mary Ann Tuli, GigaScience

The International Society for Biocuration (ISB) was formed to promote the field of biocuration and to provide a professional society to support curators and aid in career growth and development. The path to a career in biocuration is varied, and we play various roles in our professional positions. This workshop aims to address some of the following questions through structured brainstorming sessions:

  1. How do you get a job as a curator?
  2. How do you write your resume/CV?
  3. What skill sets can someone learn to enhance their career growth?

As an outcome of this workshop, we will disseminate the key discussion points and takeaways via the ISB website to further the growth of curators in our community. Please fill out a pre-conference survey here. The survey will be open until March 20th.