IEDB Source of Truth

SoT is a system for dynamically creating, maintaining, and sharing an application ontology that builds on multiple reference resources. A reference resource could be a reference ontology such as OBI, or a similar non-ontological resource such as UniProt. We intend for the SoT system to be general, but the primary goal is to serve projects at LJI, including IEDB, LabKey database for HIPC and DORAS, TopCat, and Bioinformatics Core.

The core of SoT is an application ontology (ONTIE) with its own terms and axioms. ONTIE builds on reference resources including NCBI Taxonomy, MRO, OBI, UniProt, and GenPept. We reuse terms from reference resources as much as possible, but when this is not possible we add terms to ONTIE. Examples include taxa not in NCBI Taxonomy, and proteins not in GenPept. New terms can be added to ONTIE immediately, and will be maintained indefinitely. However, if a better term is found in a reference ontology, the ONTIE term will be marked obsolete and mapped to the reference term.

Each reference resource is pulled in and validated using an importer module. Specific versions of reference resources are used, and updates to reference resources are carefully checked for terms that have been dropped, added, merged, or had their logic changed.

Users can search, browse, and query SoT (i.e. the union of ONTIE and the reference resources) using a uniform web-based interface and API.

SoT consumers such as IEDB will usually have their own internal identifiers for terms, but must keep track of the IRIs for the terms they use from SoT. They can then query SoT using IRIs, ask whether the IRIs have been mapped to new IRIs, and get tables of data for those terms. Consumers can send their users to SoT to find and request terms, but the consumer is responsible for maintaining and displaying terms, local versions of labels and synonyms, and tree structure.

SoT is public and does not contain secret or confidential information. Unauthenticated users can browse and search for terms. Authenticated users can request new terms by selecting a term template and filling in the required information. If SoT can validate that information, it immediately creates a new ONTIE term and provides the user with the new IRI.

A secondary goal of SoT is to demonstrate tight integration across OBO Foundry ontologies. We intend for ONTIE to have a single, integrated, and logically consistent hierarchy, with a unified set of OWL annotation properties and object properties. The term creation and management aspect of SoT may also serve as a good example of how term requests can percolate up from specific use cases to community reference ontologies, while keeping the originating application ontology and annotated datasets in sync.

IEDB SoT API

API

The IEDB SoT is designed to be used by both humans and software. For humans, we provide HTML pages connected by hyperlinks. For machines, we provide a consistent API to access data in several machine-readable formats.

ONTIE and Other Resources

SoT provides access to the ONTIE application ontology and a number of other resources used by ONTIE and IEDB, including the NCBI Taxonomy, Chemical Entities of Biological Interest, the MHC Restriction Ontology, and more. The full list of resources (including ONTIE) is available at https://ontology.iedb.org/resources. You can query for a specific resource or across all resources.

Individual ONTIE Terms

Individual ONTIE terms can be accessed at their term IRI, for example https://ontology.iedb.org/ontology/ONTIE_0000001. An HTTP GET request to the term IRI will return an HTML document with embedded RDFa data. Alternative representations are available in Turtle, JSON-LD, and TSV formats.

JSON-LD

The JSON Linked Data representation of a single term is a standard JSON object with a few special conventions:

In order to capture RDF semantics, IRIs and literal values are represented as objects, using arrays when multiple values are given. For example, https://ontology.iedb.org/ontology/ONTIE_0000001.json includes the following:

{"label": {"@value":"Mus musculus BALB/c"},
 "alternative term": [{"@value":"balb"}],
 "parent taxon":
 [{"@id": "NCBITaxon:10090",
   "iri": "http://purl.obolibrary.org/obo/NCBITaxon_10090",
   "label": "Mus musculus"}]
 ...}
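
As a minimal sketch of how a client might read this structure, the Python below flattens the object-wrapped values into plain lists. The helper name values is hypothetical, and the sample is the fragment shown above with the elided "..." part left out; a real response will contain more keys.

```python
import json

# The fragment shown above, minus the elided part.
SAMPLE = """{"label": {"@value": "Mus musculus BALB/c"},
 "alternative term": [{"@value": "balb"}],
 "parent taxon":
 [{"@id": "NCBITaxon:10090",
   "iri": "http://purl.obolibrary.org/obo/NCBITaxon_10090",
   "label": "Mus musculus"}]}"""


def values(term, predicate):
    """Return a flat list of plain values for one predicate (hypothetical helper)."""
    raw = term.get(predicate, [])
    # A single value arrives as one object, multiple values as an array.
    items = raw if isinstance(raw, list) else [raw]
    result = []
    for item in items:
        if isinstance(item, dict):
            # Literals carry "@value"; links to other terms carry "@id".
            result.append(item.get("@value", item.get("@id")))
        else:
            result.append(item)
    return result


term = json.loads(SAMPLE)
label = values(term, "label")
parents = values(term, "parent taxon")
```

Normalizing both shapes to a list means calling code does not need to know whether a predicate happened to have one value or several.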

Tab-Separated Values

A table of tab-separated values about a term can also be requested. By default, five columns of data are provided, for example https://ontology.iedb.org/ontology/ONTIE_0000001.tsv:

IRI	label	recognized	obsolete	replacement
https://ontology.iedb.org/ontology/ONTIE_0000001	Mus musculus BALB/c	true		

If the query parameter show-headers is false, then the header row is omitted, for example https://ontology.iedb.org/ontology/ONTIE_0000001.tsv?show-headers=false:

https://ontology.iedb.org/ontology/ONTIE_0000001	Mus musculus BALB/c	true		

The select query parameter controls the columns that are returned. Provide a comma-separated list of predicate labels, or one of the special values: IRI, CURIE, recognized. The order of predicates is respected in the returned data. For example, https://ontology.iedb.org/ontology/ONTIE_0000008.tsv?select=CURIE,label,alternative%20term:

CURIE	label	alternative term
ONTIE:0000008	Mus musculus 6.5 TCR Tg	14.3.d|SFERFEIFPKE-specific TCR Tg|Tg(Tcra/Tcrb)1Vbo

Multiple values, such as multiple alternative term values, are separated by a single pipe character (|).
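
A client reading these tables could split the cells back into lists. The following Python is a sketch only: parse_term_tsv is a hypothetical helper, and it assumes that values never contain a literal tab or pipe character, which this document does not state.

```python
import csv
import io

# The example response shown above, with explicit tab characters.
SAMPLE = (
    "CURIE\tlabel\talternative term\n"
    "ONTIE:0000008\tMus musculus 6.5 TCR Tg\t"
    "14.3.d|SFERFEIFPKE-specific TCR Tg|Tg(Tcra/Tcrb)1Vbo\n"
)


def parse_term_tsv(text):
    """Parse a TSV response into dicts that map each header to a list of values."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for record in reader:
        # Empty or missing cells become empty lists; others are split on "|".
        rows.append({k: (v.split("|") if v else []) for k, v in record.items()})
    return rows


rows = parse_term_tsv(SAMPLE)
```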

When compact=true is set, values will be returned as CURIEs instead of IRIs, for example https://ontology.iedb.org/ontology/ONTIE_0002059.tsv?select=CURIE,replacement&compact=true:

CURIE	replacement
ONTIE:0002059	ONTIE:0002053
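
Consumers that store IRIs may want to convert between the two forms locally. The sketch below uses only the two prefix mappings that can be read off the examples in this document (ONTIE and NCBITaxon); the PREFIXES table and both function names are hypothetical, and SoT may define further prefixes.

```python
# Prefix table inferred from the examples in this document; not exhaustive.
PREFIXES = {
    "ONTIE": "https://ontology.iedb.org/ontology/ONTIE_",
    "NCBITaxon": "http://purl.obolibrary.org/obo/NCBITaxon_",
}


def expand(curie):
    """Turn a CURIE such as ONTIE:0002053 into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown CURIE prefix in {curie!r}")
    return PREFIXES[prefix] + local


def contract(iri):
    """Turn a known IRI into a CURIE; return unknown IRIs unchanged."""
    for prefix, base in PREFIXES.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri


replacement_iri = expand("ONTIE:0002053")
```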

If the name of a predicate in a select is followed by [IRI], [CURIE], or [label], then the system will attempt to return values for that column in the requested format. For example: https://ontology.iedb.org/ontology/ONTIE_0002059.tsv?select=CURIE,replacement%20[CURIE],replacement%20[label]:

CURIE	replacement [CURIE]	replacement [label]
ONTIE:0002059	ONTIE:0002053	Large structural phosphoprotein (Human betaherpesvirus 5)
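
The query parameters described in this section (select, compact, show-headers) can be assembled programmatically. This is a sketch under stated assumptions: term_tsv_url is a hypothetical helper, and it only builds the URL without contacting the server.

```python
from urllib.parse import quote, urlencode

BASE = "https://ontology.iedb.org/ontology/"


def term_tsv_url(term_id, select=None, compact=False, show_headers=True):
    """Build the TSV URL for one term, e.g. term_id="ONTIE_0000008"."""
    params = {}
    if select:
        # Column order is respected by the server, so keep the caller's order.
        params["select"] = ",".join(select)
    if compact:
        params["compact"] = "true"
    if not show_headers:
        params["show-headers"] = "false"
    url = f"{BASE}{term_id}.tsv"
    if params:
        # Keep commas literal and encode spaces as %20, as in the examples.
        url += "?" + urlencode(params, quote_via=quote, safe=",")
    return url


url = term_tsv_url("ONTIE_0000008", ["CURIE", "label", "alternative term"])
```

Square brackets in a column name such as replacement [CURIE] are percent-encoded by this helper, which is the standard URL encoding of the same characters.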

Individual Subjects

You can query for individual subjects within a resource or across all resources using the subject CURIE or IRI. Special characters in IRIs should be escaped:

Subject pages are also available in other formats:

Predicates

You can get the list of all predicates used within a resource or across all resources:

Multiple Subjects

You can also query for multiple subjects within a resource or across resources:

The HTML form lets you build a query and return results in HTML or TSV formats. The query is controlled by the query parameters in the URL.

You can specify a constraint on the query as the combination of a predicate, an operator, and an object. The predicate should be a label such as type or alternative term. The operator should be one of:

The object is the value to match. For example, to match any rdfs:label starting with 'Mus' the query parameter would be label=like.Mus*: https://ontology.iedb.org/resources/ONTIE/subjects?label=like.Mus*

The IRI and CURIE query parameters are used to specify exactly which subjects to search for. A row is returned for each requested subject, in order, whether or not the requested subject is found in the resource. This is useful for checking term status. Use the in.(*) operator, like so: https://ontology.iedb.org/resources/ONTIE/subjects?CURIE=in.(ONTIE:0000001,ONTIE:0000002). Also see "POST instead of GET" below.
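
The two constraint forms shown above (like and in) could be built with a small helper. The function names like, one_of, and subjects_url are hypothetical, and only the operators that appear in this document are covered.

```python
from urllib.parse import quote

ROOT = "https://ontology.iedb.org/resources"


def like(pattern):
    """Constraint value for a wildcard match, e.g. like("Mus*")."""
    return f"like.{pattern}"


def one_of(ids):
    """Constraint value that requests exactly these CURIEs or IRIs, in order."""
    return "in.(" + ",".join(ids) + ")"


def subjects_url(resource, constraints):
    """Build a multiple-subjects query URL for one resource (or "all")."""
    parts = []
    for predicate, condition in constraints.items():
        # Predicate labels may contain spaces; the characters used by the
        # operators ("*", "(", ")", ",", ":") are left unescaped.
        parts.append(
            quote(predicate, safe="") + "=" + quote(condition, safe="*(),:")
        )
    return f"{ROOT}/{resource}/subjects?" + "&".join(parts)


mus_url = subjects_url("ONTIE", {"label": like("Mus*")})
```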

POST instead of GET

When requesting a large number of terms, you can use HTTP POST instead of HTTP GET, and provide a list of the requested CURIEs or IRIs in the body of the request. For this to work, you MUST include method=GET in the query string. For example, POSTing to https://ontology.iedb.org/resources/all/subjects?method=GET&format=tsv with this body:

CURIE
ONTIE:0000001
ONTIE:0000002

will return a table:

CURIE	label	recognized	obsolete	replacement
ONTIE:0000001	Mus musculus BALB/c	true		
ONTIE:0000002	Mus musculus BALB/c A2/Kb Tg	true		

The body of the POST request is a list of CURIEs or IRIs, one per line. The first row should be the header CURIE or IRI. The HTTP Content-Type should be text/plain or text/tab-separated-values, not application/x-www-form-urlencoded, which is the default for some tools.
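
A sketch of such a request using only the Python standard library follows. The helper build_lookup_request is hypothetical; it constructs the request without sending it, and the absence of a trailing newline in the body is an assumption rather than a documented requirement.

```python
import urllib.request

URL = "https://ontology.iedb.org/resources/all/subjects?method=GET&format=tsv"


def build_lookup_request(ids, column="CURIE"):
    """Build (but do not send) a POST request that looks up many terms."""
    if column not in ("CURIE", "IRI"):
        raise ValueError("the first row must be CURIE or IRI")
    # First row is the header, then one identifier per line.
    body = "\n".join([column, *ids])
    return urllib.request.Request(
        URL,
        data=body.encode("utf-8"),
        # Set explicitly so the tool's form-encoded default is not used.
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


request = build_lookup_request(["ONTIE:0000001", "ONTIE:0000002"])
# To send it: urllib.request.urlopen(request).read().decode("utf-8")
```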

Example: Term Status

When requesting a TSV table, the default columns provide a summary of each term's status. For example, POST to https://ontology.iedb.org/resources/all/subjects?method=GET&format=tsv with this body:

CURIE
ONTIE:0000001
NCBITaxon:10090
NCBITaxon:12
NCBITaxon:3
NCBITaxon:0

The response will be a table of tab-separated values with five columns and a row for each submitted CURIE or IRI. The columns are:

  1. IRI or CURIE (corresponding to the request)
  2. label
  3. recognized: true if the term is available anywhere in SoT, false otherwise
  4. obsolete: true if the term is obsolete, blank or false if the term is not obsolete
  5. replacement: if the term is recognized and obsolete and has been replaced by another term, this column will contain the replacement term IRI; otherwise it will be blank

For example (contains tab characters):

CURIE	label	recognized	obsolete	replacement
ONTIE:0000001	Mus musculus BALB/c	true		
NCBITaxon:10090	Mus musculus	true		
NCBITaxon:12	obsolete taxon 12	true	true	http://purl.obolibrary.org/obo/NCBITaxon_74109
NCBITaxon:3	obsolete taxon 3	true	true	
NCBITaxon:0		false		
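
A consumer could turn this table into one status per term. The sketch below applies the column definitions given above to the example response; the status names (current, replaced, obsolete, unrecognized) and the function names are hypothetical labels chosen for illustration.

```python
import csv
import io

# The example response shown above, with explicit tab characters.
SAMPLE = (
    "CURIE\tlabel\trecognized\tobsolete\treplacement\n"
    "ONTIE:0000001\tMus musculus BALB/c\ttrue\t\t\n"
    "NCBITaxon:10090\tMus musculus\ttrue\t\t\n"
    "NCBITaxon:12\tobsolete taxon 12\ttrue\ttrue\t"
    "http://purl.obolibrary.org/obo/NCBITaxon_74109\n"
    "NCBITaxon:3\tobsolete taxon 3\ttrue\ttrue\t\n"
    "NCBITaxon:0\t\tfalse\t\t\n"
)


def term_status(row):
    """Classify one row using the recognized, obsolete, and replacement columns."""
    if row["recognized"] != "true":
        return "unrecognized"
    if row["obsolete"] == "true":
        # Obsolete terms may or may not have a replacement IRI.
        return "replaced" if row["replacement"] else "obsolete"
    return "current"


def summarize(tsv_text):
    """Map each requested identifier (first column) to its status."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    key = reader.fieldnames[0]  # "CURIE" or "IRI", matching the request
    return {row[key]: term_status(row) for row in reader}


statuses = summarize(SAMPLE)
```

Terms reported as replaced are the ones a consumer would remap to the IRI in the replacement column.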

Term Submission

SoT can accept new term submissions from authorized developers. The system currently works via a REST API. An HTML form is in development.

Terms can be submitted to https://ontology.iedb.org/ontology/ONTIE using an HTTP PUT request. You must include an X-API-Key HTTP header containing a valid developer API key. The body of the request must be a valid Knotation block. It should not include a subject; if the request is valid, a new subject IRI will be assigned. Ideally the new term should use a previously defined template. The ontie.kn source file contains a number of examples.

Example: New Taxon

apply template: taxon class
 label: Mus musculus BALB/c
 parent taxon: Mus musculus
alternative term: balb
rank: subspecies
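
The taxon block above could be submitted with a request like the following sketch. The helper build_submission and the placeholder key are hypothetical, and the text/plain Content-Type is an assumption, since this document does not say which media type the server expects for Knotation. The request is built but not sent.

```python
import urllib.request

ONTIE_URL = "https://ontology.iedb.org/ontology/ONTIE"

# The Knotation block from the example above; it contains no subject line.
TAXON_BLOCK = """apply template: taxon class
 label: Mus musculus BALB/c
 parent taxon: Mus musculus
alternative term: balb
rank: subspecies
"""


def build_submission(knotation, api_key):
    """Build (but do not send) a PUT request that submits one new term."""
    return urllib.request.Request(
        ONTIE_URL,
        data=knotation.encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            # Assumption: the expected media type is not documented here.
            "Content-Type": "text/plain",
        },
        method="PUT",
    )


submission = build_submission(TAXON_BLOCK, "hypothetical-developer-key")
# To send it: urllib.request.urlopen(submission)
```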

Example: New Protein

apply template: protein class
 label: Polymerase acidic protein
 taxon: Influenza A virus
alternative term: RNA-directed RNA polymerase subunit P2
alternative term: PA

Example: No Template

type: owl:Class
label: occurrence of disease
definition: The process in which a disease unfolds.
subclass of: biological process