nde

NIAID Data Ecosystem KG

The nde (NIAID Data Ecosystem) KG contains infectious and immune-mediated disease datasets. These include datasets from NIAID-funded repositories as well as globally relevant infectious and immune-mediated disease (IID) repositories from NIH and beyond. The datasets include -omics data, clinical data, epidemiological data, pathogen-host interaction data, flow cytometry, and imaging.

6.0M triples
7 classes
30 properties
591.5K subjects

The NIAID Data Ecosystem (NDE) Knowledge Graph provides structured metadata for infectious and immune-mediated disease (IID) research resources. Developed by the National Institute of Allergy and Infectious Diseases in collaboration with The Scripps Research Institute, this knowledge graph powers the NIAID Data Ecosystem Discovery Portal (https://data.niaid.nih.gov), which aggregates millions of datasets from over 70 sources, including NIAID-funded repositories and globally relevant IID repositories.

The knowledge graph organizes metadata using the Schema.org vocabulary, enabling unified search across diverse biomedical data types, including -omics data, clinical studies, epidemiological data, pathogen-host interactions, flow cytometry, and imaging datasets. It connects datasets to their authors, funding sources, research projects, publications, and key disease and pathogen terms, which makes it easier to discover resources related to COVID-19, HIV, malaria, tuberculosis, and other infectious diseases. By harmonizing heterogeneous metadata formats and providing both user-friendly search interfaces and programmatic API access, the NDE knowledge graph accelerates IID research and maximizes the impact of publicly funded scientific data.

Find one influenza-related study and list the cross-references that link the influenza term (MONDO_0005812) to other disease sources.
PREFIX schema:   <http://schema.org/>
PREFIX rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?dataset ?datasetName (GROUP_CONCAT(DISTINCT STR(?xref); separator=" | ") AS ?xrefList)
WHERE {
  {
    SELECT ?dataset ?datasetName
    WHERE {
      ?dataset rdf:type schema:Dataset ;
               schema:name ?datasetName ;
               schema:healthCondition <http://purl.obolibrary.org/obo/MONDO_0005812> .
    }
    LIMIT 1
  }

  OPTIONAL {
    <http://purl.obolibrary.org/obo/MONDO_0005812> oboInOwl:hasDbXref ?xref .
  }
}
GROUP BY ?dataset ?datasetName
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v1("?dataset"):::projected 
  v2("?datasetName"):::projected 
  v3("?xref"):::projected 
  v4("?xrefList")
  c5([obo:MONDO_0005812]):::iri 
  c2([schema:Dataset]):::iri 
  v1 --"a"-->  c2
  v1 --"schema:name"-->  v2
  v1 --"schema:healthCondition"-->  c5
  subgraph optional0["(optional)"]
  style optional0 fill:#bbf,stroke-dasharray: 5 5;
    c5 -."oboInOwl:hasDbXref".->  v3
  end
  bind1[/"str(?xref)"/]
  v3 --o bind1
  bind1 --as--o v4
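
The query above returns the cross-references as a single string in `?xrefList`, joined with `" | "`. A minimal sketch of splitting that string back into individual identifiers on the client side follows. The helper names `split_xref_list` and `group_by_prefix` are hypothetical, the sample values are placeholders rather than real cross-references, and the grouping step assumes CURIE-style values of the form `PREFIX:ID`, which may not hold for every value in the graph.

```python
def split_xref_list(xref_list: str, separator: str = " | ") -> list[str]:
    """Split a GROUP_CONCAT value back into individual cross-references.

    An empty string (what GROUP_CONCAT yields when the OPTIONAL block
    matched nothing) becomes an empty list.
    """
    if not xref_list:
        return []
    return [xref for xref in xref_list.split(separator) if xref]


def group_by_prefix(xrefs: list[str]) -> dict[str, list[str]]:
    """Group CURIE-style identifiers by the text before the first colon.

    Assumption: values look like "PREFIX:ID". Full IRIs would be grouped
    under their scheme ("http", "https"), which is probably not useful.
    """
    grouped: dict[str, list[str]] = {}
    for xref in xrefs:
        prefix, _, local_id = xref.partition(":")
        grouped.setdefault(prefix, []).append(local_id)
    return grouped


# Placeholder value shaped like the ?xrefList column; not real data.
sample = "EXAMPLE:0001 | DEMO:42 | EXAMPLE:0002"
xrefs = split_xref_list(sample)
by_source = group_by_prefix(xrefs)
```

Splitting on the exact separator used in the query keeps the round trip lossless as long as no identifier itself contains `" | "`.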
Count the number of datasets per infectious agent in NDE and list the catalogs that contain them.
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?agent ?agentName (COUNT(?dataset) AS ?datasetCount) (GROUP_CONCAT(DISTINCT ?catalogShort; separator=", ") AS ?catalogs)
WHERE {
    ?dataset rdf:type schema:Dataset ;
             schema:infectiousAgent ?agent ;
             schema:includedInDataCatalog ?catalog .
    ?agent schema:name ?agentName .
    BIND(REPLACE(STR(?catalog), "^https://okn\\.wobd\\.org/catalog/", "") AS ?catalogShort)
}
GROUP BY ?agent ?agentName
ORDER BY DESC(?datasetCount)
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v3("?agent"):::projected 
  v5("?agentName"):::projected 
  v4("?catalog")
  v6("?catalogShort"):::projected 
  v7("?catalogs")
  v2("?dataset"):::projected 
  v8("?datasetCount")
  c2([schema:Dataset]):::iri 
  v2 --"a"-->  c2
  v2 --"schema:infectiousAgent"-->  v3
  v2 --"schema:includedInDataCatalog"-->  v4
  v3 --"schema:name"-->  v5
  bind0[/"replace(str(?catalog),'^https://okn\.wobd\.org/catalog/','')"/]
  v4 --o bind0
  bind0 --as--o v6
  bind3[/"count(?dataset)"/]
  v2 --o bind3
  bind3 --as--o v8
  bind4[/"?catalogShort"/]
  v6 --o bind4
  bind4 --as--o v7
Find all influenza-related studies in NDE that have a DOI.
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?dataset ?datasetName ?diseaseName ?doi
WHERE {
    ?dataset rdf:type schema:Dataset ;
             schema:name ?datasetName ;
             schema:healthCondition ?disease ;
             schema:sameAs ?doi .
    ?disease schema:name ?diseaseName .
    FILTER(
        ?disease = <http://purl.obolibrary.org/obo/MONDO_0005812> ||
        CONTAINS(LCASE(?diseaseName), "influenza")
    )
    FILTER(REGEX(STR(?doi), "^https://doi\\.org/"))
}
ORDER BY ?diseaseName ?datasetName
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v5("?dataset"):::projected 
  v2("?datasetName"):::projected 
  v4("?disease")
  v1("?diseaseName"):::projected 
  v3("?doi"):::projected 
  c5([schema:Dataset]):::iri 
  f0[["regex(str(?doi),'^https://doi\.org/')"]]
  f0 --> v3
  f1[["(?disease = obo:MONDO_0005812 || contains(lower-case(?diseaseName),'influenza'))"]]
  f1 --> v4
  f1 --> v1
  v5 --"a"-->  c5
  v5 --"schema:name"-->  v2
  v5 --"schema:healthCondition"-->  v4
  v5 --"schema:sameAs"-->  v3
  v4 --"schema:name"-->  v1
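
The query above keeps only `schema:sameAs` values that start with `https://doi.org/`. As a rough client-side counterpart, the sketch below strips that prefix to get the bare DOI. The function name `doi_from_iri` is hypothetical and the sample IRIs are placeholders. Lowercasing relies on DOIs being case-insensitive.

```python
import re
from typing import Optional

# Same prefix test as the FILTER in the query: only https://doi.org/ IRIs.
DOI_IRI_PATTERN = re.compile(r"^https://doi\.org/(.+)$")


def doi_from_iri(iri: str) -> Optional[str]:
    """Return the bare, lowercased DOI for a https://doi.org/ IRI, else None."""
    match = DOI_IRI_PATTERN.match(iri)
    if match is None:
        return None
    return match.group(1).lower()


# Placeholder values; not taken from the knowledge graph.
print(doi_from_iri("https://doi.org/10.1234/EXAMPLE.5678"))
print(doi_from_iri("https://example.org/dataset/1"))
```

Normalizing DOIs this way makes it easier to deduplicate studies that appear in more than one catalog under differently cased links.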
List all resources in NDE and count the number of studies in each resource.
PREFIX schema: <http://schema.org/>

SELECT ?catalog (COUNT(?dataset) AS ?datasetCount)
WHERE {
  ?dataset a schema:Dataset ;
           schema:includedInDataCatalog ?catalog .
}
GROUP BY ?catalog
ORDER BY DESC(?datasetCount)
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v3("?catalog"):::projected 
  v2("?dataset"):::projected 
  v4("?datasetCount")
  c2([schema:Dataset]):::iri 
  v2 --"a"-->  c2
  v2 --"schema:includedInDataCatalog"-->  v3
  bind1[/"count(?dataset)"/]
  v2 --o bind1
  bind1 --as--o v4
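
Results of a query like the one above usually come back in the SPARQL 1.1 Query Results JSON format, where each cell is an object with `type`, `value`, and optionally `datatype`. A minimal sketch of flattening that structure into plain rows follows. The helper name `bindings_to_rows` and the sample catalog names and counts are hypothetical. Only `xsd:integer` is converted to a number here; other datatypes are left as strings.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def bindings_to_rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts.

    Unbound variables become None; xsd:integer literals become int.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for var in variables:
            cell = binding.get(var)
            if cell is None:
                row[var] = None
            elif cell.get("datatype") == XSD_INTEGER:
                row[var] = int(cell["value"])
            else:
                row[var] = cell["value"]
        rows.append(row)
    return rows


# Hypothetical response shaped like the catalog-count query's output.
sample_results = {
    "head": {"vars": ["catalog", "datasetCount"]},
    "results": {
        "bindings": [
            {
                "catalog": {"type": "uri", "value": "https://okn.wobd.org/catalog/ExampleRepoA"},
                "datasetCount": {"type": "literal", "datatype": XSD_INTEGER, "value": "120"},
            },
            {
                "catalog": {"type": "uri", "value": "https://okn.wobd.org/catalog/ExampleRepoB"},
                "datasetCount": {"type": "literal", "datatype": XSD_INTEGER, "value": "7"},
            },
        ]
    },
}

rows = bindings_to_rows(sample_results)
total = sum(row["datasetCount"] for row in rows)
```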
SPARQL Endpoint: https://frink.apps.renci.org/nde/sparql
Triple Pattern Fragments: https://frink.apps.renci.org/ldf/nde
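
The SPARQL endpoint listed above can in principle be queried from a script. The sketch below only builds the HTTP request and does not send it. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter) and can return JSON results, which has not been verified here. The function name `build_sparql_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

NDE_SPARQL_ENDPOINT = "https://frink.apps.renci.org/nde/sparql"

# The catalog-count query from this page, kept as a plain string.
CATALOG_COUNT_QUERY = """
PREFIX schema: <http://schema.org/>

SELECT ?catalog (COUNT(?dataset) AS ?datasetCount)
WHERE {
  ?dataset a schema:Dataset ;
           schema:includedInDataCatalog ?catalog .
}
GROUP BY ?catalog
ORDER BY DESC(?datasetCount)
"""


def build_sparql_request(query: str, endpoint: str = NDE_SPARQL_ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Assumption: the endpoint accepts the query in a URL parameter named
    "query" and honours the JSON results media type in the Accept header.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(CATALOG_COUNT_QUERY)
# To run it, pass `request` to urllib.request.urlopen and parse the body
# with json.load; that step is left out so the sketch needs no network.
```

For long queries, the protocol also allows sending the query in a POST body, which avoids URL length limits.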