prokn

Protein Knowledge Network

The Protein Knowledge Network (ProKN) integrates protein-centric data with the genome-centric datasets of the Common Fund Data Ecosystem (CFDE). It spans heterogeneous biological data types across multiple domains and connects CFDE resources to encourage their reuse and collaboration, supporting functional genomics and systems-level study of disease mechanisms.

81.9M triples
25 classes
115 properties
11.9M subjects

The Protein Knowledge Network (ProKN), developed by the University of Delaware as part of the NIH Common Fund Data Ecosystem (CFDE), is an integrative bioinformatics platform for harmonizing and exploring complex relationships within protein-related data. Using a knowledge graph, ProKN links proteins with their post-translational modifications, genetic variants, and functional pathways. It offers specialized tools such as KSMoFinder for predicting kinase-substrate interactions, along with services for ID mapping, variant mapping, and protein embeddings. The portal supports the FAIR data principles by providing visualization interfaces alongside programmatic access via SPARQL and REST APIs, so researchers can bridge disparate datasets and generate new hypotheses for precision medicine and drug discovery.

Find genes associated with Alzheimer's disease
PREFIX ns: <https://research.bioinformatics.udel.edu/ProKN/rdf/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX upcore: <http://purl.uniprot.org/core/>
PREFIX efo: <http://www.ebi.ac.uk/efo/EFO_>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX biolink: <https://biolink.github.io/biolink-model/>

SELECT DISTINCT
  (?gene AS ?GeneURI)
  (?geneLabel AS ?GeneName)
  (?gSAB AS ?GeneSource)
  (?disease AS ?DiseaseURI)
  (?diseaseLabel AS ?DiseaseName)
  (?dSAB AS ?DiseaseSource)
WHERE {
  # Find disease entries whose label mentions Alzheimer's
  ?disease a ?diseaseType ;
           rdfs:label ?diseaseLabel ;
           dc:source ?dSAB .
  FILTER((?diseaseType = upcore:Disease || ?diseaseType = efo:0000651) && CONTAINS(LCASE(?diseaseLabel), "alzheimer"))

  # Find genes associated with these diseases
  ?gene biolink:associated_with ?disease ;
        a upcore:Gene ;
        rdfs:label ?geneLabel ;
        dc:source ?gSAB .
}
ORDER BY ?geneLabel
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v12("?DiseaseName")
  v13("?DiseaseSource")
  v11("?DiseaseURI")
  v9("?GeneName")
  v10("?GeneSource")
  v8("?GeneURI")
  v5("?dSAB"):::projected 
  v4("?disease"):::projected 
  v3("?diseaseLabel"):::projected 
  v2("?diseaseType")
  v7("?gSAB"):::projected 
  v6("?gene"):::projected 
  v1("?geneLabel"):::projected 
  c8([http://purl.uniprot.org/core/Gene]):::iri 
  f0[["(?diseaseType = http://purl.uniprot.org/core/Disease || ?diseaseType = http://www.ebi.ac.uk/efo/EFO_0000651)contains(lower-case(?diseaseLabel),'alzheimer')"]]
  f0 --> v2
  f0 --> v3
  v4 --"a"-->  v2
  v4 --"rdfs:label"-->  v3
  v4 --"dct:source"-->  v5
  v6 --"https://biolink.github.io/biolink-model/associated_with"-->  v4
  v6 --"a"-->  c8
  v6 --"rdfs:label"-->  v1
  v6 --"dct:source"-->  v7
  bind1[/"?gene"/]
  v6 --o bind1
  bind1 --as--o v8
  bind2[/"?geneLabel"/]
  v1 --o bind2
  bind2 --as--o v9
  bind3[/"?gSAB"/]
  v7 --o bind3
  bind3 --as--o v10
  bind4[/"?disease"/]
  v4 --o bind4
  bind4 --as--o v11
  bind5[/"?diseaseLabel"/]
  v3 --o bind5
  bind5 --as--o v12
  bind6[/"?dSAB"/]
  v5 --o bind6
  bind6 --as--o v13
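When a SELECT query such as the one above is requested with `Accept: application/sparql-results+json`, results normally arrive in the W3C SPARQL 1.1 Query Results JSON format. The following is a minimal Python sketch for flattening that format into plain rows. The helper name `flatten_bindings` and the sample payload are made up for illustration; the values are placeholders, not real ProKN output.

```python
import json

def flatten_bindings(payload):
    """Turn a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # Unbound variables are absent from a binding; map them to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows

# Illustrative payload only: placeholder values, not ProKN data.
SAMPLE = json.loads("""
{
  "head": {"vars": ["GeneName", "DiseaseName"]},
  "results": {"bindings": [
    {"GeneName": {"type": "literal", "value": "EXAMPLE_GENE"},
     "DiseaseName": {"type": "literal", "value": "Example disease"}},
    {"GeneName": {"type": "literal", "value": "OTHER_GENE"}}
  ]}
}
""")

rows = flatten_bindings(SAMPLE)
for row in rows:
    print(row)
```

Mapping unbound variables to `None` keeps every row the same shape, which is convenient when a query uses OPTIONAL patterns.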
Find Properties and Relationships Associated with a Specific Gene (e.g., APOE)
PREFIX ns: <https://research.bioinformatics.udel.edu/ProKN/rdf/>
PREFIX upcore: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT 
(?Subject   AS ?SubjectURI)
(?Predicate AS ?PredicateURI)
(?Object    AS ?ObjectValue)
WHERE {
  ?Subject a upcore:Gene;
    rdfs:label "APOE".
  ?Subject ?Predicate ?Object
}
ORDER BY ?Subject
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v3("?Object"):::projected 
  v6("?ObjectValue")
  v2("?Predicate"):::projected 
  v5("?PredicateURI")
  v1("?Subject"):::projected 
  v4("?SubjectURI")
  c2([http://purl.uniprot.org/core/Gene]):::iri 
  c4(["APOE"]):::literal 
  v1 --"a"-->  c2
  v1 --"rdfs:label"-->  c4
  v1 -->v2--> v3
  bind0[/"?Subject"/]
  v1 --o bind0
  bind0 --as--o v4
  bind1[/"?Predicate"/]
  v2 --o bind1
  bind1 --as--o v5
  bind2[/"?Object"/]
  v3 --o bind2
  bind2 --as--o v6
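To look up a gene other than APOE, the symbol in the query above can be substituted programmatically. The sketch below builds the same triple patterns for an arbitrary symbol and escapes the characters that would otherwise break a double-quoted SPARQL string literal. The function names `escape_literal` and `gene_properties_query` are hypothetical helpers, not part of any ProKN client, and the projection is simplified to the raw variables.

```python
def escape_literal(text):
    """Escape characters that are not allowed raw inside a "..." SPARQL literal."""
    return (
        text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
    )

def gene_properties_query(symbol):
    """Build the gene lookup query for the given gene symbol."""
    return f"""PREFIX upcore: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?Subject ?Predicate ?Object
WHERE {{
  ?Subject a upcore:Gene ;
           rdfs:label "{escape_literal(symbol)}" .
  ?Subject ?Predicate ?Object .
}}
ORDER BY ?Subject"""

query = gene_properties_query("APOE")
print(query)
```

Escaping the symbol rather than pasting it in directly avoids malformed queries if the input ever contains a quote or backslash.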
Find phosphorylation sites that are likely to be downregulated by a perturbagen
PREFIX fma: <http://sig.uw.edu/fma#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX bao: <http://www.bioassayontology.org/bao#BAO_>
PREFIX allotrope: <http://purl.allotrope.org/ontologies/result#>
PREFIX biolink: <https://biolink.github.io/biolink-model/>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX eco: <http://purl.obolibrary.org/obo/ECO_>
PREFIX ns: <https://research.bioinformatics.udel.edu/ProKN/rdf/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX upcore: <http://purl.uniprot.org/core/>
PREFIX schema: <http://schema.org/>
PREFIX reproduceme: <https://w3id.org/reproduceme#>

SELECT DISTINCT (?pertLabel as ?Perturbagen) (?e as ?Experiment) (?ptmLabel as ?PhosphorylationSite) (?log2Ratio as ?Value)
WHERE {
  # 1. Find the perturbagen entity
  ?pert a bao:0003059 ;
        rdfs:label ?pertLabel .
  FILTER(LCASE(STR(?pertLabel)) = "selumetinib")

  # 2. Link the perturbagen to the experiment(s) it was used in
  ?pert eco:9000000 ?e .
  
  # 3. Find the reified statements for affected phosphorylation sites
  # In each reified statement, the subject is the PTM site and the object is the experiment
  ?stmt rdf:type rdf:Statement ;
        rdf:predicate biolink:affected_by ;
        rdf:subject ?ptmSite ;
        rdf:object ?e;
        ns:log2Ratio ?log2Ratio ;
        dc:source ?source .
  
  # 4. Filter for downregulation (e.g., log2Ratio <= -1.0)
  FILTER(xsd:decimal(?log2Ratio) <= -1.0)

  # 5. Verify the target is a Phosphorylation PTM
  ?ptmSite dc:type 'PHOSPHORYLATION' ;
           rdfs:label ?ptmLabel .
}
ORDER BY ?log2Ratio
LIMIT 500
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v10("?Experiment")
  v9("?Perturbagen")
  v11("?PhosphorylationSite")
  v12("?Value")
  v4("?e"):::projected 
  v1("?log2Ratio"):::projected 
  v3("?pert")
  v2("?pertLabel"):::projected 
  v8("?ptmLabel"):::projected 
  v6("?ptmSite")
  v7("?source")
  v5("?stmt")
  c7([rdf:Statement]):::iri 
  c15(["PHOSPHORYLATION"]):::literal 
  c4([http://www.bioassayontology.org/bao#BAO_0003059]):::iri 
  c9([https://biolink.github.io/biolink-model/affected_by]):::iri 
  f0[["xsd:decimal(?log2Ratio) <= '-1.0^^xsd:decimal'"]]
  f0 --> v1
  f1[["lower-case(str(?pertLabel)) = 'selumetinib'"]]
  f1 --> v2
  v3 --"a"-->  c4
  v3 --"rdfs:label"-->  v2
  v3 --"obo:ECO_9000000"-->  v4
  v5 --"a"-->  c7
  v5 --"rdf:predicate"-->  c9
  v5 --"rdf:subject"-->  v6
  v5 --"rdf:object"-->  v4
  v5 --"https://research.bioinformatics.udel.edu/ProKN/rdf/log2Ratio"-->  v1
  v5 --"dct:source"-->  v7
  v6 --"dct:type"-->  c15
  v6 --"rdfs:label"-->  v8
  bind2[/"?pertLabel"/]
  v2 --o bind2
  bind2 --as--o v9
  bind3[/"?e"/]
  v4 --o bind3
  bind3 --as--o v10
  bind4[/"?ptmLabel"/]
  v8 --o bind4
  bind4 --as--o v11
  bind5[/"?log2Ratio"/]
  v1 --o bind5
  bind5 --as--o v12
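The downregulation filter in the perturbagen query (`log2Ratio <= -1.0`) is easier to interpret as a fold change. Assuming `log2Ratio` is the base-2 logarithm of the treated-to-control abundance ratio (an assumption, since the source does not define it), a value of -1.0 corresponds to the signal being halved. A small Python sketch of the arithmetic, with hypothetical helper names:

```python
def fold_change(log2_ratio):
    """Convert a log2 ratio back to a linear ratio (assumed treated / control)."""
    return 2.0 ** log2_ratio

def is_downregulated(log2_ratio, threshold=-1.0):
    """Mirror the query's filter: keep values at or below the threshold."""
    return log2_ratio <= threshold

# Illustrative values only, not measurements from ProKN.
for value in (-2.0, -1.0, -0.5, 1.0):
    print(value, fold_change(value), is_downregulated(value))
```

Under that assumption, the default threshold keeps sites whose signal dropped to half or less of the control level; a stricter cutoff such as -2.0 would keep only sites at a quarter or less.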
Find LINCS L1000 compounds that positively or negatively regulate at least one kinase gene and are also perturbagens in LINCS P100
PREFIX ns: <https://research.bioinformatics.udel.edu/ProKN/rdf/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bao: <http://www.bioassayontology.org/bao#BAO_>
PREFIX ncit: <http://purl.obolibrary.org/obo/NCIT_>
PREFIX upcore: <http://purl.uniprot.org/core/>
PREFIX ro: <http://purl.obolibrary.org/obo/RO_>
PREFIX sio: <http://semanticscience.org/resource/SIO_>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX edam: <http://edamontology.org/>

SELECT DISTINCT ?compoundLabel ?pubchemURI
WHERE {
  # Perturbagen
  ?pert rdf:type bao:0003059 ;
        dc:source "LINCS_P100" ;
        obo:has_dbxref ?pubchemURI .

  # Compound
  ?comp rdf:type ncit:C43366 ;
        dc:source "DDKG_LINCS" ;
        rdfs:label ?compoundLabel .
  OPTIONAL {
    ?comp rdfs:seeAlso ?pubchemURI .
  }

  # Relationship: Compound -> Gene
  { ?comp ro:0002213 ?kgene. } # POSITIVELY_REGULATES
  UNION
  { ?comp ro:0002212 ?kgene. } # NEGATIVELY_REGULATES

  # Gene
  ?kgene rdf:type upcore:Gene.

  # Relationship: Gene -> Protein (IS_PROTEIN)
  ?kgene sio:010078 ?pr.

  # Protein
  ?pr rdf:type upcore:Protein ;
    edam:data_1011 ?ecNumber.

  FILTER regex(?ecNumber, "(^|;)2\\.7[^;]*")
}
ORDER BY ?compoundLabel ?pubchemURI
LIMIT 100
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v5("?comp")
  v1("?compoundLabel"):::projected 
  v3("?ecNumber")
  v6("?kgene")
  v4("?pert")
  v7("?pr")
  v2("?pubchemURI"):::projected 
  c5(["LINCS_P100"]):::literal 
  c8(["DDKG_LINCS"]):::literal 
  c7([obo:NCIT_C43366]):::iri 
  c3([http://www.bioassayontology.org/bao#BAO_0003059]):::iri 
  c13([http://purl.uniprot.org/core/Gene]):::iri 
  c15([http://purl.uniprot.org/core/Protein]):::iri 
  f0[["regex(?ecNumber,'(^|;)2\.7#91;^;#93;*')"]]
  f0 --> v3
  v4 --"a"-->  c3
  v4 --"dct:source"-->  c5
  v4 --"obo:has_dbxref"-->  v2
  v5 --"a"-->  c7
  v5 --"dct:source"-->  c8
  v5 --"rdfs:label"-->  v1
  subgraph optional0["(optional)"]
  style optional0 fill:#bbf,stroke-dasharray: 5 5;
    v5 -."rdfs:seeAlso".->  v2
  end
  subgraph union0[" Union "]
  subgraph union0l[" "]
    style union0l fill:#abf,stroke-dasharray: 3 3;
    v5 --"obo:RO_0002212"-->  v6
  end
  subgraph union0r[" "]
    style union0r fill:#abf,stroke-dasharray: 3 3;
    v5 --"obo:RO_0002213"-->  v6
  end
  union0r <== or ==> union0l
  end
  v6 --"a"-->  c13
  v6 --"http://semanticscience.org/resource/SIO_010078"-->  v7
  v7 --"a"-->  c15
  v7 --"http://edamontology.org/data_1011"-->  v3
List All Diseases and Their Names and Sources
PREFIX ns: <https://research.bioinformatics.udel.edu/ProKN/rdf/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX upcore: <http://purl.uniprot.org/core/>
PREFIX efo: <http://www.ebi.ac.uk/efo/EFO_>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT 
(?disease AS ?DiseaseURI)
(?label AS ?Name)
(?SAB AS ?Source)
WHERE {
  ?disease a ?diseaseType .

  # Keep entities typed as either Disease or DiseaseOrPhenotype
  FILTER(?diseaseType = upcore:Disease || ?diseaseType = efo:0000651)

  ?disease rdfs:label ?label .
  ?disease dc:source ?SAB .
}
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v5("?DiseaseURI")
  v6("?Name")
  v4("?SAB"):::projected 
  v7("?Source")
  v2("?disease"):::projected 
  v1("?diseaseType")
  v3("?label"):::projected 
  f0[["(?diseaseType = http://purl.uniprot.org/core/Disease || ?diseaseType = http://www.ebi.ac.uk/efo/EFO_0000651)"]]
  f0 --> v1
  v2 --"a"-->  v1
  v2 --"rdfs:label"-->  v3
  v2 --"dct:source"-->  v4
  bind1[/"?disease"/]
  v2 --o bind1
  bind1 --as--o v5
  bind2[/"?label"/]
  v3 --o bind2
  bind2 --as--o v6
  bind3[/"?SAB"/]
  v4 --o bind3
  bind3 --as--o v7
Find protein kinases in ProKN
PREFIX upcore: <http://purl.uniprot.org/core/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX edam: <http://edamontology.org/>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT ?kinase ?accession ?ecNumber WHERE {
    ?kinase a upcore:Protein ;
            dc:source "UniProtKB" ;
            obo:NCIT_C25402 ?accession ;
            edam:data_1011 ?ecNumber .
    FILTER regex(?ecNumber, "(^|;)2\\.7[^;]*")
}
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v3("?accession"):::projected 
  v1("?ecNumber"):::projected 
  v2("?kinase"):::projected 
  c5(["UniProtKB"]):::literal 
  c3([http://purl.uniprot.org/core/Protein]):::iri 
  f0[["regex(?ecNumber,'(^|;)2\.7#91;^;#93;*')"]]
  f0 --> v1
  v2 --"a"-->  c3
  v2 --"dct:source"-->  c5
  v2 --"obo:NCIT_C25402"-->  v3
  v2 --"http://edamontology.org/data_1011"-->  v1
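The kinase filter above relies on the regular expression `(^|;)2\.7[^;]*` (written as `"\\."` inside the SPARQL string so the regex engine sees `\.`). The pattern suggests that a protein's EC numbers are stored as one semicolon-separated string, which is an assumption inferred from the regex rather than something the source states. The Python sketch below checks the same pattern against a few illustrative strings; the sample values were chosen for demonstration and were not read from ProKN.

```python
import re

# Same pattern as the SPARQL FILTER: an EC number starting with "2.7",
# either at the start of the string or right after a ";" separator.
KINASE_EC = re.compile(r"(^|;)2\.7[^;]*")

# Illustrative EC strings, assumed to be ";"-separated.
samples = [
    "2.7.11.1",            # starts with 2.7
    "3.1.3.16;2.7.10.1",   # second entry starts with 2.7
    "3.2.7.1",             # contains "2.7" but not at the start of an entry
    "1.2.7.4",             # same: "2.7" sits inside the number
]

matches = {ec: bool(KINASE_EC.search(ec)) for ec in samples}
for ec, hit in matches.items():
    print(ec, hit)
```

Anchoring on `(^|;)` is what prevents entries such as `3.2.7.1` from matching. Note that EC 2.7 covers transferases of phosphorus-containing groups in general, so the filter may be broader than protein kinases alone.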
Find Properties and Relationships Associated with a Specific Protein (e.g., TP53)
PREFIX ns: <https://research.bioinformatics.udel.edu/ProKN/rdf/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX upcore: <http://purl.uniprot.org/core/>

SELECT DISTINCT 
(?Subject   AS ?SubjectURI)
(?Predicate AS ?PredicateURI)
(?Object    AS ?ObjectValue)
WHERE {
  ?Subject a upcore:Protein;
    obo:NCIT_C164806 "TP53".
  ?Subject ?Predicate ?Object
}
ORDER BY ?Subject
graph TD
classDef projected fill:lightgreen;
classDef literal fill:orange;
classDef iri fill:yellow;
  v3("?Object"):::projected 
  v6("?ObjectValue")
  v2("?Predicate"):::projected 
  v5("?PredicateURI")
  v1("?Subject"):::projected 
  v4("?SubjectURI")
  c2([http://purl.uniprot.org/core/Protein]):::iri 
  c4(["TP53"]):::literal 
  v1 --"a"-->  c2
  v1 --"obo:NCIT_C164806"-->  c4
  v1 -->v2--> v3
  bind0[/"?Subject"/]
  v1 --o bind0
  bind0 --as--o v4
  bind1[/"?Predicate"/]
  v2 --o bind1
  bind1 --as--o v5
  bind2[/"?Object"/]
  v3 --o bind2
  bind2 --as--o v6
SPARQL Endpoint https://frink.apps.renci.org/prokn/sparql
Triple Pattern Fragments https://frink.apps.renci.org/ldf/prokn
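
Any of the queries on this page can be sent to the SPARQL endpoint listed above over HTTP. The following is a minimal Python sketch using only the standard library. It assumes the endpoint accepts a GET request with a `query` parameter as described by the SPARQL 1.1 Protocol and can return JSON results; neither point is confirmed by this page. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://frink.apps.renci.org/prokn/sparql"

def build_request(query, endpoint=ENDPOINT):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format (assumed to be supported).
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

def run_query(query, timeout=60):
    """Send the query and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        return json.load(response)

# Building the request does not touch the network.
request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Calling `run_query(...)` with one of the example queries would perform the actual request. For very long queries, a POST request may be preferable because some servers limit URL length.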