In many technical domains, generic problem-solving knowledge is scarce even though a large number of concrete resolutions exist and are well documented. Machine learning from resolution traces therefore faces a number of challenges, not least the complexity of the underlying domain (concepts, relationships, events, processes, etc.) and the limited machine-readability of the documented resolutions. We tackle here the acquisition of expertise in phylogeny, a notoriously rich and prolific field where hundreds, if not thousands, of concrete cases are reported in the literature, yet tools to assist the phylogenist in analyzing a new dataset are virtually absent. We therefore propose an approach that amounts to ontology-based workflow mining: our T-GOWLer system abstracts general patterns from event sequences previously extracted from texts. It comprises two modules, a workflow extractor and a pattern miner, both relying on a domain-specific ontology.

Source code is available on our GitHub page: https://github.com/halioui/tgowler

Prerequisites

JAVA 1.8: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
PYTHON 2.7: https://www.python.org/download/releases/2.7
TOMCAT 8.0: https://tomcat.apache.org/download-80.cgi
SESAME OpenRDF 2.7.16: https://sourceforge.net/projects/sesame/files/Sesame%202/2.7.16
GATE 8.1: https://gate.ac.uk/download/

Data

Please refer to the data.world page to access the data, or download them from here. This page hosts only additional data, such as previously published extracted workflows and the un-annotated texts (see below).

Tools

WfExtractor_1.0: Workflow Extractor on GATE

Download WfExtractor 1.0 [139M]

The WfExtractor_1.0 tool annotates a text corpus with the phylogenetic analysis workflows it describes. Some of the features of WfExtractor_1.0 are:

  • Extract workflow components (programs, parameters, data and metadata) from texts
  • Extract data flows (relations) from texts
  • Create WSD (Word Sense Disambiguation) models for both components and relations
  • Export the corpus as GATE inline XML

Input data:

datastore_2018_2019.zip [248M]
datastore_gold.zip [17M]
tgowler_resource_ontologies_PHAGE_1.0.rdf.zip [19M]

Output data:

annotated_2018_2019.zip [12M]
annotated_gold_100.zip [180K]
annotated_gold_500.zip [800K]
WF_2018_2019.xml [6.6M]
WF_gold_100.xml [136K]
WF_gold_500.xml [564K]

Installation:

  1. Unzip the WfExtractor_1.0.zip file.
  2. Import all files from $WfExtractor_HOME/plugins to the $GATE_HOME/plugins directory
  3. Load the PHAGE ontology via Tomcat (see the Sesame deployment guide)
  4. If Java reports an error (for example, that the XML entity expansion limit was exceeded), relax the JDK's safeguard against entity expansion attacks and raise the heap size in the $TOMCAT_HOME/bin/catalina.sh file:
    JAVA_OPTS="$JAVA_OPTS -Djdk.xml.entityExpansionLimit=100000000 -Djdk.xml.FEATURE_SECURE_PROCESSING=false -Xmx6G"
  5. Edit the Gazetteer_LKB dictionary configuration file $WfExtractor_HOME/application-resources/Dictionary_from_remote_repository/config.ttl and set the repository information for your ontology:
    hr:repositoryURL <YOUR_HTTP_REPOSITORY>
    rep:repositoryID "[YOUR_REPOSITORY_ID]"
    rdfs:label "[YOUR_REPOSITORY_LABEL]"
    For example:
    hr:repositoryURL <http://localhost:8080/openrdf-sesame/repositories/phage11>
    rep:repositoryID "phage11" ;
    rdfs:label "PHAGE_1.1" .
  6. Open GATE and import the application file WfExtractor1.0.xgapp from $WfExtractor_HOME
  7. Run the application (see the GATE 8.1 Developer Guide).
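
Before running the application, it can help to confirm that the repository URL written in config.ttl actually answers. The sketch below assumes the Sesame 2 HTTP protocol, in which a GET on the repository URL followed by /size returns the number of statements as plain text. The helper names are illustrative and not part of WfExtractor; the URL is the one from the example above.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def size_endpoint(repository_url):
    """Return the URL of the repository's statement-count endpoint."""
    return repository_url.rstrip("/") + "/size"


def repository_size(repository_url, timeout=10):
    """Ask the Sesame server how many statements the repository holds."""
    response = urlopen(size_endpoint(repository_url), timeout=timeout)
    try:
        return int(response.read().decode("utf-8").strip())
    finally:
        response.close()


if __name__ == "__main__":
    url = "http://localhost:8080/openrdf-sesame/repositories/phage11"
    print(size_endpoint(url))
    # Requires a running Tomcat/Sesame instance:
    # print(repository_size(url))
```

A count of zero, or a connection error, suggests that the ontology was not loaded or that the URL in config.ttl does not match the deployed repository.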

WfMiner_1.1: Workflow Pattern Miner and Rule Recommender

Download WfMiner 1.1 [3.5M]

WfMiner_1.1 mines abstract closed patterns and generates association rules from XML workflow sequence files and a domain-specific ontology.
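
To make the notion of an abstract pattern concrete: a pattern over ontology concepts can be said to match a workflow when each of its items generalizes some event of the workflow, in order. The sketch below counts the support of such a pattern over a toy set of sequences. The taxonomy, the concept names, and the subsequence-style matching are illustrative assumptions. This is not WfMiner's algorithm, which also restricts itself to closed patterns and works on the actual domain ontology.

```python
# Toy taxonomy: child -> parent. Concept names are hypothetical, not PHAGE's.
PARENT = {
    "ClustalW": "AlignmentProgram",
    "MUSCLE": "AlignmentProgram",
    "PhyML": "TreeInferenceProgram",
    "MrBayes": "TreeInferenceProgram",
    "AlignmentProgram": "Program",
    "TreeInferenceProgram": "Program",
}


def ancestors_or_self(item):
    """Return the item together with all of its ancestors in the taxonomy."""
    found = {item}
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


def matches(pattern, sequence):
    """True if the pattern's concepts generalize a subsequence of the events."""
    position = 0
    for event in sequence:
        if position < len(pattern) and pattern[position] in ancestors_or_self(event):
            position += 1
    return position == len(pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain the (abstract) pattern."""
    return sum(matches(pattern, s) for s in sequences) / float(len(sequences))


workflows = [
    ["ClustalW", "PhyML"],
    ["MUSCLE", "MrBayes"],
    ["PhyML", "ClustalW"],
    ["MUSCLE"],
]
pattern = ["AlignmentProgram", "TreeInferenceProgram"]
print(support(pattern, workflows))  # 0.5
```

In this toy data the concrete pattern ClustalW followed by PhyML has a support of only 0.25, whereas its generalization reaches 0.5, which is why mining at the level of ontology concepts can surface regularities that instance-level mining would discard under the same minimum support.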

Input data:

WFMiner_WF_2018_2019.xml [13M]
WFMiner_WF_gold_100.xml [252K]
WFMiner_WF_gold_500.xml [1.1M]

Output data:

TODO

Installation:

  1. Launch the bowlUtil_0.5 tool and transform the OWL ontology into a binary (bowl) one (see the README file in $WFMINER_HOME/bowlUtil/). The bowl transformation speeds up the mining process by loading a lighter version of the ontology. Note: to skip this step, use the bowl version of the ontology from the input data (above). Also, don't forget to download the Gene Ontology (OWL version).
  2. Unzip the WfMiner_1.1.zip file and launch the miner from your shell with the following command (see the README file in $WFMINER_HOME/):
    java -jar [PATH_TO]/OntoPattern16.jar "[minSupp]" "[PATH_TO]/[bowl_file]" "[PATH_TO]/[train_set]" "[namespace]" "[PATH_TO]/[test_set]" "[topkItems]" "[topnRules]" "[minontology_level]"

    For example:
    java -jar ./OntoPattern16.jar "0.1" "./phylOntology_v51_small_final.bowl" "./WD-Phy-extracted-1/WD-Phy-extracted-1_2783_0.xml" "http://www.co-ode.org/ontologies/ont.owl#" "./WD-Phy-gold-1.xml" "10" "50" "2"
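
Because the eight arguments are positional, it is easy to swap two of them. A small wrapper that names each argument can help. The sketch below only assembles the command in the order given above; the function and parameter names are illustrative, and the meaning of each argument beyond its placeholder name is not documented on this page.

```python
import subprocess


def wfminer_command(jar, min_supp, bowl_file, train_set, namespace,
                    test_set, top_k_items, top_n_rules, min_ontology_level):
    """Assemble the WfMiner call, keeping the positional order shown above."""
    return [
        "java", "-jar", jar,
        str(min_supp), bowl_file, train_set, namespace,
        test_set, str(top_k_items), str(top_n_rules), str(min_ontology_level),
    ]


command = wfminer_command(
    jar="./OntoPattern16.jar",
    min_supp=0.1,
    bowl_file="./phylOntology_v51_small_final.bowl",
    train_set="./WD-Phy-extracted-1/WD-Phy-extracted-1_2783_0.xml",
    namespace="http://www.co-ode.org/ontologies/ont.owl#",
    test_set="./WD-Phy-gold-1.xml",
    top_k_items=10,
    top_n_rules=50,
    min_ontology_level=2,
)
print(" ".join(command))
# To actually run it (requires Java and the files above):
# subprocess.check_call(command)
```

Passing the command as a list avoids shell quoting problems with the namespace, which ends in a # character.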

Other T-GOWLer tools

WfTransformer_1.0: Download WfTransformer 1.0 [2.9K]


This tool transforms the GATE inline XML into sequences of events (encoded in a simple XML tree).
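
The idea can be sketched in a few lines: walk the inline-annotated document, keep the annotated spans in document order, and re-encode them as a flat sequence. The annotation types (Program, Parameter), the sample sentence, and the output element names below are hypothetical; the actual types and output schema are defined by WfExtractor and WfTransformer and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline-annotated sentence; real annotation types may differ.
INLINE_XML = (
    "<doc>Sequences were aligned with <Program>MUSCLE</Program> and a tree "
    "was inferred with <Program>PhyML</Program> under the "
    "<Parameter>GTR</Parameter> model.</doc>"
)

EVENT_TYPES = {"Program", "Parameter"}  # assumed annotation types


def extract_events(inline_xml):
    """Return (type, text) pairs for annotated spans, in document order."""
    root = ET.fromstring(inline_xml)
    return [(el.tag, "".join(el.itertext()).strip())
            for el in root.iter() if el.tag in EVENT_TYPES]


def to_sequence_xml(events):
    """Encode the events as a flat <sequence> of <event> elements."""
    sequence = ET.Element("sequence")
    for kind, text in events:
        event = ET.SubElement(sequence, "event", {"type": kind})
        event.text = text
    return ET.tostring(sequence).decode("ascii")


events = extract_events(INLINE_XML)
print(to_sequence_xml(events))
```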

WfSimulator_1.0: Download WfSimulator 1.0 [7.5K]


This tool simulates phylogenetic workflows using instances encoded in the PHAGE ontology. A priori abstract patterns provided by an expert guide the workflow reconstruction. The simulator is based on a Monte Carlo simulation that fixes a number of parameters in each run to generate event sequences.
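
One possible reading of this procedure, as a minimal sketch: for each run, every abstract step of an expert pattern is replaced by a randomly drawn instance of that concept, except for the steps that are held fixed. The concept names, the instance lists, and the function signature are hypothetical stand-ins for what the ontology would provide; this is not the WfSimulator code.

```python
import random

# Hypothetical stand-ins for ontology instances, grouped by abstract concept.
INSTANCES = {
    "AlignmentProgram": ["ClustalW", "MUSCLE"],
    "TreeInferenceProgram": ["PhyML", "MrBayes"],
}


def simulate(pattern, runs, seed=None, fixed=None):
    """Draw one concrete instance per abstract step, for each run.

    `fixed` pins chosen concepts to a given instance in every run.
    """
    rng = random.Random(seed)
    fixed = fixed or {}
    sequences = []
    for _ in range(runs):
        sequences.append(
            [fixed.get(c) or rng.choice(INSTANCES[c]) for c in pattern]
        )
    return sequences


pattern = ["AlignmentProgram", "TreeInferenceProgram"]
for sequence in simulate(pattern, runs=3, seed=1,
                         fixed={"AlignmentProgram": "MUSCLE"}):
    print(sequence)
```

Seeding the generator makes a simulated corpus reproducible, which is useful when the generated sequences are later compared against mined patterns.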

Other Sample Data

Ontologies

PHAGE-schema_1.0.owl.zip [4.8K] (or use the BioPortal repository for a graphical view)
PHAGE_1.0-full.owl.zip [19M]

Annotated texts

Corpus_PMC_2015_1_goldStandard_(gate_datastore).zip [17M]
Corpus_PMC_2013_2015_annotated.zip [1.3M] (corrupted)

Extracted Workflows

WFPub-1-PMC_2008-2013.zip [2.9M]
WFPub-2-PMC_2013-2015.zip [164K]
WFSim.zip [33K]

Contact

For any technical issues, please e-mail the administrator: halioui.ahmed@uqam.ca

Funding

This work has been supported by NSERC (Canada) Discovery Grants awarded to Petko Valtchev and Abdoulaye Baniré Diallo.

Contact : Abdoulaye Baniré Diallo   Webmaster : Amine M. Remita