

In many technical domains, generic problem-solving knowledge is scarce even though a large number of concrete resolutions exist and are well documented. Machine learning from resolution traces therefore faces a number of challenges, not least the complexity of the underlying domain (concepts, relationships, events, processes, etc.) and the machine-readability of the documented resolutions. We tackle here the acquisition of expertise in phylogeny, a notoriously rich and prolific field where hundreds, if not thousands, of concrete cases are reported in the literature, yet tools to assist the phylogenist in analyzing a new dataset are virtually absent. We therefore propose an approach that amounts to ontology-based workflow mining: our T-GOWLer system abstracts general patterns from event sequences previously extracted from texts. It comprises two modules, a workflow extractor and a pattern miner, both relying on a specific domain ontology.

Source code is available on our GitHub page: https://github.com/halioui/tgowler

Prerequisites

JAVA 1.8: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
PYTHON 2.7: https://www.python.org/download/releases/2.7
TOMCAT 8.0: https://tomcat.apache.org/download-80.cgi
SESAME OpenRDF 2.7.16: https://sourceforge.net/projects/sesame/files/Sesame%202/2.7.16
GATE 8.1: https://gate.ac.uk/download/
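
Before installing the tools, it may help to confirm that the command-line prerequisites are reachable. The sketch below only checks that java and python are on the PATH; the helper name check_tool is ours and is not part of T-GOWLer. Tomcat, Sesame, and GATE are not checked because their launchers depend on where they were unpacked.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of T-GOWLer): it reports
# whether a command can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the two command-line prerequisites; keep going if one is missing.
for tool in java python; do
    check_tool "$tool" || true
done
```

When both are found, running java -version and python --version shows the installed versions, which should match the 1.8 and 2.7 releases listed above.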

Tools

WfExtractor_1.0: Workflow Extractor on GATE

Download WfExtractor 1.0 [139M]

The WfExtractor_1.0 tool annotates a text corpus with the phylogenetic analysis workflows it describes. Its features include:

  • Extract workflow components (programs, parameters, data and metadata) from texts
  • Extract data flows (relations) from texts
  • Create WSD (Word Sense Disambiguation) models for both components and relations
  • Export the corpus as GATE inline XML.

Input data:

CorpusPMC2015 [110K]
PHAGE Ontology v1.1 (rdf/zip version) [19M], importing the Gene Ontology (owl version) [7.2M]

Output data:

Annotated CorpusPMC2015 [202K]

Installation:

  1. Unzip the WfExtractor_1.0.zip file.
  2. Import all files from $WfExtractor_HOME/plugins to the $GATE_HOME/plugins directory
  3. Load the PHAGE ontology via Tomcat (see Sesame deployment guide)
  4. If Java reports an error, raise the XML entity expansion limit and the heap size in the $TOMCAT_HOME/bin/catalina.sh file (note that this relaxes the JDK's safeguard against entity expansion attacks):
    JAVA_OPTS="$JAVA_OPTS -Djdk.xml.entityExpansionLimit=100000000 -Djdk.xml.FEATURE_SECURE_PROCESSING=false -Xmx6G"
  5. Edit the Gazetteer_LKB dictionary configuration file $WfExtractor_HOME/application-resources/Dictionary_from_remote_repository/config.ttl, changing the ontology information:
    hr:repositoryURL <YOUR_HTTP_REPOSITORY>
    rep:repositoryID "[YOUR_REPOSITORY_ID]"
    rdfs:label "[YOUR_REPOSITORY_LABEL]"
    For example:
    hr:repositoryURL <http://localhost:8080/openrdf-sesame/repositories/phage11>
    rep:repositoryID "phage11" ;
    rdfs:label "PHAGE_1.1" .
  6. Open GATE and import the application file WfExtractor1.0.xgapp from $WfExtractor_HOME
  7. Run the application (see GATE 8.1 Developer Guide).
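
Steps 1 and 2 of the installation can be scripted. The following is a minimal sketch, assuming the archive has already been unzipped; install_plugins is a hypothetical helper of ours and is not shipped with WfExtractor. It only copies the plugin directories and leaves the ontology and configuration steps to be done by hand.

```shell
#!/bin/sh
# install_plugins is a hypothetical helper (not shipped with WfExtractor):
# it copies every entry of SRC/plugins into DST/plugins (step 2 above).
install_plugins() {
    src=$1    # unzipped WfExtractor directory ($WfExtractor_HOME)
    dst=$2    # GATE installation directory ($GATE_HOME)
    if [ ! -d "$src/plugins" ]; then
        echo "no plugins directory under $src" >&2
        return 1
    fi
    mkdir -p "$dst/plugins" || return 1
    # The trailing "/." copies the directory contents, including hidden files.
    cp -R "$src/plugins/." "$dst/plugins/"
}
```

With both variables set, the call would be install_plugins "$WfExtractor_HOME" "$GATE_HOME". Existing plugins with the same names in $GATE_HOME/plugins would be overwritten, so a backup may be worth taking first.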

WfMiner_1.1: Workflow Pattern Miner and Rule Recommender

Download WfMiner 1.1 [3.5M]

WfMiner_1.1 mines abstract closed patterns and generates association rules from XML workflow sequence files and a specific domain ontology.

Input data:

Extracted workflows from CorpusPMC2015 (training set) [1.4M]
PHAGE Ontology v1.2 (small bowl version) [50K]
Ground truth (workflows to recommend) [48K]

Output data:

Sample of generated patterns from CorpusPMC2015 [3.2M]
Sample of the top 50 rules from CorpusPMC2015 [11K]

Installation:

  1. Launch the bowlUtil_0.5 tool and transform the OWL ontology into a binary one (see the README file in $WFMINER_HOME/bowlUtil/). The bowl transformation speeds up the mining process by loading a lighter version of the ontology. Note: to skip this step, use the bowl version of the ontology from the input data (above), and do not forget to download the Gene Ontology (owl version).
  2. Unzip the WfMiner_1.1.zip file and launch the WfMiner miner with the following command in your shell (see the README file in $WFMINER_HOME/):
    java -jar [PATH_TO]/OntoPattern16.jar "[minSupp]" "[PATH_TO]/[bowl_file]" "[PATH_TO]/[train_set]" "[namespace]" "[PATH_TO]/[test_set]" "[topkItems]" "[topnRules]" "[minontology_level]"

    For example:
    java -jar ./OntoPattern16.jar "0.1" "./phylOntology_v51_small_final.bowl" "./WD-Phy-extracted-1/WD-Phy-extracted-1_2783_0.xml" "http://www.co-ode.org/ontologies/ont.owl#" "./WD-Phy-gold-1.xml" "10" "50" "2"
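
Because the miner takes eight positional arguments, it is easy to swap two of them. The sketch below is a dry-run helper that prints the launch command so the order (minSupp, bowl file, training set, namespace, test set, topkItems, topnRules, minontology_level) can be reviewed before running it. The name wfminer_command is hypothetical and is not part of WfMiner; the file names are those of the example.

```shell
#!/bin/sh
# wfminer_command is a hypothetical helper (not part of WfMiner): it prints
# the launch command with the eight arguments in the documented order,
# without running it.
wfminer_command() {
    if [ "$#" -ne 9 ]; then
        echo "usage: wfminer_command JAR MINSUPP BOWL TRAIN NAMESPACE TEST TOPK TOPN LEVEL" >&2
        return 1
    fi
    jar=$1
    shift
    printf 'java -jar %s' "$jar"
    # Quote each remaining argument, as in the documented command line.
    for arg in "$@"; do
        printf ' "%s"' "$arg"
    done
    printf '\n'
}

# Dry run with the values from the example.
wfminer_command ./OntoPattern16.jar 0.1 \
    ./phylOntology_v51_small_final.bowl \
    ./WD-Phy-extracted-1/WD-Phy-extracted-1_2783_0.xml \
    "http://www.co-ode.org/ontologies/ont.owl#" \
    ./WD-Phy-gold-1.xml 10 50 2
```

The printed line can then be pasted into the shell once the paths have been checked.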

Other T-GOWLer tools

WfTransformer_1.0: Download WfTransformer 1.0 [2.9K]


This tool transforms the GATE inline XML into sequences of events (encoded in a simple XML tree).

WfSimulator_1.0: Download WfSimulator 1.0 [7.5K]


This tool simulates phylogenetic workflows using instances encoded in the PHAGE ontology, with a priori abstract patterns provided by an expert guiding the workflow reconstruction. The simulator is based on a Monte Carlo simulation that fixes a number of parameters on each run to generate event sequences.

Other Sample Data

Ontologies

PHAGE-schema_1.0.owl.zip [4.8K] (or use the BioPortal repository for a graphical view)
PHAGE_1.0-full.owl.zip [19M]

Annotated texts

Corpus_PMC_2015_1_goldStandard_(gate_datastore).zip [17M]
Corpus_PMC_2013_2015_annotated.zip [1.3M]

Extracted Workflows

WFPub-1-PMC_2008-2013.zip [2.9M]
WFPub-2-PMC_2013-2015.zip [164K]
WFSim.zip [33K]

Contact

For any technical issues, please e-mail the admin: halioui.ahmed@uqam.ca

Funding

This work was supported by NSERC (Canada) Discovery Grants awarded to Petko Valtchev and Abdoulaye Baniré Diallo.

Contact: Abdoulaye Baniré Diallo   Webmaster: Mohamed Amine Remita