Troubleshooting / admin guide

Workflow execution

The following operations are performed when a workflow is run from the VBrowser moteur plugin.

  1. VBrowser plugin
    • connects to https://<servername>/moteur_server (as configured in the moteur plugin)
    • sends the workflow (workflow.scufl), the inputs (input.xml) and the user proxy
  2. moteur_server
    • creates ${DOCUMENT_ROOT}/workflows/ if it does not exist
    • creates ${DOCUMENT_ROOT}/workflows/.htaccess to define the .vljids, .rdf, .err and .out file types
    • creates ${DOCUMENT_ROOT}/workflows/workflow-XXXXXX and writes the received workflow (e.g. hello.scufl) in it as workflow.xml and the inputs as inputs.xml
    • writes the proxy to /tmp/workflow-XXXXXX-proxy (with permissions 400)
    • launches ./submitWorkflow.sh ${DOCUMENT_ROOT}/workflows/workflow-XXXXXX/workflow.xml ${DOCUMENT_ROOT}/workflows/workflow-XXXXXX/inputs.xml /tmp/workflow-XXXXXX-proxy >> ./moteur_service.log
    • builds the monitoring URL https://<hostname>/workflows/workflow-XXXXXX/html/workflow-XXXXXX.html
    • returns this URL to the VBrowser plugin
  3. submitWorkflow.sh
    • sources ./env.sh
    • gets the user's DN from the proxy
    • checks whether the DN already has a proxy by reading ./moteur_users.txt. If yes, overwrites the old proxy with the new one; if not, adds a line to ./moteur_users.txt
    • creates the default HTML monitoring page served at https://<hostname>/workflows/workflow-XXXXXX/html/workflow-XXXXXX.html
    • reads the grid type from ./conf/grid.conf. If it is DIANE, calls startMaster.sh and startAgents.sh; otherwise, does nothing
    • changes to ${DOCUMENT_ROOT}/workflows/workflow-XXXXXX/ and launches the moteur engine there, writing its logs to workflow.{out,err}
  4. moteur engine
    • parses workflow.xml, instantiates it on inputs.xml and makes the corresponding component calls. Grid jobs (i.e. workflow processors identified by the GASW_execution operation) end up as JNI calls to methods implemented in libTask.so, then libGasw.so and libGrid.so (located in /var/www/cgi-bin/<server_name>/lib)
    • generates the HTML monitoring files in ./html
  5. grid job execution
    • checks whether the XML (GASW) descriptor is in ./gasw
    • if not, downloads the GASW descriptor into ./gasw, depending on the URI protocol (lfn: → lcg-cp; http: → wget; file: → cp)
    • reads ./conf/grid.conf and sets the grid type (GLITE_WMS, LCG or DIANE), the default job requirements, the VO, the upload SE, the timeout, the default retry count and the environment to use on the worker nodes
    • parses the GASW description and builds the job's shell (bash) script from it, saved in ./sh/<jobname>.sh, where <jobname> is `basename gaswfile.xml`-<jvm pid>-<time>-<rand>. Requirements and retry count found in the GASW file override the default values read from ./conf/grid.conf
    • for the GLITE_WMS and LCG grid types, builds a JDL executing this shell script and stores it in ./jdl/<jobname>.jdl
    • submits the JDL (or the shell script, for the DIANE grid type), checks its status and resubmits it while the exit code is non-zero, until it succeeds or the retry count is reached
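
The moteur_server preparation step above can be sketched as a shell function. This is a minimal sketch, not the actual moteur_server code. The function name prepare_workflow_dir, the PROXY_DIR override (defaulting to /tmp) and the MIME types written to .htaccess are assumptions. The sketch also stops before launching submitWorkflow.sh.

```shell
# Sketch of the moteur_server preparation step; not the real moteur_server code.
# Assumptions: DOCUMENT_ROOT is set, PROXY_DIR defaults to /tmp, and the MIME
# types declared in .htaccess are illustrative guesses.
prepare_workflow_dir() {
    # $1 = received .scufl workflow, $2 = received inputs file,
    # $3 = received user proxy, $4 = host name used in the monitoring URL
    local scufl=$1 inputs=$2 proxy=$3 host=$4
    local wfroot="$DOCUMENT_ROOT/workflows" wfdir id proxyfile

    mkdir -p "$wfroot" || return 1

    # Declare the extra file types served from the workflows tree (types assumed).
    cat > "$wfroot/.htaccess" <<'EOF'
AddType text/plain .vljids .err .out
AddType application/rdf+xml .rdf
EOF

    # mktemp replaces XXXXXX with a unique suffix, giving workflow-XXXXXX.
    wfdir=$(mktemp -d "$wfroot/workflow-XXXXXX") || return 1
    chmod 755 "$wfdir"   # mktemp creates it 700; assumed the web server must read it
    id=$(basename "$wfdir")

    cp "$scufl" "$wfdir/workflow.xml" || return 1
    cp "$inputs" "$wfdir/inputs.xml" || return 1

    # Store the proxy outside the web tree, readable by its owner only (mode 400).
    proxyfile="${PROXY_DIR:-/tmp}/$id-proxy"
    (umask 077 && cp "$proxy" "$proxyfile") && chmod 400 "$proxyfile" || return 1

    # Monitoring page URL handed back to the VBrowser plugin.
    echo "https://$host/workflows/$id/html/$id.html"
}
```

Using mktemp -d keeps each workflow-XXXXXX name unique even when several workflows are submitted at the same time.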
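
The DN/proxy bookkeeping done by submitWorkflow.sh amounts to an update-or-append on ./moteur_users.txt. The sketch below assumes a file format that this guide does not document: one tab-separated "<DN> <proxy path>" line per user. register_proxy is a hypothetical name. The DN is passed in as an argument rather than extracted from the proxy.

```shell
# Sketch of the DN/proxy bookkeeping in submitWorkflow.sh. The real layout of
# moteur_users.txt is not documented here; this assumes one "<DN><TAB><proxy>"
# line per user. Extracting the DN from the proxy is left to the caller.
register_proxy() {
    # $1 = users file, $2 = user DN, $3 = path of the new proxy
    local users=$1 tmp="$1.tmp.$$"
    touch "$users" || return 1
    # DN and proxy go through the environment so awk compares them literally
    # (awk -v would interpret backslash escapes).
    DN=$2 PROXY=$3 awk '
        BEGIN { FS = OFS = "\t"; dn = ENVIRON["DN"]; proxy = ENVIRON["PROXY"] }
        $1 == dn { $2 = proxy; found = 1 }   # known DN: replace the old proxy
        { print }
        END { if (!found) print dn, proxy }  # new DN: append a line
    ' "$users" > "$tmp" && mv "$tmp" "$users"
}
```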
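
The grid-type check in submitWorkflow.sh might look as follows. The GRID key and the KEY=VALUE syntax of ./conf/grid.conf are assumptions made for illustration. The function names read_grid_type and start_grid_services are also hypothetical.

```shell
# Sketch of the grid-type check in submitWorkflow.sh. The key name GRID and the
# "KEY=VALUE" syntax of grid.conf are assumptions, not the documented format.
read_grid_type() {
    # $1 = path to grid.conf; prints the value of the last GRID=... line
    sed -n 's/^[[:space:]]*GRID[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | tail -n 1
}

start_grid_services() {
    # $1 = grid type, $2 = directory holding startMaster.sh and startAgents.sh
    case $1 in
        DIANE) "$2/startMaster.sh" && "$2/startAgents.sh" ;;
        *)     : ;;  # any other grid type (GLITE_WMS, LCG): nothing to start
    esac
}
```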
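
The GASW descriptor lookup can be sketched as below: the descriptor in ./gasw is reused if it is already there, and otherwise it is downloaded according to its URI protocol. Several details are assumptions: caching by base name, the function name fetch_gasw, and the exact lcg-cp and wget options. lcg-cp also needs a valid proxy and a grid environment, presumably the one set up through env.sh.

```shell
# Sketch of the GASW descriptor lookup done before building a job. Caching by
# base name and the exact lcg-cp/wget invocations are assumptions; lcg-cp also
# needs a valid proxy and grid environment (presumably provided via env.sh).
fetch_gasw() {
    # $1 = descriptor URI (lfn:, http: or file:), $2 = cache directory (./gasw)
    local uri=$1 cache=$2 name dest
    name=$(basename "$uri")
    dest="$cache/$name"
    if [ -f "$dest" ]; then              # already downloaded: reuse it
        echo "$dest"; return 0
    fi
    mkdir -p "$cache" || return 1
    case $uri in
        lfn:*)  lcg-cp "$uri" "file:$(cd "$cache" && pwd)/$name" ;;
        http:*) wget -q -O "$dest" "$uri" ;;
        file:*) cp "${uri#file:}" "$dest" ;;
        *)      echo "unsupported descriptor URI: $uri" >&2; return 1 ;;
    esac || { rm -f "$dest"; return 1; }  # do not keep a partial download
    echo "$dest"
}
```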
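
Job naming and JDL generation for the GLITE_WMS and LCG grid types could look like this. The name layout follows the text above. The following are assumptions rather than moteur's actual output:

  • the time unit (seconds)
  • the random source
  • the helper names make_jobname and make_jdl
  • the exact JDL attributes written

The sketch leaves any retry setting out of the JDL, since moteur handles resubmission itself.

```shell
# Sketch of job naming and JDL generation for GLITE_WMS/LCG grid types. The
# job-name layout follows the text above; the time unit (seconds), the random
# source and the JDL content are assumptions, not moteur's actual output.
make_jobname() {
    # $1 = GASW descriptor file, $2 = PID of the engine JVM (supplied by caller)
    local rnd
    rnd=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
    echo "$(basename "$1")-$2-$(date +%s)-$rnd"
}

make_jdl() {
    # $1 = workflow directory, $2 = job name, $3 = Requirements expression
    # (defaults to true), $4 = VO; writes $1/jdl/<jobname>.jdl
    local dir=$1 job=$2 req=${3:-true} vo=$4
    mkdir -p "$dir/jdl" || return 1
    cat > "$dir/jdl/$job.jdl" <<EOF
Executable = "$job.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"$dir/sh/$job.sh"};
OutputSandbox = {"std.out", "std.err"};
VirtualOrganisation = "$vo";
Requirements = $req;
EOF
}
```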
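
The resubmission policy is a plain retry loop: resubmit while the exit code is non-zero, until the job succeeds or the retry count is reached. In this sketch, the command passed in stands for a hypothetical submit-and-wait step that returns the job's exit code. The retry count is read as the number of resubmissions allowed after the first attempt. The guide does not say which interpretation moteur uses, and run_with_retries is a hypothetical name.

```shell
# Generic resubmission loop: run the (hypothetical) submit-and-wait command and
# resubmit while it exits non-zero, until success or the retry count is used up.
run_with_retries() {
    # $1 = retry count (resubmissions allowed after the first attempt); rest = command
    local retries=$1 attempt=0
    shift
    while :; do
        "$@" && return 0
        attempt=$((attempt + 1))
        if [ "$attempt" -gt "$retries" ]; then
            echo "giving up after $attempt attempt(s)" >&2
            return 1
        fi
        echo "attempt $attempt failed, resubmitting" >&2
    done
}
```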

Summary of log files

  1. moteur_server
    • main log: /var/www/cgi-bin/<server_name>/moteur_service.log
    • DN/proxy file associations: /var/www/cgi-bin/<server_name>/moteur_users.txt
  2. workflow execution
    • engine log: /var/www/workflows/workflow-XXXXXX/workflow.{err,out}
    • JDLs, job scripts, DIANE logs and std.{err,out}: /var/www/workflows/workflow-XXXXXX/

Summary of configuration files

  • environment used to launch the moteur engine: /var/www/cgi-bin/<server_name>/env.sh
  • grid job submission: /var/www/cgi-bin/<server_name>/conf/grid.conf
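
When troubleshooting a given workflow, the log files listed above can be inspected in one go. show_workflow_logs is a hypothetical helper. The cgi-bin and workflows directories are passed as arguments, so the helper does not assume a particular <server_name> or DOCUMENT_ROOT.

```shell
# Hypothetical troubleshooting helper: tails the logs listed in the summary for
# one workflow. Both root directories are parameters, so the sketch makes no
# assumption about <server_name> or DOCUMENT_ROOT.
show_workflow_logs() {
    # $1 = workflow id, $2 = /var/www/cgi-bin/<server_name>, $3 = /var/www/workflows,
    # $4 = number of lines per log (default 20)
    local id=$1 cgi=$2 wf=$3 n=${4:-20} f
    for f in "$cgi/moteur_service.log" "$wf/$id/workflow.out" "$wf/$id/workflow.err"; do
        if [ -f "$f" ]; then
            echo "== $f"
            tail -n "$n" "$f"
        else
            echo "== $f (missing)"
        fi
    done
}
```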