ParaLoop - The documentation
The copyright notice
paraloop is governed by the CeCILL license under French law
The ParaLoop Quick reference card
How are switches and parameters read?
Switches and parameters are read in the order listed below. Once a switch or parameter has been set
at some step, it is NEVER SET again: thus the imposed values are set at the beginning of the process,
and the default values are set at the end.
- The switches on the command line
- $PARALOOP/../etc/paraloop.root.cfg
- f1.cfg (supposing the switch --cfg f1.cfg,f2.cfg was specified)
- f2.cfg
- $HOME/.paralooprc
- $PARALOOP/../etc/paraloop.cfg
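The reading order above amounts to a "first set wins" merge. The following Python sketch only illustrates that rule: it is not how paraloop itself is implemented, and the values given to each source are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve(sources: List[Tuple[str, Dict[str, str]]]) -> Dict[str, Tuple[str, str]]:
    """Merge the sources in reading order; the first source to set a key wins."""
    resolved: Dict[str, Tuple[str, str]] = {}
    for name, values in sources:
        for key, value in values.items():
            # A parameter already set at an earlier step is never set again.
            resolved.setdefault(key, (value, name))
    return resolved


# Hypothetical values, listed in the reading order described above.
sources = [
    ("command line", {"PARALOOP_ncpus": "8"}),
    ("paraloop.root.cfg", {"PARALOOP_queue": "batch"}),
    ("f1.cfg", {"PARALOOP_ncpus": "4", "PARALOOP_queue": "fast"}),
    ("f2.cfg", {"PARALOOP_log_level": "012"}),
    ("$HOME/.paralooprc", {"PARALOOP_log_level": "0"}),
    ("paraloop.cfg", {"PARALOOP_ncpus": "2", "PARALOOP_log_level": "01"}),
]

for key, (value, origin) in sorted(resolve(sources).items()):
    print(f"{key} = {value}  (from {origin})")
```

In this example the value from paraloop.root.cfg overrides the one in f1.cfg, which is why the administrator's root file cannot be overloaded by the user.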
Switches and parameters useful for the end user
Substitution characters
In all the parameters describing a file or a directory, you can insert some characters that will be substituted at runtime.
The list of allowed characters is described here:
Character | Substituted value |
%h | The hour part of time (11 for 11:30:05) |
%m | The minute part of time (30 for 11:30:05) |
%s | The seconds part of time (05 for 11:30:05) |
%Y | The year part of date (05 for Sept 6th 2005) |
%M | The month part of date (09 for Sept 6th 2005) |
%D | The day part of date (06 for Sept 6th 2005) |
%p | The number of cpus (ncpus parameter) |
%l | The number of cpus on the local machine, for a cluster (local_ncpus parameter) |
%v | The number of cpus in master/slave mode (slave_ncpus parameter) |
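As an illustration of how such a file name could be expanded, here is a minimal Python sketch. It is an assumption about the behaviour, not paraloop's own code: the function name expand and its arguments are hypothetical, and it covers only a subset of the characters listed above.

```python
import re
from datetime import datetime


def expand(template: str, now: datetime, ncpus: int,
           local_ncpus: int = 1, slave_ncpus: int = 1) -> str:
    """Replace a subset of the substitution characters in a file name."""
    values = {
        "h": f"{now.hour:02d}",
        "m": f"{now.minute:02d}",
        "s": f"{now.second:02d}",
        "Y": f"{now.year % 100:02d}",  # two-digit year, as in the table
        "M": f"{now.month:02d}",
        "p": str(ncpus),
        "l": str(local_ncpus),
        "v": str(slave_ncpus),
    }
    # A single pass, so a substituted value is never scanned again.
    return re.sub(r"%([hmsYMplv])", lambda m: values[m.group(1)], template)


# Hypothetical output name, expanded for Sept 6th 2005 at 11:30:05 with 4 cpus.
print(expand("results/%Y%M_%h%m%s_%pcpus.out", datetime(2005, 9, 6, 11, 30, 5), ncpus=4))
```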
Files and directories
Parameter | Switch | Default | Meaning |
| --cfg=f1.cfg,f2.cfg | | List of configuration files, the first specified is read first |
PARALOOP_max_file_size | | 1 GB | The maximum output file size. If the output grows beyond this size, another file is created |
PARALOOP_error_directory | | PARALOOP_error | The error directory |
PARALOOP_lock_directory | | PARALOOP_lock | The lock directory |
Messages and log files
Parameter | Switch | Default | Meaning |
| --verbose | No | Display more stuff to the console |
| --quiet | No | Display nearly no message |
PARALOOP_log_level | | 01 | 0 = log nearly nothing; 01 = log normally; 012 = log more |
Input, output
Parameter | Switch | Default | Meaning |
PARALOOP_input | --input | | The name of the input file. May be a path. May include substitution characters |
PARALOOP_output | --output | | The name of the output file. May be a path. May include substitution characters |
PARALOOP_start | --start | 0 | The start record number (0 means first record) |
PARALOOP_end | --end | End of file | The end record number |
PARALOOP_interleaved | --interleaved | no | Distribute the data using a round-robin algorithm |
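The difference made by --interleaved can be sketched in Python as follows. This is only an illustration: the function distribute is hypothetical, and the contiguous split shown for the non-interleaved case is an assumption, since the table above only describes the round-robin mode.

```python
from typing import List, Sequence


def distribute(records: Sequence[int], ncpus: int, interleaved: bool) -> List[List[int]]:
    """Split record numbers between ncpus child processes."""
    if interleaved:
        # Round-robin: record i goes to child number i modulo ncpus.
        return [list(records[i::ncpus]) for i in range(ncpus)]
    # Assumed non-interleaved behaviour: contiguous blocks of nearly equal size.
    base, extra = divmod(len(records), ncpus)
    shares, start = [], 0
    for i in range(ncpus):
        size = base + (1 if i < extra else 0)
        shares.append(list(records[start:start + size]))
        start += size
    return shares


records = list(range(10))  # record numbers 0 to 9, as selected by --start and --end
print(distribute(records, 3, interleaved=True))
print(distribute(records, 3, interleaved=False))
```

Round-robin distribution tends to help when the expensive records are clustered in one part of the input file.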
Plugins
Parameter | Switch | Default | Meaning |
| --plugins | | Display the list of available plugins |
PARALOOP_program | --program | | The plugin to use |
PARALOOP_db | --db | | Used by some plugins (Blast) |
PARALOOP_wait | --wait | 0 | Do not return, wait for every child to finish |
Processors and queues
Parameter | Switch | Default | Meaning |
PARALOOP_ncpus | --ncpus | Set by the administrator | The number of cpus to use (the number of child processes to run) |
| --local | no | Run on the local machine, without sending the jobs to the cluster nodes |
PARALOOP_fair_time_limit | | Set by the administrator | Only implemented with queues. Once this time has elapsed, the job is submitted again, then interrupted, giving your colleagues a chance to work. |
PARALOOP_account | --account | | Only implemented with PBS. The account, passed to the qsub utility. |
PARALOOP_queue | --queue | Set by the administrator | The execution queue |
PARALOOP_qsub_params | | Set by the administrator | Additional parameters passed to qsub |
Load balancing
Sometimes, the work assigned to one processor takes much more time to complete than the work assigned to
the other processors: configuring the load balancing mode is then useful. In this mode of operation, the faster processors "steal"
work from the slower ones. This mode is controlled by the following parameters:
Parameter | Default | Meaning |
PARALOOP_load_balancing_enable | 0 | Enable the load balancing mode |
PARALOOP_load_balancing_threshold | 1 | Faster jobs are allowed to "steal" some work from slower jobs when the slower jobs have more than threshold records left to process |
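The role of the threshold can be sketched in Python. This is a simplified model, not paraloop's own algorithm: the function steal is hypothetical, and both the choice of the most loaded job as victim and the amount stolen (half of its backlog) are assumptions.

```python
from typing import List, Optional


def steal(remaining: List[int], idle: int, threshold: int = 1) -> Optional[int]:
    """Move work from the most loaded job to the idle one.

    remaining[i] is the number of records job i still has to process.
    Returns the index of the job that was robbed, or None if nothing moved.
    """
    victim = max(range(len(remaining)), key=lambda i: remaining[i])
    if victim == idle or remaining[victim] <= threshold:
        return None  # no job has more than `threshold` records left
    taken = max(1, remaining[victim] // 2)  # assumption: take half of the backlog
    remaining[victim] -= taken
    remaining[idle] += taken
    return victim


remaining = [0, 9, 2]  # job 0 has finished, job 1 is lagging behind
steal(remaining, idle=0, threshold=1)
print(remaining)
```

With a higher threshold, jobs that are nearly finished are left alone, which avoids moving very small amounts of work around.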
Interrupting, checking, restarting
Command | Action |
paraloop.pl --check <lock_directory> | Display the progress of the jobs |
paraloop.pl --interrupt <lock_directory> | Interrupt the jobs |
paraloop.pl --restart <lock_directory> | Restart the jobs interrupted by the previous command. |
paraloop.pl --waituntil <lock_directory> | Do nothing: just wait until the jobs have terminated. |
Parameters of the Shell plugin
Please have a look at the Shell documentation for the details about this plugin.
Parameter | Default | Meaning |
PARALOOP_Shell_interpreter | /bin/sh | path to the default shell interpreter |
Parameters of the Bioperl plugin
Please have a look at the Bioperl documentation for the details about this plugin.
Parameter | Default | Meaning |
PARALOOP_Bioperl_path | | path to the external script, run at each iteration |
PARALOOP_Bioperl_params | '' | parameters passed to this script |
PARALOOP_Bioperl_input_format | fasta | Format of the input file, read by the external script |
Parameters of the Blast plugin
Please have a look at the Blast documentation for the details about this plugin.
Parameter | Switch | Default | Meaning |
PARALOOP_Blast_origin | | ncbi | ncbi for blast ncbi, wu for wu blast |
PARALOOP_Blast_path | | blastall if Blast_origin is ncbi; blastp if Blast_origin is wu | The path to the executable |
PARALOOP_Blast_params | | -p blastp if Blast_origin is ncbi; '' if Blast_origin is wu | The parameters passed to the executable |
PARALOOP_Blast_chunk | | 1 | The sequences are grouped in chunks of N sequences, N is given by this parameter |
PARALOOP_db | --db | | The database |
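The grouping controlled by PARALOOP_Blast_chunk can be pictured with this short Python sketch. The helper chunks and the sequence names are hypothetical, and allowing a shorter last chunk is an assumption.

```python
from typing import Iterator, List


def chunks(sequences: List[str], n: int) -> Iterator[List[str]]:
    """Yield successive groups of at most n sequences."""
    for i in range(0, len(sequences), n):
        yield sequences[i:i + n]


ids = ["seq1", "seq2", "seq3", "seq4", "seq5"]  # hypothetical sequence names
print(list(chunks(ids, 2)))
```

With the default value of 1, each chunk holds a single sequence.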
Parameters useful for the administrator
These parameters may be set in two files, with two different meanings:
.../etc/paraloop.root.cfg
- These parameters cannot be overloaded by the users.
.../etc/paraloop.cfg
- These parameters are default values; they can be overloaded by the users.
General parameters
Parameter | Default | Meaning |
PARALOOP_Scheduler | | The Scheduler to use:
- System for a multiprocessor machine
- PBS for a machine equipped with the PBS queuing system
- Rsystem for a cluster without any queuing system
|
PARALOOP_no_local_mode | 0 | If specified, the users will not be able to use the --local switch,
thus forcing them to use the queuing system. |
PARALOOP_fair_time_limit | 0 | Set this parameter to keep the users from monopolizing the processors |
PARALOOP_max_file_size | 1000000000 | If the output file grows too large, it is closed and a new file is opened |
PARALOOP_PBS_ncpus | | The default number of cpus when using PBS |
PARALOOP_System_ncpus | | The default number of cpus when using System (or the --local switch) |
PARALOOP_Rsystem_ncpus | | The default number of cpus when using Rsystem |
PARALOOP_local_ncpus | | The default number of cpus when using local mode (switch --local) |
PARALOOP_slave_ncpus | | The number of cpus each master job controls |
Parameters for the PBS Scheduler
Parameter | Default | Meaning |
PARALOOP_account | | The account name, passed to qsub |
PARALOOP_qsub_params | '' | Additional parameters passed to qsub |
PARALOOP_queue | | The execution queue |
Parameters for the Rsystem scheduler
Parameter | Default | Meaning |
PARALOOP_Rsystem_nodes | | The list of nodes constituting the cluster. Example:
node1,node2,node3 |
PARALOOP_Rsystem_rsh | rsh | The program to use for sending / executing something on the nodes: may be ssh |
PARALOOP_Rsystem_tmp | /tmp | The name of a temporary directory. This directory must be local to the node, it cannot be shared |
The master/slave mode
In this mode, paraloop works in a slightly different way:
- When the user runs paraloop with the parameter ncpus set to, say, 2, the jobs actually launched are the so-called "master" jobs.
- Each master will in turn launch slave_ncpus (say 3) "slave" jobs.
- The number of jobs actually running is then 8:
  - 2 master jobs, which just control the work of their slaves
  - 3 x 2 slave jobs, which actually do the work.
This mode is controlled by the following parameters:
Parameter | Default | Meaning |
PARALOOP_mode | AUTONOMOUS | AUTONOMOUS or MASTER/SLAVE |
PARALOOP_slave_ncpus | | The number of jobs each master controls. |
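The job count given in the example above can be checked with a few lines of Python. The function job_counts is a hypothetical helper written for this illustration only.

```python
from typing import Dict


def job_counts(ncpus: int, slave_ncpus: int) -> Dict[str, int]:
    """Count the jobs running in MASTER/SLAVE mode."""
    masters = ncpus               # one master job per requested cpu
    slaves = ncpus * slave_ncpus  # each master launches slave_ncpus slaves
    return {"masters": masters, "slaves": slaves, "total": masters + slaves}


print(job_counts(ncpus=2, slave_ncpus=3))
```

Only the slave jobs process records, so the masters should be seen as overhead when choosing ncpus.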
The PARALOOP documentation
The main documentation
Document | Description |
User documentation | The user documentation, including a tutorial for writing plugins |
The plugins
Document | Description |
Plugin | The abstract class at top of the plugin hierarchy |
BpInput | The abstract class used for reading files with bioperl |
LnInput | The abstract class used for reading text files |
Bioperl | A general plugin to execute a treatment on bioperl files |
Shell | A general plugin to execute some lines of scripts, one line per processor |
Blast | A specialized plugin to execute a blast (ncbi or wu) on every sequence found in the input file |
Dummy | This dummy plugin can be used as a template for writing your own plugins |
The schedulers
Document | Description |
Scheduler | The abstract class at top of the Scheduler hierarchy |
PBS | A scheduler useful when you have PBS-Pro (or other systems?) installed |
System | This scheduler is used with multiprocessor SMP machines, or when you use the --local switch |
Rsystem | This scheduler is used with clusters which do NOT have any batch system installed |
The other objects or modules