
Job Script Example 02 Many Input Files

Description

This is similar to Job Script Example 01 Many Input Files, except that the files are not named sequentially. Instead, the needed file names are listed in a separate file.

The example script and input files are in:

/mnt/HA/opt/Examples/Example02

Input File

The input file is named list_of_files.txt. Contents are:

2KontuepFego.txt
atph7QuodsId.txt
glyrydrivUs5.txt
hidAjyonOct2.txt
ikDugOdcayp4.txt
irdaikIbDik8.txt
JuiberAnNup1.txt
KrighwennAr6.txt
mepViavejub7.txt
NidgitOtElm0.txt
rosivPicdon9.txt
scomghovJer1.txt
SwiviphEpur5.txt
tyWocaibyav3.txt
VoryifuttEk1.txt
Whan8Harhij6.txt
wivErtAcper3.txt
yabhavnekIb9.txt
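The number of names in this file determines how many array tasks the job needs, so it can help to count them before submitting. The following is a minimal sketch, assuming the names are separated by whitespace and contain no spaces themselves; the temporary directory and the three-name stand-in file are hypothetical and exist only to make the sketch self-contained.

```shell
#!/bin/bash
# Hypothetical stand-in for list_of_files.txt (first three names only).
workdir=$(mktemp -d)
printf '%s\n' 2KontuepFego.txt atph7QuodsId.txt glyrydrivUs5.txt \
    > "$workdir/list_of_files.txt"

# Count whitespace-separated names; assumes no name contains a space.
nfiles=$(wc -w < "$workdir/list_of_files.txt")
nfiles=$((nfiles))   # normalize: some wc implementations pad the count

# This count is the upper bound to use for the array task range.
echo "number of input files: $nfiles"

rm -rf "$workdir"
```

Run against the real list_of_files.txt, the same count should come out as 18, matching the task range used in the scripts in this example.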

Script

File names are listed in a file

This script demonstrates a useful pattern when the input arguments are not numerically sequential. Create an array variable[1] to store the arguments to be passed, and index into that array using the SGE_TASK_ID.

#!/bin/bash
#$ -S /bin/bash
#$ -N example02a
#$ -j y
#$ -cwd
#$ -M fixme@drexel.edu
#$ -P fixmePrj
#$ -l h_rt=300
#$ -q all.q@@amdhosts
#$ -t 1:18:1

### NOTE: you must know the number of files at the time of qsub

. /etc/profile
module load shared
module load sge/univa

declare -a filenames=( $( cat list_of_files.txt ) )

### NOTE: bash array indices start at 0, but SGE task IDs start at 1
taskid=$(printf %02d $SGE_TASK_ID)
sed -e 's/hello/goodbye/' "${filenames[$(( SGE_TASK_ID - 1 ))]}" > "moddata${taskid}.txt"
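The index arithmetic can be checked locally, without submitting a job, by setting SGE_TASK_ID by hand. The following is a sketch under stated assumptions: the three-name array is a stand-in for the full list, and input_for_task is a hypothetical helper name, not part of the example script. It also adds a range check, which the script above does not have.

```shell
#!/bin/bash
# Stand-in for the array built from list_of_files.txt (first three names).
filenames=( 2KontuepFego.txt atph7QuodsId.txt glyrydrivUs5.txt )

# Hypothetical helper: map a 1-based task ID to a 0-based array index,
# returning non-zero if the task ID has no matching file.
input_for_task() {
    local id=$1
    if (( id < 1 || id > ${#filenames[@]} )); then
        echo "task ID $id is out of range" >&2
        return 1
    fi
    echo "${filenames[id - 1]}"
}

# Simulate the values the scheduler would place in SGE_TASK_ID.
for SGE_TASK_ID in 1 2 3; do
    printf 'task %02d reads %s\n' "$SGE_TASK_ID" "$(input_for_task "$SGE_TASK_ID")"
done
```

A range check of this kind makes a task fail visibly if the task range given at submission is larger than the number of listed files, instead of running sed with an empty file name.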

Input files are sequential but not listed in a separate file

This is similar to the above, but the file names follow some sequential pattern, so the array is built by matching them with a wildcard instead of reading them from a list file.

#!/bin/bash
#$ -S /bin/bash
#$ -N example02b
#$ -j y
#$ -cwd
#$ -M fixme@drexel.edu
#$ -P fixmePrj
#$ -l h_rt=300
#$ -q all.q@@amdhosts
#$ -t 1:18:1

### NOTE: you must know the number of files at the time of qsub

. /etc/profile
module load shared
module load sge/univa

### The input files are named in some ordered way, e.g. aaa.input, aab.input, aac.input, ...
### A glob expands in sorted order, so there is no need to parse the output of ls
declare -a filenames=( *.input )

### NOTE: bash array indices start at 0, but SGE task IDs start at 1
###       You could change SGE task IDs by doing #$ -t 0:17:1 instead
taskid=$(printf %02d $SGE_TASK_ID)
sed -e 's/hello/goodbye/' "${filenames[$(( SGE_TASK_ID - 1 ))]}" > "moddata${taskid}.txt"
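The order in which a wildcard fills the array can also be checked locally. The following is a minimal sketch, assuming files named as in the comment above (aaa.input, aab.input, aac.input); the temporary directory is hypothetical and the files are created deliberately out of order to show that the expansion is sorted.

```shell
#!/bin/bash
# Create three empty stand-in input files, deliberately out of order.
workdir=$(mktemp -d)
touch "$workdir/aac.input" "$workdir/aaa.input" "$workdir/aab.input"

# Pathname expansion returns matches sorted by the current locale's
# collation order, so the array order does not depend on creation order.
filenames=( "$workdir"/*.input )

first=$(basename "${filenames[0]}")
last=$(basename "${filenames[${#filenames[@]} - 1]}")
echo "${#filenames[@]} files, first=$first, last=$last"

rm -rf "$workdir"
```

The array length printed here is the number of tasks the job needs, so the same expansion can be used to check the file count before submitting.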

References

[1] Advanced Bash Scripting Guide - Arrays