
Processing Many Sequentially Named Input Files

You have many input files to process (maybe you are converting data from one format to another). The file names are in sequential order:

rawdata01.txt rawdata02.txt ... rawdata13.txt

The example script and its input files are in

/ifs/opt/Examples/Example01

We will take advantage of Slurm's job array functionality (#SBATCH --array) and reference the array index in the script via the SLURM_ARRAY_TASK_ID environment variable.
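Each array task is a separate job that runs the same script, but with SLURM_ARRAY_TASK_ID set to its own index (here, 1 through 13). A minimal sketch of the idea (the echo line is only illustrative and is not part of the example script):

#SBATCH --array=1-13
### Every one of the 13 array tasks executes this same line,
### but each task sees a different value of SLURM_ARRAY_TASK_ID
echo "This is array task ${SLURM_ARRAY_TASK_ID}"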

Script

#!/bin/bash
#SBATCH --partition=def
#SBATCH --nodes=1
### Each array task runs a single sed process, so one task is enough
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:03:00
#SBATCH --mem=1GB

#SBATCH --array=1-13

. /etc/profile.d/modules.sh
module load shared
module load slurm/picotte

cd /ifs/groups/myGrp/juser

### Since we will have many simultaneous processes on possibly many nodes
### writing to the same directory, we use the BeeGFS filesystem
DATADIR=/beegfs/scratch/juser/Examples/Example01

### Zero-pad the task ID to two digits so the fileid matches the input
### file names (01, 02, ..., 13)
fileid=$(printf "%02d" ${SLURM_ARRAY_TASK_ID})

### Convert this task's input file: replace 'hello' with 'goodbye' and
### write the result to the matching moddataNN.txt file
sed -e 's/hello/goodbye/' ${DATADIR}/rawdata${fileid}.txt > ${DATADIR}/moddata${fileid}.txt
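To run the example, submit the script once; Slurm creates the 13 array tasks automatically. A short usage sketch, assuming the script above is saved as array_example.sh (the file name is illustrative):

sbatch array_example.sh
squeue -u juser
ls /beegfs/scratch/juser/Examples/Example01/moddata*.txt

In the squeue output, the array tasks appear with job IDs of the form JOBID_1 through JOBID_13. Once all of them finish, there is one moddataNN.txt output file for each rawdataNN.txt input file.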