Trinity for RNA-Seq De novo Assembly
Trinity is software for de novo assembly of transcripts from RNA-Seq data.[1]
Installed Version
A Singularity container for Trinity is installed on Picotte.[2]
Load the modulefile for the installed version:
module load trinity/2.14.0
Using
The Singularity image is installed at the following path, where TRINITYDIR is set by the modulefile:
$TRINITYDIR/Trinity.simg
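The image path comes from the modulefile. If a job script omits `module load trinity`, Singularity fails with an unhelpful "image not found" style error. The sketch below is a minimal pre-flight check. It assumes only that the modulefile exports `TRINITYDIR`, which the examples on this page rely on. The function name `check_trinity_image` is hypothetical and not part of Trinity or Singularity.

```shell
#!/bin/bash
# Hypothetical pre-flight check; assumes "module load trinity" exports TRINITYDIR.
check_trinity_image() {
    # Fail early if the modulefile has not been loaded.
    if [ -z "${TRINITYDIR:-}" ]; then
        echo "TRINITYDIR is not set; load the trinity modulefile first" >&2
        return 1
    fi
    # Fail early if the image is missing from the expected location.
    if [ ! -f "$TRINITYDIR/Trinity.simg" ]; then
        echo "Singularity image not found: $TRINITYDIR/Trinity.simg" >&2
        return 1
    fi
    # Print the image path so callers can capture it.
    echo "$TRINITYDIR/Trinity.simg"
}
```

For example, a job script could run `check_trinity_image || exit 1` right after loading the module, so that it stops before requesting any real work.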
Example command, run from the directory that contains the read files:
singularity exec --bind="$(pwd)":/tmp -e "$TRINITYDIR/Trinity.simg" \
Trinity \
--seqType fq \
--left /tmp/reads.left.fq.gz \
--right /tmp/reads.right.fq.gz \
--max_memory 1G --CPU 4 \
--output /tmp/trinity_out_dir
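In this command, the `--bind` option mounts the current directory (`pwd`) at `/tmp` inside the container. As a result, `/tmp/reads.left.fq.gz` is the file `reads.left.fq.gz` in the directory where the command is run, and the output directory is created there too. Only files under the current directory are reachable through that `/tmp` mount.

The sketch below is a hypothetical helper, not part of Trinity or Singularity. It converts a host path under the current directory to its in-container path and rejects paths the mount would not expose. It assumes GNU `realpath` is available.

```shell
#!/bin/bash
# Hypothetical helper: map a host file under the current directory to the
# path it has inside the container when run with --bind="$(pwd)":/tmp.
to_container_path() {
    local base host_path
    base=$(realpath "$PWD")               # resolve symlinks in the bind source
    host_path=$(realpath "$1") || return 1
    case "$host_path" in
        "$base"/*)
            # Strip the host prefix and re-root the remainder at /tmp.
            echo "/tmp/${host_path#"$base"/}"
            ;;
        *)
            echo "error: $1 is not under $base, so the /tmp bind mount does not expose it" >&2
            return 1
            ;;
    esac
}
```

For example, running `to_container_path reads.left.fq.gz` in the directory that holds the reads prints `/tmp/reads.left.fq.gz`, which is the value passed to `--left` above.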
Example job script using the sample data distributed with the Trinity
source code, trinityrnaseq/sample_data/test_Trinity_Assembly:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=1:00:00
#SBATCH --mem=4G
module load trinity
# NOTE
# this example assumes
# 1) the data files reads.left.fq.gz and reads.right.fq.gz are in the directory the job is
#    submitted from (Slurm starts the job in the submission directory)
# 2) Trinity writes its output to "trinity_out_dir" in that same directory, creating it if needed
singularity exec --bind="$(pwd)":/tmp -e "$TRINITYDIR/Trinity.simg" \
Trinity \
--seqType fq \
--left /tmp/reads.left.fq.gz \
--right /tmp/reads.right.fq.gz \
--max_memory 4G \
--CPU $SLURM_CPUS_PER_TASK \
--output /tmp/trinity_out_dir
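The job script above hard-codes `--max_memory 4G` to match `#SBATCH --mem=4G`, while `--CPU` already follows the allocation through `SLURM_CPUS_PER_TASK`. If you change the `#SBATCH` lines, you can derive both values from the allocation instead. Below is a minimal sketch with these assumptions:

- Slurm exports `SLURM_MEM_PER_NODE` in megabytes when memory is requested with `--mem`. It is not set for `--mem-per-cpu` requests, in which case the fallback is used.
- Rounding down to whole gigabytes is acceptable for `--max_memory`.

The function name `trinity_resources` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "<cpus> <memory>" for Trinity's --CPU and --max_memory,
# derived from the Slurm allocation.
#   SLURM_CPUS_PER_TASK is set by --cpus-per-task.
#   SLURM_MEM_PER_NODE is set by --mem and is given in megabytes.
# The fallbacks (1 CPU, 1024 MB) are assumptions for runs outside Slurm.
trinity_resources() {
    local cpus=${SLURM_CPUS_PER_TASK:-1}
    local mem_mb=${SLURM_MEM_PER_NODE:-1024}
    local mem_gb=$(( mem_mb / 1024 ))   # round down to whole gigabytes
    # Trinity needs a positive value, so allocations under 1 GB are reported as 1G.
    if [ "$mem_gb" -lt 1 ]; then
        mem_gb=1
    fi
    echo "$cpus ${mem_gb}G"
}
```

To use it in the job script, first run `read -r TRINITY_CPU TRINITY_MEM < <(trinity_resources)`. Then pass `--max_memory "$TRINITY_MEM" --CPU "$TRINITY_CPU"`. With the `#SBATCH` values shown, this gives 4G and 8 CPUs, the same as the example. Leaving some memory headroom below the job limit is a judgment call this sketch does not make.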
Truncated output from the example:
/ifs/opt/src/trinityrnaseq/sample_data/test_Trinity_Assembly
 ______  ____   ____  ____   ____  ______  __ __
|      ||    \ |    ||    \ |    ||      ||  |  |
|      ||  D  ) |  | |  _  | |  | |      ||  |  |
|_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
  |  |  |    \  |  | |  |  | |  |   |  |  |___, |
  |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
  |__|  |__|\_||____||__|__||____|  |__|  |____/
Trinity-v2.14.0
Left read files: $VAR1 = [
'/tmp/reads.left.fq.gz'
];
Right read files: $VAR1 = [
'/tmp/reads.right.fq.gz'
];
Trinity version: Trinity-v2.14.0
-currently using the latest production release of Trinity.
Tuesday, November 8, 2022: 12:18:09 CMD: java -Xmx64m -XX:ParallelGCThreads=2 -jar /usr/local/bin/util/support_scripts/ExitTester.jar 0
Tuesday, November 8, 2022: 12:18:09 CMD: java -Xmx4g -XX:ParallelGCThreads=2 -jar /usr/local/bin/util/support_scripts/ExitTester.jar 1
Tuesday, November 8, 2022: 12:18:09 CMD: mkdir -p /tmp/trinity_out_dir/chrysalis
----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads ---------------------
----------------------------------------------------------------------------------
---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 200 Coverage --
---------------------------------------------------------------
# running normalization on reads: $VAR1 = [
[
'/tmp/reads.left.fq.gz'
],
[
'/tmp/reads.right.fq.gz'
]
];
Tuesday, November 8, 2022: 12:18:09 CMD: /usr/local/bin/util/insilico_read_normalization.pl --seqType fq --JM 4G --max_cov 200 --min_cov 1 --CPU 8 --output /tmp/trinity_out_dir/insilico_read_normalization --max_CV 10000 --left /tmp/reads.left.fq.gz --right /tmp/reads.right.fq.gz --pairs_together --PARALLEL_STATS
-prepping seqs
Converting input files. (both directions in parallel)CMD: seqtk-trinity seq -A -R 1 <(gunzip -c /tmp/reads.left.fq.gz) >> left.fa
... [other output clipped] ...
All commands completed successfully. :-)
** Harvesting all assembled transcripts into a single multi-fasta file...
Tuesday, November 8, 2022: 12:18:38 CMD: find /tmp/trinity_out_dir/read_partitions/ -name '*inity.fasta' | /usr/local/bin/util/support_scripts/partitioned_trinity_aggregator.pl --token_prefix TRINITY_DN --output_prefix /tmp/trinity_out_dir/Trinity.tmp
* [Tue Nov 8 12:18:39 2022] Running CMD: /usr/local/bin/util/support_scripts/salmon_runner.pl Trinity.tmp.fasta /tmp/trinity_out_dir/both.fa 8
* [Tue Nov 8 12:18:40 2022] Running CMD: /usr/local/bin/util/support_scripts/filter_transcripts_require_min_cov.pl Trinity.tmp.fasta /tmp/trinity_out_dir/both.fa salmon_outdir/quant.sf 2 > /tmp/trinity_out_dir.Trinity.fasta
Tuesday, November 8, 2022: 12:18:40 CMD: /usr/local/bin/util/support_scripts/get_Trinity_gene_to_trans_map.pl /tmp/trinity_out_dir.Trinity.fasta > /tmp/trinity_out_dir.Trinity.fasta.gene_trans_map
#############################################################################
Finished. Final Trinity assemblies are written to /tmp/trinity_out_dir.Trinity.fasta
#############################################################################
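As the end of the log shows, the final assembly is written next to the output directory, not inside it. Inside the container it is `/tmp/trinity_out_dir.Trinity.fasta`. With the `/tmp` bind mount, that is `trinity_out_dir.Trinity.fasta` in the job's working directory, together with the `.gene_trans_map` file. A quick sanity check is to count the assembled transcripts and distinct genes.

This sketch assumes the gene-to-transcript map is tab-separated, with the gene ID in column 1 and the transcript ID in column 2, as written by `get_Trinity_gene_to_trans_map.pl`. The function name `summarize_assembly` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count transcripts and genes in a Trinity assembly.
# Assumes the gene map sits next to the FASTA as "<fasta>.gene_trans_map",
# with the gene ID in column 1 and the transcript ID in column 2 (tab-separated).
summarize_assembly() {
    local fasta=$1
    local map="$fasta.gene_trans_map"
    if [ ! -f "$fasta" ] || [ ! -f "$map" ]; then
        echo "error: missing $fasta or $map" >&2
        return 1
    fi
    local n_trans n_genes
    # Each FASTA record starts with a header line beginning with ">".
    n_trans=$(grep -c '^>' "$fasta")
    # Distinct values in column 1 of the map are the genes.
    n_genes=$(cut -f1 "$map" | sort -u | wc -l)
    echo "transcripts=$n_trans genes=$(( n_genes ))"
}
```

Run it from the directory the job was submitted from, for example `summarize_assembly trinity_out_dir.Trinity.fasta`.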
Grid Config
Currently unavailable on Picotte.
OBSOLETE - Grid Config
There is a default grid config file given by the environment variable
TRINITYGRIDCONF. It is written for SGE (qsub), so it may need to be
adapted for your application. See the Trinity documentation for
details. The default grid config file is as follows:
#--------------------------------------------------------------------------------------------
# grid type:
grid=SGE
# template for a grid submission
cmd=qsub -cwd -j y -l h_rt=2:00:00
# number of grid submissions to be maintained at steady state by the Trinity submission system
max_nodes=8
# number of commands that are batched into a single grid submission job.
cmds_per_node=1
#--------------------------------------------------------------------------------------------
References
[1] Trinity RNA-Seq
[2] Trinity Documentation - Running Trinity using Singularity