# Sage Bionetworks Global Configuration
To use this custom configuration, run the pipeline with `-profile sage`. This will download and load the `sage.config`, which contains a number of optimizations relevant to Sage employees running workflows on AWS (e.g. using Nextflow Tower). This profile will also load any applicable pipeline-specific configuration.
This global configuration includes the following tweaks:
- Update the default value for `igenomes_base` to `s3://sage-igenomes`
- Enable retries for all failures
- Allow pending jobs to finish if the number of retries is exhausted
- Increase resource allocations for specific resource-related exit codes
- Optimize resource allocations to better “fit” EC2 instance types
- Slow the increase in the number of allocated CPU cores on retries
- Increase the default time limits because we run pipelines on AWS
- Increase the amount of time allowed for file transfers
- Improve reliability of file transfers with retries and reduced concurrency
## Additional information about iGenomes
The following iGenomes prefixes have been copied from `s3://ngi-igenomes/` (`eu-west-1`) to `s3://sage-igenomes` (`us-east-1`). See this script for more information. The `sage-igenomes` S3 bucket has been configured to be openly available, but files cannot be downloaded out of `us-east-1` to avoid egress charges. You can check the `conf/igenomes.config` file in each nf-core pipeline to figure out the mapping between genome IDs (i.e. for `--genome`) and iGenomes prefixes (example).
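As a rough illustration of what that mapping looks like (the structure below is paraphrased, not copied from any specific pipeline — the exact keys and file names vary, so always check the pipeline's own `conf/igenomes.config`), an entry ties a `--genome` ID to a prefix under `params.igenomes_base`:

```groovy
// Sketch of an igenomes.config entry; keys and paths are illustrative
params {
    genomes {
        'GRCh38' {
            fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
        }
    }
}
```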
- **Human Genome Builds**
  - `Homo_sapiens/Ensembl/GRCh37`
  - `Homo_sapiens/GATK/GRCh37`
  - `Homo_sapiens/UCSC/hg19`
  - `Homo_sapiens/GATK/GRCh38`
  - `Homo_sapiens/NCBI/GRCh38`
  - `Homo_sapiens/UCSC/hg38`
- **Mouse Genome Builds**
  - `Mus_musculus/Ensembl/GRCm38`
  - `Mus_musculus/UCSC/mm10`
## Config file

```groovy
// Config profile metadata
params {
    config_profile_description = 'The Sage Bionetworks Nextflow Config Profile'
    config_profile_contact = 'Bruno Grande (@BrunoGrandePhD)'
    config_profile_url = 'https://github.com/Sage-Bionetworks-Workflows'

    // Leverage us-east-1 mirror of select human and mouse genomes
    igenomes_base = 's3://sage-igenomes/igenomes'

    cpus = 4
    max_cpus = 32
    max_memory = 128.GB
    max_time = 240.h
    single_cpu_mem = 6.GB
}

// Increase time limit to allow file transfers to finish
// The default is 12 hours, which results in timeouts
threadPool.FileTransfer.maxAwait = '24 hour'

// Configure Nextflow to be more reliable on AWS
aws {
    region = "us-east-1"
    client {
        uploadMaxThreads = 4
    }
    batch {
        retryMode = 'built-in'
        maxParallelTransfers = 1
        maxTransferAttempts = 10
        delayBetweenAttempts = '60 sec'
    }
}

// Adjust default resource allocations (see `../docs/sage.md`)
process {
    resourceLimits = [
        memory: 128.GB,
        cpus: 32,
        time: 240.h
    ]
    maxErrors = '-1'
    maxRetries = 5

    // Enable retries globally for certain exit codes
    errorStrategy = { task.attempt <= 5 ? 'retry' : 'finish' }
    cpus = { 1 * factor(task, 2) }
    memory = { 6.GB * factor(task, 1) }
    time = { 24.h * factor(task, 1) }

    // Process-specific resource requirements
    withLabel: process_single {
        cpus = { 1 * factor(task, 2) }
        memory = { 6.GB * factor(task, 1) }
        time = { 24.h * factor(task, 1) }
    }

    withLabel: process_low {
        cpus = { 2 * factor(task, 2) }
        memory = { 12.GB * factor(task, 1) }
        time = { 24.h * factor(task, 1) }
    }

    withLabel: process_medium {
        cpus = { 8 * factor(task, 2) }
        memory = { 32.GB * factor(task, 1) }
        time = { 48.h * factor(task, 1) }
    }

    withLabel: process_high {
        cpus = { 16 * factor(task, 2) }
        memory = { 64.GB * factor(task, 1) }
        time = { 96.h * factor(task, 1) }
    }

    withLabel: process_long {
        time = { 96.h * factor(task, 1) }
    }

    withLabel: 'process_high_memory|memory_max' {
        memory = { 128.GB * factor(task, 1) }
    }

    withLabel: cpus_max {
        cpus = { 32 * factor(task, 2) }
    }
}

// Function to finely control the increase of the resource allocation
def factor(task, slow_factor = 1) {
    if ( task.exitStatus in [143, 137, 104, 134, 139, 247] ) {
        return Math.ceil( task.attempt / slow_factor ) as int
    } else {
        return 1 as int
    }
}
```
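To make the retry scaling concrete, here is a standalone Groovy sketch of the same logic, with the Nextflow `task` context replaced by plain arguments purely for illustration:

```groovy
// Standalone sketch of factor(): the exit-code list mirrors the config above
// (SIGTERM/SIGKILL/OOM-style failures); everything else is illustrative
def factor(attempt, exitStatus, slowFactor = 1) {
    if ( exitStatus in [143, 137, 104, 134, 139, 247] ) {
        return Math.ceil( attempt / slowFactor ) as int
    }
    return 1
}

// Memory and time use slow_factor = 1, so they scale with every retry
// after a resource-related failure (e.g. exit 137, OOM kill)
assert factor(1, 137) == 1
assert factor(2, 137) == 2
assert factor(3, 137) == 3

// CPU requests use slow_factor = 2, so cores only grow every other attempt
assert factor(2, 137, 2) == 1
assert factor(3, 137, 2) == 2

// Non-resource failures retry with the original allocation
assert factor(4, 1) == 1
```

The `slow_factor` of 2 on CPU requests is what the "slow the increase in allocated CPU cores on retries" tweak refers to: extra cores rarely fix a failing task, so they are scaled up more cautiously than memory and time.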