Command-line interface
run-pipeline
Run the whole single-sample pipeline on the input WGS alignment file in BAM or CRAM format:
mitopy run-pipeline [OPTIONS] BAM
Note
The input alignment file should be coordinate-sorted and indexed. If these prerequisites are not met, mitopy will coordinate-sort and index the input file.
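Every subcommand in this reference follows the same `mitopy SUBCOMMAND [OPTIONS] INPUTS` shape. The sketch below shows one way to assemble such a command line from Python. The `POSITIONALS` table is copied from the usage lines in this document; `build_command` is a hypothetical helper, not part of mitopy, and options are left out on purpose.

```python
import shlex

# Positional arguments of each subcommand, copied from the usage lines in this
# document. Options are deliberately omitted from this sketch.
POSITIONALS = {
    "run-pipeline": ("BAM",),
    "preprocess": ("BAM",),
    "align": ("UBAM",),
    "call": ("BAM",),
    "merge": ("VCF", "VCF_SHIFTED"),
    "postprocess": ("VCF",),
    "coverage": ("MT_BAM", "SHIFTED_MT_BAM"),
    "annotate": ("VCF",),
    "visualize": ("VCF",),
    "identify-haplogroup": ("VCF",),
}


def build_command(subcommand, *inputs):
    """Return the argv list for a mitopy subcommand (hypothetical helper)."""
    expected = POSITIONALS[subcommand]
    if len(inputs) != len(expected):
        raise ValueError(f"{subcommand} expects: {' '.join(expected)}")
    return ["mitopy", subcommand, *map(str, inputs)]


if __name__ == "__main__":
    # sample.bam is a placeholder path used only for illustration.
    print(shlex.join(build_command("run-pipeline", "sample.bam")))
```

The resulting list can be handed to `subprocess.run(..., check=True)` once mitopy is installed; building the list separately keeps the argument check testable without running the tool.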
| Option | Default | Description |
|---|---|---|
| | null | Alignment index file (BAI/CRAI). If not provided, it is assumed to reside in the same directory as the input BAM. |
| | rCRS | Mitochondrial reference. By default, variants are detected and analyzed with respect to the rCRS reference. RSRS is included as an optional mitochondrial reference. |
| | null | Reference genome FASTA file. Only required when the input is a CRAM file. |
| | null | Name of the mitochondrial contig in the alignment file. If not provided, it is detected automatically. |
| | BAM_DIR | Output directory. By default, results are written to the directory of the input BAM file. |
| | BAM_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
| | 1 | Number of cores. |
| | tmp | Directory for intermediate files. |
| | false | If true, remove intermediate files after the analysis is completed. |
| | “” | Extra arguments to pass on to the Mutect2 variant caller. |
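Several defaults above are derived from the input path: the index location, the output directory (BAM_DIR) and the file prefix (BAM_BASENAME). A minimal sketch of that derivation follows. It assumes the usual samtools index naming (`sample.bam.bai` or `sample.bai`, `sample.cram.crai`) and that "basename" means the file name without its extension; both are assumptions, and the function names are hypothetical rather than mitopy's own.

```python
from pathlib import Path


def default_index_candidates(alignment):
    """Index paths commonly found next to a BAM/CRAM file (samtools naming)."""
    path = Path(alignment)
    if path.suffix == ".cram":
        return [path.with_name(path.name + ".crai")]
    # Both sample.bam.bai and sample.bai are in common use for BAM files.
    return [path.with_name(path.name + ".bai"), path.with_suffix(".bai")]


def default_output_location(alignment):
    """Return (output directory, file prefix) derived from the input path.

    Assumption: the prefix is the file name without its extension.
    """
    path = Path(alignment)
    return path.parent, path.stem
```

Checking `any(p.exists() for p in default_index_candidates(bam))` before a run is a cheap way to see whether an index is already in place next to the alignment file.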
Variant postprocessing options
| Option | Default | Description |
|---|---|---|
| | 1.0 | F score beta. Specifies the relative weight of recall and precision for the filtering strategy. |
| | 0.0 | Minimum variant allele fraction threshold. All sites with a variant allele fraction below the threshold will be filtered. |
| | null | Custom BED file containing blacklisted sites (the BED index file has to be present as well). If not specified, the default blacklist for the chosen MT reference will be used. |
| | 0.0 | Median autosomal coverage. Set to activate the filter against erroneously mapped nuclear mitochondrial DNA segments (NuMTs). Picard CollectWgsMetrics can be used to estimate the median autosomal coverage from a WGS BAM. |
| | false | Contamination filter. If enabled, the sample contamination level is estimated using haplocheck and variants are filtered accordingly (valid only for the rCRS mitochondrial reference). |
| | true | Remove variants not passing the enabled filters from the final VCF file. |
| | true | Split multi-allelic sites and left-align variant calls. |
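To make the "F score beta" option more concrete, the sketch below evaluates the standard F-beta score, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), for a given precision P and recall R. This is the textbook definition only; how the score is used inside the filtering strategy is not described in this document, and `f_beta` is a name chosen here for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: beta > 1 weights recall more, beta < 1 precision."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


if __name__ == "__main__":
    # Same precision and recall, three different weightings.
    for beta in (0.5, 1.0, 2.0):
        print(beta, round(f_beta(0.5, 1.0, beta), 4))
```

With precision 0.5 and recall 1.0, the score rises as beta grows, because a larger beta rewards the high recall and discounts the low precision.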
Annotation options
| Option | Default | Description |
|---|---|---|
| | 0.95 | Minimum homoplasmy level threshold. Variants above this threshold are annotated as homoplasmic, all others as heteroplasmic. |
| | true | Annotate variants with population frequencies from the gnomAD database. |
| | true | Include conservation scores from PhyloP100way and PhastCons100way in the annotations. |
| | true | Annotate variants with in-silico pathogenicity predictions from SIFT, MitoTIP and PON-mt-tRNA. |
| | true | Annotate variants with phenotype information from the MITOMAP and ClinVar databases. |
| | true | Export annotated variants to a human-readable CSV format. |
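The homoplasmy threshold amounts to a single comparison on the variant allele fraction. The sketch below follows the wording above ("above this threshold") and therefore uses a strict comparison; whether mitopy treats a value exactly equal to the threshold the same way is an assumption, and `plasmy_status` is a hypothetical name.

```python
def plasmy_status(allele_fraction, threshold=0.95):
    """Label a variant by its allele fraction, using the documented default."""
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele fraction must be between 0 and 1")
    # Strictly above the threshold counts as homoplasmic (assumption).
    return "homoplasmic" if allele_fraction > threshold else "heteroplasmic"
```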
Visualization options
| Option | Default | Description |
|---|---|---|
| | false | Split the H and L strands of the mitochondrial genome in the visualization. |
| | false | Additionally save the plot as a static PNG image. |
preprocess
Prepare input WGS alignment file in BAM or CRAM format for the mitochondrial analysis:
mitopy preprocess [OPTIONS] BAM
| Option | Default | Description |
|---|---|---|
| | null | Alignment index file (BAI/CRAI). If not provided, it is assumed to reside in the same directory as the input BAM. |
| | null | Reference genome FASTA file. Only required when the input is a CRAM file. |
| | null | Name of the mitochondrial contig in the alignment file. If not provided, it is detected automatically. |
| | BAM_DIR | Output directory. By default, results are written to the directory of the input BAM file. |
| | BAM_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
align
Align mitochondrial reads in unmapped BAM format (uBAM) to the canonical or shifted mitochondrial reference using bwa-mem2:
mitopy align [OPTIONS] UBAM
| Option | Default | Description |
|---|---|---|
| | rCRS | Mitochondrial reference to align against. By default, reads are aligned to the rCRS mitochondrial reference. RSRS is included as an optional mitochondrial reference. |
| | false | Shifted mode. If enabled, the alignment is performed against the shifted mitochondrial reference. |
| | 1 | Number of cores. |
| | UBAM_DIR | Output directory. By default, results are written to the directory of the input UBAM file. |
| | UBAM_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
call
Call variants in the non-control region (using the canonical mitochondrial reference) or the control region (using the shifted mitochondrial reference) of the mitochondrial genome using Mutect2:
mitopy call [OPTIONS] BAM
| Option | Default | Description |
|---|---|---|
| | rCRS | Mitochondrial reference. By default, variants are called against the rCRS mitochondrial reference. RSRS is included as an optional mitochondrial reference. |
| | “” | Extra arguments to pass on to the Mutect2 variant caller. |
| | false | Shifted mode. If enabled, variants are called against the shifted mitochondrial reference (control region). |
| | BAM_DIR | Output directory. By default, results are written to the directory of the input BAM file. |
| | BAM_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
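Calls made in shifted mode are expressed in the coordinates of the shifted reference, so they have to be mapped back to canonical coordinates before they can be combined with the non-control calls. The sketch below shows the arithmetic for a circular genome. It assumes the shifted reference begins at canonical position `shift + 1`; the shift mitopy actually uses is not stated in this document, so the value 8000 in the usage lines is purely illustrative and the function name is hypothetical.

```python
MT_LENGTH = 16569  # length of the rCRS sequence in base pairs


def shifted_to_canonical(position, shift, length=MT_LENGTH):
    """Map a 1-based position on a shifted reference to canonical coordinates.

    Assumption: the shifted reference starts at canonical position shift + 1
    and wraps around the end of the circular genome.
    """
    if not 1 <= position <= length:
        raise ValueError("position outside the reference")
    return (position - 1 + shift) % length + 1


if __name__ == "__main__":
    example_shift = 8000  # illustrative value only, not taken from mitopy
    for pos in (1, 8569, 8570):
        print(pos, "->", shifted_to_canonical(pos, example_shift))
```

With this illustrative shift, the coordinate break of the canonical reference falls in the middle of the shifted one, between shifted positions 8569 and 8570, so reads spanning it align without being split.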
merge
Merge variant calls and stats from the control and non-control regions:
mitopy merge [OPTIONS] VCF VCF_SHIFTED
| Option | Default | Description |
|---|---|---|
| | null | File containing Mutect2 stats for the VCF file. By default, the stats file is assumed to be located in the same directory as the input VCF. |
| | null | File containing Mutect2 stats for the shifted VCF file. By default, the shifted stats file is assumed to be located in the same directory as the input VCF_SHIFTED. |
| | rCRS | Mitochondrial reference used in the variant calling process. By default, it is assumed that variants were called against rCRS. |
| | VCF_DIR | Output directory. By default, results are written to the directory of the input VCF file. |
| | VCF_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
postprocess
Postprocess raw variant calls: apply several filters to remove potential false positives, then normalize the VCF file:
mitopy postprocess [OPTIONS] VCF
| Option | Default | Description |
|---|---|---|
| | null | File containing Mutect2 stats for the VCF file. By default, the stats file is assumed to be located in the same directory as the input VCF. |
| | rCRS | Mitochondrial reference used in the variant calling process. By default, it is assumed that variants were called against rCRS. |
| | 1.0 | F score beta. Specifies the relative weight of recall and precision for the filtering strategy. |
| | 0.0 | Minimum variant allele fraction threshold. All sites with a variant allele fraction below the threshold will be filtered. |
| | null | Custom BED file containing blacklisted sites (the BED index file has to be present as well). If not specified, the default blacklist for the chosen MT reference will be used. |
| | 0.0 | Median autosomal coverage. Set to activate the filter against erroneously mapped nuclear mitochondrial DNA segments (NuMTs). Picard CollectWgsMetrics can be used to estimate the median autosomal coverage from a WGS BAM. |
| | false | Contamination filter. If enabled, the sample contamination level is estimated using haplocheck and variants are filtered accordingly (valid only for the rCRS mitochondrial reference). |
| | true | Remove variants not passing the enabled filters from the final VCF file. |
| | true | Split multi-allelic sites and left-align variant calls. |
| | VCF_DIR | Output directory. By default, results are written to the directory of the input VCF file. |
| | VCF_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
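The median autosomal coverage mentioned above can be read from the metrics file that Picard CollectWgsMetrics writes. The sketch below pulls the `MEDIAN_COVERAGE` column out of such a file. It relies on the usual Picard metrics layout (a `## METRICS CLASS` line followed by a tab-separated header row and one value row); the function name is hypothetical and the numbers in the sample text are made up for illustration.

```python
def median_coverage(metrics_text):
    """Extract MEDIAN_COVERAGE from the text of a Picard WGS metrics file."""
    lines = metrics_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            # The header row and the single value row follow the class line.
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return float(dict(zip(header, values))["MEDIAN_COVERAGE"])
    raise ValueError("no METRICS CLASS section found")


# Abbreviated, made-up metrics text in the Picard layout.
SAMPLE = (
    "## METRICS CLASS\tpicard.analysis.WgsMetrics\n"
    "GENOME_TERRITORY\tMEAN_COVERAGE\tSD_COVERAGE\tMEDIAN_COVERAGE\n"
    "1000\t31.2\t8.4\t30\n"
)

if __name__ == "__main__":
    print(median_coverage(SAMPLE))
```

For a real run, the file contents would come from `Path(metrics_file).read_text()`, and the returned number is the value to supply for the median autosomal coverage option.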
coverage
Calculate per-base coverage using mosdepth. Coverage is combined from the control region (SHIFTED_MT_BAM) and the non-control region (MT_BAM):
mitopy coverage [OPTIONS] MT_BAM SHIFTED_MT_BAM
| Option | Default | Description |
|---|---|---|
| | null | Index file for the input BAM containing reads aligned against the mitochondrial reference. By default, the index file is assumed to be located in the same directory as MT_BAM. |
| | null | Index file for the shifted input BAM containing reads aligned against the shifted mitochondrial reference. By default, the index file is assumed to be located in the same directory as SHIFTED_MT_BAM. |
| | true | Create a coverage plot. |
| | BAM_DIR | Output directory. By default, results are written to the directory of the input BAM file. |
| | BAM_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
annotate
Annotate mitochondrial variant calls with functional effects using SnpEff and optionally add other annotations:
mitopy annotate [OPTIONS] VCF
Note
The annotation stage assumes that the input VCF file is normalized (specifically multi-allelic sites are split).
The variants are annotated with respect to rCRS mitochondrial reference.
| Option | Default | Description |
|---|---|---|
| | 0.95 | Minimum homoplasmy level threshold. Variants above this threshold are annotated as homoplasmic, all others as heteroplasmic. |
| | true | Annotate variants with population frequencies from the gnomAD database. |
| | true | Include conservation scores from PhyloP100way and PhastCons100way in the annotations. |
| | true | Annotate variants with in-silico pathogenicity predictions from SIFT, MitoTIP and PON-mt-tRNA. |
| | true | Annotate variants with phenotype information from the MITOMAP and ClinVar databases. |
| | true | Export annotated variants to a human-readable CSV format. |
| | VCF_DIR | Output directory. By default, results are written to the directory of the input VCF file. |
| | VCF_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |
visualize
Visualize variant calls:
mitopy visualize [OPTIONS] VCF
| Option | Default | Description |
|---|---|---|
| | null | CSV file with calculated per-base coverage. If provided, coverage is included in the final visualization. |
| | false | Split the H and L strands of the mitochondrial genome in the visualization. |
| | false | Save the plot as a static PNG image. |
| | VCF_DIR | Output directory. By default, results are written to the directory of the input VCF file. |
| | VCF_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
identify-haplogroup
Identify sample haplogroup using haplogrep3:
mitopy identify-haplogroup [OPTIONS] VCF
| Option | Default | Description |
|---|---|---|
| | rCRS | Mitochondrial reference. By default, the haplogroup is classified with respect to the rCRS reference. RSRS is included as an optional mitochondrial reference. |
| | VCF_DIR | Output directory. By default, results are written to the directory of the input VCF file. |
| | VCF_BASENAME | Prefix for output files. By default, output files are prefixed with the input file's basename. |
| | false | Verbosity. If true, logs generated by the underlying tools are recorded. |