Overview of Picard command-line tools

The Picard command-line tools are packaged as executable jar files. They require Java 1.6. They can be invoked as follows:

java jvm-args -jar PicardCommand.jar OPTION1=value1 OPTION2=value2...

Most of the commands are designed to run within a 2 GB JVM heap, so the JVM argument -Xmx2g is recommended.
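As a rough illustration of the invocation pattern above, the following Python sketch assembles the argument list for one command. The helper name picard_command, the file names, and the choice of options are hypothetical; PicardCommand.jar is the placeholder used above, not a real jar.

```python
import shlex

def picard_command(jar, options, jvm_args=("-Xmx2g",)):
    """Build the argv list for: java jvm-args -jar PicardCommand.jar OPTION=value ..."""
    argv = ["java", *jvm_args, "-jar", jar]
    for name, value in options:
        # An option that may be specified more than once is simply repeated.
        argv.append(f"{name}={value}")
    return argv

# Hypothetical file names; PicardCommand.jar stands in for a concrete tool jar.
cmd = picard_command(
    "PicardCommand.jar",
    [("INPUT", "in.bam"), ("OUTPUT", "out.bam"), ("VALIDATION_STRINGENCY", "LENIENT")],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting problems when a value contains spaces.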

Standard Options

The following options are relevant for most Picard programs:

Option    Description
--help    Displays options specific to this tool.
--stdhelp    Displays options specific to this tool AND options common to all Picard command line tools.
--version    Displays program version.
TMP_DIR=File    This option may be specified 0 or more times.
VERBOSITY=LogLevel    Control verbosity of logging. Default value: INFO. This option can be set to 'null' to clear the default value. Possible values: {ERROR, WARNING, INFO, DEBUG}
QUIET=Boolean    Whether to suppress job-summary info on System.err. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
VALIDATION_STRINGENCY=ValidationStringency    Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT, SILENT}
COMPRESSION_LEVEL=Integer    Compression level for all compressed files created (e.g. BAM and GELI). Default value: 5. This option can be set to 'null' to clear the default value.
MAX_RECORDS_IN_RAM=Integer    When writing SAM files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort a SAM file, and increases the amount of RAM needed. Default value: 500000. This option can be set to 'null' to clear the default value.
CREATE_INDEX=Boolean    Whether to create a BAM index when writing a coordinate-sorted BAM file. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
CREATE_MD5_FILE=Boolean    Whether to create an MD5 digest for any BAM or FASTQ files created. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

AddCommentsToBam

Adds one or more comments to the header of a specified BAM file and writes a copy of the file, with the modified header, to a specified output file. A block-copying method is used to ensure efficient transfer to the output file. SAM files are not supported.

Option    Description
INPUT=File    Input BAM file to add a comment to the header. Required.
OUTPUT=File    Output BAM file to write results. Required.
COMMENT=String    Comments to add to the BAM file. This option may be specified 0 or more times.

AddOrReplaceReadGroups

Replaces all read groups in the INPUT file with a new read group and assigns all reads to this read group in the OUTPUT BAM. Version: 1.0

Option    Description
INPUT=File    Input file (bam or sam). Required.
OUTPUT=File    Output file (bam or sam). Required.
SORT_ORDER=SortOrder    Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT. Default value: null. Possible values: {unsorted, queryname, coordinate}
RGID=String    Read Group ID. Default value: 1. This option can be set to 'null' to clear the default value.
RGLB=String    Read Group Library. Required.
RGPL=String    Read Group platform (e.g. illumina, solid). Required.
RGPU=String    Read Group platform unit (e.g. run barcode). Required.
RGSM=String    Read Group sample name. Required.
RGCN=String    Read Group sequencing center name. Default value: null.
RGDS=String    Read Group description. Default value: null.
RGDT=Iso8601Date    Read Group run date. Default value: null.
RGPI=Integer    Read Group predicted insert size. Default value: null.

BamToBfq

USAGE: BamToBfq [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#BamToBfq

Create BFQ files for use by the Maq aligner.

Option    Description
INPUT=File    The BAM file to parse. Required.
ANALYSIS_DIR=File    The analysis directory for the binary output file. Required.
FLOWCELL_BARCODE=String    Flowcell barcode (e.g. 30PYMAAXX). Required. Cannot be used in conjunction with option(s) OUTPUT_FILE_PREFIX
LANE=Integer    Lane number. Default value: null. Cannot be used in conjunction with option(s) OUTPUT_FILE_PREFIX
OUTPUT_FILE_PREFIX=String    Prefix for all output files. Required. Cannot be used in conjunction with option(s) FLOWCELL_BARCODE (F) LANE (L)
READS_TO_ALIGN=Integer    Number of reads to align (null = all). Default value: null.
READ_CHUNK_SIZE=Integer    Number of reads to break into individual groups for alignment. Default value: 2000000. This option can be set to 'null' to clear the default value.
PAIRED_RUN=Boolean    Whether this is a paired-end run. Required. Possible values: {true, false}
RUN_BARCODE=String    Deprecated option; use READ_NAME_PREFIX instead. Default value: null. Cannot be used in conjunction with option(s) READ_NAME_PREFIX
READ_NAME_PREFIX=String    Prefix to be stripped off the beginning of all read names (to make them short enough to run in Maq). Default value: null.
INCLUDE_NON_PF_READS=Boolean    Whether to include non-PF reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
CLIP_ADAPTERS=Boolean    Whether to clip adapters from the reads. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
BASES_TO_WRITE=Integer    The number of bases from each read to write to the bfq file. If this is non-null, then only the first BASES_TO_WRITE bases from each read will be written. Default value: null.

BamIndexStats

USAGE: BamIndexStats [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#BamIndexStats

Generates BAM index statistics. Input BAM file must have a corresponding index file.

Option    Description
INPUT=File    A BAM file to process. Required.

BuildBamIndex

USAGE: BuildBamIndex [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#BuildBamIndex

Generates a BAM index (.bai) file.

Option    Description
INPUT=String    A BAM file or URL to process. Must be sorted in coordinate order. Required.
OUTPUT=File    The BAM index file. Defaults to x.bai if INPUT is x.bam, otherwise INPUT.bai. If INPUT is a URL and OUTPUT is unspecified, defaults to a file in the current directory. Default value: null.
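The default naming rule for OUTPUT can be sketched in a few lines of Python. This is only an illustration of the rule as worded above. The function name default_index_name is hypothetical, and taking the last path component of a URL is an assumption about what "a file in the current directory" means.

```python
def default_index_name(input_spec):
    """Return the index file name implied by the documented default for OUTPUT."""
    is_url = "://" in input_spec
    # Assumption: for a URL, only the final path component is kept, so the
    # index is written relative to the current directory.
    name = input_spec.rsplit("/", 1)[-1] if is_url else input_spec
    if name.endswith(".bam"):
        # x.bam -> x.bai
        return name[: -len(".bam")] + ".bai"
    # Any other name: append .bai to the whole name.
    return name + ".bai"

print(default_index_name("x.bam"))
print(default_index_name("reads.sorted"))
print(default_index_name("http://example.org/data/x.bam"))
```

URLs carrying a query string are not handled by this sketch.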


CalculateHsMetrics

USAGE: CalculateHsMetrics [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CalculateHsMetrics

Calculates a set of Hybrid Selection specific metrics from an aligned SAM or BAM file. If a reference sequence is provided, AT/GC dropout metrics will be calculated, and the PER_TARGET_COVERAGE option can be used to output GC and mean coverage information for every target.

Option    Description
BAIT_INTERVALS=File    An interval list file that contains the locations of the baits used. This option may be specified 0 or more times.
BAIT_SET_NAME=String    Bait set name. If not provided it is inferred from the filename of the bait intervals. Default value: null.
TARGET_INTERVALS=File    An interval list file that contains the locations of the targets. This option may be specified 0 or more times.
INPUT=File    An aligned SAM or BAM file. Required.
OUTPUT=File    The output file to write the metrics to. Required.
METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel    The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
REFERENCE_SEQUENCE=File    The reference sequence aligned to. Default value: null.
PER_TARGET_COVERAGE=File    An optional file to output per target coverage information to. Default value: null.

CleanSam

USAGE: CleanSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CleanSam

Read SAM and perform various fix-ups. Currently, the only fix-ups are 1: to soft-clip an alignment that hangs off the end of its reference sequence; and 2: to set MAPQ to 0 if a read is unmapped.

Option    Description
INPUT=File    Input SAM to be cleaned. Required.
OUTPUT=File    Where to write cleaned SAM. Required.

CollectAlignmentSummaryMetrics

USAGE: CollectAlignmentSummaryMetrics [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CollectAlignmentSummaryMetrics

Reads a SAM or BAM file and writes a file containing summary alignment metrics.

Option    Description
MAX_INSERT_SIZE=Integer    Paired end reads above this insert size will be considered chimeric along with inter-chromosomal pairs. Default value: 100000. This option can be set to 'null' to clear the default value.
ADAPTER_SEQUENCE=String    List of adapter sequences to use when processing the alignment metrics. This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel    The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
IS_BISULFITE_SEQUENCED=Boolean    Whether the SAM or BAM file consists of bisulfite sequenced reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
INPUT=File    Input SAM or BAM file. Required.
OUTPUT=File    File to write the output to. Required.
REFERENCE_SEQUENCE=File    Reference sequence fasta. Default value: null.
ASSUME_SORTED=Boolean    If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
STOP_AFTER=Long    Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.

CollectGcBiasMetrics

Usage: program [options...]

Option    Description
REFERENCE_SEQUENCE=File    The reference sequence fasta file. Required.
INPUT=File    The BAM or SAM file containing aligned reads. Must be coordinate-sorted. Required.
OUTPUT=File    The text file to write the metrics table to. Required.
CHART_OUTPUT=File    The PDF file to render the chart to. Required.
SUMMARY_OUTPUT=File    The text file to write summary metrics to. Default value: null.
WINDOW_SIZE=Integer    The size of windows on the genome that are used to bin reads. Default value: 100. This option can be set to 'null' to clear the default value.
MINIMUM_GENOME_FRACTION=Double    For summary metrics, exclude GC windows that include less than this fraction of the genome. Default value: 1.0E-5. This option can be set to 'null' to clear the default value.
ASSUME_SORTED=Boolean    If true, assume that the input file is coordinate sorted, even if the header says otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
IS_BISULFITE_SEQUENCED=Boolean    Whether the SAM or BAM file consists of bisulfite sequenced reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

CollectInsertSizeMetrics

USAGE: CollectInsertSizeMetrics [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CollectInsertSizeMetrics

Reads a SAM or BAM file, writes a file containing metrics about the statistical distribution of insert size (excluding duplicates), and generates a histogram plot.

Option    Description
HISTOGRAM_FILE=File    File to write insert size histogram chart to. Required.
DEVIATIONS=Double    Generate mean, sd and plots by trimming the data down to MEDIAN + DEVIATIONS*MEDIAN_ABSOLUTE_DEVIATION. This is done because insert size data typically includes enough anomalous values from chimeras and other artifacts to make the mean and sd grossly misleading regarding the real distribution. Default value: 10.0. This option can be set to 'null' to clear the default value.
HISTOGRAM_WIDTH=Integer    Explicitly sets the histogram width, overriding automatic truncation of the histogram tail. Also, when calculating mean and standard deviation, only bins <= HISTOGRAM_WIDTH will be included. Default value: null.
MINIMUM_PCT=Float    When generating the histogram, discard any data categories (out of FR, TANDEM, RF) that have fewer than this percentage of overall reads. (Range: 0 to 1). Default value: 0.05. This option can be set to 'null' to clear the default value.
METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel    The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
INPUT=File    Input SAM or BAM file. Required.
OUTPUT=File    File to write the output to. Required.
REFERENCE_SEQUENCE=File    Reference sequence fasta. Default value: null.
ASSUME_SORTED=Boolean    If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
STOP_AFTER=Long    Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.
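The trimming rule described for DEVIATIONS can be illustrated with a small Python sketch. It works on a plain list of insert sizes for clarity and is not meant to mirror how the tool stores its data internally. The function name trim_insert_sizes and the sample values are made up for the example.

```python
import statistics

def trim_insert_sizes(sizes, deviations=10.0):
    """Keep values up to MEDIAN + DEVIATIONS * MEDIAN_ABSOLUTE_DEVIATION."""
    median = statistics.median(sizes)
    # Median absolute deviation: median of the distances from the median.
    mad = statistics.median(abs(s - median) for s in sizes)
    cutoff = median + deviations * mad
    kept = [s for s in sizes if s <= cutoff]
    return kept, cutoff

# Five plausible inserts plus one chimera-like outlier (made-up numbers).
sizes = [180, 190, 200, 210, 220, 5000]
kept, cutoff = trim_insert_sizes(sizes)
print(cutoff)                   # 355.0
print(statistics.mean(sizes))   # 1000
print(statistics.mean(kept))    # 200
```

A single outlier moves the untrimmed mean from 200 to 1000, which is the kind of distortion the option description refers to.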

CollectMultipleMetrics

USAGE: CollectMultipleMetrics [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CollectMultipleMetrics

Takes an input BAM and reference sequence and runs one or more Picard metrics modules at the same time to cut down on I/O. Currently all programs are run with default options and fixed output extensions, but this may become more flexible in future.

Option    Description
INPUT=File    Input SAM or BAM file. Required.
REFERENCE_SEQUENCE=File    Reference sequence fasta. Default value: null.
ASSUME_SORTED=Boolean    If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
STOP_AFTER=Integer    Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.
OUTPUT=String    Base name of output files. Required.
PROGRAM=Program    List of metrics programs to apply during the pass through the SAM file. Possible values: {CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.

CollectTargetedPcrMetrics

Calculates a set of metrics for Illumina TruSeq Custom Amplicon sequencing from an aligned SAM or BAM file. If a reference sequence is provided, AT/GC dropout metrics will be calculated, and the PER_TARGET_COVERAGE option can be used to output GC and mean coverage information for every target.

Option    Description
AMPLICON_INTERVALS=File    An interval list file that contains the locations of the baits used. Required.
CUSTOM_AMPLICON_SET_NAME=String    Custom amplicon set name. If not provided it is inferred from the filename of the AMPLICON_INTERVALS intervals. Default value: null.
TARGET_INTERVALS=File    An interval list file that contains the locations of the targets. This option may be specified 0 or more times.
INPUT=File    An aligned SAM or BAM file. Required.
OUTPUT=File    The output file to write the metrics to. Required.
METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel    The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
REFERENCE_SEQUENCE=File    The reference sequence aligned to. Default value: null.
PER_TARGET_COVERAGE=File    An optional file to output per target coverage information to. Default value: null.

CollectRnaSeqMetrics

USAGE: CollectRnaSeqMetrics [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CollectRnaSeqMetrics

Program to collect metrics about the alignment of RNA to various functional classes of loci in the genome: coding, intronic, UTR, intergenic, ribosomal.

Also determines strand-specificity for strand-specific libraries.

Option    Description
REF_FLAT=File    Gene annotations in refFlat form. Format described here: http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat Required.
RIBOSOMAL_INTERVALS=File    Location of rRNA sequences in genome, in interval_list format. If not specified no bases will be identified as being ribosomal. Format described here: http://picard.sourceforge.net/javadoc/net/sf/picard/util/IntervalList.html Default value: null.
STRAND_SPECIFICITY=StrandSpecificity    For strand-specific library prep. For unpaired reads, use FIRST_READ_TRANSCRIPTION_STRAND if the reads are expected to be on the transcription strand. Required. Possible values: {NONE, FIRST_READ_TRANSCRIPTION_STRAND, SECOND_READ_TRANSCRIPTION_STRAND}
MINIMUM_LENGTH=Integer    When calculating coverage based values (e.g. CV of coverage) only use transcripts of this length or greater. Default value: 500. This option can be set to 'null' to clear the default value.
CHART_OUTPUT=File    The PDF file to write out a plot of normalized position vs. coverage. Default value: null.
IGNORE_SEQUENCE=String    If a read maps to a sequence specified with this option, all the bases in the read are counted as ignored bases. These reads are not counted as This option may be specified 0 or more times.
RRNA_FRAGMENT_PERCENTAGE=Double    This percentage of the length of a fragment must overlap one of the ribosomal intervals for a read or read pair to be considered rRNA. Default value: 0.8. This option can be set to 'null' to clear the default value.
METRIC_ACCUMULATION_LEVEL=MetricAccumulationLevel    The level(s) at which to accumulate metrics. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
INPUT=File    Input SAM or BAM file. Required.
OUTPUT=File    File to write the output to. Required.
REFERENCE_SEQUENCE=File    Reference sequence fasta. Default value: null.
ASSUME_SORTED=Boolean    If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
STOP_AFTER=Long    Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.

CollectWgsMetrics

Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments.

Option    Description
INPUT=File    Input SAM or BAM file. Required.
OUTPUT=File    Output metrics file. Required.
REFERENCE_SEQUENCE=File    The reference sequence fasta aligned to. Required.
MINIMUM_MAPPING_QUALITY=Integer    Minimum mapping quality for a read to contribute coverage. Default value: 20. This option can be set to 'null' to clear the default value.
MINIMUM_BASE_QUALITY=Integer    Minimum base quality for a base to contribute coverage. Default value: 20. This option can be set to 'null' to clear the default value.
COVERAGE_CAP=Integer    Treat bases with coverage exceeding this value as if they had coverage at this value. Default value: 250. This option can be set to 'null' to clear the default value.
STOP_AFTER=Long    For debugging purposes, stop after processing this many genomic bases. Default value: -1. This option can be set to 'null' to clear the default value.

CompareSAMs

USAGE: CompareSAMS <SAMFile1> <SAMFile2>

Compares the headers of the two input SAM or BAM files and, if possible, the SAMRecords. For SAMRecords, compares only the readUnmapped flag, reference name, start position and strand. Reports the number of SAMRecords that match, differ in alignment, are mapped in only one input, or are missing in one of the files.


CreateSequenceDictionary

Usage: picard.sam.CreateSequenceDictionary [options]

Read fasta or fasta.gz containing reference sequences, and write as a SAM or BAM file with only sequence dictionary.

Option    Description
REFERENCE=File    Input reference fasta or fasta.gz. Required.
OUTPUT=File    Output SAM or BAM file containing only the sequence dictionary. Required.
GENOME_ASSEMBLY=String    Put into AS field of sequence dictionary entry if supplied. Default value: null.
URI=String    Put into UR field of sequence dictionary entry. If not supplied, input reference file is used. Default value: null.
SPECIES=String    Put into SP field of sequence dictionary entry. Default value: null.
TRUNCATE_NAMES_AT_WHITESPACE=Boolean    Make sequence name the first word from the > line in the fasta file. By default the entire contents of the > line is used, excluding leading and trailing whitespace. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
NUM_SEQUENCES=Integer    Stop after writing this many sequences. For testing. Default value: 2147483647. This option can be set to 'null' to clear the default value.

DownsampleSam

USAGE: DownsampleSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#DownsampleSam

Randomly down-sample a SAM or BAM file to retain a random subset of the reads. Mate-pairs are either both kept or both discarded. Reads marked as not primary alignments are all discarded. Each read is given a probability P of being retained; runs with exactly the same input, in the same order, and with the same value for RANDOM_SEED produce the same results.

Option    Description
INPUT=File    The input SAM or BAM file to downsample. Required.
OUTPUT=File    The output, downsampled, SAM or BAM file to write. Required.
RANDOM_SEED=Long    Random seed to use if reproducibility is desired. Setting to null will cause multiple invocations to produce different results. Default value: 1. This option can be set to 'null' to clear the default value.
PROBABILITY=Double    The probability of keeping any individual read, between 0 and 1. Default value: 1.0. This option can be set to 'null' to clear the default value.
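The behaviour described for DownsampleSam (seeded randomness, mates kept or dropped together, non-primary alignments discarded) can be sketched in Python as follows. This is an illustration of the description only, not the tool's code. The record layout (read name, is_primary) and the function name downsample are hypothetical.

```python
import random

def downsample(records, probability=1.0, seed=1):
    """records: iterable of (read_name, is_primary) tuples (hypothetical layout)."""
    rng = random.Random(seed)   # same seed + same input order -> same output
    decisions = {}              # read name -> keep/discard, shared by both mates
    kept = []
    for name, is_primary in records:
        if not is_primary:
            continue            # non-primary alignments are always discarded
        if name not in decisions:
            # One draw per read name, so both mates get the same decision.
            decisions[name] = rng.random() < probability
        if decisions[name]:
            kept.append((name, is_primary))
    return kept

# 100 read pairs plus one non-primary alignment (made-up input).
records = [(f"r{i}", True) for i in range(100) for _ in range(2)] + [("r0", False)]
first = downsample(records, probability=0.5, seed=1)
second = downsample(records, probability=0.5, seed=1)
print(first == second)
```

The dictionary of decisions grows with the number of distinct read names. A production implementation would likely drop an entry once the mate has been seen.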

ExtractIlluminaBarcodes

USAGE: ExtractIlluminaBarcodes [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#ExtractIlluminaBarcodes

Determine the barcode for each read in an Illumina lane.

For each tile, a file is written to the basecalls directory of the form s_<lane>_<tile>_barcode.txt. An output file contains a line for each read in the tile, aligned with the regular basecall output.

The output file contains the following tab-separated columns:

* read subsequence at barcode position

* Y or N indicating if there was a barcode match

* matched barcode sequence

Note that the order of specification of barcodes can cause arbitrary differences in output for poorly matching barcodes.
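One way to read the matching options listed in the option table (MAX_MISMATCHES, MIN_MISMATCH_DELTA, MAX_NO_CALLS) is sketched below in Python. It is an interpretation of the option descriptions, not the tool's implementation. The function names are hypothetical, and not counting no-calls as mismatches is an assumption.

```python
NO_CALLS = "N."

def count_mismatches(barcode, observed):
    # Assumption: no-calls in the observed bases are not counted as mismatches.
    return sum(1 for b, o in zip(barcode, observed) if o not in NO_CALLS and b != o)

def match_barcode(observed, barcodes, max_mismatches=1, min_mismatch_delta=1, max_no_calls=2):
    """Return (best barcode, matched?) for one observed barcode read."""
    no_calls = sum(1 for o in observed if o in NO_CALLS)
    # Sort by (mismatches, position in the list): ties go to the barcode
    # specified first, which is one way order of specification can matter.
    scored = sorted((count_mismatches(b, observed), i) for i, b in enumerate(barcodes))
    best_mm, best_idx = scored[0]
    second_mm = scored[1][0] if len(scored) > 1 else len(observed)
    matched = (
        no_calls <= max_no_calls
        and best_mm <= max_mismatches
        and second_mm - best_mm >= min_mismatch_delta
    )
    return barcodes[best_idx], matched

barcodes = ["ACGTACGT", "TTTTGGGG"]   # made-up barcode set
for observed in ("ACGTACGA", "ACGTNNNT"):
    barcode, matched = match_barcode(observed, barcodes)
    # Same three columns as the output file described above.
    print(observed, "Y" if matched else "N", barcode, sep="\t")
```

The second read is reported as unmatched because it contains three no-calls, one more than the default MAX_NO_CALLS.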

Option    Description
BASECALLS_DIR=File    The Illumina basecalls directory. Required.
OUTPUT_DIR=File    Where to write _barcode.txt files. By default, these are written to BASECALLS_DIR. Default value: null.
LANE=Integer    Lane number. Required.
READ_STRUCTURE=String    A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. if the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records, those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Required.
BARCODE=String    Barcode sequence. These must be unique, and all the same length. This cannot be used with reads that have more than one barcode; use BARCODE_FILE in that case. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) BARCODE_FILE
BARCODE_FILE=File    Tab-delimited file of barcode sequences, barcode name and, optionally, library name. Barcodes must be unique and all the same length. Column headers must be 'barcode_sequence_1', 'barcode_sequence_2' (optional), 'barcode_name', and 'library_name'. Required. Cannot be used in conjunction with option(s) BARCODE
METRICS_FILE=File    Per-barcode and per-lane metrics written to this file. Required.
MAX_MISMATCHES=Integer    Maximum mismatches for a barcode to be considered a match. Default value: 1. This option can be set to 'null' to clear the default value.
MIN_MISMATCH_DELTA=Integer    Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match. Default value: 1. This option can be set to 'null' to clear the default value.
MAX_NO_CALLS=Integer    Maximum allowable number of no-calls in a barcode read before it is considered unmatchable. Default value: 2. This option can be set to 'null' to clear the default value.
MINIMUM_BASE_QUALITY=Integer    Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match. Default value: 0. This option can be set to 'null' to clear the default value.
MINIMUM_QUALITY=Integer    The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice the value has been observed lower. Default value: 2. This option can be set to 'null' to clear the default value.
COMPRESS_OUTPUTS=Boolean    Compress output s_l_t_barcode.txt files using gzip and append a .gz extension to the file names. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
NUM_PROCESSORS=Integer    Run this many PerTileBarcodeExtractors in parallel. If NUM_PROCESSORS = 0, number of cores is automatically set to the number of cores available on the machine. If NUM_PROCESSORS < 0 then the number of cores used will be the number available on the machine less the absolute value of NUM_PROCESSORS. Default value: 1. This option can be set to 'null' to clear the default value.
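The READ_STRUCTURE notation can be made concrete with a short Python sketch that parses a structure string and splits a cluster's bases accordingly. The function names are hypothetical and the 80-base cluster is synthetic. The example string "36T8B8S28T" is the one used in the option description.

```python
import re

def parse_read_structure(structure):
    """Turn e.g. '36T8B8S28T' into [(36, 'T'), (8, 'B'), (8, 'S'), (28, 'T')]."""
    tokens = re.findall(r"(\d+)([TBS])", structure)
    # Reject strings with leftover characters that the pattern did not consume.
    if not tokens or "".join(n + kind for n, kind in tokens) != structure:
        raise ValueError(f"invalid read structure: {structure!r}")
    return [(int(n), kind) for n, kind in tokens]

def split_cluster(bases, structure):
    """Split one cluster's bases into (kind, bases) reads, in order."""
    reads, pos = [], 0
    for length, kind in parse_read_structure(structure):
        reads.append((kind, bases[pos:pos + length]))
        pos += length
    if pos != len(bases):
        raise ValueError("read structure does not cover the cluster exactly")
    return reads

cluster = "A" * 36 + "C" * 8 + "G" * 8 + "T" * 28   # synthetic 80-base cluster
reads = split_cluster(cluster, "36T8B8S28T")
# Skipped cycles (S) are the ones that would not reach the output.
print([(kind, len(seq)) for kind, seq in reads if kind != "S"])
```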

EstimateLibraryComplexity

Attempts to estimate library complexity from sequence of read pairs alone. Does so by sorting all reads by the first N bases (5 by default) of each read and then comparing reads with the first N bases identical to each other for duplicates. Reads are considered to be duplicates if they match each other with no gaps and an overall mismatch rate less than or equal to MAX_DIFF_RATE (0.03 by default).

Reads of poor quality are filtered out so as to provide a more accurate estimate. The filtering removes reads with any no-calls in the first N bases or with a mean base quality lower than MIN_MEAN_QUALITY across either the first or second read.

Unpaired reads are ignored in this computation.

The algorithm attempts to detect optical duplicates separately from PCR duplicates and excludes these in the calculation of library size. Also, since there is no alignment to screen out technical reads, one further filter is applied on the data. After examining all reads, a histogram is built of [#reads in duplicate set -> #of duplicate sets]; all bins that contain exactly one duplicate set are then removed from the histogram as outliers before library size is estimated.
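The histogram filter in the last step can be sketched in Python. The input is a list with one entry per duplicate set, giving the number of reads in that set. The function name and the numbers are made up for illustration.

```python
from collections import Counter

def duplicate_set_histogram(set_sizes):
    """Map #reads-in-duplicate-set -> #duplicate-sets, dropping singleton bins."""
    histogram = Counter(set_sizes)
    # A bin that holds exactly one duplicate set is treated as an outlier.
    return {size: count for size, count in histogram.items() if count != 1}

# Made-up data: four sets of size 1, two of size 2, two of size 3, one of size 40.
sizes = [1, 1, 1, 1, 2, 2, 3, 3, 40]
print(duplicate_set_histogram(sizes))   # {1: 4, 2: 2, 3: 2}
```

The lone set of 40 reads, which could come from a technical sequence, is removed before any library size estimate is made.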

Option    Description
INPUT=File    One or more files to combine and estimate library complexity from. Reads can be mapped or unmapped. This option may be specified 0 or more times.
OUTPUT=File    Output file to write per-library metrics to. Required.
MIN_IDENTICAL_BASES=Integer    The minimum number of bases at the starts of reads that must be identical for reads to be grouped together for duplicate detection. In effect total_reads / 4^max_id_bases reads will be compared at a time, so lower numbers will produce more accurate results but consume exponentially more memory and CPU. Default value: 5. This option can be set to 'null' to clear the default value.
MAX_DIFF_RATE=Double    The maximum rate of differences between two reads to call them identical. Default value: 0.03. This option can be set to 'null' to clear the default value.
MIN_MEAN_QUALITY=Integer    The minimum mean quality of the bases in a read pair for the read to be analyzed. Reads with lower average quality are filtered out and not considered in any calculations. Default value: 20. This option can be set to 'null' to clear the default value.
MAX_GROUP_RATIO=Integer    Do not process self-similar groups that are this many times over the mean expected group size. I.e. if the input contains 10m read pairs and MIN_IDENTICAL_BASES is set to 5, then the mean expected group size would be approximately 10 reads. Default value: 500. This option can be set to 'null' to clear the default value.
READ_NAME_REGEX=String    Regular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done, but instead the read name is split on colon character and the 2nd, 3rd and 4th elements are assumed to be tile, x and y values. Default value: [a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*. This option can be set to 'null' to clear the default value.
OPTICAL_DUPLICATE_PIXEL_DISTANCE=Integer    The maximum offset between two duplicate clusters in order to consider them optical duplicates. This should usually be set to some fairly small number (e.g. 5-10 pixels) unless using later versions of the Illumina pipeline that multiply pixel values by 10, in which case 50-100 is more normal. Default value: 100. This option can be set to 'null' to clear the default value.
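The group-size figure quoted for MAX_GROUP_RATIO can be checked with a little arithmetic, sketched here in Python. The figure of roughly 10 reads for 10m read pairs works out if the grouping key is taken from the first MIN_IDENTICAL_BASES bases of both reads in a pair, giving 4^(2*N) possible keys. That reading is an assumption made here to reproduce the documented number. With a single read's prefix the result would be near 9766 instead.

```python
def expected_group_size(read_pairs, min_identical_bases=5):
    # Assumption: both reads of a pair contribute N bases to the grouping key,
    # so there are 4 ** (2 * N) equally likely keys.
    return read_pairs / 4 ** (2 * min_identical_bases)

mean_size = expected_group_size(10_000_000, 5)
print(round(mean_size, 2))     # 9.54, i.e. "approximately 10 reads"

# With the default MAX_GROUP_RATIO of 500, groups larger than this are skipped.
print(round(500 * mean_size))  # 4768
```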

FastqToSam

Extracts read sequences and qualities from the input fastq file and writes them into the output file in unaligned BAM format. Input files can be in GZip format (end in .gz).

OptionDescription
FASTQ=FileInput fastq file (optionally gzipped) for single end data, or first read in paired end data. Required.
FASTQ2=FileInput fastq file (optionally gzipped) for the second read of paired end data. Default value: null.
QUALITY_FORMAT=FastqQualityFormatA value describing how the quality values are encoded in the fastq. Either Solexa for pre-pipeline 1.3 style scores (solexa scaling + 66), Illumina for pipeline 1.3 and above (phred scaling + 64) or Standard for phred scaled scores with a character shift of 33. If this value is not specified, the quality format will be detected automatically. Default value: null. Possible values: {Solexa, Illumina, Standard}
OUTPUT=FileOutput SAM/BAM file. Required.
READ_GROUP_NAME=StringRead group name Default value: A. This option can be set to 'null' to clear the default value.
SAMPLE_NAME=StringSample name to insert into the read group header Required.
LIBRARY_NAME=StringThe library name to place into the LB attribute in the read group header Default value: null.
PLATFORM_UNIT=StringThe platform unit (often run_barcode.lane) to insert into the read group header Default value: null.
PLATFORM=StringThe platform type (e.g. illumina, solid) to insert into the read group header Default value: null.
SEQUENCING_CENTER=StringThe sequencing center from which the data originated Default value: null.
PREDICTED_INSERT_SIZE=IntegerPredicted median insert size, to insert into the read group header Default value: null.
COMMENT=StringComment(s) to include in the merged output file's header. This option may be specified 0 or more times.
DESCRIPTION=StringInserted into the read group header Default value: null.
RUN_DATE=Iso8601DateDate the run was produced, to insert into the read group header Default value: null.
SORT_ORDER=SortOrderThe sort order for the output sam/bam file. Default value: queryname. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate}
MIN_Q=IntegerMinimum quality allowed in the input fastq. An exception will be thrown if a quality is less than this value. Default value: 0. This option can be set to 'null' to clear the default value.
MAX_Q=IntegerMaximum quality allowed in the input fastq. An exception will be thrown if a quality is greater than this value. Default value: 93. This option can be set to 'null' to clear the default value.
STRIP_UNPAIRED_MATE_NUMBER=BooleanIf true and this is an unpaired fastq, any occurrence of '/1' will be removed from the end of a read name. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ALLOW_AND_IGNORE_EMPTY_LINES=BooleanAllow (and ignore) empty lines Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
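
As an illustration of how QUALITY_FORMAT, MIN_Q and MAX_Q interact, the following is a minimal Python sketch, not Picard's implementation. It covers only the two phred-scaled encodings described above (Standard with a shift of 33, Illumina with a shift of 64); Solexa-scaled scores are left out because they use a different scale. The function name `decode_qualities` and the `OFFSETS` table are hypothetical.

```python
# Hypothetical helper for illustration only; not part of Picard.
# Character shifts for the two phred-scaled encodings described above.
OFFSETS = {"Standard": 33, "Illumina": 64}


def decode_qualities(quality_string, quality_format="Standard", min_q=0, max_q=93):
    """Convert a FASTQ quality string to integer phred scores.

    Raises ValueError if any score falls outside [min_q, max_q],
    mirroring the documented MIN_Q / MAX_Q behaviour.
    """
    offset = OFFSETS[quality_format]
    scores = [ord(ch) - offset for ch in quality_string]
    for score in scores:
        if score < min_q or score > max_q:
            raise ValueError(
                f"quality {score} is outside the allowed range [{min_q}, {max_q}]"
            )
    return scores


if __name__ == "__main__":
    print(decode_qualities("II5!"))  # [40, 40, 20, 0]
```

The same characters decode to different scores under a different shift, which is why a wrongly chosen QUALITY_FORMAT tends to surface as a MIN_Q or MAX_Q violation.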

FilterSamReads

Produces a new SAM or BAM file by including or excluding aligned reads or a list of read names supplied in the READ_LIST_FILE from the INPUT SAM or BAM file.

OptionDescription
INPUT=FileThe SAM or BAM file that will be filtered. Required.
FILTER=FilterFilter. Required. Possible values: {includeAligned [OUTPUT SAM/BAM will contain aligned reads only. INPUT SAM/BAM must be in queryname SortOrder. (Note that *both* first and second of paired reads must be aligned to be included in the OUTPUT SAM or BAM)], excludeAligned [OUTPUT SAM/BAM will contain un-mapped reads only. INPUT SAM/BAM must be in queryname SortOrder. (Note that *both* first and second of pair must be aligned to be excluded from the OUTPUT SAM or BAM)], includeReadList [OUTPUT SAM/BAM will contain reads that are supplied in the READ_LIST_FILE file], excludeReadList [OUTPUT bam will contain reads that are *not* supplied in the READ_LIST_FILE file]}
READ_LIST_FILE=FileRead List File containing reads that will be included or excluded from the OUTPUT SAM or BAM file. Default value: null.
SORT_ORDER=SortOrderSortOrder of the OUTPUT SAM or BAM file, otherwise use the SortOrder of the INPUT file. Default value: null. Possible values: {unsorted, queryname, coordinate}
WRITE_READS_FILES=BooleanCreate .reads files (for debugging purposes) Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
OUTPUT=FileSAM or BAM file to which the filtered reads are written. Required.
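
The two read-list filters can be pictured as a set-membership test on read names. The sketch below is a simplified Python illustration that works on plain name lists rather than SAM records; the function `filter_by_read_list` is hypothetical and is not Picard's code.

```python
# Hypothetical illustration of includeReadList / excludeReadList;
# real SAM/BAM records are replaced by bare read names.
def filter_by_read_list(read_names, read_list, mode):
    """Return the read names kept under the given FILTER mode."""
    wanted = set(read_list)  # names taken from READ_LIST_FILE
    if mode == "includeReadList":
        return [name for name in read_names if name in wanted]
    if mode == "excludeReadList":
        return [name for name in read_names if name not in wanted]
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    reads = ["read1", "read2", "read3"]
    print(filter_by_read_list(reads, ["read2"], "includeReadList"))  # ['read2']
    print(filter_by_read_list(reads, ["read2"], "excludeReadList"))  # ['read1', 'read3']
```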

FixMateInformation

Ensure that all mate-pair information is in sync between each read and its mate pair. If no OUTPUT file is supplied then the output is written to a temporary file and then copied over the INPUT file. Reads marked with the secondary alignment flag are written to the output file unchanged.

OptionDescription
INPUT=FileThe input file to fix. This option may be specified 0 or more times.
OUTPUT=FileThe output file to write to. If no output file is supplied, the input file is overwritten. Default value: null.
SORT_ORDER=SortOrderOptional sort order if the OUTPUT file should be sorted differently than the INPUT file. Default value: null. Possible values: {unsorted, queryname, coordinate}
ASSUME_SORTED=BooleanIf true, assume that the input file is queryname sorted, even if the header says otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ADD_MATE_CIGAR=BooleanAdds the mate CIGAR tag (MC) if true, does not if false. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}

GatherBamFiles

Concatenates one or more BAM files together as efficiently as possible. Assumes that the list of BAM files provided as INPUT are in the order that they should be concatenated and simply concatenates the bodies of the BAM files while retaining the header from the first file. Operates via copying of the gzip blocks directly for speed but also supports generation of an MD5 on the output and indexing of the output BAM file. Only supports BAM files; SAM files are not supported.

OptionDescription
INPUT=FileOne or more BAM files or text files containing lists of BAM files one per line. This option may be specified 0 or more times.
OUTPUT=FileThe output BAM file to write. Required.

IlluminaBasecallsToFastq

USAGE: IlluminaBasecallsToFastq [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#IlluminaBasecallsToFastq

Generate fastq file(s) from data in an Illumina basecalls output directory.

Separate fastq file(s) are created for each template read, and for each barcode read, in the basecalls. Template fastqs have extensions like .<number>.fastq, where <number> is the number of the template read, starting with 1. Barcode fastqs have extensions like .barcode_<number>.fastq, where <number> is the number of the barcode read, starting with 1.

OptionDescription
BASECALLS_DIR=FileThe basecalls directory. Required.
BARCODES_DIR=FileThe barcodes directory with _barcode.txt files (generated by ExtractIlluminaBarcodes). If not set, use BASECALLS_DIR. Default value: null.
LANE=IntegerLane number. Required.
OUTPUT_PREFIX=FileThe prefix for output fastqs. Extensions as described above are appended. Use this option for a non-barcoded run, or for a barcoded run in which it is not desired to demultiplex reads into separate files by barcode. Required. Cannot be used in conjunction with option(s) MULTIPLEX_PARAMS
RUN_BARCODE=StringThe barcode of the run. Prefixed to read names. Required.
MACHINE_NAME=StringThe name of the machine on which the run was sequenced; required if emitting Casava1.8-style read name headers Default value: null.
FLOWCELL_BARCODE=StringThe barcode of the flowcell that was sequenced; required if emitting Casava1.8-style read name headers Default value: null.
READ_STRUCTURE=StringA description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Required.
MULTIPLEX_PARAMS=FileTab-separated file for creating all output fastqs demultiplexed by barcode for a lane with a single IlluminaBasecallsToFastq invocation. The columns are OUTPUT_PREFIX, and BARCODE_1, BARCODE_2 ... BARCODE_X where X = number of barcodes per cluster (optional). Row with BARCODE_1 set to 'N' is used to specify an output_prefix for no barcode match. Required. Cannot be used in conjunction with option(s) OUTPUT_PREFIX (O)
ADAPTERS_TO_CHECK=IlluminaAdapterPairWhich adapters to look for in the read. Possible values: {PAIRED_END, INDEXED, SINGLE_END, NEXTERA_V1, NEXTERA_V2, DUAL_INDEXED, FLUIDIGM, TRUSEQ_SMALLRNA, ALTERNATIVE_SINGLE_END} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
NUM_PROCESSORS=IntegerThe number of threads to run in parallel. If NUM_PROCESSORS = 0, the number of threads is set to the number of cores available on the machine. If NUM_PROCESSORS < 0, the number of threads is the number of cores available on the machine less the absolute value of NUM_PROCESSORS. Default value: 0. This option can be set to 'null' to clear the default value.
FIRST_TILE=IntegerIf set, this is the first tile to be processed (used for debugging). Note that tiles are not processed in numerical order. Default value: null.
TILE_LIMIT=IntegerIf set, process no more than this many tiles (used for debugging). Default value: null.
APPLY_EAMSS_FILTER=BooleanApply EAMSS filtering to identify inappropriately quality scored bases towards the ends of reads and convert their quality scores to Q2. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
FORCE_GC=BooleanIf true, call System.gc() periodically. This is useful in cases in which the -Xmx value passed is larger than the available memory. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_READS_IN_RAM_PER_TILE=IntegerConfigure SortingCollections to store this many records before spilling to disk. For an indexed run, each SortingCollection gets this value/number of indices. Default value: 1200000. This option can be set to 'null' to clear the default value.
MINIMUM_QUALITY=IntegerThe minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice the value has been observed lower. Default value: 2. This option can be set to 'null' to clear the default value.
INCLUDE_NON_PF_READS=BooleanWhether to include non-PF reads Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
READ_NAME_FORMAT=ReadNameFormatThe read name header formatting to emit. Casava1.8 formatting has additional information beyond Illumina, including: the passing-filter flag value for the read, the flowcell name, and the sequencer name. Default value: CASAVA_1_8. This option can be set to 'null' to clear the default value. Possible values: {CASAVA_1_8, ILLUMINA}
COMPRESS_OUTPUTS=BooleanCompress output FASTQ files using gzip and append a .gz extension to the file names. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
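
To make the READ_STRUCTURE notation concrete, here is a small Python sketch that parses a string such as "36T8B8S28T" into (cycle count, cycle type) pairs and splits a cluster's bases accordingly. It follows the description of the option above; the function names `parse_read_structure` and `split_cluster` are hypothetical and this is not how Picard itself is implemented.

```python
import re

# Cycle-type letters as described for READ_STRUCTURE.
CYCLE_TYPES = {"T": "template", "B": "barcode", "S": "skip"}


def parse_read_structure(read_structure):
    """Parse e.g. '36T8B8S28T' into [(36, 'template'), (8, 'barcode'), ...]."""
    if not re.fullmatch(r"(?:[0-9]+[TBS])+", read_structure):
        raise ValueError(f"malformed read structure: {read_structure!r}")
    pairs = re.findall(r"([0-9]+)([TBS])", read_structure)
    return [(int(count), CYCLE_TYPES[kind]) for count, kind in pairs]


def split_cluster(bases, read_structure):
    """Split one cluster's bases into reads, one per read-structure segment."""
    segments = parse_read_structure(read_structure)
    total_cycles = sum(count for count, _ in segments)
    if total_cycles != len(bases):
        raise ValueError(f"expected {total_cycles} bases, got {len(bases)}")
    reads, position = [], 0
    for count, kind in segments:
        reads.append((kind, bases[position:position + count]))
        position += count
    return reads


if __name__ == "__main__":
    print(parse_read_structure("36T8B8S28T"))
    # [(36, 'template'), (8, 'barcode'), (8, 'skip'), (28, 'template')]
```

In this sketch the skipped segment is still returned by `split_cluster`; a caller would drop entries of type 'skip' before writing output, matching the note that skipped cycles are not included.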

IlluminaBasecallsToSam

USAGE: IlluminaBasecallsToSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#IlluminaBasecallsToSam

Generate a SAM or BAM file from data in an Illumina basecalls output directory.

OptionDescription
BASECALLS_DIR=FileThe basecalls directory. Required.
BARCODES_DIR=FileThe barcodes directory with _barcode.txt files (generated by ExtractIlluminaBarcodes). If not set, use BASECALLS_DIR. Default value: null.
LANE=IntegerLane number. Required.
OUTPUT=FileDeprecated (use LIBRARY_PARAMS). The output SAM or BAM file. Format is determined by extension. Required. Cannot be used in conjunction with option(s) BARCODE_PARAMS LIBRARY_PARAMS
RUN_BARCODE=StringThe barcode of the run. Prefixed to read names. Required.
SAMPLE_ALIAS=StringDeprecated (use LIBRARY_PARAMS). The name of the sequenced sample. Required. Cannot be used in conjunction with option(s) BARCODE_PARAMS LIBRARY_PARAMS
READ_GROUP_ID=StringID used to link RG header record with RG tag in SAM record. If these are unique in SAM files that get merged, merge performance is better. If not specified, READ_GROUP_ID will be set to <first 5 chars of RUN_BARCODE>.<LANE>. Default value: null.
LIBRARY_NAME=StringDeprecated (use LIBRARY_PARAMS). The name of the sequenced library. Default value: null. Cannot be used in conjunction with option(s) BARCODE_PARAMS LIBRARY_PARAMS
SEQUENCING_CENTER=StringThe name of the sequencing center that produced the reads. Used to set the RG.CN tag. Default value: BI. This option can be set to 'null' to clear the default value.
RUN_START_DATE=DateThe start date of the run. Default value: null.
PLATFORM=StringThe name of the sequencing technology that produced the read. Default value: illumina. This option can be set to 'null' to clear the default value.
READ_STRUCTURE=StringA description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Required.
BARCODE_PARAMS=FileDeprecated (use LIBRARY_PARAMS). Tab-separated file for creating all output BAMs for barcoded run with a single IlluminaBasecallsToSam invocation. Columns are BARCODE, OUTPUT, SAMPLE_ALIAS, and LIBRARY_NAME. Row with BARCODE=N is used to specify a file for no barcode match. Required. Cannot be used in conjunction with option(s) SAMPLE_ALIAS (ALIAS) LIBRARY_NAME (LIB) OUTPUT (O) LIBRARY_PARAMS
LIBRARY_PARAMS=FileTab-separated file for creating all output BAMs for a lane with a single IlluminaBasecallsToSam invocation. The columns are OUTPUT, SAMPLE_ALIAS, and LIBRARY_NAME, BARCODE_1, BARCODE_2 ... BARCODE_X where X = number of barcodes per cluster (optional). Row with BARCODE_1 set to 'N' is used to specify a file for no barcode match. You may also provide any 2 letter RG header attributes (excluding PU, CN, PL, and DT) as columns in this file and the values for those columns will be inserted into the RG tag for the BAM file created for a given row. Required. Cannot be used in conjunction with option(s) SAMPLE_ALIAS (ALIAS) LIBRARY_NAME (LIB) BARCODE_PARAMS OUTPUT (O)
ADAPTERS_TO_CHECK=IlluminaAdapterPairWhich adapters to look for in the read. Possible values: {PAIRED_END, INDEXED, SINGLE_END, NEXTERA_V1, NEXTERA_V2, DUAL_INDEXED, FLUIDIGM, TRUSEQ_SMALLRNA, ALTERNATIVE_SINGLE_END} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
NUM_PROCESSORS=IntegerThe number of threads to run in parallel. If NUM_PROCESSORS = 0, the number of threads is set to the number of cores available on the machine. If NUM_PROCESSORS < 0, the number of threads is the number of cores available on the machine less the absolute value of NUM_PROCESSORS. Default value: 0. This option can be set to 'null' to clear the default value.
FIRST_TILE=IntegerIf set, this is the first tile to be processed (used for debugging). Note that tiles are not processed in numerical order. Default value: null.
TILE_LIMIT=IntegerIf set, process no more than this many tiles (used for debugging). Default value: null.
FORCE_GC=BooleanIf true, call System.gc() periodically. This is useful in cases in which the -Xmx value passed is larger than the available memory. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
APPLY_EAMSS_FILTER=BooleanApply EAMSS filtering to identify inappropriately quality scored bases towards the ends of reads and convert their quality scores to Q2. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_READS_IN_RAM_PER_TILE=IntegerConfigure SortingCollections to store this many records before spilling to disk. For an indexed run, each SortingCollection gets this value/number of indices. Default value: 1200000. This option can be set to 'null' to clear the default value.
MINIMUM_QUALITY=IntegerThe minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice the value has been observed lower. Default value: 2. This option can be set to 'null' to clear the default value.
INCLUDE_NON_PF_READS=BooleanWhether to include non-PF reads Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
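
The three cases of NUM_PROCESSORS (zero, negative, positive) can be sketched in Python as follows. This assumes that a negative value means "leave that many cores unused", which is one reading of the option description; the helper `resolve_thread_count` is hypothetical and is not taken from Picard.

```python
import os


def resolve_thread_count(num_processors, available=None):
    """Resolve a NUM_PROCESSORS-style setting to a thread count.

    Assumption: a negative value reserves that many cores, so the
    result is available + num_processors (e.g. 8 + (-2) = 6).
    """
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() may return None
    if num_processors == 0:
        return available
    if num_processors < 0:
        return available + num_processors
    return num_processors


if __name__ == "__main__":
    print(resolve_thread_count(0, available=8))   # 8
    print(resolve_thread_count(-2, available=8))  # 6
    print(resolve_thread_count(4, available=8))   # 4
```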

CheckIlluminaDirectory

USAGE: CheckIlluminaDirectory [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#CheckIlluminaDirectory

Check that the files to provide the data specified by DATA_TYPES are available, exist, and are reasonably sized for every tile/cycle. Reasonably sized means non-zero sized for files that exist per tile and equal size for binary files that exist per cycle/per tile. CheckIlluminaDirectory DOES NOT check that the individual records in a file are well-formed.

OptionDescription
BASECALLS_DIR=FileThe basecalls output directory. Required.
DATA_TYPES=IlluminaDataTypeThe data types that should be checked for each tile/cycle. If no values are provided then the data types checked are those required by IlluminaBasecallsToSam (which is a superset of those used in ExtractIlluminaBarcodes). These data types vary slightly depending on whether or not the run is barcoded so READ_STRUCTURE should be the same as that which will be passed to IlluminaBasecallsToSam. If this option is left unspecified then both ExtractIlluminaBarcodes and IlluminaBasecallsToSam should complete successfully UNLESS the individual records of the files themselves are spurious. Possible values: {Position, BaseCalls, QualityScores, PF, Barcodes} This option may be specified 0 or more times.
READ_STRUCTURE=StringA description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of "36T8B8S28T" then, before being converted to SAM records those bases will be split into 4 reads where read one consists of 36 cycles of template, read two consists of 8 cycles of barcode, read three will be an 8 base read of skipped cycles and read four is another 28 cycle template read. The read consisting of skipped cycles would NOT be included in output SAM/BAM file read groups. Note: If you want to check whether or not a future IlluminaBasecallsToSam or ExtractIlluminaBarcodes run will fail then be sure to use the exact same READ_STRUCTURE that you would pass to these programs for this run. Required.
LANES=IntegerThe number of the lane(s) to check. This option must be specified at least once.
TILE_NUMBERS=IntegerThe number(s) of the tile(s) to check. This option may be specified 0 or more times.
FAKE_FILES=BooleanA flag to determine whether or not to create fake versions of the missing files. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
LINK_LOCS=BooleanA flag to create symlinks to the loc file for the X Ten for each tile. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}

IntervalListTools

USAGE: IntervalListTools [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#IntervalListTools

General tool for manipulating interval lists, including sorting, merging, padding, uniqueifying, and other set-theoretic operations. Default operation if given one or more inputs is to merge and sort them. Other options are controlled by arguments.

OptionDescription
INPUT=FileOne or more interval lists. If multiple interval lists are provided the output is the result of merging the inputs. This option must be specified at least once.
OUTPUT=FileThe output interval list file to write (if SCATTER_COUNT is 1) or the directory into which to write the scattered interval sub-directories (if SCATTER_COUNT > 1) Default value: null.
PADDING=IntegerThe amount to pad each end of the intervals by before other operations are undertaken. Negative numbers are allowed and indicate intervals should be shrunk. Resulting intervals < 0 bases long will be removed. Padding is applied to the interval lists before the ACTION is performed. Default value: 0. This option can be set to 'null' to clear the default value.
UNIQUE=BooleanIf true, merge overlapping and adjacent intervals to create a list of unique intervals. Implies SORT=true Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
SORT=BooleanIf true, sort the resulting interval list by coordinate. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ACTION=ActionAction to take on inputs. Default value: CONCAT. This option can be set to 'null' to clear the default value. Possible values: {

CONCAT (The concatenation of all the INPUTs, no sorting or merging of overlapping/abutting intervals implied. Will result in an unsorted list unless requested otherwise.)

UNION (Like CONCAT but with UNIQUE and SORT implied, the result being the set-wise union of all INPUTS.)

INTERSECT (The sorted, uniqued set of all loci that are contained in all of the INPUTs.)

SUBTRACT (Subtracts SECOND_INPUT from INPUT. The resulting loci are those in INPUT that are not in SECOND_INPUT.)

SYMDIFF (Find loci that are in INPUT or SECOND_INPUT but are not in both.)

}

SECOND_INPUT=FileSecond set of intervals for SUBTRACT and SYMDIFF operations. This option may be specified 0 or more times.
COMMENT=StringOne or more lines of comment to add to the header of the output file. This option may be specified 0 or more times.
SCATTER_COUNT=IntegerThe number of files into which to scatter the resulting list by locus; in some situations, fewer intervals may be emitted. Default value: 1. This option can be set to 'null' to clear the default value.
SUBDIVISION_MODE=ModeDo not subdivide Default value: INTERVAL_SUBDIVISION. This option can be set to 'null' to clear the default value. Possible values: {INTERVAL_SUBDIVISION, BALANCING_WITHOUT_INTERVAL_SUBDIVISION}
INVERT=BooleanProduce the inverse list Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
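
The effect of UNIQUE (merging overlapping and adjacent intervals) and of the UNION action (CONCAT with UNIQUE and SORT implied) can be illustrated with a short Python sketch. It assumes 1-based closed intervals represented as (contig, start, end) tuples and sorts contigs by name rather than by sequence-dictionary order, so it is a simplification; the functions `unique` and `union` are hypothetical, not Picard code.

```python
def unique(intervals):
    """Merge overlapping and adjacent intervals, returning a sorted list.

    Intervals are (contig, start, end) tuples with 1-based closed
    coordinates, so (1, 10) and (11, 20) are adjacent and get merged.
    Note: contigs are ordered by name here, not by a sequence dictionary.
    """
    merged = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            previous = merged[-1]
            merged[-1] = (contig, previous[1], max(previous[2], end))
        else:
            merged.append((contig, start, end))
    return merged


def union(*interval_lists):
    """Concatenate all inputs, then apply unique() (which also sorts)."""
    concatenated = [iv for interval_list in interval_lists for iv in interval_list]
    return unique(concatenated)


if __name__ == "__main__":
    first = [("chr1", 1, 10), ("chr1", 30, 40)]
    second = [("chr1", 11, 20), ("chr1", 5, 8)]
    print(union(first, second))  # [('chr1', 1, 20), ('chr1', 30, 40)]
```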

MakeSitesOnlyVcf

Reads a VCF/VCF.gz/BCF and removes all genotype information from it while retaining all site level information, including annotations based on genotypes (e.g. AN, AF). Output can be any supported variant format including .vcf, .vcf.gz or .bcf.

OptionDescription
INPUT=FileInput VCF or BCF Required.
OUTPUT=FileOutput VCF or BCF to emit without per-sample info. Required.
SAMPLE=StringOptionally one or more samples to retain when building the 'sites-only' VCF. This option may be specified 0 or more times.

MarkDuplicates

USAGE: MarkDuplicates [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#MarkDuplicates

Examines aligned records in the supplied SAM or BAM file to locate duplicate molecules. All records are then written to the output file with the duplicate records flagged.

OptionDescription
INPUT=FileOne or more input SAM or BAM files to analyze. Must be coordinate sorted. May not be a stream because file is read twice. This option may be specified 0 or more times.
OUTPUT=FileThe output file to write marked records to Required.
METRICS_FILE=FileFile to write duplication metrics to Required.
PROGRAM_RECORD_ID=StringThe program record ID for the @PG record(s) created by this program. Set to null to disable PG record creation. This string may have a suffix appended to avoid collision with other program record IDs. Default value: MarkDuplicates. This option can be set to 'null' to clear the default value.
PROGRAM_GROUP_VERSION=StringValue of VN tag of PG record to be created. If not specified, the version will be detected automatically. Default value: null.
PROGRAM_GROUP_COMMAND_LINE=StringValue of CL tag of PG record to be created. If not supplied the command line will be detected automatically. Default value: null.
PROGRAM_GROUP_NAME=StringValue of PN tag of PG record to be created. Default value: MarkDuplicates. This option can be set to 'null' to clear the default value.
COMMENT=StringComment(s) to include in the output file's header. This option may be specified 0 or more times.
REMOVE_DUPLICATES=BooleanIf true do not write duplicates to the output file instead of writing them with appropriate flags set. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ASSUME_SORTED=BooleanIf true, assume that the input file is coordinate sorted even if the header says otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=IntegerThis option is obsolete. ReadEnds will always be spilled to disk. Default value: 50000. This option can be set to 'null' to clear the default value.
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=IntegerMaximum number of file handles to keep open when spilling read ends to disk. Set this number a little lower than the per-process maximum number of files that may be open. This number can be found by executing the 'ulimit -n' command on a Unix system. Default value: 8000. This option can be set to 'null' to clear the default value.
SORTING_COLLECTION_SIZE_RATIO=DoubleThis number, plus the maximum RAM available to the JVM, determine the memory footprint used by some of the sorting collections. If you are running out of memory, try reducing this number. Default value: 0.25. This option can be set to 'null' to clear the default value.
READ_NAME_REGEX=StringRegular expression that can be used to parse read names in the incoming SAM file. Read names are parsed to extract three variables: tile/region, x coordinate and y coordinate. These values are used to estimate the rate of optical duplication in order to give a more accurate estimated library size. The regular expression should contain three capture groups for the three variables, in order. It must match the entire read name. Note that if the default regex is specified, a regex match is not actually done, but instead the read name is split on colon character and the 2nd, 3rd and 4th elements are assumed to be tile, x and y values. Default value: [a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*. This option can be set to 'null' to clear the default value.
OPTICAL_DUPLICATE_PIXEL_DISTANCE=IntegerThe maximum offset between two duplicate clusters in order to consider them optical duplicates. This should usually be set to some fairly small number (e.g. 5-10 pixels) unless using later versions of the Illumina pipeline that multiply pixel values by 10, in which case 50-100 is more normal. Default value: 100. This option can be set to 'null' to clear the default value.
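
The way READ_NAME_REGEX and OPTICAL_DUPLICATE_PIXEL_DISTANCE work together can be sketched in Python. The example applies the default regular expression listed above to extract tile, x and y, then compares two reads. Treating the distance as a per-axis limit on reads from the same tile is an assumption made for this sketch, and lane or read-group checks are omitted; the function names are hypothetical and this is not Picard's implementation.

```python
import re

# Default READ_NAME_REGEX as listed in the option table above.
DEFAULT_READ_NAME_REGEX = r"[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*"


def parse_location(read_name, pattern=DEFAULT_READ_NAME_REGEX):
    """Return (tile, x, y) from a read name, or None if it does not match.

    The pattern must match the entire read name, hence re.fullmatch.
    """
    match = re.fullmatch(pattern, read_name)
    if match is None:
        return None
    tile, x, y = (int(group) for group in match.groups())
    return tile, x, y


def is_optical_duplicate(name_a, name_b, pixel_distance=100):
    """Assumed rule: same tile, and both x and y offsets within the distance."""
    location_a = parse_location(name_a)
    location_b = parse_location(name_b)
    if location_a is None or location_b is None:
        return False
    tile_a, x_a, y_a = location_a
    tile_b, x_b, y_b = location_b
    return (
        tile_a == tile_b
        and abs(x_a - x_b) <= pixel_distance
        and abs(y_a - y_b) <= pixel_distance
    )


if __name__ == "__main__":
    print(parse_location("RUN1:3:1101:1000:2000"))  # (1101, 1000, 2000)
    print(is_optical_duplicate("RUN1:3:1101:1000:2000", "RUN1:3:1101:1050:2080"))  # True
```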

MeanQualityByCycle

Usage: program [options...]

OptionDescription
CHART_OUTPUT=FileA file (with .pdf extension) to write the chart to. Required.
ALIGNED_READS_ONLY=BooleanIf set to true, calculate mean quality over aligned reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
PF_READS_ONLY=BooleanIf set to true calculate mean quality over PF reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
INPUT=FileInput SAM or BAM file. Required.
OUTPUT=FileFile to write the output to. Required.
REFERENCE_SEQUENCE=FileReference sequence fasta Default value: null.
ASSUME_SORTED=BooleanIf true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
STOP_AFTER=LongStop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.

MergeBamAlignment

USAGE: MergeBamAlignment [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#MergeBamAlignment

Merges alignment data from a SAM or BAM file with additional data stored in an unmapped BAM file and produces a third SAM or BAM file of aligned and unaligned reads. NOTE that this program expects to find a sequence dictionary in the same directory as REFERENCE_SEQUENCE and expects it to have the same base name as the reference fasta except with the extension '.dict'.
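
The naming rule for the sequence dictionary can be checked before running the tool. The Python sketch below derives the expected path by swapping the final extension of the reference fasta for '.dict'; how multi-part extensions such as '.fa.gz' are handled is not stated above, so that case is an open assumption. The function name `expected_dictionary_path` is hypothetical.

```python
from pathlib import Path


def expected_dictionary_path(reference_fasta):
    """Return the path where the sequence dictionary is expected.

    Assumption: only the final extension is replaced, so
    'hg19.fasta' maps to 'hg19.dict' in the same directory.
    """
    return Path(reference_fasta).with_suffix(".dict")


if __name__ == "__main__":
    dictionary = expected_dictionary_path("refs/hg19.fasta")
    print(dictionary.name)  # hg19.dict
    if not dictionary.exists():
        print("sequence dictionary not found next to the reference")
```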

OptionDescription
UNMAPPED_BAM=FileOriginal SAM or BAM file of unmapped reads, which must be in queryname order. Required.
ALIGNED_BAM=FileSAM or BAM file(s) with alignment data. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) READ1_ALIGNED_BAM (R1_ALIGNED) READ2_ALIGNED_BAM (R2_ALIGNED)
READ1_ALIGNED_BAM=FileSAM or BAM file(s) with alignment data from the first read of a pair. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) ALIGNED_BAM (ALIGNED)
READ2_ALIGNED_BAM=FileSAM or BAM file(s) with alignment data from the second read of a pair. This option may be specified 0 or more times. Cannot be used in conjunction with option(s) ALIGNED_BAM (ALIGNED)
OUTPUT=FileMerged SAM or BAM file to write to. Required.
REFERENCE_SEQUENCE=FilePath to the fasta file for the reference sequence. Required.
PROGRAM_RECORD_ID=StringThe program group ID of the aligner (if not supplied by the aligned file). Default value: null.
PROGRAM_GROUP_VERSION=StringThe version of the program group (if not supplied by the aligned file). Default value: null.
PROGRAM_GROUP_COMMAND_LINE=StringThe command line of the program group (if not supplied by the aligned file). Default value: null.
PROGRAM_GROUP_NAME=StringThe name of the program group (if not supplied by the aligned file). Default value: null.
PAIRED_RUN=BooleanThis argument is ignored and will be removed. Required. Possible values: {true, false}
JUMP_SIZE=IntegerThe expected jump size (required if this is a jumping library). Deprecated. Use EXPECTED_ORIENTATIONS instead. Default value: null. Cannot be used in conjunction with option(s) EXPECTED_ORIENTATIONS (ORIENTATIONS)
CLIP_ADAPTERS=BooleanWhether to clip adapters where identified. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
IS_BISULFITE_SEQUENCE=BooleanWhether the lane is bisulfite sequence (used when calculating the NM tag). Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ALIGNED_READS_ONLY=BooleanWhether to output only aligned reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_INSERTIONS_OR_DELETIONS=IntegerThe maximum number of insertions or deletions permitted for an alignment to be included. Alignments with more than this many insertions or deletions will be ignored. Set to -1 to allow any number of insertions or deletions. Default value: 1. This option can be set to 'null' to clear the default value.
ATTRIBUTES_TO_RETAIN=StringReserved alignment attributes (tags starting with X, Y, or Z) that should be brought over from the alignment data when merging. This option may be specified 0 or more times.
ATTRIBUTES_TO_REMOVE=StringAttributes from the alignment record that should be removed when merging. This overrides ATTRIBUTES_TO_RETAIN if they share common tags. This option may be specified 0 or more times.
READ1_TRIM=IntegerThe number of bases trimmed from the beginning of read 1 prior to alignment Default value: 0. This option can be set to 'null' to clear the default value.
READ2_TRIM=IntegerThe number of bases trimmed from the beginning of read 2 prior to alignment Default value: 0. This option can be set to 'null' to clear the default value.
EXPECTED_ORIENTATIONS=PairOrientationThe expected orientation of proper read pairs. Replaces JUMP_SIZE. Possible values: {FR, RF, TANDEM} This option may be specified 0 or more times. Cannot be used in conjunction with option(s) JUMP_SIZE (JUMP)
ALIGNER_PROPER_PAIR_FLAGS=BooleanUse the aligner's idea of what a proper pair is rather than computing in this program. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
SORT_ORDER=SortOrderThe order in which the merged reads should be output. Default value: coordinate. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate}
PRIMARY_ALIGNMENT_STRATEGY=PrimaryAlignmentStrategyStrategy for selecting primary alignment when the aligner has provided more than one alignment for a pair or fragment, and none are marked as primary, more than one is marked as primary, or the primary alignment is filtered out for some reason. BestMapq expects that multiple alignments will be correlated with HI tag, and prefers the pair of alignments with the largest MAPQ, in the absence of a primary selected by the aligner. EarliestFragment prefers the alignment which maps the earliest base in the read. Note that EarliestFragment may not be used for paired reads. BestEndMapq is appropriate for cases in which the aligner is not pair-aware, and does not output the HI tag. It simply picks the alignment for each end with the highest MAPQ, and makes those alignments primary, regardless of whether the two alignments make sense together.MostDistant is also for a non-pair-aware aligner, and picks the alignment pair with the largest insert size. If all alignments would be chimeric, it picks the alignments for each end with the best MAPQ. For all algorithms, ties are resolved arbitrarily. Default value: BestMapq. This option can be set to 'null' to clear the default value. Possible values: {BestMapq, EarliestFragment, BestEndMapq, MostDistant}
CLIP_OVERLAPPING_READS=BooleanFor paired reads, soft clip the 3' end of each read if necessary so that it does not extend past the 5' end of its mate. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
INCLUDE_SECONDARY_ALIGNMENTS=BooleanIf false, do not write secondary alignments to output. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ADD_MATE_CIGAR=BooleanAdds the mate CIGAR tag (MC) if true, does not if false. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
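
The MAX_INSERTIONS_OR_DELETIONS rule can be illustrated with a small sketch. It is not Picard's implementation: the function name is hypothetical, and it assumes that each I or D element in the CIGAR string counts once regardless of its length, which the text above does not specify.

```python
import re

# One CIGAR element: a length followed by an operator character.
_CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def passes_indel_filter(cigar, max_insertions_or_deletions=1):
    """Return True if the alignment would be kept under the limit.

    Assumption: each I or D element counts once, regardless of its length.
    """
    if max_insertions_or_deletions == -1:
        return True  # -1 allows any number of insertions or deletions
    indels = sum(1 for _length, op in _CIGAR_ELEMENT.findall(cigar) if op in "ID")
    return indels <= max_insertions_or_deletions


print(passes_indel_filter("50M2I48M"))       # True: one insertion element
print(passes_indel_filter("10M1I10M2D10M"))  # False: two indel elements
```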

MergeSamFiles

Merges multiple SAM/BAM files into one file.

Option | Description
INPUT=File | SAM or BAM input file. This option must be specified at least once.
OUTPUT=File | SAM or BAM file to write the merged result to. Required.
SORT_ORDER=SortOrder | Sort order of the output file. Default value: coordinate. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate}
ASSUME_SORTED=Boolean | If true, assume that the input files are in the same sort order as the requested output sort order, even if their headers say otherwise. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MERGE_SEQUENCE_DICTIONARIES=Boolean | Merge the sequence dictionaries. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
USE_THREADING=Boolean | Option to create a background thread to encode, compress and write the output file to disk. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
COMMENT=String | Comment(s) to include in the merged output file's header. This option may be specified 0 or more times.
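
Because INPUT is repeated once per file, a merge over many files is convenient to assemble programmatically. The following is a minimal sketch that only builds the argument list in the `java jvm-args -jar PicardCommand.jar OPTION=value` form. The helper name, the jar name MergeSamFiles.jar and the file names are assumptions for illustration.

```python
def merge_sam_files_command(inputs, output, sort_order="coordinate",
                            jar="MergeSamFiles.jar", jvm_args=("-Xmx2g",)):
    """Build (but do not run) a MergeSamFiles command line.

    The jar path and JVM arguments are illustrative assumptions.
    """
    if not inputs:
        raise ValueError("INPUT must be specified at least once")
    cmd = ["java", *jvm_args, "-jar", jar]
    # INPUT is repeated once per file to merge.
    cmd += [f"INPUT={path}" for path in inputs]
    cmd += [f"OUTPUT={output}", f"SORT_ORDER={sort_order}"]
    return cmd


cmd = merge_sam_files_command(["a.bam", "b.bam"], "merged.bam")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the jar and input paths point at real files.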

MergeVcfs

USAGE: MergeVcfs [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#MergeVcfs

Merges multiple VCF or BCF files into one VCF file. Input files must be sorted by their contigs and, within contigs, by start position. The input files must have the same sample and contig lists. An index file is created and a sequence dictionary is required by default.

Option | Description
INPUT=File | VCF or BCF input files. File format is determined by file extension. This option must be specified at least once.
OUTPUT=File | The merged VCF or BCF file. File format is determined by file extension. Required.
SEQUENCE_DICTIONARY=File | The index sequence dictionary to use instead of the sequence dictionary in the input file. Default value: null.

NormalizeFasta

Takes any file that conforms to the fasta format and normalizes it so that all lines of sequence except the last line per named sequence are of the same length.

Option | Description
INPUT=File | The input fasta file to normalize. Required.
OUTPUT=File | The output fasta file to write. Required.
LINE_LENGTH=Integer | The line length to be used for the output fasta file. Default value: 100. This option can be set to 'null' to clear the default value.
TRUNCATE_SEQUENCE_NAMES_AT_WHITESPACE=Boolean | Truncate sequence names at first whitespace. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
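
The normalization described above can be sketched as follows. This is an illustration of the idea, not Picard's code: the function names are hypothetical, and the input is assumed to be an iterable of text lines already read from a fasta file.

```python
def _wrap_record(name, sequence, line_length):
    """Yield the header line, then the sequence in fixed-width chunks."""
    yield ">" + name
    for start in range(0, len(sequence), line_length):
        yield sequence[start:start + line_length]


def normalize_fasta(lines, line_length=100, truncate_names_at_whitespace=False):
    """Re-wrap fasta records so that every sequence line except the last
    one of each record has exactly line_length characters."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield from _wrap_record(name, "".join(chunks), line_length)
            name = line[1:]
            if truncate_names_at_whitespace and name.split():
                name = name.split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield from _wrap_record(name, "".join(chunks), line_length)


records = [">chr1 test", "ACGTAC", "GT", ">chr2", "AAA"]
print(list(normalize_fasta(records, line_length=4,
                           truncate_names_at_whitespace=True)))
# ['>chr1', 'ACGT', 'ACGT', '>chr2', 'AAA']
```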

ExtractSequences

Extracts one or more intervals described in an interval_list file from a given reference sequence and writes them out in FASTA format. Requires a fasta index file to be present.

Option | Description
INTERVAL_LIST=File | Interval list describing intervals to be extracted from the reference sequence. Required.
REFERENCE_SEQUENCE=File | Reference sequence file. Required.
OUTPUT=File | Output fasta file. Required.
LINE_LENGTH=Integer | Maximum line length for sequence data. Default value: 80. This option can be set to 'null' to clear the default value.

QualityScoreDistribution

USAGE: QualityScoreDistribution [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#QualityScoreDistribution

Program to chart quality score distributions in a SAM or BAM file.

Option | Description
CHART_OUTPUT=File | A file (with .pdf extension) to write the chart to. Required.
ALIGNED_READS_ONLY=Boolean | If set to true, calculate mean quality over aligned reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
PF_READS_ONLY=Boolean | If set to true, calculate mean quality over PF reads only. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
INCLUDE_NO_CALLS=Boolean | If set to true, include quality for no-call bases in the distribution. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
INPUT=File | Input SAM or BAM file. Required.
OUTPUT=File | File to write the output to. Required.
REFERENCE_SEQUENCE=File | Reference sequence fasta. Default value: null.
ASSUME_SORTED=Boolean | If true (default), then the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
STOP_AFTER=Long | Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.
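
As a rough illustration of what a quality score distribution is, the sketch below counts bases per Phred score and shows the effect of INCLUDE_NO_CALLS. It is not Picard's implementation. It assumes reads are given as (bases, qualities) string pairs with qualities encoded at ASCII offset 33 (the SAM text encoding), and that no-call bases appear as N or a dot.

```python
from collections import Counter


def quality_score_distribution(reads, include_no_calls=False, offset=33):
    """Count bases per Phred quality score.

    reads: iterable of (bases, qualities) string pairs.
    Assumption: no-call bases are written as 'N', 'n' or '.'.
    """
    counts = Counter()
    for bases, quals in reads:
        for base, qual in zip(bases, quals):
            if base in "Nn." and not include_no_calls:
                continue  # skip no-call bases unless asked to include them
            counts[ord(qual) - offset] += 1
    return dict(sorted(counts.items()))


print(quality_score_distribution([("ACGN", "II5#")]))
# {20: 1, 40: 2}
print(quality_score_distribution([("ACGN", "II5#")], include_no_calls=True))
# {2: 1, 20: 1, 40: 2}
```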

ReorderSam

Not to be confused with SortSam, which sorts a SAM or BAM file with a valid sequence dictionary. ReorderSam reorders reads in a SAM/BAM file to match the contig ordering in a provided reference file, as determined by exact name matching of contigs. Reads mapped to contigs absent in the new reference are dropped. Runs substantially faster if the input is an indexed BAM file. Version: 1.0

Option | Description
INPUT=File | Input file (bam or sam) to extract reads from. Required.
OUTPUT=File | Output file (bam or sam) to write extracted reads to. Required.
REFERENCE=File | Reference sequence to reorder reads to match. A sequence dictionary corresponding to the reference fasta is required. Create one with CreateSequenceDictionary.jar. Required.
ALLOW_INCOMPLETE_DICT_CONCORDANCE=Boolean | If true, then allows only a partial overlap of the BAM contigs with the new reference sequence contigs. By default, this tool requires a corresponding contig in the new reference for each read contig. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ALLOW_CONTIG_LENGTH_DISCORDANCE=Boolean | If true, then permits mapping from a read contig to a new reference contig with the same name but a different length. Highly dangerous, only use if you know what you are doing. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
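
The reordering rule (match contigs by exact name, follow the new reference's contig order, drop reads on contigs the new reference lacks) can be sketched on toy data. The function name and the (read name, contig) tuple representation are assumptions for illustration. Unmapped reads and contig lengths are left out.

```python
def reorder_reads(reads, new_contigs, allow_incomplete_dict_concordance=False):
    """Reorder (read_name, contig) pairs to follow new_contigs.

    Reads on contigs missing from new_contigs are dropped, which is only
    permitted when allow_incomplete_dict_concordance is true.
    """
    reads = list(reads)
    rank = {name: index for index, name in enumerate(new_contigs)}
    missing = sorted({contig for _name, contig in reads if contig not in rank})
    if missing and not allow_incomplete_dict_concordance:
        raise ValueError(f"contigs absent from new reference: {missing}")
    kept = [read for read in reads if read[1] in rank]
    # sorted() is stable, so the original order within a contig is preserved.
    return sorted(kept, key=lambda read: rank[read[1]])


reads = [("r1", "chr2"), ("r2", "chr1"), ("r3", "chrM"), ("r4", "chr2")]
print(reorder_reads(reads, ["chr1", "chr2"],
                    allow_incomplete_dict_concordance=True))
# [('r2', 'chr1'), ('r1', 'chr2'), ('r4', 'chr2')]
```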

ReplaceSamHeader

USAGE: ReplaceSamHeader [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#ReplaceSamHeader

Replace the SAMFileHeader in a SAM file with the given header. Validation is minimal. It is up to the user to ensure that all the elements referred to in the SAMRecords are present in the new header. Sort order of the two input files must be the same.

Option | Description
INPUT=File | SAM file from which SAMRecords will be read. Required.
HEADER=File | SAM file from which SAMFileHeader will be read. Required.
OUTPUT=File | SAMFileHeader from HEADER file will be written to this file, followed by SAMRecords from INPUT file. Required.

RevertSam

USAGE: RevertSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#RevertSam

Reverts SAM or BAM files to a previous state by removing certain types of information and/or substituting in the original quality scores when available.

Option | Description
INPUT=File | The input SAM/BAM file to revert the state of. Required.
OUTPUT=File | The output SAM/BAM file to create. Required.
SORT_ORDER=SortOrder | The sort order to create the reverted output file with. Default value: queryname. This option can be set to 'null' to clear the default value. Possible values: {unsorted, queryname, coordinate}
RESTORE_ORIGINAL_QUALITIES=Boolean | True to restore original qualities from the OQ field to the QUAL field if available. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
REMOVE_DUPLICATE_INFORMATION=Boolean | Remove duplicate read flags from all reads. Note that if this is true and REMOVE_ALIGNMENT_INFORMATION==false, the output may have the unusual but sometimes desirable trait of having unmapped reads that are marked as duplicates. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
REMOVE_ALIGNMENT_INFORMATION=Boolean | Remove all alignment information from the file. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
ATTRIBUTE_TO_CLEAR=String | When removing alignment information, the set of optional tags to remove. This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
SANITIZE=Boolean | WARNING: This option is potentially destructive. If enabled, it will discard reads in order to produce a consistent output BAM. Reads discarded include (but are not limited to) paired reads with missing mates, duplicated records, and records with mismatches in length of bases and qualities. This option can only be enabled if the output sort order is queryname and will always cause sorting to occur. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_DISCARD_FRACTION=Double | If SANITIZE=true and more than MAX_DISCARD_FRACTION of the reads are discarded due to sanitization, then the program will exit with an Exception instead of exiting cleanly. Output BAM will still be valid. Default value: 0.01. This option can be set to 'null' to clear the default value.
SAMPLE_ALIAS=String | The sample alias to use in the reverted output file. This will override the existing sample alias in the file and is used only if all the read groups in the input file have the same sample alias. Default value: null.
LIBRARY_NAME=String | The library name to use in the reverted output file. This will override the existing library name in the file and is used only if all the read groups in the input file have the same library name. Default value: null.
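
The interaction between SANITIZE and MAX_DISCARD_FRACTION amounts to a threshold check on the fraction of discarded reads. A minimal sketch, with a hypothetical function name and read counts assumed to be known after sanitization:

```python
def check_discard_fraction(total_reads, discarded_reads, max_discard_fraction=0.01):
    """Return the discarded fraction, raising if it exceeds the limit."""
    if total_reads == 0:
        return 0.0
    fraction = discarded_reads / total_reads
    if fraction > max_discard_fraction:
        raise RuntimeError(
            f"discarded {fraction:.2%} of reads, "
            f"limit is {max_discard_fraction:.2%}"
        )
    return fraction


print(check_discard_fraction(1000, 5))  # 0.005, below the default 0.01
```

With the default of 0.01, discarding 20 of 1000 reads (2%) would raise, while 5 of 1000 (0.5%) passes.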

RevertOriginalBaseQualitiesAndAddMateCigar

USAGE: RevertOriginalBaseQualitiesAndAddMateCigar [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#RevertOriginalBaseQualitiesAndAddMateCigar

Reverts the original base qualities and adds the mate cigar tag to read-group BAMs.

Option | Description
INPUT=File | The input SAM/BAM file to revert the state of. Required.
OUTPUT=File | The output SAM/BAM file to create. Required.
SORT_ORDER=SortOrder | The sort order to create the reverted output file with. By default, the sort order will be the same as the input. Default value: null. Possible values: {unsorted, queryname, coordinate}
RESTORE_ORIGINAL_QUALITIES=Boolean | True to restore original qualities from the OQ field to the QUAL field if available. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_RECORDS_TO_EXAMINE=Integer | The maximum number of records to examine to determine if we can exit early and not output, given that there are no original base qualities (if we are to restore) and mate cigars exist. Set to 0 to never skip the file. Default value: 10000. This option can be set to 'null' to clear the default value.

SamFormatConverter

USAGE: SamFormatConverter [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#SamFormatConverter

Convert a BAM file to a SAM file, or SAM to BAM.

Input and output formats are determined by file extension.

Option | Description
INPUT=File | The BAM or SAM file to parse. Required.
OUTPUT=File | The BAM or SAM output file. Required.

SamToFastq

USAGE: SamToFastq [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#SamToFastq

Extracts read sequences and qualities from the input SAM/BAM file and writes them into the output file in Sanger fastq format. In the RC mode (default is True), if the read is aligned and the alignment is to the reverse strand on the genome, the read's sequence from the input SAM file will be reverse-complemented prior to writing it to fastq, in order to correctly restore the original read sequence as it was generated by the sequencer.

Option | Description
INPUT=File | Input SAM/BAM file to extract reads from. Required.
FASTQ=File | Output fastq file (single-end fastq or, if paired, first end of the pair fastq). Required. Cannot be used in conjunction with option(s) OUTPUT_PER_RG (OPRG).
SECOND_END_FASTQ=File | Output fastq file (if paired, second end of the pair fastq). Default value: null. Cannot be used in conjunction with option(s) OUTPUT_PER_RG (OPRG).
UNPAIRED_FASTQ=File | Output fastq file for unpaired reads; may only be provided in paired-fastq mode. Default value: null. Cannot be used in conjunction with option(s) OUTPUT_PER_RG (OPRG).
OUTPUT_PER_RG=Boolean | Output a fastq file per read group (two fastq files per read group if the group is paired). Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false} Cannot be used in conjunction with option(s) SECOND_END_FASTQ (F2) UNPAIRED_FASTQ (FU) FASTQ (F).
OUTPUT_DIR=File | Directory in which to output the fastq file(s). Used only when OUTPUT_PER_RG is true. Default value: null.
RE_REVERSE=Boolean | Re-reverse bases and qualities of reads with negative strand flag set before writing them to fastq. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
INTERLEAVE=Boolean | Will generate an interleaved fastq if paired; each line will have /1 or /2 to describe which end it came from. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
INCLUDE_NON_PF_READS=Boolean | Include non-PF reads from the SAM file into the output FASTQ files. PF means 'passes filtering'. Reads whose 'not passing quality controls' flag is set are non-PF reads. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
CLIPPING_ATTRIBUTE=String | The attribute that stores the position at which the SAM record should be clipped. Default value: null.
CLIPPING_ACTION=String | The action that should be taken with clipped reads: 'X' means the reads and qualities should be trimmed at the clipped position; 'N' means the bases should be changed to Ns in the clipped region; and any integer means that the base qualities should be set to that value in the clipped region. Default value: null.
READ1_TRIM=Integer | The number of bases to trim from the beginning of read 1. Default value: 0. This option can be set to 'null' to clear the default value.
READ1_MAX_BASES_TO_WRITE=Integer | The maximum number of bases to write from read 1 after trimming. If there are fewer than this many bases left after trimming, all will be written. If this value is null then all bases left after trimming will be written. Default value: null.
READ2_TRIM=Integer | The number of bases to trim from the beginning of read 2. Default value: 0. This option can be set to 'null' to clear the default value.
READ2_MAX_BASES_TO_WRITE=Integer | The maximum number of bases to write from read 2 after trimming. If there are fewer than this many bases left after trimming, all will be written. If this value is null then all bases left after trimming will be written. Default value: null.
INCLUDE_NON_PRIMARY_ALIGNMENTS=Boolean | If true, include non-primary alignments in the output. Support of non-primary alignments in SamToFastq is not comprehensive, so there may be exceptions if this is set to true and there are paired reads with non-primary alignments. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
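
How RE_REVERSE, READn_TRIM and READn_MAX_BASES_TO_WRITE combine for a single read can be sketched as below. This is an illustration only: the function name is hypothetical, and the order of operations (re-reverse first, then trim, then cap the length) is an assumption consistent with trimming from the beginning of the read as sequenced.

```python
# Complement table for the bases that can appear in a read.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fastq_fields(bases, quals, negative_strand=False, re_reverse=True,
                 trim=0, max_bases_to_write=None):
    """Return the (bases, qualities) strings that would be written."""
    if negative_strand and re_reverse:
        # Restore the sequencer's orientation: reverse-complement the bases
        # and reverse the qualities so they stay aligned with the bases.
        bases = bases.translate(_COMPLEMENT)[::-1]
        quals = quals[::-1]
    bases, quals = bases[trim:], quals[trim:]
    if max_bases_to_write is not None:
        bases, quals = bases[:max_bases_to_write], quals[:max_bases_to_write]
    return bases, quals


print(fastq_fields("AACGT", "IIHG#", negative_strand=True))
# ('ACGTT', '#GHII')
print(fastq_fields("AACGT", "IIHG#", negative_strand=True,
                   trim=1, max_bases_to_write=3))
# ('CGT', 'GHI')
```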

SortSam

USAGE: SortSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#SortSam

Sorts the input SAM or BAM.

Input and output formats are determined by file extension.

Option | Description
INPUT=File | The BAM or SAM file to sort. Required.
OUTPUT=File | The sorted BAM or SAM output file. Required.
SORT_ORDER=SortOrder | Sort order of output file. Required. Possible values: {unsorted, queryname, coordinate}

VcfFormatConverter

USAGE: VcfFormatConverter [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#VcfFormatConverter

Convert a VCF file to a BCF file, or BCF to VCF.

Input and output formats are determined by file extension.

Option | Description
INPUT=File | The BCF or VCF input file. The file format is determined by file extension. Required.
OUTPUT=File | The BCF or VCF output file. The file format is determined by file extension. Required.
REQUIRE_INDEX=Boolean | Fail if an index is not available for the input VCF/BCF. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}

MarkIlluminaAdapters

USAGE: MarkIlluminaAdapters [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#MarkIlluminaAdapters

Reads a SAM or BAM file and rewrites it with new adapter-trimming tags.

Clears any existing adapter-trimming tags (XT:i:).

Only works for unaligned files in query-name order.

Note: This is a utility program and will not be run in the pipeline.

Option | Description
INPUT=File | Required.
OUTPUT=File | If output is not specified, just the metrics are generated. Default value: null.
METRICS=File | Histogram showing counts of bases_clipped in how many reads. Required.
MIN_MATCH_BASES_SE=Integer | The minimum number of bases to match over when clipping single-end reads. Default value: 12. This option can be set to 'null' to clear the default value.
MIN_MATCH_BASES_PE=Integer | The minimum number of bases to match over (per-read) when clipping paired-end reads. Default value: 6. This option can be set to 'null' to clear the default value.
MAX_ERROR_RATE_SE=Double | The maximum mismatch error rate to tolerate when clipping single-end reads. Default value: 0.1. This option can be set to 'null' to clear the default value.
MAX_ERROR_RATE_PE=Double | The maximum mismatch error rate to tolerate when clipping paired-end reads. Default value: 0.1. This option can be set to 'null' to clear the default value.
PAIRED_RUN=Boolean | DEPRECATED. Whether this is a paired-end run. No longer used. Default value: null. Possible values: {true, false}
ADAPTERS=IlluminaAdapterPair | Which adapter sequences to attempt to identify and clip. Possible values: {PAIRED_END, INDEXED, SINGLE_END, NEXTERA_V1, NEXTERA_V2, DUAL_INDEXED, FLUIDIGM, TRUSEQ_SMALLRNA, ALTERNATIVE_SINGLE_END} This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.
FIVE_PRIME_ADAPTER=String | For specifying adapters other than standard Illumina. Default value: null.
THREE_PRIME_ADAPTER=String | For specifying adapters other than standard Illumina. Default value: null.
ADAPTER_TRUNCATION_LENGTH=Integer | Adapters are truncated to this length to speed adapter matching. Set to a large number to effectively disable truncation. Default value: 30. This option can be set to 'null' to clear the default value.
PRUNE_ADAPTER_LIST_AFTER_THIS_MANY_ADAPTERS_SEEN=Integer | If looking for multiple adapter sequences, then after having seen this many adapters, shorten the list of sequences. Keep the adapters that were found most frequently in the input so far. Set to -1 if the input has a heterogeneous mix of adapters so shortening is undesirable. Default value: 100. This option can be set to 'null' to clear the default value.
NUM_ADAPTERS_TO_KEEP=Integer | If pruning the adapter list, keep only this many adapter sequences when pruning the list (plus any adapters that were tied with the adapters being kept). Default value: 1. This option can be set to 'null' to clear the default value.

SplitVcfs

USAGE: SplitVcfs [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#SplitVcfs

Splits an input VCF or BCF file into two VCF files, one for indel records and one for SNPs. The headers of the two output files will be identical. An index file is created and a sequence dictionary is required by default.

Option | Description
INPUT=File | The VCF or BCF input file. Required.
SNP_OUTPUT=File | The VCF or BCF file to which SNP records should be written. The file format is determined by file extension. Required.
INDEL_OUTPUT=File | The VCF or BCF file to which indel records should be written. The file format is determined by file extension. Required.
SEQUENCE_DICTIONARY=File | The index sequence dictionary to use instead of the sequence dictionaries in the input files. Default value: null.
STRICT=Boolean | If true, an exception will be thrown if an event type other than SNP or indel is encountered. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
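
The routing of records and the effect of STRICT can be sketched with a deliberately simplified classifier. The rules below are assumptions for illustration, not the definitions Picard uses: a record is a SNP when the reference and every alternate allele are single bases, an indel when every alternate allele differs in length from the reference, and anything else (including symbolic alleles) is "other". Returning None for such records when strict is false is also an assumption.

```python
_BASES = set("ACGTN")


def classify_variant(ref, alts, strict=True):
    """Return 'SNP', 'INDEL', or None for a record that is neither."""
    alleles = [ref, *alts]
    plain = bool(alts) and all(a and set(a.upper()) <= _BASES for a in alleles)
    if plain and all(len(a) == 1 for a in alleles):
        return "SNP"
    if plain and all(len(alt) != len(ref) for alt in alts):
        return "INDEL"
    if strict:
        raise ValueError(f"record is neither SNP nor indel: {ref} -> {alts}")
    return None  # assumption: non-strict mode leaves the record out


print(classify_variant("A", ["G"]))                 # SNP
print(classify_variant("AT", ["A"]))                # INDEL
print(classify_variant("AT", ["GC"], strict=False)) # None
```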

ValidateSamFile

USAGE: ValidateSamFile [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#ValidateSamFile

Read a SAM or BAM file and report on its validity.

Option | Description
INPUT=File | Input SAM/BAM file. Required.
OUTPUT=File | Output file, or standard out if missing. Default value: null.
MODE=Mode | Mode of output. Default value: VERBOSE. This option can be set to 'null' to clear the default value. Possible values: {VERBOSE, SUMMARY}
IGNORE=Type | List of validation error types to ignore. Possible values: {INVALID_QUALITY_FORMAT, INVALID_FLAG_PROPER_PAIR, INVALID_FLAG_MATE_UNMAPPED, MISMATCH_FLAG_MATE_UNMAPPED, INVALID_FLAG_MATE_NEG_STRAND, MISMATCH_FLAG_MATE_NEG_STRAND, INVALID_FLAG_FIRST_OF_PAIR, INVALID_FLAG_SECOND_OF_PAIR, PAIRED_READ_NOT_MARKED_AS_FIRST_OR_SECOND, INVALID_FLAG_NOT_PRIM_ALIGNMENT, INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT, INVALID_FLAG_READ_UNMAPPED, INVALID_INSERT_SIZE, INVALID_MAPPING_QUALITY, INVALID_CIGAR, ADJACENT_INDEL_IN_CIGAR, INVALID_MATE_REF_INDEX, MISMATCH_MATE_REF_INDEX, INVALID_REFERENCE_INDEX, INVALID_ALIGNMENT_START, MISMATCH_MATE_ALIGNMENT_START, MATE_FIELD_MISMATCH, INVALID_TAG_NM, MISSING_TAG_NM, MISSING_HEADER, MISSING_SEQUENCE_DICTIONARY, MISSING_READ_GROUP, RECORD_OUT_OF_ORDER, READ_GROUP_NOT_FOUND, RECORD_MISSING_READ_GROUP, INVALID_INDEXING_BIN, MISSING_VERSION_NUMBER, INVALID_VERSION_NUMBER, TRUNCATED_FILE, MISMATCH_READ_LENGTH_AND_QUALS_LENGTH, EMPTY_READ, CIGAR_MAPS_OFF_REFERENCE, MISMATCH_READ_LENGTH_AND_E2_LENGTH, MISMATCH_READ_LENGTH_AND_U2_LENGTH, E2_BASE_EQUALS_PRIMARY_BASE, BAM_FILE_MISSING_TERMINATOR_BLOCK, UNRECOGNIZED_HEADER_TYPE, POORLY_FORMATTED_HEADER_TAG, HEADER_TAG_MULTIPLY_DEFINED, HEADER_RECORD_MISSING_REQUIRED_TAG, INVALID_DATE_STRING, TAG_VALUE_TOO_LARGE, INVALID_INDEX_FILE_POINTER, INVALID_PREDICTED_MEDIAN_INSERT_SIZE, DUPLICATE_READ_GROUP_ID, MISSING_PLATFORM_VALUE, INVALID_PLATFORM_VALUE, DUPLICATE_PROGRAM_GROUP_ID, MATE_NOT_FOUND, MATES_ARE_SAME_END, MISMATCH_MATE_CIGAR_STRING, MATE_CIGAR_STRING_INVALID_PRESENCE} This option may be specified 0 or more times.
MAX_OUTPUT=Integer | The maximum number of lines output in verbose mode. Default value: 100. This option can be set to 'null' to clear the default value.
REFERENCE_SEQUENCE=File | Reference sequence file; the NM tag check will be skipped if this is missing. Default value: null.
IGNORE_WARNINGS=Boolean | If true, only report errors and ignore warnings. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
VALIDATE_INDEX=Boolean | If true and input is a BAM file with an index file, also validates the index. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}
IS_BISULFITE_SEQUENCED=Boolean | Whether the SAM or BAM file consists of bisulfite sequenced reads. If so, C->T is not counted as an error in computing the value of the NM tag. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}
MAX_OPEN_TEMP_FILES=Integer | Relevant for a coordinate-sorted file containing read pairs only. Maximum number of file handles to keep open when spilling mate info to disk. Set this number a little lower than the per-process maximum number of files that may be open. This number can be found by executing the 'ulimit -n' command on a Unix system. Default value: 8000. This option can be set to 'null' to clear the default value.
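
On a Unix system, the per-process limit reported by 'ulimit -n' can also be read from Python, which makes it easy to derive a MAX_OPEN_TEMP_FILES value a little below it. This is a minimal sketch: the function name and the margin of 100 handles are assumptions, and the fallback of 8000 simply reuses the documented default when no limit is set.

```python
import resource


def suggested_max_open_temp_files(margin=100, fallback=8000):
    """Suggest a value slightly below the soft open-file limit (Unix only)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return fallback  # no limit set: reuse the documented default
    return max(1, soft - margin)


value = suggested_max_open_temp_files()
print(f"MAX_OPEN_TEMP_FILES={value}")
```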

ViewSam

USAGE: ViewSam [options]

Documentation: http://picard.sourceforge.net/command-line-overview.shtml#ViewSam

Prints a SAM or BAM file to the screen.

Option | Description
INPUT=File | The SAM or BAM file to view. Required.
ALIGNMENT_STATUS=AlignmentStatus | Print out all reads, just the aligned reads or just the unaligned reads. Default value: All. This option can be set to 'null' to clear the default value. Possible values: {Aligned, Unaligned, All}
PF_STATUS=PfStatus | Print out all reads, just the PF reads or just the non-PF reads. Default value: All. This option can be set to 'null' to clear the default value. Possible values: {PF, NonPF, All}