net.sf.picard.sam
Class DuplicationMetrics

java.lang.Object
  extended by net.sf.picard.metrics.MetricBase
      extended by net.sf.picard.sam.DuplicationMetrics

public class DuplicationMetrics
extends MetricBase

Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.


Field Summary
 Long ESTIMATED_LIBRARY_SIZE
          The estimated number of unique molecules in the library based on PE duplication.
 String LIBRARY
          The library on which the duplicate marking was performed.
 Double PERCENT_DUPLICATION
          The percentage of mapped sequence that is marked as duplicate.
 long READ_PAIR_DUPLICATES
          The number of read pairs that were marked as duplicates.
 long READ_PAIR_OPTICAL_DUPLICATES
          The number of read pair duplicates that were caused by optical duplication.
 long READ_PAIRS_EXAMINED
          The number of mapped read pairs examined.
 long UNMAPPED_READS
          The total number of unmapped reads examined.
 long UNPAIRED_READ_DUPLICATES
          The number of fragments that were marked as duplicates.
 long UNPAIRED_READS_EXAMINED
          The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.
 
Constructor Summary
DuplicationMetrics()
           
 
Method Summary
 void calculateDerivedMetrics()
          Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
 Histogram<Double> calculateRoiHistogram()
          Calculates a histogram using the estimateRoi method to estimate the effective yield of sequencing at x times the observed depth, for x=1..10.
static Long estimateLibrarySize(long readPairs, long uniqueReadPairs)
          Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed.
static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs)
          Estimates the ROI (return on investment) that one would see if a library was sequenced to x times the observed coverage.
static void main(String[] args)
           
 
Methods inherited from class net.sf.picard.metrics.MetricBase
equals, equals, hashCode, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

LIBRARY

public String LIBRARY
The library on which the duplicate marking was performed.


UNPAIRED_READS_EXAMINED

public long UNPAIRED_READS_EXAMINED
The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.


READ_PAIRS_EXAMINED

public long READ_PAIRS_EXAMINED
The number of mapped read pairs examined.


UNMAPPED_READS

public long UNMAPPED_READS
The total number of unmapped reads examined.


UNPAIRED_READ_DUPLICATES

public long UNPAIRED_READ_DUPLICATES
The number of fragments that were marked as duplicates.


READ_PAIR_DUPLICATES

public long READ_PAIR_DUPLICATES
The number of read pairs that were marked as duplicates.


READ_PAIR_OPTICAL_DUPLICATES

public long READ_PAIR_OPTICAL_DUPLICATES
The number of read pair duplicates that were caused by optical duplication. Value is always <= READ_PAIR_DUPLICATES, which counts all duplicates regardless of source.


PERCENT_DUPLICATION

public Double PERCENT_DUPLICATION
The percentage of mapped sequence that is marked as duplicate.


ESTIMATED_LIBRARY_SIZE

public Long ESTIMATED_LIBRARY_SIZE
The estimated number of unique molecules in the library based on PE duplication.

Constructor Detail

DuplicationMetrics

public DuplicationMetrics()

Method Detail

calculateDerivedMetrics

public void calculateDerivedMetrics()
Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
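
The description above does not spell out how the duplication value is derived. A minimal sketch of one plausible calculation follows, assuming that each read pair counts as two reads and that the result is expressed as a fraction between 0 and 1. The class name DuplicationFractionSketch, the method name, and the formula itself are assumptions made for illustration and are not taken from this page.

```java
/** Illustrative sketch only; not the Picard source. */
final class DuplicationFractionSketch {

    /**
     * Assumed formula: duplicate reads divided by examined mapped reads,
     * with each read pair counted as two reads.
     * Returns 0.0 when no mapped reads were examined.
     */
    static double duplicationFraction(long unpairedReadsExamined,
                                      long unpairedReadDuplicates,
                                      long readPairsExamined,
                                      long readPairDuplicates) {
        long examined = unpairedReadsExamined + 2 * readPairsExamined;
        if (examined == 0) {
            return 0.0; // avoid dividing by zero on empty input
        }
        long duplicates = unpairedReadDuplicates + 2 * readPairDuplicates;
        return duplicates / (double) examined;
    }
}
```

Multiply the result by 100 if a percentage is wanted. The library size half of this method is covered by estimateLibrarySize.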


estimateLibrarySize

public static Long estimateLibrarySize(long readPairs,
                                       long uniqueReadPairs)
Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed. Based on the Lander-Waterman equation that states:

    C/X = 1 - exp( -N/X )

where
    X = number of distinct molecules in library
    N = number of read pairs
    C = number of distinct fragments observed in read pairs
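
The Lander-Waterman equation has no closed-form solution for X, so it has to be solved numerically. The sketch below shows one way to do that with bisection. It is not claimed to match the library's implementation: the class name LibrarySizeSketch, the helper f, the bracketing strategy, the iteration count, and the choice to return null when no estimate is possible are all assumptions.

```java
/** Illustrative sketch only; not the Picard source. */
final class LibrarySizeSketch {

    /** f(X) = C/X - 1 + exp(-N/X); a root of f satisfies C/X = 1 - exp(-N/X). */
    static double f(double x, double c, double n) {
        return c / x - 1 + Math.exp(-n / x);
    }

    /**
     * Solves the Lander-Waterman equation for X by bisection.
     * Returns null when there is nothing to estimate from
     * (no pairs, or no duplicate pairs observed).
     */
    static Long estimateLibrarySize(long readPairs, long uniqueReadPairs) {
        long duplicatePairs = readPairs - uniqueReadPairs;
        if (readPairs <= 0 || uniqueReadPairs <= 0 || duplicatePairs <= 0) {
            return null;
        }
        double n = readPairs;
        double c = uniqueReadPairs;

        // Search over multiples m of C, so that X = m * C.
        // At m = 1, f(C) = exp(-N/C) > 0; for large m, f turns negative because C < N.
        double lo = 1.0;
        double hi = 2.0;
        while (f(hi * c, c, n) > 0) {
            hi *= 2; // widen the bracket until the sign changes
        }

        // Halve the bracket repeatedly; 100 rounds is far more than double precision needs.
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2;
            if (f(mid * c, c, n) > 0) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return (long) (c * (lo + hi) / 2);
    }
}
```

For example, LibrarySizeSketch.estimateLibrarySize(1000, 900) returns an X for which X * (1 - exp(-1000/X)) is very close to 900.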


estimateRoi

public static double estimateRoi(long estimatedLibrarySize,
                                 double x,
                                 long pairs,
                                 long uniquePairs)
Estimates the ROI (return on investment) that one would see if a library was sequenced to x times the observed coverage.

Parameters:
estimatedLibrarySize - the estimated number of molecules in the library
x - the multiple of sequencing to be simulated (i.e. how many X sequencing)
pairs - the number of pairs observed in the actual sequencing
uniquePairs - the number of unique pairs observed in the actual sequencing
Returns:
a number z <= x such that sequencing pairs*x read pairs would be expected to yield uniquePairs*z unique pairs.
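
The parameters and return value above can be related through the Lander-Waterman equation given under estimateLibrarySize: sequencing x*pairs read pairs from a library of X molecules is expected to yield X * (1 - exp(-x*pairs/X)) distinct pairs, and dividing by uniquePairs gives z. The sketch below applies that reasoning. The class name RoiSketch is hypothetical, and the formula is an assumption derived from the equation rather than a statement of how the library computes it.

```java
/** Illustrative sketch only; not the Picard source. */
final class RoiSketch {

    /**
     * Assumed formula: expected distinct pairs at x times the observed
     * sequencing, divided by the distinct pairs actually observed.
     */
    static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs) {
        double librarySize = estimatedLibrarySize;
        // Lander-Waterman: expected distinct pairs after drawing x * pairs read pairs.
        double expectedUnique = librarySize * (1 - Math.exp(-(x * pairs) / librarySize));
        return expectedUnique / uniquePairs;
    }
}
```

With a library size that is consistent with the observed counts, x = 1 gives a value close to 1, and larger x gives diminishing returns as the library saturates.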

calculateRoiHistogram

public Histogram<Double> calculateRoiHistogram()
Calculates a histogram using the estimateRoi method to estimate the effective yield of sequencing at x times the observed depth, for x=1..10.
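
To illustrate what such a table of x to ROI values looks like, the sketch below evaluates an ROI estimate for x = 1 through 10. It is a stand-in under stated assumptions: a plain java.util.Map replaces the library's Histogram class, the class name RoiHistogramSketch and the method roiTable are hypothetical, and the estimateRoi helper repeats an assumed Lander-Waterman formula rather than calling the real method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not the Picard source. */
final class RoiHistogramSketch {

    /** Assumed ROI formula: expected distinct pairs at x-fold sequencing over observed distinct pairs. */
    static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs) {
        double librarySize = estimatedLibrarySize;
        return librarySize * (1 - Math.exp(-(x * pairs) / librarySize)) / uniquePairs;
    }

    /** Maps each sequencing multiple x = 1..10 to its estimated ROI, in ascending order of x. */
    static Map<Double, Double> roiTable(long estimatedLibrarySize, long pairs, long uniquePairs) {
        Map<Double, Double> table = new LinkedHashMap<>();
        for (int i = 1; i <= 10; i++) {
            double x = i;
            table.put(x, estimateRoi(estimatedLibrarySize, x, pairs, uniquePairs));
        }
        return table;
    }

    public static void main(String[] args) {
        // Example inputs: 1000 pairs observed, 900 unique, library size of roughly 4660.
        roiTable(4660, 1000, 900)
                .forEach((x, roi) -> System.out.printf("%.0fx sequencing -> %.3fx unique pairs%n", x, roi));
    }
}
```

The values rise with x but flatten out, which is the saturation effect the histogram is meant to expose.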


main

public static void main(String[] args)