pbbam Documentation Release Derek Barnett

Nov 16, 2017


Contents

1 Getting Started
2 C++ API Reference
3 Additional Languages
4 Command Line Utilities


As of the 3.0 release of SMRT Analysis, PacBio is embracing the industry-standard BAM format for basecall data files, both aligned and unaligned. We have also formulated a BAM companion file format (bam.pbi) that enables fast access to a richer set of per-read information, as well as compatibility with software built around the legacy cmp.h5 format.

The pbbam software package provides components to create, query, and edit PacBio BAM files and their associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.


CHAPTER 1

Getting Started

1.1 Requirements

These components will almost certainly already be on your system:
- gcc (4.8+) OR clang (v3.1+)
- pthreads
- zlib

Double-check your compiler version to be sure it is compatible:

    $ g++ -v
    $ clang -v

Additional requirements:
- Boost (1.55+)
- CMake (3.0+)
- Google Test
- htslib (1.3.1+)

For additional languages:
- SWIG (3.0.5+)

For building API documentation locally:
- Doxygen

For maximal convenience, install htslib and Google Test in the same parent directory you plan to install pbbam.
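If you would rather have a build fail immediately than inspect g++ -v or clang -v by hand, the compiler minimums above can also be checked at compile time. The snippet below is a minimal sketch and not part of pbbam: CompilerMeetsPbbamMinimum is a hypothetical name, and the check relies only on the compilers' standard predefined version macros. Apple's Clang uses its own version numbering, so treat the Clang branch as approximate on macOS.

```cpp
// Hypothetical compile-time check of the documented minimums: gcc 4.8+ or clang 3.1+.
constexpr bool CompilerMeetsPbbamMinimum()
{
#if defined(__clang__)
    // Test clang first: clang also defines __GNUC__ (as 4.2), which would mislead the gcc branch.
    return __clang_major__ > 3 || (__clang_major__ == 3 && __clang_minor__ >= 1);
#elif defined(__GNUC__)
    return __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8);
#else
    return false;  // other compilers are not covered by the listed requirements
#endif
}

static_assert(CompilerMeetsPbbamMinimum(), "pbbam requires gcc 4.8+ or clang 3.1+");
```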

1.2 Clone & Build

Note: The following steps are for building the C++ library and command-line utilities. If you are integrating pbbam into a C#, Python, or R project, take a look at the instructions for additional languages.

The basic steps for obtaining pbbam and building it from source are as follows:

Build and install htslib, per the project's instructions (or, on OS X: brew install htslib).

1.2.1 Clone

You should first clone the repository:

    $ git clone
    $ cd pbbam

1.2.2 Building with CMake

When building with CMake, create a separate build directory:

    $ mkdir build
    $ cd build
    $ cmake ..
    $ make -j 4    # compiles using 4 threads

Output:
- Library: <pbbam_root>/lib
- Headers: <pbbam_root>/include
- Utilities: <pbbam_root>/bin

You may need to set a few options on the cmake command to point to the install locations of dependencies. Common installation-related options include:
- GTEST_SRC_DIR

Add these using the -D argument, like this:

    $ cmake .. -DGTEST_SRC_DIR="path/to/googletest"

To run the test suite, run:

    $ make test

To build a local copy of the (Doxygen-style) API documentation, run:

    $ make doc

Then open <pbbam_root>/docs/html/index.html in your favorite browser.

1.2.3 Building with Meson

Building with Meson is generally faster and more versatile. Meson strictly requires building out of source:

    $ mkdir build
    $ cd build
    $ meson --prefix /my/install/prefix -Denable-tests=true ..
    $ ninja

By default, ninja uses as many compilation threads as there are logical cores on your system. Here, -Denable-tests=true enables pulling in the dependencies needed for testing. To run the test suite, run:

    $ ninja test

If you wish to install pbbam, run:

    $ ninja install

and ninja will install pbbam to /my/install/prefix.

1.3 Integrate

For CMake-based projects that will ship with or otherwise live alongside pbbam, you can use the approach described here.

Before defining your library or executable, add the following:

    add_subdirectory(<path/to/pbbam> external/build/pbbam)

When it's time to run make, this will ensure that pbbam is built inside your own project's build directory. After this point in the CMakeLists.txt file(s), a few variables will be available that can be used to set up your include paths and library linking targets:

    include_directories(
        ${PacBioBAM_INCLUDE_DIRS}
        # other includes that your project needs
    )

    add_executable(foo main.cpp)  # main.cpp is a placeholder for your own sources

    target_link_libraries(foo
        ${PacBioBAM_LIBRARIES}
        # other libs that your project needs
    )

If you're using something other than CMake for your project's build system, then you need to point it to pbbam's include directory and library, as well as those of its dependencies (primarily htslib).

If you built and installed pbbam using Meson, pkg-config files will be available to be consumed by projects wishing to use pbbam. Autoconf, CMake, Waf, SCons, and Meson all have means to determine dependency information from pkg-config files.


CHAPTER 2

C++ API Reference

Watch this space for more recipes & how-tos.

2.1 Accuracy

#include <pbbam/accuracy.h>

class PacBio::BAM::Accuracy

The Accuracy class represents the expected accuracy of a BamRecord. Values are clamped to fall within [0, 1].

Constructors & Related Methods

Accuracy(float accuracy)
Constructs an Accuracy object from a floating-point number.
Note: This is not an explicit constructor, to make it as easy as possible to use in numeric operations. We really just want to make sure that the acceptable range is respected.

Accuracy(const Accuracy &other)
Accuracy(Accuracy &&other)
Accuracy &operator=(const Accuracy &other)
Accuracy &operator=(Accuracy &&other)
~Accuracy()
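The implicit constructor described above, together with the float conversion and the MIN/MAX limits listed under Public Functions, can be pictured with a small stand-alone value type. This is a minimal sketch, not pbbam's implementation: the class name ClampedAccuracy is hypothetical, and it mirrors only the documented contract (implicit construction from float, clamping to [MIN, MAX], implicit conversion back to float).

```cpp
#include <algorithm>

// Hypothetical stand-in for PacBio::BAM::Accuracy that mirrors only its documented
// behaviour; it is not pbbam's implementation.
class ClampedAccuracy
{
public:
    static const float MIN;  // minimum valid accuracy value
    static const float MAX;  // maximum valid accuracy value

    // Intentionally not explicit, so plain floats convert implicitly.
    ClampedAccuracy(float accuracy)
        : value_{std::min(MAX, std::max(MIN, accuracy))}
    {}

    // Implicit conversion back to float lets the object appear in arithmetic.
    operator float() const { return value_; }

private:
    float value_;
};

const float ClampedAccuracy::MIN = 0.0f;
const float ClampedAccuracy::MAX = 1.0f;
```

Because both conversions are implicit, ClampedAccuracy a = 1.2f; silently yields 1.0: that is the convenience-versus-strictness trade-off the Note above describes.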

Public Functions

operator float() const
Returns: Accuracy as a float primitive.

Public Static Attributes

const float MIN
Minimum valid accuracy value [0.0].

const float MAX
Maximum valid accuracy value [1.0].

2.2 AlignmentPrinter

#include <pbbam/alignmentprinter.h>

class PacBio::BAM::AlignmentPrinter

The AlignmentPrinter class pretty-prints an alignment with respect to its associated reference sequence.

Example output:

    Read        : singleinsertion2
    Reference   : lambda_neb3011

    Read-length : 49
    Concordance :

    : GGCTGCAGTGTACAGCGGTCAGGAGGCC-ATTGATGCCGG :
    : GGCTGCAG-GTACAGCGGTCAGGAGGCCAATTGATGCCGG :

    : ACTGGCTGAT :
    : ACTGGCTGAT : 49

Constructors & Related Methods

AlignmentPrinter(const IndexedFastaReader &ifr)
Constructs the alignment printer with an associated FASTA file reader.
Parameters: ifr: FASTA reader
Throws: std::runtime_error if the FASTA file cannot be opened for reading.

AlignmentPrinter()
AlignmentPrinter(const AlignmentPrinter&)

AlignmentPrinter(AlignmentPrinter&&)
AlignmentPrinter &operator=(const AlignmentPrinter&)
AlignmentPrinter &operator=(AlignmentPrinter&&)
~AlignmentPrinter()

Printing

std::string Print(const BamRecord &record, const Orientation orientation = Orientation::GENOMIC)
Pretty-prints an aligned BamRecord to std::string.
Note: The current implementation includes ANSI escape sequences for coloring terminal output. Future versions of this method will likely make this optional.
Returns: formatted string containing the alignment and summary information.

2.3 AlignmentSet

#include <pbbam/datasettypes.h>

class PacBio::BAM::AlignmentSet

The AlignmentSet class represents an AlignmentSet root element in DataSetXML.

Inherits from PacBio::BAM::DataSetBase

Public Functions

AlignmentSet()
Creates an empty AlignmentSet dataset.

2.4 BaiIndexedBamReader

#include <pbbam/baiindexedbamreader.h>

class PacBio::BAM::BaiIndexedBamReader

The BaiIndexedBamReader class provides read-only iteration over BAM records, bounded by a particular genomic interval. The SAM/BAM standard index (*.bai) is used to allow random-access operations.

Inherits from PacBio::BAM::BamReader

Constructors & Related Methods

BaiIndexedBamReader(const GenomicInterval &interval, const std::string &filename)
Constructs a BAM reader, bounded by a genomic interval. All reads that overlap the interval will be available.

Parameters: interval: iteration will be bounded by this GenomicInterval; filename: input BAM filename
Throws: std::runtime_error if either file (*.bam or *.bai) fails to open for reading, or if the interval is invalid.

BaiIndexedBamReader(const GenomicInterval &interval, const BamFile &bamfile)
Constructs a BAM reader, bounded by a genomic interval. All reads that overlap the interval will be available.
Parameters: interval: iteration will be bounded by this GenomicInterval; bamfile: input BamFile object
Throws: std::runtime_error if either file (*.bam or *.bai) fails to open for reading, or if the interval is invalid.

BaiIndexedBamReader(const GenomicInterval &interval, BamFile &&bamfile)
Constructs a BAM reader, bounded by a genomic interval. All reads that overlap the interval will be available.
Parameters: interval: iteration will be bounded by this GenomicInterval; bamfile: input BamFile object
Throws: std::runtime_error if either file (*.bam or *.bai) fails to open for reading, or if the interval is invalid.

Random-Access

const GenomicInterval &Interval() const
Returns: the current GenomicInterval in use by this reader.

BaiIndexedBamReader &Interval(const GenomicInterval &interval)
Sets a new genomic interval on the reader.
Parameters: interval
Returns: reference to this reader.
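The "overlap" rule used by the constructors above can be illustrated with plain integer coordinates. This is a minimal sketch under two assumptions not stated in this section: coordinates are 0-based and half-open ([start, end)), and both intervals lie on the same reference sequence. HalfOpenInterval and Overlaps are hypothetical names; in pbbam itself, the region lookup relies on the .bai index described above.

```cpp
#include <cstdint>

// Hypothetical type; assumes 0-based, half-open [start, end) coordinates on one reference.
struct HalfOpenInterval
{
    std::int32_t start;
    std::int32_t end;  // exclusive
};

// True if the two intervals share at least one position.
bool Overlaps(const HalfOpenInterval& a, const HalfOpenInterval& b)
{
    return a.start < b.end && b.start < a.end;
}
```

Under this convention, a read ending exactly at the interval's start position does not overlap it, because the end coordinate is exclusive.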

Protected Functions

int ReadRawData(BGZF *bgzf, bam1_t *b)
Performs the actual raw read of the next record from the BAM file.
The default implementation reads records sequentially until EOF. Derived readers may use additional criteria to decide which record is next and when reading is done.
The return value should be equivalent to that of htslib's bam_read1():
- >= 0 : normal
- -1 : EOF (not an error)
- < -1 : error
Parameters: bgzf: BGZF stream pointer; b: BAM record pointer
Returns: integer status code (see description).

2.5 BamFile

#include <pbbam/bamfile.h>

class PacBio::BAM::BamFile

The BamFile class represents a BAM file.

It provides access to header metadata and methods for finding/creating associated index files.

Constructors & Related Methods

BamFile(const std::string &filename)
Creates a BamFile object for the provided filename and loads header information.
Parameters: filename: BAM filename
Throws: std::exception on failure to open the BAM file for reading.

BamFile(const BamFile &other)
BamFile(BamFile &&other)
BamFile &operator=(const BamFile &other)
BamFile &operator=(BamFile &&other)
~BamFile()

Index & Filename Methods

void CreatePacBioIndex() const
Creates a .pbi file for this BAM file.
Note: An existing index file will be overwritten. Use EnsurePacBioIndexExists() if this is not desired.
Throws: an exception if the PBI file could not be properly created and/or written to disk.

void CreateStandardIndex() const
Creates a .bai file for this BAM file.
Note: An existing index file will be overwritten. Use EnsureStandardIndexExists() if this is not desired.
Throws: an exception if the BAI file could not be properly created (e.g. this BAM is not coordinate-sorted) or could not be written to disk.

void EnsurePacBioIndexExists() const
Creates a .pbi file if one does not exist or is older than its BAM file.
Equivalent to:

    if (!file.PacBioIndexExists())
        file.CreatePacBioIndex();

Note: As of v…, no timestamp check is performed. Previously, an additional timestamp check was also required.
Throws: an exception if the PBI file could not be properly created and/or written to disk.

void EnsureStandardIndexExists() const
Creates a .bai file if one does not exist or is older than its BAM file.
Equivalent to:

    if (!file.StandardIndexExists())
        file.CreateStandardIndex();

Note: As of v0.4.2, no timestamp check is performed.
Throws: an exception if the BAI file could not be properly created (e.g. this BAM is not coordinate-sorted) or could not be written to disk.

std::string Filename() const
Returns: BAM filename.

bool HasEOF() const
Returns: true if the BAM file has an EOF marker (empty BGZF block). Streamed input (filename: "-")
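The "Equivalent to" snippets for EnsurePacBioIndexExists() and EnsureStandardIndexExists() amount to a create-if-missing pattern. The sketch below restates it generically. ExpectedPbiFilename, ExpectedBaiFilename and EnsureIndexExists are hypothetical helpers, and the naming convention (index suffix appended to the full BAM filename, e.g. movie.subreads.bam.pbi) is an assumption, not something this section guarantees; use PacBioIndexFilename() and StandardIndexFilename() for the real names.

```cpp
#include <functional>
#include <string>

// Assumed naming convention: the index suffix is appended to the full BAM filename.
std::string ExpectedPbiFilename(const std::string& bamFilename) { return bamFilename + ".pbi"; }
std::string ExpectedBaiFilename(const std::string& bamFilename) { return bamFilename + ".bai"; }

// Create-if-missing, mirroring the documented equivalence
//   if (!file.PacBioIndexExists()) file.CreatePacBioIndex();
// No timestamp comparison is made. Returns true if createIndex was invoked.
bool EnsureIndexExists(const std::function<bool()>& indexExists,
                       const std::function<void()>& createIndex)
{
    if (indexExists()) return false;
    createIndex();
    return true;
}
```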

bool PacBioIndexExists() const
Returns: true if the .pbi exists and is newer than this BAM file.

std::string PacBioIndexFilename() const
Returns: filename of the PacBio index file (.pbi).
Note: No guarantee is made on the existence of this file. This method simply returns the expected filename.

bool PacBioIndexIsNewer() const
Returns: true if the .pbi has a more recent timestamp than this file.

bool StandardIndexExists() const
Returns: true if the .bai exists.

std::string StandardIndexFilename() const
Returns: filename of the standard index file (.bai).
Note: No guarantee is made on the existence of this file. This method simply returns the expected filename.

bool StandardIndexIsNewer() const
Returns: true if the .bai has a more recent timestamp than this file.

File Header Data

bool HasReference(const std::string &name) const
Returns: true if the header metadata has this reference name.

const BamHeader &Header() const
Returns: const reference to the BamHeader containing the file's metadata.

bool IsPacBioBAM() const
Returns: true if the file is a PacBio BAM file (i.e. has a non-empty version associated with the header "pb" tag).

int ReferenceId(const std::string &name) const
Returns: ID for reference name (can be used for e.g. GenomicIntervalQuery), or -1 if not found.

std::string ReferenceName(const int id) const
Returns: name of the reference matching id, or an empty string if not found.

uint32_t ReferenceLength(const std::string &name) const
Returns: length of the requested reference name, or 0 if not found.

uint32_t ReferenceLength(const int id) const
Returns: length of the requested reference id, or 0 if not found.

Additional Attributes

int64_t FirstAlignmentOffset() const
Returns: virtual offset of the first alignment. Intended mostly for internal use. Note that this is a BGZF virtual offset, not a normal file position.
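FirstAlignmentOffset() (and, later, BamReader::VirtualSeek and VirtualTell) deal in BGZF virtual offsets. Per the SAM/BAM specification, a virtual offset packs two numbers into 64 bits: the file offset of the start of a compressed BGZF block (upper 48 bits) and the offset within that block's uncompressed data (lower 16 bits). The hypothetical helpers below only make that split explicit; they are not pbbam or htslib functions.

```cpp
#include <cstdint>

// Components of a BGZF virtual offset, as defined in the SAM/BAM specification.
struct VirtualOffsetParts
{
    std::uint64_t blockAddress;  // byte offset of the compressed block in the file
    std::uint16_t withinBlock;   // byte offset inside the block's uncompressed data
};

// Hypothetical helper: split a virtual offset into its two components.
VirtualOffsetParts SplitVirtualOffset(std::int64_t virtualOffset)
{
    const std::uint64_t v = static_cast<std::uint64_t>(virtualOffset);
    return VirtualOffsetParts{v >> 16, static_cast<std::uint16_t>(v & 0xFFFF)};
}

// Hypothetical helper: pack the two components back into a virtual offset.
std::int64_t MakeVirtualOffset(std::uint64_t blockAddress, std::uint16_t withinBlock)
{
    return static_cast<std::int64_t>((blockAddress << 16) | withinBlock);
}
```

This is why a virtual offset cannot be passed to an ordinary fseek(): only its upper 48 bits correspond to a position in the file on disk.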

2.6 BamHeader

#include <pbbam/bamheader.h>

class PacBio::BAM::BamHeader

The BamHeader class represents the header section of the BAM file.

It provides metadata about the file, including file version, reference sequences, read groups, comments, etc.

A BamHeader may be fetched from a BamFile to view an existing file's metadata, or one may be created/edited for use when writing to a new file (via BamWriter).

Note: A particular BamHeader is likely to be re-used in lots of places throughout the library, for read-only purposes. For this reason, even though a BamHeader may be returned by value, it is essentially a thin wrapper for a shared pointer to the actual data. This means, though, that if you need to edit an existing BamHeader for use with a BamWriter, you should consider using BamHeader::DeepCopy. Otherwise, any modifications will affect all BamHeaders that are sharing its underlying data.

Constructors & Related Methods

BamHeader()
Creates an empty BamHeader.

BamHeader(const std::string &samheadertext)
Creates a BamHeader from SAM-formatted text.
Parameters: samheadertext

BamHeader(const BamHeader &other)
BamHeader(BamHeader &&other)
BamHeader &operator=(const BamHeader &other)
BamHeader &operator=(BamHeader &&other)
~BamHeader()

BamHeader DeepCopy() const
Detaches the underlying data from the shared pointer, returning an independent copy of the header contents. This ensures that any modifications to the newly returned BamHeader do not affect other BamHeader objects that were sharing its underlying data.

Operators

BamHeader &operator+=(const BamHeader &other)
Merges another header with this one.
Headers must be compatible for merging. This means that their Version, SortOrder, PacBioBamVersion (and, in the case of aligned BAM data, Sequences) must all match. If not, an exception will be thrown.
Returns: reference to this header.

Parameters: other: header to merge with this one
Throws: std::runtime_error if the headers are not compatible.

BamHeader operator+(const BamHeader &other) const
Creates a new, merged header.
Headers must be compatible for merging. This means that their Version, SortOrder, PacBioBamVersion (and, in the case of aligned BAM data, Sequences) must all match. If not, an exception will be thrown.
Neither original header (this header nor other) is modified.
Parameters: other: header to merge with this one
Returns: merged header.
Throws: std::runtime_error if the headers are not compatible.

General Attributes

std::string PacBioBamVersion() const
Returns: the PacBio BAM version number (@HD:pb).
Note: This is different from the SAM/BAM version number. See BamHeader::Version.

std::string SortOrder() const
Valid values: unknown, unsorted, queryname, or coordinate.
Returns: the sort order used.

std::string Version() const
Returns: the SAM/BAM version number (@HD:VN).
Note: This is different from the PacBio BAM version number. See BamHeader::PacBioBamVersion.

BamHeader &PacBioBamVersion(const std::string &version)
Sets this header's PacBio BAM version number (@HD:pb).
Returns: reference to this object.
Throws: std::runtime_error if the version number cannot be parsed or is less than the minimum version allowed.
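The merge rules for operator+= and operator+ can be read as a field-by-field comparison. The sketch below is a hypothetical restatement, not pbbam's code: HeaderFields and CheckMergeCompatible are illustrative names, sequences are compared by name only (a simplification of "Sequences must match"), and the caller states whether the data are aligned.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical subset of header fields that the documented merge rules compare.
struct HeaderFields
{
    std::string version;                     // @HD:VN
    std::string sortOrder;                   // @HD:SO
    std::string pacbioBamVersion;            // @HD:pb
    std::vector<std::string> sequenceNames;  // @SQ:SN, in header order
};

// Throws std::runtime_error if the headers may not be merged.
// Sequences only have to match for aligned BAM data.
void CheckMergeCompatible(const HeaderFields& a, const HeaderFields& b, bool alignedData)
{
    if (a.version != b.version)
        throw std::runtime_error("cannot merge headers: Version differs");
    if (a.sortOrder != b.sortOrder)
        throw std::runtime_error("cannot merge headers: SortOrder differs");
    if (a.pacbioBamVersion != b.pacbioBamVersion)
        throw std::runtime_error("cannot merge headers: PacBioBamVersion differs");
    if (alignedData && a.sequenceNames != b.sequenceNames)
        throw std::runtime_error("cannot merge headers: Sequences differ");
}
```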

BamHeader &SortOrder(const std::string &order)
Sets this header's sort order label (@HD:SO).
Valid values: unknown, unsorted, queryname, or coordinate.
Returns: reference to this object.

BamHeader &Version(const std::string &version)
Sets this header's SAM/BAM version number (@HD:VN).
Returns: reference to this object.

Read Groups

bool HasReadGroup(const std::string &id) const
Returns: true if the header contains a read group with id (@RG:ID).

ReadGroupInfo ReadGroup(const std::string &id) const
Returns: a ReadGroupInfo object representing the read group matching id (@RG:ID).
Throws: std::runtime_error if id is unknown.

std::vector<std::string> ReadGroupIds() const
Returns: vector of read group IDs listed in this header.

std::vector<ReadGroupInfo> ReadGroups() const
Returns: vector of ReadGroupInfo objects representing all read groups listed in this header.

BamHeader &AddReadGroup(const ReadGroupInfo &readgroup)
Appends a read group entry (@RG) to this header.
Returns: reference to this object.

BamHeader &ClearReadGroups()
Removes all read group entries from this header.
Returns: reference to this object.

BamHeader &ReadGroups(const std::vector<ReadGroupInfo> &readgroups)
Replaces this header's list of read group entries with those in readgroups.
Returns: reference to this object.

Sequences

bool HasSequence(const std::string &name) const
Returns: true if the header contains a sequence with name (@SQ:SN).

size_t NumSequences() const

Returns: number of sequence entries (@SQ) stored in this header.

int32_t SequenceId(const std::string &name) const
This is the numeric ID used elsewhere throughout the API.
Returns: numeric ID for the sequence matching name.
See: BamReader::ReferenceId, PbiReferenceIdFilter, PbiRawMappedData::tId_
Throws: std::runtime_error if name is unknown.

std::string SequenceLength(const int32_t id) const
Returns: the length of the sequence (e.g. chromosome length) at index id.
See: SequenceInfo::Length, BamHeader::SequenceId

std::string SequenceName(const int32_t id) const
Returns: the name of the sequence at index id.
See: SequenceInfo::Name, BamHeader::SequenceId

std::vector<std::string> SequenceNames() const
Position in the vector is equivalent to SequenceId.
Returns: vector of sequence names stored in this header.

SequenceInfo Sequence(const int32_t id) const
Returns: SequenceInfo object at index id.
See: BamHeader::SequenceId
Throws: std::out_of_range if id is an invalid or unknown index.

SequenceInfo Sequence(const std::string &name) const
Returns: SequenceInfo for the sequence matching name.

std::vector<SequenceInfo> Sequences() const
Returns: vector of SequenceInfo objects representing the sequence entries (@SQ) stored in this header.

BamHeader &AddSequence(const SequenceInfo &sequence)
Appends a sequence entry (@SQ) to this header.
Returns: reference to this object.

BamHeader &ClearSequences()
Removes all sequence entries from this header.
Returns: reference to this object.

BamHeader &Sequences(const std::vector<SequenceInfo> &sequences)
Replaces this header's list of sequence entries with those in sequences.
Returns: reference to this object.
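Because SequenceNames() is ordered so that a name's position equals its SequenceId, a name-to-ID lookup is just a position search. A minimal sketch with a hypothetical helper, SequenceIdFor, that mirrors the documented behaviour of throwing std::runtime_error for an unknown name:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: the ID of a sequence is its index in the @SQ list.
std::int32_t SequenceIdFor(const std::vector<std::string>& sequenceNames,
                           const std::string& name)
{
    for (std::size_t i = 0; i < sequenceNames.size(); ++i) {
        if (sequenceNames[i] == name) return static_cast<std::int32_t>(i);
    }
    throw std::runtime_error("unknown sequence name: " + name);
}
```

A linear scan keeps the sketch short; with many sequences, a hash map from name to index would avoid repeated scans.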

Programs

bool HasProgram(const std::string &id) const
Returns: true if this header contains a program entry with ID (@PG:ID) matching id.

ProgramInfo Program(const std::string &id) const
Returns: ProgramInfo object for the program entry matching id.
Throws: std::runtime_error if id is unknown.

std::vector<std::string> ProgramIds() const
Returns: vector of program IDs (@PG:ID).

std::vector<ProgramInfo> Programs() const
Returns: vector of ProgramInfo objects representing program entries (@PG) stored in this header.

BamHeader &AddProgram(const ProgramInfo &pg)
Appends a program entry (@PG) to this header.
Returns: reference to this object.

BamHeader &ClearPrograms()
Removes all program entries from this header.
Returns: reference to this object.

BamHeader &Programs(const std::vector<ProgramInfo> &programs)
Replaces this header's list of program entries with those in programs.
Returns: reference to this object.

Comments

std::vector<std::string> Comments() const
Returns: vector of comment (@CO) strings.

BamHeader &AddComment(const std::string &comment)
Appends a comment (@CO) to this header.
Returns: reference to this object.

BamHeader &ClearComments()
Removes all comments from this header.
Returns: reference to this object.

BamHeader &Comments(const std::vector<std::string> &comments)
Replaces this header's list of comments with those in comments.
Returns: reference to this object.
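All of the BamHeader mutators (AddProgram, AddComment, ClearComments and so on) act on the data that, per the class note, BamHeader copies share through a shared pointer. The hypothetical class below reproduces just that sharing model with a single field, to show why DeepCopy() is recommended before editing a header obtained elsewhere. It is an illustration, not pbbam's implementation.

```cpp
#include <memory>
#include <string>

// Hypothetical illustration of BamHeader's documented sharing model, reduced to one field:
// copies share a single data object until DeepCopy() gives one its own.
class SharedHeaderSketch
{
public:
    SharedHeaderSketch() : data_{std::make_shared<std::string>()} {}

    // The setter writes through the shared pointer, so every copy sharing data_ sees the change.
    SharedHeaderSketch& SortOrder(const std::string& order)
    {
        *data_ = order;
        return *this;
    }
    const std::string& SortOrder() const { return *data_; }

    // Returns a copy that owns separate data, so later edits stay local to it.
    SharedHeaderSketch DeepCopy() const
    {
        SharedHeaderSketch result;
        *result.data_ = *data_;
        return result;
    }

private:
    std::shared_ptr<std::string> data_;  // stands in for the full header contents
};
```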

Conversion Methods

std::string ToSam() const
Returns: SAM-header-formatted string representing this header's data.

2.7 BamReader

#include <pbbam/bamreader.h>

class PacBio::BAM::BamReader

The BamReader class provides basic read-access to a BAM file.

The base-class implementation provides a sequential read-through of BAM records. Derived classes may implement other access schemes (e.g. genomic region, PBI-enabled record filtering).

Subclassed by PacBio::BAM::BaiIndexedBamReader, PacBio::BAM::PbiIndexedBamReader

Constructors & Related Methods

BamReader(const std::string &fn)
Opens a BAM file for reading.
Parameters: fn: BAM filename
Throws: std::runtime_error if the file fails to open.

BamReader(const BamFile &bamfile)
Opens a BAM file for reading.
Parameters: bamfile: BamFile object
Throws: std::runtime_error if the file fails to open.

BamReader(BamFile &&bamfile)
Opens a BAM file for reading.
Parameters: bamfile: BamFile object
Throws: std::runtime_error if the file fails to open.

virtual ~BamReader()

BAM File Attributes

const BamFile &File() const
Returns: the underlying BamFile.

std::string Filename() const
Returns: BAM filename.

const BamHeader &Header() const
Returns: BamHeader object from the BAM header contents.

BAM File I/O

bool GetNext(BamRecord &record)
Fetches the next BAM record.
The default implementation reads records until EOF. Derived readers may use additional criteria to decide which record is next and when reading is done.
Parameters: record: next BamRecord object; it should not be used if the method returns false.
Returns: true if the record was read successfully; false at EOF (or at the end of iteration in derived readers). False is not an error; it indicates the end of data.
Throws: std::runtime_error if reading from the file fails (e.g. a possibly truncated or corrupted file).

void VirtualSeek(int64_t virtualoffset)
Seeks to a virtual offset in the BAM file.
Note: This is NOT a normal file offset, but the virtual offset used in BAM indexing.
Throws: std::runtime_error if the seek fails.

int64_t VirtualTell() const
Returns: current (virtual) file position.
Note: This is NOT a normal file offset, but the virtual offset used in BAM indexing.

BGZF *Bgzf() const
Helper method for access to the underlying BGZF stream pointer. Useful as a contact point between derived readers and htslib methods.
Returns: BGZF stream pointer.
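GetNext()'s bool-or-throw contract lines up with the ReadRawData() status codes (documented for BaiIndexedBamReader above and for BamReader below): a non-negative status means a record, -1 means end of data, and anything lower is an error. The sketch below assumes that mapping; it is not pbbam's source. ScriptedReader is a hypothetical stand-in that replays canned status codes, so the usual while-loop can be exercised without a BAM file.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Assumed mapping from a bam_read1()-style status code to GetNext()'s contract.
bool StatusToGetNextResult(int status)
{
    if (status >= 0) return true;    // a record was read
    if (status == -1) return false;  // EOF: end of data, not an error
    throw std::runtime_error("BAM read failed with status " + std::to_string(status));
}

// Hypothetical reader that replays scripted status codes instead of reading a file.
class ScriptedReader
{
public:
    explicit ScriptedReader(std::vector<int> statuses) : statuses_(std::move(statuses)) {}

    bool GetNext()
    {
        // Running past the end of the script behaves like EOF.
        const int status = next_ < statuses_.size() ? statuses_[next_++] : -1;
        return StatusToGetNextResult(status);
    }

private:
    std::vector<int> statuses_;
    std::size_t next_ = 0;
};
```

With pbbam itself the loop has the same shape (BamRecord record; while (reader.GetNext(record)) { ... }), and an exception signals a damaged or truncated file rather than the end of input.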

virtual int ReadRawData(BGZF *bgzf, bam1_t *b)
Performs the actual raw read of the next record from the BAM file.
The default implementation reads records sequentially until EOF. Derived readers may use additional criteria to decide which record is next and when reading is done.
The return value should be equivalent to that of htslib's bam_read1():
- >= 0 : normal
- -1 : EOF (not an error)
- < -1 : error
Parameters: bgzf: BGZF stream pointer; b: BAM record pointer
Returns: integer status code (see description).

2.8 BamRecord

#include <pbbam/bamrecord.h>

enum PacBio::BAM::ClipType

This enum defines the modes supported by BamRecord clipping operations.

Methods like BamRecord::Clip accept Position parameters, which may be in either polymerase or reference coordinates. Using this enum as a flag indicates how the positions should be interpreted.

Values:
- CLIP_NONE: No clipping will be performed.
- CLIP_TO_QUERY: Clipping positions are in polymerase coordinates.
- CLIP_TO_REFERENCE: Clipping positions are in genomic coordinates.

enum PacBio::BAM::RecordType

This enum defines the possible PacBio BAM record types.

See: ReadGroupInfo::ReadType

Values:
- ZMW: Polymerase read.
- HQREGION: High-quality region.
- SUBREAD: Subread.
- CCS: Circular consensus sequence.
- SCRAP: Additional sequence (barcodes, adapters, etc.)

- UNKNOWN: Unknown read type.
- POLYMERASE = ZMW

enum PacBio::BAM::FrameEncodingType

This enum defines the possible encoding modes used in Frames data (e.g. BamRecord::IPD or BamRecord::PulseWidth).

The LOSSY mode is the default in production output; LOSSLESS mode is used primarily for internal applications.

See for more information on pulse frame encoding schemes.

Values:
- LOSSY: 8-bit compression (using CodecV1) of frame data
- LOSSLESS: 16-bit native frame data

class PacBio::BAM::BamRecord

The BamRecord class represents a PacBio BAM record.

PacBio BAM records are extensions of normal SAM/BAM records. Thus, in addition to normal fields like bases, qualities, mapping coordinates, etc., tags are used extensively to annotate records with additional PacBio-specific data.

Mapping and clipping APIs are provided as well, to ensure that such operations trickle down to all data fields properly.

See for more information on standard BAM data, and https://github.com/pacificbiosciences/pacbiofileformats/blob/3.0/bam.rst for more information on PacBio BAM fields.

Subclassed by PacBio::BAM::VirtualZmwBamRecord

Constructors & Related Methods

BamRecord()
BamRecord(BamHeader header)
BamRecord(BamRecordImpl impl)
BamRecord(const BamRecord &other)
BamRecord(BamRecord &&other)
BamRecord &operator=(const BamRecord &other)
BamRecord &operator=(BamRecord &&other)
virtual ~BamRecord()

General Data

std::string FullName() const
Returns: this record's full name.
See: BamRecordImpl::Name

BamHeader Header() const
Returns: shared pointer to this record's associated BamHeader.

int32_t HoleNumber() const
Returns: ZMW hole number.
Throws: an exception if the zm tag is missing and the record name does not contain a hole number.

PacBio::BAM::LocalContextFlags LocalContextFlags() const
Returns: this record's LocalContextFlags.

std::string MovieName() const
Returns: this record's movie name.

int32_t NumPasses() const
Returns: number of complete passes of the insert.

Position QueryEnd() const
Returns: the record's query end position, or Sequence().length() if not stored.
Note: QueryEnd is in polymerase read coordinates, NOT genomic coordinates.

Position QueryStart() const
Returns: the record's query start position, or 0 if not stored.
Note: QueryStart is in polymerase read coordinates, NOT genomic coordinates.

Accuracy ReadAccuracy() const
Returns: this record's expected read accuracy [0, 1000].

ReadGroupInfo ReadGroup() const
Returns: ReadGroupInfo object for this record.

std::string ReadGroupId() const
Returns: string ID of this record's read group.
See: ReadGroupInfo::Id

int32_t ReadGroupNumericId() const
Returns: integer value for this record's read group ID.

VirtualRegionType ScrapRegionType() const
Returns: this scrap record's scrap region type.

ZmwType ScrapZmwType() const
Returns: this scrap record's scrap ZMW type.

std::vector<float> SignalToNoise() const
Returns: this record's average signal-to-noise for each of A, C, G, and T.

RecordType Type() const
Returns: this record's type.
See: RecordType

BamRecord &HoleNumber(const int32_t holenumber)
Sets this record's ZMW hole number.
Parameters: holenumber
Returns: reference to this record.

BamRecord &LocalContextFlags(const PacBio::BAM::LocalContextFlags flags)
Sets this record's local context flags.
Parameters: flags
Returns: reference to this record.

BamRecord &NumPasses(const int32_t numpasses)
Sets this record's number of complete passes of the insert.
Parameters: numpasses
Returns: reference to this record.

BamRecord &QueryEnd(const PacBio::BAM::Position pos)
Sets this record's query end position.
Note: Changing this will modify the name of non-CCS records.
Parameters: pos
Returns: reference to this record.

BamRecord &QueryStart(const PacBio::BAM::Position pos)
Sets this record's query start position.
Note: Changing this will modify the name of non-CCS records.
Parameters: pos
Returns: reference to this record.
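The notes on the QueryStart/QueryEnd setters, and the name-based HoleNumber() fallback above, follow from how PacBio names its records. Assuming the naming convention of the PacBio BAM specification ({movieName}/{holeNumber}/{qStart}_{qEnd} for non-CCS reads, {movieName}/{holeNumber}/ccs for CCS reads), the hypothetical helpers below build such a name and recover the hole number from one. They are illustrations, not pbbam functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed PacBio BAM naming convention:
//   non-CCS: {movieName}/{holeNumber}/{qStart}_{qEnd}
//   CCS:     {movieName}/{holeNumber}/ccs
std::string MakeRecordName(const std::string& movieName, std::int32_t holeNumber,
                           std::int32_t queryStart, std::int32_t queryEnd, bool isCcs)
{
    const std::string prefix = movieName + "/" + std::to_string(holeNumber) + "/";
    if (isCcs) return prefix + "ccs";  // query positions are not part of CCS names
    return prefix + std::to_string(queryStart) + "_" + std::to_string(queryEnd);
}

// Hypothetical helper: returns the hole number embedded in such a name,
// or -1 if the second field is missing or not purely numeric.
std::int32_t HoleNumberFromName(const std::string& name)
{
    const std::size_t first = name.find('/');
    if (first == std::string::npos) return -1;
    const std::size_t second = name.find('/', first + 1);
    const std::string field = (second == std::string::npos)
                                  ? name.substr(first + 1)
                                  : name.substr(first + 1, second - first - 1);
    if (field.empty() || field.find_first_not_of("0123456789") != std::string::npos) return -1;
    return static_cast<std::int32_t>(std::stol(field));
}
```

This is why changing QueryStart or QueryEnd renames non-CCS records: the query positions are part of the name.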

BamRecord &ReadAccuracy(const Accuracy &accuracy)
Sets this record's expected read accuracy [0, 1000].
Parameters: accuracy
Returns: reference to this record.

BamRecord &ReadGroup(const ReadGroupInfo &rg)
Attaches this record to the provided read group, changing the record name & RG tag.
Parameters: rg
Returns: reference to this record.

BamRecord &ReadGroupId(const std::string &id)
Attaches this record to the provided read group, changing the record name & RG tag.
Parameters: id
Returns: reference to this record.

BamRecord &ScrapRegionType(const VirtualRegionType type)
Sets this scrap record's ScrapRegionType.
Parameters: type
Returns: reference to this record.

BamRecord &ScrapRegionType(const char type)
Sets this scrap record's ScrapRegionType.
Parameters: type: character equivalent of VirtualRegionType
Returns: reference to this record.

BamRecord &ScrapZmwType(const ZmwType type)
Sets this scrap record's ScrapZmwType.
Parameters: type
Returns: reference to this record.

BamRecord &ScrapZmwType(const char type)
Sets this scrap record's ScrapZmwType.
Returns: reference to this record.

Parameters: type: character equivalent of ZmwType

BamRecord &SignalToNoise(const std::vector<float> &snr)
Sets this record's average signal-to-noise in each of A, C, G, and T.
Parameters: snr: average signal-to-noise of A, C, G, and T (in this order)
Returns: reference to this record.

Mapping Data

Position AlignedEnd() const
Returns: the record's aligned end position.
Note: AlignedEnd is in polymerase read coordinates, NOT genomic coordinates.

Position AlignedStart() const
Returns: the record's aligned start position.
Note: AlignedStart is in polymerase read coordinates, NOT genomic coordinates.

Strand AlignedStrand() const
Returns: the record's strand as a Strand enum value.

Cigar CigarData(bool exciseallclips = false) const
Returns: the record's CIGAR data as a Cigar object.
Parameters: exciseallclips: if true, remove all clipping operations (hard & soft) [default: false]

bool IsMapped() const
Returns: true if this record was mapped by an aligner.

uint8_t MapQuality() const
Returns: this record's mapping quality. A value of 255 indicates unknown.

size_t NumDeletedBases() const
Returns: the number of deleted bases (relative to reference).

size_t NumInsertedBases() const
Returns: the number of inserted bases (relative to reference).

size_t NumMatches() const
Returns: the number of matching bases (sum of "=" CIGAR op lengths).

std::pair<size_t, size_t> NumMatchesAndMismatches() const
Returns: a pair containing NumMatches (first) and NumMismatches (second).

size_t NumMismatches() const

Returns: the number of mismatching bases (sum of "X" CIGAR op lengths).

int32_t ReferenceId() const
Returns: this record's reference ID, or -1 if unmapped.
Note: This is only a valid identifier within this BAM file.

std::string ReferenceName() const
Returns: this record's reference name.
Throws: an exception if the record is unmapped.

Position ReferenceEnd() const
Returns: the record's reference end position, or UnmappedPosition if unmapped.
Note: ReferenceEnd is in reference coordinates, NOT polymerase read coordinates.

Position ReferenceStart() const
Returns: the record's reference start position, or UnmappedPosition if unmapped.
Note: ReferenceStart is in reference coordinates, NOT polymerase read coordinates.

Barcode Data

int16_t BarcodeForward() const
Returns: forward barcode id.
See: HasBarcodes
Throws: std::runtime_error if barcode data is absent or malformed.

uint8_t BarcodeQuality() const
Returns: barcode call confidence (Phred-scaled posterior probability of a correct barcode call).
See: HasBarcodeQuality

int16_t BarcodeReverse() const
Returns: reverse barcode id.
See: HasBarcodes
Throws: std::runtime_error if barcode data is absent or malformed.

std::pair<int16_t, int16_t> Barcodes() const
Returns: the forward and reverse barcode ids.
See: HasBarcodes
Throws: std::runtime_error if barcode data is absent or malformed.
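NumMatches(), NumMismatches(), NumInsertedBases() and NumDeletedBases() in the Mapping Data group are sums of CIGAR operation lengths. The hypothetical TallyCigar below computes the same four sums from a SAM-style CIGAR string, assuming the aligner emitted "=" and "X" rather than the ambiguous "M" (which it ignores, since "M" cannot tell matches from mismatches). It is a sketch, not pbbam's Cigar class.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

struct CigarTally
{
    std::size_t matches = 0;     // '=' ops, cf. NumMatches()
    std::size_t mismatches = 0;  // 'X' ops, cf. NumMismatches()
    std::size_t insertions = 0;  // 'I' ops, cf. NumInsertedBases()
    std::size_t deletions = 0;   // 'D' ops, cf. NumDeletedBases()
};

// Hypothetical parser: sums operation lengths in a CIGAR string such as "10=2X3I".
CigarTally TallyCigar(const std::string& cigar)
{
    CigarTally tally;
    std::size_t length = 0;
    bool haveDigits = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            length = length * 10 + static_cast<std::size_t>(c - '0');
            haveDigits = true;
            continue;
        }
        if (!haveDigits) throw std::runtime_error("malformed CIGAR: " + cigar);
        switch (c) {
            case '=': tally.matches += length; break;
            case 'X': tally.mismatches += length; break;
            case 'I': tally.insertions += length; break;
            case 'D': tally.deletions += length; break;
            default: break;  // M, S, H, N, P (and anything else) are not counted here
        }
        length = 0;
        haveDigits = false;
    }
    if (haveDigits) throw std::runtime_error("malformed CIGAR: " + cigar);  // trailing length without an op
    return tally;
}
```

In these terms, NumMatchesAndMismatches() is simply the pair (matches, mismatches).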

BamRecord &Barcodes(const std::pair<int16_t, int16_t> &barcodeids)
Sets this record's barcode IDs (bc tag).
Return: reference to this record
barcodeids: forward and reverse barcode IDs

BamRecord &BarcodeQuality(const uint8_t quality)
Sets this record's barcode quality (bq tag).
Return: reference to this record
quality: Phred-scaled confidence call

Auxiliary Data Queries

bool HasAltLabelQV() const
Return: true if this record has AltLabelQV data

bool HasAltLabelTag() const
Return: true if this record has AltLabelTag data

bool HasBarcodes() const
Return: true if this record has Barcode data

bool HasBarcodeQuality() const
Return: true if this record has BarcodeQuality data

bool HasDeletionQV() const
Return: true if this record has DeletionQV data

bool HasDeletionTag() const
Return: true if this record has DeletionTag data

bool HasHoleNumber() const
Return: true if this record has a HoleNumber

bool HasInsertionQV() const
Return: true if this record has InsertionQV data

bool HasIPD() const
Return: true if this record has IPD data

bool HasLabelQV() const
Return: true if this record has LabelQV data

bool HasLocalContextFlags() const
Return: true if this record has LocalContextFlags (absent in CCS)

bool HasMergeQV() const
Return: true if this record has MergeQV data

bool HasNumPasses() const
Return: true if this record has NumPasses data

bool HasPkmean() const
Return: true if this record has Pkmean data

bool HasPkmid() const
Return: true if this record has Pkmid data

bool HasPkmean2() const
Return: true if this record has Pkmean2 data

bool HasPkmid2() const
Return: true if this record has Pkmid2 data

bool HasPreBaseFrames() const
Return: true if this record has PreBaseFrames (aka IPD) data

bool HasPrePulseFrames() const
Return: true if this record has PrePulseFrames data

bool HasPulseCall() const
Return: true if this record has PulseCall data

bool HasPulseCallWidth() const
Return: true if this record has PulseCallWidth data

bool HasPulseExclusion(void) const
Return: true if this record has PulseExclusion data

bool HasPulseMergeQV() const
Return: true if this record has PulseMergeQV data

bool HasPulseWidth() const
Return: true if this record has PulseWidth data

bool HasReadAccuracy() const
Return: true if this record has ReadAccuracyTag data

bool HasQueryEnd() const
Return: true if this record has QueryEnd data

bool HasQueryStart() const
Return: true if this record has QueryStart data

bool HasScrapRegionType() const
Return: true if this record has ScrapRegionType data (only in SCRAP)

bool HasScrapZmwType() const
Return: true if this record has scrap ZMW type data (only in SCRAP)

bool HasSignalToNoise() const
Return: true if this record has signal-to-noise data (absent in POLYMERASE)

bool HasStartFrame() const
Return: true if this record has StartFrame data

bool HasSubstitutionQV() const
Return: true if this record has SubstitutionQV data

bool HasSubstitutionTag() const
Return: true if this record has SubstitutionTag data

Sequence & Tag Data

std::string AltLabelTag(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's AltLabelTag values (pt tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new gap chars will be '-' and padding chars will be '*'.
Return: AltLabelTag string
orientation: Orientation of output.

std::string DeletionTag(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's DeletionTag values (dt tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new gap chars will be '-' and padding chars will be '*'.
Return: DeletionTag string
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

std::string Sequence(const Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's DNA sequence (SEQ field).
Note: If aligned is true, and gaps/padding need to be inserted, the new gap chars will be '-' and padding chars will be '*'.
Return: sequence string
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

std::string SubstitutionTag(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's SubstitutionTag values (st tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new gap chars will be '-' and padding chars will be '*'.
Return: SubstitutionTag string
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

BamRecord &AltLabelTag(const std::string &tags)
Sets this record's AltLabelTag values (pt tag).
Return: reference to this record
tags: new AltLabelTag values

BamRecord &DeletionTag(const std::string &tags)
Sets this record's DeletionTag values (dt tag).
Return: reference to this record
tags: new DeletionTag values

BamRecord &SubstitutionTag(const std::string &tags)
Sets this record's SubstitutionTag values (st tag).
Return: reference to this record
tags: new SubstitutionTag values

Quality Data

QualityValues AltLabelQV(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's AltLabelQV values (pv tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new QVs will have a value of 0.
Return: AltLabelQV as QualityValues object
orientation: Orientation of output.

QualityValues DeletionQV(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's DeletionQV values (dq tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new QVs will have a value of 0.
Return: DeletionQV as QualityValues object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

QualityValues InsertionQV(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's InsertionQV values (iq tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new QVs will have a value of 0.
Return: InsertionQV as QualityValues object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

QualityValues LabelQV(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's LabelQV values (pq tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new QVs will have a value of 0.
Return: LabelQV as QualityValues object
orientation: Orientation of output.

QualityValues MergeQV(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's MergeQV values (mq tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new QVs will have a value of 0.
Return: MergeQV as QualityValues object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

QualityValues Qualities(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's BAM quality values (QUAL field).
Note: If aligned is true, and gaps/padding need to be inserted, the new QVs will have a value of 0.
Return: BAM qualities as QualityValues object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

QualityValues SubstitutionQV(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's SubstitutionQV values (sq tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new QVs will have a value of 0.
Return: SubstitutionQV as QualityValues object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

BamRecord &AltLabelQV(const QualityValues &altlabelqvs)
Sets this record's AltLabelQV values (pv tag).
Return: reference to this record
altlabelqvs: new AltLabelQV values

BamRecord &DeletionQV(const QualityValues &deletionqvs)
Sets this record's DeletionQV values (dq tag).
Return: reference to this record
deletionqvs: new DeletionQV values

BamRecord &InsertionQV(const QualityValues &insertionqvs)
Sets this record's InsertionQV values (iq tag).
Return: reference to this record
insertionqvs: new InsertionQV values

BamRecord &LabelQV(const QualityValues &labelqvs)
Sets this record's LabelQV values (pq tag).
Return: reference to this record
labelqvs: new LabelQV values

BamRecord &MergeQV(const QualityValues &mergeqvs)
Sets this record's MergeQV values (mq tag).
Return: reference to this record
mergeqvs: new MergeQV values

BamRecord &SubstitutionQV(const QualityValues &substitutionqvs)
Sets this record's SubstitutionQV values (sq tag).
Return: reference to this record
substitutionqvs: new SubstitutionQV values

Pulse Data

const float photonfactor

Frames IPD(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's IPD values (ip tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new frames will have a value of 0.
Return: IPD as Frames object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

Frames IPDRaw(Orientation orientation = Orientation::NATIVE) const
Fetches this record's IPD values (ip tag), but does not upscale.
Return: IPD as Frames object
orientation: Orientation of output.
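
IPD() returns upscaled frame counts while IPDRaw() does not, and the Frames setters offer an 8-bit lossy encoding alongside a 16-bit lossless one. The sketch below shows the general shape of such a lossy 8-bit code using a piecewise scheme of the kind described in the PacBio BAM specification: codes 0 to 63 store frame counts exactly, and each further block of 64 codes doubles the step size (2, 4, then 8 frames), saturating at 952 frames. The breakpoints and the floor rounding are assumptions to verify against that specification; EncodeFrames and DecodeFrames are hypothetical names, not pbbam's codec.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit lossy frame-count code (assumed breakpoints, floor rounding).
// Codes 0-63: exact; 64-127: step 2; 128-191: step 4; 192-255: step 8.
std::uint8_t EncodeFrames(std::uint32_t frames) {
    if (frames < 64) return static_cast<std::uint8_t>(frames);
    if (frames < 192) return static_cast<std::uint8_t>(64 + (frames - 64) / 2);
    if (frames < 448) return static_cast<std::uint8_t>(128 + (frames - 192) / 4);
    if (frames < 952) return static_cast<std::uint8_t>(192 + (frames - 448) / 8);
    return 255;  // saturate: anything >= 952 frames maps to the top code
}

// Inverse ("upscaling"): returns the smallest frame count in each code's bucket.
std::uint32_t DecodeFrames(std::uint8_t code) {
    if (code < 64) return code;
    if (code < 128) return 64 + (code - 64) * 2u;
    if (code < 192) return 192 + (code - 128) * 4u;
    return 448 + (code - 192) * 8u;
}
```

With this scheme 100 frames encodes to code 82 and decodes back to 100, but 101 frames also encodes to 82, so one frame of resolution is lost; anything at or above 952 frames decodes to 952.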

std::vector<float> Pkmean(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's Pkmean values (pa tag).
Return: Pkmean as vector<float> object
orientation: Orientation of output.

std::vector<float> Pkmid(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's Pkmid values (pm tag).
Return: Pkmid as vector<float> object
orientation: Orientation of output.

std::vector<float> Pkmean2(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's Pkmean2 values (pi tag).
Return: Pkmean2 as vector<float> object
orientation: Orientation of output.

std::vector<float> Pkmid2(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's Pkmid2 values (ps tag).
Return: Pkmid2 as vector<float> object
orientation: Orientation of output.

Frames PreBaseFrames(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's PreBaseFrames (aka IPD) values (ip tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new frames will have a value of 0.
Return: IPD as Frames object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.
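
The aligned and excisesoftclips flags on PreBaseFrames() and on the sequence, tag, QV, and other pulse getters insert gap and padding values per the CIGAR ('-' and '*' for characters, 0 for QVs and frames) and optionally drop soft-clipped ends. The sketch below illustrates that idea on a plain container, independent of pbbam. AlignToCigar is a hypothetical name, and its per-op handling is an assumption based on the SAM CIGAR definitions: D positions (and, by assumption, N positions) receive the gap value, P receives the padding value, H is skipped, and S entries are kept unless soft clips are excised.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: lays out per-base values (bases, QVs, frames, ...)
// against a SAM-style CIGAR. Works for std::string and std::vector<T>.
template <typename Seq>
Seq AlignToCigar(const Seq& query, const std::string& cigar,
                 typename Seq::value_type gap, typename Seq::value_type pad,
                 bool exciseSoftClips) {
    Seq out;
    std::size_t pos = 0;     // next unread query position
    std::size_t length = 0;  // length of the CIGAR op being parsed
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            length = length * 10 + static_cast<std::size_t>(c - '0');
            continue;
        }
        switch (c) {
            case 'M': case '=': case 'X': case 'I':  // consume query, copy values
            case 'S':                                // soft clip: consume query, maybe drop
                if (pos + length > query.size()) throw std::out_of_range("CIGAR longer than query");
                for (std::size_t i = 0; i < length; ++i, ++pos) {
                    if (c != 'S' || !exciseSoftClips) out.push_back(query[pos]);
                }
                break;
            case 'D': case 'N':  // reference-only ops: emit gap value (N handling assumed)
                out.insert(out.end(), length, gap);
                break;
            case 'P':  // padding
                out.insert(out.end(), length, pad);
                break;
            case 'H':  // hard clip: bases are not stored in the query
                break;
            default:
                throw std::invalid_argument("unknown CIGAR op");
        }
        length = 0;
    }
    return out;
}
```

Called with std::string("ACGTACGT"), CIGAR "2S3=1D2=1I", gap '-' and padding '*', the sketch returns "ACGTA-CGT", or "GTA-CGT" with soft clips excised. For a QV or frame vector, passing 0 as both gap and padding value mirrors the fill value documented above.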

Frames PrePulseFrames(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's PrePulseFrames values (pd tag).
Return: PrePulseFrames as Frames object
orientation: Orientation of output.

std::string PulseCall(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's PulseCall values (pc tag).
Return: PulseCall string
orientation: Orientation of output.

Frames PulseCallWidth(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's PulseCallWidth values (px tag).
Return: PulseCallWidth as Frames object
orientation: Orientation of output.

std::vector<PacBio::BAM::PulseExclusionReason> PulseExclusionReason(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's PulseExclusionReason values (pe tag).
Return: vector of pulse exclusion reason values

QualityValues PulseMergeQV(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's PulseMergeQV values (pg tag).
Return: PulseMergeQV as QualityValues object
orientation: Orientation of output.
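
The pulse getters above take a PulseBehavior argument that defaults to PulseBehavior::ALL. Pulse-level tags carry one value per pulse, but only some pulses become basecalls. Assuming, as in the PacBio BAM specification's description of the pc tag, that uppercase PulseCall letters mark pulses called as bases and lowercase letters mark pulses that were not, restricting a per-pulse array to basecalls could be sketched as below. BasecallsOnly is a hypothetical name, and this is not pbbam's implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: keeps only the entries of a per-pulse array whose
// pulse call is uppercase (assumed to mean "called as a base").
template <typename T>
std::vector<T> BasecallsOnly(const std::string& pulseCalls, const std::vector<T>& perPulse) {
    if (pulseCalls.size() != perPulse.size())
        throw std::invalid_argument("pulse call and pulse data lengths differ");
    std::vector<T> result;
    for (std::size_t i = 0; i < pulseCalls.size(); ++i) {
        if (std::isupper(static_cast<unsigned char>(pulseCalls[i])))
            result.push_back(perPulse[i]);
    }
    return result;
}
```

For pulse calls "ACcGtT" and per-pulse values {1, 2, 3, 4, 5, 6}, the sketch keeps {1, 2, 4, 6}, dropping the entries for the lowercase 'c' and 't' pulses.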

Frames PulseWidth(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's PulseWidth values (pw tag).
Note: If aligned is true, and gaps/padding need to be inserted, the new frames will have a value of 0.
Return: PulseWidth as Frames object
orientation: Orientation of output.
aligned: if true, gaps/padding will be inserted, per Cigar info.
excisesoftclips: if true, any soft-clipped positions will be removed from query ends.

Frames PulseWidthRaw(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false) const
Fetches this record's PulseWidth values (pw tag), but does not upscale.
Return: PulseWidth as Frames object
orientation: Orientation of output.

std::vector<uint32_t> StartFrame(Orientation orientation = Orientation::NATIVE, bool aligned = false, bool excisesoftclips = false, PulseBehavior pulsebehavior = PulseBehavior::ALL) const
Fetches this record's StartFrame values (sf tag).
Return: StartFrame as uint32_t vector
orientation: Orientation of output.

BamRecord &IPD(const Frames &frames, const FrameEncodingType encoding)
Sets this record's IPD values (ip tag).
Return: reference to this record
frames: new IPD values
encoding: specify how to encode the data (8-bit lossy, or 16-bit lossless)

BamRecord &Pkmean(const std::vector<float> &photons)
Sets this record's Pkmean values (pa tag).
Return: reference to this record
photons: new Pkmean values

BamRecord &Pkmean(const std::vector<uint16_t> &encodedphotons)
Sets this record's Pkmean values (pa tag).
Return: reference to this record
encodedphotons: new Pkmean values, already encoded as uint16_t

BamRecord &Pkmid(const std::vector<float> &photons)
Sets this record's Pkmid values (pm tag).
Return: reference to this record
photons: new Pkmid values

BamRecord &Pkmid(const std::vector<uint16_t> &encodedphotons)
Sets this record's Pkmid values (pm tag).
Return: reference to this record
encodedphotons: new Pkmid values, already encoded as uint16_t

BamRecord &Pkmean2(const std::vector<float> &photons)
Sets this record's Pkmean2 values (ps tag).
Return: reference to this record
photons: new Pkmean2 values

BamRecord &Pkmean2(const std::vector<uint16_t> &encodedphotons)
Sets this record's Pkmean2 values (ps tag).
Return: reference to this record
encodedphotons: new Pkmean2 values, already encoded as uint16_t

BamRecord &Pkmid2(const std::vector<float> &photons)
Sets this record's Pkmid2 values (pi tag).
Return: reference to this record
photons: new Pkmid2 values

BamRecord &Pkmid2(const std::vector<uint16_t> &encodedphotons)
Sets this record's Pkmid2 values (pi tag).
Return: reference to this record
encodedphotons: new Pkmid2 values, already encoded as uint16_t
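
The Pk* setters accept either floating-point photon values or already encoded uint16_t values, and the class exposes a static EncodePhotons() together with a photonfactor constant. This section gives neither the factor's value nor the rounding rule, so the sketch below only illustrates the general fixed-point idea: multiply by a scale factor, round, and clamp to the uint16_t range. EncodePhotonsSketch, DecodePhotonsSketch, and the explicit factor parameter are hypothetical and are not pbbam's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical fixed-point encoder: value * factor, rounded to nearest and
// clamped to [0, 65535]. Assumes finite input. The factor is a parameter
// because its actual value in pbbam is not stated here.
std::vector<std::uint16_t> EncodePhotonsSketch(const std::vector<float>& data, float factor) {
    constexpr double kMax = std::numeric_limits<std::uint16_t>::max();
    std::vector<std::uint16_t> encoded;
    encoded.reserve(data.size());
    for (float value : data) {
        double scaled = std::round(static_cast<double>(value) * factor);
        scaled = std::min(std::max(scaled, 0.0), kMax);
        encoded.push_back(static_cast<std::uint16_t>(scaled));
    }
    return encoded;
}

// Approximate inverse; rounding in the encoder makes the round trip lossy.
std::vector<float> DecodePhotonsSketch(const std::vector<std::uint16_t>& encoded, float factor) {
    std::vector<float> decoded;
    decoded.reserve(encoded.size());
    for (std::uint16_t v : encoded) decoded.push_back(static_cast<float>(v) / factor);
    return decoded;
}
```

With an illustrative factor of 10, the values {0.0, 1.25, 3.5, -2.0, 7000.0} encode to {0, 13, 35, 0, 65535}: 1.25 decodes back as 1.3, and out-of-range inputs are clamped.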

BamRecord &PreBaseFrames(const Frames &frames, const FrameEncodingType encoding)
Sets this record's PreBaseFrames (aka IPD) values (ip tag).
Return: reference to this record
frames: new PreBaseFrames values
encoding: specify how to encode the data (8-bit lossy, or 16-bit lossless)

BamRecord &PrePulseFrames(const Frames &frames, const FrameEncodingType encoding)
Sets this record's PrePulseFrames values (pd tag).
Return: reference to this record
frames: new PrePulseFrames values
encoding: specify how to encode the data (8-bit lossy, or 16-bit lossless)

BamRecord &PulseCall(const std::string &tags)
Sets this record's PulseCall values (pc tag).
Return: reference to this record
tags: new PulseCall values

BamRecord &PulseCallWidth(const Frames &frames, const FrameEncodingType encoding)
Sets this record's PulseCallWidth values (px tag).
Return: reference to this record
frames: new PulseCallWidth values
encoding: specify how to encode the data (8-bit lossy, or 16-bit lossless)

BamRecord &PulseExclusionReason(const std::vector<PacBio::BAM::PulseExclusionReason> &reasons)
Sets this record's PulseExclusionReason values (pe tag).
Return: reference to this record
reasons: new pulse exclusion reason values

BamRecord &PulseMergeQV(const QualityValues &pulsemergeqvs)
Sets this record's PulseMergeQV values (pg tag).
Return: reference to this record
pulsemergeqvs: new PulseMergeQV values
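
Every getter in the sequence, quality, and pulse sections takes an Orientation argument that defaults to Orientation::NATIVE. For a record aligned to the reverse strand, presenting data in reference (genomic) orientation conventionally means reverse-complementing base strings and reversing per-base arrays such as QVs and frames. The sketch below shows those two transforms on plain containers; it assumes that convention rather than quoting pbbam's code, and ComplementBase, ReverseComplement, and Reversed are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Complements A/C/G/T in either case; other characters such as 'N', '-'
// and '*' pass through unchanged.
char ComplementBase(char b) {
    switch (b) {
        case 'A': return 'T'; case 'C': return 'G';
        case 'G': return 'C'; case 'T': return 'A';
        case 'a': return 't'; case 'c': return 'g';
        case 'g': return 'c'; case 't': return 'a';
        default:  return b;
    }
}

// Base strings: reverse, then complement each base.
std::string ReverseComplement(const std::string& seq) {
    std::string out(seq.rbegin(), seq.rend());
    std::transform(out.begin(), out.end(), out.begin(), ComplementBase);
    return out;
}

// Per-base arrays (QVs, frames, ...): reverse only; values are unchanged.
template <typename T>
std::vector<T> Reversed(const std::vector<T>& values) {
    return std::vector<T>(values.rbegin(), values.rend());
}
```

For example, ReverseComplement("ACGGT") yields "ACCGT", while the matching QV array {1, 2, 3} is simply reversed to {3, 2, 1}, so each value stays attached to its base.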

BamRecord &PulseWidth(const Frames &frames, const FrameEncodingType encoding)
Sets this record's PulseWidth values (pw tag).
Return: reference to this record
frames: new PulseWidth values
encoding: specify how to encode the data (8-bit lossy, or 16-bit lossless)

BamRecord &StartFrame(const std::vector<uint32_t> &startframe)
Sets this record's StartFrame values (sf tag).
Return: reference to this record
startframe: new StartFrame values

static std::vector<uint16_t> EncodePhotons(const std::vector<float> &data)

Low-Level Access & Operations

const BamRecordImpl &Impl() const
Warning: This method should be considered temporary and avoided as much as possible. Direct access to the internal object is likely to disappear as the BamRecord interface matures.
Return: const reference to underlying BamRecordImpl object

BamRecordImpl &Impl()
Warning: This method should be considered temporary and avoided as much as possible. Direct access to the internal object is likely to disappear as the BamRecord interface matures.
Return: reference to underlying BamRecordImpl object

void ResetCachedPositions() const
Resets cached aligned start/end.
Note: This method should not be needed in most client code. It exists primarily as a hook for internal reading loops (queries, index build, etc.). It's essentially a workaround and will likely be removed from the API.

void ResetCachedPositions()
Resets cached aligned start/end.
Note: This method should not be needed in most client code. It exists primarily as a hook for internal reading loops (queries, index build, etc.). It's essentially a workaround and will likely be removed from the API.

void UpdateName()
Updates the record's name (BamRecord::FullName) to reflect modifications to name components (movie name, ZMW hole number, etc.).
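
UpdateName() rebuilds the record name from its components. PacBio read names conventionally take the form {movieName}/{holeNumber}/{qStart}_{qEnd} for subreads and {movieName}/{holeNumber}/ccs for CCS reads; the sketch below assembles those two forms. The formats are taken from the PacBio read-naming convention as an assumption, other read types are not covered, and SubreadName, CcsName, and the example movie name are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helpers composing PacBio-style read names from their parts.
std::string SubreadName(const std::string& movieName, std::int32_t holeNumber,
                        std::int32_t queryStart, std::int32_t queryEnd) {
    return movieName + "/" + std::to_string(holeNumber) + "/" +
           std::to_string(queryStart) + "_" + std::to_string(queryEnd);
}

std::string CcsName(const std::string& movieName, std::int32_t holeNumber) {
    return movieName + "/" + std::to_string(holeNumber) + "/ccs";
}
```

For a hypothetical movie "movieX", hole number 4194370, and query range 0 to 1530, SubreadName returns "movieX/4194370/0_1530", and CcsName("movieX", 42) returns "movieX/42/ccs".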


Kakadu and Java. David Taubman, UNSW June 3, 2003

Kakadu and Java. David Taubman, UNSW June 3, 2003 Kakadu and Java David Taubman, UNSW June 3, 2003 1 Brief Summary The Kakadu software framework is implemented in C++ using a fairly rigorous object oriented design strategy. All classes which are intended

More information

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg

High-throughput sequencing: Alignment and related topic. Simon Anders EMBL Heidelberg High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg Established platforms HTS Platforms Illumina HiSeq, ABI SOLiD, Roche 454 Newcomers: Benchtop machines: Illumina MiSeq,

More information

MODERN AND LUCID C++ ADVANCED

MODERN AND LUCID C++ ADVANCED Informatik MODERN AND LUCID C++ ADVANCED for Professional Programmers Prof. Peter Sommerlad Thomas Corbat Director of IFS Research Assistant Rapperswil, FS 2016 LIBRARY API/ABI DESIGN PIMPL IDIOM HOURGLASS

More information

mmap: Memory Mapped Files in R

mmap: Memory Mapped Files in R mmap: Memory Mapped Files in R Jeffrey A. Ryan August 1, 2011 Abstract The mmap package offers a cross-platform interface for R to information that resides on disk. As dataset grow, the finite limits of

More information

BSON C++ API. Version by Sans Pareil Technologies, Inc. Generated by Doxygen Thu Feb :32:39

BSON C++ API. Version by Sans Pareil Technologies, Inc. Generated by Doxygen Thu Feb :32:39 BSON C++ API Version 2.5.1 by Sans Pareil Technologies, Inc. Generated by Doxygen 1.8.3 Thu Feb 28 2013 10:32:39 Contents 1 BSON C++ API Documentation 1 1.1 Contents................................................

More information

Java Overview An introduction to the Java Programming Language

Java Overview An introduction to the Java Programming Language Java Overview An introduction to the Java Programming Language Produced by: Eamonn de Leastar (edeleastar@wit.ie) Dr. Siobhan Drohan (sdrohan@wit.ie) Department of Computing and Mathematics http://www.wit.ie/

More information

Outline. 1 Function calls and parameter passing. 2 Pointers, arrays, and references. 5 Declarations, scope, and lifetimes 6 I/O

Outline. 1 Function calls and parameter passing. 2 Pointers, arrays, and references. 5 Declarations, scope, and lifetimes 6 I/O Outline EDAF30 Programming in C++ 2. Introduction. More on function calls and types. Sven Gestegård Robertz Computer Science, LTH 2018 1 Function calls and parameter passing 2 Pointers, arrays, and references

More information

Stream Model of I/O. Basic I/O in C

Stream Model of I/O. Basic I/O in C Stream Model of I/O 1 A stream provides a connection between the process that initializes it and an object, such as a file, which may be viewed as a sequence of data. In the simplest view, a stream object

More information

ECE 2400 Computer Systems Programming, Fall 2017 Prelim 2 Prep

ECE 2400 Computer Systems Programming, Fall 2017 Prelim 2 Prep revision: 2017-11-04-22-42 These problems are not meant to be exactly like the problems that will be on the prelim. These problems are instead meant to represent the kind of understanding you should be

More information

NGS Data Analysis. Roberto Preste

NGS Data Analysis. Roberto Preste NGS Data Analysis Roberto Preste 1 Useful info http://bit.ly/2r1y2dr Contacts: roberto.preste@gmail.com Slides: http://bit.ly/ngs-data 2 NGS data analysis Overview 3 NGS Data Analysis: the basic idea http://bit.ly/2r1y2dr

More information

libcbor Documentation

libcbor Documentation libcbor Documentation Release 0.4.0 Pavel Kalvoda Jan 02, 2017 Contents 1 Overview 3 2 Contents 5 2.1 Getting started.............................................. 5 2.2 Usage & preliminaries..........................................

More information

CRAM format specification (version 3.0)

CRAM format specification (version 3.0) CRAM format specification (version 3.0) samtools-devel@lists.sourceforge.net 11 May 2017 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version

More information

USING BRAT-BW Table 1. Feature comparison of BRAT-bw, BRAT-large, Bismark and BS Seeker (as of on March, 2012)

USING BRAT-BW Table 1. Feature comparison of BRAT-bw, BRAT-large, Bismark and BS Seeker (as of on March, 2012) USING BRAT-BW-2.0.1 BRAT-bw is a tool for BS-seq reads mapping, i.e. mapping of bisulfite-treated sequenced reads. BRAT-bw is a part of BRAT s suit. Therefore, input and output formats for BRAT-bw are

More information

Topic 6: A Quick Intro To C

Topic 6: A Quick Intro To C Topic 6: A Quick Intro To C Assumption: All of you know Java. Much of C syntax is the same. Also: Many of you have used C or C++. Goal for this topic: you can write & run a simple C program basic functions

More information

Python Working with files. May 4, 2017

Python Working with files. May 4, 2017 Python Working with files May 4, 2017 So far, everything we have done in Python was using in-memory operations. After closing the Python interpreter or after the script was done, all our input and output

More information

20.5. urllib Open arbitrary resources by URL

20.5. urllib Open arbitrary resources by URL 1 of 9 01/25/2012 11:19 AM 20.5. urllib Open arbitrary resources by URL Note: The urllib module has been split into parts and renamed in Python 3.0 to urllib.request, urllib.parse, and urllib.error. The

More information

Introduction to File Systems

Introduction to File Systems Introduction to File Systems CS-3013 Operating Systems Hugh C. Lauer (Slides include materials from Slides include materials from Modern Operating Systems, 3 rd ed., by Andrew Tanenbaum and from Operating

More information

Overview of the sqlite3x and sq3 APIs

Overview of the sqlite3x and sq3 APIs Abstract: Overview of the sqlite3x and sq3 APIs This document give an introduction on how to use the sqlite3x and sq3 C++ wrappers for sqlite3. Some knowledge of sqlite3 is assumed, but not much is needed.

More information

Operating Systems Coursework Task 3

Operating Systems Coursework Task 3 Operating Systems Coursework Task 3 TAR File System Driver DUE: Thursday 30th March @ 4PM GMT File Systems Used for the organised storage of data. Typically hierarchical/tree-based, consisting of directories

More information

WavPack 5 Library Documentation

WavPack 5 Library Documentation WavPack 5 Library Documentation David Bryant November 20, 2016 1.0 Introduction This document describes the use of the WavPack library (libwavpack) from a programmer's viewpoint. The library is designed

More information

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p.

Merge Conflicts p. 92 More GitHub Workflows: Forking and Pull Requests p. 97 Using Git to Make Life Easier: Working with Past Commits p. Preface p. xiii Ideology: Data Skills for Robust and Reproducible Bioinformatics How to Learn Bioinformatics p. 1 Why Bioinformatics? Biology's Growing Data p. 1 Learning Data Skills to Learn Bioinformatics

More information

Type Checking. Chapter 6, Section 6.3, 6.5

Type Checking. Chapter 6, Section 6.3, 6.5 Type Checking Chapter 6, Section 6.3, 6.5 Inside the Compiler: Front End Lexical analyzer (aka scanner) Converts ASCII or Unicode to a stream of tokens Syntax analyzer (aka parser) Creates a parse tree

More information

CSCI-1200 Data Structures Fall 2018 Lecture 22 Hash Tables, part 2 & Priority Queues, part 1

CSCI-1200 Data Structures Fall 2018 Lecture 22 Hash Tables, part 2 & Priority Queues, part 1 Review from Lecture 21 CSCI-1200 Data Structures Fall 2018 Lecture 22 Hash Tables, part 2 & Priority Queues, part 1 the single most important data structure known to mankind Hash Tables, Hash Functions,

More information

python-anyvcs Documentation

python-anyvcs Documentation python-anyvcs Documentation Release 1.4.0 Scott Duckworth Sep 27, 2017 Contents 1 Getting Started 3 2 Contents 5 2.1 The primary API............................................. 5 2.2 Git-specific functionality.........................................

More information

ECE 2400 Computer Systems Programming, Fall 2018 PA2: List and Vector Data Structures

ECE 2400 Computer Systems Programming, Fall 2018 PA2: List and Vector Data Structures School of Electrical and Computer Engineering Cornell University revision: 2018-09-25-13-37 1. Introduction The second programming assignment is designed to give you experience working with two important

More information

NGS Data Visualization and Exploration Using IGV

NGS Data Visualization and Exploration Using IGV 1 What is Galaxy Galaxy for Bioinformaticians Galaxy for Experimental Biologists Using Galaxy for NGS Analysis NGS Data Visualization and Exploration Using IGV 2 What is Galaxy Galaxy for Bioinformaticians

More information

CS201 Latest Solved MCQs

CS201 Latest Solved MCQs Quiz Start Time: 09:34 PM Time Left 82 sec(s) Question # 1 of 10 ( Start time: 09:34:54 PM ) Total Marks: 1 While developing a program; should we think about the user interface? //handouts main reusability

More information

Genomic Files. University of Massachusetts Medical School. October, 2014

Genomic Files. University of Massachusetts Medical School. October, 2014 .. Genomic Files University of Massachusetts Medical School October, 2014 2 / 39. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

manifold Documentation

manifold Documentation manifold Documentation Release 0.0.1 Open Source Robotics Foundation Mar 04, 2017 Contents 1 What is Manifold? 3 2 Installation 5 2.1 Ubuntu Linux............................................... 5 2.2

More information

bistro Documentation Release dev Philippe Veber

bistro Documentation Release dev Philippe Veber bistro Documentation Release dev Philippe Veber Oct 10, 2018 Contents 1 Getting started 1 1.1 Installation................................................ 1 1.2 A simple example............................................

More information

Intermediate Code Generation

Intermediate Code Generation Intermediate Code Generation In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target

More information

Sep. Guide. Edico Genome Corp North Torrey Pines Court, Plaza Level, La Jolla, CA 92037

Sep. Guide.  Edico Genome Corp North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Sep 2017 DRAGEN TM Quick Start Guide www.edicogenome.com info@edicogenome.com Edico Genome Corp. 3344 North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Notice Contents of this document and associated

More information

Programming refresher and intro to C programming

Programming refresher and intro to C programming Applied mechatronics Programming refresher and intro to C programming Sven Gestegård Robertz sven.robertz@cs.lth.se Department of Computer Science, Lund University 2018 Outline 1 C programming intro 2

More information

Mar. Guide. Edico Genome Inc North Torrey Pines Court, Plaza Level, La Jolla, CA 92037

Mar. Guide.  Edico Genome Inc North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Mar 2017 DRAGEN TM Quick Start Guide www.edicogenome.com info@edicogenome.com Edico Genome Inc. 3344 North Torrey Pines Court, Plaza Level, La Jolla, CA 92037 Notice Contents of this document and associated

More information

Instantiation of Template class

Instantiation of Template class Class Templates Templates are like advanced macros. They are useful for building new classes that depend on already existing user defined classes or built-in types. Example: stack of int or stack of double

More information

Introduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language

Introduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language Introduction C++ widely-used general-purpose programming language procedural and object-oriented support strong support created by Bjarne Stroustrup starting in 1979 based on C Introduction to C++ also

More information

High-throughout sequencing and using short-read aligners. Simon Anders

High-throughout sequencing and using short-read aligners. Simon Anders High-throughout sequencing and using short-read aligners Simon Anders High-throughput sequencing (HTS) Sequencing millions of short DNA fragments in parallel. a.k.a.: next-generation sequencing (NGS) massively-parallel

More information

LLVM Summer School, Paris 2017

LLVM Summer School, Paris 2017 LLVM Summer School, Paris 2017 David Chisnall June 12 & 13 Setting up You may either use the VMs provided on the lab machines or your own computer for the exercises. If you are using your own machine,

More information

4 Strings and Streams. Testing.

4 Strings and Streams. Testing. Strings and Streams. Testing. 21 4 Strings and Streams. Testing. Objective: to practice using the standard library string and stream classes. Read: Book: strings, streams, function templates, exceptions.

More information

std::string Quick Reference Card Last Revised: August 18, 2013 Copyright 2013 by Peter Chapin

std::string Quick Reference Card Last Revised: August 18, 2013 Copyright 2013 by Peter Chapin std::string Quick Reference Card Last Revised: August 18, 2013 Copyright 2013 by Peter Chapin Permission is granted to copy and distribute freely, for any purpose, provided the copyright notice above is

More information

COMP6771 Advanced C++ Programming

COMP6771 Advanced C++ Programming 1.. COMP6771 Advanced C++ Programming Week 5 Part Two: Dynamic Memory Management 2016 www.cse.unsw.edu.au/ cs6771 2.. Revisited 1 #include 2 3 struct X { 4 X() { std::cout

More information

ASAP - Allele-specific alignment pipeline

ASAP - Allele-specific alignment pipeline ASAP - Allele-specific alignment pipeline Jan 09, 2012 (1) ASAP - Quick Reference ASAP needs a working version of Perl and is run from the command line. Furthermore, Bowtie needs to be installed on your

More information

In I t n er t a er c a t c i t v i e v C+ C + + Com C pilat a ion on (RE ( PL RE ) PL : ) Th T e h L e ean L ean W a W y a by Viktor Kirilov 1

In I t n er t a er c a t c i t v i e v C+ C + + Com C pilat a ion on (RE ( PL RE ) PL : ) Th T e h L e ean L ean W a W y a by Viktor Kirilov 1 Interactive C++ Compilation (REPL): The Lean Way by Viktor Kirilov 1 About me my name is Viktor Kirilov - from Bulgaria 4 years of professional C++ in the games / VFX industries working on personal projects

More information

Continuous Integration INRIA

Continuous Integration INRIA Vincent Rouvreau - https://sed.saclay.inria.fr February 28, 2017 Contents 1 Preamble In this exercise, we will focus on the configuration of Jenkins for: 1. A simple aspect of C++ unit testing 2. An aspect

More information

Introduction to C++ with content from

Introduction to C++ with content from Introduction to C++ with content from www.cplusplus.com 2 Introduction C++ widely-used general-purpose programming language procedural and object-oriented support strong support created by Bjarne Stroustrup

More information

Sequence Alignment/Map Format Specification

Sequence Alignment/Map Format Specification Sequence Alignment/Map Format Specification The SAM/BAM Format Specification Working Group 2 Sep 2016 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing

More information

CSC 210, Exam Two Section February 1999

CSC 210, Exam Two Section February 1999 Problem Possible Score 1 12 2 16 3 18 4 14 5 20 6 20 Total 100 CSC 210, Exam Two Section 004 7 February 1999 Name Unity/Eos ID (a) The exam contains 5 pages and 6 problems. Make sure your exam is complete.

More information

Object Oriented Software Design II

Object Oriented Software Design II Object Oriented Software Design II Real Application Design Christian Nastasi http://retis.sssup.it/~lipari http://retis.sssup.it/~chris/cpp Scuola Superiore Sant Anna Pisa March 27, 2012 C. Nastasi (Scuola

More information

FPLLL. Contributing. Martin R. Albrecht 2017/07/06

FPLLL. Contributing. Martin R. Albrecht 2017/07/06 FPLLL Contributing Martin R. Albrecht 2017/07/06 Outline Communication Setup Reporting Bugs Topic Branches and Pull Requests How to Get your Pull Request Accepted Documentation Overview All contributions

More information

GATB programming day

GATB programming day GATB programming day G.Rizk, R.Chikhi Genscale, Rennes 15/06/2016 G.Rizk, R.Chikhi (Genscale, Rennes) GATB workshop 15/06/2016 1 / 41 GATB INTRODUCTION NGS technologies produce terabytes of data Efficient

More information

BanzaiDB Documentation

BanzaiDB Documentation BanzaiDB Documentation Release 0.3.0 Mitchell Stanton-Cook Jul 19, 2017 Contents 1 BanzaiDB documentation contents 3 2 Indices and tables 11 i ii BanzaiDB is a tool for pairing Microbial Genomics Next

More information

Rsubread package: high-performance read alignment, quantification and mutation discovery

Rsubread package: high-performance read alignment, quantification and mutation discovery Rsubread package: high-performance read alignment, quantification and mutation discovery Wei Shi 14 September 2015 1 Introduction This vignette provides a brief description to the Rsubread package. For

More information

File Systems. CS170 Fall 2018

File Systems. CS170 Fall 2018 File Systems CS170 Fall 2018 Table of Content File interface review File-System Structure File-System Implementation Directory Implementation Allocation Methods of Disk Space Free-Space Management Contiguous

More information

ChIP-seq (NGS) Data Formats

ChIP-seq (NGS) Data Formats ChIP-seq (NGS) Data Formats Biological samples Sequence reads SRA/SRF, FASTQ Quality control SAM/BAM/Pileup?? Mapping Assembly... DE Analysis Variant Detection Peak Calling...? Counts, RPKM VCF BED/narrowPeak/

More information

Lecture 3. Essential skills for bioinformatics: Unix/Linux

Lecture 3. Essential skills for bioinformatics: Unix/Linux Lecture 3 Essential skills for bioinformatics: Unix/Linux RETRIEVING DATA Overview Whether downloading large sequencing datasets or accessing a web application hundreds of times to download specific files,

More information