Idxstats
Data Types
​
SAM, BAM and FASTQ
Usage
genocat --idxstats my-file.bam.genozip
genocat --idxstats my-file.fq.genozip
​
Description
​
Generates the list of contigs, along with the number of mapped and unmapped reads for each contig. Reads with an undefined contig are grouped under “*”.
​
The output format and contents are identical to those of samtools idxstats.
​
This works on SAM/BAM files as well as on FASTQ files.
For FASTQ, the mapping to contigs is as reported by the Genozip Aligner. The Genozip Aligner maps reads for compression purposes and does not attempt to place them according to the biological truth. In practice, however, the large majority of reads are usually mapped to their correct position, so the output is a reasonable approximation of idxstats obtained directly from the FASTQ data, without first aligning it to produce a BAM file.
​
Output
A tab-separated list with four columns: contig name, contig length, number of mapped reads, number of unmapped reads.
​
$ genocat --idxstats my-file.bam.genozip
​
chr1 248956422 22502721 242067
chr2 242193529 22856412 237624
chr3 198295559 18923298 182644
chr4 190214555 18505304 178501
chr5 181538259 16884321 169270
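​
Because the output is a plain tab-separated list, it can be post-processed with standard tools. The following is a minimal sketch, not part of Genozip: idxstats_totals is a hypothetical helper name, and it assumes the four-column layout described above. It sums the mapped and unmapped columns across all contigs.

```shell
# Hypothetical helper (not part of Genozip): reads idxstats output on stdin
# and prints two tab-separated totals: mapped reads, unmapped reads.
# Assumes the four tab-separated columns described above.
idxstats_totals() {
    # %.0f is used instead of %d so very large totals are not truncated
    # by awk implementations that limit %d to 32-bit integers.
    awk -F'\t' '{ mapped += $3; unmapped += $4 }
                END { printf "%.0f\t%.0f\n", mapped, unmapped }'
}
```

Assuming my-file.bam.genozip exists, it could be used as: genocat --idxstats my-file.bam.genozip | idxstats_totals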
​
Example
​
Alternative sex assignment with sexassign.py (on Linux):
​
$ genocat mysample.bam.genozip --idxstats | sexassign.py /dev/fd/0
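​
For a quick look at the data before running a dedicated tool, the same output can be reduced to a single number. The sketch below is a simple illustrative heuristic and not a reimplementation of sexassign.py, whose method is not described here. xy_ratio is a hypothetical helper name, and the contig names "chrX" and "chrY" are assumptions that depend on the reference used. It prints the ratio of mapped reads per base on chrX to mapped reads per base on chrY.

```shell
# Hypothetical helper (not part of Genozip or sexassign.py): reads idxstats
# output on stdin and prints the length-normalized ratio of mapped reads on
# chrX versus chrY, or "NA" if chrY is absent or has no mapped reads.
# Assumes the contigs are named exactly "chrX" and "chrY".
xy_ratio() {
    awk -F'\t' '
        $1 == "chrX" && $2 > 0 { x = $3 / $2 }   # mapped reads per base on chrX
        $1 == "chrY" && $2 > 0 { y = $3 / $2 }   # mapped reads per base on chrY
        END {
            if (y > 0) printf "%.2f\n", x / y
            else       print "NA"
        }'
}
```

Assuming mysample.bam.genozip exists, it could be used as: genocat --idxstats mysample.bam.genozip | xy_ratio. How to interpret the resulting ratio depends on the sequencing protocol and reference, so any threshold should be chosen from samples of known sex.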
​
Questions? support@genozip.com