Downsampling and Sharding
Applicable Data Types
​
VCF, SAM, BAM, FASTQ, FASTA, GFF3/GVF, 23andMe
Usage
genocat --downsample rate[,shard] genozip_files
​
Description
​
Shows one line (or one read, in the case of FASTQ) out of every rate lines. The optional shard parameter is a 0-based index, from 0 to rate-1, that determines which line within each group of rate lines is shown. The default value of shard is 0.
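The selection rule can be pictured as a modulo test on the 0-based position of each line or read. The following is a minimal Python sketch of that reading of the description, not genocat's actual implementation; the function name downsample and its error handling are assumptions made for illustration.

```python
def downsample(records, rate, shard=0):
    """Yield one record out of every `rate`, chosen by the 0-based `shard`.

    Illustrative only: models the documented behaviour as an index test,
    which is an assumption about how the option works internally.
    """
    # Rejecting out-of-range values is a choice made for this sketch.
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    if not 0 <= shard < rate:
        raise ValueError("shard must be in the range 0..rate-1")
    for index, record in enumerate(records):
        if index % rate == shard:
            yield record


lines = [f"line{i}" for i in range(9)]
print(list(downsample(lines, 3)))     # shard defaults to 0
print(list(downsample(lines, 3, 1)))  # the middle line of every three
```

With nine input lines and a rate of 3, each call returns three lines: positions 0, 3, 6 for the default shard and positions 1, 4, 7 for shard 1.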
Downsampling is applied as the final filter, after all other filters (--interleave, --grep, --regions, --no-header, --luft, etc.) have been applied.
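Because downsampling runs last, the rate counts the lines that survived the earlier filters, not the lines of the original file. A rough Python sketch of the difference, using a plain substring match as a stand-in for --grep (both helper functions are hypothetical and do not reflect genocat's code):

```python
def grep(lines, pattern):
    # Stand-in for an earlier filter such as --grep: keep matching lines.
    return [line for line in lines if pattern in line]


def downsample(lines, rate, shard=0):
    # Keep the line at 0-based position `shard` within each group of `rate`.
    return [line for i, line in enumerate(lines) if i % rate == shard]


lines = ["chr1 a", "chr2 b", "chr1 c", "chr1 d", "chr2 e", "chr1 f"]

# Documented order: filter first, then downsample the surviving lines.
filtered_then_sampled = downsample(grep(lines, "chr1"), 2)

# The opposite order, shown only for contrast, gives a different result.
sampled_then_filtered = grep(downsample(lines, 2), "chr1")

print(filtered_then_sampled)  # ['chr1 a', 'chr1 d']
print(sampled_then_filtered)  # ['chr1 a', 'chr1 c']
```

In the documented order, every second matching line is kept, so the output size is predictable from the number of matches rather than from the size of the file.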
​
Example
​
Getting the middle read of every 3 consecutive FASTQ reads (i.e. read 1 of each group {0,1,2}). The first command shows the whole file; the second shows the downsampled output:
​
$ genocat my-file.fq.genozip
@A00910:85:HYGWJDSXX:1:1101:3025:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTGGGGGTGGGGATCCCTATCTTAGCTGTTGCAATCCCTGGGCTGCTTCAGTGTTAATAACATTCCAAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:8160:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NATTATGAGAGAGTGCTTTTTACAATGTTAATGACATGTTATAATAAAGTAATCTTACAATAAACAAGAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:9028:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NCTACAATGTGTGACAACAATAATGTAAAAGGTAGATGAAATTAAAGTACCTAGCAATATTAGGAAATTG
+
#FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFF:F:,FF
@A00910:85:HYGWJDSXX:1:1101:15067:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTGTAGCATGCTCTTTGGTGCAAATTGACGAGCAGATTCTAAAAGTCACAGAGAAATGCAAAAGACCCTG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:
@A00910:85:HYGWJDSXX:1:1101:16007:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTCAGAGGCTTCCGGCTAAATAGTAATACAAGTAGCACAAACAACAGAGTGAGAATGTTTATCACACTC
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:16984:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTCTATTTTGCCCCTGAGGGTGCATCCCGAAGAGGGAAGCTATTGATTTTTAACACTAGACACATAAAC
+
#:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:20636:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTATATACCTATTTTCATATTTTTGTCAGTGTTGGTCAGATTTTTAGAAGTGAGATTTGCTAGCAAAAAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:21811:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NCTTTCAAGAGCAGCCCCAGCTCCTTAAGCTGCTGGTCCTGGTGCATCTGCTGACTTTCATGTAGAAGAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:1714:1016 1:N:0:CAACGAGAGC+GAATTGAGTG
NATATTGGTCTTATGATCATAAATTTTCTCAGCATTTATATTCTGAAGAATATATATTTCCTGTTTATTT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFF
$ genocat my-file.fq.genozip --downsample 3,1
@A00910:85:HYGWJDSXX:1:1101:8160:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NATTATGAGAGAGTGCTTTTTACAATGTTAATGACATGTTATAATAAAGTAATCTTACAATAAACAAGAA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:16007:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NTTCAGAGGCTTCCGGCTAAATAGTAATACAAGTAGCACAAACAACAGAGTGAGAATGTTTATCACACTC
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
@A00910:85:HYGWJDSXX:1:1101:21811:1000 1:N:0:CAACGAGAGC+GAATTGAGTG
NCTTTCAAGAGCAGCCCCAGCTCCTTAAGCTGCTGGTCCTGGTGCATCTGCTGACTTTCATGTAGAAGAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
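The example can be reproduced in miniature by treating each FASTQ read as a group of four lines and numbering the reads from 0. The Python sketch below illustrates this under that assumption; it uses shortened read names (the sixth field of the names above) and placeholder sequences rather than the real data, and fastq_reads and downsample_reads are hypothetical helpers. It also illustrates the sharding use: shards 0 to rate-1 together cover every read exactly once.

```python
def fastq_reads(lines):
    """Group FASTQ lines into reads of four lines each (assumes a
    well-formed file with single-line sequence and quality strings)."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ line count must be a multiple of 4")
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def downsample_reads(reads, rate, shard=0):
    # Keep the read at 0-based position `shard` within each group of `rate`.
    return [read for i, read in enumerate(reads) if i % rate == shard]


# Nine toy reads, named after the sixth field of the read names above.
names = ["3025", "8160", "9028", "15067", "16007",
         "16984", "20636", "21811", "1714"]
lines = []
for name in names:
    lines += [f"@{name}", "NACGT", "+", "#FFFF"]  # placeholder read content

reads = fastq_reads(lines)
middle = downsample_reads(reads, 3, 1)
print([read[0] for read in middle])  # ['@8160', '@16007', '@21811']

# Sharding: the three shards are disjoint and together contain all reads.
shards = [downsample_reads(reads, 3, s) for s in range(3)]
all_sharded = sorted(read[0] for shard in shards for read in shard)
assert all_sharded == sorted(read[0] for read in reads)
```

The three names printed match the three reads in the --downsample 3,1 output above.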
​
Questions? support@genozip.com