
Downsampling and Sharding

Applicable Data Types

​

VCF, SAM, BAM, FASTQ, FASTA, GFF3/GVF, 23andMe

 

Usage

 

genocat --downsample rate[,shard] genozip_files

​

Description
​

Shows one line (or one read, in the case of FASTQ) out of every rate lines. The optional shard parameter (0-based) determines which line within each group of rate lines is shown. The default value of shard is 0.
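
The selection rule can be illustrated with a short Python sketch. It is not Genozip's implementation: the function name `downsample` and the position-modulo rule are assumptions inferred from the description above, and the input is a plain list of strings rather than a genozip file.

```python
from itertools import islice

def downsample(records, rate, shard=0):
    """Keep one record out of every `rate` records.

    `shard` is the 0-based position, within each group of `rate`
    consecutive records, of the record that is kept.
    (Hypothetical helper, for illustration only.)
    """
    if rate < 1 or not 0 <= shard < rate:
        raise ValueError("need rate >= 1 and 0 <= shard < rate")
    # Keeps records shard, shard + rate, shard + 2*rate, ...
    return islice(records, shard, None, rate)

lines = [f"line{i}" for i in range(9)]
print(list(downsample(lines, 3)))     # ['line0', 'line3', 'line6']
print(list(downsample(lines, 3, 1)))  # ['line1', 'line4', 'line7']
```

With rate 3, shards 0, 1 and 2 each pick a different line from every group of three, so no line is selected by more than one shard.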

 

Downsampling is applied as the final filter, after all other filters (--interleave, --grep, --regions, --no-header, --luft etc.) have been applied.
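
The order matters, because positions are counted over the lines that survive the earlier filters rather than over the lines of the original file. The Python sketch below shows the difference on toy data. The helper `filter_then_downsample` and the sample lines are hypothetical, and a simple prefix test stands in for a real filter such as --grep or --regions.

```python
def filter_then_downsample(lines, keep, rate, shard=0):
    """Apply the predicate `keep` first, then keep one of every `rate` survivors.

    (Hypothetical helper: a stand-in for "other filters, then downsample".)
    """
    survivors = [line for line in lines if keep(line)]
    return survivors[shard::rate]

lines = ["chr1 a", "chr2 b", "chr1 c", "chr1 d", "chr2 e", "chr1 f"]
is_chr1 = lambda line: line.startswith("chr1")

# Filter first, downsample last (the order described above).
print(filter_then_downsample(lines, is_chr1, 2))   # ['chr1 a', 'chr1 d']

# For contrast only: downsampling first would select different lines.
print([line for line in lines[0::2] if is_chr1(line)])  # ['chr1 a', 'chr1 c']
```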

​

Example

​

Getting the middle read of every 3 consecutive FASTQ reads (i.e. read 1 of every {0,1,2}):

​

$ genocat my-file.fq.genozip

@A00910:85:HYGWJDSXX:1:1101:3025:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

 

NTTGGGGGTGGGGATCCCTATCTTAGCTGTTGCAATCCCTGGGCTGCTTCAGTGTTAATAACATTCCAAA

+

#FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:8160:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NATTATGAGAGAGTGCTTTTTACAATGTTAATGACATGTTATAATAAAGTAATCTTACAATAAACAAGAA

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:9028:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NCTACAATGTGTGACAACAATAATGTAAAAGGTAGATGAAATTAAAGTACCTAGCAATATTAGGAAATTG

+

#FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFF:F:,FF

@A00910:85:HYGWJDSXX:1:1101:15067:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NTGTAGCATGCTCTTTGGTGCAAATTGACGAGCAGATTCTAAAAGTCACAGAGAAATGCAAAAGACCCTG

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:

@A00910:85:HYGWJDSXX:1:1101:16007:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NTTCAGAGGCTTCCGGCTAAATAGTAATACAAGTAGCACAAACAACAGAGTGAGAATGTTTATCACACTC

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:16984:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NTTCTATTTTGCCCCTGAGGGTGCATCCCGAAGAGGGAAGCTATTGATTTTTAACACTAGACACATAAAC

+

#:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:20636:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NTATATACCTATTTTCATATTTTTGTCAGTGTTGGTCAGATTTTTAGAAGTGAGATTTGCTAGCAAAAAT

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:21811:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NCTTTCAAGAGCAGCCCCAGCTCCTTAAGCTGCTGGTCCTGGTGCATCTGCTGACTTTCATGTAGAAGAT

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:1714:1016 1:N:0:CAACGAGAGC+GAATTGAGTG

NATATTGGTCTTATGATCATAAATTTTCTCAGCATTTATATTCTGAAGAATATATATTTCCTGTTTATTT

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFF

 

$ genocat my-file.fq.genozip --downsample 3,1

 

@A00910:85:HYGWJDSXX:1:1101:8160:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NATTATGAGAGAGTGCTTTTTACAATGTTAATGACATGTTATAATAAAGTAATCTTACAATAAACAAGAA

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:16007:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NTTCAGAGGCTTCCGGCTAAATAGTAATACAAGTAGCACAAACAACAGAGTGAGAATGTTTATCACACTC

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF

@A00910:85:HYGWJDSXX:1:1101:21811:1000 1:N:0:CAACGAGAGC+GAATTGAGTG

NCTTTCAAGAGCAGCCCCAGCTCCTTAAGCTGCTGGTCCTGGTGCATCTGCTGACTTTCATGTAGAAGAT

+

#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
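
The same rate with different shard values splits a file into non-overlapping parts. The Python sketch below mimics this on toy FASTQ text so the effect can be checked by hand. It assumes every read occupies exactly four lines, and the helpers `fastq_reads` and `shards` are hypothetical names for illustration, not part of Genozip.

```python
def fastq_reads(lines):
    """Group FASTQ lines into 4-line reads: header, sequence, '+', qualities.

    Assumes each read occupies exactly four non-blank lines.
    """
    lines = [line for line in lines if line.strip()]
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]

def shards(reads, rate):
    """Return `rate` lists; list s holds reads s, s + rate, s + 2*rate, ..."""
    return [reads[s::rate] for s in range(rate)]

# Nine toy reads named r0..r8, mirroring the nine reads in the example above.
toy = []
for i in range(9):
    toy += [f"@r{i}", "ACGT", "+", "FFFF"]

parts = shards(fastq_reads(toy), 3)
for s, part in enumerate(parts):
    print(s, [read[0] for read in part])
# 0 ['@r0', '@r3', '@r6']
# 1 ['@r1', '@r4', '@r7']
# 2 ['@r2', '@r5', '@r8']
```

Shard 1 holds the second, fifth and eighth reads, which matches the three reads printed by --downsample 3,1 above, and the three shards together contain every read exactly once.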

​

Questions? support@genozip.com
