Using Genozip on an HPC
How to register Genozip for a batch job
​
Genozip requires registration prior to use, and it is a violation of Genozip’s license to use Genozip without registration.
Option 1 (easier): register on the login node of the HPC with genozip --register. You need to do this only once.
​
Option 2: If you are not able to register on the HPC’s login node or your batch script does not have access to your home directory, do the following:
​
1. Register Genozip on another computer. You can skip this step if you have already used (and hence registered) Genozip on that computer:
​
genozip --register
​
➔ The license file is located at ~/.genozip_license.v15 on Linux and Mac and %APPDATA%\genozip\.genozip_license.v15 on Windows. Note that the file name begins with a ".".
​
2. Copy the license file to the target computer (any directory, any filename).
​
3. Use the --licfile option to point genozip to the license file, for example:
​
genozip --licfile mydir/.genozip_license.v15 mydata.bam
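Inside a batch script, Option 2 could be wrapped roughly as follows. This is a minimal sketch, not part of Genozip: the function name compress_with_license and the GENOZIP_BIN variable are hypothetical names chosen for this example, and only the --licfile option comes from the steps above. The function checks that the copied license file is readable before calling genozip, so the job fails early with a clear message.

```shell
#!/bin/sh
# Sketch: run genozip in a batch job using a license file copied from another computer.
# compress_with_license and GENOZIP_BIN are hypothetical names used only in this example.

compress_with_license() {
    licfile=$1
    shift
    # Fail early if the license file was not copied to where the job expects it.
    if [ ! -r "$licfile" ]; then
        echo "license file not readable: $licfile" >&2
        return 1
    fi
    # --licfile points genozip to the license file; the remaining arguments are passed through.
    "${GENOZIP_BIN:-genozip}" --licfile "$licfile" "$@"
}
```

A job script would then call, for example, compress_with_license mydir/.genozip_license.v15 mydata.bam after any scheduler directives it needs.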
​​
​
Running Genozip in a Docker container: reference file caching
​
When compressing (or uncompressing) a file, Genozip caches the reference file in memory. It does so by utilizing Sys-V IPC shared memory:
​
> ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x4a67d1d2 6 divon 666 56 0
0x32522cc4 9 divon 666 2797557200 0
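To see how much RAM such segments occupy, the ipcs output can be summed. The helper below is a small sketch, assuming the Linux column layout shown above (size in the fifth column, owner in the third). The name shm_total_bytes is made up for this example, and the total covers every Sys-V segment listed, not only those created by Genozip.

```shell
# Sum the sizes of the Sys-V shared memory segments listed by `ipcs -m` (Linux layout).
# Reads ipcs output on stdin; an optional first argument restricts the sum to one owner.
shm_total_bytes() {
    awk -v owner="${1:-}" '
        # Segment lines start with a hexadecimal key; header lines are skipped.
        $1 ~ /^0x/ && (owner == "" || $3 == owner) { total += $5 }
        END { printf "%.0f\n", total }
    '
}
```

For example: ipcs -m | shm_total_bytes "$USER"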
​
To avoid loading the reference file from disk on each execution, and to avoid multiple copies of the reference consuming RAM when several instances of Genozip run in parallel, it is advisable to share the shared memory between Docker containers. Docker supports this through the --ipc option of docker run.
One strategy is to have one Docker container hold the reference and have the other containers use it. To load reference data without performing a compression, one could use something like:
​
genocat --reference hs37d5.ref.genozip --regions none
​
It is possible to repeat this command with another reference file in order to cache more than one reference in memory.
​
Note that Genozip identifies reference files by their full path, so for caching to work, the same path must be used in all containers.
​​
To remove a cached reference from RAM use genozip --no-cache. This will mark the reference for removal from RAM, and it will actually be removed when the last process using it terminates.
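The holder-and-workers strategy described in this section might be set up along the following lines. This is a sketch under stated assumptions rather than a tested recipe: the image name my-genozip-image, the container name genozip-ref, the /refs directory and the function names are all hypothetical. The Docker options --ipc=shareable and --ipc=container:NAME are standard docker run options. The reference directory is mounted at the same path in every container, because Genozip identifies reference files by their full path.

```shell
# Assumed names (not from Genozip): image, holder container name, reference directory.
IMAGE="${IMAGE:-my-genozip-image}"
REF_CONTAINER="${REF_CONTAINER:-genozip-ref}"
REF_DIR="${REF_DIR:-/refs}"
DOCKER="${DOCKER:-docker}"

start_ref_holder() {
    # --ipc=shareable allows other containers to join this container's IPC namespace.
    "$DOCKER" run -d --name "$REF_CONTAINER" --ipc=shareable \
        -v "$REF_DIR:$REF_DIR" "$IMAGE" sleep infinity
}

load_reference() {
    # Load a reference into shared memory inside the holder, without compressing anything.
    "$DOCKER" exec "$REF_CONTAINER" genocat --reference "$1" --regions none
}

run_worker() {
    # --ipc=container:NAME joins the holder's IPC namespace, so its shared memory is visible.
    # The reference directory is mounted at the same path as in the holder.
    "$DOCKER" run --rm --ipc="container:$REF_CONTAINER" \
        -v "$REF_DIR:$REF_DIR" -v "$PWD:$PWD" -w "$PWD" "$IMAGE" "$@"
}
```

A possible sequence is start_ref_holder, then load_reference /refs/hs37d5.ref.genozip, then run_worker genozip --reference /refs/hs37d5.ref.genozip mydata.bam for each job.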
​
Building an EasyBuild module
​​
Please find sample scripts here.
​
Questions? email support@genozip.com for help.