diff options
Diffstat (limited to 'system/splitjob/README')
-rw-r--r-- | system/splitjob/README | 28 |
1 files changed, 28 insertions, 0 deletions
diff --git a/system/splitjob/README b/system/splitjob/README new file mode 100644 index 0000000000..985797496b --- /dev/null +++ b/system/splitjob/README @@ -0,0 +1,28 @@ +This program is used to split up data from stdin in blocks which are +sent as input to parallel invocations of commands. The output from +those are then concatenated in the right order and sent to stdout. + +Splitting up and parallelizing jobs like this might be useful to speed +up compression using multiple CPU cores or even multiple computers. + +For this approach to be useful, the compressed format needs to allow +multiple compressed files to be concatenated. This is the case for +gzip, bzip2, lzip and xz. + +Example 1, use multiple logical cores: +splitjob -j 4 bzip2 < bigfile > bigfile.bz2 + +Example 2, use remote machines: +splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz + +The above example assumes that ssh is configured to allow logins +without asking for password. See the manpage for ssh-keygen or do +a google search for examples on how to accomplish this. + +Example 3, Use bigger blocks to reduce overhead: +splitjob -j 2 -b 10M gzip < file > file.gz + +For "xz -9" a block size of 384 MB gives best compression. + +Example 4, parallel decompression: +splitjob -X -r 10 -j 10 -b 384M "xz -d -" < file.xz > file |