Diffstat (limited to 'system/splitjob/README')
-rw-r--r-- system/splitjob/README 28
1 files changed, 28 insertions, 0 deletions
diff --git a/system/splitjob/README b/system/splitjob/README
new file mode 100644
index 0000000000..985797496b
--- /dev/null
+++ b/system/splitjob/README
@@ -0,0 +1,28 @@
+This program splits data from stdin into blocks, which are sent as
+input to parallel invocations of a command. The outputs of those
+invocations are then concatenated in the right order and sent to stdout.
+
+Splitting up and parallelizing jobs like this might be useful to speed
+up compression using multiple CPU cores or even multiple computers.
+
+For this approach to be useful, the compressed format needs to allow
+multiple compressed files to be concatenated. This is the case for
+gzip, bzip2, lzip and xz.
+
+Example 1, use multiple logical cores:
+splitjob -j 4 bzip2 < bigfile > bigfile.bz2
+
+Example 2, use remote machines:
+splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz
+
+The above example assumes that ssh is configured to allow logins
+without asking for a password. See the manpage for ssh-keygen or
+search the web for examples of how to set this up.
+
+Example 3, use bigger blocks to reduce overhead:
+splitjob -j 2 -b 10M gzip < file > file.gz
+
+For "xz -9", a block size of 384 MB gives the best compression.
+
+Example 4, parallel decompression:
+splitjob -X -r 10 -j 10 -b 384M "xz -d -" < file.xz > file
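
Note on the concatenation requirement stated in the README: the sketch below illustrates the property with plain gzip, compressing blocks independently and joining them afterwards. It is a sequential stand-in, not splitjob itself and not a description of how splitjob is implemented. The sample input, the 32 KiB block size and the temporary file names are assumptions chosen for illustration.

```shell
# Illustration only: a sequential stand-in for the format property that
# block-wise compression relies on. It does not run or reproduce splitjob.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sample input: 20000 numbered lines.
seq 1 20000 > "$workdir/input"

# Cut the input into fixed-size blocks (32 KiB, chosen arbitrarily).
split -b 32768 "$workdir/input" "$workdir/block."

# Compress each block on its own, producing independent gzip members.
for block in "$workdir"/block.*; do
    gzip "$block"
done

# Concatenate the members in order; the shell glob sorts the names.
cat "$workdir"/block.*.gz > "$workdir/joined.gz"

# gzip -d decodes all members of a concatenated file back to back.
gzip -dc "$workdir/joined.gz" > "$workdir/output"

# Compare the round-tripped data with the original input.
if cmp -s "$workdir/input" "$workdir/output"; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "roundtrip: $roundtrip"
```

The same check can be repeated with bzip2, lzip or xz by swapping the compressor and the file suffix. A format that rejects concatenated streams would fail the comparison and would not be suitable for this approach.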