Recently I had an Apache access log file on a remote server that I wanted to archive. However, it was over 3GB, and /usr/bin/zip refused to even admit the behemoth’s existence.

The first idea that came to mind was splitting the file into smaller chunks that zip could deal with. For some reason, the prospect of an arduous manual process that would take me through Flag Day didn’t appeal, so I poked around via apropos to see what was available:

$ apropos split

Lo and behold, at the end of a bunch of other stuff,

split(1) - split a file into pieces

(The server was running OS X 10.3, which as far as I can tell does not include the more direct zipsplit utility found on 10.4. Same basic idea, though.)

I copied the behemoth to a secondary drive (took a while) and then navigated to its directory.

$ ls -l

which let me know:

-rw------- 1 robert staff 4239286441 10 Jun 04:55 behemoth_log

That’s a lot of bytes. Since I want no single piece to be bigger than a svelte 500MB, I’ll need to use this:

$ split -b 500m behemoth_log
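
For the curious, here is a rough sketch of the arithmetic behind that command. It assumes the m suffix means 1024 * 1024 bytes (which is how split's man page describes it) and a shell with 64-bit arithmetic; the variable names are only for illustration.

```shell
# Size in bytes, taken from the ls -l output above.
total_bytes=4239286441

# -b 500m: 500 pieces of 1024 * 1024 bytes each.
chunk_bytes=$((500 * 1024 * 1024))

# Ceiling division gives the number of pieces split will write.
pieces=$(( (total_bytes + chunk_bytes - 1) / chunk_bytes ))

# Whatever is left over after the full-size pieces lands in the last one.
last_bytes=$(( total_bytes - (pieces - 1) * chunk_bytes ))

echo "pieces: $pieces"
echo "last piece: $last_bytes bytes"
```

That works out to nine pieces, with the last one holding a little under 43MB.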

Which, after a long period of splitting, produces these:

$ ls -lh

-rw-------   1 robert  staff          3G 10 Jun 04:55 behemoth_log
-rw-------   1 robert  staff        500M 10 Jun 05:18 xaa
-rw-------   1 robert  staff        500M 10 Jun 05:19 xab
-rw-------   1 robert  staff        500M 10 Jun 05:20 xac
-rw-------   1 robert  staff        500M 10 Jun 05:20 xad
-rw-------   1 robert  staff        500M 10 Jun 05:21 xae
-rw-------   1 robert  staff        500M 10 Jun 05:22 xaf
-rw-------   1 robert  staff        500M 10 Jun 05:22 xag
-rw-------   1 robert  staff        500M 10 Jun 05:23 xah
-rw-------   1 robert  staff         42M 10 Jun 05:23 xai
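
Sooner or later the pieces have to go back together. I did not cover that step above, so treat the following as a minimal sketch rather than what I ran on the server: it uses a tiny stand-in file (demo_log, a made-up name) in a scratch directory, glues the pieces back with cat, and checks the result with cmp.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for behemoth_log: 2500 bytes of zeros.
head -c 2500 /dev/zero > demo_log

# Split into 1000-byte pieces: xaa, xab, xac.
split -b 1000 demo_log

# The shell expands xa* in sorted order, so cat restores the original order.
cat xa* > demo_log.rejoined

# cmp -s exits 0 only if the two files are byte-for-byte identical.
if cmp -s demo_log demo_log.rejoined; then
    rejoin_status="identical"
else
    rejoin_status="different"
fi
echo "$rejoin_status"
```

The same cat xa* pattern should apply to the nine real pieces, since the default suffixes (aa, ab, ac, ...) sort in the order they were written.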

Alternatively, I could have split it by kilobytes (a k suffix on -b instead of m), or by number of lines using the -l line_count flag. You can also customize the output file names; read up on man split for more info. By the way, my first guess was that you would want to limit your splitting to text files, but -b cuts at byte offsets without looking at the content, so it works on binaries too. It is -l that only makes sense for text. See the comments regarding using split on binaries as well.
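
To make the -l flag and the custom file names a bit more concrete, here is a small sketch on a ten-line stand-in file. The names demo_log and part_ are made up for the example; the prefix is simply the optional last argument to split.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Ten one-line entries standing in for log lines.
i=1
while [ "$i" -le 10 ]; do
    echo "request $i"
    i=$((i + 1))
done > demo_log

# -l 4 puts at most four lines in each piece; the trailing argument
# replaces the default "x" prefix, giving part_aa, part_ab, part_ac.
split -l 4 demo_log part_

piece_count=$(ls part_* | wc -l | tr -d ' ')
last_lines=$(wc -l < part_ac | tr -d ' ')
echo "$piece_count pieces, last has $last_lines lines"
```

Splitting by lines keeps every log entry whole, which is handy if you want to grep the pieces individually, at the cost of pieces that vary in byte size.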