Top Ten Data Crunching Tips and Tricks
Subject:   "read the input into memory" considered harmful
Date:   2005-06-14 08:28:21
From:   johnsaalwaechter
I don't agree with point #2, especially the advice to read the input into memory. I work in an environment with large data sets, and I've seen many instances where a script that reads the input into memory is developed on small test data, then unleashed on gigabytes of real data. Either the process exceeds its 4GB address space (most perl executables are still 32-bit), or the server itself runs out of memory.

Strategies other than "suck it all into memory" are definitely required in many environments.
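
As a rough illustration of one such strategy, here is a minimal Python sketch that processes input one line at a time, so memory use stays flat however large the file grows. The function name `column_total`, the tab-separated layout, and the sample data are all assumptions made for the example; nothing here comes from the article itself.

```python
import io

def column_total(lines, column=1, sep="\t"):
    """Sum one numeric column while holding only the current line in memory."""
    total = 0.0
    for line in lines:  # a file object yields one line per iteration
        fields = line.rstrip("\n").split(sep)
        total += float(fields[column])
    return total

# A small in-memory stand-in for a large file (hypothetical data).
sample = io.StringIO("a\t1.5\nb\t2.5\nc\t4.0\n")
print(column_total(sample))  # prints 8.0
```

With a real file, passing the object returned by `open(path)` instead of the `StringIO` stand-in should behave the same way, since the loop never asks for more than one line at once.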

  • "read the input into memory" considered harmful
    2005-06-22 05:49:40  GregWilson

    I agree: out-of-core algorithms that don't pull everything into memory at once are absolutely necessary in a lot of cases. However, the focus of the book was on automating odds-and-ends tasks, like pulling sales ranking data off Amazon and finding peaks and valleys. (Gosh, why would I be doing that...? ;-) If you can process your data record by record, that's great; if you can't, and your data won't fit in core, then what you have is a real programming task, rather than a one-off throwaway script.
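
    For what it's worth, a task like finding peaks and valleys can often be done record by record. The Python sketch below is only an illustration under stated assumptions: the name `peaks_and_valleys` is made up, a "peak" is taken to mean a value strictly greater than both neighbours (and a "valley" strictly less), plateaus are ignored, and the numbers are invented rather than real ranking data.

```python
def peaks_and_valleys(values):
    """Yield (index, value, kind) for strict local extrema.

    Only the two most recent values are kept, so `values` can be any
    iterable, including a generator reading records from a large file.
    """
    prev = curr = None
    for i, nxt in enumerate(values):
        if prev is not None and curr is not None:
            if prev < curr > nxt:
                yield (i - 1, curr, "peak")
            elif prev > curr < nxt:
                yield (i - 1, curr, "valley")
        prev, curr = curr, nxt

# Hypothetical sequence of rankings, for illustration only.
for hit in peaks_and_valleys([3, 5, 2, 4, 4, 1]):
    print(hit)
# prints (1, 5, 'peak') then (2, 2, 'valley')
```

    The first and last values are never reported, because each has only one neighbour to compare against.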