Women in Technology

Hear us Roar

  Top Ten Data Crunching Tips and Tricks
Subject:   Data crunching? Crunching? Naw...
Date:   2005-06-16 06:02:14
From:   ScarletKnight
... I'm a former mainframe programmer. You don't crunch anything close to data here. Those tips are definitely less than useful. Simple SQL tips for data crunching? Give me a break. I thought I was going to read about how to optimize your transforms for speed without losing maintainability, or even some references to offload techniques to stay out of the database when it doesn't make sense, i.e. knowing the limits of set-oriented transforms versus procedural ones.

Thumbs down.
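
The set-oriented versus procedural distinction raised above can be sketched with a small, hypothetical example (using Python's built-in sqlite3 module; the table and figures are made up and none of this comes from the book). The set-oriented version does the transform in one SQL statement inside the database; the procedural version pulls rows out and transforms them one at a time in application code:

```python
import sqlite3

# In-memory database with a small sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0)])

# Set-oriented: one statement transforms every row inside the database.
conn.execute("UPDATE orders SET amount = amount * 1.1")

# Procedural: fetch rows into the application, transform each one in a
# loop, and write it back -- one round trip per row.
rows = conn.execute("SELECT id, amount FROM orders").fetchall()
for row_id, amount in rows:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?",
                 (round(amount, 2), row_id))

total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # prints 66.0
```

For a simple transform like this, the set-oriented form is both shorter and faster; the procedural form only earns its keep when the per-row logic can't be expressed in SQL.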

Main Topics Oldest First

Showing messages 1 through 2 of 2.

  • Data crunching? Crunching? Naw...
    2005-06-22 05:46:02  GregWilson [View]

    I'm sorry you think I'm not crunching "anything close to data" here. I routinely use these techniques to do one-time reformatting of gigabyte (and larger) scientific data sets. That may no longer be as impressive as it once was (I can still remember the first time I saw a file that was more than a megabyte long ;-), but as I say a couple of times in the book, my focus is on the things you can do in a few minutes to handle everyday tasks, not on what you'd do to run Google or American Express.
  • Data crunching? Crunching? Naw...
    2005-06-19 23:33:36  alcabon [View]

    I don't agree with you at all. This article is a clear synthesis of the common techniques of data crunching. I worked for a long time as a mainframe programmer (IBM MVS). Mainframe means a large company, and often high license costs for tools like SAS (150,000 euros/year, for example). With SAS you can also do data crunching (SQL + the SAS language), but you cannot write a book for a general audience saying: "you must use SAS because it is a good tool for data crunching." On Unix, a large company can likewise pay for an Informatica license (I also used it for months, especially for EDI), but the cost is prohibitive for small companies. I now work in a simple Unix/Windows intranet environment (without SAS, Syncsort, or Informatica), and Greg Wilson thinks straight.

    I started computing with a ZX81 (16 KB, 3.5 MHz); now I have a Pentium 4 with 1024 MB of RAM and a 120 GB disk. Using memory is completely different under those conditions. As for offload techniques, that is a DBA problem tied to your RDBMS. With Cloudscape (free, with a 2 MB footprint), I succeeded in importing CSV files and exporting millions of rows with format modifications (using SQL queries and functions) in a few minutes on my ordinary PC. It is very easy to do. The most important thing in this kind of data crunching with an RDBMS is to write straight SQL code.
    Best regards.
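
The import/transform/export workflow described above can be sketched as follows. This is a hypothetical example, not from the thread or the book: it uses Python's built-in sqlite3 as a stand-in for Cloudscape, an in-memory string in place of real CSV files, and a made-up date-reformatting transform done entirely in SQL:

```python
import csv
import io
import sqlite3

# A tiny CSV standing in for an input file (dates in DD/MM/YYYY).
raw = "id,date,amount\n1,16/06/2005,10.50\n2,19/06/2005,20.25\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, date TEXT, amount REAL)")

# Import: load each CSV row into the table.
reader = csv.DictReader(io.StringIO(raw))
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(r["id"], r["date"], r["amount"]) for r in reader])

# Transform in SQL: rewrite DD/MM/YYYY dates as ISO YYYY-MM-DD
# using plain string functions, no application-side loop.
query = """
SELECT id,
       substr(date, 7, 4) || '-' ||
       substr(date, 4, 2) || '-' ||
       substr(date, 1, 2) AS iso_date,
       amount
FROM sales
ORDER BY id
"""

# Export: write the reformatted rows back out as CSV.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["id", "iso_date", "amount"])
writer.writerows(conn.execute(query))
print(out.getvalue())
```

The same pattern scales to real files: point the reader at the input CSV, the writer at the output file, and let the database do the set-oriented work in between.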