Article: Confessions of the World's Largest Switcher
Subject: an important piece is missing.....
Date: 2003-10-29 15:20:11
From: anonymous2
Response to: an important piece is missing.....
Part of the advantage of clusters is that you don't NEED to keep every node running. If 1 of your 1100 nodes breaks, you have a 1099-node cluster. You simply take the broken node out and either repair or replace it. There's no real 'system' to its reliability, other than redundancy.
-
an important piece is missing.....
2003-10-29 20:29:10 anonymous2
That's not what anonymous 1 meant. Statistically, there is a chance that a bit in RAM will flip at random (caused by a stray cosmic ray or whatever). ECC RAM has an extra chip on the memory module that stores check bits, so that such errors can be detected and, in the single-bit case, corrected. The longer the calculation and the more computers involved, the more likely it is that a RAM error will occur somewhere. It would be interesting to know how they resolve the problem.




Especially if you then have to piece together a dataset with incomplete data.
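
To put rough numbers on the "longer calculation, more computers" point, here is a minimal sketch. It assumes bit flips are independent and arrive at a constant rate per node (a Poisson model), and the rate used is a made-up placeholder, not a measured figure for any real memory. The function name and parameters are hypothetical, chosen only for illustration.

```python
import math


def p_at_least_one_flip(nodes, hours, flips_per_node_hour):
    """Chance of one or more bit flips anywhere in the cluster during a run.

    Assumes independent flips at a constant rate per node (Poisson model).
    """
    expected_flips = nodes * hours * flips_per_node_hour
    # 1 - exp(-x), computed with expm1 so tiny probabilities stay accurate
    return -math.expm1(-expected_flips)


# Hypothetical rate, for illustration only: NOT a measured value.
ASSUMED_RATE = 1e-4  # flips per node per hour

for nodes in (1, 100, 1100):
    for hours in (1, 24):
        p = p_at_least_one_flip(nodes, hours, ASSUMED_RATE)
        print(f"{nodes:5d} nodes, {hours:3d} h: P(at least one flip) = {p:.4f}")
```

Whatever the real rate is, the shape is the same: the expected number of flips grows with nodes times hours, so a job that is safe enough on one machine can become likely to hit an error when spread over 1100 of them.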