ECC RAM Explained

From Free Knowledge Base- The DUCK Project: information for everyone
Revision as of 11:12, 2 March 2014 by Admin (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Parity checking provides single-bit error detection for the system memory, but does not handle multi-bit errors, and provides no way to correct memory errors. An advanced error detection and correction protocol was invented to go a step beyond simple parity checking. Called ECC, which stands for error correcting circuits, error correcting code, or error correction code, this protocol not only detects both single-bit and multi-bit errors, it will actually correct single-bit errors on the fly, transparently. Like parity checking, ECC requires a setting in the BIOS program to be enabled. Often there are two; one turns on parity checking and the other tells the system to use ECC mode.

Warning: Don't assume that ordering a PC with ECC memory means that it will be properly configured for you in the BIOS setup. Very often it isn't; a lot of PC vendors don't even understand what ECC is, never mind how to enable it.

ECC uses a special algorithm to encode information in a block of bits that contains sufficient detail to permit the recovery of a single bit error in the protected data. Unlike parity, which uses a single bit to provide protection to eight bits, ECC uses larger groupings: 7 bits to protect 32 bits, or 8 bits to protect 64 bits. There are special ECC memory modules designed specifically for use in ECC mode, but most modern motherboards that support ECC will in fact work in that mode using standard parity memory modules as well. Since parity memory includes one extra bit for every eight bits of data, this means 64 bits worth of parity memory is 72 bits wide, which means there is enough to do ECC. In fact, parity SIMMs are 36 bits wide (two are used in a fifth or sixth generation system) and parity DIMMs are 72 bits.

Note: ECC requires special chipset support. When supported and enabled, ECC will function using ordinary parity memory modules; this is the standard way that most motherboards that support ECC operate. The chipset "groups" together the parity bits into the 7-bit block needed for ECC. Many of these motherboards also support the special ECC-only modules, but some boards support parity modules in ECC mode but not the ECC-only modules. See this section for more on the difference between the two types.

ECC has the ability to correct a detected single-bit error in a 64-bit block of memory. When this happens, the computer will continue without a hiccup; it will have no idea that anything even happened. However, if you have a corrected error, it is useful to know this; a pattern of errors can indicate a hardware problem that needs to be addressed. Chipsets allowing ECC normally include a way to report corrected errors to the operating system, but it is up to the operating system to support this. Windows NT and Linux do detect these messages, but Windows 95 does not. In the latter case, you will not know when ECC has corrected a single-bit error. The user must decide if this is a concern or not; setting the system for simple parity checking will cause notification when an error occurs, but on-the-fly correction will be lost.

ECC will detect (but not correct) errors of 2, 3 or even 4 bits, in addition to detecting (and correcting) single-bit errors. ECC memory handles these multi-bit errors similarly to how parity handles single-bit errors: a non-maskable interrupt (NMI) that instructs the system to shut down to avoid data corruption. Multi-bit errors are extremely rare in memory.

Unlike parity checking, ECC will cause a slight slowdown in system operation. The reason is that the ECC algorithm is more complicated, and a bit of time must be allowed for ECC to correct any detected errors. The penalty is usually one extra wait state per memory read. This translates in most cases to a real world decrease in performance of approximately 2-3%.