Possible hardware issue with Tyan S3970 motherboard

Dear Lazyweb,

You will probably want to skip this if you have no knowledge of server motherboards and RAM and/or no interest in helping me.

I have a new server based on a Tyan S3970 motherboard with four DIMMs. It was assembled by the supplier and subjected to a burn-in test. It seems, however, that they did not look at the BIOS event log before shipping it: when I got it, the log was full of single-bit memory error messages with timestamps stretching back through the previous week. That is plausible. ECC RAM corrects single-bit errors, so the burn-in test itself would not have failed.

So, assuming a single bad DIMM, I turned off ECC in the BIOS (so that errors would be reported rather than silently corrected) and broke out memtest86+.

After approximately 90 minutes, memtest86+ reported errors spread across the whole memory range, implicating all four DIMMs. Here’s an example:

 WallTime   Cached  RsvdMem   MemMap   Cache  ECC  Test  Pass  Errors ECC Errs
 ---------  ------  -------  --------  -----  ---  ----  ----  ------ --------
   1:59:27   8192M     160K  e820-Std    on   off   Std     0      80        0
 -----------------------------------------------------------------------------
Tst  Pass   Failing Address          Good       Bad     Err-Bits  Count Chan
---  ----  -----------------------  --------  --------  --------  ----- ----
  7     0  000974b216c -  2420.1MB  9c9a2a71  9c9a0a71  00002000      1
  7     0  00126a320cc -  4714.1MB  819293fd  8192b3fd  00002000      1
  7     0  00114c0012c -  4428.0MB  8e8b2ec2  8e8b0ec2  00002000      1
  7     0  001115920ec -  4373.1MB  652557a0  652577a0  00002000      1
  7     0  00165b030cc -  5723.1MB  86cb57f6  86cb77f6  00002000      1
  7     0  0016069710c -  5638.4MB  b59513f4  b59533f4  00002000      1
  7     0  0014969e0ec -  5270.8MB  15be53f9  15be73f9  00002000      1
  7     0  001094370cc -  4244.4MB  2b779fdd  2b77bfdd  00002000      1
  7     0  00139f8d0ec -  5023.8MB  1c54d9dd  1c54f9dd  00002000      1
  7     0  001568ad0cc -  5480.8MB  318657e8  318677e8  00002000      1
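As a sanity check on the “single-bit” reading, the Err-Bits column appears to be simply the XOR of the Good and Bad patterns. The short Python sketch below recomputes that XOR for the ten rows above and counts which bit positions flip. The helper names are my own and have nothing to do with memtest86+ itself.

```python
# Good/Bad 32-bit patterns transcribed from the memtest86+ screen above.
SAMPLES = [
    (0x9C9A2A71, 0x9C9A0A71),
    (0x819293FD, 0x8192B3FD),
    (0x8E8B2EC2, 0x8E8B0EC2),
    (0x652557A0, 0x652577A0),
    (0x86CB57F6, 0x86CB77F6),
    (0xB59513F4, 0xB59533F4),
    (0x15BE53F9, 0x15BE73F9),
    (0x2B779FDD, 0x2B77BFDD),
    (0x1C54D9DD, 0x1C54F9DD),
    (0x318657E8, 0x318677E8),
]


def flipped_bits(good: int, bad: int) -> list[int]:
    """Bit positions (0 = least significant) where good and bad differ."""
    diff = good ^ bad
    return [i for i in range(32) if (diff >> i) & 1]


def summarize(samples) -> dict[int, int]:
    """Map each flipped bit position to the number of samples it appears in."""
    counts: dict[int, int] = {}
    for good, bad in samples:
        for bit in flipped_bits(good, bad):
            counts[bit] = counts.get(bit, 0) + 1
    return counts


if __name__ == "__main__":
    for good, bad in SAMPLES:
        print(f"{good:08x} -> {bad:08x}: xor {good ^ bad:08x}, bits {flipped_bits(good, bad)}")
    print(summarize(SAMPLES))  # {13: 10}
```

Every row XORs to 0x00002000, i.e. bit 13 and only bit 13, which matches the Err-Bits column.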

At this point I of course began to suspect the motherboard, but in the interest of thorough testing I decided to try just one pair of DIMMs. These tested for over 3 hours without a problem. I thought perhaps that this pair was good whereas the other pair might be bad, so I swapped them over. The other pair then tested for over 8 hours without error. So it’s definitely not the DIMMs.

I checked that the DIMMs are all identical (they are) and studied the motherboard manual closely:

http://www.tyan.com/manuals/m_s3970_110.pdf

The memory section on page 28 states:

For optimal dual-channel DDR operation, always install memory in pairs beginning with P1_DIMM7 and P1_DIMM8. Refer to the following table for supported DDRII populations.

The table then shows that you should install the DIMMs in pairs, starting with slots 7 and 8. So, 7→8, 5→8, 3→8 and 1→8 are the only supported configurations.
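The population rule is simple enough to state as code. This is a minimal Python sketch that assumes the manual’s table allows exactly the four layouts listed above. Slot numbers follow the manual’s P1_DIMM1 to P1_DIMM8 naming, and `is_supported` is a hypothetical helper, not anything Tyan provides.

```python
# Supported layouts as read from the S3970 manual's table (assumption: these
# four are the only ones). Pairs are added downwards starting from slots 7 and 8.
SUPPORTED = {frozenset(range(start, 9)) for start in (7, 5, 3, 1)}


def is_supported(slots) -> bool:
    """True if the set of populated slot numbers matches a supported layout."""
    return frozenset(slots) in SUPPORTED


if __name__ == "__main__":
    print(is_supported({1, 2, 3, 4}))  # False
    print(is_supported({5, 6, 7, 8}))  # True
```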

The server had been delivered with slots 1→4 populated, which is not one of the supported configurations. I have just moved the DIMMs to slots 5→8, and the test is now over 4 hours in without an error, which is longer than I ever managed with all four DIMMs installed. Assuming no more errors turn up, would you be satisfied that the unsupported slot population was the cause?

I am still a little worried that the motherboard is faulty in some way, because a consistent single-bit error (the same bit, 0x00002000, in every failing word shown above) seems a really odd symptom of running in an unsupported configuration. I would have expected it either to fail to detect the RAM or to work normally. Is this behaviour something you would expect?

I’ve opened a support request with Tyan to ask if this is normal behaviour, but I’ve no idea when or if they will respond.

2 thoughts on “Possible hardware issue with Tyan S3970 motherboard”

  1. Thanks Adrian.

    Perhaps I should ask the supplier to lend me another four 2GB DIMMs so I can test the server fully populated. If there are still no errors, I’ll know it was just down to misconfiguration; if there are errors, I’ll know it’s the motherboard or CPU. Either way, I would then return the extra DIMMs.

    Does that seem too cheeky? I can’t help thinking they screwed up by not checking the BIOS event log.

    Cheers,
    Andy
