I have 16 GB of RAM, but my Ubuntu system only uses a maximum of 2 GB. What should I do about this, if anything?
TL;DR: I thought I had data-at-rest corruption on 2 SSDs, but I now think the corruption happens after the data is read. How can I diagnose which part is failing?
My ML training algorithm opens thousands of files (read-only), and yesterday one of the files showed up corrupted. However, when I started exploring the differences between the 3 copies (one on each of the 2 SSDs and one on the HDD), things got stranger. All of the dates and sizes matched perfectly, but the md5sums differed for 10 files.
Stranger still, after I made sure all 3 copies were in sync (using rsync with --checksum), a different file on one SSD randomly showed corruption. I compared its md5sum, and it was the odd one out of the 3 copies. However, when I checked again 2 minutes later, the md5sum matched the other 2. This suggests that the data stored on the disk (data at rest) is not what is corrupted.
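One way to test the read path rather than the stored data is to hash the same file several times in a row and see whether the digest changes. The sketch below is a minimal, hypothetical helper (the function names and the script name repeat_md5.py are made up for illustration, not taken from the original). Before each pass it asks the kernel to drop the file from the page cache with os.posix_fadvise(..., POSIX_FADV_DONTNEED), which is only a best-effort hint on Linux, so that each pass should go back through the drive and controller instead of being served from RAM.

```python
import hashlib
import os
import sys


def md5_of_file(path, chunk_size=1 << 20, drop_cache=True):
    """Return the hex MD5 of `path`, optionally evicting it from the page cache first."""
    with open(path, "rb") as f:
        if drop_cache and hasattr(os, "posix_fadvise"):
            try:
                # Best-effort hint (Linux): drop cached pages for the whole file
                # (offset 0, length 0 = to end of file) so the read hits the drive.
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # some filesystems reject the hint; just read normally
        h = hashlib.md5()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_reads(path, passes=5, drop_cache=True):
    """Hash the same file `passes` times; more than one distinct digest means unstable reads."""
    return [md5_of_file(path, drop_cache=drop_cache) for _ in range(passes)]


if __name__ == "__main__" and len(sys.argv) > 1:
    digests = repeated_reads(sys.argv[1])
    distinct = set(digests)
    print("stable" if len(distinct) == 1 else f"UNSTABLE: {len(distinct)} different digests")
    for d in digests:
        print(d)
```

Usage would be something like python3 repeat_md5.py /path/to/file. A file whose digest changes between passes points at something between the platters/flash and the CPU (cable, controller, RAM) rather than at the stored bytes.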
How do I go about figuring out what is failing? I'm going to run a long memtest (one passed a year ago), but I'm unsure what else I can do.
Specs
- Dell T7500 (A18 BIOS, the latest from Dell)
- 2x Xeon X5675
- 64GB (4x16GB ECC)
- Drives:
  - Samsung 850 EVO 250GB (SSD FW: EMT03B6Q)
  - Samsung 860 EVO 500GB (SSD FW: RVT01B6Q)
  - WD Blue 4TB (HDD FW: 80.00A80)
- All 3 drives are connected to:
  - IO Crest 4-port SATA III PCIe 2.0 x2 Controller Card Green, SI-PEX40057 (chipset Marvell 88SE9230)
  - Used because the motherboard is SATA 2.0 and I needed higher throughput. This was the only SATA card I could boot from, given the limitations of the Dell BIOS.
Output of free -h
(the cache is full because I just ran a new batch of md5sums on all 3 drives)
total used free shared buff/cache available
Mem: 62G 1.2G 312M 11M 61G 61G
Swap: 2.0G 0B 2.0G
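free takes its numbers from /proc/meminfo. As a rough sketch (Python, Linux only; the helper names are made up for illustration, and Buffers + Cached + SReclaimable is my understanding of how recent procps-ng versions compute buff/cache), the same summary can be reproduced to show that almost all of the 62G is page cache, which the kernel reclaims on demand. That is why "available" is 61G even though "free" is only 312M.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            # HugePages_* lines are plain counts without a "kB" unit.
            info[key.strip()] = int(fields[0])
    return info


def summarize(info):
    """Approximate free's columns in GiB (assumes the kB fields listed below exist)."""
    kb_per_gib = 1024 * 1024
    cache_kb = info.get("Buffers", 0) + info.get("Cached", 0) + info.get("SReclaimable", 0)
    return {
        "total": info["MemTotal"] / kb_per_gib,
        "free": info["MemFree"] / kb_per_gib,
        "buff/cache": cache_kb / kb_per_gib,
        "available": info["MemAvailable"] / kb_per_gib,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        for name, value in summarize(parse_meminfo(f.read())).items():
            print(f"{name:>10}: {value:6.1f} GiB")
```

Reading the cache as "reclaimable" rather than "used" is the point here: a full buff/cache after hashing every file on 3 drives is expected behaviour, not a leak.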
Output of sudo lshw -C memory
(I can confirm the 4 sticks are in the correct slots according to the manual: MB DIMM 1 and 2, Riser DIMM 1 and 2.)
*-firmware
description: BIOS
vendor: Dell Inc.
physical id: 0
version: A18
date: 10/15/2018
size: 64KiB
capacity: 1984KiB
capabilities: pci pnp apm upgrade shadowing escd cdboot bootselect edd int13floppytoshiba int13floppy720 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification netboot
*-cache:0
description: L1 cache
physical id: 700
size: 384KiB
capacity: 384KiB
capabilities: internal write-back unified
configuration: level=1
*-cache:1
description: L2 cache
physical id: 701
size: 1536KiB
capacity: 1536KiB
capabilities: internal varies unified
configuration: level=2
*-cache:2
description: L3 cache
physical id: 704
size: 12MiB
capacity: 12MiB
capabilities: internal varies unified
configuration: level=3
*-cache:0
description: L1 cache
physical id: 702
size: 384KiB
capacity: 384KiB
capabilities: internal write-back unified
configuration: level=1
*-cache:1
description: L2 cache
physical id: 703
size: 1536KiB
capacity: 1536KiB
capabilities: internal varies unified
configuration: level=2
*-cache:2
description: L3 cache
physical id: 705
size: 12MiB
capacity: 12MiB
capabilities: internal varies unified
configuration: level=3
*-memory
description: System Memory
physical id: 1000
slot: System board or motherboard
size: 64GiB
capabilities: ecc
configuration: errordetection=multi-bit-ecc
*-bank:0
description: DIMM DDR3 1333 MHz (0.8 ns)
product: 9965516-433.A00LF
vendor: AMD
physical id: 0
serial: CF38EF94
slot: DIMM 1
size: 16GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:1
description: DIMM DDR3 1333 MHz (0.8 ns)
product: 9965434-110.A00LF
vendor: AMD
physical id: 1
serial: 2D25C605
slot: DIMM 2
size: 16GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:2
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: 2
serial: FFFFFFFF
slot: DIMM 3
width: 64 bits
*-bank:3
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: 3
serial: FFFFFFFF
slot: DIMM 4
width: 64 bits
*-bank:4
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: 4
serial: FFFFFFFF
slot: DIMM 5
width: 64 bits
*-bank:5
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: 5
serial: FFFFFFFF
slot: DIMM 6
width: 64 bits
*-bank:6
description: DIMM DDR3 1333 MHz (0.8 ns)
product: 9965434-110.A00LF
vendor: AMD
physical id: 6
serial: 2E25EB05
slot: RISER DIMM 1
size: 16GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:7
description: DIMM DDR3 1333 MHz (0.8 ns)
product: 9965434-110.A00LF
vendor: AMD
physical id: 7
serial: 2F25DC05
slot: RISER DIMM 2
size: 16GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:8
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: 8
serial: FFFFFFFF
slot: RISER DIMM 3
width: 64 bits
*-bank:9
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: 9
serial: FFFFFFFF
slot: RISER DIMM 4
width: 64 bits
*-bank:10
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: a
serial: FFFFFFFF
slot: RISER DIMM 5
width: 64 bits
*-bank:11
description: DIMM DDR3 Synchronous [empty]
vendor: FFFFFFFFFFFF
physical id: b
serial: FFFFFFFF
slot: RISER DIMM 6
width: 64 bits
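The lshw output above reports errordetection=multi-bit-ecc. With ECC memory, the Linux EDAC subsystem keeps corrected and uncorrected error counters under /sys/devices/system/edac/mc/, but only if an EDAC driver for this memory controller is loaded, which is not guaranteed on every platform. A short, hypothetical Python helper to read them:

```python
import glob
import os


def edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from Linux EDAC sysfs counters.

    An empty dict means no EDAC memory-controller driver exposed counters here.
    """
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        try:
            with open(os.path.join(mc, "ce_count")) as f:
                ce = int(f.read())  # corrected errors
            with open(os.path.join(mc, "ue_count")) as f:
                ue = int(f.read())  # uncorrected errors
        except (OSError, ValueError):
            continue  # skip controllers without readable counters
        counts[os.path.basename(mc)] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = edac_counts()
    if not counts:
        print("no EDAC counters found (driver not loaded?)")
    for mc, (ce, ue) in counts.items():
        print(f"{mc}: corrected={ce} uncorrected={ue}")
```

Non-zero counts would point at RAM. Zeros with a driver loaded make RAM a less likely culprit, but they say nothing about errors introduced before the data reaches memory, for example by the SATA controller.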
Update 1
Dell's built-in system diagnostics ran without issue (I stopped it from doing the memory tests, and did them with memtest86 instead).
Finished tests 1-8 of memtest86 v4 without issues.
I wrote a Python script that builds a dictionary of all the md5sums in a directory, and ran it against the 3 copies simultaneously (but with only 1 thread per drive*). It found 7 new discrepancies (out of 3000 files), split roughly evenly among the 3 drives, so it isn't just an issue with the SSDs. When I went back to check each of the 7 odd ones out, each md5sum now matched the other 2.
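The asker's script isn't shown, so the following is only a sketch of the same idea, with hypothetical function names: build a {relative path: md5} dictionary for each copy, one worker thread per copy (so each drive is read sequentially), then report which copy disagrees with the majority for each file. With 3 copies a 2-to-1 vote identifies the odd one out; if all 3 differ, the "majority" pick is arbitrary.

```python
import hashlib
import os
import sys
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def md5_file(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5_tree(root):
    """Map each file's path (relative to root) to its MD5, reading files one at a time."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = md5_file(full)
    return digests


def odd_ones_out(trees):
    """For files present in every copy, list the copies whose digest differs from the majority."""
    report = {}
    common = set.intersection(*(set(t) for t in trees.values()))
    for rel in sorted(common):
        votes = Counter(t[rel] for t in trees.values())
        majority, _count = votes.most_common(1)[0]
        odd = [root for root, t in trees.items() if t[rel] != majority]
        if odd:
            report[rel] = odd
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    roots = sys.argv[1:]  # the same dataset directory on each drive
    # One worker per copy, so each drive only has one file being read at a time.
    with ThreadPoolExecutor(max_workers=len(roots)) as pool:
        trees = dict(zip(roots, pool.map(md5_tree, roots)))
    for rel, odd in odd_ones_out(trees).items():
        print(f"{rel}: differs on {', '.join(odd)}")
```

Re-running it and seeing a different set of odd files each time is the signature of a transient read error rather than corrupted files.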
Current ideas:
- I thought that having 2 or 3 workers accessing files on each drive simultaneously might have been the issue, but a few tests have now shown that the errors still show up with sequential access.
- The SATA card is bad in some way. I'll reconnect all 3 drives to the motherboard and run the same test again.
Seems likely to be the SATA card
I have now run 3 passes on all 3 drives after connecting them directly to the motherboard, with 0 md5sum discrepancies. It looks like the SATA card is flaky and destined for the trash.
Is it possible to reboot directly into memtest86+ (without giving any input during boot), like the Windows command mdsched does?
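GRUB's grub-reboot command selects a menu entry for the next boot only, which is roughly the mdsched behaviour. It only takes effect when GRUB_DEFAULT=saved is set in /etc/default/grub (followed by sudo update-grub). The Python sketch below is an illustration under those assumptions: the helper names are hypothetical, it assumes the memtest entry is a top-level menuentry in /boot/grub/grub.cfg (as Ubuntu's 20_memtest86+ script generates), and it only prints the commands unless run as root with --now.

```python
import re
import subprocess
import sys

GRUB_CFG = "/boot/grub/grub.cfg"  # assumption: default Ubuntu location


def memtest_entries(grub_cfg_text):
    """Titles of top-level (non-indented) menu entries that mention memtest."""
    titles = re.findall(r"^menuentry\s+['\"]([^'\"]+)['\"]", grub_cfg_text, flags=re.MULTILINE)
    return [t for t in titles if "memtest" in t.lower()]


def reboot_into(title, dry_run=True):
    """Boot `title` once on the next reboot (requires GRUB_DEFAULT=saved and root)."""
    commands = [["grub-reboot", title], ["reboot"]]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


def main(argv):
    try:
        with open(GRUB_CFG) as f:
            entries = memtest_entries(f.read())
    except OSError as err:
        print(f"cannot read {GRUB_CFG}: {err}")
        return
    if not entries:
        print("no memtest menu entry found")
        return
    # Without --now this only prints the commands it would run.
    print(reboot_into(entries[0], dry_run="--now" not in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

The same effect from a shell is sudo grub-reboot followed by the entry title, then sudo reboot; after memtest finishes, the next boot returns to the saved default.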
I added RAM to my laptop running Ubuntu 18.04, but now it does not finish booting. I installed two 2 GB modules; before, it had two 1 GB modules. I checked Acer's specifications, and the modules I bought are compatible. The laptop is an Acer Extensa 5620Z (Intel Pentium 1.46 GHz, 533 MHz, 2 GB DDR2).
It does not report any error (otherwise it would beep), but the system never starts: it shows the Ubuntu logo as usual while loading, then stays on that screen (the word "ubuntu" with the five red dots underneath).
Is there any way to make it recognize the new memory, either from the BIOS or from Ubuntu's safe mode?
Thank you.
Edit:
I did more tests and the modules are compatible with the laptop.
To explain: I originally had two 1 GB modules. I installed the two new 2 GB modules, and it did not boot. Now, with one of the original 1 GB modules and one of the new 2 GB modules installed, it boots perfectly and recognizes 2.92 GB, which means the new module works. How can I get it to accept both 2 GB modules?
I'm sorry if this is poorly worded, but I ran a memory test on one of my computers, and certain memory addresses have errors. This is the first time I've looked into this. If you need more information, I can provide it.
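If the failing addresses are the same on every run, one common workaround is to have GRUB hide them through GRUB_BADRAM in /etc/default/grub (then sudo update-grub). It takes address,mask pairs in the BadRAM format that Memtest86+ can also print itself, if your version offers its BadRAM error-report mode. GRUB's documentation notes this only works when the kernel takes its memory map from GRUB, so it may not apply on every boot path. As a minimal sketch, assuming the failing addresses are already collected as integers (helper names hypothetical), one pair can be built whose mask keeps only the bits on which all addresses agree. That covers every listed address but may also hide some healthy pages.

```python
PAGE_OFFSET_BITS = 0xFFF  # assumption: 4 KiB pages; GRUB matches page-aligned addresses
ALL_ONES = (1 << 64) - 1


def badram_pair(addresses):
    """Return one (addr, mask) BadRAM pair that matches every address given.

    A page matches when (page_addr & mask) == (addr & mask).
    """
    addresses = list(addresses)
    if not addresses:
        raise ValueError("need at least one address")
    differing = 0
    for a in addresses[1:]:
        differing |= a ^ addresses[0]  # bits where any address disagrees
    # Keep agreeing bits only, and ignore the in-page offset bits.
    mask = ALL_ONES & ~differing & ~PAGE_OFFSET_BITS
    return addresses[0] & mask, mask


def grub_badram_line(pairs):
    """Format (addr, mask) pairs as a GRUB_BADRAM setting."""
    return 'GRUB_BADRAM="' + ",".join(f"0x{a:016x},0x{m:016x}" for a, m in pairs) + '"'
```

For scattered addresses a single pair can end up masking a large region, so it is usually better to group nearby failing addresses and emit one pair per group; replacing the faulty module remains the real fix.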