Samsung 840/850 PRO +more Fatal Trim Firmware bugs?


coercitiv

Diamond Member
Jan 24, 2014
6,403
12,864
136
More updates. The summary is the problem was related to the Linux kernel. A kernel patch has been issued.

For those new to the thread, the original article on the Algolia blog contains the full story + updates.

UPDATE July 13:
Since the last update of this blog post, we have been cooperating with Samsung to help them find the issue. During this investigation we agreed with Samsung not to communicate until they approved. As the issue was not reproduced on our server in Singapore, the reproduction is now running under Samsung supervision in Korea, outside our environment. Although Samsung requested access to our software and corrupted data multiple times, we could not provide it, in order to protect the privacy and data of our customers.
Samsung asked us to inform you about this:

  • Samsung tried to duplicate the failure with the latest script provided to them, but no single failure has been reproduced so far.

  • Samsung will do further tests, most likely from week 29 onwards, with a much more intensive script provided by Algolia.
After unsuccessful attempts to reproduce the issue with Bash scripts, we decided to help them by creating a small C++ program that simulates the writing style and pattern of our application (no files are opened with O_DIRECT). We believe that if the issue comes from the specific way we use the standard kernel calls, it might take a couple of days and terabytes of data written to the drive to trigger it. We have been informed by Samsung that no issue of this kind has been reported to them. Our server provider has modified their Ubuntu 14.04 images to disable the fstrim cron job in order to avoid this issue. In the last couple of months, since we stopped using trim, we have not seen the issue again.

UPDATE July 17:
We have just finished a conference call with Samsung regarding the failure analysis of this issue. Samsung's engineering team has been able to reproduce the issue with the latest binary we provided. Samsung reached the concrete conclusion that the issue is not related to Samsung SSDs or Algolia software but to the Linux kernel. Samsung has developed a kernel patch to resolve this issue, and the official statement with details will be released tomorrow, July 18, to the Linux community along with the Linux patch guide. Our testing code is available on GitHub.


This has been an amazing ride, thank you everyone for joining, we have arrived at the destination.
 

Spanners

Senior member
Mar 16, 2014
325
1
0
Another reason to never go near Samsung SSDs from now on

Samsung have really fallen through the floor. TLC issues, bricked firmware issues and now corrupt data. Avoid like the plague.

Oh hey, it's something that confirms what I already think, let's jump on board before any real information comes to light and make sweeping statements about a company's entire range of drives.
 

redzo

Senior member
Nov 21, 2007
547
5
81
Code:
    /* devices that don't properly handle queued TRIM commands */
    { "Micron_M500*",         NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
                                      ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "Crucial_CT*M500*",     NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
                                      ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "Micron_M5[15]0*",      "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
                                      ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "Crucial_CT*M550*",     "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
                                      ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "Crucial_CT*MX100*",    "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
                                      ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "Samsung SSD 8*",       NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
                                      ATA_HORKAGE_ZERO_AFTER_TRIM, },
Code:
    /*
     * As defined, the DRAT (Deterministic Read After Trim) and RZAT
     * (Return Zero After Trim) flags in the ATA Command Set are
     * unreliable in the sense that they only define what happens if
     * the device successfully executed the DSM TRIM command. TRIM
     * is only advisory, however, and the device is free to silently
     * ignore all or parts of the request.
     *
     * Whitelist drives that are known to reliably return zeroes
     * after TRIM.
     */

    /*
     * The intel 510 drive has buggy DRAT/RZAT. Explicitly exclude
     * that model before whitelisting all other intel SSDs.
     */
    { "INTEL*SSDSC2MH*",      NULL,   0, },

    { "Micron*",              NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "Crucial*",             NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "INTEL*SSD*",           NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "SSD*INTEL*",           NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "Samsung*SSD*",         NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "SAMSUNG*SSD*",         NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
    { "ST[1248][0248]0[FH]*", NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
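For anyone wondering how tables like the two above get applied, here is a minimal sketch. It assumes entries are checked top to bottom and the first matching row wins, which is how the comment about excluding the Intel 510 before the broader Intel whitelist reads. POSIX fnmatch() stands in for the kernel's own glob matcher, and the flag values, the struct name and lookup_quirks() are made up for illustration; none of them are the kernel's.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Illustrative bit values only; the real kernel constants differ. */
#define NO_NCQ_TRIM      (1u << 0)
#define ZERO_AFTER_TRIM  (1u << 1)

/* Hypothetical stand-in for one row of the tables quoted above. */
struct quirk_entry {
    const char *model;     /* glob pattern for the model string */
    const char *firmware;  /* glob pattern for the firmware revision, NULL = any */
    unsigned int flags;
};

/* A few rows copied from the tables above, kept in the same relative order. */
static const struct quirk_entry quirks[] = {
    { "Micron_M5[15]0*", "MU01", NO_NCQ_TRIM | ZERO_AFTER_TRIM },
    { "Samsung SSD 8*",  NULL,   NO_NCQ_TRIM | ZERO_AFTER_TRIM },
    { "INTEL*SSDSC2MH*", NULL,   0 },
    { "INTEL*SSD*",      NULL,   ZERO_AFTER_TRIM },
    { "Samsung*SSD*",    NULL,   ZERO_AFTER_TRIM },
};

/* Return the flags of the first row whose model (and firmware, if given) match. */
static unsigned int lookup_quirks(const char *model, const char *firmware)
{
    assert(model != NULL && firmware != NULL);

    for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
        const struct quirk_entry *q = &quirks[i];

        if (fnmatch(q->model, model, 0) != 0)
            continue;  /* model pattern does not match, try the next row */
        if (q->firmware == NULL || fnmatch(q->firmware, firmware, 0) == 0)
            return q->flags;
    }
    return 0;  /* no row matched: no special handling */
}
```

Under these assumptions, a model string such as "Samsung SSD 850 PRO 256GB" (a made-up example) hits the "Samsung SSD 8*" row before the broader "Samsung*SSD*" one and so gets both flags, while a model string starting with "INTEL SSDSC2MH" stops at the exclusion row and gets none.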
It may be safer not to use Samsung SSDs with Linux, but it may not be safe enough.
Does this mean that Samsung also fixed/patched those Crucial/Micron SSDs?
Quite the irony.
 

redzo

Senior member
Nov 21, 2007
547
5
81
My bad. You are right. Algolia's bug is not related to "queued TRIM".
Though, as of today, I couldn't find any kernel patch or info submitted by Samsung regarding this bug.
 

ArtForz

Junior Member
Apr 11, 2015
19
1
36
My bad. You are right. Algolia's bug is not related to "queued TRIM".
Though, as of today, I couldn't find any kernel patch or info submitted by Samsung regarding this bug.
It's been posted on the linux-raid list ~2 days ago. Look for "[PATCH] raid0: data corruption when using trim" and its followups.
 

bradly1101

Diamond Member
May 5, 2013
4,689
294
126
www.bradlygsmith.org
/me remembers their platter drives... that had a firmware bug that would cause data-corruption, by reading SMART data while writing data.

/me thinks Samsung storage has always been at the quality level of OCZ.

There's always the SM951/941, which can't write for more than 2 minutes without throttling at around 100 °C.
 

rsutoratosu

Platinum Member
Feb 18, 2011
2,716
4
81
Oh wow, by this theory of banning Samsung SSDs, let's ban all the HDD manufacturers because they all have an x% return rate. I see people bitching about Seagate failures; they must be banned.

I saw iOS randomly reboot, so Apple should be banned too. I heard about all the security issues in Android; let's ban Google too.

Stop bitching about everything; nothing is perfect.
 

Coup27

Platinum Member
Jul 17, 2010
2,140
3
81
Oh wow, by this theory of banning Samsung SSDs, let's ban all the HDD manufacturers because they all have an x% return rate. I see people bitching about Seagate failures; they must be banned.

I saw iOS randomly reboot, so Apple should be banned too. I heard about all the security issues in Android; let's ban Google too.

Stop bitching about everything; nothing is perfect.
Thanks for that.
 

ArtForz

Junior Member
Apr 11, 2015
19
1
36

Yup, result of that: http://www.spinics.net/lists/stable/msg97984.html
= the patch version proposed in http://www.spinics.net/lists/raid/msg49447.html

Though I'm a bit confused how the reference to commit 20d0189b1012 (mainline since 3.14-rc) in that patch fits with
"We had kernels 3.2, 3.10, 3.13 and 3.16 distributed between the most often corrupted machines and waited to see which of the mines blows up. All of them did." from the algolia blog.

So either there's something else at work here or older kernels with bio_pair_split and friends need a different version of that patch...
 

lenjack

Platinum Member
Oct 10, 1999
2,704
7
81
Had an 840 Pro take a dump recently. It would only boot about 10% of the time, but there was no data loss when it did. Samsung replaced it without hassle, but I went to an inexpensive SanDisk, which works just fine. Sold the replacement 840 Pro.
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,832
881
126
People have funny logic sometimes. Samsung had a firmware bug in one line of drives that has since been fixed, and now all their SSDs are junk, apparently.

By that line of thinking Intel are junk also because of the controller bug in their P67/H67 board that required a recall or because of floating point bugs in some processors back in the day.

Apple had to recall a large batch of laptops because of Nvidia's bug. Better not buy Nvidia or Apple products ever again either, now that I think about it.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
/me remembers their platter drives... that had a firmware bug that would cause data-corruption, by reading SMART data while writing data.

/me thinks Samsung storage has always been at the quality level of OCZ.

The Spinpoint 1 TB platter drives were the gold standard for good performance with low noise for many years, so methinks your memory has suffered its own TRIM bug.
 

Puffnstuff

Lifer
Mar 9, 2005
16,048
4,806
136
Yes, this thread title is misleading and should be corrected so as not to bring people here needlessly, only to discover that their free OS is actually the culprit. My 840 Pro is still performing its duties in my laptop quite nicely after a very painstaking RMA process. Once Samsung gets that ironed out, they'll be able to sell more drives.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
Yes, this thread title is misleading and should be corrected so as not to bring people here needlessly, only to discover that their free OS is actually the culprit. My 840 Pro is still performing its duties in my laptop quite nicely after a very painstaking RMA process. Once Samsung gets that ironed out, they'll be able to sell more drives.
I'm surprised the title hasn't been fixed yet, by either the OP or the mods, even after numerous posts detailing how it's a Linux-only issue. Guess that means no one here loves Samsung anymore.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,452
10,120
126

Coup27

Platinum Member
Jul 17, 2010
2,140
3
81
I did report the original post a day or two ago, asking if the thread title could be changed to something more appropriate, but as of yet it has not been done.
 

rchunter

Senior member
Feb 26, 2015
933
72
91
I'm surprised the title hasn't been fixed yet, by either the OP or the mods, even after numerous posts detailing how it's a Linux-only issue. Guess that means no one here loves Samsung anymore.

I still love them. My 840 & 850 Pro are both still working fine as NTFS drives in Windows. I've got an old Crucial C300 in my Linux box. I'd still buy Samsung SSDs, but that's mainly because I haven't had any issues. Knock on wood.
 

Palorim12

Member
Jul 21, 2015
29
0
6
Someone asked why the issue was seen on the Samsungs but not the Intel Drives.

Deepor on overclock found the answer in the email chains:

"Gionatan,

Because it is related with timing.

1st trim is issued and before complete it, 2nd trim is started
(allocates memory but is not issued to device)
Then, when 1st trim is completed, it frees 2nd trim's memory,
because they share the pointer due to bug.
If 3rd trim is started before 2nd trim is issued to device,
3rd trim can allocate the same memory with 2nd trim,
because it is freed when 1st trim is completed.
Only this case makes the corruption.

Thank you,
Seunguk Shin"
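The sequence in that email can be replayed in a few lines. This is only a toy model, not the kernel code or the actual patch: a tiny slot pool stands in for the kernel allocator (so the "use after free" is simulated without real undefined behavior), and struct trim_req, the slot helpers and replay_race() are all hypothetical names. The sector numbers are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one trim request: a range of sectors. */
struct trim_req {
    unsigned long start;
    unsigned long len;
};

/* Toy allocator: a fixed pool where a freed slot is handed out again. */
#define POOL_SLOTS 4
static struct trim_req pool[POOL_SLOTS];
static bool in_use[POOL_SLOTS];

static int slot_alloc(void)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return i;
        }
    }
    return -1;  /* pool exhausted */
}

static void slot_free(int slot)
{
    assert(slot >= 0 && slot < POOL_SLOTS);
    in_use[slot] = false;
}

/* Replay the timing from the email; return the range trim 2 finally sends. */
static struct trim_req replay_race(bool buggy)
{
    for (int i = 0; i < POOL_SLOTS; i++)
        in_use[i] = false;

    /* Trim 1 is allocated and issued to the device. */
    int t1 = slot_alloc();
    pool[t1] = (struct trim_req){ 0, 100 };

    /* Trim 2 is started: memory allocated, but not yet issued. */
    int t2 = slot_alloc();
    pool[t2] = (struct trim_req){ 1000, 100 };

    /* Trim 1 completes. With the bug, the shared pointer means
     * trim 2's memory is freed instead of trim 1's. */
    slot_free(buggy ? t2 : t1);

    /* Trim 3 is started before trim 2 is issued and reuses the freed slot. */
    int t3 = slot_alloc();
    pool[t3] = (struct trim_req){ 5000, 100 };

    /* Trim 2 is finally issued with whatever its slot now holds. */
    return pool[t2];
}
```

In this model, replay_race(false) sends trim 2 with its own range starting at 1000, while replay_race(true) sends it with trim 3's range starting at 5000. A trim aimed at the wrong range is the kind of effect that would explain live data being discarded, and the need for three overlapping trims in that exact order fits the "related with timing" answer above.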
 

Puffnstuff

Lifer
Mar 9, 2005
16,048
4,806
136
I just wanted to add that I've ordered a new 850 Pro 256 to update my desktop with, and will clean install Windows 10 on it once I get the ISO from Windows Update.
 

blakflag

Junior Member
Jun 2, 2005
9
0
66
Speaking of Samsung firmware bugs, wasn't there a big announcement a while ago that they were rolling out a second fix for that same issue because it resurfaced? I never received a firmware update notice.
 