Windows 10 doesn't boot after BSOD and possible data loss on WD Gold HDD

I initially asked this on Microsoft Community and Reddit, but because I haven’t received any answers yet, I’m also asking here.

I have had many BSODs in the last few months (more details here, where you can also find my system information), initially very often (many times per day), but later “only” a few times per week. Because of that, I thought the problem was partly fixed.

However, on Monday I got another BSOD (stop code: SYSTEM_SERVICE_EXCEPTION, what failed: Ntfs.sys). After the PC rebooted, I got the message “Repairing disk errors. This might take over an hour to complete.” on startup. After some time, I got a message saying “Automatic fix could not fix your computer” (or something like this; the message was translated and I don’t know the original wording).
The log file (SrtTrail.txt) showed many checks/tests with exit code 0x0, but “Check for installed ICU” with exit code 0x3f1.

I used the recovery command prompt and tried to list my data. The system and programs are stored on the SSD (C:), while most of my user profile is saved on a second drive, the HDD, in “D:\Users\USERNAME” (in the recovery command prompt the drive letters were swapped, so the data drive was C: and the system drive was D:). I went into “C:\Users” (because of the swapped drive letters) and first tried to list the contents of that directory (which should contain the USERNAME directory). It did show my directory, but when I tried “dir USERNAME”, it reported something like the directory not being found.

I then followed this guide, which was supposed to fix the “Repairing disk errors” problem. I ran these commands:

  • bootrec.exe /rebuildbcd
  • bootrec.exe /fixmbr
  • bootrec.exe /fixboot

I think the second of them (fixmbr) said that I don’t have enough privileges, but the other two worked (or at least reported success).
This still didn’t fix the problem, so I also ran:

  • chkdsk /r c:
  • chkdsk /r d:

This took a few hours, but I noticed that in Stage 1 or 2 of repairing my data disk (C:), it said “deleting index entry…” with the name of my USERNAME directory and some other data files.

After the process finished, I tried rebooting the computer, but I got the same “Repairing disk errors. This might take over an hour to complete.” message as before. I also tried booting into safe mode, but the problem persisted.
I then used the recovery command prompt again to view my data, but the “C:\Users\USERNAME” directory had completely disappeared. I think some other directories and files on that drive are still readable, but this doesn’t help me much, as most of my important data was in the directory that disappeared.

I’m using 64-bit Windows 10 Home, version 1909, build 18363.752, with all available updates installed. My computer is custom-built, and it didn’t have such problems before. I also haven’t changed the hardware recently (in fact, never since the computer was built).
The data HDD is a 2 TB WD Gold WD2005FBYZ-01YCBB2. The system SSD is not made by WD; it is a Samsung 860 EVO. I currently can’t provide other system specifications, but I can provide them later if needed.
Based on the WD serial number check, the HDD is under warranty until 2023. However, I bought the computer from a third-party vendor and I don’t know whether the warranty is valid in that case.

I also don’t know which drive exactly failed. The data that disappeared was stored on the HDD, which is probably broken, but the system on the SSD also didn’t boot. I have some ideas about what happened (just ideas, I haven’t actually checked anything):

  • Both the HDD and the SSD had a (hardware?) failure at the same time, resulting in data loss and a non-working system.
  • There was some system failure which corrupted both drives.
  • Only the HDD got corrupted, but the system tries to fix it, fails, and then prevents itself from booting.

Is there any way to fix the boot problem and, more importantly, recover my data? I have some quite important data there. Thank you.

For now, I have disconnected both drives from their data and power cables and I’m booting Linux from USB. Is this OK?

Update: I contacted WD support about this and I’m currently waiting for an answer. In the meantime, I have some more information about the problem:

I disconnected the HDD and tried to boot the system normally from the SSD. It displays something like “no bootable disk was found”. When I manually select Windows from the boot manager, it tries to load but fails with the same error as originally. When I then boot Linux from USB, I can mount the disk and view my system and program files.

I collected the minidump, chkdsk and some other log files from there. I also copied some important files to my external disk. I can’t upload them here, but I can upload them somewhere else if needed.

Then I disconnected the SSD, connected the HDD and booted from USB again. I was also able to mount the HDD and view some of my data. However, my user profile folder was still missing.

I also collected chkdsk and other logs from the HDD, because they were readable. I can’t upload them here either, but I can upload them later.

However, while browsing through the remaining folders on that HDD, I found that the “found.000/dir0000.chk” folder contains the contents of my missing user profile folder. There are also other folders such as “found.000/dir0001.chk” and so on, but they contain other (actual/normal) files or files with long, random filenames. There are also a few “file00000*.chk” files there.
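
The “file00000*.chk” files have lost their original names, so one way to get a hint about what they are is to look at their first bytes. Below is a minimal Python sketch of that idea. The function names and the short signature table are only illustrative assumptions (a real tool knows many more formats), and it reads just the first 16 bytes of each file, so it should be run on a copy rather than on the damaged disk.

```python
from pathlib import Path

# Leading "magic" bytes for a few common formats (small, deliberately incomplete table).
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also docx/xlsx/odt, which are ZIP containers
]


def guess_type(header: bytes) -> str:
    """Return a likely extension for the given leading bytes, or 'unknown'."""
    for magic, ext in MAGIC:
        if header.startswith(magic):
            return ext
    return "unknown"


def classify_chk_files(folder: str) -> dict:
    """Map each file*.chk name directly inside `folder` to a guessed type."""
    result = {}
    for path in sorted(Path(folder).glob("file*.chk")):
        with path.open("rb") as fh:
            result[path.name] = guess_type(fh.read(16))
    return result
```

It could be called as classify_chk_files("/mnt/external/found.000"), where the path is a hypothetical mount point of the external disk. A result of “unknown” only means the file did not match this small table, not that the file is damaged.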

I then checked a few files in that directory, and they seem to contain my actual data. I decided to copy the whole “found.000” folder to my external disk as well.

Because there are a lot of files (more than 200 GB), the copy ran for a few hours. I haven’t checked the copied data yet, but the filenames and the contents of some files look correct.
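
A copy of this size could be checked by comparing checksums of every file in the source folder against the copy. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, both trees are assumed to be mounted and readable, and SHA-256 is just one reasonable choice of hash. Note that it reads every source file once more, which puts additional load on a drive that may already be failing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source: str, copy: str) -> dict:
    """Report files that are missing, different or unreadable in the copy."""
    src_root, dst_root = Path(source), Path(copy)
    report = {"missing": [], "different": [], "unreadable": []}
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if not dst.is_file():
            report["missing"].append(str(rel))
            continue
        try:
            if sha256_of(src) != sha256_of(dst):
                report["different"].append(str(rel))
        except OSError:
            # A read error on either side is recorded instead of aborting the run.
            report["unreadable"].append(str(rel))
    return report
```

For example, compare_trees("/mnt/hdd/found.000", "/mnt/external/found.000") would compare the two folders; both paths are hypothetical mount points. Matching checksums only show that the copy equals what is currently on the HDD, not that chkdsk recovered the files intact.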

However, I noticed that while copying, my HDD made a very strange noise. At the start, the sound was continuous and didn’t stop for a few minutes, but later it was heard only occasionally.

I also remember that this sound was present in the past, long before the problems started, mostly when reading or writing data. Is that sound normal, or have I had a broken HDD for a long time?

What should I do now? I will probably be able to restore most of my files, but the system is still broken, so I can’t use it.
I will probably have to reinstall the whole OS, but if the disks are broken I shouldn’t reinstall the system on them, because they will probably fail again at some point.

Another update: the output of the smartctl check is:

$ sudo smartctl -a /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-99-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2005FBYZ-01YCBB2
Serial Number:    WD-WMC6N0L5PPW5
LU WWN Device Id: 5 0014ee 0af31ac05
Firmware Version: RR07
User Capacity:    2 000 398 934 016 bytes [2,00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri May 15 22:02:27 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 218) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x203d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   184   182   021    Pre-fail  Always       -       3800
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2297
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5050
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       690
 16 Unknown_Attribute       0x0022   000   200   000    Old_age   Always       -       13517791515
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       5211
194 Temperature_Celsius     0x0022   119   109   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
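
The attribute table above can also be checked programmatically. Below is a minimal Python sketch that parses the rows of `smartctl -a` output and lists attributes that are often treated as warning signs when their raw value is non-zero. The watch list and the function names are assumptions made for illustration, not an official smartmontools rule, and an empty result does not prove that the drive is healthy.

```python
import re

# Attributes whose non-zero raw value is often read as media or cabling trouble
# (an illustrative choice, not an official list).
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+(\S+)\s+(\d+)"
)

# A few rows copied from the output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5050
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(text: str) -> dict:
    """Return {attribute name: parsed columns} for every attribute row in `text`."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group(2)] = {
                "id": int(m.group(1)),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "when_failed": m.group(6),
                "raw": int(m.group(7)),
            }
    return rows


def suspicious(rows: dict) -> list:
    """Names of watched attributes with raw > 0, plus any attribute marked as failed."""
    flagged = []
    for name, row in rows.items():
        if (name in WATCHED and row["raw"] > 0) or row["when_failed"] != "-":
            flagged.append(name)
    return flagged
```

Calling suspicious(parse_attributes(SAMPLE)) returns an empty list for the rows shown, which matches the PASSED self-assessment in the output.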

More updates: I tested the bootrec.exe commands again using the recovery command prompt. The command bootrec.exe /rebuildbcd reported that no Windows installations were found. The command bootrec.exe /fixmbr reported success, and the command bootrec.exe /fixboot reported that access is denied.

I also used smartctl to check both of my drives. I uploaded the logs here, along with the earlier logs and a recording of the HDD sound.

I then ran Memtest86; it found 38236 errors and aborted the test at 50% of pass 2 “due to too many errors”, which probably means that my RAM is almost completely dead and that it caused the drive corruption. The complete logs are also on OneDrive in Memtest86.zip.

Now I think I at least know what caused this problem and all the BSODs of the past few months (broken RAM). I will probably request an RMA for the RAM because it is still under warranty.

However, I still want to know whether my drives are permanently broken or whether I can fix them by formatting them. My HDD makes a strange sound, so I’m not sure. If they are also broken, I will request an RMA for them too, because they are also under warranty.

Is it possible to tell this from the logs I provided? Should I test them with badblocks (and if so, should I test the SSD, the HDD or both)?
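
For context, badblocks in its default mode performs a read-only test. The idea behind such a test can be sketched in Python as below: read the device from start to end in fixed-size chunks and record the offsets where a read fails. This is only a rough illustration and not a replacement for badblocks or the drive’s own self-tests. The function name is hypothetical, it relies on os.pread (available on Unix-like systems), it needs root to open a raw device such as /dev/sdb, and it does not bypass the operating system’s cache.

```python
import os


def read_scan(path: str, chunk_size: int = 1 << 20):
    """Read `path` start to end without writing.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the start
    offset of every chunk whose read raised an I/O error.
    """
    bad_offsets = []
    bytes_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size for regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                bytes_read += len(os.pread(fd, want, offset))
            except OSError:
                # Record the failing chunk and continue with the next one.
                bad_offsets.append(offset)
            offset += want
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

On a healthy device the second element of the result is an empty list. A run over a 2 TB disk takes hours, and reading a drive that is mechanically failing can make things worse, so the data should be copied off first.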