I initially asked this on Microsoft Community and Reddit, but because I haven’t received any answers yet, I’m also asking here.
I have had many BSODs in the last few months (more details here, where you can also find my system information), initially very often (many times per day), but later “only” a few times per week. Because of that, I thought the problem was partly fixed.
However, on Monday I got another BSOD (stop code: SYSTEM_SERVICE_EXCEPTION, what failed: Ntfs.sys). After the PC rebooted, I got the message “Repairing disk errors. This might take over an hour to complete.” on startup. After some time, I got a message that “Automatic fix could not fix your computer” (or something like this; the message was translated and I don’t know the original wording).
The log file (SrtTrail.txt) showed many checks/tests with exit code 0x0, but “Check for installed ICU” had exit code 0x3f1.
I opened the recovery command prompt and tried to list my data. The system and programs are stored on the SSD (C:), while most of my user profile is saved on a second drive, an HDD, in “D:\Users\USERNAME” (in the recovery command prompt the drive letters were swapped, so the data drive was C: and the system drive was D:). I went into “C:\Users” (because of the swapped drive letters) and first tried to list the contents of that directory (which should include the USERNAME directory). It did show my directory, but when I tried “dir USERNAME”, it reported something like the directory not being found.
I then followed this guide, which was supposed to fix the “Repairing disk errors” problem. I ran these commands:
- bootrec.exe /rebuildbcd
- bootrec.exe /fixmbr
- bootrec.exe /fixboot
I think that the second of them (fixmbr) said that I don’t have sufficient privileges, but the other two worked (or at least reported that they were successful).
This still didn’t fix the problem, so I also ran:
- chkdsk /r c:
- chkdsk /r d:
This took a few hours, and I noticed that in Stage 1 or 2 of repairing my data disk (C:), it said “deleting index entry…” with the name of my USERNAME directory and some other data files.
After the process finished, I tried rebooting the computer, but I got the same “Repairing disk errors. This might take over an hour to complete.” message as before. I also tried booting into safe mode, but the problem persisted.
I then used the recovery command prompt again to view my data, but the “C:\Users\USERNAME” directory had completely disappeared. I think that some other directories and files on the same drive are still readable, but this doesn’t help me much, as most of my important data was in the directory that disappeared.
I’m using 64-bit Windows 10 Home, version 1909, build 18363.752, with all available updates installed. My computer is custom-built, and it didn’t have such problems before. I also haven’t changed the hardware recently (in fact, never since the computer was built).
The data HDD is a 2TB WD Gold WD2005FBYZ-01YCBB2. The system SSD is not made by WD; it is a Samsung 860 EVO. I currently can’t provide other system specifications, but I can provide them later if needed.
Based on the WD serial number check, the HDD is under warranty until 2023. However, I bought the computer from a third-party vendor and I don’t know whether the warranty is also valid in that case.
I also don’t know which drive exactly failed. The data that disappeared was stored on the HDD, which is probably broken, but the system on the SSD also doesn’t boot. I have some ideas about what happened (just ideas, I haven’t actually checked anything):
- Both the HDD and the SSD suffered a (hardware?) failure at the same time, resulting in data loss and a non-working system.
- There was some system failure that corrupted both drives.
- Only the HDD got corrupted, but the system tries to fix it, fails, and then prevents itself from booting.
Is there any way to fix booting and, more importantly, recover my data? I have some quite important data there. Thank you.
For now I have disconnected both drives from their data and power cables and I’m booting Linux from USB. Is this OK?
Update: I contacted WD support about this and I’m currently waiting for an answer. In the meantime, I have some more information about the problem:
I disconnected the HDD and tried to boot the system normally from the SSD. It displays something like “no bootable disk was found”. When I manually select Windows from the boot manager, it tries to load but fails with the same error as originally. When I then boot Linux from USB, I can mount the disk and view my system and program files.
I collected minidumps, chkdsk logs and some other log files from there. I also copied some important files to my external disk. I can’t upload them here, but I can upload them somewhere else if needed.
Then I disconnected the SSD, connected the HDD and again booted from USB. I was able to mount the HDD and view some of my data. However, my user profile folder was still missing.
I also collected chkdsk and other logs from the HDD, because they were readable. I can’t upload them here either, but I can upload them later.
However, while browsing the remaining folders on that HDD, I found that the “found.000/dir0000.chk” folder contains the contents of my missing user profile folder. There are also other folders such as “found.000/dir0001.chk” and so on, but they contain other (actual/normal) files or files with long, random filenames. There are also a few “file00000*.chk” files there.
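To get an overview of what chkdsk moved into “found.000”, the recovered entries could be summarized by file count and total size. The following is only a minimal Python sketch that I have not run against the real drive; the function name `summarize_found` and the mount path in the usage note are hypothetical.

```python
from pathlib import Path


def summarize_found(found_root):
    """Return {entry_name: (file_count, total_bytes)} for each top-level entry.

    Directories (e.g. dir0000.chk) are walked recursively; plain files
    (e.g. file00000000.chk) are counted as a single file.
    """
    summary = {}
    for entry in sorted(Path(found_root).iterdir()):
        if entry.is_dir():
            files = [p for p in entry.rglob("*") if p.is_file()]
        else:
            files = [entry]
        total_bytes = sum(p.stat().st_size for p in files)
        summary[entry.name] = (len(files), total_bytes)
    return summary
```

For example, `summarize_found("/mnt/hdd/found.000")` (a hypothetical mount point) would return one entry per recovered folder or file, which should make the folder holding the user profile stand out by its size.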
I then checked a few files in that directory, and they seem to contain my actual data. I decided to also copy the whole “found.000” folder to my external disk.
Because there are a lot of files (more than 200GB), the copy ran for a few hours. I haven’t checked the copied data yet, but the filenames and the contents of some files look correct.
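A minimal sketch of how the copy could be checked against the original, assuming both are mounted under Linux: it hashes every file on both sides with SHA-256 and reports files that are missing or differ. The function names and the paths in the usage note are hypothetical. Note that this reads the whole source tree again, which puts additional load on an HDD that may already be failing.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, copy_root):
    """Compare every file under source_root with its counterpart in copy_root.

    Returns (missing, different): relative paths that are absent from the
    copy, and relative paths whose contents differ from the source.
    """
    source_root, copy_root = Path(source_root), Path(copy_root)
    missing, different = [], []
    for src in sorted(p for p in source_root.rglob("*") if p.is_file()):
        rel = src.relative_to(source_root)
        dst = copy_root / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif sha256_of(src) != sha256_of(dst):
            different.append(rel.as_posix())
    return missing, different
```

Usage would look like `missing, different = compare_trees("/mnt/hdd/found.000", "/mnt/external/found.000")`, where both paths are placeholders. Two empty lists would mean that every source file exists in the copy with identical contents; it says nothing about whether the source files themselves were already corrupted.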
However, I noticed that while copying, my HDD made a very strange noise. At the start, the sound was continuous and didn’t stop for a few minutes, but later it was heard only occasionally.
I also remember hearing this sound in the past, quite a long time before the problems started, mostly when reading or writing data. Is that sound normal, or has my HDD been broken for a long time?
What should I do now? I will probably be able to restore most of my files, but the system is still broken, so I can’t use it.
I will probably have to reinstall the whole OS, but if the disks are broken I shouldn’t reinstall the system on them, because they will probably fail again at some point.
Another update: the output of the smartctl check is:
$ sudo smartctl -a /dev/sdb
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-99-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD2005FBYZ-01YCBB2
Serial Number: WD-WMC6N0L5PPW5
LU WWN Device Id: 5 0014ee 0af31ac05
Firmware Version: RR07
User Capacity: 2 000 398 934 016 bytes [2,00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri May 15 22:02:27 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 218) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x203d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 184 182 021 Pre-fail Always - 3800
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2297
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 5050
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 690
16 Unknown_Attribute 0x0022 000 200 000 Old_age Always - 13517791515
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 8
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 5211
194 Temperature_Celsius 0x0022 119 109 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
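The attributes I understand to be most often watched as signs of failing media or cabling are 5 (Reallocated_Sector_Ct), 196 (Reallocated_Event_Count), 197 (Current_Pending_Sector), 198 (Offline_Uncorrectable) and 199 (UDMA_CRC_Error_Count); in the output above, all of them have a raw value of 0. A minimal Python sketch for pulling these out of saved `smartctl -a` text follows. The choice of watched IDs and the function names are my own assumptions, not anything defined by smartmontools.

```python
# Attribute IDs treated as warning signs in this sketch (an assumption,
# not a smartmontools definition).
WATCHED_IDS = {5, 196, 197, 198, 199}


def parse_attributes(smartctl_text):
    """Parse attribute rows of 'smartctl -a' output into {id: (name, raw)}.

    A row is recognized by a numeric ID in the first column and a hex
    flag (0x....) in the third column; the raw value is the tenth column.
    """
    attributes = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue
        if not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        attributes[int(fields[0])] = (fields[1], fields[9])
    return attributes


def nonzero_watched(attributes):
    """Return the watched attributes whose raw value is not '0'."""
    return {
        attr_id: info
        for attr_id, info in attributes.items()
        if attr_id in WATCHED_IDS and info[1] != "0"
    }
```

Applied to the output above, `nonzero_watched(parse_attributes(text))` should return an empty dictionary. That only means the drive itself has not recorded reallocated or pending sectors or CRC errors; it does not prove that the drive is healthy.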
More updates: I tested the bootrec.exe commands again using the recovery command prompt. bootrec.exe /rebuildbcd reported that no Windows installations were found, bootrec.exe /fixmbr reported that it was successful, and bootrec.exe /fixboot reported that access is denied.
I also used smartctl to check both of my drives. I uploaded the logs here along with the other previous logs and a recording of the HDD sound.
I then ran Memtest86. It found 38236 errors and aborted the test at 50% of pass 2 “due too many errors”, which probably means that my RAM is seriously faulty and that it caused the drive corruption. The complete logs are also on OneDrive in Memtest86.zip.
Now I think I at least know what caused this problem and all the BSODs in the past few months (broken RAM). I will probably request an RMA because the RAM is still under warranty.
However, I still want to know whether my drives are permanently damaged or whether I can fix them by formatting them. My HDD makes a strange sound, so I’m not sure. If they are also broken, I will request an RMA for them as well, because they are also under warranty.
Is it possible to check this from the logs I provided? Should I test the drives with badblocks (and if yes, should I test the SSD, the HDD, or both)?