#1
I'm in trouble! My N7700 has been working fine for about 480 hours with 7 WD15EADS drives. No trouble apart from a warning on one of the drives on day one (8 reallocated sectors); that drive was replaced immediately and there has been no problem since. Until today.

First a drive dropped out of the RAID (WDTLER was activated on all the drives, with 7s read / 7s write). I took the drive out and put it back in: not detected. No SMART info for that drive (all N/A). I backed up what I could, and checked the drive in a desktop with WD Diag: it was fine. All SMART info fine. Absolutely no problem whatsoever.

When the partial backup finished, I switched the N7700 off, hoping it would then recognise the HD. Big mistake! It is now stuck in the infernal "self testing" loop described by a few posters (phildog33, I think). I took all the drives out and tried to boot driveless: same. I carefully disassembled it, checking cables, RAM module, drive backplate, etc. Nothing looked bad. But it still doesn't work.

I tried (but couldn't find a way) to reset it to factory defaults. The web interface is not accessible, and pressing Enter on the LCD has no effect (only Escape offers 4 different logs or switches the LCD on or off). I had a look at Peterfu's website to see if I could find a script to reprogram the flash from my USB stick, as I did (thanks to his precious help) to downgrade the firmware on my N5200 a few months ago, but there are no scripts for the N7700.

Could anyone please help?

#2
Just to update this in case it can help someone else: I left the N7700 to "self-test" without any drives in for about an hour, and it finally booted.

So I re-flashed the firmware (same 2.0.08) and reset everything to factory defaults. I rebooted a couple of times without the disks just to check, and it worked fine. So I shut it down, put the drives from the degraded RAID back in (not the one that had been dropped), and it assembled the RAID fine; all my data was there. So I popped the "dropped" drive in, and this time it was recognised without any problem (no warning in the SMART info, just like on the desktop). I added it as a spare, and the N7700 is now rebuilding the RAID. Final answer in 582 minutes...

I don't know if disassembling/reassembling helped, or if it was the reflashing of the DOM and the reset, but fingers crossed and touching wood, it seems to work fine. I would think that there was no issue with the drive itself, and that the problem was with the N7700. I hope it won't happen again...

#3
Hi Manni,
that's a strange story. I have no idea what could cause this behaviour, but it seems that it finally works. Did you have syslog enabled? If so, there might be some hints about the problem in the log.
br Peter

#4
Thanks Peter for coming to the rescue!
I have had a look at the log, and here is what I found:
- The first disk failure (disk 7) on day 1 (500 hours ago) seems to have been caused by "swap disk damage". That's the disk that had 8 reallocated sectors and got replaced immediately.
- In the meantime, the NAS worked perfectly. I moved all my data over from my N5200, so there were a lot of intensive transfers.
- I made a few nsync attempts, which all failed except the smallest ones (they started but never finished, probably because of some activity on the source/target, as I read that it doesn't work properly in these situations).
- Then yesterday at 11am I had a "Hard Disk 3 on N7700 has an I/O error" (three times in a row), then it went into degraded mode.
- I then backed up all the data I could back up.
- Apparently at 2pm it went briefly on battery power (it is on an APC UPS).
- At 8:30pm I tried to shut it down, and it didn't shut down properly (the web interface said it was safe to switch it off, but it hadn't powered down, so I had to do it manually).
- Then I had a couple of "Due to the system has recorded abnormal shut down previously; folder quota synchronization will take place in 10 minutes time. During this period you may face system performance is going down." as I tried to boot, but it was in the self-testing cycle (it was probably trying to rebuild the quota sync).
- Then I took the disks out, disassembled it, reassembled it, let it self-test for about an hour, and you know the rest of the story.

So it looks like we initially had a failure with disk 3 (which works fine) and then an abnormal shutdown (which wasn't the first, but for the first time triggered a folder quota sync). I can email you the logs if you want to have a look.

The RAID has just finished its rebuild, and all seems absolutely fine, but I still don't know what happened or whether it can happen again. I don't believe there is any problem with disk 3 though. Thanks again for your help!
Last edited by Manni01; 06.05.2009 at 11:32.

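For anyone wanting to check their own log the same way, here is a minimal sketch that tallies I/O error events per disk. It assumes the log has been exported as plain text with one event per line and that the message wording matches the "Hard Disk 3 on N7700 has an I/O error" line quoted above; the function name `count_io_errors` and the sample lines are made up for illustration, and the real log format may differ.

```python
import re
from collections import Counter

# Pattern based on the message wording quoted in the post. The real log may
# carry timestamps or other prefixes, so we search within each line instead
# of requiring the whole line to match.
IO_ERROR = re.compile(r"Hard Disk (\d+) on (\S+) has an I/O error")


def count_io_errors(lines):
    """Return a Counter mapping disk number -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, not taken from a real N7700 log.
    sample = [
        "Hard Disk 3 on N7700 has an I/O error",
        "Hard Disk 3 on N7700 has an I/O error",
        "some unrelated event",
        "Hard Disk 7 on N7700 has an I/O error",
    ]
    for disk, n in sorted(count_io_errors(sample).items()):
        print(f"disk {disk}: {n} I/O error line(s)")
```

Errors spread across several different trays, rather than piling up on one disk, would fit the suspicion that the box rather than a single drive is at fault.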
#5
I've read in one of the forums (I don't know which one) that the shutdown procedure sometimes fails, but that should not cause a permanent "self test" cycle.
Maybe you could contact Thecus support directly via a ticket; they might be interested in the log file for further analysis.
br Peter

#6
Thanks for the suggestion, I'll do that.
#7
Well, I have done that, and Thecus support got back to me very quickly and asked for the config.bin, which had been reset when I went back to factory defaults.

But fortunately (or rather unfortunately), a different drive, in tray 7, had an I/O error last night and the RAID is degraded again. So I have sent the log and config.bin to Thecus support for investigation. I don't think it's the disks; there must be something wrong with the box itself. Drive #3, which dropped first and which I put back after checking it in the desktop, is still going strong...

I am wondering whether it could be related to a bug with the drive power management (I had set it to 60 min). I've deactivated it for now just in case (I'm in RAID 6, but I'd still like to avoid another drive failure). I'm going to check HD #7 in the desktop, but I'm quite sure it's fine, like #3. What's weird is that all drives say "OK", but there is no SMART info for drive 7 (all N/A), just as happened when #3 dropped out. In the meantime, I'm backing up the rest of the data on external HDs before switching it off this time.

#8
I have drive power management set to 30 min and no problems, so maybe it's something in combination with the drives.
At least they are working on it.
br Peter

#9
I have deactivated power management and drives keep dropping out randomly (#5 has just done so).

I have opened two tickets. The first was with Thecus US, who have not helped beyond suggesting an irrelevant "test the HD with Diag and replace if faulty", which proves they don't read the ticket in full, as I had already done that. It looks like it's a different support guy from the one who requested my config.bin, as he was not interested in looking at it when I sent the file after a new failure. So I opened another ticket on the European site on Friday, and I haven't got a reply yet, possibly because it was a holiday? Not impressed with Thecus support at the moment.

I have no confidence in the box anymore (three failures on three different drives in less than a month), and if it is the drives, it's because of a lack of compatibility, not because of the drives themselves, which are fine when tested. I'm wondering if it could be due to using WDTLER, although I can't see why. I may try to put the drives back to their default (0,0) and see if it improves things.

#10
I have finally received a reply from Thecus today saying that, after analysis of the config.bin, they think there may be a defect on the N7700 controller board, and they suggest I organise an RMA with my dealer.
I should be able to get a direct replacement as the first failure happened within 30 days of purchase, and I'll let you know how the replacement goes. Thanks to all who helped.