EX2 and Amazon S3 backup failure and versioning

JOSH_LOOMIS · April 26, 2016, 5:44pm

I recently replaced my ancient SimpleShare NAS with the EX2. With the SimpleShare, I used FastGlacier to backup directly to AWS Glacier. This was manually undertaken, and I was very excited when the time came to upgrade to the EX2, which would automate the process (paired with S3 lifecycle policies).

So, I set up an AWS S3 bucket similar to how is described in this useful blog entry: Brian Beach | My Cloud EX2 Backup to Amazon S3

Backup of roughly 260 GB started great, with no problems. However, it’s now failed twice, I’m guessing due to connection issues. I simply restarted the backup, which is now still running.

However, I’ve now looked in my S3 bucket, and I see several versions of files that have gone unchanged. I’m a bit confused as to why the EX2 would upload multiple copies of the same file if that file remains unmodified. Sure, the S3 bucket lifecycle policy will delete these eventually, but I don’t want to upload my whole collection every time the job runs, unless a particular file has actually been added to the collection or modified.

Has anyone encountered this, and are there any suggested fixes?

Many thanks!

ERmorel · April 27, 2016, 6:17pm

Hi,

I haven’t this case before, Let’s see if any of the users on the Community can share some information about it.

JOSH_LOOMIS · May 4, 2016, 9:21pm

I had a case open with support, and they said they were going to review the logs and escalate to engineering, but then they closed my case.

MrqDude · May 5, 2016, 11:12pm

Bedsides myself, you are one of the few people in these forums that I have seen who use or possibly (hopefully) understand Amazon S3. You will also find there is virtually no good documentation on the built-in S3 functionality, and as far as tech support … good luck with that because they know virtually nothing about S3 and since there is no real screen help/documentation, there is nothing for them to parrot back to you when you call them.

I have been using S3 for several years on different NAS and certain servers (Linux, Windows Server, etc.) so I will share what I have learned from painful trial and error on the MyCloud2.

There are five screens you need to complete to configure an S3 job.

First screen – The job name, which is the same name of the top level directory the built in S3 will use when it uploads to your bucket, so choose wisely.

The second screen - When you create your S3 job, make sure you select the correct S3 region for your corresponding account. Even though Amazon would rather people create/use an IAM user account, I use the AWS Secret Access Key method. For the remote path, use the bucket name. I think I tried using a fully qualified path name awhile back but said screw it and used the bucket name.

Third Screen - Now comes the real garbage part, here you choose the Type and Backup Type, of course, there is no onscreen help/notes/documentation so it’s all trial and error here. Originally I tried making incremental backups using the Incremental Backup Type, but that didn’t seem to work well for me and I still wanted to only backup/copy files that have changed since the last S3 backup. So I ended up selecting Upload as my Type and Overwrite Existing File(s) as my Backup Type (I’m thinking what clown at WD chose that description because it doesn’t tell you if this is an incremental backup, full back up, whatever). Well after running this for a few weeks, the backup type “Overwrite Existing File(s)” seems to be working as an incremental backup solution but the good part is that it does not create a “new file” each time a file has changed, it overwrites the previous file of the same name.

The rest of the configuration screens are relatively straight forward. The fourth screen allows you to select sub folders for back up, it’s not super flexible (e.g., you may have to create multiple jobs to backup certain folders because of the parent/child folder relationship.

I hope this helps.

EDIT: I forgot to mention, I was having trouble using the Edge browser when creating an S3 job, here is that post.

JOSH_LOOMIS · May 6, 2016, 4:06pm

Well I’ll be damned. Sounds like their backup types are screwy.

I ran a “mini” test backup a few days ago in incremental mode, and it created a file listing on the S3 side of files it had uploaded (in a separate folder). Does it do the same thing when using full backup? My theory at the time was that it was somehow using this list to check which files had been uploaded, and since it uploaded the list at the end, any failed backup would have to start from the beginning since the list hadn’t been uploaded. Seemed like an odd way of doing things. Anyway…

When you run your “full” (AKA, incremental) backups, do you happen to have versioning turned on? Do you have any bucket policies in place (e.g., Glacier, infrequent access S3, or deletion of old versions)?

MrqDude · May 6, 2016, 5:34pm

Screwy is an understatement LOL.

As you probably know, S3 is a fairly sophisticated archiving system and well beyond the understanding/experience/need of most non-commercial users, it’s probably even beyond the scope of experience of the WD personnel who implemented S3 in the EX2. So my thinking is that WD doesn’t have many EX2 customers using S3, hence the lack of any meaningful support or a well-designed S3 implementation. But I’m glad the EX2 at least has something, even if it’s screwy/lacking.

I don’t have any policies implemented on the S3 side. For this EX2 archive scheme, I am keeping things a bit simple for now. I have not found a “file listing” on the EX2 when using the Backup Type "Overwrite Existing File(s)”, which doesn’t mean there isn’t a file listing, but my initial guess is that their process is looking at date/time and/or maybe file size to determine if a file has “changed” since the last upload.

I strongly suggest you create a small/test backup job using the ridiculously named “Overwrite Existing File(s)” as the Backup Type. From what I have learned, it appears to be an “incremental” backup (not sure how they determine what qualifies as incremental), but from my tests, it’s detecting new/updated files and uploading them to S3 while skipping over the other files.

A much better designed UI, on the third S3 configuration screen, would be to have a field for Type (like upload or download, as they have now), then Backup Type (incremental or full), then a third field to indicate if you want the uploaded files to overwrite or not overwrite. My thinking is that someone said … let’s make an incremental version that overwrites the files then call it “Overwrite Existing File(s)”, and the EX2 users will have to eventually discover it’s an incremental, not a full backup, that overwrites. Just typical bad UI development.

JOSH_LOOMIS · May 6, 2016, 7:08pm

Probably should have run a test, but I pulled the trigger right before you replied. LOL. We’ll see how it goes.

MrqDude · May 7, 2016, 3:45pm

Well I hope to hear back how you are doing Josh. Please keep us posted here on your progress.

foolio · September 25, 2016, 6:03pm

There are numerous S3 threads so I’m not sure I want to start another one yet. This one is useful because it points to me that horribly named option that I’ve been avoiding and haven’t yet tried. Will do so. But I have some questions that I’m wondering if any of you could answer.

When you get a “backup failed” error - do you have any idea where to look to find out the error? I’ve ssh’d in and looked for logs and there is nothing I could find useful.

Is there an maximum time a job can run? My uplink isn’t great so it seems like everything is bombing out around 18 to 24 hours. Have people run jobs for multiple days to get the first upload done?

Is there a way to seed the upload? Can I do the initial upload from someplace where I have a fast uplink and then let the incremental job take over?

I’ve only owned this thing about a week but read about the flaky S3 so want to hash out whether it will really work or I need to move to a different option.

SBrown · September 27, 2016, 2:42am

foolio:

When you get a “backup failed” error - do you have any idea where to look to find out the error? I’ve ssh’d in and looked for logs and there is nothing I could find useful.

S3 client logging is disabled by default.
However, there should be an error code in /etc/s3.conf
A better solution is to enable Bucket Logging on S3 and parse the bucket logs.
Logging requests with server access logging - Amazon Simple Storage Service

Is there an maximum time a job can run? My uplink isn’t great so it seems like everything is bombing out around 18 to 24 hours. Have people run jobs for multiple days to get the first upload done?

Check your S3 bucket logs for more information.

Is there a way to seed the upload? Can I do the initial upload from someplace where I have a fast uplink and then let the incremental job take over?

Haven’t tried that option.

aguy · August 31, 2018, 8:24pm

Hello there, I am curious what results you have had since posting and could you share any insights?

nickbee · March 22, 2020, 3:14am

Thanks for this, I had the same issue and was able to resolve it with this information.