Ingenuity defeated by not understanding how Sparse Bundle Disk Images work

I’m doing my 3-month mirror backup using “Beyond Compare” to compare two cloud drives and copy only the changes between them. This process takes one to two days of comparing and copying.

Of course I am always tinkering with new tech and concepts, and this time it is a 300GB disk image. This image uses Apple’s sparse bundle disk image format, so not all 300GB is allocated at once; the image grows or shrinks as you add or delete data.

I thought that with Beyond Compare I would copy only the 8MB blocks that had changed, since a sparse bundle is structured as a bundle of 8MB band files that resemble the sectors of a hard drive.

However, I noticed something different on this compare: over 12,000 file blocks had been deleted from the new image and another 12,000 had been created. So in one sense we are still copying only what has changed or been added, but now we also have to make sure the old unused blocks get deleted from the old image.

It takes at least an hour and a half to delete 12,000 files and another couple of hours to copy the new blocks. I’m not sure if this was a one-off, since I did delete about 100GB from the original image a few months ago, but due to my old age I forgot when.

A real sync would take care of both copying and deleting unused files, but at the moment it is all manual.

This is one of the reasons that an EX2 mirror cloud might be in my future.

Dumb question, but why aren’t you just using rsync?

It already does differential copy by default… The MyCloud responds to rsync over ssh as soon as you turn ssh on… That means you can manually sync one MyCloud with another effortlessly over ssh. It also works between two local drives.

Why all the complexity?

It is literally as easy as:

rsync -a /[source path]/ root@[IP of remote MyCloud]:/[Remote path]/

EG:

rsync -a /share/Public/DiskImages/ root@192.168.0.2:/share/Public/DiskImages/

Only the differences from the source folder will be copied to the destination folder. It even supports over-the-wire compression. You can, of course, switch the direction too: just swap source and destination. You can use local-only directories as well; it still does a differential copy. You can even sync between two remotes if you want.

The only thing the EX brings to the table is rsync in daemon mode.


been on my todo list but… because of my OCD, I would have to test and retest until I’m sure that data isn’t being deleted :stuck_out_tongue:

but… yeah… maybe…

so are you saying that the EX is not a true hardware mirror, and that it relies on rsync in daemon mode?

The EX2 uses the MD driver to do software RAID between the two connected drives. (So no, not hardware mirroring.)

You can tell the single-bay MyClouds to create additional MD devices as well, and use USB drives to create an additional array. (See the mdadm man pages.) It is a little more complicated, because the single-bay devices don’t want to allow mdadm to create additional MD devices, but you can get the system to create them by abusing the sysfs functionality of the md driver, and echo. :stuck_out_tongue:
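A dry-run sketch of roughly what that trick looks like. The device names, partition numbers, and the exact sysfs path are assumptions from memory of the md driver, not taken from the thread, so verify against your own firmware; the sketch only prints the commands it would run.

```shell
# Dry-run sketch: prints each command instead of executing it.
# /dev/sda4, /dev/sdc1 and the name "md1" are invented examples.
run() { echo "+ $*"; }

# Coax the md driver into instantiating a new array node via sysfs
run sh -c 'echo md1 > /sys/module/md_mod/parameters/new_array'

# Then build a RAID 1 mirror over an internal partition and a USB drive
run mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda4 /dev/sdc1
```

Drop the `run` wrapper (and run as root on the device) to actually execute the commands.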

I made a thread about that some time ago.

To mirror between two MyClouds, the EX series uses rsync in daemon mode.

The single bay ones respond to rsync over ssh as soon as you turn ssh on.

much appreciated… I guess I should put in the effort to rsync my two cloud drives then. test, test, test, test…

good job on the “Fun with software RAID” thread, and too bad about that extra partition. I am too OCD’ish to use that method because the 1K partition would drive me nuts. It would have been perfect for my use if I had bought an 8TB My Book, but now it is too late because the new 8TB My Books have been redesigned into an ugly black box.

I am guessing that the MD driver does its software RAID in real time, i.e. not minutes or hours later when the drive is idle. As long as the mirroring is in real time, I’m OK with that.

An EX2 is on its way, since Best Buy has them on sale; it should arrive tomorrow. It’s a toss-up whether to keep it or just stick with my two clouds and start rsyncing them. But even if I do rsync them, it is still a manual process that will take a day to sync 8TB of files, and some of the very large directories take the longest no matter what.

A mirrored drive will mean no more monthly or 3-monthly syncs, saving me time.

You wouldn’t happen to know: if you remove one of the drives from an EX2 and mount it on a Mac using a USB3 SATA connector (assuming it is in the same format as a cloud), is it directly readable?

The plan is to use the EX2 as my main data drive with no backup. In the event that the device fails, I could then pluck out one of the drives and mount it on my Mac, from which I can transfer the data to another cloud.

MD driver is realtime, yes.

It creates a new software block device at /dev/md* (where * is some number) that acts like any other block device as far as things like fdisk and pals are concerned. The magic is that the md driver combines several other block devices (or even files… if you are using image files as containers…) into a single block device, using whatever flavor of RAID you specify when you create the array. It can be RAID 0 (just combine them into one big disk, no tolerance), RAID 1 (simple mirror), etc. For proper tolerance you want a RAID 5 array (striped, with distributed parity), which takes at least 3 drives: data and parity are striped over all three. If one fails, the array goes into degraded mode but continues to serve data. Replace the failed drive and the array rebuilds from the remaining data and parity, then goes back into full service.
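The “or even files” part can be sketched with loopback devices. Everything below is a hypothetical dry run (file names, sizes, and loop numbers are invented), printed rather than executed since the real thing needs root:

```shell
# Dry-run: prints each command instead of running it (root required for real).
run() { echo "+ $*"; }

# Two 100MB image files stand in for physical disks
run dd if=/dev/zero of=diskA.img bs=1M count=100
run dd if=/dev/zero of=diskB.img bs=1M count=100

# Expose the files as block devices
run losetup /dev/loop0 diskA.img
run losetup /dev/loop1 diskB.img

# Combine them into a RAID 1 mirror; /dev/md0 then behaves like any disk
run mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1
run mkfs.ext4 /dev/md0
```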

I have been playing with compiling useful utilities for the MyCloud, and I think one of the items I will build tonight is iscsiadm. It is like mdadm, except for mounting/creating iSCSI targets. The Gen2 has the kernel modules needed for it baked in, just not the admin packages. This would let me use an iSCSI target at a remote location and combine it with a local drive to create a mirrored software RAID, which could then host a share. :stuck_out_tongue:

(think, Mycloud #1 hosts an iscsi target. MyCloud#2 mounts that target, combines it with a USB drive, and creates a mirrored RAID. Mirroring would happen in realtime.)
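A rough dry-run of that remote-mirror idea. Every name here (the target IQN, the IPs, the device letters) is invented for illustration, and the open-iscsi commands are the standard ones, untested on a MyCloud:

```shell
# Dry-run sketch: prints commands only. All names and IPs are hypothetical.
run() { echo "+ $*"; }

# On MyCloud #2: discover and log in to the target hosted by MyCloud #1
run iscsiadm -m discovery -t sendtargets -p 192.168.0.2
run iscsiadm -m node -T iqn.2017-01.local.mycloud1:mirror -p 192.168.0.2 --login

# The remote LUN then shows up as a local disk (say /dev/sdd);
# mirror it against the USB drive for a realtime RAID 1
run mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdd1 /dev/sdc1
```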

love the idea… but I’ve been through too many years of disaster recoveries.

I had a bunch of 4TB gen 1 clouds at one point. I would have been your beta tester for a mirrored cloud, but now, with just two 8TB gen 2 clouds, I only have 2 copies of my data.

been there, done that…

good luck… I’ll heart your mirror cloud post when you are done…

Something went wrong… I’m now copying 18,000 sparse bundle 8MB blocks…

I think this confirms my commitment to owning an EX2, just because it mirrors in real time. Copying 18,000 8MB blocks is going to take about 4 to 5 hours, even over gigabit Ethernet.

Yup… there goes another $1k…

So the copying of the 18,000 8MB blocks completed after 4 hours yesterday night. No idea what went wrong, but the copy opened up as a disk image just fine once it was done.

After sleeping on it: dreaming about spending another $1k on a cloud that mirrors in real time would be nice, but seriously, nothing stored on my current clouds warrants real-time mirroring.

I think my current clouds will be the last clouds I ever own, as I see WD is changing the look of their My Books so they no longer match the clouds.

It is too bad…

You know those enclosures are just a little circuit board stuck on a normal SATA Green drive, right?

You can put anything you like inside one. Personally, I bought one some time ago, then converted the case into an enclosure for a MinnowBoard SBC.

What I am trying to say is: get an old chassis off eBay and put whatever sized drive you want inside.

Another stupid question though:

These large file chunks that need to be backed up: are they mostly zero-filled? If so, rsync would be even faster still. (Wire compression for the win.)

well I made sure to check that mine were Reds :stuck_out_tongue:

I am getting old and set in my ways.

10 years ago I did build my own servers with tower raids that heated up the room.

Today, an off-the-shelf WD cloud with a little tweaking to make sure it sleeps is all I need, and I am perfectly happy with that.

No, they are just disk images that use sparse allocation of file blocks as needed. The Mac mounts them like another hard drive and I can read and write to them; when done, I just eject the disk image. However, when I back them up I don’t mount them, and whatever 8MB blocks are allocated are filled with data, so rsync would still take about the same amount of time as my current method: either way the data needs to be copied or deleted.

I could get a diskless EX2 cloud for $200 and use the 8TB drives currently in my clouds, but I do have an OCD problem, which is that I wouldn’t be able to live with empty cloud shells :stuck_out_tongue: I would then spend the effort of buying and putting 2TB drives into the shells, but that is effort I don’t want to spend while I’m currently playing Breath of the Wild on my Nintendo Switch.

What I meant was: since this disk image format breaks the image into 8MB chunks, what percentage of those 8MB chunk files are filled with zeros or other repetitious data?

rsync can compress the transport stream, so if you can put one of those chunks inside a zip and it shrinks (as a test), the compressed transport will move the data to the destination faster.
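That “put one in a zip as a test” idea can be scripted. This sketch uses a synthetic zero-filled 8MB band instead of a real one (real bands live inside the .sparsebundle’s bands directory; that path is an assumption from Apple’s format, not from the thread):

```shell
# Create a synthetic zero-filled 8MB "band" and see how well it compresses.
band=$(mktemp)
dd if=/dev/zero of="$band" bs=1M count=8 2>/dev/null

gzip -k "$band"   # -k keeps the original; writes $band.gz

orig=$(wc -c < "$band")
comp=$(wc -c < "$band.gz")
echo "original: $orig bytes, compressed: $comp bytes"
```

A band of pure zeros compresses down to a few KB, so rsync -z would move it almost instantly; a band full of already-compressed photos would barely shrink at all.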

As for your OCD: I meant you can get empty My Book shells (in the kind you want) on eBay for like $30 a pop. (I priced them.) You can then put whatever you want inside one, an SSD, whatever. Meaning you can still get the physical package you feel OCD over, and do so cheaply.

actually I don’t know. It could be zero-filled to pad it to 8MB, just like sector allocation in the old days.

What I might do one day is set up rsync for specific directories instead of the whole hard drive. That would break up the syncing so it can be done quickly. A lot of my files are fairly static these days, like the extensive photos from my pro days shooting weddings, which I just keep around for posterity. I should just move them off to a mirrored USB.

As to my OCD, I cannot buy shells and have them stuffed with generic SSDs :stuck_out_tongue: That would be a different kind of OCD, I would imagine. Maybe your kind :stuck_out_tongue: It reminds me of Sid Phillips in Toy Story, who builds those atrocities out of toy parts.

No, my OCD is not the obsession of the cloud itself.

Mine is keeping things as pristine as the day I bought them, or very close to it. If I were to take apart my 8TB clouds to use the drives in a diskless EX2, I would purchase 2 x 2TB Green drives as replacements, at which point I would probably play with them much like you, creating mirror clouds at a whim, or perhaps sell them :slight_smile:

My OCD does prevent me from just having them lying around shucked.

Actually, just two years ago I had 3 clouds and 3 My Books, all 4TB, and I almost bought a QNAP NAS. In fact, I had it in the checkout basket at NCIX; I would have been freed from the WD forums forever. My OCD kept me from shucking my drives. I was even searching Craigslist for cheap replacement drives, and almost got a bunch of 2TB Green drives for $40 each.

I ended up just buying the new 8TB clouds from Costco and selling off my collection of WD clouds and my books.

How is the real-time mirroring with two clouds and a USB drive coming along? Is it done?

I am trying (very hard) to build a kernel headers package from WD’s GPL sources archive. It does not seem to want to generate a .deb package for me. Damnit.

I need it to build open-iscsi, since it builds modules. I don’t NEED the modules, just the admin tools (the MyCloud already has the modules!), but to make the tools I need the headers, because they all get built at the same time.

Ever compile a kernel? It takes AAAGES.

FreeFileSync in real time mode…?

yes to your rhetorical question but not on a cloud or for a cloud. good luck @Wierd_w

good thought… I’ll keep it in mind… Thanks

So after that massive Sparse Bundle Disk Image sync, which took about 8 hours to complete on a 300GB image, I ran another “Beyond Compare” between the two images after adding a few files, and found to my relief that it doesn’t delete blocks and create new ones on update; it re-uses the existing blocks.

Thus, on a large 300GB image it doesn’t attempt to sync a single 300GB block every time something changes within. That is what happens with a writable DVD image: the whole image gets re-copied if anything inside changes.

The deletion of 12,000 blocks was a remnant of an earlier cleanup, where I actually deleted 100GB of files from the image, thus deleting 12,000 physical blocks from the Sparse Bundle. Subsequently I added about 120GB of data back into the image, which created 16,000 new files in the bundle.

This is the behavior I expected of a Sparse Bundle Disk Image, which is what Apple’s backup uses. If your backup is 100GB in size and you update it with file changes, your sync, whether you use rsync or another tool, copies only the changed blocks and not the entire 100GB.

This ends the saga of searching for a real time Mirror Copy. :stuck_out_tongue: