Detect and wait until a file has been unzipped to avoid permission errors

In my app the user can select a source folder to be synced with a destination folder. The sync can also happen in response to a change in the source folder detected with FSEventStreamCreate.

If the user unzips an archive in the source folder and the sync process begins before the unzip operation has completed, the sync can fail because of a "Permission denied" error. I assume this is related to the POSIX permissions of the extracted folder being 420 (octal 644) during the unzip operation and (in my case) 511 (octal 777) afterwards.

Is there a way to detect that an unzip operation is in progress and wait until it has completed? I thought that using NSFileCoordinator would solve this issue, but unfortunately that's not the case. Since an unzip operation can last any amount of time, it's not ideal to just delay a sync by a fixed number of seconds and let the user deal with any errors if the unzip operation takes longer.

import AppKit

// Let the user pick the parent folder that contains the "extracted" directory.
let openPanel = NSOpenPanel()
openPanel.canChooseDirectories = true
if openPanel.runModal() == .cancel {
    return
}
let url = openPanel.urls[0].appendingPathComponent("extracted", isDirectory: true)

// Attempt a coordinated read of the extracted folder. If the unzip used file
// coordination, this would block until the write completed; in practice it
// returns immediately and the directory listing fails with "Permission denied".
var error: NSError?
NSFileCoordinator(filePresenter: nil).coordinate(readingItemAt: url, error: &error) { url in
    do {
        let attributes = try FileManager.default.attributesOfItem(atPath: url.path)
        print(attributes.sorted(by: { $0.key.rawValue < $1.key.rawValue }).map({ ($0.key.rawValue, $0.value) }))
        _ = try FileManager.default.contentsOfDirectory(at: url, includingPropertiesForKeys: nil)
    } catch {
        print(error)
    }
}
if let error = error {
    print("file coordinator error:", error)
}

In my app the user can select a source folder to be synced with a destination folder. The sync can also happen in response to a change in the source folder detected with FSEventStreamCreate.

First of all, these are two radically different use cases. When the user is actively doing something, you can use file coordination to have (hopefully) some control while the operation is occurring. With file system events, you are simply getting a notification about something that happened at some point in the past.

The only real similarity here is that the file system is out of your control. Whatever you're doing, you have to account for unexpected changes to permissions, additions, deletions, etc. Unzipping is always a good test because it exercises all of that.

The short answer is that there is no answer. File sync is a hard problem, in a mathematical sense.

There's nothing wrong with a little delay. That is how FSEvents works, after all: events are already coalesced and delivered after a latency you choose. But a delay doesn't solve anything, it just makes the higher-level logic a bit more manageable.
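
For example, here is a minimal sketch of that kind of delay, assuming a hypothetical startSync() entry point. Each FSEvents callback reschedules the sync, so a burst of changes (such as an in-progress unzip) collapses into a single sync that runs once the folder has been quiet for a while:

import Foundation

final class SyncScheduler {
    private let queue = DispatchQueue(label: "sync-scheduler")
    private var pendingSync: DispatchWorkItem?
    private let quietInterval: TimeInterval = 2.0 // tune to taste

    // Call this from the FSEventStreamCreate callback.
    func folderDidChange() {
        queue.async {
            // Cancel any sync that hasn't started yet and push it out again.
            self.pendingSync?.cancel()
            let work = DispatchWorkItem { self.startSync() }
            self.pendingSync = work
            self.queue.asyncAfter(deadline: .now() + self.quietInterval, execute: work)
        }
    }

    private func startSync() {
        // Hypothetical: perform the actual sync here.
    }
}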

When dealing with file system operations, and especially file system events, there is no real concept of "failure". Your sync simply can't ever fail unless the entire hard drive controller goes up in smoke. Lower-level failures from the file system will be a regular occurrence, as in every few seconds. You simply have to handle them.

Another trick you can do is to take a snapshot of the directory you are working with. By that I mean just creating a representation of the directory tree in memory the way it is in the file system (just the file system metadata, not the file data). File coordination might help here. Then you work from that to do your sync. But this too is just a convenience. You have to expect that while you're working through the tree, the on-disk representation can radically change, or go away altogether.
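
For instance, a minimal sketch of such a snapshot, assuming the sync only needs each item's URL, kind, and POSIX permissions; unreadable entries are skipped rather than failing the whole walk:

import Foundation

struct SnapshotEntry {
    let url: URL
    let isDirectory: Bool
    let posixPermissions: Int?
}

func snapshot(of root: URL) -> [SnapshotEntry] {
    var entries: [SnapshotEntry] = []
    let enumerator = FileManager.default.enumerator(
        at: root,
        includingPropertiesForKeys: [.isDirectoryKey],
        options: [],
        errorHandler: { url, error in
            print("skipping \(url.path): \(error)")
            return true // keep enumerating past unreadable items
        }
    )
    while let url = enumerator?.nextObject() as? URL {
        let isDirectory = (try? url.resourceValues(forKeys: [.isDirectoryKey]))?.isDirectory ?? false
        let attributes = try? FileManager.default.attributesOfItem(atPath: url.path)
        entries.append(SnapshotEntry(url: url,
                                     isDirectory: isDirectory,
                                     posixPermissions: attributes?[.posixPermissions] as? Int))
    }
    return entries
}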

First of all, these are two radically different use cases.

Thanks for your answer, but I'm not sure I see how they are radically different use cases. If the user starts unzipping and then starts a sync, it may find the same unfinished files that can be found if the sync is triggered by FSEventStreamCreate. That's actually how you can reproduce the permission error (which I forgot to explain earlier):

  1. Run the code I provided which will show an open panel.
  2. Start unzipping an archive which was previously created by zipping a folder named "extracted".
  3. Before the unzipping completes, select the parent folder in the open panel and click Open.

The only real similarity here is that the file system is out of your control.

True, hence why I was hoping for a cooperative mechanism like NSFileCoordinator, which would allow coordinating writes to the extracted folder on zip's side, and reads later on my side.

You simply have to handle them.

Not so simple :-) In this case, I would handle the "permission denied" error by postponing the sync until the unzip has completed, which is the question I'm trying to find an answer to.

I would handle the "permission denied" error by postponing the sync until the unzip has completed, which is the question I'm trying to find an answer to.

Unzipping is a good test case, but there are other operations that could trigger the same problem. The end user could act as root and create an unreadable directory, leaving it there forever.

Unless this is an operation that you initiated, with a reliable completion mechanism, there is no way to tell when it completes. Maybe the user runs out of disk space, or the file is corrupted, and it never completes.

The end user could act as root and create an unreadable directory, leaving it there forever.

In this case it would be expected that the sync results in an error. I'm only looking to handle temporary permission errors caused by running operations such as unzipping.

Maybe the user runs out of disk space, or the file is corrupted, and it never completes.

Ideally the unzipping operation would still allow me to detect that it has completed / aborted, e.g. by deleting the files extracted so far.

A few different thoughts/comments here:

True, hence why I was hoping for a cooperative mechanism like NSFileCoordinator, which would allow coordinating writes to the extracted folder on zip's side, and reads later on my side.

One thing to understand here is that the word "cooperative" is a huge part of the issue for an app that's trying to do what you're describing. NSFileCoordinator is a great solution in theory, but if you try to build on it as the "exclusive" solution, you'll find that there are way too many cases which simply don't "cooperate".

As a side comment on that point, even when file coordination is being used, that doesn't mean it will do what you want. Your expectation here seems to be that the unzip routine would issue a coordinated write against the entire directory which ends when it's "done", but that's the kind of very long-running operation we specifically warn against.

More broadly, part of what you're fighting here is an example of "Inferring High-Level Semantics from Low-Level Operations". You want to know that an unzip (or other long operation) is occurring, but all the system can tell you at that API level is basically "stuff is changing" and the current permission state.

Note that these issues are what directly led to the File Provider architecture. It's not necessarily that our API can do this in an inherently "better" way, it's that it's better positioned to provide coherent behavior.

Not so simple :-) In this case, I would handle the "permission denied" error by postponing the sync until the unzip has completed, which is the question I'm trying to find an answer to.

I think what's really helpful here is to shift how you think about what the "problem" here actually is and what "handling" actually means. The problem here isn't "permission denied". It's normal and expected that a process could encounter files it doesn't have access to. Similarly, "postponing the sync until the <operation> has completed" isn't really a solution. The time here is TOTALLY unbounded and, more importantly, nothing in the system will tell you how long it might take. Most importantly, there's a good chance the user already knows exactly what's going on and doesn't have an issue with it.

Jumping back to your original statement here:

If the user unzips an archive in the source folder and the sync process begins before the unzip operation has completed, the sync can fail because of a "Permission denied" error.

In terms of the user experience, the worst case for that flow looks like this:

  1. User clicks "Sync" button.

  2. App posts "Sync failed: Permission Denied".

  3. User dismisses the error dialog and returns to #1, repeating until the sync finishes.

The problem in that flow isn't that the sync failed, it's the repetition of the same pointless error.

Similarly, the solution here is often about improving the interface experience and "flow", NOT actually fixing/handling any problem. That means thinking about this in terms of:

  • How do you inform the user of issues in a way that minimizes/eliminates any disruption?

  • Using things like waiting and retrying to "hide" short-term disruptions from the user (see the sketch after this list).

  • Deciding what your app should do for really LONG disruptions ("an hour") and then building that solution.
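
As a sketch of that second point, and assuming a hypothetical syncItem(at:) that throws on failure, a transient "permission denied" can be absorbed by a bounded retry loop before anything is shown to the user:

import Foundation

func syncItem(at url: URL) throws {
    // Hypothetical: sync a single item, throwing on failure.
}

func syncWithRetry(url: URL, attempts: Int = 4) async throws {
    var delay: TimeInterval = 1
    for attempt in 1...attempts {
        do {
            try syncItem(at: url)
            return
        } catch let error as CocoaError
            where error.code == .fileReadNoPermission && attempt < attempts {
            // Likely a temporary state (e.g. a half-unzipped folder);
            // wait a bit longer each time before trying again.
            // Any other error, or the final failure, propagates to the caller.
            try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
            delay *= 2
        }
    }
}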

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for your input.

My idea of file coordination was that it would allow processes to make "atomic" changes to files and folders so that they are left in a consistent state to other processes that use file coordination as well. Unzipping seemed like an ideal fit to me, regardless of how long it takes (after all, the user should be aware that an unzip operation is in progress and shouldn't worry about other scan operations on the folder containing the zip archive apparently hanging for the duration of the unzip operation).

As I mentioned, I'm not trying to avoid permission errors in general, but only those caused by temporary operations such as unzipping. My app keeps syncing, sends a notification when an error happens and allows the user to see the list of errors, but the user isn't forced to do anything for the app to keep syncing. But even if the user is aware that the errors are caused by the unzip operation, they would have to go through the list of errors (which could be quite long) to make sure that they haven't overlooked an unrelated error. What I could do to mitigate this is to mark an error as "solved" if a subsequent sync of the same file is successful.

If file coordination isn't the answer and the unzipped files are not meant to be accessed before the unzip operation has completed, perhaps it would make sense for unzip to write the temporary files to a standard temporary folder and then move them to the final location only at the end.

Thanks for your input. My idea of file coordination was that it would allow processes to make "atomic" changes to files and folders so that they are left in a consistent state to other processes that use file coordination as well.

Yes, that's generally what its goal is, particularly given its broader role in file versioning and syncing. However, the problem in your particular situation is:

  1. File coordination is inherently "opt in", so it only helps if the writing app (which you don't control) chooses to implement it.

  2. It looks like you want to operate in the "general case", meaning you're expecting to work with whatever directories the user specifies and with whatever apps/files they happen to be using.

Those two factors mean you simply cannot rely on file coordination. You can certainly choose to implement it, and there are definitely cases where it may be helpful, but you still need to figure out a solution that works for all of the other cases where the writing process doesn't implement file coordination.

Unzipping seemed like an ideal fit to me, regardless of how long it takes

No, this is not what file coordination is "for". It is NOT acceptable for an app to block inside a file coordination call for an "extended" period of time. File coordination calls are intended to be very brief (<~1s) "low level" I/O calls, not a tool for blocking long-running operations. The problem with doing this:

(after all, the user should be aware that an unzip operation is in progress and shouldn't worry about other scan operations on the folder containing the zip archive apparently hanging for the duration of the unzip operation).

...is that you're basically setting up a "trap" for other apps running on the system. Apps expect file coordination calls to block for a very limited duration, and now those calls will end up blocking for far longer than they were ever expecting.

Note that this means that correctly using file coordination for large operations is more complicated than simply calling coordinate(writingItemAt:) and writing whatever you want. In practice, that means that large operations should generally be implemented using some variation of this approach (sketched in code after the list):

  1. The app uses NSFileManager.url(for:in:appropriateFor:create:) to establish a private location on the same volume as the final destination.

  2. The app writes whatever it needs to write out to that location.

  3. The app starts a coordinated write, then uses NSFileManager.replaceItemAt(_:withItemAt:backupItemName:options:) to safely replace the existing file with its new object.
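
Here, for instance, is a minimal sketch of those three steps, assuming a hypothetical destinationURL for the item's final location and a write closure that produces the new content:

import Foundation

func safeReplace(destinationURL: URL, write: (URL) throws -> Void) throws {
    let fm = FileManager.default

    // 1. A private staging directory on the same volume as the destination.
    let stagingDir = try fm.url(for: .itemReplacementDirectory,
                                in: .userDomainMask,
                                appropriateFor: destinationURL,
                                create: true)
    let stagedURL = stagingDir.appendingPathComponent(destinationURL.lastPathComponent)

    // 2. Produce the new content at the staging location, outside any coordination.
    try write(stagedURL)

    // 3. A brief coordinated write around the fast, same-volume replace.
    var coordinatorError: NSError?
    var accessorError: Error?
    NSFileCoordinator(filePresenter: nil).coordinate(writingItemAt: destinationURL,
                                                     options: .forReplacing,
                                                     error: &coordinatorError) { url in
        do {
            _ = try fm.replaceItemAt(url, withItemAt: stagedURL)
        } catch {
            accessorError = error
        }
    }
    if let error = coordinatorError { throw error }
    if let error = accessorError { throw error }
}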

FYI, this same issue can also apply when reading data. Particularly on APFS (where file cloning makes file duplication extraordinarily fast), an app doing bulk copying might be better off using a coordinated read to create a "private" copy on the same volume, then using that private copy as the source for its copy operation.

That leads to here:

If file coordination isn't the answer and the unzipped files are not meant to be accessed before the unzip operation has completed, perhaps it would make sense for unzip to write the temporary files to a standard temporary folder and then move them to the final location only at the end.

I won't try to provide a full/specific justification, but here are two specific issues:

  • This approach has issues outside of APFS (and HFS+). On APFS and HFS+, the replace operation in #3 is an atomic operation internal to the file system, which means it's both extremely fast and requires no meaningful storage. On other file systems, it's going to require copying and will (temporarily) require double the total storage.

  • For some operations (copying and unzip included), there can be value in ensuring that whatever data the operation produces is accessible, even if the full operation never completes. This could be done by recovering data out of the temporary location, but the most straightforward solution is to just write directly to the final target.

As I mentioned, I'm not trying to avoid permission errors in general, but only those caused by temporary operations such as unzipping. My app keeps syncing, sends a notification when an error happens and allows the user to see the list of errors, but the user isn't forced to do anything for the app to keep syncing. But even if the user is aware that the errors are caused by the unzip operation, they would have to go through the list of errors (which could be quite long) to make sure that they haven't overlooked an unrelated error. What I could do to mitigate this is to mark an error as "solved" if a subsequent sync of the same file is successful.

The other thing I would add here is that simple heuristics can add significant value. For example:

  • The permission set used is distinct and not particularly common.

  • Decompression typically happens in the same directory as the source archive.

...so if "foo" is set to "420" and in the same directory as "foo.zip", then there is a decent chance that this is an in-progress unzip operation. Not guaranteed of course, but the way I would think about this is that your goal here is to better manage work and present what's actually going on, not "perfection". Related to that point, going back to here:

sends a notification when an error happens and allows the user to see the list of errors,

Unless you've gone out of your way (at significant performance cost) to impose a specific file ordering, the order you process files in isn't going to be meaningful to the user. On HFS+, the catalog ordering behavior makes bulk iteration roughly alphabetical, but on most other file systems (particularly APFS) the order is going to look basically arbitrary/random. That's important here because it means there isn't really any reason your app HAS to tell the user about any particular issue at the moment you encounter it; you could just defer that location/file to later processing and silently keep going.

Now, there's obviously a balancing act there (you don't necessarily want your app to dump all of its errors at the very end), but it can certainly be another tool you use to improve the overall experience.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the discussion, it will be helpful in deciding how to handle long operations.

I just wanted to clarify one thing. I've been using NSFileCoordinator for copying files for quite some time now, i.e. blocking for the duration of the file copy operation. Did you mean that I shouldn't be doing that, since copying a file can potentially take more than 1 second?

I just wanted to clarify one thing. I've been using NSFileCoordinator for copying files for quite some time now, i.e. blocking for the duration of the file copy operation. Did you mean that I shouldn't be doing that, since copying a file can potentially take more than 1 second?

Yes, that's something I'd try to avoid. Building on our conversation here, what I would actually do is something like this:

  1. Create temporary/working directories on both the source and destination volumes. The "temporary" case would come from "url(for:in:appropriateFor:create:)", while the "working" case would probably be something I'd create in an obvious way ("<app name> in progress files") or have the user select.

  2. Perform a coordinated read and clone the source object to the source working directory.

  3. The file is copied from the source working directory to the destination working directory. Theoretically this could also be coordinated; practically, it doesn't really matter, as this operation is between "your" directories.

  4. Perform a coordinated write and use replaceItem(at:withItemAt:backupItemName:options:resultingItemURL:) to replace the destination object with the destination working object.

Note that the coordinated operations at #2 (clone files) and #4 (atomic replace or series of moves) are all very short-lived operations. The exception here is if a non-APFS source forces you to do a real copy, but that's a case which calls for a much broader consideration and likely dropping the initial duplicate entirely.
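
As a minimal sketch of step #2, assuming workingDir sits on the same volume as sourceURL (so the copy below is an APFS clone and the coordinated read stays brief):

import Foundation

func stagePrivateCopy(of sourceURL: URL, in workingDir: URL) throws -> URL {
    let stagedURL = workingDir.appendingPathComponent(sourceURL.lastPathComponent)
    var coordinatorError: NSError?
    var accessorError: Error?
    NSFileCoordinator(filePresenter: nil).coordinate(readingItemAt: sourceURL,
                                                     options: [],
                                                     error: &coordinatorError) { url in
        do {
            // On APFS a same-volume copy is backed by cloning, so this
            // should complete quickly and keep the coordinated read short.
            try FileManager.default.copyItem(at: url, to: stagedURL)
        } catch {
            accessorError = error
        }
    }
    if let error = coordinatorError { throw error }
    if let error = accessorError { throw error }
    return stagedURL
}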

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
