How to prevent holes from being created by cluster_write() in files

A filesystem of my own making exhibits the following undesirable behaviour.

ClientA

% echo line1 >>echo.txt
% od -Ax -ctx1 echo.txt
0000000    l   i   n   e   1  \n                                        
           6c  69  6e  65  31  0a                                        
0000006

ClientB

% od -Ax -ctx1 echo.txt
0000000    l   i   n   e   1  \n                                        
           6c  69  6e  65  31  0a                                        
0000006
% echo line2 >>echo.txt
% od -Ax -ctx1 echo.txt
0000000    l   i   n   e   1  \n   l   i   n   e   2  \n                
           6c  69  6e  65  31  0a  6c  69  6e  65  32  0a                
000000c

ClientA

% od -Ax -ctx1 echo.txt
0000000    l   i   n   e   1  \n   l   i   n   e   2  \n                
           6c  69  6e  65  31  0a  6c  69  6e  65  32  0a                
000000c
% echo line3 >>echo.txt

ClientB

% echo line4 >>echo.txt

ClientA

% echo line5 >>echo.txt

ClientB

% od -Ax -ctx1 echo.txt
0000000    l   i   n   e   1  \n   l   i   n   e   2  \n   l   i   n   e
           6c  69  6e  65  31  0a  6c  69  6e  65  32  0a  6c  69  6e  65
0000010    3  \n   l   i   n   e   4  \n  \0  \0  \0  \0  \0  \0        
           33  0a  6c  69  6e  65  34  0a  00  00  00  00  00  00        
000001e

ClientA

% od -Ax -ctx1 echo.txt
0000000    l   i   n   e   1  \n   l   i   n   e   2  \n   l   i   n   e
           6c  69  6e  65  31  0a  6c  69  6e  65  32  0a  6c  69  6e  65
0000010    3  \n  \0  \0  \0  \0  \0  \0   l   i   n   e   5  \n        
           33  0a  00  00  00  00  00  00  6c  69  6e  65  35  0a        
000001e

ClientB

% od -Ax -ctx1 echo.txt
0000000    l   i   n   e   1  \n   l   i   n   e   2  \n   l   i   n   e
           6c  69  6e  65  31  0a  6c  69  6e  65  32  0a  6c  69  6e  65
0000010    3  \n  \0  \0  \0  \0  \0  \0   l   i   n   e   5  \n        
           33  0a  00  00  00  00  00  00  6c  69  6e  65  35  0a        
000001e

The first write on ClientA is done via the following call chain:

vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()

The first write on ClientB first does a read, which is expected:

vnop_write()->cluster_write()->vnop_blockmap()->vnop_strategy()->myfs_read()

Followed by a write:

vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()

The final write on ClientA calls cluster_write(), which doesn't do that initial read before writing.

I believe it is this write that introduces the hole.

What I don't understand is why this happens and how this may be prevented.

Any pointers on how to combat this would be much appreciated.

Answered by DTS Engineer in 836929022

Found the answer I was looking for by dtrace(1)'ing SMBClient.

It calls ubc_msync(vp, 0, ubc_getsize(vp), NULL, UBC_INVALIDATE); in smbfs_vnop_open_common().
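Based on that finding, a network file system could do the same in its own open vnop. The following is only a sketch, not buildable on its own: myfs_vnop_open() and the surrounding error handling are placeholders, and the ubc_msync() call simply mirrors the one observed in smbfs_vnop_open_common().

```c
#include <sys/ubc.h>

static int
myfs_vnop_open(struct vnop_open_args *ap)
{
    vnode_t vp = ap->a_vp;

    /* Throw away every cached page for this vnode at open time, so
     * that subsequent cluster_write() calls fault stale ranges back
     * in from the server instead of zero-filling around them. */
    (void)ubc_msync(vp, 0, ubc_getsize(vp), NULL, UBC_INVALIDATE);

    /* ... the rest of the open path ... */
    return 0;
}
```

Invalidating at open is still only best effort, per the commentary below: it narrows the window for stale data but cannot coordinate concurrent writers.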

Accepted Answer

The following commentary explains the reasoning behind this.

One thing to keep in mind is that, in the larger context, there isn't really any "safe" way for two machines to do remote I/O to a network filesystem without some higher-level construct coordinating their access. The code above isn't actually trying to solve that problem; it's simply making a "best effort" check to try to ensure that the data being accessed is as current as possible. That really is pure "best effort", as there are lots of access patterns that will show issues like this one.

Also, as a side note, keep in mind that most Mac apps don't (or certainly shouldn't) modify files using this pattern:

  • open()
  • write()
  • close()

What they actually use is the "safe save" pattern of:

  • duplicate the contents to a location on the same file system.
  • modify the duplicate.
  • "atomically"* exchange the original with the duplicate.
  • delete the old file.

*How atomic this is varies by file system, as do the exact details of the exchange process.

__ Kevin Elliott
DTS Engineer, CoreOS/Hardware
