A filesystem of my own making exibits the following undesirable behaviour.
ClientA
% echo line1 >>echo.txt
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n
6c 69 6e 65 31 0a
0000006
ClientB
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n
6c 69 6e 65 31 0a
0000006
% echo line2 >>echo.txt
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a
000000c
ClientA
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a
000000c
% echo line3 >>echo.txt
ClientB
% echo line4 >>echo.txt
ClientA
% echo line5 >>echo.txt
ClientB
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n l i n e
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a 6c 69 6e 65
0000010 3 \n l i n e 4 \n \0 \0 \0 \0 \0 \0
33 0a 6c 69 6e 65 34 0a 00 00 00 00 00 00
000001e
ClientA
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n l i n e
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a 6c 69 6e 65
0000010 3 \n \0 \0 \0 \0 \0 \0 l i n e 5 \n
33 0a 00 00 00 00 00 00 6c 69 6e 65 35 0a
000001e
ClientB
% od -Ax -ctx1 echo.txt
0000000 l i n e 1 \n l i n e 2 \n l i n e
6c 69 6e 65 31 0a 6c 69 6e 65 32 0a 6c 69 6e 65
0000010 3 \n \0 \0 \0 \0 \0 \0 l i n e 5 \n
33 0a 00 00 00 00 00 00 6c 69 6e 65 35 0a
000001e
The first write on clientA is done via the following call chain:
vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()
The first write on clientB first does a read, which is expected:
vnop_write()->cluster_write()->vnop_blockmap()->vnop_strategy()->myfs_read()
Followed by a write:
vnop_write()->vnop_close()->cluster_push_err()->vnop_blockmap()->vnop_strategy()
The final write on clientA calls cluster_write()
, which doesn't do that initial read before doing a write.
I believe it is this write that introduces the hole.
What I don't understand is why this happens and how this may be prevented.
Any pointers on how to combat this would be much appreciated.
The following commentary explains the reasoning behind this.
One thing to keep in mind here is that the larger context here is that there isn't really any "safe" way for two machines to do remote I/O to a network filesystem without some higher level construct coordinating their access. The code above isn't actually trying to solve that problem, it's simply making a "best effort" check to try and ensure that the data being accessed is as "current" as possible. That really is a pure "best effort" as there are lots of access patterns which will show issues like this one.
Also, as a side note, keep in mind that most mac apps don't (or certainly shouldn't) modify files using this pattern:
- open()
- write()
- close()
What they actually use is the "safe save" pattern of:
- duplicate the contents to a location on the same file system.
- modify the duplicate.
- "atomically"* exchange the original with the duplicate.
- delete the old file.
*How atomic this is varies by file system as do the exact details of this exchange process.
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware