I have successfully booted the Linux Kernel with VirtIOFS as the rootfs, but file permission issues render it completely unusable. A file on the macOS host belongs to uid 0, gid 0, but on the Linux guest, this file belongs to uid 1000, gid 10. Why does this happen? How are file permissions directly mapped between the host and the guest? If there is no mapping mechanism in place, why does this discrepancy occur? This leads to errors in Linux, such as:
sudo: /etc/sudo.conf is owned by uid 1000, should be 0
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set
bootLoader.commandLine = "console=hvc0 rootfstype=virtiofs root=myfs rw"
let directorySharingDevice = VZVirtioFileSystemDeviceConfiguration(tag: "myfs")
directorySharingDevice.share = VZSingleDirectoryShare(directory: VZSharedDirectory(url: rootURL!, readOnly: false))
The VMM is running as root.
Why does this happen?
Because the mounting OS cannot automatically trust the permissions of volumes. With the exception of "0", user and group IDs are essentially arbitrary integers whose meaning is ONLY assigned by the particular system that mounts a given volume. When attaching an "arbitrary" volume to a given machine, directly using its uid/gid values basically injects arbitrary permissions into that machine. Note that on macOS, this issue is what the "Ignore ownership on this volume" option in the "Get Info" dialog actually controls.
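To see which numeric IDs each side actually reports, you can read the ownership attributes of the same file on the host and in the guest and compare them. This is only a minimal sketch: the helper name numericOwnership(ofItemAtPath:) and the default path are my own choices for illustration, and it assumes a Swift toolchain is available wherever you run it.

```swift
import Foundation

/// Returns the numeric owner and group IDs that this system reports for `path`.
/// (Hypothetical helper, named here for illustration only.)
func numericOwnership(ofItemAtPath path: String) throws -> (uid: Int, gid: Int) {
    let attributes = try FileManager.default.attributesOfItem(atPath: path)
    guard let uid = (attributes[.ownerAccountID] as? NSNumber)?.intValue,
          let gid = (attributes[.groupOwnerAccountID] as? NSNumber)?.intValue else {
        throw CocoaError(.fileReadUnknown)
    }
    return (uid, gid)
}

// Pass a path as the first argument, or fall back to the file sudo complained about.
let path = CommandLine.arguments.count > 1 ? CommandLine.arguments[1] : "/etc/sudo.conf"
do {
    let owner = try numericOwnership(ofItemAtPath: path)
    print("\(path): uid=\(owner.uid) gid=\(owner.gid)")
} catch {
    print("Could not read attributes for \(path): \(error)")
}
```

Running this against the same file on both sides shows the raw integers each system reports, without any name lookup getting in the way.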
How are file permissions directly mapped between the host and the guest?
By the VFS driver that mounted the volume. I'm not sure what does this on Linux, but on macOS (when running as a guest OS) this would be handled by "mount_virtiofs". Quoting its man page:
The mount_virtiofs command attaches the virtio-fs file system associated with fs_tag to the global file system namespace at the location indicated by directory.
The options are as follows:
-r Mount file system as read-only.
-u Set the owner of the files in the file system to uid. The uid may be a username or a numeric value.
-g Set the group of the files in the file system to gid. The gid may be a group name or a numeric value.
If there is no mapping mechanism in place, why does this discrepancy occur?
Again, I'm not sure exactly what Linux is doing, but my guess is that it either defaulted to the current user (which happened to be 1000) or defaulted directly to 1000.
Note that this illustrates the security issues inherent in blindly accepting the permission configuration of external volumes:
sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set
In other words, if the system honored the permission configuration of external volumes, then the ability to mount an external volume would allow a user to execute arbitrary code as root.
My purpose is to run a Linux VM while benefiting from the advanced features of APFS, such as CoW, rather than using a raw disk image (sparse file). If there’s a simpler way to achieve this goal, then this question becomes unnecessary.
So, two points here:
- I think you're going to benefit a WHOLE lot less from "the advanced features of APFS" than you think. For example, I don't think copying a file in Linux will generate a file clone. File cloning happens on macOS because we did a fairly thorough job of ensuring that all "copy operations" created file clones, either by directly supporting file cloning or by calling into "copyfile" (which handles cloning). I'm not confident that the virtualization layer exports file cloning to the guest OS or that Linux would be able to use it if it did.
- I would set up the "core" operating system in a more standard way (probably as a disk image) and then only share the specific directories that you'll be actively modifying. You don't really want macOS accidentally modifying the guest OS core anyway, and that core OS isn't going to get much benefit* from being in a shared directory.
*Frankly, I think you'll actually be ACTIVELY harmed by trying to share the entire OS. For example, I'd expect shared directory performance to be significantly slower than providing a block device the guest OS can "own". That won't really be noticeable when modifying individual files, but I think it could REALLY matter when you're talking about the entire system.
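As a rough sketch of that layout (not a tested configuration), the root file system could come from a disk image attached as a virtio block device, with only a working directory shared over virtio-fs. The function name, the "work" tag, and both URLs below are placeholders I've made up for illustration. The "root=/dev/vda" value in the comments is also an assumption about how the guest enumerates the disk and how the image is partitioned.

```swift
import Foundation
import Virtualization

/// Builds one block device for the guest's root disk and one virtio-fs share
/// for a single working directory. (Hypothetical helper for illustration.)
func makeStorageAndSharing(
    diskImageURL: URL,        // placeholder: disk image holding the guest root file system
    sharedDirectoryURL: URL   // placeholder: host directory the guest will actively modify
) throws -> (storage: [VZStorageDeviceConfiguration],
             sharing: [VZDirectorySharingDeviceConfiguration]) {
    // The guest "owns" this block device, so ownership and setuid bits
    // live in the guest's own file system rather than on the host volume.
    let attachment = try VZDiskImageStorageDeviceAttachment(url: diskImageURL, readOnly: false)
    let rootDisk = VZVirtioBlockDeviceConfiguration(attachment: attachment)

    // Only the working directory is exposed to the guest over virtio-fs.
    let sharedFS = VZVirtioFileSystemDeviceConfiguration(tag: "work")
    sharedFS.share = VZSingleDirectoryShare(
        directory: VZSharedDirectory(url: sharedDirectoryURL, readOnly: false))

    return ([rootDisk], [sharedFS])
}

// Possible usage, given an existing VZVirtualMachineConfiguration and boot loader:
//   let devices = try makeStorageAndSharing(diskImageURL: imageURL, sharedDirectoryURL: workURL)
//   configuration.storageDevices = devices.storage
//   configuration.directorySharingDevices = devices.sharing
//   bootLoader.commandLine = "console=hvc0 root=/dev/vda rw"   // assumed device name
```

Inside the guest, the share would then be mounted by tag, for example with "mount -t virtiofs work /mnt/work", where the mount point is again just an example.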
__
Kevin Elliott
DTS Engineer, CoreOS/Hardware