Shared directories as ROOTFS in Linux VM causes file permission issues

Question

Created Apr ’25

I have successfully booted the Linux Kernel with VirtIOFS as the rootfs, but file permission issues render it completely unusable. A file on the macOS host belongs to uid 0, gid 0, but on the Linux guest, this file belongs to uid 1000, gid 10. Why does this happen? How are file permissions directly mapped between the host and the guest? If there is no mapping mechanism in place, why does this discrepancy occur? This leads to errors in Linux, such as:

sudo: /etc/sudo.conf is owned by uid 1000, should be 0

sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set

bootLoader.commandLine = "console=hvc0 rootfstype=virtiofs root=myfs rw"

let directorySharingDevice = VZVirtioFileSystemDeviceConfiguration(tag: "myfs")
directorySharingDevice.share = VZSingleDirectoryShare(directory: VZSharedDirectory(url: rootURL!, readOnly: false))

The VMM is running as root.

Answer 1

Coco0721 OP

Apr ’25

My purpose is to run a Linux VM while benefiting from the advanced features of APFS, such as CoW, rather than using a raw disk image (sparse file). If there’s a simpler way to achieve this goal, then this question becomes unnecessary.

Answer 2

DTS Engineer

Apple

Apr ’25

Recommended

Why does this happen?

Because the mounting OS cannot automatically trust the permissions of volumes. With the exception of "0", user and group IDs are essentially arbitrary integers whose meaning is assigned ONLY by the particular system that mounts a given volume. When attaching an "arbitrary" volume to a given machine, directly using its uid/gid values basically injects arbitrary permissions into that machine. Note that on macOS, this issue is what the "Ignore ownership on this volume" option in the "Get Info" dialog actually controls.

How are file permissions directly mapped between the host and the guest?

By the VFS driver that mounted the volume. I'm not sure what does this on Linux, but on macOS (when running as a guest OS) this would be handled by "mount_virtiofs". Quoting its man page:

     The mount_virtiofs command attaches the virtio-fs file system associated with fs_tag to the global file system namespace at the location indicated by directory.

     The options are as follows:

     -r            Mount file system as read-only.

     -u            Set the owner of the files in the file system to uid.  The uid may be a username or a numeric value.

     -g            Set the group of the files in the file system to gid.  The gid may be a group name or a numeric value.

If there is no mapping mechanism in place, why does this discrepancy occur?

Again, I'm not sure exactly what Linux is doing, but my guess is that it defaulted to the current user (which happened to be 1000) or it directly defaulted to 1000.

Note that this illustrates the security issues inherent in blindly accepting the permission configuration of external volumes:

sudo: /usr/bin/sudo must be owned by uid 0 and have the setuid bit set

In other words, if the system honored the permission configuration of external volumes, then the ability to mount an external volume would allow that user to execute arbitrary code as root.

My purpose is to run a Linux VM while benefiting from the advanced features of APFS, such as CoW, rather than using a raw disk image (sparse file). If there’s a simpler way to achieve this goal, then this question becomes unnecessary.

So, two points here:

1. I think you're going to benefit a WHOLE lot less from "the advanced features of APFS" than you think. For example, I don't think copying files in Linux will generate file clones. File cloning happens on macOS because we did a fairly thorough job of ensuring that all "copy operations" created file clones, either by directly supporting file cloning or by calling into "copyfile" (which handles cloning). I'm not confident that the virtualization exports file cloning to the guest OS or that Linux would be able to use it if it did.
2. I would set up the "core" operating system in a more standard way (probably as a disk image) and then only share specific directories that you'll be actively modifying. You don't really want macOS accidentally modifying the guest OS core anyway, and that core OS isn't going to get much benefit* from being in a shared directory.

*Frankly, I think you'll actually be ACTIVELY harmed by trying to share the entire OS. For example, I'd expect shared directory performance to be significantly slower than providing a block device the guest OS can "own". That won't really be that noticeable when modifying individual files, but I think it could REALLY matter when you're talking about the entire system.
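
As a rough illustration of that split (not something posted in the thread), the sketch below attaches a disk image as the guest's root device and shares only one working directory over virtio-fs. The function name `makeConfiguration`, the `shared` tag, the CPU and memory sizes, and the `root=/dev/vda` kernel argument are assumptions made for this example, and it leaves out the serial console and network devices a real VM would need.

```swift
import Foundation
import Virtualization

/// Builds a Linux VM configuration whose root filesystem lives on a disk image,
/// with one host directory shared over virtio-fs.
/// All URLs, the "shared" tag, and the sizing values are placeholders for this sketch.
func makeConfiguration(kernelURL: URL,
                       diskImageURL: URL,
                       sharedDirectoryURL: URL) throws -> VZVirtualMachineConfiguration {
    let configuration = VZVirtualMachineConfiguration()
    configuration.cpuCount = 2
    configuration.memorySize = 2 * 1024 * 1024 * 1024

    let bootLoader = VZLinuxBootLoader(kernelURL: kernelURL)
    // Assumption: the root filesystem is the first virtio block device (/dev/vda).
    bootLoader.commandLine = "console=hvc0 root=/dev/vda rw"
    configuration.bootLoader = bootLoader

    // The guest owns this block device, so Linux ownership and setuid bits
    // are stored inside the image rather than translated by the host.
    let attachment = try VZDiskImageStorageDeviceAttachment(url: diskImageURL, readOnly: false)
    configuration.storageDevices = [VZVirtioBlockDeviceConfiguration(attachment: attachment)]

    // Only the directory that both sides actively modify is shared.
    let sharing = VZVirtioFileSystemDeviceConfiguration(tag: "shared")
    sharing.share = VZSingleDirectoryShare(
        directory: VZSharedDirectory(url: sharedDirectoryURL, readOnly: false))
    configuration.directorySharingDevices = [sharing]

    try configuration.validate()
    return configuration
}
```

Inside the guest, the share would then be mounted with something like `mount -t virtiofs shared /mnt/shared`, so the ownership behavior discussed in this thread affects only that directory and not files such as `/etc/sudo.conf` or `/usr/bin/sudo`.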

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 3

Coco0721 OP

Apr ’25

Accepted Answer

I checked the permissions of the same file using different users within the same Linux client:

[root@fedora ~]# id
uid=0(root) gid=0(root) groups=0(root)
[root@fedora ~]# ls -n /TEST 
-rw-r--r-- 1 0 0 0 Apr  3  2025 /TEST

[me@fedora ~]$ id
uid=1000(me) gid=10(wheel) groups=10(wheel)
[me@fedora ~]$ ls -n /TEST 
-rw-r--r-- 1 1000 10 0 Apr  3  2025 /TEST

This seems to suggest a disturbing fact: the VirtioFS implementation by Apple completely ignores file permissions, with no permission mapping mechanism in place, because it simply doesn’t exist.
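
The same check can be repeated programmatically. The sketch below is only an illustration, assuming a Swift toolchain is available in the guest; the helper names `reportedOwnership` and `ownerMatchesCaller` are made up for this example. It prints the owner that `stat` reports next to the caller's own uid, so running it as different users on the shared mount shows whether the reported owner follows the caller.

```swift
import Foundation

/// Numeric owner and group that stat(2) reports for `path`, or nil if the call fails.
func reportedOwnership(ofPath path: String) -> (uid: UInt32, gid: UInt32)? {
    var info = stat()
    guard stat(path, &info) == 0 else { return nil }
    return (UInt32(info.st_uid), UInt32(info.st_gid))
}

/// True when the owner reported for `path` is the user running this process.
func ownerMatchesCaller(path: String) -> Bool {
    guard let owner = reportedOwnership(ofPath: path) else { return false }
    return owner.uid == UInt32(getuid())
}

// Defaults to the /TEST file used in the session above; pass another path as the first argument.
let path = CommandLine.arguments.count > 1 ? CommandLine.arguments[1] : "/TEST"
if let owner = reportedOwnership(ofPath: path) {
    print("caller uid=\(getuid()) sees \(path) owned by uid=\(owner.uid) gid=\(owner.gid)")
} else {
    print("stat failed for \(path)")
}
```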

Answer 4

DTS Engineer

Apple

Apr ’25

This seems to suggest a disturbing fact: the VirtioFS implementation by Apple completely ignores file permissions, with no permission mapping mechanism in place, because it simply doesn’t exist.

Your starting point here has been that the host and guest OS should use identical permission configurations in the shared directory. Unfortunately, that approach has a number of significant issues:

It creates an attack vector into the host OS. For example, the guest OS can configure a macOS executable as setuid root, then trick the user into executing that tool from the host side.
It allows the creation of permission configurations that the hosting user can't easily control. For example, if the guest OS creates a hierarchy that's root/700, then a standard user can't easily delete that directory.
Even in normal use, it creates additional complexity, since the hosting user can end up needing to interact with a hierarchy whose configuration isn't valid in the host operating system.

All of those issues get in the way of VZSharedDirectory's primary goal, which is to provide a simple mechanism for smoothly sharing data between the host and guest with minimal friction. On the host side, that means preserving the privilege configuration of the host system so that files continue to "fit" within the expected permission hierarchy of the hosting system.

However, that does create a problem on the guest side. Short of creating an elaborate system for overlaying permissions, the only "fixed" user it could use is root, but that would end up creating all of the same issues I listed above. So, what it does is exactly what you described above: it varies the permissions returned to provide broad access to the hosting system.

To be clear, I'm not saying that there aren't cases where directly mirroring permissions across the systems would be useful. I'm simply explaining why we chose the approach we did and the value it provides.

I do have one clarification on this point:

the VirtioFS implementation by Apple completely ignores file permissions

The behavior you're seeing is intentional behavior implemented by both sides of the VirtioFS stack (the Linux implementation and Apple), NOT just "Apple". As I mentioned in my previous message, the uid/gid the host returns to the guest actually come from the guest OS. That's why, when macOS is the guest OS, "mount_virtiofs" accepts uid/gid arguments.

In other words, the reason why this is occurring:

I checked the permissions of the same file using different users within the same Linux client:

...is, at least partly, because that's what Linux is telling us to do.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Answer 5

Coco0721 OP

Apr ’25

Are you saying that this was a deliberate decision, not a mistake, that Apple provides VirtioFS support in a surprising way? You mount virtiofs on Linux, then perform a chown on any file, and it succeeds, but when you check the owner again, nothing has changed because, in fact, it’s a no-op. And Apple thinks this is correct, intuitive, and doesn’t require documentation?

Answer 6

Coco0721 OP

Apr ’25

Technically, this problem can be solved, such as the unprivileged mode of virtiofsd, which uses user_namespaces, but it’s unlikely that Apple will provide this feature within a limited timeframe.

Answer 7

DTS Engineer

Apple

Apr ’25

Are you saying that this was a deliberate decision, not a mistake, that Apple provides VirtioFS support in a surprising way?

Yes, the decision was quite intentional. As I outlined above, this approach is basically the only option if you:

1. Want the virtualization to be usable by standard user accounts and without privilege escalation.
2. Don't maintain an alternative storage location for the guest permission configuration. For example, several VM implementations store the guest OS configuration data in sidecar data files or as xattrs on individual files.

We considered #2 (r.91387442); however, it didn't happen in our initial implementation (primarily due to time constraints) and hasn't happened since, because multiple developers have chosen to address the issue in the client app or guest OS.
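
To make the sidecar idea concrete, here is a minimal sketch of what such a store could look like. It is a hypothetical design and not how the Virtualization framework or any particular VM product does it: the `GuestPermissions` and `PermissionSidecar` types, the JSON format, and the relative-path keys are all assumptions made for this example.

```swift
import Foundation

/// Ownership and mode as the guest expects to see them. Hypothetical record format.
struct GuestPermissions: Codable, Equatable {
    var uid: UInt32
    var gid: UInt32
    var mode: UInt16
}

/// Maps paths relative to the share root to guest-side permissions,
/// persisted as a single JSON file kept outside the shared directory.
struct PermissionSidecar {
    private(set) var entries: [String: GuestPermissions] = [:]

    init() {}

    /// Loads a previously written sidecar file.
    init(contentsOf url: URL) throws {
        let data = try Data(contentsOf: url)
        entries = try JSONDecoder().decode([String: GuestPermissions].self, from: data)
    }

    /// Remembers what the guest asked for, without touching the host file's real ownership.
    mutating func record(_ permissions: GuestPermissions, forRelativePath path: String) {
        entries[path] = permissions
    }

    /// Returns the recorded permissions, or `fallback` when the guest never set any.
    func permissions(forRelativePath path: String, fallback: GuestPermissions) -> GuestPermissions {
        entries[path] ?? fallback
    }

    /// Writes all entries to `url` atomically.
    func write(to url: URL) throws {
        try JSONEncoder().encode(entries).write(to: url, options: .atomic)
    }
}

// Example: the guest marks sudo as root-owned and setuid; the host file itself is unchanged.
var sidecar = PermissionSidecar()
sidecar.record(GuestPermissions(uid: 0, gid: 0, mode: 0o4755), forRelativePath: "usr/bin/sudo")
```

The point of the design is that the host keeps its own ownership on every file while the guest-visible values live in data the host user controls, which is why it avoids the privilege problems listed earlier in the thread.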

You mount virtiofs on Linux, then perform a chown on any file, and it succeeds, but when you check the owner again, nothing has changed because, in fact, it’s a no-op.

As one small clarification, I believe what's actually happening is that the system attempts the change, fails, and then does not report that failure back to the guest, so you'd actually get different results if the client was running as root. I'll also note that I believe this behavior is similar to qemu when run as "-fsdev ... security_model=none" and in virtiofsd this is handled through the "squash-guest" configuration.
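
A guest-side program that depends on ownership changes can detect this case by re-reading the file after the call. The sketch below is illustrative only; `ChownOutcome` and `tryChown` are names invented for this example, and it assumes nothing about the share beyond the behavior described in this thread.

```swift
import Foundation

/// Result of attempting an ownership change and then verifying it.
enum ChownOutcome {
    case rejected         // chown(2) or the follow-up stat(2) returned an error
    case applied          // chown(2) succeeded and stat(2) shows the new owner
    case silentlyIgnored  // chown(2) reported success but the owner did not change
}

/// Calls chown(2), then re-reads the ownership to see whether the change took effect.
func tryChown(path: String, uid: UInt32, gid: UInt32) -> ChownOutcome {
    guard chown(path, uid, gid) == 0 else { return .rejected }
    var info = stat()
    guard stat(path, &info) == 0 else { return .rejected }
    return (info.st_uid == uid && info.st_gid == gid) ? .applied : .silentlyIgnored
}
```

On the shared mount described here, a call such as `tryChown(path: "/TEST", uid: 0, gid: 0)` would be expected to return `.silentlyIgnored` for a non-root caller, whereas on an ordinary local filesystem it would normally return `.rejected`.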

And Apple thinks this is correct

Yes. More specifically:

I think the current behavior is useful for a variety of use cases.
I think it would be nice if the framework provided more options and configurability.
I think that if the framework was only going to implement a single behavior, this was the correct one to implement.

intuitive,

No, not necessarily. I think the behavior is understandable within the larger system context, but it's definitely not the "obvious" approach.

and doesn’t require documentation?

Documentation has always been a difficult problem for us and the virtualization framework is certainly not our best documented framework. The details of this should absolutely be documented and, if you haven't already, I'd strongly encourage you to file bugs both asking for the current behavior to be better documented and for us to provide more options for how this should be handled.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
