Matmul with quantized weight does not run on ANE with FP16 offset: `ane: Failed to retrieved zero_point`

Hi, the following model does not run on the ANE. Inspecting it with deCoreML, I see the error `ane: Failed to retrieved zero_point`.

import numpy as np

import coremltools as ct
from coremltools.converters.mil import Builder as mb
import coremltools.converters.mil as mil

B, CIN, COUT = 512, 1024, 1024 * 4

@mb.program(
    input_specs=[
        mb.TensorSpec((B, CIN), mil.input_types.types.fp16),
    ],
    opset_version=ct.target.iOS18,
)
def prog_manual_dequant(
    x,
):
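    # Random uint4-quantized weights with per-output-channel fp16 scales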
    qw = np.random.randint(0, 2 ** 4, size=(COUT, CIN), dtype=np.int8).astype(mil.mil.types.np_uint4_dtype)
    scale = np.random.randn(COUT, 1).astype(np.float16)
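    # fp16 offset (zero point): with this dtype the model fails to load on the ANE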
    offset = np.random.randn(COUT, 1).astype(np.float16)
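    # A uint4 offset matching the weight dtype runs on the ANE instead: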
    # offset = np.random.randint(0, 2 ** 4, size=(COUT, 1), dtype=np.uint8).astype(mil.mil.types.np_uint4_dtype)
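    # Dequantize: dqw = (qw - offset) * scale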
    dqw = mb.constexpr_blockwise_shift_scale(data=qw, scale=scale, offset=offset)
    return mb.linear(x=x, weight=dqw)

cml_qmodel = ct.convert(
    prog_manual_dequant,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
)

Whereas if I use an offset with the same dtype as the weights (uint4 in this case), the model does run on the ANE.
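
For completeness, here is the working variant, identical except that the offset constant uses the same uint4 dtype as the weights (it reuses the imports and constants above; the program and variable names are just illustrative):

@mb.program(
    input_specs=[
        mb.TensorSpec((B, CIN), mil.input_types.types.fp16),
    ],
    opset_version=ct.target.iOS18,
)
def prog_uint4_offset(
    x,
):
    qw = np.random.randint(0, 2 ** 4, size=(COUT, CIN), dtype=np.int8).astype(mil.mil.types.np_uint4_dtype)
    scale = np.random.randn(COUT, 1).astype(np.float16)
    # uint4 offset with the same dtype as the weights: this runs on the ANE
    offset = np.random.randint(0, 2 ** 4, size=(COUT, 1), dtype=np.uint8).astype(mil.mil.types.np_uint4_dtype)
    dqw = mb.constexpr_blockwise_shift_scale(data=qw, scale=scale, offset=offset)
    return mb.linear(x=x, weight=dqw)

cml_qmodel_uint4 = ct.convert(
    prog_uint4_offset,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
)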

Tested on coremltools 8.0b1, on macOS 15.0 beta 2/Xcode 16 beta 2 and macOS 15.0 beta 3/Xcode 16 beta 3.
