Matmul with quantized weight does not run on ANE with FP16 offset: `ane: Failed to retrieved zero_point`

Hi, the following model does not run on the ANE. Inspecting it with deCoreML, I see the error `ane: Failed to retrieved zero_point`.

import numpy as np

import coremltools as ct
from coremltools.converters.mil import Builder as mb
import coremltools.converters.mil as mil

B, CIN, COUT = 512, 1024, 1024 * 4

@mb.program(
    input_specs=[
        mb.TensorSpec((B, CIN), mil.input_types.types.fp16),
    ],
    opset_version=ct.target.iOS18,
)
def prog_manual_dequant(
    x,
):
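    # Random uint4-quantized weights with per-output-channel fp16 scales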
    qw = np.random.randint(0, 2 ** 4, size=(COUT, CIN), dtype=np.int8).astype(mil.mil.types.np_uint4_dtype)
    scale = np.random.randn(COUT, 1).astype(np.float16)
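    # fp16 offset (zero point): with this dtype the model fails to load on the ANE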
    offset = np.random.randn(COUT, 1).astype(np.float16)
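    # A uint4 offset matching the weight dtype runs on the ANE instead: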
    # offset = np.random.randint(0, 2 ** 4, size=(COUT, 1), dtype=np.uint8).astype(mil.mil.types.np_uint4_dtype)
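    # Dequantize: dqw = (qw - offset) * scale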
    dqw = mb.constexpr_blockwise_shift_scale(data=qw, scale=scale, offset=offset)
    return mb.linear(x=x, weight=dqw)

cml_qmodel = ct.convert(
    prog_manual_dequant,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
)

Whereas if I use an offset with the same dtype as the weights (uint4 in this case), the model does run on the ANE.
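
For completeness, here is the working variant, identical except that the offset constant uses the same uint4 dtype as the weights (it reuses the imports and constants above; the program and variable names are just illustrative):

@mb.program(
    input_specs=[
        mb.TensorSpec((B, CIN), mil.input_types.types.fp16),
    ],
    opset_version=ct.target.iOS18,
)
def prog_uint4_offset(
    x,
):
    qw = np.random.randint(0, 2 ** 4, size=(COUT, CIN), dtype=np.int8).astype(mil.mil.types.np_uint4_dtype)
    scale = np.random.randn(COUT, 1).astype(np.float16)
    # uint4 offset with the same dtype as the weights: this runs on the ANE
    offset = np.random.randint(0, 2 ** 4, size=(COUT, 1), dtype=np.uint8).astype(mil.mil.types.np_uint4_dtype)
    dqw = mb.constexpr_blockwise_shift_scale(data=qw, scale=scale, offset=offset)
    return mb.linear(x=x, weight=dqw)

cml_qmodel_uint4 = ct.convert(
    prog_uint4_offset,
    compute_units=ct.ComputeUnit.CPU_AND_NE,
    compute_precision=ct.precision.FLOAT16,
    minimum_deployment_target=ct.target.iOS18,
)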

Tested on coremltools 8.0b1, on macOS 15.0 beta 2/Xcode 16 beta 2 and macOS 15.0 beta 3/Xcode 16 beta 3.
