Difference between compiling a Model using CoreML and Swift-Transformers

Hello,

I was successfully able to compile TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML using Core ML, and it's working well. However, I’m now trying to compile the same model using Swift Transformers.
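For context, the Core ML side was straightforward. Compiling and loading the downloaded package looks roughly like this (the loadCompiledModel helper and its URL parameter are just illustrative, not my exact code):

import CoreML
import Foundation

// Illustrative sketch only: compile a downloaded .mlpackage/.mlmodel to .mlmodelc
// on device, then load it with a chosen compute-unit configuration.
func loadCompiledModel(from modelURL: URL) async throws -> MLModel {
    // MLModel.compileModel(at:) writes the compiled .mlmodelc to a temporary location.
    let compiledURL = try await MLModel.compileModel(at: modelURL)

    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndGPU
    return try MLModel(contentsOf: compiledURL, configuration: configuration)
}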

With the limited documentation available in the swift-chat and Hugging Face repositories, I'm finding it difficult to work out the correct process for compiling a model via Swift Transformers. I attempted the approach shown below, but I'm fairly certain it isn't the recommended or correct method.

Could someone guide me on the proper way to compile and use models like TinyLlama with Swift Transformers? Any official workflow, example, or best practice would be very helpful.

Thanks in advance!

This is the approach I have used:

import Foundation
import CoreML
import Tokenizers

@main
struct HopeApp {
    static func main() async {
        print(" Running custom decoder loop...")

        do {
            let tokenizer = try await AutoTokenizer.from(pretrained: "PY007/TinyLlama-1.1B-Chat-v0.3")
            var inputIds = tokenizer("this is the test of the prompt")
            print("🧠 Prompt token IDs:", inputIds)

            let model = try float16_model(configuration: .init())
            let maxTokens = 30

            for _ in 0..<maxTokens {
                // Build fixed-length (128) input_ids and attention_mask, zero-padded past the current tokens.
                let input = try MLMultiArray(shape: [1, 128], dataType: .int32)
                let mask = try MLMultiArray(shape: [1, 128], dataType: .int32)

                for i in 0..<inputIds.count {
                    input[i] = NSNumber(value: inputIds[i])
                    mask[i] = 1
                }
                for i in inputIds.count..<128 {
                    input[i] = 0
                    mask[i] = 0
                }

                let output = try model.prediction(input_ids: input, attention_mask: mask)

                let logits = output.logits  // shape: [1, seqLen, vocabSize]
                let lastIndex = inputIds.count - 1
                let lastLogitsStart = lastIndex * 32003  // vocab size = 32003

                // Greedy decoding: pick the token with the highest logit at the last position.
                var nextToken = 0
                var maxLogit: Float32 = -Float.greatestFiniteMagnitude

                for i in 0..<32003 {
                    let logit = logits[lastLogitsStart + i].floatValue
                    if logit > maxLogit {
                        maxLogit = logit
                        nextToken = i
                    }
                }

                inputIds.append(nextToken)

                // Stop once the stop token (32002) is produced.
                if nextToken == 32002 { break }

                // Decode and print the sequence generated so far.
                let partialText = try await tokenizer.decode(tokens: inputIds)
                print(partialText)
            }

        } catch {
            print("❌ Error: \(error)")
        }
    }
}
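For comparison, this is roughly the higher-level path I expect swift-transformers intends. The module names, LanguageModel.loadCompiled(url:computeUnits:), GenerationConfig, and generate(config:prompt:) below are assumptions on my part from skimming the swift-transformers and swift-chat sources, so please correct me if the real API differs:

import CoreML
import Foundation
import Models      // assumed swift-transformers target name
import Generation  // assumed swift-transformers target name

// Sketch under assumptions: LanguageModel.loadCompiled(url:computeUnits:) and
// generate(config:prompt:) as I understood them from the swift-transformers sources.
func runTinyLlama(compiledModelURL: URL) async throws -> String {
    // Load the already-compiled .mlmodelc produced by Core ML.
    let model = try LanguageModel.loadCompiled(url: compiledModelURL, computeUnits: .cpuAndGPU)

    // Greedy decoding for up to 30 new tokens, matching the manual loop above.
    var config = GenerationConfig(maxNewTokens: 30)
    config.doSample = false

    return try await model.generate(config: config, prompt: "this is the test of the prompt")
}

If that is off base, a pointer to whichever entry point swift-chat actually uses for generation would be exactly what I'm looking for.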

Answered by DTS Engineer in 842297022

Hello @bismansahni,

Swift Transformers appears to be a non-Apple open source project. I recommend that you reach out to them directly on their GitHub page for support!

-- Greg
