SFSpeechRecognizer is not working inside visionOS 2.4 simulator

Question

Created 4w

I know there have been issues with SFSpeechRecognizer in the simulator on iOS 17+. I'm running into the same problem, speech not being recognised, inside the visionOS 2.4 simulator (likely because visionOS borrows from the iOS frameworks). Does anyone have any workarounds or advice for this simulator issue? I can't test on device because I don't have an Apple Vision Pro.

I'm using Swift 6 with Xcode 16.3. The console logs and the code I'm using are below.

Console Logs

BACKGROUND SPATIAL TAP (hit BackgroundTapPlane)
SpeechToTextManager.startRecording() called
[0x15388a900|InputElement #0|Initialize] Number of channels = 0 in AudioChannelLayout does not match number of channels = 2 in stream format.
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
iOSSimulatorAudioDevice-22270-1: Abandoning I/O cycle because reconfig pending
SpeechToTextManager.startRecording() completed successfully and recording is active.
GameManager.onTapToggle received. speechToTextManager.isAvailable: true, speechToTextManager.isRecording: true
GameManager received tap toggle callback. Tapped Object: None
BACKGROUND SPATIAL TAP (hit BackgroundTapPlane)
GESTURE MANAGER - User is already recording, stopping recording
SpeechToTextManager.stopRecording() called
GameManager.onTapToggle received. speechToTextManager.isAvailable: true, speechToTextManager.isRecording: false
Audio data size: 134400 bytes
Recognition task error: No speech detected <---
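
The logs show 134400 bytes of captured audio but not whether those bytes contain any signal. If the simulator's input device delivers silence, "No speech detected" would be the expected result. Below is a minimal sketch of a level check, assuming the captured data holds native-endian Float32 PCM samples (an assumption; the actual format is not shown here). `AudioLevel` and `audioLevel(of:)` are hypothetical names, not part of the code in this post.

```swift
import Foundation

/// Peak and RMS level of a block of audio samples.
/// Hypothetical diagnostic helper; not part of SpeechToTextManager.
struct AudioLevel {
    let peak: Float
    let rms: Float
}

/// Computes levels for `data`, ASSUMING it holds native-endian Float32 PCM samples.
func audioLevel(of data: Data) -> AudioLevel {
    let bytesPerSample = MemoryLayout<Float>.size
    let sampleCount = data.count / bytesPerSample
    guard sampleCount > 0 else { return AudioLevel(peak: 0, rms: 0) }

    var peak: Float = 0
    var sumOfSquares: Double = 0
    data.withUnsafeBytes { (bytes: UnsafeRawBufferPointer) in
        for index in 0..<sampleCount {
            // loadUnaligned avoids alignment traps on arbitrary Data storage.
            let sample = bytes.loadUnaligned(fromByteOffset: index * bytesPerSample, as: Float.self)
            peak = max(peak, abs(sample))
            sumOfSquares += Double(sample) * Double(sample)
        }
    }
    let rms = Float((sumOfSquares / Double(sampleCount)).squareRoot())
    return AudioLevel(peak: peak, rms: rms)
}
```

Logging the result for the captured buffer just before recognition ends would separate the two cases. A peak at or near zero suggests the simulator is feeding silence into the engine. A clearly non-zero peak points instead at how the audio reaches the recognition request.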

Code

private(set) var isRecording: Bool = false

private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var recognitionTask: SFSpeechRecognitionTask?

@MainActor
func startRecording() async throws {
    logger.debug("SpeechToTextManager.startRecording() called")

    guard !isRecording else {
        logger.warning("Cannot start recording: Already recording.")
        throw AppError.alreadyRecording
    }

    currentTranscript = ""
    processingError = nil
    audioBuffer = Data()
    isRecording = true

    do {
        try await configureAudioSession()

        try await Task.detached { [weak self] in
            guard let self = self else {
                throw AppError.internalError(description: "SpeechToTextManager instance deallocated during recording setup.")
            }

            try await self.audioProcessor.configureAudioEngine()

            let (recognizer, request) = try await MainActor.run { () -> (SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest) in
                guard let result = self.createRecognitionRequest() else {
                    throw AppError.configurationError(description: "Speech recognition not available or SFSpeechRecognizer initialization failed.")
                }
                return result
            }

            await MainActor.run {
                self.recognitionRequest = request
            }

            await MainActor.run {
                self.recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
                    guard let self = self else { return }

                    if let error = error {
                        // WE ENTER INTO THIS BLOCK, ALWAYS
                        self.logger.error("Recognition task error: \(error.localizedDescription)")
                        self.processingError = .speechRecognitionError(description: error.localizedDescription)
                        return
                    }
                . . .
                }
            }
            . . .
        }.value
    } catch {
       . . .
    }
}

@MainActor
func stopRecording() {
    logger.debug("SpeechToTextManager.stopRecording() called")
    
    guard isRecording else {
        logger.debug("Not recording, nothing to do")
        return
    }
    
    isRecording = false
    
    Task.detached { [weak self] in
        guard let self = self else { return }
        
        await self.audioProcessor.stopEngine()
        
        let finalBuffer = await self.audioProcessor.getAudioBuffer()
        
        await MainActor.run {
            self.recognitionRequest?.endAudio()
            self.recognitionTask?.cancel()
        }
        . . .
    }
}
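
One detail in stopRecording() that may be worth ruling out, independent of the simulator: `cancel()` is called immediately after `endAudio()`. `SFSpeechRecognitionTask` also offers `finish()`, which stops accepting new audio but still processes what has already been received, whereas `cancel()` abandons the task. A minimal sketch of the alternative stop sequence follows. `finishRecognition(request:task:)` is a hypothetical helper, and whether this changes the simulator behaviour is untested.

```swift
import Speech

/// Hypothetical helper: ends a recognition session while letting
/// already-appended audio be transcribed.
@MainActor
func finishRecognition(request: SFSpeechAudioBufferRecognitionRequest?,
                       task: SFSpeechRecognitionTask?) {
    // Signal that no more audio buffers will be appended to the request.
    request?.endAudio()
    // finish() keeps processing the audio received so far;
    // cancel() would abandon the task instead.
    task?.finish()
}
```

With this sequence the result handler should still be called with a final result or an error, so the request and task references can be released there rather than in stopRecording().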
