Thanks for being a part of WWDC25!

How did we do? We’d love to know your thoughts on this year’s conference. Take the survey here

Horrendous Swift overlay of vDSP Fourier Transform functions

I'm very unpleased to have found out the hard way, that vDSP.FFT<T> where T : vDSP_FourierTransformable and its associated vDSP_FourierTransformFunctions, only does real-complex conversion!!!

This is horribly misleading - the only hint that it calls vDSP_fft_zrop() (split-complex real-complex out-of-place) under the hood, not vDSP_fft_zop()[1], is "A 1D single- and double-precision fast Fourier transform." - instead of "a single- and double-precision complex fast Fourier transform". Holy ******* ****.

Just look at how people miss-call this routine. Poor him, realizing he had to log2n+1 lest only the first half of the array was transformed, not understanding why [2].

And poor me, having taken days investigating why a simple Swift overlay vDSP.FFT.transform may execute at 1.4x the speed of vDSP_fft_zopt(out-of-place with temporary buffer)! [3]


[1]: or better, vDSP_fft_zopt with the temp buffer alloc/dealloc taken care of, for us

[2]: for real-complex conversion, say real signal of length 16. log2n is 4 no problem, but then the real and imaginary vectors are both length 8. Also, vDSP_fft only works on integer powers of 2, so he had to choose next integer power of 2 (i.e. 1<<(log2n-1)) instead of plain length for his internal arrays.

[3]: you guessed it. fft_zrop(log2n-1, ...) vs. fft_zop(log2n, ...). Half the problem size lol.


Now we have vDSP.DiscreteFourierTransform, which wraps vDSP_DFT routines and "calls FFT routines when the size allows it", and works too for interleaved complexes. Just go all DFT, right?

if let setup_f = try? vDSP.DiscreteFourierTransform(previous: nil, count: 8, direction: .forward, transformType: .complexComplex, ofType: DSPComplex.self) {

    // Done forward transformation
    // and scaled the results with 1/N
    // How about going reverse?
    if let setup_r = try? vDSP.DiscreteFourierTransform(previous: setup_f, count: 8, direction: .inverse, transformType: .complexComplex, ofType: DSPComplex.self) {
    // Error: cannot convert vDSP.DiscreteFourierTransform<DSPComplex> to vDSP.DiscreteFourierTransform<Float>
    // lolz
    }
}

This API appeared in macOS 12. 3 years later nobody have ever noticed. I'm at a loss for words.

I had to refresh my mind on FFT. Found this interesting reference, may be of interest for those of us who need as well.

https://www.analog.com/media/en/technical-documentation/dsp-book/dsp_book_Ch31.pdf

Hi,

Many thanks for your post.

Regarding the vDSP.DiscreteFourierTransform initializer API, This is an issue we're aware of, and you may find using the vDSP.DFT a suitable alternative. The vDSP.DFT explicitly supports complexComplex and complexReal.

Summary of related routines

Getting data to expected form/type

// 1. idiomatic Swift, but pyramid of doom
let a = [1,2,3,4]
a.withUnsafeBufferPointer { pa in
    var one : Int32 = 1
    withUnsafeMutablePointer(to: &one) { pone in
        // LAPACK routines pass ALL parameters by reference.
        // Luckily most vDSP functions don't do this
        sgemm_(pa.baseAddress!,pone, ...)
    }
}

// 2. Manual memory management. Always remember to deallocate. At least we have type checks
let b = UnsafeMutableBufferPointer<Float>
        .allocate(capacity: 16)
let c = UnsafeMutableBufferPointer<Float>
        .allocate(capacity: 16)
defer { b.deallocate(); c.deallocate() }

Let's setup some data: 16 complex values in split complex form.

_ = b.initialize(from:[0,0.1,0.3,0.5,
                       0.7,0.9,1,1,
                       1,0.9,0.7,0.5,
                       0.3,0.1,0,0])
c[0..<12] = b[4..<16]
c[12..<16] = b[0..<4]
print(Array(c)) // Remember, array has value semantics, and Array(c) won't follow c's subsequent updates
var d = UnsafeMutableBufferPointer<Float>.allocate(capacity: 16)
var e = UnsafeMutableBufferPointer<Float>.allocate(capacity: 16)
var z = UnsafeMutableBufferPointer<DSPSplitComplex>.allocate(capacity: 2)
z[0] = DSPSplitComplex(realp: b.baseAddress!, imagp: c.baseAddress!)
z[1] = DSPSplitComplex(realp: d.baseAddress!, imagp: e.baseAddress!)

Now, routines by API sets. First of all, those bridged from C

var pi = z.baseAddress!.advanced(by: 0) // All operands are pass-by-reference
var po = z.baseAddress!.advanced(by: 1) // Even though DSPSplitComplex itself contains only references
let log2n : vDSP_Length = 4             // 16 = 2**4

Raw: vDSP

Complex FFT

// we have *complexes of length 16*, thus log2n=4
if let vsetup = vDSP_create_fftsetup( log2n, FFTDirection(FFT_FORWARD) ) {
    // complex out of place. No surprises
    vDSP_fft_zop(vsetup, pi, 1, po, 1, log2n, FFTDirection(FFT_FORWARD))
    print("FFT of 16 complexes:",Array(d),Array(e),separator: "\n")
    // while the header may state otherwise, none of the reverse routines actually do that 1/N scale
    var scale = Float(1) / Float(16)
    vDSP_vsmul(po.pointee.realp, 1, &scale, po.pointee.realp, 1, 16)
    vDSP_vsmul(z[1].imagp, 1, &scale, z[1].imagp, 1, 16)
    vDSP_fft_zip(vsetup, po, 1, log2n, FFTDirection(FFT_INVERSE)) // zip is in-place
    print("Reverse FFT, back to where we started:",Array(d),Array(e),separator: "\n")
    vDSP_destroy_fftsetup(vsetup)
}

Raw: vDSP

Real FFT

let tmpB = Array(b); let tmpC = Array(c) // Take snapshot
c.update(repeating: 0)


// Path1: just complex FFT with 50% wasted space. 
if let vsetup = vDSP_create_fftsetup( log2n, FFTDirection(FFT_FORWARD) ) {
    vDSP_fft_zop(vsetup, pi, 1, po, 1, log2n, FFTDirection(FFT_FORWARD))
    print("Complex FFT of Real signal, as reference:",Array(d),Array(e),"Note the symmetry.",separator: "\n")
    vDSP_destroy_fftsetup(vsetup)
}
d.update(repeating: 0); e.update(repeating: 0)
// Path2: dedicated real-complex transformation. zrop
b.withMemoryRebound(to:DSPComplex.self) {
    vDSP_ctoz(b.baseAddress!, 2, pi, 1, 8)
    // 16 reals, viewed as 8 complexes ^^^
}
b[8..<16].update(repeating:0)
print("Real signal interleaved",Array(b),Array(c),separator: "\n") // Complexes of length 8

// Still, we have *reals of length 16*, thus log2n=4
if let vsetup = vDSP_create_fftsetup( log2n, FFTDirection(FFT_FORWARD) ) {
    vDSP_fft_zrop(vsetup, pi, 1, po, 1, log2n, FFTDirection(FFT_FORWARD))
    print("Faster FFT of Real signal, packed:",Array(d),Array(e),"Everything's 2x the numbers above.",separator: "\n")
    var scale = Float(0.5) / Float(16) // scale is 1/2N this time.
    vDSP_vsmul(z[1].realp, 1, &scale, z[1].realp, 1, 8)
    vDSP_vsmul(z[1].imagp, 1, &scale, z[1].imagp, 1, 8)
    vDSP_fft_zrip(vsetup, po, 1, log2n, FFTDirection(FFT_INVERSE)) // zrip is in-place
    print("Reverse FFT, back to where we started:",Array(d),Array(e),separator: "\n")
    vDSP_destroy_fftsetup(vsetup)
}
d.update(repeating: 0); e.update(repeating: 0)

Very thin overlay: vDSP.FourierTransformable & co.

Real FFT ONLY

// 1. note it's still *reals of length 16*, thus log2n=4
if let setup = DSPSplitComplex.FFTFunctions.makeFFTSetup(log2n: log2n, radix: .radix2) {
    DSPSplitComplex.FFTFunctions.transform(fftSetup: setup, log2n: log2n, source: pi, destination: po, direction: .forward)
    DSPSplitComplex.FFTFunctions.destroySetup(setup)
}
print("Look! Packed real-complex FFT!"Array(d),Array(e),separator: "\n"); d.update(repeating: 0); e.update(repeating: 0)
// 2. So thin that the setup is interchangeable with C-side
if var setup = vDSP_SplitComplexFloat.makeFFTSetup(log2n: log2n, radix: .radix2) {
    vDSP_SplitComplexFloat.transform(fftSetup: setup, log2n: log2n, source: pi, destination: po, direction: .forward)
    print(Array(d),Array(e))
    withUnsafeBytes(of: &setup) { setupp in
        let p = setupp.bindMemory(to: uintptr_t.self)
        let setup = OpaquePointer(bitPattern: p[0])!
        vDSP_fft_zrip(setup, po, 1, log2n, FFTDirection(FFT_INVERSE))
        print("Forgot to scale with 1/2N before transforming back:Array(d),Array(e),separator: "\n"")
        vDSP_destroy_fftsetup(setup)
    }
}
d.update(repeating: 0); e.update(repeating: 0)

Idiomatic, thicker overlay: vDSP.FFT

Real FFT ONLY

// Of Type SplitComplex. Who could have thought this is a zrop.
if let setup = vDSP.FFT(log2n: log2n, radix: .radix2, ofType: DSPSplitComplex.self) {
    // Idiomatic as in, setup is typed,
    // setup is destroyed automatically at deinit(),
    // and the operands (which are already references) are not pass-by-reference
    setup.transform(input: z[0], output: &z[1], direction: .forward)
    print("See? Packed results:",Array(d),Array(e),separator: "\n")
    let scale = Float(0.5) / Float(16) 
    // Vectors are length 8, but log2n needs to be 4,
    // and scale needs to be 1/32.
    // Signs of a packed real-complex transformation.
    vDSP.multiply(scale, d, result:&d)
    vDSP.multiply(scale, e, result:&e)
    print("Expect:",Array(b),Array(c))
    setup.transform(input: z[1], output: &z[0], direction: .inverse)
    print("Actual:",Array(b),Array(c))
}

Idiomatic, thicker overlay: vDSP.DiscreteFourierTransform

real-complex and complex-complex

// Demonstrating real-complex here
// Much simpler interface. I wonder if there's a lock underneath
// because we shouldn't modify (create or destroy) a setup
// while a routine using [it or one sharing memory with it] is running.
// Count 16 for split complexes
if let setup = try? vDSP.DiscreteFourierTransform(previous: nil, count: 16, direction: .forward, transformType: .complexReal, ofType: Float.self) {
    setup.transform(inputReal: b, inputImaginary: c, outputReal: &d, outputImaginary: &c)
    print("See? Packed results just like above:",Array(d),Array(e))
}

Actually, DFT can operate on interleaved complexes, so we don't have to convert the real signal anymore - just act as if it were interleaved complex:

_ = b.update(fromContentsOf: tmpB)
b.withMemoryRebound(to: DSPComplex.self) { b in
d.withMemoryRebound(to: DSPComplex.self) { d in
var d = d
// Count 8 for interleaved complexes???
if let setup = try? vDSP.DiscreteFourierTransform(previous: nil, count: 8, direction: .forward, transformType: .complexReal, ofType: DSPComplex.self) {
    setup.transform(input:b, output: &d)
    vDSP.convert(interleavedComplexVector: Array(d),toSplitComplexVector: &z[1])
}
}
}
print("See? Packed results just like above:",Array(d),Array(e))

Have to fight with the type checker though, and probably messes codegen once we have pointer authentication in user space.

Horrendous Swift overlay of vDSP Fourier Transform functions
 
 
Q