Accelerate

RSS for tag

Make large-scale mathematical computations and image calculations with high-performance, energy-efficient computation using Accelerate.

Posts under Accelerate tag

26 Posts
Sort by:

Post

Replies

Boosts

Views

Activity

MLTensor computation took more time than expected.
func testMLTensor() { let t1 = MLTensor(shape: [2000, 1], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 2000), scalarType: Float.self) let t2 = MLTensor(shape: [1, 3000], scalars: [Float](repeating: Float.random(in: 0.0...10.0), count: 3000), scalarType: Float.self) for _ in 0...50 { let t = Date() let x = (t1 * t2) print("MLTensor", t.timeIntervalSinceNow * 1000, "ms") } } testMLTensor() The above code took more time than expected, especially in the early stage of iteration.
0
0
589
Aug ’24
Documentation and usage of BNNS.NormalizationLayer
Hello everybody, I am running into an error with BNNS.NormalizationLayer. It appears to only work with .vector, and matrix shapes throws layerApplyFail during training. Inference doesn't throw but the output stays the same. How to correctly use BNNS.NormalizationLayer with matrix shapes? How to debug layerApplyFail exception? Thanks let array: [Float32] = [ 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, ] // let inputShape: BNNS.Shape = .vector(6 * 3) // works let inputShape: BNNS.Shape = .matrixColumnMajor(6, 3) let input = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape) let output = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape) let beta = BNNSNDArrayDescriptor.allocate(repeating: Float32(0), shape: inputShape, batchSize: 1) let gamma = BNNSNDArrayDescriptor.allocate(repeating: Float32(1), shape: inputShape, batchSize: 1) let activation: BNNS.ActivationFunction = .identity let layer = BNNS.NormalizationLayer(type: .layer(normalizationAxis: 0), input: input, output: output, beta: beta, gamma: gamma, epsilon: 1e-12, activation: activation)! let layerInput = BNNSNDArrayDescriptor.allocate(initializingFrom: array, shape: inputShape) let layerOutput = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape) // try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .inference) // No throw try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .training) _ = layerOutput.makeArray(of: Float32.self) // All zeros when .inference
1
0
840
Jul ’24
Performant alternative to scaling a CIImage / PixelBuffer
Hey, I’m building a camera app where I am applying real time effects to the view finder. One of those effects is a variable blur, so to improve performance I am scaling down the input image using CIFilter.lanczosScaleTransform(). This works fine and runs at 30FPS, but when running the metal profiler I can see that the scaling transforms use a lot of GPU time, almost as much as the variable blur. Is there a more efficient way to do this? The simplified chain is like this: Scale down viewFinder CVPixelBuffer (CIFilter.lanczosScaleTransform) Scale up depthMap CVPixelBuffer to match viewFinder size (CIFilter.lanczosScaleTransform) Create CIImages from both CVPixelBuffers Apply VariableDepthBlur (CIFilter.maskedVariableBlur) Scale up final image to metal view size (CIFilter.lanczosScaleTransform) Render CIImage to a MTKView using CIRenderDestination From some research, I wonder if scaling the CVPixelBuffer using the accelerate framework would be faster? Also, Instead of scaling the final image, perhaps I could offload this to the metal view? Any pointers greatly appreciated!
2
0
999
Jul ’24
Peculiar EXC_BAD_ACCESS, involving sparse matrices
Helo all, Currently, I'm working on an iOS app that performs measurement and shows the results to the user in a graph. I use a Savitzky-Golay filter to filter out noise, so that the graph is nice and smooth. However, the code that calculates the Savitzky-Golay coefficients using sparse matrices crashes sometimes, throwing an EXC_BAD_ACCESS. I tried to find out what the problem is by turning on Address Sanitizer and Thread Sanitizer, but, for some reason, the bad access exception isn't thrown when either of these is on. What else could I try to trace back the problem? Thanks in advance, CaS To reproduce the error, run the following: import SwiftUI import Accelerate struct ContentView: View { var body: some View { VStack { Button("Try", action: test) } .padding() } func test() { for windowLength in 3...100 { let coeffs = SavitzkyGolay.coefficients(windowLength: windowLength, polynomialOrder: 2) print(coeffs) } } } class SavitzkyGolay { static func coefficients(windowLength: Int, polynomialOrder: Int, derivativeOrder: Int = 0, delta: Int = 1) -> [Double] { let (halfWindow, remainder) = windowLength.quotientAndRemainder(dividingBy: 2) var pos = Double(halfWindow) if remainder == 0 { pos -= 0.5 } let X = [Double](stride(from: Double(windowLength) - pos - 1, through: -pos, by: -1)) let P = [Double](stride(from: 0, through: Double(polynomialOrder), by: 1)) let A = P.map { exponent in X.map { pow($0, exponent) } } var B = [Double](repeating: 0, count: polynomialOrder + 1) B[derivativeOrder] = Double(factorial(derivativeOrder)) / pow(Double(delta), Double(derivativeOrder)) return leastSquaresSolution(A: A, B: B) } static func leastSquaresSolution(A: [[Double]], B: [Double]) -> [Double] { let sparseA = A.sparseMatrix() var sparseAValuesCopy = sparseA.values var xValues = [Double](repeating: 0, count: A.transpose().count) var bValues = B sparseAValuesCopy.withUnsafeMutableBufferPointer { valuesPtr in let a = SparseMatrix_Double( structure: sparseA.structure, data: valuesPtr.baseAddress! ) bValues.withUnsafeMutableBufferPointer { bPtr in xValues.withUnsafeMutableBufferPointer { xPtr in let b = DenseVector_Double( count: Int32(B.count), data: bPtr.baseAddress! ) let x = DenseVector_Double( count: Int32(A.transpose().count), data: xPtr.baseAddress! ) #warning("EXC_BAD_ACCESS is thrown below") print("This code is executed...") let status = SparseSolve(SparseLSMR(), a, b, x, SparsePreconditionerDiagScaling) print("...but, if an EXC_BAD_ACCESS is thrown, this code isn't") if status != SparseIterativeConverged { fatalError("Failed to converge. Returned with error \(status).") } } } } return xValues } } func factorial(_ n: Int) -> Int { n < 2 ? 1 : n * factorial(n - 1) } extension Array where Element == [Double] { func sparseMatrix() -> (structure: SparseMatrixStructure, values: [Double]) { let columns = self.transpose() var rowIndices: [Int32] = columns.map { column in column.indices.compactMap { indexInColumn in if column[indexInColumn] != 0 { return Int32(indexInColumn) } return nil } }.reduce([], +) let sparseColumns = columns.map { column in column.compactMap { if $0 != 0 { return $0 } return nil } } var counter = 0 var columnStarts = [Int]() for sparseColumn in sparseColumns { columnStarts.append(counter) counter += sparseColumn.count } let reducedSparseColumns = sparseColumns.reduce([], +) columnStarts.append(reducedSparseColumns.count) let structure: SparseMatrixStructure = rowIndices.withUnsafeMutableBufferPointer { rowIndicesPtr in columnStarts.withUnsafeMutableBufferPointer { columnStartsPtr in let attributes = SparseAttributes_t() return SparseMatrixStructure( rowCount: Int32(self.count), columnCount: Int32(columns.count), columnStarts: columnStartsPtr.baseAddress!, rowIndices: rowIndicesPtr.baseAddress!, attributes: attributes, blockSize: 1 ) } } return (structure, reducedSparseColumns) } func transpose() -> Self { let columns = self.count let rows = self.reduce(0) { Swift.max($0, $1.count) } return (0 ..< rows).reduce(into: []) { result, row in result.append((0 ..< columns).reduce(into: []) { result, column in result.append(row < self[column].count ? self[column][row] : 0) }) } } }
11
0
1.4k
Jul ’24
Data storage for a Matrix struct when working with Accelerate
I have a Matrix structure as defined below for working with 2D numerical data in Accelerate. The underlying numerical data in this Matrix struct is stored as an Array. struct Matrix<T> { let rows: Int let columns: Int var data: [T] init(rows: Int, columns: Int, fill: T) { self.rows = rows self.columns = columns self.data = Array(repeating: fill, count: rows * columns) } init(rows: Int, columns: Int, source: (inout UnsafeMutableBufferPointer<T>) -> Void) { self.rows = rows self.columns = columns self.data = Array(unsafeUninitializedCapacity: rows * columns) { buffer, initializedCount in source(&buffer) initializedCount = rows * columns } } subscript(row: Int, column: Int) -> T { get { return self.data[(row * self.columns) + column] } set { self.data[(row * self.columns) + column] = newValue } } } Multiplication is implemented by the functions shown below. import Accelerate infix operator .* func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> { precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions") let result = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in vDSP.multiply(lhs.data, rhs.data, result: &buffer) } return result } func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> { precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix") var a = lhs.data var b = rhs.data let m = lhs.rows // number of rows in matrices A and C let n = rhs.columns // number of columns in matrices B and C let k = lhs.columns // number of columns in matrix A; number of rows in matrix B let alpha = 1.0 let beta = 0.0 // matrix multiplication where C ← αAB + βC let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, &a, k, &b, n, beta, buffer.baseAddress, n) } return c } I can also define a Matrix structure where the underlying data is an UnsafeMutableBufferPointer. The buffer is handled by the MatrixData class. struct Matrix<T> { let rows: Int let columns: Int var data: MatrixData<T> init(rows: Int, columns: Int, fill: T) { self.rows = rows self.columns = columns self.data = MatrixData(count: rows * columns, fill: fill) } init(rows: Int, columns: Int) { self.rows = rows self.columns = columns self.data = MatrixData(count: rows * columns) } subscript(row: Int, column: Int) -> T { get { return self.data.buffer[(row * self.columns) + column] } set { self.data.buffer[(row * self.columns) + column] = newValue } } } class MatrixData<T> { var buffer: UnsafeMutableBufferPointer<T> var baseAddress: UnsafeMutablePointer<T> { get { self.buffer.baseAddress! } } init(count: Int, fill: T) { let start = UnsafeMutablePointer<T>.allocate(capacity: count) self.buffer = UnsafeMutableBufferPointer(start: start, count: count) self.buffer.initialize(repeating: fill) } init(count: Int) { let start = UnsafeMutablePointer<T>.allocate(capacity: count) self.buffer = UnsafeMutableBufferPointer(start: start, count: count) } deinit { self.buffer.deinitialize() self.buffer.deallocate() } } Multiplication for this approach is implemented by the functions shown here. import Accelerate infix operator .* func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> { precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions") let result = Matrix<Double>(rows: lhs.rows, columns: lhs.columns) vDSP.multiply(lhs.data.buffer, rhs.data.buffer, result: &result.data.buffer) return result } func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> { precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix") let a = lhs.data.baseAddress let b = rhs.data.baseAddress let m = lhs.rows // number of rows in matrices A and C let n = rhs.columns // number of columns in matrices B and C let k = lhs.columns // number of columns in matrix A; number of rows in matrix B let alpha = 1.0 let beta = 0.0 // matrix multiplication where C ← αAB + βC let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, k, b, n, beta, c.data.baseAddress, n) return c } Both of these approaches give me similar performance. The only difference that I have noticed is the matrix buffer approach allows for reference semantics. For example, the code below uses half the memory with the matrix buffer approach compared to the matrix array approach. This is because b acts as a reference to a using the matrix buffer approach; otherwise, the matrix array approach makes a full copy of a. let n = 10_000 let a = Matrix<Double>(rows: n, columns: n, fill: 0) var b = a b[0, 0] = 99 b[0, 1] = 22 Other than reference semantics, are there any reasons to use one of these approaches over the other?
3
0
901
Jun ’24
Scipy problems with OpenBLAS and Accelerate
I'm using M1pro and have successfully installed Numpy with Accelerate following, and it really speedup my programs. I also ran np.test() to check the correctness and every test passed. However, I can't install Scipy with Accelerate, since the official document said Accelerate has a LAPACK of too old version. I can't even find a scipy that can pass scipy.test(). I tried the codes below: conda install numpy 'libblas=*=*accelerate' conda install scipy np.test() as fails, sp.test() can't even finish conda install numpy 'libblas=*=*openblas' conda install scipy Both np.test() and sp.test() can finish, but with many failures. I believe the bugs are due to Conda. pip install --no-binary :all: --no-use-pep517 numpy pip install scipy np.test() has no failure and went fast, sp.test() uses OpenBLAS and has 3 failures. This is the best version I have found. So my question is: can we find a reliable version of scipy on M1? Considering the popularity of scipy, I think it's not a high-living expectation. And a question for Apple: is there really a plan to upgrade the LAPACK in Accelerate?
2
0
2.6k
Jan ’25