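import CoreML
import Foundation

func testMLTensor() {
    // Draw a fresh random value per element (repeating: would evaluate Float.random only once).
    let t1 = MLTensor(shape: [2000, 1], scalars: (0..<2000).map { _ in Float.random(in: 0.0...10.0) }, scalarType: Float.self)
    let t2 = MLTensor(shape: [1, 3000], scalars: (0..<3000).map { _ in Float.random(in: 0.0...10.0) }, scalarType: Float.self)
    for _ in 0...50 {
        let start = Date()
        _ = t1 * t2 // broadcasts [2000, 1] * [1, 3000] to a [2000, 3000] tensor
        // Elapsed time since `start`; timeIntervalSinceNow on its own would be negative.
        print("MLTensor", Date().timeIntervalSince(start) * 1000, "ms")
    }
}
testMLTensor()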
This code takes longer than expected, especially in the first few iterations.
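A likely factor, stated here as an assumption to verify: MLTensor operations are dispatched asynchronously, so the loop above may time only how long it takes to enqueue `t1 * t2`, and the first iterations may also pay one-time setup costs. The sketch below waits for each result with `shapedArray(of:)` before stopping the clock and discards a few warm-up iterations. The function name `timeMLTensorMultiply` and its `iterations`/`warmUp` parameters are hypothetical.

```swift
import CoreML

/// Hypothetical benchmark helper; assumes macOS 15 / iOS 18, where MLTensor is available.
@available(macOS 15.0, iOS 18.0, *)
func timeMLTensorMultiply(iterations: Int = 50, warmUp: Int = 5) async -> [Duration] {
    let t1 = MLTensor(shape: [2000, 1], scalars: (0..<2000).map { _ in Float.random(in: 0...10) }, scalarType: Float.self)
    let t2 = MLTensor(shape: [1, 3000], scalars: (0..<3000).map { _ in Float.random(in: 0...10) }, scalarType: Float.self)
    let clock = ContinuousClock()
    var timings: [Duration] = []
    for iteration in 0..<(warmUp + iterations) {
        let start = clock.now
        let product = t1 * t2 // broadcast to shape [2000, 3000]
        // Materialize the result so the computation has finished before the clock stops.
        let result = await product.shapedArray(of: Float.self)
        let elapsed = start.duration(to: clock.now)
        precondition(result.shape == [2000, 3000], "unexpected result shape")
        if iteration >= warmUp {
            timings.append(elapsed) // keep only post-warm-up measurements
        }
    }
    return timings
}
```

Calling it from an async context, for example `for t in await timeMLTensorMultiply() { print("MLTensor", t) }`, reports the time including materialization, which is closer to what a caller actually waits for than the dispatch time alone.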
Hello everybody,
I am running into an error with BNNS.NormalizationLayer. It appears to work only with .vector shapes: with matrix shapes, apply throws layerApplyFail during training. Inference doesn't throw, but the output stays unchanged (all zeros).
How do I correctly use BNNS.NormalizationLayer with matrix shapes? And how can I debug a layerApplyFail exception?
Thanks
import Accelerate

let array: [Float32] = [
    01, 02, 03, 04, 05, 06,
    07, 08, 09, 10, 11, 12,
    13, 14, 15, 16, 17, 18,
]
// let inputShape: BNNS.Shape = .vector(6 * 3) // works
let inputShape: BNNS.Shape = .matrixColumnMajor(6, 3)
let input = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
let output = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
let beta = BNNSNDArrayDescriptor.allocate(repeating: Float32(0), shape: inputShape, batchSize: 1)
let gamma = BNNSNDArrayDescriptor.allocate(repeating: Float32(1), shape: inputShape, batchSize: 1)
let activation: BNNS.ActivationFunction = .identity
let layer = BNNS.NormalizationLayer(type: .layer(normalizationAxis: 0), input: input, output: output, beta: beta, gamma: gamma, epsilon: 1e-12, activation: activation)!
let layerInput = BNNSNDArrayDescriptor.allocate(initializingFrom: array, shape: inputShape)
let layerOutput = BNNSNDArrayDescriptor.allocateUninitialized(scalarType: Float32.self, shape: inputShape)
do {
    // try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .inference) // No throw
    try layer.apply(batchSize: 1, input: layerInput, output: layerOutput, for: .training) // Throws layerApplyFail
} catch {
    print("apply failed:", error)
}
_ = layerOutput.makeArray(of: Float32.self) // All zeros when .inference
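To debug the output, it helps to have a known-correct result to compare `layerOutput` against. The plain-Swift sketch below computes layer normalization, y = γ·(x − μ)/√(σ² + ε) + β, over consecutive groups of `groupSize` elements, with a single scalar γ and β for simplicity. Which elements BNNS actually treats as one group for `.layer(normalizationAxis: 0)` on a `.matrixColumnMajor(6, 3)` shape is an assumption you would need to check; `layerNormReference` and `groupSize` are hypothetical names, not BNNS API.

```swift
/// Hypothetical CPU reference for layer normalization, for checking BNNS output by hand.
/// Normalizes each run of `groupSize` consecutive elements independently using the
/// population variance, then applies a scalar scale (gamma) and shift (beta).
func layerNormReference(_ x: [Float], groupSize: Int, gamma: Float = 1, beta: Float = 0, epsilon: Float = 1e-12) -> [Float] {
    precondition(groupSize > 0 && x.count % groupSize == 0, "x.count must be a multiple of groupSize")
    var y = [Float](repeating: 0, count: x.count)
    for start in stride(from: 0, to: x.count, by: groupSize) {
        let group = x[start ..< start + groupSize]
        let mean = group.reduce(Float(0), +) / Float(groupSize)
        let variance = group.reduce(Float(0)) { $0 + ($1 - mean) * ($1 - mean) } / Float(groupSize)
        let inverseStd = 1 / (variance + epsilon).squareRoot()
        for i in group.indices {
            y[i] = gamma * (x[i] - mean) * inverseStd + beta
        }
    }
    return y
}

// Compare against layerOutput.makeArray(of: Float32.self) from the code above.
let expected = layerNormReference((1 ... 18).map { Float($0) }, groupSize: 6)
print(expected)
```

With `groupSize: 6`, each run of six consecutive inputs is normalized on its own; with `groupSize: 18`, the whole array is normalized as one group, which may correspond to the working `.vector(6 * 3)` case.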
Hey, I’m building a camera app that applies real-time effects to the viewfinder. One of those effects is a variable blur, so to improve performance I scale down the input image using CIFilter.lanczosScaleTransform(). This works fine and runs at 30 FPS, but the Metal profiler shows that the scaling transforms use a lot of GPU time, almost as much as the variable blur itself. Is there a more efficient way to do this?
The simplified chain looks like this:
1. Scale down the viewFinder CVPixelBuffer (CIFilter.lanczosScaleTransform)
2. Scale up the depthMap CVPixelBuffer to match the viewFinder size (CIFilter.lanczosScaleTransform)
3. Create CIImages from both CVPixelBuffers
4. Apply VariableDepthBlur (CIFilter.maskedVariableBlur)
5. Scale up the final image to the Metal view size (CIFilter.lanczosScaleTransform)
6. Render the CIImage to an MTKView using CIRenderDestination
From some research, I wonder whether scaling the CVPixelBuffer with the Accelerate framework would be faster. Also, instead of scaling the final image, could I offload that step to the Metal view?
Any pointers greatly appreciated!
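One option, sketched here as an assumption rather than a measured fix, is to replace the Lanczos filters with a plain affine scale: `CIImage.transformed(by:)` lets Core Image resample with its default linear sampling, which is generally cheaper than a Lanczos kernel at some cost in sharpness. The helper names `scaled(_:to:)` and `depthBlurred(viewFinder:depthMap:workingSize:radius:)` and their parameters are hypothetical; whether this beats the Lanczos version in this pipeline needs to be confirmed in the Metal profiler.

```swift
import CoreGraphics
import CoreImage
import CoreImage.CIFilterBuiltins
import CoreVideo

/// Hypothetical helper: scales an image (with its extent at the origin) to `size`
/// using an affine transform instead of CIFilter.lanczosScaleTransform.
func scaled(_ image: CIImage, to size: CGSize) -> CIImage {
    let sx = size.width / image.extent.width
    let sy = size.height / image.extent.height
    return image.transformed(by: CGAffineTransform(scaleX: sx, y: sy))
}

/// Hypothetical pipeline: bring both buffers to one working size, then blur.
func depthBlurred(viewFinder: CVPixelBuffer, depthMap: CVPixelBuffer,
                  workingSize: CGSize, radius: Float) -> CIImage? {
    let frame = scaled(CIImage(cvPixelBuffer: viewFinder), to: workingSize)
    let mask = scaled(CIImage(cvPixelBuffer: depthMap), to: workingSize)
    let blur = CIFilter.maskedVariableBlur()
    blur.inputImage = frame
    blur.mask = mask
    blur.radius = radius
    // The blur grows the extent; crop back to the frame.
    return blur.outputImage?.cropped(to: frame.extent)
}
```

For the final upscale, setting `autoResizeDrawable = false` on the MTKView and giving it a smaller `drawableSize` lets the layer scale the drawable up when it is composited, which may avoid a separate scaling pass; this is also worth profiling rather than assuming.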
Topic: Media Technologies
SubTopic: Photos & Camera
Tags: Metal, Camera, Accelerate, Photos and Imaging
Hello all,
I'm working on an iOS app that performs measurements and shows the results to the user in a graph. I use a Savitzky-Golay filter to remove noise so that the graph is nice and smooth. However, the code that calculates the Savitzky-Golay coefficients using sparse matrices sometimes crashes with EXC_BAD_ACCESS. I tried to track down the problem by turning on Address Sanitizer and Thread Sanitizer, but for some reason the crash doesn't happen when either of them is enabled. What else could I try to trace the problem?
Thanks in advance,
CaS
To reproduce the error, run the following:
import SwiftUI
import Accelerate

struct ContentView: View {
    var body: some View {
        VStack {
            Button("Try", action: test)
        }
        .padding()
    }

    func test() {
        for windowLength in 3...100 {
            let coeffs = SavitzkyGolay.coefficients(windowLength: windowLength, polynomialOrder: 2)
            print(coeffs)
        }
    }
}

class SavitzkyGolay {
    static func coefficients(windowLength: Int, polynomialOrder: Int, derivativeOrder: Int = 0, delta: Int = 1) -> [Double] {
        let (halfWindow, remainder) = windowLength.quotientAndRemainder(dividingBy: 2)
        var pos = Double(halfWindow)
        if remainder == 0 {
            pos -= 0.5
        }
        let X = [Double](stride(from: Double(windowLength) - pos - 1, through: -pos, by: -1))
        let P = [Double](stride(from: 0, through: Double(polynomialOrder), by: 1))
        let A = P.map { exponent in
            X.map {
                pow($0, exponent)
            }
        }
        var B = [Double](repeating: 0, count: polynomialOrder + 1)
        B[derivativeOrder] = Double(factorial(derivativeOrder)) / pow(Double(delta), Double(derivativeOrder))
        return leastSquaresSolution(A: A, B: B)
    }

    static func leastSquaresSolution(A: [[Double]], B: [Double]) -> [Double] {
        let sparseA = A.sparseMatrix()
        var sparseAValuesCopy = sparseA.values
        var xValues = [Double](repeating: 0, count: A.transpose().count)
        var bValues = B
        sparseAValuesCopy.withUnsafeMutableBufferPointer { valuesPtr in
            let a = SparseMatrix_Double(
                structure: sparseA.structure,
                data: valuesPtr.baseAddress!
            )
            bValues.withUnsafeMutableBufferPointer { bPtr in
                xValues.withUnsafeMutableBufferPointer { xPtr in
                    let b = DenseVector_Double(
                        count: Int32(B.count),
                        data: bPtr.baseAddress!
                    )
                    let x = DenseVector_Double(
                        count: Int32(A.transpose().count),
                        data: xPtr.baseAddress!
                    )
                    #warning("EXC_BAD_ACCESS is thrown below")
                    print("This code is executed...")
                    let status = SparseSolve(SparseLSMR(), a, b, x, SparsePreconditionerDiagScaling)
                    print("...but, if an EXC_BAD_ACCESS is thrown, this code isn't")
                    if status != SparseIterativeConverged {
                        fatalError("Failed to converge. Returned with error \(status).")
                    }
                }
            }
        }
        return xValues
    }
}

func factorial(_ n: Int) -> Int {
    n < 2 ? 1 : n * factorial(n - 1)
}

extension Array where Element == [Double] {
    func sparseMatrix() -> (structure: SparseMatrixStructure, values: [Double]) {
        let columns = self.transpose()
        var rowIndices: [Int32] = columns.map { column in
            column.indices.compactMap { indexInColumn in
                if column[indexInColumn] != 0 {
                    return Int32(indexInColumn)
                }
                return nil
            }
        }.reduce([], +)
        let sparseColumns = columns.map { column in
            column.compactMap {
                if $0 != 0 {
                    return $0
                }
                return nil
            }
        }
        var counter = 0
        var columnStarts = [Int]()
        for sparseColumn in sparseColumns {
            columnStarts.append(counter)
            counter += sparseColumn.count
        }
        let reducedSparseColumns = sparseColumns.reduce([], +)
        columnStarts.append(reducedSparseColumns.count)
        let structure: SparseMatrixStructure = rowIndices.withUnsafeMutableBufferPointer { rowIndicesPtr in
            columnStarts.withUnsafeMutableBufferPointer { columnStartsPtr in
                let attributes = SparseAttributes_t()
                return SparseMatrixStructure(
                    rowCount: Int32(self.count),
                    columnCount: Int32(columns.count),
                    columnStarts: columnStartsPtr.baseAddress!,
                    rowIndices: rowIndicesPtr.baseAddress!,
                    attributes: attributes,
                    blockSize: 1
                )
            }
        }
        return (structure, reducedSparseColumns)
    }

    func transpose() -> Self {
        let columns = self.count
        let rows = self.reduce(0) { Swift.max($0, $1.count) }
        return (0 ..< rows).reduce(into: []) { result, row in
            result.append((0 ..< columns).reduce(into: []) { result, column in
                result.append(row < self[column].count ? self[column][row] : 0)
            })
        }
    }
}
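A likely cause, offered as a diagnosis to verify rather than a confirmed answer: in `sparseMatrix()`, the `columnStartsPtr` and `rowIndicesPtr` pointers stored in the returned `SparseMatrixStructure` come from `withUnsafeMutableBufferPointer`, and Swift only guarantees such pointers inside the closure. Once `sparseMatrix()` returns, `rowIndices` and `columnStarts` can be freed, so `SparseSolve` may read dangling memory; sanitizers change how memory is allocated, which could explain why the crash disappears when they are on. The sketch below (the name `leastSquaresSolutionScoped(A:B:)` is hypothetical) builds the same compressed-column arrays and calls `SparseSolve` with the same `SparseLSMR()` method and `SparsePreconditionerDiagScaling` preconditioner, but only inside the nested closures that vend the pointers. It returns `nil` instead of calling `fatalError` when LSMR does not converge.

```swift
import Accelerate

/// Hypothetical replacement for `leastSquaresSolution(A:B:)`. `A` uses the question's
/// layout (an array of rows); zero entries are skipped when building CSC storage.
func leastSquaresSolutionScoped(A: [[Double]], B: [Double]) -> [Double]? {
    let rowCount = A.count
    let columnCount = A.map(\.count).max() ?? 0
    precondition(B.count == rowCount, "B needs one entry per row of A")

    // Compressed sparse column (CSC) arrays.
    var columnStarts: [Int] = [0]
    var rowIndices: [Int32] = []
    var values: [Double] = []
    for column in 0..<columnCount {
        for row in 0..<rowCount where column < A[row].count && A[row][column] != 0 {
            rowIndices.append(Int32(row))
            values.append(A[row][column])
        }
        columnStarts.append(values.count)
    }
    precondition(!values.isEmpty, "A must have at least one nonzero entry")

    var b = B
    var x = [Double](repeating: 0, count: columnCount)

    // Nest the closures so no pointer outlives the array it points into.
    let status: SparseIterativeStatus_t = columnStarts.withUnsafeMutableBufferPointer { startsPtr in
        rowIndices.withUnsafeMutableBufferPointer { indicesPtr in
            values.withUnsafeMutableBufferPointer { valuesPtr in
                b.withUnsafeMutableBufferPointer { bPtr in
                    x.withUnsafeMutableBufferPointer { xPtr -> SparseIterativeStatus_t in
                        let structure = SparseMatrixStructure(
                            rowCount: Int32(rowCount),
                            columnCount: Int32(columnCount),
                            columnStarts: startsPtr.baseAddress!,
                            rowIndices: indicesPtr.baseAddress!,
                            attributes: SparseAttributes_t(),
                            blockSize: 1
                        )
                        let a = SparseMatrix_Double(structure: structure, data: valuesPtr.baseAddress!)
                        let bVector = DenseVector_Double(count: Int32(rowCount), data: bPtr.baseAddress!)
                        let xVector = DenseVector_Double(count: Int32(columnCount), data: xPtr.baseAddress!)
                        return SparseSolve(SparseLSMR(), a, bVector, xVector, SparsePreconditionerDiagScaling)
                    }
                }
            }
        }
    }
    return status == SparseIterativeConverged ? x : nil
}
```

Calling this from `coefficients(...)` in place of `leastSquaresSolution(A:B:)` removes the escaping pointers; whether that alone stops the crash is something to confirm on a device.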
Topic: Programming Languages
SubTopic: Swift
Tags: iOS, Swift, Accelerate, Xcode Sanitizers and Runtime Issues
I have a Matrix structure as defined below for working with 2D numerical data in Accelerate. The underlying numerical data in this Matrix struct is stored as an Array.
struct Matrix<T> {
    let rows: Int
    let columns: Int
    var data: [T]

    init(rows: Int, columns: Int, fill: T) {
        self.rows = rows
        self.columns = columns
        self.data = Array(repeating: fill, count: rows * columns)
    }

    init(rows: Int, columns: Int, source: (inout UnsafeMutableBufferPointer<T>) -> Void) {
        self.rows = rows
        self.columns = columns
        self.data = Array(unsafeUninitializedCapacity: rows * columns) { buffer, initializedCount in
            source(&buffer)
            initializedCount = rows * columns
        }
    }

    subscript(row: Int, column: Int) -> T {
        get { return self.data[(row * self.columns) + column] }
        set { self.data[(row * self.columns) + column] = newValue }
    }
}
Multiplication is implemented by the functions shown below.
import Accelerate

infix operator .*

func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions")
    let result = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in
        vDSP.multiply(lhs.data, rhs.data, result: &buffer)
    }
    return result
}

func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
    var a = lhs.data
    var b = rhs.data
    let m = lhs.rows     // number of rows in matrices A and C
    let n = rhs.columns  // number of columns in matrices B and C
    let k = lhs.columns  // number of columns in matrix A; number of rows in matrix B
    let alpha = 1.0
    let beta = 0.0
    // matrix multiplication where C ← αAB + βC
    let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns) { buffer in
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, &a, k, &b, n, beta, buffer.baseAddress, n)
    }
    return c
}
I can also define a Matrix structure where the underlying data is an UnsafeMutableBufferPointer. The buffer is handled by the MatrixData class.
struct Matrix<T> {
    let rows: Int
    let columns: Int
    var data: MatrixData<T>

    init(rows: Int, columns: Int, fill: T) {
        self.rows = rows
        self.columns = columns
        self.data = MatrixData(count: rows * columns, fill: fill)
    }

    init(rows: Int, columns: Int) {
        self.rows = rows
        self.columns = columns
        self.data = MatrixData(count: rows * columns)
    }

    subscript(row: Int, column: Int) -> T {
        get { return self.data.buffer[(row * self.columns) + column] }
        set { self.data.buffer[(row * self.columns) + column] = newValue }
    }
}

class MatrixData<T> {
    var buffer: UnsafeMutableBufferPointer<T>

    var baseAddress: UnsafeMutablePointer<T> {
        get { self.buffer.baseAddress! }
    }

    init(count: Int, fill: T) {
        let start = UnsafeMutablePointer<T>.allocate(capacity: count)
        self.buffer = UnsafeMutableBufferPointer(start: start, count: count)
        self.buffer.initialize(repeating: fill)
    }

    init(count: Int) {
        let start = UnsafeMutablePointer<T>.allocate(capacity: count)
        self.buffer = UnsafeMutableBufferPointer(start: start, count: count)
    }

    deinit {
        self.buffer.deinitialize()
        self.buffer.deallocate()
    }
}
Multiplication for this approach is implemented by the functions shown here.
import Accelerate

infix operator .*

func .* (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.rows == rhs.rows && lhs.columns == rhs.columns, "Matrices must have same dimensions")
    let result = Matrix<Double>(rows: lhs.rows, columns: lhs.columns)
    vDSP.multiply(lhs.data.buffer, rhs.data.buffer, result: &result.data.buffer)
    return result
}

func * (lhs: Matrix<Double>, rhs: Matrix<Double>) -> Matrix<Double> {
    precondition(lhs.columns == rhs.rows, "Number of columns in left matrix must equal number of rows in right matrix")
    let a = lhs.data.baseAddress
    let b = rhs.data.baseAddress
    let m = lhs.rows     // number of rows in matrices A and C
    let n = rhs.columns  // number of columns in matrices B and C
    let k = lhs.columns  // number of columns in matrix A; number of rows in matrix B
    let alpha = 1.0
    let beta = 0.0
    // matrix multiplication where C ← αAB + βC
    let c = Matrix<Double>(rows: lhs.rows, columns: rhs.columns)
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, alpha, a, k, b, n, beta, c.data.baseAddress, n)
    return c
}
Both approaches give me similar performance. The only difference I have noticed is that the buffer approach gives reference semantics. For example, the code below uses half as much memory with the buffer approach as with the array approach: with the buffer approach, b shares a's storage (so writing to b also changes a), whereas with the array approach, Array's copy-on-write makes a full copy of a's data the first time b is mutated.
let n = 10_000
let a = Matrix<Double>(rows: n, columns: n, fill: 0)
var b = a
b[0, 0] = 99
b[0, 1] = 22
Other than reference semantics, are there any reasons to use one of these approaches over the other?
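One consideration beyond memory is value semantics: with the buffer approach, `b[0, 0] = 99` above also changes `a`, because both structs share one `MatrixData` instance. A common middle ground, sketched below as one possible design rather than a recommendation for this code base, is copy-on-write: share the buffer until a mutation, and copy only if `isKnownUniquelyReferenced` reports that the buffer is shared. This is what `Array` already does internally. The names `MatrixStorage`, `CoWMatrix`, and `sharesStorage(with:)` are hypothetical.

```swift
/// Hypothetical reference-counted storage that owns one contiguous buffer.
final class MatrixStorage<T> {
    let buffer: UnsafeMutableBufferPointer<T>

    init(count: Int, fill: T) {
        precondition(count > 0, "Matrix must have at least one element")
        buffer = .allocate(capacity: count)
        buffer.initialize(repeating: fill)
    }

    /// Makes an independent copy of another storage's elements.
    init(copying other: MatrixStorage<T>) {
        buffer = .allocate(capacity: other.buffer.count)
        buffer.baseAddress!.initialize(from: other.buffer.baseAddress!, count: other.buffer.count)
    }

    deinit {
        buffer.deinitialize()
        buffer.deallocate()
    }
}

/// Hypothetical matrix with value semantics: copies share storage until one is mutated.
struct CoWMatrix<T> {
    let rows: Int
    let columns: Int
    private var storage: MatrixStorage<T>

    init(rows: Int, columns: Int, fill: T) {
        self.rows = rows
        self.columns = columns
        self.storage = MatrixStorage(count: rows * columns, fill: fill)
    }

    /// Copies the buffer only if another CoWMatrix still references it.
    private mutating func makeUnique() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = MatrixStorage(copying: storage)
        }
    }

    subscript(row: Int, column: Int) -> T {
        get { storage.buffer[row * columns + column] }
        set {
            makeUnique()
            storage.buffer[row * columns + column] = newValue
        }
    }

    /// Read-only access, e.g. for vDSP or cblas inputs.
    func withUnsafeBufferPointer<R>(_ body: (UnsafeBufferPointer<T>) throws -> R) rethrows -> R {
        try body(UnsafeBufferPointer(storage.buffer))
    }

    /// Mutable access, e.g. for Accelerate outputs; copies first if the storage is shared.
    mutating func withUnsafeMutableBufferPointer<R>(_ body: (UnsafeMutableBufferPointer<T>) throws -> R) rethrows -> R {
        makeUnique()
        return try body(storage.buffer)
    }

    /// Demonstration helper: true if both matrices currently share one buffer.
    func sharesStorage(with other: CoWMatrix<T>) -> Bool {
        storage === other.storage
    }
}
```

With this design, `var b = a` costs no memory until `b` is written, and the write leaves `a` untouched; Accelerate calls can go through the two `withUnsafe…` methods, passing each buffer's `baseAddress` where a pointer is required.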
I'm using an M1 Pro and have successfully installed NumPy with Accelerate following […], and it really sped up my programs. I also ran np.test() to check correctness, and every test passed.
However, I can't install SciPy with Accelerate, because the official documentation says Accelerate's LAPACK version is too old. I can't even find a SciPy build that passes scipy.test(). I tried the following:
conda install numpy 'libblas=*=*accelerate'
conda install scipy
np.test() has failures, and sp.test() can't even finish.
conda install numpy 'libblas=*=*openblas'
conda install scipy
Both np.test() and sp.test() finish, but with many failures. I believe these bugs are due to Conda.
pip install --no-binary :all: --no-use-pep517 numpy
pip install scipy
np.test() has no failures and runs fast; sp.test() uses OpenBLAS and has 3 failures. This is the best combination I have found.
So my question is: is there a reliable SciPy build for M1? Given SciPy's popularity, I don't think that's too much to expect.
And a question for Apple: is there actually a plan to update the LAPACK version in Accelerate?
Topic: Developer Tools & Services
SubTopic: Xcode
Tags: Developer Tools, Accelerate, Mac, Apple Silicon