I've been using Metal compute shaders for lattice quantum chromodynamics simulations and wanted to share the experience in case others are doing scientific computing on Metal.
The workload involves SU(2) matrix operations on 4D lattice grids — lots of 2x2 and 3x3 complex matrix multiplies, reductions over lattice sites, and nearest-neighbor stencil operations. The implementation bridges a C++ scientific framework (Grid) to Metal via Objective-C++ .mm files, with MSL kernels compiled into .metallib archives during the build.
Things that work well:
Shared memory on M-series eliminates the CPU↔GPU copy overhead that dominates in CUDA workflows
The .metallib compilation integrates cleanly with autotools builds using xcrun
Float4 packing for SU(2) matrices maps naturally to MSL vector types
Things I'm still figuring out:
Optimal threadgroup sizes for stencil operations on 4D grids
Whether to use MTLHeap for gauge field storage or stick with individual buffers
Best practices for double precision — some measurements need float64 but Metal's double support varies by hardware
The application is measuring chromofield flux distributions between static quarks, ultimately targeting multi-quark systems. Production runs are on MacBook Pro M-series and Mac Studio.
Code: https://github.com/ThinkOffApp/multiquark-lattice-qcd
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Created
We are developing a video processing app that applies CIFilter chains to video frames. To not force the user to keep the app foregrounded, we were happy to see the introduction of BGContinuedProcessingTask to continue processing when backgrounded.
With iOS 26, I was excited to see the com.apple.developer.background-tasks.continued-processing.gpu entitlement, which should allow GPU access in the background. Even the article in the documentation provides "exporting video in a film-editing app" or "applying visual filters (HDR, etc) or compressing images for social media posts" as use cases. However, when I check BGTaskScheduler.shared.supportedResources.contains(.gpu) at runtime, it returns false on every iPhone I've tested (including iPhone 15 Pro and iPhone 16 Pro).
From forum responses I've seen, it sounds like background GPU access is currently limited to iPad only. If that's the case, I have a few questions:
Is this an intentional, permanent limitation — or is iPhone support planned for a future iOS release?
What is the recommended approach for GPU-dependent background work on iPhone? My custom CIKernels are written in Metal (as Apple recommends since CIKL is deprecated), but Metal CIKernels cannot fall back to CPU rendering. This creates a situation where Apple's own deprecation guidance (migrate to Metal) conflicts with background processing realities (no GPU on iPhone).
Should developers maintain deprecated CIKL kernel versions alongside Metal kernels purely as a CPU fallback for background execution? That feels like it defeats the purpose of the migration.
It seems like a gap in the platform: the API exists, the entitlement exists, but the hardware support isn't there for the most common device category. Any clarity on Apple's direction here would be very helpful.
Hello everyone! I am trying to wrap a ViewModifier inside a Swift Package that bundles a metal shader file to be used in the modifier. Everything works as expected in the Preview, in the Simulator and on a real device for iOS. It also works in Preview and in the Simulator for tvOS but not on a real AppleTV. I have tried this on a 4th generation Apple TV running tvOS 26.3 using Xcode 26.2.0.
Xcode logs the following: The metallib is processed and exists in the bundle.
Compiler failed to build request
precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn
Reason: visible function not loaded
Compiler failed to build request
precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn
Reason: visible function not loaded
Compiler failed to build request
precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn
Reason: visible function not loaded
Compiler failed to build request
precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn
Reason: visible function not loaded
Compiler failed to build request
precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn
Reason: visible function not loaded
Compiler failed to build request
precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn
Reason: visible function not loaded
Contents of Package.swift:
import PackageDescription
let package = Package(
name: "Test",
platforms: [
.iOS(.v17),
.tvOS(.v17)
],
products: [
.library(
name: "Test",
targets: [
"Test"
]
)
],
targets: [
.target(
name: "Test",
resources: [
.process("Shaders")
]
),
.testTarget(
name: "TestTests",
dependencies: [
"Test"
]
)
]
)
Content of my metal file:
#include <metal_stdlib>
using namespace metal;
[[ stitchable ]] float2 complexWave(float2 position, float time, float2 size, float speed, float strength, float frequency) {
float2 normalizedPosition = position / size;
float moveAmount = time * speed;
position.x += sin((normalizedPosition.x + moveAmount) * frequency) * strength;
position.y += cos((normalizedPosition.y + moveAmount) * frequency) * strength;
return position;
}
And my ViewModifier:
import MetalKit
import SwiftUI
extension ShaderFunction {
static let complexWave: ShaderFunction = {
ShaderFunction(
library: .bundle(.module),
name: "complexWave"
)
}()
}
extension Shader {
static func complexWave(arguments: [Shader.Argument]) -> Shader {
Shader(function: .complexWave, arguments: arguments)
}
}
struct WaveModifier: ViewModifier {
let start: Date = .now
func body(content: Content) -> some View {
TimelineView(.animation) { context in
let delta = context.date.timeIntervalSince(start)
content
.visualEffect { view, proxy in
view.distortionEffect(
.complexWave(
arguments: [
.float(delta),
.float2(proxy.size),
.float(0.5),
.float(8),
.float(10)
]
),
maxSampleOffset: .zero
)
}
}
.onAppear {
let paths = Bundle.module.paths(forResourcesOfType: "metallib", inDirectory: nil)
print(paths)
}
}
}
extension View {
public func wave() -> some View {
modifier(WaveModifier())
}
}
#Preview {
Image(systemName: "cart")
.wave()
}
Any help is appreciated.
I think if your buffer is less than 4k its recommended to use
setVertexBytes, the question I have is can I keep hammering on setVertexBytes as the primary method to issue multiple draw calls within a render buffer and rely on Metal to figure out how to orphan and replace the target buffer?
A lot of the primitives I am drawing are less than 4k and the process of wiring down larger segments of memory for individual buffers for each draw primitive call seems to be a negative.
And it's just simpler to copy, submit and forget about buffer synchronization.
Topic:
Graphics & Games
SubTopic:
Metal