I am trying to use the SpeechDetector module in the Speech framework along with SpeechTranscriber, and it is giving me this error:
Cannot convert value of type 'SpeechDetector' to expected element type 'Array.ArrayLiteralElement' (aka 'any SpeechModule')
Below is how I am using it:
let speechDetector = Speech.SpeechDetector()
let transcriber = SpeechTranscriber(locale: Locale.current,
                                    transcriptionOptions: [],
                                    reportingOptions: [.volatileResults],
                                    attributeOptions: [.audioTimeRange])

speechAnalyzer = try SpeechAnalyzer(modules: [transcriber, speechDetector])
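To narrow it down, a minimal variation (untested) is to give the module array an explicit element type; if SpeechDetector does not conform to SpeechModule in the installed SDK, this fails with the same diagnostic:

// Untested sketch: an explicitly typed module array. If SpeechDetector does not
// conform to SpeechModule in the SDK being compiled against, the line below produces
// the same conversion error, which would confirm it is a conformance issue rather
// than a problem with how the analyzer is constructed.
let modules: [any SpeechModule] = [transcriber, speechDetector]
speechAnalyzer = try SpeechAnalyzer(modules: modules)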
Hello,
Does anyone have a recipe on how to raycast VNFaceLandmarkRegion2D points obtained from a frame's capturedImage?
More specifically, how to construct the "from" parameter of the frame's raycastQuery from a VNFaceLandmarkRegion2D point?
Do the points need to be flipped vertically? Is there any other transformation that needs to be performed on the points prior to passing them to raycastQuery?
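For context, the conversion I'm attempting looks roughly like this (a sketch; the vertical flip is exactly the part I'm unsure about):

import ARKit
import Vision

// Sketch only: convert one landmark point (from VNFaceLandmarkRegion2D.normalizedPoints)
// into the normalized, top-left-origin coordinates that
// ARFrame.raycastQuery(from:allowing:alignment:) expects.
func raycastQuery(for landmarkPoint: CGPoint,
                  faceBoundingBox: CGRect,   // VNFaceObservation.boundingBox
                  frame: ARFrame) -> ARRaycastQuery {
    let width = CVPixelBufferGetWidth(frame.capturedImage)
    let height = CVPixelBufferGetHeight(frame.capturedImage)

    // Vision: landmark point -> pixel coordinates in capturedImage (origin bottom-left).
    let imagePoint = VNImagePointForFaceLandmarkPoint(
        vector_float2(Float(landmarkPoint.x), Float(landmarkPoint.y)),
        faceBoundingBox,
        width,
        height)

    // Normalize and flip vertically so the origin becomes the top-left corner.
    let normalized = CGPoint(x: imagePoint.x / CGFloat(width),
                             y: 1.0 - imagePoint.y / CGFloat(height))

    return frame.raycastQuery(from: normalized, allowing: .estimatedPlane, alignment: .any)
}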
Our streaming app uses FairPlay-protected video streams, which previously worked fine when using AVAssetResourceLoaderDelegate to provide CKCs.
Recently, we migrated to AVContentKeySession, and while everything works as expected during regular playback, we encountered an issue with AirPlay.
Our CKC has a 120-second expiry, so we renew it by calling renewExpiringResponseData.
This triggers the didProvideRenewingContentKeyRequest delegate callback, and we respond with an updated CKC.
However, when streaming via AirPlay, both video and audio freeze exactly after 120 seconds.
To validate the issue, I tested with AVAssetResourceLoaderDelegate and found that I can reproduce the same freeze if I do not renew the key. This suggests that AirPlay is not accepting the renewed CKC when using AVContentKeySession.
Additional Details:
This issue occurs across different iOS versions and various AirPlay devices.
The same content plays without issues when played directly on the device.
The renewal process is successful, and segments continue to load, but playback remains frozen.
Tried renewing the CKC a bit early (at 100 s).
I also tried setting player.usesExternalPlaybackWhileExternalScreenIsActive = true, but the issue persists.
We don't use persistentKey.
Is there anything else that needs to be considered for proper key renewal when AirPlaying?
Any help on how to fix this or confirmation if this is a known issue would be greatly appreciated.
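For reference, the renewal is wired up roughly like this (a simplified sketch; appCertificate, contentID and fetchCKC stand in for our real key-server plumbing):

// A timer fires about 100 s after the CKC was issued and asks the session to renew.
func scheduleRenewal(for keyRequest: AVContentKeyRequest, in session: AVContentKeySession) {
    DispatchQueue.main.asyncAfter(deadline: .now() + 100) {
        session.renewExpiringResponseData(for: keyRequest)
    }
}

// AVContentKeySessionDelegate
func contentKeySession(_ session: AVContentKeySession,
                       didProvideRenewingContentKeyRequest keyRequest: AVContentKeyRequest) {
    keyRequest.makeStreamingContentKeyRequestData(forApp: appCertificate,
                                                  contentIdentifier: contentID) { spcData, error in
        guard let spcData else { return }
        // fetchCKC is our own networking call to the key server (illustrative).
        self.fetchCKC(spc: spcData) { ckcData in
            let response = AVContentKeyResponse(fairPlayStreamingKeyResponseData: ckcData)
            keyRequest.processContentKeyResponse(response)
        }
    }
}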
Context:
I am currently developing an app using the Push-to-Talk (PTT) framework. I have reviewed both the PTT framework documentation and the CallKit demo project to better understand how to properly manage audio session activation and AVAudioEngine setup.
I am not activating the audio session manually. The audio session configuration is handled in the incomingPushResult or didBeginTransmitting callbacks from the PTChannelManagerDelegate.
I am using a single AVAudioEngine instance for both input and playback. The engine is started in the didActivate callback from the PTChannelManagerDelegate. When I receive a push in full duplex mode, I set the active participant to the user who is speaking.
Issue
When I attempt to talk while the other participant is already speaking, my input tap on the input node takes a few seconds to return valid PCM audio data. Initially, it returns an empty PCM audio block.
Details:
The audio session is already active and configured with .playAndRecord.
The input tap is already installed when the engine is started.
When I talk from a neutral state (no one is speaking), the system plays the standard "microphone activation" tone, which covers this initial delay. However, this does not happen when I am already receiving audio.
Assumptions / Current Setup
Because the audio session is already active in .playAndRecord, I assumed that microphone input would be available immediately, even while receiving audio.
However, there seems to be a delay before valid input is delivered to the tap, and it only occurs when switching from a receiving state to talking while still receiving.
Questions
Is this expected behavior when using the PTT framework in full duplex mode with a shared AVAudioEngine?
Should I be restarting or reconfiguring the engine or audio session when beginning to talk while receiving audio?
Is there a recommended pattern for managing microphone readiness in this scenario to avoid the initial empty PCM buffer?
Would using separate engines for input and output improve responsiveness?
I would like to confirm the correct approach to handling simultaneous talk and receive in full duplex mode using PTT framework and AVAudioEngine. Specifically, I need guidance on ensuring the microphone is ready to capture audio immediately without the delay seen in my current implementation.
Relevant Code Snippets
Engine Setup
func setup() {
    let input = audioEngine.inputNode
    do {
        try input.setVoiceProcessingEnabled(true)
    } catch {
        print("Could not enable voice processing \(error)")
        return
    }
    input.isVoiceProcessingAGCEnabled = false

    let output = audioEngine.outputNode
    let mainMixer = audioEngine.mainMixerNode

    audioEngine.connect(pttPlayerNode, to: mainMixer, format: outputFormat)
    audioEngine.connect(beepNode, to: mainMixer, format: outputFormat)
    audioEngine.connect(mainMixer, to: output, format: outputFormat)

    // Initialize converters
    converter = AVAudioConverter(from: inputFormat, to: outputFormat)!
    f32ToInt16Converter = AVAudioConverter(from: outputFormat, to: inputFormat)!

    audioEngine.prepare()
}
Input Tap Installation
func installTap() {
    guard AudioHandler.shared.checkMicrophonePermission() else {
        print("Microphone not granted for recording")
        return
    }
    guard !isInputTapped else {
        print("[AudioEngine] Input is already tapped!")
        return
    }

    let input = audioEngine.inputNode
    let microphoneFormat = input.inputFormat(forBus: 0)
    let microphoneDownsampler = AVAudioConverter(from: microphoneFormat, to: outputFormat)!
    let desiredFormat = outputFormat
    let inputFramesNeeded = AVAudioFrameCount((Double(OpusCodec.DECODED_PACKET_NUM_SAMPLES) * microphoneFormat.sampleRate) / desiredFormat.sampleRate)

    input.installTap(onBus: 0, bufferSize: inputFramesNeeded, format: input.inputFormat(forBus: 0)) { [weak self] buffer, when in
        guard let self = self else { return }

        // Output buffer: 1920 frames at 16kHz
        guard let outputBuffer = AVAudioPCMBuffer(pcmFormat: desiredFormat,
                                                  frameCapacity: AVAudioFrameCount(OpusCodec.DECODED_PACKET_NUM_SAMPLES)) else { return }
        outputBuffer.frameLength = outputBuffer.frameCapacity

        let inputBlock: AVAudioConverterInputBlock = { inNumPackets, outStatus in
            outStatus.pointee = .haveData
            return buffer
        }

        var error: NSError?
        let converterResult = microphoneDownsampler.convert(to: outputBuffer, error: &error, withInputFrom: inputBlock)
        if converterResult != .haveData {
            DebugLogger.shared.print("Downsample error \(converterResult)")
        } else {
            self.handleDownsampledBuffer(outputBuffer)
        }
    }
    isInputTapped = true
}
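For completeness, the engine start is tied to the PTT callbacks roughly like this (a simplified sketch; ChannelManagerHandler is an illustrative name and the other required delegate methods are omitted):

extension ChannelManagerHandler: PTChannelManagerDelegate {
    func channelManager(_ channelManager: PTChannelManager,
                        didActivate audioSession: AVAudioSession) {
        // The session category/mode was already configured in incomingPushResult / didBeginTransmitting.
        do {
            try audioEngine.start()
        } catch {
            print("Could not start audio engine: \(error)")
        }
    }

    func channelManager(_ channelManager: PTChannelManager,
                        didDeactivate audioSession: AVAudioSession) {
        audioEngine.stop()
    }

    // (other required PTChannelManagerDelegate methods omitted)
}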
Hello,
Basically, I am reading and writing an asset.
To simplify, I am just reading the asset and rewriting it into an output video without any modifications.
However, I want to add a fade-out effect to the last three seconds of the output video.
I don’t know how to do this.
So far, before adding the CMSampleBuffer to the output video, I tried reducing its volume using an extension on CMSampleBuffer.
In the extension, I passed 0.4 for testing, aiming to reduce the video's overall volume by 60%.
My question is:
How can I directly adjust the volume of a CMSampleBuffer?
Here is the extension:
extension CMSampleBuffer {
    /// Scales the audio samples in place, assuming 16-bit integer (Int16) interleaved LPCM.
    func adjustVolume(by factor: Float) -> CMSampleBuffer? {
        guard let blockBuffer = CMSampleBufferGetDataBuffer(self) else { return nil }

        var length = 0
        var dataPointer: UnsafeMutablePointer<Int8>?
        guard CMBlockBufferGetDataPointer(blockBuffer,
                                          atOffset: 0,
                                          lengthAtOffsetOut: nil,
                                          totalLengthOut: &length,
                                          dataPointerOut: &dataPointer) == kCMBlockBufferNoErr else { return nil }
        guard let dataPointer = dataPointer else { return nil }

        // Reinterpret the raw bytes as Int16 samples and scale each one.
        let sampleCount = length / MemoryLayout<Int16>.size
        dataPointer.withMemoryRebound(to: Int16.self, capacity: sampleCount) { pointer in
            for i in 0..<sampleCount {
                let sample = Float(pointer[i])
                pointer[i] = Int16(sample * factor)
            }
        }
        return self
    }
}
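For what it's worth, an alternative I'm looking at instead of touching the sample buffer bytes directly is an audio mix with a volume ramp over the last three seconds. This is an untested sketch and assumes the pipeline could go through AVAssetExportSession rather than my current reader/writer setup:

func exportWithFadeOut(asset: AVAsset, to outputURL: URL) async throws {
    // Ramp the audio from full volume to silence over the final three seconds.
    guard let audioTrack = try await asset.loadTracks(withMediaType: .audio).first else { return }
    let duration = try await asset.load(.duration)
    let fadeDuration = CMTime(seconds: 3, preferredTimescale: 600)
    let fadeRange = CMTimeRange(start: CMTimeSubtract(duration, fadeDuration), duration: fadeDuration)

    let parameters = AVMutableAudioMixInputParameters(track: audioTrack)
    parameters.setVolumeRamp(fromStartVolume: 1.0, toEndVolume: 0.0, timeRange: fadeRange)

    let audioMix = AVMutableAudioMix()
    audioMix.inputParameters = [parameters]

    guard let export = AVAssetExportSession(asset: asset, presetName: AVAssetExportPresetHighestQuality) else { return }
    export.audioMix = audioMix
    export.outputURL = outputURL
    export.outputFileType = .mp4
    await export.export()
}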
When using AVSpeechUtterance and setting it to play in Mandarin, if Siri is set to Cantonese on iOS 18, it will be played in Cantonese. There is no such issue on iOS 17 and 16.
1. Set the utterance voice to Mandarin:
let utterance = AVSpeechUtterance(string: textView.text)
let voice = AVSpeechSynthesisVoice(language: "zh-CN")
utterance.voice = voice
2. In the phone settings, Siri is set to Cantonese.
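One variant I plan to test (an untested sketch; synthesizer is an existing AVSpeechSynthesizer) explicitly picks a zh-CN voice from the installed voices instead of constructing one from the language code alone:

let utterance = AVSpeechUtterance(string: textView.text)
// Pick an explicit Mandarin voice rather than relying on the language code alone.
if let mandarinVoice = AVSpeechSynthesisVoice.speechVoices().first(where: { $0.language == "zh-CN" }) {
    utterance.voice = mandarinVoice
}
synthesizer.speak(utterance)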
I occasionally receive reports from users that photo import from the Photos library gets stuck and the progress appears to stop indefinitely.
I’m using the following APIs:
func fetchAsset(_ asset: PHAsset) {
    let options = PHImageRequestOptions()
    options.deliveryMode = .highQualityFormat
    options.resizeMode = .exact
    options.isSynchronous = false
    options.isNetworkAccessAllowed = true
    options.progressHandler = { (progress, error, stop, info) in
        // 🚨 never called
    }

    let requestId = PHImageManager.default().requestImageDataAndOrientation(
        for: asset,
        options: options
    ) { data, _, _, info in
        // 🚨 never called
    }
}
Due to repeated reports, I added detailed logs inside the callback closures. Based on the logs, it looks like the request keeps waiting without any callbacks being invoked — neither the progressHandler nor the completion block of requestImageDataAndOrientation is called.
This happens not only with the PHImageManager approach, but also when using PHAsset with PHContentEditingInputRequestOptions — the completion callback is not invoked either.
func fetchAssetByContentEditingInput(_ asset: PHAsset) {
    let options = PHContentEditingInputRequestOptions()
    options.isNetworkAccessAllowed = true
    asset.requestContentEditingInput(with: options) { contentEditingInput, info in
        // 🚨 never called
    }
}
I suspect this is related to iCloud Photos.
Here is what I confirmed from affected users:
With the native picker (my app also offers it as an alternative way to attach photos), the iCloud download proceeds normally and the photo can be attached. With the PHImageManager-based approach in my app, however, the same photo cannot be attached.
Even after verifying that the photo has been fully downloaded from iCloud (e.g., by trying “Export Unmodified Originals” in the Photos app as described here: https://support.apple.com/en-us/111762, and confirming the iCloud download progress completed), the callback is still not invoked for that asset.
Detailed flow for (1):
I asked the user to attach the problematic photo (the one where callbacks never fire) using the native photo picker (UIImagePickerController).
The UI showed “Downloading from iCloud” progress.
The progress advanced and the photo was attached successfully.
Then I asked the user to attach the same photo again using my custom photo picker (which uses the PHImageManager APIs mentioned above).
The progress did not advance (No callbacks were invoked).
The operation waited indefinitely and never completed.
Workaround / current behavior:
If I ask users to reboot the device and try again, about 6 out of 10 users can attach successfully afterward.
The remaining ~4 out of 10 users still cannot attach even after rebooting.
For users whose issue is not fixed immediately after a reboot, it seems to resolve on its own after some time.
I’ve seen similar reports elsewhere, so I’m wondering if Apple is already aware of an internal issue related to this. If there is any known information, guidance, or recommended workaround, I would appreciate it.
I also logged the properties of affected PHAssets (metadata) when the issue occurs, and I can share them below if that helps troubleshooting:
[size=3.91MB] [PHAssetMediaSubtype(rawValue: 528)+DepthEffect | userLibrary | (4284x5712) | adjusted=true]
[size=3.91MB] [PHAssetMediaSubtype(rawValue: 528)+DepthEffect | userLibrary | (4284x5712) | adjusted=true]
[size=2.72MB] [PHAssetMediaSubtype(rawValue: 16)+DepthEffect | userLibrary | (3024x4032) | adjusted=true]
[size=2.72MB] [PHAssetMediaSubtype(rawValue: 16)+DepthEffect | userLibrary | (3024x4032) | adjusted=true]
[size=2.49MB] [PHAssetMediaSubtype(rawValue: 16)+DepthEffect | userLibrary | (3024x4032) | adjusted=true]
[size=2.49MB] [PHAssetMediaSubtype(rawValue: 16)+DepthEffect | userLibrary | (3024x4032) | adjusted=true]
Hello Apple team and developer community,
I am preparing a visionOS app for a fair environment, where we want to automatically stream the current experience to a nearby monitor via AirPlay, without requiring guests or staff to manually interact with the Control Center or AirPlay pickers all the time.
The goal is to provide a smooth, frictionless setup so attendees can focus on the demo, not the configuration.
Feature Request:
A supported API or method to programmatically start/stop AirPlay video streaming (mirroring or external playback) from within a visionOS app, allowing the current experience to be instantly displayed on an external monitor or Apple TV for the audience.
Context & Rationale:
In a trade fair or exhibition setting, rapid guest turnaround and minimal staff intervention are crucial. Having to manually guide each visitor through AirPlay setup is impractical.
As I understand it, AVRoutePickerView can be used for this on iOS/macOS, but it is not available on visionOS. Enabling similar automated streaming on visionOS would make the device far more suitable for live demos and public showcases.
Questions:
Are there any supported workarounds or best practices for enabling automated screen streaming or AirPlay initiation on visionOS in public demo environments that I missed?
Is Apple considering adding programmatic AirPlay control or accessibility features to support such use cases in future visionOS releases?
Thank you for considering this request! If there are recommended patterns, entitlements, or accessibility solutions we could explore for trade fair scenarios, your guidance would be greatly appreciated.
Best regards,
Julian Zürn - IPI, HS Kempten
I am using AVMulti so the user captures two images. How can I access those images if there is only one URL that stores the captured images for the lockScreenCapture extension? Also, how can I detect whether the user opened the app from the extension, so I can navigate the user to the right screen?
I'm using an AVCaptureSession to send video and audio samples to an AVAssetWriter. When I play back the resultant video, sometimes there is a significant lag between the audio compared with the video, so they're just not in sync. But sometimes they are, with the same code.
If I look at the very first presentation time stamps of the buffers being sent to the delegate, via
func captureOutput(_: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)
I see something like this:
Adding audio samples for pts time 227711.0855328798,
Adding video samples for pts time 227710.778785374
That is, the clock for audio vs video is behind: the first audio sample I receive is at 11.08 something, while the first video sample is earlier in time, at 10.778 something. The times are the presentation time stamps of the buffer, and the outputPresentationTimeStamp is the exact same number.
It feels like the "video" and "audio" clocks are just mismatched.
This doesn't always happen: sometimes they're synced. Sometimes they're not.
Any ideas? The device I'm recording from is a webcam, on iPadOS, connected via the USB-C port.
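For reference, buffers reach the writer roughly like this (simplified; didStartSession, videoInput and audioInput are illustrative names). The session is started at the PTS of whichever buffer happens to arrive first, and I'm wondering whether that interacts with the ~0.3 s offset between the first audio and video timestamps:

func append(_ sampleBuffer: CMSampleBuffer, isVideo: Bool) {
    let pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)

    // The writer session begins at the timestamp of the first buffer we see.
    if !didStartSession {
        assetWriter.startSession(atSourceTime: pts)
        didStartSession = true
    }

    let input = isVideo ? videoInput : audioInput
    if input.isReadyForMoreMediaData {
        input.append(sampleBuffer)
    }
}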
Hey,
There seems to be an inconsistency when capturing a photo using
QualityPrioritization.Quality on the iPhone 17 Pro main wide lens. If you zoom above 2x, the output image always has a "-2.0ev" bias in the metadata and looks underexposed. This does not happen at zoom levels below 2x, or if you set the QualityPrioritization to .Balanced.
See below:
with .Quality
with .Balanced
This does not happen on the other lenses.
I'm using a simple setup and it is consistent across JPEG and ProRAW capture. I have a demo project if that is useful.
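For reference, the capture path is roughly the following (an illustrative sketch, not the exact project code):

// Set once when configuring the session.
photoOutput.maxPhotoQualityPrioritization = .quality

let settings = AVCapturePhotoSettings()
settings.photoQualityPrioritization = .quality   // switching this to .balanced avoids the bias

try? device.lockForConfiguration()
device.videoZoomFactor = 2.5                     // a factor in the range where the bias appears
device.unlockForConfiguration()

photoOutput.capturePhoto(with: settings, delegate: self)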
Thanks,
Alex
Hello, I'm using the VideoToolbox VTFrameRateConversionConfiguration to perform frame interpolation: https://developer.apple.com/documentation/videotoolbox/vtframerateconversionconfiguration?language=objc. When using 640x480 video input, I get this error:
Error ! Invalid configuration
[VEEspressoModel] build failure : flow_adaptation_feature_extractor_rev2.espresso.net. Configuration: landscape640x480
[EpsressoModel] Cannot load Net file flow_adaptation_feature_extractor_rev2.espresso.net. Configuration: landscape640x480
Error: failed to create FRCFlowAdaptationFeatureExtractor for usage 8
Failed to switch (0x12c40e140) [usage:8, 1/4 flow:0, adaptation layer:1, twoStage:0, revision:2, flow size (320x240)].
Could not init FlowAdaptation
initFlowAdaptationWithError fail
A 2048x1080 input works fine.
Hi,
I’m building a PPG-based heart rate feature where the user places their finger over the rear telephoto camera. On iPhone 16 Pro Max, I'm explicitly selecting the telephoto lens like this:
videoDevice = AVCaptureDevice.default(.builtInTelephotoCamera, for: .video, position: .back)
And trying to lock it:
if #available(iOS 15.0, *),
   device.activePrimaryConstituentDeviceSwitchingBehavior != .unsupported {
    try? device.lockForConfiguration()
    device.setPrimaryConstituentDeviceSwitchingBehavior(.locked, restrictedSwitchingBehaviorConditions: [])
    device.unlockForConfiguration()
}
I also lock everything else to prevent dynamic changes:
try device.lockForConfiguration()
device.focusMode = .locked
device.exposureMode = .locked
device.whiteBalanceMode = .locked
device.videoZoomFactor = 1.0
device.automaticallyEnablesLowLightBoostWhenAvailable = false
device.automaticallyAdjustsVideoHDREnabled = false
device.unlockForConfiguration()
Despite this, the camera still switches to another lens, especially under different lighting, even though the user’s finger fully covers the lens.
Questions:
How can I completely prevent lens switching in this scenario?
Would using videoZoomFactor = 3.0 or 5.0 better enforce use of the telephoto lens?
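For concreteness, the zoom-based variant from the last question would look roughly like this (an untested sketch that selects the virtual triple camera rather than the physical telephoto device):

func configureTelephotoViaZoom() throws {
    // Use the virtual triple camera, zoom to the telephoto switch-over factor,
    // then lock constituent-device switching so the system cannot fall back to another lens.
    guard let device = AVCaptureDevice.default(.builtInTripleCamera, for: .video, position: .back) else { return }

    try device.lockForConfiguration()
    defer { device.unlockForConfiguration() }

    if let teleFactor = device.virtualDeviceSwitchOverVideoZoomFactors.last {
        device.videoZoomFactor = CGFloat(truncating: teleFactor)
    }
    device.setPrimaryConstituentDeviceSwitchingBehavior(.locked, restrictedSwitchingBehaviorConditions: [])
}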
Thanks!
Gal
I'm capturing a video stream from a GoPro camera (I demux UDP MPEG-TS packets) and create CMSampleBuffers from them; this works fine when I display them using AVSampleBufferDisplayLayer.
However, when I dump them to disk using AVAssetWriter and then play the file back with AVPlayer, AVPlayer has problems with scrubbing: it cannot render previous frames and has to go back to key frames. Thumbnails generated with AVAssetImageGenerator are also mostly distorted and green, even though I set requestedTimeToleranceAfter to be longer than the key frame interval.
When I re-encode saved video once again with AVAssetExportSession and play it back then I can scrub the video just fine.
Is it because re-transcoding adds additional metadata to enable generating frames when rewinding the video and scrubbing?
If so is there a way to achieve it with AVAssetWriter without much time penalty? I need the dump/save operation to be very fast.
I also considered the following: instead of demuxing the video and creating CMSampleBuffers, maybe I could directly dump the stream to disk and somehow add moov atoms with timing information. Would this approach work? If so, where can I find information on how to do it?
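For reference, the save path is essentially a passthrough writer along these lines (simplified; outputURL, formatDescription and firstPTS are illustrative, and this runs in an async throwing context):

let writer = try AVAssetWriter(outputURL: outputURL, fileType: .mp4)
let videoInput = AVAssetWriterInput(mediaType: .video,
                                    outputSettings: nil,              // passthrough, no re-encode
                                    sourceFormatHint: formatDescription)
videoInput.expectsMediaDataInRealTime = true
writer.add(videoInput)

writer.startWriting()
writer.startSession(atSourceTime: firstPTS)
// ...append the demuxed CMSampleBuffers as they arrive, then:
videoInput.markAsFinished()
await writer.finishWriting()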
Thank you!
If the app is launched from LockedCameraCapture and if the settings button is tapped, I need to launch the main app.
CameraViewController:
func settingsButtonTapped() {
    #if isLockedCameraCaptureExtension
    // App is launched from the Lock Screen
    // Launch main app here...
    #else
    // App is launched from the Home Screen
    self.showSettings(animated: true)
    #endif
}
In this document:
https://developer.apple.com/documentation/lockedcameracapture/creating-a-camera-experience-for-the-lock-screen
Apple asks you to use:
func launchApp(with session: LockedCameraCaptureSession, info: String) {
    Task {
        do {
            let activity = NSUserActivity(activityType: NSUserActivityTypeLockedCameraCapture)
            activity.userInfo = [UserInfoKey: info]
            try await session.openApplication(for: activity)
        } catch {
            StatusManager.displayError("Unable to open app - \(error.localizedDescription)")
        }
    }
}
However, the documentation states that this should be placed within the extension code - LockedCameraCapture. If I do that, how can I call that all the way down from the main app's CameraViewController?
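One approach I'm considering (an untested sketch; captureSession is a property I would add, and the userInfo key/value are illustrative) is to hand the LockedCameraCaptureSession to the shared view controller when the extension creates it, so settingsButtonTapped can launch the main app:

class CameraViewController: UIViewController {
    #if isLockedCameraCaptureExtension
    // Set by the extension when it creates this view controller.
    var captureSession: LockedCameraCaptureSession?
    #endif

    func settingsButtonTapped() {
        #if isLockedCameraCaptureExtension
        // Running in the LockedCameraCapture extension: ask the system to open the main app.
        guard let session = captureSession else { return }
        Task {
            let activity = NSUserActivity(activityType: NSUserActivityTypeLockedCameraCapture)
            activity.userInfo = ["TargetScreen": "settings"]   // illustrative key/value
            try? await session.openApplication(for: activity)
        }
        #else
        // Running in the main app: show settings in place.
        self.showSettings(animated: true)
        #endif
    }
}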
Dear Apple Developer Team,
On iOS 26, the contents of PDF pages appear to be swapped.
Could you please advise if there is a workaround or a planned fix for this issue?
Steps to Reproduce:
Download the attached PDF on iOS 26.
Open the PDF in the Files app.
Tap the PDF to view it in Quick Look.
Navigate to page 5.
Expected Result:
The page number displayed at the bottom should be 5.
Actual Result:
The page number displayed at the bottom is 4.
Issue:
This is not limited to page 5—multiple page contents appear to be swapped.
I have also submitted feedback via Feedback Assistant (FB20743531) on October 20.
Best regards,
Yoshihito Suezawa
I am trying to use SpeechTranscriber from the Speech framework. Is it possible to use it in the iOS 26 Simulator (on macOS Tahoe)? The "supportedLocales" function returns an empty array.
Since iOS and tvOS 18, CMCD can now be automatically sent by AVPlayer (https://developer.apple.com/streaming/Whats-new-HLS.pdf).
However, after enabling CMCD, our streams occasionally fail with the following error: CoreMediaErrorDomain Error -17383
This issue appears to affect only DRM-protected (FairPlay) streams so far.
We activate CMCD via the resource loader of an AVURLAsset, before assigning the item to an AVPlayer.
Unfortunately, we haven’t found a reliable way to reproduce the issue, and we’ve been unable to gather any useful diagnostic information.
Has anyone else observed this behavior when enabling CMCD on FairPlay streams?
I use
https://api.music.apple.com/v1/me/library/playlists/${playlistId}/tracks
to add tracks to a playlist I created.
How do I DELETE tracks from the playlist?
The documentation does not mention a method for this. I have tried calling DELETE methods in various combinations but nothing seems to work.
Is this possible?
Hi everyone,
We're encountering an unexpected issue with our iPhone-only camera app:
👉 TimeMark - Photo Proof
https://apps.apple.com/us/app/timemark-photo-proof/id6446071834
Problem Description:
Our app uses a full-screen camera view via AVCaptureSession. In some cases reported by users, the camera fails immediately upon app launch, and we receive this interruption reason:
AVCaptureSessionInterruptionReasonVideoDeviceNotAvailableWithMultipleForegroundApps
According to the Apple documentation https://developer.apple.com/documentation/avfoundation/avcapturesession/interruptionreason/videodevicenotavailablewithmultipleforegroundapps?language=objc , this interruption typically occurs when the app is running in a multi-app layout such as Slide Over, Split View, or Picture in Picture — all of which are iPad-only features.
However, this issue is being reported on iPhones, and our app does not support iPad at all.
Also noted in the documentation:
"Given your present AVCaptureSession configuration, the session may only be run if your app occupies the full screen."
Additional Context:
The issue occurs immediately on app launch, before the user can interact with the camera.
We don’t enable multitaskingCameraAccessEnabled.
We are 100% sure this is happening on iPhone, not iPad.
It’s hard to reproduce; users report it happening sporadically.
Locally, we tried playing Picture-in-Picture videos (e.g., Safari/YouTube) before launching our app, but we could not reproduce the issue.
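For reference, we detect the interruption through the standard notification, roughly like this (simplified; captureSession is our AVCaptureSession):

NotificationCenter.default.addObserver(forName: .AVCaptureSessionWasInterrupted,
                                       object: captureSession,
                                       queue: .main) { notification in
    if let value = notification.userInfo?[AVCaptureSessionInterruptionReasonKey] as? Int,
       let reason = AVCaptureSession.InterruptionReason(rawValue: value) {
        // On affected iPhones this fires right after launch with
        // .videoDeviceNotAvailableWithMultipleForegroundApps.
        print("Capture session interrupted: \(reason)")
    }
}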
Questions:
Why is this interruption reason occurring on iPhone, which doesn’t officially support Slide Over or Split View?
Could this be caused by some system-level multitasking or resource contention (e.g., Picture in Picture from FaceTime or Safari)?
Would enabling multitaskingCameraAccessEnabled help prevent this issue on iPhone, even though it's designed for iPad?
Enabling multitaskingCameraAccessEnabled seems to require enabling UIBackgroundModes → voip.
Would adding this background mode cause any App Store review risk or rejection if our app doesn't actually use VoIP functionality?
Any help, insight, or suggestions would be greatly appreciated. Thanks in advance!