Muxing AAC audio and h.264 video streams to mp4 with AVFoundation

【Question title】: Muxing AAC audio and h.264 video streams to mp4 with AVFoundation
【Posted】: 2018-10-13 00:29:26
【Question description】:

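For both OSX and iOS, I have streams of real-time encoded video (h.264) and audio (AAC) data coming in, and I want to mux them together into an mp4.

I'm using an AVAssetWriter to perform the muxing.

I have video working, but my audio still sounds like jumbled static. Here is what I'm trying right now (some error checks are skipped here for brevity):

I initialize the writer: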

   NSURL *url = [NSURL fileURLWithPath:mContext->filename];
   NSError* err = nil;
   mContext->writer = [AVAssetWriter assetWriterWithURL:url fileType:AVFileTypeMPEG4 error:&err];

I initialize the audio input:

     NSDictionary* settings;
     AudioChannelLayout acl;
     bzero(&acl, sizeof(acl));
     acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
     settings = nil; // set output to nil so it becomes a pass-through

     CMAudioFormatDescriptionRef audioFormatDesc = nil;
     {
        AudioStreamBasicDescription absd = {0};
        absd.mSampleRate = mParameters.audioSampleRate; //known sample rate
        absd.mFormatID = kAudioFormatMPEG4AAC;
        absd.mFormatFlags = kMPEG4Object_AAC_Main;
        CMAudioFormatDescriptionCreate(NULL, &absd, 0, NULL, 0, NULL, NULL, &audioFormatDesc);
     }

     mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:settings sourceFormatHint:audioFormatDesc];
     mContext->aacWriterInput.expectsMediaDataInRealTime = YES;
     [mContext->writer addInput:mContext->aacWriterInput];

Then I start the writer:

   [mContext->writer startWriting];
   [mContext->writer startSessionAtSourceTime:kCMTimeZero];

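Then I have a callback in which I receive a packet with a timestamp (in milliseconds) and a std::vector<uint8_t> holding the data for 1024 compressed samples. I make sure isReadyForMoreMediaData is true. Then, if this is the first time the callback has fired, I set up the CMAudioFormatDescription: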

   OSStatus error = 0;

   AudioStreamBasicDescription streamDesc = {0};
   streamDesc.mSampleRate = mParameters.audioSampleRate;
   streamDesc.mFormatID = kAudioFormatMPEG4AAC;
   streamDesc.mFormatFlags = kMPEG4Object_AAC_Main;
   streamDesc.mChannelsPerFrame = 2;  // always stereo for us
   streamDesc.mBitsPerChannel = 0;
   streamDesc.mBytesPerFrame = 0;
   streamDesc.mFramesPerPacket = 1024; // Our AAC packets contain 1024 samples per frame
   streamDesc.mBytesPerPacket = 0;
   streamDesc.mReserved = 0;

   AudioChannelLayout acl;
   bzero(&acl, sizeof(acl));
   acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
   error = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &streamDesc, sizeof(acl), &acl, 0, NULL, NULL, &mContext->audioFormat);

Finally, I create a CMSampleBufferRef and send it off:

   CMSampleBufferRef buffer = NULL;
   CMBlockBufferRef blockBuffer;
   CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, NULL, packet.data.size(), kCFAllocatorDefault, NULL, 0, packet.data.size(), kCMBlockBufferAssureMemoryNowFlag, &blockBuffer);
   CMBlockBufferReplaceDataBytes((void*)packet.data.data(), blockBuffer, 0, packet.data.size());

   CMTime duration = CMTimeMake(1024, mParameters.audioSampleRate);
   CMTime pts = CMTimeMake(packet.timestamp, 1000);
   CMSampleTimingInfo timing = {duration , pts, kCMTimeInvalid };

   size_t sampleSizeArray[1] = {packet.data.size()};

   error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr, mContext->audioFormat, 1, 1, &timing, 1, sampleSizeArray, &buffer);       

   // First input buffer must have an appropriate kCMSampleBufferAttachmentKey_TrimDurationAtStart since the codec has encoder delay
   if (mContext->firstAudioFrame)
   {
      CFDictionaryRef dict = NULL;
      dict = CMTimeCopyAsDictionary(CMTimeMake(1024, 44100), kCFAllocatorDefault);
      CMSetAttachment(buffer, kCMSampleBufferAttachmentKey_TrimDurationAtStart, dict, kCMAttachmentMode_ShouldNotPropagate);
      // we must trim the start time on first audio frame...
      mContext->firstAudioFrame = false;
   }

   CMSampleBufferMakeDataReady(buffer);

   BOOL ret = [mContext->aacWriterInput appendSampleBuffer:buffer];
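
As a side illustration of the timing arithmetic in the snippet above, here is a minimal C++ sketch. The `RationalTime` struct and the helper names are hypothetical and are not part of CoreMedia; `RationalTime` only mirrors the value/timescale pair that `CMTimeMake` takes. The sketch shows that one 1024-frame packet does not last a whole number of milliseconds at 44100 Hz, so a millisecond timestamp and a 1024/sampleRate duration cannot line up exactly. It is not offered as a diagnosis of the static.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a CMTime-style rational timestamp (value / timescale).
struct RationalTime {
    int64_t value;
    int32_t timescale;
};

// The question states that each AAC packet carries 1024 frames.
const int64_t kFramesPerAacPacket = 1024;

// Duration of one packet, expressed in the sample-rate timescale.
inline RationalTime packetDuration(int32_t sampleRate) {
    assert(sampleRate > 0);
    return {kFramesPerAacPacket, sampleRate};
}

// PTS derived from a running packet counter: exact, with no rounding.
inline RationalTime ptsFromPacketIndex(int64_t packetIndex, int32_t sampleRate) {
    assert(sampleRate > 0 && packetIndex >= 0);
    return {packetIndex * kFramesPerAacPacket, sampleRate};
}

// PTS converted from a millisecond timestamp, rounded to the nearest frame.
inline RationalTime ptsFromMilliseconds(int64_t ms, int32_t sampleRate) {
    assert(sampleRate > 0 && ms >= 0);
    return {(ms * sampleRate + 500) / 1000, sampleRate};
}
```

For example, at 44100 Hz a packet lasts about 23.22 ms, so a timestamp truncated to 23 ms converts to frame 1014 rather than 1024. Whether the writer tolerates that kind of jitter is something to check against the real stream.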

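The part I'm most suspicious of is my call to CMSampleBufferCreate. It seems I have to pass in a sample size array, otherwise I immediately get this error when I check the writer's status: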

Error Domain=AVFoundationErrorDomain Code=-11800 "The operation could not be completed" UserInfo={NSLocalizedFailureReason=An unknown error occurred (-12735), NSLocalizedDescription=The operation could not be completed, NSUnderlyingError=0x604001e50770 {Error Domain=NSOSStatusErrorDomain Code=-12735 "(null)"}}

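The underlying error appears to be kCMSampleBufferError_BufferHasNoSampleSizes.

I did notice that Apple's documentation has an example of creating a buffer with AAC data: https://developer.apple.com/documentation/coremedia/1489723-cmsamplebuffercreate?language=objc

In their example, they specify a long sampleSizeArray with one entry for every sample. Is that necessary? I don't have that information in this callback, and our Windows implementation doesn't need it. So I tried passing packet.data.size() as the sample size, but that doesn't seem right, and it certainly doesn't produce pleasant audio.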
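
To make the sampleSizeArray question concrete, one possible reading is that each compressed AAC packet counts as one sample. Under that reading, a buffer holding a single packet needs a single size entry, and a buffer holding several packets needs one entry per packet. The C++ sketch below builds such an array under that assumption, which the thread does not confirm. `PacketBatch` and `batchPackets` are hypothetical names, not CoreMedia API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: packets stored back to back, plus one size per packet.
struct PacketBatch {
    std::vector<uint8_t> bytes;      // concatenated packet payloads
    std::vector<size_t> sampleSizes; // byte size of each packet, in order
};

// Concatenate packets and record each packet's byte size.
// Assumption: one AAC packet is treated as one "sample".
inline PacketBatch batchPackets(const std::vector<std::vector<uint8_t>>& packets) {
    PacketBatch batch;
    batch.sampleSizes.reserve(packets.size());
    for (const auto& packet : packets) {
        batch.bytes.insert(batch.bytes.end(), packet.begin(), packet.end());
        batch.sampleSizes.push_back(packet.size());
    }
    // Invariant: exactly one size entry per input packet.
    assert(batch.sampleSizes.size() == packets.size());
    return batch;
}
```

With one packet per callback, as in the code above, the batch degenerates to a single entry equal to packet.data.size(), which matches what the question already passes.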

Any ideas? I'm open to tweaks to my calls here, or to different APIs I should be using to mux streams of encoded data together.

Thanks!

【Comments】:

I don't know how to help, but this is a great example of how to write your first question on Stack Overflow! :D

Tags: ios mp4 aac avassetwriter mux

【Solution 1】:

If you don't want to transcode, don't pass an outputSettings dictionary. Pass nil there instead:

   mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:nil sourceFormatHint:audioFormatDesc];

This is explained somewhere in this article: https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/05_Export.html

【Discussion】: