从音频文件中提取电平表答案

【问题标题】：Extract meter levels from audio file从音频文件中提取电平表
【发布时间】：2019-01-13 08:03:29
【问题描述】：

我需要从文件中提取音频电平表，以便在播放音频之前渲染电平。我知道AVAudioPlayer可以在通过播放音频文件时获取此信息

func averagePower(forChannel channelNumber: Int) -> Float.

但就我而言，我想事先获得[Float] 的米级。

【问题讨论】：

标签： ios swift audio avaudioplayer audiotoolbox

【解决方案1】：

斯威夫特 4

它需要一部 iPhone：

0.538s 以4min47s 持续时间和44,100 采样率处理8MByte mp3 播放器
0.170s 以22s 持续时间和44,100 采样率处理712KByte mp3 播放器
0.089s 处理caf通过在终端中使用此命令afconvert -f caff -d LEI16 audio.mp3 audio.caf 转换上述文件创建的文件。

让我们开始吧：

A)声明这个类将保存有关音频资产的必要信息：

/// Holds audio information used for building waveforms
final class AudioContext {
    
    /// The audio asset URL used to load the context
    public let audioURL: URL
    
    /// Total number of samples in loaded asset
    public let totalSamples: Int
    
    /// Loaded asset
    public let asset: AVAsset
    
    // Loaded assetTrack
    public let assetTrack: AVAssetTrack
    
    private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
        self.audioURL = audioURL
        self.totalSamples = totalSamples
        self.asset = asset
        self.assetTrack = assetTrack
    }
    
    public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
        let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])
        
        guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
            fatalError("Couldn't load AVAssetTrack")
        }
        
        asset.loadValuesAsynchronously(forKeys: ["duration"]) {
            var error: NSError?
            let status = asset.statusOfValue(forKey: "duration", error: &error)
            switch status {
            case .loaded:
                guard
                    let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                    let audioFormatDesc = formatDescriptions.first,
                    let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                    else { break }
                
                let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                completionHandler(audioContext)
                return
                
            case .failed, .cancelled, .loading, .unknown:
                print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
            }
            
            completionHandler(nil)
        }
    }
}

我们将使用它的异步函数load，并将其结果处理给完成处理程序。

B) 在您的视图控制器中导入 AVFoundation 和 Accelerate：

import AVFoundation
import Accelerate

C) 在视图控制器中声明噪声级别（以 dB 为单位）：

let noiseFloor: Float = -80

例如，小于-80dB 的任何内容都将被视为静音。

D) 以下函数采用音频上下文并产生所需的 dB 功率。 targetSamples 默认设置为 100，您可以更改它以满足您的 UI 需求：

func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }
    
    let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples
    
    guard let reader = try? AVAssetReader(asset: audioContext.asset)
        else {
            fatalError("Couldn't initialize the AVAssetReader")
    }
    
    reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                   duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))
    
    let outputSettingsDict: [String : Any] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    
    let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                outputSettings: outputSettingsDict)
    readerOutput.alwaysCopiesSampleData = false
    reader.add(readerOutput)
    
    var channelCount = 1
    let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
    for item in formatDescriptions {
        guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
            fatalError("Couldn't get the format description")
        }
        channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
    }
    
    let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
    let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
    
    var outputSamples = [Float]()
    var sampleBuffer = Data()
    
    // 16-bit samples
    reader.startReading()
    defer { reader.cancelReading() }
    
    while reader.status == .reading {
        guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
            let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                break
        }
        // Append audio sample buffer into our current sample buffer
        var readBufferLength = 0
        var readBufferPointer: UnsafeMutablePointer<Int8>?
        CMBlockBufferGetDataPointer(readBuffer, 0, &readBufferLength, nil, &readBufferPointer)
        sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
        CMSampleBufferInvalidate(readSampleBuffer)
        
        let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
        let downSampledLength = totalSamples / samplesPerPixel
        let samplesToProcess = downSampledLength * samplesPerPixel
        
        guard samplesToProcess > 0 else { continue }
        
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }
    
    // Process the remaining samples at the end which didn't fit into samplesPerPixel
    let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
    if samplesToProcess > 0 {
        let downSampledLength = 1
        let samplesPerPixel = samplesToProcess
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
        
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }
    
    // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
    guard reader.status == .completed else {
        fatalError("Couldn't read the audio file")
    }
    
    return outputSamples
}

E) render 使用此函数对音频文件中的数据进行下采样，并转换为分贝：

func processSamples(fromData sampleBuffer: inout Data,
                    outputSamples: inout [Float],
                    samplesToProcess: Int,
                    downSampledLength: Int,
                    samplesPerPixel: Int,
                    filter: [Float]) {
    sampleBuffer.withUnsafeBytes { (samples: UnsafePointer<Int16>) in
        var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)
        
        let sampleCount = vDSP_Length(samplesToProcess)
        
        //Convert 16bit int samples to floats
        vDSP_vflt16(samples, 1, &processingBuffer, 1, sampleCount)
        
        //Take the absolute values to get amplitude
        vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)
        
        //get the corresponding dB, and clip the results
        getdB(from: &processingBuffer)
        
        //Downsample and average
        var downSampledData = [Float](repeating: 0.0, count: downSampledLength)
        vDSP_desamp(processingBuffer,
                    vDSP_Stride(samplesPerPixel),
                    filter, &downSampledData,
                    vDSP_Length(downSampledLength),
                    vDSP_Length(samplesPerPixel))
        
        //Remove processed samples
        sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)
        
        outputSamples += downSampledData
    }
}

F) 依次调用该函数获取对应的dB，并将结果剪辑到[noiseFloor, 0]：

func getdB(from normalizedSamples: inout [Float]) {
    // Convert samples to a log scale
    var zero: Float = 32768.0
    vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)
    
    //Clip to [noiseFloor, 0]
    var ceil: Float = 0.0
    var noiseFloorMutable = noiseFloor
    vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
}

G) 最后你可以像这样得到音频的波形：

guard let path = Bundle.main.path(forResource: "audio", ofType:"mp3") else {
    fatalError("Couldn't find the file path")
}
let url = URL(fileURLWithPath: path)
var outputArray : [Float] = []
AudioContext.load(fromAudioURL: url, completionHandler: { audioContext in
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }
    outputArray = self.render(audioContext: audioContext, targetSamples: 300)
})

不要忘记AudioContext.load(fromAudioURL:) 是异步的。

此解决方案由 William Entriken 从this repo 合成。所有功劳归于他。

斯威夫特 5

这是更新为 Swift 5 语法的相同代码：

import AVFoundation
import Accelerate

/// Holds audio information used for building waveforms
final class AudioContext {
    
    /// The audio asset URL used to load the context
    public let audioURL: URL
    
    /// Total number of samples in loaded asset
    public let totalSamples: Int
    
    /// Loaded asset
    public let asset: AVAsset
    
    // Loaded assetTrack
    public let assetTrack: AVAssetTrack
    
    private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
        self.audioURL = audioURL
        self.totalSamples = totalSamples
        self.asset = asset
        self.assetTrack = assetTrack
    }
    
    public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
        let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])
        
        guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
            fatalError("Couldn't load AVAssetTrack")
        }
        
        asset.loadValuesAsynchronously(forKeys: ["duration"]) {
            var error: NSError?
            let status = asset.statusOfValue(forKey: "duration", error: &error)
            switch status {
            case .loaded:
                guard
                    let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                    let audioFormatDesc = formatDescriptions.first,
                    let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                    else { break }
                
                let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                completionHandler(audioContext)
                return
                
            case .failed, .cancelled, .loading, .unknown:
                print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
            }
            
            completionHandler(nil)
        }
    }
}

let noiseFloor: Float = -80

func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
    guard let audioContext = audioContext else {
        fatalError("Couldn't create the audioContext")
    }
    
    let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples
    
    guard let reader = try? AVAssetReader(asset: audioContext.asset)
        else {
            fatalError("Couldn't initialize the AVAssetReader")
    }
    
    reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                   duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))
    
    let outputSettingsDict: [String : Any] = [
        AVFormatIDKey: Int(kAudioFormatLinearPCM),
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    
    let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                outputSettings: outputSettingsDict)
    readerOutput.alwaysCopiesSampleData = false
    reader.add(readerOutput)
    
    var channelCount = 1
    let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
    for item in formatDescriptions {
        guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
            fatalError("Couldn't get the format description")
        }
        channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
    }
    
    let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
    let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
    
    var outputSamples = [Float]()
    var sampleBuffer = Data()
    
    // 16-bit samples
    reader.startReading()
    defer { reader.cancelReading() }
    
    while reader.status == .reading {
        guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
            let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                break
        }
        // Append audio sample buffer into our current sample buffer
        var readBufferLength = 0
        var readBufferPointer: UnsafeMutablePointer<Int8>?
        CMBlockBufferGetDataPointer(readBuffer,
                                    atOffset: 0,
                                    lengthAtOffsetOut: &readBufferLength,
                                    totalLengthOut: nil,
                                    dataPointerOut: &readBufferPointer)
        sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
        CMSampleBufferInvalidate(readSampleBuffer)
        
        let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
        let downSampledLength = totalSamples / samplesPerPixel
        let samplesToProcess = downSampledLength * samplesPerPixel
        
        guard samplesToProcess > 0 else { continue }
        
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }
    
    // Process the remaining samples at the end which didn't fit into samplesPerPixel
    let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
    if samplesToProcess > 0 {
        let downSampledLength = 1
        let samplesPerPixel = samplesToProcess
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
        
        processSamples(fromData: &sampleBuffer,
                       outputSamples: &outputSamples,
                       samplesToProcess: samplesToProcess,
                       downSampledLength: downSampledLength,
                       samplesPerPixel: samplesPerPixel,
                       filter: filter)
        //print("Status: \(reader.status)")
    }
    
    // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
    guard reader.status == .completed else {
        fatalError("Couldn't read the audio file")
    }
    
    return outputSamples
}

func processSamples(fromData sampleBuffer: inout Data,
                    outputSamples: inout [Float],
                    samplesToProcess: Int,
                    downSampledLength: Int,
                    samplesPerPixel: Int,
                    filter: [Float]) {
    
    sampleBuffer.withUnsafeBytes { (samples: UnsafeRawBufferPointer) in
        var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)
        
        let sampleCount = vDSP_Length(samplesToProcess)
        
        //Create an UnsafePointer<Int16> from samples
        let unsafeBufferPointer = samples.bindMemory(to: Int16.self)
        let unsafePointer = unsafeBufferPointer.baseAddress!
        
        //Convert 16bit int samples to floats
        vDSP_vflt16(unsafePointer, 1, &processingBuffer, 1, sampleCount)
        
        //Take the absolute values to get amplitude
        vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)
        
        //get the corresponding dB, and clip the results
        getdB(from: &processingBuffer)
        
        //Downsample and average
        var downSampledData = [Float](repeating: 0.0, count: downSampledLength)
        vDSP_desamp(processingBuffer,
                    vDSP_Stride(samplesPerPixel),
                    filter, &downSampledData,
                    vDSP_Length(downSampledLength),
                    vDSP_Length(samplesPerPixel))
        
        //Remove processed samples
        sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)
        
        outputSamples += downSampledData
    }
}

func getdB(from normalizedSamples: inout [Float]) {
    // Convert samples to a log scale
    var zero: Float = 32768.0
    vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)
    
    //Clip to [noiseFloor, 0]
    var ceil: Float = 0.0
    var noiseFloorMutable = noiseFloor
    vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
}

旧解决方案

这是一个功能，您可以使用它来预渲染音频文件的电平表而不播放它：

func averagePowers(audioFileURL: URL, forChannel channelNumber: Int, completionHandler: @escaping(_ success: [Float]) -> ()) {
    let audioFile = try! AVAudioFile(forReading: audioFileURL)
    let audioFilePFormat = audioFile.processingFormat
    let audioFileLength = audioFile.length
    
    //Set the size of frames to read from the audio file, you can adjust this to your liking
    let frameSizeToRead = Int(audioFilePFormat.sampleRate/20)
    
    //This is to how many frames/portions we're going to divide the audio file
    let numberOfFrames = Int(audioFileLength)/frameSizeToRead
    
    //Create a pcm buffer the size of a frame
    guard let audioBuffer = AVAudioPCMBuffer(pcmFormat: audioFilePFormat, frameCapacity: AVAudioFrameCount(frameSizeToRead)) else {
        fatalError("Couldn't create the audio buffer")
    }
    
    //Do the calculations in a background thread, if you don't want to block the main thread for larger audio files
    DispatchQueue.global(qos: .userInitiated).async {
        
        //This is the array to be returned
        var returnArray : [Float] = [Float]()
        
        //We're going to read the audio file, frame by frame
        for i in 0..<numberOfFrames {
            
            //Change the position from which we are reading the audio file, since each frame starts from a different position in the audio file
            audioFile.framePosition = AVAudioFramePosition(i * frameSizeToRead)
            
            //Read the frame from the audio file
            try! audioFile.read(into: audioBuffer, frameCount: AVAudioFrameCount(frameSizeToRead))
            
            //Get the data from the chosen channel
            let channelData = audioBuffer.floatChannelData![channelNumber]
            
            //This is the array of floats
            let arr = Array(UnsafeBufferPointer(start:channelData, count: frameSizeToRead))
            
            //Calculate the mean value of the absolute values
            let meanValue = arr.reduce(0, {$0 + abs($1)})/Float(arr.count)
            
            //Calculate the dB power (You can adjust this), if average is less than 0.000_000_01 we limit it to -160.0
            let dbPower: Float = meanValue > 0.000_000_01 ? 20 * log10(meanValue) : -160.0
            
            //append the db power in the current frame to the returnArray
            returnArray.append(dbPower)
        }
        
        //Return the dBPowers
        completionHandler(returnArray)
    }
}

你可以这样称呼它：

let path = Bundle.main.path(forResource: "audio.mp3", ofType:nil)!
let url = URL(fileURLWithPath: path)
averagePowers(audioFileURL: url, forChannel: 0, completionHandler: { array in
    //Use the array
})

使用仪器，此解决方案在 1.2 秒内使 cpu 使用率很高，使用 returnArray 返回主线程大约需要 5 秒，在低电量模式下最多需要 10 秒。

【讨论】：

样本数是您希望在结果数组中获得的浮点数
@PeterWarbo 如果您想从所有渠道获取数据，我可以添加一个简单的调整...
但是输出不是取决于音频文件的长度吗？如果是一分钟，我可以预期输出的样本数量比长度为 10 秒的多吗？我看不到调用者如何决定接收多少样本？它应该取决于音频长度吗？
@JoshuaHart：我已经包含了 Swift 5 语法。您所要做的就是将UnsafeRawBufferPointer 转换为UnsafePointer<Int16>。
@AlexandruMotoc 我错误地将sampleBuffer.withUnsafeBytes 代码块粘贴了两次。现在已修复。谢谢！

【解决方案2】：

首先，这是一项繁重的操作，因此需要一些操作系统时间和资源来完成此操作。在下面的示例中，我将使用标准帧速率和采样，但是如果您只想将条形图显示为指示，那么您的采样量应该少得多

好的，所以您不需要播放声音来分析它。所以在这我根本不会使用AVAudioPlayer，我假设我将使用URL：

    let path = Bundle.main.path(forResource: "example3.mp3", ofType:nil)!
    let url = URL(fileURLWithPath: path)

然后我将使用AVAudioFile 将轨道信息获取到AVAudioPCMBuffer。每当您将它放在缓冲区中时，您都会获得有关您的曲目的所有信息：

func buffer(url: URL) {
    do {
        let track = try AVAudioFile(forReading: url)
        let format = AVAudioFormat(commonFormat:.pcmFormatFloat32, sampleRate:track.fileFormat.sampleRate, channels: track.fileFormat.channelCount,  interleaved: false)
        let buffer = AVAudioPCMBuffer(pcmFormat: format!, frameCapacity: UInt32(track.length))!
        try track.read(into : buffer, frameCount:UInt32(track.length))
        self.analyze(buffer: buffer)
    } catch {
        print(error)
    }
}

您可能注意到有analyze 方法。您的缓冲区中应该有接近 floatChannelData 的变量。这是一个普通数据，因此您需要对其进行解析。我将发布一个方法并在下面解释这一点：

func analyze(buffer: AVAudioPCMBuffer) {
    let channelCount = Int(buffer.format.channelCount)
    let frameLength = Int(buffer.frameLength)
    var result = Array(repeating: [Float](repeatElement(0, count: frameLength)), count: channelCount)
    for channel in 0..<channelCount {
        for sampleIndex in 0..<frameLength {
            let sqrtV = sqrt(buffer.floatChannelData![channel][sampleIndex*buffer.stride]/Float(buffer.frameLength))
            let dbPower = 20 * log10(sqrtV)
            result[channel][sampleIndex] = dbPower
        }
    }
}

其中涉及一些计算（重的）。当我在几个飞蛾前研究类似的解决方案时，我遇到了这个教程：https://www.raywenderlich.com/5154-avaudioengine-tutorial-for-ios-getting-started 那里对这个计算有很好的解释，还有我在上面粘贴的部分代码，也在我的项目中使用，所以我想归功于作者：Scott McAlister ?

【讨论】：

【解决方案3】：

根据上面@Jakub 的回答，这是一个Objective-C 版本。

如果您想提高准确度，请更改 deciblesCount 变量，但要注意性能损失。如果你想返回更多的柱，你可以在调用函数时增加divisions 变量（没有额外的性能影响）。无论如何，您应该将它放在后台线程上。

一首 3:36 分钟 / 5.2MB 的歌曲大约需要 1.2 秒。上图分别是30师和100师的霰弹枪射击

-(NSArray *)returnWaveArrayForFile:(NSString *)filepath numberOfDivisions:(int)divisions{
    
    //pull file
    NSError * error;
    NSURL * url = [NSURL URLWithString:filepath];
    AVAudioFile * file = [[AVAudioFile alloc] initForReading:url error:&error];
    
    //create av stuff
    AVAudioFormat * format = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32 sampleRate:file.fileFormat.sampleRate channels:file.fileFormat.channelCount interleaved:false];
    AVAudioPCMBuffer * buffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:format frameCapacity:(int)file.length];
    [file readIntoBuffer:buffer frameCount:(int)file.length error:&error];
    
    //grab total number of decibles, 1000 seems to work
    int deciblesCount = MIN(1000,buffer.frameLength);
    NSMutableArray * channels = [NSMutableArray new];
    float frameIncrement = buffer.frameLength / (float)deciblesCount;
    
    //needed later
    float maxDecible = 0;
    float minDecible = HUGE_VALF;
    NSMutableArray * sd = [NSMutableArray new]; //used for standard deviation
    for (int n = 0; n < MIN(buffer.format.channelCount, 2); n++){ //go through channels
        NSMutableArray * decibles = [NSMutableArray new]; //holds actual decible values
        
        //go through pulling the decibles
        for (int i = 0; i < deciblesCount; i++){
            
            int offset = frameIncrement * i; //grab offset
            //equation from stack, no idea the maths
            float sqr = sqrtf(buffer.floatChannelData[n][offset * buffer.stride]/(float)buffer.frameLength);
            float decible = 20 * log10f(sqr);
            
            decible += 160; //make positive
            decible = (isnan(decible) || decible < 0) ? 0 : decible; //if it's not a number or silent, make it zero
            if (decible > 0){ //if it has volume
                [sd addObject:@(decible)];
            }
            [decibles addObject:@(decible)];//add to decibles array
            
            maxDecible = MAX(maxDecible, decible); //grab biggest
            minDecible = MIN(minDecible, decible); //grab smallest
        }
        
        [channels addObject:decibles]; //add to channels array
    }
    
    //find standard deviation and then deducted the bottom slag
    NSExpression * expression = [NSExpression expressionForFunction:@"stddev:" arguments:@[[NSExpression expressionForConstantValue:sd]]];
    float standardDeviation = [[expression expressionValueWithObject:nil context:nil] floatValue];
    float deviationDeduct = standardDeviation / (standardDeviation + (maxDecible - minDecible));

    //go through calculating deviation percentage
    NSMutableArray * deviations = [NSMutableArray new];
    NSMutableArray * returning = [NSMutableArray new];
    for (int c = 0; c < (int)channels.count; c++){
        
        NSArray * channel = channels[c];
        for (int n = 0; n < (int)channel.count; n++){
            
            float decible = [channel[n] floatValue];
            float remainder = (maxDecible - decible);
            float deviation = standardDeviation / (standardDeviation + remainder) - deviationDeduct;
            [deviations addObject:@(deviation)];
        }
        
        //go through creating percentage
        float maxTotal = 0;
        int catchCount = floorf(deciblesCount / divisions); //total decible values within a segment or division
        NSMutableArray * totals = [NSMutableArray new];
        for (int n = 0; n < divisions; n++){
            
            float total = 0.0f;
            for (int k = 0; k < catchCount; k++){ //go through each segment
                int index = n * catchCount + k; //create the index
                float deviation = [deviations[index] floatValue]; //grab value
                total += deviation; //add to total
            }
            
            //max out maxTotal var -> used later to calc percentages
            maxTotal = MAX(maxTotal, total);
            [totals addObject:@(total)]; //add to totals array
        }
        
        //normalise percentages and return
        NSMutableArray * percentages = [NSMutableArray new];
        for (int n = 0; n < divisions; n++){
            
            float total = [totals[n] floatValue]; //grab the total value for that segment
            float percentage = total / maxTotal; //divide by the biggest value -> making it a percentage
            [percentages addObject:@(percentage)]; //add to the array
        }
        
        //add to the returning array
        [returning addObject:percentages];
    }
    
    //return channel data -> array of two arrays of percentages
    return (NSArray *)returning;
    
}

这样调用：

int divisions = 30; //number of segments you want for your display
NSString * path = [[NSBundle mainBundle] pathForResource:@"satie" ofType:@"mp3"];
NSArray * channels = [_audioReader returnWaveArrayForFile:path numberOfDivisions:divisions];

您可以在该数组中返回两个通道，您可以使用它来更新您的 UI。每个数组中的值介于 0 和 1 之间，您可以使用它们来构建条形图。

【讨论】：