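【Question title】: Extract meter levels from audio file
【Posted】: 2019-01-13 08:03:29
【Question】:

I need to extract audio meter levels from a file so that I can render the levels before playing the audio. I know that AVAudioPlayer can provide this information while it plays the audio file, through

func averagePower(forChannel channelNumber: Int) -> Float.

In my case, however, I would like to obtain the meter levels as a [Float] beforehand.

【Question comments】:

    Tags: ios swift audio avaudioplayer audiotoolbox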


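    【Solution 1】:

    Swift 4

    On an iPhone, it takes:

    • 0.538 s to process an 8 MB MP3 file with a duration of 4 min 47 s and a sample rate of 44,100 Hz

    • 0.170 s to process a 712 KB MP3 file with a duration of 22 s and a sample rate of 44,100 Hz

    • 0.089 s to process the CAF file created by converting the file above with this terminal command: afconvert -f caff -d LEI16 audio.mp3 audio.caf

    Let's begin:

    A) Declare this class, which will hold the necessary information about the audio asset: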

    /// Holds audio information used for building waveforms
    final class AudioContext {
        
        /// The audio asset URL used to load the context
        public let audioURL: URL
        
        /// Total number of samples in loaded asset
        public let totalSamples: Int
        
        /// Loaded asset
        public let asset: AVAsset
        
        /// Loaded assetTrack
        public let assetTrack: AVAssetTrack
        
        private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
            self.audioURL = audioURL
            self.totalSamples = totalSamples
            self.asset = asset
            self.assetTrack = assetTrack
        }
        
        public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
            let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])
            
            guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
                fatalError("Couldn't load AVAssetTrack")
            }
            
            asset.loadValuesAsynchronously(forKeys: ["duration"]) {
                var error: NSError?
                let status = asset.statusOfValue(forKey: "duration", error: &error)
                switch status {
                case .loaded:
                    guard
                        let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                        let audioFormatDesc = formatDescriptions.first,
                        let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                        else { break }
                    
                    let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                    let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                    completionHandler(audioContext)
                    return
                    
                case .failed, .cancelled, .loading, .unknown:
                    print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
                }
                
                completionHandler(nil)
            }
        }
    }
    

    We will use its asynchronous function load and handle its result in a completion handler.

    B) Import AVFoundation and Accelerate in your view controller:

    import AVFoundation
    import Accelerate
    

    C) Declare the noise floor (in dB) in your view controller:

    let noiseFloor: Float = -80
    

    For example, anything below -80 dB will be treated as silence.
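
To make the noise floor concrete, here is a minimal scalar sketch of the conversion that step F below performs in bulk with vDSP: a 16-bit sample is scaled against full scale (32768), converted with 20 * log10, and clamped to [noiseFloor, 0]. The helper name decibels(forSample:noiseFloor:) is hypothetical and only serves as an illustration.

```swift
import Foundation

/// Converts one signed 16-bit PCM sample to dB relative to full scale,
/// clamped to [noiseFloor, 0]. Hypothetical helper, not part of the answer's code.
func decibels(forSample sample: Int16, noiseFloor: Float = -80) -> Float {
    // Convert to Float before abs() so that Int16.min does not overflow.
    let amplitude = abs(Float(sample)) / 32768.0
    // log10(0) is -infinity, so treat digital silence as the noise floor.
    guard amplitude > 0 else { return noiseFloor }
    let dB = 20 * log10(amplitude)
    return min(0, max(noiseFloor, dB))
}
```

With the default floor, a sample value of 1 (about -90 dB) is reported as -80, which is what "treated as silence" means in practice.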

    D) The following function takes an audio context and produces the desired dB powers. targetSamples defaults to 100; you can change it to suit your UI needs:

    func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
        guard let audioContext = audioContext else {
            fatalError("Couldn't create the audioContext")
        }
        
        let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples
        
        guard let reader = try? AVAssetReader(asset: audioContext.asset)
            else {
                fatalError("Couldn't initialize the AVAssetReader")
        }
        
        reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                       duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))
        
        let outputSettingsDict: [String : Any] = [
            AVFormatIDKey: Int(kAudioFormatLinearPCM),
            AVLinearPCMBitDepthKey: 16,
            AVLinearPCMIsBigEndianKey: false,
            AVLinearPCMIsFloatKey: false,
            AVLinearPCMIsNonInterleaved: false
        ]
        
        let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                    outputSettings: outputSettingsDict)
        readerOutput.alwaysCopiesSampleData = false
        reader.add(readerOutput)
        
        var channelCount = 1
        let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
        for item in formatDescriptions {
            guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
                fatalError("Couldn't get the format description")
            }
            channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
        }
        
        let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
        
        var outputSamples = [Float]()
        var sampleBuffer = Data()
        
        // 16-bit samples
        reader.startReading()
        defer { reader.cancelReading() }
        
        while reader.status == .reading {
            guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
                let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                    break
            }
            // Append audio sample buffer into our current sample buffer
            var readBufferLength = 0
            var readBufferPointer: UnsafeMutablePointer<Int8>?
            CMBlockBufferGetDataPointer(readBuffer, 0, &readBufferLength, nil, &readBufferPointer)
            sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
            CMSampleBufferInvalidate(readSampleBuffer)
            
            let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
            let downSampledLength = totalSamples / samplesPerPixel
            let samplesToProcess = downSampledLength * samplesPerPixel
            
            guard samplesToProcess > 0 else { continue }
            
            processSamples(fromData: &sampleBuffer,
                           outputSamples: &outputSamples,
                           samplesToProcess: samplesToProcess,
                           downSampledLength: downSampledLength,
                           samplesPerPixel: samplesPerPixel,
                           filter: filter)
            //print("Status: \(reader.status)")
        }
        
        // Process the remaining samples at the end which didn't fit into samplesPerPixel
        let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
        if samplesToProcess > 0 {
            let downSampledLength = 1
            let samplesPerPixel = samplesToProcess
            let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
            
            processSamples(fromData: &sampleBuffer,
                           outputSamples: &outputSamples,
                           samplesToProcess: samplesToProcess,
                           downSampledLength: downSampledLength,
                           samplesPerPixel: samplesPerPixel,
                           filter: filter)
            //print("Status: \(reader.status)")
        }
        
        // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
        guard reader.status == .completed else {
            fatalError("Couldn't read the audio file")
        }
        
        return outputSamples
    }
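
As a rough check on how render sizes its buckets, the sketch below repeats the arithmetic behind totalSamples and samplesPerPixel in isolation. The function name and the example figures (a stereo file of 4 min 47 s at 44,100 Hz) are assumptions chosen for illustration.

```swift
/// Number of interleaved 16-bit samples that are averaged into one output value.
/// Mirrors `max(1, channelCount * sampleRange.count / targetSamples)` from render.
func samplesPerPixel(sampleRate: Double,
                     durationSeconds: Double,
                     channelCount: Int,
                     targetSamples: Int) -> Int {
    precondition(targetSamples > 0, "targetSamples must be positive")
    // Samples per channel over the whole file.
    let totalSamples = Int(sampleRate * durationSeconds)
    return max(1, channelCount * totalSamples / targetSamples)
}

// 287 s of stereo audio at 44,100 Hz, reduced to 100 output values.
let perPixel = samplesPerPixel(sampleRate: 44_100,
                               durationSeconds: 287,
                               channelCount: 2,
                               targetSamples: 100)
print(perPixel) // 253134
```

The caller therefore controls the length of the result through targetSamples, while the file duration only changes how many samples are folded into each value.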
    

    E) render uses this function to downsample the data from the audio file and convert it to decibels:

    func processSamples(fromData sampleBuffer: inout Data,
                        outputSamples: inout [Float],
                        samplesToProcess: Int,
                        downSampledLength: Int,
                        samplesPerPixel: Int,
                        filter: [Float]) {
        sampleBuffer.withUnsafeBytes { (samples: UnsafePointer<Int16>) in
            var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)
            
            let sampleCount = vDSP_Length(samplesToProcess)
            
            //Convert 16bit int samples to floats
            vDSP_vflt16(samples, 1, &processingBuffer, 1, sampleCount)
            
            //Take the absolute values to get amplitude
            vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)
            
            //get the corresponding dB, and clip the results
            getdB(from: &processingBuffer)
            
            //Downsample and average
            var downSampledData = [Float](repeating: 0.0, count: downSampledLength)
            vDSP_desamp(processingBuffer,
                        vDSP_Stride(samplesPerPixel),
                        filter, &downSampledData,
                        vDSP_Length(downSampledLength),
                        vDSP_Length(samplesPerPixel))
            
            //Remove processed samples
            sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)
            
            outputSamples += downSampledData
        }
    }
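
The "downsample and average" step can be pictured without Accelerate. With a filter whose taps are all 1 / samplesPerPixel and a stride equal to the filter length, the decimation reduces to taking the mean of each consecutive block. The plain Swift sketch below shows that equivalent computation; blockAverages is a hypothetical name, and the sketch makes no attempt to match vDSP's speed.

```swift
/// Averages consecutive blocks of `blockSize` values.
/// Values at the end that do not fill a whole block are ignored here,
/// in the same way render handles its remainder in a separate pass.
func blockAverages(_ values: [Float], blockSize: Int) -> [Float] {
    precondition(blockSize > 0, "blockSize must be positive")
    let blockCount = values.count / blockSize
    return (0..<blockCount).map { block -> Float in
        let start = block * blockSize
        let sum: Float = values[start..<start + blockSize].reduce(0, +)
        return sum / Float(blockSize)
    }
}

let averaged = blockAverages([1, 2, 3, 4, 5], blockSize: 2)
print(averaged) // [1.5, 3.5]
```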
    

    F) That function in turn calls this one to get the corresponding dB values and clip the results to [noiseFloor, 0]:

    func getdB(from normalizedSamples: inout [Float]) {
        // Convert samples to a log scale
        var zero: Float = 32768.0
        vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)
        
        //Clip to [noiseFloor, 0]
        var ceil: Float = 0.0
        var noiseFloorMutable = noiseFloor
        vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
    }
    

    G) Finally, you can get the waveform of the audio like this:

    guard let path = Bundle.main.path(forResource: "audio", ofType:"mp3") else {
        fatalError("Couldn't find the file path")
    }
    let url = URL(fileURLWithPath: path)
    var outputArray : [Float] = []
    AudioContext.load(fromAudioURL: url, completionHandler: { audioContext in
        guard let audioContext = audioContext else {
            fatalError("Couldn't create the audioContext")
        }
        outputArray = self.render(audioContext: audioContext, targetSamples: 300)
    })
    

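    Don't forget that AudioContext.load(fromAudioURL:) is asynchronous, so outputArray is only filled once the completion handler has run.

    This solution was synthesized from a repo by William Entriken. All credit goes to him.


    Swift 5

    Here is the same code updated to Swift 5 syntax: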

    import AVFoundation
    import Accelerate
    
    /// Holds audio information used for building waveforms
    final class AudioContext {
        
        /// The audio asset URL used to load the context
        public let audioURL: URL
        
        /// Total number of samples in loaded asset
        public let totalSamples: Int
        
        /// Loaded asset
        public let asset: AVAsset
        
        /// Loaded assetTrack
        public let assetTrack: AVAssetTrack
        
        private init(audioURL: URL, totalSamples: Int, asset: AVAsset, assetTrack: AVAssetTrack) {
            self.audioURL = audioURL
            self.totalSamples = totalSamples
            self.asset = asset
            self.assetTrack = assetTrack
        }
        
        public static func load(fromAudioURL audioURL: URL, completionHandler: @escaping (_ audioContext: AudioContext?) -> ()) {
            let asset = AVURLAsset(url: audioURL, options: [AVURLAssetPreferPreciseDurationAndTimingKey: NSNumber(value: true as Bool)])
            
            guard let assetTrack = asset.tracks(withMediaType: AVMediaType.audio).first else {
                fatalError("Couldn't load AVAssetTrack")
            }
            
            asset.loadValuesAsynchronously(forKeys: ["duration"]) {
                var error: NSError?
                let status = asset.statusOfValue(forKey: "duration", error: &error)
                switch status {
                case .loaded:
                    guard
                        let formatDescriptions = assetTrack.formatDescriptions as? [CMAudioFormatDescription],
                        let audioFormatDesc = formatDescriptions.first,
                        let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
                        else { break }
                    
                    let totalSamples = Int((asbd.pointee.mSampleRate) * Float64(asset.duration.value) / Float64(asset.duration.timescale))
                    let audioContext = AudioContext(audioURL: audioURL, totalSamples: totalSamples, asset: asset, assetTrack: assetTrack)
                    completionHandler(audioContext)
                    return
                    
                case .failed, .cancelled, .loading, .unknown:
                    print("Couldn't load asset: \(error?.localizedDescription ?? "Unknown error")")
                }
                
                completionHandler(nil)
            }
        }
    }
    
    let noiseFloor: Float = -80
    
    func render(audioContext: AudioContext?, targetSamples: Int = 100) -> [Float]{
        guard let audioContext = audioContext else {
            fatalError("Couldn't create the audioContext")
        }
        
        let sampleRange: CountableRange<Int> = 0..<audioContext.totalSamples
        
        guard let reader = try? AVAssetReader(asset: audioContext.asset)
            else {
                fatalError("Couldn't initialize the AVAssetReader")
        }
        
        reader.timeRange = CMTimeRange(start: CMTime(value: Int64(sampleRange.lowerBound), timescale: audioContext.asset.duration.timescale),
                                       duration: CMTime(value: Int64(sampleRange.count), timescale: audioContext.asset.duration.timescale))
        
        let outputSettingsDict: [String : Any] = [
            AVFormatIDKey: Int(kAudioFormatLinearPCM),
            AVLinearPCMBitDepthKey: 16,
            AVLinearPCMIsBigEndianKey: false,
            AVLinearPCMIsFloatKey: false,
            AVLinearPCMIsNonInterleaved: false
        ]
        
        let readerOutput = AVAssetReaderTrackOutput(track: audioContext.assetTrack,
                                                    outputSettings: outputSettingsDict)
        readerOutput.alwaysCopiesSampleData = false
        reader.add(readerOutput)
        
        var channelCount = 1
        let formatDescriptions = audioContext.assetTrack.formatDescriptions as! [CMAudioFormatDescription]
        for item in formatDescriptions {
            guard let fmtDesc = CMAudioFormatDescriptionGetStreamBasicDescription(item) else {
                fatalError("Couldn't get the format description")
            }
            channelCount = Int(fmtDesc.pointee.mChannelsPerFrame)
        }
        
        let samplesPerPixel = max(1, channelCount * sampleRange.count / targetSamples)
        let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
        
        var outputSamples = [Float]()
        var sampleBuffer = Data()
        
        // 16-bit samples
        reader.startReading()
        defer { reader.cancelReading() }
        
        while reader.status == .reading {
            guard let readSampleBuffer = readerOutput.copyNextSampleBuffer(),
                let readBuffer = CMSampleBufferGetDataBuffer(readSampleBuffer) else {
                    break
            }
            // Append audio sample buffer into our current sample buffer
            var readBufferLength = 0
            var readBufferPointer: UnsafeMutablePointer<Int8>?
            CMBlockBufferGetDataPointer(readBuffer,
                                        atOffset: 0,
                                        lengthAtOffsetOut: &readBufferLength,
                                        totalLengthOut: nil,
                                        dataPointerOut: &readBufferPointer)
            sampleBuffer.append(UnsafeBufferPointer(start: readBufferPointer, count: readBufferLength))
            CMSampleBufferInvalidate(readSampleBuffer)
            
            let totalSamples = sampleBuffer.count / MemoryLayout<Int16>.size
            let downSampledLength = totalSamples / samplesPerPixel
            let samplesToProcess = downSampledLength * samplesPerPixel
            
            guard samplesToProcess > 0 else { continue }
            
            processSamples(fromData: &sampleBuffer,
                           outputSamples: &outputSamples,
                           samplesToProcess: samplesToProcess,
                           downSampledLength: downSampledLength,
                           samplesPerPixel: samplesPerPixel,
                           filter: filter)
            //print("Status: \(reader.status)")
        }
        
        // Process the remaining samples at the end which didn't fit into samplesPerPixel
        let samplesToProcess = sampleBuffer.count / MemoryLayout<Int16>.size
        if samplesToProcess > 0 {
            let downSampledLength = 1
            let samplesPerPixel = samplesToProcess
            let filter = [Float](repeating: 1.0 / Float(samplesPerPixel), count: samplesPerPixel)
            
            processSamples(fromData: &sampleBuffer,
                           outputSamples: &outputSamples,
                           samplesToProcess: samplesToProcess,
                           downSampledLength: downSampledLength,
                           samplesPerPixel: samplesPerPixel,
                           filter: filter)
            //print("Status: \(reader.status)")
        }
        
        // if (reader.status == AVAssetReaderStatusFailed || reader.status == AVAssetReaderStatusUnknown)
        guard reader.status == .completed else {
            fatalError("Couldn't read the audio file")
        }
        
        return outputSamples
    }
    
    func processSamples(fromData sampleBuffer: inout Data,
                        outputSamples: inout [Float],
                        samplesToProcess: Int,
                        downSampledLength: Int,
                        samplesPerPixel: Int,
                        filter: [Float]) {
        
        sampleBuffer.withUnsafeBytes { (samples: UnsafeRawBufferPointer) in
            var processingBuffer = [Float](repeating: 0.0, count: samplesToProcess)
            
            let sampleCount = vDSP_Length(samplesToProcess)
            
            //Create an UnsafePointer<Int16> from samples
            let unsafeBufferPointer = samples.bindMemory(to: Int16.self)
            let unsafePointer = unsafeBufferPointer.baseAddress!
            
            //Convert 16bit int samples to floats
            vDSP_vflt16(unsafePointer, 1, &processingBuffer, 1, sampleCount)
            
            //Take the absolute values to get amplitude
            vDSP_vabs(processingBuffer, 1, &processingBuffer, 1, sampleCount)
            
            //get the corresponding dB, and clip the results
            getdB(from: &processingBuffer)
            
            //Downsample and average
            var downSampledData = [Float](repeating: 0.0, count: downSampledLength)
            vDSP_desamp(processingBuffer,
                        vDSP_Stride(samplesPerPixel),
                        filter, &downSampledData,
                        vDSP_Length(downSampledLength),
                        vDSP_Length(samplesPerPixel))
            
            //Remove processed samples
            sampleBuffer.removeFirst(samplesToProcess * MemoryLayout<Int16>.size)
            
            outputSamples += downSampledData
        }
    }
    
    func getdB(from normalizedSamples: inout [Float]) {
        // Convert samples to a log scale
        var zero: Float = 32768.0
        vDSP_vdbcon(normalizedSamples, 1, &zero, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count), 1)
        
        //Clip to [noiseFloor, 0]
        var ceil: Float = 0.0
        var noiseFloorMutable = noiseFloor
        vDSP_vclip(normalizedSamples, 1, &noiseFloorMutable, &ceil, &normalizedSamples, 1, vDSP_Length(normalizedSamples.count))
    }
    

    Old solution

    Here is a function you can use to pre-render the meter levels of an audio file without playing it:

    func averagePowers(audioFileURL: URL, forChannel channelNumber: Int, completionHandler: @escaping(_ success: [Float]) -> ()) {
        let audioFile = try! AVAudioFile(forReading: audioFileURL)
        let audioFilePFormat = audioFile.processingFormat
        let audioFileLength = audioFile.length
        
        //Set the size of frames to read from the audio file, you can adjust this to your liking
        let frameSizeToRead = Int(audioFilePFormat.sampleRate/20)
        
        //This is to how many frames/portions we're going to divide the audio file
        let numberOfFrames = Int(audioFileLength)/frameSizeToRead
        
        //Create a pcm buffer the size of a frame
        guard let audioBuffer = AVAudioPCMBuffer(pcmFormat: audioFilePFormat, frameCapacity: AVAudioFrameCount(frameSizeToRead)) else {
            fatalError("Couldn't create the audio buffer")
        }
        
        //Do the calculations in a background thread, if you don't want to block the main thread for larger audio files
        DispatchQueue.global(qos: .userInitiated).async {
            
            //This is the array to be returned
            var returnArray : [Float] = [Float]()
            
            //We're going to read the audio file, frame by frame
            for i in 0..<numberOfFrames {
                
                //Change the position from which we are reading the audio file, since each frame starts from a different position in the audio file
                audioFile.framePosition = AVAudioFramePosition(i * frameSizeToRead)
                
                //Read the frame from the audio file
                try! audioFile.read(into: audioBuffer, frameCount: AVAudioFrameCount(frameSizeToRead))
                
                //Get the data from the chosen channel
                let channelData = audioBuffer.floatChannelData![channelNumber]
                
                //This is the array of floats
                let arr = Array(UnsafeBufferPointer(start:channelData, count: frameSizeToRead))
                
                //Calculate the mean value of the absolute values
                let meanValue = arr.reduce(0, {$0 + abs($1)})/Float(arr.count)
                
                //Calculate the dB power (You can adjust this), if average is less than 0.000_000_01 we limit it to -160.0
                let dbPower: Float = meanValue > 0.000_000_01 ? 20 * log10(meanValue) : -160.0
                
                //append the db power in the current frame to the returnArray
                returnArray.append(dbPower)
            }
            
            //Return the dBPowers
            completionHandler(returnArray)
        }
    }
    

    You can call it like this:

    let path = Bundle.main.path(forResource: "audio.mp3", ofType:nil)!
    let url = URL(fileURLWithPath: path)
    averagePowers(audioFileURL: url, forChannel: 0, completionHandler: { array in
        //Use the array
    })
    

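    Profiled with Instruments, this solution causes high CPU usage for about 1.2 seconds, takes about 5 seconds to return to the main thread with returnArray, and up to 10 seconds in Low Power Mode.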
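
The per-window arithmetic of averagePowers can be tried on an in-memory array, which also makes the output length explicit. The sketch below is an assumption-laden stand-in (windowedLevels is a hypothetical name and no file I/O is involved): each full window yields one value, so with a window of sampleRate / 20 frames the result contains about 20 values per second of audio.

```swift
import Foundation

/// Mean absolute level in dB for each full window of `windowSize` float samples.
/// Mirrors the mean/log10 step of averagePowers; a trailing partial window is dropped.
func windowedLevels(_ samples: [Float], windowSize: Int, floorDB: Float = -160) -> [Float] {
    precondition(windowSize > 0, "windowSize must be positive")
    let windowCount = samples.count / windowSize
    return (0..<windowCount).map { index -> Float in
        let window = samples[index * windowSize ..< (index + 1) * windowSize]
        let sum: Float = window.reduce(0) { $0 + abs($1) }
        let mean = sum / Float(windowSize)
        // Same threshold as the function above: very quiet windows map to the floor.
        return mean > 0.000_000_01 ? 20 * log10(mean) : floorDB
    }
}

// Seven samples with a window of two give three values; the last sample is dropped.
let levels = windowedLevels([1, -1, 0.1, -0.1, 0, 0, 0.5], windowSize: 2)
print(levels.count) // 3
```

Unlike render in the newer solution, the length of this result grows with the duration of the file rather than being chosen by the caller.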

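    【Comments】:

    • The number of samples is the number of floats you want to get in the resulting array
    • @PeterWarbo If you want to get the data from all channels, I can add a simple tweak...
    • But doesn't the output depend on the length of the audio file? If it is one minute long, can I expect more output samples than for a file that is 10 seconds long? I don't see how the caller decides how many samples to receive. Should it depend on the audio length?
    • @JoshuaHart: I have included the Swift 5 syntax. All you have to do is convert the UnsafeRawBufferPointer to an UnsafePointer<Int16>
    • @AlexandruMotoc I mistakenly pasted the sampleBuffer.withUnsafeBytes code block twice. It is fixed now. Thanks!
    【Solution 2】: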

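    First of all, this is a heavy operation, so it will take some OS time and resources to complete. In the example below I use the standard frame rate and sampling, but if you only want to display bars as an indication, you should sample far less.

    OK, so you don't need to play the sound in order to analyze it. Here I won't use AVAudioPlayer at all, and I assume the track is available as a URL: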

        let path = Bundle.main.path(forResource: "example3.mp3", ofType:nil)!
        let url = URL(fileURLWithPath: path)
    

    Then I will use AVAudioFile to read the track into an AVAudioPCMBuffer. Once you have it in the buffer, you have all the information about your track:

    func buffer(url: URL) {
        do {
            let track = try AVAudioFile(forReading: url)
            let format = AVAudioFormat(commonFormat:.pcmFormatFloat32, sampleRate:track.fileFormat.sampleRate, channels: track.fileFormat.channelCount,  interleaved: false)
            let buffer = AVAudioPCMBuffer(pcmFormat: format!, frameCapacity: UInt32(track.length))!
            try track.read(into : buffer, frameCount:UInt32(track.length))
            self.analyze(buffer: buffer)
        } catch {
            print(error)
        }
    }
    

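    You may have noticed the analyze method. Your buffer should now expose a floatChannelData variable. It is plain data, so you will need to parse it. I will post the method and explain it below: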

    func analyze(buffer: AVAudioPCMBuffer) {
        let channelCount = Int(buffer.format.channelCount)
        let frameLength = Int(buffer.frameLength)
        var result = Array(repeating: [Float](repeatElement(0, count: frameLength)), count: channelCount)
        for channel in 0..<channelCount {
            for sampleIndex in 0..<frameLength {
                let sqrtV = sqrt(buffer.floatChannelData![channel][sampleIndex*buffer.stride]/Float(buffer.frameLength))
                let dbPower = 20 * log10(sqrtV)
                result[channel][sampleIndex] = dbPower
            }
        }
    }
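
Note that analyze produces one value per frame and that the square root of a negative sample is NaN. For display purposes a level per window is usually enough. The sketch below swaps in the textbook RMS formula for that purpose; it is an assumption offered for comparison, not the expression used in analyze or in the tutorial credited below, and rmsDecibels is a hypothetical name.

```swift
import Foundation

/// RMS level in dB of one window of float samples in the range [-1, 1].
/// Squaring each sample first keeps negative samples from producing NaN.
func rmsDecibels(_ window: [Float], floorDB: Float = -160) -> Float {
    guard !window.isEmpty else { return floorDB }
    let sumOfSquares: Float = window.reduce(0) { $0 + $1 * $1 }
    let rms = (sumOfSquares / Float(window.count)).squareRoot()
    // log10(0) is -infinity, so silent windows are reported as the floor.
    return rms > 0 ? max(floorDB, 20 * log10(rms)) : floorDB
}
```

A window of full-scale samples such as [1, -1, 1, -1] has an RMS of 1 and therefore a level of 0 dB; halving the amplitude lowers the level by roughly 6 dB.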
    

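    There is some (heavy) computation involved. When I was working on a similar solution a few months ago, I came across this tutorial: https://www.raywenderlich.com/5154-avaudioengine-tutorial-for-ios-getting-started It explains this calculation well and also contains parts of the code I pasted above, which I use in my own project, so I want to credit the author: Scott McAlister

    【Comments】:

      【Solution 3】:

      Based on @Jakub's answer above, here is an Objective-C version.

      If you want to increase accuracy, change the deciblesCount variable, but beware of the performance hit. If you want to return more bars, you can increase the divisions variable when you call the function (with no additional performance hit). In any case, you should run it on a background thread.

      A 3:36 minute / 5.2 MB song takes about 1.2 s. The images in the original answer showed a shotgun firing, rendered with 30 and 100 divisions respectively.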

      -(NSArray *)returnWaveArrayForFile:(NSString *)filepath numberOfDivisions:(int)divisions{
          
          //pull file
          NSError * error;
          NSURL * url = [NSURL fileURLWithPath:filepath]; //filepath is a plain file-system path, not a URL string
          AVAudioFile * file = [[AVAudioFile alloc] initForReading:url error:&error];
          
          //create av stuff
          AVAudioFormat * format = [[AVAudioFormat alloc] initWithCommonFormat:AVAudioPCMFormatFloat32 sampleRate:file.fileFormat.sampleRate channels:file.fileFormat.channelCount interleaved:false];
          AVAudioPCMBuffer * buffer = [[AVAudioPCMBuffer alloc] initWithPCMFormat:format frameCapacity:(int)file.length];
          [file readIntoBuffer:buffer frameCount:(int)file.length error:&error];
          
          //grab total number of decibles, 1000 seems to work
          int deciblesCount = MIN(1000,buffer.frameLength);
          NSMutableArray * channels = [NSMutableArray new];
          float frameIncrement = buffer.frameLength / (float)deciblesCount;
          
          //needed later
          float maxDecible = 0;
          float minDecible = HUGE_VALF;
          NSMutableArray * sd = [NSMutableArray new]; //used for standard deviation
          for (int n = 0; n < MIN(buffer.format.channelCount, 2); n++){ //go through channels
              NSMutableArray * decibles = [NSMutableArray new]; //holds actual decible values
              
              //go through pulling the decibles
              for (int i = 0; i < deciblesCount; i++){
                  
                  int offset = frameIncrement * i; //grab offset
                  //equation from stack, no idea the maths
                  float sqr = sqrtf(buffer.floatChannelData[n][offset * buffer.stride]/(float)buffer.frameLength);
                  float decible = 20 * log10f(sqr);
                  
                  decible += 160; //make positive
                  decible = (isnan(decible) || decible < 0) ? 0 : decible; //if it's not a number or silent, make it zero
                  if (decible > 0){ //if it has volume
                      [sd addObject:@(decible)];
                  }
                  [decibles addObject:@(decible)];//add to decibles array
                  
                  maxDecible = MAX(maxDecible, decible); //grab biggest
                  minDecible = MIN(minDecible, decible); //grab smallest
              }
              
              [channels addObject:decibles]; //add to channels array
          }
          
          //find standard deviation and then deducted the bottom slag
          NSExpression * expression = [NSExpression expressionForFunction:@"stddev:" arguments:@[[NSExpression expressionForConstantValue:sd]]];
          float standardDeviation = [[expression expressionValueWithObject:nil context:nil] floatValue];
          float deviationDeduct = standardDeviation / (standardDeviation + (maxDecible - minDecible));
      
          //go through calculating deviation percentage
          NSMutableArray * deviations = [NSMutableArray new];
          NSMutableArray * returning = [NSMutableArray new];
          for (int c = 0; c < (int)channels.count; c++){
              
              NSArray * channel = channels[c];
              for (int n = 0; n < (int)channel.count; n++){
                  
                  float decible = [channel[n] floatValue];
                  float remainder = (maxDecible - decible);
                  float deviation = standardDeviation / (standardDeviation + remainder) - deviationDeduct;
                  [deviations addObject:@(deviation)];
              }
              
              //go through creating percentage
              float maxTotal = 0;
              int catchCount = floorf(deciblesCount / divisions); //total decible values within a segment or division
              NSMutableArray * totals = [NSMutableArray new];
              for (int n = 0; n < divisions; n++){
                  
                  float total = 0.0f;
                  for (int k = 0; k < catchCount; k++){ //go through each segment
                      int index = n * catchCount + k; //create the index
                      float deviation = [deviations[index] floatValue]; //grab value
                      total += deviation; //add to total
                  }
                  
                  //max out maxTotal var -> used later to calc percentages
                  maxTotal = MAX(maxTotal, total);
                  [totals addObject:@(total)]; //add to totals array
              }
              
              //normalise percentages and return
              NSMutableArray * percentages = [NSMutableArray new];
              for (int n = 0; n < divisions; n++){
                  
                  float total = [totals[n] floatValue]; //grab the total value for that segment
                  float percentage = total / maxTotal; //divide by the biggest value -> making it a percentage
                  [percentages addObject:@(percentage)]; //add to the array
              }
              
              //add to the returning array
              [returning addObject:percentages];
          }
          
          //return channel data -> array of two arrays of percentages
          return (NSArray *)returning;
          
      }
      

      Call it like this:

      int divisions = 30; //number of segments you want for your display
      NSString * path = [[NSBundle mainBundle] pathForResource:@"satie" ofType:@"mp3"];
      NSArray * channels = [_audioReader returnWaveArrayForFile:path numberOfDivisions:divisions];
      

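      The returned array contains up to two channels, which you can use to update your UI. The values in each array are between 0 and 1, and you can use them to build your bars.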
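
The dB arrays produced by the Swift solutions above can be brought into the same 0 to 1 range before drawing. A minimal sketch, assuming the values lie in [noiseFloor, 0] as in Solution 1; barHeight(forDecibels:noiseFloor:) is a hypothetical helper and the linear mapping is only one possible choice.

```swift
/// Maps a level in dB from [noiseFloor, 0] linearly onto [0, 1] for use as a bar height.
/// Values outside the range are clamped first.
func barHeight(forDecibels dB: Float, noiseFloor: Float = -80) -> Float {
    precondition(noiseFloor < 0, "noiseFloor must be negative")
    let clamped = min(0, max(noiseFloor, dB))
    return (clamped - noiseFloor) / (0 - noiseFloor)
}

let heights = [-80, -40, 0].map { barHeight(forDecibels: $0) }
print(heights) // [0.0, 0.5, 1.0]
```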

      【Comments】:
