Get waveform data from audio file using FFMPEG: answers

[Question title]: Get waveform data from audio file using FFMPEG
[Posted]: 2017-05-29 14:03:37
[Question description]:

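I am writing an application that needs to get the raw waveform data of an audio file so that it can be rendered within the application (C#/.NET). I am using ffmpeg to offload this task, but it looks like ffmpeg can only output the waveform data as a png or as a stream to gnuplot.

I have looked at other libraries for this (NAudio/CSCore), but they require Windows/Microsoft Media Foundation, and since this app will be deployed to Azure as a web app, I cannot use them.

My strategy was to read the waveform data from the png itself, but that seems hacky and over the top. The ideal output would be a fixed-sampled series of peaks in an array, where each value in the array is the peak value (ranging from 1 to 100 or something similar, like this for example).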

[Question discussion]:

You could use the stream method and pipe the output into your application, or save it to disk and read it from there.
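As a rough illustration of the piping idea in this comment, here is a minimal C sketch that reads raw 16-bit little-endian mono PCM from any `FILE*` stream. The function name `read_s16le_stream` is hypothetical, and the sketch assumes ffmpeg has been told to emit raw PCM rather than a container format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Read every 16-bit little-endian sample from `in` into a growing array.
// Returns the sample count, or -1 if memory runs out; the caller frees *out.
// (Hypothetical helper, not part of FFmpeg.)
long read_s16le_stream(FILE* in, int16_t** out) {
    assert(in != NULL && out != NULL);
    size_t capacity = 4096, count = 0;
    int16_t* samples = malloc(capacity * sizeof *samples);
    if (!samples) return -1;

    unsigned char pair[2];
    // a trailing odd byte, if any, is ignored
    while (fread(pair, 1, 2, in) == 2) {
        if (count == capacity) {
            capacity *= 2;
            int16_t* grown = realloc(samples, capacity * sizeof *samples);
            if (!grown) { free(samples); return -1; }
            samples = grown;
        }
        // assemble the value byte by byte so host endianness does not matter
        samples[count++] = (int16_t)(uint16_t)(pair[0] | (pair[1] << 8));
    }
    *out = samples;
    return (long)count;
}
```

On a POSIX system the stream could come from `popen("ffmpeg -v error -i input.mp3 -ac 1 -ar 44100 -f s16le -", "r")`, where `input.mp3` is a placeholder path; the same function also works on a file that ffmpeg saved to disk.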

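Tags: c# .net azure ffmpeg asp.net-core

[Solution 1]:

You can use the function described in this tutorial to get the raw data decoded from an audio file as an array of double values.

Summarizing from the link:

The function decode_audio_file takes 4 parameters:

path: the path of the file to decode
sample_rate: the desired sample rate of the output data
data: a pointer to a pointer to double-precision values, where the extracted data will be stored
size: a pointer to the length of the final array of extracted values (number of samples)

It returns 0 on success and -1 on failure, together with an error message written to the stderr stream.

The function code is as follows: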

#include <stdio.h>
#include <stdlib.h>
#include <string.h>  // memcpy
 
#include <libavutil/opt.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswresample/swresample.h>
 
 
int decode_audio_file(const char* path, const int sample_rate, double** data, int* size) {
 
    // initialize all muxers, demuxers and protocols for libavformat
    // (does nothing if called twice during the course of one program execution;
    // deprecated and no longer needed since FFmpeg 4.0)
    av_register_all();
 
    // get format from audio file
    AVFormatContext* format = avformat_alloc_context();
    if (avformat_open_input(&format, path, NULL, NULL) != 0) {
        fprintf(stderr, "Could not open file '%s'\n", path);
        return -1;
    }
    if (avformat_find_stream_info(format, NULL) < 0) {
        fprintf(stderr, "Could not retrieve stream info from file '%s'\n", path);
        return -1;
    }
 
    // Find the index of the first audio stream
    int stream_index = -1;
    for (int i=0; i<format->nb_streams; i++) {
        if (format->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
            stream_index = i;
            break;
        }
    }
    if (stream_index == -1) {
        fprintf(stderr, "Could not retrieve audio stream from file '%s'\n", path);
        return -1;
    }
    AVStream* stream = format->streams[stream_index];
 
    // find & open codec
    AVCodecContext* codec = stream->codec;
    if (avcodec_open2(codec, avcodec_find_decoder(codec->codec_id), NULL) < 0) {
        fprintf(stderr, "Failed to open decoder for stream #%d in file '%s'\n", stream_index, path);
        return -1;
    }
 
    // prepare resampler
    struct SwrContext* swr = swr_alloc();
    av_opt_set_int(swr, "in_channel_count",  codec->channels, 0);
    av_opt_set_int(swr, "out_channel_count", 1, 0);
    av_opt_set_int(swr, "in_channel_layout",  codec->channel_layout, 0);
    av_opt_set_int(swr, "out_channel_layout", AV_CH_LAYOUT_MONO, 0);
    av_opt_set_int(swr, "in_sample_rate", codec->sample_rate, 0);
    av_opt_set_int(swr, "out_sample_rate", sample_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt",  codec->sample_fmt, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_DBL,  0);
    swr_init(swr);
    if (!swr_is_initialized(swr)) {
        fprintf(stderr, "Resampler has not been properly initialized\n");
        return -1;
    }
 
    // prepare to read data
    AVPacket packet;
    av_init_packet(&packet);
    AVFrame* frame = av_frame_alloc();
    if (!frame) {
        fprintf(stderr, "Error allocating the frame\n");
        return -1;
    }
 
    // iterate through frames
    *data = NULL;
    *size = 0;
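    while (av_read_frame(format, &packet) >= 0) {
        // skip packets that belong to other streams (e.g. video or cover art)
        if (packet.stream_index != stream_index) {
            av_packet_unref(&packet);
            continue;
        }
        // decode one frame, then release the packet's buffers
        int gotFrame = 0;
        int ret = avcodec_decode_audio4(codec, frame, &gotFrame, &packet);
        av_packet_unref(&packet);
        if (ret < 0) {
            break;
        }
        if (!gotFrame) {
            continue;
        }
        // resample frames
        double* buffer = NULL;
        if (av_samples_alloc((uint8_t**) &buffer, NULL, 1, frame->nb_samples, AV_SAMPLE_FMT_DBL, 0) < 0) {
            break;
        }
        int frame_count = swr_convert(swr, (uint8_t**) &buffer, frame->nb_samples, (const uint8_t**) frame->data, frame->nb_samples);
        if (frame_count < 0) {
            av_freep(&buffer);
            break;
        }
        // append resampled frames to data
        double* grown = (double*) realloc(*data, (*size + frame_count) * sizeof(double));
        if (!grown) {
            av_freep(&buffer);
            break;
        }
        *data = grown;
        memcpy(*data + *size, buffer, frame_count * sizeof(double));
        *size += frame_count;
        // release the per-frame buffer so it does not leak
        av_freep(&buffer);
    }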
 
    // clean up
    av_frame_free(&frame);
    swr_free(&swr);
    avcodec_close(codec);
    // close the input opened by avformat_open_input (this also frees the context)
    avformat_close_input(&format);
 
    // success
    return 0;
 
}

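You will need the following flags to compile a program that uses it: -lavcodec-ffmpeg -lavformat-ffmpeg -lavutil -lswresample. Depending on your system and installation, they may instead be: -lavcodec -lavformat -lavutil -lswresample

It is used as follows: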

int main(int argc, char const *argv[]) {
 
    // check parameters
    if (argc < 2) {
        fprintf(stderr, "Please provide the path to an audio file as first command-line argument.\n");
        return -1;
    }
 
    // decode data
    int sample_rate = 44100;
    double* data;
    int size;
    if (decode_audio_file(argv[1], sample_rate, &data, &size) != 0) {
        return -1;
    }
 
    // sum data
    double sum = 0.0;
    for (int i=0; i<size; ++i) {
        sum += data[i];
    }
 
    // display result and exit cleanly
    printf("sum is %f\n", sum);
    free(data);
    return 0;
}
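Once decode_audio_file has filled `data` with `size` samples, the fixed-size array of peaks that the question asks for can be derived from it. The sketch below is one way to do that; the name `compute_peaks` and the 0..100 scale are illustrative choices, and it assumes the decoded double samples lie nominally in [-1.0, 1.0].

```c
#include <assert.h>
#include <stddef.h>

// Split `size` samples into `bins` consecutive buckets and store each
// bucket's peak absolute amplitude in `peaks`, scaled to the range 0..100.
// Assumes samples lie nominally in [-1.0, 1.0]; louder values are clamped.
// (Hypothetical helper, not part of the linked tutorial.)
void compute_peaks(const double* data, size_t size, int* peaks, size_t bins) {
    assert(peaks != NULL && bins > 0);
    for (size_t b = 0; b < bins; ++b) {
        // bucket b covers samples [start, end); integer math spreads the remainder
        size_t start = b * size / bins;
        size_t end = (b + 1) * size / bins;
        double peak = 0.0;
        for (size_t i = start; i < end; ++i) {
            double magnitude = data[i] < 0.0 ? -data[i] : data[i];
            if (magnitude > peak) peak = magnitude;
        }
        if (peak > 1.0) peak = 1.0;
        peaks[b] = (int)(peak * 100.0 + 0.5);  // round to the nearest integer
    }
}
```

With a waveform that is 1000 pixels wide, calling `compute_peaks(data, size, peaks, 1000)` would give one value per pixel column.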

[Discussion]:

[Solution 2]:

Sabona buddy,

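I was writing about a manual way to get the waveform, but to show you an example, I found this code, which does what you want (or at least, you can learn something from it).

1) Use FFmpeg to get an array of samples

Try the example code shown here: http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US

Try it out, then tweak it using suggestions from the manual and so on. In the code shown there, just change string path to point to your own file path. Edit the proc.StartInfo.Arguments part to replace the last section, like this: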

proc.StartInfo.Arguments = "-i \"" + path + "\" -vn -ac 1 -filter:a aresample=myNum -map 0:a -c:a pcm_s16le -f data -";

The myNum in the aresample=myNum part is calculated as:

44100 * total Seconds = X.
myNum = X / WaveForm Width.
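As a worked example of the formula exactly as the answer states it: a 180-second file drawn 1000 pixels wide gives X = 44100 * 180 = 7938000 and myNum = 7938000 / 1000 = 7938. The helper below only reproduces that arithmetic; `compute_my_num` and its parameter names are hypothetical.

```c
#include <assert.h>

// myNum = (sample_rate * total_seconds) / waveform_width, as stated in the answer.
// (Hypothetical helper; the names are illustrative only.)
long compute_my_num(long sample_rate, double total_seconds, long waveform_width) {
    assert(waveform_width > 0);
    double total_samples = (double)sample_rate * total_seconds;  // X
    return (long)(total_samples / (double)waveform_width);       // truncated to an integer
}
```

Since ffmpeg interprets the value given to aresample as an output sample rate in Hz, it is worth checking how many samples actually come back for a given myNum before mapping them to pixels.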

Finally, use this logic for the ProcessBuffer function:

static void ProcessBuffer(byte[] buffer, int length)
{
    float val; // amplitude value of a sample
    int index = 0; // position within sample bytes
    int slicePos = 0; // horizontal (X-axis) position for pixels of next slice

    // stop before a trailing odd byte so ToInt16 never reads past the data
    while (index + sizeof(short) <= length)
    {
        val = BitConverter.ToInt16(buffer, index);
        index += sizeof(short);

        // use number in val to do something...
        // eg: Draw a line on canvas for part of waveform's pixels
        // eg: myBitmap.SetPixel(slicePos, val, Color.Green);

        slicePos++;
    }
}

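If you would rather do it manually, without FFmpeg, you could try...

2) Decode the audio to PCM
You can load the audio file (mp3) into your app and first decode it to PCM (i.e. raw digital audio). Then read only the PCM numbers to build the waveform. Do not read numbers directly from the bytes of a compressed format such as MP3.

These PCM data values (the audio amplitudes) go into a byte array. If your sound is 16-bit, you can extract the PCM values by reading each sample as a short (i.e. taking the value of two consecutive bytes at a time, since a 16-bit sample occupies two bytes).

Basically, when you have 16-bit audio PCM in a byte array, every two bytes represent the amplitude value of one audio sample. This value becomes the height (loudness) of each slice. A slice is a 1-pixel-wide vertical line at one point in time of the waveform.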

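Now, the sample rate is the number of samples per second, usually 44100 samples (44.1 kHz). As you can see, using 44100 pixels to represent one second of sound is not feasible, so divide the total number of samples (sample rate times total seconds) by the required waveform width. Multiply the result by 2 (to cover the two bytes of each sample), and that is how far you jump through the byte array each time you sample an amplitude while forming the waveform. Do this in a while loop.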
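The jump-and-sample loop described above could be sketched as follows, assuming the byte array already holds 16-bit little-endian mono PCM. The function name `sample_slices` is hypothetical, and the stride is kept a whole number of samples so that reads stay aligned to sample boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Fill heights[0..width-1] with one sample amplitude per slice by jumping
// through a buffer of 16-bit little-endian mono PCM.
// Returns the number of slices written (fewer than width if the audio is short).
// (Hypothetical helper illustrating the description above.)
size_t sample_slices(const unsigned char* pcm, size_t byte_count,
                     int16_t* heights, size_t width) {
    assert(pcm != NULL && heights != NULL && width > 0);
    size_t total_samples = byte_count / 2;              // two bytes per sample
    size_t samples_per_slice = total_samples / width;
    if (samples_per_slice == 0) samples_per_slice = 1;  // fewer samples than pixels
    size_t stride = samples_per_slice * 2;              // jump in bytes, always even

    size_t slice = 0;
    size_t index = 0;
    while (slice < width && index + 1 < byte_count) {
        // combine the low and high byte into one signed amplitude
        heights[slice++] = (int16_t)(uint16_t)(pcm[index] | (pcm[index + 1] << 8));
        index += stride;
    }
    return slice;
}
```

Taking a single sample per slice is cheap but can miss loud moments between jumps; taking the largest absolute value within each jump is a common alternative when peaks matter.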

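[Discussion]:

Incredible, thank you so much! I actually managed to get this working with the hacky solution, but I will definitely give this a try. Thanks for the pointers :). Not from Botswana, but close enough (South Africa) :)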