【Question title】: Sending Voice over UDP with Java Sound API
【Posted】: 2020-07-25 18:31:21
【Question description】:

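I developed a Java application using the Java Sound API. It captures data from the microphone and sends it over UDP to another computer, where it is played back. Right now I am having problems with volume, quality, and speed. I cannot find the root cause, so I need help working out what is wrong with the program.


Update

The speed problem appears to be caused by the Java Sound API being too slow. I tried the program without the UDP sockets and saw the same kind of delay, so UDP does not introduce additional latency on the LAN. The echo problem goes away when the program is used with headphones. The sound quality is not too bad overall. The only remaining problem is the volume.

Here is the sender: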

import javax.sound.sampled.*;
import java.io.ByteArrayOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class VoipApp
{
    public static void main(String[]args) throws Exception
    {
        AudioFormat.Encoding encoding = AudioFormat.Encoding.PCM_SIGNED;
        float rate = 44100.0f;
        int sampleSize = 16;
        int channels = 2;
        int frameSize = 4;
        boolean bigEndian = true;

        AudioFormat format = new AudioFormat(encoding, rate, sampleSize, channels, (sampleSize / 8)
                * channels, rate, bigEndian);

        TargetDataLine line;
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        if(!AudioSystem.isLineSupported(info)){
            System.out.println("Not Supported");
            System.exit(1);
        }

        DatagramSocket socket = new DatagramSocket(8081);
        //InetAddress IPAddress = InetAddress.getLocalHost();
        InetAddress IPAddress = InetAddress.getByName("192.168.0.14");

        try
        {
            line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(format);

            //ByteArrayOutputStream out = new ByteArrayOutputStream();
            int numBytesRead;
            byte[] data = new byte [line.getBufferSize() / 5];
            int totalBytesRead = 0;

            line.start();
            while(true){
                numBytesRead = line.read(data,0, data.length);
                // Send only the bytes actually read, not the whole buffer.
                DatagramPacket sendPacket = new DatagramPacket(data, numBytesRead, IPAddress, 8080);
                socket.send(sendPacket);
            }

        }
        catch(LineUnavailableException e)
        {
            e.printStackTrace();
        }
    }
}

Here is the receiver:

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class VoipAppTwo
{
    public static void main(String[]args) throws Exception
    {
        AudioFormat.Encoding encoding = AudioFormat.Encoding.PCM_SIGNED;
        float rate = 44100.0f;
        int sampleSize = 16;
        int channels = 2;
        int frameSize = 4;
        boolean bigEndian = true;

        AudioFormat format = new AudioFormat(encoding, rate, sampleSize, channels, (sampleSize / 8)
                * channels, rate, bigEndian);

        SourceDataLine speakers;
        DataLine.Info dataLineInfo = new DataLine.Info(SourceDataLine.class, format);

        speakers = (SourceDataLine) AudioSystem.getLine(dataLineInfo);
        speakers.open(format);

        DatagramSocket socket = new DatagramSocket(8080);
        byte[] data = new byte[speakers.getBufferSize() / 5];
        speakers.start();
        while(true)
        {
            DatagramPacket receivePacket = new DatagramPacket(data, data.length);
            socket.receive(receivePacket);
            // Play only the bytes that actually arrived in this datagram.
            speakers.write(data, 0, receivePacket.getLength());
        }
    }
}
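
The one problem the update above leaves open is the volume. One possible approach, which is not part of the original post, is to scale the samples in software before they are written to the speakers. The sketch below assumes the format used in the question (16-bit signed PCM, big-endian); the class name `PcmGain` and the method `applyGain` are made up for illustration.

```java
final class PcmGain {
    /**
     * Scales 16-bit signed big-endian PCM samples in place and clips
     * the result to the valid 16-bit range.
     */
    static void applyGain(byte[] data, int length, float gain) {
        for (int i = 0; i + 1 < length; i += 2) {
            // Big-endian: high byte first (sign-extended), then low byte.
            int sample = (data[i] << 8) | (data[i + 1] & 0xFF);
            int scaled = Math.round(sample * gain);
            // Clip instead of wrapping around, which would sound like harsh distortion.
            if (scaled > Short.MAX_VALUE) scaled = Short.MAX_VALUE;
            if (scaled < Short.MIN_VALUE) scaled = Short.MIN_VALUE;
            data[i] = (byte) (scaled >> 8);
            data[i + 1] = (byte) scaled;
        }
    }
}
```

In the receiver this could be called on the received bytes just before `speakers.write(...)`. If the line supports it, a `FloatControl` of type `FloatControl.Type.MASTER_GAIN` obtained from the `SourceDataLine` may be a simpler alternative; whether that control is available depends on the mixer.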

【Question discussion】:

    Tags: java audio udp voip javasound


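    【Solution 1】:

    A few thoughts, with the caveat that I have no direct experience experimenting with UDP.

    It seems to me that lost and out-of-order packets (expected with UDP) have to be "handled" somehow, otherwise the resulting discontinuities will keep producing disruptive, loud clicks. But I don't know how this is usually done. Filtering? Buffering? Wrapping the packet data in windowed (Hann or Hamming?) frames to bridge the discontinuities between packets?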
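
One common way to at least detect loss and reordering is to prefix every datagram with a sequence number, so the receiver can drop late packets and notice gaps (which it could then fill with silence or a faded copy of the previous packet). The following is a minimal sketch of that idea, not something taken from this answer; `SequencedAudio`, `Tracker`, and all method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SequencedAudio {
    static final int HEADER_BYTES = 4;

    /** Prefixes the audio payload with a 4-byte big-endian sequence number. */
    static byte[] wrap(int seq, byte[] audio, int length) {
        return ByteBuffer.allocate(HEADER_BYTES + length)
                .putInt(seq)
                .put(audio, 0, length)
                .array();
    }

    /** Reads the sequence number from the start of a received packet. */
    static int sequenceOf(byte[] packet) {
        return ByteBuffer.wrap(packet).getInt();
    }

    /** Returns the audio bytes that follow the header. */
    static byte[] payloadOf(byte[] packet, int packetLength) {
        return Arrays.copyOfRange(packet, HEADER_BYTES, packetLength);
    }

    /** Tracks the next expected sequence number on the receiving side. */
    static final class Tracker {
        private int expected = 0;

        /**
         * Returns -1 if the packet is late or a duplicate and should be dropped,
         * otherwise the number of packets skipped before it (0 means in order).
         * Sequence number wrap-around is ignored in this sketch.
         */
        int accept(int seq) {
            if (seq < expected) return -1;
            int missing = seq - expected;
            expected = seq + 1;
            return missing;
        }
    }
}
```

A sender would pass `wrap(seq++, data, numBytesRead)` to the `DatagramPacket`, and the receiver would call `accept(sequenceOf(...))` before playing `payloadOf(...)`.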

    javax.sound.sampled is quite close to native sound. Here is a good article to reference on considerations pertaining to real time, low latency Java-based audio.
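
One thing that may be worth checking when chasing latency with javax.sound.sampled is the line buffer size: `open(AudioFormat)` picks a default, while the two-argument `open(AudioFormat, int bufferSize)` overloads on `TargetDataLine` and `SourceDataLine` let you request a smaller one. The helper below is only a sketch for converting between milliseconds and bytes; `LatencyMath` and its methods are hypothetical names.

```java
import javax.sound.sampled.AudioFormat;

final class LatencyMath {
    /** Number of bytes holding `millis` ms of audio, rounded down to whole frames. */
    static int bytesForMillis(AudioFormat format, int millis) {
        int frames = (int) (format.getFrameRate() * millis / 1000.0f);
        return frames * format.getFrameSize();
    }

    /** Duration in milliseconds represented by `bytes` of audio in this format. */
    static float millisForBytes(AudioFormat format, int bytes) {
        return 1000.0f * (bytes / format.getFrameSize()) / format.getFrameRate();
    }
}
```

For example, `line.open(format, LatencyMath.bytesForMillis(format, 40))` requests roughly 40 ms of buffering. The implementation is free to pick a different size, so it is worth printing `line.getBufferSize()` after opening to see what was actually granted.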

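    【Discussion】:

    • I managed to make it almost real time using the ALSA sound API.
    • Glad to hear you found a solution! How are you accessing the ALSA API? Are you doing it from Java, for example by selecting an ALSA line (still using TargetDataLine/SourceDataLine)? Or some other way?
    • Please consider writing up your solution and marking it as the answer. It would be a valuable contribution.
    • I used C on Linux (the ALSA API is only available on Linux). Compared to the Java Sound API there is almost no delay between audio capture and playback, but there is noise in the background that I cannot get rid of. Once I solve the noise problem I can post the solution here, but it is not ready yet.
    • I also tried PulseAudio for capture and playback. It is easier to use than the ALSA API, but there is the same kind of delay between capture and playback as with the Java Sound API.
    【Solution 2】: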

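    Finally, I can answer my own question here. I used the ALSA API (Linux), PulseAudio (Linux), and JackAudio (Linux) to make the program almost real time (less than 1 second of delay). PulseAudio's blocking API is easier to use than both ALSA and JackAudio, but I ran into the same delay as with the Java Sound API. JackAudio is a low-latency audio server for professional audio work. I looked at this example: https://github.com/jackaudio/example-clients/blob/master/simple_client.c and quickly realized it was for mono output. I modified the code slightly to make it stereo, then added the UDP sockets and a second callback function. The callback functions work as if they were separate threads, so there is no need to introduce POSIX threads.

    The code is not perfect, may not be a good way to use JackAudio, and has plenty of bugs in it. However, I believe it is a good starting point for anyone interested in this kind of thing. To use it, first install the JackAudio server and development files through your distribution's package manager.

    Let me know if there is a better way to achieve the same thing with any of the available toolsets.

    JackAudio

    JackAudio solution: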

    #include <stdio.h>
    #include <inttypes.h> /* PRIu32, used when printing the sample rate */
    #include <errno.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    
    #include <jack/jack.h>
    
    #define PORT 8080
    
    int create_UDP_socket_send(struct sockaddr_in * server, const char * ip)
    {
        int sock;
        if((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        {
            perror("Socket");
            exit(EXIT_FAILURE);
        }
    
        memset(server, 0, sizeof(*server));
        server->sin_family = AF_INET;
        server->sin_port = htons(PORT);
        server->sin_addr.s_addr = inet_addr(ip);
    
        return sock;
    }
    
    int create_UDP_socket_receive()
    {
        struct sockaddr_in server;
        int sock;
        if((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        {
            perror("Socket");
            exit(EXIT_FAILURE);
        }
    
        memset(&server, 0, sizeof(server));
        server.sin_family = AF_INET;
        server.sin_port = htons(PORT);
        server.sin_addr.s_addr = INADDR_ANY;
    
        if (bind(sock, (struct sockaddr *)&server, sizeof(server)) < 0)
        {
            perror("bind()");
            exit(EXIT_FAILURE);
        }
    
        return sock;
    }
    
    typedef struct{
        int socket;
        struct sockaddr_in server;
    } __Info__;
    
    jack_port_t *input_port;
    jack_port_t *output_port1;
    jack_port_t *output_port2;
    jack_client_t *client1;
    jack_client_t *client2;
    
    int processFirst (jack_nframes_t nframes, void *arg)
    {
        jack_default_audio_sample_t *in, *out;
    
        __Info__ * info = (__Info__ *) arg;
        
        in = jack_port_get_buffer (input_port, nframes);
        sendto(info->socket, in, sizeof(jack_default_audio_sample_t) * nframes, 0, (struct sockaddr *)&(info->server), sizeof(info->server));
    
        return 0;
    }
    
    int processSecond (jack_nframes_t nframes, void *arg)
    {
        jack_default_audio_sample_t *out1, *out2;
        
        int * socket = (int *)arg;
    
        out1 = jack_port_get_buffer (output_port1, nframes);
        out2 = jack_port_get_buffer (output_port2, nframes);
        unsigned char buffer[sizeof(jack_default_audio_sample_t) * nframes];
        recvfrom(*socket, buffer, sizeof(jack_default_audio_sample_t) * nframes, 0, NULL, NULL);
        memcpy(out1, buffer, sizeof(jack_default_audio_sample_t) * nframes);
        memcpy(out2, buffer, sizeof(jack_default_audio_sample_t) * nframes);
    
        return 0;
    }
    
    void jack_shutdown (void *arg)
    {
        exit (1);
    }
    
    int main (int argc, char *argv[])
    {
        const char **ports;
        const char *client_name1 = "simple1";
        const char *client_name2 = "simple2";
        const char *server_name = NULL;
        jack_options_t options = JackNullOption;
        jack_status_t status;
    
        client1 = jack_client_open (client_name1, options, &status, server_name);
        if (client1 == NULL) {
            fprintf (stderr, "jack_client_open() failed, "
                 "status = 0x%2.0x\n", status);
            if (status & JackServerFailed) {
                fprintf (stderr, "Unable to connect to JACK server\n");
            }
            exit (1);
        }
    
        client2 = jack_client_open (client_name2, options, &status, server_name);
        if (client2 == NULL) {
            fprintf (stderr, "jack_client_open() failed, "
                 "status = 0x%2.0x\n", status);
            if (status & JackServerFailed) {
                fprintf (stderr, "Unable to connect to JACK server\n");
            }
            exit (1);
        }
    
        if (status & JackServerStarted) {
            fprintf (stderr, "JACK server started\n");
        }
    
        if (status & JackNameNotUnique) {
            client_name1 = jack_get_client_name(client1);
            fprintf (stderr, "unique name `%s' assigned\n", client_name1);
        }
    
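        /* The receiver's IP address is expected as the first argument. */
        if (argc < 2) {
            fprintf (stderr, "usage: %s <receiver-ip>\n", argv[0]);
            exit (1);
        }

        struct sockaddr_in server;
        int socket = create_UDP_socket_send(&server, argv[1]);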
        __Info__ info = {
            socket, 
            server
        };
    
        int socket2 = create_UDP_socket_receive();
    
        jack_set_process_callback (client1, processFirst, &info);
        jack_set_process_callback (client2, processSecond, &socket2);
    
        jack_on_shutdown (client1, jack_shutdown, 0);
        jack_on_shutdown (client2, jack_shutdown, 0);
    
        printf ("engine sample rate: %" PRIu32 "\n", jack_get_sample_rate (client1));
        printf ("engine sample rate: %" PRIu32 "\n", jack_get_sample_rate (client2));
    
        input_port = jack_port_register (client1, "input", JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
        output_port1 = jack_port_register (client2, "output1", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);
        output_port2 = jack_port_register (client2, "output2", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 1);
    
        if ((input_port == NULL) || (output_port1 == NULL) || (output_port2 == NULL)) {
            fprintf(stderr, "no more JACK ports available\n");
            exit (1);
        }
    
        if (jack_activate (client1)) {
            fprintf (stderr, "cannot activate client");
            exit (1);
        }
    
        if (jack_activate (client2)) {
            fprintf (stderr, "cannot activate client2");
            exit (1);
        }
    
        ports = jack_get_ports (client1, NULL, NULL, JackPortIsPhysical | JackPortIsOutput);
        if (ports == NULL) {
            fprintf(stderr, "no physical capture ports\n");
            exit (1);
        }
    
        if (jack_connect (client1, ports[0], jack_port_name (input_port))) {
            fprintf (stderr, "cannot connect input ports\n");
        }
    
        free (ports);
        
        ports = jack_get_ports (client2, NULL, NULL, JackPortIsPhysical | JackPortIsInput);
        if (ports == NULL) {
            fprintf(stderr, "no physical playback ports\n");
            exit (1);
        }
    
        if (jack_connect (client2, jack_port_name (output_port1), ports[0])) {
            fprintf (stderr, "cannot connect output ports\n");
        }
    
        if (jack_connect (client2, jack_port_name (output_port2), ports[1])) {
            fprintf (stderr, "cannot connect output ports\n");
        }
    
        free (ports);

        /* Keep running until stopped by the user. */
        sleep (-1);
    
        jack_client_close (client1);
        jack_client_close (client2);
        exit (0);
    }
    

    ALSA API

    I used the code from the following tutorial: https://www.linuxjournal.com/article/6735?page=0,2 Occasionally, underruns occur on the receiver side.

    ALSA API sender:

    #define ALSA_PCM_NEW_HW_PARAMS_API
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <unistd.h>
    #include <errno.h>
    #include <alsa/asoundlib.h>
    
    #define PORT 8080
    
    int create_UDP_socket_send(struct sockaddr_in * server, const char * address)
    {
      int sock;
      if((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
      {
        perror("Socket");
        exit(EXIT_FAILURE);
      }
    
      memset(server, 0, sizeof(*server));
      server->sin_family = AF_INET;
      server->sin_port = htons(PORT);
      server->sin_addr.s_addr = inet_addr((const char *)address);
    
      return sock;
    }
    
    int main() {
      long loops;
      int rc;
      int size;
      snd_pcm_t *handle;
      snd_pcm_hw_params_t *params;
      unsigned int val;
      int dir;
      snd_pcm_uframes_t frames;
      //char *buffer;
    
      /* Open PCM device for recording (capture). */
      rc = snd_pcm_open(&handle, "default", SND_PCM_STREAM_CAPTURE, 0);
      if (rc < 0) {
        fprintf(stderr, "unable to open pcm device: %s\n", snd_strerror(rc));
        exit(1);
      }
    
      /* Allocate a hardware parameters object. */
      snd_pcm_hw_params_alloca(&params);
    
      /* Fill it in with default values. */
      snd_pcm_hw_params_any(handle, params);
    
      /* Set the desired hardware parameters. */
    
      /* Interleaved mode */
      snd_pcm_hw_params_set_access(handle, params, SND_PCM_ACCESS_RW_INTERLEAVED);
    
      /* Signed 16-bit little-endian format */
      snd_pcm_hw_params_set_format(handle, params, SND_PCM_FORMAT_S16_LE);
    
      /* Two channels (stereo) */
      snd_pcm_hw_params_set_channels(handle, params, 2);
    
      /* 44100 samples/second sampling rate (CD quality) */
      val = 44100;
      snd_pcm_hw_params_set_rate_near(handle, params, &val, &dir);
    
      /* Set period size to 32 frames. */
      frames = 32;
      snd_pcm_hw_params_set_period_size_near(handle, params, &frames, &dir);
    
      /* Write the parameters to the driver */
      rc = snd_pcm_hw_params(handle, params);
      if (rc < 0) {
        fprintf(stderr, "unable to set hw parameters: %s\n", snd_strerror(rc));
        exit(1);
      }
    
      /* Use a buffer large enough to hold one period */
      snd_pcm_hw_params_get_period_size(params, &frames, &dir);
      
      size = frames * 4; /* 2 bytes/sample, 2 channels */
      // buffer = (char *) malloc(size);
      char buffer[size];
    
      int nsent;
      struct sockaddr_in server;
      int socket = create_UDP_socket_send(&server, "127.0.0.1");
    
      snd_pcm_hw_params_get_period_time(params, &val, &dir);
    
      while (1) {
        rc = snd_pcm_readi(handle, buffer, frames);
        if (rc == -EPIPE) {
          /* EPIPE means overrun */
          fprintf(stderr, "overrun occurred\n");
          snd_pcm_prepare(handle);
        } else if (rc < 0) {
          fprintf(stderr, "error from read: %s\n", snd_strerror(rc));
        } else if (rc != (int)frames) {
          fprintf(stderr, "short read, read %d frames\n", rc);
        }
        // rc = write(1, buffer, size);
        if((nsent = sendto(socket, buffer, sizeof(buffer), 0, (struct sockaddr *)&server, sizeof(server))) < 0)
        {
          perror("sendto()");
          exit(EXIT_FAILURE);
        }
      }
    
      snd_pcm_drain(handle);
      snd_pcm_close(handle);
    
      return 0;
    }
    

    ALSA API receiver:

    #define ALSA_PCM_NEW_HW_PARAMS_API
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <unistd.h>
    #include <errno.h>
    #include <alsa/asoundlib.h>
    
    #define PORT 8080
    
    int create_UDP_socket_receive()
    {
        struct sockaddr_in server;
        int sock;
        if((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        {
            perror("Socket");
            exit(EXIT_FAILURE);
        }
    
        memset(&server, 0, sizeof(server));
        server.sin_family = AF_INET;
        server.sin_port = htons(PORT);
        server.sin_addr.s_addr = INADDR_ANY;
    
        if (bind(sock, (struct sockaddr *)&server, sizeof(server)) < 0)
        {
            perror("bind()");
            exit(EXIT_FAILURE);
        }
    
        return sock;
    }
    
    int main() {
        long loops;
        int rc;
        int size;
        snd_pcm_t *handle;
        snd_pcm_hw_params_t *params;
        unsigned int val;
        int dir;
        snd_pcm_uframes_t frames;
    
      /* Open PCM device for playback. */
        rc = snd_pcm_open(&handle, "default", SND_PCM_STREAM_PLAYBACK, 0);
        if (rc < 0) {
            fprintf(stderr, "unable to open pcm device: %s\n", snd_strerror(rc));
            exit(1);
        }
    
      /* Allocate a hardware parameters object. */
        snd_pcm_hw_params_alloca(&params);
    
      /* Fill it in with default values. */
        snd_pcm_hw_params_any(handle, params);
    
      /* Set the desired hardware parameters. */
    
      /* Interleaved mode */
        snd_pcm_hw_params_set_access(handle, params, SND_PCM_ACCESS_RW_INTERLEAVED);
    
      /* Signed 16-bit little-endian format */
        snd_pcm_hw_params_set_format(handle, params, SND_PCM_FORMAT_S16_LE);
    
      /* Two channels (stereo) */
        snd_pcm_hw_params_set_channels(handle, params, 2);
    
      /* 44100 samples/second sampling rate (CD quality) */
        val = 44100;
        snd_pcm_hw_params_set_rate_near(handle, params, &val, &dir);
    
      /* Set period size to 32 frames. */
        frames = 32;
        snd_pcm_hw_params_set_period_size_near(handle, params, &frames, &dir);
    
      /* Write the parameters to the driver */
        rc = snd_pcm_hw_params(handle, params);
        if (rc < 0) {
            fprintf(stderr, "unable to set hw parameters: %s\n", snd_strerror(rc));
            exit(1);
        }
    
        /* Use a buffer large enough to hold one period */
        snd_pcm_hw_params_get_period_size(params, &frames, &dir);
        
        size = frames * 4; /* 2 bytes/sample, 2 channels */
        char buffer[size];
    
        int nread;
        int socket = create_UDP_socket_receive();
    
        snd_pcm_hw_params_get_period_time(params, &val, &dir);
    
        while (1) {
    
            if((nread = recvfrom(socket, buffer, sizeof(buffer), 0, NULL, NULL)) < 0)
            {
                perror("recvfrom()");
                exit(EXIT_FAILURE);
            }
    
            // write(1, buffer, sizeof(buffer));
    
            rc = snd_pcm_writei(handle, buffer, frames);
            if (rc == -EPIPE) {
            /* EPIPE means underrun */
                fprintf(stderr, "underrun occurred\n");
                snd_pcm_prepare(handle);
            } else if (rc < 0) {
                fprintf(stderr, "error from writei: %s\n", snd_strerror(rc));
            }  else if (rc != (int)frames) {
                fprintf(stderr, "short write, write %d frames\n", rc);
            }
        }
    
        snd_pcm_drain(handle);
        snd_pcm_close(handle);
        /* buffer is a stack-allocated array, so it must not be passed to free(). */
    
        return 0;
    }
    

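    【Discussion】:

      【Solution 3】:

      Voice over a network is a complex topic with many tricky problems.

      I think your problems come from irregularities at the network layer.

      If your network is not 100% stable (which is very hard to achieve, given interference from other applications and the buffers in your computer, router box, and so on), your voice will sound a bit robotic.

      One way to avoid this is to add a small buffer that delays the communication between sender and receiver. If the delay is too short, the effects of the performance problems will keep occurring. If it is too long, well, then it is too long ;)...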
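
The small buffer described above is usually called a jitter buffer. As a rough sketch of how one might look (the class `JitterBuffer` and its parameters are assumptions for illustration, not code from this answer): the receiver thread pushes packets in, and the playback thread only starts taking packets out once `prefill` of them have accumulated, falling back to silence whenever the queue runs dry.

```java
import java.util.ArrayDeque;

final class JitterBuffer {
    private final ArrayDeque<byte[]> queue = new ArrayDeque<>();
    private final int prefill;   // packets to collect before playback starts
    private final int capacity;  // upper bound on queued packets (>= prefill)
    private final byte[] silence;
    private boolean buffering = true;

    JitterBuffer(int prefill, int capacity, int packetBytes) {
        this.prefill = prefill;
        this.capacity = capacity;
        this.silence = new byte[packetBytes]; // all zeros is silence for signed PCM
    }

    /** Called by the network thread for each received packet. */
    synchronized void push(byte[] packet) {
        if (queue.size() == capacity) {
            queue.pollFirst(); // drop the oldest packet so the delay cannot grow without bound
        }
        queue.addLast(packet);
        if (queue.size() >= prefill) {
            buffering = false;
        }
    }

    /** Called by the playback thread once per packet interval. */
    synchronized byte[] pop() {
        if (buffering || queue.isEmpty()) {
            buffering = true; // underrun: play silence until the queue refills
            return silence;
        }
        return queue.pollFirst();
    }
}
```

The added delay is roughly `prefill` times the duration of one packet, which is the trade-off the answer describes: a larger `prefill` hides more network jitter but makes the conversation lag further behind.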

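      Another problem that can occur (most likely on a WAN) is packet loss, which can affect the communication. I have no good solution for this; it depends on the network and/or the use case.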

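      【Discussion】:

      • So, I got rid of the UDP sockets and just used TargetDataLine and SourceDataLine on a buffer to see whether the problem was caused by the UDP sockets. It turns out the UDP sockets do not cause the delay; the Java Sound API does. I need a faster API than the ones Java provides. What are my alternatives? I could also rewrite it in C++, but I have not found any C++ library for accessing the microphone and speakers.
      • Maybe Beads or JSyn, for Java libraries
      • Python is a possibility... but I find it strange that Java cannot play sound fast enough
      • I would say it is fast enough, but I need something close to real time.
      • I tried your code with 2 computers and it actually works well (I get a bit of echo, but that is all)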