A Brief Analysis of libcurl Thread Safety

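Background: A program that sends requests with libcurl from multiple threads ran fine when no timeout, or only a long timeout, was set. As soon as a shorter timeout (under 180 s) was configured, it began to core dump at random, and the stack trace contained nothing useful.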

Question 1: Why does multithreaded libcurl not crash when no timeout is set, or when the timeout is long (for example 601 s)?

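Question 2: The core dump is not reliably reproducible. Could it be caused by multiple threads modifying a global variable inside libcurl at the same time?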

 

First, here is how the official libcurl site describes the library:

libcurl is free, thread-safe, IPv6 compatible, feature rich, well supported, fast, thoroughly documented and is already used by many known, big and successful companies and numerous applications.

The project describes libcurl as thread-safe. Is that really the case? Next, look at the documentation for the timeout option used in the code:

CURLOPT_TIMEOUT

Pass a long as parameter containing the maximum time in seconds that you allow the libcurl transfer operation to take. Normally, name lookups can take a considerable time and limiting operations to less than a few minutes risk aborting perfectly normal operations. This option will cause curl to use the SIGALRM to enable time-outing system calls.

In unix-like systems, this might cause signals to be used unless CURLOPT_NOSIGNAL is set.

Default timeout is 0 (zero) which means it never times out.
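The phrase "use the SIGALRM to enable time-outing system calls" can be illustrated with a minimal POSIX sketch. This is not libcurl code: `read_with_alarm` and `on_alarm` are hypothetical names, and the pipe only stands in for a system call that would otherwise block forever. The point is that `alarm()` schedules a SIGALRM, and a handler installed without `SA_RESTART` makes the blocked call fail with `EINTR`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to make the blocked system call return. */
static void on_alarm(int signo)
{
  (void)signo;
}

/* Blocks in read() on a pipe that never receives data and relies on SIGALRM
   to abort the call. Returns 1 if read() was interrupted by the alarm
   (EINTR), 0 if it returned for another reason, -1 on setup failure. */
int read_with_alarm(unsigned int seconds)
{
  int fds[2];
  char byte;
  struct sigaction sa, old;
  int interrupted = 0;

  assert(seconds > 0);            /* alarm(0) would cancel the timer */

  if(pipe(fds) != 0)
    return -1;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = 0;                /* no SA_RESTART, so read() gets EINTR */
  if(sigaction(SIGALRM, &sa, &old) != 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  alarm(seconds);                 /* SIGALRM is sent to the whole process */
  if(read(fds[0], &byte, 1) < 0 && errno == EINTR)
    interrupted = 1;
  alarm(0);                       /* cancel a timer that has not fired yet */

  sigaction(SIGALRM, &old, NULL); /* restore the previous handler */
  close(fds[0]);
  close(fds[1]);
  return interrupted;
}
```

Both the handler installed by `sigaction()` and the timer armed by `alarm()` belong to the process, not to the calling thread, which is worth keeping in mind for the multithreaded case.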

The documentation says the timeout mechanism is implemented with the SIGALRM signal, and for unix-like systems it points to another option, CURLOPT_NOSIGNAL:

CURLOPT_NOSIGNAL

Pass a long. If it is 1, libcurl will not use any functions that install signal handlers or any functions that cause signals to be sent to the process. This option is mainly here to allow multi-threaded unix applications to still set/use all timeout options etc, without risking getting signals. The default value for this parameter is 0. (Added in 7.10)

If this option is set and libcurl has been built with the standard name resolver, timeouts will not occur while the name resolve takes place. Consider building libcurl with c-ares support to enable asynchronous DNS lookups, which enables nice timeouts for name resolves without signals.

Setting CURLOPT_NOSIGNAL to 1 makes libcurl NOT ask the system to ignore SIGPIPE signals, which otherwise are sent by the system when trying to send data to a socket which is closed in the other end. libcurl makes an effort to never cause such SIGPIPEs to trigger, but some operating systems have no way to avoid them and even on those that have there are some corner cases when they may still happen, contrary to our desire. In addition, using CURLAUTH_NTLM_WB authentication could cause a SIGCHLD signal to be raised.

According to this description, a multithreaded program that wants to use the timeout options without relying on signals needs to set CURLOPT_NOSIGNAL to 1.

After adding the following line to the code, testing no longer produced any core dumps.

curl_easy_setopt(curl, CURLOPT_NOSIGNAL, 1L);

Question 3: How is the timeout mechanism implemented, and why does setting CURLOPT_NOSIGNAL make the threads safe?

 

To answer these questions we need to look at the libcurl source. The following function performs DNS resolution:

int Curl_resolv_timeout(struct connectdata *conn,
                        const char *hostname,
                        int port,
                        struct Curl_dns_entry **entry,
                        long timeoutms)
{
#ifdef USE_ALARM_TIMEOUT
#ifdef HAVE_SIGACTION
  struct sigaction keep_sigact;   /* store the old struct here */
  volatile bool keep_copysig = FALSE; /* wether old sigact has been saved */
  struct sigaction sigact;
#else
#ifdef HAVE_SIGNAL
  void (*keep_sigact)(int);       /* store the old handler here */
#endif /* HAVE_SIGNAL */
#endif /* HAVE_SIGACTION */
  volatile long timeout;
  volatile unsigned int prev_alarm = 0;
  struct SessionHandle *data = conn->data;
#endif /* USE_ALARM_TIMEOUT */
  int rc;

  *entry = NULL;

  if(timeoutms < 0)
    /* got an already expired timeout */
    return CURLRESOLV_TIMEDOUT;

#ifdef USE_ALARM_TIMEOUT
  if(data->set.no_signal)
    /* Ignore the timeout when signals are disabled */
    timeout = 0;
  else
    timeout = timeoutms;

  if(!timeout)
    /* USE_ALARM_TIMEOUT defined, but no timeout actually requested */
    return Curl_resolv(conn, hostname, port, entry);

  if(timeout < 1000)
    /* The alarm() function only provides integer second resolution, so if
       we want to wait less than one second we must bail out already now. */
    return CURLRESOLV_TIMEDOUT;

  /*************************************************************
   * Set signal handler to catch SIGALRM
   * Store the old value to be able to set it back later!
   *************************************************************/
#ifdef HAVE_SIGACTION
  sigaction(SIGALRM, NULL, &sigact);
  keep_sigact = sigact;
  keep_copysig = TRUE; /* yes, we have a copy */
  sigact.sa_handler = alarmfunc;
#ifdef SA_RESTART
  /* HPUX doesn't have SA_RESTART but defaults to that behaviour! */
  sigact.sa_flags &= ~SA_RESTART;
#endif
  /* now set the new struct */
  sigaction(SIGALRM, &sigact, NULL);
#else /* HAVE_SIGACTION */
  /* no sigaction(), revert to the much lamer signal() */
#ifdef HAVE_SIGNAL
  keep_sigact = signal(SIGALRM, alarmfunc);
#endif
#endif /* HAVE_SIGACTION */

  /* alarm() makes a signal get sent when the timeout fires off, and that
     will abort system calls */
  prev_alarm = alarm(curlx_sltoui(timeout/1000L));

  /* This allows us to time-out from the name resolver, as the timeout
     will generate a signal and we will siglongjmp() from that here.
     This technique has problems (see alarmfunc).
     This should be the last thing we do before calling Curl_resolv(),
     as otherwise we'd have to worry about variables that get modified
     before we invoke Curl_resolv() (and thus use "volatile"). */
  if(sigsetjmp(curl_jmpenv, 1)) {
    /* this is coming from a siglongjmp() after an alarm signal */
    failf(data, "name lookup timed out");
    rc = CURLRESOLV_ERROR;
    goto clean_up;
  }

#else
#ifndef CURLRES_ASYNCH
  if(timeoutms)
    infof(conn->data, "timeout on name lookup is not supported\n");
#else
  (void)timeoutms; /* timeoutms not used with an async resolver */
#endif
#endif /* USE_ALARM_TIMEOUT */

  /* Perform the actual name resolution. This might be interrupted by an
   * alarm if it takes too long.
   */
  rc = Curl_resolv(conn, hostname, port, entry);

#ifdef USE_ALARM_TIMEOUT
clean_up:

  if(!prev_alarm)
    /* deactivate a possibly active alarm before uninstalling the handler */
    alarm(0);

#ifdef HAVE_SIGACTION
  if(keep_copysig) {
    /* we got a struct as it looked before, now put that one back nice
       and clean */
    sigaction(SIGALRM, &keep_sigact, NULL); /* put it back */
  }
#else
#ifdef HAVE_SIGNAL
  /* restore the previous SIGALRM handler */
  signal(SIGALRM, keep_sigact);
#endif
#endif /* HAVE_SIGACTION */

  /* switch back the alarm() to either zero or to what it was before minus
     the time we spent until now! */
  if(prev_alarm) {
    /* there was an alarm() set before us, now put it back */
    unsigned long elapsed_ms = Curl_tvdiff(Curl_tvnow(), conn->created);

    /* the alarm period is counted in even number of seconds */
    unsigned long alarm_set = prev_alarm - elapsed_ms/1000;

    if(!alarm_set ||
       ((alarm_set >= 0x80000000) && (prev_alarm < 0x80000000)) ) {
      /* if the alarm time-left reached zero or turned "negative" (counted
         with unsigned values), we should fire off a SIGALRM here, but we
         won't, and zero would be to switch it off so we never set it to
         less than 1! */
      alarm(1);
      rc = CURLRESOLV_TIMEDOUT;
      failf(data, "Previous alarm fired off!");
    }
    else
      alarm((unsigned int)alarm_set);
  }
#endif /* USE_ALARM_TIMEOUT */

  return rc;
}
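The pattern in this listing can be reduced to a small standalone sketch, which may help with Question 3. It is an illustration under assumptions, not libcurl code: `lookup_with_timeout` and `slow_lookup` are hypothetical names, `slow_lookup` merely sleeps in place of a real resolver call, and the handler body follows the listing's comment that the signal makes the code `siglongjmp()` back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* One jump buffer for the whole process, like curl_jmpenv in the listing.
   Two threads resolving at once would overwrite each other's saved context. */
static sigjmp_buf jmpenv;

static void alarmfunc(int signo)
{
  (void)signo;
  siglongjmp(jmpenv, 1);   /* abandon whatever the interrupted code was doing */
}

/* Hypothetical stand-in for a blocking name lookup. */
static int slow_lookup(unsigned int seconds)
{
  struct timespec ts;
  ts.tv_sec = (time_t)seconds;
  ts.tv_nsec = 0;
  nanosleep(&ts, NULL);
  return 0;
}

/* Returns 0 if the lookup finished in time, -1 if SIGALRM cut it short. */
int lookup_with_timeout(unsigned int lookup_seconds,
                        unsigned int timeout_seconds)
{
  struct sigaction sa, keep;
  int rc;

  assert(timeout_seconds > 0);

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = alarmfunc;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGALRM, &sa, &keep);  /* process-wide: affects every thread */

  if(sigsetjmp(jmpenv, 1)) {
    rc = -1;                       /* arrived here via siglongjmp() */
  }
  else {
    alarm(timeout_seconds);        /* the timer is per process, not per thread */
    rc = slow_lookup(lookup_seconds);
  }

  alarm(0);                        /* cancel a timer that has not fired yet */
  sigaction(SIGALRM, &keep, NULL); /* put the old handler back */
  return rc;
}
```

In a single thread this works. With several threads, the SIGALRM handler, the alarm timer, and the jump buffer are all shared, so a signal may be delivered to a thread other than the one that armed it and jump into a context saved by another thread, which is one plausible source of the random core dumps. In the listing, when `data->set.no_signal` is set, `timeout` becomes 0 and the function calls `Curl_resolv()` directly, so none of this signal machinery runs.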