我想出了一个可行的方法来处理这种情况。正如我之前提到的,FreeBSD 不支持强大的互斥锁,因此该选项已被淘汰。还有一个线程锁定了一个互斥锁,它不能以任何方式解锁。
所以我为解决这个问题所做的就是放弃互斥体并将其指针放在一个列表中。由于锁包装器代码使用 pthread_mutex_trylock,然后如果它失败则放弃 CPU,因此没有线程可以在等待永久锁定的互斥锁时卡住。在健壮互斥体的情况下,锁定互斥体的线程将能够恢复它,如果它获得 EOWNERDEAD 作为返回码。
这里有一些定义:
/* Checks to see if we have access to robust mutexes. */
#ifndef PTHREAD_MUTEX_ROBUST
#define TSRA__ALTERNATE
#define TSRA_MAX_MUTEXABANDON TSRA_MAX_MUTEX * 4
#endif
/* Mutex: Mutex Data Table Datatype */
typedef struct mutex_lock_table_tag__ mutexlock_t;
struct mutex_lock_table_tag__
{
pthread_mutex_t *mutex; /* PThread Mutex */
tsra_daclbk audcallbk; /* Audit Callback Function Pointer */
tsra_daclbk reicallbk; /* Reinit Callback Function Pointer */
int acbkstat; /* Audit Callback Status */
int rcbkstat; /* Reinit Callback Status */
pthread_t owner; /* Owner TID */
#ifdef TSRA__OVERRIDE
tsra_clnup_t *cleanup; /* PThread Cleanup */
#endif
};
/* ******** ******** Global Variables */
pthread_rwlock_t tab_lock; /* RW lock for mutex table */
pthread_mutexattr_t mtx_attrib; /* Mutex attributes */
mutexlock_t *mutex_table; /* Mutex Table */
int tabsizeentry; /* Table Size (Entries) */
int tabsizebyte; /* Table Size (Bytes) */
int initialized = 0; /* Modules Initialized 0=no, 1=yes */
#ifdef TSRA__ALTERNATE
pthread_mutex_t *mutex_abandon[TSRA_MAX_MUTEXABANDON];
pthread_mutex_t mtx_abandon; /* Abandoned Mutex Lock */
int mtx_abandon_count; /* Abandoned Mutex Count */
int mtx_abandon_init = 0; /* Initialization Flag */
#endif
pthread_mutex_t mtx_recover; /* Mutex Recovery Lock */
这里有一些锁恢复的代码:
/* Attempts to recover a broken mutex. */
int tsra_mutex_recover(int lockid, pthread_t tid)
{
int result;
/* Check Prerequisites */
if (initialized == 0) return(EDOOFUS);
if (lockid < 0 || lockid >= tabsizeentry) return(EINVAL);
/* Check Mutex Owner */
result = pthread_equal(tid, mutex_table[lockid].owner);
if (result != 0) return(0);
/* Lock Recovery Mutex */
result = pthread_mutex_lock(&mtx_recover);
if (result != 0) return(result);
/* Check Mutex Owner, Again */
result = pthread_equal(tid, mutex_table[lockid].owner);
if (result != 0)
{
pthread_mutex_unlock(&mtx_recover);
return(0);
}
/* Unless the system supports robust mutexes, there is
really no way to recover a mutex that is being held
by a thread that has terminated. At least in FreeBSD,
trying to destory a mutex that is held will result
in EBUSY. Trying to overwrite a held mutex results
in a memory fault and core dump. The only way to
recover is to abandon the mutex and create a new one. */
#ifdef TSRA__ALTERNATE /* Abandon Mutex */
pthread_mutex_t *ptr;
/* Too many abandoned mutexes? */
if (mtx_abandon_count >= TSRA_MAX_MUTEXABANDON)
{
result = TSRA_PROGRAM_ABORT;
goto error_1;
}
/* Get a read lock on the mutex table. */
result = pthread_rwlock_rdlock(&tab_lock);
if (result != 0) goto error_1;
/* Perform associated data audit. */
if (mutex_table[lockid].acbkstat != 0)
{
result = mutex_table[lockid].audcallbk();
if (result != 0)
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
}
/* Allocate New Mutex */
ptr = malloc(sizeof(pthread_mutex_t));
if (ptr == NULL)
{
result = errno;
goto error_2;
}
/* Init new mutex and abandon the old one. */
result = pthread_mutex_init(ptr, &mtx_attrib);
if (result != 0) goto error_3;
mutex_abandon[mtx_abandon_count] = mutex_table[lockid].mutex;
mutex_abandon[mtx_abandon_count] = mutex_table[lockid].mutex;
mtx_abandon_count++;
mutex_table[lockid].mutex = ptr;
#else /* Recover Mutex */
/* Try locking the mutex and see what we get. */
result = pthread_mutex_trylock(mutex_table[lockid].mutex);
switch (result)
{
case 0: /* No error, unlock and return */
pthread_unlock_mutex(mutex_table[lockid].mutex);
return(0);
break;
case EBUSY: /* No error, return */
return(0);
break;
case EOWNERDEAD: /* Error, try to recover mutex. */
if (mutex_table[lockid].acbkstat != 0)
{
result = mutex_table[lockid].audcallbk();
if (result != 0)
{
if (mutex_table[lockid].rcbkstat != 0)
{
result = mutex_table[lockid].reicallbk();
if (result != 0)
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
}
else
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
}
}
else
{
result = TSRA_PROGRAM_ABORT;
goto error_2;
}
break;
case EDEADLK: /* Error, deadlock avoided, abort */
case ENOTRECOVERABLE: /* Error, recovery failed, abort */
/* NOTE: We shouldn't get this, but if we do... */
abort();
break;
default:
/* Ambiguous situation, best to abort. */
abort();
break;
}
pthread_mutex_consistant(mutex_table[lockid].mutex);
pthread_mutex_unlock(mutex_table[lockid].mutex);
#endif
/* Housekeeping */
mutex_table[lockid].owner = pthread_self();
pthread_mutex_unlock(&mtx_recover);
/* Return */
return(0);
/* We only get here on errors. */
#ifdef TSRA__ALTERNATE
error_3:
free(ptr);
error_2:
pthread_rwlock_unlock(&tab_lock);
#else
error_2:
pthread_mutex_unlock(mutex_table[lockid].mutex);
#endif
error_1:
pthread_mutex_unlock(&mtx_recover);
return(result);
}
因为 FreeBSD 是一个像 Linux 一样不断发展的操作系统,所以我已经做出了规定,以允许将来使用强大的互斥锁。由于没有健壮的互斥体,因此如果支持健壮的互斥体,则确实无法进行增强的错误检查。
对于健壮的互斥体,执行增强的错误检查以验证是否需要恢复互斥体。对于不支持健壮互斥体的系统,我们必须信任调用者来验证有问题的互斥体是否需要恢复。此外,还有一些检查以确保只有一个线程执行恢复。在互斥体上阻塞的所有其他线程都被阻塞。我已经考虑过如何向其他线程发出恢复正在进行的信号,因此例程的这方面仍然需要工作。在恢复情况下,我正在考虑比较指针值以查看互斥锁是否被替换。
在这两种情况下,审计例程都可以设置为回调函数。审计例程的目的是验证和纠正受保护数据中的任何数据差异。如果审计未能更正数据,则调用另一个回调例程,即数据重新初始化例程。这样做的目的是重新初始化受互斥锁保护的数据。如果失败,则调用 abort() 以终止程序执行并删除核心文件以进行调试。
对于放弃互斥的情况,指针并没有被丢弃,而是放在一个列表中。如果放弃了太多互斥锁,则程序将中止。如上所述,在互斥锁例程中,使用 pthread_mutex_trylock 代替 pthread_mutex_lock。这样,就不会在死互斥体上永久阻塞任何线程。所以一旦在互斥表中切换指针指向新的互斥量,所有等待互斥量的线程都会立即切换到新的互斥量。
我确信此代码中存在错误/错误,但这是一项正在进行的工作。虽然还没有完全完成和调试,但我觉得这里有足够的理由来回答这个问题。