Possible deadlock in qcc::Timer::AddAlarm()?

asked 2014-06-23 13:26:54 -0700

If I host a session on a relatively slow device, I often experience deadlocks when several peers try to join the session all at once. In the debugger, I'm seeing multiple threads blocked on select() inside a call to Event::Wait(Event::neverSet, Event::WAIT_FOREVER), which is called from qcc::Timer::AddAlarm() when there are already maxAlarm alarms installed.

Most of the blocked threads are of type TimerThread, but the main thread also gets blocked in Timer::AddAlarm(), deep below a call to -[PGMPeerGroupManager getHostPeerIdOfGroup:].

As far as I can tell, the waiting threads in the Timer's addWaitQueue can only be released in TimerThread::Run() or TimerThread::Stop(). Since most of the blocked threads are timer threads, they can't reach the point where they release the next waiter, and the main thread call from the peer group manager doesn't appear to ever release any of those waiters, hence the deadlock.
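To illustrate the mechanism I suspect, here is a minimal sketch using a toy class rather than the real qcc::Timer code. ToyTimer, RunOne(), NestedAddSucceeds() and the timeout parameter are all hypothetical names of mine. It also assumes that an alarm's slot is only freed after its callback returns, which may not match the actual implementation. A callback that adds an alarm while the queue is full ends up waiting for a slot that only its own thread could release. The bounded wait here just turns that hang into a visible failure.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Toy model only: NOT the real qcc::Timer. All names are hypothetical.
class ToyTimer {
public:
    using Callback = std::function<void(ToyTimer&)>;

    explicit ToyTimer(std::size_t maxAlarms) : maxAlarms_(maxAlarms) {
        assert(maxAlarms > 0);
    }

    // Waits while all slots are occupied, but only up to `timeout`.
    // Returns false on timeout instead of waiting forever.
    bool AddAlarm(Callback cb, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        bool hasSlot = notFull_.wait_for(lock, timeout, [this] {
            return alarms_.size() + running_ < maxAlarms_;
        });
        if (!hasSlot) {
            return false;
        }
        alarms_.push_back(std::move(cb));
        return true;
    }

    // Runs one queued alarm on the calling thread.
    // Assumption: the slot is released only after the callback returns.
    bool RunOne() {
        Callback cb;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (alarms_.empty()) {
                return false;
            }
            cb = std::move(alarms_.front());
            alarms_.pop_front();
            ++running_;  // the alarm still occupies its slot while it runs
        }
        cb(*this);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            --running_;
        }
        notFull_.notify_one();  // only now can a waiter be released
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::deque<Callback> alarms_;
    std::size_t running_ = 0;
    const std::size_t maxAlarms_;
};

// Preloads `preload` alarms whose callbacks each try to add one more alarm,
// runs the first one, and reports whether that nested add got a slot.
bool NestedAddSucceeds(std::size_t maxAlarms, std::size_t preload) {
    ToyTimer timer(maxAlarms);
    bool nestedOk = false;
    auto cb = [&nestedOk](ToyTimer& t) {
        nestedOk = t.AddAlarm([](ToyTimer&) {}, std::chrono::milliseconds(50));
    };
    for (std::size_t i = 0; i < preload; ++i) {
        timer.AddAlarm(cb, std::chrono::milliseconds(0));
    }
    timer.RunOne();
    return nestedOk;
}
```

In this toy model, NestedAddSucceeds(4, 2) finds a free slot, while NestedAddSucceeds(2, 2) times out because every slot is held by an alarm that cannot finish until the add returns. With an unbounded wait, the second case would hang just like the threads I see in the debugger.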

Is there something I can do to avoid this situation?

I'm not sure it is the same issue, but I also experience deadlocks pretty often, so any news on this would be great!

PierreR ( 2014-07-02 00:25:55 -0700 )

It would be a great help if you could file a JIRA ticket here: https://jira.allseenalliance.org

praveenb ( 2014-07-04 08:23:21 -0700 )

I thought of that, but I'm afraid I don't have enough information to file a meaningful ticket :(

PierreR ( 2014-07-09 03:34:46 -0700 )

robertg and PierreR, on which operating system are you seeing these deadlock issues?

georgen ( 2014-07-14 09:53:07 -0700 )

To keep this conversation going, can you please explain what "relatively slow device" means? Can you comment on whether this happens only on iOS or on Android as well? Are you seeing the issue on the client side or the service side? Any chance you are using the JoinOrCreate API? How many devices does "several peers" mean?

bspencer ( 2014-08-04 11:53:27 -0700 )

1 answer

answered 2014-08-15 10:16:38 -0700

updated 2014-08-28 15:36:51 -0700

This seems to be the same issue as ASACORE-751, which has now been fixed.

