Thread Contention: A Common Performance Issue in DXP 7.0

This article documents thread contention, a common performance issue in Liferay DXP 7.0. It explains what thread contention is, how to recognize it in a thread dump, and what can be done about it.

Resolution

Thread contention occurs when multiple threads are waiting to lock the same object. This can get out of hand because the time it takes for a thread to acquire the lock grows with the number of threads waiting for it. This is made worse by the fact that the JVM does not use a FIFO system when awarding the lock to one of the waiting threads; it chooses one based on some criteria, and for our purposes the choice may as well be random.
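To make the definition concrete, here is a minimal sketch in plain Java, not Liferay code. The class name ContentionDemo, the method countBlockedWorkers, and the LOCK object are all hypothetical names chosen for illustration. The calling thread holds a monitor while several workers try to enter it, and the method reports how many workers the JVM shows as BLOCKED.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentionDemo {

    // The single shared monitor every worker needs (the bottleneck).
    private static final Object LOCK = new Object();

    // Starts `workers` threads that all need LOCK while the caller holds it,
    // and returns how many of them were observed in the BLOCKED state.
    static int countBlockedWorkers(int workers) {
        List<Thread> threads = new ArrayList<>();
        int blocked = 0;
        try {
            synchronized (LOCK) {
                for (int i = 0; i < workers; i++) {
                    Thread t = new Thread(() -> {
                        synchronized (LOCK) {
                            // Critical section, intentionally empty.
                        }
                    }, "worker-" + i);
                    t.setDaemon(true);
                    t.start();
                    threads.add(t);
                }
                // Poll until every worker is stuck on the monitor (give up after ~5 s).
                long deadline = System.nanoTime() + 5_000_000_000L;
                while (System.nanoTime() < deadline) {
                    blocked = 0;
                    for (Thread t : threads) {
                        if (t.getState() == Thread.State.BLOCKED) {
                            blocked++;
                        }
                    }
                    if (blocked == workers) {
                        break;
                    }
                    Thread.sleep(10);
                }
            }
            // The lock is released here, so the workers can finish one by one.
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("BLOCKED workers: " + countBlockedWorkers(8));
    }
}
```

While the lock is held, none of the workers makes progress, which is the same situation a thread dump reports as "waiting for monitor entry".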

Symptoms depend on the severity of the thread contention. If only a couple of threads are affected, we may see nothing more than slower response times. If hundreds are affected, the Liferay platform may as well be completely unresponsive.

Here is an example from a thread dump, where the contention is in logging:

"http-/0.0.0.0:8280-248" daemon prio=10 tid=0x00007fbd0018b780 nid=0x484d waiting for monitor entry [0x00007fbbba490000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.Category.callAppenders(Category.java:205)
    - waiting to lock <0x000000066b869058> (a org.apache.log4j.spi.RootLogger)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)

In this case, the thread is inside log4j, waiting to lock a RootLogger object, and its execution is suspended (i.e. its state is BLOCKED rather than RUNNABLE) until the lock is acquired. Searching the thread dump for the lock ID (0x0000...) may show that many other threads are also waiting on the same object.

If more than a dozen, or even hundreds, of threads are trying to call log4j's Category.callAppenders() and are all waiting to lock the same object, it is definitely a thread contention issue.
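Searching for the lock ID by hand can be automated. The following is a minimal sketch that assumes the thread dump is available as text and uses the "- waiting to lock <0x...>" line format shown in the example above. The class name LockWaiterCounter and the method countWaiters are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LockWaiterCounter {

    // Matches lines such as:
    //   - waiting to lock <0x000000066b869058> (a org.apache.log4j.spi.RootLogger)
    private static final Pattern WAITING =
        Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");

    // Returns lock ID -> number of threads waiting for that lock in the dump text.
    static Map<String, Integer> countWaiters(String threadDump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = WAITING.matcher(threadDump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Example input: the same waiting line appearing for two threads.
        String line =
            "    - waiting to lock <0x000000066b869058> (a org.apache.log4j.spi.RootLogger)\n";
        System.out.println(countWaiters(line + line));
    }
}
```

A lock ID with a high count points to the same kind of bottleneck as the RootLogger in the example.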

Spotify's Online Thread Dump Analyzer summarizes this result neatly:

[Image: spotify-performance-01.PNG]

Conclusion:

It is a thread contention issue if a lot of threads are waiting to lock objects, and this state persists throughout multiple thread dumps.
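The persistence check can be sketched the same way. This example assumes several thread dumps are available as text, taken some time apart. It groups waiting threads by the class of the lock rather than by its address, on the assumption that an object's address can change between dumps if the garbage collector moves it. The names PersistentContention, waitersByClass, persistentlyContended, and the minWaiters threshold are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PersistentContention {

    // Captures the lock's class name from lines such as:
    //   - waiting to lock <0x000000066b869058> (a org.apache.log4j.spi.RootLogger)
    private static final Pattern WAITING =
        Pattern.compile("- waiting to lock <0x[0-9a-fA-F]+> \\(a ([^)]+)\\)");

    // Returns lock class name -> number of waiting threads in one dump.
    static Map<String, Integer> waitersByClass(String dump) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WAITING.matcher(dump);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    // Returns the lock classes that have at least `minWaiters` waiting
    // threads in every one of the given dumps.
    static Set<String> persistentlyContended(List<String> dumps, int minWaiters) {
        Set<String> result = null;
        for (String dump : dumps) {
            Set<String> hot = new TreeSet<>();
            for (Map.Entry<String, Integer> e : waitersByClass(dump).entrySet()) {
                if (e.getValue() >= minWaiters) {
                    hot.add(e.getKey());
                }
            }
            if (result == null) {
                result = hot;           // first dump seeds the candidate set
            } else {
                result.retainAll(hot);  // keep only locks that are still contended
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

A lock class that survives this filter across dumps matches the conclusion above. One that appears in a single dump only is more likely a momentary spike.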

Possible solutions to these issues are:

  1. Change the execution logic so that it relies less on the bottleneck, or...
  2. Make configuration changes that reduce the contention, or...
  3. Add more resources so that the system can handle the load it is receiving.
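
As an illustration of the first option only, here is a minimal sketch of one general technique: caller threads hand their messages to a queue, and a single background thread does the slow, shared work. This is not Liferay's or log4j's implementation. The class QueuedSink and its methods submit and close are hypothetical names, and whether such a change is appropriate depends on the actual bottleneck.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class QueuedSink {

    // Marker object that tells the drainer thread to stop (compared by identity).
    private static final String STOP = new String("__stop__");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread drainer;

    // `slowSink` stands for the contended resource, e.g. something that writes output.
    QueuedSink(Consumer<String> slowSink) {
        drainer = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take();
                    if (message == STOP) {
                        break;
                    }
                    // Only this thread touches the slow resource.
                    slowSink.accept(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "queued-sink-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    // Called by many threads: enqueues and returns without waiting for the slow sink.
    void submit(String message) {
        queue.add(message);
    }

    // Processes everything submitted so far, then stops the drainer thread.
    void close() {
        queue.add(STOP);
        try {
            drainer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The queue still synchronizes internally, but callers hold its lock only long enough to enqueue a message, not for the duration of the slow operation. The trade-off is that messages are processed later than they are submitted, and the unbounded queue can grow if the sink cannot keep up.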