This article documents a common performance issue related to thread contention within Liferay DXP 7.0. Read on for background on the issue and how to diagnose and address it.
Thread contention occurs when multiple threads wait to lock the same object. This can get out of hand, because the time it takes for a thread to acquire the lock grows with the number of threads waiting for it. The problem is made worse by the fact that the JVM does not use a FIFO policy when awarding the lock to one of the waiting threads—it chooses one based on internal criteria, and for our purposes that may as well be random.
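The situation above can be reproduced in a few lines of Java. The sketch below (class and field names are illustrative, not from Liferay) has one thread hold a monitor while several others try to enter the same `synchronized` block; sampling their state shows them as BLOCKED, exactly as they would appear in a thread dump.

```java
public class ContentionDemo {

    private static final Object LOCK = new Object();

    // Start one thread that holds the monitor, then several threads that
    // block trying to enter it, and return how many waiters are BLOCKED
    // while the lock is still held.
    public static int countBlockedWaiters() throws InterruptedException {
        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                try {
                    Thread.sleep(1500); // hold the monitor for a while
                } catch (InterruptedException ignored) {
                }
            }
        });
        holder.start();
        Thread.sleep(200); // let the holder acquire the monitor first

        Thread[] waiters = new Thread[5];
        for (int i = 0; i < waiters.length; i++) {
            waiters[i] = new Thread(() -> {
                synchronized (LOCK) {
                    // acquire and immediately release
                }
            });
            waiters[i].start();
        }
        Thread.sleep(200); // give the waiters time to block on the monitor

        int blocked = 0;
        for (Thread t : waiters) {
            if (t.getState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }

        holder.join();
        for (Thread t : waiters) {
            t.join();
        }
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Waiters blocked on the monitor: " + countBlockedWaiters());
    }
}
```

Taking a thread dump (for example with `jstack`) while this program sleeps would show the five waiter threads in state BLOCKED, each "waiting to lock" the same object ID.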
Symptoms depend on the severity of the thread contention. If only a couple of threads are affected, we may see nothing worse than slower response times. If hundreds are affected, the Liferay platform may as well be completely unresponsive.
Here is an example from logging:
"http-/0.0.0.0:8280-248" daemon prio=10 tid=0x00007fbd0018b780 nid=0x484d waiting for monitor entry [0x00007fbbba490000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:205) - waiting to lock <0x000000066b869058> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856)
In this case, log4j is waiting to lock a RootLogger object, and the thread's execution is suspended (its state is BLOCKED, not RUNNABLE) until the lock is acquired. Searching the thread dump for that object ID (0x0000...) may show that many other threads are waiting on the same object.
If dozens or even hundreds of threads are trying to call Category.callAppenders() and are waiting to lock the same object, it is definitely a thread contention issue.
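Counting how many threads wait on each object ID can be done by hand, but a short helper makes it mechanical. The sketch below (class name and regex are illustrative assumptions, not part of any Liferay or JDK API) scans thread-dump lines for the HotSpot "waiting to lock <id>" pattern and tallies waiters per object ID:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LockWaitCounter {

    // Matches dump lines such as:
    // - waiting to lock <0x000000066b869058> (a org.apache.log4j.spi.RootLogger)
    private static final Pattern WAITING_TO_LOCK =
        Pattern.compile("waiting to lock <(0x[0-9a-f]+)>");

    // Returns a map from object ID to the number of threads waiting to lock it.
    public static Map<String, Integer> countWaiters(List<String> dumpLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : dumpLines) {
            Matcher m = WAITING_TO_LOCK.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Any object ID with a high count in the result is a contention hot spot worth investigating.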
Spotify's Online Thread Dump Analyzer summarizes this result neatly:
It is a thread contention issue if many threads are waiting to lock the same objects, and this state persists across multiple thread dumps.
Possible solutions to these issues are:
- Change the execution logic so that it does not rely so heavily on the bottlenecked resource, or...
- Make configuration changes that reduce contention on it, or...
- Add more resources to handle the load it is receiving.
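As one illustration of the configuration-change option: for the log4j example above, synchronous appenders can be wrapped in log4j 1.x's AsyncAppender so that application threads hand log events to a buffer instead of contending on the appender's lock. This is a hedged sketch, not a recommended Liferay configuration—the appender name `FILE` and the buffer size are assumptions, and the right fix depends on your environment:

```xml
<!-- Hypothetical log4j 1.x fragment: wrap an existing appender
     (here assumed to be named "FILE") in an AsyncAppender so that
     application threads enqueue events instead of writing directly. -->
<appender name="ASYNC" class="org.apache.log4j.AsyncAppender">
  <param name="BufferSize" value="500"/>
  <appender-ref ref="FILE"/>
</appender>
```

Loggers would then reference ASYNC instead of FILE. Note that this moves the work to a background thread rather than eliminating it; if the underlying appender is too slow, the buffer will still fill up under sustained load.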