One thing I’d like to drive home with regard to latches is this: latches are a type of lock, locks are serialization devices, and serialization devices inhibit scalability. If your goal is to construct an application that scales well in an Oracle environment, you must look for approaches and solutions that minimize the amount of latching you need to perform.

Even seemingly simple activities, such as parsing a SQL statement, acquire and release hundreds or thousands of latches on the library cache and related structures in the shared pool. If we have a latch, then someone else might be waiting for it. When we go to get a latch, we may well have to wait for it ourselves.

Waiting for a latch can be an expensive operation. If the latch is not available immediately and we are willing to wait for it, as we likely are most of the time, then on a multi-CPU machine our session will spin, trying over and over, in a loop, to get the latch.

The reasoning behind this is that context switching (i.e., getting kicked off the CPU and having to get back on the CPU) is expensive. So, if the process cannot get a latch immediately, we’ll stay on the CPU and try again immediately rather than just going to sleep, giving up the CPU, and trying later when we’ll have to get scheduled back on the CPU.

The hope is that the holder of the latch is busy processing on the other CPU (and since latches are designed to be held for very short periods of time, this is likely) and will give it up soon. If after spinning and constantly trying to get the latch, we still fail to obtain it, only then will our process sleep, or take itself off of the CPU, and let some other work take place.

This sleep action is usually the result of many sessions concurrently requesting the same latch; it is not that a single session is holding it for a long time,but rather that so many sessions want it at the same time, and each holds it for a short duration. If you do something short (fast) often enough, it adds up! The pseudo-code for a latch get might look like this:

Loop for i in 1 .. 2000 loop try to get latch if got latch, return if i = 1 then misses=misses+1end loop INCREMENT WAIT COUNTsleepAdd WAIT TIMEEnd loop;

The logic is to try to get the latch and, failing that, to increment the miss count, a statistic we can see in a Statspack report or by querying the V$LATCH view directly.

Once the process misses, it will loop some number of times (an undocumented parameter controls the number of times and is typically set to 2000), attempting to get the latch over and over. If one of these get attempts succeeds, then it returns and we continue processing. If they all fail, the process will go to sleep for a short duration of time, after incrementing the sleep count for that latch.

Upon waking up, the process begins all over again. This implies that the cost of getting a latch is not just the “test and set”–type operation that takes place but also a considerable amount of CPU while we try to get the latch. Our system will appear to be very busy (with much CPU being consumed), but not much work is getting done.


Post datePost date 07/23/2023
Last editLast edit
AuthorAuthor
TagsTags

Leave a Reply

Your email address will not be published. Required fields are marked *