MySQL死锁日志中的HOLDS THE LOCKS里的waitting

详细说明如下, 重点在最后一段,HOLD THE LOCKS部分的waitting并非是指已经取得的锁,而是说当前还在尝试取得的锁。

[7 Sep 9:57] Jakub Lopuszanski

Hi, thank you for detailed analysis.

There’s clearly many ways in which the output could be made more readable to humans.

However, if you put aside the very unfriendly format, the information provided is actually useful and in some sense (that is if you stretch the meaning of “HOLD”) true.

To describe a deadlock cycle involving N transactions, we need to specify at least 2*N lock objects.
Perhaps it’s not immediately obvious, that a deadlock cycle involving N transactions actually (often) involves twice as many locks:
– what i-th transaction is trying to acquire
– what lock of (i+1 mod N)-th transaciton is preventing the above acquisition
This is perhaps a bit confusing because the word “lock” itself in InnoDB means “a request for a lock” (so, there are as many of them as there are transactions competing for a given resource) rather than a single thing per resource for which transactions compete (which might be a common way to define this word in say, C++).
So, you can find some pictures in the web where deadlock among two transactions is shown as involving two locks, but in InnoDB’s parlance, it involves four locks (=four lock requests). In the picture there are two items (resources), two people (transactions), but four hands (=lock requests).

Yes, it may happen that some of these 2*N locks are in fact the same – this is the case in your example – the same “WAITING X,REC_NOT_GAP” lock of 291 is at the same time the one for which transaction 291 is waiting, and the reason transaction 290 can not acquire it’s lock.
As you may already know, this is a part of strategy to avoid starvation of waiters – the transaction 290, when trying to acquire a lock, must check for conflicts also with already existing waiters in the queue, such as transaction 291.
So, a good picture for this situation might be a hand of 290 holding a thing, wrapped by a hand of 291 which also tries to grab this thing, and another hand of 290 wrapping both of them trying to get it. Here the middle hand plays two roles: of an unsatisfied, waiting request, and of a reason someone else has to wait.

Yes, the title of the section uses verb “HOLDs”, which might be interpreted as “was granted the lock”, but this is later “clarified” at the end of the next output line:

*** (1) HOLDS THE LOCK(S):
  RECORD LOCKS space id 1940 page no 7 n bits 248 index PRIMARY of table `world`.`city` trx id 541847 lock_mode X locks rec but not gap waiting

Note the word “waiting” at the end of the list of “attributes” of this lock object – this “clarifies” that this lock is not yet granted, but waiting, yet still it plays the role of the request of transaction 291 which is preventing the request of transaction 290 from fulfilment.

Removing this section, as you suggest, is IMHO not a good idea, because then it wouldn’t be clear what is the reason that transaction 290 has to wait. (Also, it would break backward compatibility with tools parsing it).

So, yes, these could be made more human-readable, for example by using word “IS ASSOCIATED WITH THE LOCK(S):” or something like this, but IMHO, “HOLDS” is not such a bad word to describe the situation that trx 291 is so firmly attached to this lock, and will not let go, nor allow 290 to bypass it 🙂

Anyway, I believe we can’t fix it for two reasons:
1. Backward compatibility. There are tools which might search for headers like this one to obtain information.
2. We rather figure out a way to put the deadlock information into performance_schema, than invest in improving these texts.