Loading directory metadata with high concurrency can lead to deadlock

Description

A deadlock can occur in the following situation:
1. Directory /X exists within Alluxio
2. Directories /X/a, /X/b, /X/c exist only in the UFS
3. Many clients try to list /X at the same time.

When we try to load the children of /X from multiple threads, it's possible that thread A loads /X/a while thread B loads /X/b. Each thread takes a write lock on the subdirectory when loading it, and this write lock isn't released until the list RPC completes. However, after loading metadata, each list RPC attempts to read-lock all children of /X so that it can list them. This can lead to thread A trying to read-lock /X/b while holding a write lock on /X/a, and thread B trying to read-lock /X/a while holding a write lock on /X/b. Neither thread can proceed, so both block forever.
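The lock cycle described above can be reproduced in isolation. The following is a minimal sketch, not Alluxio code: it assumes each inode lock behaves like a java.util.concurrent ReentrantReadWriteLock, and the names LockCycleSketch and loadThenList are hypothetical. It uses a timed tryLock instead of a blocking lock so that the program terminates rather than hanging.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the list RPC's locking pattern; not Alluxio's real classes.
class LockCycleSketch {

  /** Returns, per thread, whether it managed to read-lock the other thread's directory. */
  static boolean[] run() {
    ReentrantReadWriteLock lockA = new ReentrantReadWriteLock(); // stands in for /X/a
    ReentrantReadWriteLock lockB = new ReentrantReadWriteLock(); // stands in for /X/b
    CyclicBarrier barrier = new CyclicBarrier(2);
    boolean[] acquired = new boolean[2];

    Thread a = new Thread(() -> acquired[0] = loadThenList(lockA, lockB, barrier));
    Thread b = new Thread(() -> acquired[1] = loadThenList(lockB, lockA, barrier));
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return acquired;
  }

  static boolean loadThenList(ReentrantReadWriteLock own, ReentrantReadWriteLock other,
      CyclicBarrier barrier) {
    own.writeLock().lock(); // "load metadata" for our own subdirectory
    try {
      barrier.await(); // wait until both threads hold their write locks
      // "List" step: try to read-lock the sibling while still holding our write lock.
      boolean gotRead = other.readLock().tryLock(200, TimeUnit.MILLISECONDS);
      barrier.await(); // keep the write lock until both attempts have finished
      if (gotRead) {
        other.readLock().unlock();
      }
      return gotRead;
    } catch (InterruptedException | BrokenBarrierException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      own.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("A read-locked /X/b: " + result[0]); // false
    System.out.println("B read-locked /X/a: " + result[1]); // false
  }
}
```

Both attempts time out because each read lock is blocked by the other thread's write lock. With blocking lock() calls, as in the real RPC path, the two threads would wait on each other indefinitely.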

To fix this, we should downgrade the write locks as soon as the metadata is loaded, before attempting to take any read locks.
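As an illustration of the proposed fix, here is a minimal sketch of the same two-thread scenario with the write lock downgraded before any sibling is read-locked. It again assumes ReentrantReadWriteLock semantics, where a downgrade means acquiring the read lock while holding the write lock and then releasing the write lock. The names LockDowngradeSketch and loadDowngradeThenList are hypothetical, and the actual Alluxio change may be structured differently.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the downgrade-before-list ordering; not Alluxio's real classes.
class LockDowngradeSketch {

  /** Returns, per thread, whether it managed to read-lock the other thread's directory. */
  static boolean[] run() {
    ReentrantReadWriteLock lockA = new ReentrantReadWriteLock(); // stands in for /X/a
    ReentrantReadWriteLock lockB = new ReentrantReadWriteLock(); // stands in for /X/b
    CyclicBarrier barrier = new CyclicBarrier(2);
    boolean[] acquired = new boolean[2];

    Thread a = new Thread(() -> acquired[0] = loadDowngradeThenList(lockA, lockB, barrier));
    Thread b = new Thread(() -> acquired[1] = loadDowngradeThenList(lockB, lockA, barrier));
    a.start();
    b.start();
    try {
      a.join();
      b.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return acquired;
  }

  static boolean loadDowngradeThenList(ReentrantReadWriteLock own,
      ReentrantReadWriteLock other, CyclicBarrier barrier) {
    own.writeLock().lock(); // "load metadata" for our own subdirectory
    // Downgrade: take the read lock while still holding the write lock,
    // then release the write lock. No other writer can slip in between.
    own.readLock().lock();
    own.writeLock().unlock();
    try {
      barrier.await(); // wait until both threads have downgraded
      // "List" step: read locks are shared, so this no longer conflicts.
      boolean gotRead = other.readLock().tryLock(200, TimeUnit.MILLISECONDS);
      barrier.await(); // hold our read lock until both attempts have finished
      if (gotRead) {
        other.readLock().unlock();
      }
      return gotRead;
    } catch (InterruptedException | BrokenBarrierException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      own.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    boolean[] result = run();
    System.out.println("A read-locked /X/b: " + result[0]); // true
    System.out.println("B read-locked /X/a: " + result[1]); // true
  }
}
```

Because each thread holds only a read lock on its own subdirectory by the time it lists the siblings, the read-lock requests are compatible and the cycle cannot form.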

Environment

None

Status

Assignee

Andrew Audibert

Reporter

Andrew Audibert

Labels

None

Components

Fix versions

Affects versions

1.6.0
1.7.0
1.6.1

Priority

Critical