Allow Alluxio clients to recover from a lost worker

Description

When an Alluxio client is reading data from an Alluxio worker and that worker disappears, the client throws an exception, and usually the entire job has to be retried. This can lose many minutes' worth of work. The client should try harder: when a worker is lost, it should recover by retrying the read from another worker.
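To illustrate the requested behavior, here is a minimal, language-neutral sketch in Python. It is not the Alluxio client API: `WorkerLostError`, `read_with_failover`, and the `read_from(worker, offset)` callback are hypothetical names introduced only for this example. The sketch assumes another worker can serve the same data, and that a read can resume at the offset already reached so bytes read before the failure are not thrown away.

```python
from typing import Callable, Iterable, Optional


class WorkerLostError(Exception):
    """Hypothetical error: the worker became unreachable during a read."""


def read_with_failover(
    workers: Iterable[str],
    read_from: Callable[[str, int], bytes],
    length: int,
) -> bytes:
    """Read `length` bytes, resuming on the next worker if one is lost.

    `read_from(worker, offset)` is an assumed callback that returns the next
    chunk starting at `offset`, or raises WorkerLostError.
    """
    data = bytearray()
    last_error: Optional[Exception] = None
    for worker in workers:
        try:
            while len(data) < length:
                # Resume at the current offset; earlier bytes are kept.
                chunk = read_from(worker, len(data))
                if not chunk:
                    break  # worker reports end of data
                data.extend(chunk)
            return bytes(data)
        except WorkerLostError as err:
            # Remember the failure and fall through to the next worker.
            last_error = err
    # Only fail the read once every candidate worker has been tried.
    raise IOError("all workers failed") from last_error
```

With this shape, the caller sees an exception only when no remaining worker can serve the data, rather than on the first lost worker.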

Environment

None

Status

Assignee

Bin Feng

Reporter

Bin Feng

Labels

None

Components

Affects versions

1.8.0

Priority

Major