FileInStream may cache blocks where they already exist

Description

When FileInStream reads a block with a caching read type, it uses a location policy to decide which Alluxio worker to cache the block on. The default policy is round-robin, which does not consider where the block already lives and so effectively picks an arbitrary worker.
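A minimal sketch of why round-robin is effectively arbitrary here, assuming a policy that simply rotates through the worker list. The class `RoundRobinSketch`, its nested `RoundRobinPolicy`, the `getWorker` method, and the worker names are all hypothetical illustrations, not Alluxio's actual location policy API. The point is that the choice depends only on a rotating counter, never on which worker is serving the read.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a round-robin location policy; not Alluxio's real class. */
public class RoundRobinSketch {
  static final class RoundRobinPolicy {
    // Index of the next worker to hand out; it advances on every call, regardless of the block.
    private final AtomicInteger next = new AtomicInteger(0);

    /** Returns the next worker in rotation, ignoring where the block currently lives. */
    String getWorker(List<String> workers) {
      if (workers.isEmpty()) {
        throw new IllegalArgumentException("no workers available");
      }
      // floorMod keeps the index non-negative even after the counter overflows.
      int i = Math.floorMod(next.getAndIncrement(), workers.size());
      return workers.get(i);
    }
  }

  public static void main(String[] args) {
    RoundRobinPolicy policy = new RoundRobinPolicy();
    List<String> workers = List.of("worker-a", "worker-b", "worker-c");
    // Suppose the block being read lives on worker-b; the policy has no way to know that.
    for (int read = 0; read < 4; read++) {
      System.out.println("cache to " + policy.getWorker(workers));
    }
  }
}
```

With three workers the sketch rotates through worker-a, worker-b, worker-c and back to worker-a, so one read in three would be directed to cache on worker-b, the worker the block is already being read from.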

If the client reads a file that is already stored in Alluxio on a remote worker, the policy still runs to choose a worker to cache the block on. If it chooses the same worker that the block is being read from, an exception is thrown.

We should fix this by refusing to cache a block to a worker which already has the block.
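A minimal sketch of the proposed fix, assuming the client knows which workers already hold the block: filter those workers out of the candidate list before applying round-robin. All names here (`ExcludeHoldersSketch`, `ExcludingRoundRobinPolicy`, `getWorker`, the `holders` set) are hypothetical and do not reflect Alluxio's API. Returning an empty `Optional` when every worker already has the block, so that the caller skips caching, is an added assumption rather than something this issue specifies.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/** Hypothetical sketch of the proposed fix; names are illustrative, not Alluxio's API. */
public class ExcludeHoldersSketch {
  static final class ExcludingRoundRobinPolicy {
    private final AtomicInteger next = new AtomicInteger(0);

    /**
     * Picks a worker to cache to, skipping every worker that already holds the block.
     * Returns empty when all workers already have it (assumption: caller then skips caching).
     */
    Optional<String> getWorker(List<String> workers, Set<String> holders) {
      List<String> candidates = workers.stream()
          .filter(w -> !holders.contains(w))
          .collect(Collectors.toList());
      if (candidates.isEmpty()) {
        return Optional.empty();
      }
      // Round-robin over the remaining candidates only.
      int i = Math.floorMod(next.getAndIncrement(), candidates.size());
      return Optional.of(candidates.get(i));
    }
  }

  public static void main(String[] args) {
    ExcludingRoundRobinPolicy policy = new ExcludingRoundRobinPolicy();
    List<String> workers = List.of("worker-a", "worker-b", "worker-c");
    Set<String> holders = Set.of("worker-b"); // the worker the block is being read from
    for (int read = 0; read < 4; read++) {
      System.out.println(policy.getWorker(workers, holders).orElse("(skip caching)"));
    }
    // When every worker already has the block, there is nowhere new to cache it.
    System.out.println(policy.getWorker(List.of("worker-b"), holders).orElse("(skip caching)"));
  }
}
```

Filtering before the round-robin step keeps the existing policy's behavior for all other workers while guaranteeing it can never select a worker that already holds the block.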

Environment

None

Assignee

Andrew Audibert

Reporter

Andrew Audibert

Affects versions

1.0.0

Priority

Blocker