This JIRA aims to add a service to Alluxio that can run simple I/O-related jobs in a distributed framework.
Each job is defined in a pthread-like programming model and runs across a set of workers. The computation assigned to a single worker is called a task, and the I/O work itself is defined inside the task.
A job, once submitted, will be queued on the master and then distributed to the worker nodes. Once a task completes or fails, its result will be returned to the master. A job is complete if all of its tasks complete successfully, and is considered failed if any task fails. A failed job will trigger a retry. Each job is supposed to be idempotent, so retrying a job will not introduce side effects.
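The completion and retry rules above can be sketched in a few lines of Java. This is only an illustration of the semantics, not the proposed implementation: the names JobSketch, Task, Status, runOnce and runWithRetries are hypothetical, tasks run sequentially in one process instead of being distributed to workers, and the bounded number of attempts is an assumption (this description does not say how many retries are made).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the job/task semantics; not an actual Alluxio API.
class JobSketch {
    public enum Status { COMPLETED, FAILED }

    /** The work assigned to a single worker; returns true on success. */
    public interface Task {
        boolean run() throws Exception;
    }

    /** Runs every task once. The job completes only if all tasks succeed. */
    public static Status runOnce(List<Task> tasks) {
        boolean allSucceeded = true;
        for (Task task : tasks) {
            try {
                if (!task.run()) {
                    allSucceeded = false;
                }
            } catch (Exception e) {
                // A task that throws is treated as failed.
                allSucceeded = false;
            }
        }
        return allSucceeded ? Status.COMPLETED : Status.FAILED;
    }

    /**
     * Retries the whole job after a failure. Re-running tasks that already
     * succeeded is safe only because jobs are assumed to be idempotent.
     * maxAttempts is an assumed limit, not taken from the description.
     */
    public static Status runWithRetries(List<Task> tasks, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (runOnce(tasks) == Status.COMPLETED) {
                return Status.COMPLETED;
            }
        }
        return Status.FAILED;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        List<Task> tasks = new ArrayList<>();
        tasks.add(() -> true);                          // always succeeds
        tasks.add(() -> calls.incrementAndGet() >= 2);  // fails on first attempt only
        System.out.println(runWithRetries(tasks, 3));
    }
}
```

In this sketch the second task fails on the first attempt, so the whole job is retried, including the task that had already succeeded. That is why idempotency matters for the retry policy.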
This new job service in Alluxio will enable the following operations (in the initial implementation):
Async persistence (without the current limitation)
Replication enforcement, so users can specify the number of copies of a file
Distributed move/copy