Improve Alluxio UFS Test

Description

1 Overview

This design doc aims to improve the integration tests that exercise different UFSes as the under storage of Alluxio.

Currently, in Alluxio 1.5, the dependencies of the integration tests module are in a bad state for multiple reasons (explained in Section 4.1). As a result, we cannot run integration tests with the HDFS emulator as UFS for hadoop versions other than 1.0 and 2.2.

2 Goals

Hard requirement: Resolve dependency conflict.
Hard requirement: No loss of test coverage.
Stretch goal: Cleaner code structure which allows better isolation of UFS issues from Alluxio issues during tests and eases the debugging.

3 Use Cases

N/A

4 Design

4.1 Root Cause of the Problem

4.1.1 Mixing relocated/unrelocated hadoop dependencies

The integration tests module “alluxio-tests” depends on hadoop from three sources:

  1. Module alluxio-underfs, which depends on hadoop-client, shaded and relocated

  2. Module alluxio-core-client-hdfs, which depends on hadoop-client, unshaded and unrelocated

  3. Class LocalMiniDFSCluster, which depends on hadoop-minicluster (this library indirectly brings in hadoop server classes), unshaded and unrelocated

Because hadoop-client is partially relocated in 1 but unshaded and unrelocated in 2 and 3, the mix causes trouble when running HDFS tests. Note that the conflict between 1 and 2 can be solved by marking the hadoop-client dependency in 2 as provided, but the conflict between 1 and 3 is hard to resolve because 3 requires server-side classes while 1 only relocates the client-side dependencies.

4.1.2 Dependency conflict between alluxio-core-server and LocalMiniDFSCluster

The integration tests under the hadoop-2.7 profile can NOT complete because of conflicting jersey dependencies between alluxio-core-server and hadoop-minicluster, which simple dependency exclusion cannot resolve.

4.2 Solution

We choose to pursue the following approach.

Philosophy of this solution:

  • In alluxio-tests module, we only test the Alluxio-internal logic with no real implementation of UFS involved.

  • UFS implementations should be tested separately and independently, ensuring that the UFS contract is met by each individual implementation.

  • We don’t test UFS emulation but test real UFS instances.

Steps:

  • Remove UFS test profiles from integration tests, including “hdfsTest”, “s3Test”, etc.

  • Remove alluxio-underfs dependencies in alluxio-tests

  • Implement UFS contract tests

    • Create an abstract UFS contract test that describes the contract and assumptions between Alluxio and an instance of an Alluxio UFS implementation.

    • For each individual UFS, create test cases in its module that focus on testing the contract between Alluxio and the UFS through the UFS API.

    • For autobots tests, run E2E tests such as wordcount or the Alluxio integration tests with UFSes in addition to S3 and HDFS.
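
The abstract contract test in the steps above could be structured roughly as follows. This is only a minimal sketch under stated assumptions: SimpleUfs is a hypothetical, heavily simplified stand-in for the UFS API (it is not Alluxio's actual interface), InMemoryUfs is a hypothetical implementation used only to make the example runnable, and plain Java checks are used in place of a test framework. The point it illustrates is that the contract is written once, and each UFS module only supplies the instance under test.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified UFS API for illustration only; NOT Alluxio's real interface.
interface SimpleUfs {
  void create(String path, byte[] data) throws IOException;
  byte[] read(String path) throws IOException;
  boolean exists(String path) throws IOException;
  boolean delete(String path) throws IOException;
}

// The contract every UFS must satisfy, written once against the API.
abstract class AbstractUfsContractTest {
  // Each UFS module overrides this to return a fresh instance under test.
  protected abstract SimpleUfs createUfs() throws IOException;

  void createdFileExistsAndReadsBack() throws IOException {
    SimpleUfs ufs = createUfs();
    byte[] data = {1, 2, 3};
    ufs.create("/contract/file", data);
    check(ufs.exists("/contract/file"), "file should exist after create");
    check(Arrays.equals(data, ufs.read("/contract/file")), "read should return written bytes");
  }

  void deletedFileNoLongerExists() throws IOException {
    SimpleUfs ufs = createUfs();
    ufs.create("/contract/file", new byte[] {42});
    check(ufs.delete("/contract/file"), "delete should report success");
    check(!ufs.exists("/contract/file"), "file should not exist after delete");
  }

  void deletingMissingFileReturnsFalse() throws IOException {
    check(!createUfs().delete("/contract/missing"), "deleting a missing file should return false");
  }

  // Runs every contract check and returns how many were executed.
  int runAll() {
    try {
      createdFileExistsAndReadsBack();
      deletedFileNoLongerExists();
      deletingMissingFileReturnsFalse();
      return 3;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }
}

// Hypothetical in-memory UFS, present only so the sketch can run.
class InMemoryUfs implements SimpleUfs {
  private final Map<String, byte[]> files = new HashMap<>();

  public void create(String path, byte[] data) {
    files.put(path, data.clone());
  }

  public byte[] read(String path) throws IOException {
    byte[] data = files.get(path);
    if (data == null) {
      throw new FileNotFoundException(path);
    }
    return data.clone();
  }

  public boolean exists(String path) {
    return files.containsKey(path);
  }

  public boolean delete(String path) {
    return files.remove(path) != null;
  }
}

// What a per-UFS test would look like: it only wires in the implementation.
class InMemoryUfsContractTest extends AbstractUfsContractTest {
  @Override
  protected SimpleUfs createUfs() {
    return new InMemoryUfs();
  }
}
```

A test for a real UFS would follow the shape of InMemoryUfsContractTest, with createUfs() returning an instance connected to a real endpoint rather than an emulator, in line with the philosophy stated above.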

Pros:

  • Integration tests will be Alluxio-only tests, which are easier to reason about and hopefully less flaky, since UFS-specific implementation details no longer affect them.

  • Easier dependency management and easier maintenance compared to the current state.

Cons:

  • Loses test coverage to a certain degree, e.g. the workloads of Alluxio integration tests on top of a UFS; we need to think through whether autobots tests can cover them. During release tests, we also run the GCS, OSS, and Swift integration tests. This approach will result in a loss of confidence in integration workloads on these UFSes.

Environment

None

Activity

Bin Fan
November 13, 2017, 7:34 AM

Resolved by adding UFS contract tests

Fixed

Assignee

Bin Fan

Reporter

Bin Fan

Priority

Major