Fault tolerance integration tests leak threads

Description

Killing the Alluxio test cluster to test fault tolerance leaks many threads and destabilizes the build. We need to either kill the cluster in a way that also stops all of its threads, or find a different way to test fault tolerance.

The specific culprits are MasterFaultToleranceIntegrationTest and JournalIntegrationTest. They have been @Ignored until this ticket is resolved.

To reproduce, add the following static initializer to a class that the test uses. It logs the count and names of all live threads once per second.

    // Requires java.util.ArrayList, Collections, Comparator and List.
    // CommonUtils and LOG are assumed to be available in the enclosing class.
    static {
      new Thread(new Runnable() {
        @Override
        public void run() {
          while (true) {
            CommonUtils.sleepMs(1000);
            // Snapshot all live threads and sort them by name for stable output.
            List<Thread> threads = new ArrayList<>(Thread.getAllStackTraces().keySet());
            Collections.sort(threads, new Comparator<Thread>() {
              @Override
              public int compare(Thread o1, Thread o2) {
                return o1.getName().compareTo(o2.getName());
              }
            });
            LOG.warn("-------------------------Threads--------------------------");
            LOG.warn("Count: " + threads.size());
            for (Thread t : threads) {
              LOG.warn(t.getName());
            }
            LOG.warn("-------------------------End Threads--------------------------");
          }
        }
      }).start();
    }
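For a more targeted check than a periodic log, the same Thread.getAllStackTraces() call can be used to compare the set of live threads before and after a test step. The following is only a minimal sketch under stated assumptions: ThreadLeakCheck, liveThreadNames and leakedSince are hypothetical names that do not exist in Alluxio, and the "simulated-leak" thread merely stands in for a cluster thread that survives shutdown. Threads are compared by name, so two threads sharing a name would not be told apart.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

/** Hypothetical helper (not part of Alluxio): reports threads that outlive a test step. */
final class ThreadLeakCheck {
  private ThreadLeakCheck() {}

  /** Returns the sorted names of all threads that are currently alive. */
  static Set<String> liveThreadNames() {
    Set<String> names = new TreeSet<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive()) {
        names.add(t.getName());
      }
    }
    return names;
  }

  /** Returns the names of live threads that were absent from the earlier snapshot. */
  static Set<String> leakedSince(Set<String> before) {
    Set<String> leaked = liveThreadNames();
    leaked.removeAll(before);
    return leaked;
  }

  public static void main(String[] args) throws InterruptedException {
    Set<String> before = liveThreadNames();

    // Stand-in for a cluster thread that is never shut down.
    CountDownLatch release = new CountDownLatch(1);
    Thread leak = new Thread(() -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "simulated-leak");
    leak.setDaemon(true);
    leak.start();

    // While the thread is blocked, it shows up in the difference.
    System.out.println("Leaked: " + leakedSince(before));

    // Once the thread is released and joined, it no longer shows up.
    release.countDown();
    leak.join();
    System.out.println("Leaked after cleanup: " + leakedSince(before));
  }
}
```

In a test, one might take the snapshot before starting the cluster and call leakedSince after killing it. The JVM can start housekeeping threads of its own at any time, so an unexpected name in the result is a lead to investigate, not proof of a leak.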

Environment

None

Status

Assignee

Andrew Audibert

Reporter

Andrew Audibert

Labels

Components

Affects versions

1.2.0

Priority

Critical