ALLUXIO-3319

CLI hang in HDFS shutdown hook

Description

I can reproduce this consistently by running:

hdfs dfs -mkdir alluxio://zk@@master1:2181;master2:2181/test

After launching that command, the shell freezes for over a minute and does not even respond to Ctrl-C. Judging from the jstack output, the metrics heartbeat is to blame: the shutdown hook blocks in AbstractClient.close while waiting for the MetricsMasterClient lock. We need to set a short time limit on the final heartbeat (e.g. 500 ms), and ideally also make it interruptible.

"Thread-9" #24 prio=5 os_prio=0 tid=0x00007fe5001e4000 nid=0xf1e waiting for monitor entry [0x00007fe4f5938000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at alluxio.AbstractClient.close(AbstractClient.java:268)
	- waiting to lock <0x00000000ee25a3c0> (a alluxio.client.metrics.MetricsMasterClient)
	at alluxio.client.file.FileSystemContext.closeInternal(FileSystemContext.java:285)
	- locked <0x00000000ee257c40> (a alluxio.client.file.FileSystemContext)
	at alluxio.client.file.FileSystemContext.close(FileSystemContext.java:266)
	at alluxio.hadoop.AbstractFileSystem.close(AbstractFileSystem.java:151)
	at alluxio.hadoop.FileSystem.close(FileSystem.java:27)
	at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2913)
	- locked <0x00000000f3001718> (a org.apache.hadoop.fs.FileSystem$Cache)
	at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2930)
	- locked <0x00000000f30cad50> (a org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
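
For discussion, here is a minimal sketch of what a time-limited, interruptible final heartbeat might look like. It is not Alluxio code: the class BoundedFinalHeartbeat and the method runWithTimeout are hypothetical names, and the Runnable stands in for whatever the metrics client does on its last heartbeat. The sketch assumes the final heartbeat can be handed to a separate daemon thread, so the shutdown path waits at most the given timeout and then moves on.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Alluxio.
final class BoundedFinalHeartbeat {
  private BoundedFinalHeartbeat() {}

  /**
   * Runs the given heartbeat on a daemon thread and waits at most timeoutMs.
   *
   * @return true if the heartbeat completed normally within the limit
   */
  static boolean runWithTimeout(Runnable heartbeat, long timeoutMs) {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "final-heartbeat");
      t.setDaemon(true); // a stuck heartbeat must never keep the JVM alive
      return t;
    });
    Future<?> future = executor.submit(heartbeat);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupts the heartbeat thread if it is still running
      return false;
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return false;
    } catch (ExecutionException e) {
      return false; // the heartbeat itself failed; shutdown should still proceed
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Simulated slow heartbeat standing in for a hung metrics RPC.
    Runnable slow = () -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    boolean finished = runWithTimeout(slow, 500);
    System.out.println("final heartbeat finished in time: " + finished);
  }
}
```

Note that interruption only helps if the heartbeat is blocked in an interruptible call. A thread waiting for monitor entry, as in the trace above, cannot be interrupted, which is why the sketch relies on the timeout plus a daemon thread rather than on cancellation alone.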

Environment: None
Status:
Assignee: Unassigned
Reporter: Andrew Audibert
Labels: None
Components:
Affects versions: 1.8.0
Priority: Blocker