Open issues

CLI hang in HDFS shutdown hook
ALLUXIO-3319
InodeDirectory consumes too much memory
ALLUXIO-3314
Channel#writeAndFlush gets stuck in AbstractReadHandler
ALLUXIO-3249
Add support for S3 buckets that require SSE-KMS encryption
ALLUXIO-3228
Netty timeout exception in client
ALLUXIO-3199
Mesos integration does not propagate Alluxio site properties
ALLUXIO-3153
Master cannot be switched a second time
ALLUXIO-3047
File content inconsistency between Alluxio and HDFS
ALLUXIO-2636
EMR Bootstrap integration script fixes.
ALLUXIO-3400
FileSystemContext.acquireMasterClient() causes an NPE when calling FileSystem.create()
ALLUXIO-3399
SETTTL does not work with two storage tiers
ALLUXIO-3398
Add embedded journal
ALLUXIO-3392
Some Alluxio web UI links are not documented.
ALLUXIO-3389
Some links in the Alluxio documentation are broken
ALLUXIO-3387
Use System.err instead of System.out to log errors from the CLI
ALLUXIO-3386
Log on master when secondaries go down
ALLUXIO-3385
In some cases, file deletion errors are thrown when creating a file
ALLUXIO-3382
Full scan for active UFS sync times out on the client with invalid feedback
ALLUXIO-3381
In some cases, the master always outputs an error log.
ALLUXIO-3380
Remove unused Session code in the worker
ALLUXIO-3379
Improve client metrics heartbeat
ALLUXIO-3377
"bin/alluxio fs load" can load data in a distributed way for better performance
ALLUXIO-3374
Unmounting a mount point fails to remove the sync points contained in it.
ALLUXIO-3372
chmod -R does not recursively change the mode
ALLUXIO-3371
Allow specifying read and write types in "cp" and "mv"
ALLUXIO-3369
Inotify-based Active UFS Sync
ALLUXIO-3367
Users can change file permissions with setfacl even if they are not the owner of the file
ALLUXIO-3365
Alluxio worker throws "Failed to find any Kerberos tgt" in a Kerberos Hadoop environment
ALLUXIO-3360
When a dir is created concurrently, DefaultFileSystemMaster#createDirectory throws FileDoesNotExistException.
ALLUXIO-3359
bin/alluxio fs persist hangs after failing
ALLUXIO-3357
Log web exceptions and errors instead of letting the whole page crash
ALLUXIO-3354
Clients should refresh cluster defaults after disconnecting from the master
ALLUXIO-3351
Migrating the master node causes ZooKeeper ZNode leakage
ALLUXIO-3350
Integrate the new under storage based on Qiniu Kodo into Alluxio
ALLUXIO-3347
Alluxio integration with MapD
ALLUXIO-3337
Add access from the R language
ALLUXIO-3336
Support fixed test ports in local alluxio cluster
ALLUXIO-3332
Config check for site property file
ALLUXIO-3331
Upgrade Guava dependency and shade Guava in client uber jar
ALLUXIO-3330
Improve default ramdisk mounting behavior
ALLUXIO-3329
Block read lock is not released
ALLUXIO-3328
Spark process reports "Path is no longer valid, possibly due to a concurrent delete."
ALLUXIO-3326
Investigate libfuse 3.X writeback_cache option which can significantly improve performance.
ALLUXIO-3324
Alluxio uses com.google.guava:guava:14.0.1 which has security vulnerability
ALLUXIO-3322
Add S3 proxy range reading
ALLUXIO-3321
Retry reading S3 files when facing Amazon connection reset issue.
ALLUXIO-3320
Alluxio-Mesos: Journal gets formatted on every Alluxio restart
ALLUXIO-3315
alluxio-checker module failed to build with java 9
ALLUXIO-3313
REST api create does not actually create the file in the UFS
ALLUXIO-3308
CreateFile does not propagate the correct inherited ACL mask to the UFS (hdfs)
ALLUXIO-3307

CLI hang in HDFS shutdown hook

Description

I can reproduce this consistently by running:

    hdfs dfs -mkdir alluxio://zk@@master1:2181;master2:2181/test

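After launching that command, the shell freezes for over a minute and does not respond even to Ctrl-C. The jstack output below suggests the metrics heartbeat is to blame: the shutdown-hook thread holds the FileSystemContext lock and is BLOCKED in AbstractClient.close(), waiting for the MetricsMasterClient monitor, which the final metrics heartbeat appears to hold. We need to put a short time limit on the final heartbeat (e.g. 500 ms) and, ideally, also make it interruptible.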

    "Thread-9" #24 prio=5 os_prio=0 tid=0x00007fe5001e4000 nid=0xf1e waiting for monitor entry [0x00007fe4f5938000]
       java.lang.Thread.State: BLOCKED (on object monitor)
        at alluxio.AbstractClient.close(AbstractClient.java:268)
        - waiting to lock <0x00000000ee25a3c0> (a alluxio.client.metrics.MetricsMasterClient)
        at alluxio.client.file.FileSystemContext.closeInternal(FileSystemContext.java:285)
        - locked <0x00000000ee257c40> (a alluxio.client.file.FileSystemContext)
        at alluxio.client.file.FileSystemContext.close(FileSystemContext.java:266)
        at alluxio.hadoop.AbstractFileSystem.close(AbstractFileSystem.java:151)
        at alluxio.hadoop.FileSystem.close(FileSystem.java:27)
        at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2913)
        - locked <0x00000000f3001718> (a org.apache.hadoop.fs.FileSystem$Cache)
        at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2930)
        - locked <0x00000000f30cad50> (a org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
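One possible shape for the proposed fix is sketched below in plain Java. It is a minimal illustration under stated assumptions, not Alluxio's actual code: `BoundedHeartbeat`, `runWithTimeout` and the `heartbeat` Runnable are hypothetical names, and the 500 ms limit is the example value from this issue. The final heartbeat runs on its own daemon thread; the closing thread waits at most `timeoutMs` and then cancels the task with an interrupt. An interrupt cannot wake a thread that is BLOCKED on a monitor (as in the trace above), so it is the bounded wait in the caller, not the interrupt, that guarantees the shutdown hook returns.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper: bounds how long close() waits for the final metrics
 * heartbeat. "heartbeat" stands in for the real metrics RPC (an assumption).
 */
public final class BoundedHeartbeat {
  private BoundedHeartbeat() {}

  /** Returns true if the heartbeat finished within timeoutMs, false otherwise. */
  public static boolean runWithTimeout(Runnable heartbeat, long timeoutMs) {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "final-metrics-heartbeat");
      t.setDaemon(true); // a stuck heartbeat must not keep the JVM alive
      return t;
    });
    Future<?> future = executor.submit(heartbeat);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS); // bounded wait
      return true;
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the heartbeat thread, then give up
      return false;
    } catch (ExecutionException e) {
      return false; // the heartbeat threw; metrics are best-effort at shutdown
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    boolean fast = runWithTimeout(() -> { }, 500);
    long start = System.nanoTime();
    boolean slow = runWithTimeout(() -> {
      try {
        Thread.sleep(60_000); // simulates a heartbeat stuck for a minute
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // woken by cancel(true)
      }
    }, 500);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("fast=" + fast + " slow=" + slow + " elapsedMs=" + elapsedMs);
  }
}
```

Because the worker thread is a daemon, a heartbeat that ignores the interrupt cannot by itself keep the JVM running; the trade-off is that the last batch of client metrics may be dropped.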

Environment

None

Status

Assignee

Unassigned

Reporter

Andrew Audibert

Labels

None

Components

Affects versions

1.8.0

Priority

Blocker