io.netty.util.IllegalReferenceCountException is thrown when files are pinned in Alluxio

Description

This was originally reported in this post: https://groups.google.com/d/topic/alluxio-users/iFDUqriA-Wk/discussion

It looks like others are experiencing the same problem.

We hit the exception below when trying to read pinned files from Alluxio using Spark (a minimal read sketch follows the trace).

17/08/08 13:40:54 WARN scheduler.TaskSetManager: Lost task 610.0 in stage 0.0 (TID 313, node15, executor 17): io.netty.util.IllegalReferenceCountException: refCnt: 0, decrement: 1
at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
at alluxio.client.block.stream.BlockOutStream.releaseCurrentPacket(BlockOutStream.java:262)
at alluxio.client.block.stream.BlockOutStream.cancel(BlockOutStream.java:185)
at alluxio.client.file.FileInStream.closeOrCancelCacheStream(FileInStream.java:388)
at alluxio.client.file.FileInStream.handleCacheStreamException(FileInStream.java:449)
at alluxio.client.file.FileInStream.readInternal(FileInStream.java:235)
at alluxio.client.file.FileInStream.read(FileInStream.java:179)
at alluxio.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:126)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:70)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:120)
at org.apache.hadoop.io.SequenceFile$Reader.readBuffer(SequenceFile.java:2149)
at org.apache.hadoop.io.SequenceFile$Reader.seekToCurrentValue(SequenceFile.java:2215)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2290)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getCurrentValue(SequenceFileRecordReader.java:109)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:84)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:266)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:211)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
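
For reference, here is a minimal sketch of the read path that produces this trace. It assumes a pinned SequenceFile of Text key/value pairs at the placeholder path alluxio://master:19998/data/pinned-seq (the real job, schema, and paths differ); the shuffle at the end mirrors the ShuffleMapTask frames above.

import org.apache.hadoop.io.Text
import org.apache.spark.{SparkConf, SparkContext}

object PinnedReadRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pinned-read-repro"))
    // sequenceFile builds a HadoopRDD over SequenceFileInputFormat, so reads go
    // through SequenceFileRecordReader and alluxio.hadoop.HdfsFileInputStream,
    // the same frames as the stack trace above.
    val rdd = sc.sequenceFile("alluxio://master:19998/data/pinned-seq",
      classOf[Text], classOf[Text])
    // Hadoop reuses Writable instances, so copy values before shuffling; the
    // shuffle forces every partition (and hence every Alluxio block) to be read.
    val counts = rdd.map { case (k, v) => (k.toString, v.getLength.toLong) }
      .reduceByKey(_ + _)
    println(counts.count())
    sc.stop()
  }
}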

My Alluxio cluster is heavily loaded, and I think this could be a bug triggered by pinning all the space on a worker and then trying to cache more data to that worker.
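
For context, the files were pinned roughly as sketched below, assuming the Alluxio 1.x client API (FileSystem.Factory and SetAttributeOptions); the path is a placeholder. Pinned blocks are exempt from eviction, so once pinned data fills a worker's storage a later attempt to cache another block there has nothing it can evict, which appears to be what leads to the cancelled cache stream and the double release of the packet buffer seen in the trace.

import alluxio.AlluxioURI
import alluxio.client.file.FileSystem
import alluxio.client.file.options.SetAttributeOptions

object PinFiles {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.Factory.get()
    // Pinning marks the file's blocks as non-evictable in Alluxio worker storage.
    fs.setAttribute(new AlluxioURI("/data/pinned-seq"),
      SetAttributeOptions.defaults().setPinned(true))
  }
}

The same effect can be had from the Alluxio shell with the pin command (e.g. bin/alluxio fs pin /data/pinned-seq).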

Environment

Alluxio 1.5.0
Spark 2.1.0
Hadoop 2.6.0

Status

Assignee

Calvin Jia

Reporter

Pradeep Chanumolu

Labels

Components

Fix versions

Affects versions

1.5.0

Priority

Critical