I'm evaluating Alluxio for a use case where I need to access files stored in S3 as if they were located in the local file system (read-only workload so far). Currently S3FS is serving this purpose fairly well, but it has only limited caching support. Most of the time Alluxio with FUSE works fine, but when it hits certain files it consistently produces IO errors in the client which show up as exceptions in fuse.log. Trying to access the same files with the alluxio fs command reproduces the problem, for example:
The files in question can be read successfully with S3FS and can be downloaded from the AWS S3 console. Most other files are read successfully, but running into this problem keeps us from being able to use Alluxio on our whole data set.
Attached is a portion of fuse.log showing the above block errors.
I’m only reading from Alluxio. The files are written directly to S3 and later read through Alluxio.
I haven’t done anything with WRITE_TYPE (not even familiar with the setting).
Could this be the problem?
Maybe relevant: I’m running Alluxio inside a Docker container with the following alluxio-site.properties:
Did you mount the S3 bucket into an Alluxio folder in read-only mode?
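For reference, a read-only mount looks roughly like this (the mount path, bucket name, and credential placeholders below are examples, not from your setup, and the exact credential property keys vary by Alluxio version):

```shell
# Mount an S3 bucket into the Alluxio namespace in read-only mode.
# Paths, bucket name, and credentials are placeholders.
bin/alluxio fs mount --readonly \
  --option aws.accessKeyId=<ACCESS_KEY> \
  --option aws.secretKey=<SECRET_KEY> \
  /s3-data s3://my-bucket/data
```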
If a file is written through Alluxio to S3, Alluxio has the metadata for that file; if the file is not written through Alluxio, Alluxio will not have metadata for it.
When you use alluxio fs ls, Alluxio fetches the metadata from S3, but since ls is a metadata-only operation, the file is not cached in Alluxio, so no used bytes appear.
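To illustrate the difference (the path below is a made-up example): ls only syncs metadata from S3, while actually reading the file pulls its blocks through a worker, so only the read can show up in Alluxio's used bytes.

```shell
# Metadata-only: syncs file info from S3, caches nothing.
bin/alluxio fs ls /s3-data/file.bin

# Data read: streams the blocks through an Alluxio worker,
# which is what can populate the cache / used bytes.
bin/alluxio fs cat /s3-data/file.bin > /dev/null
```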
Please let us know if any information about S3 read-only use cases is missing. Thanks.
In addition, it would be faster to post these kinds of issues to the Alluxio user mailing list first. All of our developers will be able to see your question there, so we can reply and help resolve the issue more quickly. Thanks.
I’ve found the cause of the problem. I was running Alluxio in a Docker container but didn’t set Docker’s --shm-size parameter to match the value I had set for the worker memory size. This caused the worker to fail at startup, which in turn caused all accesses to fail with block errors.
I have another issue to try to solve, but will bring it up on the mailing list instead of opening another ticket.
The problem was due to a mismatch between the size of /dev/shm and the configured worker memory size (/dev/shm was Docker’s default 64M, while the worker memory size was set to 1G).
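For anyone hitting the same thing: the fix is to make Docker’s shared-memory size at least as large as the worker memory. A sketch of the worker invocation (image name and memory value are from my setup; adjust to yours):

```shell
# Docker defaults /dev/shm to 64M; raise it to match
# alluxio.worker.memory.size (1G in my alluxio-site.properties).
docker run --shm-size=1G alluxio/alluxio worker
```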