Currently every FileSystemContext sends its own metrics heartbeat to the master. We should aggregate these heartbeats into a single heartbeat to reduce load on both the master and the clients.
Would this problem be solved after we switch to gRPC (since we will have multiplexing on the same connection)?
Also, I think the MetricsSystem cannot be a singleton if we want to have a modular client where you could create instances of Alluxio clients that talk to different Alluxio masters.
The problem will be reduced if gRPC reuses the same connection for the heartbeats, but the extra client heartbeats still put unnecessary stress on both the client and the master.
If we make MetricsSystem client-level instead of global, this ticket becomes less important, though batching metrics heartbeats would still help reduce load on the master.
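To make the batching idea concrete, here is a rough sketch of what a client-level aggregator might look like. It is not Alluxio code: the class name `AggregatedMetricsHeartbeat`, the `register` and `heartbeat` methods, and the `sender` callback (a stand-in for the actual metrics RPC to the master) are all hypothetical. Each context registers a snapshot supplier, and one heartbeat merges every snapshot into a single report.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not Alluxio's API. One instance would exist per
 * client (per master) rather than as a global singleton.
 */
final class AggregatedMetricsHeartbeat {
  // Each registered context supplies a snapshot of its counters on demand.
  private final List<Supplier<Map<String, Long>>> sources =
      new CopyOnWriteArrayList<>();
  // Stand-in for the RPC that reports metrics to the master (assumption).
  private final Consumer<Map<String, Long>> sender;

  AggregatedMetricsHeartbeat(Consumer<Map<String, Long>> sender) {
    this.sender = sender;
  }

  /** Called when a context is created. */
  void register(Supplier<Map<String, Long>> source) {
    sources.add(source);
  }

  /** Called when a context is closed. */
  void unregister(Supplier<Map<String, Long>> source) {
    sources.remove(source);
  }

  /** Sums counters across all contexts and sends exactly one report. */
  Map<String, Long> heartbeat() {
    Map<String, Long> merged = new HashMap<>();
    for (Supplier<Map<String, Long>> source : sources) {
      source.get().forEach((name, value) -> merged.merge(name, value, Long::sum));
    }
    sender.accept(merged);
    return merged;
  }
}
```

A single `ScheduledExecutorService` could call `heartbeat()` at a fixed interval, so the number of heartbeats to the master would no longer grow with the number of contexts. Because the aggregator is a plain instance, clients talking to different masters would each hold their own.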
That makes sense. I think the main overhead is currently the number of connections.