View RPC diagnostics
You can use remote procedure call (RPC) diagnostics for your filesystem configuration to help monitor the health of your cluster. You can do this with the UI or the CLI.
View RPC diagnostics with the UI
The RPC Diagnostics graphs give you insight into RPC call times and the number of events returned for each RPC call.
Go to any of your filesystem configurations in the UI and select the link Advanced Diagnostics to view RPC diagnostics information.
You can monitor historic values for the following RPC metrics in two graphs in the UI:
Graph | Values |
---|---|
RPC call time | Average call time, maximum call time |
Events for each RPC | Average number of events, maximum number of events |
To view values plotted across both graphs for a specific timeframe, select the data duration from the dropdown list in the UI:
- One hour
- One day
- One week
- 30 days
Average values should remain constant. If the graphs show maximum values that vary, it may indicate the Hadoop cluster has issues.
Maximum values are reset every minute.
View RPC diagnostics with the CLI
You can run the following commands to view a diagnostics summary with your shell CLI or the Data Migrator CLI tool:
Shell CLI with API endpoint
To get a diagnostics summary in text format, go to your shell CLI and run the command:
curl http://127.0.0.1:18080/diagnostics/summary.txt
To view diagnostics continually updated, go to your shell CLI and run the command:
watch curl -s http://127.0.0.1:18080/diagnostics/summary.txt
If you're using curl on a TLS connection with basic auth, use https with the --insecure and --user curl options.
For example:
curl https://<host>:<port>/<end_point> --insecure --user <basic-auth-username>
CLI tool
To get a diagnostics summary with the CLI tool, run the command:
status --diagnostics
To view diagnostics continually updated with the CLI tool, run the command:
status --diagnostics --watch
You'll see a line in the summary for the RPC diagnostics similar to this example:
EventsBehind Current/Avg/Max: 0/0/0, RPC Time Avg/Max: 4/54
The current, average, and maximum values for the EventsBehind metric show how far behind Data Migrator is in retrieving events from the NameNode. If these values exceed the number of events on the NameNode, migrations will eventually stop automatically. Before that happens, warnings are displayed in the UI and the CLI to let you know that you need to adjust the maximum number of events.
The average and maximum values for the RPC Time Avg/Max metric show how long RPC calls take.
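If you want to feed these values into a script or an alert, you can pull them out of the summary text. The following is a minimal sketch, not part of Data Migrator: the function name parse_rpc_diagnostics and the output field names are hypothetical, and it assumes the summary line has exactly the format shown in the example above.

```shell
# Hypothetical helper (not a Data Migrator command): reads a diagnostics
# summary on stdin and prints the RPC figures as key=value pairs.
# Assumes the line format "EventsBehind Current/Avg/Max: c/a/m, RPC Time Avg/Max: a/m".
parse_rpc_diagnostics() {
  sed -n -E 's#.*EventsBehind Current/Avg/Max: ([0-9]+)/([0-9]+)/([0-9]+), RPC Time Avg/Max: ([0-9]+)/([0-9]+).*#events_behind_current=\1 events_behind_avg=\2 events_behind_max=\3 rpc_time_avg=\4 rpc_time_max=\5#p'
}

# Usage with the endpoint from this page (needs a running instance):
# curl -s http://127.0.0.1:18080/diagnostics/summary.txt | parse_rpc_diagnostics
```

Lines that don't match the expected format produce no output, so an empty result means the summary layout differs from the assumption here.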