Painless visualization of Microsoft/ethr log with ELK

Microsoft/ethr is a simple network performance measurement tool that runs in the terminal. As the name suggests, it is released by Microsoft, and it has been actively maintained by many contributors recently.

image (from the official repository)

I liked it because it is very simple and easy to use (and it's always fun for me to try new tools). Just running the following commands provides the logs and a nice terminal-based UI of per-host network performance measurement.

Server: the TUI appears and ethrs.log is created in the current directory.

$ ethr -s -ui

Client: ethrc.log is created in the current directory.

$ ethr -c <server-ip>

Analyzing the result

It is not realistic to analyze the raw log after a long-running performance measurement, which generates millions of log lines. Fortunately, the log is machine-friendly by default (and inevitably so): each line is a JSON object, formatted like below. So I decided to use Fluentd + Elasticsearch + Kibana as the aggregator / visualizer.

Sample ethrs.log

{"Time":"2019-05-01T13:06:11Z","Type":"TestResult","RemoteAddr":"100.100.103.153","Protocol":"TCP","BitsPerSecond":"9.98M","ConnectionsPerSecond":"","PacketsPerSecond":"","AverageLatency":""}
{"Time":"2019-05-01T13:06:11Z","Type":"TestResult","RemoteAddr":"100.100.103.181","Protocol":"TCP","BitsPerSecond":"384K","ConnectionsPerSecond":"","PacketsPerSecond":"","AverageLatency":""}
{"Time":"2019-05-01T13:06:11Z","Type":"TestResult","RemoteAddr":"100.100.103.146","Protocol":"TCP","BitsPerSecond":"3.20M","ConnectionsPerSecond":"","PacketsPerSecond":"","AverageLatency":""}
{"Time":"2019-05-01T13:06:11Z","Type":"TestResult","RemoteAddr":"[SUM]","Protocol":"TCP","BitsPerSecond":"315.14M","ConnectionsPerSecond":"0","PacketsPerSecond":"0","AverageLatency":""}
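Because the format is JSON-per-line, each line can be consumed with any ordinary JSON tooling. As a quick illustration (my own throwaway sketch, not part of the pipeline), one line from the sample above can be inspected in Python like this:

```python
import json

# One line taken from the sample ethrs.log above
line = ('{"Time":"2019-05-01T13:06:11Z","Type":"TestResult",'
        '"RemoteAddr":"100.100.103.153","Protocol":"TCP",'
        '"BitsPerSecond":"9.98M","ConnectionsPerSecond":"",'
        '"PacketsPerSecond":"","AverageLatency":""}')

record = json.loads(line)
print(record["RemoteAddr"], record["Protocol"], record["BitsPerSecond"])

# The "[SUM]" rows aggregate over all clients; per-host analysis skips them
is_sum = record["RemoteAddr"] == "[SUM]"
```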

In this post, I'll explain how we can visualize the BitsPerSecond field and make it analyzable on Kibana, like this:

image

Setting up environments

I won't explain how to install the software, as there is well-organized documentation and there are many blog posts for each. Instead, I'll share some ethr-specific tips that I picked up during my struggles with the setup.

For simplicity, all the software, i.e. the ethr server, Fluentd (td-agent), Elasticsearch, and Kibana, is assumed to be installed on the same host.

Handling timestamp in Fluentd

At first, I set time_key in the <source> section to Time, the key ethrs.log has in each line, but it did not work: the parsed log in /var/log/td-agent/td-agent.log showed "1970-01-01 00:00:00".

This was strange because the time format looked standard (ISO 8601 compliant), and td-agent recognized the key successfully (the original key/value pair is omitted from td-agent.log when time parsing succeeds).

Sample /etc/td-agent/td-agent.conf

<source>
  @type    tail
  tag      ethrs
  format   json
  path     /var/log/ethrs/ethrs.log
  pos_file /var/log/ethrs/ethrs.log.pos
  time_key Time # <= THIS DIDN'T WORK :(
</source>

<match ethrs>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    @type elasticsearch
    host localhost
    port 9200
    index_name td.${tag}
    type_name td.${tag}
  </store>
</match>

* Continuous logging (e.g. log rotation) is not considered in this configuration.
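For reference, my (untested) guess is that the JSON parser simply needed an explicit time_format alongside time_key; treat the following as a sketch rather than a verified fix:

```
<source>
  @type    tail
  tag      ethrs
  format   json
  path     /var/log/ethrs/ethrs.log
  pos_file /var/log/ethrs/ethrs.log.pos
  time_key    Time
  time_format %Y-%m-%dT%H:%M:%S%z # <= untested guess
</source>
```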

Maybe digging into the Fluentd documentation would solve the problem, but as the title of this post says, I'd like to visualize the logs as painlessly as possible, so I decided not to let td-agent parse the timestamp. I knew it could easily be handled by Kibana: just select "Time" when creating the Index Pattern on the Kibana GUI and it's done ;)

image

Converting units

The value in the "BitsPerSecond" field is a human-friendly representation of bits/sec with a unit suffix. The values vary like "43.65M", "384K", and "0" (I haven't seen a non-zero value without "M" or "K", but there might be one). Because of this, I needed to convert the values into a single unit.

There were two options to address this problem:

  • Edit the source code and rebuild ethr
    In ethr's source code, the value seems to be converted with numberToUnit(), and this behavior is currently hard-coded.
    Removing the conversion from the source would not be much of a burden, but it is bad practice from a maintainability point of view. Sending a pull request that adds a new command-line option to align the units might be a better way, but it is much more tiresome.

  • Create a Scripted Field in Kibana
    A Scripted Field is a feature that adds custom fields to query results (not to the records themselves) via scripting. The supported script languages are "Painless" (the default) and "Lucene expressions".

In the end I chose the latter option, as it seemed more fun and likely to help in other cases in the future as long as I keep using Kibana. Ironically, although the DSL used in Scripted Fields is named "Painless", this was actually the most painful part...

Creating Scripted Field

To create a new Scripted Field, go to Management > Index Patterns > <index name> > Scripted fields. The new field I created was bps; since it holds a numeric value, "Type" and "Format" should be "Number".

Painless is written in a Java-like syntax. What the script below does is:

  • parse the "BitsPerSecond" field as raw (return 0 if the field is missing)
  • strip the unit string ("K", "M", "G") from raw and multiply by the corresponding factor
  • return the result as a Float value

String raw = doc['BitsPerSecond.keyword'].value;
if (raw == null) {
    return 0;
}

if (raw.contains('K')) {
    String num = raw.substring(0, raw.indexOf('K'));
    return Float.parseFloat(num) * 1024;
} else if (raw.contains('M')) {
    String num = raw.substring(0, raw.indexOf('M'));
    return Float.parseFloat(num) * 1024 * 1024;
} else if (raw.contains('G')) {
    String num = raw.substring(0, raw.indexOf('G'));
    return Float.parseFloat(num) * 1024 * 1024 * 1024;
} else {
    return Float.parseFloat(raw);
}

I'm not so familiar with the built-in Java methods, so the script above may be somewhat inefficient. Please let me know if there's a better way.
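Scripted Fields are awkward to debug inside Kibana, so it can help to sanity-check the conversion logic outside it first. Here's a direct Python port of the script above (my own sketch, mirroring the 1024-based multipliers used there):

```python
def bps_to_number(raw):
    """Convert ethr's human-readable BitsPerSecond string
    ("384K", "9.98M", "0", ...) to a plain float, using the same
    1024-based multipliers as the Painless script above."""
    if raw is None:
        return 0.0
    for suffix, factor in (("K", 1024.0), ("M", 1024.0 ** 2), ("G", 1024.0 ** 3)):
        if suffix in raw:
            return float(raw[:raw.index(suffix)]) * factor
    return float(raw)

print(bps_to_number("384K"))  # 393216.0
print(bps_to_number("9.98M"))
```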

Visualizing

I did nothing special to visualize the bps value in Kibana. Here are some small tips for analyzing ethrs.log.

  • Exclude the lines where "RemoteAddr" is "[SUM]" for per-host analysis.
    There are "[SUM]" lines that represent the aggregated measurement over all clients in ethrs.log, but it's annoying to see them in the graph when analyzing on a per-host basis. Putting the following Lucene query in the search bar will work.
NOT (RemoteAddr:"[SUM]") 
  • Use average aggregation for the Y-Axis.
    Plotting per second (= per line) is a kind of disaster for the machine and your browser. Selecting average for the Y-Axis and setting the interval to auto, or to some custom value that is not too small (depending on the time range to analyze), lets you work on the analysis comfortably.

Summing up

  • Microsoft/ethr is very easy to use. It's worth considering if you don't need super-precise measurements.
  • It's still at such an early stage of development that some features, such as an option to control the logging format, are missing.
  • Elasticsearch & Kibana, and especially the Painless scripting language, help customize the log analysis (though with much pain for beginners).

I hope this helps anyone interested in ethr. Happy performance measuring!



2019-05-02