Configuration

The configuration file path is passed as a command line argument. The file contains JSON; see exampleConfig.js
If the file changes while statsd is running, the configuration is reloaded
Metrics are flushed to the backends every 10 seconds by default
The default backend is graphite; multiple backends can be enabled via the config file (See backends in example config file)
The default is to calculate the 90th percentile; this can be configured (See percentThreshold in example config file)
Local flushes of metric data can be enabled (See keyFlush in example config file)
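A minimal sketch of what such a config file might look like. The host name, the exact keyFlush sub-keys, and the percentile list are illustrative; check exampleConfig.js for the authoritative key list. Note that flushInterval is in milliseconds (10000 = the 10-second default):

```json
{
  "graphiteHost": "graphite.example.com",
  "graphitePort": 2003,
  "port": 8125,
  "mgmt_port": 8126,
  "flushInterval": 10000,
  "backends": ["./backends/graphite"],
  "percentThreshold": [90],
  "keyFlush": { "interval": 10, "log": "" }
}
```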

Metrics interface

Port is 8125
Each metric is terminated by a newline; a single message can contain multiple metrics, each of which is treated as an individual metric
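To illustrate the point above, here is a minimal sketch of splitting a multi-metric message into individual metric lines (the packet contents are made up for the example):

```javascript
// A single message can carry several metrics separated by newlines.
// Split such a packet into individual metric lines.
const packet = 'requests:1|c\nresponse_time:320|ms\nqueue_depth:12|g';

const metrics = packet.split('\n').filter((line) => line.length > 0);

// Each element, e.g. 'requests:1|c', is then parsed as its own metric.
metrics.forEach((m) => console.log(m));
```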

Management interface

Port is 8126
Commands are

  • help
  • stats
  • health
  • counters
  • timers
  • gauges
  • delcounters
  • deltimers
  • delgauges
  • quit

Metrics

Counts

Storage

counters, a hash(string, float)

Special counters

packets_received
Incremented for every packet received

statsd.bad_lines_seen
Incremented for every bad (corrupt) metric line received

Client counters

akey:1|c|@0.5
where
akey is the key
1 is the value
c is the metric type
@0.5 is the sample rate (optional)

Processing is
counters[key] += value * (1 / sampleRate)
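A runnable sketch of that update, assuming illustrative names: a counter sampled at 50% (sample rate 0.5) is scaled back up by 1/sampleRate so the flushed total approximates the true count.

```javascript
// Sketch of the counter update: scale the received value by the
// inverse of the sample rate. Names here are illustrative.
const counters = {};

function handleCounter(key, value, sampleRate) {
  sampleRate = sampleRate || 1;
  counters[key] = (counters[key] || 0) + value * (1 / sampleRate);
}

handleCounter('akey', 1, 0.5); // sampled at 50%, so this counts as 2
handleCounter('akey', 1, 0.5); // counters.akey is now 4
```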

Gauges

Storage

gauges, a hash(string, float)

Special gauge

statsd.timestamp_lag
This is calculated by statsd as current_timestamp - previous_flush_timestamp - flushInterval # All in seconds
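A small worked example of that formula, with made-up timestamps: it measures how far the current flush drifted past its scheduled time.

```javascript
// Illustrative timestamp_lag calculation (all values in seconds).
const flushIntervalSeconds = 10;
const previousFlushTimestamp = 1390737296;
const currentTimestamp = 1390737306.5; // flush fired half a second late

const timestampLag =
  currentTimestamp - previousFlushTimestamp - flushIntervalSeconds;
// timestampLag === 0.5
```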

Client gauges

akey:1|g
where
akey is the key
1 is the value; a leading + or - makes it a delta rather than an absolute value
g is the metric type

Processing is
gauges[key] = value   # no +/- prefix
gauges[key] += value  # +/- prefix present
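The two cases above can be sketched as a single handler; the function and key names are illustrative:

```javascript
// Sketch of gauge handling: a bare value replaces the gauge, while a
// value carrying an explicit '+' or '-' sign adjusts it.
const gauges = {};

function handleGauge(key, rawValue) {
  const isDelta = rawValue[0] === '+' || rawValue[0] === '-';
  const value = parseFloat(rawValue);
  if (isDelta) {
    gauges[key] = (gauges[key] || 0) + value;
  } else {
    gauges[key] = value;
  }
}

handleGauge('akey', '100'); // gauges.akey = 100
handleGauge('akey', '-5');  // gauges.akey = 95
handleGauge('akey', '+10'); // gauges.akey = 105
```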

Timers

Storage

timers, a hash(string, float[])
timer_counters, a hash(string, float)

Client timers

akey:1|ms|@0.5
where
akey is the key
1 is the value
ms is the metric type
@0.5 is the sample rate (optional)

Processing is
timers[key].push(value)
timer_counters[key] += (1 / sampleRate)
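A sketch of that recording step, with illustrative names: every received value is stored for later aggregation, while the sample-rate-adjusted count is tracked separately.

```javascript
// Sketch of timer recording: store each value, and track the
// sample-rate-adjusted count in a parallel hash.
const timers = {};
const timer_counters = {};

function handleTimer(key, value, sampleRate) {
  sampleRate = sampleRate || 1;
  (timers[key] = timers[key] || []).push(value);
  timer_counters[key] = (timer_counters[key] || 0) + 1 / sampleRate;
}

handleTimer('akey', 320, 0.5); // one stored value, but counted as 2
```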

Sets

Storage

sets, a hash(string, set)
set is a custom type that stores only distinct values; inserting a value that is already present has no effect

Client sets

akey:1|s
where
akey is the key
1 is the value
s is the metric type

Processing is
sets[key].insert(value)
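A sketch of the same behaviour using JavaScript's built-in Set, which likewise keeps only one copy of each distinct value; the names are illustrative:

```javascript
// Sketch of set handling: duplicates are stored only once.
const sets = {};

function handleSet(key, value) {
  (sets[key] = sets[key] || new Set()).add(value);
}

handleSet('akey', 'user1');
handleSet('akey', 'user2');
handleSet('akey', 'user1'); // duplicate, stored only once
// sets.akey.size === 2
```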

Process metrics function

See lib/process_metrics.js

Altered values

metrics.timers[key].sort() // Sort each key's values

New values

counter_rates = metrics.counters.map(value => value / (flushInterval / 1000)) // Re-calculate "per second" rate for each key's value
timer_data = metrics.timers.map(values =>            // NOTE - the raw values themselves are not included
  {
    count = timer_counters[key]                      // Count that accounts for sample rate
    lower = min value
    upper = max value
    sum = sum of values
    mean = sum / count
    median = see below
    stdev = see below
    count_ps = timer_counters[key] / (flushInterval / 1000)  // Re-calculate "per second" count
    // Each percentile adds the following entries - not a collection - see below.
    // Negative percentiles are also possible; they are not covered here.
    mean_PERCENTILE = sum up to percentile threshold index / percentile threshold index
    upper_PERCENTILE = value at percentile threshold index
    sum_PERCENTILE = sum up to percentile threshold index
  })
histogram = ignored for now
statsd_metrics = single processing_time attribute
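The per-key timer aggregation above can be sketched as a runnable function for a single percentile. This is an illustrative reconstruction, not the exact statsd code; the input values are taken from the Prod.TheApp.Program.Work lines in the graphite sample further down, and negative percentiles are not handled here.

```javascript
// Sketch of per-key timer aggregation for one percentile.
// flushInterval is in milliseconds; timerCount is the
// sample-rate-adjusted count from timer_counters.
function computeTimerData(values, timerCount, flushInterval, percentile) {
  const sorted = values.slice().sort((a, b) => a - b);
  const count = sorted.length;
  const sum = sorted.reduce((a, b) => a + b, 0);

  const data = {
    count: timerCount,
    lower: sorted[0],
    upper: sorted[count - 1],
    sum: sum,
    mean: sum / count,
    count_ps: timerCount / (flushInterval / 1000),
  };

  // Percentile: keep the first thresholdIndex sorted values
  // (round, not floor - see the percentile section below).
  const thresholdIndex = Math.round((Math.abs(percentile) / 100) * count);
  const kept = sorted.slice(0, thresholdIndex);
  const keptSum = kept.reduce((a, b) => a + b, 0);
  data['sum_' + percentile] = keptSum;
  data['upper_' + percentile] = kept[kept.length - 1];
  data['mean_' + percentile] = keptSum / thresholdIndex;

  return data;
}

const d = computeTimerData([2869, 1617, 3487], 3, 10000, 90);
// d.mean === 2657.666..., d.sum_90 === 7973, d.upper_90 === 3487
```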

Median calculation

$values = (5238, 4483, 6084, 5575, 7553, 0.5);
$count = $values.Length;
$sortedValues = $values | sort;
$midPointIndex = [Math]::Floor($values.Length / 2);
$median = if (($count % 2) -eq 1) { $sortedValues[$midPointIndex] } else { (($sortedValues[$midPointIndex - 1] + $sortedValues[$midPointIndex]) / 2) }
$median;

Standard deviation calculation

$values = (4483, 5238, 5575, 6084, 7553);
$count = $values.Length;
$sum = 0; $values | % { $sum += $_; }
$mean = $sum / $count;
$sumOfDiffs = 0; $values | % { $sumOfDiffs += ($_ - $mean) * ($_ - $mean); }
$standardDeviation = [Math]::Sqrt($sumOfDiffs / $count);
$standardDeviation;

Percentile threshold index; note that this uses round rather than round down (floor), unlike other implementations I have seen

$originalValues = (4483, 5238, 5575, 6084, 7553);
$sortedValues = $originalValues | sort;					# Will be using sorted values for calculations
$count = $originalValues.Length;
$percentile = 90;
$thresholdIndex = [Math]::Round(([Math]::Abs($percentile) / 100) * $count);
$thresholdIndex;

Graphite backend

Calls backends/graphite.js flush_stats exposed function

  • This creates a string version of the metrics hash generated by process metrics, adding some extra local processing information
  • It then writes this to the graphite address and port (Default is 2003)

This opens a TCP connection and writes the text described below

In the case of sets, it is the set length that is included in the text sent to graphite

The text written is a multiline block where each line is made up of a key, a value, and a timestamp,
where
key is the key name we assigned, with one of the following prefixes:

  • stats
  • stats_counts
  • stats.timers
  • stats.gauges
  • stats.sets

value is the value we wrote or the value the process_metrics function generated
timestamp is the unix epoch when the flushMetrics function was called
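A sketch of assembling that payload; the function name and the metric keys here are illustrative, with values borrowed from the sample below:

```javascript
// Build the multiline graphite payload: one 'key value timestamp'
// line per metric, newline-terminated.
function buildGraphitePayload(stats, timestamp) {
  return Object.keys(stats)
    .map((key) => key + ' ' + stats[key] + ' ' + timestamp)
    .join('\n') + '\n';
}

const payload = buildGraphitePayload(
  {
    'stats_counts.statsd.packets_received': 18,
    'stats.statsd.packets_received': 1.8,
  },
  1390737306
);
// The payload is then written over the TCP connection to graphite.
```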

Sample

stats_counts.statsd.bad_lines_seen 0 1390737306
stats.statsd.packets_received 1.8 1390737306
stats_counts.statsd.packets_received 18 1390737306
stats.Prod.TheApp.Worker.DoWork.Exiting 0.37037037037037035 1390737306
stats_counts.Prod.TheApp.Worker.DoWork.Exiting 3.7037037037037033 1390737306
stats.Prod.TheApp.Worker.DoWork.Enter 0.3 1390737306
stats_counts.Prod.TheApp.Worker.DoWork.Enter 3 1390737306
stats.timers.Prod.TheApp.TOther.1.mean_90 6078.666666666667 1390737306
stats.timers.Prod.TheApp.TOther.1.upper_90 6908 1390737306
stats.timers.Prod.TheApp.TOther.1.sum_90 18236 1390737306
stats.timers.Prod.TheApp.TOther.1.std 777.9123058260202 1390737306
stats.timers.Prod.TheApp.TOther.1.upper 6908 1390737306
stats.timers.Prod.TheApp.TOther.1.lower 5038 1390737306
stats.timers.Prod.TheApp.TOther.1.count 3 1390737306
stats.timers.Prod.TheApp.TOther.1.count_ps 0.3 1390737306
stats.timers.Prod.TheApp.TOther.1.sum 18236 1390737306
stats.timers.Prod.TheApp.TOther.1.mean 6078.666666666667 1390737306
stats.timers.Prod.TheApp.TOther.1.median 6290 1390737306
stats.timers.Prod.TheApp.Program.Work.mean_90 2657.6666666666665 1390737306
stats.timers.Prod.TheApp.Program.Work.upper_90 3487 1390737306
stats.timers.Prod.TheApp.Program.Work.sum_90 7973 1390737306
stats.timers.Prod.TheApp.Program.Work.std 777.9123058260202 1390737306
stats.timers.Prod.TheApp.Program.Work.upper 3487 1390737306
stats.timers.Prod.TheApp.Program.Work.lower 1617 1390737306
stats.timers.Prod.TheApp.Program.Work.count 3 1390737306
stats.timers.Prod.TheApp.Program.Work.count_ps 0.3 1390737306
stats.timers.Prod.TheApp.Program.Work.sum 7973 1390737306
stats.timers.Prod.TheApp.Program.Work.mean 2657.6666666666665 1390737306
stats.timers.Prod.TheApp.Program.Work.median 2869 1390737306
stats.gauges.Prod.TheApp.Worker.SleepInterval 2868 1390737306
stats.gauges.Prod.TheApp.Worker.SleepInterval.Other 2902 1390737306
statsd.numStats 8 1390737306
stats.statsd.graphiteStats.calculationtime 1 1390737306
stats.statsd.processing_time 1 1390737306