Datagram builder as a C extension #259
Conversation
- …ting ENV var STATSD_EXT_DEBUG
- …ith a static buffer
- …uite with env var GC_STRESS
- …, favoring broken packets over undefined behavior
- …m the C ext only (normalize_tags is a pure function)
- …also allows the normalized tags cache to be decoupled from Ruby land
- … are very careful with offsets, even when preventing overflows
- …zed cache limits should be configurable at runtime, idk?)
- …instead of hashes. Along with that came several requirements, which also illustrate how to implement wrapped structs and plug them cleanly into the GC:
  - The ivar lookups for caches were not efficient as the total ivar count for the builder went up to 4, meaning one of them would incur 2 symbol table lookups for a cached element, as an ivar count over 3 overflows to a symbol table.
  - `struct datagram_builder` now becomes a wrapped struct for tracking state such as the optional normalized cache symbol tables, the datagram encode buffer, the initial offset of the first chunk of that buffer, and a cache of the default tags ivar.
  - `rb_data_type_t datagram_builder_type` and the functions datagram_builder_mark, datagram_builder_free and datagram_builder_size illustrate the callbacks required for proper GC integration of wrapped structs.
  - The primary reason for the wrapped struct is to avoid global state: if multiple instances are ever instantiated from Ruby land, global state can be clobbered.
  - With the struct we are able to pre-seed the initial part of the buffer with the given prefix ivar and track the offset into the buffer for #generate_generic_datagram to pick up from, so we don't have to do the ivar lookup and memcpy for EVERY datagram built.
  - Normalized caches are now backed by symbol tables. I am happy with the normalized name implementation as rb_str_hash is exposed and computes a numeric hash from the String contents. Unfortunately rb_ary_hash and rb_hash_hash are not exposed and rb_hash defaults to a method call instead. I flagged this to revisit.
- …tion call, still has O(n) complexity but cheaper than str hash + symbol table lookup
- …vents a Ruby String allocation per datagram
- …ect dispatch to them from the respective metric helper functions
I think there are two ways to address this:
Yeah that makes sense - like the …
csfrancis left a comment:
This library now becomes dependent on a C compiler and Ruby development headers - we need to think about decoupling so that this is optional.
If we think that’s a big deal, I like the option of a separate gem that depends on this that overwrites the class.
```c
struct datagram_builder *builder = (struct datagram_builder *)ptr;
if (!builder) return;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
st_free_table(builder->normalized_tags_cache);
```
Maybe set these to NULL after freeing?
Yes, agreed - a use-after-free guard.
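Something like the following, as a rough sketch of that guard (the struct members and `#ifdef` flags follow the diff; the surrounding function shape and `xfree` ownership are assumptions):

```c
#include <ruby.h>
#include <ruby/st.h>

/* Sketch of the free callback with a use-after-free guard: NULL each cache
 * pointer right after releasing it so a stray second free or late access
 * fails loudly instead of silently corrupting memory. */
static void datagram_builder_free(void *ptr) {
  struct datagram_builder *builder = (struct datagram_builder *)ptr;
  if (!builder) return;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
  if (builder->normalized_tags_cache) {
    st_free_table(builder->normalized_tags_cache);
    builder->normalized_tags_cache = NULL;
  }
#endif
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  if (builder->normalized_names_cache) {
    st_free_table(builder->normalized_names_cache);
    builder->normalized_names_cache = NULL;
  }
#endif
  xfree(builder);
}
```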
```c
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
size += st_memsize(builder->normalized_names_cache);
#endif
return size;
```
Will this include the sizes of the VALUE members on the builder (assuming the values can’t be embedded)?
My understanding of object memory size reporting (as expected to be returned by ObjectSpace.memsize_of) is that retained memory shouldn't be included. See Class for example
Heap dumps follow that model too, otherwise the dominator tree for harb would not have been needed 😄
st_memsize is just memory consumed by the hash table internals (bins and entries)
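For illustration, a size callback along those lines might look like this - a sketch assuming the struct and cache members match the diff:

```c
#include <ruby.h>
#include <ruby/st.h>

/* Sketch of the dsize callback: count only memory directly owned by the
 * wrapper (the struct plus the st_table internals), not memory retained
 * through VALUE members, matching ObjectSpace.memsize_of semantics. */
static size_t datagram_builder_size(const void *ptr) {
  const struct datagram_builder *builder = (const struct datagram_builder *)ptr;
  size_t size = sizeof(struct datagram_builder);
  if (!builder) return 0;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
  if (builder->normalized_tags_cache) size += st_memsize(builder->normalized_tags_cache);
#endif
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  if (builder->normalized_names_cache) size += st_memsize(builder->normalized_names_cache);
#endif
  /* VALUE members (e.g. the cached default tags) are separate Ruby objects
   * with their own memsize, so they are not added here. */
  return size;
}
```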
```c
normalized_name = normalized_names_cached(builder, self, name);
}
chunk_len = RSTRING_LEN(normalized_name);
if (builder->len + chunk_len > DATAGRAM_SIZE_MAX) goto finalize_datagram;
```
Should we be raising on these conditions?
I thought about it, but then again this is telemetry specific and the default buffer is a generous 4096 bytes per datagram.

The line is somewhere between:

- Raising on a telemetry specific path could bubble up an unhandled exception, impacting business specific paths. I don't think that's a great default.
- If you emit datagrams that large, chances are cardinality is also going to be high (most likely through tags), and a truncated datagram would be the least we could do. Although an exception would probably be deserved.

What's the behaviour if we send a datagram that's too large? Does that depend on the statsd backend?
Currently, we don't know. I assume they will simply not be delivered because they are being dropped by the networking stack as the UDP datagrams get too large. By design, we don't know about this - it's a fire and forget protocol.
What Willem said re: fire and forget. I'm fine with any solution other than raising an error, as telemetry-specific fire-and-forget datagrams shouldn't risk raising unhandled exceptions that could uproot business processes in the host application.
```c
memcpy(builder->datagram + builder->len, StringValuePtr(normalized_name), chunk_len);
builder->len += chunk_len;

if (builder->len + 1 > DATAGRAM_SIZE_MAX) goto finalize_datagram;
```
You could maybe extract this into an inline function.
I was thinking about it too, but the goto stays local because the flow control needs to convert the buffer to a Ruby String and return. A static function would require the length and the builder as arguments too, and wouldn't make a difference to code size.

Thoughts?
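For reference, a hypothetical extracted helper would look roughly like this; the name is made up, and the caller would still own the goto:

```c
/* Hypothetical helper: centralizes only the bound check itself; control flow
 * to finalize the datagram remains at each append site. */
static inline int datagram_would_overflow(const struct datagram_builder *builder, long chunk_len) {
  return builder->len + chunk_len > DATAGRAM_SIZE_MAX;
}

/* At each append site:
 *   if (datagram_would_overflow(builder, chunk_len)) goto finalize_datagram;
 */
```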
Or let overflow checking be a compile time option and allow arbitrary size datagrams if disabled (effectively the current Ruby implementation's behaviour)
I was thinking that this would make more sense if we were raising an exception.
The risk / regression factor in raising exceptions is that we'd go from a loose contract of "allowing any size datagrams to be built" to "raising an exception that can fan out to the whole host application".

Thoughts on a middle ground that does either of the following?

- Implements overflow guards, truncates the datagram and emits a warning with `rb_warn("StatsD datagram truncated to X bytes")`. Easy to provide a reason too, as with the logging pipeline.
- Follows the existing behavior of supporting arbitrary size datagrams: a static buffer that can overflow to a heap allocated buffer, OR move the buffer to the heap. Neither of these is difficult, but both create many more branches and complexity.
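A rough sketch of the first option, with a hypothetical helper name; `rb_warn` is the standard Ruby C API warning function:

```c
#include <ruby.h>

/* Hypothetical helper for option 1: stop appending past the limit, finalize
 * what fits, and surface a Ruby warning instead of an exception. */
static void warn_datagram_truncated(const char *reason) {
  rb_warn("StatsD datagram truncated to %d bytes (%s)", DATAGRAM_SIZE_MAX, reason);
}

/* At an append site:
 *   if (builder->len + chunk_len > DATAGRAM_SIZE_MAX) {
 *     warn_datagram_truncated("metric name too long");
 *     goto finalize_datagram;
 *   }
 */
```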
@wvanbergen objections to spawning
Why?
The datagram builder is heavy on allocation (and reallocation) when coercing both metric names and tags into normalized and valid statsd wire protocol components.
This extension is also implemented as a wrapped struct without any global state as @hkdsun and @bmansoob expressed interest in how to do that.
- A simple struct with mixed native type and `VALUE` (Ruby object reference) members
- A GC mark callback invoked during the GC tracing phase (we need to let the GC know about Ruby object references we hold on to)
- A GC free callback invoked during the sweep phase if this object was deemed to not be referenced by any other object, the stack etc. We free the symbol tables for the caches and the struct itself.
- It's good practice to add an object size accumulator callback to accurately reflect the size of this object via `ObjectSpace.memsize_of`
- A data type declaration defines the shape of the wrapped object and the callbacks for the GC (see the sketch after this list)
- An unboxing function coerces a `VALUE` reference back to a builder struct as per the type definition above
- The allocator function prepares the struct for use by initializing members to their appropriate types, caches conditionally etc. and returns a `VALUE` reference to the allocated struct on the heap
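A minimal sketch of how these pieces can be wired together with Ruby's TypedData API, assuming the `struct datagram_builder` from this PR; the callback bodies, helper names (`datagram_builder_alloc`, `get_builder`) and the exact set of initialized members are assumptions:

```c
#include <ruby.h>
#include <ruby/st.h>

/* Callbacks declared here, defined elsewhere in the extension. */
static void datagram_builder_mark(void *ptr);         /* rb_gc_mark() each VALUE member */
static void datagram_builder_free(void *ptr);         /* free caches and the struct     */
static size_t datagram_builder_size(const void *ptr); /* for ObjectSpace.memsize_of     */

/* Data type declaration: the shape of the wrapped object plus its GC callbacks. */
static const rb_data_type_t datagram_builder_type = {
  .wrap_struct_name = "datagram_builder",
  .function = {
    .dmark = datagram_builder_mark,
    .dfree = datagram_builder_free,
    .dsize = datagram_builder_size,
  },
  .flags = RUBY_TYPED_FREE_IMMEDIATELY,
};

/* Allocator: heap allocate a zeroed struct, conditionally set up the caches,
 * and return the VALUE wrapper that now owns it. */
static VALUE datagram_builder_alloc(VALUE klass) {
  struct datagram_builder *builder;
  VALUE obj = TypedData_Make_Struct(klass, struct datagram_builder,
                                    &datagram_builder_type, builder);
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  builder->normalized_names_cache = st_init_numtable(); /* keyed by rb_str_hash */
#endif
  return obj;
}

/* Unboxing: coerce a VALUE reference back to the builder struct, type checked. */
static struct datagram_builder *get_builder(VALUE self) {
  struct datagram_builder *builder;
  TypedData_Get_Struct(self, struct datagram_builder,
                       &datagram_builder_type, builder);
  return builder;
}
```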
Points of attack

- The prefix is constant, initialized once, yet merged into the output buffer from offset 0 to its length for every builder call. We can instead seed the builder buffer with it and advance the buffer low water mark to the end of the prefix on every call (see the sketch after this list).
- Name normalization is expensive as it involves a regex match for the fast path and an additional function call for the slow path. For the fast path it's more efficient to walk the string from start to end, and for the slow path we implement a bounded size normalized names cache to save on `String#tr` function calls for invalid metric names.
- Tags normalization is expensive, especially with the hybrid `Hash` and `Array` tags API, which allocates additional objects for the `Hash` based API. This method is not optimized in C as it's mostly iterators and method calls, which can become EXTREMELY ugly when rewritten in a C extension. Instead, another bounded size normalized tags cache was introduced to bypass this Ruby method for tags collections with a contents hash we've seen before.
- There's always a new tags Array spawned in the `generate_generic_datagram` method. This is wasteful even for cases which don't have any default tags defined. This was flattened out into a zero alloc, append only pattern instead.
- Sample rate values appended to the buffer are zero alloc for Fixnum and Float types, but there is a fallback path for other object types which would allocate a Ruby String. This pattern likely works well for the `value` argument too, but I would prefer to get runtime feedback first and consider it as a later optimisation.
- Remove a repeated ivar lookup and size check for default tags in favor of caching it on the builder struct on init.
- Reduce method dispatch overhead by calling the builder directly from the various metric helper methods.
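A minimal sketch of the prefix pre-seeding from the first point, using a hypothetical `seed_prefix` helper and an assumed `prefix_len` field (only `datagram` and `len` appear in the diff):

```c
#include <ruby.h>
#include <string.h>

/* Copy the prefix into the buffer once at initialization and remember the
 * offset as the low water mark. */
static void seed_prefix(struct datagram_builder *builder, VALUE prefix) {
  long prefix_len = RSTRING_LEN(prefix);
  if (prefix_len > DATAGRAM_SIZE_MAX) prefix_len = DATAGRAM_SIZE_MAX;
  memcpy(builder->datagram, RSTRING_PTR(prefix), prefix_len);
  builder->prefix_len = (size_t)prefix_len;
}

/* At the start of every datagram build, rewind instead of re-copying:
 *   builder->len = builder->prefix_len;  // prefix bytes are already in place
 */
```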
Configurables

- The buffer size can be changed at compile time
- Normalized caches are compile time features that can easily be disabled or changed
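As a sketch of what those compile-time knobs could look like - the macro names appear in the diff, while the defaulting pattern and the CFLAGS override are assumptions:

```c
/* Compile-time configuration knobs; override via CFLAGS at build time,
 * e.g. -DDATAGRAM_SIZE_MAX=8192 -DNORMALIZED_TAGS_CACHE_ENABLED. */
#ifndef DATAGRAM_SIZE_MAX
#define DATAGRAM_SIZE_MAX 4096 /* bytes per datagram buffer */
#endif

#ifdef NORMALIZED_NAMES_CACHE_ENABLED
/* ... bounded normalized metric name cache ... */
#endif

#ifdef NORMALIZED_TAGS_CACHE_ENABLED
/* ... bounded normalized tags cache ... */
#endif
```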
Resiliency issues covered

- Strict overflow checks that finalize the buffer prior to overflow
- Test suite run with `GC.stress = true`
Wrapped builder struct layout

The struct fits within 1 cache line (46 of 64 bytes) with the buffer as the last member. 4kb is very generous, but in reality the vast majority of statsd datagrams would only reach into 1 or 2 more cache lines. The struct is passed around by reference as it's heap allocated anyway, and few things are as bad as large structs passed by value.
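For illustration only, a layout consistent with that description might look like the following; `datagram` and `len` appear in the diff, while the remaining member names, ordering, and exact field set (the 46 bytes above) are assumptions:

```c
#include <ruby.h>
#include <ruby/st.h>

/* Bookkeeping fields sit in the first cache line; the 4096-byte buffer is
 * deliberately kept as the last member. */
struct datagram_builder {
  VALUE default_tags;                 /* cached default tags ivar                */
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  st_table *normalized_names_cache;   /* bounded metric name cache               */
#endif
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
  st_table *normalized_tags_cache;    /* bounded tags cache                      */
#endif
  size_t prefix_len;                  /* low water mark after the seeded prefix  */
  size_t len;                         /* current write offset into the buffer    */
  char datagram[DATAGRAM_SIZE_MAX];   /* encode buffer, last member              */
};
```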
Future TODO

- `value` argument

Downsides

Other ideas

@csfrancis @wvanbergen