Datagram builder as a C extension #259
Conversation
- …ting ENV var STATSD_EXT_DEBUG
- …ith a static buffer
- …uite with env var GC_STRESS
- …, favoring broken packets over undefined behavior
- …m the C ext only (normalize_tags is a pure function)
- …also allows the normalized tags cache to be decoupled from Ruby land
- … are very careful with offsets, even when preventing overflows
- …zed cache limits should be configurable at runtime, idk?)
- …instead of hashes. Along with that came several requirements, which also illustrate how to implement wrapped structs and plug them cleanly into the GC:
  - The ivar lookups for caches were not efficient as the total ivar count for the builder went up to 4, meaning one of them would incur 2 symbol table lookups for a cached element, as an ivar count over 3 overflows to a symbol table.
  - `struct datagram_builder` now becomes a wrapped struct for tracking state such as the optional normalized cache symbol tables, the datagram encode buffer, the initial offset of the first chunk of that buffer, and a cache of the default tags ivar.
  - `rb_data_type_t datagram_builder_type` and the functions datagram_builder_mark, datagram_builder_free and datagram_builder_size illustrate the callbacks required for proper GC integration of wrapped structs.
  - The primary reason for the wrapped struct is to avoid global state: if multiple instances are ever instantiated from Ruby land, global state can be clobbered.
  - With the struct we are able to pre-seed the initial part of the buffer with the given prefix ivar and track the offset into the buffer for #generate_generic_datagram to pick up from, so we don't have to do the ivar lookup and memcpy for EVERY datagram built.
  - Normalized caches are now backed by symbol tables. I am happy with the normalized name implementation as rb_str_hash is exposed and computes a numeric hash from the String contents. Unfortunately rb_ary_hash and rb_hash_hash are not exposed and rb_hash defaults to a method call instead. I flagged this to revisit.
- …tion call, still has O(n) complexity but cheaper than str hash + symbol table lookup
- …vents a Ruby String allocation per datagram
- …ect dispatch to them from the respective metric helper functions
I think there are two ways to address this:
Yeah that makes sense - like the …
csfrancis left a comment:
This library now becomes dependent on a C compiler and Ruby development headers - we need to think about decoupling so that this is optional.
If we think that’s a big deal, I like the option of a separate gem that depends on this that overwrites the class.
```c
struct datagram_builder *builder = (struct datagram_builder *)ptr;
if (!builder) return;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
st_free_table(builder->normalized_tags_cache);
```
Maybe set these to NULL after freeing?
Yes, agreed - a use-after-free guard.
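Something like the following, as a rough sketch of that guard (the struct members and `#ifdef` flags follow the diff; the surrounding function shape and `xfree` ownership are assumptions):

```c
#include <ruby.h>
#include <ruby/st.h>

/* Sketch of the free callback with a use-after-free guard: NULL each cache
 * pointer right after releasing it so a stray second free or late access
 * fails loudly instead of silently corrupting memory. */
static void datagram_builder_free(void *ptr) {
  struct datagram_builder *builder = (struct datagram_builder *)ptr;
  if (!builder) return;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
  if (builder->normalized_tags_cache) {
    st_free_table(builder->normalized_tags_cache);
    builder->normalized_tags_cache = NULL;
  }
#endif
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  if (builder->normalized_names_cache) {
    st_free_table(builder->normalized_names_cache);
    builder->normalized_names_cache = NULL;
  }
#endif
  xfree(builder);
}
```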
```c
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
size += st_memsize(builder->normalized_names_cache);
#endif
return size;
```
Will this include the sizes of the VALUE members on the builder (assuming the values can’t be embedded)?
My understanding of object memory size reporting (as expected to be returned by ObjectSpace.memsize_of) is that retained memory shouldn't be included. See Class for example
Heap dumps follow that model too, otherwise the dominator tree for harb would not have been needed 😄
st_memsize is just memory consumed by the hash table internals (bins and entries)
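For illustration, a size callback along those lines might look like this - a sketch assuming the struct and cache members match the diff:

```c
#include <ruby.h>
#include <ruby/st.h>

/* Sketch of the dsize callback: count only memory directly owned by the
 * wrapper (the struct plus the st_table internals), not memory retained
 * through VALUE members, matching ObjectSpace.memsize_of semantics. */
static size_t datagram_builder_size(const void *ptr) {
  const struct datagram_builder *builder = (const struct datagram_builder *)ptr;
  size_t size = sizeof(struct datagram_builder);
  if (!builder) return 0;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
  if (builder->normalized_tags_cache) size += st_memsize(builder->normalized_tags_cache);
#endif
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  if (builder->normalized_names_cache) size += st_memsize(builder->normalized_names_cache);
#endif
  /* VALUE members (e.g. the cached default tags) are separate Ruby objects
   * with their own memsize, so they are not added here. */
  return size;
}
```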
```c
normalized_name = normalized_names_cached(builder, self, name);
}
chunk_len = RSTRING_LEN(normalized_name);
if (builder->len + chunk_len > DATAGRAM_SIZE_MAX) goto finalize_datagram;
```
Should we be raising on these conditions?
I thought about it, but then again this is telemetry specific and the default buffer is a generous 4096 bytes per datagram.

The line is somewhere between:

- Raising on a telemetry specific path could bubble up an unhandled exception, impacting business specific paths. I don't think that's a great default.
- If you emit datagrams that large, chances are cardinality is also going to be high (most likely through tags), and a truncated datagram would be the least we could do. Although an exception would probably be deserved.

What's the behaviour if we send a datagram that's too large? Does that depend on the statsd backend?
Currently, we don't know. I assume they will simply not be delivered because they are being dropped by the networking stack as the UDP datagrams get too large. By design, we don't know about this - it's a fire and forget protocol.
What Willem said re: fire and forget. I'm fine with any solution other than raising an error, as telemetry-specific fire-and-forget datagrams shouldn't risk raising unhandled exceptions that could uproot business processes in the host application.
```c
memcpy(builder->datagram + builder->len, StringValuePtr(normalized_name), chunk_len);
builder->len += chunk_len;

if (builder->len + 1 > DATAGRAM_SIZE_MAX) goto finalize_datagram;
```
You could maybe extract this into an inline function.
I was thinking about it too, but the goto stays local because the flow control needs to convert the buffer to a Ruby String and return. A static function would require the length and the builder as arguments too, and wouldn't make a difference to code size.

Thoughts?
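For reference, a hypothetical extracted helper would look roughly like this; the name is made up, and the caller would still own the goto:

```c
/* Hypothetical helper: centralizes only the bound check itself; control flow
 * to finalize the datagram remains at each append site. */
static inline int datagram_would_overflow(const struct datagram_builder *builder, long chunk_len) {
  return builder->len + chunk_len > DATAGRAM_SIZE_MAX;
}

/* At each append site:
 *   if (datagram_would_overflow(builder, chunk_len)) goto finalize_datagram;
 */
```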
Or let overflow checking be a compile time option and allow arbitrary size datagrams if disabled (effectively the current Ruby implementation's behaviour)
I was thinking that this would make more sense if we were raising an exception.
The risk / regression factor in raising exceptions is that we'd go from a loose contract of "allowing any size datagrams to be built" to "raising an exception that can fan out to the whole host application".

Thoughts on a middle ground that does either of the following?

- Implements overflow guards, truncates the datagram and emits a warning with `rb_warn("StatsD datagram truncated to X bytes")`. Easy to provide a reason too, as with the logging pipeline.
- Follows the existing behavior of supporting arbitrary size datagrams: a static buffer that can overflow to a heap allocated buffer, OR move the buffer to the heap. Neither of these is difficult, but both create many more branches and complexity.
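A rough sketch of the first option, with a hypothetical helper name; `rb_warn` is the standard Ruby C API warning function:

```c
#include <ruby.h>

/* Hypothetical helper for option 1: stop appending past the limit, finalize
 * what fits, and surface a Ruby warning instead of an exception. */
static void warn_datagram_truncated(const char *reason) {
  rb_warn("StatsD datagram truncated to %d bytes (%s)", DATAGRAM_SIZE_MAX, reason);
}

/* At an append site:
 *   if (builder->len + chunk_len > DATAGRAM_SIZE_MAX) {
 *     warn_datagram_truncated("metric name too long");
 *     goto finalize_datagram;
 *   }
 */
```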
@wvanbergen objections to spawning
Why?
The datagram builder is heavy on allocation (and reallocation) when coercing both metric names and tags into normalized and valid statsd wire protocol components.
This extension is also implemented as a wrapped struct without any global state as @hkdsun and @bmansoob expressed interest in how to do that.
- A simple struct with mixed native type and `VALUE` (Ruby object reference) members
- A GC mark callback invoked during the GC tracing phase (we need to let the GC know about Ruby object references we hold on to)
- A GC free callback invoked during the sweep phase if this object was deemed to not be referenced by any other object, the stack etc. We free the symbol tables for the caches and the struct itself.
- It's good practice to add an object size accumulator callback to accurately reflect the size of this object via `ObjectSpace.memsize_of`
- A data type declaration defines the shape of the wrapped object and the callbacks for the GC (see the sketch after this list)
- An unboxing function coerces a `VALUE` reference back to a builder struct as per the type definition above
- The allocator function prepares the struct for use by initializing members to their appropriate types, caches conditionally etc. and returns a `VALUE` reference to the allocated struct on the heap
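A minimal sketch of how these pieces can be wired together with Ruby's TypedData API, assuming the `struct datagram_builder` from this PR; the callback bodies, helper names (`datagram_builder_alloc`, `get_builder`) and the exact set of initialized members are assumptions:

```c
#include <ruby.h>
#include <ruby/st.h>

/* Callbacks declared here, defined elsewhere in the extension. */
static void datagram_builder_mark(void *ptr);         /* rb_gc_mark() each VALUE member */
static void datagram_builder_free(void *ptr);         /* free caches and the struct     */
static size_t datagram_builder_size(const void *ptr); /* for ObjectSpace.memsize_of     */

/* Data type declaration: the shape of the wrapped object plus its GC callbacks. */
static const rb_data_type_t datagram_builder_type = {
  .wrap_struct_name = "datagram_builder",
  .function = {
    .dmark = datagram_builder_mark,
    .dfree = datagram_builder_free,
    .dsize = datagram_builder_size,
  },
  .flags = RUBY_TYPED_FREE_IMMEDIATELY,
};

/* Allocator: heap allocate a zeroed struct, conditionally set up the caches,
 * and return the VALUE wrapper that now owns it. */
static VALUE datagram_builder_alloc(VALUE klass) {
  struct datagram_builder *builder;
  VALUE obj = TypedData_Make_Struct(klass, struct datagram_builder,
                                    &datagram_builder_type, builder);
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  builder->normalized_names_cache = st_init_numtable(); /* keyed by rb_str_hash */
#endif
  return obj;
}

/* Unboxing: coerce a VALUE reference back to the builder struct, type checked. */
static struct datagram_builder *get_builder(VALUE self) {
  struct datagram_builder *builder;
  TypedData_Get_Struct(self, struct datagram_builder,
                       &datagram_builder_type, builder);
  return builder;
}
```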
Points of attack

- The prefix is constant, initialized once, yet merged into the output buffer from offset 0 to its length for every builder call. We can instead seed the builder buffer with it and advance the buffer low water mark to the end of the prefix on every call (see the sketch after this list).
- Name normalization is expensive as it involves a regex match for the fast path and an additional function call for the slow path. For the fast path it's more efficient to walk the string from start to end, and for the slow path we implement a bounded size normalized names cache to save on `String#tr` function calls for invalid metric names.
- Tags normalization is expensive, especially with the hybrid `Hash` and `Array` tags API, which allocates additional objects for the `Hash` based API. This method is not optimized in C as it's mostly iterators and method calls, which can become EXTREMELY ugly when rewritten in a C extension. Instead, another bounded size normalized tags cache was introduced to bypass this Ruby method for tags collections with a contents hash we've seen before.
- There's always a new tags Array spawned in the `generate_generic_datagram` method. This is wasteful even for cases which don't have any default tags defined. This was flattened out into a zero alloc, append only pattern instead.
- Sample rate values appended to the buffer are zero alloc for Fixnum and Float types, but there is a fallback path for other object types which would allocate a Ruby String. This pattern likely works well for the `value` argument too, but I would prefer to get runtime feedback first and consider it as a later optimisation.
- Remove a repeated ivar lookup and size check for default tags in favor of caching it on the builder struct on init.
- Reduce method dispatch overhead by calling the builder directly from the various metric helper methods.
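A minimal sketch of the prefix pre-seeding from the first point, using a hypothetical `seed_prefix` helper and an assumed `prefix_len` field (only `datagram` and `len` appear in the diff):

```c
#include <ruby.h>
#include <string.h>

/* Copy the prefix into the buffer once at initialization and remember the
 * offset as the low water mark. */
static void seed_prefix(struct datagram_builder *builder, VALUE prefix) {
  long prefix_len = RSTRING_LEN(prefix);
  if (prefix_len > DATAGRAM_SIZE_MAX) prefix_len = DATAGRAM_SIZE_MAX;
  memcpy(builder->datagram, RSTRING_PTR(prefix), prefix_len);
  builder->prefix_len = (size_t)prefix_len;
}

/* At the start of every datagram build, rewind instead of re-copying:
 *   builder->len = builder->prefix_len;  // prefix bytes are already in place
 */
```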
Configurables

- The buffer size can be changed at compile time
- Normalized caches are compile time features that can easily be disabled or changed
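As a sketch of what those compile-time knobs could look like - the macro names appear in the diff, while the defaulting pattern and the CFLAGS override are assumptions:

```c
/* Compile-time configuration knobs; override via CFLAGS at build time,
 * e.g. -DDATAGRAM_SIZE_MAX=8192 -DNORMALIZED_TAGS_CACHE_ENABLED. */
#ifndef DATAGRAM_SIZE_MAX
#define DATAGRAM_SIZE_MAX 4096 /* bytes per datagram buffer */
#endif

#ifdef NORMALIZED_NAMES_CACHE_ENABLED
/* ... bounded normalized metric name cache ... */
#endif

#ifdef NORMALIZED_TAGS_CACHE_ENABLED
/* ... bounded normalized tags cache ... */
#endif
```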
Resiliency issues covered

- Strict overflow checks that finalize the buffer prior to overflow
- Test suite run with `GC.stress = true`
Wrapped builder struct layout

The struct fits within 1 cache line (46 of 64 bytes) with the buffer as the last member. 4kb is very generous, but in reality the vast majority of statsd datagrams would only reach into 1 or 2 more cache lines. The struct is passed around by reference as it's heap allocated anyway, and few things are as bad as large structs passed by value.
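For illustration only, a layout consistent with that description might look like the following; `datagram` and `len` appear in the diff, while the remaining member names, ordering, and exact field set (the 46 bytes above) are assumptions:

```c
#include <ruby.h>
#include <ruby/st.h>

/* Bookkeeping fields sit in the first cache line; the 4096-byte buffer is
 * deliberately kept as the last member. */
struct datagram_builder {
  VALUE default_tags;                 /* cached default tags ivar                */
#ifdef NORMALIZED_NAMES_CACHE_ENABLED
  st_table *normalized_names_cache;   /* bounded metric name cache               */
#endif
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
  st_table *normalized_tags_cache;    /* bounded tags cache                      */
#endif
  size_t prefix_len;                  /* low water mark after the seeded prefix  */
  size_t len;                         /* current write offset into the buffer    */
  char datagram[DATAGRAM_SIZE_MAX];   /* encode buffer, last member              */
};
```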
Future TODO

- `value` argument

Downsides

Other ideas

@csfrancis @wvanbergen