
Conversation


@methodmissing methodmissing commented Mar 30, 2020

Why?

The datagram builder is heavy on allocation (and reallocation) when coercing both metric names and tags into normalized and valid statsd wire protocol components.

This extension is also implemented as a wrapped struct without any global state, as @hkdsun and @bmansoob expressed interest in how to do that; a sketch of the pattern follows the list below.

  • A simple struct with mixed native type and VALUE (Ruby object reference) members

  • GC mark callback invoked during the GC tracing phase (we need to let the GC know about Ruby object references we hold on to)

  • GC free callback invoked during the sweep phase if this object was deemed not to be referenced by any other object, the stack, etc. We free the symbol tables for the caches and the struct itself.

  • It's good practice to add an object size accumulator callback to accurately reflect the size of this object via ObjectSpace.memsize_of

  • A data type declaration defines the shape of the wrapped object and callbacks for the GC

  • An unboxing function coerces a VALUE reference back to a builder struct as per the type definition above

  • The allocator function prepares the struct for use by initializing members to their appropriate types, conditionally setting up caches, etc., and returns a VALUE reference to the allocated struct on the heap
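
For those curious about the mechanics, here is a minimal, hypothetical sketch of the pattern the bullets above describe, using the TypedData API. The member and function names are illustrative, not the exact ones from this PR:

#include <ruby.h>
#include <ruby/st.h>

/* Illustrative struct mixing native members and Ruby VALUE references. */
struct example_builder {
	st_table *names_cache;   /* native member, released in the free callback */
	VALUE default_tags;      /* Ruby reference, must be exposed to the GC */
};

/* Mark callback: invoked during the GC tracing phase. */
static void example_builder_mark(void *ptr)
{
	struct example_builder *builder = (struct example_builder *)ptr;
	rb_gc_mark(builder->default_tags);
}

/* Free callback: invoked during the sweep phase once nothing references the object. */
static void example_builder_free(void *ptr)
{
	struct example_builder *builder = (struct example_builder *)ptr;
	if (builder->names_cache) st_free_table(builder->names_cache);
	xfree(builder);
}

/* Size callback: lets ObjectSpace.memsize_of report an accurate size. */
static size_t example_builder_memsize(const void *ptr)
{
	const struct example_builder *builder = (const struct example_builder *)ptr;
	size_t size = sizeof(*builder);
	if (builder->names_cache) size += st_memsize(builder->names_cache);
	return size;
}

/* Data type declaration: the shape of the wrapped object plus its GC callbacks. */
static const rb_data_type_t example_builder_type = {
	.wrap_struct_name = "example_builder",
	.function = {
		.dmark = example_builder_mark,
		.dfree = example_builder_free,
		.dsize = example_builder_memsize,
	},
	.flags = RUBY_TYPED_FREE_IMMEDIATELY,
};

/* Unboxing: coerce a VALUE reference back to the builder struct. */
static struct example_builder *example_builder_get(VALUE self)
{
	struct example_builder *builder;
	TypedData_Get_Struct(self, struct example_builder, &example_builder_type, builder);
	return builder;
}

/* Allocator: initialize members and return a VALUE wrapping the heap-allocated struct. */
static VALUE example_builder_alloc(VALUE klass)
{
	struct example_builder *builder;
	VALUE obj = TypedData_Make_Struct(klass, struct example_builder, &example_builder_type, builder);
	builder->names_cache = st_init_strtable();
	builder->default_tags = Qnil;
	return obj;
}

/* Entry point registering the allocator; instance methods would use example_builder_get. */
void Init_example_builder(void)
{
	VALUE cBuilder = rb_define_class("ExampleBuilder", rb_cObject);
	rb_define_alloc_func(cBuilder, example_builder_alloc);
}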

Points of attack

Configurables

Resiliency issues covered

Wrapped builder struct layout

The non-buffer members fit within one cache line (52 of its 64 bytes, per the pahole output below), with the buffer as the last member. 4 KB is very generous, but in reality the vast majority of statsd datagrams would only reach into one or two more cache lines.

The struct is passed around by reference, as it's heap allocated anyway, and few things are as bad as large structs passed by value.

lourens@CarbonX1:~/src/statsd-instrument/lib/statsd/instrument/ext$ pahole -C datagram_builder ./statsd.so
struct datagram_builder {
	st_table *                 normalized_tags_cache; /*     0     8 */
	st_table *                 normalized_names_cache; /*     8     8 */
	VALUE                      str_normalize_chars;  /*    16     8 */
	VALUE                      str_normalize_replacement; /*    24     8 */
	VALUE                      default_tags;         /*    32     8 */
	_Bool                      empty_default_tags;   /*    40     1 */

	/* XXX 3 bytes hole, try to pack */

	int                        prefix_len;           /*    44     4 */
	int                        len;                  /*    48     4 */
	char                       datagram[4096];       /*    52  4096 */

	/* size: 4152, cachelines: 65, members: 9 */
	/* sum members: 4145, holes: 1, sum holes: 3 */
	/* padding: 4 */
	/* last cacheline: 56 bytes */
};
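
As an aside on the "3 bytes hole, try to pack" hint above: a hypothetical reordering that moves the _Bool after the two ints closes the interior hole, but with the 4 KB buffer and 8-byte struct alignment the total sizeof stays 4152 either way (the 3 bytes simply move into the trailing padding), so repacking would be mostly cosmetic:

struct datagram_builder {
	st_table *                 normalized_tags_cache;
	st_table *                 normalized_names_cache;
	VALUE                      str_normalize_chars;
	VALUE                      str_normalize_replacement;
	VALUE                      default_tags;
	int                        prefix_len;
	int                        len;
	_Bool                      empty_default_tags;
	char                       datagram[4096];       /* now starts at offset 49 instead of 52 */
};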

Future TODO

  • Remove the String allocation for the value argument
  • Evaluate the hashing function used for the tags cache (the Hash- and Array-specific hash functions are not directly exposed through the extension API)

Downsides

  • This library now becomes dependent on a C compiler and Ruby development headers - need to think about decoupling so that it's optional.

Other ideas

  • Willem mentioned that stricter linters which catch invalid metric names and tags, enabled for example during CI, could work wonders too.

@csfrancis @wvanbergen

csfrancis and others added 20 commits March 30, 2020 20:18
…, favoring broken packets to undefined behavior
…m the C ext only (normalize_tags is a pure function)
…also allows the normalized tags cache to be decoupled from Ruby land
… are very careful with offsets, even when preventing overflows
…zed cache limits should be configurable at runtime, idk?)
…instead of hashes. Along with that came several requirements, which also illustrate how to implement wrapped structs and plug them cleanly into the GC.

* The ivar lookups for caches were not efficient: the total ivar count for the builder went up to 4, meaning one of them would incur two symbol table lookups for a cached element, since an ivar count over 3 overflows to a symbol table
* 'struct datagram_builder' now becomes a wrapped struct for tracking state such as optional normalized cache symbol tables, the datagram encode buffer, the initial offset of the first chunk of that buffer and also caches the default tags ivar
* 'rb_data_type_t datagram_builder_type' and functions datagram_builder_mark, datagram_builder_free and datagram_builder_size illustrates the callbacks required for proper GC integration of wrapped structs
* The primary reason for the wrapped struct is to avoid global state: if multiple instances are ever instantiated from Ruby land, global state can be clobbered.
* With the struct we are now able to pre-seed the initial part of the buffer with the given prefix ivar and track the offset into the buffer for #generate_generic_datagram to pick up from, so we don't have to do the ivar lookup and memcpy for EVERY datagram built
* Normalized caches are now backed by symbol tables. I am happy with the normalized name implementation, as rb_str_hash is exposed and computes a numeric hash from the String contents. Unfortunately rb_ary_hash and rb_hash_hash are not exposed, and rb_hash defaults to a method call instead. I flagged it to revisit.
…tion call, still has O(n) complexity but cheaper than str hash + symbol table lookup
…ect dispatch to them from the respective metric helper functions
@methodmissing
Author

... and some CI massaging required

Screenshot from 2020-03-30 22-56-46

@wvanbergen
Contributor

This library now becomes dependent on a C compiler and Ruby development headers - need to think about decoupling so that it's optional.

I think there are two ways to address this:

  1. The datagram builder is already configurable. We could ship this as a separate datagram builder class (potentially as a different gem). The client can then be configured to use this different datagram builder class, maybe via an environment variable.
  2. We create a separate gem that when loaded, will overwrite the datagram builder class with this native implementation.

@methodmissing
Author

Yeah, that makes sense - like the Sequel model

Contributor

@csfrancis csfrancis left a comment


This library now becomes dependent on a C compiler and Ruby development headers - need to think about decoupling so that it's optional.

If we think that's a big deal, I like the option of a separate gem that depends on this one and overwrites the class.

struct datagram_builder *builder = (struct datagram_builder *)ptr;
if (!builder) return;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
st_free_table(builder->normalized_tags_cache);
Contributor


Maybe set these to NULL after freeing?

Author


Yes, agreed - a use-after-free guard
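
A minimal sketch of how that guard could look in the free callback, assuming the field and macro names from the diff context above (the surrounding function body is abbreviated):

static void datagram_builder_free(void *ptr)
{
	struct datagram_builder *builder = (struct datagram_builder *)ptr;
	if (!builder) return;
#ifdef NORMALIZED_TAGS_CACHE_ENABLED
	if (builder->normalized_tags_cache) {
		st_free_table(builder->normalized_tags_cache);
		builder->normalized_tags_cache = NULL; /* guard against use after free / double free */
	}
#endif
	xfree(builder);
}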


#ifdef NORMALIZED_NAMES_CACHE_ENABLED
size += st_memsize(builder->normalized_names_cache);
#endif
return size;
Contributor


Will this include the sizes of the VALUE members on the builder (assuming the values can’t be embedded)?

Author


My understanding of object memory size reporting (as expected to be returned by ObjectSpace.memsize_of) is that retained memory shouldn't be included. See Class for example

Heap dumps follow that model too, otherwise the dominator tree for harb would not have been needed 😄

st_memsize is just memory consumed by the hash table internals (bins and entries)

normalized_name = normalized_names_cached(builder, self, name);
}
chunk_len = RSTRING_LEN(normalized_name);
if (builder->len + chunk_len > DATAGRAM_SIZE_MAX) goto finalize_datagram;
Contributor


Should we be raising on these conditions?

Author

@methodmissing methodmissing Apr 1, 2020


I thought about it, but then again this is telemetry specific and the default buffer is a generous 4096 bytes per datagram.

The line is somewhere between:

  • Raising on a telemetry-specific path that could bubble up an unhandled exception, impacting business-specific paths. I don't think that's a great default.

  • If you emit datagrams that large, chances are cardinality is also going to be high (most likely through tags), and a truncated datagram would be the least we could do. Although an exception would probably be deserved :trollface:

Contributor


What's the behaviour if we send a datagram that's too large? Does that depend on the statsd backend?

Contributor


Currently, we don't know. I assume they will simply not be delivered because they are being dropped by the networking stack as the UDP datagrams get too large. By design, we don't know about this - it's a fire and forget protocol.

Author

@methodmissing methodmissing Apr 1, 2020


What Willem said re. fire and forget. I'm fine with any solution other than raising an error, as telemetry-specific fire-and-forget datagrams shouldn't risk raising unhandled exceptions that could uproot business processes in the host application.

memcpy(builder->datagram + builder->len, StringValuePtr(normalized_name), chunk_len);
builder->len += chunk_len;

if (builder->len + 1 > DATAGRAM_SIZE_MAX) goto finalize_datagram;
Contributor


You could maybe extract this into an inline function.

Author


Was thinking about that too, but the goto is local to the flow control that converts the buffer to a Ruby String and returns.

A static function would require the length and the builder as arguments too, and wouldn't make a difference to code size.

Thoughts?
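
For context, a minimal sketch of the inline helper under discussion, assuming the struct and DATAGRAM_SIZE_MAX shown earlier (the helper name is illustrative); the goto would stay at each call site, so local flow control is unchanged:

static inline int datagram_would_overflow(const struct datagram_builder *builder, long chunk_len)
{
	return builder->len + chunk_len > DATAGRAM_SIZE_MAX;
}

/* at each append site:
 *   if (datagram_would_overflow(builder, chunk_len)) goto finalize_datagram;
 */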

Author


Or make overflow checking a compile-time option and allow arbitrary-size datagrams when it's disabled (effectively the current Ruby implementation's behaviour)

Contributor


I was thinking that this would make more sense if we were raising an exception.

Author


The risk / regression factor in raising exceptions is that we'd change a loose contract from "allowing any size datagram to be built" to "raising an exception that can fan out to the whole host application".

Thoughts on a middle ground that either:

  • Implements overflow guards, truncates the datagram and emits a warning with rb_warn("StatsD datagram truncated to X bytes"); sketched below. It's easy to provide a reason too, as with the logging pipeline.

  • Follows the existing behavior of supporting arbitrary-size datagrams: either a static buffer that can overflow to a heap-allocated buffer, OR move the buffer to the heap. Neither is difficult, but both create many more branches and complexity.

?
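
To make the first option concrete, a hypothetical sketch of the truncate-and-warn path, reusing the append code quoted above (the warning text and exact byte accounting are illustrative):

if (builder->len + chunk_len > DATAGRAM_SIZE_MAX) {
	chunk_len = DATAGRAM_SIZE_MAX - builder->len;   /* keep what still fits */
	rb_warn("StatsD datagram truncated to %d bytes", DATAGRAM_SIZE_MAX);
}
memcpy(builder->datagram + builder->len, StringValuePtr(normalized_name), chunk_len);
builder->len += chunk_len;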

@methodmissing
Author

@wvanbergen objections to spawning statsd-instrument-c (blocked on a better name, patches welcome 😄)? That would also introduce some other issues to think about:

  • What happens with test coverage in light of the extracted extension?
  • Running the main test suite with and without the C extensions (gems become a CI dependency)
