Skip to content

Significant CPU impact on instrumented RedisΒ #1025

@paweljw

Description

@paweljw

Hi! I noticed a significant impact on the CPU load from a Redis service instrumented with OBI. I whittled it down to a minimal reproduction: https://github.com/paweljw/redis-impact-repro I generally run run_load.sh first, then run_obi.sh.

This minimal configuration file seems to be enough to trigger it:

discovery:
  instrument:
  - open_ports: 6379
trace_printer: text

Here are the results I'm getting with the synthetic reproduction on a spare, otherwise empty 4-core server (not anywhere close to 100% loaded):

load-generator-1  | [STATS] avg_latency=10.73ms | redis_cpu=27.9% | ops/sec=9813
load-generator-1  | [STATS] avg_latency=9.49ms | redis_cpu=29.0% | ops/sec=10792
load-generator-1  | [STATS] avg_latency=8.49ms | redis_cpu=29.0% | ops/sec=12281
load-generator-1  | -------------- OBI STARTED -----------------
load-generator-1  | [STATS] avg_latency=9.55ms | redis_cpu=28.2% | ops/sec=10899
load-generator-1  | [STATS] avg_latency=10.75ms | redis_cpu=47.1% | ops/sec=9654
load-generator-1  | [STATS] avg_latency=10.32ms | redis_cpu=47.2% | ops/sec=10035
load-generator-1  | [STATS] avg_latency=10.78ms | redis_cpu=46.7% | ops/sec=9555
load-generator-1  | [STATS] avg_latency=11.64ms | redis_cpu=45.7% | ops/sec=8773
load-generator-1  | [STATS] avg_latency=10.92ms | redis_cpu=46.4% | ops/sec=9350
load-generator-1  | [STATS] avg_latency=10.41ms | redis_cpu=46.7% | ops/sec=9827
load-generator-1  | [STATS] avg_latency=11.45ms | redis_cpu=45.5% | ops/sec=8985
load-generator-1  | [STATS] avg_latency=11.78ms | redis_cpu=40.7% | ops/sec=8709
load-generator-1  | -------------- OBI STOPPED -----------------
load-generator-1  | [STATS] avg_latency=8.54ms | redis_cpu=28.2% | ops/sec=12126
load-generator-1  | [STATS] avg_latency=9.78ms | redis_cpu=28.2% | ops/sec=10658
load-generator-1  | [STATS] avg_latency=9.00ms | redis_cpu=28.1% | ops/sec=11524
load-generator-1  | [STATS] avg_latency=9.49ms | redis_cpu=28.0% | ops/sec=10930
load-generator-1  | [STATS] avg_latency=8.03ms | redis_cpu=29.2% | ops/sec=12774

When excluding - exe_path: "**/redis-server", the performance hit is gone (and so are the traces, of course):

load-generator-1  | [STATS] avg_latency=10.09ms | redis_cpu=28.4% | ops/sec=9888
load-generator-1  | [STATS] avg_latency=10.28ms | redis_cpu=28.6% | ops/sec=10069
load-generator-1  | [STATS] avg_latency=7.87ms | redis_cpu=29.6% | ops/sec=13077
load-generator-1  | -------------- OBI STARTED -----------------
load-generator-1  | [STATS] avg_latency=9.93ms | redis_cpu=27.6% | ops/sec=10517
load-generator-1  | [STATS] avg_latency=10.90ms | redis_cpu=28.8% | ops/sec=9479
load-generator-1  | [STATS] avg_latency=8.30ms | redis_cpu=29.5% | ops/sec=12507
load-generator-1  | [STATS] avg_latency=10.91ms | redis_cpu=27.6% | ops/sec=9767
load-generator-1  | [STATS] avg_latency=8.46ms | redis_cpu=29.3% | ops/sec=12169
load-generator-1  | -------------- OBI STOPPED -----------------
load-generator-1  | [STATS] avg_latency=8.80ms | redis_cpu=29.0% | ops/sec=11718
load-generator-1  | [STATS] avg_latency=8.64ms | redis_cpu=28.8% | ops/sec=11898

My first thought was eBPF, after some previous run-ins :) but it seems just auto-instrumenting the process is enough. Not limited to Python-based workloads, I'm seeing this with other clients too.

This unfortunately blocks me from continuing to roll out OBI - the synthetic benchmark takes an extra ~20% percentage point hit, a real server with high load is predictably worse. The ops/sec hit isn't so great either for production usecases 😞

I'm not sure whether this is a bug, or expected performance hit from instrumentation. Could you please advise? Thank you! πŸ™

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions