Backpressure
When a telemetry pipeline ingests data, it’s possible for the volume of incoming data to exceed that pipeline’s throughput. This creates a condition known as backpressure, where data accumulates faster than the pipeline can process and route that data to its intended destination.
Although some amount of backpressure is normal and expected, unmanaged or excessive backpressure can result in data loss.
Overview
If a pipeline needs to buffer data, it stores the data in memory until that pipeline is ready to process and route the data to its intended destination. After it processes and routes the data, the pipeline flushes that data from memory. Pipelines with the StatefulSet workload type use a hybrid buffer by storing a parallel copy of buffered data in the file system, which creates a persistent backup that mirrors the data written to memory.
If the volume of incoming data exceeds a pipeline’s throughput, the amount of data in temporary storage will increase accordingly. This is what creates backpressure, but pipelines are designed to accommodate a certain amount of backpressure without issue. To draw a comparison with pipes that carry water, temporary storage is like the basin of a sink: if water flows faster than it drains, the basin will fill up and store the extra water until it reaches the drain.
However, if a pipeline continues buffering new data to temporary storage faster than it can remove old data, that storage will eventually reach capacity. This is the point at which backpressure becomes an urgent problem because the buffered data is like a sink that’s about to overflow. When a pipeline’s temporary storage is at capacity, it will stop buffering new data, which prevents an overflow but can cause data loss.
Push-based sources versus pull-based sources
When a pipeline stops buffering new data, the potential for data loss partly depends on whether its associated source plugins are push-based or pull-based.
For push-based source plugins, which passively receive data, Chronosphere has no control over the behavior of that data source. Some sources might pause the flow of data if they detect an interruption, but other sources might continue attempting to send data to the unavailable pipeline, which can cause data loss.
For pull-based source plugins, which either actively fetch data or generate test data, Telemetry Pipeline can control communication between the source and itself. If a pipeline is unavailable, Telemetry Pipeline pauses the flow of data, then resumes fetching data when the pipeline is ready to ingest it again. This behavior can add temporary latency if the source buffers a large amount of data during the pipeline’s downtime, but avoids major data loss.
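As a sketch of the difference, the following Fluent Bit-style YAML declares one push-based source and one pull-based source. The plugin names and parameters are illustrative assumptions about how sources might be configured, not a prescribed setup.

```yaml
pipeline:
  inputs:
    # Push-based: the pipeline listens on a port and passively receives
    # whatever clients send to it. If the pipeline is unavailable, the
    # sender decides whether to retry, buffer, or drop records.
    - name: http
      listen: 0.0.0.0
      port: 9880

    # Pull-based: the pipeline actively reads records from files on disk,
    # so it can pause and resume fetching based on its own capacity.
    - name: tail
      path: /var/log/app/*.log
```

In the push-based case, how backpressure plays out depends largely on the sender's behavior; in the pull-based case, the pipeline itself controls how quickly it reads.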
Hybrid buffering for StatefulSets
In a StatefulSet pipeline, each chunk of buffered data is always stored in the file system. If the pipeline has enough space in memory, an identical chunk of that buffered data is also written to memory. If the pipeline doesn’t have enough space in memory, the chunk of buffered data remains only in the file system until there is enough space to write an identical chunk to memory.
Chunks that are stored simultaneously in memory and in the file system are known as up chunks, and chunks that are stored only in the file system are known as down chunks. Unlike down chunks, pipelines can access data in up chunks directly. A down chunk in the file system becomes an up chunk when a copy of the down chunk is written to memory.

After an up chunk is processed and routed, the associated buffered data both in memory and in the file system is flushed.
Manage backpressure
The available methods for managing backpressure vary depending on your pipeline’s workload type. Because of this, choosing the right workload type is also a key part of managing backpressure.
The primary factors that contribute to backpressure are independent of Telemetry Pipeline, like the amount of data that your sources emit and your destinations’ capacity to receive data. To manage these factors, you must configure your sources and destinations directly.
Deployment pipelines
Pipelines with the Deployment workload type store buffered data only in memory.
Use the following methods to manage backpressure for Deployment pipelines:
- Set the mem_buf_limit configuration parameter to enforce a limit on how much data a source plugin can buffer to memory. When this limit is reached, the pipeline will stop buffering new data from that source plugin. For an example, see the sketch after this list.
- Configure resource profiles to set thresholds for a pipeline’s resource usage.
- Scale pipelines to increase their throughput. However, keep in mind that adding replicas to a pipeline increases its resource usage, and that scaling won’t alleviate destination-level bottlenecks.
- Monitor the latency added by any active processing rules. Complex processing operations can add delays and reduce throughput.
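As a sketch of the mem_buf_limit setting, the following Fluent Bit-style YAML applies a per-plugin memory limit to a single push-based source. The forward plugin, its port, and the 50MB value are illustrative assumptions, not a recommended configuration; choose a limit that fits your pipeline's memory allocation.

```yaml
pipeline:
  inputs:
    - name: forward
      listen: 0.0.0.0
      port: 24224
      # Stop buffering new data from this source once roughly 50 MB of
      # memory is in use for its buffered records.
      mem_buf_limit: 50MB
```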
StatefulSet pipelines
Pipelines with the StatefulSet workload type use a hybrid approach that stores buffered data both in memory and in the file system.
Use the following methods to manage backpressure for StatefulSet pipelines:
- Configure resource profiles to set thresholds for a pipeline’s resource usage.
- Set the resources.storage.backlogMemLimit resource profile parameter to increase or decrease the amount of memory allocated for storing buffered data. When this limit is reached, any new data is buffered only to the file system, assuming the file system has enough free space. This parameter sets a per-pipeline limit instead of a per-plugin limit. For an example of the storage parameters, see the sketch at the end of this section.
- Set the resources.storage.volumeSize resource profile parameter to increase or decrease the size of the persistent file system for each pipeline Pod.
- Set the resources.storage.maxChunksUp resource profile parameter to increase or decrease the number of chunks that the pipeline can buffer to memory. When this limit is reached, any new data is buffered only to the file system, assuming the file system has enough free space. This parameter sets a per-pipeline limit instead of a per-plugin limit.
- Scale pipelines to increase their throughput. However, adding replicas to a pipeline increases its resource usage, and scaling won’t alleviate bottlenecks caused by destinations.
- Monitor the latency added by any active processing rules. Complex processing operations can add delays and reduce throughput.
The mem_buf_limit configuration parameter has no effect on source plugins in StatefulSet pipelines.
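As a sketch of the storage-related resource profile parameters named above, the following YAML groups them together. The parameter paths come from this page, but the surrounding structure, the units, and the values are illustrative assumptions; apply them through whatever mechanism you use to manage resource profiles.

```yaml
resources:
  storage:
    # Memory allocated for buffered data across the whole pipeline.
    # Once this limit is reached, new data is buffered only to the file system.
    backlogMemLimit: 128Mi
    # Size of the persistent file system attached to each pipeline Pod.
    volumeSize: 2Gi
    # Maximum number of chunks the pipeline keeps in memory (up chunks).
    maxChunksUp: 64
```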
DaemonSet pipelines
Pipelines with the DaemonSet workload type store buffered data only in memory.
Use the following methods to manage backpressure for DaemonSet pipelines:
- Set the mem_buf_limit configuration parameter to enforce a limit on how much data a source plugin can buffer to memory. When this limit is reached, the pipeline will stop buffering new data from that source plugin.
- Configure resource profiles to set thresholds for a pipeline’s resource usage.
- Monitor the latency added by any active processing rules. Complex processing operations can add delays and reduce throughput.
DaemonSet pipelines use a static number of replicas, which means they can’t be scaled.