Plan your Intake Runner

To ensure optimal performance and cost-effectiveness, it is good practice to plan your STACKIT Intake configuration in advance. The key to this planning is understanding your data stream’s characteristics.

Before creating an Intake Runner, you need to analyze your data stream’s patterns and requirements:

  • Data Volume Analysis: Estimate the volume of data you expect to ingest. Intake Runner capacity is defined by two key metrics: maximum number of messages per hour and maximum message size in KiB. Be sure to consider your peak ingestion rates and message sizes, not just the average.
  • Ingestion Patterns: Consider whether your ingestion workload is steady or spiky. The Intake Runner’s buffering capability can handle temporary spikes and downstream outages for up to 24 hours, but your defined capacity should be sufficient to process the sustained workload.
  • Start with a Realistic Capacity: Begin with a capacity that accommodates your expected peak hourly throughput. You can monitor the system and increase the capacity if necessary. Capacity cannot currently be reduced. If you need a smaller capacity, consider deleting and recreating the Intake Runner and its Intakes, pointing the new Intakes to the existing tables.
  • Leverage Buffering: Buffering can also absorb spikes in message volume above the specified hourly maximum throughput. This works if you do not expect downstream interruptions lasting as long as 24 hours and can tolerate messages occasionally taking longer than 5 minutes to become visible in Dremio. An Intake Runner blocks messages only when the preallocated buffer storage is full.
  • Partitioning: Use partitioning in your Intake to optimize data processing and query performance in Dremio. A date or another coarser-grained time field is a good choice of partition field. It lets you run Iceberg table compaction on older partitions, which improves Dremio query performance, while Intake is still writing to the current partition. Automatic ingestion-time partitioning on the automatically added timestamp column __intake_ts is also a simple way to add time-based partitioning.
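
The capacity and buffering considerations above can be explored with a rough back-of-the-envelope simulation. The sketch below is not part of any STACKIT API: the names `simulate_backlog` and `BacklogReport`, the traffic profile, and the capacity values are all hypothetical. It assumes a simplified model in which the Intake Runner processes at most its configured number of messages each hour and any excess waits in the buffer. The actual buffering behavior and buffer size may differ, so treat the result only as a planning aid.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BacklogReport:
    peak_backlog_messages: int  # most messages still waiting at the end of any hour
    peak_backlog_kib: int       # worst-case size of that backlog (every message at max size)
    hours_with_backlog: int     # number of hours that ended with messages still waiting


def simulate_backlog(hourly_messages: List[int],
                     capacity_per_hour: int,
                     max_message_kib: int) -> BacklogReport:
    """Estimate how far ingestion falls behind for a given hourly traffic profile.

    Simplifying assumption (not taken from the product documentation): up to
    `capacity_per_hour` messages are processed each hour, and the rest carry
    over to the next hour.
    """
    backlog = 0
    peak = 0
    hours = 0
    for arrived in hourly_messages:
        # Messages beyond the hourly capacity remain buffered.
        backlog = max(0, backlog + arrived - capacity_per_hour)
        peak = max(peak, backlog)
        if backlog > 0:
            hours += 1
    return BacklogReport(peak, peak * max_message_kib, hours)


# Hypothetical day: steady 40,000 messages/hour with a three-hour spike.
profile = [40_000] * 10 + [90_000, 120_000, 90_000] + [40_000] * 11

# Capacity sized below the peak: the buffer has to absorb the spike.
report = simulate_backlog(profile, capacity_per_hour=60_000, max_message_kib=4)
print(report)

# Capacity sized for the peak hour: no backlog builds up.
peak_sized = simulate_backlog(profile, capacity_per_hour=max(profile), max_message_kib=4)
print(peak_sized)
```

In this model, a capacity below the peak hour is workable only if the resulting backlog fits in the buffer and the added delay before messages become visible is acceptable. Sizing for the peak hour avoids the backlog entirely, at the cost of a larger capacity that cannot be reduced later.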