bulk_indexing

Description

The bulk_indexing processor asynchronously consumes bulk requests from queues.

Configuration Example

A simple example is as follows:

```yaml
pipeline:
- name: bulk_request_ingest
  auto_start: true
  keep_running: true
  processor:
    - bulk_indexing:
        bulk_size_in_mb: 1
        queues:
          type: bulk_reshuffle
          level: cluster
```
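As a sketch of how the top-level parameters listed under Parameter Description fit into this pipeline, the variant below sets several of them explicitly. Only the parameter names come from this page; the values are illustrative assumptions, not recommendations.

```yaml
pipeline:
- name: bulk_request_ingest
  auto_start: true
  keep_running: true
  processor:
    - bulk_indexing:
        idle_timeout_in_seconds: 5    # illustrative value; the default is 1
        max_connection_per_node: 10   # illustrative value; the default is 1
        bulk_size_in_kb: 512          # size of each bulk request, in KB (used here instead of bulk_size_in_mb)
        skip_info_missing: true       # consume a queue only after node/index/shard information is available
        queues:                       # labels that select which queues to consume
          type: bulk_reshuffle
          level: cluster
```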

Parameter Description

| Name | Type | Description |
| --- | --- | --- |
| idle_timeout_in_seconds | int | Timeout for consuming from a queue, in seconds. The default value is 1. |
| max_connection_per_node | int | Maximum number of connections to each target node. The default value is 1. |
| bulk_size_in_kb | int | Size of each bulk request, in KB. |
| bulk_size_in_mb | int | Size of each bulk request, in MB. |
| queues | map | Labels used to filter the group of queues whose data is consumed. |
| skip_info_missing | bool | Whether to skip consuming a queue when required information, such as node, index, or shard information, is missing. If set to true, the queue is consumed only after that information is obtained. The default value is false, in which case one Elasticsearch node is selected to send the requests. |
| bulk.compress | bool | Whether to enable request compression. |
| bulk.retry_delay_in_seconds | int | Delay before a failed request is retried, in seconds. |
| bulk.reject_retry_delay_in_seconds | int | Delay before a rejected request is retried, in seconds. |
| bulk.max_retry_times | int | Maximum number of retries. |
| bulk.failure_queue | string | Queue that stores requests that failed because of a back-end failure. |
| bulk.invalid_queue | string | Queue that stores invalid requests, for which a 4xx status code is returned. |
| bulk.dead_letter_queue | string | Queue that stores requests that exceeded the maximum retry count. |
| bulk.safety_parse | bool | Whether to enable safe parsing, which does not use a buffer and therefore uses more memory. The default value is true. |
| bulk.doc_buffer_size | int | Maximum document buffer size used when processing a single request. Set it larger than the largest single document. The default value is 256*1024. |
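The bulk.* parameters can be combined into a retry and error-handling setup. The sketch below assumes that the dotted names map to keys nested under a `bulk` block; the queue names and the numeric retry values are hypothetical placeholders, and doc_buffer_size is shown at its documented default of 256*1024 (262144).

```yaml
pipeline:
- name: bulk_request_ingest
  auto_start: true
  keep_running: true
  processor:
    - bulk_indexing:
        bulk_size_in_mb: 1
        queues:
          type: bulk_reshuffle
          level: cluster
        # Assumption: dotted names such as bulk.compress are nested under `bulk`.
        bulk:
          compress: true
          retry_delay_in_seconds: 1          # illustrative value
          reject_retry_delay_in_seconds: 5   # illustrative value
          max_retry_times: 3                 # illustrative value
          # Hypothetical queue names; choose names that fit your deployment.
          failure_queue: bulk_failure_requests      # back-end failures
          invalid_queue: bulk_invalid_requests      # requests answered with 4xx
          dead_letter_queue: bulk_dead_requests     # retries exhausted
          safety_parse: true                 # documented default
          doc_buffer_size: 262144            # 256*1024, the documented default
```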