You have numerous options when tuning the performance of NiFi and Elasticsearch. The
following guide introduces tools for monitoring performance and validating key tuning
parameters, and provides a performance tuning strategy that you can use with the component.
The NiFi and Elasticsearch component may appear relatively slow in processing speed. This is
often the case when you test the default configuration, which has not been modified or tuned to
improve processing performance. The default configuration runs NiFi in a mostly single-threaded
mode and uses a minimal configuration of the Elasticsearch cluster. This setup can result in
performance issues when building the search index. However, the performance of the solution is
not only due to the default configuration.
It is also important to choose an optimal hardware configuration when testing. While CPU and
memory resources are less critical, the I/O bandwidth of the disk subsystem on the Elasticsearch
cluster is a critical factor. Careful selection of the environment from the start can therefore
avoid several simple performance pitfalls.
The following documents outline this approach. They also provide a general understanding of the
index building process and the corresponding metrics that can be used to observe the process.
They provide a general understanding and interpretation of these metrics to further improve
performance. Also included is a list of configuration changes for two hardware specifications:
- A minimal configuration, applicable when using the minimal hardware specification
- A recommended configuration, to be applied on the recommended hardware specification
Background information
Multiple factors influence the performance of the search index build, including hardware footprint,
catalog size, and attribute dictionary richness. Understanding the bottlenecks and how they
express themselves across the whole process is crucial to fine-tuning the search solution. The
index build consists of three stages:
1. Data retrieval
2. Data processing or transformation
3. Data uploading
A set of predefined connectors consisting of multiple process groups for different purposes is
available. To handle the data retrieval, processing, and uploading stages of the index creation
process, each process group often has nested process groups.
Retrieving data group
Fetch data from database or Elasticsearch.
Each group influences the speed and efficiency of the index building
process. The retrieving data group, for example, controls the flow file size
(bucket size) and query execution frequency (scroll page size). By altering
these variables, you can optimise the payload, and the retrieval cost from
the database, of the chunk of data that NiFi processes as a unit. The size
of the flow file also affects Elasticsearch upload performance: complicated
and large structures can take longer for Elasticsearch to parse, resulting in
poor scalability.
The processing data group controls the amount of work NiFi can do. For
example, you can regulate how many flow files may be processed
concurrently by controlling the processor's thread count. This increases the
processing speed of a typical processor, potentially reducing flow file pileup
in front of the processor. The NLP processor, for example, is a typical
processor that benefits significantly from additional threads. With the
more specialised bulk update type processor, you can control how many
concurrent connections are opened to Elasticsearch, allowing you to import
more data into Elasticsearch.
HCL Commerce's infrastructure requirements are well defined, and while NiFi
and Elasticsearch may function on a reduced footprint, performance may
suffer if you reduce their resource allocation. Both the NiFi and
Elasticsearch infrastructures need good I/O bandwidth, enough memory for
the Java heap and Java native memory allocation, and preferably enough
memory for file caching. The latter may need to be specified in the pod,
since it ensures that the operating system has enough additional RAM for
the service.
The default processor runs a single thread at a time, processing one flow file
at a time. If concurrent processing is desired, the number of concurrent tasks
that it can do can be adjusted. Set the number of threads for the process
group by changing the processor Concurrent Tasks value (under Processor
configuration or SCHEDULING tab).
If the CPU is able to multitask, throughput can be improved by increasing
the number of threads a processor employs. Two such examples are the
transformation processor (as in NLP) and the Bulk update processor (which
sends data to Elasticsearch). This adjustment does not help every processor.
Most processors come with a default configuration that takes this variable
into account and does not need to be altered. When performance testing
reveals a bottleneck in front of a processor, the default configuration may
benefit from further tuning.
When the processor can process flow files at the same rate as they come,
the Concurrent Tasks value is ideal, preventing large pileups of flow files in
the processor's wait queue. Because such a balance may not always be
feasible, the best configuration focuses on reducing the flow file pileup in the
waiting queue.
Due to the slow transfer speed to Elasticsearch, you may see a shallow depletion of the
documents in the queue. You can increase this speed by opening more connections to
Elasticsearch and configuring more threads for the Bulk Update Processor to increase
throughput.
When the total number of threads is raised, the following graph represents what happens on the
system:
When the Bulk Update Processor is configured with only three threads, the initial configuration
shows a very shallow depletion of the documents. When configured with 16 threads, the ramp-
down angle increases significantly, and when configured with 64 threads, it improves even more.
The important distinction is that increasing the number of threads in this processor increases the
number of HTTP connections opened from NiFi to Elasticsearch, while the resulting CPU
utilisation remains almost unchanged.
Other processors, such as the NLP process group, could benefit from similar observation and
improvement. When using CPU-bound processors like NLP, the maximum concurrency achievable
in the system is restricted by the number of physical CPU cores available to the NiFi pod.
Furthermore, the OS resources available to the pod/JVM where NiFi runs should be given
special consideration. A CPU-bound processor, for instance, benefits from concurrency only if
there are available cores on the CPU. Increased processor concurrency, on the other hand,
increases the JVM heap size and the required native memory. To detect and correct such
situations, it is critical to monitor heap values and overall memory.
Such failures are rare, and detecting and capturing one is often overlooked. One such
event is shared here, along with pointers to the relevant metric values:
The Java heap size is seen in the graph above. The brief pause in the middle of the graph
reflects the NiFi pod crashing and restarting. If you observe the CPU before the crash, you can
see that the CPU utilisation spikes, but the heap size never reaches 100%, instead staying
around 70%, which is a reasonable heap size.
With a large bucket size, a different problem may arise: Elasticsearch may slow down the data
import significantly as the flow file size increases, to a point that the benefits in NiFi are
completely negated and processing degrades due to the Elasticsearch data import. Organize and
track your ingest testing for different bucket size values to avoid such situations and confusing
results.
In this particular case, the slowdown is visible in the Grafana graphs as idle
time between the queued items' peaks. This happens when the scroll.size parameter is
configured to be relatively low compared to the total size of the database table that is being
accessed. The scroll.size should ideally match the processing time, where the database query
time should be equal to the NiFi processing time of the extracted data. However, in special cases
where the SQL runs longer than the NiFi processing, you can observe this as short peaks in
the queued items graph, spaced apart by a flat/idle line.
This idle time between the fetches of the database data can be mitigated by increasing
the scroll.page.size value to a higher number. For example, if the database has a total of 1 M
catalog items and scroll.page.size is set to 100000 items, the whole process
involves 10 iterations. The queued items graph then shows 10 spikes, separated by idle
intervals. You can reduce the wait time by 50% by increasing scroll.page.size to 200000,
thereby reducing the process to 5 cycles. Setting scroll.page.size to 1M retrieves the whole
data set in one cycle, so only one idle period is observed for the processing phase.
The following graph shows one such case:
Very large changes to tuning variables can have side effects that adversely affect total
processing time. Again, changes to these values should be made carefully, monitored, and
measured, making sure that each individual change improves the overall processing time.
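The scroll.page.size arithmetic above can be sketched as a quick calculation; scroll_cycles is an illustrative helper for reasoning about the graphs, not part of NiFi:

```python
import math

def scroll_cycles(total_items: int, scroll_page_size: int) -> int:
    """Number of database fetch iterations (and hence idle gaps) for a full scroll."""
    return math.ceil(total_items / scroll_page_size)

# 1M catalog items, 100,000 items per page -> 10 cycles with idle gaps between them
print(scroll_cycles(1_000_000, 100_000))    # 10
# Doubling the page size halves the number of cycles (and the total idle time)
print(scroll_cycles(1_000_000, 200_000))    # 5
# A page size equal to the table size retrieves everything in a single cycle
print(scroll_cycles(1_000_000, 1_000_000))  # 1
```

Each cycle contributes one idle period between queued-items peaks, which is why fewer cycles shorten the overall wait time.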
Other considerations
Cache size
"[${TENANT:-}${ENVIRONMENT:-}live]:services/cache/nifi/Price":
  localCache:
    maxSize: -1
    maxSizeInHeapPercent: 8 # default 2
  remoteCache:
    enabled: false
  remoteInvalidations:
    publish: false
    subscribe: false
"[${TENANT:-}${ENVIRONMENT:-}auth]:services/cache/nifi/Inventory":
  localCache:
    maxSize: -1
    maxSizeInHeapPercent: 8 # default 2
  remoteCache:
    enabled: false
  remoteInvalidations:
    publish: false
    subscribe: false
"[${TENANT:-}${ENVIRONMENT:-}auth]:services/cache/nifi/Bulk":
  localCache:
    maxSize: -1
    maxSizeInHeapPercent: 4 # default 1
  remoteCache:
    enabled: false
  remoteInvalidations:
    publish: false
    subscribe: false
"[${TENANT:-}${ENVIRONMENT:-}auth]:services/cache/nifi/Wait":
  localCache:
    maxSize: -1
    maxSizeInHeapPercent: 4 # default 1
  remoteCache:
    enabled: false
The NiFi and Elasticsearch heaps, and also native memory utilization, should be given special
attention. If the memory size is inadequate for the workload, it must be extended. After each
tuning adjustment, the heap and native memory use should be monitored. This is crucial when
increasing processor concurrency or increasing bucket.size/flowfile size.
The Grafana NiFi Performance graph is the most convenient way to track the overall progress of
the index build. You can look at the total execution pace, determine the speed of major processor
groups, and see how much data is generated and uploaded to Elasticsearch.
Grafana
You can use Grafana to analyze the performance of the ingest pipeline. The two most useful
graphs are Queued Items and Wait Link.
WaitLink process groups are added between process groups in the NiFi ingest connectors to
ensure that the previous stage is completed before the subsequent stage is started. Data
currently in use in an ongoing process cannot be used in subsequent stages. Furthermore, this
reduces the probability of multiple processes operating at the same time, which might result in
significant spikes in CPU, network, memory, or disc IO resource requests.
The time spent on WaitLink can be used to estimate the total time spent on a stage and identify
the stages that consume the most time and/or resources during the build. Since WaitLink is not
available for all process groups, the Queued Items graph offers more details about the
processing time for each process group.
Within Queued Items, the Bulk Service - <XXXX> charts are useful to look at. These process
groups send the processed data (index documents) from NiFi to Elasticsearch. Bulk Service
- Product is the most essential. Because the curve runs from the beginning to the end of the
ingest pipeline, use the timestamps in WaitLink to locate the corresponding stages.
The next two graphs, for example, illustrate that the Product Stage 1e has the most queued
items. This observation indicates that the retrieving data group and the processing data group
are capable of handling the task rapidly and sending large amounts of data to the Bulk service
group for transmission.
The duration with 100 queued items is short in this example, thus it is not a concern. A possible
bottleneck in the pipeline would be a process group that takes longer and has a larger number of
queued items.
Grafana may also be used to track other parameters.
You can enable it by adding the two FEATURE_NIFI_COUNTER lines shown below to nifi-app.yaml
(/commerce-helmchart-master/hcl-commerce-helmchart/stable/hcl-commerce/templates/nifi-app.yaml)
before installing NiFi:
- name: "DOMAIN_NAME"
  value: "{{ .Release.Namespace }}.svc.cluster.local"
- name: "SPIUSER_NAME"
  value: {{ $.Values.common.spiUserName | quote }}
- name: "FEATURE_NIFI_COUNTER"
  value: "true"
- name: "VAULT_CA"
  value: {{ .Values.vaultCA.enabled | quote }}
If you enable it, you can examine the report while the test is ongoing or after the ingest process
is done. One disadvantage is that each connector can only hold one report. Another ingest
pipeline can be run using the same connector (allow a couple of minutes for this to complete),
but once that ingest pipeline starts, the report created for the previous run is deleted.
The ingest report, Ingest Metrics, is sent to the index run within Elasticsearch once an ingest
pipeline is completed. Grafana can be set up to display the report in the format you specify. All of
the reports for the various ingest pipelines and connections are saved. To view the report, select
connector and runID.
In Grafana, the data source for Ingest Metrics differs from that for Queued Items/Wait Link.
Elasticsearch receives the metrics from NiFi after the ingest process is complete, whereas
Queued Items/Wait Link uses Prometheus to collect data in real time.
You may not want to finish an ingest pipeline before running it again for tuning purposes, or the
process could fail at any time throughout the ingest process. In these circumstances, NiFi
counters may make reporting for particular stages of an ingest pipeline easier.
Kibana
Kibana can be used to monitor the resource consumption of Elasticsearch. For more information
about Kibana, see Kibana documentation.
Kibana is monitoring Elasticsearch activities in this graph. The CPU usage, JVM heap, and IO
operations rate are the key metrics for the index building process. The IO operation rate is the
main metric since it is difficult to push faster overall throughput if the IO rate is fully utilised. If the
speed is unacceptable, the best course of action is to look into other options that have a higher
throughput.
Visual Representation of the NiFi activities
In the NiFi ingest connectors, WaitLink process groups are added between process groups to
ensure that the previous stage is completed before the next stage is started. This way,
subsequent stages will not use data that is currently being worked on in an unfinished process. In
addition, this reduces the occurrence of different processes running at the same time, which can
cause extreme spikes in resource requests for CPU, network, memory or disk I/O.
NiFi uses "flow files" to process data in batches. The number of documents included in a flow file
is defined by the scroll.bucket.size property. Setting scroll.bucket.size=300, for example,
would allow 300 catentryIds per flow file when applied to the Product Update 1i processing
segment.
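The batching described above can be sketched as follows; make_flow_files is a hypothetical illustration of how ids are grouped into flow-file-sized buckets, not actual NiFi code:

```python
def make_flow_files(catentry_ids, bucket_size=300):
    """Group catentryIds into flow-file-sized buckets (mirrors scroll.bucket.size)."""
    return [catentry_ids[i:i + bucket_size]
            for i in range(0, len(catentry_ids), bucket_size)]

ids = list(range(1000))            # 1000 catentryIds to index
buckets = make_flow_files(ids)     # buckets of 300 + 300 + 300 + 100
print(len(buckets))                # 4 flow files
print(len(buckets[0]), len(buckets[-1]))  # 300 100
```

A larger bucket_size means fewer, heavier flow files for Elasticsearch to parse per bulk request; a smaller one means more flow files and more scheduling overhead in NiFi.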
Both WaitLink and Bucket.Size values can be tracked in Grafana. Observing the activities and
quantities helps determine system behavior and aids in the detection of slow segments.
Both graphs have a number of metrics that can be tracked (by clicking the coloured line on the
right side of the graph), but only the key metrics are displayed by default. Hover the mouse
pointer over a graph line to see which curve belongs to which process group or wait link.
When the processor group name or wait link is clicked, a small pop-up box appears:
The Queued Items graph depicts the number of flow files queued for processing at a given
processor group. A sharp rise in the curve indicates that the previous processor (or processor
group) is producing data faster than this group can consume it, or that the processor group is
struggling to keep up with the overall throughput of adjacent processor groups. In the image
below, you can observe a rapid increase in the number of queued items around the 21.54
timestamp, indicating that the processor is not keeping up with the incoming flow:
Similarly, the graph's ramp-down section has a steep curve, indicating that the CPU was able to
complete the processing rapidly. The steeper the curve, the faster the processor can process the
flowfiles, and the shallower the curve, the slower the processor can process the data. A case of
sluggish data flow processing can be seen in the image below:
The incoming rate (centered at 22:22 timestamp) is substantially greater than the outgoing rate,
with the incoming rate being relatively steep compared to the shallow angle of the outgoing
curve.
These observations are simple to apply to the graphs to identify potential bottlenecks.
However, the conclusions are not always correct, as the processor groups are sometimes
constrained in their data processing. Additional observations are therefore needed to confirm a
bottleneck.
Below the queued items are the WaitLink graphs. Unlike queued items, WaitLink graphs show
which stage or segment is processing at any given time. In other words, while the X-axis
indicates time (corresponding to the Queued Items graph), the Y-axis shows the active segment,
with values ranging from 0 to 1:
If the system supports multiple languages, you may see many WaitLinks appear at the same
time. Thus, for two languages, graphs reaching up to the value 2 on the Y-axis may be shown,
and so on.
Wait links are helpful in assessing which processing stage takes the longest to complete. The
slowest segments are the longest rectangles, which are the best candidates for ingestion
process optimization.
The next topic explores a few typical cases of suboptimal ingestion processing and formulates a
strategy to improve the processing speed.
The Elasticsearch monitoring dashboard page displays numerous metric displays, ranging from KPI,
Shards and JVM Garbage Collection, to CPU and Memory, Disk and Network information. In-depth
knowledge of Elasticsearch operations is required for a complete understanding of some metrics, but
the crucial indicators for monitoring and troubleshooting Elasticsearch are broad enough that they
are easy to use.
KPI
Shards
JVM Garbage Collection
Garbage collection is the automated process of cleaning up objects used by code in memory.
Breakers
Circuit breaker dashboards that display tripped circuit breakers, their frequency, and the value of
the metrics that tripped them.
Documents
Dashboards that show the documents and the operations done on them.
Times
Dashboards that show service times for the following operations: Query, Indexing, Merging, and
Throttle time.
Thread Pool
Caches
Segments
Indices
Count of documents and total size, doc values, refreshes, and fields.
The following two critical monitoring metrics allow you to dynamically monitor the Elasticsearch
cluster and spot variations over time:
The following screenshots display a given time slice's query time and processing time. The
minimum, maximum, and average values are shown in the side table, while the graph presents
the maximum values.
The chart is simple to understand and can be used to quickly identify issues. Any sudden jump in
the max query time values would indicate a severe problem in the cluster, and further
investigation is mandatory. The graph depicts a situation where the query processing time is
disturbed for a more extended period, and this period should be correlated with the other
displays to determine the source of the problem.
Sudden spikes and then a return to normal are also possible. These spikes are usually due to
external events impacting the query processing time.
Moving the mouse cursor over the image provides the actual values for that time interval.
CPU Utilisation
This is a critical dashboard that can alert you to unusual behavior or to the system
struggling to cope with the current workload.
Resource consumption
The resource consumption group provides in-depth information on the Elasticsearch cluster
operations and resource availability. The steady state of operation should be well
understood, and any deviation from the steady state should present an alert for potential
threats or instability in the system that needs to be investigated.
CPU utilisation
The CPU usage graph is relatively simple. It presents each ES node as a separate color in
the graph and, at the same time, presents a table with min, max, and average values for
each node in the ES cluster. Detecting situations where the CPU is very high (starvation)
or low (contention and slowdown in the processing rate) is easy and noticeable. The
following picture shows how the CPU utilization jumps to 80%+ while the traffic and
indexing are performed on the site.
Network
The network resources graphs are relatively simple but provide additional data and an easy
determination of whether an excess request volume is influencing operations.
The table displays the minimum, maximum, and average values for the chosen time period
as well as the values for the Old (also known as tenured) and Young (also known as nursery)
spaces in the heap.
Note: The OpenJDK JVM, which powers the Elasticsearch cluster, employs a
generational GC algorithm that is comparable to the IBM gencon GC algorithm.
The general expectation is that the used heap metrics will be below the max
heap metrics, while the CPU utilization chart will depict normal and steady resource
consumption. The GC Time chart should display low overhead and a short time spent
doing the garbage collection.
However, if the used heap metric frequently peaks close to the max heap metric, even
while operation remains steady, the overall heap in ES is insufficient and should be
increased.
Thread pool dashboards display real-time information about the worker threads and how they
operate on the cluster. The information is on a cluster level, but each thread group is shown per
ES node.
Thread pool operations queued indicates the number of tasks waiting to be executed,
while Thread pool threads active indicates the number of threads executing tasks. Both metrics
are essential for monitoring the health and performance of Elasticsearch.
The Thread pool operations queued metric indicates the number of tasks waiting to be
executed by the thread pool. This can happen when the number of tasks submitted to the thread
pool exceeds the maximum number of threads available to execute them. When this happens,
tasks are placed in a queue and are executed as soon as a thread becomes available.
The Thread pool threads active metric indicates the number of threads that are actively
executing tasks. When this number is close to the maximum number of threads available, it can
indicate that the system is under heavy load and may be experiencing performance issues.
You can further inspect the active threads by placing the mouse pointer over the graph at a
specific time to get the count of active threads at that specific time, as shown in image below.
Understanding the thread pools and their role in Elasticsearch operations is crucial to
troubleshoot problems:
1. Generic thread pool
This thread pool runs tasks that do not fit into any specialized thread pool. The generic
thread pool runs internal tasks within Elasticsearch, such as sending and receiving
network requests.
Each thread pool has its settings, such as the maximum number of threads and the queue size,
which can be configured to optimize performance based on the specific needs of your
Elasticsearch deployment.
The write thread pool and the search thread pool are the two most essential thread pools that
must be closely watched and managed. Depending on the workload level and workload mix,
appropriate settings for each thread pool may be required to control the workload flow and keep
the cluster in stable operation.
There are several internal caches in Elasticsearch, but the critical cache pools for Elasticsearch
operation are:
1. Field data cache
The field data cache is used to cache field values for frequently accessed fields, and it
helps to speed up sorting aggregations and scripted fields. The field data cache is
implemented as a soft reference cache, which means that the cache can be cleared by
the garbage collector when memory becomes scarce.
2. Query cache
The query cache is used to cache the results of frequently executed queries and helps
speed up search operations. The query cache is implemented as a Least Recently
Used (LRU) cache, which means that the least recently executed queries are evicted
from the cache when it becomes full.
The field data cache, for example, is cleared whenever an index refresh or index merging
operation is carried out, necessitating a fresh load of field values from disk into memory.
Whenever a reindexing operation is performed, such as when a new document is added to or
changed in the index, the query cache is invalidated entirely.
It is important to note that clearing the cache can cause a temporary slowdown in performance,
as the cache will need to be re-populated with new data.
The dashboard visualizes the number of operations performed on your Elasticsearch cluster,
broken down by index, search, get, and delete operations.
This can help you identify operations that are taking longer than expected and that may require
optimization.
The following section describes the straightforward workload tuning of the instance for creating
an index on Elasticsearch. Before attempting to tune live traffic, this will likely be the team's initial
tuning test.
Detecting and determining the root cause of Elasticsearch slowness when concurrent operations
are executed is complex and should be performed with extra care and preparation. You can
anticipate carrying out environment testing in the following situations:
1. As anticipated for the peak season, when the production workload is at capacity or at its
peak.
2. All the index-related operations that are expected to be executed during the peak
workload (index build, NRT updates to the index, inventory index updates, etc.).
3. Other operations do not directly affect the index or search data but can affect the overall
operations (for example, full cache clear after reindexing, etc.).
The case of flow congestion
Certain situations are frequently encountered when executing a mixed workload, such as funnel
congestion and performance degradation. Differing from the single type of operation, this is a
more complicated tuning exercise that observes and adjusts for insufficient resources and
balances those resources according to some priority. For example, the production search
request should precede the index build operations.
In a case where the slowness of Elasticsearch is due to a workload mix in which the search
cluster was indexing and serving live request data, the tuning becomes more complicated and
requires upfront decision-making regarding the priorities of the workload mix types.
In most cases, you need to prioritize one type of operation over another. In this example, it would
require selecting a higher priority for the Live searches and a lower priority for the index creation
on the Auth environment.
Elasticsearch has a write thread pool that handles indexing requests. By default, this pool has a
size of (number of CPU cores * 2) + 1. You can increase or decrease the size of this pool based
on the workload of your cluster.
To change the size of the write thread pool, you can use the thread_pool.write.size setting in
the Elasticsearch configuration file. For example, to increase the size of the write thread pool
to 16, you can add the following line to the configuration file:
thread_pool.write.size: 16
To change the size of the search thread pool, you can use the thread_pool.search.size setting
in the Elasticsearch configuration file. For example, to increase the size of the search thread pool
to 24, you can add the following line to the configuration file:
thread_pool.search.size: 24
Important: Changing the thread pool sizes should be done cautiously and after thorough testing.
Increasing the thread pool size too much can lead to resource exhaustion and cause
performance issues. It is recommended that you monitor the resource usage of your cluster after
changing the thread pool sizes and adjust them accordingly to achieve optimal performance.
Adjusting index settings can also effectively tune Elasticsearch for better performance. The
following parameters can be changed to enhance indexing performance:
Refresh interval
By default, Elasticsearch refreshes the search index every second. This means that
when a document is indexed, it will be immediately available for search at the next
refresh interval. If you have a high volume of indexing requests, consider increasing the
refresh interval to reduce the frequency of index refreshes and improve indexing
performance. Conversely, if you need real-time indexing, you can decrease the refresh
interval.
To adjust the refresh interval, use the index.refresh_interval setting. For example, to set
the refresh interval to 5 seconds, you can run the following command:
PUT /my_index/_settings
{
  "index" : {
    "refresh_interval" : "5s"
  }
}
Number of shards
The number of shards in an index can also affect indexing performance. If you have an
extensive index with many shards, indexing performance may be slower due to the
overhead of coordinating writes across multiple shards. In general, keeping the number
of shards per index between 1 and 5 is recommended.
Note that index.number_of_shards is a static setting: it can only be set when an index is
created, and changing it afterwards requires reindexing or the shrink/split APIs. For
example, to create an index with 3 shards, you can run the following command:
PUT /my_index
{
  "settings" : {
    "index" : {
      "number_of_shards" : 3
    }
  }
}
Tuning knobs
Elasticsearch has several configuration points that can be changed to optimize performance and
resource allocation. The following are some critical tuning knobs to consider.
Heap Size
The heap size is one of the most critical tuning parameters for Elasticsearch. It
determines the amount of memory allocated to Elasticsearch's JVM and affects various
operations, including caching, indexing, and search.
Note: It is recommended to allocate around 50% of available memory to the heap, up to a
maximum of 30 GB. Elasticsearch uses native memory for various caches and buffers in
intra- and inter-pod communications.
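As an illustration only (sizes depend on your footprint), a node with 64 GB of RAM might pin the Elasticsearch heap to 30 GB in jvm.options (or a file under jvm.options.d/), leaving the remainder for native memory and the OS file cache:

```
-Xms30g
-Xmx30g
```

Setting the minimum and maximum heap to the same value avoids resize pauses at runtime.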
Thread Pools
Elasticsearch uses various thread pools for different operations, such as indexing,
searching, and merging. You can tune the thread pool settings to control the number of
threads allocated for each type of operation and adjust the queue size for pending
requests. This can help balance the allocation of resources based on your specific
workload.
Circuit Breakers
Circuit breakers protect Elasticsearch against excessive memory usage or disk space
consumption. You can configure circuit breaker settings to control how Elasticsearch
handles resource limitations and prevent out-of-memory or disk space errors.
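For example, circuit breaker limits can be adjusted in elasticsearch.yml; the values below are illustrative only, not recommendations:

```yaml
# elasticsearch.yml -- illustrative values only
indices.breaker.total.limit: 70%       # parent breaker, as a share of the JVM heap
indices.breaker.fielddata.limit: 40%   # field data breaker
indices.breaker.request.limit: 60%     # per-request breaker
```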
Cache Size
Elasticsearch uses various caches, such as the field data and query cache, to improve
search performance. You can adjust the cache size settings to optimize memory usage
based on your query patterns and data size.
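Both caches are sized in elasticsearch.yml; the following values are an illustrative sketch:

```yaml
indices.fielddata.cache.size: 20%   # field data cache (unbounded by default)
indices.queries.cache.size: 10%     # query cache
```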
Query Caching
Elasticsearch supports query caching, which can improve query performance by caching
the results of frequently executed queries. To optimize performance, you can enable
query caching and adjust cache settings, such as the cache size and expiration time.
Hardware Resources
Elasticsearch's performance is heavily influenced by the hardware resources available.
Consider tuning the hardware configuration, such as CPU, memory, disk I/O, and network
settings, to match your workload requirements and optimize performance.
File Descriptors and Process Limits
Elasticsearch is a resource-intensive application requiring sufficient file descriptors and
process limits to function correctly. You can increase the maximum number of open file
descriptors and adjust process limits to accommodate the needs of your Elasticsearch
cluster.
Indexing Buffer Settings
Elasticsearch uses memory buffers to stage data before it is written to disk. You can tune
the indexing buffer sizes to optimize indexing performance.
The indices.memory.index_buffer_size setting controls the total buffer size, which is
shared across all shards on the node.
Query-Time Filters
Elasticsearch provides query-time filters that allow you to apply filters to a query. Using
filters can improve query performance by reducing the amount of data that needs to be
processed.
Refresh and Flush Intervals
Elasticsearch periodically refreshes its index to make new data searchable. You can
adjust the refresh interval to balance indexing performance and search latency.
Similarly, the flush interval controls how often Elasticsearch writes data from memory to
disk. Adjusting these intervals can impact indexing throughput and resource usage. An
Elasticsearch flush performs a Lucene commit and starts a new translog generation.
Flushes are performed automatically in the background to ensure the translog does not
grow too large, which would make replaying its operations take considerable time during
recovery.
Translog Durability
The translog is a transaction log that ensures data durability in case of node failures.
Adjust the translog durability settings to balance data safety and indexing performance.
For more information, see Translog.
By default, Elasticsearch uses the request durability mode, where the translog is fsynced
and committed after every index, delete, update, or bulk request. Setting
index.translog.durability to async instead syncs the translog periodically (every
index.translog.sync_interval), which improves indexing performance at the risk of losing
the most recent operations on node failure.
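For example, to trade some durability for indexing throughput during a bulk load, the translog can be switched to async mode on a per-index basis (my_index is a placeholder name; the values shown are illustrative):

```json
PUT /my_index/_settings
{
  "index": {
    "translog": {
      "durability": "async",
      "sync_interval": "30s"
    }
  }
}
```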
Aggregations
Elasticsearch provides powerful aggregation capabilities, but complex aggregations can
impact performance. You can tune aggregation settings, such
as search.max_buckets and indices.breaker.total.limit, to control the memory usage
and limit the number of buckets aggregations produce.
For more information, see Aggregations.
Shard Size
Each shard in Elasticsearch comes with some overhead, so having a large number of
small shards can impact performance. It is essential to balance the number of shards and
the size of each shard based on your data volume and hardware resources.
Shard Allocation
Shards are the basic units of data distribution in Elasticsearch. By default, Elasticsearch
tries to distribute shards evenly across nodes. However, you can control shard allocation
settings to ensure balanced resource usage and optimize cluster performance.
Shard Routing
Elasticsearch distributes shards across nodes based on a hashing algorithm. You can
influence shard routing by customizing the shard allocation process using shard
allocation filters and allocation awareness settings. This can help balance data
distribution and improve cluster performance.
Network Settings
Adjusting network settings, such as TCP configurations, can impact the performance and
responsiveness of Elasticsearch. To ensure efficient network communication, you can
optimize settings like TCP keep-alive, socket buffers, and connection timeouts.
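As an illustrative sketch, TCP-level behavior can be adjusted in elasticsearch.yml. These are standard Elasticsearch network settings (the values shown are the defaults); the right values depend on your environment:

```yaml
# elasticsearch.yml
network.tcp.keep_alive: true   # send TCP keep-alive probes on idle connections
network.tcp.no_delay: true     # disable Nagle's algorithm for lower latency
```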
Data Serialization and Compression
Elasticsearch allows configuring data serialization and compression options, such as
using a more efficient binary format (like SMILE or CBOR) or enabling compression for
network communication. These settings can improve storage efficiency and reduce
network overhead.
See Save space and money with improved storage efficiency in Elasticsearch 7.10 for
more information.
All these tuning knobs provide flexibility for optimizing Elasticsearch based on your specific
workload, hardware resources, and performance requirements. It is essential to carefully monitor
the impact of any changes and conduct performance testing to ensure optimal results.
Additionally, always refer to the official Elasticsearch documentation and consider the
recommendations provided by Elastic for tuning and optimization.
Thread pools
A node uses several thread pools to manage memory consumption. Queues associated with many of
the thread pools enable pending requests to be held instead of discarded.
There are several thread pools, but the important ones include:
generic
For generic operations (for example, background node discovery). Thread pool type is scaling.
search
For count/search/suggest operations. Thread pool type is fixed with a size of int((# of allocated
processors * 3) / 2) + 1, and queue_size of 1000.
search_throttled
For count/search/suggest/get operations on search_throttled indices. Thread pool type is fixed with
a size of 1, and queue_size of 100.
search_coordination
For lightweight search-related coordination operations. Thread pool type is fixed with a size of a max
of min(5, (# of allocated processors) / 2), and queue_size of 1000.
get
For get operations. Thread pool type is fixed with a size of int((# of allocated processors * 3) / 2) + 1,
and queue_size of 1000.
analyze
For analyze requests. Thread pool type is fixed with a size of 1, queue size of 16.
write
For single-document index/delete/update and bulk requests. Thread pool type is fixed with a size
of # of allocated processors, queue_size of 10000. The maximum size for this pool is 1 + # of
allocated processors.
snapshot
For snapshot/restore operations. Thread pool type is scaling with a keep-alive of 5m. On nodes with
at least 750MB of heap the maximum size of this pool is 10 by default. On nodes with less than
750MB of heap the maximum size of this pool is min(5, (# of allocated processors) / 2) by default.
snapshot_meta
For snapshot repository metadata read operations. Thread pool type is scaling with a keep-alive
of 5m and a max of min(50, (# of allocated processors* 3)).
warmer
For segment warm-up operations. Thread pool type is scaling with a keep-alive of 5m and a max
of min(5, (# of allocated processors) / 2).
refresh
For refresh operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(10, (# of
allocated processors) / 2).
fetch_shard_started
For listing shard states. Thread pool type is scaling with keep-alive of 5m and a default maximum size
of 2 * # of allocated processors.
fetch_shard_store
For listing shard stores. Thread pool type is scaling with keep-alive of 5m and a default maximum size
of 2 * # of allocated processors.
flush
For flush and translog fsync operations. Thread pool type is scaling with a keep-alive of 5m and a
default maximum size of min(5, (# of allocated processors) / 2).
force_merge
For force merge operations. Thread pool type is fixed with a size of max(1, (# of allocated processors)
/ 8) and an unbounded queue size.
management
For cluster management. Thread pool type is scaling with a keep-alive of 5m and a default maximum
size of 5.
system_read
For read operations on system indices. Thread pool type is fixed with a default maximum size
of min(5, (# of allocated processors) / 2).
system_write
For write operations on system indices. Thread pool type is fixed with a default maximum size
of min(5, (# of allocated processors) / 2).
system_critical_read
For critical read operations on system indices. Thread pool type is fixed with a default maximum size
of min(5, (# of allocated processors) / 2).
system_critical_write
For critical write operations on system indices. Thread pool type is fixed with a default maximum size
of min(5, (# of allocated processors) / 2).
watcher
For watch executions. Thread pool type is fixed with a default maximum size of min(5 * (# of
allocated processors), 50) and queue_size of 1000.
Thread pool settings are static and can be changed by editing elasticsearch.yml. Changing a specific
thread pool can be done by setting its type-specific parameters; for example, changing the number of
threads in the write thread pool:
thread_pool:
  write:
    size: 30
The following are the types of thread pools and their respective parameters:
fixed
The fixed thread pool holds a fixed size of threads to handle the requests with a queue (optionally
bounded) for pending requests that have no threads to service them.
The queue_size parameter controls the size of the queue of pending requests that have no
threads to execute them. By default, it is set to -1, which means it is unbounded. When a
request comes in and the queue is full, the request is rejected.
thread_pool:
  write:
    size: 30
    queue_size: 1000
scaling
The scaling thread pool holds a dynamic number of threads. This number is proportional to the
workload and varies between the value of the core and max parameters.
The keep_alive parameter determines how long a thread should be kept around in the thread pool
without it doing any work.
thread_pool:
  warmer:
    core: 1
    max: 8
    keep_alive: 2m
The number of processors is automatically detected, and the thread pool settings are automatically
set based on it. In some cases it can be useful to override the number of detected processors. This
can be done by explicitly setting the node.processors setting. This setting is bounded by the number
of available processors and accepts floating point numbers, which can be useful in environments
where the Elasticsearch nodes are configured to run with CPU limits, such as cpu shares or quota
under Cgroups.
node.processors: 2
There are a few use-cases for explicitly overriding the node.processors setting:
1. If you are running multiple instances of Elasticsearch on the same host but want
Elasticsearch to size its thread pools as if it only has a fraction of the CPU, you should
override the node.processors setting to the desired fraction, for example, if you’re running
two instances of Elasticsearch on a 16-core machine, set node.processors to 8. Note that this
is an expert-level use case and there’s a lot more involved than just setting
the node.processors setting as there are other considerations like changing the number of
garbage collector threads, pinning processes to cores, and so on.
2. Sometimes the number of processors is wrongly detected and in such cases explicitly setting
the node.processors setting will workaround such issues.
In order to check the number of processors detected, use the nodes info API with the os flag.
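For example, the detected processor counts can be inspected with:

```json
GET /_nodes/os
```

The response includes os.available_processors and os.allocated_processors for each node.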
Elasticsearch contains multiple circuit breakers used to prevent operations from causing an
OutOfMemoryError. Each breaker specifies a limit for how much memory it can use. Additionally,
there is a parent-level breaker that specifies the total amount of memory that can be used across all
breakers.
Except where noted otherwise, these settings can be dynamically updated on a live cluster with
the cluster-update-settings API.
For information about circuit breaker errors, see Circuit breaker errors.
indices.breaker.total.use_real_memory
(Static) Determines whether the parent breaker should take real memory usage into account (true)
or only consider the amount that is reserved by child circuit breakers (false). Defaults to true.
indices.breaker.total.limit
(Dynamic) Starting limit for overall parent breaker. Defaults to 70% of JVM heap
if indices.breaker.total.use_real_memory is false. If indices.breaker.total.use_real_memory is true,
defaults to 95% of the JVM heap.
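Because this is a dynamic setting, the parent breaker limit can be adjusted on a live cluster via the cluster settings API; the value shown is illustrative, not a recommendation:

```json
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "80%"
  }
}
```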
The field data circuit breaker estimates the heap memory required to load a field into the field data
cache. If loading the field would cause the cache to exceed a predefined memory limit, the circuit
breaker stops the operation and returns an error.
indices.breaker.fielddata.limit
(Dynamic) Limit for the fielddata breaker. Defaults to 40% of JVM heap.
indices.breaker.fielddata.overhead
(Dynamic) A constant that all field data estimations are multiplied with to determine a final
estimation. Defaults to 1.03.
The request circuit breaker allows Elasticsearch to prevent per-request data structures (for example,
memory used for calculating aggregations during a request) from exceeding a certain amount of
memory.
indices.breaker.request.limit
(Dynamic) Limit for the request breaker. Defaults to 60% of JVM heap.
indices.breaker.request.overhead
(Dynamic) A constant that all request estimations are multiplied with to determine a final estimation.
Defaults to 1.
In flight requests circuit breaker
The in flight requests circuit breaker allows Elasticsearch to limit the memory usage of all currently
active incoming requests on transport or HTTP level from exceeding a certain amount of memory on
a node. The memory usage is based on the content length of the request itself. This circuit breaker
also considers that memory is not only needed for representing the raw request but also as a
structured object which is reflected by default overhead.
network.breaker.inflight_requests.limit
(Dynamic) Limit for in flight requests breaker, defaults to 100% of JVM heap. This means that it is
bound by the limit configured for the parent circuit breaker.
network.breaker.inflight_requests.overhead
(Dynamic) A constant that all in flight requests estimations are multiplied with to determine a final
estimation. Defaults to 2.
The accounting circuit breaker allows Elasticsearch to limit the memory usage of things held in
memory that are not released when a request is completed. This includes things like the Lucene
segment memory.
indices.breaker.accounting.limit
(Dynamic) Limit for accounting breaker, defaults to 100% of JVM heap. This means that it is bound by
the limit configured for the parent circuit breaker.
indices.breaker.accounting.overhead
(Dynamic) A constant that all accounting estimations are multiplied with to determine a final
estimation. Defaults to 1.
Slightly different than the previous memory-based circuit breaker, the script compilation circuit
breaker limits the number of inline script compilations within a period of time.
See the "prefer-parameters" section of the scripting documentation for more information.
script.max_compilations_rate
(Dynamic) Limit for the number of unique dynamic scripts within a certain interval that are allowed
to be compiled. Defaults to 150/5m, meaning 150 every 5 minutes.
If the cluster regularly hits the given max_compilations_rate, the script cache may be
undersized. Use the nodes stats API to inspect the number of recent cache evictions
(script.cache_evictions_history) and compilations (script.compilations_history). If either
number is large, consider doubling the size of the script cache via the
script.cache.max_size setting.
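A sketch of enlarging the general script cache in elasticsearch.yml; the value is illustrative, and in recent versions the cache may instead be sized per script context, so check the defaults for your version:

```yaml
# elasticsearch.yml
script.cache.max_size: 200   # example value, not a recommendation
```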
Poorly written regular expressions can degrade cluster stability and performance. The regex circuit
breaker limits the use and complexity of regex in Painless scripts.
script.painless.regex.enabled
limited (Default)
Enables regex but limits complexity using the script.painless.regex.limit-factor cluster setting.
true
Enables regex with no complexity limits. Disables the regex circuit breaker.
false
Disables regex. Any Painless script containing a regular expression returns an error.
script.painless.regex.limit-factor
(Static) Limits the number of characters a regular expression in a Painless script can consider.
Elasticsearch calculates this limit by multiplying the setting value by the script input’s character
length.
When a sequence query is executed, the node handling the query needs to keep some structures in
memory, which are needed by the algorithm implementing the sequence matching. When large
amounts of data need to be processed, and/or a large amount of matched sequences is requested by
the user (by setting the size query param), the memory occupied by those structures could
potentially exceed the available memory of the JVM. This would cause an OutOfMemory exception
which would bring down the node.
To prevent this from happening, a special circuit breaker is used, which limits the memory allocation
during the execution of a sequence query. When the breaker is triggered,
an org.elasticsearch.common.breaker.CircuitBreakingException is thrown and a descriptive error
message is returned to the user.
breaker.eql_sequence.limit
(Dynamic) The limit for circuit breaker used to restrict the memory utilisation during the execution of
an EQL sequence query. This value is defined as a percentage of the JVM heap. Defaults to 50%. If
the parent circuit breaker is set to a value less than 50%, this setting uses that value as its default
instead.
breaker.eql_sequence.overhead
(Dynamic) A constant that sequence query memory estimates are multiplied by to determine a final
estimate. Defaults to 1.
breaker.eql_sequence.type
(Static) Circuit breaker type. Valid values are:
memory (Default)
noop
breaker.model_inference.limit
(Dynamic) The limit for the trained model circuit breaker. This value is defined as a percentage of the
JVM heap. Defaults to 50%. If the parent circuit breaker is set to a value less than 50%, this setting
uses that value as its default instead.
breaker.model_inference.overhead
(Dynamic) A constant that all trained model estimations are multiplied by to determine a final
estimation. See Circuit breaker settings. Defaults to 1.
breaker.model_inference.type
(Static) The underlying type of the circuit breaker. There are two valid
options: noop and memory. noop means the circuit breaker does nothing to prevent too much
memory usage. memory means the circuit breaker tracks the memory used by trained models and
can potentially break and prevent OutOfMemory errors. The default value is memory.
Cache size
The entries in the cache are expensive to build, so the default behavior is to keep the cache
loaded in memory. The default cache size is unlimited, causing the cache to grow until it
reaches the limit set by the field data circuit breaker. This behavior can be configured.
If the cache size limit is set, the cache will begin clearing the least-recently-updated entries in
the cache. This setting can automatically avoid the circuit breaker limit, at the cost of
rebuilding the cache as needed.
If the circuit breaker limit is reached, further requests that increase the cache size will be
prevented. In this case you should manually clear the cache.
indices.fielddata.cache.size
(Static) The max size of the field data cache, e.g. 38% of node heap space, or an absolute
value, e.g. 12GB. Defaults to unbounded. If you choose to set it, it should be smaller than the
field data circuit breaker limit.
Query cache
Term queries and queries used outside of a filter context are not eligible for caching.
By default, the cache holds a maximum of 10000 queries in up to 10% of the total heap
space. To determine whether a query is eligible for caching, Elasticsearch maintains a query
history to track occurrences.
Caching is done on a per segment basis if a segment contains at least 10000 documents and
the segment has at least 3% of the total documents of a shard. Because caching is per
segment, merging segments can invalidate cached queries.
The following setting is static and must be configured on every data node in the cluster:
indices.queries.cache.size
(Static) Controls the memory size for the filter cache. Accepts either a percentage value,
like 5%, or an exact value, like 512mb. Defaults to 10%.
index.queries.cache.enabled
(Static) Controls whether to enable query caching. Accepts true (default) or false.
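For example, query caching can be disabled for a specific index at creation time (my_index is a placeholder name):

```json
PUT /my_index
{
  "settings": {
    "index.queries.cache.enabled": false
  }
}
```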
The following settings are static and must be configured on every data node in the cluster:
indices.memory.index_buffer_size
(Static) Accepts either a percentage or a byte size value. It defaults to 10%, meaning
that 10% of the total heap allocated to a node will be used as the indexing buffer size
shared across all shards.
indices.memory.min_index_buffer_size
(Static) If the index_buffer_size is specified as a percentage, then this setting can
be used to specify an absolute minimum. Defaults to 48mb.
indices.memory.max_index_buffer_size
(Static) If the index_buffer_size is specified as a percentage, then this setting can
be used to specify an absolute maximum. Defaults to unbounded.
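A sketch combining these settings in elasticsearch.yml for a write-heavy node; the values are illustrative, not recommendations:

```yaml
# elasticsearch.yml
indices.memory.index_buffer_size: 20%     # share of heap for the indexing buffer
indices.memory.min_index_buffer_size: 96mb  # absolute floor when a percentage is used
```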
Similarly to sizing bulk requests, only testing can tell what the optimal number of workers is.
This can be tested by progressively increasing the number of workers until either I/O or CPU
is saturated on the cluster.
By default, Elasticsearch periodically refreshes indices every second, but only on indices that
have received one search request or more in the last 30 seconds.
This is the optimal configuration if you have no or very little search traffic (e.g. less than one
search request every 5 minutes) and want to optimize for indexing speed. This behavior aims
to automatically optimize bulk indexing in the default case when no searches are performed.
In order to opt out of this behavior set the refresh interval explicitly.
On the other hand, if your index experiences regular search requests, this default behavior
means that Elasticsearch will refresh your index every 1 second. If you can afford to increase
the amount of time between when a document gets indexed and when it becomes visible,
increasing the index.refresh_interval to a larger value, e.g. 30s, might help improve
indexing speed.
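For example, the refresh interval can be relaxed on an index during a bulk load and reset afterwards (my_index is a placeholder name; setting the value to null restores the default):

```json
PUT /my_index/_settings
{ "index": { "refresh_interval": "30s" } }
```

After the bulk load, restore the default with `{ "index": { "refresh_interval": null } }`; setting the interval to `-1` disables periodic refreshes entirely.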
Disable swapping
You should make sure that the operating system is not swapping out the Java process
by disabling swapping.
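One common approach, assuming a Linux host, is to lock the Elasticsearch process memory in elasticsearch.yml (this also requires raising the memlock ulimit for the Elasticsearch user):

```yaml
# elasticsearch.yml
bootstrap.memory_lock: true
```

Alternatively, swap can be disabled entirely with `sudo swapoff -a`, or its use minimized by setting the kernel parameter `vm.swappiness=1`.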
Directly-attached (local) storage generally performs better than remote storage because it is
simpler to configure well and avoids communications overheads. With careful tuning it is
sometimes possible to achieve acceptable performance using remote storage too. Benchmark
your system with a realistic workload to determine the effects of any tuning parameters. If
you cannot achieve the performance you expect, work with the vendor of your storage system
to identify the problem.
The default is 10% which is often plenty: for example, if you give the JVM 10GB of memory,
it will give 1GB to the index buffer, which is enough to host two shards that are heavily
indexing.
Index modules
Index Modules are modules created per index and control all aspects related to an index.
Index settings
static
They can only be set at index creation time or on a closed index, or by using the update-index-
settings API with the reopen query parameter set to true (which automatically closes and reopens
impacted indices).
dynamic
They can be changed on a live index using the update-index-settings API.
Warning: Changing static or dynamic index settings on a closed index could result in incorrect
settings that are impossible to rectify without deleting and recreating the index.
Below is a list of all static index settings that are not associated with any specific index module:
index.number_of_shards
The number of primary shards that an index should have. Defaults to 1. This setting can only be set at
index creation time. It cannot be changed on a closed index.
The number of shards per index is limited to 1024. This is a safety limit to prevent
accidental creation of indices that can destabilize a cluster due to resource allocation. The limit
can be modified by setting the es.index.max_number_of_shards system property (for example,
export ES_JAVA_OPTS="-Des.index.max_number_of_shards=128") on every node that is part of
the cluster.
index.number_of_routing_shards
Elasticsearch uses this value when splitting an index. For example, a 5 shard index
with number_of_routing_shards set to 30 (5 x 2 x 3) could be split by a factor of 2 or 3. In other
words, it could be split as follows:
5 → 10 → 30 (split by 2, then by 3)
5 → 15 → 30 (split by 3, then by 2)
5 → 30 (split by 6)
This setting’s default value depends on the number of primary shards in the index. The default is
designed to allow you to split by factors of 2 up to a maximum of 1024 shards.
In Elasticsearch 7.0.0 and later versions, this setting affects how documents are distributed across
shards. When reindexing an older index with custom routing, you must explicitly
set index.number_of_routing_shards to maintain the same document distribution. See the related
breaking change.
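For example, a 5-shard index with number_of_routing_shards set to 30 could be split to 10 shards with the split API (index names are placeholders; the source index must first be made read-only, e.g. by setting index.blocks.write to true):

```json
POST /my_index/_split/my_index_10
{
  "settings": {
    "index.number_of_shards": 10
  }
}
```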
index.codec
The default value compresses stored data with LZ4 compression, but this can be set
to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower
stored fields performance. If you are updating the compression type, the new one will be applied
after segments are merged. Segment merging can be forced using force merge.
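For example, an index that favors storage efficiency over stored-fields read speed could be created as follows (my_index is a placeholder name):

```json
PUT /my_index
{
  "settings": {
    "index.codec": "best_compression"
  }
}
```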
index.routing_partition_size
The number of shards a custom routing value can go to. Defaults to 1 and can only be set at index
creation time. This value must be less than the index.number_of_shards unless
the index.number_of_shards value is also 1. See Routing to an index partition for more details about
how this setting is used.
index.soft_deletes.enabled
[7.6.0] Deprecated in 7.6.0. Creating indices with soft-deletes disabled is deprecated and will be
removed in future Elasticsearch versions. Indicates whether soft deletes are enabled on the index.
Soft deletes can only be configured at index creation and only on indices created on or after
Elasticsearch 6.5.0. Defaults to true.
index.soft_deletes.retention_lease.period
The maximum period to retain a shard history retention lease before it is considered expired. Shard
history retention leases ensure that soft deletes are retained during merges on the Lucene index. If a
soft delete is merged away before it can be replicated to a follower the following process will fail due
to incomplete history on the leader. Defaults to 12h.
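Since this is a dynamic setting, it can be raised on a live index, for example when a follower replicates slowly. A sketch with an illustrative index name and value:

```shell
# Extend the shard history retention lease period from the 12h default
# so soft deletes survive longer before being merged away.
curl -X PUT "localhost:9200/my-index/_settings?pretty" -H 'Content-Type: application/json' -d'
{ "index.soft_deletes.retention_lease.period": "24h" }'
```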
index.load_fixed_bitset_filters_eagerly
Indicates whether cached filters are pre-loaded for nested queries. Possible values are true (default)
and false.
index.shard.check_on_startup
Expert users only. This setting enables some very expensive processing at shard startup and is only
ever useful while diagnosing a problem in your cluster. If you do use it, you should do so only
temporarily and remove it once it is no longer needed.
Elasticsearch automatically performs integrity checks on the contents of shards at various points
during their lifecycle. For instance, it verifies the checksum of every file transferred when recovering
a replica or taking a snapshot. It also verifies the integrity of many important files when opening a
shard, which happens when starting up a node and when finishing a shard recovery or relocation.
You can therefore manually verify the integrity of a whole shard while it is running by taking a
snapshot of it into a fresh repository or by recovering it onto a fresh node.
This setting determines whether Elasticsearch performs additional integrity checks while opening a
shard. If these checks detect corruption then they will prevent the shard from being opened. It
accepts the following values:
false
Don’t perform additional checks for corruption when opening a shard. This is the default and
recommended behaviour.
checksum
Verify that the checksum of every file in the shard matches its contents. This will detect cases where
the data read from disk differ from the data that Elasticsearch originally wrote, for instance due to
undetected disk corruption or other hardware failures. These checks require reading the entire shard
from disk which takes substantial time and IO bandwidth and may affect cluster performance by
evicting important data from your filesystem cache.
true
Performs the same checks as checksum and also checks for logical inconsistencies in the shard, which
could for instance be caused by the data being corrupted while it was being written due to faulty
RAM or other hardware failures. These checks require reading the entire shard from disk which takes
substantial time and IO bandwidth, and then performing various checks on the contents of the shard
which take substantial time, CPU and memory.
Below is a list of all dynamic index settings that are not associated with any specific index module:
index.number_of_replicas
The number of replicas each primary shard has. Defaults to 1.
index.auto_expand_replicas
Auto-expand the number of replicas based on the number of data nodes in the cluster. Set to a dash
delimited lower and upper bound (e.g. 0-5) or use all for the upper bound (e.g. 0-all). Defaults
to false (i.e. disabled). Note that the auto-expanded number of replicas only takes allocation
filtering rules into account, but ignores other allocation rules such as total shards per node, and this
can lead to the cluster health becoming YELLOW if the applicable rules prevent all the replicas from
being allocated.
index.search.idle.after
How long a shard can not receive a search or get request until it’s considered search idle. (default
is 30s)
index.refresh_interval
How often to perform a refresh operation, which makes recent changes to the index visible to search.
Defaults to 1s. Can be set to -1 to disable refresh. If this setting is not explicitly set, shards that
haven’t seen search traffic for at least index.search.idle.after seconds will not receive background
refreshes until they receive a search request. Searches that hit an idle shard where a refresh is
pending will trigger a refresh as part of the search operation for that shard only. This behavior aims
to automatically optimize bulk indexing in the default case when no searches are performed. To
opt out of this behavior, an explicit value of 1s should be set as the refresh interval.
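A common ingest optimization, consistent with the tuning advice later in this guide, is to disable refresh while bulk-building an index and restore it afterwards. A sketch with an illustrative index name:

```shell
# Disable refresh during the bulk index build.
curl -X PUT "localhost:9200/my-index/_settings?pretty" -H 'Content-Type: application/json' -d'
{ "index.refresh_interval": "-1" }'

# ... run the bulk ingest ...

# Restore the default refresh interval when the build completes.
curl -X PUT "localhost:9200/my-index/_settings?pretty" -H 'Content-Type: application/json' -d'
{ "index.refresh_interval": "1s" }'
```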
index.max_result_window
The maximum value of from + size for searches to this index. Defaults to 10000. Search requests take
heap memory and time proportional to from + size and this limits that memory. See Scroll or Search
After for a more efficient alternative to raising this.
index.max_inner_result_window
The maximum value of from + size for inner hits definition and top hits aggregations to this index.
Defaults to 100. Inner hits and top hits aggregation take heap memory and time proportional to from
+ size and this limits that memory.
index.max_rescore_window
The maximum value of window_size for rescore requests in searches of this index. Defaults
to index.max_result_window which defaults to 10000. Search requests take heap memory and time
proportional to max(window_size, from + size) and this limits that memory.
index.max_docvalue_fields_search
The maximum number of docvalue_fields that are allowed in a query. Defaults to 100. Doc-value
fields are costly since they might incur a per-field per-document seek.
index.max_script_fields
The maximum number of script_fields that are allowed in a query. Defaults to 32.
index.max_ngram_diff
The maximum allowed difference between min_gram and max_gram for NGramTokenizer and
NGramTokenFilter. Defaults to 1.
index.max_shingle_diff
The maximum allowed difference between max_shingle_size and min_shingle_size for the shingle
token filter. Defaults to 3.
index.max_refresh_listeners
Maximum number of refresh listeners available on each shard of the index. These listeners are used
to implement refresh=wait_for.
index.analyze.max_token_count
The maximum number of tokens that can be produced using _analyze API. Defaults to 10000.
index.highlight.max_analyzed_offset
The maximum number of characters that will be analyzed for a highlight request. This setting is only
applicable when highlighting is requested on a text that was indexed without offsets or term vectors.
Defaults to 1000000.
index.max_terms_count
The maximum number of terms that can be used in Terms Query. Defaults to 65536.
index.max_regex_length
The maximum length of regex that can be used in Regexp Query. Defaults to 1000.
index.query.default_field
(string or array of strings) Wildcard (*) patterns matching one or more fields. The following query
types search these matching fields by default:
Multi-match
Query string
Defaults to *, which matches all fields eligible for term-level queries, excluding metadata fields.
index.routing.allocation.enable
Controls shard allocation for this index. Possible values are all (default), primaries, new_primaries,
and none.
index.routing.rebalance.enable
Enables shard rebalancing for this index. Possible values are all (default), primaries, replicas,
and none.
index.gc_deletes
The length of time that a deleted document’s version number remains available for further versioned
operations. Defaults to 60s.
index.default_pipeline
Default ingest pipeline for the index. Index requests will fail if the default pipeline is set and the
pipeline does not exist. The default may be overridden using the pipeline parameter. The special
pipeline name _none indicates no default ingest pipeline will run.
index.final_pipeline
Final ingest pipeline for the index. Indexing requests will fail if the final pipeline is set and the
pipeline does not exist. The final pipeline always runs after the request pipeline (if specified) and the
default pipeline (if it exists). The special pipeline name _none indicates no final ingest pipeline will
run.
You can’t use a final pipeline to change the _index field. If the pipeline attempts to change
the _index field, the indexing request will fail.
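Both pipeline settings are dynamic, so they can be attached to an existing index. A sketch with illustrative index and pipeline names (both pipelines must already exist, or indexing requests will fail):

```shell
# Route all index requests through a default pipeline, and always run a
# final pipeline afterwards.
curl -X PUT "localhost:9200/my-index/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index.default_pipeline": "my-default-pipeline",
  "index.final_pipeline": "my-final-pipeline"
}'
```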
index.hidden
Indicates whether the index should be hidden by default. Hidden indices are not returned by default
when using a wildcard expression. This behavior is controlled per request through the use of
the expand_wildcards parameter. Possible values are true and false (default).
Analysis
Settings to define analyzers, tokenizers, token filters and character filters.
Index shard allocation
Control over where, when, and how shards are allocated to nodes.
Mapping
Enable or disable dynamic mapping for an index.
Merging
Control over how shards are merged by the background merge process.
Similarities
Configure custom similarity settings to customize how search results are scored.
Slowlog
Control over how slow queries and fetch requests are logged.
Store
Configure the type of filesystem used to access shard data.
Translog
Control over the transaction log and background flush operations.
History retention
Control over the retention of a history of operations in the index.
Indexing pressure
Configure indexing back pressure limits.
Translog
Changes to Lucene are only persisted to disk during a Lucene commit, which is a relatively
expensive operation and so cannot be performed after every index or delete operation.
Changes that happen after one commit and before another will be removed from the index by
Lucene in the event of process exit or hardware failure.
Lucene commits are too expensive to perform on every individual change, so each shard copy
also writes operations into its transaction log known as the translog. All index and delete
operations are written to the translog after being processed by the internal Lucene index but
before they are acknowledged. In the event of a crash, recent operations that have been
acknowledged but not yet included in the last Lucene commit are instead recovered from the
translog when the shard recovers.
An Elasticsearch flush is the process of performing a Lucene commit and starting a new
translog generation. Flushes are performed automatically in the background in order to make
sure the translog does not grow too large, which would make replaying its operations take a
considerable amount of time during recovery. The ability to perform a flush manually is also
exposed through an API, although this is rarely needed.
Translog settings
The data in the translog is only persisted to disk when the translog is fsynced and committed.
In the event of a hardware failure or an operating system crash or a JVM crash or a shard
failure, any data written since the previous translog commit will be lost.
The following dynamically updatable per-index settings control the behaviour of the translog:
index.translog.sync_interval
How often the translog is fsynced to disk and committed, regardless of write operations.
Defaults to 5s. Values less than 100ms are not allowed.
index.translog.durability
Whether or not to fsync and commit the translog after every index, delete, update, or
bulk request. This setting accepts the following parameters:
request
(default) fsync and commit after every request. In the event of hardware failure, all
acknowledged writes will already have been committed to disk.
async
fsync and commit in the background every sync_interval. In the event of a failure, all
acknowledged writes since the last automatic commit will be discarded.
index.translog.flush_threshold_size
The translog stores all operations that are not yet safely persisted in Lucene (i.e., are not part
of a Lucene commit point). Although these operations are available for reads, they will need
to be replayed if the shard was stopped and had to be recovered. This setting controls the
maximum total size of these operations, to prevent recoveries from taking too long. Once the
maximum size has been reached a flush will happen, generating a new Lucene commit point.
Defaults to 512mb.
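During a heavy index build, these translog settings are sometimes relaxed to trade durability for throughput. A sketch with an illustrative index name and values; note that with async durability, acknowledged writes since the last background fsync can be lost on failure:

```shell
# Fsync in the background instead of on every request, fsync less often,
# and allow the translog to grow larger before triggering a flush.
curl -X PUT "localhost:9200/my-index/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index.translog.durability": "async",
  "index.translog.sync_interval": "30s",
  "index.translog.flush_threshold_size": "1gb"
}'
```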
NiFi can ingest information faster than Elasticsearch can index it. It is therefore possible for NiFi
to overwhelm Elasticsearch with data, which can result in performance degradation. If you do not
have separate Auth and Live environments, your production search experience may be affected.
The solution is NiFi tuning. If the rate at which NiFi ingests data is less than or equal to the rate
at which Elasticsearch consumes it, then there is no performance degradation. This is a
straightforward solution in theory, however, each HCL Commerce Search environment is
different. Therefore, tuning has to be done on an individual basis, specifically for each
implementation. To ensure that you are able to do this, HCL Commerce provides the following
guidelines, methods and parameter settings.
General approach
Tuning NiFi to match your Elasticsearch throughput involves adding additional configuration
options to NiFi tuning parameters. These configuration points are provided in the following
documents. In addition to the tuning parameters themselves, a method for calculating
appropriate values for these tuning parameters is provided. This aids you in knowing what to
tune and how to tune it.
It is useful to break the tuning process down into two clear steps:
1. Tuning:
o By adding additional upgrade-friendly configuration for NiFi tuning parameters,
o By publicly documenting these configuration points,
o By privately documenting a method for calculating sane values for these tuning
parameters.
2. Automation:
o By adding an endpoint to the Ingest service so that it can analyze the historical ingest
data and calculate new tuning values.
The automation phase includes adding an endpoint to the Ingest service. This endpoint
analyzes historical ingest data and calculates new tuning values. Another endpoint assigns
these new tuning values to the relevant ingest pipelines. This automation streamlines the
process and ensures accurate tuning based on actual data.
The Ingest dataflow consists of multiple business processing stages, linked together one after
another. Each stage is a stream of data moving from one location (the database) to another
(Elasticsearch). Each dataflow involves three main ETL operations: Extracting, Transforming, and
Loading.
Each operation can be controlled by the following tuning parameters:
o Extracting uses page size and bucket size to determine the size of each data scroll and
of the resulting flow files.
o Transforming uses the number of concurrent tasks (threads) assigned to the
transformation processors.
o Loading uses the bucket size and the number of concurrent tasks for the bulk update
processors that send documents to Elasticsearch.
The tuning goal of Ingest dataflow is to obtain environment-specific tuning settings optimized for
each stage with the least overall Ingest elapsed time. It is recommended that you attempt to
satisfy certain assumptions when performing tuning to obtain more reliable results: use the
heaviest ingest run, such as the re-index connector, for tuning estimation, and only allow one
exclusive re-indexing operation to run in NiFi at any time.
When and how to tune your Ingest pipelines
Tuning Ingest dataflow is necessary to avoid overloading or idling the system. An
overloaded or underused system can experience performance issues.
Metrics for tuning NiFi
Factors to consider when calculating tuning parameters for each stage and the metrics to
collect from each stage. The importance of monitoring resource utilization is emphasized.
NiFi parameter tuning
The text provides formulas and suggested values for tuning parameters related to data
extraction, transformation, and bulk service in a given pipeline. These parameters are
calculated based on various factors and can help optimize the performance of the
system. Sample tests are provided to illustrate the application of these tuning
parameters.
Recommended Parameters for NiFi and Elasticsearch
You can run your Elasticsearch and NiFi environments using the default settings, which
provides a minimal resource set. For best performance, tune your configuration or use
the recommended parameters for CPU, memory and system resources.
Related tasks
Custom NiFi processors
Related reference
Ingest Store index pipeline
Ingest Store index schema
Ingest Catalog index pipeline
Ingest Catalog index schema
Ingest Product index pipeline
Ingest Product index schema
Ingest Category index pipeline
Ingest Category index schema
Ingest Attribute index pipeline
Ingest Attribute index schema
Ingest Inventory index pipeline
Ingest Inventory index schema
Ingest Price index pipeline
Ingest Price index schema
Ingest URL index pipeline
Ingest URL index schema
Ingest Synonym index pipeline
Ingest Stopword index pipeline
Index field type aliases and usage
Search issues
Logging and troubleshooting the Ingest and Query services
Elasticsearch index schema changes
Tuning
Generally, default settings will work for this process, but it may take a long time to finish the
indexing process for a large data set. Depending on the data set size and hardware configuration
(memory, CPU, disk, and network), improvements to the subgroups and process groups can be
made so process flows are faster and more efficient within subgroups, and between connected
subgroups.
Enter the SCROLL SQL group, then right click on the base canvas and select variables.
The scroll.page.size is the number of rows fetched from database by the SQL.
The scroll.bucket.size is the number of rows from the fetched data in each
bucket for processing. The bucket.size will determine the size of the flow files
(and the number of documents contained within each file).
• Depending on the catalog size, the SQL can take a long time to get response
data back to the NiFi. The purpose of the scroll SQL is to limit the data size that
can be processed in NiFi at once, to avoid memory errors on large catalogs.
• The scroll settings are optimal when the time that it takes to process the data is
evenly matched with the amount of time that the next SQL scroll takes to receive
its data. With this optimization, unnecessary processing or I/O delay is minimized.
• The output from one subgroup is handed in turn to the next connected process
subgroup. This process must be audited, to ensure that there are no bottlenecks
which can impact the efficiency of the overall process.
In the process group, the data set is fetched from the database by using a single SQL
stream. For example, the Find Associations at Database processor in Product Stage 1b.
Enter the Processor group that we are optimizing, right click on the base canvas, and
select variables.
The scroll.bucket.size is the number of rows from the fetched data that is placed in
each bucket for processing. The bucket.size will determine the size of the flow files (and
the number of documents contained within each file).
Since the index build is a staged process, some information may be added to existing
index documents. In this case, NiFi needs to fetch data from Elasticsearch. Let us use the
URL Stage 2 as the example.
Enter the SCROLL Elasticsearch group, right click on the base canvas, and
select variables.
Change scroll.bucket.size and scroll.page.size to values that you want, based on the
following considerations:
The scroll.page.size is the number of documents that are fetched from Elasticsearch. If
the number is too small, NiFi must make more connections to Elasticsearch.
The scroll.bucket.size is the number of documents from the fetched data in each bucket
for processing. The bucket.size will determine the size of the flow files (and the number
of documents contained within each file).
Another parameter that is useful for tuning is scroll.duration. This value defines the
amount of time that Elasticsearch will store the query result set in memory. This
parameter is useful when dealing with many stores and languages running in parallel,
where a running out of scroll error can be encountered. This error indicates that you are
running out of scroll space, and reducing the scroll duration will force Elasticsearch to
free older or obsolete buffers faster. Inversely, increasing the scroll duration in
Elasticsearch for that index will provide extra time to complete processing operations.
Enter the Create Product Document from Database, right click Create Product
Document from Database and select configure. Under the SCHEDULING tab, update
the Concurrent Tasks value to set the number of threads that will be used for the
process group. When increasing the number of concurrent tasks, the memory usage for
the process group is also increased accordingly. Therefore, setting this value to a
number that is greater than the number of CPUs that are allocated to the pod, or beyond
the amount of memory that is allocated to the pod may not make sense, and can have a
negative impact on performance.
The Post Bulk Elasticsearch processor sends the created index documents to
Elasticsearch. By default, Elasticsearch will use the same number of CPUs as the
number of connections. Considering possible delays or pooling, the number that is
set for the Post Bulk Elasticsearch processor may be larger than the number of CPUs that
are allocated to the Elasticsearch pod.
Do not multithread processors that use scrolling to get data from the database or
Elasticsearch. Since the scrolling approach is used to batch the data in sizes that are
the best fit, multithreading would have a negative impact on the overall system
processing efficiency.
Consider increases to bucket sizes for multithreaded processors that send bulk
updates to Elasticsearch, or perform single reads from Elasticsearch.
More concurrent tasks/threads naturally consume more memory and vCPU
resources.
If the cost of memory garbage collection is high, you may need to reduce the
concurrent number of threads, or add additional memory resources.
• Monitor all servers (Database, NiFi, and Elasticsearch) during the ingest
process to find the bottleneck in the pipeline.
Flow file size tuning is very visible and impactful on the system.
Large flow files have several negative side effects:
o They demand a large memory heap for NiFi.
o They require a matched funnel on Elasticsearch, to accept the data as it
comes over.
NiFi GC overhead may become prohibitively high, or NiFi can run out of heap
space with an Out of Memory error.
For example, the following link has 451 queued items for the process Analyze
Successful SQL Response.
Back pressure is a configuration threshold that controls the overall data streaming speed.
This threshold indicates how much data should be allowed to exist in the queue before
the component (Processor or Processor Group) that is producing the data in the queue is
no longer scheduled to run. This is designed to avoid the system from being overrun with
data in motion.
Back Pressure Object Threshold - This is the number of objects that can be in
the queue before back pressure control is applied.
Back Pressure Size Threshold - This specifies the size of the objects that can
be in the queue before back pressure control is applied.
If you usually work with documents, use the back pressure object threshold to control the
back pressure. To configure it, right click the link and select View Configuration.
The Back Pressure Object Threshold and Back Pressure Size Threshold can be
adjusted from their default values.
Monitoring should always start with operating system resources and their utilization. Identify
whether a resource is saturated at the system level, such as CPU (processor utilization), IO
(network, disk, or memory), memory, and so on.
This is the first step in the tuning exercise – to ensure that we are not running the solution with
system resources that are improperly configured, consumed, or bottlenecked. The easiest way to
monitor this is with Grafana, and Kibana (for Elasticsearch specific metrics), or any other system
level monitor (for example, nmon). If a system resource is saturated, adjustment in the
environment is required before attempting further tuning. For example, there is no point to tune
processor threads/concurrency if there is not enough CPU resource available in the system.
Special attention should be paid to the NiFi and Elasticsearch heap. If the heap size is
inadequate for the workload, it will need adjustment. The heap utilization should be monitored
after each tuning change. This is especially crucial when increasing the concurrency of
processors, or changes to bucket.size/flowfile size. These heap values may be required to be
adjusted for each change to these key performance variables.
The easiest way to observe the overall progress of the index building is via the Grafana NiFi
Performance graph. We can observe the overall execution speed, identify major processor
group speed, and view the amount of data that is generated and pushed to Elasticsearch.
Grafana
You can use Grafana to analyze the performance of the Ingest pipeline. The two most useful
graphs are Queued Items and Wait Link. To set up these and other dashboards, refer
to Extensible metrics for monitoring and alerts.
In the NiFi ingest connectors, WaitLink process groups are added between process groups to
ensure that the previous stage is completed before the next stage is started. This way,
subsequent stages will not use data that is currently being worked on in an unfinished process. In
addition, this reduces the occurrence of different processes running at the same time, which can
cause extreme spikes in resource requests for CPU, network, memory or disk IO.
The time that is spent on WaitLink can be used to estimate the full time that is used for a stage,
and identify stages with the highest time and/or resource usage within the build. Since not all of
the process groups have WaitLink, the Queued Items graph provides more details for the time
taken for processing within each process group.
The useful charts to look at within Queued Items are the Bulk Service - <XXXX> charts. These
process groups send the processed data (index documents) to Elasticsearch from NiFi. The most
important one is Bulk Service – Product. Since the curve spans from the beginning to the end of
the ingest pipeline, we can use the timestamps in Wait Link to identify the related stages.
For example, the following two graphs show that the biggest number of queued items is at
the Product Stage 1e. This observation means the retrieving data group and processing data
group can handle the task quickly, and send lots of data to the Bulk service group for
transferring.
In this example, the duration with 100 queued items is short and therefore is not a problem. If a
process group takes a longer time, with a larger number of queued items, it would be a possible
bottleneck in the pipeline.
Kibana
Kibana can be used to monitor the resource consumption of Elasticsearch. For more information
about Kibana, refer to the Kibana documentation.
This graph displays Kibana monitoring Elasticsearch operations. For the index building process,
the key metrics are the CPU utilization, JVM heap, and IO operations rate. The IO operation rate
is the most critical metric, in the sense that if the IO rate is fully utilized, it is not possible to push
faster overall throughput. If the speed is not acceptable, the best course of action is to investigate
alternative solutions with higher throughput.
Due to high resource consumption, the NiFi counters collection for HCL Commerce activities is
disabled by default.
You can enable it by adding the following lines within nifi-app.yaml (/commerce-
helmchart-master/hcl-commerce-helmchart/stable/hcl-commerce/templates/
nifi-app.yaml) before installing NiFi:
- name: "FEATURE_NIFI_COUNTER"
  value: "true"
After enabling it, you can view the report while the test is running, or after the Ingest process is
completed. One disadvantage is that you can only see one report for each connector. If you are
using the same connector to run another Ingest pipeline, the report that was generated for the
previous run will be removed at the beginning of the new Ingest process (this process can take a
couple of minutes).
After an Ingest pipeline is finished, the Ingest report, Ingest Metrics, will be sent to the
index run within Elasticsearch. You can configure Grafana to display the report in the format you
defined. The reports for the different Ingest pipelines and different connectors are all stored. You
can select connector and runID to view the report.
The data for Ingest Metrics in Grafana is different from the Queued Items/Wait Link data. The
metrics are only sent, by NiFi, to Elasticsearch after the Ingest process is finished, whereas
Queued Items/Wait Link use Prometheus to collect information at runtime.
For tuning purposes, you may not want to wait for an Ingest pipeline to finish before running it
again, and the process can fail at any point in the Ingest process. In these cases, NiFi counters
may make it easier to collect reports for some of the stages in an Ingest pipeline.
In the minimal configuration, each of the pods is allocated six vCPUs and 10GB of memory.
In the recommended configuration, each of the pods is allocated sixteen vCPUs and a minimum
of sixteen GB of memory.
This number does not reflect your actual requirements, however. Index files are dated and with
time will accumulate on the disk. As a general rule, provide at least ten times the disk space of
the anticipated index size to reflect this variability, and clean up old index files on a daily basis. In
the case of the six gigabyte index, this would mean allocating at least sixty gigabytes and running
a regular job to delete stale files.
NiFi
The processing speed of the data set, and the resulting speed of the search index creation, is the
result of the throughput that you can achieve in the NiFi and Elasticsearch cluster. Several
parameters can improve and optimize the throughput for a given hardware footprint: NiFi
processor threads and bucket size.
Threads
The default processor runs on a single thread, processing just one flow file at a time. With
concurrent processing, you can adjust the number of concurrent tasks that it performs. Set the
number of threads for the process group by changing the processor Concurrent Tasks value
(under Processor configuration or SCHEDULING tab).
When the processor can process flow files at the same rate as they come, the Concurrent
Tasks value is ideal, preventing large pileups of flow files in the processor's wait queue.
If a CPU can multitask, increasing the threads available to the processor increases the processor
throughput. The transformation processor (as in NLP) and the Bulk update processor are two
such examples. You can assign more threads to the processor, or set the Concurrent
Tasks variable to be higher than one. Increasing the threads of the processor will result in
improvement of the processor throughput. The following screenshot represents an NLP
Processor which is set to sixteen concurrent tasks, equal to the number of virtual CPUs (vCPUs)
that are available on the node.
This update is not useful for all processors. Most processors come with a default configuration
that takes this variable into account and does not need to be altered. When performance testing
reveals a bottleneck in front of the processors, the default configuration may benefit from further
tuning.
Because such balancing may not always be feasible, the recommended best practice is to focus
on reducing the flow file pileup in the waiting queue.
Additional threads increase the processor bandwidth by a factor of the number of threads if the
processor is doing computational processing (that is, you can experience linear scaling). In the
case of I/O operations, the processor will experience some improvement that starts to diminish
after a certain number of threads is reached (that is, nonlinear scalability that ends in
saturation).
In the case of the NLP Processor, the processor is purely computational, and the thread limit is
considered to be the same as the number of vCPUs that are available for the NiFi pod.
Bucket size
Bucket size (scroll.bucket.size) is another parameter that you can change to improve processor
bandwidth. It changes the size of the flow file that is processed. By increasing the bucket size,
you increase the amount of data that is processed by a single processor as a group.
Bucket size changes are a bit more difficult to implement. The location of the variable is on the
second level parameters of the processor group.
The Bucket Size is optimal when you see that the flow file is easily processed through the system
(including Elasticsearch upload). Increasing the size (the number of documents in the flow file)
increases the throughput. However, at very large sizes, the throughput tapers off and gradually
decreases, while the resource requirements of the system (NiFi and Elasticsearch) increase,
leading to lower throughput.
The Variables window opens. Here you can change the bucket size value.
When the time it takes to process the data is evenly matched with the time it takes for the
next SQL scroll to receive its data, the scroll settings are optimal. This eliminates
needless processing and I/O time. Additional factors to consider include the memory
available in NiFi to hold the result set while parsing and splitting it into flow files, and
the total number of flow files that would be created in NiFi at once. If the scroll page size
is larger, you should expect an impact on NiFi operations. When you reach this limit, you
can increase the resources allocated to NiFi, or limit the scroll page size to reduce the
impact on performance.
You can set LISTAGG locally or globally. To set it locally, change the
attribute flow.database.listagg.
You control attributes using UpdateAttribute processors, which update the attributes of
the flow files. For example, if you want to
set flow.database.listagg="false" for AttributeStage1b in auth.reindex, set it in
the Properties as follows: NiFi Flow > auth.reindex - Attribute Stage 1b (Find Attribute
Values) > Find Attribute Values > SCROLL SQL > Define scroll offset and page size.
Note: If you experience issues during a particular stage of ingest pipeline processing
where string aggregation is exceeding the 32k LISTAGG function limit, you will need to
disable LISTAGG for that particular processing stage. For instance, to disable LISTAGG
for Attribute Stage 1b (Find Attribute Values) in versions prior to 9.1.11:
You can make global changes using Ingest profiles. For more information
and an example of how to change LISTAGG using an Ingest profile, see Ingest
configuration via REST.
When the list aggregate is disabled, the SQL process returns more rows and NiFi uses
the Serialize process to handle the returned data. In this case, the duration for
processing the data will be much longer. To account for this,
the page.size and bucket.size for the SQL process, and the thread number for
the Serialize process, must be increased.
Elasticsearch
The following section will discuss a few improvements that can be made to the
Elasticsearch configuration to increase the overall throughput and speed of index building.
Index refresh interval
The refresh interval controls how often Elasticsearch makes newly indexed documents
visible to search. If it is viable, disable this behavior by setting refresh_interval
to -1. If disabling it is not viable, a longer time interval, such as 60 seconds, will
also make an impact. With a longer refresh interval, the updated documents in the
memory buffer are written into the indices (and eventually to disk) less frequently.
This improves the processing speed on Elasticsearch, as fewer refresh events leave
more resource bandwidth for receiving data. In addition, bulk updates to the file
system are always more desirable. On the other side of the equation, longer refresh
intervals cause the memory buffer to grow while accommodating all of the incoming
data.
For more information on the index refresh interval setting, see the Elasticsearch
documentation.
Ensure that you stop the processor. Select and edit the json object, replacing
the refresh_interval value with the value that you want.
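As a hedged illustration of the settings body (the index name my-index is hypothetical; -1 disables refresh entirely), the Elasticsearch update-settings request typically takes this shape:

```
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "60s"
  }
}
```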
Increasing indexing buffer sizes will help to speed up the indexing operation and will
improve the overall index building speed.
However, this may not be sufficient when dealing with large catalogs and large
bucket sizes. During the index build, bulk update requests are issued from NiFi to
Elasticsearch. The Elasticsearch coordinating node receives the request body, which is
composed of multiple documents, and for each document it determines the shard the
document should be stored in. A connection is opened to the appropriate node/shard so
that the document can be processed.
Thus, a bulk update ends up using multiple connections, and it is quite possible to run
out of threads as well as connections. If Elasticsearch runs out of connections, a
429 response code is returned. This interrupts the index build process, and the index
build fails.
To accommodate the need for more connections and threads, the Elasticsearch server
can be configured to start with more threads and a deeper connection queue on each
node. The following describes the key Elasticsearch configurations (contents are set
within the es-config.yaml configuration file):
replicas: 3
minimumMasterNodes: 2
ingress:
  enabled: true
  path: /
  hosts:
    - es.andon.svt.hcl.com
  tls: []
volumeClaimTemplate:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: local-storage-es
  resources:
    requests:
      storage: 15Gi
esJavaOpts: "-Xmx12g -Xms12g"
resources:
  requests:
    cpu: 2
    memory: "14Gi"
  limits:
    cpu: 14
    memory: "14Gi"
esConfig:
  elasticsearch.yml: |
    indices.fielddata.cache.size: "20%"
    indices.queries.cache.size: "30%"
    indices.memory.index_buffer_size: "20%"
    node.processors: 12
    thread_pool:
      search:
        size: 100
        queue_size: 10000
        min_queue_size: 100
        max_queue_size: 10000
        auto_queue_frame_size: 10000
        target_response_time: 30s
      write:
        size: 100
        queue_size: 10000
The actual values must be specific to the environment and its configuration.
Increasing the worker threads to 100 and the queue size to 10000 will suffice for
catalogs of 1M items on an Elasticsearch cluster of 3 nodes and 3 shards with the default
configurations in place.
To apply the changes, the Elasticsearch cluster should be reinstalled using the new
configuration file.
Clustering
Elasticsearch
Elasticsearch comes installed by default as a three shard cluster. Both minimal and
recommended sizing implements such clustering, and the only differences are the
resources that are allocated. The recommended sizing has more vCPUs and memory per
pod, which is sufficient to drive traffic and build the index.
NiFi
NiFi is configured as single server, in both minimal and recommended configurations. For
typical expected workloads this is sufficient. However, if Natural Language Processing
(NLP) processing presents a bottleneck, NiFi horizontal clustering will improve NLP
throughput, with linear scalability.
Sharding
It is useful to know the optimal number of index shards to be used as your data
grows during production. You can determine this based on the existing size of the
search index. Use the following three rules to calculate when to adjust the number
of index shards.
An index shard should not exceed 40% of the total available storage of
its node.
An index shard should not exceed 50 GB; generally the index performs
best when its size is less than 25 GB per shard.
The document counts and sizes across shards should be similar.
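The first two rules can be sketched as a small calculation. The helper name and the idea of deriving a shard count from total index size are illustrative, not part of the product:

```python
import math

def recommended_shards(index_size_gb: float) -> int:
    """Pick a shard count so each shard stays at or below the ~25 GB
    sweet spot and never exceeds the 50 GB per-shard limit."""
    minimum = math.ceil(index_size_gb / 50)    # hard 50 GB per-shard limit
    preferred = math.ceil(index_size_gb / 25)  # ~25 GB sweet spot
    return max(1, minimum, preferred)

print(recommended_shards(30))   # 2 shards of ~15 GB each
print(recommended_shards(120))  # 5 shards of 24 GB each
```

The third rule (similar document counts and sizes across shards) is about balance rather than count, and is normally satisfied by Elasticsearch's default document routing.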
Hardware footprint
There are several factors in considering the hardware footprint and key resources
that impact the processing and index creation speed.
The minimum CPU resources that are recommended are 6 vCPUs per pod. This has been
shown to give acceptable performance when building catalogs of medium size (around
300,000 items). However, each catalog is different, and catalogs that have an excessively
large attribute dictionary can require extra resources to keep up with the increased
processing demand.
In general, NiFi processing will comfortably fit into the allocated CPU resource size,
except in the case of NLP processing, which is CPU bound.
Before increasing allocated CPU resources (to boost NLP processing speed), it is
recommended to re-test the index build with the existing hardware. During a second
index build, NLP re-uses some of the computation from the initial run, reducing the
overall processing time dramatically. Use these repeated builds to derive any increased
resource requirements.
More importantly, the heap sizes need to be adjusted to fit the indexed data size and
complexity. The NiFi and Elasticsearch process is streamlined, and as long as the
configuration is kept the same, the heap should be sufficient for any catalog size.
In the provided minimal and recommended cases, there are two heap configurations,
targeting the aforementioned 300,000 and 1M item catalogs. The tuning parameters
differ between these configurations, since a 1M item catalog requires larger heap
sizes in both NiFi and Elasticsearch (12GB and 16GB respectively).
The resource that most influences the overall operation of the NiFi and Elasticsearch
coupling is the Elasticsearch I/O subsystem, which generally drives the overall speed
of the processing. If Elasticsearch stores the indexed data to disk slowly, it also
slows down the overall build process, including NiFi execution speed. Thus, the I/O
subsystem on Elasticsearch must be considered early, and ideally configured on local
SSD/NVMe storage for maximum throughput and I/O rates.
Recommended Configuration: 16 16 16 12 (3 Elasticsearch nodes, 1 NiFi node)
In the default HCL Commerce Elasticsearch cluster implementation, all nodes have all the node
roles. If one node is busy on data operations and under resource constraints while it also has the
master role, it can affect the cluster health, which in turn impacts data availability. Elasticsearch
in particular can demand high CPU usage in the Auth environment when doing ingest operations,
which would adversely impact the data queries in the production environment.
With the default clustered deployment, Elasticsearch automatically distributes the primary and
replica shards to the available nodes (pods). In an HCL Commerce deployment, with authoring
and live configurations, ingest operations could create periods of heavy activity on Elasticsearch,
that stress resources such as memory and CPU. If an Elasticsearch node hosts both authoring
and live indices, ingest operations could impact query performance and affect the availability of
the live storefront.
This documentation describes a sample configuration that defines dedicated Elasticsearch node
groups for authoring and live. By separating authoring and live indices into two different node
groups, we can prevent reindexing from impacting the live site.
Elasticsearch installation
To setup Elasticsearch with dedicated node groups, each group is installed as a separate Helm
release. This enables the use of different configurations for each group. When each node group
is started, they join a single Elasticsearch cluster.
The configuration presented here is a sample. You will need to adjust the node configurations
such as memory and CPU, and the number of nodes to meet your requirements. Depending on
your load requirements, you might need additional live nodes. You can also change the pod
configurations to better match your available infrastructure.
The values files for each release are available here:
master.yaml
auth.yaml
live.yaml
Elasticsearch pods

group    pods   vcpu   memory   heap
master   3      2      2G       1.5G
live     2      6      10G      6G
Kubernetes node groups
The Elasticsearch node groups, which are installed as separate Helm releases, can be configured
with different node affinity rules. For example, Elasticsearch live nodes (pods) can be deployed
within a particular Kubernetes node pool.
This sample from Google Cloud defines affinity rules so that the pods are only deployed on nodes
that belong to the elastic-live-pool node pool:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cloud.google.com/gke-nodepool
operator: In
values:
- elastic-live-pool
While it is possible to deploy Auth and Live to the same node pool, using separate node pools
has the advantage that each pool can be configured with different Kubernetes node
characteristics. For example, the authoring node pool can be configured with machines that
have more memory than the live pool.
The sample is deployed in Google Cloud (GKE) using the following node configurations:
node pool           nodes   vcpu   memory
elastic-auth-pool   1       12     32G
elastic-live-pool   2       8      16G
If your cluster is not currently configured with multiple node pools, all other deployments must
have affinity rules added, as otherwise they could deploy to any pool including the Elasticsearch
pools. If a non-Elasticsearch pod (for example, Redis or Vault) deploys to the Elasticsearch node
pools, the node might not be left with enough resources to start the Elasticsearch pods.
Besides the node groups for authoring and live, the sample configuration defines a group
of dedicated master nodes. These nodes do not handle traffic or maintain indices. Instead, their
only responsibility is to manage the cluster state. In production environments, dedicated master
nodes are recommended, as data nodes might become overwhelmed and unresponsive,
which could lead to cluster state synchronization problems.
The dedicated master nodes typically require few resources. The sample configures the limits to
2 vCPU, memory to 2G, and Java heap to 1.5G. Use monitoring to confirm these
resources are sufficient.
In our example, master nodes can run within either the elastic-auth-pool or the elastic-live-pool
node pools:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: cloud.google.com/gke-nodepool
operator: In
values:
- elastic-auth-pool
- elastic-live-pool
"settings": {
    "index.routing.allocation.require.env": "auth",
    "number_of_replicas": 0,
    "number_of_shards": 1
}
The inclusion of the live* index pattern (as opposed to .live* which is used in the live
group) is to include a set of indices that must be kept within the authoring group.
The live.master.* indices are copies of the production ready data which are kept within
the authoring group, and are copied into the .live.* indices during push-to-live. Price
and inventory data for the live environment is ingested into
the live.price and live.inventory indices. These indices are copied into master
indices and then propagated into the live environment.
The live.store index is used during ingest, while the .live.store.yyyymmddHHmm index
is a copy kept within the live group. This allows ingest and live operations to remain
available even if the other Elasticsearch node group is unreachable.
"index_patterns": [".live*"],
"settings": {
    "index.routing.allocation.require.env": "live",
    "number_of_replicas": 1,
    "number_of_shards": 1
}
Configuring NiFi
Additional configurations are required for NiFi to support the dual node group setup.
The following example uses curl to apply the configurations using the shard
queryapp server:
'localhost:30900/search/resources/api/v2/configuration?
split.json:
{
  "global": {
    "connector": [
      {
        "name": "attribute",
        "property": [
          { "name": "flow.inventory.copy", "value": "auth,live" },
          { "name": "flow.price.copy", "value": "auth" },
          { "name": "alias.keep.backup", "value": "0" },
          { "name": "cluster.index.nodegroup", "value": "dual" }
        ]
      }
    ]
  }
}
Connecting to Elasticsearch
Each Elasticsearch release generates its own set of services. While all the nodes
can handle requests for any index, there is additional overhead if the node handling
the request does not locally manage the index.
elasticsearch-auth ClusterIP
elasticsearch-live ClusterIP
The recommended configuration is to have the live servers use the live
service elasticsearch-live.elastic.svc.cluster.local, while the rest
connect directly to the authoring nodes via the authoring service (elasticsearch-
auth.elastic.svc.cluster.local). The master nodes do not own indices and should
not be used as connection endpoints:
svt/qa/live/elasticSearchHost value="elasticsearch-
live.elastic.svc.cluster.local"
svt/qa/live/elasticSearchPort value="9200"
svt/qa/live/elasticSearchScheme value="http"
The authoring service is referenced under the environment level. It is used when not
running in live:
svt/qa/elasticSearchHost value="elasticsearch-
auth.elastic.svc.cluster.local"
Once the installation of all the helm releases is complete (es-master, es-auth and
es-live) and the statefulsets are started, all pods should form a single Elasticsearch
cluster. This can be validated by executing the /_cat/nodes API as follows:
curl "localhost:9200/_cat/nodes?v"
- elasticsearch-master-0
- elasticsearch-master-1
* elasticsearch-master-2
cdfhirstw - elasticsearch-auth-0
cdfhirstw - elasticsearch-live-0
cdfhirstw - elasticsearch-live-1
After reindexing is complete, use the _cat/indices API to verify the indices' health
is 'green.' If the primary shard cannot be allocated, the index health will be 'red'. If a
replica shard is not allocated, the index health is 'yellow'. If there are indices that are
not green, there could be a problem with the Elasticsearch or NiFi configurations.
The Elasticsearch Cluster allocation explain API can describe reasons why the
cluster is unable to allocate a shard.
curl "localhost:9200/_cat/indices?v"
Te5iRhS1CFXY0OZv0wKg    1 0 43 0  44.1kb  44.1kb
MAkWPxKhSfyHuxyEqFkFRA  1 0  3 0  10.5kb  10.5kb
HOoMtbyVRt6Ow8FPPUco6Q  1 0 54 0  70.7kb  70.7kb
EOZXi76ITeiQdC3iUFXmog  1 0 10 0  13.4kb  13.4kb
...
Similarly, the /_cat/shards API shows the nodes on which each index is allocated.
This is used to verify that the affinity for Auth and Live indices is working correctly.
curl "localhost:9200/_cat/shards?v"
store ip node
1.1.1.1 elasticsearch-auth-0
1.1.1.1 elasticsearch-auth-0
...
.live.12001.category.202312011848 0 r STARTED 928 1.3mb 1.1.2.0 elasticsearch-live-0
1.1.2.1 elasticsearch-live-1
1.1.2.0 elasticsearch-live-0
1.1.2.1 elasticsearch-live-1
...
Migrating to a dual nodegroup configuration
Existing environments can be migrated to a split Auth and Live
configuration.
Indexing data lifecycles in dual-nodegroup Elasticsearch configurations
Different processes are involved in indexing data lifecycles in dual
Elasticsearch nodegroup configurations. The affected processes include
Near-Real-Time updates, offline dataloads, Push-To-Live scenarios, and
Update-Live operations. Cache invalidation processes are also performed
with each of these updates.
Two sets of optimal processing values for NiFi and Elasticsearch are provided. The minimal
configuration values are the defaults for the HCL Commerce deployment, and are sufficient
for a typical index of up to 300,000 catalog items. The recommended configuration values are
the recommended settings for the HCL Commerce deployment, and are sufficient for a typical
index of up to 1M catalog items.
Heap sizes
Minimal configuration values:
NiFi heap: 9GB
Elasticsearch heap: 12GB
Recommended configuration values:
NiFi heap: 12GB
Elasticsearch heap: 16GB
You can also use this Metrics Monitoring framework to visualize the cache requests sent and
received by NiFi. Using the API at http://NIFIHOST:30690/monitor/metrics, the monitoring
data can be collected in the industry-standard Prometheus format, which can then be
visualized in Grafana or other tools.
There are three parts to the monitoring framework. At the top, a fully-customizable
presentation layer enables you to use your preferred tools to report on and analyze your
systems' performance. The flexibility of this layer comes from the middle layer's use of a
vendor-neutral, industry-standard data-representation language: the open-source Prometheus
toolkit. Finally, Prometheus gets its data from the fully-customizable Micrometer library,
which exposes the data from your containers.
Note: For more information on using of Grafana and sample dashboards, see HCL Commerce
Monitoring - Prometheus and Grafana Integration.
The top of the framework is the reporting layer. Because your data is represented in the
Prometheus format, you can use many different tools to display and analyze it. One popular
dashboarding tool is Grafana (https://grafana.com/). Grafana is often used with Prometheus to
provide graphical analysis of monitoring data.
You can download the HCL Commerce Grafana Dashboards from the HCL License and Delivery
portal. For more information on the available HCL Commerce Grafana Dashboard package,
see HCL Commerce eAssemblies.
Monitoring and performance data is scraped using the JVM-based Micrometer instrumentation
library. The key concept for Micrometer is the meter. A rich set of predefined meter primitives
exist that define times, counters, gauges and other data collection types. You can use the default
meters to aggregate performance and monitoring data from your containers, or customize your
own.
Metrics for the performance of each container are exposed at its /monitor/metrics endpoint.
They are collected by a process known as "scraping": Prometheus scrapes the metrics endpoint
on all containers at a configurable interval. The metrics are stored in a database where other
services can access them. In Kubernetes environments, scrapers also add contextual metadata
to the metrics obtained from endpoints, such as the service, namespace, and pod that identify
the origin of the data.
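As an illustration of the scraping described above, a minimal Prometheus scrape job for the /monitor/metrics endpoint might look like the following sketch. The job name, target host, and interval are assumptions for illustration, not values from the product:

```yaml
scrape_configs:
  - job_name: 'hcl-commerce'           # illustrative job name
    metrics_path: /monitor/metrics     # endpoint exposed by each container
    scrape_interval: 30s               # assumed interval
    static_configs:
      - targets: ['nifi-host:30690']   # hypothetical host:port
```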
Configuring meters
Metrics are enabled by default when using the HCL Commerce Helm charts. They can also be
enabled by configuring the environment variable:
EXPOSE_METRICS=true
Metrics are exposed on each pod on the following paths and ports:
In addition to enabling metrics, the Helm chart exposes the metrics port through the services,
and offers the option to define a ServiceMonitor
( metrics.servicemonitor.enabled, metrics.servicemonitor.namespace) for use with
the Prometheus Operator.
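Based on the parameter names above, the Helm values for enabling the ServiceMonitor can be sketched as follows (the namespace value is an assumption for illustration):

```yaml
metrics:
  servicemonitor:
    enabled: true
    namespace: monitoring   # namespace watched by your Prometheus Operator
```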
In addition to the default set of meters, you can add your own. When meters are enabled,
the Metrics class makes the global registry available. Meters added to the global registry are
automatically published to the metrics endpoint.
New meters can be added to the registry by using the Micrometer APIs. See the Micrometer
Javadoc for API details: https://javadoc.io/doc/io.micrometer/micrometer-core/1.3.5/index.html.
Samples
The following examples show how metrics can be used from custom code.
Counters
A counter holds a positive count that can be increased by a fixed amount, for example,
"number of requests." Prometheus includes functions such as rate() and increase() that
can be used to protect against counter resets.
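In Prometheus, a Micrometer counter named backend.calls.total (as in the sample below) is exposed as backend_calls_total. A hedged example query for its per-second rate over five minutes, protected against counter resets, is:

```
rate(backend_calls_total[5m])
```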
private static final Counter BACKEND_COUNTER =
    Metrics.isEnabled()
        ? Counter.builder( "backend.calls.total" )
              .register( Metrics.getRegistry() )
        : null;

if ( BACKEND_COUNTER != null ) {
    BACKEND_COUNTER.increment();
}

// Alternatively, resolve the counter by name and tag on each use
if ( Metrics.isEnabled() ) {
    Metrics.getRegistry().counter(
        "backend.calls.total",
        "result",
        myGetResult()
    ).increment();
}
Timers
Timers are used to track the duration and frequency of events. Besides calculating
average durations, the API allows you to configure a set of Service Level Objectives (SLO),
which are translated to histogram buckets. SLOs can also be used to calculate quantiles.
For more information, see Histograms and Summaries on the Prometheus website.
The Metrics class defines SLOs for common usages. For example,
Metrics.DEFAULT_SLO_REST_DURATIONS_NAME defines buckets that are
appropriate for typical REST execution times. If your timer doesn’t match these durations,
you can specify new values as a long array. For more information, see .sla() in
the Timer.Builder class definition on the Micrometer website.
private static final Timer BACKEND_TIMER =
    Metrics.isEnabled()
        ? Timer.builder( "backend.calls.duration" )
              .sla( Metrics.getSLOsByName( Metrics.DEFAULT_SLO_REST_DURATIONS_NAME ) )
              .register( Metrics.getRegistry() )
        : null;

long startTime = 0;
if ( BACKEND_TIMER != null ) {
    startTime = System.nanoTime();
}
doWork();
if ( BACKEND_TIMER != null ) {
    BACKEND_TIMER.record( System.nanoTime() - startTime, TimeUnit.NANOSECONDS );
}
When using a Timer with label values that are not known in advance, the Micrometer API
doesn't allow an SLO (.sla(..)) to be specified. To achieve this, define a meter
filter to merge the config, using the Metrics.applySLO(final String metricName, final
long[] slos) or Metrics.applySLO(final String metricName, final String sloName) methods.
static {
    Metrics.applySLO( TIMER_NAME, Metrics.DEFAULT_SLO_REST_DURATIONS_NAME );
}

long startTime = 0;
if ( Metrics.isEnabled() ) {
    startTime = System.nanoTime();
}
doWork();
if ( Metrics.isEnabled() ) {
    Metrics.getRegistry().timer(
        TIMER_NAME,
        "result",
        getResult()
    ).record( System.nanoTime() - startTime, TimeUnit.NANOSECONDS );
}
Gauges
A gauge holds a value that can increase and decrease over time. The meter is mapped
to a function to obtain the value. Examples include number of active sessions and current
cache sizes.
class MyService {
    MyService() {
        if ( Metrics.isEnabled() ) {
            // Gauge name "myservice.active.clients" is illustrative
            Gauge.builder( "myservice.active.clients",
                    this,
                    MyService::getActiveClients )
                .tags( "endpoint", getEndpointName() )
                .register( Metrics.getRegistry() );
        }
    }

    double getActiveClients() {
        return nActiveClients;
    }
}
Enabling and customizing the Metrics Monitoring framework in a development environment
In order to use and extend the HCL Commerce Version 9.1 Metrics Monitoring framework in
your HCL Commerce development environment, enable it in each of the Transaction, Search,
Customization (Xc) and Store servers.
Performance Measurement tool
As a developer, you can use the Performance Measurement tool to gather performance data on
a running application to help you identify any performance bottlenecks. You can use this tool in
both your development environment to test operation performance, and in your production
environment to analyze the actual performance of an operation. When you run this tool, you can
generate different reports to help you identify operations that impact performance and determine
how to improve caching performance.
The Performance Measurement tool is a flexible measurement tool that you can use in the
following ways:
As a serviceability tool, which you can use to evaluate what the system is doing. This
evaluation can be done while the operation runs in the production environment.
As a general performance measurement tool, which you can use to determine where
time is spent during a request execution.
As a cache potential measurement tool, which you can use to determine the value that
caching could bring to various operations.
When you run this tool, you can generate any of the following types of reports to help you
measure and analyze your site performance:
Performance reports
Stack reports
Execution reports
Caller reports
When this tool runs, it uses API classes to gather metric data for operations, which is then
used to generate the preceding reports. The API classes that the tool uses to gather the
metrics are built on the Java Logging mechanism. These API classes create a metric gatherer
and then create objects of type OperationMetric. The tool uses these objects to gather the
following information about a single operation execution:
Start time
End time
Duration
Result size
Name
Key-value pairs (used as a unique cache key)
Whether caching is enabled for the operation
Whether the operation result is fetched from cache.
Unique ID of the request
Unique ID of the parent request
For more information about how to run the Performance Measurement tool to generate reports,
see Using the Performance Measurement tool.
Security
Depending on how you set up caching for your site, some operations that you cache can use
sensitive information as key-values. For example, a servlet login form request might include a
user password as a parameter. In this case, the parameter value is typically masked to prevent
the value from being included in any generated performance logs.
When you run the Performance Measurement tool, the tool does capture parameter values for
select statements within "GET" operations. If the parameter values that contain sensitive
information are masked, the generated Performance Measurement tool reports should not
include any sensitive information.
When you are generating performance reports, three files are generated:
report-operations.csv
This report provides a simplified view of caching performance. Use this report when you do not need
complex statistics about data caching. This report includes the following information:
OPERATION_NAME
AVERAGE_CALL_DURATION_IN_MS
AVERAGE_RESULT_SIZE_IN_BYTES
The average size of a result when the result would be saved in cache.
CUMULATIVE_EXECUTION_TIME_MS
The amount of time that is spent by the system when it runs all of the measured executions of the
operation.
CALL_COUNT
report-execution.csv
This report lists the main operations that the system executes. These operations are listed from
slowest duration to fastest. Use this report to help you identify the slowest requests on your system.
You can use this report with execution reports to help identify the performance of the requests by
matching the operation name and starting timestamp between the reports. This report includes the
following information:
OPERATION_NAME
DURATION_MS
START_TIME_MS
The start time of the operation in milliseconds as a relative timestamp to the stop time.
STOP_TIME_MS
The stop time of the operation in milliseconds as a relative timestamp to the start time.
RESULT_SIZE
The size of the operation result.
KEY_VALUE
IDENTIFIER
report-operation-cache.csv
Use this report to help you analyze the cache efficiency and potential for every operation. This report
includes information for all the following metrics. This report can include measurements and
information for the following metrics:
MS_SAVED_PER_BYTE
The time (in milliseconds) that is saved on your system for every byte of cache that you
allocate to a specific operation. This value is based on the assumption that your cache is
infinite and that cache access is instantaneous. You can use this information to help you
determine the best place to allocate your available cache resources.
CACHE_ALLOCATION_IN_BYTES
The recommended amount of memory (in bytes) to allocate to the cache. This amount is based on
the allocatedCacheSize variable that is set in the analysis.properties file.
AVERAGE_CALL_DURATION_IN_MS
AVERAGE_CACHE_HIT_DURATION_IN_MS
The average duration of a call when the call results in a cache hit.
AVERAGE_CACHE_MISS_DURATION_IN_MS
The average duration of a call when the call results in a cache miss.
AVERAGE_RESULT_SIZE_IN_BYTES
The average size of a result when the result would be saved in cache.
CUMULATIVE_EXECUTION_TIME_MS
The amount of time that is spent by the system when it runs all of the measured executions of the
operation.
MAX_CACHE_ALLOCATION_SIZE_IN_BYTES
This is the maximum amount of cache (in bytes) that this operation could take if all of the unique
calls are stored in cache.
MAX_CACHE_BENEFIT_MS
The amount of time that is saved during the execution of an operation if the operation uses a perfect
cache that takes no execution time for a cache hit.
UNIQUE_CACHE_ENTRY_COUNT
The number of unique cache entries that generate if you have an infinite cache and every operation
result is cached.
MAX_THEORIC_CACHE_HIT_COUNT
The number of cache hits that generate during the cache performance measurement if you have an
infinite cache and every operation result is cached and never invalidated.
REAL_CACHE_HIT_COUNT
The number of request results that are actually fetched from the cache when your cache is enabled.
You can use this information to find the operations that are redundant.
REAL_CACHE_ENABLED_COUNT
CACHE_ENABLED_CALL_PERCENTAGE
MAX_THEORIC_CACHE_HIT_PERCENTAGE
The theoretical maximum percentage of requests that result in cache hits if you have an
infinite cache and no invalidation occurs.
REAL_CACHE_HIT_PERCENTAGE
CACHE_EFFECTIVENESS_VS_THEORY_PERCENTAGE
The effectiveness of your cache as a percentage of the effectiveness of the maximum value that is
predicted by the theoretical caching of the operation. You can use this information to help you find
where your caching practices are inefficient. This information can also help you to pinpoint where
your cache is too efficient and might be missing a key.
CALL_COUNT
HCL Commerce Search performance tuning falls under the following sections:
Indexing server
Search runtime server
While the main objective of tuning the indexing server is optimal memory management, the
objective of tuning the search runtime server is to obtain the best response times.
When to perform full search index builds
The HCL Commerce Search index is automatically built when certain business tasks are
performed, as outlined in Common business tasks and their impact to the HCL Commerce
Search index. In several cases, common business tasks result in delta index builds that do not
pose a significant risk to production system performance. However, doing several delta index
builds without occasional full index builds might result in the search index gradually degrading
over time due to fragmentation. To avoid this issue, doing full search index builds when possible
ensures that the search index performs well over time.
When Lucene receives a delete request, it does not delete entries from the index, but instead
marks them for deletion and adds updated records to the end of the index. This marking results
in the catalog unevenly spreading out across different segment data files in the search index, and
might result in increased search response times. If you have a dedicated indexing server,
consider scheduling a periodic full search index build. Make this build a background task that
runs once per month, so that the deleted entries are flushed out, and to optimize the data.
Indexing server
Consider the following factors when you tune the indexing server:
Index build preprocessor is now using Varchar as field type rather than Clob
The data types of several columns of the TI_ATTR table were changed from CLOB. The
six columns are now defined as varchar(32672) in Db2, and varchar2(32767) for Oracle,
in the wc-dataimport-preprocess-attribute.xml configuration file. The same change was
made to the ATTRIBUTES column of TI_ADATTR. These changes reduce the
preprocessing time of these two tables.
This change requires that Oracle users enable the "Extended Data Types" feature
described in https://oracle-base.com/articles/12c/extended-data-types-12cR1. If you are
migrating from a previous version, ensure that you drop all temporary tables before
proceeding.
Note: You must also execute each instruction in this sample exactly as shown, or the Oracle
database will not come back online after a restart. You need only execute these instructions
once.
CONN / AS SYSDBA
SHUTDOWN IMMEDIATE;
STARTUP UPGRADE;
ALTER SYSTEM SET max_string_size=extended;
@?/rdbms/admin/utl32k.sql
SHUTDOWN IMMEDIATE;
STARTUP;
x-data-config.xml
To enable the preprocessor, copy and use the XML files that are provided.
To enable the preprocessor for your CI/CD pipeline, begin by copying the XML
files within your development (toolkit) environment samples folder. Copy the XML
files from the samples/dataimport/copy_columns_data_preprocessor directory within
your development environment to the \WC\xml\search\dataImport directory for your
CI/CD pipeline.
If you want a quick trial of the preprocessor, copy the XML files from your Utility
Docker container to
the /profile/installedApps/localhost/ts.ear/xml/search/dataImport directory of your
Transaction server Docker Container. You can complete this procedure to test
the preprocessor results within your CI/CD pipeline or within a development
environment.
The following preprocessing configuration files create the listed temporary tables:
wc-dataimport-preprocess-direct-parent-catgroup.xml: TI_DPGROUPI_#INDEX_SCOPE_TAG#, TI_DPGRPNAME_#INDEX_SCOPE_TAG#_#lang_tag#
wc-dataimport-preprocess-fullbuild.xml: TI_CATENTRY_#INDEX_SCOPE_TAG#
wc-dataimport-preprocess-fullbuild-workspace.xml: TI_D_CATENTRY_#INDEX_SCOPE_TAG#, TI_CATENTRY_#INDEX_SCOPE_TAG#
wc-dataimport-preprocess-offerprice.xml: TI_OFFER_#INDEX_SCOPE_TAG#
wc-dataimport-preprocess-parent-catgroup.xml: TI_APGROUPI_#INDEX_SCOPE_TAG#
wc-dataimport-preprocess-productset.xml: TI_PRODUCTSET_#INDEX_SCOPE_TAG#
Important: Before you build an index, ensure that you delete all temporary tables with
the exception of the following delta indexing tables:
TI_DELTA_CATENTRY
TI_DELTA_CATGROUP
TI_DELTA_INVENTORY
Ensure that you have Tracing enabled. Run the index as usual, and use Trace to
determine what performance improvements occurred.
The transaction log size in the Db2 database is controlled by LOGFILSIZ and
LOGPRIMARY+LOGSECOND. For example, the following commands (assuming a database
alias of WCDB) increase the log space to 4 KB*40000*(20+160)=28.8 GB:
UPDATE DB CFG FOR WCDB USING LOGFILSIZ 40000
UPDATE DB CFG FOR WCDB USING LOGPRIMARY 20
UPDATE DB CFG FOR WCDB USING LOGSECOND 160
The default fetchSize and batchSize of the preprocessor are each 500.
The fetchSize cannot be larger than 32767 for Db2, or 1000 for Oracle.
For example:
<_config:data-processing-config
   processor="com.ibm.commerce.foundation.dataimport.preprocess.CatalogHierarchyDataPreProcessor"
   masterCatalogId="10101" batchSize="500" fetchSize="1000">
   ...
</_config:data-processing-config>
The query for the TI_ADATTR temporary table is changed in
Version 9.0.0.6+
During index building, nearly all rtrim() and cast() calls were removed from the query
for the TI_ADATTR temporary table. These calls were redundant for ordinary index
builds. The removal of these calls improves the response time of this query against Db2
databases and improves scaling for large numbers of catalog entries. The change for this
query is enabled by default when you update to Version 9.0.0.6+.
Search caching for the indexing server
You can typically disable all Solr caches on the indexing server.
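As a sketch of what this can look like, the standard Solr caches can be disabled in the indexing core's solrconfig.xml by setting their sizes to zero. The element names are standard Solr; the zero values are an illustrative assumption, not values taken from the product configuration:

```xml
<!-- Indexing-server sketch: standard Solr caches disabled for buildindex.
     Sizes of 0 are illustrative; adapt them to your solrconfig.xml layout. -->
<query>
  <filterCache      class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache    class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
</query>
```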
Tuning index buffer size and commit actions during data import (buildindex)
You can tune your solrconfig.xml file to allocate sufficient memory for index buffering and
prevent commit actions when you are building your index. When the RAM buffer for index
updates is full, Solr performs commit actions that persist data onto disks. When these
commit actions occur, Solr has a global exclusive lock on your entire JVM. This lock
prevents other threads from doing update operations, even when the thread is working
on different records or files. This locking can increase the amount of time that is required
to build your index. By increasing your RAM buffer size, and disabling the commit trigger,
you can reduce the chances of this locking. You can tune your Solr parameters for
commit timing and buffer size in the solrconfig.xml file:
Allocate more memory for index buffering by changing the value for
the ramBufferSizeMB parameter. 2048 MB is the maximum memory that you
can allocate:
<ramBufferSizeMB>2048</ramBufferSizeMB>
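Because the goal is to avoid commit actions entirely during buildindex, the large RAM buffer is typically paired with no automatic commit trigger. A sketch of the relevant solrconfig.xml fragments follows; the elements are standard Solr, and the commented-out autoCommit values are illustrative:

```xml
<!-- Allocate the maximum RAM buffer for index updates -->
<indexConfig>
  <ramBufferSizeMB>2048</ramBufferSizeMB>
</indexConfig>

<!-- Leave autoCommit unset (or commented out) so that Solr does not
     trigger commits on its own while the index is being built -->
<!--
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>15000</maxTime>
</autoCommit>
-->
```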
1. Increase the default paging size for your operating system. For example, 3 GB. In
cases where the operating system requires a higher paging size, adding more
memory to the system also helps to resolve issues.
2. Increase the default database heap size to a larger value. For example, increase
the Db2 heap size to 8192.
3. Increase the file descriptor limit to a higher value. For example: ulimit -n
8192.
Do not exceed 28 GB of heap size per JVM, even when you use a 64-bit
environment. In a 64-bit JVM, the compressed references optimization
might be disabled if the heap space exceeds 28 GB. If it is disabled, there
can be up to a 30% overall throughput degradation.
Search runtime server
Consider the following factors when you tune the search runtime server:
Caching
Search caching for the runtime production subordinate servers
The starter configuration that is included in the CatalogEntry solrconfig.xml file is only
designed for a small scale development environment, such as HCL
Commerce Developer.
When you redeploy this index configuration to a larger scale system, such as a staging or
production system, customize at least the following cache parameters:
queryResultWindowSize
queryResultMaxDocsCached
queryResultCache
filterCache (Required on the product index when an extension index such as
Inventory exists)
documentCache (Required on the product index when an extension index such
as Inventory exists)
The following example demonstrates how to define cache sizes for the Catalog Entry
index and its corresponding memory heap space that is required in the JVM:
Allocate 10 M cache slots for caching the first three pages of the main query.
The size of each filterCache entry is 4 bytes per docId (int) reference, multiplied by an
assumed number of search hits of 90,000, equaling 360 KB.
With 5000 cache entries, the total required size for the filterCache is 1.8 GB (360 KB x 5000).
Note: The filterCache is required on the product index when an extension index such as
Inventory exists, so that the query component functions correctly.
documentCache
Assume an average size of each Catalog Entry document to be 10 KB.
Assign 5% of the entire catalog to be cached, or 100000 entries for the documentCache.
The total required size for the documentCache results in a value of 1.0 GB (10 KB x
100000).
Note:
Set the documentCache size to at least the maximum anticipated size of a search
result.
The documentCache is required on the product index when an extension index
such as Inventory exists so that the query component functions correctly.
As a result, the estimated JVM heap size that is required for the Catalog
Entry core is approximately 4.3 GB, including the 1.8 GB filterCache and the
1.0 GB documentCache.
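The sizing walk-through above translates into cache definitions in the Catalog Entry core's solrconfig.xml along these lines. This is a sketch only: the window size of 30 assumes 10 results per page for the first three pages, and all sizes are illustrative rather than values mandated by the product:

```xml
<query>
  <!-- Cache slots for the first three pages of the main query;
       a window of 30 assumes 10 results per page -->
  <queryResultWindowSize>30</queryResultWindowSize>
  <queryResultCache class="solr.LRUCache" size="10000000" autowarmCount="0"/>

  <!-- 5000 entries x ~360 KB per entry is roughly 1.8 GB of heap -->
  <filterCache class="solr.LRUCache" size="5000" autowarmCount="0"/>

  <!-- ~100000 documents x ~10 KB per document is roughly 1.0 GB of heap -->
  <documentCache class="solr.LRUCache" size="100000"/>
</query>
```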
Managing cache sizes to conform to JVM memory
Ensure that you configure the fieldValueCache of the catalog entry index core in
the solrconfig.xml file. This configuration can prevent out-of-memory issues by limiting its
size to conform to JVM memory.
The cache size depends on the quantity of facet fields and the catalog size. The size of
each cache entry can be roughly computed as the quantity of catalog entries in the index
core multiplied by 4 bytes. The potential quantity of cache entries equals the quantity of
potential facets.
<fieldValueCache class="solr.FastLRUCache"
size="300"
autowarmCount="128"
showItems="32" />
Note: The recommended solr.FastLRUCache caching implementation does not have a
hard limit to its size. It is useful for caches that have high hit ratios, but might significantly
exceed the size value that you set. If you are using solr.FastLRUCache, monitor your
heap usage during peak periods. If the cache is significantly exceeding its limit, consider
changing the fieldValueCache class to solr.LRUCache to avoid performance issues or
an out-of-memory condition.
services/cache/SearchNavigationDistributedMapCache
Each entry ranges from 8 to 10 KB and contains 10 - 20 relevancy fields. The cache instance
also contains other types of cache entries. When the cache instance is full, the database is
used for every page hit, reducing performance.
Tuning the search data cache for faceted navigation
The HCL Commerce Search server code uses the WebSphere Dynamic Cache facility to
perform caching of database query results. Similar to the data cache used by the
main HCL Commerce server, this caching code is referred to as the HCL Commerce
Search server data cache.
Adjusting heap space when search product display is enabled
When the search product display feature is enabled, adjust the heap size according to these
guidelines:
For product sequencing: allocate approximately 5 MB per category with a product sequencing
file.
For Image Facet Override: allocate approximately 10 MB per category with an image override
file.
For Sequencing and Image Override: assuming a baseline of 100,000 products in the category,
allocate approximately 15 MB per category with a sequencing and image override file.
If you are using manual sequencing with many categories, add 1.5 MB per category that is
sequenced for each additional 100,000 products.
For example, according to the 15 MB per category estimate, manual sequencing of 200
categories with a catalog size of 100k can use 3 GB of memory. Manual sequencing of the
same 200 categories can use 6 GB when the catalog size is 1.1 million. Therefore, the heap
space that is allocated per category must be adjusted according to the catalog size.
Facet performance
Consider the following facet performance tuning considerations when you work with facets in
starter stores:
Tune the size of the services/cache/SearchNavigationDistributedMapCache cache instance
according to the number of categories.
Tune the size of the services/cache/SearchAttributeDistributedMapCache cache instance
according to the number of attribute dictionary facetable attributes.
Avoid enabling many attribute dictionary faceted navigation attributes in the storefront
(Show facets in search results). Avoiding many of these attributes can help avoid Solr
out-of-memory issues.
Extension index
Consider the following usage when an extension index such as Inventory exists in HCL
Commerce Search:
The filterCache and documentCache are required on the product index when an extension
index such as Inventory exists in HCL Commerce Search, so that the query component
functions correctly.
You can typically disable all other internal Solr caches for the extension index in the
search run time.
Configuration options
Search configuration
Ensure that you are familiar with the various Solr configuration parameters that are
documented in the Solr Wiki under solrconfig.xml. The documentation contains information
for typical configuration customizations that can potentially increase your search server
performance. For example, if your store contains a high number of categories or contracts,
or if your search server is receiving Too many boolean clauses errors, increase the default
value for maxBooleanClauses.
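For instance, the limit can be raised with a one-line change in solrconfig.xml. The value 16384 is an illustrative assumption; choose a value that covers your largest generated query:

```xml
<query>
  <!-- Raise the boolean clause limit to avoid "Too many boolean clauses" -->
  <maxBooleanClauses>16384</maxBooleanClauses>
</query>
```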
Indexing changes and other considerations
Garbage collection
The default garbage collector policy for the HCL Commerce JVM is the Generational
Concurrent Garbage Collector. Typically, you do not need to change this garbage
collector policy.
You can activate the Generational Concurrent Garbage Collector for the HCL Commerce
Search JVM by using the -Xgcpolicy:gencon command line option.
Note: Using a garbage collection policy other than the Generational Concurrent Garbage
Collector might result in situations with increased request processing times and high CPU
utilization.
Spell checking
You can experience a performance impact when you enable spell checking for HCL
Commerce Search terms.
You might see performance gains in transaction throughput if spell checking is skipped
where it is not needed, such as when users search for products with catalog overrides.
For example, a search term that is submitted in a different language than the storefront
requires resources for spell checking. However, product names with catalog overrides
are already known and do not require any resources for spell checking.
The spell checker component, DirectSolrSpellChecker, uses data directly from the
CatalogEntry index, instead of relying on a separate stand-alone index.
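In standard Solr terms, such a spell checker is declared as a search component that reads from a field of the main index. The following is a sketch; the field name spellCheck is an assumption for illustration, not necessarily the field that the product configuration uses:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Field of the CatalogEntry index to draw terms from (assumed name) -->
    <str name="field">spellCheck</str>
    <!-- Reads directly from the main index; no stand-alone spelling index -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
```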
Improving Store Preview performance for search changes
To improve performance when you preview search changes, you can skip indexing
unstructured content when business users start Store Preview.
For more information, see Changing properties in the HCL Commerce configuration file
(wc-component.xml).
Performance monitoring
You can monitor HCL Commerce Search by using the following methods:
Lucene Index toolbox
Luke is a development and diagnostic tool for search indexes. It displays and modifies
search index content. For more information, see Luke - Lucene Index Toolbox.
WebSphere Application Server JMX clients
JMX clients can read runtime statistics from Solr.
1. Add the JMX registry in the Solr core configuration file, solrconfig.xml:
<jmx serviceURL="service:jmx:iiop://host_name:2809/jndi/JMXConnector"/>
2. Use jconsole in Rational Application Developer to connect to the runtime JMX.
When the Solr core is initialized, you can use jconsole to view information from
JMX, such as statistics information for caches.
Perform the following business actions during business hours, as they do not pose a significant
risk to production system performance:
Avoid performing the following business actions during business hours, as they might affect
performance to the production system:
Reparenting an existing category. This triggers a full reindexing, which is not suitable
during business hours when system usage is high.
Removing an existing category or a subcategory from a catalog. This also triggers a full
reindexing.
Reloading or deleting individual attachments separately. This operation must be performed
with an associated product.
Warning: By default, the updateSearchIndex scheduled job runs a full index update. Do not
run the updateSearchIndex scheduled job in any full index configurations on a production
environment.
The inventory index contains operational data and therefore can be used only for previewing in
an authoring environment. IT administrators can set up a recurring task to take snapshots of
inventory status from a production environment and use them in an authoring environment for
previewing, testing, and tuning of search rules.
An authoring environment is one that has workspaces enabled; business users can use
this environment to make changes within a workspace. Once the workspace is approved,
committed data (from the base schema) can be reindexed and then published to the
production environment through index replication.
A staging environment is similar to an authoring environment, with the exception that it
does not have workspaces enabled. Business users can still make changes in a staging
environment, but the changes are applied immediately to the search index in that
environment.
Note: All reindexing types that are listed in the table are denoted against the product index and
not the category index, except where indicated (Category business component).
The following table groups business tasks and reindexing types by business components.
Business component: Catalog (master or sales catalog)
Linking or unlinking to an existing category from a catalog tree → Delta: Product and Category index. Note: A delta reindex is performed only if the number of changes that are affected by the business task is less than the DeltaIndexingThreshold threshold.
Changes to an association of an existing product to a catalog → Delta: Product index

Business component: Store (direct business model)
Adding a new stand-alone direct model store that uses a separate master catalog → Full: All indexes

Business component: Store (extended business model)
Adding a new extended site that uses an existing indexed catalog asset store → Not required
Adding a product, or deleting an existing product, from an existing extended site → Delta: Product index

Business component: Catalog entry (product, package, bundle, kit, item)
Adding a product, or deleting an existing product → Delta: Product index
Updating any existing property, or adding a property to an existing catalog entry, such as the product description, product name, brand name, thumbnail, images, or SKU → Delta: Product index
Updating any existing package or bundle, or adding a new one → Delta: Product index
Associating or removing a product attribute from an existing product → Delta: Product index
Reparenting a catalog entry → Delta: Product index

Business component: Category
Adding a category → Delta: Category index
Deleting an existing category → Full
Updating any existing property, or adding a property to an existing category, such as the category description, thumbnail, or images → Delta: Category index
Reparenting a category → Delta: Product and Category index

Business component: Merchandising association
Updating or adding a new merchandising association → Not required

Business component: Attribute Dictionary attributes
Adding or removing any value of a newly created product attribute in the attribute dictionary → Not required
Updating any value of an existing attribute in the attribute dictionary that is associated with products → Delta: Product index
Updating an associated catalog entry's Attribute Dictionary attributes or their allowed values → Delta: Product index

Business component: Attributes
Updating any value of a newly created or existing product attribute → Delta: Product index
Adding or removing any value of an existing product attribute → Delta: Product index
Adding or removing a product attribute → Delta: Product index

Business component: Associated asset
Uploading a new attachment and associating it with an existing product → Delta: Product and Unstructured index
Reuploading or deleting an existing attachment that is associated with only one product → Delta: Product and Unstructured index
Reuploading or deleting an existing attachment that is associated with existing products → Delta: Product and Unstructured index

Business component: Price
Updating any existing (default) price rule, or adding a new one to a store → Delta: Product index
Updating the store default offer price for a product → Delta: Product index
Updating the list price for a product → Not required

Business component: Contract
Creating or changing a contract by using Catalog Filter from within the WebSphere Commerce Accelerator → Full

Business component: Marketing
Adding, changing, or deleting an existing marketing activity (Web, Dialog) → Not required

Business component: Search rule
Adding, changing, or deleting an existing search rule → Not required

Business component: Search term association
Adding, changing, or deleting an existing search term association → Not required

Business component: Versioning
Rolling back or forward to another version of a category → Delta: Product and Category index
Rolling back or forward to another version of a product → Delta: Product index

Business component: Inventory
Updates to the inventory search index → Full: Inventory index
Common business tasks that affect the search index with workspaces enabled
The following table highlights the available index types for approved content (base) and
workspaces. All reindexing types that are listed in the table are denoted against the
following index types:
Indexing is triggered against the base schema to index the workspace changes under the
Approved content index, and
Indexing is required against the workspace schema to clean up the approved changes
from the workspace index.
The following table describes business tasks that affect the search index, with the
reindexing type for the Approved content index and for the Workspace index.

Business component: Catalog (master or sales catalog)
Linking or unlinking to an existing subcategory from a catalog tree → Delta: Product and Category index (approved content and workspace)
Linking or unlinking for a top category → Full: All indexes (approved content and workspace)
Changes to an association of an existing product to a catalog → Delta: Product index (approved content and workspace)
Creating a sales catalog → Not required
Updating a catalog description → Not required
Updating the default catalog → Not required

Business component: Store (direct business model)
Adding a new stand-alone direct model store that uses a separate master catalog → Full: All indexes (approved content and workspace)

Business component: Store (extended business model)
Adding a new Extended Site that uses an existing indexed catalog asset store → Not required
Adding a product, or deleting an existing product, from an existing Extended Site → Delta: Product index (approved content and workspace)

Business component: Catalog entry (product, package, bundle, kit, item)
Adding a product, or deleting an existing product → Delta: Product index (approved content and workspace)
Updating any existing property, or adding a property to an existing catalog entry, such as the product description, product name, brand name, thumbnail, images, or SKU → Delta: Product index (approved content and workspace)
Updating any existing package or bundle, or adding one → Delta: Product index (approved content and workspace)
Associating or removing a product attribute from an existing product → Delta: Product index (approved content and workspace)
Reparenting a catalog entry → Delta: Product index (approved content and workspace)
Updating the sequence of a catalog entry within a category → Delta: Product index (approved content and workspace)
Unpublishing a product (Display to customer not selected in the Management Center) → Delta: Product index (approved content and workspace)

Business component: Category
Adding a subcategory to an existing category → Delta: Category index (approved content and workspace)
Deleting a subcategory from an existing category → Full: All indexes (approved content); Full (workspace)
Updating any existing property, or adding a property to an existing category, such as the category description, thumbnail, or images → Delta: Category index (approved content and workspace)
Reparenting a category → Delta: Product and Category index (approved content and workspace)
Updating the sequence of a sales category when Expanded Category Navigation is disabled → Delta: Category index (approved content and workspace)
Updating the sequence of a sales category when Expanded Category Navigation is enabled → Delta: Product and Category index (approved content and workspace)
Unpublishing a category (Display to customer not selected in the Management Center) → Delta: Category index (approved content and workspace)
Unpublishing a category (Display to customer not selected in the Management Center) when deep category unpublish is enabled → Full: Product and Category index (approved content and workspace)

Business component: Merchandising association
Updating or adding new merchandising associations → Not required

Business component: Attribute Dictionary attributes
Adding a value to an existing attribute dictionary attribute → Not required
Updating or removing any value of an existing attribute in the attribute dictionary that is associated with products → Delta: Product index (approved content and workspace)
Adding an attribute dictionary attribute → Not required
Marking an attribute dictionary attribute as searchable or facetable → Delta: Product index (approved content and workspace)
Removing an attribute dictionary attribute → Delta: Product index (approved content and workspace)

Business component: Attributes
Updating any value of a newly created or existing product attribute → Delta: Product index (approved content and workspace)
Adding or removing any value of an existing product attribute → Delta: Product index (approved content and workspace)
Adding or removing a product attribute → Delta: Product index (approved content and workspace)

Business component: Associated asset
Uploading a new attachment and associating it with an existing product → Delta: Product and Unstructured index (approved content and workspace)
Uploading or deleting an existing attachment that is associated with one or more products (note: you must also update the product) → Delta: Product and Unstructured index (approved content and workspace)

Business component: Price
Updates to the default store offer price for a product → Delta: Product index (approved content and workspace)

Business component: Contract
Creating or changing a contract by using Catalog Filter from within the WebSphere Commerce Accelerator → Full: All indexes (approved content and workspace)

Business component: Marketing
Adding, changing, or deleting an existing Web or Dialog activity → Not required

Business component: Versioning
Rolling back or forward to another version of a category → Delta: Product and Category index (approved content and workspace)
Rolling back or forward to another version of a product → Delta: Product index (approved content and workspace)

Business component: Inventory
Updates to the inventory search index → Full: Inventory index (approved content and workspace)
Example: Reading a table row for common business tasks
When a business user is working on a workspace schema and creating a new product in
the Catalogs tool, a delta reindexing is required to update the workspace product index.
Performance logger
The performance logger produces trace information about the response time when HCL
Commerce calls out to an external system. The trace can be used by a monitor to
measure response times.
The trace is enabled using the following string: com.ibm.commerce.performance=fine.
The name and location of the trace file
is: WAS_profiledir/logs/performanceTrace.json.
The following is a sample entry in the performance trace file, in JSON format:
{"timestamp": "2012/10/02 23:56:38:265 EDT", "threadID": "0000009c",
"source": "External OMS",
"service": "getPage-getOrderList", "serviceTime": "6188 ms"},
The following external service calls are traced by default, together with the API that
invokes the performance logger:

API: ProcessInventoryRequirementCancelInventoryReservationActionCmdImpl.callCancelInventoryService()
Source: External OMS
Service: multiAPI-cancelReservation

API: ProcessInventoryRequirementReserveInventoryActionCmdImpl.callReserveInventoryService()
Source: External OMS
Service: reserveAvailableInventory

API: FetchTransferredExternalOrderByStoreMemberAndStatusCmdImpl.fetchExternalOrders()
Source: External OMS
Service: getPage-getOrderList
An option that stores the data in cache but does not persist that data to the database.
This is the default behavior for the recently viewed categories and products, but can be
configured for any user behavior data recording.
An option to set the maximum size of user behavior data that is stored in cache before
that data is persisted to the database.
The following areas have performance considerations:

Area: Activity and behavior rules
All information that is related to the processing of an activity in the storefront is put into
the marketing cache, including the definition of the behavior rules that need to match
against URLs and controller commands.

Area: Recording of user data
Recording is performed in batch mode. Data that is related to an activity is recorded only
once that activity exists. For example, only once a target has a behavior rule that requires
a customer to have browsed the Furniture category five times is the customer's browsing
of the Furniture category recorded. The browsing of other categories is not recorded in
this instance.

Area: Optional persistence
For high amounts of data, for example a recently browsed list, the data is kept in cache
and persisted to the database in batches.

Area: Accessing user data
A customer's online behavior user data is kept in a user data cache. While a customer is
browsing the site, any access to their user data is from the cache, which avoids any
database access.

Area: Customer is in Segment trigger
Processing time of large dialog activities can be configured to run off peak. Segment
evaluation can be expensive and result in many customers being run through a dialog
activity.

Area: Aggregate statistics
Views, clicks, and the number of customers who reach an element are accumulated in
memory and are periodically persisted to the database, which avoids a database write on
every page visit.
Emerald REST Caching on the TS Server for Commerce 9.1
The Emerald store is powered by the REST framework, and the cache is implemented by using
a REST servlet.
To enable caching for the default Emerald store, copy the sample
file /Rest/WebContent/WEB-INF/cachespec.xml.emerald.sample.store to the
REST/WEB-APP/cachespec.xml file and restart the server. The following URLs are cached:
/wcs/resources/store/{storeId}/adminLookup?q=findByStoreIdentifier&storeIdentifier={storename}
/wcs/resources/store/{storeId}/associated_promotion?q=byProduct&qProductId={productId}&langId={langId}
/wcs/resources/store/{storeId}/espot/{espotIdentifier}?catalogId={catalogId}&name={name}&langId={langId}
/wcs/resources/store/{storeId}/inventoryavailability/{catentryId}?langId={langId}
/wcs/resources/store/{storeId}/guestidentity?langId={langId}
/wcs/resources/store/{storeId}/online_store
Only a few of the cached URLs require the invalidation method to be enabled. The inventory
URL is the critical one: it is cached and requires a caching trigger to keep it up to date with
the system's real inventory.
The sample cachespec.xml that defines the rules is provided in the file:
/Rest/WebContent/WEB-INF/cachespec.xml.emerald.sample.store
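For orientation only, a WebSphere Dynamic Cache rule for the inventory URL might be shaped like the following sketch. The real rules ship in the sample file named above; the name pattern and the component id/type values here are assumptions, not the shipped configuration:

```xml
<cache-entry>
  <class>servlet</class>
  <!-- REST path for inventory availability; wildcards are illustrative -->
  <name>/wcs/resources/store/*/inventoryavailability/*</name>
  <cache-id>
    <!-- Vary the cache entry by the request path and the language -->
    <component id="" type="pathinfo">
      <required>true</required>
    </component>
    <component id="langId" type="parameter">
      <required>false</required>
    </component>
  </cache-id>
</cache-entry>
```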
In the Emerald store, which is a REST-based store, product inventory is implemented as a
separate REST call to the TS-app server. This call is submitted straight from the browser to
the TS-app, bypassing the CDN layer. Caching this call significantly improves the store's
performance. Although the call can be cached on the CDN, the content would be temporary
and the invalidation policy rudimentary. Caching it on the application server, on the other
hand, is considerably more versatile and enables invalidation techniques that satisfy
business needs and keep the data fresh over time.
In general, the inventory/quantity is updated regularly from both the backend and user interaction
with the system (when submitting the order).
Daily updates and direct feeds run on the production database, causing inventory
adjustments. The triggers keep track of these changes according to the inventory tracking
rules. In other words, the inventory feed is treated in the same way as users browsing and
purchasing items from the inventory.
While surfing, the browse page data will verify stock/inventory information to ensure that the end
user receives accurate information.
The inventory in the Emerald store is accessed through REST requests sent from the user's browser
to the TS-app server. These REST calls return inventory counts in the response. In most cases, the
actual count is less important than knowing whether the product is in stock or out of stock. If that is
the case, the cached Boolean "in stock" value needs to be refreshed only when one of the following
transitions occurs:
The stock count goes from a positive number to zero (the product is sold out / inventory is depleted).
The stock count goes from zero to a positive number (new stock arrived in the system).
In other words, the inventory can be cached until it runs out of stock, at which point it should be
invalidated, OR the inventory can be stored until the backend feed puts the item back into stock.
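The two transitions above can be captured in a small predicate. The following is an illustrative
Python sketch (the function name is hypothetical, not part of the store code); it expresses exactly
the condition under which the cached "in stock" Boolean must be invalidated:

```python
def needs_invalidation(old_qty: int, new_qty: int) -> bool:
    """Return True only when the in-stock flag flips.

    The cached "in stock" Boolean changes only on these transitions:
      * positive -> zero  (inventory is depleted)
      * zero -> positive  (new stock arrived)
    Any other quantity change leaves the cached value correct.
    """
    return (old_qty > 0 and new_qty == 0) or (old_qty == 0 and new_qty > 0)
```

This is the same condition the database trigger later encodes in its WHEN clause.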
The cached content can also follow a slightly more complex lifecycle, with three states:
In stock
Low stock
Out of stock
In that case, the inventory cache must be invalidated as it passes through each of these states,
which adds one more condition:
The stock count crosses the low-stock threshold (we only need to detect the crossing itself, not
the direction from which the value approached the threshold).
The inventory invalidation strategy must follow this mode of operation and logic. In this case, the
mechanism for tracking changes and emitting invalidation messages to the rest of the system
should be built into the caching triggers.
Daily Updates
Daily updates and direct feeds operate on the production database and may change the database's
inventory table. Since the caching triggers on this table will be active, data-feed changes are
handled in the same place and in the same manner as user browsing and checkout activity on the
inventory.
The design of the dependencies and invalidations is defined by the website's actual implementation.
For example, the inventory calls on the TS-app server are REST calls that can be invalidated
directly; if, however, the inventory is also displayed on search response pages, the page cache has
to be emptied altogether.
Conditional Triggers
The triggers should fire only when the condition is satisfied, i.e. they should insert into the
CACHEIVL table only when the condition holds. This is achieved with the WHEN clause in the
trigger:
WHEN (clause)
BEGIN ATOMIC
The clause for the first case, with only the in-stock and out-of-stock states, would look like:
WHEN ((N.quantity > 0 AND O.quantity = 0) OR (N.quantity = 0 AND O.quantity > 0))
BEGIN ATOMIC
With three levels of inventory (in stock, low stock, and out of stock), the clause must also fire when
the quantity crosses the low-stock threshold (10 in this example):
WHEN ((N.quantity > 0 AND O.quantity = 0) OR (N.quantity = 0 AND O.quantity > 0) OR
(N.quantity <= 10 AND O.quantity > 10))
BEGIN ATOMIC
Using N.quantity <= 10 rather than an exact comparison ensures the trigger also fires when an
update jumps past the threshold without landing exactly on it.
The example below demonstrates how to set up a trigger on an inventory table. In addition to the
WHEN clause implementing the business logic, the trigger inserts invalidation messages into
CACHEIVL for the following objects:
SKU level
Product level
Bundle level
--#SET TERMINATOR #
DROP TRIGGER ch_inventory_u#
CREATE TRIGGER ch_inventory_u AFTER UPDATE ON inventory
REFERENCING NEW AS N OLD AS O FOR EACH ROW MODE DB2SQL
WHEN ((N.quantity > 0 AND O.quantity = 0) OR (N.quantity = 0 AND O.quantity > 0)
  OR (N.quantity <= 10 AND O.quantity > 10))
BEGIN ATOMIC
  -- SKU level: invalidate the updated catalog entry itself
  INSERT INTO cacheivl (template, dataid, inserttime)
    VALUES (NULLIF('A', 'A'), 'catentryId:'||RTRIM(CHAR(N.catentry_id)), current_timestamp);
  -- Product level: invalidate the parent product of the SKU
  -- (join reconstructed from the surviving fragment; verify against your schema)
  INSERT INTO cacheivl (template, dataid, inserttime)
    SELECT NULLIF('A', 'A'), 'catentryId:'||RTRIM(CHAR(product.catentry_id)), current_timestamp
    FROM catentry product, catentrel
    WHERE catentrel.catentry_id_child = N.catentry_id
      AND catentrel.catentry_id_parent = product.catentry_id;
  -- Bundle level: invalidate any bundle containing the product
  -- (join reconstructed from the surviving fragment; verify against your schema)
  INSERT INTO cacheivl (template, dataid, inserttime)
    SELECT NULLIF('A', 'A'), 'catentryId:'||RTRIM(CHAR(bundle.catentry_id)), current_timestamp
    FROM catentry bundle, prodtobund
    WHERE prodtobund.catentry_id_child = N.catentry_id
      AND prodtobund.catentry_id_parent = bundle.catentry_id;
END#
The three INSERT statements in the trigger match the three object types that may be affected by a
change to an inventory record. The trigger as written applies directly to a DB2 database, and with
minimal changes it can be adapted to Oracle as well. Once the trigger is in place, the invalidations
start working with no further changes.
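As a quick sanity check of this conditional-trigger logic outside of DB2, the same behavior can be
reproduced with Python's built-in sqlite3 module, since SQLite triggers also support a WHEN
clause. The sketch below uses SQLite syntax (not DB2), models only the SKU-level insert, and
assumes the low-stock threshold of 10 from the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (catentry_id INTEGER, quantity INTEGER);
CREATE TABLE cacheivl (template TEXT, dataid TEXT, inserttime TEXT);

-- Fire only on in-stock/out-of-stock flips, or when stock falls
-- through the (assumed) low-stock threshold of 10.
CREATE TRIGGER ch_inventory_u AFTER UPDATE ON inventory
FOR EACH ROW
WHEN (NEW.quantity > 0 AND OLD.quantity = 0)
  OR (NEW.quantity = 0 AND OLD.quantity > 0)
  OR (NEW.quantity <= 10 AND OLD.quantity > 10)
BEGIN
  INSERT INTO cacheivl
  VALUES (NULL, 'catentryId:' || NEW.catentry_id, datetime('now'));
END;
""")

conn.execute("INSERT INTO inventory VALUES (1001, 50)")
conn.execute("UPDATE inventory SET quantity = 40 WHERE catentry_id = 1001")  # no state change
conn.execute("UPDATE inventory SET quantity = 0  WHERE catentry_id = 1001")  # depleted
conn.execute("UPDATE inventory SET quantity = 25 WHERE catentry_id = 1001")  # restocked
rows = conn.execute("SELECT dataid FROM cacheivl").fetchall()
print(rows)  # -> [('catentryId:1001',), ('catentryId:1001',)]
```

Only the two state-changing updates produce invalidation messages; the harmless 50-to-40
adjustment is filtered out by the WHEN clause, which is exactly the load-reduction the conditional
triggers aim for.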