Best practice - How to tune the Case Engine components

One of the key features of the Case Modelling solution is that various components are split from the Runtime, so they can be managed separately. This makes it possible to fine-tune different parts of the solution using smaller building blocks. User-interaction is limited to the Runtime; all other components are stateless. Since the solution is event based, the queue plays a central role in how the components communicate with each other. When maintaining and tuning the solution, the queue is a good starting point for checking whether components are scaled correctly for your situation. The most critical Blueriq components concerned with user-interaction and processing operational data are the Runtime, Case Engine, Customerdata Service and DCM Lists Service. This guide focuses mainly on scaling these components.

1 Runtime
- 1.1 Configuration possibilities for separate Runtimes
2 Case Engine
- 2.1 Scheduler
3 DCM Lists Service
4 Process Outbox and Outbox Poller
- 4.1 Async threads pool
- 4.2 Outbox Poller
- 4.3 Example issues and solutions
  - 4.3.1 Example 1
  - 4.3.2 Example 2
  - 4.3.3 Example 3
  - 4.3.4 Example 4
5 Customerdata Service

Runtime

The Runtime has two main jobs in a DCM solution: handling user-interaction through user sessions, and processing automatic tasks. The second job is an asynchronous process, in which the Runtime reads from the queue:

dcmTasksEventsQueue

When the number of unacknowledged messages increases for this queue, the Runtime cannot cope with the load and will start lagging behind.
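To make "the number of unacknowledged messages increases" concrete, the check can be sketched as a small trend test. This is only an illustration: the function name, the `min_samples` threshold and the sample values are assumptions, and the readings are expected to come from whatever queue monitoring you already have in place.

```python
from typing import Sequence


def backlog_is_growing(samples: Sequence[int], min_samples: int = 3) -> bool:
    """Return True when the unacknowledged-message count rises in every sample.

    `samples` are hypothetical readings of the queue's unacknowledged count,
    oldest first, taken at a fixed interval by your own monitoring.
    """
    if len(samples) < min_samples:
        return False  # not enough data to call it a trend
    # Every reading must be strictly higher than the one before it.
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# A short burst that drains again is not a scaling signal...
print(backlog_is_growing([10, 250, 40, 5]))    # False
# ...but a count that keeps climbing is.
print(backlog_is_growing([10, 60, 140, 300]))  # True
```

A strict "always rising" rule is deliberately conservative; a moving average or a threshold on the absolute count may suit a busy system better.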

There is a choice to either start another Runtime, or make sure the Runtime can process more automatic task events simultaneously. The following properties can be used to configure the Runtime to process more automatic tasks at the same time:

application-case-engine-client.properties

# queue consuming threads
blueriq.dcm.concurrency.concurrent-consumers=5
blueriq.dcm.concurrency.max-concurrent-consumers=5

When tuning the Runtime, make sure it is capable of handling the desired number of user sessions together with the maximum number of concurrent automatic task consumers.

Configuration possibilities for separate Runtimes

For most situations where the system load is limited, it is reasonable to have a single Runtime handle user-interactions and automatic tasks. However, with high load applications, it could make sense to split these responsibilities, making sure they can be scaled accordingly. This can be configured:

Runtime type	Configuration
User-interaction only	Add the Runtime to the loadbalancer, so that user-interaction can be routed to it. Setting `blueriq.dcm.execute-automatic-tasks=false` in `application-case-engine-client.properties` makes sure automatic tasks are not processed by this Runtime instance.
Automatic task processing only	Do not add the Runtime to the loadbalancer, so that no user-interaction is routed to it. Setting `blueriq.dcm.execute-automatic-tasks=true` in `application-case-engine-client.properties` makes sure automatic tasks are processed, possibly together with the concurrency settings described above.

An advantage of separating these responsibilities could be that the first Runtime can be scaled on the number of user sessions, and the second on the number of unacknowledged messages on the queue.
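The two scaling signals can be turned into instance counts with simple arithmetic. The sketch below assumes you have measured how many user sessions one user-interaction Runtime can serve and how large a backlog one task-processing Runtime can work through; both capacity figures and all names are hypothetical.

```python
import math


def runtimes_needed(load: int, capacity_per_runtime: int) -> int:
    """Smallest number of instances that covers `load`, with at least one running."""
    if capacity_per_runtime <= 0:
        raise ValueError("capacity_per_runtime must be positive")
    return max(1, math.ceil(load / capacity_per_runtime))


# Hypothetical capacities; measure your own before relying on them.
SESSIONS_PER_UI_RUNTIME = 200    # user sessions one user-interaction Runtime handles
BACKLOG_PER_TASK_RUNTIME = 1000  # unacknowledged messages one task Runtime may face

ui_runtimes = runtimes_needed(load=450, capacity_per_runtime=SESSIONS_PER_UI_RUNTIME)
task_runtimes = runtimes_needed(load=2500, capacity_per_runtime=BACKLOG_PER_TASK_RUNTIME)
print(ui_runtimes, task_runtimes)  # 3 3
```

Because the two Runtime types scale on different signals, each count can change without affecting the other.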

Case Engine

The Case Engine is mostly concerned with handling messages from the queue. It is also used for several synchronous calls from the Runtime and DCM Maintenance APP, such as starting a task, GetCaseInfo or GetTaskInfo.

When tuning the Case Engine, make sure the synchronous calls are performed in reasonable time, since users will be waiting on the result.

The most important queue for the Case Engine, where all related events are stored, is:

dcmEventsQueue

When the number of unacknowledged messages increases for this queue, the Case Engine cannot cope with the load and will start lagging behind. Just like the Runtime, configuration can be set to consume the queue with more than one thread at the same time:

application-case-engine.properties

# parameters to make sure events are consumed multi-threaded
blueriq.case.engine.concurrency.concurrent-consumers=5
blueriq.case.engine.concurrency.max-concurrent-consumers=5

The Case Engine can be tuned up to the maximum number of concurrent consumers at which it can still perform synchronous calls in reasonable time. Since sessions are almost always short-lived, memory usage might also be lower than that of the Runtime. When the Case Engine cannot cope with the load, a new Case Engine can be added to the solution. Just make sure all synchronous calls from the Runtime / DCM Maintenance APP are sent through a loadbalancer.
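A rough way to judge whether more consumers will help is to estimate how long the current backlog takes to drain. The sketch below is an idealised calculation, not a Blueriq tool: it assumes every consumer works in parallel at the same average speed, and all names and figures are illustrative.

```python
from typing import Optional


def drain_time_seconds(
    backlog: int,
    consumers: int,
    seconds_per_message: float,
    arrivals_per_second: float = 0.0,
) -> Optional[float]:
    """Estimated seconds to empty the queue, or None when the backlog never drains."""
    processing_rate = consumers / seconds_per_message  # messages handled per second
    net_rate = processing_rate - arrivals_per_second   # what is left to eat the backlog
    if net_rate <= 0:
        return None  # new messages arrive at least as fast as they are processed
    return backlog / net_rate


# 6000 waiting messages, 0.5 s per message, 4 new messages arriving per second.
print(drain_time_seconds(6000, consumers=5, seconds_per_message=0.5, arrivals_per_second=4.0))   # 1000.0
print(drain_time_seconds(6000, consumers=10, seconds_per_message=0.5, arrivals_per_second=4.0))  # 375.0
```

Real throughput will be lower than this estimate once consumers start competing for the same database rows, so treat the result as a best case.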

Process evaluation can be very database-intensive. Therefore, increasing the number of concurrent consumers increases the likelihood of database (dead)locks. In most cases the system recovers from these locks automatically by retrying the action after some time, but it is something to take into account.
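The general idea of recovering from a lock by retrying after some time can be illustrated as follows. This is a generic sketch and not the Case Engine's actual implementation: `TransientLockError`, `retry_on_lock` and `evaluate_process` are hypothetical names, and the delays are arbitrary.

```python
import time


class TransientLockError(Exception):
    """Hypothetical stand-in for a database deadlock or lock-timeout error."""


def retry_on_lock(action, attempts=3, delay_seconds=0.1, sleep=time.sleep):
    """Run `action`, retrying after a pause when it fails on a transient lock."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TransientLockError:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            sleep(delay_seconds * attempt)  # wait a little longer each time


calls = {"count": 0}


def evaluate_process():
    """Fails on the first two calls, as if another consumer held the lock."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientLockError("deadlock detected")
    return "done"


print(retry_on_lock(evaluate_process, sleep=lambda seconds: None))  # done
```

Every retry costs time, so a high number of concurrent consumers that lock each other frequently can end up slower than a lower number that rarely collide.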

Scheduler

The Case-Engine scheduler is used to process timed events (like task expiry dates and process timers). Please make sure the scheduler is configured properly to store timed events in the SQL database. Also, make sure it is properly configured to cope with more than one Case Engine using the same database. The only responsibility of the scheduler is to put messages on the queue at the right time, so there is probably no need to upscale this specific component in a high load situation.

DCM Lists Service

The DCM Lists Service stores all data used in the Case and Worklist in the Runtime, together with DCM_CaseSearch service requests. The lists service listens to the queue:

dcmListsServiceEventsQueue

When the number of unacknowledged messages increases, the lists service cannot handle the load. As a result, list queries can start lagging behind the actual situation in the Case Engine (for example, locked cases still appear locked even after the task has been processed by the Case Engine).

Either the number of consuming threads can be increased, or the number of dcm-lists applications.

blueriq-dcm-lists.yml

blueriq:
  dcm:
    lists:
      event:
        consumer:
          concurrency:
            max-concurrent-consumers: 5
            concurrent-consumers: 5

Make sure the lists-service is still capable of handling all synchronous requests, since users will be waiting for the results.

Process Outbox and Outbox Poller

The Case-Engine processes messages, such as a "task completed" message from a Runtime. When it processes such a message, it sends the resulting messages to the outbox. To guarantee transactionality, all messages (such as new automatic tasks, dcm-lists-updates, trace updates and timeline updates) are stored in the database and processed after the database commit has taken place. At that point the Case-Engine considers the processing of the message to be completed, and an async thread makes sure all messages from the outbox are eventually pushed to the right queues. However, a single action can put many messages in the outbox, especially when the trace is turned on. Therefore, processing the outbox has been made configurable, so upscaling is possible.

When the process_outbox table contains many records, consider tuning the outbox as explained below.

Outbox processing can be tuned by increasing or decreasing the batchsize, concurrent consumers, async threads and interval. The diagram below shows how the whole flow works.

Async threads pool

After messages have been saved to the outbox, process outbox publishers process them by removing them from the outbox and publishing them to the other queues. The number of publishers can be configured through the async threads pool size. Increasing it speeds up clearing the outbox. The default value is 1.

blueriq.case.engine.outbox.async.threads.pool.size=5

Outbox Poller

The Outbox Poller can also be tuned by specifying the batchsize, which is the number of messages pulled into memory to be processed. The default is 10000.

blueriq.case.engine.outbox.poller.batchsize=10000

The interval configures how often the poller fetches messages from the outbox and starts processing another batch. The default is 2 minutes.

blueriq.case.engine.outbox.poller.interval.minutes=2m
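The effect of batchsize and interval on a full outbox can be estimated with a little arithmetic. The sketch below assumes that the poller picks up exactly one batch per interval and that the publishers keep up with each batch; both are simplifying assumptions, and the function name and record counts are illustrative only.

```python
import math


def outbox_drain_minutes(records: int, batchsize: int = 10000, interval_minutes: float = 2) -> float:
    """Minutes until the poller has picked up every record, one batch per interval."""
    batches = math.ceil(records / batchsize)
    return batches * interval_minutes


# 250,000 records in the process_outbox table.
print(outbox_drain_minutes(250_000))                      # 50 with the defaults
print(outbox_drain_minutes(250_000, batchsize=50_000))    # 10
print(outbox_drain_minutes(250_000, interval_minutes=1))  # 25
```

A larger batchsize pulls more messages into memory at once, so weigh the shorter drain time against the memory available to the Case Engine.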

Example issues and solutions

Example 1

Problem: Outbox is full, queue is empty

Solution: Increase async threads

Example 2

Problem: Out of memory error

Solution: Decrease batchsize

Example 3

Problem: Messages in the queue are processed very slowly

Solution: Increase concurrent consumers

Example 4

Problem: After the system was down, the outbox is full of messages and they are processed very slowly

Solution: Increase batchsize, decrease interval
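The four examples above can be kept at hand as a simple lookup, for instance in an operations runbook script. The symptom keys and the `recommend` function are made-up names for illustration; the recommendations themselves are taken directly from the examples.

```python
from typing import Dict, List

# Symptom keys are hypothetical labels; the advice mirrors Examples 1 to 4.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "outbox_full_queue_empty": ["increase async threads"],
    "out_of_memory": ["decrease batchsize"],
    "queue_processed_slowly": ["increase concurrent consumers"],
    "outbox_slow_after_downtime": ["increase batchsize", "decrease interval"],
}


def recommend(symptom: str) -> List[str]:
    """Return the tuning actions for a known symptom, or an empty list otherwise."""
    return RECOMMENDATIONS.get(symptom, [])


print(recommend("outbox_slow_after_downtime"))  # ['increase batchsize', 'decrease interval']
print(recommend("unknown_symptom"))             # []
```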

Customerdata Service

The role of the Customerdata Service is limited to storing and retrieving data. All searches are done by using the DCM-Lists-Service. Therefore, the Customerdata Service does not need much tuning, and can be horizontally scaled when possible. This can be done by adding a new instance and routing all customerdata endpoints through a loadbalancer (configured in the Case-Engine).
