One of the key features of the Case Modelling solution is that various components are split from the Runtime, so they can be managed separately. This makes it possible to fine-tune different parts of the solution using smaller building blocks. User interaction is limited to the Runtime; all other components are stateless. Since the solution is event based, the queue plays a central role in how the components communicate with each other. When maintaining and tuning the solution, the queue is a good starting point for checking whether components are scaled correctly for your situation. The most critical Blueriq components concerned with user interaction and processing operational data are the Runtime, Case Engine, Customerdata Service and DCM Lists Service. This guide focuses mainly on scaling these components.
The Runtime has two main jobs in a DCM solution: handling user interaction through user sessions and processing automatic tasks. The second job is an asynchronous process, where the Runtime reads from the queue:
dcmTasksEventsQueue
When the number of unacknowledged messages increases for this queue, the Runtime cannot cope with the load and will start lagging behind.
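As a rough illustration of this signal, the sketch below checks whether a series of unacknowledged-message counts is trending upward. It is written in Python and is not part of Blueriq: the function name, the `window` and `tolerance` parameters, and the way the samples are collected (for example from your broker's monitoring tooling) are all assumptions.

```python
from statistics import mean


def backlog_is_growing(samples, window=3, tolerance=0):
    """Return True when the queue backlog appears to be increasing.

    `samples` is a chronological list of unacknowledged-message counts
    (hypothetical input; how you collect them depends on your broker).
    The average of the last `window` samples is compared with the average
    of the `window` samples before that.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(samples) < 2 * window:
        # Not enough history to compare two windows.
        return False
    earlier = mean(samples[-2 * window:-window])
    recent = mean(samples[-window:])
    return recent - earlier > tolerance
```

A steady backlog that merely fluctuates does not trigger the check; only a sustained rise between the two windows does. The same comparison could be applied to any queue a component consumes.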
You can either start another Runtime or configure the existing Runtime to process more automatic task events simultaneously. The following properties can be used to configure the Runtime to process more automatic tasks at the same time:
# queue consuming threads
blueriq.dcm.concurrency.concurrent-consumers=5
blueriq.dcm.concurrency.max-concurrent-consumers=5
When tuning the Runtime, make sure it can handle the desired number of user sessions together with the maximum number of concurrent automatic task processors.
For most situations where the system load is limited, it is reasonable to have a single Runtime handle both user interaction and automatic tasks. For high-load applications, however, it can make sense to split these responsibilities so that each can be scaled independently. This can be configured:
Runtime type | Configuration |
User-interaction only |
Automatic task processing only |
The advantage of separating these responsibilities is that the first Runtime can be scaled based on the number of user sessions, and the second based on the number of unacknowledged messages on the queue.
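A minimal sketch of that idea in Python, assuming each Runtime group is sized from its own metric. The capacity figures (sessions per Runtime, task events per Runtime) are hypothetical and would have to come from your own measurements; nothing here is a Blueriq API.

```python
import math


def desired_instances(load, capacity_per_instance, minimum=1):
    """Number of instances needed to carry `load`, never below `minimum`."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    return max(minimum, math.ceil(load / capacity_per_instance))


def plan_runtime_scaling(user_sessions, unacked_task_events,
                         sessions_per_runtime, events_per_runtime):
    """Size the two Runtime groups independently.

    All four inputs are assumptions for illustration: the first two are
    measured values, the last two are capacities you determined by testing.
    """
    return {
        "user_interaction": desired_instances(user_sessions, sessions_per_runtime),
        "automatic_tasks": desired_instances(unacked_task_events, events_per_runtime),
    }
```

Because each group has its own input, a spike in automatic task events changes only the second number and leaves the user-facing Runtimes untouched.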
The Case Engine is mostly concerned with handling messages from the queue. It is also used for several synchronous calls from the Runtime and the DCM Maintenance APP, such as starting a task, GetCaseInfo or GetTaskInfo.
When tuning the Case Engine, make sure the synchronous calls complete in a reasonable time, since users will be waiting for the result.
The most important queue for the Case Engine, where all related events are stored, is:
dcmEventsQueue
When the number of unacknowledged messages increases for this queue, the Case Engine cannot cope with the load and will start lagging behind. Just like the Runtime, the Case Engine can be configured to consume the queue with more than one thread at the same time:
# parameters to make sure events are consumed multi-threaded |
The Case Engine can be tuned up to the maximum number of concurrent consumers at which it is still able to perform synchronous tasks in a reasonable time. Since sessions are almost always short-lived, memory usage may also be lower than that of the Runtime. When the Case Engine cannot cope with the load, a new Case Engine can be added to the solution. Just make sure all synchronous calls from the Runtime and the DCM Maintenance APP are sent through a load balancer.
Process evaluation can be very database-intensive. Increasing the number of concurrent consumers therefore increases the chance of database locks and deadlocks. In most cases the system recovers from these locks automatically by retrying the action after some time, but it is something to take into account.
The Case-Engine scheduler is used to process timed events (like task expire dates and process timers). Make sure the scheduler is configured to store timed events in the SQL database, and that it is configured to cope with more than one Case Engine using the same database. The scheduler's only responsibility is to put messages on the queue at the right time, so there is probably no need to scale up this specific component in a high-load situation.
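The retry behaviour mentioned above can be pictured with a generic sketch. This is not the Case Engine's actual implementation: `TransientLockError`, `run_with_retry` and the linearly growing delay are assumptions chosen only to illustrate "retry the action after some time".

```python
import time


class TransientLockError(Exception):
    """Hypothetical error standing in for a database lock or deadlock failure."""


def run_with_retry(action, attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Call `action`, retrying when it fails with a transient lock error.

    The wait grows linearly with the attempt number. After the last
    attempt the error is re-raised so the caller can decide what to do.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except TransientLockError:
            if attempt == attempts:
                raise
            sleep(delay_seconds * attempt)
```

The more consumers compete for the same rows, the more often such a retry path is taken, which is why raising the consumer count does not always raise throughput proportionally.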
The DCM Lists Service stores all data used in the Case and Worklist in the Runtime, together with DCM_CaseSearch service requests. The lists service listens to the queue:
dcmListsServiceEventsQueue
When the number of unacknowledged messages increases, the lists service cannot handle the load. As a result, list queries can start lagging behind the actual state in the Case Engine (for example, locked cases still appear locked even after the task has been processed by the Case Engine).
Either the number of consuming threads can be increased, or the number of dcm-lists applications.
blueriq:
  dcm:
    lists:
      event:
        consumer:
          concurrency:
            max-concurrent-consumers: 5
            concurrent-consumers: 5
Make sure the lists-service is still capable of handling all synchronous requests, since users will be waiting for the results.
The Case-Engine processes messages, such as a "task completed" message from a Runtime. When it processes such a message, it sends the resulting messages to the outbox. To guarantee transactionality, all messages (such as new automatic tasks, dcm-lists updates, trace updates and timeline updates) are stored in the database and processed after the database commit has taken place. At that point the Case-Engine considers the processing of the message to be completed, and an async thread makes sure all messages from the outbox are eventually pushed to the right queues. However, one action can place many messages in the outbox, especially when the trace is turned on. Processing the outbox has therefore been made configurable, so that it can be scaled up.
When the process_outbox table contains many records, consider tuning the outbox as explained below.
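The general outbox pattern described here can be sketched in a few lines of Python with an in-memory SQLite database. This is an illustration of the pattern only, not the Case-Engine's code: the `tasks` and `outbox` tables, their columns, and the `publish` callback are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a state table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, queue TEXT, payload TEXT)"
    )
    conn.commit()
    return conn


def complete_task(conn, task_id, outgoing):
    """Store the state change and its outgoing messages in ONE transaction.

    `outgoing` is a list of (queue, payload) tuples. If anything fails,
    neither the state change nor the messages are kept.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO tasks (id, status) VALUES (?, 'completed')",
            (task_id,),
        )
        conn.executemany(
            "INSERT INTO outbox (queue, payload) VALUES (?, ?)", outgoing
        )


def publish_outbox(conn, publish, batch_size=100):
    """Push committed outbox rows to their queues, oldest first.

    A row is deleted only after `publish` returned, so a crash in between
    leads to a repeated delivery rather than a lost message.
    """
    rows = conn.execute(
        "SELECT id, queue, payload FROM outbox ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, queue, payload in rows:
        publish(queue, payload)
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    return len(rows)
```

Separating `complete_task` from `publish_outbox` is what lets message handling finish quickly while publishing catches up in the background, and it is also why the outbox can grow when publishing is slower than processing.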
Outbox processing can be tuned by increasing or decreasing the batchsize, concurrent consumers, async threads and interval. The diagram below shows how the whole flow works.
After messages are saved to the outbox, the process outbox publishers process them by removing them from the outbox and publishing them to the other queues. The number of publishers can be configured through the async thread pool size; increasing it speeds up clearing the outbox. The default value is 1.
The outbox poller can also be tuned by specifying the batchsize, which is the number of messages pulled into memory for processing. The default is 10000.
The interval configures how often the poller fetches messages from the outbox and starts processing another batch. The default is 2 minutes.
Problem: Outbox is full, queue is empty
Solution: Increase async threads
Problem: Out of memory error
Solution: Decrease batchsize
Problem: Messages in the queue are processed very slowly
Solution: Increase concurrent consumers
Problem: After the system was down, the outbox is full of messages and they are processed very slowly
Solution: Increase batchsize, decrease interval
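To see why batchsize and interval matter after downtime, the following Python sketch estimates how long a backlog takes to drain. It rests on a simplifying assumption that is not confirmed by the product documentation: the poller handles at most one batch per interval, each batch finishes within its interval, and no new messages arrive meanwhile. The function names are hypothetical; the default values are the ones stated above.

```python
import math


def polling_cycles_needed(backlog, batch_size):
    """Number of poller runs needed to pick up `backlog` messages."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(backlog / batch_size)


def estimated_drain_minutes(backlog, batch_size=10_000, interval_minutes=2.0):
    """Rough drain time under the one-batch-per-interval assumption."""
    return polling_cycles_needed(backlog, batch_size) * interval_minutes
```

Under this model a backlog of 250,000 messages needs 25 polling cycles with the defaults, or about 50 minutes. Raising the batchsize to 50,000 and lowering the interval to 1 minute brings the estimate down to 5 cycles and about 5 minutes, at the cost of holding more messages in memory per batch.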
The role of the Customerdata Service is limited to storing and retrieving data. All searches are done using the DCM Lists Service. The Customerdata Service therefore does not need much tuning and can be scaled horizontally when needed. This is done by adding a new instance and routing all customerdata endpoint calls through a load balancer (configured in the Case-Engine).