One of the key features of the Case Modelling solution is that various components are split off from the Runtime, so they can be managed separately. This makes it possible to fine-tune different parts of the solution using smaller building blocks. User-interaction is handled exclusively by the Runtime; all other components are stateless. Since the solution is event-based, the queue plays a central role in the communication between components. When maintaining and tuning the solution, the queue is therefore a good starting point to check whether the components are scaled correctly for your situation. The most critical Blueriq components concerned with user-interaction and processing operational data are the Runtime, Case Engine, Customerdata Service and DCM Lists Service. This guide focuses mainly on scaling these components.
Runtime
The Runtime has two main jobs in a DCM solution: handling user-interaction through user sessions and processing automatic tasks. The second job is an asynchronous process, in which the Runtime reads from the following queue:
dcmTasksEventsQueue
When the number of unacknowledged messages increases for this queue, the Runtime cannot cope with the load and will start lagging behind.
In that case, either start another Runtime, or make sure the existing Runtime can process more automatic task events simultaneously. The following properties configure how many automatic tasks the Runtime processes at the same time:
# queue consuming threads
blueriq.dcm.concurrency.concurrent-consumers=5
blueriq.dcm.concurrency.max-concurrent-consumers=5
When tuning the Runtime, make sure it is capable of handling the desired number of user sessions in combination with the maximum number of concurrent automatic task processors.
Configuration possibilities for separate Runtimes
For most situations where the system load is limited, it is reasonable to have a single Runtime handle both user-interactions and automatic tasks. However, for high-load applications it can make sense to split these responsibilities so that each can be scaled independently. This can be configured as follows:
Runtime type | Configuration
---|---
User-interaction only |
Automatic task processing only |
The advantage of separating these responsibilities is that the first Runtime can be scaled based on the number of user sessions, and the second based on the number of unacknowledged messages on the queue.
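As an illustration, such metric-based scaling could be wired up with an autoscaler that watches the queue. The sketch below is a minimal, hypothetical example assuming the solution runs on Kubernetes with KEDA and RabbitMQ as the message broker; the deployment name and threshold are placeholders and are not part of the standard Blueriq configuration.

# Hypothetical sketch: scale the task-processing Runtime on the depth of
# dcmTasksEventsQueue, assuming Kubernetes, KEDA and RabbitMQ are in use.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: runtime-task-processor-scaler    # placeholder name
spec:
  scaleTargetRef:
    name: blueriq-runtime-tasks           # placeholder deployment name
  minReplicaCount: 1
  maxReplicaCount: 4
  triggers:
    - type: rabbitmq
      metadata:
        queueName: dcmTasksEventsQueue
        mode: QueueLength                  # scale on the number of waiting messages
        value: "100"                       # add a replica per 100 queued messages
        hostFromEnv: RABBITMQ_HOST         # AMQP connection string from an environment variable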
Case Engine
The Case Engine is mostly concerned with handling messages from the queue. It is also used for several synchronous calls from the Runtime and DCM Maintenance APP, such as starting a task, GetCaseInfo or GetTaskInfo.
When tuning the Case Engine, make sure the synchronous calls are performed in reasonable time, since users will be waiting for the result.
The most important queue for the Case Engine, on which all related events are stored, is:
dcmEventsQueue
When the number of unacknowledged messages increases for this queue, the Case Engine cannot cope with the load and will start lagging behind. Just like for the Runtime, the Case Engine can be configured to consume the queue with more than one thread at the same time:
# parameters to make sure events are consumed multi-threaded
blueriq.case.engine.concurrency.concurrent-consumers=5
blueriq.case.engine.concurrency.max-concurrent-consumers=5
The Case Engine can be tuned to the maximum number of concurrent consumers at which it is still able to perform synchronous calls in reasonable time. Since its sessions are almost always short-lived, memory usage may also be lower than that of the Runtime. When the Case Engine cannot cope with the load, an additional Case Engine can be added to the solution. Just make sure all synchronous calls from the Runtime / DCM Maintenance APP are sent through a load balancer.
Process evaluation can be very database-intensive. Increasing the number of concurrent consumers therefore increases the chance of database (dead)locks. In most cases the system recovers from these locks automatically by retrying the action after some time, but it is something to keep in mind.
Scheduler
The Case Engine scheduler is used to process timed events (like task expiry dates and process timers). Make sure the scheduler is configured to store timed events in the SQL database, and that it is configured to cope with more than one Case Engine running against the same database. The scheduler's only responsibility is to put messages on the queue at the right time, so there is probably no need to scale up this specific component in a high-load situation.
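As an illustration, the snippet below is a minimal sketch of such a configuration, assuming the scheduler is Quartz-based with a JDBC job store; verify the exact property names and values against the Blueriq documentation for your version.

# Sketch assuming a Quartz-based scheduler with a JDBC job store
# Store timed events in the SQL database instead of in memory:
org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
# Allow multiple Case Engines to share the same scheduler database safely:
org.quartz.jobStore.isClustered=true
org.quartz.jobStore.clusterCheckinInterval=20000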
DCM Lists Service
The DCM Lists Service stores all data used by the case list and worklist in the Runtime, and serves DCM_CaseSearch service requests. The lists service listens to the following queue:
dcmListsServiceEventsQueue
When the number of unacknowledged messages increases, the lists service cannot handle the load. As a result, list queries can start lagging behind the actual situation in the Case Engine (for example, cases may still appear locked even though the task has already been processed by the Case Engine).
Either the number of consuming threads can be increased, or the number of dcm-lists applications.
blueriq:
  dcm:
    lists:
      event:
        consumer:
          concurrency:
            max-concurrent-consumers: 5
            concurrent-consumers: 5
Make sure the lists-service is still capable of handling all synchronous requests, since users will be waiting for the results.
Process Outbox and Outbox Poller
Outbox processing can be tuned by increasing or decreasing the batch size, concurrent consumers, async threads, poller concurrency threads and interval. The diagram below shows how the whole flow works.
Async threads pool
After messages are saved into the outbox, the process outbox publishers process them by removing them from the outbox and publishing them to the other queues. Their number can be configured by updating the async threads pool size; increasing it speeds up clearing the outbox.
blueriq.case.engine.outbox.async.threads.pool.size=5
Outbox Poller
The Outbox Poller can also be tuned by adding extra poller concurrency threads to its publisher. These split the batch into smaller parts and process them in parallel. By default, poller concurrency is disabled.
blueriq.case.engine.outbox.poller.concurrency.enabled=true
blueriq.case.engine.outbox.poller.concurrency.threads=2
The interval configures how often the poller fetches messages from the outbox and starts processing another batch. By default, this is 2 minutes.
blueriq.case.engine.outbox.poller.interval.minutes=2m
The batch size is the number of messages pulled from the outbox into memory. By default, it is 10000.
blueriq.case.engine.outbox.poller.batchsize=10000
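Putting these knobs together, a combined outbox configuration could look as follows; the property names are the ones described above and the values are only illustrative, so tune them to your own load.

# Example outbox tuning, illustrative values only
blueriq.case.engine.outbox.async.threads.pool.size=5
blueriq.case.engine.outbox.poller.concurrency.enabled=true
blueriq.case.engine.outbox.poller.concurrency.threads=2
blueriq.case.engine.outbox.poller.interval.minutes=2m
blueriq.case.engine.outbox.poller.batchsize=10000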
Example issues and solutions
Example 1
Problem: Outbox is full, queue is empty
Solution: Increase async threads or poller concurrency threads
Example 2
Problem: Out of memory error
Solution: Decrease batchsize
Example 3
Problem: Messages on the queue are processed very slowly
Solution: Increase concurrent consumers
Example 4
Problem: After the system was down, the outbox is full of messages and they are processed very slowly
Solution: Increase poller concurrency threads, increase batchsize, decrease interval
Customerdata Service
The role of the Customerdata Service is limited to storing and retrieving data; all searches are done using the DCM Lists Service. Therefore, the Customerdata Service does not need much tuning and can be scaled horizontally when needed. This is done by adding a new instance and routing all customerdata endpoint calls through a load balancer (configured in the Case Engine).