Best practice - How to tune lists searches

All lists and searches are performed using the DCM-Lists-Service. This service persists a duplicate of all searchable data, stored in such a way that it can be queried easily. Because the data differs for each Blueriq model, it is stored in a document database (MongoDB).

Two collections are available in the DCM-Lists-Service data layer:

cases: used for searching cases, for example in the DCM_CaseList container or the DCM_CaseSearch service. It contains one document per case, holding the case data (from the case-engine store) as well as the process profile data, case metadata and dossier metadata. Each case document contains all data that can be filtered on in the list containers.
tasks: used for searching manual tasks, for example in the DCM_WorkList container. It holds a subset of the tasks in the process-engine: only tasks that are currently open (or started) and could be relevant on a task list. Each task document contains task-specific data plus a copy of the case data as stored in the cases collection. This duplicates data, but allows tasks to be searched without joining other documents.

Determine which field(s) to index

Indexing the right fields may be needed for the lists service to handle lists and searches efficiently. With a suitable index, the query is performed in memory where possible, and only a subset of the documents has to be read from disk for conditions the index does not cover.

The database can use the index to filter the collection first and sort the result afterwards. It is customary to build an index with the best filter option first (the one that filters out the most results for your use case) and to end with the sorting attribute where possible.

Without any index, MongoDB needs to scan all documents when performing a query. As the database grows, this can lead to a higher CPU load on the system. Regularly check that the queries performed most often actually use an index. MongoDB can be configured to log its slow queries. When testing your application, you might want to log and analyse any slow queries, to optimise the indices and the performance of the database. Example configuration:

mongod.conf

# Profile operations that run longer than the threshold below (in milliseconds)
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100

https://www.mongodb.com/docs/manual/reference/configuration-options/

Make sure indices are present for the queries done regularly by the end-users.
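
One way to check whether a query uses an index is to inspect its explain output. The helper below is a minimal sketch in plain JavaScript, not part of the DCM-Lists-Service: the function names collectStages and needsIndex and the sample plan are hypothetical. It assumes the classic explain layout with "stage", "inputStage" and "inputStages" fields; newer MongoDB versions may nest the plan differently.

```javascript
// Collect every stage name from an explain() winning plan.
// Assumption: the plan uses "stage", "inputStage" and "inputStages";
// some server versions wrap the plan in a "queryPlan" field.
function collectStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const node = plan.queryPlan || plan;
  const children = [];
  if (node.inputStage) children.push(node.inputStage);
  if (Array.isArray(node.inputStages)) children.push(...node.inputStages);
  const own = node.stage ? [node.stage] : [];
  return own.concat(...children.map(collectStages));
}

// True when the plan scans the whole collection (COLLSCAN) instead of an index.
function needsIndex(plan) {
  return collectStages(plan).includes("COLLSCAN");
}

// Hypothetical plan, shaped like explain().queryPlanner.winningPlan
const samplePlan = {
  stage: "FETCH",
  inputStage: {
    stage: "IXSCAN",
    indexName: "metadata.BetrokkenenReferentieIds_1_startDate_1",
  },
};

console.log(collectStages(samplePlan)); // [ 'FETCH', 'IXSCAN' ]
console.log(needsIndex(samplePlan));    // false
```

In the MongoDB shell, the plan could be obtained with something like db.cases.find({ ... }).explain().queryPlanner.winningPlan and passed to needsIndex.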

Example

In the DCM Foundation there is a list for cases where the user is involved. To tune this list, we could have a look at the specific container:

There is a filter on Case metadata - BetrokkenenReferentieIds. To optimise this list, we could add an index on the field that is being queried. Applicants usually have only a limited number of cases, so this filter is selective and a good starting point for an index. A document in the lists service for this specific case type might look like this (the example is taken from the tasks collection, which carries a copy of the same case data):

Example document in the tasks collection

{
  "_id": {
    "$numberLong": "5"
  },
  "caseId": "667524bbde495a136891c493",
  "name": "ToevoegenBewijsstukken",
  "priority": 0,
  "required": false,
  "startDate": {
    "$date": "2024-06-21T06:59:07.849Z"
  },
  "caseLastUpdated": {
    "$date": "2024-06-21T06:59:07.535Z"
  },
  "status": "open",
  "customFields": {},
  "displayNames": {
    "nl-nl": "Toevoegen bewijsstukken",
    "en-gb": "Add supporting documents"
  },
  "roles": [
    "Behandelaar",
    "Aanvrager"
  ],
  "teams": [],
  "users": [],
  "unauthorizedUsers": [],
  "caseLocked": false,
  "applicationName": "studio-DCMFoundation-ZaakType_A",
  "applicationVersion": "0.0-V7_3_2",
  "attributes": {
    ...
  },
  "metadata": {
    "WorkflowStatus": "open",
    "Fase": "aanvragen",
    "Betrokkenen": [
      "aanvrager"
    ],
    "Status": "opvoeren",
    "Zaaktype": "ZaakType_A",
    "Kenmerk": "ce5c0272-ac30-48d8-a6de-4595b11a5193",
    "BetrokkenenRollen": [
      "aanvrager"
    ],
    "AanmaakMoment": {
      "$numberLong": "1718953144974"
    },
    "MigratieVersie": 1,
    "BetrokkenenReferentieIds": [
      "aanvrager"
    ]
  },
  "dossierMetadata": {},
  "caseReference": "ce5c0272-ac30-48d8-a6de-4595b11a5193",
  "_class": "com.blueriq.dcm.lists.repository.mongo.document.TaskDocument"
}

The field to index can be found using the path metadata.BetrokkenenReferentieIds. This is a multivalued attribute (a compound index can contain at most one list field), in this case with the single value "aanvrager". The container also shows the field the list is sorted on: the standard case data CreationDate (called "startDate" in the document). When we add this field to the index, MongoDB can use the index to return the list in sorted order. For our specific situation we could add an index like this:

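// Connect to the dcmLists database of the DCM-Lists-Service
dbDcmLists = new Mongo().getDB("dcmLists");
// Filter field first, sort field last (1 = ascending)
dbDcmLists.cases.createIndex({ "metadata.BetrokkenenReferentieIds": 1, "startDate": 1 });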

We have now created an index, which can be checked in MongoDB Compass. This tool also reports how often an index is used, so after some testing we can confirm whether the index has been used in actual queries from the DCM-Lists-Service.
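
Index usage can also be read from the shell with the $indexStats aggregation stage, which returns one entry per index with an accesses.ops counter. The sketch below summarises such output in plain JavaScript. The function name indexUsage and the sample data are hypothetical, and the conversion with Number() assumes that any 64-bit counter values returned by the shell convert cleanly through their string form.

```javascript
// Summarise $indexStats output as: index name -> number of times it was used.
// Assumption: Number(...) handles both plain numbers and 64-bit Long values.
function indexUsage(stats) {
  const usage = {};
  for (const entry of stats) {
    usage[entry.name] = Number(entry.accesses.ops);
  }
  return usage;
}

// Hypothetical sample, shaped like
// db.cases.aggregate([{ $indexStats: {} }]).toArray()
const sampleStats = [
  { name: "_id_", accesses: { ops: 12 } },
  { name: "metadata.BetrokkenenReferentieIds_1_startDate_1", accesses: { ops: 0 } },
];

console.log(indexUsage(sampleStats));
```

An index whose counter stays at 0 after representative testing is a candidate for review, since it adds write and memory cost without serving queries.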

Creating indices

Based on the filter options in the model, it can help to create indices in MongoDB. Make sure the most selective field is indexed first, and the default sorting attribute is the last item in your index. When no sorting attribute is sent in the filter, the _id field is used for sorting (otherwise requesting the first 10 results twice might return different results the second time).
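
The ordering rule above can be sketched as a small helper that builds the key document passed to createIndex. This is an illustration only: the function name buildIndexKey is hypothetical, and all fields are indexed in ascending order as an assumption.

```javascript
// Build a compound index key: filter fields first, sort field last.
// When no sort field is given, fall back to _id, the default sort attribute.
function buildIndexKey(filterFields, sortField) {
  const key = {};
  for (const field of filterFields) {
    key[field] = 1; // 1 = ascending
  }
  const last = sortField || "_id";
  if (!(last in key)) {
    key[last] = 1;
  }
  return key;
}

const caseListKey = buildIndexKey(["metadata.BetrokkenenReferentieIds"], "startDate");
const workListKey = buildIndexKey(["caseId"]);

console.log(JSON.stringify(caseListKey)); // {"metadata.BetrokkenenReferentieIds":1,"startDate":1}
console.log(JSON.stringify(workListKey)); // {"caseId":1,"_id":1}
```

The resulting object can be passed directly to createIndex on the relevant collection.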

Example

One example is a case-specific worklist. One of the key filters on this list is the Case ID. Since filtering on Case ID already limits the result to just a few documents, this is a useful index to have.

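// Connect to the dcmLists database of the DCM-Lists-Service
dbDcmListsTasks = new Mongo().getDB("dcmLists");
// Filter on caseId first, then _id as the default sort field
dbDcmListsTasks.tasks.createIndex({ "caseId": 1, "_id": 1 });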

Considerations

Indices are created and maintained by the database and are kept in memory where possible. Make sure that each index really helps to optimise your queries, since every index adds a penalty when creating or updating documents and increases memory usage. On the other hand, searching without indices becomes more expensive as the number of documents grows.
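
To weigh the memory cost of an index, the collection statistics can help: they include an indexSizes field with the size of each index in bytes. The sketch below converts such a statistics object to MiB. The function name indexSizesMiB and the sample values are hypothetical.

```javascript
// Convert the indexSizes field of collection statistics from bytes to MiB,
// rounded to one decimal.
function indexSizesMiB(collStats) {
  const result = {};
  for (const [name, bytes] of Object.entries(collStats.indexSizes || {})) {
    result[name] = Math.round((bytes / (1024 * 1024)) * 10) / 10;
  }
  return result;
}

// Hypothetical sample, shaped like the output of db.tasks.stats()
const sampleCollStats = {
  indexSizes: { _id_: 4194304, caseId_1__id_1: 1572864 },
};

console.log(indexSizesMiB(sampleCollStats)); // { _id_: 4, caseId_1__id_1: 1.5 }
```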
