Blueriq can be used in a hosting environment where high reliability is a requirement. This article covers the why, the when and the how of setting up Blueriq services with that requirement in mind. End users work with a Blueriq application that was designed in our design environment and runs on a Blueriq Runtime instance. Whether it is a simple forms application, a set of web services or an entire case management system, every Blueriq project is hosted on and runs through a Blueriq Runtime.
Why to Use Failover (Scalability vs. Availability vs. Reliability)
Let's first explore how Blueriq approaches the topic of high reliability. For now we have decided to focus on reliability over availability and scalability, without limiting your options to make your setup more available and/or scalable.
Let's look at some problems:
- When a runtime instance has any kind of downtime, whether for maintenance or because of unforeseen issues, your Blueriq application will not be available. This becomes a real problem when you want to minimize downtime of your online services.
- Blueriq is by nature a stateful information system, meaning each end user of your Blueriq application has a session that is stored server side. These sessions are classically stored in server memory and are lost if the server goes down.
- Your application might also be under heavy load at peak times, making it important to have enough capacity in your infrastructure to serve all these users with sufficient performance and responsiveness.
Problem 1 would require a solution that increases availability during downtimes.
Problem 2 would need some way to increase reliability of user sessions during system failures and maintenance windows.
Problem 3 would need some way of running multiple instances of the same type concurrently. This is often called horizontal scalability.
There is a difference between a scalable, an available and a reliable system, although there can be some overlap: making a system highly available might also make it more scalable or more reliable. The key difference is that in a highly reliable environment the focus is on retaining state when something goes down, whether planned or not. With Blueriq this mainly means that sessions stay available even when Blueriq services go offline. A scalable environment, by contrast, does not need to focus on retaining user sessions, because a session can run fine inside a single online service for its whole active time. You could theoretically start as many instances of a Blueriq service as you need and have each of them manage its own sessions entirely in system memory without much trouble as far as scalability is concerned.
Managing sessions entirely in memory is one of the obstacles to overcome in the Blueriq architecture when making it more reliable, but not when making it more scalable or available. Retaining user sessions is also one of the most important things in a Blueriq application: users might be in the middle of handling case work or filling out extensive forms, and they would lose all their progress if the system does not retain it during downtime.
To make your Blueriq stack highly reliable, we made Blueriq services store their state in an external high-performance data store, in our case Redis. This way the Blueriq components that manage sessions can be set up in a failover cluster, making your Blueriq stack more resilient against downtime.
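The difference between in-memory and externalized session state can be illustrated with a minimal sketch. This is not Blueriq code: `ExternalSessionStore` and `RuntimeInstance` are hypothetical names, and the store is an in-process stand-in for an external store such as Redis, so that the example runs without any infrastructure.

```python
import json


class ExternalSessionStore:
    """Hypothetical in-process stand-in for an external store such as Redis."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, state):
        # Serialize the state, as it would be when sent to a real external store.
        self._data[session_id] = json.dumps(state)

    def load(self, session_id):
        raw = self._data.get(session_id)
        return None if raw is None else json.loads(raw)


class RuntimeInstance:
    """Hypothetical runtime node. With store=None, sessions live in local memory only."""

    def __init__(self, name, store=None):
        self.name = name
        self._local = {}
        self._store = store

    def handle(self, session_id, field, value):
        # Load the session (or start a new one), apply the change, write it back.
        state = self._read(session_id) or {}
        state[field] = value
        self._write(session_id, state)
        return state

    def _read(self, session_id):
        if self._store is not None:
            return self._store.load(session_id)
        return self._local.get(session_id)

    def _write(self, session_id, state):
        if self._store is not None:
            self._store.save(session_id, state)
        else:
            self._local[session_id] = state


def simulate(shared_store):
    node_a = RuntimeInstance("a", shared_store)
    node_b = RuntimeInstance("b", shared_store)
    node_a.handle("s1", "name", "Alice")
    # Assume node_a goes down here and the next request is routed to node_b.
    return node_b.handle("s1", "city", "Utrecht")


in_memory_result = simulate(None)
external_result = simulate(ExternalSessionStore())
print(in_memory_result)  # {'city': 'Utrecht'}
print(external_result)   # {'name': 'Alice', 'city': 'Utrecht'}
```

With in-memory sessions the second node starts from an empty session and the earlier input is gone. With the shared store the second node picks up where the first one left off.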
When to Use Failover (What fits your requirements?)
Blueriq usually runs in a landscape with multiple other hosted applications. The documentation on failover is primarily meant to advise on when and how to use it. We offer the functionality, and it is up to the professionals implementing the solution to decide how to use it. How does Blueriq fit in your infrastructure? What technical requirements do you have as a team and as an organization?
Consider the following when planning your hosting environment setup:
- Must session data loss be prevented during downtime?
- Do you have high availability requirements (i.e. >99% uptime) with your customers/partners?
- Do you want to update/release Blueriq services/models with newer versions while minimizing downtime?*
If the answer to any of these questions is "yes", you might want to consider a failover configuration. As explained before, a failover configuration is best at increasing reliability, but it can also help you scale or become more available.
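To put an uptime requirement in perspective, the sketch below converts an availability percentage into hours of downtime per year, and estimates the availability of several redundant instances. The redundancy formula assumes that instances fail independently and that failover is instant. It also ignores the load balancer and the external session store, which have their own availability, so treat the numbers as rough upper bounds.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability):
    """Expected downtime in hours per year for an availability between 0 and 1."""
    return (1 - availability) * HOURS_PER_YEAR


def combined_availability(single, instances):
    """Availability of redundant instances: the service is down only if all are down.

    Assumes independent failures and instant failover (a simplification).
    """
    return 1 - (1 - single) ** instances


single = 0.99
clustered = combined_availability(single, 2)

print(f"{downtime_hours_per_year(single):.1f} hours/year")     # 87.6 hours/year
print(f"{downtime_hours_per_year(clustered):.3f} hours/year")  # 0.876 hours/year
```

A single instance at 99% uptime is unavailable for roughly three and a half days per year. Under the stated assumptions, two such instances in a failover cluster bring that down to under an hour.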
Setting Blueriq up with failover support also has downsides that might outweigh the benefits:
- Significant growth of hosted services in your infrastructure.
Besides a significant increase in the number of Blueriq components/services that are online, there would also be load balancers, gateways, external memory stores, etc.
Is it worth adding these layers and the extra configuration to your infrastructure? Will the increased operating costs be worth it?
- Overall performance might decrease.
Though Blueriq constantly monitors and optimizes general performance, it is not uncommon for response times to go up in any sort of failover environment.
Using external session stores will generally not be as fast as storing state directly in server memory.
The impact is generally well within acceptable time ranges, but do consider this when thinking about switching to a failover setup.
* Switching to a newer major version of Blueriq components might still require downtime.
How to Use Failover (Clustering Runtimes)
Setting up Blueriq to support failover scenarios requires you to set up clusters of your Blueriq services.
This is explained on the "Runtime cluster" page of the Blueriq documentation (Blueriq Latest, Blueriq Community).
The linked documentation covers setting up clusters for Blueriq forms applications, but the same can be done in a DCM environment.
Note that this requires the new DCM architecture, which in turn requires the case engine.