Adapting the InvariMatch system to work in a cluster

The InvariMatch system’s original architecture, in which all video processing took place on a single machine, served us well at first. However, we knew the system would struggle under increased load and had been thinking about how to scale it. In 2016, one of our customers needed InvariMatch installed on a cluster of several machines. The system worked well at first, but over time we ran into several unexpected problems. We had to improve InvariMatch urgently, and these are the steps we took to do so.

Uniform load distribution between nodes

Without a proper request distribution system, incoming tasks were not spread equally across the nodes in the cluster. As a result, some nodes were overloaded while others sat idle, which significantly reduced the cluster's request-processing capacity.

We improved the system by distributing all incoming tasks equally among the nodes in the cluster. An algorithm calculated each node's workload automatically, which allowed the system to utilize all nodes at once. As a result of this change, the system became much more efficient at handling large volumes of requests.
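The idea can be illustrated with a minimal least-loaded-node scheduler. The node names and the unit task cost here are hypothetical, and the real workload metric our algorithm computes is more involved than a simple counter:

```python
import heapq

class LoadBalancer:
    """Assign each incoming task to the currently least-loaded node
    (a sketch; a real workload metric would account for task size,
    CPU, memory, and so on)."""

    def __init__(self, nodes):
        # Min-heap of (current_load, node_name) pairs.
        self.heap = [(0, n) for n in nodes]
        heapq.heapify(self.heap)

    def assign(self, task_cost=1):
        # Pop the least-loaded node, charge it for the task, push it back.
        load, node = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + task_cost, node))
        return node

lb = LoadBalancer(["node-a", "node-b", "node-c"])
assignments = [lb.assign() for _ in range(6)]
print(assignments)
# ['node-a', 'node-b', 'node-c', 'node-a', 'node-b', 'node-c']
```

With equal task costs the load spreads evenly in round-robin order; with varying costs the heap naturally steers new work toward whichever node has done the least so far.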

Switching from NFS to separate data storage in a cluster

The second problem we had to solve was redistributing data storage across the nodes. At first, we stored the input video and the videos in the search index as separate files on a single server, and the nodes involved in video processing accessed those files via NFS. As the number of videos in the database grew rapidly, the load on the network channel became too high, so we decided to move the files to the local storage of the nodes that used them.
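One simple way to make this work is to map each video deterministically to a "home" node that stores it locally, so that processing can be routed to where the data already lives. A minimal sketch, with hypothetical node names (the actual placement scheme we used is not detailed here):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

def home_node(video_id: str) -> str:
    """Deterministically map a video file to the node that stores it
    locally, so a task for that video runs where the data lives
    instead of pulling it over NFS."""
    digest = hashlib.sha1(video_id.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every node computes the same answer for the same video,
# so no central lookup service is needed.
print(home_node("video-0001"))
```

Because the mapping is a pure function of the video ID, any node can locate a file without a shared metadata server; the trade-off is that adding or removing nodes reshuffles placements unless a consistent-hashing variant is used.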

Transition from Multicast to Unicast in UDP responses

Initially, both requests and responses from the search cores were sent via multicast. The incoming requests were identical for all cores, so multicast saved network traffic there. The responses, however, were all different and each was addressed to a single node, the one that had issued the request, so multicasting them was pure waste. We switched to direct (unicast) UDP responses, which significantly reduced the network load within the system.
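The response path can be demonstrated with plain UDP sockets on localhost. This sketch omits the multicast request fan-out (noted in the comments) and only shows the key change: the core replies with a direct datagram to the requester's own address. All names and payloads are illustrative:

```python
import socket
import threading

def search_core(sock):
    # In production the request would arrive via a multicast group that
    # every core has joined. The fix shown here is the reply path: the
    # core answers with a direct (unicast) datagram to the requester's
    # address instead of multicasting the response to all nodes.
    data, requester = sock.recvfrom(1024)
    sock.sendto(b"result:" + data, requester)

# Bind one "search core" to an ephemeral localhost port.
core_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
core_sock.bind(("127.0.0.1", 0))
core_addr = core_sock.getsockname()

worker = threading.Thread(target=search_core, args=(core_sock,))
worker.start()

# The requesting node sends a query and receives the reply directly.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"query-42", core_addr)
reply, _ = client.recvfrom(1024)
worker.join()
print(reply.decode())  # result:query-42
```

Since only the one interested node receives each response, the per-response traffic no longer scales with the number of nodes in the cluster.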

Automation of node updating and configuring

Updating and configuring a large number of nodes manually was no easy task, so we decided to automate it. Config files can now be assembled through a web interface, and the system is administered centrally. The system update process has also been streamlined, based on a global RPM update scenario.
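The config-assembly step can be sketched as rendering one file per node from a single central template. The settings, field names, and template here are all hypothetical; our actual configs are assembled by the web interface mentioned above:

```python
from string import Template

# Hypothetical per-node settings, as they might be collected centrally.
NODE_SETTINGS = [
    {"host": "10.0.0.1", "role": "search"},
    {"host": "10.0.0.2", "role": "index"},
]

# One template shared by every node; only the values differ.
CONFIG_TEMPLATE = Template("host=$host\nrole=$role\n")

def render_configs(settings):
    """Render one config file body per node from the shared template."""
    return {s["host"]: CONFIG_TEMPLATE.substitute(s) for s in settings}

configs = render_configs(NODE_SETTINGS)
print(configs["10.0.0.1"])
```

Keeping a single template as the source of truth means a change made once in the central interface propagates consistently to every node, instead of being hand-edited on each machine.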

Automatic node diagnostics

Despite all precautionary measures, one day there was a power loss in the data centre, and restoring the system after such a failure took a long time. To speed up recovery in the future, we added an automatic node diagnostic and troubleshooting system that runs when a node starts. This made it possible to identify problems in the operation of each node before the system resumed work, and reduced the time needed for manual troubleshooting.
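The structure of such a startup diagnostic pass can be sketched as a list of independent checks whose failures are collected before the node joins the cluster. The single disk-space check and its threshold are illustrative; the real system runs many node-specific checks:

```python
import shutil

def disk_check():
    """Example check: at least 1 MiB free on the root filesystem
    (a deliberately low, illustrative threshold)."""
    return shutil.disk_usage("/").free >= 1 << 20

def run_diagnostics(checks):
    """Run every check at node start and return the names of the
    failed ones; an empty list means the node is cleared to join."""
    return [check.__name__ for check in checks if not check()]

failures = run_diagnostics([disk_check])
print("failed checks:", failures)
```

Running the checks before the node accepts work turns a cluster-wide post-mortem into a per-node report: an operator sees exactly which check failed on which machine instead of debugging the assembled system.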

Storing search results in a database

In the original InvariMatch system, it was convenient to store search results in separate files. In a cluster, however, accessing files scattered across the system proved very difficult. Eventually, we began storing the results in a MySQL database, so that we could easily query them for analysis when necessary.
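The shape of this change can be shown with a small sketch, using Python's built-in sqlite3 module as a stand-in for MySQL; the schema, file names, and score value are hypothetical:

```python
import sqlite3

# sqlite3 stands in for MySQL here so the sketch is self-contained;
# the table layout is illustrative, not InvariMatch's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE search_results (
        id            INTEGER PRIMARY KEY,
        query_video   TEXT NOT NULL,
        matched_video TEXT NOT NULL,
        score         REAL NOT NULL
    )
""")
conn.execute(
    "INSERT INTO search_results (query_video, matched_video, score) "
    "VALUES (?, ?, ?)",
    ("input.mp4", "index_0042.mp4", 0.97),
)

# With results in a table, ad-hoc analysis is a single query instead
# of collecting and parsing files from every node.
rows = conn.execute(
    "SELECT matched_video, score FROM search_results WHERE score > 0.9"
).fetchall()
print(rows)  # [('index_0042.mp4', 0.97)]
```

The practical win is exactly the one described above: any node (or an analyst) can reach all results through one connection, rather than hunting for files across the cluster.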

As a result of these changes, we optimized the InvariMatch system for work in a cluster of nodes and made it more stable and reliable. InvariMatch can now process large amounts of data efficiently, update quickly, and recover after a crash.