InvariVision vs Fingerprint video recognition technology

Dmitriy Yeremeyev, a co-founder of the InvariVision project, spoke about the new frame-by-frame video recognition technology developed by the company. What sets it apart, and why is it better than the Fingerprint technology that underlies YouTube’s Content ID?

The full recording of the InvariVision developer interview is available here:

Interview with
Dmitriy Yeremeyev

How was InvariVision technology created?

We had a basic image recognition algorithm that I had been developing for a long time. Its main advantage is that it can work with a huge image database. In our experiments we used 20 million images, and a search in such a database takes no more than 20 milliseconds. That is fast enough for the algorithm to analyze several million images even on a low-performance CPU (for example, in a mobile phone).

What can this basic technology do?

We have already tested our technology in robot navigation. It works like this: the robot memorizes pictures together with the associated coordinates (its location, azimuth, and the angle at which it moves around the map). When the robot sees one of those pictures again, it immediately understands where it is. Thus, it can navigate in space.
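The memorize-and-look-up idea described above can be illustrated with a minimal Python sketch. It is not InvariVision's algorithm: the `Pose` fields, the `PlaceMemory` class, the toy integer descriptors, and the `max_distance` threshold are all assumptions made for illustration, and the linear scan used here would not reach the search speeds quoted in the interview.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical pose record: map position plus azimuth in degrees."""
    x: float
    y: float
    azimuth_deg: float


def descriptor_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length descriptors."""
    return sum(abs(p - q) for p, q in zip(a, b))


class PlaceMemory:
    """Stores (picture descriptor, pose) pairs and looks up the closest one."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Tuple[int, ...], Pose]] = []

    def memorize(self, descriptor: Sequence[int], pose: Pose) -> None:
        self._entries.append((tuple(descriptor), pose))

    def locate(self, descriptor: Sequence[int], max_distance: int = 10) -> Optional[Pose]:
        """Return the pose of the most similar memorized picture, or None if nothing is close enough."""
        best = min(
            self._entries,
            key=lambda entry: descriptor_distance(entry[0], descriptor),
            default=None,
        )
        if best is None or descriptor_distance(best[0], descriptor) > max_distance:
            return None
        return best[1]


memory = PlaceMemory()
memory.memorize([10, 200, 30], Pose(0.0, 0.0, 90.0))
memory.memorize([250, 5, 120], Pose(3.0, 4.0, 180.0))
print(memory.locate([12, 198, 33]))     # a slightly different view of the first picture
print(memory.locate([100, 100, 100]))   # an unfamiliar picture
```

A real system would replace the toy descriptors with features extracted from camera images and the linear scan with an index built for large databases.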

You can also attach a virtual object to these coordinates, that is, use the technology to build augmented reality. This is another direction for our work.

Right now we are focusing on video recognition. YouTube’s Content ID algorithm is based on the so-called Fingerprint technology. However, Fingerprint does not always recognize a video correctly. The technology that we are developing works more efficiently.

What is the disadvantage of Fingerprint?

Fingerprint technology can’t analyze a recording frame by frame. It “splits” the video into intervals with a certain number of frames and calculates statistics for each interval (for example, brightness histograms). So, if the algorithm uses 5-minute intervals, it cannot recognize fragments that are shorter than 5 minutes. This is the first drawback.
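The interval approach described above can be illustrated with a minimal Python sketch. It is not YouTube's Content ID implementation: the representation of a frame as a flat list of 0-255 brightness values, the function names, and the `bins` and `interval` parameters are assumptions chosen to show why a fragment shorter than one interval produces no fingerprint at all.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of 0-255 brightness values.
Frame = Sequence[int]


def brightness_histogram(frames: Sequence[Frame], bins: int = 4) -> List[float]:
    """Normalized brightness histogram over all pixels of the given frames."""
    counts = [0] * bins
    total = 0
    for frame in frames:
        for value in frame:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * bins


def interval_fingerprints(
    frames: Sequence[Frame], interval: int, bins: int = 4
) -> List[List[float]]:
    """One histogram per complete interval; frames that do not fill an interval are dropped."""
    return [
        brightness_histogram(frames[start:start + interval], bins)
        for start in range(0, len(frames) - interval + 1, interval)
    ]


video = [[0, 64, 128, 255]] * 10   # 10 identical toy frames
print(len(interval_fingerprints(video, interval=4)))      # 2 complete intervals
print(len(interval_fingerprints(video[:3], interval=4)))  # 0: shorter than one interval
```

In this sketch the 3-frame fragment yields an empty fingerprint list, so there is nothing to compare against the reference database.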

The second is that this method doesn’t work if the original video has been substantially modified (re-encoded, had frames added, and so on). Users can easily bypass video copyright protection with such distortions.

What is the advantage of InvariVision technology over Fingerprint?

The main difference is that our recognition technology analyzes recordings frame by frame. First, this lets us find even tiny 4-second fragments. Second, frame-by-frame analysis is more resistant to distortion: it readily recognizes video with added frames, cuts, or changes in speed. Our system is more difficult to cheat.
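To make the frame-by-frame idea concrete, here is a minimal Python sketch of one generic way to do it: index a signature for every reference frame, look up each frame of the query, and vote on the time offset. This is not InvariVision's actual method, which the interview does not detail. The quantized-brightness `frame_signature`, the function names, and the `min_votes` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: a frame is a flat sequence of 0-255 brightness values.
Frame = Sequence[int]
Signature = Tuple[int, ...]
Index = Dict[Signature, List[Tuple[str, int]]]


def frame_signature(frame: Frame, levels: int = 8) -> Signature:
    """Hypothetical per-frame signature: brightness quantized to a few levels."""
    return tuple(value * levels // 256 for value in frame)


def build_index(videos: Dict[str, Sequence[Frame]]) -> Index:
    """Map every frame signature to the (video id, frame position) pairs where it occurs."""
    index: Index = defaultdict(list)
    for video_id, frames in videos.items():
        for position, frame in enumerate(frames):
            index[frame_signature(frame)].append((video_id, position))
    return index


def find_fragment(
    index: Index, fragment: Sequence[Frame], min_votes: int = 2
) -> Optional[Tuple[str, int, int]]:
    """Return (video id, offset, votes) for the best-supported alignment, or None."""
    votes: Counter = Counter()
    for query_pos, frame in enumerate(fragment):
        for video_id, ref_pos in index.get(frame_signature(frame), []):
            # Frames from the same source agree on (reference position - query position).
            votes[(video_id, ref_pos - query_pos)] += 1
    if not votes:
        return None
    (video_id, offset), count = votes.most_common(1)[0]
    return (video_id, offset, count) if count >= min_votes else None


# Toy reference video with 24 frames whose signatures are all distinct.
reference = [[32 * (i % 8), 32 * (i // 8)] for i in range(24)]
index = build_index({"clip_a": reference})

# A 4-frame fragment with one foreign frame added at the start.
fragment = [[255, 255]] + reference[10:14]
print(find_fragment(index, fragment))
```

Because each matching frame votes independently, the added frame simply contributes no vote and the remaining four frames still agree on one alignment. Handling changes in playback speed would need a more flexible alignment than the constant offset used in this sketch.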

Why do market giants like YouTube continue to use Fingerprint?

Although Fingerprint technology has serious drawbacks, it is widespread and has been in use for a long time. Bringing a new, more efficient technology to market under such conditions is very difficult. That is why we are now working to ensure that as many specialists as possible learn about our developments and start using them to recognize video.