How does it work?

AI-based deep learning filtering

Irisity's Alarm Filter uses proprietary, state-of-the-art deep neural networks in combination with various computer vision and complementary AI technologies to accurately assess whether there is human activity in the alarm sent to the Filter.

Images or video as input

The alarm filter processes incoming alarms containing 3 or more image frames, or a short video clip. It classifies an alarm as True when there is human activity in the clip, meaning human-propelled motion such as moving people, vehicles, bikes, or similar. It classifies all other alarms as False, meaning "no human activity".

Is video or images preferred?

For the AI, the video is decoded into individual frames, meaning that a clip and the same clip split up into multiple frames will yield the same result. That said, there are some things to keep in mind:

  • Images and video will give the same AI accuracy.
  • Video clips typically give a nicer operator experience.
  • More is not better, meaning a short video clip is often preferred due to bandwidth and other considerations.

What frames are used?

When sending in more than 3 frames or a clip containing more than 3 frames, it is helpful to know how those are handled.

Our recommendation is to send

  • 3 frames: the detection frame plus the frames 1 second before and 1 second after it
  • or a 2-second video clip with a low framerate, with 0.5 seconds of pre-recording and 1.5 seconds of post-recording



  • The AI always analyzes the middle frame, no matter how long the clip is, together with the frames 1 second before and 1 second after the middle frame.
    • For a clip of up to 2 seconds, this means we take the first, middle, and last frame, so in these cases the detection frame can be any of those three.
    • For longer clips, it is important that the middle frame is the detection frame, or a frame shortly after the detection, depending on when the source analytics trigger. We then use the frames 1 second before and 1 second after it to determine object motion.
    • For a 20-second clip, this would mean we process frames around the 9-, 10-, and 11-second mark, meaning it's important that the object causing the detection is present in at least one, preferably all, of those frames.
  • As a result of the above points, we strongly recommend configuring equal pre- and post-recording for the event clip when using video clips. Example: send either 2+2 seconds or 5+5 seconds of video before/after the detection frame. Do not configure the camera to send 5+2 seconds: the main detection frame would then not be analyzed, and the risk increases that none of the analyzed frames contains the object that triggered the detection.
  • For the best results, we recommend using a clip or frames covering 2 or more seconds of video.
  • Optimal frame spacing (for images and clips) is 1 second, but more can be used to improve the operators' experience.
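
The frame selection described above can be sketched in a few lines of Python. This is only an illustration of the rules in this section, not the Filter's actual implementation: the function names `analyzed_timestamps` and `detection_gap` are hypothetical, and timestamps are treated as approximate seconds from the start of the clip.

```python
from __future__ import annotations


def analyzed_timestamps(clip_seconds: float) -> list[float]:
    """Approximate timestamps (seconds from clip start) that are analyzed.

    Assumption: a simplified model of the rules described in this section.
    """
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    middle = clip_seconds / 2
    if clip_seconds <= 2:
        # Clips up to 2 seconds: first, middle, and last frame.
        return [0.0, middle, clip_seconds]
    # Longer clips: the middle frame and the frames 1 second either side of it.
    return [middle - 1, middle, middle + 1]


def detection_gap(pre_seconds: float, post_seconds: float) -> float:
    """Distance in seconds from the detection frame to the nearest analyzed frame.

    The detection frame sits pre_seconds into the clip.
    """
    timestamps = analyzed_timestamps(pre_seconds + post_seconds)
    return min(abs(t - pre_seconds) for t in timestamps)


if __name__ == "__main__":
    print(analyzed_timestamps(20))  # frames around the 9-, 10-, and 11-second mark
    print(detection_gap(5, 5))      # symmetric recording: detection frame is analyzed
    print(detection_gap(5, 2))      # asymmetric recording: detection frame is missed
```

With symmetric pre- and post-recording the gap is zero, because the detection frame is the middle frame. With 5+2 seconds the analyzed frames sit around 2.5, 3.5, and 4.5 seconds, while the detection happens at 5 seconds.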

The object causing the alarm needs to be in view...

As obvious as it sounds, the primary cause of missed detections is that the alarm clip or frames sent do not contain the object that caused the motion.

Make sure that the object (typically person or vehicle) causing the alarm is visible in the first, last, or middle frame, and preferably all of them. If you still have issues with human activity not being picked up correctly, please reach out to us at Irisity and we'll be happy to look into the issue.

...and large enough!

We recommend that the object you're trying to detect always covers at least 5% of the image frame height for the Irisity Alarm Filter to work optimally. We recommend a resolution of 640x480 pixels or 720p for optimal performance and bandwidth. Portrait-mode images are supported. Panoramic (very wide) images are also supported, but not recommended.

Resolutions lower than the recommended resolution may result in degraded detection performance, while higher resolution will consume additional bandwidth and compute resources unnecessarily.

Our recommendation is to

  • Set the resolution between 640x480 pixels and 720p
  • Make sure to have at least 5% of the image height covering the object you want to detect
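
As a rough aid for the 5% rule above, the following sketch converts the recommendation into pixel heights. It is an illustration only: the helper names are hypothetical, and the object height in pixels is assumed to come from your own measurement or from the source system's analytics.

```python
MIN_OBJECT_HEIGHT_PERCENT = 5  # recommended minimum share of the frame height


def min_object_height_px(frame_height_px: int) -> int:
    """Smallest object height in pixels that meets the 5% recommendation."""
    # Integer ceiling division, so the result never falls below 5%.
    return (frame_height_px * MIN_OBJECT_HEIGHT_PERCENT + 99) // 100


def object_is_large_enough(object_height_px: int, frame_height_px: int) -> bool:
    """True if the object covers at least 5% of the frame height."""
    return object_height_px * 100 >= frame_height_px * MIN_OBJECT_HEIGHT_PERCENT


if __name__ == "__main__":
    for height in (480, 720):
        print(height, "px frame height ->", min_object_height_px(height), "px minimum")
```

For a 640x480 frame this gives a minimum object height of 24 pixels, and for 720p a minimum of 36 pixels.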

How should I set up my cameras?

Let's cover two aspects: alarm setup and protocol.

Alarm setup

  • We recommend using the best video analysis available in the camera or system you are setting up.
  • We recommend starting with a fairly high sensitivity, depending on the system used, and reducing it if needed after 24 hours of testing the system.
  • We recommend configuring activity zones or black masks already in the sending system to avoid excessive alarm volumes, which consume bandwidth and compute resources and may mask important events. These should preferably exclude known sources of constant motion that cause frequent false alarms, such as busy roads or trees.
  • For the same reason, we recommend setting an approximate schedule already in the alarming system.
  • A cooldown period, time between alarms, of 30-60 seconds is typically recommended. If supported, it is often preferable to configure multiple independent alarm zones, each with a higher cooldown, than to have one large zone with a low cooldown in each camera view. This is because a constant false alarm source in one corner of the frame (a busy road for example) will set the source system in constant cooldown, meaning that motion in a different area of the frame could potentially never cause an alarm to be sent to the Irisity Alarm Filter. Multiple zones will also allow multiple detections just seconds apart due to relevant motion in two different areas of the frame.
  • If possible, set the system to have the middle frame as the detection frame. This will maximize the chances of the object causing the motion being in the camera view even if it is entering or leaving the frame.
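
The cooldown advice above can be illustrated with a small simulation. This is a simplified model of a hypothetical source system in which each alarm zone keeps its own cooldown timer and drops motion events during the cooldown; real cameras and systems may behave differently. The function name `alarms_sent` and the zone names are assumptions made for the example.

```python
def alarms_sent(motion_events, cooldown_seconds):
    """Return the motion events that would be sent as alarms.

    motion_events: iterable of (time_seconds, zone_name) tuples.
    Assumption: each zone has its own cooldown timer and silently drops
    motion that occurs before the cooldown has passed.
    """
    last_sent = {}  # zone name -> time of the last alarm sent from that zone
    sent = []
    for time_seconds, zone in sorted(motion_events):
        previous = last_sent.get(zone)
        if previous is None or time_seconds - previous >= cooldown_seconds:
            sent.append((time_seconds, zone))
            last_sent[zone] = time_seconds
    return sent


# A busy road triggers motion every 30 seconds; a person appears at 45 seconds.
ROAD = [(t, "road") for t in range(0, 301, 30)]
PERSON = [(45, "yard")]

# One large zone with a 30-second cooldown: every event shares one timer.
single_zone = alarms_sent([(t, "frame") for t, _ in ROAD + PERSON], 30)

# Two independent zones, each with a 60-second cooldown.
multi_zone = alarms_sent(ROAD + PERSON, 60)

if __name__ == "__main__":
    print((45, "frame") in single_zone)  # False: the person falls inside the road's cooldown
    print((45, "yard") in multi_zone)    # True: the yard zone has its own timer
```

In this model the two-zone setup also sends fewer road alarms, since the road zone only fires once per 60 seconds.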

Protocol - HTTPS or SMTP?

Irisity's Alarm Filter supports both HTTPS and SMTP, as well as custom integrations. In order of preference, we recommend:

  1. Custom integrations when available. These are created to get the best results based on the capabilities of the sending system and typically give the most efficient setup and management. Reach out to us for a quote if you are interested in a custom integration for your system.
  2. HTTPS. When the sending system supports it, HTTPS is preferred. It has less overhead than SMTP, offers higher security, and gives the shortest latency since it is a point-to-point protocol. HTTPS can be proxied, which is fine; the sending system will still always receive a response.
  3. SMTP. The main benefit of SMTP is that it is the video alarm protocol with the widest support in the security industry. It is supported by the Irisity Alarm Filter but has drawbacks, such as security that is often lacking, non-standard implementations across vendors, and limited support for configuration, parameters, and metadata.

For more detailed setup instructions, please refer to our complete documentation section.