IRIS+™ Architecture and Engineering

Specification for Video Analytics Solution

The proposed video analysis solution should meet all the requirements detailed in this A&E specification document.

Supported cameras

ONVIF/RTSP

The solution should support analysis of any ONVIF / RTSP video streams from fixed, angled, or overhead cameras. These could be IP cameras or analog cameras via encoders.
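
For illustration only, the Python sketch below shows how an ONVIF/RTSP stream could be pulled frame by frame with OpenCV; the URL, credentials, and stream path are placeholders and not part of this specification.

  # Minimal RTSP ingestion sketch (illustrative only; OpenCV assumed available).
  # The URL, credentials, and stream path below are placeholders.
  import cv2

  RTSP_URL = "rtsp://user:password@192.0.2.10:554/stream1"  # placeholder

  cap = cv2.VideoCapture(RTSP_URL)
  if not cap.isOpened():
      raise RuntimeError("Unable to open the RTSP stream")

  while True:
      ok, frame = cap.read()           # one BGR frame per call
      if not ok:
          break                        # stream dropped or ended
      height, width = frame.shape[:2]  # e.g. 1080 x 1920 for a 1080p stream
      # ... hand the frame to the analysis pipeline here ...

  cap.release()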

CCD cameras

  1. The solution should support the analysis of video streams of optical cameras. 
  2. The solution should support a minimum resolution of 480p and should be able to support higher-resolution streams (e.g., 720p/1080p) to improve detection distance and accuracy. The maximum supported resolution should be 4K.
  3. The solution should support the analysis of video streams with a minimum frame rate of 8 FPS. 

Thermal cameras

  1. The solution should support the analysis of video streams of thermal cameras. 
  2. The solution should support a minimum resolution of QVGA (320 x 240 pixels) and should be able to support higher-resolution streams to improve detection distance and accuracy. 
  3. The solution should support the analysis of video streams with a minimum frame rate of 8 FPS. 

Video Analysis Technology

The solution should be based on Deep Learning technology for target detection and classification.

Target types

  1. The solution should support the automatic detection and classification of the following target types:  
    • Person
      • Standing, Fallen, or Lying on the ground
    • Two Wheeled Vehicle
      • Motorcycle, Bicycle
    • Vehicle
      • Car, Van, Bus, Truck
    • Object
      • Bags, Suitcases, Backpacks, Boxes, Purses
    • Smoke and Fire
      • Smoke, Fire
  2. The solution should be able to detect and ignore the following objects automatically: 
    • Clouds
    • Birds
    • Dogs/cats
    • Vegetation
  3. The solution should support detecting the existence or absence of custom object types. The customer should provide a relevant set of several hundred images of the desired object type, and a custom object detection model should then be prepared for the customer. 

Rule-Based Event Detection & Analysis Capabilities

Analytics Rules

  1. The solution should offer a suite of analytic rules to provide real-time detection of the following behaviors: 
    • Target/s moving in an area / loitering – the target is moving in the region of interest for a user-defined duration 
    • Target/s crossing a line – the target has crossed a user-defined line in a specific direction or any direction 
    • Stopped vehicle – the target has stopped in the region of interest for a user-defined duration 
    • Vehicle speeding – the target vehicle crosses a line at a higher speed than defined 
    • Grouping – detection of a dense group of people (a configurable number) in the region of interest, detected for a user-defined duration 
    • Crowd density – counting of very dense crowds, up to 5,000 people in the field of view of a single camera
    • Object left behind – detection of suitcase/bag/backpack/purse left behind in the region of interest for a user-defined duration 
    • Asset protection – Mark an object in the field of view and receive an alert when that object is removed 
    • Traffic counter-flow – a vehicle is traveling in the opposite direction 
    • Slip & Fall detection – a person falls to the ground and is detected lying on the floor 
    • Face recognition – a person in the scene is detected with the same attributes as an individual from a watchlist
  2. Each detection rule should apply to the relevant target types. The user should be able to select several relevant target types for each detection rule.
  3. The solution should be able to detect the existence or disappearance of Custom objects in/from a user-defined region of interest.

Rule Configuration and Setup

  1. The solution should provide the ability to execute bulk operations for activating, deactivating, and scheduling multiple analytics rules. 
  2. The solution should enable any combination of analytics rules to run on the same camera simultaneously, without limitations. 
  3. The solution should enable the operator to define multiple detection regions per camera. 

Scene calibration

Each camera should support automatic and manual calibration. Calibration refers to the translation from pixels in the image to actual size (meters/feet) in different parts of the image: 

  1. The system should automatically calibrate object sizes in the image based on standard sizes of DNN-classified targets in the scene, over time.  
  2. The system should have the possibility to override the automatic calibration or calibrate scenes where no movement occurs. Manual calibration should support different pixel-to-meter translations for different parts of the image, creating a flexible calibration mesh across the image frame. 
  3. All calibration methods should support complex translations, producing accurate results even in challenging environments, e.g., fish-eye/distorted camera images and scenes with multiple levels. 
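
The calibration algorithm itself is vendor-specific and not defined above; the Python sketch below only illustrates the idea of a pixel-to-meter calibration mesh, assuming a handful of manually provided calibration points and using SciPy interpolation.

  # Conceptual sketch of a pixel-to-meter calibration mesh (not the product's algorithm).
  # Assumes a few manually calibrated points: (x_pixel, y_pixel) -> meters-per-pixel.
  import numpy as np
  from scipy.interpolate import griddata

  # Hypothetical calibration points, e.g. measured from known object sizes in the scene.
  points = np.array([[100, 700], [900, 720], [500, 300], [200, 150]])   # pixel coords
  m_per_px = np.array([0.010, 0.011, 0.030, 0.055])                     # scale at each point

  def scale_at(x, y):
      """Interpolate meters-per-pixel at an arbitrary pixel position."""
      value = griddata(points, m_per_px, (x, y), method="linear")
      if np.isnan(value):                       # outside the convex hull of the mesh
          value = griddata(points, m_per_px, (x, y), method="nearest")
      return float(value)

  # Example: convert a 120-pixel-tall bounding box at pixel (480, 320) to meters.
  height_m = 120 * scale_at(480, 320)
  print(f"Estimated target height: {height_m:.2f} m")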

Event Generation

  1. The solution should provide real-time generation of events to alert operators when a behavior is detected that matches the user-defined rule. 
  2. The solution should support simultaneous tracking of multiple targets within the detection regions and/or the cross lines. 
  3. The solution should generate a short event video clip for each detected event, showing several seconds before and after the event, and include a bounding box around the target that triggered the event. 
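
The clip-generation mechanism is not mandated by item 3 above; the Python sketch below merely illustrates one common approach, a rolling pre-event frame buffer, where the frame rate, clip durations, codec, and the collect_next_frames helper are assumptions for the example.

  # Illustrative ring-buffer approach for "several seconds before and after" an event.
  # Frame rate, clip length, and codec are assumptions for this sketch only.
  import collections
  import cv2

  FPS = 8                      # minimum supported frame rate in this specification
  PRE_SECONDS = 5
  POST_SECONDS = 5

  pre_buffer = collections.deque(maxlen=FPS * PRE_SECONDS)   # rolling pre-event frames

  def write_event_clip(path, pre_frames, post_frames, frame_size):
      """Write buffered pre-event frames plus post-event frames to a clip file."""
      fourcc = cv2.VideoWriter_fourcc(*"mp4v")
      writer = cv2.VideoWriter(path, fourcc, FPS, frame_size)
      for frame in list(pre_frames) + list(post_frames):
          writer.write(frame)
      writer.release()

  # In the ingestion loop (outline only):
  #   pre_buffer.append(frame)                  # always keep the last PRE_SECONDS of video
  #   if event_detected:
  #       post_frames = collect_next_frames(FPS * POST_SECONDS)   # hypothetical helper
  #       write_event_clip("event.mp4", pre_buffer, post_frames, (width, height))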

Event Integration

The solution should be able to send the events to the following external systems: 

  • Milestone XProtect VMS 
  • Genetec Security Center VMS 
  • Immix CS and Immix GF 
  • Sentinel 
  • Patriot
  • Mobotix MxHub and MxManagementCenter 
  • Other systems based on WebHooks protocol (HTTP Push) 
  • Other systems based on SMTP protocol 
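
As a purely illustrative example of the WebHooks (HTTP Push) integration pattern, the Python sketch below pushes an event to an external receiver; the endpoint URL and all payload field names are invented here, and the actual payload schema is defined by the product documentation.

  # Illustrative WebHook (HTTP Push) delivery of an event to an external system.
  # The endpoint URL and every payload field name are invented for this sketch.
  import json
  import urllib.request

  event = {
      "camera_id": "cam-42",                   # hypothetical fields
      "rule": "line_crossing",
      "target_type": "vehicle",
      "timestamp": "2024-01-01T08:15:30Z",
  }

  req = urllib.request.Request(
      "https://receiver.example.com/events",   # placeholder receiver endpoint
      data=json.dumps(event).encode("utf-8"),
      headers={"Content-Type": "application/json"},
      method="POST",
  )
  with urllib.request.urlopen(req, timeout=5) as resp:
      print("Receiver responded with HTTP", resp.status)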

Anomaly Detection

The system should be able to continuously ‘learn’ the typical behavior of the scene. The system should then be able to automatically detect abnormal behaviors of detected targets and generate anomaly events in real time.
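
The learning algorithm is not specified in this document; the Python sketch below is only a conceptual illustration of the idea, accumulating per-grid-cell speed statistics over time and flagging observations that deviate strongly from what has been learned.

  # Conceptual sketch of scene-behavior learning (not the product's actual algorithm).
  # Per grid cell of the image, accumulate target speed statistics over time and flag
  # observations that deviate strongly from the learned distribution.
  import math
  from collections import defaultdict

  stats = defaultdict(lambda: {"n": 0, "mean": 0.0, "m2": 0.0})   # per-cell running stats

  def cell(x, y, cell_size=80):
      return (x // cell_size, y // cell_size)

  def learn(x, y, speed):
      """Welford's online update of mean/variance for the cell containing (x, y)."""
      s = stats[cell(x, y)]
      s["n"] += 1
      delta = speed - s["mean"]
      s["mean"] += delta / s["n"]
      s["m2"] += delta * (speed - s["mean"])

  def is_anomalous(x, y, speed, z_threshold=4.0, min_samples=100):
      """Flag a speed observation far outside the learned distribution for its cell."""
      s = stats[cell(x, y)]
      if s["n"] < min_samples:
          return False                          # still in the initial learning period
      std = math.sqrt(s["m2"] / (s["n"] - 1)) or 1e-6
      return abs(speed - s["mean"]) / std > z_threshold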

Anomaly Detection Configuration

  1. The user should be able to activate/de-activate anomaly detection for a single or multiple camera/s. 
  2. The user should be able to change the default anomaly detection configuration parameters for a single or multiple camera/s: 
    • Analyzed target types: multiple selections of relevant target types  
    • Anomaly detection sensitivity: measured by the average number of anomaly events to be generated daily/weekly 

Anomaly Detection and Event Generation

  1. The system should analyze the behavior of a target based on the following parameters: 
    • The target’s existence in the camera’s field of view 
    • The target’s path 
    • The target’s size 
    • The target’s speed 
  2. The system should analyze the relations of several targets in the scene based on the following parameters:
    • The number of targets in the scene and their location in the camera’s field of view
    • The distance between the targets in the scene
  3. The system should analyze targets’ normal behavior across different days of the week and hours of the day.
  4. The system should generate an anomaly event that includes an event clip showing several seconds before and after the event, bounding boxes around the relevant targets, and a description of the detected anomaly.

Anomaly Detection Training and User Feedback

  1. The system should require an initial learning (‘training’) period of a few weeks. 
  2. The system should enable the user to fine-tune anomaly detection by letting the user tag anomaly events as relevant or irrelevant. Irrelevant events should feed the ongoing training process and should subsequently be ignored. 
  3. The system should re-learn the scene’s normal behavior regularly. 

Anonymization

The solution should support static and dynamic video anonymization functionality. The functionality should be individually configurable per video stream.

Different anonymization methods should be supported: 

  1. Standard pixelation applied to grayscale images and streams.
  2. Standard pixelation applied to color images and streams.

Pixelation implementations

Each frame in the video data is permanently and destructively masked. It should not be possible to recover the original video data. 
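
As a minimal illustration of the pixelation principle described above (the block size is an arbitrary choice, and a production implementation must also ensure the original frames are never persisted), a Python/OpenCV sketch could look like this:

  # Minimal pixelation sketch (works for grayscale and color frames alike).
  # The block size is an arbitrary choice for illustration.
  import cv2

  def pixelate(frame, blocks=32):
      """Destructively pixelate a frame by downscaling and upscaling with nearest-neighbor."""
      h, w = frame.shape[:2]
      small = cv2.resize(frame, (blocks, blocks), interpolation=cv2.INTER_LINEAR)
      return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

  # Example: anonymize a frame before it is stored or forwarded.
  # frame = pixelate(frame)   # the original pixel values cannot be recovered from the result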

Video Investigation

Method of operation

  1. The system should analyze all cameras in real time and create metadata that is stored in a database. It should be possible to search any camera in near real time, with a delay of no more than 10 seconds. 
  2. The system hardware specification should facilitate real-time processing of all cameras. 
  3. It should be possible to search any or all cameras in the installation simultaneously and without needing to process the cameras in small batches, regardless of the number of cameras installed in the system. 

Defining Search Criteria

  1. The solution should not require the operator to apply any rule or behavior configuration in advance as a pre-requisite for performing video investigation. 
  2. The video investigation should be conducted simultaneously over single or multiple selected cameras, either from a list or map. 
  3. The solution should offer to search for the following suite of behaviors of the supported target types:
    • Person / Two Wheeled Vehicle (Motorcycle/Bicycle) / Vehicle (Car, Van, Bus, Truck) moving for a specified time, in the entire field of view (FOV) or in a specified area of interest (AOI) 
    • Person  / Two Wheeled Vehicle (Motorcycle/Bicycle) / Vehicle (Car, Van, Bus, Truck) crossing a line in a specific direction or in either direction
    • People grouping for a specified time, in the entire FOV, or in a specified AOI (based on a user-defined number of people threshold) 
    • Persons occupying for a specified time the entire FOV or a specified AOI (based on a user-defined occupancy threshold) 
    • Two Wheeled Vehicle (Motorcycle/Bicycle) / Vehicle (Car, Van, Bus, Truck) that stopped for a specified time, in the entire FOV or a specified AOI 
    • Bags/backpacks/suitcases added to the entire FOV or a specified AOI for a specified time 
    • Persons that exhibit attributes similar to a Search for Similar reference image
  4. The solution should allow filtering of search results based on target color characteristics. For people, the solution should allow specifying upper body color and lower body color. 
  5. The solution should be capable of searching over various time range options: 
    • Over the past N minutes, hours, or days (e.g., over the past 3 hours; past 7 days) 
    • From a start date and time to an end date and time 
    • Over a recurring time interval across a date interval (e.g., between 8-9 a.m., every day between Jan 1-10) 
  6. The solution should provide the capability to Search for Similar Targets: If a target is found, another search can be performed in the recorded video (generated from the same camera or any group of cameras) to find targets that are the same as or similar to the found target. 

Viewing Search Results

  1. The solution should be able to display the video recording for any search result around the time that the search target/behavior was found, without requiring integration with a 3rd party recording solution. 
    • The solution should continuously display a bounding box over the target (target tracking) 
    • The solution should enable the user to pause and replay the video playback 
    • The solution should enable the user to use the progress bar to navigate to any time position along the playback segment 
  2. The solution should provide multiple options for viewing search results:
    • Event Thumbnails: After searching for the relevant results, the solution should be capable of displaying a thumbnail for each result, each of which shows the detected target. It should be possible to play back the video of each thumbnail.  
    • Target Path / Location: After searching, the solution should be capable of displaying all motion paths in a scene over the field-of-view reference image. Motion paths should be colorized based on the primary object class (Human, Vehicle, Two-wheeled vehicle) and filterable based on object class. 
    • Heatmap: After searching, the solution should be capable of overlaying a grid where each cell is colorized depending on the amount of activity in that part of the image. It should be possible to filter the heatmap overlay depending on the object class. 
    • Original video: After searching, the solution should be capable of showing the original video in an integrated video player.

Process and Investigation Capabilities

The solution should provide the following process and investigation capabilities:

  • Save Search Query: Users should be able to save a search query with a given name for later reuse.  
  • Save Search Results: Users should be able to save search results with snapshots of the detections and the results’ identifying information (camera ID, time).  
  • Export Clip: The solution should enable users to export a search result to a video file, for a single result as well as for a complete video summary. The exported clip will include the target tracking display. 
  • Search for Similar: Users should be able to launch a follow-up search for targets that are the same as or similar to a selected result, as described under Defining Search Criteria.

Statistics

  1. The solution should offer analytics rules to provide statistical analysis capabilities for multiple target types including but not limited to: 
    • Count the number of targets moving directionally, i.e., crossing one or more operator-defined virtual lines in the camera’s field of view. The solution should count targets precisely by distinguishing individual targets within a cluster: if a cluster of 4 people crosses a line, for example, a count of 4 should be registered rather than 1 (an illustrative counting sketch appears below this list).
    • Calculate the average speed of vehicles crossing a line.  
  2. The solution should offer statistics of events generated in the system, over time. Aggregation of events over cameras and hours/days should be supported. 
  3. The solution should offer statistics of health alerts generated in the system, over time. Aggregation of events over cameras/devices/sources and hours/days should be supported. 
  4. The statistical data should be available via APIs. 

Note: Crowd statistics and crowd density share the same underlying algorithm, which can be applied in either manner per sensor.
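
The counting logic is not dictated by this specification; the Python sketch below only illustrates directional line-crossing counting for targets that are already tracked individually, which is what allows a cluster of four people to be counted as four crossings. The tracker supplying per-target centroids is assumed to exist upstream.

  # Illustrative directional line-crossing counter (not the product's algorithm).
  # Assumes an upstream tracker supplies a centroid per tracked target per frame.

  def side_of_line(p, a, b):
      """Sign of the cross product: which side of line a->b the point p lies on."""
      return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

  class LineCounter:
      def __init__(self, a, b):
          self.a, self.b = a, b
          self.last_side = {}          # track_id -> previous side of the line
          self.count = {"forward": 0, "backward": 0}

      def update(self, track_id, centroid):
          side = side_of_line(centroid, self.a, self.b)
          prev = self.last_side.get(track_id)
          if prev is not None and prev * side < 0:          # sign change -> crossing
              self.count["forward" if side > 0 else "backward"] += 1
          self.last_side[track_id] = side

  # Because each tracked target keeps its own identity, a cluster of four people
  # crossing together contributes four separate crossings, not one.
  counter = LineCounter(a=(100, 400), b=(800, 400))
  counter.update("person-1", (450, 390))
  counter.update("person-1", (452, 410))
  print(counter.count)    # {'forward': 1, 'backward': 0}  (direction naming is arbitrary)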

Geospatial Awareness

Geospatial Mapping

  1. The solution should enable the user to configure the following geospatial data per video source connected to the solution: 
    • The video source’s location on a map 
    • The video source’s FOV (Field Of View) registration – correlation of points in the FOV with a map 
  2. Alternatively, the solution should retrieve the geospatial data from 3rd party systems. 
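
The FOV registration method is not prescribed above; the Python sketch below illustrates one common approach, fitting a homography between image points and map coordinates with OpenCV, where all point correspondences are invented for the example.

  # Illustrative FOV-to-map registration using a homography (one common approach;
  # the actual registration method is not prescribed by this specification).
  # The pixel/map point correspondences below are invented for the example.
  import cv2
  import numpy as np

  # Four or more correspondences: pixel (x, y)  ->  map position (longitude, latitude)
  pixels = np.array([[120, 650], [1180, 640], [700, 220], [300, 240]], dtype=np.float32)
  geo    = np.array([[34.7800, 32.0850], [34.7812, 32.0851],
                     [34.7810, 32.0862], [34.7801, 32.0861]], dtype=np.float32)

  H, _ = cv2.findHomography(pixels, geo)

  def pixel_to_map(x, y):
      """Project a pixel position (e.g. a target's foot point) onto the map."""
      pt = np.array([[[x, y]]], dtype=np.float32)
      lon, lat = cv2.perspectiveTransform(pt, H)[0][0]
      return float(lon), float(lat)

  print(pixel_to_map(640, 500))    # approximate map position of an image point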

Geospatial Analysis

  1. The solution should be able to present real-time events or video investigation search results over a map. 
  2. The solution should allow a map-based selection of relevant cameras within a user-defined zone, for video investigation. 
  3. The solution should be able to present a tracked target path over a map. 

System Health Monitoring

  1. The solution should self-monitor its main components to ensure high availability and reliable video analysis. This monitoring should include the following aspects: 
    • Ability to properly pull the ONVIF/RTSP video stream 
    • Minimal video stream frame rate and resolution 
    • Scene lighting (too dark / saturated / blocked) 
    • Event delivery status 
    • Monitoring of analytics servers (the computer running the video analytics) 
  2. The solution should support configurable thresholds for detecting and generating health alerts.  
  3. The solution should be able to send health alerts to a configurable email recipients list. 
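
How thresholds and alert delivery are configured internally is product-specific; the Python sketch below only illustrates the idea of a configurable frame-rate threshold combined with an SMTP alert, with the threshold value, SMTP relay, and recipient addresses as placeholders.

  # Illustrative health check with a configurable threshold and SMTP alerting.
  # Threshold value, SMTP server, and recipient addresses are placeholders.
  import smtplib
  from email.message import EmailMessage

  MIN_FPS = 8                                       # configurable threshold (example)
  RECIPIENTS = ["noc@example.com"]                  # placeholder recipient list

  def send_health_alert(camera_id, measured_fps):
      msg = EmailMessage()
      msg["Subject"] = f"Health alert: {camera_id} frame rate {measured_fps:.1f} FPS"
      msg["From"] = "analytics@example.com"         # placeholder sender
      msg["To"] = ", ".join(RECIPIENTS)
      msg.set_content(
          f"Camera {camera_id} is delivering {measured_fps:.1f} FPS, "
          f"below the configured minimum of {MIN_FPS} FPS."
      )
      with smtplib.SMTP("smtp.example.com", 587) as smtp:   # placeholder SMTP relay
          smtp.starttls()
          smtp.send_message(msg)

  def check_frame_rate(camera_id, measured_fps):
      if measured_fps < MIN_FPS:
          send_health_alert(camera_id, measured_fps)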

System Architecture

The solution should be based on the following main components:

  1. Analytics server(s) 
    • The analytics server(s) pulls video streams from cameras and performs initial video analysis.  
    • The analytics server(s) communicates with the core server(s). 
    • The analytics server(s) should support low-bandwidth connection to the core services, down to ~5 kbps per camera. 
    • The analytics server(s) should be able to scale to support any number of cameras and should support both scale-up and scale-out resource scaling to increase performance. 
    • A GPU should not be required. 
    • The analytics server(s) should support cloud-based, on-premise, and hybrid deployments. On-premise installations without internet access (“fully offline”) should be supported. 
    • Real-time event integrations to third-party systems from the analytics server should be supported.  
  2. Core server(s) 
    • The core server(s) should provide central management for all video analysis, rule configuration, and setup. 
    • The core server(s) should support both cloud-based and on-premise deployments. On-premise installations without internet access (“fully offline”) should be supported. 
    • The core server(s) should be able to scale to support any number of cameras and should support both scale-up and scale-out resource scaling to increase performance. 
    • The core server(s) should support high-availability deployments. 
    • Multi-Tenant – the solution should support an unlimited number of separate accounts that should be completely isolated from each other. 
    • An unlimited number of users – each account in the system should allow setting up an unlimited number of users. 
    • Role-based access control – each user in the system should be assigned a role with specific access permissions to different modules of the solution. 
    • Real-time event integrations to third-party systems from the core server(s) should be supported. 
    • Real-time health alert integration to third-party systems should be supported. 
  3. Public API
    • The system should expose a public, documented API. 
    • The API should require authentication and encrypted communication. 

The API should support all functionality in the system, including management of cameras and analytics, management of events and health alerts, monitoring of system health and real-time events, and the ability to create and get results from a video investigation.
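
As a purely hypothetical usage example (the real endpoint paths, query parameters, and authentication scheme are defined by the product's API documentation, not by this section), an authenticated client call could look like the Python sketch below.

  # Hypothetical API client sketch: all endpoint paths, parameter names, and the
  # token-based authentication scheme shown here are assumptions, not the real API.
  import json
  import urllib.request

  BASE_URL = "https://analytics.example.com/api/v1"     # placeholder
  TOKEN = "..."                                         # obtained via the vendor's auth flow

  def api_get(path):
      req = urllib.request.Request(
          f"{BASE_URL}{path}",
          headers={"Authorization": f"Bearer {TOKEN}"},  # authenticated, HTTPS-only
      )
      with urllib.request.urlopen(req, timeout=10) as resp:
          return json.loads(resp.read().decode("utf-8"))

  # Hypothetical calls: list cameras and fetch recent events.
  cameras = api_get("/cameras")
  events = api_get("/events?since=2024-01-01T00:00:00Z")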