Chapter 8: Artificial Intelligence

8.1. Anomaly Detection

8.1.1. Overview

Anomaly Detection is, in a broad sense, the identification of patterns that do not conform to a defined normal behavior. In cOS Core, the Anomaly Detection feature uses Artificial Intelligence (AI), employing a combination of state-of-the-art concepts from machine learning and time-series analysis to detect communication misbehavior in near real-time. The underlying technique features a multi-layer AI engine that enables the creation and configuration of multiple AI models, each monitoring its own data stream concurrently, as visualized in the image below.


Figure 8.1. AI Engine

A model that is initialized and configured for a specific data stream via the AI engine runs in two primary phases:

  • Training

  • Inference

In the Training phase, the model processes traffic data to learn and identify a baseline (normal) behavior for the monitored communication. The learning process employs sophisticated mathematical tools to capture behavioral aspects of the communication, focusing on identifying regularity and cyclicity in traffic sequences and patterns. This makes the feature particularly suitable for machine-to-machine communication.

In the Inference phase, the trained model is used to analyze each packet received from the corresponding data stream and determine whether the communication conforms to the identified normal behavior or deviates from it. In the latter case, the model generates an alarm event signaling the detection of suspicious or abnormal network activity that deviates from the trained pattern.

Anomaly Detection License

To use Anomaly Detection, the license must contain a valid subscription date together with the AI Policy parameter in order for the feature to be enabled. An example of how the parameter might appear in the output of the CLI command License:
Max AI Policies: 20
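
As an illustrative sketch only, checking this from the CLI might look as follows. The surrounding license output is omitted here and the exact layout may vary between versions:

Device:/> license
			...
			Max AI Policies: 20
			...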

Please note that the license must include support for AI. The standard license does not allow administrators to configure AI functionality by default. In addition, there are platform and CPU requirements:

  • 32-bit x86: Not supported

  • 64-bit x86: Minimum number of cores: 4

  • 64-bit Arm: Minimum number of cores: 3

Anomaly Detection in cOS Core

The Anomaly Detection feature extends the existing capabilities of cOS Core by using a novel AI method that monitors the behavior of incoming traffic. For a given data stream, an AI model is initially trained on a pre-specified number of packets to learn the communication characteristics that represent normal behavior. The trained model can then monitor the data stream to identify changes in behavior and raise an alarm whenever such a change is detected.

To create and configure an AI model on a given data stream, the user needs to create an AI policy, via either the WebUI or the CLI. Each AI policy corresponds to exactly one AI model.

[Tip] Tip: Dropped Traffic Can Be Analyzed

Even traffic that is not allowed through the firewall can be analyzed by an AI model, provided that the traffic matches the model's corresponding AI policy.

[Note] Note: Anomaly Detection Does Not Drop Traffic

As the name implies, Anomaly Detection's main purpose is to report anomalies and generate a log message when an anomaly has been detected. Its purpose is not to drop traffic when an anomaly is detected but rather to inform the administrator, who can then take the appropriate action, for example analyzing the logs and data surrounding the reported anomaly.

[Important] Important: No High Availability Support

It is currently not possible to configure and use AI Anomaly Detection on High Availability clusters. An error will be shown if attempts are made to create an AI Policy on an HA cluster.

Modes of Operation

At any given time, an AI model can be in one of four different modes of operation: Training, Warmup, Inference, and Error.

  • Training

    The model learns and continuously updates its understanding of the structure and cyclicity underlying the incoming data. In this context, ensuring that the training data is representative of the normal behavior is essential to the model's performance. In training mode, the status field will display a percentage indicating the training progress relative to the user-specified number of training packets.

  • Warmup

    Once training is complete, the model must receive enough data points to fill the lookback window (see Section 8.1.3.1, Advanced Options for more information) before it can output its first score. In warmup mode, the status field will display a percentage indicating the warmup progress. Warmup should be considerably faster than training.

  • Inference

    After a model has been trained, all future incoming data points are compared against the learned behavior and, if an anomaly is discovered, an alarm will be generated for the user. In inference mode, the status will display either OK or alarm.

  • Error

    In the event of unexpected failures during training, the model will enter error mode. While in this mode, the model will stop processing but remain active, with the error logged by cOS Core. To resume operation, the user must reset/retrain the model.

Model Storage

Data storage for anomaly detection models operates as follows:

  • Training Mode

    During training mode, the parameters that make up the model are stored in memory. If cOS Core is restarted while in this mode, training will be restarted.

  • Warmup and Inference

    Once training mode is completed, the trained model is written to storage. If cOS Core is restarted, the trained model is loaded from storage, placed in Warmup mode, and will eventually enter Inference mode.

  • Storage in cOS Core Configuration

    Trained models are not stored in the firewall configuration.

8.1.2. Usage Considerations

The Clavister Anomaly Detection feature is a powerful tool that can help identify complex anomalies in a network. However, like any other machine learning based method, it can trigger an alert on normal traffic; this is commonly referred to as a False Positive. To minimize such occurrences, the following considerations should be taken into account when configuring and using AI Policies.

  • The monitored communication should, to some extent, be deterministic, meaning the communication should be somewhat regular and have repeating patterns rather than being random. This is typically the case for machine-to-machine communication, such as a camera generating a video stream to a server.

  • Just as it is important that the monitored communication follows a pattern, it is equally important to have enough training data to represent all of the normal behaviors.

  • If the communication pattern is complex, including more training data would be beneficial to minimize the risk of getting false positives.

  • If the communication is too random and unpredictable, it may be difficult for a model to learn a good structure during training. An example of this is monitoring users browsing the Internet.

  • Applying multiple trained models to the same traffic is also possible, depending on the scenario, but it could result in a higher load on the AI subsystem. There is no restriction on how an AI Policy can be configured, other than the maximum number of policies allowed by the license.

  • Be aware that there is no right or wrong way to apply an AI Policy to an environment. Rather, it is up to the administrator to evaluate whether the traffic generated in the target environment is suitable for an AI Policy.

8.1.3. Configuring an AI Policy

The configuration of an AI Policy is similar to other types of policies in terms of interface and network filters. Each policy has filtering properties to target specific traffic, with the added option of customizing some of the more advanced AI options.

[Note] Note: AI Policies Do Not Rely on Rule Order

Unlike traditional IP Policies, which rely on their rule order for traffic matching and are processed in order from top to bottom, AI Policies are not affected by their order in the rule set. AI Policies evaluate traffic independently of where they are placed in the list.

Filter Options

The filtering fields used to trigger an AI Policy are the following:
  • Source Interface

    The interface that packets are expected to arrive on.

  • Source Network

    The IP span that the sender addresses should belong to.

  • Destination Network

    The IP span that the destination (target) addresses should belong to.

  • Service

    Services are predefined or user-defined objects representing various protocols, such as HTTP, SSH and Telnet, that can further narrow down what the AI Policy should monitor.

[Note] Note: No Destination Interface

AI Policy lookups are performed on all incoming packets before any other rule lookup is done, including route lookup. This means that the destination interface is not known yet.
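
For example, a minimal policy that only specifies the filtering fields could be created from the CLI as shown in the following sketch. The interface and network names used here are hypothetical placeholders, and any properties that are not specified, such as the advanced options described next, are assumed to keep their default values:

Device:/> add AIPolicy example_policy
			SourceInterface=lan
			SourceNetwork=lan_net
			DestinationNetwork=all-nets
			Service=all_services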

8.1.3.1. Advanced Options

Additional advanced settings that can be modified are the following:
  • Amount of Training Data

    The number of network packets that will be used for training.

  • Lookback-window Size

    The number of historical packets to consider when validating current behavior.

    [Note] Note: Changing the Lookback-window Option

    The Lookback-window Size parameter has the greatest impact on the model's size and performance. It is a trade-off between performance and accuracy: the larger the lookback-window size, the more accurate the model will be, but the larger its impact on performance.

  • Sensitivity

    Controls how reactive the model is to changes in traffic patterns.

    [Note] Note: The Sensitivity Option

    The model's sensitivity parameter should rarely need to be changed. It should be left at its default value unless a specific need to adjust it arises.

[Warning] Warning: Changing Training Data or Lookback-Window Options

Since the Amount of Training Data and Lookback-window Size parameters are fundamental to the AI policy, changing either of these options will cause the model corresponding to the AI policy to restart training.
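
As a sketch, assuming that the standard cOS Core set command can be used to modify AIPolicy objects in the same way as other configuration objects, increasing the lookback window of a policy named Camera (created in Example 8.1 below) might be done as follows, after which the corresponding model will restart training:

Device:/> set AIPolicy Camera LookbackWindowSize=800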

Example 8.1. Setting Up an AI Policy on a Network Segment

The following example details the steps needed to set up an AI policy for a simple scenario where three cameras are situated behind the G2 interface and generate video data (for example, RTSP) to the server in the DMZ, as illustrated below.

An AI Policy called Camera will be created. The Source Interface and Source Network define where traffic is expected to arrive from; in this example, this is the G2 interface and the entire network segment behind it, called G2_Net. The Destination Network will use all-nets to include all possible IPv4 addresses, so that an anomaly alert is generated if a device tries to access an IP or network outside the trained pattern, rather than the traffic simply being caught by the firewall's normal ruleset.

Similarly, "all_services" will be used as the service for the same reason, to generate an anomaly alert in case a device starts generating traffic using ports or protocols that are outside the trained model.

Command-Line Interface

Create an AI Policy for the targeted traffic:

Device:/> add AIPolicy Camera
			SourceInterface=G2
			SourceNetwork=G2_Net
			DestinationNetwork=all-nets
			Service=all_services
			AmountOfTrainingData=100000
			LookbackWindowSize=400

InControl

Follow similar steps to those used for the Web Interface below.

Web Interface

Create an AI Policy for the targeted traffic:

  1. Go to: Threat Prevention > Artificial intelligence > Anomaly Detection > Add > AI Policy
  2. Now enter:
    • Name: Camera
    • Source Interface: G2
    • Source Network: G2_Net
    • Destination Network: all-nets
    • Service: all_services
    • Amount of Training Data: 100 000
    • Lookback-window Size: 400
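
Once the configuration has been activated, the new policy can also be inspected from the CLI. The following is a sketch that assumes the generic show command applies to AIPolicy objects in the same way as it does to other configuration object types:

Device:/> show AIPolicy Camera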

In the previous example, the AI policy was configured to use an entire network as the source network. This means that if an anomaly is detected, the report will not provide enough detail for the administrator to know exactly where the anomaly originated. cOS Core would report that the AI policy Camera detected an anomaly, but it would then be up to the administrator to investigate further, using logs and other data, to figure out why the anomaly was reported and which device(s) triggered it.

The anomaly in the above example could have been triggered by any of the cameras, the server, or even something in between, such as the switch or a bad cable. The main point is that something changed which caused the trained AI policy to report an anomaly.

This is one of the main reasons Section 8.1.2, Usage Considerations mentions that there is no right or wrong way to configure an AI policy. It depends on the level of granularity the administrator needs when using Anomaly Detection to monitor network status. To gain more granularity, additional AI policies can be used, as described below.

Example 8.2. Setting Up Multiple AI Policies on the Same Network Segment

As a way to extend the configuration in the previous example, the following example details the steps needed to add additional AI policies that monitor each of the three cameras situated behind the G2 interface. These cameras generate video data (for example, RTSP) to the server in the DMZ, as illustrated below.

Here, three AI Policies called Camera-1, Camera-2 and Camera-3 will be created. The Source Interface and Source Network define where traffic is expected to arrive from; in this example, this is the G2 interface and the address book objects created to hold the IP address of each camera. The Destination Network will use all-nets to include all possible IPv4 addresses, so that an anomaly alert is generated if a device tries to access an IP or network outside the trained pattern, rather than the traffic simply being caught by the firewall's normal ruleset.

Similarly, "all_services" will be used as the service for the same reason, to generate an anomaly alert in case a device starts generating traffic using ports or protocols that are outside the trained model.

Command-Line Interface

Create AI Policies that target each camera as the source:

Device:/> add AIPolicy Camera-1
			SourceInterface=G2
			SourceNetwork=Camera-1_ip
			DestinationNetwork=all-nets
			Service=all_services
			AmountOfTrainingData=100000
			LookbackWindowSize=400
Device:/> add AIPolicy Camera-2
			SourceInterface=G2
			SourceNetwork=Camera-2_ip
			DestinationNetwork=all-nets
			Service=all_services
			AmountOfTrainingData=100000
			LookbackWindowSize=400
Device:/> add AIPolicy Camera-3
			SourceInterface=G2
			SourceNetwork=Camera-3_ip
			DestinationNetwork=all-nets
			Service=all_services
			AmountOfTrainingData=100000
			LookbackWindowSize=400

InControl

Follow similar steps to those used for the Web Interface below.

Web Interface

Create AI Policies for the targeted traffic:

  1. Go to: Threat Prevention > Artificial intelligence > Anomaly Detection > Add > AI Policy
  2. Now enter:
    • Name: Camera-1
    • Source Interface: G2
    • Source Network: Camera-1_ip
    • Destination Network: all-nets
    • Service: all_services
    • Amount of Training Data: 100 000
    • Lookback-window Size: 400
  3. And for Camera-2:
    • Name: Camera-2
    • Source Interface: G2
    • Source Network: Camera-2_ip
    • Destination Network: all-nets
    • Service: all_services
    • Amount of Training Data: 100 000
    • Lookback-window Size: 400
  4. And for Camera-3:
    • Name: Camera-3
    • Source Interface: G2
    • Source Network: Camera-3_ip
    • Destination Network: all-nets
    • Service: all_services
    • Amount of Training Data: 100 000
    • Lookback-window Size: 400

Example Summary

An important note regarding the two examples is that they can be combined, meaning both can be configured and used at the same time. The administrator could then get two notifications: for example, that an anomaly was detected in the G2 network and that an anomaly was detected for Camera-2. This could help the administrator narrow down where the anomaly was detected.

The amount of training data and lookback-window size can optionally be adjusted based on requirements; see Section 8.1.3.1, Advanced Options for more information.

There are many different ways to configure an AI policy; it is the administrator's decision what kind of monitoring granularity is appropriate for the target network(s).

[Note] Note: More Policies Mean Higher CPU Requirements

The more AI policies that are configured, the more trained models will be in use, resulting in higher CPU requirements.

AI Runtime Information and Status

In the Web Interface, under Status > AI Policies, the runtime status of and information about configured AI Policies can be found. This includes information such as:
  • Status graphs

  • Policy status

  • Analyzed packets

  • Number of dropped packets

  • Event history

The options to Restart or Retrain an AI Policy's model are also available in this area.