# Use Cases

## **1. High CPU Utilization in a Corporate Office's Authentication Server**

A corporate office hosts an **Active Directory (AD) Authentication Server** that handles employee logins. It’s common for the CPU to spike at **10:00 AM**, as most users authenticate at the start of the workday.

### **Challenge**

The administrator knows 80–85% CPU usage at 10 AM is **normal**.

However, they want to detect:

* Sudden surges **beyond typical high usage**
* Situations where **high CPU usage persists longer than expected**
* Different thresholds per **server type**, since each behaves differently

A **static threshold** (such as CPU ≥ 80%) would fire during every normal morning login peak, producing too many false alerts.

### **Solution: Adaptive Threshold Configuration**

The team enables adaptive thresholding, which uses a model trained on 30 days of CPU data to define expected usage dynamically.

| **Time**     | **Predicted** | **Upper Bound** | **Lower Bound** |
| ------------ | ------------- | --------------- | --------------- |
| **10:00 AM** | 80%           | 85%             | 70%             |

#### **Configuration Inputs**

| Label          | Value              | Why?                                    |
| -------------- | ------------------ | --------------------------------------- |
| Severity       | Critical           | High CPU usage impacts login services   |
| Factor (Upper) | 1                  | Allow one band above the Upper Bound    |
| Poll Points    | 3                  | 3 polls × 5 minutes = 15 minutes        |
| Breached %     | 100%               | All 3 values must cross the limit       |
| Alert Above    | *(not configured)* | Trust model prediction only             |

**Threshold Calculation**

* **Upper Band** = Upper Bound − Predicted = 85 − 80 = 5%
* **Factor** = 1
* **Upper Limit** = Upper Bound + (Factor × Upper Band) = 85 + (1 × 5) = **90%**

An alert is raised only if CPU usage exceeds **90%** in all **3** consecutive readings (Breached % = 100%).
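
The calculation above can be sketched in a few lines of Python. This is an illustrative sketch only, not the product's implementation; the function name `upper_limit` and its parameter names are assumptions made for this example.

```python
def upper_limit(predicted: float, upper_bound: float, factor: float) -> float:
    """Return Upper Bound + Factor x Upper Band (hypothetical helper)."""
    upper_band = upper_bound - predicted  # 85 - 80 = 5 in this use case
    return upper_bound + factor * upper_band


# Values from the 10:00 AM row of the prediction table.
limit = upper_limit(predicted=80.0, upper_bound=85.0, factor=1.0)
print(limit)  # 90.0
```

With the same predictions, a factor of 2 would move the limit to 95%, making the alert less sensitive.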

**Poll Data (5-min intervals)**

| Time     | CPU Usage |
| -------- | --------- |
| 10:00 AM | 92%       |
| 10:05 AM | 91%       |
| 10:10 AM | 93%       |

{% hint style="danger" %}
All values > 90% → **Anomaly triggered**
{% endhint %}
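
The Poll Points and Breached % check can be sketched as follows. The function `is_breached` and its arguments are hypothetical names, and the sketch assumes that a reading counts as a breach only when it is strictly above the limit.

```python
def is_breached(readings, limit, poll_points, breached_pct):
    """Return True if enough of the last `poll_points` readings exceed `limit`."""
    window = readings[-poll_points:]
    if len(window) < poll_points:
        return False  # not enough polls collected yet
    breaches = sum(1 for value in window if value > limit)
    # Integer comparison avoids floating-point rounding on the percentage.
    return breaches * 100 >= breached_pct * poll_points


cpu_readings = [92, 91, 93]  # 10:00, 10:05, 10:10 AM from the table above
print(is_breached(cpu_readings, limit=90, poll_points=3, breached_pct=100))  # True
```

Because Breached % is 100, a single reading at or below 90% inside the window would suppress the alert.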

### **Investigation Outcome**

The IT team finds that an **unoptimized login script** overwhelms the server during logins. They optimize the script, and the load returns to expected patterns.

### **Why Adaptive Threshold Helped**

* **Model learned hourly behavior** – 80% at 10 AM is normal, no alert
* **Avoided false positives** seen in the static CPU ≥ 80% rule
* **Flexible per-server** learning: No two servers have the same ideal CPU load
* **Factor + Band** allowed fine-grained alert sensitivity
* **Poll Points + Breached %** validated that the spike was real and sustained

## **2. Low Database Connection Count in a Business-Critical Application**

An e-commerce company like **Swiggy** relies on a backend database that handles real-time customer transactions, order placements, and app interactions. The **Database Connection Count** metric indicates how many active sessions are connected to the database from the application.

During **business hours**, this count should be high (\~500). During **non-business hours**, it naturally dips (\~50).\
Any **unexpected drop** in these values signals a potential issue in customer access, system load, or backend connectivity.

### **Challenge**

Static thresholds can't adapt to changing behavior over time.\
A **fixed limit like "connection count < 400"** would raise false positives every night and weekend, when the count naturally dips.

However, **unexpected drops** should still be captured accurately, without noise, during:

* Morning peak times (well below the usual \~500)
* Night-time baselines (well below the usual \~50)

### **Solution: Adaptive Threshold with Lower Limit + Alert Below**

Using the ML model, the system learns hour-by-hour trends for connection count based on 30 days of historical usage.

| **Time**     | **Predicted** | **Lower Bound** | **Upper Bound** |
| ------------ | ------------- | --------------- | --------------- |
| **11:00 AM** | 500           | 450             | 550             |
| **02:00 AM** | 60            | 50              | 80              |

### **Configuration**

| **Label**          | **Value**                               | **Why?**                                  |
| ------------------ | --------------------------------------- | ----------------------------------------- |
| **Severity**       | Critical                                | Low connections = potential business loss |
| **Factor (Lower)** | 1                                       | Standard buffer tolerance                 |
| **Poll Points**    | 3                                       | Evaluate over 15 mins                     |
| **Breached %**     | 66%                                     | At least 2 out of 3 must breach           |
| **Alert Below**    | 400 (business hours), 40 (non-business) | Absolute fallback                         |

**Calculation Example (Business Hours)**

* **Predicted** = 500
* **Lower Bound** = 450
* **Lower Band** = Predicted − Lower Bound = 500 − 450 = 50
* **Factor** = 1
* **Lower Limit** = Lower Bound − (Factor × Lower Band) = 450 − (1 × 50) = **400**

If the **connection count drops below 400** in at least 2 of the 3 polls, the system raises an alert.
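
As a rough sketch, the same formula can be applied to both rows of the prediction table. The helper `lower_limit` is a hypothetical name used only for illustration. The 02:00 AM result is simply the formula applied to that row; it happens to match the non-business Alert Below value of 40.

```python
def lower_limit(predicted, lower_bound, factor):
    """Return Lower Bound - Factor x Lower Band (hypothetical helper)."""
    lower_band = predicted - lower_bound
    return lower_bound - factor * lower_band


# (Predicted, Lower Bound) pairs taken from the prediction table.
predictions = {"11:00 AM": (500, 450), "02:00 AM": (60, 50)}
limits = {time: lower_limit(pred, low, 1) for time, (pred, low) in predictions.items()}
print(limits)  # {'11:00 AM': 400, '02:00 AM': 40}
```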

#### **Sample Poll Data (Business Hours)**

| Time     | DB Connection Count |
| -------- | ------------------- |
| 11:00 AM | 395                 |
| 11:05 AM | 398                 |
| 11:10 AM | 397                 |

{% hint style="danger" %}
3 values < 400 → **Anomaly triggered**
{% endhint %}
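
A minimal sketch of the lower-side check with Breached % set to 66 is shown here. The function `is_low_breach` is a hypothetical name, the sketch assumes a breach means a reading strictly below the limit, and the second set of readings is invented purely to illustrate the 2-out-of-3 case.

```python
def is_low_breach(readings, lower_limit, poll_points=3, breached_pct=66):
    """Return True if enough of the last `poll_points` readings fall below the limit."""
    window = readings[-poll_points:]
    if len(window) < poll_points:
        return False  # not enough polls collected yet
    breaches = sum(1 for value in window if value < lower_limit)
    # Integer comparison: 2 of 3 gives 200 >= 198, so it satisfies 66%.
    return breaches * 100 >= breached_pct * poll_points


sample = [395, 398, 397]  # 11:00, 11:05, 11:10 AM from the table above
print(is_low_breach(sample, lower_limit=400))  # True

# Hypothetical readings (not from the source): one poll recovers above the limit.
partial = [395, 410, 397]
print(is_low_breach(partial, lower_limit=400))  # True
```

A single low reading out of three (about 33%) would not meet the 66% requirement, which filters out momentary dips.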

### **Investigation Outcome**

* Backend logs showed the **application server** had memory issues.
* New user sessions couldn’t be established, reducing connections.
* The alert let the team act before revenue was affected.

### **Why Adaptive Threshold Helped**

* Model learned natural daily peaks and drops
* Avoided alerts during expected night dips
* Factor and Band logic helped customize tolerance
* The **Alert Below** value acted as a static fallback to enforce minimum levels
