
# Use Cases

## **1. High CPU Utilization in a Corporate Office's Authentication Server**

A corporate office hosts an **Active Directory (AD) Authentication Server** that handles employee logins. It’s common for the CPU to spike at **10:00 AM**, as most users authenticate at the start of the workday.

### **Challenge**

The administrator knows 80–85% CPU usage at 10 AM is **normal**.

However, they want to detect:

* Sudden surges **beyond typical high usage**
* Situations where **high CPU is sustained longer than expected**
* Different thresholds per **server type**, since each behaves differently

A **Static Threshold** (such as CPU ≥ 80%) would fire during every normal morning peak, producing too many false alerts.

### **Solution: Adaptive Threshold Configuration**

The team enables adaptive thresholding, which uses a model trained on 30 days of CPU data to define expected usage dynamically.

| **Time**     | **Predicted** | **Upper Bound** | **Lower Bound** |
| ------------ | ------------- | --------------- | --------------- |
| **10:00 AM** | 80%           | 85%             | 70%             |

#### **Configuration Inputs**

| Label          | Value              | Why?                                    |
| -------------- | ------------------ | --------------------------------------- |
| Severity       | Critical           | High CPU usage impacts login services   |
| Factor (Upper) | 1                  | Allow one band width above the Upper Bound |
| Poll Points    | 3                  | 15 minutes of CPU tracking              |
| Breached %     | 100%               | All 3 values must cross the limit       |
| Alert Above    | *(not configured)* | Trust model prediction only             |

\
**Threshold Calculation**

* **Upper Band** = Upper Bound − Predicted = 85 − 80 = 5%
* **Factor** = 1
* **Upper Limit** = 85 + (1 × 5) = **90%**

An alert is raised only if the CPU crosses **90%** in **3** consecutive readings.
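The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation as described on this page; the function name `upper_limit` and its parameters are hypothetical and are not part of the product.

```python
def upper_limit(predicted: float, upper_bound: float, factor: float) -> float:
    """Return the alerting limit: the upper bound plus `factor` band widths.

    Hypothetical helper that mirrors the calculation shown in this use case.
    """
    upper_band = upper_bound - predicted  # width of the band above the prediction
    return upper_bound + factor * upper_band


# Values from the 10:00 AM row: Predicted = 80, Upper Bound = 85, Factor = 1
limit = upper_limit(predicted=80.0, upper_bound=85.0, factor=1.0)
print(limit)  # 90.0
```

A larger factor widens the tolerance (fewer alerts), while a factor of 0 would alert as soon as the Upper Bound itself is crossed.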

**Poll Data (5-min intervals)**

| Time     | CPU Usage |
| -------- | --------- |
| 10:00 AM | 92%       |
| 10:05 AM | 91%       |
| 10:10 AM | 93%       |

{% hint style="danger" %}
All values > 90% → **Anomaly triggered**
{% endhint %}
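As a rough sketch of how **Poll Points** and **Breached %** combine, the check below looks at the most recent readings and counts how many exceed the limit. The function name `is_anomaly` and its signature are assumptions made for illustration; the product's internal evaluation may differ in detail.

```python
from typing import Sequence


def is_anomaly(readings: Sequence[float], limit: float,
               poll_points: int, breached_pct: float) -> bool:
    """Hypothetical check: do enough of the last `poll_points` readings exceed `limit`?"""
    window = readings[-poll_points:]
    if len(window) < poll_points:
        return False  # not enough data collected yet to evaluate
    breaches = sum(1 for value in window if value > limit)
    # Compare in integer-friendly form to avoid floating-point rounding surprises
    return breaches * 100 >= breached_pct * poll_points


# Poll data from the table above, with Upper Limit = 90, Poll Points = 3, Breached % = 100
cpu = [92, 91, 93]
print(is_anomaly(cpu, limit=90, poll_points=3, breached_pct=100))  # True
```

With Breached % set to 100, a single reading at or below 90% within the window (for example `[92, 89, 93]`) would suppress the alert.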

### **Investigation Outcome**

The IT team finds that an **unoptimized login script** overwhelms the server during logins. They optimize the script, and the load returns to expected patterns.

**Why Adaptive Threshold Helped**

* **Model learned hourly behavior** – 80% at 10 AM is normal, no alert
* **Avoided false positives** seen in the static CPU ≥ 80% rule
* **Flexible per-server** learning: No two servers have the same ideal CPU load
* **Factor + Band** allowed fine-grained alert sensitivity
* **Poll Points + Breach%** validated that the spike was real and sustained

## **2. Low Database Connection Count in a Business-Critical Application**

An e-commerce company like **Swiggy** relies on a backend database that handles real-time customer transactions, order placements, and app interactions. The **Database Connection Count** metric indicates how many active sessions are connected to the database from the application.

During **business hours**, this count should be high (\~500). During **non-business hours**, it naturally dips (\~50).\
Any **unexpected drop** in these values signals a potential issue in customer access, system load, or backend connectivity.

### **Challenge**

Static thresholds can't adapt to changing behavior over time.\
If a **fixed limit like "connection count < 400"** is used, it might cause false positives at night or on weekends.

However, **unexpected drops** should still be captured accurately and without noise during:

* Morning peak times (e.g., < 500)
* Night-time baselines (e.g., < 50)

### **Solution: Adaptive Threshold with Lower Limit + Alert Below**

Using the ML model, the system learns hour-by-hour trends for connection count based on 30 days of historical usage.

| **Time**     | **Predicted** | **Lower Bound** | **Upper Bound** |
| ------------ | ------------- | --------------- | --------------- |
| **11:00 AM** | 500           | 450             | 550             |
| **02:00 AM** | 60            | 50              | 80              |

### **Configuration**

| **Label**          | **Value**                               | **Why?**                                  |
| ------------------ | --------------------------------------- | ----------------------------------------- |
| **Severity**       | Critical                                | Low connections = potential business loss |
| **Factor (Lower)** | 1                                       | Standard buffer tolerance                 |
| **Poll Points**    | 3                                       | Evaluate over 15 mins                     |
| **Breached %**     | 66%                                     | At least 2 out of 3 must breach           |
| **Alert Below**    | 400 (business hours), 40 (non-business) | Absolute fallback                         |

\
**Calculation Example (Business Hours)**

* **Predicted** = 500
* **Lower Bound** = 450
* **Lower Band** = 500 − 450 = 50
* **Factor** = 1
* **Lower Limit** = 450 − (1 × 50) = **400**

If the **connection count drops below 400**, the system raises an alert.
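The same calculation can be sketched in Python and applied to both rows of the prediction table. The helper name `lower_limit` is hypothetical and only mirrors the arithmetic shown above.

```python
def lower_limit(predicted: float, lower_bound: float, factor: float) -> float:
    """Return the alerting limit: the lower bound minus `factor` band widths.

    Hypothetical helper that mirrors the calculation shown in this use case.
    """
    lower_band = predicted - lower_bound  # width of the band below the prediction
    return lower_bound - factor * lower_band


# 11:00 AM row: Predicted = 500, Lower Bound = 450
print(lower_limit(500, 450, 1))  # 400

# 02:00 AM row: Predicted = 60, Lower Bound = 50
print(lower_limit(60, 50, 1))  # 40
```

With a factor of 1, the computed limits for the two rows coincide with the **Alert Below** values configured for business and non-business hours.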

#### **Sample Poll Data (Business Hours)**

| Time     | DB Connection Count |
| -------- | ------------------- |
| 11:00 AM | 395                 |
| 11:05 AM | 398                 |
| 11:10 AM | 397                 |

{% hint style="danger" %}
3 values < 400 → **Anomaly triggered**
{% endhint %}
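A minimal sketch of the evaluation with a 66% breach requirement follows. How the adaptive Lower Limit and the static **Alert Below** value are combined is an assumption here: a reading is treated as breaching if it falls below either one. The function name `is_low_anomaly` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def is_low_anomaly(readings: Sequence[float], adaptive_limit: float,
                   poll_points: int, breached_pct: float,
                   alert_below: Optional[float] = None) -> bool:
    """Hypothetical check: are enough of the last `poll_points` readings too low?"""
    window = readings[-poll_points:]
    if len(window) < poll_points:
        return False  # not enough data collected yet to evaluate
    # Assumption: below either limit counts as a breach, so the higher of the two applies
    floor = adaptive_limit if alert_below is None else max(adaptive_limit, alert_below)
    breaches = sum(1 for value in window if value < floor)
    return breaches * 100 >= breached_pct * poll_points


# Business-hours poll data: Lower Limit = 400, Alert Below = 400, Breached % = 66
connections = [395, 398, 397]
print(is_low_anomaly(connections, 400, poll_points=3, breached_pct=66, alert_below=400))  # True
```

Because 2 of 3 readings is about 66.7%, a window such as `[395, 410, 397]` would still raise the alert under this setting, while `[395, 410, 420]` would not.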

### **Investigation Outcome**

* Backend logs showed the **application server** had memory issues.
* New user sessions couldn’t be established, reducing connections.
* The alert let the team act before revenue was affected.

### **Why Adaptive Threshold Helped**

* Model learned natural daily peaks and drops
* Avoided alerts during expected night dips
* Factor and Band logic helped customize tolerance
* The **Alert Below** value acted as a static fallback to enforce minimum levels

