Use Cases

1. High CPU Utilization in a Corporate Office's Authentication Server

A corporate office hosts an Active Directory (AD) Authentication Server that handles employee logins. It’s common for the CPU to spike at 10:00 AM, as most users authenticate at the start of the workday.

Challenge

The administrator knows 80–85% CPU usage at 10 AM is normal.

However, they want to detect:

Sudden surges beyond typical high usage
Situations where high CPU sustains longer than expected
Different thresholds per server type, since each behaves differently

A static threshold (such as CPU ≥ 80%) would fire during every normal morning login peak, producing too many false alerts.

Solution: Adaptive Threshold Configuration

The team enables adaptive thresholding, which uses a model trained on 30 days of CPU data to define expected usage dynamically.

| Time | Predicted | Upper Bound | Lower Bound |
| --- | --- | --- | --- |
| 10:00 AM | 80% | 85% | 70% |

Configuration Inputs

| Label | Value | Why? |
| --- | --- | --- |
| Severity | Critical | High CPU usage impacts login services |
| Factor (Upper) | 1 | Allow one band width above the Upper Bound |
| Poll Points | 3 | 15 minutes of CPU tracking (5-minute polls) |
| Breached % | 100% | All 3 values must cross the limit |
| Alert Above | (not configured) | Trust model prediction only |

Threshold Calculation

Upper Band = Upper Bound − Predicted = 85 − 80 = 5%
Factor = 1
Upper Limit = 85 + (1 × 5) = 90%

An alert is raised only if the CPU crosses 90% in 3 consecutive readings.
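The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the formula as written in this section, not the product's implementation; the function name `upper_limit` and its parameters are hypothetical.

```python
def upper_limit(predicted: float, upper_bound: float, factor: float) -> float:
    """Return the alerting limit that sits above the model's upper bound.

    Hypothetical helper: mirrors the formula
    Upper Limit = Upper Bound + (Factor x Upper Band).
    """
    upper_band = upper_bound - predicted  # width of the band above the prediction
    return upper_bound + factor * upper_band


# Values from the 10:00 AM row: predicted 80%, upper bound 85%, factor 1.
print(upper_limit(predicted=80, upper_bound=85, factor=1))  # 90
```

Raising the factor widens the tolerance: with the same inputs, a factor of 2 would place the limit at 95%.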

Poll Data (5-min intervals)

| Time | CPU Usage |
| --- | --- |
| 10:00 AM | 92% |
| 10:05 AM | 91% |
| 10:10 AM | 93% |

All values > 90% → Anomaly triggered
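The evaluation of these three polls against Breached % can be sketched as follows. The function name `is_anomaly` is hypothetical, and the sketch assumes that "crossing" the limit means strictly greater than it; the actual comparison used by the product is not stated in this section.

```python
def is_anomaly(readings, limit, breached_pct):
    """Return True when enough readings exceed the limit.

    Hypothetical helper. Assumption: a reading breaches only when it is
    strictly greater than the limit.
    """
    if not readings:
        return False  # nothing to evaluate
    breaches = sum(1 for value in readings if value > limit)
    return breaches / len(readings) * 100 >= breached_pct


polls = [92, 91, 93]  # CPU % at 10:00, 10:05, 10:10
print(is_anomaly(polls, limit=90, breached_pct=100))  # True
```

With Breached % at 100, a single reading at or below 90% within the window (for example 92, 89, 93) would suppress the alert.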

Investigation Outcome

The IT team finds that an unoptimized login script overwhelms the server during logins. They optimize the script, and the load returns to expected patterns.

Why Adaptive Threshold Helped

Model learned hourly behavior – 80% at 10 AM is normal, no alert
Avoided false positives seen in the static CPU ≥ 80% rule
Flexible per-server learning: No two servers have the same ideal CPU load
Factor + Band allowed fine-grained alert sensitivity
Poll Points + Breach% validated that the spike was real and sustained

2. Low Database Connection Count in a Business-Critical Application

An e-commerce company like Swiggy relies on a backend database that handles real-time customer transactions, order placements, and app interactions. The Database Connection Count metric indicates how many active sessions are connected to the database from the application.

During business hours, this count should be high (~500). During non-business hours, it naturally dips (~50). Any unexpected drop in these values signals a potential issue in customer access, system load, or backend connectivity.

Challenge

Static thresholds can't adapt to behavior that changes over time. A fixed limit such as "connection count < 400" would cause false positives at night or on weekends, when the count naturally dips.

However, unexpected drops in either of these periods should still be captured accurately, without noise:

Morning peak times (e.g., < 500)
Night-time baselines (e.g., < 50)
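To make the problem concrete, the sketch below compares a fixed "< 400" rule with a rule that uses a different floor per period. The floors (400 and 40) are the Alert Below values used later in this use case. The 09:00 to 18:00 business-hours window, the sample readings, and all function names are assumptions made for illustration.

```python
STATIC_LIMIT = 400
# Floors taken from the Alert Below setting in this use case.
FLOOR_BY_PERIOD = {"business": 400, "non_business": 40}


def period(hour):
    # Assumption: business hours run from 09:00 up to (not including) 18:00.
    return "business" if 9 <= hour < 18 else "non_business"


def static_alert(count):
    """Fixed rule: alert whenever the count is below 400."""
    return count < STATIC_LIMIT


def time_aware_alert(hour, count):
    """Alert only when the count is below the floor for that period."""
    return count < FLOOR_BY_PERIOD[period(hour)]


# Hypothetical readings as (hour of day, connection count).
samples = [(11, 500), (2, 50), (11, 395), (2, 30)]
for hour, count in samples:
    print(hour, count, static_alert(count), time_aware_alert(hour, count))
```

The normal 02:00 reading of 50 trips the static rule but not the time-aware one, while the real drops (395 at 11:00, 30 at 02:00) trip both.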

Solution: Adaptive Threshold with Lower Limit + Alert Below

Using the ML model, the system learns hour-by-hour trends for connection count based on 30 days of historical usage.

| Time | Predicted | Lower Bound | Upper Bound |
| --- | --- | --- | --- |
| 11:00 AM | 500 | 450 | 550 |
| 02:00 AM | | | |

Configuration

| Label | Value | Why? |
| --- | --- | --- |
| Severity | Critical | Low connections = potential business loss |
| Factor (Lower) | 1 | Standard buffer tolerance |
| Poll Points | 3 | Evaluate over 15 mins (5-minute polls) |
| Breached % | 66% | At least 2 out of 3 must breach |
| Alert Below | 400 (business hours), 40 (non-business) | Absolute fallback |

Calculation Example (Business Hours)

Predicted = 500
Lower Bound = 450
Lower Band = 500 − 450 = 50
Factor = 1
Lower Limit = 450 − (1 × 50) = 400

If the connection count drops below 400 in at least 2 of the 3 polled values, the system raises an alert.
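The lower-limit formula and the Alert Below fallback can be sketched together. How the product combines the two is not spelled out in this section, so the sketch assumes the effective floor is whichever value is higher; `lower_limit` and `effective_floor` are hypothetical names.

```python
def lower_limit(predicted, lower_bound, factor):
    """Lower Limit = Lower Bound - (Factor x Lower Band)."""
    lower_band = predicted - lower_bound  # width of the band below the prediction
    return lower_bound - factor * lower_band


def effective_floor(adaptive_limit, alert_below=None):
    """Combine the adaptive limit with the static Alert Below value.

    Assumption: Alert Below enforces a minimum level, so the higher of
    the two values is used as the alerting floor.
    """
    if alert_below is None:
        return adaptive_limit
    return max(adaptive_limit, alert_below)


# Business-hours values from this use case.
adaptive = lower_limit(predicted=500, lower_bound=450, factor=1)
print(adaptive)                        # 400
print(effective_floor(adaptive, 400))  # 400
```

In this example both values coincide at 400. If the model's band were wider (say a lower bound of 420, giving an adaptive limit of 340), the assumed fallback would still alert below 400.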

Sample Poll Data (Business Hours)

| Time | DB Connection Count |
| --- | --- |
| 11:00 AM | 395 |
| 11:05 AM | 398 |
| 11:10 AM | 397 |

All 3 values < 400 (more than the required 2 out of 3) → Anomaly triggered
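The "2 out of 3" requirement follows from Poll Points and Breached %. A minimal sketch of that conversion, assuming the required count is rounded up to the next whole poll (`min_breaches` is a hypothetical name):

```python
import math


def min_breaches(poll_points, breached_pct):
    """Smallest number of breaching polls that satisfies Breached %.

    Assumption: the requirement is rounded up to a whole number of polls.
    """
    return math.ceil(poll_points * breached_pct / 100)


polls = [395, 398, 397]  # connection counts at 11:00, 11:05, 11:10
breaches = sum(1 for value in polls if value < 400)

print(min_breaches(3, 66))                # 2
print(breaches >= min_breaches(3, 66))    # True
```

With Breached % at 100, the same three poll points would require all 3 values to breach, as in the CPU use case.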

Investigation Outcome

Backend logs showed the application server had memory issues.
New user sessions couldn’t be established, reducing connections.
The alert helped the team act before revenue was affected.

Why Adaptive Threshold Helped

Model learned natural daily peaks and drops
Avoided alerts during expected night dips
Factor and Band logic helped customize tolerance
The Alert Below setting acted as a static fallback to enforce minimum levels
