Use Cases
1. High CPU Utilization in a Corporate Office's Authentication Server
A corporate office hosts an Active Directory (AD) Authentication Server that handles employee logins. It’s common for the CPU to spike at 10:00 AM, as most users authenticate at the start of the workday.
Challenge
The administrator knows 80–85% CPU usage at 10 AM is normal.
However, they want to detect:
Sudden surges beyond typical high usage
Situations where high CPU sustains longer than expected
Different thresholds per server type, since each behaves differently
Using a static threshold (such as CPU ≥ 80%) results in too many false alerts.
Solution: Adaptive Threshold Configuration
The team enables adaptive thresholding, which uses a model trained on 30 days of CPU data to define expected usage dynamically.
| Time | Predicted | Upper Bound | Lower Bound |
| --- | --- | --- | --- |
| 10:00 AM | 80% | 85% | 70% |
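The documentation does not spell out how the model derives these values; as a rough illustration only, the sketch below builds a per-hour baseline from 30 days of samples using the mean as the prediction and the 5th/95th percentiles as bounds. The helper name `hourly_baseline` and the percentile choice are assumptions, not the product's actual algorithm.

```python
from statistics import mean, quantiles

def hourly_baseline(samples_by_hour):
    """Derive a per-hour baseline from ~30 days of CPU samples.

    samples_by_hour maps an hour of day (0-23) to the CPU-usage values
    observed for that hour across the training window. This is a
    simplified stand-in for the product's ML model: mean as the
    prediction, 5th/95th percentiles as the lower/upper bounds.
    """
    baseline = {}
    for hour, values in samples_by_hour.items():
        cuts = quantiles(values, n=20)      # cut points in 5% steps
        baseline[hour] = {
            "predicted": mean(values),
            "lower_bound": cuts[0],         # ~5th percentile
            "upper_bound": cuts[-1],        # ~95th percentile
        }
    return baseline
```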
Configuration Inputs
| Label | Value | Why? |
| --- | --- | --- |
| Severity | Critical | High CPU usage impacts login services |
| Factor (Upper) | 1 | Allow one band width above the Upper Bound |
| Poll Points | 3 | 15 minutes of CPU tracking |
| Breached % | 100% | All 3 values must cross the limit |
| Alert Above | (not configured) | Trust the model prediction only |
Threshold Calculation
Upper Band = Upper Bound − Predicted = 85 − 80 = 5%
Factor = 1
Upper Limit = 85 + (1 × 5) = 90%
An alert is raised only if the CPU crosses 90% in 3 consecutive readings.
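The same calculation can be written as a small helper. This is a minimal sketch of the formula above; the function name and signature are illustrative, not part of the product.

```python
def upper_limit(predicted, upper_bound, factor=1):
    """Upper Limit = Upper Bound + factor x (Upper Bound - Predicted)."""
    upper_band = upper_bound - predicted
    return upper_bound + factor * upper_band

# 10:00 AM example: predicted 80%, upper bound 85%, factor 1 -> 90%
print(upper_limit(80, 85, factor=1))  # 90
```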
Poll Data (5-min intervals)
| Time | CPU Usage |
| --- | --- |
| 10:00 AM | 92% |
| 10:05 AM | 91% |
| 10:10 AM | 93% |
All values > 90% → Anomaly triggered
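The Poll Points and Breached % check can be sketched as follows. `is_anomaly` is a hypothetical helper that mirrors the rule described above (with Breached % at 100%, all 3 of the last 3 readings must exceed the limit); it is not the product's internal implementation.

```python
def is_anomaly(readings, limit, breached_pct=100):
    """Return True when enough of the last poll points cross the limit.

    readings     - the last N poll values (N = Poll Points, here 3)
    limit        - the computed Upper Limit (90% in this example)
    breached_pct - minimum share of readings that must breach
    """
    breaches = sum(1 for value in readings if value > limit)
    return breaches / len(readings) * 100 >= breached_pct

# 10:00-10:10 AM samples: 92, 91, 93 all exceed 90 -> alert fires
print(is_anomaly([92, 91, 93], limit=90, breached_pct=100))  # True
```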
Investigation Outcome
The IT team finds that an unoptimized login script overwhelms the server during logins. They optimize the script, and the load returns to expected patterns.
Why Adaptive Threshold Helped
Model learned hourly behavior – 80% at 10 AM is normal, no alert
Avoided false positives seen in the static CPU ≥ 80% rule
Flexible per-server learning: No two servers have the same ideal CPU load
Factor + Band allowed fine-grained alert sensitivity
Poll Points + Breach% validated that the spike was real and sustained
2. Low Database Connection Count in a Business-Critical Application
An e-commerce company like Swiggy relies on a backend database that handles real-time customer transactions, order placements, and app interactions. The Database Connection Count metric indicates how many active sessions are connected to the database from the application.
During business hours, this count should be high (~500). During non-business hours, it naturally dips (~50). Any unexpected drop in these values signals a potential issue in customer access, system load, or backend connectivity.
Challenge
Static thresholds can't adapt to changing behavior over time. If a fixed limit like "connection count < 400" is used, it might cause false positives at night or on weekends.
However, unexpected drops during:
Morning peak times (e.g., < 500)
Night-time baselines (e.g., < 50)
...should be captured accurately without noise.
Solution: Adaptive Threshold with Lower Limit + Alert Below
Using the ML model, the system learns hour-by-hour trends for connection count based on 30 days of historical usage.
| Time | Predicted | Lower Bound | Upper Bound |
| --- | --- | --- | --- |
| 11:00 AM | 500 | 450 | 550 |
| 02:00 AM | 60 | 50 | 80 |
Configuration
| Label | Value | Why? |
| --- | --- | --- |
| Severity | Critical | Low connections = potential business loss |
| Factor (Lower) | 1 | Standard buffer tolerance |
| Poll Points | 3 | Evaluate over 15 mins |
| Breached % | 66% | At least 2 out of 3 must breach |
| Alert Below | 400 (business hours), 40 (non-business) | Absolute fallback |
Calculation Example (Business Hours)
Predicted = 500
Lower Bound = 450
Lower Band = 500 − 450 = 50
Factor = 1
Lower Limit = 450 − (1 × 50) = 400
If the connection count drops below 400, the system raises an alert.
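A minimal sketch of this lower-limit logic, assuming the adaptive limit and the static Alert Below value are ORed so that whichever is higher triggers first; the product may combine them differently, and `lower_alert_threshold` is an illustrative name only.

```python
def lower_alert_threshold(predicted, lower_bound, factor=1, alert_below=None):
    """Adaptive lower limit, optionally floored by a static Alert Below value.

    Lower Limit = Lower Bound - factor x (Predicted - Lower Bound).
    If Alert Below is configured, the stricter (higher) of the two is
    used, so the static fallback is always enforced (assumed behavior).
    """
    lower_band = predicted - lower_bound
    limit = lower_bound - factor * lower_band
    if alert_below is not None:
        limit = max(limit, alert_below)
    return limit

# Business hours: predicted 500, lower bound 450 -> 400; Alert Below is also 400
print(lower_alert_threshold(500, 450, factor=1, alert_below=400))  # 400
```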
Sample Poll Data (Business Hours)
| Time | DB Connection Count |
| --- | --- |
| 11:00 AM | 395 |
| 11:05 AM | 398 |
| 11:10 AM | 397 |
All 3 values < 400 (≥ 66% breached) → Anomaly triggered
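The Breached % evaluation works the same way in the "below" direction, here with the 66% setting from the configuration above. `is_low_anomaly` is again a hypothetical helper shown only to make the rule concrete.

```python
def is_low_anomaly(readings, limit, breached_pct=66):
    """Breached % check for a lower limit: enough readings must fall below it."""
    breaches = sum(1 for value in readings if value < limit)
    return breaches / len(readings) * 100 >= breached_pct

# 11:00-11:10 AM samples: 395, 398, 397 all fall below 400 (>= 66% breached)
print(is_low_anomaly([395, 398, 397], limit=400, breached_pct=66))  # True
```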
Investigation Outcome
Backend logs showed the application server had memory issues.
New user sessions couldn’t be established, reducing connections.
The alert helped the team act before revenue was impacted.
Why Adaptive Threshold Helped
Model learned natural daily peaks and drops
Avoided alerts during expected night dips
Factor and Band logic helped customize tolerance
The Alert Below value acted as a static fallback to enforce minimum levels