Use Cases

1. High CPU Utilization in a Corporate Office's Authentication Server

A corporate office hosts an Active Directory (AD) Authentication Server that handles employee logins. It’s common for the CPU to spike at 10:00 AM, as most users authenticate at the start of the workday.

Challenge

The administrator knows that 80–85% CPU usage at 10 AM is normal.

However, they want to:

  • Detect sudden surges beyond the typical high usage

  • Detect situations where high CPU persists longer than expected

  • Apply different thresholds per server type, since each server behaves differently

A static threshold (such as CPU ≥ 80%) results in too many false alerts, because the normal 10 AM peak already crosses it.

Solution: Adaptive Threshold Configuration

The team enables adaptive thresholding, which uses a model trained on 30 days of CPU data to define expected usage dynamically.

| Time | Predicted | Upper Bound | Lower Bound |
|---|---|---|---|
| 10:00 AM | 80% | 85% | 70% |

Configuration Inputs

| Label | Value | Why? |
|---|---|---|
| Severity | Critical | High CPU usage impacts login services |
| Factor (Upper) | 1 | Allow one band width (Upper Bound − Predicted) above the Upper Bound |
| Poll Points | 3 | 15 minutes of CPU tracking (3 polls at 5-minute intervals) |
| Breached % | 100% | All 3 readings must cross the limit |
| Alert Above | (not configured) | Rely on the model's prediction only |
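The configuration inputs above can be captured in a small data structure. This is a minimal Python sketch; the class name `AdaptiveThresholdConfig`, its field names, and the `min_breaches` helper are hypothetical and are not the product's API. It also assumes that Breached % is converted to a required number of breaching polls by rounding up.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdaptiveThresholdConfig:
    """Hypothetical container for adaptive-threshold inputs (not a product API)."""

    severity: str
    poll_points: int                      # consecutive polls evaluated per window
    breached_pct: float                   # % of those polls that must breach
    factor_upper: Optional[float] = None  # band multiples allowed above the Upper Bound
    factor_lower: Optional[float] = None  # band multiples allowed below the Lower Bound
    alert_above: Optional[float] = None   # absolute ceiling; None = rely on the model only
    alert_below: Optional[float] = None   # absolute floor; None = rely on the model only

    def min_breaches(self) -> int:
        """Smallest number of breaching polls that satisfies breached_pct (assumed rounding up)."""
        return math.ceil(self.breached_pct * self.poll_points / 100)


# Case 1: AD authentication server, CPU utilization
cpu_login_alert = AdaptiveThresholdConfig(
    severity="Critical",
    poll_points=3,
    breached_pct=100,
    factor_upper=1,
    # alert_above stays None because "Alert Above" is not configured
)
print(cpu_login_alert.min_breaches())  # 3
```

With Poll Points = 3 and Breached % = 100%, all three polls must breach; the same helper gives 2 for a 66% setting.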

Threshold Calculation

  • Upper Band = Upper Bound − Predicted = 85 − 80 = 5%

  • Factor = 1

  • Upper Limit = 85 + (1 × 5) = 90%

An alert is raised only if the CPU crosses 90% in 3 consecutive readings.
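The calculation above reduces to a single formula. A minimal Python sketch, assuming the limit is always derived from the Predicted value and Upper Bound of the same time slot; `upper_limit` is a hypothetical helper name.

```python
def upper_limit(predicted: float, upper_bound: float, factor: float) -> float:
    """Upper Limit = Upper Bound + Factor x (Upper Bound - Predicted)."""
    upper_band = upper_bound - predicted  # width of the model's upper band
    return upper_bound + factor * upper_band


# Values from the 10:00 AM prediction for the AD authentication server
limit = upper_limit(predicted=80, upper_bound=85, factor=1)
print(limit)  # 90
```

Raising the factor widens the tolerance: with Factor = 2 the same prediction gives an Upper Limit of 95%.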

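Poll Data (5-min intervals)

| Time | CPU Usage |
|---|---|
| 10:00 AM | 92% |
| 10:05 AM | 91% |
| 10:10 AM | 93% |

All three readings exceed the 90% Upper Limit, so the Critical alert is raised.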
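The poll-window check can be sketched as follows. This assumes that "crosses" means strictly greater than the limit and that Breached % is compared against the share of breaching readings in the window; `should_alert_high` is a hypothetical name, not part of any product API.

```python
from typing import Sequence


def should_alert_high(readings: Sequence[float], upper_limit: float,
                      breached_pct: float) -> bool:
    """Return True if enough readings in the poll window exceed the Upper Limit.

    readings: the last N poll values, where N = Poll Points
    breached_pct: percentage of readings that must be above the limit
    """
    if not readings:
        return False  # nothing polled yet, nothing to evaluate
    breaches = sum(1 for value in readings if value > upper_limit)
    return 100 * breaches / len(readings) >= breached_pct


cpu_readings = [92, 91, 93]  # 10:00, 10:05, 10:10 AM
print(should_alert_high(cpu_readings, upper_limit=90, breached_pct=100))  # True
print(should_alert_high([92, 88, 93], upper_limit=90, breached_pct=100))  # False
```

The second call shows how Breached % = 100% filters out non-sustained spikes: a single reading back within the limit is enough to hold the alert.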

Investigation Outcome

The IT team finds that an unoptimized login script overwhelms the server during logins. They optimize the script, and the load returns to expected patterns.

Why Adaptive Threshold Helped

  • The model learned hourly behavior: 80% at 10 AM is normal, so no alert is raised

  • Avoided the false positives produced by the static CPU ≥ 80% rule

  • Per-server learning: each server gets its own baseline, since no two servers have the same ideal CPU load

  • Factor and Band allowed fine-grained control of alert sensitivity

  • Poll Points and Breached % confirmed that the spike was real and sustained

2. Low Database Connection Count in a Business-Critical Application

An e-commerce company such as Swiggy relies on a backend database that handles real-time customer transactions, order placements, and app interactions. The Database Connection Count metric indicates how many active sessions the application has open to the database.

During business hours, this count should be high (~500). During non-business hours, it naturally dips (~50). An unexpected drop below these levels signals a potential issue with customer access, system load, or backend connectivity.

Challenge

Static thresholds can't adapt to behavior that changes over time. A fixed limit such as "connection count < 400" would raise false positives at night or on weekends, when the count is naturally far lower.

However, unexpected drops should still be captured accurately, without noise, during both:

  • Morning peak times (e.g., below the usual ~500)

  • Night-time baselines (e.g., below the usual ~50)

Solution: Adaptive Threshold with Lower Limit + Alert Below

Using the ML model, the system learns hour-by-hour connection-count trends from 30 days of historical usage data.

| Time | Predicted | Lower Bound | Upper Bound |
|---|---|---|---|
| 11:00 AM | 500 | 450 | 550 |
| 02:00 AM | 60 | 50 | 80 |

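Configuration

| Label | Value | Why? |
|---|---|---|
| Severity | Critical | Low connections = potential business loss |
| Factor (Lower) | 1 | Standard buffer: one band width (Predicted − Lower Bound) below the Lower Bound |
| Poll Points | 3 | Evaluate over 15 minutes (3 polls at 5-minute intervals) |
| Breached % | 66% | At least 2 of 3 readings must breach |
| Alert Below | 400 (business hours), 40 (non-business hours) | Absolute fallback floor |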

Calculation Example (Business Hours)

  • Predicted = 500

  • Lower Bound = 450

  • Lower Band = 500 − 450 = 50

  • Factor = 1

  • Lower Limit = 450 − (1 × 50) = 400

If the connection count drops below 400 in at least 2 of 3 consecutive polls, the system raises an alert.
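The lower-limit calculation mirrors the upper one, subtracting the band instead of adding it. A minimal Python sketch; `lower_limit` is a hypothetical helper name, and the inputs are the predictions from the connection-count table.

```python
def lower_limit(predicted: float, lower_bound: float, factor: float) -> float:
    """Lower Limit = Lower Bound - Factor x (Predicted - Lower Bound)."""
    lower_band = predicted - lower_bound  # width of the model's lower band
    return lower_bound - factor * lower_band


print(lower_limit(predicted=500, lower_bound=450, factor=1))  # 400 (11:00 AM)
print(lower_limit(predicted=60, lower_bound=50, factor=1))    # 40 (02:00 AM)
```

Applying the same formula to the 02:00 AM row gives 40, the same value configured as the non-business Alert Below.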

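Sample Poll Data (Business Hours)

| Time | DB Connection Count |
|---|---|
| 11:00 AM | 395 |
| 11:05 AM | 398 |
| 11:10 AM | 397 |

All three readings are below the 400 limit (3 of 3, which meets the 66% requirement), so the alert is raised.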
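The window evaluation with a static floor can be sketched as below. How the product combines Alert Below with the model's Lower Limit is not specified on this page; this sketch assumes a reading breaches if it falls below either value (equivalently, below the higher of the two). `should_alert_low` is a hypothetical name.

```python
from typing import Optional, Sequence


def should_alert_low(readings: Sequence[float], model_limit: float,
                     breached_pct: float, alert_below: Optional[float] = None) -> bool:
    """Return True if enough readings fall below the effective lower threshold.

    Assumption: a reading breaches if it is below the model's Lower Limit
    or below the static Alert Below floor, i.e. below the higher of the two.
    """
    if not readings:
        return False  # nothing polled yet, nothing to evaluate
    threshold = model_limit if alert_below is None else max(model_limit, alert_below)
    breaches = sum(1 for value in readings if value < threshold)
    return 100 * breaches / len(readings) >= breached_pct


business_hours = [395, 398, 397]  # 11:00, 11:05, 11:10 AM
print(should_alert_low(business_hours, model_limit=400, breached_pct=66, alert_below=400))  # True
print(should_alert_low([395, 410, 420], model_limit=400, breached_pct=66, alert_below=400))  # False
```

With Breached % = 66%, two breaching polls out of three (about 67%) are enough, while a single breach (about 33%) is not.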

Investigation Outcome

  • Backend logs showed the application server had memory issues.

  • New user sessions couldn’t be established, reducing connections.

  • The alert let the team act before revenue was affected.

Why Adaptive Threshold Helped

  • The model learned the natural daily peaks and dips

  • Avoided alerts during expected night-time dips

  • Factor and Band logic allowed tolerance to be customized

  • Alert Below acted as a static fallback to enforce minimum levels
