Skip to main content

Error Handling, Retries & Flaky Test Control

Flaky tests reduce trust in automation. This section explains how to handle errors, apply retries responsibly, and design stable Cucumber frameworks—especially for CI/CD.


What is a Flaky Test?

A flaky test:

  • Passes sometimes, fails sometimes
  • Fails without code changes
  • Often caused by timing, shared state, or environment instability

Rule:

Retries hide problems; fixes solve them.


Common Causes of Flakiness

  • Timing/synchronization issues
  • Shared or static state
  • Test data collisions
  • Environment instability
  • Over-parallelization
  • External dependency failures

Error Handling Strategy

Fail Fast

  • Validate preconditions early
  • Stop execution when a critical step fails
  • Avoid cascading failures

Clear Assertions

  • Assert only what matters
  • Use meaningful messages
  • Avoid over-asserting

Retry Strategies (Use Sparingly)

Retries should be:

  • Framework-level, not step-level
  • Limited (1–2 retries max)
  • Transparent in reports

Use retries when:

  • External systems are temporarily unstable
  • Network hiccups occur
  • Known transient issues exist

Avoid retries for:

  • Assertion failures
  • Business logic defects
  • Deterministic failures

Where Retries Belong

  • Test framework layer (e.g., TestNG retry analyzer)
  • Runner configuration (not inside steps)
  • Never inside feature files

Designing for Stability

  • Use scenario-level isolation
  • Control test data lifecycle
  • Prefer explicit waits over sleeps
  • Mock or stub unstable dependencies (when possible)
  • Separate smoke vs regression scopes

Flaky Test Triage Process

Detect

Classify (timing / data / env)

Fix root cause

Verify stability

Remove retry if possible

CI/CD Considerations

  • Do not mask failures with aggressive retries
  • Track flaky tests separately
  • Fail fast on critical paths (smoke)
  • Quarantine unstable tests temporarily

Common Anti-Patterns ❌

  • Blind retries for all failures
  • Retrying inside steps or hooks
  • Ignoring flaky test metrics
  • Increasing waits everywhere

Interview-Ready Questions

Q: What is a flaky test?
A: A test that fails intermittently without code changes.

Q: When should retries be used?
A: Only for transient, non-deterministic failures.


Key Takeaways

  • Flakiness must be fixed, not hidden
  • Retries are a last resort
  • Design for isolation and determinism
  • CI visibility is essential
  • Stable tests build trust