
Investigate flaky tests

This tutorial walks you through the standard flaky test investigation process using TestOps capabilities.

testops flaky tag

Flaky tests sometimes pass and sometimes fail, causing false alarms and wasting investigation time. TestOps uses a Probabilistic Flakiness Score (PFS), calculated by analyzing pass-to-fail transition patterns across your test execution history, to help you quickly detect and troubleshoot flaky test cases across your test suites.

Before you investigate, read PFS Calculation and Smart Tag configuration to understand default configurations for flakiness detection and how to customize them.
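To make the transition-pattern idea concrete, here is a minimal sketch of a flakiness score. TestOps' actual PFS model is internal to the product; the function name and the simple flip-ratio below are assumptions used purely for illustration:

```python
def flakiness_score(history):
    """Illustrative transition-based flakiness score (NOT the real PFS).

    history: list of results ordered by run, e.g. ["pass", "fail", "pass"].
    Returns the fraction of consecutive run pairs whose outcome flipped,
    a value in [0.0, 1.0]. A stable test scores 0.0; a test that
    alternates on every run scores 1.0.
    """
    if len(history) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return flips / (len(history) - 1)

print(flakiness_score(["pass", "fail", "pass", "fail"]))  # 1.0
print(flakiness_score(["pass"] * 10))                     # 0.0
```

A history with occasional, clustered failures scores low, while a history that keeps flipping between pass and fail scores high, which matches the intuition behind prioritizing tests with a high PFS.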

Steps to investigate flaky tests​

Step 1: Detect flaky tests​

In TestOps, you can view flaky tests through multiple routes:

  • Test Run Details page, while assessing test case failures (see Investigate Test Failures to learn more). Flaky tests are flagged with the flaky tag, which can be filtered:
testops test run list with flaky tests
  • Test Case Health Analysis Report, which displays the PFS for each flaky test:
testops test case health analytics report
tip
  • Apply the "Flaky" filter to quickly see all flaky tests.
  • Tests with a high PFS should be prioritized.

Either route displays all test cases and flags the ones that are potentially flaky.

Step 2: Prioritize flaky tests to resolve

Not all flaky tests are equally urgent:

  • Focus on tests for critical paths first (login, checkout, payment flows…).
  • Prioritize fixing critical tests with a high PFS.
  • Prioritize frequently run tests.
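The criteria above can be combined into a simple ranking. The sketch below is an assumption, not a TestOps feature: the field names (`pfs`, `is_critical_path`, `runs_per_week`) and the weighting are hypothetical, chosen only to show one way of ordering the investigation queue:

```python
def prioritize(tests):
    """Rank flaky tests by urgency using the three criteria above.

    Each test is a dict with hypothetical fields:
      pfs (float, 0-1), is_critical_path (bool), runs_per_week (int).
    Higher urgency score means investigate first.
    """
    def urgency(t):
        return (t["pfs"]
                * (2.0 if t["is_critical_path"] else 1.0)   # weight critical paths
                * (1 + t["runs_per_week"] / 100))           # weight frequent runs
    return sorted(tests, key=urgency, reverse=True)

tests = [
    {"name": "TC_Login",  "pfs": 0.6, "is_critical_path": True,  "runs_per_week": 50},
    {"name": "TC_Report", "pfs": 0.9, "is_critical_path": False, "runs_per_week": 5},
]
print([t["name"] for t in prioritize(tests)])  # ['TC_Login', 'TC_Report']
```

Note that the critical-path login test outranks the report test even though the report test has a higher PFS, which reflects the "critical paths first" guidance.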

Once you've decided on a test to investigate, click an execution to view its run details.

Step 3: Analyze result patterns​

In the Test Run Details page > Tests tab, you should see a trend visualization that lists past results of the same test:

testops historical results trend line

You may spot patterns that suggest flakiness. For example, a history that reads Passed → Failed → Passed → Failed with no discernible logic strongly suggests flakiness.

Recent codebase deployments should also be considered. For example:

  • Deployment A at Run #10, then the test passes 5 times in a row => earlier failures may have been caused by code changes, not a flaky test.
  • Deployment B at Run #20, but the test still passes 3 times and fails 2 times out of every 5 runs regardless of the deployment => the pattern suggests flakiness, not a code regression.
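The deployment reasoning above can be sketched as a heuristic helper. This is not a TestOps API; the function, its inputs (a result history plus the index of the first run after a deployment), and its thresholds are assumptions made for illustration:

```python
def classify(history, deploy_index):
    """Heuristic: distinguish a likely regression from likely flakiness.

    history: list of "pass"/"fail" results ordered by run.
    deploy_index: index of the first run after a deployment.
    If failures start exactly at the deployment and persist, suspect a
    code regression; if failures are intermittent on both sides of the
    deployment, suspect flakiness. Anything else is inconclusive.
    """
    before = history[:deploy_index]
    after = history[deploy_index:]
    fails_before = before.count("fail")
    fails_after = after.count("fail")
    if fails_before == 0 and after and fails_after == len(after):
        return "likely regression"
    if fails_before > 0 and 0 < fails_after < len(after):
        return "likely flaky"
    return "inconclusive"
```

For instance, a test that is green before a deployment and red on every run after it classifies as a likely regression, while one that keeps alternating regardless of the deployment classifies as likely flaky.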

Step 4: Determine root cause​

Hover over a failed result of a flaky test case to view the execution log:

testops test result details
tip
  • Scan for timing-related warnings (Element not found, Timeout waiting for element, ...) - these are the most common causes. See the table of common causes below for more details.
  • Use the AI analysis feature on failed results to get a pointer for further investigation. See Investigate Failure - Triage with AI to learn more.

Common causes of flakiness​

Below are typical causes of flaky tests and possible solutions:

Timing and Synchronization Issues
  • Causes: hard-coded waits (e.g. WebUI.delay(5)) that are too short or too long; missing waits for dynamic content, AJAX calls, or animations; event listeners not attached before events fire; animation timing causing element positions to change mid-interaction.
  • Possible solutions: use explicit waits with conditions (wait for the element to be clickable, visible, etc.); wait for specific application states; use debouncing; disable animations in test mode.

Dynamic Selectors
  • Causes: unstable element IDs (e.g. auto-generated or timestamp-based); index-based selectors (e.g. the CSS selector div:nth-child(3) breaks when the order of child elements changes); fragile XPath that depends on a deep DOM hierarchy.
  • Possible solutions: use stable selectors (data-testid, ARIA labels, semantic attributes).

Test Data Dependencies
  • Causes: implicit test data dependencies (data left over from a previous test); state not cleaned up between runs (cache, cookies, local storage); database state leakage (a previous test creates records that affect the next test).
  • Possible solutions: implement test data isolation, setup/teardown hooks, and database transactions.

Environment Variability
  • Causes: resource contention (CPU, memory, network bandwidth); third-party service flakiness (external APIs, payment gateways); browser version differences across test agents; network latency causing timeouts.
  • Possible solutions: mock external dependencies; increase timeout thresholds.

Parallel / Concurrency Conflicts
  • Causes: tests running in parallel modifying the same database records, files, or shared resources; tests competing for the same ports, queues, or temporary directories; order-dependent tests where one test relies on another test's side effects.
  • Possible solutions: ensure tests are independent; isolate resources per test (unique data, temp dirs, ports); use proper locking or transactional rollbacks.
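Since timing and synchronization issues are the most common cause, here is a framework-agnostic sketch of an explicit wait that replaces hard-coded delays like WebUI.delay(5). The helper and its parameters are assumptions (Katalon itself provides keywords such as waiting for an element to be clickable); the condition callable stands in for any check your framework exposes:

```python
import time

def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or the timeout
    elapses, instead of sleeping for a fixed duration.

    condition: zero-argument callable, e.g. a (hypothetical) lambda that
    checks whether an element is visible or clickable.
    Returns the first truthy result; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)  # back off briefly between checks
    raise TimeoutError(f"condition not met within {timeout}s")
```

A call site would look like `wait_until(lambda: element.is_displayed(), timeout=15)` (where `element` is hypothetical): the test proceeds as soon as the condition holds, so it is neither slower than needed nor brittle when the application is momentarily slow.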

Once you've determined the cause, update your test cases and re-execute the test several times to verify the fix.
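Re-executing the test several times can itself be automated. The sketch below is an assumption, not a TestOps feature: `run_test` stands in for whatever invokes your test and reports pass/fail, and the helper records every result so an intermittent failure is still visible even when most reruns pass:

```python
def verify_stable(run_test, attempts=10):
    """Re-run a (hypothetical) test callable `attempts` times.

    run_test: zero-argument callable returning True on pass, False on fail.
    Returns (all_passed, results): results keeps the per-run outcomes so
    a single intermittent failure among the attempts is not hidden.
    """
    results = [bool(run_test()) for _ in range(attempts)]
    return all(results), results
```

If `all_passed` is False after your fix, the flakiness is not resolved yet, and the per-run results list gives you a fresh pattern to analyze in Step 3.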

It's helpful to add attributes in Customizable fields for future reference. See Customizable fields to learn how to configure these fields for your project.
