Flaky Test Detection

Introduction

Flaky Score is an AI-driven feature that evaluates the stability of test cases by analyzing their execution history. It indicates the likelihood of a test case behaving unpredictably, helping testers assess the risk associated with its execution.

Flaky test cases can disrupt testing pipelines and undermine confidence in test results. Traditionally, identifying such tests required manual comparison of execution results across multiple runs. With a flaky score, this process is now automated, enabling teams to detect and manage flaky tests more efficiently and reliably.

QA Managers can configure how the flaky score is calculated to match their specific testing processes, ensuring its relevance to their testing methodologies.

For example, the following table shows the execution results of the test cases executed multiple times.

Note

Flaky Score is calculated only for executed test cases and is determined based on the latest X executions. The Flaky Score ranges between 0 (Not Flaky) and 1 (Flaky).

| Test Case Name | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6 | Test 7 | Test 8 | Flaky or Non-flaky? |
|----------------|--------|--------|--------|--------|--------|--------|--------|--------|---------------------|
| Test Case A    | Pass   | Pass   | Pass   | Pass   | Pass   | Pass   | Pass   | Pass   | Non-flaky           |
| Test Case B    | Fail   | Fail   | Fail   | Fail   | Fail   | Fail   | Fail   | Fail   | Non-flaky           |
| Test Case C    | Pass   | Pass   | Fail   | Fail   | Pass   | Fail   | Pass   | Fail   | Flaky               |
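QMetry does not publish the exact scoring formula, but the intuition can be sketched in code. As a rough illustration (an assumption, not QMetry's actual algorithm), flakiness can be estimated as the rate at which a test's status flips between consecutive runs:

```python
# Illustrative only: QMetry's real scoring formula is not published.
# This sketch scores flakiness as the fraction of consecutive runs
# whose status flipped, normalized to 0 (stable) .. 1 (flaky).

def flip_rate(results):
    """Return a 0..1 flakiness estimate for a list of 'Pass'/'Fail' results."""
    if len(results) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(results, results[1:]) if a != b)
    return flips / (len(results) - 1)

print(flip_rate(["Pass"] * 8))  # Test Case A -> 0.0 (always passes)
print(flip_rate(["Fail"] * 8))  # Test Case B -> 0.0 (always fails, but stable)
print(round(flip_rate(["Pass", "Pass", "Fail", "Fail",
                       "Pass", "Fail", "Pass", "Fail"]), 2))  # Test Case C -> 0.71
```

Note that Test Case B scores 0 even though it always fails: a consistently failing test is broken, not flaky.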

Note

  • Only the final execution status assigned to the test case is considered.

  • The Flaky Score is calculated only for the test executions of the same project.

Flaky Score Instance Level Settings

To activate the Flaky Score feature, the super administrator must enable it for the instance.

To enable the flaky score, go to Customizations, and select General Settings and Audit. Next, select AI Configurations. Navigate to the Flaky Score and Success Rate Configuration section. By default, this section is disabled.

Enable the Allow Project Admins to Enable and Configure the Flaky Score for their project setting to allow Flaky Score calculations at the project level.

QMetry’s AI Configurations screen under the Customization module showing various AI-enabled options, including Auto Generate Test Cases and Steps using QI, Generate SQL reports, and QI Search: Ask me Anything. The section highlighted at the bottom, Flaky Score and Success Rate Configuration, shows toggles that allow project admins to enable and configure Flaky Score and Success Rate settings for their respective projects.

Flaky Score Project Level Settings

To enable the flaky score, users must enable these settings for individual projects in the instance.

Note

  • Allow Project Admins to Enable and Configure the Flaky Score for their project must be enabled in the Customization module.

  • Project admins and users with the Flaky Score - Modify or Flaky Score - Generate permissions can configure these settings.

To enable project-level settings for flaky score, perform the following steps:

  1. Go to Intelligence.

  2. Select Flaky Score Settings.

  3. Open the Configurations tab.

  4. Turn on Enable Flaky Score for the Project.

    QMetry Flaky Score Settings page under QMetry Intelligence showing the configuration option Enable Flaky Score for the Project toggled on. This setting allows administrators to activate flaky score calculation for selected test executions within a specific project.

Flaky Score Configuration

The settings on this page enable the admin to define the scope of test executions considered for Flaky Score calculation. AI will calculate the Flaky Score based on the filtered test execution records, using the specified criteria. The existing Flaky Score will be reset and recalculated whenever the configuration is modified.

Provide the following details for the Flaky Score configuration:

  • Test Case Executions: Specify how many of the latest test executions of a test case to consider when calculating the flaky score. 10 test executions are selected by default, and the value must be between 1 and 1000.

  • Test Case Versions: Select whether executions of the “Latest Version” or “All Versions” of a test case should be considered while calculating the flaky score. “Latest Version” is selected by default.

    • Latest Version: Only the test executions of the latest test case versions will be considered when calculating the flaky score.

    • All Versions: The latest test executions of all test case versions will be considered while calculating the flaky score.

  • Execution Timeframe (days): Specify the number of past days whose test executions should be considered. By default, 90 days are selected, indicating that test executions carried out during the previous 90 days will be considered when calculating the flaky score. If left blank, executions from all days are considered.

  • Release: Select Release(s) to consider test executions for those releases. “All” releases are selected by default.

  • Cycle: Select Cycle(s) to consider test executions for those cycles. “All” cycles are selected by default.

  • Platform: Select Platform(s) to consider test executions for those platforms. “All” platforms are selected by default.

  • Build: Select build(s) to consider the test executions pertaining to that particular build(s). “All” builds are selected by default.

  • Test Case Execution Status: Any execution status not mapped under As Passed or As Failed is treated as Skipped, and Skipped executions are not considered in the calculation.

    • As Passed: The test executions with these statuses will be considered as "Passed" while calculating the Flaky Score. The default value is “Passed”.

    • As Failed: The test executions with these statuses will be considered as "Failed" while calculating the Flaky Score. The default values are “Failed” and “Blocked”.
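The status-mapping rule above can be sketched as follows. The set names and function are illustrative stand-ins, not QMetry's internal implementation; the default mappings come from the settings described above:

```python
# Sketch of the As Passed / As Failed / Skipped mapping rule.
# (Names are illustrative; QMetry's internals are not published.)

AS_PASSED = {"Passed"}             # default "As Passed" statuses
AS_FAILED = {"Failed", "Blocked"}  # default "As Failed" statuses

def normalize(status):
    """Map a raw execution status to 'Pass'/'Fail', or None for Skipped."""
    if status in AS_PASSED:
        return "Pass"
    if status in AS_FAILED:
        return "Fail"
    return None  # treated as Skipped and excluded from the calculation

raw = ["Passed", "Blocked", "Not Run", "Failed", "In Progress"]
considered = [s for s in (normalize(r) for r in raw) if s is not None]
print(considered)  # ['Pass', 'Fail', 'Fail'] -- the other two are Skipped
```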

Click Update to save the Flaky Score Settings.

To reset the settings, click the Reset button.

To generate the flaky score, click the Generate button.

A success message pops up.

Note

The Flaky Score can be calculated once every 24 hours for each project. Once you generate the Flaky Score, the Generate button remains disabled for the next 24 hours.

When you hover over the Generate button, it displays the details of the user who generated the Flaky Score most recently, along with the timestamp.

QMetry Flaky Score Settings configuration page displaying parameters for calculating flaky scores in a project. Options include defining Test Case Executions, Version scope, Execution Time Frame (in Days), Release, Cycle, Platform, and Build. The As Passed and As Failed statuses determine which test executions are included.

Once you generate the Flaky Score on the Flaky Score Settings screen, you can view the Flaky Score on the test case list view, test case details screen, test case link screens, and test execution screens. You can show or hide the Flaky Score column on the test case list view, test case details screen, test case link screen, and test execution screens.

Display of Flaky Score

The lower the value (closer to 0), the less flaky the test case: its behavior is deterministic.

The higher the value (closer to 1), the flakier the test case: its behavior is non-deterministic.

The following table interprets the intensity of the flaky score according to its range and color code.

| Flaky Score Range | Intensity of Flakiness |
|-------------------|------------------------|
| 0.81 – 1          | High                   |
| 0.41 – 0.80       | Medium                 |
| 0 – 0.40          | Low                    |
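The banding above maps directly to a classification rule. A minimal sketch (the function name is illustrative):

```python
# Maps a flaky score to the intensity bands from the table above.

def intensity(score):
    """Classify a 0..1 flaky score as High, Medium, or Low intensity."""
    if not 0 <= score <= 1:
        raise ValueError("Flaky Score must be between 0 and 1")
    if score > 0.80:
        return "High"    # 0.81 - 1
    if score > 0.40:
        return "Medium"  # 0.41 - 0.80
    return "Low"         # 0 - 0.40

print(intensity(0.96))  # High
print(intensity(0.46))  # Medium
print(intensity(0.02))  # Low
```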

Filter Test Cases on Flaky Score

The test cases can be filtered on Flaky Score. You can find the Flaky Score under the Advanced Filters.

View Flaky Score on Test Case List View

According to the Flaky Score settings configured on the Configurations tab, the Flaky Score is calculated and displayed on the test case list view. It allows QA Managers and Testers to view the risk probability and test execution history.

Go to the Test Cases module and make the Flaky Score column visible for the list view.

  • Flaky Score: It is calculated based on the frequency of pass or fail results.

    QMetry Test Cases module displaying the column selection menu with the Flaky Score option checked. This allows users to view each test case’s flaky score directly in the test case list. The example shows a test folder named FitTracker with multiple test cases listed under it.

You can view the Flaky Score column, along with its corresponding statistics. You can sort on the column to view test cases with higher flaky scores.

Click the Flaky Score to view the traceability report for the test case, which shows the test executions, issues, and execution results the test case is associated with. The report helps you further analyze the Flaky Score.

QMetry Test Executions and Issues window displaying a side-by-side view of test runs and associated defects. The example shows test case FIT-TS-34 with one execution marked as Failed under the MacroQ2 cycle and another marked as Passed under R1C2. The related issue FIT-IS-726 is listed with details such as Status: Done, Type: Bug, Priority: High, and Resolution: Done.

If the Flaky Score Settings are changed, the existing Flaky Score will get reset.

The info icon beside the Flaky Score column displays the details of when the score was last generated and by whom.

QMetry Test Cases list showing the Flaky Score (F.S.) column with an information tooltip open. The tooltip displays the last generation details. It also guides users to navigate to the Flaky Score Settings page to update or manage configurations.

View Flaky Score on Test Case Detail Page

The generated Flaky Score is displayed beside the Test Case Key at the top of the screen.

QMetry Test Case Details page displaying the Flaky Score (F.S.) indicator with a value of 0.02 (50%) next to the test case key FIT-TC-147, titled 'Verify Login Details'. The page displays test steps, including “Enter username,” “Enter password,” and dynamic data parameters such as @Employee ID, @User Type, and @Designation.

View Flaky Score on Test Execution Screen

According to the Flaky Score settings configured on the Configurations tab, the Flaky Score is calculated and displayed accordingly. The flaky score indicates to the tester the probability of risk in executing the test case and provides a means for comparing pre- and post-execution results.

Go to the Test Execution screen.

Test Execution Detail View

QMetry Test Execution page for the regression suite FIT-TS-2 displaying multiple test runs. The Flaky Score (F.S.) column highlights varying flakiness levels with color-coded indicators — blue (0.00, 0.02), yellow (0.46), and red (0.96) — representing different degrees of test stability. Other columns include Executed At, Build Name, Execution Type, Success Rate (S.R.), Successor, and Predecessor.

If you go further into the Detail View, you can view the Flaky Score count at the top of the page.

Test Execution Default View

You can show or hide the Flaky Score column on the execution screen.

Click the Flaky Score to drill down to view the test executions and defects associated with the test case.

How to Reduce Flakiness?

Flaky tests can occur for various reasons. Testers should work with their development teams to identify the root cause. Here are 10 common causes of flaky tests:

1. Async wait:

Some tests are written in a way that requires waiting for something else to complete. Many flaky tests use sleep statements for this purpose. However, sleep statements are imprecise: if the awaited operation takes longer than the fixed sleep due to variations in processing time, the test fails.
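A common fix is to poll for the condition with a timeout instead of sleeping for a fixed duration. The helper below is a generic sketch; in Selenium you would typically reach for `WebDriverWait` with `expected_conditions` instead:

```python
import time

# Polling wait with a timeout: returns as soon as the condition holds,
# instead of always paying (or under-paying) a fixed sleep.

def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Flaky:  time.sleep(5); assert job_done()
# Stable: assert wait_until(job_done, timeout=5)
print(wait_until(lambda: True))                # True immediately
print(wait_until(lambda: False, timeout=0.3))  # False once the timeout expires
```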

2. Concurrency issues:

Flaky tests can result from concurrency issues such as data races, atomicity violations, or deadlocks. These tests often make incorrect assumptions about the ordering of operations performed by different threads. To address this, synchronization blocks can be added or the test can be modified to accommodate a wider range of behaviors.
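A minimal illustration of the synchronization fix: guarding a shared counter with a lock makes the increments atomic, so the assertion holds on every run regardless of thread scheduling.

```python
import threading

# Without the lock, the read-modify-write in `counter += 1` can race
# between threads and the final count becomes nondeterministic.

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:  # synchronization block removes the data race
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # always 40000 with the lock in place
```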

3. Test order dependency:

Certain tests may pass or fail depending on the order in which preceding tests were executed. A good test should be independent and able to run in any order with other tests. It should be properly isolated and set up its own expected state.
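The isolation principle can be sketched with `unittest`: each test builds its own state in `setUp`, so the suite passes in any execution order (the `CartTests` example is illustrative):

```python
import unittest

# Each test gets fresh, private state from setUp, so no test depends
# on what a previously executed test left behind.

class CartTests(unittest.TestCase):
    def setUp(self):
        self.cart = []  # fresh state per test, never shared

    def test_add_item(self):
        self.cart.append("apple")
        self.assertEqual(len(self.cart), 1)

    def test_cart_starts_empty(self):
        self.assertEqual(self.cart, [])  # holds regardless of test order

suite = unittest.TestLoader().loadTestsFromTestCase(CartTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```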

4. Timing issues:

Flaky tests can arise from timing inconsistencies when the test code relies on specific event timings. For example, if a test checks for a particular webpage element after a specific delay, network issues or differences in CPU performance between test runs can lead to intermittent failures.

5. Element locator issues:

Many automation tools use XPath to locate webpage elements, but XPath can be unstable as it is sensitive to changes in the page's DOM. Modifications to an element's properties or the addition of similar elements can render the initial XPath invalid, resulting in false positives or negatives. Self-healing AI techniques can address challenging testing scenarios involving dynamic elements, iFrames, and pop-ups: they use multiple locator strategies to find an element and switch to a backup strategy if the primary one fails.
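The fallback idea can be sketched with a simulated DOM; in real automation, the lookup inside the loop would be a driver call such as Selenium's `driver.find_element` (everything else here is an illustrative stand-in):

```python
# Multi-locator ("self-healing") lookup sketch: try each strategy in
# order and fall back when one fails. The dict stands in for a DOM.

def find_element(dom, locators):
    """Try (strategy, value) pairs in order; return the first match."""
    for strategy, value in locators:
        element = dom.get((strategy, value))
        if element is not None:
            return element
    raise LookupError(f"No locator matched: {locators}")

# Simulated page after a refactor broke the original XPath:
dom = {("css", "#login-btn"): "<button id='login-btn'>"}
locators = [
    ("xpath", "//div[3]/button[1]"),  # brittle primary locator, now stale
    ("css", "#login-btn"),            # stable backup locator
]
print(find_element(dom, locators))  # found via the CSS backup
```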

6. Test code issues:

Poorly written or ambiguous test code can contribute to flaky tests. If the test code lacks clarity regarding the expected application behavior, the test may fail or pass inconsistently. Additionally, complex test code or code relying on external dependencies may be more prone to failure.

7. Test data issues:

Tests that depend on inconsistent test data can become flaky. Corrupted test data or different test runs using the same data can lead to inconsistent results. Tests utilizing random data generators without considering the full range of possible results can also introduce flakiness. It is advisable to control the seed for random data generation and carefully consider all possible values.
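Controlling the seed, as advised above, makes "random" test data reproducible, so a failing run can be replayed with the exact same inputs (the helper below is illustrative):

```python
import random

# A fixed seed makes the generated data identical on every run and
# every machine, so failures caused by a particular input can be replayed.

def make_test_users(n, seed=42):
    rng = random.Random(seed)  # seeded generator, not the global one
    return [f"user{rng.randrange(10_000)}" for _ in range(n)]

# Same seed -> identical data across runs.
assert make_test_users(3) == make_test_users(3)
print(make_test_users(3))
```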

8. Test configuration issues:

Inconsistent test configurations between runs can cause flaky tests. Incorrect test parameters or improper test settings setup can result in test failures.

9. Environment issues:

Flaky tests can be attributed to the execution environment. Network connectivity problems, improper handling of I/O operations, hardware differences between test runs, or variations in test environments can introduce nondeterminism, leading to flaky tests.

10. Resource leaks:

Tests can become flaky if the application or tests do not adequately acquire and release resources, such as memory or database connections. To avoid such issues, it is recommended to use resource pools and ensure that resources are properly returned to the pool when no longer needed.
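A minimal resource-pool sketch under stated assumptions (the `Pool` class is illustrative; real pools such as SQLAlchemy's `QueuePool` add sizing, timeouts, and connection validation). The `finally` block guarantees the resource returns to the pool even if the test body raises:

```python
from contextlib import contextmanager
from queue import Queue

# Illustrative pool: acquire() hands out a resource and always puts it
# back, even when the code using it fails.

class Pool:
    def __init__(self, resources):
        self._free = Queue()
        for r in resources:
            self._free.put(r)

    @contextmanager
    def acquire(self):
        resource = self._free.get()
        try:
            yield resource
        finally:
            self._free.put(resource)  # returned even on test failure

pool = Pool(["conn-1", "conn-2"])
with pool.acquire() as conn:
    print(conn)            # conn-1 (first resource handed out)
print(pool._free.qsize())  # 2 -- everything is back in the pool
```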
