It’s Friday afternoon. You push the last commit of the week. The CI/CD pipeline kicks off. You grab a coffee, waiting for the screen to turn green.
But the notification comes back red: "Test Failed".
You don't panic. You don't even check the logs to see the error. You just sigh and hit that infamous button: "Re-run job."
Ten minutes later, it passes. The screen is green. "Just a transient network issue," you tell yourself, and close your laptop.
At that precise moment, you didn't just ignore a potential bug in your code; you silently killed something far more valuable: your team's belief in quality.
In the automation world, we call this a "Flaky Test." But this isn't just a technical term for a test that sometimes passes and sometimes fails; it is a psychological phenomenon that rots engineering culture from the inside out.
In software engineering, determinism is essential. Given the same inputs, we expect the same outputs. Flaky tests violate this fundamental rule. If a test fails even 1% of the time for no identifiable reason, it’s no longer a verification tool; it’s a coin toss.
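To make that "coin toss" concrete, here is a tiny, hypothetical illustration (Python, not taken from any real suite): the assertion races a background thread, so identical code and identical inputs can still produce different verdicts from run to run.

```python
# A deliberately flaky test (hypothetical): the assertion races a background
# thread, so the same code and inputs do not guarantee the same outcome.
import threading
import time

def test_background_job_completes():
    results = []

    def background_job():
        results.append("done")          # simulated asynchronous work

    threading.Thread(target=background_job).start()
    time.sleep(0.001)                   # "usually long enough" -- but not guaranteed
    assert results == ["done"]          # fails whenever the thread is scheduled late
```

On a developer laptop this passes almost every time; on a loaded CI runner the thread occasionally hasn't run yet, and the build goes red for no reason anyone can reproduce.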
When engineers stop trusting a tool (in this case, the test suite), that tool becomes useless, no matter how sophisticated it is. It is like a fire alarm that constantly goes off for no reason; eventually, you just ignore the sirens.
In the industry, we call this "Alert Fatigue." When red builds become routine noise rather than actionable signals, real disasters become inevitable.
Sociologist Diane Vaughan coined the term "Normalization of Deviance" while studying the Challenger space shuttle disaster. She found that when technical teams observe a problem (like eroding O-rings) long enough without immediate catastrophic consequences, they begin to accept it not as a "failure," but as a normal part of the system's operation.
This is exactly what happens in software teams dealing with flaky tests.
The moment the phrase "Oh, that login test fails sometimes, just retry it" becomes acceptable in a stand-up meeting, your quality standard has dropped to the level of that flaky test. Because someday, when that test fails due to a real, critical bug, no one will look twice. Everyone will just hit "retry," and that bug will ship straight to production as a live incident.
Flaky tests are the "broken windows" of a codebase. According to this famous criminological theory: if a window in a building is broken and left unrepaired, passersby conclude that no one cares, and soon more windows will be broken.
If you and your team are used to seeing a few "reds" or "yellows" in your daily test reports, you start tolerating imperfections in new code too. The subconscious thought "The build is never fully green anyway" kills perfectionism.
Once discipline loosens, technical debt grows like an avalanche.
So, what is the solution? Writing smarter wait commands in Selenium? Buying beefier CI servers?
No. The solution requires a shift in philosophy.
The approach of engineering giants like Google and Netflix toward this issue is radical but necessary: An untrustworthy test is worse than no test at all.
If a test is flaky and cannot be fixed immediately, quarantine it or delete it.
Sounds scary, doesn't it? Deleting a test reduces coverage. But having lower coverage is a more honest approach than the illusion created by a test that gives false information, wastes the team's time, and erodes trust.
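What does "quarantine" look like in practice? One lightweight convention (a sketch using pytest; the marker name and commands here are my own choices, not anything prescribed above) is to tag known-flaky tests and exclude them from the blocking pipeline, so they stay visible and tracked without gating merges:

```python
# test_login.py -- tag the flaky test instead of letting it gate merges.
# (Register the marker in pytest.ini so --strict-markers catches typos:
#   [pytest]
#   markers =
#       quarantine: known-flaky test, excluded from the blocking CI run)
import pytest

@pytest.mark.quarantine  # fix it or delete it; never just keep retrying
def test_login_redirects_to_dashboard():
    ...
```

The blocking CI job then runs `pytest -m "not quarantine"`, while a separate, non-blocking job can keep running `pytest -m quarantine` so the team still sees how sick those tests are. The important part is that a quarantined test gets an owner and a deadline; otherwise quarantine is just deletion with extra steps.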
Test automation isn't just built for catching bugs. Automation exists so developers can lean back after a change and say, "Yes, this change is safe, I'm confident."
If even one test in your suite makes you ask "Is it really safe, though?", that test is failing its primary duty. Fighting flaky tests isn't just a battle for code quality; it's a battle to preserve your team's sanity and mutual trust.
Next time the CI pipeline turns red, stop before hitting that "retry" button. Ask yourself: am I bypassing a temporary technical glitch, or am I chipping away another piece of our engineering culture?