DK1 teams eventually accumulate a pile of questionable runs: some are useful failures, some are broken captures, and some look bad only because the replay tooling is weak.
How are you triaging failed DK1 episodes and using failure replay to decide what to keep, relabel, or discard?
Please share how you replay bad runs, what metadata or signals you inspect first, and when a failure is still useful for training or evaluation.
If you reply, please include one specific replay clue that changed your decision about a bad episode.