In video understanding tasks, rules can serve as a framework for interpreting complex scenarios, ensuring that actions and events are recognized consistently against shared standards. By defining expectations for behavior (for example, what counts as a violation or foul in sports), rules guide models to capture and categorize actions accurately and reliably. However, many existing agent benchmarks neglect long-context, rule-based video understanding and focus instead on long-term video understanding or general visual question answering. To simulate real-life, rule-based, customized video understanding tasks, we develop a new video understanding benchmark focused on comprehending violations and fouls in basketball games.