Multi-Armed Bandit
Concept
A multi-armed bandit (MAB) is a decision-making model in which a decision maker repeatedly chooses among arms with unknown reward behavior, balancing exploration of uncertain choices against exploitation of choices that have paid off. In hardware verification, MAB has been used to schedule virtual test sequences, with functional-coverage-driven rewards and algorithms such as UCB1, to accelerate coverage closure.
WIKI
Overview
A multi-armed bandit (MAB) is a model for making decisions under uncertainty. Its name comes from the casino analogy of a row of one-armed slot machines: a decision maker is given several arms with unknown reward profiles and, at each time step, chooses one arm with the goal of maximizing expected cumulative reward. The central trade-off is between exploration (trying arms whose rewards are still uncertain) and exploitation (choosing arms that have already produced relatively high rewards). [C1]
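The exploration/exploitation trade-off can be made concrete with UCB1, the algorithm named in the summary above. The following is a minimal sketch, not taken from the cited sources: it assumes rewards in [0, 1] and simulates Bernoulli arms, and the names `ucb1_select`, `run_ucb1`, and `probs` are hypothetical, chosen only for illustration.

```python
import math
import random


def ucb1_select(counts, totals):
    """Pick the next arm under UCB1 (assumes rewards lie in [0, 1])."""
    # Pull every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    # Score = empirical mean (exploitation) + confidence bonus (exploration).
    scores = [
        totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(probs, steps, seed=0):
    """Simulate UCB1 on hypothetical Bernoulli arms with success rates `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)    # number of pulls per arm
    totals = [0.0] * len(probs)  # summed reward per arm
    for _ in range(steps):
        arm = ucb1_select(counts, totals)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals


if __name__ == "__main__":
    counts, totals = run_ucb1([0.1, 0.9], steps=2000)
    print("pulls per arm:", counts)
```

The bonus term shrinks as an arm is pulled more often, so a rarely tried arm keeps getting occasional pulls even when its empirical mean is low. In a verification setting, an arm might correspond to a test sequence and the reward to a coverage gain, but that mapping is an assumption of this sketch.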
Public examples describe the same exploration/exploitation dilemma in other domains. In password guessing, a guesser may have several dictionaries or information sources but does not know in advance which will yield the best results; this can be framed as a MAB problem. QoS-aware variants of Thompson sampling have also been proposed for runtime decision making in self-adaptive systems where an arm must satisfy QoS requirements with high confidence. [C10]
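For orientation, plain Thompson sampling with a Beta-Bernoulli model can be sketched as follows. This is the standard textbook form, not the QoS-aware variant mentioned above, whose details are not given here. The function names and the simulated success rates are hypothetical; in the password-guessing framing, an arm could stand for a dictionary and a reward of 1 for a successful guess, which is likewise an assumption of this sketch.

```python
import random


def thompson_select(successes, failures, rng):
    """Sample each arm's success rate from its Beta posterior; pick the best."""
    # Beta(1, 1) (uniform) prior, so the posterior is Beta(s + 1, f + 1).
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_thompson(probs, steps, seed=0):
    """Simulate Thompson sampling on hypothetical Bernoulli arms."""
    rng = random.Random(seed)
    successes = [0] * len(probs)
    failures = [0] * len(probs)
    for _ in range(steps):
        arm = thompson_select(successes, failures, rng)
        if rng.random() < probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    # Pull counts per arm are successes plus failures.
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    print("pulls per arm:", run_thompson([0.1, 0.9], steps=2000))
```

Exploration here comes from posterior randomness rather than an explicit bonus: an arm with few observations has a wide posterior, so it occasionally produces the highest sample and gets tried.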