STIMSMITH

SOURCE ARCHIVE

SHA256: 4caf0e8f47adf212540c793ad5f9500ba73c5826cdbe868802594915a9392fa5
TYPE: text/html
SIZE: 224.1 KB
FETCHED: 6/12/2026, 10:03:34 PM
EXTRACTOR: http-html
CHARS: 2,188

EXTRACTED CONTENT

Compared with the inefficient and unreliable direct RTL-based sampling in CorrectBench, we propose a more efficient and robust Sampling&Filtering mechanism that is fully decoupled from RTL generation, as illustrated in Fig. 4. After the agent samples $N$ Program Emulator candidates, denoted $M_1, \dots, M_N$ (with $N = 5$ in this study), it produces $N$ corresponding signal reference result candidates $R_1, \dots, R_N$. These results fall into three distinct cases:

1. Consistent Outputs: If all results are identical, we merge them into a single representative output.
2. Outlier Detection: If there exists a unique result $R_j$ that differs from all the other results in the set $\{R_1, R_2, \ldots, R_N\}$ (for instance, $j = 4$ in Fig. 4), the outlier is filtered out.
3. Partial Consistency: The remaining cases, which exhibit partial consistency, are merged while abstaining from diversity.

This mechanism efficiently provides the most informative results to the LLM-as-a-Judge module for further evaluation.
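The three cases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `filter_candidates`, the case labels, and the representation of each result $R_i$ as a hashable value are all hypothetical. It also rests on two assumptions the text does not confirm: that an outlier means exactly one result disagrees while all the others agree with each other, and that merging in the partially consistent case means collapsing identical results into one representative per distinct value.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def filter_candidates(results: Sequence[Hashable]) -> Tuple[str, List[Hashable]]:
    """Classify N candidate results and return (case label, kept results).

    Hypothetical helper: the case labels and return shape are illustrative only.
    """
    if not results:
        raise ValueError("at least one candidate result is required")

    counts = Counter(results)   # occurrences of each distinct result
    distinct = list(counts)     # distinct results in first-seen order

    # Case 1: all results are identical, so keep a single representative.
    if len(distinct) == 1:
        return "consistent", distinct

    # Case 2 (assumed reading): exactly one result disagrees with an
    # otherwise unanimous set, so drop it and keep the agreed result.
    if len(distinct) == 2 and len(results) >= 3 and min(counts.values()) == 1:
        majority = max(counts, key=counts.get)
        return "outlier", [majority]

    # Case 3 (assumed reading): partial consistency, so merge duplicates
    # and keep one representative per distinct result.
    return "partial", distinct


if __name__ == "__main__":
    # N = 5 as in the study; strings stand in for signal reference results.
    print(filter_candidates(["a", "a", "a", "b", "a"]))
```

Under these assumptions, the kept results are what would be handed to the LLM-as-a-Judge module; a different reading of the third case would change only the final branch.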