Compared with the inefficient and unreliable direct RTL-based sampling in CorrectBench, we propose a more efficient and robust Sampling&Filtering mechanism, fully decoupled from RTL generation, as illustrated in Fig. 4. After the agent samples $N$ Program Emulator candidates, denoted $\{M_1, \dots, M_N\}$ (with $N = 5$ in this study), it produces $N$ corresponding signal reference result candidates $\{R_1, \dots, R_N\}$. These results are categorized into three distinct cases:

1. Consistent Outputs: If all results are identical, we merge them into a single representative output.
2. Outlier Detection: If there exists a unique result $R_j$ that differs from all the other results in the set $\{R_1, R_2, \dots, R_N\}$ (for instance, $j = 4$ in Fig. 4), the outlier is filtered out.
3. Partial Consistency: The remaining cases, which exhibit partial consistency, are merged while abstaining from diversity.

This mechanism efficiently provides the most informative candidates to the LLM-as-a-Judge module for further evaluation.
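The three-way categorization described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_and_filter` is hypothetical, each reference result is assumed to be serialized into a hashable value (for example a string or tuple of signal values), the outlier case is read as "exactly one result disagrees with N-1 identical results", and the merge policy for the partially consistent case (collapsing duplicates and keeping one representative per distinct result) is one possible interpretation of the text.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def sample_and_filter(results: Sequence[Hashable]) -> Tuple[str, List[Hashable]]:
    """Classify N reference results; return (case, candidates for the judge).

    Hypothetical helper: names and merge policy are assumptions, not from the paper.
    """
    if not results:
        raise ValueError("need at least one result")

    counts = Counter(results)
    # Distinct results in first-seen order (Counter preserves insertion order).
    distinct = list(counts)

    # Case 1: all results identical -> a single representative output.
    if len(distinct) == 1:
        return "consistent", distinct

    # Case 2: exactly one result disagrees with N-1 identical results -> drop it.
    if len(distinct) == 2 and len(results) > 2 and min(counts.values()) == 1:
        majority = max(counts, key=counts.get)
        return "outlier", [majority]

    # Case 3: partial consistency -> collapse duplicates (assumed merge policy).
    return "partial", distinct
```

For example, with five candidates where only the fourth disagrees, `sample_and_filter(["a", "a", "a", "b", "a"])` takes the outlier branch and forwards only the majority result; a mix such as `["a", "a", "b", "b", "c"]` falls into the partial case and forwards one copy of each distinct result.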