This is more accurate than checking the number of polls. It also lets the test handle slowly connecting peers, which can cause a contender poll to finalize before the other contenders are registered (so the poll count ends up lower than the expected 12). Checking the set over the whole quorum removes this constraint entirely.
This also requires comparing the proof rankings properly, so the rank computation now uses the exact score instead of assuming 5000.
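
As a rough illustration of the set-based check described above, here is a minimal sketch. It assumes each quorum node can report the contender ids it has polled; the helper name `contender_set_diff` and its arguments are hypothetical and are not taken from the actual test.

```python
def contender_set_diff(expected_ids, polled_ids_per_node):
    """Compare the expected contenders with those polled across the quorum.

    expected_ids: iterable of contender ids the test expects to be polled.
    polled_ids_per_node: one iterable of polled contender ids per quorum node
        (hypothetical input; how it is collected depends on the test).

    Returns a (missing, unexpected) pair of sets.
    """
    seen = set()
    for polled in polled_ids_per_node:
        # Union over all nodes: a contender polled by any node counts,
        # regardless of how many polls each node issued.
        seen.update(polled)

    expected = set(expected_ids)
    return expected - seen, seen - expected


# Usage: the check passes once both sets are empty, even if some nodes
# connected late and issued fewer polls than the others.
missing, unexpected = contender_set_diff(
    ["a", "b", "c"],
    [["a"], ["b", "a"], ["c"]],
)
assert not missing and not unexpected
```

Because the result depends only on which contenders were seen, not on how many polls happened, a poll finalizing early on one node does not change the outcome.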