I found an amazing journal article explaining exactly what I wanted to do! They even did a better job than I had in mind.
They got around the ‘double dummy’ by dealing 5 hands. A pretty good idea. They also used gradient descent based on the score. I wonder if they are calculating double dummy throughout their learning loop.
A couple of things could be added: they don’t seem to take the opponents’ bidding or vulnerability into consideration. They also didn’t partition their DNN by suit, which I think would give the model a big head start.
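To make the suit-partitioning idea a bit more concrete, here is a rough sketch of one way I could imagine doing it. This is not the paper's method. The 4x13 encoding, the shared per-suit layer, the hidden size of 8, and the names `encode_hand` and `suit_partitioned_features` are all my own assumptions for illustration.

```python
import numpy as np

SUITS = "SHDC"
RANKS = "AKQJT98765432"

def encode_hand(hand):
    """Encode a hand like {'S': 'AKQ52', ...} as a 4x13 matrix of 0/1 flags.

    Hypothetical encoding: one row per suit, one column per rank.
    """
    x = np.zeros((len(SUITS), len(RANKS)), dtype=np.float32)
    for s, suit in enumerate(SUITS):
        for card in hand.get(suit, ""):
            x[s, RANKS.index(card)] = 1.0
    return x

def suit_partitioned_features(x, w_suit, b_suit):
    """Apply the same small dense layer (ReLU) to each suit row, then flatten.

    Sharing the weights across suits is an assumption: the network learns
    what a holding like 'AKQxx' means once instead of four times.
    """
    h = np.maximum(x @ w_suit + b_suit, 0.0)  # shape (4, hidden)
    return h.reshape(-1)                      # shape (4 * hidden,)

rng = np.random.default_rng(0)
hidden = 8  # arbitrary size for the sketch
w_suit = rng.normal(scale=0.1, size=(len(RANKS), hidden)).astype(np.float32)
b_suit = np.zeros(hidden, dtype=np.float32)

hand = {"S": "AKQ52", "H": "T9", "D": "KJ4", "C": "873"}
x = encode_hand(hand)
features = suit_partitioned_features(x, w_suit, b_suit)
```

The flattened `features` vector would then feed the rest of the network. Vulnerability and the opponents’ bidding could presumably be appended as extra inputs at that point, but I haven’t tried any of this.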
But really cool work.