I would argue that's super easy to boil down to one number: calculate the balance between players of equivalent skill for Level 0 players, Level 1 players, Level 2 players, all the way up to whatever the highest level is. Then average all those balances, weighting each by the proportion of games played by Level 0, Level 1, Level 2 players, etc. That's the balance of the scenario.
That's also what ROAR reports.
(Yes, ROAR also reports games between players of different skill levels, but based on this conversation I'm inclined to think that those games aren't likely to shift the balance in one direction or the other.)
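To make the weighted average concrete, here is a minimal Python sketch. The function name, the data layout, and the win/game counts are all made up for illustration (they are not real ROAR data), and it assumes every recorded game is between players of the same level. Under that assumption the weighted average works out to the same number as simply pooling all the games, which is the sense in which a raw win-loss record already reports it.

```python
def weighted_balance(results):
    """Average per-level balance, weighted by share of games played.

    `results` maps skill level -> (axis_wins, games_played), counting only
    games between players of that same level (an assumption of this sketch).
    """
    total_games = sum(games for _, games in results.values())
    if total_games == 0:
        raise ValueError("no games recorded")

    balance = 0.0
    for axis_wins, games in results.values():
        if games == 0:
            continue  # a level with no games contributes no weight
        level_balance = axis_wins / games   # proportion of Axis wins at this level
        weight = games / total_games        # proportion of all games at this level
        balance += level_balance * weight
    return balance


# Hypothetical numbers, invented for illustration only.
example = {0: (30, 50), 1: (45, 100), 2: (20, 50)}

overall = weighted_balance(example)

# Pooling every game and taking the raw Axis win proportion.
pooled = sum(w for w, _ in example.values()) / sum(g for _, g in example.values())

print(round(overall, 3), round(pooled, 3))
```

The two printed numbers agree because each level's weight (games at that level over total games) cancels the denominator of that level's win proportion, leaving total Axis wins over total games.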
Steve is right that the most granular possible metric of balance would be a 3-D surface with Axis Skill on the x-axis, Allied Skill on the y-axis, and proportion-of-Axis-wins on the z-axis. But I think this approach has some sticking points too--how do you know how good you and your opponent are? Can you figure this out precisely enough, relative to the size of the fluctuations in the Balance Landscape, that it improves on using a single balance number? I don't know the answer to that.