From Marshall Rose: "the first deals with the delay in before asking for ballots (after a contribution) and the second deals with each delay between submitting each ballot."
New "brave-core" (chromium refork to get front end but w/o Google accounts/sync), the new implementation; narration by Serg Zhukovsky:
Thanks! The protection of the delay-based ballot-mixing looks somewhat weak.
I see that the delay from one ballot batch to the next is set to a uniformly-random time between 10 and 60 seconds. I also see that the ballot batch size is 10.
Let's assume that you have fixed the problem of the ballots being submitted in the clear and that they are instead send through a proxy on different TCP connections (or through Tor on different circuits). Let's also say that the news reports are correct and Brave has 4 million active users, and moreover that all have enabled payments (this is generous). Furthermore, assume that (1) there is no reason payments would start being submitted on any particular day or time, and (2) say, 20 publishers are paid 20 tokens each on average. Then the average number of users uploading ballots in any given minute - let's call them "neighbors" - is roughly (20 ballots/user / 10 ballots/batch)(20 publishers/month)(4e6 users)/(30 days/month)/(24 hours/day)/(60 minute/hour) = 3704 neighbors. That's not a very big anonymity set, but it's not nothing. Early adopters got screwed, though.
However, it gets worse. Based on BatClient::prepareVoteBatch() (bat_client.cc), it looks like each batch has ballots for a single publisher (if I'm wrong, privacy erodes even further). How many of those 3704 simultaneous uses are uploading to the exact same domain? I don't know how to guess (to be conservative, we should assume none). Moreover, unless the last batch happens to be a multiple of 10, that one will be unique in being a size less than 10. Both of these things make it easy in many cases to determine when you are finished uploading ballots for one publisher and are moving onto the next. If many of your neighbors have split a publisher's votes across multiple batches, then it seems unlikely that they will move to the next publisher at the same time, making your ballot submissions more linkable across publishers, exactly the problem we were worried about.
In addition, the time of each batch isn't independent because the delay is applied after the last one. For example, if one batch appears is sent quickly by chance, then the next one is more likely to be sent at a time close to the initial batch. Thus, to link two batches, you only need to consider the batches that end in the 10-60 seconds preceding the start of the second one. How many of your 3704 neighbors ended a batch then? The longer a batch upload takes, (due to latency & bandwidth), the less likely that the batch of any other user has ended in that short time frame.
There is also the issue of semantic linking. If I'm a relatively rare group, the sites I visit are likely to be linkable to each other as distinct from those of my neighbors. For example, suppose I speak Catalán and am involved in regional politics, or that I am a teenage boy in Iceland. How many of my sites are linkable based on those characteristics? And this linking can happen across "gaps" in the ballot-upload stream, where there is uncertainly about how the uploads are linked together, allowing you to get the inference "back on track".
Fortunately, the timing stuff seems fixable. Hiding the IP and separating ballots across connections is an essential first step, of course. but to handle timing issues after that, simply do the following: (1) submit all ballots individually; (2) schedule each ballot upload independently of the other ballots and at a uniformly-random time over, say, the next week or two, and if the user isn't online, reschedule.
The semantic issues with small populations seems harder. I think Brave should prevent itself from learning about votes for publishers unless enough people vote for the publisher. I don't see how to do this without a more powerful cryptographic protocol (it is absolutely doable, though, using MPC for example).
Thanks. Marshall Rose said in reply "That is a very good analysis about the traffic pattern. You are right about the batching of ballots for a similar publisher. That is an optimization to reduce the total voting period."
We will work on improving the system. Please mail me first at brave dot com if you want to correspond further. Thanks again.
JS implementation in Muon-based https://github.com/brave/browser-laptop product:
1. https://github.com/brave-intl/bat-client/blob/master/index.j...
2. https://github.com/brave-intl/bat-client/blob/master/index.j...
3. https://github.com/brave-intl/bat-client/blob/master/index.j...
From Marshall Rose: "the first deals with the delay in before asking for ballots (after a contribution) and the second deals with each delay between submitting each ballot."
New "brave-core" (chromium refork to get front end but w/o Google accounts/sync), the new implementation; narration by Serg Zhukovsky:
"""
here we make a first call https://github.com/brave-intl/bat-native-ledger/blob/reconci...
which calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...
after the random wait it calls that function https://github.com/brave-intl/bat-native-ledger/blob/reconci...
and in it’s callback it goes all over again if there are still votes to send
if you are interested in the function that does the randomization, it’s here: https://github.com/brave-intl/bat-native-ledger/blob/reconci... """
The C++ is all new code, not yet released -- bug reports welcome! Thanks.