I was a bit disappointed by this because it feels more like people trying to get press and make a name for themselves than an actual serious endeavor at digging through open government data with ML. The first thing you see in the README is links to numerous social media profiles, followed by a Kickstarter link and a BTC address. Zero technical info. Looking into the Kickstarter (well, Catarse, the Brazilian copycat), apparently there's a young company behind this - "Data Science Brigade". So I guess this is partly a publicity stunt.
I'm sorry, but if you guys are serious about building this open you should make it entirely technical and keep any press/social media/financial bullshit out of it. Entirely separated. In addition, if the project is sponsored by a company, make that clear in a notice somewhere. It just doesn't feel honest to do it any other way. I'm not going to contribute with time or money to a project that claims to be about "fighting corruption" but might have all sorts of unknown interests behind it. I'd be pleased to contribute otherwise.
If they are serious, they might want to do it full time. Or even hire people. They'll require funding. Even recruiting people to work on OSS requires marketing of sorts. You gotta convince people the project is cool, and that it's on a good trajectory. And finally, after all this, you want people talking/following so you can push your results out.
There isn't anything inherently wrong with self promotion and often it can be critical.
I'm one of the co-founders of Serenata de Amor and your reply was very interesting to us. It pushed the core team (me included) to rethink our steps. If you had time for such a substantial critique I hope you have time to hear our side of the story. It's not about antagonizing you, but giving our point of view might help people understand why things were done the way they were.
First of all, yes, we want publicity. We depend on that actually. We set up a crowdfunding campaign to raise funds and work full time on the project for a couple of months. If we reach our target it will mean 8 people working full time on it for two months. People who have quit their jobs and still have bills to pay. So yes, publicity is good for us. And publicity is good for the project, or at least that's what we hope.
Furthermore, we aim for publicity among a diverse public. Corruption is not an issue relevant only to the tech community, so we don't intend to reach only developers and data scientists. We want journalists to know what we are doing, we want the general public interested in politics and civil participation to know what we are doing, we want people looking for justice to know what we are doing, we want politically skeptical people to know what we are doing. And we do not have a website (yet). So our choice was to go for a non-technical README.md and keep the technical stuff and contribution guide in a separate document, the CONTRIBUTING.md. And even that one is not 100% focused on tech people, because the project doesn't depend exclusively on tech people: we need creative social scientists to draw up hypotheses, we need people with law expertise to help us handle the “interesting” stories we might find, we need communication experts to write press releases so news outlets can help us put pressure on politicians, and so on.
Furthermore, we are open to talk. We are not a closed group, we have no hierarchy, no dictators and not even a BDFL. That's why we publicize our social media. We want people to know what we are doing and we want people to reach us whenever they want, using the communication platforms they use.
So, yes. I do agree with your sharpest critiques. Zero technical info in the README.md? Sure thing: journalists might not be interested in that and we want them to follow the project. Are we trying to get publicity? Sure thing: we actually are trying to be able to work full time on this for 2 months and we will still have bills to pay, just like everybody else. Not 100% technical? Sure thing: we cannot be 100% technical and leave the publicity out of it, since our project requires other sets of expertise.
And still, we hope that this does not invalidate our claim that we are using machine learning to fight corruption in Brazil.
Finally, Data Science Brigade. This is an organization we started to help create and foster a culture of data science in Brazil, especially among journalists. Yes, there are interests underneath it (spoiler: there's no such thing as disinterested action): we work with this stuff and we want to have jobs in this field in the near future. Also, we are Brazilians and we want to have here the kind of serious, data-focused journalism we see in the US, for example. We believe this is good for the society we live in and we believe we can help with that. So, why not?
These are our thoughts, the reasons behind the setup you are criticizing. Maybe we got it wrong, and we do apologize if that's the case. But for now we must say that we are confident that things are bringing good results, both in terms of technical contribution and in terms of publicity.
Once more, many thanks for the sharp and straightforward criticism. People don't always have the balls to say what they think, and we value your courage in doing so. Unfortunately we may disagree on some points, because I'm afraid we do need publicity and non-technical people. I don't expect you to agree with us, but maybe this helps you (and others) understand why our entry points are non-technical and welcoming to lay people in general. And if you like, you (and others) may make yourselves at home as tech-savvy people on CONTRIBUTING.md, Issues and Pull Requests, and on Gitter and Telegram too.
I've thought about this a lot for India as well. To be realistic, we would need unprecedented levels of transparency to get the amount of data needed for usable results. With the amount of nepotism around, even constructing a simple network of the party heads of each state, their related companies, and the contracts awarded for public works would be valuable.
At this stage we really should think of it more in terms of documenting corruption rather than stopping corruption. When (and if) the system is ready to change the data would be extremely useful to see why things are happening the way they are and work out if solutions would just move the corruption-bottleneck rather than eliminate it.
> At this stage we really should think of it more in terms of documenting corruption rather than stopping corruption.
This is a very good point; often I see people block this sort of discussion by asking "Well, what are you gonna do about it??" Taking this point of view sidesteps that question.
Tell them that documenting will allow a better history to be recorded. Then you can redirect the argument to the separate question of whether history as a subject is of any use.
I have always told the same about India. We need to make everything accessible WITHOUT the need of a Right to Information query. Almost every query that is made via RTI should be accessible within a few clicks.
This is the same as "fighting massive surveillance with X": the root of the problem is how the system works. These kinds of problems cannot be solved by anything but a massive change in the status quo. As long as corrupt politicians are in power they could not care less about evidence of potential wrongdoing. In Mexico a renowned writer once said about that: "when the impunity is absolute the appearances are for losers"
The impact of such an initiative may not be big, or even disruptive, but I'd say it at least adds to the cost of keeping the impunity going.
Also, if a project like this has significant findings and gets picked up by traditional or social media, it may act as a catalyst for public outrage against said impunity, which may trigger more significant changes.
'Changing how a system works' rarely happens as a result of a single event. Even with changes which have a definite turning point, this turning point is often a drop in a bucket which started to be filled much before.
There is no way to filter the good politicians from the bad, when interests are at stake. If you want to lessen corruption, then you need to balance the distribution of power and encourage competition.
I used to think your idea could be a solution, but it is too slow and there is also an "infinite" input of new corrupt politicians. Running the numbers shows that trying to vote them out every 4 years doesn't sound like a win. Since we need a defence against self-corruption, the source of all evil, the only one I can think of is the Non-Aggression Principle and enforcing respect for private property https://en.wikipedia.org/wiki/Non-aggression_principle
The maze of shell companies and indirect associations would make it relatively easy to bypass automated scrutiny. But it's an interesting idea I've thought a lot about.
It would be great to automatically analyze things such as government contracts and the politicians who were involved in getting them voted in. But you'd need a data source of the social network graphs of those who are behind organizations, and submitting one could be made a requirement for bidding on government contracts. Anyone who doesn't submit an accurate network of the human contacts who will ultimately gain financially from a contract would be barred from all future contracts.
That would serve as an excellent deterrent for politicians who plan to use their position of power for personal gain and promote the usage of companies who actually deserve to be getting contracts (improving the quality of government output).
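To make the idea a bit more concrete, here is a minimal sketch of what checking a declared network could look like. Everything in it is hypothetical: the `declared` graph, the entity names, and the `find_link` helper are made up for illustration, and a real registry would be far messier. It just runs a breadth-first search from a bidding company through its declared owners and contacts to see whether it reaches a given official.

```python
from collections import deque

def find_link(graph, start, target):
    """Breadth-first search; returns the shortest chain from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Hypothetical declared network: each entity maps to who is behind it.
declared = {
    "Contractor Ltd": ["Holding A"],
    "Holding A": ["Shell B"],
    "Shell B": ["Relative of Official X"],
    "Relative of Official X": ["Official X"],
}

path = find_link(declared, "Contractor Ltd", "Official X")
print(" -> ".join(path))
```

The search only works as well as the declared data, which is why the bar on future contracts for inaccurate submissions matters: any link left out of the graph is invisible to it.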
To get conviction or sanctions, one generally needs something definite. But rules-of-thumb and appearances are useful as a way to know where to begin investigating.
However, the problem with a public system of rules of thumb is that it can be used by the wrong-doers to hide themselves.
The US banking system scans the cash activity of businesses to spot money laundering and other illicit activity. If a money launderer had the rules for which transaction patterns were going to be red-flagged, they could structure their transactions so as not to be flagged while continuing to launder money.
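A toy illustration of that point, not how any real bank does it: the two rules, the `limit` and `window_days` values, and the accounts below are all made up. A rule that only looks at single deposits is beaten by splitting the money up, while a rule that sums deposits over a window catches the split, until the launderer learns that rule too.

```python
from collections import defaultdict

def flag_single(transactions, limit=10_000):
    """Flag accounts with any single deposit at or above the limit."""
    return {account for account, day, amount in transactions if amount >= limit}

def flag_aggregate(transactions, limit=10_000, window_days=7):
    """Flag accounts whose deposits within any window_days span reach the limit."""
    by_account = defaultdict(list)
    for account, day, amount in transactions:
        by_account[account].append((day, amount))
    flagged = set()
    for account, deposits in by_account.items():
        for start_day, _ in deposits:
            total = sum(a for d, a in deposits
                        if start_day <= d < start_day + window_days)
            if total >= limit:
                flagged.add(account)
                break
    return flagged

# (account, day, amount) tuples; account "B" splits 12,000 into three deposits.
transactions = [
    ("A", 1, 12_000),
    ("B", 1, 4_000), ("B", 2, 4_000), ("B", 3, 4_000),
    ("C", 1, 500), ("C", 20, 700),
]

print(flag_single(transactions))     # only the unsplit deposit is caught
print(flag_aggregate(transactions))  # the split deposits are caught as well
```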
> You can see a lot of things in this repository, except Machine Learning. Why is this in front-page ?
Perhaps because the majority of HN readers have no concrete understanding of what machine learning is, yet they upvote most ML stories in pursuit of comprehension.
The idea seems to be fighting a specific kind of corruption. Congressmen have an office budget and many use invalid or suspicious receipts to justify expenses. The idea is to use bots to analyze and red-flag suspicious receipts and then use the evidence as proof in court.
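For a rough sense of what "red-flagging" a receipt could mean, here is a minimal sketch. It is not the project's actual method (I haven't seen one in the repo); it uses a plain modified z-score based on the median, and the `meals` amounts and the `flag_outliers` helper are invented for the example.

```python
from statistics import median

def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that one huge receipt can't distort.
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # all amounts (nearly) identical, nothing to compare against
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - med) / mad > threshold]

# Hypothetical meal receipts claimed by one office; the last one stands out.
meals = [32.0, 41.5, 28.0, 36.9, 45.0, 39.0, 410.0]

flagged = flag_outliers(meals)
print([meals[i] for i in flagged])
```

A flag like this is only a place to start looking. It says a receipt is unusual, not that it is fraudulent.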
This seems to be an almost entirely pointless endeavor. The problem with corruption isn't Billy expensing $5,000 of wine, and a visit to the strip club.
I've lived in both US and Brazil and I would say that the level of corruption is more or less the same no matter where one lives in the world, but in Brazil even the street cleaner knows about the political scandals while in the US it's all smoke & mirrors, to the point that only the higher-ups are in on it.
Brazil: "I know about it but what can I do to change it? Let's focus on some talking point instead."
US: "I don't really know about any scandals so let's focus on some talking point."
The problem also isn't exactly political but social, as in what might be called "small/everyday corruption" (running a red light, cutting in line, not mentioning the cashier just gave you more change than you deserved, and a million other things). There's no ML for that, only education [1].
As an aside: when one lives in Brazil (or any country, really), "when in Rome..." seems to apply, at least in some cases. At one point, I lived in the slums and was using internet that was very likely from a stolen connection (known as a "gato", or in this case, "gatonet"), but this was what everyone had and there was nothing else on offer. I also used types of public transportation that were very efficient for me but, afaik, technically illegal yet no one was really policing it. To give one more Brazil example, at a movie theater, I once used a recently expired student card of mine to get half-price. With these types of things taken into consideration, I'd say that the US (my country) is like concrete while Brazil is like water. To each, their own. There's good and bad aspects about both.
1 - In Brazil, there's sometimes a mix-up in the usage of the terms education (educação), which includes concepts of right vs wrong and manners, and schooling (escolaridade), thus in my mention of education I mean to include both senses.
It's weird that we don't have a website with a profile page for every official, detailing the official's performance and supervisor, along with a list of the spending overseen by that official and a button to 'open a case for prosecution'.
I mean, seriously? We have such things in commercial enterprises don't we?