The truly shocking VW emissions fraud should force us to think through how we can ensure the transparency that is needed in software. The general issue is excellently summarized in this recent NYT article:
“Intelligent public policy, as we all have learned since the early 20th century, is to require elevators to be inspectable, and to require manufacturers of elevators to build them so they can be inspected,” [Mr. Moglen, a lawyer, technologist and historian who founded the Software Freedom Law Center] said. “If Volkswagen knew that every customer who buys a vehicle would have a right to read the source code of all the software in the vehicle, they would never even consider the cheat, because the certainty of getting caught would terrify them.”
That is not how carmakers or even the E.P.A. see things. The code in automobiles is tightly protected under the Digital Millennium Copyright Act. Last year, several groups sought to have the code made available for “good-faith testing, identifying, disclosing and fixing of malfunctions, security flaws or vulnerabilities,” as Alex Davies reported last week in Wired.
A group of automobile manufacturers said that opening the code to scrutiny could create “serious threats to safety and security.” And two months ago, the E.P.A. said it, too, opposed such a move because people might try to reprogram their cars to beat emission rules.
At one level, the fact that organizations’ policies have to be expressed in software these days is good news: at least in theory, it should be much easier to find pricing discrimination buried in a website algorithm than to show a pattern among individuals engaged in intuitive price setting (see my earlier blog here).
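To make the contrast concrete, here is a purely hypothetical sketch (the function name, fields, and the 15% surcharge are all invented for illustration) of why a discriminatory rule, once written in code, is plainly visible to any reviewer with source access, while the same pattern spread across thousands of individual human pricing decisions must be inferred statistically:

```python
# Hypothetical illustration: a pricing rule encoded in software.
# A reviewer auditing this source sees the suspect branch immediately;
# detecting the same pattern in intuitive, case-by-case price setting
# would require painstaking statistical inference.

def quote_price(base_price: float, customer_zip: str) -> float:
    """Return a quoted price for a customer (invented example)."""
    if customer_zip.startswith("63"):        # a targeted region
        return round(base_price * 1.15, 2)   # 15% surcharge, explicit in code
    return base_price

print(quote_price(100.0, "63101"))  # surcharged quote
print(quote_price(100.0, "10001"))  # baseline quote
```

The point is not that real pricing code is this simple, only that a policy encoded in source is an inspectable artifact in a way that dispersed human judgment is not.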
But that depends on the software being open and accessible for review. The industry has argued successfully against this in terms of maintaining competitiveness and intellectual property, and in terms of the need to protect against hacking of various forms. The competitiveness argument carries little weight: we already have a robust, industry-dominated intellectual property system to protect innovation. Rather, it is an argument for maintaining barriers to entry, something that we should regard with deep suspicion.
The hacking and security argument also cuts both ways. With enough resources, hackers can always get in. The difference is that while transparent software might be easier to hack, its errors will also be found much more quickly by the good guys.
This is important for us not just because it raises all sorts of legal issues, but because very soon software-driven algorithms are going to be doing all kinds of things in the justice system. Probably most important in terms of transparency and credibility will be the emerging triage systems being developed in court and legal aid contexts. Similarly, as big data is used to drive decisions about how documents are assembled and cases are processed, the need for transparency is critical. See here for triage principles developed in 2012.
So both the rules, and how the rules are actually put into practice, need to be transparent. The problem this creates is the risk of gaming. If, for example, it is known that saying you read poorly makes it easier to get a lawyer in a particular triage system, then some people may claim that they do not read well. Sadly, therefore, the best triage factors, or rather the proxies that are used to score the factors, may be objective data, like level of education. Moreover, the best of all are those that can be obtained from data in other databases, such as whether a tax return has been filed on one’s own or with help.
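A minimal sketch of the scoring trade-off described above may help. The factor names and weights here are entirely invented, not drawn from any real triage system; the sketch only shows why a self-reported answer becomes gameable once the rules are public, while a proxy pulled from existing records is much harder to fake:

```python
# Hypothetical triage scorer (factor names and weights are invented).
# Once the rules are transparent, a self-reported answer ("I read
# poorly") costs nothing to claim; objective proxies such as education
# records or whether a tax return was filed without help are verifiable
# against external databases and so resist gaming.

def triage_score(self_reported_poor_reading: bool,
                 years_of_education: int,
                 filed_tax_return_unassisted: bool) -> float:
    """Higher score = higher priority for full legal representation."""
    score = 0.0
    if self_reported_poor_reading:       # gameable: a free claim
        score += 3.0
    if years_of_education < 12:          # objective, verifiable record
        score += 2.0
    if not filed_tax_return_unassisted:  # proxy from an external database
        score += 2.0
    return score

# An applicant who games the self-report jumps 3 points with no
# verifiable change in circumstances:
print(triage_score(False, 16, True))  # honest applicant
print(triage_score(True, 16, True))   # gamed self-report
```

The design lesson is the uncomfortable one in the text: transparency pushes system builders away from what people say about themselves and toward what other records say about them.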
This may well make building triage systems harder, but if we ignore the risk, it is only a matter of time before we find that some Ferguson-like municipality has developed a fee maximization algorithm for arrests, fines, fees and assessments.