In his debate on evil with Plantinga in their book, Tooley uses Carnap's logical probability measure to get an upper bound on the probability that N evils are in fact unjustified. The result is technically interesting, but Carnap's probability measure is standardly seen as merely a part of the history of philosophy of science, and I don't know of anybody other than Tooley who has actually used it for anything in recent decades. I've always seen Carnap's measure as a failed attempt to produce a logical probability measure that makes induction possible, and I assumed that everyone shared the view that it was a failed attempt--I am pretty sure it was taught to us as a failed attempt at Pittsburgh. Anyway, in case anybody is curious what is wrong with the Carnap measure, here are some remarks (cross-posted from my own blog). I have no idea if the criticisms are original or not.
Carnap's objective prior probability measure was designed to make induction possible. To explain the problems with the Carnap measure, I need some details. If you're familiar with the Carnap measure, you can skip ahead to "Problem 1".
Carnap's prior probability measure is best seen as a measure for the probability of claims made by sentences of a truth-functional language with n names, a1,...,an, and k unary predicates, Q1,...,Qk. Let N be the set of names, Q the set of predicates and T the set {True, False}. Call the language L(Q,N). Say that a state s is a function from the Cartesian product QxN to T, and let S be the set of all states. There is a natural way of saying whether a sentence u of L(Q,N) is true at a state s. Basically, you say that the sentence Qi(aj) is true at s if and only if s(Qi,aj)=True, and then extend truth-functionally to all sentences.
There is a natural probability measure on S, which I will call the "Wittgenstein measure", defined by PW(A)=|A|/|S| for every subset A of S, where |X| is the cardinality of the set X. This probability measure assigns equal probability to every state. Given a probability measure P on states, we get a probability measure for the sentences of L(Q,N). If u is such a sentence, define the subset uT={s:u is true at s} of S. Then, we can let P(u)=P(uT). The Wittgenstein measure does not allow induction. Suppose that we have three names, and two predicates, Raven and Black. Our evidence E is: Raven(a1), Raven(a2), Raven(a3), Black(a1) and Black(a2). Then, PW(Black(a3)|E)=1/2=PW(Black(a3)), as can be easily verified, because all states are equally likely, and hence the state that makes all the ai be black ravens is no more likely than the state that makes all the ai be ravens but with only a1 and a2 black.
So, Carnap wanted to come up with a probability measure that allows induction but is still fairly natural. What he did was this. Instead of assigning equal probability to each state, he assigned equal probability to each equivalence class of states. Say that s~t for states s and t if there is some permutation p of the names N such that s(R,p(a))=t(R,a) for every predicate R and every name a. Let [s] be the equivalence class of s under this relation: [s]={t:t~s}. Let S* be the set of these equivalence classes. Then, if s is a state, we define: PC({s})=1/(|[s]||S*|). In other words, each state in an equivalence class has equal probability, and each equivalence class has equal probability. If A is any subset of S, we then define PC(A) as the sum of PC({a}) as a ranges over the elements of A.
The merit of Carnap measure is that it assigns a greater probability to more uniform states. Thus, PC(Black(a3)|E) should be greater than 1/2 (I haven't actually worked the numbers).
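The numbers for the raven example are small enough to work by brute force. The following is a minimal sketch, not anything from Carnap or Tooley: it enumerates all states for three names and the two predicates Raven and Black, builds the equivalence classes by permuting names, and computes both conditional probabilities with exact fractions. The function and variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

NAMES = range(3)                      # a1, a2, a3 as indices 0, 1, 2
RAVEN, BLACK = 0, 1                   # column positions of the two predicates

# A state is a tuple with one row per name; a row holds (Raven?, Black?).
ROWS = list(product((True, False), repeat=2))
STATES = list(product(ROWS, repeat=len(NAMES)))

def orbit(state):
    """Equivalence class [s]: every state obtained by permuting the names."""
    return frozenset(tuple(state[i] for i in p) for p in permutations(NAMES))

CLASSES = {orbit(s) for s in STATES}  # the set S*

def pw(state):
    """Wittgenstein measure: every state equally likely."""
    return Fraction(1, len(STATES))

def pc(state):
    """Carnap measure: PC({s}) = 1 / (|[s]| * |S*|)."""
    return Fraction(1, len(orbit(state)) * len(CLASSES))

def conditional(measure, hypothesis, evidence):
    den = sum(measure(s) for s in STATES if evidence(s))
    num = sum(measure(s) for s in STATES if evidence(s) and hypothesis(s))
    return num / den

def evidence(s):
    """E: all three are ravens, and a1 and a2 are black."""
    return all(s[i][RAVEN] for i in NAMES) and s[0][BLACK] and s[1][BLACK]

def black_a3(s):
    return s[2][BLACK]

pw_result = conditional(pw, black_a3, evidence)
pc_result = conditional(pc, black_a3, evidence)
print(pw_result, pc_result)           # 1/2 3/4
```

So on this tiny language PC(Black(a3)|E) comes out to 3/4, while the Wittgenstein measure stays at 1/2.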
Problem 1: Carnap measure is not invariant under increase of the number of predicates. Intuitively, adding irrelevant predicates to the language, predicates that do not appear in either the evidence or the hypothesis, should not change the degree of confirmation. But it does. In fact, we have the following theorem. Let u be any sentence of L(Q,N). Let Qr be Q with r additional predicates thrown in. Let ur be a sentence of L(Qr,N) which is just like u (i.e., ur is u considered qua sentence of L(Qr,N)).
Theorem 1: PC(ur) tends to PW(u) as r tends to infinity.
In other words, as one increases the number of predicates, one loses the ability to do induction, since PW is no good for induction. The proof (which is non-trivial, but not insanely hard) is left to the reader.
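This is no substitute for the proof, but the trend in Theorem 1 can at least be observed numerically on the raven example. The sketch below assumes three names and adds r irrelevant predicates to Raven and Black. To keep it fast it uses the fact that the equivalence class of a state has n!/(c_1!...c_m!) members, where the c_i count how many names share each row of truth values, and that the number of classes is the number of multisets of n rows. All names in the code are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, factorial

N_NAMES = 3

def pc_black_a3_given_e(r):
    """PC(Black(a3) | E) with predicates Raven, Black and r extra ones."""
    k = 2 + r
    rows = list(product((True, False), repeat=k))       # row[0]=Raven, row[1]=Black
    n_classes = comb(N_NAMES + len(rows) - 1, N_NAMES)  # |S*|
    num = den = Fraction(0)
    for s in product(rows, repeat=N_NAMES):
        # Skip states incompatible with E: three ravens, a1 and a2 black.
        if not (all(row[0] for row in s) and s[0][1] and s[1][1]):
            continue
        class_size = factorial(N_NAMES)
        for c in Counter(s).values():
            class_size //= factorial(c)
        weight = Fraction(1, class_size * n_classes)    # PC({s})
        den += weight
        if s[2][1]:
            num += weight
    return num / den

results = {r: pc_black_a3_given_e(r) for r in range(4)}
for r, value in results.items():
    print(r, value)
```

With zero to three extra predicates the confirmation comes out as 3/4, 2/3, 3/5 and 5/9, drifting toward the Wittgenstein value of 1/2.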
Problem 2: Let d be a sentence of L(Q,N) saying that indiscernibles are identical. For instance, let dij be the disjunction ~(Q1(ai) iff Q1(aj)) or ... or ~(Qk(ai) iff Qk(aj)), and let d be the conjunction of the dij for all distinct i and j.
Theorem 2: PC(u|d)=PW(u|d).
Thus, when we condition on the identity of indiscernibles, Carnap measure collapses to Wittgenstein measure. But Wittgenstein measure is worthless for induction. And often the identity of indiscernibles holds. For instance, suppose we have a1,a2,a3 as our individuals, and our evidence is this: a1,a2,a3 are each a raven, a1 and a2 are black. So far so good, we can do induction and we get some confirmation of a3 being black. But suppose we also learn that identity of indiscernibles holds for these three ravens. Then we lose the confirmation! And we might well learn this. For instance, we might learn that exactly a1 and a3 are male, and exactly a1 and a2 each have an even number of feathers, and that means that identity of indiscernibles holds.
Moreover, I think most of us have a background belief that our world has such richness of properties that, at least as a contingent matter of fact, the identity of indiscernibles holds for macroscopic objects. If so, then Carnap measure makes induction impossible for macroscopic objects.
Sketch of proof of Theorem 2: Let D be the set of states at which identity of indiscernibles holds. Thus, D is the set of states s with the property that if a and b are distinct, then there is a predicate R such that s(R,a) differs from s(R,b). Observe that if s is any state in D, then |[s]|=n!, where n is the number of names. For, any permutation of the names induces a different state given the identity of indiscernibles, and there are n! permutations. Therefore, PC({s})=1/(n!|S*|). Hence, PC({s}) has the same value for every s in D. Therefore, PC({s}|D)=1/|D|. But, likewise, PW({s}|D)=1/|D|. The Theorem follows easily from this.
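Theorem 2 can be checked on the raven example from Problem 2. The sketch below is only an illustration under stated assumptions: three names, the four predicates Raven, Black, Male and "has an even number of feathers", and the same closed form for the class size as in the proof sketch (n! when all rows differ). It compares PC and PW conditional on E together with identity of indiscernibles, and also conditional on E together with the specific sex and feather facts.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, factorial

RAVEN, BLACK, MALE, EVEN = range(4)   # EVEN: even number of feathers
N = 3
ROWS = list(product((True, False), repeat=4))
STATES = list(product(ROWS, repeat=N))
N_CLASSES = comb(N + len(ROWS) - 1, N)          # |S*|

def pw(s):
    return Fraction(1, len(STATES))

def pc(s):
    size = factorial(N)                          # |[s]| = n! / prod(c_i!)
    for c in Counter(s).values():
        size //= factorial(c)
    return Fraction(1, size * N_CLASSES)

def cond(measure, hyp, ev):
    den = sum(measure(s) for s in STATES if ev(s))
    num = sum(measure(s) for s in STATES if ev(s) and hyp(s))
    return num / den

def e(s):       # three ravens, a1 and a2 black
    return all(row[RAVEN] for row in s) and s[0][BLACK] and s[1][BLACK]

def d(s):       # identity of indiscernibles: no two names share a row
    return len(set(s)) == N

def extra(s):   # exactly a1 and a3 male; exactly a1 and a2 even-feathered
    return ([row[MALE] for row in s] == [True, False, True]
            and [row[EVEN] for row in s] == [True, True, False])

def black_a3(s):
    return s[2][BLACK]

before = cond(pc, black_a3, e)
after_pc = cond(pc, black_a3, lambda s: e(s) and extra(s))
after_pw = cond(pw, black_a3, lambda s: e(s) and extra(s))
with_d_pc = cond(pc, black_a3, lambda s: e(s) and d(s))
with_d_pw = cond(pw, black_a3, lambda s: e(s) and d(s))
print(before, after_pc, after_pw, with_d_pc, with_d_pw)
```

Before the extra facts are learned, PC gives 3/5 for Black(a3). Once they are added, PC and PW both give 1/2, and the two measures also agree when conditioning on d directly.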
Remark: Theorem 2 gives an intuitive reason to believe Theorem 1. As one increases the number of predicates while keeping fixed the number of names, a greater and greater share of the state space satisfies the identity of indiscernibles.


Theorem 3: Let E_1 be any evidence solely about the particulars r_1,...,r_n. Let H be a hypothesis solely about r_{n+1}. Let E_2 be the evidence that P(r_1)&...&P(r_n)&~P(r_{n+1}), where P is some predicate. Then, according to PC, E_1 and H are conditionally independent given E_2.
Remark: This means that if you're trying to do simple induction, and you also know that there is any property that the particulars involved in the inductive data have and which the particular you are trying to learn about does not have, the inductive data tells you nothing about the particular. But this is always going to be the case--the unobserved particular will be differently located, less accessible, whatever. Minimally, the unobserved particular will be unobserved!
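A small instance of Theorem 3 can be checked by enumeration. The following sketch makes several assumptions of its own: two observed particulars and one unobserved, the predicates Raven, Black and a distinguishing predicate called P, E_1 saying that the two observed particulars are black ravens, and H saying that the third is black. It is a spot check of one case, not a proof.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, factorial

RAVEN, BLACK, P = range(3)            # P: the distinguishing predicate
N = 3                                 # r_1, r_2 observed; r_3 unobserved
ROWS = list(product((True, False), repeat=3))
STATES = list(product(ROWS, repeat=N))
N_CLASSES = comb(N + len(ROWS) - 1, N)

def pc(s):
    """Carnap measure via the closed form for the class size."""
    size = factorial(N)
    for c in Counter(s).values():
        size //= factorial(c)
    return Fraction(1, size * N_CLASSES)

def cond(hyp, ev):
    den = sum(pc(s) for s in STATES if ev(s))
    num = sum(pc(s) for s in STATES if ev(s) and hyp(s))
    return num / den

def e1(s):      # evidence about r_1, r_2 only: both are black ravens
    return all(s[i][RAVEN] and s[i][BLACK] for i in (0, 1))

def e2(s):      # P(r_1) & P(r_2) & ~P(r_3)
    return s[0][P] and s[1][P] and not s[2][P]

def h(s):       # hypothesis about r_3 only
    return s[2][BLACK]

with_e1_only = cond(h, e1)
with_e2_only = cond(h, e2)
with_both = cond(h, lambda s: e1(s) and e2(s))
print(with_e1_only, with_e2_only, with_both)
```

In this instance E_1 alone raises the probability of H to 3/5, but once E_2 is in the picture, adding E_1 changes nothing: both conditional probabilities are 1/2.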
Alex,
Thanks for posting this. Re problem 1: Do you get this problem b/c of the correspondence between predicates and kinds? I.e., could you redescribe the problem as follows: The Carnap measure is not invariant under increase of natural kinds? If so, that doesn’t seem to be a problem.
Re problem 2: I don't follow this. If we assume that the identity of indiscernibles holds for our evidence set then I'm not surprised that we lose inductive confirmation (if only b/c we lose distinctness in our evidence set). I think, though, I must misunderstand your thought. What do you mean by assuming that the identity of indiscernibles holds for the evidence set? It seems like you want the evidence set to include distinct individuals for which we know they have certain properties but we lack knowledge about whether one individual has a relevant property. But then I don't see what significance to attach to the claim that the identity of indiscernibles holds for the set.
Ted:
Re. 1:
Well, I don't know if natural kinds are the only properties. Does positively charged count as a natural kind?
In any case, this is supposed to be a logic for epistemic purposes, so we probably shouldn't presuppose a fixed set of properties or kinds, since we don't yet know all the contents of the set of properties or kinds.
Things seem to get even worse if we extend Carnap's system to have determinables. I think (I never wrote down the proof, but it seemed clear when I thought about it): as soon as we get a single determinable that has an infinite set of possible values (e.g., mass), induction stops working--even for other properties.
Re. 2:
Identity of indiscernibles holds for a set of objects provided that for every pair of distinct x and y, there is some property (in the set of properties that we're working with) that is had by one of x and y but not by the other.
E.g., suppose our evidence is:
Object 1: raven, black, charged, female, smart
Object 2: raven, black, not charged, female, smart
Object 3: raven, black, charged, not female, smart
Object 4: raven, black, not charged, not female, not smart
Object 5: raven, black, not charged, not female, smart
Object 6: raven, black, not charged, female, not smart
Object 7: raven, charged, female, not smart
Then, our evidence entails identity of indiscernibles (relative to the set of properties {raven,black,charged,female,smart}) for the relevant objects. And we get:
PC(object 7 is black | evidence) = 1/2.
Which was also our prior for object 7 being black. So, according to Carnap, all the evidence turned out to be useless for purposes of inferring the blackness of object 7. Yet, clearly, the evidence makes it more probable than not that object 7 is black.
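The seven-object example is too large to enumerate (there are 2^35 states), but only two states are compatible with the evidence, so the closed form PC({s}) = 1/(|[s]||S*|) with |[s]| = n!/(c_1!...c_m!) is enough. The sketch below assumes a language with exactly these seven names and five predicates. For contrast it also computes a hypothetical variant, not in the example above, in which object 7 matches object 1 on charge, sex and smartness.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial

# Columns: raven, black, charged, female, smart (1 = has the property).
OBSERVED = [
    (1, 1, 1, 1, 1),   # object 1
    (1, 1, 0, 1, 1),   # object 2
    (1, 1, 1, 0, 1),   # object 3
    (1, 1, 0, 0, 0),   # object 4
    (1, 1, 0, 0, 1),   # object 5
    (1, 1, 0, 1, 0),   # object 6
]

def pc_state(rows, n_predicates=5):
    """PC of one fully specified state, given one row per name."""
    n = len(rows)
    cells = 2 ** n_predicates
    class_size = factorial(n)
    for c in Counter(rows).values():
        class_size //= factorial(c)
    return Fraction(1, class_size * comb(n + cells - 1, n))

def posterior_black(charged, female, smart):
    """PC(object 7 is black | evidence) for a raven with the given traits."""
    w_black = pc_state(OBSERVED + [(1, 1, charged, female, smart)])
    w_other = pc_state(OBSERVED + [(1, 0, charged, female, smart)])
    return w_black / (w_black + w_other)

as_in_example = posterior_black(charged=1, female=1, smart=0)
like_object_1 = posterior_black(charged=1, female=1, smart=1)   # hypothetical variant
print(as_in_example, like_object_1)   # 1/2 2/3
```

Object 7 as described gets 1/2. In the hypothetical variant it gets 2/3, because a black object 7 would then be indiscernible from object 1, which is the only way the Carnap measure can register the resemblance.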
That's cool! Thanks for the clarification.
These objections to Carnapian inductive probability are not new. Patrick Maher discusses (and responds to) various criticisms of Carnap's project in the following forthcoming paper:
http://patrick.maher1.net/preprints/eoip.pdf
It is unfortunate that Tooley is using Carnap's early systems here, and not his later systems, which, as Maher explains, are much more interesting (and also immune to the criticisms presented here).
That's a very helpful reference, thanks!
Based on the Maher piece, I suspect that even Carnap's more sophisticated systems will exhibit some similar problems. Here's one. Take some family of properties {F_1,...,F_n}. Carnap's system makes the piece of evidence that F_1(a_1)&...&F_1(a_n) support F_1(a_0). My conjecture now is this. Partition the properties more finely. So, maybe F_i is the exclusive disjunction F_{i,1},...,F_{i,k}. We now have a richer family of nk properties. Suppose, further, that the family is so fine-grained that no two of the a_i satisfy the same member of the family. (This will almost always be doable if the properties are on a continuum.) Let E* be the enriched evidence obtained by saying which of the finer-grained properties a_1,...,a_n have. I suspect that if we use Carnap's probability measure on the richer family and try to compute P(F_1(a_0)|E*)=P(F_{1,1}(a_0) or ... or F_{1,k}(a_0)|E*) we will just get something rather closer to our prior for F_1(a_0), and if we continue to do this, letting k go to infinity, we will converge to that prior. I could be wrong about this--I need to go to the library and see the later Carnap work to check.
Actually, it seems I was wrong! My conjecture as stated above seems false. But there still seem to be problems.
Problem A:
Let's say we have some family F_i of properties with a trillion members, and let's say we've observed ten particulars, a_1,...,a_10, finding that each of them has a different F_i. We haven't observed a_0 yet. A reasonable inference to make, I think, would be that a_0 doesn't have any of the F_i that are had by a_1,...,a_10. But by my calculations, according to Theorem 5 in Maher, the probability that a_0 has one of those F_i is about 10/(10+lambda). Intuitively, that seems way too high, no? (Let's make lambda=2. If lambda is higher, then just raise the number of F_i's, and raise the number of observed items.)
Let's say there are a trillion shades of brown, and we've observed ten deer and found each has a different shade of brown. There is an unobserved deer. Surely we shouldn't infer that probably the unobserved deer has one of the ten observed shades of brown.
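The figure 10/(10+lambda) can be reproduced as follows. This is a sketch under two assumptions that are not spelled out above: that the rule in question is the usual lambda-gamma predictive rule, P(F_i(a_0)|E) = (n_i + lambda*gamma_i)/(n + lambda), where n_i counts the observed particulars with F_i, and that the priors gamma_i are equal across the trillion properties.

```python
from fractions import Fraction

def predictive(count_i, n_observed, gamma_i, lam):
    """Assumed lambda-gamma rule: (n_i + lam * gamma_i) / (n + lam)."""
    return (count_i + lam * gamma_i) / (n_observed + lam)

FAMILY_SIZE = 10**12                     # a trillion properties
gamma = Fraction(1, FAMILY_SIZE)         # equal priors (an assumption)
lam = 2
n_observed = 10                          # ten particulars, ten different F_i

# Probability that a_0 has one of the ten already observed properties.
p_repeat = sum(predictive(1, n_observed, gamma, lam) for _ in range(n_observed))
print(float(p_repeat))                   # roughly 10/12
```

With these numbers the prior contribution is negligible, and the probability of a repeat is about 0.83.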
Problem B:
Suppose that we've observed a thousand deer, and they're all brown. But they are each a slightly different shade of brown, at our very high resolution on which we have a trillion shades of brown.
The Carnap measures (for moderate lambda) will assign a pretty high probability that Bambi, whom we haven't observed, is brown, given the evidence. But here is the perverse thing. Write the claim Brown(Bambi) as the disjunction: Brown1(Bambi) or Brown2(Bambi), where Brown1 is the disjunction of all the shades of brown observed in the other deer while Brown2 is the disjunction of all the other shades of brown. Then the Carnap measures will assign probability close to 1 to Brown(Bambi). But the reason they will do this is because they will assign probability close to 1 to Brown1(Bambi), while they assign probability close to 0 to Brown2(Bambi). However, intuitively, while we are pretty sure Bambi will be brown, given the wide diversity of brownness in the deer population, it isn't the case that we're pretty sure Bambi will have one of the already observed shades of brown. In fact, we might be pretty sure he won't.
Suppose now somebody comes and tells us that Bambi isn't any of the shades of brown that the existing deer are. Then on Carnap's measure, our credence in Brown(Bambi) will, I think (I didn't check carefully), jump to pretty close to our prior for Brown(Bambi).
Maybe then if we're to use the Carnap measure, we need to discard our finer data (shades of brown) and partition more coarsely. That seems a kludge, and it ends up discarding some of our evidence.
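Problem B can be checked with the same kind of calculation. The sketch below again assumes the lambda-gamma rule (n_i + lambda*gamma_i)/(n + lambda) with equal priors. The total size of the color family is not given above, so the code uses a hypothetical family of ten trillion colors of which one trillion are shades of brown.

```python
from fractions import Fraction

TOTAL_COLORS = 10**13        # hypothetical size of the whole family
BROWN_SHADES = 10**12        # a trillion shades of brown
N_DEER = 1000                # observed deer, each a different shade of brown
LAM = 2

gamma = Fraction(1, TOTAL_COLORS)          # equal priors (an assumption)
denom = N_DEER + LAM

# Brown1: Bambi has one of the 1000 observed shades (each seen once).
brown1 = N_DEER * (1 + LAM * gamma) / denom
# Brown2: Bambi has one of the unobserved shades of brown (each seen zero times).
brown2 = (BROWN_SHADES - N_DEER) * (LAM * gamma) / denom

prior_brown = BROWN_SHADES * gamma
# Credence in Brown(Bambi) after learning Bambi has none of the observed shades.
brown_given_not_brown1 = brown2 / (1 - brown1)

print(float(brown1), float(brown2))
print(float(prior_brown), float(brown_given_not_brown1))
```

On these assumptions Brown1 gets about 0.998 and Brown2 about 0.0002, and conditioning on the falsity of Brown1 sends the credence in Brown(Bambi) back to within a hair of the prior of 0.1.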
Set lambda=2. Take any family of incompatible properties, with however many members you like, and any regular priors. Now take any four particulars. According to Theorem 5 with lambda=2 (which is what Maher argues for), the prior probability that at least two of the four particulars will share a property from the family is greater than 0.8.
That surely isn't right. If we have a family of around a million incompatible properties (maybe weights measured up to six significant digits), and we take four particulars and find that they share one of the properties in the family, we ought to be surprised. But on the Carnap measure, we ought to be mildly surprised when they don't.
This is the birthday paradox with a vengeance.
There are deeper problems than these, for the case in which we have more than two properties. Maher himself gives these stronger objections! See:
http://www.jstor.org/pss/20013083
So, Maher is only defending the Carnapian explanation for the case of two properties. It is an open question whether this can be extended to a plausible model for multiple properties. See the paper above for Maher's own objections to the existing attempts to extend the later Carnapian models to the n > 2 case.
Thanks for this reference, too! As you can see, I really know nothing about this material (apart from a bit of reading of Carnap back in the first year of grad school).
By "more than two properties" do you mean "more than two property families" or "more than two properties per family"?
In the MS you linked to earlier, Maher is apparently defending Carnap for the case of a single property family, but with an arbitrary number of incompatible properties in the family. If so, then the objections in my previous two comments apply, since they are predicated on a single predicate family with lots of properties in it. And that's the case that Tooley needs (he's interested in the family of all maximal right- and wrong-making properties); he doesn't need to worry about the case of multiple property families.
In the Erkenntnis piece, it seems we get discussions of the case of three or more property families with two properties in each, as well as of the case of two or more property families with three or more properties in them.
Sorry, I guess I'm not following the latest objection (and/or feeling its pull). Can you write it out a bit more explicitly?
OK, let me try to be more explicit. The objection is to the Carnap system in the Maher preprint. This system is supposed to work for a single family of mutually exclusive and jointly exhaustive properties. And this is what Tooley needs. I will suppose lambda = 2 (if lambda = 1, it won't change things much).
So, let F = { F_1,...,F_n } be any family of mutually exclusive and jointly exhaustive properties, with n at least 2. Suppose that the priors P(F_i(a)) are all non-zero. Let PC be the Carnap prior probability measure. Let the a_i be particulars.
For instance, maybe F_i(a) iff a has a birthday on the ith day of the year. Or maybe F_i(a) iff a's weight is equal to i/10000000 kg (maybe we let n=1000000000, and we restrict the particulars to human beings).
Objection 1: Let a_1,...,a_k be any k distinct particulars. Let SameF be the event that there exists i, j, l, with j and l distinct, such that F_i(a_j) and F_i(a_l). Then PC(SameF) > 0.8 if k > 3. (It's an easy calculation.) But this is counterintuitive, isn't it? In the weight or birthday cases, we would, a priori, not expect four individuals to have among them two who have the same weight up to the precision in question. And it is particularly counterintuitive that the 0.8 lower bound is correct independently of the number n of members of the property family. With a really large property family, we expect it to be significantly less likely to get repeats than with a small property family.
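The easy calculation can be spelled out. Under the lambda-gamma rule (assumed here to be the rule the system uses), if m particulars have been seen and all differ, the chance that the next one differs from all of them is at most lambda/(m+lambda). So for four particulars and lambda=2 the chance that all differ is at most (2/3)(2/4)(2/5) = 2/15, whatever n is. The sketch below computes the exact value for equal priors (an assumption) and compares it with the ordinary birthday calculation for independent uniform draws.

```python
from fractions import Fraction

def p_same_carnap(k, n, lam=2):
    """P(SameF) for k particulars, n equiprobable properties, lambda-gamma rule."""
    p_distinct = Fraction(1)
    for m in range(k):
        # m distinct properties seen so far; the next one must be new:
        # lam * (1 - m/n) / (m + lam)
        p_distinct *= Fraction(lam * (n - m), n * (m + lam))
    return 1 - p_distinct

def p_same_independent(k, n):
    """Classical birthday probability: independent uniform draws."""
    p_distinct = Fraction(1)
    for m in range(k):
        p_distinct *= Fraction(n - m, n)
    return 1 - p_distinct

for n in (365, 10**9):
    print(n, float(p_same_carnap(4, n)), float(p_same_independent(4, n)))
```

For both the birthday family and the billion-member weight family the Carnap value stays above 13/15, while the independent-draws value falls from under two percent to practically zero.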
Objection 2: Let the F_i be colors enumerated to 8 bits per color channel, so n=16777216, so that F_i(a) iff the average color of a's surface is the ith color. Let Brown be a disjunction of all the brown colors (just make some arbitrary judgment around the boundaries). Suppose that we've observed 9 deer, a_1,...,a_9 and measured their colors to the requisite precision, and all these colors are brown. This evidence is E=F_{i_1}(a_1)&...&F_{i_9}(a_9). And let a_10 be the tenth deer. Then PC(Brown(a_10)|E) is close to one, just as we want it to be. But it's high for the wrong reason. Let Brown1 be the disjunction of F_{i_1},...,F_{i_9}. Let Brown2 be the disjunction of all the other brown colors (there are several orders of magnitude more colors disjoined in Brown2 than in Brown1). Then PC(Brown1(a_10)|E) is close to 1, while PC(Brown2(a_10)|E) is close to 0. Thus, PC predicts that the tenth deer is brown only because PC predicts that the tenth deer has the exact same color as one of the observed deer. But this is, surely, mistaken: we do not expect the tenth deer to have the exact same color as one of the observed deer, at least not to this precision of color measurements. This may, however, simply be a special case of the general problem that Maher notes that Carnap's system doesn't work so well in cases where similarities between properties matter.
I thought of trying to handle Objection 1 by making lambda depend in some way on n (so it's large for large n). I worry that if we do that, then it may take too long to get confirmation for disjunctive properties like Brown, though maybe there is some sweet spot for lambda as a function of n.
Thanks. That helps. I encourage you to ask Patrick what he thinks about these sorts of examples. I think Carnap's project fails for other reasons, so I don't really have a dog in this fight anyway.