Discrete Mathematics and Probability Theory

Fall 2009

Satish Rao, David Tse

Note 11

Conditional Probability

A pharmaceutical company is marketing a new test for a certain medical condition. According to clinical trials, the test has the following properties:

1. When applied to an affected person, the test comes up positive in 90% of cases, and negative in 10% (these are called “false negatives”).

2. When applied to a healthy person, the test comes up negative in 80% of cases, and positive in 20% (these are called “false positives”).

Suppose that the incidence of the condition in the US population is 5%. When a random person is tested and the test comes up positive, what is the probability that the person actually has the condition? (Note that this is presumably not the same as the simple probability that a random person has the condition, which is just 1/20.)

This is an example of a conditional probability: we are interested in the probability that a person has the condition (event A) given that he/she tests positive (event B). Let’s write this as Pr[A|B].

How should we define Pr[A|B]? Well, since event B is guaranteed to happen, we should look not at the whole sample space Ω, but at the smaller sample space consisting only of the sample points in B. What should the conditional probabilities of these sample points be? If they all simply inherit their probabilities from Ω, then the sum of these probabilities will be $\sum_{\omega \in B} \Pr[\omega] = \Pr[B]$, which in general is less than 1. So we should normalize the probability of each sample point by 1/Pr[B]. I.e., for each sample point ω ∈ B, the new probability becomes

\[ \Pr[\omega \mid B] = \frac{\Pr[\omega]}{\Pr[B]}. \]

Now it is clear how to define Pr[A|B]: namely, we just sum up these normalized probabilities over all sample points that lie in both A and B:

\[ \Pr[A \mid B] := \sum_{\omega \in A \cap B} \Pr[\omega \mid B] = \sum_{\omega \in A \cap B} \frac{\Pr[\omega]}{\Pr[B]} = \frac{\Pr[A \cap B]}{\Pr[B]}. \]

Definition 11.1 (conditional probability): For events A, B in the same probability space, such that Pr[B] > 0, the conditional probability of A given B is

\[ \Pr[A \mid B] := \frac{\Pr[A \cap B]}{\Pr[B]}. \]

Let’s go back to our medical testing example. The sample space here consists of all people in the US — denote their number by N (so N ≈ 250 million). The population consists of four disjoint subsets:

CS 70, Fall 2009, Note 11

TP: the true positives (90% of N/20 = 9N/200 of them);

FP: the false positives (20% of 19N/20 = 19N/100 of them);

TN: the true negatives (80% of 19N/20 = 76N/100 of them);

FN: the false negatives (10% of N/20 = N/200 of them).

Now let A be the event that a person chosen at random is affected, and B the event that he/she tests positive.

Note that B is the union of the disjoint sets TP and FP, so

\[ |B| = |\mathit{TP}| + |\mathit{FP}| = \frac{9N}{200} + \frac{19N}{100} = \frac{47N}{200}. \]

Thus we have

\[ \Pr[A] = \frac{1}{20} \quad \text{and} \quad \Pr[B] = \frac{47}{200}. \]

Now when we condition on the event B, we focus in on the smaller sample space consisting only of those 47N/200 individuals who test positive. To compute Pr[A|B], we need to figure out Pr[A ∩ B] (the part of A that lies in B). But A ∩ B is just the set of people who are both affected and test positive, i.e., A ∩ B = TP. So we have

\[ \Pr[A \cap B] = \frac{|\mathit{TP}|}{N} = \frac{9}{200}. \]

Finally, we conclude from the definition of conditional probability that

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{9/200}{47/200} = \frac{9}{47} \approx 0.19. \]

This seems bad: if a person tests positive, there’s only about a 19% chance that he/she actually has the condition! This sounds worse than the original claims made by the pharmaceutical company, but in fact it’s just another view of the same data.

[Incidentally, note that Pr[B|A] = (9/200)/(1/20) = 9/10; so Pr[A|B] and Pr[B|A] can be very different. Of course, Pr[B|A] is just the probability that a person tests positive given that he/she has the condition, which we knew from the start was 90%.]
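As a rough check of the numbers above, the following Python sketch applies Definition 11.1 directly to a four-point sample space, one point for each of the subsets TP, FN, FP and TN. The names `space`, `prob` and `conditional_probability` are illustrative assumptions and do not come from the note; exact fractions are used so that the results can be compared with 9/47 and 9/10.

```python
from fractions import Fraction

# Hypothetical four-point sample space: each point is (affected?, tests positive?).
# Probabilities follow the note: 5% incidence, 90% of affected people test
# positive, 20% of healthy people test positive.
space = {
    (True, True): Fraction(1, 20) * Fraction(9, 10),     # TP: 9/200
    (True, False): Fraction(1, 20) * Fraction(1, 10),    # FN: 1/200
    (False, True): Fraction(19, 20) * Fraction(2, 10),   # FP: 19/100
    (False, False): Fraction(19, 20) * Fraction(8, 10),  # TN: 76/100
}


def prob(event):
    """Sum the probabilities of the sample points for which event(point) holds."""
    return sum(p for point, p in space.items() if event(point))


def conditional_probability(a, b):
    """Pr[A|B] = Pr[A ∩ B] / Pr[B], defined only when Pr[B] > 0."""
    pr_b = prob(b)
    if pr_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda w: a(w) and b(w)) / pr_b


def affected(w):
    return w[0]


def positive(w):
    return w[1]


pr_a_given_b = conditional_probability(affected, positive)  # Pr[A|B]
pr_b_given_a = conditional_probability(positive, affected)  # Pr[B|A]
print(pr_a_given_b, pr_b_given_a)
```

Working with an explicit table of sample points keeps the sketch close to the definition; for larger spaces one would more likely work with the event probabilities directly.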

To complete the picture, what’s the (unconditional) probability that the test gives a correct result (positive or negative) when applied to a random person? Call this event C. Then

\[ \Pr[C] = \frac{|\mathit{TP}| + |\mathit{TN}|}{N} = \frac{9}{200} + \frac{76}{100} = \frac{161}{200} \approx 0.8. \]

So the test is about 80% effective overall, a more impressive statistic.

But how impressive is it? Suppose we ignore the test and just pronounce everybody to be healthy. Then we would be correct on 95% of the population (the healthy ones), and wrong on the affected 5%. I.e., this trivial test is 95% effective! So we might ask if it is worth running the test at all. What do you think?

Here are a couple more examples of conditional probabilities, based on some of our sample spaces from the previous lecture note.

1. Balls and bins. Suppose we toss m = 3 balls into n = 3 bins; this is a uniform sample space with $3^3 = 27$ points. We already know that the probability the first bin is empty is $(1 - \frac{1}{3})^3 = (\frac{2}{3})^3 = \frac{8}{27}$. What is the probability of this event given that the second bin is empty? Call these events A, B respectively. To compute Pr[A|B] we need to figure out Pr[A ∩ B]. But A ∩ B is the event that both the first two bins are empty, i.e., all three balls fall in the third bin. So Pr[A ∩ B] = 1/27 (why?). Therefore,

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{1/27}{8/27} = \frac{1}{8}. \]

Not surprisingly, 1/8 is quite a bit less than 8/27: knowing that bin 2 is empty makes it significantly less likely that bin 1 will be empty.

2. Dice. Roll two fair dice. Let A be the event that their sum is even, and B the event that the first die is even. By symmetry it’s easy to see that Pr[A] = 1/2 and Pr[B] = 1/2. Moreover, a little counting gives us that Pr[A ∩ B] = 1/4. What is Pr[A|B]? Well,

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{1/4}{1/2} = \frac{1}{2}. \]

In this case, Pr[A|B] = Pr[A], i.e., conditioning on B does not change the probability of A.
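Both examples live on small uniform sample spaces, so the conditional probabilities can be cross-checked by brute-force counting. The sketch below is one way to do that; the helper `conditional` and the encodings of the sample points (a tuple of bin numbers, a tuple of die faces) are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def conditional(space, a, b):
    """Pr[A|B] on a uniform sample space, by counting: |A ∩ B| / |B|."""
    in_b = [w for w in space if b(w)]
    in_both = [w for w in in_b if a(w)]
    return Fraction(len(in_both), len(in_b))


# Balls and bins: w[i] is the bin (1, 2 or 3) chosen by ball i; 3^3 = 27 points.
bins_space = list(product([1, 2, 3], repeat=3))
bins_result = conditional(
    bins_space,
    lambda w: 1 not in w,   # A: bin 1 is empty
    lambda w: 2 not in w,   # B: bin 2 is empty
)

# Two fair dice: 36 equally likely points (first die, second die).
dice_space = list(product(range(1, 7), repeat=2))
dice_result = conditional(
    dice_space,
    lambda w: (w[0] + w[1]) % 2 == 0,   # A: the sum is even
    lambda w: w[0] % 2 == 0,            # B: the first die is even
)

print(bins_result, dice_result)
```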

Bayesian Inference

The medical test problem is a canonical example of an inference problem: given a noisy observation (the result of the test), we want to figure out the likelihood of something not directly observable (whether a person is healthy). To bring out the common structure of such inference problems, let us redo the calculations in the medical test example but only in terms of events without explicitly mentioning the sample points of the underlying sample space.

Recall: A is the event the person is affected, B is the event that the test is positive. What are we given?

• Pr[A] = 0.05 (5% of the U.S. population is affected)

• Pr[B|A] = 0.9 (90% of the affected people test positive)

• $\Pr[B \mid \overline{A}] = 0.2$ (20% of healthy people test positive)

We want to calculate Pr[A|B]. We can proceed as follows:

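\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\Pr[B \mid A]\,\Pr[A]}{\Pr[B]} \tag{1} \]

and

\[ \Pr[B] = \Pr[A \cap B] + \Pr[\overline{A} \cap B] = \Pr[B \mid A]\,\Pr[A] + \Pr[B \mid \overline{A}]\,(1 - \Pr[A]). \tag{2} \]

Combining equations (1) and (2), we have expressed Pr[A|B] in terms of Pr[A], Pr[B|A] and $\Pr[B \mid \overline{A}]$:

\[ \Pr[A \mid B] = \frac{\Pr[B \mid A]\,\Pr[A]}{\Pr[B \mid A]\,\Pr[A] + \Pr[B \mid \overline{A}]\,(1 - \Pr[A])}. \tag{3} \]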

This equation is useful for many inference problems. We are given Pr[A], which is the (unconditional) probability that the event of interest A happens. We are given Pr[B|A] and $\Pr[B \mid \overline{A}]$, which quantify how noisy the observation is. (If Pr[B|A] = 1 and $\Pr[B \mid \overline{A}] = 0$, for example, the observation is completely noiseless.)

Now we want to calculate Pr[A|B], the probability that the event of interest happens given we made the observation. Equation (3) allows us to do just that.
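Equation (3) translates almost line for line into code. The following is a minimal sketch, assuming the three inputs are supplied as exact fractions; the function name `posterior` and its parameter names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def posterior(prior, pr_b_given_a, pr_b_given_not_a):
    """Equation (3): Pr[A|B] from Pr[A], Pr[B|A] and Pr[B|not A]."""
    # Equation (2) gives the denominator Pr[B].
    pr_b = pr_b_given_a * prior + pr_b_given_not_a * (1 - prior)
    if pr_b == 0:
        raise ValueError("Pr[B] = 0, so Pr[A|B] is undefined")
    # Equation (1) then gives Pr[A|B].
    return pr_b_given_a * prior / pr_b


# The medical test: Pr[A] = 1/20, Pr[B|A] = 9/10, Pr[B|not A] = 1/5.
medical = posterior(Fraction(1, 20), Fraction(9, 10), Fraction(1, 5))

# A completely noiseless observation: Pr[B|A] = 1 and Pr[B|not A] = 0.
noiseless = posterior(Fraction(1, 20), Fraction(1), Fraction(0))

print(medical, noiseless)
```

In the noiseless case the posterior is 1 regardless of the prior (as long as the prior is positive), which matches the intuition that a perfect observation settles the question.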

Equation (3) is at the heart of a subject called Bayesian inference, used extensively in fields such as machine learning, communications and signal processing. The equation can be interpreted as a way to update knowledge after making an observation. In this interpretation, Pr[A] can be thought of as a prior probability: our assessment of the likelihood of an event of interest A before making an observation. It reflects our prior knowledge. Pr[A|B] can be interpreted as the posterior probability of A after the observation. It reflects our new knowledge.

Of course, equations (1), (2) and (3) are derived from the basic axioms of probability and the definition of conditional probability, and are therefore true with or without the above Bayesian inference interpretation. However, this interpretation is very useful when we apply probability theory to study inference problems.

Bayes’ Rule and Total Probability Rule

Equations (1) and (2) are very useful in their own right. The first is called Bayes’ Rule and the second is called the Total Probability Rule. Bayes’ Rule is useful when one wants to calculate Pr[A|B] but one is given Pr[B|A] instead, i.e., it allows us to “flip” things around. The Total Probability Rule is an application of the strategy of “dividing into cases” we learnt in Note 2 to calculating probabilities. We want to calculate the probability of an event B. There are two possibilities: either an event A happens or A does not happen. If A happens, the probability that B happens is Pr[B|A]. If A does not happen, the probability that B happens is $\Pr[B \mid \overline{A}]$. If we know or can easily calculate these two probabilities and also Pr[A], then the Total Probability Rule yields the probability of event B.

Independent events

Definition 11.2 (independence): Two events A, B in the same probability space are independent if Pr[A ∩ B] = Pr[A] × Pr[B].

The intuition behind this definition is the following. Suppose that Pr[B] > 0. Then we have

\[ \Pr[A \mid B] = \frac{\Pr[A \cap B]}{\Pr[B]} = \frac{\Pr[A] \times \Pr[B]}{\Pr[B]} = \Pr[A]. \]

Thus independence has the natural meaning that “the probability of A is not affected by whether or not B occurs.” (By a symmetrical argument, we also have Pr[B|A] = Pr[B] provided Pr[A] > 0.) For events A, B such that Pr[B] > 0, the condition Pr[A|B] = Pr[A] is actually equivalent to the definition of independence.

Examples: In the balls and bins example above, events A, B are not independent. In the dice example, events A, B are independent.

The above definition generalizes to any finite set of events:

Definition 11.3 (mutual independence): Events $A_1, \ldots, A_n$ are mutually independent if for every subset $I \subseteq \{1, \ldots, n\}$,

\[ \Pr\Bigl[\bigcap_{i \in I} A_i\Bigr] = \prod_{i \in I} \Pr[A_i]. \]

Note that we need this property to hold for every subset I.

For mutually independent events $A_1, \ldots, A_n$, it is not hard to check from the definition of conditional probability that, for any 1 ≤ i ≤ n and any subset $I \subseteq \{1, \ldots, n\} \setminus \{i\}$, we have

\[ \Pr\Bigl[A_i \Bigm| \bigcap_{j \in I} A_j\Bigr] = \Pr[A_i]. \]

Note that the independence of every pair of events (so-called pairwise independence) does not necessarily imply mutual independence. For example, it is possible to construct three events A, B,C such that each pair is independent but the triple A, B,C is not mutually independent.
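The note does not spell out such a construction. One standard choice, used here purely as an illustration, takes two fair coin tosses with A = "first toss is heads", B = "second toss is heads" and C = "the two tosses agree". The sketch below checks Definition 11.3 on every subset of size at least two; the helper names are assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

space = list(product("HT", repeat=2))   # 4 equally likely outcomes


def prob(event):
    """Probability of an event on the uniform space, by counting."""
    return Fraction(sum(1 for w in space if event(w)), len(space))


events = {
    "A": lambda w: w[0] == "H",      # first toss is heads
    "B": lambda w: w[1] == "H",      # second toss is heads
    "C": lambda w: w[0] == w[1],     # the two tosses agree
}


def independent(names):
    """Check Pr[intersection] == product of the Pr[A_i] for the named events."""
    joint = prob(lambda w: all(events[n](w) for n in names))
    expected = Fraction(1)
    for n in names:
        expected *= prob(events[n])
    return joint == expected


# Every pair satisfies the product condition ...
pairwise = all(independent(pair) for pair in combinations(events, 2))

# ... but the condition fails for the full triple, so the events are not
# mutually independent.
mutual = all(
    independent(subset)
    for r in range(2, len(events) + 1)
    for subset in combinations(events, r)
)

print(pairwise, mutual)
```

Here each pairwise intersection has probability 1/4 = 1/2 × 1/2, while A ∩ B ∩ C is the single outcome HH with probability 1/4 rather than 1/8.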

Combinations of events

In most applications of probability in Computer Science, we are interested in things like $\Pr[\bigcup_{i=1}^{n} A_i]$ and $\Pr[\bigcap_{i=1}^{n} A_i]$, where the $A_i$ are simple events (i.e., we know, or can easily compute, the $\Pr[A_i]$). The intersection $\bigcap_i A_i$ corresponds to the logical AND of the events $A_i$, while the union $\bigcup_i A_i$ corresponds to their logical OR.

As an example, if $A_i$ denotes the event that a failure of type i happens in a certain system, then $\bigcup_i A_i$ is the event that the system fails.

In general, computing the probabilities of such combinations can be very difficult. In this section, we discuss some situations where it can be done.

Intersections of events

From the definition of conditional probability, we immediately have the following product rule (sometimes also called the chain rule) for computing the probability of an intersection of events.

Theorem 11.1: [Product Rule] For any events A, B, we have

\[ \Pr[A \cap B] = \Pr[A]\,\Pr[B \mid A]. \]

More generally, for any events $A_1, \ldots, A_n$,

\[ \Pr\Bigl[\bigcap_{i=1}^{n} A_i\Bigr] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2] \times \cdots \times \Pr\Bigl[A_n \Bigm| \bigcap_{i=1}^{n-1} A_i\Bigr]. \]

Proof: The first assertion follows directly from the definition of Pr[B|A] (and is in fact a special case of the second assertion with n = 2).

To prove the second assertion, we will use induction on n (the number of events). The base case is n = 1, and corresponds to the statement that Pr[A] = Pr[A], which is trivially true. For the inductive step, let n > 1 and assume (the inductive hypothesis) that

\[ \Pr\Bigl[\bigcap_{i=1}^{n-1} A_i\Bigr] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \cdots \times \Pr\Bigl[A_{n-1} \Bigm| \bigcap_{i=1}^{n-2} A_i\Bigr]. \]

Now we can apply the definition of conditional probability to the two events $A_n$ and $\bigcap_{i=1}^{n-1} A_i$ to deduce that

\[
\begin{aligned}
\Pr\Bigl[\bigcap_{i=1}^{n} A_i\Bigr] &= \Pr\Bigl[A_n \cap \Bigl(\bigcap_{i=1}^{n-1} A_i\Bigr)\Bigr] = \Pr\Bigl[A_n \Bigm| \bigcap_{i=1}^{n-1} A_i\Bigr] \times \Pr\Bigl[\bigcap_{i=1}^{n-1} A_i\Bigr] \\
&= \Pr\Bigl[A_n \Bigm| \bigcap_{i=1}^{n-1} A_i\Bigr] \times \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \cdots \times \Pr\Bigl[A_{n-1} \Bigm| \bigcap_{i=1}^{n-2} A_i\Bigr],
\end{aligned}
\]

where in the last line we have used the inductive hypothesis. This completes the proof by induction. $\Box$
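The product rule can also be sanity-checked numerically on a small space. The sketch below assumes a hypothetical six-card toy deck (three Hearts, three Spades) from which three cards are drawn in order, and compares both sides of Theorem 11.1 for n = 3 by counting; the deck and the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical toy deck: three Hearts and three Spades.
deck = ["H1", "H2", "H3", "S1", "S2", "S3"]
space = list(permutations(deck, 3))   # ordered draws of 3 cards, equally likely


def prob(event):
    """Probability of an event on the uniform space, by counting."""
    return Fraction(sum(1 for w in space if event(w)), len(space))


def cond(a, b):
    """Pr[A|B] = Pr[A ∩ B] / Pr[B]."""
    return prob(lambda w: a(w) and b(w)) / prob(b)


def heart(i):
    """Event A_i: the i-th card drawn (1-based) is a Heart."""
    return lambda w: w[i - 1].startswith("H")


a1, a2, a3 = heart(1), heart(2), heart(3)


def a1_and_a2(w):
    return a1(w) and a2(w)


# Left-hand side: Pr[A1 ∩ A2 ∩ A3], counted directly.
lhs = prob(lambda w: a1(w) and a2(w) and a3(w))

# Right-hand side: Pr[A1] × Pr[A2|A1] × Pr[A3|A1 ∩ A2].
rhs = prob(a1) * cond(a2, a1) * cond(a3, a1_and_a2)

print(lhs, rhs)
```

The three factors on the right come out as 3/6, 2/5 and 1/4, the familiar "remaining Hearts over remaining cards" ratios.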

The product rule is particularly useful when we can view our sample space as a sequence of choices. The next few examples illustrate this point.

1. Coin tosses. Toss a fair coin three times. Let A be the event that all three tosses are heads. Then $A = A_1 \cap A_2 \cap A_3$, where $A_i$ is the event that the ith toss comes up heads. We have

\[
\begin{aligned}
\Pr[A] &= \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2] \\
&= \Pr[A_1] \times \Pr[A_2] \times \Pr[A_3] \\
&= \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8}.
\end{aligned}
\]

The second line here follows from the fact that the tosses are mutually independent. Of course, we already know that Pr[A] = 1/8 from our definition of the probability space in the previous lecture note. The above is really a check that the space behaves as we expect. [1]

If the coin is biased with heads probability p, we get, again using independence,

\[ \Pr[A] = \Pr[A_1] \times \Pr[A_2] \times \Pr[A_3] = p^3. \]

And more generally, the probability of any sequence of n tosses containing r heads and n − r tails is $p^r (1 - p)^{n - r}$. This is in fact the reason we defined the probability space this way in the previous lecture note: we defined the sample point probabilities so that the coin tosses would behave independently.

2. Balls and bins. Let A be the event that bin 1 is empty. We saw in the previous lecture note (by counting) that $\Pr[A] = (1 - \frac{1}{n})^m$, where m is the number of balls and n is the number of bins. The product rule gives us a different way to compute the same probability. We can write $A = \bigcap_{i=1}^{m} A_i$, where $A_i$ is the event that ball i misses bin 1. Clearly $\Pr[A_i] = 1 - \frac{1}{n}$ for each i. Also, the $A_i$ are mutually independent since ball i chooses its bin regardless of the choices made by any of the other balls. So

\[ \Pr[A] = \Pr[A_1] \times \cdots \times \Pr[A_m] = \Bigl(1 - \frac{1}{n}\Bigr)^m. \]

3. Card shuffling. We can look at the sample space as a sequence of choices as follows. First the top card is chosen uniformly from all 52 cards, i.e., each card with probability 1/52. Then (conditional on the first card), the second card is chosen uniformly from the remaining 51 cards, each with probability 1/51. Then (conditional on the first two cards), the third card is chosen uniformly from the remaining 50, and so on. The probability of any given permutation, by the product rule, is therefore

\[ \frac{1}{52} \times \frac{1}{51} \times \frac{1}{50} \times \cdots \times \frac{1}{2} \times \frac{1}{1} = \frac{1}{52!}. \]

Reassuringly, this is in agreement with our definition of the probability space in the previous lecture note, based on counting permutations.

4. Poker hands. Again we can view the sample space as a sequence of choices. First we choose one of the cards (note that it is not the “first” card, since the cards in our hand have no ordering) uniformly from all 52 cards. Then we choose another card from the remaining 51, and so on. For any given poker hand, the probability of choosing it is (by the product rule):

\[ \frac{5}{52} \times \frac{4}{51} \times \frac{3}{50} \times \frac{2}{49} \times \frac{1}{48} = \frac{1}{\binom{52}{5}}, \]

just as before. Where do the numerators 5, 4, 3, 2, 1 come from? Well, for the given hand the first card we choose can be any of the five in the hand: i.e., five choices out of 52. The second can be any of the remaining four in the hand: four choices out of 51. And so on. This arises because the order of the cards in the hand is irrelevant.

Let’s use this view to compute the probability of a flush in a different way. Clearly this is 4 × Pr[A], where A is the event that the hand is a Hearts flush. And we can write $A = \bigcap_{i=1}^{5} A_i$, where $A_i$ is the event that the ith card we pick is a Heart. So we have

\[ \Pr[A] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \cdots \times \Pr\Bigl[A_5 \Bigm| \bigcap_{i=1}^{4} A_i\Bigr]. \]

Clearly $\Pr[A_1] = \frac{13}{52} = \frac{1}{4}$. What about $\Pr[A_2 \mid A_1]$? Well, since we are conditioning on $A_1$ (the first card is a Heart), there are only 51 remaining possibilities for the second card, 12 of which are Hearts. So $\Pr[A_2 \mid A_1] = \frac{12}{51}$. Similarly, $\Pr[A_3 \mid A_1 \cap A_2] = \frac{11}{50}$, and so on. So we get

\[ 4 \times \Pr[A] = 4 \times \frac{13}{52} \times \frac{12}{51} \times \frac{11}{50} \times \frac{10}{49} \times \frac{9}{48}, \]

which is exactly the same fraction we computed in the previous lecture note.

[1] Footnote to the coin tosses example: strictly speaking, we should really also have checked from our original definition of the probability space that $\Pr[A_1]$, $\Pr[A_2 \mid A_1]$ and $\Pr[A_3 \mid A_1 \cap A_2]$ are all equal to 1/2.

So now we have two methods of computing probabilities in many of our sample spaces. It is useful to keep these different methods around, both as a check on your answers and because in some cases one of the methods is easier to use than the other.

5. Monty Hall. Recall that we defined the probability of a sample point by multiplying the probabilities of the sequence of choices it corresponds to; thus, e.g.,

\[ \Pr[(1, 1, 2)] = \frac{1}{3} \times \frac{1}{3} \times \frac{1}{2} = \frac{1}{18}. \]

The reason we defined it this way is that we knew (from our model of the problem) the probabilities for each choice conditional on the previous one. Thus, e.g., the 1/2 in the above product is the probability that Carol opens door 2 conditional on the prize door being door 1 and the contestant initially choosing door 1. In fact, we used these conditional probabilities to define the probabilities of our sample points.
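The five examples above each state a product-rule result that can be cross-checked against direct counting. The following Python sketch does so with exact fractions. The specific parameter values for the biased coin and for the balls and bins (p = 1/3, four tosses, four balls, three bins) are arbitrary choices made for illustration, and the Monty Hall model (prize door and initial choice uniform and independent, Carol choosing uniformly among the doors she is allowed to open) is an assumption about the model defined in the previous lecture note.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

# 1. Biased coin: a sequence with r heads in n tosses has probability
#    p^r (1-p)^(n-r); these values should sum to 1 over all 2^n sequences.
p, n_tosses = Fraction(1, 3), 4          # hypothetical bias and length


def seq_prob(seq):
    r = seq.count("H")
    return p**r * (1 - p) ** (len(seq) - r)


coin_total = sum(seq_prob(s) for s in product("HT", repeat=n_tosses))
all_heads = seq_prob(("H",) * 3)         # should equal p^3

# 2. Balls and bins: Pr[bin 1 empty] by counting vs. (1 - 1/n)^m.
m_balls, n_bins = 4, 3                   # hypothetical sizes
throws = list(product(range(1, n_bins + 1), repeat=m_balls))
empty_by_counting = Fraction(sum(1 for w in throws if 1 not in w), len(throws))
empty_by_product = (1 - Fraction(1, n_bins)) ** m_balls

# 3. Card shuffling: 1/52 × 1/51 × ... × 1/1.
perm_prob = Fraction(1)
for k in range(52, 0, -1):
    perm_prob *= Fraction(1, k)

# 4. Poker: probability of one given hand, and of a flush, via the product rule.
hand_prob = Fraction(1)
hearts_flush = Fraction(1)
for j in range(5):
    hand_prob *= Fraction(5 - j, 52 - j)
    hearts_flush *= Fraction(13 - j, 52 - j)
flush_by_product = 4 * hearts_flush
flush_by_counting = Fraction(4 * comb(13, 5), comb(52, 5))

# 5. Monty Hall: sample points are (prize door, initial choice, opened door).
doors = (1, 2, 3)
monty = {}
for prize, choice in product(doors, repeat=2):
    openable = [d for d in doors if d != prize and d != choice]
    for opened in openable:
        monty[(prize, choice, opened)] = (
            Fraction(1, 3) * Fraction(1, 3) * Fraction(1, len(openable))
        )

print(coin_total, empty_by_product, flush_by_product, monty[(1, 1, 2)])
```

Agreement between the two columns of numbers is the same kind of check the note recommends: two methods of computing a probability, kept around to validate each other.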

Unions of events

You are in Las Vegas, and you spy a new game with the following rules. You pick a number between 1 and 6. Then three dice are thrown. You win if and only if your number comes up on at least one of the dice.

The casino claims that your odds of winning are 50%, using the following argument. Let A be the event that you win. We can write $A = A_1 \cup A_2 \cup A_3$, where $A_i$ is the event that your number comes up on die i. Clearly $\Pr[A_i] = \frac{1}{6}$ for each i. Therefore,

\[ \Pr[A] = \Pr[A_1 \cup A_2 \cup A_3] = \Pr[A_1] + \Pr[A_2] + \Pr[A_3] = 3 \times \frac{1}{6} = \frac{1}{2}. \]

Is this calculation correct? Well, suppose instead that the casino rolled six dice, and again you win iff your number comes up at least once. Then the analogous calculation would say that you win with probability 6 × 1/6 = 1, i.e., certainly! The situation becomes even more ridiculous when the number of dice gets bigger than 6.

The problem is that the events Ai are not disjoint: i.e., there are some sample points that lie in more than one of the Ai . (We could get really lucky and our number could come up on two of the dice, or all three.) So if we add up the Pr[Ai ] we are counting some sample points more than once.

Fortunately, there is a formula for this, known as the Principle of Inclusion/Exclusion:

Theorem 11.2: [Inclusion/Exclusion] For events $A_1, \ldots, A_n$ in some probability space, we have

\[ \Pr\Bigl[\bigcup_{i=1}^{n} A_i\Bigr] = \sum_{i=1}^{n} \Pr[A_i] - \sum_{\{i,j\}} \Pr[A_i \cap A_j] + \sum_{\{i,j,k\}} \Pr[A_i \cap A_j \cap A_k] - \cdots \pm \Pr\Bigl[\bigcap_{i=1}^{n} A_i\Bigr]. \]

[In the above summations, {i, j} denotes all unordered pairs with i ≠ j, {i, j, k} denotes all unordered triples of distinct elements, and so on.]

I.e., to compute $\Pr[\bigcup_i A_i]$, we start by summing the event probabilities $\Pr[A_i]$, then we subtract the probabilities of all pairwise intersections, then we add back in the probabilities of all three-way intersections, and so on. We won’t prove this formula here; but you might like to verify it for the special case n = 3 by drawing a Venn diagram and checking that every sample point in $A_1 \cup A_2 \cup A_3$ is counted exactly once by the formula. You might also like to prove the formula for general n by induction (in similar fashion to the proof of the Product Rule above).

Taking the formula on faith, what is the probability we get lucky in the new game in Vegas?

Pr[A1 ∪ A2 ∪ A3 ] = Pr[A1 ] + Pr[A2 ] + Pr[A3 ] − Pr[A1 ∩ A2 ] − Pr[A1 ∩ A3 ] − Pr[A2 ∩ A3 ] + Pr[A1 ∩ A2 ∩ A3 ].

Now the nice thing here is that the events $A_i$ are mutually independent (the outcome of any die does not depend on that of the others), so $\Pr[A_i \cap A_j] = \Pr[A_i]\,\Pr[A_j] = (\frac{1}{6})^2 = \frac{1}{36}$, and similarly $\Pr[A_1 \cap A_2 \cap A_3] = (\frac{1}{6})^3 = \frac{1}{216}$. So we get

\[ \Pr[A_1 \cup A_2 \cup A_3] = 3 \times \frac{1}{6} - 3 \times \frac{1}{36} + \frac{1}{216} = \frac{91}{216} \approx 0.42. \]

So your odds are quite a bit worse than the casino is claiming!
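The figure 91/216 can be confirmed in more than one way. The sketch below counts the winning outcomes among all 216 rolls, evaluates the Inclusion/Exclusion expression for n = 3, and, as an extra cross-check not used in the note, computes the complement (you lose only if all three dice miss your number). The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

my_number = 4                                   # any fixed choice from 1..6
rolls = list(product(range(1, 7), repeat=3))    # 216 equally likely outcomes

# Direct count of the outcomes in A1 ∪ A2 ∪ A3.
by_counting = Fraction(sum(1 for w in rolls if my_number in w), len(rolls))

# Inclusion/Exclusion for n = 3, using independence for the intersections.
q = Fraction(1, 6)
by_inclusion_exclusion = 3 * q - 3 * q**2 + q**3

# Complement: all three dice must miss the chosen number for a loss.
by_complement = 1 - Fraction(5, 6) ** 3

# The casino's figure simply adds the three probabilities, ignoring overlaps.
casino_claim = 3 * q

print(by_counting, by_inclusion_exclusion, by_complement, casino_claim)
```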

When n is large (i.e., we are interested in the union of many events), the Inclusion/Exclusion formula is essentially useless because it involves computing the probability of the intersection of every non-empty subset of the events: and there are $2^n - 1$ of these! Sometimes we can just look at the first few terms of it and forget the rest: note that successive terms actually give us an overestimate and then an underestimate of the answer, and these estimates both get better as we go along.

However, in many situations we can get a long way by just looking at the first term:

1. Disjoint events. If the events $A_i$ are all disjoint (i.e., no pair of them contains a common sample point; such events are also called mutually exclusive), then

\[ \Pr\Bigl[\bigcup_{i=1}^{n} A_i\Bigr] = \sum_{i=1}^{n} \Pr[A_i]. \]

[Note that we have already used this fact several times in our examples, e.g., in claiming that the probability of a flush is four times the probability of a Hearts flush; clearly flushes in different suits are disjoint events.]

2. Union bound. Always, it is the case that

\[ \Pr\Bigl[\bigcup_{i=1}^{n} A_i\Bigr] \le \sum_{i=1}^{n} \Pr[A_i]. \]

This merely says that adding up the $\Pr[A_i]$ can only overestimate the probability of the union. Crude as it may seem, in the next lecture note we’ll see how to use the union bound effectively in a Computer Science example.
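To see the union bound and the alternating over- and underestimates side by side, the following sketch computes the running totals of the Inclusion/Exclusion formula by counting on a uniform sample space. The function `partial_sums` is a hypothetical helper, and the four-dice variant of the Vegas game is an assumed example chosen so that there are more than three terms to look at.

```python
from fractions import Fraction
from itertools import combinations, product


def partial_sums(space, events):
    """Running totals of the Inclusion/Exclusion formula on a uniform space.

    Entry k-1 is the formula truncated after the k-way intersection terms.
    """
    totals, running = [], Fraction(0)
    for k in range(1, len(events) + 1):
        term = Fraction(0)
        for subset in combinations(events, k):
            hits = sum(1 for w in space if all(e(w) for e in subset))
            term += Fraction(hits, len(space))
        # Odd-sized intersections are added, even-sized ones subtracted.
        running += term if k % 2 == 1 else -term
        totals.append(running)
    return totals


# Hypothetical variant of the Vegas game with four dice instead of three.
n_dice, my_number = 4, 1
space = list(product(range(1, 7), repeat=n_dice))
# Bind i at definition time so each lambda inspects its own die.
events = [lambda w, i=i: w[i] == my_number for i in range(n_dice)]

exact = Fraction(sum(1 for w in space if my_number in w), len(space))
sums = partial_sums(space, events)

# sums[0] is the union bound; later entries alternate around the exact value.
print(exact, sums)
```

In this example the first total (the union bound) is 2/3, the second drops to 1/2, and the full sum recovers the exact probability. The loop over all subsets is exactly the $2^n - 1$ cost that makes the full formula impractical for large n.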

CS 70, Fall 2009, Note 11

8…...

Premium Essay

... Probability – the chance that an uncertain event will occur (always between 0 and 1) Impossible Event – an event that has no chance of occurring (probability = 0) Certain Event – an event that is sure to occur (probability = 1) Assessing Probability probability of occurrence= probability of occurrence based on a combination of an individual’s past experience, personal opinion, and analysis of a particular situation Events Simple event An event described by a single characteristic Joint event An event described by two or more characteristics Complement of an event A , All events that are not part of event A The Sample Space is the collection of all possible events Simple Probability refers to the probability of a simple event. Joint Probability refers to the probability of an occurrence of two or more events. ex. P(Jan. and Wed.) Mutually exclusive events is the Events that cannot occur simultaneously Example: Randomly choosing a day from 2010 A = day in January; B = day in February Events A and B are mutually exclusive Collectively exhaustive events One of the events must occur the set of events covers the entire sample space Computing Joint and Marginal Probabilities The probability of a joint event, A and B: Computing a marginal (or simple) probability: Probability is the numerical measure of the likelihood that an event will occur The probability of any event must be between 0 and 1,...

Words: 553 - Pages: 3

Premium Essay

...Statistics 100A Homework 5 Solutions Ryan Rosario Chapter 5 1. Let X be a random variable with probability density function c(1 − x2 ) −1 < x < 1 0 otherwise ∞ f (x) = (a) What is the value of c? We know that for f (x) to be a probability distribution −∞ f (x)dx = 1. We integrate f (x) with respect to x, set the result equal to 1 and solve for c. 1 1 = −1 c(1 − x2 )dx cx − c x3 3 1 −1 = = = = c = Thus, c = 3 4 c c − −c + c− 3 3 2c −2c − 3 3 4c 3 3 4 . (b) What is the cumulative distribution function of X? We want to ﬁnd F (x). To do that, integrate f (x) from the lower bound of the domain on which f (x) = 0 to x so we will get an expression in terms of x. x F (x) = −1 c(1 − x2 )dx cx − cx3 3 x −1 = But recall that c = 3 . 4 3 1 3 1 = x− x + 4 4 2 = 3 4 x− x3 3 + 2 3 −1 < x < 1 elsewhere 0 1 4. The probability density function of X, the lifetime of a certain type of electronic device (measured in hours), is given by, 10 x2 f (x) = (a) Find P (X > 20). 0 x > 10 x ≤ 10 There are two ways to solve this problem, and other problems like it. We note that the area we are interested in is bounded below by 20 and unbounded above. Thus, ∞ P (X > c) = c f (x)dx Unlike in the discrete case, there is not really an advantage to using the complement, but you can of course do so. We could consider P (X > c) = 1 − P (X < c), c P (X > c) = 1 − P (X < c) = 1 − −∞ f (x)dx P (X > 20) = 10 dx......

Words: 4895 - Pages: 20

Premium Essay

...Probability & Statistics for Engineers & Scientists This page intentionally left blank Probability & Statistics for Engineers & Scientists NINTH EDITION Ronald E. Walpole Roanoke College Raymond H. Myers Virginia Tech Sharon L. Myers Radford University Keying Ye University of Texas at San Antonio Prentice Hall Editor in Chief: Deirdre Lynch Acquisitions Editor: Christopher Cummings Executive Content Editor: Christine O’Brien Associate Editor: Christina Lepre Senior Managing Editor: Karen Wernholm Senior Production Project Manager: Tracy Patruno Design Manager: Andrea Nix Cover Designer: Heather Scott Digital Assets Manager: Marianne Groth Associate Media Producer: Vicki Dreyfus Marketing Manager: Alex Gay Marketing Assistant: Kathleen DeChavez Senior Author Support/Technology Specialist: Joe Vetere Rights and Permissions Advisor: Michael Joyce Senior Manufacturing Buyer: Carol Melville Production Coordination: Liﬂand et al. Bookmakers Composition: Keying Ye Cover photo: Marjory Dressler/Dressler Photo-Graphics Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Pearson was aware of a trademark claim, the designations have been printed in initial caps or all caps. Library of Congress Cataloging-in-Publication Data Probability & statistics for engineers & scientists/Ronald E. Walpole . . . [et al.] — 9th ed. p. cm. ISBN......

Words: 201669 - Pages: 807

Premium Essay

...Probability, Mean and Median In the last section, we considered (probability) density functions. We went on to discuss their relationship with cumulative distribution functions. The goal of this section is to take a closer look at densities, introduce some common distributions and discuss the mean and median. Recall, we define probabilities as follows: Proportion of population for Area under the graph of p ( x ) between a and b which x is between a and b p( x)dx a b The cumulative distribution function gives the proportion of the population that has values below t. That is, t P (t ) p( x)dx Proportion of population having values of x below t When answering some questions involving probabilities, both the density function and the cumulative distribution can be used, as the next example illustrates. Example 1: Consider the graph of the function p(x). p x 0.2 0.1 2 4 6 8 10 x Figure 1: The graph of the function p(x) a. Explain why the function is a probability density function. b. Use the graph to find P(X < 3) c. Use the graph to find P(3 § X § 8) 1 Solution: a. Recall, a function is a probability density function if the area under the curve is equal to 1 and all of the values of p(x) are non-negative. It is immediately clear that the values of p(x) are non-negative. To verify that the area under the curve is equal to 1, we recognize that the graph above can be viewed as a triangle. Its...

Words: 1914 - Pages: 8

Free Essay

...Problem 1: Question 1. The probability of a case being appealed for each judge in Common Pleas Court. p(a) | 0.04511031 | 0.03529063 | 0.03497615 | 0.03070624 | 0.04047164 | 0.04019435 | 0.03990765 | 0.04427171 | 0.03883194 | 0.04085893 | 0.04033333 | 0.04344897 | 0.04524181 | 0.06282723 | 0.04043298 | 0.02848818 | Question 2. The probability of a case being reversed for each judge in Common Pleas Court. P® | 0.00395127 | 0.0029656 | 0.0063593 | 0.0035824 | 0.00223072 | 0.00795053 | 0.00725594 | 0.00675904 | 0.00434918 | 0.00477185 | 0.002 | 0.00404176 | 0.00561622 | 0.0104712 | 0.00413881 | 0.00194238 | Question 3. The probability of reversal given an appeal for each judge in Common Pleas Court. p(R/A) | 0.08759124 | 0.08403361 | 0.18181818 | 0.11666667 | 0.05511811 | 0.1978022 | 0.18181818 | 0.15267176 | 0.112 | 0.11678832 | 0.04958678 | 0.09302326 | 0.12413793 | 0.16666667 | 0.1023622 | 0.06818182 | Question 4. The probability of cases being appealed in Common Pleas Court. Probability of cases being appealed in common pleas court | 0.0400956 | Question 5. Identify the best judges in Common Pleas Court according to the three criteria in Questions 1-3: 1) The best judge in Common Pleas Court with the smallest probability in Question 1; Ralph Winkler 2) The best judge in Common Pleas Court with the smallest probability in Question 2; and ......

Words: 1093 - Pages: 5

Premium Essay

...PROBABILITY 1. ACCORDING TO STATISTICAL DEFINITION OF PROBABILITY P(A) = lim FA/n WHERE FA IS THE NUMBER OF TIMES EVENT A OCCUR AND n IS THE NUMBER OF TIMES THE EXPERIMANT IS REPEATED. 2. IF P(A) = 0, A IS KNOWN TO BE AN IMPOSSIBLE EVENT AND IS P(A) = 1, A IS KNOWN TO BE A SURE EVENT. 3. BINOMIAL DISTRIBUTIONS IS BIPARAMETRIC DISTRIBUTION, WHERE AS POISSION DISTRIBUTION IS UNIPARAMETRIC ONE. 4. THE CONDITIONS FOR THE POISSION MODEL ARE : • THE PROBABILIY OF SUCCESS IN A VERY SMALL INTERAVAL IS CONSTANT. • THE PROBABILITY OF HAVING MORE THAN ONE SUCCESS IN THE ABOVE REFERRED SMALL TIME INTERVAL IS VERY LOW. • THE PROBABILITY OF SUCCESS IS INDEPENDENT OF t FOR THE TIME INTERVAL(t ,t+dt) . 5. Expected Value or Mathematical Expectation of a random variable may be defined as the sum of the products of the different values taken by the random variable and the corresponding probabilities. Hence if a random variable X takes n values X1, X2,………… Xn with corresponding probabilities p1, p2, p3, ………. pn, then expected value of X is given by µ = E (x) = Σ pi xi . Expected value of X2 is given by E ( X2 ) = Σ pi xi2 Variance of x, is given by σ2 = E(x- µ)2 = E(x2)- µ2 Expectation of a constant k is k i.e. E(k) = k fo any constant k. Expectation of sum of two random variables is the sum of their expectations i.e. E(x +y) = E(x) + E(y) for any......

Words: 979 - Pages: 4

Premium Essay

...Odd-Numbered End-of-Chapter Exercises * Chapter 2 Review of Probability 2.1. (a) Probability distribution function for Y Outcome (number of heads) | Y 0 | Y 1 | Y 2 | Probability | 0.25 | 0.50 | 0.25 | (b) Cumulative probability distribution function for Y Outcome (number of heads) | Y 0 | 0 Y 1 | 1 Y 2 | Y 2 | Probability | 0 | 0.25 | 0.75 | 1.0 | (c) . Using Key Concept 2.3: and so that 2.3. For the two new random variables and we have: (a) (b) (c) 2.5. Let X denote temperature in F and Y denote temperature in C. Recall that Y 0 when X 32 and Y 100 when X 212; this implies Using Key Concept 2.3, X 70oF implies that and X 7oF implies 2.7. Using obvious notation, thus and This implies (a) per year. (b) , so that Thus where the units are squared thousands of dollars per year. (c) so that and thousand dollars per year. (d) First you need to look up the current Euro/dollar exchange rate in the Wall Street Journal, the Federal Reserve web page, or other financial data outlet. Suppose that this exchange rate is e (say e 0.80 Euros per dollar); each 1 dollar is therefore with e Euros. The mean is therefore e C (in units of thousands of Euros per year), and the standard deviation is e C (in units of thousands of Euros per year). The correlation is unit-free, and is unchanged. 2.9. | | Value of Y | Probability Distribution of X | | | 14 | 22 | 30 | 40 | 65 | | | Value......

Words: 11774 - Pages: 48

Premium Essay

...PROBABILITY ASSIGNMENT
1. The National Highway Traffic Safety Administration (NHTSA) conducted a survey to learn how drivers throughout the US are using their seat belts. Sample data consistent with the NHTSA survey are as follows. (Data as of May 2015)
Driver using Seat Belt?
Region | Yes | No |
Northeast | 148 | 52 |
Midwest | 162 | 54 |
South | 296 | 74 |
West | 252 | 48 |
Total | 858 | 228 |
a. For the U.S., what is the probability that the driver is using a seat belt?
b. The seat belt usage probability for a U.S. driver a year earlier was 0.75. The NHTSA chief had hoped for a 0.78 probability in 2015. Would he have been pleased with the 2015 survey results?
c. What is the probability of seat belt usage by region of the country? Which region has the highest seat belt usage?
d. What proportion of the drivers in the sample came from each region of the country? Which region had the most drivers selected?
2. A company that manufactures toothpaste is studying five different package designs. Assuming that one design is just as likely to be selected by a consumer as any other design, what selection probability would you assign to each of the package designs? In an actual experiment, 100 consumers were asked to pick the design they preferred. The following data were obtained. Do the data confirm the belief that one design is just as likely to be selected as another? Explain. Design | Number of Times Preferred | 1 | 5 | 2 |......

Words: 1453 - Pages: 6
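
The relative-frequency calculations asked for in the seat belt question above can be sketched in a few lines. This is an illustrative sketch only: the counts are copied from the table in the excerpt, while the variable names are assumptions introduced here.

```python
# Survey counts from the excerpt: region -> (using seat belt, not using).
counts = {
    "Northeast": (148, 52),
    "Midwest": (162, 54),
    "South": (296, 74),
    "West": (252, 48),
}

total_yes = sum(yes for yes, _ in counts.values())
total = sum(yes + no for yes, no in counts.values())

# (a) Overall probability that a sampled driver uses a seat belt.
p_belt = total_yes / total

# (c) Seat belt usage probability within each region.
usage_by_region = {r: yes / (yes + no) for r, (yes, no) in counts.items()}

# (d) Share of the sample that came from each region.
share_by_region = {r: (yes + no) / total for r, (yes, no) in counts.items()}

highest_usage = max(usage_by_region, key=usage_by_region.get)
most_drivers = max(share_by_region, key=share_by_region.get)

print(f"P(seat belt) = {p_belt:.3f}")
for region in counts:
    print(f"{region}: usage {usage_by_region[region]:.2f}, "
          f"share of sample {share_by_region[region]:.3f}")
print("Highest usage:", highest_usage)
print("Most drivers sampled:", most_drivers)
```

Part (b) then reduces to comparing the overall estimate with the hoped-for value of 0.78.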

Premium Essay

...Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #07 Random Variables So far we have been discussing the laws of probability. In the laws of probability we have a random experiment, and as a consequence of that we have a sample space. We consider a class of subsets of the sample space, which we call our event space or the events, and then we define a probability function on that. Now, we consider various types of problems, for example calculating the probability of occurrence of a certain number in the throwing of a die, the probability of occurrence of a certain card in a draw, and the probability of various kinds of events. However, in most practical situations we may not be interested in the full physical description of the sample space or the events; rather, we may be interested in a certain numerical characteristic of the event. Suppose I have ten instruments and they are operating for a certain amount of time. After they have worked for a certain amount of time, we may like to know how many of them are actually working properly and how many are not. Now, if there are ten instruments, it may happen that seven of them are working properly and three of them are not. At this stage we may not be interested in knowing the positions, suppose we are saying one instrument, two instruments and so on, tenth...

Words: 5830 - Pages: 24

Free Essay

... Title: The Probability that the Sum of Two Dice When Thrown Is Equal to Seven
Purpose of Project
* To carry out simple experiments to determine the probability that the sum of two dice when thrown is equal to seven.
Variables
* Independent- sum
* Dependent- number of throws
* Controlled- cloth-covered table top.
Method of data collection
1. Two ordinary six-faced gaming dice were thrown 100 times using three different methods, as shown below.
i. The dice were held in the palm of the hand and shaken around a few times before being thrown onto a cloth-covered table top.
ii. The dice were placed into a Styrofoam cup and shaken around a few times before being thrown onto a cloth-covered table top.
iii. The dice were placed into a glass and shaken around a few times before being thrown onto a cloth-covered table top.
2. All results were recorded and tabulated.
3. A probability tree was drawn.
Presentation of Data
Throw by hand
Sum of two dice | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Frequency | 4 | 4 | 8 | 5 | 16 | 15 | 16 | 12 | 11 | 7 | 2 |
Throw by Styrofoam cup
Sum of two dice | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Frequency | 2 | 5 | 13 | 11 | 20 | 8 | 14 | 8 | 10 | 7 | 2 |
Throw by glass
Sum of two dice | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Frequency | 1 | 8 | 9 | 10 | 12 | 12 | 14 | 12 | 11 | 7 | 4 |
Sum of two dice | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Total |
Experiment 1 | 4 | 4 | 8 | 5 | 16 | 15 | 16 | 12 | 11 | 7 | 2 | 100 |
Experiment 2 | 2 | 5 | 13 | 11 | 20 | 8 | 14 | 8 | 10 | 7 | 2 | 100 |
Experiment 3 | 1 | 8 | 9 | 10 | 12 | 12 | 13 | 12 |......

Words: 528 - Pages: 3
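
As a point of comparison for the dice project above, the theoretical probability of a sum of seven can be found by enumerating all 36 equally likely outcomes and set beside the observed frequencies. The sketch below assumes fair dice; the counts of sevens (15, 8 and 12 out of 100 throws) are read from the excerpt's tables, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Enumerate every ordered outcome of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [pair for pair in outcomes if sum(pair) == 7]
p_theory = Fraction(len(favourable), len(outcomes))

# Number of sevens observed per method, out of 100 throws each (from the excerpt).
sevens = {"hand": 15, "styrofoam cup": 8, "glass": 12}
throws = 100

empirical = {method: Fraction(count, throws) for method, count in sevens.items()}
pooled = Fraction(sum(sevens.values()), throws * len(sevens))

print("Theoretical P(sum = 7):", p_theory, "=", float(p_theory))
for method, p in empirical.items():
    print(f"Observed ({method}):", float(p))
print("Pooled over all 300 throws:", float(pooled))
```

With only 100 throws per method, some gap between the observed and theoretical values is to be expected.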

Premium Essay

...Assignment Time to submit: 1wk Q1. State Bayes' Theorem. Explain its importance. An insurance company believes that people can be divided into two classes, accident prone and non-accident prone. Its statistics show that an accident-prone person will have an accident within a year with probability 0.4, whereas this probability decreases to 0.2 for a non-accident-prone person. If 30% of the population is accident prone, what is the probability that a new policy holder will have an accident within one year of purchasing the policy? Suppose a new policy holder has an accident within one year. What is the probability that he or she is accident prone? Q2. Surveys by the Federal Deposit Insurance Corporation have shown that the life of a regular savings account maintained in one of its member banks is approximately normally distributed, with an average life of 24 months and a standard deviation of 7.5 months. i) If a depositor opens an account at a bank that is a member of the FDIC, what is the probability that there will still be money in the account after 28 months? ii) What is the probability that the account will have been closed before one year? Q3. From 2002 until 2007, the mean price/earnings ratio of approximately 1,800 stocks listed on the Bombay Stock Exchange was 14.35, with a standard deviation of 9.73. In a sample of 30 randomly chosen BSE stocks, the mean P/E ratio in 2008 was 11.77. Does this sample present sufficient evidence to conclude (at the 0.05 level of......

Words: 716 - Pages: 3
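
The insurance question in Q1 above is a direct application of the law of total probability followed by Bayes' theorem. The following sketch works through it with the numbers given in the excerpt; the variable names are assumptions made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Figures stated in the excerpt.
p_prone = Fraction(30, 100)                 # P(accident prone)
p_acc_given_prone = Fraction(4, 10)         # P(accident | accident prone)
p_acc_given_not_prone = Fraction(2, 10)     # P(accident | not accident prone)

# Law of total probability: P(accident) summed over both classes.
p_accident = (p_acc_given_prone * p_prone
              + p_acc_given_not_prone * (1 - p_prone))

# Bayes' theorem: P(accident prone | accident).
p_prone_given_acc = p_acc_given_prone * p_prone / p_accident

print("P(accident within a year) =", p_accident, "=", float(p_accident))
print("P(accident prone | accident) =", p_prone_given_acc,
      "≈", round(float(p_prone_given_acc), 4))
```

Under these figures the chance of an accident in the first year is 13/50, and observing an accident raises the probability of being accident prone from 0.30 to 6/13.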

Free Essay

...Probability XXXXXXXX MAT300 Professor XXXXXX Date Probability Probability is commonly used to express an attitude of mind toward a proposition whose truth is not yet certain. The proposition of interest usually takes the form "will a given event happen?" and the attitude of mind takes the form "how certain is it that the event will happen?" This certainty can be expressed numerically, as a value between 0 and 1, which is referred to as the probability. The greater the probability of an event, the more certain it is that the event will take place. In applied terms, therefore, probability is a measure of how likely it is that a random event takes place (Olofsson, 2005). The idea has been given a rigorous mathematical form in probability theory, which is applied in fields of study such as statistics, mathematics, gambling, philosophy, finance, science, and artificial intelligence and machine learning, for instance to draw inferences about the likelihood of events. Probability is also used to describe the underlying mechanics and regularities of complex systems. Nevertheless, the term probability does not have a single agreed definition for practical application. Moreover, there are several broad categories of interpretation of probability, whose supporters hold different or even conflicting views about the fundamental nature of probability. Just as other......

Words: 335 - Pages: 2

Premium Essay

...PROBABILITY SEDA YILDIRIM 2009421051 DOKUZ EYLUL UNIVERSITY MARITIME BUSINESS ADMINISTRATION
CONTENTS
Rules of Probability
Rule of Multiplication
Rule of Addition
Classical theory of probability
Continuous Probability Distributions
Discrete vs. Continuous Variables
Binomial Distribution
Binomial Probability
Poisson Distribution
PROBABILITY
Probability is the branch of mathematics that studies the possible outcomes of given events together with the outcomes' relative likelihoods and distributions. In common usage, the word "probability" is used to mean the chance that a particular event (or set of events) will occur, expressed on a linear scale from 0 (impossibility) to 1 (certainty), also expressed as a percentage between 0 and 100%. The analysis of events governed by probability is called statistics. There are several competing interpretations of the actual "meaning" of probabilities. Frequentists view probability simply as a measure of the frequency of outcomes (the more conventional interpretation), while Bayesians treat probability more subjectively as a statistical procedure that endeavors to estimate parameters of an underlying distribution based on the observed distribution. The conditional probability of an event A assuming that B has occurred, denoted P(A|B), equals P(A ∩ B)/P(B). The two faces of probability introduces a central ambiguity which has been around for 350 years and still leads to disagreements about...

Words: 3252 - Pages: 14

Free Essay

...Probability
Question 1
Bar graphs are normally used to represent the frequency of discrete items. These can be things like colours, or things with no particular order. The main point is that the items are not grouped and are not continuous. A histogram, by contrast, is mainly used to represent the frequency of a continuous variable such as height or weight, or anything that can take decimal values rather than only whole numbers. An example of both graphs: [Figures: Bar Graph, Histogram]
The two graphs look similar; however, in a histogram the bars must be touching. This is because the data are numbers that are grouped and lie in a continuous range from left to right. In a bar graph, the x axis shows individual categories, such as the colours shown above.
Question 2
a) i) The probability of females who enjoy shopping for clothing is 224/500 = 0.448. ii) The probability of males who enjoy shopping for clothing is 136/500 = 0.272. iii) The probability of females who do not enjoy shopping for clothing is 36/500 = 0.072. iv) The probability of males who do not enjoy shopping for clothing is 104/500 = 0.208.
b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A|B) = P(A ∩ B)/P(B), P(B) > 0
P(B|A) = P(A ∩ B)/P(A)
P(A ∩ B)/P(B) = P(A ∩ B)/P(A)
P(A)/P(B) = 1
P(A) = P(B)
P(A ∪ B) = 1
P(A) + P(B) = 1
P(B) > 0.25
Question 3
1. Frequency Distribution of Burberry Clothing...

Words: 908 - Pages: 4

Free Essay

...Massachusetts Institute of Technology 6.042J/18.062J, Fall '02: Mathematics for Computer Science Professor Albert Meyer and Dr. Radhika Nagpal Course Notes 10 November 4 revised November 6, 2002, 572 minutes Introduction to Probability 1 Probability Probability will be the topic for the rest of the term. Probability is one of the most important subjects in Mathematics and Computer Science. Most upper level Computer Science courses require probability in some form, especially in analysis of algorithms and data structures, but also in information theory, cryptography, control and systems theory, network design, artificial intelligence, and game theory. Probability also plays a key role in fields such as Physics, Biology, Economics and Medicine. There is a close relationship between Counting/Combinatorics and Probability. In many cases, the probability of an event is simply the fraction of possible outcomes that make up the event. So many of the rules we developed for finding the cardinality of finite sets carry over to Probability Theory. For example, we'll apply an Inclusion-Exclusion principle for probabilities in some examples below. In principle, probability boils down to a few simple rules, but it remains a tricky subject because these rules often lead to unintuitive conclusions. Using "common sense" reasoning about probabilistic questions is notoriously unreliable, as we'll illustrate with many real-life examples. This reading is longer than usual. To keep things in......

Words: 18516 - Pages: 75