University of California, Berkeley Stat2.2x Probability: introductory probability study notes, Section 2: Random sampling with and without replacement


The Stat2.2x Probability course was taught by the University of California, Berkeley on the edX platform in 2014.
PDF notes download (Academia.edu)
Summary
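  • Independence: events $A$ and $B$ are independent if $$P(A\cap B)=P(A)\cdot P(B)$$
  • Binomial Distribution (chance of exactly $k$ successes in $n$ independent trials, each with success probability $p$) $$C_{n}^{k}\cdot p^k\cdot(1-p)^{n-k}$$ R function: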
    dbinom(k, n, p)
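A minimal R sketch can check that `dbinom` matches the binomial formula written out with `choose`, and that the probabilities over all $k$ add up to 1. The helper name `binom_formula` is hypothetical, introduced only for illustration, and the values $n=10$, $p=0.3$ are arbitrary.

```r
# Hypothetical helper: the binomial formula C(n, k) * p^k * (1 - p)^(n - k)
binom_formula <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 10     # arbitrary number of trials
p <- 0.3    # arbitrary success probability
k <- 0:n

by_formula <- binom_formula(k, n, p)
by_dbinom  <- dbinom(k, size = n, prob = p)

# The two columns should agree, and the probabilities over all k sum to 1
print(cbind(k, by_formula, by_dbinom))
print(sum(by_dbinom))
```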

  • UNGRADED EXERCISE SET A
    PROBLEM 1
    I toss a coin 4 times. Find the chance of getting:
    1A the sequence $HTHT$
    1B 2 heads
    1C more heads than tails
    Solution
    1A) $$P(\text{HTHT})=\frac{1}{2^4}=0.0625$$
    1B) Binomial distribution $n=4, k=2, p=0.5$: $$P(\text{two heads of four tosses})=C_{n}^{k}\cdot p^k\cdot (1-p)^{n-k}=C_{4}^{2}\times0.5^4=0.375$$ R code:
    > dbinom(x = 2, size = 4, prob = 0.5)
    
    [1] 0.375

    1C) Binomial distribution $n=4, k=3,4, p=0.5$: $$P(\text{more heads than tails})=P(\text{3 heads of 4 tosses})+P(\text{4 heads of 4 tosses})$$ $$=\sum_{k=3}^{4}C_{4}^{k}\cdot 0.5^k\cdot (1-0.5)^{4-k}=0.25+0.0625=0.3125$$ R code:
    > sum(dbinom(x = 3:4, size = 4, prob = 0.5))
    
    [1] 0.3125
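Because all $2^4=16$ sequences of four tosses of a fair coin are equally likely, the three answers can also be checked by brute-force enumeration. This is a minimal sketch; `expand.grid` simply lists every possible sequence.

```r
# Enumerate all 2^4 = 16 equally likely sequences of 4 tosses of a fair coin
tosses <- expand.grid(rep(list(c("H", "T")), 4), stringsAsFactors = FALSE)
seqs   <- apply(tosses, 1, paste, collapse = "")   # e.g. "HTHT"
heads  <- rowSums(tosses == "H")                   # number of heads per sequence

p_HTHT   <- mean(seqs == "HTHT")      # 1A: one specific sequence
p_two    <- mean(heads == 2)          # 1B: exactly 2 heads
p_more_H <- mean(heads > 4 - heads)   # 1C: more heads than tails

c(p_HTHT, p_two, p_more_H)
```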

    PROBLEM 2
    A random number generator draws at random with replacement from the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Find the chance that the digit 5 appears on more than 11% of the draws, if:
    2A 100 draws are made
    2B 1000 draws are made
    Solution
    2A) More than 11% of 100 draws means at least 12 fives. Binomial distribution $n=100, k=12:100, p=0.1$: $$P(\text{digit 5 appears on more than 11\% of 100 draws})$$ $$=\sum_{k=12}^{100}C_{100}^{k}\cdot 0.1^k\cdot (1-0.1)^{100-k}=0.2969669$$ R code:
    > sum(dbinom(x = 12:100, size = 100, prob = 0.1))
    
    [1] 0.2969669
    
    > # alternatively, using the "pbinom" function
    
    > pbinom(q = 100, size = 100, prob = 0.1) - pbinom(q = 11, size = 100, prob = 0.1)
    
    [1] 0.2969669

    2B) More than 11% of 1000 draws means at least 111 fives. Binomial distribution $n=1000, k=111:1000, p=0.1$: $$P(\text{digit 5 appears on more than 11\% of 1000 draws})$$ $$=\sum_{k=111}^{1000}C_{1000}^{k}\cdot 0.1^k\cdot (1-0.1)^{1000-k}=0.1347765$$ R code:
    > sum(dbinom(x = 111:1000, size = 1000, prob = 0.1))
    
    [1] 0.1347765
    
    > # Alternatively
    
    > pbinom(q = 1000, size = 1000, prob = 0.1) - pbinom(q = 110, size = 1000, prob = 0.1)
    
    [1] 0.1347765
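The two answers suggest that the chance of exceeding 11% shrinks as the number of draws grows. A minimal sketch, assuming the same digit generator, computes the upper tail directly with `pbinom(..., lower.tail = FALSE)` and extends the calculation to 10000 draws; the function name `more_than_11_percent` is hypothetical.

```r
# Hypothetical helper: P(digit 5 appears on more than 11% of n draws),
# where the count of fives is Binomial(n, 0.1).
# lower.tail = FALSE returns P(X > q) directly, so no subtraction is needed.
more_than_11_percent <- function(n) {
  cutoff <- 11 * n / 100   # exact integer for the n used below
  pbinom(q = cutoff, size = n, prob = 0.1, lower.tail = FALSE)
}

ns    <- c(100, 1000, 10000)
probs <- sapply(ns, more_than_11_percent)
print(data.frame(draws = ns, chance = probs))
```

Writing the cutoff as `11 * n / 100` keeps it an exact integer for these values of `n`, which sidesteps any floating-point rounding in `0.11 * n`.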

    PROBLEM 3
    A die is rolled 12 times. Find the chance that the face with six spots appears once among the first 6 rolls, and once among the next 6 rolls.
    Solution
    The first six rolls and the second six rolls are independent, and the number of sixes in each block follows a binomial distribution with $n=6, k=1, p=\frac{1}{6}$: $$P(\text{once among first 6 rolls and once among second 6 rolls})$$ $$=P(\text{once among first 6 rolls})\times P(\text{once among second 6 rolls})$$ $$=C_{6}^{1}\times\frac{1}{6}\times(1-\frac{1}{6})^5\times C_{6}^{1}\times\frac{1}{6}\times(1-\frac{1}{6})^5=0.1615056$$ R code:
    > dbinom(x = 1, size = 6, prob = 1/6) ^ 2
    
    [1] 0.1615056
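As a cross-check, the same event can be counted directly over all 12 rolls: 6 positions for the six in the first block, 6 in the second, and the other 10 rolls are not sixes. A minimal R sketch also divides by the chance of exactly two sixes anywhere in 12 rolls; the ratio is $\frac{6\times6}{C_{12}^{2}}=\frac{36}{66}$, the conditional chance that two sixes land one in each block.

```r
# Independent blocks: exactly one six in rolls 1-6 and exactly one in rolls 7-12
p_split <- dbinom(1, size = 6, prob = 1/6)^2

# Direct count over 12 rolls: 6 * 6 placements of the two sixes,
# each with probability (1/6)^2 * (5/6)^10
p_direct <- 6 * 6 * (1/6)^2 * (5/6)^10

# Exactly two sixes anywhere in 12 rolls, for comparison
p_two_anywhere <- dbinom(2, size = 12, prob = 1/6)

c(p_split = p_split, p_direct = p_direct, ratio = p_split / p_two_anywhere)
```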

    PROBLEM 4
    A quiz consists of 20 true-false questions. The score for each question is 1 point if it is answered correctly, and 0 otherwise.
    4A Suppose a student guesses the answer to Question 1 on the test by tossing a coin: if the coin lands Heads, she answers True, and if it lands Tails, she answers False. What is the chance that she gets the right answer?
    4B Suppose a student guesses the answers to both Questions 1 and 2 as described in 4A, using a different toss for each question. Are the events “gets the right answer to Question 1” and “gets the right answer to Question 2” independent?
    4C To get an A grade on the test, you need a total score of more than 16 points. One of the students knows the correct answer to 6 of the 20 questions. The rest she guesses at random by tossing a coin (one toss per question, as in 4B). What is the chance that she gets an A grade on the test?
    Solution
    4A) No matter what the right answer is, the chance that the coin picks that answer is $\frac{1}{2}$.
    4B) Yes, they are independent. Whatever the pair of correct answers is $(TT, TF, FT, FF)$, the chance that the student gets both right is $$P(\text{Q1 and Q2 are right})=\frac{1}{4}=\frac{1}{2}\times\frac{1}{2}=P(\text{Q1 is right})\cdot P(\text{Q2 is right})$$
    4C) She already has 6 points, so to score more than 16 she needs more than 10, i.e. at least 11, correct guesses among the remaining 14 questions. Binomial distribution $n=14, k=11:14, p=0.5$: $$P(\text{at least 11 are right among 14 questions})$$ $$=\sum_{k=11}^{14}C_{14}^{k}\cdot0.5^k\cdot(1-0.5)^{14-k}=0.02868652$$ R code:
    > sum(dbinom(x = 11:14, size = 14, prob = 0.5))
    
    [1] 0.02868652
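The same reasoning works for any number of known answers. A minimal sketch, assuming the remaining answers are independent fair coin tosses; the helper `p_grade_A` is hypothetical. It uses the upper tail $P(X > 16 - \text{known})$ with $X\sim\text{Binomial}(20-\text{known}, 0.5)$, and for the 4C case compares against the exact fraction $\frac{C_{14}^{11}+C_{14}^{12}+C_{14}^{13}+C_{14}^{14}}{2^{14}}=\frac{470}{16384}$.

```r
# Hypothetical helper: chance of a total score above 16 out of 20 when
# `known` answers are certainly right and the rest are fair coin-toss guesses.
# Correct guesses ~ Binomial(20 - known, 0.5); she needs more than 16 - known.
p_grade_A <- function(known) {
  pbinom(q = 16 - known, size = 20 - known, prob = 0.5, lower.tail = FALSE)
}

p_grade_A(6)                         # the 4C case
(364 + 91 + 14 + 1) / 2^14           # exact fraction 470 / 16384
sapply(c(6, 10, 16), p_grade_A)      # more known answers, better chance
```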

    PROBLEM 5
    A die has one red face, two blue faces, and three green faces. It is rolled 5 times. Find the chance that the red face appears on one of the rolls and the remaining rolls are green. [Careful what you multiply. The most straightforward method is to follow the derivation of the binomial formula.]
    Solution
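    This follows the derivation of the binomial formula: there are $C_{5}^{1}$ ways to choose which roll shows red, and each such sequence (one red, four green) has probability ${p_1}^k\cdot {p_2}^{n-k}$, giving $C_{n}^{k}\cdot {p_1}^k\cdot {p_2}^{n-k}$ with $n=5, k=1, p_1=\frac{1}{6}$ (red), $p_2=\frac{3}{6}$ (green). Since $p_1+p_2\neq 1$ (the blue faces are excluded), this is not itself a binomial probability: $$P(\text{1 red and 4 green among 5 rolls})=C_{5}^{1}\times\frac{1}{6}\times(\frac{3}{6})^4=0.05208333$$ R code: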
    > choose(5, 1) * (1/6) * (3/6)^4
    
    [1] 0.05208333
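Since all $6^5=7776$ sequences of face numbers are equally likely, the answer can be checked by enumeration. A minimal sketch, assuming the faces are labelled 1 (red), 2 to 3 (blue) and 4 to 6 (green); the labelling is an arbitrary choice. R's `dmultinom` gives the same number, because the counts of (red, blue, green) faces in 5 rolls are multinomial.

```r
# Assumed labelling of the faces: 1 = red, 2-3 = blue, 4-6 = green
rolls   <- expand.grid(rep(list(1:6), 5))   # all 6^5 = 7776 sequences of 5 rolls
n_red   <- rowSums(rolls == 1)
n_green <- rowSums(rolls >= 4)

# Exactly one red and four green (hence no blue) among the 5 rolls
p_enum <- mean(n_red == 1 & n_green == 4)

# Same chance from the multinomial density: counts (red, blue, green) = (1, 0, 4)
p_multinom <- dmultinom(c(1, 0, 4), prob = c(1, 2, 3) / 6)

c(enumerated = p_enum, multinomial = p_multinom,
  formula = choose(5, 1) * (1/6) * (3/6)^4)
```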

Summary
  • Hypergeometric Distribution $$\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}$$ R function:
    dhyper(x, m, n, k)
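  • Geometric Distribution (chance of exactly $x$ failures before the first success, with success probability $p$ on each independent trial) $$p\cdot(1-p)^x$$ R function: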
    dgeom(x, p)
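A quick R check that the built-in functions match the summary formulas; the parameter values below are arbitrary. Note that R's geometric functions count the number of failures $x$ before the first success, which is why the formula has $(1-p)^x$ rather than $(1-p)^{x-1}$.

```r
# Hypergeometric: x "white" items in k draws without replacement
# from m white and n black items (arbitrary example values)
x <- 0:5; m <- 26; n <- 26; k <- 5
hyper_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
print(all.equal(hyper_formula, dhyper(x, m, n, k)))

# Geometric in R's parameterisation: x failures before the first success
p <- 0.01; fails <- 0:10
geom_formula <- p * (1 - p)^fails
print(all.equal(geom_formula, dgeom(fails, prob = p)))
```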

  • UNGRADED EXERCISE SET B
    PROBLEM 1
    A poker hand consists of 5 cards dealt at random without replacement from a standard deck of 52 cards of which 26 are red and the rest black. A poker hand is dealt. Find the chance that the hand contains three red cards and two black cards.
    Solution
    Hypergeometric distribution $x=3, m=26, n=26, k=5$: $$P(\text{3 red and 2 black})=\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{C_{26}^{3}\cdot C_{26}^{2}}{C_{52}^{5}}=0.3251301$$ R code:
    > dhyper(x = 3, m = 26, n = 26, k = 5)
    
    [1] 0.3251301
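To see the without-replacement model in action, a minimal Monte Carlo sketch can deal many 5-card hands with `sample` (which draws without replacement by default) and compare the fraction with exactly three red cards against `dhyper`. The seed and number of repetitions are arbitrary. The last line shows, for contrast, the binomial answer that drawing with replacement would give.

```r
set.seed(1)   # arbitrary seed, fixed only for reproducibility
deck <- rep(c("red", "black"), each = 26)

# Deal 5 cards without replacement many times; flag hands with exactly 3 red
hands_3_red <- replicate(100000, sum(sample(deck, 5) == "red") == 3)
p_sim <- mean(hands_3_red)

c(simulated = p_sim, exact = dhyper(3, m = 26, n = 26, k = 5))

# For contrast: drawing WITH replacement would make the red count Binomial(5, 0.5)
dbinom(3, size = 5, prob = 0.5)
```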

    PROBLEM 2
    In a population of 500 voters, 40% belong to Party X. A simple random sample of 60 voters is taken. What is the chance that a majority (more than 50%) of the sampled voters belong to Party X?
    Solution
    40% of 500 voters gives $m=200$ in Party X and $n=300$ others, and a majority of 60 means more than 30. Hypergeometric distribution $x=31:60, m=200, n=300, k=60$: $$P(\text{majority voters belong to Party X})$$ $$=\sum_{x=31}^{60}\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{\sum_{x=31}^{60}C_{200}^{x}\cdot C_{300}^{60-x}}{C_{500}^{60}}=0.0348151$$ R code:
    > sum(dhyper(x = 31:60, m = 200, n = 300, k = 60))
    
    [1] 0.0348151
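`phyper` with `lower.tail = FALSE` gives the upper tail $P(X>30)$ in one call. For contrast, a minimal sketch also computes the with-replacement (binomial) answer; since a sample of 60 is a noticeable fraction of the 500 voters, sampling without replacement makes the count less variable, so the binomial tail should come out somewhat larger.

```r
# Without replacement: count of Party X voters is hypergeometric; P(X > 30)
p_without <- phyper(30, m = 200, n = 300, k = 60, lower.tail = FALSE)

# For comparison only: WITH replacement the count would be Binomial(60, 0.4)
p_with <- pbinom(30, size = 60, prob = 0.4, lower.tail = FALSE)

c(without_replacement = p_without, with_replacement = p_with)
```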

    PROBLEM 3
    In an egg carton there are 12 eggs, of which 9 are hard-boiled and 3 are raw. Six of the eggs are chosen at random to take to a picnic (yes, the draws are made without replacement). Find the chance that at least one of the chosen eggs is raw.
    Solution
    Hypergeometric distribution $x=1:3, m=3, n=9, k=6$: $$P(\text{at least one is raw})=\sum_{x=1}^{3}\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{\sum_{x=1}^{3}C_{3}^{x}\cdot C_{9}^{6-x}}{C_{12}^{6}}=0.9090909$$ R code:
    > sum(dhyper(x = 1:3, m = 3, n = 9, k = 6))
    
    [1] 0.9090909
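The complement is shorter to compute: "at least one raw egg" fails only when all six chosen eggs are hard-boiled, so $P=1-\frac{C_{9}^{6}}{C_{12}^{6}}=1-\frac{84}{924}=\frac{10}{11}$. A minimal R check:

```r
# Complement: P(at least one raw) = 1 - P(none of the 6 chosen eggs is raw)
p_none_raw     <- dhyper(0, m = 3, n = 9, k = 6)   # choose(9, 6) / choose(12, 6)
p_at_least_one <- 1 - p_none_raw

c(p_at_least_one, 10 / 11)
```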

    PROBLEM 4
    A box contains 8 dark chocolates, 8 white chocolates, and 8 milk chocolates. I choose chocolates at random (yes, without replacement; I’m eating them). What is the chance that I have chosen 20 chocolates and still haven’t got all the dark ones?
    Solution
    Hypergeometric distribution $x=0:7, m=8, n=16, k=20$: $$P(\text{fewer than 8 dark chocolates})=\sum_{x=0}^{7}\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=\frac{\sum_{x=0}^{7}C_{8}^{x}\cdot C_{16}^{20-x}}{C_{24}^{20}}=0.828722$$ R code:
    > sum(dhyper(x = 0:7, m = 8, n = 16, k = 20))
    
    [1] 0.828722
    
    > 1-dhyper(x = 8, m = 8, n = 16, k = 20)
    
    [1] 0.828722
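Another way to see the complement: eating 20 of the 24 chocolates is the same as choosing the 4 that stay in the box, and all 8 dark ones are eaten exactly when none of those 4 is dark, so the chance is $1-\frac{C_{16}^{4}}{C_{24}^{4}}$. A minimal sketch checks this against a simulation; the seed and number of repetitions are arbitrary.

```r
# The 4 chocolates left in the box are a random sample of 4 from 24;
# all 8 dark ones are eaten exactly when none of those 4 is dark
p_all_dark_eaten <- dhyper(0, m = 8, n = 16, k = 4)   # choose(16, 4) / choose(24, 4)
p_not_all_dark   <- 1 - p_all_dark_eaten

# Monte Carlo check: eat 20 chocolates at random, without replacement
set.seed(2)   # arbitrary seed
box <- rep(c("dark", "white", "milk"), each = 8)
still_missing <- replicate(100000, sum(sample(box, 20) == "dark") < 8)

c(exact = p_not_all_dark, simulated = mean(still_missing))
```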

    PROBLEM 5
    I throw darts repeatedly. Assume that on each throw I have a 1% chance of hitting the bullseye, independently of all other throws. (Note that this implies for example that repetition doesn’t help my aim get any better; in my case that might not be such a bad assumption.) Find the chance that it takes me more than 100 throws to hit the bullseye.
    Solution
    Taking more than 100 throws means that the first 100 throws all miss, so $$P(\text{more than 100 throws to hit the bullseye})=(1-0.01)^{100}=0.3660323$$ Alternatively, use the complement "hits the bullseye within the first 100 throws", which means at most 99 misses before the first hit (geometric distribution $x=0:99, p=0.01$, where $x$ counts the misses before the first hit): $$P(\text{more than 100 throws to hit the bullseye})$$ $$=1-P(\text{at most 100 throws to hit the bullseye})$$ $$=1-\sum_{x=0}^{99}(1-0.01)^x\cdot0.01=0.3660323$$ R code:
    > 1 - sum(dgeom(x = 0:99, prob = 0.01))
    
    [1] 0.3660323
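In R, `dgeom`, `pgeom` and `rgeom` count misses before the first hit, so "more than 100 throws" is "at least 100 misses", i.e. more than 99 misses. A minimal sketch using the upper tail of `pgeom` and a simulation with `rgeom` (seed and number of repetitions arbitrary):

```r
# P(more than 99 misses before the first hit) = 0.99^100
p_exact <- pgeom(99, prob = 0.01, lower.tail = FALSE)
c(p_exact, 0.99^100)

# Simulation check: rgeom draws the number of misses before the first hit
set.seed(3)   # arbitrary seed
misses <- rgeom(100000, prob = 0.01)
mean(misses >= 100)
```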

    PROBLEM 6
    If you bet on “red” at roulette, you have chance 18/38 of winning. (There will be more on roulette later in the course; for now, just treat it as a generic gambling game.) Suppose you make a sequence of independent bets on “red” at roulette, with the decision that you will stop playing once you’ve won 5 times. What is the chance that after 15 bets you are still playing?
    Solution
    Still playing after 15 bets means you have won at most 4 of those 15 bets, so the number of wins follows a binomial distribution with $n=15, k=0:4, p=\frac{18}{38}$: $$P(\text{at most 4 wins in 15 bets})$$ $$=\sum_{k=0}^{4}C_{15}^{k}\cdot(\frac{18}{38})^k\cdot(1-\frac{18}{38})^{15-k}=0.08739941$$ R code:
    > sum(dbinom(x = 0:4, size = 15, prob = 18/38))
    
    [1] 0.08739941
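The stopping rule can also be phrased with the negative binomial distribution, which R's `pnbinom` parameterises by the number of losses before the `size`-th win. Still playing after 15 bets means the 5th win comes after bet 15, i.e. more than 10 losses before the 5th win. A minimal sketch confirming that both views agree:

```r
p_win <- 18 / 38

# Binomial view: at most 4 wins in the first 15 bets
p_binom <- pbinom(4, size = 15, prob = p_win)

# Negative binomial view (R counts losses before the 5th win):
# 5th win after bet 15  <=>  more than 10 losses before the 5th win
p_nbinom <- pnbinom(10, size = 5, prob = p_win, lower.tail = FALSE)

c(p_binom, p_nbinom)
```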

    PROBLEM 7
    A school is running a raffle. There are 100 tickets, of which 3 are winners. You can assume that tickets are sold by drawing at random without replacement from the available tickets. Teacher X buys 10 raffle tickets, and so does Teacher Y. Find the chance that one of those two teachers gets all three winning tickets.
    Solution
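    The two events are mutually exclusive (there are only three winning tickets), and each teacher's 10 tickets are a random sample drawn without replacement from the 100, so both events have the same chance. Hypergeometric distribution $x=3, m=3, n=97, k=10$: $$P(\text{teacher X or teacher Y gets all three winning tickets})$$ $$=P(\text{teacher X gets three winning tickets})+P(\text{teacher Y gets three winning tickets})$$ $$=2\times\frac{C_{m}^{x}\cdot C_{n}^{k-x}}{C_{m+n}^{k}}=2\times\frac{C_{3}^{3}\cdot C_{97}^{7}}{C_{100}^{10}}=0.00148423$$ R code: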
    > 2 * dhyper(x = 3, m = 3, n = 97, k = 10)
    
    [1] 0.00148423
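An alternative count looks at where the 3 winning tickets land among the 100: teacher X holds all three with chance $\frac{C_{10}^{3}}{C_{100}^{3}}$. A minimal sketch compares this with a simulation in which the 100 tickets are sold in random order, X receiving the first 10 sold and Y the next 10; labelling the winners 1, 2, 3 is an arbitrary choice, as are the seed and number of repetitions.

```r
# Where do the 3 winning tickets land? X holds 10 of the 100 tickets
p_X     <- choose(10, 3) / choose(100, 3)
p_exact <- 2 * p_X   # X and Y cannot both hold all three, so the events are disjoint

# Simulation: the first 10 tickets sold go to X, the next 10 to Y
set.seed(4)      # arbitrary seed
winners <- 1:3   # arbitrary labels for the winning tickets
hit <- replicate(100000, {
  sold <- sample(100, 20)
  all(winners %in% sold[1:10]) || all(winners %in% sold[11:20])
})

c(exact = p_exact, simulated = mean(hit))
```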