ENGINEERING HYDROLOGY:  CHAPTER 071 - JOINT PROBABILITY



1. REGIONAL ANALYSIS


1.01
Regional analysis encompasses the study of hydrologic phenomena with the aim of developing mathematical relations to be used in a regional context.


1.02
Generally, mathematical relations are developed so that information from gaged or long-record catchments can be readily transferred to neighboring ungaged or short-record catchments of similar hydrologic characteristics.


1.03
Other applications of regional analysis include regression techniques used to develop empirical equations applicable within a broad geographical region.


1.04
Regional analysis makes use of statistics and probability, including joint probability distributions.


1.05
Joint probability distributions are useful in regression theory.



2. JOINT PROBABILITIES


2.01
Probability distributions with two random variables, X and Y, are called bivariate, or joint, distributions.


2.02
A joint distribution expresses in mathematical terms the probability of occurrence of an outcome consisting of a pair of values.


2.03
In statistical notation,


2.04


P(X = xi, Y = yj)


2.05
is the probability P that the random variables X and Y will take on the outcomes xi and yj simultaneously.


2.06
A shorter notation is:


2.07


P(xi, yj)


2.08
The sum of the probabilities of all possible outcomes is equal to unity, as follows:


2.09

 n   m
 ∑   ∑   P(xi, yj) = 1
i=1 j=1


2.10
A classical example of joint probability is that of the outcome of the cast of two dice, say A and B.


2.11
Intuitively, the probability of getting a 1 for A and a 1 for B is 1/36.


2.12


P(A = 1, B = 1) = 1/36


2.13
In total, there are 36 possible outcomes, and each has the same probability: 1/36.


2.14
This distribution is referred to as the bivariate uniform distribution because each outcome has an equal and uniform probability of occurrence.
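
As a minimal sketch (not part of the original narration), the bivariate uniform distribution of the two dice can be enumerated in Python; the dictionary name pmf is an assumed convenience:

from fractions import Fraction

# Joint pmf of two fair dice A and B: each pair (a, b) has probability 1/36.
pmf = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

print(pmf[(1, 1)])        # P(A = 1, B = 1) = 1/36
print(sum(pmf.values()))  # total probability = 1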


2.15
Joint cumulative probabilities are defined in a similar way as univariate cumulative probabilities:


2.16

              k   l
F(xk, yl) =   ∑   ∑   P(xi, yj)
             i=1 j=1


2.17
in which F(xk, yl) is the joint cumulative probability.


2.18
For the example of the two dice, the probability of A being 5 or less, and B being 3 or less, is the sum of all the individual probabilities, for all combinations of i and j, as i varies from 1 to 5, and j varies from 1 to 3; that is:


2.19


5 × 3 = 15

F(x5, y3) = 15 × (1/36) = 15/36
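
Continuing the sketch, the joint cumulative probability F(x5, y3) is the double sum over i = 1 to 5 and j = 1 to 3 (pmf is rebuilt here so the snippet runs on its own):

from fractions import Fraction

pmf = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

# F(x5, y3): sum P(a, b) over a = 1..5 and b = 1..3 -- fifteen outcomes.
F_5_3 = sum(pmf[(a, b)] for a in range(1, 6) for b in range(1, 4))
print(F_5_3)  # 5/12, i.e. 15/36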



3. MARGINAL PROBABILITIES


3.01
Marginal probability distributions are obtained by summing up P(xi, yj) over all values of one of the variables, for instance, X.


3.02
The resulting marginal distribution is the probability distribution of the other variable, in this case Y, without regard to X.


3.03
Marginal distributions are univariate distributions obtained from bivariate distributions.


3.04
In statistical notation, the marginal probability distribution of X is:


3.05

          m
P(xi) =   ∑   P(xi, yj)
         j=1


3.06
Likewise, the marginal distribution of Y is:


3.07

          n
P(yj) =   ∑   P(xi, yj)
         i=1


3.08
Returning to the two dice, the probability of A being equal to 1, regardless of the value of B, is:


3.09


P(A = 1) = 6 × (1/36) = 6/36 = 1/6


3.10
Likewise, the probability of B being equal to 4, regardless of the value of A, is also 1/6.


3.11
Notice that, to calculate the marginal probability, the joint probabilities of all six possible outcomes of the other die have been summed.
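
A short sketch of the marginal computation, with the same assumed pmf dictionary:

from fractions import Fraction

pmf = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

# Marginal P(A = 1): sum the joint probabilities over all six values of B.
print(sum(pmf[(1, b)] for b in range(1, 7)))  # 1/6

# Marginal P(B = 4): sum the joint probabilities over all six values of A.
print(sum(pmf[(a, 4)] for a in range(1, 7)))  # 1/6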


3.12
Marginal cumulative probability distributions are obtained by combining the concepts of marginal and cumulative distributions.


3.13
In statistical notation, the marginal cumulative probability distribution of X is:


3.14

          k   m
F(xk) =   ∑   ∑   P(xi, yj)
         i=1 j=1


3.15
Likewise, the marginal cumulative probability distribution of Y is:


3.16

          n   l
F(yl) =   ∑   ∑   P(xi, yj)
         i=1 j=1


3.17
Returning to the two dice, the probability of A being 2 or less, regardless of the value of B, is:


3.18


F(x2) = 2 × 6 × (1/36) = 12/36 = 1/3


3.19
Likewise, the probability of B being 5 or less, regardless of the value of A, is 5/6.


3.20


F(y5) = 5 × 6 × (1/36) = 30/36 = 5/6
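
The marginal cumulative probabilities follow the same pattern; a sketch with the assumed pmf:

from fractions import Fraction

pmf = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

# F(x2): sum over a = 1..2 and all six values of b.
print(sum(pmf[(a, b)] for a in range(1, 3) for b in range(1, 7)))  # 1/3

# F(y5): sum over all six values of a and b = 1..5.
print(sum(pmf[(a, b)] for a in range(1, 7) for b in range(1, 6)))  # 5/6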



4. CONDITIONAL PROBABILITIES


4.01
The concept of conditional probability is useful in regression analysis.


4.02
The conditional probability is the ratio of the joint probability to the marginal probability.


4.03
In statistical notation:


4.04


P(x|y) = P(x,y) / P(y)


4.05
in which P(x|y) is the conditional probability of x, given y.


4.06
The conditional probability of y, given x, is:


4.07


P(y|x) = P(x,y) / P(x)


4.08
From these equations, it follows that the joint probability is the product of the conditional and marginal probabilities: P(x,y) = P(x|y) P(y) = P(y|x) P(x).
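
A sketch of the conditional probability and the product rule for the two dice (independent by construction, so the conditional equals the marginal):

from fractions import Fraction

pmf = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

# P(A = 1 | B = 4) = P(1, 4) / P(B = 4).
P_B4 = sum(pmf[(a, 4)] for a in range(1, 7))   # marginal of B at 4: 1/6
P_A1_given_B4 = pmf[(1, 4)] / P_B4
print(P_A1_given_B4)                           # 1/6

# Product rule check: P(x, y) = P(x|y) P(y).
print(P_A1_given_B4 * P_B4 == pmf[(1, 4)])     # True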


4.09
Joint probabilities can be expressed as continuous functions.


4.10
In this case, they are referred to as joint density functions, with the notation f(x,y).


4.11
Moments provide descriptions of the properties of joint distributions.


4.12
For continuous functions, the joint moment of order r and s about the origin is defined as follows:


4.13


         ∞  ∞
μ'r,s =  ∫  ∫  x^r y^s f(x,y) dy dx
        -∞ -∞


4.14
With r = 1 and s = 0, this equation reduces to the mean of x:


4.15


         ∞      ∞
μ'1,0 =  ∫ x [  ∫ f(x,y) dy ] dx
        -∞     -∞


4.16
The expression within brackets is the marginal probability density function of x, that is, f(x).


4.17
Therefore, the expression for the mean of x is:


4.18


              

              ∞
μ'1,0 = μx =  ∫ x f(x) dx
             -∞


4.19
Likewise, the mean of y is:


4.20


              

              ∞
μ'0,1 = μy =  ∫ y f(y) dy
             -∞
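
These integrals can be verified numerically. The sketch below assumes, purely for illustration, the joint density f(x,y) = x + y on the unit square (which integrates to unity) and evaluates the means with scipy.integrate.dblquad:

from scipy.integrate import dblquad

# Assumed joint density for illustration: f(x, y) = x + y on 0 <= x, y <= 1.
f = lambda y, x: x + y  # dblquad expects y as the first argument

total, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1)
print(total)  # 1.0 -- the density integrates to unity

mu_x, _ = dblquad(lambda y, x: x * f(y, x), 0, 1, lambda x: 0, lambda x: 1)
mu_y, _ = dblquad(lambda y, x: y * f(y, x), 0, 1, lambda x: 0, lambda x: 1)
print(mu_x, mu_y)  # both 7/12 = 0.5833..., by the symmetry of the density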


4.21
The second moments are usually written about the mean:


4.22


        ∞  ∞
μr,s =  ∫  ∫  (x - μx)^r (y - μy)^s f(x,y) dy dx
       -∞ -∞


4.23
For r = 2 and s = 0, this equation reduces to the variance of x:


4.24


         

       ∞
σx² =  ∫ (x - μx)² f(x) dx
      -∞


4.25
Likewise, for r = 0 and s = 2, this equation reduces to the variance of y:


4.26


         

       ∞
σy² =  ∫ (y - μy)² f(y) dy
      -∞


4.27
A third type of moment arises for r = 1 and s = 1:


4.28


        ∞  ∞
σx,y =  ∫  ∫  (x - μx) (y - μy) f(x,y) dy dx
       -∞ -∞


4.29
This moment is called the covariance.


4.30
The correlation coefficient is a dimensionless value relating the covariance and the standard deviations:


4.31


ρx,y = σx,y / (σx σy)


4.32
in which ρx,y is the correlation coefficient based on population data.
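
Continuing with the same assumed density f(x,y) = x + y on the unit square, the variances, covariance, and population correlation coefficient can be evaluated numerically:

from math import sqrt
from scipy.integrate import dblquad

f = lambda y, x: x + y            # assumed illustrative density
lo, hi = lambda x: 0, lambda x: 1

mu_x, _ = dblquad(lambda y, x: x * f(y, x), 0, 1, lo, hi)
mu_y, _ = dblquad(lambda y, x: y * f(y, x), 0, 1, lo, hi)

# Second moments about the mean: variances and covariance.
var_x, _ = dblquad(lambda y, x: (x - mu_x)**2 * f(y, x), 0, 1, lo, hi)
var_y, _ = dblquad(lambda y, x: (y - mu_y)**2 * f(y, x), 0, 1, lo, hi)
cov, _   = dblquad(lambda y, x: (x - mu_x) * (y - mu_y) * f(y, x), 0, 1, lo, hi)

rho = cov / sqrt(var_x * var_y)
print(cov, rho)  # cov = -1/144 = -0.00694..., rho = -1/11 = -0.0909...

The slightly negative value of ρ reflects that, under this assumed density, large x is weakly associated with small y.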


4.33
The sample correlation coefficient is:


4.34


rx,y = sx,y / (sx sy)


4.35
The correlation coefficient is a measure of the linear dependence between x and y.


4.36
It varies in the range -1 to +1.


4.37
A value of ρ or r close to or equal to 1 indicates a strong linear relationship between the variables, with large values of x associated with large values of y, and small values of x with small values of y.


4.38
A value of ρ or r close to or equal to -1 indicates a correlation such that large values of x are associated with small values of y, and vice versa.


4.39
A value of ρ or r equal to zero, that is, a zero covariance, indicates the lack of linear dependence between x and y.
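
For sample data, the sample covariance and correlation coefficient can be computed directly; the paired observations below are made-up values used only to exercise the formula:

import numpy as np

# Made-up paired observations (e.g., a predictor x and a response y).
x = np.array([2.1, 3.4, 4.0, 5.2, 6.8, 7.5])
y = np.array([0.8, 1.1, 1.9, 2.4, 3.0, 3.7])

s_xy = np.cov(x, y, ddof=1)[0, 1]                      # sample covariance
r_xy = s_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))  # sample correlation
print(r_xy)                                            # matches np.corrcoef(x, y)[0, 1]

For these monotonically increasing pairs, r comes out close to +1, consistent with the interpretation above.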


Narrator: Victor M. Ponce

Music: Fernando Oñate

Editor: Flor Pérez


Copyright © 2011

Visualab Productions

All rights reserved