Recent developments in the application of Fisher information to sustainable environmental management

Alejandra González-Mejía , ... Heriberto Cabezas , in Assessing and Measuring Environmental Impact and Sustainability, 2015

Early warning signals, regime change, and leading indicators

Ecological systems can be resilient and still experience large fluctuations or low stability (Holling, 1973). Although stability has been found to be related to the number of links between species in a trophic web (MacArthur, 1955), as the human population and its resource demands increase, systems typically move further away from equilibrium (Holling, 1973). As indicated previously, many complex systems have critical thresholds or tipping points that, when breached, cause the system to shift from one dynamic regime to another. This, in turn, has led to the idea that early warning signals may be derived from time series data gathered from systems on the verge of experiencing critical transitions (Scheffer et al., 2009). Wissel (1984) determined that many systems approaching a critical threshold undergo a "critical slowing down" in system activity. Other systems exhibit behavior such as enhanced flickering before the shift to a different regime occurs (Carpenter et al., 2008).

Indicators including variance, skewness, kurtosis, and critical slowing have been proposed and studied by numerous researchers as signals of impending regime shift (Biggs et al., 2009). Eason et al. (2014b) reviewed much of this literature and noted that these approaches have been applied extensively to assess models or simple systems with few variables. However, Scheffer et al. (2009) noted that work is still needed to determine whether they are useful for evaluating real, complex, multivariate systems. This sentiment is echoed by Dakos et al. (2012), who explored multiple indicators that have been proposed as early warning signals and found it difficult to determine the best indicator for identifying impending transitions. Moreover, Biggs et al. (2009) found that these approaches typically do not signal a shift until it is underway, often too late for management intervention (Dakos et al., 2012).

Recently, FI has been examined as an indicator of pending regime change (RC). Eason et al. (2014b) used method 2 to explore the relationship between FI and traditional indicators of regime shift, and they comparatively evaluated their results when assessing simple and complex systems. From the study of simple model systems, FI was found to be negatively correlated with variance and positively correlated with kurtosis. The relationship between skewness and critical slowing (denoted by increasing autocorrelation, as measured by the lag-1 autocorrelation coefficient, AR1) was inconclusive. However, the authors argue that because a more ordered system would have less deviation from mean behavior (less skewness), and because critical slowing is reflected by increasing autocorrelation (Scheffer et al., 2009), FI is expected to be negatively correlated with both skewness and AR1 as a system approaches a critical transition. Scheffer et al. (2012) indicated that a positive relationship exists between FI and critical slowing (increasing AR1). However, as noted by Seekell et al. (2012) and in Appendix 2.2, system dynamics may vary and, subsequently, so may indicator signals. For simple systems (a two-species Lotka–Volterra model and a simulation of nitrogen release into a shallow lake), the indicators performed similarly, with FI identifying shifts (local minima) at approximately the same time as, or often before, the other indicators (Eason et al., 2014b).

Numerous researchers have studied the Pacific Ocean ecosystem and cited regional climate (McGowan et al., 1998) and biological (Grebmeier et al., 2006) changes as producing two regime shifts, in 1977 and 1989 (Hare and Mantua, 2000). Karunanithi et al. (2008) previously analyzed this system using FI and found dynamic changes in the system in line with expected shift periods. Here, we present results that extend the study to compare the performance of FI with traditional regime shift indicators. Sixty-five biological and climate variables were compiled from the Hare and Mantua (2000) study and, from these data, FI and variance were computed (hwin = 10, winspace = 1) to assess the dynamic behavior of the system. Method 2 was used to compute FI, and the size of states was determined using Matlab code (Mathworks, Inc., 2012) developed to locate a period in the system with minimal variation and thereby estimate the uncertainty for each dimension.
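For readers who want to reproduce the moving-window bookkeeping behind these indicators, a minimal Python sketch of the windowed-variance calculation is given below (one variance series per variable, with hwin = 10 and winspace = 1 as above). The array shapes and the random demonstration data are placeholders, and the FI computation itself (method 2, with its size-of-states estimation) is not reproduced here.

```python
import numpy as np

def windowed_variance(data, hwin=10, winspace=1):
    """Variance of each variable over a sliding window.

    data : 2-D array, rows = time steps, columns = variables.
    hwin : window length (number of time steps per window).
    winspace : step between successive window start points.
    Returns (window_centers, variances), with one variance column
    per variable, as in the per-variable indicator plots.
    """
    n_steps, n_vars = data.shape
    starts = range(0, n_steps - hwin + 1, winspace)
    centers = np.array([s + hwin / 2.0 for s in starts])
    variances = np.array([data[s:s + hwin].var(axis=0, ddof=1)
                          for s in starts])
    return centers, variances

# Hypothetical demonstration data: 65 variables over 36 annual time steps,
# mimicking the shape of the Hare and Mantua (2000) compilation.
rng = np.random.default_rng(0)
demo = rng.normal(size=(36, 65))
t, var = windowed_variance(demo, hwin=10, winspace=1)
print(var.shape)  # one variance series per variable
```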

The results plotted in Figure 2.7A show that some variables displayed increasing variance but others did not. Such behavior is typical of similar indicators (e.g., skewness, critical slowing) because using traditional indicators to explore system dynamics requires the indicator to be computed separately for each variable (Eason et al., 2014b). Hence, the traditional indicators provided unclear signals about the behavior of this complex ecosystem (and of multivariate systems in general). Seekell et al. (2012) noted similar behavior and reported evidence of conflicting patterns (increases or decreases) in autocorrelation, variance, and skewness as a system approaches a regime shift. When FI was computed from these data, there were local minima in FI that corresponded with shift periods, and decreases in FI prior to the shift (Figure 2.7B). Accordingly, Eason et al. (2014b) proposed that declines in FI be explored as early warning signals of critical transitions because they provide evidence of a loss of dynamic order and a "window of opportunity" for management intervention. Further expanding on this idea, Bayes theorem is used to augment the approach with classical statistical methods. Bayes theorem is not strictly part of information theory; however, we add it here because of its importance in interpreting and extending the results that can be obtained from FI analysis.

Figure 2.7. Comparing regime shift indicator performance for the Bering Strait ecosystem. (A) Variance and (B) FI (method 2).

Bayes theorem

Bayes theorem was developed by the English Reverend Thomas Bayes (1702–1761) and first published in 1763 in the Philosophical Transactions of the Royal Society of London. The theorem deals with conditional probabilities, such as the likelihood of a particular event X occurring if another event Y has already occurred. For purposes of this work, the most important result from Bayes theorem in its simplest form is given in Eq. (2.20), where P(X|Y) is the probability of X being observed if Y has already occurred (the probability of X in the presence of Y), P(X) is the probability of X occurring, P(Y|X) is the probability of Y occurring if X has already happened, and P(Y) is the probability of Y being observed.

(2.20) P(X|Y) = P(X) P(Y|X) / P(Y)

Bayes theorem is most useful when there are reasonable estimates of P ( X ) and P ( Y ) and some information about the conditional probability P ( Y | X ) exists. For assessing warning signals in FI, Bayes theorem is applied to estimate the likelihood that a decrease or a sequence of decreases in FI signals an impending RC. Hence, Eq. (2.21) denotes the appropriate expression using Bayes theorem, where Δ FI < 0 is a decline in FI, P ( RC | Δ FI < 0 ) is the probability that there will be an RC if a decrease in FI has been observed, P ( RC ) is the probability of observing an RC over the history of the system, P ( Δ FI < 0 | R C ) is the probability of seeing a decrease in FI if an RC has been observed, and P ( Δ FI < 0 ) is the probability of observing a decrease in FI.

(2.21) P(RC | ΔFI < 0) = P(RC) P(ΔFI < 0 | RC) / P(ΔFI < 0)

Given that regime shifts have typically been preceded by declines in FI (Mayer et al., 2007), the probability of there being a decrease in FI ( Δ FI < 0 ) if an RC has occurred is one, i.e., P ( Δ FI < 0 | R C ) = 1 . P ( RC ) can be estimated by counting the number of times an RC occurred and dividing it by the total number of FI results. Similarly, P ( Δ FI < 0 ) can be computed from the number of time steps where there was a decline in FI divided by the total number of FI time steps. The same general logic applies for using Bayes theorem to assess whether two or three sequential time steps of decreasing FI signals an RC (i.e., D2 and D3). Bayes theorem will be applied to the study of nation states.
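As a concrete illustration of this counting procedure, the following Python sketch evaluates Eq. (2.21) for a single decline (D1); the FI series and the regime-change steps are hypothetical placeholders, and P(ΔFI < 0 | RC) is set to one as argued above.

```python
import numpy as np

def prob_rc_given_decline(fi, rc_steps, p_decline_given_rc=1.0):
    """Estimate P(RC | dFI < 0) from an FI time series via Eq. (2.21).

    fi : 1-D array of Fisher information values over time.
    rc_steps : indices of time steps at which a regime change occurred.
    """
    n = len(fi)
    p_rc = len(rc_steps) / n                    # P(RC): RCs per FI time step
    declines = np.diff(fi) < 0                  # steps at which FI decreased
    p_decline = declines.sum() / len(declines)  # P(dFI < 0)
    return p_rc * p_decline_given_rc / p_decline

# Hypothetical example: 50 FI values with two known regime changes.
rng = np.random.default_rng(1)
fi = np.cumsum(rng.normal(size=50)) + 10.0
print(prob_rc_given_decline(fi, rc_steps=[20, 40]))
```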

URL: https://www.sciencedirect.com/science/article/pii/B9780127999685000026

HYDROCARBON ASSESSMENT USING SUBJECTIVE PROBABILITY AND MONTE CARLO METHODS

K.J. Roy , in Methods and Models for Assessing Energy Resources, 1979

DEFINITION OF TERMS

Bayes' Theorem–A formula that gives the probability of two events occurring as the product of the probability that event A occurs, times the probability that event B occurs given event A has occurred.

P(A, B) = P(A) P(B|A)

P(B|A) is the conditional probability and P(A) is the marginal probability.

Cumulative distribution function–As used here, a distribution that gives the probability that a random variable will have a value greater than a given value.

Density function–A distribution that gives the probability that a random variable will have a given value.

Exploration play–Hydrocarbon occurrences in which the pools are of similar age and are in reservoirs of the same lithology, have traps of the same or closely related type, contain hydrocarbons from the same source, and in general have had the same history. The play may be entirely conceptual, entirely proven by drilling, or partially explored.

Exploration play method–A method of assessment by which each play in a basin is assessed and the individual play assessments are added to produce a basin assessment.

Monte Carlo methods–Techniques by which solutions to mathematical equations are given in the form of a frequency distribution of answers, obtained by repeated substitution of randomly selected combinations of values of the parameters in the equation. In this paper the variables are described by cumulative distribution functions, and the various combinations of values occur in proportion to the frequency of occurrence of the values of the variables. (A brief illustrative sketch of this sampling procedure follows these definitions.)

Potential equation–An equation that expresses hydrocarbon potential in terms of a series of variables whose values, when determined, lead, through the equation, to a prediction of the value of the hydrocarbon potential.

Prospect–A specific volume of rock that is expected to contain a hydrocarbon pool–an anomaly that would be drilled in hopes of discovering a hydrocarbon accumulation.

Resources–A term that includes all categories of reserves and undiscovered recoverable hydrocarbon potential.

Risk–The marginal probability; the probability that the parameters of hydrocarbon occurrence will have values greater than a determined minimum.

Subjective probability–Probability arrived at by subjective judgments rather than by counting processes.

Undiscovered hydrocarbon potential–The quantity of undiscovered oil and gas postulated to be present in new pools and recoverable with present technology.

Volumetric method–An assessment method in which potential is calculated as the product of a volume of rock and a yield of hydrocarbon per unit volume.

Yield factor–A number expressing the ultimate yield of recoverable hydrocarbon per unit volume of rock. The number is arrived at by consideration of analogous fully explored areas.
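As a toy illustration of the Monte Carlo and volumetric methods defined above, the Python sketch below repeatedly draws a rock volume and a yield factor from assumed distributions, forms the product required by the potential equation, and reports the result as a cumulative (exceedance) distribution. The distribution families and parameter values are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

# Hypothetical input distributions for the potential equation
# potential = rock volume x yield factor (volumetric method).
volume = rng.lognormal(mean=np.log(50.0), sigma=0.5, size=n_trials)  # rock volume
yield_factor = rng.triangular(0.1, 0.5, 1.5, size=n_trials)          # yield per unit volume

potential = volume * yield_factor

# Report the assessment as a cumulative distribution: the probability of
# exceeding a given potential, as in the definitions above.
for p in (0.95, 0.75, 0.50, 0.25, 0.05):
    print(f"P(potential > {np.quantile(potential, 1 - p):8.1f}) = {p:.2f}")
```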

URL: https://www.sciencedirect.com/science/article/pii/B9780080244433500327

Probability and Stochastic Processes

Sergios Theodoridis , in Machine Learning (Second Edition), 2020

Bayes Theorem

The Bayes theorem is a direct consequence of the product rule and the symmetry property of the joint probability, P ( x , y ) = P ( y , x ) , and it is stated as

(2.14) P(y|x) = P(x|y) P(y) / P(x),

where the marginal, P ( x ) , can be written as

P(x) = ∑_{y∈Y} P(x, y) = ∑_{y∈Y} P(x|y) P(y),

and it can be considered as the normalizing constant of the numerator on the right-hand side in Eq. (2.14), which guarantees that summing up P(y|x) with respect to all possible values of y ∈ Y results in one.
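A minimal numerical sketch of Eq. (2.14) and the normalization above, in Python; the three-valued prior and the likelihood values are invented for the example.

```python
import numpy as np

# Hypothetical prior over a discrete variable y with three possible values.
prior_y = np.array([0.5, 0.3, 0.2])           # P(y)
# Hypothetical likelihood of the observed x under each value of y.
lik_x_given_y = np.array([0.10, 0.40, 0.70])  # P(x|y)

joint = lik_x_given_y * prior_y               # numerator P(x|y) P(y)
marginal_x = joint.sum()                      # P(x) = sum_y P(x|y) P(y)
posterior_y_given_x = joint / marginal_x      # Eq. (2.14)

print(posterior_y_given_x, posterior_y_given_x.sum())  # posterior sums to one
```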

The Bayes theorem plays a central role in machine learning, and it will be the basis for developing Bayesian techniques for estimating the values of unknown parameters.

URL: https://www.sciencedirect.com/science/article/pii/B9780128188033000118

Identifying and Reducing Potentially Wrong Immunoassay Results Even When Plausible and "Not-Unreasonable"

Adel A.A. Ismail , in Advances in Clinical Chemistry, 2014

3.1.1.5 Cardiac Troponin (cTn) in patients with myocardial infarction/coronary syndrome

Bayes' theorem described above for false-positive results can also be applied to assess the probability of false-negative results using the same criteria. When the disease has low prevalence the probability that a negative result is false-negative is very low and will not be a major problem. The probability of a false-negative, however, would increase significantly when the disease prevalence rises, e.g., serum cTn measurement in patients with symptoms consistent with acute myocardial infarction (AMI)/acute coronary syndrome (ACS). Assuming an interference rate of ±   0.4% and the prevalence of MI in symptomatic patients is ~   80%, the probability of a negative test being false-negative can be calculated according to the formula:

P(a | not b) = P(not b | a) P(a) / [P(not b | a) P(a) + P(not b | not a) P(not a)]

Or simply

false-negatives divided by the sum of true-negatives and false-negatives

The false-negative rate in the above example would be ~   2%. False-positive results can still occur but at a reduced rate (~   0.5%). A falsely elevated result may lead to hospitalization and expensive investigations such as coronary angiography, while a falsely low result may deny the patient the necessary investigations and treatment with potentially more serious sequelae [58].
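The arithmetic behind these figures can be checked directly from the formula above; the short Python sketch below uses the stated assumptions (interference rate ~0.4%, prevalence ~80%) and returns roughly 1.6%, i.e., the ~2% quoted above.

```python
# Probability that a negative cTn result is a false negative, using the
# Bayes formula above with the stated (approximate) assumptions.
prevalence = 0.80            # P(a): prevalence of AMI/ACS in symptomatic patients
p_neg_given_disease = 0.004  # P(not b | a): negative result despite disease (interference)
p_neg_given_healthy = 0.996  # P(not b | not a): negative result in absence of disease

numerator = p_neg_given_disease * prevalence
denominator = numerator + p_neg_given_healthy * (1.0 - prevalence)
p_false_negative = numerator / denominator
print(f"P(disease | negative result) ~ {p_false_negative:.1%}")  # about 2%
```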

False-positive case reports have been widely reported in recent literature, though interestingly, the number of false-negative case reports is lower [58,121]. It is important also to reemphasize that the effects of interference, either negative or positive, may persist for a long time, depending on the underlying cause that triggered the production of interfering immunoglobulin antibodies. Protocols such as repeat analysis after 6–12 or 6–18 h from onset may be unhelpful or even misleading if the initial reading is false. This is because interference causing a false serum cTn result persists over this short period of time, reinforcing the initial reading and giving a false sense of reassurance. Falsely low serum cTn is known to occur [62,72]. However, because falsely low results are potentially more serious than the more frequently reported falsely high results, "low" serum cTn in symptomatic patients should be interpreted with caution.

It may be of interest to point out that autoantibodies to cardiac troponins were found in 3–6% of healthy control subjects, blood donors, and hospitalized patients with or without a history of cardiac disease [72,122]. Further studies [74,123–129] have provided additional evidence for the presence of these potentially offending and interfering antibodies capable of causing false-negative immunoassay results. A number of small clinical studies [74,123–128] on patients with AMI/ACS have demonstrated false-negative rates for cTn I ranging from ~1.5% to 6%, a figure not dissimilar to that projected by Bayes' theorem. In a recent multicenter study for early diagnosis of AMI involving 1818 patients [130], the clinical sensitivity and specificity of the cTn I immunoassay were 90.7% and 90.2%, respectively, regardless of the interval between the onset of chest pain and admission/analysis. Unfortunately, in this and other similar clinical studies, no follow-up investigations were carried out to assess analytical interference in cTn I analyses and its contribution to the ~10% loss of accuracy.

Ideally, advice to identify false-negative results for cTn I should be provided by the immunoassay's supplier. However, laboratorians can initiate some confirmatory tests (see Table 7.1) such as repeat analysis using blocking antibodies or measuring creatine kinase (CK) activity enzymatically (non-immunologically) as well as the doubling dilution test which in some cases of negative interference can paradoxically produce higher results on progressive dilution [81] (i.e., reverse linearity). Discrepant results should raise other diagnostic possibilities including false-negative interference [114,115].

URL: https://www.sciencedirect.com/science/article/pii/B9780128014011000074

Risk Based Integrity Modeling of Gas Processing Facilities using Bayesian Analysis

Premkumar Thodi , ... Mahmoud Haddara , in Proceedings of the 1st Annual Gas Processing Symposium, 2009

6 Posterior Probability Modeling

6.1 Bayes Theorem

Bayes' theorem states how to update the prior distribution p(θ) with the likelihood function p(y / θ) to obtain the posterior distribution:

(1) p(θ / y) = p(θ) p(y / θ) / ∫ p(θ) p(y / θ) dθ

The posterior density p(θ / y) summarizes the total information after viewing the data and provides a basis for inference regarding the parameter θ (Leonard and Hsu, 1999).

There are four methods for computing the posterior distributions from the priors and likelihood functions (Ghosh et al., 2006): analytical approximations, such as numerical integration techniques and Laplace approximations; data augmentation methods, such as the E-M (Expectation-Maximization) algorithm; direct Monte Carlo sampling; and McMC (Markov chain Monte Carlo) methods, such as the M-H algorithm and Gibbs sampling. The selected prior and likelihood models for the degradations, such as the 3P-Weibull, 3P-lognormal, and Type 1 extreme value distributions, do not lend themselves easily to Bayesian updating. The main problem is that there is no distribution class on the parameters that is preserved under Bayesian updating (Bedford and Cooke, 2001). This means that simulation methods are the best way to determine the posterior distributions. The use of the M-H algorithm in conjunction with a particular choice of prior has been suggested (Robert and Casella, 1999). Hence, in the present study, the M-H algorithm and the Laplace approximation method are used for computing the posteriors.

The M-H algorithm is a rejection sampling algorithm used to generate a sequence of samples from a probability distribution that is difficult to sample directly. This sequence can be used in McMC simulation to approximate a distribution or to compute an integral. In Bayesian applications, the ability to generate samples without knowing the normalizing constant is a major virtue of this algorithm. Details on implementation of the algorithm may be found in Chib and Greenberg (1995); Robert and Casella (1999).
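For readers unfamiliar with the algorithm, a generic random-walk Metropolis-Hastings sketch in Python is shown below (the study's own implementation was in Matlab and is not reproduced here); the target is an arbitrary unnormalized gamma posterior chosen only to illustrate that the normalizing constant is never needed.

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_samples=10_000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings sampler.

    log_post : function returning the log of the unnormalized posterior;
               the normalizing constant is never required.
    """
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + rng.normal(scale=step)          # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Illustrative unnormalized posterior: gamma(shape=3, scale=2) on x > 0.
def log_post(x):
    return -np.inf if x <= 0 else 2.0 * np.log(x) - x / 2.0

draws = metropolis_hastings(log_post, x0=1.0)
print(draws[1000:].mean())  # should approach the gamma mean, 6.0
```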

The Laplace approximation involves expanding the parameters in a Taylor series about the mean values and then using the second order terms for approximating the integrals in numerator and denominator of the posterior function. The posterior means and variances are estimated using Laplace approximation and then the relevant empirical formulas are used to estimate posterior parameters. If g(θ) is a smooth, positive function on the parameter space, the posterior mean of g(θ) can be written as;

(2) E[g] = E[g(θ) / y] = ∫ g(θ) exp(l(θ)) π(θ) dθ / ∫ exp(l(θ)) π(θ) dθ

where l(θ) is the log-likelihood and π(θ) is the prior density. Implementation details of the Laplace method may be found in Tierney and Kadane (1986) and Tanner (1996). Two Matlab codes have been developed: one for the M-H algorithm and another for the Laplace approximation. In order to calibrate the codes, the posterior parameters estimated using the M-H algorithm and the Laplace approximation are compared with those of known conjugate-pair estimates for normal-normal, gamma-gamma, gamma-normal, and gamma-Poisson distributions. The comparison is summarized in Table 4. Since the error in the parameters estimated using the Laplace approximation is higher than that of the M-H algorithm, the M-H algorithm has been used further to estimate the posteriors.

Table 4. Comparison of Posteriors by the M-H Algorithm and Laplace Approximations

Parameters (Conjugate Pair) | Parameters (M-H Algorithm) | Percentage Error | Parameters (Laplace Approx.) | Percentage Error
8.20, 0.800 | 8.229, 0.750 | −0.35, 6.28 | 8.201, 0.748 | −0.01, 6.55
2.10, 1.025 | 2.356, 1.036 | −12.18, −1.09 | 2.011, 1.208 | 4.24, −17.89
1.50, 0.750 | 1.562, 0.744 | −4.11, 0.75 | 1.433, 0.792 | 4.46, −5.53
1.10, 1.025 | 1.199, 1.076 | −8.96, −4.94 | 1.221, 0.734 | −11.03, 28.41

Both the M-H algorithm and the Laplace approximation were used to develop the posteriors of the aforementioned degradation priors; however, the M-H algorithm produced better results than the Laplace approximation. The error in the Laplace estimation was found to increase when estimating variances using squared terms. Further, it was observed that the M-H algorithm converges to sensible results within 10,000 iterations. The sample prior-posterior analysis results using the M-H algorithm for UC and EC are summarized in Table 5 and shown graphically in Figures 5a and 5b.

Table 5. Sample Posterior Probability Models and the Estimated Parameters

Structural Degradation | Type of Model | Shape | Scale | Threshold
Uniform Corrosion | 3P Weibull | 1.2740 | 0.1017 | 0.0087
Uniform Corrosion | 3P Lognormal | 0.1283 | 0.2754 | −0.1192
Erosion Corrosion | Type 1 Ext. Value | 0.2682 | 0.0673 | –
Erosion Corrosion | 3P Weibull | 1.5280 | 3.4480 | 0.2896

Fig. 5a. Prior-Posterior Analysis Results for UC

Fig. 5b. Prior-Posterior Analysis Results for EC

URL: https://www.sciencedirect.com/science/article/pii/B9780444532923500377

Equipment Failure

Arnab Chakrabarty , ... Tahir Cagin , in Multiscale Modeling for Process Safety Applications, 2016

7.2.2.3 Bayes theorem and application to equipment failure

Bayes theorem can also be extended to a continuous form where probability distributions are used:

(7.28) f″(θ|x) = P(x|θ) f′(θ) / ∫ P(x|θ) f′(θ) dθ

where f′ represents the prior probability distribution, f″ represents the posterior probability distribution, and P(x|θ) is the probability of x as a function of θ. The idea is the same: update prior beliefs about the performance of the system with observed information, but in this case probability distributions are used and updated.

Typically, the selection of the prior distribution is somewhat subjective, so a conjugate prior from the same family of distributions as the posterior is often selected, which makes computation of the posterior parameters easier. Examples of conjugate pairs are a beta prior with a binomial likelihood (leading to a beta posterior) and a gamma prior with a Poisson likelihood (leading to a gamma posterior). According to the literature (Shafaghi, 2008), the gamma distribution and Poisson likelihood are best suited to finding the failure rate of components, while the binomial likelihood and beta distribution are best suited for calculating the rate of failure on demand. An explanation of the Poisson/gamma process follows.

The discrete Poisson distribution assumes that events of interest are randomly dispersed in time or space with a constant intensity of λ (Modarres et al., 2009). The distribution can be expressed as:

(7.29) P(x) = ρ^x exp(−ρ) / x!

where x is a discrete variable that describes the number of events observed at a certain time interval, and ρ  = λt, where λ is as described earlier and t is the time interval of interest. The Poisson distribution can be used as the likelihood function gathered from observed data in the continuous Bayes formulation.

The gamma distribution is the continuous analog of the Poisson, and models time for α events to occur randomly with an observed mean time of β between events. This is used as the prior probability in the continuous Bayes formulation. This distribution can be expressed as:

(7.30) f(t) = [1 / (β^α Γ(α))] t^(α−1) exp(−t/β)

Γ(α) can be simplified to (α − 1)! for all discrete values of α. As a simple example of the use of this continuous formulation, consider a set of tanks that are believed to crack due to corrosion in a gamma-distributed manner. It is believed that three events will occur randomly with a mean time of 5 years between each event (α = 3, β = 5). The prior probability function can be stated as:

(7.31) f(t) = [1 / (5^3 (3 − 1)!)] t^(3−1) exp(−t/5) = (1/250) t^2 exp(−t/5)

Suppose that the tanks are observed for 1   year and an event occurs. The observed λ value would now be one per year. The Poisson distribution of this observation would be:

(7.32) P(x) = (1 × t)^1 exp(−1 × t) / 1! = t exp(−t)

Using the above formula to calculate the posterior distribution, we find:

(7.33) f″(θ|x) = P(x|θ) f′(θ) / ∫ P(x|θ) f′(θ) dθ = [t exp(−t) · (1/250) t^2 exp(−t/5)] / ∫_0^∞ t exp(−t) · (1/250) t^2 exp(−t/5) dt

this can be simplified to:

(7.34) f″(θ|x) = t^3 exp(−6t/5) / ∫_0^∞ t^3 exp(−6t/5) dt ≈ (1/2.89) t^3 exp(−6t/5)

which fits the gamma distribution with parameters α = 4 and β = 5/6. The β parameter has gone from a predicted mean time between events of 5 years to one of 10 months. The updated distribution is compared with the prior distribution in Figure 7.2.

Figure 7.2. Probability distribution function of tank failure.

As can be seen, the probability of a failure has gone from relatively uncertain to having a significant spike early in operation as a result of the early failure. The Bayes model predicts that failures are expected to occur much earlier than originally anticipated. With further monitoring and continuous testing, the model can be updated again with more information as time goes on to provide an even more accurate distribution.
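The prior-versus-posterior comparison described above can be reproduced with a few lines of Python/SciPy, evaluating the prior gamma(α = 3, β = 5) against the updated gamma(α = 4, β = 5/6); the grid of times and the printed summary are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy.stats import gamma

t = np.linspace(0.0, 20.0, 400)  # years of operation

# Prior: alpha = 3 events with mean time beta = 5 years between events.
prior_pdf = gamma(a=3, scale=5.0).pdf(t)
# Posterior after observing one event in the first year: alpha = 4, beta = 5/6.
post_pdf = gamma(a=4, scale=5.0 / 6.0).pdf(t)

# The posterior concentrates at much earlier times than the prior,
# which is the shift plotted in Figure 7.2.
print("prior mode (years):", t[np.argmax(prior_pdf)])      # about 10 years
print("posterior mode (years):", t[np.argmax(post_pdf)])   # about 2.5 years
```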

URL: https://www.sciencedirect.com/science/article/pii/B9780123969750000073

Sensor data analysis, reduction and fusion for assessing and monitoring civil infrastructures

D. Zonta , in Sensor Technologies for Civil Infrastructures, 2014

2.4.4 Alternative non-probabilistic models

Although Bayes' theorem lets us state a rigorous mathematical formulation for any inference problem encountered in structural monitoring, it is a fact that civil engineers are not usually very familiar with probability theory, and they are often discouraged by the complexity of its application. Indeed, implementations of probabilistic methods usually involve great computational effort to integrate the likelihood and evidence functions; they also require a formal definition of all the uncertainties in the variables used in the inference process, which is not an easy task if one is unfamiliar with statistics.

These limits of Bayesian probability theory have in the past justified the development of alternative methods to handle uncertainties. Among these methods, possibly the most popular approaches are fuzzy logic and interval algebra. Below we provide a brief outline of these approaches.

Fuzzy logic. Fuzzy set theory was first introduced by Zadeh (1965) as a way to handle semantically imprecise concepts. Fuzzy logic is based on the concept of a membership function. While in deterministic logic an element x either belongs or does not belong to a set A, in fuzzy logic this is defined through a membership function μA(x); the metric of membership is a value ranging between 0 and 1, known as the grade of truth, or truth value. In fuzzy logic the classical Boolean logic operators (NOT, AND, OR) are redefined and often referred to as Zadeh operators. For example, NOT is defined as:

[2.51] μ_Ā(x) = 1 − μ_A(x)

The operators AND and OR are not univocally defined, but they are usually defined as follows: the grade of truth of x being simultaneously in A AND B (which is to say, x is in the intersection of A and B, A∩B) is the minimum of the grades of truth of A and B:

[2.52] μ_{A∩B}(x) = min(μ_A(x), μ_B(x))

Similarly, operator OR is usually defined as the maximum grade of truth of the two

[2.53] μ_{A∪B}(x) = max(μ_A(x), μ_B(x))
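A compact Python sketch of the Zadeh operators in Equations [2.51]-[2.53]; the membership grades used at the end are made up purely for illustration.

```python
def fuzzy_not(mu_a):
    return 1.0 - mu_a                 # Eq. [2.51]

def fuzzy_and(mu_a, mu_b):
    return min(mu_a, mu_b)            # Eq. [2.52], membership in A AND B

def fuzzy_or(mu_a, mu_b):
    return max(mu_a, mu_b)            # Eq. [2.53], membership in A OR B

# Hypothetical grades of truth for an element x in sets A and B.
mu_A, mu_B = 0.7, 0.4
print(fuzzy_not(mu_A), fuzzy_and(mu_A, mu_B), fuzzy_or(mu_A, mu_B))
```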

It is almost natural to seek some analogy between the truth value and Bayesian probability; specifically, the grade of truth, varying between 0 and 1, strongly recalls the concept of probability. However, comparison of the expression for the joint probability with Equation [2.52] shows that fuzzy logic coincides with probability only under special conditions, and in any case it does not have the flexibility to handle correlation. Critics of fuzzy logic, often arguing on the basis of de Finetti's theory (1993), see no need for a method of handling uncertainties that is an alternative to the probabilistic one.

Interval-based techniques. Interval algebra, first disseminated by the work of Moore (1966), is an even rougher way to handle uncertainties. While in deterministic logic the value of a variable x is determinate, and in probabilistic analysis is defined by a distribution pdf(x), in interval analysis the uncertainty of a variable is defined through an interval x:

[2.54] x = [x̲, x̄]

which means that the variable x is expected to range from a minimum value of x̲ to a maximum of x̄. Interval algebra provides a formal extension of the classical algebraic operators; for example, sum and difference are redefined as:

[2.55] x + y = [x̲, x̄] + [y̲, ȳ] = [x̲ + y̲, x̄ + ȳ]

[2.56] x − y = [x̲, x̄] − [y̲, ȳ] = [x̲ − ȳ, x̄ − y̲]

Interval analysis has gained some popularity in the past, and it is sometimes still used today for its simplicity and the easy interpretation of its results. From a statistical perspective, it can be seen as a very rough approximation of probability theory, in which distributions are conservatively forced to be uniform and correlation is ignored. These simplifications often produce overly conservative results: for example, direct application of Equation [2.56] produces the following apparent paradox:

[2.57] x − x = [x̲, x̄] − [x̲, x̄] = [x̲ − x̄, x̄ − x̲]

for which an interval minus itself is not equal to zero, but to an interval centred on zero. That said, using intervals can be useful when scarce information makes it difficult to define the distributions necessary to state the inference problem in rigorous probabilistic terms. The reader is also referred to the textbook by Alefeld and Herzberger (1983) for an exhaustive explanation of interval logic.
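The subtraction paradox in Equation [2.57] is easy to reproduce in code; the tiny Python class below is only a sketch of the sum and difference rules in Equations [2.55] and [2.56], not a full interval-arithmetic library.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float   # lower bound (x underbar)
    hi: float   # upper bound (x bar)

    def __add__(self, other):               # Eq. [2.55]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):               # Eq. [2.56]
        return Interval(self.lo - other.hi, self.hi - other.lo)

x = Interval(2.0, 3.0)
print(x - x)   # Interval(lo=-1.0, hi=1.0): centred on zero, but not zero
```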

URL: https://www.sciencedirect.com/science/article/pii/B9781782422426500028

Basics of probability and statistics

J. Morio , M. Balesdent , in Estimation of Rare Event Probabilities in Complex Aerospace and Other Systems, 2016

2.1.2 Notion of dependence of random events and conditional probabilities

Definition 2.1.7

Let A and B be two random events such that P ( B ) > 0 . The probability of A knowing B denoted P ( A | B ) is defined by

(2.1) P(A|B) = P(A ∩ B) / P(B)

Theorem 2.1.1 (Bayes' theorem)

Let A and B be two random events such that P ( B ) > 0 . The probability of A knowing B, P ( A | B ) is equal to

P(A|B) = P(B|A) P(A) / P(B)

This theorem is obtained from Equation (2.1) because we have

P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)

Definition 2.1.8

Let A and B be two random events. A and B are said to be independent if

P(A ∩ B) = P(A) P(B)

Consequently, we deduce from the definition of a conditional probability that P ( A | B ) = P ( A ) when the events A and B are independent. The realizations of A do not depend on the realizations of B.
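A small numerical check of Definition 2.1.7, Theorem 2.1.1, and Definition 2.1.8 in Python, using invented probabilities:

```python
# Invented joint and marginal probabilities for two events A and B.
p_a, p_b = 0.30, 0.50
p_a_and_b = 0.12                       # P(A ∩ B)

p_a_given_b = p_a_and_b / p_b          # Definition 2.1.7
p_b_given_a = p_a_and_b / p_a
# Bayes' theorem (Theorem 2.1.1): both routes give the same P(A|B).
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12

# Definition 2.1.8: P(A)P(B) = 0.15 differs from P(A ∩ B) = 0.12, so A and B
# are dependent, and indeed P(A|B) = 0.24 differs from P(A) = 0.30.
print(p_a_given_b, p_a * p_b)
```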

URL: https://www.sciencedirect.com/science/article/pii/B9780081000915000022

Probabilistic analysis in DC grids

Carlos D. Zuluaga R. , in Modeling, Operation, and Analysis of DC Grids, 2021

7.4.1 Bayes theorem and its interpretation

Before defining the Bayes theorem and its significance, it is necessary to put some comments on notation in context. For example, p(V|P) denotes a conditional probability density of the magnitude of voltage V given the power injections P, and similarly for p(P), which denotes a marginal distribution of P. When we use a normal (Gaussian) distribution, it is usually denoted by the name of the distribution; for example, if V_i, the magnitude of voltage at bus i, is modeled by a normal distribution with mean μ_Vi and variance σ²_Vi, then we write V_i ∼ N(μ_Vi, σ²_Vi), that is, p(V_i) = N(V_i | μ_Vi, σ²_Vi). See Chapter 13 in [17] for more information about several standard distributions.

In general, in power flow analysis of DC grids (see Chapter 8), the power injections are assumed to be known, and the magnitude of voltage is computed by using, for example, the Newton–Raphson method. In real DC grids the power injections P must be considered as random variables due to the penetration of renewable energy resources. Therefore the goal in this section is obtaining a function that models the uncertainty over the magnitude of voltage V. Using the previous notation, we are interested in knowing the conditional probability density of the magnitude of voltage V given the power injections P. From the product rule of the probability theory, together with the symmetry property p ( V , P ) = p ( P , V ) , we can obtain

(7.4) p(V|P) = p(P|V) p(V) / p(P),

where p(V) is the prior distribution (prior) for V, p(P|V) is the likelihood function (likelihood) that expresses how probable the power injections are for different settings of V, p(V|P) is the posterior probability distribution (posterior) of the magnitude of voltage V given the power injections P, and p(P) is a normalization constant for ensuring that the posterior is a valid probability distribution. This normalization constant is given by p(P) = ∫ p(P|V) p(V) dV. The likelihood is also called the sampling distribution or the data distribution [18]. The expression shown in Eq. (7.4) is called Bayes' theorem.

In Bayesian analysis, we can model V as a random variable having a probability distribution p ( V ) , called the prior distribution. This distribution represents our prior belief about the value of V before P is observed. On the other hand, the posterior distribution represents our knowledge about V after having observed P.

Example

Suppose we have a DC microgrid with three nodes, that is,

(7.5) [P_1, P_2]^T = [f_1(V_0, V_1, V_2), f_2(V_0, V_1, V_2)]^T = f([V_1, V_2]^T).

In this case, V 0 is known. Let us consider that P 1 and P 2 can be modeled by probability distributions assuming that there is uncertainty in the microgrid. Therefore we are interested in obtaining the posterior distribution of V 1 and V 2 given P 1 and P 2 , that is, p ( V 1 | P 1 , P 2 ) and p ( V 2 | P 1 , P 2 ) . To apply the Bayes theorem (7.4), we suppose that V 1 and V 2 are random variables and their prior distribution can be modeled by p ( V 1 , V 2 ) . From the Bayes theorem we obtain

(7.6) p(V_1, V_2 | P_1, P_2) = p(P_1, P_2 | V_1, V_2) p(V_1, V_2) / p(P_1, P_2),

where

(7.7) p(P_1, P_2) = ∫_{V_1} ∫_{V_2} p(P_1, P_2 | V_1, V_2) p(V_1, V_2) dV_1 dV_2.

This is equivalent to computing a volume contained within a surface. Calculating the average over V 2 , we have

(7.8) p(V_1 | P_1, P_2) = ∫ p(V_1, V_2 | P_1, P_2) dV_2.

The posterior distribution p ( V 1 | P 1 , P 2 ) can be calculated if the likelihood function p ( P 1 , P 2 | V 1 , V 2 ) and the normalization constant (see Eq. (7.7)) are tractable. 3

If we consider a DC microgrid with n nodes with n voltages ( V 1 , V 2 , , V n ) , we must compute the normalization constant as follows:

(7.9) p(P_1, …, P_n) = ∫_{V_1} ⋯ ∫_{V_n} p(P_1, …, P_n | V_1, …, V_n) p(V_1, V_2, …, V_n) dV_1 ⋯ dV_n,

which can be an intractable normalizing constant.

Generally, the normalization constant or evidence in exact Bayesian inference is intractable, either because it is not possible to obtain an analytic (closed) form of the likelihood function (it depends on the uncertainty in P) or because there are too many variables in V. In DC microgrids, several power injections in P must be modeled by non-Gaussian distributions [17], and f (see Eq. (7.2)) is nonlinear. For these reasons, we can only use approximate Bayesian inference methods to model the uncertainty in DC grids. In the following subsections, we show how to apply several approximate methods for obtaining Bayesian modeling of these grids.
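As a toy illustration of Eq. (7.4) when the evidence integral has no closed form, the Python sketch below uses a one-dimensional grid approximation for a single bus; the power-flow relation f, the prior on V, and the noise level are invented placeholders rather than the chapter's model.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical scalar "power flow" relation P = f(V) for one bus.
def f(v):
    return 10.0 * v * (v - 1.0)        # invented nonlinear injection model (p.u.)

p_observed = 0.55                      # observed power injection (p.u.)
sigma_p = 0.05                         # assumed injection uncertainty

# Grid over plausible voltages and a Gaussian prior p(V).
v_grid = np.linspace(0.9, 1.2, 2001)
dv = v_grid[1] - v_grid[0]
prior = norm.pdf(v_grid, loc=1.05, scale=0.03)
likelihood = norm.pdf(p_observed, loc=f(v_grid), scale=sigma_p)  # p(P | V)

unnormalized = likelihood * prior
evidence = unnormalized.sum() * dv      # numerical stand-in for p(P) = ∫ p(P|V) p(V) dV
posterior = unnormalized / evidence     # Eq. (7.4)

print("posterior mean voltage (p.u.):", (v_grid * posterior).sum() * dv)
```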

URL: https://www.sciencedirect.com/science/article/pii/B9780128221013000125

Confirmation Theory

James Hawthorne , in Philosophy of Statistics, 2011

3.4 Likelihood Ratios, Likelihoodism, and the Law of Likelihood

The versions of Bayes' Theorem expressed by Equations 9–11 show that for probabilistic confirmation theory the influence of empirical evidence on posterior probabilities of hypotheses is completely captured by the ratios of likelihoods, P[e n |h j ·b·c n ]/P[e n |h i ·b·c n ]. The evidence (c n · e n ) influences the posterior probabilities in no other way. 26 So, the following "Law" is implied by the logic of confirmation functions.

General Law of Likelihood: Given any pair of incompatible hypotheses h i and h j , whenever the likelihoods P α [e n |h j ·b·c n ] and P α [e n |h i ·b·c n ] are defined, the evidence (c n ·e n ) favors h i over h j , given b, if and only if P α [e n |h i ·b·c n ] > P α [e n |h j ·b·c n ]. The ratio of likelihoods P α [e n |h i ·b·c n ]/P α [e n |h j ·b·c n ] measures the strength of the evidence for h i over h j given b. 27

The Law of Likelihood says that the likelihood ratios represent the total impact of the evidence. Bayesians agree with this, but take prior probabilities to also play a role in the net assessment of confirmation, as represented by the posterior probabilities. So, for Bayesians, even when the strength of the evidence, P α [e n |h i ·b·c n ]/P α [e n |h j ·b·c n ], is very high, strongly favoring h i over h j , the net degree of confirmation of h i may be much smaller than that of h j if h i is taken to be much less plausible than h j on grounds not captured by this evidence (where the weight of these additional considerations is represented by the confirmational prior probabilities of hypotheses).

Two features of the way the General Law of Likelihood is stated here need some explanation. As stated, this law does not presuppose that likelihoods of form P α [e n |h j · b · c n ] and P α [e n |h i · b · c n ] are always defined. This qualification is introduced to accommodate a conception of evidential support called Likelihoodism, which I'll say more about in a moment. Also, the likelihoods in the law are expressed with the subscript α attached, to indicate that the law holds for each confirmation function P α , even if the values of the likelihoods are not completely objective or agreed on by a given scientific community. These two features of the law both involve issues concerning the objectivity of the likelihoods.

Each confirmation function (each function that satisfies the axioms of section 1) is defined on every pair of sentences. So, the likelihoods are always defined for a given confirmation function. Thus, for a Bayesian confirmation theory the qualifying clause about the likelihoods being defined is automatically satisfied. Furthermore, for confirmation functions the versions of Bayes' theorem (Equations 8–11) hold even when the likelihoods are not objective or intersubjectively agreed. When intersubjective agreement on likelihoods may fail, we leave the subscripts α, β, etc. attached to the likelihoods to indicate this possible lack of objective agreement. Even so, the General Law of Likelihood applies to the confirmation function likelihoods taken one confirmation function at a time. For each confirmation function, the impact of the evidence in distinguishing between hypotheses is completely captured by the likelihood ratios.

A view (or family of views) called likelihoodism maintains that confirmation theory should only concern itself with how much the evidence supports one hypothesis over another, and maintains that evidential support should only involve ratios of completely objective likelihoods. When the likelihoods are objective, their ratios provide an objective measure of how strongly the evidence supports h i as compared to h j , one that is "untainted" by such subjective elements as prior plausibility considerations. According to likelihoodists, objective likelihood ratios are the only scientifically appropriate way to assess what the evidence says about hypotheses.

Likelihoodists need not reject Bayesian confirmation theory altogether. Many are statisticians and logicians who hold that the logical assessment of the evidential impact should be kept separate from other considerations. They often add that the only job of the statistician/logician is to evaluate the objective strength of the evidence. Some concede that the way in which these objective likelihoods should influence the agents' posterior confidence in the truth of a hypothesis may depend on additional considerations — and that perhaps these considerations may be represented by individual subjective prior probabilities for agents in the way Bayesians suggest. But such considerations go beyond the impact of the evidence. So it's not the place of the statistician/logician to compute recommended values of posterior probabilities for the scientific community. 28

For most pairs of sentences conditional probabilities fail to be objectively defined in a way that suits likelihoodists. So, by their lights the logic of confirmation functions (captured by the axioms of section 1) cannot represent an objective logic of evidential support. Because of this, likelihoodists do not have Bayes' theorem available (except in special cases where an objective probability measure on the hypothesis space is available), and so cannot extract the Law of Likelihood from it (as do Bayesians via Equations 9–11). Rather, likelihoodists must state the Law of Likelihood as an axiom of their logic of evidential support, an axiom that (for them) applies only when likelihoods have well-defined objective values.

Likelihoodists tend to have a very strict conception of what it takes for likelihoods to be well-defined. They consider a likelihood well-defined only when it has the form of what we referred to earlier as a direct inference likelihood — i.e., only when either, (1) the hypothesis (together with background and experimental conditions) logically entails the evidence claim, or (2) the hypothesis (together with background conditions) logically entails an explicit simple statistical hypothesis that (together with experimental conditions) specifies precise probabilities for each type of event that makes up the evidence.

Likelihoodists make a point of contrasting simple statistical hypotheses with composite statistical hypotheses, which only entail imprecise, or disjunctive, or directional claims about the statistical probabilities of evidential events. A simple statistical hypothesis might say, for example, "the chance of heads on tosses of the coin is precisely .65"; a composite statistical hypothesis might say, "the chance of heads on tosses is either .65 or .75", or it may be a directional hypothesis that says, "the chance of heads on tosses is greater than .65". Likelihoodists maintain that composite hypotheses are not an appropriate basis for well-defined likelihoods, because such hypotheses represent a kind of disjunction of simple statistical hypotheses, and so must depend on non-objective factors — i.e. they must depend on the prior probabilities of the various hypotheses in the disjunction. For example, "the chance of heads on tosses is either .65 or .75", is a disjunction of the two simple statistical hypotheses h .65 and h .75. From the axioms of probability theory it follows that the likelihood of any specific sequence of outcomes e from appropriate tosses c is given by

P α [e | c·(h .65 ∨ h .75)] = (P[e | c·h .65] P α [h .65 | c] + P[e | c·h .75] P α [h .75 | c]) / (P α [h .65 | c] + P α [h .75 | c])

where only the likelihoods based on simple hypotheses (those from which I have dropped the 'α') are completely objective. Thus, likelihoods based on disjunctive hypotheses depend (at least implicitly) on the prior probabilities of the simple statistical hypotheses involved; and likelihoodists consider such factors to be too subjective to be permitted a role in a logic that is supposed to represent only the impact of the evidence.
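A short numerical sketch of these quantities for the coin example: with an invented outcome of 70 heads in 100 tosses, the likelihoods under the two simple hypotheses are objective, while the likelihood of the disjunction depends on the (arbitrary) prior weights, as in the formula above.

```python
from scipy.stats import binom

n, heads = 100, 70                      # invented summary of the outcome sequence e
lik_65 = binom.pmf(heads, n, 0.65)      # P[e | c·h_.65], objective
lik_75 = binom.pmf(heads, n, 0.75)      # P[e | c·h_.75], objective

print("likelihood ratio, h_.75 over h_.65:", lik_75 / lik_65)

# Likelihood of the composite hypothesis (h_.65 or h_.75) depends on the
# (subjective) prior weights of the two disjuncts, as in the formula above.
prior_65, prior_75 = 0.5, 0.5           # arbitrary confirmational priors
lik_disjunction = (lik_65 * prior_65 + lik_75 * prior_75) / (prior_65 + prior_75)
print("composite likelihood:", lik_disjunction)
```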

Taking all of this into account, the version of the Law of Likelihood appropriate to likelihoodists may be stated as follows.

Special Law of Likelihood:

Given a pair of incompatible hypotheses h i and h j that imply statistical models regarding outcomes e n given b·c n , the likelihoods P[e n |h j ·b·c n ] and P[e n |h i · b · c n ] are well defined. For such likelihoods, the evidence (c n · e n ) supports h i over h j , given b, if and only if P[e n | h i · b · c n ] > P[e n | h j ·b·c n ]; the ratio of likelihoods P[e n | h i ·b·c n ] / P[e n |h j ·b·c n ] measures the strength of the evidence for h i over h j given b.

Notice that when either version of the Law of Likelihood holds, the absolute size of any particular likelihood is irrelevant to the strength of the evidence. All that matters is the relative size of the likelihoods — i.e., the size of their ratio. Here is a way to see the point. Let c 1 and c 2 be the conditions for two different experiments having outcomes e 1 and e 2, respectively. Suppose that e 1 is 1000 times more likely on hypothesis h i (given b · c 1) than is e 2 on h i (given b · c 2); and suppose that e 1 is also 1000 times more likely on h j (given b · c 1) than is e 2 on h j (given b · c 2) — i.e., suppose that P α [e 1|h i · b · c 1] = 1000 × P α [e 2|h i · b · c 2], and P α [e 1|h j · b · c 1] = 1000 × P α [e 2|h j · b · c 2]. Which piece of evidence, (c 1 · e 1) or (c 2 · e 2), is stronger evidence with regard to the comparison of h i to h j ? The Law of Likelihood implies both are equally strong. All that matters evidentially are the ratios of the likelihoods, and they are the same in this case:

P α [e 1|h i ·b·c 1] / P α [e 1|h j ·b·c 1] = P α [e 2|h i ·b·c 2] / P α [e 2|h j ·b·c 2]

Thus, the General Law of Likelihood implies the following principle.

General Likelihood Principle:

Suppose two different experiments or observations (or two sequences of them) c 1 and c 2 produce outcomes e 1 and e 2, respectively. Let {h 1, h 2, …} be any set of alternative hypotheses. If there is a constant r such that for each hypothesis h j from the set, P α [e 1|h j · b · c 1] = r× P α [e 2|h j ·b·c 2], then the evidential import of (c 1·e 1) for distinguishing among hypotheses in the set (given b) is precisely the same as the evidential import of (c 2 · e 2).

Similarly, the Special Law of Likelihood implies a corresponding Special Likelihood Principle that applies only to hypotheses that express simple statistical models. 29

Bayesians agree with likelihoodists that likelihood ratios completely characterize the extent to which the evidence favors one hypothesis over another (as shown by Equations 9–11). So they agree with the letter of the Law of Likelihood and the Likelihood Principle. Furthermore, Bayesian confirmationists may agree that it's important to keep likelihoods separate from other factors, such as prior probabilities, in scientific reports about the evidence. However, Bayesians go further than most likelihoodists in finding a legitimate role for prior plausibility assessments to play in the full evaluation of scientific hypotheses. They propose to combine a measure of the impact of evidence (couched in terms of ratios of likelihoods) with a measure of the plausibility of hypotheses based on all other relevant factors (couched in terms of ratios of prior probabilities) to yield a probabilistic measure of the net confirmation of each hypothesis (its posterior probability).

Throughout the remainder of this article I will not assume that likelihoods must be based on simple statistical hypotheses, as likelihoodists would have them. However, most of what will be said about likelihoods, including the convergence results in section 5 (which only involve likelihoods), applies to the likelihoodist conception of likelihoods as well. We'll continue for now to take the likelihoods with which we are dealing to be objective in the sense that all members of the scientific community agree on their numerical values. In section 6 we'll see how to extend this approach to cases where the likelihoods are less objectively determinate.

URL: https://www.sciencedirect.com/science/article/pii/B9780444518620500101