The Principle of Maximum Entropy states: when one has only partial information about the possible outcomes, one should choose the probabilities so as to maximize the uncertainty about the missing information, as shown by Jaynes [8]. In other words, the basic rule is: use all the information on the parameter that you have, but avoid including any information that you do not have. One should therefore be as uncommitted as possible about missing information.
Entropy is also a measure of randomness. By applying the principle of maximum entropy, one obtains the most random distribution that satisfies the given constraints. Put differently, if the information about a distribution is incomplete, the best estimate is the one that is as unbiased as possible, so we choose the most random distribution possible. Choosing any other distribution would mean including additional information that was not given to us, and so would violate the principle.
Jaynes proposed that Shannon's measure of uncertainty (entropy) could be
used to define the values for probabilities.
The principle of Maximum Entropy implies that if there are $n$ possible
outcomes then, in the absence of additional information, the outcomes should
be presumed to have equal probabilities, so that no outcome is preferred over
any other:

$$p_1 = p_2 = \dots = p_n = \frac{1}{n}$$
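As a small numerical illustration of why equal probabilities are the natural choice when nothing else is known, the sketch below compares the Shannon entropy of a uniform distribution with that of two less uniform ones. The function name `shannon_entropy` and the three example distributions are illustrative choices, not taken from the text.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy S = -sum(p * ln p), with the term for p = 0 taken as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

n = 4
uniform = [1.0 / n] * n          # no outcome preferred
skewed = [0.7, 0.1, 0.1, 0.1]    # one outcome favoured (illustrative)
certain = [1.0, 0.0, 0.0, 0.0]   # outcome known with certainty

for name, dist in [("uniform", uniform), ("skewed", skewed), ("certain", certain)]:
    print(f"{name:8s} S = {shannon_entropy(dist):.4f}")
```

Among the three, the uniform distribution has the largest entropy, equal to $\ln n$, and the entropy drops to zero when one outcome is certain.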
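In addition to the normalization condition

$$\sum_{i=1}^{n} p_i = 1 \tag{2.3}$$

we may also have some additional information that can be expressed as $m$ constraint equations

$$\sum_{i=1}^{n} p_i \, g_r(x_i) = a_r, \qquad r = 1, \dots, m, \tag{2.4}$$

where $x_i$ denotes the $i$-th outcome. In the constraint equations, each left-hand side is a function of the $n$ variables $p_1, p_2, \dots, p_n$.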
We have $m+1$ relations between the $n$ probabilities $p_1, p_2, \dots, p_n$.
If $m+1<n$, it is not possible to determine the probabilities $p_i$ uniquely:
we can choose arbitrary values for $n-m-1$ of the probabilities and then solve
for the remaining $m+1$ probabilities using equations (2.3) and (2.4).
We thus have an infinite number of solutions for the
probabilities and consequently an infinity of probability distributions.
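The non-uniqueness can be seen in a small sketch with $n = 3$ outcomes and a single additional constraint ($m = 1$), so that $n - m - 1 = 1$ probability may be chosen freely. The outcome values `x`, the prescribed mean `a`, and the helper `solve_remaining` are hypothetical choices made for this illustration; the additional constraint is assumed to fix the mean of the outcomes.

```python
import math

x = [1.0, 2.0, 3.0]   # outcome values (illustrative)
a = 2.0               # prescribed mean (illustrative)

def solve_remaining(p3):
    """Given a free choice of p3, solve the two constraints for p1 and p2.

    Normalization:  p1 + p2 + p3 = 1
    Mean:           x1*p1 + x2*p2 + x3*p3 = a
    """
    rhs_norm = 1.0 - p3
    rhs_mean = a - x[2] * p3
    det = x[1] - x[0]                        # determinant of [[1, 1], [x1, x2]]
    p1 = (x[1] * rhs_norm - rhs_mean) / det  # Cramer's rule
    p2 = (rhs_mean - x[0] * rhs_norm) / det
    return [p1, p2, p3]

def shannon_entropy(probs):
    """Shannon entropy S = -sum(p * ln p), with the term for p = 0 taken as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

# Every admissible choice of p3 gives a distribution satisfying both constraints.
for p3 in [0.1, 0.2, 1.0 / 3.0, 0.4]:
    p = solve_remaining(p3)
    mean = sum(pi * xi for pi, xi in zip(p, x))
    print([round(v, 4) for v in p], "sum =", round(sum(p), 6),
          "mean =", round(mean, 6), "S =", round(shannon_entropy(p), 4))
```

The free value cannot be completely arbitrary in practice: it must leave all three probabilities non-negative. Within that range every choice satisfies both constraints, yet the entropies differ, which is what makes a selection rule necessary.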
According to Jaynes, one should select the distribution that has
maximum entropy.
He suggested that we should choose $p_1, p_2, \dots, p_n$
so as to maximize the uncertainty measure $S$
subject to equations (2.3) and (2.4).
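To sum up, the minimally prejudiced (or least biased) probability distribution is the
set of $p_i$ which obeys the $m+1$ equations (2.3) and
(2.4) above and maximizes $S$ of equation (2.1). The
resulting distribution is

$$p_i = \exp\bigl(-\lambda_0 - \lambda_1 g_1(x_i) - \lambda_2 g_2(x_i) - \dots - \lambda_m g_m(x_i)\bigr), \qquad i = 1, \dots, n, \tag{2.6}$$

where $\lambda_0, \lambda_1, \dots, \lambda_m$ are Lagrangian multipliers, one for each of the constraints (2.3) and (2.4), and $g_r(x_i)$ is the $r$-th constraint function of (2.4) evaluated at the $i$-th outcome. Since an exponential
function is never negative, it is certain that $p_i \ge 0$ for each
$i$, so there is no need to state the non-negativity constraint explicitly.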
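A brief sketch of how the exponential form arises, assuming that the additional information takes the form of linear constraints $\sum_i p_i\, g_r(x_i) = a_r$ ($r = 1, \dots, m$) and that $S = -\sum_i p_i \ln p_i$ is Shannon's measure. The symbols $g_r$ and $a_r$, and the choice to write the multiplier of the normalization constraint as $\lambda_0 - 1$, are conventions assumed here for convenience.

$$
\begin{aligned}
\mathcal{L} &= -\sum_{i=1}^{n} p_i \ln p_i - (\lambda_0 - 1)\Bigl(\sum_{i=1}^{n} p_i - 1\Bigr) - \sum_{r=1}^{m} \lambda_r \Bigl(\sum_{i=1}^{n} p_i\, g_r(x_i) - a_r\Bigr), \\
\frac{\partial \mathcal{L}}{\partial p_i} &= -\ln p_i - 1 - (\lambda_0 - 1) - \sum_{r=1}^{m} \lambda_r\, g_r(x_i) = 0, \\
p_i &= \exp\Bigl(-\lambda_0 - \sum_{r=1}^{m} \lambda_r\, g_r(x_i)\Bigr).
\end{aligned}
$$

Because $S$ is strictly concave and the constraints are linear, a stationary point of this kind is the unique maximizer.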
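The sum of the probabilities is unity (eq. 2.3).
Summing equation (2.6) over $i$, we get

$$e^{\lambda_0} = \sum_{i=1}^{n} \exp\Bigl(-\sum_{r=1}^{m} \lambda_r\, g_r(x_i)\Bigr)$$

and solving for $\lambda_0$,

$$\lambda_0 = \ln \sum_{i=1}^{n} \exp\Bigl(-\sum_{r=1}^{m} \lambda_r\, g_r(x_i)\Bigr).$$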
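The role of the normalization multiplier can be checked numerically. The sketch below assumes a single additional constraint that fixes the mean of a six-sided die at 4.5 (a classic illustration of the principle), takes the exponential form $p_i = \exp(-\lambda_0 - \lambda_1 x_i)$, computes $\lambda_0$ as the logarithm of the normalizing sum, and finds $\lambda_1$ by bisection. The names `faces`, `target_mean`, `lambda0`, `distribution`, and `solve_lambda1`, and the bisection bracket, are hypothetical choices for this example.

```python
import math

faces = [1, 2, 3, 4, 5, 6]   # outcomes x_i (illustrative)
target_mean = 4.5            # value of the single mean constraint (illustrative)

def lambda0(lam1):
    """lambda_0 = ln sum_i exp(-lam1 * x_i), from the normalization condition."""
    return math.log(sum(math.exp(-lam1 * x) for x in faces))

def distribution(lam1):
    """p_i = exp(-lambda_0 - lam1 * x_i); sums to one by construction."""
    lam0 = lambda0(lam1)
    return [math.exp(-lam0 - lam1 * x) for x in faces]

def mean(lam1):
    """Mean outcome under the distribution defined by lam1."""
    return sum(p * x for p, x in zip(distribution(lam1), faces))

def solve_lambda1(lo=-5.0, hi=5.0, tol=1e-12):
    """Bisection on lam1; the mean decreases monotonically as lam1 grows."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid      # mean too large, so a larger lam1 is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam1 = solve_lambda1()
p = distribution(lam1)
print("lambda_1 =", round(lam1, 4))
print("lambda_0 =", round(lambda0(lam1), 4))
print("p =", [round(v, 4) for v in p])
print("sum(p) =", round(sum(p), 6), " mean =", round(mean(lam1), 6))
```

Once $\lambda_0$ is eliminated through the normalization condition, only the $m$ remaining multipliers have to be determined from the $m$ constraint equations; here that is a one-dimensional root-finding problem.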