Convex Functions, Partial Orderings, and Statistical Applications by Josip E. Pečarić and Y. L. Tong



Convex Functions

Convex functions are very important in the theory of inequalities. Thus in the classic book of Hardy, Littlewood, and Pólya (1934, 1952), the third chapter is devoted to the theory of convex functions. This subject receives full treatment in several readily available sources, and in this chapter we will merely collect results which will be used in the rest of this book. For further details and proofs the reader is referred to Popoviciu (1944), Roberts and Varberg (1973), and other volumes. Here we give only some of the results concerning convex functions; other related results will be provided later in the book.

1.1 Definition

(a) Let I be an interval in ℝ. Then f: I → ℝ is said to be convex if for all x, y ∈ I and all α ∈ [0, 1],

\[ f(\alpha x + (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y) \tag{1.1} \]

holds. If (1.1) is strict for all x ≠ y and α ∈ (0, 1), then f is said to be strictly convex.

(b) If the inequality in (1.1) is reversed, then f is said to be concave. If it is strict for all x ≠ y and α ∈ (0, 1), then f is said to be strictly concave.
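Inequality (1.1) lends itself to a quick numerical sanity check. The following is a minimal sketch, not a proof technique: the helper name `is_convex_on_grid`, the grid size, and the tolerance are illustrative assumptions. It tests f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) on a finite grid, so a pass only fails to find a counterexample, while a failure exhibits a concrete violation.

```python
import math

def is_convex_on_grid(f, a, b, n=40, tol=1e-12):
    """Test inequality (1.1) for all grid points x, y in [a, b] and alpha = 0, 0.1, ..., 1."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    alphas = [j / 10 for j in range(11)]
    for x in xs:
        for y in xs:
            for alpha in alphas:
                lhs = f(alpha * x + (1 - alpha) * y)
                rhs = alpha * f(x) + (1 - alpha) * f(y)
                if lhs > rhs + tol:  # a violation of (1.1) was found
                    return False
    return True

print(is_convex_on_grid(math.exp, -2.0, 2.0))     # exp passes the grid test
print(is_convex_on_grid(math.sin, 0.0, math.pi))  # sin is concave on [0, pi], so the test fails
```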

1.2 Remarks

(a) For x, y ∈ I, p, q ≥ 0, p + q > 0, (1.1) is equivalent to

\[ f\left( \frac{px + qy}{p + q} \right) \le \frac{p f(x) + q f(y)}{p + q}. \tag{1.2} \]

(b)The simple geometric interpretation of (1.1) is that the graph of f lies below its chords.

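(c) If x1, x2, x3 are three points in I such that x1 < x2 < x3, then (1.1) is equivalent to

\[ f(x_2) \le \frac{x_3 - x_2}{x_3 - x_1} f(x_1) + \frac{x_2 - x_1}{x_3 - x_1} f(x_3), \tag{1.3} \]

which is equivalent to

\[ (x_3 - x_2) f(x_1) + (x_1 - x_3) f(x_2) + (x_2 - x_1) f(x_3) \ge 0, \tag{1.4} \]

or, more symmetrically and without the condition of monotonicity on x1, x2, x3,

\[ \frac{f(x_1)}{(x_1 - x_2)(x_1 - x_3)} + \frac{f(x_2)}{(x_2 - x_3)(x_2 - x_1)} + \frac{f(x_3)}{(x_3 - x_1)(x_3 - x_2)} \ge 0. \tag{1.5} \]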

(d) The area of the triangle whose vertices are (x1, f(x1)), (x2, f(x2)), and (x3, f(x3)) is given by

\[ P = \frac{1}{2} \begin{vmatrix} x_1 & f(x_1) & 1 \\ x_2 & f(x_2) & 1 \\ x_3 & f(x_3) & 1 \end{vmatrix}. \]

The function f whose graph is given in Figure 1.1 is strictly convex, in which case P > 0. In Figure 1.2 we have P < 0 and the function is strictly concave.

Figure 1.1

Figure 1.2

(e) f is both convex and concave iff f(x) = λx + c for some λ, c ∈ ℝ.

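(f) Another way of writing (1.4) is instructive:

\[ \frac{f(x_1) - f(x_2)}{x_1 - x_2} \le \frac{f(x_2) - f(x_3)}{x_2 - x_3}, \qquad x_1 < x_2 < x_3, \tag{1.6} \]

so that the following result is valid: A function f is convex on I iff for every point c ∈ I the function (f(x) − f(c))/(x − c) is increasing on I (x ≠ c).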

(g) By using (1.6) we can easily prove the following result: If f is a convex function on I and if x1 ≤ y1, x2 ≤ y2, x1 ≠ x2, y1 ≠ y2, then the following inequality is valid:

\[ \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le \frac{f(y_2) - f(y_1)}{y_2 - y_1}. \tag{1.7} \]

By letting x1 = x, x2 = x + z, y1 = y, y2 = y + z (x ≤ y, z ≥ 0) in (1.7), we have

\[ f(x + z) - f(x) \le f(y + z) - f(y), \tag{1.8} \]

which is just

\[ f(x + z) + f(y) \le f(x) + f(y + z). \tag{1.9} \]

Thus by letting x = x − h, y = x + h′, z = h − h′ in (1.9) we have

\[ f(x + h') + f(x - h') \le f(x + h) + f(x - h) \tag{1.10} \]

for all 0 < h′ < h. By symmetry, it is clear that we need only assume |h′| ≤ |h|. Thus we obtain […], 1974), and the proofs given in those references involve majorization theory.
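Remarks 1.2(d) and 1.2(f) can be illustrated numerically. The sketch below assumes that P is half the determinant with rows (xi, f(xi), 1), expanded along its second column, and uses hypothetical helper names `signed_area` and `chord_slope`; it checks the sign of P and the monotonicity of the chord slope for f(x) = x².

```python
def signed_area(f, x1, x2, x3):
    """Half the determinant with rows (xi, f(xi), 1), expanded along the second column."""
    return 0.5 * ((x3 - x2) * f(x1) + (x1 - x3) * f(x2) + (x2 - x1) * f(x3))

def chord_slope(f, c, x):
    """Slope (f(x) - f(c)) / (x - c) of the chord through c and x, for x != c."""
    return (f(x) - f(c)) / (x - c)

def square(x):
    return x * x

print(signed_area(square, 0.0, 1.0, 2.0))                # positive: f is strictly convex
print(signed_area(lambda x: -x * x, 0.0, 1.0, 2.0))      # negative: f is strictly concave
slopes = [chord_slope(square, 1.0, x) for x in (-1.0, 0.0, 2.0, 3.0)]
print(slopes == sorted(slopes))                          # chord slopes increase with x
```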

The following three theorems concern derivatives of convex functions.

1.3 Theorem

Let I be an interval in ℝ and f: I → ℝ be convex. Then (i) f is Lipschitz on any closed interval in I°, the interior of I; (ii) f′+ and f′− exist and are increasing in I°, and f′− ≤ f′+ (if f is strictly convex, then these derivatives are strictly increasing); and (iii) f′ exists except possibly on a countable set, on the complement of which it is continuous.

Proof: See Roberts and Varberg (1973, pp. 4–7).

1.4 Theorem

(a) f: [a, b] → ℝ is (strictly) convex iff there exist a (strictly) increasing function g: [a, b] → ℝ and a real number c (a < c < b) such that, for all x with a < x < b,

\[ f(x) = f(c) + \int_c^x g(t)\,dt. \tag{1.11} \]

(b) If f is differentiable, then f is (strictly) convex iff f′ is (strictly) increasing.

(c) If f″ exists on (a, b), then f is convex iff f″(x) ≥ 0. If f″(x) > 0, then f is strictly convex.

1.5 Remarks

(a) Under the assumptions of Theorem 1.3 the following inequalities are also valid:

(1.12)

(b) It is useful to note that if f ∈ C²(a, b), m = min f″, and M = max f″, then the functions (M/2)x² − f(x) and f(x) − (m/2)x² are convex. This fact has been used to extend many inequalities implied by convexity (see […]a, 1982, and Andrica, 1984).

1.6 Theorem

(a) f: (a, b) → ℝ is convex iff there is at least one line of support for f at each x0 ∈ (a, b), i.e.,

\[ f(x) \ge f(x_0) + \lambda (x - x_0) \quad \text{for all } x \in (a, b), \tag{1.13} \]

where λ depends on x0 and is given by λ = f′(x0) when f′ exists, and λ ∈ [f′−(x0), f′+(x0)] when f′−(x0) ≠ f′+(x0).

(b) f: (a, b) → ℝ is convex if the function f(x) − f(x0) − λ(x − x0) (the difference between the function and its support) is decreasing for x < x0 and increasing for x > x0.

Proof: (a) See Roberts and Varberg (1973, p. 12).

(b) It is equivalent to the inequality

(1.14)

and the reverse inequality for x0 ≤ x1 < x2; and this is a simple consequence of (1.13) and of Remark 1.2(g).
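The support property in Theorem 1.6(a) can be probed numerically. The sketch below assumes the support inequality in the form f(x) ≥ f(x0) + λ(x − x0) and uses a hypothetical helper `is_support_line`; for f(x) = |x| at x0 = 0 the one-sided derivatives are −1 and 1, so any slope in [−1, 1] should give a line of support and any slope outside should not.

```python
def is_support_line(f, x0, lam, xs, tol=1e-12):
    """Check f(x) >= f(x0) + lam * (x - x0) at every sample point x."""
    return all(f(x) >= f(x0) + lam * (x - x0) - tol for x in xs)

xs = [i / 10 for i in range(-20, 21)]       # sample points in [-2, 2]
print(is_support_line(abs, 0.0, 0.5, xs))   # slope inside [-1, 1]: supports |x| at 0
print(is_support_line(abs, 0.0, 1.5, xs))   # slope outside [-1, 1]: fails for x > 0
```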

The next theorem concerns closure properties of convex functions.

1.7 Theorem

(a) If fn: I → ℝ is a sequence of convex functions converging to a finite limit function f on I, then f is convex. Moreover, the convergence is uniform on any closed subinterval of I°, the interior of I.

(b) Every continuous function f convex on [a, b] is the uniform limit of the sequence

\[ f_n(x) = \lambda x + c + \sum_{k=0}^{n} p_k\, w(x, x_k), \tag{1.15} \]

where w(x, xk) = (x − xk)+ (i.e., w(x, xk) = 0 for x < xk and w(x, xk) = x − xk for x ≥ xk) or w(x, xk) = |x − xk|, pk ≥ 0, xk ∈ [a, b] for k = 0, 1, …, n, and λ, c ∈ ℝ.

Proof: See Roberts and Varberg (1973, p. 17) for (a) and Popoviciu (1934a, b) and Toda (1936) for (b).
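One concrete way to see Theorem 1.7(b) at work is piecewise-linear interpolation. The sketch below is an illustration under stated assumptions, not the construction used in the cited proofs: it interpolates a convex f at equally spaced nodes and writes the interpolant as λx + c + Σ pk (x − xk)+, where the pk are the jumps in slope at the interior nodes. The names `hinge_fit` and `hinge_eval` are hypothetical.

```python
import math

def hinge_fit(f, a, b, n):
    """Coefficients (lam, c, ps, knots) of the piecewise-linear interpolant of f
    at n + 1 equally spaced nodes, written as lam*x + c + sum_k p_k * (x - x_k)_+."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    slopes = [(f(xs[k + 1]) - f(xs[k])) / (xs[k + 1] - xs[k]) for k in range(n)]
    lam = slopes[0]
    c = f(xs[0]) - lam * xs[0]
    ps = [slopes[k] - slopes[k - 1] for k in range(1, n)]  # slope jumps; >= 0 if f is convex
    return lam, c, ps, xs[1:n]

def hinge_eval(lam, c, ps, knots, x):
    """Evaluate lam*x + c + sum_k p_k * (x - x_k)_+ at x."""
    return lam * x + c + sum(p * max(x - xk, 0.0) for p, xk in zip(ps, knots))

lam, c, ps, knots = hinge_fit(math.exp, 0.0, 1.0, 64)
grid = [i / 1000 for i in range(1001)]
max_err = max(abs(hinge_eval(lam, c, ps, knots, x) - math.exp(x)) for x in grid)
print(min(ps) >= 0.0, max_err)   # nonnegative weights and a small uniform error
```

Increasing n shrinks the uniform error, which is the sense in which such sums approximate a continuous convex function.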

1.8 Definition

A function f: [a, b] → ℝ is called convex in the Jensen sense, or J-convex, on [a, b] if for all points x, y ∈ [a, b] the inequality

\[ f\left( \frac{x + y}{2} \right) \le \frac{f(x) + f(y)}{2} \tag{1.16} \]

holds. A J-convex function f is said to be strictly J-convex if for all pairs of points (x, y), x ≠ y, strict inequality holds in (1.16).

1.9 Remarks

(a) J. L. W. V. Jensen (1905, 1906) was the first to define convex functions using inequality (1.16) and to draw attention to their importance.

(b) We have seen that all convex functions are continuous on (a, b). A J-convex function on [a, b], however, is not necessarily continuous on [a, b]. Indeed, if

\[ f(x + y) = f(x) + f(y), \tag{1.17} \]

then f(2x) = 2f(x) and

\[ f\left( \frac{x + y}{2} \right) = \frac{1}{2} f(x + y) = \frac{f(x) + f(y)}{2}. \]

Thus a solution of (1.17) is certainly J-convex, and it is also well known that Cauchy’s equation (1.17) has noncontinuous solutions (Hamel, 1905; Aczél, 1966; Hardy, Littlewood, and Pólya, 1934 and 1952, p. 96; Roberts and Varberg, 1973, p. 217; and Kuczma, 1985, pp. 120–22).

There are many results concerning sufficient conditions under which a J-convex function is continuous. For example, Jensen (1905) proved the following result:

1.10 Theorem

If a J-convex function is defined and bounded from above on (a, b), then it is continuous on (a, b).

For generalizations of Theorem 1.10 see, e.g., Roberts and Varberg (1973, pp. 211-25).

As an analogy to J-convex functions we define convex sequences as follows:

1.11 Definition

A finite sequence {a_k}, k = 1, …, n (an infinite sequence {a_k}, k = 1, 2, …) of real numbers is said to be a convex sequence if

\[ 2a_k \le a_{k-1} + a_{k+1} \]

for k = 2, …, n − 1 (for k = 2, 3, …).

1.12 Remark

If the sequence {an} (n ∈ ℕ) is convex, then the function f whose graph is the polygonal line with corner points (n, an) (n ∈ ℕ) is also convex ([…], 1989).
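Convexity of a sequence is easy to test directly. The sketch below assumes the usual second-difference form of the condition, a(k−1) − 2a(k) + a(k+1) ≥ 0 at every interior index; the helper name `is_convex_sequence` is hypothetical.

```python
def is_convex_sequence(a):
    """True if a[k-1] - 2*a[k] + a[k+1] >= 0 for every interior index k."""
    return all(a[k - 1] - 2 * a[k] + a[k + 1] >= 0 for k in range(1, len(a) - 1))

print(is_convex_sequence([k * k for k in range(6)]))  # squares: second differences equal 2
print(is_convex_sequence([0, 1, 1, 0]))               # second difference is -1 at k = 1
```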

1.13 Definition

f: [a, b] → ℝ is said to be a Wright-convex function if for each x ≤ y, z ≥ 0, x, y + z ∈ [a, b], the inequality (1.8) is valid.

1.14 Remark

If C, W and M are the sets of convex, Wright-convex, and J-convex functions, then C ⊂ W ⊂ M (see Kenyon, 1956 and Klee, 1956). Furthermore, each inclusion is proper.

1.15 Definition

A function f: I → (0, ∞), I ⊆ ℝ, is said to be log-convex, or multiplicatively convex, if log f is convex, or equivalently if for all x, y ∈ I and all α ∈ [0, 1],

\[ f(\alpha x + (1-\alpha) y) \le f(x)^{\alpha} f(y)^{1-\alpha}. \tag{1.18} \]

It is said to be log-concave if the inequality in (1.18) is reversed.
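The multiplicative inequality in (1.18), f(αx + (1 − α)y) ≤ f(x)^α f(y)^(1−α), can be checked on a grid in the same spirit as ordinary convexity. In this sketch the helper name `is_log_convex_on_grid` and the relative tolerance are assumptions made for illustration. The second call uses x², which is convex on [0.5, 2] but has a concave logarithm there.

```python
import math

def is_log_convex_on_grid(f, a, b, n=40, rel_tol=1e-9):
    """Test f(alpha*x + (1-alpha)*y) <= f(x)**alpha * f(y)**(1-alpha) on a grid; f must be positive."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    alphas = [j / 10 for j in range(11)]
    for x in xs:
        for y in xs:
            for alpha in alphas:
                lhs = f(alpha * x + (1 - alpha) * y)
                rhs = f(x) ** alpha * f(y) ** (1 - alpha)
                if lhs > rhs * (1 + rel_tol):
                    return False
    return True

print(is_log_convex_on_grid(lambda x: math.exp(x * x), -1.0, 1.0))  # log f = x^2 is convex
print(is_log_convex_on_grid(lambda x: x * x, 0.5, 2.0))             # log f = 2 log x is concave
```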

1.16 Remark

If f and g are convex and g is increasing, then g ∘ f is convex (Roberts and Varberg, 1973, p. 16); moreover, since f = exp(log f), it follows that a log-convex function is convex (but not conversely). This also follows directly from (1.18), of course, since by the arithmetic-geometric mean inequality we have

\[ f(x)^{\alpha} f(y)^{1-\alpha} \le \alpha f(x) + (1-\alpha) f(y). \]

1.17 Definition

A function f: I → ℝ is said to be quasi-convex if for all x, y ∈ I and all α ∈ [0, 1],

\[ f(\alpha x + (1-\alpha) y) \le \max\{ f(x), f(y) \}. \]

1.18 Remark

For additional results on quasi-convexity and definitions related to inequality (1.13), see Ponstein (1967).

1.19 Definition

If g is strictly monotonic, then f is said to be (strictly) convex with respect to g if f ∘ g⁻¹ is (strictly) convex.

1.20 Fact

Assume that f″ and g″ exist, are continuous, and that f′ and g′ are never zero. If (g″/g′) ≥ (f″/f′), then f is convex with respect to g (see Mikusinski, 1948, and Cargo, 1965).

[…]arenko’s generalization of a concept of convexity:

1.21 Definition

Let g: I × I → ℝ be a given function such that g(x, y) > 0 for y > x (x, y ∈ I). A function f: I → ℝ is said to be convex on I with respect to g (g-convex on I) if

(1.20)

holds for all x1, x2, x3 ∈ I, x1 < x2 < x3.

1.22 Remarks

(a) Of particular interest are the g-convex functions satisfying

(b) For related properties of g-convex functions, see […] (1979a, 1979b).

Let K(b) be the class of all functions f which are continuous and nonnegative on the segment I = [0, b] and such that f(0) = 0. Note that the mean function F of the function f ∈ K(b), defined by

\[ F(x) = \frac{1}{x} \int_0^x f(t)\,dt \quad (0 < x \le b), \qquad F(0) = 0, \tag{1.21} \]

belongs to the class K(b).

Now let K1(b) denote the class of functions f ∈ K(b) convex on I, and K2(b) the class of functions f ∈ K(b) convex in mean on I, i.e., the class of functions for which F ∈ K1(b). Let K3(b) denote the class of functions f ∈ K(b) which are starshaped with respect to the origin on the segment I, i.e., the class of functions f with the property that for all x ∈ I and all t (0 ≤ t ≤ 1) the following inequality holds:

\[ f(tx) \le t f(x). \tag{1.22} \]

1.23 Definition

f is said to belong to the class K4(b) if it is superadditive on I, i.e., if

\[ f(x + y) \ge f(x) + f(y) \quad \text{for all } x, y, x + y \in I. \tag{1.23} \]

If the reverse inequality in (1.23) is valid, then f is said to be subadditive.

If F belongs to the class K3(b), we say that f is starshaped in mean, i.e., that it belongs to K5(b); and if F belongs to K4(b), we say that f is superadditive in mean, i.e., that it belongs to K6(b). Similarly, we define functions subadditive in mean. Bruckner and Ostrow (1962) proved that the following inclusions hold:

\[ K_1(b) \subset K_2(b) \subset K_3(b) \subset K_4(b) \subset K_5(b) \subset K_6(b). \tag{1.24} \]

Beckenbach (1969) gave examples showing that each inclusion is proper.
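Two of the classes above are simple to test on a grid. The sketch below assumes the standard forms of the defining inequalities, f(tx) ≤ t f(x) for starshapedness and f(x + y) ≥ f(x) + f(y) for superadditivity; the helper names and grid sizes are hypothetical. The function x² on [0, 1] is convex with f(0) = 0 and should pass both tests, while √x, which also belongs to K(1), should fail both.

```python
import math

def is_starshaped(f, b, n=50, tol=1e-12):
    """Check f(t*x) <= t*f(x) for grid points x in [0, b] and t in [0, 1]."""
    xs = [b * i / n for i in range(n + 1)]
    ts = [j / 20 for j in range(21)]
    return all(f(t * x) <= t * f(x) + tol for x in xs for t in ts)

def is_superadditive(f, b, n=50, tol=1e-12):
    """Check f(x + y) >= f(x) + f(y) for grid points with x + y in [0, b]."""
    for i in range(n + 1):
        for j in range(n + 1 - i):
            x, y, s = b * i / n, b * j / n, b * (i + j) / n
            if f(s) < f(x) + f(y) - tol:
                return False
    return True

def square(x):
    return x * x

print(is_starshaped(square, 1.0), is_superadditive(square, 1.0))        # both hold for x^2
print(is_starshaped(math.sqrt, 1.0), is_superadditive(math.sqrt, 1.0))  # both fail for sqrt
```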

1.24 Remarks

(a) (1.24) is the well-known partial ordering of convexity. There exist many generalizations and related results. Some of them will be given in this volume.

(b) Interesting results concern functions which have the given property in mean. In later chapters we shall give results for sequences and functions monotonic in mean (such as Čebyšev’s inequality).

1.2 Convex Functions on a Normed Linear Space

The definition of a convex function has a very natural generalization to real-valued functions defined on an arbitrary real linear space L. Here we merely require that the domain U of f be convex. This assures that for x, y ∈ U and α ∈ [0, 1], f is always defined at αx + (1 − α)y. We then define f to be convex on U ⊆ L if

\[ f(\alpha x + (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y) \quad \text{for all } x, y \in U,\ \alpha \in [0, 1]. \tag{1.25} \]

In the following we give some results in Roberts and Varberg (1973) for the case in which L is a normed linear space. Note that for L = ℝⁿ we have a convex function of several variables.

1.25 Remark

Let U be an open convex set in L, x0 ∈ U, y ∈ L, and define the function g(t) = f(x0 + ty), where t ∈ (a, b) is such that x0 + ty ∈ U for all t. The function f: U → ℝ is said to be convex if, for every such x0 and y, g is a convex function on (a, b) (Roberts and Varberg, 1973, p. 91).

1.26 Theorem

(a) Let f be convex on an open set U ⊆ L. If f is bounded from above in a neighborhood of a point in U, then f is locally Lipschitz in U, hence Lipschitz on any compact subset of U, and f is continuous on U.

(b) If f is convex on the open set U ⊆ ℝⁿ, then f is Lipschitz on every compact subset of U and continuous on U.

Proof: See Roberts and Varberg (1973, pp. 93–94).

1.27 Theorem

(a) Assume that f is defined on the open convex set U ⊆ L. If f is convex on U and Fréchet differentiable at x0, then, for x ∈ U,

\[ f(x) - f(x_0) \ge f'(x_0)(x - x_0) \tag{1.26} \]

holds. If f is differentiable throughout U, then f is convex iff (1.26) holds for all x, x0 ∈ U. Furthermore, f is strictly convex iff the inequality is strict for x ≠ x0.

(b) Let f: U → ℝ be continuous and Fréchet differentiable on the open set U ⊆ L. Then f is (strictly) convex if f′ is (strictly) increasing on U.

(c) Let f be continuously differentiable and suppose that the second derivative exists throughout an open set U ⊆ L. Then f is convex on U iff f″(x) is nonnegative definite for every x ∈ U. Furthermore, if f″(x) is positive definite on U, then f is strictly convex.

Proof: See Roberts and Varberg (1973, pp. 98–101).

1.28 Remarks

(a) We shall say that f′ is increasing on U if for x, y ∈ U we have

\[ [f'(x) - f'(y)](x - y) \ge 0, \]

and that f′ is strictly increasing on U if this inequality is strict for all x ≠ y.

(b) A function f: U → M (U ⊆ L; L, M are normed linear spaces) is Fréchet differentiable at x0 (x0 ∈ U) if there exists a linear transformation T: L → M such that

\[ \lim_{x \to x_0} \frac{\| f(x) - f(x_0) - T(x - x_0) \|}{\| x - x_0 \|} = 0, \]

which is equivalent to

\[ f(x) = f(x_0) + T(x - x_0) + o(\| x - x_0 \|) \]

as x → x0. The linear transformation T is called the Fréchet derivative and is denoted by f′(x0).

(c) The derivative of the Fréchet derivative, defined similarly, is called the second Fréchet derivative. This derivative is a symmetric bilinear transformation defined on L × L, i.e., f″(x)(h, k) = f″(x)(k, h) (h, k ∈ L). Note that if f: U → ℝ is continuously differentiable on the open convex set U ⊆ L and f″(x) exists throughout U, then for any x, x0 ∈ U there is an s ∈ (0, 1) such that

\[ f(x) = f(x_0) + f'(x_0)(h) + \tfrac{1}{2} f''(x_0 + sh)(h, h), \tag{1.27} \]

where h = x − x0.

(d) A symmetric bilinear transformation B(h, k) defined on L × L is positive (nonnegative) definite if for every h ∈ L (h ≠ 0), we have B(h, h) > 0 (B(h, h) ≥ 0).

(e) The following definition is also valid: A (continuous real-valued) function f is operator convex on (λ, v) if f(αa + βb) ≤ αf(a) + βf(b) for positive reals α, β such that α + β = 1 and operators a, b with their spectra in (λ, v). (See Davis, 1957, for a brief survey of operator functions and Ando, 1978, for further comments on classes of operator functions.)

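Now, by letting L = ℝⁿ we can consider convex functions of several variables. In this case f′(x0) is the gradient vector of the function f at the point x0, i.e.,

\[ f'(x_0) = \left( \frac{\partial f}{\partial x_1}(x_0), \dots, \frac{\partial f}{\partial x_n}(x_0) \right), \]

and the following theorem is valid: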

1.29 Theorem

(a) Suppose f is defined on the open convex set U ⊆ ℝⁿ. If f is convex on U and the gradient vector f′(x0) exists, then for x ∈ U,

\[ f(x) - f(x_0) \ge \langle f'(x_0), x - x_0 \rangle, \tag{1.28} \]

where ⟨a, b⟩ is the inner product of vectors a, b (i.e., ⟨a, b⟩ = Σ_{i=1}^{n} a_i b_i). If f is (strictly) convex and f′(x) exists throughout U, then f′ is (strictly) increasing on U. Conversely, if the partial derivatives of f exist and are continuous throughout U and if f′ is (strictly) increasing, then f is (strictly) convex.

(b) Let f have continuous second partial derivatives ∂²f/∂xi∂xj = fij throughout an open convex set U ⊆ ℝⁿ. Then f is convex on U iff the Hessian matrix H = ||fij(x)|| is nonnegative definite for each x ∈ U. Moreover, if H is positive definite on U, then f is strictly convex.

Proof: See Roberts and Varberg (1973, pp. 102–103).
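The Hessian criterion of Theorem 1.29(b) can be explored numerically for functions of two variables. The sketch below is an illustration only: it approximates the second partial derivatives by central differences (the step h is an assumption) and uses the closed-form smallest eigenvalue of a symmetric 2 × 2 matrix, which is nonnegative exactly when the matrix is nonnegative definite. The helper names are hypothetical.

```python
import math

def hessian_2d(f, x, y, h=1e-3):
    """Central-difference approximation of (f_xx, f_xy, f_yy) at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def min_eigenvalue(fxx, fxy, fyy):
    """Smallest eigenvalue of the symmetric matrix [[fxx, fxy], [fxy, fyy]]."""
    return 0.5 * (fxx + fyy) - math.sqrt(0.25 * (fxx - fyy) ** 2 + fxy ** 2)

def bowl(x, y):      # convex: Hessian [[2, 1], [1, 2]]
    return x * x + x * y + y * y

def saddle(x, y):    # not convex: Hessian [[2, 0], [0, -2]]
    return x * x - y * y

print(min_eigenvalue(*hessian_2d(bowl, 0.3, -0.7)))    # close to 1, so nonnegative definite
print(min_eigenvalue(*hessian_2d(saddle, 0.3, -0.7)))  # close to -2, so not nonnegative definite
```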

1.30 Remark

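A real symmetric matrix A = ||aij|| (i, j = 1, 2, …, n) is said to be positive (nonnegative) definite if the quadratic form Q(x) = Σ_{i,j=1}^{n} aij xi xj is positive (nonnegative) for all x = (x1, …, xn) ≠ (0, …, 0). It is known that A is a positive definite matrix iff all the leading principal minors

\[ a_{11}, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \dots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} \]

are positive; A is nonnegative definite iff all of its principal minors, not only the leading ones, are nonnegative (see, e.g., Beckenbach and Bellman, 1961, 1965, pp. 57–58).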

The following theorem can be found in Roberts and Varberg (1973, p. 108):

1.31 Theorem

(a) The function f is convex on an open convex set U of a normed linear space L iff f has support at each point of U, i.e., if for each x0 ∈ U there is an affine function A: L → ℝ such that A(x0) = f(x0) and A(x) ≤ f(x) for every x ∈ U.

(b) The function f is convex on an open convex set U ⊆ ℝⁿ iff for every point x0 ∈ U there exists a point λ ∈ ℝⁿ such that

\[ f(x) \ge f(x_0) + \langle \lambda, x - x_0 \rangle \quad \text{for all } x \in U. \tag{1.29} \]

1.32 Remark

Denoting by f′(x0, v) the Gateaux derivative (the directional derivative) of f, i.e.,

\[ f'(x_0, v) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t}, \]

the function f is Gateaux differentiable at x0 if f′(x0, v) exists for every v ∈ L.

Note that A(v) = f′(x0, v) is a linear function of v. Furthermore, if f has a Fréchet derivative at x0, then f is Gateaux differentiable at x0 with f′(x0, v) = f′(x0)(v). Gateaux differentiability of f at x0 does not, however, guarantee the existence of f′(x0) without additional conditions on f.
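The directional derivative in Remark 1.32 can be estimated numerically. The sketch below assumes the two-sided limit of (f(x0 + tv) − f(x0))/t and approximates it by a central difference with a small fixed t; the helper name and step size are hypothetical. For the squared Euclidean norm the exact value is 2⟨x0, v⟩, and the estimate should be linear in v.

```python
def directional_derivative(f, x0, v, t=1e-6):
    """Central-difference estimate of lim_{t -> 0} (f(x0 + t*v) - f(x0)) / t."""
    forward = [a + t * b for a, b in zip(x0, v)]
    backward = [a - t * b for a, b in zip(x0, v)]
    return (f(forward) - f(backward)) / (2 * t)

def squared_norm(x):
    return sum(xi * xi for xi in x)

x0 = [1.0, 2.0]
d_v = directional_derivative(squared_norm, x0, [3.0, -1.0])
d_w = directional_derivative(squared_norm, x0, [0.5, 4.0])
d_sum = directional_derivative(squared_norm, x0, [3.5, 3.0])
print(d_v, d_w, d_sum)   # d_sum is close to d_v + d_w
```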

The following result is given in Roberts and Varberg (1973, p. 107):

1.33 Theorem

Let f be convex on an open set U ⊆ L. Then for all x, x0 ∈ U,

(1.30)

holds. If f is strictly convex, then the inequality is strict. Note that (1.30) implies

Thus by (1.29) and (1.30) we have

1.34 Remark

In the case of convex functions of several variables (multivariate convex functions) we have

where f′1+, …, f′n+ are the right-hand partial derivatives of f. In this case (1.30) and (1.31) become

and

(1.33)

In the following we observe another definition of convex functions and the definition of J-convex functions for functions of several variables.

1.35 Definition

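Let U ⊆ ℝⁿ. The function f: U → ℝ is said to be convex if its epigraph

\[ \operatorname{epi} f = \{ (x, y) : x \in U,\ y \in \mathbb{R},\ f(x) \le y \} \tag{1.34} \]

is a convex subset of ℝⁿ × ℝ.

It is known that Definition 1.35 is equivalent to (1.1) for all x, y ∈ U and all α ∈ [0, 1]. This definition has some useful applications.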

If we replace the interval [a, b] by an open convex set U ⊆ L, we can extend Definition 1.8 for J-convex functions immediately to functions f: U → ℝ (Roberts and Varberg, 1973, pp. 215–16):

1.36 Theorem

Let f be J-convex on an open set U in a normed linear space L. If f is bounded above in a neighborhood of a single point x0 ∈ U, then f is continuous and hence convex on U.

Wright-convex functions have an interesting and important generalization for functions of several variables (Brunk, 1964).

Let ℝⁿ denote the n-dimensional vector lattice of points x = (x1, …, xn), xi real for i = 1, …, n, with the partial ordering

x ≤ y iff xi ≤ yi for i = 1, …, n.

1.37 Definition

A real-valued function f on an n-dimensional rectangle I ⊆ ℝⁿ is said to have increasing increments if

\[ f(a + h) - f(a) \le f(b + h) - f(b) \tag{1.35} \]

whenever a ∈ I, b + h ∈ I, 0 ≤ h ∈ ℝⁿ, a ≤ b.
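The increasing-increments condition is not the same as convexity, which a small experiment makes visible. The sketch below assumes the condition in the form f(a + h) − f(a) ≤ f(b + h) − f(b) for a ≤ b (componentwise) and h ≥ 0, and checks it on integer points so that the arithmetic is exact; the helper name and the test grids are hypothetical, and f is assumed to be defined at every point a + h and b + h that is reached. The product x1·x2 passes although it is not convex, and its negative fails.

```python
from itertools import product

def has_increasing_increments(f, points, steps):
    """Check f(a + h) - f(a) <= f(b + h) - f(b) for all a <= b (componentwise) and all h in steps."""
    for a, b in product(points, repeat=2):
        if all(ai <= bi for ai, bi in zip(a, b)):
            for h in steps:
                ah = tuple(ai + hi for ai, hi in zip(a, h))
                bh = tuple(bi + hi for bi, hi in zip(b, h))
                if f(ah) - f(a) > f(bh) - f(b):
                    return False
    return True

points = list(product(range(4), repeat=2))   # integer points of [0, 3] x [0, 3]
steps = list(product(range(3), repeat=2))    # increments h with 0 <= h_i <= 2
print(has_increasing_increments(lambda p: p[0] * p[1], points, steps))
print(has_increasing_increments(lambda p: -p[0] * p[1], points, steps))
```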

1.38 Remark

Brunk (1964) gave the following results for functions with increasing increments f: I → ℝ: (i) f is not necessarily continuous on I. (ii) If f(x) is a continuous function for b ≤ x ≤ a + b, where 0 ≤ a ∈ ℝⁿ, then φ(t) = f(ta + b), 0 ≤ t ≤ 1, is a convex function on [0, 1]. (iii) If the partial derivatives fi′(x) exist for x ∈ I, then f has increasing increments iff each of these partial derivatives is increasing in each argument; in other words, iff the gradient f′(x) = (f1′(x), …, fn′(x)) is increasing on I. (iv) The second partials, if they exist, are then nonnegative. (v) If f is continuous and has increasing increments on I, it can be approximated uniformly on I by polynomials having increasing increments and therefore nonnegative second partial derivatives.

1.3 Convex Functions of Higher Order

Let f be a real-valued function defined on [a, b]. A kth order divided difference of f at distinct points x0, …, xk in [a, b] may be defined recursively by

\[ [x_i]f = f(x_i), \quad i = 0, \dots, k, \]

and

\[ [x_0, \dots, x_k]f = \frac{[x_1, \dots, x_k]f - [x_0, \dots, x_{k-1}]f}{x_k - x_0}. \tag{1.36} \]

The value [x0, …, xk]f is independent of the order of the points x0, …, xk. This definition may be extended to include the case in which some or all of the points coincide by assuming that x0 ≤ … ≤ xk and letting

\[ [\underbrace{x, \dots, x}_{j+1 \text{ times}}]f = \frac{f^{(j)}(x)}{j!}, \]

provided that f^{(j)}(x) exists. If f ∈ Πn (the class of polynomials of degree at most n), then [x0, …, xn]f is equal to the coefficient of x^n and hence is zero if deg f < n.

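We can easily show that, for distinct points, (1.36) is equivalent to

\[ [x_0, \dots, x_k]f = \sum_{i=0}^{k} \frac{f(x_i)}{\prod_{j=0,\ j \ne i}^{k} (x_i - x_j)}. \tag{1.37} \]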
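The recursive definition of divided differences and the explicit sum over nodes can be compared directly. The sketch below assumes distinct points and takes the explicit form to be the sum of f(xi) divided by the product of (xi − xj) over j ≠ i; the function names are hypothetical. Exact rational arithmetic from the standard `fractions` module avoids rounding questions.

```python
from fractions import Fraction

def divided_difference(f, xs):
    """Recursive definition: [x0]f = f(x0); higher orders are quotients of lower-order differences."""
    if len(xs) == 1:
        return f(xs[0])
    return (divided_difference(f, xs[1:]) - divided_difference(f, xs[:-1])) / (xs[-1] - xs[0])

def divided_difference_sum(f, xs):
    """Explicit form: sum of f(xi) / prod_{j != i} (xi - xj) over distinct points xs."""
    total = Fraction(0)
    for i, xi in enumerate(xs):
        denom = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                denom *= xi - xj
        total += f(xi) / denom
    return total

def cube(x):
    return x ** 3

xs = [Fraction(0), Fraction(1), Fraction(3), Fraction(4)]
print(divided_difference(cube, xs), divided_difference_sum(cube, xs))  # both equal the leading coefficient
print(divided_difference(lambda x: x ** 2, xs))                        # zero, since the degree is below 3
```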

The following result on B-splines and divided differences can be found in Curry and Schoenberg (1966): For fixed x ∈ [a, b], let M(x, y) = n(y − x)+^{n−1} be defined as n(y − x)^{n−1} if y ≥ x and zero otherwise. Let a ≤ x0 ≤ … ≤ xn ≤ b with x0 ≠ xn, and

\[ M_n(x) = [x_0, \dots, x_n]\, M(x, \cdot) \tag{1.38} \]

(the nth divided difference of M(x, y) with respect to y at x0, …, xn). Mn(x) is commonly referred to as a B-spline and has the following