Professional Documents
Culture Documents
net/publication/2808445
CITATIONS READS
170 443
4 authors, including:
Alexander Kogan
Rutgers Business School
95 PUBLICATIONS 2,074 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Impact of Business Analytics and Enterprise Systems on Managerial Accounting View project
All content following this page was uploaded by Endre Boros on 22 January 2013.
RUTCOR Rutgers Center a RUTCOR, Rutgers University, P.O. Box 5062, New Brunswick, NJ
for Operations Research 08903, U.S.A., email: ( fboros,hammer,kogang@rutcor.rutgers.edu)
b Department of Applied Mathematics and Physics, Graduate School
Rutgers University P.O.
Box 5062 New Brunswick
of Engineering, Kyoto University, Kyoto, Japan 606, email: (
ibaraki@kuamp.kyoto-u.ac.jp)
New Jersey 08903-5062 c Department of Accounting and Information Systems, Faculty of Man-
Telephone: 908-445-3804 agement, Rutgers University, Newark, NJ 07102, U.S.A.
Telefax: 908-445-5472
Email: rrr@rutcor.rutgers.edu
http://rutcor.rutgers.edu/rrr
Rutcor Research Report
RRR 04-97, February 1997
Acknowledgements: The authors gratefully acknowledge the partial support by the Oce
of Naval Research (grants N00014-92-J1375 and N00014-92-J4083).
RRR 04-97 Page 1
1 Introduction
1.1 Logical Analysis of Data
The study of numerical data, common to a large variety of disciplines, leads to interesting
problems of combinatorial optimization, the solution of which can have an immediate im-
pact on the understanding of the nature of the phenomena under investigation, the factors
governing them, and on the ways we can in
uence their development.
The data we are examining here consists of collections of \observations" represented
as points in IRd . These observations can be associated to patients in a medical study,
consumers in a marketing analysis, probes in a potential oil eld, etc. The components
of the vectors, called \attributes" stand for the results of various medical tests, nancial
parameters, geological measurements, etc. The observations fall in two classes: \positive"
ones (e.g. healthy patients, potenitial buyers, oil rich areas, etc.) and \negative" ones (e.g.
sick patients, etc.)
The goal of data analysis is to arrive to logical explanations distinguishing positive and
negative observations. It was shown in [10, 14] that { in the case of binary data { this goal
can be achived by the use of partialy dened Boolean functions.
The usefulness of logical or Boolean techniques is an important ingredient of many ap-
proaches in this eld [1, 9, 11, 16, 21, 22]. The classication power of the Boolean-based
methodology of the logical analysis of data (LAD) and its applicability to oncology, psy-
chiatry, oil exploration, economic analysis, and other elds are described in [6, 15, 18].
Preliminary results reported in [6] already indicate that the LAD approach not only has
high classication power for distinguishing between positive and negative examples, but also
is quite successful in extracting underlying structural information from the given data, which
is useful in understanding why and how phenomena occur. While the techniques of LAD
are entirely dierent from those of the pioneering works of Mangasarian (e.g. [17]), the two
approaches provide strongly complementary and mutually reinforcing results.
The methodology of LAD is extended to the case of numerical data by a process called
binarization, consisting in the transformation of numerical (real valued) data to binary (0,1)
ones. In this transformation we map each observation u = (uA ; uB ; :::) of the given numerical
data set to a binary vector x(u) = (x1; x2; :::) 2 f0; 1gn by dening e.g. x1 = 1 i uA 1,
x2 = 1 i uB 2, etc, and in such a way that if u and v represent, respectively, a positive
and negative observation point, then x(u) 6= x(v). The binary variables xi, i = 1; :::; n
associated to the real attributes are called indicator variables, and the real parameters i,
i = 1; :::; n used in the above process are called cut points.
Page 2 RRR 04-97
The binarization process described above has been included in a current implementation
of LAD and has been successfully used in the computational experiments reported in [6].
The object of this paper is to provide a theoretical foundation for the binarization process
with the primary focus on the combinatorial optimization problems related to the minimiza-
tion of the number of cut points. The paper studies the computational complexity of these
problems, develops polynomial time algorithms for some of them, and provides compact
integer programming formulations for them.
1.2 Examples
The meaning of a \logical" explanation of numerical data can be best illustrated by an
example. Let us consider Table 1, containing a set S + of positive observations and a set S ?
of negative observations having the attributes A; B; C . To be more specic, we may think of
the phenomenon X as a headache and the attributes A; B and C as representing say, blood
pressure, body temperature, and the number of pulses, respectively (the numerical values
being normalized appropriately).
In order to extract a logical explanation from this data set, let us rst introduce the
following cut points
A = 3:0; B = 2:0; C = 3:0
for the attributes A; B; C , respectively, and then transform the numerical attributes to binary
indicator variables by dening xA = 1 i uA A , xB = 1 i uB B and xC = 1 i
uC C for all points u = (uA; uB ; uC ) 2 S + [ S ?. The result of this binarization of Table 1
is given in Table 2.
RRR 04-97 Page 3
Although exactly one cut point was introduced for each attribute in the example above,
there is no reason to prohibit the introduction of more than one, or of no, cut point for some
attributes. As another example, let us introduce the following cut points,
B = 2:0; 0C = 5:0; 00C = 1:5;
that is, one cut point for the attribute B , two cut points for C , and none for A. Using two
cuts points per one attribute may be natural; x0C = x00C = 1 would mean \high", x0C = 0 and
Page 4 RRR 04-97
x00C = 1 would mean \intermediate", and x0C = x00C = 0 would mean \low" (note that the
combination x0C = 1 and x00C = 0 is not feasible). The resulting pdBf is shown in Table 3 , 1
most one cut point per numerical attribute is used in the binarization, and show that this
important variant is NP-hard for all of the six classes considered above.
The main focus of this paper is on the problem of nding the minimum number of cut
points (and their locations) under the constraint that the resulting pdBf has an extension in
some given class C .
In order to provide an algorithmic framework for this minimization problem we formulate
it as a linear integer programming problem for all of the six classes above. For some of the
classes this formulation turns out to be a set covering problem. If the problem sizes are
not too large, these formulations can be used to solve the problems exactly, and even if the
sizes are large, they can serve as starting points for the development of eective heuristic
algorithms. Some heuristic algorithms were already implemented in the report [6], and we
are currently elaborating on them to pursue further improvements.
We analyze the restricted case of the minimization problem in which the dimension of the
numerical data set is a xed constant, and show that this restricted problem is polynomially
solvable for the class of linear functions. We also prove that the minimization problem for
the class of positive Boolean functions can be solved in polynomial time when the numerical
data set is of dimension 2.
Finally, it is shown that the minimization problem becomes NP-hard for all other classes
considered in this paper, even if the dimension of the numerical data set is restricted.
It should be remarked that although we do not formulate explicitly, one can easily produce
the dual forms for all the results in this paper for the six classes above using the well known
Boolean duality between 0 and 1, x and x, and ^ and _. This is accomplished by swapping
the true and the false points of the pdBf's and using the dual forms of the classes above. For
example, the results about CLINEAR can be reformulated for the class of Boolean functions
represented by single elementary conjunctions; the results about CHORN can be reformulated
for the class of functions represented by Horn conjunctive normal forms, etc. Note that the
class CTH (as well as CALL, of course) is dual to itself.
T (f ) [ F (f ) = f0; 1gn . For two functions f and h, we denote f h if f (x) h(x) for all
x 2 f0; 1gn . A function is positive (or monotone) if x y (i.e., xj yj for j = 1; 2; : : : ; n)
implies f (x) f (y). A function f is called threshold if there exist n + 1 real numbers
wj ; j = 0; 1; 2; : : : ; n; such that
8
< 1 if Pnj wj xj w
f (x) = : (3)
0 if Pnj wj xj < w :
=1 0
=1 0
The Boolean variables x1; x2; : : : ; xn and their complements x1; x2; : : : ; xn are called literals.
A conjunction of literals (or simply, a term) t = Vj2Pt xj Vj2Nt xj , with Pt \ Nt = ;, is called
an implicant of f if t f (where t is understood as the function it represents). An implicant
t is called prime if there is no implicant t0 = Vj2Pt0 xj Vj2Nt0 xj such that Pt0 Pt ; Nt0 Nt
and Pt0 [ Nt0 6= Pt [ Nt . A disjunction Wmi=1 ti of terms ti = Vj2Pti xj Vj2Nti xj , i = 1; 2; : : : ; m,
is called a disjunctive normal form (DNF), and it is well-known that every Boolean function
can be represented by a DNF. A DNF is a k-DNF if jPti [ Nti j k for all i. In particular,
we call a 2-DNF quadratic and a 1-DNF linear. A DNF Wmi=1 ti is Horn if every ti satises
jNti j 1. A function f is called Horn if it has a Horn DNF. It is known (e.g., [11]) that f
is Horn if and only if F (f ) is closed under intersection operations (i.e., x; y 2 F (f ) implies
x ^ y 2 F (f ), where (x ^ y)j = xj ^ yj ; j = 1; 2; : : : ; n).
We shall consider in this paper the following special classes of Boolean functions: the
class CALL of all Boolean functions, the class CP of positive functions, the class CHORN of
Horn functions, the class CTH of threshold functions, the class CLINEAR of linear (i.e., 1-DNF)
functions, and the class CQUAD of quadratic (i.e., 2-DNF) functions.
A partially dened Boolean function (pdBf, for short) is dened by a pair (T; F ), where
T; F f0; 1gn . A function f is an extension of pdBf (T; F ) if T T (f ) and F F (f ) hold.
The following six lemmas, proved in [7, 10], give the basis for our discussion.
Lemma 2.2 A pdBf (T; F ) has an extension in CP if and only if there is no a 2 T and
b 2 F such that a b. 2
Lemma V2.3 A pdBf (T; F ) has an extension in CHORN if and only if there is no a 2 T such
that a = b2F : : ba b.
st 2
RRR 04-97 Page 7
Lemma 2.4 A pdBf (T; F ) has an extension in CTH if and only if the system of inequalities
X
n X
n
aj wj w ; for all a 2 T
0 and bj wj w ? 1; for all b 2 F;
0
j =1 j =1
has a solution. 2
where [n] = f1; 2; : : : ; ng. Then (T; F ) has an extension in CLINEAR if and only if f =
W x W x satises T (f ) T . 2
j 2I1 j j 2I0 i
Lemma 2.6 Let I and I as in (4), and dene I ; I and I for a given pdBf (T; F ) by
0 1 00 01 11
satises T (f ) T . 2
It is easy to see that the conditions in these lemmas can be checked in time polynomial
in the input length of (T; F ), i.e., n(jT j + jF j). For example, the condition of Lemma 2.4
can be tested by linear programming in polynomial time.
T 1 1 1 1 1 0 0 1
0 1 1 0 0 1 1 1
0 0 0 0 1 0 1 1
F 1 1 1 0 0 0 1 1
0 0 1 0 1 0 0 0
The rst problem considered in this paper is to decide whether there exists a set of cut
points such that the resulting pdBf has an extension in a specied class C .
RRR 04-97 Page 9
Lemma 2.7 Let C be one of the six classes of functions listed in Section 1. If there is a set
of cut points such that the resulting pdBf has an extension in C , then the master pdBf also
has an extension in C .
Proof. Immediate from the denitions of these classes, since adding variables (corresponding
to new cut points) to binary vectors does not prevent (T; F ) from having an extension in C .
2
Lemma 2.7 shows that the master pdBf is the richest representation needed for nding
an extension in the classes considered.
For some classes of functions, the condition for the existence of an extension for the
master pdBf can be stated directly in terms of the original numerical data set.
Lemma 2.8 Let us denote the master pdBf of a numerical data set (S ; S ?) by (T ; F ).
+
Then,
(i) (T ; F ) has an extension in CALL if and only if S + \ S ? = ;.
(ii) (T ; F ) has an extension in CP if and only if there is no pair u 2 S + and v 2 S ? such
that u v.
Proof. Immediate from Lemmas 2.1 and 2.2, and the denition of the master pdBf. 2
Let us consider the class CTH , and let us recall that two subsets S + and S ? of the
Euclidean space IRd are said to be linearly separable if there exist d + 1 real numbers Wj ,
j = 0; 1; 2; : : : ; d; such that
X
d X
d
Wj uj W ; for all u 2 S
0
+
and Wj vj < W ; for all v 2 S ?:
0 (6)
j =1 j =1
Linear separability of two sets S + and S ? is well-known to be equivalent to the disjointness
of their convex hulls, i.e. to conv S + \ conv S ? = ;. Let us remark that for nite sets the
integrality of the coecients W0 and Wj , j = 1; :::; d can also be assumed.
One can easily see that there are examples of (S +; S ?), which are not linearly separable,
but for which a corresponding pdBf (T; F ) does have a threshold extension. If for example
S = fu = (1:0; 1:0)g and S ? = fv = (2:0; 0:0); v = (0:0; 2:0)g;
+ (1) (1) (2)
then
u = 21 v + 21 v ;
(1) (1) (2)
Page 10 RRR 04-97
showing that S + and S ? are not linearly separable. However, if we introduce the cut points
A = 0:5 and B = 0:5, then the resulting pdBf (T; F ),
T = fa = (1; 1)g and F = fb = (1; 0); b = (0; 1)g;
(1) (1) (2)
(for A = 1; :::; d) occurring in S + [ S ?, and that the binary values of the indicator variables
8
< i
xAi = : 1 if uA > uA
( )
0 otherwise
for i = 1; :::; KA correspond to u 2 S + [ S ? in the master pdBf (T ; F ). Hence we have
Pd W u = Pd W u KA + PKA x (u i? ? u i )
( ) ( 1) ( )
A A A A A A i Ai A A
=1
P d
=1
A ( P d )P
= A WA uA + A i xAiWA (uAi? ? uAi ):
K K A
=1
( 1) ( )
=1 =1 =1
Thus the coecients w = W ? PdA WAuAKA , and wAi = WA (uAi? ? uAi ) for A = 1; :::; d
0 0 =1
( ) ( 1) ( )
Theorem 2.1 A numerical data set (S ; S ? ) is linearly separable only if its master pdBf
+
Theorem 2.2 It can be checked in polynomial time whether the master pdBf (T ; F ) of a
numerical data set (S +; S ? ) has an extension in any of the six classes of functions listed in
Section 1. 2
RRR 04-97 Page 11
Theorem 3.1 The problem SINGLE-CP(C ) is NP-complete for each of the classes CALL ,
CHORN , CTH , CLINEAR, and CQUAD.
Proof. It is clear that SINGLE-CP(C ) is in NP for all the classes listed in the theorem
statement. In order to prove that SINGLE-CP(C ) is NP-hard, we shall reduce to it the
3SAT problem, a well-known NP-complete problem.
3SAT
Input: A set of clauses Ci , i = 1; 2; : : : ; m, each containing at most three literals from
fX1, X1, : : :, Xn , Xn g.
Question: Is there an assignment of values to the Boolean variables X1 ; X2 ; : : : ; Xn such
that all the clauses are satised?
We shall associate to a given set of clauses Ci; i = 1; 2; : : : ; m, a numerical data set
(S ; S ? ), involving n attributes and m + 1 input vectors, such that SINGLE-CP(C ) has a
+
solution for this (S + ; S ?) if and only if the answer of 3SAT is YES. The set S ? will consist
of a single negative point (1; 1; : : : ; 1), while the set S + will include a positive point u(i) for
every clause Ci = Wj2Pi Xj Wj2Ni X j , dened by
8
i < 2 if j 2 Pi
>
uj = > 0 if j 2 Ni
( )
: 1 if j 26 Pi [ Ni:
For example, for the clauses C1 = (X2 _ X 3 _ X 5), C2 = (X 1 _ X4), C3 = (X1 _ X 2 _ X5),
and C4 = (X3 _ X 4 _ X 5) the corresponding numerical data set (S + ; S ?) will be
Page 12 RRR 04-97
S : positive examples 1
+
2 0 1 0
0 1 1 2 1
2 0 1 1 2
1 1 2 0 0
S ? : negative examples 1 1 1 1 1
To solve SINGLE-CP(CALL ), we shall introduce at most one cut point per attribute in
such a way that the resulting pdBf (T; F ) has an extension in CALL (i.e. T \ F = ;). Clearly,
for each attribute j , the cut point j must be either 0.5 or 1.5. Let us associate to the cut
points the following binary assignment of the Boolean variables Xj :
8
<
Xj = : 1 if j = 1:5 (7)
0 if j = 0:5
With this denition, there is an assignment that satises all clauses Ci if and only if the
associated binarization produces a consistent pdBf (T; F ) with T = fx(u(i)) j u(i) 2 S +g
and F = fx(v) j v = (1; 1; : : : ; 1)g. Indeed, point u(i) 2 S + can be separated from point
v 2 S ? either by setting j = 1:5 for some j 2 Pi, or by setting j = 0:5 for some j 2 Ni,
i.e. exactly when the associated binary assignment satises clause Ci.
Note that if the pdBf (T; F ) is consistent, then it has an extension represented by the
DNF Wnj=1 Xjj , where Xjj = Xj if j = 1:5 and Xjj = Xj if j = 0:5. Clearly, this function
is linear, and hence it is also quadratic, Horn, and threshold. Therefore, (S + ; S ?) has an
extension in CALL if and only if it has an extension in all of the classes CHORN , CTH , CLINEAR,
and CQUAD . 2
by
8 8
< +1 if Xj 2 Ci or j 2 Ai n Ii;
> < +1 if j 2 Ai n Ii;
>
aj = > 0 if Xj 2 Ci;
i
( ) i
and bj = > 0 if Xj 2 Ci;
( )
Although the data are already binary, it is necessary to introduce a cut point j = 0:5 if the
attribute j is used as a Boolean variable xj in the pdBf (T; F ) obtained after binarization.
Let J be the set of indices j for which the cut points j = 0:5 are introduced. Then
T = f(a j ; j 2 J ); (a j ; j 2 J ); : : :; (amj ; j 2 J )g and F = f(0; 0; : : : ; 0)g;
1 2
where each vector has dimension jJ j. By Lemma 2.1, this pdBf (T; F ) has an extension in
CALL if and only if Pj2J aij 1 holds for all i 2 [m]. Since the number of cut points is
jJ j, (T; F ) is a desired pdBf if and only if J is a solution to MIN-COVER. This proves that
MIN-COVER is reduced to MIN-CP(CALL), and hence MIN-CP(CALL ) is NP-hard.
Furthermore, notice that, if J allows extensions in CALL, one of them can be represented
as f = _j2J xj , which is positive, Horn, threshold and linear (hence quadratic). This
means that MIN-CP(C ) is NP-hard for all the six classes CALL ; CP ; CHORN ; CTH ; CLINEAR,
and CQUAD . 2
RRR 04-97 Page 15
As a remark, we note that the above construction uses at most one cut point j = 0:5
for each attribute j of (S + ; S ?). Therefore, MIN-CP(C ) remains NP-hard for all the six
classes even after imposing the additional constraint of using at most one cut point for each
attribute.
j =1
Page 16 RRR 04-97
yj = 0; 1; j = 1; 2; : : : ; n:
Here each constraint P aju;v yj 1 guarantees that u 2 S and v 2 S ? are transformed
( ) +
into distinct Boolean vectors using only the chosen variables xj . If a solution y is feasible
(i.e., satises all the constraints), then the resulting pdBf (T; F ) satises the condition of
Lemma 2.1, and if y is optimal, it gives the minimum number of cut points necessary to
satisfy Lemma 2.1. Let n be the number of cut points introduced to dene the master pdBf.
This MIN-COVER problem has n variables, and jS j jS ?j constraints.
+
We extend now the same idea to the case of CP . This can be accomplished by simply
changing the denition of aju;v as follows.
( )
8
<
aju;v = : 1 if xj (u) = 1 and xj (v) = 0
( )
0 otherwise.
The constraint P aju;v yj 1 now guarantees that, for each pair of u 2 S and v 2 S ?, at
( ) +
least one j having the property xj (u) > xj (v) is selected. Therefore, the resulting (T; F )
constructed from (T ; F ), by using the coordinates j for which yj = 1 satises the condition
in Lemma 2.2. This MIN-COVER formulation for CP has the same number of variables and
constraints as the above formulation for CALL .
As the third case of MIN-COVER formulation, we consider CLINEAR. In this case, Lemma
2.5 states that a cut point Ai for the attribute A of the given data set (S ; S ?) is useful only
+
if the resulting indicator variable xAi belongs to one of the sets I or I dened in Lemma
0 1
2.5. This is possible if and only if either (i) vA Ai for all v 2 S ?, or (ii) vA < Ai for all
v 2 S?.
Therefore, without loss of generality we can restrict our attention to the following two
cut points for each attribute A:
A = minfvA j v 2 S ?g ? =2 and A = maxfvA j v 2 S ? g + =2;
0 0 1 1 (11)
here 0 > 0 is the gap between minfvA j v 2 S ?g and the largest uA ; u 2 S + , which is
smaller than minfvA j v 2 S ?g (if there is no such uA , the cut point A0 is not needed),
and 1 > 0 is dened in a similar way. The situation is illustrated in Figure 1 for the
two dimensional case, where xAi and xBj are the indicator variables associated with the cut
points Ai and Bj , respectively.
The problem MIN-COVER can now be constructed for CLINEAR in the same manner
as the formulation for CALL . The only dierence from the case CALL is that here each yj
corresponds to A0 or to A1 for some attribute A. A solution y of MIN-COVER gives the
RRR 04-97 Page 17
B
x A0 xA1 ∈ S+
∈S-
x B1
α B1
A
αA0 α A1
1-DNF 0 1 0 1
_ _
f =@ xA A _ @ xA A ;
1 0
yA1 =1 yA0 =1
which is an extension of (T; F ). This MIN-COVER problem has only 2d variables, where d
is the dimension of vectors in (S +; S ? ). The number of constraints remains the same as in
the above two cases.
We shall present now integer programming formulations of MIN-CP(C ) for CQUAD , CHORN
and CTH . The problem MIN-CP(CQUAD) can be stated as an integer program, which is
resembling MIN-COVER. For each linear or quadratic term tk , which corresponds to an
element in I0; I1; I00; I01; I11 of Lemma 2.6,
let us introduce a decision variable zk 2 f0; 1g
(k = 1; 2; : : : ; K ). Clearly, K n + 3 2 , unless F = ;. Let us denote the degree of tk by
n
dk (here, dk is either 1 or 2), and dene for each u 2 S +
8 8
< <
aku
( )
= : 1 if tk (x(u)) = 1 and bjk
( )
= : 1 if either xj or xj is involved in tk (12)
0 otherwise, 0 otherwise.
to describe whether xj is selected (yj = 1) or not (yj = 0). Then an integer program for
MIN-CP(CQUAD) is the following.
X
n
minimize yj
j =1
X K
subject to aku zk 1; for all u 2 S
( ) +
k=1
Xn
bjk yj dk zk ; for all k = 1; : : : K
( )
(13)
j =1
yj = 0; 1; j = 1; 2; : : : ; n
zk = 0; 1; k = 1; 2; : : : ; K:
The constraint PKk=1 a(ku)zk 1 guarantees that at least one quadratic (or linear) term tk
with tk (x(u)) = 1 is chosen for every u 2 S + . The constraint Pnj=1 b(jk)yj dk zk becomes
redundant if zk = 0, but if zk = 1 it enforces yj = 1 whenever b(jk) = 1 (as a result of
Pn b(k) = d ). This guarantees that all the indicator variables x in each chosen term t
j =1 j k j k
(i.e., with zk = 1) are selected. Therefore, any feasible solution of (13) denes a pdBf that
satises the conditions of Lemma 2.6, and the optimum value of the objective function gives
the minimum number nof cut points necessary to satisfy Lemma
2.6. The integer program (13)
has at most 2n + 3 2 variables and at most jS + j + n + 3 n2 constraints.
It is straightforward to see how the above integer program can be modied for nding
the minimum number of cut points such that the resulting pdBf has an extension in the class
of d-DNFs, for any xed d.
The integer program (13) can be modied for solving MIN-CP(CHORN ), based on the
following observation. For u 2 S + , let Ju = fj j xj (u) = 1; j = 1; 2; : : : ; ng. Then a Horn
extension exists if and only if, for every u 2 S + , at least one of the Horn terms Vj2Ju xj and
xi Vj2Ju xj for i 2 f1; 2; : : : ; ngn Ju takes the value 0 on all points x(v), v 2 S ?. Therefore, a
Horn DNF (if any) can be constructed by taking the disjunction of a number of such terms.
It should be noted that some literals may be removed from the selected terms in order
to decrease the number of cut points, as long as the resulting DNF still represents a Horn
extension of (S +; S ? ). To achieve this goal, let tk ; k = 1; 2; : : : ; K , denote all the terms of
the above type. Clearly, K njS + j. Let us introduce decision variables zk (k = 1; 2; : : : ; K )
for such terms, and dene
8 8
< >
< 1 if xj (v) = 0 and tk contains literal xj ;
(u)
ak = : 1 if t k (x(u)) = 1 (v;k )
and bj = > or xj (v) = 1 and tk contains xj
0 otherwise, : 0 otherwise.
RRR 04-97 Page 19
Let us further introduce a decision variable yj to describe whether the Boolean variable xj
in the master pdBf (and hence the cut point associated with it) is chosen (yj = 1) or not
(yj = 0). The resulting integer program for the solution of MIN-CP(CHORN ) becomes
X
n
minimize yj
j =1
X K
subject to aku zk 1; for all u 2 S
( ) +
k=1
Xn
bjv;k yj zk ; for all k = 1; 2; : : : ; K and for all v 2 S ?
( )
(14)
j =1
yj = 0; 1; j = 1; 2; : : : ; n
zk = 0; 1; k = 1; 2; : : : ; K:
The constraint PKk=1 a(ku)zk 1 guarantees that there is at least one Horn term tk for every
u 2 S +, which satises tk (x(u)) = 1. The constraint Pnj=1 bj(v;k)yj zk guarantees that,
for every selected term tk and v 2 S ?, at least one indicator variable xj involved in this
term will also be selected guaranteeing that this term take the value 0 on v. Therefore, any
feasible solution of (14) denes a pdBf which satises the conditions of Lemma 2.3, and if y
is optimal, the optimum value gives the minimum number of cut points necessary to satisfy
Lemma 2.3. The integer linear program (14) has at most n + njS + j variables and at most
jS +j + njS + j jS ? j constraints.
The integer program for the solution of MIN-CP(CTH ) is based on Lemma 2.4 that
requires the following linear program to have a solution:
Pn a w w ; for all a 2 T
j j j =1 0
For the development that follows, we need to establish an upper bound M on the absolute
values of the w's. Since the number of variables in (15) is n + 1 and since all the coecients
are +1, ?1, or 0, each component of a solution (if any) can be represented as the ratio of
two determinants of dimension n + 1 with elements +1, ?1, or 0 (see, e.g., [3]). Based on
this, we can use the Hadamard inequality to show that M satises
M n n2 : (16)
Introducing now a decision variable yj to describe whether the Boolean variable xj in the
master pdBf (and hence the cut point associated with it) is chosen (yj = 1) or not (yj = 0),
Page 20 RRR 04-97
Since d is limited by a constant k, in order to solve this problem, we generate all the possible
vectors y 2 f0; 1g2k to nd a feasible vector that minimizes P yj . Since the number of
possible Boolean vectors is bounded by a constant 22k , the whole process can be performed
in polynomial time, thus proving the next theorem.
Theorem 5.1 If the dimension d of the given data set (S ; S ?) is limited by a constant k,
+
Before studying the problem MIN-2CP(CP ), let us consider a data set (S + ; S ?) and its
extension in the two-dimensional space (i.e., d = 2), where the two attributes are denoted A
and B . The positive and negative observations are represented in the 2-dimensional plane
as shown in Figure 2(a). We draw vertical lines corresponding to the cut points introduced
for A, and horizontal lines for B , thus creating a number of rectangular regions. A minimal
rectangular region (not containing other rectangular regions) is called a cell, and each cell
corresponds to a unique Boolean vector resulting from the binarization using the cut points
(see Fig. 2(b)). The resulting pdBf (T; F ) is obtained by dening T as the set of all the
Boolean vectors corresponding to the cells containing positive observations, and dening F as
the set of all the Boolean vectors corresponding to the cells containing negative observations.
Recall that the master pdBf of a numerical data set (S + ; S ?) has a positive extension
if and only if no pair of u 2 S + and v 2 S ? satises u v (by Lemma 2.8). We say
that (S +; S ? ) is positively oriented if this condition is satised. For example, the data set
(S +; S ? ) in Figure 3 is positively oriented.
Let us call a set of cut points (or the associated pdBf (T; F )) consistent if no resulting
cell contains both positive and negative observations. Lemma 2.1 states that there is an
extension in CALL if and only if the given set of cut points is consistent. The same statement
also holds for the class CP if the data set (S + ; S ?) is positively oriented. So the problem
here is how to introduce a minimum number of vertical and horizontal lines in order to be
consistent.
We shall call a pair of positive and negative points in the 2-dimensional plane critical if
the minimum rectangle containing them contains no other point of S + [ S ?. The broken line
segments in Figure 2 indicate all critical pairs in this data set. We say that a line (vertical or
horizontal) separates a critical pair if the two points of the pair are located in the opposite
sides of the line. The following lemma is a restatement of the above argument.
Lemma 5.1 Given a 2-dimensional positively oriented data set (S ; S ?), let (T; F ) denote
+
the pdBf resulting from a set of cut points. Then the following three conditions are equivalent.
Page 22 RRR 04-97
: positive example
α : negative example
B2
: critical pair
α
B1
α α α A
A1 A2 A3
α B2
αB1
(00000) (10000) (11000) (11100)
α A1 α A2 α A3 A
B
(1) : positive point
u
: negative point
: extreme positive point
v (1) (2) : extreme negative point
u
v (2) u (3)
u (4) = u (5)
v (3) = v (4)
v (5) = v (6)
A
Theorem 5.2 If a given data set (S ; S ? ) is positively oriented, then the maximum number
+
of pairwise independent pairs in (S + ; S ? ) equals the minimum number of cut points needed
to produce a consistent pdBf (T; F ).
After introducing some more notations and denitions, we shall prove this statement by
actually presenting a construction of a consistent separation that involves as many cut points
as the number of pairwise independent pairs.
Let us call a positive point u 2 S + extreme if there is no distinct positive point u0 2 S +
such that u0 < u. Let us denote by Sex+ the set of extreme positive points in S +. Similarly,
a negative point v 2 S ? is called extreme if there is no distinct negative point v0 2 S ? such
that v0 > v. Let us denote by Sex? the set of extreme negative points in S ?. The signicance
of extreme points consists in the fact that any separation of all critical pairs of extreme
points will be at the same time a separation of all critical pairs in Lemma 5.1(ii). In Figure
3, there are ve extreme positive points and ve extreme negative points, which are indicated
as double circles.
For a point w, let us denote by NW (w) the region of all 2-dimensional points located to
the \north-west" from w. More precisely,
NW (w) = fw0 j wA0 wA ; wB0 wB g:
Similarly,
NE (w) = fw0 j wA0 wA ; wB0 wB g;
SW (w) = fw0 j wA0 wA; wB0 wB g;
SE (w) = fw0 j wA0 wA; wB0 wB g:
We say that a point w0 precedes another point w00 if w0 2 NW (w00), and denote this relation
by w0 w00. Since any two extreme points u0; u00 2 Sex satisfy either u0 2 NW (u00) or
+
u00 2 NW (u0), this precedence relation is a linear order on Sex. Similarly, this relation is also
+
the corresponding (k + 1)-th point is not dened, and the construction thereafter is stopped.
Let us call all the constructed points u i and v i , i = 1; 2; ::: corner positive and corner
( ) ( )
RRR 04-97 Page 25
negative points, respectively. Note that SE (u(i+1)) SE (u(i)) and SE (v(i+1)) SE (v(i))
hold for all i. Figure 3 contains corner positive points u(1); u(2); : : : ; u(5) and corner negative
points v(1); v(2); : : :; v(6), for which v(3) = v(4); v(5) = v(6) and u(4) = u(5) hold.
Assume that the positive corner points u(1); u(2); : : :; u(ku ) and the negative corner points
v(1); v(2); : : :; v(kv) are constructed in the above process. Since the construction alternates
between positive and negative points, it is easy to see that ku and kv dier at most by one.
Let us denote
k = minfku; kv g:
Clearly, k 1 holds if S + 6= ; and S ? 6= ;. It is important to notice that the above
construction rule implies that the pairs (u(i); v(i)) and (u(j); v(j)) are independent for any
i 6= j . Therefore, we thus nd k pairwise independent pairs (u(i); v(i)); i = i; 2; : : : ; k;
(represented in the example of Figure 3 by solid line segments).
For a pair (u(i); v(i)), the two points u(i) and v(i) may be either -comparable or not. If
they are not -comparable, then v(i) u(i). On the other hand, if they are -comparable,
the subsequent pairs have a very special structure, as shown in the next lemma.
Proof. We consider only the case of u i v i , since the other case can be proved in a
( ) ( )
then v0 would satisfy v0 2 SE (u i ) \ Sex? SE (u i? ) \ Sex? and v0 would play the role of
( ) ( 1)
It can be seen that if points w; w0 2 S + [ S ? , then wA0 62 (wA ? 12 A; wA) or wB0 and
wB0 62 (wB ; wB + 12 B ). For the subsequent construction we shall introduce the following
potential cut points A = A(i) and B = B (i).
A(i) = u(Ai) ? 21 A
B (i) = vB(i) + 12 B :
We are now ready to turn to
Proof of Theorem 5.2. We have seen that the above procedure constructs k pairwise
independent pairs. We shall prove the theorem by constructing the same number k of cut
points, which separate all critical pairs.
The selection of the cut points starts from the very last (i.e., the k-th) pair u(k) and v(k).
Let us consider rst the case in which there exists an i k such that the points u(i) and
v(i) are -comparable. Then the corner points u(k) and v(k) are -comparable by Lemma
5.2. If u(k) v(k), then we choose B = B (k) as the rst cut point, and continue choosing cut
points alternating between B s and As, (B = B (k); A = A(k?1); B = B (k?2); : : :) until the rst
pair (u(1); v(1)) is separated. If, on the other hand, v(k) u(k), then we choose alternatively
A = A(k); B = B (k?1); A = A(k?2); : : : until the rst pair (u(1); v(1)) is separated. These
alternating cut points form a staircase as shown in Figure 4. To show that this staircase
separates all critical pairs in (S +; S ?), it is sucient to prove that there is no positive point
to the south-west of this staircase, and no negative point to the north-east of it.
Let us consider in detail the case of u(k) v(k), since the case of v(k) u(k) can be
treated in a similar way. We show rst that there is no negative point to the north-east
of the staircase. For the area NE (A(k?1); B (k)), this amounts to showing that there is no
negative point in NE (u(Ak?1); vB(k)) (except for v(k)). Obviously,
NE (u(Ak?1); vB(k)) = NE (u(k?1)) [ NE (v(k)) [ (SE (u(k?1)) \ NW (v(k))):
The area NE (u(k?1)) does not contain a negative point, since any w 2 NE (u(k?1)) satis-
es w u(k?1) and the data set (S +; S ? ) is positively oriented. The area NE (v(k)) does
not contain any negative point, since v(k) is an extreme negative point. Finally, the area
(SE (u(k?1)) \ NW (v(k))) does not contain any negative point, since v(k) is the -smallest
negative point in SE (u(k?1)). In a similar way, it can be shown that there is no negative
point in NE (A(k?3); B (k?2)), and so on. In Figure 4, it is illustrated by broken lines how
A ; vB ) can be represented as NE (u ) [ NE (v ) [ (SE (u ) \ NW (v )).
NE (u(2) (3) (2) (3) (2) (3)
If k is even, the argument is complete, since the north-east area of the staircase is the
union of k=2 north-east regions NE (A(2j?1); B (2j)); j = 1; 2; : : : ; k=2. If k is odd, then it
RRR 04-97 Page 27
(1)
u
B=B (1)
v (1) (2)
u
(3)
A=A (2) B=B
v (2) u (3)
A=A (4)
( A (2) , B (3) )
( A (4) , B (5) )
v (5) = v (6)
remains to be shown that there is no negative point with B B (1), or equivalently, that
there is no negative point in the region with B vB(1), which is NE (v(1)) [ NW (v(1)). Indeed,
NE (v(1)) contains no negative point, since v(1) is an extreme negative point. Also, NW (v(1))
contains no negative point, since v(1) is the rst corner negative point.
Let us now discuss why there is no positive point to the south-east of the staircase. Notice
rst that there is no positive points with B B (k) (equivalently, B vB(k)). Indeed, SW (v(k))
does not contain any positive point, since (S + ; S ?) is positively oriented, and SE (v(k)) does
not contain any positive point, since u(k)( v(k)) is the last corner positive point. In order to
show that there is no positive point in SW (A(k?1); B (k?2)) (equivalently, SW (u(Ak?1); vB(k?2))),
we need the following representation:
SW (uAk? ; vBk? ) = SW (u k? ) [ SW (v k? ) [ (SE (v k? ) \ NW (u k? )):
( 1) ( 2) ( 1) ( 2) ( 2) ( 1)
The area SW (u(k?1)) does not contain any positive point, since u(k?1) is an extreme positive
point. The area SW (v(k?2)) does not contain any positive point w, since (S +; S ? ) is positively
oriented. Finally, the area (SE (v(k?2)) \ NW (u(k?1))) does not contain any positive point,
since u(k?1) is the -smallest positive point in SE (v(k?2)). In a similar way, it can be shown
that there is no negative point in SW (A(k?3); B (k?4)), and so on.
If k is odd, the argument is complete, since the south-west area of the staircase is the
union of (k ? 1)=2 south-west regions SW (A(2j); B (2j?1)) and of the rst half-plane dened
Page 28 RRR 04-97
by B vB(k). If k is even, then it remains to be shown that there is no positive point with
A A(1) (equivalently, A u(1) A ). Indeed, the area SW (u ) contains no positive point,
(1)
since u(1) is an extreme positive point; and NW (u(1)) contains no positive point, since u(1)
is the rst corner positive point. This completes the analysis of the case u(k) v(k).
Finally, let us consider the case in which there is no i k such that the points u(i)
and v(i) are -comparable. Let us analyze rst the case ku = kv . There are two ways of
constructing the separating staircase using k cut points: either B (k); A(k?1); B (k?2); : : : or
A(k); B (k?1); A(k?2); : : :. In either case, an analysis similar to the above shows that there is
no negative point to the north-east of the staircase, and no positive point to the south-west
of it. Let us consider now the case kv = ku + 1 (hence k = ku ). In this case v(k+1) 6=
v(k), and the staircase is constructed by choosing the cut points B (k); A(k?1); B (k?2); : : :.
Notice that in this case there is no positive point with B B (k) (equivalently, B vB(k)).
Indeed, SW (v(k)) contains no positive point (by the assumption that (S +; S ? ) is positively
oriented), and SE (v(k)) contains no positive point (since u(k) is the last corner positive
point). The rest of the argument proceeds as above. (Note that the construction of the
staircase using the cut points A(k); B (k?1); A(k?2); : : : would not work, since the point v(k+1)
is located to the north-east of the staircase.) The last case ku = kv + 1 (hence k = kv )
is similar to the above, and we omit its proof. Figure 5 shows the structure with four
independent corner pairs when ku = 4 and kv = 5. It also shows the staircase constructed
as in the proof. The broken lines in the gure indicate how SW (u(3) A ; vB ) is represented as
(2)
(1)
u
v (1) (2)
u
u (3)
v (2)
u (4)
v (3)
v (4) v (5)
MAX-2SAT
Input: A set of m clauses, each containing two literals chosen from fX1; X1; X2; X 2; : : :;
Xn ; Xn g.
Output: An assignment to Boolean variables X1; X2 ; : : : ; Xn such that the num-
ber of satised clauses is maximized. (A clause is satised if at least one of its
literals is given the value 1 by the assignment.)
To prove these results, we have to introduce some terminology, to describe some building
blocks, and to prove some lemmas.
Construction of a grid structure: As a rst step of our construction, we prepare
a grid structure, as shown in Fig. 6, by placing positive and negative examples along the
left-most and the bottom straight edges. It is immediate to see that (1) critical pairs in
this structure are only the contingent pairs of positive and negative examples, and (2) one
vertical or horizontal line (i.e., cut point) is required to separate each pair. Clearly, the
resulting structure is consistent. It can also be easily seen that, by changing the locations
of positive and negative examples, the grid structure can be controlled so that various other
grid structures in the subsequent discussion can be built in a similar manner. For simplicity,
we represent each pair of double lines by a single line in most of the subsequent gures.
We shall place some additional examples in the cells created by the grid. Since examples in
Page 30 RRR 04-97
dierent cells of the grid are already separated by the grid lines, we can restrict our discussion
only to those critical pairs which are contained in single cells. This will greatly simplify the
argument.
In the following discussion, we shall count only those lines which are introduced to sepa-
rate additional examples, since all the grid lines are forced and their number can be consid-
ered to be xed.
(a) X j - square
Xj = 1 Xj = 0
( mn+2m, mn+2m )
2 • C1 •
• •
2 • C2
•
2m
• • • C/X • • • • • •
2 Cm
• •
• •
m X1 • •
mn
• • • • • • • • • C/X
• •
m • Xn •
• •
m m 2 2 2
(1, 2)
mn 2m
(1, 1)
(2, 1)
we take a cell in Xj -squares as a unit. The entire area is divided into four sub-areas; the lower-
left area contains n Xj -squares arranged diagonally, the upper-right area of size 2m 2m
contains m squares of sizes 2 2, named Ci-squares, which are also arranged diagonally,
and the remaining two areas, the upper-left and the lower-right ones, will represent the
connection between the clauses Ci and the literals Xj ; Xj . The last two sub-areas will be
called the C=X area and the C=X area respectively, for the reason to become clear later.
Variables, clauses and their connection: Fig. 9 shows a unit cell in an Xj -square
and a Ci-square. A Ci-square has four unit areas, and a positive example is placed in the
upper-right corner of the upper-right unit while a negative example is in the lower-left corner
of the lower-left unit.
To represent the fact that each clause Ci contains two literals, we place critical pairs in
the unit areas of the following locations:
(m(j ? 1) + i; mn + 2m ? 2(i ? 1)) if Xj is the rst literal in Ci
(m(j ? 1) + i; mn + 2m ? 2(i ? 1) ? 1) if Xj is the second literal in Ci
(mn + 2i; mn ? m(j ? 1) ? (i ? 1)) if Xj is the rst literal in Ci
(mn + 2i ? 1; mn ? m(j ? 1) ? (i ? 1)) if Xj is the second literal in Ci;
where the location of a unit area is described by the location of its upper-right corner in
the A-B plane. We call these unit areas containing critical pairs connectors, and use names
such as Ci-connectors (if they are related to Ci), Xj -connectors (if they are related to Xj ),
Xj -connectors (if they are related to Xj ), and C=X (resp., C=X )-connectors (if they are in
C=X (resp., C=X )-area). These rules may be best illustrated by an example such as Fig.
10, in which two clauses C1 = (X1 _ X2) and C2 = (X2 _ X3) are represented.
Page 34 RRR 04-97
C1
B
10
C2
8
X1
X2
0 A
6 8 10
X3
Figure 10: Construction for C1 = (X1 _ X2) and C2 = (X2 _ X3) with n = 3 and m = 2.
RRR 04-97 Page 35
Xj Xl Ci Ci
(a) Ci = ( X j ∨ X l )
Xj
Xj Ci
Xl
(b) Ci = ( X j ∨ X l )
Xl
(c) C = ( X ∨ X )
i j l
Ci Ci
v v
Xj Xj
Xl Xl
(a) X j = 1, X l = 0 (b) X j = 1, X l = 1
Ci Ci
Xj Xj v
h h
Xl
Xl
(c) X j = 0, X l = 0 (d) X j = 0, X l = 1
Lemma 5.3 Given a 2-dimensional data set (S ; S ?) on attributes A and B , the pdBf (T; F )
+
resulting from a set of cut points has an extension in CHORN if and only if the following two
conditions hold (see Fig. 2).
(i) (T; F ) is consistent.
(ii) For any subset V S ?, the cell containing the point ^v2V v does not contain any posi-
tive example u 2 S + , where v^v 0 for v; v 0 2 R2 denotes the point (minfvA; vA0 g; minfvB ; vB0 g).
Proof. Condition (i) is already in Lemma 5.1 for CALL. Let x(u) denote the Boolean vector
obtained from u 2 S [ S ? using the introduced cut points (see Fig. 2(b)). Then if condition
+
in the above situation, this is equivalent to the condition in Lemma 2.3. These observations
prove the lemma. 2
Now let us examine all the new points generated by operations ^v2V v for V S ? in
the construction for the proof of Theorem 5.4. First, all the negative examples used to build
the grid (see Fig. 6) can generate only one new point at the very lower-left corner, which
violates none of the conditions (i) and (ii). To see the eect of other negative examples, we
illustrate in Fig. 13 all the points generated by such operations (shown as ) for the case
of C1 = (X1 _ X2) and C2 = (X2 _ X3 ) corresponding to Fig. 10. It is straightforward to
observe that conditions (i) and (ii) hold if the set of cut points separates all critical pairs in
Fig. 10. This can be generalized to other cases, though we omit the details. In other words,
the pdBf (T; F ) obtained in the construction for Theorem 5.4 has an extension in CHORN ,
and hence establishes the next theorem.
Now we turn to the class CTH . For a given 2-dimensional data set (S + ; S ?), assume
that a consistent set of cut points is at hand. Call a cell positive (resp., negative) if it
contains a positive example (resp., a negative example). The property of consistency implies
that there is no cell which is positive and negative at the same time. A sequence of cells
Q = q1; q2; : : :; qk is called an alternating cycle if
(i) q1 = qk ,
(ii) positive and negative cells alternate in Q, and
(iii) the edges connecting qi and qi+1 for i = 1; 2; : : : ; k ? 1 alternate between vertical and
horizontal.
An example of an alternating cycle is shown in Fig. 14.
Lemma 5.4 Given a 2-dimensional data set (S ; S ? ), the pdBf (T; F ) resulting from a set
+
of cut points has an extension in CTH if and only if the following two conditions hold.
(i) (T; F ) is consistent.
(ii) There is no alternating cycle.
Page 40 RRR 04-97
+ +
+ + +
respectively, and let xA (u) = (xA1(u); xA2(u); : : :; xAp(u)) and xB (u) = (xB1(u); xB2(u); : : :; xBr(u)),
where xAi (u) = 1 if uA Ai and 0 otherwise; similarly for xBj . By the asummability con-
dition mentioned in Section 2.2, (T; F ) has an extension in CTH if and only if there is no set
of u 0; u 2 S + and
v 0; v 2 S ? such that
X X
u =
v > 0;
u2S + v2S ?
X X
u(xA(u); xB (u)) =
v (xA(v); xB(v)): (18)
u2S + v2S ?
We rst prove the only-if-part. Condition (i) follows from Lemma 5.1. To show condition
(ii), assume that an alternating cycle Q = q1; q2; : : :; qk exists, and let U + (resp., U ? ) be the
set of positive examples (resp., negative examples) obtained by selecting exactly one example
from each participating cell. Then
jU +j = jU ?j
X X
(xA(u); xB (u)) = (xA(v); xB (v))
u2U + v2U ?
hold since Q is alternating. Therefore, by (18), (T; F ) does not have an extension in CTH , a
contradiction.
To prove the if-part, assume that (T; F ) does not have an extension in CTH . If it does
not have an extension in CALL, then the condition (i) is violated. If it has an extension in
CALL but not in CTH , there are u 0; u 2 S + and
v 0; v 2 S ? satisfying (18). We
RRR 04-97 Page 41
point out that, by the denition of binarization as explained in Section 2.2, xA(u) is equal
to one of the p-dimensional vectors (0 00); (0 01); (0 011); : : : ; (1 111), and xB (v)
is equal to one of the r-dimensional vectors (0 00), (0 01), (0 011), : : :, (1 111).
Note that the p vectors (resp., r vectors) (0 01), (0 011), : : :, (1 111) among those
form a basis of the p-dimensional (resp., r-dimensional) linear space, and that (00 00)
does not contribute to the sum in (18). Linear algebra then tells that
X X
uxA (u) =
v xA (v) (19)
u2S + ; xA (u)=a v2S ? ; xA (v)=a
holds for every a = (0 00); (0 01); (0 011); : : : ; (1 111), and that similar equations
hold for xB (u).
Now let us start with a positive cell containing some u 2 S + with u > 0. Then by
(19), there must be an example v 2 S ? such that xA (v) = xA(u) (but xB (v) 6= xB (u)) and
v > 0. This edge from the cell of u to the cell of v is horizontal. We then apply a similar
argument to the negative cell of v, and nd a positive cell containing u0 2 S + such that
xB (u0) = xB (v) and u0 > 0. This implies that there is a vertical edge from the negative
cell of v to the positive cell of u0. Repeating this, we grow an alternating path, but, since
the number of points with non-zero coecients in the relation (18) is nite, a cell must be
repeated at a certain step, thus forming an alternating cycle. We conclude, therefore, that
there is an alternating cycle, and the condition (ii) is violated. 2
Next we show that any set of cut points producing an extension in CALL in the construc-
tion in the proof of Theorem 5.4 satises the above two conditions. The condition (i) is
immediate from Lemma 5.1. To show that the condition (ii) is also satised, we proceed
step by step.
1. If there is an alternating cycle Q, it does not use the examples introduced to build the
grid (like Fig. 6). Indeed, if such a Q exists, it must use a horizontal or vertical edge from
one of positive examples among them, which is perpendicular to the array of the examples;
but it is impossible since there is no other example in the row or column of such edge.
2. The alternating cycle Q cannot use a cell in Xj -squares. To prove this, assume without
loss of generality that an Xj -square is separated by m horizontal lines. Then, any row of
cells of this Xj -square contains only positive examples or only negative examples, implying
that the row cannot produce a horizontal edge in Q. (Note, in this case, there may be a
column containing both positive and negative examples.)
3. The alternating cycle Q cannot use a cell in C=X or C=X area. This is because Q
then contains an edge connecting the cell to a cell in some Xj -square, which was ruled out
Page 42 RRR 04-97
in 2.
4. As a result of 13, the alternating cycle Q must consist only of cells in Ci-squares.
However, this is obviously impossible since the cells in this area are diagonally arranged.
This proves that the resulting pdBf (T; F ) has an extension in CTH , and establishes the
next theorem.
Theorem 5.6 MIN-2CP(CTH ) is NP-hard. 2
Our next target is CQUAD . First note that each region represented in the 2-dimensional
plane by a quadratic term such as XAi XAj , XAi XBj , XAi XBj or XAi XBj is one of those
illustrated in Fig. 15. It is also allowed to use regions of linear terms such as those illustrated
in Fig. 1. Call such a region a positive quadratic region (PQR) if it contains at least one
positive example but no negative example. We say that such PQR covers the positive
examples in it. Then the next lemma tells when an extension in CQUAD exists.
X Ai X Bj X Ai X Bj
X Bi X Bj
X Ai X Aj
X Ai X Bj X Ai X Bj
Lemma 5.5 Given a 2-dimensional data set (S ; S ? ), the pdBf (T; F ) resulting from a set
+
of cut points has an extension in CQUAD if and only if the following two conditions hold.
RRR 04-97 Page 43
C1
C2
X1
X2
X3
A
Figure 16: Construction for CQUAD with C1 = (X1 _ X2) and C2 = (X2 _ X3).
RRR 04-97 Page 45
of lines; six lines are required if at most one of Xj and Xl is satised, but four lines are
necessary and sucient if both of Xj and Xl are satised.
Therefore, the minimum number of lines such that the resulting pdBf (T; F ) has an exten-
sion in CQUAD provides a solution to MAX-2COMPSAT. This shows that MAX-2COMPSAT
is reduced to
MIN-2CP(CQUAD ), and establishes the next theorem.
Theorem 5.7 MIN-2CP(CQUAD) is NP-hard. 2
As the nal case in this section, we deal with MIN-3CP(CP ) in the 3-dimensional space
with coordinates A; B and C . We call the plane obtained by xing A (resp., B; C ) to a
constant an A-plane (resp., B -plane, C -plane).
In this case, we build the required grid not in the C -plane with C = 0, but in the triangu-
lar region T0 that lies in the plane dened by the three points (K; 0; 0); (0; K; 0); (0; 0; K ),
which is illustrated in Fig. 20. To draw the lines of the grid, we place positive examples and
negative examples in the C -plane with C = 0 along the straight line segment connecting
(K; 0; 0) and (0; K; 0) in the manner of Fig. 21. Although we omit the details, it is clear that
the desired grid structure can be built in the square area indicated by bold borders. Also
note that all cells created in this way are consistent with respect to the examples used for
Page 46 RRR 04-97
v1 v 2 v 3 v4
h
h 34
h2
h1
Xj
Xj
(a) Xj = 1 (b) X j = 0
building the grid, and furthermore, no pair of a positive example u and a negative example
v satises v u. The lines in Fig. 21 represent A-planes and B -planes perpendicular to the
C -plane, which then intersect the triangle T0 of Fig. 20 and project the grid structure of
Fig. 21 on T0.
Then, using the grid built on T0, we construct on T0 the structure used in the proof of
Theorem 5.4. Fig. 22 shows how it is realized for the example case of Fig. 10, where shaded
regions represent Xj -squares, Ci-squares and connectors in C=X and C=X areas. In this
construction, however, we shrink the size of each critical pair (i.e., we put and closer)
as illustrated in Fig. 23. Although not exhibited, the critical pairs in connectors are also
modied in the same manner. We emphasize that all these examples are put on T0.
Next, for every negative example in the construction (including those used to build the
grid), we introduce a new positive example and place it right above the corresponding
(in the direction of increasing C -coordinate). This necessitates the introduction of a C -plane
to separate each critical pair just created in this process. These C -planes intersect T0 and
produce horizontal lines, which are illustrated in Figs. 22 and 23 by broken lines. Note
that C -coordinate values decrease if we move upward in these gures, and these broken lines
separate none of the critical pairs which were already present before this modication.
RRR 04-97 Page 47
v3 v2 v 1 Ci v2 v1 Ci
Xj Xj
h1 h1
h h2
2
h3 h3
h4
Xl
Xl
(a) None of X j and X l (b) One of X j and X l
is separated (say X l ) is separated
Ci
v2 v 1
Xj
h1
h2
Xl
(c) Both X j and X l are
separated
Figure 19: Covering Ci-square together with the associated connectors (Ci = (Xj _ Xl)).
Page 48 RRR 04-97
( 0, 0, K) T0
( 0, K, 0)
A
( K, 0, 0)
B A
( 0, K, 0) ( K, 0, 0)
( 0, 0, 0 )
B A
( 0, 0, K )
Now it is not dicult to observe that the structure constructed on T0 in the 3-dimensional
space is the same as the one constructed for the proof of Theorem 5.4 in the 2-dimensional
space. Therefore, the previous proof can also be applied to this case in exactly the same
manner, and we conclude that minimizing the number of lines to separate all critical pairs
is equivalent to solving the given instance of MAX-2SAT.
Finally, we point out that no pair of a positive example u and a negative example v in
this construction satises v u in R3, because they are located on T0 (except for those used
to build the grid in Fig. 21, and those placed above negative examples, for which the proof is
also straightforward). If a set of cut points is introduced to separate all the remaining critical
pairs, then it can be shown from the construction that the Boolean vectors x(u) and x(v)
for any pair of a positive example u and a negative example v do not satisfy x(v) x(v).
(The cut points for the broken lines in Figures 22 and 23 were introduced to guarantee this
property.) Then Lemma 2.2 tells that the resulting pdBf (T; F ) has an extension in CP , and
establishes the next theorem.
References
[1] D. Angluin, Queries and concept learning, Machine Learning, 2 (1988) 319-342.
[2] E. Balas and A. Ho, Set covering algorithms using cutting planes, heuristics, and sub-
gradient optimization: A computational study, Mathematical Programming, 12 (1980)
37-60.
[3] R. Bellman, Introduction to Matrix Analysis , McGraw-Hill, New York, 1960.
[4] J. C. Bioch and T. Ibaraki, Complexity of identication and dualization of positive
Boolean functions, Information and Computation, 123 (1995) 50-63.
[5] E. Boros, V. Gurvich, P.L. Hammer, T. Ibaraki, and A. Kogan, Decompositions of
partially dened Boolean functions, Discrete Applied Mathematics, 62 (1995) 51-75.
[6] E. Boros, P. L. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz and I. Muchnik, An imple-
mentation of logical analysis of data, Technical Report RRR 22-96, RUTCOR, Rutgers
University, 1996.
RRR 04-97 Page 51
[7] E. Boros, T. Ibaraki and K. Makino, Error-free and best-t extensions of partially
dened Boolean functions, Technical Report RRR 14-95, RUTCOR, Rutgers University,
1995.
[8] E. Boros, T. Ibaraki and K. Makino, Extensions of partially dened Boolean functions
with missing data, Technical Report RRR 06-96, RUTCOR, Rutgers University, 1996.
[9] P. Clark and T. Niblett, The CN2 induction algorithm, Machine Learning, 3 (1989)
261-283.
[10] Y. Crama, P. L. Hammer and T. Ibaraki, Cause-eect relationships and partially dened
boolean functions, Annals of Operations Research, 16 (1988) 299-326.
[11] R. Dechter and J. Pearl, Structure identication in relational data, Articial Intelligence
58 (1992) 237-270.
[12] O. Ekin, P.L. Hammer and A. Kogan, On connected Boolean functions, Technical Re-
port RRR 36-96, RUTCOR, Rutgers University, 1996.
[13] M. R. Garey and D. S. Johnson, Computers and Intractability, Freeman, New York,
1979.
[14] P. L. Hammer, Partially dened boolean functions and cause-eect relationships, in
International Conference on Multi-Attribute Decision Making Via OR-Based Expert
Systems, University of Passau, Passau, Germany, April 1986.
[15] A. B. Hammer, P. L. Hammer and I. Muchnik, Logical analysis of Chinese productivity
patterns, RUTCOR Report RR 1-96, RUTCOR, Rutgers University, 1996.
[16] S. J. Hong, R-MINI: An iterative approach for generating minimal rules from examples,
IEEE Trans. on Knowledge and Data Engineering, to appear.
[17] O.L. Mangasarian, Mathematical programming in machine learning, In G. Di Pillo and
F. Giannessi, eds., Nonlinear Optimization and Applications, pp. 283-295, New York,
1996 (Plenum Publishing).
[18] I. Muchnik and L. Yampolsky, A pilot study of the concurrent validity of logical analysis
of data, as applied to two Beck inventories. Technical Report RTR 02-95, RUTCOR,
Rutgers University, 1995.
[19] S. Muroga, Threshold Logic and Its Applications, Wiley-Interscience, 1971.
Page 52 RRR 04-97
[20] P. M. Murphy and D. W. Aha, UCI repository of machine learning databases: Machine
readable data repository, Department of Computer Science, University of California,
Irvine, 1994.
[21] J. R. Quinlan, Induction of decision trees, Machine Learning, 1 (1986) 81-106.
[22] J. R. Quinlan and R. Rivest, Inferring decision trees using minimum description length
principle, Information and Computation, 80 (1989) 227-248.
[23] L. G. Valiant, A theory of the learnable, Communications of the ACM, 27 (1984) 1134-
1142.