Symposium on Graph Theory in Chemistry

A Graph-Theoretical Approach
to Structure-Property Relationships
Zlatko Mihalić
Faculty of Science and Mathematics, The University of Zagreb, Strossmayerov trg 14, 41000 Zagreb, The Republic of Croatia
Nenad Trinajstić
The Rugjer Bošković Institute, P.O.B. 1016, 41001 Zagreb, The Republic of Croatia

A fundamental concept of chemistry is that the structural characteristics of a molecule are responsible for its properties (1). This was pointed out in the middle of the last century by Crum Brown and Fraser (2), who had also devised one of the first structure-property models. However, the earliest work in which this relationship was observed (the toxicity of methyl and amyl alcohols) was a thesis by Cros in 1863 (3).
A Topological Model of Matter
The origin of the structure-property concept can be traced (4) to the work of the Croatian Jesuit priest, scientist, and philosopher Rugjer Josip Bošković (5), who introduced the idea of representing atoms as points in space (6). (His major work was the theory of a single law of forces.) By allowing the point atoms to assume a variety of different arrangements, Bošković was able to account for the existence of different substances.
In this way the Bošković model may be considered as the forerunner of a topological model for the structure of matter. Bošković's fundamental idea, which is of the greatest importance in chemistry, was that substances have different properties because they have different structures. This idea was used, for example, by Davy to rationalize the difference between diamond and graphite (4, 7).

Table 2. List of Properties That Are Desirable for Topological Indices as Proposed by Randić (18)
1. Direct structural interpretation
2. Good correlation with at least one molecular property
3. Good discrimination of isomers
4. Locally defined
5. Generalizable
6. Linearly independent
7. Simplicity
8. Not based on physical or chemical properties
9. Not trivially related to other indices
10. Efficiency of construction
11. Based on familiar structural concepts
12. Correct size dependence
13. Gradual change with gradual change in structures

QSPR

The structure-property relationships quantify the connection between the structure and properties of molecules (8). These relationships are mathematical models that allow the prediction of properties from structural parameters. They are called quantitative structure-property relationships (QSPR).

Table 1. List of Selected Topological Indices

Topological index   Standard symbol   Structural interpretation (a)                                                       Author (Year)
Wiener number       W                 Sum of distances in a molecular graph                                               Wiener (1947)
Hosoya index        Z                 Sum of counts of nonadjacent edges in a molecular graph                             Hosoya (1971)
Randić index        χ                 Sum of weighted edges in a molecular graph                                          Randić (1975)
Balaban index       J                 Sum of weighted distances in a molecular graph                                      Balaban (1982)
Schultz index       MTI               Sum of elements of the structural row matrix v[A + D] (b) of a molecular graph      Schultz (1989)
Harary number       H                 Sum of squares of reciprocal distances in a molecular graph                         Plavšić, Nikolić, Trinajstić (1991)

(a) Graph-theoretical concepts are given in the following section.
(b) v = the valency row matrix; A = the adjacency matrix; D = the distance matrix.


Topological Indices
A graph-theoretical approach to QSPR is based on the
use of topological (graph-theoretical) indices for encoding
the structural information (8-14). The term topological
index (15) indicates a characterization of a molecule (or a
corresponding molecular graph (16)) by a single number.
The need to represent molecular structure by a single
number arises from the fact that most molecular properties are recorded as single numbers. Therefore, QSPR
modelling reduces to a correlation between the two sets of
numbers via an algebraic expression. (One set of numbers
represents the properties, and the other set represents the
structures of molecules under study.)
Characterizing a molecule by a single number represents a considerable loss of information: A three-dimensional object (molecule) is described by a one-dimensional
object (topological index). However, what is surprising is
how much of the relevant structural information is still retained in a given topological index.
There are more than 120 topological indices available to date in the literature (17), without any sign that their proliferation will stop in the near future. Here we will review only several selected topological indices. Table 1 lists six topological indices that will be considered in this report. Table 2 gives a list of useful properties that are desirable for topological indices (18). The desirable properties proposed by Randić (18) represent the very high level of sophistication that a topological index should achieve. All six indices listed in Table 1 approach this ideal. Their weakest point is the discrimination of isomers. This particular property is rather low for all topological indices considered here except the Balaban index (19-22). However, this is the weak point of most topological indices, except for molecular identification numbers (23). Nonetheless, the low discriminatory power of many indices does not prevent them from being useful descriptors in structure-property-activity modelling.

Journal of Chemical Education, Volume 69, Number 9, September 1992

In the next section we will give a brief survey of elementary (chemical) graph-theoretical concepts. This section will be followed by a section containing definitions of the six selected topological indices. In the fourth section a design of the structure-property relationships will be delineated. Then a didactic example will be presented.

Elementary Graph-Theoretical Concepts
We will cover only those graph-theoretical concepts that will be used in this report. In doing so, we will follow the book Graph Theory by Frank Harary (24) and both editions of our book Chemical Graph Theory (8, 25).
Graph theory is a branch of discrete mathematics, related to topology and combinatorics. It deals with the way objects are connected and with all the consequences of the connectivity. The connectivity in a system is, thus, a fundamental quality of graph theory. Chemical graph theory is a branch of mathematical chemistry, and consequently of theoretical chemistry. It is concerned with handling chemical graphs, that is, graphs that represent chemical systems. Hence, chemical graph theory deals with analyses of all consequences of connectivity in a chemical system. In other words, chemical graph theory is concerned with all aspects of the application of graph theory to chemistry.

The Concept of a Graph in Graph Theory
The central concept in graph theory is that of a graph. For a graph theorist, a graph is the application of a set on itself, that is, a collection of elements of the set and of binary relations between these elements. Graphs are one-dimensional objects, but they can be embedded or realized in spaces of higher dimensions. For a chemist, the two-dimensional realization of a graph is more appealing, that is, a set of vertices (points) and of edges (lines) joining these vertices.
A graph G can be visualized by a diagram when the vertices are drawn as small circles or dots, and the edges as lines or curves connecting the appropriate circles. Because a diagram of a graph completely describes the graph, it is customary and convenient to refer to the diagram of the graph as the graph itself. Mainly due to their diagrammatic representation, graphs have appeal as structural models in science, in general, and in chemistry, in particular (26, 27). A graph is called labeled when a specific numbering of its vertices is introduced. As an example, Figure 1 shows a diagram of a labeled graph.

Figure 1. A diagram of a labeled (numbered) graph, showing vertices as circles and edges as lines. The graph is actually a one-dimensional entity, but it can be realized in two dimensions.

The Concept of a Graph in Chemistry
In chemistry, graphs can be used to represent a variety of chemical objects such as molecules, reactions, crystals, polymers, and clusters. The common feature of chemical systems is the presence of sites and connections between them. Sites can be atoms, electrons, molecules, molecular fragments, intermediates, etc., while the connections between sites can represent bonds, reaction steps, van der Waals forces, etc. Chemical systems can be represented by chemical graphs using a simple conversion rule: Sites are replaced by vertices and connections by edges.

Molecular Graphs
A special class of chemical graphs are molecular graphs. Molecular graphs (or constitutional graphs) are chemical graphs that represent the constitution of molecules. In these graphs vertices correspond to individual atoms, and edges correspond to the bonds between them. An interesting historical detail is related to the concept of the molecular graph: The term graph was introduced by the English mathematician Sylvester (28) in 1878 on the basis of the constitutional formulas used by the chemists of his day.
To simplify the manipulation of molecular graphs, hydrogen-depleted graphs are often used. Such graphs represent only the molecular skeletons, omitting hydrogen atoms and their bonds. As an example, Figure 2 gives a labeled molecular hydrogen-depleted graph that depicts the carbon skeleton of 2,3,4-trimethylhexane.

Figure 2. A labeled, hydrogen-depleted, molecular graph corresponding to the carbon skeleton of 2,3,4-trimethylhexane. The vertices correspond to atoms, and the edges correspond to chemical bonds.

Analyzing and Comparing Graphs
Two vertices i and j of a graph G are adjacent if there is an edge joining them; the vertices i and j are then incident to such an edge. Similarly, two edges of G are adjacent if they have a vertex in common. The valency of a vertex i of G is the number of edges incident to i. This is denoted by d(i).
Two graphs G1 and G2 are isomorphic if there exists a one-to-one correspondence between their vertex sets V(G1) and V(G2), which induces a one-to-one correspondence between their edge sets E(G1) and E(G2). An invariant of a graph G is a quantity associated with G that has the same value for any graph that is isomorphic with G. It should be noted that topological indices are graph invariants.
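The notions of valency and graph invariant can be illustrated with a short sketch. It assumes a hydrogen-depleted graph stored as a list of edges; the vertex numbering of the 2-methylbutane skeleton and the names `EDGES`, `valencies`, `relabel`, and `PERMUTATION` are illustrative choices, not taken from the article.

```python
from collections import Counter

# Carbon skeleton of 2-methylbutane as an edge list (assumed numbering:
# chain 1-2-3-4 with the methyl carbon 5 attached to vertex 2).
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]


def valencies(edges):
    """Return {vertex: d(vertex)}, the number of edges incident to each vertex."""
    count = Counter()
    for i, j in edges:
        count[i] += 1
        count[j] += 1
    return dict(count)


def relabel(edges, mapping):
    """Apply a one-to-one vertex relabeling, giving an isomorphic graph."""
    return [(mapping[i], mapping[j]) for i, j in edges]


# An arbitrary renumbering of the same skeleton (hypothetical example).
PERMUTATION = {1: 5, 2: 3, 3: 1, 4: 2, 5: 4}

d_original = valencies(EDGES)
d_relabeled = valencies(relabel(EDGES, PERMUTATION))

# The individual labels change, but the sorted valency sequence does not.
print(sorted(d_original.values()))
print(sorted(d_relabeled.values()))
```

The sorted valency sequence is the same for both numberings, which is what makes it a graph invariant; the individual entry d(i) depends on the labeling and is not.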

A walk of a graph G is an alternating sequence of vertices and edges, beginning and ending with vertices, in which each edge is incident with the two vertices immediately preceding and following it. A path is a walk in which no vertex occurs more than once. The distance between two vertices is the number of edges in the shortest path that joins the two vertices. A graph G is connected if every pair of its vertices is joined by a path. Otherwise, a graph is considered disconnected.
A graph whose vertices all have the same valence is called a regular graph. If all vertices in a regular graph have a valence of 2, then the graph is called a cycle. A graph is acyclic if it has no cycles. A tree is a connected acyclic graph. The molecular graph in Figure 2 is an example of a tree.

Associating Graphs with Matrices
A labeled (chemical) graph may be associated with several matrices. Two very important graph-theoretical matrices are the vertex-adjacency matrix and the distance matrix.
The vertex-adjacency matrix, A = A(G), of a labeled connected graph G with N vertices is a square symmetric matrix of order N. It is commonly called the adjacency matrix. It is defined below.

$$ (A)_{ij} = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases} \qquad (1) $$

The distance matrix, D = D(G), of a labeled connected graph G with N vertices, is a square symmetric matrix of order N. It is defined below.

$$ (D)_{ij} = \begin{cases} l_{ij} & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases} \qquad (2) $$

where l_ij is the length of the shortest path (i.e., the distance) between the vertices i and j in G. Very often the distance matrix of a graph G can be generated using powers of the corresponding adjacency matrix of G (29). Table 3 gives the adjacency matrix and the distance matrix that correspond to the molecular graph in Figure 2.

Table 3. The Adjacency Matrix and the Distance Matrix of the Molecular Graph in Figure 2
[matrices not legible in this copy]

Definitions of the Selected Topological Indices

Wiener Number
The Wiener number, W = W(G) of G, was introduced by Wiener in 1947 as the path number (30). This topological index is defined as the half-sum of the elements of the distance matrix (15).

$$ W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (D)_{ij} \qquad (3) $$

Table 4 gives an example for computing the Wiener number.

Table 4. The Computation of the Wiener Number for a Tree T Depicting the Carbon Skeleton of 2-Methylbutane
(a) A labeled tree T [diagram]
(b) The distance matrix of T [not legible in this copy]
(c) The Wiener number of T: W(T) = 18

Table 5. The Computation of the Hosoya Index for a Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane
(a) A tree T [diagram]
(b) The count of the p(T, i) quantities in T
(i) p(T, 0) = 1
(ii) p(T, 1) = 6
(iii) p(T, 2) = 8
(iv) p(T, 3) = 2
(c) The Hosoya index of T: Z(T) = p(T, 0) + p(T, 1) + p(T, 2) + p(T, 3) = 17
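The definitions in eqs 1-3 can be sketched in a few lines. The sketch below builds the adjacency matrix from an edge list, derives the distance matrix by breadth-first search (a substitute for the adjacency-matrix-power route mentioned in the text), and takes the half-sum of its entries. The vertex numbering for 2-methylbutane and all function names are assumptions made for illustration.

```python
from collections import deque


def adjacency_matrix(n, edges):
    """Eq 1: (A)ij = 1 if vertices i and j are adjacent, 0 otherwise."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = a[j - 1][i - 1] = 1
    return a


def distance_matrix(a):
    """Eq 2: (D)ij = length of the shortest path between i and j (BFS per vertex)."""
    n = len(a)
    d = [[0] * n for _ in range(n)]
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if a[u][v] == 1 and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, length in dist.items():
            d[start][v] = length
    return d


def wiener_number(d):
    """Eq 3: half-sum of all entries of the distance matrix."""
    return sum(sum(row) for row in d) // 2


# 2-Methylbutane skeleton (assumed numbering: chain 1-2-3-4, methyl 5 on vertex 2).
EDGES = [(1, 2), (2, 3), (3, 4), (2, 5)]
A = adjacency_matrix(5, EDGES)
D = distance_matrix(A)
W = wiener_number(D)
print(W)  # 18, the value quoted in Table 4
```

Breadth-first search is used here only because it is short and dependency-free; for a connected graph it yields the same distance matrix as any other shortest-path method.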

Hosoya Index
The Hosoya index, Z = Z(G), was introduced by Hosoya in 1971 as the Z index (15). It is defined as

$$ Z = \sum_{i=0}^{[N/2]} p(G, i) \qquad (4) $$

where p(G, i) is the number of selections of i mutually nonadjacent edges in G. By definition, p(G, 0) = 1, and p(G, 1) is the number of edges in G. Table 5 gives an example of computing the Hosoya index.

Randić Index
The Randić index, χ = χ(G) of G, was introduced by Randić in 1975 as the connectivity index (31). This is one of the most widely used topological indices in QSPR (32-34) (and also in quantitative structure-activity relationships (QSAR) (35)). The Randić index is defined as

$$ \chi = \sum_{\text{edges } ij} [d(i)\, d(j)]^{-1/2} \qquad (5) $$

where d(i) and d(j) are the valencies of the vertices i and j that define the edge ij.
In molecular graphs that depict the carbon skeletons of hydrocarbons, only four types of vertices with respect to their valencies appear, that is, vertices with d = 1, 2, 3, 4. These give rise to 10 types of edges whose weights are given in Table 6. For saturated hydrocarbons, eq 5 may be given in closed form. If the number of each edge type is denoted by b_ij, where i = 1, 2, 3, 4 and j = i, ..., 4, and if the edge weights from Table 6 are used, then eq 5 becomes the following.

$$ \chi = b_{11} + 0.7071\, b_{12} + 0.5774\, b_{13} + 0.5\, b_{14} + 0.5\, b_{22} + 0.4082\, b_{23} + 0.3536\, b_{24} + 0.3333\, b_{33} + 0.2887\, b_{34} + 0.25\, b_{44} \qquad (6) $$

This expression reveals that the Randić indices of hydrocarbons are fully determined by the counts of the edge types in the corresponding hydrogen-depleted graphs. Table 7 gives an example of computing the Randić index by means of eq 6.

Table 6. The Edge Weights of 10 Edge Types Which Appear in Graphs Corresponding to the Carbon Skeletons of Hydrocarbons

Edge type (d(i), d(j))   Weight
1, 1                     1
1, 2                     0.7071
1, 3                     0.5774
1, 4                     0.5
2, 2                     0.5
2, 3                     0.4082
2, 4                     0.3536
3, 3                     0.3333
3, 4                     0.2887
4, 4                     0.25

Table 7. The Computation of the Randić Index for a Tree T Depicting the Carbon Skeleton of 4-Ethyl-2-methylheptane
(a) A tree T [diagram]
(b) Count of the edge types (the numbers at the vertices represent their valencies): b12 = 2, b13 = 2, b22 = 1, b23 = 4
(c) The Randić index of T: χ(T) = 2 · 0.7071 + 2 · 0.5774 + 0.5 + 4 · 0.4082 = 4.7018

Balaban Index
The Balaban index, J = J(G) of G, was introduced by Balaban in 1982 as the average-distance sum connectivity (36). It is defined as

$$ J = \frac{M}{\mu + 1} \sum_{\text{edges } ij} [(D)_i\, (D)_j]^{-1/2} \qquad (7) $$

Table 8. The Computation of the Balaban Index for a Labeled Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane
(a) A labeled tree T [diagram]
(b) The distance matrix of T

0 1 2 3 4 2 3
1 0 1 2 3 1 2
2 1 0 1 2 2 1
3 2 1 0 1 3 2
4 3 2 1 0 4 3
2 1 2 3 4 0 3
3 2 1 2 3 3 0

(c) The Balaban index of T [value not legible in this copy]
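Eqs 4 and 5 can be checked against the worked examples in Tables 5 and 7 with a brief sketch. The edge lists and vertex numberings below are assumed for illustration, and the recursive matching count is one simple way to obtain Z, not necessarily the procedure used in the article.

```python
from collections import Counter
from math import sqrt


def hosoya_index(edges):
    """Eq 4: Z = sum over i of p(G, i), i.e. the number of sets of mutually
    nonadjacent edges (including the empty set, p(G, 0) = 1)."""
    if not edges:
        return 1
    (i, j), rest = edges[0], edges[1:]
    # Either the first edge is left out of the selection ...
    without_edge = hosoya_index(rest)
    # ... or it is chosen, which excludes every edge sharing a vertex with it.
    with_edge = hosoya_index([e for e in rest if i not in e and j not in e])
    return without_edge + with_edge


def randic_index(edges):
    """Eq 5: sum over edges ij of [d(i) d(j)]^(-1/2)."""
    valency = Counter()
    for i, j in edges:
        valency[i] += 1
        valency[j] += 1
    return sum(1.0 / sqrt(valency[i] * valency[j]) for i, j in edges)


# 2,3-Dimethylpentane: chain 1-5, methyl carbons 6 and 7 on vertices 2 and 3.
DIMETHYLPENTANE = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
# 4-Ethyl-2-methylheptane: chain 1-7, methyl 8 on vertex 2, ethyl 9-10 on vertex 4.
ETHYLMETHYLHEPTANE = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7),
                      (2, 8), (4, 9), (9, 10)]

Z = hosoya_index(DIMETHYLPENTANE)
CHI = randic_index(ETHYLMETHYLHEPTANE)
print(Z)              # 17, as in Table 5
print(round(CHI, 3))  # close to the 4.7018 of Table 7
```

The Randić value differs from the tabulated 4.7018 only in the fourth decimal place, because Table 6 lists the edge weights rounded to four digits.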

where M is the number of edges in G, μ is the cyclomatic number of G, and (D)_i is the distance sum where i = 1, 2, ..., N. The cyclomatic number μ = μ(G) of a polycyclic graph G is equal to the minimum number of edges that must be removed from G to transform it to the related acyclic graph. For trees, μ = 0; for monocycles, μ = 1. The distance sum (D)_i for a vertex i of G represents a sum of all entries in the corresponding row of the distance matrix.

$$ (D)_i = \sum_{j=1}^{N} (D)_{ij} \qquad (8) $$

Clearly the Wiener number can also be expressed in terms of the distance sums.

$$ W = \frac{1}{2} \sum_{i=1}^{N} (D)_i \qquad (9) $$

Table 8 gives an example of computing the Balaban index.

Schultz Index
The Schultz index, MTI = MTI(G) of G, was introduced by Schultz in 1989 as the molecular topological index (37). This index is defined below.

$$ MTI = \sum_{i=1}^{N} e_i \qquad (10) $$

where the e_i's (i = 1, 2, ..., N) represent the elements of the following row matrix of order N.

$$ v[A + D] = [e_1 \; e_2 \; \dots \; e_N] \qquad (11) $$

where v is the valency row matrix, A is the adjacency matrix, and D is the distance matrix. Table 9 gives an example of computing the Schultz index.

Table 9. The Computation of the Schultz Index for a Tree T Depicting the Carbon Skeleton of 2-Methylpentane
(a) A labeled tree T [diagram]
(b) The adjacency matrix of T [not legible in this copy]
(c) The distance matrix of T [not legible in this copy]
(d) The adjacency-plus-distance matrix of T [not legible in this copy]
(e) The valence row matrix of T: v(T) = [1 3 2 2 1 1]
(f) The v[A + D] row matrix: v[A + D](T) = [22 15 16 18 25 22]
(g) The Schultz index of T: MTI(T) = 22 + 15 + 16 + 18 + 25 + 22 = 118

Harary Number
The Harary number, H = H(G) of G, was introduced by Plavšić et al. (38) in 1991 in honor of Professor Frank Harary on his 70th birthday. He greatly influenced the development of graph theory and chemical graph theory. This index is defined below.

$$ H = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (D^{-2})_{ij} \qquad (12) $$

Table 10. The Computation of the Harary Number for a Tree T Depicting the Carbon Skeleton of 2,3-Dimethylhexane
(a) A labeled tree T [diagram]
(b) The distance matrix of T [not legible in this copy]
(c) The D^-1 matrix of T [not legible in this copy]
(d) The D^-2 matrix of T [not legible in this copy]
(e) The Harary number of T: H(T) = 1/2 (14 · 1 + 16 · 0.25 + 14 · 0.11 + 8 · 0.06 + 4 · 0.04) = 10.10
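The three distance-based indices of eqs 7-12 can be sketched together, since they share the distance matrix. The sketch below assumes connected graphs given as edge lists with vertices numbered from 1; the numberings of the three skeletons, the helper names, and the use of μ = M − N + 1 for the cyclomatic number of a connected graph are illustrative assumptions rather than the article's own procedure.

```python
from collections import deque
from math import sqrt


def distances(n, edges):
    """Distance matrix of a connected graph, by breadth-first search."""
    neighbors = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    d = [[0] * n for _ in range(n)]
    for start in range(1, n + 1):
        seen = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, length in seen.items():
            d[start - 1][v - 1] = length
    return d


def balaban_index(n, edges):
    """Eq 7, with distance sums (D)i from eq 8."""
    d = distances(n, edges)
    dist_sum = [sum(row) for row in d]
    m = len(edges)
    mu = m - n + 1  # cyclomatic number of a connected graph (0 for trees)
    return m / (mu + 1) * sum(
        1.0 / sqrt(dist_sum[i - 1] * dist_sum[j - 1]) for i, j in edges)


def schultz_index(n, edges):
    """Eqs 10 and 11: sum of the elements of the row matrix v[A + D]."""
    d = distances(n, edges)
    v = [0] * n
    for i, j in edges:
        v[i - 1] += 1
        v[j - 1] += 1
    # In a simple graph, two vertices are adjacent exactly when their distance is 1.
    a_plus_d = [[d[r][c] + (1 if d[r][c] == 1 else 0) for c in range(n)]
                for r in range(n)]
    row = [sum(v[r] * a_plus_d[r][c] for r in range(n)) for c in range(n)]
    return sum(row)


def harary_number(n, edges):
    """Eq 12: half-sum of the squared reciprocal distances."""
    d = distances(n, edges)
    return sum(1.0 / d[r][c] ** 2
               for r in range(n) for c in range(n) if r != c) / 2


J = balaban_index(7, [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)])  # 2,3-dimethylpentane
MTI = schultz_index(6, [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)])        # 2-methylpentane
H = harary_number(8, [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                      (2, 7), (3, 8)])                                   # 2,3-dimethylhexane
print(round(J, 4), MTI, round(H, 2))
```

For the 2-methylpentane skeleton the sketch reproduces MTI = 118 from Table 9. The Harary number of the 2,3-dimethylhexane skeleton comes out near 10.1; small differences from the tabulated value arise because Table 10 works with reciprocal squares rounded to two decimals.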

where D^-2 is the matrix whose elements are the squares of the reciprocal distances in G. The D^-2 matrix may be considered as the distance matrix of a class of specially weighted graphs in which weights between vertices in G mimic the Coulomb law between the sites in the corresponding structure. Table 10 gives an example of computing the Harary number. Table 11 gives the Wiener and Harary numbers, and the Hosoya, Randić, Balaban, and Schultz indices for alkanes with up to 10 carbon atoms.

Table 11. The Wiener Numbers (W), Hosoya Indices (Z), Randić Indices (χ), Balaban Indices (J), Schultz Indices (MTI), Harary Numbers (H), and Boiling Points (bp in °C) of Alkanes with Up to 10 Carbon Atoms
Columns: Alkane, W, Z, χ, J, MTI, H, bp. The rows run from methane through the decanes.
[tabulated values not legible in this copy]

Designing QSPR Models
There are several ways to design QSPR models (39-44). Here we outline one possible strategy. Figure 3 contains a flow diagram of the steps involved in the design of a QSPR model.

Step 1. Get a reliable source of experimental data for a given set of molecules. This initial set of molecules is sometimes called the training set (45). The data in this set must be reliable and accurate. The quality of the selected data is important because it will affect all the following steps.

Step 2. The topological index is selected and computed. This is also an important step because selecting the appropriate topological index (or indices) can facilitate finding the most accurate model.

Step 3. The two sets of numbers are then statistically analyzed using a suitable algebraic expression. The QSPR model is thus a regression model, and one must be careful about its statistical stability. Chance factors could yield spuriously accurate correlations (46-48). Therefore, Step 3 is a central step in the design of the structure-property models. The quality of the QSPR models can be conveniently measured by the correlation coefficient r and the standard deviation s. A good QSPR model must have r > 0.99, while s depends on the property. For example, for boiling points, s < 5 °C.

Step 4. Predictions are made for the values of the molecular property for species that are not part of the training set via the obtained initial QSPR model. The unknown molecules are structurally related to the initial set of compounds.

Step 5. The predictions are tested with unknown molecules by experimental determination of the predicted properties. This step is rather involved because it requires acquiring or preparing the test molecules.

Step 6. If the tests support the predictions, one presents the QSPR model in its final form with all necessary statistical characteristics. If the tests do not support the initial QSPR model, it must be revised and the procedure repeated. This is an iterative approach.

The QSPR model thus established, even for a narrow class of compounds, is a very useful tool for predicting the properties of hypothetical compounds and for the search for new compounds with programmed properties (12).

An Instructive Example
We will apply the procedure from the preceding section to give an instructive example of the design of the QSPR model for predicting the boiling points of alkanes.

Step 1
The boiling points (°C) of the alkanes are taken from the CRC Handbook of Chemistry and Physics (49) and Beilstein (50). As the initial set we will consider alkanes with up to 8 carbon atoms (40 molecules).

Step 2
We will consider at this stage all six topological indices discussed in this report.

Step 3
The following structure-property models are the most successful for each index considered:
[Regression equations beginning with eq 13, one for each index, among them eq 13 (MTI), eq 14 (ln Z), and eq 15 (χ); the coefficients and statistics are not legible in this copy.]
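The statistical part of Step 3 can be sketched with an ordinary least-squares fit that reports the correlation coefficient r and the standard deviation s. Everything specific here is an assumption made for illustration: the functional form bp = a + b ln W, the restriction to the five unbranched alkanes butane through octane, and the approximate handbook boiling points. It is not the training set or any of the models used in the article.

```python
from math import log, sqrt

# Wiener numbers of the unbranched alkanes C4-C8 and approximate boiling
# points in deg C (illustrative data only).
WIENER = [10, 20, 35, 56, 84]
BOILING_POINTS = [-0.5, 36.1, 68.7, 98.4, 125.7]


def linear_fit(x, y):
    """Least-squares line y = a + b x; returns (a, b, r, s)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(residual_ss / (n - 2))  # standard deviation of the fit
    return a, b, r, s


descriptor = [log(w) for w in WIENER]
a, b, r, s = linear_fit(descriptor, BOILING_POINTS)
print(f"bp = {a:.2f} + {b:.2f} ln W   r = {r:.4f}   s = {s:.2f}")

# Quality criteria quoted in the text for boiling-point models.
print(r > 0.99 and s < 5.0)
```

With only five points from a single homologous series the fit is easy to make look good, which is the kind of chance correlation the text warns about; a model of this sort would still have to pass Steps 4 and 5.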

The most accurate models are those based on ln Z (eq 14) and χ (eq 15). Methane was not considered. Plots of bp vs ln Z, bp vs χ, and bp vs Nχ and the accompanying statistical data are given, respectively, in Figures 4-6. The models based on ln Z and χ will be used in the next step.

Figure 3. A flow diagram of the steps involved in the design of a QSPR model. 1: Source of experimental data. 2: Selection of the topological index. 3: Statistical work: setting up the QSPR model. 4: Predictions. 5: Testing the predictions. YES: Tests confirmed the initial model. The model appears to be satisfactory for further work. NO: Tests rejected the initial model as not satisfactory. The model must be revised and the procedure repeated until the satisfactory model is obtained. 6: The final form of the QSPR model.

Step 4
We use eqs 14 and 15 to predict the boiling points of nonanes (35 molecules) (see Table 12).

Table 12. The Predicted Values of Boiling Points (°C) of Nonanes
Columns: Nonane, predicted boiling point (eq 14), predicted boiling point (eq 15).
[tabulated values not legible in this copy]

Step 5
We compare the predicted and experimental values of the nonane boiling points (see Table 13). Both models have problems with some members of the nonane series.

Step 6
The procedure may be repeated, and we will eventually arrive at the best possible QSPR model for predicting the boiling points of alkanes. However, when Step 3 is repeated using the boiling points of all alkanes with up to 9 carbon atoms, the QSPR models based on ln Z and χ did not improve. The slight improvement happened only when a biparametric model (with χ and N, where N is the number of carbon atoms in the alkane) was used. This model is given by eq 19.
[eq 19 not legible in this copy]
(All alkanes with up to 9 carbon atoms have been considered but methane.)
All three models expressed as eqs 14, 15, and 19 may serve as reliable models for predicting the alkane boiling points.

The boiling points of alkanes have been predicted many times (8, 13, 15, 40, 51). Although most of the QSPR models produced are very accurate (r > 0.998, s < 2 °C), they suffer from several shortcomings.
i. Models were built for a limited set of alkanes, usually for C4-C7 families.
ii. In some cases other lighter alkanes, such as ethane and propane, were also eliminated from the study.
iii. The complexity of some of the accurate QSPR models in the literature is forbidding. For example, one of the most accurate QSPR models for predicting boiling points of alkanes is the following (40).
[eq 20 not legible in this copy]

Table 13. Comparison between Predicted (Two Models) and Experimental Values of Boiling Points (°C) of Nonanes
Columns: Nonane, (bp)exp, Model (14), Model (15).
[tabulated values not legible in this copy]

Figure 4. A plot of bp vs ln Z for the first 40 alkanes.

Figure 5. A plot of bp vs χ for the first 40 alkanes.

Figure 6. A plot of bp vs Nχ for the first 75 alkanes.

The connectivity indices in eq 20 are defined as follows. The zero-connectivity index 0χ is

$$ {}^{0}\chi = n_1 + n_2\,(2)^{-1/2} + n_3\,(3)^{-1/2} + n_4\,(4)^{-1/2} $$

where n1, n2, n3, and n4 are the numbers of vertices with valencies 1, 2, 3, and 4, respectively. Connectivity indices mχt of order m and type t can be obtained by summing analogous terms over subgraphs involving paths (t = p), clusters (t = c), or path-cluster (t = pc) combinations of m edges. The extended connectivity index is

$$ {}^{m}\chi_t = \sum [d(i)\, d(j) \cdots d(m+1)]^{-0.5} \qquad (21) $$

where m represents the order of possible fragments. When m = 1, fragments are edges which lead to the first-order connectivity index 1χ. Examples of a path, a cluster, and a path-cluster are given in Figure 7.

Figure 7. Examples of a path (3rd order), a cluster (3rd order), and a path-cluster (4th order) for a tree T corresponding to 3-methylpentane.

Multivariate regression models appear to be very accurate due to a variety of parameters involved in the correlation. Each of these parameters takes care of a certain structural detail of a large alkane. When all diverse structural features of alkanes are considered, the model usually gives extremely good agreement between the experimental and calculated boiling points.
To conclude this section we stress that there is no simple QSPR model for predicting boiling points over a wide range of alkanes. However, if we limit ourselves to a simple family of alkanes (especially with less than 10 carbon atoms), then simple accurate models are possible (34).

Conclusions
In this report we presented a strategy for designing the quantitative structure-property relationships based on topological indices. Six selected topological indices were tested. The instructive example was directed to the design of the structure-property model for predicting the boiling points of alkanes. The accuracy of the model was judged according to the correlation coefficient and the standard error. The upper limits for the accurate models were set at r > 0.995 and s < 5 °C. The most accurate QSPR models for alkane boiling points are based on ln Z, χ, and Nχ. We conclude that there is no simple single-parameter QSPR model for predicting the boiling points over a wide range of alkanes due to the great diversity among experimental values.

Acknowledgment
We are thankful to the Ministry of Science, Technology, and Informatics of the Republic of Croatia for support.

Literature Cited
[entries 1-51 not legible in this copy]