Highlights
A Chemical Space (CS) is a set of molecules related to one another by binary relations based on
some similarity measure.
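As a minimal sketch of such a binary relation, the snippet below links two molecules whenever their Tanimoto similarity reaches a threshold. The Tanimoto coefficient is one common similarity measure, chosen here as an assumption; the fingerprints (sets of "on" bit positions), the molecule names and the threshold are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    # Two empty fingerprints are treated as identical (a convention, not a rule).
    return len(a & b) / len(union) if union else 1.0


# Hypothetical fingerprints: each set holds the positions of the bits set to 1.
fingerprints = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 7, 12},
    "mol_C": {2, 3, 20},
}


def similar_pairs(fps: dict, threshold: float) -> list:
    """Return the pairs of molecules related by 'similarity >= threshold'."""
    names = sorted(fps)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if tanimoto(fps[p], fps[q]) >= threshold
    ]


print(similar_pairs(fingerprints, 0.5))
```

With these toy fingerprints only mol_A and mol_B share enough bits (3 of 5) to be related at a threshold of 0.5.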
There are three representations of CSs based on the position of molecules in space: (I) coordinate-based,
(II) cell-based and (III) graph-based.
Concepts like similarity cliffs have been introduced in order to describe relationships between
small changes in biological activity and large changes in similarity.
“Subset selection usually takes place in early screening while similarity searching or ligand-based
virtual screening is typically used in subsequent follow-on screening activities.”
The most common strategy used to generate initial screening sets is to maximize their diversity
by minimizing the similarity of the compounds in the set. Maximum dissimilarity can be evaluated
with an algorithm named Dfragall.
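The idea of picking a diverse set by minimizing similarity can be sketched with a greedy MaxMin selection. This is not Dfragall, whose details are not given here; MaxMin is a different, widely used dissimilarity-based heuristic swapped in for illustration. The fingerprints, the compound names and the choice of seed compound are hypothetical, and distance is assumed to be 1 minus the Tanimoto similarity.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def maxmin_select(fps: dict, k: int, seed: str) -> list:
    """Greedily pick k compounds, each as far as possible from those already picked."""
    selected = [seed]
    # min_dist[n] = distance from compound n to its nearest selected compound.
    min_dist = {n: 1.0 - tanimoto(fps[n], fps[seed]) for n in fps if n != seed}
    while len(selected) < k and min_dist:
        # Choose the candidate whose nearest selected neighbour is farthest away;
        # iterating in sorted order makes ties deterministic.
        nxt = max(sorted(min_dist), key=min_dist.get)
        selected.append(nxt)
        del min_dist[nxt]
        # Update nearest-neighbour distances against the new pick.
        for n in min_dist:
            d = 1.0 - tanimoto(fps[n], fps[nxt])
            if d < min_dist[n]:
                min_dist[n] = d
    return selected


# Hypothetical library: two pairs of near-duplicates.
library = {
    "a": {1, 2, 3},
    "b": {1, 2, 4},
    "c": {7, 8, 9},
    "d": {7, 8, 10},
}

print(maxmin_select(library, 2, seed="a"))
```

Starting from "a", the second pick comes from the other cluster, because "b" is too similar to the seed.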
Cell-based sampling schemes such as simple, threshold-based, proportional and property-based sampling
are used to obtain a subset of the desired size and diversity.
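A cell-based scheme can be sketched as follows: descriptor space is cut into a grid, each compound falls into one cell, and a fixed number of compounds is drawn from every occupied cell. Taking "a fixed number per cell" as the simple scheme is an assumption, as are the two descriptors (molecular weight and logP), the bin widths and the compound names.

```python
import random
from collections import defaultdict


def assign_cells(descriptors: dict, bin_widths: tuple) -> dict:
    """Map each compound to a grid cell by binning every descriptor value."""
    cells = defaultdict(list)
    for name, values in descriptors.items():
        cell = tuple(int(v // w) for v, w in zip(values, bin_widths))
        cells[cell].append(name)
    return cells


def sample_per_cell(cells: dict, n_per_cell: int, seed: int = 0) -> list:
    """Draw up to n_per_cell compounds at random from every occupied cell."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for cell in sorted(cells):
        members = sorted(cells[cell])
        subset.extend(rng.sample(members, min(n_per_cell, len(members))))
    return subset


# Hypothetical descriptors: (molecular weight, logP).
compounds = {
    "m1": (180.0, 1.2),
    "m2": (195.0, 1.8),
    "m3": (310.0, 3.1),
    "m4": (320.0, 3.4),
    "m5": (450.0, 0.5),
}

cells = assign_cells(compounds, bin_widths=(100.0, 2.0))
subset = sample_per_cell(cells, n_per_cell=1)
print(len(cells), subset)
```

Subset size is controlled through the bin widths and the number drawn per cell. A proportional variant would instead draw from each cell in proportion to its occupancy.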
Compound acquisition is important for enhancing the diversity and maintaining the integrity of an
existing collection: it supports assays and the measurement of biological activity, and it replaces
compounds that decompose over time.