Professional Documents
Culture Documents
The rst design issue is the assignment of data to the processors. The computation time at a node
is proportional to the number of records at that node. So, the best way to achieve load balance at
the root node is to fragment each attribute list horizontally into p fragments of equal sizes and to
assign each fragment to a different processor. We assume that the initial assignment of data to the
processors remains unchanged throughout the process of classication. With this approach, the
splitting decision may cause the records of a node to be distributed unevenly among processors.
If the computations for all the records of all the nodes at a level are performed before doing any
synchronizing communication between the processors, then the per-node imbalance would not be
a concern. This approach of doing per-level communications as against per-node
communications should also perform better on the parallel machines with high communication
latencies, especially because the number of nodes will be large at the levels much deeper in tree.