
Network Surgery Scheduler

Results and Proposal for Algorithm 2


Summary
➢What have we done so far?

• Instead of training the models for a fixed number of epochs, we train them until the
average accuracy increase rate falls below a threshold (a minimal sketch of this rule
follows this list).

• We performed a series of experiments on the datasets 'fashion_mnist' and 'cifar10' with
different hyperparameter settings.

• The hyperparameters varied in the test pipeline were the 'accuracy increase rate' and the
'size of the initial model'.
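
The stopping rule in the first bullet can be sketched roughly as follows; the function name, window size, and threshold value are illustrative assumptions, not the actual pipeline code.

```python
# Minimal sketch of the stopping rule: stop once the average accuracy
# increase rate over the last `window` epochs drops below `threshold`.
# (Assumed names and default values; not the actual pipeline code.)

def should_stop(accuracies, window=10, threshold=0.01):
    if len(accuracies) < window + 1:
        return False  # not enough history to judge the trend yet
    recent = accuracies[-(window + 1):]
    # Per-epoch accuracy increases across the window.
    increases = [b - a for a, b in zip(recent, recent[1:])]
    return sum(increases) / window < threshold
```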
➢ Fashion MNIST Results and Conclusion
• Best Case
• Average Case
• Worst Case
➢ Cifar10 Results and Conclusion
• Best Case
• Average Case
• Worst Case
Comparison with previous version

Total number of epochs spent on training the candidates

                  Medium (model size)           Large (model size)
Dataset           Min epochs    Max epochs      Min epochs    Max epochs
fashion_mnist     96 (690)      126 (690)       (700)         (700)
cifar10           (730)         (730)           47 (510)      47 (510)

Values in brackets indicate the epochs spent when the previous version of Surgeon was tested with the same hyperparameter configuration.
Conclusion
• Training time was significantly reduced compared to the earlier version of Surgeon:
in a comparable test environment, it dropped by an average of 50%.

• The reduced training time did not hurt the validation accuracy of the optimized model,
even though it was trained for fewer epochs.

• The total number of epochs spent over the Surgeon's entire run dropped
significantly.

• The initial model size had little influence on the amount of training the
candidates required.

• The Surgeon performs best when the accuracy increase rate lies between 0.01 and 0.03.
Proposed Changes for Scheduling Training
➢ Earlier

• The initial model is trained for 10 epochs.

• The child candidates are trained for 10 epochs (the epoch step).

• After we have reached the maximum number of tries per epoch, if no candidate has
reached at least the same accuracy as the LastBestScore, we keep the previous best
candidate (a sketch of this selection rule follows).
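
A minimal sketch of that selection rule, assuming each candidate carries a precomputed validation score (the `score` attribute and the argument names are hypothetical):

```python
# Sketch of the earlier selection rule; `last_best_score` mirrors the
# slide's LastBestScore, and `c.score` is an assumed attribute.

def select_candidate(candidates, last_best_score, previous_best):
    """Keep the previous best candidate if no new candidate reaches
    at least the LastBestScore."""
    best = max(candidates, key=lambda c: c.score)
    return best if best.score >= last_best_score else previous_best
```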
➢ Algorithm 1
• The initial model is trained until the maximum number of epochs is reached or until the
average accuracy increase rate over the last 10 epochs (this window is a hyperparameter)
falls below the threshold X1 (the accuracy-increase-rate threshold for the initial model).

• Child candidates are trained until the maximum number of epochs is reached or until the
average accuracy increase rate over the last 10 epochs (this window is a hyperparameter)
falls below the threshold X2 (the accuracy-increase-rate threshold for child candidates).

• After each try (in an epoch step), we select all candidates in new branches whose accuracy is
better than that of the worst candidate in the current branches and assign them to the current
branches. If no such candidate exists, we stop the whole algorithm and return the best version so
far (see the sketch after this list).
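
A condensed sketch of Algorithm 1's schedule, reusing `should_stop` from the earlier sketch; the training and evaluation callables are hypothetical stand-ins, and X1 or X2 is passed in as `threshold`:

```python
# Sketch of Algorithm 1. `train_one_epoch` and `evaluate` are assumed
# callables supplied by the pipeline; X1 (initial model) or X2 (child
# candidates) is passed as `threshold`.

def train_until_plateau(model, data, max_epochs, threshold,
                        train_one_epoch, evaluate, window=10):
    accuracies = []
    for _ in range(max_epochs):
        train_one_epoch(model, data)
        accuracies.append(evaluate(model, data))
        if should_stop(accuracies, window, threshold):
            break  # the accuracy increase rate has flattened out
    return accuracies[-1]

def promote(new_branch, current_branch):
    """Move new candidates that beat the worst current candidate into
    the current branches; return None to signal 'stop, return best'."""
    worst = min(c.score for c in current_branch)
    better = [c for c in new_branch if c.score > worst]
    return current_branch + better if better else None
```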
➢ Now
• We set a maximum number of epochs allowed for the Surgeon as a whole.

• The initial model is trained in the same way as in Algorithm 1.

• The child candidates are trained in proportion to their parameter change relative to the
parent: e.g., if a child candidate has 10% more parameters than its parent and the parent was
trained for 10 epochs, the child is trained for 1 epoch (since we reuse the weights of the
parent model).

• After each try (in an epoch step), we select all candidates in new branches whose accuracy is
better than that of the worst candidate in the current branches and assign them to the current
branches. If no such candidate exists, we stop and return the best version so far.

• We keep returning optimized models to the Surgeon until the computational budget is
exhausted, i.e., while the Surgeon's epoch counter remains below the maximum number of
epochs allowed (a sketch of the child-epoch rule follows this list).
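
The child-epoch rule and the budget check might look like the sketch below; the rounding choice (ceiling, with a floor of one epoch) is an assumption that reproduces the slide's 10%-of-10-epochs example.

```python
import math

def child_epochs(parent_epochs, parent_params, child_params):
    """Epochs for a child candidate, proportional to its parameter
    change relative to the parent (assumed: ceil, at least 1 epoch)."""
    rel_change = abs(child_params - parent_params) / parent_params
    return max(1, math.ceil(parent_epochs * rel_change))

# Slide example: the child has 10% more parameters and the parent was
# trained for 10 epochs, so the child is trained for a single epoch.
print(child_epochs(10, 1_000_000, 1_100_000))  # -> 1

# Budget rule from the last bullet (illustrative): the outer loop keeps
# producing optimized models only while the Surgeon's epoch counter is
# below the maximum number of epochs allowed, e.g.
#   while epoch_counter < max_surgeon_epochs:
#       ... generate, train (child_epochs), and select candidates ...
```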
