Adversarial inputs = almost indistinguishable from natural data yet classified incorrectly by the network.
Research at the time suggested that adversarial attacks may be an inherent weakness of deep networks.
Problem addressed through robust optimisation.
To get to fully resistant deep learning models, one of the main steps is robustness against well-defined classes of (strong) adversaries.
So the question is:
How can we train deep neural networks that are robust to adversarial inputs?
Benign input = a natural, valid input that the model is supposed to classify correctly.
So we can frame the problem using a natural saddle point (min-max) formulation, which has two benefits:
<aside> ❗ So we find that adversarial training $\propto$ optimising the saddle point problem.
</aside>
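A sketch of the formulation (using the standard robust-optimisation notation: $\theta$ are the model parameters, $\mathcal{D}$ the data distribution, $L$ the loss, and $\mathcal{S}$ the set of allowed perturbations, e.g. an $\ell_\infty$-ball of radius $\epsilon$):

$$
\min_{\theta} \rho(\theta), \quad \text{where} \quad \rho(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{\delta \in \mathcal{S}} L(\theta, x + \delta, y) \Big]
$$

The inner maximisation is the attack (find the worst-case perturbation $\delta$), and the outer minimisation is the defence (find parameters $\theta$ that do well even on those worst-case inputs).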
Explores the optimisation of the saddle point problem, which can be solved using first-order methods.
<aside> 💡 First-order method = any optimisation method that uses only information from the first derivative (gradient), e.g. gradient descent, SGD, etc.
</aside>
Using these insights, it motivates PGD (Projected Gradient Descent) as a universal first-order adversary = the strongest attack that uses only first-order information about the network.
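A minimal sketch of an $\ell_\infty$ PGD adversary in PyTorch (assuming an image classifier with inputs scaled to $[0, 1]$; the function name and the `eps`, `alpha`, `steps` defaults are illustrative, not the paper's exact settings):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """PGD on an l-infinity ball of radius eps around the clean input x."""
    # Start from a random point inside the eps-ball (random starts help find stronger maxima)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]

        # Ascend the loss using only first-order (gradient sign) information
        x_adv = x_adv.detach() + alpha * grad.sign()

        # Project back onto the eps-ball around x and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    return x_adv.detach()
```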
Model capacity plays an important role in withstanding strong adversarial attacks.
Trained on the MNIST and CIFAR10 datasets using PGD as a reliable first-order adversary to yield excellent results.
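For context, a hedged sketch of how one adversarial training step might put the saddle point together, reusing the `pgd_attack` sketch above (names and hyperparameters are illustrative):

```python
def adversarial_training_step(model, optimizer, x, y, eps=0.3, alpha=0.01, steps=40):
    """One saddle point step: inner maximisation via PGD, outer minimisation via the optimizer."""
    # Inner maximisation: craft worst-case inputs for the current parameters
    x_adv = pgd_attack(model, x, y, eps=eps, alpha=alpha, steps=steps)

    # Outer minimisation: update parameters on the adversarial batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```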