A) It prevents overfitting by regularizing the model
B) It provides a consistent learning rate throughout training
D) It's particularly useful for non-convex optimization problems