As one of the practical paradigms that preserve data privacy when training a shared machine learning model in a decentralized fashion, federated learning has been studied extensively in the past five years. However, a substantial body of existing work has questioned its core claim of preserving data privacy, and has proposed gradient leakage attacks that reconstruct the raw data used for training. In this day and age of fine-tuning large language models, whether data privacy can be preserved is a critical question.
In this talk, I will show that, despite the conventional wisdom that federated learning leaks private data, data privacy may in fact be quite well protected. Claims in the existing literature on gradient leakage attacks did not hold in our experiments, for both image classification and natural language tasks. Our extensive array of experiments was based on Plato, an open-source framework that I developed from scratch for reproducible benchmarking in federated learning.