Benchmark Platform
To integrate all the algorithms, datasets, and scenarios, we standardize the experimental settings and build a unified benchmark platform for a fair comparison of these algorithms. Here, we present the benchmark results of 20 algorithms across two widely used label skew scenarios. These results are only one example; you can obtain different ones by adjusting the configurations in main.py in our PFLlib.
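For instance, a run producing one cell of the tables below might look like the following. This is a hypothetical sketch: the flag names (`-data`, `-m`, `-algo`, `-gr`, `-did`) follow a recent PFLlib version and may differ in yours; check `python main.py -h`.

```shell
# Hypothetical example: run FedAvg on Cifar100 for 2000 communication rounds
# on GPU 0. Flag names are assumptions based on PFLlib's CLI and may differ
# across versions; see `python main.py -h` for the authoritative list.
cd system
python main.py -data Cifar100 -m cnn -algo FedAvg -gr 2000 -did 0
```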
Leaderboard
Our methods (FedCP, GPFL, and FedDBE) lead the leaderboard. Notably, FedDBE maintains robust performance across varying degrees of data heterogeneity.
The test accuracy (%) on the CV and NLP tasks in the pathological (Pat.) and practical (Prac.) label skew settings.

Methods | FMNIST (Pat.) | Cifar100 (Pat.) | TINY (Pat.) | FMNIST (Prac.) | Cifar100 (Prac.) | TINY (Prac.) | TINY* (Prac.) | AG News (Prac.)
---|---|---|---|---|---|---|---|---
FedAvg | 80.41 ± 0.08 | 25.98 ± 0.13 | 14.20 ± 0.47 | 85.85 ± 0.19 | 31.89 ± 0.47 | 19.46 ± 0.20 | 19.45 ± 0.13 | 87.12 ± 0.19 |
FedProx | 78.08 ± 0.15 | 25.94 ± 0.16 | 13.85 ± 0.25 | 85.63 ± 0.57 | 31.99 ± 0.41 | 19.37 ± 0.22 | 19.27 ± 0.23 | 87.21 ± 0.13 |
FedGen | 79.76 ± 0.60 | 20.80 ± 1.00 | 13.82 ± 0.09 | 84.90 ± 0.31 | 30.96 ± 0.54 | 19.39 ± 0.18 | 18.53 ± 0.32 | 89.86 ± 0.83 |
Per-FedAvg | 99.18 ± 0.08 | 56.80 ± 0.26 | 28.06 ± 0.40 | 95.10 ± 0.10 | 44.28 ± 0.33 | 25.07 ± 0.07 | 21.81 ± 0.54 | 87.08 ± 0.26 |
pFedMe | 99.35 ± 0.14 | 58.20 ± 0.14 | 27.71 ± 0.40 | 97.25 ± 0.17 | 47.34 ± 0.46 | 26.93 ± 0.19 | 33.44 ± 0.33 | 87.08 ± 0.18 |
Ditto | 99.44 ± 0.06 | 67.23 ± 0.07 | 39.90 ± 0.42 | 97.47 ± 0.04 | 52.87 ± 0.64 | 32.15 ± 0.04 | 35.92 ± 0.43 | 91.89 ± 0.17 |
APFL | 99.41 ± 0.02 | 64.26 ± 0.13 | 36.47 ± 0.44 | 97.25 ± 0.08 | 46.74 ± 0.60 | 34.86 ± 0.43 | 35.81 ± 0.37 | 89.37 ± 0.86 |
FedFomo | 99.46 ± 0.01 | 62.49 ± 0.22 | 36.55 ± 0.50 | 97.21 ± 0.02 | 45.39 ± 0.45 | 26.33 ± 0.22 | 26.84 ± 0.11 | 91.20 ± 0.18 |
FedAMP | 99.42 ± 0.03 | 64.34 ± 0.37 | 36.12 ± 0.30 | 97.20 ± 0.06 | 47.69 ± 0.49 | 27.99 ± 0.11 | 29.11 ± 0.15 | 83.35 ± 0.05 |
APPLE | 99.30 ± 0.01 | 65.80 ± 0.08 | 36.22 ± 0.40 | 97.06 ± 0.07 | 53.22 ± 0.20 | 35.04 ± 0.47 | 39.93 ± 0.52 | 84.10 ± 0.18 |
FedALA | 99.57 ± 0.01 | 67.83 ± 0.06 | 40.31 ± 0.30 | 97.66 ± 0.02 | 55.92 ± 0.03 | 40.54 ± 0.02 | 41.94 ± 0.02 | 92.45 ± 0.10 |
FedPer | 99.47 ± 0.03 | 63.53 ± 0.21 | 39.80 ± 0.39 | 97.44 ± 0.06 | 49.63 ± 0.54 | 33.84 ± 0.34 | 38.45 ± 0.85 | 91.85 ± 0.24 |
FedRep | 99.56 ± 0.03 | 67.56 ± 0.31 | 40.85 ± 0.37 | 97.56 ± 0.04 | 52.39 ± 0.35 | 37.27 ± 0.20 | 39.95 ± 0.61 | 92.25 ± 0.20 |
FedRoD | 99.52 ± 0.05 | 62.30 ± 0.02 | 37.95 ± 0.22 | 97.52 ± 0.04 | 50.94 ± 0.11 | 36.43 ± 0.05 | 37.99 ± 0.26 | 92.16 ± 0.12 |
FedBABU | 99.41 ± 0.05 | 66.85 ± 0.07 | 40.72 ± 0.64 | 97.46 ± 0.07 | 55.02 ± 0.33 | 36.82 ± 0.45 | 34.50 ± 0.62 | 95.86 ± 0.41 |
FedCP | 99.66 ± 0.04 | 71.80 ± 0.16 | 44.52 ± 0.22 | 97.89 ± 0.05 | 59.56 ± 0.08 | 43.49 ± 0.04 | 44.18 ± 0.21 | 92.89 ± 0.10 |
GPFL | 99.85 ± 0.08 | 71.78 ± 0.26 | 44.58 ± 0.06 | 97.81 ± 0.09 | 61.86 ± 0.31 | 43.37 ± 0.53 | 43.70 ± 0.44 | 97.97 ± 0.14 |
FedDBE | 99.74 ± 0.04 | 73.38 ± 0.18 | 42.89 ± 0.29 | 97.69 ± 0.05 | 64.39 ± 0.27 | 43.32 ± 0.37 | 42.98 ± 0.52 | 96.87 ± 0.18 |
FedDistill | 99.51 ± 0.03 | 66.78 ± 0.15 | 37.21 ± 0.25 | 97.43 ± 0.04 | 49.93 ± 0.23 | 30.02 ± 0.09 | 29.88 ± 0.41 | 85.76 ± 0.09 |
FedProto | 99.49 ± 0.04 | 69.18 ± 0.03 | 36.78 ± 0.07 | 97.40 ± 0.02 | 52.70 ± 0.33 | 31.21 ± 0.16 | 26.38 ± 0.40 | 96.34 ± 0.58 |
The test accuracy (%) under different heterogeneous degrees in the practical label skew setting using the 4-layer CNN.

Methods | Cifar100 (β=0.01) | Cifar100 (β=0.1) | Cifar100 (β=0.5) | Cifar100 (β=5) | Tiny-ImageNet (β=0.01) | Tiny-ImageNet (β=0.1) | Tiny-ImageNet (β=0.5) | Tiny-ImageNet (β=5)
---|---|---|---|---|---|---|---|---
FedAvg | 23.58 ± 0.43 | 31.89 ± 0.47 | 27.99 ± 0.32 | 35.51 ± 0.33 | 15.70 ± 0.46 | 19.46 ± 0.20 | 21.14 ± 0.47 | 21.71 ± 0.27 |
FedProx | 23.74 ± 0.25 | 31.99 ± 0.41 | 35.05 ± 0.20 | 35.31 ± 0.43 | 15.66 ± 0.36 | 19.37 ± 0.22 | 21.22 ± 0.47 | 21.69 ± 0.25 |
FedGen | 20.89 ± 0.42 | 30.96 ± 0.54 | 33.88 ± 0.83 | 35.64 ± 0.36 | 15.87 ± 0.23 | 19.19 ± 0.18 | 21.06 ± 0.08 | 21.44 ± 0.51 |
Per-FedAvg | 49.25 ± 0.54 | 44.28 ± 0.33 | 35.32 ± 0.12 | 24.94 ± 0.64 | 39.39 ± 0.30 | 25.07 ± 0.07 | 16.36 ± 0.13 | 12.08 ± 0.27 |
pFedMe | 65.28 ± 0.83 | 47.34 ± 0.46 | 32.52 ± 1.57 | 14.70 ± 0.92 | 41.45 ± 0.14 | 26.93 ± 0.19 | 17.48 ± 0.61 | 4.03 ± 0.50 |
Ditto | 73.29 ± 0.49 | 52.87 ± 0.64 | 26.28 ± 0.17 | 35.72 ± 0.21 | 50.62 ± 0.02 | 32.15 ± 0.04 | 18.98 ± 0.05 | 21.79 ± 0.62 |
APFL | 72.63 ± 0.15 | 46.74 ± 0.60 | 25.69 ± 0.21 | 20.76 ± 9.03 | 49.96 ± 0.04 | 34.87 ± 0.43 | 23.31 ± 0.18 | 16.12 ± 0.10 |
FedFomo | 71.11 ± 0.08 | 45.39 ± 0.45 | 24.35 ± 0.55 | 29.77 ± 0.56 | 46.36 ± 0.54 | 26.33 ± 0.22 | 11.59 ± 0.11 | 14.86 ± 0.55 |
FedAMP | 72.78 ± 0.17 | 47.69 ± 0.49 | 25.94 ± 0.12 | 13.71 ± 0.12 | 48.42 ± 0.06 | 27.99 ± 0.11 | 12.48 ± 0.21 | 5.41 ± 0.14 |
APPLE | 71.11 ± 0.10 | 53.22 ± 0.20 | 41.81 ± 0.23 | 32.68 ± 0.28 | 48.04 ± 0.10 | 35.04 ± 0.47 | 24.28 ± 0.21 | 17.79 ± 0.47 |
FedALA | 73.82 ± 0.35 | 55.92 ± 0.03 | 44.17 ± 0.51 | 34.27 ± 0.70 | 55.75 ± 0.02 | 40.54 ± 0.02 | 29.04 ± 0.13 | 22.12 ± 0.22 |
FedPer | 71.09 ± 0.34 | 49.63 ± 0.54 | 29.17 ± 0.21 | 16.09 ± 0.19 | 51.83 ± 0.22 | 33.84 ± 0.34 | 17.31 ± 0.19 | 9.61 ± 0.06 |
FedRep | 74.91 ± 0.16 | 52.39 ± 0.35 | 29.74 ± 0.21 | 14.93 ± 0.12 | 55.43 ± 0.15 | 37.27 ± 0.20 | 16.74 ± 0.09 | 8.04 ± 0.05 |
FedRoD | 67.78 ± 0.55 | 50.94 ± 0.11 | 36.29 ± 0.07 | 25.63 ± 0.74 | 49.17 ± 0.06 | 36.43 ± 0.05 | 23.23 ± 0.11 | 16.71 ± 0.24 |
FedBABU | 73.30 ± 0.32 | 55.02 ± 0.33 | 39.35 ± 0.45 | 27.61 ± 0.51 | 53.97 ± 0.25 | 36.82 ± 0.45 | 23.08 ± 0.20 | 15.42 ± 0.30 |
FedCP | 77.76 ± 0.03 | 59.56 ± 0.08 | 41.76 ± 0.15 | 26.83 ± 0.03 | 56.31 ± 0.39 | 43.49 ± 0.04 | 28.57 ± 0.07 | 19.12 ± 0.13 |
GPFL | 76.27 ± 0.12 | 61.86 ± 0.31 | 43.73 ± 0.42 | 30.86 ± 0.28 | 57.05 ± 0.41 | 43.37 ± 0.53 | 26.85 ± 0.14 | 16.34 ± 0.39 |
FedDBE | 78.39 ± 0.08 | 64.39 ± 0.27 | 52.58 ± 0.17 | 41.12 ± 0.30 | 54.61 ± 0.09 | 43.32 ± 0.37 | 33.71 ± 0.15 | 26.76 ± 0.11 |
FedDistill | 74.63 ± 0.14 | 49.93 ± 0.23 | 29.32 ± 0.46 | 15.45 ± 0.44 | 50.49 ± 0.07 | 30.02 ± 0.09 | 14.34 ± 0.06 | 6.56 ± 0.06 |
FedProto | 77.19 ± 0.15 | 52.70 ± 0.33 | 32.57 ± 0.23 | 14.20 ± 3.92 | 50.65 ± 0.32 | 31.21 ± 0.16 | 16.69 ± 0.02 | 8.92 ± 0.19 |
The time cost (in minutes) on Tiny-ImageNet using ResNet-18 in the practical label skew setting.

Methods | Total time (min) | Iterations | Average time per iteration (min)
---|---|---|---
FedAvg | 365 | 230 | 1.59 |
FedProx | 325 | 163 | 1.99 |
FedGen | 259 | 50 | 5.17 |
Per-FedAvg | 121 | 34 | 3.56 |
pFedMe | 1157 | 113 | 10.24 |
Ditto | 318 | 27 | 11.78 |
APFL | 156 | 57 | 2.74 |
FedFomo | 193 | 71 | 2.72 |
FedAMP | 92 | 60 | 1.53 |
APPLE | 132 | 45 | 2.93 |
FedALA | 123 | 63 | 1.93 |
FedPer | 83 | 43 | 1.92 |
FedRep | 471 | 115 | 4.09 |
FedRoD | 87 | 50 | 1.74 |
FedBABU | 811 | 513 | 1.58 |
FedCP | 204 | 74 | 2.75 |
GPFL | 171 | 75 | 2.28 |
FedDBE | 171 | 107 | 1.60 |
FedDistill | 45 | 16 | 2.78 |
FedProto | 138 | 25 | 5.52 |
Experimental Setup
We set up the experiments following our pFL algorithm GPFL, as it provides comprehensive evaluations. Here are the details:
Datasets and Models
- For the CV tasks, we use three popular datasets:
  - Fashion-MNIST (FMNIST) (`generate_FashionMNIST.py`) | 4-layer CNN (model code)
  - Cifar100 (`generate_Cifar100.py`) | 4-layer CNN (model code)
  - Tiny-ImageNet (`generate_TinyImagenet.py`) | 4-layer CNN (model code) and ResNet-18 (model code)
- For the NLP task, we use one popular dataset:
  - AG News (`generate_AGNews.py`) | fastText (model code)
We denote the 4-layer CNN on Tiny-ImageNet as TINY and ResNet-18 on Tiny-ImageNet as TINY*.
Two Widely-Used Label Skew Scenarios
- Pathological label skew: Each client receives data from only 2/10/20 labels out of the 10/100/200 total categories on FMNIST, Cifar100, and Tiny-ImageNet, respectively. Clients hold disjoint data with varying numbers of samples.
- Practical label skew: Data is sampled from FMNIST, Cifar100, Tiny-ImageNet, and AG News using a Dirichlet distribution, denoted by \(Dir(\beta)\). Specifically, we sample \(q_{c, i} \sim Dir(\beta)\) (with \(\beta = 0.1\) by default for the CV tasks and \(\beta = 1\) for the NLP task) and allocate a \(q_{c, i}\) proportion of the samples with label \(c\) to client \(i\).
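As a rough illustration, the two partitioning schemes above can be sketched as follows. This is a minimal sketch on toy label arrays with a hypothetical round-robin class assignment; PFLlib's dataset generators (e.g. `generate_Cifar100.py`) implement the real pipelines with more options.

```python
import numpy as np

def pathological_partition(labels, num_clients, classes_per_client, seed=0):
    """Pathological label skew: each client sees only `classes_per_client`
    labels; samples of a shared label are split disjointly among its owners."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    # Round-robin label assignment, so every label has at least one owner
    # whenever num_clients * classes_per_client >= num_classes.
    client_classes = [
        {(i * classes_per_client + j) % num_classes for j in range(classes_per_client)}
        for i in range(num_clients)
    ]
    parts = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = rng.permutation(np.where(labels == c)[0])
        owners = [i for i in range(num_clients) if c in client_classes[i]]
        for owner, chunk in zip(owners, np.array_split(idx, len(owners))):
            parts[owner].extend(chunk.tolist())
    return parts

def dirichlet_partition(labels, num_clients, beta=0.1, seed=0):
    """Practical label skew: allocate a Dir(beta)-sampled proportion q_{c,i}
    of the samples with label c to client i."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    parts = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = rng.permutation(np.where(labels == c)[0])
        q = rng.dirichlet(np.full(num_clients, beta))  # proportions over clients
        cuts = (np.cumsum(q)[:-1] * len(idx)).astype(int)
        for i, chunk in enumerate(np.split(idx, cuts)):
            parts[i].extend(chunk.tolist())
    return parts
```

Smaller `beta` makes the Dirichlet proportions more extreme, which is why the tables above sweep \(\beta\) from 0.01 (highly heterogeneous) to 5 (close to uniform).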
Other Implementation Details
Following pFedMe and FedRoD, we use 20 clients with a client joining ratio of \(\rho = 1\), splitting each client's data into 75% for training and 25% for evaluation. We report the best accuracy of the global model for traditional FL and the best average accuracy across personalized models for pFL. The batch size is 10 and the number of local epochs is 1. We run each method for 2000 iterations over three trials and report the mean and standard deviation.
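The reporting rule (best performance per trial, then mean and standard deviation across trials) can be sketched as follows; `summarize` is a hypothetical helper for illustration, not part of PFLlib.

```python
import statistics

def summarize(trials):
    """Per trial, take the best test accuracy over all evaluation rounds,
    then report mean and sample standard deviation across trials."""
    best = [max(curve) for curve in trials]
    return statistics.mean(best), statistics.stdev(best)

# Toy example: three trials, each a per-round accuracy curve.
mean, std = summarize([[0.31, 0.42, 0.40],
                       [0.30, 0.44, 0.43],
                       [0.29, 0.41, 0.45]])
```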
References
If you're interested in further experimental results (e.g., accuracy) for the algorithms mentioned above, you can find them in our accepted FL papers, which also utilize this library:

@inproceedings{zhang2023fedala,
  title={FedALA: Adaptive Local Aggregation for Personalized Federated Learning},
  author={Zhang, Jianqing and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Guan, Haibing},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={37},
  number={9},
  pages={11237--11244},
  year={2023}
}

@inproceedings{Zhang2023fedcp,
  title={FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy},
  author={Zhang, Jianqing and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Guan, Haibing},
  booktitle={Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year={2023}
}

@inproceedings{zhang2023gpfl,
  title={GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning},
  author={Zhang, Jianqing and Hua, Yang and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Cao, Jian and Guan, Haibing},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={5041--5051},
  year={2023}
}

@inproceedings{zhang2023eliminating,
  title={Eliminating Domain Bias for Federated Learning in Representation Space},
  author={Zhang, Jianqing and Hua, Yang and Cao, Jian and Wang, Hao and Song, Tao and Xue, Zhengui and Ma, Ruhui and Guan, Haibing},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=nO5i1XdUS0}
}