LEAF: A Benchmark for Federated Settings

Sungchul Kim · April 3, 2022

Federated-Learning


Today I will introduce a library called LEAF. LEAF provides realistic benchmark datasets for federated settings. The link is below.

LEAF

Before describing the library, let us first look at the types of federated learning. Federated learning is broadly divided into horizontal and vertical.

Figure 1. Categorization of Federated Learning. a) Horizontal federated learning.

The data held by each client lies in the same feature space.
- Client A / Client B (MNIST/MNIST)

Figure 2. Categorization of Federated Learning. b) Vertical federated learning.

The data held by each client lies in different feature spaces.
- Client A / Client B (Amazon/IMDB)

Because the mobile device data held by each client differs, I think vertical federated learning is the somewhat more realistic setting.
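
The two categories above can be illustrated with a toy table. This is only a sketch under the common definition in which horizontal clients hold different samples with the same features, while vertical clients describe the same samples with different features. The function names and the data are made up for illustration.

```python
# Toy table: 4 samples (rows), each with 4 features (columns).
data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]


def horizontal_split(rows, n_first):
    """Clients share every feature column but hold different rows (samples)."""
    return rows[:n_first], rows[n_first:]


def vertical_split(rows, n_first):
    """Clients hold the same rows (samples) but different feature columns."""
    client_a = [row[:n_first] for row in rows]
    client_b = [row[n_first:] for row in rows]
    return client_a, client_b


h_a, h_b = horizontal_split(data, 2)
v_a, v_b = vertical_split(data, 2)

# Horizontal: fewer rows per client, full feature width.
print("horizontal:", len(h_a), "rows x", len(h_a[0]), "features")
# Vertical: all rows per client, partial feature width.
print("vertical:  ", len(v_a), "rows x", len(v_a[0]), "features")
```

In the horizontal case a single model architecture fits every client, which is why the MNIST/MNIST example works without any feature alignment.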

Coming back to the main topic, let me describe the datasets in the LEAF library. When you request a dataset, LEAF preprocesses it and converts it into a standardized format.
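
To make the standardized format concrete, here is a minimal sketch of reading per-client data. It assumes the JSON layout I understand the LEAF repository to use (`users`, `num_samples`, `user_data` with `x` and `y` per user); please verify the exact layout against the repository. The user ids, values, and the `load_clients` helper are hypothetical.

```python
import json

# Toy data in what is assumed to be LEAF's layout; ids and values are made up.
raw = {
    "users": ["writer_0", "writer_1"],
    "num_samples": [2, 1],
    "user_data": {
        "writer_0": {"x": [[0.0, 0.1], [0.2, 0.3]], "y": [3, 7]},
        "writer_1": {"x": [[0.9, 0.8]], "y": [1]},
    },
}


def load_clients(text):
    """Parse JSON text and return {user_id: (features, labels)}."""
    blob = json.loads(text)
    clients = {}
    for user, n in zip(blob["users"], blob["num_samples"]):
        record = blob["user_data"][user]
        # Check that the declared sample count matches the stored data.
        if len(record["x"]) != n or len(record["y"]) != n:
            raise ValueError(f"sample count mismatch for {user}")
        clients[user] = (record["x"], record["y"])
    return clients


# Round-trip through a JSON string to mimic reading a file from disk.
clients = load_clients(json.dumps(raw))
for user, (x, y) in clients.items():
    print(user, len(x), y)
```

Keeping each client's samples under its own key is what lets a simulation hand one client's data to one local training step.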

Figure 3. Statistics of datasets in LEAF

Data Description in LEAF

LEAF contains five datasets in total. Let us go through them one by one.

Federated Extended MNIST
- Built by partitioning the data in Extended MNIST
- https://paperswithcode.com/dataset/emnist
Sentiment140
- A dataset of tweets that are annotated automatically based on the emoticons they contain
- https://www.tensorflow.org/datasets/catalog/sentiment140
Shakespeare (The Complete Works of William Shakespeare)
- https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/shakespeare/load_data?hl=ko
CelebA
- Built by partitioning the data in the Large-scale CelebFaces Attributes Dataset
- https://www.tensorflow.org/datasets/catalog/celeb_a
Reddit
- Preprocessed comments posted on the social network in December 2017
- https://paperswithcode.com/dataset/reddit
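
Several of the datasets above are described as partitions of an existing dataset. The sketch below shows one way such a partition and its summary statistics could be computed, assuming every sample carries an identifier (for example a writer or a user) that serves as the client key. The `owner` field, the sample values, and both helper functions are hypothetical and are not taken from LEAF's code.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up samples; "owner" stands in for a writer, speaker, or user id.
samples = [
    {"owner": "writer_a", "x": "img0", "y": 3},
    {"owner": "writer_b", "x": "img1", "y": 5},
    {"owner": "writer_a", "x": "img2", "y": 1},
    {"owner": "writer_c", "x": "img3", "y": 9},
    {"owner": "writer_a", "x": "img4", "y": 3},
    {"owner": "writer_c", "x": "img5", "y": 0},
]


def partition_by_owner(items):
    """Group samples so that each owner becomes one federated client."""
    clients = defaultdict(list)
    for item in items:
        clients[item["owner"]].append((item["x"], item["y"]))
    return dict(clients)


def summarize(clients):
    """Compute the number of clients and the spread of samples per client."""
    counts = [len(v) for v in clients.values()]
    return {
        "num_clients": len(counts),
        "total_samples": sum(counts),
        "mean": mean(counts),
        "std": pstdev(counts),  # population standard deviation
    }


clients = partition_by_owner(samples)
stats = summarize(clients)
print(stats)
```

Because the partition follows a natural key, the number of samples differs from client to client, which is the kind of imbalance a statistics table per dataset would report.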
