작성자 : 장아연
0. Recap
Deep Graph Encoder

Deep Graph Encoders는 임의의 graph를 Deep Neural Network를 통과 시켜 embedding space로 사상 시키는 것
A General GNN Framework

(4) Graph augmentation
: what kind of graph and feature augumentation can we create to shape the structure of this neural network
(5) Learning Objective
: learn objectives and how to make training work
1. Graph Augmentation for GNNs
Why Graph Augment
가정 : Raw input Graph 와 computational Graph는 동일
- Feature : lack features
- Graph Structure :
 sparse -> inefficient passing
 dense -> too costly passing
 large -> not computational graph into GPU
 즉, input graph는 embedding을 위한 computation graph의 적합한 상태가 아님.
Idea : Raw input Graph 와 computational Graph는 동일 X
Graph Augmentation Approaches
Graph Feature Augmentation
- lack features : feature augmentation을 통해 feature 생성
Standard Approach
필요성 : adj. matrix만 가지고 있는 경우 Input Graph에서 node feature가 없는 경우 흔히 발생
1. Constant Node Feature
Assign constant values to nodes
basically all the node have same future value of 1

2. One-hot node feature
Assign unique IDs to nodes
IDs are converted to one-hot vectors
flag value 1 at ID of that single node

Constant Node Feature vs One-hot node feature

Cycle Count
필요성 : GNN 학습에 어려움을 겪는 특정 구조 존재
- 
GNN은 v1이 속한 cycle의 길이 학습 불가  
- 
v1가 어떤 graph에 속해 있는지 구별 불가능 
- 
모든 node의 차수가 2임 
- 
computational graph는 같은 binary tree임
  
  
 
- 
그 외 Node Degree, Clustering coeffiecient, PageRank, Centrality 등 이용  
Graph Structure Augmentation
- too sparse : virtual node / edges를 생성
- too dense : message passing 과정에서 neighbors sampling 
- too large : embedding 계산을 위해 subgraph sampling
Add Virtual Edge
개념 
- virtual edge를 이용해 2-hop 관계의 이웃 노드 연결
- adj.matrix 대신 A+A2 이용 
예시
- Author(A)와 Author(B) 연결
 : Author(A) -> paper -> Author(B)
  
Add Virtual node
개념
- virtual node를 이용해 graph의 모든 node를 연결
- 모든 node와 node는 distance 2를 가짐
 : node(A) ->virtual node-> node(B)
예시

결과 : virtual node/edge 추가해 sparse한 graph에서 message passing 향상
Node Neighborhood Sampling
기존 
- 모든 node는 message passing에 사용됨
  
개념 
- node neighborhood를 random sampling해 message passing 진행
예시 :
1. random하게 2개의 neighbor 선택해 sampling

2. 다음 layer에서 다른 neighbor 선택해 resampling해 embedding 진행

결과
- 모든 neighbor 사용한 경우와 유사한 embedding
- computational cost 경감
 (large graph scaling 해줌)
 
2. Prediction with GNNs
Prediction head : How to train GNN?

- hv(L) : node L from graph neural network의 final layer
- Prediction head: final model's output
- Label : where label come from?
- Loss function : define loss function, what to optimize
Different Prediction Head
서로 다른 prediction head에 따라 서로 다른 task가 요구됨

Node-level task
- node embedding 사용해 예측
- prediction head : { hv(L)∈Rd,∀v∈G }
 get d-dim node embedding from GNN computation
- k way prediction의 경우
 classification : classify among k categories
 regression : regress on k target
- y^v=Headnode(hv(L))=W(H)hv(L)
 output head of given node=matrix time final embedding of node
- W(H)∈Rk∗d : W(H) map node embedding from hv(L)∈Rd (embedding space) to  y^v∈Rk (prediction)
Edge-level task
- node embedding pair 사용해 예측
  
- k-way prediction의 경우 : y^uv=Headedge(hu(L),hv(L))
option for edge level prediction
1. Concatenation + Linear transformation

- y^uv=(hu(L))Thv(L)
- Linear : 2d-dimensional => k-dim embedding
2. Dot product
- y^uv=Linear(Concat(hu(L),hv(L)))
- only applied to 1 way prediction (binary prediction)
- for k-way prediction : multi-head attention과 유사
  
- trainable different matrix W : W(1),W(2) ... W(k) 이용
 => every class get to learn its own transformation
- prediction for every class인 y^uv(1) ... y^uv(k) 를 concat해 final prediction인 y^uv 구함
Graph-level task
- graph 속 모든 node embedding 이용해 예측
- y^G=Headgraph{ hv(L)∈Rd,∀v∈G }
 take individual node embedding for every node and aggregate them to find embedding in graph
  
- Headgraph(∗)과 GNN에서 AGG(∗) 유사
  
option for Graph level prediction

위 3개는 small graph에 적합함
large graph를 global pooling하여 information lose

- solution : Hierarchical Global Pooling
Hierarchical Global Pooling
- 모든 node embedding을 위계 따라 aggregate
개념 예시

DiffPool idea : Hierarchically pool node embedding

3. Training Graph Neural Network
Prediction & labels : ground truth come from

Supervised Label vs Unsupervised Signal
1. supervised learning 
- label from external source
 node label yv : citation network (node가 소속된 학문 분야)
 edge label yuv : transition network (edge의 fraudulent 여부)
 graph label yG : among molecular graph, drug likeness graph
- unsupervised learning
- only graph, no external label
- find supervision signal in graph
 node label yv : Node statistics : clustering coefficient , Pagerank
 edge label yuv : Link Prediction : hide edge between node and predict it
 graph label yG :Graph statistics : predict each graph are isomorphic
Loss function : compute final loss

Setting for GNN training
- N data point (node / edge / graph)
- y^(i) : prediction at all level for label y(i)
  
GNN은 Classification과 Regression 모두 사용 가능 그러나 각각 loss function과 evaluation이 다름
Classification Loss
- Classification : 예측값 y^(i)의 discrete value
- Node classification : node가 속한 카테고리 예측
- Cross entropy (CE)
  
Regression Loss
- Regression : 예측값 y^(i)의 continuous value
- mocular graph의 drug 유사도 예측
- Mean Squared Error (MSE) : L2 loss
  
Evaluation metric : measure success of GNN

Evaluate Regression

Evaluate Classification
- multi-class classification : Accuracy
  
- binary classification : Confusion matrix, ROC Curve

- Accuracy : TP+TN+FP+FNTP+TN =∣Dataset∣TP+TN
- Precision : TP+FPTP
- Recall : TP+FPTP
- f1-score : P+R2P∗R
  
- TPR = recall = TP+FNTP
- FPR = FP+TNFP
- dashed line : random classifier의 performance
- threshold : binary classifier에 따라 값 다름
- ROC AUC : ROC curve 밑에 면적
4. Setting - up GNN Prediction Tasks

Fixed split vs Random split
Fixed split
- Train : GNN parameters를 optimize에 사용
- Validation : model/hyperparameters를 develope
- Test :  final performance를 report하는 데 사용
 -> test 결과에 대한 해당 실행 결과 보장X
 Random split
- randomly split training / validation / test dataset
 -> 서로 다른 random seed에 대해 performance를 average함
Why Splitting Graph is special
image data splite하는 경우
- image classification은 모든 data point가 image로 각각의 data가 서로 independent함
  
graph data splite하는 경우
- node classification은 모든 data point가 node로 각각의 data가 서로 dependent함
- node 1과 node 2는 node 5를 예측하는데 영향을 줌
 : node 1과 node2가 train data, node 5가 test data인 경우, information leakage 발생 가능
  
solution
1. Transductive setting
- entire graph in all dataset
- only split label
- dataset consist of one graph
- node / edge prediction
- train : entire graph, use node 1 & 2 label
- validation : entire graph, use node 3 & 4 label
  
2. Inductive setting
- different graph in each dataset
- dataset consist of multiple graph
- generalize to unseen graph
- node / edge / graph task
- train : embedding 계산 graph over node 1 & 2, train use node 1 & 2 label
- validation :  embedding 계산 graph over node 3 & 4, evaluate node 3 & 4 label
  
Example
1. Node Classification
- transductive node classification :
 can observe entire graph but only observe respective nodes
  
- inductive node classification :
 each contain independent graph
  
- Graph Classification
- only in inductive setting 
- test on unseen graph
  
- Link Classification
- predict missing edge
- unsupervised / self-supervised task
 -> create label & dataset split
 = we hide edge & let GNN predict whether edge exist
  
Setting up Link Prediction
1-0. Assign edge as Message edges or Supervision edges
- Message edges : GNN message passing에 사용
- Supervision edges : objectives 계산에 사용
1-1.
message edges : remain in graph
supervision edges : supervise edge prediction made by model
2-0. split edges as train / validation / test
2-1. Inductive link prediction split의 경우
- contain independent graph in dataset
- 각 dataset에는 message edges와supervision edges가 포함됨
- supervision edgesnot fed in GNN
  
2-2. Transductive link prediction split의 경우
- after train, supervision edgesknown to GNN
- use supervision edgesat validation time
  
- in sum, 4 types of edges
  
14기 김상현
GNN의 augmentation과 학습 방법에 대해 이해할 수 있었습니다.
유익한 강의 감사합니다:)