[U] Week 2 Day 2

이동찬 · October 1, 2022

Naver Boostcamp AI Tech


1. Lecture Review

Goals:
- Basic Assignment 1: Using the PyTorch Documentation & Building a Custom Model (with 🦆부덕이)
- Watch 2 PyTorch lectures

Results:
- Basic Assignment 1 (△, partially done)
- Watch 2 PyTorch lectures (X, not done)

2. Peer Session

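We discussed the confusing parts of the lecture material and the assignment:
1. What does the dimension argument of the gather function mean?
→ It is the dimension that the indices refer to
2. Why the derivative of $L$ with respect to $p_k$ comes out in that form
3. The differences between the PyTorch template shown in the lecture and PyTorch Lightning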
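To make the answer to item 1 concrete, here is a minimal sketch of `torch.gather` on a small 2-D tensor. The tensor values and variable names are made up for illustration; the indexing rules in the comments follow the `torch.gather` documentation.

```python
import torch

# Hypothetical 2-D source tensor; values are chosen so each element is easy to spot.
src = torch.tensor([[10, 11, 12],
                    [20, 21, 22]])

# dim=1: the indices pick a position along dimension 1 (a column) inside each row.
# Rule: out[i][j] = src[i][index[i][j]]
idx_cols = torch.tensor([[2, 0],
                         [1, 1]])
out_dim1 = torch.gather(src, 1, idx_cols)

# dim=0: the indices pick a position along dimension 0 (a row) for each column.
# Rule: out[i][j] = src[index[i][j]][j]
idx_rows = torch.tensor([[1, 0, 1]])
out_dim0 = torch.gather(src, 0, idx_rows)

print(out_dim1.tolist())  # [[12, 10], [21, 21]]
print(out_dim0.tolist())  # [[20, 11, 22]]
```

In both cases the output has the same shape as the index tensor, and only the coordinate along `dim` is replaced by the index value.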

3. Questions I Wrestled With While Studying, and What I Found

Why does torch.nn.Linear() create its weight with shape (out_features, in_features) and then transpose it to compute input.matmul(weight.t())? If the weight were created as (in_features, out_features) and the layer computed input.matmul(weight), "wouldn't that avoid an unnecessary transpose operation?"
→ torch.nn.Linear() official documentation: https://pytorch.org/docs/master/_modules/torch/nn/modules/linear.html#Linear
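The setup behind this question can be checked directly. The sketch below uses arbitrary, hypothetical sizes (4 input features, 3 output features, batch of 5) and confirms that the layer's weight is stored as (out_features, in_features) and that its output matches the manual `input.matmul(weight.t())` plus bias.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
in_features, out_features = 4, 3
layer = nn.Linear(in_features, out_features)

# The weight is stored as (out_features, in_features).
weight_shape = tuple(layer.weight.shape)

x = torch.randn(5, in_features)  # batch of 5 inputs

# Output of the module versus the manual formula from the question.
y_module = layer(x)
y_manual = x.matmul(layer.weight.t()) + layer.bias

same = torch.allclose(y_module, y_manual, atol=1e-5)
print(weight_shape, tuple(y_module.shape), same)
```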

Conclusion

Reference: https://discuss.pytorch.org/t/why-does-the-linear-module-seems-to-do-unnecessary-transposing/6277

On the backend side, BLAS (Basic Linear Algebra Subprograms) supports operations on transposed matrices, so the transpose itself incurs no extra overhead
→ "Transposition is free for gemm(general matrix multiply) calls"
→ In other words, transposing in the forward pass adds no overhead, though it is less efficient in the backward pass
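One piece of this can be observed from Python: in PyTorch, `.t()` on a 2-D tensor returns a view that shares storage with the original and only swaps the strides, so no data is copied. The sketch below shows just that PyTorch-level behavior; it does not demonstrate what the BLAS backend does internally, which is taken from the forum thread above.

```python
import torch

w = torch.randn(3, 4)  # hypothetical weight of shape (out_features, in_features)
wt = w.t()             # transposed view, shape (4, 3)

# Same underlying memory: the transpose did not copy any data.
shares_storage = wt.data_ptr() == w.data_ptr()

# Only the strides are swapped.
w_stride = w.stride()    # (4, 1)
wt_stride = wt.stride()  # (1, 4)

# The view is therefore no longer laid out contiguously in row-major order.
wt_contiguous = wt.is_contiguous()

print(shares_storage, w_stride, wt_stride, wt_contiguous)
```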

Storing the second matrix of a matrix multiplication in transposed form improves efficiency further
→ The multiplication routine can access memory in a more contiguous (sequential, adjacent) pattern, so there are fewer cache misses
→ https://stackoverflow.com/questions/18796801/increasing-the-data-locality-in-matrix-multiplication
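The locality argument can be illustrated with plain index arithmetic. This is a simplified sketch that assumes row-major storage and a naive dot-product loop; the helper `row_major_offset` and the sizes are hypothetical, and real BLAS kernels use blocking and other optimizations that this does not model.

```python
def row_major_offset(row, col, n_cols):
    """Flat memory offset of element [row][col] in a row-major 2-D array."""
    return row * n_cols + col

# Hypothetical sizes, for illustration only.
in_features, out_features = 4, 3
j = 1  # the output unit whose dot product we are computing

# Layout A: weight stored as (out_features, in_features), as nn.Linear does.
# Output unit j reads weight[j][k] for every k: one full row.
offsets_a = [row_major_offset(j, k, in_features) for k in range(in_features)]

# Layout B: weight stored as (in_features, out_features).
# Output unit j reads weight[k][j] for every k: one full column.
offsets_b = [row_major_offset(k, j, out_features) for k in range(in_features)]

print(offsets_a)  # [4, 5, 6, 7]   adjacent addresses
print(offsets_b)  # [1, 4, 7, 10]  strided addresses
```

With layout A the inner loop walks adjacent addresses, while layout B jumps by `out_features` elements on every step, which is the access pattern that tends to cause more cache misses as the matrix grows.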