[Q&A] CNN : Lesson 6. Autoencoder

olxtar · May 19, 2022

Question

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        ## encoder layers ##
        # conv layer (depth from 1 --> 32), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        # conv layer (depth from 32 --> 16), 3x3 kernels
        self.conv2 = nn.Conv2d(32, 16, 3, padding=1)
        # conv layer (depth from 16 --> 8), 3x3 kernels
        self.conv3 = nn.Conv2d(16, 8, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)

        ## decoder layers ##
        # transpose layers: a kernel of 2 with a stride of 2 doubles the spatial dims;
        # the first layer uses kernel_size=3 instead so that its output is 7x7
        self.t_conv1 = nn.ConvTranspose2d(8, 8, 3, stride=2)
        # two more transpose layers with a kernel of 2
        self.t_conv2 = nn.ConvTranspose2d(8, 16, 2, stride=2)
        self.t_conv3 = nn.ConvTranspose2d(16, 32, 2, stride=2)
        # one, final, normal conv layer to decrease the depth
        self.conv_out = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        # add third hidden layer
        x = F.relu(self.conv3(x))
        x = self.pool(x)  # compressed representation

        ## decode ##
        # add transpose conv layers, with relu activation function
        x = F.relu(self.t_conv1(x))
        x = F.relu(self.t_conv2(x))
        x = F.relu(self.t_conv3(x))
        # final normal conv layer; the output should have a sigmoid applied
        # (torch.sigmoid replaces the deprecated F.sigmoid)
        x = torch.sigmoid(self.conv_out(x))

        return x

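The tensor size (image data) changes as shown in the code above and in the listing below.
I understand the convolutional layer formula used in the encoder, i.e. the formula below...
But I don't understand how the size changes in the decoder's transpose convolutional layers...
I roughly get that kernel size = 2 with stride = 2 doubles the size...
So why is kernel size = 3 in the decoder's first layer, t_conv1?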

Convolutional operation formula

$S_{output} = \left\lfloor \frac{1}{Stride}(S_{input} - S_{filter} + 2\times Padding) \right\rfloor + 1$
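As a quick check of the formula above, here is a minimal sketch in plain Python. The helper name `conv_out_size` is hypothetical (it is not part of PyTorch), and it assumes a dilation of 1 and the default floor rounding that `nn.Conv2d` and `nn.MaxPool2d` apply. Pooling follows the same formula with a kernel of 2, a stride of 2 and no padding.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a conv or pooling layer (floor rounding)."""
    return (size - kernel + 2 * padding) // stride + 1


size = 28  # input image is 28x28
for name in ("conv1", "conv2", "conv3"):
    # 3x3 conv with padding=1 and stride=1 keeps the spatial size
    size = conv_out_size(size, kernel=3, stride=1, padding=1)
    # 2x2 max pooling with stride 2 halves it, rounding down
    pooled = conv_out_size(size, kernel=2, stride=2)
    print(f"{name}: {size}x{size} -> pool: {pooled}x{pooled}")
    size = pooled
```

One thing worth noticing: under floor rounding, the last pooling step takes a 7x7 map to 3x3 (3.5 rounded down), not 4x4. A 4x4 result would need `ceil_mode=True` on the pooling layer, which the model above does not set.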

Encoder
Input 28x28x1
conv1 28x28x32 $\rightarrow$ pool 14x14x32
conv2 14x14x16 $\rightarrow$ pool 7x7x16
conv3 7x7x8 $\rightarrow$ pool 3x3x8 (MaxPool2d rounds 3.5 down to 3 by default)

Decoder
Input 3x3x8
t_conv1 7x7x8
t_conv2 14x14x16
t_conv3 28x28x32
conv_out 28x28x1

Answer-1

Deconvolutional (transpose convolution) operation formula

$S_{output} = Stride \times(S_{input}-1)+S_{filter}-2\times Padding$
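The formula can be tried against the decoder layers with a short sketch. The helper names `tconv_out_size` and `trace` are hypothetical, and the sketch assumes a dilation of 1 and an output_padding of 0. It also assumes a padding of 0, since that is the default for `nn.ConvTranspose2d` and the model passes no `padding` argument to its transpose layers.

```python
def tconv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a transpose convolution."""
    return stride * (size - 1) + kernel - 2 * padding


# (name, kernel_size, stride) for the three transpose layers in the model
decoder = [("t_conv1", 3, 2), ("t_conv2", 2, 2), ("t_conv3", 2, 2)]


def trace(size):
    """Return the spatial size before and after each decoder layer."""
    sizes = [size]
    for _name, kernel, stride in decoder:
        size = tconv_out_size(size, kernel, stride)
        sizes.append(size)
    return sizes


print(trace(3))  # [3, 7, 14, 28]
print(trace(4))  # [4, 9, 18, 36]
```

Starting from a 3x3 map, kernel_size=3 with stride 2 gives 7x7, and the two kernel-2 layers then double it to 14x14 and 28x28. Starting from 4x4 with the same layers would overshoot to 36x36, which suggests the decoder input is 3x3 rather than 4x4.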

Re-Question

If 4x4x8 data passes through a Transpose Conv Layer (= Deconvolutional Layer) and comes out as 7x7x8, what is the filter size, i.e. the kernel size?

Deconvolutional (transpose convolution) operation formula

$S_{output} = Stride \times(S_{input}-1)+S_{filter}-2\times Padding$

$S_{input}=4$
$S_{output}= 7$
$Stride=2$
$S_{filter}= \;?$
$Padding= \; ?$

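$\therefore$ Substituting these values...
$7 = 2 \times (4-1) + S_{filter} - 2 \times Padding$
$7 = 6 + S_{filter} - 2 \times Padding$

So a filter size of 2 would need a padding of 1/2, which does not work...
A filter size of 3 with a padding of 1 fits exactly, that much I can see.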

[?] How can I find out the padding size of a Transpose Conv Layer?
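One way to look at this: the padding cannot be derived from the input and output sizes alone, because several (kernel, padding) pairs satisfy the same equation. In PyTorch it is simply whatever is passed as the `padding` argument of `nn.ConvTranspose2d`, and it defaults to 0 when omitted, as it is for t_conv1 in the model above. The sketch below enumerates the integer solutions; the helper name `solve_kernel_padding` and the search limit `max_kernel` are assumptions made for illustration.

```python
def solve_kernel_padding(s_in, s_out, stride, max_kernel=5):
    """List (kernel, padding) pairs with s_out = stride*(s_in-1) + kernel - 2*padding."""
    solutions = []
    for kernel in range(1, max_kernel + 1):
        twice_padding = stride * (s_in - 1) + kernel - s_out
        # padding must be a non-negative integer
        if twice_padding >= 0 and twice_padding % 2 == 0:
            solutions.append((kernel, twice_padding // 2))
    return solutions


print(solve_kernel_padding(4, 7, 2))  # [(1, 0), (3, 1), (5, 2)]
print(solve_kernel_padding(3, 7, 2))  # [(3, 0), (5, 1)]
```

For a 4x4 input, a kernel of 3 needs a padding of 1, matching the derivation above. For a 3x3 input, which is what default max pooling produces from 7x7, a kernel of 3 with the default padding of 0 already gives 7x7, and that appears to be why t_conv1 uses kernel_size=3 without a padding argument.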
