One-Hot Encoding to Embedding

Dobby713 · June 17, 2022
  • Among the dataset's features, the following were one-hot encoded:
    1. Supreme industry classification
    2. Median industry classification
    3. Sub industry classification
    4. Industry ID
    5. Market capitalization scale
    6. Market classification
    7. Month, Day, Week

Among them, industry ID, month, and day were changed from one-hot encoding to embeddings.
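As a quick illustration of the difference (a standalone sketch, not the original data: the size 31 and dimension 8 below are made up), a one-hot encoding turns each ID into a sparse vector as long as the vocabulary, while `nn.Embedding` maps it to a short dense vector that is learned during training:

```python
import torch
import torch.nn as nn

num_ids, dim = 31, 8  # hypothetical: 31 distinct IDs, embedded into 8 dims

# One-hot: ID 3 becomes a sparse vector of length 31
one_hot = torch.nn.functional.one_hot(torch.tensor(3), num_classes=num_ids)

# Embedding: ID 3 becomes a dense, trainable vector of length 8
emb = nn.Embedding(num_embeddings=num_ids, embedding_dim=dim)
dense = emb(torch.tensor(3))

print(one_hot.shape)  # torch.Size([31])
print(dense.shape)    # torch.Size([8])
```

The embedding vectors are parameters of the model, so similar IDs can end up with similar vectors, which a fixed one-hot encoding cannot do.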

import torch
import torch.nn as nn

def emb_layer_dict(list_, dim_):        # takes the list of values to embed and the embedding dimension
    emb_dict = dict()
    if len(list_) > 31:                 # industry classification codes: reserve indices for null handling
        emb_dict = {tkn: i + 2 for i, tkn in enumerate(list_)}
        emb_dict['<unk>'] = 0
        emb_dict['<pad>'] = 1
    else:
        emb_dict = {tkn: i for i, tkn in enumerate(list_)}

    # only set a padding index when a <pad> token actually exists
    emb_table = nn.Embedding(num_embeddings=len(emb_dict),
                             embedding_dim=dim_,
                             padding_idx=emb_dict.get('<pad>'))

    # look each token up by its index in emb_dict, not by its list position
    return {tkn: emb_table.weight[emb_dict[tkn]] for tkn in list_}
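Note that slices of `nn.Embedding.weight` are still part of the autograd graph, which is what sets up the error below. A minimal standalone check (the sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# Slices of nn.Embedding.weight still carry requires_grad=True
emb_table = nn.Embedding(num_embeddings=12, embedding_dim=4)
vec = emb_table.weight[3]

print(vec.requires_grad)  # True — so calling vec.numpy() raises a RuntimeError
```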

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead

The reason is that a tensor with requires_grad=True cannot be converted to a NumPy array directly. Calling .detach() returns a new tensor that shares the same data but is detached from the autograd graph, and this detached tensor can be converted to a NumPy array.

ref : https://stackoverflow.com/questions/55466298/pytorch-cant-call-numpy-on-variable-that-requires-grad-use-var-detach-num

So I added this line right after creating emb_table:

emb_table = emb_table.weight.detach().numpy()
