One-Hot Encoding to Embedding

Dobby713 · June 17, 2022
  • Among the dataset's features, the following were one-hot encoded:
    1. Supreme industry classification
    2. Median industry classification
    3. Sub industry classification
    4. Industry ID
    5. Market capitalization scale
    6. Market classification
    7. Month, Day, Week

Among them, industry ID, month, and day were changed from one-hot encoding to embeddings.
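As a quick illustration of the difference (a standalone sketch, not the original data: the size 31 and dimension 8 below are made up), a one-hot encoding turns each ID into a sparse vector as long as the vocabulary, while `nn.Embedding` maps it to a short dense vector that is learned during training:

```python
import torch
import torch.nn as nn

num_ids, dim = 31, 8  # hypothetical: 31 distinct IDs, embedded into 8 dims

# One-hot: ID 3 becomes a sparse vector of length 31
one_hot = torch.nn.functional.one_hot(torch.tensor(3), num_classes=num_ids)

# Embedding: ID 3 becomes a dense, trainable vector of length 8
emb = nn.Embedding(num_embeddings=num_ids, embedding_dim=dim)
dense = emb(torch.tensor(3))

print(one_hot.shape)  # torch.Size([31])
print(dense.shape)    # torch.Size([8])
```

The embedding vectors are parameters of the model, so similar IDs can end up with similar vectors, which a fixed one-hot encoding cannot do.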

import torch
import torch.nn as nn

def emb_layer_dict(list_, dim_):        # takes the list of values to embed and the embedding dimension
    emb_dict = dict()
    if len(list_) > 31:                 # industry classification codes: reserve indices for null handling
        emb_dict = {tkn: i + 2 for i, tkn in enumerate(list_)}
        emb_dict['<unk>'] = 0
        emb_dict['<pad>'] = 1
    else:
        emb_dict = {tkn: i for i, tkn in enumerate(list_)}

    # only set a padding index when a <pad> token actually exists
    emb_table = nn.Embedding(num_embeddings=len(emb_dict),
                             embedding_dim=dim_,
                             padding_idx=emb_dict.get('<pad>'))

    # look each token up by its index in emb_dict, not by its list position
    return {tkn: emb_table.weight[emb_dict[tkn]] for tkn in list_}
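Note that slices of `nn.Embedding.weight` are still part of the autograd graph, which is what sets up the error below. A minimal standalone check (the sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# Slices of nn.Embedding.weight still carry requires_grad=True
emb_table = nn.Embedding(num_embeddings=12, embedding_dim=4)
vec = emb_table.weight[3]

print(vec.requires_grad)  # True — so calling vec.numpy() raises a RuntimeError
```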

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead

The reason is that a tensor with requires_grad=True cannot be converted to a NumPy array directly. Calling .detach() returns a new tensor that shares the same data but is detached from the autograd graph, and this detached tensor can be converted to a NumPy array.

ref : https://stackoverflow.com/questions/55466298/pytorch-cant-call-numpy-on-variable-that-requires-grad-use-var-detach-num

So I added this line right after creating emb_table:

emb_table = emb_table.weight.detach().numpy()
