[ValueError]ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list)

차보경 · May 29, 2022

Embedding NLP python value error

what's the matter


I was applying an embedding in Keras when I suddenly got an error.

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

It says it failed to convert a NumPy array to a Tensor. But why...!
Let's look at the code first.

🛠 The code that raised the error


import numpy as np
import tensorflow as tf

vocab_size = len(word_to_index)  # the dictionary contains 10 words

word_vector_dim = 8    # each word becomes an 8-dimensional output vector

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=word_vector_dim, mask_zero=True)

# Apply the Embedding layer to text already converted to indices: [[1, 3, 6, 8], [1, 5, 4, 7], [1, 7, 3, 2, 8]]
raw_inputs = np.array(get_encoded_sentences(sentences, word_to_index), dtype='object')
# get_encoded_sentences is a function I declared elsewhere in my code to turn each sentence into a list of indices

output = embedding(raw_inputs)  # raises ValueError: the rows have lengths 4, 4 and 5
print(output)

🧐 Cause of the error: the embedding input vectors have different lengths

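Every sentence vector fed into a Keras Embedding layer must have the same length, because the whole batch has to form one rectangular matrix (a Tensor).
Here the encoded sentences in raw_inputs have lengths 4, 4 and 5, so np.array(..., dtype='object') cannot build a 3×5 matrix. Instead it builds a 1-D array whose elements are Python lists, and TensorFlow cannot convert those list objects, hence "Unsupported object type list".
But why do the lengths have to match?
: Why do we turn sentences into a matrix in the first place? For parallel computation!
-> Stacking many sentences into a single matrix lets them all be processed in parallel.
Computers + matrix operations = ridiculously fast...!!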
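A minimal NumPy-only sketch of what goes wrong, using the example indices from the code above (no TensorFlow needed): with ragged rows and dtype=object, NumPy produces a 1-D array of lists rather than a matrix, while equal-length rows give a proper 2-D integer array that can become a Tensor.

```python
import numpy as np

# The encoded sentences from the example (lengths 4, 4, 5)
ragged = [[1, 3, 6, 8], [1, 5, 4, 7], [1, 7, 3, 2, 8]]

# NumPy cannot lay these out as a 3x5 grid, so with dtype=object it
# builds a 1-D array whose three elements are plain Python lists
ragged_arr = np.array(ragged, dtype=object)
print(ragged_arr.shape)     # (3,)
print(type(ragged_arr[0]))  # <class 'list'>

# When every row has the same length, NumPy builds a real 2-D integer matrix
equal = [[1, 3, 6, 8, 0], [1, 5, 4, 7, 0], [1, 7, 3, 2, 8]]
matrix = np.array(equal)
print(matrix.shape)         # (3, 5)
```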

🙌 Solution: use `pad` to force every sequence to the same length

Use TensorFlow's tf.keras.preprocessing.sequence.pad_sequences() with padding='post' to append the <pad> value 0 to the end of each shorter row until all rows have the same length, then apply the embedding.
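To see what post-padding produces for the example sentences, here is a pure-Python sketch. `pad_post` is a hypothetical helper written only for illustration (it is not how Keras implements pad_sequences), and it assumes the <pad> index is 0.

```python
def pad_post(sequences, pad_value=0):
    """Append pad_value to each sequence until all match the longest one."""
    longest = max(len(seq) for seq in sequences)
    # seq + [...] builds a new list, so the input lists are left unchanged
    return [seq + [pad_value] * (longest - len(seq)) for seq in sequences]

encoded = [[1, 3, 6, 8], [1, 5, 4, 7], [1, 7, 3, 2, 8]]
padded = pad_post(encoded)
print(padded)  # [[1, 3, 6, 8, 0], [1, 5, 4, 7, 0], [1, 7, 3, 2, 8]]
```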

import numpy as np
import tensorflow as tf

vocab_size = len(word_to_index)
word_vector_dim = 8

embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=word_vector_dim, mask_zero=True)

raw_inputs = np.array(get_encoded_sentences(sentences, word_to_index), dtype=object)


########### Same as before up to here ###########
# Added code: pad every sequence at the end ('post') with the <PAD> index
raw_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs,
                                                       value=word_to_index['<PAD>'],
                                                       padding='post')
output = embedding(raw_inputs)  # for the example sentences, raw_inputs is now a 3x5 int32 matrix
print(output)
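Conceptually, the Embedding layer is a trainable lookup table with one row of word_vector_dim numbers per word index, so looking up the padded 3×5 matrix yields a 3×5×8 output. The NumPy sketch below assumes vocab_size = 10 (from the comment in the first snippet) and uses random numbers as a stand-in for the layer's learned weights.

```python
import numpy as np

vocab_size = 10       # words in word_to_index, per the first snippet
word_vector_dim = 8   # size of each word vector

rng = np.random.default_rng(0)
# Stand-in for the layer's trainable weights: one row per word index
table = rng.normal(size=(vocab_size, word_vector_dim))

padded = np.array([[1, 3, 6, 8, 0],
                   [1, 5, 4, 7, 0],
                   [1, 7, 3, 2, 8]])

# Embedding lookup is just row indexing: each index is replaced by its row
output = table[padded]
print(output.shape)  # (3, 5, 8)
```

This only works because padded is rectangular; the ragged object array from the failing code has no such (3, 5) shape to index with.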

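Arguments of tf.keras.preprocessing.sequence.pad_sequences()
1) sequences: a list of sequences (lists of integers) whose lengths may differ
2) maxlen: length of every output row (if not set, rows are padded to the length of the longest sequence)
3) dtype: type of the output sequences (default 'int32')
4) padding: where the padding goes, 'pre' (before) or 'post' (after) (default 'pre')
5) truncating: when maxlen is set, which side of a longer sequence is cut off, 'pre' or 'post' (default 'pre')
6) value: the padding value (default 0.0)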
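Because padding and truncating both default to 'pre', leaving out padding='post' would put the zeros at the front instead. The simplified pure-Python sketch below illustrates those maxlen/padding/truncating rules; `pad_like_keras` is a hypothetical name, it ignores dtype, and it is not the actual Keras code.

```python
def pad_like_keras(sequences, maxlen=None, padding="pre", truncating="pre", value=0):
    """Simplified sketch of pad_sequences' maxlen/padding/truncating rules."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    result = []
    for seq in sequences:
        seq = list(seq)
        # Too long: drop tokens from the front ('pre') or the back ('post')
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        # Put the padding before ('pre') or after ('post') the tokens
        result.append(fill + seq if padding == "pre" else seq + fill)
    return result

encoded = [[1, 3, 6, 8], [1, 5, 4, 7], [1, 7, 3, 2, 8]]
print(pad_like_keras(encoded))                  # [[0, 1, 3, 6, 8], [0, 1, 5, 4, 7], [1, 7, 3, 2, 8]]
print(pad_like_keras(encoded, padding="post"))  # [[1, 3, 6, 8, 0], [1, 5, 4, 7, 0], [1, 7, 3, 2, 8]]
print(pad_like_keras(encoded, maxlen=4))        # [[1, 3, 6, 8], [1, 5, 4, 7], [7, 3, 2, 8]]
```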
Embedding is a must in NLP, so let's be careful not to run into this error again!

Misc

Reference: TensorFlow documentation for pad_sequences
mask_zero=True -> the mask marks original tokens as True and padded positions (index 0) as False
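With mask_zero=True the layer treats index 0 as padding, and the mask it computes is equivalent to `inputs != 0`. A NumPy sketch on the post-padded example, assuming as above that <PAD> is index 0:

```python
import numpy as np

padded = np.array([[1, 3, 6, 8, 0],
                   [1, 5, 4, 7, 0],
                   [1, 7, 3, 2, 8]])

# True for real tokens, False for the <PAD> positions (index 0)
mask = padded != 0
print(mask)

# Counting the True values per row recovers the original sentence lengths
print(mask.sum(axis=1))  # [4 4 5]
```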

차보경

차보's Data Engineer journey ♥ (with a side of note-taking)
