AttributeError: 'list' object has no attribute 'size' with HuggingFace model

Chris Staff asked 4 weeks ago

I am running the following code for machine translation with HuggingFace Transformers:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-nl")

# Initialize the model
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-nl")

# Tokenize text
text = "Hello my friends! How are you doing today?"
tokenized_text = tokenizer.prepare_seq2seq_batch([text])

# Perform translation and decode the output
translation = model.generate(**tokenized_text)
translated_text = tokenizer.batch_decode(translation, skip_special_tokens=True)[0]

# Print translated text
print(translated_text)

However, I am getting this error:

Traceback (most recent call last):
File "", line 15, in
translation = model.generate(**tokenized_text)
File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\torch\autograd\", line 15, in decorate_context
return func(*args, **kwargs)
File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\", line 847, in generate
model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(input_ids, model_kwargs)
File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\", line 379, in _prepare_encoder_decoder_kwargs_for_generation
model_kwargs["encoder_outputs"]: ModelOutput = encoder(input_ids, return_dict=True, **encoder_kwargs)
File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\models\marian\", line 711, in forward
input_shape = input_ids.size()
AttributeError: 'list' object has no attribute 'size'

What can be the cause? How to fix?

Best Answer
Chris Staff answered 4 weeks ago

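The prepare_seq2seq_batch call returns a BatchEncoding, a dictionary-like object with at least one key, input_ids, which holds the vocabulary IDs of the input tokens. Without the return_tensors argument, input_ids is a plain Python list (one list of IDs per input text); this default seems to have changed in a recent version of HuggingFace Transformers. The Marian encoder, however, reads the input shape with input_ids.size() (the last frame of the traceback), and size() is a torch.Tensor method. Python lists have no size attribute, hence the AttributeError. With return_tensors you can choose the type of input_ids: pt (PyTorch tensors), tf (TensorFlow tensors) or np (NumPy arrays). Because AutoModelForSeq2SeqLM loads a PyTorch model, pass 'pt' here; 'np' would not help, since a NumPy array's size is an integer attribute rather than a method. Like this: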

tokenized_text = tokenizer.prepare_seq2seq_batch([text], return_tensors='pt')

For me, this resolves the issue.
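The last frame of the traceback shows the Marian encoder calling input_ids.size(). The same error can be reproduced without Transformers at all; this sketch assumes only that the encoder makes that call. size_error_message is a hypothetical helper, and the token IDs are placeholders, not real opus-mt-en-nl vocabulary IDs.

```python
def size_error_message(input_ids):
    """Call input_ids.size() the way the Marian encoder's forward() does.

    Hypothetical helper: returns the AttributeError message if the call fails,
    or None if the object has a callable size() method (e.g. a torch.Tensor).
    """
    try:
        input_ids.size()
    except AttributeError as err:
        return str(err)
    return None


# Placeholder IDs shaped like the nested list that prepare_seq2seq_batch
# returns when return_tensors is not set.
list_input_ids = [[1, 2, 3, 0]]
print(size_error_message(list_input_ids))  # 'list' object has no attribute 'size'
```

A torch.Tensor created with return_tensors='pt' does have size(), so the helper would return None for it.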
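Later Transformers releases deprecate prepare_seq2seq_batch in favour of calling the tokenizer directly, and the same return_tensors fix applies there. The following is a minimal sketch, assuming a Transformers version whose tokenizer call accepts return_tensors and padding, with sentencepiece installed for the Marian tokenizer; translate is a hypothetical helper name, not a library function.

```python
def translate(texts, model_name="Helsinki-NLP/opus-mt-en-nl"):
    """Translate a list of strings (hypothetical helper, not part of Transformers)."""
    # Imported inside the function so defining it does not require transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Calling the tokenizer directly replaces prepare_seq2seq_batch.
    # return_tensors="pt" makes input_ids a torch.Tensor, which has the
    # .size() method the Marian encoder calls; padding=True aligns batch lengths.
    batch = tokenizer(texts, return_tensors="pt", padding=True)

    # Generate token IDs for the translation and decode them back to text.
    outputs = model.generate(**batch)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling translate(["Hello my friends! How are you doing today?"]) downloads the model on first use and returns a list with one Dutch translation. Loading the tokenizer and model inside the function keeps the sketch short; for repeated calls, load them once outside it.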
