TypeError: expected string or bytes-like object when running TAPAS with HuggingFace Transformers

Chris Staff asked 1 month ago

I am trying to run the HuggingFace Transformers-based TAPAS model:


from transformers import TapasTokenizer, TapasForQuestionAnswering
import pandas as pd

# Load pretrained tokenizer: TAPAS finetuned on WikiTable Questions
tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-wtq")

# Load pretrained model: TAPAS finetuned on WikiTable Questions
model = TapasForQuestionAnswering.from_pretrained("google/tapas-base-finetuned-wtq")

# Define the table and queries
table = {
    'name': ['Dave', 'Peter', 'John'],
    'age': [12, 38, 99]
}
queries = ['Who is the oldest person?']

# Convert table into DataFrame
table_df = pd.DataFrame.from_dict(table)
print(table_df)

# Tokenize table
tokenized_table = tokenizer(table=table_df, queries=queries, padding='max_length', return_tensors='pt')

# Get result
outputs = model(**tokenized_table)

# Convert logit outputs into predictions
predicted_answer_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(
    tokenized_table,
    outputs.logits.detach(),
    outputs.logits_aggregation.detach()
)

# Prepare results
id2aggregation = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3:"COUNT"}
aggregation_predictions_string = [id2aggregation[x] for x in predicted_aggregation_indices]
answers = []
for coordinates in predicted_answer_coordinates:
    if len(coordinates) == 1:
        # only a single cell:
        answers.append(table.iat[coordinates[0]])
    else:
        # multiple cells
        cell_values = []
        for coordinate in coordinates:
            cell_values.append(table.iat[coordinate])
        answers.append(", ".join(cell_values))
display(table)
print("")
for query, answer, predicted_agg in zip(queries, answers, aggregation_predictions_string):
    print(query)
    if predicted_agg == "NONE":
        print("Predicted answer: " + answer)
    else:
        print("Predicted answer: " + predicted_agg + " > " + answer)

I am running into the following error:


Traceback (most recent call last):
  File "tapas.py", line 26, in <module>
    tokenized_table = tokenizer(table=table_df, queries=queries, padding='max_length', return_tensors='pt')
  File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\models\tapas\tokenization_tapas.py", line 601, in __call__
    return self.batch_encode_plus(
  File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\models\tapas\tokenization_tapas.py", line 716, in batch_encode_plus
    return self._batch_encode_plus(
  File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\models\tapas\tokenization_tapas.py", line 762, in _batch_encode_plus
    table_tokens = self._tokenize_table(table)
  File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\models\tapas\tokenization_tapas.py", line 1321, in _tokenize_table
    tokenized_row.append(self.tokenize(cell))
  File "C:\Users\chris\Anaconda3\envs\pytorch\lib\site-packages\transformers\tokenization_utils.py", line 260, in tokenize
    text = re.sub(pattern, lambda m: m.groups()[0] or m.groups()[1].lower(), text)
  File "C:\Users\chris\Anaconda3\envs\pytorch\lib\re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

What can be the problem here?

1 Answer
Best Answer
Chris Staff answered 1 month ago

The problem is caused by the age column of your table, which holds integers rather than strings. The TAPAS tokenizer expects every cell to be a string: as the traceback shows, _tokenize_table passes each cell to self.tokenize, which calls re.sub on it, and re.sub raises this TypeError when it receives an int. Converting the numeric values into strings should fix it:


# Define the table and queries
table = {
    'name': ['Dave', 'Peter', 'John'],
    'age': ["12", "38", "99"]
}
queries = ['Who is the oldest person?']
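If you would rather keep the numbers as numbers in your own data, a sketch of an alternative is to convert the whole DataFrame right before tokenizing. The snippet below first reproduces the failure with a plain re.sub call (regex_step is a hypothetical stand-in for the regex step in the traceback, not the tokenizer's real pattern) and then casts every cell with pandas' DataFrame.astype(str); stringify_table is likewise a hypothetical helper name.

```python
import re

import pandas as pd


def regex_step(cell):
    # Hypothetical stand-in for the per-cell re.sub call in the traceback
    # (tokenization_utils.py); the pattern is illustrative, not the tokenizer's own.
    return re.sub(r"\s+", " ", cell)


def tokenizes_ok(cell) -> bool:
    # True if the regex step accepts the cell, False if it raises TypeError.
    try:
        regex_step(cell)
        return True
    except TypeError:
        return False


def stringify_table(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: cast every cell to str before handing the table to TAPAS.
    return df.astype(str)


# The table from the question, with integer ages.
table = {
    'name': ['Dave', 'Peter', 'John'],
    'age': [12, 38, 99]
}

print(tokenizes_ok(table['age'][0]))       # False: an int cell triggers the TypeError
print(tokenizes_ok(str(table['age'][0])))  # True: the same value as a string is fine

# Convert all cells to strings; this table_df can then go into tokenizer(table=...).
table_df = stringify_table(pd.DataFrame.from_dict(table))
print(table_df.iat[0, 1], type(table_df.iat[0, 1]))
```

You would then pass the converted table_df to the tokenizer exactly as before. As far as I know, the TapasTokenizer docstring also suggests .astype(str) for this.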

Note that there are also two other issues in your original code:

  • table is a plain Python dict, which has no .iat accessor, so both calls to table.iat must be table_df.iat.
  • display is not defined in a plain Python script (it is only available in IPython/Jupyter), so that line must be removed.
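With both fixes applied, the result-preparation part could look roughly like the sketch below. coordinates_to_answers is a hypothetical helper, and the coordinates are made up so the snippet runs without the model; in the real script they come from tokenizer.convert_logits_to_predictions. It assumes every cell is already a string, as in the corrected table.

```python
import pandas as pd


def coordinates_to_answers(table_df: pd.DataFrame, predicted_answer_coordinates):
    # Hypothetical helper: for each query, look up its (row, column) cells in the
    # DataFrame via table_df.iat (not the dict) and join them into one string.
    answers = []
    for coordinates in predicted_answer_coordinates:
        cell_values = [table_df.iat[coordinate] for coordinate in coordinates]
        answers.append(", ".join(cell_values))
    return answers


table_df = pd.DataFrame.from_dict({
    'name': ['Dave', 'Peter', 'John'],
    'age': ["12", "38", "99"]
})

# display() only exists in IPython/Jupyter, so a plain script uses print() instead.
print(table_df)

# Made-up coordinates for illustration only; in the real script they come from
# tokenizer.convert_logits_to_predictions(...).
example_coordinates = [[(0, 0)], [(1, 1), (2, 1)]]
print(coordinates_to_answers(table_df, example_coordinates))  # ['Dave', '38, 99']
```

Joining a one-element list just returns that cell, so the separate single-cell branch from the original loop is not needed here; the behavior is the same.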

The output after applying these fixes:


Who is the oldest person?
Predicted answer: Dave

An interesting result (Dave, at 12, is actually the youngest), but at least it works! 😉
