I am interested in building a description/caption generator using a template-based approach.
I have a rule-based method that converts an input sentence into a template, as shown below:
Sentence: Shaped to a shirt silhouette, it’s easy to wear
Template: Shaped to a SILHOUETTE, it’s easy to wear

Sentence: The paisley print will look and make you feel amazing
Template: The PRINT will look and make you feel amazing
Now I can fill in placeholders such as SILHOUETTE with tags belonging to that attribute, e.g. shift, A-line, bodycon.
There are 215 templates, and filling each one three times gives 215 × 3 = 645 sentences in total. The goal is now to formulate this as a sequence-to-sequence problem, where the input is the tags and the target is the full sentence:
Input: shirt | Target: Shaped to a shirt silhouette, it’s easy to wear
Input: thin puff | Target: This dress features thin straps and puff sleeves
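A minimal sketch of the filling step that produces these (input, target) pairs. It assumes placeholders are upper-case words and that each template is filled three times (which is how 215 × 3 = 645 reads); `TAGS`, `make_pairs`, and the example templates are hypothetical, not the actual rule-based method.

```python
import re

# Hypothetical tag lists per placeholder; the real lists come from the attribute taxonomy.
TAGS = {
    "SILHOUETTE": ["shift", "a-line", "bodycon"],
    "COLOR": ["blue", "red", "black"],
}

# Assumption: placeholders are upper-case words of two or more letters.
PLACEHOLDER = re.compile(r"\b[A-Z]{2,}\b")


def make_pairs(template, tags=TAGS, per_template=3):
    """Fill a template `per_template` times and return (source, target) pairs.

    The source is the space-separated tags in placeholder order (e.g. "blue shift"),
    matching the input format of the examples above; the target is the filled sentence.
    """
    slots = [s for s in PLACEHOLDER.findall(template) if s in tags]
    pairs = []
    for i in range(per_template):
        # Pick the i-th tag for each slot, wrapping around short tag lists.
        choice = {s: tags[s][i % len(tags[s])] for s in slots}
        # Replace known placeholders; leave unknown upper-case words untouched.
        target = PLACEHOLDER.sub(lambda m: choice.get(m.group(0), m.group(0)), template)
        source = " ".join(choice[s] for s in slots)
        pairs.append((source, target))
    return pairs
```

Applied to every template, this yields three pairs per template, so 215 templates give 645 pairs.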
I fine-tuned a T5-base model with Hugging Face using the following hyperparameters:
The output sentences have the following problems:
1. Often, all three returned sequences are identical.
2. The output contains extraneous information. For example:
- Input: blue mini holiday (tags for COLOR, LENGTH, and OCCASION)
- Output: Covered in a blue color, this mini length dress will brighten up your summer season wardrobe, styled in a holiday silhouette and SHAPE with a holiday sleeve of a minimum of 10 inch
None of my input sentences contain "holiday silhouette" or "holiday sleeve". Also, SEASON was not part of the input above, yet the sentence mentions summer.
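A minimal post-filtering sketch for both problems, assuming a hypothetical tag vocabulary `VOCAB` and single-word tags: over-generate candidates with sampling, then drop near-duplicates and any candidate that omits an input tag, mentions a known tag that was not in the input (such as summer), or still contains an upper-case placeholder (such as SHAPE). The helper names are made up for illustration; `model.generate` with `do_sample`, `top_p`, `num_return_sequences`, and `max_new_tokens`, plus `tokenizer.batch_decode`, are standard Hugging Face transformers calls. This filter does not catch a real tag used in the wrong slot, like "holiday silhouette".

```python
import re

# Hypothetical attribute vocabulary; in practice this would be the real tag lists.
VOCAB = {
    "COLOR": {"blue", "red", "black"},
    "LENGTH": {"mini", "midi", "maxi"},
    "OCCASION": {"holiday", "party", "work"},
    "SEASON": {"summer", "winter", "spring", "autumn"},
}
KNOWN_TAGS = set().union(*VOCAB.values())


def check_output(input_tags, sentence):
    """Report input tags missing from the sentence, known tags it adds, and leaked placeholders."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    wanted = {t.lower() for t in input_tags}
    return {
        "missing": sorted(wanted - words),
        "extra": sorted((words & KNOWN_TAGS) - wanted),
        # Upper-case tokens such as SHAPE look like unfilled template placeholders.
        "placeholders": re.findall(r"\b[A-Z]{2,}\b", sentence),
    }


def filter_candidates(input_tags, candidates, k=3):
    """Drop duplicates (ignoring case and whitespace) and candidates that fail check_output."""
    seen, kept = set(), []
    for text in candidates:
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        report = check_output(input_tags, text)
        if not any(report.values()):  # every list in the report is empty
            kept.append(text)
    return kept[:k]


def generate_candidates(model, tokenizer, tags, n=10, max_new_tokens=64):
    """Sample n candidates from a fine-tuned Hugging Face seq2seq model."""
    inputs = tokenizer(tags, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,  # sampling instead of greedy/beam search, for more varied outputs
        top_p=0.9,
        num_return_sequences=n,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `filter_candidates(["blue", "mini", "holiday"], generate_candidates(model, tokenizer, "blue mini holiday"))` keeps at most three candidates that pass the checks; it can return fewer if most samples fail.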
Can someone please explain how to overcome these problems? Is my dataset too small?
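One way to test whether data size matters, sketched here as an assumption rather than a known fix: fill each template with every tag combination instead of three. For a two-slot template with 4 and 3 tags, that gives 12 pairs instead of 3. `TAGS` and `all_fills` are hypothetical names.

```python
import itertools
import re

# Assumption: placeholders are upper-case words of two or more letters.
PLACEHOLDER = re.compile(r"\b[A-Z]{2,}\b")

# Hypothetical tag lists per placeholder.
TAGS = {
    "COLOR": ["blue", "red", "black", "green"],
    "LENGTH": ["mini", "midi", "maxi"],
}


def all_fills(template, tags=TAGS):
    """Yield (source, target) for every combination of tags the template's slots allow."""
    # dict.fromkeys removes repeated placeholders while keeping their order.
    slots = [s for s in dict.fromkeys(PLACEHOLDER.findall(template)) if s in tags]
    for combo in itertools.product(*(tags[s] for s in slots)):
        choice = dict(zip(slots, combo))
        target = PLACEHOLDER.sub(lambda m: choice.get(m.group(0), m.group(0)), template)
        yield " ".join(combo), target
```

This adds tag variety but not sentence variety: the 215 templates stay the same, so the model still sees the same sentence structures.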