Alexander Mathews, a PhD student at the Australian National University (ANU) Research School of Computer Science, has developed a machine learning system that creates story-like captions for images.
Mr Mathews, from the ANU Computational Media Lab, has focused on producing a model that captions an image in the context of its surroundings, with positive or negative sentiment and a specific linguistic style.
The specific style can even be modelled from an author’s writing style – such as Shakespeare – or a collection of existing texts such as a romance or crime novel.
This human-like style generation comes from connecting the semantic gist of a sentence (i.e. the content words) to a powerful language generator that learns the word sequence patterns in the given style.
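The two-stage idea described above can be illustrated with a toy sketch. Note this is an assumed simplification, not the authors' code: the real system uses neural sequence models, whereas here the gist extractor is a simple stopword filter and the styled generator is a hypothetical template stand-in.

```python
# Toy illustration of the two-stage approach: (1) reduce a sentence to
# its semantic gist (the content words), (2) a style-specific generator
# realises that gist as a new sentence. Both functions are simplified
# stand-ins for the neural models described in the article.

STOPWORDS = {"a", "an", "the", "is", "on", "in", "of", "and", "with"}

def extract_gist(sentence):
    """Keep only the content words, lowercased, in order."""
    return [w for w in sentence.lower().split() if w not in STOPWORDS]

def romance_style(gist):
    """Hypothetical styled generator: first person, past tense."""
    return "I gazed at the " + " ".join(gist) + ", and my heart ached."

caption = "A dog is on the beach"
gist = extract_gist(caption)
print(gist)                  # ['dog', 'beach']
print(romance_style(gist))   # I gazed at the dog beach, and my heart ached.
```

The key property is that the gist carries the meaning while the generator alone determines tense, person, and word choice, which is why the same content can be rendered in different styles.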
“It’s been great to work through the process and see the novel descriptions displayed. We were able to deliver past tense and adopt a first-person view in the narrative, all of which are typical of natural human language,” said Mr Mathews.
Dr Lexing Xie from the ANU Computational Media Lab said the work is encouraging because the system can distil content and articulate style separately.
“We are pleased to see that the model has learned to pick more versatile words for expressing the same meaning,” Dr Xie said.
Mr Mathews’ research shows the opportunity to deliver richer image descriptions by drawing on the vast amount of linguistic data available. He will present his paper at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in Salt Lake City next week.
Link to paper: SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text