Employability of Neural Network Tools and Techniques for Enhancing Image Caption Generation

Harshit Dua

Galgotias University, Uttar Pradesh, India


Vol: 10, Issue: 4, 2020

Receiving Date: 2020-09-04


Automatic image caption generation is currently an active area of research. The task is challenging because it combines computer vision with natural language processing. It has practical uses. For instance, it could help visually impaired people understand the content of images on the web, and it could provide more precise and compact descriptions of images and videos in scenarios such as photo sharing on social networks or video surveillance systems. The framework consists of a convolutional neural network (CNN) followed by a recurrent neural network (RNN). By learning from image-caption pairs, the method can generate captions that are generally semantically accurate and grammatically correct. People usually describe a scene in natural language, which is concise and compact, whereas computer vision systems represent the scene as an image, a two-dimensional array of pixels. The aim is to map and embed images and captions into the same space and to project from the image to the sentence.
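The CNN-then-RNN pipeline described above can be illustrated with a deliberately tiny, dependency-free sketch. It is not the author's implementation: the class `ToyCaptioner`, the helper functions, the `<start>`/`<end>` tokens, the single-channel 8x8 input, the vanilla tanh RNN (real systems typically use an LSTM or GRU) and the greedy decoding strategy are all illustrative assumptions. The weights are random and untrained, so the output only demonstrates the data flow: image, then CNN feature vector, then initial RNN state, then one word per step.

```python
import math
import random


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))  # ReLU non-linearity
        out.append(row)
    return out


def encode_image(image, kernels):
    """Toy CNN encoder: one conv + ReLU map per kernel, then global average pooling."""
    features = []
    for k in kernels:
        fmap = conv2d_valid(image, k)
        n = len(fmap) * len(fmap[0])
        features.append(sum(sum(r) for r in fmap) / n)
    return features


def matvec(m, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.5):
    """Random weight matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyCaptioner:
    """Hypothetical CNN -> RNN captioner; all weights are random and untrained."""

    def __init__(self, vocab, n_kernels=4, hidden=8, embed=6, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.idx = {w: i for i, w in enumerate(vocab)}
        self.kernels = [rand_matrix(3, 3, rng) for _ in range(n_kernels)]
        self.W_img = rand_matrix(hidden, n_kernels, rng)  # image feature -> initial hidden state
        self.E = rand_matrix(len(vocab), embed, rng)      # word embedding table
        self.W_xh = rand_matrix(hidden, embed, rng)       # input -> hidden
        self.W_hh = rand_matrix(hidden, hidden, rng)      # hidden -> hidden (recurrence)
        self.W_hy = rand_matrix(len(vocab), hidden, rng)  # hidden -> vocabulary scores

    def step(self, h, word):
        """One vanilla (Elman) RNN step: h' = tanh(W_xh x + W_hh h); returns (h', scores)."""
        x = self.E[self.idx[word]]
        pre = [a + b for a, b in zip(matvec(self.W_xh, x), matvec(self.W_hh, h))]
        h = [math.tanh(p) for p in pre]
        return h, matvec(self.W_hy, h)

    def caption(self, image, max_len=10):
        """Greedy decoding: start from <start>, emit the highest-scoring word until <end>."""
        h = [math.tanh(v) for v in matvec(self.W_img, encode_image(image, self.kernels))]
        word, words = "<start>", []
        for _ in range(max_len):
            h, scores = self.step(h, word)
            # Never emit <start> as an output word.
            candidates = [i for i, w in enumerate(self.vocab) if w != "<start>"]
            word = self.vocab[max(candidates, key=lambda i: scores[i])]
            if word == "<end>":
                break
            words.append(word)
        return words


vocab = ["<start>", "<end>", "a", "dog", "cat", "runs", "on", "grass"]
image = [[((i * j) % 5) / 4 for j in range(8)] for i in range(8)]  # 8x8 grayscale toy image
model = ToyCaptioner(vocab)
print(model.caption(image))
```

Because the weights are random, the printed caption is arbitrary. In the setting the abstract describes, the CNN and RNN parameters would instead be learned from image-caption pairs, typically by minimizing the cross-entropy of each next word given the image and the preceding words, and practical systems usually start from a pretrained CNN rather than a hand-rolled convolution.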

Keywords: Computer Vision; Image Captions; Neural Nets; Mappings

Disclaimer: All papers published in IJRST will be indexed by the Google search engine in accordance with its policy.