Arnav Goenka
Vellore Institute of Technology, Vellore, Tamil Nadu, India
DOI: http://doi.org/10.37648/ijrst.v12i03.009
Representation learning is a type of machine learning in which a system automatically extracts features from raw data, typically using deep models. It is essential for tasks such as classification, regression, and identification. Multimodal representation learning is a subset of representation learning that focuses on extracting features from several interconnected modalities, such as text, images, audio, and video. Although these modalities are heterogeneous, they exhibit correlations and relationships with one another. This intrinsic complexity raises several difficulties: combining multimodal data from different sources, precisely characterizing the relationships and correlations between modalities, and jointly deriving features from multimodal data. Researchers are increasingly interested in these problems, particularly as deep learning gains momentum, and many deep multimodal learning techniques have been developed in recent years. We present an overview of deep multimodal learning in this study, focusing on techniques proposed in the past decade. By surveying the latest developments, trends, and open difficulties in the field, we aim to offer valuable insights to researchers, especially those working on multimodal deep learning.
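As a rough illustration of what "jointly deriving features" from two heterogeneous modalities can mean, here is a minimal sketch, assuming each modality has already been turned into a feature vector. It projects a text vector and an image vector into a shared space, fuses them by concatenation, and measures their cross-modal similarity. The function names (`project`, `joint_representation`), the toy vectors, and the hand-picked projection weights are all hypothetical; real deep multimodal methods learn these projections with neural encoders, and concatenation is only one of many fusion strategies.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def project(features: Vector, weights: Matrix) -> Vector:
    """Linear map into a shared space: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit length so neither modality dominates the fused vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def joint_representation(
    text_feat: Vector, image_feat: Vector, w_text: Matrix, w_image: Matrix
) -> Tuple[Vector, float]:
    """Return (fused vector, cross-modal similarity) for one text/image pair."""
    t = l2_normalize(project(text_feat, w_text))
    i = l2_normalize(project(image_feat, w_image))
    # List concatenation gives a simple joint (early-fusion) representation.
    return t + i, cosine(t, i)


# Hypothetical toy inputs: a 3-d text feature and a 2-d image feature,
# both mapped into a shared 2-d space by hand-picked (not learned) weights.
text_feat = [1.0, 0.0, 2.0]
image_feat = [0.5, 0.5]
w_text = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
w_image = [[2.0, 0.0], [0.0, 2.0]]

fused, similarity = joint_representation(text_feat, image_feat, w_text, w_image)
print(len(fused), round(similarity, 4))  # prints: 4 0.9487
```

Normalizing each projected vector before concatenation keeps the two modalities on a comparable scale, and the cosine score is one simple way to quantify the cross-modal correlations the abstract refers to.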
Keywords: machine learning; multimodal representation learning; Multimodality Robust Line Segment (MRLS)
Disclaimer: All papers published in IJRST will be indexed on Google Search Engine as per their policy.