Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

Summary

  1. In this paper, the authors propose DIMNets, a framework that formulates the problem of cross-modal matching of voices and faces without mapping voices to faces directly.
  2. In this framework, we can exploit multiple kinds of label information provided through covariates.

Research Objective

This paper focuses on devising computational mechanisms for cross-modal matching of voice recordings and images of the speakers’ faces.

Background and Problems

  • Background

    • The vocal tract that generates the voice also shapes the face, and humans have been shown to be able to associate voices of unknown individuals with pictures of their faces [5].
  • Problem Statement

    • The specific problem setting is one in which we have an existing database of samples of people’s voices and images of their faces.
    • We must automatically and accurately determine which voices match which faces.

Related work

  1. Nagrani et al. [14] formulate the mapping as a binary task: given a voice recording, one must select the speaker’s face from a pair of face images (or the reverse).
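This binary protocol can be sketched as a nearest-neighbor choice between two candidate face embeddings. A minimal sketch, assuming precomputed embedding vectors and cosine similarity (the function names and the similarity choice are my assumptions, not necessarily what [14] uses):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_face(voice_emb, face_a, face_b):
    # Binary task: return 0 if face_a better matches the voice, else 1.
    return 0 if cosine(voice_emb, face_a) >= cosine(voice_emb, face_b) else 1
```

Accuracy on this task is then simply the fraction of pairs for which the true face is selected.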

Method(s)

  • Objective
    Find an effective way to fuse multimodal representations, so as to improve the method’s top-N accuracy.

  • Methods

    • Use Bag of Words to extract textual descriptions, and adopt CNNs to extract raw features from items (i.e., videos) in the framework.
    • Use three kinds of autoencoders [45] to reduce dimensionality and learn features: (i) undercomplete autoencoders, (ii) sparse autoencoders, and (iii) denoising autoencoders.
    • Three different fusion architectures are proposed.
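The dimensionality-reduction step can be illustrated with a toy undercomplete autoencoder. A minimal numpy sketch under my own assumptions (a linear encoder/decoder trained by gradient descent on reconstruction error), not the paper’s actual architecture:

```python
import numpy as np

def train_linear_autoencoder(X, k, lr=0.1, epochs=300, seed=0):
    """Tiny undercomplete autoencoder: encode d-dim rows of X into k < d dims."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, k))
    W_dec = rng.normal(scale=0.1, size=(k, d))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc            # encode: (n, k) bottleneck code
        X_hat = Z @ W_dec        # decode: (n, d) reconstruction
        E = X_hat - X            # reconstruction error
        losses.append(float(np.mean(E ** 2)))
        # Gradient descent on mean squared reconstruction error.
        grad_dec = (Z.T @ E) / n
        grad_enc = (X.T @ (E @ W_dec.T)) / n
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses
```

The sparse and denoising variants mentioned above differ only in adding a sparsity penalty on the code, or corrupting the input before encoding, respectively.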

paper review : Disjoint Mapping Network for Cross-modal Matching of Voices and Faces

  • Recommender module :
    • Build an item-item similarity matrix B by exploiting the new representation of items given by the F matrix (the new representation for items obtained by fusing modalities using autoencoders).
    • Different from SSLIM (a traditional recommender), the method first computes the item-similarity matrix B from F, then uses line 7 to compute the argmin over S (R, F, and g are computed during the training period).
    • At test time, the top-N items by the resulting scores are returned as the recommendation result.
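The similarity-then-scoring step above can be sketched as follows. This is a minimal illustration assuming cosine similarity for B and a binary user-interaction vector r; the names and the similarity choice are my assumptions, not the paper’s exact SSLIM-style optimization:

```python
import numpy as np

def item_similarity(F):
    """Item-item similarity matrix B from fused item representations F (items x features)."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    Fn = F / np.clip(norms, 1e-12, None)   # row-normalize to unit length
    return Fn @ Fn.T                        # cosine similarity between all item pairs

def top_n(r, B, n=2):
    """Score items for a user with interaction vector r; return the top-n item ids."""
    scores = B @ r
    scores[r > 0] = -np.inf                 # exclude already-consumed items
    return np.argsort(-scores)[:n]
```

The key design point is that B is derived from the fused multimodal representation F rather than learned purely from the rating matrix, so content similarity drives the recommendations.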

Evaluation

  • Datasets:
  • MovieLens dataset: (i) ML-1M, and (ii) ML-10M
  • Vine dataset
  • Metrics: Normalized Discounted Cumulative Gain at top-N (NDCG@N)
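NDCG@N can be computed as below. This is a standard formulation with binary relevance, written as a hypothetical sketch rather than the paper’s exact evaluation code:

```python
import numpy as np

def ndcg_at_n(ranked_items, relevant, n):
    """NDCG@N for one user: ranked_items is the recommendation list,
    relevant is the set of ground-truth items (binary relevance)."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked_items[:n]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))       # discounted gain
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), n)))
    return dcg / ideal if ideal > 0 else 0.0
```

Dividing by the ideal DCG normalizes the score to [0, 1], so results are comparable across users with different numbers of relevant items.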

  • Baselines : 10 baselines

  • Results: too long to summarize here…

  • Analysis:

    • In this section, the authors analyze the impact of the fusion architecture, the impact of the autoencoder type, and the overall performance.

Conclusion

  • Main contributions
  1. Present three different architectures to learn multimodal representations of items.
  2. Conduct several experiments to analyze different aspects of the framework using three real-world datasets.
  • Weak points
  1. None noted.
  • Further work
  1. Investigate different deep learning architectures as well as other feature representations.
  2. Study how other modalities, such as audio, may impact the quality of suggested items.
  3. Consider other recommendation domains, such as social networks, products, and music content.

References (optional)

Takeaways for me

  • In this paper, the authors use a section to introduce fundamental concepts relevant to the work. I should pay more attention to this practice: such a section makes a paper easier to understand.
  • I should study how to write a related-work section. The authors organize theirs logically; mine is less so.
  • I need to add curve figures to my experiments.
  • Add more citations; don’t be shy.
