Background

Recently, I found a good course on multimodal machine learning. In this blog, I will study it and take notes on what I understand.
Here is the original URL: ppt

Objective:

Master multimodal datasets and tasks.

Key results:

▪ Identify tasks/applications of multimodal machine learning
▪ Knowledge of available datasets to tackle the challenges
▪ Appreciation of the current state of the art

1. Identify tasks/applications of multimodal machine learning

1. Affect recognition

Metrics:

  • Discrete emotion labels
  • Continuous arousal and valence
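
The two kinds of labels are usually scored differently. Discrete emotion categories suit plain classification accuracy. Continuous arousal/valence traces are often scored with the concordance correlation coefficient (CCC), which the AVEC challenges, for example, have reported. The sketch below is minimal and assumes predictions come as plain Python lists. The function names are hypothetical.

```python
from statistics import fmean


def accuracy(pred_labels, true_labels):
    """Share of clips whose predicted emotion category equals the gold label."""
    if len(pred_labels) != len(true_labels) or not true_labels:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == t for p, t in zip(pred_labels, true_labels))
    return hits / len(true_labels)


def ccc(pred, gold):
    """Concordance correlation coefficient between two continuous traces,
    e.g. per-frame arousal or valence predictions vs. annotations.

    Unlike Pearson correlation, it also penalizes shifts in mean and scale.
    """
    if len(pred) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    # denom == 0 only when both traces are the same constant: perfect agreement
    return 2 * cov / denom if denom else 1.0


print(accuracy(["happy", "sad", "angry"], ["happy", "sad", "neutral"]))
print(ccc([0.2, 0.3, 0.4], [0.1, 0.2, 0.3]))  # same shape, offset by 0.1
```

The shifted trace in the last line is perfectly correlated with the annotation. Because of the offset, its CCC still lands well below 1, and that is why CCC is preferred over plain correlation for continuous ratings.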

Datasets:
  • AFEW – Acted Facial Expressions in the Wild (part of EmotiW Challenge)
  • AVEC challenge datasets: 2011/2012, 2013/2014, 2015, 2016, 2017, 2018
  • The Interactive Emotional Dyadic Motion Capture (IEMOCAP)
  • Persuasive Opinion Multimedia (POM)
  • Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos (MOSI)
  • Multimodal sentiment and emotion recognition (CMU-MOSEI)
  • Tumblr Dataset – Tumblr posts with images and emotion word tags.
  • Multimodal humor sensing (video from an RGB-D camera, but no audio/language)

State of the art:
2015 challenge winner: fusion with a multimodal bidirectional LSTM (BiLSTM).
(Slide image: 11-777 lecture 1.2, Datasets and Tasks)
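
The notes only say that the 2015 winner used a multimodal BiLSTM for fusion. One common way to feed several modalities into a single recurrent model is early (feature-level) fusion: concatenate the per-time-step features of each modality before the sequence model. The sketch below covers that input step only and is not the winning system. It assumes both modalities are already resampled to the same frame rate. `early_fuse` and the feature values are hypothetical, and the BiLSTM itself is omitted.

```python
def early_fuse(audio_frames, video_frames):
    """Early (feature-level) fusion: concatenate time-aligned feature vectors.

    audio_frames, video_frames: lists with one feature list per time step,
    assumed to be resampled to a common frame rate beforehand.
    Returns one fused vector per time step.
    """
    if len(audio_frames) != len(video_frames):
        raise ValueError("modalities must cover the same number of time steps")
    return [list(a) + list(v) for a, v in zip(audio_frames, video_frames)]


audio = [[0.1, 0.4], [0.2, 0.5], [0.3, 0.6]]  # 2 hypothetical audio features per step
video = [[1.0], [0.0], [1.0]]                 # 1 hypothetical facial feature per step
fused = early_fuse(audio, video)
# fused[t] is what a sequence model such as a BiLSTM would read at step t
print(len(fused), len(fused[0]))  # → 3 3
```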
EmotiW 2016 winner: hybrid CNN-RNN and C3D networks (late fusion)
(Slide image: 11-777 lecture 1.2, Datasets and Tasks)
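
Late (decision-level) fusion works differently from early fusion. Each modality-specific network produces its own class scores, and only those scores are combined. The notes do not say which combination rule the 2016 winner used. The sketch below simply assumes a weighted average of class probabilities. The model names, scores and weights are hypothetical.

```python
def late_fuse(scores_by_model, weights=None):
    """Late (decision-level) fusion by a weighted average of class scores.

    scores_by_model: {model_name: {class_name: probability}}, one entry per
    modality-specific model; all models must score the same classes.
    weights: optional {model_name: weight}; defaults to equal weights.
    Returns (fused_scores, predicted_class).
    """
    models = list(scores_by_model)
    if weights is None:
        weights = {m: 1.0 for m in models}
    total = sum(weights[m] for m in models)
    classes = scores_by_model[models[0]]
    fused = {
        c: sum(weights[m] * scores_by_model[m][c] for m in models) / total
        for c in classes
    }
    return fused, max(fused, key=fused.get)


# Hypothetical per-clip outputs of three independently trained models
outputs = {
    "cnn_rnn": {"happy": 0.6, "sad": 0.3, "neutral": 0.1},
    "c3d": {"happy": 0.2, "sad": 0.7, "neutral": 0.1},
    "audio": {"happy": 0.5, "sad": 0.2, "neutral": 0.3},
}
print(late_fuse(outputs)[1])                                   # → happy
print(late_fuse(outputs, {"cnn_rnn": 1, "c3d": 2, "audio": 1})[1])  # → sad
```

Changing the weights flips the decision. This is why fusion weights are usually tuned on a validation set rather than fixed by hand.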
EmotiW 2017 winner: Learning Supervised Scoring Ensemble
(Slide image: 11-777 lecture 1.2, Datasets and Tasks)

2. Personality/trait recognition

Datasets:

  • VGD – Video Game Dataset: game ratings based on text and trailer screenshots.
  • Multimodal Dyadic Behaviour Database

3. Media description

Task 1, media description: given a piece of media (image, video, or audiovisual clip), provide a free-form text description.

Task 2, visual question answering (VQA): given an image and a question, answer the question.

Task 3, referring expressions: generation (bounding box to text) and comprehension (text to bounding box).

Task 4: visual dialog.
(Slide image: 11-777 lecture 1.2, Datasets and Tasks)
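
Two of these tasks have scoring rules simple enough to sketch:

  • Referring expression comprehension is commonly scored by the intersection over union (IoU) between the predicted and ground-truth boxes, with a hit when IoU exceeds 0.5.
  • VQA answers are often compared against several human answers with a rule of the form min(#matching humans / 3, 1).

This is a minimal sketch that assumes boxes are (x1, y1, x2, y2) tuples. The function names are hypothetical. The official VQA evaluation also averages over annotator subsets and normalizes answers more thoroughly than this.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def comprehension_hit(pred_box, gold_box, threshold=0.5):
    """Referring expression comprehension: a hit when IoU exceeds the threshold."""
    return iou(pred_box, gold_box) > threshold


def vqa_accuracy(pred, human_answers):
    """Simplified VQA accuracy: an answer counts fully once 3 annotators gave it."""
    norm = pred.strip().lower()
    matches = sum(a.strip().lower() == norm for a in human_answers)
    return min(matches / 3, 1.0)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
print(comprehension_hit((0, 0, 2, 2), (0, 0, 2, 2)))
print(vqa_accuracy("Two", ["two", "2", "two", "three"]))
```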

Datasets:

  • Microsoft Common Objects in COntext (MS COCO)
  • MPII Movie Description dataset
  • Montréal Video Annotation dataset

4. Event detection

Tasks:
▪ Given video/audio/text, detect predefined events or scenes.
▪ Segment events in a stream.
▪ Summarize videos.
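
At the output end, segmenting events in a stream often comes down to turning per-frame predictions into labeled time intervals. This is a minimal sketch. It assumes a hypothetical per-frame classifier has already produced one label per frame, and that "none" marks frames with no event. The function and label names are illustrative only.

```python
def frames_to_segments(frame_labels, background="none", fps=1.0):
    """Merge runs of identical per-frame labels into (label, start_s, end_s).

    Frames labeled `background` are dropped; end times are exclusive.
    """
    segments = []
    current, start = None, 0
    # A trailing background sentinel closes the final run.
    for i, label in enumerate(list(frame_labels) + [background]):
        if label != current:
            if current is not None and current != background:
                segments.append((current, start / fps, i / fps))
            current, start = label, i
    return segments


labels = ["none", "chop", "chop", "stir", "none", "none", "stir"]
print(frames_to_segments(labels))
# → [('chop', 1.0, 3.0), ('stir', 3.0, 4.0), ('stir', 6.0, 7.0)]
```

Real systems usually smooth the frame labels first, so that one noisy frame does not split a segment. This sketch skips that step.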

Datasets:

  • Cooking dataset
    (Slide image: 11-777 lecture 1.2, Datasets and Tasks)
  • (others omitted)

5. Cross-media retrieval

Task: given one form of media, retrieve related media in another form; for example, given text, retrieve images, or given an image, retrieve relevant documents.
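
Cross-media retrieval is often framed as nearest-neighbor search in a shared embedding space. Encoders (not shown) map text and images to vectors, and candidates are ranked by similarity to the query. This is a minimal sketch that assumes such embeddings already exist. The toy vectors, file names and function names are hypothetical. Recall@K is included as one common way to score retrieval.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_vec, gallery, k=5):
    """Rank gallery items (id -> embedding) by similarity to the query."""
    ranked = sorted(gallery, key=lambda item: cosine(query_vec, gallery[item]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(queries, gallery, relevant, k=5):
    """Share of queries whose relevant item appears in the top-k results."""
    hits = sum(relevant[q] in retrieve(vec, gallery, k) for q, vec in queries.items())
    return hits / len(queries)


# Hypothetical image embeddings and text-query embeddings in a shared space
images = {"kitchen.jpg": [1.0, 0.0, 0.0],
          "bedroom.jpg": [0.0, 1.0, 0.0],
          "sofa.jpg": [0.7, 0.7, 0.0]}
texts = {"a bright modern kitchen": [0.9, 0.1, 0.0],
         "a cozy bedroom": [0.1, 0.9, 0.0]}
gold = {"a bright modern kitchen": "kitchen.jpg", "a cozy bedroom": "bedroom.jpg"}

print(retrieve(texts["a bright modern kitchen"], images, k=2))  # → ['kitchen.jpg', 'sofa.jpg']
print(recall_at_k(texts, images, gold, k=1))                    # → 1.0
```

Going from image to text is the same search with the roles swapped: the image embedding becomes the query and the text embeddings form the gallery.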

Datasets:

  • Interior Design Dataset – Retrieve desired product using room photos and text queries.
    (Slide image: 11-777 lecture 1.2, Datasets and Tasks)
    … (too many to list)

6. Main challenges during experiments

(Slide image: 11-777 lecture 1.2, Datasets and Tasks)
