background
Recently, I found a good course on multimodal machine learning. In this blog post, I will study it and note down my understanding.
Here is the original URL: ppt
O (objective)
master multimodal datasets and tasks
KR (key results)
▪ Identify tasks/applications of multimodal machine learning
▪ Knowledge of available datasets to tackle the challenges
▪ Appreciation of current state-of-the-art
1. Identify tasks/applications of multimodal machine learning
1. Affect recognition
metric:
- emotion labels
- arousal, valence
dataset:
- AFEW – Acted Facial Expressions in the Wild (part of the EmotiW Challenge)
- Three AVEC challenge datasets 2011/2012, 2013/2014, 2015, 2016, 2017, 2018
- The Interactive Emotional Dyadic Motion Capture (IEMOCAP)
- Persuasive Opinion Multimedia (POM)
- Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos (MOSI)
- Multimodal sentiment and emotion recognition (CMU-MOSEI)
- Tumblr Dataset – Tumblr posts with images and emotion word tags.
- Multimodal humor sensing (video from an RGB-D camera, but no audio/language)
State of the art:
2015 challenge winner: uses a multimodal BiLSTM for fusion.
EmotiW 2016 winner: uses CNN-RNN and C3D hybrid networks (late fusion)
EmotiW 2017 winner: learning a supervised scoring ensemble
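To make the "late fusion" idea concrete, here is a minimal sketch, not the winners' actual method: each model predicts class probabilities on its own, and the predictions are combined only at the end by a weighted average. The function names, the emotion labels, and the probability values are all hypothetical and only serve as an illustration.

```python
def late_fusion(prob_sets, weights=None):
    """Combine per-model class probabilities by a weighted average."""
    n_models = len(prob_sets)
    if weights is None:
        # Assumption: with no weights given, every model counts equally.
        weights = [1.0 / n_models] * n_models
    fused = [0.0] * len(prob_sets[0])
    for probs, weight in zip(prob_sets, weights):
        for i, p in enumerate(probs):
            fused[i] += weight * p
    return fused


def predict(prob_sets, labels, weights=None):
    """Return the label with the highest fused probability."""
    fused = late_fusion(prob_sets, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Hypothetical outputs of two separately trained models for one video clip.
labels = ["angry", "happy", "neutral"]
cnn_rnn_probs = [0.2, 0.5, 0.3]
c3d_probs = [0.1, 0.3, 0.6]

equal_vote = predict([cnn_rnn_probs, c3d_probs], labels)
weighted_vote = predict([cnn_rnn_probs, c3d_probs], labels, weights=[0.8, 0.2])
```

With equal weights the second model's confidence in "neutral" wins; giving the first model more weight flips the decision to "happy". In practice such weights would probably be tuned on a validation set.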
2. Personality/trait recognition
dataset:
- VGD – Video Game Dataset, game rating based on text and trailer screenshots.
- Multimodal Dyadic Behaviour Database
3. Media description
task one: Media description. Given a piece of media (image, video, audiovisual clip), provide a free-form text description.
task two: VQA. Given an image and a question, answer the question.
task three: Referring expression. Generation (bounding box to text) and comprehension (text to bounding box).
task four: Visual dialog
dataset:
- Microsoft Common Objects in COntext (MS COCO)
- MPII Movie Description dataset
- Montréal Video Annotation dataset
- …
4. Event detection
task:
▪ Given video/audio/text, detect predefined events or scenes.
▪ Segment events in a stream.
▪ Summarize videos.
dataset:
-
- cooking dataset
- cooking dataset
- (others omitted)
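The "segment events in a stream" task can be sketched as follows, assuming that some upstream classifier has already assigned one event label to every frame. The labels, the "none" background class, and the function name are hypothetical; the sketch only shows how per-frame labels become event segments.

```python
from itertools import groupby


def segment_events(frame_labels, background="none"):
    """Group consecutive identical frame labels into (label, start, end) segments.

    `end` is exclusive; frames labelled as background are skipped.
    """
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label != background:
            segments.append((label, index, index + length))
        index += length
    return segments


# Hypothetical per-frame predictions for a short cooking video.
frames = ["none", "chop", "chop", "chop", "none", "stir", "stir"]
segments = segment_events(frames)
```

Here `segments` holds one entry per detected event together with its frame range, which is also a natural starting point for summarizing a video by its events.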
5. Cross-media retrieval
task: Given one form of media, retrieve related forms of media. For example, given text retrieve images, or given an image retrieve relevant documents.
dataset:
- Interior Design Dataset – Retrieve desired product using room photos and text queries.
… (too many to list)
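As a rough illustration of the cross-media retrieval task described above, here is a minimal sketch that assumes text and images have already been mapped into one shared embedding space by some model. The file names, the vectors, and the query are made up; only the ranking step is shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, gallery, top_k=2):
    """Return the names of the top_k gallery items most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in gallery.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical image embeddings in a shared text-image space.
image_embeddings = {
    "sofa.jpg": [0.9, 0.1, 0.0],
    "lamp.jpg": [0.1, 0.9, 0.1],
    "rug.jpg": [0.7, 0.2, 0.6],
}
# Hypothetical embedding of a text query such as "a grey sofa".
text_query = [1.0, 0.0, 0.1]

results = retrieve(text_query, image_embeddings)
```

The same ranking works in the other direction (image query, text gallery) as long as both modalities share the embedding space.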