Table of Contents
Multimodal data fusion framework based on autoencoders for top-N recommender systems
Summary
In this paper, the authors design a new framework based on autoencoders to improve the quality of video recommendation. The method's effectiveness is confirmed on the MovieLens datasets and the Vine dataset.
Research Objective
The authors aim to combine visual and textual information to improve the top-N accuracy of video recommendation.
Background and Problems
- Background
  - Recommender systems focus on top-N recommendation. Previous methods can be divided into latent space methods and neighbourhood-based methods (user-based and item-based). In recent years, many studies use movie descriptions instead of focusing only on users' past items.
- Brief introduction to previous methods
  - Latent space methods: commonly used to solve the rating prediction task.
  - Neighbourhood-based methods (user-based and item-based): item-based methods use a predefined measure to calculate similarities between items, and generally perform better than user-based methods.
- Problem Statement
  - When items are described in high-dimensional spaces, item-based methods do not work well, so some studies use Sparse Linear Methods (SLIM) to address this.
  - How to combine multimodal information effectively is still an open problem. Furthermore, there was no framework for top-N video recommendation.
Related work
Omitted (this note added by me).
Method(s)
- Objective
  - Use an effective way to fuse multimodal representations, which should make the method's top-N accuracy higher.
- Methods
- Use Bag of Words to extract textual descriptions, and adopt a CNN to extract raw visual features from items (i.e., videos) in the framework.
- Pass the raw features through three kinds of autoencoders [45] to reduce dimensionality and learn features: (i) undercomplete autoencoders, (ii) sparse autoencoders, and (iii) denoising autoencoders.
- Recommender module:
  - Build an item-item similarity matrix B by exploiting the new representation of items given by the F matrix (the representation obtained by fusing modalities with autoencoders).
  - Different from SSLIM (a traditional recommender), the method first computes the item similarity matrix B from F, then uses line 7 to compute the argmin S (R, F, g are computed during training).
  - At test time, return the top-N items from S as the recommendation result.
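The recommender step above can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: it assumes the fused item representations are already available as a matrix `F` (items × features), builds the item-item similarity matrix `B` with cosine similarity (a common choice; the paper learns B via its SLIM-style optimization), and scores unseen items for a user with a simple similarity-weighted sum over the user's history.

```python
import numpy as np

def cosine_similarity_matrix(F):
    """Item-item similarity matrix B from fused item representations F."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # guard against all-zero feature vectors
    Fn = F / norms
    B = Fn @ Fn.T
    np.fill_diagonal(B, 0.0)         # an item should not recommend itself
    return B

def recommend_top_n(user_row, B, n=3):
    """Score items by similarity to the user's consumed items, return top-N new items."""
    scores = user_row @ B            # similarity-weighted aggregation
    scores[user_row > 0] = -np.inf   # exclude items the user has already seen
    return np.argsort(-scores)[:n]

# Toy example: 4 items with 2-dimensional fused features.
F = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
B = cosine_similarity_matrix(F)
user = np.array([1.0, 0.0, 0.0, 0.0])   # the user has seen item 0 only
top = recommend_top_n(user, B, n=2)      # → items most similar to item 0
```

Here item 1 (nearly identical features to item 0) ranks first, then item 3, matching the intuition that the fused representation drives the neighbourhood.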
Evaluation
- Dataset:
- MovieLens dataset: (i) ML-1M, and (ii) ML-10M
- Vine dataset
- Metrics: Normalized Discounted Cumulative Gain at top-N (NDCG@N)
- Baselines: 10 baselines
- Results: omitted here (too long).
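The evaluation metric can be made concrete with a short sketch. This is a standard NDCG@N computation written from the metric's definition (log2 discount, binary relevance), not code from the paper:

```python
import math

def dcg_at_n(relevances, n):
    """Discounted cumulative gain over the first n ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:n]))

def ndcg_at_n(ranked_items, relevant_items, n):
    """NDCG@N with binary relevance: gain 1 if a recommended item is relevant."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    ideal = [1.0] * min(len(relevant_items), n)      # best possible ranking
    idcg = dcg_at_n(ideal, n)
    return dcg_at_n(gains, n) / idcg if idcg > 0 else 0.0

# A relevant item ranked first gives a perfect score of 1.0;
# ranking it second discounts the gain by 1/log2(3).
perfect = ndcg_at_n(["a", "b", "c"], {"a"}, n=3)
second = ndcg_at_n(["b", "a", "c"], {"a"}, n=3)
```

The discount makes NDCG rank-sensitive, which is why it is preferred over plain hit rate for top-N evaluation.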
- Analysis:
  - In this section, the authors analyze the impact of the fusion architecture, the impact of the autoencoder type, and the overall performance.
Conclusion
- Main contributions
- present three different architectures to learn multimodal representations of items.
- conduct several experiments to analyze different aspects of our framework using three real-world datasets.
- Weak points
  - Not discussed in the paper.
- Further work
- investigate different Deep Learning architectures as well as other feature representations.
- study how other modalities, such as audio, may impact the quality of suggested items.
- consider other recommendation domains, such as social networks, products, and music content.
References (optional)
Takeaways for me
- In this paper, the authors use a whole section to introduce fundamental concepts relevant to the work. I should pay more attention to this: such a section makes a paper easier to understand.
- I should study how to write a related-work section. The authors organize theirs logically, while mine is not very logical.
- I need to add curve figures to my experiments.
- Add more citations; don't be shy.