Table of Contents
Multimodal data fusion framework based on autoencoders for top-N recommender systems
Summary
In this paper, the authors design a new framework based on autoencoders to improve the quality of video recommendation. The method's effectiveness is confirmed on the MovieLens datasets and the Vine dataset.
Research Objective
The authors aim to combine visual and textual information to improve the top-N accuracy of video recommendation.
Background and Problems
- Background
  - Recommender systems focus on top-N recommendation. Previous methods can be divided into latent space methods and neighbourhood-based methods (user-based and item-based). In recent years, many studies use movie descriptions instead of focusing only on users' past items.
- Brief introduction to previous methods
  - Latent space methods: commonly used to solve the rating prediction task.
  - Neighbourhood-based methods (user-based and item-based): item-based methods use a predefined measure to calculate similarities between items, and generally perform better than user-based methods.
- Problem Statement
  - When items are described in high-dimensional spaces, item-based methods do not work well, so some studies use Sparse Linear Methods (SLIM) to address this.
  - How to combine multimodal information effectively is still an open problem. Furthermore, there was no framework for top-N video recommendation.
Related work
Omitted (this note added by me).
Method(s)
- Objective
  - Use an effective way to fuse multimodal representations, which should make the method's top-N accuracy higher.
- Methods
- Use Bag of Words to extract textual descriptions, and adopt a CNN to extract raw visual features from items (i.e., videos) in the framework.
- Pass the raw features through three kinds of autoencoders [45] to reduce dimensionality and learn features: (i) undercomplete autoencoders, (ii) sparse autoencoders, and (iii) denoising autoencoders.
- Recommender module:
  - Build an item-item similarity matrix B by exploiting the new representation of items given by the F matrix (the representation obtained by fusing modalities with autoencoders).
  - Different from SSLIM (a traditional recommender), the method first computes the item similarity matrix B from F, then uses line 7 to compute the argmin S (R, F, g are computed during training).
  - At test time, return the top-N items from S as the recommendation result.
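The recommender step above can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: it assumes the fused item representations are already available as a matrix `F` (items × features), builds the item-item similarity matrix `B` with cosine similarity (a common choice; the paper learns B via its SLIM-style optimization), and scores unseen items for a user with a simple similarity-weighted sum over the user's history.

```python
import numpy as np

def cosine_similarity_matrix(F):
    """Item-item similarity matrix B from fused item representations F."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # guard against all-zero feature vectors
    Fn = F / norms
    B = Fn @ Fn.T
    np.fill_diagonal(B, 0.0)         # an item should not recommend itself
    return B

def recommend_top_n(user_row, B, n=3):
    """Score items by similarity to the user's consumed items, return top-N new items."""
    scores = user_row @ B            # similarity-weighted aggregation
    scores[user_row > 0] = -np.inf   # exclude items the user has already seen
    return np.argsort(-scores)[:n]

# Toy example: 4 items with 2-dimensional fused features.
F = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
B = cosine_similarity_matrix(F)
user = np.array([1.0, 0.0, 0.0, 0.0])   # the user has seen item 0 only
top = recommend_top_n(user, B, n=2)      # → items most similar to item 0
```

Here item 1 (nearly identical features to item 0) ranks first, then item 3, matching the intuition that the fused representation drives the neighbourhood.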
Evaluation
- Dataset:
- MovieLens dataset: (i) ML-1M, and (ii) ML-10M
- Vine dataset
- Metrics: Normalized Discounted Cumulative Gain at top-N (NDCG@N)
- Baselines: 10 baselines
- Results: omitted here (too long).
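The evaluation metric can be made concrete with a short sketch. This is a standard NDCG@N computation written from the metric's definition (log2 discount, binary relevance), not code from the paper:

```python
import math

def dcg_at_n(relevances, n):
    """Discounted cumulative gain over the first n ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:n]))

def ndcg_at_n(ranked_items, relevant_items, n):
    """NDCG@N with binary relevance: gain 1 if a recommended item is relevant."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    ideal = [1.0] * min(len(relevant_items), n)      # best possible ranking
    idcg = dcg_at_n(ideal, n)
    return dcg_at_n(gains, n) / idcg if idcg > 0 else 0.0

# A relevant item ranked first gives a perfect score of 1.0;
# ranking it second discounts the gain by 1/log2(3).
perfect = ndcg_at_n(["a", "b", "c"], {"a"}, n=3)
second = ndcg_at_n(["b", "a", "c"], {"a"}, n=3)
```

The discount makes NDCG rank-sensitive, which is why it is preferred over plain hit rate for top-N evaluation.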
- Analysis:
  - In this section, the authors analyze the impact of the fusion architecture, the impact of the autoencoder type, and the overall performance.
Conclusion
- Main contributions
- present three different architectures to learn multimodal representations of items.
- conduct several experiments to analyze different aspects of our framework using three real-world datasets.
- Weak points
  - Not discussed in the paper.
- Further work
- investigate different Deep Learning architectures as well as other feature representations.
- study how other modalities, such as audio, may impact the quality of suggested items.
- consider other recommendation domains, such as social networks, products, and music content.
References (optional)
Takeaways for me
- In this paper, the authors use a whole section to introduce fundamental concepts relevant to the work. I should pay more attention to this: such a section makes a paper easier to understand.
- I should study how to write a related-work section. The authors organize theirs logically, while mine is not very logical.
- I need to add curve figures to my experiments.
- Add more citations; don't be shy.