In many classification problems, the input is represented as a set of features, e.g., the bag-of-words (BoW) representation of documents. Support vector machines (SVMs) are widely used tools for such classification problems. SVM performance generally depends on whether kernel values between data points can be defined properly. However, SVMs for BoW representations have a major weakness: the co-occurrence of different but semantically similar words cannot be reflected in the kernel calculation. To overcome this weakness, we propose a kernel-based discriminative classifier for BoW data, which we call the latent support measure machine (latent SMM). With the latent SMM, a latent vector is associated with each vocabulary term, and each document is represented as a distribution of the latent vectors for the words appearing in the document. To represent the distributions efficiently, we use kernel embeddings of distributions, which hold higher-order moment information about the distributions. The latent SMM then finds a separating hyperplane that maximizes the margins between distributions of different classes while estimating the latent vectors for words to improve classification performance. In the experiments, we show that the latent SMM achieves state-of-the-art accuracy for BoW text classification, is robust with respect to its own hyper-parameters, and is useful for visualizing words.
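As a rough illustration of the idea of representing a document as a distribution of latent word vectors, the sketch below computes a kernel between two documents as the inner product of their empirical kernel mean embeddings. It is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian RBF kernel, the function names `rbf_kernel` and `document_kernel`, the `gamma` parameter, and the randomly drawn latent vectors are all illustrative choices. In the latent SMM the latent vectors are estimated jointly with the classifier, which this sketch does not do.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def document_kernel(doc_a, doc_b, latent, gamma=1.0):
    """Inner product of the empirical kernel mean embeddings of two documents.

    doc_a, doc_b: sequences of vocabulary indices, one entry per word token.
    latent: (vocab_size, dim) array holding one latent vector per vocabulary term.
    """
    Xa = latent[np.asarray(doc_a)]
    Xb = latent[np.asarray(doc_b)]
    # Averaging all pairwise kernel values gives <mu_a, mu_b> in the RKHS.
    return rbf_kernel(Xa, Xb, gamma).mean()


# Hypothetical toy vocabulary of 5 terms with 2-dimensional latent vectors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(5, 2))
latent[1] = latent[0] + 0.01  # assumption: terms 0 and 1 are near-synonyms

doc_x = [0, 2]  # shares term 2 with doc_y; term 0 is close to term 1
doc_y = [1, 2]
doc_z = [3, 4]

k_xy = document_kernel(doc_x, doc_y, latent)
k_xz = document_kernel(doc_x, doc_z, latent)
print(f"k(x, y) = {k_xy:.4f}, k(x, z) = {k_xz:.4f}")
```

Two documents that contain different but nearby terms (here, terms 0 and 1) still receive a large kernel value, whereas a plain BoW linear kernel on the single-word documents `[0]` and `[1]` would be zero. The resulting document-by-document kernel matrix could then be passed to any kernel classifier that accepts a precomputed kernel.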
Author Information
Yuya Yoshikawa (Chiba Institute of Technology)
Tomoharu Iwata (NTT)
Hiroshi Sawada (NTT Service Evolution Labs.)
More from the Same Authors
- 2022 Poster: Symplectic Spectrum Gaussian Processes: Learning Hamiltonians from Noisy and Sparse Data
  Yusuke Tanaka · Tomoharu Iwata · Naonori Ueda
- 2022 Poster: Few-shot Learning for Feature Selection with Hilbert-Schmidt Independence Criterion
  Atsutoshi Kumagai · Tomoharu Iwata · Yasutoshi Ida · Yasuhiro Fujiwara
- 2022 Poster: Sharing Knowledge for Meta-learning with Feature Descriptions
  Tomoharu Iwata · Atsutoshi Kumagai
- 2021 Poster: Meta-Learning for Relative Density-Ratio Estimation
  Atsutoshi Kumagai · Tomoharu Iwata · Yasuhiro Fujiwara
- 2021 Poster: Loss function based second-order Jensen inequality and its application to particle variational inference
  Futoshi Futami · Tomoharu Iwata · Naonori Ueda · Issei Sato · Masashi Sugiyama
- 2019 Poster: Transfer Anomaly Detection by Inferring Latent Domain Representations
  Atsutoshi Kumagai · Tomoharu Iwata · Yasuhiro Fujiwara
- 2019 Poster: Spatially Aggregated Gaussian Processes with Multivariate Areal Outputs
  Yusuke Tanaka · Toshiyuki Tanaka · Tomoharu Iwata · Takeshi Kurashima · Maya Okawa · Yasunori Akagi · Hiroyuki Toda
- 2016 Poster: Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models
  Tomoharu Iwata · Makoto Yamada
- 2015 Poster: Cross-Domain Matching for Bag-of-Words Data via Kernel Embeddings of Latent Distributions
  Yuya Yoshikawa · Tomoharu Iwata · Hiroshi Sawada · Takeshi Yamada