Identifying Legality of Japanese Online Advertisements Using Complex-Valued Support Vector Machine with DFT-Based Document Features

EasyChair Preprint 6692
14 pages • Date: September 25, 2021

Abstract

As the Internet advertising market expands, the number of advertisements containing inappropriate language is increasing. Advertisements that exaggerate the efficacy of products may contravene the Pharmaceutical Affairs Law and the Act against Unjustifiable Premiums and Misleading Representations. Therefore, a system that can detect problematic expressions is required. Some advertisements cannot be classified using word statistics alone, so embedding other information in the features, such as word order and word period, is effective for categorizing documents. However, the amount of labeled advertising data is limited; consequently, models with complex structures tend to overfit. Features and discriminant models that generalize well must therefore be found even when little data is available. To address these issues, we propose a document feature based on the discrete Fourier transform (DFT) of word vectors weighted using an index previously proposed in a study that attempted to categorize Chinese online advertisements. We also propose a document discriminant model based on a complex-valued support vector machine. We demonstrate that the proposed model outperforms previous models in discriminative performance as measured by the F-measure. We found that the proposed index emphasizes the word vectors of specific nouns and verbs in Japanese advertisements. In addition, we found that the DFT significantly increased the norms of the document vectors of illegal documents. These factors contributed to the better performance of the proposed model.

Keyphrases: Complex-valued Support Vector Machine, Discrete Fourier Transform, Internet advertisement, Natural Language Processing
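
As a rough illustration of the DFT-based document feature mentioned in the abstract, the following minimal sketch applies per-word weights to a sequence of word vectors and takes the DFT along the word-position axis. The abstract does not give the weighting index, the number of frequency components, or the exact feature layout, so the function name `dft_document_feature`, the `weights` argument, and the `n_freq` parameter are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def dft_document_feature(word_vectors, weights, n_freq=2):
    """Build a complex-valued document feature (illustrative sketch only).

    word_vectors: (T, d) array, one row per word in document order.
    weights:      (T,) array of per-word weights; a hypothetical stand-in
                  for the weighting index referred to in the abstract.
    n_freq:       number of low-frequency DFT components kept (assumed).
    """
    x = np.asarray(word_vectors, dtype=float)
    w = np.asarray(weights, dtype=float)
    weighted = x * w[:, None]  # scale each word vector by its weight
    # DFT over word positions; zero-pad if the document is shorter than n_freq.
    spectrum = np.fft.fft(weighted, n=max(len(x), n_freq), axis=0)
    # Keep the first n_freq components and flatten to one complex vector.
    return spectrum[:n_freq].reshape(-1)


# Toy document: three words with 2-dimensional word vectors.
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights = np.array([1.0, 2.0, 1.0])
feature = dft_document_feature(vectors, weights, n_freq=2)
```

In this sketch the zero-frequency component is simply the weighted sum of the word vectors, which carries no word-order information; the higher-frequency components depend on word positions. The resulting complex vector is the kind of input a complex-valued classifier would consume, but the classifier itself is not sketched here.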