CyberDTD: A Multimodal Benchmark Dataset for Cyberbullying Detection in Tunisian Dialect

AUTHORS

  • Bechir Sahar Ben
  • Mekki Asma
  • Badache Ismail
  • Ellouze Mariem
  • Belguith Lamia Hadrich

KEYWORDS

  • Natural Language Processing (NLP)
  • Cyberbullying
  • Tunisian Dialect
  • Multimodal Dataset

DOCUMENT TYPE

  Conference paper

ABSTRACT

    Effective detection of cyberbullying requires understanding both textual and visual signals, including images with embedded text and user-generated comments. This need is even more evident in low-resource and multilingual environments such as Tunisia. In this context, this paper introduces CyberDTD (Cyberbullying Detection in Tunisian Dialect), a multimodal dataset designed to support research on cyberbullying detection in the Tunisian Dialect (TD). With 10,802 images across five categories (humor, sarcasm, hate, violence, and neutral), CyberDTD is, to the best of our knowledge, the first cyberbullying dataset in TD. We provide a comprehensive description of the dataset, which covers a wide range of online harassment and also includes neutral examples for balanced analysis. We highlight key challenges such as class imbalance, multimodality, and cultural specificity. CyberDTD is an important resource for building and evaluating machine learning models in low-resource settings, supporting the development of more robust and culturally aware cyberbullying detection systems.
