CyberDTD: A Multimodal Benchmark Dataset for Cyberbullying Detection in Tunisian Dialect
Document type
Conference papers
Abstract
Effective detection of cyberbullying requires understanding both textual and visual signals, including images with embedded text and user-generated comments. This need is even more pressing in low-resource, multilingual environments such as Tunisia. In this context, this paper introduces CyberDTD (Cyberbullying Detection in Tunisian Dialect), a multimodal dataset designed to support research on cyberbullying detection in the Tunisian Dialect (TD). CyberDTD contains 10,802 images spanning five categories: humor, sarcasm, hate, violence, and neutral. To the best of our knowledge, it is the first cyberbullying dataset in TD. We provide a comprehensive description of the dataset, which covers a wide range of online harassment types and also includes neutral examples for balanced analysis. We highlight key challenges such as class imbalance, multimodality, and cultural specificity. CyberDTD is an important resource for building and evaluating machine learning models in low-resource settings, supporting the development of more robust and culturally aware cyberbullying detection systems.