Text Classification Analysis of Slang Words on Social Media Using Natural Language Processing for Trending Topics
DOI: https://doi.org/10.30865/klik.v5i1.2018
Keywords: Text Analysis; Slang Words; Social Media; NLP; Classification
Abstract
This study analyzes trending topics related to the use of slang words on social media using natural language processing (NLP) techniques. Its main focus is to understand the patterns and trends of slang use on social media platforms, which can reveal important social and linguistic dynamics. The dataset consists of Indonesian- and English-language tweets containing slang words, collected from Twitter over a six-month period. The analysis begins with data cleaning to remove irrelevant elements, followed by tokenization and lemmatization to normalize the text. Support Vector Machine (SVM) and Random Forest classification models are then applied to detect and classify slang words in the dataset. The SVM model achieves a slang-detection accuracy of 88% with an F1-score of 0.87, while the Random Forest model achieves an accuracy of 85% with an F1-score of 0.84. Further linguistic analysis shows that 60% of the slang words occur mainly in informal contexts such as everyday conversation, while the remaining 40% relate to popular-culture trends, including music, movies, and fashion. The findings also reveal differences in slang use between Indonesian- and English-speaking Twitter users: Indonesian slang tends to be more creative and context-dependent, whereas English slang is more standardized and spread globally. The study confirms the effectiveness of both models in classifying slang words and in identifying key trends in their use on social media. It contributes to digital linguistics by broadening the understanding of how slang is used online, and it shows the considerable potential of NLP for linguistic analysis in the digital age.
These results can serve as a guide for researchers and practitioners interested in how language evolves on social media, and they provide a foundation for developing more sophisticated NLP technologies that adapt to language variation on digital platforms.
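The preprocessing and evaluation steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cleaning rules (lowercasing, then removing URLs, @-mentions, and non-alphanumeric characters), the whitespace tokenizer, the function names `clean_tweet`, `tokenize`, and `classification_scores`, and the sample tweet and labels are all hypothetical. Lemmatization is left out because it requires a language-specific lemmatizer. The abstract does not say whether its F1-scores are binary or macro-averaged; this sketch computes binary F1 for the slang class (label 1).

```python
import re

# Hypothetical cleaning rules; the study does not list its exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Anything that is not an ASCII lowercase letter, digit, or whitespace
# (this also drops '#', so "#weekend" becomes "weekend").
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @-mentions, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove links first, before punctuation is stripped
    text = MENTION_RE.sub(" ", text)   # remove @username mentions
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())      # collapse repeated whitespace


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization of an already-cleaned tweet."""
    return text.split()


def classification_scores(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and binary F1 for the positive class (1 = slang)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical Indonesian slang tweet, not taken from the study's dataset.
    tweet = "@budi gw lagi gabut bgt nih https://t.co/xyz #weekend"
    print(tokenize(clean_tweet(tweet)))  # ['gw', 'lagi', 'gabut', 'bgt', 'nih', 'weekend']
    # Toy gold labels vs. predictions, not the study's results.
    print(classification_scores([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

The SVM and Random Forest models themselves could be trained on vectorized tokens with a library such as scikit-learn (`sklearn.svm.LinearSVC`, `sklearn.ensemble.RandomForestClassifier`); a function like `classification_scores` would then compare each model's predictions against held-out labels.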
Copyright (c) 2024 Shabrina Rasyid Munthe, Sudi Suryadi, Fadhil Laksono

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).