Harnessing NLP and Big Data to Solve Linguistic Challenges in Indonesian Humanoid Robots: Pathways to Innovation and Entrepreneurship
DOI:
https://doi.org/10.64268/josce.v1i2.35
Keywords:
Digital Entrepreneurship, Innovation Ecosystem, Language Technology Commercialization, NLP-Based Startups, Supply Chain of AI Products
Abstract
Aim: Indonesian, the national language of Indonesia, has intricate linguistic features such as agglutinative morphology, idioms, and numerous dialectal variations. These features make it difficult to develop humanoid robots capable of natural interaction through Natural Language Processing (NLP). This study aims to address these linguistic complexities while exploring the entrepreneurial potential of localized NLP applications in Indonesia.
Methods: The research employs a qualitative literature review method, focusing on existing studies related to Indonesian NLP datasets, transformer-based language models, and speech technologies. Key sources include IndoNLI for inference, IndoSentiment for sentiment analysis, and case studies of humanoid robots like Lumen. The analysis also includes approaches utilizing Big Data, multi-pass decoders, and contextual language modeling to optimize performance in Indonesian linguistic settings.
Findings: The successful development of Indonesian-speaking humanoid robots relies on context-aware NLP models trained on representative, culturally relevant datasets. Integrating multimodal systems and Big Data improves comprehension of idiomatic, regional, and informal expressions. The review also finds that NLP-based innovations can be commercialized through AI-powered assistants, educational bots, and digital customer service, opening new opportunities for tech-driven entrepreneurship.
Significance: This study contributes to both technological advancement and business innovation by linking linguistic AI research with entrepreneurial applications. It underscores the importance of building a robust local data ecosystem and designing language models that reflect Indonesia’s linguistic diversity. These insights are vital not only for improving human-robot interaction but also for fostering sustainable digital entrepreneurship within emerging markets like Indonesia.
References
Budiharto, W. (2020). Deep Learning-Based Question Answering System for Intelligent Humanoid Robot. Journal of Big Data, 7(1). https://doi.org/10.1186/s40537-020-00341-6
Cahyawijaya, S., Aji, A. F., Lovenia, H., Winata, G. I., Wilie, B., Mahendra, R., Koto, F., Moeljadi, D., Vincentio, K., Romadhony, A., & Purwarianti, A. (2022). NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages. arXiv. https://doi.org/10.48550/arxiv.2205.15960
Ernawati, I. A., Brawijaya, K. S., Aini, F., & Nurhayati, E. (2023). Perkembangan Ragam Bahasa dalam Komunikasi Mahasiswa di Lingkungan Kampus UPN “Veteran” Jawa Timur. Jurnal Pengabdian West Science, 2(6), 406–420. https://doi.org/10.58812/jpws.v2i6.388
Heinrich, S., & Wermter, S. (2011). Towards Robust Speech Recognition for Human-Robot Interaction. In Proceedings of the IROS Workshop on Cognitive Neuroscience Robotics (CNR) (pp. 29–34).
Jiono, M. (2020). Self Localization Based on Neighborhood Probability Mapping for Humanoid Robot. In 4th International Conference on Vocational Education and Training (ICOVET 2020) (pp. 355–359). https://doi.org/10.1109/ICOVET50258.2020.9230237
Mahendra, R., Aji, A. F., Louvan, S., Rahman, F., & Vania, C. (2021). IndoNLI: A Natural Language Inference Dataset for Indonesian. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 10511–10527). https://doi.org/10.18653/v1/2021.emnlp-main.821
Moleong, L. J. (2018). Metodologi Penelitian Kualitatif (Ed. Revisi). Remaja Rosdakarya.
Sya, S. S., & Prihatmanto, A. S. (2015). Design and Implementation of Image Processing System for Lumen Social Robot-Humanoid as an Exhibition Guide for Electrical Engineering Days. Proceedings of Electrical Engineering Days. https://ieeexplore.ieee.org/document/7738307
Sugiyono. (2019). Metode Penelitian Kualitatif, Kuantitatif, Dan R&D. Alfabeta.
Chen, J., Liu, Z., Huang, X., Wu, C., Liu, Q., Jiang, G., Pu, Y., Lei, Y., Chen, X., Wang, X., Zheng, K., Lian, D., & Chen, E. (2024). When large language models meet personalization: Perspectives of challenges and opportunities. World Wide Web, 27(4), 42. https://doi.org/10.1007/s11280-024-01276-1
Cucchiarini, C., Hubers, F., & Strik, H. (2022). Learning L2 idioms in a CALL environment: The role of practice intensity, modality, and idiom properties. Computer Assisted Language Learning, 35(4), 863–891. https://doi.org/10.1080/09588221.2020.1752734
Diao, L., & Hu, P. (2021). Deep learning and multimodal target recognition of complex and ambiguous words in automated English learning system. Journal of Intelligent & Fuzzy Systems, 40(4), 7147–7158. https://doi.org/10.3233/JIFS-189543
Ferasso, M., Tortato, U., & Ikram, M. (2023). Mapping the Circular Economy in the Small and Medium-sized Enterprises field: An exploratory network analysis. Cleaner and Responsible Consumption, 11, 100149. https://doi.org/10.1016/j.clrc.2023.100149
Gruetzemacher, R., & Paradice, D. (2022). Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research. ACM Comput. Surv., 54(10s), 204:1-204:35. https://doi.org/10.1145/3505245
Hussain, M. (2023). When, Where, and Which?: Navigating the Intersection of Computer Vision and Generative AI for Strategic Business Integration. IEEE Access, 11, 127202–127215. https://doi.org/10.1109/ACCESS.2023.3332468
Lin, J., Dai, X., Xi, Y., Liu, W., Chen, B., Zhang, H., Liu, Y., Wu, C., Li, X., Zhu, C., Guo, H., Yu, Y., Tang, R., & Zhang, W. (2025). How Can Recommender Systems Benefit from Large Language Models: A Survey. ACM Trans. Inf. Syst., 43(2), 28:1-28:47. https://doi.org/10.1145/3678004
Luckyardi, S., Karin, J., Rosmaladewi, R., Hufad, A., & Haristiani, N. (2024). Chatbots as Digital Language Tutors: Revolutionizing Education Through AI. Indonesian Journal of Science and Technology, 9(3), Article 3. https://doi.org/10.17509/ijost.v9i3.79514
Luckyardi, S., Munawaroh, S., Abduh, A., Rosmaladewi, R., Hufad, A., & Haristiani, N. (2024). Advancing Language Education in Indonesia: Integrating Technology and Innovations. ASEAN Journal of Science and Engineering, 4(3), Article 3. https://doi.org/10.17509/ajse.v4i3.79471
Nasution, A. H., & Onan, A. (2024). ChatGPT Label: Comparing the Quality of Human-Generated and LLM-Generated Annotations in Low-Resource Language NLP Tasks. IEEE Access, 12, 71876–71900. https://doi.org/10.1109/ACCESS.2024.3402809
Paramesha, M., Rane, N., & Rane, J. (2024). Big Data Analytics, Artificial Intelligence, Machine Learning, Internet of Things, and Blockchain for Enhanced Business Intelligence (SSRN Scholarly Paper 4855856). Social Science Research Network. https://doi.org/10.2139/ssrn.4855856
Ragno, L., Borboni, A., Vannetti, F., Amici, C., & Cusano, N. (2023). Application of Social Robots in Healthcare: Review on Characteristics, Requirements, Technical Solutions. Sensors, 23(15), Article 15. https://doi.org/10.3390/s23156820
Schiavo, F., Campitiello, L., Todino, M. D., & Di Tore, P. A. (2024). Educational Robots, Emotion Recognition and ASD: New Horizon in Special Education. Education Sciences, 14(3), Article 3. https://doi.org/10.3390/educsci14030258
Younis, H. A., Ruhaiyem, N. I. R., Ghaban, W., Gazem, N. A., & Nasser, M. (2023). A Systematic Literature Review on the Applications of Robots and Natural Language Processing in Education. Electronics, 12(13), Article 13. https://doi.org/10.3390/electronics12132864
Zhao, S., Wu, Y., Tsang, Y.-K., Sui, X., & Zhu, Z. (2021). Morpho-semantic analysis of ambiguous morphemes in Chinese compound word recognition: An fMRI study. Neuropsychologia, 157, 107862. https://doi.org/10.1016/j.neuropsychologia.2021.107862
License
Copyright (c) 2025 Syaifullah Syaifullah, Wenny Noorahim

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.