Harmonizing Multilingual Product Data Using Machine Learning: A Case Study of the Rwanda Revenue Authority

Kamana, Raymond (2025) Harmonizing Multilingual Product Data Using Machine Learning: A Case Study of the Rwanda Revenue Authority. International Journal of Innovative Science and Research Technology, 10 (8): 25aug118. pp. 79-84. ISSN 2456-2165

Abstract

This study addresses inconsistent, multilingual product names in the Rwanda Revenue Authority’s (RRA) Electronic Billing Machine (EBM) system. Because product names are entered manually, the same product appears under many spellings and translations, which makes tax data hard to track and analyze. To address this, the study applies Natural Language Processing (NLP) and Machine Learning (ML) to clean product names and group similar ones. A total of 4.1 million records from 2020 to 2022 were translated into English and processed. MiniLM sentence embeddings captured the meaning of each name, UMAP reduced the dimensionality of those embeddings, and HDBSCAN clustered the reduced vectors. The cleaned and grouped product names make it easier to detect possible fraud, spot underpricing, and improve the accuracy of tax reporting, which helps RRA improve data quality and tax compliance.
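
To make the grouping idea concrete, here is a minimal, dependency-free sketch. It is not the study's pipeline: character-trigram counts stand in for the MiniLM embeddings, a greedy cosine-similarity threshold stands in for UMAP plus HDBSCAN, and the translation step is skipped. The function names, the sample product names, and the 0.6 threshold are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def trigram_vector(text: str) -> Counter:
    """Character-trigram counts: a crude stand-in for a sentence embedding."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_names(names, threshold=0.6):
    """Greedy grouping (assumed stand-in for density-based clustering).

    Each name joins the first group whose representative (the group's
    first member) is at least `threshold` similar; otherwise it starts
    a new group. Returns one integer label per input name.
    """
    representatives = []
    labels = []
    for name in names:
        vec = trigram_vector(normalize(name))
        for label, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(label)
                break
        else:
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


# Hypothetical manually entered product names with spelling variants.
sample = [
    "Coca Cola 500ml",
    "coca-cola 500 ml",
    "Cocacola 500ML",
    "Cement 50kg",
    "CEMENT 50 KG",
]
labels = group_names(sample)
print(labels)  # [0, 0, 0, 1, 1]
```

A surface-level measure like this only catches spelling variants. Grouping names that differ by language or wording is where semantic embeddings of the kind the abstract describes would be needed.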

IJISRT25AUG118.pdf - Published Version
