European Journal of Information Technologies and Computer Science https://ej-compute.org/index.php/compute EUROPA Publishing en-US 2736-5492 Domain-Adaptive Pretraining of Transformer-Based Language Models on Medical Texts: A High-Performance Computing Experiment https://ej-compute.org/index.php/compute/article/view/149 <p class="p1">This research investigated the effect of using high-performance computing (HPC) resources to improve the adaptability and performance of transformer-based language models through intensive domain-specific pretraining in the medical domain. The study addressed the question: can domain-adaptive pretraining on medical texts significantly improve language model performance metrics such as perplexity while maintaining computational efficiency and addressing ethical considerations? The research used a corpus of medical texts, carefully split into training and evaluation datasets. Initial model training on NVIDIA A30 GPUs, at 96% GPU utilization, yielded an average perplexity of 73.54. After iterative refinements, including domain-specific tokenizer optimization, data preprocessing, mixed-precision training, and adjusted learning parameters, the final model achieved an average perplexity of 3.39. The evaluation run processed 7103 samples in 98.02 seconds, with a training loss of 2.405 and an evaluation loss of 2.045; an evaluation loss below the training loss indicates good generalization with no sign of overfitting. The final model and results were saved for reproducibility and future use. This study was motivated by the pressing need for accurate and efficient medical natural language processing (NLP) applications in clinical decision support, patient record summarization, and medical research analysis. 
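Perplexity, the headline metric in this abstract, is the exponential of the mean per-token negative log-likelihood (cross-entropy in nats), which equals the geometric mean of the inverse probabilities the model assigns to the observed tokens. The Python sketch below illustrates that definition and the evaluation throughput implied by 7103 samples in 98.02 seconds; the helper names and token probabilities are hypothetical, and this is not the authors' evaluation code.

```python
import math


def perplexity(token_probs):
    """exp(mean negative log-likelihood) over the probabilities assigned to each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = sum(-math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


def perplexity_from_loss(mean_cross_entropy):
    """Same quantity when a trainer reports the mean cross-entropy loss in nats."""
    return math.exp(mean_cross_entropy)


def throughput(num_samples, seconds):
    """Evaluation samples processed per second."""
    return num_samples / seconds


if __name__ == "__main__":
    # Hypothetical probabilities: a model assigning 1/4 to every token has perplexity 4.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    # Throughput implied by the reported evaluation run.
    print(f"{throughput(7103, 98.02):.2f} samples/s")
```

Averaging per-sample perplexities and exponentiating a corpus-level mean loss generally give different numbers, so the aggregation behind a reported "average perplexity" matters when comparing runs.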
The findings show that investing in HPC-driven domain-adaptive pretraining delivers substantial performance improvements and equips medical NLP models to handle the complexities of domain-specific language effectively. The ethical considerations of this research centered on optimizing GPU utilization to reduce energy consumption and on ensuring transparency through reproducible methodologies. We recommend that future research explore larger medical datasets, broader clinical specializations, and diverse transformer architectures, and investigate the transferability of learned representations across related medical subdomains. Such advances could further enhance the applicability of specialized language models in medical research and practice.</p> Charles Kinyua Gitonga Lydia Gakii Mugao Copyright (c) 2025 Charles Kinyua Gitonga, Lydia Gakii Mugao http://creativecommons.org/licenses/by-nc/4.0 2025-04-03 2025-04-03 5 2 1 9 10.24018/compute.2025.5.2.149 Exploring RAG Solutions for a Specific Language: Albanian https://ej-compute.org/index.php/compute/article/view/148 <p class="p1">The primary goal of this project is to develop a powerful information retrieval and question-answering system tailored for Albanian-speaking users, bridging the gap between traditional document search methods and modern, context-aware responses. The solution addresses the linguistic and document-processing challenges specific to Albanian-language data by combining state-of-the-art Retrieval-Augmented Generation (RAG) techniques with advanced natural language processing (NLP) capabilities. Through this RAG solution, we aim to give organizations, educational institutions, and users in Albanian-speaking regions fast, accurate, and contextually relevant access to the information in their documents. 
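The retrieval half of a RAG pipeline ranks stored passages by similarity to the query and passes the best matches to the LLM as context. The sketch below stands in for vector-based search with bag-of-words term-frequency vectors and cosine similarity, using only the Python standard library. A system like the one described would use dense embeddings and a vector index instead; the helper names, prompt wording, and Albanian sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Albanian letters such as ë and ç are kept.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def build_prompt(query, passages):
    """Hypothetical prompt template that grounds the LLM in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


DOCS = [
    "Tirana është kryeqyteti i Shqipërisë.",
    "Universiteti i Prishtinës ndodhet në Kosovë.",
    "Gjuha shqipe ka dy dialekte kryesore: gegërishten dhe toskërishten.",
]

if __name__ == "__main__":
    question = "Cili është kryeqyteti i Shqipërisë?"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

The generation step (sending the prompt to an LLM) is deliberately left out, since the abstract does not name the model or API used.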
By leveraging vector-based search, large language models (LLMs), and document processing adapted to the nuances of the Albanian language, this system will simplify information access, reduce reliance on manual searches, and improve decision-making. Retrieval-augmented generation (RAG) is a technique for increasing the accuracy and reliability of generative artificial intelligence models by grounding them in facts retrieved from external sources. It fills a gap in how LLMs work: LLMs are neural networks, usually characterized by the number of parameters they contain, and on their own they can draw only on knowledge captured in those parameters during training. In the current digital era, organizations and institutions in Albanian-speaking regions face significant challenges in processing, analyzing, and efficiently retrieving information from their documents. Traditional search methods often fail to capture the contextual nuances of the Albanian language, leading to inefficient information retrieval and poor user experiences. In addition, the lack of specialized natural language processing (NLP) tools for Albanian creates barriers to the effective implementation of document management and question-answering systems.</p> Leotrim Ramadani Fisnik Doko Copyright (c) 2025 Leotrim Ramadani, Fisnik Doko http://creativecommons.org/licenses/by-nc/4.0 2025-02-12 2025-02-12 5 2 26 31 10.24018/compute.2025.5.1.148 UAV Path Planning Based on Butterfly Optimization Algorithm in Three-Dimensional Space https://ej-compute.org/index.php/compute/article/view/147 <p>The Butterfly Optimization Algorithm (BOA) is a novel optimization algorithm inspired by butterflies that searches for the best solutions within a given search space. The algorithm can be configured toward targeted goals such as minimizing the distance to be covered, avoiding obstacles, or completing particular mission objectives. 
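In the commonly cited formulation of BOA, each butterfly emits a fragrance f = c·I^a, where I is the stimulus intensity (here, the solution's cost), c is the sensory modality, and a is the power exponent. With switch probability p a butterfly moves toward the best solution (global phase); otherwise it takes a random walk between two other butterflies (local phase). The Python sketch below applies these rules to a hypothetical 3D scenario, since the abstract does not give the author's cost function: a path through three free waypoints whose cost is polyline length plus a penalty for entering one spherical obstacle. All constants are illustrative, c is kept fixed for simplicity, and greedy replacement (keep a move only if it improves) is a common variant rather than a confirmed detail of the paper, which used MATLAB.

```python
import math
import random

# Hypothetical scenario (not from the paper): fly from START to GOAL through
# N_WAYPOINTS free waypoints while avoiding one spherical obstacle.
START, GOAL = (0.0, 0.0, 0.0), (10.0, 10.0, 10.0)
OBSTACLE_CENTER, OBSTACLE_RADIUS = (5.0, 5.0, 5.0), 2.0
N_WAYPOINTS, LOW, HIGH = 3, 0.0, 10.0


def path_cost(x):
    """Polyline length START -> waypoints -> GOAL plus a penalty for entering the obstacle."""
    pts = [START] + [tuple(x[3 * i:3 * i + 3]) for i in range(N_WAYPOINTS)] + [GOAL]
    length = penalty = 0.0
    for p, q in zip(pts, pts[1:]):
        length += math.dist(p, q)
        for s in range(11):  # check 11 evenly spaced points on each segment
            t = s / 10
            m = tuple(pi + t * (qi - pi) for pi, qi in zip(p, q))
            depth = OBSTACLE_RADIUS - math.dist(m, OBSTACLE_CENTER)
            if depth > 0:
                penalty += 100.0 * depth
    return length + penalty


def clip(v):
    """Keep a coordinate inside the search bounds."""
    return min(HIGH, max(LOW, v))


def boa(cost, dim, n=20, iters=200, c=0.01, a=0.1, p=0.8, seed=0):
    """Minimize cost with the basic BOA update rules; returns (best, best_cost, history)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(LOW, HIGH) for _ in range(dim)] for _ in range(n)]
    fit = [cost(x) for x in pop]
    i_best = min(range(n), key=fit.__getitem__)
    best, best_fit = pop[i_best][:], fit[i_best]
    history = [best_fit]
    for _ in range(iters):
        for i in range(n):
            frag = c * fit[i] ** a  # fragrance f = c * I^a, stimulus intensity I = current cost
            r = rng.random()
            if rng.random() < p:  # global phase: move toward the best butterfly
                cand = [xi + (r * r * gi - xi) * frag for xi, gi in zip(pop[i], best)]
            else:  # local phase: random walk using two other butterflies j, k
                j, k = rng.sample(range(n), 2)
                cand = [xi + (r * r * xj - xk) * frag
                        for xi, xj, xk in zip(pop[i], pop[j], pop[k])]
            cand = [clip(v) for v in cand]
            cand_fit = cost(cand)
            if cand_fit < fit[i]:  # greedy replacement keeps only improvements
                pop[i], fit[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand[:], cand_fit
        history.append(best_fit)
    return best, best_fit, history


if __name__ == "__main__":
    best, best_fit, history = boa(path_cost, dim=3 * N_WAYPOINTS)
    print(f"initial best cost {history[0]:.2f}, final best cost {best_fit:.2f}")
```

Because the best solution is only replaced by strictly better ones, the recorded best cost never increases across iterations.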
I applied the BOA to generate UAV paths in three-dimensional space, considering the objectives of collision avoidance, energy consumption, and near-optimal path planning. To assess the algorithm, I ran MATLAB simulations of multiple scenarios in both two-dimensional and three-dimensional environments. I also benchmarked the BOA against two other algorithms, Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO). The results showed that the BOA outperformed the GA in terms of cost function and time to reach the optimal solution, especially over solid 3D terrain. The simulation results also show that the BOA adapts flexibly when the 3D environment changes. Moreover, the algorithm adjusted its path quickly in response to various unexpected obstacles. The proposed BOA is viable for UAV path planning in three-dimensional space and effective compared with the other optimization algorithms.</p> Maytham Kadhim Srayyih Copyright (c) 2025 Maytham Kadhim Srayyih http://creativecommons.org/licenses/by-nc/4.0 2025-01-21 2025-01-21 5 2 21 25 10.24018/compute.2025.5.1.147 The Impact of Quantum Computing on Cryptographic Systems: Urgency of Quantum-Resistant Algorithms and Practical Applications in Cryptography https://ej-compute.org/index.php/compute/article/view/146 <p class="p1">Quantum computing promises computational power previously thought unattainable, which poses severe threats to classical cryptographic methods, especially <em>RSA</em> and <em>ECC</em>. This paper addresses these risks through a detailed investigation of quantum-resistant algorithms, focusing on lattice-based (<em>CRYSTALS-Kyber</em>), hash-based (<em>SPHINCS+</em>), and code-based (<em>McEliece</em>) systems. 
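RSA's exposure comes from a classical reduction: factoring N reduces to finding the multiplicative order r of some a modulo N. If r is even and a^(r/2) is not congruent to -1 (mod N), then gcd(a^(r/2) - 1, N) and gcd(a^(r/2) + 1, N) are nontrivial factors. Shor's algorithm, which this abstract cites against RSA-2048, performs the order-finding step in polynomial time on a quantum computer. The toy Python sketch below does that step by brute force instead, which takes time exponential in the bit length of N, so it only illustrates the reduction on tiny moduli; the function names are hypothetical.

```python
import math


def order(a, n):
    """Smallest r > 0 with a^r ≡ 1 (mod n), found by brute force.

    This is the step Shor's algorithm performs efficiently on a quantum computer.
    """
    if n < 2 or math.gcd(a, n) != 1:
        raise ValueError("need n >= 2 and a coprime to n")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_order(a, n):
    """Return two nontrivial factors of n derived from the order of a, or None if a is unlucky."""
    r = order(a, n)
    if r % 2:  # odd order: try another a
        return None
    half = pow(a, r // 2, n)
    if half == n - 1:  # a^(r/2) ≡ -1 (mod n): try another a
        return None
    return math.gcd(half - 1, n), math.gcd(half + 1, n)


if __name__ == "__main__":
    # Toy "RSA modulus" 15 = 3 * 5 with base a = 7.
    print(order(7, 15), factor_from_order(7, 15))
```

For a = 7 and N = 15 the order is 4, so a^(r/2) mod 15 = 4 and the gcds of 3 and 5 with 15 recover the factors.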
The research questions guiding this study are: how vulnerable are traditional algorithms to quantum attack, and which quantum-resistant alternatives offer viable performance and security trade-offs? Through simulations, we analyzed key metrics such as encryption speed, key size, and efficiency under quantum threats. We also demonstrated the vulnerability of <em>RSA-2048</em> and <em>ECC-256</em> to Shor’s algorithm, underscoring the need for quantum-resistant cryptography. Our results identified <em>CRYSTALS-Kyber</em> as a balanced candidate, consistent with the NIST <em>PQC</em> Standardization process, while Quantum Key Distribution (<em>QKD</em>) is reviewed for high-sensitivity contexts. Given forecasted advances in quantum hardware, we propose a transitional approach using hybrid cryptographic systems to ensure immediate security and ease the shift to quantum-safe protocols. The study also explores industry applications, particularly in finance, healthcare, and IoT, and recommends phased adoption of hybrid cryptographic systems for a secure, gradual transition.</p> Charles Kinyua Gitonga Copyright (c) 2025 Charles Kinyua Gitonga http://creativecommons.org/licenses/by-nc/4.0 2025-01-14 2025-01-14 5 2 1 10 10.24018/compute.2025.5.1.146 A New Fault Method Detection for Wireless Sensor Networks using “Autoencoder and LS-SVM” https://ej-compute.org/index.php/compute/article/view/145 <p style="margin: 0cm; margin-bottom: .0001pt; text-align: justify;">This study addresses fault detection in wireless sensor networks (WSNs) without disturbing the flow of data, and presents a comprehensive new approach to the problem. The first steps of the developed methodology, as applied to stock exchange data, include min-max scaling of samples, transformation of samples into windows as part of data preparation, preliminary data cleaning, and careful division of the data into sections. 
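The preparation steps listed above (cleaning, division into sections, min-max scaling x' = (x - min)/(max - min), and windowing) can be sketched in plain Python as follows. The function names, the 80/20 chronological split, the window width, the ordering of the steps, and the sample readings are illustrative assumptions; the abstract does not specify them.

```python
import math


def clean(series):
    """Drop missing (None) and NaN readings."""
    return [v for v in series if v is not None and not math.isnan(v)]


def chronological_split(items, train_fraction=0.8):
    """Divide a time-ordered series into training and test sections without shuffling."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def min_max_scale(values, lo=None, hi=None):
    """Scale to [0, 1]; pass lo/hi fitted on the training section to avoid test leakage."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def sliding_windows(series, width, step=1):
    """Overlapping fixed-width windows, one per model input sample."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


if __name__ == "__main__":
    # Hypothetical sensor readings with gaps.
    raw = [20.1, 20.4, None, 20.3, float("nan"), 35.0, 20.2, 20.5, 20.6, 20.4, 20.3, 20.7]
    train, test = chronological_split(clean(raw))
    lo, hi = min(train), max(train)  # fit scaling on the training section only
    train_windows = sliding_windows(min_max_scale(train, lo, hi), width=4)
    test_windows = sliding_windows(min_max_scale(test, lo, hi), width=2)
    print(len(train_windows), len(test_windows))
```

Fitting the minimum and maximum on the training section and reusing them for the test section keeps test statistics out of preprocessing, at the cost of test values occasionally falling outside [0, 1].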
These steps prepare the dataset for further analysis. The proposed method integrates autoencoders with Least Squares Support Vector Machines (LSSVM). An autoencoder network was developed, and the number of hidden nodes was then adjusted to identify the internal parameters of the dataset. This supported the subsequent reconstruction of the data and yielded the high-level features required for fault detection. Using these extracted features, an LSSVM model was developed to classify normal and anomalous conditions in WSNs. The model performed well, with performance indices of 99.77% on the training set and 99% on the test set. These results support the feasibility and accuracy of the approach for fault recognition. The work advances the field by providing a methodical way of addressing the important problem of fault detection in WSNs, supported by experimental evidence and analysis.</p> Falah Hasan Hani Copyright (c) 2024 Falah Hasan Hani http://creativecommons.org/licenses/by-nc/4.0 2024-12-05 2024-12-05 5 2 18 24 10.24018/compute.2024.4.5.145
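The LSSVM classifier in the fault-detection abstract above replaces the standard SVM's quadratic program with a single linear system, [[0, 1ᵀ], [1, K + I/γ]]·[b; α] = [0; y], and classifies a new point by the sign of Σ αᵢ K(x, xᵢ) + b. The pure-Python sketch below assumes an RBF kernel, this regression-style LS-SVM formulation applied to ±1 labels, and hypothetical two-dimensional feature vectors standing in for the autoencoder's extracted features; the γ and σ values are illustrative, not the paper's.

```python
import math


def rbf(x, z, sigma=1.0):
    """Gaussian (RBF) kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a square, nonsingular system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]; returns (b, alpha)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + [float(v) for v in y])
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    """+1 (fault) or -1 (normal) from the sign of the kernel expansion."""
    s = b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X_train))
    return 1 if s >= 0 else -1


if __name__ == "__main__":
    # Hypothetical extracted features: -1 = normal node, +1 = faulty node.
    X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
    y = [-1, -1, -1, 1, 1, 1]
    b, alpha = lssvm_train(X, y, gamma=10.0, sigma=0.5)
    print(lssvm_predict(X, b, alpha, [0.12, 0.18], sigma=0.5),
          lssvm_predict(X, b, alpha, [0.88, 0.86], sigma=0.5))
```

The first row of the system enforces Σ αᵢ = 0, and the 1/γ term on the diagonal acts as the regularizer; in practice a numerical library's linear solver would replace the hand-written elimination.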