Publications
Research papers, articles, and academic publications from our team
Prompting fairness: Learning prompts for debiasing large language models
Authors: Camelia Lemnaru, Cristian Andrei Rad
Large language models are prone to internalize social biases due to the characteristics of the data used for their self-supervised training scheme. Considering their recent emergence and wide availability to the general public, it is mandatory to identify and alleviate these biases to avoid perpetuating stereotypes towards underrepresented groups. We present a novel prompt-tuning method for reducing biases in encoder models such as BERT or RoBERTa. Unlike other methods, we only train a small set of additional reusable token embeddings that can be concatenated to any input sequence to reduce bias in the outputs. We particularize this method to gender bias by providing a set of templates used for training the prompts. Evaluations on two benchmarks show that our method is on par with the state of the art while having a limited impact on language modeling ability
A Hybrid Machine Learning–Genetic Algorithm for Optimizing Surface-Mount Technology Planning
Authors: Adrian Petru Groza
We tackle the problem of improving the Surface-Mount Technology (SMT) process planning in an automotive manufacturing setting. Current simulations show low accuracy across production lines as the existing approach relies on predefined setups rather than adapting to product-specific configurations. We propose a hybrid framework that couples machine learning with a genetic algorithm to generate product-specific plans. Our solution involves three tasks: (i) assigning boards to lines, (ii) allocating components to Pick-and-Place (PnP) machines, and (iii) balancing workloads across machines. Our hybrid pipeline embeds supervised learning in a genetic optimizer. A multi-class classifier selects feasible PnP head configurations per Bill of Materials (BOM) part number (precision = 0.73). A genetic algorithm assigns components to compatible feeder tables/machines, while a regression model estimates table cycle times (R² = 0.88). The fitness jointly optimizes Components Placed per Hour (CPH) and Line Balancing (LB) under process constraints. Different mutation methods are explored, revealing that mutation based on balancing the workload by leveling the number of placements on the tables with minimum and maximum cycle time results in an LB of 0.83, with a CPH of 0.37 and an average delta cycle time of -3.27% across 105 part numbers.
ConvU-NExT: An Asymmetrical Encoder–Decoder for Denoising Low Dose CT
Authors: Adrian Petru Groza
Low-dose computed tomography (LDCT) is a medical imaging modality designed to minimize ionizing radiation exposure while maintaining the ability to produce detailed cross-sectional images. It is particularly valuable in scenarios requiring repeated imaging, such as cancer screening, follow-up examinations or pediatric diagnostics, where reducing radiation dose is critical to patient safety. For example, to reduce noise by half, four times the radiation dose is required in the slice. The goal is to achieve postprocessed LDCT images with comparable quality to those obtained from standard-dose CT imaging. We start with a brief overview of the CT procedures and their limitations. Then we introduce a novel denoising method based on an asymmetric integration of the ConvNeXt backbone with the U-Net architecture. This novel approach obtained 2–3 times less noise than the original LDCT, with a 10%–20% increase in performance compared to a U-Net implementation, checked against three metrics: MSE, SSIMLoss and combinations of both. The results suggest that: (i) augmenting the images during training with specific noise, obtained from a water phantom CT scan test, yields superior results compared to generic noise augmentations; (ii) a larger kernel size better extracts features; and (iii) a smaller kernel size was mandatory for feature reconstruction.
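The dose example in this abstract follows from the usual inverse-square-root relation between image noise and dose. The sketch below only illustrates that arithmetic. It assumes quantum (Poisson) noise dominates and ignores electronic noise and reconstruction effects; the function name `relative_noise` is hypothetical and not taken from the paper.

```python
import math


def relative_noise(dose_factor):
    """Noise relative to baseline when the dose is multiplied by dose_factor.

    Assumption: noise standard deviation scales as 1 / sqrt(dose).
    """
    return 1.0 / math.sqrt(dose_factor)


print(relative_noise(4.0))   # 0.5: four times the dose halves the noise
print(relative_noise(0.25))  # 2.0: a quarter of the dose doubles the noise
```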
AlloyGraph: Data and Evaluation Results for Multi-Agent AI Superalloy Property Prediction
Authors: Alexandru Lecu, Adrian Petru Groza
Training data (77 alloys from the Nickel Institute handbook), evaluation data (88 alloys from manufacturer datasheets), prediction results for six model configurations, chatbot evaluation benchmarks (250 MCQ questions, 100 RAGAS questions, 12 expert-graded questions), inverse design results (20 target specifications), and OWL ontology for the AlloyGraph platform. Associated repository: https://github.com/AlexLecu/AlloyGraph
OCTA-Based Biomarker Characterization in nAMD
Authors: Adrian Petru Groza
We aim to enhance ophthalmologists' decision-making when diagnosing Neovascular Age-Related Macular Degeneration (nAMD). We developed three tools to analyze Optical Coherence Tomography Angiography images: (1) extracting biomarkers such as mCNV area and vessel density using image processing; (2) generating a 3D visualization of the neovascularization for a better view of the affected regions; and (3) applying an ensemble of three white-box machine learning algorithms (decision tree, support vector machines and DL-Learner) for nAMD diagnosis. The learned expressions reached 100% accuracy for the training data and 68% accuracy in testing. The main advantage is that all the learned models are white-box, which ensures explainability and transparency, allowing clinicians to better understand the decision-making process.
Colonic Polyp Detection with Object Detection Models
Authors: Eugen Richard Ardelean
In recent years, deep learning has been applied more and more to medical image analysis. One such application of deep learning is the automated polyp detection in colonoscopy with the target of reducing miss rates. This study presents a comprehensive evaluation of nine state-of-the-art object detection models for colonic polyp detection: YOLOv8, YOLOv9, YOLOv10, YOLO11, YOLO12, YOLO26, RT-DETR, YOLO-World, and YOLOE. The models were evaluated on three publicly available datasets: CVC-ClinicDB, CVC-ColonDB, and ETIS-LaribPolypDB. All models were trained under standardized conditions using identical hyperparameters and data augmentation strategies to guarantee fair comparison. Performance was evaluated using multiple metrics: mAP@50, mAP@50–95, F1 score, precision, recall, inference time, and computational cost. YOLO11 demonstrated the best overall performance, achieving mAP@50 scores of 0.995, 0.944, and 0.978 on the three datasets respectively, while maintaining the fastest inference time of approximately 150 ms per image and the third-lowest computational cost at 21.3 GFLOPs. Cross-dataset generalization experiments revealed a significant loss of performance, with mAP@50 dropping by 20–40% when models were tested on an unseen dataset, highlighting the challenge of true generalization with limited datasets. Statistical analysis by polyp size showed that while all models achieved F1 scores exceeding 0.95 for large polyps, performance decreased to 0.60–0.85 for small polyps, indicating a limitation in detecting small lesions. The analysis of failure modes showed that missed detections, false positives and boundary errors constitute 60–75% of all failures, suggesting that domain adaptation of object detection models may be required.
Performance Evaluation of Large Language Models for Automated Knowledge Graph Generation
Authors: Tudor Cioara
Memetic-based Coordination of Distributed Storage Units Flexibility for Congestion Management
Authors: Tudor Cioara
P2P Energy Trading Coordination in Interconnected Microgrid Systems
Authors: Tudor Cioara
Performance Evaluation of LLMs in Automated RDF Knowledge Graph Generation
Authors: Tudor Cioara
Cloud systems generate large, heterogeneous log data containing critical infrastructure, application, and security information. Transforming these logs into RDF triples enables their integration into knowledge graphs, improving interpretability, root-cause analysis, and cross-service reasoning beyond what raw logs allow. Large Language Models (LLMs) offer a promising approach to automate RDF knowledge graph generation; however, their effectiveness on complex cloud logs remains largely unexplored. In this paper, we evaluate multiple LLM architectures and prompting strategies for automated RDF extraction using a controlled framework with two pipelines for systematically processing semi-structured log data. The extraction pipeline integrates multiple LLMs to identify relevant entities and relationships, automatically generating subject-predicate-object triples. These outputs are evaluated using a dedicated validation pipeline with both syntactic and semantic metrics to assess accuracy, completeness, and quality. Due to the lack of public ground-truth datasets, we created a reference Log-to-KG dataset from OpenStack logs using manual annotation and ontology-driven methods, enabling an objective baseline. Our analysis shows that Few-Shot learning is the most effective strategy, with Llama achieving a 99.35% F1 score and 100% valid RDF output, while Qwen, NuExtract, and Gemma also perform well under Few-Shot prompting, with Chain-of-Thought approaches maintaining similar accuracy. One-Shot prompting offers a lighter but effective alternative, while Zero-Shot and advanced strategies such as Tree-of-Thought, Self-Critique, and Generate-Multiple perform substantially worse. These results highlight the importance of contextual examples and prompt design for accurate RDF extraction and reveal model-specific limitations across LLM architectures.
Edge-Oriented Orchestration of Energy Services Using Graph-Driven Swarm Intelligence
Authors: Tudor Cioara
As smart grids increasingly depend on IoT devices and distributed energy management, they require decentralized, low-latency orchestration of energy services. We address this with a unified framework for edge-fog-cloud infrastructures tailored to smart energy systems. It features a graph-based data model that captures infrastructure and workload, enabling efficient topology exploration and task placement. Leveraging this model, a swarm-based heuristic algorithm handles task offloading in a resource-aware, latency-sensitive manner. Our framework ensures data interoperability via energy data space compliance and guarantees traceability using blockchain-based workload notarization. We validate our approach with a real-world KubeEdge deployment, demonstrating zero-downtime service migration under dynamic workloads while maintaining service continuity.
A systematic review of generative AI usage for IT project management
Authors: Tudor Cioara
This paper aims to synthesize current knowledge on generative AI in IT project management using the PRISMA methodology to provide researchers with a comprehensive perspective on techniques, applications, adoption trends, limitations, and integration across project management tools and process groups. The analysis reveals a clear dominance of OpenAI's GPT in the included studies, which rely primarily on prompt engineering, suggesting that research in this area remains at an exploratory stage. Finally, it identifies and discusses three promising research directions for AI-enabled project management: process group-specific AI agents, project role-based AI agents, and hybrid collaborative networks that enable human-guided orchestration.
Energy forecasting under missing data: Comparative evaluation of augmented representations and decoder-only time-series imputation
Authors: Tudor Cioara
Data-related issues, including missing values and irregular measurements, challenge the accuracy of short-term energy forecasting in smart grids. In data-scarce scenarios, two approaches are commonly considered, but their strengths and weaknesses are not fully mapped. Embedding-based models learn joint representations from heterogeneous data, compensating for the lack of time-series measurements via additional contextual or external sources, whereas imputation pipelines restore temporal continuity but may smooth variability or produce implausible values. To address these limitations, we propose a unified forecasting framework for energy systems that integrates a shared Temporal Fusion Transformer prediction with a controlled degradation protocol to simulate realistic missing-data patterns. This enables a fair and systematic comparison between two pipelines: representation-augmented learning and decoder-only time-series imputation. The former integrates TS2Vec temporal embeddings and BERT-based static contextual representations to provide a richer forecasting space without explicit reconstruction of missing values. The latter uses a Chronos-2 model to reconstruct missing time-series segments, followed by physics-based correction to enforce physically plausible outputs. We evaluate both pipelines under a controlled data degradation protocol to map the trade-offs between representation learning and data continuity restoration through imputation. We use real-world non-residential building electricity consumption and wind generation datasets. The imputation-based pipeline achieves a mean sMAPE of 10.14% and MAE of 8.43 kWh across 100 buildings, compared to 12.11% and 10.89 kWh for the representation-based approach (p < 0.01). On the wind generation dataset, imputation also improves predictive accuracy (R² = 0.870 vs. R² = 0.794). However, representation-based models remain competitive in scenarios with irregular, spike-dominated, or event-driven consumption patterns where imputation provides limited additional benefits.
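For readers unfamiliar with the two error metrics reported in this abstract, here is a minimal sketch of how sMAPE and MAE are commonly computed. sMAPE has several variants, and the abstract does not say which one the authors use; the form below (absolute error divided by the mean of the absolute actual and predicted values) is an assumption. The sample values are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error, in the units of the series (e.g. kWh)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE in percent (assumed variant).

    Points where both values are zero contribute zero error.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        if denom > 0:
            total += 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


# Hypothetical hourly consumption values in kWh, for illustration only.
actual = [100.0, 80.0, 120.0]
predicted = [110.0, 80.0, 100.0]
print(mae(actual, predicted))
print(smape(actual, predicted))
```

MAE is scale-dependent, while sMAPE is a percentage, which is probably why the abstract reports both for buildings of different sizes.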
Prompts and Prayers: the Rise of GPTheology
Authors: Adrian Petru Groza
Increasingly artificial intelligence (AI) has been cast in “god-like” roles (to name a few: film industry – Matrix, The Creator, Mission Impossible, Foundation, Dune etc.; literature – Children of Time, Permutation City, Neuromancer, I Have no Mouth and I Must Scream, Alphaville etc.). This trend has accelerated with the advent of sophisticated Large Language Models such as ChatGPT. For this phenomenon, where AI is perceived as divine, we use the term GPTheology, where ChatGPT and other AI models are treated as potential oracles of a semi-divine nature. This paper explores the emergence of GPTheology as a form of techno-religion, examining how narratives around AI echo traditional religious constructs. We draw on community narratives from online forums – Reddit – and recent projects – AI-powered Mazu Statue in Malaysia (Lu, 2025); “ShamAIn” Project in Korea (He-rim, 2025); AI Jesus in a Swiss Church (Kennedy, 2024). These examples show striking similarities to technological notions of the Singularity and the development of Artificial General Intelligence (AGI). Additionally, we analyse how daily interactions with AI are acquiring ritualistic associations and how AI-centric ideologies clash with or are integrated into established religions. This study uses a dataset of Reddit posts discussing AI to identify recurring themes of salvation, prophecy, and demonization surrounding AI. Our findings suggest that new belief systems are developing around AI, and this carries both philosophical and sociotechnical implications. Our paper critically analyses the benefits and dangers, as well as the social, political and ethical challenges of this development. This transdisciplinary inquiry highlights how AI and religion are increasingly intertwined, prompting necessary questions about humanity’s relationship with its creations and the future of belief.
A Comparative Survey of Social Bias in Text and Image Generation: Gaps, Directions and Compliance with the EU AI Act
Authors: Cristian Andrei Rad, Camelia Lemnaru
Generative artificial intelligence models, including large language models and image generation models, are increasingly deployed in socially impactful domains. However, these models often exhibit social biases that can amplify stereotypes and produce harmful, discriminatory outputs. In this paper, we present a modality-comparative survey of social bias in text and image generation, structured around four components: benchmarks, bias identification, measurement, and mitigation. We systematically analyze methodological parallels and divergences across the two modalities, highlighting emerging research trends and identifying gaps. Finally, we map current image generation research efforts to the EU AI Act’s technical requirements, offering insights into how the community can advance towards more fair, safe, and trustworthy systems.
MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection
Authors: Adrian Petru Groza, Alexandru Lecu
The large spread of disinformation across digital platforms creates significant challenges to information integrity. This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles, focusing on titles and short text snippets. The proposed Agentic AI system combines four agents: (i) a machine learning agent (logistic regression), (ii) a Wikipedia knowledge check agent (which relies on named entity recognition), (iii) a coherence detection agent (using LLM prompt engineering), and (iv) a web-scraped data analyzer that extracts relational triplets for fact checking. The system is orchestrated via the Model Context Protocol (MCP), offering shared context and live learning across components. Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches. The weighted aggregation method, mathematically derived from individual agent misclassification rates, proves superior to algorithmic threshold optimization. The modular architecture makes the system easily scalable, while also maintaining details of the decision processes.
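The abstract states that the aggregation weights are derived from each agent's misclassification rate but does not give the formula. The sketch below therefore substitutes a standard choice, log-odds weighting as used in weighted majority voting. It is not the authors' derivation; the function names, vote encoding, and error rates are all hypothetical.

```python
import math


def agent_weight(error_rate):
    """Log-odds weight (assumed): lower misclassification rate, larger weight."""
    return math.log((1.0 - error_rate) / error_rate)


def aggregate(votes, error_rates):
    """Combine binary votes (+1 = disinformation, -1 = credible) by weighted sum."""
    score = sum(agent_weight(e) * v for v, e in zip(votes, error_rates))
    return "disinformation" if score > 0 else "credible"


# Hypothetical misclassification rates for four agents.
error_rates = [0.10, 0.30, 0.30, 0.30]
print(aggregate([+1, -1, -1, +1], error_rates))
print(aggregate([+1, -1, -1, -1], error_rates))
```

In this toy setting, one accurate agent plus one weaker agent outvotes two weaker agents, but it is overruled when all three weaker agents disagree with it.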
Reducing Hallucinations in Medical AI: A Knowledge Graph-Augmented Retrieval System for Evidence-Based Age-Related Macular Degeneration Information
Authors: Alexandru Lecu, Adrian Petru Groza
Large language models (LLMs) have significantly advanced natural language generation but frequently produce unverified outputs, compromising their reliability in critical medical applications. We present a framework that combines structured biomedical knowledge with LLMs through retrieval-augmented generation to address this challenge. Our system automatically extracts causal relationships from 5 000 age-related macular degeneration (AMD) abstracts, building a knowledge graph with over 43 200 validated relations. Using vector-based retrieval, the framework generates contextually relevant and verifiable responses with direct clinical evidence links. We evaluated our approach across eight language models, including open-source models from 1B to 70B parameters (Llama, Mistral, Qwen, SmolLM) and GPT-5-mini, on 3 000 queries with varying question types and reasoning complexity. Smaller models (3B parameters) showed substantial improvements: SmolLM3-3B reached 95.6% accuracy on single-hop true/false questions (from 78.2% baseline). The medium-scale model Mistral-7B demonstrated the largest gains on complex multi-hop reasoning, improving from 45% to 76% accuracy on multiple-choice questions. Larger models (70B parameters) showed minimal improvement due to already high baseline performance (97-98% accuracy). Our results demonstrate that RAG-enhanced knowledge graphs enable resource-efficient smaller models to achieve performance levels approaching or matching larger models, reducing hallucinations while maintaining computational efficiency for clinical deployment. [PDF](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11298209)
TADA: Training-free Attribution and Out-of-Domain Detection of Audio Deepfakes
Authors: David Combei
Deepfake detection has gained significant attention across audio, text, and image modalities, with high accuracy in distinguishing real from fake. However, identifying the exact source, such as the system or model behind a deepfake, remains a less studied problem. In this paper, we take a significant step forward in audio deepfake model attribution or source tracing by proposing a training-free, green AI approach based entirely on k-Nearest Neighbors (kNN). Leveraging a pre-trained self-supervised learning (SSL) model, we show that grouping samples from the same generator is straightforward: we obtain a 0.93 F1-score across five deepfake datasets. The method also demonstrates strong out-of-domain (OOD) detection, effectively identifying samples from unseen models at an F1-score of 0.84.
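A minimal sketch of what training-free kNN attribution with out-of-domain rejection can look like. The abstract does not specify the distance measure, the value of k, or the OOD rule, so cosine distance, majority voting, and a mean-distance threshold are all assumptions here. The 2-D vectors stand in for SSL embeddings, and the labels `tts_a` and `tts_b` are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def attribute(query, bank, k=3, ood_threshold=0.5):
    """Majority label of the k nearest neighbours in the reference bank.

    Returns "unknown" when the mean distance to those neighbours exceeds
    ood_threshold (an assumed out-of-domain rule).
    """
    nearest = sorted((cosine_distance(query, emb), label) for emb, label in bank)[:k]
    mean_dist = sum(d for d, _ in nearest) / len(nearest)
    if mean_dist > ood_threshold:
        return "unknown"
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy reference bank: (embedding, generator label) pairs, illustration only.
bank = [
    ([1.0, 0.0], "tts_a"), ([0.9, 0.1], "tts_a"),
    ([0.0, 1.0], "tts_b"), ([0.1, 0.9], "tts_b"),
]
print(attribute([0.95, 0.05], bank, k=2))  # close to the tts_a samples
print(attribute([-1.0, -1.0], bank, k=2))  # far from everything in the bank
```

Nothing is trained: adding a new generator only requires appending its embeddings to the bank.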
Unmasking real-world audio deepfakes: A data-centric approach
Authors: David Combei
The growing prevalence of real-world deepfakes presents a critical challenge for existing detection systems, which are often evaluated on datasets collected just for scientific purposes. To address this gap, we introduce a novel dataset of real-world audio deepfakes. Our analysis reveals that these real-world examples pose significant challenges, even for the most performant detection models. Rather than increasing model complexity or exhaustively searching for a better alternative, in this work we focus on a data-centric paradigm, employing strategies like dataset curation, pruning, and augmentation to improve model robustness and generalization. Through these methods, we achieve a 55% relative reduction in EER on the In-the-Wild dataset, reaching an absolute EER of 1.7%, and a 63% reduction on our newly proposed real-world deepfakes dataset, AI4T. These results highlight the transformative potential of data-centric approaches in enhancing deepfake detection for real-world applications.
Replay Attacks Against Audio Deepfake Detection
We show how replay attacks undermine audio deepfake detection: by playing and re-recording deepfake audio through various speakers and microphones, we make spoofed samples appear authentic to the detection model. To study this phenomenon in more detail, we introduce ReplayDF, a dataset of recordings derived from M-AILABS and MLAAD, featuring 109 speaker-microphone combinations across six languages and four TTS models. It includes diverse acoustic conditions, some highly challenging for detection. Our analysis of six open-source detection models across five datasets reveals significant vulnerability, with the top-performing W2V2-AASIST model’s Equal Error Rate (EER) surging from 4.7% to 18.2%. Even with adaptive Room Impulse Response (RIR) retraining, performance remains compromised with an 11.0% EER. We release ReplayDF for noncommercial research use.
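The Equal Error Rate quoted in this abstract is the operating point where the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). A minimal sketch of one way to approximate it from detector scores follows. It assumes higher scores mean "more likely bona fide" and uses a simple threshold sweep without interpolation. The scores are hypothetical and not drawn from ReplayDF.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    A sample is accepted as bona fide when its score is >= threshold.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide samples scored below the threshold.
        frr = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed samples scored at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Hypothetical detector scores, for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(bonafide, spoof))
```

A replay attack that pushes spoofed scores upward shifts more spoof samples above any given threshold, which is what raises the EER.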
clujteam at SemEval-2025 Task 10: Finetuning SmolLM2 with Taxonomy-based Prompting for Explaining the Dominant Narrative in Propaganda Text
Authors: Anca Nicoleta Mărginean
XAI has been a long-standing goal of AI. Explaining why a text can be considered to have a dominant narrative, where the narrative is known, is of great importance for dealing with propaganda in the news. This paper reports on the participation of the system clujteam in Subtask 3 of Task 10 of SemEval 2025. The system obtained 7th place with a macro F1 of 0.72464, 0.026 behind 1st place. The key components of the solution are the given taxonomy for the narratives and supervised fine-tuning of SmolLM2.
On the Contribution of Lexical Features to Speech Emotion Recognition
Authors: David Combei
Although paralinguistic cues are often considered the primary drivers of speech emotion recognition (SER), we investigate the role of lexical content extracted from speech and show that it can achieve competitive, and in some cases higher, performance compared to acoustic models. On the MELD dataset, our lexical-based approach obtains a weighted F1-score (WF1) of 51.5%, compared to 49.3% for an acoustic-only pipeline with a larger parameter count. Furthermore, we analyze different self-supervised (SSL) speech and text representations, conduct a layer-wise study of transformer-based encoders, and evaluate the effect of audio denoising.
Self-Explanatory Disinformation Detection with Expert-Guided Refinement
Authors: Ioana Cheres, Adrian Petru Groza
To be added after acceptance
Using LLMs and ontologies to extract causal relationships from medical abstracts
Authors: Alexandru Lecu, Adrian Petru Groza
The substantiation of the causal relationships behind its development is very important in identifying possible interventions and early treatment. Knowledge Graphs (KG) play a crucial role in the medical research domain by organizing data into interconnected structures that represent relationships between entities such as disease, treatments, and progressions. This paper shows a complete workflow that demonstrates the extraction of causal relationships from medical abstracts using a fine-tuned GPT-based model and the integration of these relationships into a KG.