TALL-Detect: A Transformer and LLM-Augmented Framework for Tamper-Evident and Explainable Secure Log Analysis

Kala, I (2026) TALL-Detect: A Transformer and LLM-Augmented Framework for Tamper-Evident and Explainable Secure Log Analysis. Journal Européen des Systèmes Automatisés, 59 (2). ISSN 12696935

TALL-Detect A Transformer and LLM-Augmented Framework for Tamper-Evident and Explainable Secure Log Analysis.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Analyzing system logs is essential for cybersecurity: it enables organizations to monitor system behavior, detect attacks, and support digital forensic investigations. However, with the expansion of cloud and distributed computing environments, log files have become increasingly vulnerable to tampering, including modification, deletion, reordering, and injection of false entries. Existing anomaly detection approaches assume untampered log inputs and are therefore ineffective against deliberate, adversarial log manipulation. To address this gap, this paper proposes TALL-Detect, a Transformer and Large Language Model (LLM)-augmented framework for tamper-evident and explainable secure log analysis. TALL-Detect combines LLM-based semantic normalization (LLM-Security Risk Classification, LLM-SRC), contextual embedding based on Bidirectional Encoder Representations from Transformers (BERT), temporal behavior modeling, and cross-system correlation analysis. A unified TamperScore is computed through learned weighted fusion of four anomaly dimensions: semantic, structural, temporal, and cross-system behavioral inconsistency. A semi-supervised adaptive learning module (Semi-RALD) enables adaptation to evolving log patterns with minimal labeled data. TALL-Detect was evaluated on the Hadoop Distributed File System (HDFS) and Blue Gene/L supercomputer (BGL) benchmark log datasets with synthetically injected tampering at multiple rates. Results show F1-scores of 0.969 (HDFS) and 0.976 (BGL), with statistically significant improvements over all baselines (p < 0.01, Wilcoxon test) and false positive rates (FPR) below 0.03.
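
The abstract does not give the exact form of the TamperScore fusion, so the following is only a minimal sketch of what a weighted fusion of the four anomaly dimensions could look like. It assumes each dimension yields a score in [0, 1] and that the learned parameters are unconstrained raw weights normalized with a softmax. The names `DIMENSIONS`, `softmax`, `tamper_score`, and `is_tampered`, as well as the 0.5 threshold, are hypothetical and not taken from the paper.

```python
import math

# Hypothetical keys for the four anomaly dimensions named in the abstract.
DIMENSIONS = ("semantic", "structural", "temporal", "cross_system")


def softmax(raw):
    """Map unconstrained raw weights to positive weights that sum to 1."""
    peak = max(raw)  # subtract the maximum for numerical stability
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def tamper_score(scores, raw_weights):
    """Fuse per-dimension anomaly scores into one value.

    Assumption: every score lies in [0, 1], so the fused value, being a
    convex combination of the four scores, also lies in [0, 1].
    """
    weights = softmax([raw_weights[d] for d in DIMENSIONS])
    return sum(w * scores[d] for w, d in zip(weights, DIMENSIONS))


def is_tampered(scores, raw_weights, threshold=0.5):
    """Flag a log window when the fused score reaches the threshold.

    The default threshold is an illustrative assumption.
    """
    return tamper_score(scores, raw_weights) >= threshold
```

In this sketch, equal raw weights reduce the fusion to a plain average of the four scores, while training would shift weight toward whichever dimensions best separate tampered from untampered log windows.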

Item Type: Article
Subjects: Artificial Intelligence and Data Science > Forensics
Artificial Intelligence and Data Science > Cyber Security
Divisions: Computer Science and Engineering
Depositing User: Dr Krishnamurthy V
Date Deposited: 25 Apr 2026 10:08
Last Modified: 25 Apr 2026 10:08
URI: https://ir.psgitech.ac.in/id/eprint/1847