E. Dritsas, M. Trigka, Ph. Mylonas
Efficient Query Filtering in Big Data Using Entropy-Based Learned Indexing
The 19th International Conference on Innovations in Intelligent Systems and Applications (INISTA 2025), Ras Al Khaimah, UAE, 29-31 October 2025
ABSTRACT
This paper explores the use of neural networks as lightweight surrogates for guiding value localization over numeric attributes in analytical data systems. Instead of modeling key-position mappings directly, we frame the problem as a multi-class classification task over discretized bins, enabling approximate query support without explicit index structures. The model is trained in a self-supervised manner using pseudo-labels from data-driven binning schemes, requiring no manual annotation. Experiments on three real-world datasets show that our method achieves classification accuracy above 94.7% for coarse binning (K = 5), with consistent F1 scores and inference times under 0.6 milliseconds (ms). The approach remains robust under skewed distributions using entropy- or quantile-based binning and is well suited to latency-sensitive tasks such as filtering or partition pruning. While this study focuses on univariate attributes, the method is extensible to multivariate scenarios via Multilayer Perceptrons (MLPs) or attention-based models. Overall, neural surrogates offer a practical complement to traditional indexing in approximate or edge-driven analytical workloads.
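The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain quantile-based binning with K = 5 on a synthetic skewed attribute, and the helper names (`quantile_edges`, `bin_label`, `candidate_bins`) are hypothetical. An exact lookup stands in for the neural classifier, which in the paper's setting would be trained to predict the same bin label from the raw value.

```python
import bisect
import random


def quantile_edges(values, k):
    """Return k-1 interior cut points so each bin holds roughly len(values)/k items."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def bin_label(x, edges):
    """Pseudo-label for x: index of the bin it falls into (0 .. len(edges))."""
    return bisect.bisect_right(edges, x)


def candidate_bins(lo, hi, edges):
    """Bins that may hold values in [lo, hi]; every other bin can be pruned."""
    return list(range(bin_label(lo, edges), bin_label(hi, edges) + 1))


random.seed(0)
# Assumption: an exponential sample stands in for a skewed real-world column.
values = [random.expovariate(1.0) for _ in range(10_000)]

K = 5
edges = quantile_edges(values, K)

# Self-supervised targets: derived from the data itself, no manual annotation.
labels = [bin_label(v, edges) for v in values]
counts = [labels.count(b) for b in range(K)]

print("bin edges:", [round(e, 3) for e in edges])
print("items per bin:", counts)
print("bins to scan for [0.5, 1.0]:", candidate_bins(0.5, 1.0, edges))
```

Quantile edges keep the bins close to equally populated even when the values are heavily skewed, which is one reason such schemes suit classification targets. A range filter then only needs to touch the bins returned by `candidate_bins`, which is the partition-pruning use the abstract mentions.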
29 October 2025
E. Dritsas, M. Trigka, Ph. Mylonas, "Efficient Query Filtering in Big Data Using Entropy-Based Learned Indexing", The 19th International Conference on Innovations in Intelligent Systems and Applications (INISTA 2025), Ras Al Khaimah, UAE, 29-31 October 2025