E. Dritsas, M. Trigka, Ph. Mylonas
Privacy-Preserving Record Linkage over Big Data Platforms
The 19th International Conference on Innovations in Intelligent Systems and Applications (INISTA 2025), Ras Al Khaimah, UAE, 29-31 October 2025
ABSTRACT
Privacy-Preserving Record Linkage (PPRL) integrates sensitive datasets from independent parties without exposing personal identifiers. While secure multiparty computation (SMC) and homomorphic encryption provide strong privacy guarantees, they suffer from high computational costs and poor scalability. Encoding-based methods such as Bloom filters are lightweight but face linkage-quality issues at scale due to filter saturation and blocking inefficiencies. This paper proposes a scalable, modular PPRL framework for distributed platforms. It combines Bloom filter encoding, Hamming-based locality-sensitive hashing (LSH), and Dice similarity within a MapReduce pipeline on the Hadoop Distributed File System (HDFS). The system supports decentralized, end-to-end linkage under semi-honest or covert adversarial models. Experiments on datasets of 100,000–500,000 records show linear scalability, a 7.2× speedup over cryptographic baselines, and recall degradation linked to filter saturation. A regression model captures the relationship between execution time and candidate volume, aiding system tuning. The framework supports high-throughput, regulation-compliant linkage for healthcare, finance, and public-sector use.
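The three building blocks named in the abstract (Bloom filter encoding, Hamming-based LSH, and Dice similarity) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names, the shared key, the bigram tokenization, the bit-sampling form of Hamming LSH, and all parameter values (filter length, number of hash functions, number of tables, sample size) are assumptions chosen only for illustration, and the MapReduce/HDFS distribution layer is omitted.

```python
import hashlib
import hmac
import random

# Hypothetical shared secret agreed on by the linking parties (assumption).
SHARED_KEY = b"example-shared-secret"


def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, num_bits=1024, num_hashes=4, key=SHARED_KEY):
    """Encode a string as the set of bit positions set in a Bloom filter."""
    bits = set()
    for gram in bigrams(value):
        for i in range(num_hashes):
            # Keyed hash so that only key holders can reproduce the encoding.
            msg = f"{i}:{gram}".encode("utf-8")
            digest = hmac.new(key, msg, hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % num_bits)
    return bits


def dice(a, b):
    """Dice coefficient between two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def lsh_keys(bits, num_bits=1024, num_tables=8, sample_size=6, seed=42):
    """Hamming LSH by bit sampling: one blocking key per hash table.

    Each table samples a fixed set of bit positions (derived from `seed`);
    filters that agree on all sampled positions share that blocking key.
    """
    rng = random.Random(seed)
    keys = []
    for table in range(num_tables):
        positions = rng.sample(range(num_bits), sample_size)
        keys.append((table, tuple(int(p in bits) for p in positions)))
    return keys


smith = bloom_encode("smith")
smyth = bloom_encode("smyth")
jones = bloom_encode("jones")

# Candidate pairs are records that share at least one blocking key;
# only those pairs would then be compared with the Dice coefficient.
shared = set(lsh_keys(smith)) & set(lsh_keys(smyth))
print(len(shared), round(dice(smith, smyth), 3), round(dice(smith, jones), 3))
```

In a distributed setting, the blocking keys would plausibly serve as the grouping keys of the map phase, with Dice comparison performed per group in the reduce phase; that mapping is likewise an assumption rather than a detail stated in the abstract.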
29 October 2025