2025-12-12

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
TreeVQA: A Tree-Structured Execution Framework for Shot Reduction in Variational Quantum Algorithms	Yuewen Hou, Dhanvi Bharadwaj, Gokul Subramanian Ravi	2025-12-12	Download	Variational Quantum Algorithms (VQAs) are promising for near- and intermediate-term quantum computing, but their execution cost is substantial.
Leveraging FPGAs for Homomorphic Matrix-Vector Multiplication in Oblivious Message Retrieval	Grant Bosworth, Keewoo Lee, Sunwoong Kim	2025-12-12	Download	While end-to-end encryption protects the content of messages, it does not secure metadata, which exposes sender and receiver information through traffic analysis.
Using GUI Agent for Electronic Design Automation	Chunyi Li, Longfei Li, Zicheng Zhang, Xiaohong Liu, Min Tang, Weisi Lin, Guangtao Zhai	2025-12-12	Download	Graphical User Interface (GUI) agents adopt an end-to-end paradigm that maps a screenshot to an action sequence, thereby automating repetitive tasks in virtual environments.
PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration	Yifan Zhang, Zhiheng Chen, Ye Qiao, Sitao Huang	2025-12-12	Download	Aggressively quantized large language models (LLMs), such as BitNet-style 1.58-bit Transformers with ternary weights, make it feasible to deploy generative AI on low-power edge FPGAs.
CHIME: Chiplet-based Heterogeneous Near-Memory Acceleration for Edge Multimodal LLM Inference	Yanru Chen, Runyang Tian, Yue Pan, Zheyu Li, Weihong Xu, Tajana Rosing	2025-12-12	Download	The proliferation of large language models (LLMs) is accelerating the integration of multimodal assistants into edge devices, where inference is executed under stringent latency and energy constraints...
RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs	Yanru Chen, Zheyu Li, Keming Fan, Runyang Tian, John Hsu, Weihong Xu, Minxuan Zhou, Tajana Rosing	2025-12-12	Download	All-pairs shortest paths (APSP) remains a major bottleneck for large-scale graph analytics, as data movement with cubic complexity overwhelms the bandwidth of conventional memory hierarchies.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
TreeVQA: A Tree-Structured Execution Framework for Shot Reduction in Variational Quantum Algorithms	Yuewen Hou, Dhanvi Bharadwaj, Gokul Subramanian Ravi	2025-12-12	Download	Variational Quantum Algorithms (VQAs) are promising for near- and intermediate-term quantum computing, but their execution cost is substantial.
Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs	Shiju Li, Younghoon Min, Hane Yie, Hoshik Kim, Soohong Ahn, Joonseop Sim, Chul-Ho Lee, Jongryool Kim	2025-12-12	Download	Sparse General Matrix-Matrix Multiplication (SpGEMM) is a fundamental operation in numerous scientific computing and data analytics applications, often bottlenecked by irregular memory access patterns...
MVP-ORAM: a Wait-free Concurrent ORAM for Confidential BFT Storage	Robin Vassantlal, Hasan Heydari, Bernardo Ferreira, Alysson Bessani	2025-12-12	Download	It is well known that encryption alone is not enough to protect data privacy. Access patterns, revealed when operations are performed, can also be leveraged in inference attacks.
Hypergraph based Multi-Party Payment Channel	Ayush Nainwal, Atharva Kamble, Nitin Awathare	2025-12-12	Download	Public blockchains inherently offer low throughput and high latency, motivating off-chain scalability solutions such as Payment Channel Networks (PCNs).
ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning	Yuze He, Ferdi Kossmann, Srinivasan Seshan, Peter Steenkiste	2025-12-12	Download	Recent advances in video analytics address real-time data drift by continuously retraining specialized, lightweight DNN models for individual cameras.
Stateless Snowflake: A Cloud-Agnostic Distributed ID Generator Using Network-Derived Identity	Manideep Reddy Chinthareddy	2025-12-12	Download	Snowflake-style distributed ID generators are the industry standard for producing k-ordered, unique identifiers at scale. However, the traditional requirement for manually assigned or centrally coordi...
FirecREST v2: lessons learned from redesigning an API for scalable HPC resource access	Elia Palme, Juan Pablo Dorsch, Ali Khosravi, Giovanni Pizzi, Francesco Pagnamenta, Andrea Ceriani, Eirini Koutsaniti, Rafael Sarmiento, Ivano Bonesana, Alejandro Dabin	2025-12-12	Download	Introducing FirecREST v2, the next generation of our open-source RESTful API for programmatic access to HPC resources. FirecREST v2 delivers a 100x performance improvement over its predecessor.
Parallax: Runtime Parallelization for Operator Fallbacks in Heterogeneous Edge Systems	Chong Tang, Hao Dai, Jagmohan Chauhan	2025-12-12	Download	The growing demand for real-time DNN applications on edge devices necessitates faster inference of increasingly complex models. Although many devices include specialized accelerators (e.g.
Enhanced Pruning for Distributed Closeness Centrality under Multi-Packet Messaging	Patrick D. Manya, Eugene M. Mbuyi, Gothy T. Ngoie, Jordan F. Masakuna	2025-12-12	Download	Identifying central nodes using closeness centrality is a critical task in analyzing large-scale complex networks, yet its decentralized computation remains challenging due to high communication overh...
LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models	Yijie Zhi, Yayu Cao, Jianhua Dai, Xiaoyang Han, Jingwen Pu, Qingran Wu, Sheng Cheng, Ming Cai	2025-12-12	Download	Loop transformations are semantics-preserving optimization techniques, widely used to maximize objectives such as parallelism. Despite decades of research, applying the optimal composition of loop tra...
RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training	Tianyuan Wu, Lunxi Cao, Yining Wei, Wei Gao, Yuheng Zhao, Dakai An, Shaopan Xiong, Zhiqiang Lv, Ju Huang, Siran Yang, Yinghao Yu, Jiamang Wang, Lin Qu, Wei Wang	2025-12-12	Download	Rollout-training disaggregation is emerging as the standard architecture for Reinforcement Learning (RL) post-training, where memory-bound rollout and compute-bound training are physically disaggregat...
RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs	Yanru Chen, Zheyu Li, Keming Fan, Runyang Tian, John Hsu, Weihong Xu, Minxuan Zhou, Tajana Rosing	2025-12-12	Download	All-pairs shortest paths (APSP) remains a major bottleneck for large-scale graph analytics, as data movement with cubic complexity overwhelms the bandwidth of conventional memory hierarchies.
Theoretical Foundations of GPU-Native Compilation for Rapid Code Iteration	Adilet Metinov, Gulida M. Kudakeeva, Gulnara D. Kabaeva	2025-12-12	Download	Current AI code generation systems suffer from significant latency bottlenecks due to CPU-GPU data transfers during compilation, execution, and testing phases.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Leaner and Faster Web: How CBOR Can Improve Dynamic Content Encoding in JSON and DNS over HTTPS	Martine S. Lenders, Carsten Bormann, Thomas C. Schmidt, Matthias Wählisch	2025-12-12	Download	The Internet community has taken major efforts to decrease latency in the World Wide Web. Significant improvements have been achieved in accelerating content transport and in compressing static conten...
Policy Gradient Algorithms for Age-of-Information Cost Minimization	José-Ramón Vidal, Vicent Pla, Luis Guijarro, Israel Leyva-Mayorga	2025-12-12	Download	Recent developments in cyber-physical systems have increased the importance of maximizing the freshness of the information about the physical environment.
Hypergraph based Multi-Party Payment Channel	Ayush Nainwal, Atharva Kamble, Nitin Awathare	2025-12-12	Download	Public blockchains inherently offer low throughput and high latency, motivating off-chain scalability solutions such as Payment Channel Networks (PCNs).
ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning	Yuze He, Ferdi Kossmann, Srinivasan Seshan, Peter Steenkiste	2025-12-12	Download	Recent advances in video analytics address real-time data drift by continuously retraining specialized, lightweight DNN models for individual cameras.
Toward Scalable VR-Cloud Gaming: An Attention-aware Adaptive Resource Allocation Framework for 6G Networks	Gabriel Almeida, João Paulo Esper, Cleverson Nahum, Aldebaro Klautau, Kleber Vieira Cardoso	2025-12-12	Download	Virtual Reality Cloud Gaming (VR-CG) represents a demanding class of immersive applications, requiring high bandwidth, ultra-low latency, and intelligent resource management to ensure optimal user exp...
Proving DNSSEC Correctness: A Formal Approach to Secure Domain Name Resolution	Qifan Zhang, Zilin Shen, Imtiaz Karim, Elisa Bertino, Zhou Li	2025-12-12	Download	The Domain Name System Security Extensions (DNSSEC) are critical for preventing DNS spoofing, yet its specifications contain ambiguities and vulnerabilities that elude traditional "break-and-fix" appr...
Distributed Resource Allocation and Application Deployment in Mesh Edge Networks	Antoine Bernard, Antoine Legrain, Maroua Ben Attia, Abdo Shabah	2025-12-12	Download	Virtual Network Embedding (VNE) approaches typically assume static or slowly-changing network topologies, but emerging applications require deployment in mobile environments where traditional methods ...
A Survey on Reconfigurable Intelligent Surfaces in Practical Systems: Security and Privacy Perspectives	Ziyu Chen, Yitong Shen, Jingzhe Zhang, Yao Zheng, Yili Ren, Xuyu Wang, Shiwen Mao, Hanqing Guo	2025-12-12	Download	Reconfigurable Intelligent Surfaces (RIS) have emerged as a transformative technology capable of reshaping wireless environments through dynamic manipulation of electromagnetic waves.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Hold Onto That Thought: Assessing KV Cache Compression On Reasoning	Minghui Liu, Aadi Palnitkar, Tahseen Rabbani, Hyunwoo Jae, Kyle Rui Sang, Dixi Yao, Shayan Shabihi, Fuheng Zhao, Tian Li, Ce Zhang, Furong Huang, Kunpeng Zhang	2025-12-12	Download	Large language models (LLMs) have demonstrated remarkable performance on long-context tasks, but are often bottlenecked by memory constraints.
LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models	Yijie Zhi, Yayu Cao, Jianhua Dai, Xiaoyang Han, Jingwen Pu, Qingran Wu, Sheng Cheng, Ming Cai	2025-12-12	Download	Loop transformations are semantics-preserving optimization techniques, widely used to maximize objectives such as parallelism. Despite decades of research, applying the optimal composition of loop tra...
AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs	Anshul Kumar, Gagan Raj Gupta, Manisha Chawla	2025-12-12	Download	Large Language Models (LLMs) can perform many NLP tasks well, but fully fine-tuning them is expensive and requires a lot of memory. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA reduce t...