2025-12-12

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| TreeVQA: A Tree-Structured Execution Framework for Shot Reduction in Variational Quantum Algorithms | Yuewen Hou, Dhanvi Bharadwaj, Gokul Subramanian Ravi | 2025-12-12 | Download | Variational Quantum Algorithms (VQAs) are promising for near- and intermediate-term quantum computing, but their execution cost is substantial. |
| Leveraging FPGAs for Homomorphic Matrix-Vector Multiplication in Oblivious Message Retrieval | Grant Bosworth, Keewoo Lee, Sunwoong Kim | 2025-12-12 | Download | While end-to-end encryption protects the content of messages, it does not secure metadata, which exposes sender and receiver information through traffic analysis. |
| Using GUI Agent for Electronic Design Automation | Chunyi Li, Longfei Li, Zicheng Zhang, Xiaohong Liu, Min Tang, Weisi Lin, Guangtao Zhai | 2025-12-12 | Download | Graphical User Interface (GUI) agents adopt an end-to-end paradigm that maps a screenshot to an action sequence, thereby automating repetitive tasks in virtual environments. |
| PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration | Yifan Zhang, Zhiheng Chen, Ye Qiao, Sitao Huang | 2025-12-12 | Download | Aggressively quantized large language models (LLMs), such as BitNet-style 1.58-bit Transformers with ternary weights, make it feasible to deploy generative AI on low-power edge FPGAs. |
| CHIME: Chiplet-based Heterogeneous Near-Memory Acceleration for Edge Multimodal LLM Inference | Yanru Chen, Runyang Tian, Yue Pan, Zheyu Li, Weihong Xu, Tajana Rosing | 2025-12-12 | Download | The proliferation of large language models (LLMs) is accelerating the integration of multimodal assistants into edge devices, where inference is executed under stringent latency and energy constraints... |
| RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs | Yanru Chen, Zheyu Li, Keming Fan, Runyang Tian, John Hsu, Weihong Xu, Minxuan Zhou, Tajana Rosing | 2025-12-12 | Download | All-pairs shortest paths (APSP) remains a major bottleneck for large-scale graph analytics, as data movement with cubic complexity overwhelms the bandwidth of conventional memory hierarchies. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| TreeVQA: A Tree-Structured Execution Framework for Shot Reduction in Variational Quantum Algorithms | Yuewen Hou, Dhanvi Bharadwaj, Gokul Subramanian Ravi | 2025-12-12 | Download | Variational Quantum Algorithms (VQAs) are promising for near- and intermediate-term quantum computing, but their execution cost is substantial. |
| Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs | Shiju Li, Younghoon Min, Hane Yie, Hoshik Kim, Soohong Ahn, Joonseop Sim, Chul-Ho Lee, Jongryool Kim | 2025-12-12 | Download | Sparse General Matrix-Matrix Multiplication (SpGEMM) is a fundamental operation in numerous scientific computing and data analytics applications, often bottlenecked by irregular memory access patterns... |
| MVP-ORAM: a Wait-free Concurrent ORAM for Confidential BFT Storage | Robin Vassantlal, Hasan Heydari, Bernardo Ferreira, Alysson Bessani | 2025-12-12 | Download | It is well known that encryption alone is not enough to protect data privacy. Access patterns, revealed when operations are performed, can also be leveraged in inference attacks. |
| Hypergraph based Multi-Party Payment Channel | Ayush Nainwal, Atharva Kamble, Nitin Awathare | 2025-12-12 | Download | Public blockchains inherently offer low throughput and high latency, motivating off-chain scalability solutions such as Payment Channel Networks (PCNs). |
| ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning | Yuze He, Ferdi Kossmann, Srinivasan Seshan, Peter Steenkiste | 2025-12-12 | Download | Recent advances in video analytics address real-time data drift by continuously retraining specialized, lightweight DNN models for individual cameras. |
| Stateless Snowflake: A Cloud-Agnostic Distributed ID Generator Using Network-Derived Identity | Manideep Reddy Chinthareddy | 2025-12-12 | Download | Snowflake-style distributed ID generators are the industry standard for producing k-ordered, unique identifiers at scale. However, the traditional requirement for manually assigned or centrally coordi... |
| FirecREST v2: lessons learned from redesigning an API for scalable HPC resource access | Elia Palme, Juan Pablo Dorsch, Ali Khosravi, Giovanni Pizzi, Francesco Pagnamenta, Andrea Ceriani, Eirini Koutsaniti, Rafael Sarmiento, Ivano Bonesana, Alejandro Dabin | 2025-12-12 | Download | Introducing FirecREST v2, the next generation of our open-source RESTful API for programmatic access to HPC resources. FirecREST v2 delivers a 100x performance improvement over its predecessor. |
| Parallax: Runtime Parallelization for Operator Fallbacks in Heterogeneous Edge Systems | Chong Tang, Hao Dai, Jagmohan Chauhan | 2025-12-12 | Download | The growing demand for real-time DNN applications on edge devices necessitates faster inference of increasingly complex models. Although many devices include specialized accelerators (e.g. |
| Enhanced Pruning for Distributed Closeness Centrality under Multi-Packet Messaging | Patrick D. Manya, Eugene M. Mbuyi, Gothy T. Ngoie, Jordan F. Masakuna | 2025-12-12 | Download | Identifying central nodes using closeness centrality is a critical task in analyzing large-scale complex networks, yet its decentralized computation remains challenging due to high communication overh... |
| LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models | Yijie Zhi, Yayu Cao, Jianhua Dai, Xiaoyang Han, Jingwen Pu, Qingran Wu, Sheng Cheng, Ming Cai | 2025-12-12 | Download | Loop transformations are semantics-preserving optimization techniques, widely used to maximize objectives such as parallelism. Despite decades of research, applying the optimal composition of loop tra... |
| RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training | Tianyuan Wu, Lunxi Cao, Yining Wei, Wei Gao, Yuheng Zhao, Dakai An, Shaopan Xiong, Zhiqiang Lv, Ju Huang, Siran Yang, Yinghao Yu, Jiamang Wang, Lin Qu, Wei Wang | 2025-12-12 | Download | Rollout-training disaggregation is emerging as the standard architecture for Reinforcement Learning (RL) post-training, where memory-bound rollout and compute-bound training are physically disaggregat... |
| RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs | Yanru Chen, Zheyu Li, Keming Fan, Runyang Tian, John Hsu, Weihong Xu, Minxuan Zhou, Tajana Rosing | 2025-12-12 | Download | All-pairs shortest paths (APSP) remains a major bottleneck for large-scale graph analytics, as data movement with cubic complexity overwhelms the bandwidth of conventional memory hierarchies. |
| Theoretical Foundations of GPU-Native Compilation for Rapid Code Iteration | Adilet Metinov, Gulida M. Kudakeeva, Gulnara D. Kabaeva | 2025-12-12 | Download | Current AI code generation systems suffer from significant latency bottlenecks due to CPU-GPU data transfers during compilation, execution, and testing phases. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Leaner and Faster Web: How CBOR Can Improve Dynamic Content Encoding in JSON and DNS over HTTPS | Martine S. Lenders, Carsten Bormann, Thomas C. Schmidt, Matthias Wählisch | 2025-12-12 | Download | The Internet community has taken major efforts to decrease latency in the World Wide Web. Significant improvements have been achieved in accelerating content transport and in compressing static conten... |
| Policy Gradient Algorithms for Age-of-Information Cost Minimization | José-Ramón Vidal, Vicent Pla, Luis Guijarro, Israel Leyva-Mayorga | 2025-12-12 | Download | Recent developments in cyber-physical systems have increased the importance of maximizing the freshness of the information about the physical environment. |
| Hypergraph based Multi-Party Payment Channel | Ayush Nainwal, Atharva Kamble, Nitin Awathare | 2025-12-12 | Download | Public blockchains inherently offer low throughput and high latency, motivating off-chain scalability solutions such as Payment Channel Networks (PCNs). |
| ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning | Yuze He, Ferdi Kossmann, Srinivasan Seshan, Peter Steenkiste | 2025-12-12 | Download | Recent advances in video analytics address real-time data drift by continuously retraining specialized, lightweight DNN models for individual cameras. |
| Toward Scalable VR-Cloud Gaming: An Attention-aware Adaptive Resource Allocation Framework for 6G Networks | Gabriel Almeida, João Paulo Esper, Cleverson Nahum, Aldebaro Klautau, Kleber Vieira Cardoso | 2025-12-12 | Download | Virtual Reality Cloud Gaming (VR-CG) represents a demanding class of immersive applications, requiring high bandwidth, ultra-low latency, and intelligent resource management to ensure optimal user exp... |
| Proving DNSSEC Correctness: A Formal Approach to Secure Domain Name Resolution | Qifan Zhang, Zilin Shen, Imtiaz Karim, Elisa Bertino, Zhou Li | 2025-12-12 | Download | The Domain Name System Security Extensions (DNSSEC) are critical for preventing DNS spoofing, yet its specifications contain ambiguities and vulnerabilities that elude traditional "break-and-fix" appr... |
| Distributed Resource Allocation and Application Deployment in Mesh Edge Networks | Antoine Bernard, Antoine Legrain, Maroua Ben Attia, Abdo Shabah | 2025-12-12 | Download | Virtual Network Embedding (VNE) approaches typically assume static or slowly-changing network topologies, but emerging applications require deployment in mobile environments where traditional methods ... |
| A Survey on Reconfigurable Intelligent Surfaces in Practical Systems: Security and Privacy Perspectives | Ziyu Chen, Yitong Shen, Jingzhe Zhang, Yao Zheng, Yili Ren, Xuyu Wang, Shiwen Mao, Hanqing Guo | 2025-12-12 | Download | Reconfigurable Intelligent Surfaces (RIS) have emerged as a transformative technology capable of reshaping wireless environments through dynamic manipulation of electromagnetic waves. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Hold Onto That Thought: Assessing KV Cache Compression On Reasoning | Minghui Liu, Aadi Palnitkar, Tahseen Rabbani, Hyunwoo Jae, Kyle Rui Sang, Dixi Yao, Shayan Shabihi, Fuheng Zhao, Tian Li, Ce Zhang, Furong Huang, Kunpeng Zhang | 2025-12-12 | Download | Large language models (LLMs) have demonstrated remarkable performance on long-context tasks, but are often bottlenecked by memory constraints. |
| LOOPRAG: Enhancing Loop Transformation Optimization with Retrieval-Augmented Large Language Models | Yijie Zhi, Yayu Cao, Jianhua Dai, Xiaoyang Han, Jingwen Pu, Qingran Wu, Sheng Cheng, Ming Cai | 2025-12-12 | Download | Loop transformations are semantics-preserving optimization techniques, widely used to maximize objectives such as parallelism. Despite decades of research, applying the optimal composition of loop tra... |
| AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs | Anshul Kumar, Gagan Raj Gupta, Manisha Chawla | 2025-12-12 | Download | Large Language Models (LLMs) can perform many NLP tasks well, but fully fine-tuning them is expensive and requires a lot of memory. Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA reduce t... |