2025-03-04

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Enabling Fast, Accurate, and Efficient Real-Time Genome Analysis via New Algorithms and Techniques	Can Firtina	2025-03-04	Download	The advent of high-throughput sequencing technologies has revolutionized genome analysis by enabling the rapid and cost-effective sequencing of large genomes.
SWAPPER: Dynamic Operand Swapping in Non-commutative Approximate Circuits for Online Error Reduction	Marcello Traiola, Nazar Misyats, Silviu-Ioan Filip, Remi Garcia, Angeliki Kritikakou	2025-03-04	Download	Error-tolerant applications, such as multimedia processing, machine learning, signal processing, and scientific computing, can produce satisfactory outputs even when approximate computations are perfo...
CORDIC Is All You Need	Omkar Kokane, Adam Teman, Anushka Jha, Guru Prasath SL, Gopal Raut, Mukul Lokhande, S. V. Jaya Chand, Tanushree Dewangan, Santosh Kumar Vishvakarma	2025-03-04	Download	Artificial intelligence necessitates adaptable hardware accelerators for efficient high-throughput million operations. We present a pipelined architecture with a CORDIC block for linear MAC computations a...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
A HPX Communication Benchmark: Distributed FFT using Collectives	Alexander Strack, Dirk Pflüger	2025-03-04	Download	Due to increasing core counts in modern processors, several task-based runtimes emerged, including the C++ Standard Library for Concurrency and Parallelism (HPX).
Comparative Analysis of Lightweight Kubernetes Distributions for Edge Computing: Performance and Resource Efficiency	Diyaz Yakubov, David Hästbacka	2025-03-04	Download	Edge computing environments increasingly rely on lightweight container orchestration platforms to manage resource-constrained devices. This paper provides an empirical analysis of five lightweight kub...
Deal: Distributed End-to-End GNN Inference for All Nodes	Shiyang Chen, Xiang Song, Vasiloudis Theodore, Hang Liu	2025-03-04	Download	Graph Neural Networks (GNNs) are a new research frontier with various applications and successes. End-to-end inference for all nodes is common for GNN embedding models, which are widely adopted i...
ESSPI: ECDSA/Schnorr Signed Program Input for BitVMX	Sergio Demian Lerner, Martin Jonas, Ariel Futoransky	2025-03-04	Download	The BitVM and BitVMX protocols have long relied on inefficient one-time signature (OTS) schemes like Lamport and Winternitz for signing program inputs.
A Sheaf-Theoretic Characterization of Tasks in Distributed Systems	Stephan Felber, Bernardo Hummes Flores, Hugo Rincon Galeana	2025-03-04	Download	We introduce a sheaf-theoretic characterization of task solvability in general distributed computing models, unifying distinct approaches to message-passing models.
SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling	Cunchi Lv, Xiao Shi, Dong Liang, Wenting Tan, Xiaofang Zhao	2025-03-04	Download	Deep Learning (DL), especially with Large Language Models (LLMs), brings benefits to various areas. However, DL training systems usually yield prominent idling GPU resources due to many factors, such ...
Introducing Support for Move Operations in Melda CRDT	Amos Brocco	2025-03-04	Download	In this paper, we present an extension to Melda (a library which implements a general purpose delta state JSON CRDT) to support move operations.
Memory and Bandwidth are All You Need for Fully Sharded Data Parallel	Jiangtao Wang, Jan Ebert, Oleg Filatov, Stefan Kesselheim	2025-03-04	Download	Transformer models have revolutionized a wide spectrum of disciplines, especially in language processing. The recent success has proven that model size scalability is crucial for achieving superior pe...
3-Majority and 2-Choices with Many Opinions	Nobutaka Shimizu, Takeharu Shiraga	2025-03-04	Download	We present the first nearly-optimal bounds on the consensus time for the well-known synchronous consensus dynamics, specifically 3-Majority and 2-Choices, for an arbitrary number of opinions.
Efficient Long Context Fine-tuning with Chunk Flow	Xiulong Yuan, Hongtao Xu, Wenting Shen, Ang Wang, Xiafei Qiu, Jie Zhang, Yuqiong Liu, Bowen Yu, Junyang Lin, Mingzhen Li, Weile Jia, Yong Li, Wei Lin	2025-03-04	Download	Long context fine-tuning of large language models (LLMs) involves training on datasets that are predominantly composed of short sequences and a small proportion of longer sequences.
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory	Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo	2025-03-04	Download	Large language models like GPT-4 are resource-intensive, but recent advancements suggest that smaller, specialized experts can outperform the monolithic models on specific tasks.
PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators	Keondo Park, You Rim Choi, Inhoe Lee, Hyung-Sin Kim	2025-03-04	Download	Running deep learning models on resource-constrained edge devices has drawn significant attention due to its fast response, privacy preservation, and robust operation regardless of Internet connectivi...
VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference	Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni, Yangjie Zhou, Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Jingwen Leng, Chen Jin	2025-03-04	Download	In this work, we design and implement VQ-LLM, an efficient fused Vector Quantization (VQ) kernel generation framework. We first introduce a software abstraction called codebook cache to optimize codeb...
A Distributed Partitioning Software and its Applications	Aparna Sasidharan	2025-03-04	Download	This article describes a geometric partitioning software that can be used for quick computation of data partitions on many-core HPC machines. It is most suited for dynamic applications with load distr...
Relaxation for Efficient Asynchronous Queues	Samuel Baldwin, Cole Hausman, Mohamed Bakr, Edward Talmage	2025-03-04	Download	We explore the problem of efficiently implementing shared data structures in an asynchronous computing environment. We start with a traditional FIFO queue, showing that full replication is possible wi...
AugFL: Augmenting Federated Learning with Pretrained Models	Sheng Yue, Zerui Qin, Yongheng Deng, Ju Ren, Yaoxue Zhang, Junshan Zhang	2025-03-04	Download	Federated Learning (FL) has garnered widespread interest in recent years. However, owing to strict privacy policies or limited storage capacities of training participants such as IoT devices, its effe...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Generative Active Adaptation for Drifting and Imbalanced Network Intrusion Detection	Ragini Gupta, Shinan Liu, Ruixiao Zhang, Xinyue Hu, Xiaoyang Wang, Hadjer Benkraouda, Pranav Kommaraju, Phuong Cao, Nick Feamster, Klara Nahrstedt	2025-03-04	Download	Machine learning has shown promise in network intrusion detection systems, yet its performance often degrades due to concept drift and imbalanced data.
Scaling IP Lookup to Large Databases using the CRAM Lens	Robert Chang, Pradeep Dogga, Andy Fingerhut, Victor Rios, George Varghese	2025-03-04	Download	Wide-area scaling trends require new approaches to Internet Protocol (IP) lookup, enabled by modern networking chips such as Intel Tofino, AMD Pensando, and Nvidia BlueField, which provide substantial...
Efficient and Optimal No-Regret Caching under Partial Observation	Younes Ben Mazziane, Francescomaria Faticanti, Sara Alouf, Giovanni Neglia	2025-03-04	Download	Online learning algorithms have been successfully used to design caching policies with sublinear regret in the total number of requests, with no statistical assumption about the request sequence.
Network Simulator-centric Compositional Testing	Tom Rousseaux, Christophe Crochet, John Aoga, Axel Legay	2025-03-04	Download	This article introduces a novel methodology, Network Simulator-centric Compositional Testing (NSCT), to enhance the verification of network protocols with a particular focus on time-varying network pr...
PANTHER: Pluginizable Testing Environment for Network Protocols	Christophe Crochet, John Aoga, Axel Legay	2025-03-04	Download	In this paper, we introduce PANTHER, a modular framework for testing network protocols and formally verifying their specification. The framework incorporates a plugin architecture to enhance flexibili...
Real-Time Burst-Mode Digital Signal Processing for Passive Optical Networks	Ji Zhou, Kainan Wu, Haide Wang, Jinyang Yang, Weiping Liu, Junwen Zhang, Changyuan Yu, Xiangjun Xin, Liangchuan Li	2025-03-04	Download	Driven by the ever-increasing capacity demands, the 50G passive optical network (PON) is maturing gradually. One of the main challenges for the 50G PON is implementing burst-mode digital signal proces...
MobRFFI: Non-cooperative Device Re-identification for Mobility Intelligence	Stepan Mazokha, Fanchen Bao, George Sklivanitis, Jason O. Hallstrom	2025-03-04	Download	WiFi-based mobility monitoring in urban environments can provide valuable insights into pedestrian and vehicle movements. However, MAC address randomization introduces a significant obstacle in accura...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference	Hongchao Du, Shangyu Wu, Arina Kharlamova, Nan Guan, Chun Jason Xue	2025-03-04	Download	Large Language Models (LLMs) face challenges for on-device inference due to high memory demands. Traditional methods to reduce memory usage often compromise performance and lack adaptability.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Heavy-traffic Optimality of Skip-the-Longest-Queues in Heterogeneous Service Systems	Yishun Luo, Martin Zubeldia	2025-03-04	Download	We consider a discrete-time parallel service system consisting of $n$ heterogeneous single server queues with infinite capacity. Jobs arrive to the system as an i.i.d.
Energy efficiency of cache eviction algorithms for Zipf distributed objects	Emese Sziklay, Tamás Jursonovics	2025-03-04	Download	This paper presents a summary analysis of the Least Frequently Used (LFU) and Perfect Least Frequently Used (PLFU) cache eviction algorithms on real data, transferred on Content Delivery Networks (CD...
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory	Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo	2025-03-04	Download	Large language models like GPT-4 are resource-intensive, but recent advancements suggest that smaller, specialized experts can outperform the monolithic models on specific tasks.