
2025-03-04

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Enabling Fast, Accurate, and Efficient Real-Time Genome Analysis via New Algorithms and Techniques | Can Firtina | 2025-03-04 | Download | The advent of high-throughput sequencing technologies has revolutionized genome analysis by enabling the rapid and cost-effective sequencing of large genomes. |
| SWAPPER: Dynamic Operand Swapping in Non-commutative Approximate Circuits for Online Error Reduction | Marcello Traiola, Nazar Misyats, Silviu-Ioan Filip, Remi Garcia, Angeliki Kritikakou | 2025-03-04 | Download | Error-tolerant applications, such as multimedia processing, machine learning, signal processing, and scientific computing, can produce satisfactory outputs even when approximate computations are perfo... |
| CORDIC Is All You Need | Omkar Kokane, Adam Teman, Anushka Jha, Guru Prasath SL, Gopal Raut, Mukul Lokhande, S. V. Jaya Chand, Tanushree Dewangan, Santosh Kumar Vishvakarma | 2025-03-04 | Download | Artificial intelligence necessitates adaptable hardware accelerators for efficient high-throughput million operations. We present pipelined architecture with CORDIC block for linear MAC computations a... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A HPX Communication Benchmark: Distributed FFT using Collectives | Alexander Strack, Dirk Pflüger | 2025-03-04 | Download | Due to increasing core counts in modern processors, several task-based runtimes emerged, including the C++ Standard Library for Concurrency and Parallelism (HPX). |
| Comparative Analysis of Lightweight Kubernetes Distributions for Edge Computing: Performance and Resource Efficiency | Diyaz Yakubov, David Hästbacka | 2025-03-04 | Download | Edge computing environments increasingly rely on lightweight container orchestration platforms to manage resource-constrained devices. This paper provides an empirical analysis of five lightweight kub... |
| Deal: Distributed End-to-End GNN Inference for All Nodes | Shiyang Chen, Xiang Song, Vasiloudis Theodore, Hang Liu | 2025-03-04 | Download | Graph Neural Networks (GNNs) are a new research frontier with various applications and successes. The end-to-end inference for all nodes is common for GNN embedding models, which are widely adopted i... |
| ESSPI: ECDSA/Schnorr Signed Program Input for BitVMX | Sergio Demian Lerner, Martin Jonas, Ariel Futoransky | 2025-03-04 | Download | The BitVM and BitVMX protocols have long relied on inefficient one-time signature (OTS) schemes like Lamport and Winternitz for signing program inputs. |
| A Sheaf-Theoretic Characterization of Tasks in Distributed Systems | Stephan Felber, Bernardo Hummes Flores, Hugo Rincon Galeana | 2025-03-04 | Download | We introduce a sheaf-theoretic characterization of task solvability in general distributed computing models, unifying distinct approaches to message-passing models. |
| SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling | Cunchi Lv, Xiao Shi, Dong Liang, Wenting Tan, Xiaofang Zhao | 2025-03-04 | Download | Deep Learning (DL), especially with Large Language Models (LLMs), brings benefits to various areas. However, DL training systems usually yield prominent idling GPU resources due to many factors, such ... |
| Introducing Support for Move Operations in Melda CRDT | Amos Brocco | 2025-03-04 | Download | In this paper, we present an extension to Melda (a library which implements a general purpose delta state JSON CRDT) to support move operations. |
| Memory and Bandwidth are All You Need for Fully Sharded Data Parallel | Jiangtao Wang, Jan Ebert, Oleg Filatov, Stefan Kesselheim | 2025-03-04 | Download | Transformer models have revolutionized a wide spectrum of disciplines, especially in language processing. The recent success has proven that model size scalability is crucial for achieving superior pe... |
| 3-Majority and 2-Choices with Many Opinions | Nobutaka Shimizu, Takeharu Shiraga | 2025-03-04 | Download | We present the first nearly-optimal bounds on the consensus time for the well-known synchronous consensus dynamics, specifically 3-Majority and 2-Choices, for an arbitrary number of opinions. |
| Efficient Long Context Fine-tuning with Chunk Flow | Xiulong Yuan, Hongtao Xu, Wenting Shen, Ang Wang, Xiafei Qiu, Jie Zhang, Yuqiong Liu, Bowen Yu, Junyang Lin, Mingzhen Li, Weile Jia, Yong Li, Wei Lin | 2025-03-04 | Download | Long context fine-tuning of large language models (LLMs) involves training on datasets that are predominantly composed of short sequences and a small proportion of longer sequences. |
| CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory | Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo | 2025-03-04 | Download | Large language models like GPT-4 are resource-intensive, but recent advancements suggest that smaller, specialized experts can outperform the monolithic models on specific tasks. |
| PointSplit: Towards On-device 3D Object Detection with Heterogeneous Low-power Accelerators | Keondo Park, You Rim Choi, Inhoe Lee, Hyung-Sin Kim | 2025-03-04 | Download | Running deep learning models on resource-constrained edge devices has drawn significant attention due to its fast response, privacy preservation, and robust operation regardless of Internet connectivi... |
| VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference | Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni, Yangjie Zhou, Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Jingwen Leng, Chen Jin | 2025-03-04 | Download | In this work, we design and implement VQ-LLM, an efficient fused Vector Quantization (VQ) kernel generation framework. We first introduce a software abstraction called codebook cache to optimize codeb... |
| A Distributed Partitioning Software and its Applications | Aparna Sasidharan | 2025-03-04 | Download | This article describes a geometric partitioning software that can be used for quick computation of data partitions on many-core HPC machines. It is most suited for dynamic applications with load distr... |
| Relaxation for Efficient Asynchronous Queues | Samuel Baldwin, Cole Hausman, Mohamed Bakr, Edward Talmage | 2025-03-04 | Download | We explore the problem of efficiently implementing shared data structures in an asynchronous computing environment. We start with a traditional FIFO queue, showing that full replication is possible wi... |
| AugFL: Augmenting Federated Learning with Pretrained Models | Sheng Yue, Zerui Qin, Yongheng Deng, Ju Ren, Yaoxue Zhang, Junshan Zhang | 2025-03-04 | Download | Federated Learning (FL) has garnered widespread interest in recent years. However, owing to strict privacy policies or limited storage capacities of training participants such as IoT devices, its effe... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Generative Active Adaptation for Drifting and Imbalanced Network Intrusion Detection | Ragini Gupta, Shinan Liu, Ruixiao Zhang, Xinyue Hu, Xiaoyang Wang, Hadjer Benkraouda, Pranav Kommaraju, Phuong Cao, Nick Feamster, Klara Nahrstedt | 2025-03-04 | Download | Machine learning has shown promise in network intrusion detection systems, yet its performance often degrades due to concept drift and imbalanced data. |
| Scaling IP Lookup to Large Databases using the CRAM Lens | Robert Chang, Pradeep Dogga, Andy Fingerhut, Victor Rios, George Varghese | 2025-03-04 | Download | Wide-area scaling trends require new approaches to Internet Protocol (IP) lookup, enabled by modern networking chips such as Intel Tofino, AMD Pensando, and Nvidia BlueField, which provide substantial... |
| Efficient and Optimal No-Regret Caching under Partial Observation | Younes Ben Mazziane, Francescomaria Faticanti, Sara Alouf, Giovanni Neglia | 2025-03-04 | Download | Online learning algorithms have been successfully used to design caching policies with sublinear regret in the total number of requests, with no statistical assumption about the request sequence. |
| Network Simulator-centric Compositional Testing | Tom Rousseaux, Christophe Crochet, John Aoga, Axel Legay | 2025-03-04 | Download | This article introduces a novel methodology, Network Simulator-centric Compositional Testing (NSCT), to enhance the verification of network protocols with a particular focus on time-varying network pr... |
| PANTHER: Pluginizable Testing Environment for Network Protocols | Christophe Crochet, John Aoga, Axel Legay | 2025-03-04 | Download | In this paper, we introduce PANTHER, a modular framework for testing network protocols and formally verifying their specification. The framework incorporates a plugin architecture to enhance flexibili... |
| Real-Time Burst-Mode Digital Signal Processing for Passive Optical Networks | Ji Zhou, Kainan Wu, Haide Wang, Jinyang Yang, Weiping Liu, Junwen Zhang, Changyuan Yu, Xiangjun Xin, Liangchuan Li | 2025-03-04 | Download | Driven by the ever-increasing capacity demands, the 50G passive optical network (PON) is maturing gradually. One of the main challenges for the 50G PON is implementing burst-mode digital signal proces... |
| MobRFFI: Non-cooperative Device Re-identification for Mobility Intelligence | Stepan Mazokha, Fanchen Bao, George Sklivanitis, Jason O. Hallstrom | 2025-03-04 | Download | WiFi-based mobility monitoring in urban environments can provide valuable insights into pedestrian and vehicle movements. However, MAC address randomization introduces a significant obstacle in accura... |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading for On-Device LLM Inference | Hongchao Du, Shangyu Wu, Arina Kharlamova, Nan Guan, Chun Jason Xue | 2025-03-04 | Download | Large Language Models (LLMs) face challenges for on-device inference due to high memory demands. Traditional methods to reduce memory usage often compromise performance and lack adaptability. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Heavy-traffic Optimality of Skip-the-Longest-Queues in Heterogeneous Service Systems | Yishun Luo, Martin Zubeldia | 2025-03-04 | Download | We consider a discrete-time parallel service system consisting of n heterogeneous single server queues with infinite capacity. Jobs arrive to the system as an i.i.d. |
| Energy efficiency of cache eviction algorithms for Zipf distributed objects | Emese Sziklay, Tamás Jursonovics | 2025-03-04 | Download | This paper presents a summary analysis of the Least Frequently Used (LFU) and Perfect Least Frequently Used (PLFU) cache eviction algorithms on real data, transferred on Content Delivery Networks (CD... |
| CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory | Jiashun Suo, Xiaojian Liao, Limin Xiao, Li Ruan, Jinquan Wang, Xiao Su, Zhisheng Huo | 2025-03-04 | Download | Large language models like GPT-4 are resource-intensive, but recent advancements suggest that smaller, specialized experts can outperform the monolithic models on specific tasks. |
