2025-04-24

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ApproXAI: Energy-Efficient Hardware Acceleration of Explainable AI using Approximate Computing | Ayesha Siddique, Khurram Khalil, Khaza Anuarul Hoque | 2025-04-24 | Download | Explainable artificial intelligence (XAI) enhances AI system transparency by framing interpretability as an optimization problem. However, this approach often necessitates numerous iterations of compu... |
| Biting the CHERI bullet: Blockers, Enablers and Security Implications of CHERI in Defence | Shamal Faily | 2025-04-24 | Download | There is growing interest in securing the hardware foundations software stacks build upon. However, before making any investment decision, software and hardware supply chain stakeholders require evide... |
| L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference | Qingyuan Liu, Liyan Chen, Yanning Yang, Haocheng Wang, Dong Du, Zhigang Mao, Naifeng Jing, Yubin Xia, Haibo Chen | 2025-04-24 | Download | Large Language Models (LLMs) increasingly require processing long text sequences, but GPU memory limitations force difficult trade-offs between memory capacity and bandwidth. |
| On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration | Maoyang Xiang, Ramesh Fernando, Bo Wang | 2025-04-24 | Download | Transformer-based Large Language Models (LLMs) have significantly advanced AI capabilities but pose considerable challenges for deployment on edge devices due to high computational demands, memory ban... |
| Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration | Robin Geens, Arne Symons, Marian Verhelst | 2025-04-24 | Download | State Space Models (SSMs) offer a promising alternative to transformers for long-sequence processing. However, their efficiency remains hindered by memory-bound operations, particularly in the prefill... |
| FLAG: Formal and LLM-assisted SVA Generation for Formal Specifications of On-Chip Communication Protocols | Yu-An Shih, Annie Lin, Aarti Gupta, Sharad Malik | 2025-04-24 | Download | Formal specifications of on-chip communication protocols are crucial for system-on-chip (SoC) design and verification. However, manually constructing these formal specifications from informal document... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| EPSILON: Adaptive Fault Mitigation in Approximate Deep Neural Network using Statistical Signatures | Khurram Khalil, Khaza Anuarul Hoque | 2025-04-24 | Download | The increasing adoption of approximate computing in deep neural network accelerators (AxDNNs) promises significant energy efficiency gains. However, permanent faults in AxDNNs can severely degrade the... |
| Optimized Cloud Resource Allocation Using Genetic Algorithms for Energy Efficiency and QoS Assurance | Caroline Panggabean, Devaraj Verma C, Bhagyashree Gogoi, Ranju Limbu, Rhythm Sarker | 2025-04-24 | Download | Cloud computing environments demand dynamic and efficient resource management to ensure optimal performance, reduced energy consumption, and adherence to Service Level Agreements (SLAs). |
| Cross-region Model Training with Communication-Computation Overlapping and Delay Compensation | Ying Zhu, Yang Xu, Hongli Xu, Yunming Liao, Zhiwei Yao, Liusheng Huang | 2025-04-24 | Download | Training large language models (LLMs) requires massive computational resources, often necessitating the aggregation of geographically distributed data centers (i.e., cross-region training). |
| TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System | Zheng Wei, Jing Xing, Yida Gu, Wenjing Huang, Dong Dai, Guangming Tan, Dingwen Tao | 2025-04-24 | Download | Compared to replication-based storage systems, erasure-coded storage incurs significantly higher overhead during data updates. To address this issue, various parity logging methods have been propose... |
| Shared Randomness in Locally Checkable Problems: The Role of Computational Assumptions | Adar Hadad, Moni Naor | 2025-04-24 | Download | Shared randomness is a valuable resource in distributed computing, allowing some form of coordination between processors without explicit communication. |
| Communication-Efficient Personalized Distributed Learning with Data and Node Heterogeneity | Zhuojun Tian, Zhaoyang Zhang, Yiwei Li, Mehdi Bennis | 2025-04-24 | Download | To jointly tackle the challenges of data and node heterogeneity in decentralized learning, we propose a distributed strong lottery ticket hypothesis (DSLTH), based on which a communication-efficient p... |
| GRANITE : a Byzantine-Resilient Dynamic Gossip Learning Framework | Yacine Belal, Mohamed Maouche, Sonia Ben Mokhtar, Anthony Simonet-Boulogne | 2025-04-24 | Download | Gossip Learning (GL) is a decentralized learning paradigm where users iteratively exchange and aggregate models with a small set of neighboring peers. |
| CHASe: Client Heterogeneity-Aware Data Selection for Effective Federated Active Learning | Jun Zhang, Jue Wang, Huan Li, Zhongle Xie, Ke Chen, Lidan Shou | 2025-04-24 | Download | Active learning (AL) reduces human annotation costs for machine learning systems by strategically selecting the most informative unlabeled data for annotation, but performing it individually may still... |
| Dynamic Approximate Maximum Matching in the Distributed Vertex Partition Model | Peter Robinson, Xianbin Zhu | 2025-04-24 | Download | We initiate the study of approximate maximum matching in the vertex partition model, for graphs subject to dynamic changes. We assume that the n vertices of the graph are partitioned among k playe... |
| JITServe: SLO-aware LLM Serving with Imprecise Request Information | Wei Zhang, Zhiyu Wu, Yi Mu, Rui Ning, Banruo Liu, Nikhil Sarda, Myungjin Lee, Fan Lai | 2025-04-24 | Download | The integration of Large Language Models (LLMs) into applications ranging from interactive chatbots to multi-agent systems has introduced a wide spectrum of service-level objectives (SLOs) for respons... |
| Developing a Blockchain-Based Secure Digital Contents Distribution System | Syed Mohiuddin Qadri, Sangwhan Cha | 2025-04-24 | Download | As digital content distribution expands rapidly through online platforms, securing digital media and protecting intellectual property has become increasingly complex. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Toward Low-Latency Services over PON using OCDMA Private Networks | Steevy J. Cordette | 2025-04-24 | Download | A low-latency service scheme is proposed over Passive Optical Network (PON). The Optical Code Division Multiplexing Access (OCDMA) technique is used to define multiple private networks serving as Vir... |
| STGen: A Novel Lightweight IoT Testbed for Generating Sensor Traffic for the Experimentation of IoT Protocol and its Application in Hybrid Network | Hasan MA Islam, S. Nath, M. Rahman, N. Shahriar, M. K. M. Khan, R. Islam | 2025-04-24 | Download | A Wireless Sensor Network (WSN) is a network that does not rely on a fixed infrastructure and consists of numerous sensors, such as temperature, humidity, GPS, and cameras, equipped with onboard proce... |
| Mitigating xApp conflicts for efficient network slicing in 6G O-RAN: a graph convolutional-based attention network approach | Sihem Bakri, Indrakshi Dey, Harun Siljak, Marco Ruffini, Nicola Marchetti | 2025-04-24 | Download | O-RAN (Open-Radio Access Network) offers a flexible, open architecture for next-generation wireless networks. Network slicing within O-RAN allows network operators to create customized virtual network... |
| An All-Optical Metro Network Architecture and QoS-Aware Wavelength Allocation Study for Converged Fixed, Mobile, and Edge Computing Multi-Granular Traffic | David Georgantas, Zhaoyang Liu, Georgios Drainakis, Bitao Pan, Adonis Bogris, Peristera Baziana | 2025-04-24 | Download | In this paper, we introduce an all-optical metro network architecture, called MOON, to serve converged multigranular traffic from fixed, mobile, and edge computing services. |
| An Extensible Software Transport Layer for GPU Networking | Yang Zhou, Zhongjie Chen, Ziming Mao, ChonLam Lao, Shuo Yang, Pravein Govindan Kannan, Jiaqi Gao, Yilong Zhao, Yongji Wu, Kaichao You, Fengyuan Ren, Zhiying Xu, Costin Raiciu, Ion Stoica | 2025-04-24 | Download | Fast-evolving machine learning (ML) workloads have increasing requirements for networking. However, host network transport on RDMA NICs is hard to evolve, causing problems for ML workloads. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Proto: A Guided Journey through Modern OS Construction | Wonkyo Choe, Rongxiang Wang, Afsara Benazir, Felix Xiaozhu Lin | 2025-04-24 | Download | Proto is a new instructional OS that runs on commodity, portable hardware. It showcases modern features, including per-app address spaces, threading, commodity filesystems, USB, DMA, multicore support... |
| Biting the CHERI bullet: Blockers, Enablers and Security Implications of CHERI in Defence | Shamal Faily | 2025-04-24 | Download | There is growing interest in securing the hardware foundations software stacks build upon. However, before making any investment decision, software and hardware supply chain stakeholders require evide... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PHast -- Perfect Hashing made fast | Piotr Beling, Peter Sanders | 2025-04-24 | Download | Perfect hash functions give unique "names" to arbitrary keys requiring only a few bits per key. This is an essential building block in applications like static hash tables, databases, or bioinformatic... |
| PowerSensor3: A Fast and Accurate Open Source Power Measurement Tool | Steven van der Vlugt, Leon Oostrum, Gijs Schoonderbeek, Ben van Werkhoven, Bram Veenboer, Krijn Doekemeijer, John W. Romein | 2025-04-24 | Download | Power consumption is a major concern in data centers and HPC applications, with GPUs typically accounting for more than half of system power usage. |
