2025-02-26

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies	Shaibal Saha, Lanyu Xu	2025-02-26	Download	In recent years, vision transformers (ViTs) have emerged as powerful and promising techniques for computer vision tasks such as image classification, object detection, and segmentation.
FPGA-based Emulation and Device-Side Management for CXL-based Memory Tiering Systems	Yiqi Chen, Xiping Dong, Zhe Zhou, Zhao Wang, Jie Zhang, Guangyu Sun	2025-02-26	Download	The Compute Express Link (CXL) technology facilitates the extension of CPU memory through byte-addressable SerDes links and cascaded switches, creating complex heterogeneous memory systems where CPU a...
A Multicast-Capable AXI Crossbar for Many-core Machine Learning Accelerators	Luca Colagrande, Luca Benini	2025-02-26	Download	To keep up with the growing computational requirements of machine learning workloads, many-core accelerators integrate an ever-increasing number of processing elements, putting the efficiency of memor...
Evaluation of CGRA Toolchains	Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich	2025-02-26	Download	Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class for such accelerators are so-called processor arrays, which typically integrat...
3D-TrIM: A Memory-Efficient Spatial Computing Architecture for Convolution Workloads	Cristian Sestito, Ahmed J. Abdelmaksoud, Shady Agwa, Themis Prodromakis	2025-02-26	Download	The Von Neumann bottleneck, which relates to the energy cost of moving data from memory to on-chip core and vice versa, is a serious challenge in state-of-the-art AI architectures, like Convolutional ...
A Reliable, Time-Predictable Heterogeneous SoC for AI-Enhanced Mixed-Criticality Edge Applications	Angelo Garofalo, Alessandro Ottaviano, Matteo Perotti, Thomas Benz, Yvan Tortorella, Robert Balas, Michael Rogenmoser, Chi Zhang, Luca Bertaccini, Nils Wistoff, Maicol Ciani, Cyril Koenig, Mattia Sinigaglia, Luca Valente, Paul Scheffler, Manuel Eggimann, Matheus Cavalcante, Francesco Restuccia, Alessandro Biondi, Francesco Conti, Frank K. Gurkaynak, Davide Rossi, Luca Benini	2025-02-26	Download	Next-generation mixed-criticality Systems-on-chip (SoCs) for robotics, automotive, and space must execute mixed-criticality AI-enhanced sensor processing and control workloads, ensuring reliable and t...
M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type	Weiming Hu, Haoyan Zhang, Cong Guo, Yu Feng, Renyang Guan, Zhendong Hua, Zihan Liu, Yue Guan, Minyi Guo, Jingwen Leng	2025-02-26	Download	Large language models (LLMs) are one of the most important killer computer applications. The recent algorithmic advancement proposes a fine-grained group-wise quantization for LLMs, which treats a sma...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
HDEE: Heterogeneous Domain Expert Ensemble	Oğuzhan Ersoy, Jari Kolehmainen, Gabriel Passamani Andrade	2025-02-26	Download	Training dense LLMs requires enormous amounts of data and centralized compute, which introduces fundamental bottlenecks and ever-growing costs for large models.
Algorithms for Parallel Shared-Memory Sparse Matrix-Vector Multiplication on Unstructured Matrices	Kobe Bergmans, Karl Meerbergen, Raf Vandebril	2025-02-26	Download	The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructu...
Efficient Federated Search for Retrieval-Augmented Generation	Rachid Guerraoui, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos	2025-02-26	Download	Large language models (LLMs) have demonstrated remarkable capabilities across various domains but remain susceptible to hallucinations and inconsistencies, limiting their reliability.
FedCDC: A Collaborative Framework for Data Consumers in Federated Learning Market	Zhuan Shi, Patrick Ohl, Boi Faltings	2025-02-26	Download	Federated learning (FL) allows machine learning models to be trained on distributed datasets without directly accessing local data. In FL markets, numerous Data Consumers compete to recruit Data Owner...
A Reliable, Time-Predictable Heterogeneous SoC for AI-Enhanced Mixed-Criticality Edge Applications	Angelo Garofalo, Alessandro Ottaviano, Matteo Perotti, Thomas Benz, Yvan Tortorella, Robert Balas, Michael Rogenmoser, Chi Zhang, Luca Bertaccini, Nils Wistoff, Maicol Ciani, Cyril Koenig, Mattia Sinigaglia, Luca Valente, Paul Scheffler, Manuel Eggimann, Matheus Cavalcante, Francesco Restuccia, Alessandro Biondi, Francesco Conti, Frank K. Gurkaynak, Davide Rossi, Luca Benini	2025-02-26	Download	Next-generation mixed-criticality Systems-on-chip (SoCs) for robotics, automotive, and space must execute mixed-criticality AI-enhanced sensor processing and control workloads, ensuring reliable and t...
CLLoRA: An Approach to Measure the Effects of the Context Length for LLM Fine-Tuning	Ping Zhang, Zhaorui Zhang, Sheng Di, Yao Xin, Benben Liu	2025-02-26	Download	Large language model fine-tuning has been identified as an efficient approach to applying the pre-trained Large language models to other domains.
Research on Edge Computing and Cloud Collaborative Resource Scheduling Optimization Based on Deep Reinforcement Learning	Yuqing Wang, Xiao Yang	2025-02-26	Download	This study addresses the challenge of resource scheduling optimization in edge-cloud collaborative computing using deep reinforcement learning (DRL).

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
On Supporting IP Routing in the Next Generation of Mobile Systems	Hamed Hellaoui, Matti Laitila, Markus Isomäki, Hua Chao	2025-02-26	Download	The upcoming generation of mobile telecommunication systems is expected to support new use cases, where the mobile network serves one or more IP subnetworks located behind the User Equipment (UEs).
Formowanie klastrów obsługujących w sieci 6g open ran o architekturze zorientowanej na użytkownika	Marcin Hoffmann	2025-02-26	Download	One of the main challenges associated with the implementation of an Open RAN User-Centric Cell-Free network is the appropriate formulation of serving clusters.
A Multi-Agent DRL-Based Framework for Optimal Resource Allocation and Twin Migration in the Multi-Tier Vehicular Metaverse	Nahom Abishu Hayla, A. Mohammed Seid, Aiman Erbad, Tilahun M. Getu, Ala Al-Fuqaha, Mohsen Guizani	2025-02-26	Download	Although multi-tier vehicular Metaverse promises to transform vehicles into essential nodes -- within an interconnected digital ecosystem -- using efficient resource allocation and seamless vehicular ...
Sequential Entanglement-Swapping assisted by Quantum Protocol over Ethernet Networks	Kun Chen-Hu, Kristian S. Jensen, Petar Popovski	2025-02-26	Download	The integration of quantum communication protocols over Ethernet networks is proposed, showing the potential of combining classical and quantum technologies for efficient, scalable quantum networking.
O-RIS-ing: Evaluating RIS-Assisted NextG Open RAN	Maria Tsampazi, Michele Polese, Falko Dressler, Tommaso Melodia	2025-02-26	Download	Reconfigurable Intelligent Surfaces (RISs) pose as a transformative technology to revolutionize the cellular architecture of Next Generation (NextG) Radio Access Networks (RANs).

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Safe and usable kernel extensions with Rex	Jinghao Jia, Ruowen Qin, Milo Craun, Egor Lukiyanov, Ayush Bansal, Michael V. Le, Hubertus Franke, Hani Jamjoom, Tianyin Xu, Dan Williams	2025-02-26	Download	Safe kernel extensions have gained significant traction, evolving from simple packet filters to large, complex programs that customize storage, networking, and scheduling.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Online Pseudo-average Shifting Attention(PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis	Long Cheng, Qichen Liao, Fan Wu, Junlin Mu, Tengfei Han, Zhe Qiu, Lianqiang Li, Tianyi Liu, Fangzheng Miao, Keming Gao, Liang Wang, Zhen Zhang, Qiande Yin	2025-02-26	Download	Attention calculation is extremely time-consuming for long-sequence inference tasks, such as text or image/video generation, in large models. To accelerate this process, we developed a low-precision, ...