
2025-02-26

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies | Shaibal Saha, Lanyu Xu | 2025-02-26 | Download | In recent years, vision transformers (ViTs) have emerged as powerful and promising techniques for computer vision tasks such as image classification, object detection, and segmentation. |
| FPGA-based Emulation and Device-Side Management for CXL-based Memory Tiering Systems | Yiqi Chen, Xiping Dong, Zhe Zhou, Zhao Wang, Jie Zhang, Guangyu Sun | 2025-02-26 | Download | The Compute Express Link (CXL) technology facilitates the extension of CPU memory through byte-addressable SerDes links and cascaded switches, creating complex heterogeneous memory systems where CPU a... |
| A Multicast-Capable AXI Crossbar for Many-core Machine Learning Accelerators | Luca Colagrande, Luca Benini | 2025-02-26 | Download | To keep up with the growing computational requirements of machine learning workloads, many-core accelerators integrate an ever-increasing number of processing elements, putting the efficiency of memor... |
| Evaluation of CGRA Toolchains | Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich | 2025-02-26 | Download | Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class for such accelerators are so-called processor arrays, which typically integrat... |
| 3D-TrIM: A Memory-Efficient Spatial Computing Architecture for Convolution Workloads | Cristian Sestito, Ahmed J. Abdelmaksoud, Shady Agwa, Themis Prodromakis | 2025-02-26 | Download | The Von Neumann bottleneck, which relates to the energy cost of moving data from memory to on-chip core and vice versa, is a serious challenge in state-of-the-art AI architectures, like Convolutional ... |
| A Reliable, Time-Predictable Heterogeneous SoC for AI-Enhanced Mixed-Criticality Edge Applications | Angelo Garofalo, Alessandro Ottaviano, Matteo Perotti, Thomas Benz, Yvan Tortorella, Robert Balas, Michael Rogenmoser, Chi Zhang, Luca Bertaccini, Nils Wistoff, Maicol Ciani, Cyril Koenig, Mattia Sinigaglia, Luca Valente, Paul Scheffler, Manuel Eggimann, Matheus Cavalcante, Francesco Restuccia, Alessandro Biondi, Francesco Conti, Frank K. Gurkaynak, Davide Rossi, Luca Benini | 2025-02-26 | Download | Next-generation mixed-criticality Systems-on-chip (SoCs) for robotics, automotive, and space must execute mixed-criticality AI-enhanced sensor processing and control workloads, ensuring reliable and t... |
| M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type | Weiming Hu, Haoyan Zhang, Cong Guo, Yu Feng, Renyang Guan, Zhendong Hua, Zihan Liu, Yue Guan, Minyi Guo, Jingwen Leng | 2025-02-26 | Download | Large language models (LLMs) are one of the most important killer computer applications. The recent algorithmic advancement proposes a fine-grained group-wise quantization for LLMs, which treats a sma... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HDEE: Heterogeneous Domain Expert Ensemble | Oğuzhan Ersoy, Jari Kolehmainen, Gabriel Passamani Andrade | 2025-02-26 | Download | Training dense LLMs requires enormous amounts of data and centralized compute, which introduces fundamental bottlenecks and ever-growing costs for large models. |
| Algorithms for Parallel Shared-Memory Sparse Matrix-Vector Multiplication on Unstructured Matrices | Kobe Bergmans, Karl Meerbergen, Raf Vandebril | 2025-02-26 | Download | The sparse matrix-vector (SpMV) multiplication is an important computational kernel, but it is notoriously difficult to execute efficiently. This paper investigates algorithm performance for unstructu... |
| Efficient Federated Search for Retrieval-Augmented Generation | Rachid Guerraoui, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos | 2025-02-26 | Download | Large language models (LLMs) have demonstrated remarkable capabilities across various domains but remain susceptible to hallucinations and inconsistencies, limiting their reliability. |
| FedCDC: A Collaborative Framework for Data Consumers in Federated Learning Market | Zhuan Shi, Patrick Ohl, Boi Faltings | 2025-02-26 | Download | Federated learning (FL) allows machine learning models to be trained on distributed datasets without directly accessing local data. In FL markets, numerous Data Consumers compete to recruit Data Owner... |
| A Reliable, Time-Predictable Heterogeneous SoC for AI-Enhanced Mixed-Criticality Edge Applications | Angelo Garofalo, Alessandro Ottaviano, Matteo Perotti, Thomas Benz, Yvan Tortorella, Robert Balas, Michael Rogenmoser, Chi Zhang, Luca Bertaccini, Nils Wistoff, Maicol Ciani, Cyril Koenig, Mattia Sinigaglia, Luca Valente, Paul Scheffler, Manuel Eggimann, Matheus Cavalcante, Francesco Restuccia, Alessandro Biondi, Francesco Conti, Frank K. Gurkaynak, Davide Rossi, Luca Benini | 2025-02-26 | Download | Next-generation mixed-criticality Systems-on-chip (SoCs) for robotics, automotive, and space must execute mixed-criticality AI-enhanced sensor processing and control workloads, ensuring reliable and t... |
| CLLoRA: An Approach to Measure the Effects of the Context Length for LLM Fine-Tuning | Ping Zhang, Zhaorui Zhang, Sheng Di, Yao Xin, Benben Liu | 2025-02-26 | Download | Large language model fine-tuning has been identified as an efficient approach to applying the pre-trained Large language models to other domains. |
| Research on Edge Computing and Cloud Collaborative Resource Scheduling Optimization Based on Deep Reinforcement Learning | Yuqing Wang, Xiao Yang | 2025-02-26 | Download | This study addresses the challenge of resource scheduling optimization in edge-cloud collaborative computing using deep reinforcement learning (DRL). |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| On Supporting IP Routing in the Next Generation of Mobile Systems | Hamed Hellaoui, Matti Laitila, Markus Isomäki, Hua Chao | 2025-02-26 | Download | The upcoming generation of mobile telecommunication systems is expected to support new use cases, where the mobile network serves one or more IP subnetworks located behind the User Equipment (UEs). |
| Formowanie klastrów obsługujących w sieci 6g open ran o architekturze zorientowanej na użytkownika | Marcin Hoffmann | 2025-02-26 | Download | One of the main challenges associated with the implementation of an Open RAN User-Centric Cell-Free network is the appropriate formulation of serving clusters. |
| A Multi-Agent DRL-Based Framework for Optimal Resource Allocation and Twin Migration in the Multi-Tier Vehicular Metaverse | Nahom Abishu Hayla, A. Mohammed Seid, Aiman Erbad, Tilahun M. Getu, Ala Al-Fuqaha, Mohsen Guizani | 2025-02-26 | Download | Although multi-tier vehicular Metaverse promises to transform vehicles into essential nodes -- within an interconnected digital ecosystem -- using efficient resource allocation and seamless vehicular ... |
| Sequential Entanglement-Swapping assisted by Quantum Protocol over Ethernet Networks | Kun Chen-Hu, Kristian S. Jensen, Petar Popovski | 2025-02-26 | Download | The integration of quantum communication protocols over Ethernet networks is proposed, showing the potential of combining classical and quantum technologies for efficient, scalable quantum networking. |
| O-RIS-ing: Evaluating RIS-Assisted NextG Open RAN | Maria Tsampazi, Michele Polese, Falko Dressler, Tommaso Melodia | 2025-02-26 | Download | Reconfigurable Intelligent Surfaces (RISs) pose as a transformative technology to revolutionize the cellular architecture of Next Generation (NextG) Radio Access Networks (RANs). |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Safe and usable kernel extensions with Rex | Jinghao Jia, Ruowen Qin, Milo Craun, Egor Lukiyanov, Ayush Bansal, Michael V. Le, Hubertus Franke, Hani Jamjoom, Tianyin Xu, Dan Williams | 2025-02-26 | Download | Safe kernel extensions have gained significant traction, evolving from simple packet filters to large, complex programs that customize storage, networking, and scheduling. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Online Pseudo-average Shifting Attention(PASA) for Robust Low-precision LLM Inference: Algorithms and Numerical Analysis | Long Cheng, Qichen Liao, Fan Wu, Junlin Mu, Tengfei Han, Zhe Qiu, Lianqiang Li, Tianyi Liu, Fangzheng Miao, Keming Gao, Liang Wang, Zhen Zhang, Qiande Yin | 2025-02-26 | Download | Attention calculation is extremely time-consuming for long-sequence inference tasks, such as text or image/video generation, in large models. To accelerate this process, we developed a low-precision, ... |
