2025-03-12

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem | Haesol Im, Chan-Woo Yang, Moslem Noori, Dmitrii Dobrynin, Elisabetta Valiante, Giacomo Pedretti, Arne Heittmann, Thomas Van Vaerenbergh, Masoud Mohseni, John Paul Strachan, Dmitri Strukov, Ray Beausoleil, Ignacio Rozada | 2025-03-12 | Download | Research into the development of special-purpose computing architectures designed to solve quadratic unconstrained binary optimization (QUBO) problems has flourished in recent years. |
| Hardware.jl - An MLIR-based Julia HLS Flow (Work in Progress) | Benedict Short, Ian McInerney, John Wickerson | 2025-03-12 | Download | Co-developing scientific algorithms and hardware accelerators requires domain-specific knowledge and large engineering resources. This leads to a slow development pace and high project complexity, whi... |
| EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer | Yi Chen, Jie Lou, Malte Wabnitz, Johnson Loh, Tobias Gemmeke | 2025-03-12 | Download | Depthwise separable convolution (DSC) has emerged as a crucial technique, especially for resource-constrained devices. In this paper, we propose a dual-engine for the DSC hardware accelerator, which e... |
| FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics | Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso | 2025-03-12 | Download | Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both ... |
| A Review on Proprietary Accelerators for Large Language Models | Sihyeong Park, Jemin Lee, Byung-Soo Kim, Seokhun Jeon | 2025-03-12 | Download | With the advancement of Large Language Models (LLMs), the importance of accelerators that efficiently process LLM computations has been increasing. |
| CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing | Mayank Kabra, Rakesh Nadig, Harshita Gupta, Rahul Bera, Manos Frouzakis, Vamanan Arulchelvan, Yu Liang, Haiyu Mao, Mohammad Sadrosadati, Onur Mutlu | 2025-03-12 | Download | Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data, providing significant benefits for privacy-sensitive applications. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Parallel Batch-Dynamic Maximal Matching with Constant Work per Update | Guy E. Blelloch, Andrew C. Brady | 2025-03-12 | Download | We present a work optimal algorithm for parallel fully batch-dynamic maximal matching against an oblivious adversary. It processes batches of updates (either insertions or deletions of edges) in const... |
| Computing the Saturation Throughput for Heterogeneous p-CSMA in a General Wireless Network | Faezeh Dehghan Tarzjani, Bhaskar Krishnamachari | 2025-03-12 | Download | A well-known expression for the saturation throughput of heterogeneous transmitting nodes in a wireless network using p-CSMA, derived from Renewal Theory, implicitly assumes that all transmitting node... |
| Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo | Zachary Charles, Gabriel Teston, Lucio Dery, Keith Rush, Nova Fallen, Zachary Garrett, Arthur Szlam, Arthur Douillard | 2025-03-12 | Download | As we scale to more massive machine learning models, the frequent synchronization demands inherent in data-parallel approaches create significant slowdowns, posing a critical challenge to further scal... |
| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Tairan Xu, Leyang Xue, Zhan Lu, Adrian Jackson, Luo Mai | 2025-03-12 | Download | This paper presents MoE-Gen, a high-throughput MoE inference system optimized for single-GPU execution. Existing inference systems rely on model-based or continuous batching strategies, originally des... |
| The Shamrock code: I- Smoothed Particle Hydrodynamics on GPUs | Timothée David--Cléris, Guillaume Laibe, Yona Lapeyre | 2025-03-12 | Download | We present Shamrock, a performance portable framework developed in C++17 with the SYCL programming standard, tailored for numerical astrophysics on Exascale architectures. |
| Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device | Mumuksh Tayal, Yogesh Simmhan | 2025-03-12 | Download | Edge devices like Nvidia Jetson platforms now offer several on-board accelerators -- including GPU CUDA cores, Tensor Cores, and Deep Learning Accelerators (DLA) -- which can be concurrently exploited... |
| Energy Metrics for Edge Microservice Request Placement Strategies | Klervie Toczé, Simin Nadjm-Tehrani | 2025-03-12 | Download | Microservices are a way of splitting the logic of an application into small blocks that can be run on different computing units and used by other applications. |
| EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer | Yi Chen, Jie Lou, Malte Wabnitz, Johnson Loh, Tobias Gemmeke | 2025-03-12 | Download | Depthwise separable convolution (DSC) has emerged as a crucial technique, especially for resource-constrained devices. In this paper, we propose a dual-engine for the DSC hardware accelerator, which e... |
| MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | Zongwu Wang, Peng Xu, Fangxin Liu, Yiwei Hu, Qingxiao Sun, Gezi Li, Cheng Li, Xuan Wang, Li Jiang, Haibing Guan | 2025-03-12 | Download | Large language models (LLMs) are increasingly utilized for complex tasks requiring longer context lengths, with some models supporting up to 128K or 1M tokens. |
| Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach | Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong | 2025-03-12 | Download | As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. |
| FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics | Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso | 2025-03-12 | Download | Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both ... |
| Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mohammad Siavashi, Faezeh Keshmiri Dindarloo, Dejan Kostic, Marco Chiesa | 2025-03-12 | Download | Large Language Models have revolutionized natural language processing, yet serving them efficiently in data centers remains challenging due to mixed workloads comprising latency-sensitive (LS) and bes... |
| Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs | Aidan Ferguson, Perry Gibson, Lara D'Agata, Parker McLeod, Ferhat Yaman, Amitabh Das, Ian Colbert, José Cano | 2025-03-12 | Download | The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE). |
| Falafels: A tool for Estimating Federated Learning Energy Consumption via Discrete Simulation | Andrew Mary Huet de Barochez, Stéphan Plassart, Sébastien Monnet | 2025-03-12 | Download | The growth in computational power and data hungriness of Machine Learning has led to an important shift of research efforts towards the distribution of ML models on multiple machines, leading in even ... |
| Drift-Aware Federated Learning: A Causal Perspective | Yunjie Fang, Sheng Wu, Tao Yang, Xiaofeng Wu, Bo Hu | 2025-03-12 | Download | Federated learning (FL) facilitates collaborative model training among multiple clients while preserving data privacy, often resulting in enhanced performance compared to models trained by individual ... |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito | 2025-03-12 | Download | The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making. |
| FedMSGL: A Self-Expressive Hypergraph Based Federated Multi-View Learning | Daoyuan Li, Zuyuan Yang, Shengli Xie | 2025-03-12 | Download | Federated learning is essential for enabling collaborative model training across decentralized data sources while preserving data privacy and security. |
| Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning | Zirui Gong, Yanjun Zhang, Leo Yu Zhang, Zhaoxi Zhang, Yong Xiang, Shirui Pan | 2025-03-12 | Download | Federated Ranking Learning (FRL) is a state-of-the-art FL framework that stands out for its communication efficiency and resilience to poisoning attacks. |
| CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing | Mayank Kabra, Rakesh Nadig, Harshita Gupta, Rahul Bera, Manos Frouzakis, Vamanan Arulchelvan, Yu Liang, Haiyu Mao, Mohammad Sadrosadati, Onur Mutlu | 2025-03-12 | Download | Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data, providing significant benefits for privacy-sensitive applications. |
| Performance Models for a Two-tiered Storage System | Aparna Sasidharan, Xian-He, Jay Lofstead, Scott Klasky | 2025-03-12 | Download | This work describes the design, implementation and performance analysis of a distributed two-tiered storage software. The first tier functions as a distributed software cache implemented using solid-s... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Computing the Saturation Throughput for Heterogeneous p-CSMA in a General Wireless Network | Faezeh Dehghan Tarzjani, Bhaskar Krishnamachari | 2025-03-12 | Download | A well-known expression for the saturation throughput of heterogeneous transmitting nodes in a wireless network using p-CSMA, derived from Renewal Theory, implicitly assumes that all transmitting node... |
| A Short Scalability Study on the SeQUeNCe Parallel Quantum Network Simulator | Aaron Welch, Mariam Kiran | 2025-03-12 | Download | As quantum networking continues to grow in importance, its study is of interest to an ever wider community and at an increasing scale. However, the development of its physical infrastructure remains b... |
| IUP: Integrated and Programmable User Plane for Next-Generation Mobile Networks | Chieh-Chun Chen, Chia-Yu Chang, Navid Nikaein | 2025-03-12 | Download | Mobile networks evolve on a regular basis to meet the requirements of a rapidly changing application ecosystem; hence, a future-proof design is key to getting the most out of their lifecycle. |
| Experimental Analysis of a Self-Coherent M-QAM Receiver by Means of Recurrent Optical Spectrum Slicing and Direct Detection | Kostas Sozos, Francesco Da Ros (Senior Member, Optica), Metodi Yankov, Stavros Deligiannidis, George Sarantoglou, Charis Mesaritakis, Adonis Bogris (Fellow, Optica) | 2025-03-12 | Download | High order modulation formats constitute the most prominent way for increasing spectral efficiency in transmission systems. Coherent transceivers that support such higher order formats require heavy d... |
| Charting 5G Energy Efficiency: Flexible Energy Modeling for Sustainable Networks | Anderson L de Araujo, Luc Deneire, Guillaume Urvoy-Keller, André L F de Almeida | 2025-03-12 | Download | Despite the rapid advancements in 5G technology, accurately assessing the energy consumption of its Radio Access Networks (RANs) remains a challenge due to the diverse range of applicable technologies... |
| Efficient Adaptive Bandwidth Allocation for Deadline-Aware Online Admission Control in Time-Sensitive Networking | Sifan Yu, Feng He, Anlan Xie, Luxi Zhao | 2025-03-12 | Download | With the growing demand for dynamic real-time applications, online admission control for time-critical event-triggered (ET) traffic in Time-Sensitive Networking (TSN) has become a critical challenge. |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| BYOS: Knowledge-driven Large Language Models Bring Your Own Operating System More Excellent | Hongyu Lin, Yuchen Li, Haoran Luo, Kaichun Yao, Libo Zhang, Zhenghong Lin, Mingjie Xing, Yanjun Wu, Carl Yang | 2025-03-12 | Download | Operating system (OS) kernel tuning is a critical yet challenging problem for performance optimization, due to the large configuration space, complex interdependencies among configuration options, and... |
| KNighter: Transforming Static Analysis with LLM-Synthesized Checkers | Chenyuan Yang, Zijie Zhao, Zichen Xie, Haoyu Li, Lingming Zhang | 2025-03-12 | Download | Static analysis is a powerful technique for bug detection in critical systems like operating system kernels. However, designing and implementing static analyzers is challenging, time-consuming, and ty... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AI Work Quantization Model: Closed-System AI Computational Effort Metric | Aasish Kumar Sharma, Michael Bidollahkhani, Julian Martin Kunkel | 2025-03-12 | Download | The rapid adoption of AI-driven automation in IoT environments, particularly in smart cities and industrial systems, necessitates a standardized approach to quantify AI's computational workload. |
| Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs | Aidan Ferguson, Perry Gibson, Lara D'Agata, Parker McLeod, Ferhat Yaman, Amitabh Das, Ian Colbert, José Cano | 2025-03-12 | Download | The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE). |
| A Review on Proprietary Accelerators for Large Language Models | Sihyeong Park, Jemin Lee, Byung-Soo Kim, Seokhun Jeon | 2025-03-12 | Download | With the advancement of Large Language Models (LLMs), the importance of accelerators that efficiently process LLM computations has been increasing. |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito | 2025-03-12 | Download | The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making. |
| Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks | Idris Zakariyya, Ferheen Ayaz, Mounia Kharbouche-Harrari, Jeremy Singer, Sye Loong Keoh, Danilo Pau, José Cano | 2025-03-12 | Download | Reducing the memory footprint of Machine Learning (ML) models, especially Deep Neural Networks (DNNs), is imperative to facilitate their deployment on resource-constrained edge devices. |