2025-03-12
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem | Haesol Im, Chan-Woo Yang, Moslem Noori, Dmitrii Dobrynin, Elisabetta Valiante, Giacomo Pedretti, Arne Heittmann, Thomas Van Vaerenbergh, Masoud Mohseni, John Paul Strachan, Dmitri Strukov, Ray Beausoleil, Ignacio Rozada | 2025-03-12 | Download | Research into the development of special-purpose computing architectures designed to solve quadratic unconstrained binary optimization (QUBO) problems has flourished in recent years. |
| Hardware.jl - An MLIR-based Julia HLS Flow (Work in Progress) | Benedict Short, Ian McInerney, John Wickerson | 2025-03-12 | Download | Co-developing scientific algorithms and hardware accelerators requires domain-specific knowledge and large engineering resources. This leads to a slow development pace and high project complexity, whi... |
| EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer | Yi Chen, Jie Lou, Malte Wabnitz, Johnson Loh, Tobias Gemmeke | 2025-03-12 | Download | Depthwise separable convolution (DSC) has emerged as a crucial technique, especially for resource-constrained devices. In this paper, we propose a dual-engine for the DSC hardware accelerator, which e... |
| FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics | Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso | 2025-03-12 | Download | Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both ... |
| A Review on Proprietary Accelerators for Large Language Models | Sihyeong Park, Jemin Lee, Byung-Soo Kim, Seokhun Jeon | 2025-03-12 | Download | With the advancement of Large Language Models (LLMs), the importance of accelerators that efficiently process LLM computations has been increasing. |
| CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing | Mayank Kabra, Rakesh Nadig, Harshita Gupta, Rahul Bera, Manos Frouzakis, Vamanan Arulchelvan, Yu Liang, Haiyu Mao, Mohammad Sadrosadati, Onur Mutlu | 2025-03-12 | Download | Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data, providing significant benefits for privacy-sensitive applications. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Parallel Batch-Dynamic Maximal Matching with Constant Work per Update | Guy E. Blelloch, Andrew C. Brady | 2025-03-12 | Download | We present a work optimal algorithm for parallel fully batch-dynamic maximal matching against an oblivious adversary. It processes batches of updates (either insertions or deletions of edges) in const... |
| Computing the Saturation Throughput for Heterogeneous p-CSMA in a General Wireless Network | Faezeh Dehghan Tarzjani, Bhaskar Krishnamachari | 2025-03-12 | Download | A well-known expression for the saturation throughput of heterogeneous transmitting nodes in a wireless network using p-CSMA, derived from Renewal Theory, implicitly assumes that all transmitting node... |
| Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo | Zachary Charles, Gabriel Teston, Lucio Dery, Keith Rush, Nova Fallen, Zachary Garrett, Arthur Szlam, Arthur Douillard | 2025-03-12 | Download | As we scale to more massive machine learning models, the frequent synchronization demands inherent in data-parallel approaches create significant slowdowns, posing a critical challenge to further scal... |
| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Tairan Xu, Leyang Xue, Zhan Lu, Adrian Jackson, Luo Mai | 2025-03-12 | Download | This paper presents MoE-Gen, a high-throughput MoE inference system optimized for single-GPU execution. Existing inference systems rely on model-based or continuous batching strategies, originally des... |
| The Shamrock code: I- Smoothed Particle Hydrodynamics on GPUs | Timothée David--Cléris, Guillaume Laibe, Yona Lapeyre | 2025-03-12 | Download | We present Shamrock, a performance portable framework developed in C++17 with the SYCL programming standard, tailored for numerical astrophysics on Exascale architectures. |
| Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device | Mumuksh Tayal, Yogesh Simmhan | 2025-03-12 | Download | Edge devices like Nvidia Jetson platforms now offer several on-board accelerators -- including GPU CUDA cores, Tensor Cores, and Deep Learning Accelerators (DLA) -- which can be concurrently exploited... |
| Energy Metrics for Edge Microservice Request Placement Strategies | Klervie Toczé, Simin Nadjm-Tehrani | 2025-03-12 | Download | Microservices are a way of splitting the logic of an application into small blocks that can be run on different computing units and used by other applications. |
| EDEA: Efficient Dual-Engine Accelerator for Depthwise Separable Convolution with Direct Data Transfer | Yi Chen, Jie Lou, Malte Wabnitz, Johnson Loh, Tobias Gemmeke | 2025-03-12 | Download | Depthwise separable convolution (DSC) has emerged as a crucial technique, especially for resource-constrained devices. In this paper, we propose a dual-engine for the DSC hardware accelerator, which e... |
| MILLION: Mastering Long-Context LLM Inference Via Outlier-Immunized KV Product Quantization | Zongwu Wang, Peng Xu, Fangxin Liu, Yiwei Hu, Qingxiao Sun, Gezi Li, Cheng Li, Xuan Wang, Li Jiang, Haibing Guan | 2025-03-12 | Download | Large language models (LLMs) are increasingly utilized for complex tasks requiring longer context lengths, with some models supporting up to 128K or 1M tokens. |
| Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach | Ruifeng She, Bowen Pang, Kai Li, Zehua Liu, Tao Zhong | 2025-03-12 | Download | As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. |
| FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics | Zeke Wang, Jie Zhang, Hongjing Huang, Yingtao Li, Xueying Zhu, Mo Sun, Zihan Yang, De Ma, Huajing Tang, Gang Pan, Fei Wu, Bingsheng He, Gustavo Alonso | 2025-03-12 | Download | Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both ... |
| Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mohammad Siavashi, Faezeh Keshmiri Dindarloo, Dejan Kostic, Marco Chiesa | 2025-03-12 | Download | Large Language Models have revolutionized natural language processing, yet serving them efficiently in data centers remains challenging due to mixed workloads comprising latency-sensitive (LS) and bes... |
| Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs | Aidan Ferguson, Perry Gibson, Lara D'Agata, Parker McLeod, Ferhat Yaman, Amitabh Das, Ian Colbert, José Cano | 2025-03-12 | Download | The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE). |
| Falafels: A tool for Estimating Federated Learning Energy Consumption via Discrete Simulation | Andrew Mary Huet de Barochez, Stéphan Plassart, Sébastien Monnet | 2025-03-12 | Download | The growth in computational power and data hungriness of Machine Learning has led to an important shift of research efforts towards the distribution of ML models on multiple machines, leading in even ... |
| Drift-Aware Federated Learning: A Causal Perspective | Yunjie Fang, Sheng Wu, Tao Yang, Xiaofeng Wu, Bo Hu | 2025-03-12 | Download | Federated learning (FL) facilitates collaborative model training among multiple clients while preserving data privacy, often resulting in enhanced performance compared to models trained by individual ... |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito | 2025-03-12 | Download | The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making. |
| FedMSGL: A Self-Expressive Hypergraph Based Federated Multi-View Learning | Daoyuan Li, Zuyuan Yang, Shengli Xie | 2025-03-12 | Download | Federated learning is essential for enabling collaborative model training across decentralized data sources while preserving data privacy and security. |
| Not All Edges are Equally Robust: Evaluating the Robustness of Ranking-Based Federated Learning | Zirui Gong, Yanjun Zhang, Leo Yu Zhang, Zhaoxi Zhang, Yong Xiang, Shirui Pan | 2025-03-12 | Download | Federated Ranking Learning (FRL) is a state-of-the-art FL framework that stands out for its communication efficiency and resilience to poisoning attacks. |
| CIPHERMATCH: Accelerating Homomorphic Encryption-Based String Matching via Memory-Efficient Data Packing and In-Flash Processing | Mayank Kabra, Rakesh Nadig, Harshita Gupta, Rahul Bera, Manos Frouzakis, Vamanan Arulchelvan, Yu Liang, Haiyu Mao, Mohammad Sadrosadati, Onur Mutlu | 2025-03-12 | Download | Homomorphic encryption (HE) allows secure computation on encrypted data without revealing the original data, providing significant benefits for privacy-sensitive applications. |
| Performance Models for a Two-tiered Storage System | Aparna Sasidharan, Xian-He, Jay Lofstead, Scott Klasky | 2025-03-12 | Download | This work describes the design, implementation and performance analysis of a distributed two-tiered storage software. The first tier functions as a distributed software cache implemented using solid-s... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Computing the Saturation Throughput for Heterogeneous p-CSMA in a General Wireless Network | Faezeh Dehghan Tarzjani, Bhaskar Krishnamachari | 2025-03-12 | Download | A well-known expression for the saturation throughput of heterogeneous transmitting nodes in a wireless network using p-CSMA, derived from Renewal Theory, implicitly assumes that all transmitting node... |
| A Short Scalability Study on the SeQUeNCe Parallel Quantum Network Simulator | Aaron Welch, Mariam Kiran | 2025-03-12 | Download | As quantum networking continues to grow in importance, its study is of interest to an ever wider community and at an increasing scale. However, the development of its physical infrastructure remains b... |
| IUP: Integrated and Programmable User Plane for Next-Generation Mobile Networks | Chieh-Chun Chen, Chia-Yu Chang, Navid Nikaein | 2025-03-12 | Download | Mobile networks evolve on a regular basis to meet the requirements of a rapidly changing application ecosystem; hence, a future-proof design is key to getting the most out of their lifecycle. |
| Experimental Analysis of a Self-Coherent M-QAM Receiver by Means of Recurrent Optical Spectrum Slicing and Direct Detection | Kostas Sozos, Francesco Da Ros (Senior Member, Optica), Metodi Yankov, Stavros Deligiannidis, George Sarantoglou, Charis Mesaritakis, Adonis Bogris (Fellow, Optica) | 2025-03-12 | Download | High order modulation formats constitute the most prominent way for increasing spectral efficiency in transmission systems. Coherent transceivers that support such higher order formats require heavy d... |
| Charting 5G Energy Efficiency: Flexible Energy Modeling for Sustainable Networks | Anderson L de Araujo, Luc Deneire, Guillaume Urvoy-Keller, André L F de Almeida | 2025-03-12 | Download | Despite the rapid advancements in 5G technology, accurately assessing the energy consumption of its Radio Access Networks (RANs) remains a challenge due to the diverse range of applicable technologies... |
| Efficient Adaptive Bandwidth Allocation for Deadline-Aware Online Admission Control in Time-Sensitive Networking | Sifan Yu, Feng He, Anlan Xie, Luxi Zhao | 2025-03-12 | Download | With the growing demand for dynamic real-time applications, online admission control for time-critical event-triggered (ET) traffic in Time-Sensitive Networking (TSN) has become a critical challenge. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| BYOS: Knowledge-driven Large Language Models Bring Your Own Operating System More Excellent | Hongyu Lin, Yuchen Li, Haoran Luo, Kaichun Yao, Libo Zhang, Zhenghong Lin, Mingjie Xing, Yanjun Wu, Carl Yang | 2025-03-12 | Download | Operating system (OS) kernel tuning is a critical yet challenging problem for performance optimization, due to the large configuration space, complex interdependencies among configuration options, and... |
| KNighter: Transforming Static Analysis with LLM-Synthesized Checkers | Chenyuan Yang, Zijie Zhao, Zichen Xie, Haoyu Li, Lingming Zhang | 2025-03-12 | Download | Static analysis is a powerful technique for bug detection in critical systems like operating system kernels. However, designing and implementing static analyzers is challenging, time-consuming, and ty... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| AI Work Quantization Model: Closed-System AI Computational Effort Metric | Aasish Kumar Sharma, Michael Bidollahkhani, Julian Martin Kunkel | 2025-03-12 | Download | The rapid adoption of AI-driven automation in IoT environments, particularly in smart cities and industrial systems, necessitates a standardized approach to quantify AI's computational workload. |
| Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs | Aidan Ferguson, Perry Gibson, Lara D'Agata, Parker McLeod, Ferhat Yaman, Amitabh Das, Ian Colbert, José Cano | 2025-03-12 | Download | The deployment of deep neural networks (DNNs) in privacy-sensitive environments is constrained by computational overheads in fully homomorphic encryption (FHE). |
| A Review on Proprietary Accelerators for Large Language Models | Sihyeong Park, Jemin Lee, Byung-Soo Kim, Seokhun Jeon | 2025-03-12 | Download | With the advancement of Large Language Models (LLMs), the importance of accelerators that efficiently process LLM computations has been increasing. |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Maximilian Abstreiter, Sasu Tarkoma, Roberto Morabito | 2025-03-12 | Download | The rapid rise of Language Models (LMs) has expanded the capabilities of natural language processing, powering applications from text generation to complex decision-making. |
| Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks | Idris Zakariyya, Ferheen Ayaz, Mounia Kharbouche-Harrari, Jeremy Singer, Sye Loong Keoh, Danilo Pau, José Cano | 2025-03-12 | Download | Reducing the memory footprint of Machine Learning (ML) models, especially Deep Neural Networks (DNNs), is imperative to facilitate their deployment on resource-constrained edge devices. |