2024-11-26
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors | Mariam Rakka, Jinhao Li, Guohao Dai, Ahmed Eltawil, Mohammed E. Fouda, Fadi Kurdahi | 2024-11-26 | Download | Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. |
| RTL-Breaker: Assessing the Security of LLMs against Backdoor Attacks on HDL Code Generation | Lakshmi Likhitha Mankali, Jitendra Bhandari, Manaar Alam, Ramesh Karri, Michail Maniatakos, Ozgur Sinanoglu, Johann Knechtel | 2024-11-26 | Download | Large language models (LLMs) have demonstrated remarkable potential with code generation/completion tasks for hardware design. In fact, LLM-based hardware description language (HDL) code generation ha... |
| Rapid Deployment of Domain-specific Hyperspectral Image Processors with Application to Autonomous Driving | Jon Gutiérrez-Zaballa, Koldo Basterretxea, Javier Echanobe, Óscar Mata-Carballeira, M. Victoria Martínez | 2024-11-26 | Download | The article discusses the use of low cost System-On-Module (SOM) platforms for the implementation of efficient hyperspectral imaging (HSI) processors for application in autonomous driving. |
| High-Performance and Scalable Fault-Tolerant Quantum Computation with Lattice Surgery on a 2.5D Architecture | Yosuke Ueno, Taku Saito, Teruo Tanimoto, Yasunari Suzuki, Yutaka Tabuchi, Shuhei Tamate, Hiroshi Nakamura | 2024-11-26 | Download | Due to the high error rate of a qubit, detecting and correcting errors on it is essential for fault-tolerant quantum computing (FTQC). Among several FTQC techniques, lattice surgery (LS) using surface... |
| Efficient transformer adaptation for analog in-memory computing via low-rank adapters | Chen Li, Elena Ferro, Corey Lammie, Manuel Le Gallo, Irem Boybat, Bipin Rajendran | 2024-11-26 | Download | Analog In-Memory Computing (AIMC) offers a promising solution to the von Neumann bottleneck. However, deploying transformer models on AIMC remains challenging due to their inherent need for flexibilit... |
| PIM-AI: A Novel Architecture for High-Efficiency LLM Inference | Cristobal Ortega, Yann Falevoz, Renaud Ayrignac | 2024-11-26 | Download | Large Language Models (LLMs) have become essential in a variety of applications due to their advanced language understanding and generation capabilities. |
| A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training | Mingjing Li, Huihui Zhou, Xiaofeng Xu, Zhiwei Zhong, Puli Quan, Xueke Zhu, Yanyu Lin, Wenjie Lin, Hongyu Guo, Junchao Zhang, Yunhao Ma, Wei Wang, Qingyan Meng, Zhengyu Ma, Guoqi Li, Xiaoxin Cui, Yonghong Tian | 2024-11-26 | Download | There is a growing necessity for edge training to adapt to dynamically changing environment. Neuromorphic computing represents a significant pathway for high-efficiency intelligent computation in ener... |
| Single Event Upsets characterization of 65 nm CMOS 6T and 8T SRAM cells for ground level environment | Daniel Malagon, Gabriel Torrens, Jaume Segura, Sebastia A. Bota | 2024-11-26 | Download | We present experimental results of the cross-section related to cosmic-ray irradiation at ground level for minimum-sized six-transistors (6T) and eight-transistors (8T) bit-cells SRAM memories impleme... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A Parallel Scan Algorithm in the Tensor Core Unit Model | Anastasios Zouzias, William F. McColl | 2024-11-26 | Download | We present a parallel scan (prefix sum) algorithm in the Tensor Core Unit (TCU) model of computation. The TCU model assumes that multiplication between two square matrices of constant size is a ba... |
| RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices | Andreas Karatzas, Dimitrios Stamoulis, Iraklis Anagnostopoulos | 2024-11-26 | Download | Modern edge data centers simultaneously handle multiple Deep Neural Networks (DNNs), leading to significant challenges in workload management. |
| Adaptive Client Selection with Personalization for Communication Efficient Federated Learning | Allan M. de Souza, Filipe Maciel, Joahannes B. D. da Costa, Luiz F. Bittencourt, Eduardo Cerqueira, Antonio A. F. Loureiro, Leandro A. Villas | 2024-11-26 | Download | Federated Learning (FL) is a distributed approach to collaboratively training machine learning models. FL requires a high level of communication between the devices and a central server, thus imposing... |
| Rapid Distributed Fine-tuning of a Segmentation Model Onboard Satellites | Meghan Plumridge, Rasmus Maråk, Chiara Ceccobello, Pablo Gómez, Gabriele Meoni, Filip Svoboda, Nicholas D. Lane | 2024-11-26 | Download | Segmentation of Earth observation (EO) satellite data is critical for natural hazard analysis and disaster response. However, processing EO data at ground stations introduces delays due to data transm... |
| A Cloud-based Real-time Probabilistic Remaining Useful Life (RUL) Estimation using the Sequential Monte Carlo (SMC) Method | Karthik Reddy Lyathakula, Fuh-Gwo Yuan | 2024-11-26 | Download | The remaining useful life (RUL) estimation is an important metric that helps in condition-based maintenance. Damage data obtained from the diagnostics techniques are often noisy and the RUL estimated ... |
| APEX: An Extensible and Dynamism-Aware Simulator for Automated Parallel Execution in LLM Serving | Yi-Chien Lin, Woosuk Kwon, Ronald Pineda, Fanny Nina Paravecino | 2024-11-26 | Download | Efficiently serving Large Language Models (LLMs) requires selecting an optimal parallel execution plan, balancing computation, memory, and communication overhead. |
| Evaluating the Overhead of the Performance Profiler Cloudprofiler With MooBench | Shinhyung Yang, David Georg Reichelt, Wilhelm Hasselbring | 2024-11-26 | Download | Performance engineering has become crucial for the cloud-native architecture. This architecture deploys multiple services, with each service representing an orchestration of containerized processes. |
| Joint Resource Optimization, Computation Offloading and Resource Slicing for Multi-Edge Traffic-Cognitive Networks | Ting Xiaoyang, Minfeng Zhang, Shu gonglee, Saimin Chen Zhang | 2024-11-26 | Download | The evolving landscape of edge computing envisions platforms operating as dynamic intermediaries between application providers and edge servers (ESs), where task offloading is coupled with payments fo... |
| PIM-AI: A Novel Architecture for High-Efficiency LLM Inference | Cristobal Ortega, Yann Falevoz, Renaud Ayrignac | 2024-11-26 | Download | Large Language Models (LLMs) have become essential in a variety of applications due to their advanced language understanding and generation capabilities. |
| A High Energy-Efficiency Multi-core Neuromorphic Architecture for Deep SNN Training | Mingjing Li, Huihui Zhou, Xiaofeng Xu, Zhiwei Zhong, Puli Quan, Xueke Zhu, Yanyu Lin, Wenjie Lin, Hongyu Guo, Junchao Zhang, Yunhao Ma, Wei Wang, Qingyan Meng, Zhengyu Ma, Guoqi Li, Xiaoxin Cui, Yonghong Tian | 2024-11-26 | Download | There is a growing necessity for edge training to adapt to dynamically changing environment. Neuromorphic computing represents a significant pathway for high-efficiency intelligent computation in ener... |
| Distributed Load Balancing with Workload-Dependent Service Rates | Wenxin Zhang, Santiago R. Balseiro, Robert Kleinberg, Vahab Mirrokni, Balasubramanian Sivan, Bartek Wydrowski | 2024-11-26 | Download | We study distributed load balancing in bipartite queueing systems where frontends route jobs to heterogeneous backends with workload-dependent service rates. |
| KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation | Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram | 2024-11-26 | Download | Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, Key-Value (KV) cache is used to store intermediate activations, which significa... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Combining Threat Intelligence with IoT Scanning to Predict Cyber Attack | Jubin Abhishek Soni, Amit Anand, Rajesh Kumar Pandey, Aniket Abhishek Soni | 2024-11-26 | Download | While the Web has become a global platform for communication, malicious actors, including hackers and hacktivist groups, often disseminate ideological content and coordinate activities through the "Da... |
| AI-Augmented Ethical Hacking: A Practical Examination of Manual Exploitation and Privilege Escalation in Linux Environments | Haitham S. Al-Sinani, Chris J. Mitchell | 2024-11-26 | Download | This study explores the application of generative AI (GenAI) within manual exploitation and privilege escalation tasks in Linux-based penetration testing environments, two areas critical to comprehens... |
| A Primer on AP Power Save in Wi-Fi 8: Overview, Analysis, and Open Challenges | Roger Sanchez-Vital, Andrey Belogaev, Carles Gomez, Jeroen Famaey, Eduard Garcia-Villegas | 2024-11-26 | Download | Wi-Fi facilitates the Internet connectivity of billions of devices worldwide, making it an indispensable technology for modern life. Wi-Fi networks are becoming significantly denser, making energy con... |
| MetaGraphLoc: A Graph-based Meta-learning Scheme for Indoor Localization via Sensor Fusion | Yaya Etiabi, Eslam Eldeeb, Mohammad Shehab, Wafa Njima, Hirley Alves, Mohamed-Slim Alouini, El Mehdi Amhoud | 2024-11-26 | Download | Accurate indoor localization remains challenging due to variations in wireless signal environments and limited data availability. This paper introduces MetaGraphLoc, a novel system leveraging sensor f... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Tracing Optimization for Performance Modeling and Regression Detection | Kaveh Shahedi, Heng Li, Maxime Lamothe, Foutse Khomh | 2024-11-26 | Download | Software performance modeling plays a crucial role in developing and maintaining software systems. A performance model analytically describes the relationship between the performance of a system and i... |
| KVPR: Efficient LLM Inference with I/O-Aware KV Cache Partial Recomputation | Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram | 2024-11-26 | Download | Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, Key-Value (KV) cache is used to store intermediate activations, which significa... |