2025-05-14

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors | Nicolas Dupuis, Ravi Nair, Shyam Ramji, Sean McClintock, Nishant Chauhan, Priyanka Nagpal, Bart Blaner, Ken Valk, Leon Stok, Ruchir Puri | 2025-05-14 | Download | The use of Large Language Models (LLMs) in hardware design has taken off in recent years, principally through its incorporation in tools that increase chip designer productivity. |
| SEGA-DCIM: Design Space Exploration-Guided Automatic Digital CIM Compiler with Multiple Precision Support | Haikang Diao, Haoyi Zhang, Jiahao Song, Haoyang Luo, Yibo Lin, Runsheng Wang, Yuan Wang, Xiyuan Tang | 2025-05-14 | Download | Digital computing-in-memory (DCIM) has been a popular solution for addressing the memory wall problem in recent years. However, the DCIM design still heavily relies on manual efforts, and the optimiza... |
| Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei | 2025-05-14 | Download | The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconn... |
| Automated SAR ADC Sizing Using Analytical Equations | Zhongyi Li, Zhuofu Tao, Yanze Zhou, Yichen Shi, Zhiping Yu, Ting-Jung Lin, Lei He | 2025-05-14 | Download | Conventional analog and mixed-signal (AMS) circuit designs heavily rely on manual effort, which is time-consuming and labor-intensive. This paper presents a fully automated design methodology for Succ... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FAST: An Efficient Scheduler for All-to-All GPU Communication | Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi | 2025-05-14 | Download | All-to-All(v) communication is a critical primitive in modern machine learning workloads, particularly mixture-of-experts (MoE) models. Unfortunately, efficient scheduling is challenging due to worklo... |
| MDTP -- An Adaptive Multi-Source Data Transfer Protocol | Sepideh Abdollah, Craig Partridge, Susmit Shannigrahi | 2025-05-14 | Download | Scientific data volume is growing in size, and as a direct result, the need for faster transfers is also increasing. The scientific community has sought to leverage parallel transfer methods using mul... |
| IoT-Enabled Hemodynamic Surveillance System: AD8232 Bioelectric Signal Processing with ESP32 | Hemalatha R J, Shubham Malhotra, Shivapanchakshari T G, Lokesh K, Dev Anand D, Samson Jebakumar S | 2025-05-14 | Download | This dissertation proposes an electrocardiogram (ECG) tracking device that diagnoses cardiopulmonary problems using the Internet of Things (IoT) desired results. |
| ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace | Ruimin Shi, Gabin Schieffer, Maya Gokhale, Pei-Hung Lin, Hiren Patel, Ivy Peng | 2025-05-14 | Download | Vector architectures are essential for boosting computing throughput. ARM provides SVE as the next-generation length-agnostic vector extension beyond traditional fixed-length SIMD. |
| Strategies to Measure Energy Consumption Using RAPL During Workflow Execution on Commodity Clusters | Philipp Thamm, Ulf Leser | 2025-05-14 | Download | In science, problems in many fields can be solved by processing datasets using a series of computationally expensive algorithms, sometimes referred to as workflows. |
| Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei | 2025-05-14 | Download | The rapid scaling of large language models (LLMs) has unveiled critical limitations in current hardware architectures, including constraints in memory capacity, computational efficiency, and interconn... |
| Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration | Zhonggen Li, Xiangyu Ke, Yifan Zhu, Yunjun Gao, Feifei Li | 2025-05-14 | Download | Graph embeddings map graph nodes to continuous vectors and are foundational to community detection, recommendation, and many scientific applications. |
| Birch SGD: A Tree Graph Framework for Local and Asynchronous SGD Methods | Alexander Tyurin, Danil Sivtsov | 2025-05-14 | Download | We propose a new unifying framework, Birch SGD, for analyzing and designing distributed SGD methods. The central idea is to represent each method as a weighted directed tree, referred to as a computat... |
| Towards Efficient Verification of Parallel Applications with Mc SimGrid | Matthieu Laurent, Thierry Jéron, Martin Quinson | 2025-05-14 | Download | Assessing the correctness of distributed and parallel applications is notoriously difficult due to the complexity of the concurrent behaviors and the difficulty to reproduce bugs. |
| ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor | Seungbeom Choi, Jeonghoe Goo, Eunjoo Jeon, Mingyu Yang, Minsung Jang | 2025-05-14 | Download | We propose ELIS, a serving system for Large Language Models (LLMs) featuring an Iterative Shortest Remaining Time First (ISRTF) scheduler designed to efficiently manage inference tasks with the shorte... |
| Toward Malicious Clients Detection in Federated Learning | Zhihao Dou, Jiaqi Wang, Wei Sun, Zhuqing Liu, Minghong Fang | 2025-05-14 | Download | Federated learning (FL) enables multiple clients to collaboratively train a global machine learning model without sharing their raw data. However, the decentralized nature of FL introduces vulnerabili... |
| Architecture of Tianyu Software: Relative Photometry as a Case Study | Yicheng Rui, Yifan Xuan, Shuyue Zheng, Kexin Li, Kaiming Cui, Kai Xiao, Jie Zheng, Jun Kai Ng, Hongxuan Jiang, Fabo Feng, Qinghui Sun | 2025-05-14 | Download | Tianyu telescope, a one-meter robotic optical survey instrument to be constructed in Lenghu, Qinghai, China, is designed for detecting transiting exoplanets, variable stars and transients. |
| The Adaptive Complexity of Finding a Stationary Point | Huanjian Zhou, Andi Han, Akiko Takeda, Masashi Sugiyama | 2025-05-14 | Download | In large-scale applications, such as machine learning, it is desirable to design non-convex optimization algorithms with a high degree of parallelization. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FAST: An Efficient Scheduler for All-to-All GPU Communication | Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi | 2025-05-14 | Download | All-to-All(v) communication is a critical primitive in modern machine learning workloads, particularly mixture-of-experts (MoE) models. Unfortunately, efficient scheduling is challenging due to worklo... |
| The Power of Alternatives in Network Embedding | Oleg Kolosov, Gala Yadgar, David Breitgand, Dean H. Lorenz | 2025-05-14 | Download | In the virtual network embedding problem, the goal is to embed a set of virtual network instances to a given physical network substrate at minimal cost, while respecting the capacity constraints o... |
| MDTP -- An Adaptive Multi-Source Data Transfer Protocol | Sepideh Abdollah, Craig Partridge, Susmit Shannigrahi | 2025-05-14 | Download | Scientific data volume is growing in size, and as a direct result, the need for faster transfers is also increasing. The scientific community has sought to leverage parallel transfer methods using mul... |
| Dimensioning and Optimization of Reliability Coverage in Local 6G Networks | Jacek Kibiłda, Dian Echevarría Pérez, André Gomes, Onel L. Alcaraz López, Arthur S. de Sena, Nurul Huda Mahmood, Hirley Alves | 2025-05-14 | Download | Enabling vertical use cases for the sixth generation (6G) wireless networks, such as automated manufacturing, immersive extended reality (XR), and self-driving fleets, will require network designs tha... |
| Wormhole Detection Based on Z-Score And Neighbor Table Comparison | Zezhi Zeng | 2025-05-14 | Download | Wormhole attacks can cause serious disruptions to the network topology in disaster rescue opportunity networks. By establishing false Wormhole (WH) links, malicious nodes can mislead legitimate paths... |
| Instant AoI Optimization through Relay Location Selection in Disaster Multi-hop Communication | Yang Gao, Zezhi Zeng | 2025-05-14 | Download | Meteorological disasters such as typhoons, forest fires, and floods can damage the communication infrastructures, which will further disable the communication capabilities of cellular networks. |
| DNS Query Forgery: A Client-Side Defense Against Mobile App Traffic Profiling | Andrea Jimenez-Berenguel, César Gil, Carlos Garcia-Rubio, Jordi Forné, Celeste Campo | 2025-05-14 | Download | Mobile applications continuously generate DNS queries that can reveal sensitive user behavioral patterns even when communications are encrypted. |
| RAG-Enabled Intent Reasoning for Application-Network Interaction | Salwa Mostafa, Mohamed K. Abdel-Aziz, Mohammed S. Elbamby, Mehdi Bennis | 2025-05-14 | Download | Intent-based network (IBN) is a promising solution to automate network operation and management. IBN aims to offer human-tailored network interaction, allowing the network to communicate in a way that... |
| Interplay Between AI and Space-Air-Ground Integrated Network: The Road Ahead | Chenyu Wu, Xi Wang, Yi Hu, Shuai Han, Dusit Niyato | 2025-05-14 | Download | Space-air-ground integrated network (SAGIN) is envisioned as a key network architecture for achieving ubiquitous coverage in the next-generation communication system. |
| QUIC Steps: Evaluating Pacing Strategies in QUIC Implementations | Marcel Kempf, Simon Tietz, Benedikt Jaeger, Johannes Späth, Georg Carle, Johannes Zirngibl | 2025-05-14 | Download | Pacing is a key mechanism in modern transport protocols, used to regulate packet transmission timing to minimize traffic burstiness, lower latency, and reduce packet loss. |
| A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning | Berkay Guler, Giovanni Geraci, Hamid Jafarkhani | 2025-05-14 | Download | Current applications of self-supervised learning to wireless channel representation often borrow paradigms developed for text and image processing, without fully addressing the unique characteristics ... |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Adaptive Migration Decision for Multi-Tenant Memory Systems | Hyungjun Cho, Igjae Kim, Kwanghoon Choi, Hongjin Kim, Wonjae Lee, Junhyeok Im, Jinin So, Jaehyuk Huh | 2025-05-14 | Download | Tiered memory systems consisting of fast small memory and slow large memory have emerged to provide high capacity memory in a cost-effective way. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Statistical Modeling and Uncertainty Estimation of LLM Inference Systems | Kaustabha Ray, Nelson Mimura Gonzalez, Bruno Wassermann, Rachel Tzoref-Brill, Dean H. Lorenz | 2025-05-14 | Download | Large Language Model (LLM) inference systems present significant challenges in statistical performance characterization due to dynamic workload variations, diverse hardware architectures, and complex ... |
