CV

The cutoff date for this data is Dec 14, 2024.

Basics

Name Zonghang Li
Degree Ph.D.
Email lizhuestc@gmail.com
Wechat lizh_uestc
Homepage https://lizonghang.github.io
Github https://github.com/Lizonghang
Google scholar https://scholar.google.com/citations?hl=en&user=1IA-XokAAAAJ
Summary A young geek and scholar who loves coding and exploring new technologies to bring fantastic ideas to life.

Work

  • 2024 - Present

    Abu Dhabi, UAE

  • 2020 - 2023

    Chengdu, CN

    Academic Instructor
    Yingcai Honors College of UESTC
    Guiding undergraduate students at the Yingcai Honors College of UESTC in conducting academic research and publishing high-quality academic papers.
    • My student Shenglai Zeng was selected as an outstanding student of UESTC and is now pursuing a Ph.D. at Michigan State University. Our paper won the 2023 Best Paper Award of IEEE Transactions on Cloud Computing.
  • 2019 - 2020

    Shenzhen, CN

    Invited Technical Instructor
    Peng Cheng Laboratory (PCL)
    Guiding PCL researchers in developing a communication-efficient geo-distributed machine learning system.
    • The developed system was adopted by PCL.

Education

  • 2021 - 2022

    Singapore

    Visiting Scholar
    Nanyang Technological University
    School of Computer Science and Engineering
  • 2018 - 2018

    Oxford, UK

    Visiting Scholar
    University of Oxford
    Lady Margaret Hall
  • 2014 - 2024

    Chengdu, CN

    Bachelor's and Ph.D.
    University of Electronic Science and Technology of China
    School of Information and Communication Engineering

Awards

Talks

Projects

  • 2024 - Present
    Prima.cpp - A distributed inference system serving 70B-scale LLMs on mobile devices with piped ring parallelism and automatic layer assignment
    Prima.cpp is a distributed inference system built on llama.cpp. Unlike existing on-device inference systems, which assume sufficient memory, user devices often lack the total memory required to run 70B-scale models. While llama.cpp uses mmap to avoid OOM errors, this approach incurs significant disk I/O latency. To address this, prima.cpp introduces a piped-ring parallel architecture that runs model layers in a ring across devices and overlaps disk loading with computation on other devices. However, assigning model layers to heterogeneous devices is challenging: heterogeneity in computing hardware, memory, disk, and OS makes inference latency hard to predict. Prima.cpp therefore implements a layer-to-device scheduler that models these factors and minimizes overall inference latency subject to memory constraints and disk loading delays. In a setup with 4 user devices (a laptop, a tablet, a smartphone, and a desktop, with a combined 30 GB of memory), prima.cpp achieves an inference latency of 1 second per token for Llama-3 70B. Further optimizations are underway.
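The overlap of disk loading with computation described above can be sketched as a simple producer/consumer pipeline. This is a minimal illustration, not the actual prima.cpp implementation; all names (LAYERS, load_from_disk, compute) are stand-ins.

```python
import threading
from queue import Queue

# Hypothetical layer shards assigned to this device; in prima.cpp a
# scheduler decides which layers each device actually holds.
LAYERS = ["layer0", "layer1", "layer2", "layer3"]

def load_from_disk(name):
    # Stand-in for mmap/disk I/O of one layer's weights.
    return f"weights({name})"

def compute(weights, x):
    # Stand-in for running one transformer layer.
    return x + [weights]

def piped_ring_forward(x):
    """Overlap disk loading of layer k+1 with computation of layer k."""
    prefetched = Queue(maxsize=1)

    def prefetcher():
        for name in LAYERS:
            prefetched.put(load_from_disk(name))

    t = threading.Thread(target=prefetcher)
    t.start()
    for _ in LAYERS:
        weights = prefetched.get()  # already loaded while the previous layer computed
        x = compute(weights, x)
    t.join()
    return x
```

With `maxsize=1`, at most one extra layer is resident beyond the one being computed, which mirrors why disk latency can hide behind compute without blowing up memory.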
  • 2024 - 2024
    TPI-LLM - A distributed inference system serving 70B-scale LLMs on mobile devices with tensor parallelism and sliding window memory scheduling
    TPI-LLM is an LLM serving system designed to bring LLM capabilities to low-resource mobile devices. While cloud LLM services have achieved great success, they raise privacy concerns: users do not want their conversations uploaded to the cloud, as these conversations may contain sensitive personal information. TPI-LLM addresses this by enabling LLM inference on mobile devices with limited computing and memory resources. The system leverages multiple mobile devices to perform inference through tensor parallelism, combined with a sliding-window memory scheduler that reduces the peak memory footprint. Currently, TPI-LLM can run Yi-34B in full precision on 4 laptops with 5 GB of memory each, and Llama 2-70B on 8 devices with 3 GB of memory each. Furthermore, TPI-LLM achieves 80%-90% lower TTFT and token latency than Transformers, Accelerate, and Galaxy, and 43%-55% lower than llama.cpp on larger models (>13B).
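The sliding-window idea above, keeping only a few layers resident at a time, can be sketched as follows. This is an illustrative toy, not the TPI-LLM API; the class and method names are invented for the example.

```python
from collections import OrderedDict

class SlidingWindowScheduler:
    """Keep at most `window` layers resident in memory, evicting the
    oldest as inference advances through the model."""

    def __init__(self, num_layers, window):
        self.num_layers = num_layers
        self.window = window
        self.resident = OrderedDict()  # layer_id -> weights

    def _load(self, k):
        return f"weights[{k}]"  # stand-in for reading layer k from disk

    def get(self, k):
        if k not in self.resident:
            if len(self.resident) >= self.window:
                self.resident.popitem(last=False)  # evict the oldest layer
            self.resident[k] = self._load(k)
        return self.resident[k]

# Walk an 8-layer model with a 2-layer window and track peak residency.
sched = SlidingWindowScheduler(num_layers=8, window=2)
peak = 0
for k in range(8):
    sched.get(k)
    peak = max(peak, len(sched.resident))
print(peak)  # peak residency never exceeds the window size
```

Peak memory is bounded by the window size rather than the model size, which is why a 70B model can fit devices with a few GB each.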
  • 2018 - 2023
    GeoMX - Accepted and adopted by ZTE Co., Ltd.
    GeoMX is a fast, unified distributed system for training ML models across geographically distributed data centers, delivering a 20x speedup under identical network conditions.
  • 2022 - 2024
    NetStorm - Accepted by IEEE/ACM TON (CCF A)
    NetStorm is a topology-adaptive, communication-efficient system for geo-distributed machine learning training, achieving a 7.5-9.2x speedup over the standard GeoMX system.
  • 2023 - 2024
    KlonetAI - An intelligent agent adopted in work accepted at NSDI 24 (CCF A)
    Klonet supports the deployment and testing of new network protocols and applications, such as distributed machine learning and federated learning, in a realistic environment; KlonetAI provides an AI agent for intelligent interaction with the Klonet platform.
  • 2022 - 2023
    AGOD - AI-generated optimization decision accepted by IEEE TMC (CCF A)
    This project implements the system design and the deep diffusion soft actor-critic (D2SAC) algorithm.
  • 2022 - 2023
    PerSF-SemCom - Personalized saliency-based semantic communication accepted by IEEE JSAC (CCF A)
    This project implements an energy-efficient, task-oriented semantic communication framework that represents image information at the semantic level with a triple-based scene graph, and designs a personalized semantic encoder based on user interests to meet personalized saliency requirements.
  • 2019 - 2021
    NBSync - An asynchronous pipelining scheduler accepted by IEEE TSC (CCF A)
    NBSync is a novel training algorithm for distributed ML over WANs that greatly speeds up model training by parallelizing local computing and global synchronization. NBSync employs a carefully designed pipelining scheme that relaxes the sequential dependency between local computing and global synchronization and processes them in parallel, overlapping their overhead in the time dimension. NBSync also enables flexible, differentiated, and dynamic local computing on workers to maximize the overlap ratio in dynamically heterogeneous training environments.
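The pipelining scheme above, where a round's WAN synchronization overlaps the next round's local computation, can be sketched with a background thread. This is a schematic toy under invented names (local_compute, global_sync), not the NBSync implementation.

```python
import threading

def local_compute(model, data):
    return model + data  # stand-in for a round of local gradient computation

def global_sync(update):
    return update        # stand-in for slow parameter exchange over the WAN

def nbsync_train(rounds):
    """Pipeline: while round t's update synchronizes over the WAN,
    round t+1's local computation already proceeds on a stale model."""
    model, sync_thread, result = 0, None, {}

    def do_sync(u):
        result["synced"] = global_sync(u)

    for _ in range(rounds):
        update = local_compute(model, 1)  # may use a not-yet-synced model
        if sync_thread:
            sync_thread.join()            # collect the previous round's sync
            model = result["synced"]
        sync_thread = threading.Thread(target=do_sync, args=(update,))
        sync_thread.start()               # sync overlaps the next round's compute
    sync_thread.join()
    return result["synced"]
```

Relaxing the compute-then-sync dependency this way hides WAN latency behind computation, at the cost of computing on slightly stale models, which is the trade-off NBSync manages.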
  • 2018 - 2019
    ESync - An efficient DML synchronization algorithm accepted by IEEE TSC (CCF A)
    ESync is an efficient synchronization algorithm designed for distributed ML tasks in heterogeneous clusters, i.e., clusters composed of computing devices with different computing capabilities.
  • 2018 - 2025
    Other Programs
    These programs are closed-source due to IP and confidentiality protocols.
    • 2018-2020: Advanced Distributed Machine Learning Techniques. Provincial and Ministerial Key Program. Approved.
    • 2018-2019: Advanced Data Center Network Architectures. Huawei Technologies Co., Ltd. Approved.
    • 2019-2020: Communication Optimizations for Distributed Machine Learning over WANs. Peng Cheng Laboratory. Approved.
    • 2021-2025: Computing Power Network and New Communication Primitives. ZTE Communication Co., Ltd. In progress.
    • 2022-2023: Accelerating Data Transmission for Geographically Distributed Machine Learning. Zhejiang Lab. Approved.
    • 2022-2023: Advanced Network Technologies for Giant Connections, Large Traffic, and Low Latency in the Rapid Evolution of 5G/B5G. National Key Research and Development Program. Approved.