I earned my Ph.D. from the Institute of Computer Science at the University of Bonn under the supervision of Professor Jens Lehmann (AMiner Most Influential Scholar for Knowledge Engineering, Chief Scientist of Amazon AlexaAI, and Co-Founder of DBpedia, with over 31,000 citations and an H-index of 67). My primary research focus was on temporal KG representation learning and reasoning. Previously, I obtained my bachelor's and master's degrees from Zhejiang University's School of Control Science and Engineering, specializing in time-series signal processing and pattern recognition.
I have published 20+ papers as the first or corresponding author at top international conferences and in leading journals in natural language processing (NLP) and data mining (DM), such as ICLR, NeurIPS, WWW, SIGIR, EMNLP, and TKDE. I have also co-authored over 20 additional papers published in leading venues such as TPAMI and KDD. In total, I have published over 40 papers, with more than 2,000 citations (Google Scholar).
I officially joined the International Digital Economy Academy (IDEA) in the Greater Bay Area in February 2023 as an AI Financial Research Scientist, and I have been recognized as a Category B Talent under the Shenzhen "Pengcheng Peacock Plan." I lead projects on large-scale financial behavior knowledge graphs (KGs) and financial large language models (LLMs), focusing on quantitative investment, financial QA, sentiment analysis, and financial text analysis and generation based on KGs and LLMs.
My research interests involve large language models (LLMs) and knowledge graphs (KGs), including but not limited to LLM reasoning, LLM agents, multi-modal LLMs, knowledge-driven LLMs, KG representation learning, KG reasoning, and KG alignment.
I'm looking for self-motivated interns at IDEA (Shenzhen). If you are interested in the above topics, please send me your resume by email.
LLMs have become indispensable in the field of natural language processing, excelling at tasks such as text generation, understanding, and reasoning. Despite their remarkable performance, these models face challenges with explainability, safety, hallucination, out-of-date knowledge, and deep reasoning, particularly on knowledge-intensive tasks.
Our research delves into the potential of knowledge-driven LLM reasoning as a promising approach to address these limitations. More specifically, knowledge-driven LLM reasoning leverages LLMs to interact with the external environment (consisting of a variety of knowledge sources, e.g., KGs, textual corpora, databases, and code repositories) and retrieve the necessary knowledge to enhance their understanding and generation capabilities, enabling them to reason over complex questions and tasks.
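To make the retrieve-then-reason pattern concrete, here is a minimal sketch in Python. The corpus, the keyword-overlap retriever, and the prompt template are all illustrative assumptions, not part of any of our systems; production pipelines would use dense retrievers and an actual LLM call.

```python
# Sketch of knowledge-driven reasoning: retrieve relevant facts from an
# external knowledge source, then prepend them to the LLM prompt.
# The corpus and retriever below are toy assumptions for illustration.

CORPUS = [
    "Bonn is a city in Germany.",
    "DBpedia is a knowledge graph extracted from Wikipedia.",
    "The capital of Germany is Berlin.",
]

def retrieve(question, corpus, k=2):
    """Naive keyword-overlap retrieval (real systems use dense retrievers)."""
    q_words = set(question.lower().replace("?", "").split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().rstrip(".").split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question):
    """Augment the question with retrieved knowledge before calling the LLM."""
    facts = retrieve(question, CORPUS)
    return "Known facts:\n" + "\n".join(facts) + f"\nQuestion: {question}"

print(build_prompt("What is the capital of Germany?"))
```

The retrieved facts ground the model's answer in external knowledge rather than its (possibly out-of-date) parameters, which is the core idea behind this line of work.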
Among various knowledge sources, KGs offer structured, explicit, and editable representations of knowledge, presenting a complementary strategy for mitigating the limitations of LLMs. We therefore first focus on using KGs as external knowledge sources for LLMs and propose the algorithmic framework "Think-on-Graph" (meaning: the LLM "thinks" along reasoning paths "on" the knowledge "graph" step by step; abbreviated as ToG (ICLR'24)). Using beam search, ToG allows the LLM to dynamically explore a number of reasoning paths in the KG and make decisions accordingly. We have also conducted a survey on the evolution of KGs, offering a new perspective on combining LLMs with different types of KGs.
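The beam-search exploration can be sketched as follows. This is not the actual ToG implementation: the toy triples are invented, and the hypothetical `score_path` function stands in for the LLM, which in ToG judges the relevance of candidate reasoning paths.

```python
# Sketch of ToG-style beam search over a toy KG.
# In ToG an LLM scores candidate paths; here a hand-written
# `score_path` stands in for that LLM call.

TRIPLES = [  # toy (subject, relation, object) facts
    ("Bonn", "locatedIn", "Germany"),
    ("Germany", "capital", "Berlin"),
    ("Bonn", "formerCapitalOf", "WestGermany"),
    ("WestGermany", "succeededBy", "Germany"),
]

def neighbors(entity):
    """All outgoing (relation, object) edges of an entity."""
    return [(r, o) for s, r, o in TRIPLES if s == entity]

def score_path(path):
    """Stand-in for the LLM's relevance judgment: prefer short paths
    that reach 'Berlin' (purely illustrative)."""
    return (1.0 if path[-1] == "Berlin" else 0.0) - 0.1 * len(path)

def beam_search(start, beam_width=2, max_depth=2):
    """Expand reasoning paths step by step, keeping the top-k at each depth."""
    beams = [[start]]
    for _ in range(max_depth):
        candidates = []
        for path in beams:
            for rel, obj in neighbors(path[-1]):
                candidates.append(path + [rel, obj])
        if not candidates:
            break
        candidates.sort(key=score_path, reverse=True)
        beams = candidates[:beam_width]  # prune to the beam width
    return beams

best = beam_search("Bonn")[0]
print(best)  # ['Bonn', 'locatedIn', 'Germany', 'capital', 'Berlin']
```

The pruning step is what keeps exploration tractable: instead of enumerating all paths in the KG, the LLM (here, `score_path`) decides which few paths are worth extending at each hop.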
In the future, we will explore new approaches to knowledge-driven LLM reasoning that can hybridly incorporate different types of knowledge sources, as well as new KG types that are easier to combine with LLMs than existing ones. Multi-modal LLMs (particularly for chart and table understanding) and LLM agents for complex reasoning tasks are also among our research interests.
A Temporal Knowledge Graph (TKG) is a multi-relational directed graph in which each edge represents a fact that occurred at a particular time. A TKG consists of a large number of facts in the form of quadruples (subject entity, relation, object entity, timestamp), or (s, r, o, t) for short, where entities (nodes) are connected via timestamped relations (edges).
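A minimal illustration of the quadruple format, with invented example facts (the data below is purely illustrative, not from any benchmark TKG):

```python
# Each fact is a quadruple (s, r, o, t): subject, relation, object, timestamp.
quadruples = [
    ("Obama", "presidentOf", "USA", 2009),
    ("Trump", "presidentOf", "USA", 2017),
    ("Merkel", "chancellorOf", "Germany", 2005),
]

def facts_at(tkg, timestamp):
    """Return all facts carrying the given timestamp."""
    return [q for q in tkg if q[3] == timestamp]

print(facts_at(quadruples, 2017))  # [('Trump', 'presidentOf', 'USA', 2017)]
```

The timestamp is what distinguishes a TKG from a static KG: the same (s, r, o) pattern can hold at one time and not another, so reasoning must account for when each edge is valid.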
Many TKGs are human-created or automatically constructed from semi-structured and unstructured text, and therefore suffer from incompleteness, i.e., many links between entities are missing. This weakens the expressive power of TKGs and restricts the range of TKG-based applications. To address this issue, we propose multiple TKG embedding models, including ATiSE (ISWC'20, best student paper award nominee), TeRo (COLING'20), TeLM (NAACL'21), TRE (ECML'23), and TGeomE (TKDE), which predict missing links in TKGs by learning low-dimensional vector representations for entities, relations, and timestamps. We also propose TFLEX (NeurIPS'23), the first query embedding framework that can model both first-order logic and temporal logic in KG embedding spaces.
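To show the general shape of such models, here is a hedged sketch of the simplest possible temporal extension of a translational KG embedding: score(s, r, o, t) = -||e_s + e_r + e_t - e_o||. This is not the formulation of ATiSE, TeRo, TeLM, TRE, or TGeomE (which use additive time series, complex rotations, multivector geometric products, etc.); it only illustrates the idea of scoring quadruples with learned vectors, where higher-scoring quadruples are predicted as missing links.

```python
import numpy as np

# TransE-style temporal scoring sketch: every entity, relation, and
# timestamp gets a learned vector; a quadruple is plausible when
# e_s + e_r + e_t lands close to e_o. Embeddings here are random
# stand-ins for trained parameters.

rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in ["Bonn", "Germany"]}
relations = {"locatedIn": rng.normal(size=dim)}
timestamps = {2020: rng.normal(size=dim)}

def score(s, r, o, t):
    """Higher (less negative) score = more plausible quadruple (s, r, o, t)."""
    diff = entities[s] + relations[r] + timestamps[t] - entities[o]
    return -np.linalg.norm(diff)

print(score("Bonn", "locatedIn", "Germany", 2020))
```

During training, embeddings are optimized so observed quadruples score higher than corrupted ones; at inference, candidate objects are ranked by this score to fill in missing links.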
The knowledge in TKGs is ever-changing, and temporal information makes them highly dynamic. In real-world scenarios, new entities and relations emerge over time, creating the need for TKG reasoning in the inductive setting, where the entities/relations in the training TKG do not completely overlap with those in the testing TKG. We therefore propose MTKGE (WWW'23) and SST-BERT (SIGIR'23), both of which can effectively model newly emerging entities for inductive TKG reasoning.
Besides TKG reasoning, TKG alignment, which aims to find equivalent entities across TKGs, can also improve the completeness of TKGs by fusing them. Our work TEA-GNN (EMNLP'21) introduced the task of TKG alignment for the first time, with follow-up works TREA (WWW'22) and Simple-HHEA (WWW'24). In these papers, we propose approaches and datasets carefully designed for TKG alignment.