Yuan, Ruifeng (袁瑞峰)


PhD student,
Department of Computing,
The Hong Kong Polytechnic University
Hong Kong, China
E-mail: ruifeng.yuan@connect.polyu.hk

About me

I have completed my PhD in computer science at The Hong Kong Polytechnic University. I received my B.Sc. in Automation from Xiamen University. My research focuses on Natural Language Processing, Text Summarization, and Large Language Models.

Research

My research mainly focuses on Natural Language Processing, including:

  • Text Summarization

  • LLM Pretraining

Publications

  1. Siming Huang, Tianhao Cheng, Jason Klein Liu, Jiaran Hao, Liuyihan Song, Yang Xu, J. Yang, J.H. Liu, Chenchen Zhang, Linzheng Chai, Ruifeng Yuan, Zhaoxiang Zhang, Jie Fu, Qian Liu, Ge Zhang, Zili Wang, Yuan Qi, Yinghui Xu, Wei Chu. "OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models", arXiv

  2. Ruifeng Yuan, Shichao Sun, Yongqi Li, Zili Wang, Ziqiang Cao, Wenjie Li. "Personalized Large Language Model Assistant with Evolving Conditional Memory", arXiv

  3. Shichao Sun, Ruifeng Yuan, Ziqiang Cao, Wenjie Li, Pengfei Liu. "Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization"[C], Findings in ACL2024

  4. Shichao Sun, Junlong Li, Weizhe Yuan, Ruifeng Yuan, Wenjie Li, Pengfei Liu. "The Critique of Critique"[C], Findings in ACL2024

  5. Yushan Liu, Zili Wang, Ruifeng Yuan. "QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters"[C], AAAI2024.

  6. Shichao Sun, Ruifeng Yuan, Jianfei He, Ziqiang Cao, Wenjie Li, Xiaohua Jia. "Data Selection Curriculum for Abstractive Text Summarization"[C], Findings in EMNLP2023.

  7. Dongjie Yang, Ruifeng Yuan, YuanTao Fan, YiFei Yang, Zili Wang, Shusen Wang, Hai Zhao. "RefGPT: Dialogue Generation of GPT, by GPT, and for GPT"[C], Findings in EMNLP2023.

  8. Ruifeng Yuan, Shichao Sun, Zili Wang, Ziqiang Cao and Wenjie Li. "Separating Context and Pattern: Learning Disentangled Sentence Representations for Low-Resource Extractive Summarization"[C], Findings in ACL2023.

  9. Shichao Sun, Ruifeng Yuan, Wenjie Li, Sujian Li. "Improving Sentence Similarity Estimation for Unsupervised Extractive Summarization"[C], ICASSP 2023.

  10. Shichao Sun, Ruifeng Yuan, Wenjie Li, Ziqiang Cao, Sujian Li. "Dialogue acts enhanced extract–abstract framework for meeting summarization"[J], Information Processing & Management.

  11. Ruifeng Yuan, Zili Wang, Ziqiang Cao and Wenjie Li. "Preserve Context Information for Extract-Generate Long-Input Summarization Framework"[C], AAAI2023.

  12. Ruifeng Yuan, Zili Wang, Ziqiang Cao and Wenjie Li. "Few-shot Query-oriented Summarization with Prefix-merging"[C], EMNLP2022.

  13. Ruifeng Yuan, Zili Wang and Wenjie Li. "Event Graph based Sentence Fusion"[C], EMNLP2021.

  14. Ruifeng Yuan, Zili Wang and Wenjie Li. "Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT"[C], COLING2020.

  15. Yuan R, Zhou Q, Zhou W. "dTexSL: A dynamic disaster textual storyline generating framework"[J]. World Wide Web, 2018: 1-21.

  16. Zhou Q, Yuan R, Li T. "An improved textual storyline generating framework for disaster information management"[C]//Intelligent Systems and Knowledge Engineering (ISKE), 2017 12th International Conference. ISKE.2017.8258738

Academic service

  1. Area Chair for the Text Summarization Track of AACL2022

Projects

  1. Large Language Model Development, Xiaohongshu Inc. (intern), 02.2023-Present

    • The goal of this project is to train a large language model that reflects the characteristics of Xiaohongshu, using the Xiaohongshu corpus together with publicly available Chinese and English corpora.

    • The language model can serve as the foundation for a series of business models at Xiaohongshu, such as relevance models and dialogue models.

    • In this project, I am mainly responsible for pre-training and for part of the data generation work for instruction fine-tuning.

  2. Inverse Retrieval based on Query Generation, Xiaohongshu Inc. (intern), 05.2022-02.2023

    • The goal of this project is to generate a set of diverse queries offline for Xiaohongshu documents, which serves as additional semantic expansion of the documents.

    • Adding an inverse retrieval channel backed by an inverted-index search increases the diversity of documents recalled in search. Applying this channel to selected documents (high-quality documents, commercial promotional documents, new documents) also effectively increases their exposure rate.

    • The project has been launched and has generated revenue. I was mainly responsible for data collection and for training the query generation model.

Education

The Hong Kong Polytechnic University, Hong Kong, China

  • Ph.D. student, Computing

  • Research Topics: Natural Language Processing, Text Summarization, LLM Pretraining

  • Supervised by Prof. Wenjie Li

  • August 2019 - now

Xiamen University, Xiamen, Fujian Province, China

  • Bachelor of Automation, School of Aerospace Engineering

  • Undergraduate Thesis: Storyline generation for disaster events

  • Supervised by Prof. Qifeng Zhou

  • August 2015 - March 2019