Published on 14 Nov 2023

NBS Knowledge Lab Interdisciplinary Distinguished Speaker Series webinar: Reinforcement Learning for Quantitative Trading

In this webinar, our esteemed speaker delved into the exciting research advancements in Reinforcement Learning (RL) for Quantitative Trading (QT) and shed light on future prospects.

In the last decade, AI has significantly reshaped Quantitative Trading (QT). Reinforcement Learning (RL) has emerged as a captivating approach for QT. In the latest instalment of the NBS Knowledge Lab Interdisciplinary Distinguished Speaker Series, co-hosted by NBS's Centre for Sustainable Finance Innovation (CSFI), we had the privilege of hosting Professor Bo An. He delved into the exciting research advancements in RL for QT. Prof. An holds the prestigious President's Council Chair Professor position at the School of Computer Science and Engineering, NTU.

Leading the session were NBS's Professor Xin (Simba) Chang, who serves as the Associate Dean (Research), and Associate Professor Byoung-Hyoun Hwang from the Division of Banking and Finance.

The following is an edited transcript, with some key takeaways: 

Professor Bo An on Reinforcement Learning  

QT is a huge market with many people involved. Applying AI to QT is quite straightforward because there is a lot of data available. For instance, machine learning-based methods could be applied to make predictions on the trend of assets and to assess the risks. 

Using AlphaZero, the AI chess programme, as a reference, Prof An said that the idea behind RL is very simple: different versions of agents compete with each other, and the outcome is used to update the policy of those agents. Over the years, Prof An and his team have been working on RL with many different industry partners, deploying their systems or algorithms for fraud detection, recommendation, ride-hailing, and financial market trading.

RL and non-RL based financial methods  

Rule-based financial methods are sensitive to hyperparameters, highly reliant on market conditions and often generalise poorly. Prediction-based approaches, on the other hand, may not be accurate, and there is a gap between prediction and decision.

In RL, the prediction step is completely removed. A policy is trained directly from data, which removes worries about inconsistencies between prediction and decision. This end-to-end training allows easy incorporation of practical constraints. It is also easy to balance profit maximisation against other measures, such as risk. However, RL-based approaches have not really been deployed in industry, partly due to limited academic research and a lack of evaluation benchmarks. There is also no holistic platform that can be used to facilitate research, testing and deployment.
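To make the idea of balancing profit against risk more concrete, here is a minimal sketch of a reward signal that an RL trading agent could be trained on. It is an illustration only and not the team's method: the function name `risk_adjusted_reward`, the penalty weight `risk_aversion` and the cost rate `cost_rate` are all hypothetical choices made for this example.

```python
from statistics import pstdev


def risk_adjusted_reward(prev_value, new_value, recent_returns, traded_value,
                         risk_aversion=0.5, cost_rate=0.001):
    """Reward for one trading step (hypothetical formulation).

    prev_value, new_value: portfolio value before and after the step,
        measured before transaction costs.
    recent_returns: list of recent per-step returns, used as a risk proxy.
    traded_value: absolute value of assets bought or sold in this step.
    """
    # Profit term: relative change in portfolio value.
    step_return = (new_value - prev_value) / prev_value
    # Practical constraint: proportional transaction cost.
    cost = cost_rate * traded_value / prev_value
    # Risk term: volatility of recent returns (0 if too little history).
    volatility = pstdev(recent_returns) if len(recent_returns) > 1 else 0.0
    return step_return - cost - risk_aversion * volatility
```

Raising `risk_aversion` makes the agent prefer steadier returns over higher but more volatile ones, which is one simple way a practical preference can be built directly into end-to-end training.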

RL in QT 

Prof An shared several examples of his team’s industrial collaborations on RL. Their first collaboration was with a firm that wanted to try applying RL to portfolio management. The team came up with a model called hierarchical portfolio management: a policy was trained both to tell users how to change the weights of different stocks and to trade in real time to achieve those weight changes. The experimental results showed that the team's model fared much better than all the existing benchmarks. The collaborating company also reported that their annual return increased by 25-30% after adopting the team’s strategy.
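As a simple illustration of what "changing the weights" means in practice, the sketch below converts a set of target portfolio weights into the trades needed to reach them. It is not the team's hierarchical model: the function `rebalancing_trades` is hypothetical, and it assumes fractional shares, no transaction costs and immediate execution at the quoted prices.

```python
def rebalancing_trades(holdings, prices, target_weights):
    """Return shares to buy (+) or sell (-) per stock to reach target weights.

    holdings: dict of stock -> number of shares currently held.
    prices: dict of stock -> current price per share.
    target_weights: dict of stock -> desired fraction of portfolio value.
    """
    # Total portfolio value at current prices.
    portfolio_value = sum(holdings[s] * prices[s] for s in holdings)
    trades = {}
    for stock in holdings:
        # Value this stock should have under the target weights.
        target_value = target_weights[stock] * portfolio_value
        # Difference between the target and current share count.
        trades[stock] = target_value / prices[stock] - holdings[stock]
    return trades


# Example: move from 100% in stock A to an even split between A and B.
trades = rebalancing_trades(
    holdings={"A": 10, "B": 0},
    prices={"A": 10.0, "B": 20.0},
    target_weights={"A": 0.5, "B": 0.5},
)
```

In a real system the execution step is harder than this arithmetic suggests, since orders move through the market over time, which is why the trading itself can also be treated as a learning problem.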

The team has been involved in many other RL projects, covering intraday trading, prediction, RL motivated by high-frequency trading, and portfolio management.

Importantly, the team is planning to release their code and trading strategy to the public within a month, giving everyone interested in RL an opportunity to try it out. In addition, the team came up with an evaluation framework called “PRUDEX”, which proposes 17 evaluation measures and has visualisation toolkits that can be used to create a comprehensive evaluation of one’s trading strategy.  
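To give a sense of what evaluation measures for a trading strategy look like, the sketch below computes three widely used ones from a series of portfolio values. These are common textbook measures chosen for illustration; the webinar summary does not list the 17 measures in PRUDEX, so this should not be read as that framework's implementation. The function names and the default of 252 trading periods per year are assumptions.

```python
import math
from statistics import mean, stdev


def total_return(values):
    """Overall gain or loss as a fraction of the starting value."""
    return values[-1] / values[0] - 1.0


def max_drawdown(values):
    """Largest peak-to-trough loss, as a fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(values, periods_per_year=252, risk_free_rate=0.0):
    """Annualised mean excess return divided by its standard deviation."""
    returns = [b / a - 1.0 for a, b in zip(values, values[1:])]
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    # The ratio is undefined with fewer than two returns or zero
    # volatility; this sketch returns 0.0 in those cases.
    if len(excess) < 2 or stdev(excess) == 0:
        return 0.0
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)
```

Looking at several measures together matters because a strategy with a high total return can still have a drawdown that would be unacceptable in practice.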

Addressing gaps in RL research 

To address the need for a platform that can facilitate the building of RL agents for financial market trading, the team created “TradeMaster”, a holistic, first-of-its-kind, RL-based QT platform. TradeMaster provides data at different frequencies and for different financial assets. It handles both macro-level and micro-level tasks. It provides algorithms designed for financial market problems as well as classic RL algorithms not designed for QT problems. There are also auto-RL algorithms for tuning the hyperparameters of RL algorithms. In addition, there are evaluation toolkits for different metrics and visualisation, as well as code for deployment on the cloud.

The team also built a simulator using real data for some stocks. Such a market simulator has two advantages: it can be used for extensive evaluation, and the data it generates can be used in the same way as traditional data, which amounts to data augmentation. More information on TradeMaster can be found here:

Q&A session 

To a question on the kind of data used by the team, Prof An clarified that the data used is from Yahoo Finance. The team’s code uses data that everybody has access to. This data is also placed on their platform. While information such as “social media data” or companies’ voluntary disclosures may help strengthen their model, the team does not currently have the bandwidth to take them into account. At the moment, the team is using only data that everybody can access, but Prof An welcomed others who are interested to extend their framework and integrate more data.

Addressing a question about the possible loss of profit if everybody uses RL to trade, Prof An pointed out that at the moment, this is not such a large concern because it takes time for people to adopt new technology. Deep learning-based approaches often fail without explanation, even though they show good performance. He clarified that the team has not looked into how market stability will be affected if everybody starts using RL. However, they have looked at distributions and shifts in markets in an RL environment for other projects.

Audience members were also interested to know the major concerns when moving towards real investment in financial markets. Prof An said that the team used earlier data to train a policy and tested it on later data. Their trading activity will not impact the stock price because they are often trading at very small volumes. It is impossible to be sure what is going to happen tomorrow, and the best that can be done is to do more extensive evaluation. This is why the team wants to create a simulator to generate markets of different styles and simulate markets they have never seen.
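The practice of training on earlier data and testing on later data can be sketched as a chronological split. This is a generic illustration, not the team's code: the function `chronological_split`, the 80/20 default and the sample rows are hypothetical, and dates are assumed to be ISO-format strings so that sorting them as text puts them in time order.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split (date, price) rows so that all training rows precede test rows.

    Unlike a random split, this never lets the policy see data from
    the period it will later be tested on.
    """
    # Sort by date; ISO-format date strings sort in time order.
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Made-up rows for illustration only.
rows = [
    ("2023-01-05", 103.0),
    ("2023-01-02", 100.0),
    ("2023-01-04", 101.5),
    ("2023-01-03", 102.0),
    ("2023-01-06", 104.0),
]
train, test = chronological_split(rows)
```

Even with a clean split, the test period is only one market history, which is part of the motivation for simulating market styles that have not yet been observed.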

Key takeaways 

One notable advantage of the end-to-end training approach is that it eliminates the prediction component, which mitigates the inaccuracies associated with prediction-based methods. According to Prof An and his team's research findings, RL appears to offer greater reliability compared to current models. However, its broader adoption in industry is still pending, given the time required for integrating new technologies.


Watch the webinar here: