RCEA Big Data & Machine Learning Conference

The Rimini Centre for Economic Analysis is organizing the RCEA Big Data & Machine Learning Conference online on May 13-14, 2022.


The Conference aims to discuss recent advances in the broad field of Big Data & Machine Learning, in relation to both methods and applications.



Conference Program



May 13 (Friday) - Link to Conference

Opening Remarks

08:50-09:00 am PT Marcelle Chauvet

Session I Keynote Speaker (Moderator: Marcelle Chauvet)

09:00-10:00 am PT Susan Athey (Stanford University)

“Adaptive Experiments for Targeted Treatment Assignment: Theory and Applications”

10:00-10:10 am PT Discussion

10:10-10:15 am PT Break

Session II Invited Speakers (Moderator: Matthew Harding)

10:15-11:00 am PT Vasilis Syrgkanis (Microsoft Research)

“Automatic Debiased Machine Learning for Dynamic Structural and Causal Parameters”

11:00-11:10 am PT Discussion


11:10-11:55 am PT Sean Klein & Tarek Nassar (PIMCO)

“Horizons and Filtrations in Forecasting Models”

11:55 am-12:05 pm PT Discussion


12:05-12:50 pm PT Philippe Goulet Coulombe (UQAM)

“A Neural Phillips Curve and a Deep Output Gap”

12:50-01:00 pm PT Discussion


1:00-1:30 pm PT Happy Hour - Get together via Zoom Meeting

May 14 (Saturday) - Link to Conference


Session III Keynote Speaker (Moderator: Aman Ullah)

09:00-10:00 am PT Jianqing Fan (Princeton University)

“Structural Deep Learning in Conditional Asset Pricing”

10:00-10:10 am PT Discussion

10:10-10:15 am PT Break

Session IV Invited Speakers (Moderator: Tae-Hwy Lee)

10:15-11:00 am PT Markus Pelger (Stanford University)

“Missing Financial Data”

11:00-11:10 am PT Discussion

11:10-11:55 am PT Paolo Giordani (Norwegian Business School)

“Data Efficient Machine Learning with SMARTboost”

11:55 am-12:05 pm PT Discussion


12:05-12:50 pm PT Dimitris Korobilis (University of Glasgow)

“Bayesian Dynamic Variable Selection in High Dimensions”

12:50-01:00 pm PT Discussion


1:00-1:30 pm PT Happy Hour - Get together via Zoom Meeting


---------------------------------------------------------------------

All times are in Pacific Time (GMT-7).



Titles and Abstracts

Susan Athey (Stanford University)

Title: “Adaptive Experiments for Targeted Treatment Assignment: Theory and Applications”

Abstract:

Vasilis Syrgkanis (Microsoft Research)

Title: “Automatic Debiased Machine Learning for Dynamic Structural and Causal Parameters”

Abstract: We extend the idea of automated debiased machine learning to the dynamic treatment regime. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation algorithm that learns de-biasing corrections without the need to characterize what the correction terms look like, such as products of inverse probability weighting terms, as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need to solve auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. (Joint work with Victor Chernozhukov, Whitney Newey, and Rahul Singh)

Paper: https://arxiv.org/abs/2203.13887
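
The core debiasing step can be illustrated in the simplest static setting, the average treatment effect functional. The sketch below is an illustration rather than the paper's recursive dynamic estimator: it minimizes the empirical loss E[α(W)² − 2 m(W; α)] over a linear function class, where the feature map and simulated design are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=n)
p = 1 / (1 + np.exp(-X))                 # true propensity score
D = rng.binomial(1, p)

# Feature map for a linear Riesz representer class: alpha(d, x) = theta . phi(d, x)
def phi(d, x):
    return np.column_stack([d, d * x, 1 - d, (1 - d) * x])

Phi = phi(D, X)                               # alpha evaluated at observed (D, X)
M = phi(np.ones(n), X) - phi(np.zeros(n), X)  # m(W; alpha) for the ATE functional

# Minimize the empirical loss E[alpha(W)^2 - 2 m(W; alpha)];
# for a linear class the first-order condition gives a closed form.
theta = np.linalg.solve(Phi.T @ Phi / n, M.mean(axis=0))
alpha_hat = Phi @ theta                  # estimated debiasing weights

# The minimized loss is negative whenever the representer is nonzero
loss = np.mean(alpha_hat**2) - 2 * np.mean(M @ theta)
print(f"minimized Riesz loss: {loss:.3f}")
```

Note that no propensity model is estimated anywhere: the weights come out of the loss minimization directly, which is the point the abstract emphasizes.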

Sean Klein & Tarek Nassar (PIMCO)

Title: “Horizons and Filtrations in Forecasting Models”

Abstract: Forecasting models typically predict future outcomes one step ahead. For certain applications that depend on knowledge of future values over an extended period of time, this is insufficient. Furthermore, when the target is a time-aggregated value, predicting one step ahead is not an option. We present an approach, referred to as a stability loss function model, tailored to predicting multiple steps ahead. The model shows considerable improvements over standard one-step-ahead forecasts, and the optimal forecasting horizon can be determined empirically with this approach. Finally, we highlight practical issues that can influence the choice of the filtration or the horizon in applied modeling, including concerns around the data, the question being asked, model performance, and the model’s application and use. We show how incorporating this information can improve the performance and usability of models, with an application to forecasting asset flows.
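
The contrast between one-step-ahead and multi-horizon prediction can be illustrated with the textbook distinction between iterated and direct multi-step forecasts. The AR(1) setup below is a toy assumption, not the authors' stability loss function model:

```python
import numpy as np

rng = np.random.default_rng(4)
T, h = 500, 5                        # sample length, forecast horizon

# Simulate an AR(1) series as a stand-in for the series being forecast
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + rng.normal()

# Iterated route: fit a one-step model y_{t+1} ~ y_t, then iterate it h times
phi = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
iterated = phi**h * y[-1]

# Direct route: fit y_{t+h} ~ y_t and forecast the horizon in one shot
phi_h = (y[:-h] @ y[h:]) / (y[:-h] @ y[:-h])
direct = phi_h * y[-1]

print(f"iterated: {iterated:.3f}  direct: {direct:.3f}")
```

For a correctly specified AR(1) the two routes agree asymptotically; the direct (or loss-tailored) route matters precisely when the one-step model is misspecified for the horizon of interest, which is the situation the abstract targets.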

Philippe Goulet Coulombe (UQAM)

Title: “A Neural Phillips Curve and a Deep Output Gap”

Abstract: Many problems plague the estimation of Phillips curves. Among them is the hurdle that the two key components, inflation expectations and the output gap, are both unobserved. Traditional remedies include creating reasonable proxies for the notable absentees or extracting them via some form of assumptions-heavy filtering procedure. I propose an alternative route: a Hemisphere Neural Network (HNN) whose peculiar architecture yields a final layer where components can be interpreted as latent states within a Neural Phillips Curve. There are benefits. First, HNN conducts the supervised estimation of nonlinearities that arise when translating a high-dimensional set of observed regressors into latent states. Second, computations are fast. Third, forecasts are economically interpretable. Fourth, inflation volatility can also be predicted by merely adding a hemisphere to the model. Among other findings, the contribution of real activity to inflation appears severely underestimated in traditional econometric specifications. Also, HNN captures out-of-sample the 2021 upswing in inflation and attributes it first to an abrupt and sizable dis-anchoring of the expectations component, followed by a wildly positive gap starting from late 2020. The unique path of HNN's gap comes from dispensing with unemployment and GDP in favor of an amalgam of nonlinearly processed alternative tightness indicators -- some of which are skyrocketing as of early 2022.

Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4018079
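
The hemisphere idea, separate regressor blocks feeding sub-models whose outputs are summed in the final layer, can be caricatured with linear sub-models. The sketch below is a deliberately simplified stand-in: HNN's hemispheres are neural networks, and the regressors and coefficients here are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 300
X_exp = rng.normal(size=(T, 3))     # expectations-related regressors
X_gap = rng.normal(size=(T, 4))     # real-activity ("gap") regressors

# Simulated inflation: a sum of two latent components plus noise
w_exp = np.array([1.0, 0.5, 0.0])
w_gap = np.array([0.8, -0.4, 0.2, 0.0])
pi = X_exp @ w_exp + X_gap @ w_gap + 0.2 * rng.normal(size=T)

# "Hemispheres": each block feeds its own sub-model, and the final
# layer simply sums the two outputs (here, linear sub-models fit
# jointly by least squares; HNN uses neural sub-networks instead)
Z = np.hstack([X_exp, X_gap])
coef, *_ = np.linalg.lstsq(Z, pi, rcond=None)
expectations_component = X_exp @ coef[:3]   # interpretable latent state 1
gap_component = X_gap @ coef[3:]            # interpretable latent state 2
fitted = expectations_component + gap_component

print(f"R^2: {1 - np.var(pi - fitted) / np.var(pi):.2f}")
```

The interpretability claim in the abstract rests on this additive final layer: each component of the fit can be read off separately as a latent state.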

Jianqing Fan (Princeton University)

Title: “Structural Deep Learning in Conditional Asset Pricing”

Abstract: We develop new structural nonparametric methods, guided by financial economics theory, for estimating conditional asset pricing models using deep neural networks, employing time-varying conditional information on alphas and betas carried by firm-specific characteristics. Contrary to many applications of neural networks in economics, we can open the “black box” of machine learning predictions by incorporating financial economics theory into the learning, and provide an economic interpretation of the successful predictions obtained from neural networks by decomposing the neural predictors into risk-related and mispricing components. Our estimation method starts with period-by-period cross-sectional deep learning, followed by local PCAs to capture time-varying features such as latent factors of the model. We formally establish the asymptotic theory of the structural deep-learning estimators, which applies to both in-sample fit and out-of-sample predictions. We also illustrate the “double-descent-risk” phenomenon associated with over-parametrized predictions, which justifies the use of over-fitting machine learning methods. (Joint work with Tracy Ke, Yuan Liao, and Andreas Neuhierl)


Markus Pelger (Stanford University)

Title: “Missing Financial Data”

Abstract: Missing data is a prevalent, yet often ignored, feature of company fundamentals. In this paper, we document the structure of missing financial data and show how to systematically deal with it. In a comprehensive empirical study we establish four key stylized facts. First, the issue of missing financial data is profound: it affects over 70% of firms that represent about half of the total market cap. Second, the problem becomes particularly severe when multiple characteristics are required to be present. Third, firm fundamentals are not missing-at-random, invalidating traditional ad-hoc approaches to data imputation and sample selection. Fourth, stock returns themselves depend on missingness, with “opaque” companies having significantly lower expected returns. We propose a novel imputation method to obtain a fully observed panel of firm fundamentals. It exploits both the time-series and cross-sectional dependency of firm characteristics to impute their missing values, while allowing for general systematic patterns of missing data. Our approach provides a substantial improvement over standard empirical procedures such as using cross-sectional averages or past observations. Our results have crucial implications for many areas of asset pricing. (Joint work with Svetlana Bryzgalova, Sven Lerner, and Martin Lettau)
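
A minimal illustration of factor-based panel imputation, in the spirit of exploiting cross-sectional dependency, is iterative low-rank (SVD) completion. The sketch below assumes missing-completely-at-random data for simplicity, whereas the paper's method explicitly handles the systematic missingness it documents:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, k = 50, 30, 2                       # periods, firms, latent factors

# Simulate a characteristic panel with a low-rank (factor) structure
F = rng.normal(size=(T, k))               # latent factors
L = rng.normal(size=(N, k))               # firm loadings
panel = F @ L.T + 0.1 * rng.normal(size=(T, N))

# Mask 20% of entries as missing, completely at random
mask = rng.random(panel.shape) < 0.2
obs = np.where(mask, np.nan, panel)

# Iterative SVD imputation: start from column means, then repeatedly
# refit a rank-k approximation and re-impute only the missing cells.
filled = np.where(mask, np.nanmean(obs, axis=0), obs)
for _ in range(50):
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
    new = np.where(mask, low_rank, obs)
    if np.max(np.abs(new - filled)) < 1e-6:
        filled = new
        break
    filled = new

err = np.sqrt(np.mean((filled[mask] - panel[mask]) ** 2))
print(f"imputation RMSE on missing cells: {err:.3f}")
```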

Paolo Giordani (Norwegian Business School)

Title: “Data Efficient Machine Learning with SMARTboost”

Abstract: Boosted regression trees are currently the tool of choice for prediction with unstructured tabular data, and are capable of approximating highly complex functions in large samples. However, their performance suffers when the sample and/or the signal-to-noise ratio (SNR) is low, due to a lack of any prior to suggest that some functions are more plausible than others. We argue that in many instances, economic and financial datasets are effectively ‘small’ even when the sample size is nominally large. SMARTboost (boosting of symmetric smooth additive regression trees) is a recently introduced machine learning model designed for good performance with both large and small sample sizes and in both high and low SNR environments. With small samples or low SNR, the priors built into SMARTboost favor relatively simple and monotonic functions, while arbitrarily complex functions can be approximated (often much more efficiently compared to existing boosting alternatives) in large samples as the data overwhelm the priors.
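
The idea of a smooth (soft) tree split, where a sigmoid gate replaces the hard indicator so that fitted functions are smooth, can be sketched with gradient boosting of soft stumps. This toy is not SMARTboost itself (in particular, it has none of the priors the abstract emphasizes); it only illustrates the smooth-split building block:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-2, 2, size=n)
y = np.tanh(2 * x) + 0.3 * rng.normal(size=n)     # smooth monotone target

# Gradient boosting with soft stumps: a sigmoid gate g blends two leaf
# values instead of a hard split, so each stump is a smooth function.
pred = np.zeros(n)
lr = 0.3                                          # learning rate
for _ in range(30):
    resid = y - pred
    best_sse, best_fit = np.inf, None
    for c in np.linspace(-2, 2, 21):              # grid over split points
        g = 1 / (1 + np.exp(-(x - c) / 0.3))      # soft assignment to right leaf
        left = np.sum((1 - g) * resid) / np.sum(1 - g)
        right = np.sum(g * resid) / np.sum(g)
        fit = (1 - g) * left + g * right
        sse = np.sum((resid - fit) ** 2)
        if sse < best_sse:
            best_sse, best_fit = sse, fit
    pred += lr * best_fit

rmse = np.sqrt(np.mean((y - pred) ** 2))
print(f"in-sample RMSE: {rmse:.3f}")              # noise std is 0.3
```

A hard-split stump would approximate the smooth target with a staircase; the sigmoid gate lets a handful of stumps fit it closely, which is part of why smooth trees are more data-efficient on smooth functions.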


Dimitris Korobilis (University of Glasgow)

Title: “Bayesian Dynamic Variable Selection in High Dimensions”

Abstract: This paper proposes a variational Bayes algorithm for computationally efficient posterior and predictive inference in time-varying parameter (TVP) regression models. Within this context we specify a new dynamic variable/model selection strategy for TVP dynamic regression models in the presence of a large number of predictors. The proposed variational Bayes dynamic variable selection (VBDVS) algorithm allows for assessing at each time period in the sample which predictors are relevant (or not) for forecasting the dependent variable. The algorithm is applied to the problem of forecasting inflation using over 400 macroeconomic, financial and global predictors, many of which are potentially irrelevant or short-lived. We find that the new methodology is able to ensure parsimonious solutions to this high-dimensional estimation problem, that translate into excellent forecast performance. (Joint work with Gary Koop)
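
The TVP regression that the VBDVS algorithm targets is a linear Gaussian state space, y_t = x_t'β_t + ε_t with β_t following random walks. A minimal Kalman-filter sketch of this setup (with known variances, simulated data, and without the variational dynamic selection layer that is the paper's contribution) is:

```python
import numpy as np

rng = np.random.default_rng(3)
T, p = 200, 3
X = rng.normal(size=(T, p))

# True coefficients drift as random walks; the third predictor is irrelevant
beta = np.cumsum(0.05 * rng.normal(size=(T, p)), axis=0)
beta[:, 2] = 0.0
y = np.sum(X * beta, axis=1) + 0.1 * rng.normal(size=T)

# Kalman filter for y_t = x_t' b_t + e_t,  b_t = b_{t-1} + v_t
sig2 = 0.1**2                           # observation noise variance (assumed known)
Q = 0.05**2 * np.eye(p)                 # state innovation covariance (assumed known)
b = np.zeros(p)
P = np.eye(p)
b_path = np.zeros((T, p))
for t in range(T):
    P = P + Q                           # predict step (random-walk state)
    x = X[t]
    S = x @ P @ x + sig2                # one-step forecast variance
    K = P @ x / S                       # Kalman gain
    b = b + K * (y[t] - x @ b)          # update step
    P = P - np.outer(K, x) @ P
    b_path[t] = b

print(np.round(b_path[-1], 2), "vs true", np.round(beta[-1], 2))
```

With 400+ predictors this filter alone would overfit badly; the VBDVS layer addresses that by deciding, period by period, which coefficients to shrink to zero.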