Abstract
This dissertation investigates surrogate modeling for fixed-location environmental forecasting using novel data-combination techniques. The work surveys the landscape of observational measurements and numerically generated data, identifying related research and gaps in current methodologies. This survey situates surrogate learning within that landscape, where fast and accurate predictions are a key concern. A ratio-coupled training framework is introduced that combines two data sources per predicted feature through a tunable parameter weighting the strength of each training signal. An optimization scheme is developed to tune the surrogate weights and the coupled signal ratio simultaneously, allowing the relative influence of the two signals to act as an explicit regularizer.
Three case studies demonstrate the methodology in a variety of contexts. The first study is based on the Cahn-Hilliard partial differential equations. With few features and a small example, it shows that the technique can be used for nonlinear equation estimation. The second study uses buoy observation data and two model sources to couple three features at once. Limited data and the use of an additive architecture produced results that differ from those of the other case studies. Finally, a weather station dataset of nearby observation platforms is used with many coupled features. Baselines were improved upon with careful model tuning and the use of a multi-year dataset at training time. In all examples, Gaussian noise is found to be a strong regularizer. When Gaussian noise is used within the optimization scheme, many examples even show improvement over the direct numerical coupling.
Comprehensive comparisons of search methods indicate that optimized and Bayesian hyperparameter selection techniques can deliver competitive accuracy while keeping hyperparameter tuning time low. Once a performant model is trained, localized surrogates can generate forecast inferences significantly faster than global numerical analyses. The results provide guidance on model selection, hyperparameter selection, and search strategy, depending on the underlying dataset and prediction domain. The dissertation concludes with an overview of the best methods found and suggestions for continued improvement in the near and long term.
Biography
Austin Brian Schmidt completed his M.S. degree in computer science in 2021 and is a current awardee of the prestigious SMART scholarship, awarded by the DoD. Mr. Schmidt is currently pursuing a Ph.D. in engineering and applied sciences at the University of New Orleans with a concentration in machine learning and artificial intelligence. Alongside working on degree requirements, he conducts machine learning research at the Canizaro Livingston Gulf States Center for Environmental Informatics (GulfSCEI) and attends summer internships with the Naval Oceanographic Office (NAVOCEANO).
