This is an urgent data lake bug report. With the submission deadline approaching, we ran our scheduled weekly data pull to get the most recent data for model retraining and backtesting before we create the demo video.
However, some important weather features for our model are suddenly all gone! For example, AverageSurfaceAirPressure, AverageDailyTemperature, …, for New York, California, Florida, etc.
Could you help us check this issue ASAP? Otherwise, we can only present and demo our older model, which was created last week (trained on data from 3 weeks ago, with 2 weeks as out-of-sample backtesting), so the accuracy could be weaker :(
Later, for production, we plan to retrain our model automatically every week or fortnight to keep up with recent data. For that purpose, if a specific event affects the data (such as all values missing, or abnormal values), it would be great to have a Data Flow Event flag so that we can automatically pause or reschedule our model training.
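To illustrate what we mean, here is a rough sketch of the kind of check we could run on our side after each pull until such a flag exists. It is only an assumption about how the gating could work, not part of the data lake API: the function names `find_empty_features` and `should_pause_training` are hypothetical, and we assume each pulled record arrives as a dictionary keyed by feature name.

```python
import math
from typing import Any, Iterable, List, Mapping, Sequence


def _is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing values."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


def find_empty_features(
    rows: Iterable[Mapping[str, Any]],
    required_features: Sequence[str],
) -> List[str]:
    """Return the required features that are missing in every row.

    A feature counts as empty when its key is absent or its value is
    None/NaN in all rows. With no rows at all, every feature is empty.
    """
    materialized = list(rows)
    return [
        name
        for name in required_features
        if all(_is_missing(row.get(name)) for row in materialized)
    ]


def should_pause_training(
    rows: Iterable[Mapping[str, Any]],
    required_features: Sequence[str],
) -> bool:
    """Hypothetical gate: pause retraining if any required feature is empty."""
    return bool(find_empty_features(rows, required_features))


# Example with made-up records (not real data from the pull).
sample_rows = [
    {"AverageSurfaceAirPressure": None, "AverageDailyTemperature": 21.5},
    {"AverageDailyTemperature": float("nan")},
]
required = ["AverageSurfaceAirPressure", "AverageDailyTemperature"]
print(find_empty_features(sample_rows, required))  # ['AverageSurfaceAirPressure']
```

A check like this only catches the "all missing" case; detecting abnormal values would need thresholds that we cannot define reliably from our side, which is why a Data Flow Event flag from the data lake would be the better solution.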