[Urgent] Missing weather data for New York, California, Florida, ... (ALL gone for all dates)

@C3.aiGC
This is an urgent data lake bug report. With the submission deadline approaching, we ran our scheduled weekly data pull to get the most recent data for model retraining and backtesting before we create the demo video.
However, some important features (weather) for our model are suddenly all gone! For example, AverageSurfaceAirPressure, AverageDailyTemperature, …, for New York, California, Florida, etc.

Could you help us check this issue ASAP? Otherwise, we can only present and demo our older model, which was created last week (trained on data from 3 weeks ago, with 2 weeks as out-of-sample backtesting), so the accuracy could be weaker : (

For production later on, we plan to auto-retrain our model weekly or fortnightly to keep up with recent data. For that purpose, if a specific event affects the data (such as all values missing or abnormal values), it would be great to have a Data Flow Event flag so that we can automatically pause or reschedule our model training.
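
Until such a flag exists on the data lake side, a client-side check could act as a stopgap. The following is only a minimal sketch under assumptions: the row layout (a list of dicts with `location`, `feature`, and `value` keys), the function names, and the 50% threshold are all hypothetical and are not part of the C3.ai API.

```python
import math
from collections import defaultdict


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fractions(rows):
    """Return {(location, feature): fraction of missing values}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        key = (row["location"], row["feature"])
        total[key] += 1
        if is_missing(row["value"]):
            missing[key] += 1
    return {key: missing[key] / total[key] for key in total}


def should_pause_training(rows, max_missing=0.5):
    """Return (pause, offenders).

    pause is True if the pull is empty or if any (location, feature)
    series has a missing fraction above max_missing (hypothetical threshold).
    """
    fractions = missing_fractions(rows)
    offenders = sorted(k for k, f in fractions.items() if f > max_missing)
    return (not rows or bool(offenders)), offenders
```

A scheduled job could call `should_pause_training` on each fresh pull (passed as a list) and skip or reschedule retraining when it returns `True`, logging the offending location and feature pairs.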

Hi @Haonan, thanks for letting us know. We’ll look into this and get back to you as soon as possible.

@Haonan we’ve been unable to replicate this issue. The same command you showed currently returns valid data with no missing values. Please let us know if the issue persists.

@C3.aiGC I also just tested, and the data is populating now : D

The last time we ran into the missing data issue, as in the screenshot attached earlier, was Saturday, Nov 7th, around 9 am to 10 am UK time.

So does that mean you scheduled some maintenance or update work last weekend, which may have caused the missing data if we queried the data lake at that time? Would you mind checking your back-end logs to verify?
Or, more importantly for us, if we set up a cron job on our side to automatically pull data from the data lake, should we avoid scheduling the data pull on weekends?

Many thanks in advance

@C3.aiGC
Sorry to bug you again, but I think this issue may need more of your attention, as it looks like a hidden and recurring one : (
This weekend, as part of our scheduled data pull, I am applying some manual QC to its results now, and the weather data for the US are gone again, exactly the same issue as last weekend.

Not all states, but some (for example, New York, Florida, California, etc.).


I ran the above test script at 2020-11-15 09:30 UK time. As mentioned, our auto-scheduled data pull yesterday at around the same time ran into the same issue.

We did a data pull earlier, on 2020-11-12, which returned all good data, and we will go live with that data for our production update this week.
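
Falling back to the last good pull can also be automated. The snippet below is a rough sketch, assuming each pull is recorded as a hypothetical `(pull_date, passed_qc)` pair; the helper name and the record layout are illustrative only, and the dates are the ones mentioned in this thread.

```python
from datetime import date


def latest_good_snapshot(snapshots):
    """Return the most recent pull date whose QC passed, or None."""
    good = [pull_date for pull_date, passed_qc in snapshots if passed_qc]
    return max(good) if good else None


snapshots = [
    (date(2020, 11, 12), True),   # pull that returned complete data
    (date(2020, 11, 14), False),  # scheduled pull with missing weather data
    (date(2020, 11, 15), False),  # manual re-test, same issue
]
print(latest_good_snapshot(snapshots))  # 2020-11-12
```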

Hi @Haonan, the data is back again. We will take a look at why this is a recurring problem.