Video Quality Assessment (VQA) methods have been designed with a focus on particular degradation types, usually artificially induced on a small set of reference videos. Hence, most traditional VQA methods under-perform in-the-wild. Deep learning approaches have had limited success due to the small size and diversity of existing VQA datasets, whether artificially or authentically distorted. We introduce a new in-the-wild VQA dataset that is substantially larger and more diverse: FlickrVid-150k. It consists of a coarsely annotated set of 153,841 videos with five quality ratings each, and 1,600 videos with a minimum of 89 ratings each. Additionally, we propose new, efficient VQA approaches (MLSP-VQA) relying on multi-level spatially pooled deep features (MLSP). Compared to deep transfer learning approaches, they are extremely well suited for training at scale. Our best method, MLSP-VQA-FF, improves the Spearman Rank-order Correlation Coefficient (SRCC) on the standard KoNViD-1k in-the-wild benchmark dataset to 0.83, surpassing the best existing deep-learning model (0.80 SRCC) and the best hand-crafted feature-based method (0.78 SRCC). We further investigate how alternative approaches perform under different levels of label noise and dataset sizes, showing that MLSP-VQA-FF is the overall best method. Finally, we show that MLSP-VQA-FF trained on FlickrVid-150k sets the new state of the art for cross-test performance on KoNViD-1k and LIVE-Qualcomm with 0.79 and 0.58 SRCC, respectively, demonstrating excellent generalization.