Men, H., Hosu, V., Lin, H., Bruhn, A., Saupe, D.
Current benchmarks for optical flow algorithms evaluate the estimation either directly, by comparing the predicted flow fields with the ground truth, or indirectly, by using the predicted flow fields for frame interpolation and then comparing the interpolated frames with the actual frames. In the latter case, objective quality measures such as the mean squared error are typically employed. However, it is well known that for image quality assessment, the actual quality experienced by the user cannot be fully deduced from such simple measures. Hence, we conducted a subjective quality assessment crowdsourcing study for the interpolated frames provided by one of the optical flow benchmarks, the Middlebury benchmark. It contains interpolated frames from 155 methods applied to each of 8 contents. For this purpose, we collected forced-choice paired comparisons between interpolated images and the corresponding ground truth. To increase the sensitivity of observers when judging minute differences in paired comparisons, we introduced a new method to the field of full-reference quality assessment, called artifact amplification. From the crowdsourcing data (5,887 participants, 3,720 comparisons of 50 votes each) we reconstructed absolute quality scale values according to Thurstone's model. As a result, we obtained a re-ranking of the 155 participating algorithms with respect to the visual quality of the interpolated frames. This re-ranking not only shows the necessity of visual quality assessment as another evaluation metric for optical flow and frame interpolation benchmarks; the results also provide the ground truth for designing novel image quality assessment (IQA) methods dedicated to the perceptual quality of interpolated images. As a first step, we proposed such a new full-reference method, called WAE-IQA. By weighting the local differences between an interpolated image and its ground truth, WAE-IQA performed slightly better than the currently best FR-IQA approach from the literature.
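To illustrate the scale reconstruction step mentioned above, the following is a minimal sketch of Thurstone Case V scaling from paired-comparison vote counts. The win-count matrix and the `thurstone_case_v` helper are hypothetical toy constructions for illustration; the study's actual reconstruction from 3,720 comparisons is more involved (e.g. maximum-likelihood fitting over the full vote data).

```python
from statistics import NormalDist

def thurstone_case_v(wins):
    """Toy Thurstone Case V scaling (hypothetical helper, not the paper's code).

    wins[i][j] = number of votes preferring item i over item j.
    For each pair, the observed preference proportion is mapped through
    the probit (inverse standard-normal CDF); averaging these z-values
    per item yields scale values on a common perceptual axis.
    """
    n = len(wins)
    probit = NormalDist().inv_cdf
    eps = 0.5  # additive smoothing so proportions never hit 0 or 1
    scales = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            total = wins[i][j] + wins[j][i]
            p = (wins[i][j] + eps) / (total + 2 * eps)
            z_sum += probit(p)
        scales.append(z_sum / (n - 1))
    return scales

# Hypothetical data for three items, 50 votes per pair as in the study;
# item 0 is clearly preferred over items 1 and 2.
wins = [
    [0, 40, 45],
    [10, 0, 30],
    [5, 20, 0],
]
print(thurstone_case_v(wins))  # scale values; item 0 ranks highest
```

With the symmetric smoothing used here, the scale values sum to zero by construction, so only the relative ordering and spacing of the items carry meaning, which is exactly what a re-ranking of algorithms needs.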