Web accessibility entails far more than paying attention to a handful of HTML tags and attributes: it deals with human behavior. As a consequence, a number of pitfalls hamper the accessibility evaluation of web pages. In this paper I review some of the research that my coauthors and I carried out over the last four years and that provides experimental evidence of these pitfalls. In fact, the three fundamental processes of (1) selecting the pages to be investigated, (2) finding their problems, and (3) measuring the corresponding accessibility levels are riddled with potential traps that affect the reliability and even the validity of evaluations. Knowing which traps exist and figuring out how to overcome them should rank high on the priority list of accessibility researchers and practitioners. This is what is needed in order to move towards an engineering of accessibility.