Multi-site coding techniques allow fast, memory-efficient simulations of cellular automata. These techniques require the transition rule to be expressed using only bitwise operations. We present an algorithm, based on multi-site coding, for the simulation of generic totalistic and outer totalistic cellular automata. The algorithm relies on (a) improving on the canonical forms by means of the exclusive-or operation, (b) optimal storage of the configuration in computer memory, and (c) appropriate construction of stochastic rules. Items (b) and (c) of the method can also be applied to non-totalistic automata in any dimension.
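
As an illustration of the general idea, and not of the specific algorithm presented here, the following minimal sketch applies multi-site coding to a one-dimensional, two-state totalistic automaton with a three-site neighbourhood and periodic boundaries. It assumes that the whole configuration of n sites (n at least 3) is packed into a single Python integer, one bit per site. The names `rotate_left`, `rotate_right`, `step` and `table` are hypothetical and chosen only for this example. The neighbourhood sum is obtained for all sites at once with a bitwise full adder, and the rule is then assembled with exclusive-or, which can stand in for inclusive-or because the conditions "sum equals s" are mutually exclusive.

```python
def rotate_left(x, n):
    """Cyclic shift by one: bit i of the result holds site i-1."""
    mask = (1 << n) - 1
    return ((x << 1) | (x >> (n - 1))) & mask


def rotate_right(x, n):
    """Cyclic shift by one: bit i of the result holds site i+1."""
    mask = (1 << n) - 1
    return ((x >> 1) | (x << (n - 1))) & mask


def step(x, n, table):
    """Advance all n sites by one time step using only bitwise operations.

    x     -- configuration, bit i is the state of site i (assumed layout)
    n     -- number of sites, n >= 3, periodic boundaries
    table -- table[s] is the new state when the three-site sum equals s
    """
    mask = (1 << n) - 1
    a = rotate_left(x, n)   # left neighbours
    b = x                   # the sites themselves
    c = rotate_right(x, n)  # right neighbours

    # Bitwise full adder: (s1, s0) is the two-bit sum a + b + c at every site.
    s0 = a ^ b ^ c
    s1 = (a & b) ^ (c & (a ^ b))

    new = 0
    for s in range(4):
        if table[s]:
            hi = s1 if s & 2 else ~s1
            lo = s0 if s & 1 else ~s0
            # The indicators "sum == s" are disjoint, so XOR acts as OR.
            new ^= hi & lo
    return new & mask
```

For example, with the parity table `(0, 1, 0, 1)`, the call `step(0b00010000, 8, (0, 1, 0, 1))` returns `0b00111000`: the single occupied site spreads to its two neighbours. In a compiled language the same operations would act on fixed-width machine words, so that one instruction updates as many sites as there are bits in a word.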