What's the point of even benchmarking data throughput anyway, if the same bit of HLL source runs at the same speed across multiple architectures?