TensorFlow: RC 0.10 3X Slower than 0.9


I am compiling the current master version from source. If I compile using CUDA 7.5 and cuDNN 4.0, I get the following compilation error:

ERROR: /home/rob/tensorflow/tensorflow/contrib/rnn/BUILD:45:1: undeclared inclusion(s) in rule '//tensorflow/contrib/rnn:python/ops/_lstm_ops_gpu': this rule is missing dependency declarations for the following files included by 'tensorflow/contrib/rnn/kernels/lstm_ops_gpu.cu.cc': '/usr/local/cuda-7.5/include/cuda_runtime.h' '/usr/local/cuda-7.5/include/host_config.h' '/usr/local/cuda-7.5/include/builtin_types.h' [etc...]

If I compile with cuDNN 5.1, it compiles and runs, but the execution time of my training script is 3x longer than when running the same script with the 0.9.0 release installed via pip.

I also tried the pip version of 0.10.0rc0 (GPU) and saw the same 3x slowdown vs. version 0.9.0.

I am using Ubuntu 14.04, Python 3.4, and a Tesla K40c GPU. The Bazel version is 0.3.1.

What is the cause of the 3x slowdown in version 0.10.0rc0, and is there a way to regain the prior performance?

Secondarily, how can I eliminate the build errors when using cuDNN 4?

The relative slowness of 0.10.0rc0 is a confirmed bug that is being addressed. More information and the current status can be found in this thread.
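To check whether a given install shows the same regression, it may help to time a fixed number of training steps under each version and compare the per-step averages. The following is only a minimal sketch, not something from the linked thread: `time_steps` and `slowdown` are hypothetical helper names, and `step_fn` stands in for whatever callable runs one training step in your script (for example, a lambda wrapping `sess.run(train_op)`, where `sess` and `train_op` are assumed to come from your own code).

```python
import time


def time_steps(step_fn, num_steps=100, warmup=10):
    """Return the mean wall-clock seconds per call of step_fn.

    step_fn is a hypothetical zero-argument callable that runs one
    training step, e.g. lambda: sess.run(train_op) in your own script.
    """
    # Warm-up calls are excluded so one-time setup costs do not skew the mean.
    for _ in range(warmup):
        step_fn()

    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return elapsed / num_steps


def slowdown(baseline_sec, candidate_sec):
    """Return how many times slower the candidate is than the baseline."""
    return candidate_sec / baseline_sec
```

Run the same script once under 0.9.0 and once under 0.10.0rc0, record the value returned by `time_steps` in each environment, and pass the two numbers to `slowdown`. A result near 3 would match the slowdown described in the question.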

