Fixing the InvalidArgumentError When Training SSD-Tensorflow on Your Own Data


While training SSD-Tensorflow on my own data, I ran into the following error:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [8] rhs shape= [84]
  [[Node: save/Assign_15 = Assign[T=DT_FLOAT, _class=["loc:@ssd_300_vgg/block10_box/conv_cls/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ssd_300_vgg/block11_box/conv_cls/biases, save/RestoreV2_15)]]

Cause: in train_ssd_network.py, slim.learning.train is given tf_utils.get_init_fn(FLAGS) as the function that initializes the network parameters. When the train_dir folder already contains a checkpoint, get_init_fn returns None immediately, so the checkpoint_exclude_scopes filtering further down in that function is never applied. slim's supervisor then restores every variable from the stale checkpoint in train_dir, including the class-prediction biases saved with the old number of classes, which no longer match the rebuilt graph (here 84 is presumably 4 anchors × 21 VOC classes from the old checkpoint, while 8 is 4 anchors × the new class count of 2). That mismatch produces the error above.
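The mechanism is easy to reproduce outside of SSD-Tensorflow. The following is a minimal, self-contained sketch (the variable name, shapes, and path are made up for illustration) that triggers the same error by restoring an 84-element bias into an 8-element variable:

import tensorflow as tf

# Save a checkpoint whose bias has 84 elements (4 anchors x 21 VOC classes).
tf.reset_default_graph()
biases = tf.get_variable('conv_cls/biases', shape=[84])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.Saver().save(sess, '/tmp/stale_model.ckpt')

# Rebuild the graph with 8 elements (4 anchors x 2 classes) and try to restore.
tf.reset_default_graph()
biases = tf.get_variable('conv_cls/biases', shape=[8])
with tf.Session() as sess:
    # Raises InvalidArgumentError: Assign requires shapes of both tensors
    # to match. lhs shape= [8] rhs shape= [84]
    tf.train.Saver().restore(sess, '/tmp/stale_model.ckpt')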
The relevant part of the SSD-Tensorflow training code:
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=FLAGS.gpu_memory_fraction)
config = tf.ConfigProto(log_device_placement=False,
                        gpu_options=gpu_options)
saver = tf.train.Saver(max_to_keep=5,
                       keep_checkpoint_every_n_hours=1.0,
                       write_version=2,
                       pad_step_number=False)
slim.learning.train(
    train_tensor,
    logdir=FLAGS.train_dir,
    master='',
    is_chief=True,
    init_fn=tf_utils.get_init_fn(FLAGS),            # initialization fn; None if train_dir already holds a checkpoint
    summary_op=summary_op,                          # merged tf.summary operation
    number_of_steps=FLAGS.max_number_of_steps,      # maximum number of training steps
    log_every_n_steps=FLAGS.log_every_n_steps,      # logging frequency in steps
    save_summaries_secs=FLAGS.save_summaries_secs,  # interval (seconds) between summary saves
    saver=saver,                                    # the tf.train.Saver defined above
    save_interval_secs=FLAGS.save_interval_secs,    # interval (seconds) between model checkpoints
    session_config=config,                          # session configuration (GPU options above)
    sync_optimizer=None)

The initialization function that gets called:
def get_init_fn(flags):
    """Returns a function run by the chief worker to warm-start the training.
    Note that the init_fn is only run when initializing the model during the very
    first global step.
 
    Returns:
      An init function run by the supervisor.
    """
    if flags.checkpoint_path is None:
        return None
    # Warn the user if a checkpoint exists in the train_dir. Then ignore.
    if tf.train.latest_checkpoint(flags.train_dir):
        tf.logging.info(
            'Ignoring --checkpoint_path because a checkpoint already exists in %s'
            % flags.train_dir)
        return None  # returns here, so checkpoint_exclude_scopes below is never applied!
 
    exclusions = []
    if flags.checkpoint_exclude_scopes:
        exclusions = [scope.strip()
                      for scope in flags.checkpoint_exclude_scopes.split(',')]
 
    # TODO(sguada) variables.filter_variables()
    variables_to_restore = []
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    # Change model scope if necessary.
    if flags.checkpoint_model_scope is not None:
        variables_to_restore = \
            {var.op.name.replace(flags.model_name,
                                 flags.checkpoint_model_scope): var
             for var in variables_to_restore}
 
 
    if tf.gfile.IsDirectory(flags.checkpoint_path):
        checkpoint_path = tf.train.latest_checkpoint(flags.checkpoint_path)
    else:
        checkpoint_path = flags.checkpoint_path
    tf.logging.info('Fine-tuning from %s. Ignoring missing vars: %s' % (checkpoint_path, flags.ignore_missing_vars))
 
    return slim.assign_from_checkpoint_fn(
        checkpoint_path,
        variables_to_restore,
        ignore_missing_vars=flags.ignore_missing_vars)
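Before deleting anything, you can confirm that the checkpoint left in train_dir really was written with a different number of classes by inspecting the variable shapes it stores. A quick sketch (the './logs' path is an assumption; substitute your own --train_dir value):

import tensorflow as tf

ckpt = tf.train.latest_checkpoint('./logs')  # your --train_dir
reader = tf.train.NewCheckpointReader(ckpt)
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    if 'conv_cls/biases' in name:
        # A shape of [84] here means the checkpoint was saved with the
        # 21 VOC classes, not with your own class count.
        print(name, shape)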

Solution: delete the train_dir folder (so no stale checkpoint remains), then restart training.
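If you prefer to script the cleanup, a minimal sketch (again assuming train_dir is ./logs):

import os
import shutil

train_dir = './logs'          # whatever you pass as --train_dir
if os.path.isdir(train_dir):
    shutil.rmtree(train_dir)  # remove the stale checkpoint(s)

# On the next run tf.train.latest_checkpoint(train_dir) returns None, so
# get_init_fn no longer short-circuits: it loads --checkpoint_path, applies
# --checkpoint_exclude_scopes, and the mismatched class-prediction layers
# are freshly initialized instead of restored.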