[Hard problem] Notes on the kafka.common.ConsumerRebalanceFailedException exception



I have been using Kafka recently and have run into a number of exceptions while sending and receiving data. One of them is the following:

Exception in thread "main" kafka.common.ConsumerRebalanceFailedException: groupB_ip-10-38-19-230-1414174925481-97fa3f2a can't rebalance after 4 retries
    at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:432)
    at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:722)
    at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:212)
    at kafka.javaapi.consumer.Zookeeper……

Debugging showed that, on the Consumer side, execution had reached this statement:

Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap =
        this.consumer.createMessageStreams(topicCountMap);

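Execution gets stuck at this statement, and then the exception above is thrown. I searched online for related solutions and tried increasing the zookeeper.sync.time.ms property on the Consumer side, but the problem remained.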

That was the situation until I found a reliable solution at the following address:

https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?

Quoting the original English text:

consumer rebalancing fails (you will see ConsumerRebalanceFailedException): This is due to conflicts when two consumers are trying to own the same topic partition. The log will show you what caused the conflict (search for "conflict in ").

If your consumer subscribes to many topics and your ZK server is busy, this could be caused by consumers not having enough time to see a consistent view of all consumers in the same group. If this is the case, try Increasing rebalance.max.retries and rebalance.backoff.ms.

Another reason could be that one of the consumers is hard killed. Other consumers during rebalancing won't realize that consumer is gone after zookeeper.session.timeout.ms time. In the case, make sure that rebalance.max.retries * rebalance.backoff.ms > zookeeper.session.timeout.ms.

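Next I tried the suggestion about rebalance.max.retries and rebalance.backoff.ms (shown in bold in the original post). On the Consumer side, set the two properties as follows: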

props.put("rebalance.max.retries", "5");
props.put("rebalance.backoff.ms", "1200");

Make sure that 5 * 1200 = 6000 is greater than the value of the zookeeper.session.timeout.ms property (5000 here). After I restarted the Producer side and the Consumer side separately, the problem was indeed solved.
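The arithmetic above can be checked in code before the consumer is created. The following is only a minimal sketch: the class name RebalanceConfigCheck and the helper outlastsSessionTimeout are hypothetical names made up for illustration, not part of Kafka, and zookeeper.session.timeout.ms is set explicitly to the 5000 used in this post so that the check has a value to read.

```java
import java.util.Properties;

public class RebalanceConfigCheck {

    // Consumer properties with the values used in this post.
    // zookeeper.session.timeout.ms is set explicitly (assumption) so the
    // check below does not depend on a default.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("zookeeper.session.timeout.ms", "5000");
        props.put("rebalance.max.retries", "5");
        props.put("rebalance.backoff.ms", "1200");
        return props;
    }

    // Hypothetical helper: true when
    // rebalance.max.retries * rebalance.backoff.ms > zookeeper.session.timeout.ms
    static boolean outlastsSessionTimeout(Properties props) {
        long retries = Long.parseLong(props.getProperty("rebalance.max.retries"));
        long backoffMs = Long.parseLong(props.getProperty("rebalance.backoff.ms"));
        long sessionTimeoutMs = Long.parseLong(props.getProperty("zookeeper.session.timeout.ms"));
        return retries * backoffMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        Properties props = consumerProps();
        if (!outlastsSessionTimeout(props)) {
            throw new IllegalStateException(
                "rebalance.max.retries * rebalance.backoff.ms must exceed zookeeper.session.timeout.ms");
        }
        System.out.println("rebalance settings outlast the ZooKeeper session timeout");
    }
}
```

Failing fast like this is a design choice: a misconfigured retry window is reported at startup rather than surfacing later as a ConsumerRebalanceFailedException.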

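Note: it is better for the metadata.broker.list property on the Producer side to list more than one broker. This helps with load balancing.
PS: Handling Kafka exceptions properly requires a clearer understanding of how Kafka works, but I don't have that much time, so for now I settled for a last-minute fix.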
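As an illustration of the note about metadata.broker.list, here is a small sketch of Producer properties that list several brokers as comma-separated host:port pairs. The host names broker1, broker2 and broker3, the port 9092, and the class name ProducerBrokerList are placeholders assumed for this example; they do not come from the original setup.

```java
import java.util.Properties;

public class ProducerBrokerList {

    // Producer properties with more than one broker in metadata.broker.list.
    // The host names and port are hypothetical placeholders.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
        return props;
    }

    // Counts the comma-separated entries in metadata.broker.list.
    static int brokerCount(Properties props) {
        return props.getProperty("metadata.broker.list").split(",").length;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        System.out.println("brokers configured: " + brokerCount(props));
    }
}
```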
