public void informFailure(T failedObject) {
  //If there is no backoff this method is a no-op.
  if (!shouldBackOff) {
    return;
  }
  //Temporarily remove this host from the list of available hosts
  ...
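So the fix is to configure max back off.
Problem 3: Flume Log4j's reconnect-on-failure strategy misbehaves
The symptom: even with max back off set, the reconnect interval stays at 2000ms every time. I looked at the algorithm, which is exponential backoff, in the informFailure function of OrderSelector.java.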
public void informFailure(T failedObject) {
  //If there is no backoff this method is a no-op.
  if (!shouldBackOff) {
    return;
  }
  FailureState state = stateMap.get(failedObject);
  long now = System.currentTimeMillis();
  long delta = now - state.lastFail;
  long lastBackoffLength = Math.min(maxTimeout, 1000 * (1 << state.sequentialFails));
  long allowableDiff = lastBackoffLength + CONSIDER_SEQUENTIAL_RANGE;
  if (allowableDiff > delta) {
    if (state.sequentialFails < EXP_BACKOFF_COUNTER_LIMIT) {
      state.sequentialFails++;
    }
  } else {
    state.sequentialFails = 1;
  }
  state.lastFail = now;
  //Depending on the number of sequential failures this component had, delay
  //its restore time. Each time it fails, delay the restore by 1000 ms,
  //until the maxTimeOut is reached.
  state.restoreTime = now + Math.min(maxTimeout, 1000 * (1 << state.sequentialFails));
}
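The resulting restoreTime is the time of the next retry. I had not set avro connect time out or request time out; both default to 20s, which is on the long side. With those defaults, delta always exceeds 40s, while allowableDiff only ever comes out as 3s or 4s. The check allowableDiff > delta therefore never passes, sequentialFails is reset to 1 on every failure, and the backoff stays at 1000 * (1 << 1) = 2000ms. So I simply changed the condition to allowableDiff < delta, after which the backoff behaved as expected. One problem remains: sequentialFails is not reset after a period of time.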
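To see why the interval stays at 2000ms, the arithmetic of informFailure can be replayed outside Flume. This is a minimal sketch, not Flume's code: the class BackoffSketch, the method simulate, and the swapped flag are hypothetical names. CONSIDER_SEQUENTIAL_RANGE = 2000 is inferred from the 3s and 4s values above, while EXP_BACKOFF_COUNTER_LIMIT = 16, a 30000ms max timeout, and a fixed 40s gap between failures are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class BackoffSketch {
    // Assumed constants: the 2000 ms range is inferred from the 3 s / 4 s
    // allowableDiff values in the post; the counter limit is a guess.
    static final long CONSIDER_SEQUENTIAL_RANGE = 2000;
    static final int EXP_BACKOFF_COUNTER_LIMIT = 16;

    /**
     * Replays the informFailure arithmetic for failures arriving gapMs apart
     * and returns the backoff length (restoreTime - now) computed for each one.
     * swapped = false uses the original check (allowableDiff > delta);
     * swapped = true uses the modified check (allowableDiff < delta).
     */
    static List<Long> simulate(long maxTimeout, long gapMs, int failures, boolean swapped) {
        List<Long> backoffs = new ArrayList<>();
        int sequentialFails = 0;
        long lastFail = 0;
        long now = 0;
        for (int i = 0; i < failures; i++) {
            now += gapMs; // simulated clock: each failure arrives gapMs after the previous one
            long delta = now - lastFail;
            long lastBackoffLength = Math.min(maxTimeout, 1000L * (1 << sequentialFails));
            long allowableDiff = lastBackoffLength + CONSIDER_SEQUENTIAL_RANGE;
            boolean sequential = swapped ? allowableDiff < delta : allowableDiff > delta;
            if (sequential) {
                if (sequentialFails < EXP_BACKOFF_COUNTER_LIMIT) {
                    sequentialFails++;
                }
            } else {
                sequentialFails = 1;
            }
            lastFail = now;
            backoffs.add(Math.min(maxTimeout, 1000L * (1 << sequentialFails)));
        }
        return backoffs;
    }

    public static void main(String[] args) {
        System.out.println("original: " + simulate(30000, 40000, 5, false));
        System.out.println("swapped:  " + simulate(30000, 40000, 5, true));
    }
}
```

Under these assumptions the original condition yields 2000ms for every failure, while the swapped condition yields 2000, 4000, 8000, 16000 and then 30000ms, the assumed cap.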
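Problem 4: The Log4j async appender loses log data
AsyncAppender's default buffer size is 128, and once the buffer is full, log data is lost. Increase the buffer size, and adjust avro connect time out and request time out to suit.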
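The effect of the buffer size can be illustrated with a plain bounded queue. This is only a sketch of the general behavior, not log4j's AsyncAppender implementation: BoundedBufferSketch and countDropped are hypothetical names, and it assumes the consumer is completely stalled (for example, waiting out a long Avro timeout) while events keep arriving.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedBufferSketch {
    /**
     * Offers `events` messages to a buffer holding at most `capacity` items
     * while nothing drains it, and returns how many messages were rejected.
     */
    static int countDropped(int capacity, int events) {
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(capacity);
        int dropped = 0;
        for (int i = 0; i < events; i++) {
            // offer() returns false instead of blocking when the queue is full
            if (!buffer.offer("event-" + i)) {
                dropped++;
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        System.out.println("capacity 128:  dropped " + countDropped(128, 200));
        System.out.println("capacity 1024: dropped " + countDropped(1024, 200));
    }
}
```

With a burst of 200 events and a stalled consumer, a 128-slot buffer rejects 72 of them, while a 1024-slot buffer absorbs the whole burst. A larger buffer only buys time, which is why the timeouts need tuning as well.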
Event flumeEvent;
Object message = event.getMessage();
if (message instanceof GenericRecord) {
  ..
} else {
  hdrs.put(Log4jAvroHeaders.MESSAGE_ENCODING.toString(), "UTF8");
  //Format the log message using the layout configured in log4j.properties
  String msg = layout != null ? layout.format(event) : message.toString();
  //author:edwardsbean
  //Append the stack trace when the layout does not render the throwable itself.
  //The null check avoids a NullPointerException when no layout is configured.
  if (layout == null || layout.ignoresThrowable()) {
    String[] s = event.getThrowableStrRep();
    if (s != null) {
      int len = s.length;
      for (int i = 0; i < len; i++) {
        msg += s[i];
        msg += Layout.LINE_SEP;
      }
    }
  }
  flumeEvent = EventBuilder.withBody(msg, Charset.forName("UTF8"), hdrs);
}
try {
  rpcClient.append(flumeEvent);
The log receiver can now receive the full details of the log entry:
[x] Received '2013-12-26 10:38:08 user log detail
[nd-PC2600/192.168.253.126] FATAL [com.xx.test.Main] java.lang.Exception: error detail
at com.xx.test.Main.main(Main.java:35)