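【Posted】:2021-11-29 01:29:03
【Question】:
I am trying to use Cloud Run to receive millions of log messages over HTTPS (streaming) and send them to Cloud Logging.
However, I found that some data is lost: the number of messages in Cloud Logging is smaller than the number Cloud Run receives.
This is the sample code I tried: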
import gzip
import json
import logging

# unzip data
data = gzip.decompress(request.data)
# split by lines
logs = data.decode('UTF-8').split('\n')
# output the logs
log_cnt = 0
for log in logs:
    try:
        # output to jsonPayload
        print(json.dumps(json.loads(log)))
        log_cnt += 1
    except Exception as e:
        logging.error(f"message: {e}")
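If I compare log_cnt with the number of logs in Cloud Logging, log_cnt is higher. So some print() calls did not deliver their messages.
I tried using the logging API instead of print(), but the number of logs is too large to send through the logging API (it is limited to 12,000 calls per minute), so latency became very bad and requests could not be handled stably.
I suspected that changes in the number of instances might be the cause, so I tested while the number of active instances was not changing, but 3-5% of the messages were still lost.
Is there anything I can do to send all messages to Cloud Logging without any loss?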
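Regarding the 12,000 calls per minute limit mentioned above: the Cloud Logging write API accepts several entries in a single call, so one option that might be worth testing is to group parsed lines into batches and issue one call per batch instead of one call per line. The following is only a sketch under that assumption. `send_batch` is a hypothetical placeholder for whatever client call writes a list of entries, and `batch_size=500` is an arbitrary illustrative value, not a documented limit.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def parse_lines(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty, valid line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # split('\n') leaves an empty string after a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # invalid lines are skipped in this sketch


def chunked(items: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Group items into lists of at most `size` elements."""
    batch: List[Dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_in_batches(
    lines: Iterable[str],
    send_batch: Callable[[List[Dict]], None],  # hypothetical: one API call per batch
    batch_size: int = 500,  # assumed value, for illustration only
) -> int:
    """Send parsed lines in batches and return the number of calls made."""
    calls = 0
    for batch in chunked(parse_lines(lines), batch_size):
        send_batch(batch)
        calls += 1
    return calls
```

With a few thousand lines per request, this turns thousands of calls into a handful. Whether it brings latency down enough would still have to be measured.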
(Update)
A line of data looks like this (about 1 KB):
{"key1": "ABCDEFGHIJKLMN","key2": "ABCDEFGHIJKLMN","key3": "ABCDEFGHIJKLMN","key4": "ABCDEFGHIJKLMN","key5": "ABCDEFGHIJKLMN","key6": "ABCDEFGHIJKLMN","key7": "ABCDEFGHIJKLMN","key8": "ABCDEFGHIJKLMN","key9": "ABCDEFGHIJKLMN","key10": "ABCDEFGHIJKLMN","key11": "ABCDEFGHIJKLMN","key12": "ABCDEFGHIJKLMN","key13": "ABCDEFGHIJKLMN","key14": "ABCDEFGHIJKLMN","key15": "ABCDEFGHIJKLMN","key16": "ABCDEFGHIJKLMN","key17": "ABCDEFGHIJKLMN","key18": "ABCDEFGHIJKLMN","key19": "ABCDEFGHIJKLMN","key20": "ABCDEFGHIJKLMN","key21": "ABCDEFGHIJKLMN","key22": "ABCDEFGHIJKLMN","key23": "ABCDEFGHIJKLMN","key24": "ABCDEFGHIJKLMN","key26": "ABCDEFGHIJKLMN","key27": "ABCDEFGHIJKLMN","key28": "ABCDEFGHIJKLMN","key29": "ABCDEFGHIJKLMN","key30": "ABCDEFGHIJKLMN","key31": "ABCDEFGHIJKLMN","key32": "ABCDEFGHIJKLMN","key33": "ABCDEFGHIJKLMN","key34": "ABCDEFGHIJKLMN","key35": "ABCDEFGHIJKLMN"}
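【Comments】:
-
How many entries (number of lines in the GZIP content) can there be per call? Also, why do you need these logs in Cloud Logging? Could you share the bigger picture of your use case (out of curiosity)?
-
The GZIP content has somewhere between a few hundred and a few thousand lines. I want to use Cloud Monitoring to monitor the logs, and I also want to transfer them to BigQuery for investigation and debugging.
-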
Can you try adding a time.sleep(10) after your for loop? Yes, you will pay for 10 extra seconds of execution time, but this is only to validate a hypothesis.
-
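I added time.sleep(10) at the end of the code as a test, and the data loss is lower than before. Before adding the test code 2-3% was lost; now 0.7-0.8% of the data is lost.
-
Is it possible to make it wait until the print jobs have finished?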
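The time.sleep(10) result above suggests that some output may not have been handed off yet when the request finishes. One thing that could be tried, assuming (this is not confirmed anywhere in this thread) that buffered stdout is part of the cause, is to flush the stream explicitly before returning the response. A minimal sketch follows; `emit_logs` and its `stream` parameter are hypothetical names used only for illustration.

```python
import json
import sys
from typing import Iterable, Optional, TextIO


def emit_logs(lines: Iterable[str], stream: Optional[TextIO] = None) -> int:
    """Write each valid JSON line to `stream`, flush once, return the count."""
    out = sys.stdout if stream is None else stream
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines such as the one after a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # invalid lines are skipped in this sketch
        out.write(json.dumps(record) + "\n")
        count += 1
    # Push anything still sitting in Python's own buffer to the underlying stream.
    out.flush()
    return count
```

In the handler this would be called as emit_logs(data.decode('UTF-8').split('\n')) before the response is returned. flush() only empties Python's buffer, so it does not guarantee that the platform has ingested the lines; whether the remaining 0.7-0.8% loss disappears would need to be measured.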
Tags: python-3.x google-cloud-run google-cloud-logging