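I have no systems programming experience and lack a methodology for this kind of debugging. I have now hit a problem where an OVS port goes missing, and everything I found on Google about troubleshooting covers the data-flow level. I am out of ideas, so I am asking here and hope you can help with some mentoring.
Because this is a semi-production environment, I cannot stay connected and reproduce the issue at will, so I need to do more research on my own before trying anything.
I have also just posted the question on Stack Overflow: https://stackoverflow.com/questions/48246801/openvswitch-port-missing-in-large-load-long-poll-interval-observed
If you have ideas, please answer there directly, or reply in both places if you don't mind. Sorry for the inconvenience.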
Issue description
I have an OpenStack system whose HA management network (VIP) is provided through an OVS (Open vSwitch) port. Under high load (concurrent creation of volumes from Glance images), the VIP port, which is an OVS port, goes missing.
Analysis
So far, with the default log level, the only thing I have observed in the log file is the following "Unreasonably long 62741ms poll interval" warning:
2017-12-29T16:40:38.611Z|00001|timeval(revalidator70)|WARN|Unreasonably long 62741ms poll interval (0ms user, 0ms system)
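To see whether this warning recurs and which threads report it, the log can be scanned for it. The following is a minimal Python sketch, not an OVS tool: the function name `long_poll_intervals`, the `min_ms` threshold, the "main" label for lines without a thread name, and the log path mentioned in the comment are assumptions for illustration, and the regular expression is modeled only on the single log line shown above.

```python
import re
from typing import Iterable, List, Tuple

# Pattern modeled on the one warning line quoted above:
# <timestamp>|<seq>|timeval(<thread>)|WARN|Unreasonably long <N>ms poll interval ...
POLL_RE = re.compile(
    r"^(?P<ts>[^|]+)\|\d+\|timeval(?:\((?P<thread>[^)]*)\))?\|WARN\|"
    r"Unreasonably long (?P<ms>\d+)ms poll interval"
)

# The exact line from the log, used here as sample input.
SAMPLE = [
    "2017-12-29T16:40:38.611Z|00001|timeval(revalidator70)|WARN|"
    "Unreasonably long 62741ms poll interval (0ms user, 0ms system)"
]


def long_poll_intervals(
    lines: Iterable[str], min_ms: int = 1000
) -> List[Tuple[str, str, int]]:
    """Return (timestamp, thread, interval_ms) for each matching warning."""
    hits = []
    for line in lines:
        m = POLL_RE.match(line.strip())
        if m and int(m.group("ms")) >= min_ms:
            # Lines without a "(thread)" suffix are labeled "main" (assumption).
            hits.append((m.group("ts"), m.group("thread") or "main", int(m.group("ms"))))
    return hits


for ts, thread, ms in long_poll_intervals(SAMPLE):
    print(f"{ts} {thread}: {ms} ms")

# On a real host the input could come from the vswitchd log file, e.g.
# open("/var/log/openvswitch/ovs-vswitchd.log")  (assumed path, adjust as needed).
```

Comparing the timestamps of these warnings with the time the VIP port disappears may show whether the two events are related.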
Idea for now
I will turn on debug logging for the log file and try to reproduce the issue:
sudo ovs-appctl vlog/set file:dbg
Question
- What else should I do during and after reproducing the issue?
- Is this a typical issue? If so, what causes it?
I googled "Open vSwitch troubleshooting" and related keywords, but all the information I found was at the data flow/table level rather than at the ovs-vswitchd level (am I right?).
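One thing that could be run during reproduction is a small watcher that records when the port disappears, so that the moment can be lined up with the debug log. This is only a sketch under stated assumptions: the bridge name `br-mgmt` and port name `vip-port` in the usage comment are hypothetical placeholders, the helper names are made up for illustration, and the check only reflects what `ovs-vsctl list-ports` reports, which may differ from what the datapath sees.

```python
import subprocess
import time
from datetime import datetime, timezone


def list_ports(bridge):
    """Return the set of port names that ovs-vsctl reports on a bridge."""
    out = subprocess.run(
        ["ovs-vsctl", "list-ports", bridge],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.split())


def missing_ports(expected, present):
    """Return the expected port names that are not in the present set."""
    return sorted(set(expected) - set(present))


def watch(bridge, expected, interval_s=5.0, lister=list_ports, max_checks=None):
    """Poll until an expected port is missing.

    Returns (UTC timestamp, missing names), or None if max_checks polls
    all found every expected port. `lister` can be swapped for a fake in tests.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        gone = missing_ports(expected, lister(bridge))
        if gone:
            return datetime.now(timezone.utc).isoformat(), gone
        checks += 1
        time.sleep(interval_s)
    return None


# Hypothetical usage on the affected host (names are placeholders):
#   print(watch("br-mgmt", ["vip-port"]))
```

The returned timestamp is in UTC, which matches the `Z` suffix of the timestamps in the log line quoted earlier.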
Many thanks! BR//Wey