How to get a callstack from oprofile output? Answers

[Question title]: How to get a callstack from oprofile output?
[Posted]: 2013-06-21 18:18:13
[Question description]:

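I'm confused. I don't know whether oprofile can even provide stack traces in its profiling report. I've been going through the oprofile manual, and it only mentions stack traces by saying they can be logged; it gives no example of how to do so.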

Here is my test.cpp:

#include <iostream>                              
#include <unistd.h>                              
using namespace std;                             

void test(){                                     
    for (int x = 0; x < 100000; x++) cout << ".";
    sleep(1);                                    
    cout << endl;                                
}

int main(int argc, char** argv){
    for (int x = 0; x < 120; x++) test();        
    return 0;                                    
}

Here is the command I used to compile it:

g++ -g -Wall test.cpp -o test

And here is my perf.sh script (running on RHEL 6.2 in a virtual machine):

#!/bin/bash -x
sudo opcontrol --no-vmlinux                                                 
sudo opcontrol --reset                                                      
sudo opcontrol --start --separate=library,thread --image=$HOME/test
sudo opcontrol --callgraph=10                                               
sudo opcontrol --status                                                     
read -p "Press [Enter] key to stop profiling"                                                                       
sudo opcontrol --dump || exit 1                                             
sudo opreport --demangle=smart \                                            
              --merge=all \                                                 
              --symbols \                                                   
              --callgraph \                                                 
              --global-percent \                                            
              --output-file=perf.out                                        
sudo opcontrol --shutdown                                                   
sudo opcontrol --reset

Here is the report I currently get:

CPU: CPU with timer interrupt, speed 0 MHz (estimated)                            
Profiling through timer interrupt                                                 
samples  %        app name                 symbol name                            
-------------------------------------------------------------------------------   
14       43.7500  libstdc++.so.6.0.13      /usr/lib64/libstdc++.so.6.0.13         
  14       43.7500  libstdc++.so.6.0.13      /usr/lib64/libstdc++.so.6.0.13 [self]
-------------------------------------------------------------------------------   
11       34.3750  libc-2.12.so             fwrite                                 
  11       34.3750  libc-2.12.so             fwrite [self]                        
-------------------------------------------------------------------------------   
5        15.6250  libc-2.12.so             _IO_file_xsputn@@GLIBC_2.2.5           
  5        15.6250  libc-2.12.so             _IO_file_xsputn@@GLIBC_2.2.5 [self]  
-------------------------------------------------------------------------------   
2         6.2500  libc-2.12.so             __strlen_sse42                         
  2         6.2500  libc-2.12.so             __strlen_sse42 [self]                
-------------------------------------------------------------------------------

So my question is: how do I get the stack traces to show up in the profiling report?

[Question comments]:

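Good question. That bit of the documentation isn't very clear. It apologizes for not showing call counts, which don't matter in sampling anyway. What you should see is the 100% divided roughly into two parts. One is main:12 -> test:7 -> sleep -> <system routines>, the other is main:12 -> test:8 -> cout::endl -> <system IO routines>. I suspect most of the time is in sleep, and very little is in cout << ".", unless you are writing to stderr. In any case, that is what a few stack snapshots in GDB would show you.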
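To make the comment's reasoning concrete: with stack sampling, a line's share of the cost is roughly the fraction of samples whose stack contains it, whether it is the leaf or a caller. The sketch below assumes the samples have already been collected (for example, copied by hand from a few GDB `bt` outputs) and reduced to frame labels such as `main:12`; `StackSample` and `inclusive_percent` are hypothetical names, not something oprofile or GDB provides.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One stack sample: frame labels from outermost (main) to innermost.
using StackSample = std::vector<std::string>;

// For each frame label, the percentage of samples whose stack contains it.
// A label that appears twice in one sample (recursion) is counted once.
std::map<std::string, double> inclusive_percent(const std::vector<StackSample>& samples) {
    std::map<std::string, int> hits;
    for (const StackSample& s : samples) {
        std::set<std::string> seen(s.begin(), s.end());  // dedupe within a sample
        for (const std::string& frame : seen) ++hits[frame];
    }
    std::map<std::string, double> pct;
    if (samples.empty()) return pct;  // avoid dividing by zero
    for (const auto& kv : hits)
        pct[kv.first] = 100.0 * kv.second / samples.size();
    return pct;
}
```

With three hypothetical samples ending in `sleep` under `test:7` and one ending in `endl` under `test:8`, `main:12` would come out at 100%, `test:7` at 75% and `test:8` at 25%.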
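Yes, GDB is more helpful than oprofile, except that I need to use oprofile on a multithreaded application that is much larger than the test program I wrote above, so GDB won't really work for me. After talking with others, I think instrumenting the code with logging + timers might make more sense.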
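A minimal sketch of the logging + timers approach, assuming C++11 and that the code being measured can be edited. `ScopedTimer`, `TimingRegistry` and `timing_registry()` are hypothetical helpers: each timer records the elapsed wall-clock time from construction to destruction under a label, and a mutex keeps the totals consistent when several threads record at once.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical per-label totals, shared by all threads.
struct TimingStats {
    long long total_us = 0;  // accumulated elapsed time in microseconds
    long long calls = 0;     // number of completed timed scopes
};

class TimingRegistry {
public:
    void record(const std::string& label, long long us) {
        std::lock_guard<std::mutex> lock(mu_);
        TimingStats& s = stats_[label];
        s.total_us += us;
        ++s.calls;
    }
    TimingStats get(const std::string& label) {
        std::lock_guard<std::mutex> lock(mu_);
        return stats_[label];  // default-constructed if never recorded
    }
    void dump(std::FILE* out) {
        std::lock_guard<std::mutex> lock(mu_);
        for (const auto& kv : stats_)
            std::fprintf(out, "%s: %lld calls, %lld us\n",
                         kv.first.c_str(), kv.second.calls, kv.second.total_us);
    }
private:
    std::mutex mu_;
    std::map<std::string, TimingStats> stats_;
};

inline TimingRegistry& timing_registry() {
    static TimingRegistry registry;  // initialization is thread-safe in C++11
    return registry;
}

// RAII timer: measures from construction to destruction and records the result.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start_).count();
        timing_registry().record(label_, us);
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

For the test program, one could put `ScopedTimer t("test");` as the first statement of `test()` and call `timing_registry().dump(stderr);` before `main` returns. Because `steady_clock` measures elapsed wall-clock time, the `sleep(1)` inside a timed scope is included in the total.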
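Well, here's what I do: in the multithreaded case, when I interrupt the program every thread stops, so I get a bt for each thread. The ones that are doing nothing, such as waiting for input, I ignore; all the others are valuable. I'm assuming the goal is to find ways to make the code faster, not just to take measurements. Maybe that isn't your goal. You can talk to others, but this technique isn't as widely known as it should be, so you can guess what you'll hear.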
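The technique described above uses GDB. When attaching a debugger is not practical, one rough in-process substitute on Linux/glibc is `backtrace()` and `backtrace_symbols()` from `<execinfo.h>`, which capture the stack of the calling thread at the point of the call. This is a different technique from interrupting the whole process at a random moment: it only shows stacks at places you choose to instrument. `capture_stack` is a hypothetical helper name. Names of functions in the main executable appear only if the program is linked with `-rdynamic`; otherwise those frames show just the module and an address.

```cpp
#include <cassert>
#include <cstdlib>
#include <execinfo.h>  // glibc-specific: backtrace, backtrace_symbols
#include <string>
#include <vector>

// Capture the calling thread's stack as human-readable frame strings,
// innermost frame first. At most max_frames frames are returned.
std::vector<std::string> capture_stack(int max_frames = 64) {
    std::vector<std::string> frames;
    if (max_frames <= 0) return frames;
    std::vector<void*> addrs(max_frames);
    int n = backtrace(addrs.data(), max_frames);
    char** names = backtrace_symbols(addrs.data(), n);
    if (names == nullptr) return frames;  // symbol buffer allocation failed
    for (int i = 0; i < n; ++i) frames.emplace_back(names[i]);
    std::free(names);  // one free() releases the whole array
    return frames;
}
```

Calling `capture_stack()` from inside `test()` and printing the strings would show roughly `test -> main -> __libc_start_main` for that one moment.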

Tags: c++ profiling stack-trace oprofile

[Answer 1]:

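(This is a bit late, but it may help others.)

Because you are profiling in timer mode (the default behavior on some CPUs), backtraces may be disabled in your kernel, which appears to be 2.6.32 since you're using RHEL 6.2.

You can try:

using hardware counters instead of the timer interrupt
looking at the history of the kernel part of oprofile: if your kernel version does have this limitation, it may already have been fixed
updating your kernel

I ran into the same problem with the same kernel version, but since I'm on ARM, my quick fix will not work in your case (this is the patch that was applied in that case).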
