Hardware Performance Monitoring Deep Dive using Intel Performance Counter Monitor

A little while ago, I had to dig into hardware statistics to troubleshoot a performance bottleneck, and I ended up using Intel Performance Counter Monitor (PCM). Since pre-compiled binaries of those tools aren't available for download, I had to dust off my mad C++ compiler skills. To save you some trouble, you can find the binaries I compiled here, as part of the latest GEM Automation release. You’re welcome! 🙂

To use the tools, simply extract the GEM Automation archive to a local path on the machine you want to monitor, then change the current working directory to:

<extraction path>\InfrastructureTesting\IntelPerformanceCounterMonitor\x64\

Here’s an overview of each executable in the directory, along with a sample of its output. Note that you can export the data to a CSV file for easier analysis; the CSV output also seems to include more metrics than the console output.

  • pcm.exe
    • Provides CPU statistics for both sockets and cores

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses 
 L2MISS: L2 cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3MPI : number of L3 cache misses per instruction
 L2MPI : number of L2 cache misses per instruction
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
 energy: Energy in Joules


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP

   0    0     0.01   0.32   0.02    1.16      28 K     44 K    0.36    0.81    0.00    0.00     65
   1    0     0.00   0.23   0.01    1.16    3270       18 K    0.82    0.81    0.00    0.00     65
   2    0     0.00   0.20   0.01    1.16    5487       19 K    0.73    0.81    0.00    0.00     61
   3    0     0.00   0.22   0.01    1.16    4425       16 K    0.73    0.84    0.00    0.00     61
   4    0     0.01   0.51   0.01    1.16      47 K     82 K    0.42    0.69    0.00    0.00     69
   5    0     0.00   0.22   0.02    1.16      32 K     48 K    0.34    0.76    0.00    0.01     69
   6    0     0.00   0.23   0.01    1.16    5810       20 K    0.71    0.81    0.00    0.00     67
   7    0     0.00   0.26   0.01    1.16    5952       35 K    0.83    0.73    0.00    0.00     67
   8    0     0.00   0.24   0.01    1.16    9282       26 K    0.64    0.77    0.00    0.00     63
   9    0     0.00   0.20   0.01    1.16    2845       12 K    0.78    0.87    0.00    0.00     63
  10    0     0.01   0.53   0.02    1.16    8552       55 K    0.85    0.66    0.00    0.00     65
  11    0     0.01   0.82   0.01    1.16    7612       28 K    0.73    0.78    0.00    0.00     65
  12    0     0.01   0.39   0.02    1.16      13 K    112 K    0.88    0.59    0.00    0.01     62
  13    0     0.00   0.21   0.01    1.16    3111       17 K    0.82    0.83    0.00    0.00     62
  14    0     0.00   0.31   0.01    1.16      20 K     61 K    0.66    0.65    0.00    0.01     62
  15    0     0.00   0.25   0.01    1.16    2127       14 K    0.85    0.86    0.00    0.00     62
  16    0     0.00   0.22   0.01    1.16    3462       17 K    0.80    0.85    0.00    0.00     61
  17    0     0.00   0.33   0.01    1.16      32 K     65 K    0.50    0.64    0.00    0.01     61
  18    0     0.00   0.21   0.01    1.16    3476       13 K    0.74    0.88    0.00    0.00     62
  19    0     0.00   0.23   0.01    1.16    2169       11 K    0.81    0.89    0.00    0.00     63
  20    1     0.04   0.60   0.06    1.16     123 K    515 K    0.76    0.62    0.00    0.01     60
  21    1     0.00   0.21   0.01    1.16    3878       39 K    0.90    0.73    0.00    0.01     60
  22    1     0.01   0.39   0.03    1.16      41 K    259 K    0.84    0.61    0.00    0.01     58
  23    1     0.00   0.18   0.01    1.16    4880       33 K    0.85    0.75    0.00    0.01     58
  24    1     0.02   1.07   0.02    1.16      24 K    207 K    0.88    0.79    0.00    0.00     67
  25    1     0.00   0.20   0.01    1.16    4392       30 K    0.86    0.76    0.00    0.01     67
  26    1     0.01   0.46   0.02    1.16      25 K    133 K    0.81    0.58    0.00    0.01     61
  27    1     0.00   0.30   0.01    1.16      42 K    134 K    0.68    0.51    0.00    0.01     61
  28    1     0.01   0.35   0.02    1.16      13 K    106 K    0.87    0.61    0.00    0.01     63
  29    1     0.00   0.21   0.01    1.16    9944       39 K    0.75    0.73    0.00    0.01     63
  30    1     0.00   0.24   0.01    1.16    5716       59 K    0.90    0.67    0.00    0.01     61
  31    1     0.01   0.30   0.02    1.16      16 K    106 K    0.84    0.59    0.00    0.01     61
  32    1     0.00   0.28   0.01    1.16    9956       74 K    0.87    0.64    0.00    0.01     64
  33    1     0.00   0.28   0.01    1.16      38 K     78 K    0.51    0.58    0.01    0.01     64
  34    1     0.00   0.30   0.01    1.16    9211       85 K    0.89    0.62    0.00    0.01     65
  35    1     0.01   0.39   0.01    1.16      10 K     81 K    0.87    0.64    0.00    0.01     65
  36    1     0.00   0.30   0.01    1.16    7509       83 K    0.91    0.63    0.00    0.01     59
  37    1     0.00   0.20   0.01    1.16    5518       22 K    0.75    0.82    0.00    0.01     59
  38    1     0.00   0.27   0.01    1.16    9772       74 K    0.87    0.64    0.00    0.01     63
  39    1     0.00   0.29   0.01    1.16      10 K     58 K    0.82    0.68    0.00    0.01     63
---------------------------------------------------------------------------------------------------------------
 SKT    0     0.00   0.33   0.01    1.16     243 K    724 K    0.66    0.75    0.00    0.00     60
 SKT    1     0.01   0.41   0.02    1.16     417 K   2225 K    0.81    0.66    0.00    0.01     59
---------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.01   0.38   0.01    1.16     661 K   2949 K    0.78    0.69    0.00    0.01     N/A

 Instructions retired:  523 M ; Active cycles: 1382 M ; Time (TSC): 2508 Mticks ; C0 (active,non-halted) core residency: 1.19 %

 C1 core residency: 98.81 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %;
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 0.76 => corresponds to 18.93 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.01 => corresponds to 0.26 % core utilization over time interval

Intel(r) QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links):

              | 
---------------------------------------------------------------------------------------------------------------
 SKT    0     |  
 SKT    1     |  
---------------------------------------------------------------------------------------------------------------
Total QPI incoming data traffic:    0       QPI data traffic/Memory controller traffic: 0.00

Intel(r) QPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through QPI links):

              | 
---------------------------------------------------------------------------------------------------------------
 SKT    0     |  
 SKT    1     |  
---------------------------------------------------------------------------------------------------------------
Total QPI outgoing data and non-data traffic:    0  

          |  READ |  WRITE | CPU energy | DIMM energy
---------------------------------------------------------------------------------------------------------------
 SKT   0     0.09     0.06      37.51      16.17
 SKT   1     0.07     0.05      38.45      13.03
---------------------------------------------------------------------------------------------------------------
       *     0.16     0.11      75.97      29.20
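The ratios in the legend can be re-derived from the raw counters in the summary line, which is a handy sanity check when reading this output. Here is a minimal Python sketch using the numbers from the sample above. The function name `pcm_ratios` is made up for illustration, and it assumes the "Time (TSC)" figure is per logical core, so it is multiplied by the 40 logical cores to get the total nominal cycles.

```python
def pcm_ratios(instructions, active_cycles, tsc_ticks, logical_cores, c0_residency):
    """Recompute pcm.exe's headline ratios from its raw summary counters."""
    # Assumption: the TSC tick count is per logical core, so the total number
    # of nominal cycles is ticks * number of logical cores.
    nominal_cycles = tsc_ticks * logical_cores
    freq = active_cycles / nominal_cycles
    return {
        "IPC": instructions / active_cycles,    # instructions per CPU cycle
        "EXEC": instructions / nominal_cycles,  # instructions per nominal cycle
        "FREQ": freq,                           # unhalted ticks / invariant ticks
        "AFREQ": freq / c0_residency,           # same, restricted to time in C0
    }


# Values copied from the sample: 523 M instructions, 1382 M active cycles,
# 2508 Mticks, 40 logical cores, 1.19 % C0 residency.
sample = pcm_ratios(523e6, 1382e6, 2508e6, 40, 0.0119)
for name, value in sample.items():
    print(f"{name}: {value:.2f}")
```

With the sample numbers this gives 0.38, 0.01, 0.01 and 1.16, which lines up with the TOTAL row.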

  • pcm-core.exe
    • Provides detailed core level information
Time elapsed: 1004 ms
txn_rate: 1

Core | IPC | Instructions  |  Cycles  | Event0  | Event1  | Event2  | Event3 
   0   0.44         102 M      232 M     301 K     768 K      91 K     830 K
   1   1.04         137 M      131 M     140 K     336 K      12 K     918 K
   2   0.85         194 M      228 M     247 K     569 K      82 K     613 K
   3   0.25        7377 K       29 M      17 K      31 K    4364        93 K
   4   0.66          99 M      149 M     148 K     373 K      49 K     407 K
   5   0.61         169 M      275 M     163 K     770 K      94 K    1105 K
   6   0.89         186 M      209 M     258 K     399 K      55 K     635 K
   7   0.48         101 M      211 M     200 K     641 K      64 K     670 K
   8   0.50          88 M      176 M     177 K     547 K      73 K     510 K
   9   0.19        4422 K       22 M    4572        20 K    3379        83 K
  10   0.71         124 M      175 M     167 K     389 K      49 K     388 K
  11   0.24        5738 K       24 M    6407        24 K    4258        90 K
  12   0.67          58 M       87 M      73 K     184 K      23 K     249 K
  13   0.90         161 M      180 M     160 K     308 K      80 K     603 K
  14   0.71          49 M       69 M      70 K     100 K      16 K     193 K
  15   0.29          16 M       56 M      37 K      51 K      37 K     241 K
  16   0.73          46 M       63 M      40 K      80 K      25 K     300 K
  17   0.28        6441 K       23 M    6106        22 K    4619       104 K
  18   0.27        9346 K       34 M      28 K      52 K    8449       120 K
  19   0.46         130 M      285 M     358 K     914 K      95 K     874 K
  20   0.65         807 M     1240 M     502 K    4783 K     785 K    5832 K
  21   0.16        4350 K       26 M    4635        74 K    3481        84 K
  22   0.53         123 M      232 M     207 K     710 K     131 K     738 K
  23   0.17        4402 K       25 M    5703        32 K    4500        93 K
  24   0.50          87 M      175 M     188 K     617 K      37 K     524 K
  25   0.18        4483 K       24 M    5430        24 K    4040        90 K
  26   0.56         200 M      360 M     250 K    1192 K      84 K    3315 K
  27   1.45         958 M      661 M     434 K     920 K      50 K      13 M
  28   0.31          17 M       56 M      57 K     173 K      17 K     178 K
  29   1.43         888 M      622 M     457 K     622 K      38 K    2603 K
  30   0.41          29 M       72 M      68 K     228 K      25 K     233 K
  31   0.56          68 M      122 M     159 K     287 K      20 K     544 K
  32   0.39          23 M       62 M      59 K     164 K      19 K     222 K
  33   0.31        8809 K       28 M      26 K      49 K    6731       119 K
  34   0.61         156 M      255 M     146 K     923 K      70 K     740 K
  35   0.43          22 M       51 M      58 K     114 K      12 K     180 K
  36   0.74         737 M     1001 M     177 K    3782 K     730 K    3088 K
  37   0.35          29 M       86 M      30 K     157 K      13 K    2449 K
  38   0.39          16 M       42 M      16 K     112 K      17 K     133 K
  39   0.69         664 M      961 M     115 K    3848 K     722 K    2978 K
-------------------------------------------------------------------------------------------------------------------
   *   0.75        6556 M     8780 M    5584 K      25 M    3673 K      46 M
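If you want to post-process the console output rather than the CSV, the abbreviated counts ("7377 K", "102 M") need to be expanded first. Here is a small Python sketch; `parse_count` is a hypothetical helper, and it assumes the suffixes are decimal (K = 1,000, M = 1,000,000), which is consistent with the IPC values printed in the sample.

```python
# Assumption: PCM abbreviates with decimal multipliers, not powers of 1024.
_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def parse_count(text):
    """Turn a PCM console value such as '7377 K' or '4364' into an integer."""
    parts = text.split()
    value = int(parts[0])
    if len(parts) == 2:
        value *= _MULTIPLIERS[parts[1]]
    return value


# IPC is simply instructions divided by cycles; check it on the total row.
instructions = parse_count("6556 M")
cycles = parse_count("8780 M")
print(f"IPC: {instructions / cycles:.2f}")
```

The total row works out to 0.75, the same value pcm-core.exe reports.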

  • pcm-memory.exe
    • Provides socket and channel level read/write throughput information
Time elapsed: 1000 ms
Called sleep function for 1000 ms
|---------------------------------------||---------------------------------------|
|--             Socket  0             --||--             Socket  1             --|
|---------------------------------------||---------------------------------------|
|--     Memory Channel Monitoring     --||--     Memory Channel Monitoring     --|
|---------------------------------------||---------------------------------------|
|-- Mem Ch  0: Reads (MB/s):    49.91 --||-- Mem Ch  0: Reads (MB/s):     3.42 --|
|--            Writes(MB/s):    43.65 --||--            Writes(MB/s):     1.13 --|
|-- Mem Ch  1: Reads (MB/s):    13.95 --||-- Mem Ch  1: Reads (MB/s):     3.37 --|
|--            Writes(MB/s):     5.32 --||--            Writes(MB/s):     1.15 --|
|-- Mem Ch  2: Reads (MB/s):    10.08 --||-- Mem Ch  2: Reads (MB/s):    46.07 --|
|--            Writes(MB/s):     3.59 --||--            Writes(MB/s):    42.18 --|
|-- Mem Ch  3: Reads (MB/s):    13.52 --||-- Mem Ch  3: Reads (MB/s):     3.31 --|
|--            Writes(MB/s):     4.43 --||--            Writes(MB/s):     1.10 --|
|-- NODE 0 Mem Read (MB/s) :    87.47 --||-- NODE 1 Mem Read (MB/s) :    56.17 --|
|-- NODE 0 Mem Write(MB/s) :    56.98 --||-- NODE 1 Mem Write(MB/s) :    45.56 --|
|-- NODE 0 P. Write (T/s):     624374 --||-- NODE 1 P. Write (T/s):     622531 --|
|-- NODE 0 Memory (MB/s):      144.45 --||-- NODE 1 Memory (MB/s):      101.74 --|
|---------------------------------------||---------------------------------------|
        
|---------------------------------------||---------------------------------------|
        
|--                   System Read Throughput(MB/s):    143.64                  --|
        
|--                  System Write Throughput(MB/s):    102.54                  --|
        
|--                 System Memory Throughput(MB/s):    246.19                  --|
        
|---------------------------------------||---------------------------------------|
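The per-channel figures add up to the NODE totals, so they can also show how evenly traffic is spread across channels. Here is a minimal Python sketch with the read throughput from the sample above; `busiest_channel` is a name invented for this example, not something the tool provides.

```python
def busiest_channel(channel_mbps):
    """Return (channel index, share of total throughput) for the busiest channel."""
    total = sum(channel_mbps)
    if total == 0:
        return None, 0.0
    index = max(range(len(channel_mbps)), key=lambda i: channel_mbps[i])
    return index, channel_mbps[index] / total


# Read throughput (MB/s) for channels 0-3, copied from the sample output.
socket_reads = {
    0: [49.91, 13.95, 10.08, 13.52],
    1: [3.42, 3.37, 46.07, 3.31],
}

for socket, reads in socket_reads.items():
    channel, share = busiest_channel(reads)
    print(f"Socket {socket}: {sum(reads):.2f} MB/s read, "
          f"channel {channel} carries {share:.0%}")
```

In this sample a single channel carries most of the reads on each socket, but the machine was nearly idle, so that says little about behaviour under load.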
  • pcm-msr.exe
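    • Not entirely sure what this is for in practice; it appears to be a low-level utility for reading and writing model-specific registers (MSRs)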
  • pcm-numa.exe
    • Provides NUMA memory access information (local vs. remote DRAM accesses per core)
Time elapsed: 1014 ms
Core | IPC  | Instructions | Cycles  |  Local DRAM accesses | Remote DRAM Accesses 
   0   0.33         15 M       47 M        22 K              3620                
   1   0.23       4114 K       17 M      4843                1060                
   2   0.20       5205 K       25 M      6682                4486                
   3   0.23       6016 K       26 M      1369                1070                
   4   0.80         22 M       28 M      4045                1435                
   5   0.23       9756 K       42 M        11 K              6362                
   6   0.22       5305 K       24 M      4357                1152                
   7   0.56         25 M       44 M        57 K                10 K              
   8   0.24       5380 K       22 M      3655                1807                
   9   0.21       4525 K       21 M      2075                1219                
  10   0.53         20 M       38 M      6579                2557                
  11   0.22       4857 K       22 M      4607                2460                
  12   0.38         16 M       44 M        25 K              2940                
  13   1.42         70 M       49 M      5793                2280                
  14   0.24       5952 K       24 M      2233                1007                
  15   0.25       5551 K       22 M      2150                 835                
  16   0.31       8273 K       26 M        22 K              1730                
  17   0.23       3939 K       17 M      1309                 592                
  18   0.20       4401 K       21 M      3583                1833                
  19   0.27       5272 K       19 M        10 K              1558                
  20   0.55        102 M      188 M        76 K                69 K              
  21   0.20       4772 K       24 M      1801                1430                
  22   0.50         68 M      137 M        89 K                46 K              
  23   0.25       7923 K       31 M      8629                  17 K              
  24   0.35         17 M       51 M        38 K              7632                
  25   0.19       5416 K       27 M      3670                1265                
  26   0.34         16 M       48 M        24 K              9108                
  27   0.31         12 M       40 M        21 K                34 K              
  28   0.34         14 M       43 M      7770                3473                
  29   0.24       7116 K       30 M      6161                1686                
  30   0.33         13 M       41 M      9403                3111                
  31   0.32         12 M       40 M        13 K              2672                
  32   0.30         11 M       37 M        12 K              1773                
  33   0.32         10 M       31 M        77 K              2129                
  34   0.32         11 M       36 M      5342                2449                
  35   0.24       6862 K       28 M      4013                5977                
  36   0.35         12 M       36 M      7212                1994                
  37   0.23       5039 K       22 M      1721                1333                
  38   0.25       7346 K       29 M      5205                1658                
  39   0.26       7379 K       28 M      8195                4296                
-------------------------------------------------------------------------------------------------------------------
   *   0.39        606 M     1542 M       625 K               270 K              
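A useful figure to derive from this output is the share of DRAM accesses that go to the remote NUMA node. Here is a minimal Python sketch using the counts from the sample; `remote_ratio` is a hypothetical helper, and the "K" values are expanded as decimal thousands.

```python
def remote_ratio(local_accesses, remote_accesses):
    """Fraction of DRAM accesses served by the remote NUMA node."""
    total = local_accesses + remote_accesses
    return remote_accesses / total if total else 0.0


# System-wide totals from the sample: 625 K local, 270 K remote.
print(f"System: {remote_ratio(625_000, 270_000):.1%} remote")

# Core 23 from the sample: 8629 local, 17 K remote.
print(f"Core 23: {remote_ratio(8629, 17_000):.1%} remote")
```

For the sample, roughly 30% of all DRAM accesses are remote, and a few cores (23, 27 and 35) make more remote accesses than local ones.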

  • pcm-pcie.exe
    • Provides PCIe link usage information (useful to determine if you hit a PCIe bottleneck)
Skt | PCIeRdCur | PCIeNSRd  | PCIeWiLF | PCIeItoM | PCIeNSWr | PCIeNSWrF
 0       759 K         0           0        612 K        0          0  
 1         0           0           0          0          0          0  
-----------------------------------------------------------------------------------
 *        759 K         0           0        612 K        0          0  
  • pcm-power.exe
    • Provides CPU and DRAM power consumption statistics

----------------------------------------------------------------------------------------------
Time elapsed: 1000 ms
Called sleep function for 1000 ms
S0CH0; DRAMClocks: 933924607; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 159520; Rank0 Cycles per transition: 933924607
S0CH0; DRAMClocks: 933924607; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 157305; Rank1 Cycles per transition: 933924607
S0CH1; DRAMClocks: 933925096; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 153645; Rank0 Cycles per transition: 933925096
S0CH1; DRAMClocks: 933925096; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 151533; Rank1 Cycles per transition: 933925096
S0CH2; DRAMClocks: 933925354; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 149329; Rank0 Cycles per transition: 933925354
S0CH2; DRAMClocks: 933925354; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 148905; Rank1 Cycles per transition: 933925354
S0CH3; DRAMClocks: 933924943; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 147401; Rank0 Cycles per transition: 933924943
S0CH3; DRAMClocks: 933924943; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 145298; Rank1 Cycles per transition: 933924943
S0; PCUClocks: 800627536; Freq band 0/1/2 cycles: 99.84%; 99.84%; 0.00%
S0; Consumed energy units: 2457737; Consumed Joules: 37.50; Watts: 37.50; Thermal headroom below TjMax: 60
S0; Consumed DRAM energy units: 1061128; Consumed DRAM Joules: 16.19; DRAM Watts: 16.19
S1CH0; DRAMClocks: 933902607; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 164508; Rank0 Cycles per transition: 933902607
S1CH0; DRAMClocks: 933902607; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 164626; Rank1 Cycles per transition: 933902607
S1CH1; DRAMClocks: 933901094; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 166178; Rank0 Cycles per transition: 933901094
S1CH1; DRAMClocks: 933901094; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 166269; Rank1 Cycles per transition: 933901094
S1CH2; DRAMClocks: 933900756; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 166668; Rank0 Cycles per transition: 933900756
S1CH2; DRAMClocks: 933900756; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 166654; Rank1 Cycles per transition: 933900756
S1CH3; DRAMClocks: 933900898; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 166572; Rank0 Cycles per transition: 933900898
S1CH3; DRAMClocks: 933900898; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 166625; Rank1 Cycles per transition: 933900898
S1; PCUClocks: 800628916; Freq band 0/1/2 cycles: 100.00%; 100.00%; 100.00%
S1; Consumed energy units: 2521661; Consumed Joules: 38.48; Watts: 38.48; Thermal headroom below TjMax: 56
S1; Consumed DRAM energy units: 854553; Consumed DRAM Joules: 13.04; DRAM Watts: 13.04
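The "energy units", Joules and Watts on the last lines of each socket are related by simple arithmetic. Here is a minimal Python sketch, where `energy_report` is a name made up for this example. The unit size of 2^-16 J (about 15.3 µJ) is an assumption inferred from the sample numbers; the real unit is platform-specific and may differ on other CPUs.

```python
# Assumption: one energy unit is 2**-16 J, inferred from this sample only.
ENERGY_UNIT_JOULES = 2 ** -16


def energy_report(energy_units, elapsed_ms, unit_joules=ENERGY_UNIT_JOULES):
    """Convert raw energy units to (Joules, average Watts) over the interval."""
    joules = energy_units * unit_joules
    watts = joules / (elapsed_ms / 1000.0)
    return joules, watts


# Socket 0 package and DRAM counters from the sample, over 1000 ms.
for label, units in (("S0 package", 2457737), ("S0 DRAM", 1061128)):
    joules, watts = energy_report(units, 1000)
    print(f"{label}: {joules:.2f} J, {watts:.2f} W")
```

Because the sampling interval is exactly one second here, Joules and Watts are numerically identical; with a longer interval they would differ.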

 

 

 
