Hardware Performance Monitoring Deep Dive using Intel Performance Counter Monitor

A little while ago, I had to take a deep dive into hardware statistics to troubleshoot a performance bottleneck. To do so, I ended up using Intel Performance Counter Monitor. Since you cannot simply download pre-compiled binaries of those tools, I had to dust off my mad C++ compiler skills. To save you some trouble, you can find the binaries I compiled here as part of the latest GEM Automation release. You’re welcome! 🙂

To use those tools, simply extract the GEM Automation archive to a local path on the machine you want to monitor, then change the current working directory to:

<extraction path>\InfrastructureTesting\IntelPerformanceCounterMonitor\x64\

Here’s an overview of each executable in the directory along with sample output. Do note that you can export the data to a CSV file for easier analysis; it also seems to include more metrics when you output the data that way.
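
For example, something like the following should collect a sample every second and send the output to a CSV file (the exact flag syntax may differ between PCM versions, so run the executable without arguments to see its built-in help):

cd "<extraction path>\InfrastructureTesting\IntelPerformanceCounterMonitor\x64"
.\pcm.exe 1 -csv=pcm_output.csv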

  • pcm.exe
    • Provides CPU statistics for both sockets and cores

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses 
 L2MISS: L2 cache misses (including other core's L2 cache *hits*) 
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3MPI : number of L3 cache misses per instruction
 L2MPI : number of L2 cache misses per instruction
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
 energy: Energy in Joules


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | TEMP

   0    0     0.01   0.32   0.02    1.16      28 K     44 K    0.36    0.81    0.00    0.00     65
   1    0     0.00   0.23   0.01    1.16    3270       18 K    0.82    0.81    0.00    0.00     65
   2    0     0.00   0.20   0.01    1.16    5487       19 K    0.73    0.81    0.00    0.00     61
   3    0     0.00   0.22   0.01    1.16    4425       16 K    0.73    0.84    0.00    0.00     61
   4    0     0.01   0.51   0.01    1.16      47 K     82 K    0.42    0.69    0.00    0.00     69
   5    0     0.00   0.22   0.02    1.16      32 K     48 K    0.34    0.76    0.00    0.01     69
   6    0     0.00   0.23   0.01    1.16    5810       20 K    0.71    0.81    0.00    0.00     67
   7    0     0.00   0.26   0.01    1.16    5952       35 K    0.83    0.73    0.00    0.00     67
   8    0     0.00   0.24   0.01    1.16    9282       26 K    0.64    0.77    0.00    0.00     63
   9    0     0.00   0.20   0.01    1.16    2845       12 K    0.78    0.87    0.00    0.00     63
  10    0     0.01   0.53   0.02    1.16    8552       55 K    0.85    0.66    0.00    0.00     65
  11    0     0.01   0.82   0.01    1.16    7612       28 K    0.73    0.78    0.00    0.00     65
  12    0     0.01   0.39   0.02    1.16      13 K    112 K    0.88    0.59    0.00    0.01     62
  13    0     0.00   0.21   0.01    1.16    3111       17 K    0.82    0.83    0.00    0.00     62
  14    0     0.00   0.31   0.01    1.16      20 K     61 K    0.66    0.65    0.00    0.01     62
  15    0     0.00   0.25   0.01    1.16    2127       14 K    0.85    0.86    0.00    0.00     62
  16    0     0.00   0.22   0.01    1.16    3462       17 K    0.80    0.85    0.00    0.00     61
  17    0     0.00   0.33   0.01    1.16      32 K     65 K    0.50    0.64    0.00    0.01     61
  18    0     0.00   0.21   0.01    1.16    3476       13 K    0.74    0.88    0.00    0.00     62
  19    0     0.00   0.23   0.01    1.16    2169       11 K    0.81    0.89    0.00    0.00     63
  20    1     0.04   0.60   0.06    1.16     123 K    515 K    0.76    0.62    0.00    0.01     60
  21    1     0.00   0.21   0.01    1.16    3878       39 K    0.90    0.73    0.00    0.01     60
  22    1     0.01   0.39   0.03    1.16      41 K    259 K    0.84    0.61    0.00    0.01     58
  23    1     0.00   0.18   0.01    1.16    4880       33 K    0.85    0.75    0.00    0.01     58
  24    1     0.02   1.07   0.02    1.16      24 K    207 K    0.88    0.79    0.00    0.00     67
  25    1     0.00   0.20   0.01    1.16    4392       30 K    0.86    0.76    0.00    0.01     67
  26    1     0.01   0.46   0.02    1.16      25 K    133 K    0.81    0.58    0.00    0.01     61
  27    1     0.00   0.30   0.01    1.16      42 K    134 K    0.68    0.51    0.00    0.01     61
  28    1     0.01   0.35   0.02    1.16      13 K    106 K    0.87    0.61    0.00    0.01     63
  29    1     0.00   0.21   0.01    1.16    9944       39 K    0.75    0.73    0.00    0.01     63
  30    1     0.00   0.24   0.01    1.16    5716       59 K    0.90    0.67    0.00    0.01     61
  31    1     0.01   0.30   0.02    1.16      16 K    106 K    0.84    0.59    0.00    0.01     61
  32    1     0.00   0.28   0.01    1.16    9956       74 K    0.87    0.64    0.00    0.01     64
  33    1     0.00   0.28   0.01    1.16      38 K     78 K    0.51    0.58    0.01    0.01     64
  34    1     0.00   0.30   0.01    1.16    9211       85 K    0.89    0.62    0.00    0.01     65
  35    1     0.01   0.39   0.01    1.16      10 K     81 K    0.87    0.64    0.00    0.01     65
  36    1     0.00   0.30   0.01    1.16    7509       83 K    0.91    0.63    0.00    0.01     59
  37    1     0.00   0.20   0.01    1.16    5518       22 K    0.75    0.82    0.00    0.01     59
  38    1     0.00   0.27   0.01    1.16    9772       74 K    0.87    0.64    0.00    0.01     63
  39    1     0.00   0.29   0.01    1.16      10 K     58 K    0.82    0.68    0.00    0.01     63
---------------------------------------------------------------------------------------------------------------
 SKT    0     0.00   0.33   0.01    1.16     243 K    724 K    0.66    0.75    0.00    0.00     60
 SKT    1     0.01   0.41   0.02    1.16     417 K   2225 K    0.81    0.66    0.00    0.01     59
---------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.01   0.38   0.01    1.16     661 K   2949 K    0.78    0.69    0.00    0.01     N/A

 Instructions retired:  523 M ; Active cycles: 1382 M ; Time (TSC): 2508 Mticks ; C0 (active,non-halted) core residency: 1.19 %

 C1 core residency: 98.81 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %;
 C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : 0.76 => corresponds to 18.93 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.01 => corresponds to 0.26 % core utilization over time interval

Intel(r) QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links):

              | 
---------------------------------------------------------------------------------------------------------------
 SKT    0     |  
 SKT    1     |  
---------------------------------------------------------------------------------------------------------------
Total QPI incoming data traffic:    0       QPI data traffic/Memory controller traffic: 0.00

Intel(r) QPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through QPI links):

              | 
---------------------------------------------------------------------------------------------------------------
 SKT    0     |  
 SKT    1     |  
---------------------------------------------------------------------------------------------------------------
Total QPI outgoing data and non-data traffic:    0  

          |  READ |  WRITE | CPU energy | DIMM energy
---------------------------------------------------------------------------------------------------------------
 SKT   0     0.09     0.06      37.51      16.17
 SKT   1     0.07     0.05      38.45      13.03
---------------------------------------------------------------------------------------------------------------
       *     0.16     0.11      75.97      29.20
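
A couple of observations to help read this output: an AFREQ of 1.16 means that when the cores were actually executing (C0 state), they ran at 116% of nominal frequency courtesy of Turbo Boost, even though the 1.19% C0 residency shows they were idle for most of the interval. Likewise, the PHYSICAL CORE IPC of 0.76 is effectively the per-thread IPC of 0.38 summed across the two hyperthreads of each physical core.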

  • pcm-core.exe
    • Provides detailed core-level information
Time elapsed: 1004 ms
txn_rate: 1

Core | IPC | Instructions  |  Cycles  | Event0  | Event1  | Event2  | Event3 
   0   0.44         102 M      232 M     301 K     768 K      91 K     830 K
   1   1.04         137 M      131 M     140 K     336 K      12 K     918 K
   2   0.85         194 M      228 M     247 K     569 K      82 K     613 K
   3   0.25        7377 K       29 M      17 K      31 K    4364        93 K
   4   0.66          99 M      149 M     148 K     373 K      49 K     407 K
   5   0.61         169 M      275 M     163 K     770 K      94 K    1105 K
   6   0.89         186 M      209 M     258 K     399 K      55 K     635 K
   7   0.48         101 M      211 M     200 K     641 K      64 K     670 K
   8   0.50          88 M      176 M     177 K     547 K      73 K     510 K
   9   0.19        4422 K       22 M    4572        20 K    3379        83 K
  10   0.71         124 M      175 M     167 K     389 K      49 K     388 K
  11   0.24        5738 K       24 M    6407        24 K    4258        90 K
  12   0.67          58 M       87 M      73 K     184 K      23 K     249 K
  13   0.90         161 M      180 M     160 K     308 K      80 K     603 K
  14   0.71          49 M       69 M      70 K     100 K      16 K     193 K
  15   0.29          16 M       56 M      37 K      51 K      37 K     241 K
  16   0.73          46 M       63 M      40 K      80 K      25 K     300 K
  17   0.28        6441 K       23 M    6106        22 K    4619       104 K
  18   0.27        9346 K       34 M      28 K      52 K    8449       120 K
  19   0.46         130 M      285 M     358 K     914 K      95 K     874 K
  20   0.65         807 M     1240 M     502 K    4783 K     785 K    5832 K
  21   0.16        4350 K       26 M    4635        74 K    3481        84 K
  22   0.53         123 M      232 M     207 K     710 K     131 K     738 K
  23   0.17        4402 K       25 M    5703        32 K    4500        93 K
  24   0.50          87 M      175 M     188 K     617 K      37 K     524 K
  25   0.18        4483 K       24 M    5430        24 K    4040        90 K
  26   0.56         200 M      360 M     250 K    1192 K      84 K    3315 K
  27   1.45         958 M      661 M     434 K     920 K      50 K      13 M
  28   0.31          17 M       56 M      57 K     173 K      17 K     178 K
  29   1.43         888 M      622 M     457 K     622 K      38 K    2603 K
  30   0.41          29 M       72 M      68 K     228 K      25 K     233 K
  31   0.56          68 M      122 M     159 K     287 K      20 K     544 K
  32   0.39          23 M       62 M      59 K     164 K      19 K     222 K
  33   0.31        8809 K       28 M      26 K      49 K    6731       119 K
  34   0.61         156 M      255 M     146 K     923 K      70 K     740 K
  35   0.43          22 M       51 M      58 K     114 K      12 K     180 K
  36   0.74         737 M     1001 M     177 K    3782 K     730 K    3088 K
  37   0.35          29 M       86 M      30 K     157 K      13 K    2449 K
  38   0.39          16 M       42 M      16 K     112 K      17 K     133 K
  39   0.69         664 M      961 M     115 K    3848 K     722 K    2978 K
-------------------------------------------------------------------------------------------------------------------
   *   0.75        6556 M     8780 M    5584 K      25 M    3673 K      46 M

  • pcm-memory.exe
    • Provides socket- and channel-level read/write throughput information
Time elapsed: 1000 ms
Called sleep function for 1000 ms
|---------------------------------------||---------------------------------------|
|--             Socket  0             --||--             Socket  1             --|
|---------------------------------------||---------------------------------------|
|--     Memory Channel Monitoring     --||--     Memory Channel Monitoring     --|
|---------------------------------------||---------------------------------------|
|-- Mem Ch  0: Reads (MB/s):    49.91 --||-- Mem Ch  0: Reads (MB/s):     3.42 --|
|--            Writes(MB/s):    43.65 --||--            Writes(MB/s):     1.13 --|
|-- Mem Ch  1: Reads (MB/s):    13.95 --||-- Mem Ch  1: Reads (MB/s):     3.37 --|
|--            Writes(MB/s):     5.32 --||--            Writes(MB/s):     1.15 --|
|-- Mem Ch  2: Reads (MB/s):    10.08 --||-- Mem Ch  2: Reads (MB/s):    46.07 --|
|--            Writes(MB/s):     3.59 --||--            Writes(MB/s):    42.18 --|
|-- Mem Ch  3: Reads (MB/s):    13.52 --||-- Mem Ch  3: Reads (MB/s):     3.31 --|
|--            Writes(MB/s):     4.43 --||--            Writes(MB/s):     1.10 --|
|-- NODE 0 Mem Read (MB/s) :    87.47 --||-- NODE 1 Mem Read (MB/s) :    56.17 --|
|-- NODE 0 Mem Write(MB/s) :    56.98 --||-- NODE 1 Mem Write(MB/s) :    45.56 --|
|-- NODE 0 P. Write (T/s):     624374 --||-- NODE 1 P. Write (T/s):     622531 --|
|-- NODE 0 Memory (MB/s):      144.45 --||-- NODE 1 Memory (MB/s):      101.74 --|
|---------------------------------------||---------------------------------------|
|---------------------------------------||---------------------------------------|
|--                   System Read Throughput(MB/s):    143.64                  --|
|--                  System Write Throughput(MB/s):    102.54                  --|
|--                 System Memory Throughput(MB/s):    246.19                  --|
|---------------------------------------||---------------------------------------|
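
As a sanity check, the node totals are simply the sums of the per-channel figures (NODE 0 reads: 49.91 + 13.95 + 10.08 + 13.52 ≈ 87.47 MB/s), and the system throughput is the sum of both nodes (87.47 + 56.17 = 143.64 MB/s of reads).
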
  • pcm-msr.exe
    • Not entirely sure what this does… judging by the name, it lets you read/write CPU model-specific registers (MSRs)
  • pcm-numa.exe
    • Provides NUMA memory access information (local vs. remote DRAM accesses)
Time elapsed: 1014 ms
Core | IPC  | Instructions | Cycles  |  Local DRAM accesses | Remote DRAM Accesses 
   0   0.33         15 M       47 M        22 K              3620                
   1   0.23       4114 K       17 M      4843                1060                
   2   0.20       5205 K       25 M      6682                4486                
   3   0.23       6016 K       26 M      1369                1070                
   4   0.80         22 M       28 M      4045                1435                
   5   0.23       9756 K       42 M        11 K              6362                
   6   0.22       5305 K       24 M      4357                1152                
   7   0.56         25 M       44 M        57 K                10 K              
   8   0.24       5380 K       22 M      3655                1807                
   9   0.21       4525 K       21 M      2075                1219                
  10   0.53         20 M       38 M      6579                2557                
  11   0.22       4857 K       22 M      4607                2460                
  12   0.38         16 M       44 M        25 K              2940                
  13   1.42         70 M       49 M      5793                2280                
  14   0.24       5952 K       24 M      2233                1007                
  15   0.25       5551 K       22 M      2150                 835                
  16   0.31       8273 K       26 M        22 K              1730                
  17   0.23       3939 K       17 M      1309                 592                
  18   0.20       4401 K       21 M      3583                1833                
  19   0.27       5272 K       19 M        10 K              1558                
  20   0.55        102 M      188 M        76 K                69 K              
  21   0.20       4772 K       24 M      1801                1430                
  22   0.50         68 M      137 M        89 K                46 K              
  23   0.25       7923 K       31 M      8629                  17 K              
  24   0.35         17 M       51 M        38 K              7632                
  25   0.19       5416 K       27 M      3670                1265                
  26   0.34         16 M       48 M        24 K              9108                
  27   0.31         12 M       40 M        21 K                34 K              
  28   0.34         14 M       43 M      7770                3473                
  29   0.24       7116 K       30 M      6161                1686                
  30   0.33         13 M       41 M      9403                3111                
  31   0.32         12 M       40 M        13 K              2672                
  32   0.30         11 M       37 M        12 K              1773                
  33   0.32         10 M       31 M        77 K              2129                
  34   0.32         11 M       36 M      5342                2449                
  35   0.24       6862 K       28 M      4013                5977                
  36   0.35         12 M       36 M      7212                1994                
  37   0.23       5039 K       22 M      1721                1333                
  38   0.25       7346 K       29 M      5205                1658                
  39   0.26       7379 K       28 M      8195                4296                
-------------------------------------------------------------------------------------------------------------------
   *   0.39        606 M     1542 M       625 K               270 K              

  • pcm-pcie.exe
    • Provides PCIe link usage information (useful to determine if you hit a PCIe bottleneck)
Skt | PCIeRdCur | PCIeNSRd  | PCIeWiLF | PCIeItoM | PCIeNSWr | PCIeNSWrF
 0       759 K         0           0        612 K        0          0  
 1         0           0           0          0          0          0  
-----------------------------------------------------------------------------------
 *        759 K         0           0        612 K        0          0  
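
A quick note on the columns: PCIeRdCur roughly corresponds to PCIe devices reading from memory and PCIeItoM to devices writing full cache lines to memory, so those two are usually the interesting ones when estimating device traffic.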
  • pcm-power.exe
    • Provides CPU and memory power consumption statistics

----------------------------------------------------------------------------------------------
Time elapsed: 1000 ms
Called sleep function for 1000 ms
S0CH0; DRAMClocks: 933924607; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 159520; Rank0 Cycles per transition: 933924607
S0CH0; DRAMClocks: 933924607; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 157305; Rank1 Cycles per transition: 933924607
S0CH1; DRAMClocks: 933925096; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 153645; Rank0 Cycles per transition: 933925096
S0CH1; DRAMClocks: 933925096; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 151533; Rank1 Cycles per transition: 933925096
S0CH2; DRAMClocks: 933925354; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 149329; Rank0 Cycles per transition: 933925354
S0CH2; DRAMClocks: 933925354; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 148905; Rank1 Cycles per transition: 933925354
S0CH3; DRAMClocks: 933924943; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 147401; Rank0 Cycles per transition: 933924943
S0CH3; DRAMClocks: 933924943; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 145298; Rank1 Cycles per transition: 933924943
S0; PCUClocks: 800627536; Freq band 0/1/2 cycles: 99.84%; 99.84%; 0.00%
S0; Consumed energy units: 2457737; Consumed Joules: 37.50; Watts: 37.50; Thermal headroom below TjMax: 60
S0; Consumed DRAM energy units: 1061128; Consumed DRAM Joules: 16.19; DRAM Watts: 16.19
S1CH0; DRAMClocks: 933902607; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 164508; Rank0 Cycles per transition: 933902607
S1CH0; DRAMClocks: 933902607; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 164626; Rank1 Cycles per transition: 933902607
S1CH1; DRAMClocks: 933901094; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 166178; Rank0 Cycles per transition: 933901094
S1CH1; DRAMClocks: 933901094; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 166269; Rank1 Cycles per transition: 933901094
S1CH2; DRAMClocks: 933900756; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 166668; Rank0 Cycles per transition: 933900756
S1CH2; DRAMClocks: 933900756; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 166654; Rank1 Cycles per transition: 933900756
S1CH3; DRAMClocks: 933900898; Rank0 CKE Off Residency: 0.02%; Rank0 CKE Off Average Cycles: 166572; Rank0 Cycles per transition: 933900898
S1CH3; DRAMClocks: 933900898; Rank1 CKE Off Residency: 0.02%; Rank1 CKE Off Average Cycles: 166625; Rank1 Cycles per transition: 933900898
S1; PCUClocks: 800628916; Freq band 0/1/2 cycles: 100.00%; 100.00%; 100.00%
S1; Consumed energy units: 2521661; Consumed Joules: 38.48; Watts: 38.48; Thermal headroom below TjMax: 56
S1; Consumed DRAM energy units: 854553; Consumed DRAM Joules: 13.04; DRAM Watts: 13.04


AMD Naples – More than an Intel challenger for Storage Spaces Direct?

With the recent announcement of the new AMD “Naples” processor, a few things have changed in regards to options for Storage Spaces Direct. Let’s have a look at what this new CPU is about.

[Image: AMD Naples Zen CPU]

A few key points:

  • Between 16 cores/32 threads and 32 cores/64 threads per socket, or up to 64 cores/128 threads in a 2-socket server
    • Intel Skylake is “only” expected to have 28 cores per socket (** Update 2017-03-19 ** There are now rumors of 32-core Skylake E5 v5 CPUs)
  • 2TB of RAM per socket
  • 8 channel DDR4
    • Bandwidth is expected to be in the 170GB/s range
    • Intel Skylake is expected to only have 6-channel memory
  • 128 PCIe 3.0 lanes PER socket
    • In a 2-socket configuration, “only” 64 lanes per socket remain available, as the other 64 are used for socket-to-socket transport
    • In other words, for S2D this means a single socket can properly support 2 x 100GbE ports AND 24 NVMe drives without any sorcery like PCIe switches in between
    • That’s roughly 126GB/s of PCIe bandwidth, not too shabby
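
(To double-check that math: PCIe 3.0 delivers roughly 985 MB/s of usable bandwidth per lane after 128b/130b encoding, so 128 lanes × ~0.985 GB/s ≈ 126 GB/s.)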

Here’s an example of what it looks like in the flesh:

[Image: AMD Naples “Speedway” platform internals]

With that kind of horsepower, you might be able to start thinking about having a few million IOPS per S2D node if Microsoft can manage to scale up to that level. Scale that out to the supported 16 nodes in a cluster and now we have a party going! Personally, I think going with a single-socket configuration with 32 cores would be a fine sizing/configuration for S2D. It would also give you a server failure domain that’s reasonable. Furthermore, from a licensing standpoint, a 64-core Datacenter Edition server is rather pricey to say the least… You might want to go with a variant with fewer cores if your workload allows it. The IO balance provided by this new AMD CPU is much better than what Intel is providing at this point in time. That may change if Intel decides to go with PCIe 4.0, but it doesn’t look like we’ll see that any time soon.

If VDI/RDSH is your thing, perhaps taking advantage of those extra PCIe lanes for GPUs will be a nice advantage. Top that with a crazy core/thread count and you would be able to drive some pretty demanding user workloads without overcommitting your CPUs too much, while also having access to tons of memory.

I’ll definitely take a look at AMD systems when Naples comes out later this year. A little competition in the server CPU market is long overdue! Hopefully AMD will price this one right and reliability will be what we expect from a server. Since it’s a new CPU architecture, it might take a little while before software manufacturers support and optimize for this chip. With the right demand from customers, that might accelerate the process!


Identifying the .NET Runtime and Framework Versions of Assemblies and Applications

As .NET Framework versions expire, you may need to identify which applications need to be recompiled/retargeted against a newer version of the .NET Framework. One method I found was to use Mono.Cecil to gather .NET assembly information such as the required runtime as well as the .NET Framework version used. Note that the Framework version is only available for assemblies targeting the 4.0 runtime and up.
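
Under the hood, the idea looks roughly like this (a minimal sketch; the path and assembly name are made up for illustration, and the actual script does more):

Add-Type -Path .\Mono.Cecil.dll

# Load the assembly metadata without executing it
$assembly = [Mono.Cecil.AssemblyDefinition]::ReadAssembly("C:\inetpub\wwwroot\MyApp\bin\MyApp.dll")

# CLR runtime the assembly was built against, e.g. v2.0.50727 or v4.0.30319
$assembly.MainModule.RuntimeVersion

# Target framework moniker, e.g. ".NETFramework,Version=v4.5"; only present on 4.0+ assemblies
($assembly.CustomAttributes |
    Where-Object { $_.AttributeType.Name -eq "TargetFrameworkAttribute" }).ConstructorArguments[0].Value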

For example, you can then do the following to capture information for multiple assemblies in IIS sites running .NET code:

(ls \\contoso001\c$\inetpub\wwwroot\*\*\bin\*.dll).FullName | .\Get-FrameworkVersion.ps1 | Export-Csv AssembliesVersions.csv -NoTypeInformation -Append

You can find the script here:

http://gemautomation.codeplex.com/SourceControl/latest#Utilities/Get-AssemblyFrameworkVersion.ps1

The new script and the required assembly are part of the new GEM Automation 3.10.0.0 release.


GEM Automation 3.8.0.0 Released

Version 3.8.0.0 of GEM Automation has just been released. You can download the new package here.

Here are some of the new goodies it contains.

Hyper-V

  • Updated New-NanoServerCluster.ps1 for 2016 TP4 and fixed issues
  • Cleaned up sessions at the end of the script

StorageSpaces

  • libStorageSpaces.psm1
    • New Get-PhysicalDiskReliabilityCounters: Merges information from Get-PhysicalDisk and Get-StorageReliabilityCounter executed on remote computers to generate an output that can then be exported to a CSV for analysis in Excel (see the sketch below)
      • This can be used to spot trends in enclosures used for Storage Spaces to detect hot spots or disk reliability issues.
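
The core of the merge looks something like this (a simplified, local-only sketch; the real function runs against remote computers and captures more properties):

Get-PhysicalDisk | ForEach-Object {
    # Pull the reliability counters for each physical disk and merge the two objects
    $reliability = $_ | Get-StorageReliabilityCounter
    [pscustomobject]@{
        FriendlyName    = $_.FriendlyName
        SerialNumber    = $_.SerialNumber
        Temperature     = $reliability.Temperature
        TemperatureMax  = $reliability.TemperatureMax
        ReadErrorsTotal = $reliability.ReadErrorsTotal
    }
} | Export-Csv PhysicalDiskReliabilityCounters.csv -NoTypeInformation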

Here’s an example that charts the drives’ current temperature as a percentage of their maximum thermal rating, for the disks in multiple enclosures of a cluster:

[Screenshot: disk reliability temperature counters charted in Excel]

SQLServer

  • libSQLServerStatistics.psm1
    • New functions to capture wait statistics: Get-SQLWaitStatisticsSample, Store-SQLWaitStatisticsSample and Get-SQLWaitLastRunningTotal
      • As the wait statistics are cumulative in SQL Server, the collection process calculates the delta between collection runs to help you assess how much of a certain wait type you had during a sample (see the sketch after this list).
  • Monitor-SQLWait.ps1
    • Script that runs the wait statistics collection in a loop
  • Updated CreateSchemaDatabaseStatistics.sql to include tables to capture wait statistics
  • New WaitTypes.csv that contains the description and category of each wait type
    • You can import this in the WaitTypes table and join it in your queries on the WaitStatistics table to get grouping of the wait types.
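
Since the wait statistics in sys.dm_os_wait_stats are running totals since the last SQL Server restart, the delta logic boils down to something like this (a hypothetical simplification with made-up variable names; the real functions also persist the samples to the statistics database):

# $previousSample and $currentSample each hold one row per wait type
$deltas = foreach ($current in $currentSample) {
    $previous = $previousSample | Where-Object { $_.WaitType -eq $current.WaitType }
    if ($previous) {
        [pscustomobject]@{
            WaitType        = $current.WaitType
            WaitTimeMsDelta = $current.WaitTimeMs - $previous.WaitTimeMs
        }
    }
}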

Here’s what the data looks like when you visualize it in Excel:

[Screenshot: wait statistics visualized in Excel]

Windows

  • libWindowsPerformance.psm1
    • New Convert-PerfmonCSVData
    • New Convert-PerformanceBinaryData
    • New Get-CounterInstances
    • New Import-FilteredCounter

New functions that normalize the Perfmon data so that the counter category, name and instance become attributes instead of being encoded in a single column header. For example:

"(PDH-CSV 4.0) (Eastern Daylight Time)(240)","\\HOST001\logicaldisk(Total)\% free space","\\HOST001\logicaldisk(Total)\avg. disk sec/read"

becomes:

SampleTime,CounterCategory,CounterName,InstanceName,Value

This is useful when you want to perform aggregation of multiple counter instances per computer using other analysis tools such as Excel.
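
If you’re curious, the gist of the normalization is splitting each counter path into its parts, along these lines (a minimal sketch; the real functions also carry the sample timestamp and value through):

$counterPath = '\\HOST001\logicaldisk(Total)\% free space'
if ($counterPath -match '^\\\\(?<Computer>[^\\]+)\\(?<Category>[^(\\]+)(\((?<Instance>[^)]*)\))?\\(?<Name>.+)$') {
    [pscustomobject]@{
        ComputerName    = $Matches.Computer
        CounterCategory = $Matches.Category
        InstanceName    = $Matches.Instance
        CounterName     = $Matches.Name
    }
}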

If you have feedback or ideas regarding GEM Automation, let me know!

Gotcha while installing wireless driver in Windows Server 2016 TP4

If you’re hitting an error similar to the following while trying to install a wireless driver:

A service installation section in this inf is invalid.

Just make sure you install the feature Wireless LAN Service through Server Manager or by running the following PowerShell cmdlet:

Add-WindowsFeature -Name Wireless-Networking

I hope that will save a few people some hair pulling! 😉

GEM Automation Feature – Test-VirtualDiskPerformance

As a follow-up to GEM Automation Feature – Run-DiskSpd, I will now cover another storage test function, Test-VirtualDiskPerformance from InfrastructureTesting\libStorageSpacesTesting.psm1.

The main goal of this function is to facilitate testing of Storage Spaces for baselining and new hardware qualification by automatically generating Storage Spaces virtual disk variations on which multiple IO tests are performed.

Here’s an overview of the function’s capabilities:

  • Creates test virtual disks in a specific storage pool by varying the following settings (see the sketch after this list)
    • Storage Spaces
      • Number of columns used
      • Interleave size
      • Resiliency type
    • File system
      • File system type (currently NTFS and ReFS)
      • Allocation unit sizes
  • Determines the number of disks in the pool to adjust the number of column variations for the test virtual disk
  • Determines the disk types present in the pool to test them separately
  • Run-DiskSpd is then used to run a variety of IO test cases based on their definition in DiskSpd_TestCases.config
  • Results are persisted to diskspd_output.csv
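
To give you a feel for the variations, the loop is conceptually similar to this (a hypothetical sketch with made-up values; the actual function derives the variations from the pool and the test case definitions):

# Derive the maximum column count from the number of disks in the pool
$maxColumns = (Get-StoragePool -FriendlyName 'Storage Pool' | Get-PhysicalDisk).Count

foreach ($columns in 1..$maxColumns) {
    foreach ($resiliency in 'Simple', 'Mirror') {
        foreach ($interleave in 65536, 262144) {
            New-VirtualDisk -StoragePoolFriendlyName 'Storage Pool' `
                -FriendlyName "Test_${columns}col_${resiliency}_$interleave" `
                -Size 5GB `
                -NumberOfColumns $columns `
                -ResiliencySettingName $resiliency `
                -Interleave $interleave
            # ...initialize and format the volume for each file system and
            # allocation unit size, run the Run-DiskSpd test cases, clean up
        }
    }
}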

Here’s an example of how you would call Test-VirtualDiskPerformance:

Test-VirtualDiskPerformance -storagePoolFriendlyName "Storage Pool" -ioTestCaseName SQLServerVM -ioTestDuration 5 -virtualDiskSizeInGB 5

Here’s what you see while running the tests (you will also see the status of the Run-DiskSpd tests):

[Screenshot: Test-VirtualDiskPerformance running]

Once the tests have completed, you can analyze the results using the same Excel spreadsheet as for the Run-DiskSpd analysis, Storage Testing Analysis.xlsx. Again, should you have any questions or comments regarding this, feel free to let me know via the comments!


Usenix Federated Conference Week Day 5

*** This is a post from 2013 that happened to be sitting as a draft in WordPress; I decided to publish it anyway 😉 ***

I decided to spend some time in the HotStorage conference track today. The first session was a panel discussion on software-defined storage, with representatives from Nexenta, EMC, Nimble Storage, VMware and Maginatics. They tried to demystify the definition of SDS while drawing comparisons with the networking world, which has a bit more maturity in that domain. The panel seemed to suggest that object storage will be the foundation of future storage platforms while providing richer semantics to storage through REST-based APIs. The representative from Nimble seemed to suggest that converged and software-defined storage are not holding their promises regarding cost reduction and flexibility. I can only disagree with his statement, as I’m seeing the benefits already with the first iteration of Storage Spaces and I have even more hope for the second version coming with Windows Server 2012 R2.

The second session compared physical and logical backups of virtual machines. As I expected, once you apply deduplication and compression to the backup data, physical backups are just as efficient storage-wise while being faster thanks to sequential reads during the backup operation. Not the most ground-breaking session!


The third session was about improving VM performance through virtual disk introspection. Surprisingly, they seem to be able to achieve good performance improvements by better understanding the various IO calls for metadata manipulation.

The fourth session covered how a Chinese cloud provider was able to back up thousands of VMs with multiple TB of changed data on a daily basis in a timely fashion. The researchers came up with a mechanism to back up the VMs in parallel while also deduplicating the backup data. They were able to achieve speeds of close to 9GB/s across 100 hosts and 2500 VMs.

The fifth session compared different types of SSDs: PCIe-based SSDs, SAS controller-based SSDs, and PCIe SSDs with software-based controllers. While having a PCIe interconnect end to end provides the lowest latency, CPU usage is very high: 70-90% of a core to sustain the performance. The SAS-based card kept CPU usage much lower, sub-10%.

The sixth session was from the group that runs Titan, one of the largest supercomputers in the world. Some interesting facts: it costs 1 million dollars per day to run Titan, and 1 second of idle core time wastes 300 hours on a 1M-core job. The goal of the research was to better redistribute IO across the storage devices.

The seventh session was about an efficient storage mechanism for time series data. The technique basically revolves around storing the data sequentially while using the time data as an index to enable efficient time-based range queries.

I finished the day at the International Conference on Autonomic Computing. The first session there was from a researcher from the University of Toronto who presented a way to manage interactive and batch workloads in geo-distributed datacenters. It went into some detail on a subject I had read a little about from Google, where workloads are moved globally to leverage lower power costs by lowering cooling requirements. The model took into consideration temperature and electricity costs, amongst other things.

Another session was on throttling CPU power in database workloads by performing an iterative evaluation of the system’s power behavior while queries were running.

The last session of the day, and of the conference for me, was from a Harvard student who inferred wireless device patterns by integrating signal analysis in the receiver. The technique allowed for much greater reception range, on the order of 4x.