Back to the main data prep VBA page
I work within development organizations helping to tune high scale workloads for our large scale commercial customer base. These customers typically deploy in a variety of environments that include many different database technologies, often ones not directly owned by the company I work for. Since our application is heavily dependent on the performance of the underlying database, I’ve worked up many reference scripts used to parse performance data provided by the different databases in use so I can create a time series view of key performance metrics from the RDBMS.
In this case I have presented a simple script that parses a directory full of Oracle Statspack text files that contain performance data on regular snap intervals. These files contain a wealth of performance data that is often used as a single report that spans multiple hours. Being a performance guy I want a time series view of this data that uses smaller snap intervals and tracks key individual metrics over time so I can correlate any application tier issues to specific database metrics captured by Oracle Statspack.
The following script will create a single pipe (“|”) delimited output file that contains a row per report with the key sections of the Oracle Statspack output I am interested in. This script can easily be adapted to target other sections of Oracle Statspack output, but there are a few interesting parsing examples that can be altered as needed. Here’s the script in it’s entirety:
[vb]
Sub importAWRtext()
‘June 2014
‘routine to import text based AWR snaps
‘logic allows for multi-node imports on RAC environments
‘point to a directory and import all reports
‘set variables
Dim rawLOG As String, rawLOGpath As String
Dim splitArray() As String, StrOut As String
Dim awrTextArr() As Variant
Dim lineCt As Double
pathFull = Sheets("input").Range("C13")
Set objrawLOG = CreateObject("Scripting.FileSystemObject")
rawLOGpath = objrawLOG.GetFile(pathFull).ParentFolder.path & "\"
‘setup clean files
srcCSV = Dir(objrawLOG.GetFile(pathFull).ParentFolder.path & "\")
Do While srcCSV <> ""
tempFILE = rawLOGpath & "\" & srcCSV & ""
If srcCSV Like "csvAWR*" Then
Set delFILE = CreateObject("Scripting.FileSystemObject")
delFILE.Deletefile tempFILE, True
End If
srcCSV = Dir
Loop
fnameLOGclean = "csvAWRclean.out"
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objCLEAN = objFSO.CreateTextFile(rawLOGpath & "\" & fnameLOGclean, 1)
‘build array of log names up front, use those as source
Dim logNameArr(10000, 1) As Variant
rawLOG = Dir(rawLOGpath)
logCT = 0: logSO = 0
Do While rawLOG <> ""
If rawLOG Like "*.txt" Then
logCT = logCT + 1
logNameArr(logCT, 1) = rawLOG
End If
rawLOG = Dir
Loop
‘setup array for caching results
ReDim awrTextArr(logCT + 1, 1) As Variant
‘loop through statspack files
For logLoop = 1 To logCT
lineCt = 0: curStr = ""
rawLOG = logNameArr(logLoop, 1)
srcFullPath = rawLOGpath & rawLOG
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objrawLOG = objFSO.OpenTextFile(rawLOGpath & rawLOG, 1)
Do Until objrawLOG.AtEndOfStream
txnstring = objrawLOG.ReadLine
lineCt = lineCt + 1
‘DB info
If txnstring Like "DB Name*" Then curStr = "DBname": tgtLine = lineCt + 2
If tgtLine = lineCt And curStr = "DBname" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(0) & "|" & splitArray(2) & "|" _
& splitArray(6) & "|" & splitArray(7) & "|"
curStr = ""
headerLine = headerLine & "dbname|instance|release|RAC|"
End If
‘Host info
If txnstring Like "Host Name*" Then curStr = "HOSTname": tgtLine = lineCt + 2
If tgtLine = lineCt And curStr = "HOSTname" Then
StrOut = StrOut & Trim(Left(txnstring, 16)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 49), 32)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 54), 4)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 60), 4)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 68), 7)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 80), 10)) & "|"
curStr = ""
headerLine = headerLine & "host name|platform|CPUs|Cores|Sockets|Memory(GB)|"
End If
‘unique strings
If txnstring Like "Begin Snap:*" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(2) & "|" & splitArray(3) & " " & splitArray(4) & "|"
headerLine = headerLine & "begin snap ID|begin time|"
End If
If txnstring Like " End Snap:*" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(2) & "|" & splitArray(3) & " " & splitArray(4) & "|"
headerLine = headerLine & "end snap ID|end time|"
End If
If txnstring Like " Buffer Cache:*" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(2) & "|" & splitArray(7) & "|"
headerLine = headerLine & "buffer cache|block size|"
End If
If txnstring Like " Logical reads:*" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(2) & "|"
headerLine = headerLine & "lrps|"
End If
If txnstring Like " Physical reads:*" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(2) & "|"
headerLine = headerLine & "prps|"
End If
If txnstring Like "*Buffer Hit %:*" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(3) & "|"
headerLine = headerLine & "BCHR|"
End If
‘CPU utilization
If txnstring Like "*Host CPU (CPUs*" Then curStr = "CPUutil": tgtLine = lineCt + 4
If tgtLine = lineCt And curStr = "CPUutil" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(2) & "|" & splitArray(3) & _
"|" & splitArray(4) & "|" & splitArray(5) & "|"
curStr = "": tgtLine = 0
headerLine = headerLine & "%user|%system|%wio|%idle|"
End If
‘Top 5 timed events
If txnstring Like "Top 5 Timed Foreground Events*" Then curStr = "top5": tgtLine = lineCt + 6
If tgtLine = lineCt And curStr = "top5" Then
StrOut = StrOut & Trim(Left(txnstring, 30)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 43), 12)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 55), 11)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 62), 6)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 69), 6)) & "|"
tgtLine = tgtLine + 1: loopInc = loopInc + 1
headerLine = headerLine & "event" & loopInc & "|waits|time(s)|avg wait(ms)|%DB time|"
If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0
End If
‘top 3 SQL by elapsed time
If txnstring Like "*SQL ordered by Elapsed Time*" Then curStr = "SQLbyTime"
If curStr = "SQLbyTime" And loopInc = 0 And txnstring Like "———-*" Then tgtLine = lineCt + 1
If curStr = "SQLbyTime" And loopInc > 0 And Trim(txnstring) = "" Then tgtLine = lineCt + 1
If tgtLine = lineCt And curStr = "SQLbyTime" And loopInc < 4 Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(6) & "|" & splitArray(0) & "|" & splitArray(1) & "|" & _
splitArray(2) & "|" & splitArray(3) & "|"
loopInc = loopInc + 1
headerLine = headerLine & "elapsed" & loopInc & " sqlID|elapsed(sec)|execs|ms/exec|%total|"
End If
If txnstring Like "*SQL ordered by CPU Time*" Then loopInc = 0: tgtLine = 0: curStr = ""
‘top 3 SQL by Gets
If txnstring Like "*SQL ordered by Gets*" Then curStr = "SQLbyGets"
If curStr = "SQLbyGets" And loopInc = 0 And txnstring Like "———-*" Then tgtLine = lineCt + 1
If curStr = "SQLbyGets" And loopInc > 0 And Trim(txnstring) = "" Then tgtLine = lineCt + 1
If tgtLine = lineCt And curStr = "SQLbyGets" And loopInc < 4 Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(7) & "|" & splitArray(0) & "|" & splitArray(1) & "|" & _
splitArray(2) & "|" & splitArray(3) & "|"
loopInc = loopInc + 1
headerLine = headerLine & "gets" & loopInc & " sqlID|gets|execs|gets/exec|%total|"
End If
If txnstring Like "*SQL ordered by Reads*" Then loopInc = 0: tgtLine = 0: curStr = ""
‘Segments by parser
‘there are multiple similarly structured tables in the segments by section
‘they are variable in length, and some have an extra line in the table header
‘this section sets up the target line start and the custom header variables
If txnstring Like "*Segments by Logical Reads*" Then
curStr = "SegByLog": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by log"
header2 = "|name|object|type|logical reads|%total|"
End If
If txnstring Like "*Segments by Physical Reads*" Then
curStr = "SegByPhys": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by phys"
header2 = "|name|object|type|physical reads|%total|"
End If
If txnstring Like "*Segments by Physical Writes*" Then
curStr = "SegByPhysW": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by writes"
header2 = "|name|object|type|physical writes|%total|"
End If
If txnstring Like "*Segments by Table Scans*" Then
curStr = "SegByScans": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by scans"
header2 = "|name|object|type|table scans|%total|"
End If
If txnstring Like "*Segments by Row Lock Waits *" Then
curStr = "SegByLocks": tgtLine = lineCt + 8: testEmpty = lineCt + 2
header1 = "segs by rlwait"
header2 = "|name|object|type|row lock waits|%total|"
End If
If curStr Like "SegBy*" And lineCt = testEmpty And txnstring Like " No data exists for this section of the report.*" Then
For t = 1 To 5
StrOut = StrOut & "||||||"
headerLine = headerLine & header1 & loopInc & header2
Next t
curStr = "": loopInc = 0: tgtLine = 0
End If
‘execute "segments by" parsing logic
‘since the table is variable length, look for the end line and skip further processing
If tgtLine = lineCt And curStr Like "SegBy*" Then
If txnstring Like " ——————————*" Then
For t = loopInc + 1 To 5
StrOut = StrOut & "||||||"
headerLine = headerLine & header1 & loopInc & header2
Next t
curStr = "": loopInc = 0: tgtLine = 0
Else
StrOut = StrOut & Trim(Left(txnstring, 11)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 21), 10)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 42), 20)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 59), 5)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 72), 12)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 80), 7)) & "|"
tgtLine = tgtLine + 1: loopInc = loopInc + 1
headerLine = headerLine & header1 & loopInc & header2
If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0
End If
End If
Loop
‘record output string
‘caching to an array so that I can build out a custom header
If logLoop = 1 Then awrTextArr(1, 1) = headerLine
awrTextArr(logLoop + 1, 1) = StrOut
StrOut = "": headerLine = ""
Next logLoop
‘loop through array, write output to flie
For i = 1 To logCT + 1
objCLEAN.WriteLine awrTextArr(i, 1)
Next i
‘clean up
objrawLOG.Close
End Sub
[/vb]
I fully admit I cheated and using the worksheet TRIM function in a few cases to collapse consecutive spaces for easier splitting on variable length fields. There are likely a few ways to solve this by walking the string and counting sections, but TRIM was easier. I may revisit some of my brute force approaches to relying on fixed field length parsing.
So far I have used this on 11.2.0.3.0 output. We’ll see if it holds up as other samples from different Oracle versions come in. I’d been putting off writing this script for a few years, focusing my initial efforts on the Oracle AWR output, but I finally broke down and wrote this script after yet another RAC cluster set of reports came in (24 x 1 hour samples across a 4 node cluster for a total of 96 reports…).
Parsing data from a single line in Oracle Statspack
The simplest use case is to pull data from multiple columns in a single line. In this case, I used the worksheet TRIM function to eliminate extra spaces in the variable length lines, followed by the split function using a space as a delimiter. Then I pulled the specific “column” of interest:
[vb]
If txnstring Like "*Buffer Hit %:*" Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(3) & "|"
headerLine = headerLine & "BCHR|"
End If
[/vb]
Parsing data from a fixed row count Oracle Statspack table
There are tables within Oracle Statspack that have a fixed number of entries. An example of this is the “top 5 timed events” that will always have 5 entries in a production system. I have not run across a case where there are not five entries, so we can simplify the parsing of this table by simply keeping a count of the lines that have been processed.
[vb]
‘Top 5 timed events
If txnstring Like "Top 5 Timed Foreground Events*" Then curStr = "top5": tgtLine = lineCt + 6
If tgtLine = lineCt And curStr = "top5" Then
StrOut = StrOut & Trim(Left(txnstring, 30)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 43), 12)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 55), 11)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 62), 6)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 69), 6)) & "|"
tgtLine = tgtLine + 1: loopInc = loopInc + 1
headerLine = headerLine & "event" & loopInc & "|waits|time(s)|avg wait(ms)|%DB time|"
If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0
End If
[/vb]
You can see that I used a fixed format parsing method for counting character positions based on the expected table layout. We’ll see if they modify the column assignments by release 🙂
Parsing data from a variable length Oracle Statspack table
This is a section of the script where I pulled the top three SQL queries ordered by buffer gets. There are two if statements that skip lines that should be ignored. The parsing will run until the next table is encountered (SQL by Reads):
[vb]
‘top 3 SQL by Gets
If txnstring Like "*SQL ordered by Gets*" Then curStr = "SQLbyGets"
If curStr = "SQLbyGets" And loopInc = 0 And txnstring Like "———-*" Then tgtLine = lineCt + 1
If curStr = "SQLbyGets" And loopInc > 0 And Trim(txnstring) = "" Then tgtLine = lineCt + 1
If tgtLine = lineCt And curStr = "SQLbyGets" And loopInc < 4 Then
tempStr = Application.WorksheetFunction.Trim(txnstring)
splitArray() = Split(tempStr, " ")
StrOut = StrOut & splitArray(7) & "|" & splitArray(0) & "|" & splitArray(1) & "|" & _
splitArray(2) & "|" & splitArray(3) & "|"
loopInc = loopInc + 1
headerLine = headerLine & "gets" & loopInc & " sqlID|gets|execs|gets/exec|%total|"
End If
If txnstring Like "*SQL ordered by Reads*" Then loopInc = 0: tgtLine = 0: curStr = ""
[/vb]
Parsing a variable length Oracle Statspack tables with the same base format
I have a section in the script that uses the same parsing method for multiple tables that have the same fixed format dimensions. These “Segment by” tables have the same column counts but different headers. The complicating factor is in some cases, these tables contain no data. I have to trap that condition and maintain the output string dimensions:
[vb]
‘Segments by parser
‘there are multiple similarly structured tables in the segments by section
‘they are variable in length, and some have an extra line in the table header
‘this section sets up the target line start and the custom header variables
If txnstring Like "*Segments by Logical Reads*" Then
curStr = "SegByLog": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by log"
header2 = "|name|object|type|logical reads|%total|"
End If
If txnstring Like "*Segments by Physical Reads*" Then
curStr = "SegByPhys": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by phys"
header2 = "|name|object|type|physical reads|%total|"
End If
If txnstring Like "*Segments by Physical Writes*" Then
curStr = "SegByPhysW": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by writes"
header2 = "|name|object|type|physical writes|%total|"
End If
If txnstring Like "*Segments by Table Scans*" Then
curStr = "SegByScans": tgtLine = lineCt + 7: testEmpty = lineCt + 2
header1 = "segs by scans"
header2 = "|name|object|type|table scans|%total|"
End If
If txnstring Like "*Segments by Row Lock Waits *" Then
curStr = "SegByLocks": tgtLine = lineCt + 8: testEmpty = lineCt + 2
header1 = "segs by rlwait"
header2 = "|name|object|type|row lock waits|%total|"
End If
If curStr Like "SegBy*" And lineCt = testEmpty And txnstring Like " No data exists for this section of the report.*" Then
For t = 1 To 5
StrOut = StrOut & "||||||"
headerLine = headerLine & header1 & loopInc & header2
Next t
curStr = "": loopInc = 0: tgtLine = 0
End If
[/vb]
The prior section sets the type of table that is being parsed with the proper identifiers and header strings. It also contains the logic to look for the tables that contain no data. Once the type of table is established, the tables are parsed and the content is appended to the output string:
[vb]
‘execute "segments by" parsing logic
‘since the table is variable length, look for the end line and skip further processing
If tgtLine = lineCt And curStr Like "SegBy*" Then
If txnstring Like " ——————————*" Then
For t = loopInc + 1 To 5
StrOut = StrOut & "||||||"
headerLine = headerLine & header1 & loopInc & header2
Next t
curStr = "": loopInc = 0: tgtLine = 0
Else
StrOut = StrOut & Trim(Left(txnstring, 11)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 21), 10)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 42), 20)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 59), 5)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 72), 12)) & "|"
StrOut = StrOut & Trim(Right(Left(txnstring, 80), 7)) & "|"
tgtLine = tgtLine + 1: loopInc = loopInc + 1
headerLine = headerLine & header1 & loopInc & header2
If loopInc = 5 Then curStr = "": loopInc = 0: tgtLine = 0
End If
End If
[/vb]