Monitoring Robocopy logfiles with Operations Manager 2007 (vbscript monitor)

One of the reasons my company decided to implement Operations Manager is that going through a pile of interfacing and backup log files every day had become too time-consuming and demotivating for the people involved. Armed with Operations Manager and a basic VBScript, I have set up a couple of monitors that now automate the entire process.

These are the components that I set up to get it to work:

  • I created a VBScript that parses one or more robocopy log files. The only real limitation of the script in its current form is that it only goes through the first set of robocopy output in the log file. So if you tend to append the output of multiple robocopy scripts, or of multiple runs of the same script, to the same log file, you’ll have to change either the script or your robocopy scripts.
  • Define what you want to monitor: the log file name and its location. Are you satisfied with the way the script determines whether the log file is healthy or not (see later)? Determine the right time to check the log file (don’t check it at a time when you know the robocopy script is still writing to it), and so on.
  • I created a monitor that runs the script and uses the data that the VBScript passes back to OpsMgr to determine whether a log file (and the corresponding robocopy process) can be considered healthy or not.
  • I set up alerting, including some data about the log file(s) in the alert description field.

The script

You can download the latest copy of the script from here :
checkrobocopylog.vbs (8.9 KiB)

These are the most important pieces

  1. The script checks whether you have provided at least one parameter. If not, an event is logged in the event log and the script quits.

    This is what the event looks like:

    [Screenshot: 021908_2100_MonitoringR1]

  2. Every parameter stands for a log file. You can specify any number of log files (well, "any" may be too broad … I’m sure OpsMgr has some limitations, but I have run one script against up to 10 log files without problems). Ideally, use separate monitors for log files that are not related to each other. It’s a bit of work to set up, but once it is done, you’ll never need to touch your monitor config again. Note: you cannot use wildcards to specify the log files. Either modify the script to accept wildcards, or pass each log file individually as a parameter to the script.
  3. Each log file is opened, the header of the robocopy log file is skipped, and the file is read until the script finds the footer (the block that lists the number of files, dirs and bytes that have been copied, skipped, failed, and so on). This information is parsed from the log file and stored in unique variables. Because the script can process multiple log files, a sequence number is appended to each variable name, so the variables that relate to the first log file end in 1. These are the most important variables:
    1. For each log file : 23 variables
      1. Logfilefoundx (where x = sequence number of the log file) (true or false)
      2. Finishedx        (true or false)
      3. NrOfDaysAgox        (numeric) (set to -1 if file was not found)
      4. TotalNrOfBytesx    (string)
      5. TotalNrOfDirsx        (numeric)
      6. TotalNrOfFilesx        (numeric)
      7. NrOfCopiedBytesx    (string)
      8. NrOfCopiedDirsx    (numeric)
      9. NrOfCopiedFilesx    (numeric)
      10. NrOfSkippedBytesx    (string)
      11. NrOfSkippedDirsx    (numeric)
      12. NrOfSkippedFilesx    (numeric)
      13. NrOfMismatchBytesx    (string)
      14. NrOfMismatchDirsx    (numeric)
      15. NrOfMismatchFilesx    (numeric)
      16. NrOfFailedBytesx    (string)
      17. NrOfFailedFilesx    (numeric)
      18. NrOfFailedDirsx    (numeric)
      19. NrOfExtraBytesx    (string)
      20. NrOfExtraDirsx        (numeric)
      21. NrOfExtraFilesx        (numeric)
      22. LogFileNamex        (string)
      23. LastRunTimex        (string)

            

    2. General data (for all of the log files)
      1. TotalFailedDirs        (numeric)
      2. TotalFailedFiles    (numeric)
      3. AllLogFilesFound    (true or false)
      4. AllFinished        (true or false)
      5. FailedLogs        (string)
      6. Information    (string containing some detailed information about all of the log files; useful in alert description fields)

         

  4. A logfile is considered to be a "failed" logfile if
    1. It contains failed files
    2. It contains failed dirs
    3. It is older than 3 days (hardcoded in the script – change this value to whatever you want)
    4. It cannot be found
    5. It does not contain the footer (so it has not completed yet)

    If you want different behavior, you’ll have to change the logic in the script.

  5. After processing all log files, the script sends the property bag back to OpsMgr. If you monitor only one log file, 27 parameters are passed back. You can use any of these 27 parameters in the expression or in the alert description, which gives you maximum flexibility.
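The parsing and failure rules in steps 3 and 4 can be sketched as follows. This is not the author's VBScript, only a minimal Python illustration. It assumes the usual robocopy summary layout (a "Dirs :" and a "Files :" row with six integer columns), reads only the failed counts, and takes the property names from the lists above and from the alert examples later in the article. The function names and the sample log text are hypothetical.

```python
import re

# Assumed column order of robocopy's summary table.
COLUMNS = ["Total", "Copied", "Skipped", "Mismatch", "Failed", "Extra"]
# Matches the "Dirs :" and "Files :" summary rows, which hold six integers.
ROW = re.compile(r"^\s*(Dirs|Files)\s*:" + r"\s+(\d+)" * 6 + r"\s*$")
MAX_AGE_DAYS = 3  # threshold named in the article


def parse_footer(log_text):
    """Return {'Dirs': {...}, 'Files': {...}}, or None when the footer is missing."""
    rows = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match and match.group(1) not in rows:  # first summary only
            counts = (int(value) for value in match.groups()[1:])
            rows[match.group(1)] = dict(zip(COLUMNS, counts))
    return rows if "Dirs" in rows and "Files" in rows else None


def check_log(seq, path, log_text, days_old):
    """Return (properties, failed); log_text is None when the file was not found."""
    found = log_text is not None
    footer = parse_footer(log_text) if found else None
    failed_files = footer["Files"]["Failed"] if footer else 0
    failed_dirs = footer["Dirs"]["Failed"] if footer else 0
    props = {
        f"LogFileName{seq}": path,
        f"LogFileFound{seq}": found,
        f"Finished{seq}": footer is not None,
        f"NrOfDaysAgo{seq}": days_old if found else -1,
        f"NrOfFailedFiles{seq}": failed_files,
        f"NrOfFailedDirs{seq}": failed_dirs,
    }
    failed = (
        not found                      # the log file cannot be found
        or footer is None              # no footer, so the run has not finished
        or failed_files > 0            # it contains failed files
        or failed_dirs > 0             # it contains failed dirs
        or days_old > MAX_AGE_DAYS     # it is older than 3 days
    )
    return props, failed


# Hypothetical log excerpt, for illustration only.
SAMPLE = """\
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         4         1         3         0         0         0
   Files :        12        10         1         0         1         0
   Bytes :   1.20 m    1.10 m    50.0 k         0    50.0 k         0
"""

props, failed = check_log(1, r"c:\robocopy.log", SAMPLE, days_old=0)
print(props["NrOfFailedFiles1"], failed)  # prints: 1 True
```

The byte counts are left out because robocopy may print them with a unit suffix, which is presumably why the script stores them as strings.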

    

Operations Manager configuration : set up the monitor

Log file : c:\robocopy.log

Open Authoring, go to Monitors, and set the scope to "Windows Server 2003 computer" (or any other group that contains computer objects).

Open "Entity Health" – "Availability", right-click and choose "Create a monitor". Select "Unit Monitor".

[Screenshot: 021908_2100_MonitoringR2]

Select "Scripting" – "Generic" – "Timed Script Two State Monitor", and select a custom management pack.

[Screenshot: 021908_2100_MonitoringR3]

Specify a good name for your monitor, verify that the target is set to "Windows Server 2003 computer" (or any other target containing computer objects), and make sure the monitor is disabled.

[Screenshot: 021908_2100_MonitoringR4]

Configure the schedule. I’ll set the script to run every 15 minutes, for testing purposes.

[Screenshot: 021908_2100_MonitoringR5]

Define the script file name (don’t forget the .vbs extension) and set a timeout.

Paste the entire script (see above) into the Script: field.

[Screenshot: 021908_2100_MonitoringR6]

Click the "parameters" field and fill in the full path to the log file(s). Put each path between double quotes, and separate multiple log files with a space.

[Screenshot: 021908_2100_MonitoringR7]
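To see why each path goes between double quotes, here is a small Python sketch of how a space-separated parameter string breaks into individual paths. It illustrates the quoting rule only, not how OpsMgr or the Windows Script Host actually tokenizes arguments. The function name and the second path are hypothetical.

```python
import re


def split_parameters(parameters):
    """Split a parameter string into paths; double-quoted paths may contain spaces."""
    pairs = re.findall(r'"([^"]*)"|(\S+)', parameters)
    return [quoted or bare for quoted, bare in pairs]


paths = split_parameters(r'"c:\robocopy.log" "d:\backup logs\nightly.log"')
print(len(paths))  # prints: 2
```

Without the quotes, the second path would be split at the space into two arguments, and the script would look for two log files that do not exist.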

    

Set the unhealthy expression. I’ll use a more or less generic trap: if the variable "FailedLogs" contains a dot (.), then it contains a reference to at least one log file, so the monitor should go into the unhealthy state.

This is how you reference a variable in the expression: Property[@Name='FailedLogs']

My unhealthy expression looks like this:

[Screenshot: 021908_2100_MonitoringR8]

The healthy expression looks like this:

[Screenshot: 021908_2100_MonitoringR9]
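The "contains a dot" test on FailedLogs boils down to a one-line predicate. The Python sketch below shows only that logic. The real expressions are configured in the OpsMgr wizard, and treating the healthy state as the plain negation is an assumption here. The function name is hypothetical.

```python
def is_unhealthy(property_bag):
    """True when FailedLogs names at least one log file (detected by a dot)."""
    return "." in property_bag.get("FailedLogs", "")


print(is_unhealthy({"FailedLogs": r"c:\robocopy.log"}))  # prints: True
print(is_unhealthy({"FailedLogs": ""}))                  # prints: False
```

One consequence of this trap: a failed log file whose full path contains no dot at all (no extension, no dotted folder name) would go unnoticed.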

Choose the health state:

[Screenshot: 021908_2100_MonitoringR10]

Set the alert settings:

[Screenshot: 021908_2100_MonitoringR11]

If you are only monitoring one log file with this monitor, you can use some of the individual log file variables:

Logfile name : $Data/Context/Property[@Name='LogFileName1']$
Log file found : $Data/Context/Property[@Name='LogFileFound1']$
Log file finished : $Data/Context/Property[@Name='Finished1']$
Age of logfile (in days) : $Data/Context/Property[@Name='NrOfDaysAgo1']$
Nr of failed file copy actions logged : $Data/Context/Property[@Name='NrOfFailedFiles1']$
Nr of failed dir copy actions logged : $Data/Context/Property[@Name='NrOfFailedDirs1']$

If you have more than one logfile, you can use $Data/Context/Property[@Name='Information']$

The number of variables that can be used in the alert description field is limited to 10 (OpsMgr limitation), so if you are monitoring multiple log files, I’d recommend using only some of the general variables and not the individual log file variables.
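Because of the 10-variable limit, it can help to count the placeholders in a description before pasting it into the alert settings. The Python sketch below is a hypothetical helper, not part of OpsMgr or of the author's script. It assumes the property names are wrapped in plain single quotes.

```python
import re

MAX_VARIABLES = 10  # limit mentioned in the article
PLACEHOLDER = re.compile(r"\$Data/Context/Property\[@Name='([^']+)'\]\$")


def referenced_properties(description):
    """List the property names referenced in an alert description."""
    return PLACEHOLDER.findall(description)


description = (
    "Logfile name : $Data/Context/Property[@Name='LogFileName1']$\n"
    "Age of logfile (in days) : $Data/Context/Property[@Name='NrOfDaysAgo1']$\n"
)
names = referenced_properties(description)
print(names, len(names) <= MAX_VARIABLES)  # prints: ['LogFileName1', 'NrOfDaysAgo1'] True
```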

Save the monitor

Create an override and set the monitor to run on the server that hosts the log file.

[Screenshot: 021908_2100_MonitoringR12]

[Screenshot: 021908_2100_MonitoringR13]

[Screenshot: 021908_2100_MonitoringR14]

Save the override.

Wait until the management pack gets distributed and the script kicks in.

[Screenshot: 021908_2100_MonitoringR15]

    

Have a look at the event log. You should see two Health Service events (under Operations Manager) with Event ID 101, indicating the start and completion of the script.

[Screenshot: 021908_2100_MonitoringR16]

[Screenshot: 021908_2100_MonitoringR17]

(Since the file in my example is 5 days old, the field "Log files with errors" lists d:\robocopy.log. As a result, the health state of my machine changes to warning.)

If you open the Health Explorer for the computer, you should see the monitor listed and enabled. If you go to the state change events view, you can see all of the parameters that were passed back as part of the property bag. The NrOfDaysAgo1 field indicates 5, which triggered the warning in my example.

[Screenshot: 021908_2100_MonitoringR18]

If the monitor indicates that there was a problem with the log file, you’ll get the following message in OpsMgr:

[Screenshot: 021908_2100_MonitoringR19]

If you click "View additional knowledge" and open the Alert Context tab, you’ll see all of the variables as well:

[Screenshot: 021908_2100_MonitoringR20]

    

If the problem gets solved, the state should return to healthy automatically (depending on how you’ve set up the alerting section of this monitor).

    

    

    

  

© 2008 – 2017, Peter Van Eeckhoutte (corelanc0d3r). All rights reserved.
