How healthy is your z/OS on zD&T? (How to monitor z/OS Health)
A good systems programmer should know how to monitor the health of his or her z/OS system. However, if you're using IBM Z Development and Test Environment (zD&T), you're more likely a developer with only basic system programming skills. Fortunately, the z/OS image shipped with zD&T includes IBM Health Checker for z/OS, which should be configured to run out of the box with a large set of general-purpose checks.
A Primer on z/OS Health Checker
IBM Health Checker for z/OS is a component of z/OS that can identify potential problems before they adversely affect your system. It is not a monitoring or diagnostic tool. Rather, it is a validator that checks your system settings for deviations from recommended values and best practices. The work is done by a set of programs, called checks, that the Health Checker started task (HZSPROC) runs at regular intervals. The results of each run are kept in storage. Checks that need to preserve data across IPLs save it in a persistent sequential data set, typically ADCD.&SYSNAME..HZSPDATA (as defined in the HZSPROC procedure on zD&T).
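The double period in ADCD.&SYSNAME..HZSPDATA is not a typo. In z/OS, a period directly after a system symbol marks the end of the symbol and is consumed during substitution, so a second period is needed to keep a dot in the resolved name.

Here is a minimal Python sketch of that substitution rule. It assumes the classic 1-8 character symbol names. The resolve helper is hypothetical and is not an IBM API. The example uses S0W1, the system name that appears in the report further down.

```python
import re

# A system symbol: "&", a 1-8 character name, and an optional terminating period.
# The terminating period belongs to the symbol and disappears on substitution.
SYMBOL = re.compile(r"&([A-Z@#$][A-Z0-9@#$]{0,7})\.?")


def resolve(text: str, symbols: dict[str, str]) -> str:
    """Substitute known system symbols in text; unknown symbols are left as-is."""
    def replace(match: re.Match) -> str:
        return symbols.get(match.group(1), match.group(0))
    return SYMBOL.sub(replace, text)


if __name__ == "__main__":
    # "&SYSNAME." is replaced by S0W1; the second period stays as the qualifier separator.
    print(resolve("ADCD.&SYSNAME..HZSPDATA", {"SYSNAME": "S0W1"}))
```

Running it prints ADCD.S0W1.HZSPDATA. With only one period, the resolved name would run the qualifiers together as ADCD.S0W1HZSPDATA.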
How to get the health check output
If you've seen the following messages in your zD&T log, you might be wondering what they mean:
OPRMSG: HZS0001I CHECK(IBMCSV,CSV_APF_EXISTS):
OPRMSG: CSVH0957E Problem(s) were found with data sets in the APF list.
These messages mean that a health check has run and found a problem. IBMCSV is the check owner (contents supervision), and CSV_APF_EXISTS is the name of the check. This check verifies that every data set in the APF-authorized list actually exists. However, the log entries only tell you that the check found a problem. They do not tell you which data sets were not found.
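If you scan the zD&T log for these messages, the check owner and check name are all you need to build the filter for the HZSPRINT job described next.

Below is a small Python sketch. It assumes the log is available as plain text and that the Health Checker messages you care about have the form HZSnnnnx CHECK(owner,name), as in the sample above. The name checks_to_print is a hypothetical helper.

```python
import re

# Health Checker console message such as "HZS0001I CHECK(IBMCSV,CSV_APF_EXISTS):".
# Accepting any HZSnnnnx message id is an assumption; only HZS0001I appears above.
HZS_MSG = re.compile(r"\b(HZS\d{4}[A-Z])\s+CHECK\(([^,()]+),([^()]+)\)")


def checks_to_print(log_text: str) -> list[str]:
    """Return one CHECK(owner,name) filter per distinct check named in the log."""
    filters: list[str] = []
    for line in log_text.splitlines():
        match = HZS_MSG.search(line)
        if match:
            owner, name = match.group(2).strip(), match.group(3).strip()
            text = f"CHECK({owner},{name})"
            if text not in filters:  # keep first-seen order, drop repeats
                filters.append(text)
    return filters


if __name__ == "__main__":
    log = (
        "OPRMSG: HZS0001I CHECK(IBMCSV,CSV_APF_EXISTS):\n"
        "OPRMSG: CSVH0957E Problem(s) were found with data sets in the APF list.\n"
    )
    print(checks_to_print(log))
```

For the two log lines above, it returns a single filter, CHECK(IBMCSV,CSV_APF_EXISTS), which is exactly what goes into the SYSIN of the print job.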
To get all the details, run a batch job that prints the check output Health Checker keeps in storage. The sample JCL HZSPRINT, located in SYS1.SAMPLIB, can be copied and tailored to your liking. In a nutshell, the job retrieves the check output and writes it in readable form wherever its SYSOUT DD points: a data set, a z/OS UNIX file, or JES SYSOUT. For the case above, I tailored it to print only exceptions from the CSV_APF_EXISTS check and to write the output to SYSOUT as follows:
//HZSPRINT EXEC PGM=HZSPRNT,TIME=1440,REGION=0M,PARMDD=SYSIN
//SYSIN    DD *
CHECK(IBMCSV,CSV_APF_EXISTS)
,EXCEPTIONS
//SYSOUT   DD SYSOUT=A,DCB=(LRECL=256)
The job writes the details I need to SYSOUT, including which data sets are missing.
Opening the SYSOUT from the JES spool, I see the following:
* * Start: CHECK(IBMCSV,CSV_APF_EXISTS) * *

CHECK(IBMCSV,CSV_APF_EXISTS)
SYSPLEX:    ADCDPL    SYSTEM: S0W1
START TIME: 04/14/2021 07:18:37.465875
CHECK DATE: 20071120  CHECK SEVERITY: LOW
CHECK PARM: MIGRATEDOK(SYSTEM)

CSVH0955I A problem was found with each APF list entry displayed.

VOLUME DSNAME                          ERROR
A4CFG1 NETVIEW.V621USER.VTAMLIB        DS not found

Low Severity Exception *

CSVH0957E Problem(s) were found with data sets in the APF list.
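If you save this SYSOUT as a text file (for example, by downloading it from the spool), you can pull the problem entries out of the VOLUME/DSNAME/ERROR table with a script. This Python sketch assumes the column layout of the sample above: a volume serial, a data set name, and then the error text, all on one line. The names apf_problems and ApfProblem are made up for illustration.

```python
import re
from typing import NamedTuple


class ApfProblem(NamedTuple):
    volume: str
    dsname: str
    error: str


# One row of the VOLUME/DSNAME/ERROR table: a volume serial of up to six
# non-blank characters, an uppercase data set name, then the error text.
# This layout is assumed from the sample report, not from IBM documentation.
ROW = re.compile(r"^\s*(\S{1,6})\s+([A-Z0-9@#$-]+(?:\.[A-Z0-9@#$-]+)*)\s+(\S.*?)\s*$")


def apf_problems(report: str) -> list[ApfProblem]:
    """Return the entries listed under the VOLUME DSNAME ERROR header."""
    problems: list[ApfProblem] = []
    in_table = False
    for line in report.splitlines():
        if line.split()[:3] == ["VOLUME", "DSNAME", "ERROR"]:
            in_table = True  # table rows start on the next line
            continue
        if not in_table:
            continue
        if not line.strip():
            if problems:
                break  # a blank line after the rows ends the table
            continue
        row = ROW.match(line)
        if row is None:
            break  # first non-row line (e.g. the exception banner) ends the table
        problems.append(ApfProblem(*row.groups()))
    return problems
```

Applied to the report above, it yields one entry: volume A4CFG1, data set NETVIEW.V621USER.VTAMLIB, and error "DS not found".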
In my case, the correct data set was NETVIEW.VTAMLIB, so I corrected the APF entry in my parmlib member, and all is well. To see the output of every check rather than just this one, specify CHECK(*,*) in the SYSIN of the HZSPRINT job.
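With CHECK(*,*), the report can cover many checks, so a quick summary helps. This sketch makes two assumptions. First, each check's section starts with a "Start: CHECK(owner,name)" line, as in the sample report. Second, exceptions are labeled "Low", "Medium" or "High Severity Exception". Only the Low wording actually appears in the sample, so treat the other two as assumptions. The name exceptions_by_check is hypothetical.

```python
import re

# Markers assumed from the sample HZSPRINT report.
START = re.compile(r"Start:\s+CHECK\(([^,()]+),([^()]+)\)")
EXCEPTION = re.compile(r"\b(Low|Medium|High) Severity Exception\b")


def exceptions_by_check(report: str) -> dict[tuple[str, str], str]:
    """Map (owner, check name) to the severity of the exception it reported."""
    found: dict[tuple[str, str], str] = {}
    current = None  # (owner, name) of the check section being read
    for line in report.splitlines():
        start = START.search(line)
        if start:
            current = (start.group(1).strip(), start.group(2).strip())
            continue
        exception = EXCEPTION.search(line)
        if exception and current is not None:
            found[current] = exception.group(1)
    return found
```

Checks whose sections contain no exception label are simply absent from the result. With ,EXCEPTIONS in SYSIN, the report should contain only checks that raised exceptions in the first place.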
How to see what checks are configured
To display which checks are configured to run, issue an operator MODIFY command to the HZSPROC started task:
f hzsproc,display,checks,check=(IBMCSV,*),detail
In this case, it lists all the checks owned by IBMCSV (contents supervision), including CSV_APF_EXISTS.
Checks are configured in the HZSPRMxx member of your system parmlib. You can change the checks there, and the changes take effect at your next IPL. You can also make changes dynamically with the MODIFY command shown above. See this cheat sheet for examples.
For more details on the checks that are available, see IBM Health Checker for z/OS checks – IBM Documentation.