Read and Update with SQL

One of the advantages of SQL over native IO is that it allows you to handle sets of data rather than having to read and update each row individually. However, there can be cases where you want to handle your data line by line. To do this with SQL, you need to use a cursor.

I have a couple of examples below, both of which are written in fully free RPGLE. Going forward, all examples will be fully free because, as developers, we should all be putting the past behind us.

I am basing this on the previously mentioned Employees file, to which I have added an extra field named EMPVALID. This is numeric — a zero indicates an invalid employee number and a one indicates a valid employee number. Any number of validations or codes could be added, of course, but I want to keep things simple.
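
For reference, adding such a flag is a one-line change. The statement below is only a sketch; I am assuming a single-digit numeric column that defaults to zero until the employee number has been checked:

-- Sketch only: a one-digit validity flag, defaulting to 0 (not yet validated)
alter table EMPLOYEES
  add column EMPVALID numeric(1, 0) not null default 0;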

So first of all, here’s a simple program that will read and display each invalid employee.

 
      **Free
        
        // Employee records data structure
        dcl-ds EMPDS extname('EMPLOYEES')
        end-ds;

        // Ensure the cursor is not already open
        // This isn't really necessary here, but it is good practice
        exec SQL close C1;

        // Declare and open the cursor
        exec SQL declare C1 cursor for
            select EMPNUMBER, EMPNAME, EMPVALID
            from EMPLOYEES
            where EMPVALID = 0
            for fetch only;
        exec SQL open C1;

        // Read the file
        dou SQLCOD < 0 or SQLCOD = 100;
            exec SQL fetch C1 into :EMPDS;
            if SQLCOD >= 0 and SQLCOD <> 100;
                dsply (EMPNAME);
            endif;
        enddo;

        // Close the cursor and exit
        exec SQL close C1;
        *inlr = *on;

This is fine as far as it goes, but what we really want to do is check whether the employee numbers are valid and update EMPVALID accordingly. For this, I will use the %check BIF.


      **Free

        // Employee records data structure
        dcl-ds EMPDS extname('EMPLOYEES')
        end-ds;

        // A display field to get around the limitations of the DSPLY OpCode
        dcl-s DISPLAY CHAR(52);

        // And we need a constant, digits, containing the valid characters
        dcl-c DIGITS '0123456789';

        // Ensure the cursor is not already open
        // This isn't really necessary here, but it is good practice
        exec SQL close C1;

        // Declare and open the cursor
        // Note that the last clause is now a 'for update of...'
        exec SQL declare C1 cursor for
            select EMPNUMBER, EMPNAME, EMPVALID
            from EMPLOYEES
            where EMPVALID = 0
            for update of EMPVALID;
        exec SQL open C1;

        // Read the file
        dou SQLCOD < 0 or SQLCOD = 100;
            exec SQL fetch C1 into :EMPDS;
            if SQLCOD >= 0 and SQLCOD <> 100;
                if %check(DIGITS: EMPNUMBER) > 0;
                    DISPLAY = %trim(EMPNAME) + ' is valid';
                    dsply (DISPLAY);

                    // And this is where we set EMPVALID = 1
                    // for the current record
                    exec SQL
                        update EMPLOYEES
                        set EMPVALID = 1
                        where current of C1;
                endif;
            endif;
        enddo;

        // Close the cursor and exit
        exec SQL close C1;
        *inlr = *on;

And this is the output:

DSPLY  Bugs Bunny is valid
DSPLY  Elmer Fudd is valid
DSPLY  Speedy Gonzales is valid
DSPLY  Sylvester is valid

And looking at the file, I can see that the updates have been applied, as below:

EMPNUMBER   EMPNAME                                                  EMPVALID
00001       Bugs Bunny                                                      1
OOOO2       Elmer Fudd                                                      1
0000000003  Daffy Duck                                                      0
-4          Speedy Gonzales                                                 1
00005       Sylvester                                                       1

I told you that Elmer Fudd was tricksy.

As a rule, updating sets of data is much more efficient and should certainly be preferred for any new development. However, there are plenty of programs out there for which the business rules are tucked away in some validation subroutine which has been written to process one line at a time. In these cases, being able to process line by line with an SQL cursor gives you a handy first step towards modernising the application.

I have also found cursors to be useful when building subfiles, but that’s a whole other post.

Editing RPGLE source members with Vim

I am very fond of the Vim text editor and will use it, out of preference, whenever possible. This includes writing RPG (and, obviously, RPGLE) programs on the IBM i. The only downside here is that, while Vim supports syntax highlighting for a multitude of languages, RPG isn’t one of them.

For a long time, I have relied on the syntax files written by Martin Rowe back in 2014. These, however, are getting more than a little long in the tooth. So, since I have a little time on my hands, I have started putting together a new set of syntax files which, hopefully, will be able to keep up with any new developments.

It’s all very early days so far, but you can find the Vim RPG Syntax highlighter here.

Feel free to download it and use it. And if you have any improvements to make or suggest, I would love to hear from you.

On indicators

RPG has, since the beginning of time, supported a set of indicators — one-byte characters that can be either *on or *off. These are popular but really shouldn’t be used any more. Being named *IN01 to *IN99 makes for indicators whose function is unclear, and you are much better off using built-in functions and explicitly defined fields.

There is an exception, of course. When dealing with display files, you have no choice but to use indicators to determine which keys have been pressed. Even here, though, it is a good idea to overlay the indicators with human-readable field names.

This is how I do it:

 * Global variables
d IndicatorPtr    s               *   inz(%addr(*in))
d                 ds                  based(IndicatorPtr)
d iExit                   3      3
d iRefresh                5      5
d iAdd                    6      6
d iCancel                12     12
d iValidCmdKey           25     25
d iError                 30     30
d iPageDown              26     26
d iSflInit               80     80
d iSflEnd                81     81
d iSflEmpty              82     82
d iSflPosCursor          83     83
d iSflDeleted            84     84
d iProtectKey            90     90
d iProtectData           91     91

So *IN03 is mapped to field iExit, *IN05 is mapped to field iRefresh, and so on.

With the indicators defined in this way, I can write code like this:

p Main            b
d Main            pi
 /free

     // Identify the current program
     program = RtvProgram();

     // Main processing loop
     dou iExit = *on;
         WorkWithMessageDescriptions();
     enddo;

     return;

 /end-free
p Main            e

Note also that this Main procedure is not setting on the *LR indicator. This is because, in the header spec, I have a named main procedure:

h main(Main)

Using this tells the compiler that I am not using the RPG logic cycle and that processing should start with procedure Main. Not only does this allow me to abandon setting on *LR, it also reduces a lot of the program’s overhead.

Using regular expressions in DB2 for i

Regular expressions have been supported in DB2 for a while now and, while they are not something I often need, there are times when they come in very handy indeed. A simple example involves cleaning up some data so that it can be exported from an old (poorly maintained) database to a new, shiny one that (I’m promised) absolutely will be used.

The database in question is an employee master and, in order to protect the innocent, I have both simplified it and changed the names for the purpose of this example. There are only two fields to worry about: empnumber and empname, which represent the employee number and name respectively. There are a couple of issues to address. The first one is that the empnumber is ten characters in the old database but five numeric in the new, so I need to ensure that all the numbers fit.

The second issue is that this database has been very poorly maintained. If you allow someone to enter a character in a numeric field, they will. And they did.

Here’s an idea of how the database looked:

EMPNUMBER   EMPNAME
00001       Bugs Bunny
OOOO2       Elmer Fudd
0000000003  Daffy Duck
-4          Speedy Gonzales
00005       Sylvester

As you can see, there are lots of ways in which the employee number can be invalid, and these have all been very well explored. And watch out for Elmer Fudd — he’s a bit confused about the difference between letters and numbers.

Fortunately REGEXP_INSTR is your friend:

The REGEXP_INSTR function returns the starting position or the position after the end of the matched substring, depending on the value of the return_option argument.

So all I need is a regular expression that identifies which employee numbers are five digits followed by spaces, and which aren’t. And here it is:

select * from employees
where regexp_instr(empnumber, '^\d{5}\s+$') != 1

There are several optional parameters but I only need to use the source string (empnumber) and the pattern expression. The expression I am using looks for digits in the first five characters followed by spaces to the end of the field. And this is the result:

EMPNUMBER   EMPNAME
OOOO2       Elmer Fudd
0000000003  Daffy Duck
-4          Speedy Gonzales

I told you that Elmer Fudd was tricksy.

It should also be possible to use REGEXP_REPLACE to automate some of the cleanup, but you really do need your business users to buy in before you go down that particular rabbit hole.
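
As a sketch of the sort of thing that is possible, the query below simply strips out anything that is not a digit. I am not presenting this as the actual cleanup; deciding what a value like 0000000003 should become is exactly the conversation you need to have with the business first.

-- Sketch only: show what each problem employee number would look like
-- with the non-digit characters stripped out
select empnumber,
       regexp_replace(empnumber, '[^0-9]', '') as digits_only
from employees
where regexp_instr(empnumber, '^\d{5}\s+$') != 1;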

Merging data with SQL MERGE

There have been times, many times, when I have needed to update one or more columns in a table with values retrieved from another. A traditional approach to this is to use an UPDATE with subselects, but there is a better way:

The MERGE statement updates a target (a table or view) using data from a source (result of a table reference). Rows in the target that match the input data may be updated or deleted as specified, and rows that do not exist in the target may be inserted as specified. Updating, deleting, or inserting a row in a view updates, deletes, or inserts the row into the tables on which the view is based if no INSTEAD OF trigger is defined on the view.

In order to play around with this, I built and populated a couple of test tables (imaginatively named TEST01 and TEST02) using the scripts below:

create table test01
 (key_field char(1) not null,
  data01 char(20),
  data02 char(30),
  data03 char(5),
  primary key (key_field));

insert into test01
 (Key_field, data01)
values('A', 'TEXT One');

insert into test01
 (Key_field, data01)
values('B', 'TEXT Two');

insert into test01
 (Key_field, data01)
values('D', 'TEXT Four');

create table test02
 (key_field char(1) not null,
  data01 char(20),
  data02 char(30),
  data03 char(5),
  primary key (key_field));

insert into test02
 (Key_field, data02)
values('A', 'More text one');

insert into test02
 (Key_field, data02, data03)
values('C', 'more text two', 'CODE');

insert into test02
 (Key_field, data01, data02)
values('E', 'TEXT five', 'More text five');
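
Before merging anything, it is worth eyeballing the starting state. A full outer join (just a quick check, not part of the process) shows which keys line up between the two tables:

-- Quick sanity check: which keys exist in which table
select coalesce(a.key_field, b.key_field) as key_field,
       a.data01 as t01_data01,
       b.data02 as t02_data02
from test01 a
full outer join test02 b
  on b.key_field = a.key_field
order by 1;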

This gives me two tables with one matched row (key_field=’A’). To update the first table with the values in the second is really rather simple:

MERGE INTO test01 a
  USING (SELECT key_field, data01, data02, data03 from test02) b
  ON b.key_field = a.key_field
WHEN MATCHED THEN
   UPDATE SET data02 = b.data02, data03 = b.data03;

And the one matched row is updated in table TEST01 with the data from table TEST02.
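
For comparison, the clunky approach I am replacing would look something like this (a sketch using the same test tables; note how the matching logic has to be written twice):

-- Traditional approach: correlated subselects, with the match
-- condition duplicated in the SET and the WHERE clause
update test01 a
set (data02, data03) = (select b.data02, b.data03
                        from test02 b
                        where b.key_field = a.key_field)
where exists (select 1 from test02 b
              where b.key_field = a.key_field);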

If this was all I could do, I would be happy to have replaced a rather clunky approach with a more readable solution. But now the MERGE statement really starts to make a difference:

MERGE INTO test01 a
  USING (SELECT key_field, data01, data02, data03 from test02) b
  ON b.key_field = a.key_field
WHEN MATCHED THEN
   UPDATE SET data02 = b.data02, data03 = b.data03
WHEN NOT MATCHED THEN
   INSERT (key_field, data01, data02, data03)
   VALUES (b.key_field, b.data01, b.data02, b.data03)
NOT ATOMIC
  CONTINUE ON SQLEXCEPTION;

The WHEN NOT MATCHED clause means that not only can I update the matched rows, I can also do something about the rows in table TEST02 that have no matching record in table TEST01. In this example I have inserted those rows; the other options allow me to delete matched rows from table TEST01 or raise a SIGNAL statement instead.

And that NOT ATOMIC CONTINUE ON SQLEXCEPTION specifies that if an error is encountered on the insert, the MERGE will continue. Bad data happens, and this saves me the burden of constantly having to refine my selection criteria to avoid the crap.

And a final note on the subject of selection criteria: both the WHEN MATCHED and WHEN NOT MATCHED clauses can be extended with an AND condition to refine the selection to exactly what you want.
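
As a sketch of how the refinement and the delete option fit together, something like this should work. The test on data03 is just an arbitrary condition for the example, and the WHEN clauses are evaluated in the order they are written:

-- Matched rows flagged with 'CODE' are removed, other matched rows
-- are updated, and unmatched rows are inserted
MERGE INTO test01 a
  USING (SELECT key_field, data01, data02, data03 from test02) b
  ON b.key_field = a.key_field
WHEN MATCHED AND b.data03 = 'CODE' THEN
   DELETE
WHEN MATCHED THEN
   UPDATE SET data02 = b.data02, data03 = b.data03
WHEN NOT MATCHED THEN
   INSERT (key_field, data01, data02, data03)
   VALUES (b.key_field, b.data01, b.data02, b.data03);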

Do go and read up on the MERGE statement. It provides a powerful and flexible approach to updating tables from source data, and one that I will be using in the very near future.

Better date conversion with timestamp_format

Handling dates and times on the IBM i can often be a bit of a challenge. Obviously, if you are working with a modern application that uses the Date, Time and Timestamp data types, everything is nice and straightforward. However, there are plenty of old applications that use eight-, seven- or six-digit numeric fields to represent dates and, sooner or later, you will need to extract these in a manner that looks sane in some other environment.

Timestamp_format is your friend.

The TIMESTAMP_FORMAT function returns a timestamp that is based on the interpretation of the input string using the specified format.

It’s that “interpretation of the input string” that makes this so handy. Put simply: you tell the function how to interpret the string and it will do the rest.

Here’s an example. I have a contracts table (CONTRACTS) with three columns: Contract Number (CTNR), Start Date (STDT) and End Date (ENDT). The start and end dates are both seven-digit numbers. I need to list all of the currently active contracts (Start Date is on or before today and End Date is on or after today).

And the SQL looks like this:

select CTNR,
       date(timestamp_format(substr(digits(STDT), 2, 6), 'YYMMDD')),
       date(timestamp_format(substr(digits(ENDT), 2, 6), 'YYMMDD'))
from CONTRACTS
where date(timestamp_format(substr(digits(STDT), 2, 6), 'YYMMDD')) <= current_date
and date(timestamp_format(substr(digits(ENDT), 2, 6), 'YYMMDD')) >= current_date

Current_date is a special register that returns the current date, but that’s not important right now.

The date conversion part involves several nested functions to get to a final date, so here’s the breakdown using a date of 1st September 2015 (1150901):

  • digits(STDT) converts the seven-digit number into a seven-character string. So 1150901 is converted to ‘1150901’.
  • The IBM i can figure out which century it’s in by looking at the year, so I don’t need the century flag, and substr(digits(STDT), 2, 6) strips it out to give me a value of ‘150901’.
  • The timestamp_format function takes the date string and uses the format string of ‘YYMMDD’ to generate a timestamp of 2015-09-01-00.00.00.000000. Since I only passed it a date, the hours, minutes and seconds are all zeroes.
  • And finally, I can use the date function to retrieve the date from the generated timestamp.

And here’s the result:

CTNR      DATE      DATE
C1000001  01/09/15  31/08/16

Pretty, and portable.
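
If you want to sanity-check the conversion in isolation, a quick VALUES statement does the job. This is just a sketch; the dec() cast is there to mimic the seven-digit numeric column:

-- Walk the conversion through for 1st September 2015 (1150901)
values date(timestamp_format(substr(digits(dec(1150901, 7, 0)), 2, 6), 'YYMMDD'));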

Using EXTFILE to override files within RPG

So here’s the situation: Five warehouses populating five sets of files (same filenames, different libraries) and I have an RPGLE program that needs to read through each of these to accumulate dispatch information.

The traditional way of doing this would involve writing a CL program that issues an OVRDBF for each file before calling the RPGLE program. This works, but it’s a bit messy and can cause future maintenance problems. It’s much better, therefore, to specify the file (and library) to open within the RPGLE program itself. You can do this with the EXTFILE keyword.

Here’s an example:

 * ---------------------------------------------------------------------- *
 * Program     : LSX001R                                                  *
 * Description : Dynamically select which file to open                    *
 * ---------------------------------------------------------------------- *
h main(Main)
fMOVEMENTS if   e           k DISK    usropn extfile(filename)

d Main            pr                  extpgm('LSX001R')
d  library                      10a

dfilename         s             21a

 * ------------------------------------------------------------------------
 * Main Procedure
 * ------------------------------------------------------------------------
p Main            b
d Main            pi
d  library                      10a
 /free

     filename = %trim(library) + '/' + 'MOVEMENTS';
     open MOVEMENTS;

     // Do whatever processing on the file needs to be done...

     close MOVEMENTS;

 /end-free
p Main            e

Hopefully this is reasonably straightforward. In the file spec for the MOVEMENTS file, I have used EXTFILE with a variable filename to specify the actual file opened. I have also used the USROPN keyword as I need to populate the filename variable before I attempt to open the file.

Populating this field is pretty simple. I pass the library name to the program as a parameter and the first thing the Main procedure does is concatenate the library and file with a ‘/’ separator.

Now I can open the correct file, do whatever processing is necessary, and then close the file afterwards.

A similar approach, using the EXTMBR keyword, can be used to open a specific file member. You can, of course, combine EXTFILE and EXTMBR in order to dynamically determine both the file and the member, if you really want to.

It should be noted that this will only work if you are using traditional database IO. Any embedded SQL will remain unaffected by anything you do in the F-Spec. If you want to do something similar with SQL, you will need to look into the CREATE ALIAS statement.
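
A minimal sketch of that approach: create an alias over the file in whichever library you need (the library name WHSLIB1 below is made up for the example), run your SQL against the alias, and drop it when you are done.

-- Point an alias at the MOVEMENTS file in one warehouse library
create alias qtemp.movements_w1 for whslib1.movements;

select * from qtemp.movements_w1;

drop alias qtemp.movements_w1;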

Adding variables to ad-hoc SQL queries with CL

Last month I mentioned using REXX to build and execute ad-hoc SQL queries. For the sake of completeness, here is the same solution implemented in CL using the RUNSQL command.

The program does exactly the same as the previously described REXX script — it extracts all records flagged with today’s date so that they can be reviewed at a later date. And here it is:

pgm
dcl &date6 *char 6
dcl &date8 *char 8
dcl &statement *char 255

/* First retrieve the current system date (QDATE)                             */
/* and convert it to an 8 character value                                     */
RTVSYSVAL  SYSVAL(QDATE) RTNVAR(&DATE6)
CVTDAT     DATE(&DATE6) TOVAR(&DATE8) FROMFMT(*JOB) TOFMT(*YYMD) TOSEP(*NONE)

/* Next, build the SQL statement                                              */
CHGVAR     VAR(&STATEMENT) +
           VALUE('insert into EXPATPAUL1/INPUTDATA' *bcat +
                 'select * from EXPATPAUL1/PRODDATA where dhdate =' *bcat +
                 &date8)
runsql sql(&statement) commit(*none)

endpgm
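
With a system date of, say, 21 August 2015, the statement handed to RUNSQL would read as follows (the *BCAT operator supplies the single blank between each piece):

insert into EXPATPAUL1/INPUTDATA select * from EXPATPAUL1/PRODDATA where dhdate = 20150821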

Using RGZPFM to sort physical files

It’s rare that you need to sort a database file (or table), but I have encountered a circumstance in which a sorted file had to be delivered. Being naturally lazy, I started looking around for the least-work method of achieving this and ended up looking at the RGZPFM command. This is a command I have used quite often to compress a physical file (it removes the deleted records), but this time around it was the reorganise part of Reorganise Physical File Member that I was interested in.

From the help text:

If a keyed file is identified in the Key file (KEYFILE) parameter, the system reorganizes the member by changing the physical sequence of the records in storage to either match the keyed sequence of the physical file member’s access path, or to match the access path of a logical file member that is defined over the physical file.

The important thing here is that any keyed file can be specified. So if I have an appropriately keyed logical file, I can very easily sort the physical.

Some names have been changed, but if I have a physical file called ITEMLIST and I want it sorted by field BARCODE, I need to first create an index:

create index ITEMLIST01          
on ITEMLIST (BARCODE);           
                                 
label on index ITEMLIST01        
is 'Logical on ITEMLIST (Keyed: BARCODE)'; 

Then I can easily reorder the physical file ITEMLIST into BARCODE sequence with the command:

RGZPFM FILE(ITEMLIST) KEYFILE(ITEMLIST01 ITEMLIST01)

The only thing to take note of is that RGZPFM will issue an error if the file has no records. The approach I took was to use RTVMBRD to retrieve the NBRCURRCD for the member; an alternative would be to use a MONMSG immediately after the RGZPFM.

Debugging RPG Programs in Batch

If you spend any time developing RPG applications you will find that, sooner or later, you will need to debug a program that runs in batch. There are a few steps that you need to take in order to do this and, because my memory is terrible, I’m putting them here.

  • Know what job queue your job is being submitted to. Hold this job queue.
  • Submit your job.
  • Your job is now sitting on the held job queue waiting to be released. Display the job and make a note of the user name, job name and number.
  • Start a service job using STRSRVJOB, entering the user name, job name and job number from the previous step.
  • STRDBG PGM(ProgramName). You can’t enter any breakpoints yet, so press F12 to exit the source display.
  • Release the job queue
  • The Start Serviced Job screen will be displayed asking you to press F10 to enter debug commands for the job. Press F10.
  • Run DSPMODSRC and enter your breakpoints.
  • Press F12 to resume and F3 to return to the Start Serviced Job screen.
  • Press Enter to start your job.

The job will now begin running and will stop, passing control back to your screen, when the first breakpoint is reached.