Read a CSV File (CSV) Table Function

Index

Schema IQUERY

 CSV( path_name=>'ifs-text-file.csv', 
header_rows=>1,
starting_row=>1,
delim => ',',
dump_Headers=>0 )

The Read Comma Separated Values File (CSV) Table Function returns one row for each record in the specified IFS text file. CSV files can be created using myriad methods. Often when created using MS Excel or other tools, a so-called Byte Order Mark or "BOM" is inserted at the beginning of the Text file. SQL functions such as the CSV Table function inspect the first few bytes of text files to determine if the BOM is included, and skip over it before returning the content of the file.

When the CSV table function reads the CSV file, the data returned is not identical to what is contained in the CSV file. For this reason, users who simply want to view the contents of an IFS text file should use another method to do so. For example IFS_READ() Table function found in library SQLTOOLS starting with IBM i V7R2 and in QSYS2 on the latest TR and versions of IBM i.

Parameters

PATH_NAME

The fully qualified IFS file (with the path) that contains the CSV data. This is a required parameter.

HEADER_ROWS

Default: 1

The number of Column Header rows at the beginning of the file. Normally 1 row is specified, however sometimes additional headers are included that need to be skipped over to get to the actual Column Headings record. For example, if the first record contains a general Title and the 2nd record contains the Column titles, to let CSV() know that the column headings appear in row 2, use HEADER_ROW=>2 when calling it. Note that this value identifies the record that contains the column headings used by CSV_VAL(), CSV_INT() , CSV_DEC(), and CSV_DATE() functions to retrieve the column's data.

STARTING_ROW

Default: 1

The number of the first data record (beyond the header row(s)) that is to be processed.  For example, you want to skip the first 5 rows of the CSV data (again, after the header rows) then STARTING_ROW=> 6 would be specified. The default is 1 which means start with the first row of data.

DELIM

Default: ','

The separator used to delimit the columns of data in the text input file. The default is a comma, but any symbol can be used although if the double-quote or quote (apostrophe) are used, unpredictable result will occur.

DUMP_HEADERS

Default: 0

Controls whether the names detected for the columns headings are written to the joblog. The default 0 means do not write the column headings to the joblog, any positive numeric value means the column headings are written to the joblog. 

Examples

select csv_int(data,'population'),csv_val(data,'state') from table(iQuery.csv('/home/SQLTOOLS/data/states.csv')) st;

See Also: CSV_VAL CSV_CHAR CSV_VARCHAR CSV_COUNT CSV_INT CSV_BIGINT CSV_DATE CSV_DEC CSV_BIGDEC

Column Data Type Description
DATA
VARCHAR(8192)
The data from the rows/records in the IFS text file specified on the FILE_NAME parameter. Pass this value to the CSV parsing routines such as CSV_INT, CSV_VAL, CSV_DATE, etc.

Using the CSV UDTFs

Normally the CSV Table function is used in conjunction with one or more of the CSV parsing UDFs such as CSV_VAL and CSV_DEC. Each of these CSV_xxxx UDFs return the data from the DATA column of the CSV Table function; which must be passed as their first parameter. The CSV() function returns one column (field): the CSV file's row data. Then the specific CSV parsing UDF uses that data to extract the individual column data.

There are myriad ways to create a CSV file, MS Excel, the CPYTOIMPF command, and OUTPUT(*CSV) on the RUNiQRY command are just a few. To produce a test CSV file to use in this example, I ran the following RUNiQRY command:

RUNiQRY 'select * from qiws.qcustcdt'  OUTPUT(*CSV)

This produces a CSV Text File on the IFS similar to the following:

CUSNUM,Last Name,INIT,Street Address,CITY,STATE,ZIPCOD,CDTLMT,CHGCOD,BALDUE,CDTDUE 
938472,"Henning","G K","4859 Elm Ave","Dallas","TX",75217,5000,3,.00,.00           
839283,"Cozzi's","B D","21B NW 135 St","Clay","NY",13041,400,1,101.00,.00          
392859,"Vine","S S","PO Box 79","Broton","VT",5046,700,1,439.00,.00                
938485,"Johnson","J A","3 Alpine Way","Helen","GA",30545,9000,2,3987.50,33.50      
397267,"Tyron","W E","13 Myrtle Dr","Hector","NY",14841,1000,1,.00,.00             
389572,"Stevens","K L","208 Snow Pass","Denver","CO",80226,400,1,.00,1.50          
846283,"Alison","J S","787 Lake Dr","Isle","MN",56342,5000,3,.00,.00               
475938,"Doe","J W","59 Archer Rd","Sutter","CA",95685,700,2,250.00,100.00          
693829,"Thomas","A N","3 Dove Circle","Casper","WY",82609,9000,2,.00,.00           
593029,"Williams","E D","485 SE 2 Ave","Dallas","TX",75218,200,1,.00,.00           
192837,"Lee","F L","5963 Oak St","Hector","NY",14841,700,2,490.50,-1234.50         
583990,"Abraham","M T","392 Mill St","Isle","MN",56342,9000,3,500.00,.00

To read these rows using the CSV Table function, the following statement could be used.

select csv_int(data,'cusnum'), cast(csv_char(data,'Last name') as varchar(20)) as LastName, 
csv_val(data,'state') as STATE
from table(iQuery.csv('/home/cozzi/QCUSTCDT.CSV')) csv;

Three fields are identified by their column headings "CUSNUM", "Last Name" and 'STATE". The CSV_xxx parsing routines ignore upper/lower case and spaces in the column headings/names. In addition, special characters like dashes or slashes are removed/ignored. So the labels specified on the 2nd parameter of the CSV_xxx UDFs have to match the column heading names, but upper/lower case, spaces and special characters are ignored in performing that match. Therefore "Last Name" or "LastName" or "lastname" all return the column whose column heading is "Last Name" in the CSV file.

One situation that comes of frequently is a column entitled something like "Qty O/H" and how to identify it. Again, the forward slash annd spaces are dropped by CSV() when processing a column, so any form of QTY and OH can be used, for example: 'QTYOH' and 'Qty O/H' both return the column labeled 'Qty O/H'. Note that some column returned by SQL have numerical identifiers and no real column name. Names like: 000001, 000002, etc. The CSV UDFs allow users to use either the column headers or the relative column number. For example to reference the column named STATE in the above example, you can refer to it as CSV_VAL(data, 'state') or as CSV_VAL(data, 3) and get the same result. 

The result of the above SQL statement might look like the following:

 00001  LastName              STATE  
938472  Henning               TX     
839283  Cozzi's               NY     
392859  Vine                  VT     
938485  Johnson               GA     
397267  Tyron                 NY     
389572  Stevens               CO     
846283  Alison                MN     
475938  Doe                   CA     
693829  Thomas                WY     
593029  Williams              TX     
192837  Lee                   NY     
583990  Abraham               MN