Reading Data from Fixed-Width Text Files

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • code green
    Recognized Expert Top Contributor
    • Mar 2007
    • 1726

    Reading Data from Fixed-Width Text Files

    I am receiving orders from a web-site as a text file of fixed-width format
    Code:
    THYNMD........08000000......00000022.....000011
    The data sections are padded out with spaces (denoted as dots in example) and appear on single lines of hundreds of characters in length.
    Only about three lines altogether seperated by CR.
    The widths are precisley defined in the specs
    Code:
    Field    From    To    Length    
    Name1	500    539    40
    Name2	540    579    40
    I hope to produce a class that extracts the data into associtive arrays for import into MySQL,
    so I imagine extensive use of string and/or array functions.

    Has anybody worked with with this format that may have functions to help me start or advice on pitfalls to avoid?
    Thanks in advance
  • Stewart Ross
    Recognized Expert Moderator Specialist
    • Feb 2008
    • 2545

    #2
    Although my experience does not relate to web-based work in PHP I worked with fixed-width text files when submitting funding returns to a student funding council. I used VBA and Access to process them. Other databases and programming languages would have been equally as useful.

    Although I used Access tables to store the data then queries to process them it is just as possible to use arrays in memory processed using bespoke code instead. I did as much as I could in database queries, to make it as transparent as possible what was going on (for maintenance purposes). If all of the processing is done in code it is more difficult for other people who might follow on from ourselves to maintain, as code is another step removed from the experience of non-programmers and more difficult to understand (at least initially) than an SQL query may be.

    The funding returns were in two fixed-width text files, one representing course information (known as FES 1) and the other student data (FES 2), including the number of fundable units (student units of measure as they were called) we were claiming for each student. To process these files I read them into an Access table containing a single column called [Lines] that contained the text read from the text file to the end-of-line in each row.

    An anonymised example of the text lines from the FES 2 table is shown below. The file has an internal structure of header rows followed by detail rows, repeated for each programme:

    Code:
    IN130121FES51605530013
    NAMEA               DW00000011/1          020091355241019881270XX22 1XXINLM  22101708         08  110920060000000018.00000.00000.00014082006250520072410200606303420.00000.000
    NAMEBBBB            JM00000015/1          010050006270519881270XX51 9XXINLM  22101708         01  110920060000000018.00000.00000.00014082006250520072410200606302120.00000.000
    NAMECC              JW00000000/1          030066561071119891270XX2 4XX INLM  22101708         01  110920060000000018.00000.00000.00014082006250520072410200606302720.00000.000
    NAMEDDDDD           M 00000000/1          030049071290719891270XX49 9XXINLM  22101708         01  110920060000000018.00000.00000.00014082006250520072410200606302620.00000.000
    As with your example there was a detailed specification for each component field of the file, and for the header/detail structure. The field start and end character numbers and field names were stored in a table. Some of the fields listed are shown below:

    Code:
    Field Name   P1  P2
    01 Surname    1  20
    02 Initials  21  22
    03 SQA       23  42
    04 ID        23  42
    05 DOB       52  59
    06 Gend      60  60
    07 Home      61  63
    08 PCode     64  71
    09 Cmps      72  77
    In my case I used an unjoined Access Crosstab query to extract the field structure from the student data text file and the specification file for checking purposes. The simple query which extracts the fields using the unjoined crosstab and a sample of the consequent decoded data is shown below:

    Code:
    TRANSFORM First(ExtractText([Lines],[p1],[p2])) AS [Text]
    SELECT   [FES 2 Text File].Lines, Len([Lines]) AS L
    FROM     [FES 2 Text File], [FES 2 Table Fields]
    GROUP BY [FES 2 Text File].Lines, Len([Lines])
    PIVOT    [FES 2 Table Fields].[Field Name];
    (The use of function First in the TRANSFORM statement is just a dummy in this case to allow the crosstab to take place, as Access will not pivot a value unless some kind of aggregate function is used on the data.)

    Code:
    01 Surname 02 Initials  03 SQA      04 ID       05 DOB    06 Gend  07 Home  08 PCode  09 Cmps  10 Src of Fin  11 Ethnic  12 MOA  13 Cat  14 Schl  15 Dis All  16 Dis  17 Outcm  18 QED  19 H  20 A  21 SCE1  22 SCE2  23 Units  24 Other  25 SUMs  26 ELS  27 SUM Eq  28 Start  29 End   30 25%   31 Enr no  32 Trf SUMs  33 Trf ELS
    NAME1111   HC           00000000/4  00000000/4  13101989  1        300      XX30 8XX  INLM     06             11         08      10       0121052 0           07      00                0     0     0        0        0                   010.00   000.00  000.00     23042007  25052007 15052007 000000     10.00        000.00
    NAME2      BR           00000000/2  00000000/2  22071987  1        270      XX15 9XX  INLM     05             10         06      08       0109102 0           06      00                0     0     0        0        0                   007.50   000.00  000.00     11092006  30032007 31102006 000000     07.50        000.00
    NAME3333   RE           00000000/1  00000000/1  04061990  2        270      XX6 7XX   INAAC    42             10         07      03       0118092 0           06      00                0     0     0        0        0                   006.00   000.00  000.00     21082006  23042007 21102006 000000     06.00        000.00
    NAME4444   K            00000000/2  00000000/2  11051985  2        270      XX2 4XX   INMM     10             10         08      11       0109102 0           06      00                0     0     0        0        0                   001.00   000.00  000.00     11092006  22062007 15112006 000000     01.00        000.00

    VBA Functions to extract the data from the text string were very simple, and should be adaptable to other languages relatively easily:

    Code:
    Public Function ExtractField(source, P1 As Integer, P2 As Integer) As String
    ' Returns a sub-string of the source string, trimmed to remove
    ' trailing spaces. Leading spaces remain to assist in identifying field
    ' starts
    
    Dim Result As String, l As Integer
    Result = ""
    If Not IsNull(source) Then
        l = Len(source)
        If (P1 <= l) And (l - P1 - P2 + 1 >= 0) Then
            Result = RTrim(Mid$(source, P1, P2))
        End If
    End If
    ExtractField = Result
    End Function
    
    Public Function ExtractText(source, ByVal P1 As Integer, ByVal P2 As Integer) As String
    ' In:   source string
    '       p1 = start position in source string
    '       p2 = end position in source string
    ' Out:  substring  of source starting at p1 and ending at p2
    ' Uses extractfield() to get the data from the source string
    '
      ExtractText = ExtractField(source, P1, P2 - P1 + 1)
    End Function
    My main task was actually not so much the decoding of the text file back into its constituent fields but to generate data in the fixed format. This was done by recordset processing of individual fields from final queries, setting up fixed-width and fixed-format values as appropriate to match the spec.

    The downside to all the string extracts is that processing can be very slow. In my case the records were one to a text line, so processing more than 8000 lines each containing around 32 individually-extracted fields was relatively slow to accomplish.

    In your case you have a record structure which appears to go across rows with multiple records in a single row (hence only three rows in total in your text file). If that is the case you may have to pre-process the text lines to extract the individual records into an array or table, say, before you begin extracting the fields as such.

    Hope this brief summary of experience doing similar things outside of PHP itself helps. I wish you all success in doing the work in PHP, or indeed for developing in any other scripting language you may want to use to do the processing for you.

    Regards

    Stewart
    Last edited by Stewart Ross; Mar 19 '11, 02:13 PM.

    Comment

    • code green
      Recognized Expert Top Contributor
      • Mar 2007
      • 1726

      #3
      Informative and useful.
      Thanks for the reply.

      Comment

      Working...