how to convert gpr file to csv format: using python

  • baber
    New Member
    • Jan 2007
    • 4

    how to convert gpr file to csv format: using python

    Hi
    I am a beginner. I would like to know how to convert a file in gpr format to csv format using Python.
    Baber
  • bartonc
    Recognized Expert Expert
    • Sep 2006
    • 6478

    #2
    Originally posted by baber
    Hi
    I am a beginner. I would like to know how to convert a file in gpr format to csv format using Python.
    Baber
    Very well. Let's move this to the Python forum. Welcome to TSDN.

    Comment

    • bartonc
      Recognized Expert Expert
      • Sep 2006
      • 6478

      #3
      Originally posted by baber
      Hi
      I am a beginner. I would like to know how to convert a file in gpr format to csv format using Python.
      Baber
      Welcome to the Python Forum on TheScripts.com.
      I don't recognize gpr. Is it some other text format or from a program?

      Comment

      • ghostdog74
        Recognized Expert Contributor
        • Apr 2006
        • 511

        #4
        well, you should help us to help you by providing an example of the gpr format and your expected output, which in this case is csv.
        looking up the gpr extension, i can only find that it relates to some modeling software system...

        Comment

        • bartonc
          Recognized Expert Expert
          • Sep 2006
          • 6478

          #5
          Originally posted by ghostdog74
          well, you should help us to help you by providing an example of the gpr format and your expected output, which in this case is csv.
          looking up the gpr extension, i can only find that it relates to some modeling software system...
          Hey ghostdog! Where you been so long?
          I actually found the GenePix Results format, but don't know if this is the correct one:
          Code:
          GPR Header
          A sample GPR file header and a description of each entry are shown below: 
          
          Entry Description 
           
          ATF     1.0 File type and version number. 
          29       48 Number of optional header records and number of data fields (columns). 
          "Type=GenePix Results 3" Type of ATF file. 
          "DateTime=2002/02/09 17:15:48" Date and time when the image was acquired. 
          "Settings=C:\Genepix\Genepix.gps" The name of the settings file that was used for analysis. 
          "GalFile=C:\Genepix\Demo.gal" The GenePix Array List file used to associate Names and IDs to each entry. 
          "PixelSize=10" Resolution of each pixel in µm. 
          "Wavelengths=635     532" Installed laser excitation sources in nm. 
          "ImageFiles=C:\Genepix\demo.tif 0
          C:\Genepix\Genepix.tif 1" The name and path of the associated TIF file(s). 
          "NormalizationMethod=None" The type of normalization method used, if applicable. 
          "NormalizationFactors=1    1" The normalization factor applied to each channel. 
          "JpegImage=C:\Genepix\demo.jpg" The name and path of the associated Jpeg image files. 
          "StdDev=Type 1" The type of standard deviation calculation selected in the Options settings. 
          "RatioFormulation=W1/W2 (635/532)" The ratio formulation of the ratio image, showing which image is numerator and which is denominator. 
          "Barcode=00331" The barcode symbols read from the image. 
          "BackgroundSubtraction=LocalFeature" The background subtraction method selected in the Options settings. 
          "ImageOrigin=0, 0" The origin of the image relative to the scan area. 
          "JpegOrigin=390, 4320" The origin of the Results JPEG image (the bounding box of the analysis Blocks) relative to the scan area origin. 
          "Creator=GenePix 4.1.1.4" The version of the GenePix Pro software used to create the Results file. 
          "Scanner=GenePix 4000B [serial number]" Type and serial number of scanner used to acquire the image. 
          "FocusPosition=0" The focus position setting used to acquire the image, in microns. 
          "Temperature=19.6127" The temperature of the scanner, in degrees C. 
          "LinesAveraged=1" The line average setting used to acquire the image. 
          "Comment=hyb 2673" User-entered file comment. 
          "PMTGain=500     600" The PMT settings during acquisition. 
          "ScanPower=100    100" The amount of laser transmission during acquisition. 
          "LaserPower=1    1" The power of each laser, in volts. 
          "LaserOnTime=5    5" The laser on-time for each laser, in minutes. 
          "Filters=<Empty>    <Empty>" Emission filters used during acquisition (GenePix 4100 and 4200 only.) 
          "ScanRegion=100,100,2000,2000" The coordinate values of the scan region used during acquisition, in pixels. 
          "Supplier=" Header field supplied in GAL file. 
          Data record column headings Column titles for each measurement (see below). 
          Data Records Extracted data. 
          
           
          
            
          
          GPR Data
          The list below describes each column of data in the Results file. 
          
          Column Title Description 
           
          Block the block number of the feature. 
          Column the column number of the feature. 
          Row the row number of the feature. 
          Name the name of the feature derived from the Array List (up to 40 characters long, contained in quotation marks). 
          ID the unique identifier of the feature derived from the Array List (up to 40 characters long, contained in quotation marks). 
          X the X-coordinate in µm of the center of the feature-indicator associated with the feature, where (0,0) is the top left of the image. 
          Y the Y-coordinate in µm of the center of the feature-indicator associated with the feature, where (0,0) is the top left of the image. 
          Dia. the diameter in µm of the feature-indicator. 
          F635 Median median feature pixel intensity at wavelength #1 (635 nm). 
          F635 Mean mean feature pixel intensity at wavelength #1 (635 nm). 
          F635 SD the standard deviation of the feature pixel intensity at wavelength #1 (635 nm). 
          B635 Median the median feature background intensity at wavelength #1 (635 nm). 
          B635 Mean the mean feature background intensity at wavelength #1 (635 nm). 
          B635 SD the standard deviation of the feature background intensity at wavelength #1 (635 nm). 
          % > B635 + 1 SD the percentage of feature pixels with intensities more than one standard deviation above the background pixel intensity, at wavelength #1 (635 nm). 
          % > B635 + 2 SD the percentage of feature pixels with intensities more than two standard deviations above the background pixel intensity, at wavelength #1 (635 nm). 
          F635 % Sat. the percentage of feature pixels at wavelength #1 that are saturated. 
          F532 Median median feature pixel intensity at wavelength #2 (532 nm). 
          F532 Mean mean feature pixel intensity at wavelength #2 (532 nm). 
          F532 SD the standard deviation of the feature intensity at wavelength #2 (532 nm). 
          B532 Median the median feature background intensity at wavelength #2 (532 nm). 
          B532 Mean the mean feature background intensity at wavelength #2 (532 nm). 
          B532 SD the standard deviation of the feature background intensity at wavelength #2 (532 nm). 
          % > B532 + 1 SD the percentage of feature pixels with intensities more than one standard deviation above the background pixel intensity, at wavelength #2 (532 nm). 
          % > B532 + 2 SD the percentage of feature pixels with intensities more than two standard deviations above the background pixel intensity, at wavelength #2 (532 nm). 
          F532 % Sat. the percentage of feature pixels at wavelength #2 that are saturated. 
          Ratio of Medians the ratio of the median intensities of each feature for each wavelength, with the median background subtracted. 
          Ratio of Means the ratio of the arithmetic mean intensities of each feature for each wavelength, with the median background subtracted. 
          Median of Ratios the median of pixel-by-pixel ratios of pixel intensities, with the median background subtracted. 
          Mean of Ratios the geometric mean of the pixel-by-pixel ratios of pixel intensities, with the median background subtracted. 
          Ratios SD the geometric standard deviation of the pixel intensity ratios. 
          Rgn Ratio the regression ratio of every pixel in a 2-feature-diameter circle around the center of the feature. 
          Rgn R² the coefficient of determination for the current regression value. 
          F Pixels the total number of feature pixels. 
          B Pixels the total number of background pixels. 
          Sum of Medians the sum of the median intensities for each wavelength, with the median background subtracted. 
          Sum of Means the sum of the arithmetic mean intensities for each wavelength, with the median background subtracted. 
          Log Ratio log (base 2) transform of the ratio of the medians. 
          Flags the type of flag associated with a feature. 
          Normalize the normalization status of the feature (included/not included). 
          F1 Median - B1 the median feature pixel intensity at wavelength #1 with the median background subtracted. 
          F2 Median - B2 the median feature pixel intensity at wavelength #2 with the median background subtracted. 
          F1 Mean - B1  the mean feature pixel intensity at wavelength #1 with the median background subtracted. 
          F2 Mean - B2 the mean feature pixel intensity at wavelength #2 with the median background subtracted. 
          SNR 1 the signal-to-noise ratio at wavelength #1, defined by (Mean Foreground 1- Mean Background 1) / (Standard deviation of Background 1) 
          F1 Total Intensity the sum of feature pixel intensities at wavelength #1 
          Index the number of the feature as it occurs on the array. 
          "User Defined" user-defined feature data read from the GAL file (GenePix Pro 4.1).
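A minimal sketch of how the header layout described above might be read in Python. The function name `read_atf_header` and the shape of its return value are made up for illustration and are not part of GenePix. It assumes the second line holds the number of optional header records followed by the number of columns, and that each optional record is a single quoted `key=value` line.

```python
def read_atf_header(lines):
    """Parse the ATF header from a list of text lines.

    Returns (n_header, n_columns, settings), where settings maps each
    optional header key to its raw string value.
    """
    first = lines[0].split()
    if not first or first[0] != 'ATF':
        raise ValueError('not an ATF file')
    counts = lines[1].split()
    n_header, n_columns = int(counts[0]), int(counts[1])
    settings = {}
    for record in lines[2:2 + n_header]:
        # Drop trailing tabs/newline, then the surrounding quotation marks.
        text = record.strip().strip('"')
        key, _, value = text.partition('=')
        settings[key] = value
    return n_header, n_columns, settings


# Hypothetical five-line sample: two optional records, five columns.
sample = [
    'ATF\t1.0\n',
    '2\t5\n',
    '"Type=GenePix Results 3"\n',
    '"PixelSize=10"\n',
    'Block\tColumn\tRow\tName\tID\n',
]
print(read_atf_header(sample))
```

The column headings then sit on the line right after the optional records, i.e. at index `2 + n_header`.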

          Comment

          • ghostdog74
            Recognized Expert Contributor
            • Apr 2006
            • 511

            #6
            hey barton
            i've been lurking around :-)...
            anyway, thanks for the gpr format. if it's correct, then it's now up to the OP to specify his requirements. :)

            Comment

            • baber
              New Member
              • Jan 2007
              • 4

              #7
              Originally posted by ghostdog74
              hey barton
              i've been lurking around :-)...
              anyway, thanks for the gpr format. if it's correct, then it's now up to the OP to specify his requirements. :)
              This example of gpr file is a good one.
              gpr format (microarray data file) is like this:

              Description
              line 1
              line 2
              Line n

              col1 col2 ..... coln
              line1 val1 val2 valn
              line2 etc etc
              line3 etc

              Now, I want to know how to convert gpr to csv with Python.

              Comment

              • bvdet
                Recognized Expert Specialist
                • Oct 2006
                • 2851

                #8
                Originally posted by baber
                This example of gpr file is a good one.
                gpr format (microarray data file) is like this:

                Description
                line 1
                line 2
                Line n

                col1 col2 ..... coln
                line1 val1 val2 valn
                line2 etc etc
                line3 etc

                Now, I want to know how to convert gpr to csv with Python.
                If I understand this format correctly, it is a tab delimited file. The script below will replace each tab with a comma and output to another file:
                Code:
                import os
                
                def tab_to_csv(tab_name, csv_name):
                    """Replace every tab with a comma; return True on success."""
                    try:
                        # 'with' closes both files even if an error occurs
                        with open(tab_name, 'r') as f1, open(csv_name, 'w') as f2:
                            for line in f1:
                                f2.write(line.replace('\t', ','))
                        return True
                    except (IOError, OSError):
                        # Only file errors are reported as failure
                        return False
                
                if __name__ == '__main__':
                    gpr_file = os.path.join('H:\\', 'TEMP', 'temsys', 'GPR.gpr')
                    csv_file = os.path.join('H:\\', 'TEMP', 'temsys', 'GPR.txt')
                    if tab_to_csv(gpr_file, csv_file):
                        print('Tab delimited file conversion to comma delimited file was successful')
                    else:
                        print('There was an error')
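One possible variation on the tab-replacement idea, sketched with the standard `csv` module so that fields containing commas are quoted in the output. The function name `tab_to_csv_rows` and the sample data are hypothetical, and the sketch assumes the input really is tab-delimited with `"` as the quote character.

```python
import csv
import io

def tab_to_csv_rows(src, dst):
    """Copy tab-delimited records from src to dst as CSV records."""
    reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(dst, lineterminator='\n')
    for row in reader:
        # The writer adds quotes only where a field needs them.
        writer.writerow(row)

# In-memory stand-ins for the input and output files.
src = io.StringIO('Name\tID\n"a, b"\tYAL002W\n')
dst = io.StringIO()
tab_to_csv_rows(src, dst)
print(dst.getvalue())
```

With real files, the `csv` documentation recommends opening them with `newline=''`, for example `open(csv_name, 'w', newline='')`.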

                Comment

                • bvdet
                  Recognized Expert Specialist
                  • Oct 2006
                  • 2851

                  #9
                  Here's some more information I found on the gpr format:
                  Originally posted by http://www.moleculardevices.com/pages/software/gn_genepix_file_formats.html#atf
                  ATF - Axon Text File format (*.atf)

                  ATF is a tab-delimited text file format that can be read by typical spreadsheet programs such as Microsoft Excel. It is used for GenePix Array List (GAL) files, and GenePix Results (GPR) files.

                  An ATF text file consists of records. Each line in the text file is a record. Each record may consist of several fields, separated by a field separator (column delimiter). The tab and comma characters are field separators. Space characters around a tab or comma are ignored and considered part of the field separator. Text strings are enclosed in quotation marks to ensure that any embedded spaces, commas and tabs are not mistaken for field separators.

                  The group of records at the beginning of the file is called the file header. The file header describes the file structure and includes column titles, units, and comments.
                  It would be great if baber could provide us with a sample gpr file so we could test it.
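If only the data table is wanted in the CSV, the file header described above could be skipped using the count on the second record. This is a rough sketch under assumptions: the name `atf_table_to_csv` is invented here, the input is taken to be tab-delimited, and the first number on line 2 is taken to be the number of optional header records.

```python
import csv
import io

def atf_table_to_csv(src, dst):
    """Write the column headings and data records of an ATF stream as CSV."""
    reader = csv.reader(src, delimiter='\t')
    next(reader)                                # "ATF <version>" record
    n_header = int(next(reader)[0].split()[0])  # optional header record count
    for _ in range(n_header):
        next(reader)                            # skip optional header records
    writer = csv.writer(dst, lineterminator='\n')
    for row in reader:
        # Spaces around a separator are not part of the field.
        writer.writerow([field.strip() for field in row])

# Hypothetical miniature ATF file with one optional header record.
src = io.StringIO(
    'ATF\t1.0\n'
    '1\t3\n'
    '"Type=GenePix Results 3"\n'
    'Block\tName\tID\n'
    '1\tVPS8\tYAL002W\n'
)
dst = io.StringIO()
atf_table_to_csv(src, dst)
print(dst.getvalue())
```

Rows with trailing tabs would come out with trailing empty fields, so a real file may need those trimmed as well.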
                  Last edited by bvdet; Jan 16 '07, 11:34 PM. Reason: add comment

                  Comment

                  • dshimer
                    Recognized Expert New Member
                    • Dec 2006
                    • 136

                    #10
                    1) This looks like a very straightforward text file: you could read in all the lines, split each line into a list, evaluate each list based on its contents, then just write it back out delimited by commas.

                    That said, I'll admit I'm still a bit confused by the format. Does this imply that each line "line 1" etc, is comprised of a bunch of data organized in columns? Or that there are N lines containing something, then a string of n entries of "col" data, followed by further strings of value data? In any case I can think of several ways to easily read and analyze the data, I just am not totally clear on what is being described.

                    Originally posted by baber
                    This example of gpr file is a good one.
                    gpr format (microarray data file) is like this:

                    Description
                    line 1
                    line 2
                    Line n

                    col1 col2 ..... coln
                    line1 val1 val2 valn
                    line2 etc etc
                    line3 etc

                    Now, I want to know how to convert gpr to csv with Python.

                    Comment

                    • bartonc
                      Recognized Expert Expert
                      • Sep 2006
                      • 6478

                      #11
                      Originally posted by baber
                      This example of gpr file is a good one.
                      gpr format (microarray data file) is like this:

                      Description
                      line 1
                      line 2
                      Line n

                      col1 col2 ..... coln
                      line1 val1 val2 valn
                      line2 etc etc
                      line3 etc

                      Now, I want to know how to convert gpr to csv with Python.
                      So this IS GenePix, right?

                      Comment

                      • ghostdog74
                        Recognized Expert Contributor
                        • Apr 2006
                        • 511

                        #12
                        Originally posted by baber
                        This example of gpr file is a good one.
                        gpr format (microarray data file) is like this:

                        Description
                        line 1
                        line 2
                        Line n

                        col1 col2 ..... coln
                        line1 val1 val2 valn
                        line2 etc etc
                        line3 etc

                        Now, I want to know how to convert gpr to csv with Python.
                        i don't really know what your desired output is, but since you specified csv, i guess you just want comma-separated values. Here's a bit of code:
                        Code:
                        import fileinput
                        # inplace=True redirects print() into the file being read
                        for line in fileinput.FileInput("file", inplace=True):
                            print(','.join(line.split()))

                        output:
                        Code:
                        line,1
                        line,2
                        Line,n
                        
                        col1,col2,.....,coln
                        line1,val1,val2,valn
                        line2,etc,etc
                        line3,etc

                        Comment

                        • bvdet
                          Recognized Expert Specialist
                          • Oct 2006
                          • 2851

                          #13
                          Originally posted by ghostdog74
                          i don't really know what your desired output is, but since you specified csv, i guess you just want comma-separated values. Here's a bit of code:
                          Code:
                          import fileinput
                          # inplace=True redirects print() into the file being read
                          for line in fileinput.FileInput("file", inplace=True):
                              print(','.join(line.split()))

                          output:
                          Code:
                          line,1
                          line,2
                          Line,n
                          
                          col1,col2,.....,coln
                          line1,val1,val2,valn
                          line2,etc,etc
                          line3,etc
                          It works except as indicated below. Before:
                          Code:
                          ATF	1			
                          8	5			
                          Type=GenePix ArrayList V1.0				
                          BlockCount=4				
                          BlockType=0				
                          URL=http://genome-www.stanford.edu/cgi-bin/dbrun/SacchDB?find+Locus+%22[ID]%22				
                          "Block1= 400, 400, 100, 24, 175, 5, 175"				
                          "Block2= 4896, 400, 100, 24, 175, 5, 175"				
                          "Block3= 400, 4896, 100, 24, 175, 5, 175"				
                          "Block4= 4896, 4896, 100, 24, 175, 5, 175"				
                          Block	Column	Row	Name	ID
                          1	1	1	VPS8	YAL002W
                          1	2	1	NTG1	YAL015C
                          After:
                          Code:
                          ATF,1
                          8,5
                          Type=GenePix ArrayList V1.0
                          BlockCount=4
                          BlockType=0
                          URL=http://genome-www.stanford.edu/cgi-bin/dbrun/SacchDB?find+Locus+%22[ID]%22
                          "Block1= 400, 400, 100, 24, 175, 5, 175"
                          "Block2= 4896, 400, 100, 24, 175, 5, 175"
                          "Block3= 400, 4896, 100, 24, 175, 5, 175"
                          "Block4= 4896, 4896, 100, 24, 175, 5, 175"
                          Block,Column,Row,Name,ID
                          1,1,1,VPS8,YAL002W
                          1,2,1,NTG1,YAL015C
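                          To prevent unwanted commas at embedded spaces and trailing tabs, strip trailing tab and newline characters and split on tabs:
                          Code:
                          import fileinput
                          # Rewrites gpr_file in place, keeping the original as a '.bak' copy
                          for line in fileinput.input(gpr_file, inplace=True, backup='.bak'):
                              print(','.join(line.rstrip('\t\n').split('\t')))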
                          Good post ghostdog. I did not know about fileinput.
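To make the difference between the splitting approaches concrete, here is a small worked example on one line from the "Before" listing above, assuming it ends in four trailing tabs and a newline. The variable names are only for illustration.

```python
line = 'Type=GenePix ArrayList V1.0\t\t\t\t\n'

# Splitting on any whitespace also breaks the field at its embedded spaces.
by_whitespace = ','.join(line.split())
# Splitting on tabs alone keeps the field whole but leaves trailing commas.
by_tabs = ','.join(line.split('\t'))
# Stripping trailing tabs and the newline first avoids both problems.
by_tabs_stripped = ','.join(line.rstrip('\t\n').split('\t'))

print(repr(by_whitespace))     # 'Type=GenePix,ArrayList,V1.0'
print(repr(by_tabs))           # 'Type=GenePix ArrayList V1.0,,,,\n'
print(repr(by_tabs_stripped))  # 'Type=GenePix ArrayList V1.0'
```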

                          Comment

                          • baber
                            New Member
                            • Jan 2007
                            • 4

                            #14
                            Thanks a lot, now I can convert .gpr to .csv.

                            Baber

                            Comment

                            • bartonc
                              Recognized Expert Expert
                              • Sep 2006
                              • 6478

                              #15
                              Originally posted by baber
                              Thanks a lot, now I can convert .gpr to .csv.

                              Baber
                              Awesome! Thanks for the update.

                              Comment
