How can I align values as columns?

  • vignesh1985
    New Member
    • Apr 2007
    • 11

    Hello sir,

    I'm Vignesh from Singapore. Could you please tell me how I can separate the ID and AC values into columns from the following file? There is a three-space gap between the line code (ID, AC) and its value (underlined). Please help me with this.

    Regards
    vignesh

    Code:
    ID   [u]A0JQA7_LACS1[/u]            Unreviewed;       258 AA.
    AC   [u]A0JQA7[/u];
    DT   12-DEC-2006, integrated into UniProtKB/TrEMBL.
    DT   12-DEC-2006, sequence version 1.
    DT   29-MAY-2007, entry version 4.
    //
    ID   [u]A0JQB0_LACS1[/u]            Unreviewed;        71 AA.
    AC   [u]A0JQB0[/u];
    DT   12-DEC-2006, integrated into UniProtKB/TrEMBL.
    DT   12-DEC-2006, sequence version 1.
    DT   06-MAR-2007, entry version 2.
    //
    ID   [u]A0JQB4_LACS1[/u]            Unreviewed;        57 AA.
    AC   [u]A0JQB4[/u];
    DT   12-DEC-2006, integrated into UniProtKB/TrEMBL.
    DT   12-DEC-2006, sequence version 1.
    DT   06-MAR-2007, entry version 2.
    DE   Hypothetical protein.
    //
  • prn
    Recognized Expert Contributor
    • Apr 2007
    • 254

    #2
    Hi Vignesh,

    I'm not sure I understand the question. Let's start with what you want. If I understand (and that's doubtful) you want to take what you gave us as input and get something like the following as output:
    Code:
    ID   A0JQA7_LACS1
    AC   A0JQA7;
    //
    ID   A0JQB0_LACS1
    AC   A0JQB0;
    //
    ID   A0JQB4_LACS1
    AC   A0JQB4;
    Is this correct? If not, then what should be different? Should the semicolons be removed from the AC lines? Do you want the "//" lines? What else?

    Best Regards,
    Paul


    • vignesh1985
      New Member
      • Apr 2007
      • 11

      #3
      Thank you for the reply.

      Yes, this is correct, sir. The semicolons and the "//" lines are not needed. The output should look like the following:

      ID AC

      A0JQA7_LACS1 A0JQA7
      A0JQB0_LACS1 A0JQB0
      A0JQB4_LACS1 A0JQB4

      There should be a tab between the two columns, sir.

      regards
      vignesh


      • KevinADC
        Recognized Expert Specialist
        • Jan 2007
        • 4092

        #4
        Hi vignesh,

        what code have you tried so far?

        Kevin


        • prn
          Recognized Expert Contributor
          • Apr 2007
          • 254

          #5
          Hi Vignesh,

          OK. I think I understand a little bit better what you are looking for. Let me confirm a couple more details:
          1) The "Three space Gap" you mentioned is in the source file? I had initially thought you meant that you wanted a "Three space Gap" in the output file, but you seem to want a tab.
          2) You want the IDs and ACs lined up in columns! AHA! Yet another spot where I had misunderstood.
          3) Are the sample data representative of the actual data? That is, are the IDs always 12 characters? (This may be important if you want the column headers to line up since a single tab between "ID" and "AC" in the column headers will put the "AC" column header far to the left of the AC column. Beyond that, if some of the IDs are shorter -- less than 8 characters -- or longer -- more than 16 characters -- the IDs and ACs will not line up in nice columns without a good deal more processing.) If visual columns are important, then we need to know more. OTOH, if the tab is just so that you have "tab-delimited" data, then we don't need to worry, but then it's not clear what the purpose of the apparent column headers is.
          4) Are the first 6 characters of the IDs always the same as the corresponding ACs? If so, why are you bothering with the ACs?
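To illustrate point 3, here is a small sketch contrasting tab-delimited rows with fixed-width rows built by `sprintf`. The 16-character column width and the helper names `tab_row` and `fixed_row` are arbitrary choices for this illustration, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Tab-delimited: easy for other programs to parse, but the columns
# only look aligned when every first field fits the same tab stop.
sub tab_row {
    my ($id, $ac) = @_;
    return join("\t", $id, $ac) . "\n";
}

# Fixed width: left-justify the first field and pad it to 16
# characters, so the columns line up visually in a monospaced font.
sub fixed_row {
    my ($id, $ac) = @_;
    return sprintf("%-16s%s\n", $id, $ac);
}

my @rows = (
    ['ID',           'AC'],
    ['A0JQA7_LACS1', 'A0JQA7'],
    ['A0JQB0_LACS1', 'A0JQB0'],
);

print "Tab-delimited:\n";
print tab_row(@$_) for @rows;
print "Fixed width:\n";
print fixed_row(@$_) for @rows;
```

If the output is meant for a spreadsheet or another script, the tab-delimited form is probably the one to use; the fixed-width form only matters when a person reads the file directly.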

          As Kevin said, can you let us know what you have tried and where your efforts have bogged down? At the very least, this should help us understand what kind of issues you need help with.

          Best Regards,
          Paul


          • vignesh1985
            New Member
            • Apr 2007
            • 11

            #6
            Hi,

            1) Yes, I want a tab delimiter between ID and AC.

            2) As I mentioned before, I want the IDs and ACs arranged column-wise.

            3) The column headers should also be delimited by a single tab.

            4) No. Here the IDs and the ACs are different, so I need the full AC and ID strings, as I mentioned before. These should also be delimited by a single tab.

            Code:
            ID A0JQA7_LACS1 Unreviewed; 258 AA.
            AC A0JQA7;
            DT 12-DEC-2006, integrated into UniProtKB/TrEMBL.
            DT 12-DEC-2006, sequence version 1.
            DT 29-MAY-2007, entry version 4.
            DE CI-like repressor, phage associated.
            GN OrderedLocusNames=LSL_0242;
            OS Lactobacillus salivarius subsp. salivarius (strain UCC118).
            OC Bacteria; Firmicutes; Lactobacillales; Lactobacillaceae;
            OC Lactobacillus.
            OX NCBI_TaxID=362948;
            //
            ID A0JQB0_LACS1 Unreviewed; 71 AA.
            AC A0JQB0;
            DT 12-DEC-2006, integrated into UniProtKB/TrEMBL.
            DT 12-DEC-2006, sequence version 1.
            DT 06-MAR-2007, entry version 2.
            DE Hypothetical protein.
            GN OrderedLocusNames=LSL_0245;
            OS Lactobacillus salivarius subsp. salivarius (strain UCC118).
            OC Bacteria; Firmicutes; Lactobacillales; Lactobacillaceae;
            OC Lactobacillus.
            OX NCBI_TaxID=362948;
            //
            ID A0JQB4_LACS1 Unreviewed; 57 AA.
            AC A0JQB4;
            DT 12-DEC-2006, integrated into UniProtKB/TrEMBL.
            DT 12-DEC-2006, sequence version 1.
            DT 06-MAR-2007, entry version 2.
            DE Hypothetical protein.
            GN OrderedLocusNames=LSL_0249;
            OS Lactobacillus salivarius subsp. salivarius (strain UCC118).
            OC Bacteria; Firmicutes; Lactobacillales; Lactobacillaceae;
            OC Lactobacillus.
            OX NCBI_TaxID=362948;
            //
            1) My data file looks like the above. I want to extract only the ID values and AC values, column-wise.

            2) I tried a program like the following:

            [CODE=perl]
            ########################
            # Paths of the input and output text files.

            $fh1="/home/divya/Desktop/sample program/sample1.txt";
            $fh2="/home/divya/Desktop/sample program/output.txt";

            open(IN1,"<$fh1 ")|| die "can't open file";

            $index=0;

            while ($line1 = <IN1>) {
                # Split each line into two columns on the three-space gap.
                # I need only two columns, so I am not bothering about the other characters.
                ($a1,$b1) = split(/\s\s\s/,$line1,2);

                push(@id,$a1);
                push(@value,$b1 );

                $index++;
            }

            close(IN1);

            $len1 = @id;

            print "Len= $len1 \n";

            open(IN2,">$fh2 ")|| die "can't open file";

            for ($i=0; $i<$len1; $i++) {
                if($id[$i] eq "ID") {
                    print IN2 ("$value[$i]\n");
                    break;
                }
            }

            close(IN2);
            [/CODE]

            ############### #######
            3) I don't know how to extract and print the AC values that correspond to the ID values.

            Thank you for replying.

            regards
            vignesh
            Last edited by miller; Jun 29 '07, 06:06 AM. Reason: Code Tag and ReFormatting


            • KevinADC
              Recognized Expert Specialist
              • Jan 2007
              • 4092

              #7
              Code:
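              while (my $line = <$fh>) {        # $fh: your already-opened input filehandle
                  if ($line =~ /^ID|^AC/) {     # only the ID and AC lines are of interest
                      # extract your data here
                  }
              }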
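Fleshing out the outline above, here is a minimal sketch of one way to do it. It assumes every record has one ID line and one AC line before its "//" terminator, and that the first whitespace-separated token after the line code is the value wanted (only the first accession is taken if an AC line lists several). The sub name `extract_id_ac` and the inline sample text are illustrative, not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the text of a UniProt-style flat file and
# returns a list of [ID, AC] pairs, one pair per record.
sub extract_id_ac {
    my ($text) = @_;
    my (@rows, $id, $ac);
    for my $line (split /\n/, $text) {
        if ($line =~ /^ID\s+(\S+)/) {
            $id = $1;                       # entry name, e.g. A0JQA7_LACS1
        }
        elsif ($line =~ /^AC\s+([^;\s]+)/) {
            $ac = $1;                       # first accession, semicolon dropped
        }
        elsif ($line =~ m{^//}) {           # end-of-record marker
            push @rows, [$id, $ac] if defined $id && defined $ac;
            ($id, $ac) = (undef, undef);    # reset for the next record
        }
    }
    return @rows;
}

# Shortened sample based on the data posted earlier in the thread.
my $sample = <<'END';
ID   A0JQA7_LACS1            Unreviewed;       258 AA.
AC   A0JQA7;
DT   12-DEC-2006, integrated into UniProtKB/TrEMBL.
//
ID   A0JQB0_LACS1            Unreviewed;        71 AA.
AC   A0JQB0;
//
END

# Header row, then one tab-delimited row per record.
print "ID\tAC\n";
print join("\t", @$_), "\n" for extract_id_ac($sample);
```

For a real file, the same three tests could be applied inside a `while (my $line = <$fh>)` loop instead of splitting a string, which avoids holding the whole file in memory.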


              • KevinADC
                Recognized Expert Specialist
                • Jan 2007
                • 4092

                #8
                There is no "break" function in Perl.
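For readers coming from C: Perl's loop-control keywords are `last` (leave the loop) and `next` (skip to the next iteration). The sketch below shows both on made-up data; the helper name `first_index_of` is hypothetical. Note that for the task in this thread the loop probably should not exit at the first ID at all, since every record is wanted.

```perl
use strict;
use warnings;

# Made-up line codes, for illustration only.
my @tags = ('DT', 'ID', 'AC', 'ID');

# Hypothetical helper: index of the first element equal to $want, or -1.
sub first_index_of {
    my ($want, @list) = @_;
    my $found = -1;
    for my $i (0 .. $#list) {
        next unless $list[$i] eq $want;   # skip entries that do not match
        $found = $i;
        last;                             # Perl's counterpart to C's "break"
    }
    return $found;
}

print "First ID is at index ", first_index_of('ID', @tags), "\n";
```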
