Format File Easy one (not for me)!

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Andrew Slentz

    Format File Easy one (not for me)!

    I have a format file which is working but not correctly. It is, for
    some reason, dropping the first line of the input .csv file. The
    problem is something with the second coulumn of data having quotes in
    it. Any ideas? Below is some info.

    Format file (I use firstrow=2 in Bulk Import command:
    8.0
    6
    1 SQLCHAR 0 3000 ",\"" 1 Provider_Raw_ID Latin1_General_ CI_AS
    2 SQLCHAR 0 3000 "\"," 0 none_name Latin1_General_ CI_AS
    3 SQLCHAR 0 3000 "," 0 none_Spec_orig Latin1_General_ CI_AS
    4 SQLCHAR 0 3000 "," 3 SpecialtyCode Latin1_General_ CI_AS
    5 SQLCHAR 0 3000 "," 2 Category Latin1_General_ CI_AS
    6 SQLCHAR 0 3000 "\r\n" 4 NetworkCompared To Latin1_General_ CI_AS

    Sample input file:
    ID,NAME,SPEC_OR IGINAL,SPEC,CAT EGORY,NetworkCo mparedTo
    1,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
    2,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
    3,"Aaron, Arnold H, DO",General Practice,GP,PCP ,netcomp1
    4,"Abae, Mick, MD",Reproductiv e Endocrinology,O BEN,OB,netcomp1
    5,"Abanilla, Fernando M, MD",Nephrology, IMNE,SPEC,netco mp1
    6,"Abaunza, Ramiro J, MD",Obstetric s/Gynecology,OBGY ,OB,netcomp1
    7,"Abaunza-Fiallos, Yanina J, MD",Pediatrics, PD,PED,netcomp1
    8,"Abbas, Rahat, MD",Internal Medicine,IM,PCP ,netcomp1

    Thanks a lot!!!

    Andrew




    *** Sent via Devdex http://www.devdex.com ***
    Don't just participate in USENET...get rewarded for it!
  • Simon Hayes

    #2
    Re: Format File Easy one (not for me)!


    "Andrew Slentz" <ajslentz@yahoo .com> wrote in message
    news:40cfb33e$0 $25524$c397aba@ news.newsgroups .ws...[color=blue]
    > I have a format file which is working but not correctly. It is, for
    > some reason, dropping the first line of the input .csv file. The
    > problem is something with the second coulumn of data having quotes in
    > it. Any ideas? Below is some info.
    >
    > Format file (I use firstrow=2 in Bulk Import command:
    > 8.0
    > 6
    > 1 SQLCHAR 0 3000 ",\"" 1 Provider_Raw_ID Latin1_General_ CI_AS
    > 2 SQLCHAR 0 3000 "\"," 0 none_name Latin1_General_ CI_AS
    > 3 SQLCHAR 0 3000 "," 0 none_Spec_orig Latin1_General_ CI_AS
    > 4 SQLCHAR 0 3000 "," 3 SpecialtyCode Latin1_General_ CI_AS
    > 5 SQLCHAR 0 3000 "," 2 Category Latin1_General_ CI_AS
    > 6 SQLCHAR 0 3000 "\r\n" 4 NetworkCompared To Latin1_General_ CI_AS
    >
    > Sample input file:
    > ID,NAME,SPEC_OR IGINAL,SPEC,CAT EGORY,NetworkCo mparedTo
    > 1,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
    > 2,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
    > 3,"Aaron, Arnold H, DO",General Practice,GP,PCP ,netcomp1
    > 4,"Abae, Mick, MD",Reproductiv e Endocrinology,O BEN,OB,netcomp1
    > 5,"Abanilla, Fernando M, MD",Nephrology, IMNE,SPEC,netco mp1
    > 6,"Abaunza, Ramiro J, MD",Obstetric s/Gynecology,OBGY ,OB,netcomp1
    > 7,"Abaunza-Fiallos, Yanina J, MD",Pediatrics, PD,PED,netcomp1
    > 8,"Abbas, Rahat, MD",Internal Medicine,IM,PCP ,netcomp1
    >
    > Thanks a lot!!!
    >
    > Andrew
    >
    >
    >
    >
    > *** Sent via Devdex http://www.devdex.com ***
    > Don't just participate in USENET...get rewarded for it![/color]

    As a previous poster noted, Erland recently posted some useful comments on
    format files which might be helpful. You might also want to consider DTS,
    which is usually easier to set up than a format file (at least that's my
    opinion), and is reasonably 'intelligent' about delimiters and file formats.
    It also has facilities for transforming data as its loaded, if you need to
    handle issues such as different date formats, for example.

    But DTS is a much more complex tool, and dynamically setting source file
    names and destination database names is definitely awkward unless you have
    some basic VBScript/COM knowledge. If you don't do much data importing, then
    it might be overkill, but if you're comfortable with that sort of
    programming, it could be another option to consider.

    Simon


    Comment

    • Erland Sommarskog

      #3
      Re: Format File Easy one (not for me)!

      [posted and mailed, please reply in news]

      Andrew Slentz (ajslentz@yahoo .com) writes:[color=blue]
      > I have a format file which is working but not correctly. It is, for
      > some reason, dropping the first line of the input .csv file. The
      > problem is something with the second coulumn of data having quotes in
      > it. Any ideas? Below is some info.
      >...
      > Sample input file:
      > ID,NAME,SPEC_OR IGINAL,SPEC,CAT EGORY,NetworkCo mparedTo
      > 1,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
      > 2,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
      > 3,"Aaron, Arnold H, DO",General Practice,GP,PCP ,netcomp1
      > 4,"Abae, Mick, MD",Reproductiv e Endocrinology,O BEN,OB,netcomp1
      > 5,"Abanilla, Fernando M, MD",Nephrology, IMNE,SPEC,netco mp1
      > 6,"Abaunza, Ramiro J, MD",Obstetric s/Gynecology,OBGY ,OB,netcomp1
      > 7,"Abaunza-Fiallos, Yanina J, MD",Pediatrics, PD,PED,netcomp1
      > 8,"Abbas, Rahat, MD",Internal Medicine,IM,PCP ,netcomp1[/color]

      Alas, BCP is not able to handle this file, because BCP does not
      know what a column header is. It thinks that the entire file is
      rows and field in those rows.

      I made longer post on BCP last night, and I recommend you to look it
      up, because it may give you some understanding on how BCP works. You
      will not that the datafile in that post, too, had column header, but
      in that case BCP was able to sort out the problem. That was because
      the line with the column names did not contain the delimiter for the
      first field.

      I don't know DTS, but it is possible that DTS could handle this better.
      Else you would have some program that stripped that first line. Or
      simply added a dummy delimiter to each row but the first, and then
      change the format file accordingly.

      --
      Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

      Books Online for SQL Server SP3 at
      SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

      Comment

      • Andrew Slentz

        #4
        Re: Format File Easy one (not for me)!

        Would it be possible to use FIRSTROW=2? I am actually using Bulk Insert
        and it's an available option. I tried it with no luck and got so far as
        to have the first column in the header row in the first column of the
        database but the rest of the data was right.

        Also... If I could get something like a "|" or something like that
        placed at the end of the header column would that work as a distinct
        delimeter???

        Thanks for all of your help!

        Andrew



        *** Sent via Devdex http://www.devdex.com ***
        Don't just participate in USENET...get rewarded for it!

        Comment

        • Erland Sommarskog

          #5
          Re: Format File Easy one (not for me)!

          Andrew Slentz (ajslentz@yahoo .com) writes:[color=blue]
          > Would it be possible to use FIRSTROW=2? I am actually using Bulk Insert
          > and it's an available option. I tried it with no luck and got so far as
          > to have the first column in the header row in the first column of the
          > database but the rest of the data was right.[/color]

          No, FIRSTROW=2 won't cut it. BCP looks for the delimiter for the first
          column, then the delimiter for the second, the for the third, and so on
          to the last, and then back to the beginning. There is no inherent meaning
          of line feed.
          [color=blue]
          > Also... If I could get something like a "|" or something like that
          > placed at the end of the header column would that work as a distinct
          > delimeter???[/color]

          But you have to have that delitimer on every record in the file. If you
          put | on each line, but the header line, you could say that the first
          field is terminated by |, and then say that this field does not map to
          a database column.



          --
          Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

          Books Online for SQL Server SP3 at
          SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

          Comment

          • Erland Sommarskog

            #6
            Re: Format File Easy one (not for me)!

            [posted and mailed, please reply in news]

            Andrew Slentz (ajslentz@yahoo .com) writes:[color=blue]
            > Would it be possible to use FIRSTROW=2? I am actually using Bulk Insert
            > and it's an available option. I tried it with no luck and got so far as
            > to have the first column in the header row in the first column of the
            > database but the rest of the data was right.
            >
            > Also... If I could get something like a "|" or something like that
            > placed at the end of the header column would that work as a distinct
            > delimeter???[/color]

            Here are some more suggestions how you could change the file. This was
            was your latest posting, that I have seen:

            ID,NAME,SPEC_OR IGINAL,SPEC,CAT EGORY,NetworkCo mparedTo
            1,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
            2,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
            3,"Aaron, Arnold H, DO",General Practice,GP,PCP ,netcomp1
            4,"Abae, Mick, MD",Reproductiv e Endocrinology,O BEN,OB,netcomp1
            5,"Abanilla, Fernando M, MD",Nephrology, IMNE,SPEC,netco mp1
            6,"Abaunza, Ramiro J, MD",Obstetric s/Gynecology,OBGY ,OB,netcomp1
            7,"Abaunza-Fiallos, Yanina J, MD",Pediatrics, PD,PED,netcomp1
            8,"Abbas, Rahat, MD",Internal Medicine,IM,PCP ,netcomp1

            This could work:

            ID,"NAME",SPEC_ ORIGINAL,SPEC,C ATEGORY,Network ComparedTo
            1,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
            2,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
            3,"Aaron, Arnold H, DO",General Practice,GP,PCP ,netcomp1
            4,"Abae, Mick, MD",Reproductiv e Endocrinology,O BEN,OB,netcomp1
            5,"Abanilla, Fernando M, MD",Nephrology, IMNE,SPEC,netco mp1
            6,"Abaunza, Ramiro J, MD",Obstetric s/Gynecology,OBGY ,OB,netcomp1
            7,"Abaunza-Fiallos, Yanina J, MD",Pediatrics, PD,PED,netcomp1
            8,"Abbas, Rahat, MD",Internal Medicine,IM,PCP ,netcomp1

            This would work, because the header row now has the same delimiters
            as the rest of the file. In this case you would use FIRSTROW=2 to
            skip the headerline.

            Here's another one:

            ID,NAME,SPEC_OR IGINAL,SPEC,CAT EGORY,NetworkCo mparedTo
            !1,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
            !2,"Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1
            !3,"Aaron, Arnold H, DO",General Practice,GP,PCP ,netcomp1
            !4,"Abae, Mick, MD",Reproductiv e Endocrinology,O BEN,OB,netcomp1
            !5,"Abanilla, Fernando M, MD",Nephrology, IMNE,SPEC,netco mp1
            !6,"Abaunza, Ramiro J, MD",Obstetric s/Gynecology,OBGY ,OB,netcomp1
            !7,"Abaunza-Fiallos, Yanina J, MD",Pediatrics, PD,PED,netcomp1
            !8,"Abbas, Rahat, MD",Internal Medicine,IM,PCP ,netcomp1

            Here you would add one more field to the format file, which would be
            termimated by ! and where the database column would be 0. In this case,
            you should leave FIRSTROW=1. The entire header row, including the newline
            would then be the first field of the first record, which you would simply
            ignore.

            Yet a twist:

            NAME,SPEC_ORIGI NAL,SPEC,CATEGO RY,NetworkCompa redTo,ID
            "Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1,1
            "Aaron, Arnold H, DO",Family Practice General Practice,FP,PCP ,netcomp1,2
            "Aaron, Arnold H, DO",General Practice,GP,PCP ,netcomp1,3
            "Abae, Mick, MD",Reproductiv e Endocrinology,O BEN,OB,netcomp1 ,4
            "Abanilla, Fernando M, MD",Nephrology, IMNE,SPEC,netco mp1,5
            "Abaunza, Ramiro J, MD",Obstetric s/Gynecology,OBGY ,OB,netcomp1,6
            "Abaunza-Fiallos, Yanina J, MD",Pediatrics, PD,PED,netcomp1 ,7
            "Abbas, Rahat, MD",Internal Medicine,IM,PCP ,netcomp1,8

            Again, you would add one more column to the format file, this time
            terminated by ", and again you should have FIRSTROW=1.

            The actual format file for these cases are left as an exercise to the
            reader. :-)

            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

            Books Online for SQL Server SP3 at
            SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

            Comment

            Working...