Performance over COM

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Linan

    Performance over COM

    Recently I wrote a program to help my wife to extract data from about
    500 excel files. For each file, around 1000 cells are visited and the
    texts are retrieved.

    Firstly I wrote a python script, used win32com module. The program ran
    well, but took 78 mins to accomplished the task. Of course it was a
    little bit slow, so I gave csharp a go. The csharp program took 72
    mins.

    Both Excel and .Net are MS product, run on a microsoft operating
    system. However, I don't see any performance advantages. I don't want
    to fire a script/compile laugange or opensource/ms war. My question is,
    any tricks to speed COM related operation up in csharp, without using
    multiple threading?

    My code:
    <pre>
    Application excelApp=new Application();

    // Go through 500+ excel files
    foreach(string file in fileList){

    excelApp.Workbo oks.Open(file,T ype.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing);

    // Go through 1000+ cells in each file
    foreach(Diction aryEntry de in cellHash){
    string id=(string)de.K ey;
    string[] val=(string[])de.Value;
    string
    v=(string)excel App.Sheet[val[1]].get_Range(val[0],Missing.Value) .Text;
    Console.WriteLi ne("{0}\t{1}\t{ 2}\t{3}",id,day ,time,v);
    }

    excelApp.Workbo oks.Close();
    }
    </pre>

  • Stephany Young

    #2
    Re: Performance over COM

    If what you have shown us is, in fact, the code for your actual process then
    some factors jump out and hit one between the eyes, including:

    the retrieval of the value from the Excel cell is executed 500,000+
    times

    the Console.Writeli ne instructuion is also executed 500,000+ times

    If you run the program with the Console.Writeli ne instruction commented out,
    you will get a feel for just how long the interaction with Excel takes.

    Each interaction with Excel is an expensive operation so the ideal situation
    is to reduce the number of interactions to the absolute minimum.

    You are reading from the same cells in each Workbook. Instead of reading
    form each cell, try reading the the entire range into an array and then
    reading the individual 'cells' from the array. This will reduce the number
    of interactions with Excel some 1000 fold.



    "Linan" <tali.wang@gmai l.comwrote in message
    news:1157840553 .873111.195310@ e3g2000cwe.goog legroups.com...
    Recently I wrote a program to help my wife to extract data from about
    500 excel files. For each file, around 1000 cells are visited and the
    texts are retrieved.
    >
    Firstly I wrote a python script, used win32com module. The program ran
    well, but took 78 mins to accomplished the task. Of course it was a
    little bit slow, so I gave csharp a go. The csharp program took 72
    mins.
    >
    Both Excel and .Net are MS product, run on a microsoft operating
    system. However, I don't see any performance advantages. I don't want
    to fire a script/compile laugange or opensource/ms war. My question is,
    any tricks to speed COM related operation up in csharp, without using
    multiple threading?
    >
    My code:
    <pre>
    Application excelApp=new Application();
    >
    // Go through 500+ excel files
    foreach(string file in fileList){
    >
    excelApp.Workbo oks.Open(file,T ype.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
    Type.Missing);
    >
    // Go through 1000+ cells in each file
    foreach(Diction aryEntry de in cellHash){
    string id=(string)de.K ey;
    string[] val=(string[])de.Value;
    string
    v=(string)excel App.Sheet[val[1]].get_Range(val[0],Missing.Value) .Text;
    Console.WriteLi ne("{0}\t{1}\t{ 2}\t{3}",id,day ,time,v);
    }
    >
    excelApp.Workbo oks.Close();
    }
    </pre>
    >

    Comment

    • Sylvain Lafontaine

      #3
      Re: Performance over COM

      I will add to that that instead of opening Excell with Automation (something
      that will always be slow whatever the scripting language you are using); you
      can try reading the file using the JET OLEDB provider or the Excel ODBC
      Provider:









      Finally, for this type of work, probably that you should use a database
      (Access or SQL-Server) instead of Excel. Excel is something like a big
      calculator, not a database.

      --
      Sylvain Lafontaine, ing.
      MVP - Technologies Virtual-PC
      E-mail: http://cerbermail.com/?QugbLEWINF


      "Stephany Young" <noone@localhos twrote in message
      news:OSwCNIG1GH A.4924@TK2MSFTN GP05.phx.gbl...
      If what you have shown us is, in fact, the code for your actual process
      then some factors jump out and hit one between the eyes, including:
      >
      the retrieval of the value from the Excel cell is executed 500,000+
      times
      >
      the Console.Writeli ne instructuion is also executed 500,000+ times
      >
      If you run the program with the Console.Writeli ne instruction commented
      out, you will get a feel for just how long the interaction with Excel
      takes.
      >
      Each interaction with Excel is an expensive operation so the ideal
      situation is to reduce the number of interactions to the absolute minimum.
      >
      You are reading from the same cells in each Workbook. Instead of reading
      form each cell, try reading the the entire range into an array and then
      reading the individual 'cells' from the array. This will reduce the
      number of interactions with Excel some 1000 fold.
      >
      >
      >
      "Linan" <tali.wang@gmai l.comwrote in message
      news:1157840553 .873111.195310@ e3g2000cwe.goog legroups.com...
      >Recently I wrote a program to help my wife to extract data from about
      >500 excel files. For each file, around 1000 cells are visited and the
      >texts are retrieved.
      >>
      >Firstly I wrote a python script, used win32com module. The program ran
      >well, but took 78 mins to accomplished the task. Of course it was a
      >little bit slow, so I gave csharp a go. The csharp program took 72
      >mins.
      >>
      >Both Excel and .Net are MS product, run on a microsoft operating
      >system. However, I don't see any performance advantages. I don't want
      >to fire a script/compile laugange or opensource/ms war. My question is,
      >any tricks to speed COM related operation up in csharp, without using
      >multiple threading?
      >>
      >My code:
      ><pre>
      >Application excelApp=new Application();
      >>
      >// Go through 500+ excel files
      >foreach(stri ng file in fileList){
      >>
      >excelApp.Workb ooks.Open(file, Type.Missing, Type.Missing, Type.Missing,
      >Type.Missing , Type.Missing, Type.Missing, Type.Missing, Type.Missing,
      >Type.Missing , Type.Missing, Type.Missing, Type.Missing, Type.Missing,
      >Type.Missing );
      >>
      >// Go through 1000+ cells in each file
      >foreach(Dictio naryEntry de in cellHash){
      >string id=(string)de.K ey;
      >string[] val=(string[])de.Value;
      >string
      >v=(string)exce lApp.Sheet[val[1]].get_Range(val[0],Missing.Value) .Text;
      >Console.WriteL ine("{0}\t{1}\t {2}\t{3}",id,da y,time,v);
      >}
      >>
      >excelApp.Workb ooks.Close();
      >}
      ></pre>
      >>
      >
      >

      Comment

      • Otis Mukinfus

        #4
        Re: Performance over COM

        On 9 Sep 2006 15:22:34 -0700, "Linan" <tali.wang@gmai l.comwrote:
        >Recently I wrote a program to help my wife to extract data from about
        >500 excel files. For each file, around 1000 cells are visited and the
        >texts are retrieved.
        >
        >Firstly I wrote a python script, used win32com module. The program ran
        >well, but took 78 mins to accomplished the task. Of course it was a
        >little bit slow, so I gave csharp a go. The csharp program took 72
        >mins.
        >
        >Both Excel and .Net are MS product, run on a microsoft operating
        >system. However, I don't see any performance advantages. I don't want
        >to fire a script/compile laugange or opensource/ms war. My question is,
        >any tricks to speed COM related operation up in csharp, without using
        >multiple threading?
        >
        >My code:
        ><pre>
        >Application excelApp=new Application();
        >
        >// Go through 500+ excel files
        >foreach(stri ng file in fileList){
        >
        > excelApp.Workbo oks.Open(file,T ype.Missing, Type.Missing, Type.Missing,
        >Type.Missing , Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        >Type.Missing , Type.Missing, Type.Missing, Type.Missing, Type.Missing,
        >Type.Missing );
        >
        > // Go through 1000+ cells in each file
        > foreach(Diction aryEntry de in cellHash){
        > string id=(string)de.K ey;
        > string[] val=(string[])de.Value;
        > string
        >v=(string)exce lApp.Sheet[val[1]].get_Range(val[0],Missing.Value) .Text;
        > Console.WriteLi ne("{0}\t{1}\t{ 2}\t{3}",id,day ,time,v);
        > }
        >
        > excelApp.Workbo oks.Close();
        >}
        ></pre>
        Ask your wife to save her files as CSV files and process them by reading the
        text files produced by the Excel save to CSV. You will be able to read them
        much faster and process them faster too.
        Good luck with your project,

        Otis Mukinfus


        Comment

        • Linan

          #5
          Re: Performance over COM

          The OleDb solution speeds up the program. Now it takes only 35 mins.

          Thank you very much!

          Sylvain Lafontaine (fill the blanks, no spam please) wrote:
          I will add to that that instead of opening Excell with Automation (something
          that will always be slow whatever the scripting language you are using); you
          can try reading the file using the JET OLEDB provider or the Excel ODBC
          Provider:
          >

          >

          >

          >

          >
          Finally, for this type of work, probably that you should use a database
          (Access or SQL-Server) instead of Excel. Excel is something like a big
          calculator, not a database.
          >
          --
          Sylvain Lafontaine, ing.
          MVP - Technologies Virtual-PC
          E-mail: http://cerbermail.com/?QugbLEWINF
          >
          >
          "Stephany Young" <noone@localhos twrote in message
          news:OSwCNIG1GH A.4924@TK2MSFTN GP05.phx.gbl...
          If what you have shown us is, in fact, the code for your actual process
          then some factors jump out and hit one between the eyes, including:

          the retrieval of the value from the Excel cell is executed 500,000+
          times

          the Console.Writeli ne instructuion is also executed 500,000+ times

          If you run the program with the Console.Writeli ne instruction commented
          out, you will get a feel for just how long the interaction with Excel
          takes.

          Each interaction with Excel is an expensive operation so the ideal
          situation is to reduce the number of interactions to the absolute minimum.

          You are reading from the same cells in each Workbook. Instead of reading
          form each cell, try reading the the entire range into an array and then
          reading the individual 'cells' from the array. This will reduce the
          number of interactions with Excel some 1000 fold.



          "Linan" <tali.wang@gmai l.comwrote in message
          news:1157840553 .873111.195310@ e3g2000cwe.goog legroups.com...
          Recently I wrote a program to help my wife to extract data from about
          500 excel files. For each file, around 1000 cells are visited and the
          texts are retrieved.
          >
          Firstly I wrote a python script, used win32com module. The program ran
          well, but took 78 mins to accomplished the task. Of course it was a
          little bit slow, so I gave csharp a go. The csharp program took 72
          mins.
          >
          Both Excel and .Net are MS product, run on a microsoft operating
          system. However, I don't see any performance advantages. I don't want
          to fire a script/compile laugange or opensource/ms war. My question is,
          any tricks to speed COM related operation up in csharp, without using
          multiple threading?
          >
          My code:
          <pre>
          Application excelApp=new Application();
          >
          // Go through 500+ excel files
          foreach(string file in fileList){
          >
          excelApp.Workbo oks.Open(file,T ype.Missing, Type.Missing, Type.Missing,
          Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
          Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
          Type.Missing);
          >
          // Go through 1000+ cells in each file
          foreach(Diction aryEntry de in cellHash){
          string id=(string)de.K ey;
          string[] val=(string[])de.Value;
          string
          v=(string)excel App.Sheet[val[1]].get_Range(val[0],Missing.Value) .Text;
          Console.WriteLi ne("{0}\t{1}\t{ 2}\t{3}",id,day ,time,v);
          }
          >
          excelApp.Workbo oks.Close();
          }
          </pre>
          >

          Comment

          Working...