Decimating Excel files

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • gonzlobo

    Decimating Excel files

    No, I don't want to destroy them (funny how the word 'decimate' has
    changed definition over the years) :).

    We have a data acquisition program that saves its output to Excel's
    ..xls format. Unfortunately, the programmer was too stupid to write
    files the average user can read.

    I'd like some advice on how to go about:
    1. Reading a large Excel file and chop it into many Excel files (with
    only 65535 lines per file)
    or
    2. Decimate an Excel file & write... say every other line (user
    selectable)... to a new file.

    I'm pretty experienced at reading and writing simple text files, but
    this is my first foray into using COM. I would imagine either choice 1
    or 2 is pretty simple once I can get the file open.

    Thanks in advance.
  • mensanator@aol.com

    #2
    Re: Decimating Excel files

    On Feb 3, 1:43?pm, gonzlobo <gonzl...@gmail .comwrote:
    No, I don't want to destroy them (funny how the word 'decimate' has
    changed definition over the years) :).
    >
    We have a data acquisition program that saves its output to Excel's
    .xls format. Unfortunately, the programmer was too stupid to write
    files the average user can read.
    >
    I'd like some advice on how to go about:
    1. Reading a large Excel file and chop it into many Excel files (with
    only 65535 lines per file)
    An Excel sheet only has 65535 lines. Or do yo mean it has
    multiple sheets?
    or
    2. Decimate an Excel file & write... say every other line (user
    selectable)... to a new file.
    Excel has VBA and can do this easily. One thing about
    Excel's VBA is that it already understands Excel.
    >
    I'm pretty experienced at reading and writing simple text files, but
    this is my first foray into using COM. I would imagine either choice 1
    or 2 is pretty simple once I can get the file open.
    Is it the case where you have .xls files but not the application?
    >
    Thanks in advance.

    Comment

    • gonzlobo

      #3
      Re: Decimating Excel files

      Thanks, but I was looking for a python solution.
      Excel has VBA and can do this easily. One thing about
      Excel's VBA is that it already understands Excel.

      Comment

      • greg

        #4
        Re: Decimating Excel files

        gonzlobo wrote:
        2. Decimate an Excel file & write... say every other line (user
        selectable)... to a new file.
        Every other line would be bicimating or something,
        wouldn't it?

        --
        Greg

        Comment

        • Phil Schmidt

          #5
          Re: Decimating Excel files

          gonzlobo wrote:
          We have a data acquisition program that saves its output to Excel's
          .xls format. Unfortunately, the programmer was too stupid to write
          files the average user can read.
          >
          I'd like some advice on how to go about:
          1. Reading a large Excel file and chop it into many Excel files (with
          only 65535 lines per file)
          or
          2. Decimate an Excel file & write... say every other line (user
          selectable)... to a new file.
          Is the file format really native Excel, or is a CSV or TSV file? I've
          seen apps (one is a data acquisition program, as a matter of fact)
          that create "Excel" files that are just CSV or TSV files. Try opening
          the file with a text editor to see if it's plain ASCII text.

          In any case, if it's a CSV or TSV file, the Python CSV module is your
          friend.

          Comment

          • gonzlobo

            #6
            Re: Decimating Excel files

            Yeah, it definitely an Excel file (so says Kate).
            Is the file format really native Excel, or is a CSV or TSV file? I've
            seen apps (one is a data acquisition program, as a matter of fact)
            that create "Excel" files that are just CSV or TSV files. Try opening
            the file with a text editor to see if it's plain ASCII text.

            Comment

            • Paddy

              #7
              Re: Decimating Excel files

              On Feb 3, 7:43 pm, gonzlobo <gonzl...@gmail .comwrote:
              No, I don't want to destroy them (funny how the word 'decimate' has
              changed definition over the years) :).
              >
              We have a data acquisition program that saves its output to Excel's
              .xls format. Unfortunately, the programmer was too stupid to write
              files the average user can read.
              >
              I'd like some advice on how to go about:
              1. Reading a large Excel file and chop it into many Excel files (with
              only 65535 lines per file)
              or
              2. Decimate an Excel file & write... say every other line (user
              selectable)... to a new file.
              >
              I'm pretty experienced at reading and writing simple text files, but
              this is my first foray into using COM. I would imagine either choice 1
              or 2 is pretty simple once I can get the file open.
              >
              Thanks in advance.
              I got a fair amount of hits for xls-to-csv in Google..

              - Paddy.
              ..

              Comment

              • jean-michel bain-cornu

                #8
                Re: Decimating Excel files

                Hi,
                We have a data acquisition program that saves its output to Excel's
                ..xls format. Unfortunately, the programmer was too stupid to write
                files the average user can read.
                >
                I'd like some advice on how to go about:
                1. Reading a large Excel file and chop it into many Excel files (with
                only 65535 lines per file)
                or
                2. Decimate an Excel file & write... say every other line (user
                selectable)... to a new file.
                >
                I'm pretty experienced at reading and writing simple text files, but
                this is my first foray into using COM. I would imagine either choice 1
                or 2 is pretty simple once I can get the file open.
                When I have Excel stuff to do, I use :
                Download pyExcelerator for free. Generating Excel 97+ files with Python 2.4+ (need decorators), importing Excel 95+ files, support for UNICODE in Excel files, using variety of formatting features and printing options, Excel files and OLE2 compound files dumper. No need in Windows/COM

                May be it can cover your needs ?
                From my point of view, COM is something to avoid : no docs, lot of
                investigations, weak results.
                Regards
                jm

                Comment

                • Arnd

                  #9
                  Re: Decimating Excel files

                  Every other line would be bicimating or something,
                  wouldn't it?
                  Good observation, but as we have numbers of type Cardinalia,
                  Ordinalia, Distributiva & Multiplicativa in Latin I would prefer
                  secundating or secondating. (Bisimating or bicimating would multiply
                  the lines by a factor 2)

                  ;) Arnd

                  Comment

                  • greg

                    #10
                    Re: Decimating Excel files

                    Arnd wrote:
                    Good observation, but as we have numbers of type Cardinalia,
                    Ordinalia, Distributiva & Multiplicativa in Latin I would prefer
                    secundating or secondating. (Bisimating or bicimating would multiply
                    the lines by a factor 2)
                    Interesting. But does this mean that "duplicatin g" is
                    actually from the wrong root? And also we have
                    "bifurcatio n", which means splitting in two rather
                    than multiplication by two -- or does that come
                    down to the same thing?

                    --
                    Greg

                    Comment

                    • Arnd

                      #11
                      Re: Decimating Excel files

                      On 5 Feb., 10:53, greg <g...@cosc.cant erbury.ac.nzwro te:
                      Interesting. But does this mean that "duplicatin g" is
                      actually from the wrong root?
                      by definition: roots are never wrong ;)
                      But indeed, you're right, one has to look at the root (eg connected
                      verb) to understand the Numeralia they used:
                      The number 2 has Numeralia duo, secundus, bini and bis:

                      duo + plicare (=to fold): Cardinalia (how much? ->n-fold)
                      And also we have "bifurcatio n",
                      yes, or eg "bisection" (from "bis"): Multiplicativa (how many times?)
                      To complete the story:
                      "binary" (from "bini"): Distributiva (how much each time?, to build
                      groups or distributions)
                      "second" (from "secundus") : Ordinalia (which? -the 2nd, the 3rd etc)

                      Arnd (I hated Latin)


                      Comment

                      • Gabriel Genellina

                        #12
                        Re: Decimating Excel files

                        En Sat, 03 Feb 2007 18:52:10 -0300, mensanator@aol. com
                        <mensanator@aol .comescribió:
                        On Feb 3, 1:43?pm, gonzlobo <gonzl...@gmail .comwrote:
                        >We have a data acquisition program that saves its output to Excel's
                        >.xls format. Unfortunately, the programmer was too stupid to write
                        >files the average user can read.
                        >>
                        >I'd like some advice on how to go about:
                        >1. Reading a large Excel file and chop it into many Excel files (with
                        >only 65535 lines per file)
                        >
                        An Excel sheet only has 65535 lines. Or do yo mean it has
                        multiple sheets?
                        As I understand the problem, the OP has a program that generates the .xls
                        files, but it's so dumb that writes files too large for Excel to read.
                        I'd try the "xlrd" package - it is capable of reading Excel files on any
                        platform.

                        --
                        Gabriel Genellina

                        Comment

                        • mensanator@aol.com

                          #13
                          Re: Decimating Excel files

                          On Feb 5, 5:46 pm, "Gabriel Genellina" <gagsl...@yahoo .com.arwrote:
                          En Sat, 03 Feb 2007 18:52:10 -0300, mensana...@aol. com
                          <mensana...@aol .comescribió:
                          >
                          On Feb 3, 1:43?pm, gonzlobo <gonzl...@gmail .comwrote:
                          We have a data acquisition program that saves its output to Excel's
                          .xls format. Unfortunately, the programmer was too stupid to write
                          files the average user can read.
                          >
                          I'd like some advice on how to go about:
                          1. Reading a large Excel file and chop it into many Excel files (with
                          only 65535 lines per file)
                          >
                          An Excel sheet only has 65535 lines. Or do yo mean it has
                          multiple sheets?
                          >
                          As I understand the problem, the OP has a program that generates the .xls
                          files, but it's so dumb that writes files too large for Excel to read.
                          My first thought was how would that be possible?

                          But then, nothing's stopping someone from making
                          a million line .csv file (which Excel thinks it "owns")
                          that would be too big for Excel to open.

                          If that's the case, then chasing COM is barking up
                          the wrong tree.
                          I'd try the "xlrd" package - it is capable of reading Excel files on any
                          platform.
                          >
                          --
                          Gabriel Genellina

                          Comment

                          • John Machin

                            #14
                            Re: Decimating Excel files

                            On Feb 6, 10:46 am, "Gabriel Genellina" <gagsl...@yahoo .com.arwrote:
                            En Sat, 03 Feb 2007 18:52:10 -0300, mensana...@aol. com
                            <mensana...@aol .comescribió:
                            >
                            On Feb 3, 1:43?pm, gonzlobo <gonzl...@gmail .comwrote:
                            We have a data acquisition program that saves its output to Excel's
                            .xls format. Unfortunately, the programmer was too stupid to write
                            files the average user can read.
                            >
                            I'd like some advice on how to go about:
                            1. Reading a large Excel file and chop it into many Excel files (with
                            only 65535 lines per file)
                            >
                            An Excel sheet only has 65535 lines. Or do yo mean it has
                            multiple sheets?
                            >
                            As I understand the problem, the OP has a program that generates the .xls
                            files, but it's so dumb that writes files too large for Excel to read.
                            I'd try the "xlrd" package - it is capable of reading Excel files on any
                            platform.
                            Thanks for the plug, Gabriel. However xlrd is not the panacea for all
                            evils and all idiocies :-)

                            Excel "file sizes" are limited by the number of rows and columns that
                            a particular version's file format will support in each worksheet.
                            There may be a limit on the maximum number of worksheets in a file,
                            but I've never heard of this as a problem.

                            Before Excel 97 aka 8.0, the limits were 16384 rows x 256 columns.

                            Excel 97 (8.0) up to Excel 2003 (11.0) allow 65536 rows by 256
                            columns.

                            Excel 2007 aka 12.0 (just released) raises the limits to 2**20 rows x
                            16384 columns.

                            Let's presume the OP is talking about files written in the format
                            ("BIFF8") that is expected by Excel 97-2003.

                            In cell data records, there are 16 bits for an unsigned row number. If
                            the file writer writes zillions of rows, with row numbers modulo
                            65536, then not even xlrd can help the OP -- not out of the box;
                            rescue would be possible with a tweaked version *if* the rows were
                            written in sequential order.

                            There are also 16 bits for an unsigned column number. It is possible
                            to write 65536 columns, but I'd guess that Excel would refuse to open
                            the file, or go bananas. In any case the OP's problem seems to be with
                            too many rows.

                            In any case #2, what the OP said was [my emphasis added]:
                            "Unfortunat ely, the programmer was too stupid to write files *the
                            average user* can read." -- i.e. it's not Excel being grumpy, it's
                            the average user.

                            It would help a great deal if the OP would say what the problem really
                            is ...

                            Cheers,
                            John

                            Comment

                            • John Machin

                              #15
                              Re: Decimating Excel files

                              On Feb 6, 12:27 pm, "mensana...@aol .com" <mensana...@aol .comwrote:
                              On Feb 5, 5:46 pm, "Gabriel Genellina" <gagsl...@yahoo .com.arwrote:
                              >
                              >
                              >
                              En Sat, 03 Feb 2007 18:52:10 -0300, mensana...@aol. com
                              <mensana...@aol .comescribió:
                              >
                              On Feb 3, 1:43?pm, gonzlobo <gonzl...@gmail .comwrote:
                              >We have a data acquisition program that saves its output to Excel's
                              >.xls format. Unfortunately, the programmer was too stupid to write
                              >files the average user can read.
                              >
                              >I'd like some advice on how to go about:
                              >1. Reading a large Excel file and chop it into many Excel files (with
                              >only 65535 lines per file)
                              >
                              An Excel sheet only has 65535 lines. Or do yo mean it has
                              multiple sheets?
                              >
                              As I understand the problem, the OP has a program that generates the .xls
                              files, but it's so dumb that writes files too large for Excel to read.
                              >
                              My first thought was how would that be possible?
                              >
                              But then, nothing's stopping someone from making
                              a million line .csv file (which Excel thinks it "owns")
                              that would be too big for Excel to open.
                              >
                              If that's the case, then chasing COM is barking up
                              the wrong tree.
                              >
                              To clear up the doubts, I'd suggest that the OP do something like this
                              at the Python interactive prompt:

                              print repr(open('nast y_file.xls', 'rb').read(512) )

                              If that produces recognisable stuff, then it's a CSV file (or a tab
                              separated file) masquerading as an XLS file.

                              OTOH, if it produces a bunch of hex starting with "\xd0\xcf
                              \x11\xe0\xa1\xb 1\x1a\xe1" then it's at least an OLE2 compound document
                              -- could be Word or Powerpoint, though :-)

                              What would be even better is, if the OP has downloaded xlrd:

                              Presuming Windows, and Python installed in default location, and xlrd
                              installed using its setup.py, do this at the command prompt

                              c:\python25\pyt hon c:\python25\scr ipts\runxlrd.py ov nasty_file.xls

                              This should give an overview ("ov") of the file, showing for each
                              worksheet how many columns and rows are used, plus other potentially
                              helpful information -- or it will raise an exception; still useful
                              information.

                              Cheers,
                              John


                              Comment

                              Working...