Problems with parsing uploaded csv file contents

  • codesid
    New Member
    • Dec 2006
    • 6

    Problems with parsing uploaded csv file contents

    I could not find anything on the web about this issue, so I am opening a discussion here. Maybe someone can help me out or suggest a better way to handle it.

    Environment:
    Windows XP Pro, VS2003, .NET 1.1, C#

    The Case:
    When we read the contents of a CSV file from a "multipart/form-data" POST, the contents come through with strange characters, similar to double spaces but not exactly spaces (very tricky to clean up). I searched for this case but did not find anything related. So I managed to solve it in a very dirty way, which I am embarrassed to show here (lol).

    We can upload (without using any COMs) files in 2 ways, basically:

    1) using Request.BinaryRead (manual handling of the posted data)
    2) using HtmlInputFile.PostedFile (using .NET web controls)

    I've tested my case for both and the results are the same.

    Eg.

    piece of the file = [bar foo,bar@foo.com,]
    after obtaining the info from the post = [
    b a r f o o , b a r @ f o o . c o m , ]

    When I run a simple script to separate all the elements of the content I receive, this is what it looks like:

    [ ] [
    ](this is a new line content) [ ] [b] [ ] [a] [ ] [r] [ ] [ ] [ ] [f] [ ] [o] [ ] [o] [ ] [,] [ ] [b] [ ] [a] [ ] [r] [ ] [@] [ ] [f] [ ] [o] [ ] [o] [ ] [.] [ ] [c] [ ] [o] [ ] [m] [ ] [,] [ ]

    For your information, the script:

    // line is one line read from the CSV file
    for (int i = 0; i < line.Length; i++)
    {
        // Take one character at a time and print it inside brackets
        string digit = line.Substring(i, 1);
        Response.Write(" [" + digit + "] ");
    }
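
A small variation on the loop above prints each character's numeric code instead of the character itself, which makes invisible characters identifiable. This is only a sketch: `CharDump` and `Describe` are hypothetical names, not part of the original script, and the result would still need to be passed to Response.Write.

```csharp
using System;
using System.Text;

public static class CharDump
{
    // Returns every character of the line as a four-digit hex code,
    // e.g. "[0062] [0000] ", so that invisible characters show up.
    public static string Describe(string line)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in line)
        {
            sb.Append('[').Append(((int)c).ToString("X4")).Append("] ");
        }
        return sb.ToString();
    }
}
```

Once the code of the strange character is known, it can be compared or replaced directly by value.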

    Note: Not all CSV files have this problem. I got this one from Google's export features (Gmail, orkut, etc.).

    The funny thing is that when I display the contents on a web page, everything seems OK, because the strange characters do not appear. I first discovered this when I wrote a script to store the CSV contents in a database automatically. The data looked very strange, because all the strange spaces were there in the database, including the "new line", which does not disappear even if you try to replace it with something else.

    When I tried comparing against those spaces, nothing I tried would match or clean them:

    line = line.Replace(" ", "") -> does not work
    digit.Equals(string.Empty) -> always returns false
    "" + digit == "" -> is false
    "" + digit == " " -> is false

    So I lost hope of identifying this character, which is neither empty nor blank. I went for the hash code instead, which apparently solved the problem, but as I said before, I would never advise anyone to do something like this.

    So, simply read everything char by char, and clean them up...

    string correctedLine = "";
    for (int i = 0; i < line.Length; i++)
    {
        string digit = line.Substring(i, 1);
        // Skip the two strange characters, identified here only by their hash codes
        if (digit.GetHashCode() != 5381 && digit.GetHashCode() != 177583)
        {
            correctedLine += digit;
        }
    }
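
As a hedged alternative to comparing hash codes, the same cleanup can compare characters by value. The sketch below rests on an assumption that this thread does not confirm: that the strange characters are control characters such as U+0000, which is what typically appears when a UTF-16 file is decoded with a single-byte encoding. `CsvCleanup`, `StripControlChars`, and `DecodeUtf16` are hypothetical names.

```csharp
using System;
using System.Text;

public static class CsvCleanup
{
    // Drops control characters (including U+0000) except tab,
    // comparing by character value rather than by hash code.
    public static string StripControlChars(string line)
    {
        StringBuilder sb = new StringBuilder(line.Length);
        foreach (char c in line)
        {
            if (!char.IsControl(c) || c == '\t')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // ASSUMPTION: the uploaded bytes are UTF-16 little-endian.
    // Decodes them as such and removes a leading byte order mark.
    public static string DecodeUtf16(byte[] raw)
    {
        string text = Encoding.Unicode.GetString(raw);
        return text.TrimStart('\uFEFF');
    }
}
```

If the assumption holds, decoding the raw bytes with the right encoding (for example the bytes obtained from Request.BinaryRead) would avoid the cleanup step altogether. If it does not hold, `StripControlChars` still removes only characters that `char.IsControl` reports as control characters.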

    Has anyone ever seen this?

    Thanks in advance.
  • codesid
    New Member
    • Dec 2006
    • 6

    #2
    Hasn't anyone run into a similar problem so far?


    • bplacker
      New Member
      • Sep 2006
      • 121

      #3
      Are the strange characters line breaks? Try comparing against or replacing vbCrLf or something like that. I remember that when dealing with CSVs in Java, the character or character sequence for line breaks was something strange.
