Finding Peoples' Names in Files

  • brad

    Finding Peoples' Names in Files

    Crazy question, but has anyone attempted this or seen Python code that
    does? For example, if a text file contained 'Guido' and/or 'Robert'
    and/or 'Susan', then we should return True, otherwise return False.
  • cokofreedom@gmail.com

    #2
    Re: Finding Peoples' Names in Files

    On Oct 11, 5:22 pm, brad <byte8b...@gmail.com> wrote:
    > Crazy question, but has anyone attempted this or seen Python code that
    > does? For example, if a text file contained 'Guido' and/or 'Robert'
    > and/or 'Susan', then we should return True, otherwise return False.

    Can't you just use re.findall()?
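
A minimal sketch of the `re.findall()` approach for a list of names known in advance. The `NAMES` list and the `contains_known_name` helper are illustrative assumptions, not something posted in the thread.

```python
import re

# Hypothetical list of known names, used only for illustration.
NAMES = ["Guido", "Robert", "Susan"]

# \b word boundaries keep "Robert" from matching inside "Roberta".
NAME_RE = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, NAMES)))

def contains_known_name(text):
    """Return True if any known name occurs in text as a whole word."""
    return bool(NAME_RE.findall(text))
```

This only covers the case where the names are known beforehand; it does nothing for arbitrary, unknown names.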


    • Tim Williams

      #3
      Re: Finding Peoples' Names in Files

      On 11/10/2007, brad <byte8bits@gmail.com> wrote:
      > Crazy question, but has anyone attempted this or seen Python code that
      > does? For example, if a text file contained 'Guido' and/or 'Robert'
      > and/or 'Susan', then we should return True, otherwise return False.

      # fname is the path of the file to search
      Text = open(fname).read()

      def a_function():
          # Return True as soon as any of the known names appears in the text
          for Name in ['Guido', 'Robert', 'Susan']:
              if Name in Text:
                  return True
          return False

      if a_function():
          print("A name was found")

      :)


      • brad

        #4
        Re: Finding Peoples' Names in Files

        cokofreedom@gmail.com wrote:
        > On Oct 11, 5:22 pm, brad <byte8b...@gmail.com> wrote:
        >> Crazy question, but has anyone attempted this or seen Python code that
        >> does? For example, if a text file contained 'Guido' and/or 'Robert'
        >> and/or 'Susan', then we should return True, otherwise return False.
        >
        > Can't you just use re.findall()?

        I mean *any* possible person's name... I don't *know* the names
        beforehand :)


        • Francesco Guerrieri

          #5
          Re: Finding Peoples' Names in Files

          On 10/11/07, brad <byte8bits@gmail.com> wrote:
          > cokofreedom@gmail.com wrote:
          >> On Oct 11, 5:22 pm, brad <byte8b...@gmail.com> wrote:
          >>> Crazy question, but has anyone attempted this or seen Python code that
          >>> does? For example, if a text file contained 'Guido' and/or 'Robert'
          >>> and/or 'Susan', then we should return True, otherwise return False.
          >>
          >> Can't you just use re.findall()?
          >
          > I mean *any* possible person's name... I don't *know* the names
          > beforehand :)

          "I cannot combine some characters

          dhcmrlchtdj

          which the divine Library has not foreseen and which in one of
          its secret tongues do not contain a terrible meaning. No one can
          articulate a syllable which is not filled with tenderness and fear,
          which is not, in one of these languages, the powerful name of a god."

          Jorge Luis Borges, The Library of Babel


          • brad

            #6
            Re: Finding Peoples' Names in Files

            cokofreedom@gmail.com wrote:
            > However...how can you know it is a name...

            OK, I admitted in my first post that it was a crazy question, but if one
            could find an answer, one would be onto something. Maybe it's not a 100%
            answerable question, but I would guess that it is an 80% answerable
            question... I just don't know how... yet :)

            Besides admitting that it's a crazy question, I should stop and explain
            how it would be useful to me at least. Is a credit card number itself
            valuable? I would think not. One can easily regex-match and Luhn-check
            credit card numbers located in files with a great degree of accuracy, but
            a number without a name is not very useful to me. So, if one could
            associate names with Luhn-checked numbers automatically, then one would
            be onto something. Or at least say, "hey, this file has Luhn-validated
            CCs *AND* it seems to have people's names in it as well." Then I'd have
            less to review, or perhaps as much as I have now, but I could push the
            files with both numbers and names to the top of the list so that they
            would be reviewed first.
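
The regex-plus-Luhn step described above might look roughly like the following. This is a sketch under assumptions: the candidate pattern (13 to 16 digits, optionally separated by single spaces or hyphens) and the function names are illustrative, not Brad's actual code.

```python
import re

# Assumption: card-like runs of 13-16 digits, optionally separated
# by single spaces or hyphens.
CANDIDATE_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_candidates(text):
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits
```

A file for which `find_card_candidates` returns a non-empty list could then be moved up the review queue, with any name heuristic used only to rank files, not to exclude them.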

            Brad




            • Matimus

              #7
              Re: Finding Peoples' Names in Files

              On Oct 11, 9:11 am, brad <byte8b...@gmail.com> wrote:
              > cokofree...@gmail.com wrote:
              >> However...how can you know it is a name...
              >
              > OK, I admitted in my first post that it was a crazy question, but if one
              > could find an answer, one would be onto something. Maybe it's not a 100%
              > answerable question, but I would guess that it is an 80% answerable
              > question... I just don't know how... yet :)
              >
              > Besides admitting that it's a crazy question, I should stop and explain
              > how it would be useful to me at least. Is a credit card number itself
              > valuable? I would think not. One can easily regex-match and Luhn-check
              > credit card numbers located in files with a great degree of accuracy, but
              > a number without a name is not very useful to me. So, if one could
              > associate names with Luhn-checked numbers automatically, then one would
              > be onto something. Or at least say, "hey, this file has Luhn-validated
              > CCs *AND* it seems to have people's names in it as well." Then I'd have
              > less to review, or perhaps as much as I have now, but I could push the
              > files with both numbers and names to the top of the list so that they
              > would be reviewed first.
              >
              > Brad

              What the hell are you doing? Your post sounds to me like you have a
              huge amount of stolen, or at the very least misappropriated, data. Now
              you want to search it for credit card numbers and names so that you
              can use them.

              I am not cool with this! This is a public forum about a programming
              language. What makes you think that anybody in this forum will be cool
              with that? Perhaps you aren't doing anything illegal, but it sure is
              coming off that way. If you are doing something illegal, I hope you get
              caught.

              At the very least, you might want to clarify why you are looking for
              such capability so that you don't get effectively black-listed (well,
              by me at least).

              Matt


              • Dan Stromberg

                #8
                Re: Finding Peoples' Names in Files

                On Thu, 11 Oct 2007 11:22:50 -0400, brad wrote:
                > Crazy question, but has anyone attempted this or seen Python code that
                > does? For example, if a text file contained 'Guido' and/or 'Robert'
                > and/or 'Susan', then we should return True, otherwise return False.

                It'll be hard to handle the Dweezils and Moon Units of the world (I
                believe these are Frank Zappa's kids?), but you could compile a list of
                reasonably common names by gaining access to a Usenet news spool and
                pulling the names from the headers.

                But then this is starting to sound dangerously like a spam campaign - in
                which case, "Please don't!".



                • byte8bits@gmail.com

                  #9
                  Re: Finding Peoples' Names in Files

                  On Oct 11, 12:49 pm, Matimus <mccre...@gmail.com> wrote:
                  > On Oct 11, 9:11 am, brad <byte8b...@gmail.com> wrote:
                  >> cokofree...@gmail.com wrote:
                  >>> However...how can you know it is a name...
                  >>
                  >> OK, I admitted in my first post that it was a crazy question, but if one
                  >> could find an answer, one would be onto something. Maybe it's not a 100%
                  >> answerable question, but I would guess that it is an 80% answerable
                  >> question... I just don't know how... yet :)
                  >>
                  >> Besides admitting that it's a crazy question, I should stop and explain
                  >> how it would be useful to me at least. Is a credit card number itself
                  >> valuable? I would think not. One can easily regex-match and Luhn-check
                  >> credit card numbers located in files with a great degree of accuracy, but
                  >> a number without a name is not very useful to me. So, if one could
                  >> associate names with Luhn-checked numbers automatically, then one would
                  >> be onto something. Or at least say, "hey, this file has Luhn-validated
                  >> CCs *AND* it seems to have people's names in it as well." Then I'd have
                  >> less to review, or perhaps as much as I have now, but I could push the
                  >> files with both numbers and names to the top of the list so that they
                  >> would be reviewed first.
                  >>
                  >> Brad
                  >
                  > What the hell are you doing? Your post sounds to me like you have a
                  > huge amount of stolen, or at the very least misappropriated, data. Now
                  > you want to search it for credit card numbers and names so that you
                  > can use them.
                  >
                  > I am not cool with this! This is a public forum about a programming
                  > language. What makes you think that anybody in this forum will be cool
                  > with that? Perhaps you aren't doing anything illegal, but it sure is
                  > coming off that way. If you are doing something illegal, I hope you get
                  > caught.
                  >
                  > At the very least, you might want to clarify why you are looking for
                  > such capability so that you don't get effectively black-listed (well,
                  > by me at least).
                  >
                  > Matt

                  Go have a beer and calm down a bit :) It's a legitimate purpose,
                  although it could be (and probably is being) used by bad guys right now.
                  My intent, as you can see from the links below, is to catch it before
                  the bad guys do.




                  Brad




                  • John J. Lee

                    #10
                    Re: Finding Peoples' Names in Files

                    brad <byte8bits@gmail.com> writes:
                    > Crazy question, but has anyone attempted this or seen Python code that
                    > does? For example, if a text file contained 'Guido' and/or 'Robert'
                    > and/or 'Susan', then we should return True, otherwise return False.

                    A few ideas:

                    1. If you don't have a list of names, find a list of words that
                    doesn't contain proper nouns (there are a few word lists out there,
                    not sure if any exclude people's names, though). Look for short runs
                    of two or three "words" (punctuation-separated tokens) in the email
                    that aren't in the dictionary. Some of them will be people's names.

                    2. Send the text through Google translate and look for runs of words
                    that are unchanged. Some of them will be people's names.

                    3. Search the literature and look for fancy algorithms. Here are some
                    papers (the last mentions some commercial software to do this):








                    John
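
The first idea above can be sketched as follows. Everything here is an assumption for illustration: `COMMON_WORDS` is a tiny stand-in for a real word list that excludes proper nouns, and `candidate_names` is a hypothetical helper name.

```python
import re

# Assumption: a tiny stand-in for a real dictionary without proper nouns.
COMMON_WORDS = {"the", "meeting", "with", "was", "moved", "to",
                "friday", "please", "call"}

TOKEN_RE = re.compile(r"[A-Za-z]+")

def candidate_names(text, words=COMMON_WORDS, max_run=3):
    """Return runs of 2..max_run consecutive tokens that are not in the word list."""
    runs, current = [], []
    for token in TOKEN_RE.findall(text):
        if token.lower() not in words:
            current.append(token)
        else:
            if 2 <= len(current) <= max_run:
                runs.append(" ".join(current))
            current = []
    # Flush a run that reaches the end of the text.
    if 2 <= len(current) <= max_run:
        runs.append(" ".join(current))
    return runs
```

As John notes, only some of the runs will be people's names; misspellings, jargon, and place names would show up too, so the output is better treated as a ranking signal than as a definite answer.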


                    • Chris Mellon

                      #11
                      Re: Finding Peoples' Names in Files

                      On 10/11/07, byte8bits@gmail.com <byte8bits@gmail.com> wrote:
                      > On Oct 11, 12:49 pm, Matimus <mccre...@gmail.com> wrote:
                      >> On Oct 11, 9:11 am, brad <byte8b...@gmail.com> wrote:
                      >>> cokofree...@gmail.com wrote:
                      >>>> However...how can you know it is a name...
                      >>>
                      >>> OK, I admitted in my first post that it was a crazy question, but if one
                      >>> could find an answer, one would be onto something. Maybe it's not a 100%
                      >>> answerable question, but I would guess that it is an 80% answerable
                      >>> question... I just don't know how... yet :)
                      >>>
                      >>> Besides admitting that it's a crazy question, I should stop and explain
                      >>> how it would be useful to me at least. Is a credit card number itself
                      >>> valuable? I would think not. One can easily regex-match and Luhn-check
                      >>> credit card numbers located in files with a great degree of accuracy, but
                      >>> a number without a name is not very useful to me. So, if one could
                      >>> associate names with Luhn-checked numbers automatically, then one would
                      >>> be onto something. Or at least say, "hey, this file has Luhn-validated
                      >>> CCs *AND* it seems to have people's names in it as well." Then I'd have
                      >>> less to review, or perhaps as much as I have now, but I could push the
                      >>> files with both numbers and names to the top of the list so that they
                      >>> would be reviewed first.
                      >>>
                      >>> Brad
                      >>
                      >> What the hell are you doing? Your post sounds to me like you have a
                      >> huge amount of stolen, or at the very least misappropriated, data. Now
                      >> you want to search it for credit card numbers and names so that you
                      >> can use them.
                      >>
                      >> I am not cool with this! This is a public forum about a programming
                      >> language. What makes you think that anybody in this forum will be cool
                      >> with that? Perhaps you aren't doing anything illegal, but it sure is
                      >> coming off that way. If you are doing something illegal, I hope you get
                      >> caught.
                      >>
                      >> At the very least, you might want to clarify why you are looking for
                      >> such capability so that you don't get effectively black-listed (well,
                      >> by me at least).
                      >>
                      >> Matt
                      >
                      > Go have a beer and calm down a bit :) It's a legitimate purpose,
                      > although it could be (and probably is being) used by bad guys right now.
                      > My intent, as you can see from the links below, is to catch it before
                      > the bad guys do.
                      >
                      > Brad

                      In case you're doing this for PCI validation, be aware that just the
                      CC number is considered sensitive and you'd get some false negatives
                      if you filter on anything except that.

                      Random strings that match CC checksums are really quite rare and false
                      positives from that alone are unlikely to be a problem. Unless I
                      deployed this and there was a significant false positive rate I
                      wouldn't risk the false negatives, personally.


                      • brad

                        #12
                        Re: Finding Peoples' Names in Files

                        Chris Mellon wrote:
                        > In case you're doing this for PCI validation, be aware that just the
                        > CC number is considered sensitive and you'd get some false negatives
                        > if you filter on anything except that.
                        >
                        > Random strings that match CC checksums are really quite rare and false
                        > positives from that alone are unlikely to be a problem. Unless I
                        > deployed this and there was a significant false positive rate I
                        > wouldn't risk the false negatives, personally.

                        Yes, it is for PCI. Our rate of false positives is low, very low. I
                        wasn't aware that a number alone was a PCI violation. Thank you! On
                        another note, we're a university (Virginia Tech) and we're subject to
                        FERPA, HIPAA, GLBA, etc. in addition to PCI. So we do these checks for
                        U.S. Social Security Numbers too, in an effort to prevent or lessen the
                        chance of ID theft. Unfortunately, there is no Luhn check for SSNs. We
                        follow the Social Security Administration verification guideline
                        religiously... here's a web front-end to my logic:



                        but we still have many false positives on SSNs, so being able to
                        identify *names and numbers* in files would still be a benefit to us.
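
For illustration, a structural SSN filter might look like the sketch below. It is not Brad's logic, which isn't shown in the thread; it assumes only the commonly cited structural rules (area number not 000, 666, or 900-999; group number not 00; serial number not 0000) and the hyphenated `AAA-GG-SSSS` layout. The SSA guideline he follows may include further checks.

```python
import re

# Assumption: only hyphenated AAA-GG-SSSS candidates are considered.
SSN_RE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def plausible_ssn(area, group, serial):
    """Apply basic structural rules; passing does not mean the SSN was issued."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    if group == "00" or serial == "0000":
        return False
    return True

def find_plausible_ssns(text):
    """Return SSN-shaped strings in text that pass the structural rules."""
    return [m.group() for m in SSN_RE.finditer(text)
            if plausible_ssn(*m.groups())]
```

Because there is no checksum, many nine-digit strings pass these rules, which is consistent with the false-positive problem described above.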

                        Brad
