Need to mark similar phrases in two different texts

  • SuperNova

    Need to mark similar phrases in two different texts

    Hello!
    I need to mark similar phrases in two different texts, for example with
    the <b> tag.

    Example:

    text 1:
    Google Chrome is a browser that combines a minimal design with
    sophisticated technology to make the web faster, safer, and easier.

    text 2:
    Hematology Analyzers – Simple, Sophisticated Technology Serving All
    Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

    After comparing the following should be shown:
    Google Chrome is a browser that combines a minimal design with
    <b>sophisticated technology</b> to make the web faster, safer, and
    easier.

    Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
    All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.

    Because "sophisticated technology" is repeated. But unfortunately I
    don't know how to do it. Can you help me?
  • Curtis

    #2
    Re: Need to mark similar phrases in two different texts

    SuperNova wrote:
    > Hello!
    > I need to mark similar phrases in two different texts, for example with
    > the <b> tag.
    >
    > Example:
    >
    > text 1:
    > Google Chrome is a browser that combines a minimal design with
    > sophisticated technology to make the web faster, safer, and easier.
    >
    > text 2:
    > Hematology Analyzers – Simple, Sophisticated Technology Serving All
    > Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
    >
    > After comparing the following should be shown:
    > Google Chrome is a browser that combines a minimal design with
    > <b>sophisticated technology</b> to make the web faster, safer, and
    > easier.
    >
    > Hematology Analyzers – Simple, <b>Sophisticated Technology</b> Serving
    > All Patients - Clinical Diagnostics Technology Spotlight - Medcompare.
    >
    > Because "sophisticated technology" is repeated. But unfortunately I
    > don't know how to do it. Can you help me?

    That's not quite enough to go on for effectively finding matches. It
    would be trivial if you had a pre-determined list of phrases, or you
    used a query from the user.

    However, as you have it now, and since the phrase could be anything,
    you'd end up making bold useless things like indefinite/definite
    articles, prepositions, pronouns, etc.

    --
    Curtis


    • Sjoerd

      #3
      Re: Need to mark similar phrases in two different texts

      SuperNova wrote:
      > I need to mark similar phrases in two different texts, for example with
      > the <b> tag.

      Why do you want this?

      This may work:
      1) Make a list of words in each text.
      2) Compute the intersection of these lists, so that the result is a list
      with words which are present in both texts.
      3) Filter this list to avoid common words such as 'it' and 'a'.
      4) Mark all the words in the list bold in both texts.

      Something like this:

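      <?php
      $text1 = 'Google Chrome[...]';
      $text2 = 'Hematology Analyzers[...]';

      // We don't want case sensitivity
      $lower1 = strtolower($text1);
      $lower2 = strtolower($text2);

      // Array of words (no empty strings between adjacent separators)
      $array1 = preg_split('/\W+/', $lower1, -1, PREG_SPLIT_NO_EMPTY);
      $array2 = preg_split('/\W+/', $lower2, -1, PREG_SPLIT_NO_EMPTY);

      // Intersect
      $intersect = array_unique(array_intersect($array1, $array2));

      // Filter out common words
      $filter = array('a', 'it');
      $filtered = array_diff($intersect, $filter);

      // Escape each word so it is safe inside a regular expression
      $quoted = array();
      foreach ($filtered as $word) {
          $quoted[] = preg_quote($word, '/');
      }

      // Make bold: whole words only, in a single pass so that the inserted
      // <b> tags are never matched themselves
      if ($quoted) {
          $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
          $text1 = preg_replace($pattern, '<b>\1</b>', $text1);
          $text2 = preg_replace($pattern, '<b>\1</b>', $text2);
      }

      echo $text1;
      echo $text2;
      ?>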


      • SuperNova

        #4
        Re: Need to mark similar phrases in two different texts

        Sjoerd wrote:
        > Why do you want this?
        >
        > This may work:
        > 1) Make a list of words in each text.
        > 2) Compute the intersection of these lists, so that the result is a list
        > with words which are present in both texts.
        > 3) Filter this list to avoid common words such as 'it' and 'a'.
        > 4) Mark all the words in the list bold in both texts.

        Thank you for the code sample. It's a good thing to think about. But I
        need to mark similar phrases, 2 or more words one after another. Your
        code marks all the similar words, but I need to mark only 2 or more
        words one after another.


        • Luuk

          #5
          Re: Need to mark similar phrases in two different texts

          SuperNova wrote:
          >> Why do you want this?
          >>
          >> This may work:
          >> 1) Make a list of words in each text.
          >> 2) Compute the intersection of these lists, so that the result is a list
          >> with words which are present in both texts.
          >> 3) Filter this list to avoid common words such as 'it' and 'a'.
          >> 4) Mark all the words in the list bold in both texts.
          >
          > Thank you for the code sample. It's a good thing to think about. But I
          > need to mark similar phrases, 2 or more words one after another. Your
          > code marks all the similar words, but I need to mark only 2 or more
          > words one after another.

          Then you can 'unmark' any word where you got only 1 consecutive hit.

          This will leave marked only the words with 2 or more consecutive hits.

          (Or am I missing something?)
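
The "unmark single hits" idea could be sketched roughly as follows. The function name markSharedRuns and its parameters are made up for this illustration. One caveat worth stating: it only checks that each word occurs somewhere in the other text, so a run of words that are adjacent here but scattered in the other text would still be marked, and common words are not filtered out.

```php
<?php
// Hypothetical helper: bold every run of at least $minRun consecutive words
// of $text that also occur (anywhere) in $other. Isolated hits stay unmarked.
function markSharedRuns($text, $other, $minRun = 2)
{
    // Lowercased words of the other text as array keys, for fast lookups.
    $shared = array_flip(preg_split('/\W+/', strtolower($other), -1, PREG_SPLIT_NO_EMPTY));

    // Split into separator, word, separator, ... so words sit at odd indexes.
    $tokens = preg_split('/(\w+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $count = count($tokens);

    $runStart = 0;
    $runLength = 0;
    // The final iteration (index == $count) only closes a run that reaches
    // the last word of the text.
    for ($i = 1; $i <= $count; $i += 2) {
        $hit = $i < $count && isset($shared[strtolower($tokens[$i])]);
        if ($hit) {
            if ($runLength === 0) {
                $runStart = $i;
            }
            $runLength++;
            continue;
        }
        if ($runLength >= $minRun) {
            $last = $runStart + 2 * ($runLength - 1);
            $tokens[$runStart] = '<b>' . $tokens[$runStart];
            $tokens[$last] .= '</b>';
        }
        $runLength = 0;
    }
    return implode('', $tokens);
}
```

Calling it once per text, with the arguments swapped the second time, marks both texts.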

          --
          Luuk


          • Sjoerd

            #6
            Re: Need to mark similar phrases in two different texts

            SuperNova wrote:
            > Thank you for the code sample. It's a good thing to think about. But I
            > need to mark similar phrases, 2 or more words one after another. Your
            > code marks all the similar words, but I need to mark only 2 or more
            > words one after another.

            I am sure you can figure out how to make my example work with two words.
            Although my previous post was elaborate and even included a working
            example, I have no intention of writing code for you to solve your problem.


            • SuperNova

              #7
              Re: Need to mark similar phrases in two different texts

              On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
              > I have no intention of writing code for you to solve your problem.

              I don't need code, I need an algorithm. The only thing I can think
              of is to split the words into an array and compare them: if two words
              are alike, check the next word too, and if that is alike as well,
              set the mark. But I hoped there was a faster algorithm.


              • Bill H

                #8
                Re: Need to mark similar phrases in two different texts

                On Sep 8, 5:55 am, SuperNova <SerafimPa...@gmail.com> wrote:
                > On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
                >> I have no intention of writing code for you to solve your problem.
                >
                > I don't need code, I need an algorithm. The only thing I can think
                > of is to split the words into an array and compare them: if two words
                > are alike, check the next word too, and if that is alike as well,
                > set the mark. But I hoped there was a faster algorithm.

                You are probably looking for something along the lines of a dictionary
                coder, the process used in some compression algorithms. See
                http://en.wikipedia.org/wiki/Dictionary_coder for how it works.
                Instead of looking for characters, you will be looking for words.
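
A much simplified, word-level take on the dictionary idea might look like the sketch below. It is not a real dictionary coder: it just stores every adjacent word pair of the first text in a hash and looks up the pairs of the second text, so the work grows with the combined length of the texts instead of their product. The names wordsOf and sharedWordPairs are invented for this example, and punctuation is ignored, so a pair may span a comma.

```php
<?php
// Hypothetical helper: the lowercased words of $text, in order.
function wordsOf($text)
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: the two-word sequences that occur in both texts.
function sharedWordPairs($text1, $text2)
{
    // Dictionary of every adjacent word pair in the first text.
    $dictionary = array();
    $words1 = wordsOf($text1);
    for ($i = 0; $i + 1 < count($words1); $i++) {
        $dictionary[$words1[$i] . ' ' . $words1[$i + 1]] = true;
    }

    // Look up each adjacent word pair of the second text in the dictionary.
    $found = array();
    $words2 = wordsOf($text2);
    for ($i = 0; $i + 1 < count($words2); $i++) {
        $pair = $words2[$i] . ' ' . $words2[$i + 1];
        if (isset($dictionary[$pair])) {
            $found[$pair] = true; // used as a set, so duplicates collapse
        }
    }
    return array_keys($found);
}
```

Longer phrases show up as chains of overlapping pairs ("a b" and "b c" for "a b c"), so a second step would be needed to merge them before highlighting.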

                Bill H


                • mijn naam

                  #9
                  Re: Need to mark similar phrases in two different texts

                  "SuperNova" <SerafimPanov@gmail.com> wrote in message
                  news:667509dc-02e4-4144-8482-81d5f31c36ff@d77g2000hsb.googlegroups.com...
                  > On Sep 8, 12:16 am, Sjoerd <sjoer...@gmail.com> wrote:
                  >> I have no intention of writing code for you to solve your problem.
                  >
                  > I don't need code, I need an algorithm. The only thing I can think
                  > of is to split the words into an array and compare them: if two words
                  > are alike, check the next word too, and if that is alike as well,
                  > set the mark. But I hoped there was a faster algorithm.

                  Start by selecting two words in a sentence. Copy those, and search for them
                  in the other sentence. If you don't find a match, forward the word pointer
                  by one, select the second and third word, redo until you've reached the last
                  two words (i.e. pointer is at the next to last word).

                  Every time you do find a match, try finding a longer match until that fails.
                  Highlight. Then forward the outer pointer not by one word, but by the number
                  of words found.

                  Add in some boundary checking so that you don't fall off the end of a piece
                  of text.
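
The procedure described above could be sketched along these lines. This is only an illustration under a few assumptions: the function name findSharedPhrases is made up, and it compares arrays of lowercased words rather than calling strpos or strstr on the raw strings, mainly so that a word never matches in the middle of a longer word.

```php
<?php
// Hypothetical helper: find phrases of two or more words shared by both texts.
// Each result is array(start index in the words of $text1, length, phrase).
function findSharedPhrases($text1, $text2)
{
    $a = preg_split('/\W+/', strtolower($text1), -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\W+/', strtolower($text2), -1, PREG_SPLIT_NO_EMPTY);
    $countA = count($a);
    $countB = count($b);
    $phrases = array();

    $i = 0; // outer pointer into the first text
    while ($i + 1 < $countA) {
        // Longest run starting at word $i that also occurs in the second text.
        $best = 0;
        for ($j = 0; $j + 1 < $countB; $j++) {
            $len = 0;
            // Boundary checks keep both pointers inside their arrays.
            while ($i + $len < $countA && $j + $len < $countB
                && $a[$i + $len] === $b[$j + $len]) {
                $len++;
            }
            if ($len > $best) {
                $best = $len;
            }
        }
        if ($best >= 2) {
            $phrases[] = array($i, $best, implode(' ', array_slice($a, $i, $best)));
            $i += $best; // skip past the words just matched
        } else {
            $i++;        // no phrase starts here; move on by one word
        }
    }
    return $phrases;
}
```

Because the search is greedy from the left, an earlier short match can prevent a longer overlapping one from being reported. The returned word indexes would still have to be mapped back to positions in the original text before inserting the <b> tags.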

                  Make sure you invest some time in selecting the fastest code to do this job.
                  You probably want to use strpos or strstr, depending on how you're going to
                  code this. strstr allows for some shortcuts, but perhaps a solution using
                  strpos is faster.

                  You may need to tweak this algorithm so that you can find more matches, which
                  may even be longer.

                  A: If some text starts with abc, then ...
                  B: if some text contains something else but a substring of some text starts
                  with abc, then ...

                  What do you highlight? "some text" and "starts with abc, then...", or "some
                  text starts with abc, then ..." or both? (better examples will exist, but
                  you probably got the point)


                  • SuperNova

                    #10
                    Re: Need to mark similar phrases in two different texts

                    On Sep 9, 6:37 am, "mijn naam" <whate...@hotmail.invalid> wrote:

                    Thanks, Bill and Mijn, for helping. Your ideas are good; I think they
                    will help me.

                    Thanks!
