Can somebody help out with RegExps?

  • R. Tarazi


    Hello everyone,

    I'm having great difficulty using RegExps for a specific problem
    and would really appreciate any help. I hope somebody will read
    through my "long" post...

    1.
    <?php
    // Find all blocks containing the postal code and city, with a minimum
    // of 50 and a maximum of 250 characters before and after.
    // This should give me all blocks containing postal code and city.

    $arrParsedBlocks = getDataUsingRegexp(
        "'(.{50,250})" . preg_quote($arrDaten['Plz'], "'") . "\s+"
        . preg_quote($arrDaten['Ort'], "'") . "(.{50,250})'is",
        $content
    );

    function getDataUsingRegexp($strRegexp, $string)
    {
        global $arrDaten;

        preg_match_all($strRegexp, $string, $matches);

        $arrListe = array();

        // Rebuild each block: text before + postal code + city + text after.
        for ($i = 0; $i < count($matches[0]); $i++)
        {
            $strData = trim($matches[1][$i] . $arrDaten['Plz'] . " "
                . $arrDaten['Ort'] . $matches[2][$i]);

            $arrListe[] = $strData;
        }

        return $arrListe;
    }
    ?>

    Question:
    ---------
    * How can I extract 3 lines before and after postal code + city?
    (instead of a specific number of characters)
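One possible way to get lines instead of characters is to split the text into lines first and then slice around each matching line. The following is only a minimal sketch: the function name `getBlocksByLines` and the `$context` parameter are hypothetical, and it assumes that `<br>` tags and newlines both count as line breaks and that empty lines can be dropped.

```php
<?php
// Hypothetical helper (not part of the original code): returns, for every
// line containing "<plz> <ort>", that line plus up to $context lines
// before and after it.
function getBlocksByLines($content, $plz, $ort, $context = 3)
{
    // Assumption: <br>, <br/> and real newlines all separate lines;
    // runs of separators collapse and empty lines are dropped.
    $lines = preg_split(
        '#\s*(?:<br\s*/?>|\R)\s*#i',
        trim($content),
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    $needle = '#' . preg_quote($plz, '#') . '\s+' . preg_quote($ort, '#') . '#i';

    $blocks = array();
    foreach ($lines as $i => $line) {
        if (preg_match($needle, $line)) {
            $start  = max(0, $i - $context);
            $length = $i - $start + $context + 1;
            // array_slice() stops at the end of the array by itself.
            $blocks[] = array_slice($lines, $start, $length);
        }
    }

    return $blocks;
}
```

Each returned block is an array of lines, so the position of the postal code line inside the block stays known for later steps.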


    2.
    <?php
    $string = "Kontakt


    <br>
    Bill Jones

    Dr. Bill
    Jones<br>
    Internet & Webdesign<br>

    Examplestreet 9<br>
    87354 Munich<br>
    Germany<br>
    Tel. (0 8 9) 1234 <br>
    Handy (0173) 111 <br>
    Internet: http://www.foo.com<br>

    E-Mail: info@foo.com";

    echo $string;

    $output_array = getDataUsingRegexp('#Tel(.*?)<br>#m', $string);
    var_dump($output_array);

    $output_array = getDataUsingRegexp('#Handy(.*?)<br>#m', $string);
    var_dump($output_array);
    ?>


    Questions:
    ------------
    * I want to extract the following data from a string (see the example
    above) into an associative array, e.g.

    Array( [Name] => "Bill Jones Dr. Bill Jones" [Company Name] =>
    "Internet & Webdesign" [Street] => "Examplestreet 9" [City] =>
    "87354 Munich" [Country] => "Germany" [Tel] => "(0 8 9) 1234
    <br>")

    * As a basis I can use the postal code and the city name, which I
    already used to extract the matching blocks in step one.

    Lines with a telephone number can be identified by words such as
    telefon, tel., fon or telephone.

    Lines with a fax number can be identified by words such as fax or
    telefax.

    Lines with a cellular number can be identified by words such as
    handy or mobile.
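The keyword rules above could be sketched roughly as follows. The helper name `classifyPhoneLine` is hypothetical, and the sketch assumes that the keyword stands at the start of the line, optionally followed by "." or ":". Lines where the keyword appears elsewhere would need a looser pattern.

```php
<?php
// Hypothetical helper: classifies one line as 'Tel', 'Fax' or 'Handy'
// by its leading keyword. Returns array(type, number) or null.
function classifyPhoneLine($line)
{
    // Order matters: "telefax" has to be tested before "tel".
    $labels = array(
        'Fax'   => 'telefax|fax',
        'Handy' => 'handy|mobile',
        'Tel'   => 'telephone|telefon|tel|fon',
    );

    foreach ($labels as $type => $words) {
        // Keyword, optional "." or ":", then the number up to the line end.
        $pattern = '#^\s*(?:' . $words . ')\b[.:]*\s*(.+?)\s*$#i';
        if (preg_match($pattern, $line, $m)) {
            return array($type, $m[1]);
        }
    }

    return null; // no known keyword at the start of the line
}
```

For example, `classifyPhoneLine('Tel. (0 8 9) 1234 ')` yields the type `Tel` together with the trimmed number, while a line such as `Internet: http://www.foo.com` yields `null`.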

    The patterns in my example above are very specific, designed for
    special cases, and not general at all.

    The line above the line holding postal code and city is assumed to
    hold the street data.

    The 2 lines above the line holding the street data are assumed to hold
    the company name.

    Lines between postal code + city and tel. are assumed to hold the
    country name, although this is optional. Sometimes there may not even
    be any country information available.
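These positional rules can be written down without any regular expression once the text is available as an array of lines. The sketch below is only an illustration: `mapAddressLines` and `$plzIndex` (the index of the postal code + city line) are hypothetical names, and treating every line between that line and the first phone, fax or mobile line as the country is an assumption taken from the description above.

```php
<?php
// Hypothetical helper: maps lines to fields by their position relative
// to the postal code + city line at index $plzIndex.
function mapAddressLines(array $lines, $plzIndex)
{
    // Up to two lines above the street line are taken as the company name.
    $companyLines = array_slice(
        $lines,
        max(0, $plzIndex - 3),
        max(0, min(2, $plzIndex - 1))
    );

    $result = array(
        'Company Name' => implode(' ', $companyLines),
        'Street'       => $plzIndex >= 1 ? $lines[$plzIndex - 1] : null,
        'City'         => $lines[$plzIndex],
        'Country'      => null, // optional, filled in below if present
    );

    // Lines after the city and before the first tel/fax/mobile line.
    $labelPattern = '#^(?:tele(?:fon|phone|fax)|tel|fon|fax|handy|mobile)\b#i';
    $between = array();
    for ($i = $plzIndex + 1; $i < count($lines); $i++) {
        if (preg_match($labelPattern, $lines[$i])) {
            if ($between) {
                $result['Country'] = implode(' ', $between);
            }
            break;
        }
        $between[] = $lines[$i];
    }

    return $result;
}
```

If no phone, fax or mobile line follows at all, `Country` stays `null`, because the rule only speaks of lines between the city and the telephone line.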

    I define the separation of lines not only by a new line (\n or <br>)
    but also by characters such as , or - or : or ; or |
    since an address can be written on one line, like

    Bill Jones | Internet & Webdesign | Examplestreet 9 | 87354 Munich |
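Splitting on several separators at once can be done with `preg_split()`. The following is a rough sketch with a hypothetical helper name, `splitAddressFields`. As an assumption it leaves out ":" completely (it occurs in URLs such as http://www.foo.com and after labels like "Internet:") and treats "-" as a separator only when it has whitespace on both sides, so that hyphenated words stay intact.

```php
<?php
// Hypothetical helper: splits an address block into trimmed, non-empty
// fields. Separators: <br>, newlines, "|", ";", "," and " - ".
function splitAddressFields($block)
{
    $pattern = '#\s*(?:<br\s*/?>|\R|[|;,])\s*|\s+-\s+#i';

    // PREG_SPLIT_NO_EMPTY drops the empty field after a trailing separator.
    return preg_split($pattern, trim($block), -1, PREG_SPLIT_NO_EMPTY);
}
```

Applied to the one-line address above, this returns the four fields "Bill Jones", "Internet & Webdesign", "Examplestreet 9" and "87354 Munich".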

    The expected order of the fields is:

    1. Company Name
    2. Company Name
    3. Street Name
    4. Postal Code + City name
    5. Country Name (optional)
    6. Tel.
    7. Fax.
    8. Handy

    6. to 8. can of course differ in order

    => It all sounds simple, but writing a regular expression pattern
    for it is another story... :(

    Is there any RegExp professional out there who could help out? I
    would also appreciate detailed explanations, since I'm here to learn!

    Thanks a lot!
    Rania