I am developing an authoring system that includes footnote functionality. The system stores data as htmlspecialchars-encoded HTML. Footnotes are numbered in the source without regard to where the footnote appears in the text, so footnote 5 could precede footnote 1 in the HTML. When it's time to process the document for presentation, I re-assign the footnote numbers so that they appear sequentially in the text. I am trying to do this using preg_replace.
Each footnote is embedded in a <span> that has an ID attribute with a unique identifier that maps the corresponding footnote text to the given marker location. The following code will reproduce the problem:
Code:
error_reporting(E_ALL);

$regexp = '/^(.*)<span (.*)id="fn([^"]*)"(.*)>\[(.*)\](.*)$/';

$target = <<<__EOS
<b>MANAGEMENT'S DISCUSSION AND ANALYSIS</b> <p>This text is shorter. <sup><span style="color: blue; font-size: 10px;" id="fn6" class="footnote">[6]</span></sup>. There has to be at least two sentences presented <sup><span style="color: blue; font-size: 10px;" id="fn4" class="footnote">[4]</span></sup>) discussion and analysis is designed to identify the significant <sup><span style="color: blue; font-size: 10px;" id="fn5" class="footnote">[5]</span></sup> in the fiscal year ending April 30, 2004. </p>
__EOS;

$fnotenum = 0;
if (preg_match($regexp, $target, $matches)) {
    $fnotenum = 0;
    $replacements = "$1<span class='LOOKHERE' $2 $4>[".++$fnotenum."]$6";
    $target = preg_replace($regexp, $replacements, $target);
}
echo $target;
exit();
The output I am expecting from this code and sample input should look like this:
<b>MANAGEMENT'S DISCUSSION AND ANALYSIS</b> <p>This text is shorter. <sup><span style="color: blue; font-size: 10px;" class="footnote">[1]</span></sup>. There has to be at least two sentences presented <sup><span style="color: blue; font-size: 10px;" class="footnote">[2]</span></sup>) discussion and analysis is designed to identify the significant <sup><span style="color: blue; font-size: 10px;" class="footnote">[3]</span></sup> in the fiscal year ending April 30, 2004. </p>
Each instance of the id attribute has been removed so that subsequent calls to preg_match will not find the nodes that have already been processed, and the arbitrary footnote numbers assigned in the authoring system have been replaced by sequential values.
What I get is this:
<b>MANAGEMENT'S DISCUSSION AND ANALYSIS</b> <p>This text is shorter. <sup><span style="color: blue; font-size: 10px;" id="fn6" class="footnote">[6]</span></sup>. There has to be at least two sentences presented <sup><span style="color: blue; font-size: 10px;" id="fn4" class="footnote">[4]</span></sup>) discussion and analysis is designed to identify the significant <sup><span class='LOOKHERE' style="color: blue; font-size: 10px;" class="footnote">[1]</span></sup> in the fiscal year ending April 30, 2004. </p>
Notice that only the very last instance of the footnote span has been modified, even though each individual span does match the regular expression. I don't understand this behavior. Am I misunderstanding the preg_replace function? Is my logic flawed? Has anybody experienced and overcome this issue? Any suggestions?
Your time and thoughts are greatly appreciated,
Dan Ellison
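A possible explanation, offered as a sketch rather than a definitive answer: the pattern is anchored with ^ and $ and its (.*) groups are greedy, so preg_replace finds exactly one match that spans the whole string, and the leading (.*) swallows everything up to the last span. The replacement string is also built once, before preg_replace runs, so ++$fnotenum is evaluated only a single time. The snippet below checks the match count and then tries a per-span pattern with preg_replace_callback. The shortened sample text, the $perSpan pattern, and the $map array are assumptions made for illustration, and the input is assumed to have already been decoded from its htmlspecialchars form.

```php
<?php
// Shortened single-line sample (hypothetical text, same span markup as the post).
$target = 'A <sup><span style="color: blue;" id="fn6" class="footnote">[6]</span></sup> '
        . 'B <sup><span style="color: blue;" id="fn4" class="footnote">[4]</span></sup> '
        . 'C <sup><span style="color: blue;" id="fn5" class="footnote">[5]</span></sup>';

// 1) The pattern from the post: anchored at both ends, greedy groups.
//    Replacing each match with itself ('$0') just counts the matches.
$original = '/^(.*)<span (.*)id="fn([^"]*)"(.*)>\[(.*)\](.*)$/';
preg_replace($original, '$0', $target, -1, $count);
echo "matches with the anchored pattern: $count\n";

// 2) An unanchored pattern that matches one footnote span at a time.
//    Group 1: attributes before id, group 2: the id suffix, group 3: attributes after id.
$perSpan = '/<span\b([^>]*?)\s+id="fn([^"]*)"([^>]*)>\[[^\]]*\]/';

$n   = 0;   // running footnote number, in document order
$map = [];  // old id => new number, for renumbering the footnote text later

$result = preg_replace_callback(
    $perSpan,
    function (array $m) use (&$n, &$map) {
        $n++;
        $map['fn' . $m[2]] = $n;
        // Rebuild the opening tag without the id attribute and with the new number.
        return '<span' . $m[1] . $m[3] . '>[' . $n . ']';
    },
    $target
);

echo $result, "\n";
```

Because the callback runs once per match, the counter advances for every span, and $map records which original id received which new number so the footnote text can be reordered to match. A regular expression is still a fragile way to edit HTML; if the markup varies much, a DOM parser may be the safer route.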