Please critique this function

**Daniel T.** · Oct 22 '07, 01:15 AM

Re: Please critique this function

"Alf P. Steinbach" <alfps@start.no wrote:

However, I'm curious about ideas of making it better.
Anyone interested in critiquing it?

void formatText( const string& in, int charsPerLine, int lines,
vector< string >& out )
{
out.resize( 1 );
out[0] = "";
int prev = 0;
int pos = charsPerLine;
int lineCount = 0;
while ( pos < in.size() ) {
pos = in.find_last_of ( " -", pos );
if ( pos < prev ) {
pos = prev + charsPerLine - 1;
out.back() += in.substr( prev, pos - prev );
out.back() += '-';
prev = pos;
}
else if ( in[pos] == '-' ) {
++pos;
out.back() += in.substr( prev, pos - prev );
prev = pos;
}
else {
out.back() += in.substr( prev, pos - prev );
prev = pos + 1;
}
out.back() += (char)0x0A;
pos += charsPerLine;
++lineCount;
if ( lines <= lineCount ) {
out.back().eras e( out.back().size () - 1 );
out.push_back( "" );
lineCount = 0;
}
}
out.back() += in.substr( prev );
}

>
I rewrote it as follows, but I'm afraid that removed the desired UB:

I modified your re-write so it passes all the tests:

void formatText(
string const& in,
int charsPerLine,
int lines,
vector< string >& out )
{
vector<string result( 1 );

size_t prev = 0;
size_t pos = charsPerLine;
int lineCount = 0;
while ( pos < in.size() )
{
pos = in.find_last_of ( " -", pos );
if ( pos == string::npos || pos < prev )
{
pos = prev + charsPerLine - 1;
result.back() += in.substr( prev, pos - prev );
result.back() += '-';
prev = pos;
}
else if ( in.at( pos ) == '-' )
{
++pos;
result.back() += in.substr( prev, pos - prev );
prev = pos;
}
else
{
result.back() += in.substr( prev, pos - prev );
prev = pos + 1;
}
result.back() += '\n';
pos += charsPerLine;
++lineCount;
if ( lines <= lineCount )
{
result.back().e rase( result.back().s ize() - 1 );
result.push_bac k( "" );
lineCount = 0;
}
}
result.back() += in.substr( prev );
out.swap( result );
}

I see two basic differences in your code from mine...

1) You define 'pos' and 'prev' as unsigned values which exposed the UB.

2) You do all the work in the vector 'result' rather than the 'out'
parameter. I assume that is to insure that 'out' is unchanged if the
function throws?

The logic, however seems identical to mine. Can you think of any
improvements to the logic?

**James Kanze** · Oct 22 '07, 09:25 AM

Re: Please critique this function

On Oct 22, 12:11 am, "Daniel T." <danie...@earth link.netwrote:

The function below does exactly what I want it to (there is a main to
test it as well.) However, I'm curious about ideas of making it better.
Anyone interested in critiquing it?

Well, just from the code, I have great difficulty even
understanding it. I'm not even sure what it is supposed to do.

If you're trying to break up text into lines, I don't really
understand the output; what is each entry in the vector (since
the entries themselves have line breaks)? Without a
specification of what the function is supposed to do, you can't
begin to write it, write a test for it, or ask anyone to review
it.

Otherwise: in general, if the problem is breaking up text into
lines of a maximum width, with line breaks only at word
boundaries (or other specified places), then the usual solution
would be to break the problem up into parts: a class which
breaks the input up into words, and a class which puts the words
into the destintation, adding line breaks as needed.

--
James Kanze (GABI Software) email:james.kan ze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientier ter Datenverarbeitu ng
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

**Daniel T.** · Oct 22 '07, 11:25 AM

Re: Please critique this function

"Alf P. Steinbach" <alfps@start.no wrote:

* Daniel T. -Alf P. Steinbach:

The logic, however seems identical to mine. Can you think of any
improvements to the logic?

>
formatText( "123456 123456", 6, 4, result );

Thanks. Not only did it show an anomaly in the code, but it exposed an
error in one of the other tests! Still no ideas on reducing the
complexity of the code though?

void formatText(
string const& in,
int charsPerLine,
int lines,
vector< string >& out )
{
vector<string result( 1 );

size_t prev = 0;
size_t pos = charsPerLine;
int lineCount = 0;
while ( pos < in.size() )
{
pos = in.find_last_of ( " -", pos );
if ( pos == string::npos || pos < prev )
{
pos = prev + charsPerLine - 1;
result.back() += in.substr( prev, pos - prev );
result.back() += '-';
prev = pos;
}
else if ( in.at( pos ) == '-' )
{
++pos;
result.back() += in.substr( prev, pos - prev );
prev = pos;
}
else
{
result.back() += in.substr( prev, pos - prev );
prev = pos + 1;
}
result.back() += '\n';
pos = prev + charsPerLine;
++lineCount;
if ( lines <= lineCount )
{
result.back().e rase( result.back().s ize() - 1 );
result.push_bac k( "" );
lineCount = 0;
}
}
result.back() += in.substr( prev );
out.swap( result );
}

int main()
{
string in = "hello";
vector< string out;
formatText( in, 24, 4, out );
assert( out.size() == 1 );
assert( out[0] == "hello" );

in = "Put up with it and you will get more of it.";
formatText( in, 24, 4, out );
assert( out.size() == 1 );
assert( out[0] == "Put up with it and you\nwill get more of it." );

in = "The mind of a bigot is like the pupil of the eye. The more
light you shine on it, the more it will contract.";
formatText( in, 24, 4, out );
assert( out.size() == 2 );
assert( out[0] == "The mind of a bigot is\nlike the pupil of
the\neye. The more light you\nshine on it, the more it" );
assert( out[1] == "will contract." );

in = "Floccinaucinih ilipili-fication";
formatText( in, 24, 4, out );
assert( out.size() == 1 );
assert( out[0] == "Floccinaucinih ilipili-\nfication" );

in = "Floccinaucinih ilipilification ";
formatText( in, 24, 4, out );
assert( out.size() == 1 );
assert( out[0] == "Floccinaucinih ilipilifi-\ncation" );

formatText( "123456 123456", 6, 4, out );
assert( out.size() == 1 );
assert( out[0] == "123456\n123456 " );

}

**Daniel T.** · Oct 22 '07, 11:45 AM

Re: Please critique this function

James Kanze <james.kanze@gm ail.comwrote:

On Oct 22, 12:11 am, "Daniel T." <danie...@earth link.netwrote:

The function below does exactly what I want it to (there is a
main to test it as well.) However, I'm curious about ideas of
making it better. Anyone interested in critiquing it?

>
Well, just from the code, I have great difficulty even
understanding it. I'm not even sure what it is supposed to do.

That's why I posted it. I don't generally make functions with that high
a complexity and so I was wondering if someone could make some
suggestions on the logic to see if the complexity can be reduced.

As far as what it's supposed to do... It is supposed to pass all the
tests that were included. I.E., proper execution will exit normally, if
an assert fires, then there is a problem. (Note, due to Alf's helpful
critique I have modified one of the tests and added another.)

If you're trying to break up text into lines, I don't really
understand the output; what is each entry in the vector (since the
entries themselves have line breaks)?

Each entry in the vector is a page.

Without a specification of what the function is supposed to do, you
can't begin to write it, write a test for it, or ask anyone to
review it.

The function is supposed to take a string of text and break it up into a
number of pages. Each page is supposed to be no more than charsPerLine
wide and no more than lines high. The code must be able to hyphenate any
word wider than charsPerLine (although not necessarily at the proper
place according to English grammar,) and the function must be able to
recognize when a very long word is already hyphenated so it won't add
unnecessary hyphens.

Otherwise: in general, if the problem is breaking up text into
lines of a maximum width, with line breaks only at word boundaries
(or other specified places), then the usual solution would be to
break the problem up into parts: a class which breaks the input up
into words, and a class which puts the words into the destintation,
adding line breaks as needed.

Can you show me code demonstrating this? Would such code be less complex
than what I already have? I'm interested in reducing the cyclomatic
complexity if possible.

**Daniel T.** · Oct 22 '07, 03:05 PM

Re: Please critique this function

"Alf P. Steinbach" <alfps@start.no wrote:

* Daniel T.:

"Alf P. Steinbach" <alfps@start.no wrote:

* Daniel T. -Alf P. Steinbach:

The logic, however seems identical to mine. Can you think of any
improvements to the logic?

formatText( "123456 123456", 6, 4, result );

Thanks. Not only did it show an anomaly in the code, but it exposed an
error in one of the other tests! Still no ideas on reducing the
complexity of the code though?

>
That /was/ my idea for simplifying the code: should be much simpler if
you fix that.

I fixed that, but my fix didn't change the cyclomatic complexity...

I fail to see any changes in the formatText function (but then, I only
glanced at it, I presume you'd have explained it if you fixed it)?

The only thing I can think to change is that the function performs two
different jobs through the loop. It breaks the string up into lines, and
it breaks lines up into pages. I could seperate these two jobs into two
functions; "breakLines " and "breakPages ". Oh well, I have other
functions to write. Thanks for the help.

**Kai-Uwe Bux** · Oct 22 '07, 08:05 PM

Re: Please critique this function

Daniel T. wrote:

James Kanze <james.kanze@gm ail.comwrote:

>On Oct 22, 12:11 am, "Daniel T." <danie...@earth link.netwrote:

>

The function below does exactly what I want it to (there is a
main to test it as well.) However, I'm curious about ideas of
making it better. Anyone interested in critiquing it?

>>
>Well, just from the code, I have great difficulty even
>understandin g it. I'm not even sure what it is supposed to do.

>
That's why I posted it. I don't generally make functions with that high
a complexity and so I was wondering if someone could make some
suggestions on the logic to see if the complexity can be reduced.
>
As far as what it's supposed to do... It is supposed to pass all the
tests that were included. I.E., proper execution will exit normally, if
an assert fires, then there is a problem. (Note, due to Alf's helpful
critique I have modified one of the tests and added another.)

You cannot really be serious. The following passes all the tests. Moreover,
the logic that makes is pass all the tests is very simple and easy to
understand. Nonetheless, I seriously doubt that it will satisfy you:

struct text : vector< string {
text & add ( string const & line ) {
push_back( line );
return ( *this );
}
};

struct arguments : pair< string, pair< int, int {
arguments( string const & line, int a, int b )
: pair< string, pair< int, int ( line, pair<int,int>(a ,b) )
{}
};

void formatText (
string const& in,
int charsPerLine,
int lines,
vector< string >& out )
{
map< arguments, text table;
table[ arguments( "hello", 24, 4 ) ]
.add( "hello" );
table[ arguments( "Put up with it and you will"
" get more of it.", 24, 4 ) ]
.add("Put up with it and you\nwill get more of it.");
table[ arguments( "The mind of a bigot is like the pupil"
" of the eye. The more light you shine"
" on it, the more it will contract.", 24, 4 ) ]
.add("The mind of a bigot is\nlike the pupil"
" of the\neye. The more light you\nshine"
" on it, the more it")
.add("will contract.");
table[ arguments( "Floccinaucinih ilipili-fication", 24, 4 ) ]
.add("Floccinau cinihilipili-\nfication");
table[ arguments( "Floccinaucinih ilipilification ", 24, 4 ) ]
.add("Floccinau cinihilipilifi-\ncation");
table[ arguments( "123456 123456", 6,4 ) ]
.add("123456\n1 23456");

text result = table[ arguments( in, charsPerLine, lines ) ];
out = result;
}

>If you're trying to break up text into lines, I don't really
>understand the output; what is each entry in the vector (since the
>entries themselves have line breaks)?

>
Each entry in the vector is a page.
>

>Without a specification of what the function is supposed to do, you
>can't begin to write it, write a test for it, or ask anyone to
>review it.

>
The function is supposed to take a string of text and break it up into a
number of pages. Each page is supposed to be no more than charsPerLine
wide and no more than lines high. The code must be able to hyphenate any
word wider than charsPerLine (although not necessarily at the proper
place according to English grammar,) and the function must be able to
recognize when a very long word is already hyphenated so it won't add
unnecessary hyphens.

Ah. Thanks. That helps.

I guess a first step to break up the complexity is to split the function in
two: one splits the text into lines, the next collates lines into pages.
Something like:

void split_into_line s ( string const & in,
unsigned int charsPerLine,
vector< string & out );

void collate_pages ( vector< string const & in,
unsigned int linesPerPage,
vector< string & out );

Then you'd have:

void formatText(
string const& in,
unsigned int charsPerLine,
unsigned int linesPerPage,
vector< string >& out ) {
vector< string lines;
split_into_line s( in, charsPerLine, lines );
collate_into_pa ges( lines, linesPerPage, out );
}

>Otherwise: in general, if the problem is breaking up text into
>lines of a maximum width, with line breaks only at word boundaries
>(or other specified places), then the usual solution would be to
>break the problem up into parts: a class which breaks the input up
>into words, and a class which puts the words into the destintation,
>adding line breaks as needed.

>
Can you show me code demonstrating this? Would such code be less complex
than what I already have? I'm interested in reducing the cyclomatic
complexity if possible.

What is this cyclomatic complexity (and why is it bad)?

Best

Kai-Uwe Bux

**Daniel T.** · Oct 22 '07, 08:45 PM

Re: Please critique this function

"Alf P. Steinbach" <alfps@start.no wrote:

I'm not sure, but in this equivalent expression of the function it seems
as if it might have a off-by-one bug.

That has me worried. I've hit the function with several tests but have
been unable to find any problems so far. Why do you feel there is an off
by one bug? Is there something specific, or is it just a gut feeling?

**Daniel T.** · Oct 22 '07, 11:25 PM

Re: Please critique this function

Kai-Uwe Bux <jkherciueh@gmx .netwrote:

What is this cyclomatic complexity (and why is it bad)?

http://www.literateprogramming.com/mccabe.pdf

**=?UTF-8?B?RXJpayBXaWtzdHLDtm0=?=** · Oct 23 '07, 07:55 AM

Re: Please critique this function

On 2007-10-22 13:38, Daniel T. wrote:

James Kanze <james.kanze@gm ail.comwrote:

>On Oct 22, 12:11 am, "Daniel T." <danie...@earth link.netwrote:

Can you show me code demonstrating this? Would such code be less complex
than what I already have? I'm interested in reducing the cyclomatic
complexity if possible.

A bit off-topic but in my experience there is no benefit in reducing the
cyclomatic complexity just for the sake of reducing it. A toolthat
automatically calculates the cyclomatic complexity can be useful since
it helps you find functions that are *potentially* too complex. But each
function has to be evaluated by a human to determine if it is too
complex or not, i.e. a function with a large switch-statement will have
a high cyclomatic complexity but will not necessarily be very complex.

After a quick look at your function it looks like it would have a
cyclomatic complexity of about 5, which IMO is not very high.

BTW: you use the following to add a line-break: "out.back() +=
(char)0x0A;" but would it not be better to use "out.back() += '\n';" to
make the code more portable?

--
Erik WikstrÃ¶m

**James Kanze** · Oct 23 '07, 10:35 AM

Re: Please critique this function

On Oct 22, 1:38 pm, "Daniel T." <danie...@earth link.netwrote:

James Kanze <james.ka...@gm ail.comwrote:

On Oct 22, 12:11 am, "Daniel T." <danie...@earth link.netwrote:

The function below does exactly what I want it to (there is a
main to test it as well.) However, I'm curious about ideas of
making it better. Anyone interested in critiquing it?

Well, just from the code, I have great difficulty even
understanding it. I'm not even sure what it is supposed to do.

That's why I posted it. I don't generally make functions with that high
a complexity and so I was wondering if someone could make some
suggestions on the logic to see if the complexity can be reduced.

But until we know exactly what the function is supposed to do,
how can we make suggestions as to how to implement it.

As I said, my vague impression was that it was to format text by
breaking it up into lines, independently of the line breaks in
the original. But in that case, I don't understand your output.
Either you output a string, with '\n' separating the lines, or
you output a std::vector< std::string >, with a separate line in
each element. I don't understand using an std::vector<
std::string >, and still having '\n' in the elements.

As far as what it's supposed to do... It is supposed to pass all the
tests that were included. I.E., proper execution will exit normally, if
an assert fires, then there is a problem. (Note, due to Alf's helpful
critique I have modified one of the tests and added another.)

The simplest implementation which would pass all of your tests
would simply be to look up the input in a pre-calculated array,
with the pre-defined results. That's definitely not what you
want.

Tests are NOT a specification. I can't figure out from your
tests what the function is actually supposed to do. Good
programming starts with a specification. Without that, you're
sunk before you've started.

If you're trying to break up text into lines, I don't really
understand the output; what is each entry in the vector (since the
entries themselves have line breaks)?

Each entry in the vector is a page.

OK. Since you labeled on argument charsPerLine, what's wrong
with labeling the next linesPerPage? (I'd probably have used
lineLength and pageLength, but it doesn't matter.) And youre
out argument "pageVector "? (A specification doesn't have to be
pure English text.)

That's really, of course, two problems, and should be done in
two separate functions. First break up the input into lines,
then break up the list of lines into pages. For that, I rather
prefer maintaining the lines as entries in a vector, so that I
don't have to scan for '\n' characters. Which means that
formatText would be something like:

void
formatText(
std::string const& in,
int lineLength,
int pageLength,
std::vector< std::string >&
pageVector )
{
std::vector< std::string >
lineVector ;
horizontalForma t( in, lineLength, lineVector ) ;
std::vector< std::string >::const_iterat or
it = lineVector.begi n() ;
while ( it != lineVector.end( ) ) {
std::string page ;
for ( int i = std::min( pageLength, lineVector.end( ) -
it ) ;
i 0 ;
-- i ) {
page += *it + '\n' ;
++ it ;
}
pageVector.push _back( page ) ;
}
}

(That's just off the top of my head, without testing, so it
probably contains a few errors.)

Without a specification of what the function is supposed to do, you
can't begin to write it, write a test for it, or ask anyone to
review it.

The function is supposed to take a string of text and break it up into a
number of pages. Each page is supposed to be no more than charsPerLine
wide and no more than lines high. The code must be able to hyphenate any
word wider than charsPerLine (although not necessarily at the proper
place according to English grammar,) and the function must be able to
recognize when a very long word is already hyphenated so it won't add
unnecessary hyphens.

OK. That's about what I was looking for.

Otherwise: in general, if the problem is breaking up text into
lines of a maximum width, with line breaks only at word boundaries
(or other specified places), then the usual solution would be to
break the problem up into parts: a class which breaks the input up
into words, and a class which puts the words into the destintation,
adding line breaks as needed.

Can you show me code demonstrating this? Would such code be less complex
than what I already have? I'm interested in reducing the cyclomatic
complexity if possible.

The cyclomatic complexity is O(n), I'm pretty sure.

The basic principle is simple: you write a class which takes a
string in the constructor, and returns a single word each time a
function (getWord()) is called. You instantiate an instance of
that class at the top of the horizontalForma t function, using
the string "in", and loop calling getWord. Something like:

void
horizontalForma t(
std::string const& in,
int lineLength,
std::vector< std::string >&
lineVector )
{
WordGetter words( in ) ;
std::string line ;
for ( WordGetter::con st_iterator it = words.begin() ;
it != words.end() ;
++ it ) {
std::string word( *it ) ;
int appendLength = word.size() ;
if ( line.size() != 0 ) {
++ appendLength ; // count space
}
if ( line.size() + appendLength <= lineLength ) {
if ( line.size() != 0 ) {
line += ' ' ;
}
line += word ;
} else {
if ( line.size() != 0 ) {
lineVector.push _back( line ) ;
}
line.clear() ;
if ( word.size() <= lineLength ) {
line += word ;
} else {
// additional line break logic needed here.
}
}
}
}

Writing the iterator for WordGetter, of course, is the tricky
part---the STL iterators are not really well designed for this,
and you may want to use a real iterator (like the one in GoF,
for example) instead. Easier to use and easier to implement.

--
James Kanze (GABI Software) email:james.kan ze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientier ter Datenverarbeitu ng
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

**James Kanze** · Oct 23 '07, 11:45 AM

Re: Please critique this function

On Oct 23, 9:50 am, Erik Wikström <Erik-wikst...@telia. comwrote:

On 2007-10-22 13:38, Daniel T. wrote:

James Kanze <james.ka...@gm ail.comwrote:

On Oct 22, 12:11 am, "Daniel T." <danie...@earth link.netwrote:

Can you show me code demonstrating this? Would such code be less complex
than what I already have? I'm interested in reducing the cyclomatic
complexity if possible.

A bit off-topic but in my experience there is no benefit in reducing the
cyclomatic complexity just for the sake of reducing it.

The cyclomatic complexity is often related to other things,
which you do want to reduce. But I agree that looking at it
for just one function is often counter-productive; there can be
any number of reasons why that function is justifiably higher
than usual. On the other hand, it's often a good measure of the
overall complexity of the code which is being written.

A toolthat automatically calculates the cyclomatic complexity
can be useful since it helps you find functions that are
*potentially* too complex. But each function has to be
evaluated by a human to determine if it is too complex or not,
i.e. a function with a large switch-statement will have a high
cyclomatic complexity but will not necessarily be very
complex.

That's a known special case. There will also be less clearly
identifiable ones.

After a quick look at your function it looks like it would have a
cyclomatic complexity of about 5, which IMO is not very high.

I may be confusing it with something else, but I seem to recall
something around 2.5 being typical, so 5 would be greater.

Whether it is justified here is another question, but I suspect
that any reasonable cleaning up (i.e. splitting the line break
and the page break functionality into separate functions,
putting the splitting up of the input into words into a separate
function or class, isolating the "hyphenatio n", or any possible
splitting of words into a separate function or class, etc.)
would result in a significantly lower complexity.

--
James Kanze (GABI Software) email:james.kan ze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientier ter Datenverarbeitu ng
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Please critique this function

Please critique this function

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment