Python for large projects

**Alan Gauld** · Jul 18 '05, 09:46 AM

Re: Python for large projects

On 25 Mar 2004 12:21:36 +0100, Matthias <no@spam.pls> wrote:
> Jacek Generowicz <jacek.generowicz@cern.ch> writes:
>
> > "After", is far too late, in my opinion. It's a bit like suggesting to
> > a static-typing-for-safety fan, that he should only run his program
> > through the compiler _after_ he has finished developing.
>
> I think this method was advertised as the "cleanroom approach".
> Google finds some references.

The clean room approach was slightly different, although heading in
that direction. It relied on rigorous review, inspection and
testing at every stage of the process. (Sound familiar?)

It was popular in the early/mid eighties and here are a few
references:

Wicked Problems, Righteous Solutions; P DeGrace & L Hulet Stahl
- many methodologies, including a section on clean room.

Cleanroom Approach to Reliable Software Devt; Dyer & Mills
Proceedings Validation Methods Research for Fault Tolerant
Avionics....; Research Triangle Institute, 1981

Cleanroom Software Devt, An Empirical Investigation;
Selby, Basili, Baker, 1987
IEEE Transactions on Software Engineering,
Vol SE-13, #9, Sept 1987

HTH,

Alan G.

PS. Keeping programmers away from compilers is not that old a
practice. I was working on a VAX project in 1989 that only allowed
us one compile each per day, with a full compile overnight (which
took 6 hours).

Author of the Learn to Program website

http://www.freenetpages.co.uk/hp/alan.gauld

**Hung Jung Lu** · Jul 18 '05, 09:46 AM

Re: Python for large projects

> On Tue, 2004-03-23 at 17:24, Cameron Laird wrote:
>
> > they're at a particular DISadvantage there. If you have a
> > big job, you *particularly* need to look at Python (or Erlang,
> > or Eiffel, or ...)
--------------------------------------------
gabor <gabor@z10n.net> wrote in message news:<mailman.300.1080071082.742.python-list@python.org>...
> ...
> i wanted to use python for a project in our company... we wanted to
> build a fairly big system/program.
>
> but when i recommended python, i got a question like:
> (previously all the programs were written in java)
> "if one of our programmers changes a method in a class/interface, we
> immediately will know about it, because the next program-rebuild will
> simply fail. but if we would use python, we wouldn't find it out".
--------------------------------------------

I use C++ and Python every day. Let us be fair and point out some good
things about each of them.

(a) In a compiled language like C++, changing function prototypes and
variable names is comfortable, because the compiler will find all
those spots that you need to change. In Python, you do not have the
same level of comfort. Sure, there are other techniques, but it's
different from clicking a button.
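
To make point (a) concrete, here is a minimal sketch of what a missed rename looks like in Python. The `Account` class, its old method name `get_balance` and the `report` function are all hypothetical, invented for illustration. The module imports cleanly, and the stale call only fails when that line actually runs, which is why exercising the call path (for example from a test) is one of the "other techniques".

```python
class Account:
    """Hypothetical class whose method was renamed from get_balance to balance."""

    def __init__(self, amount):
        self._amount = amount

    def balance(self):
        return self._amount


def report(account):
    # Stale caller: still uses the old name. Nothing complains when the
    # module is loaded; AttributeError is raised only when this line runs.
    return "Balance: %d" % account.get_balance()


def stale_call_detected():
    """Return True if running report() exposes the stale method name."""
    try:
        report(Account(10))
    except AttributeError:
        return True
    return False


if __name__ == "__main__":
    print(stale_call_detected())
```

A test suite that calls `report` at least once would surface the rename in roughly the way a failed rebuild does in Java or C++, but only for the code paths the tests reach.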

(b) Cameron said something very true in my opinion: for large
projects, you want Python. But he said so without giving more details.
So let me add some comments.

In my opinion, the essence of software development is code/task
factorization. It seems such a trivial concept, but if you really
really think about it, goto statements, loops, functions, classes,
arrays, pointers, OOP, macros/templates, metaprogramming, AOP,
databases, etc., just about every single technique in programming has
its base in the concept of code/task factorization. Take for instance
classes and inheritance: basically, you factor out the common parts of
two classes and push them up into a common parent class. To go one level
deeper, my belief is that at the bottom, all human intellectual
activities are based on factorization: no more, no less.

In large projects, you'll find that you need to factor out even more.
Let us take an example. Suppose you write an application, and later on
you realize that you need to make it transactional: that is, if some
exceptions happen, you want to roll back the changes. Now, this kind
of major afterthought is terrible for languages without
metaprogramming capabilities. To add the new feature, you will have to
make modifications in hundreds or thousands of spots. Another example:
suppose your software is versioned; moreover, you have different
versions for the application and for the data file format, and your
application needs to work with legacy file formats. Again, without
metaprogramming capabilities, your code will have many redundant lines
of code, or be cluttered with tons of if-statements or
switch-statements. Another similar problem: you have several different
clients that buy your application, and they want some different extra
features. Again, without metaprogramming, your code will be either
hard to code (using virtual functions, function pointers, and/or
templates in C++), or will be cluttered with if-else and switch
statements (a terrible practice that will make your code
unmaintainable).
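
As an illustration of the transactional afterthought, here is one way it might be factored out in Python. This is only a sketch under stated assumptions: `transactional`, `make_transactional` and the `Inventory` class are hypothetical names, and "rollback" here means restoring an in-memory copy of the instance attributes, not undoing files or database writes.

```python
import copy
import functools


def transactional(method):
    """Wrap a method so the instance state is restored if it raises."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        snapshot = copy.deepcopy(self.__dict__)  # save state before the call
        try:
            return method(self, *args, **kwargs)
        except Exception:
            # Roll back to the saved attributes, then let the error propagate.
            self.__dict__.clear()
            self.__dict__.update(snapshot)
            raise

    return wrapper


def make_transactional(cls):
    """Class decorator: apply `transactional` to every public method."""
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, transactional(attr))
    return cls


@make_transactional
class Inventory:
    """Hypothetical application class, written with no rollback logic."""

    def __init__(self):
        self.items = {}

    def add(self, name, count):
        self.items[name] = self.items.get(name, 0) + count
        if self.items[name] < 0:
            raise ValueError("negative stock for %s" % name)


if __name__ == "__main__":
    inv = Inventory()
    inv.add("widget", 3)
    try:
        inv.add("widget", -5)  # raises after it has already mutated state
    except ValueError:
        pass
    print(inv.items)
```

The application class itself is untouched; the rollback behaviour is added in one place by the class decorator, rather than in every method.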

As your project grows more and more complex (becoming threaded, gaining
many new client requirements, supporting legacy versions, using distributed
computing in a cluster, etc.) you will realize more and more that you
need to factorize efficiently, otherwise your pain will be unbearable.

When you have reached that point, you'll come to appreciate simplicity
and purity in a language. Frankly, Python is good but still not good
enough.

For large projects, if you use a rigid language, then your best bet is
to use tons of programmers coding trivial interfaces and APIs to make
up for the shortcomings of the language. In flexible languages like
Python, you often can use metaprogramming features to factor out the
common areas. At that point, I think that issues like automatically
finding name changes as I mentioned in point (a) become small issues,
because you will have bigger concerns. The fact that you may miss a
name change or function header change is not the thing that will kill
you. The fact that your entire system is unmaintainable is the thing
that will kill you. Don't look at individual bugs when you are talking
about large projects, because your worry should not be there: your
worry should be focused on how to make your system maintainable. Bugs
can and will be fixed. But if your language does not allow you to
factorize efficiently, at the end of the day, that's what's going to
kill you.

regards,

Hung Jung

**Roger Binns** · Jul 18 '05, 09:47 AM

Re: Python for large projects

> (a) In a compiled language like C++, changing function prototypes and
> variable names is comfortable, because the compiler will find all
> those spots that you need to change.

It won't catch some stuff such as where a prototype changes from
pass by value to pass by reference (or vice versa), or if another
operator or explicit conversion is available. [That is true
of many languages, but C++ gives the impression it has this
rigid type checking system that avoids errors if the code compiles]

In reality I find the best approach is to use multiple languages.
You can code components in C++ and glue them together using
Swig and Python. You can make multiple binaries and execute
them telling them where to send their output, or use a pipe.
That kind of thing also makes it easier dealing with issues
in the field. For example you can send the customer a different
binary (that has the same interface) or the debugging version
of a DLL/so etc.
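
A rough sketch of the "separate binaries talking over a pipe" idea, seen from the Python glue side. The component here is only a stand-in (a second Python process that upper-cases its input); in the setup described above it would be a compiled binary with the same command-line interface. `COMPONENT` and `run_component` are hypothetical names.

```python
import subprocess
import sys

# Stand-in for a separately built component. Swapping in a different
# binary (say, a debugging build) only means changing this command.
COMPONENT = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().upper(), end='')",
]


def run_component(command, text):
    """Feed `text` to the component over a pipe and return what it writes."""
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(run_component(COMPONENT, "hello"))
```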

At the end of the day, use the best tool for the job, and
don't use any that preclude you from using others at the
same time as well.

Roger

**Bill Rubenstein** · Jul 18 '05, 09:49 AM

Re: Python for large projects

In article <mailman.357.1080147061.742.python-list@python.org>, gabor@z10n.net
says...
> On Wed, 2004-03-24 at 15:16, Bill Rubenstein wrote:
> > ...snip...
> > > > other thing is, that in the projects i work on, there seems to be
> > > > very hard to do unit tests
> > ...snip...
> >
> > The ability to do unit testing should not be an afterthought. It should be
> > considered as a major influence on the architecture of a project.
> >
> > If one cannot do proper unit testing, the architecture of the project is
> > questionable.
>
> ok, so let's use a specific example:
>
> imagine you're building a library, which fetches webpages.
>
> you have a library which can fetch 1 webpage at a time, but it is a
> synchronous library (like wget). you call him, and he returns the page.
>
> but you want an async one.
>
> so you decide to build a threadpool, where every thread will do this:
> look into a queue, and if there is a new URL to fetch, fetches it with
> his wget-like library, and saves the html page somewhere (and maybe
> signals something).
>
> and now the user who uses your library, simply adds the URL to fetch,
> and can check later asynchronously whether they are already fetched or
> not.
>
> could you tell me what unit tests would you create for this example?
>
>
> (a more generic request: is there on the internet a webpage with
> something like this? one where they have some complex
> modules/programs/algorithms, and they show how to write unittests for
> them?)
>
> thanks,
> gabor

Ok, I think I understand what the job is so, here is a try.

I'm assuming that this async wget's job is to start at a url, fetch it, track
down and fetch any links and such, get them, and make all of that available on
the local system for later viewing.

To make it testable, I'd design so that the application part of the system
(described above) has as limited a knowledge of its surroundings as possible --
except for the actual work performed. It should have no knowledge of a gui, for
instance.

Instead it should know about an object which represents a 'job'. This object
should have attributes and/or functions which can be accessed to find out the
base URL and the current status or state of the specific job (not started, in
progress (various states here), ..., complete). There should be a log associated
with the job object where both normal and abnormal stuff can be kept. It should
also be able to provide information about the user if there is one, instructions
about the base URL, where in the local file system to store the results, etc.
During the development phase this job object is going to be a bit dynamic as new
needs for it are discovered.

There should probably be one object which can keep track of all of the job
objects and is responsible for creating new ones and deleting old ones.

All of the interfaces to the job management object and the job object need to be
formalized and properly documented. This whole subsystem can be tested, then, by
a test driver requesting services via the documented interfaces, changing the
state of a job via the documented interfaces and determining that the state
transitions are as expected. There is no need to fetch any real URLs to do this,
just pretend you did. This test driver also needs to exercise the interfaces
intended for use by a gui.
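
The job object, the job management object and a test driver that only "pretends" to fetch might look something like the sketch below. Everything here is an assumption made for illustration: the names `State`, `Job`, `JobManager` and `advance`, the particular set of states (including `FAILED`), and the table of allowed transitions are not from the design described above.

```python
import enum


class State(enum.Enum):
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"
    FAILED = "failed"  # assumed state for the "abnormal" outcomes


# Allowed transitions; anything else is treated as a caller bug.
ALLOWED = {
    State.NOT_STARTED: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.COMPLETE, State.FAILED},
    State.COMPLETE: set(),
    State.FAILED: set(),
}


class Job:
    """One fetch job: where to start, where to store, state, and a log."""

    def __init__(self, base_url, target_dir):
        self.base_url = base_url
        self.target_dir = target_dir
        self.state = State.NOT_STARTED
        self.log = []

    def advance(self, new_state, note=""):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                "illegal transition %s -> %s" % (self.state.name, new_state.name)
            )
        self.log.append((self.state, new_state, note))
        self.state = new_state


class JobManager:
    """Creates, looks up and deletes jobs."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def create(self, base_url, target_dir):
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = Job(base_url, target_dir)
        return job_id

    def get(self, job_id):
        return self._jobs[job_id]

    def delete(self, job_id):
        del self._jobs[job_id]

    def pending(self):
        return [j for j, job in self._jobs.items() if job.state is State.NOT_STARTED]


if __name__ == "__main__":
    # Test driver: no real URL is fetched, the transitions are simply driven.
    manager = JobManager()
    jid = manager.create("http://example.invalid/", "/tmp/out")
    job = manager.get(jid)
    job.advance(State.IN_PROGRESS, "pretend fetch started")
    job.advance(State.COMPLETE, "pretend fetch finished")
    print(job.state, len(job.log))
```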

Now, as to testing the actual application code -- I'd think that you'd need a set
of URLs which would return known and stable results and a number of error
situations (bad links and such) to test against. Then a test driver would be
written to use the standard interfaces to the job management object and the job
object to schedule work against those URLs, determine when that work is done and
test that the results are as expected, highlight the differences between a prior
run against the particular URL and the current run, etc.
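
For the thread-pool part of the example, one possible shape is sketched below. Note that it swaps in a different technique from the one just described: instead of real URLs with known and stable results, the fetch function is passed in, so a test can hand the pool a fake that returns canned pages and raises for bad links. `FetchPool`, `fake_fetch`, `CANNED` and the `.invalid` URLs are all hypothetical.

```python
import queue
import threading


class FetchPool:
    """Thread pool that fetches URLs with an injected fetch(url) -> str callable."""

    def __init__(self, fetch, workers=2):
        self._fetch = fetch
        self._todo = queue.Queue()
        self._lock = threading.Lock()
        self.pages = {}   # url -> page text
        self.errors = {}  # url -> exception raised by fetch
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def add(self, url):
        self._todo.put(url)

    def _work(self):
        while True:
            url = self._todo.get()
            try:
                page = self._fetch(url)
                with self._lock:
                    self.pages[url] = page
            except Exception as exc:
                with self._lock:
                    self.errors[url] = exc
            finally:
                self._todo.task_done()

    def is_done(self, url):
        with self._lock:
            return url in self.pages or url in self.errors

    def wait(self):
        self._todo.join()  # block until every queued URL has been processed


# Canned results standing in for pages with known content.
CANNED = {"http://test.invalid/a": "<html>A</html>"}


def fake_fetch(url):
    if url not in CANNED:
        raise OSError("bad link: %s" % url)
    return CANNED[url]


if __name__ == "__main__":
    pool = FetchPool(fake_fetch)
    pool.add("http://test.invalid/a")
    pool.add("http://test.invalid/missing")
    pool.wait()
    print(sorted(pool.pages), sorted(pool.errors))
```

In production the same pool would be constructed with the real synchronous fetch function; the tests never need the network.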

I've been retired for years but that was pretty much how we did it. There were
two small programming teams -- one writing application code against the formal
interface documentation and one writing test scaffolding against the same
documentation and building test cases. Things worked, the bug rate was very low,
implementation changes were localized and testable...

Anyway, it worked for us and we never had to claim that we just couldn't test
something except in production.

Bill

**Cameron Laird** · Jul 18 '05, 09:49 AM

Re: Python for large projects

In article <c3t11s$619$1@atlantis.news.tpi.pl>,
Jarek Zgoda <jzgoda@gazeta.usun.pl> wrote:

**Cameron Laird** · Jul 18 '05, 09:49 AM

Minor observation on the programming enterprise (was: Python for large projects)

In article <8ef9bea6.0403260837.72a8fade@posting.google.com>,
Hung Jung Lu <hungjunglu@yahoo.com> wrote:

**Joe Mason** · Jul 18 '05, 09:49 AM

Re: Minor observation on the programming enterprise (was: Python for large projects)

In article <106e0498vbqa611@corp.supernews.com>, Cameron Laird wrote:
> Remarkable fact that I see as turning up all over: we work with
> grep(1). There are visual programming and language-savvy editors
> and IDEs and refactoring plugins and all sorts of other tools,
> and we find our variables with text searches. 'Know how to make
> a C programmer mad? Name a global variable 'i'. 'Know how to
> make him happy? Change the name to 'ii'. Both Lisp's inventor
> and I keep our human address collection in a plaintext file.

My address collection was scattered all over various databases and
phones, and I lost the phone with the most recent one. I spent a good
hour searching for an important number, and realized that the one
database I might still have access to was for a PDA I no longer owned,
with a desktop app that I could no longer run, in a Windows partition
that I couldn't boot to at the time.

I could see the actual data, but knowing the Windows world I was almost
positive it'd be some binary database, and I'd be out of luck.

Nope, XML. Almost as good as plain text for grepping. I've never been
so relieved.

Joe

**Isaac Gouy** · Jul 18 '05, 09:49 AM

Re: Python for large projects

Jacek Generowicz <jacek.generowicz@cern.ch> wrote in message news:<tyfbrmn635i.fsf@pcepsft001.cern.ch>...
> I am of the opinion that (explicit) static typing contributes to the
> bugginess of programs.

Is there a theory for the periodicity of static-checking /
dynamic-checking debates?

A couple of weeks' worth has drawn to a close on comp.lang.object

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=cmos401b4aa3lu33qk6afgs5fg9d16649i%404ax.com&rnum=1&prev=/groups%3Fq%3Dg:thl3859316027d%26dq%3D%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26as_drrb%3Db%26as_mind%3D29%26as_minm%3D3%26as_miny%3D1995%26as_maxd%3D12%26as_maxm%3D3%26as_maxy%3D2004%26selm%3Dcmos401b4aa3lu33qk6afgs5fg9d16649i%25404ax.com

The last discussion on comp.lang.functional was back in Nov 2003

**Aahz** · Jul 18 '05, 09:50 AM

Re: Python for large projects

[quoting unsnipped, voting this for post of the week]

In article <8ef9bea6.0403260837.72a8fade@posting.google.com>,
Hung Jung Lu <hungjunglu@yahoo.com> wrote:
>
>I use C++ and Python every day. Let us be fair and point out some good
>things about each of them.
>
>(a) In a compiled language like C++, changing function prototypes and
>variable names is comfortable, because the compiler will find all
>those spots that you need to change. In Python, you do not have the
>same level of comfort. Sure, there are other techniques, but it's
>different from clicking a button.
>
>(b) Cameron said something very true in my opinion: for large
>projects, you want Python. But he said so without giving more details.
>So let me add some comments.
>
>In my opinion, the essence of software development is code/task
>factorization. It seems such a trivial concept, but if you really
>really think about it, goto statements, loops, functions, classes,
>arrays, pointers, OOP, macros/templates, metaprogramming, AOP,
>databases, etc., just about every single technique in programming has
>its base in the concept of code/task factorization. Take for instance
>classes and inheritance: basically, you factor out the common parts of
>two classes and push them up into a common parent class. To go one level
>deeper, my belief is that at the bottom, all human intellectual
>activities are based on factorization: no more, no less.
>
>In large projects, you'll find that you need to factor out even more.
>Let us take an example. Suppose you write an application, and later on
>you realize that you need to make it transactional: that is, if some
>exceptions happen, you want to roll back the changes. Now, this kind
>of major afterthought is terrible for languages without
>metaprogramming capabilities. To add the new feature, you will have to
>make modifications in hundreds or thousands of spots. Another example:
>suppose your software is versioned; moreover, you have different
>versions for the application and for the data file format, and your
>application needs to work with legacy file formats. Again, without
>metaprogramming capabilities, your code will have many redundant lines
>of code, or be cluttered with tons of if-statements or
>switch-statements. Another similar problem: you have several different
>clients that buy your application, and they want some different extra
>features. Again, without metaprogramming, your code will be either
>hard to code (using virtual functions, function pointers, and/or
>templates in C++), or will be cluttered with if-else and switch
>statements (a terrible practice that will make your code
>unmaintainable).
>
>As your project grows more and more complex (becoming threaded, gaining
>many new client requirements, supporting legacy versions, using distributed
>computing in a cluster, etc.) you will realize more and more that you
>need to factorize efficiently, otherwise your pain will be unbearable.
>
>When you have reached that point, you'll come to appreciate simplicity
>and purity in a language. Frankly, Python is good but still not good
>enough.
>
>For large projects, if you use a rigid language, then your best bet is
>to use tons of programmers coding trivial interfaces and APIs to make
>up for the shortcomings of the language. In flexible languages like
>Python, you often can use metaprogramming features to factor out the
>common areas. At that point, I think that issues like automatically
>finding name changes as I mentioned in point (a) become small issues,
>because you will have bigger concerns. The fact that you may miss a
>name change or function header change is not the thing that will kill
>you. The fact that your entire system is unmaintainable is the thing
>that will kill you. Don't look at individual bugs when you are talking
>about large projects, because your worry should not be there: your
>worry should be focused on how to make your system maintainable. Bugs
>can and will be fixed. But if your language does not allow you to
>factorize efficiently, at the end of the day, that's what's going to
>kill you.
>
>regards,
>
>Hung Jung

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"usenet imitates usenet" --Darkhawk
