strong/weak typing and pointers

**Alex Martelli** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> wrote:
...
> I'm obviously upsetting you, and I can see that we're still not quite
> understanding each other. I have to assume that you're not the only one I'm
> upsetting through these misunderstandings, so for the sake of the list, I'll
> stop responding to this thread. Thanks everyone for a good discussion!

I apologize if I have given the impression of being upset. I am, in a
way, I guess -- astonished and nonplussed, as if somebody asked me to
justify the existence of bread -- not of some exotic food, mind you, but
of the most obvious, elementary, fundamental substance of earthly
sustenance (in my culture, and many others around it).

> P.S. If anyone would like to know my response to the float representation
> example, please contact me directly instead.

I promise not to ACT upset if you explain it here. So, we have an area
of 8 bytes in memory which we need to be able to treat as:
8 bytes, for I/O purposes, say;
a float, to feed it to some specialized register, say;
a bit indicating sign plus 15 for exponent plus 48 for significand,
or the like, to perform masking and shifting thereof in SW -- a
structure of three odd-bit-sized integers juxtaposed;
and this is ONE example -- the specific one you had asked for.
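
The three views listed above can be sketched in Python with the struct module. This is a minimal illustration, not code from the thread: the function name `float_views` is made up, and the field widths used are the IEEE 754 double-precision ones (1 sign bit, 11 exponent bits, 52 significand bits), which the post only approximates with "or the like".

```python
import struct

def float_views(x: float):
    """Return three views of the same 8 bytes: raw bytes, float, bit fields."""
    raw = struct.pack(">d", x)               # view 1: 8 raw bytes (big-endian)
    (as_float,) = struct.unpack(">d", raw)   # view 2: the float itself
    (bits,) = struct.unpack(">Q", raw)       # the same bytes as one 64-bit integer
    sign = bits >> 63                        # 1 bit
    exponent = (bits >> 52) & 0x7FF          # 11 bits, biased by 1023
    significand = bits & ((1 << 52) - 1)     # 52 bits, implicit leading 1 not stored
    return raw, as_float, (sign, exponent, significand)

raw, value, fields = float_views(-0.15625)   # -0.15625 == -1.25 * 2**-3
print(raw.hex(), value, fields)
```

Note that every view here is obtained by copying through `struct`, not by overlaying types on one memory area.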

Another example: we're going to send a controlblock of 64 bytes to some
HW peripheral, and get it back perhaps with some mods -- a typical
control/status arrangement. Depending on the top 2 (or in some case 4)
bytes' value, the structure may need to be interpreted in several
possible ways, in terms of juxtaposition of characters, halfwords and
longwords. Again, the driver responsible for talking with this
peripheral needs to be able to superimpose on the 64 bytes any of
several possible C-level struct's -- the cleanest way to do this would
appear to be pointer-casting, though unions would (as usual, of course)
be essentially equivalent. In Python, or another language that lets me
pack and unpack a struct to/from bytes in a controlled way (in Python's
case via the struct module) I can do that through a _copy_ -- I need to
go through a 'raw bytes' stage, cannot do the overlay directly; but
that's little more than a figleaf arrangement -- spending real CPU and
RAM operations because I can't be lowlevel/weakly-typed enough.
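
A rough sketch of the copy-based approach described above, using the struct module. The tag values and the two layouts are invented for illustration and do not describe any real peripheral; `LAYOUTS` and `interpret` are hypothetical names.

```python
import struct

# Hypothetical layouts for a 64-byte control block, selected by a 2-byte tag.
LAYOUTS = {
    1: struct.Struct(">H2x15I"),   # tag, 2 pad bytes, 15 longwords  (2+2+60 = 64)
    2: struct.Struct(">H30H2x"),   # tag, 30 halfwords, 2 pad bytes  (2+60+2 = 64)
}

def interpret(block: bytes):
    """Pick a layout from the leading tag and unpack the block with it."""
    if len(block) != 64:
        raise ValueError("control block must be exactly 64 bytes")
    (tag,) = struct.unpack_from(">H", block, 0)   # peek at the top 2 bytes
    layout = LAYOUTS[tag]                         # KeyError for an unknown tag
    fields = layout.unpack(block)                 # copies values out of the bytes
    return tag, fields[1:]                        # drop the tag from the fields

block = struct.pack(">H2x15I", 1, *range(15))
print(interpret(block))
```

Each call to `unpack` builds new Python objects from the raw bytes, which is the extra copy the post is complaining about.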

Alex

**Alex Martelli** · Jul 18 '05, 05:20 PM

Re: Summary: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> wrote:
...
> I wonder what people think about Ruby, which, I understand, does allow you to
> modify builtins. Can anyone tell me if you could make Ruby strings do the
> horrible coercion that PHP strings do?

Yes, you could. Reliable Ruby friends tell me that's not DONE in the
real world of Ruby, any more than pythonistas call their methods' first
argument 'foo' rather than 'self' or pepper their code with 'exec'
statements or code 200-chars nested-lambda oneliners. But though
culturally frowned on, it _is_ technically possible.

The one real example I saw, which was enough to turn me off my quest to
explore Ruby for production purposes, was making (builtin) string
comparisons case-insensitive -- apparently that _IS_ the kind of thing
_SOME_ perhaps-inexperienced Rubystas _DO_ perpetrate (breaking library
modules left, right, and center, of course). Maybe it's similar to
rather inexperienced Pythonistas dead keen on "exec myname+'='+value"; I
_have_ seen that horror perpetrated in real Python code (doesn't break
any library, but slows function execution down by 10 times w/o any real
advantage wrt dicts or bunch usage, and is a bug-prone piece too...).
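
For comparison, here is a small sketch of the two styles. The name and value are hypothetical, and the syntax is Python 3, where `exec` is a function and not the statement the post refers to.

```python
name, value = "threshold", 42      # hypothetical variable name and value

# The frowned-upon form: build source text and execute it.
namespace = {}
exec(name + "=" + repr(value), namespace)

# The dict form: the same association, with no code generation involved.
settings = {}
settings[name] = value

print(namespace[name], settings[name])
```

The dict version also avoids the risk of `name` or `value` containing text that is not a valid identifier or literal.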

Alex

**Michael Hobbs** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> wrote:
> My point here is that I think in most code, even when people do a bunch of
> bit-twiddling, they have a single underlying structure in mind, and therefore
> you see them treat the bits as one of two things: (1) The sequence of bits, i.e.
> the untyped memory block, or (2) the intended structure. IMHO, an example of
> taking advantage of weak-typing would be a case where you treat the bits as
> three different things: the sequence of bits, and two (mutually exclusive)
> intended structures.

One word: union

**Steven Bethard** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Alex Martelli <aleaxit <at> yahoo.com> writes:
>
> I apologize if I have given the impression of being upset.

No problem -- my mistake for misinterpreting you. I'm just sensitive to this
kind of thing because I know I've previously miscommunicated, and
unintentionally got people upset before (you being one of them). ;)

> I am, in a
> way, I guess -- astonished and nonplussed, as if somebody asked me to
> justify the existence of bread -- not of some exotic food, mind you, but
> of the most obvious, elementary, fundamental substance of earthly
> sustenance (in my culture, and many others around it).

Yeah, this goes to the heart of the misunderstanding. I'm not asking anyone to
justify the _existence_ of weak-typing. Weak-typing is a direct result of a
language's support for untyped (bit/byte) data. I agree 100% that this sort of
data is not only useful, but often essential in any low-level (e.g. OS, hardware
driver, etc.) code.

> So, we have an area
> of 8 bytes in memory which we need to be able to treat as:
> 8 bytes, for I/O purposes, say;
> a float, to feed it to some specialized register, say;
> a bit indicating sign plus 15 for exponent plus 48 for significand,
> or the like, to perform masking and shifting thereof in SW -- a
> structure of three odd-bit-sized integers juxtaposed;

As a quick refresher, I quote myself in what I was looking for:
"taking advantage of weak-typing would be a case where you treat the bits as
three different things: the sequence of bits, and two (mutually exclusive)
intended structures."

My response to this example is that your two intended structures are not
mutually exclusive. Yes, you have to do some bit-twiddling, but only because
your float struct doesn't have get_sign, get_exponent and get_significand
methods. ;) You're still dealing with the same representation, not converting
to a different type. You're just addressing a lower level part of the
representation.

I can see the point though: at least in most of the languages I'm familiar with,
float is declared as a type while there's no subtype of float that specifies the
sign, exponent and significand.

(Oh, and by the way, in case you really were wondering, they still do teach
float representations, even in computer science (as opposed to computer
engineering), or at least they did through 1999.)

> Another example: we're going to send a controlblock of 64 bytes to some
> HW peripheral, and get it back perhaps with some mods -- a typical
> control/status arrangement. Depending on the top 2 (or in some case 4)
> bytes' value, the structure may need to be interpreted in several
> possible ways, in terms of juxtaposition of characters, halfwords and
> longwords. Again, the driver responsible for talking with this
> peripheral needs to be able to superimpose on the 64 bytes any of
> several possible C-level struct's -- the cleanest way to do this would
> appear to be pointer-casting, though unions would (as usual, of course)
> be essentially equivalent.

Is the interpretation of the controlblock uniquely defined by the top 2 or 4
bytes, or are there some values for the top 2 or 4 bytes for which I have to
apply two different interpretations (C-level structs) to the same sequence of
bits?

If the top 2 or 4 bytes uniquely define the structs, then I would just say
you're just going back and forth between a typed structure and its untyped
representation. If the top 2 or 4 bytes can specify multiple interpretations
for the same sequence of bits, then this is the example I was looking for. =)

Steve

**Steven Bethard** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Michael Hobbs <mike <at> hobbshouse.org> writes:
>
> One word: union
>

Interestingly, unions can be well-defined even in a strongly-typed language,
e.g. OCaml:

# type int_or_list = Int of int | List of int list;;
type int_or_list = Int of int | List of int list
# Int 1;;
- : int_or_list = Int 1
# List [1; 2];;
- : int_or_list = List [1; 2]

The reason for this is that at any given time in OCaml, the sequence of bits is
only interpretable as *one* of the two types, never both. If you have a good
example of using a union (in C probably, since OCaml wouldn't let you do this I
don't think) where you want to treat a given sequence of bytes as both types *at
once*, that would be great!

Thanks,

Steve

**Alex Martelli** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> wrote:
...
> Yeah, this goes to the heart of the misunderstanding. I'm not asking
> anyone to justify the _existence_ of weak-typing. Weak-typing is a direct
> result of a language's support for untyped (bit/byte) data. I agree 100%
> that this sort of data is not only useful, but often essential in any
> low-level (e.g. OS, hardware driver, etc.) code.

But so is the ability to get at the same bits/bytes in structured ways.

> > So, we have an area
> > of 8 bytes in memory which we need to be able to treat as:
> > 8 bytes, for I/O purposes, say;
> > a float, to feed it to some specialized register, say;
> > a bit indicating sign plus 15 for exponent plus 48 for significand,
> > or the like, to perform masking and shifting thereof in SW -- a
> > structure of three odd-bit-sized integers juxtaposed;
>
> As a quick refresher, I quote myself in what I was looking for: "taking
> advantage of weak-typing would be a case where you treat the bits as three
> different things: the sequence of bits, and two (mutually exclusive)
> intended structures."
>
> My response to this example is that your two intended structures are not
> mutually exclusive. Yes, you have to do some bit-twiddling, but only
> because your float struct doesn't have get_sign, get_exponent and
> get_significand methods. ;) You're still dealing with the same
> representation, not converting to a different type. You're just
> addressing a lower level part of the representation.

What do you mean by "mutually exclusive"? "Never useful at the same
time"? You're asking for an example of things never useful at the same
time that are useful at the same time?!

The struct type with so many bits being signs, exponent, significands,
IS a distinct type from double-precision float -- it's the
representation of the latter according to some standard. To multiply by
0.1 I have to have a float, to 'get the N-bit integer that gives the
exponent shifted right by 3' I have to have that struct type. They're
totally distinct (not "mutually exclusive" because they ARE useful as
ways to look at the same bitbunch at the same time, of course) types,
ways to analyze or interpret the same bunch of bits (apart from the
untyped representation where I can do binary I/O with them, too).

> I can see the point though: at least in most of the languages I'm familiar
> with, float is declared as a type while there's no subtype of float that
> specifies the sign, exponent and significand.

Right. To get at the bitfields, you use weaktyping instead.

> > Another example: we're going to send a controlblock of 64 bytes to some
> > HW peripheral, and get it back perhaps with some mods -- a typical
> > control/status arrangement. Depending on the top 2 (or in some case 4)
> > bytes' value, the structure may need to be interpreted in several
> > possible ways, in terms of juxtaposition of characters, halfwords and
> > longwords. Again, the driver responsible for talking with this
> > peripheral needs to be able to superimpose on the 64 bytes any of
> > several possible C-level struct's -- the cleanest way to do this would
> > appear to be pointer-casting, though unions would (as usual, of course)
> > be essentially equivalent.
>
> Is the interpretation of the controlblock uniquely defined by the top 2 or 4
> bytes, or are there some values for the top 2 or 4 bytes for which I have to
> apply two different interpretations (C-level structs) to the same sequence of
> bits?

In the HW I was thinking of, the former is the case.

> If the top 2 or 4 bytes uniquely define the structs, then I would just say
> you're just going back and forth between a typed structure and its untyped
> representation. If the top 2 or 4 bytes can specify multiple interpretations
> for the same sequence of bits, then this is the example I was looking for. =)

I need to examine the top bytes of the block as the HW returned it, in
some cases, to know what struct type is most useful to interpret the
bunch of bits. There is typically only one type (besides 'just a bunch
of 64 bytes') that is useful at _one_ given time. But weak typing does
not require parallel processing without locks -- only if two independent
threads of control were looking at the same bits concurrently from two
separate processors would saying "at ONE time" make sense... true and
unfettered concurrent access...

As for two different interpretations of the same bits being useful (not
"at the same time"), consider a 16-bit field that can be seen as one
16-bit word or two 8-bit bytes. In the former case, '0' means the whole
operation concluded successfully, any non-0 means problems were
encountered. So, a piece of code that just needs a pass/nonpass filter
on the operation is best advised to treat that field as a 16-bit word,
so it can test it for == or != 0 atomically.

At a deeper level, one byte indicates possible problems of one kind (say
ones "intrinsic" to the procedure/operation in question), another
indicates possible problems of a different kind (say ones "extrinsic" to
the procedure per se, but caused by preemption, power failures, etc).
Unix return-status values aren't too far away from this. If you need
accurate diagnosis of what went wrong, seeing the same field as two
8-bit bytes is handier (assuming you can get some kind of lock in that
case, since you are then dealing with nonatomic testing).

You could see a test such as "if x->field16 == 0:" as a weird shorthand
for "if x->field8_a == 0 and x->field8_b == 0:", but depending on
considerations of atomicity it might not even be.
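
The 16-bit status field can be sketched with a ctypes union, which overlays both views on the same two bytes. The names `field16`, `field8_a` and `field8_b` follow the pseudo-code above; the classes and helper functions are hypothetical, and which byte lands in the high half of the word depends on the machine's byte order. The sketch says nothing about atomicity.

```python
import ctypes

class TwoBytes(ctypes.Structure):
    # Two 8-bit error codes, e.g. "intrinsic" and "extrinsic" problems.
    _fields_ = [("field8_a", ctypes.c_uint8), ("field8_b", ctypes.c_uint8)]

class Status(ctypes.Union):
    # Both members share the same 2 bytes of storage.
    _fields_ = [("field16", ctypes.c_uint16), ("parts", TwoBytes)]

def passed(status: Status) -> bool:
    """Pass/fail filter: a single comparison on the 16-bit view."""
    return status.field16 == 0

def diagnose(status: Status):
    """Detailed view: the same bits as two separate 8-bit codes."""
    return status.parts.field8_a, status.parts.field8_b

s = Status()
print(passed(s))          # a zeroed status counts as success
s.parts.field8_b = 3      # record a hypothetical "extrinsic" error code
print(passed(s), diagnose(s))
```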

Another example where the same sequence of bits may be usefully
interpreted in more ways at the same time: given a string of bytes which
encodes some unicode text in utf-8 it's clearly useful to consider it as
such, parsing it left to right byte by byte to find the unicode chars
being encoded and display the proper glyphs, etc. But I may also want
to walk the same area of memory as a sequence of 64-bit words to compute
a simple checksum to ensure data integrity (as well as the usual need
for 'untyped' bytescan for I/O). Or, say I don't know whether the
incoming data were utf-8 or utf-16; by walking over them in both 1-byte
(utf-8) and 2-byte units I may well be able to get strong heuristic
indications of which of the two encodings was in use. Similar
heuristics are sometimes very useful even in determining whether a bunch
of 4-byte words from a record are floats or ints -- as long, of course,
as you CAN walk them both ways and compare strangeness-indicators. If
you ever need to recover old data from datasets whose details were lost,
you'll find that out for yourself.
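
A toy version of the "walk it both ways and compare strangeness" heuristic might look like this. The scoring rule is an assumption made for the sketch: it treats anything other than printable ASCII as strange, so it only suits mostly-ASCII text. The function names are hypothetical.

```python
def strangeness(data: bytes, encoding: str) -> float:
    """Fraction of decoded characters that are not plain printable ASCII."""
    text = data.decode(encoding, errors="replace")
    if not text:
        return 0.0
    odd = sum(
        1 for ch in text
        if not (ch.isascii() and (ch.isprintable() or ch in "\t\r\n"))
    )
    return odd / len(text)

def guess_encoding(data: bytes) -> str:
    """Interpret the same bytes under both encodings; keep the less strange one."""
    return min(("utf-8", "utf-16-le"), key=lambda enc: strangeness(data, enc))

sample = "hello world!"
print(guess_encoding(sample.encode("utf-8")))
print(guess_encoding(sample.encode("utf-16-le")))
```

Read as UTF-8, UTF-16 data for ASCII text is full of NUL characters, while UTF-8 data read as UTF-16 turns into unrelated non-ASCII characters. That difference is what the score picks up.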

Alex

**Christophe Cavalaria** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Michael Hobbs wrote:

> Steven Bethard <steven.bethard@gmail.com> wrote:
>> My point here is that I think in most code, even when people do a bunch
>> of bit-twiddling, they have a single underlying structure in mind, and
>> therefore you see them treat the bits as one of two things: (1) The
>> sequence of bits, i.e.
>> the untyped memory block, or (2) the intended structure. IMHO, an
>> example of taking advantage of weak-typing would be a case where you
>> treat the bits as three different things: the sequence of bits, and two
>> (mutually exclusive) intended structures.
>
> One word: union

Note that in the C standard, writing to part A of a union and reading from
part B is UB (undefined behavior), and so it should *not* be used.

**Steven Bethard** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Alex Martelli <aleaxit <at> yahoo.com> writes:
>
[snip example decomposing float representation into mantissa, etc.]
>
[snip example determining struct type from first few bytes]
>
[snip example decomposing 16 bit error code into two 8 bit error codes]
>
[snip example determining utf-8 or utf-16 by trying byte stream as both]

Thanks for the examples!

I'm not quite convinced by the decomposition examples or the struct type
example, but the UTF example is definitely convincing. I can imagine that you
could extend this type of example to any case where you didn't know the actual
type of a struct. Given this situation, you could try treating the bytes as
each of the possible struct types, and see (heuristically or perhaps with a
machine learning approach) which struct type is most appropriate.

This definitely meets my criterion of treating the same set of bytes as two
different structures, and it's even useful! =) Thanks!

Steve

**Michael Hobbs** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> wrote:
> The reason for this is that at any given time in OCaml, the sequence of bits is
> only interpretable as *one* of the two types, never both. If you have a good
> example of using a union (in C probably, since OCaml wouldn't let you do this I
> don't think) where you want to treat a given sequence of bytes as both types *at
> once*, that would be great!

This example is a little weak, but may be sufficient. The in_addr
structure used for sockets usually uses a union to provide different
views to the underlying 32-bit address. You can access the address
as 4 8-bit values, 2 16-bit values, or 1 32-bit value. Most code
these days only uses the 4 8-bit representation, but the interface is
there.
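
The three views of a 32-bit address can be imitated in Python by unpacking the same four bytes in different ways. This is only an analogy to the C union: each `struct.unpack` call copies the data, it does not overlay it. The address used is a documentation address chosen for the example.

```python
import socket
import struct

packed = socket.inet_aton("192.0.2.1")       # 4 bytes, network byte order

as_bytes = struct.unpack("!4B", packed)      # four 8-bit values
as_halves = struct.unpack("!2H", packed)     # two 16-bit values
(as_word,) = struct.unpack("!I", packed)     # one 32-bit value

print(as_bytes, [hex(h) for h in as_halves], hex(as_word))
```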

Another possible example comes from the Windows API. Some of the
functions take an arbitrary length structure. If you want to make a
simple call to the function, you pass a small structure. If you
want to make a more complex call to the function, you pass a larger
structure that has more fields tacked on to the end. Usually, the
first field in the structure is an int that specifies how large the
structure is. It is used as sort of a crude version of OO in C.
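
The size-prefixed structure idiom can be sketched with ctypes. The structures below are invented for illustration and are not real Windows API types; the field names `size`, `flags` and `timeout_ms` are hypothetical.

```python
import ctypes

class BasicRequest(ctypes.Structure):
    _fields_ = [("size", ctypes.c_uint32), ("flags", ctypes.c_uint32)]

class ExtendedRequest(ctypes.Structure):
    # Same leading fields as BasicRequest, with one more tacked on the end.
    _fields_ = [("size", ctypes.c_uint32), ("flags", ctypes.c_uint32),
                ("timeout_ms", ctypes.c_uint32)]

def describe(buffer: bytes) -> dict:
    """Read the leading size field, then choose how to interpret the rest."""
    size = ctypes.c_uint32.from_buffer_copy(buffer).value
    if size == ctypes.sizeof(ExtendedRequest):
        req = ExtendedRequest.from_buffer_copy(buffer)
        return {"flags": req.flags, "timeout_ms": req.timeout_ms}
    if size == ctypes.sizeof(BasicRequest):
        req = BasicRequest.from_buffer_copy(buffer)
        return {"flags": req.flags, "timeout_ms": None}
    raise ValueError("unrecognized structure size: %d" % size)

small = BasicRequest(ctypes.sizeof(BasicRequest), 1)
large = ExtendedRequest(ctypes.sizeof(ExtendedRequest), 1, 500)
print(describe(bytes(small)), describe(bytes(large)))
```

The callee never needs to be told which structure it received, because the first field says how many bytes are meaningful.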

I'm not sure if these are the kinds of examples you're looking for.
I don't know how anyone would be able to use a sequence of bytes as
two types of data at once. There is almost always some sort of
indicator that specifies how to interpret the bytes; otherwise, it
is just garbage.

-- Mike

**Diez B. Roggisch** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Steven Bethard wrote:

> Michael Hobbs <mike <at> hobbshouse.org> writes:
>>
>> One word: union
>>
>
> Interestingly, unions can be well-defined even in a strongly-typed
> language, e.g. OCaml:
>
> # type int_or_list = Int of int | List of int list;;
> type int_or_list = Int of int | List of int list
> # Int 1;;
> - : int_or_list = Int 1
> # List [1; 2];;
> - : int_or_list = List [1; 2]

Unions in functional languages are also known as direct sums of types (as
opposed to products, which form tuples). And trying to access a union that
holds an int as a list will yield an error - runtime, most probably. So there
is no way of reinterpreting an int as a list, which still satisfies the
paradigms of a strongly typed language.
--
Regards,

Diez B. Roggisch

**Greg Ewing** · Jul 18 '05, 05:20 PM

Re: strong/weak typing and pointers

Diez B. Roggisch wrote:
> I can remember abusing 32bit pointers in 68k processors by
> altering the most-significant byte.

Apple did this in early versions of the Memory Manager
of classic MacOS, using the upper 8 bits of a Handle
for various flags. You weren't supposed to make any
assumptions about what the upper byte contained, but
of course some people did... and their applications
broke when 32-bit addressing came in...

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand

Greg Ewing's Home Page

http://www.cosc.canterbury.ac.nz/~greg

**Mike Meyer** · Jul 18 '05, 05:21 PM

Re: Summary: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> writes:

> JCM <joshway_without_spam <at> myway.com> writes:
>>
>> > Definition 1 is the definition most commonly used in Programming
>> > Languages literature.... However, for
>> > all intents and purposes, it is only applicable to statically typed
>> > languages; no one on the list could come up with a dynamically typed
>> > language that allowed bit-reinterpretation.
>>
>> Assembly language. The types of values are implied by what
>> instructions you use.
>
> I'm sure some people would argue that assembly language is untyped (not
> statically or dynamically typed) and that the operations are defined on bits,
> but this is definitely the best example I've seen. Thanks!

The previously mentioned BCPL has the exact same property. For that
matter, early versions of C used to allow it to a large degree. I've
actually compiled programs written as "char *main = { ... }".

To me, a dynamically typed language is one where objects - rather than
variables - have a type attached.

<mike
--
Mike Meyer <mwm@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

**Mike Meyer** · Jul 18 '05, 05:21 PM

Re: Summary: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> writes:

> Gabriel Zachmann writes:
> In summary, there are basically three interpretations of "weak-typing" discussed
> in this thread:
>
> (1) A language is "weakly-typed" if it allows code to take a block of memory
> that was originally defined as one type and reinterpret the bits of this block
> as another type.
>
> (2) A language is "weakly-typed" if it has a large number of implicit coercions.
>
> (3) A language is "weakly-typed" if it often treats objects of one type as other
> types.
>
> Definition 1 is the definition most commonly used in Programming Languages
> literature, and allows a language to be called "weakly-typed" based only on the
> language definition. However, for all intents and purposes, it is only
> applicable to statically typed languages; no one on the list could come up with
> a dynamically typed language that allowed bit-reinterpretation.

Definition 1 is a black/white proposition instead of being a
continuum. Once you allow the simple case needed for real-world work
of allowing an object to be treated as whatever it is or a sequence of
bytes, you can treat any type as any other type.

> Definition 2 seemed to be the definition most commonly used on the list, most
> likely because it is actually applicable to a dynamically typed language like
> Python. It has the problem that in a language that supports operator
> overloading (like Python), programmers can make their language more
> "weakly-typed" by simply providing additional coercions, thus whether or not a
> language is called "weakly-typed" depends both on the language definition and
> any code written in the language.

This problem can largely be made to go away by limiting it to builtin
types. Likewise for definition 3.
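
To make the distinction concrete, here is a small sketch of how user code can add coercions through operator overloading while the builtin types stay strict. `CoercingStr` is a hypothetical class written for this example, not something from the thread or from any library.

```python
class CoercingStr(str):
    """Hypothetical str subclass that coerces to numbers on +, PHP style."""

    def __add__(self, other):
        return float(self) + float(other)

    __radd__ = __add__   # also handle 1 + CoercingStr("1")

def try_add(a, b):
    """Return a + b, or the string 'TypeError' if the types refuse to mix."""
    try:
        return a + b
    except TypeError:
        return "TypeError"

print(try_add("1", 1))                  # builtin str and int do not coerce
print(try_add(CoercingStr("1"), 1))     # the subclass does
print(try_add(1, CoercingStr("1")))
```

Restricting the definition to builtin types, as suggested above, means only the first call counts when judging the language itself.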

I'd call Ruby's allowing builtin types to be changed a
misfeature. Builtin types should be subclassed.

<mike
--
Mike Meyer <mwm@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

**Michael Hobbs** · Jul 18 '05, 05:21 PM

Re: strong/weak typing and pointers

Steven Bethard <steven.bethard@gmail.com> wrote:
> The reason for this is that at any given time in OCaml, the sequence of bits is
> only interpretable as *one* of the two types, never both. If you have a good
> example of using a union (in C probably, since OCaml wouldn't let you do this I
> don't think) where you want to treat a given sequence of bytes as both types *at
> once*, that would be great!

I've come up with the perfect example for you. However, it is from
the days when memory was scarce and programmers were allowed to use
any programming language they wanted, so long as it was assembly.

To conserve as much memory as possible, some programmers would use
machine code that was loaded into memory as their integer constants.
Here is an excerpt from The Story of Mel:
(http://www.catb.org/~esr/jargon/html/story-of-mel.html)

Since Mel knew the numerical value
of every operation code,
and assigned his own drum addresses,
every instruction he wrote could also be considered
a numerical constant.
He could pick up an earlier "add" instruction, say,
and multiply by it,
if it had the right numeric value.
His code was not easy for someone else to modify.

**Alex Martelli** · Jul 18 '05, 05:21 PM

Re: strong/weak typing and pointers

Greg Ewing <greg@cosc.canterbury.ac.nz> wrote:

> Diez B. Roggisch wrote:
> > I can remember abusing 32bit pointers in 68k processors by
> > altering the most-significant byte.
>
> Apple did this in early versions of the Memory Manager
> of classic MacOS, using the upper 8 bits of a Handle
> for various flags. You weren't supposed to make any
> assumptions about what the upper byte contained, but
> of course some people did... and their applications
> broke when 32-bit addressing came in...

I believe many implementations of high-level languages on machines where
addresses had to be aligned used LOW bits similarly. Say addresses of
integers need to be even or else a bus error will occur. Then, a word
that is used to hold an integer address has its low bit 'available' as a
flag -- it needs to be cleared before it's dereferenced, anyway.

This seems reasonably sound because, even if a later model of the CPU
should be extended to allow misaligned addresses, the OS need not
support that. Misaligned addresses can pay substantial performance
prices for little gain -- not sure about the state of play these days,
but just a few years ago you could boost the performance of some C code
on Intel CPUs (which always allowed address misalignment) quite a bit by
recompiling with flags telling the compiler to ensure address alignment.

So, the low bit, when set, could indicate we're pointing to a Bignum
(like a Python long), when clear, that we're pointing to an ordinary
small integer, for example -- or other such dichotomous distinctions.

Of course, such an address-plus-flag must be handled as a bitmask (to
examine and clear the flag) or a pointer, interchangeably.
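
The low-bit trick can be simulated with plain integers standing in for machine words. This is only a model: the flag name and helper functions are hypothetical, and the sketch assumes addresses are always even, as in the alignment scenario described above.

```python
BIGNUM_FLAG = 0x1   # low bit is free because aligned addresses are even

def tag(address: int, is_bignum: bool) -> int:
    """Combine an aligned address and a one-bit flag into a single word."""
    if address & BIGNUM_FLAG:
        raise ValueError("address must be even (aligned)")
    return address | (BIGNUM_FLAG if is_bignum else 0)

def untag(word: int):
    """Split a tagged word back into (address, is_bignum)."""
    return word & ~BIGNUM_FLAG, bool(word & BIGNUM_FLAG)

word = tag(0x1000, True)
print(hex(word), untag(word))
```

The same word is handled as a bitmask in `tag` and `untag` and as an address everywhere else, which is the interchangeable use the post describes.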

Alex
