Doubt on compiler-processor relationship

  • ajaygargnsit
    New Member
    • Jul 2007
    • 18

    Doubt on compiler-processor relationship

    Let's say we are using the gcc compiler on a Pentium 4. Say there is another machine running the same gcc compiler, but with a different processor, say a Sun SPARC. Now, even though only the processor differs, different machine code will be generated for the same source code. (I assume this is the case; kindly correct me if I am wrong.)

    Now, how does the compiler know what machine code to generate? (I presume it has something to do with the compiler "adjusting" its parameters at boot time, but I am not sure. Please help.)

    Looking forward to a reply.

    Ajay Garg
  • JosAH
    Recognized Expert MVP
    • Mar 2007
    • 11453

    #2
    Originally posted by ajaygargnsit
    Let's say we are using the gcc compiler on a Pentium 4. Say there is another machine running the same gcc compiler, but with a different processor, say a Sun SPARC. Now, even though only the processor differs, different machine code will be generated for the same source code. (I assume this is the case; kindly correct me if I am wrong.)

    Now, how does the compiler know what machine code to generate? (I presume it has something to do with the compiler "adjusting" its parameters at boot time, but I am not sure. Please help.)

    Looking forward to a reply.

    Ajay Garg
    Gcc nicely separates the compilation frontend (lexical analysis, parsing,
    abstract code generation) from the backend. The backend 'knows' which
    processor it has to generate code for. Gcc can even do cross compilation,
    i.e. the backend of the compiler is simply replaced by another one.
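    As a small illustration of that frontend/backend split (assuming a typical Linux gcc installation; the SPARC toolchain name below is an example and may not be installed), each gcc binary can tell you which target its backend was built for:

    ```shell
    # Print the target triplet this gcc's backend generates code for.
    # The exact output depends on the installation (e.g. x86_64-linux-gnu).
    gcc -dumpmachine

    # A cross compiler is simply another gcc binary built with a different
    # backend; its name conventionally carries the target triplet as a
    # prefix, e.g. (assuming such a toolchain is installed):
    #   sparc-linux-gnu-gcc -dumpmachine
    ```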

    kind regards,

    Jos


    • ajaygargnsit
      New Member
      • Jul 2007
      • 18

      #3
      Gcc nicely separates the compilation frontend (lexical analysis, parsing,
      abstract code generation) from the backend. The backend 'knows' which
      processor it has to generate code for. Gcc can even do cross compilation,
      i.e. the backend of the compiler is simply replaced by another one.
      That means that compilation from source code to object code is a two-step process: one step on the frontend side, the other on the backend side, right?


      • ajaygargnsit
        New Member
        • Jul 2007
        • 18

        #4
        The compiler itself is a (compiled) binary. You can't run a gcc for a Sparc on
        a pentium, nor vice versa. See your other thread where I also explained things
        a bit more. The compiler doesn't 'know' anything about other processors. It's
        the code generating part of the compiler that knows all about one single processor.
        Well JosAH, I moved this here, so we can continue in the same thread ...

        OK, so that means that the installation CDs of Linux contain different gcc binaries, one for each possible processor?


        • JosAH
          Recognized Expert MVP
          • Mar 2007
          • 11453

          #5
          Originally posted by ajaygargnsit
          Well JosAH, I moved this here, so we can continue in the same thread ...

          OK, so that means that the installation CDs of Linux contain different gcc binaries, one for each possible processor?
          A binary executable is targeted at one specific processor only, so if you have
          Linux running on, say, an ARM and an Intel processor, you need binaries for
          each one of them.

          Note that you can have compiler backends running on processor P1 while
          they generate machine code for processor P2; they're the building blocks for
          cross compilation.
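          To see that a given binary is tied to one processor, you can inspect what the native compiler produced; a hedged sketch (tiny.c is a made-up file name, and the exact `file` output depends on your machine):

          ```shell
          # A trivial program, compiled with the native gcc.
          echo 'int main(void) { return 0; }' > tiny.c
          gcc tiny.c -o tiny

          # 'file' reports the architecture the binary was compiled for;
          # copying it to a different processor family would not work.
          file tiny
          ```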

          kind regards,

          Jos


          • ajaygargnsit
            New Member
            • Jul 2007
            • 18

            #6
            OK, I think I can now make sense of it. Taking my system configuration as an example (Fedora, gcc, Intel Pentium 4), the steps that happen are as follows:

            During installation, the initial booting stage checks the processor. Since mine is an Intel P4, it will install every binary (including gcc, of course) targeted at the P4.

            Now the gcc binaries include:

            a) The frontend binaries (built for the P4)
            b) The backend binaries (also built for the P4), but which are able to cross compile.

            So, all in all, one frontend binary (for the P4) and several backend binaries (for the P4 itself, SPARC, ...).

            Also, that means that all the compilation in the "usual" sense (resolving references, linking, etc.) happens in the frontend, while the backend only converts the code into strings of 0's and 1's.

            Kindly correct me if I am wrong at any point above.

            Thanks JosAH

            Ajay Garg


            • JosAH
              Recognized Expert MVP
              • Mar 2007
              • 11453

              #7
              Originally posted by ajaygargnsit
              OK, I think I can now make sense of it. Taking my system configuration as an example (Fedora, gcc, Intel Pentium 4), the steps that happen are as follows:

              During installation, the initial booting stage checks the processor. Since mine is an Intel P4, it will install every binary (including gcc, of course) targeted at the P4.

              Now the gcc binaries include:

              a) The frontend binaries (built for the P4)
              b) The backend binaries (also built for the P4), but which are able to cross compile.

              So, all in all, one frontend binary (for the P4) and several backend binaries (for the P4 itself, SPARC, ...).

              Also, that means that all the compilation in the "usual" sense (resolving references, linking, etc.) happens in the frontend, while the backend only converts the code into strings of 0's and 1's.

              Kindly correct me if I am wrong at any point above.

              Thanks JosAH

              Ajay Garg
              Actually, resolving references etc. is the work of the linker; the loader then
              substitutes physical addresses for the logical addresses. The compiler backend
              generates COFF or ELF files for that (Common Object File Format and Executable
              and Linkable Format).
              Those files contain the machine instructions for the processor the compiler
              was targeted at.
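              You can see both things in an object file on a typical Linux system; a small sketch (m.c is a made-up file name), using the binutils tools nm and readelf:

              ```shell
              # A module that calls a function the linker must resolve later.
              echo 'extern int puts(const char *s);
              int main(void) { return puts("hi"); }' > m.c
              gcc -c m.c -o m.o            # backend output: a relocatable ELF object

              nm m.o                       # 'U puts' marks an unresolved reference
              readelf -h m.o | head -n 12  # the ELF header's Machine field names the CPU
              ```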

              kind regards,

              Jos


              • ajaygargnsit
                New Member
                • Jul 2007
                • 18

                #8
                Ahh ... well, it would be nice if I could get the "complete" procedure of what happens, from the step of writing a C program to the step of getting it executed.

                I would be obliged for any help.

                (Wouldn't it be nice if we started one step at a time, and then discussed each step, taking into account the different cases possible at each step?)


                • JosAH
                  Recognized Expert MVP
                  • Mar 2007
                  • 11453

                  #9
                  Originally posted by ajaygargnsit
                  Ahh ... well, it would be nice if I could get the "complete" procedure of what happens, from the step of writing a C program to the step of getting it executed.

                  I would be obliged for any help.

                  (Wouldn't it be nice if we started one step at a time, and then discussed each step, taking into account the different cases possible at each step?)
                  I've got a better idea: read a book on compilers; Aho, Sethi and Ullman's
                  "Dragon Book" is legendary. For starters you might have a peek at my
                  little "Compilers" series in the Java Articles section.
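                  Until then, here is a first end-to-end sketch, assuming a glibc-based Linux system (prog.c is a made-up file name): compile and link, let the dynamic loader map in the shared libraries, then run.

                  ```shell
                  echo '#include <stdio.h>
                  int main(void) { puts("done"); return 0; }' > prog.c

                  gcc prog.c -o prog   # compile, assemble and link in one call
                  ldd ./prog           # shared libraries the loader maps in before main runs
                  ./prog               # prints: done
                  ```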

                  kind regards,

                  Jos


                  • weaknessforcats
                    Recognized Expert Expert
                    • Mar 2007
                    • 9214

                    #10
                    I always understood that these chips have a built-in x86-instruction-set-to-native-code translator.

                    Your C++ compiler generates x86 code, which is translated by the chip into native instructions at execution time.

                    Otherwise, you would need a binary for every chip under the sun, plus distribution, maintenance, etc.

                    I even read an article once where a Pentium designer wrote a compiler that generated native Pentium code instead of generic x86 code. Result: the program was 14x faster. His comment: do something nice for people and all they want to do is cook on a campfire.
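                    For what it's worth, gcc lets you opt into a specific processor model of the same family via its -march/-mtune target options (e.g. -march=pentium4 on an x86 gcc); a hedged sketch that just lists what the installed gcc supports, since the option set varies by build:

                    ```shell
                    # Show this gcc's target-specific options and their current
                    # settings; on x86 builds these include -march= and -mtune=,
                    # which pick the processor model to generate/tune code for.
                    gcc -Q --help=target | head -n 15
                    ```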


                    • JosAH
                      Recognized Expert MVP
                      • Mar 2007
                      • 11453

                      #11
                      Originally posted by weaknessforcats
                      I always understood that these chips have a built-in x86-instruction-set-to-native-code translator.

                      Your C++ compiler generates x86 code, which is translated by the chip into native instructions at execution time.

                      Otherwise, you would need a binary for every chip under the sun, plus distribution, maintenance, etc.
                      But we still do, no matter what clever tricks Intel puts in its Pentium processors.
                      Half-buried in the bare metal there's a RISC core that emulates the
                      CISC instruction set of the older Intel processors. Compilers still generate that
                      old instruction set. But, e.g., SPARCs or ARMs (both RISC processors) still need
                      their own binaries. For those Intel chips, the 'hidden' instructions can be
                      considered the microcode of the Pentium; not very clever IMHO, because it
                      takes millions of gates, with all the accompanying heat dissipation, to get
                      reasonable performance. Microcode technology is only a halfway solution:
                      100K-gate RISC processors can do (almost) the same thing at the same speed
                      using a much slower clock (less heat dissipation and power consumption).

                      kind regards,

                      Jos
