It sounds absurd, but the C language was born from a failed project. By 1969, the huge Multics project, a joint venture of General Electric, the Massachusetts Institute of Technology, and Bell Labs to build an operating system, was clearly in trouble. Not only was it failing to deliver the fast and convenient online system that had been promised, it was failing to deliver anything usable at all. Although the development team eventually got Multics running, they had become stuck in the same mud that IBM had with OS/360: they were trying to build a very large operating system that could run on very small hardware. Multics became a treasure trove of engineering lessons, and it also paved the way for C to embody the idea that "small is beautiful".

When the discouraged Bell Labs researchers withdrew from the Multics project, they looked around for other work. One of them, Ken Thompson, was keen to work on another operating system. He made several proposals to Bell Labs management, all of which were rejected. While waiting for official approval, Thompson and his colleague Dennis Ritchie amused themselves by porting Thompson's "Space Travel" software to a little-used PDP-7. Space Travel simulated the major bodies of the solar system, displayed them on a graphics screen, and provided a spacecraft that could be flown around and landed on the various planets. At the same time, Thompson worked intensively on a simple new operating system for the PDP-7. It was much simpler and lighter than Multics, and the whole system was written in assembly language. Brian Kernighan coined the name UNIX in 1970, a self-deprecating pun on Multics that summed up the lessons learned about what not to do. Figure 1-1 shows the relationship between early C, UNIX, and the associated hardware.

Which came first, C or UNIX? It is easy to turn this into a chicken-and-egg question. To be precise, UNIX appeared before C (which is why UNIX system time is counted in seconds since January 1, 1970: that is when time began for it). However, this is a programming story, not a poultry anecdote. Writing UNIX in assembly language was clumsy: a lot of time was wasted coding data structures, and the system was hard to debug and understand. Thompson wanted the advantages of a high-level language without the inefficiency of PL/I and without the complexity he had seen in Multics. After a brief and unsuccessful attempt with Fortran, Thompson created the B language by simplifying the research language BCPL so that its interpreter would fit in the PDP-7's memory of only 8K words. B was never really successful. The hardware's memory limits left room only for an interpreter, not a compiler, and the resulting inefficiency prevented B from being used for systems programming of UNIX itself.

The golden rule of compiler designers: efficiency is (almost) everything
In a compiler, efficiency is almost everything. Of course there are other things to care about, such as meaningful error messages, good documentation, and product support. But compared with the speed that users want, these factors pale. Compiler efficiency has two aspects: runtime efficiency (how fast the generated code runs) and compile-time efficiency (how fast executable code is generated). Except in some development and teaching environments, runtime efficiency is the deciding factor.
Many optimizations lengthen compilation time but shorten run time. Others (such as eliminating dead code and omitting runtime checks) shorten both compilation time and run time and reduce memory usage as well. The drawback of such optimizations is that invalid results in a program may go undetected. The optimizations themselves are very careful when transforming code, but if the programmer writes invalid code (for example, referencing an object beyond the end of an array because they "know" that a variable they need is stored nearby), the result may be incorrect.
This is why efficiency is almost everything, but not absolutely everything. If the result is wrong, what is the point of efficiency? Compiler designers usually provide compiler options so that each programmer can choose the optimizations they want. B was not considered a success, but the efficiency-focused "New B" created by Dennis Ritchie was, which bears out the golden rule of compiler designers.

B simplified BCPL by omitting some features (such as nested procedures and some loop constructs), and it carried forward the idea that referencing an array element is equivalent to dereferencing a pointer plus an offset. B also kept BCPL's typelessness: its only operand type was the machine word. Thompson invented the ++ and -- operators and added them to the B compiler on the PDP-7. They are still present in C. Many people naively assume that these operators exist because of the PDP-11's auto-increment and auto-decrement addressing modes. This is wrong! The auto-increment and auto-decrement operators predate the PDP-11 hardware. It is true that the C statement that copies one character of a string:

*p++ = *s++;

can be compiled very efficiently into the PDP-11 instruction:

movb (r0)+, (r1)+

and this has led many people to conclude, wrongly, that the former was designed with the latter in mind.

When development moved to the PDP-11 in 1970, a typeless language quickly became unworkable. The PDP-11 supported several data types of different sizes in hardware, and B had no way to express them. Efficiency was also a problem, which led Thompson to reimplement UNIX in PDP-11 assembly language. Dennis Ritchie took advantage of the more powerful PDP-11 to create "New B" (a name that quickly became "C"), which solved both the data type problem and the efficiency problem. It was compiled rather than interpreted, and it introduced a type system in which every variable had to be declared before use.

Early experiences with C

The main purpose of adding the type system was to help compiler designers distinguish the different data types supported by the new PDP-11 hardware, such as single-precision floating-point numbers, double-precision floating-point numbers, and characters. This contrasts sharply with languages such as Pascal, where the purpose of the type system is to protect programmers from performing invalid operations on data. Because of this different design philosophy, C rejects strong typing and lets programmers assign between objects of different types when they need to. The type system was something of an afterthought, and it was never seriously evaluated or rigorously tested for usability. To this day, many C programmers believe that "strong typing" means nothing more than extra typing at the keyboard.

Besides the type system, many other features of C were built for the convenience of compiler designers (and why not? In its first few years, the main customers of C were the compiler designers themselves). Language features that reflect the thinking of compiler designers include:

  • Array subscripts start at 0 instead of 1. Most people are used to counting from 1. Compiler designers chose 0 because they think in terms of offsets, but most people find this awkward. Even though we define an array as a[100], you must never store data in a[100], because the valid range of the array is a[0] to a[99].
  • The basic data types of C correspond directly to the underlying hardware. For example, unlike Fortran, C has no built-in complex-number type. If the underlying hardware does not directly support a language element, the compiler designer does not waste any effort on it. C did not support floating-point types at first; they were added only once the hardware could support floating-point numbers directly.
  • The auto keyword is apparently useless. It is meaningful only to compiler designers creating symbol table entries. It means "storage is allocated automatically on entering the block" (as opposed to global static allocation or dynamic allocation on the heap). Other programmers need not care about auto, because it is the default storage allocation mode for variables.
  • An array name in an expression can be treated as a pointer. Treating arrays as pointers simplifies many things. We no longer need a complicated mechanism to tell them apart, and we avoid the inefficiency of copying the entire contents of an array when passing it to a function. However, arrays and pointers are not equivalent in all cases. See Chapter 4 for a more detailed discussion.
  • float is automatically widened to double. Although this is no longer the case in ANSI C, floating-point constants originally had double precision, and float variables in every expression were always converted to double. The reason has never been made public, but it has to do with the hardware representation of floating-point numbers on the PDP-11. First, on the PDP-11 or VAX, converting from float to double is very cheap: just append an extra word of zero bits. To convert back, just drop the second word. Second, some PDP-11 floating-point hardware had a mode bit: it performed either float operations or double operations, and switching between the two required changing this bit. Early UNIX programs made little use of float, so it was more convenient to fix the mode at double and spare the compiler designer from tracking mode changes.
  • Nested functions (a function definition contained inside another function) are not allowed. This simplifies the compiler and slightly improves the runtime organization of C programs. The mechanism is described in detail in Chapter 6, "Poetry in Motion: Runtime Data Structures".
  • The register keyword. This keyword gives the compiler designer a hint about which variables in a program are "hot" (frequently used) so that they can be kept in registers. This design can be considered a mistake. Letting the compiler allocate registers automatically at each use of a variable is clearly better than keeping the variable in a register for the whole lifetime of its declaration. The register keyword simplifies the compiler but shifts the burden onto the programmer.

Many other language features were established for the convenience of C compiler designers. This is not a bad thing in itself. It greatly simplifies the language, and by avoiding complicated language elements (such as generics and tasking in Ada, string handling in PL/I, and templates and multiple inheritance in C++), C became easier to learn and implement, and very efficient.

Unlike most other languages, C had a long evolutionary process. It went through many intermediate states before reaching its current form, evolving over many years from a practical tool into a language that has been thoroughly tried and tested. The first C compiler appeared around 1972, more than 20 years ago when this was written. As time passed, the UNIX system on which it was founded came into wide use, and C grew along with it. Its emphasis on low-level operations directly supported by the hardware brought extremely high efficiency and portability, which in turn helped UNIX achieve great success.

K&R C

By the mid-1970s, C was very close to the form we know and love today. Further refinements were made, but most were matters of detail (such as allowing functions to return structure values) or extensions to the basic types to accommodate new hardware (such as the addition of the keywords unsigned and long). In 1978, Steve Johnson wrote pcc, the portable C compiler. Its source code was made available outside Bell Labs and was very widely ported, forming the basis of a whole generation of C compilers. The evolution of C is shown in Figure 1-2.

Figure 1-2 Later C

Software dogma
An unusual bug
C inherited one feature from Algol-68: the compound assignment operator. It lets a repeated operand be written once instead of twice, and it gives the code generator a hint that operand addressing can be similarly compact. An example is b+=3 as an abbreviation for b=b+3. Originally, the compound assignment operator was written with the assignment sign first and the operator second, like this: b=+3. A quirk of the B lexical analyzer made the =op form easier to implement than the op= form used today. But that form was confusing, because it was too easy to mix up
b=-3; /* Subtract 3 from b */
and
b= -3; /* Assign -3 to b */
Therefore, the feature was changed to the form used today. As part of the change, the code formatter indent was modified to recognize the obsolete form of the compound assignment operator and swap the two characters to convert it to the corresponding standard form. This was a very bad decision. No formatter should change anything in a program except whitespace. Unfortunately, the change introduced a bug: almost anything that appeared after an assignment sign (as long as it was not a variable) had its position swapped with the assignment sign.
If you were lucky, the bug caused a syntax error. For example:
epsilon=.0001;
would be swapped to:
epsilon.=0001;
This statement does not get past the compiler, so the error is found right away. But a source statement might also look like this:
valve=!open; /* valve is set to the logical negation of open */
It would be silently swapped to:
valve!=open; /* valve is compared with open for inequality */
This statement still compiles, but it clearly does something different from the source statement: it does not change the value of valve.
In this second case, the bug stays latent and is not detected immediately. Adding a space after the assignment sign is natural, so as the obsolete form of the compound assignment operator became rarer, people gradually forgot that indent had been changed to "improve" the obsolete form. This indent bug persisted in some implementations until the mid-1980s. This is something that should be firmly rejected!

In 1978, the classic C text The C Programming Language was published. The book was widely praised, and its authors, Brian Kernighan and Dennis Ritchie, became famous for it, so this version of the language is called "K&R C". The publisher initially estimated that the book would sell about 1,000 copies. As of 1994, approximately 1.5 million copies had been sold (see Figure 1-3). C became one of the most successful programming languages of the last 20 years, perhaps the most successful. But as C spread, many people tried to derive other variants from it.

Figure 1-3 Like Elvis, C is everywhere

This passage is excerpted from "Expert C Programming"

UNIX system

Since C was born from the UNIX system and became popular because of it, we start with UNIX (note: the term UNIX as used here also covers other systems, such as FreeBSD, which descends from UNIX but does not use the name for legal reasons).

1. Editing on UNIX systems

UNIX C does not have its own editor, but you can use a general-purpose UNIX editor such as emacs, jove, vi, or an X Window System text editor.

As the programmer, you are responsible for typing the program correctly and for choosing a suitable name for the file that stores it. As mentioned earlier, the file name should end with .c. Note that UNIX is case sensitive. Therefore, budget.c, BUDGET.c, and Budget.c are three different but equally valid C source file names. However, BUDGET.C is not a valid name, because its extension uses an uppercase C rather than a lowercase c.

Suppose we use the vi editor to write the following program and store it in a file named inform.c:

#include <stdio.h>
int main(void)
{
   printf("A .c is used to end a C program filename.\n");

   return 0;
}

The text above is the source code, and inform.c is the source file. Note that the source file is the beginning of the whole compilation process, not the end.

2. Compiling on UNIX system

Although the program looks perfect to us, to the computer it is a pile of gibberish. The computer does not understand what #include and printf mean (perhaps you do not understand them yet either, but you will as you learn; the computer never will). As mentioned earlier, we need a compiler to translate the code we write (source code) into code the computer can understand (machine code). The final executable file contains all the machine code the computer needs to complete the task.

Traditionally, the UNIX C compiler was invoked with the cc command. However, that compiler did not keep up with the development of the standard and has been retired. UNIX systems now usually provide a C compiler from some other source and make the cc command an alias for it. Therefore, although different systems invoke different compilers, users can continue to use the same command.

To compile inform.c, enter the following command:

cc inform.c

After a few seconds, the UNIX prompt returns to tell you that the task is complete. If the program contains mistakes, you may see warnings or error messages, but let us assume that the program is entirely correct (if the compiler complains about the word void, your system has not yet been updated to an ANSI C compiler; simply delete void). If you use the ls command to list files, you will find a file named a.out (see Figure 1.5). This is the executable file containing the translated (or compiled) program. To run it, just enter:

a.out

The output content is as follows:

A .c is used to end a C program filename.



Figure 1.5   Preparing a C program using UNIX

If you want to keep the executable file (a.out), you should rename it. Otherwise, it will be replaced by the new a.out produced the next time you compile a program.

What about the object code? The C compiler creates an object code file with the same base name as the source file but with the extension .o. In this example, the object code file is inform.o. However, you will not find this file, because it is deleted once the linker has produced the complete executable program. If the original program consists of multiple source files, the object code files are kept. When you learn about multi-file programs later, you will see the benefit of this.

Linux system

Linux is a popular, open source, UNIX-like operating system that runs on a variety of platforms (including PC and Mac). Preparing C programs on Linux is almost the same as on UNIX systems, except that you use GCC, the freely available C compiler from GNU. The compilation command looks like this:

gcc inform.c

Note that when installing Linux, you can choose whether to install GCC. If GCC has not been installed, you must install it. Usually, the installation makes cc an alias for gcc, so you can type cc instead of gcc on the command line.

This paragraph is excerpted from"C Primer Plus (6th Edition) Chinese Version"
