Summary: The video explores why some projects use multiple programming languages, focusing on cases where components written in different languages must run as a single process. It corrects the oversimplified view of a compiler by walking through the multi-step compilation process (pre-processing, compilation to assembly, assembly, and linking) and distinguishes static from dynamic linking. It then explains how toolchains such as GCC (now the GNU Compiler Collection) enable multi-language projects, with the linker combining object files produced from different languages. A key point is that successful low-level inter-language communication requires the languages to conform to a common Application Binary Interface (ABI), which defines how data is passed between functions.
Why Some Projects Use Multiple Programming Languages
- Some projects involve multiple programming languages.
- How easy this is to understand depends on the type of project.
- Easy case: A full-stack framework like Django, where Python handles the backend and HTML, CSS, and JavaScript build the UI. These run as separate processes that communicate remotely.
- Harder case: Projects where components written in different languages run together as a single process.
Core Principles for Multi-Language Projects (Single Process)
- Focus on programming languages that compile to machine code.
- Each language generally has its own compiler, so direct compilation of one language's file by another language's compiler is not possible.
- The challenge: How can languages with separate compilers, runtimes, and memory models coexist in the same binary?
Demystifying Compilers: The Multi-Step Process
- Compilers don't just directly turn source code into executables.
- They perform a complex, multi-step process.
- Example: C program compilation with GCC
- Pre-processing: Removes comments, expands macros, and resolves conditional compilation and #include directives. Output is still C code.
- Compilation: Translates pre-processed code into assembly language (human-readable instructions).
- Myth busted: Compilers don't always convert source code directly to machine code; often to intermediate representations like assembly or another language.
- Assembly: The assembler (a compiler itself) translates assembly code into machine code (ones and zeros), producing an object file.
- Linking: Combines multiple object files (from user code and external libraries) into a single executable.
Linking: Static vs. Dynamic
- Static Linking: Copies machine code of required library functions directly into the final executable.
- Self-contained and ready to run.
- Dynamic Linking: Libraries are pre-compiled into dynamic shared libraries (.so on Unix, .dll on Windows).
- Executable contains only references to functions in dynamic libraries.
- The operating system's dynamic loader maps the required libraries into the process and resolves these references at runtime.
- Advantages: Saves disk space and memory (only one copy of library needed), allows library updates without recompiling dependent programs.
Modularity in Compilation and Multi-Language Projects
- The compilation process is modularized; compilers like GCC hide intermediate steps by default.
- Flags can expose the intermediate files or stop the process at a specific stage (e.g., in GCC, -save-temps keeps the intermediate files and -S stops after generating assembly).
- Compilation can also start from any phase (e.g., by passing an assembly file to GCC).
- Enables multi-language projects: Different language components (e.g., C and assembly) can be compiled/assembled separately and then linked together.
- Example: C code for main logic, assembly for performance-critical functions. This is applied in projects like Linux kernel, FFmpeg, OpenSSL.
GCC as a Toolchain (GNU Compiler Collection)
- GCC is not just a C compiler but a toolchain (a pipeline of tools).
- Each tool in the chain is pluggable.
- GCC evolved from the GNU C Compiler into the GNU Compiler Collection, which also supports C++, Objective-C, Fortran, Ada, D, and Go.
- This perspective clarifies how different languages can be part of the same compilation system.
Mixing High-Level Languages
- Possible to mix high-level languages (e.g., C and Fortran).
- Often requires multiple compilation steps, as each language has its own pipeline and potentially runtime dependencies.
- The linker is crucial for combining object files from different languages into a single executable.
- Example: Call Rust from C:
- Implement function in Rust.
- Compile Rust code into a static or dynamic library.
- Declare and use the function in C code.
- Compile C code and link with the Rust-compiled library.
- Common to call C from Rust due to C's mature libraries and system APIs.
- Motivation: Performance optimization (writing performance-critical parts in lower-level languages like C).
Application Binary Interface (ABI)
- Just having object files isn't enough for successful linking.
- Languages must conform to an Application Binary Interface (ABI).
- ABI defines: How separately compiled pieces of binary code interact at the machine level (e.g., calling conventions that specify which registers or stack slots carry arguments and return values between functions).
- Mismatch example: Language A passes parameters in registers R0, R1; Language B expects R1, R2. Or one uses pass-by-reference and the other pass-by-value. These mismatches lead to undefined behavior or crashes.
- At least one language, or the interacting part, must conform to the other's ABI expectations.
- Modern language features: Languages provide keywords/attributes (e.g., extern in C/Rust, no_mangle in Rust, bind(C) in Fortran, import "C" in Go) to ensure the generated assembly conforms to the expected ABI when interacting with other languages.