Also, do y’all call main() in the if block or do you just put the code you want to run in the if block?

  • onlinepersona@programming.dev
    3 points · 2 days ago

    Can someone explain to me how to compile a C library that has a “main” as well as a program with main? How does executing a program actually work? It has an executable flag, but what actually happens in the OS when it encounters a file with the executable flag set? How does it know to execute “main”? Is it possible to have a library that can be called and also executed like a program?

    Anti Commercial-AI license

    • barsoap@lemm.ee
      20 points · 2 days ago

      How does executing a program actually work?

      Way too long an answer for a Lemmy post.

      It has an executable flag, but what actually happens in the OS when it encounters a file with the executable flag set?

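      Depends on OS. Linux will look at the first bytes of the file and see either (ASCII) #! (called a shebang) or the ELF magic number. For a #! script, it runs the interpreter named on that line with the script’s path as an argument: when executing e.g. a Python script whose shebang is #!/usr/bin/env python, it runs /usr/bin/env with the parameters python and the file name. For an ELF file, the kernel maps the file into memory itself and, if it is dynamically linked, also maps the interpreter (the dynamic loader) named in its INTERP program header.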
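The first-bytes check can be sketched in C. This is only an illustration of the idea, not the kernel's actual code (the real logic lives in the kernel's binfmt handlers, which also support other formats); `classify_exec` and `shebang_interpreter` are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Rough classification of an executable file by its first bytes,
   mirroring the two cases described above (shebang vs. ELF). */
enum exec_kind { EXEC_UNKNOWN, EXEC_SHEBANG, EXEC_ELF };

enum exec_kind classify_exec(const unsigned char *buf, size_t len) {
    /* ELF files start with 0x7f 'E' 'L' 'F' (the "Magic" line in readelf). */
    static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };
    if (len >= 4 && memcmp(buf, elf_magic, 4) == 0)
        return EXEC_ELF;
    /* Scripts start with "#!" followed by the interpreter path. */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return EXEC_SHEBANG;
    return EXEC_UNKNOWN;
}

/* Copies the interpreter path of a "#!" line into out (e.g. "/usr/bin/env").
   Returns 0 on success, -1 if the line is not a shebang or out is too small. */
int shebang_interpreter(const char *line, char *out, size_t outlen) {
    if (strncmp(line, "#!", 2) != 0)
        return -1;
    const char *p = line + 2;
    while (*p == ' ' || *p == '\t')   /* skip optional blanks after "#!" */
        p++;
    size_t n = strcspn(p, " \t\n");   /* path ends at a blank or newline */
    if (n == 0 || n + 1 > outlen)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

For #!/usr/bin/env python this yields /usr/bin/env; on Linux everything after the path is passed to it as one single optional argument, which is where the python parameter comes from.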

      How does it know to execute “main”?

      Compiled C programs are ELF, so the kernel goes through the ELF headers, figures out which ld.so to use (from the INTERP header), and starts that. ld.so then finds all the libraries, resolves the dynamic symbols, does some bookkeeping, and jumps to the program’s entry point, _start. That is, it doesn’t know about main at all: main is a C thing.
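To see which objects ld.so actually loaded for a running program, glibc and musl provide dl_iterate_phdr(). A small sketch, assuming Linux; apart from dl_iterate_phdr and its struct, the names are made up for the example.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded object: the main program (empty name), the vDSO,
   libc, ld.so itself, and so on. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    int *count = data;
    printf("%-40s base=%#lx\n",
           info->dlpi_name[0] ? info->dlpi_name : "(main program)",
           (unsigned long)info->dlpi_addr);
    (*count)++;
    return 0; /* 0 means: keep iterating */
}

/* Lists the objects the dynamic loader has mapped; returns how many. */
int list_loaded_objects(void) {
    int count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

In a dynamically linked program this typically lists the program itself, the vDSO, libc and ld.so; the exact list and paths depend on your system.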

      Is it possible to have a library that can be called and also executed like a program?

      Absolutely. ld.so is an example of that. Actually, wait, I’m not so sure any more; I’m getting things mixed up with libdl.so. In any case, ld.so is an executable with a file extension that makes it look like a library.

      EDIT: It does work. My (GNU) libc spits out version info when executed as an executable.
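A sketch of how you could build such a dual-use library yourself, assuming x86-64 Linux with glibc and GCC; the loader path, the DEMO_EXECUTABLE_LIB macro and all demo_* names are assumptions for the example. The .interp section tells the kernel which dynamic loader to start, and -Wl,-e,... sets the ELF entry point. As far as I know, glibc uses the same .interp trick to make libc.so.6 runnable.

```c
#include <unistd.h>

/* Ordinary library function, callable by anything that links the .so. */
int demo_add(int a, int b) { return a + b; }

#ifdef DEMO_EXECUTABLE_LIB
/* Path of the dynamic loader (assumption: x86-64 glibc). Placing it in
   .interp makes the linker emit an INTERP program header. */
const char demo_interp[] __attribute__((section(".interp")))
    = "/lib64/ld-linux-x86-64.so.2";

/* Entry point chosen with -Wl,-e,demo_entry. There is no return address
   and no C-style argc/argv, so it must end with _exit(). The attribute
   realigns the stack, which is not call-aligned at process entry. */
__attribute__((force_align_arg_pointer, noreturn))
void demo_entry(void) {
    static const char msg[] = "libdemo.so: run as a program\n";
    ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)n;
    _exit(0);
}
#endif
```

Build and run (hypothetical file names):

    gcc -shared -fPIC -DDEMO_EXECUTABLE_LIB -Wl,-e,demo_entry -o libdemo.so demo.c
    ./libdemo.so

Other programs can still link against libdemo.so and call demo_add() as usual; the macro only keeps the file compilable as a plain object as well.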

      If you want to start looking at the innards like that, I would suggest starting here: Hello world in assembly. Note the absence of a main function: the symbol that actually gets invoked is _start, and the setup necessary to call a C main is done by the C runtime (crt1.o provides _start, which hands over to __libc_start_main in libc.so). Don’t try to understand GNU’s libc, it’s full of hystarical raisins; I would suggest musl.
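A quick way to check that main is not where execution starts: the kernel reports the program's entry point in the auxiliary vector, readable with getauxval(AT_ENTRY). A sketch assuming Linux with glibc 2.16+ or musl; program_entry and report_entry are hypothetical names.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Entry point of the running program as passed by the kernel (AT_ENTRY).
   In a normal gcc build this is _start from crt1.o, not main. */
unsigned long program_entry(void) {
    return getauxval(AT_ENTRY);
}

/* Prints the entry point next to an address supplied by the caller,
   e.g. report_entry((void *)main) from inside main(). */
void report_entry(const void *main_addr) {
    printf("entry (AT_ENTRY) = %#lx\n", program_entry());
    printf("main             = %p\n", main_addr);
}
```

Called as report_entry((void *)main) (casting a function pointer to void * is a common POSIX-ism rather than strict ISO C), it should print two different addresses; the first should correspond to _start as listed by nm, plus the load base for a PIE binary.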

      • onlinepersona@programming.dev
        2 points · 1 day ago

        EDIT: It does work. My (GNU) libc spits out version info when executed as an executable.

        How does that work? There must be something above ld.so, maybe the OS? Because looking at the ELF header, ld.so is a shared library: “Type: DYN (Shared object file)”.

        $ readelf -hl ld.so
        ELF Header:
          Magic:   7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00 
          Class:                             ELF64
          Data:                              2's complement, little endian
          Version:                           1 (current)
          OS/ABI:                            UNIX - GNU
          ABI Version:                       0
          Type:                              DYN (Shared object file)
          Machine:                           Advanced Micro Devices X86-64
          Version:                           0x1
          Entry point address:               0x1d780
          Start of program headers:          64 (bytes into file)
          Start of section headers:          256264 (bytes into file)
          Flags:                             0x0
          Size of this header:               64 (bytes)
          Size of program headers:           56 (bytes)
          Number of program headers:         11
          Size of section headers:           64 (bytes)
          Number of section headers:         23
          Section header string table index: 22
        
        Program Headers:
          Type           Offset             VirtAddr           PhysAddr
                         FileSiz            MemSiz              Flags  Align
          LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                         0x0000000000000db8 0x0000000000000db8  R      0x1000
          LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                         0x0000000000029435 0x0000000000029435  R E    0x1000
          LOAD           0x000000000002b000 0x000000000002b000 0x000000000002b000
                         0x000000000000a8c0 0x000000000000a8c0  R      0x1000
          LOAD           0x00000000000362e0 0x00000000000362e0 0x00000000000362e0
                         0x0000000000002e24 0x0000000000003000  RW     0x1000
          DYNAMIC        0x0000000000037e80 0x0000000000037e80 0x0000000000037e80
                         0x0000000000000180 0x0000000000000180  RW     0x8
          NOTE           0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                         0x0000000000000040 0x0000000000000040  R      0x8
          NOTE           0x00000000000002e8 0x00000000000002e8 0x00000000000002e8
                         0x0000000000000024 0x0000000000000024  R      0x4
          GNU_PROPERTY   0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                         0x0000000000000040 0x0000000000000040  R      0x8
          GNU_EH_FRAME   0x0000000000031718 0x0000000000031718 0x0000000000031718
                         0x00000000000009b4 0x00000000000009b4  R      0x4
          GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                         0x0000000000000000 0x0000000000000000  RW     0x10
          GNU_RELRO      0x00000000000362e0 0x00000000000362e0 0x00000000000362e0
                         0x0000000000001d20 0x0000000000001d20  R      0x1
        

        The program headers don’t have interpreter information either. Compare that to ls, which is “Type: EXEC (Executable file)” and has an INTERP program header:
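The difference can be checked programmatically with the structures from <elf.h>. This is a minimal sketch of a tiny subset of readelf -hl, assuming a 64-bit ELF file with the same byte order as the host; summarize_elf and struct elf_summary are made up for the example. One thing worth knowing when reading the results: most distributions now build executables as position-independent (PIE), and those also show up as DYN, so DYN alone does not mean "library"; an INTERP header is the better hint that a file expects to be started through ld.so.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* A few fields of interest from an ELF64 file. */
struct elf_summary {
    unsigned type;        /* e_type: ET_EXEC, ET_DYN, ... */
    unsigned long entry;  /* e_entry: where execution starts */
    int has_interp;       /* 1 if a PT_INTERP program header exists */
};

/* Returns 0 on success, -1 on I/O error or if the file is not ELF64. */
int summarize_elf(const char *path, struct elf_summary *out) {
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int rc = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;
    out->type = eh.e_type;
    out->entry = eh.e_entry;
    out->has_interp = 0;
    /* Walk the program header table looking for PT_INTERP. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto done;
        if (ph.p_type == PT_INTERP)
            out->has_interp = 1;
    }
    rc = 0;
done:
    fclose(f);
    return rc;
}
```

Run on /bin/ls and on ld.so, it should reproduce what readelf shows in this thread: ls has an INTERP header, ld.so has none but still has a nonzero entry point.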

        $ readelf -hl ls
        ELF Header:
          Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
          Class:                             ELF64
          Data:                              2's complement, little endian
          Version:                           1 (current)
          OS/ABI:                            UNIX - System V
          ABI Version:                       0
          Type:                              EXEC (Executable file)
          Machine:                           Advanced Micro Devices X86-64
          Version:                           0x1
          Entry point address:               0x40b6e0
          Start of program headers:          64 (bytes into file)
          Start of section headers:          1473672 (bytes into file)
          Flags:                             0x0
          Size of this header:               64 (bytes)
          Size of program headers:           56 (bytes)
          Number of program headers:         14
          Size of section headers:           64 (bytes)
          Number of section headers:         32
          Section header string table index: 31
        
        Program Headers:
          Type           Offset             VirtAddr           PhysAddr
                         FileSiz            MemSiz              Flags  Align
          PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                         0x0000000000000310 0x0000000000000310  R      0x8
          INTERP         0x00000000000003b4 0x00000000004003b4 0x00000000004003b4
                         0x0000000000000053 0x0000000000000053  R      0x1
          LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                         0x0000000000007570 0x0000000000007570  R      0x1000
          LOAD           0x0000000000008000 0x0000000000408000 0x0000000000408000
                         0x00000000000decb1 0x00000000000decb1  R E    0x1000
          LOAD           0x00000000000e7000 0x00000000004e7000 0x00000000004e7000
                         0x00000000000553a0 0x00000000000553a0  R      0x1000
          LOAD           0x000000000013c9c8 0x000000000053d9c8 0x000000000053d9c8
                         0x000000000000d01c 0x0000000000024748  RW     0x1000
          DYNAMIC        0x0000000000148080 0x0000000000549080 0x0000000000549080
                         0x0000000000000250 0x0000000000000250  RW     0x8
          NOTE           0x0000000000000350 0x0000000000400350 0x0000000000400350
                         0x0000000000000040 0x0000000000000040  R      0x8
          NOTE           0x0000000000000390 0x0000000000400390 0x0000000000400390
                         0x0000000000000024 0x0000000000000024  R      0x4
          NOTE           0x000000000013c380 0x000000000053c380 0x000000000053c380
                         0x0000000000000020 0x0000000000000020  R      0x4
          GNU_PROPERTY   0x0000000000000350 0x0000000000400350 0x0000000000400350
                         0x0000000000000040 0x0000000000000040  R      0x8
          GNU_EH_FRAME   0x0000000000126318 0x0000000000526318 0x0000000000526318
                         0x0000000000002eb4 0x0000000000002eb4  R      0x4
          GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                         0x0000000000000000 0x0000000000000000  RW     0x10
          GNU_RELRO      0x000000000013c9c8 0x000000000053d9c8 0x000000000053d9c8
                         0x000000000000c638 0x000000000000c638  R      0x1
        

        It feels like somewhere in the flow the same thing is happening as in Python, just more hidden. Python seems to expose it because a file can be a library and an executable at the same time.


        • barsoap@lemm.ee
          1 point · 1 day ago

          Your ld.so contains:

          Entry point address: 0x1d780

          EDIT: …by which I meant, modulo brainfart: my libc.so.6 contains a proper entry address, while other libraries point at 0x0 and dump core when executed. libc.so is a linker script, presumably because GNU compulsively overcomplicates everything.

          …I guess that’s enough for the kernel. It might be a Linux-only thing, maybe even unintended, and well, Linux doesn’t break userspace.

          Speaking of which, I was playing it a bit fast and loose: _start is merely the default symbol name for the entry label, and the linker can set it to something different (ld’s -e option does exactly that).
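A sketch of a C program with no main at all, using ld's -e option to name a different entry symbol. It assumes GCC on Linux with a dynamically linked glibc; my_entry and write_greeting are hypothetical names, and the build command is in the comment at the top.

```c
#include <unistd.h>

/* Build (assumption: gcc on Linux, dynamic linking):
 *   gcc -nostartfiles -Wl,-e,my_entry -o noentry noentry.c
 * -nostartfiles drops crt1.o, which normally provides _start and calls
 * __libc_start_main -> main; -e makes my_entry the ELF entry point.
 * libc's own initializers still run, because ld.so runs them before
 * jumping to the entry point. */

static const char greeting[] = "hello from my_entry, no main() involved\n";

/* Plain helper, also callable from ordinary code. */
ssize_t write_greeting(int fd) {
    return write(fd, greeting, sizeof greeting - 1);
}

/* Entry point: nothing to return to, so exit explicitly. The attribute
   realigns the stack, which is not call-aligned at process entry. */
__attribute__((force_align_arg_pointer, noreturn))
void my_entry(void) {
    write_greeting(STDOUT_FILENO);
    _exit(0);
}
```

In a fully static build without the startup files, libc would not be initialized for you, so this shortcut is only reasonable in the dynamically linked case.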

          • JATth@lemmy.world
            1 point · 12 hours ago

            Btw, ld.so is a symlink to ld-linux-x86-64.so.2, at least on my system. It is a statically linked executable. ld.so is, in simpler words, an interpreter for the ELF format, and you can run it:

            ld.so --help
            

            Entry point address: 0x1d780

            which seems to be contained in the only executable segment of ld.so:

            LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000028bb5 0x0000000000028bb5  R E    0x1000
            

            Edit: My understanding of this is quite shallow; the above is a segment that in this case contains the entirety of the .text section.
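Checking which segment an address falls into is just a range comparison over the PT_LOAD program headers. A small sketch using the <elf.h> types; find_load_segment and segment_is_executable are hypothetical helpers. With the numbers from the first readelf output in this thread (entry 0x1d780, executable LOAD segment at 0x1000 with size 0x29435), the entry lands in the second LOAD segment, the R E one.

```c
#include <elf.h>
#include <stddef.h>

/* Returns the index of the PT_LOAD segment whose virtual address range
   [p_vaddr, p_vaddr + p_memsz) contains addr, or -1 if there is none. */
int find_load_segment(const Elf64_Phdr *ph, size_t n, Elf64_Addr addr) {
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type == PT_LOAD &&
            addr >= ph[i].p_vaddr && addr < ph[i].p_vaddr + ph[i].p_memsz)
            return (int)i;
    }
    return -1;
}

/* Nonzero if the segment is mapped executable (the "E" in "R E"). */
int segment_is_executable(const Elf64_Phdr *ph) {
    return (ph->p_flags & PF_X) != 0;
}
```

In a running program, dl_iterate_phdr() hands you the same program headers through dlpi_phdr and dlpi_phnum, so the helper can be applied without parsing the file yourself.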

    • MajorasMaskForever@lemmy.world
      5 points · 2 days ago

      You don’t. In C, everything gets referenced by a symbol during the link stage of compilation. Static libraries ultimately get treated like your own object files, and all their items land in one symbol table. When linking statically, two definitions with the same name result in a link failure and the build aborts. So a library and a program that both define main is no bueno.

      When Linux loads an executable, it doesn’t search the symbol table for “main”: it starts at the entry point address recorded in the ELF header (normally _start), and the C runtime code there eventually calls main.

      Windows and macOS work much the same way: the executable format records an entry point, and the C runtime calls main from there. Most RTOSes have their own special way of doing things; on bare metal you’re at the mercy of your CPU vendor. The C standard specifies that, in a hosted environment, the function called at program startup is named “main”, which is why it’s the special symbol we all just happen to use.

    • namingthingsiseasy@programming.dev
      1 point · 1 day ago

      There are a lot of other helpful replies in this thread, so I won’t add much, but I did find this reference, which you could read if you have a lot of free time. But I particularly liked reading this summary:

      • _start calls the libc __libc_start_main;
      • __libc_start_main calls the executable __libc_csu_init (statically-linked part of the libc);
      • __libc_csu_init calls the executable constructors (and other initialisations);
      • __libc_start_main calls the executable main();
      • __libc_start_main calls the executable exit().
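The constructors-before-main step is easy to observe with the GCC/Clang constructor attribute. A sketch with made-up names; note also that this list describes older glibc: as far as I know, since glibc 2.34 __libc_csu_init is gone and __libc_start_main runs the constructors itself, but the order relative to main is the same.

```c
#include <stdio.h>
#include <stdlib.h>

/* Each hook appends a letter, so the order of events can be checked. */
static char order[8];
static int order_len;

static void record(char c) {
    if (order_len < (int)sizeof order - 1)
        order[order_len++] = c;
}

/* GCC/Clang extension: runs during startup, before main(). */
__attribute__((constructor))
static void early_init(void) { record('C'); }

/* Registered with atexit(): runs when exit() is called after main returns. */
static void at_exit_hook(void) { fputs("exit() ran the atexit hook\n", stderr); }

/* Call this first thing in main(); returns the sequence recorded so far,
   which should be "CM": constructor first, then main. */
const char *startup_order(void) {
    record('M');
    atexit(at_exit_hook);
    return order;
}
```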
    • anton@lemmy.blahaj.zone
      3 points · 1 day ago

      If you want to have a library that can also be a standalone executable, just put the main function in an extra file and don’t compile that file when using the library as a library.
      You could also use the preprocessor to do it similar to Python, but please don’t.

      Just use any build tool, and have two targets, one library and one executable:

      # Make syntax: space-separated lists, $(VAR) to expand a variable
      LIB_SOURCES = tools.c stuff.c more.c
      EXE_SOURCES = main.c $(LIB_SOURCES)
      

      Edit: added example
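On the C side, the split could look like this sketch; tools.c and tools_sum are hypothetical names matching the LIB_SOURCES example, and main.c is shown as a comment because it only belongs to the executable target.

```c
/* tools.c: library part. No main() in here, so it can go into both the
   library target and the executable target. */
#include <stddef.h>

/* Example library function: sums an int array. */
long tools_sum(const int *xs, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += xs[i];
    return total;
}

/* main.c: executable part, listed only in EXE_SOURCES:
 *
 *   #include <stdio.h>
 *   long tools_sum(const int *xs, size_t n);   // normally from tools.h
 *   int main(void) {
 *       int xs[] = {1, 2, 3};
 *       printf("%ld\n", tools_sum(xs, 3));
 *       return 0;
 *   }
 */
```

Because main() lives only in main.c, the library target never contains a main symbol, so nothing clashes when another program links against it.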

    • Cratermaker@discuss.tchncs.de
      2 points · 2 days ago

      I haven’t done much low-level stuff, but I think ‘main’ is the function the C runtime calls from the compiled binary’s real entry point. The name ‘main’ may be stripped from the compiled binary, but the function itself would still exist. Executable formats aren’t all the same, so they have different ways of recording where the entry point is. You can ‘run’ a binary library file by invoking a function contained therein, which is how DLL files work.
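On Linux, the closest analogue to calling into a DLL is dlopen() plus dlsym(), which look up a function by name at run time (Windows has LoadLibrary and GetProcAddress for the same job). A sketch that looks up strlen from the already-loaded libc; call_strlen_dynamically is a made-up name, and on glibc older than 2.34 you need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

typedef size_t (*strlen_fn)(const char *);

/* Looks up strlen by name at run time and calls it. dlopen(NULL) returns a
   handle for the running program and the libraries already loaded into it,
   which includes libc. Returns -1 on failure. */
long call_strlen_dynamically(const char *s) {
    void *handle = dlopen(NULL, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    /* Converting void * to a function pointer is allowed by POSIX for dlsym. */
    strlen_fn fn = (strlen_fn)dlsym(handle, "strlen");
    if (!fn) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    long n = (long)fn(s);
    dlclose(handle);
    return n;
}
```

To call into a specific library instead, pass its path to dlopen() rather than NULL.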