Also, do y’all call main() in the if block or do you just put the code you want to run in the if block?
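
For reference, a minimal sketch of the first option, where the if block does nothing except call main(). The names greet and main are made up for illustration; nothing in Python requires them.

```python
def greet(name: str) -> str:
    """Reusable piece: safe to import, runs nothing on its own."""
    return f"Hello, {name}!"


def main() -> None:
    """Script entry point, only reached when the file is run directly."""
    print(greet("world"))


# __name__ is "__main__" only when this file is executed as a script,
# not when another module imports it.
if __name__ == "__main__":
    main()
```

Keeping the body in main() means its variables stay local instead of leaking into module globals, and other code can still import greet without triggering the print.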

  • barsoap@lemm.ee
    3 days ago

    How does executing a program actually work?

    Way too long an answer for a lemmy post

    It has an executable flag, but what actually happens in the OS when it encounters a file with the executable flag set?

    Depends on the OS. Linux looks at the first bytes of the file and sees either (ASCII) #! (called a shebang) or the ELF magic, then hands off to the appropriate interpreter with the executable as an argument. When you execute a Python script, for example, it calls /usr/bin/env with the parameters python and the file name, because the shebang was #!/usr/bin/env python.
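
A rough sketch of that first-bytes check, written in Python purely for illustration (the kernel does this in C, and classify_executable and shebang_argv are hypothetical helper names). It assumes the Linux behaviour that everything after the interpreter path is passed as a single argument, and it only splits on a space, which is a simplification.

```python
def classify_executable(head: bytes) -> str:
    """Classify a file by its leading bytes."""
    if head.startswith(b"\x7fELF"):
        return "elf"
    if head.startswith(b"#!"):
        return "script"
    return "unknown"


def shebang_argv(first_line: bytes, script_path: str) -> list:
    """Build the argv that a shebang line leads to.

    Everything after the interpreter path is kept as ONE optional
    argument, then the script path is appended.
    """
    body = first_line[2:].strip().decode()
    interpreter, _, optional_arg = body.partition(" ")
    argv = [interpreter]
    if optional_arg:
        argv.append(optional_arg.strip())
    argv.append(script_path)
    return argv


print(shebang_argv(b"#!/usr/bin/env python\n", "./hello.py"))
```

For the shebang from the post this yields /usr/bin/env, python and the script path, which matches the call described above.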

    How does it know to execute “main”?

    Compiled C programs are ELF, so the kernel goes through the ELF headers, figures out which ld.so to use, then starts that. ld.so finds all the libraries, resolves the dynamic symbols, does some bookkeeping, and jumps to _start. That is, it doesn’t: main is a C thing.
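
The "figure out which ld.so to use" step boils down to reading the PT_INTERP program header. A minimal sketch, assuming a 64-bit little-endian ELF image already read into memory; find_interpreter is a hypothetical helper, not part of any library.

```python
import struct

ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # 64-byte ELF header
ELF64_PHDR = struct.Struct("<IIQQQQQQ")            # 56-byte program header
PT_INTERP = 3                                      # segment holding the ld.so path


def find_interpreter(data: bytes):
    """Return the PT_INTERP path of a 64-bit little-endian ELF, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    fields = ELF64_HEADER.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        phdr = ELF64_PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = phdr[0], phdr[2], phdr[5]
        if p_type == PT_INTERP:
            # The segment content is a NUL-terminated path string.
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Feeding it the bytes of a dynamically linked binary should give the interpreter path; a statically linked one has no PT_INTERP, so the function returns None.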

    Is it possible to have a library that can be called and also executed like a program?

    Absolutely. ld.so is an example of that... Actually, wait, I’m not so sure any more; I’m getting things mixed up with libdl.so. In any case, ld.so is an executable with a file extension that makes it look like a library.

    EDIT: It does work. My (GNU) libc spits out version info when executed as an executable.

    If you want to start looking at the innards like that, I would suggest starting here: Hello world in assembly. Note the absence of a main function: the symbol that actually gets invoked is _start, and the setup necessary to call a C main is done by libc. Don’t try to understand GNU’s libc, it’s full of hysterical raisins; I would suggest musl.

    • onlinepersona@programming.dev
      3 days ago

      EDIT: It does work. My (GNU) libc spits out version info when executed as an executable.

      How does that work? There must be something above ld.so, maybe the OS? Looking at the ELF header, ld.so is a shared library: “Type: DYN (Shared object file)”.

      $ readelf -hl ld.so
      ELF Header:
        Magic:   7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00 
        Class:                             ELF64
        Data:                              2's complement, little endian
        Version:                           1 (current)
        OS/ABI:                            UNIX - GNU
        ABI Version:                       0
        Type:                              DYN (Shared object file)
        Machine:                           Advanced Micro Devices X86-64
        Version:                           0x1
        Entry point address:               0x1d780
        Start of program headers:          64 (bytes into file)
        Start of section headers:          256264 (bytes into file)
        Flags:                             0x0
        Size of this header:               64 (bytes)
        Size of program headers:           56 (bytes)
        Number of program headers:         11
        Size of section headers:           64 (bytes)
        Number of section headers:         23
        Section header string table index: 22
      
      Program Headers:
        Type           Offset             VirtAddr           PhysAddr
                       FileSiz            MemSiz              Flags  Align
        LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                       0x0000000000000db8 0x0000000000000db8  R      0x1000
        LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                       0x0000000000029435 0x0000000000029435  R E    0x1000
        LOAD           0x000000000002b000 0x000000000002b000 0x000000000002b000
                       0x000000000000a8c0 0x000000000000a8c0  R      0x1000
        LOAD           0x00000000000362e0 0x00000000000362e0 0x00000000000362e0
                       0x0000000000002e24 0x0000000000003000  RW     0x1000
        DYNAMIC        0x0000000000037e80 0x0000000000037e80 0x0000000000037e80
                       0x0000000000000180 0x0000000000000180  RW     0x8
        NOTE           0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                       0x0000000000000040 0x0000000000000040  R      0x8
        NOTE           0x00000000000002e8 0x00000000000002e8 0x00000000000002e8
                       0x0000000000000024 0x0000000000000024  R      0x4
        GNU_PROPERTY   0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                       0x0000000000000040 0x0000000000000040  R      0x8
        GNU_EH_FRAME   0x0000000000031718 0x0000000000031718 0x0000000000031718
                       0x00000000000009b4 0x00000000000009b4  R      0x4
        GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                       0x0000000000000000 0x0000000000000000  RW     0x10
        GNU_RELRO      0x00000000000362e0 0x00000000000362e0 0x00000000000362e0
                       0x0000000000001d20 0x0000000000001d20  R      0x1
      

      The program headers don’t have interpreter information either. Compare that to ls, which is “Type: EXEC (Executable file)” and has an INTERP program header.

      $ readelf -hl ls
      ELF Header:
        Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
        Class:                             ELF64
        Data:                              2's complement, little endian
        Version:                           1 (current)
        OS/ABI:                            UNIX - System V
        ABI Version:                       0
        Type:                              EXEC (Executable file)
        Machine:                           Advanced Micro Devices X86-64
        Version:                           0x1
        Entry point address:               0x40b6e0
        Start of program headers:          64 (bytes into file)
        Start of section headers:          1473672 (bytes into file)
        Flags:                             0x0
        Size of this header:               64 (bytes)
        Size of program headers:           56 (bytes)
        Number of program headers:         14
        Size of section headers:           64 (bytes)
        Number of section headers:         32
        Section header string table index: 31
      
      Program Headers:
        Type           Offset             VirtAddr           PhysAddr
                       FileSiz            MemSiz              Flags  Align
        PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                       0x0000000000000310 0x0000000000000310  R      0x8
        INTERP         0x00000000000003b4 0x00000000004003b4 0x00000000004003b4
                       0x0000000000000053 0x0000000000000053  R      0x1
        LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                       0x0000000000007570 0x0000000000007570  R      0x1000
        LOAD           0x0000000000008000 0x0000000000408000 0x0000000000408000
                       0x00000000000decb1 0x00000000000decb1  R E    0x1000
        LOAD           0x00000000000e7000 0x00000000004e7000 0x00000000004e7000
                       0x00000000000553a0 0x00000000000553a0  R      0x1000
        LOAD           0x000000000013c9c8 0x000000000053d9c8 0x000000000053d9c8
                       0x000000000000d01c 0x0000000000024748  RW     0x1000
        DYNAMIC        0x0000000000148080 0x0000000000549080 0x0000000000549080
                       0x0000000000000250 0x0000000000000250  RW     0x8
        NOTE           0x0000000000000350 0x0000000000400350 0x0000000000400350
                       0x0000000000000040 0x0000000000000040  R      0x8
        NOTE           0x0000000000000390 0x0000000000400390 0x0000000000400390
                       0x0000000000000024 0x0000000000000024  R      0x4
        NOTE           0x000000000013c380 0x000000000053c380 0x000000000053c380
                       0x0000000000000020 0x0000000000000020  R      0x4
        GNU_PROPERTY   0x0000000000000350 0x0000000000400350 0x0000000000400350
                       0x0000000000000040 0x0000000000000040  R      0x8
        GNU_EH_FRAME   0x0000000000126318 0x0000000000526318 0x0000000000526318
                       0x0000000000002eb4 0x0000000000002eb4  R      0x4
        GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                       0x0000000000000000 0x0000000000000000  RW     0x10
        GNU_RELRO      0x000000000013c9c8 0x000000000053d9c8 0x000000000053d9c8
                       0x000000000000c638 0x000000000000c638  R      0x1
      

      It feels like somewhere in the flow the same thing is happening as in Python, just more hidden. Python seems to expose it because a file can be a library and an executable at the same time.


      • barsoap@lemm.ee
        3 days ago

        Your ld.so contains:

        Entry point address: 0x1d780

        EDIT: …with which I meant, modulo brainfart: my libc.so.6 contains a proper entry address, while other libraries point at 0x0 and coredump when executed. libc.so is a linker script, presumably because GNU compulsively overcomplicates everything.

        …I guess that’s enough for the kernel. It might be a Linux-only thing, maybe even unintended, and, well, Linux doesn’t break userspace.

        Speaking of, I was playing it a bit fast and loose: _start is merely the default symbol name for the entry label, I’m sure nasm and/or ld have ways to set it to something different.
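
To make the "entry address, not symbol name" point concrete: the header stores a plain number in e_entry, and Type: DYN is also what position-independent executables use, so the type alone does not say "library". A small sketch that reads both fields, assuming a 64-bit little-endian ELF; elf_type_and_entry is a hypothetical name.

```python
import struct

ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def elf_type_and_entry(data: bytes):
    """Return (type name, entry address) from a 64-bit little-endian ELF header."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    # e_type, e_machine, e_version and e_entry follow the 16-byte e_ident.
    e_type, _machine, _version, e_entry = struct.unpack_from("<HHIQ", data, 16)
    return ET_NAMES.get(e_type, hex(e_type)), e_entry
```

A library whose e_entry is 0 has nothing sensible to jump to, which fits the coredump described above.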

        • JATth@lemmy.world
          2 days ago

          Btw, ld.so is a symlink to ld-linux-x86-64.so.2, at least on my system. It is a statically linked executable. ld.so is, in simpler words, an interpreter for the ELF format, and you can run it:

          ld.so --help
          

          Entry point address: 0x1d780

          Which seems to be contained in the only executable segment of ld.so:

          LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
               0x0000000000028bb5 0x0000000000028bb5  R E    0x1000
          

          Edit: My understanding of this is quite shallow; the above is a segment that in this case contains the entirety of the .text section.
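
The containment claim is easy to check by hand: the segment starts at 0x1000 and is 0x28bb5 bytes long, so it ends at 0x29bb5, and 0x1d780 lies inside that range. The same check as a small sketch (segment_containing is a hypothetical helper; the numbers are copied from the readelf lines quoted above):

```python
PF_X = 0x1  # execute permission bit in p_flags (PF_R is 0x4)


def segment_containing(addr, segments):
    """Return the first (vaddr, memsz, flags) tuple whose range covers addr."""
    for vaddr, memsz, flags in segments:
        if vaddr <= addr < vaddr + memsz:
            return (vaddr, memsz, flags)
    return None


entry = 0x1D780
load_rx = (0x1000, 0x28BB5, 0x4 | PF_X)  # the "R E" LOAD segment
hit = segment_containing(entry, [load_rx])
print(hit is not None and bool(hit[2] & PF_X))  # prints True
```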