Telecon 03-20-07 - OSR

(Meeting notes at bottom)

Agenda:

1) Brief round table and status updates
2) Today's topic: Pthreads support

Upcoming Topics:

  - System-level view
  - Interrupts
  - Portability and target archs


Discussion Starter:

One of the main new features of the N-way LWK is support for threading. The de facto threading API is POSIX threads (Pthreads), so we should settle on how it is going to be supported. In our world full support probably isn't necessary, but we need enough to keep the OpenMP runtime libraries and auto-parallelizing compilers happy. The core things I think we need to support efficiently are:

1) Spawning one thread per CPU
2) Synchronization primitives
3) Thread local data
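
Item 3 is the simplest to illustrate. A minimal sketch, assuming a C11 compiler: the `_Thread_local` storage class (GCC and Clang also accept `__thread`) gives every thread its own copy of a variable, which is one common form of thread-local data. The names `tls_counter`, `tls_worker`, and `tls_demo` are illustrative only.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4

/* Each thread gets its own copy of this variable, initialized to 0. */
static _Thread_local int tls_counter = 0;

/* Bump this thread's private counter (id + 1) times, then report the
 * final value back through the same slot that carried the id. */
static void *tls_worker(void *arg)
{
    int *slot = arg;            /* on entry, holds the thread index */
    int id = *slot;
    for (int i = 0; i <= id; i++)
        tls_counter++;
    *slot = tls_counter;        /* reflects only this thread's increments */
    return NULL;
}

/* Spawn NTHREADS workers and wait for them all to finish. */
void tls_demo(int out[NTHREADS])
{
    pthread_t threads[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        out[i] = i;
        int rc = pthread_create(&threads[i], NULL, tls_worker, &out[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
}
```

After `tls_demo(out)`, each `out[i]` equals `i + 1` and the calling thread's own `tls_counter` is still 0, since no thread sees another's increments.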

Here's the Pthreads management API:

    #include <pthread.h>

    pthread_t thread;   /* opaque thread ID */

    int pthread_create(
        pthread_t *             thread,   /* returned thread ID */
        const pthread_attr_t *  attr,     /* can specify stack, detached, etc. */
        void *(*start)(void *),           /* function to run */
        void *                  arg       /* arg to pass to function */
    );
    pthread_t pthread_self(void);
    void pthread_exit(void *value_ptr);   /* never returns */
    int pthread_detach(pthread_t thread);
    int pthread_join(pthread_t thread, void **value_ptr);
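
The calls above are enough for item 1. As a sketch only (not a proposal for the LWK interface), this spawns one thread per online CPU, gives each a slice of a made-up workload, and collects each result through the value passed to `pthread_exit()` and returned by `pthread_join()`. `sysconf(_SC_NPROCESSORS_ONLN)` is a widely available extension rather than strict POSIX, and pinning each thread to its CPU would need a non-portable call, so it is omitted. `parallel_sum` and `struct chunk` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Per-thread work description: sum the integers in [lo, hi). */
struct chunk {
    long lo, hi;
    long sum;
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    c->sum = 0;
    for (long i = c->lo; i < c->hi; i++)
        c->sum += i;
    pthread_exit(c);            /* this pointer is what pthread_join() returns */
}

/* Sum 0..n-1 using one thread per online CPU. */
long parallel_sum(long n)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpu < 1)
        ncpu = 1;

    pthread_t *tids = malloc(ncpu * sizeof *tids);
    struct chunk *chunks = malloc(ncpu * sizeof *chunks);
    assert(tids != NULL && chunks != NULL);

    for (long t = 0; t < ncpu; t++) {
        /* Contiguous slices; the last one always ends at n. */
        chunks[t].lo = n * t / ncpu;
        chunks[t].hi = n * (t + 1) / ncpu;
        int rc = pthread_create(&tids[t], NULL, sum_chunk, &chunks[t]);
        assert(rc == 0);
        (void)rc;
    }

    long total = 0;
    for (long t = 0; t < ncpu; t++) {
        void *ret;
        pthread_join(tids[t], &ret);
        total += ((struct chunk *)ret)->sum;
    }
    free(chunks);
    free(tids);
    return total;
}
```

For example, `parallel_sum(1000)` returns 499500 regardless of how many CPUs are online.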

Things to discuss:

1) Some CPUs may share resources, e.g., a TLB or a virtually tagged cache. In that case, sharing the same page tables between cores might be a big win. By definition, all threads of a Pthreads process share the same address space, so this shouldn't be a problem. How about a new mechanism to say "I want this CPU to share this other CPU's address space"?

    int
    region_mirror_all(
        cpu_t   src,    /* CPU whose address space is shared */
        cpu_t   dst     /* CPU that will run in src's address space */
    );

The requirements are that the destination CPU have no regions configured and not already be running a thread.
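
A minimal sketch of the bookkeeping `region_mirror_all()` might do, assuming each CPU points at a reference-counted address-space object that owns its page tables. Everything here (`struct aspace`, `struct cpu_state`, `cpus[]`, `MAX_CPUS`, and the convention that "no regions configured" means no address space yet) is hypothetical. The point is that mirroring shares the page-table pointer rather than copying it, and refuses if the destination CPU violates either requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel-side model; none of these names exist today. */
#define MAX_CPUS 8
typedef int cpu_t;

struct page_table;                  /* opaque page-table root */

struct aspace {
    struct page_table *pgtbl;       /* shared by every CPU using this space */
    int refcount;                   /* number of CPUs attached */
};

struct cpu_state {
    struct aspace *aspace;          /* NULL = no regions configured */
    int running;                    /* nonzero while a thread runs here */
};

static struct cpu_state cpus[MAX_CPUS];

/* Make dst run in src's address space (same page tables, no copy).
 * Returns 0 on success, -1 if an argument or requirement is violated. */
int region_mirror_all(cpu_t src, cpu_t dst)
{
    if (src < 0 || src >= MAX_CPUS || dst < 0 || dst >= MAX_CPUS || src == dst)
        return -1;
    if (cpus[src].aspace == NULL)
        return -1;                  /* nothing to mirror */
    if (cpus[dst].aspace != NULL || cpus[dst].running)
        return -1;                  /* dst must have no regions and be idle */

    cpus[dst].aspace = cpus[src].aspace;
    cpus[dst].aspace->refcount++;
    assert(cpus[dst].aspace->refcount >= 2);  /* src and dst both attached */
    return 0;
}
```

If CPU 0 already has an address space, `region_mirror_all(0, 1)` attaches CPU 1 to it; a second call, or one targeting a CPU that is running a thread, fails with -1.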

2) Need to distinguish between a thread exiting and a process exiting. In the latter case, the LWK needs to notify the job launcher that the process has exited. Do we need a concept of a main thread? Each thread would have a main thread associated with it, inherited from its creator unless the MAIN_THREAD flag is set (in which case the new thread is its own main thread). The exit syscall could then kill all threads with the same main thread, i.e., a thread group.

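    int
    thread_create(
        cpu_t           cpu,            /* CPU to run on; effectively the thread's ID */
        void * (*start_address)(void * priv),   /* function to run */
        void *          priv,           /* argument passed to start_address */
        void *          stack_pointer,  /* initial stack pointer */
        uint32_t        flags           /* MAIN_THREAD */
    );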

    int
    thread_exit(
        void *          exit_value,
        int             kill_all        /* boolean */
    );

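    Action              Main Thread             Other Threads
    ======              ===========             =============
    return              All die                 This thread dies
    pthread_exit()      This thread dies (*)    This thread dies
    exit() or _exit()   All die                 All die

    (*) Other threads keep running; per POSIX, the process exits
        once the last thread has terminated.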

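One way to read the MAIN_THREAD proposal is as a small thread-group table. The sketch below models only group membership: a thread inherits its creator's main thread unless created with MAIN_THREAD, and an exit with `kill_all` set takes down the whole group. The `model_thread_*` functions, `NO_CREATOR`, `group_live_count`, and the flag value are hypothetical, deliberately named apart from the proposed `thread_create()`/`thread_exit()` because they omit entry points and stacks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the proposed MAIN_THREAD / thread-group rules.
 * Only group membership is modeled: no entry points or stacks. */
#define MAX_CPUS    8
#define MAIN_THREAD 0x1u            /* new thread starts its own group */
#define NO_CREATOR  (-1)            /* the job's very first thread */

typedef int cpu_t;

struct thread {
    int   alive;
    cpu_t main;                     /* CPU of this thread's main thread */
};

static struct thread threads[MAX_CPUS];

/* Create a thread on `cpu`, spawned by the thread on `creator`. */
int model_thread_create(cpu_t creator, cpu_t cpu, uint32_t flags)
{
    if (cpu < 0 || cpu >= MAX_CPUS || threads[cpu].alive)
        return -1;
    if (creator != NO_CREATOR &&
        (creator < 0 || creator >= MAX_CPUS || !threads[creator].alive))
        return -1;

    if ((flags & MAIN_THREAD) || creator == NO_CREATOR)
        threads[cpu].main = cpu;                    /* new group */
    else
        threads[cpu].main = threads[creator].main;  /* inherit group */
    threads[cpu].alive = 1;
    return 0;
}

/* Exit the thread on `self`; with kill_all, take down its whole group,
 * as exit()/_exit() would. */
void model_thread_exit(cpu_t self, int kill_all)
{
    assert(self >= 0 && self < MAX_CPUS && threads[self].alive);
    cpu_t group = threads[self].main;
    threads[self].alive = 0;
    if (!kill_all)
        return;
    for (cpu_t c = 0; c < MAX_CPUS; c++)
        if (threads[c].alive && threads[c].main == group)
            threads[c].alive = 0;
}

/* Live threads in a group; when this drops to 0 the LWK would tell
 * the job launcher that the process has exited. */
int group_live_count(cpu_t group)
{
    int n = 0;
    for (cpu_t c = 0; c < MAX_CPUS; c++)
        n += threads[c].alive && threads[c].main == group;
    return n;
}
```

For example, with a group rooted at CPU 0 whose child on CPU 1 spawns a grandchild on CPU 2, having CPU 2 exit with `kill_all` set drops `group_live_count(0)` to 0, the point at which the job launcher would be notified, while an unrelated group rooted at CPU 3 is untouched.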

3) How is joining implemented?

    int
    thread_wait(
        cpu_t           cpu,            /* thread (CPU) to wait for */
        void **         exit_value      /* returns the value passed to thread_exit() */
    );
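
One possible answer is a busy-waiting join: the exiting thread publishes its exit value in a per-CPU slot and then sets a flag that `thread_wait()` spins on. This sketch assumes C11 atomics and uses an ordinary Pthread to stand in for the thread on CPU 1. `struct exit_slot`, `slot_exit()`, `example_worker()`, and `MAX_CPUS` are hypothetical, and each slot is one-shot (it would need resetting before the CPU is reused).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-CPU exit slots; the CPU number is the thread ID. */
#define MAX_CPUS 8
typedef int cpu_t;

struct exit_slot {
    void       *exit_value;
    atomic_int  done;               /* 0 = running, 1 = exited */
};

static struct exit_slot slots[MAX_CPUS];

/* Called by the exiting thread. The release store guarantees that a
 * waiter who sees done == 1 also sees exit_value. */
void slot_exit(cpu_t cpu, void *exit_value)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    slots[cpu].exit_value = exit_value;
    atomic_store_explicit(&slots[cpu].done, 1, memory_order_release);
}

/* Busy-waiting join: spin until the thread on `cpu` has exited. */
int thread_wait(cpu_t cpu, void **exit_value)
{
    if (cpu < 0 || cpu >= MAX_CPUS)
        return -1;
    while (!atomic_load_explicit(&slots[cpu].done, memory_order_acquire))
        ;                           /* spin */
    if (exit_value != NULL)
        *exit_value = slots[cpu].exit_value;
    return 0;
}

/* Stand-in for "the thread on CPU 1": doubles the int it is given and
 * exits with a pointer to it. Start it with pthread_create(). */
void *example_worker(void *arg)
{
    int *x = arg;
    *x *= 2;
    slot_exit(1, x);
    return NULL;
}
```

Starting `example_worker` on a Pthread with `&x` where `x == 21`, a call to `thread_wait(1, &v)` returns once the worker has exited, with `v` pointing at `x`, which is now 42.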


4) Pthreads mutexes and condition variables need to be supported efficiently. Spin-waiting at user level doesn't seem like a good idea with hundreds of CPUs: contention could be very high, and every unlock of the mutex could cost N cache misses. Furthermore, some CPUs may share resources (an execution pipe in the case of SMT), so a spinning thread slows down the thread it shares the core with. Possible options:

   1) Add kernel-level support for mutexes (à la futexes)
   2) Rely on advanced hardware features like the MONITOR/MWAIT
      instructions on x86.  MONITOR allows you to specify a cache
      line to monitor for changes and MWAIT puts you to sleep until
      a change occurs.  AFAIK, nobody actually uses these for
      synchronization yet.
   3) Others?
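
The cost argument above shows up in the usual user-level spin-wait, a test-and-test-and-set (TTAS) spinlock, sketched here with C11 atomics. This is a textbook technique rather than something proposed in this discussion; `ttas_lock_t` and the demo harness are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } ttas_lock_t;

void ttas_lock(ttas_lock_t *l)
{
    for (;;) {
        /* "Test": spin on plain loads. While the lock is held, each
         * waiter reads its own cached copy of the line, so waiting
         * causes little coherence traffic. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* "Test-and-set": the lock looked free, so try to take it. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
    }
}

/* Releasing invalidates every waiter's cached copy; all N of them
 * then miss and race for the exchange. */
void ttas_unlock(ttas_lock_t *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* Demo: threads increment a shared counter under the lock. */
static ttas_lock_t demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        ttas_lock(&demo_lock);
        demo_counter++;
        ttas_unlock(&demo_lock);
    }
    return NULL;
}

long ttas_demo(int nthreads, long iters)
{
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, demo_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Waiting is cheap while the lock is held, but each unlock triggers a burst of misses and failed exchanges across all spinners, and on an SMT core the spin loop competes with its sibling for the pipeline.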

Perhaps we need something like futexes on Linux.

Futexes:      http://people.redhat.com/drepper/futex.pdf
Linux NPTL:   http://people.redhat.com/drepper/nptl-design.pdf
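
To make option 1 concrete, here is a sketch of essentially the three-state mutex described in Drepper's futex paper (first link above), written against the Linux `futex(2)` system call; glibc has no wrapper, hence `syscall(SYS_futex, ...)`. It is Linux-specific and only illustrates the idea: uncontended lock and unlock never enter the kernel, and contended waiters sleep instead of spinning. An LWK would need its own equivalent of `FUTEX_WAIT`/`FUTEX_WAKE`; the `fmutex_*` names and the demo harness are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
typedef struct { atomic_int val; } fmutex_t;

/* The kernel only reads/compares the 32-bit word at addr. */
static long sys_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, (int *)addr, op, val, NULL, NULL, 0);
}

void fmutex_lock(fmutex_t *m)
{
    int c = 0;
    /* Uncontended fast path: 0 -> 1, no system call. */
    if (atomic_compare_exchange_strong(&m->val, &c, 1))
        return;
    /* Contended: mark "locked with waiters" and sleep in the kernel.
     * FUTEX_WAIT only sleeps if val is still 2, so no wakeup is lost. */
    if (c != 2)
        c = atomic_exchange(&m->val, 2);
    while (c != 0) {
        sys_futex(&m->val, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->val, 2);
    }
}

void fmutex_unlock(fmutex_t *m)
{
    /* Old value 1: nobody was waiting, so no system call is needed. */
    if (atomic_fetch_sub(&m->val, 1) != 1) {
        atomic_store(&m->val, 0);
        sys_futex(&m->val, FUTEX_WAKE_PRIVATE, 1);   /* wake one waiter */
    }
}

/* Demo: threads increment a shared counter under the mutex. */
static fmutex_t counter_lock;
static long counter;

static void *counter_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        fmutex_lock(&counter_lock);
        counter++;
        fmutex_unlock(&counter_lock);
    }
    return NULL;
}

long fmutex_demo(int nthreads, long iters)
{
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, counter_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

The design choice is the third state: the holder only pays for a `FUTEX_WAKE` system call when some thread has actually marked the mutex as contended.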


5) Other topics I've missed


Meeting Notes

0) Discussed current interrupt problems a bit. Traps currently disable local interrupts. This is probably fine for now, since the plan is for there to be essentially no system calls at runtime. However, kernel code already has to lock as if interrupts were enabled, because other CPUs can be in the kernel at the same time, so changing traps to leave interrupts enabled shouldn't be very hard.

1.0) Agreed that full Pthreads support is not necessary. Drew an analogy to OpenGL implementations: most are very incomplete but have all of the important functionality.

1.1) region_mirror_all() seems OK in principle. Agreed there is an issue with CPUs that share resources. Probably don't need to implement it at time 0.

2) Goal is to try to keep process management out of the kernel as much as possible.

3) Time 0 join solution is busy waiting. Long term, probably need some mechanism to sleep for CPUs that share physical resources (execution pipe, cache, etc.).

4) Time 0 mutex solution is also busy waiting. MONITOR/MWAIT sound promising.

5) Discussed next week's topic a bit: job loading.