Contents

Problem

Title: Directions in Parallel and Concurrent Programming

Project Report Writeup

Abstract

There has been a constant buzz in the parallel processing domain of the requirement of a good programming paradigm and a new language for parallel programming. In this report we try to study the various languages currently in use as well as new upcoming languages in this domain. We define a set of metrics on the basis of which we evaluate various current languages. We then try to see what the common as well as differentiating trends are among the more successful and less successful of these languages. The set of languages that we evaluate includes academic as well as commercial languages, all of which are in use today. We then conclude with what we found are the things that do not seem to contribute to the success of languages while what things one must be really careful about when designing and evaluating languages. As an aside we also test the old adage that maintainability is the most important aspect of commercially viable language.

Introduction

In this report we present the results of our study of the current state of paradigms and languages in the field of parallel programming. We started out with the well-recognized fact that there is a need for a better parallel programming language. We wanted to study the reasons of this assumption. With this purpose in mind we created a set of metrics by which to study the current parallel programming languages and then looked for trends that could predict what a better programming language should provide. Section 2 covers this introduction. Section 3 covers the details of the metrics that we used and their classification and use. Section 4 talks about the methods we used to evaulate the various languages by these metrics. In section 5, we provide a list of languages that we used for this study, the summary of which is provided in Section 6. In Section 7, we list the various interesting trends that we observed in the data and draw conclusions from them in Section 8. We enlist some ways to improve upon this report in Section 9. Finally, Section 10 provides links to the more important papers and webpages that aided in our study.

Selection of Metrics

When comparing the various languages we studied, we had to come up with metrics that were not specific to any one language or paradigm. It was important that the metrics would be comprehensive as well as meaningful. The metrics to be used had to be general enough so that all the studied languages exhibited them but specific enough so that different languages exhibited them differently.

Language Metrics

T. Green, in his paper on evaluation of visual frameworks, developed a set of metrics called the 'cognitive dimensions'. Later Steven Clarke, from Microsoft, showed that these metrics form a good basis for evaluation of a programming language. We included the relevant metrics into this study. These include:

  • Ease of use
  • Error-proneness
  • Hidden Intra-code dependencies
  • Incremental Parallelism
  • Abstraction Gradient
  • Consistency
  • Dependence on good design
  • Terseness
  • Viscosity
  • Portability

Of these, "ease of use" seemed like a very subjective metric. Since we were specifically studying parallel languages, we decided to further qualify this metric based on the assumption that a programmer taking on a parallel programming language must have some initial experience in programming. Based on this we evaluated ease of use on

  • what programming model the language uses / introduces.
  • how many new concepts need to be learned
  • whether there is any new syntax introduced.

Development Environment

We evaluated the development environment based on the following features

  • Language support in environment
  • Test Support
  • Debugging Support

Parallel Language Specific Metrics

There are a lot of features in a parallel programming language that make sense only in a parallel environment. These include

  • Problem Domain
  • Task Identification
  • Data Distribution / Sharing method
  • Synchronization Methods
  • Communication Framework Used
  • Incremental Parallelism
  • Generality of problem domain and hardware
  • Distributed v/s Multicore languages
  • Transaction Management

Tools and support

The ease of programming in any particular language is enhanced by the easy availability of various tools that aid in programming as well as the technical and community support that is available. These help the easy adoption of any language by new programmers.

Maintainability

The maintainability support seems to be the least valued feature by new language developers but very highly valued when inducting a new language for commercial use. We studied the following metrics that provide for better maintainability.

  • Code Readability
  • Documentation Support
  • Viscosity (Ease of making changes to the code)


Method of evaluation

For some of the languages, some or all of us had used those for coding and had a fair idea about how they fare on the metrics. Where we were unsure, we referred to available standards and literature to confirm our evaluations. None of us had worked on languages such as Erlang and Fortress before and so we learned those languages and wrote different programs in them to evaluate them first-hand. We had initially thought of creating a survey and getting programmers from various languages to fill it up to get varied inputs. But we rejected this due to dearth of enough time and people to interview who would have used some of the languages that we were surveying.

Languages Studied

Given the above metrics, we picked up a list of languages which are representative of past successes as well as current research trends which we believed have a good success possibility. These are:

  • Erlang : By Ericsson Computer Science Laboratory; soft realtime, declarative, functional language for concurrent, distributed systems
  • MPI : language-independent computer communications descriptive API for message-passing on a parallel computer.
  • OpenMP: An API for multi-platform shared-memory parallel programming in C/C++ and Fortran
  • Pthreads: IEEE POSIX 1003.1c standard threads. defined as a set of C language programming types and procedure calls
  • MapReduce: A software framework originally developed by Google to support parallel computations over large (greater than 100 terabyte) data sets on unreliable clusters of computers.
  • Cascade: Integrates a multiprocessor visual language and advanced automated scheduling techniques that enable rapid application development
  • Intel Tools (Threading Building Blocks):
  • Fortress: developed by Sun Microsystems as part of a DARPA-funded supercomputing initiative.

Language Comparison

The following table summarizes the evaluation of all the languages on the various metrics that we studied.

Parallelism Specific Metrics

Erlang MPI pthreads Cascade OpenMP MapReduce Intel Tools FORTRESS `
Domain General Purpose Soft-Real time applications HPC General Purpose pipelined recurrent tasks / specialized application chips HPC Parallel computations on large and independent data sets General purpose General Purpose, + HPC
Task Identification
  • explicit division into tasks
  • tasks split into lightweight processes
  • can control number of threads
  • explicitly specify tasks
  • can synchronize at arbitrary points in code
  • tasks split into threads
  • explicit thread creation/deletion
  • explicit division into tasks
  • can control number of threads of execution
  • prioritizable tasks
  • provides automatic generation of near
  • optimal schedules for static scheduling
  • Task creation and identification is implicit (through compiler directives) specification of functionality in terms of map and reduce implementations; concurrency handled by runtime system Task based parallelism (specify threading functionality in terms of logical tasks)
  • implicit task identification
  • tasks made into threads
  • explicit threads spawning capability
  • Data Distribution/Sharing message passing message passing shared memory shared memory shared memory implementation dependent; Phoenix uses shared memory shared memory, shared memory explicit data and task distribution possible using 'regions' and 'distributions'
    Communication message passing message passing shared memory shared memory shared memory shared memory shared memory
    Synchronization "Implicit through messages, _no_ shared data" synchronization can be achieved by explicitly waiting for messages mutex/semaphores specialized semaphore memory / as present on chip "implicit (between directives) and explicit (atomic sections) synchronization possible" runtime system; no dependencies between map/reduce stages; mutexes "through atomic sections (including abortable atomic sections and transactions)"
    Portability "Interpreted Virtual machine code so easily portable" portable as long as the implementation language compiler supports new hardware common specification and API. Implementations available for Unix/Linux/Solaris/Win32 requires new code generation tools for every hardware support for several languages (C, C++, Fortran) and several compilers implementation dependent; Hadoop is java based so highly portable; implemented as a library for several platforms (Win, Linux, Mac) slated for various OS'es. Current interpreter implementation on JVM.

    Language Dependent Metrics

    Erlang MPI pthreads Cascade OpenMP MapReduce Intel Tools FORTRESS `
    Ease of learning
  • programming model
  • Functional programming
  • standard independent of specific language model
  • can be implemented as a library as long as the network semantics are followed
  • API well-known and simple;
  • implemented as a library
  • graphical modelling simply a set of compiler directives (for the most part); also exists as a minimalistic API for fine grained parallelism Inspired by functional programming languages. Uses C++ templates and coding style resembles Scala, Standard ML, and Haskell, with familiar mathematical syntax
  • new concepts
  • modeling state implicitly as function arguments;
  • immutable variables;
  • communication using strict message passing;
  • strong, dynamic typing
  • completely new API
  • support for asynchronous reads and writes that requires careful buffer management
  • new API for specifying thread creation, deletion, rendezvous;
  • handling of shared state with locks (mutexes)
  • identifying data and control dependence between tasks.
  • graphical specification of control structures
  • clean division of model into parallelizable entities
  • innovative approach to parallelism: write a sequential piece of code, and then parallelize sections of it using pragmas/compiler directives;
  • no need to specify number of threads (handled by compiler transparently)
  • can specify data to be shared between threads, etc.
  • rigid data access patterns (key, value) pairs.
  • implemented as a library
  • Uses C++ templates
  • Automatic load balancing
  • unicode source code including subscripts and superscripts
  • rendering of code to emulate mathematical notation
  • automatic parallelism,
  • syntactic support for writing down many common kinds of collections, (tuples, arrays, matrices, vectors, maps, sets, and lists) simply by enumerating all of the collection’s elements.
  • facility for expressing programmer intent concerning the distribution of large data structures at run time
  • syntax
  • Resembles prolog no new syntax as its an API for each language no new syntax as its an API for each language model invariant of chip. actual code as required by chip manufacturer set of directives; API self-explanatory and published as part of specification. implementation dependent; no standardization
  • C++
  • uses well defined and understood abstractions (parallel_for, parallel_while, parallel_sort, etc)
  • resembles Scala, Standard ML, and Haskell, with familiar mathematical syntax
    Error proneness
  • No race conditions (message passing)
  • No locking (no shared state)
  • Mostly local/logic errors
  • asynchronous data transfers create errors that surface later in the code than where the error is
  • handling shared state; race conditions modelling language includes checks for deadlocks but does not check for memory overlay errors hard part is identifying sections of code to parallelize; from openmp site: recommended approach is eye-balling code. Does not work for fine grained parallelism, but is trivial for coarse grained parallelism (like simple for loops). map and reduce implementations are simple; task and data distribution handled completely by runtime system.


    Hidden intra-code dependencies code dependencies explicitly identified through messages; no hidden dependencies buffer / memory reuse might create unexpected code dependencies due to asynch reads and writes synchronization is error prone modelling is clean synchronization is error prone implementation is clean implementation is clean


    Incremental Parallelism easily parameterizable; runtime system manages scheduling of processes hard to add new processes unless specifically designed with that in mind since all processes must be named for all system calls cannot exploit incremental parallelism directly; need to create new threads. easily parameterizable, just a re-compilation completely transparent; easily scalable handled by the library transparently completely transparent; easily scalable


    Abstraction Gradient virtual machine based; abstracted from hardware
  • clean abstraction of parallelism not possible
  • library system abstracts code from hardware
  • library abstracts code from hardware
  • model is highly abstracted from the hardware or the language to be used
  • complete code requires specialised knowledge of the hardware
  • abstracted from hardware completely abstracted from hardware; runtime system manages scheduling; can run on clusters/SMP/multicore completely abstracted from hardware highly abstracted from the hardware and platform
    Generality
  • of problem domain
  • general purpose concurrent/distributed computing; Sending/receiving messages asynchronously is cheap. due to the message passing model can be used only for coarse-grained parallelism. Fine-grained parallelism has a large communication overhead fine-grained parallelism has high synchronization (shared state) overhead; useful for general purpose concurrent computing; works best only for recurrent tasks and single functionality chips coarse grained parallelism; works best for loop parallelism useful for data intensive computations useful for general purpose computing on multicore/SMP platforms


  • of hardware
  • SMP/multicore/Distributed computing since its an API works on any platform that the language being used compiles on completely detached from hardware; depends on system scheduler for exploiting hardware abilities; model independent, but requires translation schemes for any new hardware SMP/multicore (modelled as threads) implementations available for clusters (Hadoop) as well as SMP/multicore (Phoenix) multi-platform support


    Consistency functional programming semantics; few basic constructs; simple concurrency and communication primitives; directed effort to create a consistent and minimal set of procedures POSIX interface; well-defined; few basic constructs with very small overloading of semantics Standard; specification available; very few basic constructs; cleanly defined specifications well-defined interface moderately consistent, with intermixed and overloaded semantics


    Dependence on good design highly sensitive highly sensitive and also harder to change highly sensitive to good design: identify parallelizable code segments and critical regions is error prone and difficult to debug. highly sensitive to selection of task division and multiplicity _highly_ sensitive depends on the complexity of map and reduce highly sensitive highly sensitive


    Terseness terse; (functional programming language based) verbose and intermingled into the whole code structure cleanly defined; nomenclature is sufficient; highly terse due to high abstraction cleanly defined depends on implementation API cleanly defined cleanly defined


    Distributed or multicore both applicable to distributed memory architectures multicore/SMP (depends on system scheduler) multicore multicore/SMP both multicore Multicore


    Automatic Transaction management? no transaction management possible no transaction management possible no transaction management possible no transaction management possible provision for atomic sections, but no rollback none Yes


    Development Environment Based Metrics

    Erlang MPI pthreads Cascade OpenMP MapReduce Intel Tools FORTRESS `
    Programming Environment
    Language support interpreter and compiler no special support POSIX API add-on as library (pthreads on linux, solaris, win32) Eclipse IDE with code generator C/C++/Fortran with several compilers independent implementations available library; C++ template based Eclipse plugin still in beta with no stable download. Text file manip till then
    Test support none none none
  • supports multiple versions for test and release
  • supports generation of performance charts and timelines per core in use
  • none none Automatic test support in code. Programmer can describe unit tests and test suites along with source. Components, APIs, traits, and objects can declare properties which describe boolean conditions that the enclosing construct is expected to obey
    Debugging debugger available
  • language-specific debugger
  • some commercial tools available for parallel debugging and performance monitoring
  • no special tools for debugging available; race conditions are extremely hard to track down and debug; Poor support in IDE commercial tools available for debugging rich tool chain available from Intel: Thread checker, Thread profiler, Debugger. None
    Community Support commercial and open source versions commercial and open source versions alternatives available (NPTL) language under development for commercial purposes extensive support available; specification Hadoop is open source commercial tool, so support available; extensive documentation still nascent. we expect it to get good support in future



    Metrics of Maintainability

    Erlang MPI pthreads Cascade OpenMP MapReduce Intel Tools FORTRESS `
    Code readability code is hard to read; hard to figure out synchronization points "simple, self-describing API;" "models may get arbitrarily complex, but are simple to read for most applications" highly readable code (addition of simple directives to sequential program to parallelize it) c++ based code readable, but can get complex at first glance
    Documentation support language dependent; refactoring/profiling tool available language dependent language dependent simple commenting is possible trivial based on implementation not extensive yet
    Viscosity (introducing changes) might need extensive changes hard introducing changes is hard with shared state simple to change introducing changes in parallel sections is hard and error prone simple to change and model

    Performance Metrics

    Erlang MPI pthreads Cascade OpenMP MapReduce Intel Tools FORTRESS `
    Memory requirements low overhead (tail recursion and other optimizations) low overhead implemented as a library which is linked to. Memory overhead is low; low requirements:
  • a thin scheduler for task threads for dynamic thread allocation
  • no extra overhead for static scheduling
  • low requirements implementation dependent low overhead (from Intel website) Current JVM implementation requires moderate to high memory requirements
    Scalability highly scalable;
  • process creation is _very_cheap and _very_ fast
  • can scale to more in-core processors due to VM implementation (one scheduler per core)
  • can tune parallel behavior (command line argument: -smp N)
  • scalability hampered by message passing overhead
  • easily scalable to more in-core processors
  • automatic rescheduling depending on number of cores available
  • easily scalable (compiler) easily scalable; parallelism and concurrency handled by runtime system; library and runtime manages scalability. Scales easily with processors.


    Observations

    We found many interesting trends in the data above. For some metrics, like the programming model, we were surprised to see a lack of a definite trend whereas in the case of "maintainability metrics" we found a consistent trend towards better support in languages that are deployed in the industry. There were a few things that commercial as well as academic languages showed in common, some of them being

    • Lack of a 'favorite' language model.
    • Fine-grained Parallelism requirements were mostly supported
    • Portability was usually a design goal and so achieved in theory or practice
    • Problem domain generality was almost never considered important, though the granularity at which various languages found application differed.

    Some of the trends in which commercial languages differed from academic languages include:

    • Better test and debug support
    • Ease of use and change
    • Scalability was better studied
    • Availability of performance monitoring tools

    Conclusions

    The need for a better parallel programming languages seems to be justified by the above observations. Most of the important features that a commercial language provided were visibly absent from the easily available languages. Though most of these were peripheral features like debug support and performance monitoring tools, we believe that just the availability of these tools is not a reason enough. The language should also provide constructs to simplify the usage of such tools. We also observe that in commercial languages, maintainability needs are met with more frequently than in academic languages where performance and comprehensiveness were more stressed upon. Also, while evaluating the languages we realized that there were little or no standard benchmarks to decide whether a particular feature was at-par with the current trends. Development of such benchmarks is also a very urgent need.

    Future Work

    While, we have tried to come up with certain metrics, we have used a small subset of parallel languages to be evaluated on these metrics. Future work can thus go in two directions. One in reviewing these metrics and the second in expanding the list of studied languages. To these ends, getting assessment and review from experts from both the academia and the industry is necessary.

    Some of the languages studied are very new (e.g. Fortress). It would be a good exercise to follow its life-line, collecting moremore data based on these metrics. At the same time, a study of whether such data can predict the future of new languages and create a better approach to designing new ones would be useful for future language/paradigm designers.

    References and Links

    Weekly Goals and Milestones

    • Week 0: (- to 13 may)
      • Read up and familiarize with scope and current work. - DONE
    • Week 1 Goals: (14 may to 20 may)
      • setup, write sample code and assess two different parallel programming languages
        • FORTRESS - DONE
        • ERLANG - DONE
      • read up following papers
      • finalize a sample problem to implement across languages - IN PROGRESS
      • Investigate into following:
        • Salishan Problems - DONE
        • Dwarfs - DONE
    • Week 2 Goals: (21 may to 27 may)
      • Formalize comparison metrics; tabulate - IN PROGRESS
      • Work on two/three more programming languages (pthreads, cascade, mpi, mapreduce) - DONE
      • Interview current students and researchers in Stanford (Christos Kozyrakis and Kunle Olukotun groups) and get views from them
      • feedback and final steps from monica/ted
    • Week 3 Goals: (28 may to 3 june)
      • propose and review report/paper outline
      • collect relevant test data
    • Week 4 Goals: (4 june to 6 june)
      • propose and review report/paper outline

    Week 0 Notes

    Interesting thoughts:

    Investigated Papers:


    • Techniques and programing languages:
      • PeakStream Inc: A company specializing in data-parallel programming. Look at their

    research briefs, and intro vids.

    Week 1 Notes

    Paper Read-up Links:

    Based on Salishan problems, a few desired properties of parallel languages:

    • A language's ability to express recursive stream computations and producer/consumer parallelism, and to support dynamic task creation. (Hamming's Problem)
    • Language ability to represet recursive tree structures, the creations and manipulation of those structures and nested loop parallelism.(Paraffins Problem)
    • Language`s ability to program a set of concurrent, asynchronous processes with circular dependencies (The Doctor's Office)
    • Ability to define array structures that include nonessential elements (i.e. the zeros), and given those structure, efficiency of parallel and iterative array computations. (Skyline Matrix Solver)


    FORTRESS

    Links

    Notes on Language

    • "The Java Programming Language is designed to use a familiar syntax and class system while providing as much portability as possible. As a result, it is well suited for network and embedded programming. In contrast, Fortress uses many novel language features in an attempt to alleviate the tension between modern software engineering principles and high performance computing. Fortress is well suited to programming on massively parallel computers, as well as on smaller systems"
    • Code can be written in Unicode
    • Fortress “loops” are parallel by default
    • Developed by Sun for DARPA’s High Productivity Computing Systems Effort?
    • New language designed from scratch
    • Bias for scientific computing (FORTRAN based); but general purpose in scope
    • statically typed,
    • component-based
    • syntax extensions to other application domains possible
    • Just in time compilation
    • Programs typeset to mathematics
    • Unicode operators for overloading
    • Libraries can define new syntax
    • Parallelism by default
    • Concept of "Generator" central to parallelism
    • "Generators" (defined by libraries) manage parallelism and the assignment of threads to processors
    • Allows for specification of data distribution through the use of ``distribution" data structures
    • Features for parallelism:
      • Regions
        • Model abstract structure of the machine
        • Threads and objects belong to a region
        • Can request code run in a given region
      • Objects can be shared or local
      • Can re-distribute an array

    Notes on Installation

    • Local Machine Specification: Windows, Cygwin, Java 1.5, Ant
    • setup = unzip files.
    • Build was quite easily accomplished
    • Sample "Hello World" and demo samples very easy to run.
    • No configuration had to be specified. Is there config stuff I can alter? Can i specify the language to use certain number of processors?

    Sample Code

    • Sudoku implementation


    ERLANG

    Links

    Notes on Language

    • "Erlang is a general-purpose concurrent programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing. For concurrency it follows the Actor model. It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It supports hot swapping so code can be changed without stopping a system."
    • Extensively used within Ericsson for various projects. Open sourced in 98.
    • Concurrency through multiple 'lighweight', easy to create and manage processes.
    • All concurrency is explicit (Message passing).
    • The processes communicate via a share-nothing asynchronous message-passing system: every process has a "mailbox", a queue of messages sent by other processes, that are not yet consumed. Threads then use the receive primitive to retrieve messages that match desired patterns. A message-handling routine tests messages in turn against each pattern, until one of them matches. When the message is consumed (removed from the mailbox) the thread resumes execution. A message may comprise any Erlang structure, including primitives (integers, floats, characters, atoms), tuples, lists, and functions. The requirement being that these structures be serializable.
    • Erlang supports Distributed programming through the same message-passing system.

    Week 2 Notes

    Thoughts (Under Development)

    Possible metrics to compare and we consider important when designing/using a HPC language:

    • Automatic Transaction management? (with possible use of keywords like atomic)
    • Automatic Parallelism? (movement of jobs to processors, shchedules etc)
    • Automatic dependency detection?
    • Error recovery strategy
    • Support for specification of data parallelism
    • Support for specifying/altering topology information
    Last modified June 13, 2007 2:50 am / Skin by Kevin Hughes
    MediaWiki