Parallel Computing
Almost 99% of all newly invented programming languages are imperative. But imperative languages have one drawback: they are hard to parallelize.
Drawbacks of Imperative Programming Languages
Imperative programming languages have one major drawback: state. The concept of an imperative language is that commands are executed which change the content of variables or complex objects in memory. When trying to create an optimizing compiler that finds parallelizable parts in the code on its own, the compiler has to keep track of the data dependencies and possible side effects of each command and function call.
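To see why, consider a loop-carried data dependency, written here in classic Scheme using the set! instruction we remove later. Every step reads the state written by the previous one, so a compiler cannot safely reorder or parallelize the steps. (A minimal sketch; the names are ours.)

(define sum 0)
(define accumulate (lambda (lst)
  (if (null? lst)
      sum
      (begin
        (set! sum (+ sum (car lst)))   ; side effect: mutates shared state
        (accumulate (cdr lst))))))
(accumulate '(1 2 3))                  ; ==> 6, but only if run strictly in order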
The simplest solution to this problem is to tell the compiler exactly which loops are parallelizable. This, however, forces the developer to write nearly side-effect-free code anyway. So we decided to go the pure way: to design a programming language that does not allow side effects.
The Functional World
A "pure" functional programming language is a language in which every function computes its result solely from its inputs. This is a great basis for the highly parallel map-reduce algorithms we need in our clusterable in-memory database.
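For example, a pure function such as square below depends only on its argument, so it can be applied to every element of a list independently and the partial results combined afterwards. (A minimal sketch, assuming the usual map and a fold-style reduce; names may differ slightly in our dialect.)

(define square (lambda (x) (* x x)))   ; pure: the result depends only on x
(reduce + 0 (map square '(1 2 3 4)))   ; ==> 30; every square can run in parallel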
We took the Scheme interpreter by Pieter Kelchtermans, written in Go, and added some extra features:
- We removed the set! instruction because it is the only function that causes global side effects. All other functions are local to the current environment, and as long as you don't change the environment, every piece of code can run in parallel without affecting the others.
- We made begin open its own environment, so self recursion can be done by defining a function in a begin block (!begin is the scopeless version); see the sketch after this list.
- We fixed if.
- We allowed strings as a native datatype and added the concat function, which concatenates all of its arguments into one string.
- We added a serialization mechanism to fully recover values and turn them into valid Scheme code again.
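As a sketch of the begin scoping described above: because a begin block opens its own environment, a function defined inside it can call itself by name. (A minimal example in our dialect; fac is just an illustrative name.)

(begin
  (define fac (lambda (n)        ; fac is bound in this begin's environment,
    (if (< n 2)                  ; so the lambda can refer to itself
        1
        (* n (fac (- n 1))))))
  (fac 5))                       ; ==> 120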
A sample session:

carli@launix-MS-7C51:~/projekte/memcp/server-node-golang$ make
go run *.go
> 45
==> 45
> (+ 1 2)
==> 3
> (define currified_add (lambda (a) (lambda (b) (+ a b))))
==> "ok"
> ((currified_add 4) 5)
==> 9
> (define add_1 (currified_add 1))
==> "ok"
> (add_1 6)
==> 7
> (add_1 (add_1 3))
==> 5
> (define name "Peter")
==> "ok"
> (concat "Hello " name)
==> "Hello Peter"
>
MemCP functions that support parallelism
The following functions support parallelism:
- scan runs filter, map and reduce in parallel for each shard; reduce2 is serial
- scan_order runs filter as well as the sorting in parallel, and map and reduce in serial
- parallel evaluates each given parameter in parallel and continues once all jobs are done
- newsession is a thread-safe key-value store to share context over threads
- once and mutex help to synchronize control flow
You can read the manual by typing (help "scan") in the Scheme console.
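As a rough sketch of how parallel can be used, based on the description above: each argument is evaluated concurrently, and evaluation continues once all jobs are done. The job function compute_partial is hypothetical; check the built-in help for the exact signatures of scan and the other primitives.

(parallel
  (compute_partial 1)   ; hypothetical, side-effect-free jobs;
  (compute_partial 2)   ; each argument is evaluated in parallel
  (compute_partial 3))
; execution continues here once all three jobs have finished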
Conclusion
What did we achieve?
- We chose Scheme as our language of choice
- We stripped away the parts of Scheme that make it unsafe for parallel computing
- We added some useful functions to Scheme to fit our needs (string processing, parallelization primitives…)
- We implemented a serialization function that can recreate Scheme code from in-memory objects, so they can be loaded on other machines
- Now we can start implementing our highly parallel map-reduce algorithms, which take map and reduce lambda functions and execute them in parallel