Final Project Proposal: Part 4
Prompt: Decide on the implementation.
•frontend:
We will parse the language the same way we parse the 164 language: we will reuse the AST generated by the semantic actions in project 3 to create the DAG.
•the core language: We will reuse the 164 language from project 3. The minimum required for creating the DAG is just a list of statements. Later, we will extend this to lists of statements nested within lists of statements when we incorporate lambdas.
•internal representation:
There are two ways we can interpret the DAG created from the AST. One way is to execute the statements while traversing the DAG. The other is to compile the DAG into bytecode and send that to the bytecode interpreter to run in parallel. We are leaning toward the bytecode interpreter, since we would only have to modify it, and it gives us the freedom to parallelize coroutines (if we figure out how, of course).
•interpreter/compiler:
The parser will generate the AST for the 164 language as before. Instead of compiling the AST directly into bytecode, the compiler will traverse the AST and record which variables are written and/or read at each statement. Based on this, the compiler will create a DAG in which each node represents a statement and the edges represent the flow of the program: before node N is executed, all of its parent nodes must be executed.
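As a rough illustration, here is a minimal Python sketch of this DAG construction. The Node class and the reads/writes dictionaries are hypothetical stand-ins for whatever the compiler records while traversing the AST; this is a sketch of the idea, not the actual project code.

    class Node:
        """One DAG node per statement (hypothetical representation)."""
        def __init__(self, stmt):
            self.stmt = stmt        # the statement this node represents
            self.children = []      # statements that must wait for this one
            self.num_parents = 0    # statements this one must wait for

    def build_dag(stmts, reads, writes):
        # reads[s] and writes[s] are the variable sets recorded for statement s
        last_writer = {}            # variable -> node that last wrote it
        nodes = [Node(s) for s in stmts]
        for node in nodes:
            # draw an arc from the last writer of each variable this
            # statement reads: that is the dependence described above
            parents = set()
            for var in reads[node.stmt]:
                if var in last_writer:
                    parents.add(last_writer[var])
            for parent in parents:
                parent.children.append(node)
                node.num_parents += 1
            for var in writes[node.stmt]:
                last_writer[var] = node
        return [n for n in nodes if n.num_parents == 0]   # the source nodes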
The bytecode interpreter will start at the top of the DAG and run each independent source node in parallel. Every statement will have a counter of how many of its parent statements have finished executing. When a statement finishes executing, it will increment the counter of each of its children and check whether the counter now equals the child's number of parents. If it does, it will execute that child. This continues until the whole DAG has been traversed.
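A sequential simulation of this counter scheme might look like the sketch below, reusing the hypothetical Node class from above. In a real parallel run, every node in the ready queue would execute at once instead of being pulled off one at a time.

    from collections import deque

    def run_dag(sources, execute):
        finished = {}                    # node -> count of finished parents
        ready = deque(sources)           # the independent sources start first
        while ready:
            node = ready.popleft()       # in parallel, all ready nodes run at once
            execute(node.stmt)
            for child in node.children:
                finished[child] = finished.get(child, 0) + 1
                # a child runs only once every one of its parents has finished
                if finished[child] == child.num_parents:
                    ready.append(child)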
•debugging: We will debug our implementation by using GraphViz to visualize the DAG.
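For instance, a small helper along these lines could dump the DAG in Graphviz's dot format (again a sketch over the hypothetical Node class, not our actual tooling):

    def dag_to_dot(nodes):
        # nodes is the full list of DAG nodes, one per statement
        lines = ["digraph program {"]
        for node in nodes:
            lines.append('  "%s";' % node.stmt)
            for child in node.children:
                lines.append('  "%s" -> "%s";' % (node.stmt, child.stmt))
        lines.append("}")
        return "\n".join(lines)

    # Render the output with, e.g.: dot -Tpng program.dot -o program.png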
Friday, November 20, 2009
Final Project Proposal: Part 3
Prompt: What features specifically will your language support.
1. The domain.
Our language is based on the 164 language. Our plan is to implement the parallelism in this order:
1. General statements (var/function definitions, calculator statements, variable assignment)
2. Function calls (includes the loops)
3. Objects
4. Passing/returning of functions as first class data
5. Coroutines
We believe that at this point we know how to implement the first three steps. We still have to think about the last two. Therefore, although our language will be a subset of the 164 language, it may not be equivalent (depending on how far we get). By the end of the project, we hope to simulate the parallel execution of divide and conquer algorithms like quicksort and merge sort.
2. Example programs:
- Any divide and conquer algorithm like merge sort.
- Programs that combine the results of subproblems, as when a formula is split into parts to make calculation simpler.
- Possibly algorithms that iterate over a list, as in mapping.
The model of the language will be object oriented and imperative (like C++) and, depending on how far we get, may support functions as first class data (functional programming).
The method we have proposed converts a list of statements into a DAG. A program, however, is more than just a sequential list of statements: functions, which may be nested inside other functions, contain their own lists of statements. Thus, DAGs may be composed within other DAGs. At first, though, we will focus only on converting programs that are merely lists of statements into DAGs.
3. Example program:
def x = 0
def y = 0
def range(start,end){
    def ret = {}
    def i = 0
    def length = end - start
    while(i < length){
        ret[i] = start+i
        i = i+1
    }
    ret
}
for (i in range(0,10)){
    x = x+1
}
for (i in range(0,20)){
    y = y+1
}
print x+y
The idea behind this program is to have the two for loops split into separate processes, since they do not depend on each other. Then, when both processes are done, the program continues by printing x+y.
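In the read/write-set notation used in part 1, and treating each loop as a single statement, the top-level statements of this program annotate roughly as follows (a sketch; we count each loop's call to range as a read of the range definition):

# Read set: {}            Written set: {x}
def x = 0
# Read set: {}            Written set: {y}
def y = 0
# Read set: {}            Written set: {range}
def range(start,end){ ... }
# Read set: {range, x}    Written set: {x}
for (i in range(0,10)){ x = x+1 }
# Read set: {range, y}    Written set: {y}
for (i in range(0,20)){ y = y+1 }
# Read set: {x, y}        Written set: {}
print x+y

Since neither loop reads anything the other writes, there is no arc between them and they become independent branches of the DAG; print x+y gets an arc from each loop.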
Final Project Proposal: Part 2
Prompt: Study a language that solves a similar problem and describe it in this homework.
1. One simple code example.
Since we are changing the implementation of a language rather than the language itself, it does not make sense to provide example code: our code will look like ordinary 164 code. According to Wikipedia,
"Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, automatic parallelization has had only limited success."
Therefore, we will be pleased with parallelizing a subset of the 164 language (Ras said he thinks it's possible). Fortran 90, for example, parallelizes element-wise array operations. We propose parallelizing statements instead. Our proposed implementation is described in part 4.
2. Describe the implementation.
SEE PART 4
Final Project Proposal: Part 1
Prompt: Choose the problem that your language will solve.
1. What problem are you addressing?
We will address the problem of automatically converting sequential programs into parallel programs.
We will address two ways to achieve this. First, we will create a tool that analyzes an abstract syntax tree (AST) generated from code written in the 164 language and generates a structure whose parts could be run in parallel by a hypothetical interpreter. (The actual interpreter written to simulate this hypothetical interpreter will, of course, be sequential.) The basic idea of the analysis is to create a directed acyclic graph (DAG) describing how statements that perform read operations depend on statements that perform write operations. The interpreter will then run the program by traversing this DAG starting from all of its source nodes. Thus, the first technique creates a structure that defines how the program is parallelized before it is given to the interpreter.
In contrast, the second technique parallelizes the program while it is interpreted. This will, like the first technique, require analysis of the AST before it is given to the interpreter, but the structure for parallelizing the procedure will not be explicit. In particular, consider a function that makes two recursive calls to itself, as merge sort does. Although there is no way to explicitly generate the tree structure this algorithm represents at compile time without knowing what the input is, what can be determined is that the two recursive calls can be done concurrently.
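To make this concrete, here is a minimal Python sketch of what the second technique would let an interpreter do with merge sort: because the two recursive calls read and write disjoint halves of the data, one of them can be handed to another thread. This is an illustration of the idea, not our implementation.

    import threading

    def merge(a, b):
        """Merge two sorted lists into one sorted list."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        return out + a[i:] + b[j:]

    def merge_sort(xs):
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        result = {}
        # the two recursive calls touch disjoint halves, so they may run concurrently
        t = threading.Thread(
            target=lambda: result.__setitem__("left", merge_sort(xs[:mid])))
        t.start()
        right = merge_sort(xs[mid:])    # sort the other half on this thread
        t.join()                        # wait for the spawned half to finish
        return merge(result["left"], right)

    print(merge_sort([5, 2, 8, 1, 9, 3]))   # [1, 2, 3, 5, 8, 9]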
Thus, the first technique is suited for parallelizing sequences of statements while the second is suited for parallelizing recursive procedures.
However, it is debatable whether these two techniques are actually the same technique. What ultimately matters is the dependences of the read operations on the completion of write operations, as the first technique addresses. Therefore, given the first technique, the second technique may come for free. In fact, the first technique may be required for the second technique to work.
Instance of the problem: Parallelizing the merging of the results of two computations (first technique).
Consider the following pseudocode:
# Read set: {}
# Written set: {a}
def a = compute1()
# Read set: {}
# Written set: {b}
def b = compute2()
# Read set: {a, b}
# Written set: {c}
def c = combine(a, b)
The first two lines can be parallelized, but doing so using pthreads would require overhead that would obfuscate this code. Instead, a DAG, where an arc is drawn to a statement that reads a variable from the last statement that wrote to that variable, reveals how this code can be parallelized (see the following figure).
[Figure: the DAG for this code. def a = compute1() and def b = compute2() are independent source nodes, each with an arc into def c = combine(a, b).]
2. Argue why this problem is hard.
We can give two reasons why it is difficult to convert sequential programs into parallel programs. First, focus shifts from the natural algorithm for solving a problem to how to make the algorithm parallel. It may be difficult to force parallelism into the abstractions used to solve the problem. Second, the implementation may require a lot of overhead. A programmer would have to deal with constructs for parallelism, such as threads and mutexes, and the issues that arise with them, such as when to join threads or lock mutexes.
3. Argue why this problem is worth solving.
People have come to expect processor performance to increase exponentially each year, in other words, to follow Moore's Law. However, because processors are beginning to hit the power wall, where the power required to keep doubling performance becomes the limiting factor, the trend in processor design is to parallelize. That is, multiple slow processors working in parallel are used rather than a single fast processor.
However, unlike the case when writing sequential programs, the hardware currently cannot be abstracted away from programmers writing parallel programs. Thus, taking advantage of the benefits of parallelism is hard for programmers. If converting sequential programs into parallel programs can be done automatically, programmers will be free to concentrate on solving problems with high-level thinking rather than be bogged down by the low-level details of managing threads. Such code would also be easier to maintain, especially when it is adapted to new requirements.
Sunday, November 15, 2009
This is team Grammatically Incorrect's submission for the third homework of CS 164, Programming Languages and Compilers.