3 Embedded DSLs B
Due: Wednesday, January 22, 11:59pm.
You must work with a partner. Please let me know if you do not have a partner.
Here is the starting point for this assignment: hw3-starter.zip. It includes a signature for the SQL DSL discussed in class and some example data and queries.
You will need to have the csv-reading package installed in your Racket installation. If you are using Racket from the command line you can run:
raco pkg install csv-reading
If you are using DrRacket, you can open the package manager from the File -> Package Manager menu, enter csv-reading in the "Package Source" field, and press "Install".
3.1 Implementing with lists
Implement the signature from signature.rkt using a (ListOf Row) as the type of data within a query-result. Put this implementation in core-lists.rkt.
Your task will be much easier if you make good use of Racket’s standard library and comprehension forms. Some forms I found use for include for/list, for*/list, for/fold, sort, take, remove-duplicates, hash-union, hash-filter, hash-keys, stream->list, for/stream, for*/stream, and stream-take.
3.2 Designing extensions
Beyond the features included in the signature, it would be useful for the DSL to include features for:
Sorting the data by a given column, like SQL’s ORDER BY clause.
Creating a new column that is derived by computing a function of other columns.
Design and implement these additional features in your core-lists.rkt. Include signatures and documentation that fully describe their behavior.
3.3 Implementing with streams
Next we’ll use Streams from racket/stream to create an alternative implementation that is much faster for certain queries.
Implement the signature from signature.rkt using a (StreamOf Row) as the type of data within a query-result. Put this implementation in core-streams.rkt. Include the features you designed in the previous section.
Many operations can be implemented with streams in a way that avoids processing the entire data set for queries that only request a limited number of results using a limit clause. However, some operations fundamentally need to examine the entire data set.
Design your implementation to avoid processing the entire data set when possible. In your README.md file, explain which operations this is not possible for, and why. For those where it is possible, include some RackUnit: Unit Testing tests demonstrating this in a test submodule of your core-streams.rkt. You might find the following procedure useful for formulating tests:
; (ListOf Row) -> (StreamOf Row) ; Produces a stream that contains the given rows, but raises an error ; if forced beyond those values. (define (rows-then-error rows) (stream-append (for/stream ([row rows]) row) (stream-lazy (error 'rows-then-error "stream forced too far"))))
Verify that the query in flights-query.rkt runs substantially faster when you require your core-streams.rkt instead of core-lists.rkt.
3.4 Extra credit 1
There are multiple ways to implement the join operation. Most likely, you implemented a nested loop join. In some circumstances, a hash join can be much faster.
Implement a join/hash operation with the same API as join but that uses a hash join. In your README.md, analyze the performance consequences of the choice of join including any impact on the circumstances in which streams need to be fully forced.
3.5 Extra credit 2
Design and implement a generalization of the aggregate operation called aggregate/multi. It should allow aggregation of any number of columns using different Aggregators, as well grouping by a list of multiple columns instead of a single column.
Provide an example use of your aggregate/multi in the README.md.
Submission
Submit to Gradescope, in a group submission with your partner: https://www.gradescope.com/courses/941020/assignments/5637195/
Upload your core-lists.rkt, core-streams.rkt, and README.md file. Include any tests via test submodules in your Racket files.