This article will show the design of the fastest ever interpreted subsets function. Sorry I'm too excited about this. Again, we start with the mathematical definition of the problem, which leads to a simple, correct and stunningly efficient solution. The final, fastest ever solution is still pure functional.
We start with the definition of the problem: given a set 'l' and a number 'n', return the set of all subsets of 'l' of cardinality 'n'. Sets are represented by lists.
The key was to choose the right definition.
Let ps(L) is a powerset of L that does not include {empty set} (that is, a singular set whose element is the empty set). length(ps(L)) = (- (expt 2 (length L)) 1)
Let L be a non-empty set and let L = A U B where A and B are two disjoint subsets of L.
Obviously, ps(L) = ps(A U B) = ps(A) U ps(B) U { y U x | x <- ps(A), y <- ps(B) } Let (subsets L n) = (filter (lambda (el) (= n (length el))) ps(L) )
Thus the desired function 'subsets' is a filtered powerset ps. This seems to be a stupid definition. Let's not be hasty however. Note that filter and union commute: the filter of a union of sets is the union of filtered sets. Therefore, from the previous expression
(subsets L n) = (subsets (union A B) n) = (subsets A n) U (subsets B n) Union{ y U x | x <- (subsets A k), y <- (subsets B (- n k)), k=1,n-1 }
Well, this is it. Note, we didn't say how to split L into two disjoint subsets A and B. We can do as we wish. For example, we can choose to split in such a way so that (length B) is n. In the most difficult case where n = (/ (length L) 2), this corresponds to a "divide and conquer" strategy, so to speak (at least in the first stages).
The Scheme code below implements this idea verbatim. It uses the accumulator-passing style that was expounded earlier.
(define (subsets-v5 l n)
; The initialization function. Check the boundary conditions (define (loop l ln n accum) (cond ((<= n 0) (cons '() accum)) ((< ln n) accum) ((= ln n) (cons l accum)) ((= n 1) (let fold ((l l) (accum accum)) (if (null? l) accum (fold (cdr l) (cons (cons (car l) '()) accum))))) (else (split l ln n accum))))
; split l in two parts a and b so that (length b) is n ; Invariant: (equal? (append a b) l) ; ln is the length of l (define (split l ln n accum) (let loop ((a '()) (b l) (lna 0) (lnb ln)) (if (= lnb n) (cont a lna b lnb n accum) (loop (cons (car b) a) (cdr b) (+ 1 lna) (- lnb 1)))))
; The main body of the algorithm (define (cont a lna b lnb n accum) (let* ((accum (loop a lna n accum)) (accum ; this is actually (loop b lnb n accum) (cons b accum)) ) (let inner ((k 1) (accum accum)) (if (> k (min lna (- n 1))) ; don't loop past meaningful boundaries accum (let ((as (loop a lna k '())) (bs (loop b lnb (- n k) '()))) (inner (+ 1 k) ; compute the cross-product of as and bs ; and append it to accum (let fold ((bs bs) (accum accum)) (if (null? bs) accum (fold (cdr bs) (append (map (lambda (lst) (append lst (car bs))) as) accum))))))))))
(loop l (length l) n '()))
The benchmark runs in 993 ms of user time and allocates only 36.5 MB of memory, on Gambit-C interpreter. This is the absolute, incredible record. Under SCM:
subsets-v3 (called combos by John David Stone) ;Evaluation took 1596 mSec (98 in gc) 657662 cells work, 4721364 env, 97 bytes other
subsets-v5: ;Evaluation took 700 mSec (322 in gc) 1708112 cells work, 610264 env, 105 bytes other That is, more than twice as fast.
Well, compiled code isn't that fast -- but that because subsets-v5 isn't too optimal. The append and map ought to be converted into the accumulation-passing style. That will reduce the amount of garbage as well. But it's past midnight.
The conclusion of this Friday night exercise is astonishingly trite. What a Math teacher told us is true: we have to attack the algorithm if we want to really big improvements. And Math rules!
> This article will show the design of the fastest ever interpreted > subsets function. Sorry I'm too excited about this. Again, we start > with the mathematical definition of the problem, which leads to a > simple, correct and stunningly efficient solution. The final, fastest > ever solution is still pure functional.
> We start with the definition of the problem: given a set 'l' and a > number 'n', return the set of all subsets of 'l' of cardinality > 'n'. Sets are represented by lists.
> The key was to choose the right definition.
> Let ps(L) is a powerset of L that does not include {empty set} (that > is, a singular set whose element is the empty set). > length(ps(L)) = (- (expt 2 (length L)) 1)
> Let L be a non-empty set and let L = A U B where A and B are two > disjoint subsets of L.
> Obviously, > ps(L) = ps(A U B) = > ps(A) U ps(B) U { y U x | x <- ps(A), y <- ps(B) } > Let (subsets L n) = (filter (lambda (el) (= n (length el))) ps(L) )
> Thus the desired function 'subsets' is a filtered powerset ps. This > seems to be a stupid definition. Let's not be hasty however. Note that > filter and union commute: the filter of a union of sets is the union > of filtered sets. Therefore, from the previous expression
> (subsets L n) = (subsets (union A B) n) > = (subsets A n) U (subsets B n) > Union{ y U x | x <- (subsets A k), y <- (subsets B (- n k)), > k=1,n-1 }
> Well, this is it. Note, we didn't say how to split L into two disjoint > subsets A and B. We can do as we wish. For example, we can choose to > split in such a way so that (length B) is n. In the most difficult > case where n = (/ (length L) 2), this corresponds to a "divide and > conquer" strategy, so to speak (at least in the first stages).
> The Scheme code below implements this idea verbatim. It uses the > accumulator-passing style that was expounded earlier.
> (define (subsets-v5 l n)
> ; The initialization function. Check the boundary conditions > (define (loop l ln n accum) > (cond > ((<= n 0) (cons '() accum)) > ((< ln n) accum) > ((= ln n) (cons l accum)) > ((= n 1) > (let fold ((l l) (accum accum)) > (if (null? l) accum > (fold (cdr l) (cons (cons (car l) '()) accum))))) > (else > (split l ln n accum))))
> ; split l in two parts a and b so that (length b) is n > ; Invariant: (equal? (append a b) l) > ; ln is the length of l > (define (split l ln n accum) > (let loop ((a '()) (b l) (lna 0) (lnb ln)) > (if (= lnb n) (cont a lna b lnb n accum) > (loop (cons (car b) a) (cdr b) (+ 1 lna) (- lnb 1)))))
> ; The main body of the algorithm > (define (cont a lna b lnb n accum) > (let* ((accum > (loop a lna n accum)) > (accum ; this is actually (loop b lnb n accum) > (cons b accum)) > ) > (let inner ((k 1) (accum accum)) > (if (> k (min lna (- n 1))) ; don't loop past meaningful boundaries > accum > (let ((as (loop a lna k '())) > (bs (loop b lnb (- n k) '()))) > (inner (+ 1 k) > ; compute the cross-product of as and bs > ; and append it to accum > (let fold ((bs bs) (accum accum)) > (if (null? bs) accum > (fold (cdr bs) > (append > (map (lambda (lst) (append lst (car bs))) as) > accum))))))))))
> (loop l (length l) n '()))
> The benchmark runs in 993 ms of user time and allocates only 36.5 MB > of memory, on Gambit-C interpreter. This is the absolute, incredible > record. Under SCM:
> subsets-v3 (called combos by John David Stone) > ;Evaluation took 1596 mSec (98 in gc) 657662 cells work, 4721364 env, 97 bytes other
> subsets-v5: > ;Evaluation took 700 mSec (322 in gc) 1708112 cells work, 610264 env, 105 bytes > other > That is, more than twice as fast.
> Well, compiled code isn't that fast -- but that because subsets-v5 > isn't too optimal. The append and map ought to be converted into the > accumulation-passing style. That will reduce the amount of garbage as > well. But it's past midnight.
> The conclusion of this Friday night exercise is astonishingly trite. > What a Math teacher told us is true: we have to attack the algorithm > if we want to really big improvements. And Math rules!
o...@pobox.com (o...@pobox.com) writes: > The benchmark runs in 993 ms of user time and allocates only 36.5 MB > of memory, on Gambit-C interpreter. This is the absolute, incredible > record. Under SCM:
> subsets-v3 (called combos by John David Stone) > ;Evaluation took 1596 mSec (98 in gc) 657662 cells work, 4721364 env, 97 bytes other
> subsets-v5: > ;Evaluation took 700 mSec (322 in gc) 1708112 cells work, 610264 env, 105 bytes > other > That is, more than twice as fast.
11 12 13 14 15 16 17 18 19 20) 10) (values))) (time (begin (subsets-v3 (...) ...) ...)) 4 collections 200 ms elapsed cpu time, including 110 ms collecting 195 ms elapsed real time, including 106 ms collecting 4300568 bytes allocated, including 1818144 bytes reclaimed
11 12 13 14 15 16 17 18 19 20) 10) (values))) (time (begin (subsets-v5 (...) ...) ...)) 12 collections 400 ms elapsed cpu time, including 320 ms collecting 456 ms elapsed real time, including 365 ms collecting 13524544 bytes allocated, including 4595512 bytes reclaimed
That is: Apart from garbage collection, the two procedures are very closely comparable -- but SUBSETS-V3 creates only a third as much garbage.
My guess is that, in my testing environment, SUBSETS-V3 is getting a bigger advantage from Chez Scheme's optimizations. The results I get under MzScheme are similar to Oleg's:
John David Stone - Lecturer in Computer Science and Philosophy Manager of the Mathematics Local-Area Network Grinnell College - Grinnell, Iowa 50112 - USA st...@cs.grinnell.edu - http://www.cs.grinnell.edu/~stone/
In article <7eb8ac3e.0201120056.3fc23...@posting.google.com>,
o...@pobox.com <o...@pobox.com> wrote: >The conclusion of this Friday night exercise is astonishingly trite. >What a Math teacher told us is true: we have to attack the algorithm >if we want to really big improvements. And Math rules!
John Rice at Purdue likes to say that in the past forty years algorithmic improvements (FFT, multigrid, fast multipole methods, ...) have accounted for more of the speedup in scientific computing than improvements in hardware.
o...@pobox.com (o...@pobox.com) writes: > This article will show the design of the fastest ever interpreted > subsets function. Sorry I'm too excited about this. Again, we start > with the mathematical definition of the problem, which leads to a > simple, correct and stunningly efficient solution. The final, > fastest ever solution is still pure functional. > [...]
I had a little scripting fun with measuring the times for the different programs, and it looks like the memoized simple version still beats everything else. subsets-v23 has the best GC time, and subsets-v5 has a slight advantage in its (GC - CPU) time for some inputs. All this is on MzScheme 103, so other systems might look a little different...
All graphs are timings of different lengths out of a set of 22 elements, msubset is the memoization of Oleg's first version, msubset1 is even simpler, removing the special case for len=1. The code is at
The subsets implementations posted so far have been interesting. I would like to throw a lazy stream implementation into the ring. I haven't benchmarked it extensively but it seems competitive with the code suggested so far (much faster than the slow implementations but not quite as fast as the fastest).
Obviously a lazy functional language like Haskell shows this implementation in its best light. As Oleg posted, we can define the subsets function mathematically as
subsets(S, 0) = {{}} subsets({}, n) = {} subsets({a} U S, n) = {{a}} x subsets(S, n-1) U subsets(S, n)
This translates directly into Scheme or Haskell. Scheme code has already been posted. In Haskell it looks like this:
This is slow, partly because it conses too much but mostly because the same subproblems are solved repeatedly. Memoization can be used to attack this performance problem. The lazy streams approach will give many the benefits of memoization while avoiding most of the disadvantages.
Memoization pros
- can be applied automatically without changing the naive implementation
- memoized results can be saved across top level invocations of the function
Memoization cons
- not always faster
- doesn't work well if function arguments are complex (the required hash compare might be too slow)
- usually implemented using imperative machinery under the hood (although the resulting memoized function has functional behavior)
- can lead to horrendous space leaks in the memo tables; recovering this space might require manual control partly negating the first pro listed above
In a lazy stream formulation, the nth element of the lazy stream will be a list of the subsets of cardinality n. In other words, the stream will be [subsets(S,0), subsets(S,1), subsets(S,2), ...]. The first stream element will always be the list of the empty set. For all n greater than length of the input list the element will be null. This is very smooth in Haskell:
subsets s n = (subsets_stream s) !! n
subsets_stream [] = [[]] : repeat [] subsets_stream (x:xs) = let r = subsets_stream xs s = map (map (x:)) r in [[]] : zipWith (++) s (tail r)
As a small bonus We can easily use subsets_stream to implement the powerset function:
powerset s = concat (takeWhile (not . null) (subsets s))
Unfortunately this isn't as pretty in Scheme. Comparing the Scheme implementation below to Haskell, Scheme is far more verbose since we have to explicitly request laziness, we have to provide lazy equivalents of standard library functions (CONS, CAR, CDR, LIST-REF and MAP, although these could be put in an external library) and finally currying is more convenient to use in Haskell than it is in Scheme. (Often Haskell's pattern matching would be an advantage as well, but here it plays no important part.) One tricky part about explicitly requesting laziness is that it's easy to get wrong. Look at the mix of uses of STREAM-MAP vs MAP and STREAM-CONS vs CONS in SUBSET-STREAMS below.
Since most Scheme implementations don't natively support R5RS macros, I have simply replaced all uses of STREAM-CONS with (CONS (DELAY a) (DELAY b)). Notice that I use fully lazy streams. The streams in SICP have lazy tails but eager heads.
; Most Scheme implementations don't have R5RS macros, alas. ;(define-syntax stream-cons ; (syntax-rules () ; ((stream-cons hd tl) (cons (delay hd) (delay tl)))))
(define (subsets s n) (stream-ref (subsets-stream s) n))
;; Return a lazy stream of all subsets of list XS. The nth element of ;; the stream is a list of the subsets of cardinality n (counting from 0). (define (subsets-stream xs) (if (null? xs) (cons (delay (list '())) (delay (stream-repeat '()))) (let* ((r (subsets-stream (cdr xs))) (s (stream-map (lambda (ss) (map (lambda (y) (cons (car xs) y)) ss)) r))) (cons (delay (list '())) (delay (stream-map append s (stream-cdr r)))))))
;; The remainder should be part of a lazy streams library. ;; These are the lazy list equivalents of CAR, CDR, LIST-REF and MAP. ;; STREAM-REPEAT is inspired by Haskell's repeat function.
(define (stream-car s) (force (car s)))
(define (stream-cdr s) (force (cdr s)))
;; Create an infinite stream of X's. (define (stream-repeat x) (cons (delay x) (delay (stream-repeat x))))
(define (stream-ref s n) (if (= n 0) (stream-car s) (stream-ref (stream-cdr s) (- n 1))))
;; Answer to an easy exercise in SICP. Most SICP exercises are harder ;-) (define (stream-map f . streams) (if (null? (car streams)) '() (cons (delay (apply f (map stream-car streams))) (delay (apply stream-map (cons f (map stream-cdr streams)))))))
Doug Quale <qua...@charter.net> writes: > [...] The lazy streams approach will give many the benefits of > memoization while avoiding most of the disadvantages.
> Memoization pros
> - can be applied automatically without changing the naive > implementation
> - memoized results can be saved across top level invocations of the > function
Well, your implementation fails *both* of these, the second one is not important, but I consider the first one very important.
> Memoization cons
> - not always faster
Of course you should know when to use it, but when you *do*, it's better than changing the function with an explicit matrix or whatever.
> - doesn't work well if function arguments are complex (the required > hash compare might be too slow)
Same as above -- the basic assumption is that an argument hash lookup is much faster than the computation for that argument, so you should know when to use it. (Also, know what hash to use, the subsets was particularly easy since an eq? test is enough.)
> - usually implemented using imperative machinery under the hood > (although the resulting memoized function has functional > behavior)
Ah, but that's not a bad thing -- it's a good thing that you can throw efficiency machinery on the underlying implementation and keep your code functional -- this is exactly the advantage of memoization over standard dynamic programming techniques. (The implementation of Haskell is imperative too...)
> - can lead to horrendous space leaks in the memo tables; recovering > this space might require manual control partly negating the first > pro listed above
Either you know what you're doing (for example, have a wrapper that uses a fresh table for every top level call), or you can simply use weak pointers.
> In a lazy stream formulation, the nth element of the lazy stream > will be a list of the subsets of cardinality n. [...]
That's one formulation, but it points to one thing which is nice when you're lazy -- you never care to generalize your code making it do more than you really need since redundant computations will not be done... (My standard example is instead of finding *a* solution for the n-queen problem, you find all of them and pull out the first.)
> As a small bonus We can easily use subsets_stream to implement the > powerset function:
Eli Barzilay <e...@barzilay.org> writes: > Doug Quale <qua...@charter.net> writes:
> > [...] The lazy streams approach will give many the benefits of > > memoization while avoiding most of the disadvantages.
> > Memoization pros
> > - can be applied automatically without changing the naive > > implementation
> > - memoized results can be saved across top level invocations of the > > function
> Well, your implementation fails *both* of these, the second one is not > important, but I consider the first one very important.
I don't think of lazy streams as requiring changes to the naive implementation. I think lazy is the naive implementation if you look at it the right way -- it follows directly from the mathematical definition of the problem. It does require a different perspective, so maybe it isn't as naive as I would like to think.
Saving results across top level invocations is at odds with the space leak problem I mention later. For a restricted set of arguments for this specific problem, you can encapsulate the answers for all of the integer subsets in one place. This is a bit ugly in Scheme, and it also changes the arguments to the subsets function. (Instead of passing a list and a number we need two numbers.)
;; Return a list of all subsets of the integers {1, ..., N} of ;; cardinality R. (define subsets (let ((all-subsets (cons (delay '(()) (delay (int-subsets-stream-from 1)))))) (lambda (n r) (stream-ref (stream-ref all-subsets n) r))))
Space leaks are a problem here and would require manual intervention. (No better than memoization on this point.)
Ignoring the space issues, this is a one-liner in Haskell:
int_subsets = [subsets_stream [1 .. n] | n <- [0 ..]] subsets n r = (int_subsets !! n) !! r
I didn't mention the performance benefits of memoization from avoiding repeated computation of the same subproblems since it is an obvious feature. I was thinking of it when I wrote "many", but perhaps many wasn't the right word when I meant just two out of three.
> > Memoization cons
> > - not always faster
> Of course you should know when to use it, but when you *do*, it's > better than changing the function with an explicit matrix or whatever.
I agree. One thing to watch out for is that it isn't always clear whether memoization will help. It is possible to have a single function that when invoked with certain arguments will benefit from memoization and with others will not -- how do you choose when to memoize?
Of course laziness can certainly hurt performance as well. In Haskell this problem shows up in the other direction -- you have to decide when you want to be strict instead of lazy. My guess is that this is normally harder than deciding when memoization will help.
> > - doesn't work well if function arguments are complex (the required > > hash compare might be too slow)
> Same as above -- the basic assumption is that an argument hash lookup > is much faster than the computation for that argument, so you should > know when to use it. (Also, know what hash to use, the subsets was > particularly easy since an eq? test is enough.)
Certainly. This basic assumption for memoization means that lazy streams will work some times when memoization is not appropriate. The lazy implementation doesn't need to compare function arguments.
> > - usually implemented using imperative machinery under the hood > > (although the resulting memoized function has functional > > behavior)
> Ah, but that's not a bad thing -- it's a good thing that you can throw > efficiency machinery on the underlying implementation and keep your