On 9 Jan 2002, Thomas Baruchel wrote:
> Brest, le mardi 8 janvier
> Hi, I wrote a function doing the following:
> it takes a list and an integer as arguments and returns a list of lists
> which are all different subsets of length n from the given list:
> (subset '(1 2 3 4) 2)
> --> ((1 2) (1 3) (1 4) (2 3) ( 2 4) (3 4))
> I would like to have your comments: can I improve the function
> (that I use very much; all advice to have quicker or anything better
> will be fine).
> [The sublists are not given in a "pretty" order, because I don't need it;
> since I take them are subsets (and not lists), (4 2) is for me the
> same as (2 4)]
> (define (subsets l n)
> (let ((subsets* '()))
> (letrec ((subsets0 (lambda (l2 l* n2)
> (if (zero? n2) (set! subsets* (cons l* subsets*))
> (do ((i 0 (+ i 1)))
> ((> i (- (length l2) n2)) subsets*)
> (subsets0 (list-tail l2 (+ i 1))
> (cons (list-ref l2 i) l*) (- n2 1)))))))
> (subsets0 l '() n))))
The basic problem here is that certain subsets are being calculated more
than once: Let's say you have a set like this:
(define l '(a b c d e f g))
And you want the subsets of size 3. First, you'll calculate all the subsets
which start with 'a, defined as 'a prepended onto the subsets of size 2
starting after 'a. The first subset of size 2 you generate will start with
'b, and when you've generated all those, you'll generate those that start at
'c, and so on.
When you're done generating the subsets that start with 'a, you'll generate
the subsets of size 3 that start with 'b, and then (once again) you'll
generate the subsets of size 2 that start with 'c. In fact, all the subsets
you generate as the recursive step of creating sets of size 3 starting with 'b
will have already been generated, as part of the recursive step for 'a. That
causes your time-complexity to skyrocket as n gets larger.
Sometime when I've had a little more sleep and a little less coffee I'll
try to put together a solution. But the problem is similar to the simple (but
highly wasteful) definition of the Fibonacci series:
(define (fibs n)
(cond ((eq? n 0) 0)
((eq? n 1) 1)
(else (+ (fibs (- n 1)) (fibs (- n 2))))))
Any (fibs x) where x is between 0 and n may be calculated many, many times,
when it needn't be calculated more than once. Likewise, there's no need to
calculate the list of sets of size n from l including l's first element more
than once.
---GEC
Projects page: http://home.maine.rr.com/tetsujin/
(M-x depeche-mode)
"Must... Finish... Zaku..."