Google Groups Home
Help | Sign in
Subsets of a list
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  14 messages - Collapse all
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
Thomas Baruchel  
View profile
 More options Jan 9 2002, 11:24 am
Newsgroups: comp.lang.scheme
From: baruc...@libertysurf.france (Thomas Baruchel)
Date: 9 Jan 2002 16:24:00 GMT
Subject: Subsets of a list
Brest, le mardi 8 janvier

Hi, I wrote a function doing the following:
it takes a list and an integer as arguments and returns a list of lists
which are all different subsets of length n from the given list:

(subset '(1 2 3 4) 2)
--> ((1 2) (1 3) (1 4) (2 3) ( 2 4) (3 4))

I would like to have your comments: can I improve the function
(that I use very much; all advice to have quicker or anything better
will be fine).
[The sublists are not given in a "pretty" order, because I don't need it;
since I take them are subsets (and not lists), (4 2) is for me the
same as (2 4)]

(define (subsets l n)
  (let ((subsets* '()))
    (letrec ((subsets0 (lambda (l2 l* n2)
                (if (zero? n2) (set! subsets* (cons l* subsets*))
                  (do ((i 0 (+ i 1)))
                      ((> i (- (length l2) n2)) subsets*)
                    (subsets0 (list-tail l2 (+ i 1))
                            (cons (list-ref l2 i) l*) (- n2 1)))))))
      (subsets0 l '() n))))

--
QlpoOTFBWSZTWcwiz1oAAC1fgHQTwOeABVAABAT7Zp4lMAC4hET1DQNGhoBoyGaQYyaZAyaG
QZGmBGDTJE01NMTJkZDQaAUhm7W8Wu9WYGQZg2Vd+s8PsaiAZJoF5jaDsEQUaCEQHgnxdw5H
siRDfoqLyg4gHe6/TCLCgm0gY3zjVSswgknIk85qBbV7GNcqz8yWcUOcrT4SlYICcQUgKxM2
gumlEIhPgCSCC4gUHVb3pREx/vdlGkW5r2P5Z+LuSKcKEhmEWetA | mimencode + bzip2


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
oleg@pobox.com  
View profile
 More options Jan 9 2002, 8:37 pm
Newsgroups: comp.lang.scheme
From: o...@pobox.com (o...@pobox.com)
Date: 9 Jan 2002 17:37:11 -0800
Local: Wed, Jan 9 2002 8:37 pm
Subject: Re: Subsets of a list

baruc...@libertysurf.france (Thomas Baruchel) wrote in message <news:a1hqr0$slp$2@wanadoo.fr>...
> it takes a list and an integer as arguments and returns a list of lists
> which are all different subsets of length n from the given list:
> (subset '(1 2 3 4) 2)
> --> ((1 2) (1 3) (1 4) (2 3) ( 2 4) (3 4))

Let's consider the mathematical definition of this function:

subset '() n ==> '() : there are no subsets in an empty set
subset l 0   ==> '(()): the only subset of size (cardinality 0) is an
                  empty set

subset (a . tail) n ==> { (a . x) | x <- subset tail (n-1) } Union
                        subset tail n
that is, subsets of size n will either contain the element a, or will
not contain it.

This mathematical definition directly translates to Scheme.

(define (subset l n)
  (cond
   ((<= n 0) '(()))
   ((null? l) '())
   ((= n 1) (map list l))
   (else
    (append
     (let ((hd (car l)))
       (map
        (lambda (lst) (cons hd lst))
        (subset (cdr l) (- n 1))))
     (subset (cdr l) n)))))

The solution is purely functional.

Note, the code has an optimization for the case n=1, in which case the
answer is a set of all singleton subsets.

Example:

(display (subset '(1 2 3 4) 0))
===> (())

(display (subset '(1 2 3 4) 1))
===> ((1) (2) (3) (4))

(display (subset '(1 2 3 4) 2))
===> ((1 2) (1 3) (1 4) (2 3) (2 4) (3 4))

(display (subset '(1 2 3 4) 3))
===> ((1 2 3) (1 2 4) (1 3 4) (2 3 4))

(display (subset '(1 2 3 4) 4))
===> ((1 2 3 4))

(display (subset '(1 2 3 4) 5))
===> ()

The cardinalities of the answers above form the Pascal triangle, btw.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Caswell  
View profile
 More options Jan 10 2002, 1:22 am
Newsgroups: comp.lang.scheme
From: George Caswell <tetsu...@maine.rr.com>
Date: Thu, 10 Jan 2002 06:19:52 GMT
Local: Thurs, Jan 10 2002 1:19 am
Subject: Re: Subsets of a list
On 9 Jan 2002, Thomas Baruchel wrote:

   The basic problem here is that certain subsets are being calculated more
than once:  Let's say you have a set like this:

(define l '(a b c d e f g))

And you want the subsets of size 3.  First, you'll calculate all the subsets
which start with 'a, defined as 'a prepended onto the subsets of size 2
starting after 'a.  The first subset of size 2 you generate will start with
'b, and when you've generated all those, you'll generate those that start at
'c, and so on.

When you're done generating the subsets that start with 'a, you'll generate
the subsets of size 3 that start with 'b, and then (once again) you'll
generate the subsets of size 2 that start with 'c.  In fact, all the subsets
you generate as the recursive step of creating sets of size 3 starting with 'b
will have already been generated, as part of the recursive step for 'a.  That
causes your time-complexity to skyrocket as n gets larger.

   Sometime when I've had a little more sleep and a little less coffee I'll
try to put together a solution.  But the problem is similar to the simple (but
highly wasteful) definition of the Fibonacci series:

(define (fibs n)
   (cond ((eq? n 0) 0)
         ((eq? n 1) 1)
         (else (+ (fibs (- n 1)) (fibs (- n 2))))))

Any (fibs x) where x is between 0 and n may be calculated many, many times,
when it needn't be calculated more than once.  Likewise, there's no need to
calculate the list of sets of size n from l including l's first element more
than once.

---GEC
Projects page: http://home.maine.rr.com/tetsujin/
(M-x depeche-mode)
"Must...  Finish...  Zaku..."


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John David Stone  
View profile
 More options Jan 10 2002, 1:16 pm
Newsgroups: comp.lang.scheme
From: John David Stone <st...@cs.grinnell.edu>
Date: 10 Jan 2002 12:16:07 -0600
Local: Thurs, Jan 10 2002 1:16 pm
Subject: Re: Subsets of a list

baruc...@libertysurf.france (Thomas Baruchel) writes:
> I wrote a function doing the following:
> it takes a list and an integer as arguments and returns a list of lists
> which are all different subsets of length n from the given list:

> (subset '(1 2 3 4) 2)
> --> ((1 2) (1 3) (1 4) (2 3) ( 2 4) (3 4))

> I would like to have your comments: can I improve the function
> (that I use very much; all advice to have quicker or anything better
> will be fine).

        Here's a version that appears to be faster.  It uses an accumulator
instead of a non-local variable and uses recursion over the list l2 rather
than over positions in that list, thus avoiding all of the calls to
LIST-REF and LIST-TAIL.

(define (combos l n)
  (letrec ((subsets0 (lambda (l2 l* n2 acc)
                       (if (zero? n2)
                           (cons l* acc)
                           (do ((rest l2 (cdr rest))
                                (available (length l2) (- available 1))
                                (acc acc (subsets0 (cdr rest)
                                                   (cons (car rest) l*)
                                                   (- n2 1)
                                                   acc)))
                               ((< available n2) acc))))))
    (subsets0 l '() n '()))))

        Using Chez Scheme 6.1 under Linux on a 700MHz Pentium III, COMBOS
took 250 ms of cpu time to find all the ten-element subsets of a
twenty-element set; SUBSETS took 460 ms.

--
   John David Stone - Lecturer in Computer Science and Philosophy
           Manager of the Mathematics Local-Area Network
           Grinnell College - Grinnell, Iowa 50112 - USA
     st...@cs.grinnell.edu - http://www.cs.grinnell.edu/~stone/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Max Hailperin  
View profile
 More options Jan 10 2002, 2:53 pm
Newsgroups: comp.lang.scheme
From: m...@gustavus.edu (Max Hailperin)
Date: 10 Jan 2002 11:53:13 -0800
Local: Thurs, Jan 10 2002 2:53 pm
Subject: Re: Subsets of a list
George Caswell <tetsu...@maine.rr.com> wrote in message <news:Pine.LNX.3.96.1020110005414.7936B-100000@adamant.homeworlds.net>...

...
[tree-recursive procedure to find subsets of size n from a list, l]

>    The basic problem here is that certain subsets are being calculated more
> than once:  ...[comparison with tree-recursive fib]...

Actually, there is no big efficiency gain to be had in this case,
because the list that is the result is of length C(|l|, n), so it is
inevitable that any procedure for consing up that list will take time
of that order -- all that you can do is improve the constant factor.
The big win, through dynamic programming or memoization, is only
possible if the problem is changed to something like counting how many
subsets there are.

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Caswell  
View profile
 More options Jan 11 2002, 4:00 am
Newsgroups: comp.lang.scheme
From: George Caswell <tetsu...@maine.rr.com>
Date: Fri, 11 Jan 2002 08:57:39 GMT
Local: Fri, Jan 11 2002 3:57 am
Subject: Re: Subsets of a list
On 10 Jan 2002, Max Hailperin wrote:

> George Caswell <tetsu...@maine.rr.com> wrote in message <news:Pine.LNX.3.96.1020110005414.7936B-100000@adamant.homeworlds.net>...
> ...
> [tree-recursive procedure to find subsets of size n from a list, l]
> >    The basic problem here is that certain subsets are being calculated more
> > than once:  ...[comparison with tree-recursive fib]...

> Actually, there is no big efficiency gain to be had in this case,
> because the list that is the result is of length C(|l|, n), so it is
> inevitable that any procedure for consing up that list will take time
> of that order -- all that you can do is improve the constant factor.
> The big win, through dynamic programming or memoization, is only
> possible if the problem is changed to something like counting how many
> subsets there are.

   Hrm, I guess you're right, though I wouldn't have thought so.  My brain's
having a little difficulty with why that's the case.

   Anyway, my implementation, "subs" uses up a lot of heap space compared to,
say, "combos" posted by John Stone, and incurs more garbage collection, but
still comes out faster:

(define (expand xs sets)
    (if (or (null? xs) (null? sets))
        ()
        (cons
            (apply append (map (lambda (set) (map (lambda (s) (cons (car xs)
s)) set)) (cdr sets)))
            (expand (cdr xs) (cdr sets)))
))

(define (subs xs n)
    (cond ((eq? 0 n) ())
        (else
            (do ((i 1 (+ i 1))
                 (acc (map (lambda (x) (list (list x))) xs) (expand xs acc)))
                ((= i n) (apply append acc))))))

   Sorry if I spread a bunch of misinformation with that tree-recursion
comparison: I'm still not sure why I'm wrong, but I had the same problem with
the Monty Hall thing.

---GEC
Projects page: http://home.maine.rr.com/tetsujin/
(M-x depeche-mode)
"Must...  Finish...  Zaku..."


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John David Stone  
View profile
 More options Jan 11 2002, 1:12 pm
Newsgroups: comp.lang.scheme
From: John David Stone <st...@cs.grinnell.edu>
Date: 11 Jan 2002 12:12:36 -0600
Local: Fri, Jan 11 2002 1:12 pm
Subject: Re: Subsets of a list

        I used SUBS to find all the ten-element subsets of a twenty-element
set, in the same environment (Chez Scheme 6.1, Linux, 700MHz Pentium III)
where SUBSETS took 460 ms and COMBOS took 250 ms for the same task.  SUBS
took 990 ms.

--
   John David Stone - Lecturer in Computer Science and Philosophy
           Manager of the Mathematics Local-Area Network
           Grinnell College - Grinnell, Iowa 50112 - USA
     st...@cs.grinnell.edu - http://www.cs.grinnell.edu/~stone/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alamo Petrofsky  
View profile
 More options Jan 11 2002, 3:09 pm
Newsgroups: comp.lang.scheme
From: Alamo Petrofsky <al...@petrofsky.org>
Date: 11 Jan 2002 12:09:17 -0800
Local: Fri, Jan 11 2002 3:09 pm
Subject: Re: Subsets of a list
John David Stone <st...@cs.grinnell.edu> writes:

>         Here's a version that appears to be faster.  It uses an accumulator
> instead of a non-local variable and uses recursion over the list l2 rather
> than over positions in that list, thus avoiding all of the calls to
> LIST-REF and LIST-TAIL.

But it seems you still make a lot of calls to LENGTH, which similarly
do needless list traversals.  Here's how I would write it:

  ;; Returns all n-length subsets of the list l.
  (define (subsets l n)
    ;; ss prepends each n-length subset of l to little-acc, and returns
    ;; a list of the resulting sets, prepended to big-acc.  len must be
    ;; the length of l.
    (let ss ((l l) (len (length l)) (n n) (little-acc '()) (big-acc '()))
      (cond ((zero? n) (cons little-acc big-acc))
            ((< len n) big-acc)
            (else (ss (cdr l) (- len 1) (- n 1) (cons (car l) little-acc)
                      (ss (cdr l) (- len 1) n little-acc big-acc))))))

Is that faster or slower on your system?

-al


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Caswell  
View profile
 More options Jan 11 2002, 5:37 pm
Newsgroups: comp.lang.scheme
From: George Caswell <tetsu...@maine.rr.com>
Date: Fri, 11 Jan 2002 22:34:43 GMT
Local: Fri, Jan 11 2002 5:34 pm
Subject: Re: Subsets of a list
On 11 Jan 2002, John David Stone wrote:

   Strange...  I'd attribute it to garbage collection time - though in any
case, on the tests I ran subs was only marginally faster than combos anyway.
Oh, well, back to the drawing board.  :)

---GEC
Projects page: http://home.maine.rr.com/tetsujin/
(M-x depeche-mode)
"Must...  Finish...  Zaku..."


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
George Caswell  
View profile
 More options Jan 11 2002, 6:30 pm
Newsgroups: comp.lang.scheme
From: George Caswell <tetsu...@maine.rr.com>
Date: Fri, 11 Jan 2002 23:27:34 GMT
Local: Fri, Jan 11 2002 6:27 pm
Subject: Re: Subsets of a list
On 11 Jan 2002, John David Stone wrote:

>         I used SUBS to find all the ten-element subsets of a twenty-element
> set, in the same environment (Chez Scheme 6.1, Linux, 700MHz Pentium III)
> where SUBSETS took 460 ms and COMBOS took 250 ms for the same task.  SUBS
> took 990 ms.

   These are my results:  ys is a 20 element set.  subs2 is a variation on
subs that stores data differently.

> (define a (subs2 ys 10))

;Evaluation took 1760 mSec (0 in gc) 2469846 cells work, 1236201 env, 534
bytes other
#<unspecified>
> (define a (combos ys 10))

;Evaluation took 2740 mSec (30 in gc) 675120 cells work, 4721385 env, 34 bytes
other
#<unspecified>
> (define a (subs ys 10))

;Evaluation took 1880 mSec (10 in gc) 2663988 cells work, 1242383 env, 34
bytes other

   On my interpreter (SCM), subs and subs2 take less time, but use around 4
times as many storage cells for intermediate values.  It's a tradeoff that's
advantageous on some interpreters, I guess.

   Is Chez Scheme in Debian?

   Listing for subs2 follows:

(define (subs-base xs)
  (if (null? xs)
      ()
      (let* ((rest (subs-base (cdr xs)))
             (first (if (null? rest) () (car rest))))
        (cons (cons (list (car xs)) first) rest)
)))

(define (subs-expand xs sets)
  (if (or (null? xs) (null? sets) (null? (cdr sets)))
      ()
      (let* ((rest (subs-expand (cdr xs) (cdr sets)))
             (first (if (null? rest) () (car rest))))
        (cons
         (append (map (lambda (x) (cons (car xs) x)) (cadr sets)) first)
         rest
))))

(define (subs2 xs n)
  (if (eq? 0 n)
      ()
      (do ((i 1 (+ i 1))
           (acc (subs-base xs) (subs-expand xs acc)))
          ((eqv? i n) (car acc)))
))

---GEC
Projects page: http://home.maine.rr.com/tetsujin/
(M-x depeche-mode)
"Must...  Finish...  Zaku..."


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Eli Barzilay  
View profile
 More options Jan 11 2002, 7:59 pm
Newsgroups: comp.lang.scheme
From: Eli Barzilay <e...@barzilay.org>
Date: 11 Jan 2002 19:59:18 -0500
Local: Fri, Jan 11 2002 7:59 pm
Subject: Re: Subsets of a list

George Caswell <tetsu...@maine.rr.com> writes:
> > (define a (subs2 ys 10))
> ;Evaluation took 1760 mSec (0 in gc) 2469846 cells work, 1236201 env, 534
> bytes other
> #<unspecified>
> > (define a (combos ys 10))
> ;Evaluation took 2740 mSec (30 in gc) 675120 cells work, 4721385 env, 34 bytes
> other
> #<unspecified>
> > (define a (subs ys 10))
> ;Evaluation took 1880 mSec (10 in gc) 2663988 cells work, 1242383 env, 34
> bytes other

There is a really nice lesson here -- you have this fight over who's
code runs faster, tweaking your code into something more and more
complex, while Oleg wrote a version which is clearly superior in its
clarity (as he shown it to be directly related to the definition).

So, you must have tried his code too, getting to a conclusion that it
is useless if you care about speed...  I tried it too (s1=subsets,
s2=subs, s3=subs2, s4=combos, s5=subset):

| > (define a '(a b c d e f g h i j k l m n o p q r s t u v w x y z))
| > (time (begin (s1 a 8) #f))
| cpu time: 25520 real time: 25485 gc time: 7580
| > (time (begin (s2 a 8) #f))
| cpu time: 10730 real time: 10715 gc time: 5640
| > (time (begin (s3 a 8) #f))
| cpu time: 7290 real time: 7245 gc time: 2610
| > (time (begin (s4 a 8) #f))
| cpu time: 13720