[Date Prev][Date Next][Thread Prev][Thread Next] [Search] [Date Index] [Thread Index]

Re: [MacPerl] sorting arrays



At 23.45 -0400 1999.06.16, Richard Gordon wrote:
>I had essentially the same thought since hashes are very fast and,
>until earlier today, I had never even heard of a Schwartzian
>Transform. Efficiency is a splendid thing, but short of making time
>back up, I don't know how my perl stuff could run much faster than it
>does now using extremely ordinary hash and array techniques.

It depends on your code.  In some of my programs, the ST was essential to a
proper fast sort.  I implemented sort(1) in perl, and have the same code in
my File::Sort module.  In these, you can specify several different sorts on
the same dataset.  Now, I could have done this:

  sort {
    (substr($a, 1, 3) cmp substr($b, 1, 3))
                       ||
    (substr($b, 3, 1) <=> substr($a, 3, 1))
                       ||
                   $a cmp $b
  }

But that would be a gigantic waste of time on a large data set.

  map { $_->[0] }
  sort { $a->[1] cmp $b->[1] || $b->[2] <=> $a->[2] || $a->[0] cmp $b->[0] }
  map { [ $_, substr($_, 1, 3), substr($_, 3, 1) ] }

This is MUCH faster for big data sets, such as I might encounter using in
sort(1).  The math is pretty clear: you need to do a certain amount of
comparisons, which grows as your data set grows, and for each comparison,
you need to perform a function on the data.  If you only perform a function
on that data once instead of a dozen times, you have a clear and
significant gain, which becomes more significant as your data set grows,
because all those additional substr()'s only need to be done once in an ST.
Heck, if you even need to munge the data ONCE in a sort block, you need an
ST.

There are other ways to do it, but they are all just variations on the same
theory: make a data structure of portions of your data to sort on, sort the
elements of that data structure that are to be sorted, then pull the wanted
elements back out of the other side.  Those "extremely ordinary hash and
array techniques" you could use to accomplish the same thing basically
would all end up doing an ST ... they would probably just get there more
slowly, because you'd have to create temporary data structures that don't
happen in the ST.  I've seen this, which isn't bad, but just suboptimal:

  my @tmp = map { [ ... ] };
  @tmp = sort { ... } @tmp;

Then they either use @tmp directly with pulling out $tmp[i]->[0], or map it
into YA data structure.  It is the same basic thing, again, it just
requires more moving around of data, when we are trying to move data around
as little as possible.


>Maybe Schwartzian Transform is cool and maybe it isn't, but it
>doesn't seem likely to put any more money in your pocket in either
>case.

Well, some of us don't code primarily for profit.  And if you are using
Perl, I hope you can appreciate that, because Perl exists, and continues to
exist, only because of that fact.

--
Chris Nandor          mailto:pudge@pobox.com         http://pudge.net/
%PGPKey = ('B76E72AD', [1024, '0824090B CE73CA10  1FF77F13 8180B6B6'])

===== Want to unsubscribe from this list?
===== Send mail with body "unsubscribe" to macperl-request@macperl.org