Sorting Data with the Schwartzian Transform

**KevinADC** · Feb 26 '08, 09:03 AM

Comments, corrections, discussions are welcome.

**briandfoy** · Jun 27 '08, 09:43 PM

The Schwartzian Transform is a type of cached-key sort. The basic form is a map-sort-map using an anonymous array to carry the original datum and the sortable form:

Code:

@sorted_items =
map { $_->[0] }
sort { ... }
map { [$_, ...] }
@items;
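Here is a rough illustration of the template with the blanks filled in for a hypothetical case: sorting words case-insensitively, with `lc` standing in for an expensive key computation. The `sort_key` helper and its call counter are made up for this sketch. They only show that the key is computed once per element, not once per comparison.

```perl
use strict;
use warnings;

# Hypothetical input: words to be sorted case-insensitively.
my @items = qw(banana Apple cherry date Elderberry);

# Hypothetical key function; the counter records how often it runs.
my $key_calls = 0;
sub sort_key { $key_calls++; return lc $_[0] }

my @sorted_items =
    map  { $_->[0] }                  # 3. unwrap the original datum
    sort { $a->[1] cmp $b->[1] }      # 2. compare only the cached keys
    map  { [ $_, sort_key($_) ] }     # 1. pair each datum with its key
    @items;

print "@sorted_items\n";              # Apple banana cherry date Elderberry
print "key computed $key_calls times for ", scalar(@items), " items\n";
```

A plain `sort @items` would put `Elderberry` before the lowercase words. Calling `lc` inside the comparison block would also work, but it would run on every comparison instead of once per item.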

Fill in the bits that create the sortable form and the way to sort, and that's it. You get back a sorted list of the input elements, which you can then use for the next step (instead of creating completely new strings as you do):

Code:

#!/usr/bin/perl
use strict;
use warnings;

my %employees;

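# Read the tab-separated header line to get the field names.
chomp( my @headers = split /\t/, <DATA> );

# Store each record as a hash keyed by field name, indexed by EmployeeID.
while (<DATA>) {
	chomp;
	my %hash;
	@hash{@headers} = split/\t/;
	$employees{ $hash{EmployeeID} } = \%hash;
	}

# Schwartzian Transform: sort the employee IDs by hire date.
my @sorted_keys = 
	map { $_->[0] }                    # keep only the original key
	sort { $a->[1] cmp $b->[1] }       # compare the cached YYYYMMDD strings
	map {
		# HireDate is MM-DD-YYYY; zero-pad so a day like "3" becomes "03"
		my ($m,$d,$y) = split/-/, $employees{$_}{HireDate};
		[$_, sprintf('%04d%02d%02d', $y, $m, $d)];
		} 
	keys %employees;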

foreach my $key ( @sorted_keys )
	{
	print "$employees{$key}{Name} was hired on $employees{$key}{HireDate}\n";
	}
	
__DATA__
EmployeeID	Name	HireDate	Position	Department	Salary
M011	Smith,John,T	01-12-1981	Welder	Maintenance	20.00 p/h
M102	Hart,Thomas,J	06-30-1982	Supervisor	Maintenance	28.50 p/h
J309	Jones,Steve,W	02-23-1990	Janitor	Janitorial	15.50 p/h
A124	White,Mary,H	03-15-1990	Assembler	Assembly	8.75 p/h
Q365	Miles,Frank,R	12-01-1999	Inspector	Quality	16.25 p/h
A316	Roberts,Andy,P	07-30-2006	Assembler	Assembly	8.00 p/h
S554	William,Terry,K	03-3-2005	Expeditor	Stock Room	9.00 p/h
R078	Norris,Chris,K	04-17-2003	Clerk	Shipping/Receiving	9.00 p/h
X832	Anderson,Jane,M	03-23-1992	VP Operations	Management	33.00 p/h
X111	Kelly,Mark,D	05-29-1989	Engineer	Engineering	26.75 p/h
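One detail to check with this kind of key: concatenating `"$y$m$d"` only gives a correctly ordered string when month and day always have two digits. The data above includes `03-3-2005`, whose naive key `2005033` is one character short. The sketch below compares the naive key with a `sprintf`-padded one. The helper names `naive_key`, `padded_key`, and `sort_by_key` and the two sample dates are made up for the example.

```perl
use strict;
use warnings;

# Naive key: plain concatenation of an MM-DD-YYYY date as YYYYMMDD.
sub naive_key  { my ($m, $d, $y) = split /-/, $_[0]; return "$y$m$d" }

# Padded key: force a 4-digit year and 2-digit month/day.
sub padded_key { my ($m, $d, $y) = split /-/, $_[0]; return sprintf '%04d%02d%02d', $y, $m, $d }

# Generic Schwartzian Transform using the supplied key function.
sub sort_by_key {
    my ($key_of, @list) = @_;
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [ $_, $key_of->($_) ] }
           @list;
}

# Hypothetical dates; the first lacks a leading zero on the day.
my @dates = ('03-3-2005', '03-10-2005');

my @naive  = sort_by_key(\&naive_key,  @dates);
my @padded = sort_by_key(\&padded_key, @dates);

print "naive:  @naive\n";    # naive:  03-10-2005 03-3-2005  (wrong)
print "padded: @padded\n";   # padded: 03-3-2005 03-10-2005
```

Zero-padding with `sprintf` gives every key the same width, so a plain `cmp` string comparison matches chronological order.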

Also note that you can write `<$in>` or `readline $in` because they do the same thing. Don't combine them (as in `readline <$in>`) :)
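The sketch below shows the two equivalent forms. It opens an in-memory filehandle on a reference to a string, so no file on disk is needed. The `$text` contents are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical in-memory "file" so the example needs no real file.
my $text = "first line\nsecond line\n";
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";

my $one = <$in>;          # angle-bracket form reads one line
my $two = readline $in;   # function form: the same operation
close $in;

# Avoid readline <$in>: that reads a line with <$in> first and then
# passes the resulting string to readline as if it were a filehandle.
chomp( $one, $two );
print "$one | $two\n";    # first line | second line
```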

For more information, read the "Practical Reference Tricks" chapter in *Intermediate Perl*. Good luck :)

**KevinADC** · Jul 2 '08, 09:03 AM

I am honored you took the time to post your comments Brian. It is very much appreciated.

Regards,
Kevin