NAME
    Parse::ExuberantCTags::Merge - Efficiently merge large exuberant ctags
    files

SYNOPSIS
      use Parse::ExuberantCTags::Merge;

      my $merger = Parse::ExuberantCTags::Merge->new();
      $merger->add_file('perltags.old',  sorted => 0);
      $merger->add_file('perltags.new',  sorted => 1);
      $merger->add_file('perltags.new2', sorted => 1);
      # potentially add more files...

      # sorting happens only when you call 'write':
      $merger->write('perltags.out');

DESCRIPTION
    This Perl module merges multiple *exuberant ctags* files. The synopsis
    shows the entire interface.

    To be as efficient as possible, the module chooses its sort method
    based on the input data. In the general case, it uses the
    Sort::External module to process the data. There are a few exceptions:

    Pre-sorted input files
        If two or more input files contain sorted data, we use a merge
        sort to combine them efficiently before merging the result with
        the remaining data.

    Small input files
        If the total size of the input files is small, we load them into
        memory and use Perl's fast sort function. Default limit: "2^21B ==
        4MB".

    Super-small input files
        If the total size of the input files is extremely small, we ignore
        whether they're sorted or not and simply resort to Perl's sort.
        Default limit: "2^17B == 128kB".

    The sorting modules are loaded at run-time, and only on demand.

METHODS
  new
    Creates a new merger object.

  add_file
    Adds a file to the merging process. The first argument must be the
    file name. It may be followed by the optional named argument 'sorted'
    (default: false), which affects the way the data will be merged.

    Mixing sorted with unsorted files is possible and will produce a
    sorted output. Pre-sorted files are naturally somewhat faster to
    merge.

  small_size_threshold
    Set this to the threshold under which the total size of the input
    files is considered small enough to be sorted in memory (see above).
    The default should be fine.
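
    As a rough illustration of how the two size thresholds and the
    'sorted' flag described under DESCRIPTION could interact, here is a
    minimal sketch. The function choose_strategy, its arguments, and the
    strategy names it returns are hypothetical and not part of this
    module's interface; the module's actual internals may differ. Reading
    "under the threshold" as a strict comparison is also an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Parse::ExuberantCTags::Merge.
# Picks a sort strategy from the total input size and the 'sorted' flags.
sub choose_strategy {
    my (%args) = @_;
    my $files       = $args{files};   # [ { size => bytes, sorted => 0|1 }, ... ]
    my $small       = $args{small_size_threshold};
    my $super_small = $args{super_small_size_threshold};

    # Total size of all input files in bytes.
    my $total = 0;
    $total += $_->{size} for @$files;

    # Number of files flagged as pre-sorted (grep in scalar context counts).
    my $n_sorted = grep { $_->{sorted} } @$files;

    # Extremely small input: ignore the 'sorted' flags entirely.
    return 'perl_sort_everything' if $total < $super_small;

    # Two or more pre-sorted files: merge those first, then handle the rest.
    my $prefix = $n_sorted >= 2 ? 'merge_presorted_then_' : '';

    # Small input: sort in memory; otherwise use an external sort.
    return $prefix . ( $total < $small ? 'perl_sort' : 'external_sort' );
}

# Example values only; the exponents are the ones quoted in DESCRIPTION.
my $strategy = choose_strategy(
    files => [
        { size => 1_000, sorted => 1 },
        { size => 2_000, sorted => 1 },
    ],
    small_size_threshold       => 2**21,
    super_small_size_threshold => 2**17,
);
print "$strategy\n";    # prints "perl_sort_everything"
```

    In this sketch, two pre-sorted files totalling a few kilobytes fall
    under the super-small rule, so their 'sorted' flags are ignored and
    everything is sorted in memory.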

  super_small_size_threshold
    Set this to the threshold under which the total size of the input
    files is considered small enough to be sorted in memory regardless of
    whether the input was partly sorted (see above). The default should be
    fine.

    This makes more sense than it sounds. Perl's sort function is fast.
    For small amounts of data, its low overhead matters far more than
    the algorithmic complexity of the sort.

  tempdir
    You can use this to set the location of the temporary files that are
    used for sorting and merging large files. By default, they go into
    "File::Spec->tmpdir()".

TODO
    Benchmark.

SEE ALSO
    Exuberant ctags homepage:

    Wikipedia on ctags:

    Module that can produce ctags files from Perl code: Perl::Tags

    Module that can parse exuberant ctags files: Parse::ExuberantCTags

    Sorting modules: Sort::External, File::MergeSort (though we use a
    home-grown merge sort)

    File::PackageIndexer

AUTHOR
    Steffen Mueller

COPYRIGHT AND LICENSE
    Copyright (C) 2009 by Steffen Mueller

    This library is free software; you can redistribute it and/or modify
    it under the same terms as Perl itself, either Perl version 5.6 or, at
    your option, any later version of Perl 5 you may have available.