Sun Nov 23 21:52:58 2003  Ville Laurikari  <vl@iki.fi>

* Released tre-0.6.2.

Sun Nov 23 18:40:57 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Bugfix.
If the TNFA has a loop with an empty back reference, the matcher
went into an infinite loop.  This happened e.g. with the regexp

* lib/regexec.c (tre_match_approx): Fixed to return error if the
regexp has back references.

* lib/tre-compile.c (tre_parse): Bugfix in parsing empty
expressions and missing closing parentheses.

Sun Nov 16 20:27:17 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed a bug which
caused non-optimal matches to be returned in some cases.

* tests/retest.c: Added a couple of tests.

Sat Nov 15 15:21:23 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-compile.c (tre_expand_ast): Fixed to handle nested
repeats correctly.

Wed Nov 12 21:45:44 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-compile.c: Fixed to compile if REG_LITERAL is not

* lib/tre-match-backtrack.c: Fixed to compile without wide
character support.

Thu Nov  6 22:23:03 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-parallel.c: Bugfix.  If pmatch[] was null, the
matcher loop referred past an array, sometimes crashing.

Mon Nov  3 17:41:37 2003  Ville Laurikari  <vl@iki.fi>

* Released tre-0.6.0.

Sun Nov  2 20:27:37 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-compile.c (tre_parse): Implemented support for
REG_LITERAL.  If REG_LITERAL is used, the entire regexp is
interpreted as a literal word.

Tue Oct 14 21:20:35 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-compile.c: Changed the parser to use a context object
which contains all parse state instead of passing each state
variable separately.  Fixed bug that caused `have_backrefs' to be
reset if macros were used (this caused the wrong matcher to be
used and back references not to work).

Mon Sep 29 21:43:35 2003  Ville Laurikari  <vl@iki.fi>

* Added a "tre_" prefix to all functions that did not yet have it.

* lib/tre-compile.c: Separated regexp compilation from regcomp.c
to this file.  Now all actual functionality is implemented in
lib/tre-*.c, and lib/reg*.c have the POSIX API wrappers.

* Implemented new syntax to control approximate matching
parameters dynamically during matching.  Thanks to Bill Yerazunis
for the suggestions!

Wed Sep  3 19:41:40 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-backtrack.c (tre_tnfa_run_backtrack): Bugfix.  Now
matching back references works correctly in wide character mode.

Thu Jul  3 19:47:33 2003  Ville Laurikari  <vl@iki.fi>

* configure.ac: Made --disable-system-abi the default.

* configure.ac: alloca() is no longer required.  Unless
--without-alloca is specified, alloca() will be used if found.

Fri May 15 09:39:34 2003  Ville Laurikari  <vl@iki.fi>

* tre/tre-match-approx.c (tre_tnfa_run_approx): Bugfix in handling
insertions.  If an insertion was found that had better cost than a
previous path, the tag values were not copied resulting in
incorrect match and submatch positions being reported.

Wed May 14 21:14:47 2003  Ville Laurikari  <vl@iki.fi>

* Released tre-0.5.3.

Wed May 14 20:11:43 2003  Ville Laurikari  <vl@iki.fi>

* tre/tre-mem.c (tre_mem_alloc_impl): Bugfix.  The returned
pointer was not always properly aligned.
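
A minimal sketch of the kind of alignment handling this fix is about,
assuming a bump allocator that hands out pointers from a byte buffer.
The names `arena_t' and `arena_alloc' are hypothetical and are not
taken from tre-mem.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator; not the actual tre_mem_t code. */
typedef struct {
  unsigned char *base;  /* start of the backing buffer */
  size_t size;          /* total bytes in the buffer */
  size_t used;          /* bytes handed out so far, padding included */
} arena_t;

/* Returns a pointer aligned to `align' (a power of two), or NULL if
   the padded request does not fit in the remaining space. */
static void *
arena_alloc(arena_t *a, size_t n, size_t align)
{
  uintptr_t cur, aligned;
  size_t pad;

  assert(align != 0 && (align & (align - 1)) == 0);
  cur = (uintptr_t)(a->base + a->used);
  /* Round the current address up to the next multiple of `align'. */
  aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
  pad = (size_t)(aligned - cur);
  if (pad > a->size - a->used || n > a->size - a->used - pad)
    return NULL;
  a->used += pad + n;
  return (void *)aligned;
}
```

The padding is charged to the arena, so a one-byte allocation followed
by a pointer-sized one still yields a properly aligned second pointer.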

Tue May 13 19:55:30 2003  Ville Laurikari  <vl@iki.fi>

* configure.ac, lib/regex.h, lib/tre-internal.h: Fixed to compile
if --disable-system-abi is used.

* lib/Makefile.am: Fixed to use $(LTLIBINTL), so gettext is found
on systems where it is not in libc (e.g. FreeBSD has it in

Thanks to Dominick Meglio <codemstr@ptdprolog.net> for the above!

Thu May  8 21:23:30 2003  Ville Laurikari  <vl@iki.fi>

* win32/config.h, win32/tre-config.h: Updated and fixed
compilation errors on Windows.  Enabled wide character and
multibyte support.

* win32/tre.dsp, win32/retest.dsp: Link against msvcprt.lib to get
wide character functions.

* src/retest.c: Don't try to call setlocale() on Windows (it seems
to crash).

* lib/regcomp.c (parse_re): Fixed bugs in the regexp parser when
wide character support is not used.  Also fixed some references
past the end of the input string.

* lib/regex.h: regcomp, regexec, regerror, and regfree weren't
defined if TRE_WCHAR was not defined.  Fixed.

Tue Apr 15 22:37:48 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-approx.c (tre_tnfa_run_approx): Fixed bugs.  A
match starting earlier was sometimes preferred over a match with a
smaller cost, and insertions were not handled correctly.

* src/agrep.c: Implemented the -B (best match) mode.  It scans the
input files twice: first to find the cost of the best matching
record(s), and then to output all records that match with that
cost.
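
The two-pass idea can be sketched as follows.  This is an illustration
only: the records are held in memory, and the cost is a stand-in (plain
Levenshtein distance against the whole record), whereas agrep uses the
TRE approximate matcher.  `edit_distance' and `best_matches' are
hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 63  /* longest record this sketch accepts */

/* Stand-in cost: Levenshtein distance between two strings, computed
   with a single rolling row. */
static int
edit_distance(const char *a, const char *b)
{
  size_t la = strlen(a), lb = strlen(b), i, j;
  int row[MAX_LEN + 1];

  assert(lb <= MAX_LEN);
  for (j = 0; j <= lb; j++)
    row[j] = (int)j;
  for (i = 1; i <= la; i++)
    {
      int diag = row[0];  /* previous row, previous column */
      row[0] = (int)i;
      for (j = 1; j <= lb; j++)
        {
          int up = row[j];  /* previous row, same column */
          int best = diag + (a[i - 1] != b[j - 1]);
          if (up + 1 < best)
            best = up + 1;
          if (row[j - 1] + 1 < best)
            best = row[j - 1] + 1;
          diag = up;
          row[j] = best;
        }
    }
  return row[lb];
}

/* Pass 1 finds the lowest cost; pass 2 stores the index of every
   record with that cost in out[].  Returns the number stored. */
static size_t
best_matches(const char *pattern, const char *const *records,
             size_t n, size_t *out)
{
  int best = INT_MAX;
  size_t i, count = 0;

  for (i = 0; i < n; i++)
    {
      int c = edit_distance(pattern, records[i]);
      if (c < best)
        best = c;
    }
  for (i = 0; i < n; i++)
    if (edit_distance(pattern, records[i]) == best)
      out[count++] = i;
  return count;
}
```

Scanning twice avoids buffering candidate records: the first pass only
has to remember a single number.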

* test/test-approx.c: Added some simple test cases.

Sun Apr 13 13:53:00 2003  Ville Laurikari  <vl@iki.fi>

* doc/tre-api.html, doc/tre-syntax.html: Beginnings of
API and regexp syntax documentation.

* lib/tre-mem.c: Changed to allocate blocks bigger than
TRE_MEM_BLOCK_SIZE if the requested amount is large.  This fixes
REG_ESPACE problems when trying to compile large regexps,
especially ones with a lot of "|".
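
A sketch of the block sizing described above, assuming a pool made of
a linked list of malloc()ed blocks.  `BLOCK_SIZE' stands in for
TRE_MEM_BLOCK_SIZE with an arbitrary value, and `pool_t', `pool_alloc'
and `pool_destroy' are hypothetical names, not the tre-mem.c API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 1024  /* hypothetical default block size */

typedef struct pool_block {
  struct pool_block *next;
  unsigned char *data;
  size_t size;  /* bytes available in data[] */
  size_t used;  /* bytes already handed out */
} pool_block_t;

typedef struct {
  pool_block_t *head;  /* most recently added block */
} pool_t;

static void *
pool_alloc(pool_t *p, size_t n)
{
  pool_block_t *b = p->head;
  void *ptr;

  assert(n > 0 && n <= (size_t)-1 - 15);
  /* Round up to a multiple of 16 so later pointers stay aligned
     (an assumption that is adequate on common platforms). */
  n = (n + 15) & ~(size_t)15;
  if (b == NULL || b->size - b->used < n)
    {
      /* A large request gets a block of its own size instead of
         failing because it exceeds the default block size. */
      size_t size = n > BLOCK_SIZE ? n : BLOCK_SIZE;
      b = malloc(sizeof *b);
      if (b == NULL)
        return NULL;
      b->data = malloc(size);
      if (b->data == NULL)
        {
          free(b);
          return NULL;
        }
      b->size = size;
      b->used = 0;
      b->next = p->head;
      p->head = b;
    }
  ptr = b->data + b->used;
  b->used += n;
  return ptr;
}

/* Frees every block in the pool at once. */
static void
pool_destroy(pool_t *p)
{
  while (p->head != NULL)
    {
      pool_block_t *next = p->head->next;
      free(p->head->data);
      free(p->head);
      p->head = next;
    }
}
```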

* Released tre-0.5.2.

Mon Apr  7 18:39:05 2003  Ville Laurikari  <vl@iki.fi>

* lib/regcomp.c, lib/tre-match-parallel.c: Added support for
non-greedy repetition operators "*?", "+?", "??", and "{m,n}?".
They work similarly to the ones in Perl.

* tests/retest.c: Added tests for minimal repetition operators.

* tre.pc.in: Added pkgconfig file.

Tue Apr  1 20:20:37 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-parallel.c, lib/tre-match-approx.c: Fixed
alignment bugs when allocating pointers from a buffer.

Sat Mar 15 20:35:11 2003  Ville Laurikari  <vl@iki.fi>

* lib/regcomp.c (parse_re): Fixed to allow the empty regexp.
These were already allowed inside parentheses (e.g. "(a|)"), but
e.g. "a|" caused REG_EPAREN to be returned.  Now "", "a|", "|a",
"*", "?", etc. work as expected.

Thu Mar 13 19:49:20 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-backtrack.c: Bugfix.  Stopped too early when
scanning asciiz strings.

* configure.in: System ABI support: added checks for absolute path
to regex.h and a field in the system defined regex_t suitable for
storing a pointer to a TNFA.  TRE is now configured by default to
be compatible with the system regex ABI, unless
--disable-system-abi is used.

* lib/regex.h, lib/tre-config.h.in: System ABI support: if
TRE_USE_SYSTEM_REGEX_H is defined, include system regex.h instead
of defining everything here.

* lib/regcomp.c, lib/regexec.c, lib/tre-config.h.in:
System ABI support: use the configured field in regex_t struct for
getting and setting the pointer to the TNFA.

Thu Feb 27 19:32:43 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-*.[ch]: Fixed several references past the end of
the input string.

* lib/tre-match-approx.c: Fixed bugs in submatch tracking.

* configure.in: Added flag --disable-agrep to disable building and
installing agrep.

* Released tre-0.5.1.

Sun Feb 23 14:43:06 2003  Ville Laurikari  <vl@iki.fi>

* Released tre-0.5.0.

Fri Feb 21 19:10:30 2003  Ville Laurikari  <vl@iki.fi>

* win32/: New directory, contains project and workspace files for
compiling TRE and `retest' for Windows with MS Visual C++.
Original version contributed by Aymeric Moizard <jack@atosc.org>,
thank you!

Sun Feb 16 12:55:13 2003  Ville Laurikari  <vl@iki.fi>

* lib/regcomp.c: Rewrote code that adds tags in the AST for
submatch addressing.  Changes include:
  - Submatch boundaries now all have a tag with offset zero.  This
    makes it possible to get correct submatches for approximate
  - Removed marker and boundary tags.  Now nested submatches are
    tracked and that information is used to reset submatches from
    old repetitions.
  - Bounded iterations are now expanded after adding tags instead
    of at parse time.  This makes the code a lot cleaner.

* lib/regexec.c, lib/tre-internal.h: Related changes (no more
marker and boundary tags).

* tests/test-approx.c: Small test program for approximate

Mon Jan 13 20:28:52 2003  Ville Laurikari  <vl@iki.fi>

* lib/tre-match-approx.c: Now returns submatches of approximate
matches in the `pmatch[]' array of the `regamatch_t' struct.

Sun Jan 12 13:52:37 2003  Ville Laurikari  <vl@iki.fi>

* lib/regexec.c, lib/regex.h: Changed API of approximate matching
functions.  This API is easier to extend without having to change
the applications using the API at all.

* src/agrep.c: New command line option --show-cost (-s) to prefix
the cost of the match found to each output line.

* tests/retest.c: Added tests for back referencing.

* lib/*: Rearranged stuff.  Split all three matchers (parallel,
approximate, backtracking) into separate files.  Put tre-mem into
its own files.

Mon Jan  6 21:18:12 2003  Ville Laurikari  <vl@iki.fi>

* utils/autogen.sh: Fixed (must run aclocal before automake).

* m4/vl_check_sign.m4, m4/vl_decl_wchar_max.m4, configure.in:
Updated for new autoconf style (AC_TRY_COMPILE ->

* lib/regcomp.c, lib/regexec.c, lib/regexec-bt.c, lib/tre.h:
Implemented support for back references.  A backtracking routine
implemented in `regexec-bt.c' is used instead of the parallel
matcher if back references are used in the regexp.

Fri Nov 29 20:57:52 2002  Ville Laurikari  <vl@iki.fi>

* configure.in: New options --disable-wchar and
--disable-multibyte that disable wchar_t support (and requirement)
and multibyte character support, respectively.

* lib/regcomp.c, lib/regex.h, lib/regexec.c, lib/tre.h: Related

Sun Oct 20 21:55:56 2002  Ville Laurikari  <vl@iki.fi>

* configure.in: Check getopt_long support.

* lib/regcomp.c (ast_compute_tag_info): Merged into ast_add_tags
and removed this function.

* lib/regcomp.c (ast_add_tags, parse_re, parse_bound): Bugfixes.
Range repetition did not work correctly when applied to a
parenthesized subexpression.  For example, "a{5,6}" worked correctly,
but "(a|b){5,6}" did not.
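
The two patterns from this entry can be checked through the standard
POSIX calls.  The sketch below assumes the system <regex.h>; the
helper name `matches_fully' is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `pattern' (a POSIX ERE) matches all of `text', 0 if it
   does not, and -1 if the pattern fails to compile. */
static int
matches_fully(const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m[1];
  int rc;

  assert(pattern != NULL && text != NULL);
  if (regcomp(&re, pattern, REG_EXTENDED) != 0)
    return -1;
  rc = regexec(&re, text, 1, m, 0);
  regfree(&re);
  if (rc != 0)
    return 0;
  /* The overall match must start at 0 and end at the terminator. */
  return m[0].rm_so == 0 && text[m[0].rm_eo] == '\0';
}
```

For example, matches_fully("^(a|b){5,6}$", "ababa") should return 1,
while the same pattern against "abab" (too short) or "abababa" (too
long) should return 0.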

Sun Oct 20 20:27:24 2002  Ville Laurikari  <vl@iki.fi>

* Changed the name of the package from "libtre" to just "TRE".

* lib/regexec.c (tnfa_execute_approx): Implemented approximate
regexp matching.

* lib/regex.h (regaexec, reganexec, regawexec, regawnexec):
Added approximate matching API.

* src/agrep.c: First version of agrep (approximate grep).  Uses
the new approximate matching feature in the matcher library.

* lib/regexec.c (tnfa_execute): Added a loop to quickly skip over
characters that cannot possibly be the first character of a

* lib/regcomp.c (regwncomp): Related changes.

Sat Aug  3 23:42:29 2002  Ville Laurikari  <vl@iki.fi>

* Moved the library part from src/ to lib/, and changed the name of
macros/ to m4/.

* lib/regex.c: Split into `regcomp.c', `regexec.c', and

* lib/regerror.c, lib/regex.h: Threw away regwerror() since it was
pretty useless.

* lib/regerror.c: Internationalized.  The error messages returned
by regerror() are now localized through gettext() if found.
Note that libintl is *not* included in the TRE package.

* po/fi.po: Finnish translation.

* lib/regcomp.c (parse_re): Fixed bugs (there were references to
before the start of the regexp string).

* lib/regcomp.c (parse_re): Fixed bugs in parsing BREs.

* tests/retest.c: Added test cases for the BRE stuff I fixed.

* lib/regexec.c (tnfa_execute): Fixed to work when the length of
the input string is given (e.g. with regnexec()).

Sun Jul 28 00:19:04 2002  Ville Laurikari  <vl@iki.fi>

* lib/Makefile.am: Changed to install the header file `regex.h' to
$(includedir)/tre to avoid accidental inclusion with
"#include <regex.h>".

Mon Jun 24 19:34:50 2002  Ville Laurikari  <vl@iki.fi>

* src/regex.c (ast_compute_nfl): Bugfix, did not mark an iteration
node as nullable if the minimum number of iterations was above
zero and the child was nullable.  As a result, e.g. "(a*)+" did not
match the empty string.
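
The corrected rule can be sketched on a toy AST: an iteration node is
nullable if its minimum count is zero or if its child is nullable.
The node layout and names below are hypothetical and much simpler
than the real AST in regex.c.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  N_EMPTY, N_LITERAL, N_CATENATION, N_UNION, N_ITERATION
} node_type_t;

typedef struct node {
  node_type_t type;
  const struct node *left;   /* the child, for N_ITERATION */
  const struct node *right;  /* unused for N_ITERATION */
  int min;                   /* minimum repeat count, N_ITERATION only */
} node_t;

/* Returns 1 if the subtree can match the empty string, else 0. */
static int
nullable(const node_t *n)
{
  assert(n != NULL);
  switch (n->type)
    {
    case N_EMPTY:
      return 1;
    case N_LITERAL:
      return 0;
    case N_CATENATION:
      return nullable(n->left) && nullable(n->right);
    case N_UNION:
      return nullable(n->left) || nullable(n->right);
    case N_ITERATION:
      /* Checking only `min == 0' here would miss cases like "(a*)+". */
      return n->min == 0 || nullable(n->left);
    }
  return 0;
}
```

With "(a*)+" built as an iteration with min 1 over an iteration with
min 0 over a literal, nullable() reports 1; for "a+" it reports 0.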

* src/regex.c (ast_compute_tag_info): Bugfix, the tree was
traversed in the wrong order resulting in incorrect num_tags
counts for nodes in some cases.  The results ranged from missing
submatches to segfaults.

* src/regex.c (make_transitions): Bugfix, if a transition between
two states was already handled then the code aborted the loop when
it should have just skipped to the next iteration.  The result was
that sometimes some transitions were not added to the NFA and
matches were not found.

* src/regex.c (parse_bracket_items): Bugfix, referred one or two
characters past the end of the string in several places.  E.g.
compiling the regex "[a-" could cause a segfault.

* src/regex.c (fill_pmatch): Bugfix.  If the marker boundary tag
number is bigger than tnfa->num_tags, the marker boundary tag
does not exist (or rather it does, but is the same as the match
end point).  The code here still used marker_boundary even if it
was bigger than num_tags, causing either segfaults, missing
submatches, or no symptoms.

* src/regex.c (tnfa_execute): Bugfix.  When matches were found,
the first tag value was checked to be smaller than for the
previous match.  Firstly, this was a redundant and useless check.
Secondly, it caused a segfault if REG_NOSUB was used when
compiling the regexp since there are no tags in that case and the
array is NULL.
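
For reference, the calling pattern that exercises the REG_NOSUB path
looks roughly like this.  The sketch assumes the system <regex.h>
and uses only standard POSIX calls; `has_match' is a hypothetical
helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 on a match, 0 on no match, and -1 if the pattern fails
   to compile. */
static int
has_match(const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert(pattern != NULL && text != NULL);
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  /* With REG_NOSUB the nmatch and pmatch arguments are ignored, so
     0 and NULL are passed; no submatch data is available. */
  rc = regexec(&re, text, 0, NULL, 0);
  regfree(&re);
  return rc == 0;
}
```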

* tests/retest.c: Added tests for all of the above.  Thanks to
Glenn Fowler for running into the bugs and providing the test

* Released libtre-0.3.2.

Wed Mar 27 21:48:48 2002  Ville Laurikari  <vl@iki.fi>

* src/regex.c: Added support for new zero-width assertions \b, \B,
\<, and \>.  Fixed a bug in ^ and $.

* src/regex.c (parse_bound): Bugfix, had forgotten to handle
boundaries of the form "{12,}" altogether.

* src/regex.c (ast_add_tags): Bugfix, set the direction of the
current tag to MAXIMIZE at ADDTAGS_POST_CATENATION, but should not

* src/regex.c (parse_re): A `)' is now interpreted as an ordinary
character in the absence of a matched `('.

* src/regex.c (regwncomp): Bugfix, did not set preg->re_nsub to
the number of parenthesized subexpressions.

* tests/retest.c: Added tests for all of the above.

* src/regex.c: Fixed to be completely thread safe.  A single
compiled regexp can now be used simultaneously in several
contexts, e.g. in main() and a signal handler, or multiple

Wed Mar 20 19:50:37 2002  Ville Laurikari  <vl@iki.fi>

* src/regex.c: Added support for Perl-compatible syntax
extensions: \t, \n, \r, \f, \a, \e, \w, \W, \s, \S, \d, \D.

* src/regex.c: Now expands character classes when using 8 bit
character sets so that iswctype() calls are avoided during

Sun Mar  3 17:50:14 2002  Ville Laurikari  <vl@iki.fi>

* macros/*: Updated all macros (they were renamed from AC_* to

* macros/vl_check_sign.m4, macros/vl_decl_wchar_max.m4: Added.

* src/regex.c: Memory management cleanups.  Many of the small
memory blocks, like AST nodes, are now allocated in large blocks
instead of one by one using the `tre_mem_t' allocator.  This got
rid of hundreds of lines of confusing memory management code.

Sat Feb 16 23:47:01 2002  Ville Laurikari  <vl@iki.fi>

* macros/ac_prog_cc_warnings.m4: Updated to version 1.3.

* macros/ac_decl_wchar_max.m4: Added this macro for checking
whether WCHAR_MAX is defined, and defining it if it isn't.

* configure.in: Added some checks for wide character stuff.

* src/regex.c: Added `tre_' prefix to all local type names to
avoid conflicts.

Mon Feb 11 21:48:03 2002  Ville Laurikari  <vl@iki.fi>

* src/regex.c (parse_bound, parse_re): Enabled support for
bound expressions.  The iterated atom is duplicated by parsing it
many times -- this seemed to be the simplest way to do it.

* src/Makefile.am: Changed library name from `libregx' to

* tests/retest.c: Added tests for bound expressions.

Sun Feb 10 18:47:23 2002  Ville Laurikari  <vl@iki.fi>

* src/regex.c (set_union): Bugfix, had `set1[s2].neg_classes' where
it should have been `set2[s2].neg_classes' (this caused crashes).

* src/regex.c (ast_to_efree_tnfa): Bugfix, didn't check for
infinite maximum iteration count before making transitions for

* src/regex.c: Added interfaces regncomp() and regwncomp() which
look only at the first `n' characters of the regexp pattern.  Null
characters are allowed in the regexps when using these functions.

Sat Feb  9 22:39:08 2002  Ville Laurikari  <vl@iki.fi>

* src/regex.c: Added wide character interface: regwcomp(),
regwexec(), and regwerror().  They work exactly like regcomp(),
regexec() and regerror() except that the strings are
`wchar_t *'.  Also added support for multibyte character sets.
Fixed a lot of bugs (memory leaks, crashes) here and there.

* tests/retest.c: Added tests for multibyte character sets and
regcomp() error reporting.

* tests/randtest.c: Makes random strings and tries to compile them
with regcomp().  This can be used to find memory leaks and crashes
in the regexp compiler.
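
The idea can be sketched in a few lines.  This is not the code of
tests/randtest.c: the alphabet, the length limit and the function name
`random_compile_test' are assumptions, and the system <regex.h> is
used.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Compiles `rounds' random patterns of 1..max_len characters and
   returns how many of them compiled successfully.  Run it under a
   memory checker to look for leaks and crashes. */
static int
random_compile_test(unsigned seed, int rounds, int max_len)
{
  /* Mostly regexp metacharacters, to hit the parser's corner cases. */
  static const char alphabet[] = "ab()[]{}|*+?^$\\.,-1";
  char pattern[64];
  int i, j, ok = 0;

  assert(max_len > 0 && max_len < (int)sizeof pattern);
  srand(seed);
  for (i = 0; i < rounds; i++)
    {
      regex_t re;
      int len = 1 + rand() % max_len;

      for (j = 0; j < len; j++)
        pattern[j] = alphabet[rand() % (int)(sizeof alphabet - 1)];
      pattern[len] = '\0';
      /* regfree() is only valid after a successful regcomp(). */
      if (regcomp(&re, pattern, REG_EXTENDED) == 0)
        {
          ok++;
          regfree(&re);
        }
    }
  return ok;
}
```

A fixed seed keeps a failing run reproducible.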

Sun Jan 27 21:42:06 2002  Ville Laurikari  <vl@iki.fi>

* src/regex.c: Added support for bracket expressions
(e.g. "[abc]").  Multicharacter collating elements are not
supported, neither are equivalence classes.

* test: Renamed this directory to `tests'.

Sun Dec  2 20:20:12 2001  Ville Laurikari  <vl@iki.fi>

* First public release.