
Wonky STRING_MATCHES_REGEXP behaviour - is there a bug reporting thread or am I stupid?

polymorphedsquirrel Member Posts: 114
edited July 2019 in General Modding
As far as I understand from the docs, STRING_MATCHES_REGEXP should always match the whole string; therefore
~%string%~ STRING_MATCHES_REGEXP ~%regexp%~
and
~%string%~ STRING_MATCHES_REGEXP ~^%regexp%$~
should always be equivalent, unless ~%regexp%~ ends with an odd number of '\' characters (the last one would then escape the appended '$').
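A quick way to see that backslash caveat, using Python's `re` module as a stand-in for WeiDU's regexp engine (an assumption: the two differ in syntax, but both treat `\$` as a literal dollar sign). The helper `anchor` is hypothetical and simply builds `^%regexp%$`:

```python
import re

def anchor(rx: str) -> str:
    """Hypothetical helper: wrap a pattern the way ~^%regexp%$~ does."""
    return "^" + rx + "$"

# Pattern ending in an even number of backslashes (two).
even = "a" + "\\\\"      # regex text: a\\  -> 'a' followed by a literal backslash
# Pattern ending in an odd number of backslashes (three).
odd = "a" + "\\\\\\"     # regex text: a\\\ -> incomplete on its own

# Even count: the appended '$' is still an end-of-string anchor.
print(bool(re.search(anchor(even), "a\\")))   # True
# Odd count: the last backslash escapes '$', which becomes a literal '$'.
print(bool(re.search(anchor(odd), "a\\")))    # False
print(bool(re.search(anchor(odd), "a\\$")))   # True
```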

However, I have:

BEGIN ~gist~
INSTALL_BY_DEFAULT
	ACTION_IF ~~ STRING_MATCHES_REGEXP ~^ $~ THEN BEGIN
		PRINT ~facepalm~
	END
	
/** Space, tab and newline, plus a carriage return before the newline if the file has Windows line endings. */
OUTER_TEXT_SPRINT WHITESPACE_CHARS ~ 		
~
/** Regexp for a sequence of ASCII white space characters. */
OUTER_TEXT_SPRINT WHITESPACE_RX ~\([%WHITESPACE_CHARS%]*\)~
/** Regexp for a sequence of characters which are not ASCII white space. */
OUTER_TEXT_SPRINT NOT_WHITESPACE_RX ~\([^%WHITESPACE_CHARS%]*\)~

/** Regexp matching any sequence of '\' terminated by a character other than '"' and '\',
  * or a character other than '\' followed by an odd number of '\' and a '"'. */
OUTER_TEXT_SPRINT JSON_STRING_ATOM_RX ~\(\\*[^\"]\|\([^\]\\\(\\\\\)*"\)\)~
/** Any string surrounded by a pair of '"' characters, in which all '"' are escaped with a '\'.*/
OUTER_TEXT_SPRINT JSON_STRING_RX ~"\(\(\\\(\\\\\)*"\)?\(%JSON_STRING_ATOM_RX%*\)\(\(\\\\\)*\)\)"~ 
	

DEFINE_ACTION_FUNCTION is_string
	STR_VAR json = ~~
	RET res
	BEGIN
		OUTER_SET res = 1
		ACTION_IF ~%json%~ STRING_MATCHES_REGEXP ~^%WHITESPACE_RX%%JSON_STRING_RX%%WHITESPACE_RX%$~ THEN BEGIN
			OUTER_SET res = 0
		END
	END

DEFINE_ACTION_FUNCTION is_string2
	STR_VAR json = ~~
	RET res
	BEGIN
		OUTER_SET res = 1
		ACTION_IF ~%json%~ STRING_MATCHES_REGEXP ~%WHITESPACE_RX%%JSON_STRING_RX%%WHITESPACE_RX%~ THEN BEGIN
			OUTER_SET res = 0
		END
	END

LAF is_string STR_VAR json = ~"""~ RET res END 
PRINT ~%res%~
LAF is_string2 STR_VAR json = ~"""~ RET res END
PRINT ~%res%~
Even worse, the first check (~~ STRING_MATCHES_REGEXP ~^ $~) is true.
The script prints:
facepalm
0
1

What should I do, other than go through all my WeiDU code and add those '^' and '$' to make it work?
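A minimal Python sketch that reproduces the output above, under two assumptions drawn from the replies below: the operator yields 0 on a match and 1 otherwise, and the pattern is anchored only at the start of the string (modeled with `re.match`). The function names mirror the post, and the regexps are hand-translated from OCaml Str syntax (`\(`, `\)`, `\|`) to Python syntax, so this approximates WeiDU rather than reproducing its implementation.

```python
import re

# Assumption 1: matching is anchored at the start only (re.match, not re.fullmatch).
# Assumption 2: the operator yields 0 when the regexp matches and 1 when it does not.
def string_matches_regexp(s: str, rx: str) -> int:
    return 0 if re.match(rx, s) else 1

# Python translations of the post's regexps.
WHITESPACE_RX = r"([ \t\r\n]*)"
JSON_STRING_ATOM_RX = r'(\\*[^\\"]|([^\\]\\(\\\\)*"))'
JSON_STRING_RX = r'"((\\(\\\\)*")?(' + JSON_STRING_ATOM_RX + r'*)((\\\\)*))"'

def is_string(json: str) -> int:
    res = 1
    if string_matches_regexp(json, "^" + WHITESPACE_RX + JSON_STRING_RX + WHITESPACE_RX + "$"):
        res = 0
    return res

def is_string2(json: str) -> int:
    res = 1
    if string_matches_regexp(json, WHITESPACE_RX + JSON_STRING_RX + WHITESPACE_RX):
        res = 0
    return res

if string_matches_regexp("", "^ $"):
    print("facepalm")
print(is_string('"""'))   # 0: no full match, the operator yields 1, so res is set to 0
print(is_string2('"""'))  # 1: the prefix '""' matches, the operator yields 0
```

Under this model, `is_string` returns 1 for an actual JSON string such as `"\""`, which is the inverted sense the post ran into.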

Comments

  • Tresset Member, Moderator Posts: 8,264
    @polymorphedsquirrel Your thread was caught by the forum's automated spam filter. I have restored it and verified you so that this should not happen again.
  • kjeron Member Posts: 2,367
    @polymorphedsquirrel Your code looks ... mixed up?
    Both function calls are for "is_string"; "is_string2" isn't referenced.
    Neither function has the return variable "res".
    WHITESPACE_RX isn't defined.
    If I correct these (at least to what I think was intended), I think it works correctly, at least on the short sample strings I tried.

    json = ~"""~ returns no match for both (no whitespace before and after)
    json = ~ """ ~ returns no match for both (interior " isn't escaped)
    json = ~ "\"" ~ returns match for both
    json = ~ "\"" \~ returns no match with ^$, match without (odd number of trailing \)
  • polymorphedsquirrel Member Posts: 114
    Sorry, that's what I get for trying to simplify on the fly. The updated post now contains functioning code, and the problem is repeatable for me.

    Even more worrying, I also found that empty strings match basically everything. It should be the other way round...
  • Ardanis Member Posts: 1,736
    edited July 2019
    This is the intended behavior, and it's been reported multiple times over more than ten years now. You can make another report/complaint/request here http://forums.pocketplane.net/index.php/board,44.0.html
    Otherwise, treat it as STRING_CONTAINS_REGEXP instead and use ^$ if you need to match the beginning/end of a line.
  • kjeron Member Posts: 2,367
    Even worse,
    	ACTION_IF ~~ STRING_MATCHES_REGEXP ~^ $~ THEN BEGIN
    		PRINT ~facepalm~
    	END
    
    Prints:
    facepalm
    The return value of STRING_MATCHES_REGEXP is "0/false" when it matches and "1/true" when it doesn't, so it prints "facepalm" because ~^ $~ doesn't match ~~.
    I think internally it's checking for a difference, so any difference = true, and no difference = false.

    I'm not sure if it's correct, but I usually end up having to combine the ^$ check with the whitespace, like this:
    ACTION_IF ~%json%~ STRING_MATCHES_REGEXP ~[%WHITESPACE_CHARS%^]*%JSON_STRING_RX%[%WHITESPACE_CHARS%$]*~ THEN BEGIN
    
  • polymorphedsquirrel Member Posts: 114
    @Ardanis, so you're saying STRING_MATCHES_REGEXP is basically the same as STRING_CONTAINS_REGEXP? And that it is intended?

    As for the empty string, that was evidently an oversight on my part. I am sure the original problem was valid, but the code has been refactored twice since then and I can't recall what it was.

    On an unrelated note: is it possible to simply add a new proficiency (as in a completely new weapon type) or is something hardcoded about it?
  • Ardanis Member Posts: 1,736
    edited July 2019
    so you're saying basically STRING_MATCHES_REGEXP is exactly the same as STRING_CONTAINS_REGEXP? And that it is intended?
    I think so? String matching has always been a counter-intuitive mess in WeiDU.
  • kjeron Member Posts: 2,367
    They do not work exactly the same, but STRING_MATCHES_REGEXP does have issues:
    ACTION_IF ~ab~ STRING_MATCHES_REGEXP ~a~ BEGIN	PRINT ~doesn't match a~		END
    ACTION_IF ~ab~ STRING_MATCHES_REGEXP ~^a~ BEGIN	PRINT ~doesn't match ^a~	END
    ACTION_IF ~ab~ STRING_MATCHES_REGEXP ~b~ BEGIN	PRINT ~doesn't match b~		END
    ACTION_IF ~ab~ STRING_MATCHES_REGEXP ~b$~ BEGIN	PRINT ~doesn't match b$~	END
    ACTION_IF ~ab~ STRING_CONTAINS_REGEXP ~a~ BEGIN	PRINT ~doesn't contain a~	END
    ACTION_IF ~ab~ STRING_CONTAINS_REGEXP ~b~ BEGIN	PRINT ~doesn't contain b~	END
    
    Only prints:
    doesn't match b
    doesn't match b$
    
    It appears to presume a leading ^ (if missing), but not the trailing $.
  • The user and all related content has been deleted.
  • polymorphedsquirrel Member Posts: 114
    @subtledoctor
    EEE... pretty sure about which part of the sentence? I must finally learn not to phrase questions in a 'this or that' form ><

    And thanks everyone for help!
  • The user and all related content has been deleted.
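kjeron's six-line test can be replayed in Python under the same reading of the thread: a hypothetical `string_matches_regexp` modeled with `re.match` (anchored at the start only) and a hypothetical `string_contains_regexp` modeled with `re.search`, both returning 0 on a match. The 0-on-match convention for CONTAINS is inferred from neither CONTAINS line printing. This is a model of the observed behaviour, not WeiDU's code.

```python
import re

# Assumed model: 0 on a match, 1 otherwise.
def string_matches_regexp(s: str, rx: str) -> int:
    return 0 if re.match(rx, s) else 1   # anchored at the start only

def string_contains_regexp(s: str, rx: str) -> int:
    return 0 if re.search(rx, s) else 1  # match anywhere

# kjeron's checks as (operator, pattern, message); a message prints when the result is 1.
checks = [
    (string_matches_regexp, "a", "doesn't match a"),
    (string_matches_regexp, "^a", "doesn't match ^a"),
    (string_matches_regexp, "b", "doesn't match b"),
    (string_matches_regexp, "b$", "doesn't match b$"),
    (string_contains_regexp, "a", "doesn't contain a"),
    (string_contains_regexp, "b", "doesn't contain b"),
]

printed = [msg for op, rx, msg in checks if op("ab", rx)]
for msg in printed:
    print(msg)
```

Only "doesn't match b" and "doesn't match b$" come out, which is consistent with a leading ^ being implied and the trailing $ not.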
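A sketch of Ardanis's workaround (a contains-style search with the pattern wrapped in `^...$`), again in Python with assumed semantics. One caveat worth checking: in OCaml Str, which WeiDU is assumed to use, `^` and `$` match at line boundaries, so the sketch enables `re.MULTILINE` to mimic that; on a multi-line string the wrapped pattern can then match a single line rather than the whole string. The pattern is wrapped in a group so that an alternation inside it stays between the anchors (in WeiDU syntax that would be `\(...\)`). Both helper names are hypothetical.

```python
import re

# Contains-style search with ^...$ added, using line anchors as Str is assumed to.
def contains_anchored(s: str, rx: str) -> bool:
    return re.search("^(" + rx + ")$", s, re.MULTILINE) is not None

# Strict alternative for comparison: the whole string must match.
def whole_string_matches(s: str, rx: str) -> bool:
    return re.fullmatch(rx, s) is not None

print(contains_anchored("ab", "a"))            # False: 'a' alone is not the whole line
print(contains_anchored("ab", "ab"))           # True
print(contains_anchored("junk\nab", "ab"))     # True: ^...$ matched the second line only
print(whole_string_matches("junk\nab", "ab"))  # False
```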
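On kjeron's `[%WHITESPACE_CHARS%^]*` pattern: inside a bracket expression, a `^` that is not the first character and a `$` are ordinary characters in both OCaml Str and Python's `re`, so they are matched literally rather than acting as anchors. A small Python check (the pattern and inputs are illustrative):

```python
import re

WHITESPACE_CHARS = " \t\r\n"

# '^' (not first in the class) and '$' are literal members of these classes.
bracket_rx = "[" + WHITESPACE_CHARS + "^]*x[" + WHITESPACE_CHARS + "$]*"

print(re.fullmatch(bracket_rx, " ^x$ ") is not None)  # True: '^' and '$' consumed as characters
print(re.match(bracket_rx, "x junk") is not None)     # True: '$' in the class does not anchor the end
```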