Oracle正则表达式实战

2023-09-12 15:40编辑: www.jxszl.com景先生毕设

http://oracle-base.com/articles/misc/regular-expressions-support-in-oracle.php

PL/SQL Enhancements in Oracle Database 10g - Regular Expressions
PL/SQL New Features and Enhancements in Oracle Database 11g Release 1 - Enhancements to Regular Expression Built-in SQL Functions

Oracle 10g introduced support for regular expressions in SQL andPL/SQL with the following functions.

REGEXP_INSTR - Similar to INSTR except it uses a regular expression rather than a literal as the search string. 类似INSTR函数
REGEXP_LIKE - Similar to LIKE except it uses a regular expression as the search string. REGEXP_LIKE is really an operator, not a function. 类似LIKE条件
REGEXP_REPLACE - Similar to REPLACE except it uses a regular expression as the search string. 类似REPLACE函数
REGEXP_SUBSTR - Returns the string matching the regular expression. Not really similar to SUBSTR. 返回匹配正则表达式的字符串，和SUBSTR有点类似

11g开始引入2个新特性：

Learning to write regular expressions takes a little time. If youdon't do it regularly, it can be a voyage of discovery each time. The generalrules for writing regular expressions are available here.You can read the Oracle Regular Expression Support here.
此处不重复正则表达式的定义，代之以一组问题导向的正则表达式应用实例：

The data in a column is free text, but may include a 4 digit year.
DROP TABLE t1;
```
  data VARCHAR2(50)
```
```
 
```
```
INSERT INTO t1 VALUES ('2014 CODE-B');
```
```
INSERT INTO t1 VALUES ('ADSHLHSALK');
```
```
INSERT INTO t1 VALUES ('FALL 2015');
```
```
 
```
```
 
```
```
---------------------------------------------------------------------
```
```
2014 CODE-B
```
```
ADSHLHSALK
```
```
 
```
```
 
```
```
	If we needed to return rows containing a specific year we coulduse the LIKE operator (WHERE data LIKE '%2014%'),but how do we return rows using a comparison (<, <=, >, >=,<>)?
```
如果我们需要返回包含指定年份的数据我们可以使用LIKE操作符（…），但是如何通过不等操作符返回行？一条路是抽出4个数字的年份并转换为数字。通过正则表达式可以很容易实现。

我们使用d或者[0-9]来识别数字。我们需要4个一组，可以使用{4}表示。至此，我们的正则表达式为：d{4}或者[0-9]{4}。REGEXP_SUBSTR函数返回匹配指定正式表达式的字符串，所以可以用来提取我们感兴趣的文本。然后我们只需将其转换为数字并执行比较即可。
```
FROM   t1
```
```
 
```
```
---------------------------------------------------------------------
```
```
2014 CODE-B
```
```
FALL 2015
```
```
4 rows selected.
```
```
SQL>
```
Given a source string, how do we split it up into separatecolumns, based on changes of case and alpha-to-numeric, such that this.
```
	Becomes this. 分割后：
```
Art ADB 1234567 e 9876540
```
CREATE TABLE t1 (
```
```
);
```
```
INSERT INTO t1 VALUES ('ArtADB1234567e9876540');
```
```
	 
```
字符串第一部分为大写字母，可能为A-Z。我们使用[]操作符识别单个字符，至于范围则用“-”，例如“A-Z”，"a-z"或"0-9"。所以如果我们需要找大写的首字母则用“[A-Z]”。其后紧跟着的是若干小写字母，可以用+表示若干（1个或多个）。组合起来的正则表达式即为：[A-Z][a-z]+，这样拆分出的第一列方法有了。

REGEXP_SUBSTR(data, '[A-Z][a-z]+', 1, 1)

第二部分是一组包含1个或多个大写字母。我们知道需要用模式：[A-Z]+,但是为了不和第一部分冲突，我们指明匹配其第2次出现的文本。

REGEXP_SUBSTR(data, '[A-Z]+', 1, 2)

下一部分是一组纯数字。

REGEXP_SUBSTR(data, '[0-9]+', 1, 1)

下一部分是一组小写字母，同样考虑了不和第一部分冲突：

REGEXP_SUBSTR(data, '[a-z]+', 1, 2)

最后，是一组数字：

REGEXP_SUBSTR(data, '[0-9]+', 1, 2)

将以上每一部分正则表达式的输出分别作为独立字段：
```
COLUMN col2 FORMAT A15
```
```
COLUMN col4 FORMAT A15
```
```
 
```
```
       REGEXP_SUBSTR(data, '[A-Z]+', 1, 2) col2,
```
```
       REGEXP_SUBSTR(data, '[a-z]+', 1, 2) col4,
```
```
FROM   t1;
```
```
COL1          COL2          COL3            COL4        COL5
```
```
Art           ADB          1234567         e             9876540
```
```
1 row selected.
```
```
SQL>
```
We need to pull out a group of characters from a "/"delimited string, optionally enclosed by double quotes. The data looks likethis.
DROP TABLE t1;
```
  data VARCHAR2(50)
```
```
 
```
```
INSERT INTO t1 VALUES ('97/95508/BANANA/10-FEB-12');
```
```
COMMIT;
```
We are looking for 1 or more characters that are not"/", which we do using "[^/]+". The "^" in thebrackets represents NOT and "+" means 1 or more. We also want toremove optional double quotes, so we add that as a character we don't want,giving us "[^/"]+". So if we want the data from the thirdcolumn, we need the third occurrence of this pattern.
SELECT REGEXP_SUBSTR(data, '[^/"]+', 1, 3) AS element3
```
 
```
```
---------------------------------------------------------------------
```
```
BANANA
```
```
 
```
```
 
```
```
	Example 4 : REGEXP_REPLACE
```
我们需要提取首字母大写的字符串并将其分离。原始数据如下：
```
CREATE TABLE t1 (
```
```
);
```
```
INSERT INTO t1 VALUES ('SocialSecurityNumber');
```
```
COMMIT;
```
We need to find each uppercase character "[A-Z]". Wewant to keep that character we find, so we will make that pattern asub-expression "([A-Z])", allowing us to refer to it later. For eachmatch, we want to replace it with a space, plus the matching character. Thespace is pretty obvious, but we need to use "1" to signify the textmatching the first sub expression. So we will replace the matching pattern witha space and itself, " 1". We don't want to replace the first letterof the string, so we will start at the second occurrence.
SELECT REGEXP_REPLACE(data, '([A-Z])', ' 1', 2) AS hyphen_text
```
  
```
```
--------------------------------------------------------------------
```
```
House Number
```
```
2 rows selected.
```
```
SQL>
```
We have a specific pattern of digits (9 99:99:99) and we want toknow the location of the pattern in our data.
DROP TABLE t1;
```
  data VARCHAR2(50)
```
```
 
```
```
INSERT INTO t1 VALUES ('.2 02:02:02');
```
```
COMMIT;
```
We know we are looking for groups of numbers, so we can use"[0-9]" or "d". We know the amount of digits in eachgroup, which we can indicate using the "{n}" operator, so we simplydescribe the pattern we are looking for.
SELECT REGEXP_INSTR(data, '[0-9] [0-9]{2}:[0-9]{2}:[0-9]{2}') AS string_loc_1,
```
FROM   t1;
```
```
STRING_LOC_1 STRING_LOC_2
```
```
           1            1
```
```
           3            3
```
```
3 rows selected.
```
```
SQL>
```
We have strings containing parentheses. We want to return the textwithin the parentheses for those rows that contain parentheses.
DROP TABLE t1;
```
  data VARCHAR2(50)
```
```
 
```
```
INSERT INTO t1 VALUES ('This text has no parentheses.');
```
```
COMMIT;
```
The basic pattern for text between parentheses is"∗". The "" characters are escapes for theparentheses, making them literals. Without the escapes they would be assumed todefine a sub-expression. That pattern alone is fine to identify the rows of interestusing a REGEXP_LIKE operator,but it is not appropriate in a REGEXP_SUBSTR, as itwould return the parentheses also. To omit the parentheses we need to include asub-expression inside the literal parentheses ".)". We can then REGEXP_SUBSTR using thefirst sub expression.

..)".
```
COLUMN without_parentheses FORMAT A20
```
```
SELECT data,
```
```
(data, '((.*))', 1, 1, 'i', 1) AS without_parentheses
```
```
WHERE  REGEXP_LIKE(data, '(.*)');
```
```
DATA                                               WITH_PARENTHESES     WITHOUT_PARENTHESES
```
```
This is some text (with parentheses) in it.        (with parentheses)   
This text has (parentheses too).                   (parentheses too)    
 
```
```
 
```
```
	 
```
REGEXP_SUBSTR(data,'.)', 1, 1, 'i', 1) 中最后的i代码不区分大小写，最后1个“1”代表返回哪个子表达式匹配的文本。（范围0-9）

We need to know how many times a block of 4 digits appears intext. The data looks like this.
DROP TABLE t1;
```
  data VARCHAR2(50)
```
```
 
```
```
INSERT INTO t1 VALUES ('1234 1234');
```
```
COMMIT;
```
We can identify digits using "d" or "[0-9]"and the "{4}" operator signifies 4 of them, so using"d{4}" or "[0-9]{4}" with the REGEXP_COUNT functionseems to be a valid option.
SELECT REGEXP_COUNT(data, '[0-9]{4}') AS pattern_count_1,
```
FROM   t1;
```
```
PATTERN_COUNT_1 PATTERN_COUNT_2
```
```
              1               1
```
```
              3               3
```
```
3 rows selected.
```
```
SQL>
```
We need to identify invalid email addresses. The data looks likethis.
DROP TABLE t1;
```
  data VARCHAR2(50)
```
```
 
```
```
INSERT INTO t1 VALUES ('me@example');
```
```
INSERT INTO t1 VALUES ('me.me@example.com');
```
```
INSERT INTO t1 VALUES ('me.me@example-example.com');
```
```
	 
```
下列测试给我们近似不合法的邮箱。
```
FROM   t1
```
```
 
```
```
--------------------------------------------------
```
```
@example.com
```
```
 
```
<pre style="\"white-space:" pre-wrap;="" word-wrap:="" break-word;="" background:="" rgb(242,="" 241,="" 242);\"=""> 3 rows selected.
http://www.itemperor.com/a/SQL/69.html

原文链接：http://www.jxszl.com/biancheng/shujuku/445461.html

"景先生毕设|www.jxszl.com

Oracle正则表达式实战

Oracle 10g introduced support for regular expressions in SQL andPL/SQL with the following functions.

The data in a column is free text, but may include a 4 digit year.

Given a source string, how do we split it up into separatecolumns, based on changes of case and alpha-to-numeric, such that this.

We need to pull out a group of characters from a "/"delimited string, optionally enclosed by double quotes. The data looks likethis.

We have a specific pattern of digits (9 99:99:99) and we want toknow the location of the pattern in our data.

We have strings containing parentheses. We want to return the textwithin the parentheses for those rows that contain parentheses.

We need to know how many times a block of 4 digits appears intext. The data looks like this.

We need to identify invalid email addresses. The data looks likethis.

查看完整版论文请

扫码加QQ

扫码加微信

在线客服

[QQ:351916072]