Recursive Queries in SQL Server

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Rabbit
    Recognized Expert MVP
    • Jan 2007
    • 12517

    Recursive Queries in SQL Server

    Introduction
    Starting with SQL Server 2005, you have the ability to create recursive queries using common table expressions (CTE). They are very powerful tools that can be used to query hierarchical data where you don't know beforehand how many times you have to join back to the same table. This is probably the most common use. But they can also be used to do all sorts of things, including but not limited to: Creating n number of rows based on a quantity field, extracting multiple matching substrings out of a field, creating permutations/combinations from a set, or taking date ranges from a row and breaking it up into multiple rows of smaller ranges.

    The Basic Structure of a Recursive CTE
    This is the basic structure of a CTE:
    Code:
    WITH cte AS ( 
    	Base Query
    
    	UNION ALL
    
    	Recursive Query call back to cte
    	WHERE termination check
    )
    
    SELECT * FROM cte
    A recursive CTE is made up of 3 key components.
    1. A base query that returns the initial rows to use in recursion.
    2. A query that has a call back to the CTE itself.
    3. A clause that eventually results in an empty result set so the recursion can terminate.


    #3 is absolutely key. At some point the recursion needs to end so the results can be returned. Otherwise, it will error out when it hits the max recursion set in SQL Server or will run infinitely until terminated if max recursion is set to 0. You can set the max recursion option with OPTION (MAXRECURSION #), where # is a number between 0 and 32767. The default, when the option is not specified, is 100.

    All examples below were created and tested on SQL Server 2008 R2.

    Ex. 1: Create Additional Rows from a Quantity Field

    This example takes a row and repeats the data as many times as stated by the quantity.

    Sample Data:
    Code:
    item	quantity
    a	1
    b	2
    c	3
    d	4
    e	5
    Query:
    Code:
    DECLARE @t TABLE (item char(1), quantity int)
    INSERT INTO @t VALUES ('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)
    
    ; WITH cte AS (
    	SELECT item, 
    		quantity - 1 AS quantityLeft
    	FROM @t
    	
    	UNION ALL
    	
    	SELECT item, 
    		quantityLeft - 1 AS quantityLeft
    	FROM 
    		cte
    	-- termination clause
    	WHERE quantityLeft > 0 
    )
    
    SELECT * FROM cte ORDER BY item;
    Results:
    Code:
    item	quantityLeft
    a	0
    b	1
    b	0
    c	1
    c	0
    c	2
    d	3
    d	2
    d	1
    d	0
    e	4
    e	3
    e	2
    e	1
    e	0
    Ex. 2: Creating Permutations/Combinations

    This example takes the set of data and creates all the possible combinations and permutations of the rows. Be careful with this one as the number of possibilities grows exponentially.

    Sample Data:
    Code:
    item
    a
    b
    c
    d
    e
    Query:
    Code:
    DECLARE @t TABLE (item CHAR(1))
    INSERT INTO @t VALUES ('a'), ('b'), ('c'), ('d'), ('e')
    DECLARE @maxLen INT
    SET @maxLen = (SELECT COUNT(*) FROM @t)
    
    -- Combination, order does not matter
    ; WITH cte AS (
    	SELECT item, 
    		CONVERT(VARCHAR(255), item) AS combined
    	FROM @t
    	
    	UNION ALL
    	
    	SELECT t.item,
    		CONVERT(VARCHAR(255), cte.combined + t.item) AS combined
    	FROM 
    		cte
    		INNER JOIN @t t
    		ON cte.item < t.item
    	WHERE LEN(cte.combined + t.item) <= @maxLen
    )
    
    SELECT combined FROM cte ORDER BY LEN(combined), combined;
    
    
    -- Permutation, order matters
    ; WITH cte AS (
    	SELECT item, 
    		CONVERT(VARCHAR(255), item) AS combined
    	FROM @t
    	
    	UNION ALL
    	
    	SELECT t.item,
    		CONVERT(VARCHAR(255), cte.combined + t.item) AS combined
    	FROM 
    		cte
    		INNER JOIN @t t
    		ON cte.combined NOT LIKE '%' + t.item + '%'
    	WHERE LEN(cte.combined + t.item) <= @maxLen
    )
    
    SELECT combined FROM cte ORDER BY LEN(combined), combined;
    Combination Results:
    Code:
    combined
    a
    b
    c
    d
    e
    ab
    ac
    ad
    ae
    bc
    bd
    be
    cd
    ce
    de
    abc
    abd
    abe
    acd
    ace
    ade
    bcd
    bce
    bde
    cde
    abcd
    abce
    abde
    acde
    bcde
    abcde
    Permutation Results: Too many rows to put in the article.

    Ex. 3: Extracting Multiple PDF File Names from a Field

    This example substrings out an unknown number of PDF file names from a larger string in a field where each PDF file name ends with .pdf and begins at the first > symbol right before the .pdf and everything else is superfluous.

    Sample Data:
    Code:
    IDField	fieldName
    1	>>>>>>1.pdf test> >b>c>xyz.pdf bob >hello world.pdf foo >womp womp.pdf>
    2	>2.pdf other unnecssary stuff > bar.pdf
    Query:
    The lines that are commented out are the building blocks of finding and substringing out the PDF file name.
    Code:
    DECLARE @f TABLE (fieldName VARCHAR(255), IDField int)
    INSERT INTO @f VALUES('>>>>>>1.pdf test> >b>c>xyz.pdf bob >hello world.pdf foo >womp womp.pdf>', 1)
    INSERT INTO @f VALUES('>2.pdf other unnecssary stuff > bar.pdf', 2)
    
    ; WITH cte2 AS (
    	SELECT 
    		IDField,
    		--PATINDEX('%.pdf%', fieldName) + 3 AS PDFLocation,
    		--SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3)) AS PDFSubstring,
    		--REVERSE(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3))) AS PDFSubstringReverse,
    		--PATINDEX('%>%', REVERSE(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3)))) AS ReverseSymbolLocationBeforePDF,
    		--LEN(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3))) - PATINDEX('%>%', REVERSE(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3)))) + 2 AS SymbolLocationBeforePDF,
    		
    		CONVERT(VARCHAR(255), SUBSTRING(fieldName, 
    			LEN(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3))) - PATINDEX('%>%', REVERSE(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3)))) + 2, 
    			PATINDEX('%.pdf%', fieldName) + 3 - (LEN(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3))) - PATINDEX('%>%', REVERSE(SUBSTRING(fieldName, 1, (PATINDEX('%.pdf%', fieldName) + 3)))) + 2) + 1
    		)) AS PDFName,
    		
    		CONVERT(VARCHAR(255), STUFF(fieldName, 1, PATINDEX('%.pdf%', fieldName) + 3, '')) AS strWhatsLeft
    	FROM @f
    		
    	UNION ALL
    	
    	SELECT 
    		IDField,
    		--PATINDEX('%.pdf%', strWhatsLeft) + 3 AS PDFLocation,
    		--SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3)) AS PDFSubstring,
    		--REVERSE(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3))) AS PDFSubstringReverse,
    		--PATINDEX('%>%', REVERSE(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3)))) AS ReverseSymbolLocationBeforePDF,
    		--LEN(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3))) - PATINDEX('%>%', REVERSE(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3)))) + 2 AS SymbolLocationBeforePDF,
    		
    		CONVERT(VARCHAR(255), SUBSTRING(strWhatsLeft, 
    			LEN(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3))) - PATINDEX('%>%', REVERSE(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3)))) + 2, 
    			PATINDEX('%.pdf%', strWhatsLeft) + 3 - (LEN(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3))) - PATINDEX('%>%', REVERSE(SUBSTRING(strWhatsLeft, 1, (PATINDEX('%.pdf%', strWhatsLeft) + 3)))) + 2) + 1
    		)) AS PDFName,
    		
    		CONVERT(VARCHAR(255), STUFF(strWhatsLeft, 1, PATINDEX('%.pdf%', strWhatsLeft) + 3, '')) AS strWhatsLeft
    	FROM cte2
    	WHERE strWhatsLeft LIKE '%.pdf%'
    )
    
    SELECT * FROM cte2 ORDER BY IDField
    Results:
    Code:
    IDField	PDFName	strWhatsLeft
    1	1.pdf	 test> >b>c>xyz.pdf bob >hello world.pdf foo >womp womp.pdf>
    1	xyz.pdf	 bob >hello world.pdf foo >womp womp.pdf>
    1	hello world.pdf	 foo >womp womp.pdf>
    1	womp womp.pdf	>
    2	2.pdf	 other unnecssary stuff > bar.pdf
    2	 bar.pdf
    Ex. 4: Splitting a Date Range into Multiple Smaller Ranges

    This example takes rows that define a start and end date and creates multiple rows with a max interval of 3 days.

    Sample Data:
    Code:
    TimespanID	StartDate	EndDate
    1	2015-01-01	2015-01-02
    2	2015-01-05	2015-02-11
    Query:
    Code:
    DECLARE @t TABLE (TimespanID int, StartDate DATE, EndDate DATE)
    INSERT INTO @t VALUES (1,'1/1/2015','1/2/2015'), (2, '1/5/2015', '2/11/2015')
    
    ; WITH cte AS (
    	SELECT 
    		TimespanID,
    		StartDate,
    		CASE WHEN DATEADD(DAY, 2, StartDate) > EndDate
    			THEN EndDate
    			ELSE DATEADD(DAY, 2, StartDate)
    		END AS EndDate,
    		EndDate AS OriginalEndDate
    	FROM @t
    		
    	UNION ALL
    	
    	SELECT 
    		TimespanID,
    		DATEADD(DAY, 1, EndDate) AS StartDate,
    		CASE WHEN DATEADD(DAY, 3, EndDate) > OriginalEndDate
    			THEN OriginalEndDate
    			ELSE DATEADD(DAY, 3, EndDate)
    		END AS EndDate,
    		OriginalEndDate AS OriginalEndDate
    	FROM cte
    	WHERE DATEADD(DAY, 1, EndDate) <= OriginalEndDate
    )
    
    SELECT * FROM cte ORDER BY TimespanID
    Results:
    Code:
    TimespanID	StartDate	EndDate	OriginalEndDate
    1	2015-01-01	2015-01-02	2015-01-02
    2	2015-01-05	2015-01-07	2015-02-11
    2	2015-01-08	2015-01-10	2015-02-11
    2	2015-01-11	2015-01-13	2015-02-11
    2	2015-01-14	2015-01-16	2015-02-11
    2	2015-01-17	2015-01-19	2015-02-11
    2	2015-01-20	2015-01-22	2015-02-11
    2	2015-01-23	2015-01-25	2015-02-11
    2	2015-01-26	2015-01-28	2015-02-11
    2	2015-01-29	2015-01-31	2015-02-11
    2	2015-02-01	2015-02-03	2015-02-11
    2	2015-02-04	2015-02-06	2015-02-11
    2	2015-02-07	2015-02-09	2015-02-11
    2	2015-02-10	2015-02-11	2015-02-11
    Ex. 5: Retrieving Hierarchy from a Relationship Table

    This example takes a hierarchy stored in a table where the ParentID is the only link that can be used to establish hierarchy level. Normally this is a problem because you never know what level the employee is at and how many times you have to join back to the table to retrieve the full hierarchy.

    However, this is trivial for a recursive CTE because it will continually self join until the full hierarchy is established.

    Note that this example differs from the other examples in that there is no WHERE clause termination check. Instead, the termination happens when the lowest level of the hierarchy is reached meaning there is no lower level employee that can be joined to because no one reports to them. This also means that if the data is bad, there exists the possibility of an infinite loop in the hierarchy.

    Sample Data:
    Code:
    PK	EmployeeName	ParentID
    1	Mr. CEO	0
    2	Andy	1
    3	Billy Jean	1
    4	Charles	2
    5	Danni	2
    6	Eden	2
    7	Frank	3
    8	Geri	5
    Query:
    Code:
    DECLARE @t TABLE (PK INT, EmployeeName VARCHAR(10), ParentID INT);
    
    INSERT INTO @t VALUES
    	(1, 'Mr. CEO', 0),
    	(2, 'Andy',1 ),
    	(3, 'Billy Jean', 1),
    	(4, 'Charles', 2),
    	(5, 'Danni', 2),
    	(6, 'Eden', 2),
    	(7, 'Frank', 3),
    	(8, 'Geri', 5)
    ;
    
    ; WITH cte AS (
    	SELECT
    		PK,
    		EmployeeName,
    		1 AS HierarchyLevel,
    		CONVERT(VARCHAR(255), PK) AS HierarchySortString
    	FROM @t
    	WHERE ParentID = 0
    	
    	UNION ALL
    	
    	SELECT
    		t.PK,
    		t.EmployeeName,
    		cte.HierarchyLevel + 1 AS HierarchyLevel,
    		CONVERT(VARCHAR(255), HierarchySortString + ',' + CONVERT(VARCHAR(255), t.PK)) AS HierarchySortString
    	FROM @t AS t
    		INNER JOIN cte
    		ON t.ParentID = cte.PK
    )
    
    SELECT
    	PK,
    	REPLICATE('+', HierarchyLevel - 1) + EmployeeName AS EmployeeName,
    	HierarchyLevel
    FROM cte
    ORDER BY
    	HierarchySortString
    Results:
    Code:
    PK	EmployeeName	HierarchyLevel
    1	Mr. CEO	1
    2	+Andy	2
    4	++Charles	3
    5	++Danni	3
    8	+++Geri	4
    6	++Eden	3
    3	+Billy Jean	2
    7	++Frank	3
  • Canes816
    New Member
    • Nov 2006
    • 3

    #2
    From the permutations example you posted, it looks like you are using the string length to exit the recursion. In the data that I'm using, I'm trying to get all possible related values, but the relationship could be 1 or 2 (or multiple) levels away.

    For example, how could I get the output below, from the table defined here?

    Declare @example table (RowID varchar(10), RelID varchar(10))
    Insert Into @example Select 'Rec1', 'Rec2'
    Insert Into @example Select 'Rec2', 'Rec3'
    Insert Into @example Select 'Rec3', 'Rec4'
    Insert Into @example Select 'Rec4', 'Rec41'
    Insert Into @example Select 'Rec4', 'Rec42'
    Insert Into @example Select 'Rec4', 'Rec43'
    Insert Into @example Select 'Rec4', 'Rec44'

    The output I want to get is the following (along with any other possible "connection s"):
    Rec1-Rec2
    Rec1-Rec3
    Rec1-Rec4
    Rec1-Rec41
    Rec1-Rec42
    Rec1-Rec43
    Rec1-Rec44

    Comment

    Working...