SELECT DISTINCT with JOIN

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Craig Hoskin

    SELECT DISTINCT with JOIN

    Hi everyone

    Have a problem I would areally appreciate help with.
    I have 3 tables in a standard format for a Bookshop, eg

    Products
    Categories
    Categories_Prod ucts

    the latter allowing me to have products in multiple categories.

    Everthing works well except for one annoying little thing.

    When an individual product (which is in more than one topcategory) is added
    to the Shopping Cart it displays twice, because in my select statement I
    have the Category listed. I realise I could remove the TopCategory from the
    statement and that makes my DISTINCT work as I wanted, but Id prefer to have
    the TopCategory as it saves me later having to another SQL query (Im already
    doing one to allow me not to list category in the Statement .... but If I
    can overcome this one ... then I can remove this as well).

    Here is my table structure (the necessary bits)
    products
    idProduct int
    ....

    categories
    idcategory int
    idParentCategor y int
    topcategory int
    ...

    categories_prod ucts
    idCatProd int
    idProduct int
    idCategory

    When I run a query such as

    SELECT DISTINCT a.idProduct, a.description,a .descriptionLon g,
    a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge,
    c.topcategory
    FROM products a, categories_prod ucts b, categories c
    WHERE active = -1 AND homePage = -1
    AND a.idProduct = b.idProduct
    AND c.idcategory=b. idcategory
    AND prodType = 1 ORDER BY a.idProduct DESC

    This will return all products as expected, as well as any products which are
    in more than one TopCategory.

    Any ideas how to overcome this would be greatly appreciated.

    Cheers

    Craig


  • Hugo Kornelis

    #2
    Re: SELECT DISTINCT with JOIN

    On Fri, 14 May 2004 09:05:08 +1200, Craig Hoskin wrote:
    [color=blue]
    >Hi everyone
    >
    >Have a problem I would areally appreciate help with.
    >I have 3 tables in a standard format for a Bookshop, eg
    >
    > Products
    > Categories
    > Categories_Prod ucts
    >
    >the latter allowing me to have products in multiple categories.
    >
    >Everthing works well except for one annoying little thing.
    >
    >When an individual product (which is in more than one topcategory) is added
    >to the Shopping Cart it displays twice, because in my select statement I
    >have the Category listed. I realise I could remove the TopCategory from the
    >statement and that makes my DISTINCT work as I wanted, but Id prefer to have
    >the TopCategory as it saves me later having to another SQL query (Im already
    >doing one to allow me not to list category in the Statement .... but If I
    >can overcome this one ... then I can remove this as well).
    >
    >Here is my table structure (the necessary bits)
    >products
    > idProduct int
    > ....
    >
    >categories
    > idcategory int
    > idParentCategor y int
    > topcategory int
    > ...
    >
    >categories_pro ducts
    > idCatProd int
    > idProduct int
    > idCategory
    >
    >When I run a query such as
    >
    >SELECT DISTINCT a.idProduct, a.description,a .descriptionLon g,
    >a.listPrice,a. price,a.smallIm ageUrl,a.stock, a.fileName,a.no ShipCharge,
    >c.topcategor y
    >FROM products a, categories_prod ucts b, categories c
    >WHERE active = -1 AND homePage = -1
    > AND a.idProduct = b.idProduct
    > AND c.idcategory=b. idcategory
    > AND prodType = 1 ORDER BY a.idProduct DESC
    >
    >This will return all products as expected, as well as any products which are
    >in more than one TopCategory.
    >
    >Any ideas how to overcome this would be greatly appreciated.
    >
    >Cheers
    >
    >Craig
    >[/color]

    Hi Craig,

    The first idea will have to come from you. SQL Server won't make your
    choices for you. So you must decide *which* TopCategory to display for
    products that are in more than one TopCategory.

    If you can describe how to choose, I (or someone else in this NG) will
    probably be able to help writing up the query.

    Best, Hugo
    --

    (Remove _NO_ and _SPAM_ to get my e-mail address)

    Comment

    • Craig Hoskin

      #3
      Re: SELECT DISTINCT with JOIN

      Doooh

      Always the way, you post a question and find a possible solution :-)

      Would this work?

      select a.idProduct, a.description,a .descriptionLon g,
      a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge
      from Products a
      where exists (select * from categories c, categories_prod ucts b where
      a.idProduct = b.idProduct AND c.idcategory=b. idcategory)
      AND a.active = -1 AND a.homePage = -1 AND a.prodType = 1
      ORDER BY a.idProduct DESC

      Cheers

      Craig
      [color=blue]
      > Hi everyone
      >
      > Have a problem I would areally appreciate help with.
      > I have 3 tables in a standard format for a Bookshop, eg
      >
      > Products
      > Categories
      > Categories_Prod ucts
      >
      > the latter allowing me to have products in multiple categories.
      >
      > Everthing works well except for one annoying little thing.
      >
      > When an individual product (which is in more than one topcategory) is[/color]
      added[color=blue]
      > to the Shopping Cart it displays twice, because in my select statement I
      > have the Category listed. I realise I could remove the TopCategory from[/color]
      the[color=blue]
      > statement and that makes my DISTINCT work as I wanted, but Id prefer to[/color]
      have[color=blue]
      > the TopCategory as it saves me later having to another SQL query (Im[/color]
      already[color=blue]
      > doing one to allow me not to list category in the Statement .... but If I
      > can overcome this one ... then I can remove this as well).
      >
      > Here is my table structure (the necessary bits)
      > products
      > idProduct int
      > ....
      >
      > categories
      > idcategory int
      > idParentCategor y int
      > topcategory int
      > ...
      >
      > categories_prod ucts
      > idCatProd int
      > idProduct int
      > idCategory
      >
      > When I run a query such as
      >
      > SELECT DISTINCT a.idProduct, a.description,a .descriptionLon g,
      > a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge,
      > c.topcategory
      > FROM products a, categories_prod ucts b, categories c
      > WHERE active = -1 AND homePage = -1
      > AND a.idProduct = b.idProduct
      > AND c.idcategory=b. idcategory
      > AND prodType = 1 ORDER BY a.idProduct DESC
      >
      > This will return all products as expected, as well as any products which[/color]
      are[color=blue]
      > in more than one TopCategory.
      >
      > Any ideas how to overcome this would be greatly appreciated.
      >
      > Cheers
      >
      > Craig
      >
      >[/color]


      Comment

      • Craig Hoskin

        #4
        Re: SELECT DISTINCT with JOIN


        "Hugo Kornelis" <hugo@pe_NO_rFa ct.in_SPAM_fo> wrote in message
        news:7up7a0horj 1pjdvu7b2qcvf23 sktec53sr@4ax.c om...
        Hi Hugo[color=blue]
        >
        > The first idea will have to come from you. SQL Server won't make your
        > choices for you. So you must decide *which* TopCategory to display for
        > products that are in more than one TopCategory.[/color]

        Sorry, yes I should have calrified that aspect. Answer: I dont care :-)
        Any topcategory will do as its still the same product, and the produut
        display page will actually list all the categories its in.

        Thanks for taking the time to help me.

        Cheers

        Craig
        [color=blue]
        > If you can describe how to choose, I (or someone else in this NG) will
        > probably be able to help writing up the query.
        >
        > Best, Hugo
        > --
        >
        > (Remove _NO_ and _SPAM_ to get my e-mail address)[/color]


        Comment

        • Craig Hoskin

          #5
          Re: SELECT DISTINCT with JOIN

          No cancel that ... I forgot my c.topcategory in the qeruy.

          Back to seeking help again :-)


          Comment

          • Hugo Kornelis

            #6
            Re: SELECT DISTINCT with JOIN

            On Fri, 14 May 2004 09:30:00 +1200, Craig Hoskin wrote:
            [color=blue]
            >
            >"Hugo Kornelis" <hugo@pe_NO_rFa ct.in_SPAM_fo> wrote in message
            >news:7up7a0hor j1pjdvu7b2qcvf2 3sktec53sr@4ax. com...
            >Hi Hugo[color=green]
            >>
            >> The first idea will have to come from you. SQL Server won't make your
            >> choices for you. So you must decide *which* TopCategory to display for
            >> products that are in more than one TopCategory.[/color]
            >
            >Sorry, yes I should have calrified that aspect. Answer: I dont care :-)
            >Any topcategory will do as its still the same product, and the produut
            >display page will actually list all the categories its in.
            >
            >Thanks for taking the time to help me.
            >
            >Cheers
            >
            >Craig
            >[color=green]
            >> If you can describe how to choose, I (or someone else in this NG) will
            >> probably be able to help writing up the query.[/color][/color]

            Hi Craig,

            Hmmmm. Since SQL Server still won't choose for you, allow me to do it for
            you. The following two queries will select the "first" topcategory (I
            don't know the datatype, so it will either be the lowest numbered, the one
            that comes first in the alphabet or the oldest).

            1: Using GROUP BY

            SELECT a.idProduct, a.description, a.descriptionLo ng,
            a.listPrice, a.price, a.smallImageUrl , a.stock,
            a.fileName, a.noShipCharge, MIN(c.topcatego ry)
            FROM products a, categories_prod ucts b, categories c
            WHERE a.active = -1
            AND a.homePage = -1
            AND a.idProduct = b.idProduct
            AND c.idcategory = b.idcategory
            AND a.prodType = 1
            GROUP BY a.idProduct, a.description, a.descriptionLo ng,
            a.listPrice, a.price, a.smallImageUrl , a.stock,
            a.fileName, a.noShipCharge
            ORDER BY a.idProduct DESC


            2: Using a subquery.

            SELECT a.idProduct, a.description, a.descriptionLo ng,
            a.listPrice, a.price, a.smallImageUrl , a.stock,
            a.fileName, a.noShipCharge,
            (SELECT MIN(c.topcatego ry)
            FROM categories_prod ucts b, categories c
            WHERE a.idProduct = b.idProduct
            AND c.idcategory = b.idcategory)
            FROM products a
            WHERE a.active = -1
            AND a.homePage = -1
            AND a.prodType = 1
            ORDER BY a.idProduct DESC

            Both queries are untested. Test them both and check the results. Then
            compare the execution plan to see which is most efficient with your data.
            If you see little difference, pick the one you like best :-)

            Best, Hugo
            --

            (Remove _NO_ and _SPAM_ to get my e-mail address)

            Comment

            • Craig Hoskin

              #7
              Re: SELECT DISTINCT with JOIN

              >[color=blue][color=green]
              > > If you can describe how to choose, I (or someone else in this NG) will
              > > probably be able to help writing up the query.[/color][/color]

              Eg max(topcategory ) or something / somehow?


              Comment

              • Hugo Kornelis

                #8
                Re: SELECT DISTINCT with JOIN

                On Fri, 14 May 2004 09:33:15 +1200, Craig Hoskin wrote:
                [color=blue]
                >No cancel that ... I forgot my c.topcategory in the qeruy.
                >
                >Back to seeking help again :-)
                >[/color]

                Hi Craig,

                So I noted. But it still was a useful post, as it showed that the active,
                homepage and product columns are all in the Products table (something that
                was not clear from your first post). Without this knowledge, I could not
                have presented my second option (see my other post).

                Best, Hugo
                --

                (Remove _NO_ and _SPAM_ to get my e-mail address)

                Comment

                • Erland Sommarskog

                  #9
                  Re: SELECT DISTINCT with JOIN

                  Craig Hoskin (nospam@infobah n.co.nz) writes:[color=blue]
                  > Have a problem I would areally appreciate help with.
                  > I have 3 tables in a standard format for a Bookshop, eg
                  >
                  > Products
                  > Categories
                  > Categories_Prod ucts
                  >
                  > the latter allowing me to have products in multiple categories.
                  >
                  > Everthing works well except for one annoying little thing.
                  >
                  > When an individual product (which is in more than one topcategory) is
                  > added to the Shopping Cart it displays twice, because in my select
                  > statement I have the Category listed. I realise I could remove the
                  > TopCategory from the statement and that makes my DISTINCT work as I
                  > wanted, but Id prefer to have the TopCategory as it saves me later
                  > having to another SQL query (Im already doing one to allow me not to
                  > list category in the Statement .... but If I can overcome this one ...
                  > then I can remove this as well).[/color]

                  You want to display a distinct list, but you only want one TopCategory.
                  So which one do you want?

                  Well, assuming you only want the highest, you could do:

                  SELECT a.idProduct, a.description,a .descriptionLon g,
                  a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge,
                  MAX(c.topcatego ry)
                  FROM products a, categories_prod ucts b, categories c
                  WHERE active = -1 AND homePage = -1
                  AND a.idProduct = b.idProduct
                  AND c.idcategory=b. idcategory
                  AND prodType = 1 ORDER BY a.idProduct DESC
                  GROUP BY a.idProduct, a.description,a .descriptionLon g,
                  a.listPrice,a.p rice,a.smallIma geUrl,a.stock,
                  a.fileName,a.no ShipCharge



                  --
                  Erland Sommarskog, SQL Server MVP, sommar@algonet. se

                  Books Online for SQL Server SP3 at
                  Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                  Comment

                  • Craig Hoskin

                    #10
                    Re: SELECT DISTINCT with JOIN

                    Hi Hugo

                    Ive tried this which seems to work?

                    SELECT DISTINCT a.idProduct, a.description,a .descriptionLon g,
                    a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge,
                    c.topcategory
                    FROM products a, categories_prod ucts b, categories c
                    WHERE c.topcategory = (SELECT MAX(c.topcatego ry)
                    FROM categories_prod ucts b, categories c
                    WHERE a.idProduct = b.idProduct
                    AND c.idcategory = b.idcategory)
                    AND active = -1 AND homePage = -1 A
                    ND a.idProduct = b.idProduct AND c.idcategory=b. idcategory
                    AND prodType = 1
                    ORDER BY a.idProduct DESC


                    "Hugo Kornelis" <hugo@pe_NO_rFa ct.in_SPAM_fo> wrote in message
                    news:7up7a0horj 1pjdvu7b2qcvf23 sktec53sr@4ax.c om...[color=blue]
                    >
                    > The first idea will have to come from you. SQL Server won't make your
                    > choices for you. So you must decide *which* TopCategory to display for
                    > products that are in more than one TopCategory.
                    >
                    > If you can describe how to choose, I (or someone else in this NG) will
                    > probably be able to help writing up the query.
                    >
                    > Best, Hugo[/color]


                    Comment

                    • Craig Hoskin

                      #11
                      Re: SELECT DISTINCT with JOIN

                      Hi Hugo[color=blue]
                      >
                      > Hmmmm. Since SQL Server still won't choose for you, allow me to do it for
                      > you. The following two queries will select the "first" topcategory (I
                      > don't know the datatype, so it will either be the lowest numbered, the one
                      > that comes first in the alphabet or the oldest).[/color]

                      Thanks for that ... it was your previous posting that gave me an idea about
                      the MAX. Good plan :-)
                      [color=blue]
                      >
                      > 2: Using a subquery.
                      >
                      > SELECT a.idProduct, a.description, a.descriptionLo ng,
                      > a.listPrice, a.price, a.smallImageUrl , a.stock,
                      > a.fileName, a.noShipCharge,
                      > (SELECT MIN(c.topcatego ry)
                      > FROM categories_prod ucts b, categories c
                      > WHERE a.idProduct = b.idProduct
                      > AND c.idcategory = b.idcategory)
                      > FROM products a
                      > WHERE a.active = -1
                      > AND a.homePage = -1
                      > AND a.prodType = 1
                      > ORDER BY a.idProduct DESC
                      >
                      > Both queries are untested. Test them both and check the results. Then
                      > compare the execution plan to see which is most efficient with your data.
                      > If you see little difference, pick the one you like best :-)[/color]

                      This last one looked the best.

                      From a cursory glance, which would be better, yours where the SELECT is a
                      subquery (forgove my terminiology) or mine where it is in the WHERE clause?

                      Both provide the same results (it appears) and appear OK performace wise ...
                      just wondering which is better for a larger number of records?

                      Thanks again for all your help ... truly very much appreciated.

                      Cheers

                      Craig


                      Comment

                      • Craig Hoskin

                        #12
                        Re: SELECT DISTINCT with JOIN

                        Hi Erland

                        Thanks for the reply.

                        I note you and Hugo both placed the "select Max()" in the SELECT part, as
                        opposed to one I made where I placed it in the WHERE clause.

                        Am I correct in assuming that, given you guys and your experience, that
                        yours will be more efficient ... since you both seem to have choosen the
                        same approach?

                        Cheers

                        Craig

                        "Erland Sommarskog" <sommar@algonet .se> wrote in message
                        news:Xns94E8F3D 0D213EYazorman@ 127.0.0.1...[color=blue]
                        > Craig Hoskin (nospam@infobah n.co.nz) writes:[color=green]
                        > > Have a problem I would areally appreciate help with.
                        > > I have 3 tables in a standard format for a Bookshop, eg
                        > >
                        > > Products
                        > > Categories
                        > > Categories_Prod ucts
                        > >
                        > > the latter allowing me to have products in multiple categories.
                        > >
                        > > Everthing works well except for one annoying little thing.
                        > >
                        > > When an individual product (which is in more than one topcategory) is
                        > > added to the Shopping Cart it displays twice, because in my select
                        > > statement I have the Category listed. I realise I could remove the
                        > > TopCategory from the statement and that makes my DISTINCT work as I
                        > > wanted, but Id prefer to have the TopCategory as it saves me later
                        > > having to another SQL query (Im already doing one to allow me not to
                        > > list category in the Statement .... but If I can overcome this one ...
                        > > then I can remove this as well).[/color]
                        >
                        > You want to display a distinct list, but you only want one TopCategory.
                        > So which one do you want?
                        >
                        > Well, assuming you only want the highest, you could do:
                        >
                        > SELECT a.idProduct, a.description,a .descriptionLon g,
                        > a.listPrice,a.p rice,a.smallIma geUrl,a.stock, a.fileName,a.no ShipCharge,
                        > MAX(c.topcatego ry)
                        > FROM products a, categories_prod ucts b, categories c
                        > WHERE active = -1 AND homePage = -1
                        > AND a.idProduct = b.idProduct
                        > AND c.idcategory=b. idcategory
                        > AND prodType = 1 ORDER BY a.idProduct DESC
                        > GROUP BY a.idProduct, a.description,a .descriptionLon g,
                        > a.listPrice,a.p rice,a.smallIma geUrl,a.stock,
                        > a.fileName,a.no ShipCharge
                        >
                        >
                        >
                        > --
                        > Erland Sommarskog, SQL Server MVP, sommar@algonet. se
                        >
                        > Books Online for SQL Server SP3 at
                        > http://www.microsoft.com/sql/techinf...2000/books.asp[/color]


                        Comment

                        • Hugo Kornelis

                          #13
                          Re: SELECT DISTINCT with JOIN

                          On Fri, 14 May 2004 10:08:16 +1200, Craig Hoskin wrote:

                          (snip)[color=blue]
                          >This last one looked the best.
                          >
                          >From a cursory glance, which would be better, yours where the SELECT is a
                          >subquery (forgove my terminiology) or mine where it is in the WHERE clause?[/color]

                          Hi Craig,

                          Your query seems to contain some redundant joins. This may not hurt you if
                          the optimizer "sees" that it doesn't need to perform the join twice, but
                          I'm not sure if the optimizer is intelligent enough for that.
                          [color=blue]
                          >
                          >Both provide the same results (it appears) and appear OK performace wise ...
                          >just wondering which is better for a larger number of records?[/color]

                          Without testing, I'd say mine is quicker (probably) or they have equal
                          performance (slight chance). But you'd have to test to be sure. Make sure
                          you have an adequate (ie large) set of test data with as typical
                          distribution as possible. Run both queries with SET STATISTICS IO. Also,
                          compare the execution plans.

                          Best, Hugo
                          --

                          (Remove _NO_ and _SPAM_ to get my e-mail address)

                          Comment

                          • Hugo Kornelis

                            #14
                            Re: SELECT DISTINCT with JOIN

                            On Fri, 14 May 2004 00:48:02 +0200, Hugo Kornelis wrote:
                            [color=blue]
                            >On Fri, 14 May 2004 10:08:16 +1200, Craig Hoskin wrote:
                            >
                            >(snip)[color=green]
                            >>This last one looked the best.
                            >>
                            >>From a cursory glance, which would be better, yours where the SELECT is a
                            >>subquery (forgove my terminiology) or mine where it is in the WHERE clause?[/color]
                            >
                            >Hi Craig,
                            >
                            >Your query seems to contain some redundant joins. This may not hurt you if
                            >the optimizer "sees" that it doesn't need to perform the join twice, but
                            >I'm not sure if the optimizer is intelligent enough for that.[/color]

                            Forgot to mention - my first suggestion (identical to Erland's) has a high
                            probability of being the fastest of them all (though it does depend on
                            what indexes are available for the optimizer to toy with).

                            I suspect the long group by list is what makes this one "look" less good
                            to you. However, this will probably look better to the optimizer than the
                            correlated subqueries in my second query and in your post. Also, do keep
                            in mind that you can mentally replace the entire group by list by "all
                            columns in the select clause except topcategory".

                            Best, Hugo
                            --

                            (Remove _NO_ and _SPAM_ to get my e-mail address)

                            Comment

                            • Erland Sommarskog

                              #15
                              Re: SELECT DISTINCT with JOIN

                              Craig Hoskin (nospam@infobah n.co.nz) writes:[color=blue]
                              > I note you and Hugo both placed the "select Max()" in the SELECT part, as
                              > opposed to one I made where I placed it in the WHERE clause.
                              >
                              > Am I correct in assuming that, given you guys and your experience, that
                              > yours will be more efficient ... since you both seem to have choosen the
                              > same approach?[/color]

                              For this case it is probably more effecient. It was also less typing...

                              --
                              Erland Sommarskog, SQL Server MVP, sommar@algonet. se

                              Books Online for SQL Server SP3 at
                              Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                              Comment

                              Working...