Friday, February 24, 2012

logic of sum() with joins and using query hint

hi everybody
I have a question about the sum() function. when I join two tabeles and one
of them is the main table which I used in the from statement, sum function I
used for the joined table is giving the sum incorrectly(it is governing time
s
the other joined tabele). how can i eleminate the problem? do I have to use
query hint or someting. if so how?
the query is the basis of a fifo report. I didnt want to use a cursor and so
I wrote such a query. I solved the problem with UDFs but i want to learn the
logic and how to use query hint for sum function
id is primary key for both tables
CREATE TABLE [order] (
[id] [int] NULL ,
[date_order] [datetime] NULL ,
[product] [char] (10) ,
[quantity] [int] NULL
)
go
CREATE TABLE [distribute] (
[id] [int] NULL ,
[date_distribute] [datetime] NULL ,
[product] [char] (10),
[quantity] [int] NULL
)
--sample data
insert into distribute (id,date_distribute,product,quantity) values
(51,'2005-01-01','aaa',10)
insert into distribute (id,date_distribute,product,quantity) values
(52,'2005-01-04','aaa',13)
insert into distribute (id,date_distribute,product,quantity) values
(53,'2005-01-05','aaa',3)
insert into distribute (id,date_distribute,product,quantity) values
(54,'2005-01-06','aaa',-2)
insert into distribute (id,date_distribute,product,quantity) values
(55,'2005-01-07','aaa',8)
insert into distribute (id,date_distribute,product,quantity) values
(56,'2005-01-08','aaa',45)
insert into distribute (id,date_distribute,product,quantity) values
(57,'2005-01-10','aaa',10)
insert into [order] (id,date_order,product,quantity) values
(11,'2005-01-01','aaa',10)
insert into [order] (id,date_order,product,quantity) values
(12,'2005-01-03','aaa',20)
insert into [order] (id,date_order,product,quantity) values
(13,'2005-01-05','aaa',30)
insert into [order] (id,date_order,product,quantity) values
(14,'2005-01-08','aaa',15)
insert into [order] (id,date_order,product,quantity) values
(15,'2005-01-09','aaa',10)
--query
select o1.id,o1.date_order,o1.product, o1.quantity,d1.id as Distribute_id ,
isnull(sum(d2.quantity),0)as forobservingdist,
isnull(sum(o2.quantity),0)as forobservingord
from [order] o1
left join [order] o2 on o1.id>=o2.id and o1.product=o2.product
left join Distribute d1 on o1.date_order<=d1.date_distribute and
o1.product=d1.product
left join Distribute d2 on d2.id<=d1.id and d2.product=d1.product
group by o1.id, o1.date_order,o1.product, o1.quantity,d1.id
,d1.date_distribute,d1.quantity,d1.product
the query below has the same logic with the above query. I used UDFs for the
sum functions.
when you run the queries, forobservingord column must be the same as
forobservingord in the results of the query below
CREATE FUNCTION getorderdogan
(@.id int,@.product nvarchar(50))
RETURNS int
AS
BEGIN
DECLARE @.sum AS int
select @.sum = sum(o2.quantity) from [order] o2 where o2.id<=@.id and
o2.product=@.product
RETURN @.sum
END
go
CREATE FUNCTION getdistributedogan
(@.id int,@.product nvarchar(50))
RETURNS int
AS
BEGIN
DECLARE @.sum AS int
select @.sum = sum(d2.quantity) from Distribute d2 where d2.id<=@.id and
d2.product=@.product
RETURN isnull(@.sum,0)
END
select o1.id,o1.date_order,o1.product, o1.quantity,d1.id as Distribute_id ,
d1.date_distribute,
dbo.getdistributedogan(d1.id,d1.product)as forobservingdist,
dbo.getorderdogan(o1.id,o1.product)as forobservingord
from dbo.[order] o1
left outer join Distribute d1 on o1.date_order<=d1.date_distribute and
o1.product=d1.product
group by o1.id, o1.date_order,o1.product, o1.quantity,d1.id
,d1.date_distribute,d1.quantity,d1.productYou can achieve the same results (with better performance) by using
subqueries instead of UDF-s:
select o1.id, o1.date_order, o1.product,o1.quantity,
d1.id as Distribute_id, d1.date_distribute, (
select sum(d2.quantity) from Distribute d2
where d2.id<=d1.id and d2.product=d1.product
) as forobservingdist, (
select sum(o2.quantity) from [order] o2
where o2.id<=o1.id and o2.product=o1.product
) as forobservingord
from dbo.[order] o1
left outer join Distribute d1
on o1.date_order<=d1.date_distribute and o1.product=d1.product
group by o1.id, o1.date_order, o1.product, o1.quantity,
d1.id, d1.date_distribute, d1.quantity, d1.product
Razvan|||tnx Razvan sure I didnt think this:))
do you have any info about using query hint works like that?
also the last form of the query is like that
select o1.id, o1.date_order, o1.product, o1.quantity, d1.id as Distribute_id
, d1.date_distribute,
case when
0>(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistributedogan(d1.id,d1.produ
ct))
then
isnull(d1.quantity,0)+(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistribute
dogan(d1.id,d1.product))
when
d1.quantity>o1.quantity-(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistribu
tedogan(d1.id,d1.product))
then
(o1.quantity-(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistributedogan(d1.
id,d1.product)))
else
isnull(d1.quantity,0)
end as distributed_quantity,
case when
(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistributedogan(d1.id,d1.product
))>o1.quantity
then
o1.quantity
when
(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistributedogan(d1.id,d1.product
))<0
then
0
else
(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistributedogan(d1.id,d1.product
))
end as remainingorder,
dbo.getdistributedogan(d1.id,d1.product)as forobservingdist,
dbo.getorderdogan(o1.id,o1.product)as forobservingord
from dbo.[order] o1
left outer join Distribute d1 on o1.date_order<=d1.date_distribute and
o1.product=d1.product
group by o1.id, o1.date_order, o1.product, o1.quantity, d1.id ,
d1.date_distribute, d1.quantity, d1.product
having
((dbo.getorderdogan(o1.id,o1.product)-dbo.getdistributedogan(d1.id,d1.produc
t))+isnull(d1.quantity,0)>0
and
o1.quantity>(dbo.getorderdogan(o1.id,o1.product)-dbo.getdistributedogan(d1.i
d,d1.product)))
or isnull(d1.quantity,0)=0
I changed the UDFs with subqueries and its working. I looked at the query
execution plan and it looks more simple with the udfs. are you sure this wil
l
work with better performance? probably you are:))
also plan of the query with subqueries shows many hash matches. Can't I use
query hint making hash matches with left join?
thanks again
select o1.id, o1.date_order, o1.product, o1.quantity, d1.id as Distribute_id
, d1.date_distribute,
case when
0>((select sum(o2.quantity) from [order] o2 where o2.id<=o1.id and
o2.product=o1.product)-(select sum(d2.quantity) from Distribute d2 where
d2.id<=d1.id and d2.product=d1.product))
then
isnull(d1.quantity,0)+((select sum(o2.quantity) from [order] o2 where
o2.id<=o1.id and o2.product=o1.product)-(select sum(d2.quantity) from
Distribute d2 where d2.id<=d1.id and d2.product=d1.product))
when
d1.quantity>o1.quantity-((select sum(o2.quantity) from [order] o2 where
o2.id<=o1.id and o2.product=o1.product)-(select sum(d2.quantity) from
Distribute d2 where d2.id<=d1.id and d2.product=d1.product))
then
(o1.quantity-((select sum(o2.quantity) from [order] o2 where o2.id<=o1.id
and o2.product=o1.product)-(select sum(d2.quantity) from Distribute d2 where
d2.id<=d1.id and d2.product=d1.product)))
else
isnull(d1.quantity,0)
end as distributed_quantity,
case when
((select sum(o2.quantity) from [order] o2 where o2.id<=o1.id and
o2.product=o1.product)-(select sum(d2.quantity) from Distribute d2 where
d2.id<=d1.id and d2.product=d1.product))>o1.quantity
then
o1.quantity
when
((select sum(o2.quantity) from [order] o2 where o2.id<=o1.id and
o2.product=o1.product)-(select sum(d2.quantity) from Distribute d2 where
d2.id<=d1.id and d2.product=d1.product))<0
then
0
else
((select sum(o2.quantity) from [order] o2 where o2.id<=o1.id and
o2.product=o1.product)-(select sum(d2.quantity) from Distribute d2 where
d2.id<=d1.id and d2.product=d1.product))
end as remainingorder,
(select sum(d2.quantity) from Distribute d2 where d2.id<=d1.id and
d2.product=d1.product)as forobservingdist,
(select sum(o2.quantity) from [order] o2 where o2.id<=o1.id and
o2.product=o1.product)as forobservingord
from dbo.[order] o1
left outer join Distribute d1 on o1.date_order<=d1.date_distribute and
o1.product=d1.product
group by o1.id, o1.date_order, o1.product, o1.quantity, d1.id ,
d1.date_distribute, d1.quantity, d1.product
having (((select sum(o2.quantity) from [order] o2 where o2.id<=o1.id and
o2.product=o1.product)-(select sum(d2.quantity) from Distribute d2 where
d2.id<=d1.id and d2.product=d1.product))+isnull(d1.quantity,0)>0
and o1.quantity>((select sum(o2.quantity) from [order] o2 where
o2.id<=o1.id and o2.product=o1.product)-(select sum(d2.quantity) from
Distribute d2 where d2.id<=d1.id and d2.product=d1.product)))
or isnull(d1.quantity,0)=0
"Razvan Socol" wrote:

> You can achieve the same results (with better performance) by using
> subqueries instead of UDF-s:
> select o1.id, o1.date_order, o1.product,o1.quantity,
> d1.id as Distribute_id, d1.date_distribute, (
> select sum(d2.quantity) from Distribute d2
> where d2.id<=d1.id and d2.product=d1.product
> ) as forobservingdist, (
> select sum(o2.quantity) from [order] o2
> where o2.id<=o1.id and o2.product=o1.product
> ) as forobservingord
> from dbo.[order] o1
> left outer join Distribute d1
> on o1.date_order<=d1.date_distribute and o1.product=d1.product
> group by o1.id, o1.date_order, o1.product, o1.quantity,
> d1.id, d1.date_distribute, d1.quantity, d1.product
> Razvan
>|||> do you have any info about using query hint works like that?
There are no query hints that modify the results; the hints are used
only for optimizations. For informations about hints, see the "query
hints" topic in Books Online:
http://msdn.microsoft.com/library/e..._qd_03_8upf.asp

> I looked at the query execution plan and it looks more simple with the udfs.[/colo
r]
The execution plan for a query that calls multi-statement UDF-s (scalar
or table-valued) do not contain the cost of the statements contained in
the UDF. Only in-line table-valued UDF-s are expaded in the execution
plan of the calling query.
> are you sure this will work with better performance? probably you are:))
To be sure, test it yourself using Profiler or using something like
this:
DECLARE @.t datetime
SET @.t=GETDATE()
SELECT ...
PRINT CONVERT(varchar(10),DATEDIFF(ms,@.t,GETDA
TE()))+' ms'

> also plan of the query with subqueries shows many hash matches.
> Can't I use query hint making hash matches with left join?
SQL Server can execute joins in one of three ways: nested loops, merge
or hash. These ways can be used for inner joins, as well as for outer
joins (left joins, right joins or full outer joins). The query
optimizer automatically selects the best way to execute a join (nested
loops, merge or hash), based on the number of rows and the available
indexes on the joined columns. If the optimizer used a hash join,
that's because this is probably the best way to execute the query in
this particular case. Adding a join hint will force SQL Server to use
nested loops joins or merge joins, but in most cases that would have an
inferior performance. A better idea would be to add indexes to the
columns that are used in the join (in this case: product and id) and
let the query optimizer choose the way it executes the query (the query
optimizer may realize that it's better not to use the index on the id
column, and use only the index on the product column, for example).
For more informations about the ways a join can be executed, see
"Advanced Query Tuning Concepts" topic in Books Online:
http://msdn.microsoft.com/library/e..._tun_1_8pv7.asp
Razvan|||thank you very much for the easy performance test code:))
I've tried it both query gives sometimes 10ms sometimes 20ms result
I know I had to test it with much more data. thanks again.
I've tried to add index as you but results did not changed.
also I tried left hash join which returns a warning about changing the plan
and didnt change anything in result set.
then I tried some unconscious synthax but these instinctly tries didnt
change the result:)
so I give up:))
thanks
"Razvan Socol" wrote:

> There are no query hints that modify the results; the hints are used
> only for optimizations. For informations about hints, see the "query
> hints" topic in Books Online:
> http://msdn.microsoft.com/library/e..._qd_03_8upf.asp
>
> The execution plan for a query that calls multi-statement UDF-s (scalar
> or table-valued) do not contain the cost of the statements contained in
> the UDF. Only in-line table-valued UDF-s are expaded in the execution
> plan of the calling query.
>
> To be sure, test it yourself using Profiler or using something like
> this:
> DECLARE @.t datetime
> SET @.t=GETDATE()
> SELECT ...
> PRINT CONVERT(varchar(10),DATEDIFF(ms,@.t,GETDA
TE()))+' ms'
>
> SQL Server can execute joins in one of three ways: nested loops, merge
> or hash. These ways can be used for inner joins, as well as for outer
> joins (left joins, right joins or full outer joins). The query
> optimizer automatically selects the best way to execute a join (nested
> loops, merge or hash), based on the number of rows and the available
> indexes on the joined columns. If the optimizer used a hash join,
> that's because this is probably the best way to execute the query in
> this particular case. Adding a join hint will force SQL Server to use
> nested loops joins or merge joins, but in most cases that would have an
> inferior performance. A better idea would be to add indexes to the
> columns that are used in the join (in this case: product and id) and
> let the query optimizer choose the way it executes the query (the query
> optimizer may realize that it's better not to use the index on the id
> column, and use only the index on the product column, for example).
> For more informations about the ways a join can be executed, see
> "Advanced Query Tuning Concepts" topic in Books Online:
> http://msdn.microsoft.com/library/e..._tun_1_8pv7.asp
> Razvan
>

No comments:

Post a Comment