sql - The fastest way to fill a table using data from other tables
I've been reading about the concepts behind functional programming, and it has made me reconsider my way of doing things.
For example, there is a table:
- client, date, trial, full
- client1, 14.11.2012, 1, 1
- client1, 06.02.2013, null, 1
- client1, 27.03.2013, null, 1
- client1, 15.05.2013, null, 1
The table contains millions of records and half a million clients. The goal is to transform the data into the status of each client:
- client, date, status
- client1, 14.11.2012, 'mixed'
- client1, 01.12.2012, 'unprocessed'
- client1, 01.01.2012, 'unprocessed'
- client1, 13.01.2013, 'slept'
- client1, 01.02.2013, 'slept'
- client1, 06.02.2013, 'processed'
- client1, 01.03.2013, 'unprocessed'
- client1, 27.03.2013, 'processed'
- client1, 01.04.2013, 'unprocessed'
- client1, 01.05.2013, 'unprocessed'
- client1, 15.05.2013, 'processed'
- client1, 01.06.2013, 'unprocessed'
- client1, 01.07.2013, 'unprocessed'
- client1, 23.07.2013, 'slept'
- client1, 01.08.2013, 'slept'
- client1, 01.09.2013, 'slept'
- client1, 01.10.2013, 'slept'
- client1, 01.11.2013, 'slept'
- client1, 01.12.2013, 'slept'
- client1, 01.01.2014, 'slept'
- client1, 10.01.2014, 'left'
The short algorithm of the transformation is:
- if it's the first row and trial = 1 and full = 1, then status = 'mixed'
- if there is no data for the first day of the month, then status = 'unprocessed'
- if 60 days have passed and there are no records containing full = 1, then status = 'slept'
- if 240 days have passed and there are no records containing full = 1, then status = 'left'
- if it is the first day of the month and the previous status = 'slept', then status = 'slept'
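Since the question asks whether a general-purpose language might handle this better, here is a minimal Python sketch of the rules listed above. It is only an illustration under stated assumptions: the function name `classify` and its parameters are hypothetical, the 'processed' branch is inferred from the sample output rather than from the listed rules, and the thresholds are assumed to be inclusive (60 or more days, 240 or more days) counted from the last record with full = 1. The cases skipped in the list are skipped here too.

```python
from typing import Optional


def classify(is_first_row: bool, trial: Optional[int], full: Optional[int],
             days_since_full: int) -> str:
    """Hypothetical helper covering only the rules listed above.

    days_since_full is the number of days since the last record with
    full = 1 (assumption: thresholds are inclusive).
    """
    if is_first_row and trial == 1 and full == 1:
        return "mixed"
    if full == 1:
        # Assumption: inferred from the sample output, not from the rule list.
        return "processed"
    if days_since_full >= 240:
        return "left"
    if days_since_full >= 60:
        # Also covers "first day of month and previous status = 'slept'",
        # because the day count keeps growing until a new full = 1 record.
        return "slept"
    return "unprocessed"
```

For example, `classify(False, None, None, 60)` gives `'slept'`, which is consistent with 13.01.2013 being exactly 60 days after 14.11.2012 in the sample data.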
A lot of cases are skipped here, because the algorithm isn't the issue; the tools are.
In order to transform the data within SQL, I used the following expressions:
- row_number() over (partition by [client] order by [date] asc)
- lag([date],1) over (partition by [client] order by [date] desc)
- dateadd(day,1,eomonth([date]))
- recursion
- etc
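For comparison, the first three building blocks can be approximated in plain Python. This is a rough sketch, not a replacement for the SQL: the names `first_of_next_month` and `number_and_lag` are hypothetical, rows are assumed to be simple `(client, date)` tuples, and both the numbering and the lag use ascending date order (with a descending order, as in the lag expression listed, the lagged value would be the following date rather than the previous one).

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def first_of_next_month(d: date) -> date:
    """First day of the month after d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def number_and_lag(rows):
    """Return (client, date, row_number, previous_date) per client.

    rows is an iterable of (client, date) tuples; row numbers start at 1
    within each client, and previous_date is None on the first row.
    """
    out = []
    for client, group in groupby(sorted(rows), key=itemgetter(0)):
        prev = None
        for n, (_, d) in enumerate(group, start=1):
            out.append((client, d, n, prev))
            prev = d
    return out
```

Sorting once and walking each client's rows in a single pass is the procedural counterpart of partitioning by client and ordering by date.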
I have a feeling that this can't be the fastest way to transform the data. Multi-threading (putting every client in a separate thread) may be helpful, but I'm not sure how good SQL is at that. The execution plan also becomes huge after a large number of cases.
So, the question is: which tool is best for transforming data like that? Perhaps a programming language can handle it much better?
Update: I have prepared the requested SQL code. Feel free to find the issue: http://pastebin.com/3ncdfqug
The below assumes that you'll process one client at a time, for example for a customer report.
I have used the dataset you provided, uploaded to a table called clientdata.
The following index may be overkill, as it creates a duplicate of the data, but it makes things lightning quick:
create nonclustered index ix_cientdata_client_date on dbo.clientdata(client,date) include (trial,[full])
I have then created a dates table based on the given client ID, running from the first date value to the lesser of the last date value + 240 days or today. From this table, I can filter out the useless dates. I then join this dataset to the previous clientdata row and process the status logic.
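To make the date-selection step concrete, here is a small Python sketch of the same idea. It is an illustration only: the function name `journey_dates` and the `today` parameter are hypothetical, and it assumes the kept dates are month starts, dates that appear in the client's data, and the final date of the range.

```python
from datetime import date, timedelta


def journey_dates(client_dates, today):
    """Dates worth evaluating for one client (hypothetical helper).

    The range runs from the first client date to the lesser of
    (last client date + 240 days) and today. Within it, keep only
    month starts, dates present in the client data, and the end date.
    """
    first = min(client_dates)
    last = min(max(client_dates) + timedelta(days=240), today)
    known = set(client_dates)
    kept = []
    d = first
    while d <= last:
        if d.day == 1 or d in known or d == last:
            kept.append(d)
        d += timedelta(days=1)
    return kept
```

With the four sample dates for client1 and a `today` later than 10.01.2014, this keeps 19 dates, starting at 14.11.2012 and ending at 10.01.2014.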
As you have not included the entire set of logical processes, I have completed what I can, leaving in error messages in case you start changing things. I find these useful in pinpointing why a case statement isn't quite doing what I want it to:
if object_id('tempdb..#clientjourney') is not null
    drop table #clientjourney

declare @client nvarchar(50) = '0x802b52540027e50211e24949c409c617'
declare @mindate date = (select min(date) from clientdata where client = @client)
declare @maxdate date = (select case when dateadd(d,240,max(date)) > getdate() then getdate()
                                     else dateadd(d,240,max(date))
                                end
                         from clientdata
                         where client = @client)

--select max(date), @mindate, @maxdate, datediff(d,max(date),@maxdate) from clientdata where client = @client

-- create a table of dates between @mindate and @maxdate using a recursive cte
;with dates as (
    select @mindate as datevalue
          ,case when datepart(day,@mindate) = 1 then 1 else 0 end as monthstart
    union all
    select dateadd(d,1,datevalue)
          ,case when datepart(day,dateadd(d,1,datevalue)) = 1 then 1 else 0 end as monthstart
    from dates
    where datevalue < @maxdate
)
-- exclude dates that aren't either the first of the month, in the clientdata table or the @maxdate value
select row_number() over (order by datevalue) as rownum
      ,d.datevalue
      ,d.monthstart
      ,c.trial
      ,c.[full]
into #clientjourney
from dates d
left join clientdata c
    on (d.datevalue = c.date and c.client = @client)
where d.monthstart = 1
   or c.date is not null
   or d.datevalue = @maxdate
option (maxrecursion 0)

-- pull the data out, joining to the previous item of clientdata, and process the status
select j.rownum
      ,j.datevalue
      ,j.monthstart
      ,j.trial
      ,j.[full]
      -- handling of the first line in the dataset
      ,case when j.rownum = 1 then
            case when j.trial is not null and j.[full] is not null then 'mixed'
                 when j.trial is null and j.[full] is not null then 'full'
                 when j.trial is not null and j.[full] is null then 'trial'
                 else 'error1'
            end
       -- handling of the rest of the dataset
       else
            case when j.monthstart = 1 then -- first of the month
                      case when j.trial is not null -- with client data
                             or j.[full] is not null then 'processed'
                           when j.trial is null -- without client data
                            and j.[full] is null then
                                case when datediff(d,jp.datevalue,j.datevalue) < 60 then 'unprocessed' -- less than 60 days
                                     when datediff(d,jp.datevalue,j.datevalue) < 240 then 'slept' -- less than 240 days
                                     else 'left'
                                end
                           else 'error2'
                      end
                 else -- rest of the month
                      case when j.[full] = 1 then 'processed' -- with full flag
                           when j.[full] is null then -- without full flag
                                case when datediff(d,jp.datevalue,j.datevalue) < 60 then 'unprocessed' -- less than 60 days
                                     when datediff(d,jp.datevalue,j.datevalue) < 240 then 'slept' -- less than 240 days
                                     else 'left'
                                end
                           else 'error3'
                      end
            end
       end as status
      ,jp.datevalue
      ,datediff(d,jp.datevalue,j.datevalue) as lastfull
from #clientjourney j
outer apply (select top 1 datevalue -- returns the most recent clientdata row that occurred before the one being selected
             from #clientjourney j2
             where j.rownum > j2.rownum
               and j2.[full] is not null
             order by datevalue desc
            ) jp

-- clean up
if object_id('tempdb..#clientjourney') is not null
    drop table #clientjourney