mysql - Hive : finding items that customers bought together from transactions -
i working on mysql need replicate queries on hive.
i have table in form
i retrieve following info:
on mysql, below query works:
select c.original_item_id, c.bought_with_item_id, count(*) times_bought_together ( select a.item_id original_item_id, b.item_id bought_with_item_id items inner join items b on a.transaction_id = b.transaction_id , a.item_id != b.item_id original_item_id in ('b','c')) c group c.original_item_id, c.bought_with_item_id;
but not able translate hive query, have tried lot of shuffling joins , replacing on conditions have not got necessary results. great if can find on this
hive not support not equality joins. can move condition a.item_id != b.item_id
where
clause:
create table items(transaction_id smallint, item_id string); insert overwrite table items select 1 , 'a' default.dual union select 1 , 'b' default.dual union select 1 , 'c' default.dual union select 2 , 'b' default.dual union select 2 , 'a' default.dual union select 3 , 'a' default.dual union select 4 , 'b' default.dual union select 4 , 'c' default.dual; select c.original_item_id, c.bought_with_item_id, count(*) times_bought_together ( select a.item_id original_item_id, b.item_id bought_with_item_id items inner join items b on a.transaction_id = b.transaction_id a.item_id in ('b','c') --original_item_id , a.item_id != b.item_id ) c group c.original_item_id, c.bought_with_item_id; --- ok original_item_id bought_with_item_id times_bought_together b 2 b c 2 c 1 c b 2
time taken: 24.164 seconds, fetched: 4 row(s)
Comments
Post a Comment