python - Extremely slow writing speed when inserting rows into Hive table using impyla
I'm experiencing extremely slow write speed when trying to insert rows into a partitioned Hive table using impyla.
This is the example code, written in Python:
    import datetime
    import os

    from impala.dbapi import connect

    targets = ...  # targets is a dictionary of objects of a specific class
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    log_datetime = datetime.datetime.now()

    query = """
        INSERT INTO my_database.mytable
        PARTITION (year={year}, month={month}, day={day})
        VALUES ('{yesterday}', '{log_ts}', %s, %s, %s, 1, 1)
    """.format(
        yesterday=yesterday,
        log_ts=log_datetime,
        year=yesterday.year,
        month=yesterday.month,
        day=yesterday.day,
    )
    print(query)

    rows = tuple((i.campaign, i.adgroup, i.adwordsid) for i in targets.values())

    connection = connect(
        host=os.environ["hadoop_ip"],
        port=10000,
        user=os.environ["hadoop_user"],
        password=os.environ["hadoop_passwd"],
        auth_mechanism="plain",
    )
    cursor = connection.cursor()
    cursor.execute("set hive.exec.dynamic.partition.mode=nonstrict")
    cursor.executemany(query, rows)
Interestingly, even though I'm launching a single executemany command, impyla still resolves it into multiple MapReduce jobs. In fact, I can see as many MapReduce jobs launched as there are tuples in the tuple-of-tuples object I'm passing to the executemany method.
Do you see anything wrong here? To give you an idea, after more than an hour it had written only 350 rows.
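For context on the behavior described above: per the Python DB-API, executemany is generally equivalent to calling execute once per parameter tuple, which on Hive translates into one job per row. A minimal sketch of one possible workaround, batching all rows into a single multi-row INSERT so only one statement (and one job) is issued; build_bulk_insert is a hypothetical helper and the table/partition layout is taken from the question:

```python
# Hypothetical sketch: collapse all rows into one INSERT ... VALUES
# statement instead of one statement per row via executemany.
rows = [
    ("campaign_a", "adgroup_1", 111),
    ("campaign_b", "adgroup_2", 222),
]

def build_bulk_insert(rows, year=2023, month=5, day=1):
    # Quote string literals (doubling embedded single quotes);
    # pass numeric values through unchanged.
    def lit(v):
        if isinstance(v, str):
            return "'" + v.replace("'", "''") + "'"
        return str(v)

    # One "(...)" group per row; the trailing "1, 1" mirrors the
    # constant columns from the question's query.
    values = ", ".join(
        "(" + ", ".join(lit(v) for v in row) + ", 1, 1)" for row in rows
    )
    return (
        "INSERT INTO my_database.mytable "
        f"PARTITION (year={year}, month={month}, day={day}) "
        f"VALUES {values}"
    )

sql = build_bulk_insert(rows)
# cursor.execute(sql)  # a single statement -> a single Hive job
```

This only escapes single quotes, so it is a sketch rather than production-grade SQL building; for very large batches, writing the rows to HDFS and using LOAD DATA is the more common approach.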