python - Extremely slow writing speed when inserting rows into Hive table using impyla


I'm experiencing extremely slow writing speed when trying to insert rows into a partitioned Hive table using impyla.

This is an example of the code, written in Python:

import datetime
import os

from impala.dbapi import connect

targets = ...  # targets is a dictionary of objects of a specific class
yesterday = datetime.date.today() - datetime.timedelta(days=1)
log_datetime = datetime.datetime.now()
query = """
        insert into my_database.mytable
        partition (year={year}, month={month}, day={day})
        values ('{yesterday}', '{log_ts}', %s, %s, %s, 1, 1)
        """.format(yesterday=yesterday, log_ts=log_datetime,
                   year=yesterday.year, month=yesterday.month,
                   day=yesterday.day)
print(query)
rows = tuple([tuple([i.campaign, i.adgroup, i.adwordsid])
              for i in targets.values()])

connection = connect(host=os.environ["hadoop_ip"],
                     port=10000,
                     user=os.environ["hadoop_user"],
                     password=os.environ["hadoop_passwd"],
                     auth_mechanism="plain")
cursor = connection.cursor()
cursor.execute("set hive.exec.dynamic.partition.mode=nonstrict")
cursor.executemany(query, rows)

Interestingly, even though I'm launching a single executemany command, impyla still resolves it into multiple MapReduce jobs. In fact, I can see as many MapReduce jobs launched as there are tuples in the tuple-of-tuples object I'm passing to the impyla executemany method.

Do you see anything wrong? To give you an idea: after more than an hour it had written only 350 rows.
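A likely explanation for the behavior described above: a DB API `executemany` typically executes the statement once per parameter tuple, so each row becomes its own Hive query, and Hive turns each query into its own job. A common workaround is to batch all rows into a single multi-row VALUES clause, so only one query is issued. The helper below is a hedged sketch, not impyla API; the table, partition spec, and column values are illustrative assumptions, and naive string interpolation like this is only safe with trusted input:

```python
def build_batch_insert(table, partition_spec, static_cols, rows):
    """Build one INSERT statement with a VALUES tuple per row.

    table, partition_spec: plain strings, e.g. "my_database.mytable"
    and "year=2017, month=1, day=1" (hypothetical names).
    static_cols: values repeated on every row (e.g. date, timestamp).
    rows: iterable of per-row value tuples.
    """
    values = ", ".join(
        "({})".format(", ".join("'{}'".format(v) for v in static_cols + row))
        for row in rows
    )
    return "insert into {} partition ({}) values {}".format(
        table, partition_spec, values
    )

rows = [("campaign1", "adgroup1", "id1"),
        ("campaign2", "adgroup2", "id2")]
sql = build_batch_insert(
    "my_database.mytable",
    "year=2017, month=1, day=1",
    ("2017-01-01", "2017-01-01 00:00:00"),
    rows,
)
# cursor.execute(sql)  # one query (one job) instead of len(rows) queries
```

With this approach a single `cursor.execute(sql)` replaces the per-row `executemany` loop; for very large batches, splitting into chunks of a few thousand rows keeps the statement size manageable.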

