python - Download a gzipped file, MD5 checksum it, and then save the extracted data if it matches
I'm attempting to download 2 files using Python: one is a gzipped file, and the other is its checksum.
I want to verify that the gzipped file's contents match the MD5 checksum, and then save the contents to a target directory.
I found out how to download files here, and learned how to calculate the checksum here. I load the URLs from a JSON config file, and learned how to parse JSON file values here.
I put together the following script, but I'm stuck attempting to store the verified contents of the gzipped file.
    import json
    import gzip
    import urllib
    import hashlib

    # Function for creating an md5 checksum of a file
    def md5gzip(fname):
        hash_md5 = hashlib.md5()
        with gzip.open(fname, 'rb') as f:
            # Make an iterable of the file and divide it into 4096-byte chunks.
            # The iteration ends when we hit an empty byte string (b"").
            for chunk in iter(lambda: f.read(4096), b""):
                # Update the md5 hash with the chunk
                hash_md5.update(chunk)
        return hash_md5.hexdigest()

    # Open the configuration file in the current directory
    with open('./config.json') as configfile:
        data = json.load(configfile)

    # Open the downloaded checksum file
    with open(urllib.urlretrieve(data['checksumurl'])[0]) as checksumfile:
        md5checksum = checksumfile.read()

    # Open the downloaded db file and get its md5 checksum via gzip.open
    filemd5 = md5gzip(urllib.urlretrieve(data['fileurl'])[0])

    if (filemd5 == md5checksum):
        print 'downloaded correct file'
        # save correct file
    else:
        print 'downloaded incorrect file'
        # error handling
In md5gzip, return a tuple instead of just the hash.
    def md5gzip(fname):
        hash_md5 = hashlib.md5()
        file_content = None
        with gzip.open(fname, 'rb') as f:
            # Make an iterable of the file and divide it into 4096-byte chunks.
            # The iteration ends when we hit an empty byte string (b"").
            for chunk in iter(lambda: f.read(4096), b""):
                # Update the md5 hash with the chunk
                hash_md5.update(chunk)
            # Rewind and read the whole decompressed file content
            f.seek(0)
            file_content = f.read()
        return hash_md5.hexdigest(), file_content
Then, in your code:
    filemd5, file_content = md5gzip(urllib.urlretrieve(data['fileurl'])[0])
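To actually store the verified contents, something along these lines might work. It is only a sketch under a few assumptions: it targets Python 3 (the script in the question is Python 2), it assumes the published checksum is computed over the decompressed data (which is what md5gzip hashes), and it assumes the checksum file may contain a trailing newline or a file name after the hash. The function name save_if_md5_matches and its parameters are made up for illustration. It hashes and writes in a single pass, so the whole file never has to sit in memory, and it only moves the output into place when the checksum matches.

```python
import gzip
import hashlib
import os
import tempfile


def save_if_md5_matches(gz_path, expected_md5, target_path, chunk_size=4096):
    """Decompress gz_path and keep the result at target_path only if its MD5 matches.

    Hypothetical helper: returns True if the file was saved, False otherwise.
    """
    hash_md5 = hashlib.md5()
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file next to the target, so a failed check
    # never leaves a partial or unverified file at target_path.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, 'wb') as out, gzip.open(gz_path, 'rb') as f:
            # Read decompressed data in chunks until an empty byte string.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hash_md5.update(chunk)
                out.write(chunk)
        # Assumption: the checksum text may look like "<hash>  <name>\n",
        # so compare against the first whitespace-separated token only.
        tokens = expected_md5.split()
        expected = tokens[0].lower() if tokens else ""
        if hash_md5.hexdigest() == expected:
            os.replace(tmp_path, target_path)
            return True
        return False
    finally:
        # Clean up the temporary file if it was not moved into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With the question's script, gz_path would be the local path returned by the download call (urllib.request.urlretrieve in Python 3), and expected_md5 would be the text read from the checksum file. If the checksum is instead computed over the compressed .gz file itself, the hash would need to be taken from a plain open(gz_path, 'rb') rather than gzip.open.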