python - Scrapy API - Spider class __init__ argument turned to None
After a fresh install of Miniconda (64-bit .exe installer for Windows, Python 2.7) on Windows 7, followed by installing Scrapy, this is what ended up installed:
- Python 2.7.12
- Scrapy 1.1.1
- Twisted 16.4.1
This minimal code, run with "python scrapy_test.py" (using the Scrapy API):
#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

import scrapy.spiders.crawl
import scrapy.crawler
import scrapy.utils.project

class MySpider(scrapy.spiders.crawl.CrawlSpider) :
    name = "stackoverflow.com"
    allowed_domains = ["stackoverflow.com"]
    start_urls = ["http://stackoverflow.com/"]
    download_delay = 1.5

    def __init__(self, my_arg = None) :
        print "def __init__"
        self.my_arg = my_arg
        print "self.my_arg"
        print self.my_arg

    def parse(self, response) :
        pass

def main() :
    my_arg = "value"
    process = scrapy.crawler.CrawlerProcess(scrapy.utils.project.get_project_settings())
    process.crawl(MySpider(my_arg))
    process.start()

if __name__ == "__main__" :
    main()
gives this output:
[scrapy] INFO: Scrapy 1.1.1 started (bot: scrapy_project)
[scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scrapy_project.spiders', 'SPIDER_MODULES': ['scrapy_project.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'scrapy_project'}
def __init__
self.my_arg
value
[scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
def __init__
self.my_arg
None
[...]
Notice how the __init__ method ran twice, and how the stored argument got turned to None after the second run, which is not what I want. Is this supposed to happen?
If I change:
def __init__(self, my_arg = None) :
to:
def __init__(self, my_arg) :
the output is:
[...]
Unhandled error in Deferred:
[twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "scrapy_test.py", line 28, in main
    process.crawl(MySpider(my_arg))
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 163, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 167, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 1331, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "C:\Users\xyz\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 1185, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 71, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 94, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\spiders\crawl.py", line 96, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\spiders\__init__.py", line 50, in from_crawler
    spider = cls(*args, **kwargs)
exceptions.TypeError: __init__() takes 2 arguments (1 given)

[twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Users\xyz\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 1185, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 71, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\crawler.py", line 94, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\spiders\crawl.py", line 96, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "C:\Users\xyz\Miniconda2\lib\site-packages\scrapy\spiders\__init__.py", line 50, in from_crawler
    spider = cls(*args, **kwargs)
TypeError: __init__() takes 2 arguments (1 given)
I have no clue how to get around this problem. Any idea?
Here is the method definition of scrapy.crawler.CrawlerProcess.crawl():
crawl(crawler_or_spidercls, *args, **kwargs)
- crawler_or_spidercls (Crawler instance, Spider subclass or string) – already created crawler, or a spider class or spider's name inside the project to create it
- args (list) – arguments to initialize the spider
- kwargs (dict) – keyword arguments to initialize the spider
This means you should be passing the name of the spider (or, as here, the spider class itself) separately from the kwargs needed to initialize said spider, like so:
process.crawl(MySpider, my_arg = 'value')
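Put together, the script above could look like this (a minimal sketch based on the original code; forwarding *args/**kwargs to the parent __init__ is an addition of mine, not something the original snippet did, but it keeps CrawlSpider's own setup intact):

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-
import scrapy.spiders.crawl
import scrapy.crawler
import scrapy.utils.project

class MySpider(scrapy.spiders.crawl.CrawlSpider):
    name = "stackoverflow.com"
    allowed_domains = ["stackoverflow.com"]
    start_urls = ["http://stackoverflow.com/"]
    download_delay = 1.5

    def __init__(self, my_arg = None, *args, **kwargs):
        # Forward any remaining arguments so CrawlSpider still does its own initialization.
        super(MySpider, self).__init__(*args, **kwargs)
        self.my_arg = my_arg

    def parse(self, response):
        pass

def main():
    process = scrapy.crawler.CrawlerProcess(scrapy.utils.project.get_project_settings())
    # Pass the spider class, not an instance: Scrapy builds the spider itself
    # via MySpider.from_crawler() and forwards my_arg to __init__.
    process.crawl(MySpider, my_arg = "value")
    process.start()

if __name__ == "__main__":
    main()

With this, __init__ runs only once and self.my_arg keeps its value.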