python - Scrapy API - Spider class __init__ argument turned to None


After a fresh install of Miniconda (64-bit .exe installer for Windows, Python 2.7) on Windows 7, and installing Scrapy through it, here is what is installed:

  • Python 2.7.12
  • Scrapy 1.1.1
  • Twisted 16.4.1

This minimal code, run with "python scrapy_test.py" (using the Scrapy API):

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

import scrapy.spiders.crawl
import scrapy.crawler
import scrapy.utils.project

class MySpider(scrapy.spiders.crawl.CrawlSpider):
    name = "stackoverflow.com"
    allowed_domains = ["stackoverflow.com"]
    start_urls = ["http://stackoverflow.com/"]
    download_delay = 1.5

    def __init__(self, my_arg=None):
        print "def __init__"

        self.my_arg = my_arg
        print "self.my_arg"
        print self.my_arg

    def parse(self, response):
        pass

def main():
    my_arg = "value"

    process = scrapy.crawler.CrawlerProcess(scrapy.utils.project.get_project_settings())
    process.crawl(MySpider(my_arg))
    process.start()

if __name__ == "__main__":
    main()

gives this output:

[scrapy] INFO: Scrapy 1.1.1 started (bot: scrapy_project)
[scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scrapy_project.spiders', 'SPIDER_MODULES': ['scrapy_project.spiders'], 'ROBOTSTXT_OBEY': True, 'BOT_NAME': 'scrapy_project'}
def __init__
self.my_arg
value
[scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
def __init__
self.my_arg
None
[...]

Notice how the __init__ method ran twice, and how the stored argument got turned to None after the second run, which is not what I want. Is this supposed to happen?

If I change:

def __init__(self, my_arg=None):

to:

def __init__(self, my_arg):

The output is:

[...]
Unhandled error in Deferred:
[twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "scrapy_test.py", line 28, in main
    process.crawl(MySpider(my_arg))
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 163, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 167, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 1331, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 1185, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 71, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 94, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\spiders\crawl.py", line 96, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\spiders\__init__.py", line 50, in from_crawler
    spider = cls(*args, **kwargs)
exceptions.TypeError: __init__() takes exactly 2 arguments (1 given)

[twisted] CRITICAL:
Traceback (most recent call last):
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\twisted\internet\defer.py", line 1185, in _inlineCallbacks
    result = g.send(result)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 71, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\crawler.py", line 94, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\spiders\crawl.py", line 96, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "C:\Users\XYZ\Miniconda2\lib\site-packages\scrapy\spiders\__init__.py", line 50, in from_crawler
    spider = cls(*args, **kwargs)
TypeError: __init__() takes exactly 2 arguments (1 given)

I have no clue how to get around this problem. Any idea?

Here is the method definition for scrapy.crawler.CrawlerProcess.crawl():

crawl(crawler_or_spidercls, *args, **kwargs)

  • crawler_or_spidercls (Crawler instance, Spider subclass or string) – an already created crawler, or a spider class or spider's name inside the project to create it
  • args (list) – arguments to initialize the spider
  • kwargs (dict) – keyword arguments to initialize the spider
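
These parameters also explain the output above: crawl() expects the spider class and constructs the spider itself, so the hand-built MySpider(my_arg) was followed by a second, argument-less construction inside Scrapy (the from_crawler call ending in cls(*args, **kwargs), visible in the traceback), which is where the None came from. Here is a minimal standalone sketch of that call chain; FakeCrawler and DemoSpider are hypothetical stand-ins, not Scrapy's real classes:

# Hypothetical stand-ins illustrating how crawl()'s arguments flow;
# not Scrapy's actual implementation.
class FakeCrawler(object):
    def __init__(self, spidercls):
        self.spidercls = spidercls

    def crawl(self, *args, **kwargs):
        # Like Scrapy, the framework builds the spider itself from the
        # class it was given, using the args/kwargs passed to crawl().
        return self.spidercls(*args, **kwargs)

class DemoSpider(object):
    def __init__(self, my_arg=None):
        self.my_arg = my_arg

print FakeCrawler(DemoSpider).crawl(my_arg="value").my_arg  # value
print FakeCrawler(DemoSpider).crawl().my_arg                # None, like the second run above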

This means you should be passing your spider's class (or its name inside the project), separately from the kwargs needed to initialize it, and let Scrapy create the instance, like so:

process.crawl(MySpider, my_arg='value')
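
Putting it together, a minimal corrected version of the question's script might look like the sketch below. Only the crawl() call is the actual fix; forwarding *args and **kwargs to super(...).__init__ is just the usual Scrapy spider convention so the base class still initializes normally:

#!/usr/bin/env python2.7
# -*- coding: utf-8 -*-

import scrapy.spiders.crawl
import scrapy.crawler
import scrapy.utils.project

class MySpider(scrapy.spiders.crawl.CrawlSpider):
    name = "stackoverflow.com"
    allowed_domains = ["stackoverflow.com"]
    start_urls = ["http://stackoverflow.com/"]
    download_delay = 1.5

    def __init__(self, my_arg=None, *args, **kwargs):
        # Let the base spider initialize normally.
        super(MySpider, self).__init__(*args, **kwargs)
        self.my_arg = my_arg

    def parse(self, response):
        pass

def main():
    process = scrapy.crawler.CrawlerProcess(
        scrapy.utils.project.get_project_settings())
    # Pass the class, not an instance; Scrapy instantiates it once,
    # with the given kwargs.
    process.crawl(MySpider, my_arg="value")
    process.start()

if __name__ == "__main__":
    main()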
