selenium questions - Page 1

Hooman Bahreini

Asked: 2020-10-30 12:24:31 +0800 CST

Cannot create a crontab job for my scrapy program

8

I have written a small Python scraper (using the Scrapy framework). The scraper requires a headless browse... I am using ChromeDriver.

As I am running this code on an Ubuntu server that has no GUI, I had to install Xvfb in order to run ChromeDriver on it (I followed this guide).

This is my code:

import scrapy
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


class MySpider(scrapy.Spider):
    name = 'my_spider'

    def __init__(self):
        # self.driver = webdriver.Chrome(ChromeDriverManager().install())
        chrome_options = Options()
        chrome_options.add_argument('--headless')  # run Chrome without a visible window
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        self.driver = webdriver.Chrome('/usr/bin/chromedriver', chrome_options=chrome_options)

I can run the above code from the Ubuntu shell and it executes without any errors:

ubuntu@ip-1-2-3-4:~/scrapers/my_scraper$ scrapy crawl my_spider

Now I want to set up a cron job to run the above command every day:

# m h  dom mon dow   command
PATH=/usr/local/bin:/home/ubuntu/.local/bin/
05 12 * * * cd /home/ubuntu/scrapers/my_scraper && scrapy crawl my_spider >> /tmp/scraper.log 2>&1

But the cron job fails with the following error:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/home/ubuntu/.local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 86, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/scrapy/crawler.py", line 98, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/scrapy/spiders/__init__.py", line 19, in from_crawler
    spider = cls(*args, **kwargs)
  File "/home/ubuntu/scrapers/my_scraper/my_scraper/spiders/spider.py", line 27, in __init__
    self.driver = webdriver.Chrome('/usr/bin/chromedriver', chrome_options=chrome_options)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
    desired_capabilities=desired_capabilities)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 157, in __init__
    self.start_session(capabilities, browser_profile)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 252, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
  (Driver info: chromedriver=2.41.578700 (2f1ed5f9343c13f73144538f15c00b370eda6706),platform=Linux 5.4.0-1029-aws x86_64)

Update

This answer helped me solve the issue (but I don't quite understand why).

I ran echo $PATH in my Ubuntu shell and copied the value into the crontab:

PATH=/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
05 12 * * * cd /home/ubuntu/scrapers/my_scraper && scrapy crawl my_spider >> /tmp/scraper.log 2>&1

Note: As I have created a bounty for this question, I am happy to award it to any answer which explains why changing the PATH solved the issue.
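
The two crontab entries above differ only in PATH, and the first one contains neither /usr/bin nor /bin. The following is a minimal sketch for comparing what each PATH can resolve. It does not settle why the change helped; it only assumes that the relevant difference is which executables are found. The command list is hypothetical and should be replaced with whatever the scraper and Chrome actually invoke.

```python
import shutil

# The two PATH values quoted in the question.
CRON_PATH = "/usr/local/bin:/home/ubuntu/.local/bin/"
SHELL_PATH = (
    "/home/ubuntu/.local/bin:/usr/local/sbin:/usr/local/bin:"
    "/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
)


def unresolved(commands, path):
    """Return the commands that cannot be found on the given PATH string."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    # Hypothetical list of programs to check; not taken from the question.
    commands = ["scrapy", "google-chrome", "readlink", "dirname", "cat"]
    print("not found with the first crontab PATH: ", unresolved(commands, CRON_PATH))
    print("not found with the second crontab PATH:", unresolved(commands, SHELL_PATH))
```

Running this on the server itself shows which of the listed commands only become visible with the longer PATH.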

Nembone

Asked: 2019-11-29 20:46:25 +0800 CST

Selenium issues webdriver

2

I'm very new to this. I'll try to explain everything, and I apologize in advance if I say something naive.

I'm trying to make Selenium work on a Linux server, so there is only a command line.

Everything is already installed (Chrome, chromedriver, Python, Selenium).

My sample code to test:

import time
from selenium import webdriver
driver = webdriver.Chrome()

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

So I ran ls -l goo*:

lrwxrwxrwx. 1 root root 31 Nov 29 03:29 google-chrome -> /etc/alternatives/google-chrome
lrwxrwxrwx. 1 root root 32 Nov 28 05:54 google-chrome-stable -> /opt/google/chrome/google-chrome

For some reason (could someone explain this to me, please?) google-chrome appears with ->
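
In ls -l output, an arrow marks a symbolic link: the name on the left points to the path on the right. A minimal sketch of following such a chain hop by hop is shown here. The function name link_chain and the throwaway file names are hypothetical, and the demo uses temporary files rather than the real /usr/bin/google-chrome.

```python
import os
import tempfile


def link_chain(path, max_hops=40):
    """Follow symbolic links one hop at a time and return every path visited."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's own directory;
        # os.path.join returns an absolute target unchanged.
        path = os.path.join(os.path.dirname(path), target)
        chain.append(path)
    return chain


# Self-contained demo: launcher -> alternative -> real-binary
demo_dir = tempfile.mkdtemp()
real = os.path.join(demo_dir, "real-binary")
open(real, "w").close()
alt = os.path.join(demo_dir, "alternative")
os.symlink(real, alt)
first = os.path.join(demo_dir, "launcher")
os.symlink(alt, first)

print(link_chain(first))
```

On the server, calling link_chain("/usr/bin/google-chrome") would list each hop until a regular file is reached.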

chromedriver is running on localhost, port 9515:

Only local connections are allowed.
Please protect ports used by ChromeDriver and related test frameworks to prevent access by malicious code.

I don't know what I am doing wrong.

Any help?

junichironakashima

Asked: 2019-07-24 19:14:31 +0800 CST

How to make python wait for a program to stop before going to the next line of code

3

I'm writing a Python script that opens a program and then waits for that program to close before continuing to the next line of code. Here is my script:

import subprocess as sp
sp.Popen([r'C:/Folder/folder/a.exe'])
??????
????????
print("test")

The question marks are the things that I don't know.
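
A minimal sketch of one way to block until the launched program exits, using the standard subprocess module. The child command here is a stand-in (a short Python one-liner) so the example runs anywhere; in the script above it would be the path to a.exe.

```python
import subprocess as sp
import sys

# Stand-in child process: a Python one-liner that sleeps briefly, then exits.
child_cmd = [sys.executable, "-c", "import time; time.sleep(0.2)"]

# Popen starts the program and returns immediately...
proc = sp.Popen(child_cmd)
# ...so wait() is what blocks until the program has closed.
return_code = proc.wait()
print("test")  # reached only after the child has exited

# subprocess.run combines both steps: it starts the program and waits for it.
completed = sp.run(child_cmd)
print("exit codes:", return_code, completed.returncode)
```

Popen plus wait() is useful when other work should happen while the program runs; run() is the shorter form when the script should simply pause.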

A-K

Asked: 2019-04-21 19:16:58 +0800 CST

What's the difference between Try Ubuntu and Install Ubuntu option in VirtualBox?

4

What is the difference between the "Try Ubuntu" and "Install Ubuntu" options?

I have VirtualBox on my Windows 10 machine and have downloaded the Ubuntu ISO to my desktop.

I have configured VirtualBox and provided the Ubuntu ISO to it. I get two options, "Try Ubuntu" and "Install Ubuntu", and I am not sure which one to choose.

I want to run my Selenium scripts in parallel, for which I need multiple machines; that is why I am using VMs.

If I select "Install Ubuntu", will it alter my laptop's file system? I intend to use the VM temporarily to learn the concept of parallel script execution across multiple machines. Afterwards I want to remove VirtualBox, and I don't want to keep Ubuntu.

shubham bansal

Asked: 2018-05-30 03:04:58 +0800 CST

I want to install selenium webdriver in my Ubuntu 16.04 system for python

3

When I install Selenium I get the following error:

Shubham@Shubham-To-be-filled-by-O-E-M:~$ sudo apt-get update
Get:1 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
Hit:2 https://repo.skype.com/deb stable InRelease
Hit:3 http://in.archive.ubuntu.com/ubuntu xenial InRelease
Get:4 http://in.archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:5 http://in.archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Fetched 323 kB in 8s (38.6 kB/s)
Reading package lists... Done
Shubham@Shubham-To-be-filled-by-O-E-M:~$ sudo pip install selenium
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name main

How should I proceed?
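
The traceback above fails inside the /usr/bin/pip launcher script itself, before any package is touched. One thing worth trying, as an assumption rather than a confirmed diagnosis, is to invoke pip as a module of the interpreter, which does not go through that launcher. A minimal sketch follows; the helper name pip_command is hypothetical.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a command line that runs pip as a module of the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Harmless check first: ask pip for its version instead of installing anything.
    result = subprocess.run(pip_command("--version"))
    print("pip exit code:", result.returncode)
    # To actually install, the same pattern would be:
    #   subprocess.run(pip_command("install", "--user", "selenium"))
```

The shell equivalent is running the interpreter with -m pip followed by the usual pip arguments.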
